<?xml version='1.0' encoding='ascii'?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<rfc ipr="trust200902" category="info" docName="draft-ietf-aqm-fq-codel-00" obsoletes="" updates="" submissionType="IETF" xml:lang="en">
  <front>
    <title abbrev="Fq_Codel">FlowQueue-Codel</title>
    <author initials="T." surname="H&#248;iland-J&#248;rgensen" fullname="Toke H&#248;iland-J&#248;rgensen">
      <organization>Karlstad University</organization>
      <address>
        <postal>
          <street>Dept. of Computer Science</street>
          <city>Karlstad</city>
          <region/>
          <code>65188</code>
          <country>Sweden</country>
        </postal>
        <phone/>
        <email>toke.hoiland-jorgensen@kau.se</email>
        <uri/>
      </address>
    </author>
    <author initials="P." surname="McKenney" fullname="Paul McKenney">
      <organization>IBM Linux Technology Center</organization>
      <address>
        <postal>
          <street>1385 NW Amberglen Parkway</street>
          <city>Hillsboro</city>
          <region>OR</region>
          <code>97006</code>
          <country>USA</country>
        </postal>
        <phone/>
        <email>paulmck@linux.vnet.ibm.com</email>
        <uri>http://www2.rdrop.com/~paulmck/</uri>
      </address>
    </author>
    <author initials="D." surname="Taht" fullname="Dave Taht">
      <organization>Teklibre</organization>
      <address>
        <postal>
          <street>2104 W First street</street>
          <street>Apt 2002</street>
          <city>FT Myers</city>
          <region>FL</region>
          <code>33901</code>
          <country>USA</country>
        </postal>
        <phone/>
        <email>d+ietf@teklibre.com</email>
        <uri>http://www.teklibre.com/</uri>
      </address>
    </author>
    <author initials="J." surname="Gettys" fullname="Jim Gettys">
      <organization>Google, Inc.</organization>
      <address>
        <postal>
          <street>21 Oak Knoll Road</street>
          <city>Carlisle</city>
          <region>MA</region>
          <code>01741</code>
          <country>USA</country>
        </postal>
        <phone/>
        <email>jg@freedesktop.org</email>
        <uri>https://en.wikipedia.org/wiki/Jim_Gettys</uri>
      </address>
    </author>
    <author initials="E." surname="Dumazet" fullname="Eric Dumazet">
      <organization>Google, Inc.</organization>
      <address>
        <postal>
          <street>1600 Amphitheater Pkwy</street>
          <city>Mountain View</city>
          <region>CA</region>
          <code>94043</code>
          <country>USA</country>
        </postal>
        <phone/>
        <email>edumazet@gmail.com</email>
        <uri/>
      </address>
    </author>
    <date month="December" year="2014"/>
    <area>General</area>
    <workgroup>AQM working group</workgroup>
    <keyword>RFC</keyword>
    <keyword>Request for Comments</keyword>
    <keyword>I-D</keyword>
    <keyword>Internet-Draft</keyword>
    <keyword>XML</keyword>
    <keyword>Pandoc</keyword>
    <keyword>Extensible Markup Language</keyword>
    <abstract>
      <t>This memo presents the FQ-CoDel hybrid packet scheduler/AQM algorithm, a critical tool for fighting bufferbloat and reducing latency across the Internet.  </t>
      <t>FQ-CoDel mixes packets from multiple flows and reduces the impact of head of line blocking from bursty traffic. It provides isolation for low-rate traffic such as DNS, web, and videoconferencing traffic. It improves utilisation across the networking fabric, especially for bidirectional traffic, by keeping queue lengths short; and it can be implemented in a memory- and CPU-efficient fashion across a wide range of hardware.  </t>
    </abstract>
  </front>
  <middle><!--This document was prepared using Pandoc2rfc, https://github.com/miekg/pandoc2rfc --><section title="Introduction" anchor="introduction" toc="default"><t>The FQ-CoDel algorithm is a combined packet scheduler and AQM developed as part of the bufferbloat-fighting community effort. It is based on a modified Deficit Round Robin (DRR) queue scheduler, with the CoDel AQM algorithm operating on each sub-queue. This document describes the combined algorithm; reference implementations are available for ns2 and ns3, and the algorithm is included in the mainline Linux kernel as the FQ-CoDel queueing discipline.  </t><t>The rest of this document is structured as follows: This section gives some concepts and terminology used in the rest of the document, and gives a short informal summary of the FQ-CoDel algorithm. Section 2 gives an overview of the CoDel algorithm. Section 3 covers the DRR portion. Section 4 defines the parameters and data structures employed by FQ-CoDel. Section 5 describes the working of the algorithm in detail. Section 6 describes implementation considerations, and Section 7 lists some of the limitations of using flow queueing. Finally, Section 11 concludes.  </t><section title="Terminology and concepts" anchor="terminology-and-concepts" toc="default"><t>Flow: A flow is typically identified by a 5-tuple of source IP, destination IP, source port, destination port, and protocol. It can also be identified by a superset or subset of those parameters, by MAC address, or by other means.  </t><t>Queue: A queue of packets represented internally in FQ-CoDel. In most instances each flow gets its own queue; however, because of the possibility of hash collisions, this is not always the case.  In an attempt to avoid confusion, the word 'queue' is used to refer to the internal data structure, and 'flow' to refer to the actual stream of packets being delivered to the FQ-CoDel algorithm.  </t><t>Scheduler: A mechanism to select which queue a packet is dequeued from.  
</t><t>CoDel AQM: The Active Queue Management algorithm employed by FQ-CoDel.  </t><t>DRR: Deficit round-robin scheduling.  </t><t>Quantum: The maximum number of bytes to be dequeued from a queue at once.  </t></section><section title="Informal summary of FQ-CoDel" anchor="informal-summary-of-fq-codel" toc="default"><t>FQ-CoDel is a <spanx style="emph" xml:space="preserve">hybrid</spanx> of <xref target="DRR" pageno="false" format="default">DRR</xref> and <xref target="CODELDRAFT" pageno="false" format="default">CoDel</xref>, with an optimisation for sparse flows similar to <xref target="SQF2012" pageno="false" format="default">SQF</xref> and <xref target="DRRPP" pageno="false" format="default">DRR++</xref>. We call this "Flow Queueing" rather than "Fair Queueing" as flows that build a queue are treated differently than flows that do not.  </t><t>FQ-CoDel stochastically classifies incoming packets into different sub-queues by hashing the 5-tuple of IP protocol number and source and destination IP and port numbers, perturbed with a random number selected at initialisation time (although other flow classification schemes can optionally be configured instead). Each queue is managed by the CoDel queueing discipline. Packet ordering within a queue is preserved, since queues have FIFO ordering.  </t><t>The FQ-CoDel algorithm consists of two logical parts: the scheduler, which selects which queue to dequeue a packet from, and the CoDel AQM, which works on each of the queues. The subtleties of FQ-CoDel are mostly in the scheduling part, whereas the interaction between the scheduler and the CoDel algorithm is fairly straightforward: </t><t>At initialisation, each queue is set up to have a separate set of CoDel state variables. By default, 1024 queues are created. The current implementation supports anywhere from one to 64K separate queues, and each queue maintains the state variables throughout its lifetime, and so acts the same as the non-FQ CoDel variant would. 
This means that with only one queue, FQ-CoDel behaves essentially the same as CoDel by itself.  </t><t>On dequeue, FQ-CoDel selects a queue from which to dequeue by a two-tier round-robin scheme, in which each queue is allowed to dequeue up to a configurable quantum of bytes for each iteration.  Deviations from this quantum are maintained as a deficit for the queue, which serves to make the fairness scheme byte-based rather than packet-based. The two-tier round-robin mechanism distinguishes between "new" queues (which don't build up a standing queue) and "old" queues, which have queued enough data to be around for more than one iteration of the round-robin scheduler.  </t><t>This new/old queue distinction has a particular consequence for queues that don't build up more than a quantum of bytes before being visited by the scheduler: such a queue is removed from the list, and then re-added as a new queue each time a packet arrives for it, and so gets priority over queues that do not empty out each round (except for a minor modification to protect against starvation, detailed below). Exactly how much data a flow has to send to keep its queue in this state is somewhat difficult to reason about, because it depends on both the egress link speed and the number of concurrent flows. However, in practice many things that are beneficial to have prioritised for typical Internet use (ACKs, DNS lookups, interactive SSH, HTTP requests, ARP, ICMP, VoIP) <spanx style="emph" xml:space="preserve">tend</spanx> to fall in this category, which is why FQ-CoDel performs so well for many practical applications.  However, the implicitness of the prioritisation means that for applications that require guaranteed priority (for instance multiplexing the network control plane over the network itself), explicit classification is still needed.  </t><t>This scheduling scheme has some subtlety to it, which is explained in detail in the remainder of this document.  
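</t><t>As an illustration of the classification step summarised earlier in this section, the following is a minimal sketch of mapping a packet to a sub-queue: hash the 5-tuple, perturb the hash with a value chosen at initialisation, and take the result modulo the number of queues. The names (five_tuple, mix32, classify) are hypothetical, only IPv4 addresses are handled, and an FNV-1a-style mix is used as a simple stand-in; it is not the hash used by any particular implementation.  </t><figure title="" suppress-title="false" align="left" alt="" width="" height=""><artwork xml:space="preserve" name="" type="" align="left" alt="" width="" height=""><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow key: the 5-tuple used for classification. */
struct five_tuple {
    uint32_t src_ip, dst_ip;       /* IPv4 addresses */
    uint16_t src_port, dst_port;
    uint8_t  protocol;             /* IP protocol number */
};

/* Fold the four bytes of v into h, FNV-1a style (stand-in hash only). */
static uint32_t mix32(uint32_t h, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        h ^= (v >> (8 * i)) & 0xffu;
        h *= 16777619u;            /* 32-bit FNV prime */
    }
    return h;
}

/* Map a packet's 5-tuple to a sub-queue index in [0, flows). */
uint32_t classify(const struct five_tuple *t, uint32_t perturbation,
                  uint32_t flows)
{
    uint32_t h = 2166136261u ^ perturbation;   /* seed with the perturbation */

    assert(flows > 0);
    h = mix32(h, t->src_ip);
    h = mix32(h, t->dst_ip);
    h = mix32(h, ((uint32_t)t->src_port << 16) | t->dst_port);
    h = mix32(h, t->protocol);
    return h % flows;              /* hash value modulo the number of queues */
}
```
]]></artwork></figure><t>Two packets with the same 5-tuple always land in the same sub-queue for a given perturbation, while distinct flows may still collide; with a single queue, every flow maps to index 0, which corresponds to the plain CoDel case described above.  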
</t></section></section><section title="CoDel" anchor="codel" toc="default"><t>CoDel is described in the <xref target="CODEL2012" pageno="false" format="default">ACM Queue paper</xref> and the <xref target="CODELDRAFT" pageno="false" format="default">AQM working group draft</xref>. The basic idea is to control queue length, maintaining sufficient queueing to keep the outgoing link busy, but avoiding building up the queue beyond that point. This is done by preferentially dropping packets that remain in the queue for &#8220;too long&#8221;. The CoDel algorithm itself will not be described here; instead we refer the reader to the <xref target="CODELDRAFT" pageno="false" format="default">CoDel draft</xref>.  </t></section><section title="Flow Queueing" anchor="flow-queueing" toc="default"><t>FQ-CoDel's DRR scheduler is byte-based, employing a deficit round-robin mechanism between queues. This works by keeping track of the current byte <spanx style="emph" xml:space="preserve">deficit</spanx> of each queue. This deficit is initialised to the configurable quantum; each time a queue gets a dequeue opportunity, it gets to dequeue packets, decreasing the deficit by the packet size for each packet, until the deficit runs into the negative, at which point it is increased by one quantum, and the dequeue opportunity ends.  </t><t>This means that if one queue contains packets of, for instance, size quantum/3, and another contains quantum-sized packets, the first queue will dequeue three packets each time it gets a turn, whereas the second only dequeues one. As a result, flows that send small packets are not penalised by the difference in packet sizes; rather, the DRR scheme approximates (single-)byte-based fairness queueing.  The size of the quantum determines the scheduling granularity, with the tradeoff from too small a quantum being scheduling overhead. For small bandwidths, lowering the quantum from the default MTU size can be advantageous.  
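</t><t>The quantum/3 example above can be checked with a minimal sketch of one DRR dequeue opportunity. The names (drr_queue, drr_turn, drr_demo) are hypothetical, each queue is modelled as an endless backlog of equal-sized packets, and a turn is assumed to end as soon as the deficit reaches zero or below, i.e. once the queue has dequeued at least a quantum of bytes.  </t><figure title="" suppress-title="false" align="left" alt="" width="" height=""><artwork xml:space="preserve" name="" type="" align="left" alt="" width="" height=""><![CDATA[
```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, simplified queue: an endless backlog of equal-sized packets. */
struct drr_queue {
    int pkt_size;   /* size in bytes of every packet in this queue */
    int deficit;    /* current byte deficit, initialised to one quantum */
};

/*
 * Give the queue one dequeue opportunity and return the number of packets
 * it sends.  The turn ends once the deficit is used up (assumed here to
 * mean zero or negative); the deficit is then increased by one quantum.
 */
int drr_turn(struct drr_queue *q, int quantum)
{
    int sent = 0;

    assert(quantum > 0 && q->pkt_size > 0);
    while (q->deficit > 0) {
        q->deficit -= q->pkt_size;   /* "transmit" one packet */
        sent++;
    }
    q->deficit += quantum;
    return sent;
}

/* Print how many packets each queue sends per round for a 1500-byte quantum. */
void drr_demo(void)
{
    const int quantum = 1500;
    struct drr_queue small = { quantum / 3, quantum };
    struct drr_queue large = { quantum, quantum };

    for (int round = 1; round <= 3; round++) {
        int a = drr_turn(&small, quantum);
        int b = drr_turn(&large, quantum);
        printf("round %d: small=%d large=%d packets\n", round, a, b);
    }
}
```
]]></artwork></figure><t>Calling drr_demo() from a main function shows the queue with quantum/3-sized packets sending three packets per turn and the queue with quantum-sized packets sending one, so both move the same number of bytes per round.  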
</t><t>Unlike plain DRR, FQ-CoDel keeps two sets of flows: a "new" list for flows that have not built a queue recently, and an "old" list for queue-building flows.  </t></section><section title="FQ-CoDel Parameters and Data Structures" anchor="fq-codel-parameters-and-data-structures" toc="default"><t>This section describes the parameters and data structures of FQ-CoDel.  </t><section title="Parameters" anchor="parameters" toc="default"><section title="Interval" anchor="interval" toc="default"><t>The <spanx style="emph" xml:space="preserve">interval</spanx> parameter has the same semantics as in CoDel and is used to ensure that the measured minimum delay does not become too stale. The minimum delay MUST be experienced in the last epoch of length interval. It SHOULD be set on the order of the worst-case RTT through the bottleneck to give end-points sufficient time to react.  </t><t>The default interval value is 100 ms.  </t></section><section title="Target" anchor="target" toc="default"><t>The <spanx style="emph" xml:space="preserve">target</spanx> parameter has the same semantics as in CoDel. It is the acceptable minimum standing/persistent queue delay for each FQ-CoDel queue. This minimum delay is identified by tracking the local minimum queue delay that packets experience.  </t><t>The default target value is 5 ms, but this value SHOULD be tuned to be at least the transmission time of a single MTU-sized packet at the prevalent egress link speed (which, for example, at 1 Mbps with an MTU of 1500 bytes is ~15 ms). It should otherwise be set on the order of 5-10% of the configured interval.  </t></section><section title="Packet limit" anchor="packet-limit" toc="default"><t>Routers do not have infinite memory, so some packet limit MUST be enforced.  </t><t>The <spanx style="emph" xml:space="preserve">limit</spanx> parameter is the hard limit on the real queue size, measured in number of packets. 
This limit is a global limit on the number of packets in all queues; each individual queue does not have an upper limit. When the limit is reached and a new packet arrives for enqueue, a packet is dropped from the head of the largest queue (measured in bytes) to make room for the new packet.  </t><t>The default packet limit is 10240 packets, which is suitable for up to 10GigE speeds. In practice, the hard limit is rarely, if ever, hit, as drops are performed by the CoDel algorithm long before the limit is hit. For platforms that are severely memory constrained, a lower limit can be used.  </t></section><section title="Quantum" anchor="quantum" toc="default"><t>The <spanx style="emph" xml:space="preserve">quantum</spanx> parameter is the number of bytes each queue gets to dequeue on each round of the scheduling algorithm. The default is set to 1514 bytes which corresponds to the Ethernet MTU plus the hardware header length of 14 bytes.  </t><t>In TSO-enabled systems, where a "packet" consists of an offloaded packet train, it can presently be as large as 64K bytes. In GRO-enabled systems, up to 17 times the TCP max segment size (or 25K bytes).  </t></section><section title="Flows" anchor="flows" toc="default"><t>The <spanx style="emph" xml:space="preserve">flows</spanx> parameter sets the number of sub-queues into which the incoming packets are classified. Due to the stochastic nature of hashing, multiple flows may end up being hashed into the same slot.  </t><t>This parameter can be set only at load time since memory has to be allocated for the hash table in the current implementation.  </t><t>The default value is 1024.  </t></section><section title="ECN" anchor="ecn" toc="default"><t>ECN is <spanx style="emph" xml:space="preserve">enabled</spanx> by default. 
Rather than do anything special with misbehaving ECN flows, FQ-CoDel relies on the packet scheduling system to minimise their impact. Thus, the queue of an unresponsive flow whose packets are being marked with ECN can grow to the overall packet limit, but will not otherwise affect the performance of the system.  </t><t>It can be disabled by specifying the <spanx style="emph" xml:space="preserve">noecn</spanx> parameter.  </t></section></section><section title="Data structures" anchor="data-structures" toc="default"><section title="Internal sub-queues" anchor="internal-sub-queues" toc="default"><t>The main data structure of FQ-CoDel is the array of sub-queues, which is instantiated to the number of queues specified by the <spanx style="emph" xml:space="preserve">flows</spanx> parameter at instantiation time. Each sub-queue consists simply of an ordered list of packets with FIFO semantics, two state variables tracking the queue deficit and total number of bytes enqueued, and the set of CoDel state variables. Other state variables to track queue statistics can also be included: for instance, the Linux implementation keeps a count of dropped packets.  </t><t>Queue space is shared: there is a global limit on the number of packets the queues can hold, but not one per queue.  </t></section><section title="New and old queues lists" anchor="new-and-old-queues-lists" toc="default"><t>FQ-CoDel maintains two lists of active queues, called "new" and "old" queues. Each list is an ordered list containing references to the array of sub-queues.  When a packet is added to a queue that is not currently active, that queue becomes active by being added to the list of new queues. Later on, it is moved to the list of old queues, from which it is removed when it is no longer active. This behaviour is the source of some subtlety in the packet scheduling at dequeue time, explained below.  
</t></section></section></section><section title="The FQ-CoDel scheduler and AQM interactions" anchor="the-fq-codel-scheduler-and-aqm-interactions" toc="default"><t>This section describes the operation of the FQ-CoDel scheduler and AQM. It is split into two parts explaining the enqueue and dequeue operations.  </t><section title="Enqueue" anchor="enqueue" toc="default"><t>The packet enqueue mechanism consists of three stages: classification into a sub-queue, timestamping and bookkeeping, and optionally dropping a packet when the total number of enqueued packets goes over the maximum.  </t><t>When a packet is enqueued, it is first classified into the appropriate sub-queue. By default, this is done by hashing on the 5-tuple of IP protocol, and source and destination IP and port numbers, perturbed by a random value selected at initialisation time, and taking the hash value modulo the number of sub-queues.  However, an implementation MAY also specify a configurable classification scheme along a wide variety of other possible parameters such as MAC address, diffserv, firewall and flow-specific markings, etc. (the Linux implementation does so in the form of the 'tc filter' command).  </t><t>If a custom filter fails to classify a packet, the packet is dropped and no further action is taken. By design the standard filter cannot fail.  </t><t>Additionally, the default hashing algorithm presently deployed does decapsulation of some common packet types (6in4, IPIP, GRE 0), mixes IPv6 addresses thoroughly, and uses the Jenkins hash on the result.  </t><t>Once the packet has been successfully classified into a sub-queue, it is handed over to the CoDel algorithm for timestamping. It is then added to the tail of the selected queue, and the queue's byte count is updated by the packet size. Then, if the queue is not currently active (i.e. 
if it is not in either the list of new or the list of old queues), it is added to the end of the list of new queues, and its deficit is initialised to the configured quantum.  Otherwise, the queue is left in its current list.  </t><t>Finally, the total number of enqueued packets is compared with the configured limit, and if it is <spanx style="emph" xml:space="preserve">above</spanx> this value (which can happen since a packet was just enqueued), a packet is dropped from the head of the queue with the largest current byte count. Note that in most cases this means that the packet that gets dropped is different from the one that was just enqueued, and may even be from a different queue.  </t></section><section title="Dequeue" anchor="dequeue" toc="default"><t>Most of FQ-CoDel's work is done at packet dequeue time. It consists of three parts: selecting a queue from which to dequeue a packet, actually dequeuing it (employing the CoDel algorithm in the process), and some final bookkeeping.  </t><t>For the first part, the scheduler first looks at the list of new queues; for each queue in that list, if that queue has a negative deficit (i.e. it has already dequeued at least a quantum of bytes), its deficit is increased by one quantum, and the queue is put onto <spanx style="emph" xml:space="preserve">the end of</spanx> the list of old queues, and the routine selects the next queue and starts again.  </t><t>Otherwise, that queue is selected for dequeue. If the list of new queues is empty, the scheduler proceeds down the list of old queues in the same fashion (checking the deficit, and either selecting the queue for dequeuing, or increasing the deficit and putting the queue back at the end of the list).  </t><t>Once a queue has been selected from which to dequeue a packet, the CoDel algorithm is invoked on that queue. 
This applies the CoDel control law, and may discard one or more packets from the head of that queue, before returning the packet that should be dequeued (or nothing if the queue is or becomes empty while being handled by the CoDel algorithm).  </t><t>Finally, if the CoDel algorithm did not return a packet, the queue is empty, and the scheduler does one of two things: if the queue selected for dequeue came from the list of new queues, it is moved to <spanx style="emph" xml:space="preserve">the end of</spanx> the list of old queues. If instead it came from the list of old queues, that queue is removed from the list, to be added back (as a new queue) the next time a packet arrives that hashes to that queue. Then (since no packet was available for dequeue), the whole dequeue process is restarted from the beginning.  </t><t>If, instead, the scheduler <spanx style="emph" xml:space="preserve">did</spanx> get a packet back from the CoDel algorithm, it updates the byte deficit for the selected queue before returning the packet as the result of the dequeue operation.  </t><t>The step that moves an empty queue from the list of new queues to <spanx style="emph" xml:space="preserve">the end of</spanx> the list of old queues before it is removed is crucial to prevent starvation. Otherwise the queue could reappear (the next time a packet arrives for it) before the list of old queues is visited; this can go on indefinitely even with a small number of active flows, if the flow providing packets to the queue in question transmits at just the right rate. This is prevented by first moving the queue to <spanx style="emph" xml:space="preserve">the end of</spanx> the list of old queues, forcing a pass through that, and thus preventing starvation. Moving it to the end of the list, rather than the front, is crucial for this to work.  
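</t><t>The enqueue and dequeue steps described in this section can be sketched as follows. This is a simplified illustration under stated assumptions, not the Linux code: the names (flowq, flowlist, sched, enqueue, dequeue) are hypothetical, the CoDel step is replaced by a plain FIFO pop, the global packet limit is omitted, each queue holds at most MAXPKTS packets, and a deficit of zero or less is treated as used up.  </t><figure title="" suppress-title="false" align="left" alt="" width="" height=""><artwork xml:space="preserve" name="" type="" align="left" alt="" width="" height=""><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

enum { MAXPKTS = 16 };              /* per-queue capacity in this sketch only */

struct flowq {
    int pkts[MAXPKTS];              /* packet sizes in bytes, FIFO order */
    int head, len;                  /* ring-buffer read index and occupancy */
    int deficit;                    /* byte deficit */
    int active;                     /* non-zero while on the new or old list */
    struct flowq *next;             /* link within the new or old list */
};

struct flowlist { struct flowq *first, *last; };

struct sched {
    struct flowlist new_flows, old_flows;
    int quantum;
};

static void list_append(struct flowlist *l, struct flowq *q)
{
    q->next = NULL;
    if (l->last)
        l->last->next = q;
    else
        l->first = q;
    l->last = q;
}

static struct flowq *list_pop(struct flowlist *l)
{
    struct flowq *q = l->first;

    if (q) {
        l->first = q->next;
        if (!l->first)
            l->last = NULL;
        q->next = NULL;
    }
    return q;
}

/* Enqueue: add the packet at the tail; an inactive queue joins the new list. */
void enqueue(struct sched *s, struct flowq *q, int size)
{
    assert(q->len < MAXPKTS);
    q->pkts[(q->head + q->len) % MAXPKTS] = size;
    q->len++;
    if (!q->active) {
        q->active = 1;
        q->deficit = s->quantum;
        list_append(&s->new_flows, q);
    }
}

/*
 * Dequeue: return the queue that was served and store the packet size in
 * *size, or return NULL once every queue is empty.
 */
struct flowq *dequeue(struct sched *s, int *size)
{
    for (;;) {
        int from_new = s->new_flows.first != NULL;
        struct flowlist *l = from_new ? &s->new_flows : &s->old_flows;
        struct flowq *q = l->first;

        if (!q)
            return NULL;                        /* no active queues left */
        if (q->deficit <= 0) {                  /* quantum used up */
            q->deficit += s->quantum;
            list_pop(l);
            list_append(&s->old_flows, q);      /* to the end of old */
            continue;
        }
        if (q->len == 0) {                      /* nothing to hand out */
            list_pop(l);
            if (from_new)
                list_append(&s->old_flows, q);  /* anti-starvation step */
            else
                q->active = 0;                  /* leaves the lists */
            continue;
        }
        *size = q->pkts[q->head];
        q->head = (q->head + 1) % MAXPKTS;
        q->len--;
        q->deficit -= *size;                    /* final bookkeeping */
        return q;
    }
}
```
]]></artwork></figure><t>In this sketch, a sparse queue that empties while on the list of new queues is parked at the end of the list of old queues and stays active, so a packet arriving for it shortly afterwards does not put it back on the list of new queues ahead of the queues already waiting on the old list.  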
</t><t>The resulting migration of queues between the different states is summarised in the following state diagram: </t><figure title="" suppress-title="false" align="left" alt="" width="" height=""><artwork xml:space="preserve" name="" type="" align="left" alt="" width="" height="">
+-----------------+                +--------------------+
|                 |     Empty      |                    |
|     Empty       |&lt;---------------+        Old         +-----+
|                 |                |                    |     |
+-------+---------+                +--------------------+     |
        |                             ^              ^        |Quantum
        |Arrival                      |              |        |Exceeded
        v                             |              |        |
+-----------------+                   |              |        |
|                 |     Empty or      |              |        |
|      New        +-------------------+              +--------+
|                 |  Quantum exceeded
+-----------------+
</artwork></figure></section></section><section title="Implementation considerations" anchor="implementation-considerations" toc="default"><section title="Probability of hash collisions" anchor="probability-of-hash-collisions" toc="default"><t>Since the Linux FQ-CoDel implementation by default uses 1024 hash buckets, the probability that (say) 100 VoIP sessions will all hash to the same bucket is (1/1024)^99, something like ten to the power of minus 300. Thus, the probability that at least one of the VoIP sessions will hash to some other queue is very high indeed.  </t><t>Conversely, the probability that each of the 100 VoIP sessions will get its own queue is given by (1023!/(924!*1024^99)), or about 0.007, so not all that probable. The probability rises sharply, however, if we are willing to accept a few collisions. For example, there is about an 86% probability that no more than two of the 100 VoIP sessions will be involved in any given collision, and about a 99% probability that no more than three of the VoIP sessions will be involved in any given collision. These last two results were computed using Monte Carlo simulations. Oddly enough, the mathematics for VoIP-session collision exactly matches that of hardware cache overflow.  </t></section><section title="Memory Overhead" anchor="memory-overhead" toc="default"><t>FQ-CoDel can be implemented with a very low memory footprint (less than 64 bytes per queue on 64-bit systems). These are the data structures used in the Linux implementation: </t><figure title="" suppress-title="false" align="left" alt="" width="" height=""><artwork xml:space="preserve" name="" type="" align="left" alt="" width="" height="">
struct codel_vars {
    u32             count;
    u32             lastcount;
    bool            dropping;
    u16             rec_inv_sqrt;
    codel_time_t    first_above_time;
    codel_time_t    drop_next;
    codel_time_t    ldelay;
};

struct fq_codel_flow {
    struct sk_buff    *head;
    struct sk_buff    *tail;
    struct list_head  flowchain;
    int               deficit;
    u32               dropped; /* number of drops (or ECN marks) on this flow */
    struct codel_vars cvars;
};
</artwork></figure><t>The master table managing all queues looks like this: </t><figure title="" suppress-title="false" align="left" alt="" width="" height=""><artwork xml:space="preserve" name="" type="" align="left" alt="" width="" height="">
struct fq_codel_sched_data {
    struct tcf_proto *filter_list;  /* optional external classifier */
    struct fq_codel_flow *flows;    /* Flows table [flows_cnt] */
    u32             *backlogs;      /* backlog table [flows_cnt] */
    u32             flows_cnt;      /* number of flows */
    u32             perturbation;   /* hash perturbation */
    u32             quantum;        /* psched_mtu(qdisc_dev(sch)); */
    struct codel_params cparams;
    struct codel_stats cstats;
    u32             drop_overlimit;
    u32             new_flow_count;

    struct list_head new_flows;     /* list of new flows */
    struct list_head old_flows;     /* list of old flows */
};
</artwork></figure></section><section title="Per-Packet Timestamping" anchor="per-packet-timestamping" toc="default"><t>The CoDel portion of the algorithm requires per-packet timestamps be stored along with the packet. While this approach works well for software-based routers, it may be impossible to retrofit devices that do most of their processing in silicon and lack space or mechanism for timestamping.  </t><t>Also, while perfect resolution is not needed, timestamping functions in the core OS need to be efficient as they are called at least once on each packet enqueue and dequeue.  </t></section><section title="Other forms of &quot;Fair Queueing&quot;" anchor="other-forms-of-fair-queueing" toc="default"><t>Much of the scheduling portion of FQ-CoDel is derived from DRR and is substantially similar to DRR++. SFQ-based versions have also been produced and tested in ns2. Other forms of Fair Queueing, such as WFQ or QFQ, have not been thoroughly explored.  </t></section><section title="Differences between CoDel and FQ-CoDel behaviour" anchor="differences-between-codel-and-fq-codel-behaviour" toc="default"><t>CoDel can be applied to a single queue system as a straight AQM, where it converges towards an "ideal" drop rate (i.e.  one that minimises delay while keeping a high link utilisation), and then optimises around that control point.  </t><t>The scheduling of FQ-CoDel mixes packets of competing flows, which acts to pace bursty flows to better fill the pipe. Additionally, a new flow gets substantial "credit" over other flows until CoDel finds an ideal drop rate for it. However, for a new flow that exceeds the configured quantum, more time passes before all of its data is delivered (as packets from it, too, are mixed across the other existing queue-building flows). 
Thus, FQ-CoDel takes longer (as measured in time) to converge towards an ideal drop rate for a given new flow, but does so within fewer delivered <spanx style="emph" xml:space="preserve">packets</spanx> from that flow.  </t><t>Finally, the flow isolation FQ-CoDel provides means that the CoDel drop mechanism operates on the flows actually building queues, which results in packets being dropped more accurately from the largest flows than CoDel alone manages. Additionally, flow isolation radically improves the transient behaviour of the network when traffic or link characteristics change (e.g. when new flows start up or the link bandwidth changes); while CoDel itself can take a while to respond, FQ-CoDel doesn't miss a beat.  </t></section></section><section title="Limitations of flow queueing" anchor="limitations-of-flow-queueing" toc="default"><t>While FQ-CoDel has been shown in many scenarios to offer significant performance gains, there are some scenarios where the scheduling algorithm in particular is not a good fit. This section documents some of the known cases which either may require tweaking the default behaviour, or where alternatives to flow queueing should be considered.  </t><section title="Fairness between things other than flows" anchor="fairness-between-things-other-than-flows" toc="default"><t>In some parts of the network, enforcing flow-level fairness may not be desirable, or some other level of fairness may be more important. Examples include an Internet Service Provider that is more interested in ensuring fairness between customers than between flows, or a hosting or transit provider that wishes to ensure fairness between connecting Autonomous Systems or networks. Another issue can be that the number of simultaneous flows experienced at a particular link can be too high for flow-based fairness queueing to be effective.  
</t><t>Whatever the reason, in a scenario where fairness between flows is not desirable, reconfiguring FQ-CoDel to match on a different characteristic can be a way forward. The implementation in Linux can leverage the powerful packet matching mechanism of the <spanx style="emph" xml:space="preserve">tc</spanx> subsystem to use any available packet field to partition packets into virtual queues, for instance to match on address or subnet source/destination pairs, application layer characteristics, etc.  </t><t>Furthermore, as commonly deployed today, FQ-CoDel is used with three or more tiers of classification: priority, best effort and background, based on diffserv markings. Some products do more detailed classification, including deep packet inspection and destination-specific filters to achieve their desired result.  </t></section><section title="Flow bunching by opaque encapsulation" anchor="flow-bunching-by-opaque-encapsulation" toc="default"><t>Where possible, FQ-CoDel will attempt to decapsulate packets before matching on the header fields for the flow hashing.  However, for some encapsulation techniques, most notably encrypted VPNs, this is not possible. If several flows are bunched into one such encapsulated tunnel, they will be seen as one flow by the FQ-CoDel algorithm. This means that they will share a queue and drop behaviour, and so flows inside the encapsulation will not benefit from the implicit prioritisation of FQ-CoDel, but will continue to benefit from the reduced overall queue length from the CoDel algorithm operating on the queue. In addition, when such an encapsulated bunch competes against other flows, it will count as one flow, and will not be assigned a share of the bandwidth based on how many flows are inside the encapsulation.  </t><t>Depending on the application, this may or may not be desirable behaviour. 
In cases where it is not, changing FQ-CoDel's matching to not be flow-based (as detailed in the previous subsection) can be a way to mitigate this.  </t></section><section title="Low-priority congestion control algorithms" anchor="low-priority-congestion-control-algorithms" toc="default"><t>Because of the flow isolation that FQ-CoDel provides, low-priority congestion control algorithms (or, in general, algorithms that try to voluntarily use up less than their fair share of bandwidth) can be re-prioritised. Because a flow experiences very little added latency when the link is congested, such algorithms lack the signal to back off that added latency previously afforded them. As such, existing algorithms tend to revert to loss-based congestion control, and will consume the fair share of bandwidth afforded to them by the FQ-CoDel scheduler. However, low-priority congestion control mechanisms may be able to take steps to continue to be low priority, for instance by taking into account the vastly reduced level of delay afforded by an AQM, or by using a coupled approach to observing the behaviour of multiple flows.  </t></section></section><section title="Security Considerations" anchor="security-considerations" toc="default"><t>There are no specific security exposures associated with FQ-CoDel.  Some exposures present in current FIFO systems are in fact reduced (e.g., simple-minded packet floods).  </t></section><section title="IANA Considerations" anchor="iana-considerations" toc="default"><t>This document has no actions for IANA.  </t></section><section title="Acknowledgements" anchor="acknowledgements" toc="default"><t>Our deepest thanks to Eric Dumazet (author of FQ-CoDel), Kathie Nichols, Van Jacobson, and all the members of the bufferbloat.net effort.  </t></section><section title="Conclusions" anchor="conclusions" toc="default"><t>FQ-CoDel is a very general, efficient, nearly parameterless active queue management approach combining flow queueing with CoDel. 
It is a critical tool in solving bufferbloat.  </t><t>FQ-CoDel's default settings SHOULD be modified for other special-purpose networking applications, such as for exceptionally slow links, for use in data centres, or on links with inherent delay greater than 800 ms (e.g., satellite links).  </t><t>Ongoing projects include improving FQ-CoDel with more SFQ-like behaviour for lower bandwidth systems, improving the control law, and optimising sparse packet drop behaviour.  </t><t>In addition to the Linux kernel sources, ns2 and ns3 models are available. Refinements (such as <eref target="http://www.bufferbloat.net/projects/cerowrt/wiki/nfq_codel">NFQCODEL</eref>) are being tested in the CeroWrt effort.  </t></section> </middle>
  <back>
    <references title="Normative References"><reference anchor="RFC2119"><front><title abbrev="RFC Key Words">Key words for use in RFCs to Indicate Requirement Levels</title><author initials="S." surname="Bradner" fullname="Scott Bradner"><organization>Harvard University</organization><address><postal><street>1350 Mass. Ave.</street><street>Cambridge</street><street>MA 02138</street></postal><phone>- +1 617 495 3864</phone><email>sob@harvard.edu</email></address></author><date year="1997" month="March"/><area>General</area><keyword>keyword</keyword><abstract><t>In many standards track documents several words are used to signify the requirements in the specification.  These words are often capitalized.  This document defines these words as they should be interpreted in IETF documents.  Authors who follow these guidelines should incorporate this phrase near the beginning of their document: <list><t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.  </t></list></t><t>Note that the force of these words is modified by the requirement level of the document in which they are used.  </t></abstract></front><seriesInfo name="BCP" value="14"/><seriesInfo name="RFC" value="2119"/><format type="TXT" octets="4723" target="ftp://ftp.isi.edu/in-notes/rfc2119.txt"/><format type="HTML" octets="17491" target="http://xml.resource.org/public/rfc/html/rfc2119.html"/><format type="XML" octets="5777" target="http://xml.resource.org/public/rfc/xml/rfc2119.xml"/></reference> <reference anchor="RFC0896"><front><title>Congestion control in IP/TCP internetworks</title><author initials="J." surname="Nagle" fullname="John Nagle"><organization>Ford Aerospace and Communications Corporation</organization></author><date year="1984" month="January" day="6"/><abstract><t>This memo discusses some aspects of congestion control in  IP/TCP Internetworks. 
It  is intended to stimulate thought and further discussion of this topic. While some specific  suggestions  are made for improved congestion  control  implementation, this memo does not specify any standards.</t></abstract></front><seriesInfo name="RFC" value="896"/><format type="TXT" octets="26782" target="http://www.rfc-editor.org/rfc/rfc896.txt"/></reference> <reference anchor="RFC0970"><front><title>On packet switches with infinite storage</title><author initials="J." surname="Nagle" fullname="John Nagle"><organization>FACC Palo Alto</organization></author><date year="1985" day="1" month="December"/><abstract><t>Most prior work on congestion in datagram systems focuses on buffer management.  We find it illuminating to consider the case of a packet switch with infinite storage.  Such a packet switch can never run out of buffers. It can, however, still become congested.  The meaning of congestion in an infinite-storage system is explored.  We demonstrate the unexpected result that a datagram network with infinite storage, first-in-first-out queuing, at least two packet switches, and a finite packet lifetime will, under overload, drop all packets.  By attacking the problem of congestion for the infinite-storage case, we discover new solutions applicable to switches with finite storage.</t></abstract></front><seriesInfo name="RFC" value="970"/><format type="TXT" octets="35316" target="http://www.rfc-editor.org/rfc/rfc970.txt"/></reference> <reference anchor="RFC2309"><front><title abbrev="Internet Performance Recommendations">Recommendations on Queue Management and Congestion Avoidance in the Internet</title><author initials="B." surname="Braden" fullname="Bob Braden"><organization>USC Information Sciences Institute</organization><address><postal><street>4676 Admiralty Way</street><city>Marina del Rey</city><region>CA</region><code>90292</code></postal><phone>310-822-1511</phone><email>Braden@ISI.EDU</email></address></author><author initials="D.D." 
surname="Clark" fullname="David D. Clark"><organization>MIT Laboratory for Computer Science</organization><address><postal><street>545 Technology Sq.</street><city>Cambridge</city><region>MA</region><code>02139</code></postal><phone>617-253-6003</phone><email>DDC@lcs.mit.edu</email></address></author><author initials="J." surname="Crowcroft" fullname="Jon Crowcroft"><organization>University College London</organization><address><postal><street>Department of Computer Science</street><street>Gower Street</street><street>London, WC1E 6BT</street><street>ENGLAND</street></postal><phone>+44 171 380 7296</phone><email>Jon.Crowcroft@cs.ucl.ac.uk</email></address></author><author initials="B." surname="Davie" fullname="Bruce Davie"><organization>Cisco Systems, Inc.</organization><address><postal><street>250 Apollo Drive</street><city>Chelmsford</city><region>MA</region><code>01824</code></postal><email>bdavie@cisco.com</email></address></author><author initials="S." surname="Deering" fullname="Steve Deering"><organization>Cisco Systems, Inc.</organization><address><postal><street>170 West Tasman Drive</street><city>San Jose</city><region>CA</region><code>95134-1706</code></postal><phone>408-527-8213</phone><email>deering@cisco.com</email></address></author><author initials="D." surname="Estrin" fullname="Deborah Estrin"><organization>USC Information Sciences Institute</organization><address><postal><street>4676 Admiralty Way</street><city>Marina del Rey</city><region>CA</region><code>90292</code></postal><phone>310-822-1511</phone><email>Estrin@usc.edu</email></address></author><author initials="S." surname="Floyd" fullname="Sally Floyd"><organization>Lawrence Berkeley National Laboratory, MS 50B-2239, One Cyclotron Road, Berkeley CA 94720</organization><address><phone>510-486-7518</phone><email>Floyd@ee.lbl.gov</email></address></author><author initials="V." 
surname="Jacobson" fullname="Van Jacobson"><organization>Lawrence Berkeley National Laboratory, MS 46A, One Cyclotron Road, Berkeley CA 94720</organization><address><phone>510-486-7519</phone><email>Van@ee.lbl.gov</email></address></author><author initials="G." surname="Minshall" fullname="Greg Minshall"><organization>Fiberlane Communications</organization><address><postal><street>1399 Charleston Road</street><city>Mountain View</city><region>CA</region><code>94043</code></postal><phone>+1 650 237 3164</phone><email>Minshall@fiberlane.com</email></address></author><author initials="C." surname="Partridge" fullname="Craig Partridge"><organization>BBN Technologies</organization><address><postal><street>10 Moulton St.</street><street>Cambridge MA 02138</street></postal><phone>510-558-8675</phone><email>craig@bbn.com</email></address></author><author initials="L." surname="Peterson" fullname="Larry Peterson"><organization>Department of Computer Science</organization><address><postal><street>University of Arizona</street><city>Tucson</city><region>AZ</region><code>85721</code></postal><phone>520-621-4231</phone><email>LLP@cs.arizona.edu</email></address></author><author initials="K.K." surname="Ramakrishnan" fullname="K.K. Ramakrishnan"><organization>AT&amp;T Labs. Research</organization><address><postal><street>Rm. A155</street><street>180 Park Avenue</street><street>Florham Park, N.J. 07932</street></postal><phone>973-360-8766</phone><email>KKRama@research.att.com</email></address></author><author initials="S." surname="Shenker" fullname="Scott Shenker"><organization>Xerox PARC</organization><address><postal><street>3333 Coyote Hill Road</street><city>Palo Alto</city><region>CA</region><code>94304</code></postal><phone>415-812-4840</phone><email>Shenker@parc.xerox.com</email></address></author><author initials="J." 
surname="Wroclawski" fullname="John Wroclawski"><organization>MIT Laboratory for Computer Science</organization><address><postal><street>545 Technology Sq.</street><city>Cambridge</city><region>MA</region><code>02139</code></postal><phone>617-253-7885</phone><email>JTW@lcs.mit.edu</email></address></author><author initials="L." surname="Zhang" fullname="Lixia Zhang"><organization>UCLA</organization><address><postal><street>4531G Boelter Hall</street><city>Los Angeles</city><region>CA</region><code>90024</code></postal><phone>310-825-2695</phone><email>Lixia@cs.ucla.edu</email></address></author><date year="1998" month="April"/><area>Routing</area><keyword>congestion</keyword><abstract><t>This memo presents two recommendations to the Internet community concerning measures to improve and preserve Internet performance.  It presents a strong recommendation for testing, standardization, and widespread deployment of active queue management in routers, to improve the performance of today's Internet.  It also urges a concerted effort of research, measurement, and ultimate deployment of router mechanisms to protect the Internet from flows that are not sufficiently responsive to congestion notification.  
</t></abstract></front><seriesInfo name="RFC" value="2309"/><format type="TXT" octets="38079" target="http://www.rfc-editor.org/rfc/rfc2309.txt"/><format type="XML" octets="42517" target="http://xml.resource.org/public/rfc/xml/rfc2309.xml"/></reference> <reference anchor="CODELDRAFT" target="https://datatracker.ietf.org/doc/draft-ietf-aqm-codel/"><front><title>Controlled Delay Active Queue Management</title><author initials="K" surname="Nichols" fullname="Kathleen Nichols"><organization/></author><author initials="V" surname="Jacobson" fullname="Van Jacobson"><organization>Google, Inc</organization></author><author initials="A" surname="McGregor" fullname="Andrew McGregor"><organization>Google, Inc</organization></author><author initials="J" surname="Iyengar" fullname="Jana Iyengar"><organization>Google, Inc</organization></author><!--<workgroup>Communications of the ACM Vol. 55 No. 11, July, 2012, pp.  42-50.</workgroup>--><date year="2014" month="October"/></front></reference> </references>
    <references title="Informative References"><reference anchor="SFQ" target="http://www2.rdrop.com/~paulmck/scalability/paper/sfq.2002.06.04.pdf"><front><title>Stochastic Fairness Queuing</title><author initials="P.E." surname="McKenney" fullname="Paul E. McKenney"><organization/></author><date year="1990" month="September"/><abstract><t>This paper presents a class of algorithms collectively called stochastic fairness queueing that are probabilistic variants of fair queuing. These algorithms do not require an exact mapping and thus are suitable for high-speed software or firmware implementations. Furthermore, these algorithms represent a broad range of CPU, memory, and fairness tradeoffs.  </t></abstract></front><format type="PDF" octets="294635" target="http://www2.rdrop.com/~paulmck/scalability/paper/sfq.2002.06.04.pdf"/></reference> <reference anchor="CODEL2012" target="http://queue.acm.org/detail.cfm?id=2209336"><front><title>Controlling Queue Delay</title><author initials="K" surname="Nichols" fullname="Kathleen Nichols"><organization/></author><author initials="V" surname="Jacobson" fullname="Van Jacobson"><organization>Google, Inc</organization></author><!--<workgroup>Communications of the ACM Vol. 55 No. 11, July, 2012, pp.  
42-50.</workgroup>--><date year="2012" month="July"/><abstract><t></t></abstract></front><format type="PDF" octets="" target="http://queue.acm.org/detail.cfm?id=2209336"/></reference> <reference anchor="SQF2012" target="http://perso.telecom-paristech.fr/~bonald/Publications_files/BMO2011.pdf"><front><title>On the impact of TCP and per-flow scheduling on Internet Performance - IEEE/ACM transactions on Networking</title><author initials="T" surname="Bonald" fullname="Thomas Bonald"><organization>Telecom ParisTech</organization></author><author initials="L" surname="Muscariello" fullname="Luca Muscariello"><organization>Orange Labs</organization></author><author initials="N" surname="Ostallo" fullname="Norberto Ostallo"><organization>Eurocom</organization></author><date year="2012" month="April"/><abstract><t>We present a packet scheduler called &#8220;shortest queue first&#8221; (SQF) that aims at protecting audio and video traffic from the congestion caused by data traffic. Unlike standard solutions, the services to be handled with priority are not known in advance. It is rather the traffic characteristics of audio and video applications that are used to detect their delay sensitivity.  The SQF algorithm does not require any prior configuration of the network and, as such, adapts to the fast evolution of traffic and usage. The performance of the proposed solution is demonstrated using both analysis and experiments on a testbed emulating a residential access line </t></abstract></front><format type="PDF" octets="" target="http://perso.telecom-paristech.fr/~bonald/Publications_files/BMO2011.pdf"/></reference> <reference anchor="DRR" target="http://users.ece.gatech.edu/~siva/ECE4607/presentations/DRR.pdf"><front><title>Efficient Fair Queueing Using Deficit Round Robin</title><author initials="M." surname="Shreedhar" fullname="M. Shreedhar"><organization/></author><author initials="G." 
surname="Varghese" fullname="George Varghese"><organization/></author><date year="1996" month="June"/><abstract><t><!-- FIXME: cite IEEE/ACM Transactions on Networking, Vol. 4, No. 3 -->Fair Queuing is a technique that allows each flow passing through a network device to have a fair share of network resources. Previous schemes for fair queueing that achieved nearly perfect fairness were expensive to implement; specifically, the work required to process a packet in these schemes was O(log(n)), where n is the number of active flows.  </t><t>Deficit Round Robin achieves near perfect fairness, requires only O(1) work to process a packet and is simple enough to implement in hardware.  </t></abstract></front><format type="PDF" octets="1101478" target="http://users.ece.gatech.edu/~siva/ECE4607/presentations/DRR.pdf"/></reference> <reference anchor="DRRPP" target="http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=875803"><front><title>Deficits for Bursty Latency-critical Flows: DRR++</title><author surname="MacGregor" fullname="M.H. MacGregor"><organization/></author><author surname="Shi" fullname="W. Shi"><organization/></author><date year="2000"/></front></reference> </references>
  </back>
</rfc>
