TCP Flow Control
Note: This topic describes the Reno enhancement of classical "Van Jacobson" or Tahoe congestion control. There have been many suggestions for improving this mechanism - see the topic on high-speed TCP variants.
TCP flow control and window size adjustment is mainly based on two key
mechanism:
Slow Start and
Additive Increase/Multiplicative Decrease (AIMD),
also known as Congestion Avoidance.
(
RFC 793 and
RFC 2581)
Slow Start
To avoid that a starting TCP connection floods the network, a
Slow Start
mechanism was introduced in TCP. This mechanism effectively
probes to find the available bandwidth.
In addition to the window advertised by the receiver, a
Congestion Window (
cwnd) value is used and the effective window size is the lesser of the two. The starting value of the
cwnd window is set initially to the
maximum segment size (MSS) of the connection (obtained during SYN handshake, discovered path MTU). After each acknowledgment, the
cwnd window is increased by one MSS. By this algorithm, the data rate of the sender doubles each round-trip time (RTT) interval (actually, taking into account
Delayed ACKs, rate increases by 50% every RTT). For a properly implemented version of TCP this increase continues until:
- the advertised window size is reached
- congestion (packet loss) is detected on the connection.
- there is no traffic waiting to take advantage of an increased window (i.e. cwnd should only grow if it needs to)
When congestion is detected, the TCP flow-control mode is changed from
Slow Start to
Congestion Avoidance. Note that some TCP implementations maintain cwnd in units of bytes, while others use units of full-sized segments.
Congestion Avoidance
Once congestion is detected (through timeout and/or duplicate ACKs),
the data rate is reduced in order to let the network recover.
Slow Start uses an exponential increase in window size and thus also
in data rate.
Congestion Avoidance uses a linear growth function
(additive increase). This is achieved by introducing - in addition to
the
cwnd window - a
slow start threshold (
ssthresh).
As long as
cwnd is less than
ssthresh,
Slow Start applies. Once
ssthresh is reached,
cwnd is increased by at most one segment per
RTT. The
cwnd window continues to open with this linear rate until
a congestion event is detected.
When congestion is detected,
ssthresh is set to half the
cwnd (or to be strictly accurate, half the "Flight Size". This distinction is important if the implementation lets cwnd grow beyond
rwnd (the receiver's declared window)).
cwnd is either set to 1 if congestion was signalled by a timeout,
forcing the sender to enter
Slow Start, or to
ssthresh if
congestion was signalled by duplicate ACKs and the
Fast Recovery
algorithm has terminated. In either case, once the sender enters
Congestion Avoidance, its rate has been reduced to half the value at
the time of congestion. This multiplicative decrease causes the
cwnd to close exponentially with each detected loss event.
Fast Retransmit
In
Fast Retransmit, the arrival of three duplicate ACKs is
interpreted as packet loss, and retransmission starts before the
retransmission timer (RTO) expires.
The missing segment will be retransmitted immediately without going
through the normal retransmission queue processing. This improves
performance by eliminating delays that would suspend effective data
flow on the link.
Fast Recovery
Fast Recovery is used to react quickly to a single packet loss. In
Fast recovery, the receipt of 3 duplicate ACKs, while being taken to
mean a loss of a segment, does not result in a full Slow Start. This
is because obviously later segments got through, and hence congestion
is not stopping everything. In fast recovery, ssthresh is set to half
of the current send window size, the missing segment is retransmitted
(
Fast Retransmit) and
cwnd is set to
ssthresh plus three
segments. Each additional duplicate ACK indicates that one segment
has left the network at the receiver and
cwnd is increased by one
segment to allow the transmission of another segment if allowed by the
new
cwnd. When an ACK is received for new data,
cwmd is reset to
the
ssthresh, and TCP enters congestion avoidance mode.
References
- Congestion Avoidance and Control, V. Jacobson, Computer Communication Review, vol. 18, no. 4, pp. 314-329, August 1988, ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z
- TCP Congestion Control, RFC 2581, M. Allman, V. Paxson, W. Stevens, April 1999
- TCP Congestion Control, "RFC 2851 bis", project to revise RFC 2581, M. Allman, V. Paxson, E. Blanton, work in progress, September 2007
- Congestion Control in Linux TCP, P. Sarolahti, A. Kuznetsov, USENIX Annual Technical Conference 2002, Freenix Track
--
UlrichSchmid - 07 Jun 2005
--
SimonLeinen - 27 Jan 2006