Computer Networks
TCP: Congestion Control
Lesson Objectives
- Understand the 1986 congestion collapse and the positive-feedback loop that caused it
- Know the four stages of the classic algorithm: slow start, congestion avoidance, fast retransmit, fast recovery
- Tell apart loss-based (Reno, Cubic) and model-based (BBR) congestion control
- Reason about cwnd, rwnd, ssthresh and which one is bounding throughput right now
- Understand why BBR shines on long fat networks and lossy Wi-Fi
October 1986. The internet hits its first congestion collapse: the LBL to Berkeley link, all 400 yards of it, drops from 32 kbit/s to 40 bit/s - 800x slower. Link 100% busy, useful traffic near zero: everyone retransmits, retransmits trigger more retransmits. In 1988 Van Jacobson publishes the SIGCOMM paper introducing slow start and congestion avoidance, and the internet survives.
- **Video calls:** Zoom/Meet stay usable over lossy Wi-Fi only when congestion control does not mistake random loss for congestion, the problem BBR targets
- **Downloads:** Cubic fills gigabit links efficiently
- **Games:** A small cwnd keeps queues short and latency low, at the cost of throughput
Prerequisites
Why Congestion Control Is Needed
**Congestion Control** - a mechanism that prevents a sender from overloading the network. Unlike flow control (which protects the receiver), congestion control protects the routers and the links between them.
TCP uses **implicit feedback**: a packet loss or a growing delay is taken as a sign of congestion. Classic TCP gets no explicit signal from routers (ECN, where enabled, is the exception). The sender draws its own conclusions.
**Flow Control vs Congestion Control:** Flow control - rwnd (receive window) from the receiver. Congestion control - cwnd (congestion window) at the sender. Actual window = min(rwnd, cwnd).
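The rule Actual window = min(rwnd, cwnd) can be sketched in a few lines. This is an illustration only: the function name `effective_window` and the byte values in the usage example are assumptions, not part of any real TCP stack.

```python
def effective_window(cwnd, rwnd):
    """Return (usable send window in bytes, name of the limit that binds).

    cwnd is the sender's congestion window, rwnd the window advertised
    by the receiver. Both are given in bytes.
    """
    if cwnd < rwnd:
        return cwnd, "cwnd"   # the network estimate is the bottleneck
    if rwnd < cwnd:
        return rwnd, "rwnd"   # the receiver's buffer is the bottleneck
    return cwnd, "both"


if __name__ == "__main__":
    # A fresh connection: 10 MSS of 1460 bytes vs. a 64 KB receive window.
    print(effective_window(14600, 65535))   # (14600, 'cwnd')
    # A long-running transfer to a slow reader.
    print(effective_window(200000, 65535))  # (65535, 'rwnd')
```

Knowing which of the two values binds tells you where to look: a connection limited by rwnd will not speed up no matter which congestion control algorithm is chosen.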
What does congestion control protect?
Slow Start
**Slow Start** - the initial phase of a TCP connection. The name is misleading: growth is actually exponential! The window doubles every RTT until a loss occurs or the threshold (ssthresh) is reached.
**IW (Initial Window):** Originally 1 MSS, later 2-4 MSS (RFC 3390). RFC 6928 now proposes IW = 10 MSS (~14 KB with a 1460-byte MSS). This speeds up loading of small web pages, which can fit in the first RTT.
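To see why the initial window matters, here is a minimal sketch that counts the round trips needed to send a small response under pure slow start. It assumes no loss, no ssthresh or rwnd limit, and an MSS of 1460 bytes; the function name `rtts_to_send` is made up for this example, and the handshake RTT is not counted.

```python
MSS = 1460  # bytes; an assumed, typical value for Ethernet paths


def rtts_to_send(total_bytes, initial_window_mss, mss=MSS):
    """Count round trips needed to send total_bytes in pure slow start.

    Idealized model: one full window is sent per RTT and the window
    doubles after every RTT (no loss, no ssthresh, no rwnd limit).
    """
    cwnd = initial_window_mss * mss
    sent = 0
    rtts = 0
    while sent < total_bytes:
        sent += cwnd   # one full window goes out in this round trip
        cwnd *= 2      # exponential growth: the window doubles
        rtts += 1
    return rtts


if __name__ == "__main__":
    page = 14000  # a ~14 KB web page
    print(rtts_to_send(page, 1))   # 4
    print(rtts_to_send(page, 10))  # 1
```

With IW = 1 MSS the windows are 1, 2, 4, 8 MSS, so the page needs four round trips; with IW = 10 MSS it fits in the first one.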
How does cwnd grow during Slow Start?
AIMD - Additive Increase, Multiplicative Decrease
**AIMD** - the main congestion control algorithm after slow start. Additive Increase: window grows linearly (+1 MSS per RTT). Multiplicative Decrease: on loss, window is halved.
**Triple Duplicate ACK:** three duplicate ACKs (four identical ACKs in a row) trigger Fast Retransmit + Fast Recovery. cwnd is halved, but not reset to 1 MSS. A timeout (a more serious problem) resets cwnd to 1 MSS and restarts Slow Start.
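The sawtooth that AIMD produces can be reproduced with a toy per-RTT model. This is a sketch under simplifying assumptions: cwnd is counted in whole MSS, each loss is signalled by a triple duplicate ACK (never a timeout), and the names `aimd_trace` and `loss_rounds` are hypothetical.

```python
def aimd_trace(start_cwnd, rounds, loss_rounds):
    """Return cwnd (in MSS) at the start of each RTT round under AIMD.

    loss_rounds is the set of round indices in which a loss is detected
    via triple duplicate ACK. Timeouts are not modelled here.
    """
    cwnd = start_cwnd
    trace = []
    for r in range(rounds):
        trace.append(cwnd)
        if r in loss_rounds:
            cwnd = max(cwnd // 2, 1)  # multiplicative decrease: halve
        else:
            cwnd += 1                 # additive increase: +1 MSS per RTT
    return trace


if __name__ == "__main__":
    # One loss in round 3: the window drops from 13 to 6, then climbs again.
    print(aimd_trace(10, 8, {3}))  # [10, 11, 12, 13, 6, 7, 8, 9]
```

The asymmetry is the point: one loss undoes many rounds of linear growth, which is what makes competing flows back off quickly and probe for bandwidth cautiously.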
What happens to cwnd when a packet is lost in AIMD?
Congestion Window (cwnd)
**cwnd** (Congestion Window) - the sender-maintained limit on how much unacknowledged data may be in flight. Unlike rwnd (advertised by the receiver), cwnd is the sender's own estimate of network state. Effective window = min(cwnd, rwnd).
**ssthresh (Slow Start Threshold):** The boundary between Slow Start and Congestion Avoidance. On loss: ssthresh = cwnd/2. The next Slow Start will continue only up to ssthresh.
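The role of ssthresh is easiest to see right after a timeout, when cwnd restarts from 1 MSS. The sketch below is a simplified per-RTT model in MSS units; the function name `after_timeout_trace` is hypothetical, the floor of 2 MSS on ssthresh follows RFC 5681, and capping the last slow-start step exactly at ssthresh is a simplification.

```python
def after_timeout_trace(cwnd_at_timeout, rounds):
    """Model cwnd (in MSS) for the RTT rounds that follow a timeout.

    Returns (ssthresh, trace), where trace is a list of (cwnd, phase).
    """
    ssthresh = max(cwnd_at_timeout // 2, 2)  # ssthresh = cwnd / 2
    cwnd = 1                                 # timeout: back to 1 MSS
    trace = []
    for _ in range(rounds):
        in_slow_start = cwnd < ssthresh
        phase = "slow start" if in_slow_start else "congestion avoidance"
        trace.append((cwnd, phase))
        if in_slow_start:
            cwnd = min(cwnd * 2, ssthresh)   # double, but stop at ssthresh
        else:
            cwnd += 1                        # linear growth past ssthresh
    return ssthresh, trace


if __name__ == "__main__":
    ssthresh, trace = after_timeout_trace(32, 7)
    print(ssthresh)                  # 16
    print([c for c, _ in trace])     # [1, 2, 4, 8, 16, 17, 18]
```

Slow start quickly recovers half of the window that was in use before the loss, and from there the sender probes linearly, since the old window is already known to have been too large.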
What is the effective send window?
TCP Variants: Reno, Cubic, BBR
Classic AIMD (TCP Reno) is not the only algorithm. Modern variants: **TCP Cubic** (Linux default), **BBR** (Google) - optimized for different scenarios.
**BBR (Bottleneck Bandwidth and RTT):** Instead of reacting to losses, BBR models the network: it measures the maximum bandwidth and the minimum RTT, and keeps the data in flight close to the BDP (bandwidth-delay product = bandwidth × RTT). It copes better with random losses (Wi-Fi).
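The BDP (bandwidth-delay product, bandwidth × RTT) that BBR aims for is simple arithmetic. The sketch below is not BBR itself: real BBR uses windowed filters and pacing, while this example only takes the max of some bandwidth samples and the min of some RTT samples. The function names and sample values are assumptions made for illustration.

```python
def bdp_bytes(bandwidth_bps, rtt_us):
    """Bandwidth-delay product in bytes.

    bandwidth_bps is in bits per second, rtt_us in microseconds.
    Integer arithmetic: bits/s * us / (8 bits per byte * 1e6 us per s).
    """
    return bandwidth_bps * rtt_us // 8_000_000


def bbr_style_bdp(delivery_rates_bps, rtts_us):
    """Estimate the BDP the way the text describes: max bandwidth, min RTT."""
    bottleneck_bw = max(delivery_rates_bps)  # best delivery rate observed
    min_rtt = min(rtts_us)                   # RTT with (nearly) empty queues
    return bdp_bytes(bottleneck_bw, min_rtt)


if __name__ == "__main__":
    # Satellite link: 100 Mbit/s, 600 ms RTT.
    print(bdp_bytes(100_000_000, 600_000))    # 7500000
    # Data center: 10 Gbit/s, 0.1 ms RTT.
    print(bdp_bytes(10_000_000_000, 100))     # 125000
    # Hypothetical measurement samples.
    rates = [80_000_000, 100_000_000, 95_000_000]
    rtts = [42_000, 40_000, 55_000]
    print(bbr_style_bdp(rates, rtts))         # 500000
```

A 7.5 MB window on the satellite path takes AIMD a very long time to rebuild after a single halving, which is why a model-based approach pays off on long fat networks.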
**Myth:** TCP always works the same way
**Reality:** There are many congestion control algorithms, each with its own advantages
Networks are different: satellite with 600 ms RTT, mobile internet with losses, data center with 0.1 ms. One algorithm is not optimal everywhere. That's why Linux lets you choose: Cubic, BBR, Reno.
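On Linux the choice can also be made per socket through the `TCP_CONGESTION` socket option, which Python exposes as `socket.TCP_CONGESTION` on that platform only. The helper names below are hypothetical, and the sketch assumes that the requested algorithm is loaded in the kernel and permitted for the calling user; otherwise the call fails and the helper returns False.

```python
import socket


def get_congestion_algorithm(sock):
    """Return the congestion control name for sock, or None if unavailable."""
    opt = getattr(socket, "TCP_CONGESTION", None)  # Linux-only constant
    if opt is None:
        return None
    try:
        raw = sock.getsockopt(socket.IPPROTO_TCP, opt, 16)
    except OSError:
        return None
    # The kernel returns a NUL-padded name such as b"cubic\x00\x00..."
    return raw.split(b"\x00", 1)[0].decode("ascii")


def try_set_congestion_algorithm(sock, name):
    """Ask the kernel to use `name` (e.g. "bbr") for sock; True on success."""
    opt = getattr(socket, "TCP_CONGESTION", None)
    if opt is None:
        return False
    try:
        sock.setsockopt(socket.IPPROTO_TCP, opt, name.encode("ascii"))
    except OSError:
        return False  # algorithm not loaded or not permitted
    return True


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print("default:", get_congestion_algorithm(s))
        if try_set_congestion_algorithm(s, "bbr"):
            print("now:", get_congestion_algorithm(s))
        else:
            print("bbr is not available on this system")
```

The system-wide default lives in the `net.ipv4.tcp_congestion_control` sysctl; the per-socket option only overrides it for one connection.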
Which congestion control algorithm is used in Linux by default?
Key Ideas
- **Congestion Control** - protecting the network from overload
- **Slow Start** - exponential cwnd growth up to the threshold
- **AIMD** - linear growth, multiplicative decrease on loss
- **cwnd** - congestion window; effective = min(cwnd, rwnd)
Related Topics
Congestion control is at the heart of TCP performance:
- TCP Flow Control: the other constraint, protecting the receiver
- QUIC: congestion control implemented in user space, on top of UDP
- Bufferbloat: when oversized buffers delay and distort congestion signals
Questions for Reflection
- Why is timeout worse than triple duplicate ACK?
- How does BBR determine bandwidth without losses?
- Why do two TCP flows with different RTTs share a channel unfairly?