Skip to content
Published on

[Computer Networking] 10. TCP Deep Dive: Connection, Flow Control, Congestion Control

Authors

This post is based on the textbook Computer Networking: A Top-Down Approach (6th Edition) by James Kurose and Keith Ross.


1. TCP Connection

1.1 Characteristics of a TCP Connection

Key properties of a TCP connection:
  +-- Connection-Oriented
  |   Connection must be established before data transfer
  +-- Full-Duplex
  |   Simultaneous bidirectional data transfer
  +-- Point-to-Point
  |   Exactly one sender and one receiver
  +-- Byte Stream Service
      Continuous byte flow without message boundaries

1.2 3-Way Handshake

The TCP connection establishment process.

Client                        Server
   |                               |
   |-- SYN (seq=client_isn) ------>|  Step 1: SYN
   |                               |
   |<-- SYN+ACK ------------------|  Step 2: SYN+ACK
   |   (seq=server_isn,            |
   |    ack=client_isn+1)          |
   |                               |
   |-- ACK (ack=server_isn+1) --->|  Step 3: ACK
   |   (may include data)          |
   |                               |

Step-by-Step Explanation

Step 1 - SYN:
  - Client sends a SYN segment
  - SYN bit = 1
  - Sets initial sequence number (client_isn)
  - No data

Step 2 - SYN+ACK:
  - Server responds with SYN+ACK segment
  - SYN bit = 1, ACK bit = 1
  - Sets server's initial sequence number (server_isn)
  - Acknowledgment number = client_isn + 1

Step 3 - ACK:
  - Client sends ACK segment
  - SYN bit = 0 (connection established)
  - Data can be included from this segment onward

1.3 TCP Connection Termination (4-Way Handshake)

Client                        Server
   |                               |
   |-- FIN ----------------------->|  Step 1
   |                               |
   |<-- ACK -----------------------|  Step 2
   |                               |
   |<-- FIN -----------------------|  Step 3
   |                               |
   |-- ACK ----------------------->|  Step 4
   |                               |
   | (TIME_WAIT: 30 sec wait)      |
   |   -> Connection fully closed   |

TIME_WAIT state: The client waits for a period to handle potential FIN retransmissions in case the last ACK is lost.


2. TCP Segment Structure

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-----------------------+-----------------------+
|   Source port (16)    |   Dest port (16)      |
+-----------------------+-----------------------+
|            Sequence number (32)               |
+-----------------------------------------------+
|         Acknowledgment number (32)            |
+--------+------+-------+-----------------------+
| Header | Unused|Flags  |   Receive window (16) |
| len(4) | (6)   |(6bits)|                       |
+--------+------+-------+-----------------------+
|    Checksum (16)      | Urgent pointer (16)   |
+-----------------------+-----------------------+
|           Options (variable length)           |
+-----------------------------------------------+
|                                               |
|           Data (variable length)              |
|                                               |
+-----------------------------------------------+

Key Fields

FieldSizeDescription
Sequence number32 bitsByte number of first data byte in segment
ACK number32 bitsNext byte number expected
Receive window16 bitsReceivable byte count (flow control)
Header length4 bitsTCP header length (in 4-byte units)
Flag bits6 bitsSYN, ACK, FIN, RST, PSH, URG

3. Sequence Numbers and Acknowledgment Numbers

3.1 Sequence Number

The byte stream number of the first byte of data in the segment.

Example: 500,000 byte file, MSS = 1,000 bytes

Segment 1: seq = 0,      data = bytes 0-999
Segment 2: seq = 1000,   data = bytes 1000-1999
Segment 3: seq = 2000,   data = bytes 2000-2999
...
Segment 500: seq = 499000, data = bytes 499000-499999

3.2 Acknowledgment Number

The next byte number the receiver expects. Uses cumulative acknowledgment.

Example: Typing 'C' in a telnet session

Host A                          Host B
   |-- seq=42, ack=79, 'C' --->|
   |                             |  Receives 'C', echo response
   |<-- seq=79, ack=43, 'C' ---|
   |                             |
   |-- seq=43, ack=80 -------->|  (ACK)

  seq=42: 42nd byte that A is sending
  ack=79: A expects byte 79 from B
  ack=43: B expects byte 43 from A (byte 42 received)

4. RTT Estimation and Timeout

4.1 SampleRTT

The actually measured RTT value. The time from segment transmission to ACK reception.

  • Retransmitted segments are excluded from measurement
  • SampleRTT is highly variable

4.2 EstimatedRTT (Exponential Weighted Moving Average)

EstimatedRTT = (1 - alpha) * EstimatedRTT + alpha * SampleRTT

  alpha = 0.125 (recommended value)

  -> More weight on recent measurements
  -> Smooths out rapid fluctuations in SampleRTT

4.3 DevRTT (RTT Deviation)

DevRTT = (1 - beta) * DevRTT + beta * |SampleRTT - EstimatedRTT|

  beta = 0.25 (recommended value)

4.4 Timeout Interval

TimeoutInterval = EstimatedRTT + 4 * DevRTT

  EstimatedRTT: Average RTT estimate
  4 * DevRTT: Safety margin (accounting for variability)
RTT estimation example (changes over time):

RTT
(ms)
 ^
 |  *    *
 |   *  * *  *         SampleRTT (highly variable)
 |    **    * *  *
 |  ___________*____   EstimatedRTT (stable)
 |                     TimeoutInterval (upper bound)
 +---------------------> Time

5. TCP Reliable Data Transfer

TCP implements reliable transfer on top of IP's unreliable service.

5.1 Key Events for the TCP Sender

Event 1: Data received from upper layer
  -> Create segment, assign sequence number
  -> Pass to IP
  -> Start timer if not already running

Event 2: Timer expires
  -> Retransmit the oldest unacknowledged segment
  -> Restart timer

Event 3: ACK received
  -> Update SendBase (cumulative acknowledgment)
  -> Restart timer if there are still unacknowledged segments

5.2 TCP Retransmission Scenarios

Scenario 1: ACK Loss

Host A                    Host B
  |-- seq=92, 8 bytes --->|
  |                        |-- ACK=100 (lost!)
  | Timer expires          |
  |-- seq=92, 8 bytes --->|  (retransmit)
  |<-- ACK=100 ---------- |

Scenario 2: Premature Timeout

Host A                    Host B
  |-- seq=92, 8 bytes --->|
  |-- seq=100, 20 bytes ->|
  |                        |-- ACK=100
  | Timer expires!         |-- ACK=120
  | (ACK=100 not yet arrived)
  |-- seq=92, 8 bytes --->|  (unnecessary retransmit)
  |<-- ACK=100 ---------- |
  |<-- ACK=120 ---------- |  (cumulative ACK resolves naturally)

Scenario 3: Cumulative ACK

Host A                    Host B
  |-- seq=92, 8 bytes --->|
  |-- seq=100, 20 bytes ->|
  |                        |-- ACK=100 (lost!)
  |                        |-- ACK=120 (arrives)
  |<-- ACK=120 ---------- |

  ACK=120 confirms all bytes before 120
  -> seq=92 packet is also confirmed!
  -> No retransmission needed

5.3 Fast Retransmit

Instead of waiting for a timeout, retransmit immediately upon receiving 3 duplicate ACKs.

Host A                    Host B
  |-- seq=92 ------------>|  ACK=100
  |-- seq=100 ----------->|  (lost!)
  |-- seq=120 ----------->|  ACK=100 (dup 1)
  |-- seq=140 ----------->|  ACK=100 (dup 2)
  |-- seq=160 ----------->|  ACK=100 (dup 3)
  |                        |
  | 3 duplicate ACKs received!
  | Retransmit immediately before timer expires
  |-- seq=100 ----------->|  ACK=180 (cumulative)

3 duplicate ACKs = the original ACK + 3 additional = 4 total identical ACKs


6. Flow Control

6.1 The Problem

If the sender sends data too quickly, the receiver's buffer can overflow.

Receiver buffer:
  +------------------------------+
  |[Data][Data][  Free space     ]|
  +------------------------------+
  |<-- In use -->|<--- rwnd ---->|
                   Receive window

6.2 Receive Window (rwnd)

TCP uses the receive window (rwnd) for flow control.

rwnd = RcvBuffer - (LastByteRcvd - LastByteRead)

  RcvBuffer: Total size of the receive buffer
  LastByteRcvd: Last byte number received
  LastByteRead: Last byte number read by the upper layer

6.3 How It Works

Receiver: Includes rwnd value in ACK segments

  ACK segment:
  +--------+--------+
  | ACK=N  | rwnd=X |  "Received up to byte N,
  +--------+--------+   can accept X more bytes"

Sender: Limits transmission so as not to exceed rwnd

  LastByteSent - LastByteAcked <= rwnd

  As rwnd decreases -> transmission rate decreases
  When rwnd = 0 -> stop transmission (but send 1-byte probe packets)

The Problem When rwnd = 0

Problem: After receiver sends rwnd=0, even if it clears its buffer,
         there is no way to notify the sender (no data to send)

Solution: Sender periodically sends 1-byte probe segments
  -> Receiver responds with ACK containing current rwnd value
  -> When free space appears, transmission resumes

7. TCP Congestion Control

7.1 What Is Congestion

Congestion:
  A state where data volume in the network exceeds network capacity

Symptoms:
  +-- Packet loss (router buffer overflow)
  +-- Long delays (waiting in router queues)
  +-- Unnecessary retransmissions (due to timeouts)

7.2 Congestion Control vs Flow Control

Flow control:   Protects the receiver (prevents receive buffer overflow)
                -> Controlled by rwnd

Congestion control: Protects the network (prevents network overload)
                    -> Controlled by cwnd (congestion window)

Actual transmission rate:
  LastByteSent - LastByteAcked <= min(cwnd, rwnd)

7.3 AIMD (Additive Increase Multiplicative Decrease)

The fundamental philosophy of TCP congestion control.

cwnd adjustment principle:

  No packet loss (ACKs received normally):
    -> Increase cwnd (probing for bandwidth)

  Packet loss (timeout or 3 duplicate ACKs):
    -> Decrease cwnd (alleviating congestion)

  "Sawtooth" pattern:

  cwnd
    ^
    |    /\      /\      /\
    |   /  \    /  \    /  \
    |  /    \  /    \  /    \
    | /      \/      \/      \
    +---------------------------> Time

7.4 Slow Start

At the beginning of a connection, cwnd is increased exponentially.

Initial cwnd = 1 MSS

On each ACK received: cwnd += 1 MSS
(If all ACKs are received in one RTT, cwnd doubles)

cwnd progression:
  RTT 1: cwnd = 1 MSS  -> 1 segment sent
  RTT 2: cwnd = 2 MSS  -> 2 segments sent
  RTT 3: cwnd = 4 MSS  -> 4 segments sent
  RTT 4: cwnd = 8 MSS  -> 8 segments sent
  ...

  Exponential growth: 1, 2, 4, 8, 16, 32, ...

Slow Start Termination Conditions

1. cwnd >= ssthresh (slow start threshold) -> Switch to congestion avoidance
2. Timeout -> ssthresh = cwnd/2, cwnd = 1 MSS, restart slow start
3. 3 duplicate ACKs -> ssthresh = cwnd/2, cwnd = ssthresh + 3, fast recovery

7.5 Congestion Avoidance

After cwnd reaches ssthresh, increase linearly.

Each RTT: cwnd += 1 MSS
(Per ACK: cwnd += MSS * MSS / cwnd)

cwnd progression (in MSS units):
  RTT n:   cwnd = 10
  RTT n+1: cwnd = 11
  RTT n+2: cwnd = 12
  ...

  Linear increase: 10, 11, 12, 13, ...

Behavior on Congestion Events

Timeout:
  ssthresh = cwnd / 2
  cwnd = 1 MSS
  -> Return to slow start

3 duplicate ACKs:
  ssthresh = cwnd / 2
  cwnd = ssthresh + 3 MSS
  -> Enter fast recovery

7.6 Fast Recovery

The state entered upon receiving 3 duplicate ACKs. Introduced in TCP Reno.

Fast recovery operation:
  1. On each duplicate ACK received: cwnd += 1 MSS
  2. On new ACK received (loss recovery complete):
     cwnd = ssthresh
     -> Switch to congestion avoidance
  3. On timeout:
     ssthresh = cwnd / 2
     cwnd = 1 MSS
     -> Return to slow start

7.7 TCP Congestion Control State Diagram

                    +---------------+
       Start ------>| Slow Start    |
                    | cwnd exp.     |
                    | increase      |
                    +------+--------+
                           |
                cwnd >= ssthresh
                           |
                    +------v--------+
              +---->| Congestion    |<----+
              |     | Avoidance     |     |
              |     | cwnd linear   |     |
              |     | increase      |     |
              |     +------+--------+     |
              |            |              |
         New ACK      3 dup ACKs     Timeout
         (recovery        |              |
          complete)       |              |
              |     +------v--------+     |
              |     | Fast Recovery |     |
              +-----| cwnd adjust   |     |
                    +---------------+     |
                           |              |
                      Timeout             |
                           +--------------+
                           ssthresh=cwnd/2
                           cwnd=1 MSS
                           -> Slow Start

7.8 TCP Tahoe vs TCP Reno

cwnd
  ^
  |           *
  |          * *
  |         *   *        TCP Reno
  |        *     *      /
  |       *       *    *
  |      *         *  * *
  |     *           **   *
  |    *                  *
  |   *
  |  * TCP Tahoe: Always resets cwnd to 1
  | *
  +-------------------------------> Time
       ^         ^
     Loss 1    Loss 2
EventTCP TahoeTCP Reno
Timeoutcwnd = 1 MSScwnd = 1 MSS
3 dup ACKscwnd = 1 MSScwnd = cwnd/2 (fast recovery)

8. TCP Fairness

8.1 Definition of Fairness

When K TCP connections share a bottleneck link of capacity R, ideally each connection should get R/K throughput.

8.2 Why TCP Is Fair

Two connections sharing bandwidth R:

Throughput2
    ^
    |  \  Fair point
    |   \ (R/2, R/2)
    |    \   *
R   |     \ / \
    |      *   *
    |     / \   \
    |    /   *   *
    |   /   / \   \
    |  /   /   *   *  -> Converges to fair point
    +--------------------> Throughput1
    0                 R

  AIMD convergence process:
  1. Both connections increase cwnd equally (Additive Increase)
     -> Move at 45 degrees (toward the total=R line)
  2. On loss, each halves cwnd (Multiplicative Decrease)
     -> Move toward origin
  3. Repeating converges to the fair point (R/2, R/2)

8.3 Practical Limitations of Fairness

UDP does not perform congestion control:
  -> UDP takes bandwidth that TCP yields
  -> Unfair to TCP

Parallel TCP connections:
  -> An application with multiple TCP connections
  -> Gets more bandwidth than single-connection applications
  -> Example: Browsers downloading web objects over multiple connections

9. Summary

TCP key mechanisms summary:

  Connection management:
  +-- 3-way handshake (SYN -> SYN+ACK -> ACK)
  +-- 4-way handshake termination (FIN -> ACK -> FIN -> ACK)

  Reliable transfer:
  +-- Sequence numbers + acknowledgment numbers (cumulative ACK)
  +-- Timer-based retransmission
  +-- Fast retransmit (3 duplicate ACKs)
  +-- RTT estimation (EWMA)

  Flow control:
  +-- rwnd (receive window) protects the receiver

  Congestion control:
  +-- Slow start: Exponential increase
  +-- Congestion avoidance: Linear increase
  +-- Fast recovery: Halve cwnd on 3 dup ACKs
  +-- AIMD -> Fair bandwidth sharing

10. Review Questions

Q1. Why are 3 message exchanges needed in the 3-way handshake?

Step 1 (SYN): The client requests a connection and communicates its initial sequence number. Step 2 (SYN+ACK): The server accepts the connection, communicates its initial sequence number, and acknowledges the client's sequence number. Step 3 (ACK): The client acknowledges the server's sequence number.

Through this process, both sides confirm each other's initial sequence numbers and intent to connect. With only 2 steps, the server would not know whether the client received the SYN+ACK.

Q2. What is the difference between flow control and congestion control?
  • Flow control: Protects the receiver. Uses rwnd (receive window) to limit the sender's transmission rate so the receive buffer does not overflow.
  • Congestion control: Protects the network. Uses cwnd (congestion window) to limit the sender's transmission rate so the network does not become overloaded.

The actual transmission rate is determined by min(cwnd, rwnd).

Q3. In slow start, cwnd increases exponentially -- so why is it called "slow" start?

The name comes from the fact that the initial cwnd starts very small at 1 MSS. Protocols before TCP would immediately send data at the maximum allowed window size at the start of a connection, which caused network congestion. Slow start is "slow" in comparison because it starts from 1 MSS. Of course, since the growth is exponential, it actually increases rapidly.

Q4. Why does TCP Reno treat timeouts and 3 duplicate ACKs differently?

Three duplicate ACKs indicate that only a specific segment was lost, but segments after it are arriving. Since the network is still delivering some packets, the congestion is judged to be not severe, so cwnd is only halved (fast recovery).

In contrast, a timeout means no ACK is arriving at all, suggesting severe network congestion. Therefore, cwnd is reset to 1 MSS and slow start begins again from scratch.