The Complete Guide to TCP/IP Connection Debugging in Production
- Overview
- 1. Understanding the TCP 3-Way Handshake
- 2. Common TCP Connection Issues
- 3. Debugging Tools
- 4. TCP Tuning Parameters
- 5. Connection Tracking (conntrack)
- 6. Real-World Debugging Scenarios
- 7. Automated Monitoring Scripts
- 8. Summary: TCP Debugging Checklist
Overview
When running servers or developing backend systems, you will inevitably encounter problems like "it won't connect," "I'm getting timeouts," or "the connection drops intermittently." Most of these issues occur at the TCP/IP level, and diagnosing the root cause requires a solid understanding of how TCP works and how to use the right debugging tools.
This post walks through TCP connection fundamentals to practical debugging techniques you can apply immediately in production.
1. Understanding the TCP 3-Way Handshake
1.1 Connection Establishment
TCP connections are established through a 3-way handshake.
Client Server
| |
|--- SYN (seq=x) ------->| (1) Client sends SYN
| |
|<-- SYN-ACK (seq=y, | (2) Server responds with SYN-ACK
| ack=x+1) ----------|
| |
|--- ACK (ack=y+1) ----->| (3) Client sends ACK
| |
|== Connection Established |
Problems can occur at each step, and the failure point indicates different root causes.
1.2 TCP Connection States
TCP sockets transition through various states. Understanding these states is essential for debugging.
LISTEN - Server waiting for incoming connections
SYN_SENT - Client has sent SYN, waiting for SYN-ACK
SYN_RECV - Server received SYN, sent SYN-ACK, waiting for ACK
ESTABLISHED - Connection fully established
FIN_WAIT_1 - Active close initiated (FIN sent)
FIN_WAIT_2 - Received ACK for FIN, waiting for peer's FIN
CLOSE_WAIT - Passive close (received peer's FIN)
TIME_WAIT - Waiting after close (2x MSL)
LAST_ACK - Waiting for final ACK
CLOSING - Both sides closing simultaneously
CLOSED - Connection terminated
1.3 Connection Termination (4-Way Handshake)
Client Server
| |
|--- FIN --------------->| (1) Client requests close
|<-- ACK ----------------| (2) Server acknowledges
| |
|<-- FIN ----------------| (3) Server requests close
|--- ACK --------------->| (4) Client acknowledges
| |
|-- TIME_WAIT (2MSL) ----|
2. Common TCP Connection Issues
2.1 Connection Refused
The server responds with an RST packet when no process is listening on the target port.
# Observe the symptom
$ curl -v http://192.168.1.100:8080
* Trying 192.168.1.100:8080...
* connect to 192.168.1.100 port 8080 failed: Connection refused
# Check if the port is being listened on
$ ss -tlnp | grep 8080
# (no output = no process listening)
# Check firewall rules
$ iptables -L -n | grep 8080
$ firewall-cmd --list-ports
Common causes:
- The service is not running
- The service is bound to a different port
- The service is bound only to localhost (127.0.0.1)
- A firewall is rejecting the connection (iptables REJECT answers immediately with an RST or ICMP error, unlike DROP, which produces a timeout)
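The refused-vs-timeout distinction above can be probed from a plain shell. Below is a minimal sketch (not a hardened tool) using bash's built-in /dev/tcp pseudo-device and the coreutils timeout command; the host and port in the last line are placeholders for illustration.

```shell
#!/bin/bash
# probe_port.sh - distinguish "refused" (peer answered with RST) from
# "timeout" (SYN silently dropped). Sketch only; host/port are examples.
probe_port() {
    local host=$1 port=$2 limit=${3:-3}
    if timeout "$limit" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
        echo "open"
    elif [ $? -eq 124 ]; then
        # timeout(1) exits 124 when the limit was hit: the SYN was dropped
        echo "timeout"
    else
        # immediate failure: the peer actively refused the connection
        echo "refused"
    fi
}

probe_port 127.0.0.1 47 2   # a port with no listener typically prints "refused"
```

An exit-code-based probe like this is handy in health checks where nc or curl may not be installed.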
2.2 Connection Timeout
Occurs when a SYN packet is sent but no response is received.
# Symptom
$ curl --connect-timeout 5 http://10.0.0.50:3306
curl: (28) Connection timed out after 5001 milliseconds
# Check SYN retransmission statistics
$ netstat -s | grep -i retrans
12345 segments retransmitted
678 SYNs to LISTEN sockets dropped
# View kernel SYN retry count
$ cat /proc/sys/net/ipv4/tcp_syn_retries
6
# Retransmission intervals double: 1s -> 2s -> 4s -> 8s -> 16s -> 32s;
# after the sixth retry the kernel waits one final doubled interval,
# so connect() gives up after roughly 127 seconds in total
Common causes:
- Routing issues (packets going to the wrong destination)
- Firewall dropping SYN packets silently (DROP instead of REJECT)
- Server overloaded and unable to respond
- Intermediate network equipment failure
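The backoff schedule above is easy to compute for any tcp_syn_retries value. A small sketch, assuming the initial RTO is 1 second and doubles on each retransmission (the usual Linux behavior):

```shell
#!/bin/bash
# syn_timeout.sh - estimate how long connect() keeps trying before giving
# up, for a given tcp_syn_retries. Assumes initial RTO = 1s, doubling each
# time; real kernels can deviate slightly under RTT feedback.
syn_give_up_time() {
    local retries=$1 rto=1 total=0 i
    for ((i = 0; i <= retries; i++)); do
        total=$((total + rto))   # wait one RTO, then retransmit (or give up)
        rto=$((rto * 2))
    done
    echo "$total"
}

syn_give_up_time 6   # default tcp_syn_retries=6 -> 127 seconds
syn_give_up_time 5   # -> 63 seconds
```

Lowering tcp_syn_retries (e.g. to 3) is a common way to fail fast when upstreams are unreachable.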
2.3 SYN Flood Attacks
# Abnormally high SYN_RECV count
$ ss -s
Total: 15234
TCP: 12890 (estab 234, closed 45, orphaned 12, timewait 567)
$ ss -tn state syn-recv | wc -l
8923 # Abnormally high
# Check if SYN cookies are enabled
$ cat /proc/sys/net/ipv4/tcp_syncookies
1
# Check SYN backlog size
$ cat /proc/sys/net/ipv4/tcp_max_syn_backlog
1024
# SYN flood defense configuration
$ sysctl -w net.ipv4.tcp_syncookies=1
$ sysctl -w net.ipv4.tcp_max_syn_backlog=4096
$ sysctl -w net.ipv4.tcp_synack_retries=2
2.4 RST (Reset) Packet Analysis
RST packets are sent in various situations.
# Capture RST packets with tcpdump
$ tcpdump -i eth0 'tcp[tcpflags] & (tcp-rst) != 0' -nn
# Example output
14:23:01.123456 IP 192.168.1.10.45678 > 10.0.0.5.80: Flags [R], seq 0, ack 1234567, win 0, length 0
14:23:01.234567 IP 10.0.0.5.80 > 192.168.1.10.45678: Flags [R.], seq 0, ack 7654321, win 0, length 0
Causes of RST:
- Connection attempt to a closed port
- Application forced close (SO_LINGER configuration)
- TCP keepalive failure
- Firewall connection tracking table overflow
2.5 Excessive TIME_WAIT
# Count TIME_WAIT sockets
$ ss -s
TCP: 8934 (estab 234, closed 45, orphaned 12, timewait 8123)
# View TIME_WAIT socket details
$ ss -tn state time-wait | head -20
Recv-Q Send-Q Local Address:Port Peer Address:Port
0 0 192.168.1.10:45678 10.0.0.5:80
0 0 192.168.1.10:45679 10.0.0.5:80
...
# Enable TIME_WAIT socket reuse (applies to outgoing connections only
# and requires TCP timestamps to be enabled)
$ sysctl -w net.ipv4.tcp_tw_reuse=1
# Note: tcp_fin_timeout controls the FIN_WAIT_2 timeout, not TIME_WAIT;
# the TIME_WAIT period itself is hard-coded to 60 seconds (TCP_TIMEWAIT_LEN)
$ cat /proc/sys/net/ipv4/tcp_fin_timeout
60
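TIME_WAIT mostly hurts clients that open many short-lived connections to a single destination: each closed connection parks a local ephemeral port for about 60 seconds. A rough budget sketch, using the common default port range 32768-60999 (the numbers are illustrative):

```shell
#!/bin/bash
# tw_budget.sh - rough sustainable new-connection rate to ONE destination
# before ephemeral ports are exhausted by TIME_WAIT. Integer estimate only.
tw_budget() {
    local lo=$1 hi=$2 tw_secs=${3:-60}
    local ports=$((hi - lo + 1))
    echo $((ports / tw_secs))   # max new connections per second, sustained
}

tw_budget 32768 60999   # default ip_local_port_range -> ~470 conn/s
tw_budget 1024 65535    # widened range -> ~1075 conn/s
```

If your client exceeds this rate, widen the port range, enable tcp_tw_reuse, or (better) use connection pooling.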
3. Debugging Tools
3.1 ss (Socket Statistics)
ss is the modern replacement for netstat, offering faster output and more detailed information.
# List all TCP listening sockets with process info
$ ss -tlnp
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0 128 0.0.0.0:22 0.0.0.0:* users:(("sshd",pid=1234,fd=3))
LISTEN 0 511 0.0.0.0:80 0.0.0.0:* users:(("nginx",pid=5678,fd=6))
LISTEN 0 128 0.0.0.0:443 0.0.0.0:* users:(("nginx",pid=5678,fd=7))
# Filter sockets by state
$ ss -tn state established '( dport = :443 )'
Recv-Q Send-Q Local Address:Port Peer Address:Port
0 0 192.168.1.10:45678 10.0.0.5:443
0 0 192.168.1.10:45679 10.0.0.5:443
# Include timer information (retransmission, keepalive, etc.)
$ ss -tnio
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 0 192.168.1.10:22 10.0.0.1:54321
cubic wscale:7,7 rto:204 rtt:1.5/0.75 ato:40 mss:1448 pmtu:1500
rcvmss:1448 advmss:1448 cwnd:10 ssthresh:20 bytes_sent:12345
bytes_acked:12345 bytes_received:6789 segs_out:100 segs_in:80
# Recv-Q/Send-Q meaning for LISTEN sockets:
# Recv-Q: fully established connections currently waiting in the accept queue
# Send-Q: maximum length of the accept queue (the listen() backlog)
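When scripting around ss, note that the header line silently inflates every `| wc -l` count; the `-H` (`--no-header`) flag avoids that. A self-contained sketch that groups connections by state, fed by a fabricated sample instead of live output:

```shell
#!/bin/bash
# state_count.sh - count sockets per TCP state from `ss -Htan`-style
# output. -H matters: without it, every count is off by one.
count_states() {
    awk '{print $1}' | sort | uniq -c | sort -rn
}

# The here-doc stands in for live output so this sketch is self-contained.
count_states <<'EOF'
ESTAB  0 0 192.168.1.10:22   10.0.0.1:54321
ESTAB  0 0 192.168.1.10:443  10.0.0.2:40000
TIME-WAIT 0 0 192.168.1.10:45678 10.0.0.5:80
EOF

# On a live host:  ss -Htan | count_states
```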
3.2 netstat (Legacy but Still Useful)
# Count connections by TCP state
$ netstat -ant | awk '{print $6}' | sort | uniq -c | sort -rn
234 ESTABLISHED
123 TIME_WAIT
45 CLOSE_WAIT
12 LISTEN
5 SYN_SENT
3 FIN_WAIT2
1 SYN_RECV
# Check connections on a specific port
$ netstat -antp | grep :3306
tcp 0 0 192.168.1.10:3306 192.168.1.20:45678 ESTABLISHED 1234/mysqld
tcp 0 0 192.168.1.10:3306 192.168.1.21:34567 ESTABLISHED 1234/mysqld
# Network statistics summary
$ netstat -s | head -40
Ip:
123456 total packets received
0 forwarded
0 incoming packets discarded
123456 incoming packets delivered
654321 requests sent out
Tcp:
12345 active connection openings
6789 passive connection openings
123 failed connection attempts
45 connection resets received
234 connections established
567890 segments received
678901 segments sent out
1234 segments retransmitted
3.3 tcpdump Packet Capture
# Basic packet capture (specific host and port)
$ tcpdump -i eth0 host 10.0.0.5 and port 80 -nn
# Capture only 3-way handshake packets
$ tcpdump -i eth0 'tcp[tcpflags] & (tcp-syn|tcp-ack) != 0' -nn
# Capture only SYN packets (SYN without ACK)
$ tcpdump -i eth0 'tcp[tcpflags] & (tcp-syn) != 0 and tcp[tcpflags] & (tcp-ack) == 0' -nn
# Save to file for Wireshark analysis
$ tcpdump -i eth0 -w /tmp/capture.pcap -c 10000 host 10.0.0.5
# Read a saved capture file
$ tcpdump -r /tmp/capture.pcap -nn
# Include packet contents (hex + ASCII)
$ tcpdump -i eth0 -X port 80 -c 5
# Example output (3-way handshake)
14:30:01.001 IP 192.168.1.10.45678 > 10.0.0.5.80: Flags [S], seq 1234567, win 64240, options [mss 1460,sackOK,TS val 123 ecr 0,nop,wscale 7], length 0
14:30:01.002 IP 10.0.0.5.80 > 192.168.1.10.45678: Flags [S.], seq 7654321, ack 1234568, win 65160, options [mss 1460,sackOK,TS val 456 ecr 123,nop,wscale 7], length 0
14:30:01.002 IP 192.168.1.10.45678 > 10.0.0.5.80: Flags [.], ack 7654322, win 502, options [nop,nop,TS val 123 ecr 456], length 0
# Capture HTTP request/response content
$ tcpdump -i eth0 -A -s 0 'tcp port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)' -c 20
3.4 Wireshark Display Filters
Useful filters when analyzing pcap files captured by tcpdump in Wireshark.
# Basic display filters
tcp.port == 80 # Specific port
ip.addr == 192.168.1.10 # Specific IP
tcp.flags.syn == 1 && tcp.flags.ack == 0 # SYN only
tcp.flags.reset == 1 # RST packets
tcp.analysis.retransmission # Retransmitted packets
tcp.analysis.duplicate_ack # Duplicate ACKs
tcp.analysis.zero_window # Zero window
tcp.analysis.window_full # Window full
# Follow a specific TCP stream
tcp.stream eq 5
# Time-based filtering
frame.time >= "2026-03-08 14:00:00" && frame.time <= "2026-03-08 14:30:00"
# Find slow responses
tcp.time_delta > 1 # Packets delayed more than 1 second
4. TCP Tuning Parameters
4.1 Backlog Configuration
# somaxconn: upper bound on the listen() accept queue
# (fully established connections waiting for accept())
$ cat /proc/sys/net/core/somaxconn
4096
# SYN backlog (half-open connection queue)
$ cat /proc/sys/net/ipv4/tcp_max_syn_backlog
1024
# Recommended settings for high-traffic servers
$ sysctl -w net.core.somaxconn=65535
$ sysctl -w net.ipv4.tcp_max_syn_backlog=65535
# The application must also set the backlog
# Nginx example:
# listen 80 backlog=65535;
4.2 Keepalive Configuration
# View current settings
$ cat /proc/sys/net/ipv4/tcp_keepalive_time
7200 # Default: 2 hours (idle time before first probe)
$ cat /proc/sys/net/ipv4/tcp_keepalive_intvl
75 # Interval between probes (seconds)
$ cat /proc/sys/net/ipv4/tcp_keepalive_probes
9 # Maximum number of probes
# Configuration for faster dead connection detection
$ sysctl -w net.ipv4.tcp_keepalive_time=600 # 10 minutes
$ sysctl -w net.ipv4.tcp_keepalive_intvl=30 # 30 seconds
$ sysctl -w net.ipv4.tcp_keepalive_probes=5 # 5 probes
# Total detection time: 600 + (30 * 5) = 750 seconds (12.5 minutes)
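The detection-time arithmetic above generalizes to any keepalive settings; a one-function sketch:

```shell
#!/bin/bash
# keepalive_budget.sh - worst-case time to detect a dead peer from the
# three keepalive sysctls discussed above.
keepalive_detect_secs() {
    local time=$1 intvl=$2 probes=$3
    echo $((time + intvl * probes))
}

keepalive_detect_secs 600 30 5    # tuned values above -> 750 seconds
keepalive_detect_secs 7200 75 9   # kernel defaults -> 7875 seconds (~2.2 hours)
```

The defaults make the case for tuning: with stock settings a silently dead peer can hold a connection open for over two hours.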
4.3 Window Scaling
# Check if window scaling is enabled (default: 1)
$ cat /proc/sys/net/ipv4/tcp_window_scaling
1
# TCP buffer sizes (min, default, max)
$ cat /proc/sys/net/ipv4/tcp_rmem
4096 131072 6291456
$ cat /proc/sys/net/ipv4/tcp_wmem
4096 16384 4194304
# Optimization for high-bandwidth environments
$ sysctl -w net.ipv4.tcp_rmem="4096 262144 16777216"
$ sysctl -w net.ipv4.tcp_wmem="4096 262144 16777216"
$ sysctl -w net.core.rmem_max=16777216
$ sysctl -w net.core.wmem_max=16777216
# BDP (Bandwidth-Delay Product) calculation
# For 1 Gbps bandwidth, 10ms RTT:
# BDP = 1,000,000,000 * 0.01 / 8 = 1,250,000 bytes (~1.2 MB)
# Set tcp_rmem/tcp_wmem max values to at least the BDP
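The BDP arithmetic can be scripted for other links; a sketch using integer arithmetic, with bandwidth in Mbit/s and RTT in milliseconds:

```shell
#!/bin/bash
# bdp.sh - bandwidth-delay product: the buffer needed to keep the pipe
# full. bytes = (mbps * 1e6 / 8) * (rtt_ms / 1000), done in integer math.
bdp_bytes() {
    local mbps=$1 rtt_ms=$2
    echo $((mbps * 1000000 / 8 * rtt_ms / 1000))
}

bdp_bytes 1000 10   # 1 Gbps, 10 ms RTT -> 1250000 bytes (~1.2 MB)
bdp_bytes 100 50    # 100 Mbps, 50 ms RTT (e.g. cross-region) -> 625000 bytes
```

Compare the result against the third (max) field of tcp_rmem/tcp_wmem; if the max is below the BDP, the window can never fill the link.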
4.4 Other Important Parameters
# Ephemeral port range
$ cat /proc/sys/net/ipv4/ip_local_port_range
32768 60999
# Expand port range
$ sysctl -w net.ipv4.ip_local_port_range="1024 65535"
# FIN timeout (how long an orphaned connection may stay in FIN_WAIT_2)
$ sysctl -w net.ipv4.tcp_fin_timeout=30
# Enable TCP Fast Open
$ sysctl -w net.ipv4.tcp_fastopen=3
# Maximum file descriptors (sockets are file descriptors too)
$ cat /proc/sys/fs/file-max
1048576
$ ulimit -n
65535
5. Connection Tracking (conntrack)
5.1 conntrack Basics
Linux netfilter tracks all connections. This is used for stateful filtering in iptables/nftables rules.
# View the conntrack table
$ conntrack -L | head -10
tcp 6 431999 ESTABLISHED src=192.168.1.10 dst=10.0.0.5 sport=45678 dport=80 src=10.0.0.5 dst=192.168.1.10 sport=80 dport=45678 [ASSURED] mark=0 use=1
tcp 6 116 SYN_SENT src=192.168.1.10 dst=10.0.0.50 sport=34567 dport=3306 [UNREPLIED] src=10.0.0.50 dst=192.168.1.10 sport=3306 dport=34567 mark=0 use=1
# Count conntrack entries
$ conntrack -C
45678
# Check conntrack table maximum
$ cat /proc/sys/net/nf_conntrack_max
262144
# When the conntrack table is full, new connections are DROPPED
# Check dmesg for evidence
$ dmesg | grep conntrack
[12345.678] nf_conntrack: table full, dropping packet
5.2 conntrack Optimization
# Increase conntrack maximum
$ sysctl -w net.nf_conntrack_max=1048576
# conntrack hash table size
$ cat /proc/sys/net/netfilter/nf_conntrack_buckets
65536
# Timeout settings
$ cat /proc/sys/net/netfilter/nf_conntrack_tcp_timeout_established
432000 # 5 days (default)
# Use shorter timeouts
$ sysctl -w net.netfilter.nf_conntrack_tcp_timeout_established=86400 # 1 day
$ sysctl -w net.netfilter.nf_conntrack_tcp_timeout_time_wait=30
$ sysctl -w net.netfilter.nf_conntrack_tcp_timeout_close_wait=60
# Delete specific connections
$ conntrack -D -s 192.168.1.100
$ conntrack -D -p tcp --dport 80
6. Real-World Debugging Scenarios
6.1 Scenario: Intermittent Web Server Timeouts
# Step 1: Check current connection state overview
$ ss -s
Total: 15234
TCP: 12890 (estab 8234, closed 45, orphaned 12, timewait 4567)
# Step 2: Check for CLOSE_WAIT accumulation (suspected socket leak)
$ ss -tn state close-wait | wc -l
2345 # Far too many! Application is not closing sockets properly
# Step 3: Identify which processes hold CLOSE_WAIT sockets
$ ss -tnp state close-wait | awk '{print $NF}' | sort | uniq -c | sort -rn
2100 users:(("java",pid=5678,fd=...))
200 users:(("python",pid=1234,fd=...))
# Step 4: Check file descriptor count for the offending process
$ ls -la /proc/5678/fd | wc -l
4567 # Confirms file descriptor leak
# Step 5: Detailed check with lsof
$ lsof -p 5678 | grep TCP | grep CLOSE_WAIT | wc -l
2100
6.2 Scenario: Microservice-to-Microservice Connection Failure
# Step 1: Verify DNS resolution
$ dig +short service-b.internal
10.0.0.50
# Step 2: Test port connectivity
$ nc -zv -w 3 10.0.0.50 8080
Connection to 10.0.0.50 8080 port [tcp/*] succeeded!
# Or on failure:
$ nc -zv -w 3 10.0.0.50 8080
nc: connect to 10.0.0.50 port 8080 (tcp) failed: Connection timed out
# Step 3: Trace the network path
$ traceroute -T -p 8080 10.0.0.50
traceroute to 10.0.0.50, 30 hops max
1 gateway (10.0.0.1) 0.5 ms 0.4 ms 0.3 ms
2 * * *
3 10.0.0.50 1.2 ms 1.1 ms 1.0 ms
# Step 4: Check for MTU issues
$ ping -M do -s 1472 10.0.0.50
PING 10.0.0.50 (10.0.0.50) 1472(1500) bytes of data.
ping: local error: message too long, mtu=1400
# Adjust MTU accordingly
$ ip link set dev eth0 mtu 1400
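The 1472 in the ping above is not arbitrary: it is the MTU minus the 20-byte IPv4 header and the 8-byte ICMP header. A tiny helper to compute the right probe payload for any MTU:

```shell
#!/bin/bash
# mtu_payload.sh - largest ICMP payload that fits a given MTU without
# fragmentation: MTU - 20 (IPv4 header) - 8 (ICMP header).
max_ping_payload() {
    echo $(( $1 - 20 - 8 ))
}

max_ping_payload 1500   # -> 1472, the -s value used in the example above
max_ping_payload 1400   # -> 1372, the size to retest after lowering the MTU
```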
6.3 Scenario: High-Traffic Server Tuning
# Complete tuning configuration
cat << 'EOF' > /etc/sysctl.d/99-tcp-tuning.conf
# Backlog settings
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.core.netdev_max_backlog = 65535
# TIME_WAIT optimization
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 30
# Keepalive
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 5
# Buffer sizes
net.ipv4.tcp_rmem = 4096 262144 16777216
net.ipv4.tcp_wmem = 4096 262144 16777216
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
# Port range
net.ipv4.ip_local_port_range = 1024 65535
# SYN flood defense
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_synack_retries = 2
# conntrack
net.nf_conntrack_max = 1048576
net.netfilter.nf_conntrack_tcp_timeout_established = 86400
# TCP Fast Open
net.ipv4.tcp_fastopen = 3
# Window Scaling
net.ipv4.tcp_window_scaling = 1
EOF
# Apply the configuration
$ sysctl -p /etc/sysctl.d/99-tcp-tuning.conf
6.4 Scenario: Analyzing Slow Responses with tcpdump
# Capture traffic to a specific service for 30 seconds
$ timeout 30 tcpdump -i eth0 -w /tmp/slow.pcap host 10.0.0.5 and port 443
# Analyze capture - check for retransmissions
$ tcpdump -r /tmp/slow.pcap 'tcp[tcpflags] & (tcp-syn) != 0' -nn
# SYN retransmissions indicate initial connection problems
# Analyze capture - examine data flow timing
$ tcpdump -r /tmp/slow.pcap -ttt -nn | head -30
# -ttt shows inter-packet time deltas
00:00:00.000000 IP 192.168.1.10.45678 > 10.0.0.5.443: Flags [S], seq 123
00:00:00.001234 IP 10.0.0.5.443 > 192.168.1.10.45678: Flags [S.], seq 456, ack 124
00:00:00.000100 IP 192.168.1.10.45678 > 10.0.0.5.443: Flags [.], ack 457
00:00:00.000500 IP 192.168.1.10.45678 > 10.0.0.5.443: Flags [P.], seq 124:500
00:00:02.345678 IP 10.0.0.5.443 > 192.168.1.10.45678: Flags [P.], seq 457:1200
# The 2+ second gap above indicates server-side processing delay
# Detailed analysis with tshark (CLI Wireshark)
$ tshark -r /tmp/slow.pcap -q -z io,stat,1
# Statistics at 1-second intervals
$ tshark -r /tmp/slow.pcap -q -z conv,tcp
# Per-TCP-conversation statistics
7. Automated Monitoring Scripts
7.1 TCP State Monitor
#!/bin/bash
# tcp_monitor.sh - Monitor TCP connection states
LOG_FILE="/var/log/tcp_monitor.log"
ALERT_THRESHOLD_TW=5000 # TIME_WAIT alert threshold
ALERT_THRESHOLD_CW=100 # CLOSE_WAIT alert threshold
ALERT_THRESHOLD_SYNR=500 # SYN_RECV alert threshold
while true; do
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')
# Count by state (-H suppresses the header line so wc -l is exact)
ESTABLISHED=$(ss -Htn state established | wc -l)
TIME_WAIT=$(ss -Htn state time-wait | wc -l)
CLOSE_WAIT=$(ss -Htn state close-wait | wc -l)
SYN_RECV=$(ss -Htn state syn-recv | wc -l)
LISTEN=$(ss -Htn state listening | wc -l)
echo "$TIMESTAMP ESTAB=$ESTABLISHED TW=$TIME_WAIT CW=$CLOSE_WAIT SYNR=$SYN_RECV LISTEN=$LISTEN" >> "$LOG_FILE"
# Alert checks
if [ "$TIME_WAIT" -gt "$ALERT_THRESHOLD_TW" ]; then
echo "$TIMESTAMP [WARN] TIME_WAIT=$TIME_WAIT exceeds threshold" >> "$LOG_FILE"
fi
if [ "$CLOSE_WAIT" -gt "$ALERT_THRESHOLD_CW" ]; then
echo "$TIMESTAMP [ALERT] CLOSE_WAIT=$CLOSE_WAIT - possible socket leak!" >> "$LOG_FILE"
fi
if [ "$SYN_RECV" -gt "$ALERT_THRESHOLD_SYNR" ]; then
echo "$TIMESTAMP [ALERT] SYN_RECV=$SYN_RECV - possible SYN flood!" >> "$LOG_FILE"
fi
sleep 10
done
7.2 conntrack Monitor
#!/bin/bash
# conntrack_monitor.sh
MAX=$(cat /proc/sys/net/netfilter/nf_conntrack_max 2>/dev/null || cat /proc/sys/net/nf_conntrack_max)
CURRENT=$(conntrack -C 2>/dev/null || cat /proc/sys/net/netfilter/nf_conntrack_count)
USAGE=$((CURRENT * 100 / MAX))
echo "Conntrack: $CURRENT / $MAX ($USAGE%)"
if [ "$USAGE" -gt 80 ]; then
echo "[WARN] Conntrack table usage above 80%!"
echo "Top source IPs:"
conntrack -L 2>/dev/null | awk '{print $5}' | cut -d= -f2 | sort | uniq -c | sort -rn | head -10
fi
8. Summary: TCP Debugging Checklist
When a connection problem occurs, follow this systematic approach.
1. Basic Connectivity Check
[ ] Verify network path with ping / traceroute
[ ] Test port connectivity with nc -zv or telnet
[ ] Verify DNS resolution (dig, nslookup)
2. Socket State Analysis
[ ] Check listening ports with ss -tlnp
[ ] Review overall socket statistics with ss -s
[ ] Look for abnormal states (excessive CLOSE_WAIT, SYN_RECV)
3. Packet-Level Analysis
[ ] Capture packets with tcpdump
[ ] Analyze RSTs, retransmissions, and delays
[ ] Use Wireshark for detailed inspection
4. Kernel Parameter Review
[ ] Backlog queue sizes
[ ] conntrack table utilization
[ ] File descriptor limits
5. Application Level
[ ] Connection pool configuration
[ ] Timeout settings
[ ] Application error logs
The key to TCP/IP troubleshooting is taking a layered approach. Start from the physical/network layer and work upward to the application layer, systematically narrowing down the root cause at each step.