The Complete Guide to Container & Kubernetes Network Debugging
- 1. Docker Networking Models
- 2. Kubernetes Networking Model and CNI
- 3. Pod-to-Pod Communication Debugging
- 4. Pod-to-Service Communication Debugging
- 5. Service DNS Resolution Issues
- 6. External Traffic Debugging
- 7. Network Policy Debugging
- 8. Calico Troubleshooting
- 9. Cilium Troubleshooting
- 10. Debug Containers and Ephemeral Containers
- 11. Practical Debugging Scenarios
- 12. Debugging Checklist
- Quiz
1. Docker Networking Models
The foundation of container network debugging starts with a solid understanding of the networking models Docker provides.
1.1 Bridge Network
The bridge driver is Docker's default network driver. Each container receives a virtual ethernet (veth) interface connected to the docker0 bridge.
# Inspect the bridge network details
docker network inspect bridge
# Check a container's IP address
docker inspect --format '{{.NetworkSettings.IPAddress}}' <container_id>
# List veth pairs
ip link show type veth
# Show interfaces connected to docker0
brctl show docker0
Common issues with bridge networking include:
- Container-to-container communication failure: Verify that both containers are on the same bridge network.
- No external connectivity: Check iptables NAT rules and IP forwarding settings.
- Port conflicts: Inspect host port binding overlaps.
# Check iptables NAT rules
sudo iptables -t nat -L -n -v
# Verify IP forwarding is enabled
cat /proc/sys/net/ipv4/ip_forward
# Check port bindings
docker port <container_id>
1.2 Host Network
The container shares the host's network stack directly. There is no network isolation, which yields better performance but introduces port conflict risks.
# Run a container in host network mode
docker run --network host nginx
# Verify the container sees host interfaces
docker exec <container_id> ip addr show
# Confirm shared network stack
docker exec <container_id> ss -tlnp
1.3 Overlay Network
Overlay networks enable communication between containers across multiple Docker hosts. They require Docker Swarm or an external key-value store.
# Create an overlay network
docker network create --driver overlay my-overlay
# Check VXLAN tunnel status
ip -d link show type vxlan
# Inspect overlay network peer information
docker network inspect my-overlay --format '{{json .Peers}}'
Key considerations when debugging overlay networks:
- Ensure the VXLAN port (UDP 4789) is allowed through firewalls
- Verify MTU settings account for VXLAN overhead (50 bytes)
- Check time synchronization between nodes
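The MTU consideration above is simple arithmetic worth internalizing. A minimal sketch (plain shell, no Docker required) of the effective MTU left for containers on a VXLAN overlay, assuming the usual IPv4 overhead of 50 bytes (outer IP 20 + UDP 8 + VXLAN header 8 + inner Ethernet 14):

```shell
# Effective MTU available inside a VXLAN overlay:
# physical MTU minus 50 bytes of encapsulation overhead.
vxlan_mtu() {
    echo $(( $1 - 50 ))
}

vxlan_mtu 1500   # standard Ethernet -> 1450
vxlan_mtu 9000   # jumbo frames     -> 8950
```

If the overlay interface is still configured with an MTU of 1500 on a 1500-byte physical network, large packets get fragmented or silently dropped, which is exactly the symptom class this checklist item targets.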
2. Kubernetes Networking Model and CNI
2.1 Core Principles of Kubernetes Networking
The Kubernetes networking model is built on three fundamental requirements:
- Every Pod can communicate with every other Pod without NAT
- Every node can communicate with every Pod without NAT
- The IP a Pod sees for itself is the same IP that other Pods see for it
# Check the cluster network CIDR
kubectl cluster-info dump | grep -m 1 cluster-cidr
# Get Pod CIDRs assigned to each node
kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR}'
# Identify the CNI plugin in use
ls /etc/cni/net.d/
cat /etc/cni/net.d/*.conflist
2.2 Understanding CNI Plugins
CNI (Container Network Interface) is the standard for configuring network interfaces in containers.
# Locate CNI binaries
ls /opt/cni/bin/
# Inspect the CNI configuration
cat /etc/cni/net.d/10-calico.conflist
# Check kubelet logs for CNI-related events
journalctl -u kubelet | grep -i cni
Comparison of popular CNI plugins:
| CNI Plugin | Data Plane | Network Policy | Key Feature |
|---|---|---|---|
| Calico | iptables/eBPF | Supported | BGP-based routing |
| Cilium | eBPF | Supported | L7 policies, Hubble observability |
| Flannel | VXLAN/host-gw | Not supported | Simple configuration |
| Weave | VXLAN | Supported | Automatic mesh networking |
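When the conf directory listing alone is ambiguous, the plugin can be identified from the conflist itself. A hypothetical grep/sed helper (shown against an inline sample so no node access is needed; in practice pipe `cat /etc/cni/net.d/*.conflist` into it):

```shell
# Extract the plugin "type" fields from CNI conflist JSON on stdin,
# one per line, using only grep and sed (no jq dependency).
cni_types() {
    grep -o '"type"[[:space:]]*:[[:space:]]*"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/'
}

# Sample Calico-style conflist for illustration:
sample='{ "name": "k8s-pod-network", "plugins": [
  { "type": "calico", "ipam": { "type": "calico-ipam" } },
  { "type": "portmap" } ] }'
echo "$sample" | cni_types
```

The first `type` in the `plugins` array names the main plugin; the rest are chained helpers such as IPAM and port mapping.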
3. Pod-to-Pod Communication Debugging
3.1 Same-Node Pod Communication
# Get Pod IPs
kubectl get pods -o wide
# Ping from one Pod to another
kubectl exec -it <pod-a> -- ping <pod-b-ip>
# Trace the route between Pods on the same node
kubectl exec -it <pod-a> -- traceroute <pod-b-ip>
# Inspect veth interfaces on the node
kubectl debug node/<node-name> -it --image=nicolaka/netshoot -- ip link show
3.2 Cross-Node Pod Communication
# Check the routing table on the node
kubectl debug node/<node-name> -it --image=nicolaka/netshoot -- route -n
# Verify VXLAN or IPIP tunnel status
kubectl debug node/<node-name> -it --image=nicolaka/netshoot -- ip tunnel show
# Capture packets with tcpdump
kubectl debug node/<node-name> -it --image=nicolaka/netshoot -- \
tcpdump -i any -nn host <target-pod-ip>
# Test for MTU issues (Path MTU Discovery)
kubectl exec -it <pod> -- ping -M do -s 1400 <target-pod-ip>
Common causes of cross-node communication failures:
- Cloud security groups blocking Pod CIDR traffic
- MTU mismatch causing packet fragmentation
- Inconsistent routing tables across nodes
- IPIP/VXLAN tunnel interfaces being down
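The `ping -M do -s 1400` probe shown earlier relates to MTU through fixed header sizes: the payload given to `-s` rides inside an 8-byte ICMP header and a 20-byte IPv4 header. A small sketch of that arithmetic for picking probe sizes:

```shell
# Largest ping -s payload that fits a given MTU without fragmentation:
# MTU - 20 (IPv4 header) - 8 (ICMP header).
max_ping_payload() {
    echo $(( $1 - 28 ))
}

max_ping_payload 1500   # -> 1472
max_ping_payload 1450   # -> 1422 (VXLAN overlay on 1500-byte links)
```

If `ping -M do -s 1472` succeeds between Pods on different nodes but `-s 1473` fails, the path MTU is exactly 1500; a lower breakpoint points at tunnel overhead or a misconfigured hop along the path.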
4. Pod-to-Service Communication Debugging
4.1 Verifying Service Basics
# Check the Service's ClusterIP and Endpoints
kubectl get svc <service-name> -o wide
kubectl get endpoints <service-name>
# Inspect the Endpoints in detail
kubectl get endpoints <service-name> -o yaml
# Test connectivity to the Service
kubectl exec -it <test-pod> -- curl -v http://<service-name>:<port>
# Verify Pods matching the Service selector
kubectl get pods -l <label-selector>
4.2 kube-proxy and iptables/IPVS Rules
kube-proxy translates Service virtual IPs into actual Pod IPs.
# Determine the kube-proxy mode
kubectl get configmap kube-proxy -n kube-system -o yaml | grep mode
# iptables mode: inspect Service-related rules
sudo iptables -t nat -L KUBE-SERVICES -n
sudo iptables -t nat -L KUBE-SVC-<hash> -n
sudo iptables -t nat -L KUBE-SEP-<hash> -n
# IPVS mode: list virtual servers
sudo ipvsadm -Ln
sudo ipvsadm -Ln -t <cluster-ip>:<port>
# Check kube-proxy logs
kubectl logs -n kube-system -l k8s-app=kube-proxy --tail=100
Understanding the iptables chain flow is essential for debugging:
PREROUTING -> KUBE-SERVICES -> KUBE-SVC-xxx -> KUBE-SEP-xxx (DNAT)
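In iptables mode, each KUBE-SVC-xxx chain spreads traffic across its KUBE-SEP-xxx endpoint chains using the `statistic` match: with n endpoints, rule i (0-based) carries probability 1/(n-i), which works out to a uniform split overall. A sketch of that arithmetic, handy for sanity-checking the probability values printed by `iptables -t nat -L KUBE-SVC-<hash>`:

```shell
# Probabilities kube-proxy programs for n endpoint chains:
# the i-th rule (0-based) matches with probability 1/(n-i).
kube_svc_probs() {
    n="$1"; i=0
    while [ "$i" -lt "$n" ]; do
        awk -v n="$n" -v i="$i" 'BEGIN { printf "%.5f\n", 1 / (n - i) }'
        i=$(( i + 1 ))
    done
}

kube_svc_probs 3   # -> 0.33333, 0.50000, 1.00000
```

Endpoint 0 is picked a third of the time; if not, the second rule fires half of the remaining time; the final rule catches everything left, so each endpoint ends up with an overall share of 1/3.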
# Trace the full iptables chain for a specific Service
SERVICE_IP=$(kubectl get svc <service-name> -o jsonpath='{.spec.clusterIP}')
sudo iptables -t nat -L KUBE-SERVICES -n | grep $SERVICE_IP
# Inspect conntrack entries for the Service
sudo conntrack -L -d $SERVICE_IP
5. Service DNS Resolution Issues
5.1 Verifying CoreDNS Operation
CoreDNS handles all in-cluster DNS resolution in Kubernetes.
# Check CoreDNS Pod status
kubectl get pods -n kube-system -l k8s-app=kube-dns
# Inspect CoreDNS configuration
kubectl get configmap coredns -n kube-system -o yaml
# Test DNS resolution
kubectl exec -it <test-pod> -- nslookup <service-name>
kubectl exec -it <test-pod> -- nslookup <service-name>.<namespace>.svc.cluster.local
# Measure DNS response times
kubectl exec -it <test-pod> -- dig <service-name>.default.svc.cluster.local +stats
5.2 Diagnosing DNS Problems
# Check the Pod's DNS configuration
kubectl exec -it <pod> -- cat /etc/resolv.conf
# Review CoreDNS logs for errors
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=200
# Deploy a DNS debugging Pod
kubectl run dnsutils --image=registry.k8s.io/e2e-test-images/dnsutils:1.3 \
--restart=Never -- sleep 3600
# Test external domain resolution
kubectl exec -it dnsutils -- nslookup google.com
kubectl exec -it dnsutils -- nslookup kubernetes.default
Common DNS issues and their resolutions:
- ndots configuration: The default ndots:5 in resolv.conf causes multiple DNS queries for short names. Using FQDNs (with a trailing dot) eliminates unnecessary lookups.
- CoreDNS CrashLoopBackOff: Check logs and verify upstream DNS server connectivity.
- DNS caching issues: Review the CoreDNS cache plugin configuration.
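The ndots behavior can be simulated without a cluster. A simplified sketch (hypothetical helper; real resolvers also retry the literal name) of how a resolv.conf-style search list expands a query: names with fewer dots than ndots are tried against each search domain first, while names ending in a dot bypass the search list entirely:

```shell
# Simulate resolver search-list expansion (simplified).
# Args: name, ndots, then the search domains. Prints candidate FQDNs in order.
expand_query() {
    name="$1"; ndots="$2"; shift 2
    case "$name" in
        *.) printf '%s\n' "${name%.}"; return ;;   # absolute name: one query
    esac
    dots=$(printf '%s' "$name" | tr -cd '.' | wc -c)
    if [ "$dots" -lt "$ndots" ]; then
        for d in "$@"; do printf '%s.%s\n' "$name" "$d"; done
    fi
    printf '%s\n' "$name"
}

# Under ndots:5 even the full service name (only 4 dots) goes through
# the search list, producing 4 candidate queries:
expand_query myservice.default.svc.cluster.local 5 \
    default.svc.cluster.local svc.cluster.local cluster.local
# The trailing-dot form is tried exactly once:
expand_query myservice.default.svc.cluster.local. 5 \
    default.svc.cluster.local svc.cluster.local cluster.local
```

This is why the two `dig` commands below behave so differently: only the form with the trailing dot skips the search-list round trips.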
# Demonstrate the ndots effect: compare query counts
kubectl exec -it <pod> -- dig myservice.default.svc.cluster.local +search +showsearch
kubectl exec -it <pod> -- dig myservice.default.svc.cluster.local. +search +showsearch
6. External Traffic Debugging
6.1 Ingress / LoadBalancer Troubleshooting
# List all Ingress resources
kubectl get ingress -A
kubectl describe ingress <ingress-name>
# Check Ingress Controller logs
kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx --tail=100
# Verify LoadBalancer Service external IPs
kubectl get svc -A --field-selector spec.type=LoadBalancer
# Test NodePort access externally
curl -v http://<node-ip>:<node-port>
6.2 Egress Traffic Debugging
# Verify outbound connectivity from a Pod
kubectl exec -it <pod> -- curl -v https://httpbin.org/ip
# Check NAT gateway and routing
kubectl exec -it <pod> -- traceroute 8.8.8.8
# Inspect SNAT rules
sudo iptables -t nat -L POSTROUTING -n -v
7. Network Policy Debugging
7.1 Basic Network Policy Diagnostics
# List all Network Policies
kubectl get networkpolicy -A
kubectl describe networkpolicy <policy-name> -n <namespace>
# Find Network Policies applying to a specific Pod
kubectl get networkpolicy -n <namespace> -o json | \
jq '.items[] | select(.spec.podSelector.matchLabels | to_entries[] |
.key == "app" and .value == "myapp")'
# Test connectivity before and after policy application
kubectl exec -it <source-pod> -- nc -zv <target-pod-ip> <port>
7.2 Network Policy Troubleshooting Patterns
# Default deny-all policy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: default-deny-all
spec:
podSelector: {}
policyTypes:
- Ingress
- Egress
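With a default-deny in place, traffic must be re-allowed explicitly. A minimal companion policy (names and labels here are illustrative) permitting ingress to Pods labeled `app: myapp` from Pods labeled `role: frontend` on TCP 8080:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-myapp
spec:
  podSelector:
    matchLabels:
      app: myapp
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              role: frontend
      ports:
        - protocol: TCP
          port: 8080
```

Apply the allow policy, then rerun the connectivity test below: the allowed source should succeed while any other Pod still times out.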
# Test connectivity after applying the policy
kubectl exec -it <pod> -- wget -qO- --timeout=2 http://<target-service>
# Check CNI-specific Network Policy logs
# Calico
kubectl logs -n calico-system -l k8s-app=calico-node --tail=100
# Cilium
kubectl exec -n kube-system <cilium-pod> -- cilium policy get
kubectl exec -n kube-system <cilium-pod> -- cilium endpoint list
8. Calico Troubleshooting
8.1 Checking Calico Status
# Verify Calico node health
kubectl get pods -n calico-system
calicoctl node status
# Check BGP peer status
calicoctl get bgpPeer
calicoctl node status | grep -A 5 "BGP"
# Inspect IP pools
calicoctl get ippool -o wide
# List Calico-specific network policies
calicoctl get networkpolicy -A
calicoctl get globalnetworkpolicy
8.2 Calico Debugging
# Increase Felix log level for debugging
calicoctl patch felixconfiguration default \
--patch '{"spec":{"logSeverityScreen":"Debug"}}'
# Check BIRD routing daemon status
kubectl exec -n calico-system <calico-node-pod> -- birdcl show route
# Inspect IP-in-IP tunnel status
kubectl exec -n calico-system <calico-node-pod> -- ip tunnel show
# Review Calico-programmed iptables rules
sudo iptables -L -n | grep cali
9. Cilium Troubleshooting
9.1 Checking Cilium Status
# Get detailed Cilium agent status
kubectl exec -n kube-system <cilium-pod> -- cilium status --verbose
# List all Cilium endpoints
kubectl exec -n kube-system <cilium-pod> -- cilium endpoint list
# Inspect BPF maps
kubectl exec -n kube-system <cilium-pod> -- cilium bpf lb list
kubectl exec -n kube-system <cilium-pod> -- cilium bpf ct list global
# Observe network flows with Hubble
hubble observe --namespace <namespace>
hubble observe --pod <pod-name> --protocol TCP
9.2 Cilium Debugging
# Run the Cilium connectivity test (cilium CLI from a workstation, not the agent binary)
cilium connectivity test
# Check cluster-wide health from inside the agent Pod
kubectl exec -n kube-system <cilium-pod> -- cilium-health status
# Check eBPF program status
kubectl exec -n kube-system <cilium-pod> -- cilium bpf prog list
# Trace policy decisions between Pods
kubectl exec -n kube-system <cilium-pod> -- cilium policy trace \
--src-k8s-pod <namespace>:<src-pod> \
--dst-k8s-pod <namespace>:<dst-pod> \
--dport <port>
# Monitor events in real time
kubectl exec -n kube-system <cilium-pod> -- cilium monitor --type drop
kubectl exec -n kube-system <cilium-pod> -- cilium monitor --type policy-verdict
10. Debug Containers and Ephemeral Containers
10.1 Using kubectl debug
# Attach a debug container to a running Pod
kubectl debug -it <pod-name> --image=nicolaka/netshoot --target=<container-name>
# Debug at the node level
kubectl debug node/<node-name> -it --image=ubuntu
# Create a copy of a Pod for debugging (no impact on original)
kubectl debug <pod-name> -it --copy-to=debug-pod --image=nicolaka/netshoot
# Debug container with a shared process namespace (requires --copy-to)
kubectl debug -it <pod-name> --image=busybox --copy-to=debug-shared --share-processes
10.2 Network Debugging with netshoot
# Deploy a netshoot container
kubectl run netshoot --image=nicolaka/netshoot --restart=Never -- sleep 3600
# Enter the container for comprehensive diagnostics
kubectl exec -it netshoot -- bash
# Commands to run inside the container:
# TCP connectivity test
nc -zv <service-ip> <port>
# HTTP request test
curl -v http://<service-name>.<namespace>.svc.cluster.local:<port>/health
# DNS lookup
dig +short <service-name>.<namespace>.svc.cluster.local
# Packet capture
tcpdump -i eth0 -nn port 80
# Network path tracing
mtr <target-ip>
# SSL/TLS connection verification
openssl s_client -connect <service-ip>:443
11. Practical Debugging Scenarios
Scenario 1: Pod Cannot Reach a Service
# Step 1: Verify the Service and its Endpoints
kubectl get svc <service-name> -o yaml
kubectl get endpoints <service-name>
# Step 2: If Endpoints are empty, check the selector
kubectl get pods -l <label-selector> --show-labels
# Step 3: Verify target Pods are Ready
kubectl get pods -l <label-selector> -o wide
kubectl describe pod <target-pod> | grep -A 5 "Conditions"
# Step 4: Inspect kube-proxy rules
sudo iptables -t nat -L KUBE-SERVICES -n | grep <cluster-ip>
# Step 5: Test direct Pod IP connectivity
kubectl exec -it <source-pod> -- curl -v http://<pod-ip>:<target-port>
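Step 2's emptiness check can be scripted. A hypothetical helper that classifies the output of `kubectl get endpoints <service-name> -o jsonpath='{.subsets[*].addresses[*].ip}'` (passed in as a string here, so no cluster is needed to follow along):

```shell
# Classify an Endpoints IP list: an empty list means the Service selector
# matches no Ready Pods, and traffic to the ClusterIP goes nowhere.
endpoints_state() {
    if [ -z "$1" ]; then
        echo "EMPTY: check the Service selector and Pod readiness"
    else
        echo "OK: $1"
    fi
}

endpoints_state ""
endpoints_state "10.244.1.5 10.244.2.7"
```

In practice: `endpoints_state "$(kubectl get endpoints <service-name> -o jsonpath='{.subsets[*].addresses[*].ip}')"`.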
Scenario 2: Intermittent Timeouts
# Step 1: Check conntrack table saturation
sudo sysctl net.netfilter.nf_conntrack_count
sudo sysctl net.netfilter.nf_conntrack_max
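To turn step 1's two numbers into a verdict, compare count to max; sustained usage above roughly 80-90% typically coincides with "nf_conntrack: table full, dropping packet" kernel messages and exactly this kind of intermittent timeout. A sketch of the arithmetic (values hardcoded for illustration; substitute the sysctl outputs):

```shell
# Percentage of the conntrack table in use, given count and max.
conntrack_usage() {
    awk -v c="$1" -v m="$2" 'BEGIN { printf "%.1f\n", 100 * c / m }'
}

conntrack_usage 118000 131072   # -> 90.0 (dangerously close to full)
```

If the table is near capacity, either raise net.netfilter.nf_conntrack_max or reduce connection churn (e.g., enable keep-alives in chatty clients).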
# Step 2: Look for packet drops
kubectl debug node/<node-name> -it --image=nicolaka/netshoot -- \
netstat -s | grep -i drop
# Step 3: Check network interface errors
kubectl debug node/<node-name> -it --image=nicolaka/netshoot -- \
ip -s link show
# Step 4: Monitor TCP retransmissions
kubectl debug node/<node-name> -it --image=nicolaka/netshoot -- \
netstat -s | grep -i retrans
Scenario 3: Slow DNS Resolution
# Step 1: Measure DNS response times
kubectl exec -it <pod> -- dig <service-name>.default.svc.cluster.local +stats | grep "Query time"
# Step 2: Check CoreDNS resource usage
kubectl top pods -n kube-system -l k8s-app=kube-dns
# Step 3: Apply ndots optimization
# Add to Pod spec:
# dnsConfig:
# options:
# - name: ndots
# value: "2"
# Step 4: Check CoreDNS cache hit rates
kubectl exec -n kube-system <coredns-pod> -- \
wget -qO- http://localhost:9153/metrics | grep coredns_cache
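The coredns_cache series are Prometheus counters, so the hit rate in step 4 has to be computed from coredns_cache_hits_total and coredns_cache_misses_total. A sketch that does so from a captured metrics snippet (the sample lines below are illustrative):

```shell
# Compute the cache hit ratio (%) from CoreDNS Prometheus counters on stdin.
cache_hit_ratio() {
    awk '
        /^coredns_cache_hits_total/   { hits   += $NF }
        /^coredns_cache_misses_total/ { misses += $NF }
        END { printf "%.1f\n", 100 * hits / (hits + misses) }
    '
}

sample='coredns_cache_hits_total{server="dns://:53",type="success"} 900
coredns_cache_misses_total{server="dns://:53"} 100'
echo "$sample" | cache_hit_ratio   # -> 90.0
```

A persistently low ratio suggests tuning the cache plugin's TTLs or capacity in the Corefile, or that clients are querying many unique names (the ndots expansion in step 3 is a common culprit).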
12. Debugging Checklist
A systematic checklist for approaching network issues:
[ ] Pod status verification (Running, Ready)
[ ] Service Endpoints verification (not empty)
[ ] DNS resolution test (nslookup/dig)
[ ] Direct Pod IP connectivity test
[ ] kube-proxy rules inspection (iptables/IPVS)
[ ] Network Policy review (Ingress/Egress rules)
[ ] CNI plugin health check
[ ] Cross-node network connectivity
[ ] MTU configuration verification
[ ] Firewall / security group rules
[ ] conntrack table status
[ ] CoreDNS health and logs
Network debugging is most effective when approached layer by layer: L2 (Ethernet) -> L3 (IP/Routing) -> L4 (TCP/UDP) -> L7 (HTTP/DNS). By systematically verifying each layer, even the most complex networking issues can be methodically diagnosed and resolved.
Quiz
Q1: What is the main topic covered in "The Complete Guide to Container & Kubernetes Network Debugging"?
A systematic guide to debugging container networking, from Docker networking models to Kubernetes CNI, Service DNS, Network Policies, and Calico/Cilium troubleshooting with practical commands.
Q2: What are Docker's networking models?
Docker provides three main network drivers: bridge (the default, with each container attached to the docker0 bridge via a veth pair), host (the container shares the host's network stack directly), and overlay (cross-host container communication over VXLAN, requiring Docker Swarm or an external key-value store).
Q3: Explain the core principles of the Kubernetes networking model and CNI.
The model is built on three requirements: every Pod can communicate with every other Pod without NAT, every node can communicate with every Pod without NAT, and the IP a Pod sees for itself is the same IP other Pods see for it. CNI plugins such as Calico, Cilium, Flannel, and Weave implement this model.
Q4: What approach is recommended for Pod-to-Pod communication debugging?
Verify same-node connectivity first (Pod IPs, ping, traceroute, veth inspection), then cross-node paths (routing tables, VXLAN/IPIP tunnel status, tcpdump, MTU probes). Common cross-node failure causes are security groups blocking Pod CIDR traffic, MTU mismatches, inconsistent routing tables, and downed tunnel interfaces.
Q5: What approach is recommended for Pod-to-Service communication debugging?
Check the Service's ClusterIP, Endpoints, and selector first, then inspect kube-proxy's iptables or IPVS rules. kube-proxy translates Service virtual IPs into Pod IPs via the chain flow PREROUTING -> KUBE-SERVICES -> KUBE-SVC-xxx -> KUBE-SEP-xxx (DNAT).