
Kubernetes Networking Deep Dive Guide 2025: CNI, Service, Ingress, DNS, Network Policy


1. Fundamental Principles of the Kubernetes Networking Model

Kubernetes networking is built on three fundamental principles.

1.1 Core Requirements

  1. Every Pod gets a unique IP address -- it can communicate without NAT
  2. Every Pod can reach every other Pod -- whether on the same node or a different one
  3. The IP a Pod sees for itself is the same IP other Pods see -- no NAT
┌────────────────────────────────────────────────────┐
│        Kubernetes Networking Fundamentals          │
│                                                    │
│  ┌──────────┐           ┌──────────┐               │
│  │  Node 1  │           │  Node 2  │               │
│  │          │           │          │               │
│  │ ┌──────┐ │  No NAT!  │ ┌──────┐ │               │
│  │ │Pod A │◄├───────────┼►│Pod C │ │               │
│  │ │10.1.1│ │           │ │10.1.2│ │               │
│  │ └──────┘ │           │ └──────┘ │               │
│  │ ┌──────┐ │           │ ┌──────┐ │               │
│  │ │Pod B │ │           │ │Pod D │ │               │
│  │ │10.1.1│ │           │ │10.1.2│ │               │
│  │ └──────┘ │           │ └──────┘ │               │
│  └──────────┘           └──────────┘               │
│                                                    │
│  Pod A (10.1.1.2) → Pod C (10.1.2.3): direct       │
│  No NAT, no port mapping. Each Pod owns its IP.    │
└────────────────────────────────────────────────────┘

1.2 Networking Layer Hierarchy

┌──────────────────────────────────────────┐
│       Kubernetes Networking Layers       │
├──────────────────────────────────────────┤
│ L7: Ingress / Gateway API                │
│     (HTTP routing, TLS termination)      │
├──────────────────────────────────────────┤
│ L4: Service                              │
│     (ClusterIP, NodePort, LoadBalancer)  │
├──────────────────────────────────────────┤
│ L3: Pod networking (CNI)                 │
│     (Pod-to-Pod, overlay/underlay)       │
├──────────────────────────────────────────┤
│ L2-L3: Node networking                   │
│     (physical/virtual network)           │
└──────────────────────────────────────────┘

2. Pod-to-Pod Networking: How It Works Internally

2.1 Pod Communication on the Same Node

┌────────────────────────────────────────────┐
│                   Node                     │
│                                            │
│  ┌──────┐    veth pair   ┌──────────┐      │
│  │Pod A │◄──────────────►│          │      │
│  │eth0  │                │   cbr0   │      │
│  │      │                │ (bridge) │      │
│  └──────┘                │          │      │
│                          │          │      │
│  ┌──────┐    veth pair   │          │      │
│  │Pod B │◄──────────────►│          │      │
│  │eth0  │                └──────────┘      │
│  └──────┘                                  │
│                                            │
│  1. Pod A sends a packet to Pod B          │
│  2. It crosses the veth pair to cbr0       │
│  3. The bridge finds the dest veth by MAC  │
│  4. Delivered to Pod B via its veth pair   │
└────────────────────────────────────────────┘

veth pair: a virtual ethernet pair -- one end is eth0 inside the Pod's network namespace, the other is attached to the node's bridge.

2.2 Pod Communication Across Nodes (Overlay)

┌──────────┐                      ┌──────────┐
│  Node 1  │                      │  Node 2  │
│          │                      │          │
│ ┌──────┐ │     VXLAN/IPinIP     │ ┌──────┐ │
│ │Pod A │ │    ┌───────────┐     │ │Pod C │ │
│ │10.1.1│ │───►│ Tunneling │────►│ │10.1.2│ │
│ └──────┘ │    │           │     │ └──────┘ │
│          │    │ original  │     │          │
│ cbr0     │    │ packet is │     │ cbr0     │
│ 10.1.1.0 │    │ wrapped   │     │ 10.1.2.0 │
│          │    └───────────┘     │          │
│ eth0     │                      │ eth0     │
│ 192.168.1│                      │192.168.2 │
└──────────┘                      └──────────┘

VXLAN encapsulation:
┌──────────┬─────┬───────┬──────────┬─────────┐
│ Outer IP │ UDP │ VXLAN │ Inner IP │ Payload │
│ Hdr      │ Hdr │ Hdr   │ Hdr      │         │
│ 192.x    │     │       │ 10.x     │  Data   │
└──────────┴─────┴───────┴──────────┴─────────┘
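The MTU cost of this encapsulation follows directly from the header sizes. A quick sketch in plain Python (the byte counts are the standard VXLAN header sizes, not figures from this article):

```python
# Byte sizes of the headers VXLAN prepends to the original frame.
overhead = {
    "outer_ethernet": 14,  # new Ethernet header on the wire
    "outer_ip": 20,        # outer IPv4 header (192.168.x node addresses)
    "outer_udp": 8,        # UDP, default destination port 4789
    "vxlan": 8,            # VXLAN header carrying the VNI
}
total = sum(overhead.values())
print(total)         # 50 -> the "-50 bytes" MTU cost usually quoted for VXLAN
print(1500 - total)  # 1450 -> usable Pod MTU on a standard 1500-byte NIC
```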

2.3 Overlay vs Underlay Networking

| Characteristic       | Overlay (VXLAN/IPinIP) | Underlay (BGP/Direct)        |
|----------------------|------------------------|------------------------------|
| Setup difficulty     | Easy                   | Hard                         |
| Network requirements | L3 connectivity only   | BGP support required         |
| Performance overhead | Yes (encapsulation)    | None                         |
| MTU impact           | Reduced (50-54 bytes)  | None                         |
| Debugging            | Hard                   | Easy                         |
| Use cases            | Cloud, multi-tenant    | Bare metal, high performance |

3. CNI Plugins: A Detailed Comparison

3.1 What Is CNI?

CNI (Container Network Interface) is the standard interface through which container runtimes talk to network plugins.

CNI flow when a Pod is created:

1. kubelet asks the container runtime (via CRI) to create the container
2. The runtime creates the network namespace
3. kubelet invokes the CNI plugin (ADD command)
4. The CNI plugin:
   a. creates a veth pair
   b. assigns the Pod an IP address (IPAM)
   c. installs routing rules
   d. sets up the overlay tunnel, if needed
5. The Pod's network is ready
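The ADD call in step 3 is a JSON-over-stdin/stdout protocol: the plugin prints a result object describing what it set up. A minimal sketch of such a result, with hypothetical interface and IP values (the field names follow the CNI result schema):

```python
import json

# Hypothetical result a CNI plugin prints to stdout after a successful
# ADD: the interface it created and the IP it assigned via IPAM.
result = {
    "cniVersion": "1.0.0",
    "interfaces": [
        {"name": "eth0", "sandbox": "/var/run/netns/pod-ns"},  # Pod end of the veth pair
    ],
    "ips": [
        {"address": "10.1.1.2/24", "gateway": "10.1.1.1", "interface": 0},
    ],
    "routes": [{"dst": "0.0.0.0/0"}],  # default route out through the bridge
}
print(json.dumps(result, indent=2))
```

kubelet consumes this result to record the Pod's IP; DEL later tears the same interfaces down.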

3.2 Comparing the Major CNI Plugins

| Feature                | Calico               | Cilium               | Flannel             | Weave Net       |
|------------------------|----------------------|----------------------|---------------------|-----------------|
| Vendor                 | Tigera               | Isovalent (Cisco)    | CoreOS              | Weaveworks      |
| Data plane             | iptables/eBPF        | eBPF                 | VXLAN               | VXLAN/sleeve    |
| Network modes          | BGP/VXLAN/IPinIP     | VXLAN/Native         | VXLAN               | VXLAN           |
| Network Policy         | Rich                 | Very rich (L7)       | None                | Basic           |
| Encryption             | WireGuard            | WireGuard/IPsec      | None                | IPsec           |
| kube-proxy replacement | eBPF mode            | Native eBPF          | No                  | No              |
| Observability          | Basic                | Hubble (powerful)    | None                | Scope           |
| Service mesh           | None                 | Built-in (optional)  | None                | None            |
| Multi-cluster          | Supported            | ClusterMesh          | No                  | Supported       |
| Performance            | High (BGP)           | Very high (eBPF)     | Moderate            | Low             |
| Complexity             | Medium               | High                 | Low                 | Low             |
| Production verdict     | Strongly recommended | Strongly recommended | Small clusters only | Not recommended |

3.3 Calico Deep Dive

# Calico BGP mode (IPPool)
apiVersion: crd.projectcalico.org/v1
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 10.244.0.0/16
  encapsulation: None        # no encapsulation when using BGP
  natOutgoing: true
  nodeSelector: all()
  blockSize: 26              # /26 = 64 IPs per node

---
# Calico BGP peering
apiVersion: crd.projectcalico.org/v1
kind: BGPPeer
metadata:
  name: rack-tor-switch
spec:
  peerIP: 192.168.1.1
  asNumber: 64512
  nodeSelector: rack == "rack-1"

---
# Calico VXLAN mode (cloud environments)
apiVersion: crd.projectcalico.org/v1
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 10.244.0.0/16
  encapsulation: VXLAN       # VXLAN encapsulation
  natOutgoing: true
  vxlanMode: Always

3.4 Cilium Deep Dive

# Cilium Helm install (kube-proxy replacement mode)
# helm install cilium cilium/cilium --namespace kube-system \
#   --set kubeProxyReplacement=true \
#   --set k8sServiceHost=API_SERVER_IP \
#   --set k8sServicePort=6443 \
#   --set hubble.enabled=true \
#   --set hubble.relay.enabled=true \
#   --set hubble.ui.enabled=true \
#   --set encryption.enabled=true \
#   --set encryption.type=wireguard

# Check Cilium status
# cilium status
# cilium connectivity test

# Cilium L7 Network Policy (HTTP-aware rules)
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: l7-rule
  namespace: default
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:
        - method: "GET"
          path: "/api/v1/.*"
        - method: "POST"
          path: "/api/v1/orders"

4. Services in Depth: From ClusterIP to LoadBalancer

4.1 How Services Work

┌──────────────────────────────────────────────────┐
│               Service Traffic Flow               │
│                                                  │
│  Client Pod ──► Service VIP ──► kube-proxy       │
│                 (10.96.0.10)    (iptables/IPVS)  │
│                                    │             │
│                           ┌────────┼────────┐    │
│                           ▼        ▼        ▼    │
│                         Pod A    Pod B    Pod C  │
│                       10.1.1.2 10.1.1.3 10.1.2.4 │
│                                                  │
│  The Service VIP is not bound to any interface;  │
│  iptables/IPVS rules DNAT it to a Pod IP.        │
└──────────────────────────────────────────────────┘

4.2 Service Types in Detail

# 1. ClusterIP (default) - reachable only inside the cluster
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: ClusterIP
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
  # Cluster IP: auto-assigned (e.g. 10.96.0.10)
  # Only reachable in-cluster, at 10.96.0.10:80

---
# 2. NodePort - external access via a fixed port on every node
apiVersion: v1
kind: Service
metadata:
  name: my-nodeport-service
spec:
  type: NodePort
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
    nodePort: 30080    # 30000-32767 range
  # Reachable on port 30080 of every node
  # NodeIP:30080 → ClusterIP:80 → Pod:8080

---
# 3. LoadBalancer - auto-provisions a cloud load balancer
apiVersion: v1
kind: Service
metadata:
  name: my-lb-service
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
  - port: 443
    targetPort: 8443
  # External LB IP → NodePort → ClusterIP → Pod

---
# 4. ExternalName - maps an external DNS name into the cluster
apiVersion: v1
kind: Service
metadata:
  name: external-db
spec:
  type: ExternalName
  externalName: mydb.example.com
  # external-db.default.svc.cluster.local → mydb.example.com

---
# 5. Headless Service - returns Pod IPs directly, no ClusterIP
apiVersion: v1
kind: Service
metadata:
  name: my-headless-service
spec:
  clusterIP: None            # declares it headless
  selector:
    app: my-stateful-app
  ports:
  - port: 5432
  # DNS queries return individual Pod IPs, not a Service IP
  # Used with StatefulSets: pod-0.my-headless-service.default.svc

4.3 Session Affinity

apiVersion: v1
kind: Service
metadata:
  name: sticky-service
spec:
  selector:
    app: web
  ports:
  - port: 80
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800  # 3 hours
  # The same client IP is always routed to the same Pod

5. kube-proxy Modes: iptables vs IPVS vs eBPF

5.1 iptables Mode (Default)

iptables rule chains (per Service):

PREROUTING → KUBE-SERVICES → KUBE-SVC-XXXX → KUBE-SEP-YYYY
                                             (probabilistic distribution)

With 3 Pods behind the Service:
KUBE-SVC-XXXX:
  -p tcp -d 10.96.0.10 --dport 80
  → 33% probability → KUBE-SEP-1 (DNAT → 10.1.1.2:8080)
  → 33% probability → KUBE-SEP-2 (DNAT → 10.1.1.3:8080)
  → 34% probability → KUBE-SEP-3 (DNAT → 10.1.2.4:8080)

Problems:
- Rule count grows linearly with Services/Pods (O(n))
- Every update rewrites entire chains
- Severe performance degradation beyond ~10,000 Services
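The probabilistic chain above is easy to sanity-check: each KUBE-SEP rule matches with probability 1/(number of endpoints still remaining), and the last one always matches, which works out to a uniform split. A small simulation in plain Python (an illustration, not actual iptables):

```python
import random

def pick_endpoint(endpoints, rng):
    """Mimic kube-proxy's 'statistic' rules: rule i matches with
    probability 1/(endpoints remaining); the last rule always matches."""
    for i, ep in enumerate(endpoints):
        remaining = len(endpoints) - i
        if remaining == 1 or rng.random() < 1.0 / remaining:
            return ep

rng = random.Random(42)  # fixed seed for reproducibility
endpoints = ["10.1.1.2:8080", "10.1.1.3:8080", "10.1.2.4:8080"]
counts = {ep: 0 for ep in endpoints}
for _ in range(30000):
    counts[pick_endpoint(endpoints, rng)] += 1
print(counts)  # each endpoint lands near 10000 -> a uniform 1/3 split
```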

5.2 IPVS Mode

IPVS (IP Virtual Server):

┌───────────────────────────────────────────┐
│ IPVS Virtual Server: 10.96.0.10:80        │
│                                           │
│  Real Server 1: 10.1.1.2:8080  (weight 1) │
│  Real Server 2: 10.1.1.3:8080  (weight 1) │
│  Real Server 3: 10.1.2.4:8080  (weight 1) │
│                                           │
│  Algorithm: rr (Round Robin)              │
│  Other options: lc, dh, sh, sed, nq       │
└───────────────────────────────────────────┘

Advantages:
- Hash-table based O(1) lookup
- Multiple load-balancing algorithms
- Stable performance in large clusters
- Real-time statistics and connection tracking

# kube-proxy IPVS mode configuration
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  scheduler: "rr"       # rr, lc, dh, sh, sed, nq
  syncPeriod: "30s"
  minSyncPeriod: "2s"
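The difference between schedulers is easiest to see in miniature. A toy sketch of `rr` and `lc` selection (illustration only -- real IPVS does this in the kernel, with weights and live connection state):

```python
from itertools import cycle

servers = ["10.1.1.2:8080", "10.1.1.3:8080", "10.1.2.4:8080"]

# 'rr': strict rotation over the real servers, O(1) per pick.
rr = cycle(servers)

def least_conn(active_conns):
    """'lc': pick the real server with the fewest active connections."""
    return min(active_conns, key=active_conns.get)

print([next(rr) for _ in range(4)])
# ['10.1.1.2:8080', '10.1.1.3:8080', '10.1.2.4:8080', '10.1.1.2:8080']
print(least_conn({"10.1.1.2:8080": 7, "10.1.1.3:8080": 2, "10.1.2.4:8080": 5}))
# 10.1.1.3:8080
```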

5.3 eBPF Mode (Cilium)

eBPF kube-proxy replacement:

┌────────────────────────────────────────────┐
│  Before: Pod → iptables/IPVS → Pod         │
│  eBPF:   Pod → BPF program → Pod (direct!) │
│                                            │
│  ┌──────┐    BPF map     ┌──────┐          │
│  │Pod A │──(Service   )─►│Pod B │          │
│  │      │  ( lookup   )  │      │          │
│  └──────┘                └──────┘          │
│                                            │
│  No iptables chain traversal!              │
│  Packets are redirected directly in kernel │
│  space                                     │
└────────────────────────────────────────────┘
| Aspect              | iptables      | IPVS             | eBPF (Cilium)   |
|---------------------|---------------|------------------|-----------------|
| Lookup complexity   | O(n)          | O(1)             | O(1)            |
| Rule updates        | Full rewrite  | Incremental      | Incremental     |
| Load balancing      | Probabilistic | Many algorithms  | Maglev hashing  |
| Connection tracking | conntrack     | conntrack        | BPF conntrack   |
| Performance         | Moderate      | High             | Very high       |
| L7 policy           | No            | No               | Yes             |
| Observability       | Limited       | Basic            | Hubble (rich)   |
| 10K Services        | Very slow     | Fast             | Very fast       |
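The Maglev entry in the table deserves a note: Maglev builds a fixed-size lookup table in which each backend claims slots along its own pseudo-random permutation, giving near-even balance and minimal disruption when backends change. A compact sketch of the table-building step (the hash choices and tiny table size here are arbitrary for illustration; the real implementation differs):

```python
import hashlib

def _h(name: str, salt: str, mod: int) -> int:
    """Deterministic hash into [0, mod); the hash choice is arbitrary here."""
    return int(hashlib.md5((salt + name).encode()).hexdigest(), 16) % mod

def maglev_table(backends, size=7):
    """Build a Maglev-style lookup table (size should be prime)."""
    # Each backend gets its own permutation of the table slots.
    perm = {}
    for b in backends:
        offset = _h(b, "offset", size)
        skip = _h(b, "skip", size - 1) + 1  # 1..size-1, coprime with prime size
        perm[b] = [(offset + j * skip) % size for j in range(size)]
    table = [None] * size
    nxt = {b: 0 for b in backends}
    filled = 0
    while filled < size:
        for b in backends:                  # backends take turns claiming slots
            while nxt[b] < size:
                slot = perm[b][nxt[b]]
                nxt[b] += 1
                if table[slot] is None:
                    table[slot] = b
                    filled += 1
                    break
            if filled == size:
                break
    return table

table = maglev_table(["10.1.1.2", "10.1.1.3", "10.1.2.4"])
print(table)  # 7 slots, each owned by one backend, split roughly evenly
```

A packet's flow hash modulo the table size then selects its backend; because each backend's permutation is stable, removing one backend only reshuffles that backend's slots.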

6. DNS: CoreDNS and Service Discovery

6.1 CoreDNS Architecture

┌──────────────────────────────────────────────────┐
│                CoreDNS Operation                 │
│                                                  │
│  Pod ──DNS query──► CoreDNS Pod   (nameserver    │
│                     (kube-system)  10.96.0.10)   │
│                      │                           │
│           ┌──────────┼──────────┐                │
│           ▼          ▼          ▼                │
│        K8s API    Corefile   Upstream DNS        │
│       (Service/   (plugin    (external DNS)      │
│      Pod records)  config)                       │
└──────────────────────────────────────────────────┘

6.2 DNS Record Formats

Service DNS:
  my-service.my-namespace.svc.cluster.local
  └─ service ─┘ └─ namespace ─┘

Pod DNS:
  10-1-1-2.my-namespace.pod.cluster.local
  └─ IP (dots → dashes) ─┘

Pod DNS under a Headless Service (StatefulSet):
  pod-0.my-headless.my-namespace.svc.cluster.local
  └ Pod ┘ └ service ┘

SRV records:
  _http._tcp.my-service.my-namespace.svc.cluster.local
  → returns the port number and hostname
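These record shapes are mechanical enough to generate. A tiny helper in plain Python (`cluster.local` is the default zone; it is configurable per cluster):

```python
def service_fqdn(svc: str, ns: str, zone: str = "cluster.local") -> str:
    """Build the in-cluster DNS name for a Service."""
    return f"{svc}.{ns}.svc.{zone}"

def pod_fqdn(pod_ip: str, ns: str, zone: str = "cluster.local") -> str:
    """Build the DNS name for a Pod: dots in the IP become dashes."""
    return f"{pod_ip.replace('.', '-')}.{ns}.pod.{zone}"

print(service_fqdn("my-service", "my-namespace"))
# my-service.my-namespace.svc.cluster.local
print(pod_fqdn("10.1.1.2", "my-namespace"))
# 10-1-1-2.my-namespace.pod.cluster.local
```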

6.3 CoreDNS Corefile Configuration

# CoreDNS Corefile (ConfigMap)
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health {
          lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          fallthrough in-addr.arpa ip6.arpa
          ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
          max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }

6.4 DNS Debugging

# Launch a DNS debugging Pod
kubectl run dnsutils --image=registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3 \
  --restart=Never -- sleep infinity

# Query a Service record
kubectl exec dnsutils -- nslookup my-service.default.svc.cluster.local

# Query a Pod record
kubectl exec dnsutils -- nslookup 10-1-1-2.default.pod.cluster.local

# Inspect the DNS response in detail
kubectl exec dnsutils -- dig my-service.default.svc.cluster.local +search +short

# Check CoreDNS logs
kubectl logs -n kube-system -l k8s-app=kube-dns -f

# Check resolv.conf
kubectl exec dnsutils -- cat /etc/resolv.conf
# nameserver 10.96.0.10
# search default.svc.cluster.local svc.cluster.local cluster.local
# options ndots:5

6.5 The ndots:5 Problem and Its Fixes

# The problem with the default ndots:5:
# Looking up "api.example.com" (fewer than 5 dots):
# 1. api.example.com.default.svc.cluster.local (fails)
# 2. api.example.com.svc.cluster.local (fails)
# 3. api.example.com.cluster.local (fails)
# 4. api.example.com (succeeds)
# → 3 wasted queries for every external DNS lookup!

# Fix 1: customize the Pod's DNS config
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  dnsConfig:
    options:
    - name: ndots
      value: "2"       # lower ndots to 2
  containers:
  - name: app
    image: my-app

# Fix 2: use the FQDN directly (trailing dot)
# api.example.com.  ← the trailing dot marks an FQDN
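The resolver behavior described above can be modeled directly. A sketch of the query order a resolv.conf-style resolver follows given `ndots` and the search list (simplified; real resolvers honor additional options):

```python
def lookup_order(name, search, ndots=5):
    """Order of queries the resolver tries: a name with fewer than
    `ndots` dots (and no trailing dot) goes through the search list
    first; a trailing dot means 'query as-is, once'."""
    if name.endswith("."):
        return [name.rstrip(".")]          # FQDN: exactly one query
    if name.count(".") >= ndots:
        return [name] + [f"{name}.{s}" for s in search]
    return [f"{name}.{s}" for s in search] + [name]

search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]
print(lookup_order("api.example.com", search))           # 3 wasted internal tries first
print(lookup_order("api.example.com", search, ndots=2))  # absolute name tried first
print(lookup_order("api.example.com.", search))          # FQDN: one query
```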

7. Ingress: HTTP/HTTPS Routing

7.1 Ingress Architecture

┌──────────────────────────────────────────────────┐
│               Ingress Architecture               │
│                                                  │
│  External traffic                                │
│      │                                           │
│      ▼                                           │
│  ┌─────────────────────┐                         │
│  │ Ingress Controller  │  (nginx, Traefik...)    │
│  │ (the actual proxy)  │                         │
│  └─────────┬───────────┘                         │
│            │ watches Ingress resources           │
│            │ (routing rules)                     │
│     ┌──────┼──────┐                              │
│     ▼      ▼      ▼                              │
│   Svc A  Svc B  Svc C                            │
│   /api   /web   /docs                            │
└──────────────────────────────────────────────────┘

7.2 Ingress Resource Example

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - app.example.com
    secretName: app-tls-secret
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 80
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend-service
            port:
              number: 80

7.3 Ingress Controller Comparison

| Feature        | nginx-ingress   | Traefik             | HAProxy        | Contour         |
|----------------|-----------------|---------------------|----------------|-----------------|
| Vendor         | Kubernetes/F5   | Traefik Labs        | HAProxy Tech   | VMware          |
| Protocols      | HTTP/HTTPS/gRPC | HTTP/HTTPS/gRPC/TCP | HTTP/HTTPS/TCP | HTTP/HTTPS/gRPC |
| Config reload  | Reload required | Hot reload          | Hot reload     | Hot reload      |
| Rate limiting  | Annotations     | Middleware          | Built-in       | Not supported   |
| Authentication | Basic/OAuth     | Forward Auth        | Built-in       | Limited         |
| Performance    | High            | Medium              | Very high      | High            |
| Gateway API    | Supported       | Supported           | Partial        | Full            |
| Community      | Very large      | Medium              | Medium         | —               |

8. Gateway API: The Evolution of Ingress

8.1 Why Gateway API?

Limitations of Ingress:

  • Non-standard configuration via annotations
  • L7 HTTP only (no TCP/UDP/gRPC)
  • A single resource makes separation of roles difficult
  • No traffic splitting, header matching, or other advanced features

8.2 Gateway API Resource Hierarchy

┌──────────────────────────────────────────────────┐
│          Gateway API Resource Hierarchy          │
│                                                  │
│  Infrastructure provider:                        │
│  ┌─────────────┐                                 │
│  │GatewayClass │  which controller to use        │
│  └──────┬──────┘                                 │
│         │                                        │
│  Cluster operator:                               │
│  ┌──────▼──────┐                                 │
│  │  Gateway    │  listeners (port, proto, TLS)   │
│  └──────┬──────┘                                 │
│         │                                        │
│  Application developer:                          │
│  ┌──────▼──────┐                                 │
│  │ HTTPRoute   │  routing rules (host, path,     │
│  │ TCPRoute    │  headers)                       │
│  │ GRPCRoute   │                                 │
│  └─────────────┘                                 │
└──────────────────────────────────────────────────┘

8.3 Gateway API in Practice

# 1. GatewayClass - defined by the infrastructure provider
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: cilium
spec:
  controllerName: io.cilium/gateway-controller

---
# 2. Gateway - defined by the cluster operator
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: my-gateway
  namespace: gateway-infra
spec:
  gatewayClassName: cilium
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    tls:
      mode: Terminate
      certificateRefs:
      - kind: Secret
        name: wildcard-tls
    allowedRoutes:
      namespaces:
        from: Selector
        selector:
          matchLabels:
            shared-gateway: "true"

---
# 3. HTTPRoute - defined by the application developer
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: app-routes
  namespace: my-app
spec:
  parentRefs:
  - name: my-gateway
    namespace: gateway-infra
  hostnames:
  - "app.example.com"
  rules:
  # Header-based routing
  - matches:
    - headers:
      - name: "X-Canary"
        value: "true"
    backendRefs:
    - name: app-canary
      port: 80
      weight: 100
  # Traffic splitting (canary deployment)
  - matches:
    - path:
        type: PathPrefix
        value: /api
    backendRefs:
    - name: app-stable
      port: 80
      weight: 90
    - name: app-canary
      port: 80
      weight: 10
  # Default route
  - backendRefs:
    - name: app-stable
      port: 80

8.4 Ingress vs Gateway API

| Feature           | Ingress                    | Gateway API                |
|-------------------|----------------------------|----------------------------|
| Role separation   | Single resource            | GatewayClass/Gateway/Route |
| Protocols         | HTTP/HTTPS only            | HTTP/HTTPS/TCP/UDP/gRPC    |
| Traffic splitting | Annotations (non-standard) | Native weight support      |
| Header matching   | Annotations (non-standard) | In the standard spec       |
| TLS config        | Basic                      | Fine-grained control       |
| Cross-namespace   | Difficult                  | Native support             |
| Extensibility     | Annotations only           | Policy API extensions      |
| Status            | GA (stable)                | GA (v1.0+, 2023)           |

9. Network Policy: Micro-Segmentation

9.1 Basic Concepts

A Network Policy is a Pod-level firewall: it allows or blocks traffic to and from the Pods it selects.

Important: with no Network Policy in place, all traffic is allowed. As soon as any policy selects a Pod, only explicitly allowed traffic passes for that Pod.

9.2 Default Deny Policies

# Deny all ingress in the namespace (default deny)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}           # applies to every Pod
  policyTypes:
  - Ingress                 # blocks ingress only (egress still allowed)

---
# Deny all egress in the namespace too
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

9.3 Network Policy in Practice

# Backend Pods: accept traffic only from the frontend
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    # allow only frontend Pods in the same namespace
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
  egress:
  # allow access to the DB
  - to:
    - podSelector:
        matchLabels:
          app: postgres
    ports:
    - protocol: TCP
      port: 5432
  # allow DNS (essential!)
  - to:
    - namespaceSelector: {}
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
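Conceptually, evaluating a policy like the one above is a match over labels and ports. A toy evaluator in plain Python (illustration only -- real enforcement happens in the CNI data plane, and real selectors also support namespaceSelector, ipBlock, and match expressions):

```python
def ingress_allowed(rules, src_labels, port, proto="TCP"):
    """Toy NetworkPolicy ingress evaluation: traffic is allowed iff some
    rule's podSelector matches the source labels AND the rule lists the
    destination port/protocol."""
    for rule in rules:
        sel = rule["podSelector"]["matchLabels"]
        if all(src_labels.get(k) == v for k, v in sel.items()):
            if (port, proto) in rule["ports"]:
                return True
    return False  # no rule matched: default deny once a policy selects the Pod

rules = [{"podSelector": {"matchLabels": {"app": "frontend"}},
          "ports": [(8080, "TCP")]}]
print(ingress_allowed(rules, {"app": "frontend"}, 8080))  # True
print(ingress_allowed(rules, {"app": "batch"}, 8080))     # False (wrong labels)
print(ingress_allowed(rules, {"app": "frontend"}, 9090))  # False (wrong port)
```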

9.4 Cilium Network Policy (L7 Extensions)

# Cilium: policy by HTTP method/path
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-l7-policy
spec:
  endpointSelector:
    matchLabels:
      app: api-server
  ingress:
  - fromEndpoints:
    - matchLabels:
        role: reader
    toPorts:
    - ports:
      - port: "443"
        protocol: TCP
      rules:
        http:
        - method: GET                # reads only
  - fromEndpoints:
    - matchLabels:
        role: writer
    toPorts:
    - ports:
      - port: "443"
        protocol: TCP
      rules:
        http:
        - method: GET
        - method: POST               # writes allowed too
        - method: PUT
        - method: DELETE

---
# Cilium: DNS-based egress policy
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: dns-egress-policy
spec:
  endpointSelector:
    matchLabels:
      app: payment
  egress:
  - toFQDNs:
    - matchPattern: "*.stripe.com"   # allow only the Stripe API
    - matchName: "api.paypal.com"    # allow only the PayPal API
    toPorts:
    - ports:
      - port: "443"
        protocol: TCP

10. eBPF Networking: Beyond iptables

10.1 What Is eBPF?

eBPF (extended Berkeley Packet Filter) lets you run programs in kernel space without modifying the kernel itself.

┌──────────────────────────────────────────────────┐
│                eBPF Architecture                 │
│                                                  │
│  User Space                                      │
│  ┌──────────────────────────────────────────┐    │
│  │  Cilium Agent / Hubble                   │    │
│  │  (manages eBPF programs, applies policy) │    │
│  └──────────────┬───────────────────────────┘    │
│                 │ BPF syscall                    │
│  ───────────────┼──────────────────────────────  │
│  Kernel Space   │                                │
│  ┌──────────────▼───────────────────────────┐    │
│  │  BPF Program (verified, then JIT'd)      │    │
│  │                                          │    │
│  │  Hook points:                            │    │
│  │  ┌─────┐ ┌──────┐ ┌────────┐ ┌──────┐    │    │
│  │  │ XDP │ │tc/cls│ │ socket │ │kprobe│    │    │
│  │  └─────┘ └──────┘ └────────┘ └──────┘    │    │
│  │                                          │    │
│  │  BPF maps (data shared across programs)  │    │
│  │  ┌──────────────────────────────────┐    │    │
│  │  │ Service Map │ Endpoint Map │ CT  │    │    │
│  │  └──────────────────────────────────┘    │    │
│  └──────────────────────────────────────────┘    │
└──────────────────────────────────────────────────┘

10.2 Why eBPF Replaces iptables

| Aspect              | iptables                   | eBPF                       |
|---------------------|----------------------------|----------------------------|
| Execution point     | Kernel netfilter framework | Various kernel hook points |
| Rule matching       | Linear scan O(n)           | Hash map O(1)              |
| Updates             | Full chain rewrite         | Map entries only           |
| Observability       | Limited (counters only)    | Rich metrics and events    |
| L7 handling         | Not possible               | Possible (HTTP, DNS, ...)  |
| Connection tracking | conntrack (shared)         | Own BPF CT (efficient)     |
| CPU usage           | High (at scale)            | Low                        |
| Scalability         | Degrades at 10K Services   | Stable at 100K+            |
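The heart of this table is the lookup model: the eBPF datapath turns service resolution into hash-map lookups. A minimal sketch of that idea, with plain Python dicts standing in for BPF maps:

```python
# A dict standing in for the BPF service map: VIP:port -> backend list.
service_map = {
    ("10.96.0.10", 80): ["10.1.1.2:8080", "10.1.1.3:8080", "10.1.2.4:8080"],
}

def ebpf_style_dnat(vip, port, flow_hash):
    """One O(1) map lookup instead of walking rule chains; the flow hash
    then picks a backend so a connection keeps hitting the same Pod."""
    backends = service_map.get((vip, port))    # O(1), regardless of service count
    if backends is None:
        return None                            # not a service VIP: pass through
    return backends[flow_hash % len(backends)]

print(ebpf_style_dnat("10.96.0.10", 80, flow_hash=7))  # 10.1.1.3:8080
```

In the real datapath the maps live in the kernel, the lookup runs inside a verified BPF program at the hook point, and backend selection uses Maglev hashing rather than a plain modulo.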

10.3 Cilium Hubble: eBPF-Based Observability

# Monitor traffic with the Hubble CLI
hubble observe --namespace production

# Traffic for a specific Pod
hubble observe --pod production/frontend-abc123

# Monitor DNS queries
hubble observe --protocol DNS

# Traffic dropped by Network Policy
hubble observe --verdict DROPPED

# Monitor HTTP requests (L7)
hubble observe --protocol HTTP --http-method GET

# Sample output:
# TIMESTAMP            SOURCE              DESTINATION         TYPE     VERDICT
# Apr 14 09:15:23.456  prod/frontend-xxx   prod/backend-yyy    L4/TCP   FORWARDED
# Apr 14 09:15:23.789  prod/backend-yyy    prod/postgres-zzz   L4/TCP   FORWARDED
# Apr 14 09:15:24.012  prod/frontend-xxx   prod/redis-aaa      L4/TCP   DROPPED

11. Hands-On Network Debugging

11.1 Packet Capture with tcpdump

# Run tcpdump in an ephemeral container (K8s 1.25+)
kubectl debug -it pod/my-app-xxx \
  --image=nicolaka/netshoot \
  --target=my-app-container \
  -- tcpdump -i eth0 -nn port 8080

# Capture traffic to/from a specific host
kubectl debug -it pod/my-app-xxx \
  --image=nicolaka/netshoot \
  --target=my-app-container \
  -- tcpdump -i eth0 -nn host 10.1.2.3

# Capture DNS queries
kubectl debug -it pod/my-app-xxx \
  --image=nicolaka/netshoot \
  --target=my-app-container \
  -- tcpdump -i eth0 -nn port 53

11.2 Connectivity Testing

# Run a netshoot Pod for all-purpose testing
kubectl run netshoot --image=nicolaka/netshoot --rm -it -- bash

# TCP connectivity test
curl -v telnet://my-service:8080

# Check DNS resolution
dig my-service.default.svc.cluster.local
dig +trace my-service.default.svc.cluster.local

# Check the MTU
ping -M do -s 1472 target-pod-ip  # 1472 + 28 = 1500

# Check the routing table
ip route show

# Check the ARP table
ip neigh show

# Inspect iptables rules (on the node)
iptables -t nat -L KUBE-SERVICES -n --line-numbers

11.3 Network Policy Troubleshooting

# 1. List the Network Policies currently applied
kubectl get networkpolicy -A

# 2. Inspect the policies affecting a given Pod
kubectl describe networkpolicy -n production

# 3. With Cilium: endpoint policy state
cilium endpoint list
cilium policy get

# 4. Connectivity test
kubectl exec -it frontend-pod -- curl -v http://backend-service:8080

# 5. Verify DNS is allowed (essential with egress policies)
kubectl exec -it my-pod -- nslookup kubernetes.default

# Common mistakes:
# - Forgetting to allow DNS (UDP/TCP 53) in egress policies
# - Using only podSelector without a namespaceSelector
#   (it will not match Pods in other namespaces)
# - Omitting Egress from policyTypes (egress rules are then ignored)

11.4 DNS Troubleshooting Checklist

# 1. Check CoreDNS Pod status
kubectl get pods -n kube-system -l k8s-app=kube-dns

# 2. Check CoreDNS logs
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=100

# 3. Check the DNS Service endpoints
kubectl get endpoints kube-dns -n kube-system

# 4. Check resolv.conf
kubectl exec my-pod -- cat /etc/resolv.conf

# 5. Query the DNS server directly
kubectl exec my-pod -- nslookup kubernetes.default.svc.cluster.local 10.96.0.10

# 6. Test external name resolution
kubectl exec my-pod -- nslookup google.com

# For DNS performance problems:
# - lower ndots:5 to ndots:2
# - use NodeLocal DNSCache
# - enable the autopath plugin

12. Performance Tuning

12.1 MTU Optimization

MTU chain:
Pod (MTU) → overlay (VXLAN -50, IPinIP -20) → physical NIC (MTU)

Recommended settings:
- Physical NIC: jumbo frames, MTU 9000 (where possible)
- VXLAN overlay: physical MTU - 50 = 8950
- IPinIP overlay: physical MTU - 20 = 8980
- WireGuard encryption: physical MTU - 60 = 8940

With a default 1500-byte NIC:
- VXLAN: 1500 - 50 = 1450
- IPinIP: 1500 - 20 = 1480

# Calico MTU configuration
apiVersion: crd.projectcalico.org/v1
kind: FelixConfiguration
metadata:
  name: default
spec:
  mtu: 8950              # physical NIC 9000 - VXLAN 50
  vxlanMTU: 8950
  wireguardMTU: 8940
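The arithmetic behind these values is worth keeping as a checklist. A small helper that reproduces them (the 50/20 figures are the standard VXLAN/IPinIP overheads; 60 is a commonly used safe allowance for WireGuard over IPv4):

```python
# Per-packet encapsulation overhead in bytes.
OVERHEAD = {"none": 0, "vxlan": 50, "ipip": 20, "wireguard": 60}

def pod_mtu(nic_mtu: int, encap: str) -> int:
    """MTU to configure on the Pod side for a given encapsulation."""
    return nic_mtu - OVERHEAD[encap]

# Reproduce the recommended values for a 9000-byte jumbo-frame NIC:
for encap in ("vxlan", "ipip", "wireguard"):
    print(encap, pod_mtu(9000, encap))
# vxlan 8950, ipip 8980, wireguard 8940
```

Getting this wrong in either direction hurts: too large causes fragmentation or silent drops inside the tunnel, too small wastes throughput on every packet.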

12.2 NodeLocal DNSCache

NodeLocal DNSCache caches DNS answers on each node, reducing the load on CoreDNS.

# NodeLocal DNSCache DaemonSet (simplified)
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-local-dns
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: node-local-dns
  template:
    spec:
      containers:
      - name: node-cache
        image: registry.k8s.io/dns/k8s-dns-node-cache:1.23.0
        args:
        - "-localip"
        - "169.254.20.10"    # 링크 로컬 IP
        - "-conf"
        - "/etc/Corefile"
        - "-upstreamsvc"
        - "kube-dns"

NodeLocal DNSCache in operation:

Pod → 169.254.20.10 (NodeLocal) → CoreDNS (on cache miss)
          answers immediately on a local cache hit

Effects:
- DNS latency cut by 50% or more
- CoreDNS load reduced by 70-80%
- conntrack contention eliminated
- UDP DNS packet loss avoided

12.3 TCP Tuning

# Node-level kernel parameter tuning

# Connection tracking table size
net.netfilter.nf_conntrack_max = 1048576

# TCP buffer sizes
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

# TCP connection reuse
net.ipv4.tcp_tw_reuse = 1

# SYN backlog
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535

# Keep-alive
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 10

13. Multi-Cluster Networking

13.1 Cilium ClusterMesh

┌──────────────┐         ┌──────────────┐
│  Cluster A   │         │  Cluster B   │
│ (us-east-1)  │         │ (eu-west-1)  │
│              │         │              │
│  ┌────────┐  │ Tunnel  │  ┌────────┐  │
│  │Cilium  │◄─┼─────────┼─►│Cilium  │  │
│  │Agent   │  │         │  │Agent   │  │
│  └────────┘  │         │  └────────┘  │
│              │         │              │
│  Pod CIDR:   │         │  Pod CIDR:   │
│  10.1.0.0/16 │         │  10.2.0.0/16 │
│              │         │              │
│  Service:    │ Global  │  Service:    │
│  shared-db   │ Service │  shared-db   │
│  (annotated) │         │  (annotated) │
└──────────────┘         └──────────────┘

# Global Service configuration (applied in both clusters)
apiVersion: v1
kind: Service
metadata:
  name: shared-database
  annotations:
    service.cilium.io/global: "true"
    service.cilium.io/shared: "true"
spec:
  selector:
    app: postgres
  ports:
  - port: 5432

14. Quiz

Q1. Why does every Pod in Kubernetes get a unique IP and communicate without NAT?

Answer: Because of the three fundamental principles of the Kubernetes networking model:

  1. Every Pod gets a unique IP address
  2. Every Pod can reach every other Pod without NAT
  3. The IP a Pod sees for itself is the same IP other Pods see

This model removes the complexity of port mapping and NAT, keeping service discovery and network policy simple. CNI plugins implement these requirements.

Q2. Why is eBPF a better fit than iptables for Kubernetes networking?

Answer: iptables scans rules linearly (O(n)) and rewrites entire chains on every change, degrading badly beyond ~10,000 Services.

eBPF:

  • Hash-map O(1) lookups keep performance constant regardless of Service count
  • Changes update only map entries, so rule updates are fast
  • L7 policy (HTTP method/path) is handled directly in the kernel
  • Hubble provides rich observability

Cilium uses eBPF to replace kube-proxy entirely.

Q3. What are the key reasons Gateway API is replacing Ingress?

Answer: The core limitations of Ingress:

  • Relies on non-standard annotations, so configuration differs per controller
  • Supports only HTTP/HTTPS -- no TCP/UDP/gRPC
  • A single resource makes it hard to separate infrastructure, operator, and developer roles
  • Traffic splitting, header matching, and other advanced features are missing from the standard

Gateway API separates roles across the GatewayClass/Gateway/Route hierarchy and standardizes traffic splitting, header matching, and multiple protocols.

Q4. What is the most common mistake when writing egress rules in a Network Policy?

Answer: Forgetting to allow DNS (UDP/TCP port 53).

Once an egress policy applies, all outbound traffic is blocked -- including DNS queries, so Service name resolution fails. You must explicitly allow egress to the kube-dns (CoreDNS) Pods on UDP/TCP 53.

Other common mistakes:

  • Omitting Egress from policyTypes
  • Trying to match Pods in other namespaces without a namespaceSelector
  • Wrong Pod CIDR ranges in CIDR-based rules

Q5. Why does ndots:5 hurt external DNS lookup performance, and what are the fixes?

Answer: With the Kubernetes default of ndots:5, a name like "api.example.com" (fewer than 5 dots) is first expanded through the search domains in order:

  1. api.example.com.default.svc.cluster.local (fails)
  2. api.example.com.svc.cluster.local (fails)
  3. api.example.com.cluster.local (fails)
  4. api.example.com (succeeds)

That is 3 wasted queries for a single external lookup.

Fixes:

  • Lower ndots to 2 via the Pod's dnsConfig
  • Use FQDNs (trailing dot: api.example.com.)
  • Cache with NodeLocal DNSCache
  • Use the CoreDNS autopath plugin

15. References

  1. Kubernetes Networking Model - Official Documentation
  2. Cilium Documentation - eBPF-based Networking
  3. Calico Documentation - Project Calico
  4. Gateway API - Official Specification
  5. CoreDNS - DNS for Service Discovery
  6. Network Policies - Kubernetes Documentation
  7. eBPF.io - Introduction to eBPF
  8. Hubble - Network Observability for Kubernetes
  9. IPVS-based kube-proxy - Kubernetes Blog
  10. CNI Specification - Container Network Interface
  11. NodeLocal DNSCache - Kubernetes Documentation
  12. Cilium ClusterMesh - Multi-Cluster Networking
  13. Life of a Packet in Kubernetes - Conference Talk

Kubernetes Networking Deep Dive Guide 2025: CNI, Service, Ingress, DNS, Network Policy

Table of Contents

1. Fundamental Principles of the Kubernetes Networking Model

Kubernetes networking is built on three fundamental principles.

1.1 Core Requirements

  1. Every Pod gets a unique IP address -- can communicate with others without NAT
  2. Every Pod can reach every other Pod -- whether on the same node or different nodes
  3. The IP a Pod sees for itself is the same IP others see -- no NAT
┌────────────────────────────────────────────────────┐
Kubernetes Networking Fundamentals│                                                    │
│  ┌──────────┐           ┌──────────┐              │
│  │  Node 1  │           │  Node 2  │              │
│  │          │           │          │              │
│  │ ┌──────┐ │   No NAT  │ ┌──────┐ │              │
│  │ │Pod A<├───────────>┤│Pod C │ │              │
│  │ │10.1.1│ │           │ │10.1.2│ │              │
│  │ └──────┘ │           │ └──────┘ │              │
│  │ ┌──────┐ │           │ ┌──────┐ │              │
│  │ │Pod B │ │           │ │Pod D │ │              │
│  │ │10.1.1│ │           │ │10.1.2│ │              │
│  │ └──────┘ │           │ └──────┘ │              │
│  └──────────┘           └──────────┘              │
│                                                    │
Pod A(10.1.1.2) -> Pod C(10.1.2.3): Direct comm  │
No NAT, no port mapping. Each Pod owns unique IP└────────────────────────────────────────────────────┘

1.2 Networking Layer Hierarchy

┌──────────────────────────────────────────┐
│       Kubernetes Networking Layers       │
├──────────────────────────────────────────┤
│  L7: Ingress / Gateway API               │
│      (HTTP routing, TLS termination)     │
├──────────────────────────────────────────┤
│  L4: Service                             │
│      (ClusterIP, NodePort, LoadBalancer) │
├──────────────────────────────────────────┤
│  L3: Pod Networking (CNI)                │
│      (Pod-to-Pod, overlay/underlay)      │
├──────────────────────────────────────────┤
│  L2-L3: Node Networking                  │
│      (physical/virtual network)          │
└──────────────────────────────────────────┘

2. Pod-to-Pod Networking: How It Works Internally

2.1 Pod Communication on the Same Node

┌────────────────────────────────────────────┐
│                    Node                    │
│                                            │
│  ┌──────┐   veth pair    ┌──────────┐      │
│  │Pod A │◄──────────────►│          │      │
│  │eth0  │                │   cbr0   │      │
│  │      │                │ (bridge) │      │
│  └──────┘                │          │      │
│                          │          │      │
│  ┌──────┐   veth pair    │          │      │
│  │Pod B │◄──────────────►│          │      │
│  │eth0  │                └──────────┘      │
│  └──────┘                                  │
│                                            │
│  1. Pod A sends a packet to Pod B          │
│  2. It travels its veth pair to the bridge │
│  3. The bridge finds the dest veth by MAC  │
│  4. Delivered to Pod B via its veth pair   │
└────────────────────────────────────────────┘

veth pair: A virtual ethernet pair -- one end is eth0 in the Pod namespace, the other connects to the node's bridge.

2.2 Pod Communication Across Nodes (Overlay)

┌──────────┐                      ┌──────────┐
│  Node 1  │                      │  Node 2  │
│          │                      │          │
│ ┌──────┐ │     VXLAN/IPinIP     │ ┌──────┐ │
│ │Pod A │ │    ┌───────────┐     │ │Pod C │ │
│ │10.1.1│ │───►│ Tunneling │────►│ │10.1.2│ │
│ └──────┘ │    │           │     │ └──────┘ │
│          │    │Encapsulate│     │          │
│ cbr0     │    │ original  │     │ cbr0     │
│ 10.1.1.0 │    │ in outer  │     │ 10.1.2.0 │
│          │    └───────────┘     │          │
│ eth0     │                      │ eth0     │
│ 192.168.1│                      │192.168.2 │
└──────────┘                      └──────────┘

VXLAN Encapsulation:
┌─────────┬─────┬───────┬──────────┬─────────┐
│ Outer IP│ UDP │ VXLAN │ Inner IP │ Payload │
│ Hdr     │ Hdr │ Hdr   │ Hdr      │         │
│ 192→192 │     │       │ 10→10    │ Data    │
└─────────┴─────┴───────┴──────────┴─────────┘

2.3 Overlay vs Underlay Networking

Feature               Overlay (VXLAN/IPinIP)   Underlay (BGP/Direct)
--------------------  -----------------------  ---------------------
Setup Difficulty      Easy                     Hard
Network Requirements  L3 connectivity only     BGP support needed
Performance Overhead  Yes (encapsulation)      None
MTU Impact            Reduced (50-54 bytes)    None
Debugging             Hard                     Easy
Use Case              Cloud, multi-tenant      Bare metal, high perf

3. Detailed CNI Plugin Comparison

3.1 What Is CNI

CNI (Container Network Interface) is the standard interface for container runtimes to interact with network plugins.

CNI Flow During Pod Creation:

1. kubelet requests container creation via CRI
2. Container runtime creates network namespace
3. kubelet invokes CNI plugin (ADD command)
4. CNI plugin:
   a. Creates veth pair
   b. Assigns IP address to Pod (IPAM)
   c. Configures routing rules
   d. Sets up overlay tunnel if needed
5. Pod is network-ready
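For reference, the plugin invoked in step 3 is selected by a network configuration file that the runtime reads on each node (conventionally under /etc/cni/net.d/). A minimal sketch matching the bridge-based setup described above -- the network name, bridge name, and subnet are illustrative values, not defaults:

```json
{
  "cniVersion": "1.0.0",
  "name": "mynet",
  "type": "bridge",
  "bridge": "cbr0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.244.1.0/24",
    "routes": [{ "dst": "0.0.0.0/0" }]
  }
}
```

The `type` field names the plugin binary (here the reference `bridge` plugin), and the `ipam` section delegates address assignment, which corresponds to step 4b above.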

3.2 Major CNI Plugin Comparison

Feature                 Calico            Cilium               Flannel        Weave Net
----------------------  ----------------  -------------------  -------------  ------------
Developer               Tigera            Isovalent (Cisco)    CoreOS         Weaveworks
Data Plane              iptables/eBPF     eBPF                 VXLAN          VXLAN/Sleeve
Network Mode            BGP/VXLAN/IPinIP  VXLAN/Native         VXLAN          VXLAN
Network Policy          Rich              Very rich (L7)       None           Basic
Encryption              WireGuard         WireGuard/IPsec      None           IPsec
kube-proxy Replacement  eBPF mode         Native eBPF          No             No
Observability           Basic             Hubble (powerful)    None           Scope
Service Mesh            None              Built-in (optional)  None           None
Multi-cluster           Supported         ClusterMesh          Not supported  Supported
Performance             High (BGP)        Very high (eBPF)     Medium         Low
Complexity              Medium            High                 Low            Low
Production Rec.         Strongly rec.     Strongly rec.        Small only     Not rec.

3.3 Calico Deep Dive

# Calico BGP mode configuration (IPPool)
apiVersion: crd.projectcalico.org/v1
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 10.244.0.0/16
  ipipMode: Never            # No encapsulation with BGP
  vxlanMode: Never
  natOutgoing: true
  nodeSelector: all()
  blockSize: 26              # /26 = 64 IPs per node

---
# Calico BGP Peering configuration
apiVersion: crd.projectcalico.org/v1
kind: BGPPeer
metadata:
  name: rack-tor-switch
spec:
  peerIP: 192.168.1.1
  asNumber: 64512
  nodeSelector: rack == "rack-1"
---
# Calico VXLAN mode (cloud environments)
apiVersion: crd.projectcalico.org/v1
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 10.244.0.0/16
  natOutgoing: true
  vxlanMode: Always          # VXLAN encapsulation

3.4 Cilium Deep Dive

# Cilium Helm install (kube-proxy replacement mode)
# helm install cilium cilium/cilium --namespace kube-system \
#   --set kubeProxyReplacement=true \
#   --set k8sServiceHost=API_SERVER_IP \
#   --set k8sServicePort=6443 \
#   --set hubble.enabled=true \
#   --set hubble.relay.enabled=true \
#   --set hubble.ui.enabled=true \
#   --set encryption.enabled=true \
#   --set encryption.type=wireguard

# Verify Cilium status
# cilium status
# cilium connectivity test

# Cilium L7 Network Policy (HTTP-based policy)
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: l7-rule
  namespace: default
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:
        - method: "GET"
          path: "/api/v1/.*"
        - method: "POST"
          path: "/api/v1/orders"

4. Service Deep Dive: ClusterIP to LoadBalancer

4.1 How Services Work

┌──────────────────────────────────────────────────┐
│               Service Traffic Flow               │
│                                                  │
│  Client Pod ──► Service VIP ──► kube-proxy       │
│                 (10.96.0.10)    (iptables/IPVS)  │
│                           │                      │
│                  ┌────────┼────────┐             │
│                  ▼        ▼        ▼             │
│                Pod A    Pod B    Pod C           │
│               10.1.1.2 10.1.1.3 10.1.2.4         │
│                                                  │
│  Service VIP is not bound to any real interface  │
│  iptables/IPVS rules DNAT to Pod IPs             │
└──────────────────────────────────────────────────┘

4.2 Service Types in Detail

# 1. ClusterIP (default) - internal access only
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: ClusterIP
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
  # Cluster IP: auto-assigned (e.g., 10.96.0.10)
  # Accessible only within cluster at 10.96.0.10:80
---
# 2. NodePort - external access via fixed port on all nodes
apiVersion: v1
kind: Service
metadata:
  name: my-nodeport-service
spec:
  type: NodePort
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
    nodePort: 30080    # Range: 30000-32767
  # Accessible via port 30080 on any node
  # NodeIP:30080 -> ClusterIP:80 -> Pod:8080
---
# 3. LoadBalancer - auto-provisions cloud load balancer
apiVersion: v1
kind: Service
metadata:
  name: my-lb-service
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
  - port: 443
    targetPort: 8443
  # External LB IP -> NodePort -> ClusterIP -> Pod
---
# 4. ExternalName - maps external DNS to cluster service
apiVersion: v1
kind: Service
metadata:
  name: external-db
spec:
  type: ExternalName
  externalName: mydb.example.com
  # external-db.default.svc.cluster.local -> mydb.example.com
---
# 5. Headless Service - returns Pod IPs directly, no ClusterIP
apiVersion: v1
kind: Service
metadata:
  name: my-headless-service
spec:
  clusterIP: None            # Declares Headless
  selector:
    app: my-stateful-app
  ports:
  - port: 5432
  # DNS queries return individual Pod IPs, not Service IP
  # Used with StatefulSet: pod-0.my-headless-service.default.svc

4.3 Session Affinity

apiVersion: v1
kind: Service
metadata:
  name: sticky-service
spec:
  selector:
    app: web
  ports:
  - port: 80
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800  # 3 hours
  # Same client IP routes to same Pod

5. kube-proxy Modes: iptables vs IPVS vs eBPF

5.1 iptables Mode (Default)

iptables rule chain (per Service):

PREROUTING -> KUBE-SERVICES -> KUBE-SVC-XXXX -> KUBE-SEP-YYYY
                                               (probabilistic)

Service with 3 Pods:
KUBE-SVC-XXXX:
  -p tcp -d 10.96.0.10 --dport 80
    -> 33% probability KUBE-SEP-1 (DNAT -> 10.1.1.2:8080)
    -> 33% probability KUBE-SEP-2 (DNAT -> 10.1.1.3:8080)
    -> 34% probability KUBE-SEP-3 (DNAT -> 10.1.2.4:8080)

Problems:
- Rule count grows proportionally with Services/Pods O(n)
- Full chain rewrite on rule updates
- Severe degradation at 10,000+ Services
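The probabilistic split above works because kube-proxy generates sequential rules where rule i (0-based, N endpoints) matches with probability 1/(N-i), which yields an even 1/N share per endpoint. A quick numeric sketch of that scheme (not kube-proxy's actual code):

```python
def endpoint_probabilities(n):
    """Selection probability per endpoint under sequential
    iptables --probability rules as kube-proxy generates them."""
    probs = []
    reach = 1.0                        # P(packet reaches rule i)
    for i in range(n):
        p_rule = 1.0 / (n - i)         # rule i's --probability value
        probs.append(reach * p_rule)   # P(rule i finally matches)
        reach *= 1.0 - p_rule
    return probs

# Three endpoints: rules fire with probabilities 1/3, 1/2, 1.0,
# which works out to an even 1/3 share for each endpoint.
print(endpoint_probabilities(3))
```

The last rule always has probability 1.0, so every packet that falls through the earlier rules is still DNATed somewhere.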

5.2 IPVS Mode

IPVS (IP Virtual Server):

┌───────────────────────────────────────────┐
│  IPVS Virtual Server: 10.96.0.10:80       │
│                                           │
│  Real Server 1: 10.1.1.2:8080  (weight 1) │
│  Real Server 2: 10.1.1.3:8080  (weight 1) │
│  Real Server 3: 10.1.2.4:8080  (weight 1) │
│                                           │
│  Algorithm: rr (Round Robin)              │
│  Others: lc, dh, sh, sed, nq              │
└───────────────────────────────────────────┘

Advantages:
- Hash table-based O(1) lookup
- Multiple load balancing algorithms
- Stable performance at scale
- Real-time statistics and connection tracking

# kube-proxy IPVS mode configuration
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  scheduler: "rr"       # rr, lc, dh, sh, sed, nq
  syncPeriod: "30s"
  minSyncPeriod: "2s"

5.3 eBPF Mode (Cilium)

eBPF kube-proxy replacement:

┌────────────────────────────────────────────┐
│  Before: Pod → iptables/IPVS → Pod         │
│  eBPF:   Pod → BPF program → Pod (direct)  │
│                                            │
│  ┌──────┐     BPF map      ┌──────┐        │
│  │Pod A │────(Service)────►│Pod B │        │
│  │      │     lookup       │      │        │
│  └──────┘                  └──────┘        │
│                                            │
│  No iptables chain traversal!              │
│  Direct packet redirect in kernel space    │
└────────────────────────────────────────────┘

Comparison           iptables        IPVS                 eBPF (Cilium)
-------------------  --------------  -------------------  -----------------
Lookup Complexity    O(n)            O(1)                 O(1)
Rule Updates         Full rewrite    Incremental          Incremental
Load Balancing       Probabilistic   Multiple algorithms  Maglev hash
Connection Tracking  conntrack       conntrack            BPF conntrack
Performance          Medium          High                 Very high
L7 Policy            No              No                   Yes
Observability        Limited         Basic                Hubble (powerful)
10K Services         Very slow       Fast                 Very fast

6. DNS: CoreDNS and Service Discovery

6.1 CoreDNS Architecture

┌──────────────────────────────────────────────────┐
│                CoreDNS Operation                 │
│                                                  │
│  Pod ──DNS query──► CoreDNS Pod (kube-system)    │
│        (nameserver: 10.96.0.10)                  │
│                     │                            │
│          ┌──────────┼──────────┐                 │
│          ▼          ▼          ▼                 │
│       K8s API    Corefile    Upstream DNS        │
│    (Service/Pod  (plugin     (external           │
│      records)     config)     domains)           │
└──────────────────────────────────────────────────┘

6.2 DNS Record Format

Service DNS:
  my-service.my-namespace.svc.cluster.local
  |--svcname-| |-namespace--|

Pod DNS:
  10-1-1-2.my-namespace.pod.cluster.local
  |IP(dots->dashes)|

Headless Service Pod DNS (StatefulSet):
  pod-0.my-headless.my-namespace.svc.cluster.local
  |podname| |-svcname--|

SRV Records:
  _http._tcp.my-service.my-namespace.svc.cluster.local
  -> Returns port number and hostname
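The naming rules above are purely mechanical, so they can be sketched in a few lines (cluster.local is the default cluster domain; the helper names are illustrative, not a Kubernetes API):

```python
CLUSTER_DOMAIN = "cluster.local"

def service_dns(svc, ns):
    """FQDN of a Service."""
    return f"{svc}.{ns}.svc.{CLUSTER_DOMAIN}"

def pod_dns(pod_ip, ns):
    """Pod A-records replace the dots of the IP with dashes."""
    return f"{pod_ip.replace('.', '-')}.{ns}.pod.{CLUSTER_DOMAIN}"

def statefulset_pod_dns(pod, headless_svc, ns):
    """Stable per-Pod name under a Headless Service."""
    return f"{pod}.{headless_svc}.{ns}.svc.{CLUSTER_DOMAIN}"

print(service_dns("my-service", "my-namespace"))
# my-service.my-namespace.svc.cluster.local
print(pod_dns("10.1.1.2", "my-namespace"))
# 10-1-1-2.my-namespace.pod.cluster.local
print(statefulset_pod_dns("pod-0", "my-headless", "my-namespace"))
# pod-0.my-headless.my-namespace.svc.cluster.local
```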

6.3 CoreDNS Corefile Configuration

# CoreDNS Corefile (ConfigMap)
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health {
          lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          fallthrough in-addr.arpa ip6.arpa
          ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
          max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }

6.4 DNS Debugging

# Create a debug Pod for DNS testing
kubectl run dnsutils --image=registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3 \
  --restart=Never -- sleep infinity

# Service DNS lookup
kubectl exec dnsutils -- nslookup my-service.default.svc.cluster.local

# Pod DNS lookup
kubectl exec dnsutils -- nslookup 10-1-1-2.default.pod.cluster.local

# Detailed DNS response
kubectl exec dnsutils -- dig my-service.default.svc.cluster.local +search +short

# Check CoreDNS logs
kubectl logs -n kube-system -l k8s-app=kube-dns -f

# Check resolv.conf
kubectl exec dnsutils -- cat /etc/resolv.conf
# nameserver 10.96.0.10
# search default.svc.cluster.local svc.cluster.local cluster.local
# options ndots:5

6.5 The ndots:5 Problem and Solutions

# Problem with default ndots:5:
# Looking up "api.example.com" (fewer than 5 dots):
# 1. api.example.com.default.svc.cluster.local (fail)
# 2. api.example.com.svc.cluster.local (fail)
# 3. api.example.com.cluster.local (fail)
# 4. api.example.com (success)
# -> 3 unnecessary queries for external DNS lookups!

# Solution 1: Customize Pod DNS config
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  dnsConfig:
    options:
    - name: ndots
      value: "2"       # Reduce ndots to 2
  containers:
  - name: app
    image: my-app

# Solution 2: Use FQDN directly (trailing dot)
# api.example.com.  <- trailing dot means FQDN
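The expansion behavior can be sketched as follows -- a simplification of how glibc/musl resolvers apply the search list, using the default Kubernetes search domains:

```python
DEFAULT_SEARCH = (
    "default.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
)

def queries_tried(name, ndots=5, search=DEFAULT_SEARCH):
    """Order of DNS queries a resolver attempts for `name`."""
    if name.endswith("."):              # trailing dot: FQDN, no search list
        return [name]
    expanded = [f"{name}.{d}" for d in search]
    if name.count(".") >= ndots:        # enough dots: try as-is first
        return [name] + expanded
    return expanded + [name]            # otherwise search list first

print(len(queries_tried("api.example.com")))          # 4 queries with ndots=5
print(queries_tried("api.example.com", ndots=2)[0])   # api.example.com (tried first)
print(queries_tried("api.example.com."))              # ['api.example.com.']
```

With ndots lowered to 2, "api.example.com" (2 dots) is tried as-is first, and a trailing dot skips the search list entirely.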

7. Ingress: HTTP/HTTPS Routing

7.1 Ingress Architecture

┌──────────────────────────────────────────────────┐
│               Ingress Architecture               │
│                                                  │
│               External Traffic                   │
│                      │                           │
│                      ▼                           │
│  ┌─────────────────────┐                         │
│  │  Ingress Controller │  (nginx, Traefik, ...)  │
│  │  (actual proxy)     │                         │
│  └─────────┬───────────┘                         │
│            │  Watches Ingress resources          │
│            │  (routing rules)                    │
│     ┌──────┼──────┐                              │
│     ▼      ▼      ▼                              │
│   Svc A  Svc B  Svc C                            │
│   /api   /web   /docs                            │
└──────────────────────────────────────────────────┘

7.2 Ingress Resource Example

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - app.example.com
    secretName: app-tls-secret
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 80
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend-service
            port:
              number: 80

7.3 Ingress Controller Comparison

Feature        nginx-ingress    Traefik              HAProxy         Contour
-------------  ---------------  -------------------  --------------  ---------------
Developer      Kubernetes/F5    Traefik Labs         HAProxy Tech    VMware
Protocols      HTTP/HTTPS/gRPC  HTTP/HTTPS/gRPC/TCP  HTTP/HTTPS/TCP  HTTP/HTTPS/gRPC
Config Reload  Restart needed   Hot reload           Hot reload      Hot reload
Rate Limiting  Annotations      Middleware           Built-in        Not supported
Auth           Basic/OAuth      Forward Auth         Built-in        Limited
Performance    High             Medium               Very high       High
Gateway API    Supported        Supported            Partial         Full support
Community      Very large       Large                Medium          Medium

8. Gateway API: The Evolution of Ingress

8.1 Why Gateway API

Limitations of Ingress:

  • Non-standard configuration dependent on annotations
  • Only L7 HTTP supported (no TCP/UDP/gRPC)
  • Difficult role separation with a single resource
  • No advanced features like traffic splitting, header matching

8.2 Gateway API Resource Hierarchy

┌──────────────────────────────────────────────────┐
│          Gateway API Resource Hierarchy          │
│                                                  │
│  Infrastructure Admin:                           │
│  ┌─────────────┐                                 │
│  │GatewayClass │  Which controller to use        │
│  └──────┬──────┘                                 │
│         │  Cluster Operator:                     │
│  ┌──────▼──────┐                                 │
│  │   Gateway   │  Listeners (port, protocol, TLS)│
│  └──────┬──────┘                                 │
│         │  Application Developer:                │
│  ┌──────▼──────┐                                 │
│  │ HTTPRoute   │  Routing rules (host/path/hdr)  │
│  │ TCPRoute    │                                 │
│  │ GRPCRoute   │                                 │
│  └─────────────┘                                 │
└──────────────────────────────────────────────────┘

8.3 Gateway API Practical Example

# 1. GatewayClass - Defined by infra admin
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: cilium
spec:
  controllerName: io.cilium/gateway-controller

---
# 2. Gateway - Defined by cluster operator
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: my-gateway
  namespace: gateway-infra
spec:
  gatewayClassName: cilium
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    tls:
      mode: Terminate
      certificateRefs:
      - kind: Secret
        name: wildcard-tls
    allowedRoutes:
      namespaces:
        from: Selector
        selector:
          matchLabels:
            shared-gateway: "true"

---
# 3. HTTPRoute - Defined by developer
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: app-routes
  namespace: my-app
spec:
  parentRefs:
  - name: my-gateway
    namespace: gateway-infra
  hostnames:
  - "app.example.com"
  rules:
  # Header-based routing
  - matches:
    - headers:
      - name: "X-Canary"
        value: "true"
    backendRefs:
    - name: app-canary
      port: 80
      weight: 100
  # Traffic splitting (Canary deployment)
  - matches:
    - path:
        type: PathPrefix
        value: /api
    backendRefs:
    - name: app-stable
      port: 80
      weight: 90
    - name: app-canary
      port: 80
      weight: 10
  # Default routing
  - backendRefs:
    - name: app-stable
      port: 80

8.4 Ingress vs Gateway API Comparison

Feature            Ingress                     Gateway API
-----------------  --------------------------  --------------------------
Role Separation    Single resource             GatewayClass/Gateway/Route
Protocols          HTTP/HTTPS only             HTTP/HTTPS/TCP/UDP/gRPC
Traffic Splitting  Annotations (non-standard)  Native weight support
Header Matching    Annotations (non-standard)  Standard spec
TLS Config         Basic                       Fine-grained control
Cross-namespace    Difficult                   Native support
Extensibility      Annotations only            Policy API extension
Status             GA (stable)                 GA (v1.0+, 2023)

9. Network Policy: Microsegmentation

9.1 Basic Concept

Network Policy is a Pod-level firewall. It allows or blocks traffic to selected Pods.

Important: Without Network Policy, all traffic is allowed. Once any Network Policy applies to a Pod, only explicitly allowed traffic passes through.
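These default-allow, additive-allow semantics can be sketched in a few lines of pseudologic -- a deliberate simplification that ignores namespaces, ports, and egress, with illustrative label values:

```python
def is_ingress_allowed(src_labels, dst_labels, policies):
    """Sketch of NetworkPolicy ingress semantics:
    - no policy selects the destination Pod -> all traffic allowed
    - otherwise traffic passes iff ANY selecting policy has a rule
      matching the source (policies are additive allow-lists)."""
    def matches(selector, labels):
        # empty selector {} matches everything (all() over no items)
        return all(labels.get(k) == v for k, v in selector.items())

    selecting = [p for p in policies
                 if matches(p["podSelector"], dst_labels)]
    if not selecting:
        return True                     # default: allow everything
    return any(matches(rule, src_labels)
               for p in selecting
               for rule in p["allow_from"])

deny_all = {"podSelector": {}, "allow_from": []}   # selects every Pod
allow_fe = {"podSelector": {"app": "backend"},
            "allow_from": [{"app": "frontend"}]}

backend = {"app": "backend"}
print(is_ingress_allowed({"app": "frontend"}, backend, [deny_all, allow_fe]))  # True
print(is_ingress_allowed({"app": "batch"}, backend, [deny_all, allow_fe]))     # False
```

Note that adding `allow_fe` does not "override" `deny_all`; the union of all selecting policies' rules determines what is allowed.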

9.2 Default Deny Policy

# Deny all Ingress in namespace (default deny)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}           # Applies to all Pods
  policyTypes:
  - Ingress                 # Block Ingress only (Egress allowed)

---
# Deny all Egress too
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

9.3 Practical Network Policy Examples

# Backend Pod: Allow access only from frontend
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    # Allow only frontend Pods in same namespace
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
  egress:
  # Allow DB access
  - to:
    - podSelector:
        matchLabels:
          app: postgres
    ports:
    - protocol: TCP
      port: 5432
  # Allow DNS (essential!)
  - to:
    - namespaceSelector: {}
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
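One subtlety worth a concrete example: a bare podSelector only matches Pods in the policy's own namespace. To admit Pods from another namespace, combine namespaceSelector and podSelector inside a single `from` entry. A sketch assuming a monitoring namespace and Prometheus labels (illustrative values):

```yaml
# Allow Prometheus in the monitoring namespace to scrape the backend
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-prometheus-scrape
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    # One list item with BOTH selectors = AND;
    # two separate list items would mean OR
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: monitoring
      podSelector:
        matchLabels:
          app: prometheus
    ports:
    - protocol: TCP
      port: 9090
```

The `kubernetes.io/metadata.name` label is set automatically on every namespace (Kubernetes 1.21+), which makes it a convenient selector target.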

9.4 Cilium Network Policy (L7 Extension)

# Cilium: HTTP method/path based policy
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-l7-policy
spec:
  endpointSelector:
    matchLabels:
      app: api-server
  ingress:
  - fromEndpoints:
    - matchLabels:
        role: reader
    toPorts:
    - ports:
      - port: "443"
        protocol: TCP
      rules:
        http:
        - method: GET                # Read only
  - fromEndpoints:
    - matchLabels:
        role: writer
    toPorts:
    - ports:
      - port: "443"
        protocol: TCP
      rules:
        http:
        - method: GET
        - method: POST               # Write allowed too
        - method: PUT
        - method: DELETE

---
# Cilium: DNS-based Egress policy
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: dns-egress-policy
spec:
  endpointSelector:
    matchLabels:
      app: payment
  egress:
  - toFQDNs:
    - matchPattern: "*.stripe.com"   # Only Stripe API
    - matchName: "api.paypal.com"    # Only PayPal API
    toPorts:
    - ports:
      - port: "443"
        protocol: TCP

10. eBPF Networking: Beyond iptables

10.1 What Is eBPF

eBPF (extended Berkeley Packet Filter) enables running programs in kernel space without modifying the kernel.

┌──────────────────────────────────────────────────┐
│                eBPF Architecture                 │
│                                                  │
│  User Space                                      │
│  ┌──────────────────────────────────────────┐    │
│  │  Cilium Agent / Hubble                   │    │
│  │  (eBPF program mgmt, policy enforcement) │    │
│  └──────────────┬───────────────────────────┘    │
│                 │  BPF syscall                   │
│  ───────────────┼─────────────────────────────   │
│  Kernel Space   │                                │
│  ┌──────────────▼───────────────────────────┐    │
│  │  BPF Program (verified, JIT compiled)    │    │
│  │                                          │    │
│  │  Hook Points:                            │    │
│  │  ┌─────┐ ┌──────┐ ┌────────┐ ┌──────┐    │    │
│  │  │ XDP │ │tc/cls│ │ socket │ │kprobe│    │    │
│  │  └─────┘ └──────┘ └────────┘ └──────┘    │    │
│  │                                          │    │
│  │  BPF Maps (data sharing between progs)   │    │
│  │  ┌──────────────────────────────────┐    │    │
│  │  │ Service Map | Endpoint Map | CT  │    │    │
│  │  └──────────────────────────────────┘    │    │
│  └──────────────────────────────────────────┘    │
└──────────────────────────────────────────────────┘

10.2 Why eBPF Replaces iptables

Comparison      iptables                    eBPF
--------------  --------------------------  --------------------------
Execution       Kernel netfilter framework  Kernel hook points
Rule Matching   Linear scan O(n)            Hash map O(1)
Updates         Full chain rewrite          Map entry update
Observability   Limited (counters only)     Rich metrics, events
L7 Processing   Not possible                Possible (HTTP, DNS, etc.)
Conn. Tracking  conntrack (shared)          BPF CT (efficient)
CPU Usage       High (at scale)             Low
Scalability     Degrades at 10K services    Stable at 100K+

10.3 Cilium Hubble: eBPF-Based Observability

# Monitor traffic with Hubble CLI
hubble observe --namespace production

# Check traffic for specific Pod
hubble observe --pod production/frontend-abc123

# Monitor DNS queries
hubble observe --protocol DNS

# Check dropped traffic (Network Policy)
hubble observe --verdict DROPPED

# Monitor HTTP requests (L7)
hubble observe --protocol HTTP --http-method GET

# Example output:
# TIMESTAMP            SOURCE              DESTINATION         TYPE     VERDICT
# Apr 14 09:15:23.456  prod/frontend-xxx   prod/backend-yyy    L4/TCP   FORWARDED
# Apr 14 09:15:23.789  prod/backend-yyy    prod/postgres-zzz   L4/TCP   FORWARDED
# Apr 14 09:15:24.012  prod/frontend-xxx   prod/redis-aaa      L4/TCP   DROPPED

11. Practical Network Debugging Guide

11.1 Packet Capture with tcpdump

# Run tcpdump via ephemeral container (K8s 1.25+)
kubectl debug -it pod/my-app-xxx \
  --image=nicolaka/netshoot \
  --target=my-app-container \
  -- tcpdump -i eth0 -nn port 8080

# Capture traffic with specific host
kubectl debug -it pod/my-app-xxx \
  --image=nicolaka/netshoot \
  --target=my-app-container \
  -- tcpdump -i eth0 -nn host 10.1.2.3

# Capture DNS queries
kubectl debug -it pod/my-app-xxx \
  --image=nicolaka/netshoot \
  --target=my-app-container \
  -- tcpdump -i eth0 -nn port 53

11.2 Network Connectivity Testing

# Comprehensive testing with netshoot Pod
kubectl run netshoot --image=nicolaka/netshoot --rm -it -- bash

# TCP connection test
curl -v telnet://my-service:8080

# DNS resolution check
dig my-service.default.svc.cluster.local
dig +trace my-service.default.svc.cluster.local

# MTU check
ping -M do -s 1472 target-pod-ip  # 1472 + 28 = 1500

# Routing table check
ip route show

# ARP table check
ip neigh show

# iptables rules check (on node)
iptables -t nat -L KUBE-SERVICES -n --line-numbers

11.3 Network Policy Troubleshooting

# 1. Check current Network Policies
kubectl get networkpolicy -A

# 2. Check policies applied to specific Pod
kubectl describe networkpolicy -n production

# 3. For Cilium: endpoint policy status
cilium endpoint list
cilium policy get

# 4. Connection test
kubectl exec -it frontend-pod -- curl -v http://backend-service:8080

# 5. Verify DNS is allowed (essential for Egress Policy)
kubectl exec -it my-pod -- nslookup kubernetes.default

# Common mistakes:
# - Missing DNS (UDP/TCP 53) in Egress Policy
# - Using podSelector without namespaceSelector
#   (won't match Pods in other namespaces)
# - Missing Egress in policyTypes (Egress rules ignored)

11.4 DNS Troubleshooting Checklist

# 1. Check CoreDNS Pod status
kubectl get pods -n kube-system -l k8s-app=kube-dns

# 2. Check CoreDNS logs
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=100

# 3. Check DNS service endpoints
kubectl get endpoints kube-dns -n kube-system

# 4. Check resolv.conf
kubectl exec my-pod -- cat /etc/resolv.conf

# 5. Direct DNS query test
kubectl exec my-pod -- nslookup kubernetes.default.svc.cluster.local 10.96.0.10

# 6. External DNS resolution test
kubectl exec my-pod -- nslookup google.com

# For DNS performance issues:
# - Reduce ndots:5 to ndots:2
# - Use NodeLocal DNSCache
# - Enable autopath plugin

12. Performance Tuning

12.1 MTU Optimization

MTU Chain:
Pod (MTU) -> Overlay (VXLAN -50, IPinIP -20) -> Physical NIC (MTU)

Recommended settings:
- Physical NIC: Jumbo Frame 9000 (if possible)
- VXLAN overlay: Physical MTU - 50 = 8950
- IPinIP overlay: Physical MTU - 20 = 8980
- WireGuard encryption: Physical MTU - 60 = 8940

Default 1500 NIC:
- VXLAN: 1500 - 50 = 1450
- IPinIP: 1500 - 20 = 1480
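The arithmetic above is simple enough to check in code. A sketch using the per-encapsulation overhead figures listed in this section:

```python
ENCAP_OVERHEAD = {        # bytes consumed by each encapsulation header
    None: 0,
    "vxlan": 50,
    "ipinip": 20,
    "wireguard": 60,
}

def pod_mtu(nic_mtu, encap=None):
    """Effective Pod MTU for a given physical NIC MTU and encapsulation."""
    return nic_mtu - ENCAP_OVERHEAD[encap]

print(pod_mtu(1500, "vxlan"))      # 1450
print(pod_mtu(1500, "ipinip"))     # 1480
print(pod_mtu(9000, "vxlan"))      # 8950
print(pod_mtu(9000, "wireguard"))  # 8940
```

If the configured Pod MTU exceeds this value, packets get fragmented (or silently dropped when fragmentation is disallowed), which is a classic source of "large responses hang" bugs.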

# Calico MTU configuration
apiVersion: crd.projectcalico.org/v1
kind: FelixConfiguration
metadata:
  name: default
spec:
  mtu: 8950              # Physical NIC 9000 - VXLAN 50
  vxlanMTU: 8950
  wireguardMTU: 8940

12.2 NodeLocal DNSCache

Caches DNS queries locally on nodes to reduce CoreDNS load.

# NodeLocal DNSCache DaemonSet (simplified)
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-local-dns
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: node-local-dns
  template:
    metadata:
      labels:
        k8s-app: node-local-dns
    spec:
      containers:
      - name: node-cache
        image: registry.k8s.io/dns/k8s-dns-node-cache:1.23.0
        args:
        - "-localip"
        - "169.254.20.10"    # Link-local IP
        - "-conf"
        - "/etc/Corefile"
        - "-upstreamsvc"
        - "kube-dns"

NodeLocal DNSCache Flow:

Pod -> 169.254.20.10 (NodeLocal) -> CoreDNS (on cache miss)
                |
          On cache hit: immediate response

Benefits:
- DNS latency reduced by 50%+
- CoreDNS load reduced by 70-80%
- Eliminates conntrack contention
- Prevents UDP DNS packet loss

12.3 TCP Tuning

# Node-level kernel parameter tuning

# Connection tracking table size
net.netfilter.nf_conntrack_max = 1048576

# TCP buffer sizes
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

# TCP connection reuse
net.ipv4.tcp_tw_reuse = 1

# SYN backlog
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535

# Keep-alive
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 10

13. Multi-Cluster Networking

13.1 Cilium ClusterMesh

┌──────────────┐         ┌──────────────┐
│  Cluster A   │         │  Cluster B   │
│ (us-east-1)  │         │ (eu-west-1)  │
│              │         │              │
│  ┌────────┐  │ Tunnel  │  ┌────────┐  │
│  │Cilium  │◄─┼─────────┼─►│Cilium  │  │
│  │Agent   │  │         │  │Agent   │  │
│  └────────┘  │         │  └────────┘  │
│              │         │              │
│  Pod CIDR:   │         │  Pod CIDR:   │
│  10.1.0.0/16 │         │  10.2.0.0/16 │
│              │         │              │
│  Service:    │ Global  │  Service:    │
│  shared-db   │ Service │  shared-db   │
│  (annotated) │         │  (annotated) │
└──────────────┘         └──────────────┘

# Global Service configuration (on both clusters)
apiVersion: v1
kind: Service
metadata:
  name: shared-database
  annotations:
    service.cilium.io/global: "true"
    service.cilium.io/shared: "true"
spec:
  selector:
    app: postgres
  ports:
  - port: 5432

14. Quiz

Q1. Why does every Pod in Kubernetes get a unique IP and communicate without NAT?

Answer: Because of Kubernetes networking model's three fundamental principles:

  1. Every Pod gets a unique IP address
  2. Every Pod can reach every other Pod without NAT
  3. The IP a Pod sees for itself is the same IP others see

This model eliminates the complexity of port mapping and NAT, simplifying service discovery and network policies. CNI plugins implement these requirements.

Q2. Why is eBPF more suitable for Kubernetes networking than iptables?

Answer: iptables traverses rules linearly (O(n)) and requires full chain rewrites on updates. At 10,000+ Services, performance degrades severely.

eBPF offers:

  • Hash map-based O(1) lookup with consistent performance regardless of Service count
  • Map entry updates only for fast rule changes
  • L7 policies (HTTP method/path) processed directly in kernel
  • Hubble for rich observability

Cilium uses eBPF to fully replace kube-proxy.

Q3. What are the key reasons Gateway API replaces Ingress?

Answer: Core limitations of Ingress:

  • Non-standard annotations -- configuration differs per controller
  • HTTP/HTTPS only -- no TCP/UDP/gRPC support
  • Single resource -- hard to separate roles between infra admins, operators, and developers
  • Advanced features like traffic splitting and header matching not in standard

Gateway API uses hierarchical resources (GatewayClass/Gateway/Route) for role separation and supports traffic splitting, header matching, and multiple protocols as standard spec.

Q4. What is the most common mistake when setting up Egress Network Policy rules?

Answer: Forgetting to allow DNS (UDP/TCP port 53).

When Egress Policy is set, all outbound traffic is blocked. DNS queries are also blocked, so Service name resolution fails. You must allow UDP/TCP port 53 Egress to kube-dns (CoreDNS) Pods.

Other common mistakes:

  • Missing Egress in policyTypes
  • Trying to match Pods in other namespaces without namespaceSelector
  • CIDR range errors in Pod CIDR-based rules

Q5. Why does ndots:5 degrade external DNS lookup performance, and what are the solutions?

Answer: With the default Kubernetes ndots:5 setting, looking up domains like "api.example.com" (fewer than 5 dots) first tries appending search domains:

  1. api.example.com.default.svc.cluster.local (fail)
  2. api.example.com.svc.cluster.local (fail)
  3. api.example.com.cluster.local (fail)
  4. api.example.com (success)

This causes 3 unnecessary queries for a single external DNS lookup.

Solutions:

  • Reduce ndots to 2 in Pod dnsConfig
  • Use FQDN (add trailing dot: api.example.com.)
  • Use NodeLocal DNSCache for caching
  • Enable CoreDNS autopath plugin

15. References

  1. Kubernetes Networking Model - Official Documentation
  2. Cilium Documentation - eBPF-based Networking
  3. Calico Documentation - Project Calico
  4. Gateway API - Official Specification
  5. CoreDNS - DNS for Service Discovery
  6. Network Policies - Kubernetes Documentation
  7. eBPF.io - Introduction to eBPF
  8. Hubble - Network Observability for Kubernetes
  9. IPVS-based kube-proxy - Kubernetes Blog
  10. CNI Specification - Container Network Interface
  11. NodeLocal DNSCache - Kubernetes Documentation
  12. Cilium ClusterMesh - Multi-Cluster Networking
  13. Life of a Packet in Kubernetes - Conference Talk