Network Engineering & Security Master Guide: From TCP/IP to Service Mesh and AI Serving Networks

Network Fundamentals: OSI & TCP/IP
Socket Programming: asyncio & aiohttp
Service Mesh: Istio & Envoy
Network Security: TLS & Zero Trust
CDN & Edge Computing
AI Serving Networks: gRPC & SSE
Packet Analysis in Practice
Quiz

1. Network Fundamentals

The OSI 7-Layer Model

The OSI (Open Systems Interconnection) model abstracts network communication into seven distinct layers, each with a specific responsibility.

Layer	Name	Protocol Examples	PDU
7	Application	HTTP, DNS, SMTP	Message
6	Presentation	TLS, JPEG, ASCII	Message
5	Session	NetBIOS, RPC	Message
4	Transport	TCP, UDP	Segment
3	Network	IP, ICMP	Packet
2	Data Link	Ethernet, 802.11	Frame
1	Physical	Cable, Fiber	Bits

The TCP/IP Stack

The real internet uses a simplified 4-layer TCP/IP model rather than the full OSI stack:

Application Layer: HTTP/2, HTTP/3, DNS, TLS
Transport Layer: TCP (reliability), UDP (speed)
Internet Layer: IPv4, IPv6, ICMP
Link Layer: Ethernet, Wi-Fi

HTTP/2 vs HTTP/3

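HTTP/1.1 suffers from Head-of-Line (HOL) blocking: responses on a connection must be returned in request order, so in practice each connection serves one request at a time and a slow response delays everything queued behind it.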

HTTP/2 improvements:

Multiplexing: multiple streams over a single TCP connection
Header compression: HPACK algorithm eliminates redundant headers
Server push: proactively send resources before the client requests them (rarely used in practice; Chrome has since removed support)
Binary framing: binary frames instead of plain text

HTTP/3 & QUIC: HTTP/3 runs over QUIC (UDP-based) instead of TCP, eliminating TCP-level HOL blocking entirely.

HTTP/1.1:  [Req1] → [Res1] → [Req2] → [Res2]  (sequential)
HTTP/2:    [Req1, Req2, Req3] → [Res1, Res2, Res3]  (mux, single TCP)
HTTP/3:    [Req1, Req2, Req3] → [Res1, Res2, Res3]  (mux, independent QUIC streams)
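The difference between in-order and independent delivery can be illustrated with a toy model. This is only a sketch, not a protocol implementation: it assumes the server works on all requests in parallel and that each response becomes ready after a fixed, hypothetical processing time. It then compares when each response can be delivered if responses must leave in request order versus as soon as each one is ready.

```python
from typing import List


def in_order_completion(ready_times: List[float]) -> List[float]:
    """Delivery times when responses must be returned in request order.

    A response cannot be delivered before every earlier response has been
    delivered, so one slow response delays all later ones (HOL blocking).
    """
    delivered = []
    latest = 0.0
    for ready in ready_times:
        latest = max(latest, ready)
        delivered.append(latest)
    return delivered


def multiplexed_completion(ready_times: List[float]) -> List[float]:
    """Delivery times when each stream is delivered as soon as it is ready."""
    return list(ready_times)


if __name__ == "__main__":
    # Hypothetical processing times in seconds: the first request is slow
    ready = [3.0, 0.5, 1.0]
    print("in order:   ", in_order_completion(ready))
    print("multiplexed:", multiplexed_completion(ready))
```

The model ignores bandwidth sharing and packet loss, so it only shows the ordering effect, not real-world timings.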

How DNS Works

When resolving api.example.com:

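Browser cache lookup
OS resolver lookup (static entries in /etc/hosts, then the OS DNS cache)
Query sent to a recursive resolver (typically the ISP's or a public DNS service)
On a cache miss, the recursive resolver queries the hierarchy in turn: Root NS → .com TLD NS → example.com authoritative NS
The authoritative server returns an A record (IPv4) or AAAA record (IPv6), which the resolver caches for the record's TTL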
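From application code, this whole lookup chain usually sits behind a single resolver call. The following is a minimal sketch using Python's standard `socket.getaddrinfo`, which delegates to the OS resolver; the `resolve` helper and its "A"/"AAAA" labels are illustrative names, and the labels simply reflect the address family of each result rather than an inspected DNS record.

```python
import socket
from typing import List, Tuple


def resolve(hostname: str, port: int = 443) -> List[Tuple[str, str]]:
    """Return sorted, de-duplicated (label, address) pairs for a hostname.

    socket.getaddrinfo() asks the OS resolver, so the hosts file, local
    caches, and the configured recursive resolver are all consulted.
    """
    results = set()
    for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
        hostname, port, proto=socket.IPPROTO_TCP
    ):
        if family == socket.AF_INET:
            results.add(("A", sockaddr[0]))      # IPv4 address
        elif family == socket.AF_INET6:
            results.add(("AAAA", sockaddr[0]))   # IPv6 address
    return sorted(results)


if __name__ == "__main__":
    # "localhost" normally resolves from the hosts file, without network access
    for label, address in resolve("localhost"):
        print(label, address)
```

Calling `resolve("api.example.com")` would exercise the full recursive path described above, provided the machine has network access.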

TLS 1.3 Handshake

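TLS 1.3 needs one round trip (1-RTT) to establish a new connection and can send 0-RTT early data when resuming a session, at the cost of replay protection for that early data.

Client                                              Server
  |--- ClientHello (+ key share) ------------------->|
  |<-- ServerHello (+ key share) --------------------|
  |<-- EncryptedExtensions, Certificate, ------------|
  |    CertificateVerify, Finished (encrypted)       |
  |--- Finished ------------------------------------>|
  |<-> Encrypted application data <----------------->|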
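To see which protocol version a handshake actually negotiates, a client can pin the minimum version and inspect the result. The sketch below uses Python's standard `ssl` module; the helper names `make_tls13_context` and `negotiated_version` are illustrative, and the second function needs network access to a real TLS 1.3 server.

```python
import socket
import ssl


def make_tls13_context() -> ssl.SSLContext:
    """Client context that verifies certificates and refuses anything below TLS 1.3."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_3
    return context


def negotiated_version(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect to host and return the negotiated protocol version string."""
    context = make_tls13_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # The TLS handshake runs inside wrap_socket()
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

Against a server that only supports TLS 1.2, `negotiated_version` would raise `ssl.SSLError` rather than silently downgrade.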

2. Socket Programming

Python asyncio TCP Server

import asyncio

async def handle_client(reader: asyncio.StreamReader, writer: asyncio.StreamWriter):
    addr = writer.get_extra_info('peername')
    print(f"Connected: {addr}")

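    try:
        while True:
            # read() returns b'' once the peer closes the connection
            data = await reader.read(1024)
            if not data:
                break
            # errors='replace' avoids a crash on invalid or split UTF-8 input
            message = data.decode('utf-8', errors='replace').strip()
            print(f"Received: {message} from {addr}")

            # Echo response
            response = f"Echo: {message}\n"
            writer.write(response.encode('utf-8'))
            await writer.drain()
    except ConnectionError:
        # Peer reset or aborted the connection (read() never raises IncompleteReadError)
        pass
    finally:
        print(f"Disconnected: {addr}")
        writer.close()
        try:
            await writer.wait_closed()
        except ConnectionError:
            pass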

async def main():
    server = await asyncio.start_server(
        handle_client, '0.0.0.0', 8888
    )
    addr = server.sockets[0].getsockname()
    print(f"Server started: {addr}")

    async with server:
        await server.serve_forever()

if __name__ == '__main__':
    asyncio.run(main())

Async HTTP Client with aiohttp

import asyncio
import aiohttp
from typing import List

async def fetch_url(session: aiohttp.ClientSession, url: str) -> dict:
    async with session.get(url, timeout=aiohttp.ClientTimeout(total=10)) as response:
        return {
            'url': url,
            'status': response.status,
            'body': await response.text()
        }

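async def fetch_all(urls: List[str]) -> list:
    connector = aiohttp.TCPConnector(
        limit=100,          # max concurrent connections
        limit_per_host=10,  # max per host
        keepalive_timeout=30
    )
    async with aiohttp.ClientSession(connector=connector) as session:
        tasks = [fetch_url(session, url) for url in urls]
        # return_exceptions=True: a failed request yields its exception object
        # in the result list instead of aborting the whole batch
        return await asyncio.gather(*tasks, return_exceptions=True)

# Run
if __name__ == '__main__':
    urls = [f"https://httpbin.org/get?id={i}" for i in range(20)]
    results = asyncio.run(fetch_all(urls))
    for result in results:
        if isinstance(result, Exception):
            print(f"Failed: {result!r}")
        else:
            print(result['status'], result['url'])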

gRPC vs REST Comparison

Aspect	gRPC	REST
Protocol	HTTP/2	HTTP/1.1 or HTTP/2
Serialization	Protocol Buffers (binary)	JSON (text)
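Streaming	Unary, server, client, and bidirectional streaming	Limited (needs SSE or WebSocket)
Type safety	Schema enforced by .proto	Optional (e.g., OpenAPI / JSON Schema)
Latency	Typically lower (binary encoding, persistent HTTP/2 connections)	Typically higher (text encoding), workload-dependent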
Browser support	Requires grpc-web	Native
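The serialization row can be made concrete with a rough size comparison. The sketch below does not use the protobuf library; it assumes that `struct.pack` with 32-bit floats is a fair stand-in for the payload of a packed `repeated float` field (4 bytes per element, ignoring the small tag and length prefix). The sample values are hypothetical.

```python
import json
import struct
from typing import List


def json_size(values: List[float]) -> int:
    """Size in bytes of the values encoded as a compact JSON array."""
    return len(json.dumps(values, separators=(",", ":")).encode("utf-8"))


def packed_float32_size(values: List[float]) -> int:
    """Size in bytes of the values as little-endian 32-bit floats.

    Stand-in for a packed `repeated float` payload: 4 bytes per element.
    """
    return len(struct.pack(f"<{len(values)}f", *values))


if __name__ == "__main__":
    values = [i / 7 for i in range(1000)]  # hypothetical tensor data
    print("JSON bytes:  ", json_size(values))
    print("binary bytes:", packed_float32_size(values))
```

The ratio depends heavily on the data: short decimal values such as `1.0` narrow the gap, while long fractional values widen it.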

3. Service Mesh

Istio Architecture

Istio uses the sidecar pattern, injecting Envoy proxies into each Pod to control service-to-service communication.

Control Plane (Istiod): manages configuration, issues certificates, distributes traffic policies
Data Plane (Envoy Sidecar): handles actual traffic, collects metrics, enforces mTLS

VirtualService Configuration Example

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ml-inference-vs
  namespace: production
spec:
  hosts:
    - ml-inference-svc
  http:
    - match:
        - headers:
            x-model-version:
              exact: 'v2'
      route:
        - destination:
            host: ml-inference-svc
            subset: v2  # subsets v1/v2 must be defined in a DestinationRule for this host
          weight: 100
    - route:
        - destination:
            host: ml-inference-svc
            subset: v1
          weight: 90
        - destination:
            host: ml-inference-svc
            subset: v2
          weight: 10

Load Balancing Algorithms

Round Robin: distribute requests across servers in turn (the default in many load balancers, such as Nginx)
Least Connections: select server with fewest active connections
Weighted Round Robin: distribute based on server weights
IP Hash: pin client to a server based on IP (session affinity)
Consistent Hashing: minimize redistribution when adding/removing servers (distributed caching)
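Three of these algorithms can be sketched in a few lines of Python. This is an illustrative model rather than how any particular proxy implements them; the class name `ConsistentHashRing`, the virtual-node count, and the choice of SHA-256 as the ring hash are all assumptions made for the example.

```python
import bisect
import hashlib
import itertools
from typing import Dict, Iterator, List, Tuple


def round_robin(servers: List[str]) -> Iterator[str]:
    """Yield servers in a repeating cycle."""
    return itertools.cycle(servers)


def least_connections(active: Dict[str, int]) -> str:
    """Pick the server with the fewest active connections."""
    return min(active, key=active.get)


class ConsistentHashRing:
    """Hash ring with virtual nodes; a key maps to the next point clockwise."""

    def __init__(self, servers: List[str], vnodes: int = 100) -> None:
        ring: List[Tuple[int, str]] = []
        for server in servers:
            for i in range(vnodes):
                ring.append((self._hash(f"{server}#{i}"), server))
        ring.sort()
        self._ring = ring
        self._points = [point for point, _ in ring]

    @staticmethod
    def _hash(key: str) -> int:
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def lookup(self, key: str) -> str:
        # First ring point strictly greater than the key's hash, wrapping around
        index = bisect.bisect(self._points, self._hash(key)) % len(self._ring)
        return self._ring[index][1]
```

The property that matters for caches is that removing a server only remaps the keys that lived on it; keys owned by the remaining servers keep their assignment.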

Service Discovery

In Kubernetes, CoreDNS handles service discovery. Each Service is reachable at service-name.namespace.svc.cluster.local, where cluster.local is the default cluster domain.

4. Network Security

TLS Certificate Chain

Root CA (trusted by browsers/OS)
  └── Intermediate CA
        └── Leaf Certificate (actual server cert)

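Each certificate is signed with the private key of the CA above it; the root CA certificate is self-signed and is trusted because it ships in the browser or OS trust store. Together they form a chain of trust.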

mTLS (Mutual TLS)

Standard TLS only authenticates the server. mTLS requires both parties to present certificates, enabling bidirectional authentication.

Standard TLS:  Client → [verify server cert] → Server
mTLS:          Client ↔ [verify both certs]  ↔ Server

In a service mesh, mTLS is handled automatically by sidecar proxies, securing all service-to-service communication without application changes.
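Outside a mesh, the same behaviour can be configured by hand. The sketch below shows one way to build a server-side context with Python's standard `ssl` module; `make_mtls_server_context` and its file-path parameters are hypothetical names, and the paths are optional only so the function can be exercised without real certificates.

```python
import ssl
from typing import Optional


def make_mtls_server_context(
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
    client_ca_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Server-side context that rejects clients without a valid certificate."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    # CERT_REQUIRED is what turns ordinary TLS into mutual TLS
    context.verify_mode = ssl.CERT_REQUIRED
    if cert_file and key_file:
        # The server's own certificate and private key
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if client_ca_file:
        # CA bundle used to verify certificates presented by clients
        context.load_verify_locations(cafile=client_ca_file)
    return context
```

With the default `ssl.CERT_NONE` on a server context, the handshake would succeed without any client certificate, which is the standard TLS case shown in the diagram.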

Nginx TLS Configuration

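server {
    # On Nginx 1.25.1 and later, prefer "listen 443 ssl;" with a separate "http2 on;" directive
    listen 443 ssl http2;
    server_name api.example.com;

    ssl_certificate     /etc/ssl/certs/server.crt;
    ssl_certificate_key /etc/ssl/private/server.key;

    # Only TLS 1.2+
    ssl_protocols TLSv1.2 TLSv1.3;
    # ssl_ciphers applies to TLS 1.2 and below; TLS 1.3 cipher suites are configured separately
    ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
    ssl_prefer_server_ciphers off;

    # HSTS (1 year)
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;

    # OCSP Stapling (verification also needs ssl_trusted_certificate and a resolver)
    ssl_stapling on;
    ssl_stapling_verify on;

    location /api/ {
        # backend_pool must be defined in an upstream block (not shown)
        proxy_pass http://backend_pool;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}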

Zero Trust Architecture

Core Zero Trust principle: "Never Trust, Always Verify"

Identity-based access: authenticate by user/service identity, not IP address
Least privilege: grant access only to required resources
Continuous verification: evaluate trust level throughout the session
Micro-segmentation: divide network into small zones to block lateral movement

JWT & OAuth 2.0

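A JWT (JSON Web Token) consists of three Base64URL-encoded parts joined by dots: Header.Payload.Signature. The header and payload are only encoded, not encrypted, so they must not carry secrets.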
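The three-part structure is easy to reproduce with the standard library. The following is a minimal sketch of HS256 signing and verification, written for illustration only: the function names are hypothetical, it does not validate claims such as `exp`, and production code would normally rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
from typing import Any, Dict


def _b64url(data: bytes) -> str:
    """Base64URL-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)


def sign_jwt(payload: Dict[str, Any], secret: bytes) -> str:
    """Build Header.Payload.Signature using HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_jwt(token: str, secret: bytes) -> Dict[str, Any]:
    """Return the payload if the signature is valid, else raise ValueError."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("token must have three dot-separated parts") from None
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    # compare_digest avoids leaking information through timing differences
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    return json.loads(_b64url_decode(payload_b64))
```

Checking the `alg` header before verifying matters: accepting whatever algorithm the token declares is a well-known source of JWT vulnerabilities.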

OAuth 2.0 Authorization Code Flow:

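The client redirects the user to the Authorization Server
After the user authenticates and consents, the Authorization Server redirects back with an Authorization Code
The backend exchanges the Code (together with its client credentials) for an Access Token
The Access Token is used to call Resource Server APIs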
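The first and third steps of this flow come down to building a URL and a form body. The sketch below uses the standard parameter names from the OAuth 2.0 specification; the helper names, and any endpoint or client values passed to them, are hypothetical. The `state` parameter is not mentioned above but is included because clients conventionally use it to protect the redirect against CSRF.

```python
import secrets
from typing import Dict, Tuple
from urllib.parse import urlencode


def build_authorization_url(
    authorize_endpoint: str,
    client_id: str,
    redirect_uri: str,
    scope: str,
) -> Tuple[str, str]:
    """Return (url, state) for the redirect to the Authorization Server."""
    # state is a random value the client re-checks on the callback
    state = secrets.token_urlsafe(16)
    query = urlencode(
        {
            "response_type": "code",
            "client_id": client_id,
            "redirect_uri": redirect_uri,
            "scope": scope,
            "state": state,
        }
    )
    return f"{authorize_endpoint}?{query}", state


def build_token_request(code: str, client_id: str, redirect_uri: str) -> Dict[str, str]:
    """Form fields the backend posts to the token endpoint to redeem the code."""
    return {
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
        "client_id": client_id,
    }
```

The token request is sent server to server, so the client secret (or other client authentication) never passes through the browser.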

5. CDN & Edge Computing

How CDNs Work

A CDN (Content Delivery Network) caches content at PoPs (Points of Presence) worldwide to reduce latency.

Cache strategies:

Cache-Control: max-age=86400: cache for 1 day in browser and CDN
Cache-Control: no-cache: the response may be stored, but must be revalidated with the origin before each reuse
ETag / Last-Modified: conditional requests to check for changes
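The conditional-request mechanism can be sketched from the origin's point of view. This is a simplified model, not a full HTTP implementation: `make_etag` and `respond` are hypothetical helpers, the ETag is derived from a hash of the body, and weak validators (`W/` prefixes) are not handled.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Strong ETag derived from the response body (quoted, as HTTP requires)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(
    body: bytes, if_none_match: Optional[str] = None
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET, honouring If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "max-age=86400"}
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if etag in candidates or "*" in candidates:
            # The cached copy is still valid, so no body is sent
            return 304, headers, b""
    return 200, headers, body
```

A cache that receives `304 Not Modified` keeps serving its stored copy and only pays for the headers, which is why revalidation is much cheaper than a full refetch.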

Cloudflare Workers Example

export default {
  async fetch(request, env, ctx) {
    const url = new URL(request.url)

    // Route AI inference at the edge
    if (url.pathname.startsWith('/inference/')) {
      const modelId = url.searchParams.get('model') || 'default'

      // Route to the nearest GPU cluster.
      // selectBackend() is a helper assumed to be defined elsewhere (not shown).
      const region = request.cf?.region
      const backendUrl = selectBackend(region, modelId)

      return fetch(backendUrl, {
        method: request.method,
        headers: request.headers,
        body: request.body,
      })
    }

    // Static asset caching (the Cache API only stores GET requests)
    if (request.method !== 'GET') return fetch(request)

    const cache = caches.default
    const cachedResponse = await cache.match(request)
    if (cachedResponse) return cachedResponse

    const response = await fetch(request)
    if (response.status === 200) {
      // Write to the cache in the background so the response is not delayed
      ctx.waitUntil(cache.put(request, response.clone()))
    }
    return response
  },
}

6. AI Serving Networks

ML Model Serving with gRPC

Protocol Buffers definition:

syntax = "proto3";
package inference;

service InferenceService {
  rpc Predict(PredictRequest) returns (PredictResponse);
  rpc StreamPredict(PredictRequest) returns (stream PredictResponse);
}

message PredictRequest {
  string model_name = 1;
  repeated float input_data = 2;
  map<string, string> metadata = 3;
}

message PredictResponse {
  repeated float output_data = 1;
  float confidence = 2;
  int64 latency_ms = 3;
}

Python gRPC server:

import grpc
from concurrent import futures
import inference_pb2
import inference_pb2_grpc
import numpy as np
import time

class InferenceServicer(inference_pb2_grpc.InferenceServiceServicer):
    def Predict(self, request, context):
        start = time.perf_counter()  # monotonic clock, suited to measuring intervals
        input_array = np.array(request.input_data, dtype=np.float32)

        # Actual model inference (example)
        output = input_array * 2.0

        latency = int((time.perf_counter() - start) * 1000)
        return inference_pb2.PredictResponse(
            output_data=output.tolist(),
            confidence=0.95,
            latency_ms=latency
        )

    def StreamPredict(self, request, context):
        # Streaming response for LLM token generation
        tokens = ["Hello", " world", "!", " gRPC", " streaming", "."]
        for token in tokens:
            yield inference_pb2.PredictResponse(
                output_data=[float(ord(c)) for c in token],
                confidence=0.9,
                latency_ms=10
            )

def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    inference_pb2_grpc.add_InferenceServiceServicer_to_server(
        InferenceServicer(), server
    )
    # Plaintext port for local testing; use add_secure_port() with TLS credentials in production
    server.add_insecure_port('[::]:50051')
    server.start()
    print("gRPC server started on port 50051")
    server.wait_for_termination()

if __name__ == '__main__':
    serve()

Real-Time LLM Streaming with SSE

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
import asyncio
import json

app = FastAPI()

async def llm_token_generator(prompt: str):
    """Stream LLM tokens as SSE events"""
    # In production, use an LLM library here
    tokens = prompt.split()
    for i, token in enumerate(tokens):
        data = json.dumps({"token": token, "index": i})
        yield f"data: {data}\n\n"
        await asyncio.sleep(0.05)  # simulate token generation delay
    # Sentinel event telling the client the stream is complete
    yield "data: [DONE]\n\n"

@app.get("/stream")
async def stream_llm(prompt: str = "Hello world"):
    return StreamingResponse(
        llm_token_generator(prompt),
        media_type="text/event-stream",
        headers={
            "Cache-Control": "no-cache",
            "X-Accel-Buffering": "no",  # Disable Nginx buffering
        }
    )
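On the receiving side, a client has to split the stream back into events. The sketch below parses the `data:` lines produced by the endpoint above; `iter_sse_tokens` is a hypothetical helper that takes any iterable of text lines, so it can be fed from an HTTP client's line iterator or, as in the test data, from a plain list. It assumes single-line `data:` fields only.

```python
import json
from typing import Iterable, Iterator


def iter_sse_tokens(lines: Iterable[str]) -> Iterator[str]:
    """Yield the "token" field of each SSE data event until "[DONE]" arrives."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # blank separators, comments, and other fields are skipped
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        yield json.loads(payload)["token"]


if __name__ == "__main__":
    sample = [
        'data: {"token": "Hello", "index": 0}\n',
        "\n",
        'data: {"token": "world", "index": 1}\n',
        "\n",
        "data: [DONE]\n",
        "\n",
    ]
    print(" ".join(iter_sse_tokens(sample)))
```

In a browser the built-in `EventSource` API performs this parsing; a hand-written parser like this is mainly useful for backend consumers and tests.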

7. Packet Analysis in Practice

tcpdump Commands

# Capture HTTP and HTTPS traffic on a specific interface
sudo tcpdump -i eth0 -w capture.pcap port 80 or port 443

# Filter TCP connection setup only (matches SYN and SYN-ACK packets)
sudo tcpdump -i any 'tcp[tcpflags] & tcp-syn != 0'

# Monitor traffic with a specific host
sudo tcpdump -i eth0 host 10.0.0.1 -n

# Monitor DNS queries over UDP (DNS can also use TCP port 53, e.g. for large responses)
sudo tcpdump -i any udp port 53 -v

# Capture gRPC (HTTP/2) traffic
sudo tcpdump -i eth0 port 50051 -w grpc_trace.pcap

Network Debugging with curl

# Check TLS certificate info
curl -vI https://api.example.com 2>&1 | grep -A 20 "SSL connection"

# Verify HTTP/2 is being used
curl --http2 -I https://api.example.com

# Measure response time breakdown
curl -o /dev/null -s -w \
  "DNS: %{time_namelookup}s\nTCP: %{time_connect}s\nTLS: %{time_appconnect}s\nTotal: %{time_total}s\n" \
  https://api.example.com

# Test gRPC endpoint (grpcurl)
grpcurl -plaintext localhost:50051 list
grpcurl -plaintext -d '{"model_name": "bert", "input_data": [1.0, 2.0]}' \
  localhost:50051 inference.InferenceService/Predict

Key Network Performance Metrics

RTT (Round-Trip Time): packet round-trip time, measured with ping
Throughput: data transferred per unit time, measured with iperf3
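Packet Loss: fraction of packets that never arrive; TCP masks it with retransmissions at the cost of latency, while UDP-based applications see it directly
Jitter: variation in packet delay, important for real-time streaming and voice/video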

# Measure network bandwidth
iperf3 -s                           # server
iperf3 -c server_ip -t 30 -P 4     # client (4 parallel streams)
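Given raw probe results, these metrics reduce to a few lines of arithmetic. The sketch below is one possible way to summarize them; `summarize_rtt` is a hypothetical helper, a lost probe is represented as `None`, and jitter is taken as the mean absolute difference between consecutive replies, which is a simplification of what tools such as iperf3 report.

```python
import statistics
from typing import Dict, List, Optional


def summarize_rtt(samples_ms: List[Optional[float]]) -> Dict[str, float]:
    """Summarize probe results; None marks a probe that got no reply."""
    if not samples_ms:
        raise ValueError("no samples")
    replies = [s for s in samples_ms if s is not None]
    loss = 1 - len(replies) / len(samples_ms)
    # Jitter: mean absolute difference between consecutive RTTs
    deltas = [abs(b - a) for a, b in zip(replies, replies[1:])]
    return {
        "loss_ratio": loss,
        "rtt_avg_ms": statistics.fmean(replies) if replies else float("nan"),
        "jitter_ms": statistics.fmean(deltas) if deltas else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical ping results in milliseconds; the third probe was lost
    print(summarize_rtt([10.0, 12.0, None, 11.0]))
```

Throughput is deliberately left out: it needs byte counts over a time window, which ping-style probes do not provide.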

Quiz

Q1. Explain the TCP 3-way handshake and 4-way termination process.

Answer:

3-way handshake (connection setup):

Client → Server: SYN (seq=x)
Server → Client: SYN-ACK (seq=y, ack=x+1)
Client → Server: ACK (ack=y+1)

4-way termination (connection teardown):

Client → Server: FIN
Server → Client: ACK
Server → Client: FIN
Client → Server: ACK (enters TIME_WAIT, then fully closes)

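Explanation: Termination requires 4 steps because after receiving a FIN, the server may still have data to send, so its ACK and its own FIN are sent separately. The client then stays in TIME_WAIT for 2MSL (twice the Maximum Segment Lifetime) so that it can re-send the final ACK if it is lost and so that delayed packets from the old connection expire before the same address and port pair is reused.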
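The sequence and acknowledgment numbers in the handshake follow a simple rule that can be written out directly. The sketch below is a bookkeeping model only, with no sockets involved; `three_way_handshake` is a hypothetical helper and the initial sequence numbers passed to it are arbitrary.

```python
from typing import List, Tuple

SEQ_SPACE = 2**32  # TCP sequence numbers are 32-bit and wrap around


def three_way_handshake(
    client_isn: int, server_isn: int
) -> List[Tuple[str, str, int, int]]:
    """Return (sender, flags, seq, ack) for each handshake segment.

    A SYN consumes one sequence number, so each side acknowledges the
    peer's initial sequence number plus one.
    """
    return [
        # The ack field is unused in the first SYN (ACK flag not set)
        ("client", "SYN", client_isn, 0),
        ("server", "SYN-ACK", server_isn, (client_isn + 1) % SEQ_SPACE),
        ("client", "ACK", (client_isn + 1) % SEQ_SPACE, (server_isn + 1) % SEQ_SPACE),
    ]


if __name__ == "__main__":
    for segment in three_way_handshake(client_isn=100, server_isn=300):
        print(segment)
```

The same plus-one rule applies to FIN segments during termination, since a FIN also consumes one sequence number.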

Q2. How does HTTP/2 multiplexing solve the HOL blocking problem of HTTP/1.1?

Answer: HTTP/2 sends multiple streams concurrently over a single TCP connection. Each stream has an independent ID, so a delay in one stream does not block other streams.

Explanation: HTTP/1.1 supports pipelining but responses must arrive in request order, so a slow response blocks all subsequent ones (HOL blocking). HTTP/2 solves this at the application layer by interleaving frames from different streams. However, TCP-level HOL blocking (a dropped packet stalls the entire connection) is only resolved in HTTP/3, which uses QUIC's independently delivered streams over UDP.

Q3. How does mTLS differ from standard TLS, and what role does it play in a service mesh?

Answer: Standard TLS only authenticates the server certificate. mTLS (Mutual TLS) requires both client and server to present certificates, enabling bidirectional authentication.

Explanation: In a service mesh like Istio, mTLS is handled transparently by sidecar proxies (Envoy). Each service is issued an X.509 certificate with a unique SPIFFE identity. All inter-service traffic is mutually authenticated and encrypted without any application code changes. This defends against eavesdropping, spoofing, and man-in-the-middle attacks even inside the private network.

Q4. Why is gRPC better suited for ML model serving than REST APIs?

Answer: gRPC offers binary serialization with Protocol Buffers, HTTP/2 multiplexing, and bidirectional streaming, all of which benefit ML serving workloads.

Explanation: Large tensor payloads serialize more compactly with Protocol Buffers than with JSON (commonly cited as 3-5x smaller, though the ratio depends on the data), because packed numeric fields are stored as fixed-width binary rather than decimal text. Server-side streaming enables real-time delivery of LLM-generated tokens to the client. The .proto schema enforces a strict API contract, which helps keep types consistent across ML pipeline integrations. Persistent HTTP/2 connections also reduce the overhead of establishing a new connection per request.

Q5. What are the types of CDN cache invalidation strategies and their trade-offs?

Answer: The main strategies are TTL-based expiration, versioned URLs, API-based instant invalidation, and surrogate key (tag) based invalidation.

Explanation:

TTL-based: simple to implement, but stale content may be served until expiration
Versioned URL (e.g., style.v2.css): preserves cache hit rate while allowing instant refresh; URL management adds complexity
Instant invalidation API: changes take effect quickly, but purges may incur cost or rate limits depending on the provider, and global propagation can still take up to tens of seconds
Surrogate keys: invalidate groups of related content at once (e.g., Cloudflare Cache Tags)

In practice, static assets (JS/CSS) use versioned URLs while API responses use short TTLs combined with instant invalidation as needed.
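Versioned URLs are usually generated at build time from the file contents, so the name changes exactly when the content does. The sketch below shows one way to do that; `versioned_name` is a hypothetical helper, and the 8-character SHA-256 prefix is an arbitrary choice for the example.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(path: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a content hash before the extension: style.css -> style.<hash>.css"""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


if __name__ == "__main__":
    print(versioned_name("assets/style.css", b"body { margin: 0 }"))
```

Because the URL is unique per content version, such assets can be served with a very long `max-age` and never need explicit invalidation.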
