Network Engineering & Security Master Guide: From TCP/IP to Service Mesh and AI Serving Networks


Table of Contents

1. [Network Fundamentals: OSI & TCP/IP](#1-network-fundamentals)

2. [Socket Programming: asyncio & aiohttp](#2-socket-programming)

3. [Service Mesh: Istio & Envoy](#3-service-mesh)

4. [Network Security: TLS & Zero Trust](#4-network-security)

5. [CDN & Edge Computing](#5-cdn--edge-computing)

6. [AI Serving Networks: gRPC & SSE](#6-ai-serving-networks)

7. [Packet Analysis in Practice](#7-packet-analysis-in-practice)

8. [Quiz](#quiz)

1. Network Fundamentals

The OSI 7-Layer Model

The OSI (Open Systems Interconnection) model abstracts network communication into seven distinct layers, each with a specific responsibility.

| Layer | Name         | Protocol Examples | PDU     |
| ----- | ------------ | ----------------- | ------- |
| 7     | Application  | HTTP, DNS, SMTP   | Message |
| 6     | Presentation | TLS, JPEG, ASCII  | Message |
| 5     | Session      | NetBIOS, RPC      | Message |
| 4     | Transport    | TCP, UDP          | Segment |
| 3     | Network      | IP, ICMP          | Packet  |
| 2     | Data Link    | Ethernet, 802.11  | Frame   |
| 1     | Physical     | Cable, Fiber      | Bits    |

The TCP/IP Stack

The real internet uses a simplified 4-layer TCP/IP model rather than the full OSI stack:

- **Application Layer**: HTTP/2, HTTP/3, DNS, TLS

- **Transport Layer**: TCP (reliable, ordered delivery), UDP (low overhead, no delivery guarantees)

- **Internet Layer**: IPv4, IPv6, ICMP

- **Link Layer**: Ethernet, Wi-Fi

HTTP/2 vs HTTP/3

HTTP/1.1 suffers from head-of-line (HOL) blocking because each connection handles only one request at a time: a slow response delays every request queued behind it on that connection.

**HTTP/2 improvements:**

- Multiplexing: multiple streams over a single TCP connection

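- Header compression: HPACK encodes headers compactly and avoids resending header fields that repeat across requests

- Server push: proactively send resources before the client requests them (rarely used in practice today; Chrome has removed support)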

- Binary framing: binary frames instead of plain text

**HTTP/3 & QUIC:**

HTTP/3 runs over QUIC (UDP-based) instead of TCP, eliminating TCP-level HOL blocking entirely.

```
HTTP/1.1: [Req1] → [Res1] → [Req2] → [Res2]        (sequential)
HTTP/2:   [Req1, Req2, Req3] → [Res1, Res2, Res3]  (mux, single TCP)
HTTP/3:   [Req1, Req2, Req3] → [Res1, Res2, Res3]  (mux, independent QUIC streams)
```

How DNS Works

When resolving `api.example.com`:

1. Browser cache lookup

2. OS lookup: the hosts file (`/etc/hosts`) and the operating system's DNS cache

3. Query sent to a recursive resolver (for example, the ISP's DNS server or a public resolver)

4. Hierarchical resolution: Root NS → `.com` TLD NS → `example.com` Authoritative NS

5. Returns A record (IPv4) or AAAA record (IPv6)
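The last step of the lookup above can be observed from application code. The following is a minimal sketch using Python's standard `socket.getaddrinfo`, which delegates to the OS resolver (so steps 2 to 4 happen behind the call). The helper name `resolve` is an assumption made for illustration, not part of any library.

```python
import socket


def resolve(hostname: str) -> dict:
    """Return the IPv4 (A) and IPv6 (AAAA) addresses the OS resolver finds.

    `resolve` is a hypothetical helper name. The call goes through the
    operating system's resolver, so hosts-file entries and cached answers
    are honored before any query reaches a recursive resolver.
    """
    records = {"A": [], "AAAA": []}
    for record_type, family in (("A", socket.AF_INET), ("AAAA", socket.AF_INET6)):
        try:
            infos = socket.getaddrinfo(
                hostname, None, family=family, type=socket.SOCK_STREAM
            )
        except OSError:
            # No record of this type, or this address family is unavailable
            continue
        # Each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the address string.
        records[record_type] = sorted({info[4][0] for info in infos})
    return records
```

Calling `resolve("api.example.com")` on a networked machine would return whatever A and AAAA records the resolver finds; an IP literal such as `"127.0.0.1"` is returned as-is without any DNS query.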

TLS 1.3 Handshake

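TLS 1.3 completes the handshake in one round trip (1-RTT) for new connections and can send early data with zero round trips (0-RTT) on session resumption. Because 0-RTT data can be replayed by an attacker, it should carry only idempotent requests.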

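```
Client                                         Server
  |--- ClientHello (+ key share) --------------->|
  |<-- ServerHello (+ key share) ----------------|
  |<-- EncryptedExtensions ----------------------|
  |<-- Certificate + CertificateVerify ----------|
  |<-- Finished ---------------------------------|
  |--- Finished -------------------------------->|
  |<-> Encrypted application data <------------->|
```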
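To see which protocol version a handshake actually negotiates, a client can pin its minimum version to TLS 1.3. This is a minimal sketch with Python's standard `ssl` module; the function names `make_tls13_client_context` and `negotiated_version` are illustrative assumptions, and the sketch assumes the local OpenSSL build supports TLS 1.3.

```python
import socket
import ssl


def make_tls13_client_context() -> ssl.SSLContext:
    """Client context that verifies the server and refuses anything below TLS 1.3."""
    ctx = ssl.create_default_context()  # certificate and hostname checks enabled
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx


def negotiated_version(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect to host:port and return the negotiated version, e.g. 'TLSv1.3'."""
    ctx = make_tls13_client_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

A server that only speaks TLS 1.2 would cause `negotiated_version` to raise `ssl.SSLError` instead of silently downgrading.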

2. Socket Programming

Python asyncio TCP Server

```python
import asyncio


async def handle_client(reader: asyncio.StreamReader, writer: asyncio.StreamWriter):
    addr = writer.get_extra_info('peername')
    print(f"Connected: {addr}")
    try:
        while True:
            # read() returns b'' once the peer has closed the connection
            data = await reader.read(1024)
            if not data:
                break
            message = data.decode('utf-8', errors='replace').strip()
            print(f"Received: {message} from {addr}")
            # Echo response
            response = f"Echo: {message}\n"
            writer.write(response.encode('utf-8'))
            await writer.drain()
    except (asyncio.IncompleteReadError, ConnectionResetError):
        # Peer dropped the connection abruptly
        pass
    finally:
        print(f"Disconnected: {addr}")
        writer.close()
        await writer.wait_closed()


async def main():
    server = await asyncio.start_server(
        handle_client, '0.0.0.0', 8888
    )
    addr = server.sockets[0].getsockname()
    print(f"Server started: {addr}")
    async with server:
        await server.serve_forever()


if __name__ == '__main__':
    asyncio.run(main())
```
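A matching client can be written with `asyncio.open_connection`. The sketch below is self-contained: `_echo_handler` is a simplified stand-in for the echo server in this section, and `echo_once` and `demo` are hypothetical names chosen for illustration. The demo binds to port 0 so the OS picks a free port.

```python
import asyncio


async def _echo_handler(reader: asyncio.StreamReader, writer: asyncio.StreamWriter):
    # Simplified stand-in for the echo server: replies "Echo: <line>" per line
    while line := await reader.readline():
        writer.write(b"Echo: " + line.strip() + b"\n")
        await writer.drain()
    writer.close()
    await writer.wait_closed()


async def echo_once(host: str, port: int, message: str) -> str:
    """Send one line to the server and return the single-line reply."""
    reader, writer = await asyncio.open_connection(host, port)
    try:
        writer.write(message.encode("utf-8") + b"\n")
        await writer.drain()
        reply = await reader.readline()
        return reply.decode("utf-8").strip()
    finally:
        writer.close()
        await writer.wait_closed()


async def demo() -> str:
    # Port 0 asks the OS for any free port
    server = await asyncio.start_server(_echo_handler, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    try:
        return await echo_once("127.0.0.1", port, "ping")
    finally:
        # Stop accepting connections; asyncio.run() cleans up remaining tasks
        server.close()
```

Against the server in this section, the equivalent call would be `asyncio.run(echo_once("127.0.0.1", 8888, "ping"))`.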

Async HTTP Client with aiohttp

```python
import asyncio
from typing import List

import aiohttp


async def fetch_url(session: aiohttp.ClientSession, url: str) -> dict:
    async with session.get(url, timeout=aiohttp.ClientTimeout(total=10)) as response:
        return {
            'url': url,
            'status': response.status,
            'body': await response.text()
        }


async def fetch_all(urls: List[str]) -> list:
    connector = aiohttp.TCPConnector(
        limit=100,          # max concurrent connections
        limit_per_host=10,  # max per host
        keepalive_timeout=30
    )
    async with aiohttp.ClientSession(connector=connector) as session:
        tasks = [fetch_url(session, url) for url in urls]
        # With return_exceptions=True, failed requests are returned as
        # exception objects instead of aborting the whole batch
        return await asyncio.gather(*tasks, return_exceptions=True)


# Run
if __name__ == '__main__':
    urls = [f"https://httpbin.org/get?id={i}" for i in range(20)]
    results = asyncio.run(fetch_all(urls))
```
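Because `asyncio.gather(..., return_exceptions=True)` mixes successful results and exception objects in one list, callers usually need to separate them. The following sketch shows one way to do that using only the standard library; `_work` is a hypothetical stand-in for `fetch_url`, and `split_results` is an illustrative helper name.

```python
import asyncio


async def _work(n: int) -> int:
    # Hypothetical stand-in for fetch_url: fails for multiples of 3
    if n % 3 == 0:
        raise ValueError(f"bad input {n}")
    await asyncio.sleep(0)
    return n * 10


def split_results(results):
    """Separate gather() output into successful values and raised exceptions."""
    succeeded = [r for r in results if not isinstance(r, BaseException)]
    failed = [r for r in results if isinstance(r, BaseException)]
    return succeeded, failed


async def demo():
    results = await asyncio.gather(
        *(_work(n) for n in range(1, 5)), return_exceptions=True
    )
    return split_results(results)
```

Results keep the order of the input tasks, so a failed entry can be matched back to its URL by index if needed.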

gRPC vs REST Comparison

| Aspect          | gRPC                                        | REST                                          |
| --------------- | ------------------------------------------- | --------------------------------------------- |
| Protocol        | HTTP/2                                      | HTTP/1.1 or HTTP/2                            |
| Serialization   | Protocol Buffers (binary)                   | JSON (text)                                   |
| Streaming       | Client, server, and bidirectional streaming | Limited (SSE or a separate WebSocket channel) |
| Type safety     | Strong typing (.proto schema)               | No built-in schema (optional, e.g. OpenAPI)   |
| Latency         | Typically lower (compact binary payloads)   | Typically higher                              |
| Browser support | Requires grpc-web                           | Native                                        |

3. Service Mesh

Istio Architecture

Istio uses the sidecar pattern, injecting Envoy proxies into each Pod to control service-to-service communication.

- **Control Plane (Istiod)**: manages configuration, issues certificates, distributes traffic policies

- **Data Plane (Envoy Sidecar)**: handles actual traffic, collects metrics, enforces mTLS

VirtualService Configuration Example

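```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ml-inference-vs
  namespace: production
spec:
  hosts:
    - ml-inference-svc
  http:
    # Requests carrying "x-model-version: v2" always go to subset v2
    - match:
        - headers:
            x-model-version:
              exact: 'v2'
      route:
        - destination:
            host: ml-inference-svc
            subset: v2
          weight: 100
    # All other traffic: 90/10 canary split between v1 and v2.
    # Subsets v1 and v2 must be defined in a matching DestinationRule.
    - route:
        - destination:
            host: ml-inference-svc
            subset: v1
          weight: 90
        - destination:
            host: ml-inference-svc
            subset: v2
          weight: 10
```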
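The routing decision this VirtualService describes can be sketched in a few lines of Python. This is only an illustration of the matching order (header rule first, then the weighted fallback), not how Envoy implements it. The names `pick_subset` and `CANARY_WEIGHTS` are assumptions, and the sketch expects lowercase header keys.

```python
import random
from typing import Callable, Mapping

# Weights from the fallback route: 90% to v1, 10% to v2
CANARY_WEIGHTS = [("v1", 90), ("v2", 10)]


def pick_subset(
    headers: Mapping[str, str],
    rand: Callable[[], float] = random.random,
) -> str:
    """Return the subset a request would be routed to.

    Rules are evaluated in order: the header match wins, otherwise the
    request falls through to the weighted split. `rand` returns a float
    in [0, 1) and is injectable so the choice can be tested.
    """
    if headers.get("x-model-version") == "v2":
        return "v2"

    total = sum(weight for _, weight in CANARY_WEIGHTS)
    point = rand() * total
    cumulative = 0
    for subset, weight in CANARY_WEIGHTS:
        cumulative += weight
        if point < cumulative:
            return subset
    return CANARY_WEIGHTS[-1][0]  # guard against floating-point edge cases
```

Over many requests without the header, roughly one in ten calls returns `"v2"`, which is the canary behavior the weights express.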

Load Balancing Algorithms

- **Round Robin**: distribute requests in order (the default in many load balancers, such as Nginx upstreams)

- **Least Connections**: select server with fewest active connections

- **Weighted Round Robin**: distribute based on server weights

- **IP Hash**: pin client to a server based on IP (session affinity)

- **Consistent Hashing**: minimize redistribution when adding/removing servers (distributed caching)
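Of the algorithms listed above, consistent hashing is the least obvious, so here is a minimal sketch of a hash ring with virtual nodes. The class name `HashRing`, the use of MD5 as the ring hash, and the default of 100 virtual nodes per server are all illustrative assumptions; production proxies use their own variants.

```python
import bisect
import hashlib


def _ring_hash(key: str) -> int:
    # MD5 is used only to spread keys evenly, not for security
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring: each server owns many points on a circle."""

    def __init__(self, nodes, vnodes: int = 100):
        self._vnodes = vnodes
        self._points = []  # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self._vnodes):
            bisect.insort(self._points, (_ring_hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._points = [p for p in self._points if p[1] != node]

    def lookup(self, key: str) -> str:
        """Return the first node clockwise from the key's position."""
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_left(self._points, (_ring_hash(key), ""))
        if idx == len(self._points):
            idx = 0  # wrap around the ring
        return self._points[idx][1]
```

Removing a server only reassigns the keys that server owned; every other key keeps its mapping, which is why this scheme suits distributed caches.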

Service Discovery

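In Kubernetes, CoreDNS handles service discovery. A Service is reachable at `service-name.namespace.svc.cluster.local`, where `cluster.local` is the default cluster domain; from within the same namespace, the short name `service-name` also resolves.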

4. Network Security

TLS Certificate Chain

```
Root CA (trusted by browsers/OS)
└── Intermediate CA
    └── Leaf Certificate (actual server cert)
```

Each certificate is signed by the private key of the CA above it, forming a chain of trust.

mTLS (Mutual TLS)

Standard TLS only authenticates the server. mTLS requires both parties to present certificates, enabling bidirectional authentication.

```
Standard TLS: Client → [verify server cert] → Server
mTLS:         Client ↔ [verify both certs]  ↔ Server
```

In a service mesh, mTLS is handled automatically by sidecar proxies, securing all service-to-service communication without application changes.
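Outside a mesh, the server side of mTLS comes down to one setting: the server must require a client certificate. The following is a minimal sketch with Python's standard `ssl` module. The function names and the file-path parameters are hypothetical, and certificate issuance and rotation (which a mesh automates) are out of scope.

```python
import ssl


def make_mtls_server_context() -> ssl.SSLContext:
    """Server-side context that rejects clients without a valid certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Standard TLS leaves this at CERT_NONE on servers; CERT_REQUIRED is
    # what makes the authentication mutual.
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx


def load_identity(
    ctx: ssl.SSLContext, cert_file: str, key_file: str, client_ca_file: str
) -> None:
    """Load the server's own certificate and the CA used to verify clients.

    All three paths are placeholders supplied by the caller.
    """
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    ctx.load_verify_locations(cafile=client_ca_file)
```

The client mirrors this by calling `load_cert_chain` on its own context so that it has a certificate to present during the handshake.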

Nginx TLS Configuration

```nginx
server {
    listen 443 ssl http2;
    server_name api.example.com;

    ssl_certificate     /etc/ssl/certs/server.crt;
    ssl_certificate_key /etc/ssl/private/server.key;

    # Only TLS 1.2+
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
    ssl_prefer_server_ciphers off;

    # HSTS (1 year)
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;

    # OCSP Stapling
    ssl_stapling on;
    ssl_stapling_verify on;

    location /api/ {
        # backend_pool must be defined in an upstream block elsewhere
        proxy_pass http://backend_pool;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```

Zero Trust Architecture

Core Zero Trust principle: "Never Trust, Always Verify"

- **Identity-based access**: authenticate by user/service identity, not IP address

- **Least privilege**: grant access only to required resources

- **Continuous verification**: evaluate trust level throughout the session

- **Micro-segmentation**: divide network into small zones to block lateral movement

JWT & OAuth 2.0

A JWT (JSON Web Token) consists of three parts: `Header.Payload.Signature`
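The three-part structure is easy to see by building a token by hand. The sketch below signs and verifies an HS256 token with only the standard library; `sign_hs256` and `verify_hs256` are illustrative names, and the sketch deliberately omits claim checks such as `exp`, so a maintained JWT library is the better choice in production.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    # JWT uses URL-safe Base64 without padding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(payload: dict, secret: bytes) -> str:
    """Build Header.Payload.Signature using HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_hs256(token: str, secret: bytes) -> dict:
    """Return the payload if the signature is valid, else raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    # Constant-time comparison avoids leaking signature bytes via timing
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    return json.loads(_b64url_decode(payload_b64))
```

Note that the payload is only Base64-encoded, not encrypted: anyone holding the token can read its claims, and the signature only proves they were not altered.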

OAuth 2.0 Authorization Code Flow:

1. Client redirects user to Authorization Server

2. After user authentication, Authorization Code is issued

3. Backend exchanges Code for Access Token

4. Access Token is used to call Resource Server APIs

5. CDN & Edge Computing

How CDNs Work

A CDN (Content Delivery Network) caches content at PoPs (Points of Presence) worldwide to reduce latency.

Cache strategies:

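- **Cache-Control: max-age=86400**: the response may be reused for 1 day by the browser and by shared caches such as a CDN

- **Cache-Control: no-cache**: the response may be stored, but must be revalidated with the origin before each reuse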

- **ETag / Last-Modified**: conditional requests to check for changes
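The conditional-request strategy in the list above can be sketched as origin-side logic: compare the client's `If-None-Match` header with the current ETag and answer `304 Not Modified` when they match. The helper names `make_etag` and `respond` are assumptions, as is deriving the ETag from a truncated SHA-256 of the body; real servers may use other validators.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Strong ETag derived from the content (quoted, as the header requires)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(request_headers: dict, body: bytes):
    """Return (status, headers, body) for a GET of the given resource."""
    etag = make_etag(body)
    if request_headers.get("If-None-Match") == etag:
        # The cached copy is still valid: send headers only, no body
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag, "Cache-Control": "no-cache"}, body
```

A cache that holds a stale copy sends the stored ETag back; when the content is unchanged it pays for a small 304 response rather than a full download.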

Cloudflare Workers Example

```javascript
export default {
  async fetch(request, env, ctx) {
    const url = new URL(request.url)

    // Route AI inference at the edge
    if (url.pathname.startsWith('/inference/')) {
      const modelId = url.searchParams.get('model') || 'default'

      // Route to the nearest GPU cluster.
      // selectBackend is an application-defined helper (not shown here)
      // that maps a region and model ID to a backend URL.
      const region = request.cf.region
      const backendUrl = selectBackend(region, modelId)

      return fetch(backendUrl, {
        method: request.method,
        headers: request.headers,
        body: request.body,
      })
    }

    // Static asset caching (the Cache API only stores GET requests)
    const cache = caches.default
    const cachedResponse = await cache.match(request)
    if (cachedResponse) return cachedResponse

    const response = await fetch(request)
    if (request.method === 'GET' && response.status === 200) {
      // Write to the cache in the background so the response is not delayed
      ctx.waitUntil(cache.put(request, response.clone()))
    }
    return response
  },
}
```

6. AI Serving Networks

ML Model Serving with gRPC

Protocol Buffers definition:

```protobuf
syntax = "proto3";

package inference;

service InferenceService {
  rpc Predict(PredictRequest) returns (PredictResponse);
  rpc StreamPredict(PredictRequest) returns (stream PredictResponse);
}

message PredictRequest {
  string model_name = 1;
  repeated float input_data = 2;
  map<string, string> metadata = 3;
}

message PredictResponse {
  repeated float output_data = 1;
  float confidence = 2;
  int64 latency_ms = 3;
}
```

Python gRPC server:

```python
import time
from concurrent import futures

import grpc
import numpy as np

# Modules generated from the .proto definition above
import inference_pb2
import inference_pb2_grpc


class InferenceServicer(inference_pb2_grpc.InferenceServiceServicer):
    def Predict(self, request, context):
        start = time.time()
        input_array = np.array(request.input_data)
        # Actual model inference (example)
        output = input_array * 2.0
        latency = int((time.time() - start) * 1000)
        return inference_pb2.PredictResponse(
            output_data=output.tolist(),
            confidence=0.95,
            latency_ms=latency
        )

    def StreamPredict(self, request, context):
        # Streaming response for LLM token generation
        tokens = ["Hello", " world", "!", " gRPC", " streaming", "."]
        for token in tokens:
            yield inference_pb2.PredictResponse(
                output_data=[float(ord(c)) for c in token],
                confidence=0.9,
                latency_ms=10
            )


def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    inference_pb2_grpc.add_InferenceServiceServicer_to_server(
        InferenceServicer(), server
    )
    server.add_insecure_port('[::]:50051')
    server.start()
    print("gRPC server started on port 50051")
    server.wait_for_termination()


if __name__ == '__main__':
    serve()
```

Real-Time LLM Streaming with SSE

```python
import asyncio
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()


async def llm_token_generator(prompt: str):
    """Stream LLM tokens as SSE events"""
    # In production, use an LLM library here
    tokens = prompt.split()
    for i, token in enumerate(tokens):
        data = json.dumps({"token": token, "index": i})
        yield f"data: {data}\n\n"
        await asyncio.sleep(0.05)  # simulate token generation delay
    # Sentinel event so the client knows the stream is complete
    yield "data: [DONE]\n\n"


@app.get("/stream")
async def stream_llm(prompt: str = "Hello world"):
    return StreamingResponse(
        llm_token_generator(prompt),
        media_type="text/event-stream",
        headers={
            "Cache-Control": "no-cache",
            "X-Accel-Buffering": "no",  # Disable Nginx buffering
        }
    )
```
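On the consuming side, an SSE stream is a sequence of `data:` lines with a blank line terminating each event. The sketch below parses that framing for the `/stream` endpoint shown in this section. The helper names `iter_sse_data` and `collect_tokens` are assumptions, and fields other than `data` (such as `event` or `id`) are ignored for brevity.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each event in a text/event-stream."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value
            if value.startswith(" "):
                value = value[1:]
            buffer.append(value)
        # Other fields (event:, id:, retry:) and comments are ignored here


def collect_tokens(lines: Iterable[str]) -> list:
    """Collect tokens until the [DONE] sentinel used by the server."""
    tokens = []
    for data in iter_sse_data(lines):
        if data == "[DONE]":
            break
        tokens.append(json.loads(data)["token"])
    return tokens
```

With an HTTP client that exposes the response as an iterator of decoded lines, `collect_tokens` can be fed that iterator directly; browsers get the same behavior from the built-in `EventSource` API.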

7. Packet Analysis in Practice

tcpdump Commands

```bash
# Capture HTTP/HTTPS traffic on a specific interface
sudo tcpdump -i eth0 -w capture.pcap port 80 or port 443

# Filter TCP handshakes only (packets with the SYN flag set: SYN and SYN-ACK)
sudo tcpdump -i any 'tcp[tcpflags] & tcp-syn != 0'

# Monitor traffic with a specific host
sudo tcpdump -i eth0 host 10.0.0.1 -n

# Monitor DNS queries
sudo tcpdump -i any udp port 53 -v

# Capture gRPC (HTTP/2) traffic
sudo tcpdump -i eth0 port 50051 -w grpc_trace.pcap
```

Network Debugging with curl

```bash
# Check TLS certificate info
curl -vI https://api.example.com 2>&1 | grep -A 20 "SSL connection"

# Verify HTTP/2 is being used
curl --http2 -I https://api.example.com

# Measure response time breakdown
curl -o /dev/null -s -w \
  "DNS: %{time_namelookup}s\nTCP: %{time_connect}s\nTLS: %{time_appconnect}s\nTotal: %{time_total}s\n" \
  https://api.example.com

# Test gRPC endpoint (grpcurl)
# Without a -proto file, grpcurl relies on server reflection being enabled
grpcurl -plaintext localhost:50051 list
grpcurl -plaintext -d '{"model_name": "bert", "input_data": [1.0, 2.0]}' \
  localhost:50051 inference.InferenceService/Predict
```

Key Network Performance Metrics

- **RTT (Round-Trip Time)**: packet round-trip time, measured with `ping`

- **Throughput**: data transferred per unit time, measured with `iperf3`

- **Packet Loss**: fraction of packets that never arrive; critical for UDP-based traffic, which does not retransmit on its own

- **Jitter**: variation in packet delay over time, important for real-time streaming

```bash
# Measure network bandwidth
iperf3 -s                        # server
iperf3 -c server_ip -t 30 -P 4   # client (4 parallel streams)
```
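RTT and jitter from the metrics list can be computed from a series of samples, for instance the per-packet times printed by `ping`. This sketch assumes one simple definition of jitter, the mean absolute difference between consecutive RTT samples; other definitions exist (RTP uses a smoothed estimator), and `summarize_rtt` is an illustrative name.

```python
from statistics import mean
from typing import Sequence


def summarize_rtt(samples_ms: Sequence[float]) -> dict:
    """Summarize RTT samples in milliseconds.

    Jitter here is the mean absolute difference between consecutive
    samples, which is an assumption of this sketch rather than a
    universal definition.
    """
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples to compute jitter")
    deltas = [abs(b - a) for a, b in zip(samples_ms, samples_ms[1:])]
    return {
        "min": min(samples_ms),
        "avg": mean(samples_ms),
        "max": max(samples_ms),
        "jitter": mean(deltas),
    }
```

Two links with the same average RTT can differ sharply in jitter, which is why streaming workloads track both numbers.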

Quiz

**Q1.** Describe the TCP 3-way handshake and the 4-way connection termination.

**Answer**:

3-way handshake (connection setup):

1. Client → Server: SYN (seq=x)

2. Server → Client: SYN-ACK (seq=y, ack=x+1)

3. Client → Server: ACK (ack=y+1)

4-way termination (connection teardown):

1. Client → Server: FIN

2. Server → Client: ACK

3. Server → Client: FIN

4. Client → Server: ACK (enters TIME_WAIT, then fully closes)

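**Explanation**: Termination requires 4 steps because after receiving a FIN, the server may still have data to send, so its ACK and its own FIN are sent separately. The client stays in TIME_WAIT for 2MSL (twice the Maximum Segment Lifetime) so that it can retransmit the final ACK if it is lost, and so that delayed packets from the old connection expire before the same address and port pair is reused.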
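The sequence and acknowledgment numbers in the handshake follow a simple rule: each side acknowledges the other's initial sequence number plus one, because the SYN flag consumes one sequence number. A small sketch makes the arithmetic concrete; `three_way_handshake` is a hypothetical helper that only models the numbers, not real packets.

```python
SEQ_MODULUS = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the three segments as (direction, flags, fields) tuples.

    client_isn and server_isn are the initial sequence numbers
    (x and y in the answer above).
    """
    return [
        ("client->server", "SYN", {"seq": client_isn}),
        (
            "server->client",
            "SYN-ACK",
            {"seq": server_isn, "ack": (client_isn + 1) % SEQ_MODULUS},
        ),
        (
            "client->server",
            "ACK",
            {
                "seq": (client_isn + 1) % SEQ_MODULUS,
                "ack": (server_isn + 1) % SEQ_MODULUS,
            },
        ),
    ]
```

For example, with `client_isn=100` and `server_isn=300`, the SYN-ACK carries `ack=101` and the final ACK carries `ack=301`.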

**Q2.** How does HTTP/2 multiplexing address HTTP/1.1's head-of-line blocking?

**Answer**: HTTP/2 sends multiple streams concurrently over a single TCP connection. Each stream has an independent ID, so a delay in one stream does not block other streams.

**Explanation**: HTTP/1.1 supports pipelining but responses must arrive in request order, so a slow response blocks all subsequent ones (HOL blocking). HTTP/2 solves this at the application layer by interleaving frames from different streams. However, TCP-level HOL blocking (a dropped packet stalls the entire connection) is only resolved in HTTP/3, which uses QUIC's independently delivered streams over UDP.

**Q3.** What is the difference between standard TLS and mTLS?

**Answer**: Standard TLS only authenticates the server certificate. mTLS (Mutual TLS) requires both client and server to present certificates, enabling bidirectional authentication.

**Explanation**: In a service mesh like Istio, mTLS is handled transparently by sidecar proxies (Envoy). Each service is issued an X.509 certificate with a unique SPIFFE identity. All inter-service traffic is mutually authenticated and encrypted without any application code changes. This defends against eavesdropping, spoofing, and man-in-the-middle attacks even inside the private network.

**Q4.** Why is gRPC well suited to ML model serving?

**Answer**: gRPC offers binary serialization with Protocol Buffers, HTTP/2 multiplexing, and bidirectional streaming, all of which are highly advantageous for ML serving workloads.

**Explanation**: Large numeric payloads such as tensors usually serialize more compactly with Protocol Buffers than with JSON; the exact ratio depends on the data. Server-side streaming enables real-time delivery of LLM-generated tokens to the client. The `.proto` schema enforces a strict API contract, ensuring type safety across ML pipeline integrations. Persistent HTTP/2 connections also reduce the overhead of establishing new connections per request.

**Q5.** What are the main CDN cache invalidation strategies?

**Answer**: The main strategies are TTL-based expiration, versioned URLs, API-based instant invalidation, and surrogate key (tag) based invalidation.

**Explanation**:

- **TTL-based**: simple to implement, but stale content may be served until expiration

- **Versioned URL** (e.g., `style.v2.css`): preserves cache hit rate while allowing instant refresh; URL management adds complexity

- **Instant invalidation API**: fast propagation, but it may incur CDN costs, and a purge can still take up to tens of seconds to reach every PoP

- **Surrogate keys**: invalidate groups of related content at once (e.g., Cloudflare Cache Tags)

In practice, static assets (JS/CSS) use versioned URLs while API responses use short TTLs combined with instant invalidation as needed.
