Network Engineering & Security Master Guide: From TCP/IP to Service Mesh and AI Serving Networks

Author: Youngju Kim (@fjvbn20031)
Table of Contents
- Network Fundamentals: OSI & TCP/IP
- Socket Programming: asyncio & aiohttp
- Service Mesh: Istio & Envoy
- Network Security: TLS & Zero Trust
- CDN & Edge Computing
- AI Serving Networks: gRPC & SSE
- Packet Analysis in Practice
- Quiz
1. Network Fundamentals
The OSI 7-Layer Model
The OSI (Open Systems Interconnection) model abstracts network communication into seven distinct layers, each with a specific responsibility.
| Layer | Name | Protocol Examples | PDU |
|---|---|---|---|
| 7 | Application | HTTP, DNS, SMTP | Message |
| 6 | Presentation | TLS, JPEG, ASCII | Message |
| 5 | Session | NetBIOS, RPC | Message |
| 4 | Transport | TCP, UDP | Segment |
| 3 | Network | IP, ICMP | Packet |
| 2 | Data Link | Ethernet, 802.11 | Frame |
| 1 | Physical | Cable, Fiber | Bits |
The TCP/IP Stack
The real internet uses a simplified 4-layer TCP/IP model rather than the full OSI stack:
- Application Layer: HTTP/2, HTTP/3, DNS, TLS
- Transport Layer: TCP (reliability), UDP (speed)
- Internet Layer: IPv4, IPv6, ICMP
- Link Layer: Ethernet, Wi-Fi
HTTP/2 vs HTTP/3
HTTP/1.1 suffers from Head-of-Line (HOL) blocking because each connection handles only one request at a time.
HTTP/2 improvements:
- Multiplexing: multiple streams over a single TCP connection
- Header compression: HPACK algorithm eliminates redundant headers
- Server push: proactively send resources before client requests
- Binary framing: binary frames instead of plain text
HTTP/3 & QUIC: HTTP/3 runs over QUIC (UDP-based) instead of TCP, eliminating TCP-level HOL blocking entirely.
HTTP/1.1: [Req1] → [Res1] → [Req2] → [Res2] (sequential)
HTTP/2: [Req1, Req2, Req3] → [Res1, Res2, Res3] (mux, single TCP)
HTTP/3: [Req1, Req2, Req3] → [Res1, Res2, Res3] (mux, independent QUIC streams)
How DNS Works
When resolving api.example.com:
- Browser cache lookup
- OS cache lookup (/etc/hosts)
- Query sent to Recursive Resolver (ISP DNS)
- Hierarchical resolution: Root NS → .com TLD NS → example.com Authoritative NS
- Returns A record (IPv4) or AAAA record (IPv6)
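The lookup order above can be exercised from Python through the OS resolver; a minimal sketch (no external libraries, and `getaddrinfo` consults the hosts file and the configured resolver for you, returning both A and AAAA records):

```python
import socket

def resolve(hostname: str) -> list:
    """Resolve a hostname via the OS resolver (hosts file first, then DNS)."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP
    return sorted({info[4][0] for info in infos})

# "localhost" is answered from the hosts file, so no network query is needed
print(resolve("localhost"))
```

For a public name like api.example.com, the same call would trigger the recursive/hierarchical resolution described above.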
TLS 1.3 Handshake
TLS 1.3 reduces round trips to 1-RTT for new connections or 0-RTT for session resumption.
Client Server
|--- ClientHello (key share) ---->|
|<-- ServerHello + Certificate ---|
|<-- + EncryptedExtensions -------|
|--- Finished -------------------->|
|<-> Encrypted application data <->|
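From Python's ssl module you can pin a client to the TLS 1.3 handshake shown above; a minimal sketch (the connection snippet in the comments uses example.com as a placeholder host):

```python
import ssl

# Client context restricted to TLS 1.3 (the 1-RTT handshake diagrammed above)
context = ssl.create_default_context()
context.minimum_version = ssl.TLSVersion.TLSv1_3

# create_default_context() enables certificate and hostname verification by default
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)

# Connecting would then run the handshake, e.g.:
# import socket
# with socket.create_connection(("example.com", 443)) as sock:
#     with context.wrap_socket(sock, server_hostname="example.com") as tls:
#         print(tls.version())
```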
2. Socket Programming
Python asyncio TCP Server
```python
import asyncio

async def handle_client(reader: asyncio.StreamReader, writer: asyncio.StreamWriter):
    addr = writer.get_extra_info('peername')
    print(f"Connected: {addr}")
    try:
        while True:
            data = await reader.read(1024)
            if not data:
                break
            message = data.decode('utf-8').strip()
            print(f"Received: {message} from {addr}")
            # Echo response
            response = f"Echo: {message}\n"
            writer.write(response.encode('utf-8'))
            await writer.drain()
    except ConnectionResetError:
        pass  # client dropped the connection abruptly
    finally:
        print(f"Disconnected: {addr}")
        writer.close()
        await writer.wait_closed()

async def main():
    server = await asyncio.start_server(handle_client, '0.0.0.0', 8888)
    addr = server.sockets[0].getsockname()
    print(f"Server started: {addr}")
    async with server:
        await server.serve_forever()

if __name__ == '__main__':
    asyncio.run(main())
```
Async HTTP Client with aiohttp
```python
import asyncio
import aiohttp
from typing import List

async def fetch_url(session: aiohttp.ClientSession, url: str) -> dict:
    async with session.get(url, timeout=aiohttp.ClientTimeout(total=10)) as response:
        return {
            'url': url,
            'status': response.status,
            'body': await response.text()
        }

async def fetch_all(urls: List[str]) -> List[dict]:
    connector = aiohttp.TCPConnector(
        limit=100,          # max concurrent connections
        limit_per_host=10,  # max per host
        keepalive_timeout=30
    )
    async with aiohttp.ClientSession(connector=connector) as session:
        tasks = [fetch_url(session, url) for url in urls]
        return await asyncio.gather(*tasks, return_exceptions=True)

# Run
urls = [f"https://httpbin.org/get?id={i}" for i in range(20)]
results = asyncio.run(fetch_all(urls))
```
gRPC vs REST Comparison
| Aspect | gRPC | REST |
|---|---|---|
| Protocol | HTTP/2 | HTTP/1.1 or HTTP/2 |
| Serialization | Protocol Buffers (binary) | JSON (text) |
| Streaming | Bidirectional streaming | Limited (SSE, WebSocket separate) |
| Type safety | Strong typing (.proto schema) | Weak typing |
| Latency | Low | Relatively higher |
| Browser support | Requires grpc-web | Native |
3. Service Mesh
Istio Architecture
Istio uses the sidecar pattern, injecting Envoy proxies into each Pod to control service-to-service communication.
- Control Plane (Istiod): manages configuration, issues certificates, distributes traffic policies
- Data Plane (Envoy Sidecar): handles actual traffic, collects metrics, enforces mTLS
VirtualService Configuration Example
```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ml-inference-vs
  namespace: production
spec:
  hosts:
    - ml-inference-svc
  http:
    - match:
        - headers:
            x-model-version:
              exact: "v2"
      route:
        - destination:
            host: ml-inference-svc
            subset: v2
          weight: 100
    - route:
        - destination:
            host: ml-inference-svc
            subset: v1
          weight: 90
        - destination:
            host: ml-inference-svc
            subset: v2
          weight: 10
```
Load Balancing Algorithms
- Round Robin: distribute requests in order (default)
- Least Connections: select server with fewest active connections
- Weighted Round Robin: distribute based on server weights
- IP Hash: pin client to a server based on IP (session affinity)
- Consistent Hashing: minimize redistribution when adding/removing servers (distributed caching)
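The last algorithm is worth a sketch: a minimal consistent-hash ring in Python (node names and the vnode count are illustrative), demonstrating that removing a node only remaps the keys that node owned:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring; virtual nodes smooth out key skew."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node: str):
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str):
        self.ring = [(h, n) for h, n in self.ring if n != node]

    def get(self, key: str) -> str:
        # Walk clockwise to the first vnode at or after the key's hash
        idx = bisect.bisect(self.ring, (self._hash(key), "")) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
before = {f"user:{i}": ring.get(f"user:{i}") for i in range(1000)}
ring.remove("cache-b")
after = {k: ring.get(k) for k in before}
moved = sum(before[k] != after[k] for k in before)
# Only the keys that mapped to the removed node move to a new server
```

With naive modulo hashing (hash(key) % N), removing one of three servers would remap roughly two thirds of all keys; here only cache-b's share moves.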
Service Discovery
In Kubernetes, CoreDNS handles service discovery. Services are reachable via service-name.namespace.svc.cluster.local.
4. Network Security
TLS Certificate Chain
Root CA (trusted by browsers/OS)
└── Intermediate CA
└── Leaf Certificate (actual server cert)
Each certificate is signed by the private key of the CA above it, forming a chain of trust.
mTLS (Mutual TLS)
Standard TLS only authenticates the server. mTLS requires both parties to present certificates, enabling bidirectional authentication.
Standard TLS: Client → [verify server cert] → Server
mTLS: Client ↔ [verify both certs] ↔ Server
In a service mesh, mTLS is handled automatically by sidecar proxies, securing all service-to-service communication without application changes.
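The difference from standard TLS is visible in Python's ssl module; a minimal server-side sketch (the certificate paths in the comments are illustrative assumptions):

```python
import ssl

# mTLS server context: the extra step vs. standard TLS is requiring
# and verifying the *client's* certificate as well.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.minimum_version = ssl.TLSVersion.TLSv1_2
ctx.verify_mode = ssl.CERT_REQUIRED  # this single line turns TLS into mTLS

# A real deployment would also load key material (paths are illustrative):
# ctx.load_cert_chain("server.crt", "server.key")   # server identity
# ctx.load_verify_locations("ca.crt")               # CA that signed client certs
```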
Nginx TLS Configuration
```nginx
server {
    listen 443 ssl http2;
    server_name api.example.com;

    ssl_certificate     /etc/ssl/certs/server.crt;
    ssl_certificate_key /etc/ssl/private/server.key;

    # Only TLS 1.2+
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
    ssl_prefer_server_ciphers off;

    # HSTS (1 year)
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;

    # OCSP Stapling
    ssl_stapling on;
    ssl_stapling_verify on;

    location /api/ {
        proxy_pass http://backend_pool;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```
Zero Trust Architecture
Core Zero Trust principle: "Never Trust, Always Verify"
- Identity-based access: authenticate by user/service identity, not IP address
- Least privilege: grant access only to required resources
- Continuous verification: evaluate trust level throughout the session
- Micro-segmentation: divide network into small zones to block lateral movement
JWT & OAuth 2.0
A JWT (JSON Web Token) consists of three parts: Header.Payload.Signature
OAuth 2.0 Authorization Code Flow:
- Client redirects user to Authorization Server
- After user authentication, Authorization Code is issued
- Backend exchanges Code for Access Token
- Access Token is used to call Resource Server APIs
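The three-part JWT structure can be produced and checked with the standard library alone; a minimal HS256 sketch (sign_jwt/verify_jwt are illustrative helpers, not a real library's API — use a vetted JWT library in production):

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without padding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(payload: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = f"{b64url(json.dumps(header).encode())}.{b64url(json.dumps(payload).encode())}"
    sig = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(sig)}"  # Header.Payload.Signature

def verify_jwt(token: str, secret: bytes) -> dict:
    signing_input, _, sig = token.rpartition(".")
    expected = b64url(hmac.new(secret, signing_input.encode(), hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("invalid signature")
    payload_b64 = signing_input.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))

token = sign_jwt({"sub": "user-123", "scope": "read"}, b"demo-secret")
assert verify_jwt(token, b"demo-secret")["sub"] == "user-123"
```

Tampering with any of the three segments, or verifying with the wrong secret, changes the expected HMAC and the signature check fails.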
5. CDN & Edge Computing
How CDNs Work
A CDN (Content Delivery Network) caches content at PoPs (Points of Presence) worldwide to reduce latency.
Cache strategies:
- Cache-Control: max-age=86400: cache for 1 day in browser and CDN
- Cache-Control: no-cache: always revalidate with origin
- ETag / Last-Modified: conditional requests to check for changes
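The conditional-request mechanism can be sketched from the server's point of view; handle_conditional_get and the header values below are illustrative, not any specific framework's API:

```python
import hashlib
from typing import Optional, Tuple

def make_etag(body: bytes) -> str:
    # Strong ETag derived from a hash of the content
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def handle_conditional_get(body: bytes, if_none_match: Optional[str]) -> Tuple[int, dict, bytes]:
    """Return (status, headers, payload) for a GET with an optional If-None-Match header."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "max-age=86400"}
    if if_none_match == etag:
        return 304, headers, b""  # client's cached copy is still fresh: no body sent
    return 200, headers, body

body = b"<html>hello</html>"
status, headers, payload = handle_conditional_get(body, None)         # first fetch
status2, _, payload2 = handle_conditional_get(body, headers["ETag"])  # revalidation
```

The revalidation round trip still costs one RTT, but a 304 carries no body, which is the saving ETag-based caching buys.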
Cloudflare Workers Example
```javascript
export default {
  async fetch(request, env) {
    const url = new URL(request.url)

    // Route AI inference at the edge
    if (url.pathname.startsWith('/inference/')) {
      const modelId = url.searchParams.get('model') || 'default'
      // Route to the nearest GPU cluster
      // (selectBackend(region, modelId) is assumed to be defined elsewhere)
      const region = request.cf.region
      const backendUrl = selectBackend(region, modelId)
      return fetch(backendUrl, {
        method: request.method,
        headers: request.headers,
        body: request.body,
      })
    }

    // Static asset caching
    const cache = caches.default
    const cachedResponse = await cache.match(request)
    if (cachedResponse) return cachedResponse

    const response = await fetch(request)
    if (response.status === 200) {
      const responseToCache = response.clone()
      await cache.put(request, responseToCache)
    }
    return response
  },
}
```
6. AI Serving Networks
ML Model Serving with gRPC
Protocol Buffers definition:
```protobuf
syntax = "proto3";

package inference;

service InferenceService {
  rpc Predict(PredictRequest) returns (PredictResponse);
  rpc StreamPredict(PredictRequest) returns (stream PredictResponse);
}

message PredictRequest {
  string model_name = 1;
  repeated float input_data = 2;
  map<string, string> metadata = 3;
}

message PredictResponse {
  repeated float output_data = 1;
  float confidence = 2;
  int64 latency_ms = 3;
}
```
Python gRPC server:
```python
import time
from concurrent import futures

import grpc
import numpy as np

import inference_pb2
import inference_pb2_grpc

class InferenceServicer(inference_pb2_grpc.InferenceServiceServicer):
    def Predict(self, request, context):
        start = time.time()
        input_array = np.array(request.input_data)
        # Actual model inference (example)
        output = input_array * 2.0
        latency = int((time.time() - start) * 1000)
        return inference_pb2.PredictResponse(
            output_data=output.tolist(),
            confidence=0.95,
            latency_ms=latency
        )

    def StreamPredict(self, request, context):
        # Streaming response for LLM token generation
        tokens = ["Hello", " world", "!", " gRPC", " streaming", "."]
        for token in tokens:
            yield inference_pb2.PredictResponse(
                output_data=[float(ord(c)) for c in token],
                confidence=0.9,
                latency_ms=10
            )

def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    inference_pb2_grpc.add_InferenceServiceServicer_to_server(
        InferenceServicer(), server
    )
    server.add_insecure_port('[::]:50051')
    server.start()
    print("gRPC server started on port 50051")
    server.wait_for_termination()

if __name__ == '__main__':
    serve()
```
Real-Time LLM Streaming with SSE
```python
import asyncio
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def llm_token_generator(prompt: str):
    """Stream LLM tokens as SSE events"""
    # In production, use an LLM library here
    tokens = prompt.split()
    for i, token in enumerate(tokens):
        data = json.dumps({"token": token, "index": i})
        yield f"data: {data}\n\n"
        await asyncio.sleep(0.05)  # simulate token generation delay
    yield "data: [DONE]\n\n"  # sentinel event marking end of stream

@app.get("/stream")
async def stream_llm(prompt: str = "Hello world"):
    return StreamingResponse(
        llm_token_generator(prompt),
        media_type="text/event-stream",
        headers={
            "Cache-Control": "no-cache",
            "X-Accel-Buffering": "no",  # Disable Nginx buffering
        }
    )
```
7. Packet Analysis in Practice
tcpdump Commands
```bash
# Capture HTTP traffic on a specific interface
sudo tcpdump -i eth0 -w capture.pcap port 80 or port 443

# Filter TCP handshakes only (SYN packets)
sudo tcpdump -i any 'tcp[tcpflags] & tcp-syn != 0'

# Monitor traffic with a specific host
sudo tcpdump -i eth0 host 10.0.0.1 -n

# Monitor DNS queries
sudo tcpdump -i any udp port 53 -v

# Capture gRPC (HTTP/2) traffic
sudo tcpdump -i eth0 port 50051 -w grpc_trace.pcap
```
Network Debugging with curl
```bash
# Check TLS certificate info
curl -vI https://api.example.com 2>&1 | grep -A 20 "SSL connection"

# Verify HTTP/2 is being used
curl --http2 -I https://api.example.com

# Measure response time breakdown
curl -o /dev/null -s -w \
  "DNS: %{time_namelookup}s\nTCP: %{time_connect}s\nTLS: %{time_appconnect}s\nTotal: %{time_total}s\n" \
  https://api.example.com

# Test gRPC endpoint (grpcurl)
grpcurl -plaintext localhost:50051 list
grpcurl -plaintext -d '{"model_name": "bert", "input_data": [1.0, 2.0]}' \
  localhost:50051 inference.InferenceService/Predict
```
Key Network Performance Metrics
- RTT (Round-Trip Time): packet round-trip time, measured with ping
- Throughput: data transferred per unit time, measured with iperf3
- Packet Loss: loss rate, critical in unstable UDP environments
- Jitter: RTT variation, important for real-time streaming

```bash
# Measure network bandwidth
iperf3 -s                        # server
iperf3 -c server_ip -t 30 -P 4   # client (4 parallel streams)
```
Quiz
Q1. Explain the TCP 3-way handshake and 4-way termination process.
Answer:
3-way handshake (connection setup):
- Client → Server: SYN (seq=x)
- Server → Client: SYN-ACK (seq=y, ack=x+1)
- Client → Server: ACK (ack=y+1)
4-way termination (connection teardown):
- Client → Server: FIN
- Server → Client: ACK
- Server → Client: FIN
- Client → Server: ACK (enters TIME_WAIT, then fully closes)
Explanation: Termination requires 4 steps because after receiving a FIN, the server may still have data to send. The client enters TIME_WAIT for 2MSL (Maximum Segment Lifetime) to handle any delayed packets from the old connection.
Q2. How does HTTP/2 multiplexing solve the HOL blocking problem of HTTP/1.1?
Answer: HTTP/2 sends multiple streams concurrently over a single TCP connection. Each stream has an independent ID, so a delay in one stream does not block other streams.
Explanation: HTTP/1.1 supports pipelining but responses must arrive in request order, so a slow response blocks all subsequent ones (HOL blocking). HTTP/2 solves this at the application layer by interleaving frames from different streams. However, TCP-level HOL blocking (a dropped packet stalls the entire connection) is only resolved in HTTP/3, which uses QUIC's independently delivered streams over UDP.
Q3. How does mTLS differ from standard TLS, and what role does it play in a service mesh?
Answer: Standard TLS only authenticates the server certificate. mTLS (Mutual TLS) requires both client and server to present certificates, enabling bidirectional authentication.
Explanation: In a service mesh like Istio, mTLS is handled transparently by sidecar proxies (Envoy). Each service is issued an X.509 certificate with a unique SPIFFE identity. All inter-service traffic is mutually authenticated and encrypted without any application code changes. This defends against eavesdropping, spoofing, and man-in-the-middle attacks even inside the private network.
Q4. Why is gRPC better suited for ML model serving than REST APIs?
Answer: gRPC offers binary serialization with Protocol Buffers, HTTP/2 multiplexing, and bidirectional streaming — all of which are highly advantageous for ML serving workloads.
Explanation: Large tensor payloads serialize 3-5x more compactly with Protocol Buffers than JSON. Server-side streaming enables real-time delivery of LLM-generated tokens to the client. The .proto schema enforces a strict API contract, ensuring type safety across ML pipeline integrations. Persistent HTTP/2 connections also reduce the overhead of establishing new connections per request.
Q5. What are the types of CDN cache invalidation strategies and their trade-offs?
Answer: The main strategies are TTL-based expiration, versioned URLs, API-based instant invalidation, and surrogate key (tag) based invalidation.
Explanation:
- TTL-based: simple to implement, but stale content may be served until expiration
- Versioned URL (e.g., style.v2.css): preserves cache hit rate while allowing instant refresh; URL management adds complexity
- Instant invalidation API: fast propagation but incurs CDN costs; propagation delay of tens of seconds still applies
- Surrogate keys: invalidate groups of related content at once (e.g., Cloudflare Cache Tags)
In practice, static assets (JS/CSS) use versioned URLs while API responses use short TTLs combined with instant invalidation as needed.