CDN and Edge Computing Deep Dive — Anycast, Cache Invalidation, DDoS Defense, Lambda@Edge vs Workers vs Compute
Author: Youngju Kim (@fjvbn20031)
Introduction — The 1.3-Second Battle
A 2006 Amazon experiment found that every 100ms of added page latency cost 1% in sales. A 2017 Google study found that on mobile, going from a 1-second to a 3-second load time increases bounce rate by 32%.
To win back a second, internet companies have spread servers across the globe, manipulated BGP, stacked cache tiers, and even placed serverless functions within 100km of every user. In this post:
- The birth of CDN — the problem Akamai solved in 1998
- The internals of Anycast routing — "how does the same IP route to the nearest location everywhere?"
- Cache tier strategies — tiered, shielded, push
- The hard problem of cache invalidation and real-world patterns
- DDoS defense — the L3/L4/L7 three-layer shield
- The meaning of edge computing and the three major products (Lambda@Edge, Cloudflare Workers, Fastly Compute)
- Image/video transformation at the edge
- Zero Trust — replacing VPN with Cloudflare Access
- Three real-world incident case studies
In the previous post WebAssembly + WASI Deep Dive, we showed Wasm as the execution engine of the edge. In this post, we look at the globe-scale infrastructure that engine runs on.
1. The Birth of CDN — Akamai, 1998
The Problem: 1997 Super Bowl
CBS tried streaming a sports event over the web. The server was in Boston, users in California. In 1997, a round trip across the US took about 70ms, plus bandwidth contention. The site went down.
An MIT research team turned this into a company — Akamai (Hawaiian for "smart/quick").
The Solution: Replicate Content Close to Users
Place cache servers in POPs (Points of Presence) at ISPs around the world, and serve identical requests from the nearest server. In 1999, Akamai commercialized the world's first global system that responded to a user's DNS query with a geographically distinct IP.
1999-2025 CDN Evolution
- 1999: Akamai commercialized, static asset caching
- 2004: Limelight Networks rises as a major Akamai competitor
- 2008: AWS CloudFront launches
- 2009: Cloudflare founded, introduces free plan
- 2011: Fastly — high performance based on Varnish
- 2017: Cloudflare Workers — edge computing goes mainstream
- 2019: Fastly Compute@Edge (Wasm-based)
- 2022: CDNs absorb Zero Trust, DDoS, and Bot defense
- 2025: Cloudflare/Fastly extend AI inference to the edge
2. Anycast — Same IP, Different Locations
Traditional DNS-Based Geo Routing
1990s Akamai approach: user queries a.akamai.net → DNS inspects the client IP and returns the closest server's IP.
Limitations:
- Location inferred only from the recursive resolver's IP, often inaccurate (improved via EDNS Client Subnet)
- No path change until TTL expires
- Requires maintaining many IPs
Anycast — Magic at the BGP Level
Cloudflare assigns the same IP 1.1.1.1 to thousands of servers across 300+ cities simultaneously. When a user sends a packet to 1.1.1.1, BGP routing automatically delivers it to the closest server.
How it works:
- Cloudflare advertises the 1.1.1.0/24 prefix from every POP with the same AS number via BGP
- Neighboring ISPs pick the shortest AS-path among multiple routes
- The user is routed to the geographically closest POP
Benefits of Anycast
- No need for DNS-level tuning
- If one POP goes down, BGP automatically re-routes to the next closest POP (failover in seconds)
- Single IP serves the entire globe
Anycast's Pitfalls
Session persistence is hard. What if the BGP path changes mid-handshake? The client's SYN went to one POP, but its ACK lands on a different POP that never saw the SYN, so the connection fails.
Workarounds:
- Most CDNs keep BGP paths stable enough
- QUIC/Connection ID-based migration (previous post)
- Session affinity: switch from Anycast at the entry point to internal Unicast for L7 sessions
3. Cache Tiers — From Edge to Origin
The Order a Request Encounters Caches
[User] -> [Edge POP (nearest city)] -> [Regional cache] -> [Origin Shield] -> [Origin server]
The role of each tier:
- Edge POP: within 50-200km of the user. Target: 90%+ hit ratio.
- Regional cache: continent/region level. Used on edge miss.
- Origin Shield: right before origin. Shared cache to minimize origin load.
- Origin: the actual content store (S3, web server).
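The tier walk can be sketched as a simple fall-through lookup. This is our own toy model, not any vendor's API: the first tier that has the object answers, and only a miss at every tier reaches origin.

```python
# Illustrative sketch: a request walks edge -> regional -> shield in order;
# a full miss falls through to origin, and the result fills every tier.
def lookup(key, tiers, origin):
    for tier in tiers:
        if key in tier:
            return tier[key]
    value = origin(key)        # only a full miss reaches origin
    for tier in tiers:
        tier[key] = value      # fill the tiers on the way back
    return value

edge, regional, shield = {}, {}, {}
calls = []
def origin(key):
    calls.append(key)
    return f"body-of-{key}"

lookup("/a.css", [edge, regional, shield], origin)  # miss everywhere -> origin
lookup("/a.css", [edge, regional, shield], origin)  # edge hit -> no origin call
print(len(calls))  # 1
```

The second request never leaves the edge, which is the whole point of the tiering.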
The Economics of Hit Ratio
Assume 1TB/month of traffic:
- Edge hit ratio 95% -> only 50GB reaches origin -> massive bandwidth savings
- Cloudflare's bandwidth is free to customers; AWS S3 charges $50-100/TB. Edge caching cuts cost by multiples.
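The arithmetic behind that claim is worth making explicit. The $90/TB figure below is an assumption, taken as the midpoint of the $50-100 range cited above.

```python
# Back-of-envelope model: 1 TB/month of traffic, hypothetical $90/TB egress.
monthly_tb = 1.0
egress_per_tb = 90.0

def origin_cost(hit_ratio):
    """Monthly egress cost for the traffic that misses the edge."""
    return monthly_tb * (1 - hit_ratio) * egress_per_tb

print(origin_cost(0.0))   # ~$90: no CDN, everything hits origin
print(origin_cost(0.95))  # ~$4.50: 95% edge hit ratio
```

A 95% hit ratio cuts the origin egress bill by 20x in this model, which is why hit-ratio tuning (section 13) pays for itself quickly.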
Interpreting Cache-Control Headers
Cache-Control: public, max-age=31536000, s-maxage=86400, immutable
- public — intermediate caches allowed
- max-age=31536000 — browser cache 1 year
- s-maxage=86400 — CDN cache 1 day
- immutable — do not even send revalidation requests (even on reload)
Modern edges also support stale-while-revalidate:
Cache-Control: max-age=3600, stale-while-revalidate=86400
"Fresh within 1 hour, serve stale for up to 24 hours while refreshing in the background."
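The freshness states implied by that header can be sketched as a small state function. The state names are ours, chosen for illustration; only the two time windows come from the header above.

```python
# Minimal sketch of stale-while-revalidate semantics (ages in seconds).
MAX_AGE, SWR = 3600, 86400

def state(age):
    if age <= MAX_AGE:
        return "fresh"                       # serve straight from cache
    if age <= MAX_AGE + SWR:
        return "stale-serve-and-revalidate"  # serve stale, refresh in background
    return "expired"                         # must fetch synchronously

print(state(1800))    # fresh
print(state(7200))    # stale-serve-and-revalidate
print(state(100000))  # expired (past 3600 + 86400 seconds)
```

The middle state is what keeps latency flat for users while the edge quietly refetches.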
Key Design — When the Same URL Should Be Treated Differently
The default cache key is the URL, but the same URL may differ by:
- Mobile vs desktop (image resolution)
- Language/region
- Auth state
Header-based cache key extensions are needed:
cacheKey:
  - url
  - header: Accept-Encoding
  - header: User-Agent (normalized)  # mobile/desktop only
Too many dimensions -> cardinality explosion, hit ratio collapses. Policy design matters.
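One way to keep the key space small is to normalize the high-cardinality header before it enters the key. The regex and bucket names below are assumptions for illustration, not a production UA classifier.

```python
# Illustrative sketch: collapse User-Agent into a two-value device dimension
# so the same URL yields at most two cached variants, not thousands.
import re

MOBILE_RE = re.compile(r"Mobile|Android|iPhone", re.I)

def cache_key(url, headers):
    device = "mobile" if MOBILE_RE.search(headers.get("User-Agent", "")) else "desktop"
    return (url, headers.get("Accept-Encoding", ""), device)

k1 = cache_key("/home", {"User-Agent": "Mozilla/5.0 (iPhone ...)", "Accept-Encoding": "br"})
k2 = cache_key("/home", {"User-Agent": "Mozilla/5.0 (Windows NT 10.0)", "Accept-Encoding": "br"})
print(k1 == k2)  # False: one cached copy per device class, not per UA string
```

Every raw UA string mapping to the same bucket is exactly the normalization that prevents the cardinality explosion described above.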
4. Cache Invalidation — One of Two Hard Things
"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton
The Heart of the Problem
Content at a URL changes, but the CDN still serves the old version. How do you "instantly" wipe a cache scattered across hundreds of POPs worldwide?
Four Invalidation Strategies
1. Natural TTL expiry
- Easy, no control
- Unsuitable for news/prices that need immediate reflection
2. Purge (immediate delete)
- Invoke the CDN API to purge a URL
- Cloudflare: handles 100k+ purges per second
- Cost: free-to-paid depending on plan
3. Surrogate Key / Tag-based invalidation
- The response carries a header such as Surrogate-Key: product-42 cat-electronics
- "Purge everything tagged cat-electronics" invalidates thousands of URLs at once
- Fastly pioneered this pattern, which originated in Varnish
4. Versioned URLs (cache busting)
- Hash in the filename: main.abc123.js
- Each deploy yields new URLs -> no invalidation needed, old files age out naturally
- Default in React/Next.js/Vite
Reality — Combinations
Production operations combine all four:
- CSS/JS: versioned URLs
- API responses: short TTL + surrogate key
- Images: long TTL + purge when needed
- HTML: stale-while-revalidate
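The surrogate-key mechanism in particular deserves a sketch: the cache keeps a reverse index from tag to URLs, so one tag purge removes every associated entry. This mirrors the Fastly/Varnish pattern described above, but the data structures are our own illustration.

```python
# Illustrative sketch of a surrogate-key index.
from collections import defaultdict

cache = {}                    # url -> body
tag_index = defaultdict(set)  # tag -> set of urls carrying that tag

def store(url, body, tags):
    cache[url] = body
    for tag in tags:
        tag_index[tag].add(url)

def purge_tag(tag):
    for url in tag_index.pop(tag, set()):
        cache.pop(url, None)

store("/p/42", "tv page", ["product-42", "cat-electronics"])
store("/p/43", "radio page", ["product-43", "cat-electronics"])
store("/p/99", "sofa page", ["product-99", "cat-furniture"])

purge_tag("cat-electronics")  # one call invalidates both electronics pages
print(sorted(cache))          # ['/p/99']
```

The purge cost is proportional to the number of tagged URLs, not the number of POPs, which is why tag purges scale where per-URL purges become unwieldy.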
5. DDoS Defense — Three-Layer Shield
Responses by Attack Type
L3/L4 (volumetric) — UDP flood, SYN flood, amplification
- Distributed absorption via Anycast
- Network edge devices drop abnormal patterns
- 2024 largest on record: Cloudflare absorbed 3.8Tbps
L7 (application) — HTTP flood, Slowloris, recursive queries
- Rate limiting (per IP/path/cookie)
- Challenges (CAPTCHA, Turnstile)
- Bot management (behavioral fingerprints, JS challenges)
Target (resource exhaustion) — expensive DB queries, payment endpoint attacks
- WAF (ModSecurity, Cloudflare WAF rules)
- Protecting origin via caching
- Circuit breakers
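The per-IP rate limiting mentioned above is often implemented as a token bucket. The capacity and refill numbers below are illustrative, not any vendor's defaults.

```python
# Minimal token-bucket sketch: bursts up to `capacity` are allowed, then
# requests are throttled to the steady refill rate.
class TokenBucket:
    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.rate = refill_per_sec
        self.last = 0.0

    def allow(self, now):
        # refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=5, refill_per_sec=1)
results = [bucket.allow(now=0.0) for _ in range(7)]
print(results.count(True))    # 5: burst of 5 allowed, next 2 throttled
print(bucket.allow(now=2.0))  # True: ~2 tokens refilled after 2 seconds
```

In production the bucket state lives per key (IP, path, cookie) in a shared store at the edge, but the admission logic is the same.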
Scrubbing Center Architecture
Traditional DDoS protection:
- Customer's legitimate traffic passes through a scrubbing center
- Malicious patterns are filtered; only legitimate traffic reaches origin
Problem: if the customer's origin IP is exposed, attackers can bypass.
Modern CDN: Absorption-Based
Cloudflare/Fastly Anycast distributes and absorbs attacks across hundreds of POPs worldwide. Even if an individual POP is overloaded, BGP shifts traffic to another POP.
Products like "Magic Transit" let the CDN advertise the customer's own /24 prefix via BGP for upstream protection.
6. The Three Edge Computing Products
AWS Lambda@Edge (2017)
- Event-driven via CloudFront (Viewer Request/Response, Origin Request/Response)
- Node.js, Python runtimes
- Cold start: hundreds of ms
- Limited to a subset of regions (not every POP)
- Max execution time: 5s for Viewer, 30s for Origin
- Memory: 128-10,240MB
Use cases: redirects, A/B testing, auth validation
Cloudflare Workers (2017)
- Based on V8 Isolate (with optional Wasm)
- JavaScript/TypeScript/Rust (via Wasm)
- Cold start: ~5ms
- Runs on all POPs in 300+ cities
- Limits: 10ms CPU (Free), up to 30s (paid)
- Durable Objects — strongly consistent state
Use cases: full applications (Remix, Next.js on Workers), API gateways, dynamic image transformation
Fastly Compute@Edge (2019)
- Wasm/WASI-based
- Rust, JavaScript, Go (TinyGo), AssemblyScript
- Cold start: ~35 microseconds
- All of Fastly's POPs
- Unbounded execution time (within practical limits)
Use cases: API transformation, A/B testing, image transformation, complex edge logic
Comparison of the Three
| Metric | Lambda@Edge | Workers | Compute@Edge |
|---|---|---|---|
| Engine | Lambda (container) | V8 Isolate | Wasm |
| Cold start | ~hundreds of ms | ~5ms | ~35 microseconds |
| POP count | limited (13 regions) | 300+ | 80+ |
| Execution time | 5-30s | 10ms-30s | effectively unlimited |
| Languages | Node, Python | JS/TS, Wasm | anything targeting Wasm |
| Price | expensive | cheap | medium |
7. Image Optimization — The CDN Killer Feature
Why It Matters
Over 50% of web traffic is images. Choosing the right format/size alone can cut page load time in half.
Modern Image Formats
- JPEG — still the baseline, decent compression
- WebP — Chrome 2010, supported by all modern browsers
- AVIF — +30% compression over WebP, supported by Safari 16+
- JPEG XL — Safari 17+, gradual adoption
Edge Image Transformation
Cloudflare Images, Fastly Image Optimizer, AWS CloudFront + Lambda@Edge:
https://cdn.example.com/photo.jpg?w=400&fm=webp&q=75
The edge transforms on demand and caches the result. Origin keeps just the single original.
Responsive Images with Accept
When the browser sends Accept: image/avif, image/webp, image/*, the edge picks the best supported format. Separate cache keys via Vary: Accept.
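The negotiation step can be sketched as a preference walk over the Accept header. The preference order below is an assumption; it deliberately ignores the image/* wildcard, since a client advertising the wildcard may still lack an AVIF decoder.

```python
# Illustrative sketch: pick the best image format the browser explicitly
# advertises in its Accept header.
PREFERENCE = ["image/avif", "image/webp", "image/jpeg"]

def pick_format(accept_header):
    accepted = {part.split(";")[0].strip() for part in accept_header.split(",")}
    for fmt in PREFERENCE:
        if fmt in accepted:
            return fmt
    return "image/jpeg"  # conservative fallback when nothing modern is listed

print(pick_format("image/avif, image/webp, image/*"))  # image/avif
print(pick_format("image/webp, image/*"))              # image/webp
print(pick_format("image/*"))                          # image/jpeg
```

Because the answer depends on Accept, the response must carry Vary: Accept (or a custom cache-key dimension) so each format gets its own cache entry.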
8. Streaming Media — HLS/DASH/LL-HLS
ABR (Adaptive Bitrate)
Encode the video at multiple bitrates up front:
- 240p at 400kbps
- 480p at 1000kbps
- 720p at 2500kbps
- 1080p at 5000kbps
The player picks per segment based on current bandwidth. Quality adjusts seamlessly.
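The per-segment choice can be sketched from the ladder above. The 0.8 safety factor is an assumption; real players also smooth bandwidth estimates and account for buffer depth.

```python
# Sketch of ABR rendition selection: pick the highest rung whose bitrate fits
# within a safety fraction of the measured bandwidth.
LADDER = [(240, 400), (480, 1000), (720, 2500), (1080, 5000)]  # (height, kbps)

def pick_rendition(bandwidth_kbps, safety=0.8):
    usable = bandwidth_kbps * safety
    best = LADDER[0]                 # never drop below the lowest rung
    for height, kbps in LADDER:
        if kbps <= usable:
            best = (height, kbps)
    return best

print(pick_rendition(4000))  # (720, 2500): 5000 kbps doesn't fit in 3200
print(pick_rendition(800))   # (240, 400): only the lowest rung fits in 640
```

Since the decision is re-made every segment (about every 6 seconds for classic HLS), quality tracks bandwidth without interrupting playback.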
HLS (HTTP Live Streaming, Apple)
m3u8 master playlist -> per-resolution playlist -> 6-second .ts segments. HTTP-based, a perfect match for CDNs.
LL-HLS (Low Latency)
Standard HLS has 30s+ delay. LL-HLS can achieve under 2 seconds:
- Partial segments within a segment
- HTTP/2 push
- Preload hints
WebRTC — Ultra-Low Latency
Not for streaming but real-time conversation (sub-500ms latency). Needs an SFU (Selective Forwarding Unit), not a CDN.
9. Zero Trust — Replacing VPN
Why VPN Has Hit Its Limits
- The "internal network access = trust" assumption is outdated
- A compromised internal account can exfiltrate everything
- Remote work at scale has turned VPN into a bottleneck
The Zero Trust Model
Every request is always authenticated + context-evaluated. No inside/outside distinction.
Components:
- User ID (SSO)
- Device posture (validated via MDM)
- Network location
- Request context
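Combining those components into a per-request decision looks roughly like the sketch below. The rule set is invented for illustration; real products (Cloudflare Access, BeyondCorp) express this as declarative policy, not code.

```python
# Illustrative Zero Trust policy check: identity + device posture + context
# are evaluated on every request, with no trust granted to network location.
def evaluate(request):
    if not request.get("sso_user"):
        return "deny: unauthenticated"
    if not request.get("device_compliant"):        # e.g. posture reported by MDM
        return "deny: device posture"
    if request.get("country") not in {"KR", "US"}: # hypothetical allowed regions
        return "challenge: unusual location"       # step-up auth, not a hard deny
    return "allow"

print(evaluate({"sso_user": "alice", "device_compliant": True, "country": "KR"}))   # allow
print(evaluate({"sso_user": "alice", "device_compliant": False, "country": "KR"}))  # deny: device posture
```

Note that a valid SSO session alone is not enough: a compromised account on a non-compliant device is still denied, which is exactly the failure mode VPNs miss.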
Edge-Based Zero Trust
Cloudflare Access, Google BeyondCorp, Zscaler, etc:
- Expose internal apps through the edge network
- Every request is authenticated at the edge
- Only legitimate traffic is forwarded internally
Benefits:
- No public IP needed for internal apps
- DDoS defense automatically
- Centralized user logs
Cloudflare Access has a free plan, which made it popular with individuals and startups.
10. Real-World Incidents
Case 1: Fastly's 1-Hour Global Outage, 2021
A valid customer VCL (Varnish Configuration Language) change triggered a latent bug in Fastly's edge software, and nearly all POPs went down simultaneously for about an hour. Amazon, the NYT, Reddit, and UK government sites were affected. Lesson: CDN config test environments are essential; consider vendor diversity.
Case 2: Cloudbleed, 2017
Cloudflare's HTTP parser read an out-of-bounds memory region and mixed fragments of another customer's HTTP response into responses. Sensitive data ended up in Google's cache. It took 7 days to clean up. Lesson: rewrite parsers in memory-safe languages (Rust). Cloudflare subsequently migrated core components to Rust.
Case 3: Rogers Canada, 2022
An ISP outage paralyzed DNS and even emergency services. Indirect CDN impact. Lesson: map network dependencies, design emergency paths.
11. Vendor Selection Guide
Cloudflare
Strengths: price (generous free plan), developer experience, Workers ecosystem, Zero Trust. Weaknesses: video streaming isn't the specialty; enterprise custom support is limited.
Akamai
Strengths: large enterprise references, complex media workflows, mature security auditing. Weaknesses: expensive; developer UX is not up to modern standards.
Fastly
Strengths: extremely fast purge (under 150ms globally), VCL customization, Wasm Compute. Weaknesses: pricier than Cloudflare and a steep learning curve.
AWS CloudFront
Strengths: AWS ecosystem integration, low egress cost (to other AWS services). Weaknesses: limited edge features, average developer UX.
Reality — Multi-Vendor
Large-scale media often runs Akamai + Fastly in parallel; startups typically stick to Cloudflare alone.
12. Observability and Debugging
CDN Response Headers
Debugger's best friends:
CF-Ray: 123abc-ICN # Cloudflare
Age: 3421 # seconds since cache entry
X-Cache: HIT from edge # Fastly cache hit
X-Served-By: cache-icn1 # which POP served the response
Log Streaming
- Cloudflare Logpush -> S3/GCS/Splunk
- Fastly Real-time Log -> S3/Kinesis
- CloudFront -> S3 + Athena
Real-Time Dashboards
Collect with Prometheus, visualize with Grafana:
- Hit ratio per origin
- P99 edge latency per POP
- 4xx/5xx ratio
- Bandwidth to origin
13. Cost Optimization — Bandwidth Is Cash
Principle 1: Raise the Hit Ratio
Going from 90% to 95% halves origin traffic. Tuning:
- Explicit Cache-Control
- Normalize cookies/query parameters
- Enable Origin Shield
Principle 2: Optimize Image Formats
Just converting JPG to WebP can cut traffic by 30%.
Principle 3: Brotli Compression
Brotli adds 15-25% more compression over gzip. Most CDNs enable it by default.
Principle 4: Origin Location Strategy
Cloudflare/Fastly generally waive egress between themselves and origin (within the same CDN). AWS CloudFront to AWS regions is cheap. Cross-cloud is expensive.
Principle 5: Commit Contracts
If your traffic is predictable, an annual commit can get you 40%+ discount.
14. Pitfalls and Anti-Patterns
Pitfall 1: No Cache Headers at Origin
If origin responds without Cache-Control, the CDN is conservative and doesn't cache. Always be explicit.
Pitfall 2: Breaking Cache with Cookies/Query Strings
?timestamp=123 or ad-tracking cookies that differ per request -> cache miss. Normalization rules are mandatory.
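A typical normalization step strips such parameters before the cache lookup. The strip list below is an assumption; yours should come from auditing which parameters actually change the response.

```python
# Sketch of query-string normalization: drop parameters that never affect the
# response, and sort the rest so parameter order can't split the cache.
from urllib.parse import urlsplit, urlencode, parse_qsl

STRIP = {"timestamp", "utm_source", "utm_medium", "utm_campaign", "fbclid"}

def normalize(url):
    parts = urlsplit(url)
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k not in STRIP)
    return parts.path + ("?" + urlencode(kept) if kept else "")

print(normalize("/p?timestamp=123&id=7&utm_source=ad"))  # /p?id=7
print(normalize("/p?id=7"))                              # /p?id=7 — same key, cache hit
```

Both requests now share one cache entry, instead of every tracking-tagged click producing a fresh miss.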
Pitfall 3: Habitual Purging
A full purge on every deploy causes origin re-explosion. Use versioned URLs or surrogate keys.
Pitfall 4: Anycast + Stateful UDP
Running UDP sessions over Anycast breaks on BGP changes. You need QUIC's Connection ID or L4 stickiness.
Pitfall 5: Hot Event Without Origin Shield
Viral traffic hits all POPs -> each POP misses and goes to origin -> origin goes down. Use Shield as a shared cache layer.
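The "shared cache layer" works because the shield also coalesces concurrent misses: N simultaneous requests for the same hot key trigger one origin fetch, not N. The sketch below models that with a per-key lock; real shields coalesce in-flight HTTP requests, but the counting is the same.

```python
# Illustrative request-coalescing (single-flight) sketch.
import threading

cache, locks, origin_calls = {}, {}, []
guard = threading.Lock()

def fetch(key, origin):
    with guard:
        lock = locks.setdefault(key, threading.Lock())
    with lock:                 # only one fetcher per key at a time
        if key not in cache:   # late arrivals find the value already cached
            cache[key] = origin(key)
    return cache[key]

def origin(key):
    origin_calls.append(key)
    return f"body:{key}"

threads = [threading.Thread(target=fetch, args=("/hot", origin)) for _ in range(10)]
for t in threads: t.start()
for t in threads: t.join()
print(len(origin_calls))  # 1: ten concurrent requests, one origin fetch
```

Without this layer, a viral URL turns every edge POP's miss into its own origin request, which is exactly the thundering herd the pitfall describes.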
Pitfall 6: Preview/Staging URLs Bypassing the CDN
If staging is exposed directly without CDN, DDoS risk follows. Put staging behind access control and the CDN.
15. Real-World Checklist of 12
- Explicit Cache-Control — don't rely on defaults
- Turn on Brotli compression
- Auto-convert to AVIF/WebP — cut image size 50%
- Enable Origin Shield — prepare for large events
- Document your purge strategy — surrogate key vs versioned URL
- Set default rate limiting — especially on auth endpoints
- Apply WAF OWASP Top 10 rules
- Turn on HTTP/3 — improves mobile UX
- Integrate observability — CDN logs -> SIEM
- Cost alerts — detect deviations from forecasts
- Multi-vendor PoC — use two CDNs for critical services
- Review edge-compute use cases — redirects, auth, A/B belong at the edge
Coming Next — Chaos Engineering
"What happens when the internet slows down?" "What if one AZ goes down?" The answer is to actually break things on purpose. In the next post:
- The birth of Netflix's Chaos Monkey (2010)
- The 4 Pillars of Chaos Engineering
- Game Day — real-world simulation drills
- Netflix's Simian Army — Chaos Monkey/Gorilla/Kong
- LitmusChaos / Chaos Mesh — chaos for Kubernetes
- AWS Fault Injection Simulator
- Combining Observability + Chaos
- Extension to organizational culture — blameless postmortems
- How to design chaos experiments
You've probably heard "we're considering chaos engineering" in an ops meeting at least once. The next post unpacks what it actually means.
"You cannot avoid failure. You can prepare for failure. Chaos engineering codifies that preparation."