CDN and Edge Computing Deep Dive — Anycast, Cache Invalidation, DDoS Defense, Lambda@Edge vs Workers vs Compute

Introduction — The 1.3-Second Battle

A 2006 Amazon experiment found that every 100ms of added page latency cost 1% in sales. A 2017 Google study found that on mobile, going from a 1-second to a 3-second load time increases bounce rate by 32%.

To win back a second, internet companies have spread servers across the globe, manipulated BGP, stacked cache tiers, and even placed serverless functions within 100km of every user. In this post:

  • The birth of CDN — the problem Akamai solved in 1998
  • The internals of Anycast routing — "how does the same IP route to the nearest location everywhere?"
  • Cache tier strategies — tiered, shielded, push
  • The hard problem of cache invalidation and real-world patterns
  • DDoS defense — the L3/L4/L7 three-layer shield
  • The meaning of edge computing and the three major products (Lambda@Edge, Cloudflare Workers, Fastly Compute)
  • Image/video transformation at the edge
  • Zero Trust — replacing VPN with Cloudflare Access
  • Three real-world incident case studies

In the previous post WebAssembly + WASI Deep Dive, we showed Wasm as the execution engine of the edge. In this post, we look at the globe-scale infrastructure that engine runs on.


1. The Birth of CDN — Akamai, 1998

The Problem: 1997 Super Bowl

CBS tried streaming a sports event over the web. The server was in Boston, users in California. In 1997, a round trip across the US took about 70ms, plus bandwidth contention. The site went down.

An MIT research team turned this into a company — Akamai (Hawaiian for "smart/quick").

The Solution: Replicate Content Close to Users

Place cache servers in POPs (Points of Presence) at ISPs around the world, and serve identical requests from the nearest server. In 1999, Akamai commercialized the world's first global system that answered each user's DNS query with the IP of a geographically nearby server.

1999-2025 CDN Evolution

  • 1999: Akamai commercialized, static asset caching
  • 2001: Limelight Networks founded, an early large-scale Akamai competitor
  • 2008: AWS CloudFront launches
  • 2009: Cloudflare founded, introduces free plan
  • 2011: Fastly — high performance based on Varnish
  • 2017: Cloudflare Workers — edge computing goes mainstream
  • 2019: Fastly Compute@Edge (Wasm-based)
  • 2022: CDNs absorb Zero Trust, DDoS, and Bot defense
  • 2025: Cloudflare/Fastly extend AI inference to the edge

2. Anycast — Same IP, Different Locations

Traditional DNS-Based Geo Routing

1990s Akamai approach: user queries a.akamai.net → DNS inspects the client IP and returns the closest server's IP.

Limitations:

  • Location inferred only from the recursive resolver's IP, often inaccurate (improved via EDNS Client Subnet)
  • No path change until TTL expires
  • Requires maintaining many IPs

Anycast — Magic at the BGP Level

Cloudflare assigns the same IP, 1.1.1.1, to thousands of servers across 300+ cities simultaneously. When a user sends a packet to 1.1.1.1, BGP routing automatically delivers it to the topologically nearest POP (usually also the geographically closest one).

How it works:

  1. Cloudflare advertises the 1.1.1.0/24 prefix from every POP with the same AS number via BGP
  2. Neighboring ISPs pick the shortest AS-path among multiple routes
  3. The user is routed to the geographically closest POP
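The route-selection rule in step 2 can be sketched as follows. This is a minimal model, not a BGP implementation; the POP names and neighbor AS numbers are hypothetical (13335 is Cloudflare's real ASN).

```python
# Minimal sketch: among several advertisements of the same prefix, a
# router prefers the route with the shortest AS path. This is what
# steers each user toward a nearby POP when every POP announces
# 1.1.1.0/24 under the same origin AS.

def best_route(routes):
    """routes: list of (next_hop, as_path) tuples for one prefix."""
    return min(routes, key=lambda r: len(r[1]))

# Hypothetical routes a Tokyo-area router might see for 1.1.1.0/24:
routes_seen = [
    ("pop-tokyo", [13335]),                  # direct peering
    ("pop-la", [2914, 13335]),               # via a transit provider
    ("pop-frankfurt", [1299, 3320, 13335]),  # long detour
]
print(best_route(routes_seen)[0])  # -> pop-tokyo
```

Real BGP best-path selection has many tiebreakers before and after AS-path length (local preference, MED, router ID), but for an Anycast CDN the short-path heuristic is what does the geographic work.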

Benefits of Anycast

  • No need for DNS-level tuning
  • If one POP goes down, BGP automatically re-routes to the next closest POP (failover in seconds)
  • Single IP serves the entire globe

Anycast's Pitfalls

Session persistence is hard. What if the BGP path changes during the initial TCP 3-way handshake? The final ACK lands at a new POP that never saw the SYN, so there is no half-open connection to complete and the handshake fails.

Workarounds:

  • Most CDNs keep BGP paths stable enough
  • QUIC/Connection ID-based migration (previous post)
  • Session affinity: switch from Anycast at the entry point to internal Unicast for L7 sessions

3. Cache Tiers — From Edge to Origin

The Order a Request Encounters Caches

[User] -> [Edge POP (nearest city)] -> [Regional cache] -> [Origin Shield] -> [Origin server]

The role of each tier:

  • Edge POP: within 50-200km of the user. Target: 90%+ hit ratio.
  • Regional cache: continent/region level. Used on edge miss.
  • Origin Shield: right before origin; a shared cache that minimizes origin load.
  • Origin: the actual content store (S3, web server).

The Economics of Hit Ratio

Assume 1TB/month of traffic:

  • Edge hit ratio 95% -> only 50GB reaches origin -> massive bandwidth savings
  • Cloudflare includes bandwidth at no per-GB charge; AWS S3 egress runs $50-100/TB. Raising the edge hit ratio cuts that bill almost proportionally.
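The arithmetic behind those bullets can be checked directly. The per-GB price below is an assumption in the S3-egress ballpark, not a vendor quote:

```python
# Back-of-the-envelope origin-cost check for 1 TB/month of delivered
# traffic at a 95% edge hit ratio.

monthly_gb = 1000            # 1 TB/month delivered to users
hit_ratio = 0.95
origin_gb = monthly_gb * (1 - hit_ratio)   # bytes that miss the edge

egress_per_gb = 0.09         # assumed ~$0.09/GB origin egress
without_cdn = monthly_gb * egress_per_gb
with_cdn = origin_gb * egress_per_gb

print(f"origin traffic: {origin_gb:.0f} GB")               # 50 GB
print(f"egress: ${without_cdn:.2f}/mo -> ${with_cdn:.2f}/mo")
```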

Interpreting Cache-Control Headers

Cache-Control: public, max-age=31536000, s-maxage=86400, immutable
  • public — intermediate caches allowed
  • max-age=31536000 — browser cache 1 year
  • s-maxage=86400 — CDN cache 1 day
  • immutable — do not even send revalidation requests (even on reload)

Modern edges also support stale-while-revalidate:

Cache-Control: max-age=3600, stale-while-revalidate=86400

"Fresh within 1 hour, serve stale for up to 24 hours while refreshing in the background."

Key Design — When the Same URL Should Be Treated Differently

The default cache key is the URL, but the same URL may differ by:

  • Mobile vs desktop (image resolution)
  • Language/region
  • Auth state

Header-based cache key extensions are needed:

cacheKey:
  - url
  - header: Accept-Encoding
  - header: User-Agent(normalized)  # mobile/desktop only

Too many dimensions -> cardinality explosion, hit ratio collapses. Policy design matters.
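A minimal sketch of such a cache-key function. The normalization buckets and key layout are illustrative, not any vendor's API; the point is that User-Agent collapses to two values instead of thousands.

```python
# Build a low-cardinality cache key from URL + selected headers.

from urllib.parse import urlsplit

def device_class(user_agent: str) -> str:
    """Collapse User-Agent into exactly two buckets."""
    ua = user_agent.lower()
    if any(tok in ua for tok in ("mobile", "android", "iphone")):
        return "mobile"
    return "desktop"

def cache_key(url: str, headers: dict) -> str:
    parts = urlsplit(url)
    return "|".join([
        parts.netloc + parts.path,            # query string dropped here for
                                              # simplicity; real policies normalize it
        headers.get("Accept-Encoding", ""),
        device_class(headers.get("User-Agent", "")),
    ])

key = cache_key("https://example.com/p/42?x=1",
                {"Accept-Encoding": "br",
                 "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0)"})
print(key)  # -> example.com/p/42|br|mobile
```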


4. Cache Invalidation — One of Two Hard Things

"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton

The Heart of the Problem

Content at a URL changes, but the CDN still serves the old version. How do you "instantly" wipe a cache scattered across hundreds of POPs worldwide?

Four Invalidation Strategies

1. Natural TTL expiry

  • Easy, no control
  • Unsuitable for news/prices that need immediate reflection

2. Purge (immediate delete)

  • Invoke the CDN API to purge a URL
  • Cloudflare: handles 100k+ purges per second
  • Cost: included free on some plans, metered on others

3. Surrogate Key / Tag-based invalidation

  • Response has header Surrogate-Key: product-42 cat-electronics
  • "Purge everything tagged cat-electronics" invalidates thousands of URLs at once
  • Fastly pioneered this pattern, originating in Varnish

4. Versioned URLs (cache busting)

  • Hash in the filename: main.abc123.js
  • Each deploy yields new URLs -> no invalidation needed, old files age out naturally
  • Default in React/Next.js/Vite
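Strategy 3's tag-based purge boils down to a reverse index from tag to cached URLs. A toy in-memory model (real CDNs maintain this across hundreds of POPs, but the data structure is essentially the same):

```python
# Surrogate-key purging: one purge_tag() call evicts every URL that
# carried that tag, no matter how many there are.

from collections import defaultdict

class EdgeCache:
    def __init__(self):
        self.store = {}                  # url -> body
        self.by_tag = defaultdict(set)   # surrogate key -> urls

    def put(self, url, body, tags=()):
        self.store[url] = body
        for tag in tags:
            self.by_tag[tag].add(url)

    def purge_tag(self, tag):
        for url in self.by_tag.pop(tag, set()):
            self.store.pop(url, None)

cache = EdgeCache()
cache.put("/p/42", "<html>", tags=["product-42", "cat-electronics"])
cache.put("/p/43", "<html>", tags=["product-43", "cat-electronics"])
cache.purge_tag("cat-electronics")   # both pages evicted in one call
print(sorted(cache.store))           # -> []
```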

Reality — Combinations

Production operations combine all four:

  • CSS/JS: versioned URLs
  • API responses: short TTL + surrogate key
  • Images: long TTL + purge when needed
  • HTML: stale-while-revalidate

5. DDoS Defense — Three-Layer Shield

Responses by Attack Type

L3/L4 (volumetric) — UDP flood, SYN flood, amplification

  • Distributed absorption via Anycast
  • Network edge devices drop abnormal patterns
  • 2024 largest on record: Cloudflare absorbed 3.8Tbps

L7 (application) — HTTP flood, Slowloris, cache-busting request storms

  • Rate limiting (per IP/path/cookie)
  • Challenges (CAPTCHA, Turnstile)
  • Bot management (behavioral fingerprints, JS challenges)
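The first bullet, per-key rate limiting, can be sketched as a sliding-window counter. This is a single-node toy; production shields distribute these counters across POPs and back them with challenges rather than flat rejections.

```python
# Sliding-window rate limiter keyed by (IP, path).

import time
from collections import defaultdict

class RateLimiter:
    def __init__(self, limit, window_s):
        self.limit, self.window = limit, window_s
        self.hits = defaultdict(list)        # key -> request timestamps

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        recent = [t for t in self.hits[key] if now - t < self.window]
        self.hits[key] = recent
        if len(recent) >= self.limit:
            return False                     # over limit: 429 or challenge
        recent.append(now)
        return True

rl = RateLimiter(limit=3, window_s=60)
results = [rl.allow(("203.0.113.9", "/login"), now=i) for i in range(5)]
print(results)  # -> [True, True, True, False, False]
```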

Targeted (resource exhaustion) — expensive DB queries, payment endpoint attacks

  • WAF (ModSecurity, Cloudflare WAF rules)
  • Protecting origin via caching
  • Circuit breakers

Scrubbing Center Architecture

Traditional DDoS protection:

  1. Customer's legitimate traffic passes through a scrubbing center
  2. Malicious patterns are filtered; only legitimate traffic reaches origin

Problem: if the customer's origin IP is exposed, attackers can bypass.

Modern CDN: Absorption-Based

Cloudflare/Fastly Anycast distributes and absorbs attacks across hundreds of POPs worldwide. Even if an individual POP is overloaded, BGP shifts traffic to another POP.

Products like "Magic Transit" let the CDN advertise the customer's own /24 prefix via BGP for upstream protection.


6. The Three Edge Computing Products

AWS Lambda@Edge (2017)

  • Event-driven via CloudFront (Viewer Request/Response, Origin Request/Response)
  • Node.js, Python runtimes
  • Cold start: hundreds of ms
  • Limited to a subset of regions (not every POP)
  • Max execution time: 5s for Viewer, 30s for Origin
  • Memory: 128-10,240MB

Use cases: redirects, A/B testing, auth validation

Cloudflare Workers (2017)

  • Based on V8 Isolate (with optional Wasm)
  • JavaScript/TypeScript/Rust (via Wasm)
  • Cold start: ~5ms
  • Runs on all POPs in 300+ cities
  • Limits: 10ms CPU (Free), up to 30s (paid)
  • Durable Objects — strongly consistent state

Use cases: full applications (Remix, Next.js on Workers), API gateways, dynamic image transformation

Fastly Compute@Edge (2019)

  • Wasm/WASI-based
  • Rust, JavaScript, Go (TinyGo), AssemblyScript
  • Cold start: ~35 microseconds
  • All of Fastly's POPs
  • Unbounded execution time (within practical limits)

Use cases: API transformation, A/B testing, image transformation, complex edge logic

Comparison of the Three

Metric           Lambda@Edge            Workers            Compute@Edge
Engine           Lambda (container)     V8 Isolate         Wasm
Cold start       ~hundreds of ms        ~5ms               ~35 microseconds
POP count        limited (13 regions)   300+ cities        80+
Execution time   5-30s                  10ms-30s (CPU)     effectively unlimited
Languages        Node, Python           JS/TS, Wasm        anything targeting Wasm
Price            expensive              cheap              medium

7. Image Optimization — The CDN Killer Feature

Why It Matters

Images account for roughly half the bytes on a typical page. Choosing the right format/size alone can cut page load time in half.

Modern Image Formats

  • JPEG — still the baseline, decent compression
  • WebP — Chrome 2010, supported by all modern browsers
  • AVIF — +30% compression over WebP, supported by Safari 16+
  • JPEG XL — Safari 17+, gradual adoption

Edge Image Transformation

Cloudflare Images, Fastly Image Optimizer, AWS CloudFront + Lambda@Edge:

https://cdn.example.com/photo.jpg?w=400&fm=webp&q=75

The edge transforms on demand and caches the result. Origin keeps just the single original.

Responsive Images with Accept

When the browser sends Accept: image/avif, image/webp, image/*, the edge picks the best supported format. Separate cache keys via Vary: Accept.
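A sketch of that negotiation. It is simplified: it ignores q-values and wildcard precedence, which a production edge must honor per the Accept header spec.

```python
# Pick the best image format the client advertises support for.

def pick_format(accept: str) -> str:
    offered = [t.strip() for t in accept.split(",")]
    for fmt in ("image/avif", "image/webp"):   # preference order, best first
        if fmt in offered:
            return fmt
    return "image/jpeg"                        # universal fallback

assert pick_format("image/avif, image/webp, image/*") == "image/avif"
assert pick_format("image/webp, image/*") == "image/webp"
assert pick_format("image/*") == "image/jpeg"
```

Because the response now depends on the Accept header, the edge must emit Vary: Accept so each format gets its own cache entry, exactly as the paragraph above notes.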


8. Streaming Media — HLS/DASH/LL-HLS

ABR (Adaptive Bitrate)

Encode the video at multiple bitrates up front:

  • 240p at 400kbps
  • 480p at 1000kbps
  • 720p at 2500kbps
  • 1080p at 5000kbps

The player picks per segment based on current bandwidth. Quality adjusts seamlessly.
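The per-segment choice can be sketched like this. Real players also smooth their bandwidth estimates and factor in buffer depth; the 0.8 safety factor is an illustrative assumption.

```python
# Choose the next segment's rendition from the ladder above,
# leaving headroom below the measured bandwidth.

LADDER = [(240, 400), (480, 1000), (720, 2500), (1080, 5000)]  # (res, kbps)

def pick_rendition(measured_kbps, safety=0.8):
    usable = measured_kbps * safety   # headroom for bandwidth jitter
    best = LADDER[0]                  # never below the lowest rung
    for res, kbps in LADDER:
        if kbps <= usable:
            best = (res, kbps)
    return best

assert pick_rendition(4000) == (720, 2500)    # 3200 usable -> 720p
assert pick_rendition(8000) == (1080, 5000)
assert pick_rendition(300) == (240, 400)      # starved: take the floor
```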

HLS (HTTP Live Streaming, Apple)

m3u8 master playlist -> per-resolution playlist -> 6-second .ts segments. HTTP-based, a perfect match for CDNs.

LL-HLS (Low Latency)

Standard HLS has 30s+ delay. LL-HLS can achieve under 2 seconds:

  • Partial segments within a segment
  • Blocking playlist reloads (the spec originally relied on HTTP/2 push, later dropped)
  • Preload hints

WebRTC — Ultra-Low Latency

Not for streaming but real-time conversation (sub-500ms latency). Needs an SFU (Selective Forwarding Unit), not a CDN.


9. Zero Trust — Replacing VPN

Why VPN Has Hit Its Limits

  • The "internal network access = trust" assumption is outdated
  • A compromised internal account can exfiltrate everything
  • Remote work at scale has turned VPN into a bottleneck

The Zero Trust Model

Every request is always authenticated + context-evaluated. No inside/outside distinction.

Components:

  • User ID (SSO)
  • Device posture (validated via MDM)
  • Network location
  • Request context

Edge-Based Zero Trust

Cloudflare Access, Google BeyondCorp, Zscaler, etc:

  1. Expose internal apps through the edge network
  2. Every request is authenticated at the edge
  3. Only legitimate traffic is forwarded internally
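A toy version of step 2, with an HMAC-signed cookie standing in for the SSO-issued JWT that a real product like Cloudflare Access verifies. The secret and cookie format are illustrative assumptions.

```python
# Edge-side session check: verify the signature before forwarding
# anything to the internal app.

import hashlib
import hmac

SECRET = b"demo-secret"  # assumed shared between SSO callback and edge

def sign(user: str) -> str:
    mac = hmac.new(SECRET, user.encode(), hashlib.sha256).hexdigest()
    return f"{user}.{mac}"

def edge_check(cookie: str) -> bool:
    user, _, sig = cookie.partition(".")
    expected = hmac.new(SECRET, user.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)   # constant-time compare

assert edge_check(sign("alice"))
assert not edge_check("alice.deadbeef")   # forged cookie rejected at the edge
```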

Benefits:

  • No public IP needed for internal apps
  • DDoS defense automatically
  • Centralized user logs

Cloudflare Access has a free plan, which made it popular with individuals and startups.


10. Real-World Incidents

Case 1: Fastly's 1-Hour Global Outage, 2021

A customer's VCL (Varnish Config Language) change triggered a latent bug in Fastly's edge software, taking down most POPs simultaneously. Amazon, NYT, Reddit, and UK government sites were affected. Lesson: CDN config test environments are essential; consider vendor diversity.

Case 2: Cloudbleed, 2017

Cloudflare's HTTP parser read an out-of-bounds memory region and mixed fragments of another customer's HTTP response into responses. Sensitive data ended up in Google's cache. It took 7 days to clean up. Lesson: rewrite parsers in memory-safe languages (Rust). Cloudflare subsequently migrated core components to Rust.

Case 3: Rogers Canada, 2022

An ISP outage paralyzed DNS and even emergency services. Indirect CDN impact. Lesson: map network dependencies, design emergency paths.


11. Vendor Selection Guide

Cloudflare

Strengths: price (generous free plan), developer experience, Workers ecosystem, Zero Trust. Weaknesses: video streaming isn't the specialty; enterprise custom support is limited.

Akamai

Strengths: large enterprise references, complex media workflows, mature security auditing. Weaknesses: expensive; developer UX is not up to modern standards.

Fastly

Strengths: extremely fast purge (under 150ms globally), VCL customization, Wasm Compute. Weaknesses: pricier than Cloudflare and a steep learning curve.

AWS CloudFront

Strengths: AWS ecosystem integration, low egress cost (to other AWS services). Weaknesses: limited edge features, average developer UX.

Reality — Multi-Vendor

Large-scale media often runs Akamai + Fastly in parallel; startups typically stick to Cloudflare alone.


12. Observability and Debugging

CDN Response Headers

Debugger's best friends:

CF-Ray: 123abc-ICN       # Cloudflare
Age: 3421                # seconds since cache entry
X-Cache: HIT from edge   # Fastly cache hit
X-Served-By: cache-icn1  # which POP served the response

Log Streaming

  • Cloudflare Logpush -> S3/GCS/Splunk
  • Fastly Real-time Log -> S3/Kinesis
  • CloudFront -> S3 + Athena

Real-Time Dashboards

Collect with Prometheus, visualize with Grafana:

  • Hit ratio per origin
  • P99 edge latency per POP
  • 4xx/5xx ratio
  • Bandwidth to origin

13. Cost Optimization — Bandwidth Is Cash

Principle 1: Raise the Hit Ratio

Going from 90% to 95% halves origin traffic. Tuning:

  • Explicit Cache-Control
  • Normalize cookies/query parameters
  • Enable Origin Shield

Principle 2: Optimize Image Formats

Just converting JPG to WebP can cut traffic by 30%.

Principle 3: Brotli Compression

Brotli adds 15-25% more compression over gzip. Most CDNs enable it by default.

Principle 4: Origin Location Strategy

Some storage providers waive egress toward partner CDNs (Cloudflare's Bandwidth Alliance is the best-known example). CloudFront pulling from AWS regions is cheap. Cross-cloud origin egress is expensive.

Principle 5: Commit Contracts

If your traffic is predictable, an annual commit can get you 40%+ discount.


14. Pitfalls and Anti-Patterns

Pitfall 1: No Cache Headers at Origin

If origin responds without Cache-Control, the CDN is conservative and doesn't cache. Always be explicit.

Pitfall 2: Breaking Cache with Cookies/Query Strings

?timestamp=123 or ad-tracking cookies that differ per request -> cache miss. Normalization rules are mandatory.
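A sketch of such a normalization rule: strip known tracking parameters and sort the rest, so equivalent URLs share one cache entry. The blocklist is illustrative.

```python
# Normalize a URL's query string before using it as a cache key.

from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TRACKING = {"utm_source", "utm_medium", "utm_campaign", "timestamp", "fbclid"}

def normalize(url: str) -> str:
    parts = urlsplit(url)
    kept = sorted((k, v) for k, v in parse_qsl(parts.query)
                  if k not in TRACKING)
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), ""))

# Two superficially different URLs collapse to the same cache key:
a = normalize("https://ex.com/p?b=2&utm_source=ad&a=1")
b = normalize("https://ex.com/p?a=1&b=2&timestamp=999")
assert a == b == "https://ex.com/p?a=1&b=2"
```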

Pitfall 3: Habitual Purging

A full purge on every deploy causes origin re-explosion. Use versioned URLs or surrogate keys.

Pitfall 4: Anycast + Stateful UDP

Running UDP sessions over Anycast breaks on BGP changes. You need QUIC's Connection ID or L4 stickiness.

Pitfall 5: Hot Event Without Origin Shield

Viral traffic hits all POPs -> each POP misses and goes to origin -> origin goes down. Use Shield as a shared cache layer.

Pitfall 6: Preview/Staging URLs Bypassing the CDN

If staging is exposed directly without CDN, DDoS risk follows. Put staging behind access control and the CDN.


15. Real-World Checklist of 12

  1. Explicit Cache-Control — don't rely on defaults
  2. Turn on Brotli compression
  3. Auto-convert to AVIF/WebP — cut image size 50%
  4. Enable Origin Shield — prepare for large events
  5. Document your purge strategy — surrogate key vs versioned URL
  6. Set default rate limiting — especially on auth endpoints
  7. Apply WAF OWASP Top 10 rules
  8. Turn on HTTP/3 — improves mobile UX
  9. Integrate observability — CDN logs -> SIEM
  10. Cost alerts — detect deviations from forecasts
  11. Multi-vendor PoC — use two CDNs for critical services
  12. Review edge-compute use cases — redirects, auth, A/B belong at the edge

Coming Next — Chaos Engineering

"What happens when the internet slows down?" "What if one AZ goes down?" The answer is to actually break things on purpose. In the next post:

  • The birth of Netflix's Chaos Monkey (2010)
  • The 4 Pillars of Chaos Engineering
  • Game Day — real-world simulation drills
  • Netflix's Simian Army — Chaos Monkey/Gorilla/Kong
  • LitmusChaos / Chaos Mesh — chaos for Kubernetes
  • AWS Fault Injection Simulator
  • Combining Observability + Chaos
  • Extension to organizational culture — blameless postmortems
  • How to design chaos experiments

You've probably heard "we're considering chaos engineering" in an ops meeting at least once. The next post unpacks what it actually means.

"You cannot avoid failure. You can prepare for failure. Chaos engineering codifies that preparation."