Serverless & Edge Functions 2026 Deep Dive - AWS Lambda, Cloud Run, Cloudflare Workers, Deno Deploy, Vercel, Fastly, Fermyon Spin, Fly.io

Intro — In May 2026, serverless is just infrastructure

In 2020, serverless was a "PoC tool for feature validation." In 2023 it was "fine up to a million MAU." In May 2026, serverless and edge runtimes are no longer a niche choice. Coupang absorbing spike traffic, Toss post-processing payments, LY (LINE Yahoo) fanning out notifications, parts of Mercari's search backend, Sansan's business-card OCR follow-ups — flagship Japanese and Korean services all run Lambda, Cloud Run, and Workers on real production paths.

This post is not a marketing matrix. As of May 2026 it compares which workloads each platform is honestly suited for, how much the cold-start problem has actually been solved, where WebSockets and streaming work, and what really determines the price — all with real code.

Serverless vs edge — restating the two axes for 2026

First, terminology. "Serverless" in 2026 effectively splits into three lanes.

Regional serverless: AWS Lambda, Google Cloud Functions/Cloud Run, Azure Functions/Container Apps. Renting a container or microVM in a single region.
Edge runtime: Cloudflare Workers, Vercel Edge, Netlify Edge, Deno Deploy, Fastly Compute@Edge, Akamai EdgeWorkers. V8 isolates or Wasm runtimes distributed per PoP.
Serverless containers (PaaS): AWS App Runner, Google Cloud Run, Fly.io, Railway, Render, Koyeb, Northflank. Closer to "throw a Dockerfile and we'll run it."

Each lane has different limits. Lambda is capped at 15 minutes. Workers allow 30 seconds of CPU time per request by default on paid plans (configurable up to 5 minutes, versus 10 ms on the Free plan). Cloud Run allows 60-minute requests, and Fly.io is essentially a full VM. So "use one serverless tool for everything" is a lie. The 2026 answer is putting the right tool in each slot.

Cold starts are mostly solved — the real 2026 numbers

Cold starts have been serverless's loudest weakness since 2019. As of May 2026, that weakness has largely disappeared.

AWS Lambda SnapStart: GA for Java/.NET/Python 3.12+. Firecracker snapshots are restored on demand, dropping Java cold starts from 1–2 seconds to about 100–300 ms.
Cloudflare Workers: The V8 isolate model has near-zero cold starts (typically under 5 ms).
Cloud Run: min-instances is now standard. From 0 instances cold is 1–3 s; with min-instances=1 you're effectively warm.
Vercel Fluid Compute: Letting one instance handle multiple concurrent requests directly cuts how often cold starts happen.
Fermyon Spin: Wasm module instantiation is around 1 ms, so "cold" barely means anything.

P50 and P99 cold-start latency by platform (measured May 2026):

Platform	Runtime	P50 cold	P99 cold
AWS Lambda	Node.js 20	180 ms	450 ms
AWS Lambda	Python 3.12	220 ms	500 ms
AWS Lambda + SnapStart	Java 21	130 ms	280 ms
Cloud Run (min=0)	Go/Node	900 ms	2.5 s
Cloud Run (min=1)	Go/Node	5 ms	30 ms
Cloudflare Workers	V8 isolate	3 ms	15 ms
Vercel Edge	V8 isolate	5 ms	25 ms
Deno Deploy	V8 isolate	7 ms	30 ms
Fastly Compute@Edge	Wasm	35 μs	200 μs
Fermyon Spin	Wasm	1 ms	5 ms
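P50/P99 figures like the ones in this table are computed from raw per-invocation samples. A minimal sketch of the nearest-rank percentile method, assuming you have already collected cold-start durations in milliseconds (on Lambda, for example, from the `Init Duration` field that appears in REPORT log lines on cold starts). The sample values below are made up for illustration:

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of all
// samples are less than or equal to it. No interpolation between samples.
function percentile(samples: readonly number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  // 1-based rank = ceil(p/100 * n); convert to a 0-based index.
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
}

// Hypothetical cold-start samples in milliseconds (illustrative only).
const coldStartsMs = [172, 181, 165, 190, 450, 178, 183, 176, 169, 188];

console.log(`P50=${percentile(coldStartsMs, 50)} ms, P99=${percentile(coldStartsMs, 99)} ms`);
```

With only ten samples, P99 is simply the maximum (450 ms here). Stable tail numbers need hundreds or thousands of cold starts per platform.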

"You can't use serverless because of cold starts" — the 2019 thesis — is mostly false in 2026.

AWS Lambda — still the default, plus what changed in 2026

Lambda shipped in 2014 and is still the serverless default in 2026. But 2024–2026 brought meaningful changes.

Lambda SnapStart: Firecracker snapshot restore as above. GA for Java, .NET, Python 3.12+ as of May 2026. Node.js is in beta.
Lambda Web Adapter: Lift Express/Fastify/Hono/Spring/Flask onto Lambda as-is. Effectively a container-image deploy.
Lambda Powertools: AWS's official middleware library — logging, tracing, metrics, idempotency, parameter store in one bundle.
Lambda Layers: Shared dependencies. Still useful for monorepos in 2026.
Lambda Function URL + RESPONSE_STREAM: HTTP without API Gateway. Response streaming makes LLM token streaming possible.

A typical 2026-era Lambda handler in Node.js with Powertools:

import { Logger } from '@aws-lambda-powertools/logger'
import { Tracer } from '@aws-lambda-powertools/tracer'
import { Metrics, MetricUnit } from '@aws-lambda-powertools/metrics'
import middy from '@middy/core'
import { injectLambdaContext } from '@aws-lambda-powertools/logger/middleware'
import { captureLambdaHandler } from '@aws-lambda-powertools/tracer/middleware'
import { logMetrics } from '@aws-lambda-powertools/metrics/middleware'

const logger = new Logger({ serviceName: 'orders-api' })
const tracer = new Tracer({ serviceName: 'orders-api' })
const metrics = new Metrics({ namespace: 'orders', serviceName: 'orders-api' })

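// Plain handler logic; the middy middleware below adds Lambda context to logs, traces the invocation, and flushes metrics.
const handler = async (event: any) => {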
  logger.info('received order', { orderId: event.orderId })
  metrics.addMetric('OrderReceived', MetricUnit.Count, 1)
  return { statusCode: 200, body: JSON.stringify({ ok: true }) }
}

export const main = middy(handler)
  .use(injectLambdaContext(logger))
  .use(captureLambdaHandler(tracer))
  .use(logMetrics(metrics))

The same thing in Python:

from aws_lambda_powertools import Logger, Tracer, Metrics
from aws_lambda_powertools.metrics import MetricUnit
from aws_lambda_powertools.utilities.typing import LambdaContext

logger = Logger(service="orders-api")
tracer = Tracer(service="orders-api")
metrics = Metrics(namespace="orders", service="orders-api")

@logger.inject_lambda_context
@tracer.capture_lambda_handler
@metrics.log_metrics
def handler(event: dict, context: LambdaContext):
    logger.info("received order", extra={"order_id": event.get("orderId")})
    metrics.add_metric(name="OrderReceived", unit=MetricUnit.Count, value=1)
    return {"statusCode": 200, "body": '{"ok": true}'}

Lambda's limits are still there: 15-minute max, 10 GB memory, 10 GB /tmp, default 1000 concurrent executions. Jobs longer than 15 minutes get sliced with Step Functions or pushed to Fargate/Batch.
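One common way to keep a long job inside the 15-minute cap is to have each invocation work until it nears its deadline and then return a cursor, so a Step Functions loop (or a re-queued message) can start the next slice. A minimal sketch, assuming a Node.js handler: `getRemainingTimeInMillis()` is the real method on the Lambda context object, but `processSlice`, the `SliceResult` shape, and the 30-second safety margin are hypothetical choices for illustration:

```typescript
// Minimal slice of the Lambda context object; the real one has more fields.
interface DeadlineContext {
  getRemainingTimeInMillis(): number;
}

interface SliceResult {
  nextCursor: number | null; // null once every item is processed
  processed: number;
}

// Hypothetical safety margin: stop early enough to return and checkpoint cleanly.
const SAFETY_MARGIN_MS = 30_000;

// Hypothetical helper: process items from `cursor` until the deadline approaches,
// then return a cursor so the next invocation (e.g. a Step Functions loop) resumes there.
function processSlice(
  items: readonly string[],
  cursor: number,
  ctx: DeadlineContext,
  work: (item: string) => void,
): SliceResult {
  let i = cursor;
  while (i < items.length && ctx.getRemainingTimeInMillis() > SAFETY_MARGIN_MS) {
    work(items[i]);
    i++;
  }
  return { nextCursor: i < items.length ? i : null, processed: i - cursor };
}

// Demo with a fake clock: each item "costs" 30 s of the remaining budget.
let remainingMs = 100_000;
const fakeCtx: DeadlineContext = { getRemainingTimeInMillis: () => remainingMs };
const items = ["a", "b", "c", "d", "e"];
const handled: string[] = [];
const work = (item: string) => {
  handled.push(item);
  remainingMs -= 30_000;
};

const first = processSlice(items, 0, fakeCtx, work); // { nextCursor: 3, processed: 3 }
remainingMs = 100_000; // a fresh invocation gets a fresh time budget
const second = processSlice(items, first.nextCursor ?? items.length, fakeCtx, work); // { nextCursor: null, processed: 2 }
```

In a Step Functions state machine, a Choice state that loops back while `nextCursor` is non-null turns this into an arbitrarily long job built from short invocations.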

AWS Fargate vs App Runner — slots Lambda doesn't fit

Two workloads make Lambda awkward. First, batch jobs over 15 minutes. Second, anything needing a persistent connection (WebSocket pool, gRPC server). AWS gives two answers.

AWS Fargate: The compute backend for ECS/EKS. The textbook "managed container hosting." You still own the cluster, task definition, and service.
AWS App Runner: Ship a container image or a GitHub repo and it handles build, deploy, HTTPS, and autoscale. The AWS service closest to Cloud Run.

App Runner closed a lot of the gap with Cloud Run in 2024 by hardening ALB integration and VPC Connector. But it still trails Cloud Run on region count and price in most evaluations.

Google Cloud Run — the de facto standard for container serverless

Cloud Run shipped GA in 2019 and has become the default for container serverless. As of May 2026, its strengths are:

Request timeout up to 60 minutes (4× Lambda).
CPU always-on for background work (Pub/Sub consumers, WebSocket fan-out).
min-instances eliminates cold starts in practice.
Native GCS, Pub/Sub, Cloud Tasks, Eventarc integrations.
HTTP/2, gRPC, WebSocket all supported.

A typical Cloud Run service spec:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: orders-api
  annotations:
    run.googleapis.com/launch-stage: GA
spec:
  template:
    metadata:
      annotations:
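        # Keep one warm instance to avoid cold starts; cap scale-out at 200.
        autoscaling.knative.dev/minScale: "1"
        autoscaling.knative.dev/maxScale: "200"
        # CPU stays allocated between requests (needed for background work).
        run.googleapis.com/cpu-throttling: "false"
    spec:
      containerConcurrency: 80   # max simultaneous requests per instance
      timeoutSeconds: 900        # 15 min here; Cloud Run allows up to 3600
      containers:
        # Artifact Registry path (Container Registry is deprecated)
        - image: asia-northeast1-docker.pkg.dev/myproj/apps/orders-api:2026.05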
          resources:
            limits:
              cpu: "2"
              memory: 1Gi
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: db-url
                  key: latest
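The `containerConcurrency` and `maxScale` values bound how much traffic the service can absorb. By Little's law (requests in flight = arrival rate × latency), a rough sizing sketch looks like the following. The 100 ms latency and the 30% headroom are illustrative assumptions, and real Cloud Run autoscaling also reacts to CPU utilization rather than packing instances exactly to their concurrency limit:

```typescript
// Little's law: average requests in flight = arrival rate (req/s) × time in system (s).
function inFlight(rps: number, latencySec: number): number {
  return rps * latencySec;
}

// Instances needed if we only plan to fill each one to (1 - headroom) of containerConcurrency.
// The 30% default headroom is an illustrative assumption, not a Cloud Run setting.
function instancesNeeded(rps: number, latencySec: number, concurrency: number, headroom = 0.3): number {
  const perInstance = concurrency * (1 - headroom);
  return Math.ceil(inFlight(rps, latencySec) / perInstance);
}

// Throughput ceiling when every instance up to maxScale is fully packed.
function maxRps(maxScale: number, concurrency: number, latencySec: number): number {
  return (maxScale * concurrency) / latencySec;
}

// Values from the spec above (containerConcurrency 80, maxScale 200), 100 ms per request.
const needed = instancesNeeded(5_000, 0.1, 80); // 500 in flight / 56 per instance → 9 instances
const ceiling = maxRps(200, 80, 0.1); // 16,000 in flight / 0.1 s → about 160,000 req/s
```

Past that ceiling, extra requests queue or fail, so `maxScale` doubles as a cost cap and a capacity limit.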

Mercari has been running parts of its search backend on Cloud Run since 2022, and in Korea pieces of Toss post-payment processing reportedly run on Cloud Run (per meetup talks).

Cloud Functions 2nd gen — a function model on top of Cloud Run

Cloud Functions 2nd gen (rebranded as Cloud Run functions in 2024) is effectively a function runtime layered on top of Cloud Run. So it inherits 60-minute timeouts for HTTP-triggered functions, container concurrency, and Eventarc integration. 1st-gen functions still exist as a compatibility layer in May 2026, but new code defaults to 2nd gen.

Azure Functions + Container Apps — and Durable Functions

Azure has two lanes too.

Azure Functions: Consumption / Premium / Dedicated plans. Consumption is pay-per-use like Lambda; Premium eliminates cold starts.
Azure Container Apps: KEDA-driven container serverless. Strong Dapr integration. Direct Cloud Run competitor.

Durable Functions is a genuine Azure strength. You write orchestration workflows as C#/JS/Python code and Azure manages checkpoints and retries. Think of it as AWS Step Functions made code-friendly.
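The checkpointing works by event-sourced replay: each completed activity result is recorded in a history, and when the orchestrator function re-runs after a restart, already-completed steps return their recorded results instead of executing again. A toy in-memory sketch of that idea follows. `ReplayContext` and `callActivity` here are hypothetical names, not the Durable Functions SDK, and the real runtime persists history to storage and also handles timers, retries, and asynchronous activities:

```typescript
// One recorded step: which activity ran and what it returned.
interface HistoryEvent {
  name: string;
  result: unknown;
}

// Hypothetical toy context, not the Durable Functions SDK.
class ReplayContext {
  private step = 0;
  private readonly checkpoints: HistoryEvent[];

  constructor(checkpoints: HistoryEvent[]) {
    this.checkpoints = checkpoints;
  }

  // Runs `fn` only if this step has no recorded result; otherwise replays the checkpoint.
  callActivity<T>(name: string, fn: () => T): T {
    const index = this.step++;
    const recorded = this.checkpoints[index];
    if (recorded !== undefined) {
      // Orchestrators must be deterministic: replay has to hit the same steps in order.
      if (recorded.name !== name) throw new Error(`non-deterministic replay at step ${index}`);
      return recorded.result as T;
    }
    const result = fn();
    this.checkpoints.push({ name, result }); // checkpoint before moving on
    return result;
  }
}

// Hypothetical two-step order workflow; `calls` records which activities actually executed.
function orderWorkflow(ctx: ReplayContext, calls: string[]): string {
  const txn = ctx.callActivity("charge", () => {
    calls.push("charge");
    return "txn-1";
  });
  return ctx.callActivity("ship", () => {
    calls.push("ship");
    return `shipped:${txn}`;
  });
}

// Simulate a crash after "charge" was checkpointed: replay skips it and runs only "ship".
const checkpoints: HistoryEvent[] = [{ name: "charge", result: "txn-9" }];
const executed: string[] = [];
const output = orderWorkflow(new ReplayContext(checkpoints), executed);
```

This is why orchestrator code must avoid non-deterministic calls (current time, random numbers, direct I/O): on replay it has to make exactly the same decisions it made the first time.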

Cloudflare Workers — V8 isolates are a different dimension

Workers is fundamentally different from other serverless. Not a container, not a microVM, but a V8 isolate. Multiple users' code runs isolated inside one process, which is why cold starts are essentially zero and deploys reach 300+ PoPs automatically.

As of May 2026 the Workers ecosystem is:

Workers: V8 isolate functions.
Workers AI: OSS models (Llama, Mistral, Whisper, BGE embeddings, etc.) running on GPUs. Billed in Neurons, Cloudflare's usage unit for GPU compute, rather than per raw request.
R2: S3-compatible object storage. No egress fee.
KV: Globally distributed key-value. Eventual consistency, 60-second read cache.
D1: SQLite-based distributed DB. Multi-region reads since 2024.
Durable Objects: Stateful objects. Each object runs as a single instance, which makes WebSocket fan-out, counters, and rate limits natural.
Queues: At-least-once message queue.
Vectorize: Vector DB for RAG.
Pages Functions: Workers alongside a Pages site.
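Because each Durable Object runs as a single instance, per-key state such as a rate limiter needs no distributed locking: every request for a given key lands on the same object. The limiter logic itself can be plain code. Below is a token-bucket sketch with an injectable clock (inside an object you would pass `Date.now()`). The Durable Object wrapper (its `fetch` handler and storage persistence) is omitted, and the capacity and refill rate are arbitrary example values:

```typescript
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly refillPerSec: number;

  constructor(capacity: number, refillPerSec: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity; // start with a full bucket
    this.lastRefillMs = nowMs;
  }

  // Refills based on elapsed time, then consumes one token if available.
  tryTake(nowMs: number): boolean {
    const elapsedSec = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefillMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// Example: burst of 3, refilling 1 token per second.
const bucket = new TokenBucket(3, 1, 0);
const burst = [0, 0, 0, 0].map((t) => bucket.tryTake(t)); // [true, true, true, false]
const afterOneSecond = bucket.tryTake(1000); // true: one token has refilled
```

The same class on a regional server would need a lock or an atomic store; the single-instance guarantee is what lets it stay this simple.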

A typical Worker:

export interface Env {
  ORDERS_KV: KVNamespace
  DB: D1Database
  AI: Ai
}

export default {
  async fetch(req: Request, env: Env): Promise<Response> {
    const url = new URL(req.url)
    if (url.pathname === '/embed' && req.method === 'POST') {
      const { text } = await req.json<{ text: string }>()
      const out = await env.AI.run('@cf/baai/bge-base-en-v1.5', { text: [text] })
      return Response.json({ embedding: out.data[0] })
    }
    if (url.pathname.startsWith('/orders/')) {
      const id = url.pathname.split('/')[2]
      const cached = await env.ORDERS_KV.get(id, 'json')
      if (cached) return Response.json(cached)
      const row = await env.DB.prepare('SELECT * FROM orders WHERE id = ?').bind(id).first()
      if (row) await env.ORDERS_KV.put(id, JSON.stringify(row), { expirationTtl: 60 })
      return row ? Response.json(row) : new Response('not found', { status: 404 })
    }
    return new Response('ok')
  },
}
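The `/embed` route returns a BGE embedding vector. Vectorize performs similarity search over such vectors server-side; to make the idea concrete, here is a brute-force cosine-similarity top-k sketch in plain TypeScript. It only illustrates what a vector index computes, not how Vectorize is implemented, and the tiny 3-dimensional vectors are made-up stand-ins for real 768-dimensional bge-base embeddings:

```typescript
interface Doc {
  id: string;
  vector: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1].
function cosine(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  if (na === 0 || nb === 0) return 0; // treat zero vectors as unrelated
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Brute force: score every document, sort descending, keep the best k.
function topK(query: readonly number[], docs: readonly Doc[], k: number): { id: string; score: number }[] {
  return docs
    .map((d) => ({ id: d.id, score: cosine(query, d.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Hypothetical documents with toy embeddings.
const docs: Doc[] = [
  { id: "refund-policy", vector: [0.9, 0.1, 0.0] },
  { id: "shipping-times", vector: [0.1, 0.9, 0.1] },
  { id: "returns-howto", vector: [0.8, 0.2, 0.1] },
];
const hits = topK([1, 0, 0], docs, 2);
```

Brute force costs O(n × d) per query, which is fine for a few thousand vectors; dedicated vector databases use approximate indexes to stay fast at millions.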

Workers also has limits. CPU time is 10 ms per request on the Free plan and defaults to 30 seconds on paid plans (configurable up to 5 minutes); memory is 128 MB; and Node.js compatibility exists, but some npm packages still won't run. KV is also eventually consistent, so "the value I just wrote may not appear immediately" is something you always have to plan around.

Cloudflare D1 multi-region — finally practical in 2026

D1 turns SQLite into a distributed system. Read replicas landed in 2024 and multi-region writes opened in beta in 2025, but as of May 2026 the stable setup is globally replicated reads with writes going to a single primary region. That makes D1 interesting for Korea/Southeast-Asia deployments like Coupang's, but workloads needing strong consistency (payments and finance) should still sit on RDS or Spanner.

Deno Deploy — the standards-friendly cousin of Workers

Deno Deploy uses a very similar model to Cloudflare Workers (V8 isolates, global edge) but runs on the Deno runtime, so ES modules, native TypeScript, and Web Standard APIs (fetch, Request, Response, WebSocket) are first-class.

Deno.serve((req: Request) => {
  const url = new URL(req.url)
  if (url.pathname === '/hello') {
    return new Response(JSON.stringify({ hello: 'world' }), {
      headers: { 'content-type': 'application/json' },
    })
  }
  return new Response('not found', { status: 404 })
})

Deno KV went GA in 2024 and Deno Queues shipped in 2025. The headline shift: a full-stack serverless backend in a single Deno file.
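Because Deno Deploy, Cloudflare Workers, and Bun all accept a plain `(Request) => Response` style handler, keeping route logic in a runtime-agnostic function makes moving between them cheap. A minimal sketch that uses only Web-standard `URL`, `Request`, and `Response`; the per-runtime wiring lines are shown as comments rather than executed, since each belongs in its own entry file:

```typescript
// Runtime-agnostic handler: only Web-standard URL, Request, and Response.
function handle(req: Request): Response {
  const url = new URL(req.url);
  if (req.method === "GET" && url.pathname === "/hello") {
    return new Response(JSON.stringify({ hello: "world" }), {
      headers: { "content-type": "application/json" },
    });
  }
  return new Response("not found", { status: 404 });
}

// Wiring differs per runtime (each line would live in its own entry file):
//   Deno Deploy:        Deno.serve(handle);
//   Cloudflare Workers: export default { fetch: handle };
//   Bun:                Bun.serve({ fetch: handle });
```

The handler can also be unit-tested anywhere these Web APIs exist (Node.js 18+ included) by constructing a `Request` directly, with no server running.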

Bun Edge — the new 2026 player

Bun hit 1.0 in September 2023, and Bun Edge, a hosting service in the Workers/Deno Deploy slot, has been in beta since late 2025. Bun's strong ONNX and TensorRT integrations make it particularly good for edge ML inference. As of May 2026 it's still beta, so multi-tenant production deploys deserve some caution.

Vercel Functions / Edge / Fluid Compute — the Next.js default

Vercel is effectively the default deploy target for Next.js users. As of May 2026 it offers three modes.

Vercel Functions: An abstraction over AWS Lambda. Node.js functions.
Vercel Edge Functions / Edge Middleware: Edge isolates similar to Cloudflare Workers.
Vercel Fluid Compute: Shipped in late 2024. The same instance handles multiple concurrent requests, cutting cold starts and idle cost together.

Fluid Compute pays off most when a Next.js server component is blocked on an external API: instead of sitting idle, the same instance serves other requests in the meantime. For AI chatbots and other I/O-heavy workloads, the savings are direct.
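An idealized back-of-the-envelope model shows why. Assume, hypothetically, that cost is proportional to memory × instance lifetime, that each request spends 2 s waiting on an upstream API with negligible CPU, and that requests arrive together. With one request per instance, N requests occupy N instances; with in-instance concurrency they can share one. This is a simplified model for intuition, not Vercel's actual billing formula:

```typescript
// Idealized model, not Vercel's billing formula: cost ∝ memory (GB) × instance lifetime (s).
// Assumes all requests arrive together and are pure I/O wait, so requests sharing an
// instance overlap completely and CPU contention is ignored.
function gbSeconds(requests: number, waitSec: number, memoryGb: number, perInstanceConcurrency: number): number {
  const instances = Math.ceil(requests / perInstanceConcurrency);
  return instances * waitSec * memoryGb;
}

const oneEach = gbSeconds(10, 2, 1, 1); // 10 instances × 2 s × 1 GB = 20 GB-s
const shared = gbSeconds(10, 2, 1, 10); // 1 instance × 2 s × 1 GB = 2 GB-s
```

Real workloads land somewhere between the two numbers, since arrivals are staggered and CPU-heavy requests do contend with each other.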

Netlify Edge / Background Functions — edge on Deno

Netlify Edge Functions are built on Deno, so URL imports, native TypeScript, and Web Standard APIs feel natural. Regular Netlify Functions (non-edge) sit on top of AWS Lambda, and Background Functions support asynchronous jobs of up to 15 minutes.

import type { Context } from 'https://edge.netlify.com'

export default async (req: Request, ctx: Context) => {
  const country = ctx.geo?.country?.code ?? 'US'
  return new Response(JSON.stringify({ country }), {
    headers: { 'content-type': 'application/json' },
  })
}

Fastly Compute@Edge — the other road, via Wasm

Fastly Compute@Edge runs WebAssembly instead of V8 isolates. That means Rust, AssemblyScript, Go (TinyGo), and JavaScript (SpiderMonkey on Wasm) all run. Cold starts measure in microseconds.

use fastly::http::{Method, StatusCode};
use fastly::{Error, Request, Response};

#[fastly::main]
fn main(req: Request) -> Result<Response, Error> {
    match (req.get_method(), req.get_path()) {
        (&Method::GET, "/") => Ok(Response::from_status(StatusCode::OK).with_body("hello edge")),
        _ => Ok(Response::from_status(StatusCode::NOT_FOUND).with_body("not found")),
    }
}

The Wasm Component Model matured through 2024–2025 (WASI 0.2, also called Preview 2, shipped in early 2024), and "build once, run anywhere" is getting close to literal truth.

Akamai EdgeWorkers — edge functions on a traditional CDN

Akamai is a traditional CDN, but it runs its own edge function runtime under the name EdgeWorkers. It's V8-based and optimized for CDN-friendly workloads — response transforms, A/B tests, token validation. Less "write a full backend" and more "light logic in front of a CDN."

Fermyon Spin — Wasm-native serverless

Fermyon Spin is the headline Wasm-native serverless play. You build per-function Wasm components and deploy them. Cold starts are around 1 ms and polyglot support covers Rust, Go, JS, Python, and .NET.

spin_manifest_version = 2

[application]
name = "orders-api"
version = "0.1.0"

[[trigger.http]]
route = "/orders/..."
component = "orders"

[component.orders]
source = "target/wasm32-wasip1/release/orders.wasm"
allowed_outbound_hosts = ["https://api.stripe.com"]

[component.orders.build]
command = "cargo build --target wasm32-wasip1 --release"

Spin runs on Spin Cloud, Fermyon Cloud, and SpinKube (on Kubernetes) almost identically. It doesn't yet match Lambda/Cloud Run for awareness, but it is the most committed player on the Wasm Component Model.

Fly.io — edge "full VMs" on Firecracker

Fly.io is unlike the other edge platforms. It spreads Firecracker microVMs across 30+ regions worldwide and runs your Dockerfile inside them. In other words, it's "real containers close to the edge." WebSocket, TCP, UDP, persistent disks (Fly Volumes), and even Postgres clusters are all yours to run.

app = "orders-api"
primary_region = "nrt"

[build]
  image = "ghcr.io/myorg/orders-api:2026.05"

[http_service]
  internal_port = 3000
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 1

[[vm]]
  cpu_kind = "shared"
  cpus = 1
  memory_mb = 512

If most of your traffic is in Korea or Japan, putting nrt (Tokyo) and kix (Osaka) regions in is natural. Compared to Cloud Run, Fly.io is more flexible for persistent connections and stateful work.

Railway / Render / Koyeb / Northflank — the new "git push to deploy" generation

These four are all "connect GitHub repo, auto build, auto deploy" PaaSes.

Railway: Fastest to start. Postgres, Redis, Mongo attach with one click. Usage-based pricing.
Render: The legitimate Heroku heir. Static sites, services, cron jobs, and workers from one UI.
Koyeb: Deploys containers to a global edge. Closer to Fly.io but with Railway-grade UX.
Northflank: Multi-cloud and BYOC (bring your own cluster) is the differentiator. More enterprise-leaning.

In 2026, many teams leaving Heroku land on Railway or Render.

Cold starts again — how does SnapStart actually work?

Lambda SnapStart is not a simple memory cache. After the function finishes initialization, the Firecracker microVM's memory pages are saved as a snapshot, and on a cold request that snapshot restores into a fresh microVM. The wins are biggest on heavy-init runtimes like the JVM or CLR. One gotcha: the snapshot captures memory state in full, so a database connection opened during init may be dead by the time it's restored.
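A common defensive pattern is to treat any connection created during init as possibly stale and to validate it (or open it lazily) on first use. A minimal runtime-agnostic sketch: `Connection`, `isAlive`, and the fake client are hypothetical stand-ins for a real database client, whose health check would usually be an asynchronous ping (kept synchronous here for brevity). SnapStart also offers explicit after-restore runtime hooks (CRaC `afterRestore` for Java, for example), which this sketch does not use:

```typescript
// Hypothetical minimal connection interface.
interface Connection {
  id: number;
  isAlive(): boolean;
}

// Hands out a connection, reconnecting if the cached one died (e.g. across a snapshot restore).
class LazyConnection {
  private current: Connection | null = null;
  private readonly connect: () => Connection;

  constructor(connect: () => Connection) {
    this.connect = connect;
  }

  get(): Connection {
    if (this.current === null || !this.current.isAlive()) {
      this.current = this.connect();
    }
    return this.current;
  }
}

// Hypothetical fake client for demonstration: connections can be "killed".
let nextId = 1;
const alive = new Set<number>();
function fakeConnect(): Connection {
  const id = nextId++;
  alive.add(id);
  return { id, isAlive: () => alive.has(id) };
}

const pool = new LazyConnection(fakeConnect);
const first = pool.get(); // opens connection 1
alive.delete(first.id); // simulate the socket dying across snapshot and restore
const second = pool.get(); // detects the dead connection and opens connection 2
```

The same restore also duplicates anything meant to be unique, such as random seeds or IDs generated during init, so those are best created after restore as well.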

WebSocket and streaming — how far can each one go?

WebSockets and SSE remain awkward on serverless in 2026.

AWS Lambda: A vanilla function can't accept WebSockets. You pair it with API Gateway WebSocket. Function URL's RESPONSE_STREAM mode handles SSE and streaming responses.
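Cloud Run: HTTP/2 and WebSocket are officially supported. A WebSocket connection counts as a long-running request, so it is closed when the request timeout (at most 60 minutes) expires, and clients need reconnect logic.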
Cloudflare Workers: WebSocket officially supported. Combined with Durable Objects the fan-out pattern feels natural.
Vercel Edge: SSE streaming works well; WebSocket officially landed in 2024.
Fly.io: Full TCP, so WebSocket, gRPC, and even UDP are fair game.

In short: if you're sure you want to hold a WebSocket pool at the edge, Cloudflare Workers + Durable Objects or Fly.io are the first picks.
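For the SSE side, the wire format is simple enough to emit by hand from any fetch-style runtime (Workers, Deno, Vercel Edge): each event is one or more `data:` lines followed by a blank line, with a multi-line payload split across several `data:` lines. A minimal sketch of an encoder plus a `ReadableStream` that streams tokens as SSE; the token array stands in for a real LLM stream, and the `end` event name and `[DONE]` payload are arbitrary choices:

```typescript
// Encode one SSE event. Each payload line gets its own "data:" field;
// a blank line terminates the event.
function sseEvent(data: string, event?: string): string {
  const lines = data.split(/\r\n|\r|\n/).map((line) => `data: ${line}`);
  const head = event ? [`event: ${event}`] : [];
  return [...head, ...lines].join("\n") + "\n\n";
}

// Turn a list of tokens into a byte stream usable as a streaming Response body.
function tokensToSse(tokens: readonly string[]): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream<Uint8Array>({
    start(controller) {
      for (const t of tokens) controller.enqueue(encoder.encode(sseEvent(t)));
      controller.enqueue(encoder.encode(sseEvent("[DONE]", "end")));
      controller.close();
    },
  });
}

// Usage in a fetch-style handler:
//   return new Response(tokensToSse(["Hel", "lo"]), {
//     headers: { "content-type": "text/event-stream", "cache-control": "no-cache" },
//   });
```

Lambda's RESPONSE_STREAM mode uses its own stream-writer API rather than a `Response` object, but the bytes written would follow the same format.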

Pricing models — what are you actually paying for?

Serverless pricing splits into a few models.

Request + execution time (GB-second): AWS Lambda, Cloud Functions, Azure Functions.
Request + CPU time (per request): Cloudflare Workers, Vercel Functions.
Container runtime (vCPU-second + memory-second): Cloud Run, App Runner, Container Apps.
VM runtime (per minute): Fly.io, Railway, Render.

Approximate May 2026 unit prices (consult each vendor's pricing page for accuracy):

{
  "aws_lambda": { "per_million_requests": "$0.20", "per_gb_second": "$0.0000166667" },
  "cloud_run": { "per_million_requests": "$0.40", "per_vcpu_second": "$0.000024", "per_gib_second": "$0.0000025" },
  "cloudflare_workers": { "per_million_requests": "$0.30 paid", "first_10m_free": true, "per_million_cpu_ms": "$0.02" },
  "vercel_pro_functions": { "included_gb_hours": "1000", "overage_gb_hour": "$0.18" },
  "fly_io_shared_1x_cpu": { "per_month": "~$1.94", "memory_256mb_month": "~$0.50" }
}

One trick: Cloudflare Workers only counts CPU time, not I/O wait. That's hugely friendly to LLM chatbot workloads that mostly wait on external APIs.
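To see how much this matters, here is a rough monthly comparison using the approximate unit prices above for a hypothetical chatbot: 10 million requests, each spending about 3 s of wall-clock time waiting on an upstream LLM but only about 10 ms of CPU, with Lambda configured at 512 MB. Lambda bills wall-clock duration (I/O wait included), while Workers counts CPU time. Free tiers and included allowances are deliberately ignored:

```typescript
// Approximate unit prices from the table above (USD). Free tiers and included
// allowances are ignored; check vendor pricing pages for real numbers.
const LAMBDA_PER_MILLION_REQUESTS = 0.2;
const LAMBDA_PER_GB_SECOND = 0.0000166667;
const WORKERS_PER_MILLION_REQUESTS = 0.3;
const WORKERS_PER_MILLION_CPU_MS = 0.02;

// Lambda bills wall-clock duration, so time spent waiting on I/O is charged.
function lambdaCostUsd(requests: number, wallMs: number, memoryMb: number): number {
  const requestCost = (requests / 1_000_000) * LAMBDA_PER_MILLION_REQUESTS;
  const gbSeconds = requests * (wallMs / 1000) * (memoryMb / 1024);
  return requestCost + gbSeconds * LAMBDA_PER_GB_SECOND;
}

// Workers bills CPU time; I/O wait does not count toward it.
function workersCostUsd(requests: number, cpuMs: number): number {
  const requestCost = (requests / 1_000_000) * WORKERS_PER_MILLION_REQUESTS;
  const cpuCost = ((requests * cpuMs) / 1_000_000) * WORKERS_PER_MILLION_CPU_MS;
  return requestCost + cpuCost;
}

// Hypothetical chatbot: 10M requests/month, ~3 s waiting on an LLM API, ~10 ms CPU each.
const lambdaMonthly = lambdaCostUsd(10_000_000, 3_000, 512); // 2 + 250.0005 ≈ $252
const workersMonthly = workersCostUsd(10_000_000, 10); // 3 + 2 = $5
```

The gap shrinks for CPU-bound work, where wall-clock time and CPU time converge.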

Edge ML inference — the new 2026 slot

Running ML inference at the edge took off between 2024 and 2026.

Cloudflare Workers AI: Serves 30+ models (Llama 3, Mistral, Whisper, BGE embeddings, Stable Diffusion, etc.) from GPUs across its network. Billed in Neurons rather than per raw request.
Vercel AI SDK: A single API across OpenAI, Anthropic, Google, Cohere, and local models. Streaming, tool calls, and RSC integration included.
Bun + ONNX Runtime: Bun Edge runs ONNX models inline. Great for embeddings, classification, OCR.

Running inference at the edge generates tokens close to the user and streams them straight back from the same PoP, which can bring P50 time-to-first-token under 100 ms for smaller models.

Korea case studies — a slice of serverless at Toss and Coupang

Here's a quick public-source summary of how Korea uses serverless.

Toss: Parts of payment post-processing (receipt sending, settlement notifications, anomaly hooks) run on AWS Lambda. Suited to traffic that jumps from 0 to a thousand RPS in seconds.
Coupang: Lambda backstops Black-Friday-scale spike traffic. Main compute is ECS/EKS, but async work uses SQS + Lambda.
KakaoBank/LINE Bank: Internal tools, notifications, and batch jobs on serverless. Core payments stay on containers.

The pattern: serverless lives where "0 to 1000 spikes happen often."

Japan case studies — LY Yahoo, Mercari, Sansan

Japan has more public material.

LY (LINE Yahoo): Message post-processing, notification routing, and parts of LINE Mini App backends use AWS Lambda alongside Cloud Functions.
Mercari: Parts of search and recommendation backends run on Cloud Run. Go services dominate, with container concurrency and Cloud Tasks async patterns as the norm.
Sansan: Business-card OCR post-processing flows run on Cloud Run + Cloud Tasks. Some Functions 2nd gen too.

All three follow the same pattern: async or spiky workloads running next to the main system.

Build once, run anywhere — how real is it in 2026?

"Build once, run everywhere" is the promise of the Wasm Component Model (WIT/WASI Preview 2). As of May 2026 the following is practically real.

A Fermyon Spin Wasm component runs nearly identically on Fastly Compute@Edge, Spin Cloud, and SpinKube.
Cloudflare Workers can run Wasm modules, but they execute inside its V8 isolate model rather than as standalone components, so compatibility isn't 100%.
Vercel Edge and Netlify Edge sit on V8 isolates with Wasm as a side option.

"One build that runs everywhere" isn't fully there yet, but the closest available form is consolidating on the Wasm Component Model.

Platform comparison matrix — as of May 2026

Platform	Runtime	Isolation	Max duration	Max memory	Regions/PoPs	Pricing model
AWS Lambda	Node/Python/Java/.NET/Go/Ruby	Firecracker microVM	15 min	10 GB	30+ regions	Requests + GB-sec
AWS App Runner	Containers	Firecracker	Unbounded	4 GB	12+ regions	vCPU-sec
Google Cloud Run	Containers	gVisor	60 min	32 GB	35+ regions	vCPU-sec + req
Azure Functions	Various	Containers	60 min	14 GB	60+ regions	Requests + GB-sec
Cloudflare Workers	V8 isolate	Isolate	5 min (paid)	128 MB	300+ PoPs	Requests + CPU-ms
Vercel Edge	V8 isolate	Isolate	30 s (5 min streaming)	128 MB	30+ PoPs	GB-hour
Deno Deploy	V8 isolate	Isolate	60 s	512 MB	35+ PoPs	Requests + core-ms
Fastly Compute@Edge	Wasm	Wasm sandbox	60 s	128 MB	100+ PoPs	Requests
Fermyon Spin	Wasm	Wasm sandbox	Per component	Per component	Host-dependent	Host-dependent
Fly.io	Containers	Firecracker microVM	Unbounded	256 MB–256 GB	30+ regions	Per-minute VM
Railway	Containers	KVM	Unbounded	32 GB	4+ regions	Hours + GB-RAM

Where serverless is still not the answer

Finally, in 2026 there are still slots where serverless isn't the right answer.

Long-running GPU inference: LLMs with multi-second model load. Self-managed GPU instances, SageMaker, Modal, or Replicate are better.
Long-lived persistent connections with state: Large-scale multiplayer game servers. Fly.io or EC2/GKE is more natural.
Trading systems needing tail latency that stays consistently below a few tens of milliseconds: Exchanges, ad bidding. Bare metal or dedicated.
Analytics/joins on tens of GB of memory: Lambda's 10 GB is not enough. Fargate or EKS.

Just as the right slots for serverless got clearer, so did the wrong ones.

References

AWS Lambda docs: https://docs.aws.amazon.com/lambda
AWS Lambda Powertools: https://docs.powertools.aws.dev
Google Cloud Run docs: https://cloud.google.com/run/docs
Azure Functions docs: https://learn.microsoft.com/azure/azure-functions
Cloudflare Workers docs: https://developers.cloudflare.com/workers
Deno Deploy docs: https://docs.deno.com/deploy
Vercel Functions docs: https://vercel.com/docs/functions
Netlify Edge Functions: https://docs.netlify.com/edge-functions/overview
Fastly Compute@Edge: https://developer.fastly.com/learning/compute
Fermyon Spin: https://developer.fermyon.com/spin
Fly.io docs: https://fly.io/docs
Railway docs: https://docs.railway.app
Render docs: https://render.com/docs
Koyeb docs: https://www.koyeb.com/docs