WebAssembly & Edge Computing: Browser AI Inference, Cloudflare Workers, and IoT Wasm
- Author: Youngju Kim (@fjvbn20031)
- What Is WebAssembly?
- 1. WAT Text Format and Bytecode Structure
- 2. WASI: WebAssembly System Interface
- 3. The Wasm Ecosystem: Rust, AssemblyScript, Emscripten
- 4. Browser AI: ONNX Runtime Web, WebNN, WebGPU
- 5. Edge AI Deployment: Cloudflare Workers, Fastly, AWS Lambda@Edge
- 6. IoT & Embedded Wasm
- 7. Performance Benchmarking: Wasm vs Native
- 8. Real-World Case Studies
- Quiz
- Conclusion
What Is WebAssembly?
WebAssembly (Wasm) is a binary instruction format that became an official W3C web standard in 2019 — the first language other than JavaScript to execute natively in the browser. It runs at near-native speed in browsers, on servers, and on IoT devices alike.
The four core design principles of Wasm are:
- Safety: A sandboxed memory model protects the host environment.
- Portability: Behaves identically regardless of CPU architecture.
- Speed: JIT/AOT compilation achieves several times the throughput of JavaScript for compute-intensive workloads.
- Openness: Not tied to any specific language or platform.
1. WAT Text Format and Bytecode Structure
WebAssembly has two representations: the .wasm binary format and the human-readable .wat (WebAssembly Text Format).
WAT Example: Sum of Two Integers
(module
  (func $add (param $a i32) (param $b i32) (result i32)
    local.get $a
    local.get $b
    i32.add)
  (export "add" (func $add)))
When compiled to .wasm, the binary starts with the magic number \0asm (0x00 0x61 0x73 0x6D).
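That header can be inspected directly from JavaScript. A minimal sketch — the eight bytes below are the complete header of an empty module (magic number plus version 1, little-endian):

```javascript
// The 8-byte header of a minimal (empty) Wasm module:
// magic number "\0asm" followed by version 1.
const bytes = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00])

const hasMagic =
  bytes[0] === 0x00 && bytes[1] === 0x61 && bytes[2] === 0x73 && bytes[3] === 0x6d
console.log(hasMagic) // true

// An empty module with a valid header passes validation
console.log(WebAssembly.validate(bytes)) // true
```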
Bytecode Structure
A Wasm module is organized into typed sections:
| Section ID | Name | Description |
|---|---|---|
| 1 | Type | Function signature definitions |
| 3 | Function | Function index table |
| 7 | Export | Symbols exposed to the host |
| 10 | Code | Actual function bodies |
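To make the section layout concrete, here is the add function from the WAT example hand-assembled byte by byte and run through the JavaScript API (a sketch; section sizes are LEB128-encoded, which for values this small is a single byte each):

```javascript
// The "add" module, hand-assembled: header + Type, Function, Export, Code sections.
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // "\0asm" + version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // Type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // Function: func 0 uses type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // Export: "add" -> func 0
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // Code: 1 body, 7 bytes, 0 locals
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,                   // local.get 0, local.get 1, i32.add, end
])

const { add } = new WebAssembly.Instance(new WebAssembly.Module(bytes)).exports
console.log(add(2, 3)) // 5
```

The synchronous Module/Instance constructors are fine for a 41-byte module like this; real applications should use the async WebAssembly.instantiate or instantiateStreaming.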
Linear Memory
Wasm uses a linear memory model — a single contiguous byte array. It is allocated in 64KB page units and can be accessed directly from JavaScript via the WebAssembly.Memory object.
const memory = new WebAssembly.Memory({ initial: 1, maximum: 10 })
const buffer = new Uint8Array(memory.buffer)
// Read/write directly starting at offset 0
buffer[0] = 42
Pointer arithmetic is safely sandboxed inside the Wasm instance; host memory is never accessible.
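Page granularity is easy to observe: growing the memory allocates whole 64KB pages, and (for non-shared memory) grow detaches the previous ArrayBuffer, so any views must be recreated. A small sketch:

```javascript
const mem = new WebAssembly.Memory({ initial: 1, maximum: 10 })
console.log(mem.buffer.byteLength) // 65536 — one 64KB page

const before = mem.buffer
mem.grow(2) // grow by two pages; detaches the old ArrayBuffer
console.log(mem.buffer.byteLength) // 196608 — three pages
console.log(before.byteLength) // 0 — detached buffers report zero length
```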
2. WASI: WebAssembly System Interface
WASI is a standard system interface that allows Wasm modules to access OS capabilities such as the file system, networking, and environment variables. Solomon Hykes (Docker's creator) famously said that if WASM+WASI had existed in 2008, Docker would not have been needed.
(import "wasi_snapshot_preview1" "fd_write"
  (func $fd_write (param i32 i32 i32 i32) (result i32)))
Key abstractions provided by WASI:
- File system: fd_read, fd_write, path_open
- Clocks: clock_time_get
- Environment variables: environ_get
- Networking (WASI 0.2): the wasi:sockets interface
WASI 0.2 (Component Model), released in 2024, introduced WIT (Wasm Interface Types) — a high-level interface definition language for composable Wasm components.
3. The Wasm Ecosystem: Rust, AssemblyScript, Emscripten
Rust to WebAssembly with wasm-pack
Rust is currently the most mature language in the Wasm ecosystem. Using wasm-pack, you can produce npm-ready Wasm packages in minutes.
// src/lib.rs
use wasm_bindgen::prelude::*;

#[wasm_bindgen]
pub fn fibonacci(n: u32) -> u32 {
    match n {
        0 => 0,
        1 => 1,
        _ => fibonacci(n - 1) + fibonacci(n - 2),
    }
}

#[wasm_bindgen]
pub fn matrix_multiply(a: &[f32], b: &[f32], n: usize) -> Vec<f32> {
    let mut result = vec![0.0f32; n * n];
    for i in 0..n {
        for j in 0..n {
            for k in 0..n {
                result[i * n + j] += a[i * n + k] * b[k * n + j];
            }
        }
    }
    result
}
Build and deploy:
# Install wasm-pack
cargo install wasm-pack
# Build for browser target
wasm-pack build --target web
# Build for Node.js target
wasm-pack build --target nodejs
The generated pkg/ directory contains the .wasm binary, JavaScript glue code, and TypeScript type definitions.
Calling Wasm from JavaScript
import init, { fibonacci, matrix_multiply } from './pkg/my_module.js'

async function main() {
  // Initialize Wasm module
  await init()

  // Compute Fibonacci
  console.log(fibonacci(40)) // 102334155

  // Matrix multiplication (4x4)
  const a = new Float32Array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16])
  const b = new Float32Array([1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1])
  const result = matrix_multiply(a, b, 4)
  console.log(result)
}

main()
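For sanity-checking the Wasm export, an iterative plain-JavaScript reference (not part of the generated package, shown here for cross-checking) produces the same value:

```javascript
// Iterative reference implementation for cross-checking fibonacci(40).
function fibonacciRef(n) {
  let a = 0
  let b = 1
  for (let i = 0; i < n; i++) {
    const next = a + b
    a = b
    b = next
  }
  return a
}

console.log(fibonacciRef(40)) // 102334155
```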
AssemblyScript
AssemblyScript lets you write Wasm using TypeScript-like syntax.
// assembly/index.ts
export function add(a: i32, b: i32): i32 {
  return a + b
}

export function sumArray(ptr: usize, len: i32): i64 {
  let sum: i64 = 0
  for (let i = 0; i < len; i++) {
    sum += load<i32>(ptr + i * 4)
  }
  return sum
}
Emscripten (C/C++)
Emscripten is the go-to toolchain for porting C/C++ codebases to Wasm. Figma and Google Earth both use this approach.
# Compile a C file to Wasm
emcc compute.c -O3 -o compute.js \
  -s WASM=1 \
  -s EXPORTED_FUNCTIONS='["_process_image"]' \
  -s EXPORTED_RUNTIME_METHODS='["ccall","cwrap"]'
4. Browser AI: ONNX Runtime Web, WebNN, WebGPU
In-Browser Inference with ONNX Runtime Web
ONNX Runtime Web runs ONNX models directly in the browser. It supports WebAssembly (CPU), WebGL, and WebGPU as execution backends.
import * as ort from 'onnxruntime-web'

async function runInference() {
  // Prefer WebGPU backend, fall back to Wasm
  const session = await ort.InferenceSession.create('/models/bert-base.onnx', {
    executionProviders: ['webgpu', 'wasm'],
    graphOptimizationLevel: 'all',
  })

  // Create input tensors
  const inputIds = new BigInt64Array([101n, 2023n, 2003n, 102n])
  const attentionMask = new BigInt64Array([1n, 1n, 1n, 1n])
  const feeds = {
    input_ids: new ort.Tensor('int64', inputIds, [1, 4]),
    attention_mask: new ort.Tensor('int64', attentionMask, [1, 4]),
  }

  const results = await session.run(feeds)
  console.log('Logits:', results.logits.data)
}
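For a classifier, the raw logits coming back from session.run are typically converted to probabilities. A small softmax helper — not part of the ONNX Runtime API, shown here for illustration:

```javascript
// Numerically stable softmax: subtract the max before exponentiating.
function softmax(logits) {
  const max = Math.max(...logits)
  const exps = logits.map((x) => Math.exp(x - max))
  const sum = exps.reduce((acc, e) => acc + e, 0)
  return exps.map((e) => e / sum)
}

const probs = softmax([2.0, 1.0, 0.1])
console.log(probs.reduce((a, b) => a + b, 0)) // ~1.0
```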
Matrix Multiplication with a WebGPU Compute Shader
WebGPU unlocks the GPU's parallel compute power directly from the web. The key reason it is far better than WebGL for ML inference is first-class compute shader support.
async function webgpuMatmul(matA, matB, M, N, K) {
  const adapter = await navigator.gpu.requestAdapter()
  const device = await adapter.requestDevice()

  const shaderCode = `
    @group(0) @binding(0) var<storage, read> matA: array<f32>;
    @group(0) @binding(1) var<storage, read> matB: array<f32>;
    @group(0) @binding(2) var<storage, read_write> result: array<f32>;

    @compute @workgroup_size(8, 8)
    fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
      let row = gid.x;
      let col = gid.y;
      var sum = 0.0;
      for (var k = 0u; k < ${K}u; k++) {
        sum += matA[row * ${K}u + k] * matB[k * ${N}u + col];
      }
      result[row * ${N}u + col] = sum;
    }
  `

  const shaderModule = device.createShaderModule({ code: shaderCode })
  // ... create buffers, pipeline, dispatch
}
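Validating the shader output is easiest against a CPU reference that uses the same row-major layout as the WGSL above (a sketch):

```javascript
// Row-major CPU reference: result[row * N + col] = sum_k A[row * K + k] * B[k * N + col]
function matmulRef(matA, matB, M, N, K) {
  const result = new Float32Array(M * N)
  for (let row = 0; row < M; row++) {
    for (let col = 0; col < N; col++) {
      let sum = 0
      for (let k = 0; k < K; k++) {
        sum += matA[row * K + k] * matB[k * N + col]
      }
      result[row * N + col] = sum
    }
  }
  return result
}

// Multiplying by the 2x2 identity returns the original matrix
const a = Float32Array.from([1, 2, 3, 4])
const id = Float32Array.from([1, 0, 0, 1])
console.log(matmulRef(a, id, 2, 2, 2)) // Float32Array [1, 2, 3, 4]
```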
WebNN API
WebNN (Web Neural Network API) is a W3C standard that allows browsers to directly leverage OS-level hardware acceleration — NPUs, GPUs, and DSPs.
const context = await navigator.ml.createContext({ deviceType: 'gpu' })
const builder = new MLGraphBuilder(context)
const input = builder.input('input', { type: 'float32', dimensions: [1, 3, 224, 224] })
const weights = builder.constant(/* ... */)
const conv = builder.conv2d(input, weights, { padding: [1, 1, 1, 1] })
const relu = builder.relu(conv)
const graph = await builder.build({ output: relu })
const results = await context.compute(graph, inputs, outputs)
5. Edge AI Deployment: Cloudflare Workers, Fastly, AWS Lambda@Edge
Cloudflare Workers AI
Cloudflare Workers runs on a V8 isolate model across 300+ global PoPs (Points of Presence). With the AI binding, inference happens at the edge closest to your users.
// Cloudflare Worker with AI binding
export default {
  async fetch(request, env) {
    const body = await request.json()
    const userMessage = body.message

    // Run LLM inference via AI binding
    const response = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
      messages: [
        { role: 'system', content: 'You are a helpful assistant.' },
        { role: 'user', content: userMessage },
      ],
      max_tokens: 512,
    })

    return new Response(JSON.stringify({ reply: response.response }), {
      headers: { 'Content-Type': 'application/json' },
    })
  },
}
wrangler.toml configuration:
name = "edge-ai-worker"
main = "src/index.js"
compatibility_date = "2024-09-23"
[ai]
binding = "AI"
Fastly Compute (Rust-based Wasm)
use fastly::{Error, Request, Response};

#[fastly::main]
fn main(req: Request) -> Result<Response, Error> {
    let body = req.into_body_str();

    // Business logic processed entirely within Wasm
    let processed = process_at_edge(&body);

    Ok(Response::from_body(processed))
}

fn process_at_edge(input: &str) -> String {
    format!("Processed at edge: {}", input.to_uppercase())
}
Cloudflare Workers vs AWS Lambda@Edge Comparison
| Property | Cloudflare Workers | AWS Lambda@Edge |
|---|---|---|
| Execution model | V8 isolate | Container-based |
| Cold start | ~0ms | 100ms to seconds |
| Memory limit | 128MB | 128MB to 10GB |
| Max duration | 30s CPU time (paid plan) | 5s (viewer events) / 30s (origin events) |
| Global PoPs | 300+ | CloudFront edge locations |
| Language support | JS/TS, Wasm | Node.js, Python |
The core reason Cloudflare Workers has near-zero cold starts is that it reuses V8 isolates rather than spawning new OS processes. Each Worker runs in an isolated JavaScript execution context within the same process, eliminating OS-level process initialization entirely.
6. IoT & Embedded Wasm
WasmEdge
WasmEdge is a CNCF sandbox project — a lightweight Wasm runtime optimized for IoT and edge devices.
# Install WasmEdge
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash
# Run a Python script via WasmEdge
wasmedge --dir .:. python_wasm.wasm script.py
Running a Python anomaly detection script on WasmEdge:
# script.py - runs on top of WasmEdge
import sys
import json

def process_sensor_data(data):
    temperature = data.get('temperature', 0)
    humidity = data.get('humidity', 0)
    if temperature > 80 or humidity > 90:
        return {'alert': True, 'reason': 'threshold_exceeded'}
    return {'alert': False, 'status': 'normal'}

data = json.loads(sys.argv[1])
result = process_sensor_data(data)
print(json.dumps(result))
WAMR (WebAssembly Micro Runtime)
WAMR, developed by the Bytecode Alliance, is an ultra-lightweight Wasm runtime that can operate with only a few kilobytes of RAM.
Minimum memory requirements:
- Interpreter mode: ~85KB ROM + ~64KB RAM
- AOT mode: ~60KB ROM + ~64KB RAM
Fermyon Spin
Spin is a Wasm-based microservices framework that makes building and deploying edge functions straightforward.
# spin.toml
spin_manifest_version = 2

[application]
name = "iot-processor"
version = "0.1.0"

[[trigger.http]]
route = "/sensor"
component = "sensor-handler"

[component.sensor-handler]
source = "target/wasm32-wasi/release/sensor_handler.wasm"

[component.sensor-handler.build]
command = "cargo build --target wasm32-wasi --release"
7. Performance Benchmarking: Wasm vs Native
SIMD in WebAssembly
Wasm SIMD supports 128-bit vector operations, dramatically accelerating ML workloads.
// Using Wasm SIMD in Rust (build with RUSTFLAGS="-C target-feature=+simd128")
#[cfg(target_arch = "wasm32")]
use std::arch::wasm32::*;

#[cfg(target_arch = "wasm32")]
pub fn dot_product_simd(a: &[f32], b: &[f32]) -> f32 {
    let mut sum = f32x4_splat(0.0);
    let chunks = a.len() / 4;
    for i in 0..chunks {
        // SAFETY: i < a.len() / 4 guarantees at least 4 elements remain at offset i * 4
        let (va, vb) = unsafe {
            (
                v128_load(a[i * 4..].as_ptr() as *const v128),
                v128_load(b[i * 4..].as_ptr() as *const v128),
            )
        };
        sum = f32x4_add(sum, f32x4_mul(va, vb));
    }
    // Horizontal sum of the four lanes (remainder elements are ignored in this sketch)
    let arr: [f32; 4] = unsafe { std::mem::transmute(sum) };
    arr.iter().sum()
}
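The SIMD version processes four lanes per iteration; a scalar reference with the same semantics (for lengths divisible by four) is handy for checking results from the JavaScript side:

```javascript
// Scalar reference for the SIMD dot product above.
function dotProduct(a, b) {
  let sum = 0
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i]
  }
  return sum
}

console.log(dotProduct([1, 2, 3, 4], [5, 6, 7, 8])) // 70
```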
Multithreading with SharedArrayBuffer
Wasm threading leverages SharedArrayBuffer and the Atomics API.
// Wasm multithreading with shared memory
const sharedMemory = new WebAssembly.Memory({
  initial: 16,
  maximum: 256,
  shared: true, // Enables SharedArrayBuffer
})

// Pass shared memory to a Worker
const worker = new Worker('wasm-worker.js')
worker.postMessage({ memory: sharedMemory })
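The coordination primitives the worker side relies on — SharedArrayBuffer plus the Atomics API — can be sketched without any Wasm at all:

```javascript
// Atomic operations on a SharedArrayBuffer: the same shared-memory
// primitives Wasm threads use for cross-thread coordination.
const sab = new SharedArrayBuffer(16)
const view = new Int32Array(sab)

Atomics.store(view, 0, 7) // atomic write to index 0
Atomics.add(view, 0, 1)   // atomic increment, returns the old value
console.log(Atomics.load(view, 0)) // 8
```

In a real worker, Atomics.wait and Atomics.notify would synchronize threads blocking on the same index.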
Benchmark Results (Reference)
| Task | JavaScript | Wasm (single) | Wasm + SIMD | Native C |
|---|---|---|---|---|
| Matrix multiply (1024x1024) | 850ms | 210ms | 55ms | 40ms |
| SHA-256 hash (1MB) | 120ms | 35ms | 22ms | 18ms |
| Image resize (4K) | 340ms | 95ms | 28ms | 20ms |
Wasm with SIMD comes within roughly 10–40% of native C performance on these workloads.
8. Real-World Case Studies
Figma
Figma's entire rendering engine is written in C++ and compiled to Wasm via Emscripten. This allows complex vector graphics operations to run at 60fps in the browser without any plugins.
Google Earth
Google Earth for Web ports its massive C++ 3D terrain rendering engine to the browser through Wasm, enabling gigabytes of 3D geographic data to be rendered client-side.
Pyodide: Python in the Browser
Pyodide compiles the entire CPython interpreter to Wasm, enabling full Python execution inside a browser tab.
<script src="https://cdn.jsdelivr.net/pyodide/v0.27.0/full/pyodide.js"></script>
<script>
  async function runPython() {
    const pyodide = await loadPyodide()

    // Install and use numpy/pandas entirely in the browser
    await pyodide.loadPackage(['numpy', 'pandas'])

    const result = pyodide.runPython(`
import numpy as np
import pandas as pd

# numpy operations running in the browser
arr = np.random.randn(1000, 1000)
eigenvalues = np.linalg.eigvals(arr[:10, :10])
float(np.abs(eigenvalues).max())
`)
    console.log('Max eigenvalue:', result)
  }
  runPython()
</script>
Quiz
Q1. What is the fundamental reason WebAssembly outperforms JavaScript in numerical computation?
Answer: Static type system and predictable AOT/JIT compilation
Explanation: JavaScript is a dynamically typed language. At runtime, the JS engine must perform type inference, inline caching, hidden class transitions, and many other optimizations before generating machine code. Wasm, by contrast, has all types fixed at compile time, allowing the JIT engine to emit optimized machine code immediately. Additionally, Wasm bytecode has very low parsing overhead and gives explicit access to SIMD and multithreading instructions.
Q2. Why is WASI (WebAssembly System Interface) necessary, and what abstractions does it provide?
Answer: A standard interface for accessing system resources without OS-specific dependencies
Explanation: Wasm modules running outside the browser need access to file systems, networking, and environment variables, but Wasm itself is sandboxed with no system access. WASI standardizes POSIX-like system calls as Wasm interface imports, allowing the same .wasm binary to run identically on Linux, Windows, macOS, or embedded systems. It is often described as "the future of containers" — a single Wasm binary that runs anywhere without Docker.
Q3. Why is WebGPU more suitable than WebGL for ML inference?
Answer: First-class compute shader support and explicit GPU memory management
Explanation: WebGL is designed for graphics rendering pipelines, making general-purpose parallel computation (GPGPU) awkward — you had to abuse fragment shaders to perform matrix operations. WebGPU provides compute shaders as a first-class feature, flexible storage buffer access patterns, and a better asynchronous execution model. In practice, the ONNX Runtime Web WebGPU backend achieves 2–5x faster inference than the WebGL backend for transformer models.
Q4. Why do Cloudflare Workers have near-zero cold starts compared to AWS Lambda?
Answer: V8 isolates are created within an existing process, eliminating container/OS initialization
Explanation: AWS Lambda provisions a new execution environment for each cold start, which involves booting a microVM, initializing the language runtime, and loading code — adding hundreds of milliseconds to seconds of latency. Cloudflare Workers instead creates a memory-isolated V8 isolate inside an already-running V8 process in approximately 1ms. The process is already warm with JIT-compiled code, so there is effectively no cold-start delay in practice.
Q5. How does Pyodide use WebAssembly to run Python in the browser?
Answer: The entire CPython interpreter (written in C) is compiled to Wasm via the Emscripten toolchain
Explanation: Pyodide takes the CPython 3.x source code and compiles it to a WebAssembly binary using Emscripten. When a browser loads this Wasm binary, a complete Python interpreter runs inside the browser tab. C extension modules like numpy and scipy are similarly compiled to Wasm. Bidirectional Python-JavaScript bindings (PyProxy, JsProxy) allow Python objects to be manipulated from JS and JS objects from Python. Projects like JupyterLite build fully in-browser Jupyter environments on top of this foundation.
Conclusion
WebAssembly has evolved far beyond being a "faster JavaScript alternative" — it is becoming a universal runtime platform. With the maturation of the WASI Component Model, the growing adoption of WebGPU, and rapid uptake of Wasm across edge platforms, Wasm is now a standard technology in the following domains as of 2026:
- Browser AI inference (ONNX Runtime Web + WebGPU)
- Serverless edge functions (Cloudflare Workers, Fastly Compute)
- IoT and embedded systems (WasmEdge, WAMR)
- Plugin systems (Extism, waPC)
Whether you are starting with Rust and wasm-pack, deploying to Cloudflare Workers, or running ML models in the browser with WebGPU — this guide should serve as your compass into the world of edge computing with WebAssembly.