WebAssembly & Edge Computing: Browser AI Inference, Cloudflare Workers, and IoT Wasm


What Is WebAssembly?

WebAssembly (Wasm) is a binary instruction format that became a W3C Recommendation, and therefore an official web standard, in December 2019. It runs at near-native speed in browsers, on servers, and on IoT devices, ending JavaScript's position as the only language that browsers execute natively.

The four core design principles of Wasm are:

- **Safety**: A sandboxed memory model protects the host environment.

- **Portability**: Behaves identically regardless of CPU architecture.

- **Speed**: JIT/AOT compilation can deliver several times the throughput of JavaScript on compute-intensive workloads (the reference benchmarks later in this article show roughly 3–4x before SIMD is enabled).

- **Openness**: Not tied to any specific language or platform.

1. WAT Text Format and Bytecode Structure

WebAssembly has two representations: the `.wasm` binary format and the human-readable `.wat` (WebAssembly Text Format).

WAT Example: Sum of Two Integers

    (module
      (func $add (param $a i32) (param $b i32) (result i32)
        local.get $a
        local.get $b
        i32.add)
      (export "add" (func $add)))

When compiled to `.wasm`, the binary starts with the magic number `\0asm` (0x00 0x61 0x73 0x6D).

Bytecode Structure

A Wasm module is organized into typed sections:

| Section ID | Name | Description |
| ---------- | -------- | ------------------------------ |
| 1 | Type | Function signature definitions |
| 3 | Function | Function index table |
| 7 | Export | Symbols exposed to the host |
| 10 | Code | Actual function bodies |
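To make the section layout concrete, the sketch below hand-assembles the bytes of the `add` module from the WAT example and instantiates it. Each section is written as its ID, its payload size in bytes, and then the payload. The variable names are illustrative, and in practice a toolchain generates these bytes for you.

```javascript
// Hand-assembled binary for the `add` module shown in the WAT example.
// Each section is encoded as: section ID, payload size in bytes, payload.
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic number "\0asm"
  0x01, 0x00, 0x00, 0x00, // binary format version 1
  // Type section (ID 1): one signature, (i32, i32) -> i32
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,
  // Function section (ID 3): function 0 uses type 0
  0x03, 0x02, 0x01, 0x00,
  // Export section (ID 7): export function 0 under the name "add"
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00,
  // Code section (ID 10): local.get 0, local.get 1, i32.add, end
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,
])

// Synchronous compile/instantiate is acceptable for a module this small.
const wasmModule = new WebAssembly.Module(bytes)
const instance = new WebAssembly.Instance(wasmModule, {})
const sum = instance.exports.add(2, 3)
console.log(sum) // 5
```

For larger modules, browsers expect the asynchronous `WebAssembly.instantiate` or `WebAssembly.instantiateStreaming` instead of the synchronous constructors used here.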

Linear Memory

Wasm uses a **linear memory** model: a single contiguous, growable byte array. It is allocated in 64 KiB pages and can be accessed directly from JavaScript through the `WebAssembly.Memory` object.

    const memory = new WebAssembly.Memory({ initial: 1, maximum: 10 }) // sizes are in pages
    const buffer = new Uint8Array(memory.buffer)

    // Read/write directly starting at offset 0
    buffer[0] = 42

Pointer arithmetic is sandboxed inside the Wasm instance: every load and store is bounds-checked against the linear memory, so a module can never reach host memory.
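One practical consequence of the page-based model is that growing a memory replaces its underlying `ArrayBuffer`. The following sketch, using only the standard `WebAssembly.Memory` API, shows that views created before `grow()` become detached and have to be re-created. The constant name `PAGE_SIZE` is just a label chosen for this example.

```javascript
const PAGE_SIZE = 64 * 1024 // one Wasm page = 65,536 bytes

const memory = new WebAssembly.Memory({ initial: 1, maximum: 10 })
let view = new Uint8Array(memory.buffer)
view[0] = 42

// grow() returns the previous size in pages and detaches the old ArrayBuffer.
const previousPages = memory.grow(1)
const staleLength = view.byteLength // 0: the old view is no longer usable

// Re-create the view over the new, larger buffer; existing contents are preserved.
view = new Uint8Array(memory.buffer)
console.log(previousPages, staleLength, view.byteLength / PAGE_SIZE, view[0])
```

Glue code generated by toolchains typically hides this by refreshing its views after any call that might grow memory.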

2. WASI: WebAssembly System Interface

WASI is a **standard system interface** that allows Wasm modules to access OS capabilities such as the file system, networking, and environment variables. Solomon Hykes, a co-founder of Docker, famously wrote in 2019 that if WASM+WASI had existed in 2008, there would have been no need to create Docker.

    (import "wasi_snapshot_preview1" "fd_write"
      (func $fd_write (param i32 i32 i32 i32) (result i32)))
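The four `i32` parameters of `fd_write` are a file descriptor, a pointer to an array of iovec records, the number of records, and a pointer where the host stores the byte count. As a rough illustration of what a host does with them, here is a hand-written shim. `makeFdWrite` and `sink` are hypothetical names invented for this sketch, not part of any WASI runtime.

```javascript
// Hypothetical host-side implementation of the WASI `fd_write` import.
// Signature: (fd, iovs_ptr, iovs_len, nwritten_ptr) -> errno
function makeFdWrite(memory, sink) {
  return function fd_write(fd, iovsPtr, iovsLen, nwrittenPtr) {
    const view = new DataView(memory.buffer)
    let written = 0
    for (let i = 0; i < iovsLen; i++) {
      // Each iovec is 8 bytes: a u32 buffer pointer, then a u32 length (little-endian).
      const bufPtr = view.getUint32(iovsPtr + i * 8, true)
      const bufLen = view.getUint32(iovsPtr + i * 8 + 4, true)
      // Copy the bytes out, since guest memory may change after the call returns.
      sink(fd, new Uint8Array(memory.buffer, bufPtr, bufLen).slice())
      written += bufLen
    }
    view.setUint32(nwrittenPtr, written, true) // report bytes written back to the guest
    return 0 // errno 0 = success
  }
}

// Simulate what a guest would do: place "hi\n" and one iovec in linear memory.
const memory = new WebAssembly.Memory({ initial: 1 })
const text = new TextEncoder().encode('hi\n')
new Uint8Array(memory.buffer).set(text, 64) // string bytes at offset 64
const dv = new DataView(memory.buffer)
dv.setUint32(0, 64, true) // iovec[0].buf
dv.setUint32(4, text.length, true) // iovec[0].buf_len

const chunks = []
const fdWrite = makeFdWrite(memory, (fd, data) => chunks.push({ fd, data }))
const errno = fdWrite(1, 0, 1, 16) // fd 1 = stdout, nwritten stored at offset 16
const output = new TextDecoder().decode(chunks[0].data)
```

A real runtime would pass a function like this in the import object under the `wasi_snapshot_preview1` namespace and would also validate the file descriptor and pointers.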

Key abstractions provided by WASI:

- **File system**: `fd_read`, `fd_write`, `path_open`

- **Clocks**: `clock_time_get`

- **Environment variables**: `environ_get`

- **Networking (WASI 0.2)**: `wasi:sockets` interface

WASI 0.2 (also known as Preview 2), released in January 2024, is built on the Component Model and describes its interfaces in WIT (Wasm Interface Type), a high-level interface definition language for composable Wasm components.

3. The Wasm Ecosystem: Rust, AssemblyScript, Emscripten

Rust to WebAssembly with wasm-pack

Rust is currently the most mature language in the Wasm ecosystem. Using `wasm-pack`, you can produce npm-ready Wasm packages in minutes.

    // src/lib.rs
    use wasm_bindgen::prelude::*;

    #[wasm_bindgen]
    pub fn fibonacci(n: u32) -> u32 {
        match n {
            0 => 0,
            1 => 1,
            _ => fibonacci(n - 1) + fibonacci(n - 2),
        }
    }

    // Multiplies two row-major n x n matrices
    #[wasm_bindgen]
    pub fn matrix_multiply(a: &[f32], b: &[f32], n: usize) -> Vec<f32> {
        let mut result = vec![0.0f32; n * n];
        for i in 0..n {
            for j in 0..n {
                for k in 0..n {
                    result[i * n + j] += a[i * n + k] * b[k * n + j];
                }
            }
        }
        result
    }

Build and deploy:

    # Install wasm-pack
    cargo install wasm-pack

    # Build for browser target
    wasm-pack build --target web

    # Build for Node.js target
    wasm-pack build --target nodejs

The generated `pkg/` directory contains the `.wasm` binary, JavaScript glue code, and TypeScript type definitions.

Calling Wasm from JavaScript

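    // The module file name follows your crate name; './pkg/my_wasm_lib.js' is a placeholder.
    import init, { fibonacci, matrix_multiply } from './pkg/my_wasm_lib.js'

    async function main() {
      // Initialize Wasm module
      await init()

      // Compute Fibonacci
      console.log(fibonacci(40)) // 102334155

      // Matrix multiplication (4x4): b is the identity, so the result equals a
      const a = new Float32Array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16])
      const b = new Float32Array([1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1])
      const result = matrix_multiply(a, b, 4)
      console.log(result)
    }

    main()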
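One detail worth checking before calling `fibonacci` with larger inputs is the `u32` return type. With Rust's default settings, integer overflow panics in debug builds and wraps in release builds, so results are only trustworthy while they fit in 32 bits. The sketch below finds that limit with `BigInt`; the function name is made up for this example.

```javascript
// Find the largest n whose Fibonacci number still fits in an unsigned 32-bit integer.
function largestFibIndexFittingU32() {
  const U32_MAX = 2n ** 32n - 1n
  let prev = 0n // F(0)
  let curr = 1n // F(1)
  let n = 1
  while (prev + curr <= U32_MAX) {
    const next = prev + curr
    prev = curr
    curr = next
    n += 1
  }
  return { n, value: curr }
}

const limit = largestFibIndexFittingU32()
console.log(limit.n, limit.value) // 47 2971215073n
```

So `fibonacci(47)` is the last safe call for the `u32` version. The naive recursion is also exponential in `n`, which is why it is popular as a benchmark but unsuitable for production use.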

AssemblyScript

AssemblyScript lets you write Wasm using TypeScript-like syntax.

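    // assembly/index.ts
    export function add(a: i32, b: i32): i32 {
      return a + b
    }

    // Sums `len` consecutive i32 values starting at byte offset `ptr` in linear memory
    export function sumArray(ptr: usize, len: i32): i64 {
      let sum: i64 = 0
      for (let i = 0; i < len; i++) {
        sum += load<i32>(ptr + (<usize>i << 2)) // 4 bytes per i32
      }
      return sum
    }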

Emscripten (C/C++)

Emscripten is the go-to toolchain for porting C/C++ codebases to Wasm. Figma and Google Earth both use this approach.

    # Compile a C file to Wasm
    emcc compute.c -O3 -o compute.js \
      -s WASM=1 \
      -s EXPORTED_FUNCTIONS='["_process_image"]' \
      -s EXPORTED_RUNTIME_METHODS='["ccall","cwrap"]'

4. Browser AI: ONNX Runtime Web, WebNN, WebGPU

In-Browser Inference with ONNX Runtime Web

ONNX Runtime Web runs ONNX models directly in the browser. It supports WebAssembly (CPU), WebGL, and WebGPU as execution backends.

    // Some onnxruntime-web versions expose the WebGPU provider from 'onnxruntime-web/webgpu' instead.
    import * as ort from 'onnxruntime-web'

    async function runInference() {
      // Prefer WebGPU backend, fall back to Wasm
      const session = await ort.InferenceSession.create('/models/bert-base.onnx', {
        executionProviders: ['webgpu', 'wasm'],
        graphOptimizationLevel: 'all',
      })

      // Create input tensors (token IDs are int64, hence BigInt64Array)
      const inputIds = new BigInt64Array([101n, 2023n, 2003n, 102n])
      const attentionMask = new BigInt64Array([1n, 1n, 1n, 1n])

      const feeds = {
        input_ids: new ort.Tensor('int64', inputIds, [1, 4]),
        attention_mask: new ort.Tensor('int64', attentionMask, [1, 4]),
      }

      // Output names such as `logits` depend on how the model was exported
      const results = await session.run(feeds)
      console.log('Logits:', results.logits.data)
    }
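Raw logits usually need a small post-processing step before they are useful. Assuming the model has a classification head whose output is a flat `Float32Array` of per-class scores (an assumption, since the original does not say what the model predicts), a minimal sketch of softmax and argmax looks like this. The sample logits are invented for illustration.

```javascript
// Numerically stable softmax over one row of logits.
function softmax(logits) {
  const max = logits.reduce((m, x) => Math.max(m, x), -Infinity)
  const exps = Array.from(logits, (x) => Math.exp(x - max))
  const total = exps.reduce((a, b) => a + b, 0)
  return exps.map((e) => e / total)
}

// Index of the largest value.
function argmax(values) {
  let best = 0
  for (let i = 1; i < values.length; i++) {
    if (values[i] > values[best]) best = i
  }
  return best
}

// Hypothetical logits for a 3-class classifier; real values would come from results.logits.data.
const logits = new Float32Array([1.0, 3.0, 0.5])
const probs = softmax(logits)
const predicted = argmax(probs)
console.log(predicted, probs)
```

Subtracting the maximum before exponentiating avoids overflow for large logits without changing the resulting probabilities.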

Matrix Multiplication with a WebGPU Compute Shader

WebGPU unlocks the GPU's parallel compute power directly from the web. The key reason it is far better than WebGL for ML inference is **first-class compute shader support**.

    async function webgpuMatmul(matA, matB, M, N, K) {
      const adapter = await navigator.gpu.requestAdapter()
      if (!adapter) throw new Error('WebGPU is not available')
      const device = await adapter.requestDevice()

      // matA is M x K, matB is K x N, result is M x N (all row-major)
      const shaderCode = `
        @group(0) @binding(0) var<storage, read> matA: array<f32>;
        @group(0) @binding(1) var<storage, read> matB: array<f32>;
        @group(0) @binding(2) var<storage, read_write> result: array<f32>;

        @compute @workgroup_size(8, 8)
        fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
          let row = gid.x;
          let col = gid.y;
          // Skip invocations that fall outside the M x N output
          if (row >= ${M}u || col >= ${N}u) { return; }
          var sum = 0.0;
          for (var k = 0u; k < ${K}u; k++) {
            sum += matA[row * ${K}u + k] * matB[k * ${N}u + col];
          }
          result[row * ${N}u + col] = sum;
        }
      `

      const shaderModule = device.createShaderModule({ code: shaderCode })
      // ... create buffers, pipeline, dispatch
    }
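The elided dispatch step needs a workgroup count for each dimension. With an 8 x 8 workgroup, one plausible way to compute the arguments for `dispatchWorkgroups` is to round up each dimension, and a CPU reference makes it easy to validate what the GPU returns. Both helpers below are illustrative names, not part of the WebGPU API.

```javascript
// Number of workgroups needed so that every output element gets one invocation.
// The shader uses @workgroup_size(8, 8), with x covering rows and y covering columns.
function workgroupCounts(M, N, tile = 8) {
  return [Math.ceil(M / tile), Math.ceil(N / tile)]
}

// CPU reference: (M x K) * (K x N), row-major, for validating GPU output.
function matmulReference(a, b, M, N, K) {
  const out = new Float32Array(M * N)
  for (let row = 0; row < M; row++) {
    for (let col = 0; col < N; col++) {
      let sum = 0
      for (let k = 0; k < K; k++) sum += a[row * K + k] * b[k * N + col]
      out[row * N + col] = sum
    }
  }
  return out
}

const counts = workgroupCounts(100, 20) // [13, 3]
const expected = matmulReference(
  new Float32Array([1, 2, 3, 4, 5, 6]), // 2 x 3
  new Float32Array([1, 0, 0, 1, 1, 1]), // 3 x 2
  2, 2, 3
) // [4, 5, 10, 11]
```

Because the counts are rounded up, some invocations land outside the matrix whenever `M` or `N` is not a multiple of 8, so the shader has to guard against out-of-range rows and columns.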

WebNN API

WebNN (Web Neural Network API) is a W3C specification, currently at the Candidate Recommendation stage, that lets browsers use OS-level hardware acceleration (NPUs, GPUs, and DSPs) directly.

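    // WebNN is still evolving; field and method names have changed between drafts
    // and may differ by browser version.
    const context = await navigator.ml.createContext({ deviceType: 'gpu' })
    const builder = new MLGraphBuilder(context)

    // Recent drafts name the descriptor fields `dataType` and `shape`
    const input = builder.input('input', { dataType: 'float32', shape: [1, 3, 224, 224] })
    const weights = builder.constant(/* ... */)
    const conv = builder.conv2d(input, weights, { padding: [1, 1, 1, 1] })
    const relu = builder.relu(conv)

    const graph = await builder.build({ output: relu })
    // Older drafts exposed context.compute(); newer drafts use context.dispatch() with MLTensor objects
    const results = await context.compute(graph, inputs, outputs)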
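Since WebNN and WebGPU are not available in every browser, applications generally feature-detect before choosing a backend. A minimal sketch follows, assuming the ONNX Runtime Web provider names `webnn`, `webgpu`, and `wasm`. The function takes a navigator-like object as a parameter, which is a choice made here so that it can be exercised outside a browser.

```javascript
// Build an ordered list of execution providers from feature detection.
// `nav` is a navigator-like object so the function can be tested outside a browser.
function pickExecutionProviders(nav) {
  const providers = []
  if (nav && 'ml' in nav) providers.push('webnn') // WebNN is exposed as navigator.ml
  if (nav && 'gpu' in nav) providers.push('webgpu') // WebGPU is exposed as navigator.gpu
  providers.push('wasm') // CPU fallback that is always available
  return providers
}

console.log(pickExecutionProviders({ gpu: {} })) // [ 'webgpu', 'wasm' ]
```

Note that the presence of `navigator.gpu` does not guarantee a usable adapter, so the list is an order of preference rather than a promise that the first entry will work.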

5. Edge AI Deployment: Cloudflare Workers, Fastly, AWS Lambda@Edge

Cloudflare Workers AI

Cloudflare Workers runs on a V8 isolate model across 300+ global PoPs (Points of Presence). With the AI binding, a Worker can call models hosted on Cloudflare's network, so inference runs close to your users rather than in a single central region.

    // Cloudflare Worker with AI binding
    export default {
      async fetch(request, env) {
        const body = await request.json()
        const userMessage = body.message

        // Run LLM inference via AI binding
        const response = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
          messages: [
            {
              role: 'system',
              content: 'You are a helpful assistant.',
            },
            {
              role: 'user',
              content: userMessage,
            },
          ],
          max_tokens: 512,
        })

        return new Response(JSON.stringify({ reply: response.response }), {
          headers: { 'Content-Type': 'application/json' },
        })
      },
    }

`wrangler.toml` configuration:

    name = "edge-ai-worker"
    main = "src/index.js"
    compatibility_date = "2024-09-23"

    [ai]
    binding = "AI"
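The Worker shown earlier assumes every request is a POST with a valid JSON body. A slightly more defensive variant, together with a stub for the AI binding so the handler can be tested without deploying, might look like the sketch below. `fakeEnv` and its echo behaviour are hypothetical test scaffolding, not part of Workers AI.

```javascript
// Same handler shape as the Worker above, with basic request validation added.
const worker = {
  async fetch(request, env) {
    if (request.method !== 'POST') {
      return new Response('Method Not Allowed', { status: 405 })
    }
    let body
    try {
      body = await request.json()
    } catch {
      return new Response('Invalid JSON', { status: 400 })
    }
    if (typeof body?.message !== 'string' || body.message.length === 0) {
      return new Response('Missing "message"', { status: 400 })
    }
    const result = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
      messages: [{ role: 'user', content: body.message }],
    })
    return new Response(JSON.stringify({ reply: result.response }), {
      headers: { 'Content-Type': 'application/json' },
    })
  },
}

// Hypothetical stub standing in for the AI binding during local tests.
const fakeEnv = {
  AI: {
    run: async (model, input) => ({ response: `echo: ${input.messages[0].content}` }),
  },
}
```

In a deployed Worker the object would be the module's default export, and `env.AI` would be supplied by the `[ai]` binding in `wrangler.toml`.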

Fastly Compute (Rust-based Wasm)

    use fastly::{Error, Request, Response};

    #[fastly::main]
    fn main(req: Request) -> Result<Response, Error> {
        let body = req.into_body_str();

        // Business logic processed entirely within Wasm
        let processed = process_at_edge(&body);

        Ok(Response::from_body(processed))
    }

    fn process_at_edge(input: &str) -> String {
        format!("Processed at edge: {}", input.to_uppercase())
    }

Cloudflare Workers vs AWS Lambda@Edge Comparison

| Property | Cloudflare Workers | AWS Lambda@Edge |
| ---------------- | ------------------ | --------------------------- |
| Execution model | V8 isolate | Lambda execution environment (microVM) |
| Cold start | Near 0 ms | 100 ms to seconds |
| Memory limit | 128 MB | 128 MB (viewer triggers) to 10 GB (origin triggers) |
| Max duration | 30 s CPU time (paid plan) | 5 s (viewer triggers), 30 s (origin triggers) |
| Global PoPs | 300+ | CloudFront regional edge caches |
| Language support | JS/TS, Wasm | Node.js, Python |

The core reason Cloudflare Workers has near-zero cold starts is that **it reuses V8 isolates rather than spawning new OS processes**. Each Worker runs in an isolated JavaScript execution context within the same process, eliminating OS-level process initialization entirely.

6. IoT & Embedded Wasm

WasmEdge

WasmEdge is a CNCF sandbox project — a lightweight Wasm runtime optimized for IoT and edge devices.

    # Install WasmEdge
    curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash

    # Run a Python script via WasmEdge
    wasmedge --dir .:. python_wasm.wasm script.py

Running a Python anomaly detection script on WasmEdge:

    # script.py - runs on top of WasmEdge
    import json
    import sys

    def process_sensor_data(data):
        temperature = data.get('temperature', 0)
        humidity = data.get('humidity', 0)

        # Flag readings above either threshold
        if temperature > 80 or humidity > 90:
            return {'alert': True, 'reason': 'threshold_exceeded'}
        return {'alert': False, 'status': 'normal'}

    # The sensor reading is passed as a JSON string in the first argument
    data = json.loads(sys.argv[1])
    result = process_sensor_data(data)
    print(json.dumps(result))

WAMR (WebAssembly Micro Runtime)

WAMR, a Bytecode Alliance project, is an ultra-lightweight Wasm runtime designed for microcontrollers and other constrained devices. It can operate with tens of kilobytes of RAM.

Minimum memory requirements:

- Interpreter mode: ~85KB ROM + ~64KB RAM

- AOT mode: ~60KB ROM + ~64KB RAM

Fermyon Spin

Spin is a Wasm-based microservices framework that makes building and deploying edge functions straightforward.

    # spin.toml
    spin_manifest_version = 2

    [application]
    name = "iot-processor"
    version = "0.1.0"

    [[trigger.http]]
    route = "/sensor"
    component = "sensor-handler"

    # On Rust 1.84 and later the target is named wasm32-wasip1;
    # adjust both `source` and `command` accordingly.
    [component.sensor-handler]
    source = "target/wasm32-wasi/release/sensor_handler.wasm"

    [component.sensor-handler.build]
    command = "cargo build --target wasm32-wasi --release"

7. Performance Benchmarking: Wasm vs Native

SIMD in WebAssembly

Wasm SIMD supports 128-bit vector operations, dramatically accelerating ML workloads.

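    // Using Wasm SIMD in Rust (build with RUSTFLAGS="-C target-feature=+simd128")
    #[cfg(target_arch = "wasm32")]
    use std::arch::wasm32::*;

    #[cfg(target_arch = "wasm32")]
    pub fn dot_product_simd(a: &[f32], b: &[f32]) -> f32 {
        let len = a.len().min(b.len());
        let chunks = len / 4;
        let mut sum = f32x4_splat(0.0);

        for i in 0..chunks {
            // SAFETY: i * 4 + 3 < len, so each 16-byte load stays inside its slice;
            // v128_load has no alignment requirement.
            let (va, vb) = unsafe {
                (
                    v128_load(a.as_ptr().add(i * 4) as *const v128),
                    v128_load(b.as_ptr().add(i * 4) as *const v128),
                )
            };
            sum = f32x4_add(sum, f32x4_mul(va, vb));
        }

        // Horizontal sum of the four lanes
        let mut total = f32x4_extract_lane::<0>(sum) + f32x4_extract_lane::<1>(sum)
            + f32x4_extract_lane::<2>(sum) + f32x4_extract_lane::<3>(sum);
        // Scalar tail for lengths that are not a multiple of 4
        for i in chunks * 4..len {
            total += a[i] * b[i];
        }
        total
    }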
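For readers less familiar with SIMD, the same algorithm can be modelled in plain JavaScript: four independent running sums stand in for the four lanes of a 128-bit register, followed by a horizontal add and a scalar tail for leftover elements. This is a scalar model of the technique for illustration only, and it gains none of the speed of real SIMD instructions.

```javascript
// Scalar model of a 4-lane SIMD dot product: four running sums, then a horizontal add.
function dotProductLanes(a, b) {
  const len = Math.min(a.length, b.length)
  const chunks = Math.floor(len / 4)
  const lanes = new Float32Array(4) // stands in for one 128-bit f32x4 register

  for (let i = 0; i < chunks; i++) {
    // One iteration corresponds to one f32x4 multiply followed by one f32x4 add.
    for (let lane = 0; lane < 4; lane++) {
      lanes[lane] += a[i * 4 + lane] * b[i * 4 + lane]
    }
  }

  let total = lanes[0] + lanes[1] + lanes[2] + lanes[3] // horizontal sum
  for (let i = chunks * 4; i < len; i++) total += a[i] * b[i] // scalar tail
  return total
}

const a = new Float32Array([1, 2, 3, 4, 5, 6])
const b = new Float32Array([6, 5, 4, 3, 2, 1])
const dot = dotProductLanes(a, b)
console.log(dot) // 56
```

Because the lanes accumulate separately, floating-point results can differ slightly from a strictly sequential sum on inputs that are not exactly representable.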

Multithreading with SharedArrayBuffer

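Wasm threading builds on `SharedArrayBuffer` and the `Atomics` API. In browsers, `SharedArrayBuffer` is only available on cross-origin isolated pages, which means serving the document with the `Cross-Origin-Opener-Policy: same-origin` and `Cross-Origin-Embedder-Policy: require-corp` headers.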

    // Wasm multithreading with shared memory
    const sharedMemory = new WebAssembly.Memory({
      initial: 16,
      maximum: 256,
      shared: true, // Enables SharedArrayBuffer
    })

    // Pass shared memory to a Worker
    const worker = new Worker('wasm-worker.js')
    worker.postMessage({ memory: sharedMemory })
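Once several threads see the same memory, plain reads and writes are no longer safe for shared counters or flags. The sketch below uses only standard APIs to show the `Atomics` calls a worker would use. It runs on a single thread for simplicity, so it demonstrates the API rather than actual contention.

```javascript
const sharedMemory = new WebAssembly.Memory({ initial: 1, maximum: 4, shared: true })
const isShared = sharedMemory.buffer instanceof SharedArrayBuffer

// Atomics operate on integer typed-array views over the shared buffer.
const counters = new Int32Array(sharedMemory.buffer)
Atomics.store(counters, 0, 0)

for (let i = 0; i < 1000; i++) {
  Atomics.add(counters, 0, 1) // remains correct when several workers do this concurrently
}

const total = Atomics.load(counters, 0)
console.log(isShared, total) // true 1000
```

A non-atomic `counters[0] += 1` performed from multiple workers could lose updates, which is exactly what `Atomics.add` prevents.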

Benchmark Results (Reference)

| Task | JavaScript | Wasm (single) | Wasm + SIMD | Native C |
| --------------------------- | ---------- | ------------- | ----------- | -------- |
| Matrix multiply (1024x1024) | 850ms | 210ms | 55ms | 40ms |
| SHA-256 hash (1MB) | 120ms | 35ms | 22ms | 18ms |
| Image resize (4K) | 340ms | 95ms | 28ms | 20ms |

In these reference figures, Wasm + SIMD runs roughly 20–40% slower than native C while being about 5–15x faster than plain JavaScript.
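The relative numbers can be derived directly from the table. The short script below recomputes, for each row, how many times faster Wasm + SIMD is than JavaScript and how much slower it is than native C. The input values are the reference timings quoted above, not new measurements.

```javascript
// Figures copied from the benchmark table above (milliseconds).
const rows = [
  { task: 'Matrix multiply (1024x1024)', js: 850, wasm: 210, simd: 55, native: 40 },
  { task: 'SHA-256 hash (1MB)', js: 120, wasm: 35, simd: 22, native: 18 },
  { task: 'Image resize (4K)', js: 340, wasm: 95, simd: 28, native: 20 },
]

const summary = rows.map((r) => ({
  task: r.task,
  // How many times faster Wasm + SIMD is than JavaScript
  speedupOverJs: +(r.js / r.simd).toFixed(1),
  // Extra time relative to native C, as a percentage
  slowerThanNativePct: +(((r.simd - r.native) / r.native) * 100).toFixed(1),
}))

console.table(summary)
```

For these rows the overhead relative to native comes out at 37.5%, 22.2%, and 40%, and the speedup over JavaScript at about 15.5x, 5.5x, and 12.1x.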

8. Real-World Case Studies

Figma

Figma's entire rendering engine is written in C++ and compiled to Wasm via Emscripten. This allows complex vector graphics operations to run at 60fps in the browser without any plugins.

Google Earth

Google Earth for Web ports its massive C++ 3D terrain rendering engine to the browser through Wasm, enabling gigabytes of 3D geographic data to be rendered client-side.

Pyodide: Python in the Browser

Pyodide compiles the entire CPython interpreter to Wasm, enabling full Python execution inside a browser tab.

    // Assumes pyodide.js has been loaded on the page, which defines the global loadPyodide()
    async function runPython() {
      const pyodide = await loadPyodide()

      // Install and use numpy/pandas entirely in the browser
      await pyodide.loadPackage(['numpy', 'pandas'])

      const result = pyodide.runPython(`
        # numpy operations running in the browser
        import numpy as np

        arr = np.random.randn(1000, 1000)
        eigenvalues = np.linalg.eigvals(arr[:10, :10])
        float(np.abs(eigenvalues).max())
      `)

      console.log('Max eigenvalue:', result)
    }

    runPython()

Quiz

**Question**: Why can WebAssembly outperform JavaScript on compute-intensive workloads?

**Answer**: Static type system and predictable AOT/JIT compilation

**Explanation**: JavaScript is a dynamically typed language. At runtime, the JS engine must perform type inference, inline caching, hidden class transitions, and many other optimizations before generating machine code. Wasm, by contrast, has all types fixed at compile time, allowing the JIT engine to emit optimized machine code immediately. Additionally, Wasm bytecode has very low parsing overhead and gives explicit access to SIMD and multithreading instructions.

**Question**: What problem does WASI solve?

**Answer**: A standard interface for accessing system resources without OS-specific dependencies

**Explanation**: Wasm modules running outside the browser need access to file systems, networking, and environment variables, but Wasm itself is sandboxed with no system access. WASI standardizes POSIX-like system calls as Wasm interface imports, allowing the same `.wasm` binary to run identically on Linux, Windows, macOS, or embedded systems. It is often described as "the future of containers" — a single Wasm binary that runs anywhere without Docker.

**Question**: Why is WebGPU better suited than WebGL for ML inference?

**Answer**: First-class compute shader support and explicit GPU memory management

**Explanation**: WebGL is designed for graphics rendering pipelines, which makes general-purpose parallel computation (GPGPU) awkward: matrix operations had to be expressed by abusing fragment shaders. WebGPU provides compute shaders as a first-class feature, flexible storage buffer access patterns, and a better asynchronous execution model. In practice, the ONNX Runtime Web WebGPU backend can run transformer models roughly 2–5x faster than the WebGL backend, depending on the model and hardware.

**Question**: Why do Cloudflare Workers have near-zero cold starts compared with AWS Lambda?

**Answer**: V8 isolates are created within an existing process, eliminating container/OS initialization

**Explanation**: AWS Lambda provisions a new execution environment (a lightweight microVM) for each cold start, which involves environment startup, runtime initialization, and code loading, adding hundreds of milliseconds to seconds of latency. Cloudflare Workers instead creates a memory-isolated V8 isolate inside an already-running process in a few milliseconds. Because the runtime itself is already loaded, only the Worker's own script has to be prepared, so the cold start is effectively invisible in practice.

**Question**: How does Pyodide run Python in the browser?

**Answer**: The entire CPython interpreter (written in C) is compiled to Wasm via the Emscripten toolchain

**Explanation**: Pyodide takes the CPython 3.x source code and compiles it to a WebAssembly binary using Emscripten. When a browser loads this Wasm binary, a complete Python interpreter runs inside the browser tab. C extension modules like numpy and scipy are similarly compiled to Wasm. Bidirectional Python-JavaScript bindings (PyProxy, JsProxy) allow Python objects to be manipulated from JS and JS objects from Python. Projects like JupyterLite build fully in-browser Jupyter environments on top of this foundation.

Conclusion

WebAssembly has evolved far beyond being a "faster JavaScript alternative" — it is becoming a **universal runtime platform**. With the maturation of the WASI Component Model, the growing adoption of WebGPU, and rapid uptake of Wasm across edge platforms, Wasm is now a standard technology in the following domains as of 2026:

- Browser AI inference (ONNX Runtime Web + WebGPU)

- Serverless edge functions (Cloudflare Workers, Fastly Compute)

- IoT and embedded systems (WasmEdge, WAMR)

- Plugin systems (Extism, waPC)

Whether you are starting with Rust and `wasm-pack`, deploying to Cloudflare Workers, or running ML models in the browser with WebGPU — this guide should serve as your compass into the world of edge computing with WebAssembly.
