Node.js Event Loop & libuv Internals — 6 Phases, Microtasks, Worker Threads, V8 Heap (2025)
Intro — The Truth About "Single-Threaded"

The Node.js docs say "event-driven, non-blocking I/O." Every tutorial says "Node is single-threaded." Both are half-true.

The JavaScript execution context (V8) is single-threaded. But the Node process is not. libuv's worker thread pool runs 4 threads by default (up to 1024), and V8 has separate GC and compile threads. Run htop on a simple Express server and you'll see 7 to 10 threads.

Still, your JS code runs on one event loop thread. libuv manages that loop, V8 runs JS callbacks, control returns to libuv. This post dissects the 6 phases, microtask/macrotask priorities, the setImmediate vs setTimeout subtlety, how async/await desugars, and what to do when CPU-bound work blocks the loop.

The real answer to "how does one thread handle 100K connections?" is a three-layer stack: OS kernel I/O multiplexing (epoll/kqueue/IOCP) + libuv abstraction + V8's fast closure calls.


1. Node.js Architecture Overview

┌──────────────────────────────────────────────────────┐
│  User JavaScript (app.js)                            │
└──────────────────────────────────────────────────────┘
           ↕ (bindings)
┌──────────────────────────────────────────────────────┐
│  Node.js Core Modules (JS + C++)                     │
│  fs, net, http, crypto, stream, ...                  │
└──────────────────────────────────────────────────────┘
┌──────────────┐  ┌──────────────┐  ┌───────────────┐
│     V8       │  │    libuv     │  │  OpenSSL/zlib │
│  (JS Engine) │  │ (Event Loop) │  │  (Crypto/Gzip)│
└──────────────┘  └──────────────┘  └───────────────┘
┌──────────────────────────────────────────────────────┐
│  OS Kernel (Linux: epoll, macOS: kqueue, Win: IOCP)  │
└──────────────────────────────────────────────────────┘
  • V8: Google's JS engine. Parser, bytecode (Ignition), JIT tiers (Sparkplug/Maglev/TurboFan), GC (Orinoco).
  • libuv: C library that unifies epoll (Linux), kqueue (macOS/BSD), IOCP (Windows), event ports (Solaris) behind one API. Owns the loop, thread pool, DNS, file I/O, TCP/UDP, timers.
  • Bindings: N-API lets V8 call into C++.

2. libuv Event Loop — The 6 Phases

   ┌───────────────────────────┐
┌─▶│           timers          │  ← setTimeout / setInterval
│  └─────────────┬─────────────┘
│  ┌─────────────▼─────────────┐
│  │     pending callbacks     │  ← deferred I/O error callbacks
│  └─────────────┬─────────────┘
│  ┌─────────────▼─────────────┐
│  │       idle, prepare       │  ← internal
│  └─────────────┬─────────────┘      ┌──────────────┐
│  ┌─────────────▼─────────────┐      │   incoming:  │
│  │           poll            │◀─────┤  connections │
│  └─────────────┬─────────────┘      └──────────────┘
│  ┌─────────────▼─────────────┐
│  │           check           │  ← setImmediate
│  └─────────────┬─────────────┘
│  ┌─────────────▼─────────────┐
└──┤      close callbacks      │  ← socket.on('close')
   └───────────────────────────┘

One "tick" is one full lap. Each phase has a FIFO queue; the loop drains what's ready, then advances.

2.1 Timers

setTimeout(fn, 100) inserts a node into a min-heap keyed on now + 100ms. Each tick, the Timers phase pops all heap entries whose deadline has passed. The delay is a lower bound, not a guarantee — a slow Poll phase delays it. setTimeout(fn, 1) often fires after 5–15ms.

2.2 Pending Callbacks

System-level callbacks deferred from a previous tick (e.g., certain TCP errors). Rarely user-visible.

2.3 Idle, Prepare

Internal to libuv. A hook point for metrics collectors.

2.4 Poll — Where Most Time Is Spent

Node calls epoll_wait() (or kqueue/IOCP) here.

1. Execute all callbacks already in the poll queue.
2. If the queue is empty:
   a. If setImmediate is scheduled → jump to check phase.
   b. If a timer is about to expire → jump to timers phase.
   c. Otherwise → block in epoll_wait() until the OS wakes us.
3. New events are queued and dispatched.

This is why an idle Node process sits at 0% CPU — it blocks in the kernel, and the OS scheduler hands CPU to other processes.

2.5 Check

Runs setImmediate() callbacks.

2.6 Close Callbacks

Cleanup callbacks like socket.destroy(), stream.end(), 'close' event listeners.


3. setImmediate vs setTimeout(fn, 0)

setTimeout(() => console.log('timeout'), 0)
setImmediate(() => console.log('immediate'))

Output order is unpredictable:

  1. setTimeout(fn, 0) clamps to setTimeout(fn, 1) internally.
  2. Whether 1ms has already passed by the time the Timers phase is reached decides who wins.

When ordering matters — inside an I/O callback:

const fs = require('fs')
fs.readFile(__filename, () => {
  setTimeout(() => console.log('timeout'), 0)
  setImmediate(() => console.log('immediate'))
})

Here immediate always prints first. The fs callback ran in the Poll phase; the next phase is Check (setImmediate). Timers only comes around on the next tick.


4. Microtasks — Outside the Phase Loop

4.1 Two Microtask Queues

  • process.nextTick queue: Node-specific, higher priority than Promises.
  • Promise microtask queue: V8's standard Promise/async queue.

After each callback in any phase:

1. Drain process.nextTick queue.
2. Drain Promise microtask queue.
3. Next callback.

4.2 Experiment

setImmediate(() => console.log('1. setImmediate'))
setTimeout(() => console.log('2. setTimeout'), 0)
Promise.resolve().then(() => console.log('3. promise'))
process.nextTick(() => console.log('4. nextTick'))
console.log('5. sync')

Output:

5. sync
4. nextTick
3. promise
2. setTimeout  (or setImmediate first)
1. setImmediate

4.3 Why process.nextTick Is Dangerous

function explode() {
  process.nextTick(explode)
}
explode()

This starves the loop forever — Node will not advance phases until the nextTick queue is empty. Recursive Promise.resolve().then() has the same effect. Always yield via setImmediate or setTimeout in recursive async chains.

4.4 Node 11+ Change

Node 10 drained microtasks only after a phase finished. Node 11+ drains between every callback, matching browser behavior. Old tutorials lie.


5. Thread Pool — Cracks in the Single-Threaded Myth

5.1 Worker Pool (UV_THREADPOOL_SIZE)

libuv's default pool has 4 worker threads. The following go there:

  • File I/O (most fs.*)
  • DNS lookups (dns.lookup, backed by glibc getaddrinfo)
  • CPU-heavy crypto (crypto.pbkdf2, bcrypt, scrypt, argon2)
  • zlib compression

Raise it with UV_THREADPOOL_SIZE=16 (max 1024) before app start.

5.2 Why File I/O Uses the Pool

Linux epoll cannot watch regular files — file reads are always blocking (pre-io_uring). libuv runs blocking read/write on a worker thread and posts completion back to the loop. Sockets, being epoll-compatible, bypass the pool. That's why Node shines at network I/O but struggles with heavy file I/O.

5.3 io_uring

Node 20+ added experimental io_uring support (UV_USE_IO_URING=1). Real async file I/O without the pool — will gradually replace the current model.

5.4 Pool Starvation

for (const user of users) {
  bcrypt.hash(user.password, 10, callback)
}

bcrypt saturates all 4 workers, stalling every other fs call. Fixes: bump UV_THREADPOOL_SIZE, move CPU work to Worker Threads, or add concurrency limits (p-queue).


6. Worker Threads — Real Parallel JS

6.1 Why

libuv's pool runs C++ tasks only. CPU-bound JavaScript still blocks the loop.

// Blocks everyone
app.post('/process', (req, res) => {
  const result = heavyMatrixMultiplication(req.body.matrix)
  res.json(result)
})

6.2 API

// main.js
const { Worker } = require('worker_threads')
const w = new Worker('./heavy.js', { workerData: matrix })
w.on('message', (result) => res.json(result))

// heavy.js
const { parentPort, workerData } = require('worker_threads')
parentPort.postMessage(heavyMatrixMultiplication(workerData))

Each worker is a separate V8 isolate (own heap, own loop). Communicates via postMessage (structured clone). Use SharedArrayBuffer + Atomics for shared memory.

6.3 Worker vs Cluster vs child_process

Mechanism            | Process isolation | Shared memory     | Startup  | Use case
Worker Threads       | no                | SharedArrayBuffer | ~30 ms   | CPU-bound JS
Cluster (fork)       | yes               | IPC only          | ~100 ms  | HTTP horizontal scale
child_process.spawn  | yes               | stdio/IPC         | ~100 ms+ | External binaries
child_process.fork   | yes               | IPC only          | ~100 ms  | Other Node scripts

6.4 Piscina

Creating workers costs 30–50ms each. Piscina pools and recycles them — the standard way to run CPU-bound JS in production.


7. async/await Internals

async function foo() {
  return 42
}
// equivalent to
function foo() {
  return Promise.resolve(42)
}

V8 compiles async functions to a state machine. Each await is a resume point; the resume job is enqueued as a microtask when the Promise settles.

async function demo() {
  console.log('A')
  await null
  console.log('B')
  await null
  console.log('C')
}

console.log('1')
demo()
console.log('2')

Prints 1, A, 2, B, C. demo() runs synchronously up to the first await, then returns; each resume after an await is a microtask.

Async is not free — V8's 2018 "zero-cost async" work reduced Promise allocations, but await in tight loops is still slower than plain iteration.

// Slow: 10K awaits → 10K microtasks
for (const item of items) await validate(item)
// Fast
await Promise.all(items.map(validate))

8. I/O Model Comparison

Thread-per-Connection (classic Rails/Java)

10K connections × 8MB stack = 80GB RAM. Context-switch overhead is brutal.

Node (event-driven)

1 thread + epoll watching 10K fds. Hundreds of MB. Breaks under CPU-bound work.

Go / Elixir (M:N)

Goroutines (2KB initial stack) / Erlang processes (~300 words, roughly 2.5KB). The runtime schedules them; developers write "synchronous-looking" code.

Node's strength: ecosystem, shared language front/back, fast V8 async. Weakness: one blocking op freezes everything.


9. Loop Blocks — Diagnosis

9.1 Measuring Lag

const { monitorEventLoopDelay } = require('perf_hooks')
const h = monitorEventLoopDelay({ resolution: 20 })
h.enable()

setInterval(() => {
  console.log('p99 delay (ms):', (h.percentile(99) / 1e6).toFixed(2))
  h.reset()
}, 5000)

p99 over 100ms = users feel it.

9.2 CPU Profiling

node --inspect app.js
# or
npx 0x app.js

9.3 Five Common Culprits

  1. Large JSON parse/stringify → streaming parser (stream-json).
  2. Sync fs APIs → async versions.
  3. Regex catastrophic backtracking → RE2 bindings.
  4. bcrypt on the loop → Worker Thread.
  5. Tight for-loop over 1M items → batch + setImmediate yield.

10. V8 — Where JS Actually Runs

10.1 Compile Pipeline (2024–2025)

Source → Parser → AST → Ignition (bytecode)
  → (interpreter) → Sparkplug (baseline) → Maglev (mid-tier) → TurboFan

Tiered JIT: cold code stays interpreted, hot code is optimized.

10.2 Hidden Class (Shape)

function Point(x, y) { this.x = x; this.y = y }
const p1 = new Point(1, 2)   // Shape A
const p2 = new Point(3, 4)   // Shape A (reused)
p2.z = 5                      // Shape B (deopt)

Avoid adding/removing properties after construction in hot paths.

10.3 GC — Orinoco

  • Young (Scavenger): fast Minor GC.
  • Old (Mark-Sweep-Compact): incremental/concurrent marking.

--max-old-space-size=4096 sets the old-space cap. Node 18+ detects cgroup limits — earlier versions often OOM inside Docker.


11. libuv Data Structures

Purpose           | Structure
Timers            | min-heap
Pending queue     | intrusive doubly-linked list
Thread pool tasks | FIFO queue + condition variable
DNS               | c-ares + worker pool

uv_run(loop, UV_RUN_DEFAULT) is the 6-phase loop function itself. UV_RUN_ONCE runs one tick — handy for embedding/tests.


12. Streams & Backpressure

// Bad: loads 2GB into memory
const data = fs.readFileSync('/big.csv')
// Good: chunked
fs.createReadStream('/big.csv')
  .pipe(csvParser())
  .pipe(uploader)

Streams emit chunks (default 16KB). When the buffer exceeds highWaterMark, .write() returns false; the producer pauses until 'drain'. Use pipeline() (or stream/promises) — safer than raw .on('data').

Node 18+ supports WHATWG Web Streams natively (Readable.fromWeb / .toWeb). Fetch/Undici/Bun prefer them.


13. Cluster & Scaling

const cluster = require('cluster')
const os = require('os')

if (cluster.isPrimary) {
  for (let i = 0; i < os.cpus().length; i++) cluster.fork()
} else {
  require('./app')
}

All workers share the same TCP port. By default the primary accepts connections and round-robins them to workers (SCHED_RR on Linux); with SO_REUSEPORT the kernel can distribute accepts instead. Modern deployments often prefer "1 container = 1 process" and let Kubernetes scale, skipping the cluster module entirely.


14. AbortController

const controller = new AbortController()
const { signal } = controller

fetch('/api', { signal })
fs.readFile('/big', { signal }, cb)
setTimeout(() => controller.abort(), 5000)

Node 15+ supports the Web standard everywhere. Aborting rejects the Promise, but underlying resources may still complete — network sockets close, but an in-flight fs op may finish.


15. Diagnostics

node --inspect=0.0.0.0:9229 app.js
const v8 = require('v8')
v8.writeHeapSnapshot('/tmp/heap.heapsnapshot')

AsyncLocalStorage (built on async_hooks) carries request-scoped context across async boundaries — Node's equivalent of Python's contextvars / Go's context.Context. OpenTelemetry Node builds trace propagation on it. diagnostics_channel (Node 16+) is the standard hook point for APMs.


16. Security

Permissions Model (Node 20+)

node --permission --allow-read=./data --allow-write=./logs app.js

Deno-inspired. Limits fs/child_process/worker_threads. Useful against supply-chain attacks.

npm Attack Surface

  • Typosquatting, dependency confusion, post-install scripts.
  • Defenses: npm ci + lockfile, --ignore-scripts, Socket/Snyk/Dependabot, internal registry.

Prototype Pollution

Prefer Object.create(null) and Object.hasOwn().


17. Bun & Deno

  • Bun: Zig, JavaScriptCore, custom event loop. 2–4× faster benchmarks; less production mileage.
  • Deno: V8 + Rust + tokio. TypeScript-first, permissions, URL imports. Deno 2 dramatically improved npm compatibility.
  • Node response: built-in test runner, --watch, --env-file, permission model, Web Streams, fetch, WebSocket.

2025 reality: Node is default, Bun is experimental/fast, Deno is TS-first greenfield.


18. Production Checklist

  • --max-old-space-size matched to container memory
  • Event loop lag monitoring
  • UV_THREADPOOL_SIZE tuned to cores
  • Graceful shutdown on SIGTERM
  • Log rotation (pino + pino-roll)
  • HTTP keep-alive configured
  • unhandledRejection / uncaughtException handlers
  • Health & readiness probes
  • PID 1 handling in containers (--init)
  • Dependency scanning in CI

Performance: remove sync APIs, stream large JSON, parallelize with Promise.all, move CPU work to Worker Threads, share an HTTP agent (undici).


Closing — Draw the Loop in Your Head

If you can answer these, debugging "why is my API slow" becomes easy:

  • Which phase runs HTTP callbacks? (Poll)
  • What happens during JSON.stringify on a huge object? (loop frozen)
  • What does the loop do during await db.query(...)? (other I/O)
  • Why does recursive process.nextTick hang? (never leaves current phase)
  • Worker Thread vs Cluster Worker? (same process vs separate process)

Node.js is V8 + libuv + OS kernel performing together. The event loop at the center has evolved for two decades. Next up: WebSocket and Server-Sent Events internals — how real-time protocols work over TCP and how libraries like Socket.IO scale to 100K connections.
