Complete Guide to Async I/O Models 2025: epoll, io_uring, Reactor/Proactor, and async/await Internals
TL;DR

  • Async I/O is the foundation of modern high-performance servers: Nginx, Node.js, and Redis are all built on it.
  • Evolution: select → poll → epoll/kqueue → io_uring.
  • Reactor vs Proactor: event readiness notification vs operation-completion notification.
  • io_uring: the Linux 5.1+ revolution. More efficient than epoll, and truly async.
  • async/await: the compiler rewrites your code as a state machine.

1. Evolution of I/O Models

1.1 Five I/O Models

POSIX classification:

  1. Blocking I/O: wait until the operation completes.
  2. Non-blocking I/O: return immediately, poll for completion.
  3. I/O Multiplexing: select/poll/epoll.
  4. Signal-driven I/O: notified via signals.
  5. Asynchronous I/O: true async (POSIX AIO, io_uring).

1.2 Blocking I/O — The Simplest

data = sock.recv(1024)  # blocks until data arrives
process(data)

Problems:

  • One thread = one connection.
  • 10K connections = 10K threads (memory blow-up).
  • The C10K problem.

1.3 Multi-Threaded as the Fix?

import threading

def handle_client(sock):
    data = sock.recv(1024)
    process(data)

while True:
    client, addr = server.accept()
    threading.Thread(target=handle_client, args=(client,)).start()

Limits:

  • Per-thread memory (1–8 MB).
  • Context-switching cost.
  • 10K threads = death.

1.4 Non-blocking I/O

sock.setblocking(False)
try:
    data = sock.recv(1024)
except BlockingIOError:
    pass  # no data — do something else

Problem: infinite-loop polling pins CPU at 100%.

1.5 Enter I/O Multiplexing

Idea: one syscall watching many fds at once.

import select

ready, _, _ = select.select([sock1, sock2, sock3], [], [])
for sock in ready:
    data = sock.recv(1024)

A single thread can now handle thousands of connections.


2. select and poll

2.1 select (1983)

fd_set readfds;
FD_ZERO(&readfds);
FD_SET(sock, &readfds);

int n = select(max_fd + 1, &readfds, NULL, NULL, &timeout);

if (FD_ISSET(sock, &readfds)) {
    // readable
}

Problems:

  • FD_SETSIZE limit (typically 1024).
  • O(n) scan: every call inspects every fd.
  • fd_set copy: user space ↔ kernel on each call.

2.2 poll (1986)

struct pollfd fds[1024];
fds[0].fd = sock;
fds[0].events = POLLIN;

int n = poll(fds, 1024, timeout);

if (fds[0].revents & POLLIN) {
    // readable
}

Improvements:

  • No hard fd-count limit.
  • Dynamically sized pollfd array instead of a fixed fd_set.

Still problematic:

  • O(n) scan.
  • The fd list is passed on every call.

2.3 Limits of select/poll

10,000 connections with only 100 active:

  • select/poll scan all 10,000 every time.
  • Wasted CPU.

A new mechanism was needed.


3. epoll — The Linux Revolution

3.1 Arrival (2002, Linux 2.5.44)

int epfd = epoll_create1(0);

struct epoll_event ev;
ev.events = EPOLLIN;
ev.data.fd = sock;
epoll_ctl(epfd, EPOLL_CTL_ADD, sock, &ev);

struct epoll_event events[64];
int n = epoll_wait(epfd, events, 64, -1);

for (int i = 0; i < n; i++) {
    int fd = events[i].data.fd;
    // handle
}

3.2 Why epoll Wins

1. O(1) event detection:

  • The kernel returns only the ready fds.
  • With 10K connections and 100 active, epoll_wait returns just those 100.

2. Register once:

  • epoll_ctl registers each fd a single time.
  • No need to re-pass the full fd list on every call.

3. Trigger modes:

  • Level-Triggered (LT): keeps notifying while data is present (default).
  • Edge-Triggered (ET): notifies only on state change (used by Nginx).
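
In Python, the register-once model shows up directly in the standard selectors module, which picks epoll on Linux (kqueue on BSD/macOS). A minimal echo-server sketch — the port and echo logic are placeholders:

import selectors
import socket

sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on BSD/macOS

server = socket.socket()
server.bind(("0.0.0.0", 8080))
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ)  # register once, like epoll_ctl

while True:
    # Returns only the ready fds, like epoll_wait — no O(n) scan
    for key, mask in sel.select():
        if key.fileobj is server:
            conn, _ = server.accept()
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ)
        else:
            data = key.fileobj.recv(1024)
            if data:
                key.fileobj.send(data)  # naive echo; may short-write
            else:
                sel.unregister(key.fileobj)
                key.fileobj.close()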

3.3 ET vs LT

LT (Level-Triggered):

// 16KB available, you read 8KB → next epoll_wait still notifies

Easier to use, slightly slower.

ET (Edge-Triggered):

// Only notified when new data arrives → must drain fully
while (true) {
    ssize_t n = recv(fd, buf, sizeof(buf), 0);
    if (n < 0 && errno == EAGAIN) break;   // fully drained
    if (n <= 0) { close(fd); break; }      // error or peer closed
    process(buf, n);
}

Faster, but easy to lose events if mishandled.

3.4 Systems Built on epoll

  • Nginx — core.
  • Node.js (via libuv).
  • Redis.
  • HAProxy.
  • Memcached.
  • Nearly every high-performance Linux server.

3.5 Equivalents on Other OSes

OS           API
Linux        epoll
macOS/BSD    kqueue
Windows      IOCP
Solaris      /dev/poll (deprecated), event ports

4. kqueue (BSD/macOS)

4.1 More Powerful Than epoll

int kq = kqueue();

struct kevent change;
EV_SET(&change, sock, EVFILT_READ, EV_ADD | EV_ENABLE, 0, 0, NULL);
kevent(kq, &change, 1, NULL, 0, NULL);

struct kevent events[64];
int n = kevent(kq, NULL, 0, events, 64, NULL);

Richer than epoll:

  • File changes (EVFILT_VNODE).
  • Signals (EVFILT_SIGNAL).
  • Timers (EVFILT_TIMER).
  • Process exit (EVFILT_PROC).
  • Disk I/O, and more.

Downside: smaller user base than epoll.
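
On macOS/BSD, Python exposes kqueue directly through the select module, so the extra filters are easy to try. A minimal sketch of a kernel timer via EVFILT_TIMER (the 1000 ms period is an arbitrary choice):

import select

kq = select.kqueue()

# Register a kernel timer that fires every 1000 ms (EVFILT_TIMER)
timer = select.kevent(
    1,                                  # arbitrary identifier
    filter=select.KQ_FILTER_TIMER,
    flags=select.KQ_EV_ADD | select.KQ_EV_ENABLE,
    data=1000,                          # period in milliseconds
)
kq.control([timer], 0)                  # apply changes, fetch no events

while True:
    for ev in kq.control([], 1, None):  # block until an event fires
        print("timer fired, ident =", ev.ident)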


5. io_uring — The New Revolution

5.1 Limits of epoll

Even epoll has limits:

  • Synchronous I/O: you still call read/write yourself after epoll_wait.
  • Syscall cost: a user ↔ kernel transition on every call.
  • Not truly async: read/write can still block, and epoll cannot watch regular files at all.

5.2 io_uring (2019, Linux 5.1)

Built by Jens Axboe (Linux kernel developer). True async I/O.

Core ideas:

  • Two ring buffers (Submission, Completion).
  • Shared memory between user space and kernel.
  • Near-zero syscalls.

[User Space]                    [Kernel Space]
[Submission Queue (SQ)] ────→ [Worker]
[Completion Queue (CQ)] ←──── [I/O complete]

5.3 Example Usage

#include <liburing.h>   // liburing helper library

struct io_uring ring;
io_uring_queue_init(256, &ring, 0);  // queue depth 256

// Submit work
struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);
io_uring_submit(&ring);

// Wait for completion
struct io_uring_cqe *cqe;
io_uring_wait_cqe(&ring, &cqe);
// cqe->res = bytes read (or a negative errno on failure)
io_uring_cqe_seen(&ring, cqe);

5.4 Advantages of io_uring

1. Truly async:

  • Disk I/O no longer blocks.
  • Unified model for network and disk.

2. Fewer syscalls:

  • Batch submission.
  • Some modes allow zero syscalls.

3. Many operations:

  • read, write, send, recv.
  • accept, connect.
  • fsync, fallocate.
  • splice, tee.
  • statx, openat, close.

4. Faster:

  • 30–50% improvement over epoll on certain workloads.

5.5 io_uring in the Wild

  • Nginx 1.21+: optional support.
  • PostgreSQL 17: partial adoption.
  • Tokio (Rust): io-uring backend.
  • ScyllaDB: core technology.
  • Cloud Hypervisor.

5.6 Future of io_uring

  • eBPF integration.
  • Direct NVMe access.
  • GPU/FPGA I/O.
  • All syscalls async.

io_uring is the future of Linux I/O.


6. Reactor vs Proactor

6.1 Reactor Pattern

[Event Loop]
   ↓ epoll_wait
[Event: fd=5 readable]
[Handler]
[read(5, ...)]  ← the user calls read directly

Characteristics:

  • Event-readiness notification.
  • The user calls read/write.
  • Built on epoll, kqueue.

Examples: Nginx, Node.js, Redis.
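
A toy reactor makes the pattern concrete: the loop waits for readiness, then dispatches to a handler that performs the read itself. A minimal sketch using Python's selectors module (the handler wiring is simplified):

import selectors

class Reactor:
    def __init__(self):
        self.sel = selectors.DefaultSelector()

    def register(self, fileobj, handler):
        # Store the handler; it is invoked when the fd is ready
        self.sel.register(fileobj, selectors.EVENT_READ, data=handler)

    def run(self):
        while True:
            for key, _ in self.sel.select():   # "fd is ready"
                handler = key.data
                handler(key.fileobj)           # handler calls recv() itself

# Usage: reactor.register(sock, lambda s: process(s.recv(1024)))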

6.2 Proactor Pattern

[User]"read 1024 bytes from this fd"
[Kernel]async work
[Completion: data ready]
[Handler] → user receives data

Characteristics:

  • Delegate the operation itself to the kernel.
  • Notified on completion.
  • Built on IOCP, io_uring.

Examples: Boost.ASIO, Windows IOCP, io_uring.
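
Python's asyncio ships both styles: SelectorEventLoop (reactor, epoll/kqueue) and, on Windows, ProactorEventLoop on top of IOCP — the default there since Python 3.8. A minimal sketch:

import sys
import asyncio

if sys.platform == "win32":
    # IOCP-backed Proactor loop (already the default since Python 3.8)
    asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())

async def main():
    # The same async/await code runs on either loop; the readiness vs
    # completion distinction is hidden inside the event loop.
    await asyncio.sleep(0.1)

asyncio.run(main())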

6.3 Comparison

             Reactor                 Proactor
Event        "ready"                 "completed"
Read         user calls it           kernel handles it
OS support   broad (epoll, kqueue)   limited (io_uring, IOCP)
Abstraction  simple                  complex
Performance  good                    better (in theory)

6.4 ASIO's Unified Model

Boost.ASIO (C++) exposes Reactor and Proactor through the same interface:

async_read(socket, buffer, [](error_code ec, size_t bytes) {
    // called on completion
});

Internally: epoll on Linux, IOCP on Windows, io_uring on recent builds.


7. async/await Internals

7.1 What async/await Actually Does

async def fetch_data():
    data = await http_request("https://api.example.com")
    return process(data)

There is no magic. The compiler rewrites this as a state machine.

7.2 The Rewrite (Pseudo-code)

class FetchDataStateMachine:
    state = 0

    def resume(self):
        if self.state == 0:
            self.future = http_request("https://api.example.com")
            self.future.set_callback(self.resume)
            self.state = 1
            return  # suspend

        if self.state == 1:
            data = self.future.result()
            return process(data)

Core idea:

  • Suspend the function at each await.
  • Resume when the result is ready.
  • Return control to the event loop in between.

7.3 The Event Loop

# Conceptually, asyncio's event loop runs something like this:

while True:
    # 1. Run tasks on the ready queue
    while ready_queue:
        task = ready_queue.pop()
        task.run()

    # 2. Wait for I/O events via epoll_wait
    events = epoll.wait(timeout)

    # 3. Move waiters back onto the ready queue
    for event in events:
        task = pending[event.fd]
        ready_queue.append(task)

7.4 Python asyncio

import asyncio

async def main():
    # Concurrent execution
    results = await asyncio.gather(
        fetch_user(1),
        fetch_user(2),
        fetch_user(3),
    )
    return results

asyncio.run(main())

Internals:

  • The selectors module (abstracts epoll/kqueue/select).
  • Event loop.
  • Tasks and Futures.
  • Coroutines.

7.5 JavaScript (Node.js)

async function main() {
    const data = await fetch("https://api.example.com")
    const json = await data.json()
    console.log(json)
}

Internals:

  • libuv (C library).
  • Event loop.
  • epoll/kqueue/IOCP.

7.6 Rust Tokio

#[tokio::main]
async fn main() {
    let data = fetch("https://api.example.com").await;
    println!("{:?}", data);
}

Internals:

  • mio (abstracts epoll/kqueue).
  • Optional io-uring.
  • M:N scheduling across multiple OS threads.
  • Work-stealing.

8. Single-Threaded vs Multi-Threaded

8.1 Single-Threaded (Node.js, Redis, Nginx)

[1 thread]
[Event Loop] (epoll)
[Callback 1] [Callback 2] [Callback 3]

Pros:

  • No locks (no race conditions).
  • Simple.
  • Memory-efficient.

Cons:

  • CPU work blocks the loop.
  • Only one core is used.
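
The first caveat is easy to demonstrate: one synchronous call starves every other task on the loop. A minimal sketch, where a 2-second time.sleep stands in for CPU-bound work:

import asyncio
import time

async def ticker():
    while True:
        print("tick")
        await asyncio.sleep(0.5)

async def blocking_task():
    time.sleep(2)   # synchronous: the whole loop stalls, ticks stop
    # Fix: await asyncio.to_thread(time.sleep, 2) pushes it off the loop

async def main():
    t = asyncio.create_task(ticker())
    await asyncio.sleep(1)
    await blocking_task()   # no ticks print during these 2 seconds
    await asyncio.sleep(1)
    t.cancel()

asyncio.run(main())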

8.2 Multi-Threaded + Event Loops

[N threads (one per CPU core)]
[Event Loop per thread]
[Work-stealing]

Examples: Tokio (Rust), Akka (JVM), Kotlin Coroutines.

Pros:

  • Multi-core utilization.
  • Hundreds of thousands of concurrent tasks in a single process.

Cons:

  • Synchronization required.
  • Harder to debug.

8.3 SO_REUSEPORT

Linux 3.9+ feature:

int one = 1;
setsockopt(sock, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));

Multiple processes can listen on the same port:

  • The kernel load-balances automatically.
  • Each process gets its own epoll instance.
  • Used by Nginx and HAProxy.

[Port 80]
   ├─ [Worker 1] (epoll)
   ├─ [Worker 2] (epoll)
   └─ [Worker 3] (epoll)
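
The same pattern in Python: each forked worker binds its own socket with SO_REUSEPORT and runs its own accept loop, letting the kernel spread connections. A minimal sketch (Linux 3.9+; the worker count and port are placeholders):

import os
import socket

NUM_WORKERS = 3

def worker():
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind(("0.0.0.0", 8080))   # every worker binds the same port
    sock.listen()
    while True:
        conn, _ = sock.accept()    # kernel load-balances across workers
        conn.sendall(b"handled by pid %d\n" % os.getpid())
        conn.close()

for _ in range(NUM_WORKERS):
    if os.fork() == 0:             # child: run the accept loop
        worker()
        os._exit(0)

os.wait()                          # parent waits; workers run forever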

9. Performance Comparison

9.1 Handling 10K Connections

Model                   Memory    Throughput
Thread per connection   ~10 GB    low
select                  low       very low
poll                    low       low
epoll                   low       high
io_uring                low       very high

9.2 Real Benchmarks

HTTP server (10K connections):

Apache (prefork):    5,000 req/s
Apache (worker):     20,000 req/s
Nginx (epoll):       100,000 req/s
Nginx (io_uring):    150,000 req/s

ScyllaDB (io_uring):

  • 10x+ throughput over Cassandra.
  • 1M+ ops/s on a single node.

9.3 Modern Storage Engines

ScyllaDB's secret sauce:

  • io_uring + DPDK.
  • One shard per core.
  • Lock-free.
  • Userspace TCP stack.

10. In Practice — Building a High-Performance Server

10.1 Python (asyncio)

import asyncio

async def handle_client(reader, writer):
    while True:
        data = await reader.read(1024)
        if not data:
            break
        writer.write(data)
        await writer.drain()
    writer.close()

async def main():
    server = await asyncio.start_server(handle_client, '0.0.0.0', 8080)
    async with server:
        await server.serve_forever()

asyncio.run(main())

Internally uses selectors → epoll.

10.2 Rust (Tokio)

use tokio::net::TcpListener;
use tokio::io::{AsyncReadExt, AsyncWriteExt};

#[tokio::main]
async fn main() {
    let listener = TcpListener::bind("0.0.0.0:8080").await.unwrap();

    loop {
        let (mut socket, _) = listener.accept().await.unwrap();

        tokio::spawn(async move {
            let mut buf = [0; 1024];
            loop {
                let n = socket.read(&mut buf).await.unwrap();
                if n == 0 { return }
                socket.write_all(&buf[..n]).await.unwrap();
            }
        });
    }
}

Tokio handles:

  • mio (epoll).
  • M:N scheduling.
  • Work-stealing.

10.3 Go

package main

import "net"

func main() {
    listener, _ := net.Listen("tcp", ":8080")

    for {
        conn, _ := listener.Accept()
        go handle(conn)
    }
}

func handle(conn net.Conn) {
    buf := make([]byte, 1024)
    for {
        n, err := conn.Read(buf)
        if err != nil { return }
        conn.Write(buf[:n])
    }
}

The Go runtime handles:

  • netpoll (epoll/kqueue).
  • Goroutine scheduling.
  • Channel integration.

10.4 Comparison

                 Python asyncio     Rust Tokio            Go
Syntax           async/await        async/await           go func()
Performance      moderate           top                   very good
Learning curve   easy               steep                 easy
Memory           moderate           very low              low
Best for         fast prototyping   maximum performance   balance

Quiz

1. What is the core difference between select and epoll?

Answer: select scans every fd on every call (O(n)), is limited by FD_SETSIZE (typically 1024), and copies the fd_set between user and kernel on every call. epoll returns only fds with events (O(1)), has no fd-count limit, and registers fds once via epoll_ctl. With 10K connections and 100 active: select scans 10K every time, epoll returns just 100. That is why Nginx, Node.js, and Redis are all built on epoll.

2. Why is io_uring better than epoll?

Answer: epoll still has limits — synchronous read/write (you still call them after epoll_wait), syscall cost, and disk I/O that can still block. io_uring: (1) two ring buffers (SQ/CQ) plus shared memory, (2) near-zero syscalls (batch submission), (3) truly async (disk I/O no longer blocks), (4) supports not just read/write but also accept, connect, fsync, and more. 30–50% faster. Used by ScyllaDB and PostgreSQL 17. The future of Linux I/O.

3. What is the difference between the Reactor and Proactor patterns?

Answer: Reactor (epoll-based): the event loop signals "fd is ready" and the user calls read. Proactor (io_uring / IOCP-based): the user requests "read this data for me", the kernel performs the work, and signals on completion. Reactor makes the user do the work; Proactor makes the kernel do it. Boost.ASIO abstracts both patterns behind the same interface.

4. How does async/await actually work?

Answer: No magic. The compiler rewrites the function as a state machine. At each await the function suspends and registers a callback on the future/promise. When the result is ready, it resumes. The event loop (1) runs tasks on the ready queue, (2) waits for I/O via epoll_wait, (3) moves completed tasks back onto the ready queue. Python asyncio, JS/Node.js, Rust Tokio, and Go goroutines all follow this pattern.

5. What is SO_REUSEPORT and how is it used?

Answer: A Linux 3.9+ feature. Multiple processes can listen on the same port simultaneously. The kernel distributes new connections automatically. Each process has its own epoll instance, giving you true lock-free parallelism. Used by Nginx and HAProxy — each worker process listens on the same port, and the kernel balances between them. It sidesteps single-process limits. One worker per CPU core is the common pattern.

