Complete Guide to Async I/O Models 2025: epoll, io_uring, Reactor/Proactor, and async/await Internals
TL;DR
- Async I/O is the foundation of modern servers: Nginx, Node.js, Redis — every high-performance server.
- Evolution: select → poll → epoll/kqueue → io_uring.
- Reactor vs Proactor: event readiness notification vs operation-completion notification.
- io_uring: the Linux 5.1+ revolution. More efficient than epoll, and truly async.
- async/await: the compiler rewrites your code as a state machine.
1. Evolution of I/O Models
1.1 Five I/O Models
POSIX classification:
- Blocking I/O: wait until the operation completes.
- Non-blocking I/O: return immediately, poll for completion.
- I/O Multiplexing: select/poll/epoll.
- Signal-driven I/O: notified via signals.
- Asynchronous I/O: true async (POSIX AIO, io_uring).
1.2 Blocking I/O — The Simplest
data = sock.recv(1024) # blocks until data arrives
process(data)
Problems:
- One thread = one connection.
- 10K connections = 10K threads (memory blow-up).
- The C10K problem.
1.3 Multi-Threaded as the Fix?
def handle_client(sock):
    data = sock.recv(1024)
    process(data)

while True:
    client, _ = server.accept()   # accept() returns (conn, addr)
    threading.Thread(target=handle_client, args=(client,)).start()
Limits:
- Per-thread memory (1–8 MB).
- Context-switching cost.
- 10K threads = death.
1.4 Non-blocking I/O
sock.setblocking(False)
try:
    data = sock.recv(1024)
except BlockingIOError:
    pass  # no data — do something else
Problem: infinite-loop polling pins CPU at 100%.
1.5 Enter I/O Multiplexing
Idea: one syscall watching many fds at once.
ready, _, _ = select([sock1, sock2, sock3], [], [])
for sock in ready:
    data = sock.recv(1024)
A single thread can now handle thousands of connections.
2. select and poll
2.1 select (1983)
fd_set readfds;
FD_ZERO(&readfds);
FD_SET(sock, &readfds);
int n = select(max_fd + 1, &readfds, NULL, NULL, &timeout);
if (FD_ISSET(sock, &readfds)) {
    // readable
}
Problems:
- FD_SETSIZE limit (typically 1024).
- O(n) scan: every call inspects every fd.
- fd_set copy: user space ↔ kernel on each call.
2.2 poll (1986)
struct pollfd fds[1024];
fds[0].fd = sock;
fds[0].events = POLLIN;
int n = poll(fds, 1024, timeout);
if (fds[0].revents & POLLIN) {
    // readable
}
Improvements:
- No hard fd-count limit.
- Dynamic array.
Still problematic:
- O(n) scan.
- The fd list is passed on every call.
2.3 Limits of select/poll
10,000 connections with only 100 active:
- select/poll scan all 10,000 every time.
- Wasted CPU.
A new mechanism was needed.
3. epoll — The Linux Revolution
3.1 Arrival (2002, Linux 2.5.44)
int epfd = epoll_create1(0);
struct epoll_event ev;
ev.events = EPOLLIN;
ev.data.fd = sock;
epoll_ctl(epfd, EPOLL_CTL_ADD, sock, &ev);
struct epoll_event events[64];
int n = epoll_wait(epfd, events, 64, -1);
for (int i = 0; i < n; i++) {
    int fd = events[i].data.fd;
    // handle
}
3.2 Why epoll Wins
1. O(1) event detection:
- Kernel returns only ready fds.
- 10K connections + 100 active returns 100.
2. Register once:
- epoll_ctl registers the fd once.
- No need to pass the fd list on every call.
3. Trigger modes:
- Level-Triggered (LT): keep notifying while data is present (default).
- Edge-Triggered (ET): notify only on state change (Nginx).
3.3 ET vs LT
LT (Level-Triggered):
// 16KB available, you read 8KB → next epoll_wait still notifies
Easier to use, slightly slower.
ET (Edge-Triggered):
// Only notified when new data arrives → must drain fully
while (1) {
    ssize_t n = recv(fd, buf, sizeof(buf), 0);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) break; // fully drained
    if (n <= 0) break; // real error or peer closed
    process(buf, n);
}
Faster, but easier to lose events if the drain loop is mishandled.
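For reference, ET is selected when the fd is registered, and the fd must be non-blocking for the drain loop above to terminate correctly. A minimal sketch, reusing epfd and sock from 3.1:
int flags = fcntl(sock, F_GETFL, 0);
fcntl(sock, F_SETFL, flags | O_NONBLOCK); // ET is only safe with a non-blocking fd
struct epoll_event ev;
ev.events = EPOLLIN | EPOLLET; // edge-triggered read readiness
ev.data.fd = sock;
epoll_ctl(epfd, EPOLL_CTL_ADD, sock, &ev);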
3.4 Systems Built on epoll
- Nginx — core.
- Node.js (via libuv).
- Redis.
- HAProxy.
- Memcached.
- Nearly every high-performance Linux server.
3.5 Equivalents on Other OSes
| OS | API |
|---|---|
| Linux | epoll |
| macOS/BSD | kqueue |
| Windows | IOCP |
| Solaris | /dev/poll (deprecated), event ports |
4. kqueue (BSD/macOS)
4.1 More Powerful Than epoll
int kq = kqueue();
struct kevent change;
EV_SET(&change, sock, EVFILT_READ, EV_ADD | EV_ENABLE, 0, 0, NULL);
kevent(kq, &change, 1, NULL, 0, NULL);
struct kevent events[64];
int n = kevent(kq, NULL, 0, events, 64, NULL);
Richer than epoll:
- File changes (EVFILT_VNODE).
- Signals (EVFILT_SIGNAL).
- Timers (EVFILT_TIMER).
- Process exit (EVFILT_PROC).
- Disk I/O, and more.
Downside: smaller user base than epoll.
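To give a taste of those extra filters, here is a minimal sketch of a one-shot timer built on EVFILT_TIMER (BSD/macOS only; error handling omitted):
#include <sys/event.h>
#include <stdio.h>

int main(void) {
    int kq = kqueue();

    // Arm a one-shot timer: ident 1, fires once after 1000 ms.
    struct kevent change;
    EV_SET(&change, 1, EVFILT_TIMER, EV_ADD | EV_ONESHOT, 0, 1000, NULL);
    kevent(kq, &change, 1, NULL, 0, NULL);

    // Block until the timer fires, then report it.
    struct kevent event;
    if (kevent(kq, NULL, 0, &event, 1, NULL) > 0 && event.filter == EVFILT_TIMER)
        printf("timer %lu fired\n", (unsigned long)event.ident);
    return 0;
}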
5. io_uring — The New Revolution
5.1 Limits of epoll
Even epoll has limits:
- Synchronous I/O: you still call read/write after epoll_wait.
- Syscall cost: user space ↔ kernel on every call.
- Not truly async: regular-file (disk) reads and writes can still block, and epoll cannot help with them.
5.2 io_uring (2019, Linux 5.1)
Built by Jens Axboe (Linux kernel developer). True async I/O.
Core ideas:
- Two ring buffers (Submission, Completion).
- Shared memory between user space and kernel.
- Near-zero syscalls.
[User Space] [Kernel Space]
[Submission Queue (SQ)] ────→ [Worker]
↓
[Completion Queue (CQ)] ←──── [I/O complete]
5.3 Example Usage
struct io_uring ring;
io_uring_queue_init(256, &ring, 0);
// Submit work
struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);
io_uring_submit(&ring);
// Wait for completion
struct io_uring_cqe *cqe;
io_uring_wait_cqe(&ring, &cqe);
// cqe->res = bytes read
io_uring_cqe_seen(&ring, cqe);
5.4 Advantages of io_uring
1. Truly async:
- Disk I/O no longer blocks.
- Unified model for network and disk.
2. Fewer syscalls:
- Batch submission (see the sketch after this list).
- Some modes allow zero syscalls.
3. Many operations:
- read, write, send, recv.
- accept, connect.
- fsync, fallocate.
- splice, tee.
- statx, openat, close.
4. Faster:
- 30–50% improvement over epoll on certain workloads.
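A minimal liburing sketch of that batching, assuming fd is an already-open file descriptor (read_three_blocks is just an illustrative wrapper): three reads are queued locally, then a single io_uring_submit() hands them all to the kernel.
#include <liburing.h>
#include <stdio.h>

void read_three_blocks(int fd) {
    struct io_uring ring;
    io_uring_queue_init(8, &ring, 0);

    // Queue three reads into the SQ ring; no syscall happens yet.
    static char buf[3][4096];
    for (int i = 0; i < 3; i++) {
        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        io_uring_prep_read(sqe, fd, buf[i], sizeof(buf[i]), i * 4096);
        io_uring_sqe_set_data(sqe, (void *)(long)i); // tag to match completions later
    }

    io_uring_submit(&ring); // one syscall submits all three

    // Reap completions in whatever order they finish.
    for (int i = 0; i < 3; i++) {
        struct io_uring_cqe *cqe;
        io_uring_wait_cqe(&ring, &cqe);
        printf("read #%ld done: %d bytes\n", (long)io_uring_cqe_get_data(cqe), cqe->res);
        io_uring_cqe_seen(&ring, cqe);
    }
    io_uring_queue_exit(&ring);
}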
5.5 io_uring in the Wild
- Nginx 1.21+: optional support.
- PostgreSQL 17: partial adoption.
- Tokio (Rust): io-uring backend.
- ScyllaDB: core technology.
- Cloud Hypervisor.
5.6 Future of io_uring
- eBPF integration.
- Direct NVMe access.
- GPU/FPGA I/O.
- All syscalls async.
io_uring is the future of Linux I/O.
6. Reactor vs Proactor
6.1 Reactor Pattern
[Event Loop]
↓ epoll_wait
[Event: fd=5 readable]
↓
[Handler]
↓
[read(5, ...)] ← the user calls read directly
Characteristics:
- Event-readiness notification.
- The user calls read/write.
- Built on epoll, kqueue.
Examples: Nginx, Node.js, Redis.
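A stripped-down sketch of that loop; it assumes each fd was registered with epoll_ctl and a pointer to its handler struct stored in ev.data.ptr (struct handler and reactor_loop are illustrative names, not any server's actual structures):
// Reactor core: epoll reports *readiness*; the handler performs the read itself.
struct handler {
    int fd;
    void (*on_readable)(int fd); // e.g. recv() the data and process it
};

void reactor_loop(int epfd) {
    struct epoll_event events[64];
    for (;;) {
        int n = epoll_wait(epfd, events, 64, -1); // block until some fd is ready
        for (int i = 0; i < n; i++) {
            struct handler *h = events[i].data.ptr; // handler registered for this fd
            h->on_readable(h->fd); // the handler calls read()/recv()
        }
    }
}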
6.2 Proactor Pattern
[User] → "read 1024 bytes from this fd"
↓
[Kernel] → async work
↓
[Completion: data ready]
↓
[Handler] → user receives data
Characteristics:
- Delegate the operation itself to the kernel.
- Notified on completion.
- Built on IOCP, io_uring.
Examples: Boost.ASIO, Windows IOCP, io_uring.
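The same flow expressed with io_uring (a sketch; ring, client_fd, and buf are assumed to be set up as in 5.3):
// Proactor-style: the kernel performs the recv; the completion hands the data back.
struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
io_uring_prep_recv(sqe, client_fd, buf, sizeof(buf), 0);
io_uring_submit(&ring);

struct io_uring_cqe *cqe;
io_uring_wait_cqe(&ring, &cqe); // fires only when the recv has *completed*
int nread = cqe->res; // the data is already in buf; no separate read() call
io_uring_cqe_seen(&ring, cqe);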
6.3 Comparison
| | Reactor | Proactor |
|---|---|---|
| Event | "ready" | "completed" |
| Read | user calls | kernel handles |
| OS support | broad (epoll, kqueue) | limited (io_uring, IOCP) |
| Abstraction | simple | complex |
| Performance | good | better (in theory) |
6.4 ASIO's Unified Model
Boost.ASIO (C++) exposes Reactor and Proactor through the same interface:
async_read(socket, buffer, [](error_code ec, size_t bytes) {
    // called on completion
});
Internally: epoll on Linux, IOCP on Windows, io_uring on recent builds.
7. async/await Internals
7.1 What async/await Actually Does
async def fetch_data():
data = await http_request("https://api.example.com")
return process(data)
There is no magic. The compiler rewrites this as a state machine.
7.2 The Rewrite (Pseudo-code)
class FetchDataStateMachine:
    state = 0

    def resume(self):
        if self.state == 0:
            self.future = http_request("https://api.example.com")
            self.future.set_callback(self.resume)
            self.state = 1
            return  # suspend
        if self.state == 1:
            data = self.future.result()
            return process(data)
Core idea:
- Suspend the function at each await.
- Resume when the result is ready.
- Yield the stack back to the event loop.
7.3 The Event Loop
loop = asyncio.get_event_loop()
while True:
    # 1. Run tasks on the ready queue
    while ready_queue:
        task = ready_queue.pop()
        task.run()
    # 2. Wait for I/O events via epoll_wait
    events = epoll.wait(timeout)
    # 3. Move waiters back onto the ready queue
    for event in events:
        task = pending[event.fd]
        ready_queue.append(task)
7.4 Python asyncio
import asyncio
async def main():
    # Concurrent execution
    results = await asyncio.gather(
        fetch_user(1),
        fetch_user(2),
        fetch_user(3),
    )
    return results
asyncio.run(main())
Internals:
- The selectors module (abstracts epoll/kqueue/select).
- Event loop.
- Tasks and Futures.
- Coroutines.
7.5 JavaScript (Node.js)
async function main() {
  const data = await fetch("https://api.example.com")
  const json = await data.json()
  console.log(json)
}
Internals:
- libuv (C library).
- Event loop.
- epoll/kqueue/IOCP.
7.6 Rust Tokio
#[tokio::main]
async fn main() {
    let data = fetch("https://api.example.com").await;
    println!("{:?}", data);
}
Internals:
- mio (abstracts epoll/kqueue).
- Optional io-uring.
- M:N scheduling across multiple OS threads.
- Work-stealing.
8. Single-Threaded vs Multi-Threaded
8.1 Single-Threaded (Node.js, Redis, Nginx)
[1 thread]
↓
[Event Loop] (epoll)
↓
[Callback 1] [Callback 2] [Callback 3]
Pros:
- No locks (no race conditions).
- Simple.
- Memory-efficient.
Cons:
- CPU work blocks the loop.
- Only one core is used.
8.2 Multi-Threaded + Event Loops
[N threads (one per CPU core)]
↓
[Event Loop per thread]
↓
[Work-stealing]
Examples: Tokio (Rust), Akka (JVM), Kotlin Coroutines.
Pros:
- Multi-core utilization.
- Hundreds of thousands of concurrent tasks in a single process.
Cons:
- Synchronization required.
- Harder to debug.
8.3 SO_REUSEPORT
Linux 3.9+ feature:
setsockopt(sock, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));
Multiple processes can listen on the same port:
- The kernel load-balances automatically.
- Each process gets its own epoll instance.
- Used by Nginx and HAProxy.
[Port 80]
├─ [Worker 1] (epoll)
├─ [Worker 2] (epoll)
└─ [Worker 3] (epoll)
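A minimal sketch of what each worker runs (error checks omitted); the key detail is that SO_REUSEPORT must be set before bind():
int sock = socket(AF_INET, SOCK_STREAM, 0);
int one = 1;
setsockopt(sock, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)); // before bind()

struct sockaddr_in addr = {0};
addr.sin_family = AF_INET;
addr.sin_addr.s_addr = htonl(INADDR_ANY);
addr.sin_port = htons(80);
bind(sock, (struct sockaddr *)&addr, sizeof(addr));
listen(sock, SOMAXCONN);
// Every worker that runs this gets its own listening socket on port 80;
// the kernel spreads incoming connections across all of them.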
9. Performance Comparison
9.1 Handling 10K Connections
| Model | Memory | Throughput |
|---|---|---|
| Thread per connection | ~10 GB | low |
| select | low | very low |
| poll | low | low |
| epoll | low | high |
| io_uring | low | very high |
9.2 Real Benchmarks
HTTP server (10K connections):
Apache (prefork): 5,000 req/s
Apache (worker): 20,000 req/s
Nginx (epoll): 100,000 req/s
Nginx (io_uring): 150,000 req/s
ScyllaDB (io_uring):
- 10x+ throughput over Cassandra.
- 1M+ ops/s on a single node.
9.3 NewSQL and Storage
ScyllaDB's secret sauce:
- io_uring + DPDK.
- One shard per core.
- Lock-free.
- Userspace TCP stack.
10. In Practice — Building a High-Performance Server
10.1 Python (asyncio)
import asyncio
async def handle_client(reader, writer):
    while True:
        data = await reader.read(1024)
        if not data:
            break
        writer.write(data)
        await writer.drain()
    writer.close()

async def main():
    server = await asyncio.start_server(handle_client, '0.0.0.0', 8080)
    async with server:
        await server.serve_forever()
asyncio.run(main())
Internally uses selectors → epoll.
10.2 Rust (Tokio)
use tokio::net::TcpListener;
use tokio::io::{AsyncReadExt, AsyncWriteExt};
#[tokio::main]
async fn main() {
    let listener = TcpListener::bind("0.0.0.0:8080").await.unwrap();
    loop {
        let (mut socket, _) = listener.accept().await.unwrap();
        tokio::spawn(async move {
            let mut buf = [0; 1024];
            loop {
                let n = socket.read(&mut buf).await.unwrap();
                if n == 0 { return }
                socket.write_all(&buf[..n]).await.unwrap();
            }
        });
    }
}
Tokio handles:
- mio (epoll).
- M:N scheduling.
- Work-stealing.
10.3 Go
package main

import "net"

func main() {
    listener, _ := net.Listen("tcp", ":8080")
    for {
        conn, _ := listener.Accept()
        go handle(conn)
    }
}

func handle(conn net.Conn) {
    buf := make([]byte, 1024)
    for {
        n, err := conn.Read(buf)
        if err != nil { return }
        conn.Write(buf[:n])
    }
}
The Go runtime handles:
- netpoll (epoll/kqueue).
- Goroutine scheduling.
- Channel integration.
10.4 Comparison
| | Python asyncio | Rust Tokio | Go |
|---|---|---|---|
| Syntax | async/await | async/await | go func() |
| Performance | moderate | top | very good |
| Learning curve | easy | steep | easy |
| Memory | moderate | very low | low |
| Best for | fast prototyping | maximum performance | balance |
Quiz
1. What is the core difference between select and epoll?
Answer: select scans every fd on every call (O(n)), is limited by FD_SETSIZE (typically 1024), and copies the fd_set between user and kernel on every call. epoll returns only fds with events (O(1)), has no fd-count limit, and registers fds once via epoll_ctl. With 10K connections and 100 active: select scans 10K every time, epoll returns just 100. That is why Nginx, Node.js, and Redis are all built on epoll.
2. Why is io_uring better than epoll?
Answer: epoll still has limits — synchronous read/write (you still call them after epoll_wait), syscall cost, and disk I/O that can still block. io_uring: (1) two ring buffers (SQ/CQ) plus shared memory, (2) near-zero syscalls (batch submission), (3) truly async (disk I/O no longer blocks), (4) supports not just read/write but also accept, connect, fsync, and more. 30–50% faster. Used by ScyllaDB and PostgreSQL 17. The future of Linux I/O.
3. What is the difference between the Reactor and Proactor patterns?
Answer: Reactor (epoll-based): the event loop signals "fd is ready" and the user calls read. Proactor (io_uring / IOCP-based): the user requests "read this data for me", the kernel performs the work, and signals on completion. Reactor makes the user do the work; Proactor makes the kernel do it. Boost.ASIO abstracts both patterns behind the same interface.
4. How does async/await actually work?
Answer: No magic. The compiler rewrites the function as a state machine. At each await the function suspends and registers a callback on the future/promise. When the result is ready, it resumes. The event loop (1) runs tasks on the ready queue, (2) waits for I/O via epoll_wait, (3) moves completed tasks back onto the ready queue. Python asyncio, JS/Node.js, and Rust Tokio all follow this pattern; Go reaches the same result with stackful goroutines scheduled by its runtime rather than with compiler-generated state machines.
5. What is SO_REUSEPORT and how is it used?
Answer: A Linux 3.9+ feature. Multiple processes can listen on the same port simultaneously. The kernel distributes new connections automatically. Each process has its own epoll instance, giving you true lock-free parallelism. Used by Nginx and HAProxy — each worker process listens on the same port, and the kernel balances between them. It sidesteps single-process limits. One worker per CPU core is the common pattern.
References
- The C10K Problem — Dan Kegel
- epoll man page
- Introduction to io_uring — Jens Axboe
- tokio — Rust async runtime
- Boost.ASIO
- libuv — the foundation of Node.js
- Nginx Event Loop
- What is io_uring?
- ScyllaDB — powered by io_uring
- The Reactor Pattern
- Python asyncio Internals