Complete Guide to Async I/O Models 2025: epoll, io_uring, Reactor/Proactor, and async/await Internals
TL;DR
- Async I/O is the foundation of modern servers: Nginx, Node.js, Redis — every high-performance server.
- Evolution: select → poll → epoll/kqueue → io_uring.
- Reactor vs Proactor: event readiness notification vs operation-completion notification.
- io_uring: the Linux 5.1+ revolution. More efficient than epoll, and truly async.
- async/await: the compiler rewrites your code as a state machine.
1. Evolution of I/O Models
1.1 Five I/O Models
POSIX classification:
- Blocking I/O: wait until the operation completes.
- Non-blocking I/O: return immediately, poll for completion.
- I/O Multiplexing: select/poll/epoll.
- Signal-driven I/O: notified via signals.
- Asynchronous I/O: true async (POSIX AIO, io_uring).
1.2 Blocking I/O — The Simplest
data = sock.recv(1024) # blocks until data arrives
process(data)
Problems:
- One thread = one connection.
- 10K connections = 10K threads (memory blow-up).
- The C10K problem.
1.3 Multi-Threaded as the Fix?
import threading

def handle_client(sock):
    data = sock.recv(1024)
    process(data)

while True:
    client = server.accept()
    threading.Thread(target=handle_client, args=(client,)).start()
Limits:
- Per-thread memory (1–8 MB).
- Context-switching cost.
- 10K threads = death.
1.4 Non-blocking I/O
sock.setblocking(False)
try:
    data = sock.recv(1024)
except BlockingIOError:
    pass  # no data yet; do something else
Problem: infinite-loop polling pins CPU at 100%.
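The same idea in C, as a minimal sketch (fd is assumed to be an already-connected socket): set O_NONBLOCK once, then treat EAGAIN/EWOULDBLOCK as "no data yet".
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Sketch: one non-blocking read attempt on a connected socket. */
ssize_t try_recv(int fd, char *buf, size_t len) {
    int flags = fcntl(fd, F_GETFL, 0);
    fcntl(fd, F_SETFL, flags | O_NONBLOCK);  /* opt in once */
    ssize_t n = recv(fd, buf, len, 0);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        /* no data yet: the caller should do other work, not spin */
    }
    return n;
}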
1.5 Enter I/O Multiplexing
Idea: one syscall watching many fds at once.
from select import select

ready, _, _ = select([sock1, sock2, sock3], [], [])
for sock in ready:
    data = sock.recv(1024)
A single thread can now handle thousands of connections.
2. select and poll
2.1 select (1983)
fd_set readfds;
FD_ZERO(&readfds);
FD_SET(sock, &readfds);
int n = select(max_fd + 1, &readfds, NULL, NULL, &timeout);
if (FD_ISSET(sock, &readfds)) {
    // readable
}
Problems:
- FD_SETSIZE limit (typically 1024).
- O(n) scan: every call inspects every fd.
- fd_set copy: user space ↔ kernel on each call.
2.2 poll (1986)
struct pollfd fds[1024];
fds[0].fd = sock;
fds[0].events = POLLIN;
int n = poll(fds, 1024, timeout);
if (fds[0].revents & POLLIN) {
    // readable
}
Improvements:
- No hard fd-count limit.
- Dynamic array.
Still problematic:
- O(n) scan.
- The fd list is passed on every call.
2.3 Limits of select/poll
10,000 connections with only 100 active:
- select/poll scan all 10,000 every time.
- Wasted CPU.
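The cost shows up in the calling code itself. A minimal poll() sketch (handle_read is a hypothetical handler): even when only a handful of fds are active, every slot must be scanned for revents.
#include <poll.h>

void handle_read(int fd);  /* hypothetical handler */

/* Sketch: one poll() round. The revents scan walks all nfds
 * entries per call -- the O(n) cost described above. */
void poll_once(struct pollfd *fds, nfds_t nfds, int timeout) {
    int n = poll(fds, nfds, timeout);
    for (nfds_t i = 0; i < nfds && n > 0; i++) {
        if (fds[i].revents & POLLIN) {
            handle_read(fds[i].fd);
            n--;
        }
    }
}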
A new mechanism was needed.
3. epoll — The Linux Revolution
3.1 Arrival (2002, Linux 2.5.44)
int epfd = epoll_create1(0);
struct epoll_event ev;
ev.events = EPOLLIN;
ev.data.fd = sock;
epoll_ctl(epfd, EPOLL_CTL_ADD, sock, &ev);
struct epoll_event events[64];
int n = epoll_wait(epfd, events, 64, -1);
for (int i = 0; i < n; i++) {
    int fd = events[i].data.fd;
    // handle
}
3.2 Why epoll Wins
1. O(1) event detection:
- The kernel returns only the fds that are ready.
- With 10K connections and 100 active, epoll_wait hands back just those 100.
2. Register once:
- epoll_ctl registers the fd once.
- No need to pass the fd list on every call.
3. Trigger modes:
- Level-Triggered (LT): keep notifying while data is present (default).
- Edge-Triggered (ET): notify only on state change (Nginx).
3.3 ET vs LT
LT (Level-Triggered):
// 16KB available, you read 8KB → next epoll_wait still notifies
Easier to use, slightly slower.
ET (Edge-Triggered):
// Only notified when new data arrives → must drain fully
for (;;) {
    ssize_t n = recv(fd, buf, sizeof(buf), 0);
    if (n < 0 && errno == EAGAIN) break;  // fully drained
    if (n <= 0) break;                    // error or peer closed
    process(buf, n);
}
Faster, but easy to lose events if you don't drain the fd completely.
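What ET demands in practice, as a small sketch (epfd and fd are assumed to already exist): the fd must be non-blocking so the drain loop above can hit EAGAIN, and EPOLLET is OR'ed into the event mask.
#include <fcntl.h>
#include <sys/epoll.h>

/* Sketch: register fd for edge-triggered reads. ET is only safe
 * on non-blocking fds, since the handler must drain until EAGAIN. */
void register_et(int epfd, int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    fcntl(fd, F_SETFL, flags | O_NONBLOCK);  /* mandatory for ET */

    struct epoll_event ev;
    ev.events = EPOLLIN | EPOLLET;           /* edge-triggered */
    ev.data.fd = fd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}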
3.4 Systems Built on epoll
- Nginx — core.
- Node.js (via libuv).
- Redis.
- HAProxy.
- Memcached.
- Nearly every high-performance Linux server.
3.5 Equivalents on Other OSes
| OS | API |
|---|---|
| Linux | epoll |
| macOS/BSD | kqueue |
| Windows | IOCP |
| Solaris | /dev/poll (deprecated), event ports |
4. kqueue (BSD/macOS)
4.1 More Powerful Than epoll
int kq = kqueue();
struct kevent change;
EV_SET(&change, sock, EVFILT_READ, EV_ADD | EV_ENABLE, 0, 0, NULL);
kevent(kq, &change, 1, NULL, 0, NULL);
struct kevent events[64];
int n = kevent(kq, NULL, 0, events, 64, NULL);
Richer than epoll:
- File changes (EVFILT_VNODE).
- Signals (EVFILT_SIGNAL).
- Timers (EVFILT_TIMER).
- Process exit (EVFILT_PROC).
- Disk I/O, and more.
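As one illustration of that breadth, a repeating timer is just another registration. A sketch (kq as returned by kqueue(); for EVFILT_TIMER the data field defaults to milliseconds):
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>

/* Sketch: arm a repeating timer on an existing kqueue.
 * Timers are just another filter -- no separate timer API. */
void add_timer(int kq, int timer_id, int interval_ms) {
    struct kevent timer;
    EV_SET(&timer, timer_id, EVFILT_TIMER, EV_ADD | EV_ENABLE,
           0, interval_ms, NULL);
    kevent(kq, &timer, 1, NULL, 0, NULL);
}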
Downside: smaller user base than epoll.
5. io_uring — The New Revolution
5.1 Limits of epoll
Even epoll has limits:
- Synchronous I/O: you still call read/write after epoll_wait.
- Syscall cost: user space ↔ kernel on every call.
- Not truly async: read/write themselves can still block.
5.2 io_uring (2019, Linux 5.1)
Built by Jens Axboe (Linux kernel developer). True async I/O.
Core ideas:
- Two ring buffers (Submission, Completion).
- Shared memory between user space and kernel.
- Near-zero syscalls.
[User Space] [Kernel Space]
[Submission Queue (SQ)] ────→ [Worker]
↓
[Completion Queue (CQ)] ←──── [I/O complete]
5.3 Example Usage
struct io_uring ring;
io_uring_queue_init(256, &ring, 0);
// Submit work
struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);
io_uring_submit(&ring);
// Wait for completion
struct io_uring_cqe *cqe;
io_uring_wait_cqe(&ring, &cqe);
// cqe->res = bytes read
io_uring_cqe_seen(&ring, cqe);
5.4 Advantages of io_uring
1. Truly async:
- Disk I/O no longer blocks.
- Unified model for network and disk.
2. Fewer syscalls:
- Batch submission.
- Some modes allow zero syscalls.
3. Many operations:
- read, write, send, recv.
- accept, connect.
- fsync, fallocate.
- splice, tee.
- statx, openat, close.
4. Faster:
- 30–50% improvement over epoll on certain workloads.
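To make "batch submission" concrete, a minimal liburing sketch (the fds and bufs arrays are hypothetical): several reads are queued in user space and cross into the kernel through one io_uring_submit() call.
#include <liburing.h>

/* Sketch: queue one read per fd, then submit the whole batch with
 * a single syscall (or zero, with SQPOLL). Assumes the SQ has room. */
void submit_batch(struct io_uring *ring, int *fds, char (*bufs)[4096], int n) {
    for (int i = 0; i < n; i++) {
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
        io_uring_prep_read(sqe, fds[i], bufs[i], 4096, 0);
        io_uring_sqe_set_data(sqe, (void *)(long)fds[i]);  /* tag the CQE */
    }
    io_uring_submit(ring);  /* all n reads enter the kernel together */
}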
5.5 io_uring in the Wild
- Nginx 1.21+: optional support.
- PostgreSQL 18: optional io_uring-based async I/O.
- Tokio (Rust): io-uring backend.
- ScyllaDB: core technology.
- Cloud Hypervisor.
5.6 Future of io_uring
- eBPF integration.
- Direct NVMe access.
- GPU/FPGA I/O.
- All syscalls async.
io_uring is the future of Linux I/O.
6. Reactor vs Proactor
6.1 Reactor Pattern
[Event Loop]
↓ epoll_wait
[Event: fd=5 readable]
↓
[Handler]
↓
[read(5, ...)] ← the user calls read directly
Characteristics:
- Event-readiness notification.
- The user calls read/write.
- Built on epoll, kqueue.
Examples: Nginx, Node.js, Redis.
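The pattern's core fits in a few lines of C, sketched below (handle_accept and handle_read are hypothetical handlers): epoll reports only readiness; the handlers perform the actual accept/read calls.
#include <sys/epoll.h>

void handle_accept(int epfd, int listen_fd);  /* hypothetical */
void handle_read(int fd);                     /* hypothetical */

/* Sketch of a Reactor core: the loop dispatches readiness events;
 * the I/O itself happens inside the handlers. */
void reactor_loop(int epfd, int listen_fd) {
    struct epoll_event events[64];
    for (;;) {
        int n = epoll_wait(epfd, events, 64, -1);
        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;
            if (fd == listen_fd)
                handle_accept(epfd, listen_fd);
            else
                handle_read(fd);
        }
    }
}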
6.2 Proactor Pattern
[User] → "read 1024 bytes from this fd"
↓
[Kernel] → async work
↓
[Completion: data ready]
↓
[Handler] → user receives data
Characteristics:
- Delegate the operation itself to the kernel.
- Notified on completion.
- Built on IOCP, io_uring.
Examples: Boost.ASIO, Windows IOCP, io_uring.
6.3 Comparison
| | Reactor | Proactor |
|---|---|---|
| Event | "ready" | "completed" |
| Read | user calls | kernel handles |
| OS support | broad (epoll, kqueue) | limited (io_uring, IOCP) |
| Abstraction | simple | complex |
| Performance | good | better (in theory) |
6.4 ASIO's Unified Model
Boost.ASIO (C++) exposes Reactor and Proactor through the same interface:
async_read(socket, buffer, [](error_code ec, size_t bytes) {
    // called on completion
});
Internally: epoll on Linux, IOCP on Windows, io_uring on recent builds.
7. async/await Internals
7.1 What async/await Actually Does
async def fetch_data():
    data = await http_request("https://api.example.com")
    return process(data)
There is no magic. The compiler rewrites this as a state machine.
7.2 The Rewrite (Pseudo-code)
class FetchDataStateMachine:
    state = 0

    def resume(self):
        if self.state == 0:
            self.future = http_request("https://api.example.com")
            self.future.set_callback(self.resume)
            self.state = 1
            return  # suspend
        if self.state == 1:
            data = self.future.result()
            return process(data)
Core idea:
- Suspend the function at each await.
- Resume when the result is ready.
- Yield the stack back to the event loop.
7.3 The Event Loop
loop = asyncio.get_event_loop()
while True:
    # 1. Run tasks on the ready queue
    while ready_queue:
        task = ready_queue.pop()
        task.run()
    # 2. Wait for I/O events via epoll_wait
    events = epoll.wait(timeout)
    # 3. Move waiters back onto the ready queue
    for event in events:
        task = pending[event.fd]
        ready_queue.append(task)
7.4 Python asyncio
import asyncio

async def main():
    # Concurrent execution
    results = await asyncio.gather(
        fetch_user(1),
        fetch_user(2),
        fetch_user(3),
    )
    return results

asyncio.run(main())
Internals:
- The selectors module (abstracts epoll/kqueue/select).
- Event loop.
- Tasks and Futures.
- Coroutines.
7.5 JavaScript (Node.js)
async function main() {
  const data = await fetch("https://api.example.com")
  const json = await data.json()
  console.log(json)
}
Internals:
- libuv (C library).
- Event loop.
- epoll/kqueue/IOCP.
7.6 Rust Tokio
#[tokio::main]
async fn main() {
    let data = fetch("https://api.example.com").await;
    println!("{:?}", data);
}
Internals:
- mio (abstracts epoll/kqueue).
- Optional io-uring.
- M:N scheduling across multiple OS threads.
- Work-stealing.
8. Single-Threaded vs Multi-Threaded
8.1 Single-Threaded (Node.js, Redis, Nginx)
[1 thread]
↓
[Event Loop] (epoll)
↓
[Callback 1] [Callback 2] [Callback 3]
Pros:
- No locks (no race conditions).
- Simple.
- Memory-efficient.
Cons:
- CPU work blocks the loop.
- Only one core is used.
8.2 Multi-Threaded + Event Loops
[N threads (one per CPU core)]
↓
[Event Loop per thread]
↓
[Work-stealing]
Examples: Tokio (Rust), Akka (JVM), Kotlin Coroutines.
Pros:
- Multi-core utilization.
- Hundreds of thousands of concurrent tasks in a single process.
Cons:
- Synchronization required.
- Harder to debug.
8.3 SO_REUSEPORT
Linux 3.9+ feature:
setsockopt(sock, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));
Multiple processes can listen on the same port:
- The kernel load-balances automatically.
- Each process gets its own epoll instance.
- Used by Nginx and HAProxy.
[Port 80]
├─ [Worker 1] (epoll)
├─ [Worker 2] (epoll)
└─ [Worker 3] (epoll)
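A compact sketch of each worker's setup (error handling omitted): every worker process runs the same function, binds the same port, and receives its own accept queue from the kernel.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Sketch: a per-worker listener. With SO_REUSEPORT, N workers can
 * all bind the same port, and the kernel spreads connections across them. */
int reuseport_listener(unsigned short port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int one = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    bind(fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(fd, SOMAXCONN);
    return fd;  /* each worker then adds fd to its own epoll */
}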
9. Performance Comparison
9.1 Handling 10K Connections
| Model | Memory | Throughput |
|---|---|---|
| Thread per connection | ~10 GB | low |
| select | low | very low |
| poll | low | low |
| epoll | low | high |
| io_uring | low | very high |
9.2 Real Benchmarks
HTTP server (10K connections):
Apache (prefork): 5,000 req/s
Apache (worker): 20,000 req/s
Nginx (epoll): 100,000 req/s
Nginx (io_uring): 150,000 req/s
ScyllaDB (io_uring):
- 10x+ throughput over Cassandra.
- 1M+ ops/s on a single node.
9.3 NewSQL and Storage
ScyllaDB's secret sauce:
- io_uring + DPDK.
- One shard per core.
- Lock-free.
- Userspace TCP stack.
10. In Practice — Building a High-Performance Server
10.1 Python (asyncio)
import asyncio

async def handle_client(reader, writer):
    while True:
        data = await reader.read(1024)
        if not data:
            break
        writer.write(data)
        await writer.drain()
    writer.close()

async def main():
    server = await asyncio.start_server(handle_client, '0.0.0.0', 8080)
    async with server:
        await server.serve_forever()

asyncio.run(main())
Internally uses selectors → epoll.
10.2 Rust (Tokio)
use tokio::net::TcpListener;
use tokio::io::{AsyncReadExt, AsyncWriteExt};

#[tokio::main]
async fn main() {
    let listener = TcpListener::bind("0.0.0.0:8080").await.unwrap();
    loop {
        let (mut socket, _) = listener.accept().await.unwrap();
        tokio::spawn(async move {
            let mut buf = [0; 1024];
            loop {
                let n = socket.read(&mut buf).await.unwrap();
                if n == 0 { return }
                socket.write_all(&buf[..n]).await.unwrap();
            }
        });
    }
}
Tokio handles:
- mio (epoll).
- M:N scheduling.
- Work-stealing.
10.3 Go
package main

import "net"

func main() {
    listener, _ := net.Listen("tcp", ":8080")
    for {
        conn, _ := listener.Accept()
        go handle(conn)
    }
}

func handle(conn net.Conn) {
    buf := make([]byte, 1024)
    for {
        n, err := conn.Read(buf)
        if err != nil {
            return
        }
        conn.Write(buf[:n])
    }
}
The Go runtime handles:
- netpoll (epoll/kqueue).
- Goroutine scheduling.
- Channel integration.
10.4 Comparison
| | Python asyncio | Rust Tokio | Go |
|---|---|---|---|
| Syntax | async/await | async/await | go func() |
| Performance | moderate | top | very good |
| Learning curve | easy | steep | easy |
| Memory | moderate | very low | low |
| Best for | fast prototyping | maximum performance | balance |
Quiz
1. What is the core difference between select and epoll?
Answer: select scans every fd on every call (O(n)), is limited by FD_SETSIZE (typically 1024), and copies the fd_set between user and kernel on every call. epoll returns only fds with events (O(1)), has no fd-count limit, and registers fds once via epoll_ctl. With 10K connections and 100 active: select scans 10K every time, epoll returns just 100. That is why Nginx, Node.js, and Redis are all built on epoll.
2. Why is io_uring better than epoll?
Answer: epoll still has limits — synchronous read/write (you still call them after epoll_wait), syscall cost, and disk I/O that can still block. io_uring: (1) two ring buffers (SQ/CQ) plus shared memory, (2) near-zero syscalls (batch submission), (3) truly async (disk I/O no longer blocks), (4) supports not just read/write but also accept, connect, fsync, and more. 30–50% faster on certain workloads. Used by ScyllaDB and PostgreSQL 18. The future of Linux I/O.
3. What is the difference between the Reactor and Proactor patterns?
Answer: Reactor (epoll-based): the event loop signals "fd is ready" and the user calls read. Proactor (io_uring / IOCP-based): the user requests "read this data for me", the kernel performs the work, and signals on completion. Reactor makes the user do the work; Proactor makes the kernel do it. Boost.ASIO abstracts both patterns behind the same interface.
4. How does async/await actually work?
Answer: No magic. The compiler rewrites the function as a state machine. At each await the function suspends and registers a callback on the future/promise. When the result is ready, it resumes. The event loop (1) runs tasks on the ready queue, (2) waits for I/O via epoll_wait, (3) moves completed tasks back onto the ready queue. Python asyncio, JS/Node.js, Rust Tokio, and Go goroutines all follow this pattern.
5. What is SO_REUSEPORT and how is it used?
Answer: A Linux 3.9+ feature. Multiple processes can listen on the same port simultaneously. The kernel distributes new connections automatically. Each process has its own epoll instance, giving you true lock-free parallelism. Used by Nginx and HAProxy — each worker process listens on the same port, and the kernel balances between them. It sidesteps single-process limits. One worker per CPU core is the common pattern.
References
- The C10K Problem — Dan Kegel
- epoll man page
- Introduction to io_uring — Jens Axboe
- tokio — Rust async runtime
- Boost.ASIO
- libuv — the foundation of Node.js
- Nginx Event Loop
- What is io_uring?
- ScyllaDB — powered by io_uring
- The Reactor Pattern
- Python asyncio Internals