Chaos and Order

TL;DR

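Rust's async fn is compiled into a state machine struct at compile time. There is no per-task runtime stack, which is what makes it a zero-cost abstraction.
The core method of the Future trait is poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>. It returns Ready when the result is available; otherwise it makes sure the waker is stored and returns Pending.
A Waker is a callback that tells the runtime "poll this task again". When an I/O event arrives or a timer expires, the waker is called and the runtime reschedules the task.
Tokio has three layers: a reactor (mio, wrapping epoll/kqueue/IOCP), a scheduler (work-stealing), and drivers (I/O, timer). #[tokio::main] sets all of this up.
Work-stealing scheduler: as in Go, each worker has a local run queue. Tokio's distinguishing feature is the LIFO slot, which runs the most recently scheduled task first for better cache locality.
Pin is the type that makes self-referential structs (structs holding pointers into their own fields) safe. The state machines generated from async fn are often self-referential.
Cooperative scheduling: a Tokio task yields only when it awaits. CPU-bound work must yield explicitly with tokio::task::yield_now() or be moved off the workers with spawn_blocking.
Function coloring: async code can call sync functions directly, but sync code cannot await an async fn without a runtime. This is the philosophical price of Rust async.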

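1. Why Rust async is "unusual"

1.1 Two approaches

Concurrency models fall into two broad camps.

Stackful (goroutine style):

Each task has its own stack.
A task can suspend and resume freely in the middle of any function.
Examples: Go goroutines, Erlang processes, Java virtual threads (Loom).

Stackless (state machine style):

Each task is a state machine struct.
Suspend points are fixed at compile time.
There is no per-task runtime stack.
Examples: Rust async, C# async/await, JavaScript async/await, Python asyncio.

Rust chose the latter. The reasons:

It must be usable on embedded targets without a runtime.
Zero-cost: if you do not use it, you pay nothing for it.
No stack is allocated dynamically, so memory usage is deterministic.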

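1.2 What zero-cost abstraction means

Bjarne Stroustrup: "What you don't use, you don't pay for. And further: what you do use, you couldn't hand code any better."

Rust async pushes this principle to the limit:

One async fn compiles to one struct.
Each possible suspend point is a state (variant) of the state machine.
No heap allocation, unless the Future is boxed.
When the optimizer does its job, the machine code is nearly identical to a hand-written C event loop.

1.3 The price

The price of zero-cost is complexity.

Function coloring: sync and async do not mix freely.
Pin complexity: a consequence of self-referential structs.
Runtime dependency: a Future does nothing unless it is polled, so you need an executor such as Tokio.
Harder debugging: stack traces point into state machine internals and are hard to read.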

2. The Future Trait and the State Machine

2.1 Future is simple

The core definition:

pub trait Future {
    type Output;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
}

pub enum Poll<T> {
    Ready(T),
    Pending,
}

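What it means:

A Future is "a computation that yields a value later".
poll() is a request to "make as much progress as you can right now".
Ready(value): finished, and the result is value.
Pending: not finished yet. The Future will signal through the waker from cx.waker() when it is worth polling again.

2.2 async fn creates a Future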

async fn hello() -> u32 {
    42
}

The code above is roughly equivalent to:

fn hello() -> impl Future<Output = u32> {
    async move { 42 }
}

The async move { 42 } block is in turn transformed by the compiler into a state machine struct.

2.3 Anatomy of a state machine

A slightly more complex example:

async fn example() -> u32 {
    let x = read_value().await;   // suspend point 1
    let y = compute(x).await;      // suspend point 2
    x + y
}

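What the compiler generates, conceptually. This is illustrative pseudocode rather than compilable Rust: impl Trait is not allowed in field position, and Pin::new only works here if the inner futures are Unpin: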

enum ExampleState {
    Start,
    Awaiting1 { fut: impl Future<Output = u32> },
    Awaiting2 { x: u32, fut: impl Future<Output = u32> },
    Done,
}

struct Example {
    state: ExampleState,
}

impl Future for Example {
    type Output = u32;
    fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<u32> {
        let this = unsafe { self.get_unchecked_mut() };
        loop {
            match &mut this.state {
                ExampleState::Start => {
                    let fut = read_value();
                    this.state = ExampleState::Awaiting1 { fut };
                }
                ExampleState::Awaiting1 { fut } => {
                    match Pin::new(fut).poll(cx) {
                        Poll::Ready(x) => {
                            let fut2 = compute(x);
                            this.state = ExampleState::Awaiting2 { x, fut: fut2 };
                        }
                        Poll::Pending => return Poll::Pending,
                    }
                }
                ExampleState::Awaiting2 { x, fut } => {
                    match Pin::new(fut).poll(cx) {
                        Poll::Ready(y) => {
                            let result = *x + y;
                            this.state = ExampleState::Done;
                            return Poll::Ready(result);
                        }
                        Poll::Pending => return Poll::Pending,
                    }
                }
                ExampleState::Done => panic!("polled after completion"),
            }
        }
    }
}

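Key observations:

Each suspend point is represented by an enum variant.
Every call to poll() makes "as much progress as possible from the current state".
.await is roughly match poll() { Pending => return Pending, Ready(v) => v }.
A local variable (x) that is needed later is stored as a field of the next state. That is why it survives the suspension.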
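
The conceptual listing above cannot be compiled as written, so here is a minimal hand-written version that can. It is only a sketch of the idea, not what rustc actually emits: read_value and compute are hypothetical leaf futures built from std::future::ready (which is Unpin and completes on the first poll), and run_example is a helper invented for this illustration.

```rust
use std::future::{ready, Future, Ready};
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical leaf futures: `Ready` completes on the first poll and is `Unpin`.
fn read_value() -> Ready<u32> {
    ready(40)
}
fn compute(x: u32) -> Ready<u32> {
    ready(x / 20)
}

// One variant per suspend point, holding whatever must survive the suspension.
enum Example {
    Start,
    Awaiting1 { fut: Ready<u32> },
    Awaiting2 { x: u32, fut: Ready<u32> },
    Done,
}

impl Future for Example {
    type Output = u32;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        // Every field is `Unpin`, so `Example` is `Unpin` and plain `&mut` access is allowed.
        let this: &mut Example = &mut *self;
        loop {
            match this {
                Example::Start => *this = Example::Awaiting1 { fut: read_value() },
                Example::Awaiting1 { fut } => {
                    // This is what `.await` boils down to: poll, and bail out on Pending.
                    let x = match Pin::new(fut).poll(cx) {
                        Poll::Ready(x) => x,
                        Poll::Pending => return Poll::Pending,
                    };
                    // `x` is needed after the next suspend point, so it moves into the next state.
                    *this = Example::Awaiting2 { x, fut: compute(x) };
                }
                Example::Awaiting2 { x, fut } => {
                    let y = match Pin::new(fut).poll(cx) {
                        Poll::Ready(y) => y,
                        Poll::Pending => return Poll::Pending,
                    };
                    let result = *x + y;
                    *this = Example::Done;
                    return Poll::Ready(result);
                }
                Example::Done => panic!("polled after completion"),
            }
        }
    }
}

// A waker that does nothing; enough here because no leaf future ever returns Pending.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls the state machine once and returns its result.
pub fn run_example() -> u32 {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Example::Start;
    match Pin::new(&mut fut).poll(&mut cx) {
        Poll::Ready(value) => value,
        Poll::Pending => unreachable!("the leaf futures are always ready"),
    }
}
```

Calling run_example() walks Start, Awaiting1, Awaiting2 and Done within a single poll, because both leaf futures are ready immediately. With real I/O futures the same poll would return Pending at a suspend point and resume from the stored variant on the next call.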

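2.4 Size of the generated state machine

The size of a state machine is roughly the largest set of local variables that must stay alive across any single suspend point, plus a small state tag.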

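async fn big() {
    let buf = [0u8; 1024 * 1024];  // 1 MB array held across the .await
    do_io().await;
    println!("{}", buf[0]);
}

This Future contains the full 1 MB. Placed on the stack it can overflow the stack, and even on the heap the memory stays allocated for as long as the Future lives. This "async fn size blow-up" is one of the notorious traps of Rust async.

Fix: put large data in a Vec or Box so the Future holds only a pointer, or allocate the Future itself on the heap with Box::pin.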
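
The effect is easy to measure with std::mem::size_of_val. The sketch below uses a 4 KB buffer instead of 1 MB to stay stack-friendly; inline_buffer, heap_buffer and future_sizes are names made up for this illustration.

```rust
use std::future::ready;
use std::mem::size_of_val;

// The array is read after the `.await`, so it has to be stored inside the future.
async fn inline_buffer(fill: u8) -> u8 {
    let buf = [fill; 4096];
    ready(()).await;
    buf[0]
}

// Only the `Vec` handle (pointer, length, capacity) is stored inside the future.
async fn heap_buffer(fill: u8) -> u8 {
    let buf = vec![fill; 4096];
    ready(()).await;
    buf[0]
}

/// Returns (size of the inline version, size of the heap version) in bytes.
pub fn future_sizes() -> (usize, usize) {
    // Calling an async fn only builds the state machine; nothing runs yet.
    (size_of_val(&inline_buffer(1)), size_of_val(&heap_buffer(1)))
}
```

The first number is at least 4096, while the second stays small. Exact sizes depend on the compiler version, so treat them as indicative only.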

3. Waker: the secret of resumption

3.1 Why a Waker is needed

Suppose a Future returns Pending. The runtime knows it has to poll again "later", but when?

Two bad options:

Poll continuously (busy loop): wastes CPU.
Poll every Future periodically: does not scale.

The good option: the Future itself announces "I am ready now". That is what a Waker is for.

3.2 The Waker mechanism

pub struct Waker {
    waker: RawWaker,
}

impl Waker {
    pub fn wake(self);          // consumes self
    pub fn wake_by_ref(&self);  // borrows only
    pub fn clone(&self) -> Waker;
}

Every call to poll() receives a Context, and the waker is obtained from that Context:

// Pseudocode: io_ready(), register_io() and value are placeholders
fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output> {
    if self.io_ready() {
        Poll::Ready(value)
    } else {
        self.waker = Some(cx.waker().clone());
        // Ask the I/O driver to keep this waker until the resource is ready
        self.register_io(cx.waker().clone());
        Poll::Pending
    }
}

Later, when the I/O becomes ready, the driver calls waker.wake() and the runtime schedules the task again.
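
To make the handshake concrete, here is a std-only sketch in the spirit of the classic timer-future example. A background thread plays the role of the I/O driver, and a tiny block_on plays the role of the runtime. SleepOnThread, ThreadWaker and this block_on are hypothetical helpers for illustration; they are not Tokio APIs, and a real runtime would not spend a thread per timer.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread;
use std::time::Duration;

struct Shared {
    done: bool,
    waker: Option<Waker>,
}

/// Completes once a background thread has slept for `dur`.
pub struct SleepOnThread {
    shared: Arc<Mutex<Shared>>,
}

impl SleepOnThread {
    pub fn new(dur: Duration) -> Self {
        let shared = Arc::new(Mutex::new(Shared { done: false, waker: None }));
        let background = Arc::clone(&shared);
        thread::spawn(move || {
            thread::sleep(dur);
            let mut state = background.lock().unwrap();
            state.done = true;
            // The "driver" side: signal that the task is worth polling again.
            if let Some(waker) = state.waker.take() {
                waker.wake();
            }
        });
        Self { shared }
    }
}

impl Future for SleepOnThread {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        let mut state = self.shared.lock().unwrap();
        if state.done {
            Poll::Ready(())
        } else {
            // Store the latest waker before reporting Pending.
            state.waker = Some(cx.waker().clone());
            Poll::Pending
        }
    }
}

/// Waker that unparks the thread running `block_on`.
struct ThreadWaker(thread::Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal executor: poll, then sleep until some waker fires.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => thread::park(), // no busy loop: wait for wake()
        }
    }
}
```

For example, block_on(SleepOnThread::new(Duration::from_millis(20))) returns after roughly 20 ms, and the calling thread stays parked in between instead of spinning.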

3.3 RawWaker and the vtable

Under the hood, a Waker is dynamic dispatch through a vtable.

pub struct RawWaker {
    data: *const (),
    vtable: &'static RawWakerVTable,
}

pub struct RawWakerVTable {
    clone: unsafe fn(*const ()) -> RawWaker,
    wake: unsafe fn(*const ()),
    wake_by_ref: unsafe fn(*const ()),
    drop: unsafe fn(*const ()),
}

Why so complicated? For runtime independence. Tokio, async-std, smol, and embedded runtimes can each supply their own waker implementation, and a Future can call wake() without knowing which runtime it is running on.
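
As an illustration of what a runtime has to provide, the following sketch builds a Waker by hand from a RawWakerVTable. The waker only counts how often it is woken; a real runtime would push a task onto a run queue instead. counting_waker and the four *_raw functions are hypothetical names, and the data pointer is assumed to always come from Arc::into_raw.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{RawWaker, RawWakerVTable, Waker};

static VTABLE: RawWakerVTable =
    RawWakerVTable::new(clone_raw, wake_raw, wake_by_ref_raw, drop_raw);

unsafe fn clone_raw(data: *const ()) -> RawWaker {
    // SAFETY: `data` was produced by `Arc::into_raw` in `counting_waker`.
    unsafe { Arc::increment_strong_count(data as *const AtomicUsize) };
    RawWaker::new(data, &VTABLE)
}

unsafe fn wake_raw(data: *const ()) {
    // `wake` consumes the waker: take ownership of one reference and drop it afterwards.
    // SAFETY: same provenance as above.
    let counter = unsafe { Arc::from_raw(data as *const AtomicUsize) };
    counter.fetch_add(1, Ordering::SeqCst);
}

unsafe fn wake_by_ref_raw(data: *const ()) {
    // `wake_by_ref` only borrows: the reference count is left untouched.
    // SAFETY: the waker is still alive, so the allocation is valid.
    let counter = unsafe { &*(data as *const AtomicUsize) };
    counter.fetch_add(1, Ordering::SeqCst);
}

unsafe fn drop_raw(data: *const ()) {
    // SAFETY: releases the reference owned by this waker.
    drop(unsafe { Arc::from_raw(data as *const AtomicUsize) });
}

/// Builds a waker that increments `counter` every time it is woken.
pub fn counting_waker(counter: Arc<AtomicUsize>) -> Waker {
    let raw = RawWaker::new(Arc::into_raw(counter) as *const (), &VTABLE);
    // SAFETY: the vtable functions uphold the RawWaker contract for this data pointer.
    unsafe { Waker::from_raw(raw) }
}
```

In everyday code the std::task::Wake trait offers the same thing without unsafe. The raw form exists so that runtimes without Arc, or without an allocator at all, can still build wakers.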

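3.4 The cost of a Waker

In most runtimes clone() costs about as much as Arc::clone(): one atomic reference count increment. wake() either:

pushes the task onto a run queue, or
does nothing if the task is already queued.

Tokio also avoids redundant work where it can. For example, when the waker already stored for a resource would wake the same task (Waker::will_wake), the clone and the re-registration can be skipped.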

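4. Pin and self-referential structs

4.1 The problem

A state machine often holds a pointer into itself.

async fn example() {
    let data = [1, 2, 3];
    let ptr = &data[0];  // points into `data`, which is a field of the state machine
    some_future().await;
    println!("{}", *ptr);  // must still be valid after the suspension
}

The generated state machine:

struct Example {
    data: [i32; 3],
    ptr: *const i32,  // points at data[0]
    state: State,
}

What happens if Example is moved in memory? data goes to the new location, but ptr still points to the old one. The result is a dangling pointer, and reading through it is undefined behavior.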

4.2 The role of Pin

Pin is a type-level promise that "this value will never be moved again".

pub struct Pin<P> {
    pointer: P,
}

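Key rules:

You cannot safely get a &mut T out of a Pin<&mut T> unless T: Unpin (the unsafe get_unchecked_mut is the escape hatch).
Through Pin<Box<T>> or Pin<&mut T> you can still modify T, but you cannot move it out.
If T: Unpin, moving is allowed and Pin imposes no restriction (most ordinary types, such as the primitives, are Unpin).

Future::poll takes Pin<&mut Self>, so once a Future has been pinned and polled, safe code can no longer move it, and its self-references stay valid.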
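
The same guarantee can be reproduced by hand. The sketch below builds a small self-referential struct, opts out of Unpin with PhantomPinned, and pins it on the heap. SelfRef and its methods are hypothetical names for illustration; compiler-generated futures do something similar internally without exposing the pointer.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

pub struct SelfRef {
    data: String,
    ptr: *const String, // points at `data` once initialised
    _pin: PhantomPinned, // makes the type !Unpin
}

impl SelfRef {
    /// Allocates on the heap first, then wires up the self-pointer.
    pub fn new(text: &str) -> Pin<Box<SelfRef>> {
        let mut boxed = Box::pin(SelfRef {
            data: text.to_owned(),
            ptr: std::ptr::null(),
            _pin: PhantomPinned,
        });
        let ptr: *const String = &boxed.data;
        // SAFETY: we only assign a field; the value is never moved out of the pin.
        unsafe { boxed.as_mut().get_unchecked_mut().ptr = ptr };
        boxed
    }

    /// Reads `data` through the stored self-pointer.
    pub fn data_via_ptr(self: Pin<&Self>) -> &str {
        // SAFETY: `ptr` points at `self.data`, and pinning guarantees it has not moved.
        unsafe { (&*self.ptr).as_str() }
    }

    /// True while the self-pointer still targets this value's own field.
    pub fn is_consistent(self: Pin<&Self>) -> bool {
        std::ptr::eq(self.ptr, &self.data)
    }
}
```

Moving the Pin<Box<SelfRef>> handle around only moves the pointer to the heap allocation, so value.as_ref().is_consistent() stays true. Safe code has no way to move the SelfRef itself out of the pin.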

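4.3 How to create a Pin

1. Box::pin: pin on the heap.

let fut = Box::pin(my_async_fn());

2. The pin! macro: pin on the stack (Rust 1.68+).

use std::pin::pin;
let fut = pin!(my_async_fn());

3. Automatically, when the runtime spawns: tokio::spawn(fut) moves the future into a heap-allocated task, where it stays pinned for the rest of its life.

4.4 Unpin

Most types are Unpin (it is an auto trait). For an Unpin type, Pin is effectively a no-op:

// u32 is Unpin, so it can be accessed and moved freely even when pinned
let mut x = 5u32;
let mut pinned = Pin::new(&mut x);
*pinned.as_mut().get_mut() = 6;  // OK: get_mut is available because u32: Unpin

A type becomes !Unpin (not Unpin) by containing a !Unpin field, typically the marker PhantomPinned.

The state machine generated from an async fn is !Unpin, because it may be self-referential. That is why it has to be pinned (with Box::pin, with pin!, or by handing it to tokio::spawn) before it can be polled.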

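4.5 The limits of Pin

Pin is not perfect. It has drawn criticism from many people, reportedly including Linus Torvalds:

Complexity: "to understand async Rust you must understand Pin" is a real barrier.
Lack of teaching material: it is very hard for beginners to grasp.
API imposition: many APIs take Pin<&mut Self>, which makes them structurally heavy to implement and call.

Improvements such as the pin-project crate and the pin! macro have helped, but the core complexity remains.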

5. Tokio runtime architecture

5.1 Three-layer structure

┌────────────────────────────────────────────────────────┐
│ User async code (async fn, .await)                     │
├────────────────────────────────────────────────────────┤
│ Scheduler (task, worker, queue, park/unpark)           │
├────────────────────────────────────────────────────────┤
│ Driver (I/O driver = mio, time driver = timer wheel)   │
├────────────────────────────────────────────────────────┤
│ OS (epoll / kqueue / IOCP)                             │
└────────────────────────────────────────────────────────┘

User code: Futures.
Scheduler: decides which task to poll and when.
Driver: detects I/O events and timer expirations, and calls wakers.
OS: the actual I/O multiplexing.

5.2 #[tokio::main]

#[tokio::main]
async fn main() {
    do_something().await;
}

After macro expansion this is roughly:

fn main() {
    tokio::runtime::Builder::new_multi_thread()
        .enable_all()
        .build()
        .unwrap()
        .block_on(async {
            do_something().await;
        })
}

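Key components:

Builder::new_multi_thread(): multi-threaded runtime (by default, one worker per CPU core).
enable_all(): enables the I/O driver and the timer driver.
block_on(future): blocks the calling thread until this Future completes. Normally used once, at the top level of main.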

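5.3 Runtime flavors

Tokio offers two runtime flavors:

1. Multi-threaded (default):

#[tokio::main]
async fn main() { ... }
// or
Builder::new_multi_thread().build()

N workers (by default, the number of CPU cores).
Work-stealing scheduler.
Tasks can migrate between threads, so spawned futures must be Send + 'static.

2. Current-thread (single-threaded):

#[tokio::main(flavor = "current_thread")]
async fn main() { ... }

One thread, one worker.
Tasks never migrate. The future passed to block_on need not be Send, but tokio::spawn still requires Send; use LocalSet with spawn_local for !Send tasks.
Suited to CLI tools, embedded use, and tests.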

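6. Inside the Tokio scheduler

The scheduler has been tuned many times since Tokio 1.0, but the basic principle is work-stealing, much like Go's.

6.1 Worker and Task

// Conceptual sketch, not Tokio's actual definition
struct Worker {
    // Local run queue (capacity 256)
    local: LocalQueue<Task>,

    // LIFO slot (the most recently scheduled task)
    lifo_slot: Option<Task>,

    // Handle to the global run queue
    remote: Arc<Mutex<RemoteQueue<Task>>>,

    // Parker (for putting the thread to sleep)
    park: Parker,
}

Each worker has its own local queue plus a single LIFO slot.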

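6.2 LIFO slot: a cache optimization

The idea behind the LIFO slot: "running the task that was just scheduled gives a high cache hit rate." It is especially effective for message-passing patterns.

// task A spawns task B
tokio::spawn(async {
    // B's state is closely related to A's context
});
// As soon as A awaits, B is taken from the LIFO slot and run while A's data is likely still in cache

But LIFO is bad for fairness. If tasks keep refilling the LIFO slot, the tasks waiting in the queue starve. Tokio therefore skips the LIFO slot after a bounded number of consecutive uses and gives the FIFO queue priority.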
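
The fairness rule can be modeled in a few lines. This is a single-threaded toy, not Tokio's implementation: tasks are plain u32 ids, and MAX_LIFO_POLLS = 3 is an assumed cap chosen for the example.

```rust
use std::collections::VecDeque;

/// Assumed cap on consecutive LIFO-slot picks (illustrative value).
const MAX_LIFO_POLLS: u32 = 3;

pub struct Worker {
    pub lifo_slot: Option<u32>,
    pub local: VecDeque<u32>,
    pub lifo_streak: u32,
}

impl Worker {
    /// Picks the next task id: LIFO slot first, but only a few times in a row.
    pub fn next_task(&mut self) -> Option<u32> {
        if self.lifo_streak < MAX_LIFO_POLLS {
            if let Some(task) = self.lifo_slot.take() {
                self.lifo_streak += 1;
                return Some(task);
            }
        }
        // Either the slot was empty or the streak hit the cap: serve the FIFO queue.
        self.lifo_streak = 0;
        self.local.pop_front().or_else(|| self.lifo_slot.take())
    }
}
```

If the slot is refilled after every pick, the worker serves it three times and then takes the oldest task from the local queue before returning to the slot, so queued tasks cannot starve indefinitely.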

6.3 Run queue selection order

The order in which run_task_loop picks the next task (simplified pseudocode):

loop {
    // 1. LIFO slot (most recently scheduled)
    if let Some(task) = self.lifo_slot.take() {
        task.run();
        continue;
    }

    // 2. Local queue
    if let Some(task) = self.local.pop() {
        task.run();
        continue;
    }

    // 3. Global queue (every GLOBAL_POLL_INTERVAL ticks)
    if self.tick % GLOBAL_POLL_INTERVAL == 0 {
        if let Some(task) = self.remote.pop() {
            task.run();
            continue;
        }
    }

    // 4. Poll the I/O driver
    self.poll_driver();

    // 5. Steal from another worker
    if let Some(task) = self.steal_from_other_worker() {
        task.run();
        continue;
    }

    // 6. Park (go to sleep)
    self.park.park();
}

Similar to Go, with the LIFO slot and periodic polling of the I/O driver added.

6.4 The work-stealing algorithm

fn steal_from(&self, victim: &Worker) -> Option<Task> {
    // 1. Take half of the victim's local queue
    let n = victim.local.len() / 2;
    if n == 0 { return None; }

    let mut stolen = Vec::with_capacity(n);
    for _ in 0..n {
        if let Some(task) = victim.local.steal() {
            stolen.push(task);
        }
    }

    // 2. Run one stolen task now; push the rest onto our own local queue
    let first = stolen.pop()?;
    for task in stolen {
        self.local.push(task);
    }
    Some(first)
}

As in Go, the strategy is to steal half.
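
A runnable, single-threaded model of the steal-half rule is sketched below. It assumes tasks are plain u32 ids held in a VecDeque; a real scheduler uses lock-free queues so that the thief and the victim can run concurrently. Worker and steal_half are hypothetical names.

```rust
use std::collections::VecDeque;

/// Simplified worker: tasks are ids in a FIFO queue.
pub struct Worker {
    pub local: VecDeque<u32>,
}

/// Moves half of the victim's queued tasks to the thief and returns one to run right away.
pub fn steal_half(thief: &mut Worker, victim: &mut Worker) -> Option<u32> {
    let n = victim.local.len() / 2;
    if n == 0 {
        return None; // nothing worth stealing
    }
    // Take the oldest tasks from the front of the victim's queue.
    let mut stolen: VecDeque<u32> = victim.local.drain(..n).collect();
    let first = stolen.pop_front()?;
    thief.local.extend(stolen);
    Some(first)
}
```

With a victim holding tasks 1 to 6 and an idle thief, steal_half returns task 1, leaves 2 and 3 in the thief's queue, and leaves 4, 5 and 6 with the victim. Taking a batch rather than a single task keeps the number of steal attempts low.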

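6.5 The life cycle of an async task

Spawn: tokio::spawn(fut) creates a Task and places it in the current worker's LIFO slot or local queue.
Poll: a worker takes the task and calls poll(), which returns Ready or Pending.
Pending, then wait: the waker is registered somewhere (I/O driver, timer, channel).
Wake: an event occurs, the waker is called, and the task goes back onto a run queue.
Ready: the result is delivered to the caller through the JoinHandle.

6.6 JoinHandle

let handle = tokio::spawn(async { 42 });
let result: i32 = handle.await.unwrap();

JoinHandle is itself a Future. Conceptually:

When the task completes, its result is stored in a shared slot (think of a Mutex<Option<Result>>).
When the JoinHandle is polled, it takes the result if present; otherwise it registers its waker.
The task and the JoinHandle share one reference-counted allocation (think of an Arc): the memory is freed only when both are gone.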

7. I/O driver: mio and epoll

7.1 The role of mio

mio is a platform-independent event notification library:

Linux: epoll
macOS/BSD: kqueue
Windows: IOCP

Tokio's I/O driver runs on top of mio.

7.2 Non-blocking I/O

Core principle: every socket/fd is non-blocking.

let socket = TcpStream::connect("example.com:80").await?;
// What actually happens:
// 1. socket(), fcntl(O_NONBLOCK)
// 2. connect() → EINPROGRESS (returns immediately)
// 3. register interest in WRITE events with epoll
// 4. poll() → Pending
// 5. epoll_wait → writable → waker is called
// 6. poll() again → done

7.3 Registering an I/O resource

pub struct TcpStream {
    io: PollEvented<mio::net::TcpStream>,
}

PollEvented is the key piece. It implements poll_read and poll_write, conceptually like this:

fn poll_read(self: Pin<&mut Self>, cx: &mut Context, buf: &mut [u8])
    -> Poll<io::Result<usize>>
{
    // 1. Try the actual read
    match self.io.read(buf) {
        Ok(n) => Poll::Ready(Ok(n)),
        Err(e) if e.kind() == WouldBlock => {
            // 2. On WouldBlock, register the waker for read readiness and report Pending
            self.register_readiness(cx, Readiness::READ);
            Poll::Pending
        }
        Err(e) => Poll::Ready(Err(e)),
    }
}

7.4 The epoll loop

The main loop of the I/O driver (simplified):

fn turn(&mut self, max_wait: Option<Duration>) {
    let mut events = self.events.borrow_mut();
    self.poll.poll(&mut events, max_wait).unwrap();

    for event in events.iter() {
        let token = event.token();
        let source = self.resources.get(token);
        if event.is_readable() {
            source.wake_readers();
        }
        if event.is_writable() {
            source.wake_writers();
        }
    }
}

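The driver periodically calls epoll_wait, finds the fds that are ready, and calls their wakers, which resumes the tasks.

7.5 When does the I/O driver run?

Tokio's trick: the worker threads call epoll_wait themselves.

Worker loop:
  1. Run tasks from the local queue
  2. When the queue is empty, turn the I/O driver (epoll_wait)
  3. Push the woken tasks onto the run queue
  4. Run again

There is no separate "I/O thread". A worker with nothing to do naturally takes over I/O polling. This design is good for both cache locality and energy efficiency.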

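8. Timer driver: the timer wheel

8.1 What timers need

tokio::time::sleep(Duration): Ready after the given time.
tokio::time::timeout(dur, fut): an error if the Future does not finish within dur.
tokio::time::interval(period): periodic ticks.

All of these need to register and deregister timers internally.

8.2 The simple implementation: a binary heap

The simple approach: put every timer in a min-heap keyed by expiry time, look at the nearest one, and sleep until then.

Insert: O(log n)
Expire check: O(1)
Cancel: O(log n)

Problem: when tens of thousands of connections each carry their own timeout, the heap operations become a bottleneck.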
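
For reference, the heap-based baseline fits in a few lines of std Rust. This is a sketch under simple assumptions: deadlines are plain millisecond counters, timers are identified by a u32 id, and HeapTimers is a made-up name. Cancellation is left out, since an O(log n) cancel needs extra bookkeeping to locate an entry inside the heap.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Timers as (deadline_ms, timer_id); `Reverse` turns the max-heap into a min-heap.
pub struct HeapTimers {
    heap: BinaryHeap<Reverse<(u64, u32)>>,
}

impl HeapTimers {
    pub fn new() -> Self {
        Self { heap: BinaryHeap::new() }
    }

    /// O(log n): sift the new entry into place.
    pub fn insert(&mut self, deadline_ms: u64, id: u32) {
        self.heap.push(Reverse((deadline_ms, id)));
    }

    /// O(1): the earliest deadline sits at the top of the heap.
    pub fn next_deadline(&self) -> Option<u64> {
        self.heap.peek().map(|Reverse((deadline, _))| *deadline)
    }

    /// Pops every timer whose deadline is at or before `now_ms`, earliest first.
    pub fn expire(&mut self, now_ms: u64) -> Vec<u32> {
        let mut fired = Vec::new();
        while let Some(Reverse((deadline, id))) = self.heap.peek().copied() {
            if deadline > now_ms {
                break;
            }
            self.heap.pop();
            fired.push(id);
        }
        fired
    }
}
```

A driver built on this would sleep until next_deadline() and then call expire(now). Every insert and every pop pays the O(log n) cost, which is the overhead the timer wheel is designed to avoid.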

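8.3 Hashed timer wheel

Tokio (like Netty, the Linux kernel, and others) uses a hierarchical timer wheel.

Level 0: 64 slots × 1 ms      (0 to 64 ms)
Level 1: 64 slots × 64 ms     (64 ms to ~4 s)
Level 2: 64 slots × ~4 s      (~4 s to ~4 min)
Level 3: 64 slots × ~4 min    (~4 min to ~4 h)
Level 4: 64 slots × ~4 h      (~4 h to ~12 d)
Level 5: 64 slots × ~12 d     (~12 d to ~2 y)

Insert: O(1) (drop the timer into a slot on the appropriate level).
Expire: O(1) amortized.
Memory: number of levels × number of slots, a constant.

On every tick (1 ms), all timers in the current slot are expired. Each time level 0 completes a full turn, the next slot of level 1 "cascades": its timers are redistributed into the finer-grained slots below.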
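
The level and slot arithmetic behind the table is just bit manipulation, because 64 = 2^6. The sketch below is a simplification: it picks the level from the remaining delay alone, whereas a production wheel also takes the current wheel position into account. The function names are made up for this illustration.

```rust
const SLOT_BITS: u32 = 6; // 64 slots per level
const SLOT_MASK: u64 = (1 << SLOT_BITS) - 1;
const NUM_LEVELS: u32 = 6;

/// Level whose range covers a delay of `delta_ms` milliseconds.
pub fn level_for(delta_ms: u64) -> u32 {
    // OR-ing in the low 6 bits makes every delay below 64 ms land on level 0.
    let highest_bit = 63 - (delta_ms | SLOT_MASK).leading_zeros();
    (highest_bit / SLOT_BITS).min(NUM_LEVELS - 1)
}

/// Slot index on `level` for an absolute deadline in milliseconds.
pub fn slot_for(deadline_ms: u64, level: u32) -> u64 {
    (deadline_ms >> (level * SLOT_BITS)) & SLOT_MASK
}

/// Total range covered by `level`, in milliseconds: 64^(level + 1).
pub fn level_range_ms(level: u32) -> u64 {
    1u64 << ((level + 1) * SLOT_BITS)
}
```

For example, a 63 ms delay lands on level 0 and a 64 ms delay on level 1. Level 4 covers 2^30 ms, a little over 12 days, which is where the "~12 d" figure in the table comes from.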

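8.4 Tokio's implementation

The timer driver (tokio::time::driver::Driver here; the exact module path may differ between Tokio releases):

pub struct Driver {
    wheel: wheel::Wheel<Handle>,
    ...
}

Wheel::insert(): O(1).
Wheel::poll(): O(number of expired timers).

Benchmarks reportedly show it handling even millions of timers efficiently.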

9. spawn vs spawn_blocking vs task::yield_now

9.1 tokio::spawn

let handle = tokio::spawn(async {
    read_file().await;
    compute()  // CPU work
});

Registers an async task with the runtime.
Requires Send + 'static.
If the Future never awaits, it monopolizes its worker.

9.2 The problem: CPU-bound code

tokio::spawn(async {
    for i in 0..1_000_000_000 {
        hash(i);  // plain CPU work
    }
});

There is no .await, so once poll starts it does not return for seconds, and every other task queued on that worker starves.
With N workers, one blocked worker cuts throughput to (N-1)/N.

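Solutions:

1. Yield manually with tokio::task::yield_now():

for i in 0..1_000_000_000 {
    hash(i);
    if i % 1000 == 0 {
        tokio::task::yield_now().await;
    }
}

2. tokio::task::spawn_blocking: isolate the work on the dedicated blocking thread pool.

let result = tokio::task::spawn_blocking(|| {
    // CPU-heavy or blocking work
    expensive_computation()
}).await.unwrap();

Runs on a separate thread pool (up to 512 threads by default).
Does not block the async workers.
The closure must be Send + 'static. It is ordinary synchronous code, so Pin does not come into it.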
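
What a manual yield buys can be shown with a toy executor. The sketch below is std-only and single-threaded; YieldNow, run_round_robin and interleaving are hypothetical stand-ins written for this illustration and are not Tokio's implementations.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Returns Pending exactly once, handing control back to the executor.
pub struct YieldNow {
    yielded: bool,
}

pub fn yield_now() -> YieldNow {
    YieldNow { yielded: false }
}

impl Future for YieldNow {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            return Poll::Ready(());
        }
        self.yielded = true;
        cx.waker().wake_by_ref(); // ask to be polled again soon
        Poll::Pending
    }
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

type Task = Pin<Box<dyn Future<Output = ()>>>;

/// Toy executor: polls tasks round-robin until all of them finish.
pub fn run_round_robin(tasks: Vec<Task>) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut queue: VecDeque<Task> = tasks.into();
    while let Some(mut task) = queue.pop_front() {
        if task.as_mut().poll(&mut cx).is_pending() {
            queue.push_back(task);
        }
    }
}

async fn worker(name: &'static str, cooperative: bool, log: Rc<RefCell<Vec<&'static str>>>) {
    for _ in 0..3 {
        log.borrow_mut().push(name); // stands in for a slice of CPU work
        if cooperative {
            yield_now().await;
        }
    }
}

/// Runs two workers and returns the order in which their work slices executed.
pub fn interleaving(cooperative: bool) -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let a: Task = Box::pin(worker("a", cooperative, Rc::clone(&log)));
    let b: Task = Box::pin(worker("b", cooperative, Rc::clone(&log)));
    run_round_robin(vec![a, b]);
    let order = log.borrow().clone();
    order
}
```

interleaving(false) yields a, a, a, b, b, b: the first task runs to completion before the second gets a turn. interleaving(true) yields a, b, a, b, a, b, because each yield returns Pending and lets the executor pick the next task.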

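9.3 Which one, when?

Short CPU work (< 100 μs): just run it inline.
Long CPU work (> 100 μs): spawn_blocking.
Blocking calls (blocking locks, synchronous file reads, synchronous DB drivers): spawn_blocking.
Splitting one large computation into cooperative pieces: tokio::task::yield_now(), or rayon combined with spawn_blocking.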

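10. The function coloring problem

10.1 The phenomenon

An uncomfortable truth of Rust async: async functions and sync functions do not mix freely.

fn sync_fn() { /* ... */ }
async fn async_fn() { /* ... */ }

// OK
async fn caller1() {
    sync_fn();             // OK: calling sync from async
    async_fn().await;      // OK
}

// Not OK
fn caller2() {
    async_fn();            // only creates a Future; nothing runs
    async_fn().await;      // compile error: .await is not allowed in a sync fn
}

Running async code from sync code requires a runtime.

fn caller3() {
    // Handle::current() panics if no Tokio runtime context is active on this thread
    let rt = tokio::runtime::Handle::current();
    // Works from a plain (non-worker) thread; panics if called from inside async code
    rt.block_on(async_fn());
}

This is "function coloring". Functions come in two "colors" (sync and async), and one color cannot easily call the other.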

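10.2 Bob Nystrom's original observation

In 2015 Bob Nystrom described this problem in his blog post "What Color is Your Function?". JavaScript, C#, Python, and Rust all share the same trap:

Sync functions cannot call async ones (or only with great inconvenience).
Async functions can call sync ones anywhere, but if the sync call blocks, it stalls the async side.
Libraries have to choose between sync and async, which splits the ecosystem.

10.3 How Go avoided it

In Go every function is implicitly async. go func() turns any function into a goroutine, and the scheduler handles blocking calls automatically. There is no function coloring.

The price: the runtime is always there. A Go binary is at least ~2 MB, and Go is not an option for most embedded or kernel code.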

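10.4 Mitigation strategies

1. Provide a sync wrapper:

pub fn sync_do_thing() -> Result<Output> {
    let rt = Runtime::new()?;
    rt.block_on(async_do_thing())
}

2. Follow the ecosystem: most libraries (reqwest, sqlx, aws-sdk, and so on) are async-first. Sync versions are shipped separately, if at all, as with reqwest::blocking.

3. Separate the boundary: keep async code with async code and sync code with sync code, and bridge only at the boundary.

10.5 The danger of `block_on`

What happens if you call block_on inside a worker?

tokio::spawn(async {
    let rt = tokio::runtime::Handle::current();
    rt.block_on(another_async());  // Tokio panics here: block_on inside async code
});

Tokio refuses to do this and panics, because blocking a worker is dangerous. A worker stuck in a blocking wait cannot run any other task, and if the thing it waits for depends on a task queued on that same worker, the result is a deadlock. That deadlock is exactly what you risk with executor-agnostic helpers such as futures::executor::block_on.

Fix: use tokio::task::block_in_place (multi-threaded runtime only). It tells the runtime that the current worker is about to block, so the worker's other tasks are handed off to another thread first.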

11. Channels: communication in the async world

Tokio provides several channel implementations.

11.1 mpsc (Multi-Producer, Single-Consumer)

use tokio::sync::mpsc;

let (tx, mut rx) = mpsc::channel::<String>(32);  // buffer of 32

tokio::spawn(async move {
    for i in 0..10 {
        tx.send(format!("msg {}", i)).await.unwrap();
    }
});

while let Some(msg) = rx.recv().await {
    println!("{}", msg);
}

A bounded channel. send().await waits while the buffer is full.
Producers are cloned with tx.clone().
A single consumer.

11.2 oneshot

let (tx, rx) = tokio::sync::oneshot::channel::<u32>();
tokio::spawn(async move {
    tx.send(42).unwrap();  // send is not async (the value is already available)
});
let value = rx.await.unwrap();

A one-time channel carrying a single value.
A good fit for the request-response pattern.

11.3 broadcast

let (tx, _) = tokio::sync::broadcast::channel::<String>(16);
let mut rx1 = tx.subscribe();
let mut rx2 = tx.subscribe();

tokio::spawn(async move {
    tx.send("hello".to_string()).unwrap();
});

// Every receiver gets the same message
println!("{}", rx1.recv().await.unwrap());
println!("{}", rx2.recv().await.unwrap());

Fan-out: one send is delivered to every receiver.
Each receiver keeps its own position in the buffer.
A slow receiver "lags": it misses the oldest messages, and recv reports the lag as an error.

11.4 watch

let (tx, mut rx) = tokio::sync::watch::channel::<u32>(0);
tokio::spawn(async move {
    tx.send(1).unwrap();
    tx.send(2).unwrap();
});

rx.changed().await.unwrap();
let val = *rx.borrow();  // only the latest value

State observation. Only the latest value is kept (missing intermediate values is acceptable).
Typical use: configuration change notifications.

11.5 select!

Waiting on several Futures at once:

tokio::select! {
    val = rx.recv() => { /* channel receive */ }
    _ = tokio::time::sleep(Duration::from_secs(1)) => { /* timeout */ }
    _ = shutdown.recv() => { /* shutdown signal */ }
}

Runs the branch whose Future becomes Ready first.
The remaining Futures are dropped, so cancel safety matters.

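11.6 Cancellation safety

The trap in select!: the Futures of the branches that were not chosen are dropped. If such a Future holds intermediate state, that state is lost.

let mut buf = vec![0u8; 1024];

tokio::select! {
    res = socket.read_exact(&mut buf) => { /* ... */ }
    _ = timeout => { /* ... */ }
}

Suppose read_exact has already copied 200 bytes into the buffer and is Pending when the timeout fires. The read_exact Future is dropped, and with it the record of how far it got. Those 200 bytes have been consumed from the socket, so the next read continues after them and the stream is effectively corrupted.

A cancel-safe function is one whose API remains correct in this situation. tokio::io::AsyncReadExt::read_exact is not cancel safe. A plain AsyncReadExt::read is: if another branch wins, no data has been read. tokio::sync::mpsc::Receiver::recv is also cancel safe: a message that was not received stays in the channel.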
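
The loss can be reproduced without any I/O. In this std-only sketch a VecDeque stands in for the socket, and ReadExact is a hypothetical read_exact-style future that keeps its progress inside itself. Dropping it after one poll is what select! does to a losing branch.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

type Source = Rc<RefCell<VecDeque<u8>>>;

/// Hypothetical `read_exact`-style future: the bytes read so far live inside the future.
pub struct ReadExact {
    src: Source,
    want: usize,
    got: Vec<u8>,
}

impl Future for ReadExact {
    type Output = Vec<u8>;

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Vec<u8>> {
        let this = &mut *self;
        while this.got.len() < this.want {
            let next = this.src.borrow_mut().pop_front();
            match next {
                Some(byte) => this.got.push(byte),
                // A real future would register the waker with the I/O driver here.
                None => return Poll::Pending,
            }
        }
        Poll::Ready(std::mem::take(&mut this.got))
    }
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Returns (bytes held by the cancelled future, bytes left in the source).
pub fn cancelled_read_loses_bytes() -> (usize, usize) {
    let src: Source = Rc::new(RefCell::new(VecDeque::from(vec![1u8, 2, 3])));
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);

    // The future wants 5 bytes but only 3 are available: it consumes them and stays Pending.
    let mut fut = ReadExact { src: Rc::clone(&src), want: 5, got: Vec::new() };
    let still_pending = Pin::new(&mut fut).poll(&mut cx).is_pending();
    assert!(still_pending);

    let buffered = fut.got.len();
    drop(fut); // cancellation: the 3 consumed bytes vanish with the future
    let remaining = src.borrow().len();
    (buffered, remaining)
}
```

The function returns (3, 0): three bytes were pulled out of the source and none are left for the next reader. A cancel-safe design keeps such progress outside the future, for example in a buffer owned by the caller.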

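12. Performance tuning

12.1 tokio-console

A tool that visualizes the state of the runtime:

tokio = { version = "1", features = ["full", "tracing"] }
console-subscriber = "0.4"

fn main() {
    console_subscriber::init();
    // ...
}

The application must also be built with the tokio_unstable cfg flag (RUSTFLAGS="--cfg tokio_unstable"). The tokio-console terminal UI then shows every running task, how often it is woken, its average poll time, and tasks that block.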

12.2 tracing

#[tracing::instrument]
async fn handle_request(req: Request) -> Response {
    // ...
}

The tracing crate provides structured logging designed with async in mind. Spans stay attached to a task across .await points.

12.3 Flamegraph

cargo flamegraph --bin myapp

Visualizes a CPU profile. Useful for finding out why a particular part of async code is slow.

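12.4 Tuning tips

1. Number of workers:

Builder::new_multi_thread()
    .worker_threads(8)  // default: number of CPU cores
    .build()

For mostly CPU-bound work, match the core count; for mostly I/O-bound work, slightly more is fine.

2. Blocking pool size:

Builder::new_multi_thread()
    .max_blocking_threads(1024)  // default: 512
    .build()

Increase it if you make many spawn_blocking calls.

3. Large Future → Box:

// BAD: the 16 KB buffer is held across .await, so it is stored inline in the future
async fn big_inline() { let buf = [0u8; 16384]; foo().await; drop(buf); }

// GOOD: large data lives on the heap; the future holds only a pointer
async fn big_boxed() { let buf = vec![0u8; 16384]; foo().await; drop(buf); }

4. join_all instead of select!: when waiting for many Futures to all complete:

use futures::future::join_all;
let results = join_all(futs).await;

select! waits for whichever branch finishes first, and a select! loop re-evaluates its branches on every iteration. When you need every result, join_all drives the whole set in one call, with less overhead.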

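12.5 Common anti-patterns

1. Holding a std::sync::Mutex guard across .await:

let data = std::sync::Mutex::new(HashMap::new());
tokio::spawn(async move {
    let lock = data.lock().unwrap();  // blocking lock: the worker thread waits here without yielding
    some_async().await;  // the guard is still held; other tasks that need the lock block their workers
});

On the multi-threaded runtime the compiler usually rejects this outright, because the guard is not Send. Fix: drop the guard before awaiting, or use tokio::sync::Mutex (lock().await) when the lock really must be held across an .await.

2. Long CPU work:

tokio::spawn(async {
    huge_compute();  // blocks the worker on CPU work
});

Fix: spawn_blocking or rayon.

3. A hand-rolled Arc<Mutex<Vec<Task>>> manager: instead of collecting and managing tasks yourself, use JoinSet:

let mut set = tokio::task::JoinSet::new();
set.spawn(task1());
set.spawn(task2());
while let Some(res) = set.join_next().await {
    // ...
}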

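13. Tokio vs Go

Aspect	Rust Tokio	Go
Model	Stackless state machine	Stackful goroutine
Stack	None (struct fields)	Starts at 2 KB, grows dynamically
Creation cost	~ns (struct initialization)	~ns (goroutine allocation)
Size overhead	Size of the Future (from a few bytes to KB, up to MB in bad cases)	2 KB initial stack
Scheduler	Work-stealing + LIFO slot	Work-stealing + LIFO (runnext)
Preemption	Cooperative only	Asynchronous (SIGURG, Go 1.14+)
CPU-bound work	Can stall a worker (must yield)	Preempted automatically
Runtime size	Tens of KB (Tokio)	~2 MB (Go runtime)
Embedded	Yes, via no_std runtimes such as embassy (not Tokio itself)	Practically no
Learning curve	Steep (Pin, lifetimes)	Gentle
Performance	Top tier	Very good (with some runtime overhead)

Go's strengths: simplicity, automatic runtime management, no function coloring. Rust's strengths: zero-cost abstractions, embedded support, guaranteed memory safety, maximum performance.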

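14. The Rust async ecosystem

14.1 Runtimes

Tokio: the de facto standard. The most mature, with the most libraries.
async-std: an API modeled on std. Activity has declined.
smol: small and simple, built from composable components.
embassy: no_std, for embedded. Interrupt-driven drivers.
glommio: an io_uring-based, thread-per-core runtime for high-performance databases and networking.

Most recent projects choose Tokio by default.

14.2 Major libraries

hyper / axum: HTTP servers.
reqwest: HTTP client.
sqlx: async SQL.
sea-orm / diesel-async: ORMs.
tonic: gRPC.
tracing / opentelemetry: observability.
tower: middleware abstraction.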

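14.3 `async fn` in traits

For a long time Rust did not allow async fn directly in traits. The workaround:

#[async_trait]
trait MyTrait {
    async fn do_thing(&self);  // the macro rewrites this to return a boxed dyn Future
}

Downside: a heap allocation and a dynamic dispatch on every call.

async fn in traits was stabilized in Rust 1.75 (December 2023). dyn Trait is still restricted, however, and full support is still in progress as of 2025.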

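15. Recent changes (as of 2025)

15.1 async fn in traits (1.75+)

trait Fetcher {
    async fn fetch(&self, url: &str) -> Result<String>;
}

Supported natively, without the #[async_trait] macro. But dyn still does not work:

let x: Box<dyn Fetcher> = ...;  // ERROR

Alternatives: write the method as returning impl Future instead of async fn (this allows extra bounds such as Send, but is still not dyn-compatible), or return a boxed future (Pin<Box<dyn Future>>) when dyn is required.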
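
The boxed-future workaround looks roughly like this when written by hand. It is a std-only sketch: StaticFetcher is a hypothetical implementation that completes immediately, BoxFuture is a local type alias, fetch_via_dyn polls once with a no-op waker instead of using a real executor, and the error type is simplified to String.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

pub trait Fetcher {
    // Returning a boxed future keeps the trait usable as `dyn Fetcher`.
    fn fetch<'a>(&'a self, url: &'a str) -> BoxFuture<'a, Result<String, String>>;
}

/// Hypothetical implementation that answers immediately.
pub struct StaticFetcher;

impl Fetcher for StaticFetcher {
    fn fetch<'a>(&'a self, url: &'a str) -> BoxFuture<'a, Result<String, String>> {
        Box::pin(async move {
            let body: Result<String, String> = Ok(format!("body of {url}"));
            body
        })
    }
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Calls `fetch` through a trait object and polls the returned future once.
pub fn fetch_via_dyn(url: &str) -> Result<String, String> {
    let fetcher: Box<dyn Fetcher> = Box::new(StaticFetcher);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = fetcher.fetch(url);
    let polled = fut.as_mut().poll(&mut cx);
    match polled {
        Poll::Ready(result) => result,
        Poll::Pending => Err("future was not ready on the first poll".to_owned()),
    }
}
```

The price is one heap allocation and one dynamic dispatch per call, the same trade-off the #[async_trait] macro makes. In exchange, Box<dyn Fetcher> compiles.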

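15.2 RTN (Return Type Notation)

fn use_fetcher<F: Fetcher>(f: F) where F::fetch(..): Send { ... }

Lets you put a bound on the Future type returned by a trait method. Still an unstable, nightly-only feature at the time of writing.

15.3 Polonius / NLL improvements

The Rust compiler's borrow checker keeps getting smarter, which reduces false positives in async code.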

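15.4 `gen` blocks (keyword reserved in the Rust 2024 edition)

let iter = gen {
    yield 1;
    yield 2;
    yield 3;
};

Similar to async, but a synchronous generator, playing the same role as Python's generators. The Rust 2024 edition only reserves the gen keyword; gen blocks themselves are still unstable and require nightly.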
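
On stable Rust the same iterator can be written with std::iter::from_fn, with the generator state kept in a captured variable. This is only a stand-in to show the shape of the idea; one_two_three is a made-up name.

```rust
/// Stable-Rust equivalent of `gen { yield 1; yield 2; yield 3; }`.
pub fn one_two_three() -> impl Iterator<Item = u32> {
    let mut next = 1;
    std::iter::from_fn(move || {
        if next > 3 {
            return None; // generator finished
        }
        let value = next;
        next += 1;
        Some(value)
    })
}
```

The closure plays the role of the generated state machine: each call resumes from the captured state and produces the next value, just as poll resumes a Future.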

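15.5 glommio and the io_uring movement

io_uring-based thread-per-core runtimes are attracting growing interest. In specific, limited scenarios they can be faster than Tokio, but the benefit only shows when all I/O goes through io_uring.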

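16. Real-world scenarios

16.1 "It slows down after the first 100 requests"

Symptom: fast at first, then responses slow down from some point on.

Suspect 1: connection leak. Check TIME_WAIT/CLOSE_WAIT with netstat.

Suspect 2: a contended mutex. Check the wake/idle ratio in tokio-console. If many tasks are piled up on lock().await, you have a hot mutex.

Suspect 3: worker saturation. Check worker thread CPU with top -H. If they are pinned at 100%, find the CPU-bound work and move it to spawn_blocking.

16.2 "The spawn_blocking pool is exhausted"

Symptom: "thread pool is shutting down", or tasks that wait forever.

The default limit is 512 threads. Beyond that, new blocking jobs queue up behind the existing ones. Fixes:

Raise max_blocking_threads.
Replace blocking libraries with async ones (for example rusqlite → sqlx).
Put a bounded work queue in front: instead of calling spawn_blocking without limit, cap the number of blocking jobs in flight at N.

16.3 "CPU is at 20%, so why won't throughput go up?"

The program is probably spending most of its time waiting on I/O. Trace syscalls with strace or perf trace.

Mostly epoll_wait: normal; increase the number of connections.
Many read/write calls returning EAGAIN: expected with non-blocking I/O.
A lot of time in futex: lock contention.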

17. Learning roadmap

Stage 1: Basics

Async Book: the official documentation.
Tokio Tutorial: hands-on.
Run the Tokio examples with cargo run --example.

Stage 2: Internals

Rust for Rustaceans: the book by Jon Gjengset.
Jon Gjengset's YouTube channel: the "Crust of Rust" series, especially "Pinning".
Tokio source code: tokio/src/runtime/scheduler/multi_thread/.

Stage 3: Advanced

Withoutboats' blog (one of the designers of async Rust).
"Zero to Production in Rust": building a production web server.
"Asynchronous Programming in Rust" by Carl Fredrik Samson.

Videos:

dotRust and RustConf talks.
"The What and How of Futures and async/await in Rust" by Jon Gjengset.

18. Summary: one-page cheat sheet

┌────────────────────────────────────────────────────────┐
│                 Rust Tokio Cheat Sheet                 │
├────────────────────────────────────────────────────────┤
│ Future trait:                                          │
│   fn poll(Pin<&mut Self>, &mut Context) -> Poll        │
│     Poll::Ready(value) | Poll::Pending                 │
│                                                        │
│ async fn → State machine struct                        │
│   - suspend point = enum variant                       │
│   - local variable = struct field                      │
│   - .await = match poll { Pending => return; .. }      │
│                                                        │
│ Waker:                                                 │
│   - callback: "poll this task again"                   │
│   - Context::waker() → clone → register                │
│   - I/O, timers, and channels call wake                │
│                                                        │
│ Pin:                                                   │
│   - guarantees "will not move"                         │
│   - makes self-referential Futures safe                │
│   - handled by Box::pin, pin!, and spawn               │
│                                                        │
│ Tokio scheduler:                                       │
│   - Worker + LIFO slot + local runq + global           │
│   - Work-stealing (half)                               │
│   - I/O driver polled periodically                     │
│   - Cooperative only (no automatic preemption)         │
│                                                        │
│ Spawning:                                              │
│   - tokio::spawn(fut): async task                      │
│   - spawn_blocking(f): blocking pool                   │
│   - block_in_place(f): block on the current worker     │
│                                                        │
│ Channels:                                              │
│   - mpsc: buffered, producers → consumer               │
│   - oneshot: one-time                                  │
│   - broadcast: fan-out                                 │
│   - watch: observe the latest value                    │
│                                                        │
│ Watch out:                                             │
│   - don't hold a std::sync::Mutex guard over .await    │
│   - CPU-heavy work: spawn_blocking / yield_now         │
│   - select!: check cancel safety                       │
│   - big Futures: Box them                              │
│                                                        │
│ Debugging:                                             │
│   - tokio-console                                      │
│   - tracing                                            │
│   - cargo flamegraph                                   │
└────────────────────────────────────────────────────────┘

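19. Quiz

Q1. What is an async fn transformed into at compile time?

A. Into a compiler-generated state machine struct. Each .await point becomes a variant of an enum, and local variables that live across it become fields. The type implements the Future trait, and its poll() method is one large match that advances from the current state. There is no per-task runtime stack, hence zero-cost.

Q2. What is the role of a Waker?

A. It is the callback that tells the runtime "poll this task again". When a Future returns Pending, the waker from the current Context is stored; later, when an I/O event arrives or a timer expires, waker.wake() is called. The runtime then puts the task back on a run queue and runs it. Thanks to this mechanism, tasks resume efficiently without busy polling.

Q3. Why is Pin needed?

A. Because the state machines generated from async fn are often self-referential. For example, a reference to a local variable must remain valid after a suspension. If the state machine itself were moved in memory, its internal pointer would still target the old location and dangle. Pin<&mut T> provides a type-level promise that "this value will not move", which guarantees safety.

Q4. What happens if you put a CPU-bound loop inside tokio::spawn?

A. The task monopolizes its worker. Tokio is a cooperative scheduler, so a task yields only at an .await. A long loop with no .await keeps that worker thread from running any other task, potentially for seconds. If one of N workers is blocked, throughput drops to (N-1)/N. Fix: call tokio::task::yield_now().await periodically, or move the CPU work to a separate pool with spawn_blocking.

Q5. What is the essential difference between a Go goroutine and a Rust Future?

A. Go is stackful: each goroutine has its own dynamically growing stack, starting at 2 KB. It can suspend anywhere in any function and is preempted automatically. Rust is stackless: a Future is a state machine struct with no stack, and it can suspend only at .await points. Runtime overhead is close to zero and it works under no_std, but the price is function coloring and the complexity of Pin. The two approaches differ in philosophy; neither is strictly better.

Q6. What is cancel safety in the select! macro?

A. select! picks whichever of several Futures becomes Ready first and drops the rest. If a dropped Future holds intermediate state (for example, data already read into a buffer), that state is lost. A cancel-safe function is an API that suffers no data loss or internal corruption in this situation. Examples: mpsc::Receiver::recv is cancel safe (the message stays in the channel); AsyncReadExt::read_exact is not (partially read data is lost).

Q7. How did Go avoid the "function coloring" problem?

A. Go makes every function implicitly async. The runtime is always present, and go func() can turn any function into a goroutine. Blocking I/O is handled automatically by the scheduler (netpoller plus syscall handoff). Developers never have to distinguish sync from async. The price is the ever-present runtime (~2 MB of binary size, not usable on embedded targets). Rust made the opposite trade-off: the runtime is optional, and function coloring is the price.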

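If you found this post helpful, check out these posts as well:

"Go Runtime & GMP Scheduler Deep Dive": the opposite approach, stackful goroutines.
"Python GIL and CPython Internals": yet another alternative, a choice different from both Rust and Go.
"eBPF Deep Dive": a revolution in kernel programming.
"io_uring and Asynchronous I/O Models": the future of Tokio.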