💡 왼쪽 원문을 읽으면서 오른쪽에 따라 써보세요. Tab 키로 힌트를 받을 수 있습니다.

원문 렌더가 준비되기 전까지 텍스트 가이드로 표시합니다.

도입: 동시성은 왜 어려운가?

2024년 AWS 장애의 주요 원인 중 하나는 동시성 버그였습니다. Cloudflare의 대규모 장애도 레이스 컨디션에서 비롯되었습니다. 동시성 프로그래밍은 올바르게 이해하지 않으면 디버깅이 불가능한 버그를 만들어냅니다.

Rob Pike(Go 언어 공동 창시자)는 명확하게 구분했습니다:

"동시성(Concurrency)은 독립적으로 실행되는 작업들을 다루는 것이고,

병렬성(Parallelism)은 여러 작업을 동시에 실행하는 것이다.

동시성은 구조(structure)에 관한 것이고,

병렬성은 실행(execution)에 관한 것이다."

— Rob Pike

동시성 (Concurrency): 병렬성 (Parallelism):

한 사람이 두 가지 일을 번갈아 두 사람이 각각 한 가지 일을 동시에

Task A ━━━━━━╸ Task A ━━━━━━━━━━━━

╻ Task B ━━━━━━━━━━━━

Task B ━━━━━━╸

╻ CPU Core 1 ━━━━━━━━

Task A ━━━━━━╸ CPU Core 2 ━━━━━━━━

╻

CPU Core 1 ━━━━━━━━

(시분할로 번갈아 실행) (물리적으로 동시 실행)

이 가이드에서는 동시성과 병렬성의 이론적 기초부터 언어별 구현, 실전 패턴, 그리고 흔한 버그와 해결책까지 체계적으로 다룹니다.

1. 스레딩 모델

1.1 1:1 모델 (커널 스레드)

OS가 직접 관리하는 스레드. 각 사용자 스레드가 하나의 커널 스레드에 매핑됩니다.

사용자 스레드: T1 T2 T3 T4

│ │ │ │

커널 스레드: KT1 KT2 KT3 KT4

│ │ │ │

CPU 코어: Core1 Core2 Core1 Core2

**사용 언어**: Java (전통적), C/C++ (pthread), Rust

1.2 N:1 모델 (그린 스레드)

런타임이 여러 사용자 스레드를 하나의 OS 스레드에 매핑합니다.

사용자 스레드: T1 T2 T3 T4 T5 T6

└──┬──┘ └──┬──┘ └──┬──┘

│ │ │

커널 스레드: KT1 KT1 KT1

│

CPU 코어: Core1

**장점**: 경량, 빠른 컨텍스트 스위칭

**단점**: 멀티코어 활용 불가

**사용**: 초기 Java, 초기 Ruby

1.3 M:N 모델 (하이브리드)

M개의 사용자 스레드를 N개의 OS 스레드에 매핑하는 최적의 모델:

사용자 스레드: G1 G2 G3 G4 G5 G6 G7 G8

└──┬──┘ └──┬──┘ └──┬──┘ └──┬──┘

│ │ │ │

커널 스레드: KT1 KT2 KT3 KT4

│ │ │ │

CPU 코어: Core1 Core2 Core3 Core4

**사용 언어**: Go (고루틴), Java 21+ (Virtual Thread), Erlang (프로세스)

1.4 스레딩 모델 비교

┌─────────────────┬──────────────┬──────────────┬──────────────┐

│ 항목 │ 1:1 │ N:1 │ M:N │

├─────────────────┼──────────────┼──────────────┼──────────────┤

│ 스레드 비용 │ ~1MB 스택 │ ~KB 스택 │ ~KB 스택 │

│ 생성 속도 │ 느림 (~ms) │ 빠름 (~us) │ 빠름 (~us) │

│ 컨텍스트 스위치 │ 비쌈 (커널) │ 저렴 (유저) │ 저렴 (유저) │

│ 멀티코어 활용 │ O │ X │ O │

│ I/O 블로킹 │ 스레드만 블록│ 전체 블록 │ 스레드만 블록│

│ 최대 스레드 수 │ ~수천 │ ~수백만 │ ~수백만 │

│ 복잡도 │ 낮음 │ 중간 │ 높음 │

│ 예시 │ Java, C++, │ 초기 Ruby │ Go, Erlang, │

│ │ Rust │ │ Java 21+ │

└─────────────────┴──────────────┴──────────────┴──────────────┘

2. 언어별 동시성 비교

2.1 Go — 고루틴과 채널

package main

"fmt"

"sync"

)

func main() {

var wg sync.WaitGroup

results := make(chan string, 10)

urls := []string{

"https://api.example.com/users",

"https://api.example.com/orders",

"https://api.example.com/products",

}

for _, url := range urls {

wg.Add(1)

go func(u string) {

defer wg.Done()

// HTTP 요청 시뮬레이션

results <- fmt.Sprintf("Fetched: %s", u)

}(url)

}

go func() {

wg.Wait()

close(results)

}()

for result := range results {

fmt.Println(result)

}

**특징**: CSP(Communicating Sequential Processes) 기반. "메모리를 공유하지 말고, 통신으로 메모리를 공유하라."

2.2 Java 21+ — Virtual Thread

public class VirtualThreadExample {

public static void main(String[] args) throws Exception {

// Virtual Thread로 100만 동시 작업

try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {

List<Future<String>> futures = List.of(

executor.submit(() -> fetchURL("https://api.example.com/users")),

executor.submit(() -> fetchURL("https://api.example.com/orders")),

executor.submit(() -> fetchURL("https://api.example.com/products"))

);

for (var future : futures) {

System.out.println(future.get());

}

// Structured Concurrency (Preview)

try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {

var user = scope.fork(() -> fetchUser(1));

var order = scope.fork(() -> fetchOrder(1));

scope.join();

scope.throwIfFailed();

System.out.println(user.get() + ", " + order.get());

}

**특징**: Project Loom으로 추가된 가상 스레드. 기존 코드 변경 최소화하면서 수백만 동시 작업 처리.

2.3 Python — asyncio와 GIL

async def fetch_url(session, url):

async with session.get(url) as response:

return await response.text()

async def main():

urls = [

"https://api.example.com/users",

"https://api.example.com/orders",

"https://api.example.com/products",

]

async with aiohttp.ClientSession() as session:

동시에 모든 URL 요청

tasks = [fetch_url(session, url) for url in urls]

results = await asyncio.gather(*tasks)

for url, result in zip(urls, results):

print(f"Fetched {url}: {len(result)} bytes")

asyncio.run(main())

**특징**: GIL(Global Interpreter Lock) 때문에 CPU 바운드 작업은 진정한 병렬 처리 불가. I/O 바운드 작업에는 asyncio 효과적. CPU 바운드는 multiprocessing 사용.

2.4 Rust — tokio 비동기 런타임

use tokio;

use reqwest;

#[tokio::main]

async fn main() -> Result<(), Box<dyn std::error::Error>> {

let urls = vec![

"https://api.example.com/users",

"https://api.example.com/orders",

"https://api.example.com/products",

];

let mut handles = vec![];

for url in urls {

let handle = tokio::spawn(async move {

let resp = reqwest::get(url).await?;

let body = resp.text().await?;

Ok::<String, reqwest::Error>(body)

});

handles.push(handle);

}

for handle in handles {

match handle.await? {

Ok(body) => println!("Fetched: {} bytes", body.len()),

Err(e) => eprintln!("Error: {}", e),

}

Ok(())

}

**특징**: 소유권 시스템으로 컴파일 타임에 데이터 레이스 방지. Send/Sync 트레잇으로 스레드 안전성 보장.

2.5 JavaScript — 이벤트 루프

// Node.js 비동기 패턴

async function fetchAll() {

const urls = [

'https://api.example.com/users',

'https://api.example.com/orders',

'https://api.example.com/products',

];

// Promise.all로 동시 실행

const results = await Promise.all(

urls.map(url => fetch(url).then(r => r.json()))

);

// Promise.allSettled로 실패 허용

const settled = await Promise.allSettled(

urls.map(url => fetch(url).then(r => r.json()))

);

settled.forEach((result, i) => {

if (result.status === 'fulfilled') {

console.log(`Success: ${urls[i]}`);

} else {

console.log(`Failed: ${urls[i]} - ${result.reason}`);

}

});

}

**특징**: 단일 스레드 + 이벤트 루프. CPU 바운드 작업은 Worker Thread 필요. I/O에 최적화.

2.6 Erlang/Elixir — 액터 모델

%% Erlang 프로세스 (경량 액터)

-module(counter).

-export([start/0, increment/1, get/1]).

start() ->

spawn(fun() -> loop(0) end).

loop(Count) ->

receive

{increment, From} ->

From ! {ok, Count + 1},

loop(Count + 1);

{get, From} ->

From ! {value, Count},

loop(Count);

stop ->

end.

increment(Pid) ->

Pid ! {increment, self()},

receive {ok, NewCount} -> NewCount end.

get(Pid) ->

Pid ! {get, self()},

receive {value, Count} -> Count end.

**특징**: 프로세스당 약 300바이트. 수백만 프로세스 동시 실행. "Let it crash" 철학으로 내결함성(fault-tolerance) 달성.

2.7 언어별 동시성 비교표

┌─────────────┬─────────────┬───────────────┬──────────────┬──────────────┐

│ 언어 │ 모델 │ 단위 │ 스케줄링 │ 특징 │

├─────────────┼─────────────┼───────────────┼──────────────┼──────────────┤

│ Go │ CSP (M:N) │ 고루틴 │ 런타임 │ 채널 통신 │

│ Java 21+ │ M:N │ Virtual Thread│ JVM │ 기존호환 │

│ Python │ 이벤트루프 │ 코루틴 │ asyncio │ GIL 제약 │

│ Rust │ async/await │ Future/Task │ tokio 등 │ 제로코스트 │

│ JavaScript │ 이벤트루프 │ Promise │ libuv/V8 │ 단일스레드 │

│ Erlang │ Actor(M:N) │ 프로세스 │ BEAM VM │ 내결함성 │

│ Kotlin │ 코루틴 │ Coroutine │ Dispatcher │ 구조적동시성 │

└─────────────┴─────────────┴───────────────┴──────────────┴──────────────┘

3. 이벤트 루프 심층 분석

3.1 Node.js 이벤트 루프 단계

┌───────────────────────────┐

│ timers │ ← setTimeout, setInterval

│ (타이머 콜백 실행) │

├───────────────────────────┤

│ pending callbacks │ ← 시스템 에러 콜백

│ (지연된 I/O 콜백) │

├───────────────────────────┤

│ idle, prepare │ ← 내부용

│ (내부 전용) │

├───────────────────────────┤

│ poll │ ← I/O 콜백 (대부분의 콜백)

│ (새 I/O 이벤트 대기) │

├───────────────────────────┤

│ check │ ← setImmediate 콜백

│ (setImmediate 실행) │

├───────────────────────────┤

│ close callbacks │ ← socket.on('close')

│ (종료 콜백) │

└───────────────────────────┘

각 단계 사이: 마이크로태스크 큐 처리

(Promise.then, queueMicrotask, process.nextTick)

3.2 마이크로태스크 vs 매크로태스크

console.log('1: sync');

setTimeout(() => console.log('2: macro (setTimeout)'), 0);

Promise.resolve().then(() => console.log('3: micro (Promise)'));

queueMicrotask(() => console.log('4: micro (queueMicrotask)'));

console.log('5: sync');

// 출력 순서:

// 1: sync

// 5: sync

// 3: micro (Promise)

// 4: micro (queueMicrotask)

// 2: macro (setTimeout)

**규칙**: 동기 코드 먼저 실행 → 마이크로태스크 전부 실행 → 매크로태스크 하나 실행 → 반복

3.3 libuv 아키텍처

┌──────────────────────────────────────────┐

│ Node.js │

│ ┌─────────────────────────────────┐ │

│ │ JavaScript (V8 Engine) │ │

│ │ ┌───────────┐ ┌────────────┐ │ │

│ │ │ Async │ │ Event │ │ │

│ │ │ APIs │ │ Queue │ │ │

│ │ └─────┬─────┘ └──────┬─────┘ │ │

│ └────────┼────────────────┼────────┘ │

│ │ Event Loop │ │

│ ┌────────▼────────────────▼────────┐ │

│ │ libuv │ │

│ │ ┌──────────┐ ┌──────────────┐ │ │

│ │ │ Thread │ │ epoll/kqueue │ │ │

│ │ │ Pool │ │ /IOCP │ │ │

│ │ │ (4 thds) │ │ (non-block) │ │ │

│ │ └──────────┘ └──────────────┘ │ │

│ └───────────────────────────────────┘ │

│ │ │ │

│ File I/O Network I/O │

│ DNS lookup TCP/UDP │

│ Crypto HTTP │

└──────────────────────────────────────────┘

파일 I/O와 DNS는 스레드 풀에서 처리되고, 네트워크 I/O는 OS의 비동기 메커니즘(epoll, kqueue, IOCP)으로 처리됩니다.

4. 코루틴과 구조적 동시성

4.1 Python async/await

async def process_item(item: str) -> str:

await asyncio.sleep(1) # I/O 시뮬레이션

return f"Processed: {item}"

async def main():

items = ["a", "b", "c", "d", "e"]

동시 실행

tasks = [process_item(item) for item in items]

results = await asyncio.gather(*tasks)

print(results) # 1초만에 모두 완료

타임아웃

try:

result = await asyncio.wait_for(

process_item("slow"),

timeout=0.5

)

except asyncio.TimeoutError:

print("Timeout!")

세마포어로 동시 실행 수 제한

semaphore = asyncio.Semaphore(3)

async def limited_process(item):

async with semaphore:

return await process_item(item)

results = await asyncio.gather(*[limited_process(i) for i in items])

asyncio.run(main())

4.2 Kotlin 코루틴과 구조적 동시성

suspend fun fetchUser(id: Int): User {

delay(1000) // I/O 시뮬레이션

return User(id, "User $id")

}

fun main() = runBlocking {

// 구조적 동시성: 부모가 취소되면 자식도 취소

coroutineScope {

val user = async { fetchUser(1) }

val orders = async { fetchOrders(1) }

println("User: ${user.await()}")

println("Orders: ${orders.await()}")

} // 모든 자식 코루틴 완료까지 대기

// SupervisorScope: 자식 실패가 다른 자식에 영향 안 줌

supervisorScope {

val job1 = launch {

delay(100)

throw RuntimeException("Failed!")

}

val job2 = launch {

delay(200)

println("Job2 completed") // 정상 실행

}

4.3 구조적 동시성 원칙

전통적 동시성: 구조적 동시성:

┌─────────────┐ ┌─────────────┐

│ Parent │ │ Parent │

│ ┌───┐ ┌───┐│ │ ┌───┐ ┌───┐│

│ │ A │ │ B ││ │ │ A │ │ B ││

│ └─┬─┘ └─┬─┘│ │ └─┬─┘ └─┬─┘│

└────┼─────┼───┘ └────┼─────┼───┘

│ │ ← 누가 관리? │ │ ← Parent가 관리

▼ ▼ 누수 가능! ▼ ▼ 자동 정리!

규칙:

1. 모든 동시 작업은 스코프 내에서 시작

2. 스코프는 모든 자식 작업 완료까지 대기

3. 부모 취소 시 모든 자식도 취소

4. 자식 에러는 부모에게 전파

5. 액터 모델

5.1 개념

액터 모델에서 각 액터는:

- 자신만의 상태를 가짐 (다른 액터와 공유하지 않음)

- 메시지를 비동기로 주고받음

- 메시지를 처리하며 새 액터를 만들거나, 상태를 변경하거나, 메시지를 보낼 수 있음

┌──────────┐ 메시지 ┌──────────┐

│ Actor A │ ─────────────→ │ Actor B │

│ │ │ │

│ State: x │ 메시지 │ State: y │

│ Mailbox │ ←───────────── │ Mailbox │

└──────────┘ └──────────┘

│

│ 메시지

▼

┌──────────┐

│ Actor C │

│ State: z │

│ Mailbox │

└──────────┘

핵심: 공유 상태 없음, 락 없음, 메시지 전달만

5.2 Erlang/OTP 감독 트리

┌──────────────┐

│ Supervisor │

│ (one_for_one)│

└──────┬───────┘

┌────┼────┐

▼ ▼ ▼

┌─────┐ ┌─────┐ ┌─────┐

│ W1 │ │ W2 │ │ W3 │

└─────┘ └─────┘ └─────┘

W2가 크래시하면:

1. Supervisor가 감지

2. W2만 재시작 (one_for_one 전략)

3. W1, W3는 영향 없음

4. 5분 안에 5번 이상 재시작하면 Supervisor도 종료

5.3 언제 액터 모델을 사용하나?

적합한 경우:

- 수백만 독립적 상태 관리 (채팅방, IoT 디바이스)

- 분산 시스템 (노드 간 통신)

- 내결함성이 필수인 시스템 (통신, 금융)

- 실시간 시스템 (게임 서버, 트레이딩)

부적합한 경우:

- 단순한 CRUD 애플리케이션

- 강한 일관성(strong consistency) 필요

- 트랜잭션 처리가 핵심

- 순서 보장이 중요한 파이프라인

6. 동기화 프리미티브

6.1 뮤텍스 (Mutex)

// Go - Mutex

type SafeMap struct {

mu sync.Mutex

data map[string]int

}

func (m *SafeMap) Set(key string, val int) {

m.mu.Lock()

defer m.mu.Unlock()

m.data[key] = val

}

func (m *SafeMap) Get(key string) (int, bool) {

m.mu.Lock()

defer m.mu.Unlock()

val, ok := m.data[key]

return val, ok

}

// Java - ReentrantLock

public class SafeCounter {

private final ReentrantLock lock = new ReentrantLock();

private int count = 0;

public void increment() {

lock.lock();

try {

count++;

} finally {

lock.unlock();

}

6.2 RWLock (읽기-쓰기 잠금)

type Cache struct {

mu sync.RWMutex

data map[string]string

}

// 여러 고루틴이 동시에 읽기 가능

func (c *Cache) Get(key string) (string, bool) {

c.mu.RLock()

defer c.mu.RUnlock()

val, ok := c.data[key]

return val, ok

}

// 쓰기는 배타적 잠금

func (c *Cache) Set(key, val string) {

c.mu.Lock()

defer c.mu.Unlock()

c.data[key] = val

}

6.3 세마포어 (Semaphore)

async def worker(name, semaphore):

async with semaphore:

print(f"{name} acquired semaphore")

await asyncio.sleep(1)

print(f"{name} released semaphore")

async def main():

최대 3개 동시 실행

semaphore = asyncio.Semaphore(3)

tasks = [worker(f"Worker-{i}", semaphore) for i in range(10)]

await asyncio.gather(*tasks)

asyncio.run(main())

6.4 조건 변수 (Condition Variable)

type BoundedQueue struct {

mu sync.Mutex

notEmpty *sync.Cond

notFull *sync.Cond

items []interface{}

maxSize int

}

func NewBoundedQueue(maxSize int) *BoundedQueue {

q := &BoundedQueue{maxSize: maxSize}

q.notEmpty = sync.NewCond(&q.mu)

q.notFull = sync.NewCond(&q.mu)

return q

}

func (q *BoundedQueue) Put(item interface{}) {

q.mu.Lock()

defer q.mu.Unlock()

for len(q.items) == q.maxSize {

q.notFull.Wait() // 공간이 생길 때까지 대기

}

q.items = append(q.items, item)

q.notEmpty.Signal() // 소비자 깨우기

}

func (q *BoundedQueue) Take() interface{} {

q.mu.Lock()

defer q.mu.Unlock()

for len(q.items) == 0 {

q.notEmpty.Wait() // 아이템이 올 때까지 대기

}

item := q.items[0]

q.items = q.items[1:]

q.notFull.Signal() // 생산자 깨우기

return item

}

7. Lock-Free 프로그래밍

7.1 CAS (Compare-And-Swap)

type LockFreeCounter struct {

value int64

}

func (c *LockFreeCounter) Increment() int64 {

for {

old := atomic.LoadInt64(&c.value)

new := old + 1

if atomic.CompareAndSwapInt64(&c.value, old, new) {

return new

}

// CAS 실패 시 재시도 (다른 고루틴이 먼저 변경)

}

func (c *LockFreeCounter) Get() int64 {

return atomic.LoadInt64(&c.value)

}

7.2 ABA 문제

스레드 A: 값 읽기 → A

스레드 B: A → B 변경

스레드 B: B → A 변경 (원래 값으로 복원)

스레드 A: CAS(A → C) 성공! (하지만 중간에 값이 변했었음)

해결책:

- 버전 번호 추가 (A,v1 → B,v2 → A,v3)

- 태그된 포인터 (AtomicStampedReference in Java)

- Hazard Pointer

7.3 Lock-Free를 사용하지 말아야 할 때

Lock-Free가 적합한 경우:

- 매우 높은 경합(contention) 상황

- 실시간 시스템 (우선순위 역전 방지)

- 단순한 자료구조 (카운터, 스택, 큐)

Lock-Free를 피해야 하는 경우 (대부분의 경우):

- 일반적인 애플리케이션 (Mutex로 충분)

- 복잡한 자료구조 (트리, 그래프)

- 정확성 검증이 어려운 상황

- 팀원이 Lock-Free에 익숙하지 않은 경우

"Mutex가 작동하면 Mutex를 쓰세요."

8. 흔한 동시성 버그

8.1 레이스 컨디션 (Race Condition)

// 버그: 여러 고루틴이 동시에 counter 수정

var counter int

func buggyIncrement() {

for i := 0; i < 1000; i++ {

go func() {

counter++ // 읽기-수정-쓰기가 원자적이지 않음

}()

}

// 해결: atomic 또는 mutex 사용

var safeCounter int64

func safeIncrement() {

for i := 0; i < 1000; i++ {

go func() {

atomic.AddInt64(&safeCounter, 1)

}()

}

8.2 데드락 (Deadlock)

4가지 필요조건 (모두 만족해야 데드락):

1. 상호 배제 (Mutual Exclusion): 리소스를 한 번에 하나만 사용

2. 점유와 대기 (Hold and Wait): 리소스 점유하면서 다른 리소스 대기

3. 비선점 (No Preemption): 점유된 리소스를 강제로 빼앗을 수 없음

4. 순환 대기 (Circular Wait): A→B→C→A 순환 대기

// 데드락 예시

var mu1, mu2 sync.Mutex

func goroutine1() {

mu1.Lock()

defer mu1.Unlock()

time.Sleep(time.Millisecond)

mu2.Lock() // mu2 대기 (goroutine2가 점유)

defer mu2.Unlock()

}

func goroutine2() {

mu2.Lock()

defer mu2.Unlock()

time.Sleep(time.Millisecond)

mu1.Lock() // mu1 대기 (goroutine1이 점유)

defer mu1.Unlock()

}

// 해결: 항상 같은 순서로 잠금 획득

func fixed1() {

mu1.Lock()

defer mu1.Unlock()

mu2.Lock()

defer mu2.Unlock()

}

func fixed2() {

mu1.Lock() // mu1을 먼저 (같은 순서)

defer mu1.Unlock()

mu2.Lock()

defer mu2.Unlock()

}

8.3 라이브락 (Livelock)

A와 B가 좁은 복도에서 만남:

A: 왼쪽으로 비킴 ← → B: 오른쪽으로 비킴 (같은 방향!)

A: 오른쪽으로 비킴 ← → B: 왼쪽으로 비킴 (또 같은 방향!)

... (무한 반복, 아무도 진행 못함)

vs 데드락: 아무것도 안 함

vs 라이브락: 계속 움직이지만 진행 없음

8.4 기아 (Starvation)

우선순위가 높은 스레드가 계속 리소스를 차지하여, 낮은 우선순위 스레드가 영원히 실행되지 못하는 상태.

8.5 우선순위 역전 (Priority Inversion)

1997년 화성 탐사선 Mars Pathfinder 사건:

Low-priority 태스크가 뮤텍스 점유

↓

Medium-priority 태스크가 Low를 선점 (뮤텍스 무관)

↓

High-priority 태스크가 뮤텍스 대기... 영원히!

(Low가 실행돼야 뮤텍스 해제되는데, Medium이 계속 선점)

해결: 우선순위 상속 프로토콜

Low가 뮤텍스를 점유하면 High의 우선순위를 일시적으로 상속

8.6 감지 도구

Go: 레이스 디텍터

go test -race ./...

go run -race main.go

Java: Thread Sanitizer

java -XX:+UseThreadSanitizer MyApp

Rust: 컴파일러가 컴파일 타임에 감지!

Send/Sync 트레잇 위반 시 컴파일 에러

Python: ThreadSanitizer (C extension)

또는 asyncio.debug 모드

PYTHONASYNCIODEBUG=1 python app.py

일반: Helgrind (Valgrind 도구)

valgrind --tool=helgrind ./program

9. 실전 패턴

9.1 생산자-소비자

func producerConsumer() {

ch := make(chan int, 100) // 버퍼 있는 채널

// 생산자들

var prodWg sync.WaitGroup

for i := 0; i < 3; i++ {

prodWg.Add(1)

go func(id int) {

defer prodWg.Done()

for j := 0; j < 10; j++ {

ch <- id*100 + j

}

}(i)

}

// 채널 닫기

go func() {

prodWg.Wait()

close(ch)

}()

// 소비자들

var consWg sync.WaitGroup

for i := 0; i < 2; i++ {

consWg.Add(1)

go func(id int) {

defer consWg.Done()

for item := range ch {

fmt.Printf("Consumer %d: %d\n", id, item)

}

}(i)

}

consWg.Wait()

}

9.2 스레드 풀 / 워커 풀

from asyncio import Queue

async def worker(name: str, queue: Queue):

while True:

item = await queue.get()

try:

await process(item)

finally:

queue.task_done()

async def main():

queue = Queue(maxsize=100)

워커 5개 시작

workers = [

asyncio.create_task(worker(f"worker-{i}", queue))

for i in range(5)

]

작업 추가

for item in range(50):

await queue.put(item)

모든 작업 완료 대기

await queue.join()

워커 종료

for w in workers:

w.cancel()

asyncio.run(main())

9.3 Work Stealing

전통적 스레드 풀: Work Stealing:

┌─────────────┐ ┌─────────────┐

│ 공유 큐 │ │ Worker 1 │

│ [T1|T2|T3] │ │ 로컬큐:[T1] │

│ │ ├─────────────┤

│ 모든 워커가 │ │ Worker 2 │

│ 경합! │ │ 로컬큐:[T2] │ ← 비면 Worker 1에서 훔침

│ │ ├─────────────┤

└─────────────┘ │ Worker 3 │

│ 로컬큐:[T3] │

└─────────────┘

장점: 캐시 지역성 향상, 경합 감소

사용: Java ForkJoinPool, Go 런타임, Tokio(Rust)

10. 성능

10.1 암달의 법칙 (Amdahl's Law)

속도 향상 = 1 / ((1 - P) + P/N)

P = 병렬화 가능한 비율

N = 프로세서 수

예: 프로그램의 95%가 병렬화 가능, 64코어일 때

속도 향상 = 1 / (0.05 + 0.95/64) = 1 / 0.0648 = 15.4배

→ 아무리 코어를 늘려도 직렬 부분(5%)이 병목!

→ 최대 이론적 속도 향상 = 1/0.05 = 20배

10.2 구스타프슨의 법칙 (Gustafson's Law)

속도 향상 = N - alpha * (N - 1)

alpha = 직렬 부분의 비율

N = 프로세서 수

핵심 차이:

- 암달: "문제 크기 고정, 코어를 늘리면?"

- 구스타프슨: "코어 고정, 문제 크기를 늘리면?"

→ 현실에서는 데이터가 늘어나면 병렬 부분도 늘어남

→ 구스타프슨이 더 낙관적이고 현실적

10.3 컨텍스트 스위치 비용

컨텍스트 스위치 비용 비교:

┌───────────────────────────┬──────────────┐

│ 스위치 유형 │ 대략적 비용 │

├───────────────────────────┼──────────────┤

│ 함수 호출 │ ~1-5 ns │

│ 고루틴 스위치 │ ~100-200 ns │

│ 코루틴 스위치 (Python) │ ~1-5 us │

│ 스레드 스위치 (같은 프로세스)│ ~1-10 us │

│ 프로세스 스위치 │ ~10-50 us │

│ 가상머신 스위치 │ ~100+ us │

└───────────────────────────┴──────────────┘

10.4 False Sharing (캐시 라인 문제)

문제: 서로 다른 변수가 같은 캐시 라인에 위치

CPU Core 1 CPU Core 2

│ │

┌───▼───────────────────▼───┐

│ Cache Line (64 bytes) │

│ [counter_A] [counter_B] │ ← 서로 다른 변수인데

│ │ 같은 캐시 라인!

└───────────────────────────┘

Core 1이 counter_A 수정 → Cache Line 무효화 → Core 2도 재로드 필요

Core 2가 counter_B 수정 → Cache Line 무효화 → Core 1도 재로드 필요

해결: 패딩으로 캐시 라인 분리

// Go에서 False Sharing 방지

type PaddedCounter struct {

value int64

_pad [56]byte // 64바이트 캐시 라인 패딩

}

type Counters struct {

c1 PaddedCounter

c2 PaddedCounter

}

11. 면접 질문 15선

1. **동시성(Concurrency)과 병렬성(Parallelism)의 차이는?**

동시성은 여러 작업을 번갈아 처리하는 구조(structure), 병렬성은 물리적으로 동시에 실행(execution)하는 것. 단일 코어에서도 동시성은 가능하지만 병렬성은 불가.

2. **프로세스와 스레드의 차이는?**

프로세스는 독립된 메모리 공간, 스레드는 프로세스 내 메모리 공유. 스레드가 생성/전환 비용이 낮지만 동기화가 필요.

3. **데드락의 4가지 필요조건과 각각의 해결법은?**

상호배제, 점유와대기, 비선점, 순환대기. 순서화된 잠금 획득(순환대기 방지)이 가장 실용적.

4. **Go 고루틴이 OS 스레드보다 가벼운 이유는?**

약 2KB 스택(OS 스레드는 1MB+), M:N 스케줄러로 소수 OS 스레드에 수백만 고루틴 매핑, 사용자 공간 컨텍스트 스위칭.

5. **Python의 GIL이란 무엇이며 왜 존재하는가?**

Global Interpreter Lock. CPython의 메모리 관리(참조 카운팅)가 스레드 안전하지 않아서 필요. CPU 바운드 병렬 처리를 위해서는 multiprocessing 사용.

6. **이벤트 루프와 멀티스레딩의 장단점은?**

이벤트 루프: 단순한 모델, 컨텍스트 스위칭 비용 없음, CPU 바운드에 부적합. 멀티스레딩: CPU 활용 극대화, 동기화 복잡, 데드락 위험.

7. **async/await가 스레드와 다른 점은?**

async/await는 협력적(cooperative) 멀티태스킹으로, await 지점에서만 전환. 스레드는 선점적(preemptive)으로 OS가 언제든 전환 가능.

8. **액터 모델의 장점과 단점은?**

장점: 공유 상태 없음, 데드락 없음, 분산 시스템에 자연스러움. 단점: 메시지 순서 보장 어려움, 디버깅 복잡, 성능 오버헤드.

9. **CAS 연산이란? 언제 Mutex 대신 사용하나?**

Compare-And-Swap은 원자적 비교-교환 연산. 단순한 카운터나 포인터 업데이트에서 Mutex보다 빠르지만, 복잡한 로직에는 부적합.

10. **False Sharing이란? 어떻게 해결하나?**

서로 다른 코어가 같은 캐시 라인의 다른 변수를 수정할 때 발생하는 성능 저하. 패딩으로 변수를 별도 캐시 라인에 배치하여 해결.

11. **암달의 법칙이 시사하는 바는?**

직렬 부분이 전체 속도 향상의 상한을 결정. 병렬화 최적화보다 직렬 부분 축소가 더 중요할 수 있음.

12. **Java Virtual Thread와 Go 고루틴의 차이는?**

둘 다 M:N 모델이지만, Go는 채널 기반 CSP 모델, Java는 기존 Thread API 호환성 유지. Go가 먼저 도입, Java 21에서 정식 지원.

13. **구조적 동시성(Structured Concurrency)이란?**

동시 작업을 스코프에 묶어 생명주기를 관리. 부모 취소 시 자식 자동 취소, 자식 완료까지 부모 대기. 고루틴/스레드 누수 방지.

14. **Erlang의 "Let it crash" 철학은?**

에러를 방어적으로 처리하기보다, 프로세스를 빠르게 종료하고 감독자가 재시작하는 것이 더 안전. 복잡한 에러 복구 코드 불필요.

15. **스레드 풀의 적절한 크기는?**

CPU 바운드: CPU 코어 수. I/O 바운드: CPU 코어 수 * (1 + 대기시간/처리시간). Little's Law(L = lambda * W)로 추정 가능.

12. 실전 퀴즈

console.log('A');

setTimeout(() => console.log('B'), 0);

Promise.resolve().then(() => console.log('C'));

console.log('D');

**답**: A, D, C, B. 동기 코드(A, D) 먼저 → 마이크로태스크(Promise: C) → 매크로태스크(setTimeout: B).

func main() {

ch := make(chan int)

ch <- 42

fmt.Println(<-ch)

}

**답**: 언버퍼 채널은 송신자와 수신자가 동시에 있어야 합니다. 메인 고루틴이 `ch <- 42`에서 수신자를 기다리며 블록되어 `<-ch`에 도달할 수 없습니다.

**답**: `gather`는 모든 결과를 순서대로 반환하고, 하나가 실패하면 예외를 발생시킵니다(return_exceptions=True로 변경 가능). `wait`는 완료/미완료 세트를 반환하고, FIRST_COMPLETED, FIRST_EXCEPTION, ALL_COMPLETED 옵션을 제공합니다. `wait`가 더 유연합니다.

**답**: 코어를 무한으로 늘려도 최대 속도 향상 = 1/(1-0.9) = 1/0.1 = **10배**입니다. 10%의 직렬 부분이 병목이 됩니다. 64코어에서는 1/(0.1 + 0.9/64) = 약 8.8배입니다.

**답**: Rust의 소유권(ownership) 시스템과 Send/Sync 트레잇이 핵심입니다. Send는 스레드 간 소유권 이전 가능, Sync는 스레드 간 참조 공유 가능을 의미합니다. 가변 참조는 동시에 하나만 존재할 수 있어(borrow checker), 컴파일 타임에 데이터 레이스를 원천 차단합니다.

참고 자료

1. [Rob Pike - Concurrency Is Not Parallelism (Talk)](https://go.dev/talks/2012/waza.slide)

2. [The Art of Multiprocessor Programming (Herlihy & Shavit)](https://www.elsevier.com/books/the-art-of-multiprocessor-programming/herlihy/978-0-12-415950-1)

3. [Java Concurrency in Practice (Goetz et al.)](https://jcip.net/)

4. [JEP 444: Virtual Threads](https://openjdk.org/jeps/444)

5. [Node.js Event Loop Documentation](https://nodejs.org/en/docs/guides/event-loop-timers-and-nexttick)

6. [Python asyncio Documentation](https://docs.python.org/3/library/asyncio.html)

7. [Tokio Tutorial (Rust)](https://tokio.rs/tokio/tutorial)

8. [Erlang/OTP Design Principles](https://www.erlang.org/doc/design_principles/des_princ.html)

9. [Go Concurrency Patterns](https://go.dev/talks/2012/concurrency.slide)

10. [Kotlin Coroutines Guide](https://kotlinlang.org/docs/coroutines-guide.html)

11. [Lock-Free Programming (Preshing)](https://preshing.com/20120612/an-introduction-to-lock-free-programming/)

12. [Amdahl's Law (Wikipedia)](https://en.wikipedia.org/wiki/Amdahl%27s_law)

13. [LMAX Disruptor (Lock-Free Queue)](https://lmax-exchange.github.io/disruptor/)

14. [Structured Concurrency (Wikipedia)](https://en.wikipedia.org/wiki/Structured_concurrency)

15. [Thread Sanitizer (Google)](https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual)