Concurrency & Parallelism Deep Dive: Multithreading, Async, and Event Loops Explained
Author: Youngju Kim (@fjvbn20031)
- Introduction: Why Is Concurrency So Hard?
- 1. Threading Models
- 2. Language-by-Language Concurrency
- 3. Event Loop Deep Dive
- 4. Coroutines and Structured Concurrency
- 5. Actor Model
- 6. Synchronization Primitives
- 7. Lock-Free Programming
- 8. Common Concurrency Bugs
- 9. Practical Patterns
- 10. Performance
- 11. Interview Questions (15)
- 12. Practice Quiz
- References
Introduction: Why Is Concurrency So Hard?
Large-scale incidents at providers such as AWS and Cloudflare have repeatedly been traced back to concurrency bugs like race conditions. Concurrent programming produces nearly impossible-to-debug failures if it is not properly understood.
Rob Pike (co-creator of Go) made the distinction clear:
"Concurrency is about dealing with lots of things at once.
Parallelism is about doing lots of things at once.
Concurrency is about structure.
Parallelism is about execution."
— Rob Pike
Concurrency:                          Parallelism:
One person juggling two tasks         Two people each doing one task

Task A ━━━   ━━━   ━━━                Task A ━━━━━━━━━━━━
Task B    ━━━   ━━━   ━━━             Task B ━━━━━━━━━━━━

CPU Core 1                            CPU Core 1 + CPU Core 2
(time-slicing)                        (physically simultaneous)
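To make the distinction concrete, here is a minimal Python sketch: two tasks make progress "at once" on a single OS thread by interleaving at await points. That is concurrency without parallelism.

```python
import asyncio

events = []

async def task(name: str, steps: int):
    for i in range(steps):
        events.append(f"{name}{i}")  # do one slice of work
        await asyncio.sleep(0.01)    # yield control back to the event loop

async def demo():
    # One OS thread, two tasks "at once": the event loop time-slices them.
    await asyncio.gather(task("A", 3), task("B", 3))

asyncio.run(demo())
print(events)  # interleaved: ['A0', 'B0', 'A1', 'B1', 'A2', 'B2']
```

Remove the `await asyncio.sleep(...)` and each task runs to completion before the other starts: without a yield point there is no interleaving at all.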
This guide systematically covers the theoretical foundations of concurrency and parallelism, language-specific implementations, practical patterns, and common bugs with their solutions.
1. Threading Models
1.1 1:1 Model (Kernel Threads)
Threads managed directly by the OS. Each user thread maps to one kernel thread.
User threads: T1 T2 T3 T4
│ │ │ │
Kernel threads: KT1 KT2 KT3 KT4
│ │ │ │
CPU cores: Core1 Core2 Core1 Core2
Languages: Java (traditional), C/C++ (pthread), Rust
1.2 N:1 Model (Green Threads)
Runtime maps multiple user threads to a single OS thread.
User threads: T1 T2 T3 T4 T5 T6
└──┬──┘ └──┬──┘ └──┬──┘
│ │ │
Kernel threads: KT1 KT1 KT1
│
CPU cores: Core1
Pros: Lightweight, fast context switching
Cons: Cannot use multiple cores
Used by: Early Java, early Ruby
1.3 M:N Model (Hybrid)
Maps M user threads onto N OS threads, combining the low per-thread cost of user threads with true multi-core execution:
User threads: G1 G2 G3 G4 G5 G6 G7 G8
└──┬──┘ └──┬──┘ └──┬──┘ └──┬──┘
│ │ │ │
Kernel threads: KT1 KT2 KT3 KT4
│ │ │ │
CPU cores: Core1 Core2 Core3 Core4
Languages: Go (goroutines), Java 21+ (Virtual Threads), Erlang (processes)
1.4 Threading Model Comparison
┌─────────────────┬──────────────┬──────────────┬──────────────┐
│ Category │ 1:1 │ N:1 │ M:N │
├─────────────────┼──────────────┼──────────────┼──────────────┤
│ Thread cost │ ~1MB stack │ ~KB stack │ ~KB stack │
│ Creation speed │ Slow (~ms) │ Fast (~us) │ Fast (~us) │
│ Context switch │ Expensive │ Cheap (user) │ Cheap (user) │
│ Multi-core │ Yes │ No │ Yes │
│ I/O blocking │ Thread only │ All blocked │ Thread only │
│ Max threads │ ~thousands │ ~millions │ ~millions │
│ Complexity │ Low │ Medium │ High │
│ Examples │ Java, C++, │ Early Ruby │ Go, Erlang, │
│ │ Rust │ │ Java 21+ │
└─────────────────┴──────────────┴──────────────┴──────────────┘
2. Language-by-Language Concurrency
2.1 Go — Goroutines and Channels
package main

import (
    "fmt"
    "sync"
)

func main() {
    var wg sync.WaitGroup
    results := make(chan string, 10)

    urls := []string{
        "https://api.example.com/users",
        "https://api.example.com/orders",
        "https://api.example.com/products",
    }

    for _, url := range urls {
        wg.Add(1)
        go func(u string) {
            defer wg.Done()
            results <- fmt.Sprintf("Fetched: %s", u)
        }(url)
    }

    go func() {
        wg.Wait()
        close(results)
    }()

    for result := range results {
        fmt.Println(result)
    }
}
Characteristics: CSP (Communicating Sequential Processes) based. "Don't communicate by sharing memory; share memory by communicating."
2.2 Java 21+ — Virtual Threads
import java.util.concurrent.*;
import java.util.List;

public class VirtualThreadExample {
    public static void main(String[] args) throws Exception {
        // Million concurrent tasks with Virtual Threads
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<String>> futures = List.of(
                executor.submit(() -> fetchURL("https://api.example.com/users")),
                executor.submit(() -> fetchURL("https://api.example.com/orders")),
                executor.submit(() -> fetchURL("https://api.example.com/products"))
            );
            for (var future : futures) {
                System.out.println(future.get());
            }
        }

        // Structured Concurrency (Preview; requires --enable-preview)
        try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
            var user = scope.fork(() -> fetchUser(1));
            var order = scope.fork(() -> fetchOrder(1));
            scope.join();
            scope.throwIfFailed();
            System.out.println(user.get() + ", " + order.get());
        }
    }

    // Stub helpers so the example compiles; replace with real I/O calls.
    static String fetchURL(String url) { return "Fetched: " + url; }
    static String fetchUser(int id) { return "User " + id; }
    static String fetchOrder(int id) { return "Order " + id; }
}
Characteristics: Virtual threads added via Project Loom. Millions of concurrent tasks with minimal code changes.
2.3 Python — asyncio and the GIL
import asyncio
import aiohttp

async def fetch_url(session, url):
    async with session.get(url) as response:
        return await response.text()

async def main():
    urls = [
        "https://api.example.com/users",
        "https://api.example.com/orders",
        "https://api.example.com/products",
    ]
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_url(session, url) for url in urls]
        results = await asyncio.gather(*tasks)
        for url, result in zip(urls, results):
            print(f"Fetched {url}: {len(result)} bytes")

asyncio.run(main())
Characteristics: The GIL (Global Interpreter Lock) prevents true parallelism for CPU-bound work. asyncio is effective for I/O-bound tasks. Use multiprocessing for CPU-bound work.
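A quick way to see the GIL's effect in practice: the sketch below (function names are illustrative) runs the same CPU-bound function in a thread pool and in a process pool. The thread version returns correct results but cannot use more than one core for pure-Python arithmetic; the process version sidesteps the GIL because each worker is a separate interpreter.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def cpu_bound(n: int) -> int:
    # Pure-Python arithmetic holds the GIL for the whole computation.
    return sum(i * i for i in range(n))

def run_in_threads(inputs):
    # Threads share one GIL: results are correct, but only one thread
    # executes Python bytecode at any instant.
    with ThreadPoolExecutor(max_workers=4) as ex:
        return list(ex.map(cpu_bound, inputs))

def run_in_processes(inputs):
    # Each process has its own interpreter and its own GIL: true parallelism.
    with ProcessPoolExecutor(max_workers=4) as ex:
        return list(ex.map(cpu_bound, inputs))

if __name__ == "__main__":
    print(run_in_threads([100_000] * 4))
```

Timing both versions on a multi-core machine shows the thread pool taking roughly as long as the sequential loop, while the process pool scales with core count.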
2.4 Rust — tokio Async Runtime
use tokio;
use reqwest;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let urls = vec![
        "https://api.example.com/users",
        "https://api.example.com/orders",
        "https://api.example.com/products",
    ];

    let mut handles = vec![];
    for url in urls {
        let handle = tokio::spawn(async move {
            let resp = reqwest::get(url).await?;
            let body = resp.text().await?;
            Ok::<String, reqwest::Error>(body)
        });
        handles.push(handle);
    }

    for handle in handles {
        match handle.await? {
            Ok(body) => println!("Fetched: {} bytes", body.len()),
            Err(e) => eprintln!("Error: {}", e),
        }
    }
    Ok(())
}
Characteristics: Ownership system prevents data races at compile time. Send/Sync traits guarantee thread safety.
2.5 JavaScript — Event Loop
async function fetchAll() {
    const urls = [
        'https://api.example.com/users',
        'https://api.example.com/orders',
        'https://api.example.com/products',
    ];

    // Concurrent execution with Promise.all
    const results = await Promise.all(
        urls.map(url => fetch(url).then(r => r.json()))
    );

    // Promise.allSettled allows failures
    const settled = await Promise.allSettled(
        urls.map(url => fetch(url).then(r => r.json()))
    );
    settled.forEach((result, i) => {
        if (result.status === 'fulfilled') {
            console.log(`Success: ${urls[i]}`);
        } else {
            console.log(`Failed: ${urls[i]} - ${result.reason}`);
        }
    });
}
Characteristics: Single thread + event loop. CPU-bound work requires Worker Threads. Optimized for I/O.
2.6 Erlang/Elixir — Actor Model
%% Erlang processes (lightweight actors)
-module(counter).
-export([start/0, increment/1, get/1]).

start() ->
    spawn(fun() -> loop(0) end).

loop(Count) ->
    receive
        {increment, From} ->
            From ! {ok, Count + 1},
            loop(Count + 1);
        {get, From} ->
            From ! {value, Count},
            loop(Count);
        stop ->
            ok
    end.

increment(Pid) ->
    Pid ! {increment, self()},
    receive {ok, NewCount} -> NewCount end.

get(Pid) ->
    Pid ! {get, self()},
    receive {value, Count} -> Count end.
Characteristics: A newly spawned process takes about 338 words of memory (roughly 2.7 KB on a 64-bit system). Millions of concurrent processes are practical. "Let it crash" philosophy for fault tolerance.
2.7 Language Comparison Table
┌─────────────┬─────────────┬───────────────┬──────────────┬──────────────┐
│ Language │ Model │ Unit │ Scheduling │ Key Feature │
├─────────────┼─────────────┼───────────────┼──────────────┼──────────────┤
│ Go │ CSP (M:N) │ Goroutine │ Runtime │ Channels │
│ Java 21+ │ M:N │ Virtual Thread│ JVM │ Backward compat│
│ Python │ Event loop │ Coroutine │ asyncio │ GIL limit │
│ Rust │ async/await │ Future/Task │ tokio etc. │ Zero-cost │
│ JavaScript │ Event loop │ Promise │ libuv/V8 │ Single thread│
│ Erlang │ Actor (M:N) │ Process │ BEAM VM │ Fault tolerant│
│ Kotlin │ Coroutines │ Coroutine │ Dispatcher │ Structured │
└─────────────┴─────────────┴───────────────┴──────────────┴──────────────┘
3. Event Loop Deep Dive
3.1 Node.js Event Loop Phases
┌───────────────────────────┐
│ timers │ ← setTimeout, setInterval
│ (execute timer cbs) │
├───────────────────────────┤
│ pending callbacks │ ← system error callbacks
│ (deferred I/O cbs) │
├───────────────────────────┤
│ idle, prepare │ ← internal use
├───────────────────────────┤
│ poll │ ← I/O callbacks (most)
│ (wait for new I/O) │
├───────────────────────────┤
│ check │ ← setImmediate callbacks
├───────────────────────────┤
│ close callbacks │ ← socket.on('close')
└───────────────────────────┘
Between each phase the microtask queues are drained:
first the process.nextTick queue, then promise microtasks
(Promise.then, queueMicrotask)
3.2 Microtasks vs Macrotasks
console.log('1: sync');
setTimeout(() => console.log('2: macro (setTimeout)'), 0);
Promise.resolve().then(() => console.log('3: micro (Promise)'));
queueMicrotask(() => console.log('4: micro (queueMicrotask)'));
console.log('5: sync');
// Output order:
// 1: sync
// 5: sync
// 3: micro (Promise)
// 4: micro (queueMicrotask)
// 2: macro (setTimeout)
Rule: Synchronous code first, then all microtasks, then one macrotask, repeat.
3.3 libuv Architecture
┌──────────────────────────────────────────┐
│ Node.js │
│ ┌─────────────────────────────────┐ │
│ │ JavaScript (V8 Engine) │ │
│ │ ┌───────────┐ ┌────────────┐ │ │
│ │ │ Async │ │ Event │ │ │
│ │ │ APIs │ │ Queue │ │ │
│ │ └─────┬─────┘ └──────┬─────┘ │ │
│ └────────┼────────────────┼────────┘ │
│ │ Event Loop │ │
│ ┌────────▼────────────────▼────────┐ │
│ │ libuv │ │
│ │ ┌──────────┐ ┌──────────────┐ │ │
│ │ │ Thread │ │ epoll/kqueue │ │ │
│ │ │ Pool │ │ /IOCP │ │ │
│ │ │ (4 thds) │ │ (non-block) │ │ │
│ │ └──────────┘ └──────────────┘ │ │
│ └───────────────────────────────────┘ │
│ │ │ │
│ File I/O Network I/O │
│ DNS lookup TCP/UDP │
│ Crypto HTTP │
└──────────────────────────────────────────┘
File I/O and DNS are handled via the thread pool, while network I/O uses the OS's async mechanisms (epoll, kqueue, IOCP).
4. Coroutines and Structured Concurrency
4.1 Python async/await
import asyncio

async def process_item(item: str) -> str:
    await asyncio.sleep(1)  # I/O simulation
    return f"Processed: {item}"

async def main():
    items = ["a", "b", "c", "d", "e"]

    # Concurrent execution
    tasks = [process_item(item) for item in items]
    results = await asyncio.gather(*tasks)
    print(results)  # All complete in about 1 second

    # Timeout
    try:
        result = await asyncio.wait_for(
            process_item("slow"),
            timeout=0.5
        )
    except asyncio.TimeoutError:
        print("Timeout!")

    # Semaphore to limit concurrency
    semaphore = asyncio.Semaphore(3)

    async def limited_process(item):
        async with semaphore:
            return await process_item(item)

    results = await asyncio.gather(*[limited_process(i) for i in items])

asyncio.run(main())
4.2 Kotlin Coroutines and Structured Concurrency
import kotlinx.coroutines.*

data class User(val id: Int, val name: String)

suspend fun fetchUser(id: Int): User {
    delay(1000)
    return User(id, "User $id")
}

// Stub so the example compiles; replace with a real call.
suspend fun fetchOrders(id: Int): List<String> {
    delay(1000)
    return listOf("Order-$id")
}

fun main() = runBlocking {
    // Structured concurrency: parent cancel propagates to children
    coroutineScope {
        val user = async { fetchUser(1) }
        val orders = async { fetchOrders(1) }
        println("User: ${user.await()}")
        println("Orders: ${orders.await()}")
    } // Waits for all child coroutines

    // supervisorScope: child failure doesn't affect siblings
    supervisorScope {
        val job1 = launch {
            delay(100)
            throw RuntimeException("Failed!")
        }
        val job2 = launch {
            delay(200)
            println("Job2 completed") // Still runs
        }
    }
}
4.3 Structured Concurrency Principles
Traditional concurrency: Structured concurrency:
┌─────────────┐ ┌─────────────┐
│ Parent │ │ Parent │
│ ┌───┐ ┌───┐│ │ ┌───┐ ┌───┐│
│ │ A │ │ B ││ │ │ A │ │ B ││
│ └─┬─┘ └─┬─┘│ │ └─┬─┘ └─┬─┘│
└────┼─────┼───┘ └────┼─────┼───┘
│ │ ← Who manages? │ │ ← Parent manages
▼ ▼ Leak possible! ▼ ▼ Auto cleanup!
Rules:
1. All concurrent work starts within a scope
2. Scope waits for all child tasks to complete
3. Parent cancellation cancels all children
4. Child errors propagate to parent
5. Actor Model
5.1 Concept
In the actor model, each actor:
- Has its own state (not shared with other actors)
- Sends and receives messages asynchronously
- Can create new actors, change state, or send messages when processing a message
┌──────────┐ message ┌──────────┐
│ Actor A │ ──────────────→ │ Actor B │
│ │ │ │
│ State: x │ message │ State: y │
│ Mailbox │ ←────────────── │ Mailbox │
└──────────┘ └──────────┘
│
│ message
▼
┌──────────┐
│ Actor C │
│ State: z │
│ Mailbox │
└──────────┘
Key: No shared state, no locks, only message passing
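As a sketch of the mailbox idea in Python, the hypothetical CounterActor below (mirroring the Erlang counter module shown earlier) owns its state privately; the only way to read or change the count is to send it a message and await the reply.

```python
import asyncio

class CounterActor:
    """Minimal actor sketch: private state, a mailbox, message passing only."""

    def __init__(self):
        self._mailbox = asyncio.Queue()
        self._count = 0  # private state, never shared directly

    async def run(self):
        while True:
            msg, reply = await self._mailbox.get()
            if msg == "increment":
                self._count += 1
                reply.set_result(self._count)
            elif msg == "get":
                reply.set_result(self._count)
            elif msg == "stop":
                reply.set_result(None)
                return

    async def ask(self, msg):
        # Send a message and await the reply: the only way to touch state.
        reply = asyncio.get_running_loop().create_future()
        await self._mailbox.put((msg, reply))
        return await reply

async def demo():
    actor = CounterActor()
    runner = asyncio.create_task(actor.run())
    assert await actor.ask("increment") == 1
    assert await actor.ask("increment") == 2
    value = await actor.ask("get")
    await actor.ask("stop")
    await runner
    return value

final = asyncio.run(demo())
```

No locks appear anywhere: because the actor processes one message at a time, its state can never be touched concurrently.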
5.2 Erlang/OTP Supervision Trees
┌──────────────┐
│ Supervisor │
│ (one_for_one)│
└──────┬───────┘
┌────┼────┐
▼ ▼ ▼
┌─────┐ ┌─────┐ ┌─────┐
│ W1 │ │ W2 │ │ W3 │
└─────┘ └─────┘ └─────┘
When W2 crashes:
1. Supervisor detects it
2. Only W2 restarts (one_for_one strategy)
3. W1, W3 are unaffected
4. If restarts exceed the configured restart intensity (for example, more than 5 restarts within 5 seconds), the Supervisor itself terminates and the failure escalates
5.3 When to Use the Actor Model
Good fit:
- Managing millions of independent states (chat rooms, IoT devices)
- Distributed systems (inter-node communication)
- Fault-tolerance required (telecom, finance)
- Real-time systems (game servers, trading)
Poor fit:
- Simple CRUD applications
- Strong consistency required
- Transaction processing is key
- Order guarantees critical in pipelines
6. Synchronization Primitives
6.1 Mutex
// Go - Mutex
type SafeMap struct {
    mu   sync.Mutex
    data map[string]int
}

func (m *SafeMap) Set(key string, val int) {
    m.mu.Lock()
    defer m.mu.Unlock()
    m.data[key] = val
}

func (m *SafeMap) Get(key string) (int, bool) {
    m.mu.Lock()
    defer m.mu.Unlock()
    val, ok := m.data[key]
    return val, ok
}

// Java - ReentrantLock
import java.util.concurrent.locks.ReentrantLock;

public class SafeCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private int count = 0;

    public void increment() {
        lock.lock();
        try {
            count++;
        } finally {
            lock.unlock();
        }
    }
}
6.2 RWLock (Read-Write Lock)
type Cache struct {
    mu   sync.RWMutex
    data map[string]string
}

// Multiple goroutines can read simultaneously
func (c *Cache) Get(key string) (string, bool) {
    c.mu.RLock()
    defer c.mu.RUnlock()
    val, ok := c.data[key]
    return val, ok
}

// Write requires exclusive lock
func (c *Cache) Set(key, val string) {
    c.mu.Lock()
    defer c.mu.Unlock()
    c.data[key] = val
}
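Python's standard library has no reader-writer lock, but the same semantics can be hand-built from a Condition. Below is a minimal read-preferring sketch (class name is illustrative); readers wait only while a writer is active, writers wait for both readers and writers.

```python
import threading

class RWLock:
    """Minimal read-preferring reader-writer lock built on a Condition."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:          # readers wait only for an active writer
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # a waiting writer may now proceed

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True          # exclusive access

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()      # wake waiting readers and writers
```

Note that a read-preferring design can starve writers under constant read traffic; production implementations usually add writer preference or fairness.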
6.3 Semaphore
import asyncio

async def worker(name, semaphore):
    async with semaphore:
        print(f"{name} acquired semaphore")
        await asyncio.sleep(1)
        print(f"{name} released semaphore")

async def main():
    semaphore = asyncio.Semaphore(3)  # max 3 concurrent
    tasks = [worker(f"Worker-{i}", semaphore) for i in range(10)]
    await asyncio.gather(*tasks)

asyncio.run(main())
6.4 Condition Variable
type BoundedQueue struct {
    mu       sync.Mutex
    notEmpty *sync.Cond
    notFull  *sync.Cond
    items    []interface{}
    maxSize  int
}

func NewBoundedQueue(maxSize int) *BoundedQueue {
    q := &BoundedQueue{maxSize: maxSize}
    q.notEmpty = sync.NewCond(&q.mu)
    q.notFull = sync.NewCond(&q.mu)
    return q
}

func (q *BoundedQueue) Put(item interface{}) {
    q.mu.Lock()
    defer q.mu.Unlock()
    for len(q.items) == q.maxSize {
        q.notFull.Wait() // wait until space is available
    }
    q.items = append(q.items, item)
    q.notEmpty.Signal() // wake a consumer
}

func (q *BoundedQueue) Take() interface{} {
    q.mu.Lock()
    defer q.mu.Unlock()
    for len(q.items) == 0 {
        q.notEmpty.Wait() // wait until an item is available
    }
    item := q.items[0]
    q.items = q.items[1:]
    q.notFull.Signal() // wake a producer
    return item
}
7. Lock-Free Programming
7.1 CAS (Compare-And-Swap)
import "sync/atomic"

type LockFreeCounter struct {
    value int64
}

func (c *LockFreeCounter) Increment() int64 {
    for {
        old := atomic.LoadInt64(&c.value)
        next := old + 1
        if atomic.CompareAndSwapInt64(&c.value, old, next) {
            return next
        }
        // CAS failed, retry (another goroutine changed it first)
    }
}

func (c *LockFreeCounter) Get() int64 {
    return atomic.LoadInt64(&c.value)
}
7.2 The ABA Problem
Thread A: reads value A
Thread B: changes A to B
Thread B: changes B back to A
Thread A: CAS(A -> C) succeeds! (but value changed in between)
Solutions:
- Add version numbers (A,v1 -> B,v2 -> A,v3)
- Tagged pointers (AtomicStampedReference in Java)
- Hazard Pointers
7.3 When NOT to Use Lock-Free
Lock-free is good for:
- Very high contention scenarios
- Real-time systems (avoiding priority inversion)
- Simple data structures (counters, stacks, queues)
Avoid lock-free when (most cases):
- Regular applications (Mutex is sufficient)
- Complex data structures (trees, graphs)
- Correctness verification is difficult
- Team is not experienced with lock-free
"If a Mutex works, use a Mutex."
8. Common Concurrency Bugs
8.1 Race Condition
// Bug: multiple goroutines modifying counter simultaneously
var counter int

func buggyIncrement() {
    for i := 0; i < 1000; i++ {
        go func() {
            counter++ // read-modify-write is not atomic
        }()
    }
}

// Fix: use atomic or a mutex
// (a real program would also wait for the goroutines with a sync.WaitGroup)
var safeCounter int64

func safeIncrement() {
    for i := 0; i < 1000; i++ {
        go func() {
            atomic.AddInt64(&safeCounter, 1)
        }()
    }
}
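The same bug and fix exist in Python: the GIL does not make `counter += 1` atomic, since the statement compiles to several bytecodes and threads can interleave between them. A minimal sketch of the locked fix:

```python
import threading

counter = 0
lock = threading.Lock()

def safe_increment(n: int) -> None:
    global counter
    for _ in range(n):
        with lock:  # make the read-modify-write a single critical section
            counter += 1

threads = [threading.Thread(target=safe_increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # always 40000 with the lock held around each increment
```

Removing the `with lock:` line reintroduces the race: the final count may come up short because concurrent read-modify-write cycles overwrite each other.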
8.2 Deadlock
Four necessary conditions (all must hold for deadlock):
1. Mutual Exclusion: resource used by only one at a time
2. Hold and Wait: holding resource while waiting for another
3. No Preemption: cannot forcibly take held resource
4. Circular Wait: A waits for B, B waits for C, C waits for A
// Deadlock example
var mu1, mu2 sync.Mutex

func goroutine1() {
    mu1.Lock()
    defer mu1.Unlock()
    time.Sleep(time.Millisecond)
    mu2.Lock() // waiting for mu2 (held by goroutine2)
    defer mu2.Unlock()
}

func goroutine2() {
    mu2.Lock()
    defer mu2.Unlock()
    time.Sleep(time.Millisecond)
    mu1.Lock() // waiting for mu1 (held by goroutine1)
    defer mu1.Unlock()
}

// Fix: always acquire locks in the same order
func fixed1() {
    mu1.Lock()
    defer mu1.Unlock()
    mu2.Lock()
    defer mu2.Unlock()
}

func fixed2() {
    mu1.Lock() // mu1 first (same order)
    defer mu1.Unlock()
    mu2.Lock()
    defer mu2.Unlock()
}
8.3 Livelock
A and B meet in a narrow corridor:
A: steps left <-> B: steps right (same direction!)
A: steps right <-> B: steps left (same direction again!)
... (infinite loop, nobody progresses)
Deadlock: nothing moves at all.
Livelock: everything keeps moving, but nothing makes progress.
8.4 Starvation
High-priority threads continuously hold resources, preventing low-priority threads from ever executing.
8.5 Priority Inversion
1997 Mars Pathfinder incident:
Low-priority task holds mutex
|
Medium-priority task preempts Low (unrelated to mutex)
|
High-priority task waits for mutex... forever!
(Low needs to run to release mutex, but Medium keeps preempting)
Solution: Priority Inheritance Protocol
Low temporarily inherits High's priority while holding the mutex
8.6 Detection Tools
# Go: Race Detector
go test -race ./...
go run -race main.go

# C/C++: ThreadSanitizer
clang -fsanitize=thread -g program.c

# Java: thread dumps report deadlocks ("Found one Java-level deadlock")
jstack <pid>

# Rust: the compiler catches data races at compile time!
# Send/Sync trait violations = compile error

# Python: asyncio debug mode
PYTHONASYNCIODEBUG=1 python app.py

# General: Helgrind (Valgrind tool)
valgrind --tool=helgrind ./program
9. Practical Patterns
9.1 Producer-Consumer
func producerConsumer() {
    ch := make(chan int, 100) // buffered channel

    // Producers
    var prodWg sync.WaitGroup
    for i := 0; i < 3; i++ {
        prodWg.Add(1)
        go func(id int) {
            defer prodWg.Done()
            for j := 0; j < 10; j++ {
                ch <- id*100 + j
            }
        }(i)
    }

    go func() {
        prodWg.Wait()
        close(ch)
    }()

    // Consumers
    var consWg sync.WaitGroup
    for i := 0; i < 2; i++ {
        consWg.Add(1)
        go func(id int) {
            defer consWg.Done()
            for item := range ch {
                fmt.Printf("Consumer %d: %d\n", id, item)
            }
        }(i)
    }
    consWg.Wait()
}
9.2 Thread Pool / Worker Pool
import asyncio
from asyncio import Queue

async def process(item):
    # Stand-in for real work, e.g. an I/O call
    await asyncio.sleep(0.1)

async def worker(name: str, queue: Queue):
    while True:
        item = await queue.get()
        try:
            await process(item)
        finally:
            queue.task_done()

async def main():
    queue = Queue(maxsize=100)

    # Start 5 workers
    workers = [
        asyncio.create_task(worker(f"worker-{i}", queue))
        for i in range(5)
    ]

    # Add tasks
    for item in range(50):
        await queue.put(item)

    await queue.join()

    for w in workers:
        w.cancel()

asyncio.run(main())
9.3 Work Stealing
Traditional thread pool: Work Stealing:
┌─────────────┐ ┌─────────────┐
│ Shared queue │ │ Worker 1 │
│ [T1|T2|T3] │ │ local:[T1] │
│ │ ├─────────────┤
│ All workers │ │ Worker 2 │
│ contend! │ │ local:[T2] │ ← steals from W1 when empty
│ │ ├─────────────┤
└─────────────┘ │ Worker 3 │
│ local:[T3] │
└─────────────┘
Benefits: Better cache locality, reduced contention
Used by: Java ForkJoinPool, Go runtime, Tokio (Rust)
10. Performance
10.1 Amdahl's Law
Speedup = 1 / ((1 - P) + P/N)
P = fraction that can be parallelized
N = number of processors
Example: 95% parallelizable program on 64 cores
Speedup = 1 / (0.05 + 0.95/64) = 1 / 0.0648 = 15.4x
Key insight: No matter how many cores, the serial portion (5%)
is the bottleneck! Maximum theoretical speedup = 1/0.05 = 20x
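The formula is easy to sanity-check in code; the worked example above reproduces directly:

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Speedup = 1 / ((1 - p) + p / n) for parallel fraction p on n processors."""
    return 1 / ((1 - p) + p / n)

# 95% parallelizable on 64 cores, as in the worked example:
print(round(amdahl_speedup(0.95, 64), 1))  # 15.4
# Upper bound as n grows without limit is 1 / (1 - p):
print(round(1 / (1 - 0.95), 1))            # 20.0
```

Plotting this for increasing n shows the curve flattening quickly: going from 64 to 1024 cores buys very little once the serial fraction dominates.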
10.2 Gustafson's Law
Speedup = N - alpha * (N - 1)
alpha = fraction of serial work
N = number of processors
Key difference:
- Amdahl: "Fixed problem size, add more cores?"
- Gustafson: "Fixed cores, grow the problem?"
In reality, as data grows, the parallel portion also grows.
Gustafson is more optimistic and realistic.
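Gustafson's scaled speedup is likewise a one-liner. With a 5% serial fraction on 64 cores it predicts roughly 60.9x, far above Amdahl's 15.4x for the same numbers, which is exactly the optimism described above:

```python
def gustafson_speedup(alpha: float, n: int) -> float:
    """Scaled speedup = n - alpha * (n - 1) for serial fraction alpha on n processors."""
    return n - alpha * (n - 1)

print(round(gustafson_speedup(0.05, 64), 2))  # 60.85
```

The difference comes from the assumption: Gustafson holds runtime fixed and lets the problem grow with the core count, so the serial fraction shrinks relative to total work.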
10.3 Context Switch Cost
Context Switch Cost Comparison:
┌───────────────────────────┬──────────────┐
│ Switch Type │ Approx Cost │
├───────────────────────────┼──────────────┤
│ Function call │ ~1-5 ns │
│ Goroutine switch │ ~100-200 ns │
│ Coroutine switch (Python) │ ~1-5 us │
│ Thread switch (same proc) │ ~1-10 us │
│ Process switch │ ~10-50 us │
│ VM switch │ ~100+ us │
└───────────────────────────┴──────────────┘
10.4 False Sharing (Cache Line Problem)
Problem: Different variables on the same cache line
CPU Core 1 CPU Core 2
| |
+---v-------------------v---+
| Cache Line (64 bytes) |
| [counter_A] [counter_B] | <- Different variables but
| | same cache line!
+---------------------------+
Core 1 modifies counter_A -> invalidates cache line -> Core 2 must reload
Core 2 modifies counter_B -> invalidates cache line -> Core 1 must reload
Fix: Padding to separate cache lines
// Preventing false sharing in Go
type PaddedCounter struct {
    value int64
    _pad  [56]byte // pads the struct to a full 64-byte cache line
}

type Counters struct {
    c1 PaddedCounter
    c2 PaddedCounter
}
11. Interview Questions (15)
1. What is the difference between concurrency and parallelism?
Concurrency is about structuring tasks so they can alternate; parallelism is physically executing them simultaneously. Concurrency is possible on a single core; parallelism is not.

2. What is the difference between a process and a thread?
Processes have independent memory spaces; threads share memory within a process. Threads are cheaper to create and switch but require synchronization.

3. What are the 4 necessary conditions for deadlock, and how can each be broken?
Mutual exclusion, hold and wait, no preemption, and circular wait. Ordered lock acquisition (breaking circular wait) is the most practical fix.

4. Why are Go goroutines lighter than OS threads?
They start with about a 2KB stack (OS threads 1MB+), the M:N scheduler maps millions of goroutines onto a few OS threads, and context switches happen in user space.

5. What is Python's GIL and why does it exist?
The Global Interpreter Lock. CPython's reference-counting memory management is not thread-safe, so one lock serializes bytecode execution. Use multiprocessing for CPU-bound parallel work.

6. Pros and cons of event loops vs multithreading?
Event loop: simpler model and no context-switch overhead, but poor for CPU-bound work. Multithreading: maximizes CPU utilization, but brings complex synchronization and deadlock risk.

7. How is async/await different from threads?
async/await is cooperative multitasking that switches only at await points. Threads are preemptive; the OS can switch at any time.

8. Pros and cons of the actor model?
Pros: no shared state, no locks or deadlocks, a natural fit for distributed systems. Cons: message ordering is hard to reason about, debugging is complex, and message passing has overhead.

9. What is CAS, and when should it replace a Mutex?
Compare-And-Swap is an atomic compare-and-exchange operation. It beats a Mutex for simple counters or pointer updates, but is unsuitable for complex logic.

10. What is false sharing and how do you fix it?
Performance degradation when different cores modify different variables that share a cache line. Fix it by padding the variables onto separate cache lines.

11. What does Amdahl's Law tell us?
The serial portion determines the upper bound on speedup. Reducing the serial fraction may matter more than optimizing the parallel work.

12. What is the difference between Java Virtual Threads and Go goroutines?
Both are M:N models, but Go builds on channel-based CSP while Java keeps compatibility with the existing Thread API. Go shipped goroutines first; Java 21 made virtual threads standard.

13. What is Structured Concurrency?
Binding concurrent tasks to a scope that manages their lifecycle: parent cancellation propagates to children, and the parent waits for all children. This prevents goroutine/thread leaks.

14. Explain Erlang's "Let it crash" philosophy.
Rather than handling errors defensively, terminate the failing process quickly and let its supervisor restart it. This eliminates complex error-recovery code.

15. What is the right thread pool size?
CPU-bound: roughly the number of CPU cores. I/O-bound: cores * (1 + wait_time/compute_time). Little's Law (L = lambda * W) helps estimate concurrency demand.
12. Practice Quiz
Q1: What is the output order of this JavaScript code?
console.log('A');
setTimeout(() => console.log('B'), 0);
Promise.resolve().then(() => console.log('C'));
console.log('D');
Answer: A, D, C, B. Synchronous code (A, D) first, then microtasks (Promise: C), then macrotasks (setTimeout: B).
Q2: Why does this Go code deadlock?
func main() {
    ch := make(chan int)
    ch <- 42
    fmt.Println(<-ch)
}
Answer: An unbuffered channel requires both sender and receiver to be present simultaneously. The main goroutine blocks at ch <- 42 waiting for a receiver and can never reach <-ch.
Q3: What is the difference between asyncio.gather and asyncio.wait in Python?
Answer: gather returns all results in order and raises an exception if one fails (configurable with return_exceptions=True). wait returns sets of completed/pending tasks with FIRST_COMPLETED, FIRST_EXCEPTION, and ALL_COMPLETED options. wait is more flexible.
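A runnable sketch of the difference (task names and delays are illustrative):

```python
import asyncio

async def work(name: str, delay: float, fail: bool = False) -> str:
    await asyncio.sleep(delay)
    if fail:
        raise RuntimeError(f"{name} failed")
    return f"{name} ok"

async def demo():
    # gather: results come back in argument order; return_exceptions=True
    # turns a failure into a returned value instead of raising.
    gathered = await asyncio.gather(
        work("slow", 0.02),
        work("fast", 0.01),
        work("bad", 0.01, fail=True),
        return_exceptions=True,
    )
    # wait: returns (done, pending) sets and supports return_when options.
    tasks = [asyncio.create_task(work("slow", 0.05)),
             asyncio.create_task(work("fast", 0.01))]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for t in pending:
        t.cancel()
    return gathered, {t.result() for t in done}

gathered, first_done = asyncio.run(demo())
print(gathered)    # ['slow ok', 'fast ok', RuntimeError('bad failed')]
print(first_done)  # {'fast ok'}
```

Note that `asyncio.wait` requires Task objects (passing bare coroutines is no longer supported in recent Python versions), which is part of why `gather` is the more convenient default.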
Q4: According to Amdahl's Law, what is the maximum speedup for a program that is 90% parallelizable?
Answer: Even with infinite cores, maximum speedup = 1/(1-0.9) = 1/0.1 = 10x. The 10% serial portion becomes the bottleneck. With 64 cores: 1/(0.1 + 0.9/64) = approximately 8.8x.
Q5: How does Rust prevent data races at compile time?
Answer: Rust's ownership system and Send/Sync traits are the key. Send means ownership can be transferred between threads; Sync means references can be shared between threads. Only one mutable reference can exist at a time (borrow checker), which prevents data races at compile time entirely.
References
- Rob Pike - Concurrency Is Not Parallelism (Talk)
- The Art of Multiprocessor Programming (Herlihy & Shavit)
- Java Concurrency in Practice (Goetz et al.)
- JEP 444: Virtual Threads
- Node.js Event Loop Documentation
- Python asyncio Documentation
- Tokio Tutorial (Rust)
- Erlang/OTP Design Principles
- Go Concurrency Patterns
- Kotlin Coroutines Guide
- Lock-Free Programming (Preshing)
- Amdahl's Law (Wikipedia)
- LMAX Disruptor (Lock-Free Queue)
- Structured Concurrency (Wikipedia)
- Thread Sanitizer (Google)