Split View: OS 핵심 개념 완전 정리: 개발자 면접에 나오는 운영체제의 모든 것

OS 핵심 개념 완전 정리: 개발자 면접에 나오는 운영체제의 모든 것

들어가며
1. 왜 OS 지식이 중요한가
- 면접에서 빈출되는 이유
- 실무에서의 중요성
2. 프로세스 관리
3. 스레드 (Thread)
4. CPU 스케줄링
5. 동기화 (Synchronization)
6. 데드락 (Deadlock)
7. 메모리 관리
8. 가상 메모리 (Virtual Memory)
9. 파일 시스템
10. I/O 관리
11. 리눅스 커널 기초
12. 컨테이너의 OS 관점
13. 면접 질문 25선
14. 퀴즈
참고 자료

들어가며

운영체제(OS) 지식은 개발자 면접에서 가장 자주 출제되는 CS 기초 영역 중 하나입니다. 단순히 면접 대비를 넘어, OS 개념을 깊이 이해하면 성능 최적화, 동시성 버그 해결, 시스템 설계에서 확실한 차이를 만들 수 있습니다.

이 글에서는 프로세스 관리, 스레드, CPU 스케줄링, 동기화, 데드락, 메모리 관리, 가상 메모리, 파일 시스템, I/O 관리, 리눅스 커널 기초, 그리고 컨테이너의 OS 관점까지 — 개발자가 알아야 할 운영체제의 모든 것을 실전 코드와 함께 체계적으로 정리합니다.

1. 왜 OS 지식이 중요한가

면접에서 빈출되는 이유

거의 모든 기술 면접에서 OS 질문이 등장합니다. 특히 다음과 같은 질문들이 자주 출제됩니다.

프로세스와 스레드의 차이를 설명하세요
데드락의 4가지 조건과 해결 방법은?
가상 메모리가 무엇이고 왜 필요한가?
컨텍스트 스위치 비용을 줄이는 방법은?
뮤텍스와 세마포어의 차이는?

실무에서의 중요성

성능 최적화: CPU 캐시, 메모리 계층, I/O 패턴 이해가 필수
동시성 프로그래밍: 레이스 컨디션, 데드락 방지를 위한 동기화 이해
시스템 설계: 프로세스 간 통신, 분산 시스템의 기초
트러블슈팅: strace, perf, eBPF 등 시스템 도구 활용
컨테이너/클라우드: namespace, cgroups 이해가 Docker/K8s 활용의 핵심

2. 프로세스 관리

프로세스란?

프로세스는 실행 중인 프로그램의 인스턴스입니다. 각 프로세스는 독립적인 메모리 공간을 가집니다.

PCB (Process Control Block)

┌─────────────────────────────────┐
│         PCB (Process            │
│         Control Block)          │
├─────────────────────────────────┤
│  Process ID (PID)               │
│  Process State                  │
│  Program Counter (PC)           │
│  CPU Registers                  │
│  CPU Scheduling Info            │
│  Memory Management Info         │
│  I/O Status Info                │
│  Accounting Info                │
└─────────────────────────────────┘

OS는 PCB를 통해 각 프로세스의 실행 상태를 관리합니다. 컨텍스트 스위치 시 현재 프로세스의 PCB를 저장하고, 다음 프로세스의 PCB를 복원합니다.

프로세스 상태 전이

         ┌──────────┐
    생성  │          │  종료
  ──────▶│  Ready   │──────▶
         │          │
         └────┬─────┘
              │ dispatch
              ▼
         ┌──────────┐     I/O 요청
         │ Running  │───────────┐
         │          │           │
         └────┬─────┘           ▼
              │           ┌──────────┐
     선점    │           │ Waiting  │
  (preempt)  │           │ (Blocked)│
              └───────────┤          │
                          └──────────┘
                          I/O 완료 시 Ready로

fork/exec — 프로세스 생성

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main() {
    pid_t pid = fork();  // 프로세스 복제

    if (pid < 0) {
        // fork 실패
        perror("fork failed");
        return 1;
    } else if (pid == 0) {
        // 자식 프로세스
        printf("Child PID: %d, Parent PID: %d\n", getpid(), getppid());
        // exec: 새로운 프로그램으로 교체
        execlp("ls", "ls", "-la", NULL);
        // exec 성공 시 이 줄은 실행되지 않음
        perror("exec failed");
    } else {
        // 부모 프로세스
        printf("Parent PID: %d, Child PID: %d\n", getpid(), pid);
        int status;
        waitpid(pid, &status, 0);  // 자식 프로세스 종료 대기
        printf("Child exited with status: %d\n", WEXITSTATUS(status));
    }
    return 0;
}

Copy-on-Write (COW)

fork() 시 부모와 자식은 처음에 같은 물리 페이지를 공유합니다. 한쪽이 쓰기를 시도하면 그때 페이지를 복사합니다. 이를 통해 불필요한 메모리 복사를 방지합니다.

IPC (Inter-Process Communication)

IPC 방식	특징	사용 사례
Pipe	단방향, 부모-자식 간	셸 파이프라인
Named Pipe (FIFO)	양방향, 비관련 프로세스 간	간단한 데이터 전달
Socket	양방향, 네트워크 지원	클라이언트-서버 통신
Shared Memory	가장 빠름, 동기화 필요	고성능 데이터 교환
Message Queue	비동기, 버퍼	작업 큐, 이벤트 시스템
Signal	비동기 알림	프로세스 제어 (SIGTERM, SIGKILL)

// Shared Memory 예시
#include <sys/mman.h>
#include <fcntl.h>
#include <string.h>

int main() {
    const char *name = "/my_shm";
    const int SIZE = 4096;

    // 공유 메모리 객체 생성
    int fd = shm_open(name, O_CREAT | O_RDWR, 0666);
    ftruncate(fd, SIZE);

    // 메모리 매핑
    void *ptr = mmap(0, SIZE, PROT_WRITE, MAP_SHARED, fd, 0);
    memcpy(ptr, "Hello from producer!", 20);

    // 정리: munmap, shm_unlink
    return 0;
}

# Python 멀티프로세싱 + 공유 메모리
from multiprocessing import Process, Value, Array

def worker(shared_counter, shared_array):
    shared_counter.value += 1
    shared_array[0] = 3.14

if __name__ == '__main__':
    counter = Value('i', 0)     # 정수 공유 변수
    arr = Array('d', [0.0, 0.0]) # 실수 공유 배열

    processes = [Process(target=worker, args=(counter, arr)) for _ in range(4)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()

    print(f"Counter: {counter.value}, Array: {arr[:]}")

3. 스레드 (Thread)

프로세스 vs 스레드

프로세스 A                프로세스 B
┌───────────────┐        ┌───────────────┐
│ Code  │ Data  │        │ Code  │ Data  │
│───────│───────│        │───────│───────│
│ Heap  │ Stack │        │ Heap  │ Stack │
│       │ (main)│        │       │ (main)│
└───────────────┘        └───────────────┘
  독립적 메모리 공간          독립적 메모리 공간

프로세스 C (멀티스레드)
┌───────────────────────────────┐
│ Code (공유) │  Data (공유)     │
│─────────────│─────────────────│
│ Heap (공유) │ Stack1 │ Stack2 │
│             │ (T1)   │ (T2)   │
└───────────────────────────────┘
  스레드는 Code, Data, Heap을 공유
  각 스레드는 독립적인 Stack만 보유

항목	프로세스	스레드
메모리 공간	독립	공유 (Code, Data, Heap)
생성 비용	높음	낮음
컨텍스트 스위치	비쌈 (TLB 플러시)	저렴
통신	IPC 필요	공유 메모리 직접 접근
안정성	한 프로세스 크래시가 다른 프로세스에 영향 없음	한 스레드 크래시 시 전체 프로세스 영향

커널 스레드 vs 유저 스레드

User-Level Thread (ULT):
┌─────────────────────────┐
│     User Space          │
│  Thread Library         │
│  ┌───┐ ┌───┐ ┌───┐    │
│  │ T1│ │ T2│ │ T3│    │
│  └───┘ └───┘ └───┘    │
│         |               │
│  ┌──────▼──────┐       │
│  │ Single      │       │
│  │ Kernel Thread│       │
│  └─────────────┘       │
└─────────────────────────┘
장점: 빠른 스위칭
단점: 하나가 블록되면 모두 블록

Kernel-Level Thread (KLT):
┌─────────────────────────┐
│     User Space          │
│  ┌───┐ ┌───┐ ┌───┐    │
│  │ T1│ │ T2│ │ T3│    │
│  └─┬─┘ └─┬─┘ └─┬─┘    │
└────┼──────┼──────┼──────┘
  ┌──▼──┐┌──▼──┐┌──▼──┐
  │ KT1 ││ KT2 ││ KT3 │
  └─────┘└─────┘└─────┘
장점: 진정한 병렬성
단점: 스위칭 비용 높음

POSIX pthread (C)

#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4

typedef struct {
    int thread_id;
    int start;
    int end;
    long result;
} ThreadArg;

void* sum_range(void* arg) {
    ThreadArg* targ = (ThreadArg*)arg;
    targ->result = 0;
    for (int i = targ->start; i <= targ->end; i++) {
        targ->result += i;
    }
    printf("Thread %d: sum(%d..%d) = %ld\n",
           targ->thread_id, targ->start, targ->end, targ->result);
    return NULL;
}

int main() {
    pthread_t threads[NUM_THREADS];
    ThreadArg args[NUM_THREADS];
    int range_per_thread = 250;  // 1-1000을 4등분

    for (int i = 0; i < NUM_THREADS; i++) {
        args[i].thread_id = i;
        args[i].start = i * range_per_thread + 1;
        args[i].end = (i + 1) * range_per_thread;
        pthread_create(&threads[i], NULL, sum_range, &args[i]);
    }

    long total = 0;
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_join(threads[i], NULL);
        total += args[i].result;
    }

    printf("Total sum: %ld\n", total);  // 500500
    return 0;
}

Go goroutine

package main

import (
    "fmt"
    "sync"
)

func main() {
    var wg sync.WaitGroup
    results := make(chan int, 4)

    for i := 0; i < 4; i++ {
        wg.Add(1)
        go func(id, start, end int) {
            defer wg.Done()
            sum := 0
            for j := start; j <= end; j++ {
                sum += j
            }
            results <- sum
            fmt.Printf("Goroutine %d: sum(%d..%d) = %d\n", id, start, end, sum)
        }(i, i*250+1, (i+1)*250)
    }

    go func() {
        wg.Wait()
        close(results)
    }()

    total := 0
    for r := range results {
        total += r
    }
    fmt.Printf("Total: %d\n", total)
}

Go의 goroutine은 수 KB의 스택으로 시작하며, Go 런타임 스케줄러가 M:N 스레딩(M개 goroutine을 N개 OS 스레드에 매핑)을 관리합니다.

Python GIL (Global Interpreter Lock)

import threading
import multiprocessing
import time

# CPU-bound 작업: GIL로 인해 스레드가 이점 없음
def cpu_bound(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

# 스레드 방식 (GIL 제한)
def thread_test():
    threads = [threading.Thread(target=cpu_bound, args=(10_000_000,))
               for _ in range(4)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(f"Threads: {time.time() - start:.2f}s")

# 프로세스 방식 (GIL 우회)
def process_test():
    processes = [multiprocessing.Process(target=cpu_bound, args=(10_000_000,))
                 for _ in range(4)]
    start = time.time()
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    print(f"Processes: {time.time() - start:.2f}s")

# I/O-bound 작업: 스레드가 유효
# CPU-bound 작업: 멀티프로세싱 또는 C 확장 사용

Python 3.13+에서는 실험적으로 GIL-free 빌드가 지원되기 시작했습니다 (PEP 703).

4. CPU 스케줄링

스케줄링 알고리즘 비교

알고리즘	특징	장점	단점
FCFS	먼저 온 순서대로	구현 간단	호위 효과(convoy effect)
SJF	최단 작업 우선	최소 평균 대기 시간	긴 작업 기아(starvation)
Round Robin	시간 할당량 순환	공평, 응답 시간 좋음	시간 할당량 설정이 중요
Priority	우선순위 기반	중요 작업 빠른 처리	기아 문제 (aging으로 해결)
MLFQ	다단계 피드백 큐	적응적, 범용적	복잡한 구현
CFS	리눅스 기본	공평한 CPU 시간 배분	레이턴시 보장 어려움

CFS (Completely Fair Scheduler) — 리눅스 기본 스케줄러

CFS는 모든 프로세스에 공평한 CPU 시간을 할당하는 것을 목표로 합니다. 레드-블랙 트리를 사용하여 vruntime(가상 실행 시간)이 가장 작은 프로세스를 다음에 실행합니다.

Red-Black Tree (CFS)
       ┌───┐
       │ 8 │  vruntime이 가장 작은 노드가
       └─┬─┘  항상 가장 왼쪽에 위치
      ┌──┴──┐
    ┌─┤    ┌┤
    │5│    │12│
    └─┘    └──┘
   ↑
  다음 실행 대상

# nice 값으로 프로세스 우선순위 조정 (-20 ~ +19)
nice -n 10 ./my_program        # 낮은 우선순위로 실행
renice -n -5 -p 1234          # PID 1234의 우선순위 상향

# 실시간 스케줄링 정책 설정
chrt -f 50 ./realtime_program   # FIFO, 우선순위 50
chrt -r 50 ./realtime_program   # Round Robin, 우선순위 50

# 현재 스케줄링 정보 확인
chrt -p 1234

cgroups로 CPU 자원 제한

# cgroups v2로 CPU 사용량 제한
# /sys/fs/cgroup에 새 그룹 생성
mkdir /sys/fs/cgroup/my_group

# CPU 대역폭 제한: 50ms 주기에서 최대 25ms 사용 (50%)
echo "25000 50000" > /sys/fs/cgroup/my_group/cpu.max

# 프로세스를 cgroup에 추가
echo 1234 > /sys/fs/cgroup/my_group/cgroup.procs

# CPU 가중치 설정 (기본 100, 범위 1-10000)
echo 200 > /sys/fs/cgroup/my_group/cpu.weight

5. 동기화 (Synchronization)

레이스 컨디션 (Race Condition)

// 레이스 컨디션 예시 — 동기화 없는 카운터
#include <pthread.h>
#include <stdio.h>

int counter = 0;  // 공유 변수

void* increment(void* arg) {
    for (int i = 0; i < 1000000; i++) {
        counter++;  // 원자적이지 않음! (read -> modify -> write)
    }
    return NULL;
}

int main() {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("Counter: %d (expected 2000000)\n", counter);
    // 실제로는 2000000보다 작은 값 출력
    return 0;
}

임계 영역 (Critical Section)

임계 영역은 공유 자원에 접근하는 코드 블록입니다. 다음 3가지 조건을 만족해야 합니다.

상호 배제 (Mutual Exclusion): 한 번에 하나의 프로세스/스레드만 접근
진행 (Progress): 임계 영역에 아무도 없으면 대기 중인 프로세스가 진입 가능
한정 대기 (Bounded Waiting): 무한 대기 방지

뮤텍스 (Mutex)

#include <pthread.h>
#include <stdio.h>

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
int counter = 0;

void* safe_increment(void* arg) {
    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&lock);     // 잠금 획득
        counter++;                      // 임계 영역
        pthread_mutex_unlock(&lock);   // 잠금 해제
    }
    return NULL;
}

int main() {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, safe_increment, NULL);
    pthread_create(&t2, NULL, safe_increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("Counter: %d\n", counter);  // 정확히 2000000
    return 0;
}

세마포어 (Semaphore)

#include <semaphore.h>
#include <pthread.h>
#include <stdio.h>

sem_t semaphore;

void* worker(void* arg) {
    int id = *(int*)arg;
    printf("Thread %d waiting...\n", id);
    sem_wait(&semaphore);     // 세마포어 값 감소, 0이면 대기
    printf("Thread %d entered critical section\n", id);
    sleep(2);                 // 작업 수행
    printf("Thread %d leaving\n", id);
    sem_post(&semaphore);     // 세마포어 값 증가
    return NULL;
}

int main() {
    sem_init(&semaphore, 0, 3);  // 동시에 3개 스레드 허용

    pthread_t threads[10];
    int ids[10];
    for (int i = 0; i < 10; i++) {
        ids[i] = i;
        pthread_create(&threads[i], NULL, worker, &ids[i]);
    }
    for (int i = 0; i < 10; i++) {
        pthread_join(threads[i], NULL);
    }
    sem_destroy(&semaphore);
    return 0;
}

스핀락 (Spinlock)

#include <pthread.h>

pthread_spinlock_t spinlock;
int counter = 0;

void* spin_increment(void* arg) {
    for (int i = 0; i < 1000000; i++) {
        pthread_spin_lock(&spinlock);    // 잠금 획득까지 busy-wait
        counter++;
        pthread_spin_unlock(&spinlock);
    }
    return NULL;
}
// 스핀락은 임계 영역이 매우 짧을 때 유용
// 컨텍스트 스위치 비용 > 스핀 대기 비용인 경우에 사용

뮤텍스 vs 세마포어 vs 스핀락

특성	뮤텍스	세마포어	스핀락
동시 접근 수	1	N	1
대기 방식	Sleep (블로킹)	Sleep (블로킹)	Busy-wait
소유권	있음 (잠금한 스레드만 해제)	없음	있음
적합한 경우	일반적 상호 배제	자원 풀 관리	짧은 임계 영역
컨텍스트 스위치	발생	발생	없음

Read-Write Lock

import threading

class ReadWriteLock:
    def __init__(self):
        self._read_ready = threading.Condition(threading.Lock())
        self._readers = 0
        self._writers = 0

    def acquire_read(self):
        with self._read_ready:
            while self._writers > 0:
                self._read_ready.wait()
            self._readers += 1

    def release_read(self):
        with self._read_ready:
            self._readers -= 1
            if self._readers == 0:
                self._read_ready.notify_all()

    def acquire_write(self):
        with self._read_ready:
            while self._readers > 0 or self._writers > 0:
                self._read_ready.wait()
            self._writers += 1

    def release_write(self):
        with self._read_ready:
            self._writers -= 1
            self._read_ready.notify_all()

6. 데드락 (Deadlock)

데드락의 4가지 필요 조건

  Thread A              Thread B
  ┌──────┐             ┌──────┐
  │Lock X│────대기────▶│Lock Y│
  │보유   │             │보유   │
  └──────┘◀────대기────└──────┘

상호 배제 (Mutual Exclusion): 자원을 한 번에 하나의 프로세스만 사용
점유와 대기 (Hold and Wait): 자원을 보유한 채 다른 자원을 대기
비선점 (No Preemption): 다른 프로세스의 자원을 강제로 빼앗을 수 없음
순환 대기 (Circular Wait): 프로세스들이 순환적으로 자원을 대기

네 조건이 모두 만족되어야 데드락이 발생합니다. 하나라도 깨면 데드락을 방지할 수 있습니다.

데드락 예시 (Python)

import threading
import time

lock_a = threading.Lock()
lock_b = threading.Lock()

def thread_1():
    print("Thread 1: Acquiring lock_a")
    lock_a.acquire()
    time.sleep(0.1)  # 타이밍 이슈를 위한 지연
    print("Thread 1: Acquiring lock_b")
    lock_b.acquire()  # 데드락! Thread 2가 lock_b를 보유 중
    lock_b.release()
    lock_a.release()

def thread_2():
    print("Thread 2: Acquiring lock_b")
    lock_b.acquire()
    time.sleep(0.1)
    print("Thread 2: Acquiring lock_a")
    lock_a.acquire()  # 데드락! Thread 1이 lock_a를 보유 중
    lock_a.release()
    lock_b.release()

# 데드락 발생!
t1 = threading.Thread(target=thread_1)
t2 = threading.Thread(target=thread_2)
t1.start()
t2.start()

데드락 해결 전략

1. 예방 (Prevention) — 4가지 조건 중 하나를 제거

# 순환 대기 제거: 락 순서 강제
def safe_thread_1():
    lock_a.acquire()   # 항상 lock_a를 먼저
    lock_b.acquire()
    # ... 작업 ...
    lock_b.release()
    lock_a.release()

def safe_thread_2():
    lock_a.acquire()   # 항상 lock_a를 먼저 (같은 순서)
    lock_b.acquire()
    # ... 작업 ...
    lock_b.release()
    lock_a.release()

2. 회피 (Avoidance) — Banker's Algorithm

시스템이 안전 상태(safe state)를 유지하도록 자원 할당 결정. 각 프로세스의 최대 자원 요구를 미리 알아야 합니다.

3. 탐지 및 복구 (Detection and Recovery)

자원 할당 그래프에서 사이클을 탐지하고, 데드락 발견 시 프로세스 종료 또는 자원 회수.

4. 무시 (Ostrich Algorithm)

데드락이 매우 드물게 발생하면, 발생 시 시스템을 재시작. 대부분의 범용 OS가 이 전략을 사용합니다.

7. 메모리 관리

메모리 계층 구조

┌────────────────────┐  속도: 매우 빠름
│    CPU Registers   │  용량: 수 KB
├────────────────────┤
│    L1 Cache        │  ~1ns, 64KB
├────────────────────┤
│    L2 Cache        │  ~4ns, 256KB
├────────────────────┤
│    L3 Cache        │  ~12ns, 수 MB
├────────────────────┤
│    Main Memory     │  ~100ns, 수 GB
│    (DRAM)          │
├────────────────────┤
│    SSD             │  ~100us, 수 TB
├────────────────────┤  속도: 매우 느림
│    HDD             │  ~10ms, 수 TB
└────────────────────┘

논리 주소 vs 물리 주소

프로세스가 사용하는 주소는 논리(가상) 주소입니다. MMU(Memory Management Unit)가 이를 물리 주소로 변환합니다.

페이징 (Paging)

논리 주소 공간              물리 메모리
┌───────────┐              ┌───────────┐
│  Page 0   │──────────▶  │  Frame 3  │
├───────────┤              ├───────────┤
│  Page 1   │──────────▶  │  Frame 7  │
├───────────┤              ├───────────┤
│  Page 2   │──────────▶  │  Frame 1  │
├───────────┤              ├───────────┤
│  Page 3   │──────────▶  │  Frame 5  │
└───────────┘              └───────────┘

페이지 테이블:
Page 0 -> Frame 3
Page 1 -> Frame 7
Page 2 -> Frame 1
Page 3 -> Frame 5

TLB (Translation Lookaside Buffer)

TLB는 페이지 테이블의 캐시입니다. 주소 변환 속도를 크게 향상시킵니다.

논리 주소 ──▶ TLB 확인 ──hit──▶ 물리 주소
                  │
                 miss
                  │
                  ▼
              페이지 테이블
              조회 후 TLB 갱신

컨텍스트 스위치 시 TLB가 플러시(무효화)되므로, 프로세스 전환 비용이 큽니다. 스레드 전환은 TLB 플러시가 불필요합니다 (같은 주소 공간).

8. 가상 메모리 (Virtual Memory)

요구 페이징 (Demand Paging)

모든 페이지를 처음부터 메모리에 로드하지 않고, 실제로 접근할 때만 로드합니다.

페이지 폴트 처리 과정

1. CPU가 논리 주소 접근
2. 페이지 테이블에서 유효 비트(valid bit) 확인
3. 유효하지 않음 → 페이지 폴트 인터럽트 발생
4. OS가 디스크에서 해당 페이지 찾기
5. 빈 프레임 할당 (없으면 페이지 교체)
6. 디스크에서 프레임으로 로드
7. 페이지 테이블 갱신 (유효 비트 = 1)
8. 중단된 명령어 재실행

페이지 교체 알고리즘

알고리즘	설명	성능	구현
FIFO	가장 먼저 들어온 페이지 교체	Belady's anomaly 발생 가능	큐
LRU	가장 오래 사용되지 않은 페이지	좋음, 최적에 근사	스택/카운터
Clock	LRU 근사, 참조 비트 사용	좋음, 효율적	원형 리스트
LFU	가장 적게 사용된 페이지	특정 패턴에 좋음	카운터 + 힙
Optimal	가장 나중에 사용될 페이지	이론적 최적	구현 불가

스래싱 (Thrashing)

프로세스가 작업에 실제로 필요한 페이지(워킹 셋)보다 적은 프레임을 할당받으면, 페이지 폴트가 극도로 빈번하게 발생하여 CPU 사용률이 급격히 떨어지는 현상입니다.

CPU 사용률
 100% │       ╱╲
      │      ╱  ╲
      │     ╱    ╲
      │    ╱      ╲  ← 스래싱 시작
      │   ╱        ╲
      │  ╱          ╲
      │ ╱            ╲
   0% └──────────────────▶
      적음    멀티프로그래밍 정도    많음

해결책:

Working Set 모델: 프로세스의 워킹 셋 크기에 맞게 프레임 할당
PFF (Page Fault Frequency): 페이지 폴트 빈도로 프레임 할당 조절
프로세스 수 줄이기: 일부 프로세스를 스왑 아웃

9. 파일 시스템

inode 구조

┌─────────────────────────────┐
│           inode              │
├─────────────────────────────┤
│ 파일 타입 (regular, dir...) │
│ 권한 (rwxrwxrwx)           │
│ 소유자 (UID, GID)          │
│ 파일 크기                    │
│ 타임스탬프 (atime,mtime,ctime)│
│ 링크 카운트                  │
├─────────────────────────────┤
│ Direct Blocks (12개)        │ → 데이터 블록
│ Single Indirect Block       │ → 포인터 블록 → 데이터
│ Double Indirect Block       │ → 포인터 → 포인터 → 데이터
│ Triple Indirect Block       │ → 3단계 간접
└─────────────────────────────┘

ext4 vs XFS

특성	ext4	XFS
최대 파일 크기	16TB	8EB
최대 볼륨 크기	1EB	8EB
저널링	메타데이터 + 데이터	메타데이터만
할당 방식	Extent 기반	Extent + B+Tree
병렬 I/O	보통	우수 (Allocation Groups)
적합한 용도	범용, 소규모 파일	대용량 파일, 고성능

저널링 (Journaling)

파일 시스템 변경 사항을 데이터에 직접 적용하기 전에 저널(로그)에 먼저 기록합니다. 시스템 크래시 시 저널을 사용하여 일관성을 복구합니다.

1. 저널에 트랜잭션 시작 기록
2. 변경할 메타데이터/데이터를 저널에 기록
3. 저널에 트랜잭션 완료 기록
4. 실제 파일 시스템에 변경 적용
5. 저널에서 트랜잭션 제거

VFS (Virtual File System) 계층

사용자 프로그램
    │
    ▼
VFS (Virtual File System)
    │
    ├──▶ ext4
    ├──▶ XFS
    ├──▶ NFS (네트워크)
    ├──▶ procfs (/proc)
    └──▶ sysfs (/sys)

VFS는 다양한 파일 시스템에 대한 통일된 인터페이스를 제공합니다. 사용자 프로그램은 VFS를 통해 어떤 파일 시스템이든 동일한 시스템 콜(open, read, write 등)로 접근합니다.

/proc과 /sys

# /proc — 프로세스 및 커널 정보
cat /proc/cpuinfo          # CPU 정보
cat /proc/meminfo          # 메모리 정보
cat /proc/1234/status      # PID 1234의 상태
cat /proc/1234/maps        # PID 1234의 메모리 맵
cat /proc/sys/vm/swappiness # 스왑 사용 정도 (0-100)

# /sys — 커널 및 디바이스 정보
ls /sys/class/net/         # 네트워크 인터페이스
cat /sys/block/sda/queue/scheduler  # I/O 스케줄러

10. I/O 관리

Blocking vs Non-blocking I/O

// Blocking I/O — read()가 데이터 도착까지 블록
char buf[1024];
int n = read(fd, buf, sizeof(buf));  // 블로킹!

// Non-blocking I/O
int flags = fcntl(fd, F_GETFL, 0);
fcntl(fd, F_SETFL, flags | O_NONBLOCK);
int n = read(fd, buf, sizeof(buf));
if (n == -1 && errno == EAGAIN) {
    // 데이터 없음, 나중에 다시 시도
}

I/O 멀티플렉싱: epoll (Linux)

#include <sys/epoll.h>

int epoll_fd = epoll_create1(0);

struct epoll_event ev;
ev.events = EPOLLIN | EPOLLET;  // Edge-Triggered
ev.data.fd = listen_fd;
epoll_ctl(epoll_fd, EPOLL_CTL_ADD, listen_fd, &ev);

struct epoll_event events[MAX_EVENTS];
while (1) {
    int nfds = epoll_wait(epoll_fd, events, MAX_EVENTS, -1);
    for (int i = 0; i < nfds; i++) {
        if (events[i].data.fd == listen_fd) {
            // 새 연결 수락
            int conn_fd = accept(listen_fd, ...);
            // conn_fd도 epoll에 등록
        } else {
            // 데이터 읽기/쓰기
            handle_client(events[i].data.fd);
        }
    }
}

io_uring (Linux 5.1+)

io_uring은 시스템 콜 오버헤드 없이 비동기 I/O를 수행하는 최신 리눅스 인터페이스입니다.

┌─────────────────────────┐
│     User Space          │
│  ┌──────────────────┐   │
│  │ Submission Queue  │───┼───▶ 커널이 처리
│  │ (SQ)             │   │
│  └──────────────────┘   │
│  ┌──────────────────┐   │
│  │ Completion Queue  │◀──┼─── 커널이 완료 통지
│  │ (CQ)             │   │
│  └──────────────────┘   │
└─────────────────────────┘

SQ와 CQ는 커널과 유저 스페이스가 공유하는 링 버퍼입니다. 시스템 콜 없이 메모리를 통해 I/O 요청을 주고받습니다.

DMA (Direct Memory Access)

CPU 개입 없이 디바이스가 직접 메모리에 데이터를 전송합니다. 네트워크 카드, 디스크 컨트롤러 등에서 사용합니다.

Zero-Copy

전통적 방식 (4번 복사):
디스크 → 커널 버퍼 → 유저 버퍼 → 소켓 버퍼 → NIC

Zero-Copy (sendfile):
디스크 → 커널 버퍼 → NIC (2번 또는 그 이하)

#include <sys/sendfile.h>
// 파일을 소켓으로 직접 전송 (유저 공간 복사 없음)
sendfile(socket_fd, file_fd, NULL, file_size);

11. 리눅스 커널 기초

시스템 콜 (System Call)

사용자 프로그램
    │
    │  write(fd, buf, count)
    ▼
┌─────────────────┐
│  C Library      │  ← glibc의 write() 래퍼
│  (glibc)        │
└────────┬────────┘
         │ syscall 명령어
         ▼
┌─────────────────┐
│  Kernel Space   │  ← sys_write() 커널 함수
│                 │
└─────────────────┘

strace — 시스템 콜 추적

# 프로세스의 시스템 콜 추적
strace -p 1234

# 특정 시스템 콜만 필터링
strace -e trace=open,read,write ./my_program

# 시간 정보 포함
strace -T -e trace=network ./my_server

# 통계 요약
strace -c ./my_program

perf — 성능 분석

# CPU 프로파일링
perf record -g ./my_program
perf report

# 캐시 미스 확인
perf stat -e cache-misses,cache-references ./my_program

# 특정 이벤트 카운팅
perf stat -e context-switches,cpu-migrations ./my_program

eBPF (extended Berkeley Packet Filter)

eBPF는 커널을 수정하지 않고 커널 내부에서 프로그램을 실행할 수 있는 기술입니다. 네트워크 모니터링, 보안, 성능 분석에 활용됩니다.

# bpftrace를 사용한 간단한 eBPF 프로그래밍

# 모든 시스템 콜 카운트
bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'

# 프로세스별 파일 열기 추적
bpftrace -e 'tracepoint:syscalls:sys_enter_openat {
    printf("%s opened %s\n", comm, str(args->filename));
}'

# 디스크 I/O 지연 히스토그램
bpftrace -e 'tracepoint:block:block_rq_complete {
    @usecs = hist(args->nr_sector);
}'

12. 컨테이너의 OS 관점

Docker의 실체 — Namespace + cgroups + Union FS

Docker 컨테이너는 VM이 아닙니다. 리눅스 커널의 기능을 조합하여 프로세스를 격리합니다.

Namespace — 프로세스 격리

Namespace	격리 대상	설명
PID	프로세스 ID	컨테이너 내부에서 PID 1부터 시작
NET	네트워크 스택	독립적 네트워크 인터페이스, IP
MNT	파일 시스템 마운트	독립적 마운트 포인트
UTS	호스트명	독립적 hostname
IPC	IPC 자원	독립적 메시지 큐, 세마포어
USER	사용자/그룹 ID	컨테이너 내 root를 호스트의 비특권 사용자로 매핑
CGROUP	cgroup 루트	독립적 cgroup 계층

# 새 PID namespace에서 셸 실행
sudo unshare --pid --mount-proc --fork /bin/bash
# 이 셸에서 ps aux를 하면 격리된 프로세스만 보임

# 새 네트워크 namespace
sudo ip netns add myns
sudo ip netns exec myns ip addr  # 격리된 네트워크 스택

cgroups — 자원 제한

# Docker의 cgroups 자원 제한 예시
docker run --cpus="0.5"           # CPU 50% 제한
docker run --memory="512m"        # 메모리 512MB 제한
docker run --pids-limit=100       # 최대 100개 프로세스
docker run --device-read-bps /dev/sda:10mb  # 디스크 읽기 제한

Union File System (OverlayFS)

Container Layer (읽기/쓰기)
    │
    ▼
┌─────────────────────┐
│   OverlayFS         │
│   (merged view)     │
└─────────────────────┘
    │         │
    ▼         ▼
Image Layer 3  Image Layer 2  Image Layer 1 (읽기 전용)
(app code)     (dependencies)  (base OS)

OverlayFS는 여러 파일 시스템 레이어를 하나로 합쳐 보여줍니다. 쓰기 시 Copy-on-Write로 상위 레이어에만 변경을 기록합니다.

13. 면접 질문 25선

프로세스/스레드 (1-5)

Q1: 프로세스와 스레드의 차이를 설명하세요.

프로세스는 독립적인 메모리 공간(Code, Data, Heap, Stack)을 가지는 실행 단위입니다. 스레드는 같은 프로세스 내에서 Code, Data, Heap을 공유하고 독립적인 Stack만 가집니다.

핵심 차이:

메모리: 프로세스는 독립, 스레드는 공유
생성 비용: 프로세스가 훨씬 높음
통신: 프로세스는 IPC 필요, 스레드는 공유 메모리 직접 접근
안정성: 프로세스 격리가 더 안전, 스레드는 하나의 오류가 전체에 영향

Q2: 컨텍스트 스위치란 무엇이고 비용이 발생하는 이유는?

컨텍스트 스위치는 CPU가 한 프로세스/스레드에서 다른 것으로 전환하는 것입니다.

비용 발생 이유:

PCB 저장/복원: 레지스터, 프로그램 카운터 등의 상태 저장
TLB 플러시: 프로세스 전환 시 TLB 무효화 (스레드 전환은 불필요)
캐시 무효화: L1/L2 캐시의 데이터가 새 프로세스와 무관
파이프라인 비우기: CPU 파이프라인의 명령어를 버림

최적화: 스레드 사용 (TLB 플러시 불필요), 코루틴, CPU 친화성(affinity) 설정.

Q3: fork()와 exec()의 차이를 설명하세요.

fork(): 현재 프로세스를 복제하여 자식 프로세스를 생성. 부모와 같은 코드를 실행. Copy-on-Write로 효율적.
exec(): 현재 프로세스의 메모리를 새로운 프로그램으로 교체. PID는 유지. 복귀하지 않음 (성공 시).

일반적인 패턴: fork() 후 자식에서 exec()를 호출하여 새 프로그램을 실행.

Q4: 좀비 프로세스와 고아 프로세스의 차이는?

좀비 프로세스: 자식이 종료되었지만 부모가 wait()를 호출하지 않은 상태. PCB만 남아 자원 낭비. SIGCHLD 핸들러나 wait() 호출로 해결.
고아 프로세스: 부모가 먼저 종료된 자식 프로세스. init(PID 1)이 양부모가 되어 wait()를 호출하므로 큰 문제 없음.

Q5: IPC 방법들을 비교하고 각각의 적합한 사용 사례를 설명하세요.

Pipe: 단방향, 부모-자식 간. 셸 파이프라인.
Socket: 양방향, 네트워크 지원. 클라이언트-서버 통신.
Shared Memory: 가장 빠름, 동기화 필요. 고성능 데이터 교환.
Message Queue: 비동기, 버퍼. 작업 큐 시스템.
Signal: 비동기 알림. 프로세스 제어 (SIGTERM, SIGKILL).

선택 기준: 통신 방향, 성능 요구, 네트워크 필요 여부, 동기/비동기 요구.

메모리 (6-10)

Q6: 가상 메모리가 무엇이고 왜 필요한가요?

가상 메모리는 각 프로세스에 독립적이고 연속적인 주소 공간을 제공하는 추상화 계층입니다.

필요한 이유:

메모리 보호: 프로세스 간 메모리 접근 차단
메모리 확장: 물리 메모리보다 큰 프로그램 실행 가능
메모리 효율: 실제 사용하는 페이지만 물리 메모리에 로드
프로그래밍 단순화: 프로세스는 0번지부터 시작하는 연속 주소 사용

Q7: 페이지 폴트 처리 과정을 설명하세요.

CPU가 가상 주소 접근
MMU가 페이지 테이블에서 해당 페이지의 유효 비트(valid bit) 확인
유효하지 않음 → 페이지 폴트 트랩 발생
OS가 디스크에서 해당 페이지 위치를 찾음
빈 물리 프레임이 없으면 페이지 교체 알고리즘 실행
디스크에서 물리 프레임으로 페이지 로드 (I/O 발생)
페이지 테이블 갱신 (유효 비트 = 1, 프레임 번호 기록)
트랩으로 중단된 명령어를 재실행

Q8: LRU 페이지 교체를 설명하고 구현 방법은?

LRU(Least Recently Used)는 가장 오랫동안 사용되지 않은 페이지를 교체합니다. 최적(OPT) 알고리즘에 근사합니다.

구현 방법:

카운터 방식: 각 페이지에 마지막 접근 시간 기록. 교체 시 최솟값 찾기.
스택 방식: 페이지 접근 시 스택 맨 위로 이동. 교체 시 스택 바닥의 페이지.
근사 LRU (Clock Algorithm): 참조 비트를 사용하여 원형 리스트에서 교체 대상 찾기. 실제 OS에서 가장 많이 사용.

Q9: 내부 단편화와 외부 단편화의 차이는?

내부 단편화: 할당된 메모리 블록 내부에 사용되지 않는 공간. 페이징에서 마지막 페이지가 완전히 사용되지 않을 때 발생.
외부 단편화: 총 빈 메모리는 충분하지만 연속적이지 않아 할당 불가능. 세그멘테이션에서 발생.

해결: 페이징은 외부 단편화를 제거하지만 내부 단편화 발생. 압축(compaction)으로 외부 단편화 해결 가능하지만 비용이 큼.

Q10: 스래싱이란 무엇이고 어떻게 방지하나요?

스래싱은 프로세스가 작업에 필요한 페이지(워킹 셋)보다 적은 프레임을 할당받아, 페이지 폴트가 극도로 빈번하게 발생하는 현상입니다. CPU 사용률이 급격히 떨어집니다.

방지 방법:

워킹 셋 모델: 프로세스의 워킹 셋 크기에 맞게 프레임 할당
PFF(Page Fault Frequency) 조절: 페이지 폴트 빈도 모니터링
멀티프로그래밍 정도 조절: 프로세스 수 줄이기 (일부 스왑 아웃)

동기화/데드락 (11-15)

Q11: 뮤텍스와 세마포어의 차이를 설명하세요.

뮤텍스:

이진(0/1) 잠금 메커니즘
소유권 있음: 잠금을 획득한 스레드만 해제 가능
하나의 스레드만 임계 영역 접근

세마포어:

카운팅 가능 (0~N)
소유권 없음: 어떤 스레드든 signal(post) 가능
N개의 스레드가 동시에 자원에 접근 가능

사용 사례: 뮤텍스는 상호 배제(하나의 자원), 세마포어는 자원 풀 관리(DB 커넥션 풀, 스레드 풀).

Q12: 데드락의 4가지 조건과 각각을 깨는 방법은?

상호 배제: 자원을 공유 가능하게 변경 (읽기 전용 자원은 공유 가능)
점유와 대기: 모든 자원을 한 번에 요청하거나, 자원 요청 전 보유 자원 해제
비선점: 자원을 강제로 빼앗는 메커니즘 도입
순환 대기: 자원에 번호를 매겨 항상 오름차순으로만 요청

실무에서 가장 효과적인 방법: 순환 대기 방지 (락 순서 강제).

Q13: 스핀락을 언제 사용해야 하나요?

스핀락 적합 조건:

임계 영역이 매우 짧은 경우 (수십 나노초 이하)
멀티코어 환경 (다른 코어에서 실행 중인 스레드가 곧 잠금 해제)
컨텍스트 스위치 비용이 스핀 대기 비용보다 큰 경우

부적합한 경우:

임계 영역이 긴 경우 (CPU 낭비)
단일 코어 환경 (잠금 보유 스레드가 실행될 수 없음)
잠금 보유 스레드가 선점될 수 있는 경우

Q14: 프로듀서-컨슈머 문제를 설명하고 해결 코드를 작성하세요.

프로듀서는 버퍼에 아이템을 추가하고, 컨슈머는 버퍼에서 아이템을 제거합니다. 버퍼가 가득 차면 프로듀서가 대기하고, 비어있으면 컨슈머가 대기합니다.

import threading
import queue

buffer = queue.Queue(maxsize=10)

def producer():
    for i in range(20):
        buffer.put(i)
        print(f"Produced: {i}")

def consumer():
    while True:
        item = buffer.get()
        if item is None:
            break
        print(f"Consumed: {item}")

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start()
t2.start()
t1.join()
buffer.put(None)  # 종료 신호
t2.join()

Q15: Priority Inversion이 무엇이고 어떻게 해결하나요?

우선순위 역전: 높은 우선순위 태스크가 낮은 우선순위 태스크가 보유한 자원을 대기하는 상황. 중간 우선순위 태스크가 낮은 우선순위 태스크를 선점하면, 높은 우선순위 태스크가 무한히 대기할 수 있습니다.

해결 방법:

Priority Inheritance: 낮은 우선순위 태스크가 높은 우선순위를 일시적으로 상속
Priority Ceiling: 자원의 우선순위 상한을 미리 설정

실제 사례: Mars Pathfinder의 리셋 버그가 Priority Inversion 때문이었으며, Priority Inheritance로 해결했습니다.

리눅스/실전 (16-25)

Q16: 리눅스에서 파일을 삭제해도 디스크 공간이 회수되지 않는 경우는?

파일이 여전히 프로세스에 의해 열려 있으면 (open file descriptor), 파일명은 삭제되지만 inode는 유지됩니다. 해당 프로세스가 파일을 닫거나 종료될 때까지 디스크 공간은 회수되지 않습니다.

확인: lsof +L1 (삭제되었지만 열려있는 파일 목록) 해결: 해당 프로세스를 재시작하거나, 파일 디스크립터를 닫음.

Q17: Linux의 OOM Killer는 무엇인가요?

OOM(Out of Memory) Killer는 시스템의 메모리가 고갈되었을 때 커널이 프로세스를 선택하여 강제 종료하는 메커니즘입니다.

선택 기준: /proc/PID/oom_score 값이 높은 프로세스. 메모리 사용량, 실행 시간, 중요도 등을 고려합니다.

보호: oom_score_adj를 -1000으로 설정하면 OOM Kill 대상에서 제외.

echo -1000 > /proc/1234/oom_score_adj  # PID 1234를 OOM Kill에서 보호

Q18: epoll의 Level Triggered와 Edge Triggered 차이는?

Level Triggered (LT): 조건이 유지되는 동안 계속 알림. 데이터가 남아있으면 epoll_wait가 계속 반환. 안전하지만 불필요한 호출 가능.
Edge Triggered (ET): 상태가 변경될 때만 알림. 한 번만 알림을 받으므로 한 번에 모든 데이터를 읽어야 함. 고성능이지만 프로그래밍 주의 필요.

ET 사용 시 주의: non-blocking I/O + EAGAIN까지 읽기 필수.

Q19: strace로 프로세스의 어떤 문제를 진단할 수 있나요?

파일 접근 문제: 어떤 파일을 열려고 하는데 실패하는지
네트워크 문제: 어디에 connect하는지, 타임아웃이 발생하는지
성능 문제: 어떤 시스템 콜이 오래 걸리는지 (-T 옵션)
시그널 처리: 어떤 시그널을 받고 처리하는지
자원 부족: 파일 디스크립터 한도, 메모리 할당 실패 등

Q20: Docker 컨테이너가 VM과 다른 점을 OS 관점에서 설명하세요.

VM: 하이퍼바이저 위에 완전한 게스트 OS를 실행. 각 VM은 독자적 커널을 가짐. 하드웨어 수준 가상화.
컨테이너: 호스트 OS의 커널을 공유. Namespace로 프로세스를 격리하고, cgroups로 자원을 제한. OS 수준 가상화.

핵심 차이: 컨테이너는 커널을 공유하므로 부팅이 빠르고 오버헤드가 낮지만, 커널 취약점이 모든 컨테이너에 영향.

Q21: CFS 스케줄러의 동작 원리를 설명하세요.

CFS(Completely Fair Scheduler)는 모든 프로세스에 공평한 CPU 시간을 할당합니다.

동작 원리:

각 프로세스에 vruntime(가상 실행 시간) 추적
레드-블랙 트리에 vruntime 순으로 정렬
항상 vruntime이 가장 작은 프로세스를 다음에 실행
nice 값이 낮을수록 vruntime 증가 속도가 느림 (더 많은 CPU 시간)

Q22: Copy-on-Write의 동작 원리와 활용 사례는?

COW는 자원 복사를 실제 수정이 발생할 때까지 지연합니다. 처음에는 같은 물리 페이지를 공유하고, 한쪽이 쓰기를 시도하면 그때 페이지를 복사합니다.

활용 사례:

fork(): 자식 프로세스 생성 시 메모리 복사 지연
mmap(): 파일 매핑 시 실제 변경 시점까지 공유
Redis RDB 저장: fork()로 자식 프로세스가 스냅샷 생성

Q23: 커널 모드와 사용자 모드의 차이는?

사용자 모드: 제한된 명령어만 실행 가능. 하드웨어 직접 접근 불가. 일반 프로그램이 실행되는 모드.
커널 모드: 모든 명령어와 하드웨어에 접근 가능. 특권 명령어 실행 가능.

모드 전환: 시스템 콜, 인터럽트, 예외 발생 시 사용자 모드에서 커널 모드로 전환. 처리 후 복귀.

비용: 모드 전환 자체가 수백 나노초의 오버헤드 발생 (레지스터 저장/복원, 보안 검사).

Q24: 인터럽트와 트랩의 차이는?

인터럽트: 외부 이벤트에 의한 비동기적 신호. 하드웨어 장치(키보드, 네트워크, 타이머)가 CPU에 알림.
트랩: 소프트웨어에 의한 동기적 신호. 시스템 콜 실행, 0으로 나누기, 페이지 폴트 등.

공통점: 둘 다 현재 실행을 중단하고 인터럽트/트랩 핸들러를 실행. 처리 후 복귀.

Q25: 리눅스에서 프로세스의 메모리 레이아웃을 설명하세요.

높은 주소
┌─────────────────┐
│   Kernel Space   │  (사용자 접근 불가)
├─────────────────┤
│     Stack        │  ↓ 성장 (함수 호출, 지역 변수)
│                  │
├─────────────────┤
│   Shared Libs    │  (동적 라이브러리)
├─────────────────┤
│                  │
│     Heap         │  ↑ 성장 (malloc, new)
├─────────────────┤
│     BSS          │  (초기화되지 않은 전역/정적 변수)
├─────────────────┤
│     Data         │  (초기화된 전역/정적 변수)
├─────────────────┤
│     Text(Code)   │  (실행 코드, 읽기 전용)
└─────────────────┘
낮은 주소

/proc/PID/maps 또는 pmap PID로 실제 메모리 레이아웃을 확인할 수 있습니다.

14. 퀴즈

퀴즈 1: 프로세스가 fork()를 3번 호출하면 총 몇 개의 프로세스가 되나요?

8개 (2의 3제곱)

각 fork()는 현재 존재하는 모든 프로세스를 복제합니다.

첫 번째 fork(): 1 → 2
두 번째 fork(): 2 → 4
세 번째 fork(): 4 → 8

퀴즈 2: 다음 상황에서 데드락이 발생할 수 있나요? 자원 A, B가 있고, 스레드 1은 A를 먼저 잠그고 B를 잠그며, 스레드 2도 A를 먼저 잠그고 B를 잠급니다.

아니요, 데드락이 발생하지 않습니다.

두 스레드 모두 A를 먼저 잠그려 하므로, 하나가 A를 획득하면 다른 하나는 A를 기다립니다. A를 획득한 스레드가 B도 안전하게 획득하고 해제할 수 있습니다. 순환 대기 조건이 만족되지 않습니다.

데드락은 스레드 1이 A->B 순서, 스레드 2가 B->A 순서로 잠글 때 발생합니다.

퀴즈 3: TLB 미스가 페이지 폴트보다 비용이 낮은 이유는?

TLB 미스: 메모리에 있는 페이지 테이블을 참조하면 됨. 메모리 접근 1-2회 추가. 수백 나노초.
페이지 폴트: 디스크에서 페이지를 로드해야 함. 디스크 I/O는 수 밀리초(SSD) ~ 수십 밀리초(HDD). TLB 미스의 수천~수만 배.

핵심: TLB 미스는 메모리에서 해결, 페이지 폴트는 디스크에서 해결.

퀴즈 4: Docker에서 PID 1인 프로세스가 중요한 이유는?

리눅스에서 PID 1(init)은 특별한 역할을 합니다.

시그널 처리: 기본 시그널 핸들러가 적용되지 않아 SIGTERM을 무시할 수 있음
좀비 프로세스 회수: PID 1이 고아 프로세스의 부모가 되어 wait()을 호출해야 함
컨테이너 종료: PID 1이 종료되면 전체 컨테이너가 종료됨

해결: tini 같은 경량 init을 PID 1으로 사용하거나, Docker의 --init 옵션.

퀴즈 5: 물리 메모리가 4GB인 시스템에서 각 프로세스에 4GB 가상 주소 공간을 제공할 수 있는 이유는?

가상 메모리의 핵심 원리 때문입니다.

요구 페이징: 실제로 사용하는 페이지만 물리 메모리에 로드
스왑 공간: 사용하지 않는 페이지를 디스크로 교체
페이지 공유: 같은 라이브러리를 사용하는 프로세스들은 하나의 물리 복사본 공유

모든 프로세스가 동시에 4GB 전체를 사용하지 않으므로, 물리 메모리보다 큰 가상 공간을 제공할 수 있습니다.

참고 자료

"Operating System Concepts" (10th Edition) — Silberschatz, Galvin, Gagne
"Modern Operating Systems" (4th Edition) — Andrew S. Tanenbaum
"Linux Kernel Development" (3rd Edition) — Robert Love
"Understanding the Linux Kernel" — Daniel P. Bovet, Marco Cesati
Linux man pages: https://man7.org/linux/man-pages/
eBPF 공식 문서: https://ebpf.io/
Linux 커널 소스: https://github.com/torvalds/linux
OSDev Wiki: https://wiki.osdev.org/
Julia Evans의 시스템 프로그래밍 블로그: https://jvns.ca/

운영체제는 모든 소프트웨어의 기반입니다. 이 글에서 다룬 개념들은 단순히 면접 대비를 넘어, 성능 문제를 진단하고, 동시성 버그를 예방하며, 시스템 설계에서 올바른 결정을 내리는 데 도움이 됩니다. 특히 리눅스 커널의 동작 원리를 이해하면, 컨테이너, 클라우드, 분산 시스템에서 발생하는 문제를 근본적으로 해결할 수 있는 역량을 갖추게 됩니다.

OS Core Concepts Complete Guide: Everything About Operating Systems for Developer Interviews

Introduction
1. Why OS Knowledge Matters
- Frequently Asked in Interviews
- Importance in Practice
2. Process Management
3. Threads
4. CPU Scheduling
5. Synchronization
6. Deadlock
7. Memory Management
8. Virtual Memory
9. File System
10. I/O Management
11. Linux Kernel Basics
12. Containers from an OS Perspective
13. Interview Questions — 25 Essential Questions
14. Quiz
References

Introduction

Operating system (OS) knowledge is one of the most frequently tested CS fundamentals in developer interviews. Beyond interview preparation, a deep understanding of OS concepts makes a definitive difference in performance optimization, concurrency bug resolution, and system design.

This article systematically covers everything developers need to know about operating systems with practical code examples — from process management, threads, CPU scheduling, synchronization, deadlocks, memory management, virtual memory, file systems, I/O management, Linux kernel basics, to containers from an OS perspective.

1. Why OS Knowledge Matters

Frequently Asked in Interviews

OS questions appear in nearly every technical interview. Common questions include:

Explain the difference between processes and threads
What are the 4 conditions for deadlock and how to resolve them?
What is virtual memory and why is it needed?
How to reduce context switch costs?
What is the difference between mutex and semaphore?

Importance in Practice

Performance Optimization: Understanding CPU cache, memory hierarchy, and I/O patterns is essential
Concurrent Programming: Understanding synchronization to prevent race conditions and deadlocks
System Design: Inter-process communication, foundations of distributed systems
Troubleshooting: Utilizing system tools like strace, perf, eBPF
Containers/Cloud: Understanding namespaces and cgroups is key to Docker/K8s proficiency

2. Process Management

What is a Process?

A process is an instance of a running program. Each process has an independent memory space.

PCB (Process Control Block)

┌─────────────────────────────────┐
│         PCB (Process            │
│         Control Block)          │
├─────────────────────────────────┤
│  Process ID (PID)               │
│  Process State                  │
│  Program Counter (PC)           │
│  CPU Registers                  │
│  CPU Scheduling Info            │
│  Memory Management Info         │
│  I/O Status Info                │
│  Accounting Info                │
└─────────────────────────────────┘

The OS manages each process's execution state through the PCB. During a context switch, it saves the current process's PCB and restores the next process's PCB.

Process State Transitions

         ┌──────────┐
    New   │          │  Terminated
  ──────▶│  Ready   │──────▶
         │          │
         └────┬─────┘
              │ dispatch
              ▼
         ┌──────────┐     I/O request
         │ Running  │───────────┐
         │          │           │
         └────┬─────┘           ▼
              │           ┌──────────┐
   Preempt   │           │ Waiting  │
              │           │ (Blocked)│
              └───────────┤          │
                          └──────────┘
                     I/O complete -> Ready

fork/exec — Process Creation

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main() {
    pid_t pid = fork();  // Duplicate process

    if (pid < 0) {
        perror("fork failed");
        return 1;
    } else if (pid == 0) {
        // Child process
        printf("Child PID: %d, Parent PID: %d\n", getpid(), getppid());
        execlp("ls", "ls", "-la", NULL);
        perror("exec failed");
    } else {
        // Parent process
        printf("Parent PID: %d, Child PID: %d\n", getpid(), pid);
        int status;
        waitpid(pid, &status, 0);
        printf("Child exited with status: %d\n", WEXITSTATUS(status));
    }
    return 0;
}

Copy-on-Write (COW)

When fork() is called, parent and child initially share the same physical pages. Only when one side attempts to write does the page get copied. This prevents unnecessary memory copying.

IPC (Inter-Process Communication)

IPC Method	Characteristics	Use Cases
Pipe	Unidirectional, parent-child	Shell pipelines
Named Pipe (FIFO)	Bidirectional, unrelated processes	Simple data transfer
Socket	Bidirectional, network support	Client-server communication
Shared Memory	Fastest, requires synchronization	High-performance data exchange
Message Queue	Asynchronous, buffered	Task queues, event systems
Signal	Asynchronous notification	Process control (SIGTERM, SIGKILL)

// Shared Memory example
#include <sys/mman.h>
#include <fcntl.h>
#include <string.h>

int main() {
    const char *name = "/my_shm";
    const int SIZE = 4096;

    // Create shared memory object
    int fd = shm_open(name, O_CREAT | O_RDWR, 0666);
    ftruncate(fd, SIZE);

    // Memory mapping
    void *ptr = mmap(0, SIZE, PROT_WRITE, MAP_SHARED, fd, 0);
    memcpy(ptr, "Hello from producer!", 20);

    // Cleanup: munmap, shm_unlink
    return 0;
}

3. Threads

Process vs Thread

Process A                Process B
┌───────────────┐        ┌───────────────┐
│ Code  │ Data  │        │ Code  │ Data  │
│───────│───────│        │───────│───────│
│ Heap  │ Stack │        │ Heap  │ Stack │
│       │ (main)│        │       │ (main)│
└───────────────┘        └───────────────┘
  Independent memory       Independent memory

Process C (multi-threaded)
┌───────────────────────────────┐
│ Code (shared) │ Data (shared) │
│───────────────│───────────────│
│ Heap (shared) │ Stack1│Stack2 │
│               │ (T1)  │(T2)  │
└───────────────────────────────┘
  Threads share Code, Data, Heap
  Each thread has its own Stack

Item	Process	Thread
Memory Space	Independent	Shared (Code, Data, Heap)
Creation Cost	High	Low
Context Switch	Expensive (TLB flush)	Cheap
Communication	Requires IPC	Direct shared memory access
Stability	One crash does not affect others	One crash affects entire process

Kernel Thread vs User Thread

User-Level Threads: Managed by thread library in user space. Fast switching but one blocking call blocks all threads.
Kernel-Level Threads: Managed by the OS kernel. True parallelism but higher switching cost.

POSIX pthread (C)

#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4

typedef struct {
    int thread_id;
    int start;
    int end;
    long result;
} ThreadArg;

void* sum_range(void* arg) {
    ThreadArg* targ = (ThreadArg*)arg;
    targ->result = 0;
    for (int i = targ->start; i <= targ->end; i++) {
        targ->result += i;
    }
    printf("Thread %d: sum(%d..%d) = %ld\n",
           targ->thread_id, targ->start, targ->end, targ->result);
    return NULL;
}

int main() {
    pthread_t threads[NUM_THREADS];
    ThreadArg args[NUM_THREADS];
    int range_per_thread = 250;

    for (int i = 0; i < NUM_THREADS; i++) {
        args[i].thread_id = i;
        args[i].start = i * range_per_thread + 1;
        args[i].end = (i + 1) * range_per_thread;
        pthread_create(&threads[i], NULL, sum_range, &args[i]);
    }

    long total = 0;
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_join(threads[i], NULL);
        total += args[i].result;
    }

    printf("Total sum: %ld\n", total);  // 500500
    return 0;
}

Go Goroutines

package main

import (
    "fmt"
    "sync"
)

func main() {
    var wg sync.WaitGroup
    results := make(chan int, 4)

    for i := 0; i < 4; i++ {
        wg.Add(1)
        go func(id, start, end int) {
            defer wg.Done()
            sum := 0
            for j := start; j <= end; j++ {
                sum += j
            }
            results <- sum
            fmt.Printf("Goroutine %d: sum(%d..%d) = %d\n", id, start, end, sum)
        }(i, i*250+1, (i+1)*250)
    }

    go func() {
        wg.Wait()
        close(results)
    }()

    total := 0
    for r := range results {
        total += r
    }
    fmt.Printf("Total: %d\n", total)
}

Go goroutines start with only a few KB of stack. The Go runtime scheduler manages M:N threading (mapping M goroutines to N OS threads).

Python GIL (Global Interpreter Lock)

import threading
import multiprocessing
import time

# CPU-bound work: threads offer no benefit due to GIL
def cpu_bound(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

# Thread approach (GIL-limited)
def thread_test():
    threads = [threading.Thread(target=cpu_bound, args=(10_000_000,))
               for _ in range(4)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(f"Threads: {time.time() - start:.2f}s")

# Process approach (bypasses GIL)
def process_test():
    processes = [multiprocessing.Process(target=cpu_bound, args=(10_000_000,))
                 for _ in range(4)]
    start = time.time()
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    print(f"Processes: {time.time() - start:.2f}s")

# I/O-bound: threads are effective
# CPU-bound: use multiprocessing or C extensions

Python 3.13+ has experimental GIL-free builds (PEP 703).

4. CPU Scheduling

Scheduling Algorithm Comparison

Algorithm	Feature	Pros	Cons
FCFS	First-come, first-served	Simple to implement	Convoy effect
SJF	Shortest job first	Minimum average wait time	Starvation for long jobs
Round Robin	Time quantum rotation	Fair, good response time	Quantum size critical
Priority	Priority-based	Fast handling of important tasks	Starvation (solved by aging)
MLFQ	Multi-level feedback queue	Adaptive, general-purpose	Complex implementation
CFS	Linux default	Fair CPU time distribution	Hard to guarantee latency

CFS (Completely Fair Scheduler) — Linux Default Scheduler

CFS aims to give every process a fair share of CPU time. It uses a red-black tree to always run the process with the smallest vruntime (virtual runtime) next.

# Adjust process priority with nice values (-20 to +19)
nice -n 10 ./my_program        # Run with lower priority
renice -n -5 -p 1234          # Raise priority of PID 1234

# Set real-time scheduling policy
chrt -f 50 ./realtime_program   # FIFO, priority 50
chrt -r 50 ./realtime_program   # Round Robin, priority 50

Resource Limiting with cgroups

# Limit CPU usage with cgroups v2
mkdir /sys/fs/cgroup/my_group

# CPU bandwidth limit: max 25ms per 50ms period (50%)
echo "25000 50000" > /sys/fs/cgroup/my_group/cpu.max

# Add process to cgroup
echo 1234 > /sys/fs/cgroup/my_group/cgroup.procs

# CPU weight (default 100, range 1-10000)
echo 200 > /sys/fs/cgroup/my_group/cpu.weight

5. Synchronization

Race Condition

// Race condition example — counter without synchronization
#include <pthread.h>
#include <stdio.h>

int counter = 0;  // Shared variable

void* increment(void* arg) {
    for (int i = 0; i < 1000000; i++) {
        counter++;  // Not atomic! (read -> modify -> write)
    }
    return NULL;
}

int main() {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("Counter: %d (expected 2000000)\n", counter);
    // Actually outputs less than 2000000
    return 0;
}

Critical Section

A critical section is a code block that accesses shared resources. It must satisfy 3 conditions:

Mutual Exclusion: Only one process/thread can access at a time
Progress: If no one is in the critical section, waiting processes can enter
Bounded Waiting: Prevent infinite waiting

Mutex

#include <pthread.h>
#include <stdio.h>

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
int counter = 0;

void* safe_increment(void* arg) {
    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&lock);     // Acquire lock
        counter++;                      // Critical section
        pthread_mutex_unlock(&lock);   // Release lock
    }
    return NULL;
}

int main() {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, safe_increment, NULL);
    pthread_create(&t2, NULL, safe_increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("Counter: %d\n", counter);  // Exactly 2000000
    return 0;
}

Semaphore

#include <semaphore.h>
#include <pthread.h>
#include <stdio.h>

sem_t semaphore;

void* worker(void* arg) {
    int id = *(int*)arg;
    printf("Thread %d waiting...\n", id);
    sem_wait(&semaphore);     // Decrement, wait if 0
    printf("Thread %d entered critical section\n", id);
    sleep(2);
    printf("Thread %d leaving\n", id);
    sem_post(&semaphore);     // Increment
    return NULL;
}

int main() {
    sem_init(&semaphore, 0, 3);  // Allow 3 concurrent threads

    pthread_t threads[10];
    int ids[10];
    for (int i = 0; i < 10; i++) {
        ids[i] = i;
        pthread_create(&threads[i], NULL, worker, &ids[i]);
    }
    for (int i = 0; i < 10; i++) {
        pthread_join(threads[i], NULL);
    }
    sem_destroy(&semaphore);
    return 0;
}

Mutex vs Semaphore vs Spinlock

Feature	Mutex	Semaphore	Spinlock
Concurrent Access	1	N	1
Wait Method	Sleep (blocking)	Sleep (blocking)	Busy-wait
Ownership	Yes (only locking thread can unlock)	No	Yes
Best For	General mutual exclusion	Resource pool management	Short critical sections
Context Switch	Occurs	Occurs	None

6. Deadlock

Four Necessary Conditions

  Thread A              Thread B
  ┌──────┐             ┌──────┐
  │Lock X│────wait────▶│Lock Y│
  │held  │             │held  │
  └──────┘◀────wait────└──────┘

Mutual Exclusion: Only one process can use a resource at a time
Hold and Wait: Holding resources while waiting for others
No Preemption: Cannot forcibly take another process's resources
Circular Wait: Processes wait for resources in a circular chain

All four conditions must be met for deadlock. Breaking any one prevents deadlock.

Deadlock Example (Python)

import threading
import time

lock_a = threading.Lock()
lock_b = threading.Lock()

def thread_1():
    lock_a.acquire()
    time.sleep(0.1)
    lock_b.acquire()  # Deadlock! Thread 2 holds lock_b
    lock_b.release()
    lock_a.release()

def thread_2():
    lock_b.acquire()
    time.sleep(0.1)
    lock_a.acquire()  # Deadlock! Thread 1 holds lock_a
    lock_a.release()
    lock_b.release()

t1 = threading.Thread(target=thread_1)
t2 = threading.Thread(target=thread_2)
t1.start()
t2.start()

Deadlock Resolution Strategies

1. Prevention — Remove one of the 4 conditions

# Remove circular wait: enforce lock ordering
def safe_thread_1():
    lock_a.acquire()   # Always lock_a first
    lock_b.acquire()
    # ... work ...
    lock_b.release()
    lock_a.release()

def safe_thread_2():
    lock_a.acquire()   # Always lock_a first (same order)
    lock_b.acquire()
    # ... work ...
    lock_b.release()
    lock_a.release()

2. Avoidance — Banker's Algorithm: Ensure system stays in a safe state.

3. Detection and Recovery — Detect cycles in resource allocation graph, then terminate processes or reclaim resources.

4. Ignore (Ostrich Algorithm) — If deadlock is very rare, restart the system when it occurs. Most general-purpose OSes use this strategy.

7. Memory Management

Memory Hierarchy

┌────────────────────┐  Speed: Very fast
│    CPU Registers   │  Capacity: Several KB
├────────────────────┤
│    L1 Cache        │  ~1ns, 64KB
├────────────────────┤
│    L2 Cache        │  ~4ns, 256KB
├────────────────────┤
│    L3 Cache        │  ~12ns, Several MB
├────────────────────┤
│    Main Memory     │  ~100ns, Several GB
│    (DRAM)          │
├────────────────────┤
│    SSD             │  ~100us, Several TB
├────────────────────┤  Speed: Very slow
│    HDD             │  ~10ms, Several TB
└────────────────────┘

Paging

Logical Address Space        Physical Memory
┌───────────┐              ┌───────────┐
│  Page 0   │──────────▶  │  Frame 3  │
├───────────┤              ├───────────┤
│  Page 1   │──────────▶  │  Frame 7  │
├───────────┤              ├───────────┤
│  Page 2   │──────────▶  │  Frame 1  │
├───────────┤              ├───────────┤
│  Page 3   │──────────▶  │  Frame 5  │
└───────────┘              └───────────┘

TLB (Translation Lookaside Buffer)

TLB is a cache for the page table. It dramatically speeds up address translation. Context switching flushes the TLB (invalidation), which is why process switches are costly. Thread switches do not require TLB flushes (same address space).

8. Virtual Memory

Demand Paging

Pages are not loaded into memory upfront. They are loaded only when actually accessed.

Page Fault Handling Process

1. CPU accesses a logical address
2. Check valid bit in page table
3. Not valid -> Page fault interrupt
4. OS finds the page on disk
5. Allocate free frame (page replacement if none)
6. Load from disk to frame
7. Update page table (valid bit = 1)
8. Restart interrupted instruction

Page Replacement Algorithms

Algorithm	Description	Performance	Implementation
FIFO	Replace oldest page	Belady's anomaly possible	Queue
LRU	Replace least recently used	Good, approximates optimal	Stack/counter
Clock	LRU approximation with reference bit	Good, efficient	Circular list
LFU	Replace least frequently used	Good for certain patterns	Counter + heap
Optimal	Replace page used furthest in future	Theoretically optimal	Not implementable

Thrashing

When a process is allocated fewer frames than its working set requires, page faults become extremely frequent and CPU utilization drops dramatically.

Solutions:

Working Set Model: Allocate frames matching the working set size
PFF (Page Fault Frequency): Adjust frame allocation based on fault frequency
Reduce Multiprogramming: Swap out some processes

9. File System

inode Structure

┌─────────────────────────────┐
│           inode              │
├─────────────────────────────┤
│ File type (regular, dir...) │
│ Permissions (rwxrwxrwx)     │
│ Owner (UID, GID)            │
│ File size                    │
│ Timestamps (atime,mtime,ctime)│
│ Link count                   │
├─────────────────────────────┤
│ Direct Blocks (12)          │ -> Data blocks
│ Single Indirect Block       │ -> Pointer block -> Data
│ Double Indirect Block       │ -> Pointer -> Pointer -> Data
│ Triple Indirect Block       │ -> 3-level indirect
└─────────────────────────────┘

ext4 vs XFS

Feature	ext4	XFS
Max File Size	16TB	8EB
Max Volume Size	1EB	8EB
Journaling	Metadata + Data	Metadata only
Allocation	Extent-based	Extent + B+Tree
Parallel I/O	Average	Excellent (Allocation Groups)
Best For	General purpose, small files	Large files, high performance

Journaling

Records file system changes in a journal (log) before applying them to data. Uses the journal to restore consistency after system crashes.

VFS (Virtual File System) Layer

User Program
    |
    v
VFS (Virtual File System)
    |
    +-->  ext4
    +-->  XFS
    +-->  NFS (network)
    +-->  procfs (/proc)
    +-->  sysfs (/sys)

VFS provides a unified interface for various file systems. User programs access any file system through the same system calls (open, read, write, etc.).

10. I/O Management

Blocking vs Non-blocking I/O

// Blocking I/O -- read() blocks until data arrives
char buf[1024];
int n = read(fd, buf, sizeof(buf));  // Blocking!

// Non-blocking I/O
int flags = fcntl(fd, F_GETFL, 0);
fcntl(fd, F_SETFL, flags | O_NONBLOCK);
int n = read(fd, buf, sizeof(buf));
if (n == -1 && errno == EAGAIN) {
    // No data available, try again later
}

I/O Multiplexing: epoll (Linux)

#include <sys/epoll.h>

int epoll_fd = epoll_create1(0);

struct epoll_event ev;
ev.events = EPOLLIN | EPOLLET;  // Edge-Triggered
ev.data.fd = listen_fd;
epoll_ctl(epoll_fd, EPOLL_CTL_ADD, listen_fd, &ev);

struct epoll_event events[MAX_EVENTS];
while (1) {
    int nfds = epoll_wait(epoll_fd, events, MAX_EVENTS, -1);
    for (int i = 0; i < nfds; i++) {
        if (events[i].data.fd == listen_fd) {
            int conn_fd = accept(listen_fd, ...);
            // Register conn_fd with epoll too
        } else {
            handle_client(events[i].data.fd);
        }
    }
}

io_uring (Linux 5.1+)

io_uring is a modern Linux interface for performing asynchronous I/O without system call overhead. Submission Queue (SQ) and Completion Queue (CQ) are shared ring buffers between kernel and user space.

Zero-Copy

Traditional approach (4 copies):
Disk -> Kernel Buffer -> User Buffer -> Socket Buffer -> NIC

Zero-Copy (sendfile):
Disk -> Kernel Buffer -> NIC (2 or fewer copies)

#include <sys/sendfile.h>
// Send file directly to socket (no user space copy)
sendfile(socket_fd, file_fd, NULL, file_size);

11. Linux Kernel Basics

System Calls

User Program
    |
    |  write(fd, buf, count)
    v
+-------------------+
|  C Library        |  <- glibc write() wrapper
|  (glibc)          |
+--------+----------+
         | syscall instruction
         v
+-------------------+
|  Kernel Space     |  <- sys_write() kernel function
+-------------------+

strace — System Call Tracing

# Trace system calls of a process
strace -p 1234

# Filter specific system calls
strace -e trace=open,read,write ./my_program

# Include timing info
strace -T -e trace=network ./my_server

# Summary statistics
strace -c ./my_program

perf — Performance Analysis

# CPU profiling
perf record -g ./my_program
perf report

# Check cache misses
perf stat -e cache-misses,cache-references ./my_program

# Count specific events
perf stat -e context-switches,cpu-migrations ./my_program

eBPF (extended Berkeley Packet Filter)

eBPF allows running programs inside the kernel without modifying it. Used for network monitoring, security, and performance analysis.

# Simple eBPF with bpftrace

# Count all system calls
bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'

# Trace file opens per process
bpftrace -e 'tracepoint:syscalls:sys_enter_openat {
    printf("%s opened %s\n", comm, str(args->filename));
}'

12. Containers from an OS Perspective

Docker's Reality — Namespaces + cgroups + Union FS

Docker containers are not VMs. They combine Linux kernel features to isolate processes.

Namespaces — Process Isolation

Namespace	Isolates	Description
PID	Process IDs	PID starts at 1 inside container
NET	Network stack	Independent network interfaces, IP
MNT	Filesystem mounts	Independent mount points
UTS	Hostname	Independent hostname
IPC	IPC resources	Independent message queues, semaphores
USER	User/Group IDs	Map container root to unprivileged host user
CGROUP	cgroup root	Independent cgroup hierarchy

# Run shell in new PID namespace
sudo unshare --pid --mount-proc --fork /bin/bash
# ps aux in this shell shows only isolated processes

cgroups — Resource Limiting

# Docker cgroups resource limits
docker run --cpus="0.5"           # CPU 50% limit
docker run --memory="512m"        # Memory 512MB limit
docker run --pids-limit=100       # Max 100 processes

Union File System (OverlayFS)

Container Layer (read/write)
    |
    v
+---------------------+
|   OverlayFS         |
|   (merged view)     |
+---------------------+
    |         |
    v         v
Image Layer 3  Image Layer 2  Image Layer 1 (read-only)
(app code)     (dependencies)  (base OS)

OverlayFS merges multiple filesystem layers into one view. Writes use Copy-on-Write to record changes only in the upper layer.

13. Interview Questions — 25 Essential Questions

Process/Thread (1-5)

Q1: Explain the difference between processes and threads.

A process is an execution unit with independent memory space (Code, Data, Heap, Stack). A thread shares Code, Data, and Heap within the same process and has only an independent Stack.

Key differences:

Memory: Processes are independent, threads share
Creation cost: Processes are much more expensive
Communication: Processes need IPC, threads directly access shared memory
Stability: Process isolation is safer, thread errors affect the entire process

Q2: What is a context switch and why is it costly?

A context switch is when the CPU transitions from one process/thread to another.

Cost reasons:

PCB Save/Restore: Save registers, program counter state
TLB Flush: TLB invalidation on process switch (not needed for thread switch)
Cache Invalidation: L1/L2 cache data irrelevant to new process
Pipeline Flush: Discard CPU pipeline instructions

Optimization: Use threads (no TLB flush), coroutines, CPU affinity settings.

Q3: Explain the difference between fork() and exec().

fork(): Duplicates the current process to create a child. Child runs same code. Efficient with Copy-on-Write.
exec(): Replaces current process memory with a new program. PID is preserved. Does not return (on success).

Common pattern: Call fork(), then exec() in the child to run a new program.

Q4: What is the difference between zombie and orphan processes?

Zombie Process: Child has terminated but parent has not called wait(). Only PCB remains, wasting resources. Fix with SIGCHLD handler or wait().
Orphan Process: Parent terminated before child. init (PID 1) becomes the new parent and calls wait(), so not a major issue.

Q5: Compare IPC methods and explain appropriate use cases.

Pipe: Unidirectional, parent-child. Shell pipelines.
Socket: Bidirectional, network support. Client-server communication.
Shared Memory: Fastest, requires synchronization. High-performance data exchange.
Message Queue: Asynchronous, buffered. Task queue systems.
Signal: Asynchronous notification. Process control.

Selection criteria: Communication direction, performance requirements, network needs, sync/async requirements.

Memory (6-10)

Q6: What is virtual memory and why is it needed?

Virtual memory is an abstraction layer providing each process with an independent, contiguous address space.

Why needed:

Memory Protection: Block inter-process memory access
Memory Extension: Run programs larger than physical memory
Memory Efficiency: Only load actually-used pages into physical memory
Simplified Programming: Processes use contiguous addresses starting from 0

Q7: Describe the page fault handling process.

CPU accesses virtual address
MMU checks valid bit in page table
Not valid -> Page fault trap
OS finds page location on disk
If no free frame, run page replacement algorithm
Load page from disk to physical frame (I/O occurs)
Update page table (valid bit = 1, record frame number)
Restart the trapped instruction

Q8: Explain LRU page replacement and implementation methods.

LRU (Least Recently Used) replaces the page that has not been used for the longest time. It approximates the optimal (OPT) algorithm.

Implementation methods:

Counter method: Record last access time for each page. Find minimum on replacement.
Stack method: Move page to top on access. Replace page at bottom.
Approximate LRU (Clock Algorithm): Use reference bits in a circular list. Most commonly used in real OSes.

Q9: What is the difference between internal and external fragmentation?

Internal Fragmentation: Unused space within an allocated memory block. Occurs in paging when last page is not fully used.
External Fragmentation: Total free memory is sufficient but not contiguous, making allocation impossible. Occurs in segmentation.

Solution: Paging eliminates external fragmentation but introduces internal. Compaction can solve external fragmentation but is costly.

Q10: What is thrashing and how do you prevent it?

Thrashing occurs when a process receives fewer frames than its working set requires, causing extremely frequent page faults and dramatic CPU utilization drops.

Prevention:

Working Set Model: Allocate frames matching working set size
PFF (Page Fault Frequency): Monitor and adjust frame allocation
Reduce multiprogramming: Swap out some processes

Synchronization/Deadlock (11-15)

Q11: Explain the difference between mutex and semaphore.

Mutex:

Binary (0/1) locking mechanism
Has ownership: only the locking thread can unlock
One thread in critical section

Semaphore:

Counting capable (0~N)
No ownership: any thread can signal (post)
N threads can access resource simultaneously

Use cases: Mutex for mutual exclusion (single resource), semaphore for resource pool management (DB connection pool, thread pool).

Q12: What are the 4 deadlock conditions and how to break each?

Mutual Exclusion: Make resources shareable (read-only resources can be shared)
Hold and Wait: Request all resources at once, or release held resources before requesting
No Preemption: Introduce mechanism to forcibly take resources
Circular Wait: Number resources and always request in ascending order

Most effective in practice: Prevent circular wait (enforce lock ordering).

Q13: When should you use a spinlock?

Suitable conditions:

Very short critical sections (tens of nanoseconds or less)
Multi-core environment (thread on another core will release lock soon)
Context switch cost exceeds spin wait cost

Not suitable:

Long critical sections (CPU waste)
Single-core environment (lock-holding thread cannot run)
Lock-holding thread can be preempted

Q14: Explain the Producer-Consumer problem with solution code.

Producer adds items to a buffer, consumer removes them. Producer waits when buffer is full, consumer waits when empty.

import threading
import queue

buffer = queue.Queue(maxsize=10)

def producer():
    for i in range(20):
        buffer.put(i)
        print(f"Produced: {i}")

def consumer():
    while True:
        item = buffer.get()
        if item is None:
            break
        print(f"Consumed: {item}")

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start()
t2.start()
t1.join()
buffer.put(None)  # Termination signal
t2.join()

Q15: What is Priority Inversion and how do you solve it?

Priority Inversion: A high-priority task waits for a resource held by a low-priority task. If a medium-priority task preempts the low-priority task, the high-priority task can wait indefinitely.

Solutions:

Priority Inheritance: Low-priority task temporarily inherits high priority
Priority Ceiling: Pre-set the priority ceiling for resources

Real case: The Mars Pathfinder reset bug was caused by Priority Inversion, solved with Priority Inheritance.

Linux/Practical (16-25)

Q16: When can deleting a file in Linux not reclaim disk space?

If a process still has the file open (open file descriptor), the filename is removed but the inode persists. Disk space is not reclaimed until the process closes the file or terminates.

Check: lsof +L1 (list deleted but open files) Fix: Restart the process or close the file descriptor.

Q17: What is the Linux OOM Killer?

The OOM (Out of Memory) Killer is a kernel mechanism that selects and forcibly terminates processes when system memory is exhausted.

Selection criteria: Processes with high /proc/PID/oom_score. Considers memory usage, runtime, importance, etc.

Protection: Set oom_score_adj to -1000 to exclude from OOM Kill targets.

Q18: What is the difference between epoll Level Triggered and Edge Triggered?

Level Triggered (LT): Continues notifying while condition persists. epoll_wait keeps returning if data remains. Safe but may cause unnecessary calls.
Edge Triggered (ET): Notifies only on state change. Must read all data at once since only one notification. Higher performance but requires careful programming.

ET caution: non-blocking I/O + read until EAGAIN is mandatory.

Q19: What problems can you diagnose with strace?

File access issues: Which files are being opened and failing
Network issues: Where it connects, whether timeouts occur
Performance issues: Which system calls take long (-T option)
Signal handling: Which signals are received and processed
Resource exhaustion: File descriptor limits, memory allocation failures

Q20: Explain how Docker containers differ from VMs from an OS perspective.

VM: Runs a complete guest OS on a hypervisor. Each VM has its own kernel. Hardware-level virtualization.
Container: Shares the host OS kernel. Uses Namespaces for process isolation and cgroups for resource limiting. OS-level virtualization.

Key difference: Containers share the kernel, so they boot faster with lower overhead, but kernel vulnerabilities affect all containers.

Q21: Explain how the CFS scheduler works.

CFS (Completely Fair Scheduler) allocates fair CPU time to all processes.

How it works:

Track vruntime (virtual runtime) for each process
Sort by vruntime in a red-black tree
Always run the process with smallest vruntime next
Lower nice values cause vruntime to increase more slowly (more CPU time)

Q22: Explain Copy-on-Write mechanics and use cases.

COW defers resource copying until actual modification occurs. Initially shares the same physical pages, and copies only when one side attempts to write.

Use cases:

fork(): Defer memory copying during child process creation
mmap(): Share file mappings until actual modification
Redis RDB save: Child process created via fork() generates snapshot

Q23: What is the difference between kernel mode and user mode?

User Mode: Can only execute restricted instructions. No direct hardware access. Where normal programs run.
Kernel Mode: Can access all instructions and hardware. Can execute privileged instructions.

Mode switch: Transitions from user mode to kernel mode during system calls, interrupts, and exceptions.

Cost: Mode switching itself incurs hundreds of nanoseconds overhead (register save/restore, security checks).

Q24: What is the difference between interrupts and traps?

Interrupt: Asynchronous signal from external events. Hardware devices (keyboard, network, timer) notify the CPU.
Trap: Synchronous signal from software. System call execution, division by zero, page faults, etc.

Common: Both suspend current execution and run the handler. Return after processing.

Q25: Describe the memory layout of a Linux process.

High address
+-----------------+
|   Kernel Space  |  (user inaccessible)
+-----------------+
|     Stack       |  grows downward (function calls, locals)
|                 |
+-----------------+
|   Shared Libs   |  (dynamic libraries)
+-----------------+
|                 |
|     Heap        |  grows upward (malloc, new)
+-----------------+
|     BSS         |  (uninitialized global/static vars)
+-----------------+
|     Data        |  (initialized global/static vars)
+-----------------+
|     Text(Code)  |  (executable code, read-only)
+-----------------+
Low address

Check actual layout with /proc/PID/maps or pmap PID.

14. Quiz

Quiz 1: If a process calls fork() 3 times, how many total processes exist?

8 (2 to the power of 3)

Each fork() duplicates all currently existing processes:

First fork(): 1 -> 2
Second fork(): 2 -> 4
Third fork(): 4 -> 8

Quiz 2: Can deadlock occur in this scenario? Resources A and B exist. Thread 1 locks A first then B. Thread 2 also locks A first then B.

No, deadlock cannot occur.

Both threads try to lock A first, so one acquires A while the other waits. The thread that acquired A can safely acquire B and release both. The circular wait condition is not satisfied.

Deadlock occurs when Thread 1 locks A->B and Thread 2 locks B->A.

Quiz 3: Why is a TLB miss less costly than a page fault?

TLB miss: Just needs to reference the page table in memory. 1-2 additional memory accesses. Hundreds of nanoseconds.
Page fault: Must load page from disk. Disk I/O takes milliseconds (SSD) to tens of milliseconds (HDD). Thousands to tens of thousands of times more than a TLB miss.

Key point: TLB miss is resolved in memory, page fault is resolved from disk.

Quiz 4: Why is the PID 1 process important in Docker?

In Linux, PID 1 (init) has special roles:

Signal handling: Default signal handlers do not apply, so it may ignore SIGTERM
Zombie process cleanup: PID 1 becomes parent of orphan processes and must call wait()
Container termination: When PID 1 exits, the entire container stops

Solution: Use a lightweight init like tini as PID 1, or Docker's --init option.

Quiz 5: How can a system with 4GB physical memory provide each process a 4GB virtual address space?

Due to virtual memory's core principles:

Demand Paging: Only actually-used pages are loaded into physical memory
Swap Space: Unused pages are swapped to disk
Page Sharing: Processes using the same library share one physical copy

Since not all processes use their full 4GB simultaneously, virtual space larger than physical memory can be provided.

References

"Operating System Concepts" (10th Edition) — Silberschatz, Galvin, Gagne
"Modern Operating Systems" (4th Edition) — Andrew S. Tanenbaum
"Linux Kernel Development" (3rd Edition) — Robert Love
"Understanding the Linux Kernel" — Daniel P. Bovet, Marco Cesati
Linux man pages: https://man7.org/linux/man-pages/
eBPF Official Docs: https://ebpf.io/
Linux Kernel Source: https://github.com/torvalds/linux
OSDev Wiki: https://wiki.osdev.org/
Julia Evans' Systems Programming Blog: https://jvns.ca/

Operating systems are the foundation of all software. The concepts covered in this article go beyond mere interview preparation — they help diagnose performance issues, prevent concurrency bugs, and make correct decisions in system design. Understanding how the Linux kernel works, in particular, equips you with the capability to fundamentally solve problems that arise in containers, cloud, and distributed systems.