OS Core Concepts Complete Guide: Everything About Operating Systems for Developer Interviews

Introduction

Operating system (OS) knowledge is among the most frequently tested CS fundamentals in developer interviews. Beyond interview preparation, a deep understanding of OS concepts makes a real difference in performance optimization, concurrency debugging, and system design.

This article systematically covers everything developers need to know about operating systems with practical code examples — from process management, threads, CPU scheduling, synchronization, deadlocks, memory management, virtual memory, file systems, I/O management, Linux kernel basics, to containers from an OS perspective.


1. Why OS Knowledge Matters

Frequently Asked in Interviews

OS questions appear in nearly every technical interview. Common questions include:

  • Explain the difference between processes and threads
  • What are the 4 conditions for deadlock and how to resolve them?
  • What is virtual memory and why is it needed?
  • How to reduce context switch costs?
  • What is the difference between mutex and semaphore?

Importance in Practice

  1. Performance Optimization: Understanding CPU cache, memory hierarchy, and I/O patterns is essential
  2. Concurrent Programming: Understanding synchronization to prevent race conditions and deadlocks
  3. System Design: Inter-process communication, foundations of distributed systems
  4. Troubleshooting: Utilizing system tools like strace, perf, eBPF
  5. Containers/Cloud: Understanding namespaces and cgroups is key to Docker/K8s proficiency

2. Process Management

What is a Process?

A process is an instance of a running program. Each process has an independent memory space.

PCB (Process Control Block)

┌─────────────────────────────────┐
│   PCB (Process Control Block)   │
├─────────────────────────────────┤
│ Process ID (PID)                │
│ Process State                   │
│ Program Counter (PC)            │
│ CPU Registers                   │
│ CPU Scheduling Info             │
│ Memory Management Info          │
│ I/O Status Info                 │
│ Accounting Info                 │
└─────────────────────────────────┘

The OS manages each process's execution state through the PCB. During a context switch, it saves the current process's PCB and restores the next process's PCB.

Process State Transitions

  New ──admit──▶ Ready ──dispatch──▶ Running ──exit──▶ Terminated
                  ▲ ▲                 │  │
                  │ └─────preempt─────┘  │ I/O request
                  │                      ▼
                  └──I/O complete─── Waiting (Blocked)

fork/exec — Process Creation

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main() {
    pid_t pid = fork();  // Duplicate process

    if (pid < 0) {
        perror("fork failed");
        return 1;
    } else if (pid == 0) {
        // Child process
        printf("Child PID: %d, Parent PID: %d\n", getpid(), getppid());
        execlp("ls", "ls", "-la", NULL);
        perror("exec failed");
        return 1;  // exec returns only on failure
    } else {
        // Parent process
        printf("Parent PID: %d, Child PID: %d\n", getpid(), pid);
        int status;
        waitpid(pid, &status, 0);
        printf("Child exited with status: %d\n", WEXITSTATUS(status));
    }
    return 0;
}

Copy-on-Write (COW)

When fork() is called, parent and child initially share the same physical pages. Only when one side attempts to write does the page get copied. This prevents unnecessary memory copying.

IPC (Inter-Process Communication)

IPC Method         Characteristics                      Use Cases
Pipe               Unidirectional, parent-child         Shell pipelines
Named Pipe (FIFO)  Bidirectional, unrelated processes   Simple data transfer
Socket             Bidirectional, network support       Client-server communication
Shared Memory      Fastest, requires synchronization    High-performance data exchange
Message Queue      Asynchronous, buffered               Task queues, event systems
Signal             Asynchronous notification            Process control (SIGTERM, SIGKILL)

// Shared Memory example (link with -lrt on older glibc)
#include <sys/mman.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int main() {
    const char *name = "/my_shm";
    const int SIZE = 4096;

    // Create shared memory object
    int fd = shm_open(name, O_CREAT | O_RDWR, 0666);
    ftruncate(fd, SIZE);

    // Memory mapping
    void *ptr = mmap(0, SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    memcpy(ptr, "Hello from producer!", 21);  // 20 chars + '\0'

    // Cleanup: munmap(ptr, SIZE); shm_unlink(name);
    return 0;
}

3. Threads

Process vs Thread

Process A                 Process B
┌───────┬───────┐         ┌───────┬───────┐
│ Code  │ Data  │         │ Code  │ Data  │
├───────┼───────┤         ├───────┼───────┤
│ Heap  │ Stack │         │ Heap  │ Stack │
│       │(main) │         │       │(main) │
└───────┴───────┘         └───────┴───────┘
  Independent memory        Independent memory

Process C (multi-threaded)
┌───────────────┬───────────────┐
│ Code (shared) │ Data (shared) │
├───────────────┼───────┬───────┤
│ Heap (shared) │Stack1 │Stack2 │
│               │ (T1)  │ (T2)  │
└───────────────┴───────┴───────┘
  Threads share Code, Data, Heap
  Each thread has its own Stack
Item            Process                           Thread
Memory Space    Independent                       Shared (Code, Data, Heap)
Creation Cost   High                              Low
Context Switch  Expensive (TLB flush)             Cheap
Communication   Requires IPC                      Direct shared memory access
Stability       One crash does not affect others  One crash affects entire process

Kernel Thread vs User Thread

  • User-Level Threads: Managed by thread library in user space. Fast switching but one blocking call blocks all threads.
  • Kernel-Level Threads: Managed by the OS kernel. True parallelism but higher switching cost.
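On CPython, the threading module creates kernel-level threads (the 1:1 model). A quick way to see this — assuming Python 3.8+ for threading.get_native_id() — is to hold several threads alive at the same time and compare the kernel thread IDs backing them; a rough sketch:

```python
import threading

# On CPython, each threading.Thread is backed by a kernel-level
# thread (1:1 model). threading.get_native_id() (Python 3.8+)
# exposes the kernel's ID for the calling thread.
barrier = threading.Barrier(4)        # keep all 4 threads alive at once
native_ids = []
ids_lock = threading.Lock()

def record_id():
    barrier.wait()                    # wait until all threads exist
    with ids_lock:
        native_ids.append(threading.get_native_id())
    barrier.wait()                    # stay alive until everyone has recorded

threads = [threading.Thread(target=record_id) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# While they coexist, kernel thread IDs are distinct.
print(len(set(native_ids)))  # 4
```

Because the threads coexist between the two barrier waits, the OS cannot reuse their IDs, so four distinct kernel threads show up.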

POSIX pthread (C)

#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4

typedef struct {
    int thread_id;
    int start;
    int end;
    long result;
} ThreadArg;

void* sum_range(void* arg) {
    ThreadArg* targ = (ThreadArg*)arg;
    targ->result = 0;
    for (int i = targ->start; i <= targ->end; i++) {
        targ->result += i;
    }
    printf("Thread %d: sum(%d..%d) = %ld\n",
           targ->thread_id, targ->start, targ->end, targ->result);
    return NULL;
}

int main() {
    pthread_t threads[NUM_THREADS];
    ThreadArg args[NUM_THREADS];
    int range_per_thread = 250;

    for (int i = 0; i < NUM_THREADS; i++) {
        args[i].thread_id = i;
        args[i].start = i * range_per_thread + 1;
        args[i].end = (i + 1) * range_per_thread;
        pthread_create(&threads[i], NULL, sum_range, &args[i]);
    }

    long total = 0;
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_join(threads[i], NULL);
        total += args[i].result;
    }

    printf("Total sum: %ld\n", total);  // 500500
    return 0;
}

Go Goroutines

package main

import (
    "fmt"
    "sync"
)

func main() {
    var wg sync.WaitGroup
    results := make(chan int, 4)

    for i := 0; i < 4; i++ {
        wg.Add(1)
        go func(id, start, end int) {
            defer wg.Done()
            sum := 0
            for j := start; j <= end; j++ {
                sum += j
            }
            results <- sum
            fmt.Printf("Goroutine %d: sum(%d..%d) = %d\n", id, start, end, sum)
        }(i, i*250+1, (i+1)*250)
    }

    go func() {
        wg.Wait()
        close(results)
    }()

    total := 0
    for r := range results {
        total += r
    }
    fmt.Printf("Total: %d\n", total)
}

Go goroutines start with only a few KB of stack. The Go runtime scheduler manages M:N threading (mapping M goroutines to N OS threads).

Python GIL (Global Interpreter Lock)

import threading
import multiprocessing
import time

# CPU-bound work: threads offer no benefit due to GIL
def cpu_bound(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

# Thread approach (GIL-limited)
def thread_test():
    threads = [threading.Thread(target=cpu_bound, args=(10_000_000,))
               for _ in range(4)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(f"Threads: {time.time() - start:.2f}s")

# Process approach (bypasses GIL)
def process_test():
    processes = [multiprocessing.Process(target=cpu_bound, args=(10_000_000,))
                 for _ in range(4)]
    start = time.time()
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    print(f"Processes: {time.time() - start:.2f}s")

# I/O-bound: threads are effective
# CPU-bound: use multiprocessing or C extensions

if __name__ == "__main__":  # guard required by multiprocessing on spawn platforms
    thread_test()
    process_test()

Python 3.13+ has experimental GIL-free builds (PEP 703).


4. CPU Scheduling

Scheduling Algorithm Comparison

Algorithm    Feature                     Pros                              Cons
FCFS         First-come, first-served    Simple to implement               Convoy effect
SJF          Shortest job first          Minimum average wait time         Starvation for long jobs
Round Robin  Time quantum rotation       Fair, good response time          Quantum size critical
Priority     Priority-based              Fast handling of important tasks  Starvation (solved by aging)
MLFQ         Multi-level feedback queue  Adaptive, general-purpose         Complex implementation
CFS          Linux default               Fair CPU time distribution        Hard to guarantee latency

CFS (Completely Fair Scheduler) — Linux Default Scheduler

CFS aims to give every process a fair share of CPU time. It keeps runnable tasks in a red-black tree ordered by vruntime (virtual runtime) and always runs the task with the smallest vruntime next. (Since kernel 6.6, CFS has been replaced by the EEVDF scheduler, which keeps the vruntime bookkeeping.)
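The vruntime idea can be sketched with a toy simulation — a min-heap standing in for the red-black tree; the weights and tick counts are made-up illustrative numbers, not kernel values:

```python
import heapq

# Toy model of CFS's core idea (not the real kernel algorithm):
# always run the task with the smallest vruntime, and advance that
# task's vruntime by slice / weight, so heavier tasks accrue
# vruntime more slowly and therefore run more often.
def simulate(weights, ticks, slice_ns=1):
    runqueue = [(0.0, name) for name in weights]   # (vruntime, task)
    heapq.heapify(runqueue)
    cpu_time = {name: 0 for name in weights}
    for _ in range(ticks):
        vrt, name = heapq.heappop(runqueue)        # "leftmost node" of the tree
        cpu_time[name] += slice_ns
        heapq.heappush(runqueue, (vrt + slice_ns / weights[name], name))
    return cpu_time

# "heavy" has twice the weight (i.e. a lower nice value) of "light".
usage = simulate({"heavy": 2, "light": 1}, ticks=3000)
print(usage)  # heavy ends up with ~2x the CPU time of light
```

Because the scheduler keeps the two vruntimes within one slice of each other, CPU time converges to the 2:1 weight ratio.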

# Adjust process priority with nice values (-20 to +19)
nice -n 10 ./my_program        # Run with lower priority
renice -n -5 -p 1234          # Raise priority of PID 1234

# Set real-time scheduling policy
chrt -f 50 ./realtime_program   # FIFO, priority 50
chrt -r 50 ./realtime_program   # Round Robin, priority 50

Resource Limiting with cgroups

# Limit CPU usage with cgroups v2
mkdir /sys/fs/cgroup/my_group

# CPU bandwidth limit: max 25ms per 50ms period (50%)
echo "25000 50000" > /sys/fs/cgroup/my_group/cpu.max

# Add process to cgroup
echo 1234 > /sys/fs/cgroup/my_group/cgroup.procs

# CPU weight (default 100, range 1-10000)
echo 200 > /sys/fs/cgroup/my_group/cpu.weight

5. Synchronization

Race Condition

// Race condition example — counter without synchronization
#include <pthread.h>
#include <stdio.h>

int counter = 0;  // Shared variable

void* increment(void* arg) {
    for (int i = 0; i < 1000000; i++) {
        counter++;  // Not atomic! (read -> modify -> write)
    }
    return NULL;
}

int main() {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("Counter: %d (expected 2000000)\n", counter);
    // Actually outputs less than 2000000
    return 0;
}

Critical Section

A critical section is a code block that accesses shared resources. It must satisfy 3 conditions:

  1. Mutual Exclusion: Only one process/thread can access at a time
  2. Progress: If no one is in the critical section, waiting processes can enter
  3. Bounded Waiting: Prevent infinite waiting

Mutex

#include <pthread.h>
#include <stdio.h>

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
int counter = 0;

void* safe_increment(void* arg) {
    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&lock);     // Acquire lock
        counter++;                      // Critical section
        pthread_mutex_unlock(&lock);   // Release lock
    }
    return NULL;
}

int main() {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, safe_increment, NULL);
    pthread_create(&t2, NULL, safe_increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("Counter: %d\n", counter);  // Exactly 2000000
    return 0;
}

Semaphore

#include <semaphore.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>  // sleep()

sem_t semaphore;

void* worker(void* arg) {
    int id = *(int*)arg;
    printf("Thread %d waiting...\n", id);
    sem_wait(&semaphore);     // Decrement, wait if 0
    printf("Thread %d entered critical section\n", id);
    sleep(2);
    printf("Thread %d leaving\n", id);
    sem_post(&semaphore);     // Increment
    return NULL;
}

int main() {
    sem_init(&semaphore, 0, 3);  // Allow 3 concurrent threads

    pthread_t threads[10];
    int ids[10];
    for (int i = 0; i < 10; i++) {
        ids[i] = i;
        pthread_create(&threads[i], NULL, worker, &ids[i]);
    }
    for (int i = 0; i < 10; i++) {
        pthread_join(threads[i], NULL);
    }
    sem_destroy(&semaphore);
    return 0;
}

Mutex vs Semaphore vs Spinlock

Feature            Mutex                                 Semaphore                 Spinlock
Concurrent Access  1                                     N                         1
Wait Method        Sleep (blocking)                      Sleep (blocking)          Busy-wait
Ownership          Yes (only locking thread can unlock)  No                        Yes
Best For           General mutual exclusion              Resource pool management  Short critical sections
Context Switch     Occurs                                Occurs                    None
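The busy-wait column can be illustrated with a toy spinlock in Python. This is only a sketch: real spinlocks rely on atomic CPU instructions such as test-and-set or compare-and-swap, and here a non-blocking Lock.acquire stands in for the atomic operation (CPython's GIL also means this is not a performance win in Python):

```python
import threading

# Toy spinlock: busy-wait on a flag instead of sleeping.
# A non-blocking Lock.acquire plays the role of the atomic
# test-and-set instruction a real spinlock would use.
class SpinLock:
    def __init__(self):
        self._flag = threading.Lock()

    def acquire(self):
        while not self._flag.acquire(blocking=False):
            pass                          # spin: burn CPU instead of blocking

    def release(self):
        self._flag.release()

counter = 0
spin = SpinLock()

def increment(n):
    global counter
    for _ in range(n):
        spin.acquire()                    # very short critical section
        counter += 1
        spin.release()

threads = [threading.Thread(target=increment, args=(50_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 100000
```

Mutual exclusion still holds, but every contended acquire wastes CPU cycles — which is why spinlocks only pay off for very short critical sections.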

6. Deadlock

Four Necessary Conditions

  Thread A               Thread B
  ┌────────┐            ┌────────┐
  │ Lock X │───wait───▶ │ Lock Y │
  │ (held) │            │ (held) │
  └────────┘◀───wait─── └────────┘

  1. Mutual Exclusion: Only one process can use a resource at a time
  2. Hold and Wait: Holding resources while waiting for others
  3. No Preemption: Cannot forcibly take another process's resources
  4. Circular Wait: Processes wait for resources in a circular chain

All four conditions must be met for deadlock. Breaking any one prevents deadlock.

Deadlock Example (Python)

import threading
import time

lock_a = threading.Lock()
lock_b = threading.Lock()

def thread_1():
    lock_a.acquire()
    time.sleep(0.1)
    lock_b.acquire()  # Deadlock! Thread 2 holds lock_b
    lock_b.release()
    lock_a.release()

def thread_2():
    lock_b.acquire()
    time.sleep(0.1)
    lock_a.acquire()  # Deadlock! Thread 1 holds lock_a
    lock_a.release()
    lock_b.release()

t1 = threading.Thread(target=thread_1)
t2 = threading.Thread(target=thread_2)
t1.start()
t2.start()

Deadlock Resolution Strategies

1. Prevention — Remove one of the 4 conditions

# Remove circular wait: enforce lock ordering
def safe_thread_1():
    lock_a.acquire()   # Always lock_a first
    lock_b.acquire()
    # ... work ...
    lock_b.release()
    lock_a.release()

def safe_thread_2():
    lock_a.acquire()   # Always lock_a first (same order)
    lock_b.acquire()
    # ... work ...
    lock_b.release()
    lock_a.release()

2. Avoidance — Banker's Algorithm: Ensure system stays in a safe state.

3. Detection and Recovery — Detect cycles in resource allocation graph, then terminate processes or reclaim resources.

4. Ignore (Ostrich Algorithm) — If deadlock is very rare, restart the system when it occurs. Most general-purpose OSes use this strategy.
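Strategy 3's cycle detection can be sketched as a depth-first search over a wait-for graph — a minimal illustration, not how any particular kernel implements it:

```python
# Deadlock detection as cycle detection in a wait-for graph.
# An edge T1 -> T2 means "T1 waits for a resource held by T2".
# A cycle in this graph means deadlock.
def has_deadlock(wait_for):
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {t: WHITE for t in wait_for}

    def dfs(t):
        color[t] = GRAY
        for nxt in wait_for.get(t, []):
            if color.get(nxt, WHITE) == GRAY:   # back edge: cycle found
                return True
            if color.get(nxt, WHITE) == WHITE and dfs(nxt):
                return True
        color[t] = BLACK
        return False

    return any(color[t] == WHITE and dfs(t) for t in wait_for)

# Thread A waits for B, B waits for A -> deadlock
print(has_deadlock({"A": ["B"], "B": ["A"]}))   # True
# A waits for B, B waits for nothing -> no deadlock
print(has_deadlock({"A": ["B"], "B": []}))      # False
```

A real OS would run this periodically (or on lock timeouts) and recover by terminating one thread in the cycle.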


7. Memory Management

Memory Hierarchy

┌────────────────────┐  Speed: Very fast
│   CPU Registers    │  Capacity: Several KB
├────────────────────┤
│      L1 Cache      │  ~1ns, 64KB
├────────────────────┤
│      L2 Cache      │  ~4ns, 256KB
├────────────────────┤
│      L3 Cache      │  ~12ns, Several MB
├────────────────────┤
│    Main Memory     │  ~100ns, Several GB
│      (DRAM)        │
├────────────────────┤
│        SSD         │  ~100us, Several TB
├────────────────────┤
│        HDD         │  ~10ms, Several TB
└────────────────────┘  Speed: Very slow

Paging

Logical Address Space        Physical Memory
┌───────────┐                ┌───────────┐
│  Page 0   │──────────▶     │  Frame 3  │
├───────────┤                ├───────────┤
│  Page 1   │──────────▶     │  Frame 7  │
├───────────┤                ├───────────┤
│  Page 2   │──────────▶     │  Frame 1  │
├───────────┤                ├───────────┤
│  Page 3   │──────────▶     │  Frame 5  │
└───────────┘                └───────────┘

TLB (Translation Lookaside Buffer)

TLB is a cache for the page table. It dramatically speeds up address translation. Context switching flushes the TLB (invalidation), which is why process switches are costly. Thread switches do not require TLB flushes (same address space).
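The effect of TLB hits, misses, and flushes can be modeled with a toy LRU cache — capacity and page numbers here are arbitrary illustrative values:

```python
from collections import OrderedDict

# Toy TLB model: a small LRU cache of (page -> frame) translations.
# Access patterns with good locality hit the TLB; a context switch
# flushes it and forces page-table walks again.
class TLB:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = self.misses = 0

    def translate(self, page, page_table):
        if page in self.entries:
            self.hits += 1
            self.entries.move_to_end(page)       # LRU refresh
            return self.entries[page]
        self.misses += 1                         # TLB miss: walk the page table
        frame = page_table[page]
        self.entries[page] = frame
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)     # evict least recently used
        return frame

page_table = {p: p + 100 for p in range(16)}
tlb = TLB(capacity=4)
for _ in range(100):
    for page in (0, 1, 2):                       # good locality: fits in the TLB
        tlb.translate(page, page_table)
print(tlb.hits, tlb.misses)                      # 297 3

tlb.entries.clear()                              # context switch: TLB flush
tlb.translate(0, page_table)                     # same page now misses again
print(tlb.misses)                                # 4
```

The flush at the end is the cost the text describes: after a process switch, even previously hot translations must be re-fetched from the page table.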


8. Virtual Memory

Demand Paging

Pages are not loaded into memory upfront. They are loaded only when actually accessed.

Page Fault Handling Process

1. CPU accesses a logical address
2. Check valid bit in page table
3. Not valid -> Page fault interrupt
4. OS finds the page on disk
5. Allocate free frame (page replacement if none)
6. Load from disk to frame
7. Update page table (valid bit = 1)
8. Restart interrupted instruction
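The steps above can be reduced to a minimal demand-paging sketch: an absent page-table entry plays the role of a cleared valid bit, and "loading from disk" is just allocating the next free frame:

```python
# Sketch of demand paging: pages are loaded only on first access.
# A missing page-table entry (valid bit off) triggers a "page
# fault", which allocates a frame and maps the page.
class DemandPager:
    def __init__(self):
        self.page_table = {}                   # page -> frame (present = valid)
        self.next_frame = 0
        self.faults = 0

    def access(self, page):
        if page not in self.page_table:        # valid bit off -> page fault
            self.faults += 1
            self.page_table[page] = self.next_frame   # "load from disk"
            self.next_frame += 1
        return self.page_table[page]           # restart the access

pager = DemandPager()
for page in [0, 1, 0, 2, 0, 1, 3]:
    pager.access(page)
print(pager.faults)  # 4 (first touch of pages 0, 1, 2, 3)
```

Only the first touch of each page faults; repeated accesses go straight through the page table, which is the whole point of loading on demand.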

Page Replacement Algorithms

Algorithm  Description                           Performance                 Implementation
FIFO       Replace oldest page                   Belady's anomaly possible   Queue
LRU        Replace least recently used           Good, approximates optimal  Stack/counter
Clock      LRU approximation with reference bit  Good, efficient             Circular list
LFU        Replace least frequently used         Good for certain patterns   Counter + heap
Optimal    Replace page used furthest in future  Theoretically optimal       Not implementable
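FIFO's "Belady's anomaly possible" entry can be reproduced directly with the classic reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 — adding a frame increases the fault count:

```python
from collections import deque

# FIFO page replacement, used here to reproduce Belady's anomaly:
# for some reference strings, ADDING frames INCREASES page faults.
def fifo_faults(refs, num_frames):
    frames = deque()
    faults = 0
    for page in refs:
        if page not in frames:
            faults += 1
            if len(frames) == num_frames:
                frames.popleft()          # evict the oldest page
            frames.append(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3))  # 9
print(fifo_faults(refs, 4))  # 10  <- more frames, yet more faults
```

LRU-family algorithms (so-called stack algorithms) cannot exhibit this anomaly, which is one reason they are preferred over plain FIFO.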

Thrashing

When a process is allocated fewer frames than its working set requires, page faults become extremely frequent and CPU utilization drops dramatically.

Solutions:

  • Working Set Model: Allocate frames matching the working set size
  • PFF (Page Fault Frequency): Adjust frame allocation based on fault frequency
  • Reduce Multiprogramming: Swap out some processes

9. File System

inode Structure

┌────────────────────────────────┐
│             inode              │
├────────────────────────────────┤
│ File type (regular, dir, ...)  │
│ Permissions (rwxrwxrwx)        │
│ Owner (UID, GID)               │
│ File size                      │
│ Timestamps (atime/mtime/ctime) │
│ Link count                     │
├────────────────────────────────┤
│ Direct Blocks (12)             │ -> Data blocks
│ Single Indirect Block          │ -> Pointer block -> Data
│ Double Indirect Block          │ -> Pointer -> Pointer -> Data
│ Triple Indirect Block          │ -> 3-level indirection
└────────────────────────────────┘
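The block-pointer layout determines the maximum file size. A quick back-of-the-envelope calculation, assuming ext2-style 4 KB blocks and 4-byte block pointers (illustrative numbers, not any specific filesystem's real limits):

```python
# Maximum file size addressable by a classic ext2-style inode,
# assuming 4 KB blocks and 4-byte block pointers.
BLOCK = 4096
PTRS = BLOCK // 4                 # 1024 pointers fit in one indirect block

total_blocks = (
    12                            # direct blocks
    + PTRS                        # single indirect
    + PTRS ** 2                   # double indirect
    + PTRS ** 3                   # triple indirect
)
max_bytes = total_blocks * BLOCK
print(f"{max_bytes / 2**40:.2f} TiB addressable")  # ~4 TiB
```

The triple-indirect term dominates: almost all of the addressable space comes from the three-level tree, while the 12 direct blocks only cover the first 48 KB (which is why small files are cheap to access).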

ext4 vs XFS

Feature          ext4                                      XFS
Max File Size    16TB                                      8EB
Max Volume Size  1EB                                       8EB
Journaling       Metadata (full data journaling optional)  Metadata only
Allocation       Extent-based                              Extent + B+Tree
Parallel I/O     Average                                   Excellent (Allocation Groups)
Best For         General purpose, small files              Large files, high performance

Journaling

Records file system changes in a journal (log) before applying them to data. Uses the journal to restore consistency after system crashes.

VFS (Virtual File System) Layer

User Program
    |
    v
VFS (Virtual File System)
    |
    +-->  ext4
    +-->  XFS
    +-->  NFS (network)
    +-->  procfs (/proc)
    +-->  sysfs (/sys)

VFS provides a unified interface for various file systems. User programs access any file system through the same system calls (open, read, write, etc.).


10. I/O Management

Blocking vs Non-blocking I/O

// Blocking I/O -- read() blocks until data arrives
char buf[1024];
int n = read(fd, buf, sizeof(buf));  // Blocking!

// Non-blocking I/O (requires <fcntl.h> and <errno.h>)
int flags = fcntl(fd, F_GETFL, 0);
fcntl(fd, F_SETFL, flags | O_NONBLOCK);
int n = read(fd, buf, sizeof(buf));
if (n == -1 && errno == EAGAIN) {
    // No data available, try again later
}

I/O Multiplexing: epoll (Linux)

#include <sys/epoll.h>

#define MAX_EVENTS 64

int epoll_fd = epoll_create1(0);

struct epoll_event ev;
ev.events = EPOLLIN | EPOLLET;  // Edge-Triggered
ev.data.fd = listen_fd;
epoll_ctl(epoll_fd, EPOLL_CTL_ADD, listen_fd, &ev);

struct epoll_event events[MAX_EVENTS];
while (1) {
    int nfds = epoll_wait(epoll_fd, events, MAX_EVENTS, -1);
    for (int i = 0; i < nfds; i++) {
        if (events[i].data.fd == listen_fd) {
            int conn_fd = accept(listen_fd, ...);
            // Register conn_fd with epoll too
        } else {
            handle_client(events[i].data.fd);
        }
    }
}

io_uring (Linux 5.1+)

io_uring is a modern Linux interface for performing asynchronous I/O without system call overhead. Submission Queue (SQ) and Completion Queue (CQ) are shared ring buffers between kernel and user space.

Zero-Copy

Traditional approach (4 copies):
Disk -> Kernel Buffer -> User Buffer -> Socket Buffer -> NIC

Zero-Copy (sendfile):
Disk -> Kernel Buffer -> NIC (2 or fewer copies)

#include <sys/sendfile.h>
// Send file directly to socket (no user space copy)
sendfile(socket_fd, file_fd, NULL, file_size);

11. Linux Kernel Basics

System Calls

User Program
    |
    |  write(fd, buf, count)
    v
+-------------------+
|  C Library        |  <- glibc write() wrapper
|  (glibc)          |
+--------+----------+
         | syscall instruction
         v
+-------------------+
|  Kernel Space     |  <- sys_write() kernel function
+-------------------+

strace — System Call Tracing

# Trace system calls of a process
strace -p 1234

# Filter specific system calls
strace -e trace=open,read,write ./my_program

# Include timing info
strace -T -e trace=network ./my_server

# Summary statistics
strace -c ./my_program

perf — Performance Analysis

# CPU profiling
perf record -g ./my_program
perf report

# Check cache misses
perf stat -e cache-misses,cache-references ./my_program

# Count specific events
perf stat -e context-switches,cpu-migrations ./my_program

eBPF (extended Berkeley Packet Filter)

eBPF allows running programs inside the kernel without modifying it. Used for network monitoring, security, and performance analysis.

# Simple eBPF with bpftrace

# Count all system calls
bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'

# Trace file opens per process
bpftrace -e 'tracepoint:syscalls:sys_enter_openat {
    printf("%s opened %s\n", comm, str(args->filename));
}'

12. Containers from an OS Perspective

Docker's Reality — Namespaces + cgroups + Union FS

Docker containers are not VMs. They combine Linux kernel features to isolate processes.

Namespaces — Process Isolation

Namespace  Isolates           Description
PID        Process IDs        PID starts at 1 inside container
NET        Network stack      Independent network interfaces, IP
MNT        Filesystem mounts  Independent mount points
UTS        Hostname           Independent hostname
IPC        IPC resources      Independent message queues, semaphores
USER       User/Group IDs     Map container root to unprivileged host user
CGROUP     cgroup root        Independent cgroup hierarchy

# Run shell in new PID namespace
sudo unshare --pid --mount-proc --fork /bin/bash
# ps aux in this shell shows only isolated processes

cgroups — Resource Limiting

# Docker cgroups resource limits
docker run --cpus="0.5"           # CPU 50% limit
docker run --memory="512m"        # Memory 512MB limit
docker run --pids-limit=100       # Max 100 processes

Union File System (OverlayFS)

Container Layer (read/write)
    |
    v
+---------------------+
|   OverlayFS         |
|   (merged view)     |
+---------------------+
    |         |
    v         v
Image Layer 3  Image Layer 2  Image Layer 1 (read-only)
(app code)     (dependencies)  (base OS)

OverlayFS merges multiple filesystem layers into one view. Writes use Copy-on-Write to record changes only in the upper layer.
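The merged-view lookup can be sketched with plain dictionaries standing in for layers — a toy model, not the real OverlayFS semantics (whiteouts, copy-up of whole file contents, etc. are omitted):

```python
# Toy model of OverlayFS's merged view: look up a path in the
# writable upper layer first, then in each read-only lower layer.
# Writes only ever touch the upper layer (copy-on-write), leaving
# the image layers untouched.
def merged_read(path, upper, lowers):
    if path in upper:
        return upper[path]
    for layer in lowers:                  # image layers, top to bottom
        if path in layer:
            return layer[path]
    raise FileNotFoundError(path)

def merged_write(path, data, upper):
    upper[path] = data                    # change recorded in upper layer only

base  = {"/etc/os-release": "base OS"}    # hypothetical example paths
deps  = {"/usr/lib/libfoo.so": "dependency"}
app   = {"/app/main.py": "app code"}
upper = {}                                # the container's writable layer

print(merged_read("/etc/os-release", upper, [app, deps, base]))  # base OS
merged_write("/etc/os-release", "modified", upper)
print(merged_read("/etc/os-release", upper, [app, deps, base]))  # modified
print(base["/etc/os-release"])            # base OS (image layer untouched)
```

This is why many containers can share the same image layers on disk: reads fall through to shared read-only layers, and each container's changes live only in its own upper layer.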


13. Interview Questions — 25 Essential Questions

Process/Thread (1-5)

Q1: Explain the difference between processes and threads.

A process is an execution unit with independent memory space (Code, Data, Heap, Stack). A thread shares Code, Data, and Heap within the same process and has only an independent Stack.

Key differences:

  • Memory: Processes are independent, threads share
  • Creation cost: Processes are much more expensive
  • Communication: Processes need IPC, threads directly access shared memory
  • Stability: Process isolation is safer, thread errors affect the entire process
Q2: What is a context switch and why is it costly?

A context switch is when the CPU transitions from one process/thread to another.

Cost reasons:

  1. PCB Save/Restore: Save registers, program counter state
  2. TLB Flush: TLB invalidation on process switch (not needed for thread switch)
  3. Cache Invalidation: L1/L2 cache data irrelevant to new process
  4. Pipeline Flush: Discard CPU pipeline instructions

Optimization: Use threads (no TLB flush), coroutines, CPU affinity settings.

Q3: Explain the difference between fork() and exec().
  • fork(): Duplicates the current process to create a child. Child runs same code. Efficient with Copy-on-Write.
  • exec(): Replaces current process memory with a new program. PID is preserved. Does not return (on success).

Common pattern: Call fork(), then exec() in the child to run a new program.

Q4: What is the difference between zombie and orphan processes?
  • Zombie Process: Child has terminated but parent has not called wait(). Only PCB remains, wasting resources. Fix with SIGCHLD handler or wait().
  • Orphan Process: Parent terminated before child. init (PID 1) becomes the new parent and calls wait(), so not a major issue.
Q5: Compare IPC methods and explain appropriate use cases.
  • Pipe: Unidirectional, parent-child. Shell pipelines.
  • Socket: Bidirectional, network support. Client-server communication.
  • Shared Memory: Fastest, requires synchronization. High-performance data exchange.
  • Message Queue: Asynchronous, buffered. Task queue systems.
  • Signal: Asynchronous notification. Process control.

Selection criteria: Communication direction, performance requirements, network needs, sync/async requirements.

Memory (6-10)

Q6: What is virtual memory and why is it needed?

Virtual memory is an abstraction layer providing each process with an independent, contiguous address space.

Why needed:

  1. Memory Protection: Block inter-process memory access
  2. Memory Extension: Run programs larger than physical memory
  3. Memory Efficiency: Only load actually-used pages into physical memory
  4. Simplified Programming: Processes use contiguous addresses starting from 0
Q7: Describe the page fault handling process.
  1. CPU accesses virtual address
  2. MMU checks valid bit in page table
  3. Not valid -> Page fault trap
  4. OS finds page location on disk
  5. If no free frame, run page replacement algorithm
  6. Load page from disk to physical frame (I/O occurs)
  7. Update page table (valid bit = 1, record frame number)
  8. Restart the trapped instruction
Q8: Explain LRU page replacement and implementation methods.

LRU (Least Recently Used) replaces the page that has not been used for the longest time. It approximates the optimal (OPT) algorithm.

Implementation methods:

  • Counter method: Record last access time for each page. Find minimum on replacement.
  • Stack method: Move page to top on access. Replace page at bottom.
  • Approximate LRU (Clock Algorithm): Use reference bits in a circular list. Most commonly used in real OSes.
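The Clock algorithm from the last bullet, as a minimal sketch (frame count and reference string are arbitrary illustrative values):

```python
# Sketch of the Clock (second-chance) algorithm: frames form a
# circular list with a reference bit. On a fault, the hand skips
# (and clears) pages whose bit is set, evicting the first page
# whose bit is already 0.
class Clock:
    def __init__(self, num_frames):
        self.frames = [None] * num_frames
        self.ref_bit = [0] * num_frames
        self.hand = 0
        self.faults = 0

    def access(self, page):
        if page in self.frames:
            self.ref_bit[self.frames.index(page)] = 1   # mark recently used
            return
        self.faults += 1
        while self.ref_bit[self.hand] == 1:             # second chance: spare it
            self.ref_bit[self.hand] = 0
            self.hand = (self.hand + 1) % len(self.frames)
        self.frames[self.hand] = page                   # evict and replace
        self.ref_bit[self.hand] = 1
        self.hand = (self.hand + 1) % len(self.frames)

clock = Clock(3)
for page in [1, 2, 3, 2, 4, 1]:
    clock.access(page)
print(clock.faults)  # 5
```

Note how the hit on page 2 sets its reference bit, so when page 4 faults the hand clears bits all the way around before evicting — an O(1)-amortized approximation of LRU without timestamps.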
Q9: What is the difference between internal and external fragmentation?
  • Internal Fragmentation: Unused space within an allocated memory block. Occurs in paging when last page is not fully used.
  • External Fragmentation: Total free memory is sufficient but not contiguous, making allocation impossible. Occurs in segmentation.

Solution: Paging eliminates external fragmentation but introduces internal. Compaction can solve external fragmentation but is costly.

Q10: What is thrashing and how do you prevent it?

Thrashing occurs when a process receives fewer frames than its working set requires, causing extremely frequent page faults and dramatic CPU utilization drops.

Prevention:

  1. Working Set Model: Allocate frames matching working set size
  2. PFF (Page Fault Frequency): Monitor and adjust frame allocation
  3. Reduce multiprogramming: Swap out some processes

Synchronization/Deadlock (11-15)

Q11: Explain the difference between mutex and semaphore.

Mutex:

  • Binary (0/1) locking mechanism
  • Has ownership: only the locking thread can unlock
  • One thread in critical section

Semaphore:

  • Counting capable (0~N)
  • No ownership: any thread can signal (post)
  • N threads can access resource simultaneously

Use cases: Mutex for mutual exclusion (single resource), semaphore for resource pool management (DB connection pool, thread pool).

Q12: What are the 4 deadlock conditions and how to break each?
  1. Mutual Exclusion: Make resources shareable (read-only resources can be shared)
  2. Hold and Wait: Request all resources at once, or release held resources before requesting
  3. No Preemption: Introduce mechanism to forcibly take resources
  4. Circular Wait: Number resources and always request in ascending order

Most effective in practice: Prevent circular wait (enforce lock ordering).

Q13: When should you use a spinlock?

Suitable conditions:

  • Very short critical sections (tens of nanoseconds or less)
  • Multi-core environment (thread on another core will release lock soon)
  • Context switch cost exceeds spin wait cost

Not suitable:

  • Long critical sections (CPU waste)
  • Single-core environment (lock-holding thread cannot run)
  • Lock-holding thread can be preempted
Q14: Explain the Producer-Consumer problem with solution code.

Producer adds items to a buffer, consumer removes them. Producer waits when buffer is full, consumer waits when empty.

import threading
import queue

buffer = queue.Queue(maxsize=10)

def producer():
    for i in range(20):
        buffer.put(i)
        print(f"Produced: {i}")

def consumer():
    while True:
        item = buffer.get()
        if item is None:
            break
        print(f"Consumed: {item}")

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start()
t2.start()
t1.join()
buffer.put(None)  # Termination signal
t2.join()
Q15: What is Priority Inversion and how do you solve it?

Priority Inversion: A high-priority task waits for a resource held by a low-priority task. If a medium-priority task preempts the low-priority task, the high-priority task can wait indefinitely.

Solutions:

  1. Priority Inheritance: Low-priority task temporarily inherits high priority
  2. Priority Ceiling: Pre-set the priority ceiling for resources

Real case: The Mars Pathfinder reset bug was caused by Priority Inversion, solved with Priority Inheritance.

Linux/Practical (16-25)

Q16: When can deleting a file in Linux not reclaim disk space?

If a process still has the file open (open file descriptor), the filename is removed but the inode persists. Disk space is not reclaimed until the process closes the file or terminates.

Check: lsof +L1 (list deleted but still-open files)
Fix: restart the process or close the file descriptor.

Q17: What is the Linux OOM Killer?

The OOM (Out of Memory) Killer is a kernel mechanism that selects and forcibly terminates processes when system memory is exhausted.

Selection criteria: Processes with high /proc/PID/oom_score. Considers memory usage, runtime, importance, etc.

Protection: Set oom_score_adj to -1000 to exclude from OOM Kill targets.

Q18: What is the difference between epoll Level Triggered and Edge Triggered?
  • Level Triggered (LT): Continues notifying while condition persists. epoll_wait keeps returning if data remains. Safe but may cause unnecessary calls.
  • Edge Triggered (ET): Notifies only on state change. Must read all data at once since only one notification. Higher performance but requires careful programming.

ET caution: non-blocking I/O + read until EAGAIN is mandatory.

Q19: What problems can you diagnose with strace?
  1. File access issues: Which files are being opened and failing
  2. Network issues: Where it connects, whether timeouts occur
  3. Performance issues: Which system calls take long (-T option)
  4. Signal handling: Which signals are received and processed
  5. Resource exhaustion: File descriptor limits, memory allocation failures
Q20: Explain how Docker containers differ from VMs from an OS perspective.
  • VM: Runs a complete guest OS on a hypervisor. Each VM has its own kernel. Hardware-level virtualization.
  • Container: Shares the host OS kernel. Uses Namespaces for process isolation and cgroups for resource limiting. OS-level virtualization.

Key difference: Containers share the kernel, so they boot faster with lower overhead, but kernel vulnerabilities affect all containers.

Q21: Explain how the CFS scheduler works.

CFS (Completely Fair Scheduler) allocates fair CPU time to all processes.

How it works:

  1. Track vruntime (virtual runtime) for each process
  2. Sort by vruntime in a red-black tree
  3. Always run the process with smallest vruntime next
  4. Lower nice values cause vruntime to increase more slowly (more CPU time)
Q22: Explain Copy-on-Write mechanics and use cases.

COW defers resource copying until actual modification occurs. Initially shares the same physical pages, and copies only when one side attempts to write.

Use cases:

  1. fork(): Defer memory copying during child process creation
  2. mmap(): Share file mappings until actual modification
  3. Redis RDB save: Child process created via fork() generates snapshot
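
The fork() case can be observed from Python: after fork, each process behaves as if it had a fully private copy, while the kernel physically copies pages only on write. A minimal Unix-only sketch:

```python
import os

data = bytearray(b"shared")      # one physical copy at fork time (COW)

pid = os.fork()
if pid == 0:                     # child: the write below makes the kernel
    data[0:6] = b"childs"        # copy the touched page for the child
    os._exit(0)

os.waitpid(pid, 0)
print(data.decode())             # parent still sees "shared"
```
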
Q23: What is the difference between kernel mode and user mode?
  • User Mode: Can only execute restricted instructions. No direct hardware access. Where normal programs run.
  • Kernel Mode: Can access all instructions and hardware. Can execute privileged instructions.

Mode switch: Transitions from user mode to kernel mode during system calls, interrupts, and exceptions.

Cost: Mode switching itself incurs hundreds of nanoseconds overhead (register save/restore, security checks).

Q24: What is the difference between interrupts and traps?
  • Interrupt: Asynchronous signal from external events. Hardware devices (keyboard, network, timer) notify the CPU.
  • Trap: Synchronous signal from software. System call execution, division by zero, page faults, etc.

In common: both suspend the current execution flow, run the corresponding handler, and resume after processing.

Q25: Describe the memory layout of a Linux process.
High address
+-----------------+
|   Kernel Space  |  (user inaccessible)
+-----------------+
|     Stack       |  grows downward (function calls, locals)
|                 |
+-----------------+
|   Shared Libs   |  (dynamic libraries)
+-----------------+
|                 |
|     Heap        |  grows upward (malloc, new)
+-----------------+
|     BSS         |  (uninitialized global/static vars)
+-----------------+
|     Data        |  (initialized global/static vars)
+-----------------+
|     Text(Code)  |  (executable code, read-only)
+-----------------+
Low address

Check actual layout with /proc/PID/maps or pmap PID.
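
For example, the labeled regions can be pulled out of the current process's own map. A Linux-only sketch:

```python
# /proc/self/maps lists every mapping: address range, permissions, and
# a label for special regions such as [heap], [stack], and [vdso].
with open("/proc/self/maps") as f:
    for line in f:
        if any(tag in line for tag in ("[heap]", "[stack]", "[vdso]")):
            print(line.rstrip())
```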


14. Quiz

Quiz 1: If a process calls fork() 3 times, how many total processes exist?

8 (2³ = 8)

Every existing process executes each subsequent fork(), so the process count doubles with every call:

  • First fork(): 1 -> 2
  • Second fork(): 2 -> 4
  • Third fork(): 4 -> 8
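
This can be verified by having every resulting process write one byte to a pipe (Unix-only sketch; the pipe is just a counting trick):

```python
import os

parent = os.getpid()
r, w = os.pipe()

for _ in range(3):
    os.fork()

os.write(w, b"x")        # every process, original included, writes 1 byte
os.close(w)
if os.getpid() != parent:
    os._exit(0)          # the 7 descendants stop here

# Original parent: EOF arrives once every copy of the write end is closed.
total = 0
while chunk := os.read(r, 64):
    total += len(chunk)
print(total)             # 8
```
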
Quiz 2: Can deadlock occur in this scenario? Resources A and B exist. Thread 1 locks A first then B. Thread 2 also locks A first then B.

No, deadlock cannot occur.

Both threads try to lock A first, so one acquires A while the other waits. The thread that acquired A can safely acquire B and release both. The circular wait condition is not satisfied.

Deadlock occurs when Thread 1 locks A->B and Thread 2 locks B->A.
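
The safe pattern looks like this in code (a sketch using Python's threading module for illustration):

```python
import threading

lock_a, lock_b = threading.Lock(), threading.Lock()
finished = []

def worker(name):
    # Both threads take the locks in the same global order: A, then B.
    # A consistent order makes the circular wait condition impossible.
    with lock_a:
        with lock_b:
            finished.append(name)

threads = [threading.Thread(target=worker, args=(n,)) for n in ("t1", "t2")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(finished))   # both complete; reverse the order in ONE of the
                          # threads (B, then A) and this program can hang
```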

Quiz 3: Why is a TLB miss less costly than a page fault?
  • TLB miss: Just needs to reference the page table in memory. 1-2 additional memory accesses. Hundreds of nanoseconds.
  • Page fault: Must load the page from disk. Disk I/O takes milliseconds (SSD) to tens of milliseconds (HDD), thousands to tens of thousands of times costlier than a TLB miss.

Key point: TLB miss is resolved in memory, page fault is resolved from disk.

Quiz 4: Why is the PID 1 process important in Docker?

In Linux, PID 1 (init) has special roles:

  1. Signal handling: The kernel does not apply default signal dispositions to PID 1, so it may ignore SIGTERM unless a handler is installed
  2. Zombie process cleanup: PID 1 becomes parent of orphan processes and must call wait()
  3. Container termination: When PID 1 exits, the entire container stops

Solution: Use a lightweight init like tini as PID 1, or Docker's --init option.

Quiz 5: How can a system with 4GB physical memory provide each process a 4GB virtual address space?

Due to virtual memory's core principles:

  1. Demand Paging: Only actually-used pages are loaded into physical memory
  2. Swap Space: Unused pages are swapped to disk
  3. Page Sharing: Processes using the same library share one physical copy

Since not all processes use their full 4GB simultaneously, virtual space larger than physical memory can be provided.
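
Demand paging is easy to observe: reserving a large anonymous mapping barely moves resident memory until pages are actually touched. A Linux-only sketch (1 GiB instead of 4 GiB to keep the demo modest):

```python
import mmap

def rss_kb():
    # VmRSS in /proc/self/status is this process's resident set, in KiB.
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])

before = rss_kb()
region = mmap.mmap(-1, 1 << 30)   # reserve 1 GiB of virtual address space
after = rss_kb()

# The virtual reservation is nearly free; RSS grows only on first touch.
print(f"RSS before: {before} KiB, after reserving 1 GiB: {after} KiB")
region[0] = 1                     # first touch faults in a single page
```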


Operating systems are the foundation of all software. The concepts covered in this article go beyond mere interview preparation — they help diagnose performance issues, prevent concurrency bugs, and make correct decisions in system design. Understanding how the Linux kernel works, in particular, equips you with the capability to fundamentally solve problems that arise in containers, cloud, and distributed systems.