[OS Concepts] 04. Threads and Concurrent Programming

Thread Concept

A thread is the basic unit of CPU utilization. Multiple threads can execute within a single process, and each thread has its own thread ID, program counter, register set, and stack. Threads of the same process share resources such as code, data, and files.

[Single-threaded vs Multi-threaded Process]

Single-threaded process:              Multi-threaded process:

+----------------------+              +--------------------------------+
|  Code | Data | Files |              |  Code  |  Data  |  Files       |
+----------------------+              +--------------------------------+
| Registers |  Stack   |              | Registers| Registers| Registers|
+----------------------+              +----------+----------+----------+
                                      |  Stack   |  Stack   |  Stack   |
                                      +----------+----------+----------+
                                      | Thread 1 | Thread 2 | Thread 3 |
                                      +----------+----------+----------+

Motivation for Using Threads

  1. Responsiveness: In GUI applications, one thread handles user input while another performs computations
  2. Resource Sharing: Threads of the same process automatically share memory and resources
  3. Economy: Thread creation is much lighter and faster than process creation. Context switching is also faster
  4. Scalability: Performance improves by running threads in parallel on multicore architectures

[Multi-threaded Web Server Structure]

                                    +-> [Worker Thread 1] -> Handle request
Client requests ---> [Main Thread] -+-> [Worker Thread 2] -> Handle request
                                    +-> [Worker Thread 3] -> Handle request

Each worker thread handles individual client requests
Code, data (cache, etc.) are shared by all threads

Multicore Programming

Parallel Processing on Multicore Systems

On multicore systems, threads can actually execute in parallel.

[Thread Execution on Single Core vs Multicore]

Single core (Concurrency):
Time -->
Core1: [T1][T2][T3][T1][T2][T3][T1]...
        Alternating execution (time-sharing)

Multicore (Parallelism):
Time -->
Core1: [T1][T1][T1][T1][T1]...
Core2: [T2][T2][T2][T2][T2]...
Core3: [T3][T3][T3][T3][T3]...
        Simultaneous execution

Amdahl's Law

If part of a program must execute sequentially, the achievable speedup is bounded no matter how many cores are added.

[Amdahl's Law]

Speedup = 1 / (S + (1 - S) / N)

S = Sequential execution ratio
N = Number of processors (cores)

Example: 25% of program is sequential (S = 0.25)

Cores (N)  |  Speedup
-----------+-----------
    1      |   1.00x
    2      |   1.60x
    4      |   2.29x
    8      |   2.91x
   16      |   3.37x
 Infinity  |   4.00x (theoretical maximum)

If the sequential portion is 25%, the limit is 4x regardless of how many cores are added!

Challenges of Multicore Programming

  1. Task Decomposition: Identify tasks that can execute independently
  2. Balance: Distribute equal amounts of work to each thread
  3. Data Splitting: Partition data so it can be used on individual cores
  4. Data Dependency: Synchronization needed for shared data
  5. Testing and Debugging: Difficulties due to non-deterministic execution paths

Multithreading Models

User Threads and Kernel Threads

  • User Threads: Managed by user-level libraries. The kernel is unaware of their existence
  • Kernel Threads: Directly managed by the OS kernel. Kernel performs scheduling

[Multithreading Models]

1. Many-to-One:
   User threads: T1 T2 T3 T4
                  \  |  |  /
   Kernel threads:  K1
   - If one blocks, all block
   - Cannot utilize multicore

2. One-to-One:
   User threads: T1  T2  T3  T4
                  |   |   |   |
   Kernel threads: K1  K2  K3  K4
   - Higher parallelism
   - May limit number of threads
   - Used by Linux, Windows

3. Many-to-Many:
   User threads: T1 T2 T3 T4 T5
                  \  | X  |  /
   Kernel threads: K1  K2  K3
   - Flexible mapping
   - Complex implementation

Most modern OSes use the one-to-one model. Linux creates kernel threads using the clone() system call.


Thread Libraries

Pthreads

The POSIX standard thread API. Used on Linux and macOS.

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

#define NUM_THREADS 5

// Thread function
void *thread_func(void *arg) {
    int thread_id = *(int *)arg;
    long sum = 0;

    // Each thread computes sum of a different range
    long start = thread_id * 1000000L + 1;
    long end = (thread_id + 1) * 1000000L;

    for (long i = start; i <= end; i++) {
        sum += i;
    }

    printf("Thread %d: sum from %ld to %ld = %ld\n",
           thread_id, start, end, sum);

    long *result = malloc(sizeof(long));
    *result = sum;
    return (void *)result;
}

int main() {
    pthread_t threads[NUM_THREADS];
    int thread_ids[NUM_THREADS];
    long total = 0;

    // Create threads
    for (int i = 0; i < NUM_THREADS; i++) {
        thread_ids[i] = i;
        int rc = pthread_create(&threads[i], NULL,
                                thread_func, &thread_ids[i]);
        if (rc) {
            fprintf(stderr, "Thread creation failed: %d\n", rc);
            exit(EXIT_FAILURE);
        }
    }

    // Wait for all threads and collect results
    for (int i = 0; i < NUM_THREADS; i++) {
        void *result;
        pthread_join(threads[i], &result);
        total += *(long *)result;
        free(result);
    }

    printf("Total sum: %ld\n", total);
    return 0;
}

# -pthread flag required for compilation
gcc -pthread thread_sum.c -o thread_sum

Java Threads

In Java, threads are created by extending the Thread class or implementing the Runnable interface.

// Multi-threaded summation using Runnable interface
public class ThreadSum {

    static long[] partialSums;

    static class SumTask implements Runnable {
        private final int threadId;
        private final long start;
        private final long end;

        SumTask(int threadId, long start, long end) {
            this.threadId = threadId;
            this.start = start;
            this.end = end;
        }

        @Override
        public void run() {
            long sum = 0;
            for (long i = start; i <= end; i++) {
                sum += i;
            }
            partialSums[threadId] = sum;
            System.out.printf("Thread %d: sum %d~%d = %d%n",
                              threadId, start, end, sum);
        }
    }

    public static void main(String[] args) throws Exception {
        int numThreads = 5;
        partialSums = new long[numThreads];
        Thread[] threads = new Thread[numThreads];

        for (int i = 0; i < numThreads; i++) {
            long start = i * 1_000_000L + 1;
            long end = (i + 1) * 1_000_000L;
            threads[i] = new Thread(new SumTask(i, start, end));
            threads[i].start();
        }

        // Wait for all threads to finish
        for (Thread t : threads) {
            t.join();
        }

        long total = 0;
        for (long s : partialSums) {
            total += s;
        }
        System.out.println("Total sum: " + total);
    }
}

Implicit Threading

Multithreaded programming is difficult and error-prone. Implicit threading is an approach that delegates thread creation and management to the compiler or runtime library.

Thread Pool

Creates several threads in advance and assigns tasks to idle threads as work arrives.

// Java thread pool usage example
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Callable;
import java.util.ArrayList;
import java.util.List;

public class ThreadPoolExample {

    static class SumTask implements Callable<Long> {
        private final long start, end;

        SumTask(long start, long end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public Long call() {
            long sum = 0;
            for (long i = start; i <= end; i++) {
                sum += i;
            }
            return sum;
        }
    }

    public static void main(String[] args) throws Exception {
        // Create thread pool with as many threads as CPU cores
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);

        List<Future<Long>> futures = new ArrayList<>();

        // Submit tasks
        for (int i = 0; i < 10; i++) {
            long start = i * 1_000_000L + 1;
            long end = (i + 1) * 1_000_000L;
            futures.add(pool.submit(new SumTask(start, end)));
        }

        // Collect results
        long total = 0;
        for (Future<Long> f : futures) {
            total += f.get();
        }

        System.out.println("Total sum: " + total);
        pool.shutdown();
    }
}

The advantages of thread pools are as follows.

  • Reduces creation overhead by reusing existing threads
  • Protects system resources by limiting the number of concurrent threads
  • Enables flexible scheduling by separating task creation from execution

Fork-Join Framework

A pattern that divides (forks) a large task into smaller subtasks and combines (joins) the results.

[Fork-Join Operation]

                 [Full task: 1~10M]
                    fork /  \ fork
                        /    \
            [1~5M]              [5M+1~10M]
          fork/ \fork          fork/ \fork
             /    \               /    \
  [1~2.5M]  [2.5M+1~5M]  [5M+1~7.5M]  [7.5M+1~10M]
             \    /               \    /
          join\ /join          join\ /join
                |                    |
           [Partial 1]          [Partial 2]
                 \                  /
                  join \      / join
                        \    /
                     [Total Sum]

import java.util.concurrent.RecursiveTask;
import java.util.concurrent.ForkJoinPool;

public class ForkJoinSum extends RecursiveTask<Long> {
    private static final long THRESHOLD = 100_000;
    private final long start, end;

    ForkJoinSum(long start, long end) {
        this.start = start;
        this.end = end;
    }

    @Override
    protected Long compute() {
        if (end - start <= THRESHOLD) {
            // Base case: compute directly
            long sum = 0;
            for (long i = start; i <= end; i++) {
                sum += i;
            }
            return sum;
        }

        // Recursive division
        long mid = (start + end) / 2;
        ForkJoinSum left = new ForkJoinSum(start, mid);
        ForkJoinSum right = new ForkJoinSum(mid + 1, end);

        left.fork();              // Execute left asynchronously
        long rightResult = right.compute();  // Execute right in current thread
        long leftResult = left.join();       // Wait for left result

        return leftResult + rightResult;
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool();
        long result = pool.invoke(new ForkJoinSum(1, 10_000_000));
        System.out.println("Result: " + result);
    }
}

OpenMP

Parallelizes C/C++ code using compiler directives (pragmas).

#include <stdio.h>
#include <omp.h>

int main() {
    long sum = 0;
    int n = 10000000;

    // OpenMP parallel for loop
    // Compiler automatically distributes iterations to threads
    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i <= n; i++) {
        sum += i;
    }

    printf("Sum: %ld\n", sum);
    printf("Max available threads: %d\n", omp_get_max_threads());

    return 0;
}

# OpenMP compilation
gcc -fopenmp parallel_sum.c -o parallel_sum

# Run with specified number of threads
OMP_NUM_THREADS=4 ./parallel_sum

The main OpenMP directives are as follows.

  • parallel: Start a parallel region
  • for: Parallelize a loop
  • critical: Designate a critical section
  • atomic: Atomic operation
  • reduction: Reduction operation (sum, product, etc.)

Semantics of fork() and exec()

When fork() is called in a multithreaded process, there are two options.

  • fork that copies all threads
  • fork that copies only the calling thread (most Unix implementations)

When exec() is called, all threads are replaced by the new program.

Signal Handling

Which thread should receive a signal?

  1. Deliver the signal to the thread to which it applies
  2. Deliver the signal to every thread in the process
  3. Deliver the signal to certain threads in the process
  4. Assign a specific thread to receive all signals for the process

Thread Cancellation

Prematurely terminating a running thread. There are two approaches.

  • Asynchronous Cancellation: Immediately terminates the target thread (risk of resource leaks)
  • Deferred Cancellation: Target thread periodically checks for cancellation requests (safe)

Summary

A thread is a lightweight unit that divides execution flow within a process. True parallel execution is possible on multicore systems, but according to Amdahl's Law, the sequential portion becomes the bottleneck. Implicit threading techniques like thread pools, Fork-Join, and OpenMP reduce the programmer's burden while enabling efficient parallel processing.