- Authors
  - Youngju Kim (@fjvbn20031)
Thread Concept
A thread is the basic unit of CPU utilization. Multiple threads can execute within a single process, and each thread has its own thread ID, program counter, register set, and stack. Threads of the same process share resources such as code, data, and files.
[Single-threaded vs Multi-threaded Process]

Single-threaded process:         Multi-threaded process:
+---------------------+          +---------------------------------+
| Code | Data | Files |          |       Code | Data | Files       |
+---------------------+          +----------+----------+-----------+
| Registers |  Stack  |          | Registers| Registers| Registers |
+---------------------+          |  Stack   |  Stack   |  Stack    |
                                 +----------+----------+-----------+
                                 | Thread 1 | Thread 2 | Thread 3  |
                                 +----------+----------+-----------+
Motivation for Using Threads
- Responsiveness: In GUI applications, one thread handles user input while another performs computations
- Resource Sharing: Threads of the same process automatically share memory and resources
- Economy: Thread creation is much lighter and faster than process creation. Context switching is also faster
- Scalability: Performance improves by running threads in parallel on multicore architectures
[Multi-threaded Web Server Structure]

                                   +-> [Worker Thread 1] -> Handle request
Client requests ---> [Main Thread] +-> [Worker Thread 2] -> Handle request
                                   +-> [Worker Thread 3] -> Handle request

Each worker thread handles an individual client request.
Code and data (cache, etc.) are shared by all threads.
Multicore Programming
Parallel Processing on Multicore Systems
On multicore systems, threads can actually execute in parallel.
[Thread Execution on Single Core vs Multicore]

Single core (concurrency):
Time -->
Core1: [T1][T2][T3][T1][T2][T3][T1]...
(alternating execution via time-sharing)

Multicore (parallelism):
Time -->
Core1: [T1][T1][T1][T1][T1]...
Core2: [T2][T2][T2][T2][T2]...
Core3: [T3][T3][T3][T3][T3]...
(simultaneous execution)
Amdahl's Law
If there is a portion of a program that must execute sequentially, no matter how many cores are added, there is a limit to speed improvement.
[Amdahl's Law]
Speedup = 1 / (S + (1 - S) / N)
S = Sequential execution ratio
N = Number of processors (cores)
Example: 25% of program is sequential (S = 0.25)
Cores (N)  | Speedup
-----------+-----------------------------
        1  | 1.00x
        2  | 1.60x
        4  | 2.29x
        8  | 2.91x
       16  | 3.37x
 Infinity  | 4.00x (theoretical maximum)
If the sequential portion is 25%, the limit is 4x regardless of how many cores are added!
Challenges of Multicore Programming
- Task Decomposition: Identify tasks that can execute independently
- Balance: Distribute equal amounts of work to each thread
- Data Splitting: Partition data so it can be used on individual cores
- Data Dependency: Synchronization needed for shared data
- Testing and Debugging: Difficulties due to non-deterministic execution paths
Multithreading Models
User Threads and Kernel Threads
- User Threads: Managed by user-level libraries. The kernel is unaware of their existence
- Kernel Threads: Directly managed by the OS kernel. Kernel performs scheduling
[Multithreading Models]
1. Many-to-One:
   User threads:    T1  T2  T3  T4
                      \  |   |  /
   Kernel threads:       K1
   - If one thread blocks, all block
   - Cannot run in parallel on multicore

2. One-to-One:
   User threads:    T1  T2  T3  T4
                     |   |   |   |
   Kernel threads:  K1  K2  K3  K4
   - Greater parallelism
   - May limit the total number of threads
   - Used by Linux and Windows

3. Many-to-Many:
   User threads:    T1  T2  T3  T4  T5
                      \  |  X  |  /
   Kernel threads:    K1   K2   K3
   - Flexible mapping
   - Complex to implement
Most modern OSes use the one-to-one model. Linux creates kernel threads using the clone() system call.
Thread Libraries
Pthreads
The POSIX standard thread API. Used on Linux and macOS.
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

#define NUM_THREADS 5

// Thread function
void *thread_func(void *arg) {
    int thread_id = *(int *)arg;
    long sum = 0;

    // Each thread computes the sum of a different range
    long start = thread_id * 1000000L + 1;
    long end = (thread_id + 1) * 1000000L;
    for (long i = start; i <= end; i++) {
        sum += i;
    }

    printf("Thread %d: sum from %ld to %ld = %ld\n",
           thread_id, start, end, sum);

    long *result = malloc(sizeof(long));
    *result = sum;
    return (void *)result;
}

int main() {
    pthread_t threads[NUM_THREADS];
    int thread_ids[NUM_THREADS];
    long total = 0;

    // Create threads
    for (int i = 0; i < NUM_THREADS; i++) {
        thread_ids[i] = i;
        int rc = pthread_create(&threads[i], NULL,
                                thread_func, &thread_ids[i]);
        if (rc) {
            fprintf(stderr, "Thread creation failed: %d\n", rc);
            exit(EXIT_FAILURE);
        }
    }

    // Wait for all threads and collect results
    for (int i = 0; i < NUM_THREADS; i++) {
        void *result;
        pthread_join(threads[i], &result);
        total += *(long *)result;
        free(result);
    }

    printf("Total sum: %ld\n", total);
    return 0;
}
# -pthread flag required for compilation
gcc -pthread thread_sum.c -o thread_sum
Java Threads
In Java, threads are created by extending the Thread class or implementing the Runnable interface.
// Multi-threaded summation using the Runnable interface
public class ThreadSum {
    static long[] partialSums;

    static class SumTask implements Runnable {
        private final int threadId;
        private final long start;
        private final long end;

        SumTask(int threadId, long start, long end) {
            this.threadId = threadId;
            this.start = start;
            this.end = end;
        }

        @Override
        public void run() {
            long sum = 0;
            for (long i = start; i <= end; i++) {
                sum += i;
            }
            partialSums[threadId] = sum;
            System.out.printf("Thread %d: sum %d~%d = %d%n",
                              threadId, start, end, sum);
        }
    }

    public static void main(String[] args) throws Exception {
        int numThreads = 5;
        partialSums = new long[numThreads];
        Thread[] threads = new Thread[numThreads];

        for (int i = 0; i < numThreads; i++) {
            long start = i * 1_000_000L + 1;
            long end = (i + 1) * 1_000_000L;
            threads[i] = new Thread(new SumTask(i, start, end));
            threads[i].start();
        }

        // Wait for all threads to finish
        for (Thread t : threads) {
            t.join();
        }

        long total = 0;
        for (long s : partialSums) {
            total += s;
        }
        System.out.println("Total sum: " + total);
    }
}
Implicit Threading
Multithreaded programming is difficult and error-prone. Implicit threading is an approach that delegates thread creation and management to the compiler or runtime library.
Thread Pool
A thread pool creates several threads in advance and assigns incoming tasks to idle threads as work arrives.
// Java thread pool usage example
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Callable;
import java.util.ArrayList;
import java.util.List;

public class ThreadPoolExample {
    static class SumTask implements Callable<Long> {
        private final long start, end;

        SumTask(long start, long end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public Long call() {
            long sum = 0;
            for (long i = start; i <= end; i++) {
                sum += i;
            }
            return sum;
        }
    }

    public static void main(String[] args) throws Exception {
        // Create a thread pool with as many threads as CPU cores
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        List<Future<Long>> futures = new ArrayList<>();

        // Submit tasks
        for (int i = 0; i < 10; i++) {
            long start = i * 1_000_000L + 1;
            long end = (i + 1) * 1_000_000L;
            futures.add(pool.submit(new SumTask(start, end)));
        }

        // Collect results
        long total = 0;
        for (Future<Long> f : futures) {
            total += f.get();
        }
        System.out.println("Total sum: " + total);
        pool.shutdown();
    }
}
The advantages of thread pools are as follows.
- Reduces creation overhead by reusing existing threads
- Protects system resources by limiting the number of concurrent threads
- Enables flexible scheduling by separating task creation from execution
Fork-Join Framework
A pattern that divides (forks) a large task into smaller subtasks and combines (joins) the results.
[Fork-Join Operation]

                   [Full task: 1~10M]
                      /          \
                fork /            \ fork
                    /              \
             [1~5M]                 [5M+1~10M]
             /    \                  /      \
        fork/      \fork        fork/        \fork
           /        \              /          \
   [1~2.5M]  [2.5M+1~5M]  [5M+1~7.5M]  [7.5M+1~10M]
           \        /              \          /
        join\      /join        join\        /join
             \    /                  \      /
          [Partial 1]             [Partial 2]
                    \              /
                join \            / join
                      \          /
                      [Total Sum]
import java.util.concurrent.RecursiveTask;
import java.util.concurrent.ForkJoinPool;

public class ForkJoinSum extends RecursiveTask<Long> {
    private static final long THRESHOLD = 100_000;
    private final long start, end;

    ForkJoinSum(long start, long end) {
        this.start = start;
        this.end = end;
    }

    @Override
    protected Long compute() {
        if (end - start <= THRESHOLD) {
            // Base case: compute directly
            long sum = 0;
            for (long i = start; i <= end; i++) {
                sum += i;
            }
            return sum;
        }
        // Recursive division
        long mid = (start + end) / 2;
        ForkJoinSum left = new ForkJoinSum(start, mid);
        ForkJoinSum right = new ForkJoinSum(mid + 1, end);
        left.fork();                        // Execute left asynchronously
        long rightResult = right.compute(); // Execute right in the current thread
        long leftResult = left.join();      // Wait for the left result
        return leftResult + rightResult;
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool();
        long result = pool.invoke(new ForkJoinSum(1, 10_000_000));
        System.out.println("Result: " + result);
    }
}
OpenMP
Parallelizes C/C++ code using compiler directives (pragmas).
#include <stdio.h>
#include <omp.h>

int main() {
    long sum = 0;
    int n = 10000000;

    // OpenMP parallel for loop:
    // the compiler automatically distributes iterations across threads
    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i <= n; i++) {
        sum += i;
    }

    printf("Sum: %ld\n", sum);
    printf("Maximum number of OpenMP threads: %d\n", omp_get_max_threads());
    return 0;
}
# OpenMP compilation
gcc -fopenmp parallel_sum.c -o parallel_sum
# Run with specified number of threads
OMP_NUM_THREADS=4 ./parallel_sum
The main OpenMP directives are as follows.
- parallel: Start a parallel region
- for: Parallelize a loop
- critical: Designate a critical section
- atomic: Atomic operation
- reduction: Reduction operation (sum, product, etc.)
Thread-Related Issues
Semantics of fork() and exec()
When fork() is called in a multithreaded process, there are two possible semantics.
- A fork() that duplicates all threads of the process
- A fork() that duplicates only the calling thread (what most UNIX implementations provide)
When exec() is called, the new program replaces the entire process, including all of its threads; so if fork() is followed immediately by exec(), duplicating only the calling thread is sufficient.
Signal Handling
Which thread should receive a signal?
- Deliver the signal to the thread to which it applies
- Deliver the signal to every thread in the process
- Deliver the signal to certain threads in the process
- Assign a specific thread to receive all signals for the process
Thread Cancellation
Prematurely terminating a running thread. There are two approaches.
- Asynchronous Cancellation: Immediately terminates the target thread (risk of resource leaks)
- Deferred Cancellation: Target thread periodically checks for cancellation requests (safe)
Summary
A thread is a lightweight unit that divides execution flow within a process. True parallel execution is possible on multicore systems, but according to Amdahl's Law, the sequential portion becomes the bottleneck. Implicit threading techniques like thread pools, Fork-Join, and OpenMP reduce the programmer's burden while enabling efficient parallel processing.