[OS Concepts] 08. Deadlocks: Prevention, Avoidance, Detection, Recovery

What is Deadlock

Deadlock is a state where two or more processes are each waiting for a resource held by the other, unable to make progress forever.

[Everyday Analogy of Deadlock]

Two cars meeting on a narrow bridge from opposite sides:
  <--- Car A      Car B --->
        [Bridge]
Both wait for the other to back off -> Deadlock

In a system:
Thread 1: lock(A) -> attempts lock(B) (waits)
Thread 2: lock(B) -> attempts lock(A) (waits)

// Code example that causes deadlock
#include <pthread.h>

pthread_mutex_t mutex_a = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t mutex_b = PTHREAD_MUTEX_INITIALIZER;

void *thread1(void *arg) {
    pthread_mutex_lock(&mutex_a);    // Acquire A
    sleep(1);                         // Timing issue
    pthread_mutex_lock(&mutex_b);    // Wait for B (deadlock!)
    // ...
    pthread_mutex_unlock(&mutex_b);
    pthread_mutex_unlock(&mutex_a);
    return NULL;
}

void *thread2(void *arg) {
    pthread_mutex_lock(&mutex_b);    // Acquire B
    sleep(1);                         // Timing issue
    pthread_mutex_lock(&mutex_a);    // Wait for A (deadlock!)
    // ...
    pthread_mutex_unlock(&mutex_a);
    pthread_mutex_unlock(&mutex_b);
    return NULL;
}

Four Necessary Conditions for Deadlock

Deadlock occurs only when all four conditions hold simultaneously.

Mutual Exclusion: A resource can be used by only one process at a time
Hold and Wait: A process holds resources while waiting to acquire additional ones
No Preemption: Resources already allocated cannot be forcibly taken away
Circular Wait: P0 -> P1 -> P2 -> ... -> Pn -> P0 circular waiting pattern

Removing any one of the four conditions prevents deadlock.

Resource-Allocation Graph

A directed graph that visually represents deadlock.

[Resource-Allocation Graph Elements]

Process: Circle (O)
Resource type: Dots inside a rectangle (instances)
Request edge: Pi -> Rj  (process requests resource)
Assignment edge: Rj -> Pi  (resource assigned to process)

[Resource-Allocation Graph with Deadlock]

    P1 --request--> R1 (1) --assigned--> P2
    ^                                     |
    |                                     |
  assigned                             request
    |                                     |
    R2 (1) <--request-- P3 <--assigned-- R3 (1)

P1 requests R1 (held by P2)
P2 requests R3 (held by P3)
P3 requests R2 (held by P1)
-> Cycle exists -> Deadlock!

[Resource-Allocation Graph without Deadlock - Multiple Instances]

    P1 --request--> R1 (2)  Instance1 --assigned--> P2
                             Instance2 --assigned--> P3

When P2 or P3 releases R1, P1 can proceed
-> Cycle may exist but no deadlock
   (when resource has 2+ instances)

Summary of rules.

One instance per resource type: A cycle means definite deadlock
Multiple instances per resource type: A cycle may or may not mean deadlock

Deadlock Handling Methods

There are four approaches to dealing with deadlock.

Prevention: Fundamentally remove one of the 4 conditions
Avoidance: Only grant resource requests when safe
Detection and Recovery: Allow deadlock to occur, detect and recover
Ignore: Ignore deadlock (adopted by most OSes, the so-called ostrich algorithm)

Deadlock Prevention

1. Negate Mutual Exclusion

Sharable resources (e.g., read-only files) do not need mutual exclusion. However, this cannot apply to inherently non-sharable resources like printers and mutexes.

2. Negate Hold and Wait

Ensure a process holds no other resources when requesting a new one.

Method A: Request all needed resources at once before execution
  Pros: Simple
  Cons: Low resource utilization, possible starvation

Method B: Release all held resources before requesting new ones
  Pros: Flexible
  Cons: Complex implementation

3. Negate No Preemption

If a process holding resources cannot obtain additional resources, its held resources are forcibly reclaimed.

Process P holds resource A and requests B:
  If B is unavailable:
    Forcibly release A (preempt)
    Put P in waiting state for both A and B
    Restart when both A and B are available

This is only applicable to resources whose state can be saved/restored, such as CPU registers and memory.

4. Negate Circular Wait

Assign numbers to all resource types and enforce requests in order of increasing numbers.

// Circular wait prevention: Resource ordering protocol
// Resource numbers: mutex_a = 1, mutex_b = 2, mutex_c = 3

// Correct usage: Always acquire in ascending order
void safe_function() {
    pthread_mutex_lock(&mutex_a);    // Number 1
    pthread_mutex_lock(&mutex_b);    // Number 2
    pthread_mutex_lock(&mutex_c);    // Number 3

    // Critical section

    pthread_mutex_unlock(&mutex_c);
    pthread_mutex_unlock(&mutex_b);
    pthread_mutex_unlock(&mutex_a);
}

// Incorrect usage: Requesting in reverse order risks deadlock
void unsafe_function() {
    pthread_mutex_lock(&mutex_c);    // Number 3
    pthread_mutex_lock(&mutex_a);    // Number 1 (order violation!)
}

Deadlock Avoidance

Deadlock avoidance provides additional information to the system, refusing requests that could lead to deadlock.

Safe State

If the system is in a safe state, there exists a safe sequence that can complete all processes in some order.

[Relationship between Safe State and Deadlock]

+-------------------------------------------+
|              All States                    |
|  +----------------------------------+     |
|  |          Safe States              |     |
|  |                                  |     |
|  |                                  |     |
|  +----------------------------------+     |
|            Unsafe States                  |
|       (includes deadlock region)          |
|                 +------+                  |
|                 |Dead- |                  |
|                 |lock  |                  |
|                 +------+                  |
+-------------------------------------------+

Safe state -> No deadlock (guaranteed)
Unsafe state -> Deadlock possible (not certain)

Banker's Algorithm

A deadlock avoidance algorithm proposed by Dijkstra, analogous to a bank managing loans to avoid bankruptcy.

Required data structures (n = number of processes, m = number of resource types).

Available[m]:     Number of available instances per resource type
Max[n][m]:        Maximum demand per process
Allocation[n][m]: Current allocation per process
Need[n][m]:       Remaining need per process (Max - Allocation)

Safety Algorithm

[Safety Check Algorithm]

1. Work = copy of Available
   Finish[i] = false (for all i)

2. Find i such that:
   Finish[i] == false AND Need[i] <= Work

3. If found:
   Work = Work + Allocation[i]
   Finish[i] = true
   Go back to step 2

4. If not found:
   If all Finish[i] are true -> Safe state
   Otherwise -> Unsafe state

Example

[Banker's Algorithm Example]

Resources: A(10), B(5), C(7)

        Allocation    Max        Need       Available
        A  B  C      A  B  C    A  B  C    A  B  C
P0      0  1  0      7  5  3    7  4  3    3  3  2
P1      2  0  0      3  2  2    1  2  2
P2      3  0  2      9  0  2    6  0  0
P3      2  1  1      2  2  2    0  1  1
P4      0  0  2      4  3  3    4  3  1

Finding safe sequence:
1. Work = [3,3,2]
   P1's Need[1,2,2] <= [3,3,2]? Yes!
   Work = [3,3,2] + [2,0,0] = [5,3,2], Finish[P1] = true

2. Work = [5,3,2]
   P3's Need[0,1,1] <= [5,3,2]? Yes!
   Work = [5,3,2] + [2,1,1] = [7,4,3], Finish[P3] = true

3. Work = [7,4,3]
   P4's Need[4,3,1] <= [7,4,3]? Yes!
   Work = [7,4,3] + [0,0,2] = [7,4,5], Finish[P4] = true

4. Work = [7,4,5]
   P2's Need[6,0,0] <= [7,4,5]? Yes!
   Work = [7,4,5] + [3,0,2] = [10,4,7], Finish[P2] = true

5. Work = [10,4,7]
   P0's Need[7,4,3] <= [10,4,7]? Yes!
   Finish[P0] = true

Safe sequence: P1 -> P3 -> P4 -> P2 -> P0
The system is currently in a safe state!

Resource Request Algorithm

[When P1 additionally requests resources (1,0,2)]

1. Request[1] = [1,0,2] <= Need[1] = [1,2,2]? Yes (valid request)
2. Request[1] = [1,0,2] <= Available = [3,3,2]? Yes (resources exist)
3. Hypothetical allocation:
   Available = [3,3,2] - [1,0,2] = [2,3,0]
   Allocation[1] = [2,0,0] + [1,0,2] = [3,0,2]
   Need[1] = [1,2,2] - [1,0,2] = [0,2,0]

4. Run safety check:
   Check if safe sequence exists with new Available = [2,3,0]
   -> Allow request if safe, deny if unsafe

// Banker's Algorithm Java Implementation
public class BankersAlgorithm {
    private int[][] max;
    private int[][] allocation;
    private int[][] need;
    private int[] available;
    private int numProcesses;
    private int numResources;

    public BankersAlgorithm(int[][] max, int[][] alloc,
                             int[] avail) {
        this.max = max;
        this.allocation = alloc;
        this.available = avail;
        this.numProcesses = max.length;
        this.numResources = avail.length;
        this.need = new int[numProcesses][numResources];

        for (int i = 0; i < numProcesses; i++)
            for (int j = 0; j < numResources; j++)
                need[i][j] = max[i][j] - allocation[i][j];
    }

    public boolean isSafe() {
        int[] work = available.clone();
        boolean[] finish = new boolean[numProcesses];
        int count = 0;

        while (count < numProcesses) {
            boolean found = false;
            for (int i = 0; i < numProcesses; i++) {
                if (!finish[i] && canAllocate(need[i], work)) {
                    // Reclaim resources
                    for (int j = 0; j < numResources; j++)
                        work[j] += allocation[i][j];
                    finish[i] = true;
                    count++;
                    found = true;
                    System.out.println("P" + i + " can execute");
                }
            }
            if (!found) return false;
        }
        return true;
    }

    private boolean canAllocate(int[] need, int[] work) {
        for (int j = 0; j < numResources; j++)
            if (need[j] > work[j]) return false;
        return true;
    }
}

Deadlock Detection

Allow deadlock to occur and periodically run a detection algorithm.

Single Instance per Resource Type

Detect cycles in a wait-for graph.

[Wait-For Graph]

Remove resource nodes from the resource-allocation graph
and show only direct dependencies between processes

P1 --> P2 --> P3
 ^            |
 |            v
 +---- P4 <--+

Cycle exists: P1->P2->P3->P4->P1 -> Deadlock!

Multiple Instances per Resource Type

Use a detection algorithm similar to the banker's algorithm.

[Deadlock Detection Algorithm]

1. Work = Available
   If Allocation[i] != 0 then Finish[i] = false
   If Allocation[i] == 0 then Finish[i] = true

2. Find i where Finish[i] == false AND Request[i] <= Work

3. If found: Work = Work + Allocation[i]
             Finish[i] = true
             Go to step 2

4. If Finish[i] == false for any process
   -> Those processes are in deadlock

How often should the detection algorithm run?

If deadlock occurs frequently: Run often (increased overhead)
If rare: Run at intervals (delayed detection)
Run when CPU utilization drops below 40%

Deadlock Recovery

Process Termination

Terminate all deadlocked processes: Certain but costly
Terminate one at a time: Terminate one by one until deadlock resolves. Rerun detection algorithm each time

Criteria for determining termination order.

Process priority
Resources and time used so far
Resources remaining until completion
Number of processes that need to be terminated
Process type (interactive vs batch)

Resource Preemption

Forcibly reclaim resources from deadlocked processes.

[Resource Preemption Considerations]

1. Victim selection: Which process to take resources from?
   Select based on minimizing cost

2. Rollback: Roll back the victimized process to a safe state
   - Total rollback: Restart from the beginning
   - Partial rollback: Roll back just enough to break the deadlock

3. Starvation prevention: Ensure the same process is not always the victim
   Include rollback count in cost factors

Deadlock Handling in Real Systems

Most operating systems (Linux, Windows) use the strategy of ignoring deadlock. The overhead of deadlock prevention or avoidance is high, and deadlock occurs rarely.

Application-level countermeasures are as follows.

// Deadlock prevention using trylock
int try_acquire_both(pthread_mutex_t *a, pthread_mutex_t *b) {
    while (1) {
        pthread_mutex_lock(a);

        if (pthread_mutex_trylock(b) == 0) {
            return 0;  // Success: both locks acquired
        }

        // If b cannot be obtained, release a and retry
        pthread_mutex_unlock(a);
        usleep(rand() % 1000);  // Wait briefly then retry
    }
}

// Deadlock prevention using Java's tryLock
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.TimeUnit;

public class DeadlockFree {
    private final ReentrantLock lockA = new ReentrantLock();
    private final ReentrantLock lockB = new ReentrantLock();

    public void safeMethod() throws InterruptedException {
        while (true) {
            boolean gotA = lockA.tryLock(100, TimeUnit.MILLISECONDS);
            boolean gotB = lockB.tryLock(100, TimeUnit.MILLISECONDS);

            if (gotA && gotB) {
                try {
                    // Both locks acquired - work safely
                    System.out.println("Both resources acquired!");
                    break;
                } finally {
                    lockA.unlock();
                    lockB.unlock();
                }
            }

            // If either failed, release held lock
            if (gotA) lockA.unlock();
            if (gotB) lockB.unlock();

            // Random wait then retry (prevent livelock)
            Thread.sleep((long)(Math.random() * 100));
        }
    }
}

Summary

Deadlock occurs when all four conditions - mutual exclusion, hold and wait, no preemption, and circular wait - hold simultaneously. Prevention removes one condition, avoidance maintains safe state via the banker's algorithm, and detection detects and recovers after deadlock occurs. In real systems, it is common to either ignore deadlock or handle it at the application level with techniques like trylock.