- Authors

- Name
- Youngju Kim
- @fjvbn20031
Overview
Most of a program's execution time is spent in loops. Therefore, optimizing loops is the most effective way to improve overall program performance. In this article, we systematically examine techniques for identifying and optimizing loops.
1. Dominators
1.1 Definition
In a flow graph, node d dominates node n = every path from the entry point to n passes through d.
This is denoted as d dom n.
Properties:
- Every node dominates itself (reflexive)
- If d dom n and n dom d then d = n (antisymmetric)
- If d dom n and n dom p then d dom p (transitive)
1.2 Dominator Tree
A tree representation of the dominator relation. Each node's parent is its immediate dominator.
// Flow graph:
// 1 -> 2 -> 3 -> 4
// | ^ |
// v | v
// 5 ---+ 6
// |
// v
// 7
// Dominator tree:
// 1
// |
// 2
// / | \
// 3 5 6
// | |
// 4 7
//
// Interpretation: 1 dom all nodes
// 2 dom 3,4,5,6,7
// 3 dom 4
// 6 dom 7
1.3 Dominator Computation Algorithm
Dominators are computed using an iterative algorithm:
Algorithm: Dominator Computation
Initialization:
DOM[entry] = {entry}
DOM[n] = set of all nodes (n != entry)
Iterate (until convergence):
for (all nodes n != entry):
DOM[n] = {n} U ( n DOM[p] ) // p is all predecessors of n
// Add n to the intersection of predecessors' DOMs
1.4 Dominance Frontier
The dominance frontier DF(n) of node n is the boundary just outside the region dominated by n.
DF(n) = set of nodes y | n dominates a predecessor of y, but does not (strictly) dominate y itself
The dominance frontier is critically used to determine where to insert phi functions in SSA construction.
2. Natural Loops
2.1 Back Edge
In a flow graph, if in edge n -> d, d dominates n, then this edge is called a back edge.
// In a flow graph:
// 1 -> 2 -> 3 -> 4
// ^ |
// +---------+ // 4 -> 2 is a back edge (2 dom 4)
2.2 Definition of Natural Loop
The natural loop for back edge n -> d:
- Header: Node d (the sole entry point of the loop)
- Body: The set of all nodes that can reach n without going through d, plus d
Algorithm: Natural Loop Construction (back edge n -> d)
1. loop = {d}
2. stack = {n} (if n is different from d)
3. while stack is not empty:
m = pop from stack
if m is not in loop:
add m to loop
push all predecessors of m onto stack
4. Result: loop
2.3 Example
// Flow graph:
// B1 -> B2 -> B3 -> B4
// ^ |
// +----------+ (back edge: B4 -> B2)
// |
// +-- B5 (B2 -> B5, B5 -> B2 is also a back edge)
// Natural loop for back edge B4 -> B2:
// {B2, B3, B4} (reachable to B4 without going through B2: B3, B4)
// Natural loop for back edge B5 -> B2:
// {B2, B5}
2.4 Loop Nesting
If two loops share the same header, they are merged and treated as a single loop. Otherwise, one is completely contained within the other (nested) or they are disjoint.
// Outer loop: {B1, B2, B3, B4, B5}
// Inner loop: {B3, B4}
//
// Optimize the inner loop first, then the outer loop
// (from inside out)
3. Loop-Invariant Code Detection and Motion
3.1 Loop-Invariant Code Detection
The conditions for instruction s: x = y op z to be loop invariant:
- All definitions of y and z are outside the loop, or
- y (or z) has exactly one definition, and that definition has already been identified as loop invariant
- y (or z) is a constant
Algorithm: Loop-Invariant Code Detection
1. Mark instructions whose operands are constants or defined only outside
the loop as "loop invariant"
2. Repeat:
- Newly mark instructions as "loop invariant" if all operands are constants,
outside definitions, or have exactly one definition already marked as loop invariant
3. Terminate when no new markings are made
3.2 Example
// Loop:
// B2: i = i + 1
// t1 = n * 4 // n is defined outside the loop -> loop invariant
// t2 = a[t1] // t1 is loop invariant and a is invariant in loop -> loop invariant
// t3 = t2 + i
// if i < 100 goto B2
// Pass 1: Mark t1 = n * 4 (n is an outside definition)
// Pass 2: Mark t2 = a[t1] (t1 is loop invariant)
// Pass 3: No more -> terminate
3.3 Code Motion Conditions
To move a loop-invariant instruction s: x = y op z outside the loop (to the preheader), all of the following conditions must be satisfied:
Condition 1: The block containing s dominates all exits of the loop
// Safe case:
// preheader -> [B1: s: x=... ] -> B2(exit)
// B1 dominates the exit, so motion is possible
// Dangerous case:
// preheader -> B1 -> B3(exit)
// |
// v
// B2: s: x=...
// B2 does not dominate exit B3 -> motion not possible
// (a path exists where s is not executed)
Condition 2: x is not defined elsewhere within the loop
Condition 3: All uses of x in the loop reach only the definition at s
3.4 Preheader Insertion
A preheader block is inserted just before the loop header for code motion.
// Before motion:
// ... -> header -> body -> ...
// ^ |
// +---------------+
// After preheader insertion:
// ... -> preheader -> header -> body -> ...
// ^ |
// +---------------+
// Place loop-invariant code in the preheader:
// preheader:
// t1 = n * 4 // Moved code
// t2 = a[t1] // Moved code
4. Induction Variable Detection and Elimination
4.1 Basic Induction Variable
A variable i that is defined only in the form i = i + c (where c is a loop constant) within the loop is called a basic induction variable.
4.2 Derived Induction Variable
A variable j that can be expressed as j = c1 * i + c2 (where c1, c2 are loop constants) for a basic induction variable i is called a derived induction variable.
Triple notation for induction variables: (i, c1, c2) means j = c1 * i + c2.
// Example:
// i = i + 1 // i is a basic induction variable
// t1 = 4 * i // t1 is a derived induction variable: (i, 4, 0)
// t2 = t1 + base // t2 is a derived induction variable: (i, 4, base)
4.3 Induction Variable Detection Algorithm
Algorithm: Induction Variable Detection
Step 1: Detect basic induction variables
- Find variables defined only in the form i = i + c within the loop
Step 2: Detect derived induction variables
- j = c1 * i (i is basic induction variable, c1 is loop constant)
-> j's triple: (i, c1, 0)
- j = k + c2 (k is induction variable (i, a, b), c2 is loop constant)
-> j's triple: (i, a, b+c2)
- j = c1 * k (k is induction variable (i, a, b))
-> j's triple: (i, c1*a, c1*b)
4.4 Applying Strength Reduction
Convert the multiplication of derived induction variables to addition.
// Original code:
// i = 0
// loop:
// t1 = 4 * i // Multiplication
// a[t1] = 0
// i = i + 1
// if i < n goto loop
// After strength reduction:
// i = 0
// t1 = 0 // Initial value: 4 * 0 = 0
// loop:
// a[t1] = 0
// i = i + 1
// t1 = t1 + 4 // Multiplication -> addition
// if i < n goto loop
4.5 Induction Variable Elimination
After strength reduction, if the basic induction variable is used only in the loop termination condition, it can be replaced with the derived induction variable.
// After strength reduction:
// i = 0
// t1 = 0
// loop:
// a[t1] = 0
// i = i + 1 // i is used only in comparison
// t1 = t1 + 4
// if i < n goto loop
// After induction variable elimination:
// t1 = 0
// limit = 4 * n // Termination condition conversion (computed once outside loop)
// loop:
// a[t1] = 0
// t1 = t1 + 4
// if t1 < limit goto loop
//
// i is completely eliminated!
5. Loop Unrolling
5.1 Concept
A technique that copies the loop body multiple times to reduce the number of iterations.
5.2 Basic Loop Unrolling
// Original (100 iterations)
for (i = 0; i < 100; i++) {
a[i] = 0;
}
// 4x unrolling (25 iterations)
for (i = 0; i < 100; i += 4) {
a[i] = 0;
a[i+1] = 0;
a[i+2] = 0;
a[i+3] = 0;
}
5.3 Advantages
- Reduced loop overhead: Fewer branch, comparison, and increment instructions
- Increased instruction scheduling opportunities: More independent instructions improve pipeline utilization
- Improved register utilization: Value reuse between consecutive operations
5.4 Disadvantages and Considerations
// When the iteration count is not a multiple of the unrolling factor
// Remainder handling is needed
// When n might not be a multiple of 4
i = 0
// Main loop (4x unrolling)
while (i + 3 < n) {
a[i] = 0; a[i+1] = 0; a[i+2] = 0; a[i+3] = 0;
i += 4;
}
// Remainder handling
while (i < n) {
a[i] = 0;
i++;
}
- Code size increase: May negatively impact cache
- Register pressure: Excessive unrolling may cause register spills
- Typically 2x to 8x unrolling is effective
6. Comprehensive Loop Optimization Example
Let us examine the entire loop optimization process through a single example.
// Original code (initializing a row of a matrix)
i = 0
loop:
t1 = i * 8 // Induction variable (multiplication)
t2 = base + t1 // Induction variable
*t2 = 0.0
i = i + 1
if i < n goto loop
// Step 1: Loop-invariant code motion
// (none applicable in this example)
// Step 2: Strength reduction
i = 0
t1 = 0 // Initial value
t2 = base // Initial value
loop:
*t2 = 0.0
i = i + 1
t1 = t1 + 8 // Multiplication -> addition
t2 = t2 + 8 // Multiplication -> addition
if i < n goto loop
// Step 3: Induction variable elimination (remove i and t1)
t2 = base
limit = base + n * 8 // Computed once outside the loop
loop:
*t2 = 0.0
t2 = t2 + 8
if t2 < limit goto loop
// Step 4: Loop unrolling (2x)
t2 = base
limit = base + n * 8
loop:
*t2 = 0.0
*(t2 + 8) = 0.0
t2 = t2 + 16
if t2 < limit goto loop
// (remainder handling code omitted)
The original required 1 multiplication, 2 additions, 1 store, and 1 comparison per loop iteration, but after optimization, only 1 addition, 2 stores, and 1 comparison are needed.
Summary
| Concept | Description |
|---|---|
| Dominator | A node through which all paths from the entry point to a specific node pass |
| Dominator tree | A tree structure representing the dominator relation |
| Back edge | An edge going back to a dominator (forming a loop) |
| Natural loop | A loop defined by a back edge |
| Loop-invariant code motion | Moving code that computes the same value every iteration outside the loop |
| Induction variable | A variable that changes by a constant amount in a loop |
| Strength reduction | Replacing multiplication with addition |
| Induction variable elimination | Removing induction variables that become unnecessary |
| Loop unrolling | Copying the loop body to reduce the number of iterations |
Loop optimization is the area of compiler optimization that yields the greatest performance improvements. The basic strategy is to identify loops through dominator analysis, lighten the loop body through invariant code motion and strength reduction, and reduce overhead through loop unrolling. In the next article, we will cover instruction-level parallelism and scheduling.