Chaos and Order

💡 왼쪽 원문을 읽으면서 오른쪽에 따라 써보세요. Tab 키로 힌트를 받을 수 있습니다.

원문 렌더가 준비되기 전까지 텍스트 가이드로 표시합니다.

Overview

Most of a program's execution time is spent in **loops**. Therefore, optimizing loops is the most effective way to improve overall program performance. In this article, we systematically examine techniques for identifying and optimizing loops.

1. Dominators

1.1 Definition

In a flow graph, node d **dominates** node n = every path from the entry point to n passes through d.

This is denoted as `d dom n`.

Properties:

- Every node dominates itself (reflexive)

- If d dom n and n dom d then d = n (antisymmetric)

- If d dom n and n dom p then d dom p (transitive)

1.2 Dominator Tree

A tree representation of the dominator relation. Each node's parent is its **immediate dominator**.

// Flow graph:

// 1 -> 2 -> 3 -> 4

// | ^ |

// v | v

// 5 ---+ 6

// |

// v

// 7

// Dominator tree:

// 1

// |

// 2

// / | \

// 3 5 6

// | |

// 4 7

// Interpretation: 1 dom all nodes

// 2 dom 3,4,5,6,7

// 3 dom 4

// 6 dom 7

1.3 Dominator Computation Algorithm

Dominators are computed using an iterative algorithm:

Algorithm: Dominator Computation

Initialization:

DOM[entry] = {entry}

DOM[n] = set of all nodes (n != entry)

Iterate (until convergence):

for (all nodes n != entry):

DOM[n] = {n} U ( n DOM[p] ) // p is all predecessors of n

// Add n to the intersection of predecessors' DOMs

1.4 Dominance Frontier

The **dominance frontier** DF(n) of node n is the boundary just outside the region dominated by n.

DF(n) = set of nodes y | n dominates a predecessor of y, but does not (strictly) dominate y itself

The dominance frontier is critically used to determine where to insert phi functions in SSA construction.

2. Natural Loops

2.1 Back Edge

In a flow graph, if in edge n -> d, d dominates n, then this edge is called a **back edge**.

// In a flow graph:

// 1 -> 2 -> 3 -> 4

// ^ |

// +---------+ // 4 -> 2 is a back edge (2 dom 4)

2.2 Definition of Natural Loop

The **natural loop** for back edge n -> d:

- **Header**: Node d (the sole entry point of the loop)

- **Body**: The set of all nodes that can reach n without going through d, plus d

Algorithm: Natural Loop Construction (back edge n -> d)

1. loop = {d}

2. stack = {n} (if n is different from d)

3. while stack is not empty:

m = pop from stack

if m is not in loop:

add m to loop

push all predecessors of m onto stack

4. Result: loop

2.3 Example

// Flow graph:

// B1 -> B2 -> B3 -> B4

// ^ |

// +----------+ (back edge: B4 -> B2)

// |

// +-- B5 (B2 -> B5, B5 -> B2 is also a back edge)

// Natural loop for back edge B4 -> B2:

// {B2, B3, B4} (reachable to B4 without going through B2: B3, B4)

// Natural loop for back edge B5 -> B2:

// {B2, B5}

2.4 Loop Nesting

If two loops share the same header, they are merged and treated as a single loop. Otherwise, one is completely contained within the other (nested) or they are disjoint.

// Outer loop: {B1, B2, B3, B4, B5}

// Inner loop: {B3, B4}

// Optimize the inner loop first, then the outer loop

// (from inside out)

3. Loop-Invariant Code Detection and Motion

3.1 Loop-Invariant Code Detection

The conditions for instruction `s: x = y op z` to be loop invariant:

1. All definitions of y and z are outside the loop, or

2. y (or z) has exactly one definition, and that definition has already been identified as loop invariant

3. y (or z) is a constant

Algorithm: Loop-Invariant Code Detection

1. Mark instructions whose operands are constants or defined only outside

the loop as "loop invariant"

2. Repeat:

- Newly mark instructions as "loop invariant" if all operands are constants,

outside definitions, or have exactly one definition already marked as loop invariant

3. Terminate when no new markings are made

3.2 Example

// Loop:

// B2: i = i + 1

// t1 = n * 4 // n is defined outside the loop -> loop invariant

// t2 = a[t1] // t1 is loop invariant and a is invariant in loop -> loop invariant

// t3 = t2 + i

// if i < 100 goto B2

// Pass 1: Mark t1 = n * 4 (n is an outside definition)

// Pass 2: Mark t2 = a[t1] (t1 is loop invariant)

// Pass 3: No more -> terminate

3.3 Code Motion Conditions

To move a loop-invariant instruction `s: x = y op z` outside the loop (to the preheader), all of the following conditions must be satisfied:

**Condition 1**: The block containing s **dominates** all exits of the loop

// Safe case:

// preheader -> [B1: s: x=... ] -> B2(exit)

// B1 dominates the exit, so motion is possible

// Dangerous case:

// preheader -> B1 -> B3(exit)

// |

// v

// B2: s: x=...

// B2 does not dominate exit B3 -> motion not possible

// (a path exists where s is not executed)

**Condition 2**: x is not defined elsewhere within the loop

**Condition 3**: All uses of x in the loop reach only the definition at s

3.4 Preheader Insertion

A **preheader** block is inserted just before the loop header for code motion.

// Before motion:

// ... -> header -> body -> ...

// ^ |

// +---------------+

// After preheader insertion:

// ... -> preheader -> header -> body -> ...

// ^ |

// +---------------+

// Place loop-invariant code in the preheader:

// preheader:

// t1 = n * 4 // Moved code

// t2 = a[t1] // Moved code

4. Induction Variable Detection and Elimination

4.1 Basic Induction Variable

A variable i that is defined only in the form `i = i + c` (where c is a loop constant) within the loop is called a **basic induction variable**.

4.2 Derived Induction Variable

A variable j that can be expressed as `j = c1 * i + c2` (where c1, c2 are loop constants) for a basic induction variable i is called a **derived induction variable**.

Triple notation for induction variables: `(i, c1, c2)` means `j = c1 * i + c2`.

// Example:

// i = i + 1 // i is a basic induction variable

// t1 = 4 * i // t1 is a derived induction variable: (i, 4, 0)

// t2 = t1 + base // t2 is a derived induction variable: (i, 4, base)

4.3 Induction Variable Detection Algorithm

Algorithm: Induction Variable Detection

Step 1: Detect basic induction variables

- Find variables defined only in the form i = i + c within the loop

Step 2: Detect derived induction variables

- j = c1 * i (i is basic induction variable, c1 is loop constant)

-> j's triple: (i, c1, 0)

- j = k + c2 (k is induction variable (i, a, b), c2 is loop constant)

-> j's triple: (i, a, b+c2)

- j = c1 * k (k is induction variable (i, a, b))

-> j's triple: (i, c1*a, c1*b)

4.4 Applying Strength Reduction

Convert the multiplication of derived induction variables to addition.

// Original code:

// i = 0

// loop:

// t1 = 4 * i // Multiplication

// a[t1] = 0

// i = i + 1

// if i < n goto loop

// After strength reduction:

// i = 0

// t1 = 0 // Initial value: 4 * 0 = 0

// loop:

// a[t1] = 0

// i = i + 1

// t1 = t1 + 4 // Multiplication -> addition

// if i < n goto loop

4.5 Induction Variable Elimination

After strength reduction, if the basic induction variable is used only in the loop termination condition, it can be replaced with the derived induction variable.

// After strength reduction:

// i = 0

// t1 = 0

// loop:

// a[t1] = 0

// i = i + 1 // i is used only in comparison

// t1 = t1 + 4

// if i < n goto loop

// After induction variable elimination:

// t1 = 0

// limit = 4 * n // Termination condition conversion (computed once outside loop)

// loop:

// a[t1] = 0

// t1 = t1 + 4

// if t1 < limit goto loop

// i is completely eliminated!

5. Loop Unrolling

5.1 Concept

A technique that copies the loop body multiple times to reduce the number of iterations.

5.2 Basic Loop Unrolling

// Original (100 iterations)

for (i = 0; i < 100; i++) {

a[i] = 0;

}

// 4x unrolling (25 iterations)

for (i = 0; i < 100; i += 4) {

a[i] = 0;

a[i+1] = 0;

a[i+2] = 0;

a[i+3] = 0;

}

5.3 Advantages

1. **Reduced loop overhead**: Fewer branch, comparison, and increment instructions

2. **Increased instruction scheduling opportunities**: More independent instructions improve pipeline utilization

3. **Improved register utilization**: Value reuse between consecutive operations

5.4 Disadvantages and Considerations

// When the iteration count is not a multiple of the unrolling factor

// Remainder handling is needed

// When n might not be a multiple of 4

i = 0

// Main loop (4x unrolling)

while (i + 3 < n) {

a[i] = 0; a[i+1] = 0; a[i+2] = 0; a[i+3] = 0;

i += 4;

}

// Remainder handling

while (i < n) {

a[i] = 0;

i++;

}

- **Code size increase**: May negatively impact cache

- **Register pressure**: Excessive unrolling may cause register spills

- Typically 2x to 8x unrolling is effective

6. Comprehensive Loop Optimization Example

Let us examine the entire loop optimization process through a single example.

// Original code (initializing a row of a matrix)

i = 0

loop:

t1 = i * 8 // Induction variable (multiplication)

t2 = base + t1 // Induction variable

*t2 = 0.0

i = i + 1

if i < n goto loop

// Step 1: Loop-invariant code motion

// (none applicable in this example)

// Step 2: Strength reduction

i = 0

t1 = 0 // Initial value

t2 = base // Initial value

loop:

*t2 = 0.0

i = i + 1

t1 = t1 + 8 // Multiplication -> addition

t2 = t2 + 8 // Multiplication -> addition

if i < n goto loop

// Step 3: Induction variable elimination (remove i and t1)

t2 = base

limit = base + n * 8 // Computed once outside the loop

loop:

*t2 = 0.0

t2 = t2 + 8

if t2 < limit goto loop

// Step 4: Loop unrolling (2x)

t2 = base

limit = base + n * 8

loop:

*t2 = 0.0

*(t2 + 8) = 0.0

t2 = t2 + 16

if t2 < limit goto loop

// (remainder handling code omitted)

The original required 1 multiplication, 2 additions, 1 store, and 1 comparison per loop iteration, but after optimization, only 1 addition, 2 stores, and 1 comparison are needed.

Summary

| Concept | Description |

| ------------------------------ | --------------------------------------------------------------------------- |

| Dominator | A node through which all paths from the entry point to a specific node pass |

| Dominator tree | A tree structure representing the dominator relation |

| Back edge | An edge going back to a dominator (forming a loop) |

| Natural loop | A loop defined by a back edge |

| Loop-invariant code motion | Moving code that computes the same value every iteration outside the loop |

| Induction variable | A variable that changes by a constant amount in a loop |

| Strength reduction | Replacing multiplication with addition |

| Induction variable elimination | Removing induction variables that become unnecessary |

| Loop unrolling | Copying the loop body to reduce the number of iterations |

Loop optimization is the area of compiler optimization that yields the greatest performance improvements. The basic strategy is to identify loops through dominator analysis, lighten the loop body through invariant code motion and strength reduction, and reduce overhead through loop unrolling. In the next article, we will cover instruction-level parallelism and scheduling.