- Authors

- Name
- Youngju Kim
- @fjvbn20031
Overview
Machine-Independent Code Optimization refers to techniques that improve program efficiency at the intermediate code level without depending on specific hardware. These optimizations preserve program semantics while reducing execution time or code size.
In this article, we will examine the principles of major optimization techniques with examples and introduce data-flow analysis, which forms the foundation for these optimizations.
1. Common Subexpression Elimination
1.1 Concept
When the same operation is performed on the same operands more than once, subsequent computations are eliminated and the previous result is reused.
1.2 Local Common Subexpression Elimination
This is common subexpression elimination within a single basic block.
// Before optimization
a = b + c
d = a - e
f = b + c // b + c has already been computed
g = f - e
// After optimization
a = b + c
d = a - e
f = a // Reuse a instead of b + c
g = f - e // Or g = d (a-e is also a common subexpression)
1.3 Global Common Subexpression Elimination
Common subexpression elimination across multiple basic blocks. This requires Available Expressions analysis.
// Block B1:
// t1 = a + b
// ...
// Block B2: (reachable from B1, a and b unchanged)
// t2 = a + b // Same value as t1 in B1
// After optimization
// Block B2:
// t2 = t1 // Reuse
Important note: It must be verified that the operands (a or b) have not been modified between the two computations.
1.4 Practical Example
Common subexpressions frequently occur in array accesses:
// C code
x = a[i][j] + a[i][j+1];
// Intermediate code generated during compilation
t1 = i * n // Row offset
t2 = t1 + j // Index of a[i][j]
t3 = a[t2]
t4 = i * n // Common subexpression! (same as t1)
t5 = t4 + j + 1 // Index of a[i][j+1]
t6 = a[t5]
x = t3 + t6
// After optimization
t1 = i * n
t2 = t1 + j
t3 = a[t2]
t5 = t1 + j + 1 // Reuse t1 instead of t4
t6 = a[t5]
x = t3 + t6
2. Dead Code Elimination
2.1 Concept
Dead code is code whose computed results are never used afterward. Removing it eliminates unnecessary computations.
2.2 Example
// Before optimization
a = b + c // Dead code if a is not used afterward
d = e * f
return d
// After optimization
d = e * f
return d
2.3 Commonly Found in Debug Code
// C code example
int debug = 0; // Debug mode off
void process() {
// ... processing logic ...
if (debug) { // Condition is always false
printf("Debug info"); // Dead code
}
}
After constant propagation, knowing that debug is 0 makes the entire if block dead code.
2.4 Dead Code Identification
The condition for an assignment x = expr to be dead code:
- x is not live after the assignment
- That is, the value of x does not affect the program's output or side effects
Live Variable Analysis is needed to determine this.
3. Constant Folding and Constant Propagation
3.1 Constant Folding
Operations with all-constant operands are computed at compile time.
// Before optimization
x = 3 + 5
y = x * 2
// After constant folding
x = 8
y = x * 2
// After constant propagation + constant folding
x = 8
y = 16
3.2 Constant Propagation
When a constant is assigned to a variable, the constant is directly substituted at places where that variable is used.
// Before optimization
n = 10
i = 0
limit = n - 1
// After constant propagation
n = 10
i = 0
limit = 10 - 1 // Replace n with 10
// After constant folding
n = 10
i = 0
limit = 9
3.3 Conditional Constant Propagation
When constants are propagated into branch conditions, the branches themselves can be eliminated.
// Before optimization
x = 1
if x > 0 goto L1
goto L2
// After constant propagation + constant folding
x = 1
if 1 > 0 goto L1 // Always true
goto L2 // Unreachable
// After branch simplification
x = 1
goto L1
4. Loop-Invariant Code Motion
4.1 Concept
Code that computes the same value on every loop iteration is moved outside the loop.
4.2 Example
// Before optimization
while (i < limit) {
// t = n * 4 is loop invariant
t = n * 4
a[i] = t + i
i = i + 1
}
// After optimization
t = n * 4 // Moved outside the loop
while (i < limit) {
a[i] = t + i
i = i + 1
}
4.3 Loop-Invariant Code Detection
The condition for instruction s: x = y op z to be loop invariant:
- All definitions of y and z are outside the loop, or
- y and z are constants, or
- The instructions defining y and z have already been identified as loop invariant
4.4 Movement Conditions
Even loop-invariant code cannot always be moved unconditionally. Conditions for safe movement:
Conditions for moving instruction s: x = y op z outside the loop:
1. The block containing s dominates all exits of the loop
2. x is not defined elsewhere within the loop
3. All uses of x in the loop reach only the definition at s
5. Induction Variables and Strength Reduction
5.1 Induction Variable
A variable that changes by a constant amount on each loop iteration is called an induction variable.
// i is a basic induction variable (increments by 1 each iteration)
// j is a derived induction variable (j = 4*i, so increments by 4)
for (i = 0; i < n; i++) {
j = 4 * i
a[j] = 0
}
5.2 Strength Reduction
Replaces expensive operations (multiplication) on induction variables with cheaper operations (addition).
// Before optimization
i = 0
L1: if i >= n goto L2
j = 4 * i // Multiplication (expensive)
a[j] = 0
i = i + 1
goto L1
L2:
// After strength reduction
i = 0
j = 0 // Initial value of j
L1: if i >= n goto L2
a[j] = 0
i = i + 1
j = j + 4 // Replace multiplication with addition
goto L1
L2:
5.3 Induction Variable Elimination
After strength reduction, if the basic induction variable is used only in the loop termination condition, it can be replaced with the derived induction variable and eliminated.
// After strength reduction
i = 0
j = 0
L1: if i >= n goto L2
a[j] = 0
i = i + 1
j = j + 4
goto L1
L2:
// After induction variable elimination (replace i with j)
j = 0
limit = 4 * n // Termination condition conversion
L1: if j >= limit goto L2
a[j] = 0
j = j + 4
goto L1
L2:
The multiplication is moved outside the loop and executed only once, while only addition and comparison are performed inside the loop.
6. Copy Propagation
6.1 Concept
After a copy instruction x = y, use y directly instead of x.
// Before optimization
x = y
z = x + 1 // Can use y instead of x
// After copy propagation
x = y // This line may become dead code
z = y + 1
Copy propagation itself does not reduce code, but it is effective when combined with dead code elimination afterward.
6.2 Relationship with Common Subexpression Elimination
// After common subexpression elimination
a = b + c
...
f = a // Copy instead of b + c
// After copy propagation (substitute a where f is used)
// If f becomes unnecessary, it is removed as dead code
7. Introduction to Data-Flow Analysis
The optimizations described above require analysis of the program's data flow to be applied safely.
7.1 What is Data-Flow Analysis
Data-Flow Analysis is a technique for determining whether certain properties hold at each point in a program.
Key data-flow analyses:
| Analysis | Question | Use Case |
|---|---|---|
| Reaching Definitions | Where was the value of variable x defined at this point? | Constant propagation, CSE |
| Live Variables | Is the variable's value at this point used later? | Dead code elimination, register allocation |
| Available Expressions | Has this expression already been computed and is it available? | Global CSE |
7.2 Basic Concept: gen and kill Sets
For each basic block B:
- gen(B): Information newly generated in block B
- kill(B): Information invalidated in block B
// Block B:
// d1: x = a + b // gen: d1, kill: all other definitions of x
// d2: y = c - d // gen: d2, kill: all other definitions of y
7.3 Iterative Algorithm Preview
Data-flow equations are solved iteratively until a fixed point is reached:
// Transfer function for reaching definitions analysis
OUT[B] = gen(B) U (IN[B] - kill(B))
// IN[B] is the union of OUT of all predecessor blocks
IN[B] = U OUT[P] (P is all predecessors of B)
Details will be covered in the next article.
8. Optimization Application Order
The order in which optimizations are applied is also important. A typical order:
1. Constant folding / Constant propagation
-> Establish constants first to contribute to other optimizations
2. Common subexpression elimination
-> Remove duplicate computations
3. Copy propagation
-> Prepare for removing unnecessary copies
4. Dead code elimination
-> Clean up dead code generated by the above optimizations
5. Loop-invariant code motion
-> Move code outside loops
6. Induction variables / Strength reduction
-> Reduce cost within loops
7. Repeat (until convergence)
Summary
| Optimization Technique | Effect | Required Analysis |
|---|---|---|
| Common subexpression elimination | Prevents duplicate computation | Available expressions analysis |
| Dead code elimination | Removes unnecessary code | Live variable analysis |
| Constant folding/propagation | Pre-computes at compile time | Reaching definitions analysis |
| Loop-invariant code motion | Eliminates repeated computation in loops | Reaching definitions, dominator analysis |
| Induction variables/strength reduction | Replaces multiplication with addition | Induction variable detection |
| Copy propagation | Removes unnecessary copies | Reaching definitions analysis |
Machine-independent optimization significantly improves the quality of code generated by compilers. Loop-related optimizations (loop-invariant code motion, strength reduction) are particularly impactful since they improve the efficiency of loops, where most program execution time is spent. In the next article, we will examine the data-flow analysis framework that forms the foundation for these optimizations.