Authors
- Youngju Kim (@fjvbn20031)

Overview
In the previous article, we explored the basic concepts of code generation. This article focuses on the actual algorithms for generating code. We will cover the simple code generation algorithm, DAG-based code generation, optimal code generation, graph coloring for register allocation, and peephole optimization.
1. Simple Code Generation Algorithm
1.1 Basic Idea
The most basic code generation algorithm processes three-address instructions x = y op z one at a time to generate target code.
The key is to maintain register descriptors and address descriptors while selecting a suitable register for each operation.
1.2 Register Descriptor
A register descriptor tracks which variable's value each register currently holds.
Initial state:
R1: empty
R2: empty
R3: empty
After instruction "LD R1, a":
R1: a
R2: empty
R3: empty
1.3 Address Descriptor
An address descriptor tracks where each variable's current value is stored. A value can exist simultaneously in multiple locations such as registers, memory, and the stack.
Variable a: memory
Variable b: R1, memory
Variable c: R2
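The two descriptors are often implemented as simple maps. A minimal Python sketch (the `load` helper and all names are illustrative, not taken from any real compiler):

```python
# Register descriptor: register -> set of variables whose value it holds.
# Address descriptor:  variable -> set of locations holding its current value.
reg_desc = {"R1": set(), "R2": set(), "R3": set()}
addr_desc = {"a": {"memory"}, "b": {"memory"}, "c": {"memory"}}

def load(reg, var):
    """Record the effect of 'LD reg, var': the register now holds var,
    and var's value is available both in memory and in the register."""
    reg_desc[reg] = {var}
    addr_desc[var].add(reg)

load("R1", "a")
assert reg_desc["R1"] == {"a"}
assert addr_desc["a"] == {"memory", "R1"}
```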
1.4 getReg Function
The getReg(x = y op z) function selects a register to store the result of an instruction.
Selection priority:
- If y is in register R, R contains only y, and y is not used later -> select R
- If there is an empty register -> select that register
- If all registers are in use -> select the register holding the value that will be used furthest in the future (spill)
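The priority rules above can be sketched as a small Python function. This is a simplified model, not a full implementation: `live_after` (variables still needed after the instruction) and `next_use` (instruction index of each variable's next use) are assumed to come from a prior liveness analysis, and all names are hypothetical:

```python
def get_reg(y, registers, reg_desc, live_after, next_use):
    """Pick a register for 'x = y op z' following the three rules above."""
    # 1. y already sits alone in a register and is dead afterwards:
    #    reuse that register (its old content is no longer needed).
    for r in registers:
        if reg_desc[r] == {y} and y not in live_after:
            return r
    # 2. Otherwise prefer an empty register.
    for r in registers:
        if not reg_desc[r]:
            return r
    # 3. All registers busy: spill the one whose content is used
    #    furthest in the future (largest next-use distance).
    return max(registers, key=lambda r: min(next_use.get(v, float("inf"))
                                            for v in reg_desc[r]))

# t is next used at instruction 5, u at instruction 3 -> spill R1 (holds t)
assert get_reg("a", ["R1", "R2"], {"R1": {"t"}, "R2": {"u"}},
               live_after={"t", "u"}, next_use={"t": 5, "u": 3}) == "R1"
```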
1.5 Code Generation Example
Let us generate code for the following basic block. Assume only two registers R1 and R2 are available.
// Three-address code
t = a - b
u = a - c
v = t + u
d = v + u
Code generation process:
Instruction: t = a - b
getReg -> R1 (empty)
Code: LD R1, a // R1 = a
LD R2, b // R2 = b
SUB R1, R1, R2 // R1 = a - b = t
Registers: R1=t, R2=b
Descriptors: t->R1, a->memory, b->R2/memory
Instruction: u = a - c
getReg -> R2 (b is not needed later)
Code: LD R2, a // R2 = a
SUB R2, R2, c // R2 = a - c = u
Registers: R1=t, R2=u
Instruction: v = t + u
getReg -> R1 (t is not needed later)
Code: ADD R1, R1, R2 // R1 = t + u = v
Registers: R1=v, R2=u
Instruction: d = v + u
getReg -> R1 (v is not needed later)
Code: ADD R1, R1, R2 // R1 = v + u = d
Registers: R1=d, R2=u
// End of basic block: store live variable d to memory
Code: ST d, R1
Final generated code:
LD R1, a
LD R2, b
SUB R1, R1, R2
LD R2, a
SUB R2, R2, c
ADD R1, R1, R2
ADD R1, R1, R2
ST d, R1
2. DAG-Based Code Generation
Generating code from a basic block's DAG representation naturally achieves optimizations such as common subexpression elimination.
2.1 Determining Node Order in DAG
Choosing a good evaluation order for DAG nodes can minimize the number of registers needed.
Key principle: If a node is evaluated and its parent node is evaluated immediately after, there is no need to keep the value in a register for a long time.
// DAG:
// (d: +)
// / \
// (v: +) (u: -)
// / \ / \
// (t: -) (u) a0 c0
// / \
// a0 b0
// Good order: t, u, v, d (consumed right after use)
// Bad order: t, v, u, d (t must be kept for a long time)
2.2 Heuristic Ordering Algorithm
Algorithm: DAG Node Ordering
1. Push root nodes (nodes with no parent, i.e., not used by any other node) onto the stack
2. While the stack is not empty:
a. Pop node n from the top of the stack and output it
b. Push children of n whose parents have all been output
c. If multiple children satisfy the condition, push the leftmost child first
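The steps above can be sketched in Python. The function returns the node *listing*; code is then generated by evaluating nodes in reverse of that listing. The representation (dicts mapping each interior node to its child and parent lists) is an assumption for illustration:

```python
def order_nodes(children, parents, roots):
    """Heuristic DAG node listing; evaluate in REVERSE of the result.
    A child is listed only once all of its parents are listed, so each
    value is consumed immediately after it is computed."""
    listed, order, stack = set(), [], list(roots)
    while stack:
        n = stack.pop()
        if n in listed:
            continue
        listed.add(n)
        order.append(n)
        for c in children.get(n, []):      # leftmost pushed first,
            if c in children and all(p in listed for p in parents[c]):
                stack.append(c)            # so rightmost is popped first
    return order

# the DAG from the example above (a0, b0, c0 are leaves)
children = {"d": ["v", "u"], "v": ["t", "u"],
            "t": ["a0", "b0"], "u": ["a0", "c0"]}
parents = {"v": ["d"], "u": ["v", "d"], "t": ["v"]}
listing = order_nodes(children, parents, ["d"])
assert listing == ["d", "v", "u", "t"]
assert listing[::-1] == ["t", "u", "v", "d"]   # the "good" evaluation order
```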
3. Tree-Based Optimal Code Generation
3.1 Ershov Numbers
A method for calculating the minimum number of registers needed for a tree-form expression.
Ershov number calculation rules:
1. Leaf node: label = 1 if it is a left operand (it must be loaded into a register), 0 if it is a right operand (it can be used directly from memory)
2. If the two children's labels differ: label = max(l1, l2)
3. If the two children's labels are equal: label = l1 + 1
Example:
// (*) label = 2
// / \
// (+) (-) label = 1, 1
// / \ / \
// a b c d leaves
//
// (+): left=1, right=0 -> different, max(1,0) = 1
// (-): left=1, right=0 -> different, max(1,0) = 1
// (*): left=1, right=1 -> equal, 1+1 = 2
//
// Conclusion: 2 registers needed
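The labeling rules can be turned into a short recursive function. A sketch assuming expressions are nested tuples `(op, left, right)` with string leaves, following the left-operand = 1 / right-operand = 0 convention above:

```python
def label(node, is_left=True):
    """Ershov-style label: the minimum number of registers needed.
    Leaves are 1 as a left operand, 0 as a right operand."""
    if isinstance(node, str):                      # leaf
        return 1 if is_left else 0
    _, left, right = node
    l1, l2 = label(left, True), label(right, False)
    return max(l1, l2) if l1 != l2 else l1 + 1

# (a + b) * (c - d) from the example above
assert label(("*", ("+", "a", "b"), ("-", "c", "d"))) == 2
```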
3.2 Optimal Code Generation Algorithm
Algorithm: genCode(node)
if node is a leaf:
LD Ri, node.name
else:
left = node.leftChild
right = node.rightChild
if left.label >= right.label:
genCode(left) // result in Ri
genCode(right) // result in Rj
emit("OP Ri, Ri, Rj")
else:
genCode(right) // right side first
genCode(left)
emit("OP Ri, Rj, Ri")
Evaluating the subtree with the larger label first guarantees the whole expression can be computed with the number of registers given by its Ershov number, which is the minimum possible.
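A runnable Python rendering of the algorithm, under the same conventions (a right-operand leaf is used directly from memory, and `rstack` is assumed to hold enough free registers; the tuple tree representation and all names are illustrative):

```python
def label(node, is_left=True):
    """Ershov-style label: leaves are 1 as a left operand, 0 as a right."""
    if isinstance(node, str):
        return 1 if is_left else 0
    _, left, right = node
    l1, l2 = label(left, True), label(right, False)
    return max(l1, l2) if l1 != l2 else l1 + 1

def gen(node, rstack, code):
    """Emit code for 'node' using the free registers in 'rstack'
    (top of stack = rstack[-1]); returns the result register."""
    if isinstance(node, str):                       # leaf as left operand
        code.append(f"LD {rstack[-1]}, {node}")
        return rstack[-1]
    op, left, right = node
    if isinstance(right, str):                      # right leaf: memory operand
        r = gen(left, rstack, code)
        code.append(f"{op} {r}, {r}, {right}")
        return r
    if label(left) >= label(right):                 # larger-label side first
        r_l = gen(left, rstack, code)
        r_r = gen(right, rstack[:-1], code)         # keep r_l live
        code.append(f"{op} {r_l}, {r_l}, {r_r}")
        return r_l
    r_r = gen(right, rstack, code)
    r_l = gen(left, rstack[:-1], code)
    code.append(f"{op} {r_r}, {r_l}, {r_r}")
    return r_r

code = []
gen(("MUL", ("ADD", "a", "b"), ("SUB", "c", "d")), ["R2", "R1"], code)
assert code == ["LD R1, a", "ADD R1, R1, b",
                "LD R2, c", "SUB R2, R2, d", "MUL R1, R1, R2"]
```

Note that `(a + b) * (c - d)` has Ershov number 2, and the generated code indeed uses only R1 and R2.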
4. Register Allocation by Graph Coloring
4.1 Interference Graph
An interference graph represents the simultaneous liveness relationships of variables.
- Nodes: Each variable (or temporary) in the program
- Edges: Two variables that are simultaneously live are connected by an edge
// Code:
// 1: a = ...
// 2: b = ... (a live)
// 3: c = a + b (a, b live)
// 4: d = c (c live)
// 5: e = b + d (b, d live)
// 6: return e
// Interference graph (edges):
//   a --- b   (a and b live together before line 3)
//   b --- c   (b and c live together after line 3)
//   b --- d   (b and d live together before line 5)
//   e interferes with nothing (it is live only at the return)
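The graph can be built with a backwards liveness pass. A sketch under simplifying assumptions: the block is straight-line, each instruction is a `(defined_var, used_vars)` pair, the block's live-out set is supplied by the caller, and copies get no special treatment:

```python
def build_interference(instrs, live_out):
    """Scan the block backwards; the defined variable interferes with
    everything live immediately after its definition."""
    live, edges = set(live_out), set()
    for d, uses in reversed(instrs):
        for v in live - {d}:
            edges.add(frozenset((d, v)))
        live = (live - {d}) | set(uses)
    return edges

# the five assignments from the example above; e is live at the return
block = [("a", []), ("b", []), ("c", ["a", "b"]),
         ("d", ["c"]), ("e", ["b", "d"])]
edges = build_interference(block, {"e"})
assert edges == {frozenset(p) for p in [("a", "b"), ("b", "c"), ("b", "d")]}
```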
4.2 Graph Coloring
The problem of assigning variables to k registers is equivalent to the k-coloring problem on the interference graph.
- Adjacent nodes (simultaneously live variables) must not be assigned the same color (register)
- If k-coloring is possible, k registers are sufficient
- If not possible, some variables must be spilled to memory
4.3 Simplification-Based Coloring Algorithm
Chaitin's algorithm:
Algorithm: Graph Coloring Register Allocation
1. Simplify:
- Remove nodes with degree less than k from the graph and push onto stack
- After removal, other nodes' degrees may decrease, so repeat
2. Spill:
- If all nodes have degree >= k, mark one as a potential spill and remove it
- Spill candidate selection criteria: infrequently used variables, variables outside loops
3. Select:
- Pop nodes from the stack one at a time, adding them back to the graph
- Assign a color (register) not used by neighbors
- If no color is available, actual spill occurs
4. Actual Spill Handling:
- Insert load/store code for spilled variables
- Rebuild the interference graph and repeat the algorithm
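The simplify/select phases can be sketched in Python. This is a reduced model: steps 2 and 4 (optimistic spilling and spill-code insertion) are collapsed into simply returning the spill candidate, and the tie-breaking choices are illustrative, not Chaitin's exact heuristics:

```python
def color(nodes, edges, k):
    """Return a variable -> color (register) map, or the name of the
    node that must be spilled if k-coloring fails."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    stack, remaining = [], set(nodes)
    while remaining:
        # simplify: prefer a node of degree < k in the remaining graph
        cand = [n for n in sorted(remaining) if len(adj[n] & remaining) < k]
        n = cand[0] if cand else min(remaining)     # else: potential spill
        stack.append(n)
        remaining.discard(n)
    coloring = {}
    while stack:                                    # select, in LIFO order
        n = stack.pop()
        used = {coloring[m] for m in adj[n] if m in coloring}
        free = [c for c in range(k) if c not in used]
        if not free:
            return n                                # actual spill
        coloring[n] = free[0]
    return coloring

# the a --- b --- c example with k = 2
result = color(["a", "b", "c"], [("a", "b"), ("b", "c")], 2)
assert result["a"] != result["b"] and result["b"] != result["c"]
```

On a graph that is not 2-colorable (e.g. a triangle with k = 2), the function returns a node name instead, signaling that spill code must be inserted and the graph rebuilt.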
4.4 Example
Coloring with 2 registers (R1, R2):
// Interference graph:
// a --- b --- c
//
// Degrees: a=1, b=2, c=1
// Simplify: remove a (degree 1 < 2) -> remove c (degree 1 < 2) -> remove b (degree 0 < 2)
// Stack: [a, c, b]
// Select:
// b -> R1 (no neighbors, any color works)
// c -> R2 (neighbor b=R1, so R2)
// a -> R2 (neighbor b=R1, so R2)
// Result: a=R2, b=R1, c=R2
5. Peephole Optimization
5.1 Concept
Peephole optimization is a technique that improves generated target code by examining a small window (peephole) of instructions. It is applied as a post-processing step after code generation.
5.2 Key Peephole Optimization Techniques
Redundant Load/Store Elimination
// Before optimization
ST a, R1
LD R1, a // Redundant load
// After optimization
ST a, R1
// LD removed (R1 already contains the value of a)
Unreachable Code Elimination
// Before optimization
BR L2
ADD R1, R2, R3 // Unreachable
SUB R4, R5, R6 // Unreachable
L2:
// After optimization
BR L2
L2:
// Or, if label L2 immediately follows the branch, the BR itself can be removed
Control Flow Optimization
// Before optimization
BR L1
...
L1: BR L2
// After optimization
BR L2 // Skip L1 and go directly to L2
Algebraic Simplification
// Before optimization
ADD R1, R1, #0 // x + 0 = x
MUL R2, R2, #1 // x * 1 = x
MUL R3, R3, #2 // x * 2
// After optimization
// ADD removed (identity operation)
// MUL removed (identity operation)
SHL R3, R3, #1 // Replace multiplication with shift (strength reduction)
Strength Reduction
Replacing expensive operations with equivalent but cheaper operations.
// Before optimization
MUL R1, R2, #4 // Multiplication (slow)
MUL R3, R4, #8
DIV R5, R6, #2
// After optimization
SHL R1, R2, #2 // Left shift (fast): x*4 = x<<2
SHL R3, R4, #3 // x*8 = x<<3
SHR R5, R6, #1 // Right shift: x/2 = x>>1
Machine Idiom Utilization
// Before optimization
LD R1, a
ADD R1, R1, #1
ST a, R1
// After optimization (if machine supports INC instruction)
INC a
5.3 Characteristics of Peephole Optimization
- Local optimization: Observes only a small window (usually 2-3 instructions)
- Iterative application: One optimization may create new optimization opportunities, so repeat until no more changes
- Simple implementation: Easily implemented through pattern matching
- Effective: Simple yet significantly improves code quality
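As a concrete illustration of the pattern-matching, fixed-point nature described above, here is a toy peephole pass over instructions encoded as tuples. It implements only two of the patterns from 5.2 (redundant load after store, and the `x + 0` identity); a real pass would carry many more patterns:

```python
def peephole(code):
    """Repeatedly scan a 2-instruction window until nothing changes."""
    changed = True
    while changed:
        changed = False
        out, i = [], 0
        while i < len(code):
            ins = code[i]
            nxt = code[i + 1] if i + 1 < len(code) else None
            # ST a, R1 ; LD R1, a  ->  drop the redundant load
            if (nxt and ins[0] == "ST" and nxt[0] == "LD"
                    and ins[1] == nxt[2] and ins[2] == nxt[1]):
                out.append(ins)
                i += 2
                changed = True
                continue
            # ADD Rx, Rx, #0  ->  identity operation, drop it
            if ins[0] == "ADD" and ins[1] == ins[2] and ins[3] == "#0":
                i += 1
                changed = True
                continue
            out.append(ins)
            i += 1
        code = out
    return code

before = [("ST", "a", "R1"), ("LD", "R1", "a"), ("ADD", "R1", "R1", "#0")]
assert peephole(before) == [("ST", "a", "R1")]
```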
6. Comparison of Code Generation Techniques
| Technique | Scope | Optimality | Complexity |
|---|---|---|---|
| Simple code generation | Basic block | Sub-optimal | Low |
| DAG-based generation | Basic block | Sub-optimal | Medium |
| Tree-based optimal generation | Tree expression | Optimal | Medium |
| Graph coloring allocation | Entire function | Sub-optimal | High |
| Peephole optimization | Instruction window | Locally optimal | Low |
Summary
| Concept | Description |
|---|---|
| Simple code generation | Sequential code generation using register/address descriptors |
| getReg | Core function for selecting a register to store results |
| DAG-based generation | Code generation that automatically eliminates common subexpressions |
| Ershov numbers | Calculates minimum register requirements for tree expressions |
| Interference graph | Represents simultaneously live variable relationships as a graph |
| Graph coloring | Register allocation via k-coloring of the interference graph |
| Peephole optimization | Improves generated code by pattern matching on a small window |
| Strength reduction | Replaces expensive operations with cheaper equivalent operations |
Code generation algorithms are at the core of compiler performance. While simple algorithms can produce reasonable code, combining DAG-based approaches with graph coloring register allocation yields much more efficient code. Peephole optimization serves as a final finishing step that provides simple yet effective code improvement. In the next article, we will examine machine-independent code optimization techniques.