Authors
- Youngju Kim (@fjvbn20031)

Overview
In the previous article, we explored the basic concepts of code generation. This article focuses on the actual algorithms for generating code. We will cover the simple code generation algorithm, DAG-based code generation, optimal code generation, graph coloring for register allocation, and peephole optimization.
1. Simple Code Generation Algorithm
1.1 Basic Idea
The most basic code generation algorithm processes three-address instructions x = y op z one at a time to generate target code.
The key is to maintain register descriptors and address descriptors while selecting a suitable register for each operation.
1.2 Register Descriptor
A register descriptor tracks which variable's value each register currently holds.
Initial state:
R1: empty
R2: empty
R3: empty
After instruction "LD R1, a":
R1: a
R2: empty
R3: empty
1.3 Address Descriptor
An address descriptor tracks where each variable's current value is stored. A value can exist simultaneously in multiple locations such as registers, memory, and the stack.
Variable a: memory
Variable b: R1, memory
Variable c: R2
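The two descriptors are often implemented as simple maps. A minimal Python sketch (the `load` helper and all names are illustrative, not taken from any real compiler):

```python
# Register descriptor: register -> set of variables whose value it holds.
# Address descriptor:  variable -> set of locations holding its current value.
reg_desc = {"R1": set(), "R2": set(), "R3": set()}
addr_desc = {"a": {"memory"}, "b": {"memory"}, "c": {"memory"}}

def load(reg, var):
    """Record the effect of 'LD reg, var': the register now holds var,
    and var's value is available both in memory and in the register."""
    reg_desc[reg] = {var}
    addr_desc[var].add(reg)

load("R1", "a")
assert reg_desc["R1"] == {"a"}
assert addr_desc["a"] == {"memory", "R1"}
```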
1.4 getReg Function
The getReg(x = y op z) function selects a register to store the result of an instruction.
Selection priority:
- If y is in register R, R contains only y, and y is not used later -> select R
- If there is an empty register -> select that register
- If all registers are in use -> select the register holding the value that will be used furthest in the future (spill)
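The priority rules above can be sketched as a small Python function. This is a simplified model, not a full implementation: `live_after` (variables still needed after the instruction) and `next_use` (instruction index of each variable's next use) are assumed to come from a prior liveness analysis, and all names are hypothetical:

```python
def get_reg(y, registers, reg_desc, live_after, next_use):
    """Pick a register for 'x = y op z' following the three rules above."""
    # 1. y already sits alone in a register and is dead afterwards:
    #    reuse that register (its old content is no longer needed).
    for r in registers:
        if reg_desc[r] == {y} and y not in live_after:
            return r
    # 2. Otherwise prefer an empty register.
    for r in registers:
        if not reg_desc[r]:
            return r
    # 3. All registers busy: spill the one whose content is used
    #    furthest in the future (largest next-use distance).
    return max(registers, key=lambda r: min(next_use.get(v, float("inf"))
                                            for v in reg_desc[r]))

# t is next used at instruction 5, u at instruction 3 -> spill R1 (holds t)
assert get_reg("a", ["R1", "R2"], {"R1": {"t"}, "R2": {"u"}},
               live_after={"t", "u"}, next_use={"t": 5, "u": 3}) == "R1"
```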
1.5 Code Generation Example
Let us generate code for the following basic block. Assume only two registers R1 and R2 are available.
// Three-address code
t = a - b
u = a - c
v = t + u
d = v + u
Code generation process:
Instruction: t = a - b
getReg -> R1 (empty)
Code: LD R1, a // R1 = a
LD R2, b // R2 = b
SUB R1, R1, R2 // R1 = a - b = t
Registers: R1=t, R2=b
Descriptors: t->R1, a->memory, b->R2/memory
Instruction: u = a - c
getReg -> R2 (b is not needed later)
Code: LD R2, a // R2 = a
SUB R2, R2, c // R2 = a - c = u
Registers: R1=t, R2=u
Instruction: v = t + u
getReg -> R1 (t is not needed later)
Code: ADD R1, R1, R2 // R1 = t + u = v
Registers: R1=v, R2=u
Instruction: d = v + u
getReg -> R1 (v is not needed later)
Code: ADD R1, R1, R2 // R1 = v + u = d
Registers: R1=d, R2=u
// End of basic block: store live variable d to memory
Code: ST d, R1
Final generated code:
LD R1, a
LD R2, b
SUB R1, R1, R2
LD R2, a
SUB R2, R2, c
ADD R1, R1, R2
ADD R1, R1, R2
ST d, R1
2. DAG-Based Code Generation
Generating code from a basic block's DAG representation naturally achieves optimizations such as common subexpression elimination.
2.1 Determining Node Order in DAG
Choosing a good evaluation order for DAG nodes can minimize the number of registers needed.
Key principle: If a node is evaluated and its parent node is evaluated immediately after, there is no need to keep the value in a register for a long time.
// DAG:
// (d: +)
// / \
// (v: +) (u: -)
// / \ / \
// (t: -) (u) a0 c0
// / \
// a0 b0
// Good order: t, u, v, d (consumed right after use)
// Bad order: t, v, u, d (t must be kept for a long time)
2.2 Heuristic Ordering Algorithm
Algorithm: DAG Node Ordering
1. Push root nodes (nodes with no parent, i.e., not used by any other node) onto the stack
2. While the stack is not empty:
a. Pop node n from the top of the stack and output it
b. Push children of n whose parents have all been output
c. If multiple children satisfy the condition, push the leftmost child first
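The steps above can be sketched in Python. The function returns the node *listing*; code is then generated by evaluating nodes in reverse of that listing. The representation (dicts mapping each interior node to its child and parent lists) is an assumption for illustration:

```python
def order_nodes(children, parents, roots):
    """Heuristic DAG node listing; evaluate in REVERSE of the result.
    A child is listed only once all of its parents are listed, so each
    value is consumed immediately after it is computed."""
    listed, order, stack = set(), [], list(roots)
    while stack:
        n = stack.pop()
        if n in listed:
            continue
        listed.add(n)
        order.append(n)
        for c in children.get(n, []):      # leftmost pushed first,
            if c in children and all(p in listed for p in parents[c]):
                stack.append(c)            # so rightmost is popped first
    return order

# the DAG from the example above (a0, b0, c0 are leaves)
children = {"d": ["v", "u"], "v": ["t", "u"],
            "t": ["a0", "b0"], "u": ["a0", "c0"]}
parents = {"v": ["d"], "u": ["v", "d"], "t": ["v"]}
listing = order_nodes(children, parents, ["d"])
assert listing == ["d", "v", "u", "t"]
assert listing[::-1] == ["t", "u", "v", "d"]   # the "good" evaluation order
```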
3. Tree-Based Optimal Code Generation
3.1 Ershov Numbers
A method for calculating the minimum number of registers needed for a tree-form expression.
Ershov number calculation rules:
1. Leaf node: label = 1 if it is a left operand (it must be loaded into a register), 0 if it is a right operand (it can be used directly from memory)
2. If the two children's labels differ: label = max(l1, l2)
3. If the two children's labels are equal: label = l1 + 1
Example:
// (*) label = 2
// / \
// (+) (-) label = 1, 1
// / \ / \
// a b c d leaves
//
// (+): left=1, right=0 -> different, max(1,0) = 1
// (-): left=1, right=0 -> different, max(1,0) = 1
// (*): left=1, right=1 -> equal, 1+1 = 2
//
// Conclusion: 2 registers needed
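The labeling rules can be turned into a short recursive function. A sketch assuming expressions are nested tuples `(op, left, right)` with string leaves, following the left-operand = 1 / right-operand = 0 convention above:

```python
def label(node, is_left=True):
    """Ershov-style label: the minimum number of registers needed.
    Leaves are 1 as a left operand, 0 as a right operand."""
    if isinstance(node, str):                      # leaf
        return 1 if is_left else 0
    _, left, right = node
    l1, l2 = label(left, True), label(right, False)
    return max(l1, l2) if l1 != l2 else l1 + 1

# (a + b) * (c - d) from the example above
assert label(("*", ("+", "a", "b"), ("-", "c", "d"))) == 2
```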
3.2 Optimal Code Generation Algorithm
Algorithm: genCode(node)
if node is a leaf:
LD Ri, node.name
else:
left = node.leftChild
right = node.rightChild
if left.label >= right.label:
genCode(left) // result in Ri
genCode(right) // result in Rj
emit("OP Ri, Ri, Rj")
else:
genCode(right) // right side first
genCode(left)
emit("OP Ri, Rj, Ri")
Evaluating the subtree with the larger label first guarantees the whole expression can be computed with the number of registers given by its Ershov number, which is the minimum possible.
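A runnable Python rendering of the algorithm, under the same conventions (a right-operand leaf is used directly from memory, and `rstack` is assumed to hold enough free registers; the tuple tree representation and all names are illustrative):

```python
def label(node, is_left=True):
    """Ershov-style label: leaves are 1 as a left operand, 0 as a right."""
    if isinstance(node, str):
        return 1 if is_left else 0
    _, left, right = node
    l1, l2 = label(left, True), label(right, False)
    return max(l1, l2) if l1 != l2 else l1 + 1

def gen(node, rstack, code):
    """Emit code for 'node' using the free registers in 'rstack'
    (top of stack = rstack[-1]); returns the result register."""
    if isinstance(node, str):                       # leaf as left operand
        code.append(f"LD {rstack[-1]}, {node}")
        return rstack[-1]
    op, left, right = node
    if isinstance(right, str):                      # right leaf: memory operand
        r = gen(left, rstack, code)
        code.append(f"{op} {r}, {r}, {right}")
        return r
    if label(left) >= label(right):                 # larger-label side first
        r_l = gen(left, rstack, code)
        r_r = gen(right, rstack[:-1], code)         # keep r_l live
        code.append(f"{op} {r_l}, {r_l}, {r_r}")
        return r_l
    r_r = gen(right, rstack, code)
    r_l = gen(left, rstack[:-1], code)
    code.append(f"{op} {r_r}, {r_l}, {r_r}")
    return r_r

code = []
gen(("MUL", ("ADD", "a", "b"), ("SUB", "c", "d")), ["R2", "R1"], code)
assert code == ["LD R1, a", "ADD R1, R1, b",
                "LD R2, c", "SUB R2, R2, d", "MUL R1, R1, R2"]
```

Note that `(a + b) * (c - d)` has Ershov number 2, and the generated code indeed uses only R1 and R2.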
4. Register Allocation by Graph Coloring
4.1 Interference Graph
An interference graph represents the simultaneous liveness relationships of variables.
- Nodes: Each variable (or temporary) in the program
- Edges: Two variables that are simultaneously live are connected by an edge
// Code:
// 1: a = ...
// 2: b = ... (a live)
// 3: c = a + b (a, b live)
// 4: d = c (c live)
// 5: e = b + d (b, d live)
// 6: return e
// Interference graph (edges):
//   a --- b   (a and b live together before line 3)
//   b --- c   (b and c live together after line 3)
//   b --- d   (b and d live together before line 5)
//   e interferes with nothing (it is live only at the return)
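The graph can be built with a backwards liveness pass. A sketch under simplifying assumptions: the block is straight-line, each instruction is a `(defined_var, used_vars)` pair, the block's live-out set is supplied by the caller, and copies get no special treatment:

```python
def build_interference(instrs, live_out):
    """Scan the block backwards; the defined variable interferes with
    everything live immediately after its definition."""
    live, edges = set(live_out), set()
    for d, uses in reversed(instrs):
        for v in live - {d}:
            edges.add(frozenset((d, v)))
        live = (live - {d}) | set(uses)
    return edges

# the five assignments from the example above; e is live at the return
block = [("a", []), ("b", []), ("c", ["a", "b"]),
         ("d", ["c"]), ("e", ["b", "d"])]
edges = build_interference(block, {"e"})
assert edges == {frozenset(p) for p in [("a", "b"), ("b", "c"), ("b", "d")]}
```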
4.2 Graph Coloring
The problem of assigning variables to k registers is equivalent to the k-coloring problem on the interference graph.
- Adjacent nodes (simultaneously live variables) must not be assigned the same color (register)
- If k-coloring is possible, k registers are sufficient
- If not possible, some variables must be spilled to memory
4.3 Simplification-Based Coloring Algorithm
Chaitin's algorithm:
Algorithm: Graph Coloring Register Allocation
1. Simplify:
- Remove nodes with degree less than k from the graph and push onto stack
- After removal, other nodes' degrees may decrease, so repeat
2. Spill:
- If all nodes have degree >= k, mark one as a potential spill and remove it
- Spill candidate selection criteria: infrequently used variables, variables outside loops
3. Select:
- Pop nodes from the stack one at a time, adding them back to the graph
- Assign a color (register) not used by neighbors
- If no color is available, actual spill occurs
4. Actual Spill Handling:
- Insert load/store code for spilled variables
- Rebuild the interference graph and repeat the algorithm
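The simplify/select phases can be sketched in Python. This is a reduced model: steps 2 and 4 (optimistic spilling and spill-code insertion) are collapsed into simply returning the spill candidate, and the tie-breaking choices are illustrative, not Chaitin's exact heuristics:

```python
def color(nodes, edges, k):
    """Return a variable -> color (register) map, or the name of the
    node that must be spilled if k-coloring fails."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    stack, remaining = [], set(nodes)
    while remaining:
        # simplify: prefer a node of degree < k in the remaining graph
        cand = [n for n in sorted(remaining) if len(adj[n] & remaining) < k]
        n = cand[0] if cand else min(remaining)     # else: potential spill
        stack.append(n)
        remaining.discard(n)
    coloring = {}
    while stack:                                    # select, in LIFO order
        n = stack.pop()
        used = {coloring[m] for m in adj[n] if m in coloring}
        free = [c for c in range(k) if c not in used]
        if not free:
            return n                                # actual spill
        coloring[n] = free[0]
    return coloring

# the a --- b --- c example with k = 2
result = color(["a", "b", "c"], [("a", "b"), ("b", "c")], 2)
assert result["a"] != result["b"] and result["b"] != result["c"]
```

On a graph that is not 2-colorable (e.g. a triangle with k = 2), the function returns a node name instead, signaling that spill code must be inserted and the graph rebuilt.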
4.4 Example
Coloring with 2 registers (R1, R2):
// Interference graph:
// a --- b --- c
//
// Degrees: a=1, b=2, c=1
// Simplify: remove a (degree 1 < 2) -> remove c (degree 1 < 2) -> remove b (degree 0 < 2)
// Stack: [a, c, b]
// Select:
// b -> R1 (no neighbors, any color works)
// c -> R2 (neighbor b=R1, so R2)
// a -> R2 (neighbor b=R1, so R2)
// Result: a=R2, b=R1, c=R2
5. Peephole Optimization
5.1 Concept
Peephole optimization is a technique that improves generated target code by examining a small window (peephole) of instructions. It is applied as a post-processing step after code generation.
5.2 Key Peephole Optimization Techniques
Redundant Load/Store Elimination
// Before optimization
ST a, R1
LD R1, a // Redundant load
// After optimization
ST a, R1
// LD removed (R1 already contains the value of a)
Unreachable Code Elimination
// Before optimization
BR L2
ADD R1, R2, R3 // Unreachable
SUB R4, R5, R6 // Unreachable
L2:
// After optimization
BR L2
L2:
// Or, if label L2 immediately follows the branch, the BR itself can be removed
Control Flow Optimization
// Before optimization
BR L1
...
L1: BR L2
// After optimization
BR L2 // Skip L1 and go directly to L2
Algebraic Simplification
// Before optimization
ADD R1, R1, #0 // x + 0 = x
MUL R2, R2, #1 // x * 1 = x
MUL R3, R3, #2 // x * 2
// After optimization
// ADD removed (identity operation)
// MUL removed (identity operation)
SHL R3, R3, #1 // Replace multiplication with shift (strength reduction)
Strength Reduction
Replacing expensive operations with equivalent but cheaper operations.
// Before optimization
MUL R1, R2, #4 // Multiplication (slow)
MUL R3, R4, #8
DIV R5, R6, #2
// After optimization
SHL R1, R2, #2 // Left shift (fast): x*4 = x<<2
SHL R3, R4, #3 // x*8 = x<<3
SHR R5, R6, #1 // Right shift: x/2 = x>>1
Machine Idiom Utilization
// Before optimization
LD R1, a
ADD R1, R1, #1
ST a, R1
// After optimization (if machine supports INC instruction)
INC a
5.3 Characteristics of Peephole Optimization
- Local optimization: Observes only a small window (usually 2-3 instructions)
- Iterative application: One optimization may create new optimization opportunities, so repeat until no more changes
- Simple implementation: Easily implemented through pattern matching
- Effective: Simple yet significantly improves code quality
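As a concrete illustration of the pattern-matching, fixed-point nature described above, here is a toy peephole pass over instructions encoded as tuples. It implements only two of the patterns from 5.2 (redundant load after store, and the `x + 0` identity); a real pass would carry many more patterns:

```python
def peephole(code):
    """Repeatedly scan a 2-instruction window until nothing changes."""
    changed = True
    while changed:
        changed = False
        out, i = [], 0
        while i < len(code):
            ins = code[i]
            nxt = code[i + 1] if i + 1 < len(code) else None
            # ST a, R1 ; LD R1, a  ->  drop the redundant load
            if (nxt and ins[0] == "ST" and nxt[0] == "LD"
                    and ins[1] == nxt[2] and ins[2] == nxt[1]):
                out.append(ins)
                i += 2
                changed = True
                continue
            # ADD Rx, Rx, #0  ->  identity operation, drop it
            if ins[0] == "ADD" and ins[1] == ins[2] and ins[3] == "#0":
                i += 1
                changed = True
                continue
            out.append(ins)
            i += 1
        code = out
    return code

before = [("ST", "a", "R1"), ("LD", "R1", "a"), ("ADD", "R1", "R1", "#0")]
assert peephole(before) == [("ST", "a", "R1")]
```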
6. Comparison of Code Generation Techniques
| Technique | Scope | Optimality | Complexity |
|---|---|---|---|
| Simple code generation | Basic block | Sub-optimal | Low |
| DAG-based generation | Basic block | Sub-optimal | Medium |
| Tree-based optimal generation | Tree expression | Optimal | Medium |
| Graph coloring allocation | Entire function | Sub-optimal | High |
| Peephole optimization | Instruction window | Locally optimal | Low |
Summary
| Concept | Description |
|---|---|
| Simple code generation | Sequential code generation using register/address descriptors |
| getReg | Core function for selecting a register to store results |
| DAG-based generation | Code generation that automatically eliminates common subexpressions |
| Ershov numbers | Calculates minimum register requirements for tree expressions |
| Interference graph | Represents simultaneously live variable relationships as a graph |
| Graph coloring | Register allocation via k-coloring of the interference graph |
| Peephole optimization | Improves generated code by pattern matching on a small window |
| Strength reduction | Replaces expensive operations with cheaper equivalent operations |
Code generation algorithms are at the core of compiler performance. While simple algorithms can produce reasonable code, combining DAG-based approaches with graph coloring register allocation yields much more efficient code. Peephole optimization serves as a final finishing step that provides simple yet effective code improvement. In the next article, we will examine machine-independent code optimization techniques.