Split View: [컴파일러] 12. 코드 생성 알고리즘과 레지스터 할당

[컴파일러] 12. 코드 생성 알고리즘과 레지스터 할당

개요

이전 글에서 코드 생성의 기본 개념을 살펴보았습니다. 이번 글에서는 실제로 코드를 생성하는 알고리즘에 초점을 맞춥니다. 단순 코드 생성 알고리즘부터 시작하여, DAG 기반 코드 생성, 최적 코드 생성, 그래프 컬러링을 이용한 레지스터 할당, 그리고 핍홀 최적화까지 다루겠습니다.

1. 단순 코드 생성 알고리즘

1.1 기본 아이디어

가장 기본적인 코드 생성 알고리즘은 세 주소 명령어 x = y op z를 하나씩 처리하면서 목적 코드를 생성합니다.

핵심은 레지스터 디스크립터와 주소 디스크립터를 유지하면서, 각 연산에 대해 최적의 레지스터를 선택하는 것입니다.

1.2 레지스터 디스크립터 (Register Descriptor)

레지스터 디스크립터는 각 레지스터가 현재 어떤 변수의 값을 담고 있는지 추적합니다.

초기 상태:
  R1: 비어있음
  R2: 비어있음
  R3: 비어있음

명령어 "LD R1, a" 실행 후:
  R1: a
  R2: 비어있음
  R3: 비어있음

1.3 주소 디스크립터 (Address Descriptor)

주소 디스크립터는 각 변수의 현재 값이 어디에 저장되어 있는지 추적합니다. 값이 레지스터, 메모리, 스택 등 여러 곳에 동시에 존재할 수 있습니다.

변수 a: 메모리
변수 b: R1, 메모리
변수 c: R2

1.4 getReg 함수

getReg(x = y op z) 함수는 명령어의 결과를 저장할 레지스터를 선택합니다.

선택 우선순위:

y가 레지스터 R에 있고, R에 y만 있으며, y가 이후에 사용되지 않으면 -> R 선택
비어있는 레지스터가 있으면 -> 그 레지스터 선택
모든 레지스터가 사용 중이면 -> 가장 나중에 사용될 값을 담고 있는 레지스터 선택 (spill)

1.5 코드 생성 예시

다음 기본 블록에 대해 코드를 생성해 봅시다. 레지스터는 R1, R2 두 개만 사용 가능하다고 가정합니다.

// 세 주소 코드
t = a - b
u = a - c
v = t + u
d = v + u

코드 생성 과정:

명령어: t = a - b
  getReg -> R1 (비어있음)
  코드: LD  R1, a      // R1 = a
        LD  R2, b      // R2 = b
        SUB R1, R1, R2 // R1 = a - b = t
  레지스터: R1=t, R2=b
  디스크립터: t->R1, a->메모리, b->R2/메모리

명령어: u = a - c
  getReg -> R2 (b는 이후 불필요)
  코드: LD  R2, a      // R2 = a
        SUB R2, R2, c  // R2 = a - c = u
  레지스터: R1=t, R2=u

명령어: v = t + u
  getReg -> R1 (t는 이후 불필요)
  코드: ADD R1, R1, R2 // R1 = t + u = v
  레지스터: R1=v, R2=u

명령어: d = v + u
  getReg -> R1 (v는 이후 불필요)
  코드: ADD R1, R1, R2 // R1 = v + u = d
  레지스터: R1=d, R2=u

// 기본 블록 끝: 살아있는 변수 d를 메모리에 저장
  코드: ST  d, R1

최종 생성 코드:

LD  R1, a
LD  R2, b
SUB R1, R1, R2
LD  R2, a
SUB R2, R2, c
ADD R1, R1, R2
ADD R1, R1, R2
ST  d, R1

2. DAG 기반 코드 생성

기본 블록의 DAG 표현으로부터 코드를 생성하면, 공통 부분식 제거 등의 최적화가 자연스럽게 이루어집니다.

2.1 DAG에서의 노드 순서 결정

DAG의 노드를 평가하는 순서를 잘 선택하면 필요한 레지스터 수를 최소화할 수 있습니다.

핵심 원칙: 한 노드를 평가한 직후, 그 노드의 값을 사용하는 부모 노드를 바로 평가하면 레지스터에 값을 오래 유지할 필요가 없습니다.

// DAG:
//        (d: +)
//       /      \
//    (v: +)    (u: -)
//    /    \    /    \
// (t: -)  (u) a0    c0
//  /   \
// a0   b0

// 좋은 순서: t, u, v, d (사용 직후 소비)
// 나쁜 순서: t, v, u, d (t를 오래 유지해야 함)

2.2 휴리스틱 순서 결정 알고리즘

알고리즘: DAG 노드 순서 결정

1. 루트 노드(출력이 없는 노드)를 스택에 넣음
2. 스택이 비어있지 않은 동안:
   a. 스택 꼭대기의 노드 n을 pop하여 출력
   b. n의 자식 중 모든 부모가 이미 출력된 것을 스택에 push
   c. 여러 자식이 조건을 만족하면 가장 왼쪽 자식을 먼저 push

3. 트리 기반 최적 코드 생성

3.1 Ershov 수

트리 형태의 수식에 대해 필요한 최소 레지스터 수를 계산하는 방법입니다.

Ershov 수 계산 규칙:

1. 리프 노드: label = 1 (왼쪽 자식) 또는 0 (오른쪽 자식이 메모리)
2. 내부 노드의 두 자식 label이 다르면: label = max(l1, l2)
3. 두 자식 label이 같으면: label = l1 + 1

예시:

//       (*)      label = 2
//      /   \
//    (+)    (-)   label = 1, 1
//   / \    / \
//  a   b  c   d   리프
//
// (+)의 label: max(1,1) -> 같으므로 1+1 = 2? 아니요
// 리프: 왼쪽=1, 오른쪽=0
// (+): l=1, r=0 -> 다르므로 max(1,0) = 1
// (-): l=1, r=0 -> 다르므로 max(1,0) = 1
// (*): l=1, r=1 -> 같으므로 1+1 = 2
//
// 결론: 레지스터 2개 필요

3.2 최적 코드 생성 알고리즘

알고리즘: genCode(node)

if node가 리프이면:
    LD Ri, node.name
else:
    left = node.leftChild
    right = node.rightChild

    if left.label >= right.label:
        genCode(left)   // 결과가 Ri에
        genCode(right)  // 결과가 Rj에
        emit("OP Ri, Ri, Rj")
    else:
        genCode(right)  // 오른쪽을 먼저
        genCode(left)
        emit("OP Ri, Rj, Ri")

label이 큰 쪽을 먼저 계산하면 레지스터 사용이 최적화됩니다.

4. 그래프 컬러링에 의한 레지스터 할당

4.1 간섭 그래프 (Interference Graph)

간섭 그래프는 변수들의 동시 활성(live) 관계를 나타냅니다.

노드: 프로그램의 각 변수(또는 임시값)
간선: 두 변수가 동시에 활성이면 간선으로 연결

// 코드:
// 1: a = ...
// 2: b = ...     (a live)
// 3: c = a + b   (a, b live)
// 4: d = c       (c live)
// 5: e = b + d   (b, d live)
// 6: return e

// 간섭 그래프:
//   a --- b
//   |
//   c     d --- b
//         |
//         e

4.2 그래프 컬러링

k개의 레지스터에 변수를 할당하는 문제는 간섭 그래프의 k-컬러링 문제와 같습니다.

인접한 노드(동시 활성 변수)에 같은 색(레지스터)을 배정하면 안 됨
k-컬러링이 가능하면 k개의 레지스터로 충분
불가능하면 일부 변수를 메모리로 spill

4.3 간소화 기반 컬러링 알고리즘

Chaitin의 알고리즘:

알고리즘: 그래프 컬러링 레지스터 할당

1. 간소화(Simplify):
   - 차수(degree)가 k 미만인 노드를 그래프에서 제거하고 스택에 push
   - 제거 후 다른 노드의 차수가 감소할 수 있으므로 반복

2. Spill:
   - 모든 노드의 차수가 k 이상이면, 하나를 잠재적 spill로 표시하고 제거
   - spill 대상 선택 기준: 사용 빈도가 낮은 변수, 루프 밖의 변수

3. 선택(Select):
   - 스택에서 노드를 하나씩 꺼내면서 그래프에 다시 추가
   - 이웃이 사용하지 않는 색(레지스터)을 배정
   - 배정 가능한 색이 없으면 실제 spill 발생

4. 실제 Spill 처리:
   - spill된 변수에 로드/저장 코드 삽입
   - 간섭 그래프를 다시 구성하고 알고리즘 반복

4.4 예시

레지스터 2개(R1, R2)로 컬러링:

// 간섭 그래프:
//   a --- b --- c
//
// 차수: a=1, b=2, c=1

// 간소화: a(차수1 < 2) 제거 -> c(차수1 < 2) 제거 -> b(차수0 < 2) 제거
// 스택: [a, c, b]

// 선택:
//   b -> R1 (이웃 없음, 아무 색이나 가능)
//   c -> R2 (이웃 b=R1이므로 R2)
//   a -> R2 (이웃 b=R1이므로 R2)

// 결과: a=R2, b=R1, c=R2

5. 핍홀 최적화 (Peephole Optimization)

5.1 개념

핍홀 최적화는 생성된 목적 코드의 작은 윈도우(peephole)를 살펴보면서 개선하는 기법입니다. 코드 생성 후 후처리 단계에서 적용됩니다.

5.2 주요 핍홀 최적화 기법

중복 로드/저장 제거

// 최적화 전
ST a, R1
LD R1, a    // 불필요한 로드

// 최적화 후
ST a, R1
// LD 제거 (R1에 이미 a의 값이 있음)

도달 불가능 코드 제거

// 최적화 전
BR L2
ADD R1, R2, R3   // 도달 불가능
SUB R4, R5, R6   // 도달 불가능
L2:

// 최적화 후
BR L2
L2:
// 또는 BR 자체도 제거 가능 (연속된 레이블이면)

흐름 제어 최적화

// 최적화 전
BR L1
...
L1: BR L2

// 최적화 후
BR L2     // L1을 건너뛰고 직접 L2로

대수적 간소화

// 최적화 전
ADD R1, R1, #0   // x + 0 = x
MUL R2, R2, #1   // x * 1 = x
MUL R3, R3, #2   // x * 2

// 최적화 후
// ADD 제거 (항등 연산)
// MUL 제거 (항등 연산)
SHL R3, R3, #1   // 곱셈을 시프트로 대체 (강도 감소)

강도 감소 (Strength Reduction)

비용이 큰 연산을 동등하지만 비용이 작은 연산으로 교체합니다.

// 최적화 전
MUL R1, R2, #4    // 곱셈 (느림)
MUL R3, R4, #8
DIV R5, R6, #2

// 최적화 후
SHL R1, R2, #2    // 왼쪽 시프트 (빠름): x*4 = x<<2
SHL R3, R4, #3    // x*8 = x<<3
SHR R5, R6, #1    // 오른쪽 시프트: x/2 = x>>1

기계 관용구 활용

// 최적화 전
LD  R1, a
ADD R1, R1, #1
ST  a, R1

// 최적화 후 (기계가 INC 명령어를 지원하는 경우)
INC a

5.3 핍홀 최적화의 특징

지역적 최적화: 작은 윈도우(보통 2-3개 명령어)만 관찰
반복 적용: 한 번의 최적화가 새로운 최적화 기회를 만들 수 있으므로 변화가 없을 때까지 반복
구현이 간단: 패턴 매칭으로 쉽게 구현 가능
효과적: 단순하지만 코드 품질을 크게 개선

6. 코드 생성 기법 비교

기법	범위	최적성	복잡도
단순 코드 생성	기본 블록	준최적	낮음
DAG 기반 생성	기본 블록	준최적	중간
트리 기반 최적 생성	트리 수식	최적	중간
그래프 컬러링 할당	전체 함수	준최적	높음
핍홀 최적화	명령어 윈도우	지역 최적	낮음

정리

개념	설명
단순 코드 생성	레지스터/주소 디스크립터를 활용한 순차적 코드 생성
getReg	결과 저장용 레지스터를 선택하는 핵심 함수
DAG 기반 생성	공통 부분식을 자동으로 제거하며 코드 생성
Ershov 수	트리 수식의 최소 레지스터 요구량 계산
간섭 그래프	동시 활성 변수 관계를 그래프로 표현
그래프 컬러링	간섭 그래프의 k-컬러링으로 레지스터 할당
핍홀 최적화	생성된 코드의 작은 윈도우를 패턴 매칭으로 개선
강도 감소	비용이 큰 연산을 저렴한 연산으로 대체

코드 생성 알고리즘은 컴파일러 성능의 핵심입니다. 단순 알고리즘으로도 합리적인 코드를 생성할 수 있지만, DAG 기반 접근법과 그래프 컬러링 레지스터 할당을 결합하면 훨씬 효율적인 코드를 얻을 수 있습니다. 핍홀 최적화는 마지막 마무리 단계로서 간단하면서도 효과적인 코드 개선을 제공합니다. 다음 글에서는 머신 독립적인 코드 최적화 기법들을 살펴보겠습니다.

[Compiler] 12. Code Generation Algorithms and Register Allocation

Overview

In the previous article, we explored the basic concepts of code generation. This article focuses on the actual algorithms for generating code. We will cover the simple code generation algorithm, DAG-based code generation, optimal code generation, graph coloring for register allocation, and peephole optimization.

1. Simple Code Generation Algorithm

1.1 Basic Idea

The most basic code generation algorithm processes three-address instructions x = y op z one at a time to generate target code.

The key is to maintain register descriptors and address descriptors while selecting the optimal register for each operation.

1.2 Register Descriptor

A register descriptor tracks which variable's value each register currently holds.

Initial state:
  R1: empty
  R2: empty
  R3: empty

After instruction "LD R1, a":
  R1: a
  R2: empty
  R3: empty

1.3 Address Descriptor

An address descriptor tracks where each variable's current value is stored. A value can exist simultaneously in multiple locations such as registers, memory, and the stack.

Variable a: memory
Variable b: R1, memory
Variable c: R2

1.4 getReg Function

The getReg(x = y op z) function selects a register to store the result of an instruction.

Selection priority:

If y is in register R, R contains only y, and y is not used later -> select R
If there is an empty register -> select that register
If all registers are in use -> select the register holding the value that will be used furthest in the future (spill)

1.5 Code Generation Example

Let us generate code for the following basic block. Assume only two registers R1 and R2 are available.

// Three-address code
t = a - b
u = a - c
v = t + u
d = v + u

Code generation process:

Instruction: t = a - b
  getReg -> R1 (empty)
  Code: LD  R1, a      // R1 = a
        LD  R2, b      // R2 = b
        SUB R1, R1, R2 // R1 = a - b = t
  Registers: R1=t, R2=b
  Descriptors: t->R1, a->memory, b->R2/memory

Instruction: u = a - c
  getReg -> R2 (b is not needed later)
  Code: LD  R2, a      // R2 = a
        SUB R2, R2, c  // R2 = a - c = u
  Registers: R1=t, R2=u

Instruction: v = t + u
  getReg -> R1 (t is not needed later)
  Code: ADD R1, R1, R2 // R1 = t + u = v
  Registers: R1=v, R2=u

Instruction: d = v + u
  getReg -> R1 (v is not needed later)
  Code: ADD R1, R1, R2 // R1 = v + u = d
  Registers: R1=d, R2=u

// End of basic block: store live variable d to memory
  Code: ST  d, R1

Final generated code:

LD  R1, a
LD  R2, b
SUB R1, R1, R2
LD  R2, a
SUB R2, R2, c
ADD R1, R1, R2
ADD R1, R1, R2
ST  d, R1

2. DAG-Based Code Generation

Generating code from a basic block's DAG representation naturally achieves optimizations such as common subexpression elimination.

2.1 Determining Node Order in DAG

Choosing a good evaluation order for DAG nodes can minimize the number of registers needed.

Key principle: If a node is evaluated and its parent node is evaluated immediately after, there is no need to keep the value in a register for a long time.

// DAG:
//        (d: +)
//       /      \
//    (v: +)    (u: -)
//    /    \    /    \
// (t: -)  (u) a0    c0
//  /   \
// a0   b0

// Good order: t, u, v, d (consumed right after use)
// Bad order: t, v, u, d (t must be kept for a long time)

2.2 Heuristic Ordering Algorithm

Algorithm: DAG Node Ordering

1. Push root nodes (nodes with no output) onto the stack
2. While the stack is not empty:
   a. Pop node n from the top of the stack and output it
   b. Push children of n whose parents have all been output
   c. If multiple children satisfy the condition, push the leftmost child first

3. Tree-Based Optimal Code Generation

3.1 Ershov Numbers

A method for calculating the minimum number of registers needed for a tree-form expression.

Ershov number calculation rules:

1. Leaf node: label = 1 (left child) or 0 (right child in memory)
2. If the two children's labels differ: label = max(l1, l2)
3. If the two children's labels are equal: label = l1 + 1

Example:

//       (*)      label = 2
//      /   \
//    (+)    (-)   label = 1, 1
//   / \    / \
//  a   b  c   d   leaves
//
// (+): left=1, right=0 -> different, max(1,0) = 1
// (-): left=1, right=0 -> different, max(1,0) = 1
// (*): left=1, right=1 -> equal, 1+1 = 2
//
// Conclusion: 2 registers needed

3.2 Optimal Code Generation Algorithm

Algorithm: genCode(node)

if node is a leaf:
    LD Ri, node.name
else:
    left = node.leftChild
    right = node.rightChild

    if left.label >= right.label:
        genCode(left)   // result in Ri
        genCode(right)  // result in Rj
        emit("OP Ri, Ri, Rj")
    else:
        genCode(right)  // right side first
        genCode(left)
        emit("OP Ri, Rj, Ri")

Computing the side with the larger label first optimizes register usage.

4. Register Allocation by Graph Coloring

4.1 Interference Graph

An interference graph represents the simultaneous liveness relationships of variables.

Nodes: Each variable (or temporary) in the program
Edges: Two variables that are simultaneously live are connected by an edge

// Code:
// 1: a = ...
// 2: b = ...     (a live)
// 3: c = a + b   (a, b live)
// 4: d = c       (c live)
// 5: e = b + d   (b, d live)
// 6: return e

// Interference graph:
//   a --- b
//   |
//   c     d --- b
//         |
//         e

4.2 Graph Coloring

The problem of assigning variables to k registers is equivalent to the k-coloring problem on the interference graph.

Adjacent nodes (simultaneously live variables) must not be assigned the same color (register)
If k-coloring is possible, k registers are sufficient
If not possible, some variables must be spilled to memory

4.3 Simplification-Based Coloring Algorithm

Chaitin's algorithm:

Algorithm: Graph Coloring Register Allocation

1. Simplify:
   - Remove nodes with degree less than k from the graph and push onto stack
   - After removal, other nodes' degrees may decrease, so repeat

2. Spill:
   - If all nodes have degree >= k, mark one as a potential spill and remove it
   - Spill candidate selection criteria: infrequently used variables, variables outside loops

3. Select:
   - Pop nodes from the stack one at a time, adding them back to the graph
   - Assign a color (register) not used by neighbors
   - If no color is available, actual spill occurs

4. Actual Spill Handling:
   - Insert load/store code for spilled variables
   - Rebuild the interference graph and repeat the algorithm

4.4 Example

Coloring with 2 registers (R1, R2):

// Interference graph:
//   a --- b --- c
//
// Degrees: a=1, b=2, c=1

// Simplify: remove a (degree 1 < 2) -> remove c (degree 1 < 2) -> remove b (degree 0 < 2)
// Stack: [a, c, b]

// Select:
//   b -> R1 (no neighbors, any color works)
//   c -> R2 (neighbor b=R1, so R2)
//   a -> R2 (neighbor b=R1, so R2)

// Result: a=R2, b=R1, c=R2

5. Peephole Optimization

5.1 Concept

Peephole optimization is a technique that improves generated target code by examining a small window (peephole) of instructions. It is applied as a post-processing step after code generation.

5.2 Key Peephole Optimization Techniques

Redundant Load/Store Elimination

// Before optimization
ST a, R1
LD R1, a    // Redundant load

// After optimization
ST a, R1
// LD removed (R1 already contains the value of a)

Unreachable Code Elimination

// Before optimization
BR L2
ADD R1, R2, R3   // Unreachable
SUB R4, R5, R6   // Unreachable
L2:

// After optimization
BR L2
L2:
// Or the BR itself can be removed (if consecutive labels)

Control Flow Optimization

// Before optimization
BR L1
...
L1: BR L2

// After optimization
BR L2     // Skip L1 and go directly to L2

Algebraic Simplification

// Before optimization
ADD R1, R1, #0   // x + 0 = x
MUL R2, R2, #1   // x * 1 = x
MUL R3, R3, #2   // x * 2

// After optimization
// ADD removed (identity operation)
// MUL removed (identity operation)
SHL R3, R3, #1   // Replace multiplication with shift (strength reduction)

Strength Reduction

Replacing expensive operations with equivalent but cheaper operations.

// Before optimization
MUL R1, R2, #4    // Multiplication (slow)
MUL R3, R4, #8
DIV R5, R6, #2

// After optimization
SHL R1, R2, #2    // Left shift (fast): x*4 = x<<2
SHL R3, R4, #3    // x*8 = x<<3
SHR R5, R6, #1    // Right shift: x/2 = x>>1

Machine Idiom Utilization

// Before optimization
LD  R1, a
ADD R1, R1, #1
ST  a, R1

// After optimization (if machine supports INC instruction)
INC a

5.3 Characteristics of Peephole Optimization

Local optimization: Observes only a small window (usually 2-3 instructions)
Iterative application: One optimization may create new optimization opportunities, so repeat until no more changes
Simple implementation: Easily implemented through pattern matching
Effective: Simple yet significantly improves code quality

6. Comparison of Code Generation Techniques

Technique	Scope	Optimality	Complexity
Simple code generation	Basic block	Sub-optimal	Low
DAG-based generation	Basic block	Sub-optimal	Medium
Tree-based optimal generation	Tree expression	Optimal	Medium
Graph coloring allocation	Entire function	Sub-optimal	High
Peephole optimization	Instruction window	Locally optimal	Low

Summary

Concept	Description
Simple code generation	Sequential code generation using register/address descriptors
getReg	Core function for selecting a register to store results
DAG-based generation	Code generation that automatically eliminates common subexpressions
Ershov numbers	Calculates minimum register requirements for tree expressions
Interference graph	Represents simultaneously live variable relationships as a graph
Graph coloring	Register allocation via k-coloring of the interference graph
Peephole optimization	Improves generated code by pattern matching on a small window
Strength reduction	Replaces expensive operations with cheaper equivalent operations

Code generation algorithms are at the core of compiler performance. While simple algorithms can produce reasonable code, combining DAG-based approaches with graph coloring register allocation yields much more efficient code. Peephole optimization serves as a final finishing step that provides simple yet effective code improvement. In the next article, we will examine machine-independent code optimization techniques.