Split View: [컴파일러] 11. 코드 생성 - 기본 블록과 흐름 그래프

[컴파일러] 11. 코드 생성 - 기본 블록과 흐름 그래프

개요

컴파일러의 최종 단계인 코드 생성(Code Generation)은 중간 표현(IR)을 목적 기계의 명령어로 변환하는 과정입니다. 이 단계에서의 결정이 최종 실행 파일의 성능에 직접적인 영향을 미치기 때문에, 코드 생성기 설계는 컴파일러 구현에서 가장 중요한 부분 중 하나입니다.

이번 글에서는 코드 생성기 설계 시 고려해야 할 주요 이슈들, 목적 언어 모델, 주소 지정 방식, 기본 블록과 흐름 그래프, 그리고 DAG 표현에 대해 알아보겠습니다.

1. 코드 생성기 설계의 주요 이슈

코드 생성기를 설계할 때 다음과 같은 핵심 이슈들을 고려해야 합니다.

1.1 명령어 선택 (Instruction Selection)

중간 코드의 각 연산을 목적 기계의 명령어로 매핑하는 과정입니다. 하나의 IR 연산이 여러 가지 명령어 조합으로 구현될 수 있기 때문에, 최적의 명령어를 선택하는 것이 중요합니다.

예를 들어, 세 주소 코드 a = a + 1을 변환할 때:

// 단순 변환
LD  R0, a
ADD R0, R0, #1
ST  a, R0

// 증가 명령어가 있는 경우
INC a

목적 기계에 INC 명령어가 있다면 한 줄로 처리할 수 있지만, 단순 변환은 3개의 명령어가 필요합니다. 명령어 선택이 코드 품질에 미치는 영향이 큰 이유입니다.

1.2 레지스터 할당 (Register Allocation)

레지스터는 CPU에서 가장 빠르게 접근할 수 있는 저장소입니다. 변수를 레지스터에 할당하면 메모리 접근을 줄여 성능을 크게 향상시킬 수 있습니다.

레지스터 할당은 두 가지 하위 문제로 나뉩니다:

레지스터 할당(Register Allocation): 어떤 변수를 레지스터에 넣을지 결정
레지스터 배정(Register Assignment): 구체적으로 어떤 레지스터에 넣을지 결정

레지스터 수가 제한되어 있으므로, 동시에 사용 중인(live) 변수가 레지스터 수보다 많으면 일부 변수를 메모리로 내보내야(spill) 합니다.

1.3 평가 순서 (Evaluation Order)

연산의 수행 순서를 바꾸면 필요한 레지스터 수를 줄일 수 있습니다. 예를 들어:

// 순서 A: 레지스터 3개 필요
t1 = a + b
t2 = c + d
t3 = t1 * t2

// 순서 B: 레지스터 2개로 충분
t1 = a + b
t2 = c + d
t1 = t1 * t2

최적의 평가 순서를 찾는 것은 일반적으로 NP-완전 문제이므로, 실용적인 휴리스틱을 사용합니다.

2. 목적 언어 모델

코드 생성기는 특정 목적 기계를 대상으로 합니다. 여기서는 일반적인 레지스터 기계 모델을 사용합니다.

2.1 명령어 형식

가상의 목적 기계는 다음과 같은 명령어를 지원한다고 가정합니다:

LD  dst, addr    // 메모리에서 레지스터로 로드
ST  addr, src    // 레지스터에서 메모리로 저장
OP  dst, src1, src2  // 산술/논리 연산
BR  label        // 무조건 분기
CBR cond, label  // 조건 분기

2.2 주소 지정 모드

목적 기계는 다양한 주소 지정 모드를 제공합니다:

모드	형식	의미
절대 주소	`M`	메모리 주소 M의 값
레지스터	`R`	레지스터 R의 값
인덱스	`c(R)`	주소 c + R 내용의 값
간접 레지스터	`*R`	R이 가리키는 메모리의 값
즉치값	`#c`	상수 c 자체

3. 목적 코드의 주소 지정

프로그램의 데이터가 메모리에 배치되는 방식에 따라 주소를 생성하는 방법이 달라집니다.

3.1 정적 할당 (Static Allocation)

컴파일 시점에 모든 데이터의 주소가 결정됩니다. 전역 변수나 정적 변수에 사용됩니다.

// 정적 할당 예시
// 변수 x가 주소 100번지에 할당된 경우
LD R1, 100      // x의 값을 R1에 로드
ADD R1, R1, #1  // x + 1 계산
ST 100, R1      // 결과를 x에 저장

정적 할당의 특징:

주소가 컴파일 시 결정되므로 빠른 접근 가능
재귀 호출을 지원하지 못함 (각 호출마다 별도의 공간이 필요)
C 언어의 static 변수가 이 방식을 사용

3.2 스택 할당 (Stack Allocation)

지역 변수는 함수 호출 시 스택 프레임에 할당됩니다.

// 스택 프레임 구조
// +------------------+
// | 반환 주소         |  <- SP + 0
// | 이전 프레임 포인터 |  <- SP + 4
// | 지역 변수 a       |  <- SP + 8
// | 지역 변수 b       |  <- SP + 12
// +------------------+

// 지역 변수 a에 접근
LD R1, 8(SP)     // SP + 8에 있는 a를 로드

스택 할당은 재귀 호출을 자연스럽게 지원합니다. 각 호출마다 새로운 스택 프레임이 생성되기 때문입니다.

3.3 활성 레코드 (Activation Record)

함수가 호출되면 다음 정보를 담은 활성 레코드가 스택에 생성됩니다:

실제 매개변수(actual parameters)
반환 값(return value)
제어 링크(control link) - 호출자의 활성 레코드를 가리킴
접근 링크(access link) - 비지역 변수 접근용
기계 상태 저장(saved machine state)
지역 변수와 임시 변수

// 함수 호출 시 코드 생성 예시
// call f(a, b)

// 1. 매개변수 전달
ST  0(SP), a     // 첫 번째 인수
ST  4(SP), b     // 두 번째 인수
// 2. 반환 주소 저장
ST  8(SP), returnAddr
// 3. SP 갱신 후 분기
ADD SP, SP, #frameSize
BR  f_entry

4. 기본 블록과 흐름 그래프

4.1 기본 블록 (Basic Block)

기본 블록은 다음 조건을 만족하는 연속된 명령어 시퀀스입니다:

제어 흐름은 블록의 첫 번째 명령어로만 진입
블록의 마지막 명령어에서만 제어가 빠져나감
중간에 분기나 분기 대상이 없음

기본 블록을 구성하는 알고리즘:

알고리즘: 기본 블록 분할

입력: 세 주소 명령어 시퀀스

1단계: 리더(leader) 결정
  - 첫 번째 명령어는 리더
  - 조건/무조건 분기의 대상 명령어는 리더
  - 분기 명령어 바로 다음 명령어는 리더

2단계: 각 리더부터 다음 리더(또는 끝) 직전까지가 하나의 기본 블록

예시:

// 세 주소 코드
1: i = 1
2: j = 1
3: t1 = 10 * i
4: t2 = t1 + j
5: t3 = 8 * t2
6: a[t3] = 0.0
7: j = j + 1
8: if j <= 10 goto 3
9: i = i + 1
10: if i <= 10 goto 2
11: i = 1

리더: 1번(첫 명령어), 3번(8번의 분기 대상), 2번(10번의 분기 대상), 9번(8번 다음), 11번(10번 다음)

기본 블록:

B1: 명령어 1-2
B2: 명령어 3-8
B3: 명령어 9-10
B4: 명령어 11

4.2 흐름 그래프 (Flow Graph)

흐름 그래프는 기본 블록을 노드로, 블록 간의 제어 흐름을 간선으로 표현한 방향 그래프입니다.

     B1 (i=1, j=1)
         |
         v
  +---> B2 (t1=10*i, ..., if j<=10 goto B2)
  |      |  \
  |      |   \(j<=10)
  |      |    +---+
  |      |        |
  |      v (j>10) |
  |     B3 (i=i+1, if i<=10 goto B2)
  |      |  \
  |      |   \(i<=10)
  |      +----+
  |
  +--- (되돌아가는 간선)
         |
         v (i>10)
        B4 (i=1)

간선의 종류:

순방향 간선: B1 -> B2 (자연스러운 흐름)
조건 간선: 조건 분기에 의한 간선
후방 간선(back edge): 루프를 형성하는 간선 (B2 -> B2, B3 -> B2)

4.3 루프 검출

흐름 그래프에서 루프를 검출하는 것은 최적화에 매우 중요합니다.

**자연 루프(Natural Loop)**의 정의:

루프 헤더(header) 노드 d가 존재
후방 간선 n -> d가 존재 (d가 n을 지배)
루프 본체는 d를 거치지 않고 n에 도달할 수 있는 모든 노드와 d의 집합

루프의 중요 성질:

프로그램 실행 시간의 대부분이 루프에서 소비됨 (90/10 법칙)
루프 내부 코드의 최적화가 전체 성능에 큰 영향

5. 기본 블록의 DAG 표현

**DAG(Directed Acyclic Graph)**는 기본 블록 내의 연산들 사이의 관계를 표현하는 효과적인 방법입니다.

5.1 DAG 구성

DAG의 각 노드는 다음을 나타냅니다:

리프(leaf): 초기 값 (변수의 입력 값이나 상수)
내부 노드: 연산자와 그 피연산자를 나타내는 자식 노드
각 노드에는 해당 값을 사용하는 변수 이름이 레이블로 붙음

5.2 DAG 구성 예시

다음 세 주소 코드에 대한 DAG를 구성해 봅시다:

a = b + c
b = a - d
c = b + c
d = a - d

DAG 구성 과정:

Step 1: a = b + c
    (+) [a]
   /   \
  b0    c0

Step 2: b = a - d  (a는 (+) 노드 참조)
    (+) [a]       (-) [b]
   /   \         /   \
  b0    c0      (+)   d0
                 |
              [a 노드]

Step 3: c = b + c  (b는 (-) 노드, c는 c0)
    (+) [a]       (-) [b]      (+) [c]
   /   \         /   \        /   \
  b0    c0      (+)   d0    (-)    c0
                               [b 노드]

Step 4: d = a - d  (같은 연산이 이미 존재: (-) 노드)
    (-) 노드에 d 레이블 추가 -> (-) [b, d]

핵심 관찰: b = a - d와 d = a - d는 같은 연산이므로 DAG에서 하나의 노드로 표현됩니다. 이것이 **공통 부분식 제거(Common Subexpression Elimination)**입니다.

5.3 DAG의 활용

DAG를 통해 다음을 발견할 수 있습니다:

공통 부분식: 같은 연산이 반복되는 경우 하나로 합침
불필요한 코드: 결과가 사용되지 않는 연산 제거
상수 폴딩: 피연산자가 모두 상수인 연산을 컴파일 시점에 계산

// DAG 최적화 전
t1 = a + b
t2 = a + b    // 공통 부분식
t3 = t1 * t2

// DAG 최적화 후
t1 = a + b
t3 = t1 * t1  // t2 대신 t1 재사용

5.4 DAG에서 코드 재생성

DAG로부터 최적화된 코드를 재생성할 때, 노드의 평가 순서에 따라 필요한 레지스터 수가 달라집니다.

// DAG 구조:
//       (*)
//      /   \
//    (+)   (-)
//   / \   / \
//  a   b c   d

// 순서 1: 왼쪽 먼저
R1 = a + b   // (+) 결과를 R1에
R2 = c - d   // (-) 결과를 R2에
R1 = R1 * R2 // 최종 결과

// 순서 2: 오른쪽 먼저
R1 = c - d   // (-) 결과를 R1에
R2 = a + b   // (+) 결과를 R2에
R1 = R2 * R1 // 최종 결과

두 순서 모두 2개의 레지스터가 필요하지만, 복잡한 DAG에서는 순서에 따라 필요한 레지스터 수에 큰 차이가 날 수 있습니다.

6. 코드 생성기 입력과 출력

6.1 입력: 중간 표현

코드 생성기는 다양한 형태의 중간 표현을 입력으로 받을 수 있습니다:

세 주소 코드(Three-Address Code): x = y op z 형태
추상 구문 트리(AST): 프로그램의 구조를 트리로 표현
선형 IR: 가상 머신의 명령어 형태

6.2 출력: 목적 코드

코드 생성기의 출력은 다음 중 하나입니다:

절대 기계 코드: 고정된 메모리 위치에 로드되는 코드
재배치 가능 기계 코드: 링커에 의해 조합되는 오브젝트 코드
어셈블리 코드: 어셈블러를 한번 더 거치는 코드

대부분의 실용적인 컴파일러는 재배치 가능 기계 코드를 생성합니다.

정리

개념	설명
명령어 선택	IR 연산을 목적 기계 명령어로 매핑
레지스터 할당	변수를 제한된 레지스터에 효율적으로 배정
평가 순서	레지스터 사용을 최소화하는 연산 순서 결정
기본 블록	하나의 입구와 하나의 출구를 가진 명령어 시퀀스
흐름 그래프	기본 블록 간의 제어 흐름을 나타내는 방향 그래프
DAG 표현	기본 블록 내 연산의 의존 관계를 비순환 그래프로 표현
정적 할당	컴파일 시 주소 결정, 재귀 미지원
스택 할당	실행 시 스택 프레임에 할당, 재귀 지원

코드 생성은 컴파일러의 최종 단계로서, 앞선 모든 분석과 최적화의 결과를 실제 실행 가능한 코드로 변환합니다. 기본 블록과 흐름 그래프는 코드 생성과 최적화의 기본 단위가 되며, DAG 표현은 기본 블록 수준에서의 지역 최적화를 가능하게 합니다. 다음 글에서는 구체적인 코드 생성 알고리즘과 레지스터 할당 기법을 살펴보겠습니다.

[Compiler] 11. Code Generation - Basic Blocks and Flow Graphs

Overview

Code Generation, the final phase of the compiler, is the process of translating intermediate representation (IR) into target machine instructions. Since decisions made at this stage directly affect the performance of the final executable, code generator design is one of the most critical aspects of compiler implementation.

In this article, we will explore the major issues to consider when designing a code generator, the target language model, addressing modes, basic blocks and flow graphs, and DAG representation.

1. Key Issues in Code Generator Design

When designing a code generator, the following core issues must be considered.

1.1 Instruction Selection

This is the process of mapping each operation in the intermediate code to target machine instructions. Since a single IR operation can be implemented by various instruction combinations, selecting the optimal instructions is important.

For example, when converting the three-address code a = a + 1:

// Simple conversion
LD  R0, a
ADD R0, R0, #1
ST  a, R0

// When an increment instruction is available
INC a

If the target machine has an INC instruction, it can be handled in a single line, whereas the simple conversion requires 3 instructions. This illustrates why instruction selection has a significant impact on code quality.

1.2 Register Allocation

Registers are the fastest accessible storage in the CPU. Assigning variables to registers can significantly improve performance by reducing memory accesses.

Register Allocation: Deciding which variables to place in registers
Register Assignment: Deciding which specific register to assign them to

Since the number of registers is limited, when there are more simultaneously live variables than available registers, some variables must be spilled to memory.

1.3 Evaluation Order

Changing the order of operations can reduce the number of registers needed. For example:

// Order A: 3 registers needed
t1 = a + b
t2 = c + d
t3 = t1 * t2

// Order B: 2 registers sufficient
t1 = a + b
t2 = c + d
t1 = t1 * t2

Finding the optimal evaluation order is generally an NP-complete problem, so practical heuristics are used.

2. Target Language Model

The code generator targets a specific target machine. Here we use a general register machine model.

2.1 Instruction Format

We assume the hypothetical target machine supports the following instructions:

LD  dst, addr    // Load from memory to register
ST  addr, src    // Store from register to memory
OP  dst, src1, src2  // Arithmetic/logic operation
BR  label        // Unconditional branch
CBR cond, label  // Conditional branch

2.2 Addressing Modes

The target machine provides various addressing modes:

Mode	Format	Meaning
Absolute address	`M`	Value at memory address M
Register	`R`	Value in register R
Indexed	`c(R)`	Value at address c + contents of R
Register indirect	`*R`	Value at memory pointed to by R
Immediate	`#c`	The constant c itself

3. Target Code Addressing

The method of generating addresses differs depending on how program data is laid out in memory.

3.1 Static Allocation

All data addresses are determined at compile time. This is used for global and static variables.

// Static allocation example
// Variable x is allocated at address 100
LD R1, 100      // Load value of x into R1
ADD R1, R1, #1  // Calculate x + 1
ST 100, R1      // Store result in x

Characteristics of static allocation:

Fast access since addresses are determined at compile time
Cannot support recursive calls (separate space needed for each call)
C language static variables use this approach

3.2 Stack Allocation

Local variables are allocated in the stack frame during function calls.

// Stack frame structure
// +------------------+
// | Return address    |  <- SP + 0
// | Previous frame ptr|  <- SP + 4
// | Local variable a  |  <- SP + 8
// | Local variable b  |  <- SP + 12
// +------------------+

// Accessing local variable a
LD R1, 8(SP)     // Load a at SP + 8

Stack allocation naturally supports recursive calls because a new stack frame is created for each call.

3.3 Activation Record

When a function is called, an activation record containing the following information is created on the stack:

Actual parameters
Return value
Control link - points to the caller's activation record
Access link - for accessing non-local variables
Saved machine state
Local and temporary variables

// Code generation example for function call
// call f(a, b)

// 1. Pass parameters
ST  0(SP), a     // First argument
ST  4(SP), b     // Second argument
// 2. Save return address
ST  8(SP), returnAddr
// 3. Update SP and branch
ADD SP, SP, #frameSize
BR  f_entry

4. Basic Blocks and Flow Graphs

4.1 Basic Block

A basic block is a contiguous sequence of instructions satisfying the following conditions:

Control flow enters only at the first instruction of the block
Control exits only at the last instruction of the block
No branches or branch targets in the middle

Algorithm for constructing basic blocks:

Algorithm: Basic Block Partitioning

Input: Sequence of three-address instructions

Step 1: Determine leaders
  - The first instruction is a leader
  - Target instructions of conditional/unconditional branches are leaders
  - Instructions immediately following branch instructions are leaders

Step 2: Each leader up to (but not including) the next leader (or end) forms one basic block

Example:

// Three-address code
1: i = 1
2: j = 1
3: t1 = 10 * i
4: t2 = t1 + j
5: t3 = 8 * t2
6: a[t3] = 0.0
7: j = j + 1
8: if j <= 10 goto 3
9: i = i + 1
10: if i <= 10 goto 2
11: i = 1

Leaders: instruction 1 (first instruction), 3 (branch target of 8), 2 (branch target of 10), 9 (after 8), 11 (after 10)

Basic blocks:

B1: Instructions 1-2
B2: Instructions 3-8
B3: Instructions 9-10
B4: Instruction 11

4.2 Flow Graph

A flow graph is a directed graph that represents basic blocks as nodes and control flow between blocks as edges.

     B1 (i=1, j=1)
         |
         v
  +---> B2 (t1=10*i, ..., if j<=10 goto B2)
  |      |  \
  |      |   \(j<=10)
  |      |    +---+
  |      |        |
  |      v (j>10) |
  |     B3 (i=i+1, if i<=10 goto B2)
  |      |  \
  |      |   \(i<=10)
  |      +----+
  |
  +--- (back edge)
         |
         v (i>10)
        B4 (i=1)

Types of edges:

Forward edge: B1 -> B2 (natural flow)
Conditional edge: Edge caused by conditional branch
Back edge: Edge forming a loop (B2 -> B2, B3 -> B2)

4.3 Loop Detection

Detecting loops in the flow graph is very important for optimization.

Definition of a Natural Loop:

A loop header node d exists
A back edge n -> d exists (d dominates n)
The loop body is the set of all nodes that can reach n without going through d, plus d

Important properties of loops:

Most of program execution time is spent in loops (the 90/10 rule)
Optimization of code inside loops has a major impact on overall performance

5. DAG Representation of Basic Blocks

A DAG (Directed Acyclic Graph) is an effective way to represent the relationships between operations within a basic block.

5.1 DAG Construction

Each node in the DAG represents:

Leaf: Initial values (input values of variables or constants)
Internal node: An operator and its child nodes representing operands
Each node is labeled with variable names that use the computed value

5.2 DAG Construction Example

Let us construct a DAG for the following three-address code:

a = b + c
b = a - d
c = b + c
d = a - d

DAG construction process:

Step 1: a = b + c
    (+) [a]
   /   \
  b0    c0

Step 2: b = a - d  (a references the (+) node)
    (+) [a]       (-) [b]
   /   \         /   \
  b0    c0      (+)   d0
                 |
              [a node]

Step 3: c = b + c  (b is the (-) node, c is c0)
    (+) [a]       (-) [b]      (+) [c]
   /   \         /   \        /   \
  b0    c0      (+)   d0    (-)    c0
                               [b node]

Step 4: d = a - d  (same operation already exists: (-) node)
    Add d label to (-) node -> (-) [b, d]

Key observation: b = a - d and d = a - d are the same operation, so they are represented as a single node in the DAG. This is Common Subexpression Elimination.

5.3 Applications of DAG

Through the DAG, the following can be discovered:

Common subexpressions: Merge repeated identical operations into one
Dead code: Remove operations whose results are never used
Constant folding: Compute operations with all-constant operands at compile time

// Before DAG optimization
t1 = a + b
t2 = a + b    // Common subexpression
t3 = t1 * t2

// After DAG optimization
t1 = a + b
t3 = t1 * t1  // Reuse t1 instead of t2

5.4 Code Regeneration from DAG

When regenerating optimized code from the DAG, the number of registers needed varies depending on the evaluation order of nodes.

// DAG structure:
//       (*)
//      /   \
//    (+)   (-)
//   / \   / \
//  a   b c   d

// Order 1: Left first
R1 = a + b   // (+) result in R1
R2 = c - d   // (-) result in R2
R1 = R1 * R2 // Final result

// Order 2: Right first
R1 = c - d   // (-) result in R1
R2 = a + b   // (+) result in R2
R1 = R2 * R1 // Final result

Both orders require 2 registers, but for complex DAGs, the order can make a significant difference in the number of registers needed.

6. Code Generator Input and Output

6.1 Input: Intermediate Representation

The code generator can accept various forms of intermediate representation as input:

Three-Address Code: Form x = y op z
Abstract Syntax Tree (AST): Represents program structure as a tree
Linear IR: In the form of virtual machine instructions

6.2 Output: Target Code

The output of the code generator is one of the following:

Absolute machine code: Code loaded at fixed memory locations
Relocatable machine code: Object code combined by the linker
Assembly code: Code that goes through an assembler

Most practical compilers generate relocatable machine code.

Summary

Concept	Description
Instruction selection	Mapping IR operations to target machine instructions
Register allocation	Efficiently assigning variables to limited registers
Evaluation order	Determining operation order to minimize register usage
Basic block	Instruction sequence with one entry and one exit
Flow graph	Directed graph representing control flow between basic blocks
DAG representation	Representing dependency relations of operations in a basic block as an acyclic graph
Static allocation	Address determined at compile time, no recursion support
Stack allocation	Allocated in stack frame at runtime, supports recursion

Code generation, as the final phase of the compiler, translates all preceding analysis and optimization results into actual executable code. Basic blocks and flow graphs serve as the fundamental units for code generation and optimization, and DAG representation enables local optimization at the basic block level. In the next article, we will examine specific code generation algorithms and register allocation techniques.