Skip to content

필사 모드: [Compiler] 09. Intermediate Code Generation - Three-Address Code and Type Checking

English
0%
정확도 0%
💡 왼쪽 원문을 읽으면서 오른쪽에 따라 써보세요. Tab 키로 힌트를 받을 수 있습니다.
원문 렌더가 준비되기 전까지 텍스트 가이드로 표시합니다.

Intermediate Code Generation - Three-Address Code and Type Checking

Intermediate code generation is the phase that converts the syntax tree into a machine-independent **Intermediate Representation (IR)**. This article covers the main IR formats, type checking, and control flow translation.

1. Syntax Tree Variants

Abstract Syntax Tree (AST)

An AST is a concise representation with unnecessary nodes removed from the parse tree.

// a + a * (b - c) + (b - c) * d

AST:

+

/ \

+ *

/ \ / \

a * - d

/ \ / \

a - b c

/ \

b c

Directed Acyclic Graph (DAG)

A DAG shares **common subexpressions** in the AST.

// a + a * (b - c) + (b - c) * d

DAG:

+

/ \

+ *

/ \ / \

a * | d

/ \|

a - <- (b-c) represented only once

/ \

b c

The advantages of a DAG are:

- **Automatic common subexpression detection**: Shares identical operations to prevent redundant computation.

- **Memory savings**: Reuses identical nodes.

DAG Construction Algorithm

Node createNode(String op, Node left, Node right) {

// Check if a node with the same operation and children already exists

Node existing = lookup(op, left, right);

if (existing != null) {

return existing; // Reuse existing node

}

Node node = new Node(op, left, right);

insert(op, left, right, node); // Register in hash table

return node;

}

2. Three-Address Code

Three-address code is the most widely used intermediate representation format. Each instruction has at most **one operator** and **at most three addresses** (operands and result).

Forms of Three-Address Code

Basic form: x = y op z

Main instruction types:

x = y op z // Binary operation

x = op y // Unary operation

x = y // Copy

goto L // Unconditional branch

if x relop y goto L // Conditional branch

param x // Parameter passing

call p, n // Function call (p: function, n: arg count)

return y // Function return

x = y[i] // Array access (index)

x[i] = y // Array assignment

x = &y // Address of

x = *y // Dereference

*x = y // Indirect assignment

Example: Three-Address Code Translation of an Expression

// Original code

a = b * -c + b * -c

// Three-address code (AST based)

t1 = minus c

t2 = b * t1

t3 = minus c

t4 = b * t3

t5 = t2 + t4

a = t5

// Three-address code (DAG based - optimized)

t1 = minus c

t2 = b * t1

t5 = t2 + t2

a = t5

3. Implementation of Three-Address Code

Quadruples

Composed of 4 fields: `(op, arg1, arg2, result)`

No. | op | arg1 | arg2 | result

----+-------+------+------+-------

0 | minus | c | | t1

1 | * | b | t1 | t2

2 | minus | c | | t3

3 | * | b | t3 | t4

4 | + | t2 | t4 | t5

5 | = | t5 | | a

Triples

Removes the result field and references results by instruction number.

No. | op | arg1 | arg2

----+-------+------+------

0 | minus | c |

1 | * | b | (0)

2 | minus | c |

3 | * | b | (2)

4 | + | (1) | (3)

5 | = | a | (4)

Triples save memory but require updating references when instructions are moved.

Indirect Triples

Maintains a separate pointer list to triples, making instruction reordering easier.

Order list: [0, 1, 2, 3, 4, 5]

Reordered: [2, 0, 3, 1, 4, 5] // Original triples unchanged

4. Static Single Assignment (SSA) Form

In **SSA (Static Single Assignment)** form, each variable is **defined exactly once**.

// Normal three-address code

p = a + b

q = p - c

p = q * d // p defined twice

e = p + q

// SSA form

p1 = a + b

q1 = p1 - c

p2 = q1 * d // New name p2

e1 = p2 + q1

Phi Functions

**Phi functions** are used at points where control flow merges.

// Original code

if (flag) x = 1; else x = 2;

y = x;

// SSA form

if (flag) x1 = 1; else x2 = 2;

x3 = phi(x1, x2); // Selects based on which path was taken

y1 = x3;

Advantages of SSA

- **Simplifies data flow analysis**: Each variable has a unique definition.

- **Facilitates optimization**: Constant propagation, dead code elimination, etc., become intuitive.

- Used as the core IR in **modern compilers (LLVM, GCC)**.

5. Type Expressions

How types are represented in a type system.

Basic Types

boolean, char, integer, float, void

Type Constructors

1. Array: array(size, element_type)

int a[10] -> array(10, integer)

2. Record/Struct: record(field_list)

struct { int x; float y; }

-> record((x, integer), (y, float))

3. Pointer: pointer(type)

int *p -> pointer(integer)

4. Function: parameter_types -> return_type

int f(float, char) -> float x char -> integer

Type Equivalence

There are two equivalence criteria:

Structural Equivalence:

- Two types are equivalent if their structures are the same

- C language's basic approach

Name Equivalence:

- Equivalent only if they have the same name

- Pascal, Ada's approach

Example:

type A = array(10, integer)

type B = array(10, integer)

Structural equivalence: A == B (same structure)

Name equivalence: A != B (different names)

6. Type Checking

SDD for Type Checking Rules

E -> num { E.type = integer }

E -> num . num { E.type = float }

E -> id { E.type = lookup(id.entry) }

E -> E1 + E2 { E.type = max(E1.type, E2.type)

if compatible(E1.type, E2.type)

else type_error }

E -> E1 [ E2 ] { E.type = if E1.type == array(s, t)

and E2.type == integer

then t

else type_error }

Type Coercion

Automatically converts operands of different types.

Type conversion hierarchy (automatic widening):

char -> int -> long -> float -> double

Example: int + float -> float + float -> float

Conversion rules (widening):

E -> E1 + E2

if E1.type == integer and E2.type == integer:

E.type = integer

else if E1.type == float and E2.type == integer:

t = new Temp()

emit(t "=" "intToFloat" E2.addr)

E.type = float

else if E1.type == integer and E2.type == float:

t = new Temp()

emit(t "=" "intToFloat" E1.addr)

E.type = float

7. Control Flow Translation

If-Else Statement Translation

// Source code

if (x < 100) {

x = x + 1;

} else {

x = x - 1;

}

// Three-address code

if x < 100 goto L1

goto L2

L1: t1 = x + 1

x = t1

goto L3

L2: t2 = x - 1

x = t2

L3: ...

While Statement Translation

// Source code

while (x < 100) {

x = x * 2;

}

// Three-address code

L1: if x < 100 goto L2

goto L3

L2: t1 = x * 2

x = t1

goto L1

L3: ...

8. Boolean Expression Translation

Short-Circuit Evaluation

Most languages use **short-circuit evaluation** for boolean expressions.

// a < b || c < d && e < f

Translation:

if a < b goto Ltrue

if c < d goto L1

goto Lfalse

L1: if e < f goto Ltrue

goto Lfalse

If `a < b` is true, the rest is not evaluated.

SDD for Boolean Expressions

B -> B1 || B2

B1.true = B.true

B1.false = newLabel()

B2.true = B.true

B2.false = B.false

B.code = B1.code || label(B1.false) || B2.code

B -> B1 && B2

B1.true = newLabel()

B1.false = B.false

B2.true = B.true

B2.false = B.false

B.code = B1.code || label(B1.true) || B2.code

B -> ! B1

B1.true = B.false

B1.false = B.true

B.code = B1.code

B -> E1 relop E2

B.code = gen("if" E1.addr relop E2.addr "goto" B.true)

|| gen("goto" B.false)

B -> true

B.code = gen("goto" B.true)

B -> false

B.code = gen("goto" B.false)

9. Backpatching

**Backpatching** is a technique for filling in branch target labels later when they are not yet determined.

Core Idea of Backpatching

1. Leave the target address empty when generating branch instructions

2. Fill in the empty address when the target is determined (backpatch)

Example:

100: if x < y goto ___ <- target unknown

101: goto ___ <- target unknown

...

105: x = x + 1 <- discover this is the target

After backpatching:

100: if x < y goto 105 <- filled in

101: goto ___ <- still unknown

Functions Used in Backpatching

makelist(i): Creates a list containing only instruction number i

merge(p1, p2): Merges two lists p1, p2

backpatch(p, i): Sets the target address of all instructions in list p to i

Boolean Expression Translation with Backpatching

B -> B1 || M B2

backpatch(B1.falselist, M.quad)

B.truelist = merge(B1.truelist, B2.truelist)

B.falselist = B2.falselist

B -> B1 && M B2

backpatch(B1.truelist, M.quad)

B.truelist = B2.truelist

B.falselist = merge(B1.falselist, B2.falselist)

B -> ! B1

B.truelist = B1.falselist

B.falselist = B1.truelist

M -> epsilon

M.quad = nextquad (record current instruction number)

Backpatching Example

// a < b || c < d && e < f

Generation process:

100: if a < b goto ___ (B1.truelist = {100})

101: goto ___ (B1.falselist = {101})

102: if c < d goto ___ (B2.truelist = {102})

103: goto ___ (B2.falselist = {103})

104: if e < f goto ___ (B3.truelist = {104})

105: goto ___ (B3.falselist = {105})

After backpatching:

100: if a < b goto Ltrue (truelist)

101: goto 102 (B1.false -> B2 start)

102: if c < d goto 104 (B2.true -> B3 start)

103: goto Lfalse (falselist)

104: if e < f goto Ltrue (truelist)

105: goto Lfalse (falselist)

Final truelist = {100, 104}

Final falselist = {103, 105}

10. Switch Statement Translation

// Source code

switch (x) {

case 1: stmt1; break;

case 5: stmt2; break;

default: stmt3;

}

// Three-address code

t = x

if t == 1 goto L1

if t == 5 goto L2

goto L3 // default

L1: (stmt1 code)

goto Lnext

L2: (stmt2 code)

goto Lnext

L3: (stmt3 code)

goto Lnext

Lnext: ...

When there are many cases, optimization is done using **jump tables** or **binary search**.

Summary

| Concept | Key Content |

| ------------------- | -------------------------------------------------- |

| DAG | Graph sharing common subexpressions from AST |

| Three-Address Code | One operator per instruction, max 3 addresses |

| Quadruples | (op, arg1, arg2, result) format |

| Triples | Results referenced by instruction number |

| SSA | Each variable defined exactly once |

| Type Checking | Verifying type compatibility of operators/operands |

| Short-Circuit Eval. | Partial evaluation of boolean expressions |

| Backpatching | Filling in branch targets later |

Intermediate code generation is the core phase that represents the semantics of source language in a machine-independent form. In the next article, we will cover runtime environments where programs execute.

현재 단락 (1/293)

Intermediate code generation is the phase that converts the syntax tree into a machine-independent *...

작성 글자: 0원문 글자: 8,175작성 단락: 0/293