- Published on
[Compiler] 09. Intermediate Code Generation - Three-Address Code and Type Checking
- Authors

- Name
- Youngju Kim
- @fjvbn20031
Intermediate Code Generation - Three-Address Code and Type Checking
Intermediate code generation is the phase that converts the syntax tree into a machine-independent Intermediate Representation (IR). This article covers the main IR formats, type checking, and control flow translation.
1. Syntax Tree Variants
Abstract Syntax Tree (AST)
An AST is a concise representation with unnecessary nodes removed from the parse tree.
// a + a * (b - c) + (b - c) * d
AST:
+
/ \
+ *
/ \ / \
a * - d
/ \ / \
a - b c
/ \
b c
Directed Acyclic Graph (DAG)
A DAG shares common subexpressions in the AST.
// a + a * (b - c) + (b - c) * d
DAG:
+
/ \
+ *
/ \ / \
a * | d
/ \|
a - <- (b-c) represented only once
/ \
b c
The advantages of a DAG are:
- Automatic common subexpression detection: Shares identical operations to prevent redundant computation.
- Memory savings: Reuses identical nodes.
DAG Construction Algorithm
Node createNode(String op, Node left, Node right) {
// Check if a node with the same operation and children already exists
Node existing = lookup(op, left, right);
if (existing != null) {
return existing; // Reuse existing node
}
Node node = new Node(op, left, right);
insert(op, left, right, node); // Register in hash table
return node;
}
2. Three-Address Code
Three-address code is the most widely used intermediate representation format. Each instruction has at most one operator and at most three addresses (operands and result).
Forms of Three-Address Code
Basic form: x = y op z
Main instruction types:
x = y op z // Binary operation
x = op y // Unary operation
x = y // Copy
goto L // Unconditional branch
if x relop y goto L // Conditional branch
param x // Parameter passing
call p, n // Function call (p: function, n: arg count)
return y // Function return
x = y[i] // Array access (index)
x[i] = y // Array assignment
x = &y // Address of
x = *y // Dereference
*x = y // Indirect assignment
Example: Three-Address Code Translation of an Expression
// Original code
a = b * -c + b * -c
// Three-address code (AST based)
t1 = minus c
t2 = b * t1
t3 = minus c
t4 = b * t3
t5 = t2 + t4
a = t5
// Three-address code (DAG based - optimized)
t1 = minus c
t2 = b * t1
t5 = t2 + t2
a = t5
3. Implementation of Three-Address Code
Quadruples
Composed of 4 fields: (op, arg1, arg2, result)
No. | op | arg1 | arg2 | result
----+-------+------+------+-------
0 | minus | c | | t1
1 | * | b | t1 | t2
2 | minus | c | | t3
3 | * | b | t3 | t4
4 | + | t2 | t4 | t5
5 | = | t5 | | a
Triples
Removes the result field and references results by instruction number.
No. | op | arg1 | arg2
----+-------+------+------
0 | minus | c |
1 | * | b | (0)
2 | minus | c |
3 | * | b | (2)
4 | + | (1) | (3)
5 | = | a | (4)
Triples save memory but require updating references when instructions are moved.
Indirect Triples
Maintains a separate pointer list to triples, making instruction reordering easier.
Order list: [0, 1, 2, 3, 4, 5]
Reordered: [2, 0, 3, 1, 4, 5] // Original triples unchanged
4. Static Single Assignment (SSA) Form
In SSA (Static Single Assignment) form, each variable is defined exactly once.
// Normal three-address code
p = a + b
q = p - c
p = q * d // p defined twice
e = p + q
// SSA form
p1 = a + b
q1 = p1 - c
p2 = q1 * d // New name p2
e1 = p2 + q1
Phi Functions
Phi functions are used at points where control flow merges.
// Original code
if (flag) x = 1; else x = 2;
y = x;
// SSA form
if (flag) x1 = 1; else x2 = 2;
x3 = phi(x1, x2); // Selects based on which path was taken
y1 = x3;
Advantages of SSA
- Simplifies data flow analysis: Each variable has a unique definition.
- Facilitates optimization: Constant propagation, dead code elimination, etc., become intuitive.
- Used as the core IR in modern compilers (LLVM, GCC).
5. Type Expressions
How types are represented in a type system.
Basic Types
boolean, char, integer, float, void
Type Constructors
1. Array: array(size, element_type)
int a[10] -> array(10, integer)
2. Record/Struct: record(field_list)
struct { int x; float y; }
-> record((x, integer), (y, float))
3. Pointer: pointer(type)
int *p -> pointer(integer)
4. Function: parameter_types -> return_type
int f(float, char) -> float x char -> integer
Type Equivalence
There are two equivalence criteria:
Structural Equivalence:
- Two types are equivalent if their structures are the same
- C language's basic approach
Name Equivalence:
- Equivalent only if they have the same name
- Pascal, Ada's approach
Example:
type A = array(10, integer)
type B = array(10, integer)
Structural equivalence: A == B (same structure)
Name equivalence: A != B (different names)
6. Type Checking
SDD for Type Checking Rules
E -> num { E.type = integer }
E -> num . num { E.type = float }
E -> id { E.type = lookup(id.entry) }
E -> E1 + E2 { E.type = max(E1.type, E2.type)
if compatible(E1.type, E2.type)
else type_error }
E -> E1 [ E2 ] { E.type = if E1.type == array(s, t)
and E2.type == integer
then t
else type_error }
Type Coercion
Automatically converts operands of different types.
Type conversion hierarchy (automatic widening):
char -> int -> long -> float -> double
Example: int + float -> float + float -> float
Conversion rules (widening):
E -> E1 + E2
if E1.type == integer and E2.type == integer:
E.type = integer
else if E1.type == float and E2.type == integer:
t = new Temp()
emit(t "=" "intToFloat" E2.addr)
E.type = float
else if E1.type == integer and E2.type == float:
t = new Temp()
emit(t "=" "intToFloat" E1.addr)
E.type = float
7. Control Flow Translation
If-Else Statement Translation
// Source code
if (x < 100) {
x = x + 1;
} else {
x = x - 1;
}
// Three-address code
if x < 100 goto L1
goto L2
L1: t1 = x + 1
x = t1
goto L3
L2: t2 = x - 1
x = t2
L3: ...
While Statement Translation
// Source code
while (x < 100) {
x = x * 2;
}
// Three-address code
L1: if x < 100 goto L2
goto L3
L2: t1 = x * 2
x = t1
goto L1
L3: ...
8. Boolean Expression Translation
Short-Circuit Evaluation
Most languages use short-circuit evaluation for boolean expressions.
// a < b || c < d && e < f
Translation:
if a < b goto Ltrue
if c < d goto L1
goto Lfalse
L1: if e < f goto Ltrue
goto Lfalse
If a < b is true, the rest is not evaluated.
SDD for Boolean Expressions
B -> B1 || B2
B1.true = B.true
B1.false = newLabel()
B2.true = B.true
B2.false = B.false
B.code = B1.code || label(B1.false) || B2.code
B -> B1 && B2
B1.true = newLabel()
B1.false = B.false
B2.true = B.true
B2.false = B.false
B.code = B1.code || label(B1.true) || B2.code
B -> ! B1
B1.true = B.false
B1.false = B.true
B.code = B1.code
B -> E1 relop E2
B.code = gen("if" E1.addr relop E2.addr "goto" B.true)
|| gen("goto" B.false)
B -> true
B.code = gen("goto" B.true)
B -> false
B.code = gen("goto" B.false)
9. Backpatching
Backpatching is a technique for filling in branch target labels later when they are not yet determined.
Core Idea of Backpatching
1. Leave the target address empty when generating branch instructions
2. Fill in the empty address when the target is determined (backpatch)
Example:
100: if x < y goto ___ <- target unknown
101: goto ___ <- target unknown
...
105: x = x + 1 <- discover this is the target
After backpatching:
100: if x < y goto 105 <- filled in
101: goto ___ <- still unknown
Functions Used in Backpatching
makelist(i): Creates a list containing only instruction number i
merge(p1, p2): Merges two lists p1, p2
backpatch(p, i): Sets the target address of all instructions in list p to i
Boolean Expression Translation with Backpatching
B -> B1 || M B2
backpatch(B1.falselist, M.quad)
B.truelist = merge(B1.truelist, B2.truelist)
B.falselist = B2.falselist
B -> B1 && M B2
backpatch(B1.truelist, M.quad)
B.truelist = B2.truelist
B.falselist = merge(B1.falselist, B2.falselist)
B -> ! B1
B.truelist = B1.falselist
B.falselist = B1.truelist
M -> epsilon
M.quad = nextquad (record current instruction number)
Backpatching Example
// a < b || c < d && e < f
Generation process:
100: if a < b goto ___ (B1.truelist = {100})
101: goto ___ (B1.falselist = {101})
102: if c < d goto ___ (B2.truelist = {102})
103: goto ___ (B2.falselist = {103})
104: if e < f goto ___ (B3.truelist = {104})
105: goto ___ (B3.falselist = {105})
After backpatching:
100: if a < b goto Ltrue (truelist)
101: goto 102 (B1.false -> B2 start)
102: if c < d goto 104 (B2.true -> B3 start)
103: goto Lfalse (falselist)
104: if e < f goto Ltrue (truelist)
105: goto Lfalse (falselist)
Final truelist = {100, 104}
Final falselist = {103, 105}
10. Switch Statement Translation
// Source code
switch (x) {
case 1: stmt1; break;
case 5: stmt2; break;
default: stmt3;
}
// Three-address code
t = x
if t == 1 goto L1
if t == 5 goto L2
goto L3 // default
L1: (stmt1 code)
goto Lnext
L2: (stmt2 code)
goto Lnext
L3: (stmt3 code)
goto Lnext
Lnext: ...
When there are many cases, optimization is done using jump tables or binary search.
Summary
| Concept | Key Content |
|---|---|
| DAG | Graph sharing common subexpressions from AST |
| Three-Address Code | One operator per instruction, max 3 addresses |
| Quadruples | (op, arg1, arg2, result) format |
| Triples | Results referenced by instruction number |
| SSA | Each variable defined exactly once |
| Type Checking | Verifying type compatibility of operators/operands |
| Short-Circuit Eval. | Partial evaluation of boolean expressions |
| Backpatching | Filling in branch targets later |
Intermediate code generation is the core phase that represents the semantics of source language in a machine-independent form. In the next article, we will cover runtime environments where programs execute.