- Authors

- Name
- Youngju Kim
- @fjvbn20031
Building a Simple Syntax-Directed Translator
In this article, we will build a simple translator to understand the core principles of compilers. Starting from grammar definition, we cover parse trees and the basics of syntax-directed translation.
1. Syntax Definition
The syntax of a programming language is defined using Context-Free Grammar (CFG).
Components of a Grammar
A grammar consists of four elements:
- Terminal: Also called tokens, these are the basic units of the language (e.g.,
+,*,id,num). - Nonterminal: Variables that represent syntactic constructs (e.g.,
expr,stmt). - Production: Defines how a nonterminal is composed.
- Start Symbol: The nonterminal that represents the entire program.
Grammar Example: Arithmetic Expressions
expr -> expr + term | expr - term | term
term -> term * factor | term / factor | factor
factor -> ( expr ) | num | id
This grammar expresses that multiplication/division has higher precedence than addition/subtraction.
2. Derivation
Derivation is the process of starting from the start symbol and repeatedly applying production rules to generate a string.
Example: Derivation of 9 - 5 + 2
Grammar:
list -> list + digit | list - digit | digit
digit -> 0 | 1 | 2 | ... | 9
Leftmost Derivation:
list
=> list + digit
=> list - digit + digit
=> digit - digit + digit
=> 9 - digit + digit
=> 9 - 5 + digit
=> 9 - 5 + 2
Types of Derivation
- Leftmost Derivation: Always replaces the leftmost nonterminal first.
- Rightmost Derivation: Always replaces the rightmost nonterminal first.
3. Parse Tree
A parse tree represents the derivation process in tree form.
Let us examine the parse tree for 9 - 5 + 2:
list
/ | \
/ | \
list '+' digit
/ | \ |
/ | \ 2
list '-' digit
| |
digit 5
|
9
The characteristics of a parse tree are:
- Root: Start symbol
- Internal nodes: Nonterminals (application of production rules)
- Leaf nodes: Terminals (actual tokens)
- Reading the leaf nodes from left to right yields the original input.
4. Ambiguity
If two or more parse trees exist for a single string, the grammar is said to be ambiguous.
Example of an Ambiguous Grammar
string -> string + string | string - string | 0 | 1 | ... | 9
Parsing 9 - 5 + 2 with this grammar allows two interpretations:
Interpretation 1: (9 - 5) + 2 = 6 Interpretation 2: 9 - (5 + 2) = 2
string string
/ | \ / | \
/ | \ / | \
string + string string - string
/ | \ | | / | \
/ | \ 2 9 / | \
string - string string + string
| | | |
9 5 5 2
Resolving Ambiguity
Ambiguity is resolved by reflecting operator precedence and associativity in the grammar.
// Ambiguous grammar
expr -> expr + expr | expr * expr | num
// Disambiguated grammar (multiplication first, left-associative)
expr -> expr + term | term
term -> term * factor | factor
factor -> num
5. Operator Precedence and Associativity
Precedence
Expressed through the hierarchical structure of the grammar. Lower precedence operators are placed higher in the tree.
Precedence (low -> high):
|| (logical OR)
&& (logical AND)
== !=
< > <= >=
+ -
* / %
! - (unary)
Associativity
Defines the order in which operators of the same precedence are combined when they appear consecutively.
Left-Associative:
a - b - c = (a - b) - c
Grammar: expr -> expr - term | term
Right-Associative:
a = b = c is a = (b = c)
Grammar: right -> letter = right | letter
Most binary operators are left-associative, while the assignment operator (=) and exponentiation operator (**) are right-associative.
6. Syntax-Directed Translation Basics
Syntax-directed translation is a technique that attaches semantic actions to each production rule of the grammar, performing translation simultaneously with parsing.
Attributes
Values associated with grammar symbols are called attributes. There are two types:
- Synthesized Attribute: Computed from the attributes of child nodes.
- Inherited Attribute: Computed from the attributes of parent or sibling nodes.
Syntax-Directed Definition
Links semantic rules to each production rule.
Production Semantic Rule
----------------------------------------------
expr -> expr1 + term expr.val = expr1.val + term.val
expr -> term expr.val = term.val
term -> num term.val = num.lexval
For example, evaluating 9 + 5 yields:
expr (val = 14)
/ | \
/ | \
expr + term (val = 9, val = 5)
| |
term num
| (5)
num
(9)
7. Postfix Notation
Postfix notation (Reverse Polish Notation) is a representation where the operator comes after its operands.
Infix Notation Postfix Notation
-------------------------------------------
9 - 5 + 2 9 5 - 2 +
9 - (5 + 2) 9 5 2 + -
a * b + c a b * c +
a + b * c a b c * +
The advantages of postfix notation are:
- No parentheses needed: The order of operations is embedded in the notation itself.
- Easy evaluation with a stack.
Postfix Notation Evaluation Algorithm
Input: 9 5 - 2 +
Stack changes:
Read 9 -> [9]
Read 5 -> [9, 5]
Read - -> [4] (9 - 5 = 4)
Read 2 -> [4, 2]
Read + -> [6] (4 + 2 = 6)
Result: 6
8. Syntax-Directed Definition for Infix-to-Postfix Translation
Let us write a syntax-directed definition that translates infix expressions to postfix expressions.
Production Semantic Action
----------------------------------------------
expr -> expr1 + term print('+')
expr -> expr1 - term print('-')
expr -> term (none)
term -> 0 print('0')
term -> 1 print('1')
...
term -> 9 print('9')
Applying this translation scheme to 9 - 5 + 2 outputs 9 5 - 2 +.
9. Implementing a Simple Translator
Let us implement the above translator in Java using the recursive descent approach.
public class PostfixTranslator {
private String input;
private int pos;
private char lookahead;
public PostfixTranslator(String input) {
this.input = input;
this.pos = 0;
this.lookahead = input.charAt(0);
}
// expr -> term rest
// rest -> + term { print('+') } rest
// | - term { print('-') } rest
// | (epsilon)
void expr() {
term();
while (true) {
if (lookahead == '+') {
match('+');
term();
System.out.print('+');
} else if (lookahead == '-') {
match('-');
term();
System.out.print('-');
} else {
return;
}
}
}
// term -> 0 { print('0') } | 1 { print('1') } | ... | 9 { print('9') }
void term() {
if (Character.isDigit(lookahead)) {
System.out.print(lookahead);
match(lookahead);
} else {
throw new RuntimeException("Syntax error");
}
}
void match(char expected) {
if (lookahead == expected) {
pos++;
if (pos < input.length()) {
lookahead = input.charAt(pos);
} else {
lookahead = '\0'; // End of input
}
} else {
throw new RuntimeException("Expected " + expected);
}
}
public static void main(String[] args) {
PostfixTranslator translator = new PostfixTranslator("9-5+2");
translator.expr();
// Output: 95-2+
}
}
Execution Trace
Input: 9-5+2
expr() called
term() -> prints '9', lookahead = '-'
lookahead == '-':
match('-'), lookahead = '5'
term() -> prints '5', lookahead = '+'
not '+', so prints '-'
lookahead == '+':
match('+'), lookahead = '2'
term() -> prints '2', lookahead = '\0'
prints '+'
lookahead == '\0': return
Final output: 95-2+
10. Constructing Abstract Syntax Trees (AST)
Instead of postfix notation, a translator can also generate an Abstract Syntax Tree (AST).
An AST is a simplified version of the parse tree, removing unnecessary intermediate nodes.
Parse Tree: AST:
expr +
/ | \ / \
expr + term - 2
/ | \ | / \
expr - term num:2 9 5
| |
term num:5
|
num:9
AST Node Implementation
abstract class ASTNode {
abstract int eval();
}
class NumNode extends ASTNode {
int value;
NumNode(int value) { this.value = value; }
int eval() { return value; }
}
class BinOpNode extends ASTNode {
char op;
ASTNode left, right;
BinOpNode(char op, ASTNode left, ASTNode right) {
this.op = op;
this.left = left;
this.right = right;
}
int eval() {
switch (op) {
case '+': return left.eval() + right.eval();
case '-': return left.eval() - right.eval();
case '*': return left.eval() * right.eval();
default: throw new RuntimeException("Unknown op: " + op);
}
}
}
Summary
| Concept | Key Content |
|---|---|
| Context-Free Grammar | Composed of terminals, nonterminals, productions, start symbol |
| Derivation | Generating strings by applying production rules from the start symbol |
| Parse Tree | Tree representation of the derivation process |
| Ambiguity | Two or more parse trees exist for a single string |
| Precedence/Associativity | Expressed through grammar hierarchy and recursion direction |
| Syntax-Directed Translation | Attaching semantic actions to production rules for translation |
| Postfix Notation | Notation where operators follow their operands |
| Synthesized Attribute | Attributes computed from child nodes |
The simple translator built in this article is a miniature that demonstrates the core principles of compilers. In the next article, we will take a deeper look at the lexical analysis phase.