Skip to content

필사 모드: [Compiler] 03. Building a Simple Syntax-Directed Translator

English
0%
정확도 0%
💡 왼쪽 원문을 읽으면서 오른쪽에 따라 써보세요. Tab 키로 힌트를 받을 수 있습니다.
원문 렌더가 준비되기 전까지 텍스트 가이드로 표시합니다.

Building a Simple Syntax-Directed Translator

In this article, we will build a simple translator to understand the core principles of compilers. Starting from grammar definition, we cover parse trees and the basics of syntax-directed translation.

1. Syntax Definition

The syntax of a programming language is defined using **Context-Free Grammar (CFG)**.

Components of a Grammar

A grammar consists of four elements:

1. **Terminal**: Also called tokens, these are the basic units of the language (e.g., `+`, `*`, `id`, `num`).

2. **Nonterminal**: Variables that represent syntactic constructs (e.g., `expr`, `stmt`).

3. **Production**: Defines how a nonterminal is composed.

4. **Start Symbol**: The nonterminal that represents the entire program.

Grammar Example: Arithmetic Expressions

expr -> expr + term | expr - term | term

term -> term * factor | term / factor | factor

factor -> ( expr ) | num | id

This grammar expresses that multiplication/division has **higher precedence** than addition/subtraction.

2. Derivation

Derivation is the process of starting from the start symbol and repeatedly applying production rules to generate a string.

Example: Derivation of `9 - 5 + 2`

Grammar:

list -> list + digit | list - digit | digit

digit -> 0 | 1 | 2 | ... | 9

Leftmost Derivation:

list

=> list + digit

=> list - digit + digit

=> digit - digit + digit

=> 9 - digit + digit

=> 9 - 5 + digit

=> 9 - 5 + 2

Types of Derivation

- **Leftmost Derivation**: Always replaces the leftmost nonterminal first.

- **Rightmost Derivation**: Always replaces the rightmost nonterminal first.

3. Parse Tree

A parse tree represents the derivation process in tree form.

Let us examine the parse tree for `9 - 5 + 2`:

list

/ | \

/ | \

list '+' digit

/ | \ |

/ | \ 2

list '-' digit

| |

digit 5

|

9

The characteristics of a parse tree are:

- **Root**: Start symbol

- **Internal nodes**: Nonterminals (application of production rules)

- **Leaf nodes**: Terminals (actual tokens)

- Reading the leaf nodes from left to right yields the original input.

4. Ambiguity

If **two or more parse trees** exist for a single string, the grammar is said to be **ambiguous**.

Example of an Ambiguous Grammar

string -> string + string | string - string | 0 | 1 | ... | 9

Parsing `9 - 5 + 2` with this grammar allows two interpretations:

Interpretation 1: (9 - 5) + 2 = 6 Interpretation 2: 9 - (5 + 2) = 2

string string

/ | \ / | \

/ | \ / | \

string + string string - string

/ | \ | | / | \

/ | \ 2 9 / | \

string - string string + string

| | | |

9 5 5 2

Resolving Ambiguity

Ambiguity is resolved by reflecting **operator precedence** and **associativity** in the grammar.

// Ambiguous grammar

expr -> expr + expr | expr * expr | num

// Disambiguated grammar (multiplication first, left-associative)

expr -> expr + term | term

term -> term * factor | factor

factor -> num

5. Operator Precedence and Associativity

Precedence

Expressed through the hierarchical structure of the grammar. Lower precedence operators are placed higher in the tree.

Precedence (low -> high):

|| (logical OR)

&& (logical AND)

== !=

< > <= >=

+ -

* / %

! - (unary)

Associativity

Defines the order in which operators of the same precedence are combined when they appear consecutively.

Left-Associative:

a - b - c = (a - b) - c

Grammar: expr -> expr - term | term

Right-Associative:

a = b = c is a = (b = c)

Grammar: right -> letter = right | letter

Most binary operators are left-associative, while the assignment operator (`=`) and exponentiation operator (`**`) are right-associative.

6. Syntax-Directed Translation Basics

Syntax-directed translation is a technique that attaches **semantic actions** to each production rule of the grammar, performing translation simultaneously with parsing.

Attributes

Values associated with grammar symbols are called **attributes**. There are two types:

- **Synthesized Attribute**: Computed from the attributes of child nodes.

- **Inherited Attribute**: Computed from the attributes of parent or sibling nodes.

Syntax-Directed Definition

Links **semantic rules** to each production rule.

Production Semantic Rule

----------------------------------------------

expr -> expr1 + term expr.val = expr1.val + term.val

expr -> term expr.val = term.val

term -> num term.val = num.lexval

For example, evaluating `9 + 5` yields:

expr (val = 14)

/ | \

/ | \

expr + term (val = 9, val = 5)

| |

term num

| (5)

num

(9)

7. Postfix Notation

**Postfix notation** (Reverse Polish Notation) is a representation where the operator comes after its operands.

Infix Notation Postfix Notation

-------------------------------------------

9 - 5 + 2 9 5 - 2 +

9 - (5 + 2) 9 5 2 + -

a * b + c a b * c +

a + b * c a b c * +

The advantages of postfix notation are:

- **No parentheses needed**: The order of operations is embedded in the notation itself.

- **Easy evaluation with a stack**.

Postfix Notation Evaluation Algorithm

Input: 9 5 - 2 +

Stack changes:

Read 9 -> [9]

Read 5 -> [9, 5]

Read - -> [4] (9 - 5 = 4)

Read 2 -> [4, 2]

Read + -> [6] (4 + 2 = 6)

Result: 6

8. Syntax-Directed Definition for Infix-to-Postfix Translation

Let us write a syntax-directed definition that translates infix expressions to postfix expressions.

Production Semantic Action

----------------------------------------------

expr -> expr1 + term print('+')

expr -> expr1 - term print('-')

expr -> term (none)

term -> 0 print('0')

term -> 1 print('1')

...

term -> 9 print('9')

Applying this translation scheme to `9 - 5 + 2` outputs `9 5 - 2 +`.

9. Implementing a Simple Translator

Let us implement the above translator in Java using the recursive descent approach.

public class PostfixTranslator {

private String input;

private int pos;

private char lookahead;

public PostfixTranslator(String input) {

this.input = input;

this.pos = 0;

this.lookahead = input.charAt(0);

}

// expr -> term rest

// rest -> + term { print('+') } rest

// | - term { print('-') } rest

// | (epsilon)

void expr() {

term();

while (true) {

if (lookahead == '+') {

match('+');

term();

System.out.print('+');

} else if (lookahead == '-') {

match('-');

term();

System.out.print('-');

} else {

return;

}

}

}

// term -> 0 { print('0') } | 1 { print('1') } | ... | 9 { print('9') }

void term() {

if (Character.isDigit(lookahead)) {

System.out.print(lookahead);

match(lookahead);

} else {

throw new RuntimeException("Syntax error");

}

}

void match(char expected) {

if (lookahead == expected) {

pos++;

if (pos < input.length()) {

lookahead = input.charAt(pos);

} else {

lookahead = '\0'; // End of input

}

} else {

throw new RuntimeException("Expected " + expected);

}

}

public static void main(String[] args) {

PostfixTranslator translator = new PostfixTranslator("9-5+2");

translator.expr();

// Output: 95-2+

}

}

Execution Trace

Input: 9-5+2

expr() called

term() -> prints '9', lookahead = '-'

lookahead == '-':

match('-'), lookahead = '5'

term() -> prints '5', lookahead = '+'

not '+', so prints '-'

lookahead == '+':

match('+'), lookahead = '2'

term() -> prints '2', lookahead = '\0'

prints '+'

lookahead == '\0': return

Final output: 95-2+

10. Constructing Abstract Syntax Trees (AST)

Instead of postfix notation, a translator can also generate an **Abstract Syntax Tree (AST)**.

An AST is a simplified version of the parse tree, removing unnecessary intermediate nodes.

Parse Tree: AST:

expr +

/ | \ / \

expr + term - 2

/ | \ | / \

expr - term num:2 9 5

| |

term num:5

|

num:9

AST Node Implementation

abstract class ASTNode {

abstract int eval();

}

class NumNode extends ASTNode {

int value;

NumNode(int value) { this.value = value; }

int eval() { return value; }

}

class BinOpNode extends ASTNode {

char op;

ASTNode left, right;

BinOpNode(char op, ASTNode left, ASTNode right) {

this.op = op;

this.left = left;

this.right = right;

}

int eval() {

switch (op) {

case '+': return left.eval() + right.eval();

case '-': return left.eval() - right.eval();

case '*': return left.eval() * right.eval();

default: throw new RuntimeException("Unknown op: " + op);

}

}

}

Summary

| Concept | Key Content |

| --------------------------- | --------------------------------------------------------------------- |

| Context-Free Grammar | Composed of terminals, nonterminals, productions, start symbol |

| Derivation | Generating strings by applying production rules from the start symbol |

| Parse Tree | Tree representation of the derivation process |

| Ambiguity | Two or more parse trees exist for a single string |

| Precedence/Associativity | Expressed through grammar hierarchy and recursion direction |

| Syntax-Directed Translation | Attaching semantic actions to production rules for translation |

| Postfix Notation | Notation where operators follow their operands |

| Synthesized Attribute | Attributes computed from child nodes |

The simple translator built in this article is a miniature that demonstrates the core principles of compilers. In the next article, we will take a deeper look at the lexical analysis phase.

현재 단락 (1/247)

In this article, we will build a simple translator to understand the core principles of compilers. S...

작성 글자: 0원문 글자: 7,625작성 단락: 0/247