- Authors

- Name
- Youngju Kim
- @fjvbn20031
Syntax Analysis (2) - Bottom-Up Parsing and LR Parsers
Bottom-Up parsing starts from the input string and reduces it toward the start symbol. It can handle a wider range of grammars than LL parsers and is widely used in actual compilers.
1. Bottom-Up Parsing Overview
Basic Idea
Bottom-Up parsing repeatedly performs reductions to turn the input string into the start symbol.
Input: id * id + id
Reduction process (reverse of rightmost derivation):
id * id + id
F * id + id (F -> id)
T * id + id (T -> F)
T * F + id (F -> id)
T + id (T -> T * F)
E + id (E -> T)
E + F (F -> id)
E + T (T -> F)
E (E -> E + T)
This traces the rightmost derivation in reverse.
Handle
A handle is the combination of a reducible substring and the corresponding production rule. Finding and reducing the correct handle leads to the start symbol.
Sentential form: T * F + id
Handle: T * F (production: T -> T * F)
After reduction: T + id
Choosing the wrong handle makes it impossible to reach the start symbol. Handle selection is the core challenge of Bottom-Up parsing.
2. Shift-Reduce Parsing
A shift-reduce parser uses a stack and an input buffer.
Four Operations
- Shift: Pushes the next input token onto the stack.
- Reduce: Replaces the handle at the top of the stack with the nonterminal on the left side of a production.
- Accept: Parsing succeeds.
- Error: A syntax error is found.
Parsing Example: id * id + id
Stack Input Action
$ id * id + id $ Shift
$ id * id + id $ Reduce (F -> id)
$ F * id + id $ Reduce (T -> F)
$ T * id + id $ Shift
$ T * id + id $ Shift
$ T * id + id $ Reduce (F -> id)
$ T * F + id $ Reduce (T -> T * F)
$ T + id $ Reduce (E -> T)
$ E + id $ Shift
$ E + id $ Shift
$ E + id $ Reduce (F -> id)
$ E + F $ Reduce (T -> F)
$ E + T $ Reduce (E -> E + T)
$ E $ Accept!
Shift-Reduce Conflicts
Sometimes the parser cannot decide whether to shift or reduce.
- Shift-Reduce Conflict: Both shift and reduce are possible
- Reduce-Reduce Conflict: Multiple production rules can be used for reduction
Resolving these conflicts is the core of LR parsing techniques.
3. LR Parsing Overview
LR parsers are the most powerful shift-reduce parsers.
Meaning of LR(k)
- L: Scans input from left to right
- R: Produces a rightmost derivation in reverse
- (k): Makes decisions based on k lookahead tokens
Components of an LR Parser
Input: a1 a2 ... an $
+---+---+---+---+
Stack: | s0| X1| s1| X2| s2 ...
+---+---+---+---+
Parsing table:
ACTION[s, a] = shift/reduce/accept/error
GOTO[s, A] = next state
- Stack: States and grammar symbols are stacked alternately.
- ACTION table: Determines the action based on current state and input token.
- GOTO table: Determines the state to transition to after a reduction.
LR Parsing Algorithm
Push initial state s0 onto the stack
loop:
s = top state on stack
a = current input symbol
case ACTION[s, a]:
shift sn:
Push a and state sn onto the stack
Advance to next input symbol
reduce (A -> beta):
Pop 2*|beta| symbols from the stack
s' = new top state on stack
Push A and GOTO[s', A] onto the stack
accept:
Parsing success, terminate
error:
Call error handling routine
4. LR(0) Items and Automaton
LR(0) Items
An LR(0) item is a production rule with a dot added, representing the parsing progress.
Production: A -> X Y Z
Possible LR(0) items:
A -> . X Y Z (nothing seen yet)
A -> X . Y Z (X has been seen)
A -> X Y . Z (X Y have been seen)
A -> X Y Z . (ready to reduce!)
Closure Operation
Computes the closure of an item set.
closure(I):
Add all items in I to the result
If A -> alpha . B beta is in the result:
For all productions B -> gamma:
Add B -> . gamma to the result (if not already present)
Repeat until no changes
GOTO Operation
goto(I, X):
For items A -> alpha . X beta in I,
collect A -> alpha X . beta, then take their closure
LR(0) Automaton Construction Example
Grammar:
E' -> E (augmented grammar start)
E -> E + T
E -> T
T -> T * F
T -> F
F -> ( E )
F -> id
I0 = closure({E' -> . E}):
E' -> . E
E -> . E + T
E -> . T
T -> . T * F
T -> . F
F -> . ( E )
F -> . id
goto(I0, E) = I1:
E' -> E .
E -> E . + T
goto(I0, T) = I2:
E -> T .
T -> T . * F
goto(I0, F) = I3:
T -> F .
goto(I0, '(') = I4:
F -> ( . E )
E -> . E + T
E -> . T
...
goto(I0, id) = I5:
F -> id .
5. SLR Parsing Table Construction
SLR (Simple LR) parsing uses LR(0) items and FOLLOW sets.
SLR Parsing Table Construction Algorithm
1. Construct the LR(0) item set collection C for the augmented grammar
2. For item set Ii:
a) If A -> alpha . a beta is in Ii and goto(Ii, a) = Ij:
ACTION[i, a] = shift j
b) If A -> alpha . is in Ii (A != S'):
For all a in FOLLOW(A):
ACTION[i, a] = reduce (A -> alpha)
c) If S' -> S . is in Ii:
ACTION[i, $] = accept
3. If goto(Ii, A) = Ij: GOTO[i, A] = j
SLR Parsing Table Example
ACTION GOTO
State | id | + | * | ( | ) | $ | E | T | F
-----+-----+-----+-----+-----+-----+---+---+---+---
0 | s5 | | | s4 | | | 1 | 2 | 3
1 | | s6 | | | | acc| | |
2 | | r2 | s7 | | r2 | r2| | |
3 | | r4 | r4 | | r4 | r4| | |
4 | s5 | | | s4 | | | 8 | 2 | 3
5 | | r6 | r6 | | r6 | r6| | |
6 | s5 | | | s4 | | | | 9 | 3
7 | s5 | | | s4 | | | | | 10
8 | | s6 | | | s11 | | | |
9 | | r1 | s7 | | r1 | r1| | |
10 | | r3 | r3 | | r3 | r3| | |
11 | | r5 | r5 | | r5 | r5| | |
Here s5 means shift 5, r2 means reduce by production 2, and acc means accept.
6. Canonical LR(1) Items
Some grammars cannot be handled by SLR parsers. Canonical LR(1) items add a lookahead symbol for more powerful parsing.
Form of LR(1) Items
[A -> alpha . beta, a]
- A -> alpha . beta: Same as LR(0) item
- a: Lookahead symbol (consulted when reducing with this item)
LR(1) Closure
closure(I):
If [A -> alpha . B beta, a] is in I:
For all productions B -> gamma:
For all b in FIRST(beta a):
Add [B -> . gamma, b]
Pros and Cons of Canonical LR(1) Parsing
Pros: Can handle more grammars than SLR
Cons: Number of states can become very large
(tens to hundreds of times more than LR(0))
7. LALR Parsing
LALR (Look-Ahead LR) is a compromise between the power of canonical LR(1) and the efficiency of SLR.
Core Idea
Merge item sets in LR(1) that have the same core.
LR(1) states:
I4: {[C -> .c C, c], [C -> .c C, d], [C -> .d, c], [C -> .d, d]}
I7: {[C -> .c C, $], [C -> .d, $]}
Same core, so merge:
I47: {[C -> .c C, c/d/$], [C -> .d, c/d/$]}
Characteristics of LALR Parsers
- Same table size as SLR with the same number of states as canonical LR(1)
- More powerful than SLR, slightly weaker than canonical LR(1)
- Can handle most programming languages
- Yacc and Bison generate LALR parsers
Relationship Between SLR, LR(1), and LALR
LR(0) grammars < SLR grammars < LALR grammars < LR(1) grammars
Practical choices:
- Simple grammars: SLR
- Most programming languages: LALR (Yacc/Bison)
- Complex grammars: Canonical LR(1)
8. Yacc Parser Generator
Yacc (Yet Another Compiler Compiler) is a tool that automatically generates LALR(1) parsers. The GNU version, Bison, is widely used.
Yacc Program Structure
Declarations
%%
Grammar Rules and Actions
%%
Supporting C Functions
Yacc Program Example: Calculator
%{
#include <stdio.h>
#include <ctype.h>
int yylex();
void yyerror(const char *s);
%}
%token NUM
%%
line : expr '\n' { printf("Result: %d\n", $1); }
;
expr : expr '+' term { $$ = $1 + $3; }
| expr '-' term { $$ = $1 - $3; }
| term { $$ = $1; }
;
term : term '*' factor { $$ = $1 * $3; }
| term '/' factor { $$ = $1 / $3; }
| factor { $$ = $1; }
;
factor : '(' expr ')' { $$ = $2; }
| NUM { $$ = $1; }
;
%%
int yylex() {
int c = getchar();
if (isdigit(c)) {
yylval = c - '0';
while (isdigit(c = getchar()))
yylval = yylval * 10 + c - '0';
ungetc(c, stdin);
return NUM;
}
return c;
}
void yyerror(const char *s) {
fprintf(stderr, "Error: %s\n", s);
}
int main() {
return yyparse();
}
Yacc Conflict Resolution Rules
When conflicts exist in the LALR parsing table, Yacc applies the following rules:
- Shift-Reduce Conflict: Defaults to shift (resolves the dangling else problem).
- Reduce-Reduce Conflict: Reduces by the first listed rule.
Yacc Precedence Directives
%left '+' '-' // Left-associative, lower precedence
%left '*' '/' // Left-associative, higher precedence
%right UMINUS // Right-associative, unary minus
Using these directives, ambiguous grammars can be handled without conflicts.
9. Error Recovery in LR Parsers
Panic Mode Recovery
1. Pop the stack until a state that can handle A is found
2. Skip input until a token in FOLLOW(A) is found
3. Push GOTO[s, A] and continue parsing
Yacc Error Recovery: error Token
stmt : expr ';'
| error ';' { yyerrok; printf("Skipping bad statement\n"); }
;
error is a special Yacc token that automatically matches in error situations. yyerrok declares that error recovery is complete.
Summary
| Concept | Key Content |
|---|---|
| Bottom-Up Parsing | Starts from input and reduces to start symbol |
| Handle | Reducible substring and production rule combination |
| Shift-Reduce | Stack-based 4 operations (shift, reduce, accept, error) |
| LR(0) Items | Dot represents parsing progress |
| SLR | LR(0) items + FOLLOW sets |
| Canonical LR(1) | Items with lookahead, most powerful |
| LALR | Merges LR(1) states with same core, practical |
| Yacc/Bison | Automatic LALR(1) parser generation tool |
LR parsing is a powerful technique that can efficiently handle most programming languages. In the next article, we will cover syntax-directed translation, which performs meaningful translation using parsing results.