1
Compilation 0368-3133
Lecture 4
Syntax Analysis
Noam Rinetzky
Zijian Xu, Hong Chen, Song-Chun Zhu and Jiebo Luo, "A hierarchical compositional model for face representation and sketching," IEEE Trans. Pattern Analysis and Machine Intelligence(PAMI)'08.
2
Where are we?
[Figure: compiler pipeline]
Source text (txt) → Process text input → characters → Lexical Analysis → tokens → Syntax Analysis → AST → Sem. Analysis → Annotated AST → Intermediate code generation → IR → Intermediate code optimization → IR → Code generation → Symbolic Instructions (SI) → Target code optimization → SI → Machine code generation → MI → Write executable output → Executable code (exe)
Marked stages: Lexical Analysis, Syntax Analysis
From scanning to parsing
3
Program text: ((23 + 7) * x)
characters (program text) → Lexical Analyzer → token stream → Parser → Abstract Syntax Tree (valid) or syntax error
Token stream: LP LP Num OP Num RP OP Id RP
Regular language: Id → ‘a’ | ... | ‘z’
Context free grammar: Exp → ... | Exp + Exp | Id
Abstract Syntax Tree: Op(*) with children Op(+) and Id(x); Op(+) has children Num(23) and Num(7)
4
Broad kinds of parsers
• Top-Down parsers
 – Construct parse tree in a top-down manner
 – Find the leftmost derivation
• Bottom-Up parsers
 – Construct parse tree in a bottom-up manner
 – Find the rightmost derivation in a reverse order
• Parsers for arbitrary grammars
 – Earley’s method, CYK method
 – Usually not used in practice (though this might change)
5
6
Context free grammars (CFGs)
G = (V,T,P,S)
• V – non terminals (syntactic variables)
• T – terminals (tokens)
• P – derivation rules
 – Each rule of the form V → (T ∪ V)*
• S – start symbol
7
Derivations
• Show that a sentence ω is in the language of a grammar G by repeatedly applying production rules
• Sentence αNβ
• Rule N → µ
• Derived sentence: αNβ ⇒ αµβ
 – µ1 ⇒* µ2 if µ1 ⇒ … ⇒ µ2
8
Leftmost Derivation
Input: x := z ; y := x + z
Grammar:
S → S ; S
S → id := E
E → id | E + E | E * E | ( E )
Derivation (rule applied at each step):
S
⇒ S ; S                        (S → S ; S)
⇒ id := E ; S                  (S → id := E)
⇒ id := id ; S                 (E → id)
⇒ id := id ; id := E           (S → id := E)
⇒ id := id ; id := E + E       (E → E + E)
⇒ id := id ; id := id + E      (E → id)
⇒ id := id ; id := id + id     (E → id)
9
Rightmost Derivation
Token stream: <id,"x"> ASS <id,"z"> ; <id,"y"> ASS <id,"x"> PLUS <id,"z">
Grammar:
S → S ; S
S → id := E | …
E → id | E + E | E * E | …
Derivation (rule applied at each step):
S
⇒ S ; S                        (S → S ; S)
⇒ S ; id := E                  (S → id := E)
⇒ S ; id := E + E              (E → E + E)
⇒ S ; id := E + id             (E → id)
⇒ S ; id := id + id            (E → id)
⇒ id := E ; id := id + id      (S → id := E)
⇒ id := id ; id := id + id     (E → id)
10
Parse tree
Input: x := z ; y := x + z
Derivation:
S
S ; S
id := E ; S
id := id ; S
id := id ; id := E
id := id ; id := E + E
id := id ; id := E + id
id := id ; id := id + id
Parse tree (children indented under their parent):
S
  S
    id
    :=
    E
      id
  ;
  S
    id
    :=
    E
      E
        id
      +
      E
        id
11
Ambiguity
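Input: x := y+z*w
Grammar:
S → S ; S
S → id := E | …
E → id | E + E | E * E | …
Two parse trees exist for the same input:
Tree 1: S → id := E with E → E + E at the top; the left E is id and the right E is E * E (both id). This groups the input as y + (z * w).
Tree 2: S → id := E with E → E * E at the top; the left E is E + E (both id) and the right E is id. This groups the input as (y + z) * w.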
12
Top-down parsing
• Begin with Start symbol
• Apply production rules
• Until desired word is derived
13
Top-down parsing
• Begin with Start symbol
• Apply production rules
• Until desired word is derived
• Can be implemented using recursion
14
Recursive descent parsing
• Define a function for every nonterminal
• Every function works as follows
 – Find applicable production rule
 – Terminal function checks match with next input token
 – Nonterminal function calls (recursively) other functions
15
Recursive descent parsing
• Define a function for every nonterminal
• Every function works as follows
 – Find applicable production rule
 – Terminal function checks match with next input token
 – Nonterminal function calls (recursively) other functions
• If there are several applicable productions …
16
Top-down parsing
17
Recursive descent parsing with lookahead
• Define a function for every nonterminal
• Every function works as follows
 – Find applicable production rule
 – Terminal function checks match with next input token
 – Nonterminal function calls (recursively) other functions
• If there are several applicable productions, decide based on the next unmatched token
18
Predictive parsing
• Recursive descent
• LL(k) grammars
19
A predictive (recursive descent) parser
Grammar:
E → LIT | ( E OP E ) | not E
LIT → true | false
OP → and | or | xor

E() {
  if (current ∈ {TRUE, FALSE}) LIT();
  else if (current == LPAREN) { match(LPAREN); E(); OP(); E(); match(RPAREN); }
  else if (current == NOT) { match(NOT); E(); }
  else error();
}
LIT() {
  if (current == TRUE) match(TRUE);
  else if (current == FALSE) match(FALSE);
  else error();
}
OP() {
  if (current == AND) match(AND);
  else if (current == OR) match(OR);
  else if (current == XOR) match(XOR);
  else error();
}
match(token t) {
  if (current == t) current = next_token();
  else error();
}
Reminder: Variable current holds the current input token
Note: TRUE = token for “true” etc.
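A runnable version of the predictive parser above can be sketched in Python as follows. This is only a minimal recognizer under stated assumptions: tokens are plain strings already produced by a lexer, "$" is used as an end-of-input marker, and the names Parser, ParseError and accepts are made up for this illustration.

```python
class ParseError(Exception):
    """Raised when the current token does not fit any applicable rule."""


class Parser:
    def __init__(self, tokens):
        # "$" is an assumed end-of-input marker appended after the real tokens.
        self.tokens = list(tokens) + ["$"]
        self.pos = 0

    @property
    def current(self):
        return self.tokens[self.pos]

    def match(self, t):
        # Consume the current token if it is the expected one.
        if self.current != t:
            raise ParseError(f"expected {t}, found {self.current}")
        self.pos += 1

    def E(self):
        if self.current in ("true", "false"):   # E -> LIT
            self.LIT()
        elif self.current == "(":               # E -> ( E OP E )
            self.match("(")
            self.E()
            self.OP()
            self.E()
            self.match(")")
        elif self.current == "not":             # E -> not E
            self.match("not")
            self.E()
        else:
            raise ParseError(f"unexpected {self.current}")

    def LIT(self):
        if self.current in ("true", "false"):   # LIT -> true | false
            self.match(self.current)
        else:
            raise ParseError(f"unexpected {self.current}")

    def OP(self):
        if self.current in ("and", "or", "xor"):  # OP -> and | or | xor
            self.match(self.current)
        else:
            raise ParseError(f"unexpected {self.current}")


def accepts(tokens):
    """Return True iff the whole token list is derivable from E."""
    parser = Parser(tokens)
    try:
        parser.E()
        parser.match("$")
        return True
    except ParseError:
        return False
```

Each branch is chosen by looking only at the current token, which is exactly the one-token lookahead decision the slide describes.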
20
What we want: Lookaheads!
E() {
  if (current ∈ {TRUE, FALSE}) LIT();
  else if (current == LPAREN) { match(LPAREN); E(); OP(); E(); match(RPAREN); }
  else if (current == NOT) { match(NOT); E(); }
  else error();
}
E ⟶ LIT | ( E OP E ) | not E
LIT ⟶ true | false
OP ⟶ and | or | xor
Note: TRUE = token for “true” etc.
21
Why we want it: Prediction Table!
• Given
 – Non terminal X
 – Derivation rules X ⟶ α1 | … | αk
 – Terminal t
• T[X,t] = αi if we should apply rule αi
Prediction table
Remember the colors
22
How we get it?
• First FIRST • Then FOLLOW
– (then FIRST again …)
23
FIRST Sets
• FIRST(µ): The set of first tokens in words in L(µ)
E ⟶ LIT | ( E OP E ) | not E
• FIRST( LIT ) = { TRUE, FALSE }
• FIRST( ( E OP E ) ) = { LPAREN }
• FIRST( not E ) = { NOT }
• X ⟶ α1 | … | αk
• FIRST(X) = FIRST(α1) ∪ … ∪ FIRST(αk) ∪ { ℇ } if αi ⇒* ℇ for some αi
24
FIRST Sets
• FIRST(X) = { t | X ⇒* t β } ∪ { ℇ | X ⇒* ℇ }
 – All terminals t that can appear first in some derivation for X
 – Plus ℇ if it can be derived from X
• If for every α and β such that X ⟶ … α | … | β … we have FIRST(α) ∩ FIRST(β) = {}, then we can always predict (choose) which rule to apply based on the next token
25
FIRST Sets
• FIRST(X) = { t | X ⇒* t β } ∪ { ℇ | X ⇒* ℇ }
 – All terminals t that can appear first in some derivation for X
 – Plus ℇ if it can be derived from X
• If for every α and β such that X ⟶ … α | … | β … we have FIRST(α) ∩ FIRST(β) = {}, then we can always predict (choose) which rule to apply based on the next token
X ⟶ … α | … | β …, input = a…
a ∈ FIRST(α) ⇒ use X ⟶ α
26
Computing FIRST sets
• FIRST(t) = { t }
 – t is a terminal (a one-symbol sentential form)
• ℇ ∈ FIRST(X) if
 – X ⟶ ℇ, or
 – X ⟶ A1…Ak and ℇ ∈ FIRST(Ai) for i = 1..k
27constraints
Computing FIRST sets (take I)
• Assume no null productions X ⟶ … | ℇ | …
• Observation
If X ⟶ … | tα | … | Nβ
then { t } ⊆ FIRST(X) and FIRST(N) ⊆ FIRST(X)
Compute FIRST by solving the constraint system
If we know the (minimal) solution to this constraint system then we have FIRST
If we know FIRST() then we have the solution to this constraint system
28
Fixed-point algorithm for computing FIRST sets
• Assume no null productions Xi ⟶ … | ℇ | …
Say we have m non-terminals
Initialization
  FIRST(Xi) = { t | Xi ⟶ tβ for some β }, i = 1..m
Body
  do
    for every Xi ⟶ Xkβ
      FIRST(Xi) = FIRST(Xi) ∪ FIRST(Xk)
  until FIRST(Xi) does not change for any Xi
29
FIRST sets constraints example
STMT → if EXPR then STMT | while EXPR do STMT | EXPR ;
EXPR → TERM -> id | zero? TERM | not EXPR | ++ id | -- id
TERM → id | constant
Initialization:
  F(STMT) = {if, while}
  F(EXPR) = {zero?, not, ++, --}
  F(TERM) = {id, constant}
do
  F’(STMT) = F(STMT); F’(EXPR) = F(EXPR); F’(TERM) = F(TERM);
  F(STMT) = F(STMT) ∪ F(EXPR);
  F(EXPR) = F(EXPR) ∪ F(TERM);
until (F’(STMT) == F(STMT) && F’(EXPR) == F(EXPR) && F’(TERM) == F(TERM));
*F = FIRST
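The fixed-point computation for this example can be sketched in Python as below. The encoding is an assumption of this sketch: a grammar is a dict from nonterminal to a list of alternatives, each alternative a list of symbols, and the "->" inside the first EXPR alternative is treated as an ordinary token of the language. Null productions are assumed absent, as on the slide.

```python
def first_sets(grammar):
    """Least solution of the FIRST constraints for a grammar without null productions."""
    nonterminals = set(grammar)
    first = {x: set() for x in nonterminals}

    # Initialization: FIRST(X) contains every terminal t with X -> t beta.
    for x, alternatives in grammar.items():
        for alt in alternatives:
            if alt[0] not in nonterminals:
                first[x].add(alt[0])

    # Body: for every X -> N beta, add FIRST(N) to FIRST(X) until nothing changes.
    changed = True
    while changed:
        changed = False
        for x, alternatives in grammar.items():
            for alt in alternatives:
                head = alt[0]
                if head in nonterminals and not first[head] <= first[x]:
                    first[x] |= first[head]
                    changed = True
    return first


GRAMMAR = {
    "STMT": [["if", "EXPR", "then", "STMT"],
             ["while", "EXPR", "do", "STMT"],
             ["EXPR", ";"]],
    "EXPR": [["TERM", "->", "id"], ["zero?", "TERM"], ["not", "EXPR"],
             ["++", "id"], ["--", "id"]],
    "TERM": [["id"], ["constant"]],
}

FIRST = first_sets(GRAMMAR)
```

Because sets only grow and there are finitely many tokens, the loop must terminate; the result is the minimal solution of the constraint system.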
30
FIRST sets computation example
STMT → if EXPR then STMT | while EXPR do STMT | EXPR ;
EXPR → TERM -> id | zero? TERM | not EXPR | ++ id | -- id
TERM → id | constant
Table columns (initially empty): TERM | EXPR | STMT
31
1. Initialization
STMT → if EXPR then STMT | while EXPR do STMT | EXPR ;
EXPR → TERM -> id | zero? TERM | not EXPR | ++ id | -- id
TERM → id | constant
F(TERM) = { id, constant }
F(EXPR) = { zero?, not, ++, -- }
F(STMT) = { if, while }
32
2. F(STMT) = F(STMT) ∪ F(EXPR)
STMT → if EXPR then STMT | while EXPR do STMT | EXPR ;
EXPR → TERM -> id | zero? TERM | not EXPR | ++ id | -- id
TERM → id | constant
F(TERM) = { id, constant }
F(EXPR) = { zero?, not, ++, -- }
F(STMT) = { if, while, zero?, not, ++, -- }
33
3. F(EXPR) = F(EXPR) ∪ F(TERM)
STMT → if EXPR then STMT | while EXPR do STMT | EXPR ;
EXPR → TERM -> id | zero? TERM | not EXPR | ++ id | -- id
TERM → id | constant
F(TERM) = { id, constant }
F(EXPR) = { zero?, not, ++, --, id, constant }
F(STMT) = { if, while, zero?, not, ++, -- }
34
4. F(STMT) = F(STMT) ∪ F(EXPR)
STMT → if EXPR then STMT | while EXPR do STMT | EXPR ;
EXPR → TERM -> id | zero? TERM | not EXPR | ++ id | -- id
TERM → id | constant
F(TERM) = { id, constant }
F(EXPR) = { zero?, not, ++, --, id, constant }
F(STMT) = { if, while, zero?, not, ++, --, id, constant }
35
5. We reached a fixed-point
STMT → if EXPR then STMT | while EXPR do STMT | EXPR ;
EXPR → TERM -> id | zero? TERM | not EXPR | ++ id | -- id
TERM → id | constant
F(TERM) = { id, constant }
F(EXPR) = { zero?, not, ++, --, id, constant }
F(STMT) = { if, while, zero?, not, ++, --, id, constant }
36
Fixed-point algorithm for computing FIRST sets
• What to do with null productions?
X → Y a | Z b
Y → ℇ
Z → ℇ
• Say input = “a”, which rule to use?
• a ∉ FIRST(Y), a ∉ FIRST(Z)
Use what comes after Y/Z
37constraint
Computing FIRST sets (take II)
• Observation
If X ⟶ A1 … Ak N α | … and ℇ ∈ FIRST(A1), …, ℇ ∈ FIRST(Ak)
then FIRST(N) \ { ℇ } ⊆ FIRST(X)
[Figure: A1 … Ak ⇒* ℇ, input a…]
Use what comes after A1..Ak to predict which production rule of X to use
38
FOLLOW sets
• FOLLOW(N) = the set of tokens that can immediately follow the non-terminal N in some sentential form
If S ⇒* αNtβ then t ∈ FOLLOW(N)
p. 189
39
FOLLOW sets
• FOLLOW(N) = the set of tokens that can immediately follow the non-terminal N in some sentential form
If S ⇒* αNtβ then t ∈ FOLLOW(N)
• FOLLOW(t) = the set … terminal t … form
If αNtβ ⇒* α’tqβ’ then q ∈ FOLLOW(t)
p. 189
40
FOLLOW sets: Constraints
• $ ∈ FOLLOW(S), where $ marks the end of input and S is the start symbol
• If X ⟶ α N β then FIRST(β) – { ℇ } ⊆ FOLLOW(N)
• If X ⟶ α N β and ℇ ∈ FIRST(β) then FOLLOW(X) ⊆ FOLLOW(N)
Compute FIRST and FOLLOW by solving the extended constraint system
41
Example: FOLLOW sets
• E → T X
• X → + E | ℇ
• T → ( E ) | int Y
• Y → * T | ℇ
Terminal      FOLLOW
+             int, (
(             int, (
*             int, (
)             +, ), $
int           *, ), +, $
Non-terminal  FOLLOW
E             ), $
T             +, ), $
X             $, )
Y             +, ), $
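The FOLLOW sets of the non-terminals in this example can be recomputed with the sketch below, which solves the FIRST and FOLLOW constraints by iteration. The encoding is an assumption of the sketch: an empty list stands for an ℇ alternative, the string "eps" stands for ℇ inside FIRST sets, and the function names are invented here. FOLLOW of terminals, which the slide also tabulates, is not computed.

```python
EPS = "eps"  # stand-in for the empty word inside FIRST sets (assumed name)

GRAMMAR = {
    "E": [["T", "X"]],
    "X": [["+", "E"], []],            # [] encodes the null production
    "T": [["(", "E", ")"], ["int", "Y"]],
    "Y": [["*", "T"], []],
}


def first_of_seq(seq, first, grammar):
    """FIRST of a symbol sequence; contains EPS iff every symbol can derive the empty word."""
    out = set()
    for sym in seq:
        f = first[sym] if sym in grammar else {sym}
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)
    return out


def compute_first(grammar):
    first = {n: set() for n in grammar}
    changed = True
    while changed:
        changed = False
        for n, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_seq(alt, first, grammar)
                if not new <= first[n]:
                    first[n] |= new
                    changed = True
    return first


def compute_follow(grammar, start, first):
    follow = {n: set() for n in grammar}
    follow[start].add("$")                       # $ is in FOLLOW(S)
    changed = True
    while changed:
        changed = False
        for x, alternatives in grammar.items():
            for alt in alternatives:
                for i, sym in enumerate(alt):
                    if sym not in grammar:
                        continue                 # only non-terminals are tracked here
                    beta = first_of_seq(alt[i + 1:], first, grammar)
                    new = beta - {EPS}           # FIRST(beta) - {eps} is in FOLLOW(N)
                    if EPS in beta:
                        new |= follow[x]         # FOLLOW(X) is in FOLLOW(N)
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FIRST = compute_first(GRAMMAR)
FOLLOW = compute_follow(GRAMMAR, "E", FIRST)
```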
42
Prediction Table
• A ⟶ α
• T[A,t] = α if t ∈ FIRST(α)
• T[A,t] = α if ℇ ∈ FIRST(α) and t ∈ FOLLOW(A)
 – t can also be $
• T is not well defined ⇒ the grammar is not LL(1)
43
LL(k) grammars
• A grammar is in class LL(1) (the k = 1 case) iff for every two productions A → α and A → β
 – FIRST(α) ∩ FIRST(β) = {}
   • In particular, α ⇒* ℇ and β ⇒* ℇ is not possible
 – If β ⇒* ℇ then FIRST(α) ∩ FOLLOW(A) = {}
44
Problem: Non LL(k) grammars
45
LL(k) grammars
• An LL(k) grammar G can be parsed via:
 – Top-down derivation
 – Scanning the input from left to right (L)
 – Producing the leftmost derivation (L)
 – With lookahead of k tokens (k)
 – G is not ambiguous
 – G is not left-recursive
• A language is said to be LL(k) when it has an LL(k) grammar
46
Non LL grammar: Common prefix
term → ID | indexed_elem
indexed_elem → ID [ expr ]
• FIRST(term) = { ID }
• FIRST(indexed_elem) = { ID }
• FIRST/FIRST conflict
47
Solution: left factoring
• Rewrite the grammar to be in LL(1)
Before:
term → ID | indexed_elem
indexed_elem → ID [ expr ]
After:
term → ID after_ID
after_ID → [ expr ] | ℇ
Intuition: just like factoring x*y + x*z into x*(y+z)
48
Before:
S → if E then S else S | if E then S | T
After:
S → if E then S S’ | T
S’ → else S | ℇ
Left factoring – another example
49
Non LL grammar: Problematic null productions
S → X a b
X → a | ℇ
• FIRST(S) = { a }, FOLLOW(S) = { $ }
• FIRST(X) = { a, ℇ }, FOLLOW(X) = { a }
• FIRST/FOLLOW conflict
T[X,a] = a if a ∈ FIRST(a)
T[X,a] = ℇ if ℇ ∈ FIRST(ℇ) and a ∈ FOLLOW(X)
t can also be $
T is not well defined ⇒ the grammar is not LL(1)
50
Solution: substitution
S → A a b
A → a | ℇ
Substitute A in S:
S → a a b | a b
Left factoring:
S → a after_A
after_A → a b | b
51
Non LL grammar: Left-recursion
• Left recursion cannot be handled with a bounded lookahead
• What can we do?
E → E - term | term
52
Solution: Left recursion removal
G1: N → Nα | β
G2: N → βN’
    N’ → αN’ | ℇ
• L(G1) = β, βα, βαα, βααα, …
• L(G2) = same
p. 130
Can be done algorithmically.
Problem: grammar becomes mangled beyond recognition
53
Solution: Left recursion removal
G1: N → Nα | β
G2: N → βN’
    N’ → αN’ | ℇ
• L(G1) = β, βα, βαα, βααα, …
• L(G2) = same
Example:
E → E - term | term
becomes
E → term TE | term
TE → - term TE | ℇ
p. 130
Can be done algorithmically.
Problem: grammar becomes mangled beyond recognition
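The G1 to G2 rewriting for a single non-terminal can be sketched as below. This is a minimal illustration that only handles immediate left recursion (N → Nα) and assumes α is never empty; the function name and the list-of-symbols encoding (with [] for ℇ) are choices made for this sketch.

```python
def remove_immediate_left_recursion(n, alternatives, fresh):
    """Rewrite N -> N a1 | ... | b1 | ... as N -> b N' and N' -> a N' | eps.

    n            -- the non-terminal being rewritten
    alternatives -- its alternatives, each a list of symbols
    fresh        -- name to use for the new non-terminal N'
    """
    # alpha parts: what follows N in the left-recursive alternatives
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == n]
    # beta parts: the alternatives that do not start with N
    others = [alt for alt in alternatives if not alt or alt[0] != n]
    if not recursive:
        return {n: alternatives}          # nothing to do
    return {
        n: [beta + [fresh] for beta in others],
        fresh: [alpha + [fresh] for alpha in recursive] + [[]],   # [] is eps
    }


result = remove_immediate_left_recursion(
    "E", [["E", "-", "term"], ["term"]], fresh="TE")
```

For the expression grammar this yields E → term TE and TE → - term TE | ℇ, which is the schematic G2 form.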
54
LL(k) Parsers
• Recursive Descent
 – Manual construction
 – Uses recursion
• Wanted
 – A parser that can be generated automatically
 – Does not use recursion
55
LL(k) parsing via PDA
The pushdown automaton uses
• Prediction stack
• Input stream
• Transition table
 – nonterminals x tokens -> production alternative
 – Entries indexed by nonterminal N and token t
   • Entry contains the alternative of N that must be predicted when current input starts with t
56
LL(k) parsing via PDA: Moves
• Prediction: top(prediction stack) = N
 – Pop N
 – If table[N, current] = α, push α to prediction stack, otherwise syntax error
• Match: top(prediction stack) = t
 – If (t == current) pop prediction stack, otherwise syntax error
57
LL(k) parsing via PDA: Termination
• Parsing terminates when prediction stack is empty
 – If input is empty at that point, success, otherwise syntax error
58
Example transition table
(1) E → LIT
(2) E → ( E OP E )
(3) E → not E
(4) LIT → true
(5) LIT → false
(6) OP → and
(7) OP → or
(8) OP → xor
Rows: nonterminals. Columns: input tokens. Entry: which rule should be used.
       (    )    not   true  false  and   or    xor   $
E      2         3     1     1
LIT                    4     5
OP                                  6     7     8
59
Model of non-recursive predictive parser
[Figure: the Predictive Parsing program reads the input a + b $, keeps a stack holding X Y Z $ (X on top), consults the Parsing Table, and produces the Output]
60
Running parser example
Grammar: A → aAb | c
Input: aacbb$
Parsing table:
      a          b     c
A     A → aAb          A → c
Input suffix | Stack content | Move
aacbb$ | A$ | predict(A,a) = A → aAb
aacbb$ | aAb$ | match(a,a)
acbb$ | Ab$ | predict(A,a) = A → aAb
acbb$ | aAbb$ | match(a,a)
cbb$ | Abb$ | predict(A,c) = A → c
cbb$ | cbb$ | match(c,c)
bb$ | bb$ | match(b,b)
b$ | b$ | match(b,b)
$ | $ | match($,$) – success
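The run above can be reproduced with a small table-driven parser. The sketch below is one possible encoding, not a definitive implementation: the table is a dict keyed by (nonterminal, token), "$" is pushed at the bottom of the prediction stack and appended to the input, and the function name ll1_parse is invented for the example.

```python
TABLE = {
    ("A", "a"): ["a", "A", "b"],   # predict(A,a) = A -> aAb
    ("A", "c"): ["c"],             # predict(A,c) = A -> c
}
NONTERMINALS = {"A"}


def ll1_parse(tokens, table, start, nonterminals):
    """Return the list of moves, or raise SyntaxError on illegal input."""
    stack = ["$", start]            # top of the prediction stack is the list's end
    inp = list(tokens) + ["$"]
    pos = 0
    moves = []
    while stack:
        top = stack.pop()
        cur = inp[pos]
        if top in nonterminals:
            # Prediction move: replace N by the alternative chosen by the table.
            rhs = table.get((top, cur))
            if rhs is None:
                raise SyntaxError(f"predict({top},{cur}) = ERROR")
            moves.append(f"predict({top},{cur})")
            stack.extend(reversed(rhs))
        else:
            # Match move: the predicted terminal must equal the current token.
            if top != cur:
                raise SyntaxError(f"expected {top}, found {cur}")
            moves.append(f"match({top},{cur})")
            pos += 1
    return moves


MOVES = ll1_parse("aacbb", TABLE, "A", NONTERMINALS)
```

Calling ll1_parse("abcbb", TABLE, "A", NONTERMINALS) raises SyntaxError at predict(A,b), since the table has no entry for that pair.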
61
Errors
62
Handling Syntax Errors
• Report and locate the error
• Diagnose the error
• Correct the error
• Recover from the error in order to discover more errors
 – without reporting too many “strange” errors
63
Error Diagnosis
• Line number – may be far from the actual error
• The current token
• The expected tokens
• Parser configuration
64
Error Recovery
• Becomes less important in interactive environments
• Example heuristics:
 – Search for a semicolon and ignore the statement
 – Try to “replace” tokens for common errors
 – Refrain from reporting 3 subsequent errors
• Globally optimal solutions
 – For every input w, find a valid program w’ with a “minimal-distance” from w
65
Illegal input example
Grammar: A → aAb | c
Input: abcbb$
Parsing table:
      a          b     c
A     A → aAb          A → c
Input suffix | Stack content | Move
abcbb$ | A$ | predict(A,a) = A → aAb
abcbb$ | aAb$ | match(a,a)
bcbb$ | Ab$ | predict(A,b) = ERROR
66
Error handling in LL parsers
Grammar: S → a c | b S
Input: c$
Parsing table:
      a          b          c
S     S → a c    S → b S
Input suffix | Stack content | Move
c$ | S$ | predict(S,c) = ERROR
• Now what?
 – Predict b S anyway: “missing token b inserted in line XXX”
67
Error handling in LL parsers
Grammar: S → a c | b S
Input: c$
Parsing table:
      a          b          c
S     S → a c    S → b S
Input suffix | Stack content | Move
bc$ | S$ | predict(S,b) = S → b S
bc$ | bS$ | match(b,b)
c$ | S$ | Looks familiar?
• Result: infinite loop
68
Error handling and recovery
• x = a * (p+q * ( -b * (r-s);
• Where should we report the error?
• The valid prefix property
69
The Valid Prefix Property
• For every prefix of tokens
 – t1, t2, …, ti that the parser identifies as legal:
   • there exist tokens ti+1, ti+2, …, tn such that t1, t2, …, tn is a syntactically valid program
• If every token is considered as a single character:
 – For every prefix word u that the parser identifies as legal there exists w such that u.w is a valid program
70
Recovery is tricky
• Heuristics for dropping tokens, skipping to semicolon, etc.
71
Building the Parse Tree
72
Adding semantic actions
• Can add an action to perform on each production rule
• Can build the parse tree
 – Every function returns an object of type Node
 – Every Node maintains a list of children
 – Function calls can add new children
73
Building the parse tree
Node E() {
  result = new Node();
  result.name = “E”;
  if (current ∈ {TRUE, FALSE})        // E → LIT
    result.addChild(LIT());
  else if (current == LPAREN) {       // E → ( E OP E )
    result.addChild(match(LPAREN));
    result.addChild(E());
    result.addChild(OP());
    result.addChild(E());
    result.addChild(match(RPAREN));
  }
  else if (current == NOT) {          // E → not E
    result.addChild(match(NOT));
    result.addChild(E());
  }
  else error();
  return result;
}
static int Parse_Expression(Expression **expr_p) {
    Expression *expr = *expr_p = new_expression();
    /* try to parse a digit */
    if (Token.class == DIGIT) {
        expr->type = 'D'; expr->value = Token.repr - '0';
        get_next_token();
        return 1;
    }
    /* try to parse a parenthesized expression */
    if (Token.class == '(') {
        expr->type = 'P'; get_next_token();
        if (!Parse_Expression(&expr->left)) Error("missing expression");
        if (!Parse_Operator(&expr->oper)) Error("missing operator");
        /* the right operand comes before the closing parenthesis (field name assumed) */
        if (!Parse_Expression(&expr->right)) Error("missing expression");
        if (Token.class != ')') Error("missing )");
        get_next_token();
        return 1;
    }
    return 0;
}
74
Parser for Fully Parenthesized Expressions
75
Bottom-up parsing
76
Intuition: Bottom-Up Parsing
• Begin with the user's program
• Guess parse (sub)trees
• Check if root is the start symbol
77
+ * 321
Bottom-up parsing
Unambiguous grammar
E → E * T
E → T
T → T + F
T → F
F → id
F → num
F → ( E )
78
+ * 321
F
Bottom-up parsing
Unambiguous grammar
E → E * T
E → T
T → T + F
T → F
F → id
F → num
F → ( E )
79
Bottom-up parsing
Unambiguous grammar
E → E * T
E → T
T → T + F
T → F
F → id
F → num
F → ( E )
+ * 321
F F
T
F
T
80
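Top-Down vs Bottom-Up
Grammar: A → aAb | c
Input: aacbb$
• Top-down (predict match/scan-complete)
[Figure: the input is split into an already-read part and a to-be-read part; the parse tree grows from the root downward: A, then A → a A b, then the inner A → a A b, then A → c]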
81
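Top-Down vs Bottom-Up
Grammar: A → aAb | c
Input: aacbb$
• Top-down (predict match/scan-complete)
• Bottom-up (shift reduce)
[Figure: the input is split into an already-read part and a to-be-read part. Top-down: the tree grows from the root A downward through A → a A b, A → a A b, A → c. Bottom-up: the tree grows from the leaves upward: c is reduced to A, then a A b is reduced to A, then the outer a A b is reduced to A]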
82
Bottom-up parsing: LR(k) Grammars
• A grammar is in the class LR(k) when it can be parsed via:
 – Bottom-up derivation
 – Scanning the input from left to right (L)
 – Producing the rightmost derivation (R)
 – With lookahead of k tokens (k)
83
Bottom-up parsing: LR(k) Grammars
• A language is said to be LR(k) if it has an LR(k) grammar
• The simplest case is LR(0), which we will discuss
84
Terminology: Reductions & Handles
• The opposite of derivation is called reduction
 – Let A → α be a production rule
 – Derivation: βAµ ⇒ βαµ
 – Reduction: βαµ is reduced to βAµ
• A handle is the reduced substring
 – α is the handle for βαµ
85
Goal: Reduce the Input to the Start Symbol
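Grammar:
E → E * B | E + B | B
B → 0 | 1
Example (each line is obtained from the previous one by a single reduction):
0 + 0 * 1
B + 0 * 1
E + 0 * 1
E + B * 1
E * 1
E * B
E
Go over the input so far, and upon seeing a right-hand side of a rule, “invoke” the rule and replace the right-hand side with the left-hand side (reduce)
[Figure: resulting parse tree: E → E * B with B → 1; the inner E → E + B with B → 0; the innermost E → B with B → 0]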
86
Use Shift & Reduce
In each stage, we shift a symbol from the input to the stack, or reduce according to one of the rules.
87
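Use Shift & Reduce
In each stage, we shift a symbol from the input to the stack, or reduce according to one of the rules.
Grammar:
E → E * B | E + B | B
B → 0 | 1
Example: “0+0*1”
Stack | Input | Action
(empty) | 0+0*1$ | shift
0 | +0*1$ | reduce
B | +0*1$ | reduce
E | +0*1$ | shift
E+ | 0*1$ | shift
E+0 | *1$ | reduce
E+B | *1$ | reduce
E | *1$ | shift
E* | 1$ | shift
E*1 | $ | reduce
E*B | $ | reduce
E | $ | accept
[Figure: the parse tree built by the reductions: E → E * B with B → 1; the inner E → E + B with B → 0; the innermost E → B with B → 0]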
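The shift/reduce trace for “0+0*1” can be replayed with the naive sketch below. It is not an LR parser: it simply reduces whenever some right-hand side sits on top of the stack, trying longer right-hand sides first, which happens to be enough for this particular grammar. How a real parser decides between shifting and reducing is the subject of the tables discussed later. The names RULES and shift_reduce are invented for the example.

```python
# Rules of the example grammar; longer right-hand sides are listed first so that
# "E + B" is reduced as a whole instead of reducing its B to E.
RULES = [
    ("E", ["E", "*", "B"]),
    ("E", ["E", "+", "B"]),
    ("E", ["B"]),
    ("B", ["0"]),
    ("B", ["1"]),
]


def shift_reduce(text):
    """Return the trace as (stack, remaining input, action) triples."""
    stack, inp, trace = [], list(text) + ["$"], []
    while True:
        for lhs, rhs in RULES:
            if stack[-len(rhs):] == rhs:
                # Reduce: replace the right-hand side on top of the stack by lhs.
                trace.append(("".join(stack), "".join(inp), "reduce"))
                del stack[-len(rhs):]
                stack.append(lhs)
                break
        else:
            if stack == ["E"] and inp == ["$"]:
                trace.append(("E", "$", "accept"))
                return trace
            if inp == ["$"]:
                raise SyntaxError("input exhausted but stack is not the start symbol")
            # Shift: move the next input symbol onto the stack.
            trace.append(("".join(stack), "".join(inp), "shift"))
            stack.append(inp.pop(0))


TRACE = shift_reduce("0+0*1")
```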
88
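How does the parser know what to do?
[Figure: the Parser reads the token stream LP LP Num OP Num RP OP Id RP from the Input, uses a Stack together with an Action table and a Goto table, and produces as Output the tree Op(*) with children Op(+) and Id(b), where Op(+) has children Num(23) and Num(7)]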
89
How does the parser know what to do?
• A state will keep the info gathered on handle(s)
 – A state in the “control” of the PDA
 – Also (part of) the stack alphabet
 – Each state is a set of LR(0) items
• A table will tell it “what to do” based on current state and next token
 – The transition function of the PDA
• A stack will record the “nesting level”
 – Prefixes of handles
90
LR item
N → α ● β
α: already matched. β: to be matched (from the input).
Hypothesis about αβ being a possible handle: so far we’ve matched α, expecting to see β
Example: LR(0) Items
• All items can be obtained by placing a dot at every position for every production:
91
Grammar:
(1) S → E $
(2) E → T
(3) E → E + T
(4) T → id
(5) T → ( E )
LR(0) items:
1: S → ● E $
2: S → E ● $
3: S → E $ ●
4: E → ● T
5: E → T ●
6: E → ● E + T
7: E → E ● + T
8: E → E + ● T
9: E → E + T ●
10: T → ● i
11: T → i ●
12: T → ● ( E )
13: T → ( ● E )
14: T → ( E ● )
15: T → ( E ) ●
92
LR(0) items
N → α ● β    Shift Item
N → αβ ●     Reduce Item
93
States and LR(0) Items
• The state will “remember” the potential derivation rules given the part that was already identified
• For example, if we have already identified E then the state will remember the two alternatives: (1) E → E * B, (2) E → E + B
• Actually, we will also remember where we are in each of them: (1) E → E ● * B, (2) E → E ● + B
• A derivation rule with a location marker is called an LR(0) item
• The state is actually a set of LR(0) items. E.g., q13 = { E → E ● * B , E → E ● + B }
E → E * B | E + B | B
B → 0 | 1
94
Intuition
• Gather input token by token until we find a right-hand side of a rule and then replace it with the non-terminal on the left hand side
 – Going over a token and remembering it in the stack is a shift
   • Each shift moves to a state that remembers what we’ve seen so far
 – A reduce replaces a string in the stack with the non-terminal that derives it
95
Model of an LR parser
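[Figure: the LR Parser reads the Input id + id + id $, consults the action and goto tables, and produces Output. Its Stack alternates states and symbols (terminals and non-terminals): 0 T 2 + 7 id 5, with 5 on top]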
96
LR parser stack
• Sequence made of state, symbol pairs
• For instance, a possible stack for the grammar
S → E $
E → T
E → E + T
T → id
T → ( E )
could be: 0 T 2 + 7 id 5 (stack grows to the right)
Form of LR parsing table
97
Rows: states 0, 1, ...
Columns: terminals (Shift/Reduce actions) and non-terminals (Goto part)
Entries:
sn – shift state n
rk – reduce by rule k
gm – goto state m
acc – accept
(empty) – error
98
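LR parser table example
          action                      goto
STATE | id | +  | (  | )  | $   | E  | T
0     | s5 |    | s7 |    |     | g1 | g6
1     |    | s3 |    |    | acc |    |
2     |    |    |    |    |     |    |
3     | s5 |    | s7 |    |     |    | g4
4     | r3 | r3 | r3 | r3 | r3  |    |
5     | r4 | r4 | r4 | r4 | r4  |    |
6     | r2 | r2 | r2 | r2 | r2  |    |
7     | s5 |    | s7 |    |     | g8 | g6
8     |    | s3 |    | s9 |     |    |
9     | r5 | r5 | r5 | r5 | r5  |    |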
99
Shift move
[Figure: LR parsing program with the action and goto tables, Output, a Stack whose top is state q, and Input $ … a … with a as the next token]
• If action[q, a] = sn
Result of shift
100
[Figure: LR parsing program after the shift: the Stack now holds a and state n on top of q; Input $ … a …]
• If action[q, a] = sn
101
Reduce move
• If action[qn, a] = rk
• Production: (k) A → β
• If β = σ1 … σn, the top of the stack looks like q1 σ1 … qn σn
• goto[q, A] = qm
[Figure: LR parsing program before the reduce: the top 2*|β| stack entries, ending with qn, sit above state q; Input $ … a …]
102
Result of reduce move
• If action[qn, a] = rk
• Production: (k) A → β
• If β = σ1 … σn, the top of the stack looks like q1 σ1 … qn σn
• goto[q, A] = qm
[Figure: LR parsing program after the reduce: the 2*|β| entries have been popped and replaced by A and qm on top of q; Input $ … a …]
Last slide
Accept move
103
[Figure: LR parsing program with Stack top q and Input $ a …]
If action[q, a] = accept, parsing completed
Error move
104
[Figure: LR parsing program with Stack top q and Input $ … a …]
If action[q, a] = error (usually empty), parsing discovered a syntactic error
105
Example
Z → E $
E → T | E + T
T → i | ( E )
106
Example: parsing with LR items
Grammar:
Z → E $
E → T | E + T
T → i | ( E )
Input: i + i $
Initial item set:
Z → ● E $
E → ● T
E → ● E + T
T → ● i
T → ● ( E )
Why do we need these additional LR items?
Where do they come from?
What do they mean?
107
ℇ-closure
• Given a set S of LR(0) items
• If P → α ● N β is in S
• then for each rule N → γ in the grammar, S must also contain N → ● γ
Grammar:
Z → E $
E → T | E + T
T → i | ( E )
ℇ-closure({ Z → ● E $ }) = { Z → ● E $, E → ● T, E → ● E + T, T → ● i, T → ● ( E ) }
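The closure rule can be written as a short worklist loop. In the sketch below an LR(0) item is encoded as a triple (left-hand side, right-hand side tuple, dot position); this encoding and the function name closure are assumptions made for the illustration.

```python
GRAMMAR = {
    "Z": [("E", "$")],
    "E": [("T",), ("E", "+", "T")],
    "T": [("i",), ("(", "E", ")")],
}


def closure(items, grammar):
    """Smallest superset of items that contains N -> . gamma whenever
    some item has the dot immediately before the non-terminal N."""
    result = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in grammar:
            n = rhs[dot]
            for gamma in grammar[n]:
                item = (n, gamma, 0)          # N -> . gamma
                if item not in result:
                    result.add(item)
                    work.append(item)
    return result


START_STATE = closure({("Z", ("E", "$"), 0)}, GRAMMAR)
```

Starting from the single item Z → ● E $, the loop adds the two E items (dot before E) and then the two T items (dot before T), giving the five-item set listed above.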
108
Example: parsing with LR items
Grammar: Z → E $ ; E → T | E + T ; T → i | ( E )
Remaining input: i + i $
Item set: Z → ● E $, E → ● T, E → ● E + T, T → ● i, T → ● ( E )
Items denote possible future handles
Remember position from which we’re trying to reduce
109
Example: parsing with LR items
Grammar: Z → E $ ; E → T | E + T ; T → i | ( E )
Remaining input: i + i $
Item set: Z → ● E $, E → ● T, E → ● E + T, T → ● i, T → ● ( E )
Match items with current token
After matching i: T → i ● (Reduce item!)
110
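Example: parsing with LR items
Grammar: Z → E $ ; E → T | E + T ; T → i | ( E )
Symbols so far and remaining input: T + i $ (the first i has been reduced to T)
Item set: Z → ● E $, E → ● T, E → ● E + T, T → ● i, T → ● ( E )
After matching T: E → T ● (Reduce item!)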
111
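Example: parsing with LR items
Grammar: Z → E $ ; E → T | E + T ; T → i | ( E )
Symbols so far and remaining input: E + i $ (T, built over the first i, has been reduced to E by the reduce item E → T ●)
Item set: Z → ● E $, E → ● T, E → ● E + T, T → ● i, T → ● ( E )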
112
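Example: parsing with LR items
Grammar: Z → E $ ; E → T | E + T ; T → i | ( E )
Symbols so far and remaining input: E + i $ (E built over T over i)
Item set: Z → ● E $, E → ● T, E → ● E + T, T → ● i, T → ● ( E )
After matching E: E → E ● + T, Z → E ● $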
113
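Example: parsing with LR items
Grammar: Z → E $ ; E → T | E + T ; T → i | ( E )
Symbols so far and remaining input: E + i $ (E built over T over i)
Item set: Z → ● E $, E → ● T, E → ● E + T, T → ● i, T → ● ( E )
After matching E: E → E ● + T, Z → E ● $
After matching +: E → E + ● T, T → ● i, T → ● ( E )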
114
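Example: parsing with LR items
Grammar: Z → E $ ; E → T | E + T ; T → i | ( E )
Symbols so far and remaining input: E + T $ (the second i has been reduced to T)
Item set: Z → ● E $, E → ● T, E → ● E + T, T → ● i, T → ● ( E )
After matching E: E → E ● + T, Z → E ● $
After matching +: E → E + ● T, T → ● i, T → ● ( E )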
115
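Example: parsing with LR items
Grammar: Z → E $ ; E → T | E + T ; T → i | ( E )
Symbols so far and remaining input: E + T $
Item set: Z → ● E $, E → ● T, E → ● E + T, T → ● i, T → ● ( E )
After matching E: E → E ● + T, Z → E ● $
After matching +: E → E + ● T, T → ● i, T → ● ( E )
After matching T: E → E + T ● (Reduce item!)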
116
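Example: parsing with LR items
Grammar: Z → E $ ; E → T | E + T ; T → i | ( E )
Symbols so far and remaining input: E $ (E + T has been reduced to E)
Item set: Z → ● E $, E → ● T, E → ● E + T, T → ● i, T → ● ( E )
After matching E: Z → E ● $, E → E ● + T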
117
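Example: parsing with LR items
Grammar: Z → E $ ; E → T | E + T ; T → i | ( E )
Symbols so far and remaining input: E $
Item set: Z → ● E $, E → ● T, E → ● E + T, T → ● i, T → ● ( E )
After matching E: Z → E ● $, E → E ● + T
After matching $: Z → E $ ● (Reduce item!)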
118
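Example: parsing with LR items
Grammar: Z → E $ ; E → T | E + T ; T → i | ( E )
Item set: Z → ● E $, E → ● T, E → ● E + T, T → ● i, T → ● ( E )
After matching E: Z → E ● $, E → E ● + T
After matching $: Z → E $ ● (Reduce item!)
Result: E $ has been reduced to Z, the root of the parse tree; below it, E → E + T, with the inner E → T → i and T → i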
119
GOTO/ACTION tables
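State | i  | +  | (  | )  | $  | E  | T  | action
q0    | q5 |    | q7 |    |    | q1 | q6 | shift
q1    |    | q3 |    |    | q2 |    |    | shift
q2    |    |    |    |    |    |    |    | Z → E $
q3    | q5 |    | q7 |    |    |    | q4 | shift
q4    |    |    |    |    |    |    |    | E → E + T
q5    |    |    |    |    |    |    |    | T → i
q6    |    |    |    |    |    |    |    | E → T
q7    | q5 |    | q7 |    |    | q8 | q6 | shift
q8    |    | q3 |    | q9 |    |    |    | shift
q9    |    |    |    |    |    |    |    | T → ( E )
Columns i + ( ) $ E T form the GOTO table; the action column is the ACTION table
empty – error move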
120
LR(0) parser tables
• Two types of rows:
 – Shift row – tells which state to GOTO for current token
 – Reduce row – tells which rule to reduce (independent of current token)
   • GOTO entries are blank
121
LR parser data structures
• Input – remainder of text to be processed
• Stack – sequence of pairs N, qi
 – N – symbol (terminal or non-terminal)
 – qi – state at which decisions are made
• Initial stack contains q0
Example: input: + i $    stack: q0 i q5
122
LR(0) pushdown automaton
• Two moves: shift and reduce
• Shift move
 – Remove first token from input
 – Push it on the stack
 – Compute next state based on GOTO table
 – Push new state on the stack
 – If new state is error – report error
Before shift: input: i + i $    stack: q0
After shift:  input: + i $      stack: q0 i q5
State | i  | +  | (  | )  | $  | E  | T  | action
q0    | q5 |    | q7 |    |    | q1 | q6 | shift
Stack grows this way
123
LR(0) pushdown automaton
• Reduce move
 – Using a rule N → α
 – Symbols in α and their following states are removed from stack
 – New state computed based on GOTO table (using top of stack, before pushing N)
 – N is pushed on the stack
 – New state pushed on top of N
Before reduce:          input: + i $    stack: q0 i q5
After reduce by T → i:  input: + i $    stack: q0 T q6
State | i  | +  | (  | )  | $  | E  | T  | action
q0    | q5 |    | q7 |    |    | q1 | q6 | shift
Stack grows this way
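The shift and reduce moves can be put together in a small driver. The sketch below follows the earlier "LR parser table example" for the grammar (1) S → E $, (2) E → T, (3) E → E + T, (4) T → id, (5) T → ( E ), where state 1 accepts directly on $. The dictionary names, the column assignment of the table entries, and the function lr_parse are assumptions of this sketch.

```python
# action part: shifts keyed by (state, token), one accept entry, and reduce states
SHIFT = {(0, "id"): 5, (0, "("): 7, (1, "+"): 3, (3, "id"): 5, (3, "("): 7,
         (7, "id"): 5, (7, "("): 7, (8, "+"): 3, (8, ")"): 9}
ACCEPT = (1, "$")
REDUCE = {4: 3, 5: 4, 6: 2, 9: 5}          # state -> rule number (LR(0): no lookahead)
# goto part, keyed by (state, non-terminal)
GOTO = {(0, "E"): 1, (0, "T"): 6, (3, "T"): 4, (7, "E"): 8, (7, "T"): 6}
# rule number -> (left-hand side, length of right-hand side)
RULES = {2: ("E", 1), 3: ("E", 3), 4: ("T", 1), 5: ("T", 3)}


def lr_parse(tokens):
    """Return the sequence of rule numbers used for reductions."""
    stack = [0]                    # states and symbols alternate; state 0 at the bottom
    inp = list(tokens) + ["$"]
    pos = 0
    reductions = []
    while True:
        state, tok = stack[-1], inp[pos]
        if state in REDUCE:
            k = REDUCE[state]
            lhs, n = RULES[k]
            del stack[len(stack) - 2 * n:]        # pop 2*|beta| entries
            stack += [lhs, GOTO[(stack[-1], lhs)]]  # push A and goto[q, A]
            reductions.append(k)
        elif (state, tok) == ACCEPT:
            return reductions
        elif (state, tok) in SHIFT:
            stack += [tok, SHIFT[(state, tok)]]   # push token and new state
            pos += 1
        else:
            raise SyntaxError(f"unexpected {tok} in state {state}")


REDUCTIONS = lr_parse(["id", "+", "id"])
```

Read backwards, the returned rule numbers give a rightmost derivation of the input, which is the "R" in LR.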
124