CS 335: Top-down Parsing
Swarnendu Biswas
Semester 2019-2020-II
CSE, IIT Kanpur
Content influenced by many excellent references, see References slide for acknowledgements.
Example Expression Grammar
Start → Expr
Expr → Expr + Term | Expr − Term | Term
Term → Term × Factor | Term ÷ Factor | Factor
Factor → ( Expr ) | num | name
CS 335 Swarnendu Biswas
[Slide annotation beside the grammar: priority]
Derivation of name + name × name
(↑ marks the current position in the input.)
Sentential Form | Input
Expr | ↑ name + name × name
Expr + Term | ↑ name + name × name
Term + Term | ↑ name + name × name
Factor + Term | ↑ name + name × name
name + Term | ↑ name + name × name
name + Term | name ↑ + name × name
name + Term | name + ↑ name × name
name + Term × Factor | name + ↑ name × name
name + Factor × Factor | name + ↑ name × name
name + name × Factor | name + ↑ name × name
name + name × Factor | name + name ↑ × name
name + name × Factor | name + name × ↑ name
name + name × name | name + name × ↑ name
name + name × name | name + name × name ↑
The current input terminal being scanned is called the lookahead symbol
Derivation of name + name × name
[Figure: the parse tree for name + name × name, grown top-down in successive snapshots. Start expands to Expr; Expr expands to Expr + Term; the left Expr becomes Term, then Factor, then name; the right Term expands to Term × Factor, whose Term becomes Factor and then name, and whose Factor becomes name.]
General Idea of Top-down Parsing
Start with the root (the start symbol) of the parse tree
Grow the tree downwards by expanding productions at the lower levels of the tree
• Select a nonterminal and extend it by adding children corresponding to the right-hand side of some production for that nonterminal
Repeat until one of the following holds
• The lower fringe consists only of terminals and the input is consumed
• There is a mismatch between the lower fringe and the remaining input stream
  • Selecting a production may involve trial and error
  • The parser made a wrong choice of production while expanding a nonterminal, or
  • The input character stream is not part of the language
Top-down parsing basically finds a leftmost derivation for an input string
Leftmost Top-down Parsing Algorithm
root = node for Start symbol
curr = root
push(null)  // stack
word = nextWord()
while (true):
    if curr ∈ Nonterminal:
        pick next rule A → β_1 β_2 … β_n to expand curr
        create nodes for β_1, β_2, …, β_n as children of curr
        push(β_n, β_{n−1}, …, β_2)
        curr = β_1
    else if curr == word:
        word = nextWord()
        curr = pop()
    else if word == eof and curr == null:
        accept input and exit the loop
    else:
        backtrack
Implementing Backtracking
• Extend the previous algorithm to backtrack
  • Set curr to its parent and delete the children
• Expand the node curr with an untried rule, if any
  • Create child nodes for each symbol on the right-hand side of the production
  • Push those symbols onto the stack in reverse order
  • Set curr to the first child node
• Move curr up the tree if there are no untried rules
• Report a syntax error when there are no more moves
Example of Top-down Parsing
Rule # | Sentential Form | Input
 | Expr | ↑ name + name × name
1 | Expr + Term | ↑ name + name × name
3 | Term + Term | ↑ name + name × name
6 | Factor + Term | ↑ name + name × name
9 | name + Term | ↑ name + name × name
 | name + Term | name ↑ + name × name
 | name + Term | name + ↑ name × name
4 | name + Term × Factor | name + ↑ name × name
6 | name + Factor × Factor | name + ↑ name × name
9 | name + name × Factor | name + ↑ name × name
 | name + name × Factor | name + name ↑ × name
 | name + name × Factor | name + name × ↑ name
9 | name + name × name | name + name × ↑ name
 | name + name × name | name + name × name ↑

Rule # | Production
0 | Start → Expr
1 | Expr → Expr + Term
2 | Expr → Expr − Term
3 | Expr → Term
4 | Term → Term × Factor
5 | Term → Term ÷ Factor
6 | Term → Factor
7 | Factor → ( Expr )
8 | Factor → num
9 | Factor → name
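The rule sequence in the trace above can be replayed mechanically. The following is a minimal Python sketch and not part of the original slides; the names RULES, NONTERMINALS, and leftmost_derivation are illustrative assumptions. It rewrites the leftmost nonterminal with each listed rule and records every sentential form.

```python
# Illustrative sketch: RULES and leftmost_derivation are hypothetical names.
RULES = {
    0: ("Start", ["Expr"]),
    1: ("Expr", ["Expr", "+", "Term"]),
    2: ("Expr", ["Expr", "-", "Term"]),
    3: ("Expr", ["Term"]),
    4: ("Term", ["Term", "×", "Factor"]),
    5: ("Term", ["Term", "÷", "Factor"]),
    6: ("Term", ["Factor"]),
    7: ("Factor", ["(", "Expr", ")"]),
    8: ("Factor", ["num"]),
    9: ("Factor", ["name"]),
}
NONTERMINALS = {lhs for lhs, _ in RULES.values()}


def leftmost_derivation(start, rule_numbers):
    """Apply each rule to the leftmost nonterminal; return all sentential forms."""
    form = [start]
    forms = [form]
    for number in rule_numbers:
        lhs, rhs = RULES[number]
        # Locate the leftmost nonterminal in the current sentential form.
        index = next(i for i, sym in enumerate(form) if sym in NONTERMINALS)
        if form[index] != lhs:
            raise ValueError(f"rule {number} does not expand {form[index]}")
        form = form[:index] + rhs + form[index + 1:]
        forms.append(form)
    return forms


forms = leftmost_derivation("Expr", [1, 3, 6, 9, 4, 6, 9, 9])
for form in forms:
    print(" ".join(form))
```

The sketch only replays a rule sequence that is already known; choosing that sequence is the hard part of top-down parsing.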
How does a top-down parser choose which rule to apply?
Example of Top-down Parsing
Rule # | Sentential Form | Input
 | Expr | ↑ name + name × name
1 | Expr + Term | ↑ name + name × name
1 | Expr + Term + Term | ↑ name + name × name
1 | Expr + Term + Term + ⋯ | ↑ name + name × name
1 | … | ↑ name + name × name
1 | … | ↑ name + name × name
A top-down parser can loop indefinitely with a left-recursive grammar
Left Recursion
• A grammar is left-recursive if it has a nonterminal A such that there is a derivation A ⇒+ Aα for some string α
• Direct left recursion: there is a production of the form A → Aα
• Indirect left recursion: the first symbol on the right-hand side of a rule can derive the symbol on the left
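Both kinds of left recursion can be detected by following the "can start with" relation between nonterminals. The sketch below is a hedged illustration, not the slides' method: the function name and INDIRECT_GRAMMAR are hypothetical, and it assumes no nonterminal derives the empty string, so only the first symbol of each production body matters.

```python
def left_recursive_nonterminals(grammar):
    """Return the nonterminals A that have a derivation A =>+ A alpha.

    Assumes no nonterminal derives the empty string, so only the first
    symbol of each production body can begin a derived string.
    """
    # leads[A]: nonterminals that appear first in some production body of A.
    leads = {
        a: {alt[0] for alt in alts if alt and alt[0] in grammar}
        for a, alts in grammar.items()
    }
    result = set()
    for a in grammar:
        seen, work = set(), list(leads[a])
        while work:  # depth-first search over the "can start with" relation
            b = work.pop()
            if b not in seen:
                seen.add(b)
                work.extend(leads[b])
        if a in seen:
            result.add(a)
    return result


EXPR_GRAMMAR = {
    "Expr": [["Expr", "+", "Term"], ["Expr", "-", "Term"], ["Term"]],
    "Term": [["Term", "×", "Factor"], ["Term", "÷", "Factor"], ["Factor"]],
    "Factor": [["(", "Expr", ")"], ["num"], ["name"]],
}
# Hypothetical grammar with only indirect left recursion: S => A a => S d a.
INDIRECT_GRAMMAR = {
    "S": [["A", "a"], ["b"]],
    "A": [["S", "d"], ["c"]],
}
print(left_recursive_nonterminals(EXPR_GRAMMAR))
print(left_recursive_nonterminals(INDIRECT_GRAMMAR))
```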
We can often reformulate a grammar to avoid left recursion
Remove Left Recursion
A → Aα_1 | Aα_2 | … | Aα_m | β_1 | … | β_n
is rewritten as
A → β_1 A′ | β_2 A′ | … | β_n A′
A′ → α_1 A′ | α_2 A′ | … | α_m A′ | ε
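The rewrite for immediate left recursion is short enough to sketch in Python. This is an illustration under assumptions: productions are lists of symbols, the empty list stands for ε, and the new nonterminal is named by appending a prime, none of which comes from the slides.

```python
def remove_immediate_left_recursion(nt, alternatives):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn without left recursion.

    Each alternative is a list of symbols; the empty list stands for ε.
    """
    new_nt = nt + "'"
    # alphas: tails of the left-recursive alternatives A -> A alpha.
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == nt]
    # betas: alternatives that do not start with A.
    betas = [alt for alt in alternatives if not alt or alt[0] != nt]
    if not alphas:
        return {nt: alternatives}  # nothing to rewrite
    return {
        nt: [beta + [new_nt] for beta in betas],
        new_nt: [alpha + [new_nt] for alpha in alphas] + [[]],  # [] is ε
    }


rules = remove_immediate_left_recursion(
    "Expr", [["Expr", "+", "Term"], ["Expr", "-", "Term"], ["Term"]]
)
for lhs, alts in rules.items():
    print(lhs, "->", " | ".join(" ".join(alt) or "ε" for alt in alts))
```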
Remove Left Recursion
E → E + T | T
T → T * F | F
F → ( E ) | id
becomes
E → T E′
E′ → + T E′ | ε
T → F T′
T′ → * F T′ | ε
F → ( E ) | id
Non-Left-Recursive Expression Grammar
Original grammar
Rule # | Production
0 | Start → Expr
1 | Expr → Expr + Term
2 | Expr → Expr − Term
3 | Expr → Term
4 | Term → Term × Factor
5 | Term → Term ÷ Factor
6 | Term → Factor
7 | Factor → ( Expr )
8 | Factor → num
9 | Factor → name

Grammar after removing left recursion
Rule # | Production
0 | Start → Expr
1 | Expr → Term Expr′
2 | Expr′ → + Term Expr′
3 | Expr′ → − Term Expr′
4 | Expr′ → ε
5 | Term → Factor Term′
6 | Term′ → × Factor Term′
7 | Term′ → ÷ Factor Term′
8 | Term′ → ε
9 | Factor → ( Expr )
10 | Factor → num
11 | Factor → name
Indirect Left Recursion
S → Aa | b
A → Ac | Sd | ε
• There is a left recursion because S ⇒ Aa ⇒ Sda
Eliminating Left Recursion
• Input: Grammar G with no cycles or ε-productions
• Algorithm
  Arrange nonterminals in some order A_1, A_2, …, A_n
  for i ← 1 … n
      for j ← 1 to i − 1
          if ∃ a production A_i → A_j γ
              replace A_i → A_j γ with one or more productions that expand A_j
      eliminate the immediate left recursion among the A_i productions
• Loop invariant at the start of outer iteration i
  ∀ k < i, no production expanding A_k has A_l as the first symbol of its right-hand side, for any l < k
Eliminating Indirect Left Recursion
S → Aa | b
A → Ac | Sd | ε
becomes
S → Aa | b
A → bdA′ | A′
A′ → cA′ | adA′ | ε
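The elimination algorithm and the example above can be sketched together in Python. This is a hedged illustration: the dictionary representation, the function names, and the use of an empty list for ε are assumptions, and the example grammar contains an ε-production even though the stated precondition excludes them (it happens to be harmless here).

```python
def remove_immediate_left_recursion(nt, alternatives):
    """Rewrite A -> A alpha | beta as A -> beta A', A' -> alpha A' | ε."""
    new_nt = nt + "'"
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == nt]
    betas = [alt for alt in alternatives if not alt or alt[0] != nt]
    if not alphas:
        return {nt: alternatives}
    return {
        nt: [beta + [new_nt] for beta in betas],
        new_nt: [alpha + [new_nt] for alpha in alphas] + [[]],  # [] is ε
    }


def eliminate_left_recursion(grammar, order):
    """Remove direct and indirect left recursion, visiting nonterminals in `order`."""
    g = {nt: list(alts) for nt, alts in grammar.items()}
    for i, a_i in enumerate(order):
        for a_j in order[:i]:
            expanded = []
            for alt in g[a_i]:
                if alt and alt[0] == a_j:
                    # Replace A_i -> A_j gamma by A_i -> delta gamma
                    # for every current production A_j -> delta.
                    expanded.extend(delta + alt[1:] for delta in g[a_j])
                else:
                    expanded.append(alt)
            g[a_i] = expanded
        g.update(remove_immediate_left_recursion(a_i, g[a_i]))
    return g


GRAMMAR = {
    "S": [["A", "a"], ["b"]],
    "A": [["A", "c"], ["S", "d"], []],
}
result = eliminate_left_recursion(GRAMMAR, ["S", "A"])
for lhs, alts in result.items():
    print(lhs, "->", " | ".join(" ".join(alt) or "ε" for alt in alts))
```

The result depends on the chosen order of nonterminals; a different order generally gives a different but equivalent grammar.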
Cost of Backtracking
Backtracking is expensive
• Parser expands a nonterminal with the wrong rule
• Mismatch between the lower fringe of the parse tree and the input is detected
• Parser undoes the last few actions
• Parser tries other productions, if any
Avoid Backtracking
• The parser has to select the next rule
  • Compare the curr symbol with the next input symbol, called the lookahead
  • Use the lookahead to disambiguate the possible production rules
• A backtrack-free grammar is a CFG for which the leftmost, top-down parser can always predict the correct rule with one word of lookahead
  • Also called a predictive grammar
FIRST Set
• Intuition
  • Each alternative for the leftmost nonterminal leads to a distinct terminal symbol
  • Which rule to choose becomes obvious by comparing against the next word in the input stream
• Given a string γ of terminal and nonterminal symbols, FIRST(γ) is the set of all terminal symbols that can begin any string derived from γ
  • We also need to keep track of which symbols can produce the empty string
• FIRST maps each symbol in NT ∪ T ∪ {ε, EOF} to a subset of T ∪ {ε, EOF}
Steps to Compute FIRST Set
1. If X is a terminal, then FIRST(X) = {X}
2. If X → ε is a production, then ε ∈ FIRST(X)
3. If X is a nonterminal and X → Y_1 Y_2 … Y_k is a production
   I. Everything in FIRST(Y_1) other than ε is in FIRST(X)
   II. If for some i, a ∈ FIRST(Y_i) and ∀ 1 ≤ j < i, ε ∈ FIRST(Y_j), then a ∈ FIRST(X)
   III. If ε ∈ FIRST(Y_1 … Y_k), then ε ∈ FIRST(X)
FIRST Set
• Generalize the FIRST relation to strings of symbols
  FIRST(Xγ) = FIRST(X) if X cannot derive ε
  FIRST(Xγ) = (FIRST(X) − {ε}) ∪ FIRST(γ) if X can derive ε
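The FIRST rules above amount to a fixed-point computation. The following Python sketch is one way to write it, under assumptions: the grammar is a dictionary from nonterminals to lists of production bodies, the empty list stands for ε, and the names compute_first and first_of_string are illustrative. It uses the non-left-recursive expression grammar from the earlier slides.

```python
EPS = "ε"

GRAMMAR = {
    "Start": [["Expr"]],
    "Expr": [["Term", "Expr'"]],
    "Expr'": [["+", "Term", "Expr'"], ["-", "Term", "Expr'"], []],
    "Term": [["Factor", "Term'"]],
    "Term'": [["×", "Factor", "Term'"], ["÷", "Factor", "Term'"], []],
    "Factor": [["(", "Expr", ")"], ["num"], ["name"]],
}


def first_of_string(symbols, first):
    """FIRST of a symbol string: scan left to right while symbols can vanish."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})  # a terminal's FIRST set is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol can derive ε, or the string is empty
    return result


def compute_first(grammar):
    """Iterate the FIRST rules until no set grows any further."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_string(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


FIRST = compute_first(GRAMMAR)
for nt, symbols in FIRST.items():
    print(f"FIRST({nt}) = {sorted(symbols)}")
```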
Compute FIRST Set
Start → Expr
Expr → Term Expr′
Expr′ → + Term Expr′ | − Term Expr′ | ε
Term → Factor Term′
Term′ → × Factor Term′ | ÷ Factor Term′ | ε
Factor → ( Expr ) | num | name

FIRST(Expr) = {(, name, num}
FIRST(Expr′) = {+, −, ε}
FIRST(Term) = {(, name, num}
FIRST(Term′) = {ε, ×, ÷}
FIRST(Factor) = {(, name, num}
FOLLOW Set
• FOLLOW(X) is the set of terminals that can immediately follow X
  • That is, t ∈ FOLLOW(X) if there is any derivation containing Xt
[Figure: a parse tree in which S derives α A a β and A derives c γ …]
Terminal c is in FIRST(A) and a is in FOLLOW(A)
Steps to Compute FOLLOW Set
1. Place $ in FOLLOW(S), where S is the start symbol and $ is the end marker
2. If there is a production A → αBβ, then everything in FIRST(β) except ε is in FOLLOW(B)
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B)
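The three FOLLOW rules can also be run to a fixed point. The sketch below is a hedged illustration: it takes the FIRST sets of the non-left-recursive expression grammar as given (FIRST(Start) is filled in by assumption, since the slides do not list it), and the function names are hypothetical.

```python
EPS, EOF = "ε", "$"

GRAMMAR = {
    "Start": [["Expr"]],
    "Expr": [["Term", "Expr'"]],
    "Expr'": [["+", "Term", "Expr'"], ["-", "Term", "Expr'"], []],
    "Term": [["Factor", "Term'"]],
    "Term'": [["×", "Factor", "Term'"], ["÷", "Factor", "Term'"], []],
    "Factor": [["(", "Expr", ")"], ["num"], ["name"]],
}
FIRST = {
    "Start": {"(", "name", "num"},  # assumed: same as FIRST(Expr)
    "Expr": {"(", "name", "num"},
    "Expr'": {"+", "-", EPS},
    "Term": {"(", "name", "num"},
    "Term'": {"×", "÷", EPS},
    "Factor": {"(", "name", "num"},
}


def first_of_string(symbols):
    """FIRST of a symbol string, using the precomputed FIRST sets."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})  # a terminal's FIRST set is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def compute_follow(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(EOF)  # rule 1
    changed = True
    while changed:
        changed = False
        for a, alternatives in grammar.items():
            for alt in alternatives:
                for i, b in enumerate(alt):
                    if b not in grammar:
                        continue  # only nonterminals have FOLLOW sets
                    beta_first = first_of_string(alt[i + 1:])
                    new = beta_first - {EPS}  # rule 2
                    if EPS in beta_first:  # rule 3: beta is empty or nullable
                        new |= follow[a]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow


FOLLOW = compute_follow(GRAMMAR, "Start")
for nt, symbols in FOLLOW.items():
    print(f"FOLLOW({nt}) = {sorted(symbols)}")
```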
Compute FOLLOW Set
Start → Expr
Expr → Term Expr′
Expr′ → + Term Expr′ | − Term Expr′ | ε
Term → Factor Term′
Term′ → × Factor Term′ | ÷ Factor Term′ | ε
Factor → ( Expr ) | num | name

FOLLOW(Expr) = {$, )}
FOLLOW(Expr′) = {$, )}
FOLLOW(Term) = {$, +, −, )}
FOLLOW(Term′) = {$, +, −, )}
FOLLOW(Factor) = {$, +, −, ×, ÷, )}
Conditions for Backtrack-Free Grammar
• Consider a production A → β
  FIRST+(A → β) = FIRST(β) if ε ∉ FIRST(β)
  FIRST+(A → β) = FIRST(β) ∪ FOLLOW(A) otherwise
• For any nonterminal A where A → β_1 | β_2 | … | β_n, a backtrack-free grammar has the property
  FIRST+(A → β_i) ∩ FIRST+(A → β_j) = ∅, ∀ 1 ≤ i, j ≤ n, i ≠ j
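The FIRST+ condition can be checked directly once FIRST and FOLLOW are known. The sketch below is illustrative only: the sets are copied from the earlier slides (FIRST(Start) and FOLLOW(Start) are assumed), and first_plus, is_backtrack_free, and CALL_GRAMMAR are hypothetical names.

```python
from itertools import combinations

EPS = "ε"

GRAMMAR = {
    "Start": [["Expr"]],
    "Expr": [["Term", "Expr'"]],
    "Expr'": [["+", "Term", "Expr'"], ["-", "Term", "Expr'"], []],
    "Term": [["Factor", "Term'"]],
    "Term'": [["×", "Factor", "Term'"], ["÷", "Factor", "Term'"], []],
    "Factor": [["(", "Expr", ")"], ["num"], ["name"]],
}
FIRST = {
    "Start": {"(", "name", "num"}, "Expr": {"(", "name", "num"},
    "Expr'": {"+", "-", EPS}, "Term": {"(", "name", "num"},
    "Term'": {"×", "÷", EPS}, "Factor": {"(", "name", "num"},
}
FOLLOW = {
    "Start": {"$"}, "Expr": {"$", ")"}, "Expr'": {"$", ")"},
    "Term": {"$", "+", "-", ")"}, "Term'": {"$", "+", "-", ")"},
    "Factor": {"$", "+", "-", "×", "÷", ")"},
}


def first_of_string(symbols, first):
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})  # a terminal's FIRST set is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def first_plus(nt, alt, first, follow):
    """FIRST+(A -> beta): add FOLLOW(A) when beta can derive ε."""
    f = first_of_string(alt, first)
    return f | follow[nt] if EPS in f else f


def is_backtrack_free(grammar, first, follow):
    """True if the FIRST+ sets of each nonterminal's alternatives are disjoint."""
    for nt, alternatives in grammar.items():
        sets = [first_plus(nt, alt, first, follow) for alt in alternatives]
        if any(s & t for s, t in combinations(sets, 2)):
            return False
    return True


# Factor alternatives that all begin with `name`; no ε, so FOLLOW is not needed.
CALL_GRAMMAR = {
    "Factor": [["name"], ["name", "[", "ArgList", "]"], ["name", "(", "ArgList", ")"]],
}
print(is_backtrack_free(GRAMMAR, FIRST, FOLLOW))
print(is_backtrack_free(CALL_GRAMMAR, {}, {}))
```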
Backtracking
Start → Expr
Expr → Term Expr′
Expr′ → + Term Expr′ | − Term Expr′ | ε
Term → Factor Term′
Term′ → × Factor Term′ | ÷ Factor Term′ | ε
Factor → name
       | name [ ArgList ]
       | name ( ArgList )
ArgList → Expr MoreArgs
MoreArgs → , Expr MoreArgs
         | ε
Not all grammars are backtrack free
Left Factoring
• Left factoring is the process of extracting and isolating common prefixes in a set of productions
• Algorithm
  A → αβ_1 | αβ_2 | … | αβ_n | γ_1 | γ_2 | … | γ_j
  is rewritten as
  A → αB | γ_1 | γ_2 | … | γ_j
  B → β_1 | β_2 | … | β_n
• Example
  Factor → name Arguments
  Arguments → [ ArgList ] | ( ArgList ) | ε
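A single left-factoring step can be sketched in Python as follows. This is an illustration under assumptions: productions are lists of symbols, the empty list stands for ε, the caller supplies the name of the new nonterminal, and only the first group of alternatives that share a leading symbol is factored per call.

```python
def common_prefix(alternatives):
    """Longest sequence of symbols shared at the start of every alternative."""
    prefix = []
    for symbols in zip(*alternatives):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix


def left_factor(nt, alternatives, new_name):
    """Factor out one common prefix: A -> alpha B | gamma..., B -> beta_1 | ..."""
    groups = {}
    for alt in alternatives:
        groups.setdefault(alt[0] if alt else None, []).append(alt)
    for head, group in groups.items():
        if head is not None and len(group) > 1:
            alpha = common_prefix(group)
            others = [alt for alt in alternatives if alt not in group]
            return {
                nt: [alpha + [new_name]] + others,
                new_name: [alt[len(alpha):] for alt in group],  # [] is ε
            }
    return {nt: alternatives}  # no two alternatives share a first symbol


factored = left_factor(
    "Factor",
    [["name"], ["name", "[", "ArgList", "]"], ["name", "(", "ArgList", ")"]],
    "Arguments",
)
for lhs, alts in factored.items():
    print(lhs, "->", " | ".join(" ".join(alt) or "ε" for alt in alts))
```

Repeating the step until nothing changes would handle grammars with several shared prefixes.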
Key Insight in Using Top-Down Parsing
• Efficiency depends on the accuracy of selecting the correct production for expanding a nonterminal
  • Parser may not terminate in the worst case
• A large subset of the context-free grammars can be parsed without backtracking
Recursive-Descent Parsing
• Recursive-descent parsing is a form of top-down parsing that may require backtracking
• Consists of a set of procedures, one for each nonterminal
void A() {
    Choose an A-production A → X_1 X_2 … X_k
    for i ← 1 … k
        if X_i is a nonterminal
            call procedure X_i()
        else if X_i equals the current input symbol a
            advance the input to the next symbol
        else
            // error
}
Limitations with Recursive-Descent Parsing
• Consider a grammar with two productions X → γ_1 and X → γ_2
  • Suppose FIRST(γ_1) ∩ FIRST(γ_2) ≠ ∅
  • Say a is the common terminal symbol
  • The function corresponding to X will not know which production to use on input token a
Recursive-Descent Parsing with Backtracking
• To support backtracking
  • All productions should be tried in some order
  • Failure for some production implies we need to try the remaining productions
  • Report an error only when there are no other rules
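One compact way to try all productions in order is to let each nonterminal yield every input position it can reach. The Python sketch below is an illustration, not the slides' implementation: the function names are hypothetical, and the grammar is a simplified version of the Factor/ArgList rules in which arguments are Factors rather than full expressions. It relies on the grammar having no left recursion.

```python
# Simplified, hypothetical grammar based on the Factor/ArgList rules;
# the empty list stands for ε.
GRAMMAR = {
    "Factor": [["name"], ["name", "[", "ArgList", "]"], ["name", "(", "ArgList", ")"]],
    "ArgList": [["Factor", "MoreArgs"]],
    "MoreArgs": [[",", "Factor", "MoreArgs"], []],
}


def parse_symbol(sym, tokens, pos):
    """Yield every input position reachable after deriving `sym` from `pos`."""
    if sym not in GRAMMAR:  # terminal: must match the current token
        if pos < len(tokens) and tokens[pos] == sym:
            yield pos + 1
        return
    for alt in GRAMMAR[sym]:  # try the productions in order
        yield from parse_sequence(alt, tokens, pos)


def parse_sequence(symbols, tokens, pos):
    """Yield every position reachable after deriving all `symbols` in turn."""
    if not symbols:
        yield pos
        return
    for mid in parse_symbol(symbols[0], tokens, pos):
        # If the rest fails from `mid`, the loop backtracks to the next choice.
        yield from parse_sequence(symbols[1:], tokens, mid)


def accepts(tokens, start="Factor"):
    """Accept only if some sequence of choices consumes the whole input."""
    return any(end == len(tokens) for end in parse_symbol(start, tokens, 0))


print(accepts("name ( name , name [ name ] )".split()))
print(accepts("name ( name".split()))
```

On the first input the parser first tries Factor → name, finds leftover input, and only then moves on to the longer alternatives, which is exactly the wasted work that prediction avoids.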
Predictive Parsing
• Special case of recursive-descent parsing that does not require backtracking
  • Lookahead symbol unambiguously determines which production rule to use
• Advantage is that the algorithm is simple and the parser can be constructed by hand
stmt → expr ;
     | if ( expr ) stmt
     | for ( optexpr ; optexpr ; optexpr ) stmt
     | other
optexpr → ε | expr
Pseudocode for a Predictive Parser
void stmt() {
    switch (lookahead) {
    case expr:
        match(expr); match(';'); break;
    case if:
        match(if); match('('); match(expr); match(')'); stmt(); break;
    case for:
        match(for); match('('); optexpr(); match(';'); optexpr(); match(';'); optexpr(); match(')'); stmt(); break;
    case other:
        match(other); break;
    default:
        report("syntax error");
    }
}
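The pseudocode above translates almost line for line into Python. The sketch below is a hedged illustration: the Parser class, the token spelling (plain strings such as "expr" and "other"), the appended "$" end marker, and the optexpr procedure are assumptions added to make it runnable.

```python
class Parser:
    """Predictive recursive-descent parser for the stmt grammar (a sketch)."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" marks the end of input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, terminal):
        if self.lookahead != terminal:
            raise SyntaxError(f"expected {terminal!r}, found {self.lookahead!r}")
        self.pos += 1

    def stmt(self):
        # The lookahead alone selects the production; no backtracking.
        if self.lookahead == "expr":
            self.match("expr"); self.match(";")
        elif self.lookahead == "if":
            self.match("if"); self.match("("); self.match("expr"); self.match(")")
            self.stmt()
        elif self.lookahead == "for":
            self.match("for"); self.match("(")
            self.optexpr(); self.match(";")
            self.optexpr(); self.match(";")
            self.optexpr(); self.match(")")
            self.stmt()
        elif self.lookahead == "other":
            self.match("other")
        else:
            raise SyntaxError(f"syntax error at {self.lookahead!r}")

    def optexpr(self):
        if self.lookahead == "expr":
            self.match("expr")
        # Otherwise choose optexpr -> ε and consume nothing.


def parse(tokens):
    parser = Parser(tokens)
    parser.stmt()
    parser.match("$")  # the whole input must be consumed
    return True


print(parse("for ( ; expr ; ) other".split()))
print(parse("if ( expr ) expr ;".split()))
```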
LL(1) Grammars
• Class of grammars for which no backtracking is required
  • First L stands for left-to-right scan, second L stands for leftmost derivation
  • There is one lookahead token
• No left-recursive or ambiguous grammar can be LL(1)
• In LL(k), k stands for k lookahead tokens
  • Predictive parsers accept LL(k) grammars
  • Every LL(1) grammar is an LL(2) grammar
Nonrecursive Table-Driven Predictive Parser
[Figure: a predictive parsing program reads the input buffer (a + b $), consults the parsing table M, maintains a stack (X, Y, Z, $ from top to bottom), and produces the output.]
Predictive Parsing Algorithm
• Input: String w and parsing table M for grammar G
• Algorithm:
  Let a be the first symbol in w
  Let X be the symbol at the top of the stack
  while X ≠ $:
      if X == a:
          pop the stack and advance the input
      else if X is a terminal or M[X, a] is an error entry:
          error
      else if M[X, a] == X → Y_1 Y_2 … Y_k:
          output the production
          pop the stack
          push Y_k Y_{k−1} … Y_1 onto the stack, with Y_1 on top
      X ← top stack symbol
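The table-driven loop can be sketched in Python with the parsing table for the E/T/F grammar written out by hand. This is an illustration under assumptions: the table is a dictionary keyed by (nonterminal, terminal), a missing key is an error entry, the empty list stands for ε, and the final check that the input is exhausted is an addition not spelled out in the algorithm above.

```python
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# M[A, a] -> production body; the empty list stands for ε.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}


def predictive_parse(tokens, start="E"):
    """Return the productions used, in order, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]  # the top of the stack is the end of the list
    pos, output = 0, []
    while stack[-1] != "$":
        top, a = stack[-1], tokens[pos]
        if top == a:
            stack.pop()  # matched a terminal: advance the input
            pos += 1
        elif top not in NONTERMINALS or (top, a) not in TABLE:
            raise SyntaxError(f"unexpected {a!r} while expecting {top!r}")
        else:
            body = TABLE[(top, a)]
            output.append((top, body))
            stack.pop()
            stack.extend(reversed(body))  # leftmost body symbol ends up on top
    if tokens[pos] != "$":
        raise SyntaxError(f"unconsumed input at {tokens[pos]!r}")
    return output


for lhs, body in predictive_parse("id + id * id".split()):
    print(lhs, "->", " ".join(body) or "ε")
```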
Predictive Parsing Table
E → TE′
E′ → +TE′ | ε
T → FT′
T′ → *FT′ | ε
F → (E) | id

Nonterminal | id | + | * | ( | ) | $
E | E → TE′ | | | E → TE′ | |
E′ | | E′ → +TE′ | | | E′ → ε | E′ → ε
T | T → FT′ | | | T → FT′ | |
T′ | | T′ → ε | T′ → *FT′ | | T′ → ε | T′ → ε
F | F → id | | | F → (E) | |
Construction of a Predictive Parsing Table
• Input: Grammar G
• Algorithm:
  • For each production A → α in G,
    • For each terminal a in FIRST(α), add A → α to M[A, a]
    • If ε is in FIRST(α), then for each terminal b in FOLLOW(A), add A → α to M[A, b]
    • If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $]
  • No production in M[A, a] indicates error
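The construction can be sketched in Python for the E/T/F grammar. This is a hedged illustration: the FIRST and FOLLOW sets are written out by hand rather than computed, each table cell holds a list so that conflicts would show up as cells with more than one entry, and build_table is a hypothetical name.

```python
EPS = "ε"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],  # [] stands for ε
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
# Precomputed by hand for this grammar (an assumption of the sketch).
FIRST = {
    "E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
    "T'": {"*", EPS}, "F": {"(", "id"},
}
FOLLOW = {
    "E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
    "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"},
}


def first_of_string(symbols, first):
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})  # a terminal's FIRST set is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def build_table(grammar, first, follow):
    """Map (nonterminal, terminal) to the list of applicable production bodies."""
    table = {}
    for a, alternatives in grammar.items():
        for alpha in alternatives:
            f = first_of_string(alpha, first)
            lookaheads = f - {EPS}
            if EPS in f:
                lookaheads |= follow[a]  # FOLLOW(A) includes $ where applicable
            for terminal in lookaheads:
                table.setdefault((a, terminal), []).append(alpha)
    return table


table = build_table(GRAMMAR, FIRST, FOLLOW)
is_ll1 = all(len(bodies) == 1 for bodies in table.values())
print(len(table), "filled cells; LL(1):", is_ll1)
```

Cells absent from the dictionary correspond to error entries.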
Working of Predictive Parser
Matched | Stack | Input | Action
 | E$ | id + id * id$ |
 | TE′$ | id + id * id$ | Output E → TE′
 | FT′E′$ | id + id * id$ | Output T → FT′
 | id T′E′$ | id + id * id$ | Output F → id
id | T′E′$ | + id * id$ | Match id
id | E′$ | + id * id$ | Output T′ → ε
id | +TE′$ | + id * id$ | Output E′ → +TE′
id + | TE′$ | id * id$ | Match +
id + | FT′E′$ | id * id$ | Output T → FT′
id + | id T′E′$ | id * id$ | Output F → id
id + id | T′E′$ | * id$ | Match id
id + id | *FT′E′$ | * id$ | Output T′ → *FT′
id + id * | FT′E′$ | id$ | Match *
id + id * | id T′E′$ | id$ | Output F → id
id + id * id | T′E′$ | $ | Match id
id + id * id | E′$ | $ | Output T′ → ε
id + id * id | $ | $ | Output E′ → ε
Predictive Parsing
• Grammars whose predictive parsing tables contain no duplicate entries are called LL(1)
• If grammar G is left-recursive or ambiguous, then parsing table M will have at least one multiply-defined cell
• Some grammars cannot be transformed into LL(1)
  • The grammar below is ambiguous
S → iEtSS′ | a
S′ → eS | ε
E → b
Predictive Parsing Table
S → iEtSS′ | a
S′ → eS | ε
E → b

Nonterminal | a | b | e | i | t | $
S | S → a | | | S → iEtSS′ | |
S′ | | | S′ → ε, S′ → eS | | | S′ → ε
E | | E → b | | | |
Error Recovery in Predictive Parsing
• Error conditions
  • Terminal on top of the stack does not match the next input symbol
  • Nonterminal A is on top of the stack, a is the next input symbol, and M[A, a] is error
• Choices
  • Raise an error and quit parsing
  • Print an error message, try to recover from the error, and continue with compilation
Error Recovery in Predictive Parsing
• Panic mode: skip over symbols until a token in a set of synchronizing (synch) tokens appears
  • Add all tokens in FOLLOW(A) to the synch set for A
  • Add symbols in FIRST(A) to the synch set for A
  • Add keywords that can begin sentences
  • …
Predictive Parsing Table with Synchronizing Tokens
E → TE′
E′ → +TE′ | ε
T → FT′
T′ → *FT′ | ε
F → (E) | id

Nonterminal | id | + | * | ( | ) | $
E | E → TE′ | | | E → TE′ | synch | synch
E′ | | E′ → +TE′ | | | E′ → ε | E′ → ε
T | T → FT′ | synch | | T → FT′ | synch | synch
T′ | | T′ → ε | T′ → *FT′ | | T′ → ε | T′ → ε
F | F → id | synch | synch | F → (E) | synch | synch
Error Recovery Moves by Predictive Parser
Stack | Input | Remark
E$ | ) id * + id$ | Error, skip )
E$ | id * + id$ | id is in FIRST(E)
TE′$ | id * + id$ |
FT′E′$ | id * + id$ |
id T′E′$ | id * + id$ |
T′E′$ | * + id$ |
*FT′E′$ | * + id$ |
FT′E′$ | + id$ | Error, M[F, +] = synch
T′E′$ | + id$ | F has been popped
E′$ | + id$ |
+TE′$ | + id$ |
TE′$ | id$ |
FT′E′$ | id$ |
id T′E′$ | id$ |
T′E′$ | $ |
E′$ | $ |
$ | $ |
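The recovery moves traced above can be reproduced with a small extension of the table-driven loop. The Python sketch below is illustrative and rests on assumptions that the slides do not state: a blank entry skips the input symbol, a synch entry pops the nonterminal, a mismatched terminal on the stack is popped, and a synch entry is treated as a skip when the nonterminal is the only symbol above $ (which is what the first row of the trace does for E and ")").

```python
SYNCH = "synch"
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# Parsing table with synchronizing entries; the empty list stands for ε.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E", ")"): SYNCH, ("E", "$"): SYNCH,
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T", "+"): SYNCH, ("T", ")"): SYNCH, ("T", "$"): SYNCH,
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
    ("F", "+"): SYNCH, ("F", "*"): SYNCH, ("F", ")"): SYNCH, ("F", "$"): SYNCH,
}


def parse_with_recovery(tokens, start="E"):
    """Parse with panic-mode recovery; return the list of recovery actions."""
    tokens = list(tokens) + ["$"]
    stack, pos, errors = ["$", start], 0, []
    while stack[-1] != "$":
        top, a = stack[-1], tokens[pos]
        if top == a:
            stack.pop()
            pos += 1
        elif top not in NONTERMINALS:
            errors.append(("pop", top))  # mismatched terminal: drop it
            stack.pop()
        else:
            entry = TABLE.get((top, a))
            if isinstance(entry, list):
                stack.pop()
                stack.extend(reversed(entry))
            elif a != "$" and (entry is None or len(stack) == 2):
                # Blank entry, or synch with only the start symbol left: skip input.
                errors.append(("skip", a))
                pos += 1
            else:
                errors.append(("pop", top))  # synch entry: give up on this nonterminal
                stack.pop()
    if tokens[pos] != "$":
        errors.append(("extra input", tokens[pos]))
    return errors


print(parse_with_recovery(") id * + id".split()))
print(parse_with_recovery("id + id * id".split()))
```

Collecting the recovery actions in a list, rather than raising on the first one, lets a single run report several errors.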
References
• A. Aho et al. Compilers: Principles, Techniques, and Tools, 2nd edition, Chapter 4.4.
• K. Cooper and L. Torczon. Engineering a Compiler, 2nd edition, Chapter 3.3.