cs 335: top-down parsing - cse - iit kanpur...cs 335: top-down parsing swarnendu biswas semester...
Post on 23-Jul-2020
24 Views
Preview:
TRANSCRIPT
CS 335: Top-down ParsingSwarnendu Biswas
Semester 2019-2020-II
CSE, IIT Kanpur
Content influenced by many excellent references, see References slide for acknowledgements.
Example Expression Grammar
đđĄđđđĄ â đžđ„đđ
đžđ„đđ â đžđ„đđ + đđđđ đžđ„đđ â đđđđ đđđđ
đđđđ â đđđđ Ă đčđđđĄđđ đđđđ Ă· đčđđđĄđđ đčđđđĄđđ
đčđđđĄđđ â đžđ„đđ | num | name
CS 335 Swarnendu Biswas
pri
ori
ty
Derivation of name + name Ă nameSentential Form Input
đžđ„đđ â name + name Ă name
đžđ„đđ + đđđđ â name + name Ă name
đđđđ + đđđđ â name + name Ă name
đčđđđĄđđ + đđđđ â name + name Ă name
name + đđđđ â name + name Ă name
name + đđđđ name â +name Ă name
name + đđđđ name +â name Ă name
name + đđđđ Ă đčđđđĄđđ name +â name Ă name
name + đčđđđĄđđ Ă đčđđđĄđđ name +â name Ă name
name + name Ă đčđđđĄđđ name +â name Ă name
name + name Ă đčđđđĄđđ name + name âĂ name
name + name Ă đčđđđĄđđ name + name Ăâ name
name + name Ă name name + name Ăâ name
name + name Ă name name + name Ă name â
CS 335 Swarnendu Biswas
Derivation of name + name Ă nameSentential Form Input
đžđ„đđ â name + name Ă name
đžđ„đđ + đđđđ â name + name Ă name
đđđđ + đđđđ â name + name Ă name
đčđđđĄđđ + đđđđ â name + name Ă name
name + đđđđ â name + name Ă name
name + đđđđ name â +name Ă name
name + đđđđ name +â name Ă name
name + đđđđ Ă đčđđđĄđđ name +â name Ă name
name + đčđđđĄđđ Ă đčđđđĄđđ name +â name Ă name
name + name Ă đčđđđĄđđ name +â name Ă name
name + name Ă đčđđđĄđđ name + name âĂ name
name + name Ă đčđđđĄđđ name + name Ăâ name
name + name Ă name name + name Ăâ name
name + name Ă name name + name Ă name â
CS 335 Swarnendu Biswas
The current input terminal being scanned is called the lookahead symbol
Derivation of name + name Ă name
CS 335 Swarnendu Biswas
đđđđĄđđđĄ
đđđžđ„đđ đžđ„đđ
đžđ„đđ + đđđđ
đđ đžđ„đđ
đžđ„đđ + đđđđ
đđđđ
đđ đžđ„đđ
đžđ„đđ + đđđđ
Term
đčđđđĄđđ
đđ đžđ„đđ
đžđ„đđ + đđđđ
Term
đčđđđĄđđ
name
Derivation of name + name Ă name
CS 335 Swarnendu Biswas
đđ đžđ„đđ
đžđ„đđ + đđđđ
Term
đčđđđĄđđ
name
đđ đžđ„đđ
đžđ„đđ + đđđđ
Term
đčđđđĄđđ
name
đđđđ Ă đčđđđĄđđ
đđ đžđ„đđ
đžđ„đđ + đđđđ
Term
đčđđđĄđđ
name
đđđđ Ă đčđđđĄđđ
đčđđđĄđđ
đđ đžđ„đđ
đžđ„đđ + đđđđ
Term
đčđđđĄđđ
name
đđđđ Ă đčđđđĄđđ
đčđđđĄđđ
name
Derivation of name + name Ă name
CS 335 Swarnendu Biswas
đđ đžđ„đđ
đžđ„đđ + đđđđ
Term
đčđđđĄđđ
name
đđđđ Ă đčđđđĄđđ
đčđđđĄđđ
name
đđ đžđ„đđ
đžđ„đđ + đđđđ
Term
đčđđđĄđđ
name
đđđđ Ă đčđđđĄđđ
đčđđđĄđđ
name
name
General Idea of Top-down Parsing
Start with the root (start symbol) of the parse tree
Grow the tree downwards by expanding productions at the lower levels of the tree
âą Select a nonterminal and extend it by adding children corresponding to the right side of some production for the nonterminal
Repeat till
âą Lower fringe consists only terminals and the input is consumed
Top-down parsing basically finds a leftmost derivation for an input string
CS 335 Swarnendu Biswas
General Idea of Top-down Parsing
Start with the root of the parse tree
Grow the tree by expanding productions at the lower levels of the tree
âą Extend a nonterminal by adding children corresponding to the right side of some production for the nonterminal
Repeat till
âą Lower fringe consists only terminals and the input is consumed
âą Mismatch in the lower fringe and the remaining input stream
âą Selection of a production may involve trial-and-error
âą Wrong choice of productions while expanding nonterminals
âą Input character stream is not part of the language
CS 335 Swarnendu Biswas
Leftmost Top-down Parsing Algorithmroot = node for Start symbol
curr = root
push(null) // Stack
word = nextWord()
while (true):
if curr â Nonterminal:
pick next rule đŽ ⶠđœ1đœ2âŠđœđ to expand curr
create nodes for đœ1, đœ2, âŠ, đœđ as children of curr
push(đœđ, đœđâ1, đœ1)
curr = đœ1
if curr == word:
word = nextWord()
curr = pop()
if word == eof and curr == null:
accept input
else
backtrack
CS 335 Swarnendu Biswas
Implementing Backtracking
âą Extend the previous algorithm to backtrackâą Set curr to parent and delete the children
âą Expand the node curr with untried rules if anyâą Create child nodes for each symbol in the right hand of the production
âą Push those symbols onto the stack in reverse order
âą Set curr to the first child node
âą Move curr up the tree if there are no untried rules
âą Report a syntax error when there are no more moves
CS 335 Swarnendu Biswas
Example of Top-down ParsingRule # Sentential Form Input
đžđ„đđ â name+ name Ă name
1 đžđ„đđ + đđđđ â name+ name Ă name
3 đđđđ + đđđđ â name+ name Ă name
6 đčđđđĄđđ + đđđđ â name+ name Ă name
9 name+ đđđđ â name+ name Ă name
name+ đđđđ name â +name Ă name
name+ đđđđ name+â name Ă name
4 name+ đđđđ Ă đčđđđĄđđ name+â name Ă name
6 name+ đčđđđĄđđ Ă đčđđđĄđđ name+â name Ă name
9 name+ name Ă đčđđđĄđđ name+â name Ă name
name+ name Ă đčđđđĄđđ name+ name âĂ name
name+ name Ă đčđđđĄđđ name+ name Ăâ name
9 name+ name Ă name name+ name Ăâ name
name+ name Ă name name+ name Ă name â
CS 335 Swarnendu Biswas
Rule # Production
0 đđĄđđđĄ â đžđ„đđ
1 đžđ„đđ â đžđ„đđ + đđđđ
2 đžđ„đđ â đžđ„đđ â đđđđ
3 đžđ„đđ â đđđđ
4 đđđđ â đđđđ Ă đčđđđĄđđ
5 đđđđ â đđđđ Ă· đčđđđĄđđ
6 đđđđ â đčđđđĄđđ
7 đčđđđĄđđ â (đžđ„đđ)
8 đčđđđĄđđ â num
9 đčđđđĄđđ â name
Example of Top-down ParsingRule # Sentential Form Input
đžđ„đđ â name+ name Ă name
1 đžđ„đđ + đđđđ â name+ name Ă name
3 đđđđ + đđđđ â name+ name Ă name
6 đčđđđĄđđ + đđđđ â name+ name Ă name
9 name+ đđđđ â name+ name Ă name
name+ đđđđ name â +name Ă name
name+ đđđđ name+â name Ă name
4 name+ đđđđ Ă đčđđđĄđđ name+â name Ă name
6 name+ đčđđđĄđđ Ă đčđđđĄđđ name+â name Ă name
9 name+ name Ă đčđđđĄđđ name+â name Ă name
name+ name Ă đčđđđĄđđ name+ name âĂ name
name+ name Ă đčđđđĄđđ name+ name Ăâ name
9 name+ name Ă name name+ name Ăâ name
name+ name Ă name name+ name Ă name â
CS 335 Swarnendu Biswas
Rule # Production
0 đđĄđđđĄ â đžđ„đđ
1 đžđ„đđ â đžđ„đđ + đđđđ
2 đžđ„đđ â đžđ„đđ â đđđđ
3 đžđ„đđ â đđđđ
4 đđđđ â đđđđ Ă đčđđđĄđđ
5 đđđđ â đđđđ Ă· đčđđđĄđđ
6 đđđđ â đčđđđĄđđ
7 đčđđđĄđđ â (đžđ„đđ)
8 đčđđđĄđđ â num
9 đčđđđĄđđ â name
How does a top-down parser choose which rule to apply?
Example of Top-down Parsing
Rule # Sentential Form Input
đžđ„đđ â name+ name Ă name
1 đžđ„đđ + đđđđ â name+ name Ă name
1 đžđ„đđ + đđđđ + đđđđ â name+ name Ă name
1 đžđ„đđ + đđđđ + đđđđ +⯠â name+ name Ă name
1 ⊠â name+ name Ă name
1 ⊠â name+ name Ă name
CS 335 Swarnendu Biswas
Rule # Production
0 đđĄđđđĄ â đžđ„đđ
1 đžđ„đđ â đžđ„đđ + đđđđ
2 đžđ„đđ â đžđ„đđ â đđđđ
3 đžđ„đđ â đđđđ
4 đđđđ â đđđđ Ă đčđđđĄđđ
5 đđđđ â đđđđ Ă· đčđđđĄđđ
6 đđđđ â đčđđđĄđđ
7 đčđđđĄđđ â (đžđ„đđ)
8 đčđđđĄđđ â num
9 đčđđđĄđđ â name
Example of Top-Down Parsing
Rule # Sentential Form Input
đžđ„đđ â name+ name Ă name
1 đžđ„đđ + đđđđ â name+ name Ă name
1 đžđ„đđ + đđđđ + đđđđ â name+ name Ă name
1 đžđ„đđ + đđđđ + đđđđ +⯠â name+ name Ă name
1 ⊠â name+ name Ă name
1 ⊠â name+ name Ă name
CS 335 Swarnendu Biswas
Rule # Production
0 đđĄđđđĄ â đžđ„đđ
1 đžđ„đđ â đžđ„đđ + đđđđ
2 đžđ„đđ â đžđ„đđ â đđđđ
3 đžđ„đđ â đđđđ
4 đđđđ â đđđđ Ă đčđđđĄđđ
5 đđđđ â đđđđ Ă· đčđđđĄđđ
6 đđđđ â đčđđđĄđđ
7 đčđđđĄđđ â (đžđ„đđ)
8 đčđđđĄđđ â num
9 đčđđđĄđđ â name
A top-down parser can loop indefinitely with left-recursive grammar
Left Recursion
âą A grammar is left-recursive if it has a nonterminal đŽ such that there is
a derivation đŽ Ö+đŽđŒ for some string đŒ
âą Direct left recursion: There is a production of the form đŽ â đŽđŒ
âą Indirect left recursion: First symbol on the right-hand side of a rule can derive the symbol on the left
CS 335 Swarnendu Biswas
We can often reformulate a grammar to avoid left recursion
Remove Left Recursion
CS 335 Swarnendu Biswas
đŽ â đŽđŒ1 đŽđŒ2 ⊠|đŽđŒđ đœ1 ⊠|đœđ
đŽ â đœ1đŽâČ|đœ2đŽ
âČ|âŠ| đœđđŽâČ
đŽâČ â đŒ1đŽâČ đŒ2đŽ
âČ âŠ |đŒđđŽâČ|đ
Remove Left Recursion
CS 335 Swarnendu Biswas
đž â đž + đ | đđ â đ â đč | đčđč â đž | id
đž â đđžâČ
đžâČ â +đđžâČ
đ â đčđâČ
đâČ ââ đčđâČ
đč â đž |id
Non-Left-Recursive Expression Grammar
CS 335 Swarnendu Biswas
Rule # Production
0 đđĄđđđĄ â đžđ„đđ
1 đžđ„đđ â đžđ„đđ + đđđđ
2 đžđ„đđ â đžđ„đđ â đđđđ
3 đžđ„đđ â đđđđ
4 đđđđ â đđđđ Ă đčđđđĄđđ
5 đđđđ â đđđđ Ă· đčđđđĄđđ
6 đđđđ â đčđđđĄđđ
7 đčđđđĄđđ â (đžđ„đđ)
8 đčđđđĄđđ â num
9 đčđđđĄđđ â name
Rule # Production
0 đđĄđđđĄ â đžđ„đđ
1 đžđ„đđ â đđđđ đžđ„đđâČ
2 đžđ„đđâČ â + đđđđ đžđ„đđâČ
3 đžđ„đđâČ â â đđđđ đžđ„đđâČ
4 đžđ„đđâČ â đ
5 đđđđ â đčđđđĄđđ đđđđâČ
6 đđđđâČ âĂ đčđđđĄđđ đđđđâČ
7 đđđđâČ âĂ· đčđđđĄđđ đđđđâČ
8 đđđđâČ â đ
9 đčđđđĄđđ â (đžđ„đđ)
10 đčđđđĄđđ â num
11 đčđđđĄđđ â name
Indirect Left Recursion
âą There is a left recursion because đ â đŽđ â đđđ
CS 335 Swarnendu Biswas
đ â đŽđ | đđŽ â đŽđ đđ đ
Eliminating Left Recursion
âą Input: Grammar đș with no cycles or đâproductions
âą AlgorithmArrange nonterminals in some order đŽ1, đŽ2, ⊠, đŽđfor đ â 1âŠđ
for đ â 1 to đ â 1
If â a production đŽđ â đŽđđŸ
Replace đŽđ â đŽđđŸ with one or more productions that expand đŽđ
Eliminate the immediate left recursion among the đŽđ productions
CS 335 Swarnendu Biswas
Eliminating Left Recursion
âą Input: Grammar đș with no cycles or đâproductions
âą AlgorithmArrange nonterminals in some order đŽ1, đŽ2, ⊠, đŽđfor đ â 1âŠđ
for đ â 1 to đ â 1
If â a production đŽđ â đŽđđŸ
Replace đŽđ â đŽđđŸ with one or more productions that expand đŽđ
Eliminate the immediate left recursion among the đŽđ productions
CS 335 Swarnendu Biswas
Loop invariant at the start of outer iteration đ
âđ < đ, no production expanding đŽđ has đŽđ in its righthand side for all đ < đ
Eliminating Indirect Left Recursion
CS 335 Swarnendu Biswas
đ â đŽđ | đđŽ â đŽđ đđ đ
đ â đŽđ | đđŽ â đđđŽâČ | đŽâČ
đŽâČ â đđŽâČ đđđŽâČ đ
Cost of Backtracking
Backtracking is expensive
âą Parser expands a nonterminal with the wrong rule
âą Mismatch between the lower fringe of the parse tree and the input is detected
âą Parser undoes the last few actions
âą Parser tries other productions if any
CS 335 Swarnendu Biswas
Avoid Backtracking
âą Parser is to select the next rule âą Compare the curr symbol and the next input symbol called the lookahead
âą Use the lookahead to disambiguate the possible production rules
âą Backtrack-free grammar is a CFG for which the leftmost, top-down parser can always predict the correct rule with one word lookahead âą Also called a predictive grammar
CS 335 Swarnendu Biswas
FIRST Set
âą Intuitionâą Each alternative for the leftmost nonterminal leads to a distinct terminal
symbol
âą Which rule to choose becomes obvious by comparing the next word in the input stream
âą Given a string đŸ of terminal and nonterminal symbols, FIRST(đŸ) is the set of all terminal symbols that can begin any string derived from đŸâą We also need to keep track of which symbols can produce the empty string
âą FIRST: (đđ âȘ đ âȘ đ, EOF ) â (đ âȘ đ, EOF )
CS 335 Swarnendu Biswas
Steps to Compute FIRST Set
1. If đ is a terminal, then FIRST đ = {đ}
2. If đ â đ is a production, then đ â FIRST(đ)
3. If đ is a nonterminal and đ â đ1đ2⊠đđ is a productionI. Everything in FIRST(đ1) is in FIRST đ
II. If for some đ, đ â FIRST(đđ) and â1 †đ < đ, đ â FIRST(đđ), then đ âFIRST(đ)
III. If đ â FIRST(đ1âŠđđ), then đ â FIRST(đ)
CS 335 Swarnendu Biswas
FIRST Set
âą Generalize FIRST relation to string of symbols
FIRST đđŸ â FIRST đ if đ â đ
FIRST đđŸ â FIRST đ âȘ FIRST đŸ if đ â đ
CS 335 Swarnendu Biswas
Compute FIRST Set
CS 335 Swarnendu Biswas
đđĄđđđĄ â đžđ„đđ
đžđ„đđ â đđđđ đžđ„đđâČ
đžđ„đđâČ â +đđđđ đžđ„đđâČ
âđđđđ đžđ„đđâČ đ
đđđđ â đčđđđĄđđ đđđđâČ
đđđđâČ âĂ đčđđđĄđđ đđđđâČ
Ă· đčđđđĄđđ đđđđâČ đ
đčđđđĄđđ â (đžđ„đđ) | num | name
Compute FIRST Set
FIRST đžđ„đđ = {(, name, num}
FIRST đžđ„đđâČ = {+,â, đ}
FIRST đđđđ = {(, name, num}
FIRST đđđđâČ = {đ Ă,Ă·}
FIRST đčđđđĄđđ = {(, name,num}
CS 335 Swarnendu Biswas
đđĄđđđĄ â đžđ„đđ
đžđ„đđ â đđđđ đžđ„đđâČ
đžđ„đđâČ â +đđđđ đžđ„đđâČ
âđđđđ đžđ„đđâČ đ
đđđđ â đčđđđĄđđ đđđđâČ
đđđđâČ âĂ đčđđđĄđđ đđđđâČ
Ă· đčđđđĄđđ đđđđâČ đ
đčđđđĄđđ â (đžđ„đđ) | num | name
FOLLOW Set
âą FOLLOW(đ) is the set of terminals that can immediately follow đâą That is, đĄ â FOLLOW(đ) if there is any derivation containing đđĄ
CS 335 Swarnendu Biswas
đ
đŽ đ đœđŒ
đ đŸâŠ
Terminal đ is in FIRST(đŽ) and đis in FOLLOW(đŽ)
Steps to Compute FOLLOW Set
1. Place $ in FOLLOW(đ) where đ is the start symbol and $ is the end marker
2. If there is a production đŽ â đŒđ”đœ, then everything in FIRST(đœ)except đ is in FOLLOW(đ”)
3. If there is a production đŽ â đŒđ”, or a production đŽ â đŒđ”đœ where FIRST(đœ) contains đ, then everything in FOLLOW(đŽ) is in FOLLOW(đ”)
CS 335 Swarnendu Biswas
Compute FOLLOW Set
CS 335 Swarnendu Biswas
đđĄđđđĄ â đžđ„đđ
đžđ„đđ â đđđđ đžđ„đđâČ
đžđ„đđâČ â +đđđđ đžđ„đđâČ
âđđđđ đžđ„đđâČ đ
đđđđ â đčđđđĄđđ đđđđâČ
đđđđâČ âĂ đčđđđĄđđ đđđđâČ
Ă· đčđđđĄđđ đđđđâČ đ
đčđđđĄđđ â (đžđ„đđ) | num | name
Compute FOLLOW Set
FOLLOW đžđ„đđ = {$, )}
FOLLOW đžđ„đđâČ = {$,)}
FOLLOW đđđđ = {$, +,â, )}
FOLLOW đđđđâČ = {$,+,â, )}
FOLLOW đčđđđĄđđ = {$, +,â,Ă,Ă·, )}
CS 335 Swarnendu Biswas
đđĄđđđĄ â đžđ„đđ
đžđ„đđ â đđđđ đžđ„đđâČ
đžđ„đđâČ â +đđđđ đžđ„đđâČ
âđđđđ đžđ„đđâČ đ
đđđđ â đčđđđĄđđ đđđđâČ
đđđđâČ âĂ đčđđđĄđđ đđđđâČ
Ă· đčđđđĄđđ đđđđâČ đ
đčđđđĄđđ â (đžđ„đđ) | num | name
Conditions for Backtrack-Free Grammar
âą Consider a production đŽ â đœ
FIRST+ = áFIRST đœ if đ â FIRST(đœ)
FIRST đœ âȘ FOLLOW đŽ otherwise
âą For any nonterminal đŽ where đŽ â đœ1|đœ2|âŠ| đœđ, a backtrack-free grammar has the property FIRST+ đŽ â đœđ â© FIRST+ đŽ â đœđ = đ, â1 †đ, đ †đ, đ â đ
CS 335 Swarnendu Biswas
Backtracking
đđĄđđđĄ â đžđ„đđ
đžđ„đđ â đđđđđžđ„đđâČ
đžđ„đđâČ â +đđđđđžđ„đđâČ
âđđđđđžđ„đđâČ đ
đđđđ â đčđđđĄđđđđđđâČ
đđđđâČ âĂ đčđđđĄđđđđđđâČ
Ă· đčđđđĄđđđđđđâČ đ
đčđđđĄđđ â name
| name [ đŽđđđđđ đĄ ]
| name ( đŽđđđđđ đĄ )
đŽđđđđđ đĄ â đžđ„đđ đđđđđŽđđđ
đđđđđŽđđđ â , đžđ„đđ đđđđđŽđđđ
| đ
CS 335 Swarnendu Biswas
Backtracking
đđĄđđđĄ â đžđ„đđ
đžđ„đđ â đđđđđžđ„đđâČ
đžđ„đđâČ â +đđđđđžđ„đđâČ
âđđđđđžđ„đđâČ đ
đđđđ â đčđđđĄđđđđđđâČ
đđđđâČ âĂ đčđđđĄđđđđđđâČ
Ă· đčđđđĄđđđđđđâČ đ
đčđđđĄđđ â name
| name [ đŽđđđđđ đĄ ]
| name ( đŽđđđđđ đĄ )
đŽđđđđđ đĄ â đžđ„đđ đđđđđŽđđđ
đđđđđŽđđđ â , đžđ„đđ đđđđđŽđđđ
| đ
CS 335 Swarnendu Biswas
Not all grammars are backtrack free
Left Factoring
âą Left factoring is the process of extracting and isolating common prefixes in a set of productions
âą Algorithm
CS 335 Swarnendu Biswas
đčđđđĄđđ â đđđđ đŽđđđąđđđđĄđ đŽđđđąđđđđĄđ â đŽđđđżđđ đĄ đŽđđđżđđ đĄ đ
đŽ â đŒđœ1 đŒđœ2 ⊠đŒđœđ đŸ1 đŸ2 ⊠|đŸđ
đŽ â đŒđ”|đŸ1 đŸ2 ⊠|đŸđđ” â đœ1 đœ2 ⊠|đœđ
Key Insight in Using Top-Down Parsing
âą Efficiency depends on the accuracy of selecting the correct production for expanding a nonterminalâą Parser may not terminate in the worst case
âą A large subset of the context-free grammars can be parsed without backtracking
CS 335 Swarnendu Biswas
Recursive-Descent Parsing
CS 335 Swarnendu Biswas
Recursive-Descent Parsing
âą Recursive-descent parsing is a form of top-down parsing that mayrequire backtracking
âą Consists of a set of procedures, one for each nonterminal
CS 335 Swarnendu Biswas
void A() {Choose an A-production đŽ â đ1đ2âŠđđfor đ â 1âŠđ
if đđ is a nonterminalcall procedure đđ()
else if đđ equals the current input symbol đadvance the input to the next symbol
else // error
}
Limitations with Recursive-Descent Parsing
âą Consider a grammar with two productions đ â đŸ1 and đ â đŸ2âą Suppose FIRST(đŸ1) â© FIRST(đŸ2) â đ
âą Say đ is the common terminal symbol
âą Function corresponding to đ will not know which production to use on input token đ
CS 335 Swarnendu Biswas
Recursive-Descent Parsing with Backtracking
âą To support backtracking âą All productions should be tried in some order
âą Failure for some production implies we need to try remaining productions
âą Report an error only when there are no other rules
CS 335 Swarnendu Biswas
Predictive Parsing
âą Special case of recursive-descent parsing that does not require backtrackingâą Lookahead symbol unambiguously determines which production rule to use
âą Advantage is that the algorithm is simple and the parser can be constructed by hand
CS 335 Swarnendu Biswas
đ đĄđđĄ â expr ;| if đđ„đđ đ đĄđđĄ| for đđđĄđđ„đđ ; đđđĄđđ„đđ ; đđđĄđđ„đđ đ đĄđđĄ| other
đđđĄđđ„đđ â đ | expr
Pseudocode for a Predictive Parser
void stmt() {switch(lookahead) {
case expr:match(expr); match(â;â); break;
case if:match(if); match(â(â); match(expr); match(â)â); stmt(); break;
case for:match(for); match(â(â); optexpr(); match(â;â); optexpr(); match(â;â); optexpr(); match(â)â); stmt(); break;
case other:match(other); break;
default:report(âsyntax errorâ);
}}
CS 335 Swarnendu Biswas
LL(1) Grammars
âą Class of grammars for which no backtracking is requiredâą First L stands for left-to-right scan, second L stands for leftmost derivation
âą There is one lookahead token
âą No left-recursive or ambiguous grammar can be LL(1)
âą In LL(k), k stands for k lookahead tokensâą Predictive parsers accept LL(k) grammars
âą Every LL(1) grammar is a LL(2) grammar
CS 335 Swarnendu Biswas
Nonrecursive Table-Driven Predictive Parser
CS 335 Swarnendu Biswas
Parsing Table đ
Predictive Parsing Program
a + b $Input
OutputStack X
Y
Z
$
Predictive Parsing Algorithmâą Input: String đ€ and parsing table đ for grammar đș
âą Algorithm:Let đ be the first symbol in đ€Let đ be the symbol at the top of the stack while đ â $:
if đ == đ:pop the stack and advance the input
else if đ is a terminal or đ[đ, đ] is an error entry:error
else if đ đ, đ == đ â đ1đ2âŠđđ:output the production pop the stackpush đđđđâ1âŠđ1 onto the stack
đ â top stack symbol
CS 335 Swarnendu Biswas
Predictive Parsing Table
CS 335 Swarnendu Biswas
Nonterminal id + * ( ) $
đž đž â đđžâČ đž â đđžâČ
đžâČ đžâČ â +đđžâČ đžâČ â đ đžâČ â đ
đ đ â đčđâČ đ â đčđâČ
đâČ đâČ â đ đâČ ââ đčđâČ đâČ â đ đâČ â đ
đč đč â id đč â (đž)
đž â đđžâČ
đžâČ â +đđžâČ | đđ â đčđâČ
đâČ ââ đčđâČ | đđč â đž | id
Construction of a Predictive Parsing Table
âą Input: Grammar đș
âą Algorithm:âą For each production đŽ â đŒ in đș,
âą For each terminal đ in FIRST đŒ , add đŽ â đŒ to đ[đŽ, đ]
âą If đ is in FIRST đŒ , then for each terminal đ in FOLLOW(đŽ), add đŽ â đŒ to đ đŽ, đ
âą If đ is in FIRST đŒ and $ is in FOLLOW(đŽ), add đŽ â đŒ to đ[đŽ, $]
âą No production in đ[đŽ, đ] indicates error
CS 335 Swarnendu Biswas
Working of Predictive ParserMatched Stack Input Action
đž$ id+ id â id$
đđžâČ$ id+ id â id$ Output đž â đđžâČ
đčđâČđžâČ$ id+ id â id$ Output đ â đčđâČ
idđâČđžâČ$ id+ id â id$ Output đč â id
id đâČđžâČ$ +id â id$ Match id
id đžâČ$ +id â id$ Output đâČ â đ
id +đđžâČ$ +id â id$ Output đžâČ â +đđžâČ
id+ đđžâČ$ id â id$ Match +
id+ đčđâČđžâČ$ id â id$ Output đ â đčđâČ
id+ idđâČđžâČ$ id â id$ Output đč â id
CS 335 Swarnendu Biswas
Working of Predictive ParserMatched Stack Input Action
âŠ
id+ idđâČđžâČ$ id â id$ Output đč â id
id+ id đâČđžâČ$ â id$ Match id
id+ id â đčđâČđžâČ$ â id$ Output đâČ ââ đčđâČ
id+ idâ đčđâČđžâČ$ id$ Match â
id+ idâ idđâČđžâČ$ id$ Output đč â id
id+ idâid đâČđžâČ$ $ Match id
id+ idâid đžâČ$ $ Output đâČ â đ
id+ idâid $ $ Output đžâČ â đ
CS 335 Swarnendu Biswas
Predictive Parsing
âą Grammars whose predictive parsing tables contain no duplicate entries are called LL(1)
âą If grammar đș is left-recursive or is ambiguous, then parsing table đwill have at least one multiply-defined cell
âą Some grammars cannot be transformed into LL(1)âą The adjacent grammar is ambiguous
CS 335 Swarnendu Biswas
đ â đđžđĄđđâČ | đđâČ â đđ | đđž â đ
Predictive Parsing Table
CS 335 Swarnendu Biswas
Nonterminal a b e i t $đ đ â đ đ â đđžđĄđđâČ
đâČ đâČ â đđâČ â đđ
đâČ â đ
đž đž â đ đ â đčđâČ
đ â đđžđĄđđâČ| đđâČ â đđ | đđž â đ
Error Recovery in Predictive Parsing
âą Error conditionsâą Terminal on top of the stack does not match the next input symbol
âą Nonterminal đŽ is on top of the stack, đ is the next input symbol, and đ[đŽ, đ]is error
âą Choicesâą Raise an error and quit parsing
âą Print an error message, try to recover from the error, and continue with compilation
CS 335 Swarnendu Biswas
Error Recovery in Predictive Parsing
âą Panic mode â skip over symbols until a token in a set of synchronizing (synch) tokens appearsâą Add all tokens in FOLLOW(đŽ) to the synch set for đŽ
âą Add symbols in FIRST(đŽ) to the synch set for đŽ
âą Add keywords that can begin sentences
âą âŠ
CS 335 Swarnendu Biswas
Predictive Parsing Table with Synchronizing Tokens
CS 335 Swarnendu Biswas
Nonterminal id + * ( ) $
đž đž â đđžâČ đž â đđžâČ synch synch
đžâČ đžâČ â +đđžâČ đžâČ â đ đžâČ â đ
đ đ â đčđâČ synch đ â đčđâČ synch synch
đâČ đâČ â đ đâČ ââ đčđâČ đâČ â đ đâČ â đ
đč đč â id synch synch đč â (đž) synch synch
đž â đđžâČ
đžâČ â +đđžâČ | đđ â đčđâČ
đâČ ââ đčđâČ | đđč â đž | id
Error Recover Moves by Predictive ParserStack Input Remark
đž$ )id â +id$ Error, skip )
đž$ id â +id$ id is in FIRST(đž)
đđžâČ$ id â +id$
đčđđžâČ$ id â +id$
idđđžâČ$ id â +id$
đâČđžâČ$ â +id$
â đčđâČđžâČ$ â +id$
đčđâČđžâČ$ +id$ Error, đ đč,+ = synch
đâČđžâČ$ +id$ đč has been popped
đžâČ$ +id$
CS 335 Swarnendu Biswas
Error Recover Moves by Predictive ParserStack Input Remark
+đđžâČ$ +id$
đđžâČ$ id$
đčđâČđžâČ$ id$
idđâČđžâČ$ id$
đâČđžâČ$ $
đžâČ$ $
$ $
CS 335 Swarnendu Biswas
References
âą A. Aho et al. Compilers: Principles, Techniques, and Tools, 2nd edition, Chapter 4.4.
âą K. Cooper and L. Torczon. Engineering a Compiler, 2nd edition, Chapter 3.3.
CS 335 Swarnendu Biswas
top related