
CS 335: Top-down Parsing

Swarnendu Biswas

Semester 2019-2020-II

CSE, IIT Kanpur

Content influenced by many excellent references, see References slide for acknowledgements.

Example Expression Grammar

Start → Expr
Expr → Expr + Term | Expr − Term | Term
Term → Term × Factor | Term ÷ Factor | Factor
Factor → ( Expr ) | num | name

Priority: operators introduced lower in the grammar (× and ÷ in Term) bind tighter than those introduced higher up (+ and − in Expr)

Derivation of name + name × name

Sentential Form              Input
Expr                         ↑ name + name × name
Expr + Term                  ↑ name + name × name
Term + Term                  ↑ name + name × name
Factor + Term                ↑ name + name × name
name + Term                  ↑ name + name × name
name + Term                  name ↑ + name × name
name + Term                  name + ↑ name × name
name + Term × Factor         name + ↑ name × name
name + Factor × Factor       name + ↑ name × name
name + name × Factor         name + ↑ name × name
name + name × Factor         name + name ↑ × name
name + name × Factor         name + name × ↑ name
name + name × name           name + name × ↑ name
name + name × name           name + name × name ↑


The current input terminal being scanned is called the lookahead symbol

Derivation of name + name × name

[Figure: the parse tree grown one step at a time; each tree corresponds to one step of the leftmost derivation (⇒lm) listed here]

Start ⇒ Expr
      ⇒ Expr + Term
      ⇒ Term + Term
      ⇒ Factor + Term
      ⇒ name + Term
      ⇒ name + Term × Factor
      ⇒ name + Factor × Factor
      ⇒ name + name × Factor
      ⇒ name + name × name

General Idea of Top-down Parsing

Start with the root (start symbol) of the parse tree

Grow the tree downwards by expanding productions at the lower levels of the tree

• Select a nonterminal and extend it by adding children corresponding to the right side of some production for the nonterminal

Repeat till

• Lower fringe consists only of terminals and the input is consumed

Top-down parsing basically finds a leftmost derivation for an input string

General Idea of Top-down Parsing

Start with the root of the parse tree

Grow the tree by expanding productions at the lower levels of the tree

• Extend a nonterminal by adding children corresponding to the right side of some production for the nonterminal

Repeat till

• Lower fringe consists only of terminals and the input is consumed
• Mismatch in the lower fringe and the remaining input stream
    • Selection of a production may involve trial-and-error
    • Wrong choice of productions while expanding nonterminals
    • Input character stream is not part of the language

Leftmost Top-down Parsing Algorithm

root = node for Start symbol
curr = root
push(null)   // Stack
word = nextWord()
while (true):
    if curr ∈ Nonterminal:
        pick next rule A → β1 β2 … βn to expand curr
        create nodes for β1, β2, …, βn as children of curr
        push(βn, βn−1, …, β2)   // β1 becomes curr, so it is not pushed
        curr = β1
    else if curr == word:
        word = nextWord()
        curr = pop()
    else if word == eof and curr == null:
        accept input
    else:
        backtrack

Implementing Backtracking

• Extend the previous algorithm to backtrack
    • Set curr to parent and delete the children
• Expand the node curr with untried rules if any
    • Create child nodes for each symbol in the right-hand side of the production
    • Push those symbols onto the stack in reverse order
    • Set curr to the first child node
• Move curr up the tree if there are no untried rules
• Report a syntax error when there are no more moves
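The expand, match, and backtrack steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the course's implementation: it uses Python generators and the call stack in place of the explicit stack and parse tree, it only recognizes input (no tree is built), it writes `*` and `/` for the × and ÷ tokens, and it runs on a right-recursive form of the expression grammar because a left-recursive grammar would make this naive search recurse forever. The names `GRAMMAR`, `derive`, and `accepts` are made up for the sketch.

```python
# Right-recursive expression grammar (an assumption made for this sketch).
# "*" and "/" stand in for the × and ÷ tokens; [] is an empty right-hand side.
GRAMMAR = {
    "Expr": [["Term", "Expr'"]],
    "Expr'": [["+", "Term", "Expr'"], ["-", "Term", "Expr'"], []],
    "Term": [["Factor", "Term'"]],
    "Term'": [["*", "Factor", "Term'"], ["/", "Factor", "Term'"], []],
    "Factor": [["(", "Expr", ")"], ["num"], ["name"]],
}


def derive(symbols, tokens, pos):
    """Yield each input position reachable after deriving `symbols` from `pos`."""
    if not symbols:
        yield pos
        return
    curr, rest = symbols[0], symbols[1:]
    if curr in GRAMMAR:
        # Nonterminal: try the rules in order. When one choice yields nothing,
        # the loop moves on to the next untried rule, which is the backtrack.
        for rhs in GRAMMAR[curr]:
            yield from derive(rhs + rest, tokens, pos)
    elif pos < len(tokens) and tokens[pos] == curr:
        # Terminal that matches the current word: consume it.
        yield from derive(rest, tokens, pos + 1)
    # Otherwise: mismatch, yield nothing so the caller tries another rule.


def accepts(tokens, start="Expr"):
    """True if some sequence of rule choices consumes the whole input."""
    return any(end == len(tokens) for end in derive([start], tokens, 0))


print(accepts(["name", "+", "name", "*", "name"]))
print(accepts(["name", "+"]))
```

The first call prints True and the second prints False: for `name +` every rule choice runs out of input, so all alternatives are exhausted and the parser reports failure.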

Example of Top-down Parsing

Rule #   Sentential Form              Input
         Expr                         ↑ name + name × name
1        Expr + Term                  ↑ name + name × name
3        Term + Term                  ↑ name + name × name
6        Factor + Term                ↑ name + name × name
9        name + Term                  ↑ name + name × name
         name + Term                  name ↑ + name × name
         name + Term                  name + ↑ name × name
4        name + Term × Factor         name + ↑ name × name
6        name + Factor × Factor       name + ↑ name × name
9        name + name × Factor         name + ↑ name × name
         name + name × Factor         name + name ↑ × name
         name + name × Factor         name + name × ↑ name
9        name + name × name           name + name × ↑ name
         name + name × name           name + name × name ↑

Rule #   Production
0        Start → Expr
1        Expr → Expr + Term
2        Expr → Expr − Term
3        Expr → Term
4        Term → Term × Factor
5        Term → Term ÷ Factor
6        Term → Factor
7        Factor → ( Expr )
8        Factor → num
9        Factor → name


How does a top-down parser choose which rule to apply?

Example of Top-down Parsing

Rule #   Sentential Form              Input
         Expr                         ↑ name + name × name
1        Expr + Term                  ↑ name + name × name
1        Expr + Term + Term           ↑ name + name × name
1        Expr + Term + Term + ⋯       ↑ name + name × name
1        …                            ↑ name + name × name
1        …                            ↑ name + name × name


A top-down parser can loop indefinitely with a left-recursive grammar: if it keeps choosing rule 1, it expands Expr forever without consuming any input

Left Recursion

• A grammar is left-recursive if it has a nonterminal A such that there is a derivation A ⇒+ Aα for some string α
• Direct left recursion: There is a production of the form A → Aα
• Indirect left recursion: First symbol on the right-hand side of a rule can derive the symbol on the left

We can often reformulate a grammar to avoid left recursion

Remove Left Recursion

A → Aα1 | Aα2 | … | Aαm | β1 | … | βn    (no βi begins with A)

is replaced by

A → β1A′ | β2A′ | … | βnA′
A′ → α1A′ | α2A′ | … | αmA′ | ε

Remove Left Recursion

E → E + T | T
T → T ∗ F | F
F → ( E ) | id

is replaced by

E → TE′
E′ → +TE′ | ε
T → FT′
T′ → ∗FT′ | ε
F → ( E ) | id

Non-Left-Recursive Expression Grammar

Original (left-recursive):

Rule #   Production
0        Start → Expr
1        Expr → Expr + Term
2        Expr → Expr − Term
3        Expr → Term
4        Term → Term × Factor
5        Term → Term ÷ Factor
6        Term → Factor
7        Factor → ( Expr )
8        Factor → num
9        Factor → name

Non-left-recursive:

Rule #   Production
0        Start → Expr
1        Expr → Term Expr′
2        Expr′ → + Term Expr′
3        Expr′ → − Term Expr′
4        Expr′ → ε
5        Term → Factor Term′
6        Term′ → × Factor Term′
7        Term′ → ÷ Factor Term′
8        Term′ → ε
9        Factor → ( Expr )
10       Factor → num
11       Factor → name
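The rewrite of immediate left recursion is mechanical, so it can be sketched as a small function. This is a minimal illustration, assuming productions are stored as lists of symbol names and the empty list stands for ε; the function name `remove_immediate_left_recursion` and the convention of naming the new nonterminal by appending `'` are choices made for this sketch.

```python
def remove_immediate_left_recursion(nt, productions):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn  as
    A -> b1 A' | ... | bn A'   and   A' -> a1 A' | ... | am A' | epsilon.

    `productions` is a list of right-hand sides, each a list of symbols.
    The empty list [] stands for epsilon.
    """
    # Right-hand sides that start with A contribute their tails (the alphas).
    alphas = [rhs[1:] for rhs in productions if rhs and rhs[0] == nt]
    # Everything else is a beta.
    betas = [rhs for rhs in productions if not rhs or rhs[0] != nt]
    if not alphas:
        return {nt: productions}          # nothing to do
    new_nt = nt + "'"                     # hypothetical naming convention
    return {
        nt: [beta + [new_nt] for beta in betas],
        new_nt: [alpha + [new_nt] for alpha in alphas] + [[]],
    }


expr_rules = [["Expr", "+", "Term"], ["Expr", "-", "Term"], ["Term"]]
for head, rules in remove_immediate_left_recursion("Expr", expr_rules).items():
    print(head, "->", " | ".join(" ".join(r) or "epsilon" for r in rules))
```

Applied to the Expr rules, the result corresponds to rules 1 to 4 of the non-left-recursive table: Expr → Term Expr′ and Expr′ → + Term Expr′ | − Term Expr′ | ε.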

Indirect Left Recursion

S → Aa | b
A → Ac | Sd | ε

• There is a left recursion because S ⇒ Aa ⇒ Sda

Eliminating Left Recursion

• Input: Grammar G with no cycles or ε-productions
• Algorithm

    Arrange nonterminals in some order A1, A2, …, An
    for i ← 1 … n
        for j ← 1 to i − 1
            If ∃ a production Ai → Ajγ
                Replace Ai → Ajγ with one or more productions that expand Aj
        Eliminate the immediate left recursion among the Ai productions
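The two nested loops above can be sketched directly. This is an illustrative version under assumptions, not a reference implementation: the grammar is a dict from nonterminal to a list of right-hand sides, `[]` stands for ε, new nonterminals are named by appending `'`, and the function name `eliminate_left_recursion` is invented here. The example grammar S → Aa | b, A → Ac | Sd | ε does contain an ε-production, which the stated input condition excludes; the procedure happens to work on it anyway.

```python
def eliminate_left_recursion(grammar, order):
    """Remove direct and indirect left recursion.

    grammar: dict mapping a nonterminal to a list of right-hand sides
             (lists of symbols; [] stands for epsilon).
    order:   the chosen ordering A1, A2, ..., An of the nonterminals.
    """
    g = {nt: [list(rhs) for rhs in rules] for nt, rules in grammar.items()}
    for i, ai in enumerate(order):
        # Substitute away any earlier nonterminal Aj that leads a rule of Ai.
        for aj in order[:i]:
            expanded = []
            for rhs in g[ai]:
                if rhs and rhs[0] == aj:
                    expanded.extend(gamma + rhs[1:] for gamma in g[aj])
                else:
                    expanded.append(rhs)
            g[ai] = expanded
        # Eliminate the immediate left recursion among the Ai productions.
        alphas = [rhs[1:] for rhs in g[ai] if rhs and rhs[0] == ai]
        if alphas:
            betas = [rhs for rhs in g[ai] if not rhs or rhs[0] != ai]
            new_nt = ai + "'"             # hypothetical naming convention
            g[ai] = [beta + [new_nt] for beta in betas]
            g[new_nt] = [alpha + [new_nt] for alpha in alphas] + [[]]
    return g


example = {
    "S": [["A", "a"], ["b"]],
    "A": [["A", "c"], ["S", "d"], []],
}
for head, rules in eliminate_left_recursion(example, ["S", "A"]).items():
    print(head, "->", " | ".join(" ".join(r) or "epsilon" for r in rules))
```

With the order S, A the sketch first replaces A → Sd by A → Aad | bd and then removes the immediate recursion, giving A → bdA′ | A′ and A′ → cA′ | adA′ | ε.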

Loop invariant at the start of outer iteration i

∀ k < i, no production expanding Ak has Al as the first symbol of its right-hand side, for all l < k

Eliminating Indirect Left Recursion

S → Aa | b
A → Ac | Sd | ε

becomes

S → Aa | b
A → bdA′ | A′
A′ → cA′ | adA′ | ε

Cost of Backtracking

Backtracking is expensive

• Parser expands a nonterminal with the wrong rule
• Mismatch between the lower fringe of the parse tree and the input is detected
• Parser undoes the last few actions
• Parser tries other productions if any

Avoid Backtracking

• Parser is to select the next rule
    • Compare the curr symbol and the next input symbol called the lookahead
    • Use the lookahead to disambiguate the possible production rules
• Backtrack-free grammar is a CFG for which the leftmost, top-down parser can always predict the correct rule with one word lookahead
    • Also called a predictive grammar

FIRST Set

• Intuition
    • Each alternative for the leftmost nonterminal leads to a distinct terminal symbol
    • Which rule to choose becomes obvious by comparing the next word in the input stream
• Given a string γ of terminal and nonterminal symbols, FIRST(γ) is the set of all terminal symbols that can begin any string derived from γ
    • We also need to keep track of which symbols can produce the empty string
• FIRST: (NT ∪ T ∪ {ε, EOF}) → (T ∪ {ε, EOF})

Steps to Compute FIRST Set

1. If X is a terminal, then FIRST(X) = {X}
2. If X → ε is a production, then ε ∈ FIRST(X)
3. If X is a nonterminal and X → Y1Y2…Yk is a production
    I. Everything in FIRST(Y1) except ε is in FIRST(X)
    II. If for some i, a ∈ FIRST(Yi) and ∀ 1 ≤ j < i, ε ∈ FIRST(Yj), then a ∈ FIRST(X)
    III. If ε ∈ FIRST(Y1…Yk), then ε ∈ FIRST(X)

FIRST Set

• Generalize FIRST relation to string of symbols

FIRST(Xγ) = FIRST(X)                       if X cannot derive ε
FIRST(Xγ) = (FIRST(X) − {ε}) ∪ FIRST(γ)    if X can derive ε
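The rules for FIRST can be read as a fixed-point computation: keep applying them until no set grows. The following is a minimal sketch under assumptions, not the course's code: the grammar is the non-left-recursive expression grammar written as a Python dict, `[]` is an ε right-hand side, the string `"eps"` stands for ε, `*` and `/` stand for × and ÷, and the names `first_of_string` and `compute_first` are invented for the example.

```python
EPS = "eps"  # stands for epsilon

GRAMMAR = {
    "Start": [["Expr"]],
    "Expr": [["Term", "Expr'"]],
    "Expr'": [["+", "Term", "Expr'"], ["-", "Term", "Expr'"], []],
    "Term": [["Factor", "Term'"]],
    "Term'": [["*", "Factor", "Term'"], ["/", "Factor", "Term'"], []],
    "Factor": [["(", "Expr", ")"], ["num"], ["name"]],
}


def first_of_string(symbols, first):
    """FIRST of a symbol string X1 ... Xk, given the per-symbol FIRST sets."""
    result = set()
    for sym in symbols:
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result          # sym cannot vanish, so stop here
    result.add(EPS)                # every symbol can vanish (or k == 0)
    return result


def compute_first(grammar):
    """Iterate the FIRST rules until no set changes."""
    terminals = {s for rules in grammar.values() for rhs in rules for s in rhs}
    terminals -= set(grammar)
    first = {t: {t} for t in terminals}          # rule 1: FIRST(a) = {a}
    first.update({nt: set() for nt in grammar})
    changed = True
    while changed:
        changed = False
        for nt, rules in grammar.items():
            for rhs in rules:
                new = first_of_string(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


FIRST = compute_first(GRAMMAR)
for nt in GRAMMAR:
    print(nt, sorted(FIRST[nt]))
```

The loop needs a few passes here because FIRST(Expr) depends on FIRST(Term), which in turn depends on FIRST(Factor); each pass propagates the sets one level further up.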

Compute FIRST Set

Start → Expr
Expr → Term Expr′
Expr′ → + Term Expr′ | − Term Expr′ | ε
Term → Factor Term′
Term′ → × Factor Term′ | ÷ Factor Term′ | ε
Factor → ( Expr ) | num | name

FIRST(Expr) = {(, name, num}
FIRST(Expr′) = {+, −, ε}
FIRST(Term) = {(, name, num}
FIRST(Term′) = {×, ÷, ε}
FIRST(Factor) = {(, name, num}

FOLLOW Set

• FOLLOW(X) is the set of terminals that can immediately follow X
    • That is, t ∈ FOLLOW(X) if there is any derivation containing Xt

[Figure: parse tree rooted at S whose fringe reads α A a β; the subtree under A derives c γ …]

Terminal c is in FIRST(A) and a is in FOLLOW(A)

Steps to Compute FOLLOW Set

1. Place $ in FOLLOW(S) where S is the start symbol and $ is the end marker
2. If there is a production A → αBβ, then everything in FIRST(β) except ε is in FOLLOW(B)
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B)
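The three steps above can also be run to a fixed point. The sketch below is illustrative only and makes several assumptions: the FIRST sets of the non-left-recursive expression grammar are entered by hand rather than computed, `"eps"` stands for ε, `*` and `/` stand for × and ÷, and the helper names `first_of` and `compute_follow` are invented here.

```python
EPS, EOF = "eps", "$"

GRAMMAR = {
    "Start": [["Expr"]],
    "Expr": [["Term", "Expr'"]],
    "Expr'": [["+", "Term", "Expr'"], ["-", "Term", "Expr'"], []],
    "Term": [["Factor", "Term'"]],
    "Term'": [["*", "Factor", "Term'"], ["/", "Factor", "Term'"], []],
    "Factor": [["(", "Expr", ")"], ["num"], ["name"]],
}

# FIRST sets of the nonterminals, entered by hand for this sketch.
FIRST = {
    "Expr": {"(", "name", "num"},
    "Expr'": {"+", "-", EPS},
    "Term": {"(", "name", "num"},
    "Term'": {"*", "/", EPS},
    "Factor": {"(", "name", "num"},
}


def first_of(symbols):
    """FIRST of a symbol string; a terminal's FIRST set is the terminal itself."""
    out = set()
    for sym in symbols:
        f = FIRST.get(sym, {sym})
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)
    return out


def compute_follow(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(EOF)                          # step 1
    changed = True
    while changed:
        changed = False
        for a, rules in grammar.items():
            for rhs in rules:
                for i, b in enumerate(rhs):
                    if b not in grammar:
                        continue                    # only nonterminals get FOLLOW sets
                    beta_first = first_of(rhs[i + 1:])
                    new = beta_first - {EPS}        # step 2
                    if EPS in beta_first:
                        new |= follow[a]            # step 3
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow


FOLLOW = compute_follow(GRAMMAR, "Start")
for nt in GRAMMAR:
    print(nt, sorted(FOLLOW[nt]))
```

An empty suffix β yields {ε} from `first_of`, so the case A → αB falls out of step 3 without special handling.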

Compute FOLLOW Set

Start → Expr
Expr → Term Expr′
Expr′ → + Term Expr′ | − Term Expr′ | ε
Term → Factor Term′
Term′ → × Factor Term′ | ÷ Factor Term′ | ε
Factor → ( Expr ) | num | name

FOLLOW(Expr) = {$, )}
FOLLOW(Expr′) = {$, )}
FOLLOW(Term) = {$, +, −, )}
FOLLOW(Term′) = {$, +, −, )}
FOLLOW(Factor) = {$, +, −, ×, ÷, )}

Conditions for Backtrack-Free Grammar

• Consider a production A → β

    FIRST+(A → β) = FIRST(β)                if ε ∉ FIRST(β)
    FIRST+(A → β) = FIRST(β) ∪ FOLLOW(A)    otherwise

• For any nonterminal A where A → β1 | β2 | … | βn, a backtrack-free grammar has the property

    FIRST+(A → βi) ∩ FIRST+(A → βj) = ∅,  ∀ 1 ≤ i, j ≤ n, i ≠ j
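The disjointness condition is easy to check once the FIRST and FOLLOW sets are known. The sketch below assumes those sets are supplied by hand, uses `"eps"` for ε, and introduces the hypothetical helpers `first_plus` and `is_backtrack_free`; the second example is a made-up nonterminal whose alternatives all begin with name.

```python
from itertools import combinations

EPS = "eps"  # stands for epsilon


def first_plus(first_beta, follow_a):
    """FIRST+(A -> beta) as defined above."""
    if EPS in first_beta:
        return set(first_beta) | set(follow_a)
    return set(first_beta)


def is_backtrack_free(first_of_alternatives, follow_a):
    """Check pairwise disjointness of FIRST+ over the alternatives of one nonterminal.

    first_of_alternatives: list with FIRST(beta_i) for each alternative.
    follow_a:              FOLLOW(A).
    """
    plus = [first_plus(f, follow_a) for f in first_of_alternatives]
    return all(p.isdisjoint(q) for p, q in combinations(plus, 2))


# Expr' -> + Term Expr' | - Term Expr' | epsilon, with FOLLOW(Expr') = {$, )}
print(is_backtrack_free([{"+"}, {"-"}, {EPS}], {"$", ")"}))

# A hypothetical nonterminal whose three alternatives all start with name
print(is_backtrack_free([{"name"}, {"name"}, {"name"}], {"$"}))
```

The first call prints True, since {+}, {−}, and {ε, $, )} do not overlap. The second prints False: one word of lookahead cannot tell the alternatives apart.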

Backtracking

Start → Expr
Expr → Term Expr′
Expr′ → + Term Expr′ | − Term Expr′ | ε
Term → Factor Term′
Term′ → × Factor Term′ | ÷ Factor Term′ | ε
Factor → name
       | name [ ArgList ]
       | name ( ArgList )
ArgList → Expr MoreArgs
MoreArgs → , Expr MoreArgs
         | ε


Not all grammars are backtrack free

Left Factoring

• Left factoring is the process of extracting and isolating common prefixes in a set of productions
• Algorithm

    A → αβ1 | αβ2 | … | αβn | γ1 | γ2 | … | γj

  is rewritten as

    A → αB | γ1 | γ2 | … | γj
    B → β1 | β2 | … | βn

• Applied to Factor

    Factor → name Arguments
    Arguments → [ ArgList ] | ( ArgList ) | ε
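One pass of left factoring can be sketched as follows. This is a simplified illustration under assumptions: alternatives are grouped by their first symbol only, each group's longest common prefix becomes α, a single pass is made (repeated passes may be needed for nested prefixes), and the generated nonterminal names such as `Factor_tail1` are a hypothetical naming scheme. `[]` stands for ε.

```python
def common_prefix(sequences):
    """Longest prefix shared by all the given symbol lists."""
    prefix = []
    for column in zip(*sequences):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix


def left_factor(nt, alternatives):
    """Factor out common prefixes among the alternatives of `nt` (one pass)."""
    groups = {}
    for rhs in alternatives:
        groups.setdefault(rhs[0] if rhs else None, []).append(rhs)
    result = {nt: []}
    count = 0
    for head, group in groups.items():
        if head is None or len(group) == 1:
            result[nt].extend(group)       # a gamma: nothing shared, keep as is
            continue
        count += 1
        new_nt = f"{nt}_tail{count}"       # hypothetical name for B
        alpha = common_prefix(group)
        result[nt].append(alpha + [new_nt])
        result[new_nt] = [rhs[len(alpha):] for rhs in group]
    return result


factor_rules = [
    ["name"],
    ["name", "[", "ArgList", "]"],
    ["name", "(", "ArgList", ")"],
]
for head, rules in left_factor("Factor", factor_rules).items():
    print(head, "->", " | ".join(" ".join(r) or "epsilon" for r in rules))
```

For the Factor rules the shared prefix is name, so the sketch produces Factor → name Factor_tail1 with Factor_tail1 → ε | [ ArgList ] | ( ArgList ), which has the same shape as the Arguments rewrite.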

Key Insight in Using Top-Down Parsing

• Efficiency depends on the accuracy of selecting the correct production for expanding a nonterminal
    • Parser may not terminate in the worst case
• A large subset of the context-free grammars can be parsed without backtracking

Recursive-Descent Parsing

• Recursive-descent parsing is a form of top-down parsing that may require backtracking
• Consists of a set of procedures, one for each nonterminal

void A() {
    Choose an A-production A → X1X2…Xk
    for i ← 1 … k
        if Xi is a nonterminal
            call procedure Xi()
        else if Xi equals the current input symbol a
            advance the input to the next symbol
        else
            // error
}

Limitations with Recursive-Descent Parsing

• Consider a grammar with two productions X → γ1 and X → γ2
    • Suppose FIRST(γ1) ∩ FIRST(γ2) ≠ ∅
    • Say a is the common terminal symbol
    • Function corresponding to X will not know which production to use on input token a

Recursive-Descent Parsing with Backtracking

• To support backtracking
    • All productions should be tried in some order
    • Failure for some production implies we need to try remaining productions
    • Report an error only when there are no other rules

Predictive Parsing

• Special case of recursive-descent parsing that does not require backtracking
    • Lookahead symbol unambiguously determines which production rule to use
• Advantage is that the algorithm is simple and the parser can be constructed by hand

stmt → expr ;
     | if ( expr ) stmt
     | for ( optexpr ; optexpr ; optexpr ) stmt
     | other
optexpr → ε | expr

Page 45: CS 335: Top-down Parsing - CSE - IIT Kanpur...CS 335: Top-down Parsing Swarnendu Biswas Semester 2019-2020-II CSE, IIT Kanpur Content influenced by many excellent references, see References

Pseudocode for a Predictive Parser

// lookahead holds the current input token;
// match(t) consumes it if it equals t and reports an error otherwise.
void stmt() {
    switch (lookahead) {
    case expr:
        match(expr); match(';'); break;
    case if:
        match(if); match('('); match(expr); match(')'); stmt(); break;
    case for:
        match(for); match('('); optexpr(); match(';'); optexpr(); match(';'); optexpr(); match(')'); stmt(); break;
    case other:
        match(other); break;
    default:
        report("syntax error");
    }
}

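The pseudocode above can be turned into a runnable sketch. The version below is an assumption-laden illustration in Python: tokens are plain strings, `"$"` is an end marker added by the sketch, and the names `StmtParser`, `ParseError`, and `parse` are not from the slides. It also spells out `optexpr()`, which the slide leaves implicit.

```python
class ParseError(Exception):
    """Raised when the lookahead token fits no production."""


class StmtParser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" marks the end of the input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, terminal):
        """Consume the lookahead if it equals `terminal`, else report an error."""
        if self.lookahead != terminal:
            raise ParseError(f"expected {terminal!r}, found {self.lookahead!r}")
        self.pos += 1

    def stmt(self):
        token = self.lookahead  # one token decides the production
        if token == "expr":
            self.match("expr")
            self.match(";")
        elif token == "if":
            for terminal in ("if", "(", "expr", ")"):
                self.match(terminal)
            self.stmt()
        elif token == "for":
            self.match("for")
            self.match("(")
            self.optexpr()
            self.match(";")
            self.optexpr()
            self.match(";")
            self.optexpr()
            self.match(")")
            self.stmt()
        elif token == "other":
            self.match("other")
        else:
            raise ParseError(f"syntax error at {token!r}")

    def optexpr(self):
        # optexpr -> expr | epsilon: consume nothing unless the lookahead is expr
        if self.lookahead == "expr":
            self.match("expr")


def parse(tokens):
    """Return True if `tokens` form exactly one stmt, else raise ParseError."""
    parser = StmtParser(tokens)
    parser.stmt()
    parser.match("$")
    return True


if __name__ == "__main__":
    print(parse("for ( ; expr ; ) other".split()))
```

No call ever undoes a `match`, which is what distinguishes this parser from the backtracking variant.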


LL(1) Grammars

• Class of grammars for which no backtracking is required
  • First L stands for left-to-right scan, second L stands for leftmost derivation
  • There is one lookahead token
• No left-recursive or ambiguous grammar can be LL(1)
• In LL(k), k stands for k lookahead tokens
  • Predictive parsers accept LL(k) grammars
  • Every LL(1) grammar is an LL(2) grammar


Nonrecursive Table-Driven Predictive Parser

[Figure: the predictive parsing program reads the input buffer (a + b $), maintains a stack (X on top, then Y and Z, with $ at the bottom), consults the parsing table M, and produces the output.]


Predictive Parsing Algorithm

• Input: String w and parsing table M for grammar G
• Algorithm:
    Let a be the first symbol in w
    Let X be the symbol at the top of the stack
    while X ≠ $:
        if X == a:
            pop the stack and advance the input
        else if X is a terminal or M[X, a] is an error entry:
            error
        else if M[X, a] == X → Y1 Y2 … Yk:
            output the production
            pop the stack
            push Yk, Yk−1, …, Y1 onto the stack (Y1 ends up on top)
        X ← top stack symbol

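The loop above translates almost line by line into Python. The sketch below hard-codes the parsing table of the expression grammar E → TE′, E′ → +TE′ | ε, T → FT′, T′ → *FT′ | ε, F → (E) | id; the dictionary layout, the names `predictive_parse` and `ParseError`, and the final check that the input is exhausted are assumptions of this sketch rather than part of the slide.

```python
class ParseError(Exception):
    """Raised when the parser hits an error entry or a terminal mismatch."""


NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# M[(nonterminal, lookahead)] = production body; a missing key is an error entry.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}


def predictive_parse(tokens, start="E"):
    """Return the productions used, in order, as (head, body) pairs."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]  # the end of the list is the top of the stack
    pos = 0
    output = []
    while stack[-1] != "$":
        top, a = stack[-1], tokens[pos]
        if top == a:
            stack.pop()  # matched a terminal
            pos += 1     # advance the input
        elif top not in NONTERMINALS or (top, a) not in TABLE:
            raise ParseError(f"unexpected {a!r} with {top!r} on the stack")
        else:
            body = TABLE[(top, a)]
            output.append((top, body))    # "output the production"
            stack.pop()
            stack.extend(reversed(body))  # push Yk ... Y1, leaving Y1 on top
    if tokens[pos] != "$":  # added check: all input must be consumed
        raise ParseError(f"unexpected {tokens[pos]!r} after the end of the expression")
    return output


if __name__ == "__main__":
    for head, body in predictive_parse(["id", "+", "id", "*", "id"]):
        print(head, "->", " ".join(body) or "ε")
```

The productions come out in the order of a leftmost derivation of the input.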


Predictive Parsing Table

| Nonterminal | id      | +         | *         | (       | )      | $      |
|-------------|---------|-----------|-----------|---------|--------|--------|
| E           | E → TE′ |           |           | E → TE′ |        |        |
| E′          |         | E′ → +TE′ |           |         | E′ → ε | E′ → ε |
| T           | T → FT′ |           |           | T → FT′ |        |        |
| T′          |         | T′ → ε    | T′ → *FT′ |         | T′ → ε | T′ → ε |
| F           | F → id  |           |           | F → (E) |        |        |

Grammar:

E → TE′
E′ → +TE′ | ε
T → FT′
T′ → *FT′ | ε
F → (E) | id


Construction of a Predictive Parsing Table

• Input: Grammar G
• Algorithm:
  • For each production A → α in G,
    • For each terminal a in FIRST(α), add A → α to M[A, a]
    • If ε is in FIRST(α), then for each terminal b in FOLLOW(A), add A → α to M[A, b]
    • If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $]
  • No production in M[A, a] indicates error

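The construction above can be sketched end to end for the expression grammar. The code below is an illustration under stated assumptions: FIRST and FOLLOW are computed with the usual fixed-point iteration, `"ε"` and `"$"` are the markers chosen by this sketch, and all function names are hypothetical. Because `$` is stored inside FOLLOW(A), the second and third rules of the algorithm collapse into one step.

```python
EPS, END = "ε", "$"
START = "E"
GRAMMAR = {  # an empty body stands for an ε-production
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}


def first_of(symbols, first):
    """FIRST of a sequence of grammar symbols."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol can vanish, or the sequence is empty
    return result


def compute_first():
    first = {A: set() for A in GRAMMAR}
    changed = True
    while changed:  # iterate until no set grows
        changed = False
        for A, bodies in GRAMMAR.items():
            for body in bodies:
                new = first_of(body, first)
                if not new <= first[A]:
                    first[A] |= new
                    changed = True
    return first


def compute_follow(first):
    follow = {A: set() for A in GRAMMAR}
    follow[START].add(END)
    changed = True
    while changed:
        changed = False
        for A, bodies in GRAMMAR.items():
            for body in bodies:
                for i, B in enumerate(body):
                    if B not in GRAMMAR:
                        continue  # FOLLOW is defined for nonterminals only
                    rest = first_of(body[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:
                        new |= follow[A]
                    if not new <= follow[B]:
                        follow[B] |= new
                        changed = True
    return follow


def build_table(first, follow):
    """M[(A, a)] is a list of bodies; more than one body means a conflict."""
    table = {}
    for A, bodies in GRAMMAR.items():
        for body in bodies:
            body_first = first_of(body, first)
            targets = body_first - {EPS}
            if EPS in body_first:
                targets |= follow[A]  # includes $ when $ is in FOLLOW(A)
            for a in targets:
                table.setdefault((A, a), []).append(body)
    return table


FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
TABLE = build_table(FIRST, FOLLOW)
```

Cells that never receive a production are simply absent from `TABLE`, which corresponds to the error entries of the algorithm.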


Working of Predictive Parser

| Matched      | Stack    | Input         | Action            |
|--------------|----------|---------------|-------------------|
|              | E$       | id + id * id$ |                   |
|              | TE′$     | id + id * id$ | Output E → TE′    |
|              | FT′E′$   | id + id * id$ | Output T → FT′    |
|              | idT′E′$  | id + id * id$ | Output F → id     |
| id           | T′E′$    | + id * id$    | Match id          |
| id           | E′$      | + id * id$    | Output T′ → ε     |
| id           | +TE′$    | + id * id$    | Output E′ → +TE′  |
| id +         | TE′$     | id * id$      | Match +           |
| id +         | FT′E′$   | id * id$      | Output T → FT′    |
| id +         | idT′E′$  | id * id$      | Output F → id     |
| id + id      | T′E′$    | * id$         | Match id          |
| id + id      | *FT′E′$  | * id$         | Output T′ → *FT′  |
| id + id *    | FT′E′$   | id$           | Match *           |
| id + id *    | idT′E′$  | id$           | Output F → id     |
| id + id * id | T′E′$    | $             | Match id          |
| id + id * id | E′$      | $             | Output T′ → ε     |
| id + id * id | $        | $             | Output E′ → ε     |


Predictive Parsing

• Grammars whose predictive parsing tables contain no multiply-defined entries are called LL(1)
• If grammar G is left-recursive or is ambiguous, then parsing table M will have at least one multiply-defined cell
• Some grammars cannot be transformed into LL(1)
  • The grammar below is ambiguous

S → iEtSS′ | a
S′ → eS | ε
E → b


Predictive Parsing Table

| Nonterminal | a     | b     | e               | i          | t | $      |
|-------------|-------|-------|-----------------|------------|---|--------|
| S           | S → a |       |                 | S → iEtSS′ |   |        |
| S′          |       |       | S′ → ε, S′ → eS |            |   | S′ → ε |
| E           |       | E → b |                 |            |   |        |

Grammar:

S → iEtSS′ | a
S′ → eS | ε
E → b
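The multiply-defined cell in the table above can be found mechanically. The following is a minimal sketch, assuming the FIRST set of each production body and the FOLLOW set of each nonterminal are written out by hand for this small grammar; the names `PRODUCTIONS`, `build_table`, and `conflicts` are hypothetical.

```python
EPS = "ε"

# (head, body, FIRST(body)) for S -> iEtSS' | a, S' -> eS | ε, E -> b
PRODUCTIONS = [
    ("S", ["i", "E", "t", "S", "S'"], {"i"}),
    ("S", ["a"], {"a"}),
    ("S'", ["e", "S"], {"e"}),
    ("S'", [], {EPS}),
    ("E", ["b"], {"b"}),
]

# FOLLOW sets, computed by hand for this grammar.
FOLLOW = {"S": {"e", "$"}, "S'": {"e", "$"}, "E": {"t"}}


def build_table(productions, follow):
    """Apply the table-construction rules; each cell holds a list of bodies."""
    table = {}
    for head, body, body_first in productions:
        targets = body_first - {EPS}
        if EPS in body_first:
            targets = targets | follow[head]
        for token in sorted(targets):
            table.setdefault((head, token), []).append(body)
    return table


def conflicts(table):
    """Return the cells that hold more than one production."""
    return {cell: bodies for cell, bodies in table.items() if len(bodies) > 1}


TABLE = build_table(PRODUCTIONS, FOLLOW)

if __name__ == "__main__":
    for (head, token), bodies in conflicts(TABLE).items():
        print(f"M[{head}, {token}] has {len(bodies)} entries")
```

The only conflict is M[S′, e], so this grammar is not LL(1). A parser generator would have to either reject it or resolve the cell by a fixed preference.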


Error Recovery in Predictive Parsing

• Error conditions
  • Terminal on top of the stack does not match the next input symbol
  • Nonterminal A is on top of the stack, a is the next input symbol, and M[A, a] is an error entry
• Choices
  • Raise an error and quit parsing
  • Print an error message, try to recover from the error, and continue with compilation


Error Recovery in Predictive Parsing

• Panic mode: skip over symbols until a token in a set of synchronizing (synch) tokens appears
  • Add all tokens in FOLLOW(A) to the synch set for A
  • Add symbols in FIRST(A) to the synch set for A
  • Add keywords that can begin statements
  • …

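The first heuristic above, using FOLLOW(A) as the synch set for A, can be sketched as a post-processing step on an existing parsing table. The table and FOLLOW sets below are those of the expression grammar, written out by hand; the function name `add_synch_entries` and the string marker `"synch"` are assumptions of this sketch, and the other heuristics (FIRST(A), statement keywords) are deliberately left out.

```python
SYNCH = "synch"

# Parsing table of the expression grammar; missing keys are blank (error) cells.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}

FOLLOW = {
    "E": {")", "$"},
    "E'": {")", "$"},
    "T": {"+", ")", "$"},
    "T'": {"+", ")", "$"},
    "F": {"+", "*", ")", "$"},
}


def add_synch_entries(table, follow):
    """Mark blank cells M[A, a] with a in FOLLOW(A) as synch; keep productions."""
    result = dict(table)
    for nonterminal, tokens in follow.items():
        for token in tokens:
            result.setdefault((nonterminal, token), SYNCH)
    return result


SYNCH_TABLE = add_synch_entries(TABLE, FOLLOW)

if __name__ == "__main__":
    cells = sorted(cell for cell, entry in SYNCH_TABLE.items() if entry == SYNCH)
    print(cells)
```

E′ and T′ gain no synch cells because their FOLLOW columns already hold ε-productions.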


Predictive Parsing Table with Synchronizing Tokens

| Nonterminal | id      | +         | *         | (       | )      | $      |
|-------------|---------|-----------|-----------|---------|--------|--------|
| E           | E → TE′ |           |           | E → TE′ | synch  | synch  |
| E′          |         | E′ → +TE′ |           |         | E′ → ε | E′ → ε |
| T           | T → FT′ | synch     |           | T → FT′ | synch  | synch  |
| T′          |         | T′ → ε    | T′ → *FT′ |         | T′ → ε | T′ → ε |
| F           | F → id  | synch     | synch     | F → (E) | synch  | synch  |

Grammar:

E → TE′
E′ → +TE′ | ε
T → FT′
T′ → *FT′ | ε
F → (E) | id


Error Recovery Moves by Predictive Parser

| Stack    | Input        | Remark                  |
|----------|--------------|-------------------------|
| E$       | ) id * + id$ | Error, skip )           |
| E$       | id * + id$   | id is in FIRST(E)       |
| TE′$     | id * + id$   |                         |
| FT′E′$   | id * + id$   |                         |
| idT′E′$  | id * + id$   |                         |
| T′E′$    | * + id$      |                         |
| *FT′E′$  | * + id$      |                         |
| FT′E′$   | + id$        | Error, M[F, +] = synch  |
| T′E′$    | + id$        | F has been popped       |
| E′$      | + id$        |                         |
| +TE′$    | + id$        |                         |
| TE′$     | id$          |                         |
| FT′E′$   | id$          |                         |
| idT′E′$  | id$          |                         |
| T′E′$    | $            |                         |
| E′$      | $            |                         |
| $        | $            |                         |

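The moves in the trace above can be reproduced with a small extension of the table-driven parser. This is a sketch under explicit assumptions: a blank cell skips the input token, a synch cell pops the nonterminal, and a mismatched terminal on the stack is popped. One further rule is an assumption made here to match the first row of the trace: when the start symbol is the only grammar symbol on the stack, a synch entry skips the token instead of popping, since popping would end the parse. The name `parse_with_recovery` and the message texts are hypothetical.

```python
SYNCH = "synch"
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# Parsing table with synchronizing tokens; missing keys are blank cells.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E", ")"): SYNCH, ("E", "$"): SYNCH,
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T", "+"): SYNCH, ("T", ")"): SYNCH, ("T", "$"): SYNCH,
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
    ("F", "+"): SYNCH, ("F", "*"): SYNCH, ("F", ")"): SYNCH, ("F", "$"): SYNCH,
}


def parse_with_recovery(tokens, start="E"):
    """Parse to the end of the input and return the list of error messages."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]  # the end of the list is the top of the stack
    pos = 0
    errors = []
    while stack[-1] != "$":
        top, a = stack[-1], tokens[pos]
        if top == a:
            stack.pop()
            pos += 1
        elif top not in NONTERMINALS:
            # Terminal mismatch: behave as if the missing terminal were present.
            errors.append(f"expected {top!r} before {a!r}")
            stack.pop()
        else:
            entry = TABLE.get((top, a))
            if isinstance(entry, list):
                stack.pop()
                stack.extend(reversed(entry))
            elif a == "$" or (entry == SYNCH and len(stack) > 2):
                # synch entry, or nothing left to skip: give up on this nonterminal
                errors.append(f"popped {top} at {a!r}")
                stack.pop()
            else:
                # blank entry, or synch with only the start symbol left: skip input
                errors.append(f"skipped {a!r}")
                pos += 1
    if tokens[pos] != "$":
        errors.append(f"unparsed input starting at {tokens[pos]!r}")
    return errors


if __name__ == "__main__":
    print(parse_with_recovery([")", "id", "*", "+", "id"]))
```

On the input of the trace the sketch reports two errors, the skipped `)` and the popped `F`, and then finishes the parse normally.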


References

• A. Aho et al. Compilers: Principles, Techniques, and Tools, 2nd edition, Chapter 4.4.
• K. Cooper and L. Torczon. Engineering a Compiler, 2nd edition, Chapter 3.3.
