CS466(Prasad) L7Parse 1 Parsing Recognition of strings in a language

Upload: dorthy-smith

Post on 16-Dec-2015


CS466(Prasad) L7Parse 1

Parsing

Recognition of strings in a language


Graph of a Grammar

• Represents leftmost derivations of a CFG.
  – A path from node S to a node w is a leftmost derivation.

Nodes: Left sentential forms
Arc labels: Production rules
Root: Start symbol
Leaves: Sentences


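Example grammar (λ denotes the empty string):

S → aS | bB | λ
B → aB | bS | bC
C → aC | λ

First levels of its graph (each arc is labeled with the rule applied):

S
  aS   (S → aS)
    aaS  (S → aS)
    abB  (S → bB)
    a    (S → λ)
  bB   (S → bB)
    baB  (B → aB)
    bbS  (B → bS)
    bbC  (B → bC)
  λ    (S → λ)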


Properties of Graph of a Grammar

• Every node has a finite number of children.
  – Simple breadth-first enumeration is feasible.

• The number of leaves is infinite if the language is infinite.
  – This is the typical case.

• There can be infinitely long paths (derivations).
  – Depth-first traversals can loop forever.
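The breadth-first enumeration mentioned above can be sketched as a generator that walks the graph level by level and yields each leaf it meets. The toy grammar S → aS | b, the function name `sentences`, and the one-character-per-symbol encoding are assumptions for illustration. Because the language is infinite, the generator never finishes on its own, so the caller takes only as many sentences as needed.

```python
from collections import deque
from itertools import islice

# Toy grammar S -> aS | b (assumed for illustration); its language is infinite.
GRAMMAR = {"S": ["aS", "b"]}

def sentences(grammar, start="S"):
    """Yield sentences in breadth-first order of the graph of the grammar."""
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Index of the leftmost nonterminal, or None if `form` is a leaf.
        i = next((k for k, sym in enumerate(form) if sym in grammar), None)
        if i is None:
            yield form
            continue
        # Finitely many children: one per rule of the leftmost nonterminal.
        for rhs in grammar[form[i]]:
            queue.append(form[:i] + rhs + form[i + 1:])

print(list(islice(sentences(GRAMMAR), 4)))  # ['b', 'ab', 'aab', 'aaab']
```

With an ambiguous grammar the same sentence can be yielded more than once, since it is a leaf reachable along several paths.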


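Example: S → aS | Sb | ab

S
  aS   (S → aS)
    aaS  (S → aS)
    aSb  (S → Sb)
    aab  (S → ab)
  Sb   (S → Sb)
    aSb  (S → aS)
    Sbb  (S → Sb)
    abb  (S → ab)
  ab   (S → ab)

The node aSb is reached along two different paths, so the graph is a directed acyclic graph rather than a tree.

(Illustrates ambiguity in the grammar.)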


Example: S → SS | λ

Cyclic structure: S ⇒ SS ⇒ SSS by S → SS, and SSS ⇒ SS ⇒ S by S → λ.

(Illustrates ambiguous grammar with cycles.)


• Parser
  A program that determines whether a string w ∈ L(G) by constructing a derivation. Equivalently, it searches the graph of G.

  – Top-down parsers
    • Construct the derivation tree from root to leaves.
    • Leftmost derivation.

  – Bottom-up parsers
    • Construct the derivation tree from leaves to root.
    • Rightmost derivation in reverse.


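Example: S → SS | a | b, with ab ∈ L(S).

Leftmost derivation: S ⇒ SS ⇒ aS ⇒ ab

Derivation trees, built from the root down:
1. The root S.
2. S with children S S.
3. The left child S expanded to a.
4. The right child S expanded to b.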

DerivationTrees


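Rightmost derivation: S ⇒ SS ⇒ Sb ⇒ ab

Rightmost derivation in reverse: ab, Sb, SS, S

Derivation trees, built from the leaves up:
1. The leaves a b.
2. a reduced to S (sentential form Sb).
3. b reduced to S (sentential form SS).
4. S S reduced to the root S.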
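To make the two derivation orders concrete, here is a small sketch that replays a sequence of rule applications for the grammar S → SS | a | b, rewriting either the leftmost or the rightmost S at each step. The helper name `derive` and its parameters are hypothetical, introduced only for this illustration.

```python
def derive(start, rhs_sequence, leftmost=True):
    """Replay a derivation: apply each right-hand side to the leftmost
    (or rightmost) occurrence of the nonterminal S."""
    form = start
    steps = [form]
    for rhs in rhs_sequence:
        i = form.find("S") if leftmost else form.rfind("S")
        if i < 0:
            raise ValueError("no nonterminal left to rewrite")
        form = form[:i] + rhs + form[i + 1:]
        steps.append(form)
    return steps

# Leftmost derivation of ab, as a top-down parser constructs it.
print(derive("S", ["SS", "a", "b"]))                  # ['S', 'SS', 'aS', 'ab']
# Rightmost derivation of ab; a bottom-up parser finds it in reverse.
print(derive("S", ["SS", "b", "a"], leftmost=False))  # ['S', 'SS', 'Sb', 'ab']
```

Reversing the second list gives ab, Sb, SS, S, which is the order in which a bottom-up parser performs its reductions.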


Top-down parsers: Breadth-first vs Depth-first

• Search the graph of a grammar breadth-first
  – Uses: Queue
  – (+) Always finds a shortest derivation when one exists.
  – (-) Inefficient in general.

• Search the graph of a grammar depth-first
  – Uses: Stack
  – (-) Can get into infinite loops (e.g., left recursion).
  – (+) Efficient in general.


Determining when w ∉ L(G)

(conditions for abandoning a derivation path)

• The number of terminals in the sentential form > length of w.

• The prefix of the sentential form preceding the leftmost non-terminal is not a prefix of w.

• No rules are applicable to the sentential form.
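The three conditions above can be sketched as a single test applied to each left sentential form during a top-down search. The function name `is_dead_end`, the dictionary encoding of the grammar, and the example expression grammar are assumptions for illustration. Treating a complete string of terminals that differs from w as "no rules applicable" is also an interpretation made here.

```python
# Example expression grammar (assumed encoding, one character per symbol).
GRAMMAR = {"S": ["A"], "A": ["T", "A+T"], "T": ["b", "(A)"]}

def is_dead_end(form, w, grammar):
    """True if no derivation from `form` can produce w."""
    # 1. Terminals never disappear, so too many of them rules out w.
    terminals = sum(1 for sym in form if sym not in grammar)
    if terminals > len(w):
        return True
    idx = next((i for i, sym in enumerate(form) if sym in grammar), None)
    if idx is None:
        # 3. No nonterminal, hence no applicable rule: only w itself survives.
        return form != w
    # 2. The terminal prefix before the leftmost nonterminal must match w.
    if not w.startswith(form[:idx]):
        return True
    # 3. The leftmost nonterminal has no rules at all.
    return not grammar[form[idx]]

w = "(b)+b"
print(is_dead_end("((A))", w, GRAMMAR))   # True: "((" is not a prefix of w
print(is_dead_end("(A)+T", w, GRAMMAR))   # False: still viable
```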


Parsing Examples

S → A
A → T | A+T
T → b | (A)

Show (b)+b ∈ L(S).


Breadth-first top-down parser

Queue up left sentential forms level by level:

S
A
T    A+T
b    (A)    T+T    A+T+T
(T)    (A+T)    …    T+T+T    A+T+T+T
(b)    ((A))    …

Path to the input string: S ⇒ A ⇒ A+T ⇒ T+T ⇒ (A)+T ⇒ (T)+T ⇒ (b)+T ⇒ (b)+b

Parse successful
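A breadth-first top-down parser along these lines can be sketched with a queue of derivation paths. The names `bfs_parse` and `viable` and the dictionary encoding are assumptions for illustration. The pruning step uses the terminal-count and prefix tests; for this grammar it keeps the search finite, which need not hold for every grammar (for example one with cycles through empty productions).

```python
from collections import deque

# S -> A, A -> T | A+T, T -> b | (A)  (assumed encoding, one char per symbol)
GRAMMAR = {"S": ["A"], "A": ["T", "A+T"], "T": ["b", "(A)"]}

def viable(form, w, grammar):
    """Prune forms with too many terminals or a mismatching terminal prefix."""
    terminals = sum(1 for sym in form if sym not in grammar)
    j = next((k for k, sym in enumerate(form) if sym in grammar), len(form))
    return terminals <= len(w) and w.startswith(form[:j])

def bfs_parse(w, grammar, start="S"):
    """Return a leftmost derivation of w as a list of forms, or None."""
    queue = deque([[start]])  # each entry is a path from the root
    while queue:
        path = queue.popleft()
        form = path[-1]
        i = next((k for k, sym in enumerate(form) if sym in grammar), None)
        if i is None:  # a sentence: either w or a dead end
            if form == w:
                return path
            continue
        for rhs in grammar[form[i]]:
            child = form[:i] + rhs + form[i + 1:]
            if viable(child, w, grammar):
                queue.append(path + [child])
    return None

print(bfs_parse("(b)+b", GRAMMAR))
```

Storing whole paths in the queue is wasteful but keeps the sketch short; a parent-pointer table would serve the same purpose.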


Depth-first top-down parser

Use a stack to pursue an entire path from the left. Backtrack on failure.

S
A
T    A+T
b    (A)    T+T    A+T+T
(T)    (A+T)
(b)    ((A))
…    T+T+T    A+T+T+T
…    …

Figure annotations: "Parse fails" (backtrack on failure) and "loop".
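The depth-first variant can be sketched by swapping the queue for a stack, so that one path is pursued as far as possible before the parser backtracks. The names `dfs_parse` and `viable` are assumptions for illustration. The pruning test is what stops the left-recursive rule A → A+T from being expanded forever in this sketch; without it, a depth-first search that tried that rule first would not terminate.

```python
# S -> A, A -> T | A+T, T -> b | (A)  (assumed encoding, one char per symbol)
GRAMMAR = {"S": ["A"], "A": ["T", "A+T"], "T": ["b", "(A)"]}

def viable(form, w, grammar):
    """Prune forms with too many terminals or a mismatching terminal prefix."""
    terminals = sum(1 for sym in form if sym not in grammar)
    j = next((k for k, sym in enumerate(form) if sym in grammar), len(form))
    return terminals <= len(w) and w.startswith(form[:j])

def dfs_parse(w, grammar, start="S"):
    """Return a leftmost derivation of w found depth-first, or None."""
    stack = [[start]]
    while stack:
        path = stack.pop()  # most recently pushed path first
        form = path[-1]
        i = next((k for k, sym in enumerate(form) if sym in grammar), None)
        if i is None:
            if form == w:
                return path
            continue  # parse fails on this path: backtrack
        # Push alternatives in reverse so the first rule is tried first.
        for rhs in reversed(grammar[form[i]]):
            child = form[:i] + rhs + form[i + 1:]
            if viable(child, w, grammar):
                stack.append(path + [child])
    return None

print(dfs_parse("(b)+b", GRAMMAR))
```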


Summary

• In the BFTD (breadth-first top-down) version, all leftmost derivations are investigated in parallel.

• In the DFTD (depth-first top-down) version, one specific derivation is pursued to completion.
  – Done, if it succeeds.
  – Otherwise, backtrack and investigate another path.
  – (Incomplete strategy)
  – (Used by Prolog interpreter)


Bottom-up parsing

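Reduce the input string toward the start symbol:

(b)+b → (T)+b → (A)+b → T+b → A+b → A+T → A → S

Parse successful

Alternatives appearing in the figure, some marked "Not allowed": (b)+T, (T)+T, (b)+A, (A)+T, (S)+b, …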
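As a rough sketch of bottom-up recognition, the code below searches exhaustively over reductions, replacing an occurrence of some right-hand side by its left-hand side until only the start symbol remains. This is a naive backtracking search, not the shift-reduce method of a practical LR parser, and it does not restrict itself to a rightmost derivation in reverse. The names `reductions` and `bottom_up_parse` are assumptions, and the grammar is assumed to have no empty right-hand sides.

```python
# S -> A, A -> T | A+T, T -> b | (A)  (assumed encoding, one char per symbol)
GRAMMAR = {"S": ["A"], "A": ["T", "A+T"], "T": ["b", "(A)"]}
RULES = [(lhs, rhs) for lhs, alts in GRAMMAR.items() for rhs in alts]

def reductions(form):
    """Yield every form obtained by reducing one occurrence of a right-hand side."""
    for lhs, rhs in RULES:
        at = form.find(rhs)
        while at != -1:
            yield form[:at] + lhs + form[at + len(rhs):]
            at = form.find(rhs, at + 1)

def bottom_up_parse(w, start="S"):
    """Return a list of forms from w down to the start symbol, or None."""
    seen = {w}
    stack = [[w]]
    while stack:
        path = stack.pop()
        form = path[-1]
        if form == start:
            return path
        for reduced in reductions(form):
            if reduced not in seen:  # never explore the same form twice
                seen.add(reduced)
                stack.append(path + [reduced])
    return None  # every reachable form was a dead end

path = bottom_up_parse("(b)+b")
print(" -> ".join(path))
```

The search terminates for this grammar because every reduction either shortens the form or moves a symbol up the chain b, T, A, S, so only finitely many forms are reachable.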


Practical Parsers

• Language/Grammar designed to enable deterministic (directed and backtrack-free) searches.

• Uses lookahead tokens and/or exploits the context in the sentential form constructed so far.

“Look before you leap.” vs “Procrastination principle.”

– Top-down parsers: LL(k) languages
  • E.g., Pascal, Ada, etc.
  • Better error diagnosis and recovery.

– Bottom-up parsers: LALR(1), LR(k) languages
  • E.g., C/C++, Java, etc.
  • Handles left recursion in the grammar.

– Backtracking parsers
  • E.g., Prolog interpreter.
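To illustrate a deterministic, backtrack-free top-down search, here is a sketch of a recursive-descent recognizer with one token of lookahead. It uses the expression language A → T | A+T, T → b | (A), but with the left recursion rewritten as A → T { + T }, since a predictive top-down parser cannot use a left-recursive rule directly. That rewriting and the names `accepts`, `peek`, and `eat` are assumptions for this illustration.

```python
class ParseError(Exception):
    """Raised when the lookahead token matches no rule."""

def accepts(w):
    """Recognize A -> T { + T },  T -> b | ( A )  with one token of lookahead."""
    pos = 0

    def peek():
        return w[pos] if pos < len(w) else None

    def eat(ch):
        nonlocal pos
        if peek() != ch:
            raise ParseError(f"expected {ch!r} at position {pos}")
        pos += 1

    def parse_t():
        # The lookahead alone decides which T rule applies.
        if peek() == "b":
            eat("b")
        elif peek() == "(":
            eat("(")
            parse_a()
            eat(")")
        else:
            raise ParseError(f"unexpected {peek()!r} at position {pos}")

    def parse_a():
        parse_t()
        while peek() == "+":  # iteration replaces the left recursion
            eat("+")
            parse_t()

    try:
        parse_a()
        if pos != len(w):
            raise ParseError(f"trailing input at position {pos}")
        return True
    except ParseError:
        return False

print(accepts("(b)+b"), accepts("b+"))  # True False
```

Each input symbol is examined a bounded number of times and no choice is ever undone, in contrast to the breadth-first and depth-first searches over the graph of the grammar.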