more parsing cpsc 388 ellen walker hiram college

More Parsing

CPSC 388, Ellen Walker, Hiram College

Review LL(1) Grammars

• Compute First and Follow sets

• Build the parsing table

– If x is in First(A), then M[A,x] = A->xZ (the rule that put x in First(A))

– If ε is in First(A) and x is in Follow(A), then M[A,x] = A->ε

• If each cell has no more than 1 rule, grammar is LL(1).
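The table construction above can be sketched in Python. This is a minimal illustration, not the course's implementation: the function names (`first_of_seq`, `first_follow`, `build_table`) are hypothetical, an empty tuple stands for ε, and the grammar S -> aSb | ε is borrowed from the parse tree example later in these notes. The table is filled rule by rule, which is one common way to apply the two conditions.

```python
EPS = "ε"
END = "$"

def first_of_seq(seq, first):
    """First set of a sequence of symbols (a terminal's First set is itself)."""
    out = set()
    for sym in seq:
        f = first.get(sym, {sym})
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)  # every symbol (or an empty sequence) can derive ε
    return out

def first_follow(grammar, start):
    """Iterate to a fixed point, as in the usual First/Follow computation."""
    first = {A: set() for A in grammar}
    follow = {A: set() for A in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for A, rules in grammar.items():
            for rhs in rules:
                new = first_of_seq(rhs, first)
                if not new <= first[A]:
                    first[A] |= new
                    changed = True
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue  # Follow is only kept for nonterminals
                    tail = first_of_seq(rhs[i + 1:], first)
                    new = (tail - {EPS}) | (follow[A] if EPS in tail else set())
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return first, follow

def build_table(grammar, start):
    """Return (table, is_ll1); table maps (nonterminal, terminal) to a list of rules."""
    first, follow = first_follow(grammar, start)
    table = {}
    for A, rules in grammar.items():
        for rhs in rules:
            f = first_of_seq(rhs, first)
            lookaheads = (f - {EPS}) | (follow[A] if EPS in f else set())
            for x in lookaheads:
                table.setdefault((A, x), []).append(rhs)
    is_ll1 = all(len(rs) == 1 for rs in table.values())
    return table, is_ll1

grammar = {"S": [("a", "S", "b"), ()]}
table, is_ll1 = build_table(grammar, "S")
for cell, rules in sorted(table.items()):
    print(cell, rules)
print("LL(1):", is_ll1)
```

A grammar such as S -> a | ab puts two rules in the cell M[S,a], so `build_table` reports that it is not LL(1).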

LL(k) Grammars

• Look at k terminals instead of 1 terminal

– First(S) is all sequences of k terminals that can begin S

– Follow(S) is all sequences of k terminals that can follow S

– Col. headers of table are sequences of k terminals instead of single terminals

• First & Follow computations get messy!
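To give a feel for the extra work, here is a rough Python sketch of a First computation for k terminals. It assumes single-character terminals so that a sequence of terminals can be held in a string, and it keeps strings shorter than k when the whole derived string is shorter than k. The names `concat_k` and `first_k` are made up for this example.

```python
def concat_k(xs, ys, k):
    """All concatenations x + y, truncated to the first k terminals."""
    return {(x + y)[:k] for x in xs for y in ys}

def first_k(grammar, k):
    """Fixed-point computation of First_k for every nonterminal."""
    first = {A: set() for A in grammar}
    changed = True
    while changed:
        changed = False
        for A, rules in grammar.items():
            for rhs in rules:
                strings = {""}  # prefixes derivable from the RHS read so far
                for sym in rhs:
                    options = first[sym] if sym in grammar else {sym}
                    strings = concat_k(strings, options, k)
                if not strings <= first[A]:
                    first[A] |= strings
                    changed = True
    return first

# S -> aSb | ε  (an empty tuple stands for ε)
grammar = {"S": [("a", "S", "b"), ()]}
print(first_k(grammar, 1)["S"])
print(first_k(grammar, 2)["S"])
```

With k = 2 the set for S holds the empty string, "ab" and "aa". Sets of strings replace sets of single terminals, which is where the bookkeeping grows.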

Building Parse Trees

• Each item on the stack is a syntax tree node

• To “use” a rule:

– Pop (and save) LHS from stack.

– Create nodes for each RHS element

– Connect RHS nodes as children of LHS node

– Push RHS nodes (reverse order) on stack

Parse Tree Example

• Parsing: “aabb”

• Grammar: S->aSb | ε

• After S->aSb:

– Stack (top first): a, S2, b

– Tree: S1 with children a, S2, b
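The tree-building procedure can be sketched as a small table-driven parser in Python. This is an illustration under assumptions: the `Node` class and the hand-written `TABLE` for S -> aSb | ε are not from the slides, and an S node with no children stands for an S that derived ε.

```python
class Node:
    def __init__(self, symbol):
        self.symbol = symbol
        self.children = []

    def show(self):
        """Render the subtree as nested parentheses."""
        if not self.children:
            return self.symbol
        return self.symbol + "(" + " ".join(c.show() for c in self.children) + ")"

# LL(1) table for S -> aSb | ε; an empty tuple stands for ε
TABLE = {("S", "a"): ("a", "S", "b"), ("S", "b"): (), ("S", "$"): ()}

def parse(text):
    tokens = list(text) + ["$"]
    root = Node("S")
    stack = [root]  # each stack item is a syntax tree node
    pos = 0
    while stack:
        node = stack.pop()  # pop (and save) the LHS
        tok = tokens[pos]
        if node.symbol == "S":
            rhs = TABLE.get(("S", tok))
            if rhs is None:
                raise SyntaxError("no rule for S on " + tok)
            node.children = [Node(s) for s in rhs]  # create and connect RHS nodes
            stack.extend(reversed(node.children))   # push RHS in reverse order
        elif node.symbol == tok:
            pos += 1  # terminal on the stack matches the input
        else:
            raise SyntaxError("expected " + node.symbol + ", saw " + tok)
    if tokens[pos] != "$":
        raise SyntaxError("trailing input")
    return root

print(parse("aabb").show())
```

Pushing the children in reverse order leaves the leftmost child on top of the stack, so the leftmost part of the tree is completed first.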

Error Recovery

• Recognizer - either program is acceptable or not

• Error Correction - attempt to replace error by correct program

– Minimal distance error correction is too hard

– Limited to simple errors (e.g. missing ;)

Error Recovery Principles

• Find error as soon as possible (to report its location accurately)

• Pick up parsing as soon as possible after error (so multiple errors caught)

• Avoid errors generating many spurious additional error messages

• Avoid infinite loops on errors (!)

Recursive Descent Error Recovery

• Panic Mode

– Each function has additional parameter: synchronizing tokens (e.g. ;)

– Error causes parser to scan ahead (ignoring tokens) to find next synchronizing token

– Typical synchronizing tokens are in follow set.

Example Pseudocode

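bool factor(list<token> synchset) {
    // skip ahead until the token can start a factor or is a synchronizing token
    token = scanto({'(', num}, synchset);
    switch (token) {
        case '(': match('('); exp({')'}); match(')'); break;
        case num: match(num); break;
        default:  error("Factor"); return false;
    }
    return true;
}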
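A runnable version of the same idea, sketched in Python. The grammar here (exp -> factor { + factor }, factor -> ( exp ) | num) and the `Parser` class are assumptions made for the example; only `scanto`, `match`, `exp` and `factor` echo names from the pseudocode. Each function adds its own synchronizing tokens to the set it was given before calling the next one.

```python
class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]
        self.pos = 0
        self.errors = []

    @property
    def token(self):
        return self.tokens[self.pos]

    def error(self, msg):
        self.errors.append(f"{msg} at token {self.pos}")

    def scanto(self, firstset, synchset):
        """Panic mode: skip tokens until one is in firstset or synchset."""
        if self.token not in firstset:
            self.error("unexpected " + self.token)
            # "$" always stops the scan, so this loop cannot run off the end
            while self.token not in firstset | synchset | {"$"}:
                self.pos += 1

    def match(self, expected):
        if self.token == expected:
            self.pos += 1
        else:
            self.error("expected " + expected)

    def exp(self, synchset):
        self.factor(synchset | {"+"})
        while self.token == "+":
            self.match("+")
            self.factor(synchset | {"+"})

    def factor(self, synchset):
        self.scanto({"(", "num"}, synchset)
        if self.token == "(":
            self.match("(")
            self.exp(synchset | {")"})
            self.match(")")
        elif self.token == "num":
            self.match("num")
        # otherwise we stopped on a synchronizing token: give up on this factor

def parse(tokens):
    p = Parser(tokens)
    p.exp({"$"})
    if p.token != "$":
        p.error("trailing input")
    return p.errors

print(parse(["num", "+", "num"]))
print(parse(["num", "+", "*", "num"]))
print(parse(["(", "num", "+", ")"]))
```

In the second call the stray `*` is skipped and parsing resumes at `num`; in the third the missing factor is reported once and the `)` is still matched by the enclosing call.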

Error Recovery in LL(1)

• Fill in each “blank” cell with one of the following options:

– Pop: pop A from the stack (if current token is $ or in Follow(A)), i.e. “give up on” A

– Scan: skip tokens until we find one where we can restart the parse.

– Push a new nonterminal (e.g. start symbol if stack becomes empty before input does)
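The three options can be sketched in a table-driven parser. The example below is an assumption-laden illustration: it uses a small expression grammar (E -> T E', E' -> + T E' | ε, T -> num | ( E )) that is not in the slides, with `TABLE` and `FOLLOW` written out by hand. Instead of storing an action in every blank cell, it decides between pop and scan when it hits one, and it drops a mismatched terminal from the stack as if it had been present.

```python
TABLE = {
    ("E", "num"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "num"): ["num"], ("T", "("): ["(", "E", ")"],
}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"}}

def parse(tokens, start="E"):
    """Return the list of error messages (empty if the input is accepted)."""
    tokens = list(tokens) + ["$"]
    stack, pos, errors = [start], 0, []
    while True:
        tok = tokens[pos]
        if not stack:
            if tok == "$":
                return errors
            # Stack emptied before the input did: skip to a token that can
            # start the start symbol (guarantees progress), then push it.
            errors.append(f"extra input at {pos}")
            while tokens[pos] != "$" and (start, tokens[pos]) not in TABLE:
                pos += 1
            if tokens[pos] != "$":
                stack.append(start)
            continue
        top = stack[-1]
        if top in FOLLOW:  # nonterminal on top of the stack
            rhs = TABLE.get((top, tok))
            if rhs is not None:
                stack.pop()
                stack.extend(reversed(rhs))
            elif tok == "$" or tok in FOLLOW[top]:
                errors.append(f"pop {top} at {pos}")  # give up on this nonterminal
                stack.pop()
            else:
                errors.append(f"scan past {tok} at {pos}")
                pos += 1
        elif top == tok:
            stack.pop()
            pos += 1
        else:
            errors.append(f"missing {top} at {pos}")
            stack.pop()

print(parse(["num", "+", "num"]))
print(parse(["num", "+", "+", "num"]))
print(parse(["num", "num"]))
```

Every error branch either shrinks the stack or consumes a token, which is what keeps the loop from running forever on bad input.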

Bottom Up Parsing

• Start with tokens

• Build up rule RHS (right side)

• Replace RHS by LHS

• Done when stack is only start symbol

• (Working from leaves of tree to root)

Operations in Bottom-up Parsing

• Shift:

– Push the terminal from the beginning of the string onto the top of the stack

• Reduce:

– Replace the string xyz at the top of the stack by a nonterminal A (assuming A->xyz)

• Accept (when stack is $S’; empty input)

Lookahead

• Look ahead in input by shifting (it’ll all be in the stack)

• Look ahead in the stack

– This requires breaking the abstraction just a little bit (but is technically no problem)

• As before, decision to shift or reduce is made based on next token and stack

Sample Parse

• S’ -> S; S -> aSb | bSa | SS | ε

• String: abba

– Stack = $ input = abba$ ; shift

– Stack = $a input = bba$ ; reduce S->ε

– Stack = $aS input = bba$ ; shift

– Stack = $aSb input = ba$ ; reduce S->aSb

– Stack = $S input = ba$ ; shift

Sample Parse (cont)

– Stack = $S input = ba$ ; shift

– Stack = $Sb input = a$ ; reduce S->ε

– Stack = $SbS input = a$ ; shift

– Stack = $SbSa input = $ ; reduce S->bSa

– Stack = $SS input = $ ; reduce S->SS

– Stack = $S input = $ ; reduce S’->S

– Stack = $S’ input = $ ; accept
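The trace can be replayed mechanically. The Python sketch below is only a checker for a given list of actions, not a parser that chooses them: the function `run` and the encoding of a reduce as an (LHS, RHS) pair are assumptions for this example, and the stack is kept as a string with its top at the right end.

```python
def run(actions, text):
    """Apply shift/reduce actions; return final stack, remaining input, and trace."""
    stack, rest = "$", text + "$"
    trace = []
    for act in actions:
        trace.append((stack, rest))
        if act == "shift":
            # push the first input terminal onto the stack
            stack, rest = stack + rest[0], rest[1:]
        else:
            lhs, rhs = act
            # a reduce is only legal when the RHS is on top of the stack
            assert stack.endswith(rhs), f"cannot reduce {rhs!r} on {stack!r}"
            stack = stack[:len(stack) - len(rhs)] + lhs
    return stack, rest, trace

EPS = ""  # the empty RHS of S -> ε
actions = [
    "shift", ("S", EPS), "shift", ("S", "aSb"),
    "shift", ("S", EPS), "shift", ("S", "bSa"),
    ("S", "SS"), ("S'", "S"),
]
stack, rest, trace = run(actions, "abba")
for s, r in trace:
    print(f"Stack = {s:<6} input = {r}")
print("accept" if (stack, rest) == ("$S'", "$") else "reject")
```

The printed lines match the stack and input columns of the sample parse, and the final configuration is the accept state.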

Rightmost Derivation

• Reduce rules (in order used)

– S->ε

– S->aSb

– S->ε

– S->bSa

– S->SS

– S’->S

Rightmost Derivation

• Rules read “upward” give the following derivation:

– S’ -> S -> SS -> SbSa -> Sba -> aSbba -> abba

• Shift reduce parser generates rightmost derivation in reverse order!

• LR(k) = left-to-right input, rightmost derivation.
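As a check on this claim, the short Python sketch below takes the reductions from the sample parse of abba in the order they were used, reads them in reverse, and expands the rightmost nonterminal at each step. The function name `rightmost_derivation` and the (LHS, RHS) encoding are assumptions for this example.

```python
# Reductions in the order the shift-reduce parser used them ("" stands for ε)
reductions = [
    ("S", ""), ("S", "aSb"), ("S", ""),
    ("S", "bSa"), ("S", "SS"), ("S'", "S"),
]

def rightmost_derivation(reductions):
    """Replay the reductions backwards, always expanding the rightmost nonterminal."""
    form = reductions[-1][0]  # the last reduction produced the start symbol
    forms = [form]
    for lhs, rhs in reversed(reductions):
        i = form.rfind(lhs)  # rightmost occurrence of the rule's LHS
        assert i >= 0, f"{lhs} does not occur in {form}"
        form = form[:i] + rhs + form[i + len(lhs):]
        forms.append(form)
    return forms

forms = rightmost_derivation(reductions)
print(" -> ".join(forms))
```

Each string in the output is one step of the rightmost derivation, ending with the input abba.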

Right Sentential Form

• Each intermediate term of a rightmost derivation is called a right sentential form

– S’, S, SS, SbSa, Sba, aSbba, abba

• All legal intermediate states are right sentential forms (split between stack and input string)

Shift vs. Reduce

• Shift until reduction to next right sentential form is possible

– When complete RHS is at top of stack

– …and more of RHS is not at beginning of string. (Otherwise, S->ε would always be used!)
