Bottom Up Parsing


  • 1 Bottom Up Parsing

  • 2 Bottom-Up Parsing

    Goal of a parser: build a derivation.

    Top-down parser: builds a derivation by working from the start
    symbol towards the input.

    builds the parse tree from root to leaves

    builds a leftmost derivation

    Bottom-up parser: builds a derivation by working from the input
    back toward the start symbol.

    builds the parse tree from leaves to root

    builds a rightmost derivation in reverse
    (a string is reduced to the start symbol)

  • 3 A general style of bottom-up syntax analysis, known as shift-reduce

    parsing.

    Two types of bottom-up parsing:

    Operator-Precedence parsing

    LR parsing

  • 4 Bottom-Up Parsing: Shift-Reduce Parsing

    Reduce a string to the start symbol of the grammar.

    At every step a particular substring is matched (in left-to-right fashion)
    to the right side of some production; this substring is then replaced by the
    LHS of that production (called a reduction).

    If the substring is chosen correctly at each step, it is the trace of a
    rightmost derivation in reverse.

    Consider:

    S → aABe
    A → Abc | b
    B → d

    Reductions:             abbcde ⇒ aAbcde ⇒ aAde ⇒ aABe ⇒ S

    Rightmost derivation:   S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde

    (the reductions are this derivation in reverse order)

  • 5 Handle

    A handle of a string:

    a substring that matches the RHS of some production and whose
    reduction represents one step of a rightmost derivation in reverse.

    So we scan tokens from left to right, find the handle, and replace it
    by the corresponding LHS.

    Formally:

    a handle of a right-sentential form γ is a production A → β and a
    position of β in γ that satisfies the above property.

    i.e. A → β is a handle of αβw at the position immediately after the
    end of α, if:

    S ⇒* αAw ⇒ αβw   (rightmost derivations)

  • 6 Handle

    A given sentential form may have many different handles.

    Right-sentential forms of an unambiguous grammar have a unique handle.

  • 7 An Example of Bottom-Up Parsing

    S → aABe
    A → Abc | b
    B → d

  • 8 Handle Pruning

    The process of discovering a handle and reducing it to the
    appropriate left-hand side is called handle pruning.

    Handle pruning forms the basis for a bottom-up parsing method.

    Two problems:

    locate a handle, and

    decide which production to use (if there is more than one candidate
    production).

  • 9 Shift-Reduce Parser Using a Stack

    General construction, using a stack:

    shift input symbols onto the stack until a handle is found on top of it.

    reduce the handle to the corresponding non-terminal.

    other operations:

    accept when the input is consumed and only the start symbol is
    on the stack; also: error.

    Initially the stack contains only the end-marker $.

    The end of the input string is marked by the end-marker $.

  • 10-36 Shift-Reduce Parser Example

    Grammar:

    Expr → Expr Op Expr
    Expr → ( Expr )
    Expr → - Expr
    Expr → num
    Op → +
    Op → -
    Op → *

    Input string: num * ( num + num )

    These slides step through the parse one shift/reduce at a time; the complete
    trace is:

    Stack                        Input                     Action
    $                            num * ( num + num ) $    shift
    $ num                        * ( num + num ) $        reduce by Expr → num
    $ Expr                       * ( num + num ) $        shift
    $ Expr *                     ( num + num ) $          reduce by Op → *
    $ Expr Op                    ( num + num ) $          shift
    $ Expr Op (                  num + num ) $            shift
    $ Expr Op ( num              + num ) $                reduce by Expr → num
    $ Expr Op ( Expr             + num ) $                shift
    $ Expr Op ( Expr +           num ) $                  reduce by Op → +
    $ Expr Op ( Expr Op          num ) $                  shift
    $ Expr Op ( Expr Op num      ) $                      reduce by Expr → num
    $ Expr Op ( Expr Op Expr     ) $                      reduce by Expr → Expr Op Expr
    $ Expr Op ( Expr             ) $                      shift
    $ Expr Op ( Expr )           $                        reduce by Expr → ( Expr )
    $ Expr Op Expr               $                        reduce by Expr → Expr Op Expr
    $ Expr                       $                        accept
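    A minimal Python sketch (not from the slides; names are illustrative) of the
    stack mechanics just shown. The hard part, deciding when to shift and when to
    reduce, is supplied here as a precomputed list of actions, since automating
    that decision is exactly what operator-precedence and LR parsing (later
    slides) do. The sketch replays the trace above.

    # Replay-driven shift-reduce skeleton for the Expr grammar above.
    RULES = {                       # rule number -> (LHS, RHS)
        1: ("Expr", ["Expr", "Op", "Expr"]),
        2: ("Expr", ["(", "Expr", ")"]),
        3: ("Expr", ["-", "Expr"]),
        4: ("Expr", ["num"]),
        5: ("Op", ["+"]),
        6: ("Op", ["-"]),
        7: ("Op", ["*"]),
    }

    def shift_reduce(tokens, actions):
        """Replay a sequence of ('shift',) / ('reduce', rule_no) actions."""
        stack, inp = ["$"], tokens + ["$"]
        for act in actions:
            if act[0] == "shift":
                stack.append(inp.pop(0))          # move next input symbol onto the stack
            else:
                lhs, rhs = RULES[act[1]]
                assert stack[-len(rhs):] == rhs, "handle not on top of stack"
                del stack[-len(rhs):]             # pop the handle ...
                stack.append(lhs)                 # ... and push its left-hand side
            print(f"{' '.join(stack):30} {' '.join(inp)}")
        return stack == ["$", "Expr"] and inp == ["$"]   # accept condition

    # The action sequence corresponding to the trace for  num * ( num + num ).
    actions = [("shift",), ("reduce", 4), ("shift",), ("reduce", 7), ("shift",),
               ("shift",), ("reduce", 4), ("shift",), ("reduce", 5), ("shift",),
               ("reduce", 4), ("reduce", 1), ("shift",), ("reduce", 2), ("reduce", 1)]
    print("accepted:", shift_reduce(["num", "*", "(", "num", "+", "num", ")"], actions))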

  • 37

    Basic Idea

    Goal: construct parse tree for input string

    Read input from left to right

    Build tree in a bottom-up fashion

    Use stack to hold pending sequences of terminals and nonterminals

  • 38

    Example, Corresponding Parse Tree

    [figure: parse tree for the expression grammar below; not recoverable from
    the extracted text]

    1. Shift until the top of the stack is the right end of a handle.
    2. Pop the right end of the handle and reduce.

    S → EXP
    EXP → EXP + TERM | TERM
    TERM → TERM * FACT | FACT
    FACT → ( EXP ) | ID | NUM

  • 39

    Conflicts During Shift-Reduce Parsing

    There are context-free grammars for which shift-reduce parsers cannot
    be used.

    The stack contents and the next input symbol may not decide the action:

    shift/reduce conflict: whether to make a shift operation or a reduction.

    reduce/reduce conflict: the parser cannot decide which of several
    reductions to make.

    If a shift-reduce parser cannot be used for a grammar, that grammar is
    called a non-LR(k) grammar.

    LR(k): Left-to-right scanning, Rightmost derivation (in reverse), k lookahead symbols.

    An ambiguous grammar can never be an LR grammar.

  • 40

    More on Shift-Reduce Parsing

    Conflicts are either shift/reduce or reduce/reduce.

    Example:

    stmt → if expr then stmt
         | if expr then stmt else stmt
         | other                          (any other statement)

    Stack                         Input
    ... if expr then stmt         else ...          shift/reduce conflict

    We can't tell whether "if expr then stmt" is a handle.

  • 41

    Conflict Resolution

    Conflicts can be resolved by adapting the parsing algorithm (e.g., in
    parser generators):

    Shift/reduce conflict:

    resolve in favor of shift.

    Reduce/reduce conflict:

    use the production that appears earlier in the grammar.

  • 42

    Shift-Reduce Parsers

    There are two main categories of shift-reduce parsers:

    1. Operator-Precedence Parser

    simple, but handles only a small class of grammars.

    2. LR Parsers

    cover a wide range of grammars.

    SLR  - simple LR parser
    LR   - most general LR parser
    LALR - intermediate LR parser (lookahead LR parser)

    SLR, LR and LALR parsers work the same way; only their parsing tables differ.

    (Grammar classes: SLR ⊂ LALR ⊂ LR ⊂ CFG)

  • 43

    Consider the Grammar

    S' → S
    S → ( S ) S | ε

    Show the actions of a shift-reduce parser for the input string ( ) using the
    above grammar.

  • 44

    Operator-Precedence Parser

    Operator grammar:

    a small but important class of grammars.

    We can have an efficient operator-precedence parser (a shift-reduce parser) for an operator grammar.

    In an operator grammar, no production rule can have:

    ε on the right side, or

    two adjacent non-terminals on the right side.

    Ex:

    E → AB          E → EOE             E → E + E
    A → a           E → id              E → E * E
    B → b           O → + | * | /       E → E / E
                                        E → id

    not operator        not operator        operator
    grammar             grammar             grammar

  • 45

    Precedence Relations

    In operator-precedence parsing, we define three disjoint precedence
    relations between certain pairs of terminals:

    a <. b    a has lower precedence than b
    a =. b    a has the same precedence as b
    a .> b    a has higher precedence than b (b has lower precedence than a)

    The determination of the correct precedence relations between terminals
    is based on the traditional notions of associativity and precedence of
    operators. (Unary minus causes a problem.)

  • 46

    Using Operator-Precedence Relations

    The intention of the precedence relations is to find the handle of
    a right-sentential form:

    <. marks the left end of the handle, =. appears in the interior of the
    handle, and .> marks the right end.

    In the input string $a1a2...an$, we insert the precedence relation
    between each pair of adjacent terminals (the relation that holds between
    the terminals in that pair).

  • 47

    Using Operator-Precedence Relations

    E → E+E | E-E | E*E | E/E | E^E | (E) | -E | id

    The partial operator-precedence table for this grammar:

           id    +     *     $
    id           .>    .>    .>
    +      <.    .>    <.    .>
    *      <.    .>    .>    .>
    $      <.    <.    <.

    Then the input string id+id*id with the precedence relations inserted
    will be:

    $ <. id .> + <. id .> * <. id .> $

  • 48

    To Find the Handle

    1. Scan the string from the left end until the first .> is encountered.

    2. Then scan backwards (to the left) over any =. until a <. is encountered.

    3. The handle contains everything to the left of the first .> and to the right of
       the <. found in step 2, including any intervening or surrounding non-terminals.

  • 49

    Operator-Precedence Parsing Algorithm

    The input string is w$, the initial stack is $, and a table holds precedence relations between certain terminals.

    Algorithm:

    set p to point to the first symbol of w$ ;
    repeat forever
      if ( $ is on top of the stack and p points to $ ) then return
      else {
        let a be the topmost terminal symbol on the stack and let b be the symbol pointed to by p;
        if ( a <. b or a =. b ) then {        /* SHIFT */
          push b onto the stack;
          advance p to the next input symbol;
        }
        else if ( a .> b ) then               /* REDUCE */
          repeat pop the stack
          until ( the top-of-stack terminal is related by <. to the terminal most recently popped )
        else error();
      }
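    A short Python sketch of this skeletal driver (not from the slides; names are
    illustrative), assuming the partial precedence table for id, +, * and $ from
    the earlier slide. Non-terminals never take part in the precedence relations,
    so this skeleton keeps only terminals (and $) on the stack; a real parser
    would also build the tree for each reduced handle.

    # '<' stands for <. , '=' for =. , '>' for .>
    PREC = {                    # PREC[a][b]: relation between stack terminal a and lookahead b
        "id": {"+": ">", "*": ">", "$": ">"},
        "+":  {"id": "<", "+": ">", "*": "<", "$": ">"},
        "*":  {"id": "<", "+": ">", "*": ">", "$": ">"},
        "$":  {"id": "<", "+": "<", "*": "<"},
    }

    def op_precedence_parse(tokens):
        stack, inp, p = ["$"], tokens + ["$"], 0
        while True:
            a, b = stack[-1], inp[p]           # topmost stack terminal, current input symbol
            if a == "$" and b == "$":
                return True                     # accept
            rel = PREC.get(a, {}).get(b)
            if rel in ("<", "="):               # SHIFT
                stack.append(b)
                p += 1
            elif rel == ">":                    # REDUCE: pop until top <. most recently popped
                while True:
                    popped = stack.pop()
                    if PREC[stack[-1]].get(popped) == "<":
                        break
            else:
                raise SyntaxError(f"no precedence relation between {a!r} and {b!r}")

    print(op_precedence_parse(["id", "+", "id", "*", "id"]))   # True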

  • 50

    Operator-Precedence Parsing Algorithm -- Example

    stack           input          action
    $               id+id*id$      $ <. id        shift
    $ id            +id*id$        id .> +        reduce E → id
    $ E             +id*id$        $ <. +         shift
    $ E +           id*id$         + <. id        shift
    $ E + id        *id$           id .> *        reduce E → id
    $ E + E         *id$           + <. *         shift
    $ E + E *       id$            * <. id        shift
    $ E + E * id    $              id .> $        reduce E → id
    $ E + E * E     $              * .> $         reduce E → E * E
    $ E + E         $              + .> $         reduce E → E + E
    $ E             $                             accept

  • 51

    How to Create Operator-Precedence Relations

    We use associativity and precedence relations among operators.

    1. If operator O1 has higher precedence than operator O2:
       O1 .> O2  and  O2 <. O1

    2. If operators O1 and O2 have equal precedence:
       if they are left-associative:   O1 .> O2  and  O2 .> O1
       if they are right-associative:  O1 <. O2  and  O2 <. O1

    3. For all operators O:
       O <. id,  id .> O,  O <. (,  ( <. O,  O .> ),  ) .> O,  O .> $,  $ <. O

    Also:  ( =. ),  $ <. (,  $ <. id,  ( <. (,  ( <. id,  ) .> ),  id .> ),  ) .> $,  id .> $

    [the slides also show the complete table of precedence relations built from
    these rules; it is not recoverable from the extracted text]

  • 53

    Handling Unary Minus

    Operator-precedence parsing cannot handle the unary minus when we
    also have the binary minus in our grammar.

    The best approach to solve this problem is to let the lexical analyzer handle it:

    the lexical analyzer will return two different operators for the unary minus and the binary minus.

    The lexical analyzer will need a lookahead to distinguish the binary minus from the unary minus.

    Then, we make

    O <. unary-minus            for any operator O

    unary-minus .> O            if unary-minus has higher precedence than O

    unary-minus <. O            if unary-minus has lower precedence than O

  • 54

    Precedence Functions

    Compilers using operator-precedence parsers do not need to store the
    table of precedence relations.

    The table can be encoded by two precedence functions f and g that map
    terminal symbols to integers, such that for symbols a and b:

    f(a) < g(b)  whenever  a <. b
    f(a) = g(b)  whenever  a =. b
    f(a) > g(b)  whenever  a .> b
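    A small Python sketch of this idea (not from the slides). The numeric values
    below are one consistent assignment for the id/+/*/$ fragment shown earlier;
    they are an illustration, not the only possible choice.

    f = {"+": 2, "*": 4, "id": 4, "$": 0}
    g = {"+": 1, "*": 3, "id": 5, "$": 0}

    def relation(a, b):
        """Return '<', '=' or '>' according to f(a) vs g(b)."""
        if f[a] < g[b]:  return "<"     # a <. b
        if f[a] > g[b]:  return ">"     # a .> b
        return "="                      # a =. b

    print(relation("+", "*"))    # '<'  : * has higher precedence than +
    print(relation("*", "+"))    # '>'
    print(relation("id", "$"))   # '>'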

  • 55

    Disadvantages of Operator Precedence Parsing

    Disadvantages:

    It cannot handle the unary minus (the lexical analyzer should handle

    the unary minus).

    Small class of grammars.

    Difficult to decide which language is recognized by the grammar.

    Advantages:

    simple

    powerful enough for expressions in programming languages

  • 56

    Error Recovery in Operator-Precedence Parsing

    Error Cases:

    1. No relation holds between the terminal on the top of stack and the

    next input symbol.

    2. A handle is found (reduction step), but there is no production with

    this handle as a right side

    Error Recovery:

    1. Each empty entry is filled with a pointer to an error routine.

    2. Decide which right-hand side the popped handle most closely resembles, and
       try to recover from that situation.

  • 57

    Handling Shift/Reduce Errors

    When consulting the precedence matrix to decide whether to shift or

    reduce, we may find that no relation holds between the terminal on top of
    the stack and the next input symbol.

    To recover, we must modify (insert/change)

    1. Stack or

    2. Input or

    3. Both.

    We must be careful that we don't get into an infinite loop.

  • 58-59 Example

    Precedence matrix with error-routine entries (only the id row survives in
    the extracted text):

         id    (     )     $
    id   e3    e3    .>    .>

  • 60

    CONSTRUCT THE OPERATOR-PRECEDENCE PARSING TABLE FOR THE GRAMMAR

    E → E+E | E*E | (E) | id

  • 61

    LR Parsers

    The most powerful (yet efficient) shift-reduce parsing method is
    LR(k) parsing.

    L: left-to-right scanning, R: rightmost derivation (in reverse),
    k: number of lookahead symbols (when k is omitted, it is 1).

    LR parsing is attractive because it is the most general non-backtracking
    shift-reduce parsing method, yet it is still efficient.

    The class of grammars that can be parsed using LR methods is a proper superset of the class
    of grammars that can be parsed with predictive parsers:

    LL(1) grammars ⊂ LR(1) grammars

    An LR parser can detect a syntactic error as soon as it is possible to do so on a left-to-right
    scan of the input.

  • 62

    LL(k) vs. LR(k)

    LL(k): must predict which production to use having seen only first k

    tokens of RHS

    Works only with some grammars

    But simple algorithm (can construct by hand)

    LR(k): more powerful

    can postpone the decision until it has seen the tokens of the entire RHS of a
    production, plus k more tokens beyond.

  • 63

    More on LR(k)

    Can recognize virtually all programming language constructs (if CFG

    can be given)

    Most general non-backtracking shift-reduce method known, but can be

    implemented efficiently

    The class of grammars that can be parsed is a superset of the grammars parsed by
    LL(k).

    Can detect syntax errors as soon as possible

  • 64

    More on LR(k)

    Main drawback: too tedious to do by hand for typical
    programming language grammars; we need a parser generator.

    Many available

    Yacc (yet another compiler compiler) or bison for C/C++

    environment

    CUP (Construction of Useful Parsers) for Java environment;

    JavaCC is another example

    We write the grammar and the generator produces the parser for that

    grammar

  • 65

    LR Parsers

    LR-Parsers

    covers wide range of grammars.

    SLR  - simple LR parser
    LR   - most general LR parser
    LALR - intermediate LR parser (lookahead LR parser)

    SLR, LR and LALR parsers work the same way (they use the same algorithm);
    only their parsing tables are different.

  • 66

    LR Parsing Algorithm

    [figure: an LR parser consists of a stack, an input buffer, an output stream,
    and the parsing tables]

    Stack:  S0 X1 S1 ... Xm-1 Sm-1 Xm Sm      (Si are states, Xi are grammar symbols)

    Input:  a1 ... ai ... an $

    Action table: rows are states, columns are terminals and $;
    each entry is one of four different actions.

    Goto table: rows are states, columns are non-terminals;
    each entry is a state number.

  • 67

    Key Idea

    Deciding when to shift and when to reduce is based on a DFA

    applied to the stack

    Edges of DFA labeled by symbols that can be on stack

    (terminals + non-terminals)

    Transition table defines transitions (and characterizes the type

    of LR parser)

  • 68

    Entries in Transition Table

    Entry    Meaning

    sn       Shift into state n (advance the input pointer to the next token)

    gn       Goto state n

    rk       Reduce by rule (production) k; the corresponding gn gives the next state

    a        Accept

             Error (denoted by a blank entry)

  • 69

    How to make the Parse Table?

    Use DFA for building parse tables

    Each state now summarizes how much we have seen so far

    and what we expect to see

    Helps us to decide what action we need to take

    How to build the DFA, then?

    Analyze the grammar and productions

    Need a notation to show how much we have seen so far

    for a given production: LR(0) item

  • 70

    LR(0) Item

    An LR(0) item is a production and a position in its RHS marked by a dot
    (e.g., A → α.β).

    The dot tells how much of the RHS we have seen so far. For example,
    for a production S → XYZ:

    S → .XYZ : we hope to see a string derivable from XYZ

    S → X.YZ : we have just seen a string derivable from X and we hope to see a string derivable from YZ

    S → XY.Z : we have just seen a string derivable from XY and we hope to see a string derivable from Z

    S → XYZ. : we have seen a string derivable from XYZ and are going to reduce it to S

    (X, Y, Z are grammar symbols)

  • 71

    SLR PARSING

    The central idea in the SLR method is first to construct from

    the grammar a DFA to recognize viable prefixes. We group

    items into sets, which become the states of the SLR parser.

    Viable prefixes:

    The prefixes of right-sentential forms that can appear on the
    stack of a shift-reduce parser are called viable prefixes.

    Example: a, aa, aab, and aabb are viable prefixes of aabbbbd.

    One collection of sets of LR(0) items, called the canonical

    LR(0) collection, provides the basis for constructing SLR

    parsers.

  • 72

    Augmented Grammar

    If G is a grammar with start symbol S, then G', the

    augmented grammar for G, is G with

    a new start symbol S' and

    the production S' → S.

    The purpose of the augmenting production is to indicate to
    the parser when it should stop parsing and accept the input.

    That is, acceptance occurs only when the parser is about to
    reduce by the production S' → S.

  • 73

    Constructing Sets of LR(0) Items

    1. Create a new non-terminal S' and a new production S' → S, where S is
       the start symbol.

    2. Put the item S' → .S into a start state called state 0.

    3. Closure: if A → α.Bβ is in state s, then add B → .γ to state s for
       every production B → γ in the grammar.

    4. Creating a new state from an old state [goto operation]: look for an
       item of the form A → α.xβ, where x is a single terminal or
       non-terminal, and build a new state from A → αx.β. Include in the
       new state all items with .x in the old state. A new state is created for
       each different x.

    5. Repeat steps 3 and 4 until no new states are created. A state is new if it
       is not identical to an old state.

  • 74

    The Closure Operation (Example)

    Grammar:  E' → E    E → E + T | T    T → T * F | F    F → ( E ) | id

    closure({ [E' → .E] }) =

    { [E' → .E], [E → .E + T], [E → .T] }                              (add the E-productions)

    { [E' → .E], [E → .E + T], [E → .T], [T → .T * F], [T → .F] }      (add the T-productions)

    { [E' → .E], [E → .E + T], [E → .T], [T → .T * F], [T → .F],
      [F → .( E )], [F → .id] }                                        (add the F-productions)
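    A minimal Python closure() sketch for sets of LR(0) items (not from the
    slides; the item representation and names are illustrative). Items are
    (lhs, rhs_tuple, dot_position); the grammar is a list of productions.

    GRAMMAR = [("E'", ("E",)),
               ("E", ("E", "+", "T")), ("E", ("T",)),
               ("T", ("T", "*", "F")), ("T", ("F",)),
               ("F", ("(", "E", ")")), ("F", ("id",))]
    NONTERMINALS = {"E'", "E", "T", "F"}

    def closure(items):
        items = set(items)
        changed = True
        while changed:
            changed = False
            for (lhs, rhs, dot) in list(items):
                if dot < len(rhs) and rhs[dot] in NONTERMINALS:   # dot before a non-terminal B
                    for (b_lhs, b_rhs) in GRAMMAR:
                        if b_lhs == rhs[dot]:
                            new = (b_lhs, b_rhs, 0)               # add B -> .gamma
                            if new not in items:
                                items.add(new)
                                changed = True
        return frozenset(items)

    I0 = closure({("E'", ("E",), 0)})    # reproduces the 7-item set built above
    print(len(I0))                       # 7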

  • 75

    Formal Definition of the GOTO Operation for Constructing
    LR(0) Items

    1. For each item [A → α.Xβ] ∈ I, add the set of items
       closure({[A → αX.β]}) to goto(I,X) if not already
       there.

    2. Repeat step 1 until no more items can be added to
       goto(I,X).

  • 76

    The Goto Operation (Example 1)

    Grammar:  E → E + T | T    T → T * F | F    F → ( E ) | id

    Suppose I = { [E' → .E], [E → .E + T], [E → .T], [T → .T * F], [T → .F], [F → .( E )], [F → .id] }

    Then goto(I,E) = closure({ [E' → E.], [E → E. + T] })
                   = { [E' → E.], [E → E. + T] }

  • 77

    The Goto Operation (Example 2)

    Grammar:  E → E + T | T    T → T * F | F    F → ( E ) | id

    Suppose I = { [E' → E.], [E → E. + T] }

    Then goto(I,+) = closure({ [E → E +. T] })
                   = { [E → E +. T], [T → .T * F], [T → .F], [F → .( E )], [F → .id] }
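    A goto() and canonical-collection sketch in Python, building on the closure()
    sketch above (same item representation; not from the slides, names are
    illustrative).

    SYMBOLS = NONTERMINALS | {"+", "*", "(", ")", "id"}

    def goto(items, x):
        """Move the dot over x in every item of the form A -> alpha . x beta."""
        moved = {(lhs, rhs, dot + 1)
                 for (lhs, rhs, dot) in items
                 if dot < len(rhs) and rhs[dot] == x}
        return closure(moved) if moved else frozenset()

    def canonical_collection():
        """Build the canonical LR(0) collection {I0, I1, ...} by repeated goto."""
        start = closure({("E'", ("E",), 0)})
        states, worklist = [start], [start]
        while worklist:
            i = worklist.pop()
            for x in SYMBOLS:
                j = goto(i, x)
                if j and j not in states:
                    states.append(j)
                    worklist.append(j)
        return states

    print(len(canonical_collection()))   # 12 states, I0 .. I11, matching the DFA in the slides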

  • 78

    State 0

    We start by adding the item E' → .E to state 0.

    This item has a "." immediately to the left of a non-terminal. Whenever this is
    the case, we must perform step 3 (closure) of the set-construction algorithm.

    We add the items E → .E + T and E → .T to state 0, giving

    I0: { E' → .E
          E → .E + T
          E → .T }

  • 79

    State 0

    Reapplying closure to E → .T, we must add the items T → .T * F and
    T → .F to state 0, giving

    I0: { E' → .E
          E → .E + T
          E → .T
          T → .T * F
          T → .F }

  • 80

    State 0

    Reapplying closure to T → .F, we must add the items F → .( E ) and F → .id
    to state 0, giving

    I0: { E' → .E
          E → .E + T
          E → .T
          T → .T * F
          T → .F
          F → .( E )
          F → .id }

  • 81

    Creating State 1 From State 0 [ goto(I0,E) ]

    Final version of state 0:

    I0: { E' → .E
          E → .E + T
          E → .T
          T → .T * F
          T → .F
          F → .( E )
          F → .id }

    Using step 4, we create new state 1 from items E' → .E and E → .E + T.

    I0 --E--> I1

  • 82

    State 1

    State 1 starts with the items E' → E. and E → E. + T. These items are formed
    from items E' → .E and E → .E + T by moving the "." one grammar symbol to the
    right. In each case, the grammar symbol is E.

    Closure does not add any new items, so state 1 ends up with the 2 items:

    I1: { E' → E.
          E → E. + T }

    I0 --E--> I1

  • 83

    Creating State 2 From State 0 [ goto(I0,T) ]

    Using step 4, we create state 2 from items E → .T and T → .T * F by moving
    the "." past the T.

    State 2 starts with 2 items,

    I2: { E → T.
          T → T. * F }

    Closure does not add additional items to state 2.

    I0 --T--> I2

  • 84

    Creating State 3 From State 0 [ goto(I0,F) ]

    Using step 4, we create state 3 from item T → .F.

    State 3 starts (and ends up) with one item:

    I3: { T → F. }

    Since the only item in state 3 is a complete item, there will be no
    transitions out of state 3.

    The figure on the next slide shows the DFA of viable prefixes to this point.

    I0 --F--> I3

  • 85

    DFA After Creation of State 3

  • 86

    Creating State 4 From State 0 [ goto(I0,( ) ]

    Using step 4, we create state 4 from item F → .( E ).

    State 4 begins with one item:

    F → (. E )

    Applying closure to this item, we add the items

    E → .E + T
    E → .T

  • 87

    State 4

    Applying closure to E → .T, we add items T → .T * F and T → .F to state 4,
    giving

    F → (. E )
    E → .E + T
    E → .T
    T → .T * F
    T → .F

  • 88

    State 4

    Applying step 3 to T → .F, we add items F → .( E ) and F → .id to state 4,
    giving the final set of items

    I4: { F → (. E )
          E → .E + T
          E → .T
          T → .T * F
          T → .F
          F → .( E )
          F → .id }

    The next slide shows the DFA to this point.

    I0 --(--> I4

  • 89

    DFA After Creation of State 4

  • 90

    Creating State 5 From State 0 [ goto(I0,id) ]

    Finally, from item F → .id in state 0, we create state 5, with the single item:

    I5: { F → id. }

    Since this item is a complete item, we will not be able to produce new states
    from state 5.

    The next slide shows the DFA to this point.

    I0 --id--> I5

  • 91

    DFA After Creation of State 5

  • 92

    Creating State 6 From State 1 [ goto(I1,+) ]

    State 1 consists of 2 items:

    E' → E.
    E → E. + T

    Create state 6 from item E → E. + T, giving the item E → E +. T.

    Closure results in the set of items

    I6: { E → E +. T
          T → .T * F
          T → .F
          F → .( E )
          F → .id }

    I1 --+--> I6

  • 93

    DFA After Creation of State 6

  • 94

    Creating State 7 From State 2 [ goto(I2,*) ]

    State 2 has two items,

    E → T.
    T → T. * F

    We create state 7 from T → T. * F, giving the initial item T → T *. F.

    Using closure, we end up with

    I7: { T → T *. F
          F → .( E )
          F → .id }

    I2 --*--> I7

  • 95

    DFA After Creation of State 7

  • 96

    Creating State 8 From State 4 [ goto(I4,E) ]

    We use the items F → (. E ) and E → .E + T from state 4 to add the following
    items to state 8:

    I8: { F → ( E. )
          E → E. + T }

    No further items can be added to state 8 through closure.

    There are other transitions from state 4, but they do not result in new states.

    I4 --E--> I8

  • 97

    Other Transitions From State 4 [ goto(I4,T), goto(I4,F), goto(I4,( ), goto(I4,id) ]

    If we use the items E → .T and T → .T * F from state 4 to start a new state,
    we begin with items

    E → T.
    T → T. * F

    This set is identical to state 2.

    Similarly, the items

    T → .F      will produce state 3
    F → .( E )  will produce state 4
    F → .id     will produce state 5

  • 98

    DFA After Creation of State 8

  • 99

    Creating State 9 From State 6 [ goto(I6,T) ]

    We use items E → E +. T and T → .T * F from state 6 to create state 9:

    I9: { E → E + T.
          T → T. * F }

    All other transitions from state 6 go to existing states. The next slide shows
    the DFA to this point.

    I6 --T--> I9

  • 100

    DFA After Creation of State 9

  • 101

    Creating State 10 From State 7 [ goto(I7,F) ]

    We use item T → T *. F from state 7 to create state 10:

    I10: { T → T * F. }

    All other transitions from state 7 go to existing states. The next slide shows
    the DFA to this point.

    I7 --F--> I10

  • 102

    DFA After Creation of State 10

  • 103

    Creation of State 11 From State 8 [ goto(I8,)) ]

    We use item F → ( E. ) from state 8 to create state 11:

    I11: { F → ( E ). }

    All other transitions from state 8 go to existing states.

    State 9 has one transition to an existing state (7). No other new states can
    be added, so we are done.

    The next slide shows the final DFA for viable prefixes.

    I8 --)--> I11

  • 104

    DFA for Viable Prefixes

  • 105

    (SLR) Parsing Tables for the Expression Grammar

    1) E → E+T    2) E → T    3) T → T*F    4) T → F    5) F → (E)    6) F → id

                    Action table                    Goto table
    state |  id    +    *    (    )    $   |   E    T    F
    ------+---------------------------------+---------------
      0   |  s5              s4             |   1    2    3
      1   |        s6                  acc  |
      2   |        r2   s7        r2   r2   |
      3   |        r4   r4        r4   r4   |
      4   |  s5              s4             |   8    2    3
      5   |        r6   r6        r6   r6   |
      6   |  s5              s4             |        9    3
      7   |  s5              s4             |            10
      8   |        s6             s11       |
      9   |        r1   s7        r1   r1   |
     10   |        r3   r3        r3   r3   |
     11   |        r5   r5        r5   r5   |

  • 106

    Constructing the Parse Table

    Construct the DFA (state graph) as in LR(0).

    Action table:

    If there is a transition from state i to state j on a terminal a,
    ACTION[i, a] = shift j.

    If there is a reduce item A → α. (for a production #k) in state i, then for each a ∈ FOLLOW(A),
    ACTION[i, a] = reduce k.

    If the item S' → S. is in state i,
    ACTION[i, $] = accept.

    Otherwise, error.

    GOTO table:

    Write GOTO entries for non-terminals only; for terminals the information is already embedded
    in the action table.

  • 107

    Algorithm: Construction of the SLR Parsing Table

    1. Construct the canonical collection of sets of LR(0) items for G':
       C ← {I0,...,In}

    2. Create the parsing action table as follows:

       If a is a terminal, A → α.aβ is in Ii and goto(Ii,a) = Ij, then action[i,a] is shift j.

       If A → α. is in Ii, then action[i,a] is reduce A → α for all a in FOLLOW(A), where A ≠ S'.

       If S' → S. is in Ii, then action[i,$] is accept.

       If any conflicting actions are generated by these rules, the grammar is
       not SLR(1).

    3. Create the parsing goto table:

       for all non-terminals A, if goto(Ii,A) = Ij then goto[i,A] = j.

    All entries not defined by (2) and (3) are errors.

    4. The initial state of the parser is the one containing S' → .S.
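    A Python sketch of these table-construction rules (not from the slides),
    reusing the closure(), goto() and canonical_collection() sketches from the
    earlier slides. The FOLLOW sets for the expression grammar are written out
    by hand here; they can of course be computed.

    FOLLOW = {"E": {"+", ")", "$"},
              "T": {"+", "*", ")", "$"},
              "F": {"+", "*", ")", "$"}}
    TERMINALS = {"id", "+", "*", "(", ")", "$"}

    def set_entry(table, key, value):
        """Record an entry; two different values for one cell means a conflict."""
        if table.setdefault(key, value) != value:
            raise ValueError("conflicting actions: the grammar is not SLR(1)")

    def slr_table(states):
        action, goto_tab = {}, {}
        for i, I in enumerate(states):
            for (lhs, rhs, dot) in I:
                if dot < len(rhs):                          # dot before a symbol x
                    x = rhs[dot]
                    j = states.index(goto(I, x))
                    if x in TERMINALS:
                        set_entry(action, (i, x), ("shift", j))
                    else:
                        goto_tab[(i, x)] = j
                elif lhs == "E'":                           # S' -> S. : accept on $
                    set_entry(action, (i, "$"), ("accept",))
                else:                                       # A -> alpha. : reduce on FOLLOW(A)
                    for a in FOLLOW[lhs]:
                        set_entry(action, (i, a), ("reduce", lhs, rhs))
        return action, goto_tab

    states = canonical_collection()
    action, goto_tab = slr_table(states)
    print(len(action), len(goto_tab))   # 36 action entries, 9 goto entries
                                        # (state numbering may differ from the slides)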

  • 108

    We use the partial DFA at right

    to fill in row 0 of the parse table.

    By rule 2a,

    action[ 0, ( ] = shift 4

    action[ 0, id ] = shift 5

    By rule 3,

    goto[ 0, E ] = 1

    goto[ 0, T ] = 2

    goto[ 0, F ] = 3

  • 109

    state id + * ( ) $ E T F

    0 s5 s4 1 2 3

    1

    2

    3

    4

    5

    6

    7

    8

    9

    10

    11

    1) E → E+T
    2) E → T
    3) T → T*F
    4) T → F
    5) F → (E)
    6) F → id

    Action Table Goto Table

  • 110

    We use the partial DFA at right

    to fill in row 1 of the parse table.

    By rule 2a,

    action [ 1, + ] = shift 6

    By rule 2c

    action [ 1, $ ] = accept

  • 111

    state id + * ( ) $ E T F

    0 s5 s4 1 2 3

    1 s6 acc

    2

    3

    4

    5

    6

    7

    8

    9

    10

    11

    1) E → E+T
    2) E → T
    3) T → T*F
    4) T → F
    5) F → (E)
    6) F → id

    Action Table Goto Table

  • 112

    We use the partial DFA at right

    to fill in row 5 of the parse table.

    By rule 2b, we set

    action[ 5, x ] = reduce F → id

    for each x ∈ Follow(F).

    Since Follow(F) = { ), +, *, $ },

    we have

    action[ 5, ) ] = reduce F → id
    action[ 5, + ] = reduce F → id
    action[ 5, * ] = reduce F → id
    action[ 5, $ ] = reduce F → id

  • 113

    state id + * ( ) $ E T F

    0 s5 s4 1 2 3

    1 s6 acc

    2

    3

    4

    5 r6 r6 r6 r6

    6

    7

    8

    9

    10

    1) E → E+T
    2) E → T
    3) T → T*F
    4) T → F
    5) F → (E)
    6) F → id

    Action Table Goto Table

  • 114

    Use the DFA to Finish the SLR Table

    The complete SLR parse table for the expression grammar is given on the next slide.

  • 115

    Parse Table for the Expression Grammar

    Rules:
    1. E → E + T
    2. E → T
    3. T → T * F
    4. T → F
    5. F → ( E )
    6. F → id

    Notation:
    s5 = shift 5
    r2 = reduce by E → T

                       action                        goto
    State |  id    +    *    (    )    $   |   E    T    F
    ------+---------------------------------+---------------
      0   |  s5              s4             |   1    2    3
      1   |        s6                  acc  |
      2   |        r2   s7        r2   r2   |
      3   |        r4   r4        r4   r4   |
      4   |  s5              s4             |   8    2    3
      5   |        r6   r6        r6   r6   |
      6   |  s5              s4             |        9    3
      7   |  s5              s4             |            10
      8   |        s6             s11       |
      9   |        r1   s7        r1   r1   |
     10   |        r3   r3        r3   r3   |
     11   |        r5   r5        r5   r5   |

  • 116

    Example SLR Grammar and LR(0) Items

    Augmented grammar:
    1. C' → C
    2. C → A B
    3. A → a
    4. B → a

    I0 = closure({[C' → .C]})
    I1 = goto(I0,C) = closure({[C' → C.]})

    State I0:  C' → .C,  C → .A B,  A → .a        (start)
    State I1:  C' → C.                            (final)
    State I2 = goto(I0,A):  C → A.B,  B → .a
    State I3 = goto(I0,a):  A → a.
    State I4 = goto(I2,B):  C → A B.
    State I5 = goto(I2,a):  B → a.

  • 117

    Example SLR Parsing Table

    Grammar:
    1. C' → C
    2. C → A B
    3. A → a
    4. B → a

    State I0:  C' → .C,  C → .A B,  A → .a
    State I1:  C' → C.
    State I2:  C → A.B,  B → .a
    State I3:  A → a.
    State I4:  C → A B.
    State I5:  B → a.

    state |  a     $   |  C    A    B
    ------+------------+--------------
      0   |  s3        |  1    2
      1   |       acc  |
      2   |  s5        |            4
      3   |  r3        |
      4   |       r2   |
      5   |       r4   |

    [figure: DFA over states 0-5 with transitions 0 --C--> 1, 0 --A--> 2,
     0 --a--> 3, 2 --B--> 4, 2 --a--> 5; state 0 is the start state]

  • 118

    Actions of an LR Parser

    1. shift s -- shift the next input symbol and the state s onto the stack:

       ( S0 X1 S1 ... Xm Sm,  ai ai+1 ... an $ )  becomes  ( S0 X1 S1 ... Xm Sm ai s,  ai+1 ... an $ )

    2. reduce A → β (or rn, where n is a production number):

       pop 2|β| (= 2r, where r = |β|) items from the stack;

       then push A and the state s given by the goto table.

       Output the reducing production: reduce A → β.

    3. Accept -- parsing successfully completed.

    4. Error -- the parser detected an error (an empty entry in the action table).

  • 119

    LR Parsing Algorithm

    Refer to the text:

    Compilers: Principles, Techniques, and Tools by Alfred V. Aho, Ravi
    Sethi, Jeffrey D. Ullman

    Page No. 218-219

  • 120

    Actions of an (S)LR Parser -- Example

    stack             input        action                  output
    0                 id*id+id$    shift 5
    0 id 5            *id+id$      reduce by F → id        F → id
    0 F 3             *id+id$      reduce by T → F         T → F
    0 T 2             *id+id$      shift 7
    0 T 2 * 7         id+id$       shift 5
    0 T 2 * 7 id 5    +id$         reduce by F → id        F → id
    0 T 2 * 7 F 10    +id$         reduce by T → T*F       T → T*F
    0 T 2             +id$         reduce by E → T         E → T
    0 E 1             +id$         shift 6
    0 E 1 + 6         id$          shift 5
    0 E 1 + 6 id 5    $            reduce by F → id        F → id
    0 E 1 + 6 F 3     $            reduce by T → F         T → F
    0 E 1 + 6 T 9     $            reduce by E → E+T       E → E+T
    0 E 1             $            accept
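    A compact table-driven driver sketch in Python (not from the slides; names
    are illustrative), hard-coding the SLR table for the expression grammar from
    the earlier slide. It reproduces the reduce sequence and accept decision of
    the trace above for id*id+id.

    PRODS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3),    # production: (LHS, length of RHS)
             4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

    ACTION = {
        (0, "id"): ("s", 5), (0, "("): ("s", 4),
        (1, "+"): ("s", 6), (1, "$"): ("acc",),
        (2, "+"): ("r", 2), (2, "*"): ("s", 7), (2, ")"): ("r", 2), (2, "$"): ("r", 2),
        (3, "+"): ("r", 4), (3, "*"): ("r", 4), (3, ")"): ("r", 4), (3, "$"): ("r", 4),
        (4, "id"): ("s", 5), (4, "("): ("s", 4),
        (5, "+"): ("r", 6), (5, "*"): ("r", 6), (5, ")"): ("r", 6), (5, "$"): ("r", 6),
        (6, "id"): ("s", 5), (6, "("): ("s", 4),
        (7, "id"): ("s", 5), (7, "("): ("s", 4),
        (8, "+"): ("s", 6), (8, ")"): ("s", 11),
        (9, "+"): ("r", 1), (9, "*"): ("s", 7), (9, ")"): ("r", 1), (9, "$"): ("r", 1),
        (10, "+"): ("r", 3), (10, "*"): ("r", 3), (10, ")"): ("r", 3), (10, "$"): ("r", 3),
        (11, "+"): ("r", 5), (11, "*"): ("r", 5), (11, ")"): ("r", 5), (11, "$"): ("r", 5),
    }
    GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3,
            (4, "E"): 8, (4, "T"): 2, (4, "F"): 3,
            (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}

    def lr_parse(tokens):
        states = [0]                                   # state stack (grammar symbols omitted)
        inp = tokens + ["$"]
        p = 0
        while True:
            act = ACTION.get((states[-1], inp[p]))
            if act is None:
                raise SyntaxError(f"error in state {states[-1]} on {inp[p]!r}")
            if act[0] == "s":                          # shift: push state, advance input
                states.append(act[1]); p += 1
            elif act[0] == "r":                        # reduce A -> beta: pop |beta| states,
                lhs, length = PRODS[act[1]]            # then push goto(exposed state, A)
                del states[len(states) - length:]
                states.append(GOTO[(states[-1], lhs)])
                print("reduce by production", act[1])
            else:
                return True                            # accept

    print(lr_parse(["id", "*", "id", "+", "id"]))      # True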

  • 121

    Exercise

    Consider the following grammar of simplified statement sequences:

    stmt_sequence → stmt_sequence ; stmt | stmt
    stmt → s

    a) Construct the DFA of LR(0) items of this grammar.

    b) Construct the SLR parsing table.

    c) Show the parsing stack and the actions of the SLR parser for the input
       string s;s;s

  • 122

    shift/reduce and reduce/reduce conflicts

    If a state does not know whether it will make a shift operation or

    reduction for a terminal, we say that there is a shift/reduce conflict.

    If a state does not know whether it will make a reduction operation using the production rule i or j for a terminal, we say that there is a

    reduce/reduce conflict.

    If the SLR parsing table of a grammar G has a conflict, we say that the
    grammar is not an SLR grammar.

  • 123

    Conflict Example

    Grammar:
    S → L = R
    S → R
    L → * R
    L → id
    R → L

    I0: S' → .S,  S → .L=R,  S → .R,  L → .*R,  L → .id,  R → .L
    I1: S' → S.
    I2: S → L.=R,  R → L.
    I3: S → R.
    I4: L → *.R,  R → .L,  L → .*R,  L → .id
    I5: L → id.
    I6: S → L=.R,  R → .L,  L → .*R,  L → .id
    I7: L → *R.
    I8: R → L.
    I9: S → L=R.

    Problem:  FOLLOW(R) = { =, $ }

    In state I2 on '=':  shift 6  or  reduce by R → L

    shift/reduce conflict

  • 124

    Conflict Example 2

    S → A a A b         I0: S' → .S
    S → B b B a             S → .A a A b
    A → ε                   S → .B b B a
    B → ε                   A → .
                            B → .

    Problem:
    FOLLOW(A) = { a, b }
    FOLLOW(B) = { a, b }

    In I0 on 'a':  reduce by A → ε  or  reduce by B → ε     reduce/reduce conflict
    In I0 on 'b':  reduce by A → ε  or  reduce by B → ε     reduce/reduce conflict

  • 125

    SLR(1)

    There is an easy fix for some of the shift/reduce or reduce/reduce conflicts; it requires looking one token ahead (the lookahead token).

    Steps to resolve the conflicts of an item set:

    1) for each shift item Y → β.γ, find FIRST(γ)

    2) for each reduction item X → α., find FOLLOW(X)

    3) if each FOLLOW(X) does not overlap with any of the other sets, the conflict is resolved!

    e.g., for the item set with E → T. and T → T.* F:

    FOLLOW(E) = { $, +, ) }
    FIRST(* F) = { * }

    no overlapping!

    This is an SLR(1) grammar, which is more powerful than LR(0).

  • 126

    General LR(1) Parsing

    The SLR(1) trick doesn't always work

    The difficulty with the SLR(1) method is that it applies lookaheads after

    the construction of the DFA of LR(0) items.

    The power of general LR(1) method is that it uses a new DFA that has

    the lookaheads built into its construction from the start.

    This DFA uses an extension of LR(0) items, i.e., LR(1) items.

    A single lookahead token is attached to each item.

    An LR(1) item is a pair consisting of an LR(0) item and a lookahead
    token, of the form

    [A → α.β, a]   where A → α.β is an LR(0) item and a is a token.

  • 127

    LR(1) Items

    An LR(1) item is:

    A → α.β, a    where a is the lookahead of the LR(1) item
                  (a is a terminal or the end-marker $).

    When β (in the LR(1) item A → α.β, a) is not empty, the lookahead does not have any effect.

    When β is empty (A → α., a), we do the reduction by A → α only if the next input symbol is a (not for every terminal in FOLLOW(A)).

    A state will contain   A → α., a1     where {a1,...,an} ⊆ FOLLOW(A)
                           ...
                           A → α., an

  • 128

    Canonical Collection of Sets of LR(1) Items

    The construction of the canonical collection of sets of LR(1) items
    is similar to the construction of the canonical collection of sets of
    LR(0) items, except that the closure and goto operations work a little
    differently.

    closure(I) is (where I is a set of LR(1) items):

    every LR(1) item in I is in closure(I);

    if A → α.Bβ, a is in closure(I) and B → γ is a production rule of G, then B → .γ, b will be in closure(I) for each terminal b in FIRST(βa).

  • 129

    goto Operation

    If I is a set of LR(1) items and X is a grammar symbol
    (terminal or non-terminal), then goto(I,X) is defined as
    follows:

    if A → α.Xβ, a is in I,

    then every item in closure({A → αX.β, a}) will be in goto(I,X).

  • 130

    Construction of the Canonical LR(1) Collection

    Algorithm:

    C is { closure({ S' → .S, $ }) }

    repeat the following until no more sets of LR(1) items can be added to C:

    for each I in C and each grammar symbol X,

    if goto(I,X) is not empty and not in C,

    add goto(I,X) to C.

    The goto function is a DFA on the sets in C.

  • 131

    A Short Notation for Sets of LR(1) Items

    A set of LR(1) items containing the items

    A → α.β, a1
    ...
    A → α.β, an

    can be written as

    A → α.β, a1/a2/.../an

  • 132

    LR(1) Items - Example

    S → CC
    C → cC | d

    Augmented grammar:

    S' → S
    S → CC
    C → cC | d

    Start with closure({ S' → .S, $ }):

    I0: { S' → .S, $
          S → .CC, $
          C → .cC, c/d
          C → .d, c/d }
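    A small Python sketch of LR(1) closure for this grammar (not from the slides;
    names are illustrative). Items are (lhs, rhs, dot, lookahead); FIRST is
    written out by hand for this tiny grammar, which has no ε-deriving symbols.

    GRAMMAR1 = [("S'", ("S",)), ("S", ("C", "C")), ("C", ("c", "C")), ("C", ("d",))]
    NONTERM1 = {"S'", "S", "C"}
    FIRST = {"S": {"c", "d"}, "C": {"c", "d"}, "c": {"c"}, "d": {"d"}}

    def first_of(beta, lookahead):
        """FIRST(beta a) for a beta with no epsilon-deriving symbols."""
        return FIRST[beta[0]] if beta else {lookahead}

    def closure1(items):
        items = set(items)
        changed = True
        while changed:
            changed = False
            for (lhs, rhs, dot, la) in list(items):
                if dot < len(rhs) and rhs[dot] in NONTERM1:       # A -> alpha . B beta, a
                    for b in first_of(rhs[dot + 1:], la):         # every b in FIRST(beta a)
                        for (b_lhs, b_rhs) in GRAMMAR1:
                            if b_lhs == rhs[dot]:
                                new = (b_lhs, b_rhs, 0, b)        # add B -> .gamma, b
                                if new not in items:
                                    items.add(new)
                                    changed = True
        return frozenset(items)

    I0 = closure1({("S'", ("S",), 0, "$")})
    for it in sorted(I0):
        print(it)          # reproduces I0 above (lookaheads $, c and d)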

  • 133

    State 1 from State 0 [ goto(I0,S) ]

    I1: { S' → S., $ }

    I0 --S--> I1

  • 134

    State 2 from State 0 [ goto(I0,C) ],
    State 3 from State 0 [ goto(I0,c) ],
    State 4 from State 0 [ goto(I0,d) ]

    I2: { S → C.C, $
          C → .cC, $
          C → .d, $ }

    I3: { C → c.C, c/d
          C → .cC, c/d
          C → .d, c/d }

    I4: { C → d., c/d }

    The DFA up to this point is shown on the next slide.

  • 135

    [figure: DFA of LR(1) item sets so far]

    I0 --S--> I1,  I0 --C--> I2,  I0 --c--> I3,  I0 --d--> I4

    I0: S' → .S, $    S → .CC, $    C → .cC, c/d    C → .d, c/d
    I1: S' → S., $
    I2: S → C.C, $    C → .cC, $    C → .d, $
    I3: C → c.C, c/d  C → .cC, c/d  C → .d, c/d
    I4: C → d., c/d

  • 136

    New States from State 2: goto(I2,C), goto(I2,c), goto(I2,d)

    I5: { S → CC., $ }

    I6: { C → c.C, $
          C → .cC, $
          C → .d, $ }

    I7: { C → d., $ }

  • 137

    [figure: the complete DFA of LR(1) item sets, including the remaining states
    I8 = goto(I3,C) = { C → cC., c/d } and I9 = goto(I6,C) = { C → cC., $ }]

  • 138

    Construction of LR(1) Parsing Tables

    1. Construct the canonical collection of sets of LR(1) items for G':
       C ← {I0,...,In}

    2. Create the parsing action table as follows:

       If a is a terminal, A → α.aβ, b is in Ii and goto(Ii,a) = Ij, then action[i,a] is shift j.

       If A → α., a is in Ii (A ≠ S'), then action[i,a] is reduce A → α.

       If S' → S., $ is in Ii, then action[i,$] is accept.

       If any conflicting actions are generated by these rules, the grammar is not LR(1).

    3. Create the parsing goto table:

       for all non-terminals A, if goto(Ii,A) = Ij then goto[i,A] = j.

    4. All entries not defined by (2) and (3) are errors.

    5. The initial state of the parser is the one containing S' → .S, $.

  • 139

    Canonical LR Parsing Table

    1. S → CC
    2. C → cC
    3. C → d

             ACTION               GOTO
          c      d      $       S     C
    0     s3     s4             1     2
    1                   acc
    2     s6     s7                   5
    3     s3     s4                   8
    4     r3     r3
    5                   r1
    6     s6     s7                   9
    7                   r3
    8     r2     r2
    9                   r2

  • 140

    LALR(1)

    If the lookaheads s1 and s2 are different, then the items A → α., s1
    and A → α., s2 are different.

    This results in a large number of states, since the number of combinations of expected lookahead
    symbols can be very large.

    We can combine two such states into one by creating an item A → α., s3, where s3 is the union of s1 and s2.

    LALR(1) is weaker than LR(1) but more powerful than SLR(1).

    LALR(1) and LR(0) have the same number of states.

    Most parser generators are LALR(1), including CUP (Constructor of
    Useful Parsers).

  • 141

    Practical Considerations

    How to avoid reduce/reduce and shift/reduce conflicts: left recursion is good, right recursion is bad

    Most shift/reduce errors are easy to remove by assigning precedence and

    associativity to operators

    + and * are left-associative

    * has higher precedence than +

  • 142

    LALR Parsing Tables

    LALR stands for LookAhead LR.

    LALR parsers are often used in practice because LALR parsing tables

    are smaller than LR(1) parsing tables.

    The number of states in SLR and LALR parsing tables for a grammar G

    are equal.

    But LALR parsers recognize more grammars than SLR parsers.

    yacc creates an LALR parser for the given grammar.

    A state of an LALR parser will again be a set of LR(1) items.

  • 143

    Creating LALR Parsing Tables

    Canonical LR(1) parser → LALR parser: shrink the number of states.

    This shrinking process may introduce a reduce/reduce conflict in the
    resulting LALR parser (in which case the grammar is NOT LALR).

    But this shrinking process cannot produce a shift/reduce conflict.

  • 144

    The Core of a Set of LR(1) Items

    We find the states (sets of LR(1) items) in a canonical LR(1) parser
    that have the same core (the same LR(0) items). Then we merge them into a single state.

    I4: C → d., c/d        have the same core; merge them into
    I7: C → d., $          a new state  I47: C → d., c/d/$

    We do this for all states of a canonical LR(1) parser to get the states
    of the LALR parser.

    In fact, the number of states of the LALR parser for a grammar will
    be equal to the number of states of the SLR parser for that grammar.

  • 145

    I3: C → c.C, c/d       have the same core; merge them into
        C → .cC, c/d       a new state  I36: C → c.C, c/d/$
        C → .d, c/d                          C → .cC, c/d/$
                                             C → .d, c/d/$
    I6: C → c.C, $
        C → .cC, $
        C → .d, $

  • 146

    I8 and I9 are replaced by their union:

    I89: { C → cC., c/d/$ }

  • 147

    Creation of LALR Parsing Tables

    Create the canonical LR(1) collection of sets of LR(1) items for the given grammar.

    Find each core; find all sets having that same core; replace those sets having the same core with a single set which is their union.

    C = {I0,...,In}   becomes   C' = {J1,...,Jm}   where m ≤ n

    Create the parsing tables (action and goto tables) the same way as for the LR(1) parser. Note that if J = I1 ∪ ... ∪ Ik, since I1,...,Ik have the same core,

    the cores of goto(I1,X),...,goto(Ik,X) must also be the same.

    So goto(J,X) = K, where K is the union of all sets of items having the same core as goto(I1,X).

    If no conflict is introduced, the grammar is an LALR(1) grammar. (We may only introduce reduce/reduce conflicts; we cannot introduce a shift/reduce conflict.)
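    A tiny Python sketch of the core-merging step (not from the slides),
    building on the LR(1) item representation used in the earlier sketch:
    states with the same LR(0) core are replaced by their union.

    def core(state):
        return frozenset((lhs, rhs, dot) for (lhs, rhs, dot, _la) in state)

    def merge_by_core(lr1_states):
        merged = {}
        for st in lr1_states:                               # states with the same core ...
            merged.setdefault(core(st), set()).update(st)   # ... are replaced by their union
        return [frozenset(s) for s in merged.values()]

    # e.g. I4 and I7 from the previous slides collapse into one state (I47):
    I4 = frozenset({("C", ("d",), 1, "c"), ("C", ("d",), 1, "d")})
    I7 = frozenset({("C", ("d",), 1, "$")})
    print(merge_by_core([I4, I7]))    # one state with lookaheads c, d and $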

  • 148

    LALR Parsing Table

    1. S → CC
    2. C → cC
    3. C → d

              ACTION                GOTO
           c      d      $        S     C
    0      s36    s47             1     2
    1                    acc
    2      s36    s47                   5
    36     s36    s47                   89
    47     r3     r3     r3
    5                    r1
    89     r2     r2     r2

  • 149

    Exercises

    Q1. Show that the following grammar

    S → Aa | bAc | dc | dba
    A → d

    is LALR(1) but not SLR(1).

    Q2. Show that the following grammar

    S → Aa | bAc | Bc | bBa
    A → d
    B → d

    is LR(1) but not LALR(1).