
Tomita’s Parser

Tihomira Panayotova, Paolina Teneva

Seminar für Sprachwissenschaft

31.01.2007

A simple overview

- LR(0) conflicts
- Tomita’s method summarized
- Complications
- Two optimizations
- A moderately ambiguous grammar
- Stack duplication
- Combining equal states
- Combining equal stack prefixes
- Discussion
- Summary

LR(0) Conflicts

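An LR parser requires:

1. a handle-recognizing finite-state (FS) automaton

2. no inadequate states in that automaton

There exist grammars for which the automaton has some inadequate states, that is, states in which it cannot decide between a shift and a reduce, or between several reduces.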


Tomita’s method summarized

Simple definition:

– A breadth-first search over those parsing decisions that are not solved by the LR automaton

– It gives an efficient and very effective approach to grammars for which the automaton has some inadequate states


How does the parser act when it encounters an inadequate state on the top of the stack?

Step 1. It duplicates the stack and splits the parse into a separate process for each copy:

One copy is reserved for the REDUCE step

The other copy is reserved for the SHIFT step

Step 2. Stacks whose right-most state does not allow a shift on the next input token are DISCARDED


SHIFT step: pushes the new symbol and the new state onto the stack

REDUCE step: removes part of the right end of the stack and replaces it with a non-terminal; using this non-terminal as a move in the automaton, we find a new state to put on top of the stack

Conclusion: Every time we encounter an inadequate state on top of the stack, the duplication process is repeated until all reduces have been treated.

Complications

The repetition of the duplication process can cause a proliferation of stacks. A great number of stacks will be copied and subsequently discarded.

If all stacks are discarded in Step 2, the input was in error.

Grammars with loops (e.g. A -> B, B -> A): the process may not terminate.

Complications

Some ideas to cope with the complications:

1. Use look-ahead to decide which reduces can be made in Step 1

2. Grammars with loops:

2.1. upon creating a stack, check if it is already there (and then ignore it)

2.2. check the grammar in advance for loops (and then reject it)
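Idea 2.2 can be sketched as a cycle check over unit rules. This is a minimal illustration, not part of the original slides: the function name `has_loop` and the rule format (a list of `(lhs, rhs)` pairs) are hypothetical, and the sketch assumes a grammar without empty productions, since those could hide loops inside longer rules.

```python
def has_loop(grammar):
    """Return True if some non-terminal derives itself through unit rules.

    Assumption: no empty productions, so only rules of the form A -> B
    (a single non-terminal on the right) can take part in a loop.
    """
    nonterminals = {lhs for lhs, _ in grammar}
    unit = {}  # A -> set of B such that A -> B is a rule
    for lhs, rhs in grammar:
        if len(rhs) == 1 and rhs[0] in nonterminals:
            unit.setdefault(lhs, set()).add(rhs[0])

    def reaches_itself(start):
        seen, todo = set(), list(unit.get(start, ()))
        while todo:
            current = todo.pop()
            if current == start:
                return True
            if current not in seen:
                seen.add(current)
                todo.extend(unit.get(current, ()))
        return False

    return any(reaches_itself(n) for n in nonterminals)


looping = [("A", ("B",)), ("B", ("A",))]
example = [("S", ("E", "#")), ("E", ("E", "+", "E")), ("E", ("d",))]
print(has_loop(looping), has_loop(example))
```

A parser generator could run such a check once and reject the grammar before any input is parsed.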

Two optimizations

Combining equal states

Combining equal stack prefixes

A moderately ambiguous grammar

S -> E #

E -> E + E

E -> d

Figure 9.38 A moderately ambiguous grammar

LR(0) automaton for the grammar
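The automaton can be derived mechanically from the grammar. The following is a minimal sketch of the standard LR(0) item-set construction, assuming `S` as start symbol and `#` as end marker; the function names are illustrative, and the state numbers it produces need not match the numbering used in the traces. An item is a pair (rule index, dot position).

```python
from collections import deque

GRAMMAR = [
    ("S", ("E", "#")),
    ("E", ("E", "+", "E")),
    ("E", ("d",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add X -> .gamma for every non-terminal X that follows a dot."""
    result, todo = set(items), list(items)
    while todo:
        rule, dot = todo.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    todo.append((i, 0))
    return frozenset(result)


def goto(state, symbol):
    """Move the dot over `symbol` in every item that allows it."""
    moved = {(r, d + 1) for r, d in state
             if d < len(GRAMMAR[r][1]) and GRAMMAR[r][1][d] == symbol}
    return closure(moved)


def build_automaton():
    start = closure({(0, 0)})
    states, edges, queue = [start], {}, deque([start])
    while queue:
        state = queue.popleft()
        symbols = sorted({GRAMMAR[r][1][d] for r, d in state
                          if d < len(GRAMMAR[r][1])})
        for symbol in symbols:
            target = goto(state, symbol)
            if target not in states:
                states.append(target)
                queue.append(target)
            edges[(states.index(state), symbol)] = states.index(target)
    return states, edges


def is_inadequate(state):
    """A reduce item together with a shift item, or several reduce items."""
    reduces = sum(1 for r, d in state if d == len(GRAMMAR[r][1]))
    shifts = len(state) - reduces
    return reduces > 1 or (reduces >= 1 and shifts >= 1)


states, edges = build_automaton()
print(len(states), [i for i, s in enumerate(states) if is_inadequate(s)])
```

For this grammar the construction yields six states, one of which is inadequate: the state containing both `E -> E + E .` and `E -> E . + E`.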

Stack Duplication

a.    1                          d+d+d#  shift
b.    1 d 2                       +d+d#  reduce
c.    1 E 3                       +d+d#  shift
d.    1 E 3 + 4                    d+d#  shift
e.    1 E 3 + 4 d 2                 +d#  reduce
f.    1 E 3 + 4 E 5                 +d#  duplicate to g1 and g2
g1.   1 E 3 + 4 E 5                 +d#  REDUCE; reduce to g1.1
g2.   1 E 3 + 4 E 5                 +d#  SHIFT; shift to g1.2
g1.1  1 E 3                         +d#  shift to h1
g1.2  1 E 3 + 4 E 5 + 4              d#  shift to h2
h1    1 E 3 + 4                      d#  shift to h1.1
h2    1 E 3 + 4 E 5 + 4 d 2           #  reduce to h1.2
h1.1  1 E 3 + 4 d 2                   #  reduce to i
h1.2  1 E 3 + 4 E 5 + 4 E 5           #  duplicate to i1 and i2
i     1 E 3 + 4 E 5                   #  duplicate to j1 and j2

Stack Duplication

i1    1 E 3 + 4 E 5 + 4 E 5           #  REDUCE, reduce to k1
i2    1 E 3 + 4 E 5 + 4 E 5           #  SHIFT - DISCARDED
j1    1 E 3 + 4 E 5                   #  REDUCE, reduce to k2
j2    1 E 3 + 4 E 5                   #  SHIFT - DISCARDED
k1    1 E 3 + 4 E 5                   #  reduce to l1
k2    1 E 3                           #  shift to l2
l1    1 E 3                           #  shift to m1
l2    1 E 3 # 6                          reduce to m2
m1    1 E 3 # 6                          reduce to n
m2    1 S                                ACCEPT
n     1 S                                ACCEPT
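The trace above can be reproduced with a small recognizer. This is a sketch under stated assumptions, not the authors' implementation: the `SHIFT`, `GOTO` and `REDUCE` tables are read off the state numbers 1 to 6 in the trace, the function names are illustrative, and for simplicity every stack is kept as a candidate for SHIFT, relying on Step 2 to discard the ones that cannot shift.

```python
# Tables read off the trace (an assumption, not given explicitly in the slides).
SHIFT = {(1, "d"): 2, (4, "d"): 2, (3, "+"): 4, (5, "+"): 4, (3, "#"): 6}
GOTO = {(1, "E"): 3, (4, "E"): 5}
REDUCE = {2: ("E", 1), 5: ("E", 3), 6: ("S", 2)}  # state -> (lhs, rhs length)


def reduce_closure(stacks):
    """Step 1: repeat duplication until all reduces have been treated.

    Returns the stacks that are candidates for SHIFT and the number of
    stacks that were reduced to S (accepted).
    """
    ready, accepted, work = [], 0, list(stacks)
    while work:
        stack = work.pop()
        top = stack[-1]
        if top in REDUCE:
            lhs, length = REDUCE[top]
            base = stack[: len(stack) - 2 * length]  # drop symbol/state pairs
            if lhs == "S":
                accepted += 1                         # "1 S ACCEPT"
            else:
                work.append(base + (lhs, GOTO[(base[-1], lhs)]))  # REDUCE copy
        ready.append(stack)                           # SHIFT copy
    return ready, accepted


def parse(tokens):
    """Return the number of accepting stacks, i.e. the number of parses."""
    stacks = [(1,)]  # a stack is a tuple: state, symbol, state, ...
    for token in tokens:
        stacks, _ = reduce_closure(stacks)
        # Step 2: discard stacks whose top state cannot shift the token.
        stacks = [s + (token, SHIFT[(s[-1], token)])
                  for s in stacks if (s[-1], token) in SHIFT]
        if not stacks:
            return 0  # all stacks discarded: the input was in error
    _, accepted = reduce_closure(stacks)
    return accepted


print(parse("d+d+d#"))  # 2
```

The result 2 corresponds to the two ACCEPT lines m2 and n in the trace.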

Parse trees


Combining equal states

Examine the following:

Both stacks have the same state on top => further actions on both stacks will be identical

Combine the two stacks to avoid duplicate work


f)   1. 1 E 3 + 4 d 2             # both
     2. 1 E 3 + 4 E 5 + 4 d 2     # REDUCE to g

g)   1. 1 E 3 + 4 E 5             # duplicate to
     2. 1 E 3 + 4 E 5 + 4 E 5     # g’ and g’’

g’)  1. 1 E 3 + 4 E 5             # for REDUCE
     2. 1 E 3 + 4 E 5 + 4 E 5     #

g’’) 1. 1 E 3 + 4 E 5             # copy to h3)
     2. 1 E 3 + 4 E 5 + 4 E 5     # for SHIFT


g’.1) 1 E 3 + 4 E 5               # REDUCE to h1)

g’.2) 1 E 3 + 4 E 5 + 4 E 5       # REDUCE to h2)

h1)   1 E 3                       # SHIFT

h2)   1 E 3 + 4 E 5               # REDUCE to h2.1) and h2.2)

h2.1) 1 E 3                       # SHIFT

h2.2) 1 E 3 + 4 E 5               # SHIFT

h3)   1. 1 E 3 + 4 E 5            # SHIFT
      2. 1 E 3 + 4 E 5 + 4 E 5


Now we have five stacks in four entries: h1), h2.1), h2.2), and the two stacks of h3).

h1) and h2.1) carry state (3) on top.

h2.2) and h3) carry state (5) on top.

h1)   1 E 3                       # SHIFT

h2)   1 E 3 + 4 E 5               # REDUCE to h2.1), copy to h2.2)

h2.1) 1 E 3                       # SHIFT

h2.2) 1 E 3 + 4 E 5               # SHIFT

h3)   1. 1 E 3 + 4 E 5            # SHIFT
      2. 1 E 3 + 4 E 5 + 4 E 5


We combine the stacks with identical states on top into two bundles h’ and h’’.

h’)  h1)   1 E 3                        # copy to i)
     h2.1) 1 E 3

h’’) h3)   1. 1 E 3 + 4 E 5
           2. 1 E 3 + 4 E 5 + 4 E 5     # discard
     h2.2) 1 E 3 + 4 E 5
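The bundling step amounts to grouping stacks by their top state and deciding once per bundle. A minimal sketch follows; the labels `h3.1` and `h3.2` for the two stacks inside h3) and the `CAN_SHIFT` table (read off the traces) are assumptions introduced for illustration.

```python
from collections import defaultdict

# Tokens each top state can shift, as read off the traces (an assumption).
CAN_SHIFT = {3: {"+", "#"}, 5: {"+"}}


def bundle_by_top_state(stacks):
    """Group stack names by the state on top of each stack."""
    bundles = defaultdict(list)
    for name, stack in stacks.items():
        bundles[stack[-1]].append(name)
    return dict(bundles)


stacks = {
    "h1": (1, "E", 3),
    "h2.1": (1, "E", 3),
    "h2.2": (1, "E", 3, "+", 4, "E", 5),
    "h3.1": (1, "E", 3, "+", 4, "E", 5),
    "h3.2": (1, "E", 3, "+", 4, "E", 5, "+", 4, "E", 5),
}

bundles = bundle_by_top_state(stacks)
# One decision per bundle: with "#" as next token only state 3 can shift.
survivors = [top for top in bundles if "#" in CAN_SHIFT[top]]
print(bundles)
print(survivors)
```

Because every stack in a bundle has the same top state, the shift or discard decision is made once for the whole bundle instead of once per stack.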


i)   1) 1 E 3
                3 # 6
     2) 1 E 3

i’)  1 E 3 # 6                    REDUCE to j1)

i’’) 1 E 3 # 6                    REDUCE to j2)

j1)  1 S                          ACCEPT

j2)  1 S                          ACCEPT

Combining equal stack prefixes

When the parser makes the call for the stack to be copied, there is no actual need to copy the entire stack!

Copying just the top states suffices.

Consider the example:

e) 1 E 3 + 4 E 5 +d#

When we duplicate the stack, we have two copies of it. REDUCE is applied to only one of the copies, and only as much of the stack is copied as the REDUCE affects:

e’) 1 E 3 +d# SHIFT

e’’) 1 E 3 + 4 E 5 +d# SHIFT
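One way to get this sharing is to represent a stack as a chain of immutable nodes, so that a copy is just a second reference to the same top node. The sketch below is an illustration, not the slides' own data structure: the `Node` class and the helpers `push`, `pop` and `to_list` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    state: int
    symbol: Optional[str]    # symbol that led to `state`; None at the bottom
    below: Optional["Node"]  # rest of the stack, shared between copies


def push(stack, symbol, state):
    return Node(state, symbol, stack)


def pop(stack, count):
    """Drop `count` symbol/state pairs; nothing is copied or modified."""
    for _ in range(count):
        stack = stack.below
    return stack


def to_list(stack):
    items = []
    while stack is not None:
        items.append(stack.state)
        if stack.symbol is not None:
            items.append(stack.symbol)
        stack = stack.below
    return items[::-1]


# e) 1 E 3 + 4 E 5
e = push(push(push(Node(1, None, None), "E", 3), "+", 4), "E", 5)
shift_copy = e                          # e'': no copying at all
reduce_copy = push(pop(e, 3), "E", 3)   # e': one new node on the shared bottom

print(to_list(reduce_copy), to_list(shift_copy))
print(reduce_copy.below is pop(shift_copy, 3))
```

Both stacks end in the same bottom node, so the REDUCE creates a single new node and leaves the SHIFT copy untouched.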

Discussion

Table characteristics:

- The method can work with every bottom-up table

- The weaker the table, the more non-determinism will have to be resolved by breadth-first search

Time requirements:

- In theory: exponential

- In practice: linear or slightly more than linear

SUMMARY

Breadth-first search over those parsing decisions that are not solved by the LR automaton

Important notions that should be memorized:

– stack duplication (inadequate states, reduce, shift, discarded)

– combining equal states

– combining equal stack prefixes

References

Dick Grune & Ceriel Jacobs (1990). Parsing Techniques
