
Chapter 5 Bottom-Up Parsing

Zhang Jing, Wang HaiLing

College of Computer Science & Technology

Harbin Engineering University


Shift-reduce parsing attempts to construct a parse tree for an input string beginning at the leaves (the bottom) and working up towards the root (the top). We can think of this process as reducing a string to the start symbol of the grammar. At each reduction step, a particular substring that matches the right side of a production is replaced by the symbol on the left side of that production. An easy-to-implement form of shift-reduce parsing is operator-precedence parsing. Much more general methods of shift-reduce parsing are LR(0) and SLR(1) parsing. The position of the bottom-up syntax analyzer in a compiler is shown in Fig. 5.1.


5.1 Operator-precedence Parsing

If a grammar has the property that no production right side has two adjacent nonterminals, we can easily construct an efficient shift-reduce parser by hand, using an easy-to-implement technique called operator-precedence parsing. The technique can be described as a manipulation of tokens without reference to any grammar. Once we finish building an operator-precedence parser from a grammar, we may effectively ignore the grammar, using the nonterminals on the stack only as placeholders for attributes associated with the nonterminals.


5.1.1 Relation between pairs of operator precedence

There are three precedence relations between a pair of terminals. Let a and b belong to VT, and let U, V and R belong to VN. The relations are:

1. a ≐ b means there is a rule U ::= …ab… or U ::= …aVb…

2. a ⋖ b means there is a rule U ::= …aR… with R ⇒+ b… or R ⇒+ Vb…

3. a ⋗ b means there is a rule U ::= …Rb… with R ⇒+ …a or R ⇒+ …aV

Note: The precedence relations between a and b differ from the arithmetic relations "less than", "equal to" and "greater than". In particular, a ≐ b does not imply b ≐ a, and a ⋖ b does not imply b ⋗ a.

Example 5.1 Grammar G[E]:

E ::= E+T | T
T ::= T*F | F
F ::= (E) | i

From rule F ::= (E), we obtain the precedence relation between "(" and ")": ( ≐ )

From rule E ::= E+T, we know that "+" is followed by T, and T ⇒+ T*…, so the precedence relation between "+" and "*" is + ⋖ *

From rule F ::= (E) and E ⇒+ …+T, we obtain the precedence relation between "+" and ")": + ⋗ )


5.1.2 Constructing Operator-precedence Relations

In this section we give a general method of constructing the operator-precedence relations. First, we define two new sets, FIRSTTERM(U) and LASTTERM(U):

b ∈ FIRSTTERM(U) when U ⇒+ b… or U ⇒+ Vb…

b ∈ LASTTERM(U) when U ⇒+ …b or U ⇒+ …bV

where b ∈ VT and V ∈ VN.
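The two definitions can be evaluated by iterating until no set grows any more. The following is a minimal Python sketch, not taken from the slides: the dictionary encoding of the grammar of Example 5.1 and the function name term_sets are assumptions made only for illustration.

```python
# Grammar of Example 5.1. Keys are nonterminals; every other symbol is a terminal.
GRAMMAR = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["i"]],
}


def term_sets(grammar, last=False):
    """Return FIRSTTERM for every nonterminal (LASTTERM when last=True)."""
    sets = {u: set() for u in grammar}
    changed = True
    while changed:  # repeat until no set grows any more
        changed = False
        for u, alternatives in grammar.items():
            for rhs in alternatives:
                body = rhs[::-1] if last else rhs  # LASTTERM reads the rule from the right
                found = set()
                if body[0] in grammar:
                    found |= sets[body[0]]  # U ::= V...   inherits the set of V
                    if len(body) > 1 and body[1] not in grammar:
                        found.add(body[1])  # U ::= Vb...  contributes b
                else:
                    found.add(body[0])  # U ::= b...   contributes b
                if not found <= sets[u]:
                    sets[u] |= found
                    changed = True
    return sets


FIRSTTERM = term_sets(GRAMMAR)
LASTTERM = term_sets(GRAMMAR, last=True)
for u in GRAMMAR:
    print(u, sorted(FIRSTTERM[u]), sorted(LASTTERM[u]))
```

Reversing each right side lets one function serve both sets, since LASTTERM is FIRSTTERM read from the right.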


The algorithm for constructing the operator-precedence relations is as follows, where a, b ∈ VT and U, R, V ∈ VN.

Step 1. Construct the sets FIRSTTERM and LASTTERM for each nonterminal.

Step 2. For each rule of grammar G:

If the rule has the form U ::= …ab… or U ::= …aVb…, then a ≐ b.

If the rule has the form U ::= …aR… and b ∈ FIRSTTERM(R), then a ⋖ b.

If the rule has the form U ::= …Rb… and a ∈ LASTTERM(R), then a ⋗ b.

Step 3. Construct the relations between the end marker "#" and the other terminals. With U the start symbol, there are # ⋖ FIRSTTERM(U), LASTTERM(U) ⋗ # and # ≐ #.

According to the algorithm, we construct the operator-precedence relations of Example 5.1.


So the operator-precedence matrix of Example 5.1 is shown in Table 5.1.
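Steps 2 and 3 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' program: the FIRSTTERM and LASTTERM sets are written out by hand for Example 5.1, the relations ⋖, ≐ and ⋗ are encoded as "<", "=" and ">", and Step 3 is handled by pretending there is an extra rule # E #. The name build_relations is hypothetical.

```python
GRAMMAR = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["i"]],
}
# Step 1 results for this grammar, written out by hand from the definitions.
FIRSTTERM = {"E": set("+*(i"), "T": set("*(i"), "F": set("(i")}
LASTTERM = {"E": set("+*)i"), "T": set("*)i"), "F": set(")i")}


def build_relations(grammar, firstterm, lastterm, start):
    rel = {}

    def put(a, b, r):
        if rel.setdefault((a, b), r) != r:
            raise ValueError(f"two relations between {a} and {b}")

    # Step 3 is handled by adding the bracketing rule  # start #.
    rules = [rhs for alts in grammar.values() for rhs in alts] + [["#", start, "#"]]
    for rhs in rules:
        for k in range(len(rhs) - 1):
            x, y = rhs[k], rhs[k + 1]
            if x not in grammar and y not in grammar:
                put(x, y, "=")  # ...ab...
            elif x not in grammar:
                for b in firstterm[y]:
                    put(x, b, "<")  # ...aR... and b in FIRSTTERM(R)
                if k + 2 < len(rhs) and rhs[k + 2] not in grammar:
                    put(x, rhs[k + 2], "=")  # ...aVb...
            elif y not in grammar:
                for a in lastterm[x]:
                    put(a, y, ">")  # ...Rb... and a in LASTTERM(R)
    return rel


REL = build_relations(GRAMMAR, FIRSTTERM, LASTTERM, "E")
TERMINALS = "+*()i#"
print("  " + " ".join(TERMINALS))
for a in TERMINALS:
    print(a, " ".join(REL.get((a, b), ".") for b in TERMINALS))
```

The printed grid has one row per left terminal and one column per right terminal; a "." marks a pair with no relation, which the parser treats as an error.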


5.1.3 Operator-precedence Grammar

Operator-precedence parsing has three disadvantages:

1. It is hard to handle tokens like the minus sign, which has two different precedences.

2. One cannot always be sure that the parser accepts exactly the desired language.

3. Only a small class of grammars can be parsed using operator-precedence techniques.

In an operator grammar, no production rule can have two adjacent nonterminals on its right side.

E ::= AB, A ::= a, B ::= b (not an operator grammar)
E ::= EOE, E ::= id, O ::= + | * | / (not an operator grammar)
E ::= E+E | E*E | E/E | i (operator grammar)
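The operator-grammar condition is easy to check mechanically. Below is a small Python sketch applied to the three grammars above. The encoding (a dictionary from nonterminal to a list of right sides) and the function name is_operator_grammar are assumptions for illustration, and the check also rejects empty right sides, since rules of the form U ::= ε are not allowed in operator-precedence parsing.

```python
def is_operator_grammar(grammar):
    """True when no right side is empty or contains two adjacent nonterminals."""
    for alternatives in grammar.values():
        for rhs in alternatives:
            if not rhs:  # U ::= ε is not allowed
                return False
            for x, y in zip(rhs, rhs[1:]):
                if x in grammar and y in grammar:  # two adjacent nonterminals
                    return False
    return True


G1 = {"E": [["A", "B"]], "A": [["a"]], "B": [["b"]]}
G2 = {"E": [["E", "O", "E"], ["id"]], "O": [["+"], ["*"], ["/"]]}
G3 = {"E": [["E", "+", "E"], ["E", "*", "E"], ["E", "/", "E"], ["i"]]}

for name, g in (("G1", G1), ("G2", G2), ("G3", G3)):
    print(name, is_operator_grammar(g))
```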


An operator grammar is also called an OG. There are three disjoint precedence relations between a pair of terminals: ⋖, ≐ and ⋗. If every pair of terminals in an OG has at most one of these precedence relations, the OG is an operator-precedence grammar, namely, an OPG.

For example, grammar E ::= E+E | E*E | E/E | i is not an operator-precedence grammar. From Fig. 5.2, we know there are two syntax trees for E+E/E, so there are two precedence relations between + and /, namely,

+ ⋖ / and + ⋗ /

Fig. 5.2 Two syntax trees of the sentential form E+E/E


5.1.4 Leftmost Phrase

The syntax tree for the sentential form #T+T*F+i# in grammar G[E] of Example 5.1 is shown in Fig. 5.3.

Fig. 5.3 Syntax tree of #T+T*F+i#


We can see from Fig. 5.3 that there are several phrases:

T (for nonterminal E)
T * F (for nonterminal T)
T + T * F (for nonterminal E)
i (for nonterminal F)
T + T * F + i (for nonterminal E)


The simple phrases are T, T * F and i; the handle is T; and T*F is the leftmost phrase. So the definition of the leftmost phrase (usually called the leftmost prime phrase) is: it is the leftmost phrase that contains at least one terminal and does not contain any other phrase of this kind.

For example, the syntax tree of the sentential form #F*i+i# is shown in Fig. 5.4. It has two such phrases, i and i, but F*i is not one of them, because it contains the other phrase i.


Next, we give a general method to obtain the leftmost phrase using the operator-precedence relations. A sentential form of an operator grammar can be written as

#V1 a1 V2 a2 … Vi ai … Vn an Vn+1#

where each Vi is a nonterminal (possibly absent) and each ai is a terminal; that is, there is at most one nonterminal between two adjacent terminals. The leftmost phrase has the property

ai ⋖ ai+1, ai+1 ≐ ai+2, …, aj-1 ≐ aj, aj ⋗ aj+1

and the leftmost phrase is Vi+1 ai+1 … Vj aj Vj+1.

For example, in the sentential form #T+T*F+i# of G[E] there are three nonterminals (V1=T, V2=T, V3=F) and four terminals (a1=+, a2=*, a3=+, a4=i), and a1, a2, a3 have the property

a1 ⋖ a2, a2 ⋗ a3

So T*F (namely, V2 a2 V3) is the leftmost phrase of the sentential form #T+T*F+i#.
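The search for the leftmost phrase can be written down directly from this property. The Python sketch below is only an illustration: the precedence matrix of Example 5.1 is hard-coded by hand (rows and columns in the order + * ( ) i #, with "." for an empty entry and "<", "=", ">" standing for ⋖, ≐, ⋗), and the function name leftmost_prime_phrase is hypothetical.

```python
TERMINALS = "+*()i#"
# Precedence matrix of Example 5.1: row = left terminal, columns in TERMINALS order.
ROWS = {
    "+": "><<><>",
    "*": ">><><>",
    "(": "<<<=<.",
    ")": ">>.>.>",
    "i": ">>.>.>",
    "#": "<<<.<=",
}


def rel(a, b):
    return ROWS[a][TERMINALS.index(b)]


def leftmost_prime_phrase(form, nonterminals):
    """form is a list of symbols that starts and ends with '#'."""
    # Positions of the terminals a0 = #, a1, ..., an, a(n+1) = #.
    pos = [k for k, s in enumerate(form) if s not in nonterminals]
    # Find the first j with a_j > a_(j+1).
    for j in range(len(pos) - 1):
        if rel(form[pos[j]], form[pos[j + 1]]) == ">":
            break
    else:
        return None  # nothing to reduce
    # Walk left over '=' relations until a_(i-1) < a_i.
    i = j
    while rel(form[pos[i - 1]], form[pos[i]]) == "=":
        i -= 1
    # The phrase also takes the nonterminals next to its outermost terminals.
    return form[pos[i - 1] + 1 : pos[j + 1]]


print(leftmost_prime_phrase(list("#T+T*F+i#"), {"E", "T", "F"}))
print(leftmost_prime_phrase(list("#F*i+i#"), {"E", "T", "F"}))
```

For #T+T*F+i# the sketch returns T*F, and for #F*i+i# it returns the first i, matching the two examples in this section.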


5.1.5 The Algorithm and Program of Operator Precedence Parsing

In this section we introduce a bottom-up parsing algorithm, the operator-precedence parsing algorithm. In this algorithm, every string that is reduced is a leftmost phrase; that is, every reduction step begins by finding the leftmost phrase.

Step 1. Construct the operator-precedence relation matrix.

Step 2. Create a symbol stack to store the symbols that have been shifted or reduced, and an input stack to store the input string. At the beginning, the symbol stack holds only the symbol "#", and the first terminal of the input string is at the top of the input stack.

Step 3. Starting from the top terminal xn of the symbol stack, move towards the bottom of the stack and compare each terminal with the closest terminal below it: if xn-1 ≐ xn, go on to compare xn-2 and xn-1, until xi-1 ⋖ xi. We then obtain the leftmost phrase Ni xi Ni+1 xi+1 … Nn xn Nn+1 (if Ni is empty, xi is the beginning symbol of the phrase).

Step 4. In grammar G, choose the rule whose right side has the form Ni xi Ni+1 xi+1 … Nn xn Nn+1 (the nonterminals need not be the same) and reduce by it; that is, pop the leftmost phrase from the top of the symbol stack and push the left side of the rule onto the stack. When the symbol stack holds only # and one nonterminal and the input stack holds only #, the analysis succeeds and the input string is a sentence of the grammar, so the program exits; otherwise, return to Step 3.


The program of operator-precedence parsing is as follows (here $ plays the role of the end marker #).

set p to point to the first symbol of w$;
repeat forever
    if ( $ is on top of the stack and p points to $ ) then
        return
    else {
        let a be the topmost terminal symbol on the stack and let b be the symbol pointed to by p;
        if ( a ⋖ b or a ≐ b ) then {    /* SHIFT */
            push b onto the stack;
            advance p to the next input symbol;
        }
        else if ( a ⋗ b ) then    /* REDUCE */
            repeat
                pop the stack
            until ( the topmost terminal on the stack is related by ⋖ to the terminal most recently popped );
        else
            error();
    }


So for the grammar G[E] of Example 5.1:

E ::= E+T | T
T ::= T*F | F
F ::= (E) | i

the string i*(i+i) is recognized by the operator-precedence algorithm. The analysis process is shown in Table 5.2.
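The shift and reduce moves for i*(i+i) can be reproduced with a short Python sketch. It follows the program above but is not the authors' code, and it rests on several assumptions made here: the precedence matrix of Example 5.1 is hard-coded by hand ("<", "=", ">" for ⋖, ≐, ⋗ and "." for an empty entry), every nonterminal on the stack is the single placeholder N, and each phrase is checked against the right-side shapes in SHAPES as Step 4 of the algorithm requires.

```python
TERMINALS = "+*()i#"
# Precedence matrix of Example 5.1: row = left terminal, columns in TERMINALS order.
ROWS = {
    "+": "><<><>",
    "*": ">><><>",
    "(": "<<<=<.",
    ")": ">>.>.>",
    "i": ">>.>.>",
    "#": "<<<.<=",
}
# Right sides of the rules with every nonterminal replaced by the placeholder N.
SHAPES = {"N+N", "N*N", "(N)", "i"}


def rel(a, b):
    return ROWS[a][TERMINALS.index(b)] if b in TERMINALS else "."


def top_terminal(stack):
    return next(s for s in reversed(stack) if s != "N")


def parse(text):
    """Return the phrases reduced, in order; raise SyntaxError on a bad input."""
    stack, tokens, p, reduced = ["#"], list(text) + ["#"], 0, []
    while True:
        a, b = top_terminal(stack), tokens[p]
        if a == "#" and b == "#":
            if stack == ["#", "N"]:
                return reduced
            raise SyntaxError("empty input")
        relation = rel(a, b)
        if relation in ("<", "="):  # SHIFT
            stack.append(b)
            p += 1
        elif relation == ">":  # REDUCE
            phrase = []
            while True:
                symbol = stack.pop()
                phrase.insert(0, symbol)
                if symbol != "N" and rel(top_terminal(stack), symbol) == "<":
                    break
            if stack[-1] == "N":  # a nonterminal just left of the phrase belongs to it
                phrase.insert(0, stack.pop())
            if "".join(phrase) not in SHAPES:
                raise SyntaxError(f"no rule matches {''.join(phrase)}")
            stack.append("N")
            reduced.append("".join(phrase))
        else:
            raise SyntaxError(f"no relation between {a} and {b}")


print(parse("i*(i+i)"))
```

The shape check matters: without it, an input such as i+ would be reduced to a single placeholder and accepted, which is one reason the parser may not accept exactly the desired language.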


Example 5.2 Consider the following grammar

S ::= (L) | a
L ::= L,S | S

and the following operator-precedence relations

Use these precedence relations to parse the sentence (a, (a, a)).


5.2 LR(0) Parser

We have seen that operator-precedence parsing places some limitations on the grammar: for example, a rule U ::= ε must not appear, and no rule may have two adjacent nonterminals. For an LR(0) parser there are no such limits, so it is an efficient bottom-up syntax analysis technique that can be used to parse a larger class of context-free grammars. The "L" in LR means left-to-right scanning of the input, the "R" means constructing a rightmost derivation in reverse, and the "0" means that no lookahead input symbols are examined when making parsing decisions.


5.2.1 Viable Prefix

In order to explain how to parse from the bottom up, we first discuss the concept of a viable prefix by an example.

Consider grammar G[S]:

S ::= aABe
A ::= Abc | b
B ::= d

We label the four rules of G[S] with numbers:

S ::= aABe [1]
A ::= Abc [2]
A ::= b [3]
B ::= d [4]


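So the rightmost derivation of "abbcde" is

S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde

using rules [1], [4], [2] and [3] in turn. The reduction of the input string "abbcde" is shown below.

A prefix of a right-sentential form that does not extend past the right end of its handle is called a viable prefix; these are exactly the strings that can appear on the stack during the reduction.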


5.2.2 Constructing FA by Viable Prefix

It is a remarkable fact that viable prefixes can be recognized knowing only the grammar symbols on the stack: there is a finite automaton that reads these symbols and can determine what the handle is.

In addition, we can take the items of the grammar as the states of this finite automaton.


An item of a grammar is a production of the grammar with a dot at some position of the right side. For example, production A ::= XYZ yields the four items

A ::= ·XYZ
A ::= X·YZ
A ::= XY·Z
A ::= XYZ·


The first item above indicates that we hope to see a string derivable from XYZ next on the input. The second item indicates that we have just seen on the input a string derivable from X, and we hope next to see a string derivable from YZ. The production U ::= ε generates only one item, U ::= ·.

Once the items are defined, we know the states of the finite automaton and can construct it. For example, consider the rule X ::= aAc together with a rule A ::= d. Three of their items are

(h) X ::= ·aAc
(i) X ::= a·Ac
(k) A ::= ·d


h, i and k are items (states) of the finite automaton. The dot in state i is one position to the right of the dot in state h, so we draw an arc from state h to state i and label it a. In addition, the symbol after the dot in state i is the nonterminal A, and item k has A as its left side, so we draw an arc from i to k and label it ε.


Example 5.3 Grammar G[S]:

S ::= E [1]
E ::= aA [2]
E ::= bB [3]
A ::= cA [4]
A ::= d [5]
B ::= cB [6]
B ::= d [7]


From the definition of an item above, we know that there are 18 items; the finite automaton built from them is shown in Fig. 5.5.

1. S ::= ·E    2. S ::= E·
3. E ::= ·aA    4. E ::= a·A
5. E ::= aA·    6. A ::= ·cA
7. A ::= c·A    8. A ::= cA·
9. A ::= ·d    10. A ::= d·
11. E ::= ·bB    12. E ::= b·B
13. E ::= bB·    14. B ::= ·cB
15. B ::= c·B    16. B ::= cB·
17. B ::= ·d    18. B ::= d·


We divide items into several types according to the position of the dot and to whether the symbol after the dot is a terminal or a nonterminal. In the following, α, β ∈ V*.

(1) Shift item. The item has the form A ::= α·aβ with a ∈ VT. It means that "a" is pushed onto the stack and the state changes from the one with the dot before a to the one with the dot after a.

(2) Waiting-reduction item. The item has the form A ::= α·Bβ with B ∈ VN. It means that a string must first be reduced to B before the reduction to A can continue.

(3) Reduce item. The item has the form A ::= α·, that is, the dot is at the rightmost position. It means that the right side of a production has been completely analyzed and the handle has been recognized.

(4) Accept item. The item has the form S ::= α· with α ∈ V+, where S is the start symbol.

In Example 5.3, states 3 and 17 are shift items, states 4 and 15 are waiting-reduction items, and states 2 and 5 are reduce items; in addition, state 2 is the accept item. The labels of the arcs on a path from the start state to one of the reduce states spell out a viable prefix; for example, bccB is a viable prefix.
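The 18 items and their types can be enumerated mechanically. The Python sketch below is illustrative only: an item is encoded as a tuple (left side, right side, dot position), uppercase letters are assumed to be nonterminals, the rules are listed in the order used for the item numbering above, and the names items, show and kind are hypothetical.

```python
# Rules of Example 5.3, listed in the order used for the item numbering 1..18.
RULES = [("S", "E"), ("E", "aA"), ("A", "cA"), ("A", "d"),
         ("E", "bB"), ("B", "cB"), ("B", "d")]


def items(rules):
    """Yield (left, right, dot) for every dot position of every rule."""
    for left, right in rules:
        for dot in range(len(right) + 1):
            yield (left, right, dot)


def show(item):
    left, right, dot = item
    return f"{left} ::= {right[:dot]}·{right[dot:]}"


def kind(item, start="S"):
    left, right, dot = item
    if dot == len(right):  # dot at the rightmost position
        return "accept" if left == start else "reduce"
    # Assumption: uppercase letters are nonterminals, lowercase letters terminals.
    return "waiting" if right[dot].isupper() else "shift"


ALL = list(items(RULES))
for number, item in enumerate(ALL, 1):
    print(number, show(item), kind(item))
```

An accept item is also a reduce item for the start symbol; the sketch reports only the more specific label.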


From Fig. 5.5, we know that this automaton is a nondeterministic finite automaton (NFA). The central idea of the LR method is to construct a deterministic finite automaton (DFA) from the grammar. To do so, we group items together into sets, and these sets become the states of the DFA. We use the closure operation to construct the item sets.


5.2.3 The Closure of a Set of Items

If I is a set of items for a grammar G, then closure(I) is the set of items constructed from I by the two rules:

1 Initially, every item in I is added to closure(I).

2 If U ::= x·Vy is in closure(I) and V ::= z is a production, then add the item V ::= ·z to closure(I), if it is not already there. We apply this rule until no more new items can be added to closure(I).

For example, the item S ::= ·E is in closure I0, and E ::= aA | bB, so the items E ::= ·aA and E ::= ·bB are in closure I0 too, that is,

I0 = {S ::= ·E, E ::= ·aA, E ::= ·bB}


Intuitively, U ::= x·Vy in closure(I) indicates that, at some point in the parsing, we might next see in the input a substring derivable from Vy. If V ::= z is a production, we also expect that we might see a substring derivable from z. For this reason, V ::= ·z is included in closure(I).

A useful application of closure is the function GOTO(I, X), where I is a set of items and X is a grammar symbol. GOTO(I, X) is defined to be the closure of the set of all items U ::= xX·y such that U ::= x·Xy is in I.
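Both operations are short to write down. The following Python sketch applies them to the grammar of Example 5.3; it is an illustration under assumptions chosen here (items are tuples of left side, right side and dot position, and the helper show is hypothetical), not a definitive implementation.

```python
RULES = [("S", "E"), ("E", "aA"), ("E", "bB"), ("A", "cA"),
         ("A", "d"), ("B", "cB"), ("B", "d")]
NONTERMINALS = {left for left, _ in RULES}


def closure(items):
    """Rule 1: start from I. Rule 2: add V ::= ·z for every item with the dot before V."""
    result = set(items)
    work = list(items)
    while work:
        _, right, dot = work.pop()
        if dot < len(right) and right[dot] in NONTERMINALS:
            for left, rhs in RULES:
                if left == right[dot] and (left, rhs, 0) not in result:
                    result.add((left, rhs, 0))
                    work.append((left, rhs, 0))
    return frozenset(result)


def goto(items, x):
    """Move the dot over x in every item that allows it, then take the closure."""
    moved = {(l, r, d + 1) for l, r, d in items if d < len(r) and r[d] == x}
    return closure(moved)


def show(items):
    return sorted(f"{l} ::= {r[:d]}·{r[d:]}" for l, r, d in items)


I0 = closure({("S", "E", 0)})
print(show(I0))
print(show(goto(I0, "a")))
```

Returning a frozenset makes item sets hashable and comparable, which is convenient when they are later collected as the states of the DFA.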


The algorithm for constructing C, the collection of sets of LR(0) items, from closure and GOTO is:

C is { closure({S' ::= ·S}) }
repeat the following until no more sets of LR(0) items can be added to C:
    for each I in C and each grammar symbol X
        if GOTO(I, X) is not empty and not in C
            add GOTO(I, X) to C


With the closure and GOTO functions, we can easily convert the NFA into a DFA; Fig. 5.6 is an example.


5.2.4 LR(0) Parsing Table

An LR(0) parser consists of an input, an output, a stack, a driver program, and a parsing table that has two parts (ACTION and GOTO). The driver program is the same for all LR parsers; only the parsing table changes from one parser to another. The stack stores a string of the form s0X1s1X2s2…Xmsm, where each Xi is a grammar symbol and each si is a symbol called a state. The parsing table consists of a parsing action function ACTION and a goto function GOTO. Together, the ACTION and GOTO functions encode the deterministic finite automaton that recognizes the viable prefixes.


There are three groups of columns in an LR(0) parsing table: the first gives the states (Ii); the second is ACTION, which tells what action to take next; the third is GOTO, which tells which state is entered next. We explain ACTION and GOTO as follows, where x, y ∈ V* and a ∈ VT. Construct C = {I0, I1, …, In}, the collection of sets of LR(0) items for the grammar.

(1) If U ::= x·ay is in Ii and GOTO(Ii, a) = Ij, then set ACTION[i, a] = "Sj". Here "a" must be a terminal.

(2) If U ::= x· is in Ii, then set ACTION[i, a] = "rj" for every terminal a and ACTION[i, #] = "rj", which means reducing by rule j: U ::= x whatever the next input symbol is.

(3) If Z ::= x· is in Ii and Z is the start symbol of the grammar, then set ACTION[i, #] = "acc", where "acc" means accept.

(4) The GOTO transitions for state i are constructed for all nonterminals U: if GOTO(Ii, U) = Ij, then GOTO[i, U] = "j".

(5) All entries not defined by the above rules are made "error".

Note: if any conflicting actions are generated by the above rules, we say the grammar is not LR(0); the algorithm fails to produce a parser in this case.


We know how to construct the items of a grammar and how to obtain the closure of a set of items. Next, we use Example 5.3 to explain how to construct the LR(0) parsing table by the method above.

First, we look in I0 to I11 for items of the form U ::= x·ay. In Example 5.3, I0 contains E ::= ·aA and E ::= ·bB, and GOTO(I0, a) = I2, GOTO(I0, b) = I3, so ACTION[0, a] = "S2" and ACTION[0, b] = "S3". That is why there are S2 and S3 in columns a and b of row 0 in Table 5.5. For similar reasons, there are S5, S6, S8 and S9 in other rows.

Second, we look for items of the form U ::= x·. In Example 5.3, I4 contains the item E ::= aA·, so rule 2, E ::= aA, can be used to reduce and there is r2 in row 4. Similarly, I6, I7, I9, I10 and I11 have r5, r3, r7, r4 and r6 respectively.

Third, we check whether there are items of the form Z ::= x·. In Example 5.3, I1 contains S ::= E·, so ACTION[1, #] = "acc", and there is acc in row 1 of Table 5.5.

Finally, we look for transitions GOTO(Ii, U) = Ij with U ∈ VN. In Example 5.3, GOTO(I0, E) = I1, so there is 1 in column E of row 0. Similarly, rows 2, 3, 5 and 8 have 4, 7, 10 and 11 in columns A and B.

So we obtain the parsing table of Example 5.3, which is shown in Table 5.5.
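The whole construction, from the collection of item sets to the ACTION and GOTO entries, fits in a short Python sketch. It is only an illustration under assumptions made here: items are tuples (left side, right side, dot position), the states are numbered in the order in which they are discovered (which for this grammar reproduces I0 to I11 as used in the text), and all function names are hypothetical. The closure and goto helpers are included so that the sketch runs on its own.

```python
RULES = [("S", "E"), ("E", "aA"), ("E", "bB"), ("A", "cA"),
         ("A", "d"), ("B", "cB"), ("B", "d")]  # rule j is RULES[j - 1]
START = "S"
NONTERMINALS = {left for left, _ in RULES}
TERMINALS = sorted({s for _, r in RULES for s in r} - NONTERMINALS) + ["#"]


def closure(items):
    result, work = set(items), list(items)
    while work:
        _, right, dot = work.pop()
        if dot < len(right) and right[dot] in NONTERMINALS:
            for rule in RULES:
                item = (rule[0], rule[1], 0)
                if rule[0] == right[dot] and item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def goto(items, x):
    return closure({(l, r, d + 1) for l, r, d in items if d < len(r) and r[d] == x})


def collection():
    states = [closure({(START, RULES[0][1], 0)})]
    for state in states:  # the list grows while it is scanned
        for x in sorted(NONTERMINALS) + TERMINALS:
            target = goto(state, x)
            if target and target not in states:
                states.append(target)
    return states


def build_table(states):
    action, goto_table = {}, {}

    def put(i, a, entry):
        if action.setdefault((i, a), entry) != entry:
            raise ValueError(f"conflict in state {i} on {a}: grammar is not LR(0)")

    for i, state in enumerate(states):
        for left, right, dot in state:
            if dot < len(right):
                x = right[dot]
                j = states.index(goto(state, x))
                if x in NONTERMINALS:
                    goto_table[i, x] = j  # rule (4)
                else:
                    put(i, x, f"S{j}")  # rule (1)
            elif left == START:
                put(i, "#", "acc")  # rule (3)
            else:
                for a in TERMINALS:
                    put(i, a, f"r{RULES.index((left, right)) + 1}")  # rule (2)
    return action, goto_table


STATES = collection()
ACTION, GOTO = build_table(STATES)
for i in range(len(STATES)):
    print(i, [ACTION.get((i, a), "") for a in TERMINALS],
          [GOTO.get((i, u), "") for u in "EAB"])
```

Entries that are missing from the dictionaries correspond to rule (5), the "error" entries.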


With the parsing table it is easy to parse an input string, but for the parser to run automatically we need a driver program that controls the input and the stacks and exchanges information with the parsing table. At every step, the driver program examines the current state on the stack, the current input symbol and the LR(0) parsing table, and carries out the operation given by ACTION[q, a] and GOTO. We use the following configuration to represent their relation; it has three parts: the state stack "q", the symbol stack "X" and the input string "a".

( q0q1…qi , #X1X2…Xi , akak+1…an# )


The top state of the state stack is qi, the top symbol of the symbol stack is Xi, and the current input symbol is ak. The next step is to look up the LR(0) parsing table and carry out the operation given by ACTION[qi, ak]. The details are as follows. The initial configuration is

( q0 , # , a1a2…an# )

(1) If ACTION[qi, ak] = Sj, the input symbol ak is pushed onto the symbol stack X and the state shifts from qi to the next state qj. The configuration becomes

( q0q1…qiqj , #X1X2…Xiak , ak+1…an# )

The current state becomes qj and the current input symbol is ak+1.


(2) If ACTION[qi, ak] = rj, where ak is a terminal or #, the parser executes a reduce move: the top of the symbol stack is reduced by rule j, and the symbol stack and the state stack both shrink by m entries, where m is the length of the right side of rule j. For example, if rule j is U ::= x, the length of x is m, and GOTO[qi-m, U] = qt, then the configuration becomes

( q0q1…qi-mqt , #X1X2…Xi-mU , akak+1…an# )

Since ak has not been pushed onto the symbol stack X, the current input symbol is still ak. The current state is qt, which comes from GOTO[qi-m, U] = qt.


(3) If ACTION[qi, ak] = acc, parsing is completed.

(4) If ACTION[qi, ak] = ERROR, the parser has discovered an error and the driver program stops.

We use the driver program to judge whether the string "acccd" can be recognized by the grammar of Example 5.3. The recognition succeeds, and the result is shown in Table 5.6.
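The four cases of the driver can be sketched in Python as follows. This is a hedged illustration, not the authors' program: the ACTION and GOTO entries are written out by hand from the construction described for Example 5.3 (states numbered I0 to I11), and the names RULES, parse and the returned trace of actions are assumptions made here.

```python
# Rule number -> (left side, length of right side), for rules [2]..[7] of Example 5.3.
RULES = {2: ("E", 2), 3: ("E", 2), 4: ("A", 2), 5: ("A", 1), 6: ("B", 2), 7: ("B", 1)}
ACTION = {
    (0, "a"): "S2", (0, "b"): "S3", (1, "#"): "acc",
    (2, "c"): "S5", (2, "d"): "S6", (3, "c"): "S8", (3, "d"): "S9",
    (5, "c"): "S5", (5, "d"): "S6", (8, "c"): "S8", (8, "d"): "S9",
}
# In LR(0), a state holding a reduce item reduces on every input symbol.
for state, rule in {4: 2, 6: 5, 7: 3, 9: 7, 10: 4, 11: 6}.items():
    for a in "abcd#":
        ACTION[state, a] = f"r{rule}"
GOTO = {(0, "E"): 1, (2, "A"): 4, (3, "B"): 7, (5, "A"): 10, (8, "B"): 11}


def parse(text):
    """Return the sequence of actions taken; raise SyntaxError on an ERROR entry."""
    states, symbols = [0], ["#"]  # state stack q and symbol stack X
    tokens, k, trace = list(text) + ["#"], 0, []
    while True:
        act = ACTION.get((states[-1], tokens[k]), "ERROR")
        trace.append(act)
        if act == "acc":  # case (3)
            return trace
        if act == "ERROR":  # case (4)
            raise SyntaxError(f"unexpected {tokens[k]} in state {states[-1]}")
        if act[0] == "S":  # case (1): shift ak and enter state j
            symbols.append(tokens[k])
            states.append(int(act[1:]))
            k += 1
        else:  # case (2): reduce by rule j
            left, m = RULES[int(act[1:])]
            del states[-m:]
            del symbols[-m:]
            symbols.append(left)
            states.append(GOTO[states[-1], left])


print(parse("acccd"))
```

The returned list is the ACTION column of the trace: five shifts, five reductions and the final acc for the string acccd.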


5.3 SLR(1) Parser

When we parse, we often meet grammars, such as the grammar G[U], that give rise to items like the following.

G[U]:

U ::= x·by [1]
V ::= x· [2]
W ::= x· [3]

The three items belong to one item set I0 = {U ::= x·by, V ::= x·, W ::= x·}


When we parse with this grammar, the two items V ::= x· and W ::= x· can both be used to reduce, namely by r2 and r3, and the item U ::= x·by calls for a shift. In this case we cannot parse the grammar by LR(0), because an LR(0) parsing table cannot choose between several actions in one item set. What should we do in this case? An SLR(1) parser can solve this problem. It checks the input symbol "a" to decide:

(1) If a = b, then ACTION[0, a] = "S1"
(2) If a ∈ FOLLOW(V), then ACTION[0, a] = "r2"
(3) If a ∈ FOLLOW(W), then ACTION[0, a] = "r3"
(4) Otherwise, ACTION[0, a] = "ERROR"


Note: FOLLOW(V), FOLLOW(W) and {b} must be pairwise disjoint, that is, no two of them may share an element.

An SLR(1) grammar can be defined as follows. For every item set of the form

I = {U1 ::= x·b1y1, U2 ::= x·b2y2, …, Um ::= x·bmym, V1 ::= x·, V2 ::= x·, …, Vn ::= x·}

the sets {b1, b2, …, bm}, FOLLOW(V1), FOLLOW(V2), …, and FOLLOW(Vn) must be pairwise disjoint.

SLR means simple LR. It is the weakest of the lookahead-based LR methods in terms of the number of grammars it can handle, but it is the easiest to implement. A parsing table constructed by this method is called an SLR table, and an LR parser using an SLR parsing table is called an SLR parser. A grammar for which an SLR parser can be constructed is an SLR grammar.

An SLR(1) parser scans the input string from left to right, constructs a rightmost derivation in reverse, and checks 1 input symbol of lookahead.


The difference between constructing an SLR(1) parsing table and an LR(0) parsing table lies in the second step. The changed step (2) is:

If U ::= x· is in Ii, then for every a ∈ FOLLOW(U), where a is a terminal or "#", set ACTION[i, a] = "rj", which means reducing by rule j: U ::= x.

We use the grammar of Example 5.1, augmented with the rule S ::= E, to explain how to construct the SLR(1) parser.

G[S]:

S ::= E [1]
E ::= E+T [2]
E ::= T [3]
T ::= T*F [4]
T ::= F [5]
F ::= (E) [6]
F ::= i [7]


Beginning from the first rule S ::= E, we obtain the item set I0:

I0 = { S ::= ·E, E ::= ·E+T, E ::= ·T, T ::= ·T*F, T ::= ·F, F ::= ·(E), F ::= ·i }

From the transition function GOTO(I0, E), we get item set I1:

I1 = { S ::= E·, E ::= E·+T }

From GOTO(I0, T), we get item set I2:

I2 = { E ::= T·, T ::= T·*F }

From GOTO(I0, F), we get item set I3:

I3 = { T ::= F· }

From GOTO(I0, ( ), we get item set I4:

I4 = { F ::= (·E), E ::= ·E+T, E ::= ·T, T ::= ·T*F, T ::= ·F, F ::= ·(E), F ::= ·i }

From GOTO(I0, i), we get item set I5:

I5 = { F ::= i· }


Similarly, from GOTO(I1, +), we get item set I6:

I6 = { E ::= E+·T, T ::= ·T*F, T ::= ·F, F ::= ·(E), F ::= ·i }

From GOTO(I2, *), we get item set I7:

I7 = { T ::= T*·F, F ::= ·(E), F ::= ·i }

The other new item sets are

I8 = { F ::= (E·), E ::= E·+T }
I9 = { E ::= E+T·, T ::= T·*F }
I10 = { T ::= T*F· }
I11 = { F ::= (E)· }


The algorithm ends when no new item set is produced. The resulting FA is shown in Fig. 5.7. There are 12 item sets above; I1, I2 and I9 are conflicting item sets, which means that each of them contains both a shift item and a reduce item. These conflicts can be resolved by an SLR(1) parser.

For example, for item set I1 = { S ::= E·, E ::= E·+T }, FOLLOW(S) = { # } and the set { + } are disjoint. When the input symbol is "+", the parser shifts; when the input symbol is "#", it reduces by rule S ::= E (that is, it accepts).


For item set I2 = { E ::= T·, T ::= T·*F }, FOLLOW(E) = { +, ), # } and the set { * } are disjoint. When the input symbol is "*", the parser shifts; when the input symbol is "+", ")" or "#", it reduces by rule E ::= T.

For item set I9 = { E ::= E+T·, T ::= T·*F }, FOLLOW(E) and the set { * } are disjoint. When the input symbol is "*", the parser shifts; when the input symbol is "+", ")" or "#", it reduces by rule E ::= E+T.

The FOLLOW sets are FOLLOW(E) = { +, ), # } and FOLLOW(T) = FOLLOW(F) = { +, *, ), # }.
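The FOLLOW sets and the resulting shift or reduce decisions can be checked with a short Python sketch. It is an illustration under assumptions made here: the grammar has no ε-rules (so FIRST only needs the first symbol of each right side), and the function names first_sets, follow_sets and slr_actions are hypothetical.

```python
RULES = [
    ("S", ["E"]), ("E", ["E", "+", "T"]), ("E", ["T"]),
    ("T", ["T", "*", "F"]), ("T", ["F"]), ("F", ["(", "E", ")"]), ("F", ["i"]),
]
NONTERMINALS = {left for left, _ in RULES}


def first_sets():
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for left, right in RULES:
            head = right[0]  # no ε-rules, so only the first symbol matters
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[left]:
                first[left] |= new
                changed = True
    return first


def follow_sets(start="S"):
    first = first_sets()
    follow = {n: set() for n in NONTERMINALS}
    follow[start].add("#")
    changed = True
    while changed:
        changed = False
        for left, right in RULES:
            for k, symbol in enumerate(right):
                if symbol not in NONTERMINALS:
                    continue
                if k + 1 < len(right):  # something follows inside the rule
                    nxt = right[k + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:  # at the end of the rule: inherit FOLLOW of the left side
                    new = follow[left]
                if not new <= follow[symbol]:
                    follow[symbol] |= new
                    changed = True
    return follow


def slr_actions(shift_symbols, reduce_left, follow):
    """Decide per lookahead for a state with shift items and one reduce item."""
    if shift_symbols & follow[reduce_left]:
        raise ValueError("shift/reduce conflict remains: not SLR(1)")
    actions = {a: "shift" for a in shift_symbols}
    actions.update({a: "reduce" for a in follow[reduce_left]})
    return actions


FOLLOW = follow_sets()
print(sorted(FOLLOW["E"]), sorted(FOLLOW["T"]))
print(slr_actions({"*"}, "E", FOLLOW))  # item set I2 = { E ::= T·, T ::= T·*F }
```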


So the SLR(1) parsing table is shown in Table 5.7.


Table 5.8 shows how we use the SLR(1) parsing table to judge whether the string i*(i+i) is a sentence of G[S].


From Table 5.8, we know that the string i*(i+i) is a sentence of grammar G[S].


Example 5.4 G[S]:

S' ::= S
S ::= Aa | Bb | ac
A ::= a
B ::= a

The corresponding DFA for G[S] is shown in Fig. 5.8. Its states I0 to I7 are the item sets

{ S' ::= ·S, S ::= ·Aa, S ::= ·Bb, S ::= ·ac, A ::= ·a, B ::= ·a }
{ S' ::= S· }
{ S ::= A·a }
{ S ::= B·b }
{ S ::= a·c, A ::= a·, B ::= a· }
{ S ::= Aa· }
{ S ::= Bb· }
{ S ::= ac· }

and its arcs are labeled S, A, B, a, a, b and c.

Fig. 5.8 DFA for G[S]


SLR(1) Parse Table for G[S]:

Judging whether a string is a sentence of the grammar is left as your assignment.