compiler structures
Compiler Structures. 241-437, Semester 1, 2011-2012. Objective: describe bottom-up (LR) parsing using shift-reduce and parse tables; explain how LR parse tables are generated. 6. Bottom-up (LR) Parsing. Overview: 1. What is an LR Parser? 2. Bottom-up using Shift-Reduce
241-437 Compilers: Bottom-up/6 1
Compiler Structures
• Objective
  – describe bottom-up (LR) parsing using shift-reduce and parse tables
  – explain how LR parse tables are generated
241-437, Semester 1, 2011-2012
6. Bottom-up (LR) Parsing
Overview
1. What is an LR Parser?
2. Bottom-up using Shift-Reduce
3. Building an LR Parser
4. Generating the Parse Table
5. LR Conflicts
6. LL, SLR, LR, LALR Grammars
In this lecture
[Figure: the compiler pipeline. Front end: Source Program, Lexical Analyzer, Syntax Analyzer, Semantic Analyzer, Int. Code Generator, producing Intermediate Code. Back end: Code Optimizer, Target Code Generator, producing the Target Lang. Prog. This lecture covers the Syntax Analyzer, but concentrating on bottom-up parsing.]
1. What is an LR Parser?
• An LR parser reads its input tokens from Left-to-right and produces a Rightmost derivation (in reverse).
• The parse tree is built bottom-up, starting from the leaves and working upwards to the start symbol.
LR in Action
Grammar:
  S => a A B e
  A => A b c | b
  B => d

Reducing the sentence "a b b c d e" (each replaced substring matches a production's right-hand side):
  a b b c d e
  a A b c d e
  a A d e
  a A B e
  S

Read from the bottom up, the reductions correspond to a rightmost derivation:
  S => a A B e => a A d e => a A b c d e => a b b c d e

[Figure: the parse tree for "a b b c d e", shown growing from the leaves upwards in four steps, one per reduction.]
LR(k) Parsing
• The k is the number of input tokens that are looked at (the lookahead) when deciding which production to use.
  – e.g. LR(0), LR(1)
• We'll be using a variation of LR(0) parsing in this chapter.
LR versus LL
• LR can deal with more complex (powerful) grammars than LL (top-down parsers).
• LR can detect errors quicker than LL.
• LR parsers can be implemented very efficiently, but they're difficult to build by hand (unlike LL parsers).
2. Bottom-up using Shift-Reduce
• The usual way of implementing bottom-up parsing is by using shift-reduce:
  – 'shift' means read in a new input token, and push it onto a stack
  – 'reduce' means to group several symbols into a single non-terminal
    • by choosing a production to use 'backwards'
    • the symbols are popped off the stack, and the production's non-terminal is pushed onto it
Shift-Reduce Parsing
Grammar: S => a A B e    A => A b c | b    B => d

Stack          Input            Action
$              a b b c d e $    Shift
$ a            b b c d e $      Shift
$ a b          b c d e $        Reduce A => b
$ a A          b c d e $        Shift
$ a A b        c d e $          Shift
$ a A b c      d e $            Reduce A => A b c
$ a A          d e $            Shift
$ a A d        e $              Reduce B => d
$ a A B        e $              Shift
$ a A B e      $                Reduce S => a A B e
3. Building an LR Parser
• The standard way of writing a shift-reduce LR parser is to generate a parse table for the grammar, and 'plug' that into a standard LR parsing framework.
• The table has two main parts: actions and gotos.
3.1. Inside an LR Parser
[Figure: the LR parser reads the input tokens a1 a2 … ai … an $, keeps a stack of pairs X0 s0 … Xm-1 sm-1 Xm sm, consults the parse table (you create this bit), and outputs a parse tree.]
• On the stack, each X is a terminal or non-terminal, and each s is a state.
• The actions part of the table holds the possible actions: shift, reduce, accept, error. These push and pop the stack.
• The gotos part of the table involves state changes.
Parse Table for the Example
Productions:
  1: S => a A B e
  2: A => A b c
  3: A => b
  4: B => d

                Action part               Goto part
State |  a    b    c    d    e    $   |  S    A    B
  0   | s1                            |
  1   |      s3                       |       2
  2   |      s5        s6             |            4
  3   |      r3        r3             |
  4   |                     s7        |
  5   |           s8                  |
  6   |                     r4        |
  7   |                          acc  |
  8   |      r2        r2             |

s means shift to that state
r means reduce by that numbered production
3.2. Table Algorithm
push(<$,0>);                /* push <symbol,state> pair */
currToken = scanner();

/* 4 branches for the four possible actions that can be in a table cell */
while (1) {
  <x,state> = pair on top of stack;
  if (action[state, currToken] == <shift newState>) {
    push(<currToken, newState>);
    currToken = scanner();
  }
  else if (action[state, currToken] == <reduce ruleNum>) {
    A --> β is rule number ruleNum;
    bodySize = numElements(β);
    pop bodySize pairs off stack;
    state' = state part of pair on top of stack;
    push(<A, goto[state', A]>);
  }
  else if (action[state, currToken] == accept) {
    S --> β is the start symbol production;
    bodySize = numElements(β);
    pop bodySize pairs off stack;
    state' = state part of pair on top of stack;
    if (state' == 0) break;   // success; can now stop
    else error();
  }
  else error();
} // of while loop
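The table algorithm can be sketched in Python as below. This is a minimal sketch, not a definitive implementation: the names RULES, ACTION, GOTO and lr_parse are made up for illustration, the table is the example parse table typed in by hand, and the input is assumed to be a list of already-scanned tokens.

```python
# Productions numbered as in the example table: rule -> (head, body length).
RULES = {1: ("S", 4), 2: ("A", 3), 3: ("A", 1), 4: ("B", 1)}

# ACTION[state][token] is ("s", new_state), ("r", rule_number) or ("acc",).
# Missing entries are the error cells of the table.
ACTION = {
    0: {"a": ("s", 1)},
    1: {"b": ("s", 3)},
    2: {"b": ("s", 5), "d": ("s", 6)},
    3: {"b": ("r", 3), "d": ("r", 3)},
    4: {"e": ("s", 7)},
    5: {"c": ("s", 8)},
    6: {"e": ("r", 4)},
    7: {"$": ("acc",)},
    8: {"b": ("r", 2), "d": ("r", 2)},
}
GOTO = {1: {"A": 2}, 2: {"B": 4}}


def lr_parse(tokens):
    """Return the list of actions taken, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = [("$", 0)]                 # <symbol, state> pairs
    pos, trace = 0, []
    while True:
        state, tok = stack[-1][1], tokens[pos]
        act = ACTION[state].get(tok)
        if act is None:                # empty table cell
            raise SyntaxError(f"unexpected {tok!r} in state {state}")
        if act[0] == "s":              # shift: push the token, read the next one
            stack.append((tok, act[1]))
            pos += 1
            trace.append(f"shift {act[1]}")
        elif act[0] == "r":            # reduce: pop the body, push the head
            head, size = RULES[act[1]]
            del stack[-size:]
            stack.append((head, GOTO[stack[-1][1]][head]))
            trace.append(f"reduce {act[1]}")
        else:                          # accept: pop the start rule's body
            del stack[-RULES[1][1]:]
            if stack[-1][1] != 0:
                raise SyntaxError("accept in unexpected state")
            trace.append("accept")
            return trace
```

Calling lr_parse("a b b c d e".split()) walks through the same shifts and reductions as the hand trace of this sentence.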
3.3. Table Parsing Example
Grammar: S => a A B e    A => A b c | b    B => d

Stack                  Input            Action
$0                     a b b c d e $    Shift 1
$0,a1                  b b c d e $      Shift 3
$0,a1,b3               b c d e $        Reduce A => b
$0,a1,A2               b c d e $        Shift 5
$0,a1,A2,b5            c d e $          Shift 8
$0,a1,A2,b5,c8         d e $            Reduce A => A b c
$0,a1,A2               d e $            Shift 6
$0,a1,A2,d6            e $              Reduce B => d
$0,a1,A2,B4            e $              Shift 7
$0,a1,A2,B4,e7         $                Accept S => a A B e
$0                     $

• Reduce A => b: pop 1 pair; state' == 1; push(A, goto(1, A)) = push(A, 2)
• Reduce A => A b c: pop 3 pairs; state' == 1; push(A, goto(1, A)) = push(A, 2)
3.4. The LR Parse Stack
• The parse stack holds the branches of the tree being built bottom-up.
• For example,
  – the stack $0,a1,A2,b5,c8 represents:
      a   A   b   c
          |
          b
  – the next stack, $0,a1,A2, represents:
      a     A
          / | \
         A  b  c
         |
         b
  – later, $0,a1,A2,B4,e7 represents:
      a     A      B   e
          / | \    |
         A  b  c   d
         |
         b
4. Generating the Parse Table
• The example parse table was generated using the SLR (simple LR) algorithm
  – an extension of LR(0) which uses the grammar's FOLLOW() sets
• Other LR algorithms can also be used to make a parse table:
  – e.g. LR(1), LALR(1)
Supporting Techniques
• SLR table generation makes use of three techniques:
  – LR(0) items
  – the closure() function
  – the goto() function
• I'll explain each one first, before the table generation algorithm.
4.1. LR(0) Items
• An LR(0) item is a grammar production with a • at some position of the right-hand side.
• So, a production
    A => X Y Z
  has four items:
    A => • X Y Z
    A => X • Y Z
    A => X Y • Z
    A => X Y Z •
• Production A => ε has one item: A => •
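The item definition can be made concrete with a short Python sketch. The representation is an assumption made for illustration: an item is a (head, body, dot) triple, where dot is the position of the • in the body, and the function names lr0_items and show are made up.

```python
def lr0_items(head, body):
    """Return every LR(0) item of head => body as (head, body, dot) triples."""
    return [(head, tuple(body), dot) for dot in range(len(body) + 1)]


def show(item):
    """Format an item in the notation used on the slides."""
    head, body, dot = item
    symbols = list(body[:dot]) + ["•"] + list(body[dot:])
    return f"{head} => " + " ".join(symbols)
```

A body with n symbols gives n + 1 items, so lr0_items("A", ["X", "Y", "Z"]) has four entries and an empty body has exactly one.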
4.2. The closure() Function
• The closure() function generates a set of LR(0) items.
• Assume that the grammar only has one production for the start symbol S, S => α
• The initial closure set is: closure( { S => • α } )
• If A => α • B β is in the set, then for each production B => γ, add the item B => • γ to the set, if it's not already there.
• Repeat until no new items can be added to the set.
Example use of closure()
Grammar:
  S => E
  E => E + T | T
  T => T * F | F
  F => ( E ) | id

closure({ S => • E }) is built up in steps:
  Start:         { S => • E }
  Add E => • …:  { S => • E, E => • E + T, E => • T }
  Add T => • …:  { S => • E, E => • E + T, E => • T, T => • T * F, T => • F }
  Add F => • …:  { S => • E, E => • E + T, E => • T, T => • T * F, T => • F, F => • ( E ), F => • id }
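The closure() example can be reproduced with a small Python sketch. It assumes items are (head, body, dot) triples and that the grammar is a list of (head, body) pairs; those representations and the name GRAMMAR are choices made here for illustration.

```python
# The expression grammar from the example, as (head, body) pairs.
GRAMMAR = [
    ("S", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]


def closure(items, grammar=GRAMMAR):
    """Close a list of (head, body, dot) items; the result keeps insertion order."""
    result = list(items)
    for _, body, dot in result:        # the list grows while we iterate over it
        if dot < len(body):            # there is a symbol after the dot
            for head, rhs in grammar:
                if head == body[dot] and (head, rhs, 0) not in result:
                    result.append((head, rhs, 0))
    return result
```

Starting from the single item S => • E, the function adds the E items, then the T items, then the F items, and stops when nothing new appears.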
4.3. The goto() Function
• goto(In, X) takes as input an existing closure set In, and a terminal/non-terminal symbol X.
• The output is a new closure set In+1:
  – for each item A => α • X β in In, add closure({ A => α X • β }) to In+1
  – repeat until no more items can be added to In+1
• In the state graph, this is an edge labelled X from In to In+1.
goto() Example 1
• Grammar:
    S => A B    // rule 1, for start symbol
    A => a
    B => b
• Initial state I0 = closure( { S => • A B } )
    = { S => • A B, A => • a }
• goto( I0, A ) == closure( { S => A • B } )
    = { S => A • B, B => • b }    // call it I1
• goto( I0, a ) == closure( { A => a • } )
    = { A => a • }    // call it I2
• goto( I1, B ) == closure( { S => A B • } )
    = { S => A B • }    // call it I3
  – this is the end of the S production
• goto( I1, b ) == closure( { B => b • } )
    = { B => b • }    // call it I4
• State graph:
    I0 --A--> I1
    I0 --a--> I2
    I1 --B--> I3    (end state)
    I1 --b--> I4
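Example 1 can be checked with a short Python sketch of goto(). As before, the (head, body, dot) item triples, the GRAMMAR list, and the helper names are assumptions made for illustration; closure() is repeated so that the block stands alone.

```python
# Example 1 grammar as (head, body) pairs.
GRAMMAR = [("S", ("A", "B")), ("A", ("a",)), ("B", ("b",))]


def closure(items):
    """Close a list of (head, body, dot) items, keeping insertion order."""
    result = list(items)
    for _, body, dot in result:        # result grows during the loop
        if dot < len(body):
            for head, rhs in GRAMMAR:
                if head == body[dot] and (head, rhs, 0) not in result:
                    result.append((head, rhs, 0))
    return result


def goto(items, symbol):
    """Move the dot over `symbol` in every item that allows it, then close."""
    moved = [(h, b, d + 1) for h, b, d in items if d < len(b) and b[d] == symbol]
    return closure(moved)


I0 = closure([("S", ("A", "B"), 0)])
I1 = goto(I0, "A")    # { S => A • B, B => • b }
I2 = goto(I0, "a")    # { A => a • }
I3 = goto(I1, "B")    # { S => A B • }
I4 = goto(I1, "b")    # { B => b • }
```

A symbol with no matching item gives an empty list, which corresponds to a missing edge in the state graph.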
goto() Example 2
• Grammar:
    S => a A B e    // rule 1, for start symbol
    A => A b c | b
    B => d
• Initial state I0 = closure( { S => • a A B e } )
    = { S => • a A B e }
• goto( I0, a ) == closure( { S => a • A B e } )
    = { S => a • A B e, A => • A b c, A => • b }    // call it I1
• goto( I1, A ) == closure( { S => a A • B e, A => A • b c } )
    = { S => a A • B e, A => A • b c, B => • d }    // call it I2
• goto( I1, b ) == closure( { A => b • } )
    = { A => b • }    // call it I3
• goto( I2, B ) == closure( { S => a A B • e } )
    = { S => a A B • e }    // call it I4
• Others
  – I5: { A => A b • c }
  – I6: { B => d • }
  – I7: { S => a A B e • }    // end of start symbol rule
  – I8: { A => A b c • }
• State graph:
    I0 --a--> I1
    I1 --A--> I2
    I1 --b--> I3
    I2 --B--> I4
    I2 --b--> I5
    I2 --d--> I6
    I4 --e--> I7
    I5 --c--> I8
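The nine states of Example 2 can be generated mechanically by applying goto() until no new closure sets appear. The Python below is a sketch under assumptions: items are (head, body, dot) triples, and the function item_sets and its worklist order are choices made here, picked so that the numbering happens to match I0 to I8.

```python
# Example 2 grammar as (head, body) pairs.
GRAMMAR = [
    ("S", ("a", "A", "B", "e")),
    ("A", ("A", "b", "c")),
    ("A", ("b",)),
    ("B", ("d",)),
]


def closure(items):
    result = list(items)
    for _, body, dot in result:        # result grows during the loop
        if dot < len(body):
            for head, rhs in GRAMMAR:
                if head == body[dot] and (head, rhs, 0) not in result:
                    result.append((head, rhs, 0))
    return result


def goto(items, symbol):
    return closure([(h, b, d + 1) for h, b, d in items
                    if d < len(b) and b[d] == symbol])


def item_sets(start_item):
    """Return (states, edges), where edges maps (state, symbol) to a state."""
    states = [closure([start_item])]
    edges = {}
    for i, state in enumerate(states):     # states grows during the loop
        for _, body, dot in state:
            if dot == len(body) or (i, body[dot]) in edges:
                continue
            target = goto(state, body[dot])
            for j, known in enumerate(states):
                if set(known) == set(target):   # same items, any order
                    break
            else:                               # not seen before: new state
                states.append(target)
                j = len(states) - 1
            edges[(i, body[dot])] = j
    return states, edges


STATES, EDGES = item_sets(("S", ("a", "A", "B", "e"), 0))
```

EDGES then holds the eight labelled edges of the state graph, for example EDGES[(2, "d")] is 6.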
4.4. Using goto() to make a Table
• The columns of the table should be the grammar's terminals, $, and non-terminals.
• The rows should be the numbers 0, 1, …, n of the closure sets I0, I1, …, In.
  – what we've been calling states
Stage 1
• In stage 1, we add the shift, goto, and accept entries to the table.
• action[i, a] gets <shift j> if a is a terminal and goto(Ii, a) == Ij
• goto[i, A] gets j if A is a non-terminal and goto(Ii, A) == Ij
• action[i, $] gets accept if S => α • is in Ii (there must be only one S rule)
Example Grammar 1
  S --> A B
  A --> a
  B --> b
State graph: I0 --A--> I1, I0 --a--> I2, I1 --B--> I3, I1 --b--> I4

           action[]         goto[]
State |  a    b    $   |  S    A    B
  0   | s2             |       1
  1   |      s4        |            3
  2   |                |
  3   |           acc  |
  4   |                |
Stage 2
• In stage 2, we add the reduce and error entries to the table.
• action[i, a] gets <reduce ruleNum> if [A => α •] is in Ii, A is not S, a is in FOLLOW(A), and A => α is rule number ruleNum
• After filling the table cells with shift, goto, accept, and reduce actions, any remaining empty cells will trigger an error() call.
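Both stages can be sketched together in Python for the small grammar S => A B, A => a, B => b. This is a sketch under loud assumptions: the FOLLOW sets are typed in by hand rather than computed, items are (head, body, dot) triples, empty cells are simply absent from the dictionaries, and nothing here detects conflicts (a second write to a cell would silently overwrite the first).

```python
RULES = [("S", ("A", "B")), ("A", ("a",)), ("B", ("b",))]   # rules 1, 2, 3
FOLLOW = {"A": {"b"}, "B": {"$"}}     # assumed given, not computed here
NONTERMS = {head for head, _ in RULES}


def closure(items):
    result = list(items)
    for _, body, dot in result:                 # result grows during the loop
        if dot < len(body):
            for rule in RULES:
                if rule[0] == body[dot] and (*rule, 0) not in result:
                    result.append((*rule, 0))
    return result


def goto(items, symbol):
    return closure([(h, b, d + 1) for h, b, d in items
                    if d < len(b) and b[d] == symbol])


def build_slr_table():
    """Return (action, goto_tab), both keyed by (state, symbol)."""
    states = [closure([(*RULES[0], 0)])]
    action, goto_tab = {}, {}
    for i, state in enumerate(states):          # states grows during the loop
        for head, body, dot in state:
            if dot < len(body):                 # stage 1: shift and goto
                sym = body[dot]
                target = goto(state, sym)
                if target not in states:        # list equality suffices here
                    states.append(target)
                j = states.index(target)
                if sym in NONTERMS:
                    goto_tab[i, sym] = j
                else:
                    action[i, sym] = ("s", j)
            elif head == "S":                   # stage 1: accept
                action[i, "$"] = ("acc",)
            else:                               # stage 2: reduce
                rule_num = RULES.index((head, body)) + 1
                for tok in FOLLOW[head]:
                    action[i, tok] = ("r", rule_num)
    return action, goto_tab
```

Any (state, symbol) key missing from the returned dictionaries plays the role of an error cell.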
Finishing the Example Table
• The reduce states are the state boxes at the leaves of the closure graph.
  – but exclude the end state
• For the example 1 grammar, there are two boxes at the leaves: I2 and I4.
State graph: I0 --A--> I1, I0 --a--> I2, I1 --B--> I3, I1 --b--> I4
I2 Reduction
Grammar: S --> A B    A --> a    B --> b
• I2 = { A => a • }
  – A => a is rule number 2
  – FOLLOW(A) == FIRST(B) = { b }
• So action[ 2, b ] gets <reduce 2>
I4 Reduction
Grammar: S --> A B    A --> a    B --> b
• I4 = { B => b • }
  – B => b is rule number 3
  – FOLLOW(B) = { $ }
• So action[ 4, $ ] gets <reduce 3>
Adding Reduce Entries
Grammar: S --> A B    A --> a    B --> b
State graph: I0 --A--> I1, I0 --a--> I2, I1 --B--> I3, I1 --b--> I4

           action[]         goto[]
State |  a    b    $   |  S    A    B
  0   | s2             |       1
  1   |      s4        |            3
  2   |      r2        |
  3   |           acc  |
  4   |           r3   |
Using the Example 1 Table
Grammar: S --> A B    A --> a    B --> b

Stack          Input    Action
$0             a b $    Shift 2
$0,a2          b $      Reduce 2 (A --> a)
$0,A1          b $      Shift 4
$0,A1,b4       $        Reduce 3 (B --> b)
$0,A1,B3       $        Accept (S --> A B)
$0             $

• Reduce 2: pop 1 pair; state' = 0; push(A, goto(0,A)) == push(A,1);
• Reduce 3: pop 1 pair; state' = 1; push(B, goto(1,B)) == push(B,3);
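4.5. Example Grammar 2
  S --> a A B e
  A --> A b c | b
  B --> d
State graph: I0 --a--> I1, I1 --A--> I2, I1 --b--> I3, I2 --B--> I4, I2 --b--> I5, I2 --d--> I6, I4 --e--> I7, I5 --c--> I8

Stage 1
                  action[]                  goto[]
State |  a    b    c    d    e    $   |  S    A    B
  0   | s1                            |
  1   |      s3                       |       2
  2   |      s5        s6             |            4
  3   |                               |
  4   |                     s7        |
  5   |           s8                  |
  6   |                               |
  7   |                          acc  |
  8   |                               |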
Reduce States
• For the example 2 grammar, there are three boxes at the leaves: I3, I6, and I8.
I3 Reduction
Grammar: S --> a A B e    A --> A b c    A --> b    B --> d
• I3 = { A => b • }
  – A => b is rule number 3
  – FOLLOW(A) = {b} ∪ FIRST(B) = {b, d}
• So action[ 3, b ] and action[ 3, d ] get <reduce 3>
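The FOLLOW sets used in these reductions can be computed rather than read off by hand. The Python below is a sketch that assumes a grammar with no empty (ε) productions, which holds for example grammar 2; the function names first_sets and follow_sets are made up for illustration.

```python
RULES = [
    ("S", ("a", "A", "B", "e")),
    ("A", ("A", "b", "c")),
    ("A", ("b",)),
    ("B", ("d",)),
]
NONTERMS = {head for head, _ in RULES}


def first_sets():
    """FIRST of each non-terminal, assuming no production has an empty body."""
    first = {n: set() for n in NONTERMS}
    changed = True
    while changed:
        changed = False
        for head, body in RULES:
            sym = body[0]
            new = first[sym] if sym in NONTERMS else {sym}
            if not new <= first[head]:
                first[head] |= new
                changed = True
    return first


def follow_sets(start="S"):
    """FOLLOW of each non-terminal, again assuming no empty bodies."""
    first = first_sets()
    follow = {n: set() for n in NONTERMS}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for head, body in RULES:
            for i, sym in enumerate(body):
                if sym not in NONTERMS:
                    continue
                if i + 1 < len(body):          # something follows sym in the body
                    nxt = body[i + 1]
                    new = first[nxt] if nxt in NONTERMS else {nxt}
                else:                          # sym ends the body
                    new = follow[head]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow
```

For this grammar, follow_sets()["A"] combines the b that follows A in A => A b c with FIRST(B) from S => a A B e.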
I6 Reduction
Grammar: S --> a A B e    A --> A b c    A --> b    B --> d
• I6 = { B => d • }
  – B => d is rule number 4
  – FOLLOW(B) = {e}
• So action[ 6, e ] gets <reduce 4>
I8 Reduction
Grammar: S --> a A B e    A --> A b c    A --> b    B --> d
• I8 = { A => A b c • }
  – A => A b c is rule number 2
  – FOLLOW(A) = {b, d}
• So action[ 8, b ] and action[ 8, d ] get <reduce 2>
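Adding Reduce Entries
Grammar: S --> a A B e    A --> A b c | b    B --> d
State graph: I0 --a--> I1, I1 --A--> I2, I1 --b--> I3, I2 --B--> I4, I2 --b--> I5, I2 --d--> I6, I4 --e--> I7, I5 --c--> I8

                  action[]                  goto[]
State |  a    b    c    d    e    $   |  S    A    B
  0   | s1                            |
  1   |      s3                       |       2
  2   |      s5        s6             |            4
  3   |      r3        r3             |
  4   |                     s7        |
  5   |           s8                  |
  6   |                     r4        |
  7   |                          acc  |
  8   |      r2        r2             |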
5. LR Conflicts
• An LR conflict occurs when a cell in the action part of the parse table contains more than one action.
• There are two kinds of conflict:
  – shift/reduce and reduce/reduce
• Conflicts appear because of:
  – grammar ambiguity
  – limitations of the SLR parsing method (even when the grammar is unambiguous)
5.1. Shift/Reduce
• A shift/reduce conflict occurs when the parser cannot decide whether to shift the next symbol or reduce with a production
  – typically, the default action is to shift
Dangling Else Example
• Grammar rule:
    IfStmt => if Expr then Stmt
            | if Expr then Stmt else Stmt
• Example:
    if (a == 1) then
      if (b == 4) then x = 2;
      else ...      <-- this goes with which 'if'?
On the Stack

Stack                      Input       Action
$ …                        … $         …
$ … if Expr then Stmt      else … $    shift or reduce?

Choose shift, so else matches the closest if.
5.2. Reduce/Reduce
• A reduce/reduce conflict occurs when the parser cannot decide which production to use to make a reduction.
• Typically, the first suitable production is used.
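The two kinds of conflict, and the typical default resolutions, can be sketched in Python as follows. The representation is an assumption: a table cell is a list of candidate actions, each ("s", state) or ("r", rule_number), and "first suitable production" is read here as the lowest-numbered rule, which is a guess at the intended policy rather than something the slides spell out.

```python
def classify(cell):
    """Name the conflict in a cell of candidate actions, or return None."""
    shifts = [a for a in cell if a[0] == "s"]
    reduces = [a for a in cell if a[0] == "r"]
    if shifts and reduces:
        return "shift/reduce"
    if len(reduces) > 1:
        return "reduce/reduce"
    return None


def resolve(cell):
    """Apply the usual defaults: prefer shift, else the lowest-numbered rule."""
    shifts = [a for a in cell if a[0] == "s"]
    if shifts:
        return shifts[0]
    return min(cell, key=lambda a: a[1])
```

For the dangling else, the cell would hold one shift and one reduce, and resolve() picks the shift.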
Example
Grammar:
  C => A B
  A => a
  B => a

Stack    Input    Action
$        a a $    shift
$ a      a $      reduce A => a or B => a ?

Choose A => a, since it's the first suitable one.
6. LL, SLR, LR, LALR Grammars
[Figure: ovals labelled LL(1), LR(1), LR(0), SLR, and LALR(1). The ovals represent the complexity of the grammars that the notation can handle.]
• We've been using SLR in this chapter.
• LL(1) was used in chapter 5 on top-down parsing.
LR(1) Grammars
• LR(1) parsing uses one token of lookahead to avoid conflicts in the parsing table.
• It can deal with more complex/powerful grammars than LR(0) or SLR.
• An LR(1) grammar takes longer to convert into a parse table.
LALR(1) Grammars
• LALR(1) parsing (Look-Ahead LR) combines LR(1) states to reduce the size of the parse table.
• LALR(1) is less powerful than LR(1)
  – it may introduce reduce-reduce conflicts, but that's not likely for programming language grammars
• LALR(1) is used by the YACC parsing tool
  – see next chapter