compilation 0368-3133 (semester a, 2013/14)

58
Compilation 0368-3133 (Semester A, 2013/14) Lecture 6a: Syntax (Bottom–up parsing) Noam Rinetzky 1 Slides credit: Roman Manevich, Mooly Sagiv, Eran Yahav

Upload: raleigh

Post on 25-Feb-2016

27 views

Category:

Documents


0 download

DESCRIPTION

Compilation 0368-3133 (Semester A, 2013/14). Lecture 6a: Syntax (Bottom–up parsing) Noam Rinetzky. Slides credit: Roman Manevich , Mooly Sagiv , Eran Yahav. What is a Compiler?. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Compilation   0368-3133 (Semester A, 2013/14)

1

Compilation 0368-3133 (Semester A, 2013/14)

Lecture 6a: Syntax(Bottom–up parsing)

Noam RinetzkySlides credit: Roman Manevich, Mooly Sagiv, Eran Yahav

Page 2: Compilation   0368-3133 (Semester A, 2013/14)

2

What is a Compiler?

“A compiler is a computer program that transforms source code written in a programming language (source language) into another language (target language).

The most common reason for wanting to transform source code is to create an executable program.”

--Wikipedia

Page 3: Compilation   0368-3133 (Semester A, 2013/14)

3

Conceptual Structure of a Compiler

Executable code

exe

Sourcetext

txtSemantic

RepresentationBackend

CompilerFrontend

LexicalAnalysi

s

Syntax Analysi

sParsing

Semantic

Analysis

IntermediateRepresentati

on(IR)

CodeGeneration

words sentences

Page 4: Compilation   0368-3133 (Semester A, 2013/14)

4

From scanning to parsing((23 + 7) * x)

) x * ) 7 + 23 ( (RP Id OP RP Num OP Num LP LP

Lexical Analyzer

program text

token stream

ParserGrammar: E ... | Id Id ‘a’ | ... | ‘z’

Op(*)

Id(b)

Num(23) Num(7)

Op(+)

Abstract Syntax Tree

validsyntaxerror

Page 5: Compilation   0368-3133 (Semester A, 2013/14)

5

Top-Down vs Bottom-Up• Top-down (predict match/scan-complete )

to be read…

already read…A

Aa b

Aa b

c

aacbb$

AaAb|c

Page 6: Compilation   0368-3133 (Semester A, 2013/14)

6

Top-Down vs Bottom-Up• Top-down (predict match/scan-complete )

Bottom-up (shift reduce)

to be read…

already read…

A

a bA

c

a b

A

aacbb$

AaAb|c

Page 7: Compilation   0368-3133 (Semester A, 2013/14)

7

Model of an LR parser

LR Parserq5

+

q2

T

q0

Stack

$ id + id

Output (AST/Error)state

symbol

ACTION GOTO

Input

Terminals and Non-terminals

Control Tables

Top

Remainder of text to be processed

State controls decisions

Initial stack contains q0

Page 8: Compilation   0368-3133 (Semester A, 2013/14)

8

LR(0) parser tables

State i + ( ) $ E T actionq0 q5 q7 q1 q6 shiftq1 q3 q2 shiftq2 reduce: ZE$q3 q5 q7 q4 shiftq4 reduce: EE+Tq5 reduce: Tiq6 reduce: ETq7 q5 q7 q8 q6 shiftq8 q3 q9 shiftq9 reduce: TE

GOTO Table ACTION TableEmpty cell =

error move

Page 9: Compilation   0368-3133 (Semester A, 2013/14)

9

LR(0) parser tables

• Shift action row – Tells which state to GOTO for current token– Blank entry indicates an error

• Reduce action row – Tells which rule to reduce with

• Independent of current token– GOTO entries are blank

State id + ( ) $ E T action

q0 q5 q7 q1 q6 shift

q5 reduce: Ti

Page 10: Compilation   0368-3133 (Semester A, 2013/14)

10

State id + ( ) $ E T action

q0 q5 q7 q1 q6 shift

q5 reduce: Ti

Shift Move• Remove first token from input• Push it on the stack• Compute next state based on GOTO table• Push new state on the stack• If new state is error – report error

id + id $input

q0stack

+ id $input

q0stack

shift

id q5

Page 11: Compilation   0368-3133 (Semester A, 2013/14)

11

Reduce Move (using Nα)• Symbols in α and their following states are removed from

stack• New state computed based on GOTO table (using top of

stack, before pushing N)• N is pushed on the stack• New state pushed on top of N

+ id $input

q0stack id q5

Reduce:T id + id $input

q0stack T

State id + ( ) $ E T action

q0 q5 q7 q1 q6 shift

q5 reduce: Ti

Page 12: Compilation   0368-3133 (Semester A, 2013/14)

12

Reduce Move (using Nα)• Symbols in α and their following states are removed from

stack• New state computed based on GOTO table (using top of

stack, before pushing N)• N is pushed on the stack• New state pushed on top of N

+ id $input

q0stack id q5

Reduce:T id + id $input

q0stack T q6

State id + ( ) $ E T action

q0 q5 q7 q1 q6 shift

q5 reduce: Ti

Page 13: Compilation   0368-3133 (Semester A, 2013/14)

13

GOTO/ACTION tableState i + ( ) $ E T

q0 s5 s7 s1 s6

q1 s3 s2

q2 r1 r1 r1 r1 r1 r1 r1

q3 s5 s7 s4

q4 r3 r3 r3 r3 r3 r3 r3

q5 r4 r4 r4 r4 r4 r4 r4

q6 r2 r2 r2 r2 r2 r2 r2

q7 s5 s7 s8 s6

q8 s3 s9

q9 r5 r5 r5 r5 r5 r5 r5

(1)Z E $(2)E T (3)E E + T(4)T i (5)T ( E )

Warning: numbers mean different things!rn = reduce using rule number nsm = shift to state m

Page 14: Compilation   0368-3133 (Semester A, 2013/14)

Parsing id+id$

14

goto action ST E $ ) ( + id

g6 g1 s7 s5 0acc s3 1

2g4 s7 s5 3

r3 r3 r3 r3 r3 4r4 r4 r4 r4 r4 5r2 r2 r2 r2 r2 6

g6 g8 s7 s5 7s9 s3 8

r5 r5 r5 r5 r5 9

(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

Stack Input Action0 id + id $ s5

Initialize with state 0

Page 15: Compilation   0368-3133 (Semester A, 2013/14)

15

Parsing id+id$goto action S

T E $ ) ( + idg6 g1 s7 s5 0

acc s3 12

g4 s7 s5 3r3 r3 r3 r3 r3 4r4 r4 r4 r4 r4 5r2 r2 r2 r2 r2 6

g6 g8 s7 s5 7s9 s3 8

r5 r5 r5 r5 r5 9

Stack Input Action0 id + id $ s5

Initialize with state 0

(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

Page 16: Compilation   0368-3133 (Semester A, 2013/14)

16

Parsing id+id$Stack Input Action0 id + id $ s50 id 5 + id $ r4

goto action ST E $ ) ( + id

g6 g1 s7 s5 0acc s3 1

2g4 s7 s5 3

r3 r3 r3 r3 r3 4r4 r4 r4 r4 r4 5r2 r2 r2 r2 r2 6

g6 g8 s7 s5 7s9 s3 8

r5 r5 r5 r5 r5 9

(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

Page 17: Compilation   0368-3133 (Semester A, 2013/14)

17

Parsing id+id$Stack Input Action0 id + id $ s50 id 5 + id $ r4

goto action ST E $ ) ( + id

g6 g1 s7 s5 0acc s3 1

2g4 s7 s5 3

r3 r3 r3 r3 r3 4r4 r4 r4 r4 r4 5r2 r2 r2 r2 r2 6

g6 g8 s7 s5 7s9 s3 8

r5 r5 r5 r5 r5 9

pop id 5

(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

Page 18: Compilation   0368-3133 (Semester A, 2013/14)

18

Parsing id+id$Stack Input Action0 id + id $ s50 id 5 + id $ r4

goto action ST E $ ) ( + id

g6 g1 s7 s5 0acc s3 1

2g4 s7 s5 3

r3 r3 r3 r3 r3 4r4 r4 r4 r4 r4 5r2 r2 r2 r2 r2 6

g6 g8 s7 s5 7s9 s3 8

r5 r5 r5 r5 r5 9

push T 6

(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

Page 19: Compilation   0368-3133 (Semester A, 2013/14)

19

Parsing id+id$Stack Input Action0 id + id $ s50 id 5 + id $ r40 T 6 + id $ r2

goto action ST E $ ) ( + id

g6 g1 s7 s5 0acc s3 1

2g4 s7 s5 3

r3 r3 r3 r3 r3 4r4 r4 r4 r4 r4 5r2 r2 r2 r2 r2 6

g6 g8 s7 s5 7s9 s3 8

r5 r5 r5 r5 r5 9

(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

Page 20: Compilation   0368-3133 (Semester A, 2013/14)

20

Parsing id+id$Stack Input Action0 id + id $ s50 id 5 + id $ r40 T 6 + id $ r20 E 1 + id $ s3

goto action ST E $ ) ( + id

g6 g1 s7 s5 0acc s3 1

2g4 s7 s5 3

r3 r3 r3 r3 r3 4r4 r4 r4 r4 r4 5r2 r2 r2 r2 r2 6

g6 g8 s7 s5 7s9 s3 8

r5 r5 r5 r5 r5 9

(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

Page 21: Compilation   0368-3133 (Semester A, 2013/14)

21

Parsing id+id$Stack Input Action0 id + id $ s50 id 5 + id $ r40 T 6 + id $ r20 E 1 + id $ s30 E 1 + 3 id $ s5

goto action ST E $ ) ( + id

g6 g1 s7 s5 0acc s3 1

2g4 s7 s5 3

r3 r3 r3 r3 r3 4r4 r4 r4 r4 r4 5r2 r2 r2 r2 r2 6

g6 g8 s7 s5 7s9 s3 8

r5 r5 r5 r5 r5 9

(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

Page 22: Compilation   0368-3133 (Semester A, 2013/14)

22

Parsing id+id$Stack Input Action0 id + id $ s50 id 5 + id $ r40 T 6 + id $ r20 E 1 + id $ s30 E 1 + 3 id $ s50 E 1 + 3 id 5 $ r4

goto action ST E $ ) ( + id

g6 g1 s7 s5 0acc s3 1

2g4 s7 s5 3

r3 r3 r3 r3 r3 4r4 r4 r4 r4 r4 5r2 r2 r2 r2 r2 6

g6 g8 s7 s5 7s9 s3 8

r5 r5 r5 r5 r5 9

(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

Page 23: Compilation   0368-3133 (Semester A, 2013/14)

23

Parsing id+id$Stack Input Action0 id + id $ s50 id 5 + id $ r40 T 6 + id $ r20 E 1 + id $ s30 E 1 + 3 id $ s50 E 1 + 3 id 5 $ r40 E 1 + 3 T 4 $ r3

goto action ST E $ ) ( + id

g6 g1 s7 s5 0acc s3 1

2g4 s7 s5 3

r3 r3 r3 r3 r3 4r4 r4 r4 r4 r4 5r2 r2 r2 r2 r2 6

g6 g8 s7 s5 7s9 s3 8

r5 r5 r5 r5 r5 9

(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

Page 24: Compilation   0368-3133 (Semester A, 2013/14)

24

Parsing id+id$Stack Input Action0 id + id $ s50 id 5 + id $ r40 T 6 + id $ r20 E 1 + id $ s30 E 1 + 3 id $ s50 E 1 + 3 id 5 $ r40 E 1 + 3 T 4 $ r30 E 1 $ s2

goto action ST E $ ) ( + id

g6 g1 s7 s5 0acc s3 1

2g4 s7 s5 3

r3 r3 r3 r3 r3 4r4 r4 r4 r4 r4 5r2 r2 r2 r2 r2 6

g6 g8 s7 s5 7s9 s3 8

r5 r5 r5 r5 r5 9

(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

Page 25: Compilation   0368-3133 (Semester A, 2013/14)

25

Constructing an LR parsing table

• Construct a transition diagram (deterministic FSM) – States = sets of LR(0) items– Transitions = one-step derivation

• If there are conflicts – stop• Fill table entries from diagram

Page 26: Compilation   0368-3133 (Semester A, 2013/14)

26

Terminology: Reductions & Handles

• The opposite of derivation is called reduction– Let Aα be a production rule– Derivation: βAµ βαµ– Reduction: βαµ βAµ

• A handle is the reduced substring– α is the handles for βαµ

Page 27: Compilation   0368-3133 (Semester A, 2013/14)

LR(0) Items• The items of a grammar are obtained by placing a

dot at every position in every production

27

(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

1: S E$2: S E $3: S E $ 4: E T5: E T 6: E E + T7: E E + T8: E E + T9: E E + T 10: T id11: T id 12: T (E)13: T ( E)14: T (E )15: T (E)

Grammar LR(0) items

Page 28: Compilation   0368-3133 (Semester A, 2013/14)

28

LR(0) Item - Intuition

N αβ

Already matched To be matchedInput

Hypothesis about αβ is the rule being reduced, and so far we’ve matched α and we expect to see β

Page 29: Compilation   0368-3133 (Semester A, 2013/14)

29

Types of LR(0) items

N αβ Shift Item

N αβ Reduce Item

Page 30: Compilation   0368-3133 (Semester A, 2013/14)

LR(0) automaton example

30

Z E$E TE E + TT iT (E)

T (E)E TE E + TT iT (E)

E E + T

T (E) Z E$

Z E$E E+ T E E+T

T iT (E)

T i

T (E)E E+T

E Tq0

q1

q2

q3

q4

q5

q6

q7

q8

q9

T

(

i

E

+

$

T

)

+

E

i

T

(i

(

reduce stateshift state

Page 31: Compilation   0368-3133 (Semester A, 2013/14)

31

Computing item sets

• Initial set– Z is in the start symbol – -closure({ Zα | Zα is in the grammar } )

• Next set from a set S and the next symbol X– step(S,X) = { NαXβ | NαXβ in the item set S}– nextSet(S,X) = -closure(step(S,X))

Page 32: Compilation   0368-3133 (Semester A, 2013/14)

32

Operations for transition diagram construction

• Initial = {S’S$}

• For an item set IClosure(I) = Closure(I) ∪ {Xµ is in grammar| NαXβ in I}

• Goto(I, X) = { NαXβ | NαXβ in I}

Page 33: Compilation   0368-3133 (Semester A, 2013/14)

33

Initial example

• Initial = {S E $}(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

Grammar

Page 34: Compilation   0368-3133 (Semester A, 2013/14)

34

Closure example

• Initial = {S E $}• Closure({S E $}) = {

S E $ E T E E + T T id T ( E ) }

(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

Grammar

Page 35: Compilation   0368-3133 (Semester A, 2013/14)

35

Goto example

• Initial = {S E $}• Closure({S E $}) = {

S E $ E T E E + T T id T ( E ) }

• Goto({S E $ , E E + T, T id}, E) = {S E $, E E + T}

(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

Grammar

Page 36: Compilation   0368-3133 (Semester A, 2013/14)

36

Constructing the transition diagram

• Start with state 0 containing itemClosure({S E $})

• Repeat until no new states are discovered– For every state p containing item set Ip, and

symbol N, compute state q containing item setIq = Closure(goto(Ip, N))

Page 37: Compilation   0368-3133 (Semester A, 2013/14)

LR(0) automaton example

37

Z E$E TE E + TT iT (E)

T (E)E TE E + TT iT (E)

E E + T

T (E) Z E$

Z E$E E+ T E E+T

T iT (E)

T i

T (E)E E+T

E Tq0

q1

q2

q3

q4

q5

q6

q7

q8

q9

T

(

i

E

+

$

T

)

+

E

i

T

(i

(

reduce stateshift state

Page 38: Compilation   0368-3133 (Semester A, 2013/14)

38

Automaton construction example

(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

S E$q0

Initialize

Page 39: Compilation   0368-3133 (Semester A, 2013/14)

39

S E$E TE E + TT iT (E)

q0

applyClosure

Automaton construction example(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

Page 40: Compilation   0368-3133 (Semester A, 2013/14)

40

Automaton construction example

S E$E TE E + TT iT (E)

q0 E T

q6

TT (E)E TE E + TT iT (E)

(

T i

q5

i

S E$E E+ T

q1

E

(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

Page 41: Compilation   0368-3133 (Semester A, 2013/14)

41

S E$E TE E + TT iT (E)

T (E)E TE E + TT iT (E)

E E + T

T (E) S E$

Z E$E E+ T E E+T

T iT (E)

T i

T (E)E E+T

E Tq0

q1

q2

q3

q4

q5

q6q7

q8

q9

T

(

i

E

+

$

T

)

+

E

i

T

(i

(

terminal transition corresponds to shift action in parse table

non-terminal transition corresponds to goto action in parse table

a single reduce item corresponds to reduce action

Automaton construction example(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

Page 42: Compilation   0368-3133 (Semester A, 2013/14)

42

Are we done?

• Can make a transition diagram for any grammar

• Can make a GOTO table for every grammar

• Cannot make a deterministic ACTION table for every grammar

Page 43: Compilation   0368-3133 (Semester A, 2013/14)

LR(0) conflicts

43

Z E $E T E E + TT i T ( E )T i[E]

Z E$E T

E E + TT i

T (E)T i[E] T i

T i[E]

q0

q5

T

(

i

E Shift/reduce conflict

Page 44: Compilation   0368-3133 (Semester A, 2013/14)

LR(0) conflicts

44

Z E $E T E E + TT i V iT ( E )

Z E$E T

E E + TT IV i

T (E)T i[E] T i

V i

q0

q5

T

(

i

E reduce/reduce conflict

Page 45: Compilation   0368-3133 (Semester A, 2013/14)

45

LR(0) conflicts

• Any grammar with an -rule cannot be LR(0)

• Inherent shift/reduce conflict– A – reduce item– P αAβ – shift item– A can always be predicted from P αAβ

Page 46: Compilation   0368-3133 (Semester A, 2013/14)

46

Conflicts

• Can construct a diagram for every grammar but some may introduce conflicts

• shift-reduce conflict: an item set contains at least one shift item and one reduce item

• reduce-reduce conflict: an item set contains two reduce items

Page 47: Compilation   0368-3133 (Semester A, 2013/14)

47

LR variants

• LR(0) – what we’ve seen so far• SLR(0)

– Removes infeasible reduce actions via FOLLOW set reasoning

• LR(1)– LR(0) with one lookahead token in items

• LALR(0)– LR(1) with merging of states with same LR(0)

component

Page 48: Compilation   0368-3133 (Semester A, 2013/14)

48

LR (0) GOTO/ACTIONS tables

State i + ( ) $ E T action

q0 q5 q7 q1 q6 shift

q1 q3 q2 shift

q2 ZE$q3 q5 q7 q4 Shift

q4 EE+Tq5 Tiq6 ETq7 q5 q7 q8 q6 shift

q8 q3 q9 shift

q9 TE

GOTO TableACTION

Table

ACTION table determined only by state, ignores input

GOTO table is indexed by state and a grammar symbol from the stack

Page 49: Compilation   0368-3133 (Semester A, 2013/14)

49

SLR parsing• A handle should not be reduced to a non-terminal N if

the lookahead is a token that cannot follow N• A reduce item N α is applicable only when the

lookahead is in FOLLOW(N)– If b is not in FOLLOW(N) we just proved there is no derivation

S * βNb. – Thus, it is safe to remove the reduce item from the conflicted

state• Differs from LR(0) only on the ACTION table

– Now a row in the parsing table may contain both shift actions and reduce actions and we need to consult the current token to decide which one to take

Page 50: Compilation   0368-3133 (Semester A, 2013/14)

50

SLR action table

State i + ( ) [ ] $

0 shift shift

1 shift accept

2

3 shift shift

4 EE+T EE+T EE+T5 Ti Ti shift Ti6 ET ET ET7 shift shift

8 shift shift

9 T(E) T(E) T(E)

vs.

state action

q0 shift

q1 shift

q2

q3 shift

q4 EE+Tq5 Tiq6 ETq7 shift

q8 shift

q9 TE

SLR – use 1 token look-ahead LR(0) – no look-ahead… as before…T i T i[E]

Lookahead token from the input

Page 51: Compilation   0368-3133 (Semester A, 2013/14)

51

LR(1) grammars

• In SLR: a reduce item N α is applicable only when the lookahead is in FOLLOW(N)

• But FOLLOW(N) merges lookahead for all alternatives for N– Insensitive to the context of a given production

• LR(1) keeps lookahead with each LR item• Idea: a more refined notion of follows

computed per item

Page 52: Compilation   0368-3133 (Semester A, 2013/14)

52

LR(1) items• LR(1) item is a pair

– LR(0) item– Lookahead token

• Meaning– We matched the part left of the dot, looking to match the part on

the right of the dot, followed by the lookahead token• Example

– The production L id yields the following LR(1) items

[L → ● id, *][L → ● id, =][L → ● id, id][L → ● id, $][L → id ●, *][L → id ●, =][L → id ●, id][L → id ●, $]

(0) S’ → S(1) S → L = R(2) S → R(3) L → * R(4) L → id(5) R → L

[L → ● id][L → id ●]

LR(0) items

LR(1) items

Page 53: Compilation   0368-3133 (Semester A, 2013/14)

53

LALR(1)

• LR(1) tables have huge number of entries• Often don’t need such refined observation (and

cost)• Idea: find states with the same LR(0) component

and merge their lookaheads component as long as there are no conflicts

• LALR(1) not as powerful as LR(1) in theory but works quite well in practice– Merging may not introduce new shift-reduce conflicts,

only reduce-reduce, which is unlikely in practice

Page 54: Compilation   0368-3133 (Semester A, 2013/14)

54

Summary

Page 55: Compilation   0368-3133 (Semester A, 2013/14)

55

Summary• Bottom up derivation• LR(k) can decide on a reduce after seeing the entire right

side of the rule plus k look-ahead tokens. – We focused on LR(0)

• Using a table and a stack to derive• LR Items and the automaton• Creating the table from the automaton• LR parsing with pushdown automata• LR(0), SLR, LR(1) – different kinds of LR items, same basic

algorithm

Page 56: Compilation   0368-3133 (Semester A, 2013/14)

56

Broad kinds of parsers

• Parsers for arbitrary grammars– Earley’s method, CYK method– Usually, not used in practice (though might change)

• Top-Down parsers – Construct parse tree in a top-down matter– Find the leftmost derivation

• Bottom-Up parsers – Construct parse tree in a bottom-up manner– Find the rightmost derivation in a reverse order

Page 57: Compilation   0368-3133 (Semester A, 2013/14)

57

LR is More Powerful than LL• Any LL(k) language is also in LR(k), i.e., LL(k) LR(k). ⊂

– LR is more popular in automatic tools

• But less intuitive

• Also, the lookahead is counted differently in the two cases– In an LL(k) derivation the algorithm sees the left-hand side of the

rule + k input tokens and then must select the derivation rule– In LR(k), the algorithm “sees” all right-hand side of the derivation

rule + k input tokens and then reduces• LR(0) sees the entire right-side, but no input token

Page 58: Compilation   0368-3133 (Semester A, 2013/14)

58

Grammar Hierarchy

Non-ambiguous CFGLR(1)

LALR(1)

SLR(1)

LL(1)

LR(0)