241-437 Compilers: topDown/5 Compiler Structures Objective look at top-down (LL) parsing using recursive descent and tables consider a recursive descent parser for the Expressions language 241-437, Semester 1, 2011-2012 5. Top-down Parsing

Upload: dwain-franklin

Post on 11-Jan-2016


241-437 Compilers: topDown/5 1

Compiler Structures

• Objective
– look at top-down (LL) parsing using recursive descent and tables
– consider a recursive descent parser for the Expressions language

241-437, Semester 1, 2011-2012

5. Top-down Parsing


Overview

1. Parsing with a Syntax Analyzer

2. Creating a Recursive Descent Parser

3. The Expressions Language Parser

4. LL(1) Parse Tables

5. Making a Grammar LL(1)

6. Error Recovery in LL Parsing


In this lecture

[Figure: the phases of a compiler. Front End: Source Program → Lexical Analyzer → Syntax Analyzer → Semantic Analyzer → Int. Code Generator → Intermediate Code. Back End: Code Optimizer → Target Code Generator → Target Lang. Prog.]

This lecture looks at the Syntax Analyzer, but concentrating on top-down parsing.


1. Parsing with a Syntax Analyzer

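[Figure: how the Lexical Analyzer (using chars) and the Syntax Analyzer (using tokens) work together]

1. Get next token: the Syntax Analyzer asks the Lexical Analyzer for a token.
2. Get chars to make a token: the Lexical Analyzer reads chars from the Source Program.
3. Token, token value: the Lexical Analyzer returns them to the Syntax Analyzer.

The Lexical Analyzer reports lexical errors; the Syntax Analyzer reports syntax errors and outputs a parse tree.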


1.1. Top Down (LL) Parsing

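Input: begin simplestmt ; simplestmt ; end

[Figure: the parse tree for this input, rooted at B and built top-down; the numbers 1-6 mark the order in which the nodes are expanded]

Grammar:
B => begin SS end
SS => S ; SS
SS => ε
S => simplestmt
S => begin SS end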


1.2. LL Parsing Definition

• An LL parser is a top-down parser for a context-free grammar.

• It parses input from Left to right, and constructs a Leftmost derivation of the input.


A Leftmost Derivation

• In a leftmost derivation, the leftmost non-terminal is chosen to be expanded.
– this builds the parse tree top-down, left-to-right

• Example grammar:
L => ( L ) L
L => ε


Leftmost Derivation for (())()

Each line shows the current sentential form, followed by the production applied to its leftmost L:

L                    // L => ( L ) L
( L ) L              // L => ( L ) L
( ( L ) L ) L        // L => ε
( ( ) L ) L          // L => ε
( ( ) ) L            // L => ( L ) L
( ( ) ) ( L ) L      // L => ε
( ( ) ) ( ) L        // L => ε
( ( ) ) ( )

Input: ( ( ) ) ( )
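The derivation above can be mimicked by a tiny recursive function, one per non-terminal. The following is only a sketch under some assumptions of mine: the input is a C string with no spaces, each character is one token, and the names `balanced`, `pos` and `ok` are made up for this illustration (they do not come from the course code).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static const char *pos;   /* points at the current token (one char) */
static bool ok;           /* becomes false when a ')' is missing */

/* L => ( L ) L | ε */
static void L(void)
{
    if (*pos == '(') {          /* current token picks L => ( L ) L */
        pos++;                  /* consume '(' */
        L();
        if (*pos == ')')
            pos++;              /* consume ')' */
        else
            ok = false;
        if (ok)
            L();
    }
    /* any other token: pick L => ε and consume nothing */
}

/* Returns true if the whole input can be derived from L. */
bool balanced(const char *input)
{
    assert(input != NULL);
    pos = input;
    ok = true;
    L();
    return ok && *pos == '\0';  /* all the input must be used up */
}
```

Calling `balanced("(())()")` follows the same choices as the derivation: the production is picked by looking only at the current character.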


1.3. LL(1) and LL(k)

• An LL(1) parser uses the current token only to decide which production to use next.

• An LL(k) parser uses k tokens of input to decide which production to use
– this makes the grammar easier to write
– adds little 'power' in practice compared to LL(1) for typical programming language grammars (formally, a larger k does allow more languages to be described)
– harder to implement efficiently


1.4. Two LL Implementation Approaches

• Recursive Descent parsing
– all the compiler code is generated (automatically) from the grammar

• Table Driven parsing
– a table is generated (automatically) from the grammar
– the table is 'plugged' into an existing compiler


2. Creating a Recursive Descent Parser

• Each non-terminal (e.g. A) is translated into a parsing function (e.g. A()).

• The A() function is generated from all the productions for A:
– A => B, A => a C, etc.


2.1. Basic Translation Rules

• I'll start by assuming a production body doesn't use *, [], or ε.
– I'll add to the translation rules later to deal with these extra features

• S => Body
becomes
void S()
{ translate< Body > }


• If Body is
B1 B2 . . . Bn

then it becomes:

translate< B1 > ;
translate< B2 > ;
   :
translate< Bn > ;


• If Body is
B1 | B2 . . . | Bn

then it becomes:

if (currToken in FIRST_SEQ<B1>)
  translate<B1> ;
else if (currToken in FIRST_SEQ<B2>)
  translate<B2> ;
   :
else if (currToken in FIRST_SEQ<Bn>)
  translate<Bn> ;
else
  error();


• currToken is the current token, which is obtained from the lexical analyzer:

Token currToken; // global

void nextToken(void){ currToken = scanner(); }


• The first token is read when the parser first starts. main() also calls the function representing the start symbol:

int main(void)
{
  nextToken();
  S();      // S is the grammar's start symbol
    :       // other code
  return 0;
}


• error() reports that the current token cannot be matched against any production:

int lineNum; // global

void error()
{
  // tokSyms[] maps a token to its printable name (it is used later in exprParse0.c)
  printf("\nSyntax error at \'%s\' on line %d\n", tokSyms[currToken], lineNum);
  exit(1);
}


• In a body, if B is a non-terminal, it is translated into the function call:

B();

• In a body, if b is a terminal, it is translated into a match() call:

match(b);


• match() checks that the current token is what is expected (e.g. b), and reads in the next one for future testing:

void match(Token expected)
{
  if(currToken == expected)
    currToken = scanner();   // read the next token
  else
    error();
}


• Special '|' Body case. If Body is
a1 B1 | a2 B2 . . . | an Bn     // ai's are terminals

then it becomes:

if (currToken == a1) {
  match(a1); translate<B1> ;
}
else if (currToken == a2) {
  match(a2); translate<B2> ;
}
   :
else if (currToken == an) {
  match(an); translate<Bn> ;
}
else
  error();

– a1, a2, ..., an must be different


2.2. Example Translation

void S()    // S => a B | b C
{
  if (currToken == a) {
    match(a); B();
  }
  else if (currToken == b) {
    match(b); C();
  }
  else
    error();
}

void B()    // B => b b C
{ match(b); match(b); C(); }

void C()    // C => c c
{ match(c); match(c); }

And main(), nextToken(), match(), and error().


Parsing "abbcc"

Input: a b b c c

[Figure: the parse tree. S has the children a and B; B has the children b, b and C; C has the children c and c.]

Function calls:
main() --> S() --> match(a); B() --> match(b); match(b); C() --> match(c); match(c)
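To try the example translation without a real lexical analyzer, the functions can be run over a plain string. This is a minimal sketch with assumptions of my own: every character of the string is one token, `'\0'` marks the end of input, and error() sets a flag instead of calling exit(), so that the result can be returned by a hypothetical `parse()` function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static const char *input;   /* remaining input; each char is one token */
static char currToken;      /* '\0' means end of input */
static bool failed;         /* set by error() (assumption: no exit()) */

static void nextToken(void)
{
    currToken = *input;
    if (currToken != '\0')
        input++;
}

static void error(void) { failed = true; }

static void match(char expected)
{
    if (currToken == expected)
        nextToken();
    else
        error();
}

static void C(void) { match('c'); match('c'); }              /* C => c c   */
static void B(void) { match('b'); match('b'); C(); }         /* B => b b C */

static void S(void)                                          /* S => a B | b C */
{
    if (currToken == 'a') { match('a'); B(); }
    else if (currToken == 'b') { match('b'); C(); }
    else error();
}

/* Returns true if text is a sentence of the grammar. */
bool parse(const char *text)
{
    assert(text != NULL);
    input = text;
    failed = false;
    nextToken();        /* read the first token */
    S();                /* call the start symbol's function */
    return !failed && currToken == '\0';
}
```

`parse("abbcc")` makes the same sequence of calls as shown above; the final check on `currToken` rejects input that has tokens left over after S() returns.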


2.3. When can we use Recursive Descent?

• A fast/efficient recursive descent parser can be generated for an LL(1) grammar.

• So we must first check if the grammar is LL(1).
– the check will generate information that can be used in constructing the parser
– e.g. FIRST_SEQ<...>


Dealing with "if"

• A tricky part of LL(1) is making sure that branches can be coded
– each branch must start differently so it's easy (and also fast) to decide which branch to use based only on the current input token (currToken value)


• e.g.
– A --> a B1
  A --> b B2
– is okay since the two branches start differently (a and b)

– A --> a B1
  A --> a B2
– not okay since both branches start the same way

[Figure: the input "a .. .. .. ..", with currToken pointing at the a]


• In non-mathematical words, a grammar is LL(1) if the choice between productions can be made by looking only at the start of the production bodies and the current input token (currToken).


Is a Grammar LL(1)?

• In maths: for every non-terminal in the language (e.g. A, B, C), generate the PREDICT set for all the productions (α1, β1, γ1, ... stand for the production bodies):

PREDICT( A => α1 )   PREDICT( A => α2 )   PREDICT( A => α3 )

PREDICT( B => β1 )   PREDICT( B => β2 )

PREDICT( C => γ1 )   ...


• Take the intersection of all pairs of sets for A:
PREDICT( A => α1 ) ∩ PREDICT( A => α2 )
PREDICT( A => α1 ) ∩ PREDICT( A => α3 )
PREDICT( A => α2 ) ∩ PREDICT( A => α3 )

– the intersection of every pair must be empty (disjoint)


• Repeat for all the sets for B, C, etc.:
– B --> β1    B --> β2
– C --> γ1    C --> γ2    C --> γ3

• If every pair of PREDICT sets for the same non-terminal is disjoint then the grammar is LL(1).


• If there's only one PREDICT set for a non-terminal (e.g. D --> d1), then it's automatically disjoint.


Calculating PREDICT

• PREDICT(A => α) = (FIRST_SEQ(α) – {ε}) ∪ FOLLOW(A)     if ε is in FIRST_SEQ(α)
  or
  PREDICT(A => α) = FIRST_SEQ(α)                          if ε is not in FIRST_SEQ(α)

• FIRST_SEQ() and FOLLOW() are the set functions I described in chapter 4.


Short Example 1

• S => a S | a

• Production     Predict
– S => a S       {a}
– S => a         {a}

• PREDICT sets for S: {a} ∩ {a} = {a}
– not disjoint
– the grammar is not LL(1)


Short Example 2

• S => a S | b

• Production     Predict
– S => a S       {a}
– S => b         {b}

• PREDICT sets for S: {a} ∩ {b} = {}
– disjoint
– the grammar is LL(1)


Larger Example

• Is this grammar LL(1)?
E => T E1
E1 => + T E1 | ε
T => F T1
T1 => * F T1 | ε
F => id | '(' E ')'

FIRST(F) = {(,id}
FIRST(T) = {(,id}
FIRST(E) = {(,id}
FIRST(T1) = {*,ε}
FIRST(E1) = {+,ε}

FOLLOW(E) = {$,)}
FOLLOW(E1) = {$,)}
FOLLOW(T) = {+,$,)}
FOLLOW(T1) = {+,$,)}
FOLLOW(F) = {*,+,$,)}


Production          Predict
E => T E1           = FIRST(T) = {(,id}
E1 => + T E1        {+}
E1 => ε             = FOLLOW(E1) = {$,)}
T => F T1           = FIRST(F) = {(,id}
T1 => * F T1        {*}
T1 => ε             = FOLLOW(T1) = {+,$,)}
F => id             {id}
F => ( E )          {(}



• Are the PREDICT sets disjoint for all the non-terminals?
– PREDICT(E): {(,id}                yes
– PREDICT(E1): {+} ∩ {$,)}          yes
– PREDICT(T): {(,id}                yes
– PREDICT(T1): {*} ∩ {+,$,)}        yes
– PREDICT(F): {id} ∩ {(}            yes

• All disjoint, so the grammar is LL(1).
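The PREDICT calculation and the disjointness test are easy to code if each token set is stored as a bit mask. The sketch below is only an illustration: the `TokenSet` type, the `T_...` constants and the `predict()` and `disjoint()` functions are names I have assumed, and the FIRST_SEQ and FOLLOW sets must still be worked out by hand (or by other code) and passed in.

```c
#include <assert.h>

typedef unsigned TokenSet;      /* one bit per token (assumed encoding) */

enum {
    T_PLUS   = 1 << 0,
    T_STAR   = 1 << 1,
    T_LPAREN = 1 << 2,
    T_RPAREN = 1 << 3,
    T_ID     = 1 << 4,
    T_END    = 1 << 5,          /* the $ symbol */
    T_EPS    = 1 << 6           /* stands for ε */
};

/* PREDICT(A => α), given FIRST_SEQ(α) and FOLLOW(A). */
TokenSet predict(TokenSet firstSeq, TokenSet follow)
{
    assert((follow & T_EPS) == 0);          /* FOLLOW never contains ε */
    if (firstSeq & T_EPS)                   /* ε is in FIRST_SEQ(α) */
        return (firstSeq & ~(TokenSet)T_EPS) | follow;
    return firstSeq;                        /* ε is not in FIRST_SEQ(α) */
}

/* Non-zero if the two sets have an empty intersection. */
int disjoint(TokenSet a, TokenSet b)
{
    return (a & b) == 0;
}
```

For E1, `predict(T_PLUS, T_END | T_RPAREN)` gives {+} and `predict(T_EPS, T_END | T_RPAREN)` gives {$,)}, and `disjoint()` on the two results is true, which matches the table entry for E1.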


2.4. Extended Translation Rules

• These extra rules allow a production body to use *, [], or ε.

• S => Body
becomes
void S()
{ translate< Body > }

– same as before


• If Body is
B1 | B2 . . . | Bn | ε

then it becomes:

if (currToken in FIRST_SEQ(B1))
  translate<B1> ;
else if (currToken in FIRST_SEQ(B2))
  translate<B2> ;
   :
else if (currToken in FIRST_SEQ(Bn))
  translate<Bn> ;
else
  error();     // optional part: include it only if there's no ε part in the grammar


• Rule []-1. If Body is
[ B1 B2 . . . Bn ]

then it becomes:

if (currToken in FIRST_SEQ(B1)) {
  translate<B1> ;
  translate<B2> ;
     :
  translate<Bn> ;
}

– [ B1 B2 ... Bn ] is the same as ( B1 B2 ... Bn ) | ε


• Rule []-2: a variant [] translation. If the body is
[ B1 B2 . . . Bn ] C

then it can become:

if (currToken not in FIRST_SEQ(C)) {
  translate<B1> ;
  translate<B2> ;
     :
  translate<Bn> ;
}
translate<C> ;

– testing against FIRST_SEQ(C) may be simpler code than testing against FIRST_SEQ(B1)


• Rule []-3: another variant [] translation. If the grammar rule is

A => [ B1 B2 . . . Bn ]

then it becomes:

void A()
{
  if (currToken not in FOLLOW(A)) {
    translate<B1> ;
    translate<B2> ;
       :
    translate<Bn> ;
  }
}

– testing against FOLLOW(A) may be simpler code than testing against FIRST_SEQ(B1)


• Rule *-1. If Body is
( B1 B2 . . . Bn )*

then it becomes:

while (currToken in FIRST_SEQ(B1)) {
  translate<B1> ;
  translate<B2> ;
     :
  translate<Bn> ;
}


• Rule *-2: a variant * translation. If the body is
( B1 B2 . . . Bn )* C

then it becomes:

while (currToken not in FIRST_SEQ(C)) {
  translate<B1> ;
  translate<B2> ;
     :
  translate<Bn> ;
}
translate<C> ;

– testing against FIRST_SEQ(C) may be simpler code than testing against FIRST_SEQ(B1)


• Rule *-3: another variant * translation. If the grammar rule is

A => ( B1 B2 . . . Bn )*

then it becomes:

void A()
{
  while (currToken not in FOLLOW(A)) {
    translate<B1> ;
    translate<B2> ;
       :
    translate<Bn> ;
  }
}

– testing against FOLLOW(A) may be simpler code than testing against FIRST_SEQ(B1)


• match() is slightly changed to deal with the end of input symbol, $:

void match(Token expected)
{
  if(currToken == expected) {
    if (currToken != $)       // don't read past the end of input
      currToken = scanner();
  }
  else
    error();
}


Translation Example 1

• The LL(1) Grammar:
E => T E1
E1 => [ '+' T E1 ]
T => F T1
T1 => [ '*' F T1 ]
F => id | '(' E ')'

– This is the same grammar as on slides 34-36, so we know it's LL(1).


Generated Parser

void E()     // E => T E1
{ T(); E1(); }

void E1()    // E1 => [ '+' T E1 ]; uses rule []-1
{
  if (currToken == '+') {    // C code for "currToken in FIRST_SEQ(+)"
    match('+'); T(); E1();
  }
}


void T()     // T => F T1
{ F(); T1(); }

void T1()    // T1 => [ '*' F T1 ]; uses rule []-1
{
  if (currToken == '*') {    // C code for "currToken in FIRST_SEQ(*)"
    match('*'); F(); T1();
  }
}


void F()     // F => id | '(' E ')'
{
  if (currToken == ID)
    match(ID);
  else if (currToken == '(') {
    match('('); E(); match(')');
  }
  else
    error();
}


Parsing "a + b * c"

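Input: a + b * c

[Figure: the parse tree. E expands to T E1. The first T expands to F T1, where F derives id (a). E1 expands to + T E1. The second T expands to F T1, where F derives id (b) and T1 expands to * F T1, with that F deriving id (c).]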


Optimizations

• It's possible to combine grammar rules and/or parse functions, in order to simplify the compiler.

• For example, we can combine:
– E and E1
– T and T1


Translation Example 2

• The previous LL(1) grammar can be expressed using *:
E => T ( '+' T )*
T => F ( '*' F )*
F => id | '(' E ')'     (same as before)


Generated Parser

void E()     // E => T ('+' T)*
{
  T();
  while (currToken == '+') {    // rule *-1
    match('+'); T();
  }
}

void T()     // T => F ('*' F)*
{
  F();
  while (currToken == '*') {    // rule *-1
    match('*'); F();
  }
}


void F()     // F => id | '(' E ')'; same as before
{
  if (currToken == ID)
    match(ID);
  else if (currToken == '(') {
    match('('); E(); match(')');
  }
  else
    error();
}


Parsing "a + b * c" Again

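[Figure: the parse tree for a + b * c using the * grammar. E has the children T, + and T. The first T derives F and then id (a). The second T has the children F, * and F, which derive id (b) and id (c). The "+ T" part is done inside the E() loop; the "* F" part is done inside the T() loop.]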
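The loop-based parser can be tried out on strings such as "a + b * c" with a very small stand-in for the lexical analyzer. This is a sketch under my own assumptions: any lowercase letter is an id (reported as the token 'i'), spaces are skipped, `'\0'` plays the role of $, and errors set a `failed` flag instead of calling exit(). The function name `parseExpr` is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

static const char *src;     /* remaining input characters */
static char currToken;      /* 'i' = id, '\0' = end of input, else the char */
static bool failed;

static void nextToken(void)     /* tiny stand-in for scanner() */
{
    while (*src == ' ')
        src++;
    if (*src == '\0') { currToken = '\0'; return; }
    if (islower((unsigned char)*src)) { currToken = 'i'; src++; }
    else currToken = *src++;
}

static void match(char expected)
{
    if (currToken == expected) {
        if (currToken != '\0')  /* don't read past the end of input */
            nextToken();
    }
    else
        failed = true;
}

static void E(void);

static void F(void)             /* F => id | '(' E ')' */
{
    if (currToken == 'i') match('i');
    else if (currToken == '(') { match('('); E(); match(')'); }
    else failed = true;
}

static void T(void)             /* T => F ('*' F)*, rule *-1 */
{
    F();
    while (currToken == '*') { match('*'); F(); }
}

static void E(void)             /* E => T ('+' T)*, rule *-1 */
{
    T();
    while (currToken == '+') { match('+'); T(); }
}

bool parseExpr(const char *text)
{
    assert(text != NULL);
    src = text;
    failed = false;
    nextToken();
    E();
    match('\0');                /* the whole input must be used */
    return !failed;
}
```

For "a + b * c", E() calls T() once before its loop and once inside it, and the second T() call handles "* c" inside its own loop, as in the figure.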


3. The Expressions Language Parser

• Is this grammar LL(1)?

Stats => ( [ Stat ] \n )*

Stat => let ID = Expr | Expr

Expr => Term ( (+ | - ) Term )*

Term => Fact ( (* | / ) Fact ) *

Fact => '(' Expr ')' | Int | ID


3.1. FIRST and FOLLOW Sets

• First(Stats) = {let, (, Int, Id, \n, ε}
• First(Stat) = {let, (, Int, Id}
• First(Expr) = {(, Int, Id}
• First(Term) = {(, Int, Id}
• First(Fact) = {(, Int, Id}

• Follow(Stats) = {$}
• Follow(Stat) = {\n}
• Follow(Expr) = {\n, )}
• Follow(Term) = {+, -, \n, )}
• Follow(Fact) = {*, /, +, -, \n, )}

– ')' is included because Fact => '(' Expr ')' puts ')' after Expr, and so after any Term or Fact that ends an Expr


3.2. PREDICT Sets

Production                             Predict                  Disjoint
Stats => ( [ Stat ] \n )*              {let,(,Int,Id,\n,$}      Yes
Stat => let ID = Expr                  {let}                    Yes
Stat => Expr                           {(,Int,Id}
Expr => Term ( (+ | - ) Term )*        {(,Int,Id}               Yes
Term => Fact ( (* | / ) Fact ) *       {(,Int,Id}               Yes
Fact => '(' Expr ')'                   {(}                      Yes
Fact => Int                            {Int}
Fact => Id                             {Id}


3.3. exprParse0.c

• exprParse0.c is a recursive descent parser generated from the expressions grammar.

• It reads in an expressions program file.

• Its output is a print-out of parse function calls.


An Expressions Program (test1.txt)

5 + 6

let x = 2

3 + ( (x*y)/2) // comments

// y

let x = 5

let y = x /0

// comments


Usage

> gcc -Wall -o exprParse0 exprParse0.c
> ./exprParse0 < test1.txt
 1: stats<
 2: stat<expr<term<fact<num(5) >>'+' term<fact<num(6) >>>>
 3: stat<'let' var(x) '=' expr<term<fact<num(2) >>>>
 4: stat<expr<term<fact<num(3) >>'+' term<fact<'(' expr<term<fact<'(' expr<term>
 5:
 6: stat<'let' var(x) '=' expr<term<fact<num(5) >>>>
 7: stat<'let' var(y) '=' expr<term<fact<var(x) >'/' fact<num(0) >>>>
 8:
 9:
10: >'eof'


exprParse0.c Callgraph

[Figure: the call graph of exprParse0.c. One group of functions is the lexical parser (like exprTokens.c); the other group is generated from the grammar.]


Standard Token Functions

// globals (first used in exprToken.c)
Token currToken;
char tokString[MAX_IDLEN];
int tokStrLen = 0;
int currTokValue;

int lineNum = 1;    // no. of lines read in

void nextToken(void)
{ currToken = scanner(); }


void match(Token expected)
{
  if(currToken == expected){
    printToken();     // produces the parser's output
    if(currToken != SCANEOF)
      currToken = scanner();
  }
  else
    printf("Expected %s, found %s on line %d\n",
           tokSyms[expected], tokSyms[currToken], lineNum);
} // end of match()


void printToken(void)
{
  if (currToken == ID)
    printf("%s(%s) ", tokSyms[currToken], tokString);     // show token string
  else if (currToken == INT)
    printf("%s(%d) ", tokSyms[currToken], currTokValue);  // show value
  else if (currToken == NEWLINE)
    printf("%s%2d: ", tokSyms[currToken], lineNum);       // print newline token
  else
    printf("'%s' ", tokSyms[currToken]);                  // other tokens
} // end of printToken()


Syntax Error Reporting

void syntax_error(Token tok)
{
  printf("\nSyntax error at \'%s\' on line %d\n", tokSyms[tok], lineNum);
  exit(1);
}


main()

int main(void)
{
  printf("%2d: ", lineNum);
  nextToken();
  statements();       // function for start symbol
  match(SCANEOF);     // check that program is finished at eof
  printf("\n\n");
  return 0;
}


Parsing Functions

void statements(void)
// Stats => ( [ Stat ] '\n' )*
{
  printf("stats<");
  while (currToken != SCANEOF) {    // rule *-3
    if (currToken != NEWLINE)       // rule []-2
      statement();
    match(NEWLINE);
  }
  printf(">");
} // end of statements()


void statement(void)
// Stat => ( 'let' ID '=' Expr ) | Expr
{
  printf("stat<");
  if (currToken == LET) {
    match(LET); match(ID); match(ASSIGNOP);
    expression();
  }
  // this test is complicated, but it can be optimized with some 'tricks'
  else if ((currToken == LPAREN) || (currToken == INT) || (currToken == ID))
    expression();
  else
    error();
  printf(">");
} // end of statement()


Version 1

void expression(void)
// Expr => Term ( ( '+' | '-' ) Term )*
{
  printf("expr<");
  term();
  while((currToken == PLUSOP) || (currToken == MINUSOP)) {    // rule *-1
    if (currToken == PLUSOP)
      match(PLUSOP);
    else if (currToken == MINUSOP)
      match(MINUSOP);
    else
      error();
    term();
  }
  printf(">");
} // end of expression()


Version 2: simplified | code

void expression(void)
// Expr => Term ( ( '+' | '-' ) Term )*
{
  printf("expr<");
  term();
  while((currToken == PLUSOP) || (currToken == MINUSOP)) {
    match(currToken);    // matches whichever operator was found
    term();
  }
  printf(">");
} // end of expression()

– Shorter, but also harder to understand!


Version 1

void term(void)
// Term => Fact ( ('*' | '/' ) Fact )*
{
  printf("term<");
  factor();
  while((currToken == MULTOP) || (currToken == DIVOP)) {    // rule *-1
    if (currToken == MULTOP)
      match(MULTOP);
    else if (currToken == DIVOP)
      match(DIVOP);
    else
      error();
    factor();
  }
  printf(">");
} // end of term()


Version 2: simplified | code

void term(void)
// Term => Fact ( ('*' | '/' ) Fact )*
{
  printf("term<");
  factor();
  while((currToken == MULTOP) || (currToken == DIVOP)) {
    match(currToken);    // matches whichever operator was found
    factor();
  }
  printf(">");
} // end of term()

– Shorter, but also harder to understand!


void factor(void)
// Fact => '(' Expr ')' | INT | ID
{
  printf("fact<");
  if (currToken == LPAREN) {
    match(LPAREN);
    expression();
    match(RPAREN);
  }
  else if (currToken == INT)
    match(INT);
  else if (currToken == ID)
    match(ID);
  else
    syntax_error(currToken);
  printf(">");
} // end of factor()


4. LL(1) Parse Tables

• The format of a parse table:
  – T[non-term][term]
  – rows are indexed by non-terminals (e.g. A), columns by terminals (e.g. b)
  – the entry T[A][b] holds a production A => α with b ∈ PREDICT(A => α)


Other Data Structures

• Sequence of input tokens (ending with $).
• A parse stack to hold nonterminals and terminals that are being processed.
  – the stack starts as $ E (E on top); symbols are added with push and removed with pop


The Parsing Algorithm

push($);
push(start_symbol);
currToken = scanner();
do {
  X = pop(stack);
  if (X is a terminal or $) {          // like match()
    if (X == currToken)
      currToken = scanner();
    else
      error();
  }
  else                                 // X is a non-terminal
    if (T[X][currToken] == X => Y1 Y2 ... Ym) {
      push(Ym); ... push(Y1);          // push the body in reverse, so Y1 ends up on top
    }
    else
      error();
} while (X != $);


4.1. Table Parsing Example

• Use the LL(1) grammar:
    E  => T E1
    E1 => '+' T E1 | ε
    T  => F T1
    T1 => '*' F T1 | ε
    F  => id | '(' E ')'


Parse Table Generation

Production           Predict
1: E  => T E1        {(,id}
2: E1 => + T E1      {+}
3: E1 => ε           {$,)}
4: T  => F T1        {(,id}
5: T1 => * F T1      {*}
6: T1 => ε           {+,$,)}
7: F  => id          {id}
8: F  => ( E )       {(}

Each production number is entered in its non-terminal's row, under every terminal in its PREDICT set (- marks an error entry):

NT/T   +    *    (    )    ID   $
E      -    -    1    -    1    -
E1     2    -    -    3    -    3
T      -    -    4    -    4    -
T1     6    5    -    6    -    6
F      -    -    8    -    7    -


Parsing "a + b * c $"

The top of the stack is on the right.

Stack          Input     Action
$E             a+b*c$    E => T E1
$E1 T          a+b*c$    T => F T1
$E1 T1 F       a+b*c$    F => id
$E1 T1 id      a+b*c$    match
$E1 T1         +b*c$     T1 => ε
$E1            +b*c$     E1 => + T E1
$E1 T +        +b*c$     match
$E1 T          b*c$      T => F T1
$E1 T1 F       b*c$      F => id
$E1 T1 id      b*c$      match
$E1 T1         *c$       T1 => * F T1
$E1 T1 F *     *c$       match
$E1 T1 F       c$        F => id
$E1 T1 id      c$        match
$E1 T1         $         T1 => ε
$E1            $         E1 => ε
$              $         Success!
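The parsing algorithm and the numbered parse table can be combined into a small driver. What follows is only a sketch in C, not the course's own code: the names ll1_parse, rhs and table, and the string-based scanner(), are assumptions made for illustration. Table entries are the production numbers 1 to 8 listed under Parse Table Generation, with 0 standing for an error entry, and any alphabetic word is treated as an id.

```c
#include <ctype.h>

/* Terminals first (END plays the role of $), then non-terminals. */
enum { PLUS, STAR, LPAR, RPAR, ID, END, E, E1, T, T1, F };
#define NUM_TERMS 6
#define IS_TERM(x) ((x) < NUM_TERMS)
#define STACK_MAX 256

/* Bodies of productions 1..8, each ended by -1 (index 0 is unused). */
static const int rhs[9][4] = {
    {-1},
    {T, E1, -1},          /* 1: E  => T E1    */
    {PLUS, T, E1, -1},    /* 2: E1 => + T E1  */
    {-1},                 /* 3: E1 => epsilon */
    {F, T1, -1},          /* 4: T  => F T1    */
    {STAR, F, T1, -1},    /* 5: T1 => * F T1  */
    {-1},                 /* 6: T1 => epsilon */
    {ID, -1},             /* 7: F  => id      */
    {LPAR, E, RPAR, -1},  /* 8: F  => ( E )   */
};

/* Rows E, E1, T, T1, F; columns + * ( ) id $; 0 means error. */
static const int table[5][NUM_TERMS] = {
    {0, 0, 1, 0, 1, 0},
    {2, 0, 0, 3, 0, 3},
    {0, 0, 4, 0, 4, 0},
    {6, 5, 0, 6, 0, 6},
    {0, 0, 8, 0, 7, 0},
};

static const char *src;   /* remaining input text */

/* Toy scanner (an assumption): returns the next token, or -1 on a bad character. */
static int scanner(void)
{
    while (*src == ' ') src++;
    switch (*src) {
    case '\0': return END;            /* stay on the terminator */
    case '+': src++; return PLUS;
    case '*': src++; return STAR;
    case '(': src++; return LPAR;
    case ')': src++; return RPAR;
    }
    if (isalpha((unsigned char)*src)) {
        while (isalnum((unsigned char)*src)) src++;
        return ID;
    }
    return -1;
}

/* Returns 1 if input is a sentence of the E grammar, 0 otherwise. */
int ll1_parse(const char *input)
{
    int stack[STACK_MAX], top = 0, tok;

    src = input;
    stack[top++] = END;               /* push($)            */
    stack[top++] = E;                 /* push(start_symbol) */
    tok = scanner();
    while (top > 0 && tok >= 0) {
        int X = stack[--top];         /* X = pop(stack) */
        if (IS_TERM(X)) {             /* like match() */
            if (X != tok) return 0;
            if (X == END) return 1;   /* $ matched $: success */
            tok = scanner();
        } else {
            int p = table[X - E][tok], n = 0;
            if (p == 0) return 0;     /* error entry */
            while (rhs[p][n] != -1) n++;
            if (top + n > STACK_MAX) return 0;
            while (n > 0)             /* push Ym ... Y1 */
                stack[top++] = rhs[p][--n];
        }
    }
    return 0;
}
```

Under these assumptions, ll1_parse("a + b * c") follows the same steps as the trace above and returns 1, while ll1_parse("a + * b") returns 0 because T[T][*] is an error entry.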


5. Making a Grammar LL(1)

• Not all context-free grammars are LL(1).

• We can tell if a grammar is not LL(1) by looking at its PREDICT sets
  – for an LL(1) grammar, the PREDICT sets of the productions for each non-terminal will be disjoint


Example

Production       Predict
E => E + T       = FIRST(E) = {(,id}
E => T           = FIRST(T) = {(,id}
T => T * F       = FIRST(T) = {(,id}
T => F           = FIRST(F) = {(,id}
F => id          = {id}
F => ( E )       = {(}

• FIRST(F) = {(,id}
• FIRST(T) = {(,id}
• FIRST(E) = {(,id}
• FOLLOW(E) = {$,),+}
• FOLLOW(T) = {+,$,),*}
• FOLLOW(F) = {+,$,),*}

E and T are problems since their PREDICT sets are not disjoint.
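The disjointness test can be checked mechanically. The following is a rough sketch in C under stated assumptions: PREDICT sets are stored as bit masks, and the names struct prod, predict_disjoint, left_recursive and ll1_grammar are invented for this illustration. The sets themselves are copied from the PREDICT columns in these slides.

```c
#include <stddef.h>
#include <string.h>

/* One bit per terminal, so a PREDICT set fits in an unsigned int. */
enum { TK_PLUS = 1 << 0, TK_STAR = 1 << 1, TK_LPAR = 1 << 2,
       TK_RPAR = 1 << 3, TK_ID   = 1 << 4, TK_END  = 1 << 5 };

struct prod {
    const char *lhs;      /* non-terminal on the left-hand side */
    unsigned predict;     /* PREDICT set of this production */
};

/* The left-recursive E grammar. */
static const struct prod left_recursive[] = {
    {"E", TK_LPAR | TK_ID},   /* E => E + T */
    {"E", TK_LPAR | TK_ID},   /* E => T     */
    {"T", TK_LPAR | TK_ID},   /* T => T * F */
    {"T", TK_LPAR | TK_ID},   /* T => F     */
    {"F", TK_ID},             /* F => id    */
    {"F", TK_LPAR},           /* F => ( E ) */
};

/* The LL(1) version of the E grammar. */
static const struct prod ll1_grammar[] = {
    {"E",  TK_LPAR | TK_ID},            /* E  => T E1    */
    {"E1", TK_PLUS},                    /* E1 => + T E1  */
    {"E1", TK_END | TK_RPAR},           /* E1 => epsilon */
    {"T",  TK_LPAR | TK_ID},            /* T  => F T1    */
    {"T1", TK_STAR},                    /* T1 => * F T1  */
    {"T1", TK_PLUS | TK_END | TK_RPAR}, /* T1 => epsilon */
    {"F",  TK_ID},                      /* F  => id      */
    {"F",  TK_LPAR},                    /* F  => ( E )   */
};

/* Returns 1 if no two productions of the same non-terminal
   share a terminal in their PREDICT sets, 0 otherwise. */
int predict_disjoint(const struct prod *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (strcmp(p[i].lhs, p[j].lhs) == 0 &&
                (p[i].predict & p[j].predict) != 0)
                return 0;
    return 1;
}
```

With these tables, predict_disjoint(left_recursive, 6) returns 0 (the two E productions both predict on ( and id), and predict_disjoint(ll1_grammar, 8) returns 1.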


Example of Disjoint Problem

• Input "5 + b"
• There are two productions to choose from:
    E => E + T
    E => T

• Which should be chosen by looking only at the current token "5"?


5.1. From non-LL(1) to LL(1)

• There are two main techniques for converting a non-LL(1) grammar to LL(1).
  – but they don't work for every grammar

• 1. Left Factoring
  – e.g. used on A => B a C D | B a C E

• 2. Transforming left recursion to right recursion
  – e.g. used on E => E + T | T


5.2. Left Factoring

• S => a B | a C
  – to see the problem, try choosing a production to parse "a" in "andrew"

• Change S to:
    S  => a S1
    S1 => B | C
  – now there is no difficult choice


• In general:
    A => α β1 | α β2 | … | α βn
  becomes
    A  => α A1
    A1 => β1 | β2 | … | βn


5.3. Why is Left Recursion a Problem?

• Grammar:
    A => A b
    A => b

• The input is "bbbb".
• Using only the current token, "b", which production should be used?


Remove Left Recursion

    A => A α1 | A α2 | … | β1 | β2 | …

becomes

    A  => β1 A1 | β2 A1 | …
    A1 => α1 A1 | α2 A1 | … | ε

• The left recursion is changed to right recursion in the new A1 rule.


Example Translation

• The left recursive grammar:
    A => A b | b
  becomes
    A  => b A1
    A1 => b A1 | ε

• Try parsing the input string "bbbb" using only the current token "b".
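To see that the transformed grammar can be parsed with one token of lookahead, here is a minimal recursive descent sketch in C. The function names parse_A, parse_A1 and accepts are assumptions for illustration, and each input character is treated as one token.

```c
static const char *cur;   /* current position in the input */

/* A1 => b A1 | epsilon
   The current token alone decides: 'b' selects the first
   production, anything else selects epsilon. */
static int parse_A1(void)
{
    if (*cur == 'b') {
        cur++;                /* match b */
        return parse_A1();    /* right recursion */
    }
    return 1;                 /* epsilon: consume nothing */
}

/* A => b A1 */
static int parse_A(void)
{
    if (*cur != 'b')
        return 0;             /* syntax error */
    cur++;                    /* match b */
    return parse_A1();
}

/* Returns 1 if the whole string s is derived from A, 0 otherwise. */
int accepts(const char *s)
{
    cur = s;
    return parse_A() && *cur == '\0';
}
```

With this sketch, accepts("bbbb") returns 1 and accepts("") returns 0. A function written directly from A => A b would call itself before consuming any input, and so would never terminate.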


Fixing the E Grammar

• The following E grammar is not LL(1):
    E => E + T | T
    T => T * F | F
    F => id | ( E )

• Try parsing "5 + b"


• Eliminate left recursion in E and T:
    E  => T E1
    E1 => + T E1 | ε
    T  => F T1
    T1 => * F T1 | ε
    F  => id | ( E )

• This version of the E grammar is LL(1), and we've been using it for most of our examples.


5.4. Non-Immediate Left Recursion

• Ex:
    A1 => A2 a | b
    A2 => A1 c | A2 d
  (A1 and A2 refer to each other, so the left recursion goes through both rules)

• Convert to immediate left recursion
  – replace A1 in A2 productions by A1's definition:
    A1 => A2 a | b
    A2 => A2 a c | b c | A2 d

• Now eliminate left recursion in A2:
    A1 => A2 a | b
    A2 => b c A3
    A3 => a c A3 | d A3 | ε


Example

    A => B c | d
    B => C f | B f
    C => A e | g
  (the rules form a cycle: A uses B, B uses C, C uses A)

• Replace C in B's production by C's defn:
    B => A e f | g f | B f

• Replace A in B's production by A's defn:
    B => B c e f | d e f | g f | B f


• Now grammar is:
    A => B c | d
    B => B c e f | d e f | g f | B f
    C => A e | g

• Get rid of left recursion in B:
    A  => B c | d
    B  => d e f B1 | g f B1
    B1 => c e f B1 | f B1 | ε
    C  => A e | g

If A is the start symbol, then the C production is never called, so it can be deleted.


6. Error Recovery in LL Parsing

• Simple answer:
  – when there's an error, print a message and exit

• Better error recovery:
  – 1. insert the expected token and continue
      • this approach can cause non-termination
  – 2. keep deleting tokens until the parser gets a token in the FOLLOW set for the production that went wrong
      • see example on next slide


Example: E → T E1 from slide 29

void E()
{
  if (currToken in FIRST(T)) {            // error checking
    T(); E1();                            // FIRST(T) == {(,ID}
  }
  else {                                  // error reporting and recovery
    printf("Expecting one of FIRST(T)");
    while (currToken not in FOLLOW(E))    // FOLLOW(E) == {),$}
      currToken = scanner();              // skip input
  }
} // end of E()


C Code

void E()
{
  if ((currToken == LPAREN) || (currToken == ID)) {
    T(); E1();
  }
  else {
    printf("Expecting ( or id");
    while ((currToken != RPAREN) && (currToken != SCANEOF))
      currToken = scanner();
  }
} // end of E()
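The token-skipping loop can be tried on its own. This is a small sketch, assuming the tokens are already held in an array that ends with SCANEOF. The helper skip_to_follow_E and the sample array bad_input are hypothetical names, not part of the parser above.

```c
#include <stddef.h>

enum token { LPAREN, RPAREN, ID, PLUSOP, SCANEOF };

/* Sample erroneous input (an assumption): + + ) id $ */
static const enum token bad_input[] = { PLUSOP, PLUSOP, RPAREN, ID, SCANEOF };

/* Starting at toks[pos], skip tokens until one in
   FOLLOW(E) = { RPAREN, SCANEOF } is reached, and return its index.
   The array must end with SCANEOF so that the loop always stops. */
size_t skip_to_follow_E(const enum token *toks, size_t pos)
{
    while (toks[pos] != RPAREN && toks[pos] != SCANEOF)
        pos++;                /* delete (skip) this token */
    return pos;
}
```

Here skip_to_follow_E(bad_input, 0) returns 2, the index of the ')' token: the two stray '+' tokens are discarded and parsing can resume with a token that may legally follow E.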