compiler construction in4020 – lecture 3

48
Compiler construction in4020 – lecture 3 Koen Langendoen Delft University of Technology The Netherlands

Upload: carr

Post on 14-Jan-2016

43 views

Category:

Documents


2 download

DESCRIPTION

Compiler construction in4020 – lecture 3. Koen Langendoen Delft University of Technology The Netherlands. program text. token description. scanner generator. lexical analysis. tokens. syntax analysis. AST. context handling. annotated AST. Summary of lecture 2. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Compiler construction in4020 –  lecture 3

Compiler constructionin4020 – lecture 3

Koen Langendoen

Delft University of TechnologyThe Netherlands

Page 2: Compiler construction in4020 –  lecture 3

Summary of lecture 2

• lexical analyzer generator• description FSA

• FSA construction• dotted items

• character moves

• moves

program text

lexical analysis

syntax analysis

context handling

annotated AST

tokens

AST

scanner

generator

token

description

Page 3: Compiler construction in4020 –  lecture 3

Quiz

2.24 Will the function identify() (to access a symbol table) still work if the hash function maps all identifiers onto the same number?

2.26 Tutor X insists that macro processing must be implemented as a separate phase between reading the program and the lexical analysis. Show mr. X wrong.

Page 4: Compiler construction in4020 –  lecture 3

• syntax analysis: tokens AST

• AST construction• by hand: recursive descent

• automatic: top-down (LLgen), bottom-up (yacc)

Overview

program text

lexical analysis

syntax analysis

context handling

annotated AST

tokens

AST

parser

generator

language

grammar

Page 5: Compiler construction in4020 –  lecture 3

Syntax analysis

• parsing: given a context-free grammar and a stream of tokens, find the derivation that ties them together.

• result: parse tree

‘b’

identifier

expression

term

factor

term

‘b’

factor

identifier

‘4’

constant

term

factor

term

‘a’

factor

identifier

term

factor

identifier

expression

‘c’‘*’

‘-’

‘*’

‘*’

syn-tax: the way in which words are put together to form phrases, clauses, or sentences.

Webster’s Dictionary

Page 6: Compiler construction in4020 –  lecture 3

Syntax analysis

• parsing: given a context-free grammar and a stream of tokens, find the derivation that ties them together.

• result: parse tree

‘b’

identifier

expression

term

factor

term

‘b’

factor

identifier

‘4’

constant

term

factor

term

‘a’

factor

identifier

term

factor

identifier

expression

‘c’‘*’

‘-’

‘*’

‘*’

Page 7: Compiler construction in4020 –  lecture 3

Context free grammar

• G = (VN, VT, S, P)

• VN : set of Non-terminal symbols

• VT : set of Terminal symbols

• S : Start symbol (S VN)

• P : set of Production rules {N }

• VN VT =

• P = {N | N VN (VN VT)*}

Page 8: Compiler construction in4020 –  lecture 3

Top-down parsing

• process tokens left to right

• expression grammarinput expression EOFexpression term rest_expressionterm IDENTIFIER | ‘(’ expression ‘)’rest_expression ‘+’ expression |

• example expression

input

expression EOF

term rest_expression

IDENTIFIER

aap + ( noot + mies )

Page 9: Compiler construction in4020 –  lecture 3

rest_expression

expression

rest_exprterm

Bottom-up parsing

• process tokens left to right

• expression grammarinput expression EOFexpression term rest_expressionterm IDENTIFIER | ‘(’ expression ‘)’rest_expression ‘+’ expression |

IDENT

• • •

IDENT

• •

IDENT

• •aap + ( noot + mies )

Page 10: Compiler construction in4020 –  lecture 3

Comparison

top-down bottom-up

node creation pre-order post-order

alternative selection first token last token

grammar type restricted

LL(1)

relaxed

LR(1)

implementation manual + automatic

automatic

5

1 4

2 3

1

2 3

4 5

Page 11: Compiler construction in4020 –  lecture 3

Recursive descent parsing

• each rule N translates to a boolean function• return true if a terminal production of N was matched

• return false otherwise (without consuming any token)

• try alternatives of N in turn

• a terminal symbol must match the current token

• a non-terminal is matched by calling its routine

input expression EOF

int input(void) {

return expression() && require(token(EOF));

}

Page 12: Compiler construction in4020 –  lecture 3

Recursive descent parsing

expression term rest_expression

int expression(void) {

return term() && require(rest_expression());

}

term IDENTIFIER | ‘(’ expression ‘)’

int term(void) {

return token(IDENTIFIER) ||

token('(') && require(expression()) && require(token(')'));

}

Page 13: Compiler construction in4020 –  lecture 3

Recursive descent parsing

auxiliary functions• consume matched tokens

• report syntax errors

int token(int tk) {

if (Token.class != tk) return 0;

get_next_token(); return 1;

}

int require(int found) {

if (!found) error();

return 1;

}

rest_expression ‘+’ expression |

int rest_expression(void) {

return token('+') && require(expression()) || 1;

}

Page 14: Compiler construction in4020 –  lecture 3

Automatic top-down parsing

• follow recursive descent scheme, but avoid interpretation overhead

• for each rule and alternative determine the tokens it can start with: FIRST set

• parsing scheme for rule N A1 | A2 | …• if token FIRST(N) then ERROR

• if token FIRST(A1) then parse A1

• if token FIRST(A2) then parse A2

• …

Page 15: Compiler construction in4020 –  lecture 3

Exercise (7 min.)

• design an algorithm to compute the FIRST sets of all non-terminals in a context free grammar.

• hint: consider the types of rules• alternatives

• composition

• empty productions

input expression EOFexpression term rest_expressionterm IDENTIFIER | ‘(’ expression ‘)’rest_expression ‘+’ expression |

Page 16: Compiler construction in4020 –  lecture 3

Answers

Page 17: Compiler construction in4020 –  lecture 3

Answers (Fig 2.58, page 122)

• N w (w VT )

FIRST(N) = {w}

• N A1 | A2 | …

FIRST(N) = FIRST(Ai )

• N FIRST(N) = {}

• N A FIRST(N) = FIRST(A ) , FIRST(A)

FIRST(N) = FIRST(A ) \ {} FIRST() , otherwise

closure algorithm

Page 18: Compiler construction in4020 –  lecture 3

Break

Page 19: Compiler construction in4020 –  lecture 3

Predictive parsing

• similar to recursive descent, but no back-tracking

• functions “know” what they are doing

input expression EOF FIRST(expression) = {IDENT, ‘(‘}

void input(void) { switch (Token.class) { case IDENT: case '(': expression(); token(EOF); break; default: error();

}

}

void token(int tk) {

if (Token.class != tk) error();

get_next_token();

}

Page 20: Compiler construction in4020 –  lecture 3

Predictive parsing

expression term rest_expression FIRST(term) = {IDENT, ‘(‘}

void expression(void) { switch (Token.class) { case IDENT: case '(': term(); rest_expression(); break; default: error();

}

}

term IDENTIFIER | ‘(’ expression ‘)’void term(void) { switch (Token.class) { case IDENT: token(IDENT); break; case '(': token('('); expression(); token(')'); break; default: error();

}

}

Page 21: Compiler construction in4020 –  lecture 3

Predictive parsing

• FIRST() = {}• check nothing?

• NO: token FOLLOW(rest_expr)

rest_expression ‘+’ expression | FIRST(rest_expr) = {‘+’, }void rest_expression(void) { switch (Token.class) { case '+': token('+'); expression(); break; case EOF: case ')': break; default: error();

}

}

FOLLOW(rest_expr) = {EOF, ‘)’}

Page 22: Compiler construction in4020 –  lecture 3

Limitations of LL(1) parsers

• FIRST/FIRST conflictterm IDENTIFIER | IDENTIFIER ‘[‘ expression ‘]’ | ‘(’ expression ‘)’

• FIRST/FOLLOW conflictS A ‘a’ ‘b’

A ‘a’ |

• left recursionexpression expression ‘-’ term | term

FIRST(A) = { ‘a’ } = FOLLOW(A)

Page 23: Compiler construction in4020 –  lecture 3

Making grammars LL(1)

• manual labour• rewrite grammar

• adjust semantic actions

• three rewrite methods• left factoring

• substitution

• left-recursion removal

Page 24: Compiler construction in4020 –  lecture 3

Left factoring

term IDENTIFIER

| IDENTIFIER ‘[‘ expression ‘]’

• factor out common prefix

term IDENTIFIER after_identifier

after_identifier | ‘[‘ expression ‘]’

‘[’ FOLLOW(after_identifier)

Page 25: Compiler construction in4020 –  lecture 3

Substitution

S A ‘a’ ‘b’

A ‘a’ |

• replace non-terminal by its alternative

S ‘a’ ‘a’ ‘b’ | ‘a’ ‘b’

Page 26: Compiler construction in4020 –  lecture 3

Left-recursion removal

N N |

• replace by

N M

M M |

• example

expression expression ‘-’ term | term

...

expression term tail

tail ‘-’ term tail |

N

Page 27: Compiler construction in4020 –  lecture 3

Exercise (7 min.)

• make the following grammar LL(1)

expression expression ‘+’ term | expression ‘-’ term | termterm term ‘*’ factor | term ‘/’ factor | factorfactor ‘(‘ expression ‘)’ | func-call | identifier | constantfunc-call identifier ‘(‘ expr-list? ‘)’expr-list expression (‘,’ expression)*

• and what about

S if E then S (else S)?

Page 28: Compiler construction in4020 –  lecture 3

Answers

Page 29: Compiler construction in4020 –  lecture 3

Answers

• substitutionF ‘(‘ E ‘)’ | ID ‘(‘ expr-list? ‘)’ | ID | constant

• left factoringE E ( ‘+’ | ‘-’ ) T | TT T ( ‘*’ | ‘/’ ) F | FF ‘(‘ E ‘)’ | ID ( ‘(‘ expr-list? ‘)’ )? | constant

• left recursion removalE T (( ‘+’ | ‘-’ ) T )*T F (( ‘*’ | ‘/’ ) F )*

• if-then-else grammar is ambiguous

Page 30: Compiler construction in4020 –  lecture 3

program text

lexical analysis

syntax analysis

context handling

annotated AST

tokens

AST

parser

generator

language

grammar

automatic generation

LL(1) push-down

automaton

Page 31: Compiler construction in4020 –  lecture 3

LL(1) push-down automaton

• stack right-hand side of production

transition tablestate

(top of stack)

look-ahead token

IDENT + ( ) EOF

input expression EOF expression EOF

expression term rest-expr term rest-expr

term IDENT ( expression )

rest-expr + expression

Page 32: Compiler construction in4020 –  lecture 3

LL(1) push-down automaton

aap + ( noot + mies ) EOF

input

input

prediction stack

state

(top of stack)

look-ahead token

IDENT + ( ) EOF

input expression EOF expression EOF

expression term rest-expr term rest-expr

term IDENT ( expression )

rest-expr + expression

Page 33: Compiler construction in4020 –  lecture 3

LL(1) push-down automaton

aap + ( noot + mies ) EOF

input

input

prediction stack

state

(top of stack)

look-ahead token

IDENT + ( ) EOF

input expression EOF expression EOF

expression term rest-expr term rest-expr

term IDENT ( expression )

rest-expr + expression

replace non-terminal by transition entry

Page 34: Compiler construction in4020 –  lecture 3

LL(1) push-down automaton

aap + ( noot + mies ) EOF

expression EOF

input

prediction stack

state

(top of stack)

look-ahead token

IDENT + ( ) EOF

input expression EOF expression EOF

expression term rest-expr term rest-expr

term IDENT ( expression )

rest-expr + expression

Page 35: Compiler construction in4020 –  lecture 3

LL(1) push-down automaton

aap + ( noot + mies ) EOF

expression EOF

input

prediction stack

state

(top of stack)

look-ahead token

IDENT + ( ) EOF

input expression EOF expression EOF

expression term rest-expr term rest-expr

term IDENT ( expression )

rest-expr + expression

replace non-terminal by transition entry

Page 36: Compiler construction in4020 –  lecture 3

LL(1) push-down automaton

aap + ( noot + mies ) EOF

term rest-expr EOF

input

prediction stack

state

(top of stack)

look-ahead token

IDENT + ( ) EOF

input expression EOF expression EOF

expression term rest-expr term rest-expr

term IDENT ( expression )

rest-expr + expression

Page 37: Compiler construction in4020 –  lecture 3

LL(1) push-down automaton

aap + ( noot + mies ) EOF

term rest-expr EOF

input

prediction stack

state

(top of stack)

look-ahead token

IDENT + ( ) EOF

input expression EOF expression EOF

expression term rest-expr term rest-expr

term IDENT ( expression )

rest-expr + expression

replace non-terminal by transition entry

Page 38: Compiler construction in4020 –  lecture 3

LL(1) push-down automaton

aap + ( noot + mies ) EOF

IDENT rest-expr EOF

input

prediction stack

state

(top of stack)

look-ahead token

IDENT + ( ) EOF

input expression EOF expression EOF

expression term rest-expr term rest-expr

term IDENT ( expression )

rest-expr + expression

Page 39: Compiler construction in4020 –  lecture 3

LL(1) push-down automaton

aap + ( noot + mies ) EOF

IDENT rest-expr EOF

input

prediction stack

state

(top of stack)

look-ahead token

IDENT + ( ) EOF

input expression EOF expression EOF

expression term rest-expr term rest-expr

term IDENT ( expression )

rest-expr + expression

pop matching token

Page 40: Compiler construction in4020 –  lecture 3

LL(1) push-down automaton

+ ( noot + mies ) EOF

rest-expr EOF

input

prediction stack

state

(top of stack)

look-ahead token

IDENT + ( ) EOF

input expression EOF expression EOF

expression term rest-expr term rest-expr

term IDENT ( expression )

rest-expr + expression

Page 41: Compiler construction in4020 –  lecture 3

LL(1) push-down automaton

+ ( noot + mies ) EOF

rest-expr EOF

input

prediction stack

state

(top of stack)

look-ahead token

IDENT + ( ) EOF

input expression EOF expression EOF

expression term rest-expr term rest-expr

term IDENT ( expression )

rest-expr + expression

Page 42: Compiler construction in4020 –  lecture 3

LL(1) push-down automaton

+ ( noot + mies ) EOF

+ expression EOF

input

prediction stack

state

(top of stack)

look-ahead token

IDENT + ( ) EOF

input expression EOF expression EOF

expression term rest-expr term rest-expr

term IDENT ( expression )

rest-expr + expression

Page 43: Compiler construction in4020 –  lecture 3

LL(1) push-down automaton

+ ( noot + mies ) EOFinput

prediction stack

state

(top of stack)

look-ahead token

IDENT + ( ) EOF

input expression EOF expression EOF

expression term rest-expr term rest-expr

term IDENT ( expression )

rest-expr + expression

+ expression EOF

Page 44: Compiler construction in4020 –  lecture 3

LL(1) push-down automaton

( noot + mies ) EOFinput

prediction stack

state

(top of stack)

look-ahead token

IDENT + ( ) EOF

input expression EOF expression EOF

expression term rest-expr term rest-expr

term IDENT ( expression )

rest-expr + expression

expression EOF

Page 45: Compiler construction in4020 –  lecture 3

LL(1) push-down automaton

( noot + mies ) EOFinput

prediction stack

state

(top of stack)

look-ahead token

IDENT + ( ) EOF

input expression EOF expression EOF

expression term rest-expr term rest-expr

term IDENT ( expression )

rest-expr + expression

expression EOF

Page 46: Compiler construction in4020 –  lecture 3

LLgen

• top-down parser generator

• to be used in assignment #1

• discussed in lecture 5

Page 47: Compiler construction in4020 –  lecture 3

• syntax analysis: tokens AST

• top-down parsing• recursive descent• push-down automaton• making grammars LL(1)

Summary

program text

lexical analysis

syntax analysis

context handling

annotated AST

tokens

AST

parser

generator

language

grammar

Page 48: Compiler construction in4020 –  lecture 3

Homework

• study sections:• 1.10 closure algorithm

• 2.2.4.6 error handling in LL(1) parsers

• print handout for next week [blackboard]

• find a partner for the “practicum”

• register your group• send e-mail to [email protected]