compiler construction in4020 – lecture 3
Post on 14-Jan-2016
43 Views
Preview:
DESCRIPTION
TRANSCRIPT
Compiler constructionin4020 – lecture 3
Koen Langendoen
Delft University of TechnologyThe Netherlands
Summary of lecture 2
• lexical analyzer generator• description FSA
• FSA construction• dotted items
• character moves
• moves
program text
lexical analysis
syntax analysis
context handling
annotated AST
tokens
AST
scanner
generator
token
description
Quiz
2.24 Will the function identify() (to access a symbol table) still work if the hash function maps all identifiers onto the same number?
2.26 Tutor X insists that macro processing must be implemented as a separate phase between reading the program and the lexical analysis. Show mr. X wrong.
• syntax analysis: tokens AST
• AST construction• by hand: recursive descent
• automatic: top-down (LLgen), bottom-up (yacc)
Overview
program text
lexical analysis
syntax analysis
context handling
annotated AST
tokens
AST
parser
generator
language
grammar
Syntax analysis
• parsing: given a context-free grammar and a stream of tokens, find the derivation that ties them together.
• result: parse tree
‘b’
identifier
expression
term
factor
term
‘b’
factor
identifier
‘4’
constant
term
factor
term
‘a’
factor
identifier
term
factor
identifier
expression
‘c’‘*’
‘-’
‘*’
‘*’
syn-tax: the way in which words are put together to form phrases, clauses, or sentences.
Webster’s Dictionary
Syntax analysis
• parsing: given a context-free grammar and a stream of tokens, find the derivation that ties them together.
• result: parse tree
‘b’
identifier
expression
term
factor
term
‘b’
factor
identifier
‘4’
constant
term
factor
term
‘a’
factor
identifier
term
factor
identifier
expression
‘c’‘*’
‘-’
‘*’
‘*’
Context free grammar
• G = (VN, VT, S, P)
• VN : set of Non-terminal symbols
• VT : set of Terminal symbols
• S : Start symbol (S VN)
• P : set of Production rules {N }
• VN VT =
• P = {N | N VN (VN VT)*}
Top-down parsing
• process tokens left to right
• expression grammarinput expression EOFexpression term rest_expressionterm IDENTIFIER | ‘(’ expression ‘)’rest_expression ‘+’ expression |
• example expression
input
expression EOF
term rest_expression
IDENTIFIER
aap + ( noot + mies )
rest_expression
expression
rest_exprterm
Bottom-up parsing
• process tokens left to right
• expression grammarinput expression EOFexpression term rest_expressionterm IDENTIFIER | ‘(’ expression ‘)’rest_expression ‘+’ expression |
IDENT
• • •
IDENT
• •
IDENT
• •aap + ( noot + mies )
Comparison
top-down bottom-up
node creation pre-order post-order
alternative selection first token last token
grammar type restricted
LL(1)
relaxed
LR(1)
implementation manual + automatic
automatic
5
1 4
2 3
1
2 3
4 5
Recursive descent parsing
• each rule N translates to a boolean function• return true if a terminal production of N was matched
• return false otherwise (without consuming any token)
• try alternatives of N in turn
• a terminal symbol must match the current token
• a non-terminal is matched by calling its routine
input expression EOF
int input(void) {
return expression() && require(token(EOF));
}
Recursive descent parsing
expression term rest_expression
int expression(void) {
return term() && require(rest_expression());
}
term IDENTIFIER | ‘(’ expression ‘)’
int term(void) {
return token(IDENTIFIER) ||
token('(') && require(expression()) && require(token(')'));
}
Recursive descent parsing
auxiliary functions• consume matched tokens
• report syntax errors
int token(int tk) {
if (Token.class != tk) return 0;
get_next_token(); return 1;
}
int require(int found) {
if (!found) error();
return 1;
}
rest_expression ‘+’ expression |
int rest_expression(void) {
return token('+') && require(expression()) || 1;
}
Automatic top-down parsing
• follow recursive descent scheme, but avoid interpretation overhead
• for each rule and alternative determine the tokens it can start with: FIRST set
• parsing scheme for rule N A1 | A2 | …• if token FIRST(N) then ERROR
• if token FIRST(A1) then parse A1
• if token FIRST(A2) then parse A2
• …
Exercise (7 min.)
• design an algorithm to compute the FIRST sets of all non-terminals in a context free grammar.
• hint: consider the types of rules• alternatives
• composition
• empty productions
input expression EOFexpression term rest_expressionterm IDENTIFIER | ‘(’ expression ‘)’rest_expression ‘+’ expression |
Answers
Answers (Fig 2.58, page 122)
• N w (w VT )
FIRST(N) = {w}
• N A1 | A2 | …
FIRST(N) = FIRST(Ai )
• N FIRST(N) = {}
• N A FIRST(N) = FIRST(A ) , FIRST(A)
FIRST(N) = FIRST(A ) \ {} FIRST() , otherwise
closure algorithm
Break
Predictive parsing
• similar to recursive descent, but no back-tracking
• functions “know” what they are doing
input expression EOF FIRST(expression) = {IDENT, ‘(‘}
void input(void) { switch (Token.class) { case IDENT: case '(': expression(); token(EOF); break; default: error();
}
}
void token(int tk) {
if (Token.class != tk) error();
get_next_token();
}
Predictive parsing
expression term rest_expression FIRST(term) = {IDENT, ‘(‘}
void expression(void) { switch (Token.class) { case IDENT: case '(': term(); rest_expression(); break; default: error();
}
}
term IDENTIFIER | ‘(’ expression ‘)’void term(void) { switch (Token.class) { case IDENT: token(IDENT); break; case '(': token('('); expression(); token(')'); break; default: error();
}
}
Predictive parsing
• FIRST() = {}• check nothing?
• NO: token FOLLOW(rest_expr)
rest_expression ‘+’ expression | FIRST(rest_expr) = {‘+’, }void rest_expression(void) { switch (Token.class) { case '+': token('+'); expression(); break; case EOF: case ')': break; default: error();
}
}
FOLLOW(rest_expr) = {EOF, ‘)’}
Limitations of LL(1) parsers
• FIRST/FIRST conflictterm IDENTIFIER | IDENTIFIER ‘[‘ expression ‘]’ | ‘(’ expression ‘)’
• FIRST/FOLLOW conflictS A ‘a’ ‘b’
A ‘a’ |
• left recursionexpression expression ‘-’ term | term
FIRST(A) = { ‘a’ } = FOLLOW(A)
Making grammars LL(1)
• manual labour• rewrite grammar
• adjust semantic actions
• three rewrite methods• left factoring
• substitution
• left-recursion removal
Left factoring
term IDENTIFIER
| IDENTIFIER ‘[‘ expression ‘]’
• factor out common prefix
term IDENTIFIER after_identifier
after_identifier | ‘[‘ expression ‘]’
‘[’ FOLLOW(after_identifier)
Substitution
S A ‘a’ ‘b’
A ‘a’ |
• replace non-terminal by its alternative
S ‘a’ ‘a’ ‘b’ | ‘a’ ‘b’
Left-recursion removal
N N |
• replace by
N M
M M |
• example
expression expression ‘-’ term | term
...
expression term tail
tail ‘-’ term tail |
N
Exercise (7 min.)
• make the following grammar LL(1)
expression expression ‘+’ term | expression ‘-’ term | termterm term ‘*’ factor | term ‘/’ factor | factorfactor ‘(‘ expression ‘)’ | func-call | identifier | constantfunc-call identifier ‘(‘ expr-list? ‘)’expr-list expression (‘,’ expression)*
• and what about
S if E then S (else S)?
Answers
Answers
• substitutionF ‘(‘ E ‘)’ | ID ‘(‘ expr-list? ‘)’ | ID | constant
• left factoringE E ( ‘+’ | ‘-’ ) T | TT T ( ‘*’ | ‘/’ ) F | FF ‘(‘ E ‘)’ | ID ( ‘(‘ expr-list? ‘)’ )? | constant
• left recursion removalE T (( ‘+’ | ‘-’ ) T )*T F (( ‘*’ | ‘/’ ) F )*
• if-then-else grammar is ambiguous
program text
lexical analysis
syntax analysis
context handling
annotated AST
tokens
AST
parser
generator
language
grammar
automatic generation
LL(1) push-down
automaton
LL(1) push-down automaton
• stack right-hand side of production
transition tablestate
(top of stack)
look-ahead token
IDENT + ( ) EOF
input expression EOF expression EOF
expression term rest-expr term rest-expr
term IDENT ( expression )
rest-expr + expression
LL(1) push-down automaton
aap + ( noot + mies ) EOF
input
input
prediction stack
state
(top of stack)
look-ahead token
IDENT + ( ) EOF
input expression EOF expression EOF
expression term rest-expr term rest-expr
term IDENT ( expression )
rest-expr + expression
LL(1) push-down automaton
aap + ( noot + mies ) EOF
input
input
prediction stack
state
(top of stack)
look-ahead token
IDENT + ( ) EOF
input expression EOF expression EOF
expression term rest-expr term rest-expr
term IDENT ( expression )
rest-expr + expression
replace non-terminal by transition entry
LL(1) push-down automaton
aap + ( noot + mies ) EOF
expression EOF
input
prediction stack
state
(top of stack)
look-ahead token
IDENT + ( ) EOF
input expression EOF expression EOF
expression term rest-expr term rest-expr
term IDENT ( expression )
rest-expr + expression
LL(1) push-down automaton
aap + ( noot + mies ) EOF
expression EOF
input
prediction stack
state
(top of stack)
look-ahead token
IDENT + ( ) EOF
input expression EOF expression EOF
expression term rest-expr term rest-expr
term IDENT ( expression )
rest-expr + expression
replace non-terminal by transition entry
LL(1) push-down automaton
aap + ( noot + mies ) EOF
term rest-expr EOF
input
prediction stack
state
(top of stack)
look-ahead token
IDENT + ( ) EOF
input expression EOF expression EOF
expression term rest-expr term rest-expr
term IDENT ( expression )
rest-expr + expression
LL(1) push-down automaton
aap + ( noot + mies ) EOF
term rest-expr EOF
input
prediction stack
state
(top of stack)
look-ahead token
IDENT + ( ) EOF
input expression EOF expression EOF
expression term rest-expr term rest-expr
term IDENT ( expression )
rest-expr + expression
replace non-terminal by transition entry
LL(1) push-down automaton
aap + ( noot + mies ) EOF
IDENT rest-expr EOF
input
prediction stack
state
(top of stack)
look-ahead token
IDENT + ( ) EOF
input expression EOF expression EOF
expression term rest-expr term rest-expr
term IDENT ( expression )
rest-expr + expression
LL(1) push-down automaton
aap + ( noot + mies ) EOF
IDENT rest-expr EOF
input
prediction stack
state
(top of stack)
look-ahead token
IDENT + ( ) EOF
input expression EOF expression EOF
expression term rest-expr term rest-expr
term IDENT ( expression )
rest-expr + expression
pop matching token
LL(1) push-down automaton
+ ( noot + mies ) EOF
rest-expr EOF
input
prediction stack
state
(top of stack)
look-ahead token
IDENT + ( ) EOF
input expression EOF expression EOF
expression term rest-expr term rest-expr
term IDENT ( expression )
rest-expr + expression
LL(1) push-down automaton
+ ( noot + mies ) EOF
rest-expr EOF
input
prediction stack
state
(top of stack)
look-ahead token
IDENT + ( ) EOF
input expression EOF expression EOF
expression term rest-expr term rest-expr
term IDENT ( expression )
rest-expr + expression
LL(1) push-down automaton
+ ( noot + mies ) EOF
+ expression EOF
input
prediction stack
state
(top of stack)
look-ahead token
IDENT + ( ) EOF
input expression EOF expression EOF
expression term rest-expr term rest-expr
term IDENT ( expression )
rest-expr + expression
LL(1) push-down automaton
+ ( noot + mies ) EOFinput
prediction stack
state
(top of stack)
look-ahead token
IDENT + ( ) EOF
input expression EOF expression EOF
expression term rest-expr term rest-expr
term IDENT ( expression )
rest-expr + expression
+ expression EOF
LL(1) push-down automaton
( noot + mies ) EOFinput
prediction stack
state
(top of stack)
look-ahead token
IDENT + ( ) EOF
input expression EOF expression EOF
expression term rest-expr term rest-expr
term IDENT ( expression )
rest-expr + expression
expression EOF
LL(1) push-down automaton
( noot + mies ) EOFinput
prediction stack
state
(top of stack)
look-ahead token
IDENT + ( ) EOF
input expression EOF expression EOF
expression term rest-expr term rest-expr
term IDENT ( expression )
rest-expr + expression
expression EOF
LLgen
• top-down parser generator
• to be used in assignment #1
• discussed in lecture 5
• syntax analysis: tokens AST
• top-down parsing• recursive descent• push-down automaton• making grammars LL(1)
Summary
program text
lexical analysis
syntax analysis
context handling
annotated AST
tokens
AST
parser
generator
language
grammar
Homework
• study sections:• 1.10 closure algorithm
• 2.2.4.6 error handling in LL(1) parsers
• print handout for next week [blackboard]
• find a partner for the “practicum”
• register your group• send e-mail to koen@pds.twi.tudelft.nl
top related