241-437 Compilers: Attr. Grammars/8 1
Compiler Structures
• Objective
  – describe semantic analysis with attribute grammars, as applied in yacc and recursive descent parsers
241-437, Semester 1, 2011-2012
8. Attribute Grammars
Overview
1. What is an Attribute Grammar?
2. Parse Tree Evaluation
3. Attributes
4. Attribute Grammars and yacc
5. A Grid Grammar
6. Recursive Descent and Attributes
In this lecture
[Figure: compiler phases]
  Front End: Source Program -> Lexical Analyzer -> Syntax Analyzer -> Semantic Analyzer -> Int. Code Generator -> Intermediate Code
  Back End: Intermediate Code -> Code Optimizer -> Target Code Generator -> Target Lang. Prog.
Concentrating on attribute grammars.
1. What is an Attribute Grammar?
• An attribute grammar is a context-free grammar with semantic actions attached to some of the productions
  – semantic = meaning
• An action specifies the meaning of a production in terms of its body terminals and nonterminals.
Example Attribute Grammar
  Production       Semantic Action
  L -> E           printf(Ebody.val)
  E -> E + T       E.val := Ebody.val + Tbody.val
  E -> T           E.val := Tbody.val
  T -> T * F       T.val := Tbody.val * Fbody.val
  T -> F           T.val := Fbody.val
  F -> ( E )       F.val := Ebody.val
  F -> num         F.val := value(num)
2. Parse Tree Evaluation
• One way of understanding semantic actions is as extra information (attributes) attached to the nodes of the parse tree for the input.
• The semantic action specifies the parent node attribute in terms of the attributes of its children.
Basic Parse Tree
Input: 9 * 5 + 2
Grammar:
  L -> E
  E -> E + T
  E -> T
  T -> T * F
  T -> F
  F -> ( E )
  F -> num
Parse tree (children are indented under their parent):
  L
    E
      E
        T
          T
            F
              9
          *
          F
            5
      +
      T
        F
          2
Adding Meaning to the Tree
• What is the meaning of "9 * 5 + 2"?
  – the answer is to evaluate it, to get 47
• Add attributes to the tree, starting from the leaves and working up to the root
  – use the semantic actions to get the attribute values
Parse Tree with Actions
Parse tree for 9 * 5 + 2, with the val attribute of each node (evaluate bottom-up, using the semantic actions of the example attribute grammar):
  L                  printf(47)
    E                val = 47
      E              val = 45
        T            val = 45
          T          val = 9
            F        val = 9
              9
          *
          F          val = 5
            5
      +
      T              val = 2
        F            val = 2
          2
3. Attributes
• Attribute values can be
  – numbers, strings, any data structures, code, assembly language instructions
• It's not always necessary to build a parse tree in order to evaluate the grammar's actions.
Kinds of Attribute
• There are two main kinds of attribute:
  – synthesized and inherited attributes
• The value of a synthesized attribute is calculated by using its body values
  – as in the previous example
Synthesized Attributes in a Tree
• Example:
  Production       Semantic Action
  T -> T * F       T.val := Tbody.val * Fbody.val
Tree (evaluate bottom-up):
  T          val = 45
    T        val = 9
    *
    F        val = 5
Inherited Attributes
• An inherited attribute for a body symbol (i.e. a terminal or non-terminal) gets its value from the other body symbols and the parent value
  – often used for evaluating more complex programming language features
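Inherited Attributes in a Tree
• Two examples, each for a parent A.a with children X.x and Y.y:
  – X.x := function(A.a, Y.y)
  – Y.y := function(A.a, X.x)
[Figure: two trees with root A.a and children X.x and Y.y; arrows show the direction of evaluation, from the parent and the sibling to the attribute being defined]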
4. Attribute Grammars and yacc
• yacc supports (synthesized) attribute grammars
  – yacc actions are semantic actions
  – no parse tree is needed, since yacc evaluates the actions using the parser's built-in stack
expr.y Again
  %{
  /* declarations needed by the actions */
  #include <stdio.h>
  int yylex(void);
  int yyerror(char *s);
  %}

  %token NUMBER                  /* declarations */

  %%
  /* each { ... } is an action; $$, $1, $2, $3 are attributes */
  exprs: expr '\n'         { printf("Value = %d\n", $1); }
       | exprs expr '\n'   { printf("Value = %d\n", $2); }
       ;

  expr: expr '+' term      { $$ = $1 + $3; }
      | expr '-' term      { $$ = $1 - $3; }
      | term               { $$ = $1; }
      ;

  /* more actions */
  term: term '*' factor    { $$ = $1 * $3; }
      | term '/' factor    { $$ = $1 / $3; }   /* integer division */
      | factor
      ;

  factor: '(' expr ')'     { $$ = $2; }
        | NUMBER
        ;

  %%
  /* C code */
  #include "lex.yy.c"

  int yyerror(char *s)
  {
    fprintf(stderr, "%s\n", s);
    return 0;
  }

  int main(void)
  {
    yyparse();    // the syntax analyzer
    return 0;
  }
Evaluation in yacc
Input: 3 * 5 + 4\n
  Stack       Input       Stack Action          val_     Semantic Action
  $           3*5+4\n$    shift
  $ 3         *5+4\n$     reduce F -> num       3        $$ = $1 (implicit)
  $ F         *5+4\n$     reduce T -> F         3        $$ = $1 (implicit)
  $ T         *5+4\n$     shift                 3
  $ T *       5+4\n$      shift                 3
  $ T * 5     +4\n$       reduce F -> num       3 5      $$ = $1 (implicit)
  $ T * F     +4\n$       reduce T -> T * F     3 5      $$ = $1 * $3
  $ T         +4\n$       reduce E -> T         15       $$ = $1 (implicit)
  $ E         +4\n$       shift                 15
  $ E +       4\n$        shift                 15
  $ E + 4     \n$         reduce F -> num       15 4     $$ = $1 (implicit)
  $ E + F     \n$         reduce T -> F         15 4     $$ = $1 (implicit)
  $ E + T     \n$         reduce E -> E + T     15 4     $$ = $1 + $3
  $ E         \n$         shift                 19
  $ E \n      $           reduce Es -> E \n     19       printf $1
  $ Es        $           accept                19
5. A Grid Grammar
• A robot starts at (0,0) on a grid, and is given compass directions:
  – n = north, s = south, e = east, w = west
• Evaluate the sequence of directions to work out the final position of the robot.
Example
• The robot receives the directions:
  – n e e n n w
  – what is the 'meaning' (semantics) of the directions?
  – the 'meaning' is the final robot position, (1,3)
[Figure: grid showing the robot's path from start to final, with a compass rose labelled n, e, s, w]
5.1. Grid Grammar
Input: n w s s
Grammar:
  robot -> path
  path  -> path step | (empty)
  step  -> e | w | s | n
Parse tree (children are indented under their parent):
  robot
    path
      path
        path
          path
            path (empty)
            step
              n
          step
            w
        step
          s
      step
        s
Grid Attribute Grammar
  Production           Semantic Actions
  robot -> path        printf( pathbody.(x,y) )
  path -> path step    path.x := pathbody.x + stepbody.dx
                       path.y := pathbody.y + stepbody.dy
  path -> (empty)      path.(x,y) := (0,0)
  step -> e            step.(dx,dy) := (1,0)
  step -> w            step.(dx,dy) := (-1,0)
  step -> s            step.(dx,dy) := (0,-1)
  step -> n            step.(dx,dy) := (0,1)
Data Types
• The path rules use (x,y), the position of the robot.
• The step rules use (dx,dy), the step taken by the robot.
• Implementing these data types requires new features of yacc.
Parse Tree with Actions
Input: n w s s
Parse tree with the (x,y) of each path and the dx,dy of each step (evaluate bottom-up):
  robot                   printf (-1,-1)
    path                  (-1,-1)
      path                (-1,0)
        path              (-1,1)
          path            (0,1)
            path (empty)  (0,0)
            step n        0,1
          step w          -1,0
        step s            0,-1
      step s              0,-1
5.2. Non-integer Yacc Attributes
• The default yacc attributes (e.g. $$, $1, etc) are integers.
• We want data structures for (x,y) and (dx,dy), coded as two struct types.
Defining New Types
• The new types are collected together inside a %union in the yacc definitions section:
  %union{
    type1 name1;
    type2 name2;
    . . .
  }
• For the grid:
  %union{
    struct { int x; int y; } pos;
    struct { int dx; int dy; } offset;
  }
Using the Types
• The non-terminals that return the new types must be listed.
• Any tokens that use the types must be listed.
• For the grid:
  %type <offset> step
  %type <pos> path
  – these non-terminals return values of the specified type
Using Typed Variables
• If an attribute (variable) is a record, then dotted-name notation is used to refer to its fields
  – e.g. $$.dx, $1.y
• The default action ($$ = $1) will cause an error if $$ and $1 are not the same type.
5.3. Grid Compiler
  $ flex grid.l
  $ bison grid.y
  $ gcc grid.tab.c -o gridEval
[Figure: build pipeline]
  grid.l (a flex file) -> flex -> lex.yy.c
  grid.y (a bison file) -> bison -> grid.tab.c, which #includes lex.yy.c
  grid.tab.c -> gcc -> gridEval (C executable)
Usage
  $ ./gridEval
  nwss
  Robot is at (-1,-1)
  $ ./gridEval
  n n n w w w s e
  Robot is at (-2,2)
  $
In each run, I typed the line of directions, followed by ctrl-D.
grid.l
  %%
  [nN]      {return NORTH;}
  [sS]      {return SOUTH;}
  [eE]      {return EAST;}
  [wW]      {return WEST;}
  [ \n\t]   ;
  %%
  int yywrap(void) { return 1; }
grid.y
  %{
  /* declarations needed by the actions */
  #include <stdio.h>
  int yylex(void);
  int yyerror(char *s);
  %}

  /* type definitions */
  %union{
    struct { int x; int y; } pos;
    struct { int dx; int dy; } offset;
  }

  %token EAST WEST NORTH SOUTH

  /* types used by the non-terminals */
  %type <offset> step
  %type <pos> path

  %%
  robot: path       { printf("Robot is at (%d,%d)\n", $1.x, $1.y); }
       ;

  path: path step   { $$.x = $1.x + $2.dx;
                      $$.y = $1.y + $2.dy; }
      |             { $$.x = 0; $$.y = 0; }
      ;

  step: EAST        { $$.dx = 1;  $$.dy = 0; }
      | WEST        { $$.dx = -1; $$.dy = 0; }
      | SOUTH       { $$.dx = 0;  $$.dy = -1; }
      | NORTH       { $$.dx = 0;  $$.dy = 1; }
      ;

  %%
  #include "lex.yy.c"

  int yyerror(char *s)
  {
    fprintf(stderr, "%s\n", s);
    return 0;
  }

  int main(void)
  {
    yyparse();
    return 0;
  }
6. Recursive Descent and Attributes
• It is easy to add semantic actions to a recursive descent parser
  – in many cases, there's no need for the parser to build a parse tree in order to evaluate the attributes
• The basic translation strategy:
  – each production becomes a function
• The function (e.g. f()) calls other functions representing its body non-terminals
  – those functions return values (attributes) to f()
  – f() combines the values, and returns a value (attribute)
6.1. The Expressions Parser Again
• The basic LL(1) grammar:
  Stats => ( [ Stat ] \n )*
  Stat  => let ID = Expr | Expr
  Expr  => Term ( (+ | - ) Term )*
  Term  => Fact ( (* | / ) Fact )*
  Fact  => '(' Expr ')' | Int | Id
An Expressions Program (test3.txt)
  5 + 6                          <- give answer
  let x = 2                      <- declare variable
  3 + ( (x*y)/2) // comments
  // y
  let x = 5
  let y = x /0                   <- error
  // comments
(The notes after "<-" are slide annotations, not part of the file.)
6.2. Parsing with Actions
• exprParse1.c is a recursive descent parser for the expressions language.
• It differs from exprParse0.c by having semantic actions attached to its productions
  – these actions evaluate the expressions, and assign values to expression variables
Grammar with Actions
  Productions                        Actions
  Stats => ( [ Stat ] \n )*          ---
  Stat  => let ID = Expr             add id to symbol table;
                                     id.val = expr.val;
                                     print( id.val );
  Stat  => Expr                      print( expr.val );
  Expr  => Term ( (+ | - ) Term )*   return term1.val (+ | -) term2.val (+ | -) ... termn.val;
  Term  => Fact ( (* | / ) Fact )*   return fact1.val (* | /) fact2.val (* | /) ... factn.val;
  Fact  => '(' Expr ')'              return expr.val;
  Fact  => Int                       return int.val;
  Fact  => Id                        lookup id;
                                     if not found then add (id, 0) to table;
                                     return id.val;
The Symbol Table
• The symbol table is a data structure used to store expression variables and their values.
• In exprParse1.c, it's an array of structs, with each struct holding the name of the variable and its current integer value.
6.3. Usage
  $ gcc -Wall -o exprParse1 exprParse1.c
  $ ./exprParse1 < test3.txt
  == 11
  x being declared
  x = 2
  y being declared
  == 3
  x = 5
  Error: Division by zero; using 1 instead
  y = 5
  $
6.4. exprParse1.c Callgraph
[Figure: call graph of exprParse1.c, with three labelled groups: functions that are the same as in exprParse0.c; functions generated from the grammar (now with actions); symbol table functions (new)]
6.5. Symbol Table Data Structures
  #define MAX_SYMS 15    // max no of variables

  typedef struct SymInfo {
    char *id;      // name of variable
    int value;     // value (an integer)
  } SymbolInfo;

  int symNum = 0;    // number of symbols stored
  SymbolInfo syms[MAX_SYMS];
[Figure: syms[] is an array with slots 0 to 14, each holding an id and a value]
Symbol Table Functions
  SymbolInfo *getIDEntry(void)
  /* find _OR_ create symbol table entry for
     current tokString; return a pointer to it */
  {
    SymbolInfo *si = NULL;
    if ((si = lookupID(tokString)) != NULL)    // already declared
      return si;

    // add id to table
    printf("%s being declared\n", tokString);
    return addID(tokString, 0);    // 0 is default value
  } // end of getIDEntry()

  SymbolInfo *lookupID(char *nm)
  /* is nm in the symbol table?
     return pointer to struct or NULL */
  {
    int i;
    for (i = 0; i < symNum; i++)
      if (!strcmp(syms[i].id, nm))
        return &syms[i];
    return NULL;
  } // end of lookupID()

  SymbolInfo *addID(char *nm, int value)
  /* add nm and value to the symbol table;
     return pointer to struct */
  {
    if (symNum == MAX_SYMS) {
      printf("Symbol table full; cannot add %s\n", nm);
      exit(1);
    }

    syms[symNum].id = (char *) malloc(strlen(nm)+1);
    strcpy(syms[symNum].id, nm);
    syms[symNum].value = value;
    SymbolInfo *si = &syms[symNum];
    symNum++;

    return si;
  } // end of addID()
Using the Symbol Table
• The grammar functions use the symbol table via the matchId() function.
  SymbolInfo *matchId(void)
  // checks current ID with symbol table
  {
    SymbolInfo *si;
    dprint("Parsing ident\n");
    if ((si = getIDEntry()) == NULL) {
      printf("Error: id is NULL on line %d\n", lineNum);
      exit(1);
    }
    match(ID);    // ok, so consume ID token
    return si;
  } // end of matchId()
6.6. Translating the Grammar Rules
• The same translation is carried out as before, but the code is augmented with actions.
• The semantic actions are translated into extra C code in the grammar functions.
The Grammar Functions
• main() and statements() are unchanged from exprParse0.c since they don't have any semantic actions.
• Functions with extra actions:
  – statement(), expression(), term(), factor()
Unchanged Functions
  int main(void)
  {
    nextToken();
    statements();
    match(SCANEOF);
    return 0;
  }

  void statements(void)
  // statements ::= { [ statement ] '\n' }
  {
    dprint("Parsing statements\n");
    while (currToken != SCANEOF) {
      if (currToken != NEWLINE)
        statement();
      match(NEWLINE);
    }
  } // end of statements()
statement() Before and After
With no semantic actions:
  void statement(void)
  // statement ::= ( 'let' ID '=' EXPR ) | EXPR
  {
    if (currToken == LET) {
      match(LET);
      match(ID);
      match(ASSIGNOP);
      expression();
    }
    else
      expression();
  } // end of statement()
With semantic actions:
  void statement(void)
  // statement ::= ( 'let' ID '=' EXPR ) | EXPR
  {
    SymbolInfo *si;
    int value;
    dprint("Parsing statement\n");
    if (currToken == LET) {
      match(LET);
      si = matchId();    // was match(ID);
      match(ASSIGNOP);
      value = expression();
      si->value = value;
      printf("%s = %d\n", si->id, value);
    }
    else {    // expression
      value = expression();
      printf("== %d\n", value);
    }
  } // end of statement()
Actions: add id to table; id.val = expr.val; print( id.val ); or print( expr.val );
expression() Before and After
With no semantic actions:
  void expression(void)
  // expression ::= term ( ('+'|'-') term )*
  {
    term();
    while ((currToken == PLUSOP) || (currToken == MINUSOP)) {
      match(currToken);
      term();
    }
  } // end of expression()
With semantic actions:
  int expression(void)
  // expression ::= term ( ('+'|'-') term )*
  {
    int result, v2;
    int isAddOp;

    dprint("Parsing expression\n");
    result = term();
    while ((currToken == PLUSOP) || (currToken == MINUSOP)) {
      isAddOp = (currToken == PLUSOP) ? 1 : 0;
      match(currToken);
      v2 = term();
      if (isAddOp == 1)    // addition
        result += v2;
      else                 // subtraction
        result -= v2;
    }
    return result;
  } // end of expression()
Action: return term1.val (+ | -) term2.val (+ | -) ... termn.val;
term() Before and After
With no semantic actions:
  void term(void)
  // term ::= factor ( ('*'|'/') factor )*
  {
    factor();
    while ((currToken == MULTOP) || (currToken == DIVOP)) {
      match(currToken);
      factor();
    }
  } // end of term()
With semantic actions:
  int term(void)
  // term ::= factor ( ('*'|'/') factor )*
  {
    int result, v2;
    int isMultOp;
    dprint("Parsing term\n");
    result = factor();
    while ((currToken == MULTOP) || (currToken == DIVOP)) {
      isMultOp = (currToken == MULTOP) ? 1 : 0;
      match(currToken);
      v2 = factor();
      if (isMultOp == 1)    // multiplication
        result *= v2;
      else {                // division
        if (v2 == 0)        // result is left unchanged, i.e. divided by 1
          printf("Error: Division by zero; using 1 instead\n");
        else
          result = result / v2;
      }
    }
    return result;
  } // end of term()
Action: return fact1.val (* | /) fact2.val (* | /) ... factn.val;
factor() Before and After
With no semantic actions:
  void factor(void)
  // factor ::= '(' expression ')' | INT | ID
  {
    if (currToken == LPAREN) {
      match(LPAREN);
      expression();
      match(RPAREN);
    }
    else if (currToken == INT)
      match(INT);
    else if (currToken == ID)
      match(ID);
    else
      syntax_error(currToken);
  } // end of factor()
With semantic actions:
  int factor(void)
  // factor ::= '(' expression ')' | INT | ID
  {
    int result = 0;
    dprint("Parsing factor\n");
    if (currToken == LPAREN) {
      match(LPAREN);
      result = expression();
      match(RPAREN);
    }
    else if (currToken == INT) {
      match(INT);
      result = currTokValue;
    }
    else if (currToken == ID) {
      SymbolInfo *si = matchId();
      result = si->value;
    }
    else
      syntax_error(currToken);
    return result;
  } // end of factor()
Actions: return expr.val; or return int.val; or add id to table (if new); return id.val;