241-437 Compilers: Yacc/7 1
Compiler Structures
• Objective
  – describe yacc (actually bison)
  – give simple examples of its use
241-437, Semester 1, 2011-2012
7. Yacc
Overview
1. What is Yacc?
2. Format of a yacc/bison File
3. Expressions Compiler
4. Bottom-up Parsing Reminder
5. Expression Conflicts
6. Precedence/Associativity in yacc
7. Dangling Else Conflict
8. Left and Right Recursion
9. Error Recovery
10. Embedded Actions
11. More Information
1. What is Yacc?
• Yacc (Yet Another Compiler Compiler) is a tool for translating a context-free grammar into a bottom-up LALR parser
  – it creates a parse table like that described in the last chapter
• Yacc is used with lex to create compilers.
• Most people use bison, a much improved version of yacc
  – on most modern Unixes, when you call yacc, you're really using bison
• bison works with flex (the fast version of lex).
Bison and Flex
$ flex foo.l
$ bison foo.y
$ gcc foo.tab.c -o foo

• flex turns foo.l (a flex file) into lex.yy.c
• bison turns foo.y (a bison file) into foo.tab.c, which #includes lex.yy.c
• the C compiler turns foo.tab.c into foo, a C executable
• foo reads a source program and produces parsed output:

$ ./foo < program.txt
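Compiler Components (in foo)
• lex.yy.c: the lexical analyzer (works on chars, reports lexical errors)
• foo.tab.c: the syntax analyzer (works on tokens, reports syntax errors, produces the parsed output)
• How they interact:
  1. The syntax analyzer gets the next token by calling yylex()
  2. The lexical analyzer gets chars from the source program to make a token
  3. The lexical analyzer returns the token, token value, and token type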
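The three numbered steps can be sketched with a hand-written stand-in for the scanner that flex would generate. This is only an illustration under assumptions: next_token, TOK_NUMBER, TOK_EOF and token_value are hypothetical names (the real interface is yylex() and yylval), and the input comes from a string rather than stdin.

```c
#include <ctype.h>

/* Hypothetical token codes; bison would generate the real ones. */
enum { TOK_EOF = 0, TOK_NUMBER = 258 };

/* Plays the role of yylval: the value attached to the last token. */
static int token_value;

/* Step 2: read chars from *src to build one token.
   Step 3: return the token type; its value goes in token_value. */
static int next_token(const char **src)
{
    const char *p = *src;
    while (*p == ' ' || *p == '\t')
        p++;                            /* skip whitespace */
    if (*p == '\0') {
        *src = p;
        return TOK_EOF;
    }
    if (isdigit((unsigned char)*p)) {
        int v = 0;
        while (isdigit((unsigned char)*p))
            v = v * 10 + (*p++ - '0');
        token_value = v;
        *src = p;
        return TOK_NUMBER;
    }
    *src = p + 1;
    return (unsigned char)*p;           /* single-char token, e.g. '+' */
}
```

A parser would call next_token() once per step 1, in the same way that yyparse() calls yylex().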
Inside foo.tab.c
• The LALR parser reads the input tokens a1 a2 … ai … an $
• It keeps a stack of (X, s) pairs: (X0, s0) … (Xm-1, sm-1) (Xm, sm)
  – X is a terminal or non-terminal, s is a state
• It consults the parse table (actions and gotos), which bison creates based on your grammar
• It produces the parsed output
2. Format of a yacc/bison File
declarations: C data and yacc definitions (or nothing)
%%
Grammar rules (with actions)
%%
#include "lex.yy.c"
C functions, including main()
Declarations
• C data is put between %{ and %}
• The yacc definitions list the tokens (terminals) used in the grammar
%token terminal1 terminal2 ...
• Other yacc definitions:
  – %left and %right for associativity
  – %prec for precedence (written inside a rule rather than in the declarations)
Precedence example: 2 + 3 * 5
  – does it mean (2 + 3) * 5 or 2 + (3 * 5) ?
Associativity example: 1 - 1 - 1
  – does it mean (1 - 1) - 1 // left
    or 1 - (1 - 1) ? // right
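The four readings can be checked directly in C by writing the parentheses out. A small sketch; the function names are made up for illustration.

```c
/* Precedence: the two readings of 2 + 3 * 5. */
static int add_binds_tighter(void) { return (2 + 3) * 5; }   /* 25 */
static int mul_binds_tighter(void) { return 2 + (3 * 5); }   /* 17 */

/* Associativity: the two readings of 1 - 1 - 1. */
static int minus_left(void)  { return (1 - 1) - 1; }         /* -1 */
static int minus_right(void) { return 1 - (1 - 1); }         /*  1 */
```

The usual arithmetic conventions pick mul_binds_tighter() and minus_left(), so a calculator should print 17 and -1.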
Rules
• Rule format:
nonterminal : body 1 {action 1}
            | body 2 {action 2}
            . . .
            | body n {action n}
            ;
• Actions are optional; they are C code.
• Actions are usually at the end of a body, but can be placed anywhere.
• The grammar part is the same as: nonterminal => body1 | body2 | ... | bodyN
3. Expressions Compiler
$ flex expr.l
$ bison expr.y
$ gcc expr.tab.c -o exprEval

• flex turns expr.l (a flex file) into lex.yy.c
• bison turns expr.y (a bison file) into expr.tab.c, which #includes lex.yy.c
• gcc turns expr.tab.c into exprEval, a C executable
Usage
$ ./exprEval
2 + 3
Value = 5
2 - (5 * 2)
Value = -8
1 / 3
Value = 0
$
• I typed the expression lines; the "Value = ..." lines are the program's replies.
• I typed ctrl-D to finish.
expr.l
%%
[-+*/()\n] { return *yytext; }
[0-9]+ { yylval = atoi(yytext); return(NUMBER); }
[ \t] ; /* skip whitespace */
%%
int yywrap(void) { return 1; }
No main() function
RE actions usually end with a return. The token is passed to the syntax analyser.
Lex File Format Reminder
• A lex program has three sections:
REs and/or C code
%%
RE/action rules
%%
C functions
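expr.y
%{
#include <stdio.h>      /* printf() is used in the actions */
int yylex(void);        /* defined in lex.yy.c */
int yyerror(char *s);   /* defined after the rules */
%}
%token NUMBER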
%%
exprs: expr '\n' { printf("Value = %d\n", $1); }
| exprs expr '\n' { printf("Value = %d\n", $2); }
;
expr: expr '+' term { $$ = $1 + $3; }
| expr '-' term { $$ = $1 - $3; }
| term { $$ = $1; }
;
• Callouts: the %token line is the declarations section; the exprs and expr lines are rules; $$, $1, $2 and $3 are attributes.
term: term '*' factor { $$ = $1 * $3; }
| term '/' factor { $$ = $1 / $3; } /* integer division */
| factor
;
factor: '(' expr ')' { $$ = $2; }
| NUMBER
;
%%
#include "lex.yy.c"

/* C code section */
int yyerror(char *s)
{
    fprintf(stderr, "%s\n", s);
    return 0;
}

int main(void)
{
    yyparse();    // the syntax analyzer
    return 0;
}
Yacc Actions
• yacc actions (the C code) can use attributes (variables).
• Each terminal/non-terminal in a body has an attribute, which contains its return value.
Attributes
• An attribute is $n, where n is the position of the terminal/non-terminal in the body, starting at 1
  – $1 = first terminal/non-terminal of the body
  – $2 = second one
  – etc.
  – $$ = return value for the rule
• the default value for $$ is the $1 value
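One way to picture attributes is as slots on a value stack that runs alongside the parse stack. The sketch below is a hand-written illustration of that idea, not the code bison emits; vals, top, push_value and reduce_add are hypothetical names.

```c
/* Value stack: one slot per grammar symbol currently on the parse stack. */
static int vals[64];
static int top = 0;                   /* number of slots in use */

static void push_value(int v) { vals[top++] = v; }

/* Reduce by  expr: expr '+' term  { $$ = $1 + $3; }
   The body has 3 symbols, so $1..$3 are the top 3 slots. */
static void reduce_add(void)
{
    int *body = &vals[top - 3];       /* body[0] is $1, body[2] is $3 */
    int result = body[0] + body[2];   /* $$ = $1 + $3 */
    top -= 3;                         /* pop the body */
    push_value(result);               /* push $$ as the value of expr */
}
```

Pushing 15, a placeholder for '+', and 4, then calling reduce_add(), leaves the single value 19 on the stack.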
Evaluation in yacc
Input: 3 * 5 + 4\n

Stack      Input       Action              val_    Rule
$          3*5+4\n$    shift
$ 3        *5+4\n$     reduce F => num     3       $$ = $1 (implicit)
$ F        *5+4\n$     reduce T => F       3       $$ = $1 (implicit)
$ T        *5+4\n$     shift               3
$ T *      5+4\n$      shift               3
$ T * 5    +4\n$       reduce F => num     3 5     $$ = $1 (implicit)
$ T * F    +4\n$       reduce T => T * F   3 5     $$ = $1 * $3
$ T        +4\n$       reduce E => T       15      $$ = $1 (implicit)
$ E        +4\n$       shift               15
$ E +      4\n$        shift               15
$ E + 4    \n$         reduce F => num     15 4    $$ = $1 (implicit)
$ E + F    \n$         reduce T => F       15 4    $$ = $1 (implicit)
$ E + T    \n$         reduce E => E + T   15 4    $$ = $1 + $3
$ E        \n$         shift               19
$ E \n     $           reduce Es => E \n   19      printf $1
$ Es       $           accept              19
4. Bottom-up Parsing Reminder
• Simple expressions grammar:
E => E '+' E // rule r1
E => E '*' E // rule r2
E => id // rule r3
Parsing "x + y * z"
 1. . x + y * z      // shift
 2. x . + y * z      // reduce(r3)
 3. E . + y * z      // shift
 4. E + . y * z      // shift
 5. E + y . * z      // reduce(r3)
 6. E + E . * z      // shift
 7. E + E * . z      // shift
 8. E + E * z .      // reduce(r3)
 9. E + E * E .      // reduce(r2)
10. E + E .          // reduce(r1)
11. E .              // accept
Shift/Reduce Conflict
• At step 6, a shift or a reduce is possible. The reduce would give:
6. E + E . * z // reduce (r1)
7. E . * z
   :
• What should be done?
  – by default, yacc (bison) shifts
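The effect of "always shift" on the ambiguous grammar can be sketched by hand: every operator ends up grouped to the right, whatever it is. The C function below is an illustration of that grouping only (eval_always_shift is a hypothetical name, and it is a recursive evaluator, not a table-driven parser). It handles digits, '+' and '*'.

```c
#include <ctype.h>

/* Always group to the right: n1 op (n2 op (n3 ...)).
   This is the grouping an always-shift parser produces for
   E => E '+' E | E '*' E | id. */
static int eval_always_shift(const char **s)
{
    int left = 0;
    while (**s == ' ') (*s)++;
    while (isdigit((unsigned char)**s))
        left = left * 10 + (*(*s)++ - '0');
    while (**s == ' ') (*s)++;
    if (**s == '+' || **s == '*') {
        char op = *(*s)++;
        int right = eval_always_shift(s);   /* "shift": read the rest first */
        return op == '+' ? left + right : left * right;   /* reduce last */
    }
    return left;
}
```

For "2 + 3 * 5" this happens to give the usual answer, 2 + (3 * 5), but "2 * 3 + 5" is read as 2 * (3 + 5), which is why the default alone is not enough for expressions.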
Reduce/Reduce Conflict
• Modify the grammar to include:
E => T // new rule r3
E => id // rule r4
T => id // rule r5
• Consider step 2: x . + y * z
• There are two ways to reduce:
E . + y * z // reduce (r4)
or
T . + y * z // reduce (r5)
• What should be done?
  – by default, yacc (bison) reduces using the first possible rule (i.e. rule r4)
Common Conflicts
• The two most common shift/reduce problems in programming languages are:
  – expression precedence
  – dangling else
• yacc has features for fixing both of these
• Reduce/reduce problems are usually due to errors in your grammar.
Debugging Conflicts
• bison can generate extra conflict information, which can help you debug your grammar.
  – use the -v option
5. Expression Conflicts
In shiftE.y:
%token NUMBER
%%
expr: expr '+' expr
    | expr '*' expr
    | '(' expr ')'
    | NUMBER
    ;
• shift/reduce conflicts arise here, as in the previous example
%%
#include "lex.yy.c"

int yyerror(char *s)
{
    fprintf(stderr, "%s\n", s);
    return 0;
}

int main(void)
{
    yyparse();
    return 0;
}
Example
• When the parsing state is:
expr '+' expr . '*' z
should bison shift:
expr '+' expr '*' . z
or reduce?:
expr . '*' z // using rule 1
Using -v
$ bison shiftE.y
shiftE.y: conflicts: 4 shift/reduce
$ bison -v shiftE.y
shiftE.y: conflicts: 4 shift/reduce
  – creates a shiftE.output file with extra conflict information
Inside shiftE.output
State 9 conflicts: 2 shift/reduce
State 10 conflicts: 2 shift/reduce

Grammar

    0 $accept: expr $end

    1 expr: expr '+' expr
    2     | expr '*' expr
    3     | '(' expr ')'
    4     | NUMBER

    : // many state blocks

• states 9 and 10 are the problems
• the rules are numbered
state 9

    1 expr: expr . '+' expr
    1     | expr '+' expr .
    2     | expr . '*' expr

    '+'  shift, and go to state 6
    '*'  shift, and go to state 7

    '+'       [reduce using rule 1 (expr)]
    '*'       [reduce using rule 1 (expr)]
    $default  reduce using rule 1 (expr)

• the three items at the top are the kinds of parsing states bison is looking at
• bison does the shifts
• but it could do the bracketed reduces instead
state 10

    1 expr: expr . '+' expr
    2     | expr . '*' expr
    2     | expr '*' expr .

    '+'  shift, and go to state 6
    '*'  shift, and go to state 7

    '+'       [reduce using rule 2 (expr)]
    '*'       [reduce using rule 2 (expr)]
    $default  reduce using rule 2 (expr)

• the three items at the top are the kinds of parsing states bison is looking at
• bison does the shifts
• but it could do the bracketed reduces instead
What causes Expression Conflicts?
• The problems are the precedence and associativity of the operators:
  – does 2 + 3 * 5 mean (2 + 3) * 5 or 2 + (3 * 5) ? // should be 2nd
  – does 1 - 1 - 1 mean (1 - 1) - 1 or 1 - (1 - 1) ? // should be 1st
• * should have higher precedence than +, and - should be left associative.
6. Precedence/Associativity in yacc
• The declarations section can contain associativity and precedence settings for tokens:
  – %left, %right, %nonassoc
  – precedence is given by the order of the lines (later lines have higher precedence)
• Example:
%left '+' '-'
%left '*' '/'
All left associative, with '*' and '/' higher precedence than '+' and '-'.
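To see what these two %left lines ask for, here is a hand-written precedence-climbing evaluator in C. It is a different technique from the table-driven LALR parser that bison generates, shown only because it makes the levels and the left associativity explicit. All names (level, number, eval_prec) are made up for this sketch, and it handles only non-negative integers, spaces and the four operators.

```c
#include <ctype.h>

/* Precedence levels mirror the %left lines: later line = higher level. */
static int level(char op)
{
    if (op == '+' || op == '-') return 1;   /* %left '+' '-' */
    if (op == '*' || op == '/') return 2;   /* %left '*' '/' */
    return 0;                               /* not an operator */
}

static void skip_spaces(const char **s) { while (**s == ' ') (*s)++; }

static int number(const char **s)
{
    int v = 0;
    skip_spaces(s);
    while (isdigit((unsigned char)**s))
        v = v * 10 + (*(*s)++ - '0');
    skip_spaces(s);
    return v;
}

/* Evaluate operators whose level is >= min_level. */
static int eval_prec(const char **s, int min_level)
{
    int left = number(s);
    while (level(**s) > 0 && level(**s) >= min_level) {
        char op = *(*s)++;
        /* left associative: the right operand may only contain
           operators of a strictly higher level */
        int right = eval_prec(s, level(op) + 1);
        switch (op) {
        case '+': left += right; break;
        case '-': left -= right; break;
        case '*': left *= right; break;
        case '/': left /= right; break;     /* integer division, no zero check */
        }
    }
    return left;
}
```

Calling eval_prec(&text, 1) on "2 + 5 * 3" groups the multiplication first, and on "1 - 1 - 1" groups from the left.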
Expressions Variables Compiler
$ flex exprVars.l
$ bison exprVars.y
$ gcc exprVars.tab.c -o exprVarsEval

• flex turns exprVars.l (a flex file) into lex.yy.c
• bison turns exprVars.y (a bison file) into exprVars.tab.c, which #includes lex.yy.c
• gcc turns exprVars.tab.c into exprVarsEval, a C executable
Usage
$ ./exprVarsEval
2 + 5 * 3
Value = 17
1 - 1 - 1
Value = -1
a = 3 * 4
a
Value = 12
b = (3 - 6) * a
b
Value = -36
$
• I typed the expression and assignment lines; the "Value = ..." lines are the program's replies.
• I typed ctrl-D to finish.
exprVars.l
• the token names are defined in the yacc file
/* Added: RE vars, token names, VAR token, assignment,
   error msgs */

digits [0-9]+
letter [a-z]
%%

\n return('\n');
\= return(ASSIGN);
\+ return(PLUS);
\- return(MINUS);
\* return(TIMES);
\/ return(DIV);
\( return(LPAREN);
\) return(RPAREN);
{letter} { yylval = *yytext - 'a'; return(VAR); }
{digits} { yylval = atoi(yytext); return(NUMBER); }
[ \t] ; /* skip whitespace */
. yyerror("Invalid char"); /* reject everything else */
%%
int yywrap(void)
{ return 1; }
exprVars.y
/* Added: token names, assoc/precedence ops,
   changed grammar rules, vars and assignment. */

%token VAR NUMBER ASSIGN PLUS MINUS TIMES DIV LPAREN RPAREN

%left PLUS MINUS
%left TIMES DIV

%{
#include <stdio.h>      /* printf() is used in the actions */
int yylex(void);        /* defined in lex.yy.c */
int yyerror(char *s);   /* also called from exprVars.l */
int symbol[26];         // stores var's values
%}

%%
program: program statement '\n'
       |
       ;

statement: expr             { printf("Value = %d\n", $1); }
         | VAR ASSIGN expr  { symbol[$1] = $3; }
         ;

expr: NUMBER
    | VAR                 { $$ = symbol[$1]; }
    | expr PLUS expr      { $$ = $1 + $3; }
    | expr MINUS expr     { $$ = $1 - $3; }
    | expr TIMES expr     { $$ = $1 * $3; }
    | expr DIV expr       { $$ = $1 / $3; }   /* integer division */
    | LPAREN expr RPAREN  { $$ = $2; }
    ;

%%
#include "lex.yy.c"

int yyerror(char *s)
{
    fprintf(stderr, "%s\n", s);
    return 0;
}

int main(void)
{
    yyparse();
    return 0;
}
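The variable handling comes down to a 26-slot array indexed by letter: the scanner computes *yytext - 'a' and the actions read or write symbol[$1]. The standalone C sketch below shows just that part; var_index, assign and lookup are hypothetical helper names that do not appear in the yacc file.

```c
/* One slot per variable a..z, as in  int symbol[26]  in exprVars.y. */
static int symbol[26];

/* What the scanner does for {letter}: map 'a'..'z' to 0..25. */
static int var_index(char letter) { return letter - 'a'; }

/* statement: VAR ASSIGN expr   { symbol[$1] = $3; } */
static void assign(char letter, int value)
{
    symbol[var_index(letter)] = value;
}

/* expr: VAR   { $$ = symbol[$1]; } */
static int lookup(char letter) { return symbol[var_index(letter)]; }
```

Replaying the usage session: assign('a', 3 * 4) followed by assign('b', (3 - 6) * lookup('a')) stores 12 and -36. A variable that was never assigned reads as 0, because the array has static storage.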
7. Dangling Else Conflict
In iffy.y:
%token IF ELSE variable
%%
stmt: expr
    | if_stmt
    ;
if_stmt: IF expr stmt
       | IF expr stmt ELSE stmt
       ;
expr: variable
    ;

$ bison -v iffy.y
iffy.y: conflicts: 1 shift/reduce
Shift or Reduce?
• Current state:
  – IF expr IF expr stmt . ELSE stmt
• Shift choice:
  – IF expr IF expr stmt . ELSE stmt
  – IF expr IF expr stmt ELSE . stmt
  – IF expr IF expr stmt ELSE stmt .
  – IF expr stmt .
• the ELSE is paired with the second IF:
if (x < 5)
    if (x < 3) y = a - b;
    else y = b - a;
• Reduce option:
  – IF expr IF expr stmt . ELSE stmt
  – IF expr stmt . ELSE stmt
  – IF expr stmt ELSE . stmt
  – IF expr stmt ELSE stmt .
• the ELSE is paired with the first IF:
if (x < 5)
    if (x < 3) y = a - b;
else y = b - a;
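C itself resolves the dangling else in the same way as the shift choice: an else belongs to the nearest unmatched if. The function below wraps the slide's fragment so that this can be checked; dangling is a made-up name, and y starts at 0 so that "no branch taken" is visible.

```c
/* The else pairs with the nearest if, i.e. with  if (x < 3). */
static int dangling(int x, int a, int b)
{
    int y = 0;                  /* stays 0 when no branch assigns y */
    if (x < 5)
        if (x < 3) y = a - b;
        else       y = b - a;   /* pairs with  if (x < 3) */
    return y;
}
```

With x = 7 neither assignment runs, which would not be the case if the else were paired with the first if. Some compilers warn about code written like this and suggest adding braces.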
Inside iffy.output
State 8 conflicts: 1 shift/reduce

Grammar

    0 $accept: stmt $end

    1 stmt: expr
    2     | if_stmt

    3 if_stmt: IF expr stmt
    4        | IF expr stmt ELSE stmt

    5 expr: variable

    : // many state blocks
state 8

    3 if_stmt: IF expr stmt .
    4        | IF expr stmt . ELSE stmt

    ELSE  shift, and go to state 9

    ELSE      [reduce using rule 3 (if_stmt)]
    $default  reduce using rule 3 (if_stmt)

• the two items at the top are the kinds of parsing states bison is looking at
• bison does the shift
• but it could do the bracketed reduce instead
8. Left and Right Recursion
• A left recursive rule:
list: item | list ',' item ;
• A right recursive rule:
list: item | item ',' list ;
• Left recursion keeps the parse stack smaller, so may be a better choice
  – this is the opposite of top-down parsing, where left recursion has to be avoided
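The difference in stack use can be seen by counting symbols while stepping through the shifts and reduces by hand. The two C functions below are such a hand simulation for a list of n comma-separated items. They are not bison's code, and their names are made up.

```c
/* list: item | list ',' item ;   reduce as soon as each item is seen */
static int max_depth_left(int n)
{
    int depth = 0, max = 0;
    for (int i = 0; i < n; i++) {
        if (i > 0) depth++;             /* shift ','  */
        depth++;                        /* shift item */
        if (depth > max) max = depth;
        depth = 1;                      /* reduce to a single list */
    }
    return max;
}

/* list: item | item ',' list ;   nothing can be reduced until the end */
static int max_depth_right(int n)
{
    int depth = 0, max = 0;
    for (int i = 0; i < n; i++) {
        if (i > 0) depth++;             /* shift ','  */
        depth++;                        /* shift item */
        if (depth > max) max = depth;
    }
    /* only now: item => list, then item ',' list => list, n-1 times */
    return max;
}
```

For 100 items the left recursive rule never needs more than 3 symbols on the stack, while the right recursive rule needs 199.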
9. Error Recovery
• When an error occurs, yacc/bison calls yyerror() and then terminates.
• A better approach is to call yyerror(), then try to continue
  – this can be done by using the keyword error in the grammar rules
Example
• If there's an error in the stmt rule, then skip the rest of the input tokens until ';' or '}' is seen, then continue as before:
stmt: ';'
    | expr ';'
    | VAR '=' expr ';'
    | '{' stmt_list '}'
    | error ';'
    | error '}'
    ;
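The "skip until ';' or '}'" part can be sketched in plain C. This shows only the token-discarding half of the recovery (bison's own recovery also pops parser states until the error token can be shifted). It assumes, for simplicity, that tokens are single chars in a '\0'-terminated string, and skip_to_sync is a hypothetical name.

```c
/* Skip tokens until a synchronizing ';' or '}' has been consumed.
   Returns the index at which parsing would resume. */
static int skip_to_sync(const char *tokens, int pos)
{
    while (tokens[pos] != '\0' && tokens[pos] != ';' && tokens[pos] != '}')
        pos++;                          /* discard the bad token */
    if (tokens[pos] != '\0')
        pos++;                          /* consume the ';' or '}' itself */
    return pos;
}
```

For the token string "x=+;y=1;" with the error detected at the '+' (index 2), parsing resumes at index 4, the start of the next statement.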
10. Embedded Actions
• Actions can be placed anywhere in a rule, not just at the end:
listPair: item1 { do_item1($1); } item2 { do_item2($3); }
  – the attribute in the second action block is $3, since the first action block is counted as part of the rule (it takes position $2)
11. More Information
• Lex and Yacc
  by Levine, Mason, and Brown
  O'Reilly; 2nd edition
  – in our library
• On UNIX:
  – man yacc
  – info yacc
• A Compact Guide to Lex & Yacc
  by Tom Niemann
  http://epaperpress.com/lexandyacc/
  – with several yacc calculator examples, which I'll be discussing in the next few chapters
• The Lex & Yacc Page
  – documentation and tools
  http://dinosaur.compilertools.net/