241-437 Compilers: Yacc/7 1

Compiler Structures

• Objective
  – describe yacc (actually bison)
  – give simple examples of its use

241-437, Semester 1, 2011-2012

7. Yacc

Overview

1. What is Yacc?
2. Format of a yacc/bison File
3. Expressions Compiler
4. Bottom-up Parsing Reminder
5. Expression Conflicts
6. Precedence/Associativity in yacc
7. Dangling Else Conflict
8. Left and Right Recursion
9. Error Recovery
10. Embedded Actions
11. More Information

1. What is Yacc?

• Yacc (Yet Another Compiler Compiler) is a tool for translating a context-free grammar into a bottom-up LALR parser
  – it creates a parse table like the one described in the last chapter

• Yacc is used with lex to create compilers.

• Most people use bison, a much improved version of yacc
  – on many modern Unix-like systems, when you call yacc, you're really using bison

• bison works with flex (the fast version of lex).

Bison and Flex

$ flex foo.l
$ bison foo.y
$ gcc foo.tab.c -o foo
$ ./foo < program.txt

[Figure: flex turns foo.l (a flex file) into lex.yy.c; bison turns foo.y (a bison file) into foo.tab.c, which #includes lex.yy.c; the C compiler turns foo.tab.c into foo (the executable); foo reads a source program and produces parsed output.]

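Compiler Components (in foo)

[Figure: the syntax analyzer (foo.tab.c, working on tokens) and the lexical analyzer (lex.yy.c, working on chars).
  1. The syntax analyzer gets the next token by calling yylex().
  2. The lexical analyzer gets chars from the source program to make a token.
  3. The lexical analyzer returns the token, token value, and token type.
The lexical analyzer reports lexical errors; the syntax analyzer reports syntax errors and produces the parsed output.]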

Inside foo.tab.c

[Figure: the LALR parser reads the input tokens a1 a2 … ai … an $, keeps a stack s0 … Xm-1 sm-1 Xm sm (each X is a terminal or non-terminal, each s is a state), and consults a parse table of actions and gotos to produce the parsed output. bison creates the parse table based on your grammar.]

2. Format of a yacc/bison File

declarations: C data and yacc definitions (or nothing)

%%

Grammar rules (with actions)

%%

#include "lex.yy.c"

C functions, including main()

Declarations

• C data is put between %{ and %}

• The yacc definitions list the tokens (terminals) used in the grammar

    %token terminal1 terminal2 ...

• Other yacc definitions:
  – %left and %right for associativity (the order of these lines also sets precedence)
  – %prec, written inside a grammar rule, to override that rule's precedence

Precedence example: 2 + 3 * 5
  – does it mean (2 + 3) * 5
    or 2 + (3 * 5) ?

Associativity example: 1 - 1 - 1
  – does it mean (1 - 1) - 1   // left
    or 1 - (1 - 1) ?           // right

Rules

• Rule format:

    nonterminal : body 1 {action 1}
                | body 2 {action 2}
                  . . .
                | body n {action n}
                ;

  – the grammar part is the same as: nonterminal => body1 | body2 | ... | bodyN

• Actions are optional; they are C code.

• Actions are usually at the end of a body, but can be placed anywhere.

3. Expressions Compiler

$ flex expr.l
$ bison expr.y
$ gcc expr.tab.c -o exprEval

[Figure: flex turns expr.l (a flex file) into lex.yy.c; bison turns expr.y (a bison file) into expr.tab.c, which #includes lex.yy.c; gcc turns expr.tab.c into exprEval (the executable).]

Usage

$ ./exprEval
2 + 3
Value = 5
2 - (5 * 2)
Value = -8
1 / 3
Value = 0
$

I typed the expression lines, then ctrl-D to finish.

expr.l

%%

[-+*/()\n]   { return *yytext; }

[0-9]+       { yylval = atoi(yytext); return(NUMBER); }   /* + rather than *, so that a number needs at least one digit */

[ \t]        ;   /* skip whitespace */

%%

int yywrap(void) { return 1; }

• There is no main() function in the lex file.

• RE actions usually end with a return. The token is passed to the syntax analyser.
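To make the division of labour concrete, here is a minimal hand-written sketch of roughly what the flex rules in expr.l do. It is not the code flex generates. The names next_token and lexer_value, and the token code 258, are assumptions made for illustration: lexer_value stands in for yylval, and bison would assign the real NUMBER code.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical token code; bison would generate the real value for NUMBER. */
enum { NUMBER = 258 };

/* Hypothetical stand-in for yylval: the value of the last NUMBER token. */
int lexer_value;

/* Return the next token from *src and advance *src past it.
   Returns 0 at end of input, NUMBER for a run of digits, and the
   character itself for anything else (operators, parentheses, '\n'). */
int next_token(const char **src)
{
    while (**src == ' ' || **src == '\t')   /* skip whitespace, like [ \t] */
        (*src)++;

    if (**src == '\0')
        return 0;

    if (isdigit((unsigned char)**src)) {    /* like [0-9]+ */
        char *end;
        lexer_value = (int)strtol(*src, &end, 10);
        *src = end;
        return NUMBER;
    }

    return (unsigned char)*(*src)++;        /* like { return *yytext; } */
}
```

As in expr.l, each call hands back one token, and the caller (the parser) decides what to do with it. Unlike the flex version, this sketch returns every unrecognised character as a token instead of echoing it.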

Lex File Format Reminder

• A lex program has three sections:

    REs and/or C code
    %%
    RE/action rules
    %%
    C functions

expr.y

%{
/* declarations that newer C compilers need before the actions use them */
#include <stdio.h>
int yylex(void);
int yyerror(char *s);
%}

%token NUMBER                                   /* declarations */

%%
                                                /* rules; $$, $1, $2, ... are attributes */
exprs:  expr '\n'         { printf("Value = %d\n", $1); }
     |  exprs expr '\n'   { printf("Value = %d\n", $2); }
     ;

expr:   expr '+' term     { $$ = $1 + $3; }
    |   expr '-' term     { $$ = $1 - $3; }
    |   term              { $$ = $1; }
    ;

term:   term '*' factor   { $$ = $1 * $3; }
    |   term '/' factor   { $$ = $1 / $3; }   /* integer division */
    |   factor
    ;

factor: '(' expr ')'      { $$ = $2; }
      | NUMBER
      ;

%%
                                                /* C code */
#include "lex.yy.c"

int yyerror(char *s)
{ fprintf(stderr, "%s\n", s);
  return 0;
}

int main(void)
{ yyparse();   // the syntax analyzer
  return 0;
}

Yacc Actions

• yacc actions (the C code) can use attributes (variables).

• Each terminal/non-terminal in a body has an attribute, which contains its return value.

Attributes

• An attribute is $n, where n is the position of the terminal/non-terminal in the body, starting at 1
  – $1 = first terminal/non-terminal of the body
  – $2 = second one
  – etc.
  – $$ = return value for the rule

• The default value for $$ is the $1 value.

Evaluation in yacc

Input: 3 * 5 + 4\n

Stack        Input       Action               val_     Rule
$            3*5+4\n$    shift
$ 3          *5+4\n$     reduce F => num      3        $$ = $1 (implicit)
$ F          *5+4\n$     reduce T => F        3        $$ = $1 (implicit)
$ T          *5+4\n$     shift                3
$ T *        5+4\n$      shift                3
$ T * 5      +4\n$       reduce F => num      3 5      $$ = $1 (implicit)
$ T * F      +4\n$       reduce T => T * F    3 5      $$ = $1 * $3
$ T          +4\n$       reduce E => T        15       $$ = $1
$ E          +4\n$       shift                15
$ E +        4\n$        shift                15
$ E + 4      \n$         reduce F => num      15 4     $$ = $1 (implicit)
$ E + F      \n$         reduce T => F        15 4     $$ = $1 (implicit)
$ E + T      \n$         reduce E => E + T    15 4     $$ = $1 + $3
$ E          \n$         shift                19
$ E \n       $           reduce Es => E \n    19       printf $1
$ Es         $           accept               19

(E = expr, T = term, F = factor, Es = exprs, num = NUMBER; val_ is the stack of attribute values before the action is carried out.)
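The attribute bookkeeping in this trace can be sketched in plain C. This is only an illustration of the idea, not bison's generated code: ValueStack, vs_shift, vs_reduce, and replay_trace are hypothetical names, there is no bounds checking, and operator tokens are given a dummy value of 0. A reduce over a body of n symbols sees its values as $1..$n, computes $$, pops the n values, and pushes $$; with no action, $$ defaults to $1.

```c
#include <stddef.h>

enum { MAX_DEPTH = 64 };

/* One attribute slot per grammar symbol on the parse stack (hypothetical type). */
typedef struct {
    int slot[MAX_DEPTH];
    size_t depth;
} ValueStack;

/* An action receives a pointer to $1; $k is dollar[k - 1]. */
typedef int (*Action)(const int *dollar);

/* shift: push the token's attribute (the number's value, or a dummy for operators) */
void vs_shift(ValueStack *s, int value)
{
    s->slot[s->depth++] = value;
}

/* reduce: compute $$ from the top n slots, pop them, push $$ */
void vs_reduce(ValueStack *s, size_t n, Action action)
{
    const int *dollar = &s->slot[s->depth - n];
    int result = action ? action(dollar) : dollar[0];   /* default: $$ = $1 */
    s->depth -= n;
    s->slot[s->depth++] = result;
}

int act_mul(const int *d) { return d[0] * d[2]; }   /* $$ = $1 * $3 */
int act_add(const int *d) { return d[0] + d[2]; }   /* $$ = $1 + $3 */

/* Replays the shifts and reduces of the trace for "3 * 5 + 4",
   stopping before the final newline is shifted. */
int replay_trace(void)
{
    ValueStack s = { {0}, 0 };
    vs_shift(&s, 3);             /* shift 3            */
    vs_reduce(&s, 1, NULL);      /* F => num           */
    vs_reduce(&s, 1, NULL);      /* T => F             */
    vs_shift(&s, 0);             /* shift '*'          */
    vs_shift(&s, 5);             /* shift 5            */
    vs_reduce(&s, 1, NULL);      /* F => num           */
    vs_reduce(&s, 3, act_mul);   /* T => T * F         */
    vs_reduce(&s, 1, NULL);      /* E => T             */
    vs_shift(&s, 0);             /* shift '+'          */
    vs_shift(&s, 4);             /* shift 4            */
    vs_reduce(&s, 1, NULL);      /* F => num           */
    vs_reduce(&s, 1, NULL);      /* T => F             */
    vs_reduce(&s, 3, act_add);   /* E => E + T         */
    return s.slot[s.depth - 1];  /* the value printf would print as $1 */
}
```

Calling replay_trace() returns 19, matching the last rows of the val_ column.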

4. Bottom-up Parsing Reminder

• Simple expressions grammar:

    E => E '+' E   // rule r1
    E => E '*' E   // rule r2
    E => id        // rule r3

Parsing "x + y * z"

     1. . x + y * z       // shift
     2. x . + y * z       // reduce (r3)
     3. E . + y * z       // shift
     4. E + . y * z       // shift
     5. E + y . * z       // reduce (r3)
     6. E + E . * z       // shift
     7. E + E * . z       // shift
     8. E + E * z .       // reduce (r3)
     9. E + E * E .       // reduce (r2)
    10. E + E .           // reduce (r1)
    11. E .               // accept

Shift/Reduce Conflict

• At step 6, a shift or a reduce is possible. Instead of shifting, the parser could reduce:

    6. E + E . * z       // reduce (r1)
    7. E . * z
       :

• What should be done?
  – by default, yacc (bison) shifts
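The choice at step 6 changes the value that is computed, not just the shape of the parse. A minimal sketch, with hypothetical function names, of the arithmetic each choice leads to for x + y * z:

```c
/* Reduce at step 6: E + E becomes a single E before '*' is read,
   so the sum is formed first, i.e. (x + y) * z. */
int reduce_first(int x, int y, int z)
{
    int e = x + y;      /* reduce (r1) */
    return e * z;       /* later reduce (r2) */
}

/* Shift at step 6: '*' and z go onto the stack, E * E is reduced first,
   so the product is formed first, i.e. x + (y * z). */
int shift_first(int x, int y, int z)
{
    int e = y * z;      /* reduce (r2) */
    return x + e;       /* later reduce (r1) */
}
```

For 2 + 3 * 5, reduce_first gives 25 and shift_first gives 17. Shifting happens to give the usual answer here, but always shifting would also read x * y + z as x * (y + z), so the default does not solve the problem in general.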

Reduce/Reduce Conflict

• Modify the grammar to include:

    E => T    // new rule r3
    E => id   // rule r4
    T => id   // rule r5

• Consider step 2: x . + y * z

• There are two ways to reduce:

    E . + y * z   // reduce (r4)
  or
    T . + y * z   // reduce (r5)

• What should be done?
  – by default, yacc (bison) reduces using the first possible rule (i.e. rule r4)

Common Conflicts

• The two most common shift/reduce problems in programming languages are:
  – expression precedence
  – dangling else

• yacc has features for fixing both of these.

• Reduce/reduce problems are usually due to errors in your grammar.

Debugging Conflicts

• bison can generate extra conflict information, which can help you debug your grammar.
  – use the -v option

5. Expression Conflicts

shiftE.y:

%token NUMBER

%%

expr: expr '+' expr     /* shift/reduce conflicts here, as in the previous example */
    | expr '*' expr
    | '(' expr ')'
    | NUMBER
    ;

%%

#include "lex.yy.c"

int yyerror(char *s)
{ fprintf(stderr, "%s\n", s);
  return 0;
}

int main(void)
{ yyparse();
  return 0;
}

Example

• When the parsing state is:

    expr '+' expr . '*' z

  should bison shift:

    expr '+' expr '*' . z

  or reduce?:

    expr . '*' z   // using rule 1

Using -v

$ bison shiftE.y
shiftE.y: conflicts: 4 shift/reduce

$ bison -v shiftE.y
shiftE.y: conflicts: 4 shift/reduce

– the -v option creates a shiftE.output file with extra conflict information

Inside shiftE.output

State 9 conflicts: 2 shift/reduce         <- states 9 and 10 are the problems
State 10 conflicts: 2 shift/reduce

Grammar                                   <- the rules are numbered

    0 $accept: expr $end

    1 expr: expr '+' expr
    2     | expr '*' expr
    3     | '(' expr ')'
    4     | NUMBER

   :   // many state blocks

state 9

    1 expr: expr . '+' expr
    1     | expr '+' expr .
    2     | expr . '*' expr

    '+'  shift, and go to state 6               <- bison does this
    '*'  shift, and go to state 7

    '+'       [reduce using rule 1 (expr)]      <- but it could do this
    '*'       [reduce using rule 1 (expr)]
    $default  reduce using rule 1 (expr)

state 10

    1 expr: expr . '+' expr
    2     | expr . '*' expr
    2     | expr '*' expr .

    '+'  shift, and go to state 6               <- bison does this
    '*'  shift, and go to state 7

    '+'       [reduce using rule 2 (expr)]      <- but it could do this
    '*'       [reduce using rule 2 (expr)]
    $default  reduce using rule 2 (expr)

In each state block, the dotted rules show the kinds of parsing states bison is looking at when it makes the choice.

What causes Expression Conflicts?

• The problems are the precedence and associativity of the operators:
  – does 2 + 3 * 5 mean (2 + 3) * 5 or 2 + (3 * 5) ?   // should be 2nd
  – does 1 - 1 - 1 mean (1 - 1) - 1 or 1 - (1 - 1) ?   // should be 1st

• * should have higher precedence than +, and - should be left associative.

6. Precedence/Associativity in yacc

• The declarations section can contain associativity and precedence settings for tokens:
  – %left, %right, %nonassoc
  – precedence is given by the order of the lines: later lines have higher precedence

• Example:

    %left '+' '-'
    %left '*' '/'

  All left associative, with '*' and '/' higher precedence than '+' and '-'.
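To see what these two declaration lines mean for evaluation, here is a small self-contained evaluator that applies the same table: level 1 for '+' and '-', level 2 for '*' and '/', all left associative. Note that it uses precedence climbing, a top-down technique, which is not how bison resolves conflicts in its LALR table; it is only a sketch of the resulting behaviour. All function names are hypothetical, and there is no error handling (malformed input and division by zero are not checked).

```c
#include <stdlib.h>

/* Levels mirror the order of the %left lines; 0 means "not a binary operator". */
int precedence(char op)
{
    switch (op) {
    case '+': case '-': return 1;   /* %left '+' '-' (first line, lower)   */
    case '*': case '/': return 2;   /* %left '*' '/' (second line, higher) */
    default:            return 0;
    }
}

void skip_spaces(const char **s)
{
    while (**s == ' ' || **s == '\t')
        (*s)++;
}

int parse_expr(const char **s, int min_prec);

/* A primary is a number or a parenthesised expression. */
int parse_primary(const char **s)
{
    skip_spaces(s);
    if (**s == '(') {
        (*s)++;
        int inner = parse_expr(s, 1);
        skip_spaces(s);
        if (**s == ')')
            (*s)++;
        return inner;
    }
    char *end;
    int value = (int)strtol(*s, &end, 10);
    *s = end;
    return value;
}

/* Parse operators whose level is at least min_prec. */
int parse_expr(const char **s, int min_prec)
{
    int left = parse_primary(s);
    for (;;) {
        skip_spaces(s);
        char op = **s;
        int prec = precedence(op);
        if (prec == 0 || prec < min_prec)
            break;
        (*s)++;
        /* Left associative: the right operand may only contain
           operators of strictly higher precedence. */
        int right = parse_expr(s, prec + 1);
        switch (op) {
        case '+': left += right; break;
        case '-': left -= right; break;
        case '*': left *= right; break;
        case '/': left /= right; break;   /* integer division */
        }
    }
    return left;
}

int eval(const char *text)
{
    return parse_expr(&text, 1);
}
```

With this table, eval("2 + 5 * 3") is 17 and eval("1 - 1 - 1") is -1. Making an operator right associative would mean recursing with prec instead of prec + 1.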

Expressions Variables Compiler

$ flex exprVars.l
$ bison exprVars.y
$ gcc exprVars.tab.c -o exprVarsEval

[Figure: flex turns exprVars.l (a flex file) into lex.yy.c; bison turns exprVars.y (a bison file) into exprVars.tab.c, which #includes lex.yy.c; gcc turns exprVars.tab.c into exprVarsEval (the executable).]

Usage

$ ./exprVarsEval
2 + 5 * 3
Value = 17
1 - 1 - 1
Value = -1
a = 3 * 4
a
Value = 12
b = (3 - 6) * a
b
Value = -36
$

I typed the expression and assignment lines, then ctrl-D to finish.

exprVars.l

/* Added: RE vars, token names, VAR token, assignment, error msgs */

digits  [0-9]+
letter  [a-z]

%%

\n        return('\n');
\=        return(ASSIGN);
\+        return(PLUS);
\-        return(MINUS);
\*        return(TIMES);
\/        return(DIV);
\(        return(LPAREN);
\)        return(RPAREN);
          /* the token names are defined in the yacc file */

{letter}  { yylval = *yytext - 'a'; return(VAR); }

{digits}  { yylval = atoi(yytext); return(NUMBER); }

[ \t]     ;   /* skip whitespace */

.         yyerror("Invalid char");   /* reject everything else */

%%

int yywrap(void)
{ return 1; }

exprVars.y

/* Added: token names, assoc/precedence ops, changed grammar rules, vars and assignment. */

%token VAR NUMBER ASSIGN PLUS MINUS TIMES DIV LPAREN RPAREN

%left PLUS MINUS
%left TIMES DIV

%{
  /* declarations that newer C compilers need before the actions use them */
  #include <stdio.h>
  int yylex(void);
  int yyerror(char *s);

  int symbol[26];   // stores var's values
%}

%%

program: program statement '\n'
       |
       ;

statement: expr                { printf("Value = %d\n", $1); }
         | VAR ASSIGN expr     { symbol[$1] = $3; }
         ;

expr: NUMBER
    | VAR                      { $$ = symbol[$1]; }
    | expr PLUS expr           { $$ = $1 + $3; }
    | expr MINUS expr          { $$ = $1 - $3; }
    | expr TIMES expr          { $$ = $1 * $3; }
    | expr DIV expr            { $$ = $1 / $3; }   /* integer division */
    | LPAREN expr RPAREN       { $$ = $2; }
    ;

%%

#include "lex.yy.c"

int yyerror(char *s)
{ fprintf(stderr, "%s\n", s);
  return 0;
}

int main(void)
{ yyparse();
  return 0;
}

7. Dangling Else Conflict

iffy.y:

%token IF ELSE variable

%%

stmt: expr
    | if_stmt
    ;

if_stmt: IF expr stmt
       | IF expr stmt ELSE stmt
       ;

expr: variable
    ;

$ bison -v iffy.y
iffy.y: conflicts: 1 shift/reduce

Shift or Reduce?

• Current state:
  – IF expr IF expr stmt . ELSE stmt

• Shift choice:
  – IF expr IF expr stmt . ELSE stmt
  – IF expr IF expr stmt ELSE . stmt
  – IF expr IF expr stmt ELSE stmt .
  – IF expr stmt .

  The ELSE is paired with the second (nearest) IF:

    if (x < 5)
      if (x < 3)
        y = a - b;
      else
        y = b - a;

• Reduce option:
  – IF expr IF expr stmt . ELSE stmt
  – IF expr stmt . ELSE stmt
  – IF expr stmt ELSE . stmt
  – IF expr stmt ELSE stmt .

  The ELSE is paired with the first IF:

    if (x < 5)
      if (x < 3)
        y = a - b;
    else
      y = b - a;
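The two readings can be written out in C with explicit braces, which removes the ambiguity and shows that they really are different programs. The function names are hypothetical, and y starts at 0 so that "no assignment happened" is visible in the result.

```c
/* Shift reading: the else belongs to the nearest (second) if.
   This is also how C itself reads the unbraced statement. */
int else_with_inner_if(int x, int a, int b)
{
    int y = 0;
    if (x < 5) {
        if (x < 3)
            y = a - b;
        else
            y = b - a;
    }
    return y;
}

/* Reduce reading: the inner if is finished first,
   so the else belongs to the first if. */
int else_with_outer_if(int x, int a, int b)
{
    int y = 0;
    if (x < 5) {
        if (x < 3)
            y = a - b;
    } else {
        y = b - a;
    }
    return y;
}
```

With a = 10 and b = 3, the readings disagree for x = 4 (inner gives -7, outer leaves y at 0) and for x = 7 (inner leaves y at 0, outer gives -7). Since bison shifts by default, it produces the first reading.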

Inside iffy.output

State 8 conflicts: 1 shift/reduce

Grammar

    0 $accept: stmt $end

    1 stmt: expr
    2     | if_stmt

    3 if_stmt: IF expr stmt
    4        | IF expr stmt ELSE stmt

    5 expr: variable

   :   // many state blocks

state 8

    3 if_stmt: IF expr stmt .
    4        | IF expr stmt . ELSE stmt

    ELSE  shift, and go to state 9               <- bison does this

    ELSE      [reduce using rule 3 (if_stmt)]    <- but it could do this
    $default  reduce using rule 3 (if_stmt)

The dotted rules show the kind of parsing state bison is looking at when it makes the choice.

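8. Left and Right Recursion

• A left recursive rule:

    list: item
        | list ',' item
        ;

• A right recursive rule:

    list: item
        | item ',' list
        ;

• Left recursion keeps the parser's stack smaller, so may be a better choice
  – with left recursion, each item can be reduced into the list as soon as it is read; with right recursion, every item must be shifted before the first list reduction
  – this is the opposite of top-down parsing, where left recursion has to be avoided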
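The stack-size difference can be checked by counting symbols on the parse stack while a list of n items is read. The sketch below does not run a real LALR parser; it hand-simulates the shifts and reduces that the two rules lead to, and the function names are hypothetical.

```c
/* Maximum stack depth for n items (n >= 1) with the left-recursive rule
       list: item | list ',' item ;                                      */
int max_depth_left(int n)
{
    int depth = 0, max = 0;
    for (int i = 0; i < n; i++) {
        if (i > 0)
            depth++;             /* shift ','  (stack: list ',')       */
        depth++;                 /* shift item                         */
        if (depth > max)
            max = depth;
        depth = 1;               /* reduce at once to a single list    */
    }
    return max;
}

/* Maximum stack depth for n items (n >= 1) with the right-recursive rule
       list: item | item ',' list ;                                      */
int max_depth_right(int n)
{
    int depth = 0, max = 0;
    for (int i = 0; i < n; i++) {
        if (i > 0)
            depth++;             /* shift ','                          */
        depth++;                 /* shift item; no reduce is possible
                                    until the last item has been read  */
        if (depth > max)
            max = depth;
    }
    /* Only after the loop do the reductions start, from the last item
       backwards, so the peak depth has already been reached. */
    return max;
}
```

For 100 items the left-recursive rule never needs more than 3 stack entries, while the right-recursive rule needs 199 (100 items plus 99 commas).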

9. Error Recovery

• By default, when an error occurs, yacc/bison calls yyerror() and then stops parsing.

• A better approach is to call yyerror(), then try to continue
  – this can be done by using the keyword error in the grammar rules

Example

• If there's an error in the stmt rule, then skip the rest of the input tokens until ';' or '}' is seen, then continue as before:

    stmt: ';'
        | expr ';'
        | VAR '=' expr ';'
        | '{' stmt_list '}'
        | error ';'
        | error '}'
        ;
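The "skip until ';' or '}'" part of this recovery can be sketched as a small C function. This is a simplification for illustration, not bison's implementation: bison also pops parser states until it reaches one where the error token can be shifted. Here tokens are single characters in an array, and skip_to_sync is a hypothetical name.

```c
#include <stddef.h>

/* Starting at index pos, discard tokens up to and including the next
   ';' or '}'.  Returns the index of the first token after it, or n if
   no synchronising token is found. */
size_t skip_to_sync(const char *tokens, size_t n, size_t pos)
{
    while (pos < n) {
        char t = tokens[pos++];
        if (t == ';' || t == '}')
            break;               /* synchronising token consumed */
    }
    return pos;
}
```

For the token sequence a = + ; b ; with an error detected at the '+' (index 2), the function returns 4, so parsing resumes at b.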

10. Embedded Actions

• Actions can be placed anywhere in a rule, not just at the end:

    listPair: item1 { do_item1($1); } item2 { do_item2($3); }

  – the action variable in the second action block is $3, since the first action block is counted as part of the rule (it takes position $2)

11. More Information

• Lex and Yacc
  by Levine, Mason, and Brown
  O'Reilly; 2nd edition
  – in our library

• On UNIX:
  – man yacc
  – info yacc

• A Compact Guide to Lex & Yacc
  by Tom Niemann
  http://epaperpress.com/lexandyacc/
  – with several yacc calculator examples, which I'll be discussing in the next few chapters

• The Lex & Yacc Page
  – documentation and tools
  http://dinosaur.compilertools.net/

• Compiler Construction using Flex and Bison
  by Anthony A. Aaby
  – in the "Useful Info" subdirectory of the course website