
Syntax Analysis Part IV

Chapter 4: Bison

Slides adapted from: © Robert van Engelen, Florida State University

Yacc and Bison

•  Yacc (Yet Another Compiler Compiler)
   –  Generates LALR(1) parsers

•  Bison
   –  Improved version of Yacc

Creating an LALR(1) Parser with Yacc/Bison

[Figure: build pipeline]

     yacc specification (yacc.y)  →  Bison compiler  →  yacc.tab.c
     yacc.tab.c                   →  C compiler      →  a.out
     input stream                 →  a.out           →  output stream

Bison Specification

•  A Bison specification consists of three parts:

     Bison declarations, and C declarations within %{ %}
     %%
     translation rules
     %%
     user-defined auxiliary procedures

•  The translation rules are CFG productions with actions:

     production1 { semantic action1 }
     production2 { semantic action2 }
     …
     productionn { semantic actionn }

Writing a Grammar in Yacc

•  Productions in Yacc are of the form

     Nonterminal : tokens/nonterminals { action }
                 | tokens/nonterminals { action }
                 …
                 ;

•  Tokens that are single characters can be used directly within productions, e.g. '+'

•  Named tokens must be declared first in the declaration part using

     %token TokenName

Synthesized Attributes

•  Semantic actions may refer to attributes of terminals and nonterminals in a production:

     X : Y1 Y2 Y3 … Yn { action }

   –  $$ refers to the value of the attribute of X
   –  $i refers to the value of the attribute of Yi

•  Example:

     factor : '(' expr ')' { $$ = $2; }

   [Figure: parse tree in which factor has the children ( expr ); the action $$ = $2 copies expr.val = x up to factor.val = x]
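The effect of $$ = $2 can be pictured as an operation on the parser's value stack. The following is only a sketch of that idea, not Bison's generated code: the ValueStack type and the helpers push, rhs and reduce_paren are hypothetical names introduced for illustration, and semantic values are assumed to be int.

```c
#include <assert.h>

/* Hypothetical miniature value stack. A generated parser keeps a similar,
   more elaborate stack of semantic values next to its state stack. */
#define STACK_MAX 16

typedef struct {
    int vals[STACK_MAX];
    int top;                    /* number of values currently on the stack */
} ValueStack;

static void push(ValueStack *s, int v) {
    assert(s->top < STACK_MAX);
    s->vals[s->top++] = v;
}

/* Value of the i-th right-hand-side symbol ($i) of a production whose
   right-hand side has n symbols, all of which are on top of the stack. */
static int rhs(const ValueStack *s, int n, int i) {
    assert(1 <= i && i <= n && n <= s->top);
    return s->vals[s->top - n + (i - 1)];
}

/* Reduce by  factor : '(' expr ')'  { $$ = $2; }  */
static void reduce_paren(ValueStack *s) {
    int result = rhs(s, 3, 2);  /* $$ = $2                      */
    s->top -= 3;                /* pop the values of ( expr )   */
    push(s, result);            /* push the value of factor     */
}
```

After pushing the values for '(', an expr worth 42, and ')', calling reduce_paren leaves a single value 42 on the stack, which is the attribute of factor.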

Example I : Incomplete Code

     %{
     #include <ctype.h>
     %}
     %token DIGIT
     %%
     line   : expr '\n'          { printf("%d\n", $1); }
            ;
     expr   : expr '+' term      { $$ = $1 + $3; }
            | term               { $$ = $1; }
            ;
     term   : term '*' factor    { $$ = $1 * $3; }
            | factor             { $$ = $1; }
            ;
     factor : '(' expr ')'       { $$ = $2; }
            | DIGIT              { $$ = $1; }
            ;
     %%
     int yylex()
     {
       int c = getchar();
       if (isdigit(c)) {
         yylval = c - '0';
         return DIGIT;
       }
       return c;
     }

•  In expr : expr '+' term, $$ is the attribute of expr (parent), $1 is the attribute of expr (child), and $3 is the attribute of term (child)

•  In factor : DIGIT, $1 is the attribute of the token (stored in yylval)

•  yylex() is an example of a very crude lexical analyzer invoked by the parser

Example I : Complete Code

     %{
     #include <ctype.h>
     #include <stdio.h>
     int yylex();
     void yyerror(char *s);
     %}
     %token DIGIT
     %%
     line   : expr '\n'          { printf("%d\n", $1); }
            ;
     expr   : expr '+' term      { $$ = $1 + $3; }
            | term               { $$ = $1; }
            ;
     term   : term '*' factor    { $$ = $1 * $3; }
            | factor             { $$ = $1; }
            ;
     factor : '(' expr ')'       { $$ = $2; }
            | DIGIT              { $$ = $1; }
            ;
     %%
     int yylex()
     {
       int c = getchar();
       if (isdigit(c)) {
         yylval = c - '0';
         return DIGIT;
       }
       return c;
     }

     int main()
     {
       if (yyparse() != 0)
         fprintf(stderr, "Abnormal exit\n");
       return 0;
     }

     void yyerror(char *s)
     {
       fprintf(stderr, "Error: %s\n", s);
     }

Building and running:

     bison calculator.y
     gcc calculator.tab.c
     ./a.out
     5+8*(9+2)*3
     269
     ^D

Tokens

•  Two types of tokens: literal and symbolic

•  Literal tokens are represented using the corresponding C character constant (ASCII code)

•  Symbolic tokens are represented as numbers higher than any possible character's code, so they will not conflict with any literal tokens

•  Numbers can be forced by declaration

     %token NUMBER 621
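The split between literal and symbolic token codes can be sketched in plain C. This is an illustration under stated assumptions: the value 258 for DIGIT reflects the numbering Bison conventionally starts from and is an implementation detail, and the enum token_code and the helper is_literal_token are hypothetical names, not part of any generated header.

```c
#include <assert.h>
#include <limits.h>

/* Assumption: the first symbolic token is numbered 258 by default; the
   exact value is an implementation detail shown only for illustration. */
enum token_code {
    DIGIT  = 258,   /* assigned automatically, as for  %token DIGIT      */
    NUMBER = 621    /* forced by the declaration       %token NUMBER 621 */
};

/* Returns 1 if code denotes a literal (single-character) token, i.e. it
   fits in the range of character codes, and 0 for a symbolic token. */
static int is_literal_token(int code) {
    assert(code >= 0);
    return code <= UCHAR_MAX;
}
```

A lexer that returns '+' therefore can never be mistaken for one that returns DIGIT or NUMBER, because the two ranges do not overlap.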

Tokens

•  When using symbolic tokens, run Bison with the -d option to create a C header file with definitions

•  If Bison is combined with Flex, add

     #include "xxx.tab.h"

   in the lexer file, where xxx.y is the source file

Symbol Values

•  Both tokens and nonterminals have an associated semantic value
   –  For tokens, the value is stored in the C variable yylval
   –  For nonterminals, the value is stored in the $$, $1, … pseudo-variables

•  The associated semantic value is of type YYSTYPE, declared as int by default

•  YYSTYPE can be redefined using the C preprocessor directive #define
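As a sketch of the last point, the prologue below switches the semantic value type to double with #define. The commented alternative, %define api.value.type, is an assumption that applies only to Bison 3.0 or later and is not part of the original slides.

```yacc
%{
/* Classic Yacc style: redefine the semantic value type in the prologue. */
#define YYSTYPE double
%}

/* Assumed alternative for Bison 3.0 or later (use one or the other):
   %define api.value.type {double}
*/
%token NUMBER
```

With either form, yylval and the $$, $1, … pseudo-variables hold double values instead of int.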

Dealing with Ambiguous Grammars

•  A description of parsing action conflicts can be obtained using the -v option, which produces an additional file y.output (Bison names the file after the grammar, e.g. xxx.output for xxx.y, unless it is run with -y)

•  Reduce/reduce conflicts are resolved by using the conflicting production listed first

•  Shift/reduce conflicts are resolved in favor of shift
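The classic dangling-else grammar is a small case where the shift default matters. The fragment below is a sketch with hypothetical token names (IF, ELSE, EXPR, OTHER); Bison is expected to report one shift/reduce conflict for it, in the state reached after IF EXPR stmt with lookahead ELSE.

```yacc
%token IF ELSE EXPR OTHER
%%
stmt : IF EXPR stmt              /* reduce here, or ...            */
     | IF EXPR stmt ELSE stmt    /* ... shift ELSE and keep going? */
     | OTHER
     ;
```

Resolving the conflict in favor of shift attaches each ELSE to the nearest unmatched IF, which is the behavior most languages specify.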

Dealing with Ambiguous Grammars

•  We can also deal with ambiguous grammars by defining operator precedence levels and left/right associativity of the operators

•  Example:

     %left '+' '-'
     %left '*' '/'
     %right UMINUS

Dealing with Ambiguous Grammars

•  Productions are also given precedence and associativity, inherited from their rightmost terminal

•  Example: item [E → E + E •] and lookahead + resolved with reduction (+ left associative)

•  Example: item [E → E + E •] and lookahead * resolved with shift (* higher precedence)
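The two examples follow one decision rule, sketched below in C. The names Assoc, Action and resolve are hypothetical, precedence levels are assumed to be positive integers that grow with the order of the %left/%right/%nonassoc declarations, and the sketch covers only the case where both the production and the lookahead token have a declared precedence.

```c
#include <assert.h>

typedef enum { ASSOC_LEFT, ASSOC_RIGHT, ASSOC_NONE } Assoc;
typedef enum { ACT_SHIFT, ACT_REDUCE, ACT_ERROR } Action;

/* Resolve a shift/reduce conflict between a completed production and the
   lookahead token. rule_prec is the precedence of the production (taken
   from its rightmost terminal or from %prec); tok_prec and tok_assoc
   belong to the lookahead token. Larger numbers bind tighter. */
static Action resolve(int rule_prec, int tok_prec, Assoc tok_assoc) {
    assert(rule_prec > 0 && tok_prec > 0);
    if (rule_prec > tok_prec) return ACT_REDUCE;  /* production binds tighter */
    if (rule_prec < tok_prec) return ACT_SHIFT;   /* lookahead binds tighter  */
    switch (tok_assoc) {                          /* equal precedence levels  */
    case ASSOC_LEFT:  return ACT_REDUCE;
    case ASSOC_RIGHT: return ACT_SHIFT;
    default:          return ACT_ERROR;           /* %nonassoc                */
    }
}
```

With + at level 1 and * at level 2, both left associative, resolve(1, 1, ASSOC_LEFT) gives a reduction for [E → E + E •] with lookahead +, and resolve(1, 2, ASSOC_LEFT) gives a shift for lookahead *.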

Dealing with Ambiguous Grammars

•  Can force different precedence by attaching to a production %prec 〈terminal〉

•  Symbol 〈terminal〉 can be a placeholder: this terminal is never used by the lexical analyzer, but indicates a precedence (see later examples)

Combining Bison with Flex

[Figure: build pipeline]

     Yacc or Bison specification (yacc.y)    →  Yacc or Bison compiler  →  y.tab.c, y.tab.h
     Lex or Flex specification (lex.l)
       and token definitions (y.tab.h)       →  Lex or Flex compiler    →  lex.yy.c
     lex.yy.c, y.tab.c                       →  C compiler              →  a.out
     input stream                            →  a.out                   →  output stream

Example II : Bison

     %{
     #include <ctype.h>
     #include <stdio.h>
     #define YYSTYPE double      /* double type for nonterminal attributes and yylval */
     int yylex();                /* declared so that the generated parser compiles cleanly */
     void yyerror(char *s);
     %}
     %token NUMBER
     %left '+' '-'
     %left '*' '/'
     %right UMINUS               /* terminal placeholder */
     %%
     lines : lines expr '\n'          { printf("%g\n", $2); }
           | lines '\n'
           | /* empty */
           ;
     expr  : expr '+' expr            { $$ = $1 + $3; }
           | expr '-' expr            { $$ = $1 - $3; }
           | expr '*' expr            { $$ = $1 * $3; }
           | expr '/' expr            { $$ = $1 / $3; }
           | '(' expr ')'             { $$ = $2; }
           | '-' expr %prec UMINUS    { $$ = -$2; }
           | NUMBER
           ;
     %%
     /* Run the parser */
     int main()
     {
       if (yyparse() != 0)
         fprintf(stderr, "Abnormal exit\n");
       return 0;
     }

     /* Invoked by the parser to report parse errors */
     void yyerror(char *s)
     {
       fprintf(stderr, "Error: %s\n", s);
     }

Example II : Flex

     %option noyywrap
     %{
     #define YYSTYPE double        /* must match the definition in example.y */
     #include "example.tab.h"      /* generated by Bison, contains #define NUMBER xxx */
     extern YYSTYPE yylval;        /* defined in example.tab.c */
     %}
     number  [0-9]+\.?|[0-9]*\.[0-9]+
     %%
     [ \t]     { /* skip blanks */ }
     {number}  { sscanf(yytext, "%lf", &yylval);
                 return NUMBER; }
     \n|.      { return yytext[0]; }

Building and running:

     bison -d example.y
     flex example.l
     gcc example.tab.c lex.yy.c
     ./a.out
     2.4*(6+13)
     45.6
     ^D

Error Recovery in Yacc

     %{
     …
     %}
     …
     %%
     lines : lines expr '\n'   { printf("%g\n", $2); }
           | lines '\n'
           | /* empty */
           | error '\n'        { yyerror("reenter last line: ");
                                 yyerrok; }
           ;

•  Error production (error '\n'): set error mode and skip input until newline

•  yyerrok: reset parser to normal mode

Symbol Values

•  If several types are needed for grammar symbols, a union type must be defined

•  The %union directive identifies all of the possible C types that a symbol value can have

•  The field declarations are copied verbatim into a C union declaration of the type YYSTYPE

•  In the absence of a %union declaration, Bison defines YYSTYPE to be int

Symbol Values

•  Associate the types declared in %union with specific grammar symbols using
   –  the %type declaration for nonterminals
   –  the %token declaration for tokens

Example

     %union {
       double dval;
       char *sval;
     }
     ...
     %token <dval> REAL      // token
     %token <sval> STRING    // token
     %type <dval> expr       // nonterminal
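On the C side, the %union example corresponds roughly to the union type below. This is a sketch rather than the exact text Bison emits, and the helpers make_real and make_string are hypothetical stand-ins for what a scanner does before returning REAL or STRING.

```c
#include <assert.h>
#include <string.h>

/* Roughly the C type described by the %union declaration (sketch). */
typedef union {
    double dval;   /* used by REAL and expr  (declared with <dval>) */
    char  *sval;   /* used by STRING         (declared with <sval>) */
} YYSTYPE;

/* What a scanner would store in yylval before returning REAL. */
static YYSTYPE make_real(double d) {
    YYSTYPE v;
    v.dval = d;
    return v;
}

/* What a scanner would store in yylval before returning STRING. */
static YYSTYPE make_string(char *s) {
    YYSTYPE v;
    assert(s != NULL);
    v.sval = s;
    return v;
}
```

Because expr is declared with <dval>, a reference such as $1 in an action on expr reads the dval member of the union, so the programmer never names the member explicitly inside the rules.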

Exercise III

•  Write an interpreter for a combination of arithmetic expressions and Boolean expressions

•  Use the %union directive to deal with different data types

Exercise III : Flex

     %option noyywrap
     %{
     #include <stdlib.h>
     #include "boolean.tab.h"
     %}
     fract  [0-9]+\.?|[0-9]*\.[0-9]+
     %%
     [ \t]    { /* skip blanks */ }
     "&&"     { return AND; }
     "||"     { return OR; }
     "!"      { return NOT; }
     "<"      { return LT; }
     "<="     { return LE; }
     ">"      { return GT; }
     ">="     { return GE; }
     "=="     { return EQ; }
     {fract}  { yylval.val = atof(yytext);
                return FRACT; }
     \n|.     { return yytext[0]; }

Exercise III : Bison

     %{
     #include <ctype.h>
     #include <stdio.h>
     int yylex();
     void yyerror(char *s);
     %}
     %union {
       double val;
       int bool;      // 1 == true, 0 == false
     }
     %token <val> FRACT
     %type <val> expr
     %type <bool> comp
     %type <bool> bexpr
     %left '+' '-'
     %left '*' '/'
     %right UMINUS
     %left OR
     %left AND
     %right NOT
     %nonassoc EQ LT GT LE GE
     %%
     lines : lines bexpr '\n'   { printf("%d\n", $2); }
           | lines '\n'
           | /* empty */
           ;
     bexpr : bexpr OR bexpr     { if ($1 == 1 || $3 == 1) $$ = 1; else $$ = 0; }
           | bexpr AND bexpr    { if ($1 == 1 && $3 == 1) $$ = 1; else $$ = 0; }
           | NOT bexpr          { if ($2 == 1) $$ = 0; else $$ = 1; }
           | '(' bexpr ')'      { $$ = $2; }
           | comp               { $$ = $1; }
           ;

Exercise III : Bison (continued)

     comp  : expr LT expr       { if ($1 < $3) $$ = 1; else $$ = 0; }
           | expr LE expr       { if ($1 <= $3) $$ = 1; else $$ = 0; }
           | expr GE expr       { if ($1 >= $3) $$ = 1; else $$ = 0; }
           | expr GT expr       { if ($1 > $3) $$ = 1; else $$ = 0; }
           | expr EQ expr       { if ($1 == $3) $$ = 1; else $$ = 0; }
     //    | '(' comp ')'       { $$ = $2; }
           ;
     expr  : expr '+' expr            { $$ = $1 + $3; }
           | expr '-' expr            { $$ = $1 - $3; }
           | expr '*' expr            { $$ = $1 * $3; }
           | expr '/' expr            { $$ = $1 / $3; }
           | '(' expr ')'             { $$ = $2; }
           | '-' expr %prec UMINUS    { $$ = -$2; }
           | FRACT
           ;
     %%
     int main()
     {
       if (yyparse() != 0)
         fprintf(stderr, "Abnormal exit\n");
       return 0;
     }

     void yyerror(char *s)
     {
       fprintf(stderr, "Error: %s\n", s);
     }

Summary of Commands

•  To run Bison + Flex:

     bison -d -o parser.c parser.y
     flex -o scanner.c scanner.l
     gcc -o parser parser.c scanner.c
     ./parser < input > output