Yacc workflow: BNF grammar (example.y) -> yacc -> example.tab.c -> C compiler + linker (with other modules) -> executable
Post on 18-Dec-2015
Yacc: what is it?
Yacc: a tool for automatically generating a parser given a grammar written in a yacc specification (.y file). The grammars accepted are LALR(1) grammars with disambiguating rules.
A grammar specifies a set of production rules, which define a language. Each production rule specifies a sequence of symbols that is legal in the language.
Structure of Yacc
Usually Lex and Yacc work together:
yylex(): called to get the next token
To call the parser, the function yyparse() is invoked
How the parser works
The parser produced by Yacc consists of a finite state machine with a stack.
A move of the parser is done as follows:
It calls yylex to obtain the next token when needed.
Using the current state and the lookahead token, the parser decides on its next action (shift, reduce, accept or error) and carries it out.
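The move described above can be sketched in plain C. This is only an illustration, not the code Yacc generates: the states are hand-built for the toy grammar E : E '+' NUM | NUM, and the names parse, next_token, TOK_NUM and TOK_END are hypothetical.

```c
#include <ctype.h>

enum { TOK_END = 0, TOK_NUM = 257 };   /* hypothetical token codes */

/* Minimal stand-in for yylex(): returns the next token and advances *p. */
static int next_token(const char **p) {
    while (**p == ' ' || **p == '\t') (*p)++;
    if (**p == '\0') return TOK_END;
    if (isdigit((unsigned char)**p)) {
        while (isdigit((unsigned char)**p)) (*p)++;
        return TOK_NUM;
    }
    return (unsigned char)*(*p)++;     /* a single character stands for itself */
}

/* Finite state machine with a stack for: E : E '+' NUM | NUM
   Returns 1 if the input is accepted, 0 on a syntax error. */
int parse(const char *input) {
    int stack[8];                      /* state stack; depth never exceeds 4 here */
    int top = 0;
    int tok;

    stack[0] = 0;                      /* start state */
    tok = next_token(&input);          /* lookahead token */
    for (;;) {
        switch (stack[top]) {
        case 0:                        /* expecting the first NUM */
            if (tok != TOK_NUM) return 0;          /* error */
            stack[++top] = 2;                      /* shift */
            tok = next_token(&input);
            break;
        case 1:                        /* an E has been recognized */
            if (tok == TOK_END) return 1;          /* accept */
            if (tok != '+') return 0;              /* error */
            stack[++top] = 3;                      /* shift '+' */
            tok = next_token(&input);
            break;
        case 2:                        /* reduce E : NUM (pop 1 state, goto state 1) */
            top -= 1;
            stack[++top] = 1;
            break;
        case 3:                        /* expecting NUM after '+' */
            if (tok != TOK_NUM) return 0;          /* error */
            stack[++top] = 4;                      /* shift */
            tok = next_token(&input);
            break;
        case 4:                        /* reduce E : E '+' NUM (pop 3 states, goto state 1) */
            top -= 3;
            stack[++top] = 1;
            break;
        }
    }
}
```

With this sketch, parse("1+2+3") takes the accept action, while parse("1+") ends in the error action because no NUM follows the '+'.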
Skeleton of a yacc specification (.y file)
{declarations}
%%
{rules}
%%
{user code}
Rules: <production> action
Grammar: type 2 (context-free) productions
Action: C code that specifies what to do when a production is reduced
Skeleton of a yacc specification (.y file)
%{
< C global variables, prototypes, comments >
%}
[DEFINITION SECTION]
%%
[PRODUCTION RULES SECTION]
%%
< C auxiliary subroutines>
%{ ... %} block: this part will be embedded into *.c
Definition section: contains token declarations. Tokens are recognized in the lexer.
Production rules section: defines how to "understand" the input language, and what actions to take for each "sentence".
Auxiliary subroutines: any user code. For example, a main function to call the parser function yyparse()
Structure of yacc file
Definition section
declarations of tokens
type of values used on parser stack
Rules section
list of grammar rules with semantic routines
User code
The declaration section
Terminals and non-terminals
%token symbol
%type symbol
Operator precedence and operator associativity
%nonassoc symbol
%left symbol
%right symbol
Axiom (start symbol)
%start symbol
The declaration section: terminals
They are returned by the yylex() function, which is called by yyparse().
They become #define in the generated file.
They are numbered starting from 257, but a concrete number can be associated with a token:
%token T_Key 345
Terminals that consist of a single character can be used directly (they are implicit). The corresponding tokens have values <257.
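A hand-written scanner can illustrate how these token codes are used. This is a minimal sketch under assumptions: set_input and cursor are hypothetical helpers, the spelling "key" for T_Key is invented for the example, and the #define lines are written by hand here rather than taken from a generated file.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define NUMBER 257   /* first automatically assigned code, per the numbering described above */
#define T_Key  345   /* explicitly chosen code, as in "%token T_Key 345" */

int yylval;                        /* semantic value of the last token */
static const char *cursor = "";    /* hypothetical input pointer */

/* Hypothetical helper: selects the string that yylex() will scan. */
void set_input(const char *s) { cursor = s; }

/* Hand-written stand-in for the scanner that lex would generate. */
int yylex(void) {
    while (*cursor == ' ' || *cursor == '\t') cursor++;   /* skip blanks */
    if (*cursor == '\0') return 0;                        /* 0 signals end of input */
    if (isdigit((unsigned char)*cursor)) {
        char *end;
        yylval = (int)strtol(cursor, &end, 10);
        cursor = end;
        return NUMBER;                                    /* named token */
    }
    if (strncmp(cursor, "key", 3) == 0) {                 /* assumed spelling of T_Key */
        cursor += 3;
        return T_Key;
    }
    return (unsigned char)*cursor++;                      /* single character: its own code */
}
```

After set_input("12 + key"), successive calls to yylex() return NUMBER (with yylval set to 12), then '+', then T_Key, then 0.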
The declaration section:examples
expressions.y
%{
#include <stdio.h>
%}
%token NUMBER PLUS MINUS MUL DIV L_PAR R_PAR
%start expr
…
The declaration section:examples
patterns.l
%{
#include "expressions.tab.h"
%}
digit [0-9]
%%
[ \t]+ ;
{digit}+ {yylval=atoi(yytext); return NUMBER;}
"+" return PLUS;
"-" return MINUS;
"*" return MUL;
"/" return DIV;
"(" return L_PAR;
")" return R_PAR;
. {printf("token erroneous\n");}
The declaration section:examples
YACC (single-character tokens are written as quoted literals; declaring them is optional):
. . .
%token NUMBER '+' '-' '*' '/' '(' ')'
. . .
Lex:
. . .
digit [0-9]
%%
[ \t]+ ;
{digit}+ {yylval=atoi(yytext); return NUMBER;}
"+" return '+';
"-" return '-';
"*" return '*';
"/" return '/';
"(" return '(';
")" return ')';
. . .
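Flex/Yacc communication
file.y: yacc -d file.y produces file.tab.c and the header file.tab.h
file.tab.c: cc file.tab.c -c produces file.tab.o
file.l: lex file.l produces lex.yy.c (which includes the header file.tab.h)
lex.yy.c: cc lex.yy.c -c produces lex.yy.o
lex.yy.o and file.tab.o: gcc lex.yy.o file.tab.o -o calc produces the executable calc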
Lex/Yacc: lex file
%{
#include "expressions.tab.h"   /* generated by Yacc */
%}
digit [0-9]
%option noyywrap
%%
[ \t]+ ;
{digit}+ {yylval=atoi(yytext); /*printf("lex: %s, %d\n ",yytext, yylval);*/ return NUMBER;}
"+" return PLUS;
"-" return MINUS;
. {printf("token erroneous\n");}
%%
/* no main() in the lex file */
Flex/Yacc communication
expressions.tab.h
#ifndef YYSTYPE
#define YYSTYPE int
#endif
#define NUMBER 258
#define PLUS 259
#define MINUS 260
#define MUL 261
#define DIV 262
#define L_PAR 263
#define R_PAR 264
The Production Rules Section
%%
production : symbol1 symbol2 … { action }
| symbol3 symbol4 … { action }
| …
production: symbol1 symbol2 { action }
%%
Semantic values
%%
statement : expression { printf (" = %g\n", $1); }
expression : expression '+' expression { $$ = $1 + $3; }
| expression '-' expression { $$ = $1 - $3; }
| NUMBER { $$ = $1; }
%%
According to these productions, 5 + 4 - 3 + 2 is parsed into:
[Parse tree: statement at the root, above nested expression nodes; the leaves are the number tokens 5, 4, 3, 2 joined by the operators +, -, +]
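The effect of the actions on the value stack can be sketched in C. This is an illustration under assumptions, not generated parser code: numbers are single digits, the input is well formed, and a reduction happens as soon as "expression operator expression" is on the stack, which groups the operators from left to right. The name eval is hypothetical.

```c
#include <ctype.h>

/* Evaluates single-digit numbers joined by '+' and '-'. */
double eval(const char *s) {
    double vs[3] = {0, 0, 0};   /* value stack; the slots play the roles of $1, $2, $3 */
    int top = 0;                /* number of values currently on the stack */

    for (; *s != '\0'; s++) {
        if (*s == ' ') continue;
        if (isdigit((unsigned char)*s))
            vs[top++] = *s - '0';    /* shift NUMBER, then expression : NUMBER { $$ = $1; } */
        else
            vs[top++] = *s;          /* shift the operator; its slot is $2 */

        if (top == 3) {              /* expression op expression is on the stack: reduce */
            double result = (vs[1] == '+') ? vs[0] + vs[2]   /* $$ = $1 + $3 */
                                           : vs[0] - vs[2];  /* $$ = $1 - $3 */
            vs[0] = result;          /* the three slots are replaced by $$ */
            top = 1;
        }
    }
    return vs[0];
}
```

Under these assumptions, eval("5 + 4 - 3 + 2") reduces 5 + 4 to 9, then 9 - 3 to 6, then 6 + 2 to 8.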
Defining Values
expr : expr '+' term { $$ = $1 + $3; }
| term { $$ = $1; }
;
term : term '*' factor { $$ = $1 * $3; }
| factor { $$ = $1; }
;
factor : '(' expr ')' { $$ = $2; }
| ID
| NUM
;
$$ is the value of the symbol on the left-hand side; $1, $2, $3 are the values of the first, second and third symbols on the right-hand side.
In expr '+' term, $1 is the value of expr and $3 is the value of term.
In '(' expr ')', $2 is the value of expr.
Default: $$ = $1;
The declaration section
Use of union
Terminal declaration:
%token <intval> NATURAL
Non-terminal declaration:
%type <type> NO_TERMINAL
In productions:
expr: NAT '+' NAT { $$ = $<intval>1 + $<intval>3; } ;
In the lex file:
[-+]?{digit}+ { yylval.intval=atoi(yytext); return INTEGER; }
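In C terms, the union makes YYSTYPE a union type, and a tag such as <intval> selects one of its members. The sketch below is written by hand for illustration: the realval member and the helpers set_integer_value and add_intvals are hypothetical, and only intval comes from the slide.

```c
#include <stdlib.h>

/* Hand-written stand-in for the value type a union declaration would produce. */
typedef union {
    int intval;       /* member selected by the <intval> tag */
    double realval;   /* hypothetical second member, for illustration only */
} YYSTYPE;

YYSTYPE yylval;       /* filled in by the scanner before it returns a token */

/* What the lex action does: yylval.intval = atoi(yytext); */
void set_integer_value(const char *yytext) {
    yylval.intval = atoi(yytext);
}

/* What the action { $$ = $<intval>1 + $<intval>3; } computes. */
YYSTYPE add_intvals(YYSTYPE first, YYSTYPE third) {
    YYSTYPE result;
    result.intval = first.intval + third.intval;
    return result;
}
```

Each stack slot holds a whole YYSTYPE, so the tag is what tells the action which member to read or write.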
Ambiguity
By default yacc does the following:
s/r (shift/reduce conflict): chooses shift over reduce
r/r (reduce/reduce conflict): reduces the production that appears first
It is better to resolve the conflicts by setting precedence.
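Why the choice matters can be seen with subtraction. For an ambiguous rule such as expression : expression '-' expression, always shifting groups the operands from the right, while reducing first (which is what a %left declaration asks for) groups them from the left. The two hypothetical functions below only mimic the resulting groupings; they are not parser code and assume at least one operand.

```c
/* Shift preferred: everything is shifted first, so reductions
   happen from the right: a - (b - (c - ...)). Requires n >= 1. */
int fold_shift_first(const int *v, int n) {
    int acc = v[n - 1];
    for (int i = n - 2; i >= 0; i--)
        acc = v[i] - acc;
    return acc;
}

/* Reduce preferred: each operator is reduced as soon as its
   right operand is read: ((a - b) - c) - ... Requires n >= 1. */
int fold_reduce_first(const int *v, int n) {
    int acc = v[0];
    for (int i = 1; i < n; i++)
        acc = acc - v[i];
    return acc;
}
```

For the operands 2, 3, 4 the first grouping gives 2 - (3 - 4) = 3 and the second gives (2 - 3) - 4 = -5, so the way a conflict is resolved changes the computed value.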
Error recovery
Yacc detects errors.
To report errors, a function needs to be implemented:
int yyerror (char *s) { fprintf (stderr, "%s", s); return 0; }
Panic mode recovery
E: IF '(' cond ')'
| IF '(' error ')' {yyerror("condition missing");