Post on 18-Dec-2015


Yacc

BNF grammar (example.y) -> YACC -> example.tab.c

example.tab.c + other modules -> C compiler + linker -> executable

Yacc: what is it?

Yacc: a tool for automatically generating a parser given a grammar written in a yacc specification (.y file). The grammars accepted are LALR(1) grammars with disambiguating rules.

A grammar specifies a set of production rules, which define a language. Each production rule specifies a sequence of symbols that a nonterminal can be rewritten to; the sentences that can be derived this way are the legal sentences of the language.

Structure of Yacc

Usually Lex and Yacc work together:

yylex(): called to get the next token

yyparse(): the function invoked to call the parser

How the parser works

The parser produced by Yacc consists of a finite state machine with a stack.

A move of the parser is done as follows:

It calls yylex to obtain the next token when needed.

Using the current state and the lookahead token, the parser decides on its next action (shift, reduce, accept or error) and carries it out.
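The state machine and stack described above can be sketched by hand for a tiny grammar. The following is only an illustration, not the code Yacc generates: the grammar E -> E '+' NUM | NUM, the state numbers, the Token type and the name lr_parse are all assumptions made for this sketch.

```c
#include <stddef.h>

/* Hypothetical token codes for this sketch (not generated by yacc). */
enum { TOK_END = 0, TOK_NUM = 1, TOK_PLUS = 2 };

typedef struct { int type; int value; } Token;

/* Hand-built LR automaton for:  E -> E '+' NUM | NUM
 * Returns 0 and stores the sum in *result on accept, -1 on a syntax error.
 * The stacks never grow beyond 4 entries for this grammar. */
int lr_parse(const Token *toks, int *result)
{
    int state[8];   /* state stack */
    int value[8];   /* semantic value stack, parallel to state[] */
    int top = 0;
    size_t pos = 0;

    state[0] = 0;
    value[0] = 0;

    for (;;) {
        Token la = toks[pos];                 /* lookahead token */
        switch (state[top]) {
        case 0:                               /* start: expect NUM */
            if (la.type != TOK_NUM) return -1;            /* error */
            state[++top] = 2; value[top] = la.value; pos++; /* shift */
            break;
        case 1:                               /* an E is on the stack */
            if (la.type == TOK_PLUS) {
                state[++top] = 3; value[top] = 0; pos++;   /* shift */
            } else if (la.type == TOK_END) {
                *result = value[top];                      /* accept */
                return 0;
            } else {
                return -1;                                 /* error */
            }
            break;
        case 2: {                             /* reduce E -> NUM */
            int v = value[top];               /* $$ = $1 */
            top -= 1;                         /* pop one symbol */
            state[++top] = 1;                 /* goto on E */
            value[top] = v;
            break;
        }
        case 3:                               /* after E '+': expect NUM */
            if (la.type != TOK_NUM) return -1;
            state[++top] = 4; value[top] = la.value; pos++; /* shift */
            break;
        case 4: {                             /* reduce E -> E '+' NUM */
            int v = value[top - 2] + value[top];  /* $$ = $1 + $3 */
            top -= 3;                         /* pop three symbols */
            state[++top] = 1;                 /* goto on E */
            value[top] = v;
            break;
        }
        default:
            return -1;
        }
    }
}
```

A caller passes a token array terminated by TOK_END. Each loop iteration is one move of the parser: it looks at the current state and the lookahead, then shifts, reduces, accepts or reports an error.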

Skeleton of a yacc specification (.y file)

{declarations}
%%
{rules}
%%
{user code}

Rules: <production> action

Grammar: type 2 (context-free) productions

Action: C code that specifies what to do when a production is reduced

Skeleton of a yacc specification (.y file)

%{
< C global variables, prototypes, comments >
%}
[DEFINITION SECTION]
%%
[PRODUCTION RULES SECTION]
%%
< C auxiliary subroutines >

The part between %{ and %}: embedded into the generated *.c file.

Definition section: contains token declarations. Tokens are recognized in the lexer.

Production rules section: defines how to "understand" the input language, and what actions to take for each "sentence".

C auxiliary subroutines: any user code. For example, a main function to call the parser function yyparse().

Structure of yacc file

Definition section

declarations of tokens

type of values used on parser stack

Rules section

list of grammar rules with semantic routines

User code

The declaration section

Terminals and nonterminals:
%token symbol
%type symbol

Operator precedence and associativity:
%nonassoc symbol
%left symbol
%right symbol

Axiom (start symbol):
%start symbol

The declaration section: terminals

They are returned by the yylex() function, which is called by yyparse().

They become #define in the generated file.

They are numbered starting from 257 (Bison starts at 258, as in the generated expressions.tab.h). But a concrete number can be associated with a token:

%token T_Key 345

Terminals that consist of a single character can be used directly (they are implicit). The corresponding tokens have values < 257.
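To make the numbering concrete, here is a minimal hand-written scanner sketch. It is not what Lex generates; the function name next_token, its pointer-based interface, and the value 258 for NUMBER are assumptions for illustration. Named tokens get a code above the character range, while single-character tokens are returned as their own character code.

```c
#include <ctype.h>

/* Hypothetical code for a named token; it lies above the character range. */
enum { NUMBER = 258 };

/* Scan one token from *src and advance the pointer past it.
 * Returns NUMBER (with the value stored in *val_out), the character code
 * itself for a single-character token, or 0 at end of input. */
int next_token(const char **src, int *val_out)
{
    const char *p = *src;

    while (*p == ' ' || *p == '\t')      /* skip blanks, like [ \t]+ ; */
        p++;

    if (*p == '\0') {                    /* end of input */
        *src = p;
        return 0;
    }

    if (isdigit((unsigned char)*p)) {    /* {digit}+ */
        int v = 0;
        while (isdigit((unsigned char)*p))
            v = v * 10 + (*p++ - '0');
        *val_out = v;                    /* plays the role of yylval */
        *src = p;
        return NUMBER;
    }

    *src = p + 1;                        /* implicit single-character token */
    return (unsigned char)*p;
}
```

With the input "12 +3" the successive calls yield NUMBER (value 12), the code of '+', NUMBER (value 3) and finally 0.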

The declaration section: examples

expressions.y

%{
#include <stdio.h>
%}
%token NUMBER PLUS MINUS MUL DIV L_PAR R_PAR
%start expr
…

The declaration section: examples

patterns.l

%{
#include "expressions.tab.h"
%}
digit [0-9]
%%
[ \t]+ ;
{digit}+ {yylval=atoi(yytext); return NUMBER;}
"+" return PLUS;
"-" return MINUS;
"*" return MUL;
"/" return DIV;
"(" return L_PAR;
")" return R_PAR;
. {printf("token erroneous\n");}

The declaration section: examples

YACC:

. . .
%token NUMBER '+' '-' '*' '/' '(' ')'
. . .

Lex:

. . .
digit [0-9]
%%
[ \t]+ ;
{digit}+ {yylval=atoi(yytext); return NUMBER;}
"+" return '+';
"-" return '-';
"*" return '*';
"/" return '/';
"(" return '(';
")" return ')';
. . .

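Flex/Yacc communication

file.y -> yacc -d file.y -> file.tab.c, file.tab.h (header)

file.tab.c -> cc file.tab.c -c -> file.tab.o

file.l -> lex file.l -> lex.yy.c (includes the header file.tab.h)

lex.yy.c -> cc lex.yy.c -c -> lex.yy.o

lex.yy.o, file.tab.o -> gcc lex.yy.o file.tab.o -o calc -> calc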

Lex/Yacc: lex file

%{
#include "expressions.tab.h"   /* generated by Yacc */
%}
digit [0-9]
%option noyywrap
%%
[ \t]+ ;
{digit}+ {yylval=atoi(yytext); /*printf("lex: %s, %d\n ",yytext, yylval);*/ return NUMBER;}
"+" return PLUS;
"-" return MINUS;
. {printf("token erroneous\n");}
%%
/* no main() */

Flex/Yacc communication

expressions.tab.h

#ifndef YYSTYPE
#define YYSTYPE int
#endif
#define NUMBER 258
#define PLUS 259
#define MINUS 260
#define MUL 261
#define DIV 262
#define L_PAR 263
#define R_PAR 264

The Production Rules Section

%%

production : symbol1 symbol2 … { action }

| symbol3 symbol4 … { action }

| …

production: symbol1 symbol2 { action }

%%

Semantic values

%%

statement : expression { printf(" = %g\n", $1); }   /* %g assumes YYSTYPE is double */
          ;

expression : expression '+' expression { $$ = $1 + $3; }
           | expression '-' expression { $$ = $1 - $3; }
           | NUMBER { $$ = $1; }
           ;

%%

According to these productions, 5 + 4 - 3 + 2 is parsed into:

[Figure: parse tree with statement at the root, nested expression nodes below it, and the numbers 5, 4, 3, 2 at the leaves, joined by the operators +, -, +]

Defining Values

expr : expr '+' term { $$ = $1 + $3; }
     | term { $$ = $1; }
     ;
term : term '*' factor { $$ = $1 * $3; }
     | factor { $$ = $1; }
     ;
factor : '(' expr ')' { $$ = $2; }
       | ID
       | NUM
       ;

$1, $2 and $3 are the values of the first, second and third symbols on the right-hand side of a production; $$ is the value of the left-hand side.

Default: $$ = $1;
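What these actions compute can be checked with a small hand-written evaluator. This is a sketch only: Yacc builds a table-driven bottom-up parser, whereas the code below is a top-down rewrite in which the left-recursive rules become loops. The names eval, parse_expr, parse_term and parse_factor are hypothetical, ID is left out, and there is no error handling.

```c
#include <ctype.h>

static const char *cur;   /* current position in the input string */

static void skip_ws(void)
{
    while (*cur == ' ' || *cur == '\t')
        cur++;
}

static int parse_expr(void);

/* factor : '(' expr ')' { $$ = $2; } | NUM */
static int parse_factor(void)
{
    int v = 0;

    skip_ws();
    if (*cur == '(') {
        cur++;                      /* consume '(' */
        v = parse_expr();           /* the value of $2 */
        skip_ws();
        if (*cur == ')')
            cur++;                  /* consume ')' */
        return v;
    }
    while (isdigit((unsigned char)*cur))
        v = v * 10 + (*cur++ - '0');
    return v;
}

/* term : term '*' factor { $$ = $1 * $3; } | factor { $$ = $1; } */
static int parse_term(void)
{
    int v = parse_factor();

    for (;;) {
        skip_ws();
        if (*cur != '*')
            return v;
        cur++;                      /* consume '*' */
        v = v * parse_factor();     /* $$ = $1 * $3 */
    }
}

/* expr : expr '+' term { $$ = $1 + $3; } | term { $$ = $1; } */
static int parse_expr(void)
{
    int v = parse_term();

    for (;;) {
        skip_ws();
        if (*cur != '+')
            return v;
        cur++;                      /* consume '+' */
        v = v + parse_term();       /* $$ = $1 + $3 */
    }
}

/* Evaluate an expression made of numbers, '+', '*' and parentheses. */
int eval(const char *s)
{
    cur = s;
    return parse_expr();
}
```

Because term sits below expr in the grammar, '*' binds tighter than '+': eval("2+3*4") gives 14, and eval("(2+3)*4") gives 20.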

The declaration section

Support for arbitrary value types

%union{

int intval;

char *str;

}

The declaration section

Use of union

terminal declaration
%token <intval> NATURAL

non terminal declaration
%type <type> NO_TERMINAL

in productions
expr: NAT '+' NAT { $$ = $<intval>1 + $<intval>3; };

In the lex file
[-+]?{digit}+ { yylval.intval=atoi(yytext); return INTEGER; }
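On the C side, the %union declaration becomes an ordinary union type, and a typed reference such as $<intval>1 selects one member of it. The sketch below writes that type out by hand to show the idea; the helper names make_int and add_intvals are assumptions for illustration, not part of Yacc.

```c
/* Hand-written equivalent of:  %union { int intval; char *str; } */
typedef union {
    int   intval;
    char *str;
} YYSTYPE;

/* What the lexer does with  yylval.intval = atoi(yytext);  */
YYSTYPE make_int(int n)
{
    YYSTYPE v;
    v.intval = n;
    return v;
}

/* What the action  $$ = $<intval>1 + $<intval>3  does with two stack values. */
YYSTYPE add_intvals(YYSTYPE a, YYSTYPE b)
{
    YYSTYPE r;
    r.intval = a.intval + b.intval;
    return r;
}
```

Only one member of the union is meaningful at a time, which is why tokens and nonterminals are declared with the member they use.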

Ambiguity

By default yacc does the following:

shift/reduce conflict: chooses shift over reduce

reduce/reduce conflict: reduces by the production that appears first

Better to solve the conflicts by setting precedence

Error recovery

Yacc detects errors.

To report errors, a function needs to be implemented:

int yyerror (char *s) { fprintf (stderr, "%s", s); return 0; }

Panic mode recovery

E: IF '(' cond ')'
 | IF '(' error ')' { yyerror("condition missing"); }
 ;

Error recovery

After detecting an error, the parser stays in error recovery mode until three consecutive tokens have been read and shifted successfully.

yyerrok resets the parser to its normal mode.

yyclearin allows the token that caused the error to be discarded.
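The three-token rule can be modelled with a small counter. This is a simplified sketch, not Yacc's actual code: the struct Recovery and the functions on_syntax_error, on_shift and do_yyerrok are hypothetical names, and the discarding of input tokens during recovery is not modelled.

```c
/* Simplified model of the parser's error-recovery state. */
typedef struct {
    int errflag;    /* > 0 while the parser is still recovering */
    int reported;   /* how many error messages have been issued */
} Recovery;

/* A syntax error was detected. A message is issued only when the parser
 * is not already recovering; then three good shifts are required again. */
void on_syntax_error(Recovery *r)
{
    if (r->errflag == 0)
        r->reported++;
    r->errflag = 3;
}

/* A token was shifted successfully: one step closer to normal mode. */
void on_shift(Recovery *r)
{
    if (r->errflag > 0)
        r->errflag--;
}

/* The effect of yyerrok in an action: return to normal mode at once. */
void do_yyerrok(Recovery *r)
{
    r->errflag = 0;
}
```

In this model a second error that arrives before three tokens have been shifted is not reported again, which avoids a cascade of messages from a single mistake.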