introduction to lex - skkuarcs.skku.edu/.../p1.flex-bison.pdf · lex –a lexical analyzer...

14
Introduction to Lex & Yacc (flex & bison)

Upload: others

Post on 26-May-2020

15 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Introduction to Lex - SKKUarcs.skku.edu/.../p1.flex-bison.pdf · Lex –A Lexical Analyzer Generator Source A table of regular expressions and corresponding program fragments Optional

Introduction to Lex & Yacc

(flex & bison)

Page 2: Introduction to Lex - SKKUarcs.skku.edu/.../p1.flex-bison.pdf · Lex –A Lexical Analyzer Generator Source A table of regular expressions and corresponding program fragments Optional

Lex & Yacc (flex & bison)

lex (flex) yacc (bison)

yylex() yyparse()

lexical rules

(regular expression)

InputProcessed

output

lexical rules

(context-free grammar)

Page 3: Introduction to Lex - SKKUarcs.skku.edu/.../p1.flex-bison.pdf · Lex –A Lexical Analyzer Generator Source A table of regular expressions and corresponding program fragments Optional

call yylex()

next token: (NUM, 12)

Working Version of Automated Software

lex.yy.c

yylex() { … }

myparser.tab.c

yyparse() { … }

Input

program

12 + 26

main.c

main() { … }

call yyparse()+

12 26

syntax tree

Page 4: Introduction to Lex - SKKUarcs.skku.edu/.../p1.flex-bison.pdf · Lex –A Lexical Analyzer Generator Source A table of regular expressions and corresponding program fragments Optional

Lex – A Lexical Analyzer Generator

Source

A table of regular expressions and corresponding program fragments

Optional user codes

Output: lex.yy.c ( yylex() )

The table is translated to a program which

Reads an input stream

Copy the input to an output stream

Partition the input strings which match the given expressions

As each such string is recognized the corresponding program fragment is executed

User codes are simply copied to the output

Page 5: Introduction to Lex - SKKUarcs.skku.edu/.../p1.flex-bison.pdf · Lex –A Lexical Analyzer Generator Source A table of regular expressions and corresponding program fragments Optional

Lex – Format of Input File

%{

#include <stdio.h>

#include <stdlib.h>

void mycode(char *);

%}

LETTER [A-Za-z]

DIGIT [0-9]

%%

({LETTER})({LETTER}|{DIGIT})* printf(“ID = %s\n“, yytext);

({DIGIT})+ {

mycode(yytext);

}

%%

void mycode(char* text) {

printf(“NUMBER = %d\n“, atoi(yytext));

}

definitions

rules

user codes

headers

name definition

pattern action

Page 6: Introduction to Lex - SKKUarcs.skku.edu/.../p1.flex-bison.pdf · Lex –A Lexical Analyzer Generator Source A table of regular expressions and corresponding program fragments Optional

Flex on the run

$ ls

dn.l main.c

$ flex dn.l

$ ls

dn.l lex.yy.c main.c

$ gcc lex.yy.c main.c –lfl

$ ls

a.exe dn.l lex.yy.c main.c

$ ./a.exe

Ctrl-D

$

$ cat main.c

main()

{

yylex();

}

Page 7: Introduction to Lex - SKKUarcs.skku.edu/.../p1.flex-bison.pdf · Lex –A Lexical Analyzer Generator Source A table of regular expressions and corresponding program fragments Optional

Regular Expressions . matches any single character

* matches zero or more copies of preceding expression

+ matches one or more copies of preceding expression

? matches zero or one copy of preceding expression -?[0-9]+ : signed numbers including optional minus sign

[] matches any character within the brackets [Abc1], [A-Z], [A-Za-z], [^123A-Z]

^ matches the beginning of line

$ matches the end of line

\ escape metacharacter e.g. \* matches with *

| matches either the preceding expression or the following abc|ABC

() groups a series of regular expression (123)(123)*

exclude [123A-Z]

Page 8: Introduction to Lex - SKKUarcs.skku.edu/.../p1.flex-bison.pdf · Lex –A Lexical Analyzer Generator Source A table of regular expressions and corresponding program fragments Optional

Yacc – Yet Another Compiler-Compiler

Similar to Lex

Input: Context Free Grammar (CFG)

Output: <filename>.tab.h, <filename>.tab.c ( yyparse() )

%{

headers

%}

Declarations

%%

Grammar Rules

%%

User Codes

Page 9: Introduction to Lex - SKKUarcs.skku.edu/.../p1.flex-bison.pdf · Lex –A Lexical Analyzer Generator Source A table of regular expressions and corresponding program fragments Optional

Yacc with Lex - an example

<calc.l>

/* Definitions */

%{

#include <stdlib.h>

#include “calc.tab.h”

%}

NUMBER [0-9]+

%%

/* Rules */

{NUMBER} { yylval = atoi(yytext);

return NUMBER; }

“+” { return PLUS; }

“*” { return MULT; }

“\n” { return EOL; }

. { yyerror(“unexpected char”); }

%%

/* User Code */

<calc.y>

/* Definitions */

%{

#include <stdio.h>

%}

%token NUMBER PLUS MULT EOL

%%

/* Rules */

goal: eval goal {}

| eval {}

;

eval: expr EOL { printf(“= %d\n”, $1); }

;

expr: NUMBER { $$ = $1; }

| expr PLUS expr { $$ = $1 + $3; }

| expr MULT expr { $$ = $1 * $3; }

;

%%

/* User Code */

int yyerror(char *s)

{ return printf(“%s₩n”, s); }

Page 10: Introduction to Lex - SKKUarcs.skku.edu/.../p1.flex-bison.pdf · Lex –A Lexical Analyzer Generator Source A table of regular expressions and corresponding program fragments Optional

Associativity & Precedence

Specify associativity & precedence

Change the grammar rules

If-else conflict can be OK in yacc

%token NUMBER PLUS MULT EOL ASSN

%token NUMBER EOL

%left PLUS

%left MULT

%right ASSN

expr0: NUMBER { $$ = $1; }

;

expr1: expr0 { $$ = $1; }

| expr1 MULT expr0 { $$ = $1 * $3; }

;

expr2: expr1 { $$ = $1; }

| expr2 PLUS expr1 { $$ = $1 + $3; }

;

Lowest

Highest

Precedence level

Page 11: Introduction to Lex - SKKUarcs.skku.edu/.../p1.flex-bison.pdf · Lex –A Lexical Analyzer Generator Source A table of regular expressions and corresponding program fragments Optional

Tips

For Lex (flex)

Single character can be used as token

char c = yytext[0];

String should be duplicated to safely pass outside the scanner

yylval.str = strndup(yytext, yyleng);

For Yacc (bison)

Token and nonterminal symbol have type

%union {

Exp* exp;

char* str;

}

%token <str> ID

%type <exp> expr

Page 12: Introduction to Lex - SKKUarcs.skku.edu/.../p1.flex-bison.pdf · Lex –A Lexical Analyzer Generator Source A table of regular expressions and corresponding program fragments Optional

Bison on the run

$ ls

calc.l calc.y main.c

$ flex calc.l

$ ls

calc.l calc.y lex.yy.c main.c

$ bison –d calc.y

$ ls

calc.l calc.y lex.yy.c calc.tab.c calc.tab.h main.c

$ gcc -o calc.exe calc.tab.c lex.yy.c –lfl

$ ls

calc.l calc.y lex.yy.c calc.tab.c calc.tab.h main.c

calc.exe

$ ./calc.exe

<main.c>

main()

{ yyparse(); }

Page 13: Introduction to Lex - SKKUarcs.skku.edu/.../p1.flex-bison.pdf · Lex –A Lexical Analyzer Generator Source A table of regular expressions and corresponding program fragments Optional

Programming Assignment #1

Implement an engineering calculator flex & bison on Cygwin environment (http://www.cygwin.com)

Inputs: equation and command.

Outputs: calculated value

error message for illegal equations or illegal commands

Online submission (zipped file named as student_id.zip) http://www.icampus.ac.kr

student_id.zip should contain engcalc.l – flex input file

engcalc.y – bison input file

main.c – main C program

makefile – make command should generate “engcalc.exe”

README – user manual, limitations

Page 14: Introduction to Lex - SKKUarcs.skku.edu/.../p1.flex-bison.pdf · Lex –A Lexical Analyzer Generator Source A table of regular expressions and corresponding program fragments Optional

Programming Assignment #1(cont’d)

Inputs: equations:

integer, real number (note: numbers can be signed, e.g. -2, -3.45, +2.1e23)

+, -, *, /, ^, #, (, ) e.g.) 2^3 = 23 , #4 = 4 , 3#9 = 39

^, # are right-to-left associative operators e.g.) 2^3^2 = 232 , 2#4^2 = 242

built-in functions: sin(), cos(), tan(), log(), ln(), exp()

sin(), cos(), tan() take degrees()

commands: memory: e.g.) m = 35, n = 3.14

clear [m] - Invalidate the value stored in m.

If m is not specified, invalidate all memory values.

Extra Points (+10%) user defined functions: e.g.) f(x, y) = x + 4 * y

clearf [fn] - Invalidate the user defined function fn.

If fn is not specified, invalidate all user defined functions.

base 10base e

real number divide, not integer divide