Page 1: CS420 Programming Project Implementation of miniJava Compilerarcs.skku.edu/pmwiki/uploads/Courses/Compilers/03-th... · 2017-09-14 · Lex –A Lexical Analyzer Generator Source A

Lex & Yacc(GNU distribution - flex & bison)

Hwansoo Han

Prerequisite

Ubuntu

Version 14.04 or later

Virtual machine for Windows users, or a native installation

flex

bison

gcc

Version 4.7 or later

Install in Ubuntu

sudo apt-get install flex bison gcc

Flex & Bison source code

Flex

Input: sample.l

Output: lex.yy.c

Execution command

$> flex sample.l

Bison

Input: sample.y

Output: sample.tab.c sample.tab.h

Execution command

$> bison -d sample.y

Lex & Yacc (flex & bison)

[Figure: Input -> lex (flex), yylex(), driven by lexical rules (regular expressions) -> yacc (bison), yyparse(), driven by grammar rules (context-free grammar) -> Processed output]

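Working Version of Automated Software

[Figure: control flow among the three source files for the input program "12 + 26"]

(1) main() in main.c calls yyparse() in myparser.tab.c

(2) yyparse() calls yylex() in lex.yy.c

(3) yylex() returns the next token, e.g. (NUM, 12)

Result: a syntax tree with + at the root and 12 and 26 as its children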
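The call sequence in the figure can be imitated by hand in plain C. The following is only a sketch under stated assumptions: `next_token()` stands in for the generated `yylex()`, `parse_sum()` stands in for `yyparse()`, and the token codes, the `input` cursor and `token_value` (playing the role of `yylval`) are hypothetical names, not what flex or bison actually generate. It accepts only sums of numbers and computes the value directly instead of building a syntax tree.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token codes; bison would normally put these in a .tab.h file. */
enum { TOK_EOF = 0, TOK_NUM = 258, TOK_PLUS = 259 };

const char *input;   /* hypothetical cursor over the remaining input text */
int token_value;     /* plays the role of yylval */

/* Stand-in for yylex(): returns the next token code, 0 at end of input. */
int next_token(void)
{
    while (*input == ' ' || *input == '\t')
        input++;
    if (*input == '\0')
        return TOK_EOF;
    if (isdigit((unsigned char)*input)) {
        token_value = 0;
        while (isdigit((unsigned char)*input))
            token_value = token_value * 10 + (*input++ - '0');
        return TOK_NUM;
    }
    if (*input == '+') {
        input++;
        return TOK_PLUS;
    }
    return -1;  /* unexpected character */
}

/* Stand-in for yyparse(): accepts NUM (PLUS NUM)* and stores the sum.
   Returns 0 on success and 1 on a syntax error, as yyparse() does. */
int parse_sum(const char *text, int *result)
{
    int tok;

    assert(text != NULL && result != NULL);
    input = text;
    if (next_token() != TOK_NUM)      /* step (2): ask the scanner */
        return 1;
    *result = token_value;            /* step (3): token (NUM, value) */
    while ((tok = next_token()) == TOK_PLUS) {
        if (next_token() != TOK_NUM)
            return 1;
        *result += token_value;
    }
    return tok == TOK_EOF ? 0 : 1;
}
```

A driver corresponding to step (1) would call `parse_sum("12 + 26", &value)` from `main()` and print `value` when the call returns 0.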

Lex – A Lexical Analyzer Generator

Source

A table of regular expressions and corresponding program fragments

Optional user code

Output: lex.yy.c ( yylex() )

The table is translated into a program which

Reads an input stream

Copies the input to an output stream

Partitions the input into strings that match the given expressions

Executes the corresponding program fragment as each such string is recognized

User code is simply copied to the output

Lex – Format of Input File

%{
/* definitions section: headers */
#include <stdio.h>
#include <stdlib.h>
#include "bison.tab.h"

int mycode(char *);
%}

/* definitions section: name definitions */
ALPHA [A-Za-z]
DIGIT [0-9]

%%
  /* rules section: each rule is a pattern followed by an action */
({ALPHA})({ALPHA}|{DIGIT})* {
    printf("ID = %s\n", yytext);
    return LETTER;
}
({DIGIT})+ { return mycode(yytext); }

%%
/* user code section */
int mycode(char *text) {
    printf("NUMBER = %d\n", atoi(text));
    return NUMBER;
}
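To make the two rules concrete, here is a hand-written C sketch of what the generated scanner does for them: it finds the longest prefix of the input that matches a pattern and reports which rule matched. The names `match_prefix` and `enum kind` are hypothetical, and `isalpha`/`isdigit` are assumed to behave as in the default "C" locale, where they agree with `[A-Za-z]` and `[0-9]`. The real `yylex()` is table-driven and handles any set of patterns.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum kind { KIND_NONE, KIND_ID, KIND_NUMBER };

/* Matches the longest prefix of s against
     ({ALPHA})({ALPHA}|{DIGIT})*   -> KIND_ID
     ({DIGIT})+                    -> KIND_NUMBER
   and stores the length of the match in *len (0 if nothing matches). */
enum kind match_prefix(const char *s, size_t *len)
{
    size_t n = 0;

    assert(s != NULL && len != NULL);
    if (isalpha((unsigned char)s[0])) {
        while (isalnum((unsigned char)s[n]))
            n++;
        *len = n;
        return KIND_ID;
    }
    if (isdigit((unsigned char)s[0])) {
        while (isdigit((unsigned char)s[n]))
            n++;
        *len = n;
        return KIND_NUMBER;
    }
    *len = 0;
    return KIND_NONE;
}
```

Calling `match_prefix` repeatedly, advancing by `*len` each time, partitions the input in the way described for lex. The matched prefix corresponds to `yytext` and its length to `yyleng`.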

Flex on the run

$ ls
dn.l main.c sample.c
$ flex dn.l
$ ls
dn.l lex.yy.c main.c sample.c
$ gcc lex.yy.c main.c -lfl
$ ls
a.out dn.l lex.yy.c main.c sample.c
$ ./a.out
(type the input, then press Ctrl-D to end it)
$ ./a.out < sample.c
$ cat main.c
int yylex(void);

int main(void)
{
    yylex();
    return 0;
}

Regular Expressions

.     matches any single character except newline
*     matches zero or more copies of the preceding expression
+     matches one or more copies of the preceding expression
?     matches zero or one copy of the preceding expression
      e.g. -?[0-9]+ : integers with an optional minus sign
[]    matches any one character within the brackets
      e.g. [Abc1], [A-Z], [A-Za-z]; [^123A-Z] excludes 1, 2, 3 and A-Z
^     matches the beginning of a line
$     matches the end of a line
\     escapes a metacharacter, e.g. \* matches a literal *
|     matches either the preceding expression or the following one, e.g. abc|ABC
()    groups a series of regular expressions, e.g. (123)(123)*
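The example patterns can be tried outside flex with the POSIX `<regex.h>` interface. This is a sketch under an assumption: POSIX extended regular expressions are not flex's own engine, but they share the operators listed above, so the results carry over for these simple patterns. The helper `matches_whole` is a hypothetical name. It wraps the pattern in `^( )$` so that the whole string has to match.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/* Returns 1 if the whole string s matches the extended regular expression
   pattern, 0 if it does not, and -1 if the pattern is too long or invalid. */
int matches_whole(const char *pattern, const char *s)
{
    char anchored[256];
    regex_t re;
    int n, rc;

    assert(pattern != NULL && s != NULL);
    n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

For instance, `matches_whole("-?[0-9]+", "-42")` returns 1, while `matches_whole("-?[0-9]+", "4-2")` returns 0 because the minus sign is only allowed at the front.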

Yacc – Yet Another Compiler-Compiler

Similar to Lex

Input: Context Free Grammar (CFG)

Output: <filename>.tab.h, <filename>.tab.c ( yyparse() )

Definitions

%%

Grammar Rules

%%

User Codes

Yacc – Analysis of AST

%{
/* definitions section: headers */
#include <stdio.h>
#include "AST.h"

void yyerror(char *text);
%}

/* definitions section: declarations of the form %type <struct_type> token */
%type <ptr_Letter> LETTER
%type <ptr_Digit> DIGIT

%%
/* grammar rules section: each rule is a pattern followed by an action */
LETTER : .. { ... }
DIGIT : .. {
    yyerror("this is error");
}

%%
/* user code section */
void yyerror(char *text) {
    fprintf(stderr, "%s\n", text);
}

Yacc with Lex - an example

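<calc.l>

/* Definitions */
%{
#include <stdlib.h>
#include "calc.tab.h"
int yyerror(const char *s);
%}
NUMBER [0-9]+
%%
  /* Rules (a comment in this section must be indented) */
{NUMBER} { yylval = atoi(yytext);
           return NUMBER; }
"+"      { return PLUS; }
"*"      { return MULT; }
"\n"     { return EOL; }
[ \t]    { /* skip blanks so that "12 + 26" is accepted */ }
.        { yyerror("unexpected char"); }
%%
/* User Code */

<calc.y>

/* Definitions */
%{
#include <stdio.h>
int yylex(void);
int yyerror(const char *s);
%}
%token NUMBER PLUS MULT EOL
%%
/* Rules */
goal: eval goal {}
    | eval {}
    ;
eval: expr EOL { printf("= %d\n", $1); }
    ;
/* expr is ambiguous as written: bison reports shift/reduce conflicts
   until associativity and precedence are specified */
expr: NUMBER { $$ = $1; }
    | expr PLUS expr { $$ = $1 + $3; }
    | expr MULT expr { $$ = $1 * $3; }
    ;
%%
/* User Code */
int yyerror(const char *s)
{ return printf("%s\n", s); }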

Associativity & Precedence

Specify associativity & precedence

%token NUMBER EOL
%left PLUS        /* lowest precedence */
%left MULT
%right ASSN       /* highest precedence */

The if-else conflict can also be resolved in yacc by specifying precedence

Change the grammar rules

%token NUMBER PLUS MULT EOL ASSN

expr0: NUMBER { $$ = $1; }
     ;
expr1: expr0 { $$ = $1; }
     | expr1 MULT expr0 { $$ = $1 * $3; }
     ;
expr2: expr1 { $$ = $1; }
     | expr2 PLUS expr1 { $$ = $1 + $3; }
     ;
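The layered grammar (expr0, expr1, expr2) can be mirrored almost line by line in a small recursive-descent evaluator, which shows why MULT ends up binding tighter than PLUS. This is a sketch, not what bison generates: bison builds a table-driven LR parser, whereas here each nonterminal becomes a C function and each left-recursive rule becomes a loop. The cursor `cur` and the entry point `eval` are hypothetical names, and the sketch does no error checking.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

const char *cur;  /* hypothetical cursor into the expression text */

static void skip_blanks(void)
{
    while (*cur == ' ')
        cur++;
}

/* expr0: NUMBER */
static int expr0(void)
{
    int v = 0;

    skip_blanks();
    while (isdigit((unsigned char)*cur))
        v = v * 10 + (*cur++ - '0');
    skip_blanks();
    return v;
}

/* expr1: expr0 | expr1 MULT expr0   (the left recursion becomes a loop) */
static int expr1(void)
{
    int v = expr0();

    while (*cur == '*') {
        cur++;
        v *= expr0();
    }
    return v;
}

/* expr2: expr1 | expr2 PLUS expr1 */
static int expr2(void)
{
    int v = expr1();

    while (*cur == '+') {
        cur++;
        v += expr1();
    }
    return v;
}

/* Evaluates a well-formed expression made of numbers, '+' and '*'. */
int eval(const char *text)
{
    assert(text != NULL);
    cur = text;
    return expr2();
}
```

Because `expr2` only ever adds complete `expr1` values, every product is finished before it takes part in a sum: `eval("2 + 3 * 4")` multiplies 3 by 4 first and then adds 2.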

Precedence for If-Then and If-Then-Else

<clang.y>

/* Definitions */
%token NUMBER EOL IF
%left PLUS
%left MULT
%nonassoc IF_THEN
%nonassoc ELSE
%%
/* Rules */
stmt : …
     | IF '(' expr ')' stmt %prec IF_THEN { … }
     | IF '(' expr ')' stmt ELSE stmt { … }
%%
/* User Code */
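Because ELSE is declared after IF_THEN, it has the higher precedence, so when the parser sees ELSE after an inner statement it shifts rather than reducing the shorter if rule. The else therefore attaches to the nearest if. C resolves the same ambiguity in the same way, which the small function below demonstrates. The function name `which_branch` is hypothetical, and the braces are left out on purpose to reproduce the ambiguous form.

```c
/* Returns 1 if the inner "then" branch runs, 2 if the else branch runs,
   and 0 if neither runs. Deliberately written without braces. */
int which_branch(int outer, int inner)
{
    int branch = 0;

    if (outer)
        if (inner)
            branch = 1;
        else            /* attaches to "if (inner)", the nearest if */
            branch = 2;
    return branch;
}
```

If the else belonged to the outer if, `which_branch(0, 0)` would return 2. It returns 0, and `which_branch(1, 0)` returns 2. Some compilers warn about this unbraced form when extra warnings are enabled.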

Tips

For Lex (flex)

A single character can be used as a token

char c = yytext[0];

A string must be duplicated before it is passed outside the scanner, because yytext is overwritten by the next match

yylval.str = strndup(yytext, yyleng);

For Yacc (bison)

Tokens and nonterminal symbols have types

%union {
    Exp* exp;
    char* str;
}
%token <str> ID
%type <exp> expr
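The reason for duplicating the string can be shown without flex. In the sketch below, `match_buffer` is a hypothetical stand-in for the scanner's buffer behind `yytext`, `semantic_value` is a hypothetical stand-in for the type that `%union` generates, and `dup_lexeme` does by hand what `strndup` does on POSIX systems. All of these names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct exp;  /* hypothetical AST node type, left opaque here */

/* Hypothetical stand-in for the value type generated from %union. */
typedef union {
    struct exp *exp;
    char *str;
} semantic_value;

/* Hypothetical scanner buffer: like yytext, each match overwrites it. */
char match_buffer[64];

/* Returns a heap copy of the first len bytes of text (NULL on failure).
   The caller owns the copy and must free() it. */
char *dup_lexeme(const char *text, size_t len)
{
    char *copy = malloc(len + 1);

    if (copy != NULL) {
        memcpy(copy, text, len);
        copy[len] = '\0';
    }
    return copy;
}

/* Simulates one match: the lexeme lands in the shared buffer, and the
   semantic value gets its own copy that survives later matches. */
semantic_value scan(const char *lexeme)
{
    semantic_value v;
    size_t len = strlen(lexeme);

    assert(len < sizeof match_buffer);
    memcpy(match_buffer, lexeme, len + 1);
    v.str = dup_lexeme(match_buffer, len);
    return v;
}
```

After `scan("foo")` followed by `scan("bar")`, the shared buffer holds "bar", but the first returned `str` still reads "foo". Storing a pointer to the buffer itself would have lost the first lexeme. Each copy has to be released with `free()` once the parser is done with it.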

Bison on the run

$ ls
calc.l calc.y main.c
$ flex calc.l
$ ls
calc.l calc.y lex.yy.c main.c
$ bison -d calc.y
$ ls
calc.l calc.y lex.yy.c calc.tab.c calc.tab.h main.c
$ gcc -o calc.out calc.tab.c lex.yy.c main.c -lfl
$ ls
calc.l calc.y lex.yy.c calc.tab.c calc.tab.h main.c calc.out
$ ./calc.out

<main.c>

int yyparse(void);

int main(void)
{ return yyparse(); }

Symbol Table

Location : main

Count  Type   Name  Array  Role
1      int    argc  -      parameter
2      float  f     4      variable

Location : main – For(1) – If(1)

Count  Type   Name  Array  Role
1      int    a     -      variable
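One possible in-memory layout for these tables is an array of rows per scope. The following is a minimal sketch under stated assumptions: the names `struct symbol`, `struct scope`, `scope_add`, `scope_find` and `scope_print`, the fixed capacity `MAX_SYMBOLS`, and the convention that an array size of 0 means "not an array" are all hypothetical choices, not part of the project specification.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_SYMBOLS 16

/* One row: Type, Name, Array, Role. Count is the row index plus one. */
struct symbol {
    const char *type;
    const char *name;
    int array_size;     /* 0 means "not an array", printed as "-" */
    const char *role;   /* "parameter" or "variable" */
};

/* One table per scope; location is a label such as "main - For(1) - If(1)". */
struct scope {
    const char *location;
    struct symbol rows[MAX_SYMBOLS];
    int count;
};

/* Appends a row and returns its Count (1-based), or 0 if the table is full. */
int scope_add(struct scope *s, const char *type, const char *name,
              int array_size, const char *role)
{
    assert(s != NULL);
    if (s->count >= MAX_SYMBOLS)
        return 0;
    s->rows[s->count] = (struct symbol){ type, name, array_size, role };
    return ++s->count;
}

/* Returns the row called name in this scope, or NULL if there is none. */
const struct symbol *scope_find(const struct scope *s, const char *name)
{
    int i;

    for (i = 0; i < s->count; i++)
        if (strcmp(s->rows[i].name, name) == 0)
            return &s->rows[i];
    return NULL;
}

/* Prints one table with the column layout used above. */
void scope_print(const struct scope *s, FILE *out)
{
    int i;

    fprintf(out, "Location : %s\n", s->location);
    fprintf(out, "%-7s%-7s%-6s%-7s%s\n", "Count", "Type", "Name", "Array", "Role");
    for (i = 0; i < s->count; i++) {
        const struct symbol *r = &s->rows[i];
        char array[16] = "-";

        if (r->array_size > 0)
            snprintf(array, sizeof array, "%d", r->array_size);
        fprintf(out, "%-7d%-7s%-6s%-7s%s\n", i + 1, r->type, r->name, array, r->role);
    }
}
```

A parser action would typically call `scope_add` when it reduces a declaration, creating a new `struct scope` whenever it enters a function body, a for loop or an if statement.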