compiler engineering lab#1

LAB # 1: INTRODUCTION & LEXICAL ANALYSIS COMPILER ENGINEERING University of Dammam Girls’ College of Science Department of Computer Science Compiler Engineering Lab

Upload: mashaelq

Post on 17-Jan-2015


DESCRIPTION

Introduction to Compiler Engineering. How to start building a lexical analyzer.

TRANSCRIPT

Page 1: Compiler Engineering Lab#1

LAB # 1: INTRODUCTION & LEXICAL ANALYSIS

COMPILER ENGINEERING

University of Dammam, Girls’ College of Science, Department of Computer Science, Compiler Engineering Lab

Page 2: Compiler Engineering Lab#1

Department of Computer Science - Compiler Engineering Lab


WHAT IS A COMPILER?

• A compiler is a program that reads a program written in one language (the source language) and translates it into an equivalent program in another language (the target language).

• An important part of this translation process is that the compiler reports to its user the presence of errors in the source program.

25-29/2/12

Page 3: Compiler Engineering Lab#1

COMPILER THEORY

[Diagram: source program → Compiler → target program; the compiler also outputs error messages.]

Page 4: Compiler Engineering Lab#1

COMPILER ENVIRONMENT TOOLS

• Many software tools that manipulate source programs first perform some kind of analysis.

• Some examples of such tools include:

Page 5: Compiler Engineering Lab#1

1- STRUCTURE EDITOR

• Takes as input a sequence of commands to build a source program.

• Performs the text creation and modification functions of a text editor.

• Analyzes the program text, putting an appropriate hierarchical structure on the source program.

• Checks that the input is correctly formed.

• Can supply keywords automatically.

• Can jump from a begin or left parenthesis to its matching end or right parenthesis.

Page 6: Compiler Engineering Lab#1

2- PRETTY PRINTERS

• Analyzes a program and prints it in such a way that the structure of the program becomes clearly visible.

Page 7: Compiler Engineering Lab#1

3- STATIC CHECKERS

• Reads a program.

• Analyzes it.

• Discovers potential bugs without running the program.

• Catches logical errors.

Page 8: Compiler Engineering Lab#1

4 - INTERPRETERS

• Performs the operations implied by the source program, rather than producing a target program.

• What is the difference between a compiler and an interpreter?

Page 9: Compiler Engineering Lab#1


COMPILER PHASES

Page 10: Compiler Engineering Lab#1

PARTS OF COMPILATION

1. Analysis

The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program.

2. Synthesis

The synthesis part constructs the desired target program from the intermediate representation.

Page 11: Compiler Engineering Lab#1

PROCESSING ENDS OF A COMPILER

1. Front End

Consists of the phases that depend primarily on the source language and are largely independent of the target machine (lexical analysis, syntax analysis, symbol table creation, semantic analysis, intermediate code generation).

2. Back End

Includes those portions of the compiler that depend on the target machine and do not depend on the source language (code optimization, code generation).

Page 12: Compiler Engineering Lab#1

COMPILER PHASES

[Diagram: Source Program → Compiler → Machine Language. The front end (analysis) performs lexical analysis (“scanning”), syntax or hierarchical analysis (“parsing”), and contextual analysis (“semantic analysis”), and produces intermediate code. The back end (synthesis) turns the intermediate code into object code.]

Dividing the compiler into phases simplifies its structure.

Page 13: Compiler Engineering Lab#1

COMPILER PHASES INTERACTION (VIA DATA STRUCTURES)

[Diagram: the phases communicate through data structures. Source program text → lexical analysis → tokens → syntax analysis → abstract syntax tree (AST) → contextual analysis → decorated AST + symbol table → intermediate code (end of the front end) → back end → object code (machine language).]

Page 14: Compiler Engineering Lab#1

COMPILER PHASES

[Diagram: LEXICAL ANALYZER → SYNTAX ANALYZER → SEMANTIC ANALYZER → INTERMEDIATE CODE GENERATOR → CODE OPTIMIZER → CODE GENERATOR, with the SYMBOL TABLE MANAGER and ERROR HANDLING connected to every phase.]

Page 15: Compiler Engineering Lab#1

COMPILER CONSTRUCTION TOOLS

• A compiler can be written like any other program.

• A programmer can use software development tools such as:

  • Debuggers

  • Version managers

  • Profilers

• More specialized tools have been developed to help implement the various phases of a compiler.

Page 16: Compiler Engineering Lab#1

1- SCANNER GENERATORS

• Generate a lexical analyzer from a specification based on regular expressions.

Page 17: Compiler Engineering Lab#1

2- PARSER GENERATORS

• Produce syntax analyzers from input that is based on a context-free grammar.

• In early compilers, syntax analysis consumed a large fraction of the running time and a large fraction of the intellectual effort of writing a compiler.

• With a parser generator, this phase can be implemented in a few days.

Page 18: Compiler Engineering Lab#1

3- SYNTAX-DIRECTED TRANSLATION ENGINES

• Produce collections of routines that walk the parse tree, generating the intermediate code.

Page 19: Compiler Engineering Lab#1


4 - AUTOMATIC CODE GENERATOR

• Takes a collection of rules that define the translation of each operation of the intermediate language into the machine language for the target machine


Page 20: Compiler Engineering Lab#1

5 - DATA FLOW ENGINE

• Much of the information needed to perform good code optimization involves “data-flow analysis”: the gathering of information about how values are transmitted from one part of a program to each other part.

Page 21: Compiler Engineering Lab#1

LEXICAL ANALYSIS: FIRST PHASE OF A COMPILER

Page 22: Compiler Engineering Lab#1

INSERTING A LEXICAL ANALYZER BETWEEN THE INPUT AND THE PARSER

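[Diagram: Input → Lexical Analyzer → Parser. The lexical analyzer reads characters from the input and can push a character back onto it; it passes each token and its attribute to the parser.]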

Page 23: Compiler Engineering Lab#1

LEXICAL ANALYZER MECHANISM

• Read the characters from the input.

• Group them into lexemes.

• Pass the tokens formed by the lexemes, together with their attribute values, to the later stages.

• In some situations the lexical analyzer has to read some more characters ahead before it can decide on the token to be returned to the parser.

• The extra character has to be pushed back onto the input, because it can be the beginning of the next lexeme.
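The read-ahead and push-back steps can be sketched in C++ for the relational operators. This is only a minimal sketch, not the lab's reference solution: the name scan_relop and the Relop enum are assumptions made for illustration, and the function takes a FILE* (pass stdin to read from the user) so it can be tried on any input stream.

```cpp
#include <cstdio>

// Attribute values of the relop token. The None member is an assumption
// added for input that does not start with a relational operator.
enum class Relop { LT, LE, EQ, NE, GT, GE, None };

// Hypothetical helper: reads one relational operator from `in`.
Relop scan_relop(std::FILE* in) {
    int c = std::getc(in);
    if (c == '=') return Relop::EQ;
    if (c == '<' || c == '>') {
        int next = std::getc(in);                 // read one character ahead
        if (next == '=') return c == '<' ? Relop::LE : Relop::GE;
        if (c == '<' && next == '>') return Relop::NE;
        if (next != EOF) std::ungetc(next, in);   // it starts the next lexeme
        return c == '<' ? Relop::LT : Relop::GT;
    }
    if (c != EOF) std::ungetc(c, in);             // not a relational operator
    return Relop::None;
}
```

After reading "<", the analyzer cannot tell whether the lexeme is "<", "<=" or "<>" until it has seen one more character, which is why that character is pushed back whenever it does not belong to the operator.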

Page 24: Compiler Engineering Lab#1

IMPLEMENTING THE INTERACTION

[Diagram: the lexical analyzer is implemented as the function lexan(). It reads each character using getchar(), pushes a character F back using ungetc(F, stdin), and passes the token and its attribute to its caller.]

Page 25: Compiler Engineering Lab#1

LEX …

• Lex is a tool that has been widely used to specify lexical analyzers for a variety of languages.

• Using such a tool shows how the specification of patterns using regular expressions can be combined with actions.

Page 26: Compiler Engineering Lab#1

REGULAR EXPRESSION PATTERNS FOR TOKENS

Regular expression   Token   Attribute value
ws                   -       -
if                   if      -
then                 then    -
else                 else    -
id                   id      Pointer to table entry
num                  num     Pointer to table entry
<                    relop   LT
<=                   relop   LE
=                    relop   EQ
<>                   relop   NE
>                    relop   GT
>=                   relop   GE
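One way to represent a token together with its attribute value in C++ is a small struct. The sketch below covers only the rows with a fixed lexeme (keywords and relational operators). All names in it (TokenName, RelopKind, Token, fixed_token, kNoAttribute) are assumptions chosen for illustration and do not come from the lab handout.

```cpp
#include <string>

enum class TokenName { None, If, Then, Else, Id, Num, Relop };
enum class RelopKind { LT, LE, EQ, NE, GT, GE };

const int kNoAttribute = -1;   // stands for the "-" entries of the table

struct Token {
    TokenName name;
    int attribute;   // a RelopKind for relop; a table index for id and num
};

Token relop(RelopKind kind) {
    return Token{TokenName::Relop, static_cast<int>(kind)};
}

// Maps a lexeme with a fixed spelling to its token and attribute value.
// Lexemes for id and num need a table lookup and are not handled here.
Token fixed_token(const std::string& lexeme) {
    if (lexeme == "if")   return Token{TokenName::If, kNoAttribute};
    if (lexeme == "then") return Token{TokenName::Then, kNoAttribute};
    if (lexeme == "else") return Token{TokenName::Else, kNoAttribute};
    if (lexeme == "<")    return relop(RelopKind::LT);
    if (lexeme == "<=")   return relop(RelopKind::LE);
    if (lexeme == "=")    return relop(RelopKind::EQ);
    if (lexeme == "<>")   return relop(RelopKind::NE);
    if (lexeme == ">")    return relop(RelopKind::GT);
    if (lexeme == ">=")   return relop(RelopKind::GE);
    return Token{TokenName::None, kNoAttribute};
}
```

All six operators share the single token relop, so the parser needs only one grammar symbol for them; the attribute value tells them apart later.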

Page 27: Compiler Engineering Lab#1

LEX SPECIFICATION

• A Lex program consists of three parts:

1. Declarations

2. Translation rules

3. Auxiliary procedures

Page 28: Compiler Engineering Lab#1

1- DECLARATIONS SECTION

Includes declarations of variables, manifest constants, and regular definitions.

A manifest constant is an identifier that is declared to represent a constant.

Page 29: Compiler Engineering Lab#1

DEFINITIONS OF MANIFEST CONSTANTS USED BY THE TRANSLATION RULES

LT, LE, EQ, NE, GT, GE, IF, THEN, ELSE, ID, NUMBER, RELOP, AROP

Page 30: Compiler Engineering Lab#1

REGULAR DEFINITIONS

delim    [ \t\n]
ws       {delim}+
letter   [A-Za-z]
digit    [0-9]
id       {letter}({letter}|{digit})*
number   {digit}+(\.{digit}+)?(E[+\-]?{digit}+)?
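The id and number definitions can be tried out in C++ with std::regex before writing any Lex code. This is a sketch under stated assumptions: the patterns are hand-expanded translations of the definitions above (std::regex has no named sub-definitions such as {digit}), and the names kId, kNumber, is_id and is_number are made up for this example.

```cpp
#include <regex>
#include <string>

// id: a letter followed by any number of letters or digits.
const std::regex kId(R"([A-Za-z][A-Za-z0-9]*)");

// number: digits, an optional fraction, and an optional exponent.
const std::regex kNumber(R"([0-9]+(\.[0-9]+)?(E[+-]?[0-9]+)?)");

// regex_match succeeds only if the whole string matches the pattern.
bool is_id(const std::string& s) { return std::regex_match(s, kId); }
bool is_number(const std::string& s) { return std::regex_match(s, kNumber); }
```

With these helpers, is_id("count1") and is_number("6.02E+23") hold, while is_id("1count") and is_number("1.") do not, because an id must start with a letter and a fraction needs at least one digit after the point.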

Page 31: Compiler Engineering Lab#1

2-TRANSLATION RULES

Translation rules are statements of the form:

P1 {action 1}

P2 {action 2}

...

Pn {action n}

• where each Pi is a regular expression and each {action i} is a program fragment describing what action the lexical analyzer should take when pattern Pi matches a lexeme.

Page 32: Compiler Engineering Lab#1

2- TRANSLATION RULES

Pattern   Action
ws        no action and no return
if        return(IF)
then      return(THEN)
else      return(ELSE)
"<"       val = LT; return(RELOP)
          (and similarly for the other relational operators)
id        val = install_id(); return(ID)
number    val = install_num(); return(NUM)

Page 33: Compiler Engineering Lab#1

3-AUXILIARY PROCEDURES

• Holds whatever auxiliary procedures are needed by the actions.

• A lexical analyzer created by Lex behaves in concert with a parser in the following manner: when activated by the parser, the lexical analyzer begins reading its remaining input, one character at a time, until it has found the longest prefix of the input that is matched by one of the regular expressions Pi. Then it executes action i.
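The longest-prefix rule can be illustrated in C++ with std::regex. This is only a sketch of the idea and not how Lex is implemented (Lex compiles all patterns into one automaton). The types Rule and Match and the function longest_match are hypothetical names, and the "action" is reduced to returning a token name.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

struct Rule {
    std::string token;    // token name that the action would return
    std::regex pattern;   // the pattern Pi
};

struct Match {
    std::string token;    // empty if no rule matched
    std::size_t length;   // length of the matched lexeme
};

// Tries every rule at position `pos` and keeps the longest match.
// On a tie the rule listed first wins, which is also what Lex does.
Match longest_match(const std::vector<Rule>& rules,
                    const std::string& input, std::size_t pos) {
    Match best{"", 0};
    for (const Rule& rule : rules) {
        std::smatch m;
        // match_continuous forces the match to start exactly at `pos`.
        bool found = std::regex_search(
            input.begin() + static_cast<std::ptrdiff_t>(pos), input.end(),
            m, rule.pattern, std::regex_constants::match_continuous);
        if (found && static_cast<std::size_t>(m.length(0)) > best.length) {
            best = Match{rule.token, static_cast<std::size_t>(m.length(0))};
        }
    }
    return best;
}
```

With a rule for the keyword if listed before a rule for identifiers, the input "iffy" is reported as an identifier of length 4 because that match is longer, while the input "if" is reported as the keyword because both rules match two characters and the earlier rule wins.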

Page 34: Compiler Engineering Lab#1

CONT.

• Typically, the action returns control to the parser. If it does not, the lexical analyzer proceeds to find more lexemes until an action causes control to return to the parser.

• The lexical analyzer returns a single quantity to the parser: the token.

• To pass an attribute value with information about the lexeme, we can set a global variable called val.

Page 35: Compiler Engineering Lab#1

AUXILIARY PROCEDURES

• install_id(): a procedure to install the lexeme of an identifier in the symbol table and return a pointer to its entry.

• install_num(): a similar procedure to install a lexeme that is a number.
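What "installing a lexeme" might look like can be sketched with a small C++ class. The slides do not show the body of install_id, so everything here is an assumption: the class name SymbolTable, the use of an integer index in place of a pointer, and passing the lexeme as an argument (in a Lex program the procedure would read the matched text itself).

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

class SymbolTable {
public:
    // Returns the index of `lexeme`, adding a new entry the first time
    // the lexeme is seen, so equal identifiers share one entry.
    int install(const std::string& lexeme) {
        auto it = index_.find(lexeme);
        if (it != index_.end()) return it->second;
        int id = static_cast<int>(entries_.size());
        entries_.push_back(lexeme);
        index_.emplace(lexeme, id);
        return id;
    }

    // Looks up the lexeme stored at index `id` (throws if out of range).
    const std::string& lexeme(int id) const {
        return entries_.at(static_cast<std::size_t>(id));
    }

private:
    std::vector<std::string> entries_;
    std::unordered_map<std::string, int> index_;
};
```

The value returned by install is what an action would store in val before returning the token ID.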

Page 36: Compiler Engineering Lab#1

WRITING A LEXICAL ANALYZER

• Write a lexical analyzer in C++.

• Write it as a function called from inside main().

• Name that function Lexan.

• The Lexan function returns the value of the token.

Page 37: Compiler Engineering Lab#1

THE LEXICAL ANALYZER WILL...

• Read a character from the user.

• If the character is a blank (space) or a tab (written '\t'), no token is returned to the parser; exit the function.

• If the character is a newline (written '\n'), the line number is incremented and no token is returned.

• If the character is a digit, compute Tokenval.

Page 38: Compiler Engineering Lab#1

MORE THAN ONE DIGIT

• Allow the user to enter a sequence of characters.

• While the user keeps entering digits after the first digit, the analyzer accepts more digits.

• Each time, the analyzer computes Tokenval.

• If the next character is not a digit, push the character back.

• At each step, print the result to see the output.

Page 39: Compiler Engineering Lab#1

TOKENVAL..

• First digit: Tokenval = t - '0'

• Each following digit: Tokenval = Tokenval * 10 + t - '0'
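The two formulas can be checked on a whole string of digits. The helper below is a hypothetical illustration (the name digits_to_value is not from the lab) and assumes its argument contains only the characters '0' to '9'. Starting from 0 makes the first step equal to t - '0', so one formula covers both cases.

```cpp
#include <string>

// Accumulates the value of a digit string, one character at a time.
// Assumes every character of `digits` is a decimal digit.
int digits_to_value(const std::string& digits) {
    int tokenval = 0;
    for (char t : digits) {
        tokenval = tokenval * 10 + (t - '0');   // shift left, add new digit
    }
    return tokenval;
}
```

For the input "472" the running value is 4, then 4 * 10 + 7 = 47, then 47 * 10 + 2 = 472.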

Page 40: Compiler Engineering Lab#1

READING CHARACTER FROM THE USER

#include <stdio.h>

int getchar(void);

• Gets a character from stdin.

• getchar returns the next character on the standard input stream stdin; it may be implemented as a macro.

• On success, getchar returns the character read, converted to an int without sign extension (that is, as an unsigned char value). At end of file or on error, it returns EOF.

Page 41: Compiler Engineering Lab#1

PUSHING BACK CHARACTERS

#include <stdio.h>

ungetc(c, stdin);

• Pushes a character back into an input stream.

• ungetc pushes the character c back onto the named input stream, which must be open for reading. This character will be returned on the next call to getchar for that stream. Pushing back one character is always possible; pushing back more than one without an intervening read is not guaranteed to work.

• On success, ungetc returns the character pushed back; on failure, it returns EOF.

Page 42: Compiler Engineering Lab#1

TEST CHARACTER IF (DIGIT) OR NOT

#include <ctype.h>

isdigit(t)

• Tests for a decimal-digit character.

• isdigit classifies character codes by table lookup; it may be implemented as a macro.

• isdigit returns nonzero if t is a digit ('0' to '9').
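Putting the three library functions together, a digit-scanning lexan could look like the sketch below. It is one possible shape under several assumptions, not the expected lab solution: the token codes NUM and DONE and their values are made up, the function loops past blanks and newlines instead of exiting on them, and it takes a FILE* parameter (pass stdin to get the getchar behavior described above) so it can be run on test input.

```cpp
#include <cctype>
#include <cstdio>

const int NUM = 256;    // assumed token code for a number
const int DONE = 257;   // assumed token code for end of input

int tokenval = 0;       // attribute value of the most recent NUM token
int lineno = 1;         // current line number

// Returns the next token read from `in`.
int lexan(std::FILE* in) {
    while (true) {
        int t = std::getc(in);
        if (t == ' ' || t == '\t') {
            continue;                          // blanks and tabs: no token
        } else if (t == '\n') {
            ++lineno;                          // newline: count it, no token
        } else if (t == EOF) {
            return DONE;
        } else if (std::isdigit(t)) {
            tokenval = t - '0';                // first digit
            t = std::getc(in);
            while (t != EOF && std::isdigit(t)) {
                tokenval = tokenval * 10 + (t - '0');   // following digits
                t = std::getc(in);
            }
            if (t != EOF) std::ungetc(t, in);  // push back the non-digit
            return NUM;
        } else {
            return t;                          // any other character
        }
    }
}
```

Calling lexan(stdin) repeatedly from main() until it returns DONE, and printing the token and tokenval after each call, gives the step-by-step output that the lab asks for.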

Page 43: Compiler Engineering Lab#1


QUESTIONS?

Thank you for listening
