Compiler Engineering Lab #1

Post on 17-Jan-2015


DESCRIPTION

Introduction to Compiler Engineering: how to start building a lexical analyzer

TRANSCRIPT

LAB #1: INTRODUCTION & LEXICAL ANALYSIS

COMPILER ENGINEERING

University of Dammam, Girls’ College of Science, Department of Computer Science, Compiler Engineering Lab

Department of Computer Science - Compiler Engineering Lab


WHAT IS A COMPILER?

• It is a program that reads a program written in one language (the source language) and translates it into an equivalent program in another language (the target language).

• An important part of this translation process is that the compiler reports to its user the presence of errors in the source program.

25-29/2/12


COMPILER THEORY

[Figure: source program → Compiler → target program; the compiler also outputs error messages]


COMPILER ENVIRONMENT TOOLS

• Many software tools that manipulate source programs first perform some kind of analysis.

• Some examples of such tools include:


1- STRUCTURE EDITOR

• Takes as input a sequence of commands to build a source program
• Performs the text creation and modification functions of a text editor
• Analyzes the program text, putting an appropriate hierarchical structure on the source program
• Checks that the input is correctly formed
• Can supply keywords automatically
• Can jump from a begin or left parenthesis to its matching end or right parenthesis


2- PRETTY PRINTERS

• Analyzes a program and prints it in such a way that the structure of the program becomes clearly visible.


3- STATIC CHECKERS

• Reads a program
• Analyzes it
• Discovers potential bugs without running the program
• Catches logical errors


4 - INTERPRETERS

• Instead of producing a target program as a translation, an interpreter performs the operations implied by the source program.

• What is the difference between a compiler and an interpreter?


COMPILER PHASES


PARTS OF COMPILATION

1. Analysis
The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program.

2. Synthesis
The synthesis part constructs the desired target program from the intermediate representation.


PROCESSING ENDS OF A COMPILER

1. Front End
Consists of the phases that depend primarily on the source language and are largely independent of the target machine (lexical analysis, syntactic analysis, symbol table creation, semantic analysis, intermediate code generation).

2. Back End
Includes those portions of the compiler that depend on the target machine and do not depend on the source language (code optimization, code generation).


COMPILER PHASES

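[Figure: the compiler translates a source program into machine language. The front end performs analysis and produces intermediate code; the back end performs synthesis and produces object code. Analysis consists of lexical analysis ("scanning"), syntax or hierarchical analysis ("parsing"), and contextual analysis ("semantic analysis").]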

Phases are important to simplify the compiler’s structure


COMPILER PHASE INTERACTION (VIA DATA STRUCTURES)

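[Figure: the phases interact via data structures. Lexical analysis turns the source program text into tokens; syntax analysis turns the tokens into an abstract syntax tree (AST); contextual analysis produces a decorated AST plus a symbol table. These analysis phases form the front end. The back end (synthesis) works from intermediate code to produce object code in machine language.]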


COMPILER PHASES

[Figure: the phases of a compiler, in order]

LEXICAL ANALYZER
SYNTAX ANALYZER
SEMANTIC ANALYZER
INTERMEDIATE CODE GENERATOR
CODE OPTIMIZER
CODE GENERATOR

The SYMBOL TABLE MANAGER and ERROR HANDLING interact with all of these phases.


COMPILER CONSTRUCTION TOOLS

• A compiler can be written like any other program
• A programmer can use general software development tools such as:
• Debuggers
• Version managers
• Profilers

• More specialized tools have been developed to help implement the various phases of a compiler


1- SCANNER GENERATORS

• Generate lexical analyzers from a specification based on regular expressions.


2- PARSER GENERATORS

• Produce syntax analyzers from input that is based on a context-free grammar.

• In early compilers, syntax analysis consumed a large fraction of the running time and a large fraction of the intellectual effort of writing a compiler.

• Using a parser generator makes it possible to implement this phase in a few days.


3- SYNTAX–DIRECTED TRANSLATOR ENGINE

• Produce collections of routines that walk the parse tree, generating the intermediate code


4 - AUTOMATIC CODE GENERATOR

• Takes a collection of rules that define the translation of each operation of the intermediate language into the machine language for the target machine


5 - DATA FLOW ENGINE

• Much of the information needed to perform good code optimization involves "data-flow analysis": the gathering of information about how values are transmitted from one part of a program to every other part


LEXICAL ANALYSIS: FIRST PHASE OF A COMPILER

INSERTING A LEXICAL ANALYZER BETWEEN THE INPUT AND THE PARSER

[Figure: Input → Lexical Analyzer → Parser. The lexical analyzer reads characters from the input and can push a character back; it passes each token and its attribute to the parser.]

LEXICAL ANALYZER MECHANISM

• Read the characters from the input
• Group them into lexemes
• Pass the tokens formed by the lexemes, together with their attribute values, to the later stages
• In some situations the lexical analyzer has to read some more characters ahead before it can decide on the token to be returned to the parser
• The extra character then has to be pushed back onto the input, because it can be the beginning of the next lexeme
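The read-ahead and push-back steps can be sketched in C++ as follows. This is only an illustration: the helper name read_word and the choice of "letters and digits" as the lexeme shape are assumptions, and the stream is passed as a parameter so the function can be tried on any FILE*, not only stdin.

```cpp
#include <cassert>
#include <cctype>
#include <cstdio>
#include <string>

// Hypothetical helper (not from the slides): groups a run of letters and
// digits into one lexeme, using one character of lookahead.
std::string read_word(std::FILE* in) {
    assert(in != nullptr);
    std::string lexeme;
    int c = std::getc(in);                 // int, so EOF stays distinguishable
    while (c != EOF && std::isalnum(static_cast<unsigned char>(c))) {
        lexeme.push_back(static_cast<char>(c));
        c = std::getc(in);                 // read one character ahead
    }
    if (c != EOF) {
        std::ungetc(c, in);                // it may begin the next lexeme
    }
    return lexeme;
}
```

Calling read_word(stdin) on the input count+1 would collect count and leave + on the input for the next read.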

IMPLEMENTING THE INTERACTION

[Figure: the lexical analyzer lexan() reads a character using getchar(), pushes back a character c using ungetc(c, stdin), and passes the token and its attribute to the parser.]

LEX …

• A particular tool that has been widely used to specify lexical analyzers for a variety of languages

• Using such a tool allows us to show how the specification of patterns using regular expressions can be combined with actions

REGULAR EXPRESSION PATTERNS FOR TOKENS

Regular expression   Token   Attribute value
ws                   -       -
if                   if      -
then                 then    -
else                 else    -
id                   id      Pointer to table entry
num                  num     Pointer to table entry
<                    relop   LT
<=                   relop   LE
=                    relop   EQ
<>                   relop   NE
>                    relop   GT
>=                   relop   GE
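As a hand-written illustration of the relop rows, the following C++ sketch recognizes a relational operator with one character of lookahead and records the attribute in a global val. The function name scan_relop, the numeric values of the constants, and the NONE result are assumptions made for this example.

```cpp
#include <cassert>
#include <cstdio>

// Manifest constants; the numeric values are arbitrary assumptions.
enum Token { NONE = 0, RELOP = 256 };
enum RelopAttr { LT = 1, LE, EQ, NE, GT, GE };

int val = 0;  // attribute value of the most recent token

// Hypothetical scanner for the six relational operators in the table.
// Returns RELOP and sets val, or returns NONE and consumes nothing.
int scan_relop(std::FILE* in) {
    assert(in != nullptr);
    int c = std::getc(in);
    if (c == '<') {
        int next = std::getc(in);          // look ahead: <, <= or <>
        if (next == '=') { val = LE; return RELOP; }
        if (next == '>') { val = NE; return RELOP; }
        if (next != EOF) std::ungetc(next, in);
        val = LT;
        return RELOP;
    }
    if (c == '=') { val = EQ; return RELOP; }
    if (c == '>') {
        int next = std::getc(in);          // look ahead: > or >=
        if (next == '=') { val = GE; return RELOP; }
        if (next != EOF) std::ungetc(next, in);
        val = GT;
        return RELOP;
    }
    if (c != EOF) std::ungetc(c, in);      // not a relop: leave input untouched
    return NONE;
}
```

The lookahead is what separates < from <= and <>: the scanner cannot decide after seeing < alone.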

LEX SPECIFICATION

• A Lex program consists of three parts:
1. Declarations
2. Translation rules
3. Auxiliary procedures

1- DECLARATIONS SECTION

Includes declarations of variables, manifest constants, and regular definitions.

A manifest constant is an identifier that is declared to represent a constant.

DEFINITIONS OF MANIFEST CONSTANTS USED BY THE TRANSLATION RULES

LT, LE, EQ, NE, GT, GE, IF, THEN, ELSE, ID, NUMBER, RELOP, AROP

REGULAR DEFINITIONS

delim    [ \t\n]
ws       {delim}+
letter   [A-Za-z]
digit    [0-9]
id       {letter}({letter}|{digit})*
number   {digit}+(\.{digit}+)?(E[+\-]?{digit}+)?
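To experiment with these definitions outside Lex, the id and number patterns can be transcribed into C++ std::regex, as in the sketch below. This is not how Lex itself works; the helper names is_id and is_number are assumptions, and [+-] is written in place of [+\-], which denotes the same character set.

```cpp
#include <regex>
#include <string>

// The regular definitions, composed by string concatenation the way the
// Lex definitions are composed by {name} references.
const std::string digit = "[0-9]";
const std::string letter = "[A-Za-z]";
const std::regex id_re(letter + "(" + letter + "|" + digit + ")*");
const std::regex number_re(digit + "+(\\." + digit + "+)?(E[+-]?" + digit + "+)?");

// Hypothetical helpers: true when the whole string matches the pattern.
bool is_id(const std::string& s) { return std::regex_match(s, id_re); }
bool is_number(const std::string& s) { return std::regex_match(s, number_re); }
```

With these definitions, x1 is an id while 1x is not, and 6.02E+23 is a number while 3. and .5 are not, because a decimal point must have digits on both sides.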

2-TRANSLATION RULES

Translation rules are statements of the form

P1 {action 1}
P2 {action 2}
……………..
Pn {action n}

• where each Pi is a regular expression and each {action i} is a program fragment describing what action the lexical analyzer should take when pattern Pi matches a lexeme

2- TRANSLATION RULES

{ws}       { /* no action and no return */ }
if         { return(IF); }
then       { return(THEN); }
else       { return(ELSE); }
"<"        { val = LT; return(RELOP); }
           (and similarly for the other relational operators)
{id}       { val = install_id(); return(ID); }
{number}   { val = install_num(); return(NUM); }

3-AUXILIARY PROCEDURES

• Holds whatever auxiliary procedures are needed by the actions

• A lexical analyzer created by Lex works in concert with a parser in the following manner: when activated by the parser, the lexical analyzer begins reading its remaining input, one character at a time, until it has found the longest prefix of the input that is matched by one of the regular expressions Pi. Then it executes action i.

CONT.

• Typically, action i returns control to the parser. If it does not, the lexical analyzer proceeds to find more lexemes until an action causes control to return to the parser

• The lexical analyzer returns a single quantity to the parser: the token

• To pass an attribute value with information about the lexeme, we can set a global variable called val
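The longest-prefix rule can be illustrated with a small C++ sketch that tries every pattern at the start of the input and keeps the longest match. This imitates the behavior described above using std::regex; it is not how Lex is implemented. The names Rule, Match, longest_prefix and sample_rules and the token values are assumptions, and ties are resolved in favor of the earlier rule, which is the convention Lex follows.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

enum Token { NONE = 0, IF = 256, THEN, ELSE, ID, NUM };  // values are assumptions

struct Rule { std::regex pattern; Token token; };
struct Match { Token token; std::size_t length; };

// Tries each rule at the start of `input` and keeps the longest match.
// On a tie the earlier rule wins, so keywords are listed before id.
Match longest_prefix(const std::vector<Rule>& rules, const std::string& input) {
    Match best{NONE, 0};
    for (const Rule& r : rules) {
        std::smatch m;
        if (std::regex_search(input, m, r.pattern,
                              std::regex_constants::match_continuous)) {
            std::size_t len = static_cast<std::size_t>(m.length(0));
            if (len > best.length) best = Match{r.token, len};
        }
    }
    return best;
}

// A few of the patterns from the slides, in rule order.
std::vector<Rule> sample_rules() {
    return {
        {std::regex("if"), IF},
        {std::regex("then"), THEN},
        {std::regex("else"), ELSE},
        {std::regex("[A-Za-z]([A-Za-z]|[0-9])*"), ID},
        {std::regex("[0-9]+(\\.[0-9]+)?(E[+-]?[0-9]+)?"), NUM},
    };
}
```

On the input ifx, both the if rule and the id rule match a prefix, but id matches three characters and wins; on the input if followed by a space, both match two characters and the earlier if rule wins.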

AUXILIARY PROCEDURES

• install_id()
Procedure to install the lexeme in the symbol table and return a pointer to its entry

• install_num()
Similar procedure to install a lexeme that is a number
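One possible shape for these procedures is sketched below in C++. The slides do not specify the table representation, so the SymbolTable struct, the use of an index in place of a pointer, and passing the lexeme as a parameter (the slides' versions take no arguments) are all assumptions.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Assumed layout: entries kept in insertion order, plus a map from
// lexeme to position for fast lookup.
struct SymbolTable {
    std::vector<std::string> entries;
    std::unordered_map<std::string, std::size_t> lookup;

    // Returns the position of `lexeme`, inserting it on first sight.
    std::size_t install(const std::string& lexeme) {
        auto it = lookup.find(lexeme);
        if (it != lookup.end()) return it->second;
        entries.push_back(lexeme);
        lookup.emplace(lexeme, entries.size() - 1);
        return entries.size() - 1;
    }
};

SymbolTable ids;   // table of identifiers
SymbolTable nums;  // table of numeric lexemes

std::size_t install_id(const std::string& lexeme) { return ids.install(lexeme); }
std::size_t install_num(const std::string& lexeme) { return nums.install(lexeme); }
```

Installing the same identifier twice returns the same position, so every occurrence of a name shares one table entry.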

WRITING A LEXICAL ANALYZER

• Write a lexical analyzer using the C++ language

• Write it as a function called from inside main()

• Call that function Lexan
• The Lexan function returns the value of the token

THE LEXICAL ANALYZER WILL DO..

• Read a character from the user

• If the character is a blank (space) or a tab (written '\t'), no token is returned to the parser; exit the function

• If the character is a newline (written '\n'), the line number is incremented and no token is returned

• If the character is a single digit, its value becomes Tokenval

MORE THAN ONE DIGIT ..

• Allow the user to enter a sequence of characters
• While the user keeps entering digits after the first digit, the analyzer accepts more digits
• Each time, the analyzer computes Tokenval
• If the next character is not a digit, push the character back
• Each time, print the result from each part to see the output

TOKENVAL..

• First digit
Tokenval = t - '0'
• Next digit
Tokenval = Tokenval * 10 + t - '0'
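Putting the preceding slides together, a minimal sketch of the lexical analyzer in C++ might look like the following. It is one possible reading of the lab task, not a reference solution. Assumptions: the token codes NUM and DONE and their values; the stream is passed as a parameter so the function can be tested (call lexan(stdin) to read from the user); after white space or a newline the function keeps reading instead of returning; and the per-step printing is left out.

```cpp
#include <cassert>
#include <cctype>
#include <cstdio>

const int NUM = 256;   // token code for a number; the value is an assumption
const int DONE = 257;  // returned at end of input; also an assumption

int tokenval = 0;      // attribute of the most recent NUM token
int lineno = 1;        // current line number

int lexan(std::FILE* in) {
    assert(in != nullptr);
    while (true) {
        int t = std::getc(in);
        if (t == EOF) {
            return DONE;
        } else if (t == ' ' || t == '\t') {
            continue;                          // white space: no token
        } else if (t == '\n') {
            ++lineno;                          // new line: count it, no token
        } else if (std::isdigit(static_cast<unsigned char>(t))) {
            tokenval = t - '0';                // first digit
            t = std::getc(in);
            while (t != EOF && std::isdigit(static_cast<unsigned char>(t))) {
                tokenval = tokenval * 10 + t - '0';   // each further digit
                t = std::getc(in);
            }
            if (t != EOF) std::ungetc(t, in);  // not a digit: push it back
            return NUM;
        } else {
            return t;                          // any other character is its own token
        }
    }
}
```

For the digits 3, 0, 5 the attribute is built up as 3, then 3 * 10 + 0 = 30, then 30 * 10 + 5 = 305.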

READING CHARACTER FROM THE USER

#include <stdio.h>
int getchar();
• Gets a character from stdin.
• getchar is a macro that returns the next character on the named input stream stdin.
• On success, getchar returns the character read, converted to an int without sign extension (its character code, ASCII on most systems).
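Because the result is an int, it should be stored in an int variable so that the end-of-file value EOF can be told apart from every real character. A small sketch follows; the helper name count_chars is an assumption, and it uses getc(in), of which getchar() is the stdin-specific form.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: counts the characters remaining on `in`.
long count_chars(std::FILE* in) {
    assert(in != nullptr);
    long n = 0;
    int c;                                 // int, not char: must be able to hold EOF
    while ((c = std::getc(in)) != EOF) {
        ++n;
    }
    return n;
}
```

Calling count_chars(stdin) reads until the user signals end of input.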

PUSHING BACK CHARACTERS

#include <stdio.h>
ungetc(c, stdin)
• Pushes a character back into the input stream.
• ungetc pushes the character c back onto the named input stream, which must be open for reading. This character will be returned on the next call to getchar for that stream. One character of push-back is always guaranteed.

• On success, ungetc returns the character pushed back.
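A common use of this pair of calls is to look at the next character without consuming it. The sketch below shows the idea; the helper name peek_char is an assumption.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: returns the next character on `in` without
// consuming it, or EOF if the input is exhausted.
int peek_char(std::FILE* in) {
    assert(in != nullptr);
    int c = std::getc(in);
    if (c != EOF) {
        std::ungetc(c, in);                // put it back for the next read
    }
    return c;
}
```

Two calls to peek_char(stdin) in a row return the same character, because each call pushes back what it read.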

TESTING WHETHER A CHARACTER IS A DIGIT

#include <ctype.h>

isdigit(t)
• Tests for a decimal-digit character.
• isdigit is a macro that classifies ASCII-coded integer values by table lookup.
• isdigit returns nonzero if t is a digit.
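When the argument is a plain char rather than the int returned by getchar, it is safer to convert it to unsigned char first, since isdigit is only defined for unsigned char values and EOF. A short sketch, with the helper names is_digit_char and count_digits being assumptions:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical wrapper: safe to call with any char value.
bool is_digit_char(char ch) {
    return std::isdigit(static_cast<unsigned char>(ch)) != 0;
}

// Counts the decimal digits in a string.
std::size_t count_digits(const std::string& s) {
    std::size_t n = 0;
    for (char ch : s) {
        if (is_digit_char(ch)) ++n;
    }
    return n;
}
```

Comparing the result with 0, as is_digit_char does, matters because isdigit promises only a nonzero value for digits, not necessarily 1.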


QUESTIONS?

Thank you for listening
