COMP455: COMPILER AND LANGUAGE DESIGN Dr. Alaa Aljanaby University of Nizwa Spring 2013



Chapter 1: Introduction

• Compilers draw together the theory and techniques that you have learned in most of your previous computer science courses.

• You will gain a deeper understanding of how compilers work and be able to write better code.

• We will focus on a “little language”: you will write a simple compiler, or maybe parts of one.

Compilers and Interpreters

Compiler: source program → Compiler → target program

Running the compiled program: input → target program → output

Interpreter: source program + input → Interpreter → output

The compiler

• A compiler is a program that reads a program written in one language (the source language) and translates it into an equivalent program in another language (the target language).

• An important role of the compiler is to report any errors it detects during the translation process.

The interpreter

• An interpreter is another type of language processor. Instead of producing a target program, it directly executes the operations specified in the source program on inputs supplied by the user.
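The difference between the two language processors can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "source language" here is a made-up list of (operation, constant) steps, and `interpret`, `compile_to_python` and `run_target` are hypothetical names, not part of the course material.

```python
# Hypothetical "source language": a list of (operation, constant) steps
# applied in order to a single input value.
SOURCE = [("add", 3), ("mul", 2)]
SYMBOL = {"add": "+", "mul": "*"}

def interpret(source, x):
    """Interpreter: execute the source operations directly on the input."""
    for op, k in source:
        if op == "add":
            x = x + k
        elif op == "mul":
            x = x * k
        else:
            raise ValueError(f"unknown operation {op!r}")
    return x

def compile_to_python(source):
    """Compiler: translate the source into an equivalent target program
    (here, Python source text) without running it."""
    expr = "x"
    for op, k in source:
        expr = f"({expr} {SYMBOL[op]} {k})"
    return f"def target(x):\n    return {expr}\n"

def run_target(target_text, x):
    """Running the target program on an input is a separate, later step."""
    namespace = {}
    exec(target_text, namespace)
    return namespace["target"](x)

print(interpret(SOURCE, 4))                      # 14
print(run_target(compile_to_python(SOURCE), 4))  # 14
```

Both routes give the same answer. The interpreter needs the source program every time it runs, while the compiled target program can be run on new inputs without the source.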

Language Processing System


The structure of the compiler

• A compiler operates in a sequence of phases.

• Each phase transforms the source program from one representation to another.

Structure of a Compiler

character stream
→ Lexical Analysis → token stream
→ Parsing → syntax tree
→ Semantic Analysis → syntax tree
→ Intermediate Code Generation → intermediate code
→ Optimization → intermediate code
→ Code Generation → target machine code

Front End: Lexical Analysis, Parsing, Semantic Analysis, Intermediate Code Generation

Back End: Optimization, Code Generation

Symbol Table: used by the phases

These steps are often done in “phases” or “passes”. This structure is very common. Each step will be a set of algorithms we’ll explore.

Analysis – Synthesis Model

Lexical Analysis

character stream → Lexical Analysis → token stream

Reads the character stream and converts it into a stream of tokens.

A sequence of characters, called a lexeme, becomes a token.

We’re recognizing substrings that are meaningful.

speed = speed + 10 * time

What is meaningful about this?

Sort of like recognizing the words in a sentence.

Lexical Analysis

• The lexemes and their tokens will be determined

• Things that become lexemes: punctuation, symbols, keywords, constants, etc.

• The tool “lex” creates lexical analyzers


Lexemes for this string

speed = speed + 10 * time

We’ll convert each of these into a token of the form <name, value>. Sometimes the “value” will be omitted.

“speed” becomes: <id, 1>, where “id” means this is a symbol and “1” is the location in the symbol table.

“10” becomes: <constant, 10> (or just <10>)

Symbol Table:

Location  Name
1         speed

Lexemes for this string

speed = speed + 10 * time

<id, 1> <=> <id, 1> <+> <10> <*> <id, 2>

Symbol Table:

Location  Name
1         speed
2         time

Lexical Table:

Lexeme  Token  Symbol Table Entry
speed   id     1
=       ass
speed   id     1
+       opr
10      num
*       opr
time    id     2
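The lexical table can be reproduced with a small scanner. The following is a minimal sketch, not the output of “lex”: it assumes the token names from the table (id, ass, opr, num), handles only the characters that appear in the example, and uses a hypothetical function name `tokenize`.

```python
import re

# Token classes follow the lexical table: id, ass, opr, num.
# "skip" is an extra class assumed here for whitespace.
TOKEN_SPEC = [
    ("num", r"\d+"),
    ("id", r"[A-Za-z_]\w*"),
    ("ass", r"="),
    ("opr", r"[+*]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return (tokens, symbol_table) for one statement."""
    symbols = {}   # name -> location, numbered from 1 in order of first use
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "skip":
            continue
        if kind == "id":
            # An identifier's value is its location in the symbol table.
            location = symbols.setdefault(lexeme, len(symbols) + 1)
            tokens.append(("id", location))
        elif kind == "num":
            tokens.append(("num", int(lexeme)))
        else:
            tokens.append((kind, lexeme))
    return tokens, symbols

tokens, symbols = tokenize("speed = speed + 10 * time")
print(tokens)
print(symbols)   # {'speed': 1, 'time': 2}
```

The second occurrence of speed reuses location 1 instead of creating a new entry, which is why both occurrences become <id, 1>.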

Syntax Analysis

token stream → Parsing → syntax tree

Converting the token stream into a syntax tree.

In a syntax tree, the nodes are operations and the children are the arguments to the operation.

<id, 1> <=> <id, 1> <+> <10> <*> <id, 2>

What are the operations and arguments here?

Sort of like diagramming a sentence in English class.

Grammar Rules


Parse Tree


Syntax Trees

<id, 1> <=> <id, 1> <+> <10> <*> <id, 2>

Here’s the assignment operation:

<=>
  <id, 1>
  <id, 1> <+> <10> <*> <id, 2>

A complete syntax tree

<id, 1> <=> <id, 1> <+> <10> <*> <id, 2>

<=>
  <id, 1>
  <+>
    <id, 1>
    <*>
      <10>
      <id, 2>

Symbol Table:

Location  Name
1         speed
2         time
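One way to build this tree from the token stream is a recursive-descent parser. The sketch below assumes a small grammar that is not taken from the slides (assignment → id = expr, expr → term { + term }, term → factor { * factor }, factor → id | num), represents tokens as (kind, value) pairs using the names from the lexical table, and uses the hypothetical name `parse_assignment`.

```python
def parse_assignment(tokens):
    """Parse  id = expr  into nested tuples of the form (operator, left, right)."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():              # factor -> id | num
        tok = advance()
        if tok[0] not in ("id", "num"):
            raise SyntaxError(f"expected an operand, got {tok}")
        return tok

    def term():                # term -> factor { '*' factor }
        node = factor()
        while peek() == ("opr", "*"):
            advance()
            node = ("*", node, factor())
        return node

    def expr():                # expr -> term { '+' term }
        node = term()
        while peek() == ("opr", "+"):
            advance()
            node = ("+", node, term())
        return node

    target = advance()
    if target[0] != "id" or peek() != ("ass", "="):
        raise SyntaxError("expected: id = expression")
    advance()                  # consume '='
    tree = ("=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()}")
    return tree

tokens = [("id", 1), ("ass", "="), ("id", 1), ("opr", "+"),
          ("num", 10), ("opr", "*"), ("id", 2)]
print(parse_assignment(tokens))
# ('=', ('id', 1), ('+', ('id', 1), ('*', ('num', 10), ('id', 2))))
```

Because term is parsed below expr, the multiplication ends up deeper in the tree than the addition, which matches the complete syntax tree shown above.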

Semantic Analysis

syntax tree → Semantic Analysis → syntax tree

Semantics are the meaning of the programming language.

Now we’re going to analyze our syntax tree to see if it is, or can be converted to, a tree that is semantically meaningful.

Common checks:
• Valid arguments
• Type checking

<=>
  <id, 1>
  <+>
    <id, 1>
    <*>
      <10>
      <id, 2>

Symbol Table:

Location  Name   Type
1         speed  float
2         time   float

Type Checking

We modify the syntax tree to fix semantic issues that are fixable.

What if they are not fixable? What’s an example of something not fixable?

Coercion: the integer <10> is converted to float with <inttofloat>.

<=>
  <id, 1>
  <+>
    <id, 1>
    <*>
      <inttofloat>
        <10>
      <id, 2>

Symbol Table:

Location  Name   Type
1         speed  float
2         time   float

Semantic Analysis

Before:

<=>
  <id, 1>
  <+>
    <id, 1>
    <*>
      <10>
      <id, 2>

After:

<=>
  <id, 1>
  <+>
    <id, 1>
    <*>
      <inttofloat>
        <10>
      <id, 2>
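The coercion step can be sketched as a recursive walk over the tree. This is an illustration under assumptions: the tree is stored as nested tuples, only the types int and float exist, the types of speed and time come from the symbol table above, and `check` is a hypothetical name.

```python
SYMBOL_TYPES = {1: "float", 2: "float"}   # location -> type, from the symbol table

def check(node):
    """Return (typed_node, type), inserting inttofloat where a coercion is needed."""
    kind = node[0]
    if kind == "id":
        return node, SYMBOL_TYPES[node[1]]
    if kind == "num":
        return node, "int"
    op, left, right = node
    left, left_type = check(left)
    right, right_type = check(right)
    if op == "=":
        if left_type == "float" and right_type == "int":
            right = ("inttofloat", right)          # fixable: widen the value
        elif left_type != right_type:
            # Not fixable in this sketch: a float cannot be stored in an int.
            raise TypeError(f"cannot assign {right_type} to {left_type}")
        return (op, left, right), left_type
    if left_type != right_type:                    # mixed int/float arithmetic
        if left_type == "int":
            left = ("inttofloat", left)
        else:
            right = ("inttofloat", right)
        left_type = "float"
    return (op, left, right), left_type

tree = ("=", ("id", 1), ("+", ("id", 1), ("*", ("num", 10), ("id", 2))))
typed_tree, _ = check(tree)
print(typed_tree)
# ('=', ('id', 1), ('+', ('id', 1), ('*', ('inttofloat', ('num', 10)), ('id', 2))))
```

Only the constant 10 is wrapped, because it is the only int operand that meets a float operand.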

Intermediate Code Generator

syntax tree → Intermediate Code Generation → intermediate code

Most compilers convert the syntax tree into some intermediate code. This is then subject to optimization and conversion to the final machine code.

Why an intermediate code?

Intermediate code example

t1 = inttofloat(10)
t2 = t1 * id2
t3 = id1 + t2
id1 = t3

<=>
  <id, 1>
  <+>
    <id, 1>
    <*>
      <inttofloat>
        <10>
      <id, 2>

Each operation became a line of intermediate code. The “t” values are temporary variables.

The textbook refers to this as three-address code. Each operation has up to 3 operands (some have fewer).

Can you see the three operands in each of these statements?

Intermediate code example

t1 = inttofloat(10)
t2 = t1 * id2
t3 = id1 + t2
id1 = t3

t2 = t1 * id2
Operands are: t2, t1, id2
This is like an assembly instruction: mult t1, id2, t2

t1 = inttofloat(10)
Operands are: t1, 10
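The four lines of three-address code can be produced by a post-order walk of the typed syntax tree. The sketch below assumes the tree is stored as nested tuples and that temporaries are numbered in the order they are created; `gen_tac` is a hypothetical name.

```python
def gen_tac(tree):
    """Emit three-address code for a tree of the form ('=', target, value)."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"t{counter}"

    def walk(node):
        """Emit code for node and return the name that holds its value."""
        kind = node[0]
        if kind == "id":
            return f"id{node[1]}"
        if kind == "num":
            return str(node[1])
        if kind == "inttofloat":
            arg = walk(node[1])
            temp = new_temp()
            code.append(f"{temp} = inttofloat({arg})")
            return temp
        op, left, right = node
        left_name = walk(left)      # children first, then the operation itself
        right_name = walk(right)
        temp = new_temp()
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    _, target, value = tree
    target_name = walk(target)
    value_name = walk(value)
    code.append(f"{target_name} = {value_name}")
    return code

tree = ("=", ("id", 1),
        ("+", ("id", 1), ("*", ("inttofloat", ("num", 10)), ("id", 2))))
for line in gen_tac(tree):
    print(line)
# t1 = inttofloat(10)
# t2 = t1 * id2
# t3 = id1 + t2
# id1 = t3
```

Each interior node of the tree produces exactly one line, which is why each operation became one line of intermediate code.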

Optimization

intermediate code → Optimization → intermediate code

t1 = inttofloat(10)
t2 = t1 * id2
t3 = id1 + t2
id1 = t3

Optimization: making the code more efficient.

Any optimization ideas here?

Optimization

Before:

t1 = inttofloat(10)
t2 = t1 * id2
t3 = id1 + t2
id1 = t3

After:

t2 = 10.0 * id2
id1 = id1 + t2
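The two improvements (doing the int-to-float conversion at compile time, and removing the copy through t3) can be sketched as follows. This is an illustration, not a general optimizer: instructions are assumed to be (dest, op, arg1, arg2) tuples, every temporary is assumed to be used exactly once, and `optimize` and `is_temp` are hypothetical names.

```python
def is_temp(name):
    """Temporaries are named t1, t2, ... in this sketch."""
    return isinstance(name, str) and name[:1] == "t" and name[1:].isdigit()

def optimize(code):
    """code: list of (dest, op, arg1, arg2); op is 'inttofloat', '*', '+' or 'copy'."""
    # Step 1: evaluate inttofloat of an integer constant at compile time
    # and substitute the result wherever the temporary was used.
    constants = {}
    folded = []
    for dest, op, a, b in code:
        a = constants.get(a, a)
        b = constants.get(b, b)
        if op == "inttofloat" and isinstance(a, int):
            constants[dest] = float(a)
            continue                      # the instruction disappears
        folded.append((dest, op, a, b))
    # Step 2: merge "t = x op y" followed by "d = t" into "d = x op y".
    # Assumes the temporary t is not used anywhere else.
    result = []
    for dest, op, a, b in folded:
        if op == "copy" and result and result[-1][0] == a and is_temp(a):
            _, prev_op, prev_a, prev_b = result.pop()
            result.append((dest, prev_op, prev_a, prev_b))
        else:
            result.append((dest, op, a, b))
    return result

code = [
    ("t1", "inttofloat", 10, None),
    ("t2", "*", "t1", "id2"),
    ("t3", "+", "id1", "t2"),
    ("id1", "copy", "t3", None),
]
print(optimize(code))
# [('t2', '*', 10.0, 'id2'), ('id1', '+', 'id1', 't2')]
```

The result corresponds to the two optimized lines above: t2 = 10.0 * id2 and id1 = id1 + t2.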

Code Generation

intermediate code → Code Generation → target machine code

Translate the intermediate code into target code.

t2 = 10.0 * id2
id1 = id1 + t2

LDF  R2, id2
MULF R2, #10.0
LDF  R1, id1
ADDF R1, R2
STF  id1, R1
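A very small code generator for the two optimized instructions might look like the sketch below. It rests on several assumptions: the target uses two-address instructions in which the first register is both an operand and the destination, a "#" prefix marks an immediate constant, registers are handed out in order (so the register numbers differ from the listing above), and each temporary is used only once. `gen_code` is a hypothetical name.

```python
import itertools

OPCODES = {"*": "MULF", "+": "ADDF"}

def is_temp(name):
    return isinstance(name, str) and name[:1] == "t" and name[1:].isdigit()

def gen_code(code):
    """Translate (dest, op, arg1, arg2) instructions into two-address code."""
    out = []
    reg_of = {}                       # temporary name -> register holding it
    numbers = itertools.count(1)      # R1, R2, ... handed out in order

    for dest, op, a, b in code:
        if isinstance(a, float):      # keep a constant as the second operand
            a, b = b, a               # safe: '+' and '*' are commutative
        if a in reg_of:
            reg = reg_of[a]           # reuse the register of a single-use temp
        else:
            reg = f"R{next(numbers)}"
            out.append(f"LDF {reg}, {a}")
        if isinstance(b, float):
            second = f"#{b}"          # immediate operand
        elif b in reg_of:
            second = reg_of[b]
        else:
            second = f"R{next(numbers)}"
            out.append(f"LDF {second}, {b}")
        out.append(f"{OPCODES[op]} {reg}, {second}")
        if is_temp(dest):
            reg_of[dest] = reg        # the result stays in the register
        else:
            out.append(f"STF {dest}, {reg}")
    return out

for line in gen_code([("t2", "*", 10.0, "id2"), ("id1", "+", "id1", "t2")]):
    print(line)
# LDF R1, id2
# MULF R1, #10.0
# LDF R2, id1
# ADDF R2, R1
# STF id1, R2
```

Apart from the register numbering, the instruction sequence is the same: load time, multiply by 10.0, load speed, add, and store the result back into speed.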

Cousins of the Compiler 1. Preprocessors

A preprocessor produces input to the compiler. It may perform the following functions:

• Macro processing: #define ---------
• File inclusion: #include -------
• Rational preprocessors: augment older languages with modern control structures
• Language extensions, e.g. a preprocessor for C can use ## to mark a database-access statement that is embedded within a C program.

Cousins of the Compiler 2. Assemblers

Assembly code is a mnemonic version of machine code.

e.g. b := a + 2 is the same as

Load R1, a
Add R1, #2
Store b, R1

Two-Pass Assembler

The simplest form of assembler makes two passes over the input.

• In the first pass, the identifiers are found and stored in a symbol table.

• In the second pass, it translates operations and identifiers to binary codes and addresses.

load: memory to register
store: register to memory

Example

• E.g.: a hypothetical machine with 4-bit instruction codes, where 0001, 0010, and 0011 stand for Load, Store, and Add.

Address mode:
• 00 ordinary address mode: the next 8 bits refer to a memory address.
• 10 immediate mode: the next 8 bits are a constant.

Example

• The equivalent machine code might be:

        inst. code  reg. no.  address mode  address or value
Load:   0001        01        00            00000000
Add:    0011        01        10            00000010
Store:  0010        01        00            00000100
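The two passes can be sketched for this hypothetical machine as follows. The opcodes and address modes come from the example. The word size of 4 is an assumption chosen so that a lands at address 0 and b at address 4 as in the table, and `assemble` and `parse` are hypothetical names.

```python
import re

OPCODE = {"Load": "0001", "Store": "0010", "Add": "0011"}
WORD = 4   # assumed size of each variable, so that a -> 0 and b -> 4

def is_register(text):
    return re.fullmatch(r"R\d+", text) is not None

def parse(line):
    """'Load R1, a' -> ('Load', ['R1', 'a'])"""
    op, rest = line.split(None, 1)
    return op, [part.strip() for part in rest.split(",")]

def assemble(lines):
    # Pass 1: find the identifiers and store them in a symbol table.
    symbols = {}
    for line in lines:
        _, operands = parse(line)
        for x in operands:
            if not is_register(x) and not x.startswith("#") and x not in symbols:
                symbols[x] = WORD * len(symbols)
    # Pass 2: translate operations and identifiers to binary codes and addresses.
    out = []
    for line in lines:
        op, operands = parse(line)
        reg = next(x for x in operands if is_register(x))
        other = next(x for x in operands if not is_register(x))
        if other.startswith("#"):
            mode, value = "10", int(other[1:])      # immediate mode
        else:
            mode, value = "00", symbols[other]      # ordinary address mode
        out.append(f"{OPCODE[op]} {int(reg[1:]):02b} {mode} {value:08b}")
    return out, symbols

machine_code, symbols = assemble(["Load R1, a", "Add R1, #2", "Store b, R1"])
for word in machine_code:
    print(word)
# 0001 01 00 00000000
# 0011 01 10 00000010
# 0010 01 00 00000100
```

The first pass is needed because an identifier may be used before the assembler knows its address. By the time the second pass runs, every identifier already has an entry in the symbol table.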

Cousins of the Compiler 3. Loaders and Link Editors

• Loading means taking the relocatable machine code and placing the instructions and data in memory at the proper locations.

• The link editor makes a single program from several files of relocatable machine code.

The Grouping of phases

Front end: consists of the phases that depend on the source language and are independent of the target machine (the first 4 phases).

Back end: consists of the phases that depend on the target machine.

Front end vs. Back end


Reducing the number of passes

Pass: several phases are usually implemented in a single pass, which consists of reading an input file and writing an output file.

It is desirable to have relatively few passes, since it takes time to read and write intermediate files.

Compiler Construction Tools

• Scanner generators: produce lexical analyzers
• Parser generators: produce syntax analyzers
• Syntax-directed translation engines
• Intermediate code generators
• Automatic code generators: produce machine code
• Data-flow engines: support code optimization