COMP455: Compiler and Language Design · Dr. Alaa Aljanaby...
Chapter 1: Introduction
• Compilers draw together the theory and techniques that you have learned in most of your previous computer science courses.
• You will gain a deeper understanding of how compilers work and be able to write better code.
• We will focus on a “little language”: you will write a simple compiler, or perhaps parts of one.
Compilers and Interpreters
Figure: Compiler: source program → Compiler → target program, and the target program then maps input → output. Interpreter: source program + input → Interpreter → output.
The compiler
• A compiler is a program that reads a program written in one language (the source language) and translates it into an equivalent program in another language (the target language).
• An important role of the compiler is to report any errors it detects during the translation process.
The interpreter
• An interpreter is another kind of language processor. Instead of producing a target program, it directly executes the operations specified in the source program on inputs supplied by the user.
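The difference can be sketched in Python. The sketch assumes a toy expression tree made of numbers, variable names, and (operator, left, right) tuples. The names interpret and compile_to_python are hypothetical and only for illustration. The interpreter walks the tree and computes a result from the inputs directly. The "compiler" only produces a target program (here, Python source text), and that program is run on the inputs later.

```python
import operator

# Toy tree for speed + 10 * time (an assumption for this sketch).
TREE = ("+", "speed", ("*", 10, "time"))
OPS = {"+": operator.add, "*": operator.mul}


def interpret(tree, env):
    """Interpreter: execute each operation directly on the user's inputs."""
    if isinstance(tree, (int, float)):
        return tree
    if isinstance(tree, str):
        return env[tree]
    op, left, right = tree
    return OPS[op](interpret(left, env), interpret(right, env))


def compile_to_python(tree, params):
    """Compiler: translate the tree into an equivalent target program (Python source)."""

    def emit(t):
        if isinstance(t, (int, float)):
            return repr(t)
        if isinstance(t, str):
            return t
        op, left, right = t
        return f"({emit(left)} {op} {emit(right)})"

    return f"def target({', '.join(params)}):\n    return {emit(tree)}\n"


print(interpret(TREE, {"speed": 2.0, "time": 3.0}))  # 32.0

source = compile_to_python(TREE, ["speed", "time"])
namespace = {}
exec(source, namespace)                 # "load" the generated target program
print(namespace["target"](2.0, 3.0))    # 32.0
```

The compiled program can be run again on new inputs without repeating the translation. The interpreter redoes the tree walk on every run.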
The structure of the compiler
• A compiler operates as a sequence of phases.
• Each phase transforms the source program from one representation to another.
Structure of a Compiler
Figure: character stream → Lexical Analysis → token stream → Parsing → syntax tree → Semantic Analysis → syntax tree → Intermediate Code Generation → intermediate code → Optimization → intermediate code → Code Generation → target machine code. All phases share the Symbol Table, and the diagram groups the phases into a Front End and a Back End.
These steps are often done in “phases” or “passes”. This structure is very common. Each step will be a set of algorithms we’ll explore.
Lexical Analysis
Figure: character stream → Lexical Analysis → token stream
Lexical analysis reads the character stream and converts it into a stream of tokens.
A sequence of characters that forms a unit, called a lexeme, becomes a token.
We’re recognizing substrings that are meaningful.
speed = speed + 10 * time
What is meaningful about this?
Sort of like recognizing the words in a sentence.
Lexical Analysis
• The lexical analyzer determines the lexemes and their tokens.
• Things that become lexemes: punctuation, symbols, keywords, constants, etc.
• The tool “lex” generates lexical analyzers.
Lexemes for this string: speed = speed + 10 * time
We’ll convert each of these into a token of the form <name, value>. Sometimes the “value” is omitted.
“speed” becomes <id, 1>, where “id” means this is a symbol (an identifier) and “1” is its location in the symbol table.
“10” becomes <constant, 10> (or just <10>).
Symbol Table:
Location  Name
1         speed
Lexemes for this string: speed = speed + 10 * time
<id, 1> <=> <id, 1> <+> <10> <*> <id, 2>
Symbol Table:
Location  Name
1         speed
2         time
Lexical Table:
Lexeme   Token    Symbol Table Entry
speed    id       1
=        assign
speed    id       1
+        opr
10       num
*        opr
time     id       2
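A minimal lexer for the example statement can be sketched with Python’s standard re module. Several parts of the sketch are assumptions made for illustration:
• the token patterns;
• the token names (id, num, and each operator used as its own name);
• the helper names tokenize and show.
A lex-generated scanner is built from a similar list of patterns. Each new identifier gets the next location in the symbol table, so repeated uses of speed map to the same entry.

```python
import re

# Token patterns tried in order (an assumption for this sketch).
TOKEN_SPEC = [
    ("num", r"\d+"),
    ("id", r"[A-Za-z_]\w*"),
    ("op", r"[=+*]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return (tokens, symbol_table); an identifier becomes ("id", location)."""
    tokens, symbols = [], {}
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "skip":
            continue
        if kind == "id":
            # First occurrence gets the next location (1-based); later uses reuse it.
            location = symbols.setdefault(lexeme, len(symbols) + 1)
            tokens.append(("id", location))
        elif kind == "num":
            tokens.append(("num", int(lexeme)))
        else:
            tokens.append((lexeme, None))  # operators carry no value
    return tokens, symbols


def show(tokens):
    """Render tokens in the slide notation: <id, 1>, <10>, <=>."""
    parts = []
    for name, value in tokens:
        if name == "id":
            parts.append(f"<id, {value}>")
        elif name == "num":
            parts.append(f"<{value}>")
        else:
            parts.append(f"<{name}>")
    return " ".join(parts)


tokens, symbols = tokenize("speed = speed + 10 * time")
print(show(tokens))  # <id, 1> <=> <id, 1> <+> <10> <*> <id, 2>
print(symbols)       # {'speed': 1, 'time': 2}
```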
Syntax Analysis
Figure: token stream → Parsing → syntax tree
Parsing converts the token stream into a syntax tree.
In a syntax tree, each interior node is an operation and its children are the arguments to that operation.
<id, 1> <=> <id, 1> <+> <10> <*> <id, 2>
What are the operations and arguments here?
Sort of like diagramming a sentence in English class.
Syntax Trees
<id, 1> <=> <id, 1> <+> <10> <*> <id, 2>
Here’s the assignment operation:
<=>
├── <id, 1>
└── <id, 1> <+> <10> <*> <id, 2>
A complete syntax tree for <id, 1> <=> <id, 1> <+> <10> <*> <id, 2>:
<=>
├── <id, 1>
└── <+>
    ├── <id, 1>
    └── <*>
        ├── <10>
        └── <id, 2>
Symbol Table:
Location  Name
1         speed
2         time
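A recursive-descent parser is one common way to build this tree. The sketch assumes a token format of ("id", 1), ("num", 10), ("=", None), and so on. It also assumes a small grammar chosen for illustration:
assignment → id = expr
expr → term { + term }
term → factor { * factor }
factor → id | num
Because * is handled one level below +, 10 * time is grouped first, which yields the complete tree above.

```python
class Parser:
    """Hypothetical recursive-descent parser over (name, value) tokens."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def expect(self, name):
        if self.peek() != name:
            raise SyntaxError(f"expected {name!r}, found {self.peek()!r}")
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def assignment(self):
        target = self.expect("id")
        self.expect("=")
        tree = ("=", target, self.expr())
        if self.peek() is not None:
            raise SyntaxError("extra tokens after the expression")
        return tree

    def expr(self):   # expr -> term { + term }, left-associative
        node = self.term()
        while self.peek() == "+":
            self.pos += 1
            node = ("+", node, self.term())
        return node

    def term(self):   # term -> factor { * factor }
        node = self.factor()
        while self.peek() == "*":
            self.pos += 1
            node = ("*", node, self.factor())
        return node

    def factor(self):  # factor -> id | num
        if self.peek() == "num":
            return self.expect("num")
        return self.expect("id")


TOKENS = [("id", 1), ("=", None), ("id", 1), ("+", None), ("num", 10), ("*", None), ("id", 2)]
print(Parser(TOKENS).assignment())
# ('=', ('id', 1), ('+', ('id', 1), ('*', ('num', 10), ('id', 2))))
```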
Semantic Analysis
Figure: syntax tree → Semantic Analysis → syntax tree
Semantics is the meaning of the constructs of the programming language.
Now we analyze our syntax tree to see whether it is semantically meaningful, or can be converted into a tree that is.
Common checks: valid arguments, type checking.
<=>
├── <id, 1>
└── <+>
    ├── <id, 1>
    └── <*>
        ├── <10>
        └── <id, 2>
Symbol Table:
Location  Name   Type
1         speed  float
2         time   float
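Type Checking
We modify the syntax tree to fix semantic issues that are fixable.
What if they are not fixable? What’s an example of something that is not fixable?
<=>
├── <id, 1>
└── <+>
    ├── <id, 1>
    └── <*>
        ├── <inttofloat>
        │   └── <10>
        └── <id, 2>
Symbol Table:
Location  Name   Type
1         speed  float
2         time   float
Coercion: the integer constant 10 is wrapped in an <inttofloat> node so that it can be multiplied by the float time.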
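Type checking with coercion can be sketched as a recursive walk that returns each subtree together with its type. The following are assumptions of this sketch: the tuple tree format, the check function, and a rule with only two types (int and float). When an int meets a float, the int operand is wrapped in an inttofloat node. Assigning a float value to an int variable is treated here as an example of an unfixable error, though real languages differ on this point.

```python
def check(node, types):
    """Return (checked_node, type); `types` maps symbol-table locations to types."""
    kind = node[0]
    if kind == "num":
        return node, "int"
    if kind == "id":
        return node, types[node[1]]
    op, left, right = node
    left, left_type = check(left, types)
    right, right_type = check(right, types)
    if left_type == right_type:
        return (op, left, right), left_type
    if op == "=" and left_type == "int":
        # Assumed rule: storing a float into an int variable is not fixable.
        raise TypeError("cannot assign a float value to an int variable")
    # Coercion: wrap whichever operand is an int.
    if left_type == "int":
        left = ("inttofloat", left)
    else:
        right = ("inttofloat", right)
    return (op, left, right), "float"


TREE = ("=", ("id", 1), ("+", ("id", 1), ("*", ("num", 10), ("id", 2))))
TYPES = {1: "float", 2: "float"}  # speed: float, time: float
checked, _ = check(TREE, TYPES)
print(checked)
# ('=', ('id', 1), ('+', ('id', 1), ('*', ('inttofloat', ('num', 10)), ('id', 2))))
```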
Semantic Analysis
Before:
<=>
├── <id, 1>
└── <+>
    ├── <id, 1>
    └── <*>
        ├── <10>
        └── <id, 2>
After Semantic Analysis:
<=>
├── <id, 1>
└── <+>
    ├── <id, 1>
    └── <*>
        ├── <inttofloat>
        │   └── <10>
        └── <id, 2>
Intermediate Code Generator
Figure: syntax tree → Intermediate Code Generator → intermediate code
Most compilers convert the syntax tree into some intermediate code. This code is then subject to optimization and conversion to the final machine code.
Why use an intermediate code?
Intermediate code example
t1 = inttofloat(10)
t2 = t1 * id2
t3 = id1 + t2
id1 = t3
<=>
├── <id, 1>
└── <+>
    ├── <id, 1>
    └── <*>
        ├── <inttofloat>
        │   └── <10>
        └── <id, 2>
Each operation became a line of intermediate code. The “t” values are temporary variables.
The textbook refers to this as three-address code: each instruction has at most three operands (some have fewer).
Can you see the three operands in each of these statements?
t2 = t1 * id2: the operands are t2, t1, and id2. This is like an assembly instruction: mult t1, id2, t2
t1 = inttofloat(10): the operands are t1 and 10 (only two).
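Three-address code can be generated from the checked tree with a post-order walk. The address of each operand is computed first, and then a fresh temporary receives the result. The tuple tree format and the names gen_tac and addr are assumptions of this sketch. For the example tree, the output matches the four instructions listed above.

```python
from itertools import count


def gen_tac(tree):
    """Translate an assignment tree into a list of three-address instructions."""
    code = []
    temps = count(1)  # produces 1, 2, 3, ... for t1, t2, t3, ...

    def addr(node):
        """Return the name, constant, or temporary that holds node's value."""
        kind = node[0]
        if kind == "id":
            return f"id{node[1]}"
        if kind == "num":
            return str(node[1])
        if kind == "inttofloat":
            arg = addr(node[1])
            t = f"t{next(temps)}"
            code.append(f"{t} = inttofloat({arg})")
            return t
        op, left, right = node
        a, b = addr(left), addr(right)  # operands first (post-order)
        t = f"t{next(temps)}"
        code.append(f"{t} = {a} {op} {b}")
        return t

    _, target, value = tree  # top level is an assignment: target = value
    dest, src = addr(target), addr(value)
    code.append(f"{dest} = {src}")
    return code


TREE = ("=", ("id", 1), ("+", ("id", 1), ("*", ("inttofloat", ("num", 10)), ("id", 2))))
for line in gen_tac(TREE):
    print(line)
# t1 = inttofloat(10)
# t2 = t1 * id2
# t3 = id1 + t2
# id1 = t3
```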
Optimization
Figure: intermediate code → Optimization → intermediate code
t1 = inttofloat(10)
t2 = t1 * id2
t3 = id1 + t2
id1 = t3
Optimization: making the code more efficient.
Any optimization ideas here?
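Optimization
Before:
t1 = inttofloat(10)
t2 = t1 * id2
t3 = id1 + t2
id1 = t3
After:
t2 = 10.0 * id2
id1 = id1 + t2
The conversion of 10 is done once at compile time, producing the constant 10.0. The temporary t3 is eliminated by assigning the sum directly to id1.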
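The two improvements can be sketched as small passes over the three-address code. The following belong to this sketch only:
• the instruction format (dest, op, args);
• the helper names;
• the assumption that every temporary tN is used exactly once, which holds for code generated from a tree, as here.
Pass 1 performs inttofloat on a constant at compile time and substitutes the result. Pass 2 merges "t3 = id1 + t2" followed by "id1 = t3" into a single instruction.

```python
def is_temp(name):
    """Assumed naming rule: compiler-generated temporaries are t1, t2, ..."""
    return name.startswith("t") and name[1:].isdigit()


def optimize(code):
    # Pass 1: do inttofloat of an integer constant at compile time and propagate it.
    consts, folded = {}, []
    for dest, op, args in code:
        args = tuple(consts.get(a, a) for a in args)
        if op == "inttofloat" and args[0].isdigit() and is_temp(dest):
            consts[dest] = str(float(args[0]))  # "10" -> "10.0"
        else:
            folded.append((dest, op, args))
    # Pass 2: merge "t = a op b" followed by "x = t" into "x = a op b".
    result = []
    for dest, op, args in folded:
        if op == "copy" and result and is_temp(args[0]) and result[-1][0] == args[0]:
            _, prev_op, prev_args = result.pop()
            result.append((dest, prev_op, prev_args))
        else:
            result.append((dest, op, args))
    return result


def fmt(instr):
    """Print an instruction in the slide's notation."""
    dest, op, args = instr
    if op == "copy":
        return f"{dest} = {args[0]}"
    if op == "inttofloat":
        return f"{dest} = inttofloat({args[0]})"
    return f"{dest} = {args[0]} {op} {args[1]}"


CODE = [
    ("t1", "inttofloat", ("10",)),
    ("t2", "*", ("t1", "id2")),
    ("t3", "+", ("id1", "t2")),
    ("id1", "copy", ("t3",)),
]
for instr in optimize(CODE):
    print(fmt(instr))
# t2 = 10.0 * id2
# id1 = id1 + t2
```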
Code Generation
Figure: intermediate code → Code Generation → target machine code
Translate the intermediate code into target code.
t2 = 10.0 * id2
id1 = id1 + t2
LDF R2, id2
MULF R2, #10.0
LDF R1, id1
ADDF R1, R2
STF id1, R1
(The F suffix marks floating-point instructions, and # marks an immediate constant.)
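A naive code generator for this target can be sketched as follows. The instruction format, the two-register pool, and the rules are all assumptions:
• temporaries stay in registers and are never stored;
• a constant operand of a commutative operator is moved to the immediate (#) position;
• registers come from a small free list, with no spilling.
With these rules the sketch produces the five instructions listed above.

```python
import re

OPCODES = {"+": "ADDF", "*": "MULF"}
COMMUTATIVE = {"+", "*"}


def is_const(x):
    return re.fullmatch(r"\d+(\.\d+)?", x) is not None


def is_temp(name):
    return re.fullmatch(r"t\d+", name) is not None


def codegen(code):
    """Translate (dest, op, (a, b)) instructions into LDF/ADDF/MULF/STF code."""
    out = []
    free = ["R1", "R2"]  # free registers; pop() hands out R2 first, then R1
    reg_of = {}          # temporaries currently held in registers
    for dest, op, (a, b) in code:
        if is_const(a) and op in COMMUTATIVE:
            a, b = b, a  # put the constant in the immediate operand
        if a in reg_of:
            r = reg_of.pop(a)
        else:
            r = free.pop()  # raises IndexError if registers run out (no spilling)
            out.append(f"LDF {r}, {a}")
        if is_const(b):
            src = f"#{b}"
        elif b in reg_of:
            src = reg_of.pop(b)
            free.append(src)  # last use of this temporary: its register is free again
        else:
            src = b
        out.append(f"{OPCODES[op]} {r}, {src}")
        if is_temp(dest):
            reg_of[dest] = r  # keep the temporary in its register; no store needed
        else:
            out.append(f"STF {dest}, {r}")
            free.append(r)
    return out


CODE = [("t2", "*", ("10.0", "id2")), ("id1", "+", ("id1", "t2"))]
print("\n".join(codegen(CODE)))
# LDF R2, id2
# MULF R2, #10.0
# LDF R1, id1
# ADDF R1, R2
# STF id1, R1
```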
Cousins of the Compiler 1. Preprocessors
A preprocessor produces input to the compiler. It may perform the following functions:
• Macro processing: #define
• File inclusion: #include
• Rational preprocessors: augment older languages with modern control structures
• Language extensions: e.g., a preprocessor may use ## to mark database-access statements embedded within a C program.
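Macro processing can be sketched as a line-by-line pass. The pass records each "#define NAME text" directive and then replaces whole-word occurrences of NAME in later lines. The preprocess function is a hypothetical simplification with three limitations:
• it handles only object-like macros;
• it does not rescan expansions for further macros;
• it would also replace text inside string literals, which a real C preprocessor does not do.

```python
import re

DEFINE = re.compile(r"\s*#\s*define\s+([A-Za-z_]\w*)\s+(.*)$")


def preprocess(source):
    """Expand object-like #define macros; every other line passes through."""
    macros = {}
    output = []
    for line in source.splitlines():
        m = DEFINE.match(line)
        if m:
            macros[m.group(1)] = m.group(2).strip()
            continue  # the directive itself is not passed on to the compiler
        for name, body in macros.items():
            # Whole words only, so SIZE does not match inside SIZE2.
            line = re.sub(rf"\b{re.escape(name)}\b", lambda _m, text=body: text, line)
        output.append(line)
    return "\n".join(output)


SOURCE = """#define SIZE 10
int a[SIZE];
int n = SIZE * 2;"""
print(preprocess(SOURCE))
# int a[10];
# int n = 10 * 2;
```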
Cousins of the Compiler 2. Assemblers
Assembly code is a mnemonic version of machine code.
E.g., b := a + 2 can be written as:
Load R1, a
ADD R1, #2
Store b, R1
Two-Pass Assembler
The simplest form of assembler makes two passes over the input:
• In the first pass, the identifiers are found and stored in a symbol table, each with an assigned address.
• In the second pass, the assembler translates each operation into its binary code and each identifier into its address.
Load: memory to register. Store: register to memory.
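The two passes can be sketched for the Load/ADD/Store example. The following are illustrative assumptions:
• the instruction syntax;
• the opcode values, which follow the hypothetical machine described below (0001 Load, 0010 Store, 0011 Add);
• a 4-byte word per variable, which matches the addresses 0 and 4 in that machine code.
In this tiny program a single pass would also work. The second pass becomes necessary when an identifier is used before the line that defines it, such as a forward jump label.

```python
import re

OPCODES = {"load": 0b0001, "store": 0b0010, "add": 0b0011}
WORD_SIZE = 4  # assumed: each variable occupies one 4-byte word
REGISTER = re.compile(r"R(\d+)")


def parse(line):
    """Split 'Load R1, a' into ('load', ['R1', 'a'])."""
    mnemonic, rest = line.split(None, 1)
    return mnemonic.lower(), [operand.strip() for operand in rest.split(",")]


def register_number(operand):
    m = REGISTER.fullmatch(operand)
    return int(m.group(1)) if m else None


def assemble(lines):
    # Pass 1: find the identifiers and give each an address in the symbol table.
    symtab = {}
    for line in lines:
        _, operands = parse(line)
        for operand in operands:
            if not operand.startswith("#") and register_number(operand) is None:
                symtab.setdefault(operand, len(symtab) * WORD_SIZE)
    # Pass 2: translate operation codes to binary and identifiers to addresses.
    program = []
    for line in lines:
        mnemonic, operands = parse(line)
        reg = next(n for n in map(register_number, operands) if n is not None)
        other = next(o for o in operands if register_number(o) is None)
        if other.startswith("#"):
            mode, value = 0b10, int(other[1:])  # immediate mode: a constant
        else:
            mode, value = 0b00, symtab[other]   # ordinary mode: a memory address
        program.append((OPCODES[mnemonic], reg, mode, value))
    return symtab, program


symtab, program = assemble(["Load R1, a", "ADD R1, #2", "Store b, R1"])
print(symtab)   # {'a': 0, 'b': 4}
print(program)  # [(1, 1, 0, 0), (3, 1, 2, 2), (2, 1, 0, 4)]
```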
Example
• E.g., a hypothetical machine with 4-bit instruction codes, where 0001, 0010, and 0011 stand for Load, Store, and Add.
Address modes:
• 00 ordinary address mode: the next 8 bits refer to a memory address.
• 10 immediate mode: the next 8 bits are a constant.
Example
• The equivalent machine code might be:
        inst. code  reg. no  address mode  address or value
Load:   0001        01       00            00000000
Add:    0011        01       10            00000010
Store:  0010        01       00            00000100
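Packing the four fields into one instruction word takes only shifts and bitwise OR. The field widths come from the example above: a 4-bit instruction code, a 2-bit register number, a 2-bit address mode, and an 8-bit address or value. The 16-bit total and the function names encode and show are assumptions of this sketch.

```python
def encode(opcode, reg, mode, value):
    """Pack opcode(4) | reg(2) | mode(2) | value(8) into a 16-bit word."""
    if not (0 <= opcode < 16 and 0 <= reg < 4 and 0 <= mode < 4 and 0 <= value < 256):
        raise ValueError("field out of range")
    return (opcode << 12) | (reg << 10) | (mode << 8) | value


def show(word):
    """Split a 16-bit word into the slide's 4 + 2 + 2 + 8 bit groups."""
    bits = f"{word:016b}"
    return f"{bits[:4]} {bits[4:6]} {bits[6:8]} {bits[8:]}"


PROGRAM = [
    ("Load", 0b0001, 0b01, 0b00, 0),   # Load R1, a   (a at address 0)
    ("Add", 0b0011, 0b01, 0b10, 2),    # ADD R1, #2   (immediate constant 2)
    ("Store", 0b0010, 0b01, 0b00, 4),  # Store b, R1  (b at address 4)
]
for name, *fields in PROGRAM:
    print(f"{name}: {show(encode(*fields))}")
# Load: 0001 01 00 00000000
# Add: 0011 01 10 00000010
# Store: 0010 01 00 00000100
```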
Cousins of the Compiler 3. Loaders and Link Editors
• Loading means taking relocatable machine code and placing the instructions and data in memory at the proper locations.
• The link editor makes a single program from several files of relocatable machine code.
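A toy link editor and loader can be sketched as follows. The module format is entirely hypothetical. Each module lists the symbols it defines and its instructions, and each operand is tagged with one of three kinds:
• rel: an address relative to the module’s start;
• ext: a symbol defined in another module;
• abs: a constant that must never be relocated.
The link editor lays the modules out one after another and resolves every address relative to the start of the combined program. The loader then relocates the program by adding the actual load address.

```python
def link(modules):
    """Link editor: combine relocatable modules into one relocatable program."""
    starts, symbols, next_start = [], {}, 0
    # Lay the modules out one after another and collect their defined symbols.
    for module in modules:
        starts.append(next_start)
        for name, offset in module["defs"].items():
            symbols[name] = next_start + offset
        next_start += len(module["code"])
    # Rewrite operands relative to the start of the combined program.
    program = []
    for module, start in zip(modules, starts):
        for op, (kind, value) in module["code"]:
            if kind == "rel":
                program.append((op, start + value, True))
            elif kind == "ext":
                program.append((op, symbols[value], True))
            else:  # "abs": constants are not relocated
                program.append((op, value, False))
    return program


def load(program, base):
    """Loader: place the program at `base` by relocating every relocatable address."""
    return [(op, value + base if relocatable else value) for op, value, relocatable in program]


MAIN = {"defs": {"start": 0}, "code": [("call", ("ext", "square")), ("jump", ("rel", 0))]}
LIB = {"defs": {"square": 0}, "code": [("mul", ("abs", 2)), ("ret", ("abs", 0))]}

linked = link([MAIN, LIB])
print(linked)             # [('call', 2, True), ('jump', 0, True), ('mul', 2, False), ('ret', 0, False)]
print(load(linked, 100))  # [('call', 102), ('jump', 100), ('mul', 2), ('ret', 0)]
```

The constant 2 in "mul" and the address 2 in "call" are equal before loading. Only the address moves when the program is loaded at 100.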
The Grouping of Phases
Front end: consists of the phases that depend on the source language and are independent of the target machine (the first 4 phases).
Back end: the phases that depend on the target machine.
Reducing the Number of Passes
Pass: several phases are usually implemented in a single pass, which reads an input file and writes an output file.
It is desirable to have relatively few passes, since it takes time to read and write intermediate files.