Transcript
Page 1: Compilation   0368 -3133

Compilation 0368-3133

Lecture 1: Introduction

Noam Rinetzky

1

Page 2: Compilation   0368 -3133

Admin

• Lecturer: Noam Rinetzky– [email protected]– http://www.cs.tau.ac.il/~maon

• T.A.: Shachar Itzhaky– [email protected]

• Textbooks: – Modern Compiler Design – Compilers: principles, techniques and tools 2

Page 3: Compilation   0368 -3133

Admin• Compiler Project 40%

– 4 practical exercises– Groups of 3

• 1 theoretical exercise 10%– Groups of 1

• Final exam 50% – must pass

3

Page 4: Compilation   0368 -3133

Course Goals

• What is a compiler• How does it work• (Reusable) techniques & tools

4

Page 5: Compilation   0368 -3133

Course Goals

• What is a compiler• How does it work• (Reusable) techniques & tools

• Programming language implementation– runtime systems

• Execution environments– Assembly, linkers, loaders, OS

5

Page 6: Compilation   0368 -3133

6

What is a Compiler?

“A compiler is a computer program that transforms source code written in a programming language (source language) into another language (target language).

The most common reason for wanting to transform source code is to create an executable program.”

--Wikipedia

Page 7: Compilation   0368 -3133

7

What is a Compiler?source language target language

Compiler

Executable code

exe

Sourcetext

txt

Page 8: Compilation   0368 -3133

8

What is a Compiler?

Executable code

exe

Sourcetext

txt

Compiler

int a, b;a = 2;b = a*2 + 1;

MOV R1,2SAL R1INC R1MOV R2,R1

Page 9: Compilation   0368 -3133

9

What is a Compiler?source language target language

CC++

PascalJava

PostscriptTeX

PerlJavaScript

PythonRuby

Prolog

LispScheme

MLOCaml

IA32IA64

SPARC

CC++

PascalJava

Java Bytecode…

Compiler

Page 10: Compilation   0368 -3133

Lecture Outline

• High level programming languages• Interpreters vs. Compilers• Techniques and tools (1.1)

– why study compilers … • Handwritten toy compiler & interpreter (1.2)• Summary

10

Page 11: Compilation   0368 -3133

High Level Programming Languages• Imperative Algol, PL1, Fortran, Pascal, Ada, Modula, C

– Closely related to “von Neumann” Computers• Object-oriented Simula, Smalltalk, Modula3, C++, Java,

C#, Python– Data abstraction and ‘evolutionary’ form of program

development• Class an implementation of an abstract data type (data+code)• Objects Instances of a class• Inheritance + generics

• Functional Lisp, Scheme, ML, Miranda, Hope, Haskel, OCaml, F#

• Logic Programming Prolog11

Page 12: Compilation   0368 -3133

Other Languages• Hardware description languages VHDL

– The program describes Hardware components– The compiler generates hardware layouts

• Scripting languages Shell, C-shell, REXX, Perl– Include primitives constructs from the current software

environment• Web/Internet HTML, Telescript, JAVA, Javascript • Graphics and Text processing TeX, LaTeX, postscript

– The compiler generates page layouts• Intermediate-languages P-Code, Java bytecode, IDL

12

Page 13: Compilation   0368 -3133

Interpreter

• A program which executes a program• Input a program (P) + its input (x)• Output the computed output (P(x))

13

Interpreter

Sourcetext

txt

Input

OutputExecutionEngine

Page 14: Compilation   0368 -3133

Interpreter

• A program which executes a program• Input a program (P) + its input (x)• Output the computed output (“P(x)”)

14

Interpreter

5

ExecutionEngine

int x;scanf(“%d”, &x);x = x + 1 ;printf(“%d”, x);

6

Page 15: Compilation   0368 -3133

Compiler

• A program which transforms programs • Input a program (P)• Output an object program (O)

– For any x, “O(x)” = “P(x))”

15

Compiler

Sourcetext

txt

Backend(synthesis) Executabl

e code

exe

P O

Page 16: Compilation   0368 -3133

Example

16

Compiler

Backend(synthesis)

int x;scanf(“%d”, &x);x = x + 1 ;printf(“%d”, x);

add %fp,-8, %l1mov %l1, %o1call scanfld [%fp-8],%l0add %l0,1,%l0st %l0,[%fp-8]ld [%fp-8], %l1mov %l1, %o1call printf

5

6

Page 17: Compilation   0368 -3133

17

Compiler vs. Interpreter

Executable code

exe

Sourcetext

txtBackend(synthesi

s)

Sourcetext

txt

Input

OutputExecutionEngine

5

Output

Page 18: Compilation   0368 -3133

Remarks

• Both compilers and interpreters are programs written in high level languages– Requires additional step to compile the

compiler/interpreter

• Compilers and interpreters share functionality

18

Page 19: Compilation   0368 -3133

How to write a compiler?

19

L1 CompilerExecutable compiler

exe

L2 Compiler source

txtL1

Page 20: Compilation   0368 -3133

How to write a compiler?

20

L1 CompilerExecutable compiler

exe

L2 Compiler source

txtL1

L2 CompilerExecutable program

exe

Program source

txtL2

=

Page 21: Compilation   0368 -3133

How to write a compiler?

21

L1 CompilerExecutable compiler

exe

L2 Compiler source

txtL1

L2 CompilerExecutable program

exe

Program source

txtL2

=

21

ProgramOutput

Y

Input

X

=

Page 22: Compilation   0368 -3133

Bootstrapping a compiler

22

L1 Compilersimple

L2 executable compiler

exeSimple

L2 compiler source

txtL1

L2s CompilerInefficient adv.

L2 executable compiler

exeadvanced

L2 compiler source

txtL2

L2 CompilerEfficient adv.

L2 executable compiler

Yadvanced

L2 compiler source

X

=

=

Page 23: Compilation   0368 -3133

Conceptual structure of a compiler

23

Executable code

exe

Sourcetext

txtSemantic

RepresentationBackend(synthesi

s)

Compiler

Frontend(analysis)

Page 24: Compilation   0368 -3133

Interpreter vs. Compiler• Conceptually simpler

– “define” the prog. lang. • Can provide more specific

error report• Easier to port

• Faster response time

• [More secure]

• How do we know the translation is correct?

• Can report errors before input is given

• More efficient code– Compilation can be expensive – move computations to

compile-time• compile-time + execution-time <

interpretation-time is possible

24

Page 25: Compilation   0368 -3133

Interpreters report input-specific definite errors

• Input-program

• Input data – y = -1– y = 0

25

scanf(“%d”, &y);if (y < 0)

x = 5;...If (y <= 0)

z = x + 1;

Page 26: Compilation   0368 -3133

Compilers report input-independent possible errors

• Input-program

• Compiler-Output – “line 88: x may be used before set''

26

scanf(“%d”, &y);if (y < 0)

x = 5;...If (y <= 0)

z = x + 1;

Page 27: Compilation   0368 -3133

Compiled programs are usually more efficient than

27

scanf(“%d”,&x);y = 5 ;z = 7 ;x = x + y * z;printf(“%d”,x);

add %fp,-8, %l1mov %l1, %o1call scanfmov 5, %l0st %l0,[%fp-12]mov 7,%l0st %l0,[%fp-16]ld [%fp-8], %l0ld [%fp-8],%l0add %l0, 35 ,%l0st %l0,[%fp-8]ld [%fp-8], %l1mov %l1, %o1 call printf

Backend(synthesi

s)

Compiler

Page 28: Compilation   0368 -3133

Compiler vs. Interpreter

28

Source

Code

Executable

Code Machine

Source

Code

Intermediate

Code Interpreter

preprocessing

processingpreprocessing

processing

Page 29: Compilation   0368 -3133

Lecture Outline

• High level programming languages• Interpreters vs. Compilers• Techniques and tools (1.1)

– why study compilers … • Handwritten toy compiler & interpreter (1.2)• Summary

29

Page 30: Compilation   0368 -3133

Why Study Compilers?

• Become a compiler writer– New programming languages– New machines– New compilation modes: “just-in-time”– New optimization goals (energy)

• Using some of the techniques in other contexts

• Design a very big software program using a reasonable effort

• Learn applications of many CS results (formal languages, decidability, graph algorithms, dynamic programming, ...

• Better understating of programming languages and machine architectures

• Become a better programmer

30

Page 31: Compilation   0368 -3133

Why Study Compilers?

• Compiler construction is successful– Clear problem – Proper structure of the solution– Judicious use of formalisms

• Wider application– Many conversions can be viewed as

compilation• Useful algorithms

31

Page 32: Compilation   0368 -3133

32

Conceptual Structure of a Compiler

Executable code

exe

Sourcetext

txtSemantic

RepresentationBackend(synthesi

s)

Compiler

Frontend(analysis)

Page 33: Compilation   0368 -3133

33

Conceptual Structure of a Compiler

Executable code

exe

Sourcetext

txtSemantic

RepresentationBackend(synthesi

s)

Compiler

Frontend(analysis)

LexicalAnalysi

s

Syntax Analysi

sParsing

Semantic

Analysis

IntermediateRepresentati

on(IR)

CodeGeneration

Page 34: Compilation   0368 -3133

Proper Design

• Simplify the compilation phase– Portability of the compiler frontend– Reusability of the compiler backend

• Professional compilers are integrated

34

Java

C

Pascal

C++

ML

Pentium

MIPS

Sparc

Java

C

Pascal

C++

ML

Pentium

MIPS

Sparc

IR

Page 35: Compilation   0368 -3133

35

Modularity

SourceLanguage 1

txt

SemanticRepresentation

BackendTL2

FrontendSL2

int a, b;a = 2;b = a*2 + 1;

MOV R1,2SAL R1INC R1MOV R2,R1

FrontendSL3

FrontendSL1

BackendTL1

BackendTL3

SourceLanguage 1

txt

SourceLanguage 1

txt

Executabletarget 1

exe

Executabletarget 1

exe

Executabletarget 1

exe

SET R1,2STORE #0,R1SHIFT R1,1STORE #1,R1ADD R1,1STORE #2,R1

Page 36: Compilation   0368 -3133

Judicious use of formalisms• Regular expressions (lexical analysis)• Context-free grammars (syntactic analysis)• Attribute grammars (context analysis)• Code generator generators (dynamic programming)

• But also some nitty-gritty programming

36

LexicalAnalysi

s

Syntax Analysi

sParsing

Semantic

Analysis

IntermediateRepresentati

on(IR)

CodeGeneration

Page 37: Compilation   0368 -3133

Use of program-generating tools

• Parts of the compiler are automatically generated from specification

37

Stream of tokens

Jlex

regular expressions

input program scanner

Page 38: Compilation   0368 -3133

Use of program-generating tools

• Parts of the compiler are automatically generated from specification

38

Jcup

Context free grammar

Stream of tokens parser Syntax tree

Page 39: Compilation   0368 -3133

Use of program-generating tools• Simpler compiler construction

– Less error prone– More flexible

• Use of pre-canned tailored code– Use of dirty program tricks

• Reuse of specification

39

toolspecification

input (generated) code

output

Page 40: Compilation   0368 -3133

Wide applicability

• Structured data can be expressed using context free grammars– HTML files– Postscript– Tex/dvi files– …

40

Page 41: Compilation   0368 -3133

Generally useful algorithms

• Parser generators• Garbage collection• Dynamic programming• Graph coloring

41

Page 42: Compilation   0368 -3133

Lecture Outline

• High level programming languages• Interpreters vs. Compilers• Techniques and tools (1.1)

– why study compilers … • Handwritten toy compiler & interpreter (1.2)• Summary

42

Page 43: Compilation   0368 -3133

43

Conceptual Structure of a Compiler

Executable code

exe

Sourcetext

txtSemantic

RepresentationBackend(synthesi

s)

Compiler

Frontend(analysis)

LexicalAnalysi

s

Syntax Analysi

sParsing

Semantic

Analysis

IntermediateRepresentati

on(IR)

CodeGeneration

Page 44: Compilation   0368 -3133

Toy compiler/interpreter (1.2)

• Trivial programming language• Stack machine• Compiler/interpreter written in C• Demonstrate the basic steps

44

Page 45: Compilation   0368 -3133

The abstract syntax tree (AST)

• Intermediate program representation• Defines a tree

– Preserves program hierarchy• Generated by the parser• Keywords and punctuation symbols are not

stored – Not relevant once the tree exists

45

Page 46: Compilation   0368 -3133

Syntax tree for 5*(a+b)

46

expression

number expression‘*’

identifier

expression‘(’ ‘)’

‘+’ identifier

‘a’ ‘b’

‘5’

Page 47: Compilation   0368 -3133

Abstract Syntax tree for 5*(a+b)

47

‘*’

‘+’

‘a’ ‘b’

‘5’

Page 48: Compilation   0368 -3133

Annotated Abstract Syntax tree

48

‘*’

‘+’

‘a’ ‘b’

‘5’

type:real

loc: reg1

type:real

loc: reg2

type:real

loc: sp+8

type:real

loc: sp+24

type:integer

Page 49: Compilation   0368 -3133

49

Structure of toy Compiler / interpreter

Executable code

exe

Sourcetext

txtSemantic

Representation

Backend (synthesi

s)Frontend(analysis)

LexicalAnalysi

s

Syntax Analysi

sParsing

Semantic

Analysis(NOP)

IntermediateRepresentati

on(AST)

CodeGeneration

ExecutionEngine

Execution Engine Output*

* Programs in our PL do not take input

Page 50: Compilation   0368 -3133

Source Language

• Fully parameterized expressions• Arguments can be a single digit

(4 + (3 * 9))✗3 + 4 + 5✗(12 + 3)

50

expression digit | ‘(‘ expression operator expression ‘)’operator ‘+’ | ‘*’digit ‘0’ | ‘1’ | ‘2’ | ‘3’ | ‘4’ | ‘5’ | ‘6’ | ‘7’ | ‘8’ | ‘9’

Page 51: Compilation   0368 -3133

Driver for the Toy Compiler

51

#include "parser.h" /* for type AST_node */#include "backend.h" /* for Process() */#include "error.h" /* for Error() */

int main(void) { AST_node *icode;

if (!Parse_program(&icode)) Error("No top-level expression"); Process(icode);

return 0;}

Page 52: Compilation   0368 -3133

Lexical Analysis

• Partitions the inputs into tokens– DIGIT– EOF– ‘*’– ‘+’– ‘(‘– ‘)’

• Each token has its representation• Ignores whitespaces

52

Page 53: Compilation   0368 -3133

lex.h: Header File for Lexical Analysis

53

/* Define class constants *//* Values 0-255 are reserved for ASCII characters */#define EoF 256#define DIGIT 257typedef struct {

int class; char repr;} Token_type;

extern Token_type Token;extern void get_next_token(void);

Page 54: Compilation   0368 -3133

#include "lex.h" token_type Token; // Global variable

void get_next_token(void) { int ch; do { ch = getchar(); if (ch < 0) { Token.class = EoF; Token.repr = '#'; return; } } while (Layout_char(ch)); if ('0' <= ch && ch <= '9') {Token.class = DIGIT;} else {Token.class = ch;} Token.repr = ch;}

static int Layout_char(int ch) { switch (ch) { case ' ': case '\t': case '\n': return 1; default: return 0; }}

Lexical Analyzer

54

Page 55: Compilation   0368 -3133

Parser

• Invokes lexical analyzer• Reports syntax errors• Constructs AST

55

Page 56: Compilation   0368 -3133

Driver for the Toy Compiler

56

#include "parser.h" /* for type AST_node */#include "backend.h" /* for Process() */#include "error.h" /* for Error() */

int main(void) { AST_node *icode;

if (!Parse_program(&icode)) Error("No top-level expression"); Process(icode);

return 0;}

Page 57: Compilation   0368 -3133

Parser Environment

57

#include "lex.h”, "error.h”, "parser.h"

static Expression *new_expression(void) { return (Expression *)malloc(sizeof (Expression));}static void free_expression(Expression *expr) { free((void *)expr);}

static int Parse_operator(Operator *oper_p);static int Parse_expression(Expression **expr_p);int Parse_program(AST_node **icode_p) { Expression *expr; get_next_token(); /* start the lexical analyzer */ if (Parse_expression(&expr)) { if (Token.class != EoF) { Error("Garbage after end of program"); } *icode_p = expr; return 1; } return 0;}

Page 58: Compilation   0368 -3133

Parser Header File

58

typedef int Operator;typedef struct _expression { char type; /* 'D' or 'P' */ int value; /* for 'D' type expression */ struct _expression *left, *right; /* for 'P' type expression */ Operator oper; /* for 'P' type expression */ } Expression;

typedef Expression AST_node; /* the top node is an Expression */extern int Parse_program(AST_node **);

Page 59: Compilation   0368 -3133

AST for (2 * ((3*4)+9))

59

P

*oper

typeleft right

P+

P

*

D

2

D

9

D

4D

3

Page 60: Compilation   0368 -3133

Top-Down Parsing• Optimistically build the tree from the root to leaves• For every P A1 A2 … An | B1 B2 … Bm

– If A1 succeeds• If A2 succeeds & A3 succeeds & …• Else fail

– Else if B1 succeeds• If B2 succeeds & B3 succeeds & ..• Else fail

– Else fail

• Recursive descent parsing– Simplified: no backtracking

• Can be applied for certain grammars 60

Page 61: Compilation   0368 -3133

static int Parse_expression(Expression **expr_p) { Expression *expr = *expr_p = new_expression(); if (Token.class == DIGIT) { expr->type = 'D'; expr->value = Token.repr - '0'; get_next_token(); return 1; } if (Token.class == '(') { expr->type = 'P'; get_next_token(); if (!Parse_expression(&expr->left)) { Error("Missing expression"); } if (!Parse_operator(&expr->oper)) { Error("Missing operator"); } if (!Parse_expression(&expr->right)) { Error("Missing expression"); } if (Token.class != ')') { Error("Missing )"); } get_next_token(); return 1; } /* failed on both attempts */ free_expression(expr); return 0;}

Parser

61

static int Parse_operator(Operator *oper) { if (Token.class == '+') { *oper = '+'; get_next_token(); return 1; } if (Token.class == '*') { *oper = '*'; get_next_token(); return 1; } return 0;}

Page 62: Compilation   0368 -3133

AST for (2 * ((3*4)+9))

62

P

*oper

typeleft right

P+

P

*

D

2

D

9

D

4D

3

Page 63: Compilation   0368 -3133

Semantic Analysis

• Trivial in our case• No identifiers• No procedure / functions• A single type for all expressions

63

Page 64: Compilation   0368 -3133

Code generation

• Stack based machine• Four instructions

– PUSH n– ADD– MULT– PRINT

64

Page 65: Compilation   0368 -3133

Code generation

65

#include "parser.h" #include "backend.h" static void Code_gen_expression(Expression *expr) { switch (expr->type) { case 'D': printf("PUSH %d\n", expr->value); break; case 'P': Code_gen_expression(expr->left); Code_gen_expression(expr->right); switch (expr->oper) { case '+': printf("ADD\n"); break; case '*': printf("MULT\n"); break; } break; }}void Process(AST_node *icode) { Code_gen_expression(icode); printf("PRINT\n");}

Page 66: Compilation   0368 -3133

Compiling (2*((3*4)+9))

66

PUSH 2

PUSH 3

PUSH 4

MULT

PUSH 9

ADD

MULT

PRINT

P

*oper

typeleft right

P+

P

*

D

2

D

9

D

4D

3

Page 67: Compilation   0368 -3133

Generated Code Execution

67

PUSH 2

PUSH 3

PUSH 4

MULT

PUSH 9

ADD

MULT

PRINT

Stack Stack’

2

Page 68: Compilation   0368 -3133

Generated Code Execution

68

PUSH 2

PUSH 3

PUSH 4

MULT

PUSH 9

ADD

MULT

PRINT

Stack’

3

2

Stack

2

Page 69: Compilation   0368 -3133

Generated Code Execution

69

PUSH 2

PUSH 3

PUSH 4

MULT

PUSH 9

ADD

MULT

PRINT

Stack’

4

3

2

Stack

3

2

Page 70: Compilation   0368 -3133

Generated Code Execution

70

PUSH 2

PUSH 3

PUSH 4

MULT

PUSH 9

ADD

MULT

PRINT

Stack’

12

2

Stack

4

3

2

Page 71: Compilation   0368 -3133

Generated Code Execution

71

PUSH 2

PUSH 3

PUSH 4

MULT

PUSH 9

ADD

MULT

PRINT

Stack’

9

12

2

Stack

12

2

Page 72: Compilation   0368 -3133

Generated Code Execution

72

PUSH 2

PUSH 3

PUSH 4

MULT

PUSH 9

ADD

MULT

PRINT

Stack’

21

2

Stack

9

12

2

Page 73: Compilation   0368 -3133

Generated Code Execution

73

PUSH 2

PUSH 3

PUSH 4

MULT

PUSH 9

ADD

MULT

PRINT

Stack’

42

Stack

21

2

Page 74: Compilation   0368 -3133

Generated Code Execution

74

PUSH 2

PUSH 3

PUSH 4

MULT

PUSH 9

ADD

MULT

PRINT

Stack’Stack

42

Page 75: Compilation   0368 -3133

Interpretation

• Bottom-up evaluation of expressions• The same interface of the compiler

75

Page 76: Compilation   0368 -3133

#include "parser.h" #include "backend.h”

static int Interpret_expression(Expression *expr) { switch (expr->type) { case 'D': return expr->value; break; case 'P': int e_left = Interpret_expression(expr->left); int e_right = Interpret_expression(expr->right); switch (expr->oper) { case '+': return e_left + e_right; case '*': return e_left * e_right; break; }}

void Process(AST_node *icode) { printf("%d\n", Interpret_expression(icode));}

76

Page 77: Compilation   0368 -3133

Interpreting (2*((3*4)+9))

77

P

*oper

typeleft right

P+

P

*

D

2

D

9

D

4D

3

Page 78: Compilation   0368 -3133

Lecture Outline

• High level programming languages• Interpreters vs. Compilers• Techniques and tools (1.1)

– why study compilers … • Handwritten toy compiler & interpreter (1.2)• Summary

78

Page 79: Compilation   0368 -3133

79

Summary: Journey inside a compiler

LexicalAnalysi

s

Syntax Analysi

s

Sem.Analysi

s

Inter.Rep.

Code Gen.

x = b*b – 4*a*c

txt

<ID,”x”> <EQ> <ID,”b”> <MULT> <ID,”b”> <MINUS> <INT,4> <MULT> <ID,”a”> <MULT> <ID,”c”>

TokenStream

Page 80: Compilation   0368 -3133

80LexicalAnalysi

s

Syntax Analysi

s

Sem.Analysi

s

Inter.Rep.

Code Gen.

<ID,”x”> <EQ> <ID,”b”> <MULT> <ID,”b”> <MINUS> <INT,4> <MULT> <ID,”a”> <MULT> <ID,”c”>

‘b’ ‘4’

‘b’‘a’

‘c’

ID

ID

ID

ID

ID

factor

term factorMULT

term

expression

expression

factor

term factorMULT

term

expression

term

MULT factor

MINUS

SyntaxTree

Summary: Journey inside a compiler

Page 81: Compilation   0368 -3133

81Sem.Analysi

s

Inter.Rep.

Code Gen.

‘b’

‘4’

‘b’

‘a’

‘c’

MULT

MULT

MULT

MINUS

LexicalAnalysi

s

Syntax Analysi

s

AbstractSyntaxTree

Summary: Journey inside a compiler

Page 82: Compilation   0368 -3133

82LexicalAnalysi

s

Syntax Analysi

s

Sem.Analysi

s

Inter.Rep.

Code Gen.

‘b’

‘4’

‘b’

‘a’

‘c’

MULT

MULT

MULT

MINUS

type: intloc: sp+8

type: intloc: const

type: intloc: sp+16

type: intloc: sp+16

type: intloc: sp+24

type: intloc: R2

type: intloc: R2

type: intloc: R1

type: intloc: R1

AnnotatedAbstractSyntaxTree

Summary: Journey inside a compiler

Page 83: Compilation   0368 -3133

83

Journey inside a compiler

LexicalAnalysi

s

Syntax Analysi

s

Sem.Analysi

s

Inter.Rep.

Code Gen.

‘b’

‘4’

‘b’

‘a’

‘c’

MULT

MULT

MULT

MINUS

type: intloc: sp+8

type: intloc: const

type: intloc: sp+16

type: intloc: sp+16

type: intloc: sp+24

type: intloc: R2

type: intloc: R2

type: intloc: R1

type: intloc: R1

R2 = 4*aR1=b*bR2= R2*cR1=R1-R2

IntermediateRepresentation

Page 84: Compilation   0368 -3133

84

Journey inside a compiler

Inter.Rep.

Code Gen.

‘b’

‘4’

‘b’

‘a’

‘c’

MULT

MULT

MULT

MINUS

type: intloc: sp+8

type: intloc: const

type: intloc: sp+16

type: intloc: sp+16

type: intloc: sp+24

type: intloc: R2

type: intloc: R2

type: intloc: R1

type: intloc: R1

R2 = 4*aR1=b*bR2= R2*cR1=R1-R2

MOV R2,(sp+8)SAL R2,2MOV R1,(sp+16)MUL R1,(sp+16)MUL R2,(sp+24)SUB R1,R2

LexicalAnalysi

s

Syntax Analysi

s

Sem.Analysi

s

IntermediateRepresentation

AssemblyCode

Page 85: Compilation   0368 -3133

85

Error Checking• In every stage…

• Lexical analysis: illegal tokens• Syntax analysis: illegal syntax • Semantic analysis: incompatible types, undefined variables,

• Every phase tries to recover and proceed with compilation (why?)– Divergence is a challenge

Page 86: Compilation   0368 -3133

86

The Real Anatomy of a Compiler

Executable

code

exe

Sourcetext

txtLexicalAnalysi

s

Sem.Analysis

Process text input

characters SyntaxAnalysi

stokens AST

Intermediate code

generation

Annotated AST

Intermediate code

optimization

IR CodegenerationIR

Target code optimizatio

n

Symbolic Instructions

SI Machine code

generation

Write executable

outputMI

Page 87: Compilation   0368 -3133

87

Optimizations• “Optimal code” is out of reach

– many problems are undecidable or too expensive (NP-complete)– Use approximation and/or heuristics

• Loop optimizations: hoisting, unrolling, … • Peephole optimizations• Constant propagation

– Leverage compile-time information to save work at runtime (pre-computation)

• Dead code elimination– space

• …

Page 88: Compilation   0368 -3133

88

Machine code generation• Register allocation

– Optimal register assignment is NP-Complete– In practice, known heuristics perform well

• assign variables to memory locations• Instruction selection

– Convert IR to actual machine instructions

• Modern architectures– Multicores– Challenging memory hierarchies

Page 89: Compilation   0368 -3133

89

Compiler Construction Toolset

• Lexical analysis generators– Lex, JLex

• Parser generators– Yacc, Jcup

• Syntax-directed translators• Dataflow analysis engines

Page 90: Compilation   0368 -3133

Shortcuts

• Avoid generating machine code• Use local assembler• Generate C code

90

Page 91: Compilation   0368 -3133

One More Thing: Runtime systems

• Responsible for language dependent dynamic resource allocation

• Memory allocation– Stack frames– Heap

• Garbage collection• I/O• Interacts with operating system/architecture• Important part of the compiler

91

Page 92: Compilation   0368 -3133

92

Summary (for Real)

• Compiler is a program that translates code from source language to target language

• Compilers play a critical role– Bridge from programming languages to the machine– Many useful techniques and algorithms– Many useful tools (e.g., lexer/parser generators)

• Compiler constructed from modular phases– Reusable – Different front/back ends

Page 93: Compilation   0368 -3133

93


Top Related