Compilers and Language Processing ToolsSummer Term 2011
Prof. Dr. Arnd Poetzsch-Heffter
Software Technology GroupTU Kaiserslautern
© Prof. Dr. Arnd Poetzsch-Heffter
Content of Lecture
1. Introduction
2. Syntax and Type Analysis
   2.1 Lexical Analysis
   2.2 Context-Free Syntax Analysis
   2.3 Context-Dependent Analysis
3. Translation to Target Language
   3.1 Translation of Imperative Language Constructs
   3.2 Translation of Object-Oriented Language Constructs
4. Selected Topics in Compiler Construction
   4.1 Intermediate Languages
   4.2 Optimization
   4.3 Register Allocation
   4.4 Just-in-time Compilation
   4.5 Further Aspects of Compilation
5. Garbage Collection
6. XML Processing (DOM, SAX, XSLT)
4. Selected Topics in Compiler Construction
Chapter Outline
4. Selected Topics in Compiler Construction
   4.1 Intermediate Languages
       4.1.1 3-Address Code
       4.1.2 Other Intermediate Languages
   4.2 Optimization
       4.2.1 Classical Optimization Techniques
       4.2.2 Potential of Optimizations
       4.2.3 Data Flow Analysis
       4.2.4 Non-local Optimization
   4.3 Register Allocation
       4.3.1 Sethi-Ullman Algorithm
       4.3.2 Register Allocation by Graph Coloring
   4.4 Just-in-time Compilation
   4.5 Further Aspects of Compilation
Selected topics in compiler construction
Focus:
• Techniques that go beyond the direct translation of source languages to target languages
• Concentrate on concepts instead of language-dependent details
• Use program representations tailored for the considered tasks (instead of source language syntax):
  - simplifies the representation
  - (but needs more work to integrate the tasks)
Selected topics in compiler construction (2)
Learning objectives:
• Intermediate languages for translation and optimization of imperative languages
• Different optimization techniques
• Different static analysis techniques for (intermediate) programs
• Register allocation
• Some aspects of code generation
4.1 Intermediate languages
• Intermediate languages are used as
  - an appropriate program representation for certain language implementation tasks
  - a common representation for programs of different source languages
[Diagram: programs of Source Language 1, Source Language 2, ..., Source Language n are translated to a common Intermediate Language, from which code for Target Language 1, Target Language 2, ..., Target Language m is generated.]
Intermediate languages (2)
• Intermediate languages for translation are comparable to data structures in algorithm design, i.e., for each task, an intermediate language is more or less suitable.
• Intermediate languages can conceptually be seen as abstract machines.
4.1.1 3-Address Code
3-address code
3-address code (3AC) is a common intermediate language with many variants.
Properties:
• only elementary data types (but often arrays)
• no nested expressions
• sequential execution; jumps and procedure calls as statements
• named variables as in a high-level language
• unbounded number of temporary variables
3-address code (2)
A program in 3AC consists of
• a list of global variables
• a list of procedures with parameters and local variables
• a main procedure

Each procedure has a sequence of 3AC commands as its body.
3AC commands
Syntax                      Explanation

x := y bop z                x: variable (global, local, parameter, temporary)
x := uop z                  y, z: variable or constant
x := y                      bop: binary operator; uop: unary operator

goto L                      jump or conditional jump to label L
if x cop y goto L           cop: comparison operator
                            only procedure-local jumps

x := a[i]                   a: one-dimensional array
a[i] := y

x := &a                     a: global or local variable or parameter
x := *y                     &a: address of a
*x := y                     *: dereferencing operator
3AC commands (2)
Syntax                      Explanation

param x                     call p(x1, ..., xn) is encoded as:
call p                          param x1
return y                        ...
                                param xn
                                call p
                            (this block is considered as one command)
                            return y causes a jump to the return address,
                            with (optional) result y

We assume that 3AC only contains labels for which jumps are used in the program.
Basic blocks
A sequence of 3AC commands can be uniquely partitioned into basic blocks.

A basic block B is a maximal sequence of commands such that
• at the end of B, exactly one jump, procedure call, or return command occurs
• labels only occur at the first command of a basic block
Basic blocks (2)
Remarks:
• The commands of a basic block are always executed sequentially; there are no jumps into the middle of a block.
• Often, a designated exit block containing the return jump at its end is required for a procedure. This is handled by additional transformations.
• The transitions between basic blocks are often depicted as flow charts.
Example: 3AC and basic blocks
Consider the following C program:

int a[2];
int b[7];

int skprod(int i1, int i2, int lng) { ... }

int main() {
    a[0] = 1; a[1] = 2;
    b[0] = 4; b[1] = 5; b[2] = 6;
    skprod(0,1,2);
    return 0;
}
Example: 3AC and basic blocks (2)
3AC with basic block partitioning for the main procedure:

main:
B1: a[0] := 1
    a[1] := 2
    b[0] := 4
    b[1] := 5
    b[2] := 6
    param 0
    param 1
    param 2
    call skprod
B2: return 0
Example: 3AC and basic blocks (3)
Procedure skprod:

int skprod(int i1, int i2, int lng) {
    int ix, res = 0;
    for (ix = 0; ix <= lng-1; ix++) {
        res += a[i1+ix] * b[i2+ix];
    }
    return res;
}
Example: 3AC and basic blocks (4)
Procedure skprod as 3AC with basic block partitioning:

skprod:
B1: res := 0
    ix := 0
B2: t0 := lng-1
    if ix <= t0        (true → B3, false → B4)
B3: t1 := i1+ix
    t2 := a[t1]
    t1 := i2+ix
    t3 := b[t1]
    t1 := t2*t3
    res := res+t1
    ix := ix+1        (back to B2)
B4: return res
Intermediate Language Variations
3AC after elimination of array operations (for the above example):

skprod:
B1: res := 0
    ix := 0
B2: t0 := lng-1
    if ix <= t0        (true → B3, false → B4)
B3: t1 := i1+ix
    tx := t1*4
    ta := a+tx
    t2 := *ta
    t1 := i2+ix
    tx := t1*4
    tb := b+tx
    t3 := *tb
    t1 := t2*t3
    res := res+t1
    ix := ix+1        (back to B2)
B4: return res
Characteristics of 3-Address Code
• Control flow is explicit.
• Only elementary operations.
• Rearrangement and exchange of commands can be handled relatively easily.
4.1.2 Other Intermediate Languages
Further Intermediate Languages
We consider• 3AC in Static Single Assignment (SSA) representation• Stack Machine Code
Static Single Assignment Form
If a variable a is read at a program position, this is a use of a.
If a variable a is written at a program position, this is a definition of a.

For optimizations, the relationship between uses and definitions of variables is important.

In SSA representation, each variable has exactly one definition. Thus, the relationship between use and definition is explicit in the intermediate language; additional def-use or use-def chaining becomes unnecessary.
Static Single Assignment Form (2)
SSA is essentially a refinement of 3AC.

The different definitions of one variable are represented by indexing the variable.
For sequential command lists, this means that
• at each definition position, the variable gets a new index
• at each use position, the variable carries the index of its last definition
Example: SSA
Original 3AC:        SSA form:

a := x + y           a1 := x0 + y0
b := a - 1           b1 := a1 - 1
a := y + b           a2 := y0 + b1
b := x * 4           b2 := x0 * 4
a := a + b           a3 := a2 + b2
SSA - Join Points of Control Flow
At join points of control flow, an additional mechanism is required:
a1 := x0 + y0        a3 := a2 - b2
          \          /
           b := a?           (which definition of a?)
           ...
SSA - Join Points of Control Flow (2)
Introduce a fictitious "oracle function" Φ that selects the value of the variable from the branch that was actually taken:
a1 := x0 + y0        a3 := a2 - b2
          \          /
           a4 := Φ(a1, a3)
           b3 := a4
           ...
SSA - Remarks
• The construction of an SSA representation with a minimal number of applications of the Φ oracle is a non-trivial task (cf. Appel, Sections 19.1 and 19.2).
• The term static single assignment form reflects that for each variable in the program text, there is only one assignment. Dynamically, a variable in SSA representation can be assigned arbitrarily often (e.g., in loops).
Further intermediate languages
While 3AC and SSA representation are mostly used as intermediate languages inside compilers, intermediate languages and abstract machines are more and more often used as connections between compilers and runtime environments.

Java Byte Code and CIL (Common Intermediate Language, cf. .NET) are examples of stack machine code, i.e., intermediate results are stored on a runtime stack.

Further intermediate languages are, for instance, used for optimizations.
Stack machine code as intermediate language
Linguistically homogeneous scenario for Java:

[Diagram: C1.java and C2.java are compiled with jikes, C2.java and C3.java with javac, producing the class files C1.class, ..., C3.class containing Java Byte Code, which are executed by the JVM.]
Stack machine code as intermediate language (2)
Linguistically (possibly) inhomogeneous scenario for .NET:
[Diagram: programs of different high-level languages — prog1.cs and prog2.cs via a C# compiler, prog3.hs via a Haskell compiler — are translated to Intermediate Language files prog1.il, prog2.il, prog3.il, which are executed by the CLR.]
Example: Stack machine code
Java source (Weltklasse.java):

package beisp;

class Weltklasse extends Superklasse implements BesteBohnen {
    Qualifikation studieren(Arbeit schweiss) {
        return new Qualifikation();
    }
}
Example: Stack machine code (2)
Disassembled byte code (javap output):

Compiled from Weltklasse.java
class beisp.Weltklasse extends beisp.Superklasse implements beisp.BesteBohnen {
    beisp.Weltklasse();
    beisp.Qualifikation studieren(beisp.Arbeit);
}

Method beisp.Weltklasse()
   0 aload_0
   1 invokespecial #6 <Method beisp.Superklasse()>
   4 return

Method beisp.Qualifikation studieren(beisp.Arbeit)
   0 new #2 <Class beisp.Qualifikation>
   3 dup
   4 invokespecial #5 <Method beisp.Qualifikation()>
   7 areturn
4.2 Optimization
Optimization refers to improving code with respect to the following goals:
• Runtime behavior
• Memory consumption
• Size of code
• Energy consumption
Optimization (2)
We distinguish the following kinds of optimizations:
• machine-independent optimizations
• machine-dependent optimizations (exploit properties of a particular real machine)

and
• local optimizations
• intra-procedural optimizations
• inter-procedural/global optimizations
Remark on Optimization
Appel (Chap. 17, p. 350):

"In fact, there can never be a complete list [of optimizations]."

"Computability theory shows that it will always be possible to invent new optimizing transformations."
4.2.1 Classical Optimization Techniques
Constant Propagation
If the value of a variable is constant, the variable can be replaced with the constant.
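As an illustration, a small sketch in the 3AC notation used above (variable names invented for this example):

```
x := 5              x := 5
y := x + 2    ==>   y := 5 + 2
z := x * y          z := 5 * y
```

The resulting constant expressions can then be evaluated by constant folding (next slide).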
Constant Folding
Evaluate all expressions with constants as operands at compile time.
Iteration of Constant Folding and Propagation:
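A sketch of the iteration in 3AC (illustrative values):

```
x := 2 + 3        (folding)        x := 5
y := x * 2        ========>        y := x * 2

                  (propagation)    x := 5
                  ==========>      y := 5 * 2

                  (folding)        x := 5
                  ========>        y := 10
```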
Non-local Constant Optimization
For each program position, the possible values of each variable are required. If the set of possible values is infinite, it has to be abstracted appropriately.
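A sketch of the non-local case, where the constant value must hold on every path to the use (hypothetical fragment):

```
    if c goto L1
    x := 3               -- one path
    goto L2
L1: x := 1 + 2           -- other path: also 3 after folding
L2: y := x + 1           -- x is 3 on all paths to L2, so y := 4
```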
Copy Propagation
Eliminate copies of variables, i.e., if several variables x, y, z at a program position are known to have the same value, all uses of y and z are replaced by x.
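A small 3AC sketch (illustrative variables):

```
y := x              y := x
z := y + 1    ==>   z := x + 1
w := z * y          w := z * x
```

If y is not used elsewhere, its definition then becomes dead code and can be removed.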
Copy Propagation (2)
This can also be done at join points of control flow or for loops.

For each program point, the information which variables have the same value is required.
Common Subexpression Elimination
If an expression or a statement contains the same subexpression several times, the goal is to evaluate this subexpression only once.
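A small 3AC sketch (illustrative variables):

```
t1 := a + b         t1 := a + b
x  := t1 * 2  ==>   x  := t1 * 2
t2 := a + b         y  := t1 - c
y  := t2 - c
```

This is valid only if neither a nor b is redefined between the two computations of a + b.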
Common Subexpression Elimination (2)
Optimization of a basic block is done after transformation to SSA and construction of a DAG.
Common Subexpression Elimination (3)
Remarks:
• The elimination of repeated computations is often done before transformation to 3AC, but can also be reasonable following other transformations.
• The DAG representation of expressions is also used as an intermediate language by some authors.
Algebraic Optimizations
Algebraic laws can be applied in order to enable other optimizations, for example by exploiting associativity and commutativity of addition.

Caution: For finite data types, common algebraic laws are not valid in general.
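A sketch of how commutativity can enable common subexpression elimination (illustrative variables):

```
x := a + b          x := a + b
y := b + a    ==>   y := x          -- b + a rewritten as a + b, then CSE
```

Floating-point addition, for instance, is not associative, so such rewrites must respect the semantics of the data type.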
Strength Reduction
Replace expensive operations by more efficient operations (partially machine-dependent).

For example, y := 2*x can be replaced by

    y := x + x

or by

    y := x << 1
Inline Expansion of Procedure Calls
Replace a call to a non-recursive procedure by the procedure's body, with appropriate substitution of the parameters.
Note: This reduces execution time, but increases code size.
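A sketch, for a hypothetical procedure inc whose body returns x + 1:

```
y := inc(a)     where  inc: return x + 1
      |
      v   (substitute parameter x by a)
y := a + 1
```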
Inline Expansion of Procedure Calls (2)
Remarks:
• Expansion is in general more than text replacement: parameters have to be substituted appropriately, and local variables of the expanded procedure may have to be renamed to avoid name clashes.
Inline Expansion of Procedure Calls (3)
• In OO programs with relatively short methods, expansion is an important optimization technique. However, precise information about the target object is required.
• A refinement of inline expansion is the specialization of procedures/functions when some of the actual parameters are known. This technique can also be applied to recursive procedures/functions.
Dead Code Elimination
Remove code that is not reached during execution or that has no influence on execution.

In one of the above examples, constant folding and propagation produced code in which the assignments to t3 and t4 can be removed, provided t3 and t4 are no longer used after the basic block (not live).
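A sketch of the situation with illustrative values:

```
t3 := 4             -- dead: t3 is used only to compute t4
t4 := t3 + 1        -- dead: t4 only feeds x, whose value was folded
x  := 5       ==>   x := 5
```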
Dead Code Elimination (2)
A typical example of unreachable, and thus dead, code that can be eliminated:
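A sketch in 3AC (hypothetical debug flag):

```
debug := 0
...
if debug == 0 goto L2     -- always taken after constant propagation
    param msg             -- unreachable: can be removed
    call log
L2: ...
```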
Dead Code Elimination (3)
Remarks:
• Dead code is often caused by optimizations.
• Another source of dead code is program modification.
• In the first case, liveness information is the prerequisite for dead code elimination.
Code motion
Move commands over branching points in the control flow graph such that they end up in basic blocks that are executed less often.

We consider two cases:
• moving commands into succeeding or preceding branches
• moving code out of loops

Optimization of loops is very profitable, because code inside loops is executed more often than code outside of loops.
Move code over branching points
If a sequential computation branches, each branch is executed less often than the sequence itself.
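A sketch in 3AC (t is assumed to be used only in the true branch):

```
Before:                      After:
    t := a * b                   if c goto L1
    if c goto L1                 ...              -- false branch: t never computed
    ...                          goto L2
L1: y := t + 1               L1: t := a * b
                                 y := t + 1
                             L2: ...
```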
Move code over branching points (2)
A prerequisite for this optimization is that the defined variable is used in only one branch.

Moving a command upward over a preceding join point can be advisable if the command can be eliminated by optimization in one of the branches.
Partial redundancy elimination
Definition (Partial Redundancy)
An assignment is redundant at a program position s if it has already been executed on all paths to s.

An expression e is redundant at s if the value of e has already been calculated on all paths to s.

An assignment/expression is partially redundant at s if it is redundant with respect to some execution paths leading to s.
Partial redundancy elimination (2)
Example:
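A sketch of a partially redundant expression (illustrative blocks):

```
B1: t := a + b        B2: ...           -- a + b computed only on the B1 path
          \           /
         B3: x := a + b                 -- partially redundant at B3
```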
Partial redundancy elimination (3)
Elimination of partial redundancy:
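Continuing the sketch from the previous slide, the computation is inserted on the path that lacked it, making the expression fully redundant:

```
B1: t := a + b        B2: t := a + b    -- computation added on the B2 path
          \           /
         B3: x := t                     -- now fully redundant: reuse t
```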
Partial redundancy elimination (4)
Remarks:
• PRE can be seen as a combination and extension of common subexpression elimination and code motion.
• Extension: elimination of partial redundancy according to the estimated probability that specific paths are executed.
Code motion from loops
Idea: Computations in loops whose operands are not changed inside the loop should be done outside the loop.

This assumes that t1 is not live at the end of the top-most block on the left-hand side.
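A sketch with t1 as the moved definition (illustrative; assumes t1 is not live at the end of the first block before the transformation):

```
Before:                      After:
B1: i := 0                   B1: i := 0
                                 t1 := n * 4      -- loop invariant
B2: t1 := n * 4              B2: if i <= t1 ...   (loop test)
    if i <= t1 ...
```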
Optimization of loop variables
Variables and expressions that are not changed during the execution of a loop are called loop invariant.

Loops often have variables that are increased/decreased systematically in each loop iteration, e.g., in for-loops.

Often, a loop variable depends on another loop variable, e.g., a relative address depends on the loop counter variable.
Optimization of loop variables (2)
Definition (Loop Variables)
A variable i is called an explicit loop variable of a loop S if there is exactly one definition of i in S, and it has the form i := i + c where c is loop invariant.

A variable k is called a derived loop variable of a loop S if there is exactly one definition of k in S, and it has the form k := j * c or k := j + d, where j is a loop variable and c and d are loop invariant.
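A small example for the definition (the constants 1 and 4 are loop invariant):

```
loop S:
    i := i + 1        -- i: explicit loop variable (c = 1)
    k := i * 4        -- k: derived loop variable (j = i, c = 4)
```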
Induction variable analysis
Compute derived loop variables inductively, i.e., instead of computing them from the value of the loop variable, compute them from their value in the previous loop iteration.

Note: For the optimization of derived loop variables, the dependencies between variable definitions have to be precisely understood.
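A sketch in 3AC (illustrative variables; i is the explicit loop variable):

```
Before:                        After:
loop:                          k := i * 4            -- initialization before the loop
    i := i + 1                 loop:
    k := i * 4                     i := i + 1
    ...                            k := k + 4        -- inductive update
```

The multiplication inside the loop is replaced by a cheaper addition (a form of strength reduction).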
Loop unrolling
If the number of loop iterations is known statically, or properties of the number of iterations (e.g., always an even number) can be inferred, the loop body can be copied several times to save comparisons and jumps.

This assumes that ix is dead at the end of the fragment. Note the static computation of ix's values in the unrolled loop.
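A sketch with a statically known iteration count (illustrative; assumes ix is dead afterwards):

```
Before:                        After (fully unrolled):
    ix := 0                        a[0] := 0
L:  if ix > 3 goto E               a[1] := 0
    a[ix] := 0                     a[2] := 0
    ix := ix + 1                   a[3] := 0
    goto L
E:  ...
```

The values of ix are computed statically and substituted into the copies of the body.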
Loop unrolling (2)
Remarks:
• Partial loop unrolling aims at obtaining larger basic blocks in loops, which offers more optimization options.
• Loop unrolling is particularly important for parallel processor architectures and pipelined processing (machine-dependent).
Optimization for other language classes
The optimizations discussed so far aim at imperative languages. For optimizing programs of other language classes, special techniques have been developed.

For example:
• Object-oriented languages: optimization of dynamic binding (type analysis)
• Non-strict functional languages: optimization of lazy function calls (strictness analysis)
• Logic programming languages: optimization of unification
4.2.2 Potential of Optimizations
Potential of optimizations - Example
We use the procedure skprod (3AC after elimination of array operations, as above) to demonstrate some of the techniques and the improvement that can be achieved by optimizations; we also sketch its evaluation.

skprod:
B1: res := 0
    ix := 0
B2: t0 := lng-1
    if ix <= t0        (true → B3, false → B4)
B3: t1 := i1+ix
    tx := t1*4
    ta := a+tx
    t2 := *ta
    t1 := i2+ix
    tx := t1*4
    tb := b+tx
    t3 := *tb
    t1 := t2*t3
    res := res+t1
    ix := ix+1        (back to B2)
B4: return res

Evaluation: number of command steps depending on lng:
2 + 2 + 13*lng + 1 = 13*lng + 5
(lng = 100: 1305;  lng = 1000: 13005)
Potential of optimizations - Example (2)

Moving the computation of the loop-invariant t0 out of the loop:

skprod:
B1: res := 0
    ix := 0
    t0 := lng-1
B2: if ix <= t0        (true → B3, false → B4)
B3: t1 := i1+ix
    tx := t1*4
    ta := a+tx
    t2 := *ta
    t1 := i2+ix
    tx := t1*4
    tb := b+tx
    t3 := *tb
    t1 := t2*t3
    res := res+t1
    ix := ix+1        (back to B2)
B4: return res

Evaluation: 3 + 1 + 12*lng + 1 = 12*lng + 5
(lng = 100: 1205;  lng = 1000: 12005)
Optimization Potential of Optimizations
Potential of optimizations - Example (3)Optimization of loop variables: There are no derived loop variables, becauset1 and tx have several definitions; transformation to SSA for t1 and tx yieldsthat t11, tx1, ta, t12, tb become derived loop variables.
Optimierung von Schleifenvariablen (1):
Zunächst gibt es keine abgeleiteten Schleifenvariablen,
da t1 und tx mehrere Definitionen besitzen; Einführen
von SSA für t1 und tx macht t11, tx1, ta, t12, tx2, tb
zu abgeleiteten Schleifenvariablen:
skprod:
    res := 0
    ix  := 0
    t0  := lng-1
loop:
    if ix<=t0 (false: return res)
    t11 := i1+ix
    tx1 := t11*4
    ta  := a+tx1
    t2  := *ta
    t12 := i2+ix
    tx2 := t12*4
    tb  := b+tx2
    t3  := *tb
    t13 := t2*t3
    res := res+t13
    ix  := ix+1
    goto loop
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 74
Optimization Potential of Optimizations
Potential of optimizations - Example (4)

Optimization of loop variables (2): Initialization and inductive definition of the loop variables:
skprod:
    res := 0
    ix  := 0
    t0  := lng-1
    t11 := i1-1
    tx1 := t11*4
    ta  := a+tx1
    t12 := i2-1
    tx2 := t12*4
    tb  := b+tx2
loop:
    if ix<=t0 (false: return res)
    t11 := t11+1
    tx1 := tx1+4
    ta  := ta+4
    t2  := *ta
    t12 := t12+1
    tx2 := tx2+4
    tb  := tb+4
    t3  := *tb
    t13 := t2*t3
    res := res+t13
    ix  := ix+1
    goto loop
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 75
Optimization Potential of Optimizations
Potential of optimizations - Example (5)

Dead code elimination: the assignments to t11, tx1, t12, tx2 in the loop body are dead code, since they do not influence the result.
skprod:
    res := 0
    ix  := 0
    t0  := lng-1
    t11 := i1-1
    tx1 := t11*4
    ta  := a+tx1
    t12 := i2-1
    tx2 := t12*4
    tb  := b+tx2
loop:
    if ix<=t0 (false: return res)
    ta  := ta+4
    t2  := *ta
    tb  := tb+4
    t3  := *tb
    t13 := t2*t3
    res := res+t13
    ix  := ix+1
    goto loop
Evaluation: 9 + 1 + 8*lng + 1 = 8*lng + 11 (lng = 100: 811; lng = 1000: 8011)

c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 76
Optimization Potential of Optimizations
Potential of optimizations - Example (6)
Algebraic optimization: exploit the invariant ta = 4*(i1-1+ix) + a to replace the comparison ix <= t0 by ta <= 4*(i1-1+t0) + a:
skprod:
    res := 0
    ix  := 0
    t0  := lng-1
    t11 := i1-1
    tx1 := t11*4
    ta  := a+tx1
    t12 := i2-1
    tx2 := t12*4
    tb  := b+tx2
    t4  := t11+t0
    t5  := 4*t4
    t6  := t5+a
loop:
    if ta<=t6 (false: return res)
    ta  := ta+4
    t2  := *ta
    tb  := tb+4
    t3  := *tb
    t13 := t2*t3
    res := res+t13
    ix  := ix+1
    goto loop
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 77
Optimization Potential of Optimizations
Potential of optimizations - Example (7)

Dead code elimination: through the transformation of the loop condition, the assignment to ix has become dead code and can be eliminated:
skprod:
    res := 0
    t0  := lng-1
    t11 := i1-1
    tx1 := t11*4
    ta  := a+tx1
    t12 := i2-1
    tx2 := t12*4
    tb  := b+tx2
    t4  := t11+t0
    t5  := 4*t4
    t6  := t5+a
loop:
    if ta<=t6 (false: return res)
    ta  := ta+4
    t2  := *ta
    tb  := tb+4
    t3  := *tb
    t13 := t2*t3
    res := res+t13
    goto loop
Evaluation: 11 + 1 + 7*lng + 1 = 7*lng + 13 (lng = 100: 713; lng = 1000: 7013)

c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 78
Optimization Potential of Optimizations
Potential of optimizations - Example (8)
Remarks:
• The number of execution steps is reduced by almost half; the most significant reductions are achieved by the loop optimizations.
• Combining optimization techniques is important. Determining a good ordering of the optimizations is difficult in general.
• We have considered the optimizations only by example. The difficulty lies in finding algorithms and heuristics that detect optimization potential automatically and perform the optimizing transformations.
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 79
Optimization Data flow analysis
4.2.3 Data flow analysis
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 80
Optimization Data flow analysis
Data flow analysis
For optimizations, data flow information is required that can beobtained by data flow analysis.
Goal: Explain the basic concepts of data flow analysis using examples

Outline:
• Liveness analysis (typical example of a data flow analysis)
• Data flow equations
• Important classes of analyses
Each analysis has an exact specification of which information it provides.
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 81
Optimization Data flow analysis
Liveness analysis
Definition (Liveness Analysis)Let P be a program. A variable v is live at a program position S in P ifthere is an execution path π from S to a use of v such that there is nodefinition of v on π.
The liveness analysis determines for all positions S in P whichvariables are live at S.
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 82
Optimization Data flow analysis
Liveness analysis (2)
Remarks:
• The definition of liveness of variables is static/syntactic. We had defined dead code dynamically/semantically.
• The result of the liveness analysis for a program P can be represented as a function live mapping positions in P to bit vectors, where a bit vector contains an entry for each variable in P. Let i be the index of a variable v in P; then:

live(S)[i] = 1 iff v is live at position S
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 83
Optimization Data flow analysis
Liveness analysis (3)
Idea:
• In a procedure-local analysis, exactly the global variables are liveat the end of the exit block of the procedure.
• If the live variables out(B) at the end of a basic block B are known,the live variables in(B) at the beginning of B are computed by:
in(B) = gen(B) ∪ (out(B) \ kill(B))
where
I gen(B) is the set of variables v such that v is used in B without a prior definition of v
I kill(B) is the set of variables that are defined in B
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 84
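The gen/kill computation and the equation in(B) = gen(B) ∪ (out(B) \ kill(B)) can be sketched in Python. This is an illustration, not part of the lecture; the encoding of a 3AC instruction as a pair (dest, uses) is an assumption:

```python
# Sketch (assumed instruction encoding): a basic block is a list of
# instructions (dest, uses), scanned from top to bottom.

def gen_kill(block):
    """gen(B): variables used in B before any definition in B.
       kill(B): variables defined in B."""
    gen, kill = set(), set()
    for dest, uses in block:
        # a use counts for gen only if the variable is not yet defined in B
        gen |= {v for v in uses if v not in kill}
        if dest is not None:
            kill.add(dest)
    return gen, kill

def live_in(block, out_b):
    """in(B) = gen(B) union (out(B) minus kill(B))"""
    gen, kill = gen_kill(block)
    return gen | (out_b - kill)

# Example block:  t1 := i1+ix ; ta := a+t1 ; ix := ix+1
block = [("t1", ["i1", "ix"]), ("ta", ["a", "t1"]), ("ix", ["ix"])]
```

For this block, gen(B) = {i1, ix, a} (t1 is defined before its use) and kill(B) = {t1, ta, ix}.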
Optimization Data flow analysis
Liveness analysis (4)
As the set in(B) is computed from out(B), we have a backwardanalysis.
For B not the exit block of the procedure, out(B) is obtained by
out(B) = ⋃_{Bi ∈ succ(B)} in(Bi)
Thus, for a program without loops, in(B) and out(B) are defined for allbasic blocks B. Otherwise, we obtain a system of recursive equations.
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 85
Optimization Data flow analysis
Liveness analysis - Example
Question: How do we compute out(B2)?c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 86
Optimization Data flow analysis
Data flow equations
Theory:
• There is always a solution for equations of the considered form.• There is always a smallest solution that is obtained by an iteration
starting from empty in and out sets.
Note: The equations may have several solutions.
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 87
Optimization Data flow analysis
Ambiguity of solutions - Example
B0: a := a
B1: b := 7

out(B0) = in(B0) ∪ in(B1)
out(B1) = { }
in(B0) = gen(B0) ∪ (out(B0)\kill(B0)) = {a} ∪ out(B0)
in(B1) = gen(B1) ∪ (out(B1)\kill(B1)) = { }
Thus, out(B0) = in(B0), and hence in(B0) = {a} ∪ in(B0).
Possible Solutions: in(B0) = {a} or in(B0) = {a,b}
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 88
Optimization Data flow analysis
Computation of smallest fixpoint
1. Compute gen(B), kill(B) for all B.
2. Set out(B) = ∅ for all B except for the exit block. For the exit block,out(B) comes from the program context.
3. While out(B) or in(B) changes for any B:
Compute in(B) from current out(B) for all B.
Compute out(B) from in(B) of its successors.
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 89
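The three steps above can be sketched as a round-robin fixpoint iteration in Python (an illustration with an assumed CFG encoding, not lecture material). Applied to the two-block example from the ambiguity slide, B0: a := a with successors B0 and B1, and B1: b := 7 as exit block, the iteration indeed yields the smallest solution in(B0) = {a}:

```python
# Sketch (assumed encoding): succ maps block names to successor lists,
# gen/kill map block names to variable sets.

def liveness(succ, gen, kill, exit_block, exit_out=frozenset()):
    blocks = list(succ)
    inn = {b: set() for b in blocks}
    out = {b: set() for b in blocks}
    out[exit_block] = set(exit_out)       # given by the program context
    changed = True
    while changed:                        # step 3: iterate until stable
        changed = False
        for b in blocks:
            if b != exit_block:
                # out(B) = union of in(Bi) over all successors Bi of B
                new_out = set()
                for s in succ[b]:
                    new_out |= inn[s]
                if new_out != out[b]:
                    out[b] = new_out
                    changed = True
            # in(B) = gen(B) union (out(B) minus kill(B))
            new_in = gen[b] | (out[b] - kill[b])
            if new_in != inn[b]:
                inn[b] = new_in
                changed = True
    return inn, out

succ = {"B0": ["B0", "B1"], "B1": []}
gen  = {"B0": {"a"}, "B1": set()}
kill = {"B0": {"a"}, "B1": {"b"}}
inn, out = liveness(succ, gen, kill, exit_block="B1")
```

Starting from empty sets selects {a} rather than the larger solution {a, b}.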
Optimization Data flow analysis
Further analyses and classes of analyses
Many data flow analyses can be described as bit vector problems:
• Reaching definitions: Which definitions reach a position S?
• Available expressions: for the elimination of repeated computations
• Very busy expressions: Which expressions are needed on all subsequent computation paths?

The corresponding analyses can be treated analogously to liveness analysis, but differ in
• the definition of the data flow information
• the definition of gen and kill
• the direction of the analysis and the form of the equations
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 90
Optimization Data flow analysis
Further analyses and classes of analyses (2)
For backward analyses, the data flow information at the entry of abasic block B is obtained from the information at the exit of B:
in(B) = gen(B) ∪ (out(B) \ kill(B))
Analyses can further be distinguished by whether they take the union or the intersection of the successor information:

out(B) = ⋃_{Bi ∈ succ(B)} in(Bi)    or    out(B) = ⋂_{Bi ∈ succ(B)} in(Bi)
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 91
Optimization Data flow analysis
Further analyses and classes of analyses (3)
For forward analyses, the dependency is the other way round:
out(B) = gen(B) ∪ (in(B) \ kill(B))
with
in(B) = ⋃_{Bi ∈ pred(B)} out(Bi)    or    in(B) = ⋂_{Bi ∈ pred(B)} out(Bi)
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 92
Optimization Data flow analysis
Further analyses and classes of analyses (4)
Overview of classes of analyses:
             union                    intersection
forward      reaching definitions     available expressions
backward     live variables           very busy expressions
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 93
Optimization Data flow analysis
Further analyses and classes of analyses (5)
For bit vector problems, data flow information consists of subsets offinite sets.
For other analyses, the collected information is more complex, e.g., forconstant propagation, we consider mappings from variables to values.
For interprocedural analyses, complexity increases because the flowgraph is not static.
Formal basis for the development and correctness of optimizations isprovided by the theory of abstract interpretation.
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 94
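As a small illustration of such a non-bit-vector domain, a join for constant propagation might look as follows. This is a hedged sketch, not lecture material; the markers NAC ("not a constant", ⊤) and UNDEF (⊥) and the environment encoding are assumptions:

```python
# Sketch: data flow information for constant propagation is a mapping
# from variables to lattice values {UNDEF, a constant, NAC}.

NAC, UNDEF = object(), object()   # assumed markers for "top" and "bottom"

def join_value(a, b):
    """Join of two lattice values at a control flow merge."""
    if a is UNDEF:
        return b
    if b is UNDEF:
        return a
    return a if a == b else NAC

def join_env(e1, e2):
    """Pointwise join of two variable environments."""
    return {v: join_value(e1.get(v, UNDEF), e2.get(v, UNDEF))
            for v in set(e1) | set(e2)}

# At a merge of {x: 1, y: 2} and {x: 1, y: 3}, x stays constant, y does not:
env = join_env({"x": 1, "y": 2}, {"x": 1, "y": 3})
```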
Optimization Non-Local Program Analysis
4.2.4 Non-Local Program Analysis
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 95
Optimization Non-Local Program Analysis
Non-local program analysis
We use a points-to analysis to demonstrate:• interprocedural aspects: The analysis crosses the borders of
single procedures.• constraints: Program analysis very often involves solving or
refining constraints.• complex analysis results: The analysis result cannot be
represented locally for a statement.• analysis as abstraction: The result of the analysis is an
abstraction of all possible program executions.
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 96
Optimization Non-Local Program Analysis
Points-to analysis
Analysis for programs with pointers and for object-oriented programs
Goal: Compute which references to which records/objects a variablecan hold.
Applications of Analysis Results:
Basis for optimizations• Alias information (e.g., important for code motion)
I Can p.f = x cause changes to an object referenced by q?I Can z = p.f read information that is written by p.f = x?
• Call graph construction• Resolution of virtual method calls• Escape analysis
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 97
Optimization Non-Local Program Analysis
Alias Information

Examples (use of points-to analysis information):

A. Use of alias information:

(1) p.f = x;
(2) y = q.f;
(3) q.f = z;

If p == q, statement (2) can be replaced by y = x.
If p != q, the first statement can be exchanged with the other two.
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 98
Optimization Non-Local Program Analysis
Elimination of Dynamic Binding
B. Elimination of dynamic binding:

class A {
  void m( ... ) { ... }
}
class B extends A {
  void m( ... ) { ... }
}
...
A p;
p = new B();
p.m(...)  // call of B::m
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 99
Optimization Non-Local Program Analysis
Escape Analysis
C. Escape analysis:

R m( A p ) {
  B q;
  q = new B(); // can be allocated on the stack
  q.f = p;
  q.g = p.n();
  return q.g;
}

The object created for q does not escape the method and can therefore be stored on the stack.
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 100
Optimization Non-Local Program Analysis
A Points-to Analysis for Java
Simplifications and assumptions about underlying language• Complete program is known.• Only assignments and method calls of the following form are
used:I Direct assignment: l = rI Write to instance variables: l.f = rI Read of instance variables: l = r.fI Object creation: l = new C()I Simple method call: l = r0.m(r1, ...)
• Expressions without side effects• Compound statements
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 101
Optimization Non-Local Program Analysis
A Points-to Analysis for Java (2)
Analysis type• Flow-insensitive: The control flow of the program has no
influence on the analysis result. The states of the variables atdifferent program points are combined.
• Context-insensitive: Method calls at different program points arenot distinguished.
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 102
Optimization Non-Local Program Analysis
A Points-to Analysis for Java (3)
Points-to graph as abstraction
Result of the analysis is a so-called points-to graph having• abstract variables and abstract objects as nodes• edges represent that an abstract variable may have a reference to
an abstract object
Abstract variables V represent sets of concrete variables at runtime.
Abstract objects O represent sets of concrete objects at runtime.
An edge between V and O means that in a certain program state, aconcrete variable in V may reference an object in O.
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 103
Optimization Non-Local Program Analysis
Points-to Graph - Example

class Y { ... }
class X {
  Y f;
  void set( Y r ) { this.f = r; }
  static void main() {
    X p = new X(); // s1 "creates" o1
    Y q = new Y(); // s2 "creates" o2
    p.set(q);
  }
}

Resulting points-to graph: p and this point to o1, q and r point to o2, and o1 has an f-edge to o2.
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 104
Optimization Non-Local Program Analysis
Points-to Graph - Example (2)
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 105
Optimization Non-Local Program Analysis
Definition of the Points-to Graph
For all method implementations,• create node o for each object creation• create nodes for
I each local variable vI each formal parameter p of any method
(incl. this and results (ret))I each static variable s
(Instance variables are modeled by labeled edges.)
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 106
Optimization Non-Local Program Analysis
Definition of the Points-to Graph (2)

Edges: smallest fixpoint of f : PtGraph × Stmt → PtGraph with
• f(G, l = new C()) = G ∪ {(l, oi)}
• f(G, l = r) = G ∪ {(l, oi) | oi ∈ Pt(G, r)}
• f(G, l.f = r) = G ∪ {(<oi, f>, oj) | oi ∈ Pt(G, l), oj ∈ Pt(G, r)}
• f(G, l = r.f) = G ∪ {(l, oi) | ∃ oj ∈ Pt(G, r). oi ∈ Pt(G, <oj, f>)}
• f(G, l = r0.m(r1, ..., rn)) = G ∪ ⋃_{oi ∈ Pt(G, r0)} resolve(G, m, oi, r1, ..., rn, l)

where Pt(G, x) is the points-to set of x in G,

resolve(G, m, oi, r1, ..., rn, l) =
  let mj(p0, p1, ..., pn, retj) = dispatch(oi, m) in
    {(p0, oi)} ∪ f(G, p1 = r1) ∪ ... ∪ f(G, l = retj)
  end

and dispatch(oi, m) returns the actual implementation of m for oi with formal parameters p1, ..., pn and result variable retj; p0 refers to this.
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 107
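The fixpoint of f can be computed by iterating the transfer functions over all statements until the graph no longer grows. The following Python sketch covers only the assignment forms, not method calls, and uses an assumed statement encoding; it is an illustration, not the algorithm from the cited paper:

```python
# Sketch (assumed encoding): statements are tuples
# ("new", l, site) / ("copy", l, r) / ("store", l, f, r) / ("load", l, f, r).

def points_to(stmts):
    var_pt = {}   # abstract variable -> set of abstract objects
    fld_pt = {}   # (abstract object, field) -> set of abstract objects
    def pt(table, key):
        return table.setdefault(key, set())
    def size():
        return (sum(len(s) for s in var_pt.values()),
                sum(len(s) for s in fld_pt.values()))
    changed = True
    while changed:                    # iterate f until the graph is stable
        before = size()
        for s in stmts:
            kind = s[0]
            if kind == "new":         # l = new C(), object named by its site
                _, l, site = s
                pt(var_pt, l).add(site)
            elif kind == "copy":      # l = r
                _, l, r = s
                pt(var_pt, l).update(pt(var_pt, r))
            elif kind == "store":     # l.f = r
                _, l, f, r = s
                for o in list(pt(var_pt, l)):
                    pt(fld_pt, (o, f)).update(pt(var_pt, r))
            elif kind == "load":      # l = r.f
                _, l, f, r = s
                for o in list(pt(var_pt, r)):
                    pt(var_pt, l).update(pt(fld_pt, (o, f)))
        changed = size() != before
    return var_pt, fld_pt

# p = new X() (site o1); q = new Y() (site o2); p.f = q; y = p.f
stmts = [("new", "p", "o1"), ("new", "q", "o2"),
         ("store", "p", "f", "q"), ("load", "y", "f", "p")]
var_pt, fld_pt = points_to(stmts)
```

The analysis is flow-insensitive: reordering the statements does not change the result.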
Optimization Non-Local Program Analysis
Definition of the Points-to Graph (3)
Remark:
The main problem for practical use of the analysis is the efficientimplementation of the computation of the points-to graph.
Literature:
A. Rountev, A. Milanova, B. Ryder: Points-to Analysis for Java UsingAnnotated Constraints. OOPSLA 2001.
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 108
Register Allocation
4.3 Register Allocation
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 109
Register Allocation
Register allocation
Efficient code has to make good use of the available registers on the target machine: accessing registers is much faster than accessing memory (the same holds for the cache).
Register allocation has two aspects:• Determine which variables are implemented by registers at which
positions.• Determine which register implements which variable at which
positions (register assignment).
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 110
Register Allocation
Register allocation (2)
Goals of register allocation

1. Generate code that requires as few registers as possible.
2. Avoid unnecessary memory accesses, i.e., implement not only temporaries but also program variables by registers.
3. Prefer registers for variables that are used often (do not use them for variables that are only rarely accessed).
4. Obey the programmer's requirements.
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 111
Register Allocation
Register allocation (3)
Outline
• Algorithm interleaving code generation and register allocationfor nested expressions (cf. Goal 1)
• Algorithm for procedure-local register allocation(cf. Goals 2 and 3)
• Combination and other aspects
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 112
Register Allocation Sethi-Ullmann Algorithm
4.3.1 Sethi-Ullmann Algorithm
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 113
Register Allocation Sethi-Ullmann Algorithm
Evaluation ordering with minimal registers
The algorithm by Sethi and Ullman is an example of an integrated approach to register allocation and code generation (cf. Wilhelm, Maurer, Sect. 12.4.1, p. 584 ff).
Input:

An assignment with a nested expression on the right-hand side:

Assign ( Var, Exp )
Exp    = BinExp | Var
BinExp ( Exp, Op, Exp )
Var    ( Ident )
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 114
Register Allocation Sethi-Ullmann Algorithm
Evaluation ordering with minimal registers (2)
Output:
Machine or intermediate language code with assigned registers.
We consider two-address code, i.e., code with one memory access atmaximum. The machine has r registers represented by R0, . . . ,Rr−1.
The instruction forms are:

Ri := M[V]
M[V] := Ri
Ri := Ri op M[V]
Ri := Ri op Rj
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 115
Register Allocation Sethi-Ullmann Algorithm
Example: Code generation w/ register allocation
Consider f := (a + b)− (c − (d + e))
Assume that there are two registers R0 and R1 available for thetranslation.
Result of direct translation:
R0 := M[a]
R0 := R0 + M[b]
R1 := M[d]
R1 := R1 + M[e]
M[t1] := R1
R1 := M[c]
R1 := R1 - M[t1]
R0 := R0 - R1
M[f] := R0
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 116
Register Allocation Sethi-Ullmann Algorithm
Example: Code generation w/ register allocation (2)
Result of Sethi-Ullmann algorithm:
R0 := M[c]
R1 := M[d]
R1 := R1 + M[e]
R0 := R0 - R1
R1 := M[a]
R1 := R1 + M[b]
R1 := R1 - R0
M[f] := R1
More efficient, because it uses one instruction less and does not needto store intermediate results.
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 117
Register Allocation Sethi-Ullmann Algorithm
Sethi-Ullmann algorithm
Goal: Minimize number of registers and number of temporaries.
Idea: Generate code for the subexpression requiring more registers first.

Procedure:
• Define a function regbed that computes the number of registers needed for an expression
• Generate code for an expression E = BinExp(L, OP, R) by case distinction on the register needs of L and R
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 118
Register Allocation Sethi-Ullmann Algorithm
Sethi-Ullmann algorithm (2)
We use the following notations:• v_reg(E): the set of available registers for the translation of E• v_tmp(E): the set of addresses where values can be stored
temporarily when translating E• cell(E): register/memory cell where the result of E is stored
Now, let• E be an expression• L the left subexpression of E• R the right subexpression of E• vr abbreviate |v_reg(E)|
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 119
Register Allocation Sethi-Ullmann Algorithm
Sethi-Ullmann algorithm (3)
We distinguish the following cases:
1. regbed(L) < vr
2. regbed(L) ≥ vr and regbed(R) < vr
3. regbed(L) ≥ vr and regbed(R) ≥ vr
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 120
Register Allocation Sethi-Ullmann Algorithm
Sethi-Ullmann algorithm (4)
Case 1: regbed(L) < vr
• Generate code for R using v_reg(E) and v_tmp(E) with result incell(R)
• Generate code for L using v_reg(E) \{ cell(R) } and v_tmp(E) withresult in cell(L)
• Generate code for the operation cell(L) := cell(L) OP cell(R)• Set cell(E) = cell(L)
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 121
Register Allocation Sethi-Ullmann Algorithm
Sethi-Ullmann algorithm (5)
Case 2: regbed(L) ≥ vr and regbed(R) < vr
• Generate code for L using v_reg(E) and v_tmp(E) with result incell(L)
• Generate code for R using v_reg(E) \{ cell(L) } and v_tmp(E) withresult in cell(R)
• Generate code for the operation cell(L) := cell(L) OP cell(R)• Set cell(E) = cell(L)
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 122
Register Allocation Sethi-Ullmann Algorithm
Sethi-Ullmann algorithm (6)
Case 3: regbed(L) ≥ vr and regbed(R) ≥ vr
• Generate code for R using v_reg(E) and v_tmp(E) with result incell(R)
• Generate code M[first(v_tmp(E))] := cell(R)• Generate code for L using v_reg(E) and rest(v_tmp(E)) with result
in cell(L)• Generate code for the operation cell(L) := cell(L) OP
M[first(v_tmp(E))]• Set cell(E) = cell(L)
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 123
Register Allocation Sethi-Ullmann Algorithm
Sethi-Ullmann algorithm (7)
Function regbed in MAX notation (can be realized by S-Attribution):
ATT regbed( Exp@ E ) Nat:
  IF Assign@<_,Var@ E> : 0
  | BinExp@< Var@ E,_,_> : 1
  | BinExp@<_,_,Var@ E > : 0
  | BinExp@< L,_, R > E :
      IF regbed(L)=regbed(R)
      THEN regbed(L) + 1
      ELSE max( regbed(L), regbed(R) )
  ELSE nil // case does not occur

(In ML, the definition of regbed would be somewhat more involved, since the context of Var expressions cannot be taken into account directly.)

c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 124
Register Allocation Sethi-Ullmann Algorithm
Example: Sethi-Ullman Algorithm
Consider f:= (( a + b ) - (c + d)) * (a - (d+e))
Attributes at each node: v_reg | v_tmp, regbed, and cell.

(Figure: expression tree for f := ((a+b)-(c+d)) * (a-(d+e)), annotated at each node with the available registers and temporaries, the register need regbed, the result cell, and the case (1.)-(3.) applied at each inner node.)
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 125
Register Allocation Sethi-Ullmann Algorithm
Example: Sethi-Ullman Algorithm (2)
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 126
Register Allocation Sethi-Ullmann Algorithm
Example: Sethi-Ullman Algorithm (3)
For formalizing the algorithm, we realize the set of available registersand addresses for storing temporaries with lists, where• the list RL of registers is non-empty• the list AL of addresses is long enough• the result cell is always a register which is the first in RL, i.e.,
first(RL)• the function exchange switches the first two elements of a list,
fst returns the first element of the list,rest returns the tail of the list
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 127
Register Allocation Sethi-Ullmann Algorithm
Example: Sethi-Ullman Algorithm (4)
In the following, the function expcode for code generation is given in MAX notation (functional style).

Note: The applications of the functions exchange, fst, and expcode each satisfy their preconditions length(RL) > 1 and length(RL) > 0, respectively.
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 128
Register Allocation Sethi-Ullmann Algorithm
Example: Sethi-Ullman Algorithm (5)

FCT expcode( Exp@ E, RegList RL, AdrList AL ) CodeList:  // pre: length(RL)>0
  IF Var@<ID> E:
     [ fst(RL) := M[adr(ID)] ]
  | BinExp@< L,OP,Var@<ID> > E:
     expcode(L,RL,AL)
     ++ [ fst(RL) := fst(RL) OP M[adr(ID)] ]
  | BinExp@< L,OP,R > E:
     LET vr == length( RL ) :
     IF regbed(L) < vr :
        expcode(R,exchange(RL),AL)
        ++ expcode(L,rst(exchange(RL)),AL)
        ++ [ fst(RL) := fst(RL) OP fst(rst(RL)) ]
     | regbed(L)>=vr AND regbed(R)<vr :
        expcode(L,RL,AL)
        ++ expcode(R,rst(RL),AL)
        ++ [ fst(RL) := fst(RL) OP fst(rst(RL)) ]
     | regbed(L)>=vr AND regbed(R)>=vr :
        expcode(R,RL,AL)
        ++ [ M[ fst(AL) ] := fst(RL) ]
        ++ expcode(L,RL,rst(AL))
        ++ [ fst(RL) := fst(RL) OP M[fst(AL)] ]
     ELSE nil
  ELSE []
Remarks:
• The algorithm generates 2AC that is optimal with respect to the number of instructions and the number of temporaries if the expression has no common subexpressions.
• The algorithm shows the mutual dependency between code generation and register allocation.
• In a procedural implementation, the register and address lists can be realized by a global stack.
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 129
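A Python transcription of regbed and expcode may make the case distinction concrete. This is a sketch with an assumed expression encoding (strings for variables, tuples (L, op, R) for BinExp), not the MAX code verbatim:

```python
# Sketch of the Sethi-Ullman scheme: regbed computes the register need,
# expcode generates 2AC; rl is the register list, al the temporary addresses.

def regbed(e, right_child=False):
    if isinstance(e, str):            # Var: 1 as left operand, 0 as right
        return 0 if right_child else 1
    l, _, r = e
    bl, br = regbed(l), regbed(r, right_child=True)
    return bl + 1 if bl == br else max(bl, br)

def expcode(e, rl, al):
    if isinstance(e, str):
        return [f"{rl[0]} := M[{e}]"]
    l, op, r = e
    if isinstance(r, str):            # right operand is a variable
        return expcode(l, rl, al) + [f"{rl[0]} := {rl[0]} {op} M[{r}]"]
    vr = len(rl)
    if regbed(l) < vr:                # case 1: evaluate R first
        ex = [rl[1], rl[0]] + rl[2:]  # exchange(RL)
        return (expcode(r, ex, al) + expcode(l, ex[1:], al)
                + [f"{rl[0]} := {rl[0]} {op} {rl[1]}"])
    if regbed(r) < vr:                # case 2: evaluate L first
        return (expcode(l, rl, al) + expcode(r, rl[1:], al)
                + [f"{rl[0]} := {rl[0]} {op} {rl[1]}"])
    # case 3: both need all registers, spill R's result to memory
    return (expcode(r, rl, al) + [f"M[{al[0]}] := {rl[0]}"]
            + expcode(l, rl, al[1:])
            + [f"{rl[0]} := {rl[0]} {op} M[{al[0]}]"])

# f := (a+b) - (c-(d+e)) with two registers:
e = (("a", "+", "b"), "-", ("c", "-", ("d", "+", "e")))
code = expcode(e, ["R0", "R1"], ["t1", "t2"]) + ["M[f] := R0"]
```

For the example expression this yields eight instructions and no store to a temporary memory cell, matching the improvement shown in the example slides (the concrete register roles may differ).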
Register Allocation Register Allocation by Graph Coloring
4.3.2 Register Allocation by Graph Coloring
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 130
Register Allocation Register Allocation by Graph Coloring
Register allocation by graph coloring
Register allocation by graph coloring is an algorithm (with many variants) for the allocation of registers in control flow graphs.

Register allocation for a CFG with 3AC in SSA form
• Input: CFG with 3AC using temporary variables
• Output: structurally the same CFG with
I registers instead of temporary variables
I additional instructions for storing intermediate results on the stack, if applicable
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 131
Register Allocation Register Allocation by Graph Coloring
Register allocation by graph coloring (2)
Remarks:• The SSA representation is not necessary, but simplifies the
formulation of the algorithm(e.g.,Wilhelm/Maurer do not use SSA in Sect. 12.5)
• It is not a restriction that only temporary variables are implemented by registers: we assume that program variables have been assigned to temporary variables in a preceding step.
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 132
Register Allocation Register Allocation by Graph Coloring
Live range and interference graph

Definition (Live range)
The live range of a temporary variable is the set of program positions at which it is alive.

Definition (Interference)
Two temporary variables interfere if their live ranges have a non-empty intersection.

Definition (Interference graph)
Let P be a program part/CFG in 3AC/SSA. The interference graph of P is an undirected graph G = (N, E), where
• N is the set of temporary variables,
• an edge (n1, n2) is in E iff n1 and n2 interfere.
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 133
Register Allocation Register Allocation by Graph Coloring
Register allocation by graph coloring
Goal: Implement the temporary variables with the available registers.

Idea: Translate the problem to graph coloring (NP-complete). Color the interference graph such that
• neighboring nodes have different colors,
• no more colors are used than there are available registers.
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 134
Register Allocation Register Allocation by Graph Coloring
Register allocation by graph coloring (2)
General procedure: Try to color the graph as described below. Then:
• If a coloring is found, terminate.
• If some nodes could not be colored,
  I choose a non-colored node k
  I modify the 3AC program such that the value of k is stored temporarily and only loaded when it is used
  I try to color the modified program

Termination: The procedure terminates because storing values intermediately reduces the live ranges of the temporaries and thus the interferences. In practice, two or three iterations are sufficient.
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 135
Register Allocation Register Allocation by Graph Coloring
Register allocation by graph coloring (3)
Coloring algorithm: Let rn be the number of available registers, i.e., at most rn colors may be used for coloring.

The coloring algorithm consists of two phases:
• (a) Simplify with marking
• (b) Coloring
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 136
Register Allocation Register Allocation by Graph Coloring
Simplify with marking
Iteratively remove nodes with fewer than rn neighbors from the graph and push them onto a stack.

Case 1: The simplification steps lead to an empty graph. Continue with the coloring phase.

Case 2: The graph contains only nodes with rn or more neighbors. Choose a suitable node as a candidate for storing its value temporarily, mark it, push it onto the stack, and continue simplification.
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 137
Register Allocation Register Allocation by Graph Coloring
Coloring
The nodes are successively popped from the stack and, if possible, colored and put back into the graph.

Let k be the popped node.

Case 1: k is not marked. Thus, it has fewer than rn neighbors in the current graph, so a free color exists and k can be colored.

Case 2: k is marked.
a) Its neighbors use fewer than rn different colors. Then, color k with a free color.
b) Its neighbors use all rn colors. Leave k uncolored.
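The two phases can be sketched in Python. This is a minimal version of the simplify/select scheme: choosing the maximum-degree node as spill candidate is one possible heuristic, and the actual rewriting of the 3AC program for uncolored nodes is omitted.

```python
def color_graph(nodes, edges, rn):
    """Simplify with marking, then color; returns node -> color in
    0..rn-1, or None for nodes left uncolored (spill candidates)."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    degree = {n: len(adj[n]) for n in nodes}
    stack, removed, marked = [], set(), set()

    # Phase (a): simplify with marking
    while len(removed) < len(nodes):
        cand = [n for n in nodes if n not in removed and degree[n] < rn]
        if cand:
            n = cand[0]
        else:
            # only nodes with >= rn neighbors remain: mark a spill candidate
            n = max((m for m in nodes if m not in removed),
                    key=lambda m: degree[m])
            marked.add(n)
        stack.append(n)
        removed.add(n)
        for m in adj[n]:
            if m not in removed:
                degree[m] -= 1

    # Phase (b): coloring, in reverse removal order
    color = {}
    while stack:
        n = stack.pop()
        used = {color[m] for m in adj[n] if color.get(m) is not None}
        free = [c for c in range(rn) if c not in used]
        # unmarked nodes always find a free color; marked ones may not
        color[n] = free[0] if free else None
    return color

coloring = color_graph(["a", "b", "c", "d"],
                       [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")], 2)
print(coloring)   # a proper 2-coloring of the 4-cycle
```

Note that the 4-cycle forces the marking step (every node has degree 2 = rn), yet the marked node still finds a free color in phase (b); marking only means the node *may* stay uncolored.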
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 138
Register Allocation Register Allocation by Graph Coloring
Example - Graph coloring
For simplicity, we only consider one basic block.
In the beginning, t0 and t2 are live.

t1 := a + t0
t3 := t2 - 1
t4 := t1 * t3
t5 := b + t0
t6 := c + t0
t7 := d + t4
t8 := t5 + 8
t9 := t8
t2 := t6 + 4
t0 := t7

In the end, t0, t2, and t9 are live.

[Figure: interference graph over t0, ..., t9]
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 139
Register Allocation Register Allocation by Graph Coloring
Example - Graph coloring (2)

Interference graph: [figure]
Assumption: 4 available registers
Simplification: Remove (in order) t1, t3, t2, t9, t0, t5, t4, t7, t8, t6
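The interference graph can be reconstructed mechanically from the example block: compute the live sets backwards and connect each defined temporary with the temporaries live after its definition. A sketch (the program variables a, b, c, d are ignored, since only temporaries are allocated to registers):

```python
# Backward liveness over the example basic block. Each entry is
# (defined temporary, used temporaries).
block = [
    ("t1", ["t0"]),        # t1 := a + t0
    ("t3", ["t2"]),        # t3 := t2 - 1
    ("t4", ["t1", "t3"]),  # t4 := t1 * t3
    ("t5", ["t0"]),        # t5 := b + t0
    ("t6", ["t0"]),        # t6 := c + t0
    ("t7", ["t4"]),        # t7 := d + t4
    ("t8", ["t5"]),        # t8 := t5 + 8
    ("t9", ["t8"]),        # t9 := t8
    ("t2", ["t6"]),        # t2 := t6 + 4
    ("t0", ["t7"]),        # t0 := t7
]

live = {"t0", "t2", "t9"}                  # live at the end of the block
edges = set()
for dest, uses in reversed(block):
    for other in live - {dest}:            # dest interferes with its live-out
        edges.add(frozenset((dest, other)))
    live = (live - {dest}) | set(uses)     # transfer: (out \ def) u use

print(sorted(live))    # ['t0', 't2']  -- live at the beginning, as stated
print(len(edges))      # number of interference edges
```

The resulting live-in set matches the slide (t0 and t2), and the computed edge set is the interference graph that the simplification order above is applied to.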
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 140
Register Allocation Register Allocation by Graph Coloring
Example - Graph coloring (3)
Possible coloring:
[Figure: interference graph colored in the order t1, t3, t2, t9, t0, t5, t4, t7, t8, t6]
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 141
Register Allocation Register Allocation by Graph Coloring
Example - Graph coloring (4)
Remarks:
There are several extensions of the algorithm:• Elimination of move instructions• Specific heuristics for simplification (What is a suitable node?)• Consider pre-colored nodes
Recommended reading:
• Appel, Sec. 11.1 - 11.3, pp. 238-251
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 142
Register Allocation Register Allocation by Graph Coloring
Further aspects of register allocation
The introduced algorithms consider subproblems. In practice, there are further aspects that have to be dealt with for register allocation:
• Interaction with other compiler phases (in particular optimization and code generation)
• Relation between temporaries and registers
• Source/intermediate/target language
• Usage frequency (is a variable used inside an inner loop?)
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 143
Register Allocation Register Allocation by Graph Coloring
Further aspects of register allocation (2)
Possible global procedure
• Allocate registers for standard tasks (registers for stack and argument pointers, base registers)
• Decide which variables and parameters should be stored in registers
• Evaluate the usage frequency of temporaries (occurrences in inner loops, distribution of accesses over the live range)
• Use the evaluation together with the heuristics of the register allocation algorithm
• If applicable, optimize again
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 144
Just-In-Time Compilation
4.4 Just-In-Time Compilation
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 145
Just-In-Time Compilation Language Execution Techniques
4.4.1 Language Execution Techniques
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 146
Just-In-Time Compilation Language Execution Techniques
Static (Ahead-of-Time) Compilation
Compile Time → Runtime:
Source Code → AOT Compiler → Machine Code → Machine
Advantages
• Fast execution
Disadvantages
• Platform dependent• Compilation step
Examples
• C/C++, Pascal
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 147
Just-In-Time Compilation Language Execution Techniques
Interpretation
Runtime:
Source Code → Interpreter
Advantages
• Platform independent• No compilation step
Disadvantages
• Slow execution
Examples
• Bash, Javascript (old browsers)
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 148
Just-In-Time Compilation Language Execution Techniques
Use of Virtual Machine Code (Bytecode)
Compile Time → Runtime:
Source Code → AOT Compiler → Bytecode → Virtual Machine
Advantages
• Faster execution• Platform independent
Disadvantages
• Still slow due to interpretation• Compilation step
Examples
• Java, C#
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 149
Just-In-Time Compilation Just-In-Time Compilation
4.4.2 Just-In-Time Compilation
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 150
Just-In-Time Compilation Just-In-Time Compilation
Dynamic (Just-In-Time) Compilation
Runtime:
Byte/Source Code → Virtual Machine/Interpreter → JIT Compiler → Machine Code → Machine
Advantages
• Fast execution• Platform independent
Disadvantages
• JIT runtime overhead
Examples
• Java HotSpot VM, .NET CLR, Mozilla SpiderMonkeyc© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 151
Just-In-Time Compilation Just-In-Time Compilation
Just-in-time Compilation
• Just-in-time (dynamic) compilation compiles code during runtime
• The goal is to improve performance compared to pure interpretation
• Trade-off between compilation cost and execution time benefit
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 152
Just-In-Time Compilation Just-In-Time Compilation
The History of Just-In-Time1
1960 McCarthy: compile LISP functions at runtime
1968 Thompson: compile regular expressions at runtime
1968 Mitchell: get compiled code by storing interpreter actions
1970 Abrams: JIT-Compilers for APL
1974 Hansen: Detect hot-spots using frequency counters
1993 Jones: Use partial evaluation to create compilers frominterpreters
1994 Hölzle: Adaptive optimization for Self
1997 Sun Hot-Spot JVM
2006 Gal and Franz: Tracing JITs
2011 Google V8, Mozilla TraceMonkey
1 See Aycock, 2003
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 153
Just-In-Time Compilation Just-In-Time Compilation
Advantages of JIT Compilation2
Many optimizations can be done at runtime that are not possible in static compilation, because additional runtime information is available:
• Concrete operating system and execution platform
  I e.g., use of SSE2 instructions
• Concrete input values
  I Inline virtual method calls
  I Apply constant folding
  I ...
• Program can be monitored at runtime
  I Optimize hot code
• Global optimizations in the presence of
  I Library code
  I Dynamically loaded code
2Source: http://en.wikipedia.org/wiki/Just-in-time_compilationc© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 154
Just-In-Time Compilation Just-In-Time Compilation
Kinds of JIT Compilation
Classic
• No interpretation
• Compile code with a fast (non-optimizing) traditional compiler

Mixed-Mode
• Start with interpretation
• Only compile hot code
• Examples: Sun Hot-Spot JVM, Mozilla SpiderMonkey

Adaptive Compilation
• No interpretation
• Start with fast compilation (nearly no optimizations)
• Recompile hot code with an optimizing compiler
• Example: Google V8
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 155
Just-In-Time Compilation Just-In-Time Compilation
Design Decisions for JIT Implementations
• JIT implementations have to decide:
  I What to compile? (All code or only some code?)
  I How to compile? (Fast or optimizing?)
  I When to compile? (At startup or when hot code is detected? Longer analysis ⇒ better generated code)
• Decisions may depend on the target machine and the target application
  I Client applications require fast start-up
  I Server applications should be optimized more aggressively
• JIT implementations typically allow configuring these parameters
• Default values are based on empirical data (benchmarks)
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 156
Just-In-Time Compilation Just-In-Time Compilation
Different Compilers
Fast Compiler
• Only simple optimizations (e.g., constant folding)
• No intermediate representation
• Simple register allocation (linear time)
• Advantage: fast compilation
• Disadvantage: slow code
Optimizing Compiler
• Use all techniques of traditional compilers
• Disadvantage: slow compilation
• Advantage: very fast code
  I Generated code can outperform C or C++ compiled code due to additional runtime information
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 157
Just-In-Time Compilation Hot-Spot Detection
4.4.3 Hot-Spot Detection
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 158
Just-In-Time Compilation Hot-Spot Detection
Hot-Spot Detection
Observation
Many programs spend the majority of their time executing a minority of their code (hot spots)3

Problem
It is often statically unclear which parts of the program are executed more often than others

Solution
Monitor the code during runtime (profiling)
3 D. E. Knuth. An empirical study of Fortran programs. Software: Practice and Experience 1, pp. 105-133, 1971.
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 159
Just-In-Time Compilation Hot-Spot Detection
Profiling
Profiling
1. monitor and trace events that occur during runtime,
2. determine the cost of these events,
3. attribute the cost of these events to specific parts of the program.
Profiling uses the past to predict the future
Ways to profile
• Time-based profiling
• Counter-based profiling
• Sampling-based profiling
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 160
Just-In-Time Compilation Hot-Spot Detection
Time-based Profiling
Method
• Record the time spent in each method
• Profiling instructions are inserted in the prolog and epilog
• Measure the time and add it to the total time of the method
• Methods are compiled when a certain amount of time has been spent in that method
Properties
• All methods are profiled
• May be inaccurate for short methods
• Very large overhead
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 161
Just-In-Time Compilation Hot-Spot Detection
Counter-based Profiling
Method
• Invocation counter for each method (and for loop back-branches)
• Increase the counter for each method call (branch taken)
• Compile a method when its counter reaches a predefined threshold
Properties
• All methods are profiled
• Accurate
• Difficult to choose good thresholds
• Large overhead
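Counter-based profiling can be sketched as a Python decorator. The threshold value and the `compiled` set (standing in for handing the method to an actual JIT compiler) are illustrative:

```python
THRESHOLD = 3                 # illustrative; real VMs use much larger values
compiled = set()              # names handed to the (hypothetical) JIT compiler

def profiled(fn):
    """Counter-based profiling: count invocations and 'compile' the
    method once the counter reaches THRESHOLD (mixed-mode sketch)."""
    count = {"calls": 0}
    def wrapper(*args, **kwargs):
        count["calls"] += 1
        if count["calls"] == THRESHOLD:
            compiled.add(fn.__name__)   # would trigger JIT compilation here
        return fn(*args, **kwargs)
    return wrapper

@profiled
def hot(x):
    return 2 * x

@profiled
def cold(x):
    return x + 1

for i in range(10):
    hot(i)
cold(1)
print(sorted(compiled))   # ['hot']
```

Only the frequently called method crosses the threshold; the rarely called one stays interpreted, which is exactly the mixed-mode behavior described above.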
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 162
Just-In-Time Compilation Hot-Spot Detection
Sampling-based Profiling
Method
• Counter for each method• Sample application periodically (e.g., every 10ms)• Increase counter of current method (and caller method)• Compile method when counter reaches a predefined threshold
Properties
• Low overhead• May miss methods• Non-deterministic (difficult to debug)
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 163
Just-In-Time Compilation Further Aspects of JIT Compilers
4.4.4 Further Aspects of JIT Compilers
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 164
Just-In-Time Compilation Further Aspects of JIT Compilers
Memory Management of Compiled Code
Problem
• Compiled (native) code is often 4-8 times larger than the original bytecode
• Compiled code must be held in memory

Solution

• To reduce memory consumption, only a fixed amount (a cache) of compiled code is held in memory
Cache Replacement Strategies
• FIFO (First In First Out)
• LRU (Least Recently Used)
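An LRU-bounded code cache can be sketched with `collections.OrderedDict`. The capacity value and storing "native code" as plain strings are illustrative:

```python
from collections import OrderedDict

class CodeCache:
    """Keep at most `capacity` compiled methods; evict the least
    recently used one when the cache overflows (LRU strategy)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()          # method name -> native code

    def get(self, name):
        if name not in self.cache:
            return None                     # cache miss: must (re)compile
        self.cache.move_to_end(name)        # mark as recently used
        return self.cache[name]

    def put(self, name, native_code):
        self.cache[name] = native_code
        self.cache.move_to_end(name)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the LRU entry

cache = CodeCache(2)
cache.put("f", "<code f>")
cache.put("g", "<code g>")
cache.get("f")                  # touch f, so g becomes least recently used
cache.put("h", "<code h>")      # evicts g
print(sorted(cache.cache))      # ['f', 'h']
```

A FIFO variant would simply skip the `move_to_end` call in `get`, evicting in pure insertion order.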
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 165
Just-In-Time Compilation Further Aspects of JIT Compilers
On-Stack Replacement (OSR)
Problem
• When a hot loop is detected, the compiled version of the executing method is only entered the next time the method is called (which may never happen)
Solution
• Compile a special version of the method that starts in the middle of the method, where the loop is executing
• Stop interpreting the executing method and execute the special compiled version
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 166
Just-In-Time Compilation Further Aspects of JIT Compilers
De-Optimization
Problem
• In languages that allow dynamic code loading (e.g., Java), optimizations may become invalid
• For example: method inlining of virtual method calls can become invalid when new classes are added to the type hierarchy
De-Optimization
• Optimized code can be deoptimized at runtime
• Deoptimized code can be reoptimized again
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 167
Just-In-Time Compilation Further Aspects of JIT Compilers
Inline Caches (1/2)4
Problem
• Message lookup in prototype-based languages like Javascript or Smalltalk can be expensive due to complex lookup rules.

Observation
• Receiver objects at a given call site are often of the same type

Idea
• After the first dynamic lookup, inline the lookup result at the call site
• Add a type check to fall back to dynamic lookup and update the cache
4 Good introduction: http://blog.cdleary.com/2010/09/picing-on-javascript-for-fun-and-profit/
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 168
Just-In-Time Compilation Further Aspects of JIT Compilers
Inline Caches (2/2)
Example (Javascript)
function isPoint(obj) {
  return obj.isPoint;
}
Generated code (pseudo code):
type := gettype(obj)
if type = CACHED_TYPE
    result = staticcall CACHED_METHOD
    jump L
else
    result = dynamiccall obj, "isPoint"
    # ... update cached values (modify generated code)
L: return result
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 169
Just-In-Time Compilation Further Aspects of JIT Compilers
Polymorphic Inline Caches (PICs)5
Problem
• Inline caches only work for a single type (monomorphic type)
Solution
• Polymorphic Inline Caches (PICs)
• Like (monomorphic) inline caches, but handle multiple cases
• If the type check fails, add an additional case (linear search)
• If a certain number of cases is reached, treat the call site as megamorphic and only do dynamic lookup
5 Craig Chambers, David Ungar, and Elgin Lee. Optimizing dynamically-typed object-oriented languages with polymorphic inline caches. ECOOP 1991.
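The progression from monomorphic to polymorphic to megamorphic can be sketched in Python. The `MEGAMORPHIC_LIMIT` value and the use of `getattr` as the "expensive" dynamic lookup are illustrative:

```python
MEGAMORPHIC_LIMIT = 4   # illustrative cut-off for giving up on caching

class CallSite:
    """Polymorphic inline cache for one call site: a linear list of
    (type, method) entries, with a fallback to dynamic lookup."""
    def __init__(self):
        self.entries = []          # list of (receiver type, lookup result)
        self.megamorphic = False

    def call(self, receiver, name, *args):
        t = type(receiver)
        if not self.megamorphic:
            for cached_type, method in self.entries:   # linear search
                if cached_type is t:
                    return method(receiver, *args)     # cache hit
        method = getattr(t, name)                      # dynamic lookup
        if not self.megamorphic:
            self.entries.append((t, method))           # add a new case
            if len(self.entries) >= MEGAMORPHIC_LIMIT:
                self.megamorphic = True                # stop caching
        return method(receiver, *args)

class Point:
    def describe(self):
        return "point"

class Circle:
    def describe(self):
        return "circle"

site = CallSite()
print(site.call(Point(), "describe"))   # dynamic lookup, then cached
print(site.call(Point(), "describe"))   # served from the cache
print(site.call(Circle(), "describe"))  # second entry added (polymorphic)
```

In compiled code the linear search is a short chain of type-check-and-jump instructions patched into the call site, rather than a Python list.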
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 170
Just-In-Time Compilation Tracing JIT Compilers
4.4.5 Tracing JIT Compilers
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 171
Just-In-Time Compilation Tracing JIT Compilers
Tracing JIT Compilers
Observation
• Most time is spent in hot paths
Idea
• Concentrate on hot paths and not whole methods/code blocks
Approach
• Detect hot paths at runtime
• Record a trace when a hot path is detected
• Generate optimized code for individual traces
• Use trace trees instead of control flow graphs
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 172
Just-In-Time Compilation Tracing JIT Compilers
Example
1: code;
2: do {
       if (rare condition) {
3:         code;
       } else {
4:         code;
       }
5: } while (frequent condition);
6: code;

Control Flow Graph: [figure with nodes 1-6]
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 173
Just-In-Time Compilation Tracing JIT Compilers
Example
1: code;
2: do {
       if (rare condition) {
3:         code;
       } else {
4:         code;
       }
5: } while (frequent condition);
6: code;

Control Flow Graph: [figure with nodes 1-6]
hot path = (2,4,5,2)
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 174
Just-In-Time Compilation Tracing JIT Compilers
Hot Path Detection
• Only loops are considered for hot path detection (hot loops)
• Add a counter to each destination of a backward branch (potential loop header)
• Interpret the program
• Increase the counter when the branch is taken
• When a threshold is reached (e.g., 2 in TraceMonkey), a hot loop is detected
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 175
Just-In-Time Compilation Tracing JIT Compilers
Tracing
1. When a hot loop is detected, start code tracing
2. Record all interpreter instructions
3. Stop recording when either
   I a cycle is found (tracing finished)
   I the trace becomes too long (tracing aborted)
   I an exception is thrown (tracing aborted)
4. The result is a code trace (loop trace)
5. Branches in a code trace are replaced by guards to handle side exits
   I Failed guards return control to the interpreter
6. Method calls are inlined into the trace, with appropriate guards in case of dynamic dispatch
7. The trace is optimized and compiled to native code
8. In the next iteration, the native code is executed
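Steps 1-4 can be sketched with a tiny interpreter: a counter on the backward-branch target detects the hot loop, after which one iteration is recorded as a linear trace. The bytecode, the threshold, and the loop-header position are illustrative; in a real tracing JIT, the recorded branch instructions would be emitted as guards and the trace compiled to native code.

```python
HOT_THRESHOLD = 2   # like TraceMonkey's low loop threshold
LOOP_HEADER = 2     # target of the backward branch in `prog`

# Tiny bytecode program: sum the even numbers below n.
prog = [
    ("set_i", 0),          # 0: i := 0
    ("set_acc", 0),        # 1: acc := 0
    ("jump_if_odd", 4),    # 2: if i is odd, skip the add (loop header)
    ("add_acc", None),     # 3: acc := acc + i
    ("inc_i", None),       # 4: i := i + 1
    ("jump_if_lt", 2),     # 5: if i < n, goto 2 (backward branch)
    ("halt", None),        # 6
]

def run(n):
    """Interpret `prog`; once the loop header gets hot, record one
    iteration as a linear trace (branch instructions -> guards)."""
    i = acc = 0
    pc = 0
    counter = 0
    recording, recorded, trace = False, [], None
    while True:
        op, arg = prog[pc]
        if recording:
            if pc == LOOP_HEADER and recorded:
                trace, recording = recorded, False   # cycle found: done
            else:
                recorded.append((pc, op))            # branches become guards
        if op == "set_i":
            i, pc = arg, pc + 1
        elif op == "set_acc":
            acc, pc = arg, pc + 1
        elif op == "jump_if_odd":
            pc = arg if i % 2 == 1 else pc + 1
        elif op == "add_acc":
            acc, pc = acc + i, pc + 1
        elif op == "inc_i":
            i, pc = i + 1, pc + 1
        elif op == "jump_if_lt":
            if i < n:
                counter += 1                         # backward branch taken
                if counter == HOT_THRESHOLD and trace is None:
                    recording, recorded = True, []   # hot loop detected
                pc = arg
            else:
                pc += 1
        elif op == "halt":
            return acc, trace

acc, trace = run(6)
print(acc)                      # 6 (0 + 2 + 4)
print([pc for pc, _ in trace])  # [2, 3, 4, 5]
```

The recorded trace is one straight-line iteration through the frequent path; the conditionals at positions 2 and 5 would become guards whose failure returns control to the interpreter.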
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 176
Just-In-Time Compilation Tracing JIT Compilers
Properties of Simple Tracing JITs
Advantages
• Optimizing single traces is much easier (faster) than optimizing a whole CFG
• Optimization happens across method boundaries, which is especially good for programs with many small methods
• The implementation is simpler and takes less code than a CFG-based JIT compiler
Disadvantages
• Only works well when there are hot dominant paths
• Trace recording is very expensive
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 177
Just-In-Time Compilation Tracing JIT Compilers
Trace Trees6
Problem
• Simple tracing only records a single path
• Does not work well for loops with non-dominant paths
Idea
• Instead of single traces use trace trees
Approach
• When a guard fails during execution of a compiled trace, immediately start trace recording
• When the new trace reaches the loop header, incorporate the new trace into the trace tree
• The corresponding guard is turned into a conditional branch

6 See Gal and Franz, 2006
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 178
Just-In-Time Compilation Tracing JIT Compilers
Example
1: code;
2: do {
       if (condition) {
3:         code;
       } else {
4:         code;
       }
5: } while (condition);
6: code;

Control Flow Graph: [figure with nodes 1-6]

Trace: 2 → 4 → 5, with side exits (sx) at the guards
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 179
Just-In-Time Compilation Tracing JIT Compilers
Example
1: code;
2: do {
       if (condition) {
3:         code;
       } else {
4:         code;
       }
5: } while (condition);
6: code;

Control Flow Graph: [figure with nodes 1-6]

Trace Tree: [figure; root 2 branches to 4 → 5 and 3 → 5, with side exits (sx)]
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 180
Just-In-Time Compilation Tracing JIT Compilers
Properties of Trace Trees
• A trace tree is a directed rooted tree
• The root is called the anchor node a and represents the loop header
• All leaf nodes have an implicit back-edge to a
• All nodes, except a, have exactly one predecessor
• Nodes may be duplicated if they lie on multiple traces
• Transformation to SSA form is fast because there is only one join point (the anchor node)
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 181
Just-In-Time Compilation Tracing JIT Compilers
Nested Loops
• Traces are added to a trace tree when a side exit is taken
• For nested loops, the inner loop gets hot before the outer loop
• As a consequence, the loop nest is turned "inside out"
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 182
Just-In-Time Compilation Tracing JIT Compilers
Nested Loops Example
1: code;
2: do {
       code;
3:     do {
           code;
4:     } while (condition);
5: } while (condition);
6: code;

CFG: [figure with nodes 1-6]

Inner Trace: 3 → 4, with a side exit (sx)
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 183
Just-In-Time Compilation Tracing JIT Compilers
Nested Loops Example
1: code;
2: do {
       code;
3:     do {
           code;
4:     } while (condition);
5: } while (condition);
6: code;

CFG: [figure with nodes 1-6]

Extended Trace: 3 → 4 → 5 → 2, with a side exit (sx)
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 184
Just-In-Time Compilation Tracing JIT Compilers
Bounding Trace Trees
• Trace trees can grow indefinitely
• To limit the size of trace trees, extending the tree is stopped after a certain number of backward branches (e.g., 3)
• This effectively limits the possible number of inlined outer loops
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 185
Just-In-Time Compilation Tracing JIT Compilers
Method Calls
• Like outer loops, method calls are inlined
• Virtual calls result in a branch
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 186
Just-In-Time Compilation Literature
Literature
General JIT Compilation
• John Aycock. A Brief History of Just-In-Time. ACM Computing Surveys, Vol. 35, No. 2, June 2003, pp. 97-113. http://dx.doi.org/10.1145/857076.857077
• M. Arnold et al. A Survey of Adaptive Optimization in Virtual Machines. Proc. IEEE, 2005. http://dx.doi.org/10.1109/JPROC.2004.840305
• T. Kotzmann, C. Wimmer, H. Mössenböck. Design of the Java HotSpot Client Compiler for Java 6. ACM TACO, 2008. http://dx.doi.org/10.1145/1369396.1370017
• Sami Zhioua. A dynamic compiler in an embedded Java Virtual machine. Master's Thesis, 2003. http://www.cs.mcgill.ca/~zhioua/MscSami.pdf

Tracing JITs
• A. Gal and M. Franz. Incremental Dynamic Code Generation with Trace Trees. Technical Report, 2006. http://www.ics.uci.edu/~franz/Site/pubs-pdf/ICS-TR-06-16.pdf
• A. Gal, C. W. Probst, and M. Franz. HotpathVM: An Effective JIT Compiler for Resource-constrained Devices. VEE'06. http://www.usenix.org/events/vee06/full_papers/p144-gal.pdf
• Gal et al. Trace-based Just-in-Time Type Specialization for Dynamic Languages. PLDI 2009. http://people.mozilla.org/~gal/compressed.tracemonkey-pldi-09.pdf
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 187
Further Aspects of Compilation
4.5 Further Aspects of Compilation
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 188
Further Aspects of Compilation
Code generation
Code generation can be split into four largely independent, machine-dependent tasks:
• Memory allocation
• Instruction selection and addressing
• Instruction scheduling
• Code optimization
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 189
Further Aspects of Compilation
Memory allocation
Modern machines have the following memory hierarchy:
• Registers
• Primary cache (instruction cache, data cache)
• Secondary cache
• Main memory (page/segment addressing)

Unlike registers, the caches are controlled by the hardware. Efficient usage of the cache means in particular aligning data objects and instructions to the borders of cache blocks (cf. Appel, Chap. 21). The same holds for main memory.
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 190
Further Aspects of Compilation
Instruction selection
Instruction selection aims at the best possible translation of expressions and basic blocks using the instruction set of the machine, for instance,
• using complex addressing modes
• considering the sizes of constants or the locality of jumps

Instruction selection is often formulated as a tree pattern matching problem with costs (cf. Wilhelm/Maurer, Chap. 11).
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 191
Further Aspects of Compilation
Instruction scheduling
Modern machines allow processor-local parallel processing (pipelining, super-scalar execution, VLIW).

To exploit this parallelism, the code has to comply with additional requirements that have to be considered during code generation (see Appel, Chap. 20; Wilhelm/Maurer, Sect. 12.6).
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 192
Further Aspects of Compilation
Code optimization
Optimizations of the assembler or machine code may allow an additional increase in program efficiency (see Wilhelm/Maurer, Sect. 6.9).
c© Prof. Dr. Arnd Poetzsch-Heffter Selected Topics in Compiler Construction 193