compiler design

99
University of Amsterdam CSA CSA Computer Systems Architecture Introduction to Compiler Design: optimization and backend issues Andy Pimentel Computer Systems Architecture group [email protected] Introduction to Compiler Design – A. Pimentel – p. 1/98

Upload: meatscribd4dl

Post on 17-Nov-2014

666 views

Category:

Documents


3 download

DESCRIPTION

Compiler Design

TRANSCRIPT

Page 1: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Introduction to Compiler Design:optimization and backend issues

Andy Pimentel

Computer Systems Architecture [email protected]

Introduction to Compiler Design – A. Pimentel – p. 1/98

Page 2: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Compilers: Organization Revisited

IRSource MachineCode

IRFrontend BackendOptimizer

OptimizerIndependent part of compilerDifferent optimizations possibleIR to IR translation

Introduction to Compiler Design – A. Pimentel – p. 2/98

Page 3: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Intermediate Representation (IR)

Flow graphNodes are basic blocks

Basic blocks are single entry and single exitEdges represent control-flow

Abstract Machine CodeIncluding the notion of functions and procedures

Symbol table(s) keep track of scope and bindinginformation about names

Introduction to Compiler Design – A. Pimentel – p. 3/98

Page 4: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Partitioning into basic blocks

1. Determine the leaders, which are:The first statementAny statement that is the target of a jumpAny statement that immediately follows a jump

2. For each leader its basic block consists of the leader and allstatements up to but not including the next leader

Introduction to Compiler Design – A. Pimentel – p. 4/98

Page 5: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Partitioning into basic blocks (cont’d)

1 prod=02 i=13 t1=4*i4 t2=a[t1]5 t3=4*i6 t4=b[t3]7 t5=t2*t48 t6=prod+t59 prod=t610 t7=i+i11 i=t712 if i < 21 goto 3

BB1

BB2

Introduction to Compiler Design – A. Pimentel – p. 5/98

Page 6: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Intermediate Representation (cont’d)

Structure within a basic block:

Abstract Syntax Tree (AST)Leaves are labeled by variable names or constantsInterior nodes are labeled by an operator

Directed Acyclic Graph (DAG)

C-like

3 address statements (like we have already seen)

Introduction to Compiler Design – A. Pimentel – p. 6/98

Page 7: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Directed Acyclic Graph

Like ASTs:Leaves are labeled by variable names or constantsInterior nodes are labeled by an operator

Nodes can have variable names attached that contain thevalue of that expression

Common subexpressions are represented by multiple edgesto the same expression

Introduction to Compiler Design – A. Pimentel – p. 7/98

Page 8: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

DAG creation

Suppose the following three address statements:

1. x � y op z

2. x � op y

3. x � y

i f

i � � 20

... will be treated like case 1 with x undefined

Introduction to Compiler Design – A. Pimentel – p. 8/98

Page 9: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

DAG creation (cont’d)

If node

y

is undefined, create leaf labeled y, same for z ifapplicable

Find node n labeled op with children node

y

and node

z

if applicable. When not found, create node n. In case 3 let nbe node

y

Make node

x

point to n and update the attached identifiersfor x

Introduction to Compiler Design – A. Pimentel – p. 9/98

Page 10: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

DAG example

1 t1 = 4 * i2 t2 = a[t1]3 t3 = 4 * i4 t4 = b[t3]5 t5 = t2 * t46 t6 = prod + t57 prod = t68 t7 = i + 19 i = t7

10 if (i � � 20) goto 1

[ ] [ ]

*

* +

+

<=

prod

t6, prod

t5

t2 t4

t1, t3 t7, i 20

1i4ba

Introduction to Compiler Design – A. Pimentel – p. 10/98

Page 11: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Local optimizations

On basic blocks in the intermediate representationMachine independent optimizations

As a post code-generation step (often called peepholeoptimization)

On a small “instruction window” (often a basic block)Includes machine specific optimizations

Introduction to Compiler Design – A. Pimentel – p. 11/98

Page 12: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Transformations on basic blocks

Examples

Function-preserving transformationsCommon subexpression eliminationConstant foldingCopy propagationDead-code eliminationTemporary variable renamingInterchange of independent statements

Introduction to Compiler Design – A. Pimentel – p. 12/98

Page 13: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Transformations on basic blocks (cont’d)

Algebraic transformations

Machine dependent eliminations/transformationsRemoval of redundant loads/storesUse of machine idioms

Introduction to Compiler Design – A. Pimentel – p. 13/98

Page 14: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Common subexpression elimination

If the same expression is computed more than once it iscalled a common subexpression

If the result of the expression is stored, we don’t have torecompute it

Moving to a DAG as IR, common subexpressions areautomatically detected!

x � a � b x � a � b

� � � � � � �

y � a � b y � x

Introduction to Compiler Design – A. Pimentel – p. 14/98

Page 15: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Constant folding

Compute constant expression at compile time

May require some emulation support

x � 3 � 5 x � 8

� � � � � � �

y � x � 2 y � 16

Introduction to Compiler Design – A. Pimentel – p. 15/98

Page 16: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Copy propagation

Propagate original values when copied

Target for dead-code elimination

x � y x � y

� � � � � � �

z � x � 2 z � y � 2

Introduction to Compiler Design – A. Pimentel – p. 16/98

Page 17: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Dead-code elimination

A variable x is dead at a statement if it is not used after thatstatement

An assignment x � y � z where x is dead can be safelyeliminated

Requires live-variable analysis (discussed later on)

Introduction to Compiler Design – A. Pimentel – p. 17/98

Page 18: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Temporary variable renaming

t1 � a � b t1 � a � bt2 � t1 � 2 t2 � t1 � 2

� � � � � � �t1 � d � e t3 � d � ec � t1 � 1 c � t3 � 1

If each statement that defines a temporary defines a newtemporary, then the basic block is in normal-form

Makes some optimizations at BB level a lot simpler(e.g. common subexpression elimination, copypropagation, etc.)

Introduction to Compiler Design – A. Pimentel – p. 18/98

Page 19: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Algebraic transformations

There are many possible algebraic transformations

Usually only the common ones are implemented

x � x � 0

x � x � 1

x � x � 2 � x � x � � 1

x � x2 � x � x � x

Introduction to Compiler Design – A. Pimentel – p. 19/98

Page 20: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Machine dependent eliminations/transformations

Removal of redundant loads/stores1 mov R0, a2 mov a, R0 // can be removed

Removal of redundant jumps, for example1 beq ...,$Lx bne ...,$Ly2 j $Ly � $Lx: ...3 $Lx: ...

Use of machine idioms, e.g.,Auto increment/decrement addressing modesSIMD instructions

Etc., etc. (see practical assignment)

Introduction to Compiler Design – A. Pimentel – p. 20/98

Page 21: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Other sources of optimizations

Global optimizationsGlobal common subexpression eliminationGlobal constant foldingGlobal copy propagation, etc.

Loop optimizations

They all need some dataflow analysis on the flow graph

Introduction to Compiler Design – A. Pimentel – p. 21/98

Page 22: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Loop optimizations

Code motionDecrease amount of code inside loop

Take a loop-invariant expression and place it before theloop

while (i � � limit � 2) � t � limit � 2while (i � � t)

Introduction to Compiler Design – A. Pimentel – p. 22/98

Page 23: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Loop optimizations (cont’d)

Induction variable eliminationVariables that are locked to the iteration of the loop arecalled induction variables

Example: in for (i = 0; i < 10; i++) i is aninduction variable

Loops can contain more than one induction variable, forexample, hidden in an array lookup computation

Often, we can eliminate these extra induction variables

Introduction to Compiler Design – A. Pimentel – p. 23/98

Page 24: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Loop optimizations (cont’d)

Strength reductionStrength reduction is the replacement of expensiveoperations by cheaper ones (algebraic transformation)

Its use is not limited to loops but can be helpful forinduction variable elimination

i � i � 1 i � i � 1t1 � i � 4 � t1 � t1 � 4t2 � a

t1

t2 � a

t1

if (i � 10) goto top if (i � 10) goto top

Introduction to Compiler Design – A. Pimentel – p. 24/98

Page 25: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Loop optimizations (cont’d)

Induction variable elimination (2)Note that in the previous strength reduction we have toinitialize t1 before the loop

After such strength reductions we can eliminate aninduction variable

i � i � 1 t1 � t1 � 4t1 � t1 � 4 � t2 � a

t1

t2 � a

t1

if (t1 � 40) goto topif (i � 10) goto top

Introduction to Compiler Design – A. Pimentel – p. 25/98

Page 26: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Finding loops in flow graphs

Dominator relationNode A dominates node B if all paths to node B go throughnode A

A node always dominates itself

We can construct a tree using this relation: the Dominator tree

Introduction to Compiler Design – A. Pimentel – p. 26/98

Page 27: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Dominator tree example

5

1

2

3

4

9

6

7

8

4

10

1

2 3

5

6

7

8

9 10

Flow graph Dominator tree

Introduction to Compiler Design – A. Pimentel – p. 27/98

Page 28: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Natural loops

A loop has a single entry point, the header, whichdominates the loop

There must be a path back to the header

Loops can be found by searching for edges of which theirheads dominate their tails, called the backedges

Given a backedge n � d, the natural loop is d plus thenodes that can reach n without going through d

Introduction to Compiler Design – A. Pimentel – p. 28/98

Page 29: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Finding natural loop of n d

procedure insert(m) {if (not m � loop) {

loop � loop

mpush(m)

}}

stack � /0loop � �d

insert(n)while (stack

� � /0) {m = pop()for (p � pred

�m

) insert(p)}

Introduction to Compiler Design – A. Pimentel – p. 29/98

Page 30: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Natural loops (cont’d)

When two backedges go to the same header node, we mayjoin the resulting loops

When we consider two natural loops, they are eithercompletely disjoint or one is nested inside the other

The nested loop is called an inner loop

A program spends most of its time inside loops, so loopsare a target for optimizations. This especially holds forinner loops!

Introduction to Compiler Design – A. Pimentel – p. 30/98

Page 31: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Our example revisited

1

2

3

4

5 6

7

8

9 10

Flow graph

Natural loops:

1. backedge 10 −> 7: {7,8,10} (the inner loop) 2. backedge 7 −> 4: {4,5,6,7,8,10} 3. backedges 4 −> 3 and 8 −> 3: {3,4,5,6,7,8,10} 4. backedge 9 −> 1: the entire flow graph

Introduction to Compiler Design – A. Pimentel – p. 31/98

Page 32: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Our example revisited

1

2

3

4

5 6

7

8

9 10

Flow graph

Natural loops:

1. backedge 10 −> 7: {7,8,10} (the inner loop) 2. backedge 7 −> 4: {4,5,6,7,8,10} 3. backedges 4 −> 3 and 8 −> 3: {3,4,5,6,7,8,10} 4. backedge 9 −> 1: the entire flow graph

Introduction to Compiler Design – A. Pimentel – p. 31/98

Page 33: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Reducible flow graphs

A flow graph is reducible when the edges can be partitionedinto forward edges and backedges

The forward edges must form an acyclic graph in whichevery node can be reached from the initial node

Exclusive use of structured control-flow statements such asif-then-else, while and break produces reduciblecontrol-flow

Irreducible control-flow can create loops that cannot beoptimized

Introduction to Compiler Design – A. Pimentel – p. 32/98

Page 34: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Reducible flow graphs (cont’d)

Irreducible control-flow graphs can always be madereducible

This usually involves some duplication of code

a

cb

a

cb

c’

Introduction to Compiler Design – A. Pimentel – p. 33/98

Page 35: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Dataflow analysis

Data analysis is needed for global code optimization, e.g.:Is a variable live on exit from a block? Does adefinition reach a certain point in the code?

Dataflow equations are used to collect dataflow informationA typical dataflow equation has the formout

S

� gen

S

in

S� kill

S

The notion of generation and killing depends on thedataflow analysis problem to be solved

Let’s first consider Reaching Definitions analysis forstructured programs

Introduction to Compiler Design – A. Pimentel – p. 34/98

Page 36: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Reaching definitions

A definition of a variable x is a statement that assigns ormay assign a value to x

An assignment to x is an unambiguous definition of x

An ambiguous assignment to x can be an assignment to apointer or a function call where x is passed by reference

Introduction to Compiler Design – A. Pimentel – p. 35/98

Page 37: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Reaching definitions (cont’d)

When x is defined, we say the definition is generated

An unambiguous definition of x kills all other definitions ofx

When all definitions of x are the same at a certain point, wecan use this information to do some optimizations

Example: all definitions of x define x to be 1. Now, byperforming constant folding, we can do strength reductionif x is used in z � y � x

Introduction to Compiler Design – A. Pimentel – p. 36/98

Page 38: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Dataflow analysis for reaching definitions

During dataflow analysis we have to examine every paththat can be taken to see which definitions reach a point inthe code

Sometimes a certain path will never be taken, even if it ispart of the flow graph

Since it is undecidable whether a path can be taken, wesimply examine all paths

This won’t cause false assumptions to be made for thecode: it is a conservative simplification

It merely causes optimizations not to be performed

Introduction to Compiler Design – A. Pimentel – p. 37/98

Page 39: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

The building blocks

S

S

S

S

d: a=b+c

S1

S2

S1 S2

S1

gen[S]=

d

kill[S]=Da � �d

out[S]=gen[S]

(in[S]-kill[S])

gen[S]=gen[S2]

(gen[S1]-kill[S2])kill[S]=kill[S2]

(kill[S1]-gen[S2])in[S1]=in[S]in[S2]=out[S1]out[S]=out[S2]

gen[S]=gen[S1]

gen[S2]kill[S]=kill[S1]

kill[S2]in[S1]=in[S2]=in[S]out[S]=out[S1]

out[S2]

gen[S]=gen[S1]kill[S]=kill[S1]in[S1]=in[S]

gen[S1]out[S]=out[S1]

Introduction to Compiler Design – A. Pimentel – p. 38/98

Page 40: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Dealing with loops

The in-set to the code inside the loop is the in-set of theloop plus the out-set of the loop: in

S1 � in

S

out

S1

The out-set of the loop is the out-set of the code inside:out

S

� out

S1

Fortunately, we can also compute out

S1

in terms of in

S1

:out

S1

� gen

S1

in

S1� kill

S1

Introduction to Compiler Design – A. Pimentel – p. 39/98

Page 41: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Dealing with loops (cont’d)

I � in

S1

��� O � out

S1

��� J � in

S

�� G � gen

S1

and K � kill

S1

I � J

O

O � G

� �

I � K

Assume O � /0, then I1 � J

O1 � G

� �

I1 � K

� � G

� �

J � K�

I2 � J

O1 � J

G

� �

J � K� � J

G

O2 � G

� �

I2 � K

� � G� �

J

G � K

� � G

� �

J � K

O1 � O2 so in

S1� � in�

S

� �

gen

S1

and out

S

� � out

S1

Introduction to Compiler Design – A. Pimentel – p. 40/98

Page 42: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Reaching definitions example

d1 i = m - 1d2 j = nd3 a = u1

dod4 i = i + 1d5 j = j - 1

if (e1)d6 a = u2

elsed7 i = u3

while (e2)

001 1111110 0000

;000 1111111 0000

;000 1101110 0000

100 0000000 1001

d1 d2010 0000000 0100

d3001 0000000 0010

do000 1111110 0000

;

;

d4

110 0000000 1111

000 1100110 0001

000 1000100 0001 d5

000 0100010 0000

if

e1

000 0011000 0000

d6 d7

000 0010 100 1000000 0001

001 0000

e2

;

In reality, dataflow analysis is often performed at the granularityof basic blocks rather than statements

Introduction to Compiler Design – A. Pimentel – p. 41/98

Page 43: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Iterative solutions

Programs in general need not be made up out of structuredcontrol-flow statements

We can do dataflow analysis on these programs using aniterative algorithm

The equations (at basic block level) for reaching definitionsare:

in

B �

P � pred

B

!

out

P

out

B

� gen

B

in

B

� kill

B

Introduction to Compiler Design – A. Pimentel – p. 42/98

Page 44: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Iterative algorithm for reaching definitions

for (each block B) out

B

� gen

B

do {change � falsefor (each block B) {

in

B

P � pred

B

!out

P

oldout � out

B

out

B � gen

B

in

B

� kill

B

if (out

B

� � oldout) change � true}

} while (change)

Introduction to Compiler Design – A. Pimentel – p. 43/98

Page 45: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Reaching definitions: an example

d1: i = m −1d2: j = nd3: a = u1

d4: i = i + 1d5: j = j − 1

d6: a = u2

d7: i = u3

B3

B1

B2

B4

gen[B1] = {d1,d2,d3}kill[B1] = {d4,d5,d6,d7}

kill[B2] = {d1,d2,d7}gen[B2] = {d4,d5}

gen[B3] = {d6}kill[B3] = {d3}

gen[B4] = {d7}kill[B4] = {d1,d4}

Block B Initial Pass 1 Pass 2

in

"

B

#

out"

B#

in

"

B

#

out

"

B

#

in

"

B

#

out

"

B

#

B1 000 0000 111 0000 000 0000 111 0000 000 0000 111 0000

B2 000 0000 000 1100 111 0011 001 1110 111 1111 001 1110

B3 000 0000 000 0010 001 1110 000 1110 001 1110 000 1110

B4 000 0000 000 0001 001 1110 001 0111 001 1110 001 0111

Introduction to Compiler Design – A. Pimentel – p. 44/98

Page 46: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Available expressions

An expression e is available at a point p if every path fromthe initial node to p evaluates e, and the variables used by eare not changed after the last evaluations

An available expression e is killed if one of the variablesused by e is assigned to

An available expression e is generated if it is evaluated

Note that if an expression e is assigned to a variable usedby e, this expression will not be generated

Introduction to Compiler Design – A. Pimentel – p. 45/98

Page 47: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Available expressions (cont’d)

Available expressions are mainly used to find commonsubexpressions

t1 = 4 * i

?

t2 = 4 * i

B2

B3

B1 t1 = 4 * i

t2 = 4 * i

t0 = 4 * ii = ...

B1

B2

B3

Introduction to Compiler Design – A. Pimentel – p. 46/98

Page 48: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Available expressions (cont’d)

Dataflow equations:

out

B

� e_gen

B

in

B� e_kill

B

in

B

P � pred

B!

out

P

for B not initial

in

B1

� /0 where B1 is the initial block

The confluence operator is intersection instead of the union!

Introduction to Compiler Design – A. Pimentel – p. 47/98

Page 49: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Liveness analysis

A variable is live at a certain point in the code if it holds avalue that may be needed in the future

Solve backwards:Find use of a variableThis variable is live between statements that havefound use as next statementRecurse until you find a definition of the variable

Introduction to Compiler Design – A. Pimentel – p. 48/98

Page 50: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Dataflow for liveness

Using the sets use

B

and de f

B

de f

B

is the set of variables assigned values in B priorto any use of that variable in Buse

B

is the set of variables whose values may be usedin B prior to any definition of the variable

A variable comes live into a block (in in

B

), if it is eitherused before redefinition of it is live coming out of the blockand is not redefined in the block

A variable comes live out of a block (in out

B

) if and onlyif it is live coming into one of its successors

Introduction to Compiler Design – A. Pimentel – p. 49/98

Page 51: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Dataflow equations for liveness

in

B

� use

B

out

B

� de f

B �

out

B

S �succ$

B%

in

S

Note the relation between reaching-definitions equations:the roles of in and out are interchanged

Introduction to Compiler Design – A. Pimentel – p. 50/98

Page 52: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Algorithms for global optimizations

Global common subexpression eliminationFirst calculate the sets of available expressions

For every statement s of the form x � y � z where y � z isavailable do the following

Search backwards in the graph for the evaluations ofy � zCreate a new variable uReplace statements w � y � z by u � y � z; w � uReplace statement s by x � u

Introduction to Compiler Design – A. Pimentel – p. 51/98

Page 53: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Copy propagation

Suppose a copy statement s of the form x � y isencountered. We may now substitute a use of x by a use ofy if

Statement s is the only definition of x reaching the useOn every path from statement s to the use, there are noassignments to y

Introduction to Compiler Design – A. Pimentel – p. 52/98

Page 54: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Copy propagation (cont’d)

To find the set of copy statements we can use, we define anew dataflow problem

An occurrence of a copy statement generates this statement

An assignment to x or y kills the copy statement x � y

Dataflow equations:

out

B

� c_gen

B

in

B

� c_kill

B

in

B �

P � pred

B

!

out

P

for B not initial

in

B1

� /0 where B1 is the initial block

Introduction to Compiler Design – A. Pimentel – p. 53/98

Page 55: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Copy propagation (cont’d)

For each copy statement s: x � y doDetermine the uses of x reached by this definition of xDetermine if for each of those uses this is the onlydefinition reaching it ( � s � in

Buse

)If so, remove s and replace the uses of x by uses of y

Introduction to Compiler Design – A. Pimentel – p. 54/98

Page 56: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Detection of loop-invariant computations

1. Mark invariant those statements whose operands areconstant or have reaching definitions outside the loop

2. Repeat step 3 until no new statements are marked invariant

3. Mark invariant those statements whose operands either areconstant, have reaching definitions outside the loop, or haveone reaching definition that is marked invariant

Introduction to Compiler Design – A. Pimentel – p. 55/98

Page 57: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Code motion

1. Create a pre-header for the loop

2. Find loop-invariant statements

3. For each statement s defining x found in step 2, check that(a) it is in a block that dominate all exits of the loop(b) x is not defined elsewhere in the loop(c) all uses of x in the loop can only be reached from

this statement s

4. Move the statements that conform to the pre-header

Introduction to Compiler Design – A. Pimentel – p. 56/98

Page 58: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Code motion (cont’d)

i = 2u = u + 1

i = 1

if u < v goto B3

v = v − 1if v <= 20 goto B5

j = i

B3

B2

B4

B5

B1

i = 1 B1

i = 2u = u + 1

if u < v goto B3

v = v − 1if v <= 20 goto B5

j = i

B3

B2

B4

B5

i = 3

i = 2u = u + 1

i = 1

if u < v goto B3

B3

B2

B1

v = v − 1if v <= 20 goto B5

j = i B5

k = iB4

Condition (a) Condition (b) Condition (c)

Introduction to Compiler Design – A. Pimentel – p. 57/98

Page 59: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Detection of induction variables

A basic induction variable i is a variable that only hasassignments of the form i � i

&

c

Associated with each induction variable j is a triple

i ' c ' d

where i is a basic induction variable and c and d areconstants such that j � c � i � d

In this case j belongs to the family of i

The basic induction variable i belongs to its own family,with the associated triple

i ' 1 ' 0

Introduction to Compiler Design – A. Pimentel – p. 58/98

Page 60: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Detection of induction variables (cont’d)

Find all basic induction variables in the loop

Find variables k with a single assignment in the loop withone of the following forms:

k � j � b, k � b � j, k � j(

b, k � j � b, k � b � j, whereb is a constant and j is an induction variable

If j is not basic and in the family of i then there must beNo assignment of i between the assignment of j and kNo definition of j outside the loop that reaches k

Introduction to Compiler Design – A. Pimentel – p. 59/98

Page 61: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Strength reduction for induction variables

Consider each basic induction variable i in turn. For eachvariable j in the family of i with triple

�i ' c ' d

:Create a new variable sReplace the assignment to j by j � sImmediately after each assignment i � i

&

n appends � s � c � nPlace s in the family of i with triple

i ' c ' d

Initialize s in the preheader: s � c � i � d

Introduction to Compiler Design – A. Pimentel – p. 60/98

Page 62: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Strength reduction for induction variables (cont’d)

i = i + 1t2 = 4 * it3 = a[t2]if t3 < v goto B2

Strength reduction

i = m − 1t1 = 4 * nv = a[t1]

if i < n goto B5

B5

B1

B2

B3

B4

i = m − 1t1 = 4 * nv = a[t1]

s2 = 4 * i

t3 = a[t2]if t3 < v goto B2

t2 = s2s2 = s2 + 4i = i + 1

if i < n goto B5

B5

B3

B2

B1

B4

Introduction to Compiler Design – A. Pimentel – p. 61/98

Page 63: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Elimination of induction variables

Consider each basic induction variable i only used tocompute other induction variables and tests

Take some j in i’s family such that c and d from the triple�

i ' c ' d

are simple

Rewrite tests if (i relop x) tor � c � x � d; if ( j relop r)

Delete assignments to i from the loop

Do some copy propagation to eliminate j � s assignmentsformed during strength reduction

Introduction to Compiler Design – A. Pimentel – p. 62/98

Page 64: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Alias Analysis

Aliases, e.g. caused by pointers, make dataflow analysismore complex (uncertainty regarding what is defined andused: x � �p might use any variable)

Use dataflow analysis to determine what a pointer mightpoint to

in

B

contains for each pointer p the set of variables towhich p could point at the beginning of block B

Elements of in

B

are pairs

p ' a

where p is a pointerand a a variable, meaning that p might point to a

out

B

is defined similarly for the end of B

Introduction to Compiler Design – A. Pimentel – p. 63/98

Page 65: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Alias Analysis (cont’d)

Define a function transB such that transB�

in

B � � out

B

transB is composed of transs, for each stmt s of block BIf s is p � &a or p � &a

&

c in case a is an array, thentranss

S

� �

S �� �

p ' b

�)

any variable b� � � �

p ' a

� �

If s is p � q

&

c for pointer q and nonzero integer c,then

transs

S� � �S �� �

p ' b

�)

any variable b

� �

� �p ' b

�) �

q ' b

� �

S and b is an array variable

If s is p � q, thentranss

S

� � �S �� �

p ' b

�)

any variable b

� �

� �

p ' b

�) �

q ' b

� � S

Introduction to Compiler Design – A. Pimentel – p. 64/98

Page 66: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Alias Analysis (cont’d)

– If s assigns to pointer p any other expression, thentranss

S

� � S �� �

p ' b

�)

any variable b�

– If s is not an assignment to a pointer, then transs

S

� � S

Dataflow equations for alias analysis:

out

B � transB

in

B

in

B �

P � pred

B

!

out

P

where transB�

S� � transsk

transsk *1

�,+ + + �

transs1

S

� � � �

Introduction to Compiler Design – A. Pimentel – p. 65/98

Page 67: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Alias Analysis (cont’d)

How to use the alias dataflow information? Examples:In reaching definitions analysis (to determine gen andkill)

� statement �p � a generates a definition of everyvariable b such that p could point to b

� �p � a kills definition of b only if b is not an arrayand is the only variable p could possibly point to (tobe conservative)

In liveness analysis (to determine de f and use)

� �p � a uses p and a. It defines b only if b is theunique variable that p might point to (to beconservative)

� a � �p defines a, and represents the use of p and ause of any variable that p could point to

Introduction to Compiler Design – A. Pimentel – p. 66/98

Page 68: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Code generation

Instruction selectionWas a problem in the CISC era (e.g., lots of addressingmodes)

RISC instructions mean simpler instruction selection

However, new instruction sets introduce new, complicatedinstructions (e.g., multimedia instruction sets)

Introduction to Compiler Design – A. Pimentel – p. 67/98

Page 69: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Instruction selection methods

Tree-based methods (IR is a tree)Maximal MunchDynamic programmingTree grammars

Input tree treated as string using prefix notationRewrite string using an LR parser and generateinstructions as side effect of rewriting rules

If the DAG is not a tree, then it can be partitioned intomultiple trees

Introduction to Compiler Design – A. Pimentel – p. 68/98

Page 70: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Tree pattern based selection

Every target instruction is represented by a tree pattern

Such a tree pattern often has an associated cost

Instruction selection is done by tiling the IR tree with theinstruction tree patterns

There may be many different ways an IR tree can be tiled,depending on the instruction set

Introduction to Compiler Design – A. Pimentel – p. 69/98

Page 71: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Tree pattern based selection (cont’d)

+

mem

const d

+

mem

mem

move

const b

temp 1 const a

* temp 2 const c

+

Name Effect Trees Cycles

— ri temp 0

ADD ri

- r j

. rk+

1

MUL ri

- r j

/ rk*

1

ADDI ri

- r j

. c +

const

+

constconst 1

LOAD ri

- M 0r j. c 1

+

const

mem

+

mem

const

mem

const

mem3

STORE M

0r j

. c 1 - ri

+

const

mem

move

+

mem

move

const

mem

move

const

mem

move3

MOVEM M

0

r j

1 - M 0ri

1

mem

move

mem6

Introduction to Compiler Design – A. Pimentel – p. 70/98

Page 72: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Optimal and optimum tilings

The cost of a tiling is the sum of the costs of the tree patterns

An optimal tiling is one where no two adjacent tiles can becombined into a single tile of lower cost

An optimum tiling is a tiling with lowest possible cost

An optimum tiling is also optimal, but not vice-versa

Introduction to Compiler Design – A. Pimentel – p. 71/98

Page 73: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Maximal Munch

Maximal Munch is an algorithm for optimal tilingStart at the root of the treeFind the largest pattern that fitsCover the root node plus the other nodes in the pattern;the instruction corresponding to the tile is generatedDo the same for the resulting subtrees

Maximal Munch generates the instructions in reverse order!

Introduction to Compiler Design – A. Pimentel – p. 72/98

Page 74: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Dynamic programming

Dynamic programming is a technique for finding optimumsolutions

Bottom up approachFor each node n the costs of all children are foundrecursively.Then the minimum cost for node n is determined.

After cost assignment of the entire tree, instructionemission follows:

Emission(node n): for each leaves li of the tileselected at node n, perform Emission(li). Then emitthe instruction matched at node n

Introduction to Compiler Design – A. Pimentel – p. 73/98

Page 75: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Register allocation...a graph coloring problem

First do instruction selection assuming an infinite numberof symbolic registers

Build an interference graphEach node is a symbolic registerTwo nodes are connected when they are live at thesame time

Color the interference graphConnected nodes cannot have the same colorMinimize the number of colors (maximum is thenumber of actual registers)

Introduction to Compiler Design – A. Pimentel – p. 74/98

Page 76: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Coloring by simplification

Simplify interference graph G using heuristic method(K-coloring a graph is NP-complete)

Find a node m with less than K neighborsRemove node m and its edges from G, resulting in G

2

.Store m on a stackColor the graph G

2Graph G can be colored since m has less than Kneighbors

Introduction to Compiler Design – A. Pimentel – p. 75/98

Page 77: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Coloring by simplification (cont’d)

SpillIf a node with less than K neigbors cannot be found inG

Mark a node n to be spilled, remove n and its edgesfrom G (and stack n) and continue simplification

SelectAssign colors by popping the stackArriving at a spill node, check whether it can becolored. If not:

The variable represented by this node will reside inmemory (i.e. is spilled to memory)Actual spill code is inserted in the program

Introduction to Compiler Design – A. Pimentel – p. 76/98

Page 78: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Coalescing

If there is no interference edge between the source anddestination of a move, the move is redundant

Removing the move and joining the nodes is calledcoalescing

Coalescing increases the degree of a node

A graph that was K colorable before coalescing might notbe afterwards

Introduction to Compiler Design – A. Pimentel – p. 77/98

Page 79: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Sketch of the algorithm with coalescing

Label move-related nodes in interference graph

While interference graph is nonemptySimplify, using non-move-related nodesCoalesce move-related nodes using conservativecoalescing

Coalesce only when the resulting node has less thanK neighbors with a significant degree

No simplifications/coalescings: “freeze” amove-related node of a low degree � do not considerits moves for coalescing anymoreSpill

Select

Introduction to Compiler Design – A. Pimentel – p. 78/98

Page 80: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Register allocation: an example

Live in: k,jg = mem[j+12]h = k −1f = g * he = mem[j+8]m = mem[j+16]b = mem[f]c = e + 8d = ck = m + 4j = bgoto dLive out: d,k,j

e

d

h g

kj b

f

m

c

Assume a 4-coloring (K � 4)

Simplify by removing and stacking nodes with � 4neighbors (g,h,k,f,e,m)

Introduction to Compiler Design – A. Pimentel – p. 79/98

Page 81: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Register allocation: an example (cont’d)

After removing and stacking the nodes g,h,k,f,e,m:

After simplification

d

j b

c

j&b d&c

After coalescing

Coalesce now and simplify again

Introduction to Compiler Design – A. Pimentel – p. 80/98

Page 82: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Register allocation: an example (cont’d)

R0 R1 R2 R3Stacked elements: d&cj&b mefkgh

4 registers available:

e

d

h g

kj b

f

m

c

e

d

h g

kj b

f

m

c

Introduction to Compiler Design – A. Pimentel – p. 81/98

Page 83: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Register allocation: an example (cont’d)

R0 R1 R2 R3efkgh

4 registers available:

e

d

h g

kj b

f

m

c

e

d

h g

kj b

f

m

c

Stacked elements: m

ETC., ETC.

No spills are required and both moves were optimized away

Introduction to Compiler Design – A. Pimentel – p. 82/98

Page 84: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Instruction scheduling

Increase ILP (e.g., by avoiding pipeline hazards)Essential for VLIW processors

Scheduling at basic block level: list schedulingSystem resources represented by matrix Resources 3

TimePosition in matrix is true or false, indicating whetherthe resource is in use at that timeInstructions represented by matrices Resources 3

Instruction durationUsing dependency analysis, the schedule is made byfitting instructions as tight as possible

Introduction to Compiler Design – A. Pimentel – p. 83/98

Page 85: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

List scheduling (cont’d)

Finding optimal schedule is NP-complete problem � useheuristics, e.g. at an operation conflict schedule the mosttime-critical first

For a VLIW processor, the maximum instruction durationis used for scheduling � painful for memory loads!

Basic blocks usually are small (5 operations on the average)

� benefit of scheduling limited � Trace Scheduling

Introduction to Compiler Design – A. Pimentel – p. 84/98

Page 86: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Trace scheduling

Schedule instructions over code sections larger than basicblocks, so-called traces

A trace is a series of basic blocks that does not extendbeyond loop boundaries

Apply list scheduling to whole trace

Scheduling code inside a trace can move code beyond basicblock boundaries � compensate this by adding code to theoff-trace edges

Introduction to Compiler Design – A. Pimentel – p. 85/98

Page 87: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Trace scheduling (cont’d)

BB1

BB2 BB3

BB4

Introduction to Compiler Design – A. Pimentel – p. 86/98

Page 88: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Trace scheduling (cont’d)

Operation to be movedbefore Op A

Op COp AOp B

Off Trace

Off Trace

in TraceBasic Block

in TraceBasic Block

in TraceBasic Block

(c)

(b)

(a)

Copied code

Basic BlockOff Trace

traceOp ABranch

Op BOp C

Branch

Op A

Branch

Op B

Op B

below Branch in

Op COp AOp B Op B

Op C

BranchOp A

Op A Op C Op BOp A

Op C

In Trace

allowed if no side-

In TraceCopied code inoff Trace Basic Block

codeeffects in Off trace

Moved code onlyOperation to be movedabove Branch

In Trace

In Trace

Operation to be moved

Introduction to Compiler Design – A. Pimentel – p. 87/98

Page 89: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Trace scheduling (cont’d)

Trace selection

Because of the code copies, the trace that is most oftenexecuted has to be scheduled first

A longer trace brings more opportunities for ILP (loopunrolling!)

Use heuristics about how often a basic block is executedand which paths to and from a block have the most chanceof being taken (e.g. inner-loops) or use profiling (inputdependent)

Introduction to Compiler Design – A. Pimentel – p. 88/98

Page 90: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Other methods to increase ILP

Loop unrollingTechnique for increasing the amount of code availableinside a loop: make several copies of the loop body

Reduces loop control overhead and increases ILP (moreinstructions to schedule)

When using trace scheduling this results in longer tracesand thus more opportunities for better schedules

In general, the more copies, the better the job the schedulercan do but the gain becomes minimal

Introduction to Compiler Design – A. Pimentel – p. 89/98

Page 91: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Loop unrolling (cont’d)

Example

for (i = 0; i < 100; i++)

a[i] = a[i] + b[i];becomes

for (i = 0; i < 100; i += 4) {

a[i] = a[i] + b[i];

a[i+1] = a[i+1] + b[i+1];

a[i+2] = a[i+2] + b[i+2];

a[i+3] = a[i+3] + b[i+3];

}

Introduction to Compiler Design – A. Pimentel – p. 90/98

Page 92: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Software pipelining

Also a technique for using the parallelism available inseveral loop iterations

Software pipelining simulates a hardware pipeline, henceits name

pipelinedSoftware

iteration

Iterattion 0Iteration 1

Iteration 2Iteration 3

Iteration 4

There are three phases: Prologue, Steady state and Epilogue

Introduction to Compiler Design – A. Pimentel – p. 91/98

Page 93: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Software pipelining (cont’d)

Loop: LDADDD F4,F0,F2

SD

F0,0(R1)

0(R1),F4

Body

SBGEZ R1, Loop Loop control

T0

T1

T2

T... Loop:

LD

ADDD .

.

SD

LD

LD SBGEZ Loop.ADDD

LD

SD ADDD .Steady state

Prologue

Epilogue

Tn

Tn+1

Tn+2

SD ADDD

SD

Introduction to Compiler Design – A. Pimentel – p. 92/98

Page 94: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Modulo scheduling

Scheduling multiple loop iterations using softwarepipelining can create false dependencies between variablesused in different iterations

Renaming the variables used in different iterations is calledmodulo scheduling

When using n variables for representing the same variable,the steady state of the loop has to be unrolled n times

Introduction to Compiler Design – A. Pimentel – p. 93/98

Page 95: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Compiler optimizations for cache performance

Merging arrays (better spatial locality)

int val[SIZE]; struct merge {

int key[SIZE]; 4 int val, key; };

struct merge m_array[SIZE]

Loop interchange

Loop fusion and fission

Blocking (better temporal locality)

Introduction to Compiler Design – A. Pimentel – p. 94/98

Page 96: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Loop interchange

Exchanging of nested loops to change the memory footprintBetter spatial locality

for (i = 0; i < 50; i++)

for (j = 0; j < 100; j++)

a[j][i] = b[j][i] * c[j][i];

becomesfor (j = 0; j < 100; j++)

for (i = 0; i < 50; i++)

a[j][i] = b[j][i] * c[j][i];

Introduction to Compiler Design – A. Pimentel – p. 95/98

Page 97: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Loop fusion

Fuse multiple loops togetherLess loop controlBigger basic blocks (scheduling)Possibly better temporal locality

for (i = 0; i < n; i++)

c[i] = a[i] + b[i];

for (j = 0; j < n; j++)

d[j] = a[j] * e[j];

becomes

for (i = 0; i < n; i++) {

c[i] = a[i] + b[i];

d[i] = a[i] * e[i];

}

Introduction to Compiler Design – A. Pimentel – p. 96/98

Page 98: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Loop fission

Split a loop with independent statements into multiple loopsEnables other transformations (e.g. vectorization)Results in smaller cache footprint (better temporallocality)

for (i = 0; i < n; i++) {

a[i] = b[i] + c[i];

d[i] = e[i] * f[i];

}

becomes

for (i = 0; i < n; i++) {

a[i] = b[i] + c[i];

}

for (i = 0; i < n; i++) {

d[i] = e[i] * f[i];

}

Introduction to Compiler Design – A. Pimentel – p. 97/98

Page 99: Compiler Design

Universityof

Amsterdam

CSACSAComputerSystems

Architecture

Blocking

Perform computations on sub-matrices (blocks), e.g. whenmultiple matrices are accessed both row by row and column bycolumn

i

j

i

k j

k

X Y Zfor (i=0; i < N; i++) for (j=0; j < N; j++) {

r = 0;for (k = 0; k < N; k++) {

r = r + y[i][k]*z[k][j];};x[i][j] = r;

};

Matrix multiplication x = y*z

not touched older access recent access

i

j

i

k j

k

X Y Z

Blocking

Introduction to Compiler Design – A. Pimentel – p. 98/98