Towards a More Principled Compiler: Progressive Backend Compiler Optimization

David Koes, School of Computer Science, 8/28/2006


Page 1: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

School of Computer Science

Towards a More Principled Compiler:
Progressive Backend Compiler Optimization

David Koes
8/28/2006

Page 2: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Performance Gains Due to Compiler (gcc)

[Figure: SPEC2000 performance improvement, 0% to 50%, plotted from Nov-95 to Nov-09. Test system: 2.8GHz Pentium 4, 1GB RAM, -O3 …]

Page 3: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

The Future of Compiler Optimization

[Figure: SPEC2000 performance improvement, 0% to 50%, plotted from Nov-95 to Nov-09.]

Is this possible? Yes!
10-30% improvement just from reordering compiler phases (http://www.cs.rice.edu/~keith/Adapt/)

How do we exploit the existing optimization potential?

Need a more principled compiler

Page 4: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Compiler code size improvement

[Figure: code size improvement, 0% to 25%, plotted from Nov-95 to Nov-05.]

Page 5: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

A Principled Compiler

A compiler that
– knows right from wrong (less optimal from more optimal)
– follows a rigorous procedure to get the desired output

Page 6: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Today’s Compiler

[Diagram: compiler phases split into target-independent and target-dependent groups: const prop, loop unroll, GVN, strength reduction, SCCP, code motion, copy prop, inlining, DCE, PRE, peephole, insn select, reg alloc, insn sched, branch opt, …; shown with the machine description and the optimized program.]

Problems
– some phases not internally optimal
  • purely heuristic solution
– machine description mostly ignored
– lack of integration between phases

Page 7: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Ideal Compiler

[Diagram: the phases (copy prop, loop unroll, DCE, PRE, const prop, code motion, inline, GVN, strength reduction, peephole, CSE, SCCP, reg alloc, branch opt, insn select) shown together with the machine description and the optimized program.]

– each phase locally optimal
– makes full use of machine description
– tight integration between phases

Absolutely no idea how to do this or if it’s even possible

Page 8: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Towards a More Principled Compiler

[Diagram: the same phases (copy prop, loop unroll, DCE, PRE, const prop, code motion, inline, GVN, strength reduction, peephole, CSE, SCCP, reg alloc, branch opt), with insn select and reg alloc drawn separately, alongside the machine description and the optimized program.]

– each phase locally optimal
– makes full use of machine description
– tight integration between phases

Page 9: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Outline

I. Motivation

II. Related Work

III. Completed Work

IV. Proposed Work

V. Contributions & Timeline

Page 10: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Register Allocation Problem

v = 1
w = v + 3
x = w + v
u = v
t = u + x
print(x);
print(w);
print(t);
print(u);

[Diagram: the register allocator maps an unbounded number of program variables onto a limited number of processor registers (eax, ebx, ecx, edx, esi, edi, ebp, esp) plus slow memory.]

Aspects of the problem: spill code optimization, memory operands, register preferences, rematerialization, live range splitting

Related Work
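To make the pressure on registers concrete, here is a minimal sketch that runs a backward liveness scan over the straight-line code on this slide and reports how many variables are live at once. The tuple encoding of the program and the function names are illustrative assumptions, not part of the talk.

```python
# Each instruction is (defined variable or None, variables it reads).
# This encoding of the slide's example is an assumption made for illustration.
PROGRAM = [
    ("v", []),          # v = 1
    ("w", ["v"]),       # w = v + 3
    ("x", ["w", "v"]),  # x = w + v
    ("u", ["v"]),       # u = v
    ("t", ["u", "x"]),  # t = u + x
    (None, ["x"]),      # print(x)
    (None, ["w"]),      # print(w)
    (None, ["t"]),      # print(t)
    (None, ["u"]),      # print(u)
]


def live_in_sets(program):
    """Return the set of variables live just before each instruction."""
    live = set()
    result = []
    for dest, uses in reversed(program):
        live.discard(dest)   # a definition ends the live range (scanning backward)
        live.update(uses)    # a use makes the variable live before this point
        result.append(frozenset(live))
    result.reverse()
    return result


def max_pressure(program):
    """Largest number of simultaneously live variables."""
    return max(len(s) for s in live_in_sets(program))


if __name__ == "__main__":
    print(max_pressure(PROGRAM))
```

Whenever this number exceeds the number of available registers, some value has to live in memory for part of its lifetime, which is where spill cost, rematerialization and live range splitting come in.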

Page 11: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Register Allocation Previous Work

Methods compared on three criteria (Expressive, Fast, Optimal):
– Linear Scan
– Graph Coloring
– Integer Linear Programming
– Partitioned Boolean Quadratic Programming

Page 12: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Instruction Selection Problem

[Diagram: the instruction selector translates the IR representation into assembly; the question is how to find a minimum cost tiling of the IR.]

movl (p),t1
leal (x,t1),t2
leal 1(y),t3
leal (t2,t3),r
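As an illustration of what "minimum cost tiling" means, the following sketch tiles a small expression tree by dynamic programming. It covers trees only (the DAG case discussed in the related work is harder), and the tile set, tile names and costs are hypothetical, chosen just to show the mechanism.

```python
# IR trees are nested tuples: (op, child, ...). Leaves are ("reg",) or ("const",).
# In a tile pattern, "*" marks a subtree that must be tiled separately.
# The tiles and their costs below are hypothetical.
TILES = [
    (("reg",), 0, "reg"),
    (("const",), 1, "mov $c"),
    (("load", "*"), 2, "mov (r)"),
    (("add", "*", "*"), 1, "add"),
    (("add", "*", ("const",)), 1, "lea c(r)"),    # constant folded into the tile
    (("add", "*", ("load", "*")), 2, "add (r)"),  # memory operand folded in
]


def match(pattern, tree, holes):
    """Check whether pattern covers the top of tree; collect uncovered subtrees."""
    if pattern == "*":
        holes.append(tree)
        return True
    if pattern[0] != tree[0] or len(pattern) != len(tree):
        return False
    return all(match(p, t, holes) for p, t in zip(pattern[1:], tree[1:]))


def best_tiling(tree):
    """Return (cost, tile names) of a cheapest tiling of tree."""
    best = None
    for pattern, cost, name in TILES:
        holes = []
        if not match(pattern, tree, holes):
            continue
        total, names = cost, []
        for subtree in holes:
            sub_cost, sub_names = best_tiling(subtree)
            total += sub_cost
            names += sub_names
        names.append(name)
        if best is None or total < best[0]:
            best = (total, names)
    if best is None:
        raise ValueError("no tile matches %r" % (tree,))
    return best


if __name__ == "__main__":
    print(best_tiling(("add", ("reg",), ("load", ("reg",)))))
```

Note that the costs here are fixed per tile; the point made later in the talk is that the real cost of a tile depends on where the register allocator places its operands.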

Page 13: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Instruction Selection Previous Work

Methods compared on four criteria (DAG Tiling, Register Allocation Aware, Fast, Optimal):
– Dynamic Programming
– Binate Covering
– Peephole Based Instruction Selection
– AVIV Code Generator
– Exhaustive Search

Page 14: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Outline

I. Motivation

II. Related Work

III. Completed Work

IV. Proposed Work

V. Contributions & Timeline

Page 15: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

A More Principled Register Allocator

– fully utilize machine description
  • explicit and expressive model of costs of allocation for given architecture
– optimal solutions

[Diagram: reg alloc driven by the machine description.]

Completed Work

Page 16: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Multi-commodity Network Flow: An Expressive Model

Given network (directed graph) with
– cost and capacity on each edge
– sources & sinks for multiple commodities

Find lowest cost flow of commodities

NP-complete for integer flows

Example: edges have unit capacity

[Diagram: sources and sinks for commodities a and b, connected by edges with costs 0 and 1.]
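For reference, the problem stated on this slide can be written in the standard textbook form below. The symbols are assumptions introduced here, not notation from the talk: $x_{ij}^k$ is the flow of commodity $k$ on edge $(i,j)$, $c_{ij}^k$ its cost, $u_{ij}$ the edge capacity, and $b_i^k$ the supply of commodity $k$ at node $i$ (positive at its source, negative at its sink, zero elsewhere).

```latex
\begin{align*}
\min \quad & \sum_{k} \sum_{(i,j) \in E} c_{ij}^{k} \, x_{ij}^{k} \\
\text{s.t.} \quad
  & \sum_{j : (i,j) \in E} x_{ij}^{k} - \sum_{j : (j,i) \in E} x_{ji}^{k} = b_{i}^{k}
    && \text{for every node } i \text{ and commodity } k \\
  & \sum_{k} x_{ij}^{k} \le u_{ij}
    && \text{for every edge } (i,j) \in E \\
  & x_{ij}^{k} \in \{0, 1, 2, \dots\}
\end{align*}
```

Only the capacity constraints couple the commodities; without them the problem separates into one single-commodity flow problem per commodity.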

Page 17: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Register Allocation as a MCNF

Register allocation → MCNF
Variables → Commodities
Variable Definition → Source
Variable Last Use → Sink
Allocation Classes (Reg/Mem/Const) → Nodes
Register Limits → Node Capacities
Spill Costs → Edge Costs
Allocation → Flow

[Diagram: example network for a variable a, with nodes r0, r1, mem and the constant 1 as possible allocation classes.]

Page 18: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Example

Source Code
int example(int a, int b)
{
  int d = 1;
  int c = a - b;
  return c+d;
}

Pre-alloc Assembly
MOVE 1 -> d
SUB a,b -> c
ADD c,d -> c
MOVE c -> r0

[Diagram: the network for this code, with edges annotated by insn pref cost, mem access cost, and load cost.]

Page 19: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Control Flow

MCNF can only represent straight-line code
– need to link together networks from basic blocks

Extend MCNF model with merge and split nodes to implement boundary constraints.

[Diagram: variable a allocated to %eax or to mem at the boundaries between basic blocks.]

Details in proposal document, along with modeling persistence of values in memory.

Page 20: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

A Better Register Allocator

– fully utilize machine description
  • explicit and expressive model of costs of allocation for given architecture: Global MCNF
– locally optimal
  • NP-hard, so use progressive solution technique

[Diagram: reg alloc driven by the machine description.]

Page 22: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Progressive Solution Technique

Quickly find a good allocation

Then progressively find better allocations
– until optimal allocation found
– or time limit is reached

[Figure: allocation quality plotted against compile time.]

Technique: Lagrangian relaxation directed allocators
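The control loop described on this slide can be sketched as a generic "anytime" search. This is a minimal sketch under stated assumptions: the function and parameter names are hypothetical, an iteration budget stands in for the compile-time limit, and a known lower bound stands in for the optimality proof.

```python
def progressive_solve(initial, propose, cost, lower_bound, max_iterations):
    """Keep the best solution seen so far until it is provably optimal
    or the budget runs out.

    initial        -- a quickly found feasible solution
    propose(i)     -- returns a candidate solution for iteration i (hypothetical hook)
    cost(s)        -- cost of a solution (lower is better)
    lower_bound    -- a proven bound; reaching it means the solution is optimal
    max_iterations -- stand-in for the time limit
    """
    best = initial
    history = [cost(best)]
    for i in range(max_iterations):
        if cost(best) <= lower_bound:
            break                      # optimal allocation found
        candidate = propose(i)
        if cost(candidate) < cost(best):
            best = candidate           # never accept a worse solution
        history.append(cost(best))
    return best, history


if __name__ == "__main__":
    # Toy run: "solutions" are just numbers and their cost is their value.
    candidates = [7, 9, 5, 5]
    best, history = progressive_solve(10, lambda i: candidates[i],
                                      lambda s: s, 5, len(candidates))
    print(best, history)
```

Because the best solution is only ever replaced by a cheaper one, the recorded quality never gets worse as more compile time is spent.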

Page 23: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Lagrangian Relaxation: Intuition

Relaxes the hard constraints
– only have to solve single commodity flow

Combines easy subproblems using a Lagrangian multiplier (price)
– an additional price on each edge
– a price on each split/merge node

Example: edges have unit capacity

[Diagram: commodities a and b share two edges with costs 0 and 1; adding a price of 1 to the cost-0 edge gives costs 0+1 and 1.]

With price, solution to single commodity flow can be solution to multicommodity flow
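The two-edge example on this slide can be checked with a few lines of code. In this sketch (names are illustrative assumptions) each commodity independently looks for its cheapest edge; without prices both want the cost-0 edge, which violates the unit capacity, while a price of 1 on that edge makes both edges equally attractive.

```python
EDGE_COSTS = [0, 1]        # the two edges from the slide's example
CAPACITY = 1               # "edges have unit capacity"
COMMODITIES = ["a", "b"]


def cheapest_edges(prices):
    """Edges that a single commodity, solved on its own, is willing to use."""
    priced = [c + p for c, p in zip(EDGE_COSTS, prices)]
    low = min(priced)
    return [e for e, value in enumerate(priced) if value == low]


def independent_choice_is_feasible(prices):
    """Can every commodity take a cheapest edge without exceeding capacity?"""
    return len(cheapest_edges(prices)) * CAPACITY >= len(COMMODITIES)


if __name__ == "__main__":
    print(cheapest_edges([0, 0]), independent_choice_is_feasible([0, 0]))
    print(cheapest_edges([1, 0]), independent_choice_is_feasible([1, 0]))
```

With the price in place the single-commodity solutions are only allowed to agree with a feasible multicommodity flow, not forced to: ties still have to be broken sensibly, which is the job of the allocators that build an integer solution from the prices.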

Page 24: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Solution Procedure

Compute prices with iterative subgradient optimization
– guaranteed to converge to optimal prices
– optimal for linear relaxation

At each iteration, construct a feasible integer solution using current prices
– iterative allocator (in document)
– simultaneous allocator
– trace-based simultaneous allocator

[Diagram: the a/b example with edge prices updated across iterations (costs 0+1+1 and 1, then 0+1 and 1).]
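A minimal sketch of subgradient price updates on the same two-edge example follows. The update rule (raise the price of an over-used edge, lower the price of an under-used one, never below zero) and the diminishing step size 1/k are the standard textbook choices and are assumptions here; the talk does not specify them.

```python
EDGE_COSTS = [0.0, 1.0]
CAPACITY = [1, 1]
NUM_COMMODITIES = 2        # commodities a and b, one unit of flow each


def relaxed_flows(prices):
    """Solve the relaxed problem: every commodity takes the cheapest priced edge."""
    priced = [c + p for c, p in zip(EDGE_COSTS, prices)]
    cheapest = priced.index(min(priced))   # ties go to the lower-numbered edge
    flows = [0] * len(EDGE_COSTS)
    flows[cheapest] = NUM_COMMODITIES
    return flows, NUM_COMMODITIES * priced[cheapest]


def subgradient(iterations):
    """Return the final prices and the best lower bound seen."""
    prices = [0.0] * len(EDGE_COSTS)
    best_bound = float("-inf")
    for k in range(1, iterations + 1):
        flows, relaxed_cost = relaxed_flows(prices)
        # Lagrangian bound: relaxed cost minus the price paid for the capacity.
        bound = relaxed_cost - sum(p * u for p, u in zip(prices, CAPACITY))
        best_bound = max(best_bound, bound)
        step = 1.0 / k
        prices = [max(0.0, p + step * (f - u))
                  for p, f, u in zip(prices, flows, CAPACITY)]
    return prices, best_bound


if __name__ == "__main__":
    print(subgradient(50))
```

In this toy instance the relaxed solution keeps putting both commodities on the same edge even as the prices settle, which illustrates why a separate step is needed at each iteration to construct a feasible integer solution from the current prices.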

Page 25: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Simultaneous Allocator

[Animation: step-by-step allocation on the example network; the current cost is shown as -1, -3, -2.]

Edges to/from memory cost 3

Page 26: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Trace-Based Allocation

Decompose function into traces of basic blocks
– run simultaneous allocator on each trace
– control flow internal to trace presents difficulty (addressed in proposal document)

Page 27: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Evaluation

Implemented in gcc 3.4.4 targeting x86

Optimize for code size
– perfect static evaluation
– important metric in its own right

MediaBench, MiBench, Spec95, Spec2000
– over 10,000 functions

Page 28: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Progressiveness

[Figure: squareEncrypt]

Page 29: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Progressiveness

[Figure: quicksort]

Page 30: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Code Size

[Figure: average code size improvement over graph allocator (0% to 8%) for initial heuristics only, 10 iterations, 100 iterations, 1000 iterations, and the default allocator.]

Progressive!

Page 31: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Optimality

[Figure: percent of functions (0% to 100%) after 1, 10, 100, and 1000 iterations, grouped as optimal, <=1% from optimal, <=5% from optimal, <=10% from optimal, <=25% from optimal, and >25% from optimal.]

Proven optimality

Page 32: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Compile Time Slowdown :-(

9.2x slower

Page 33: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

A Better Register Allocator

– fully utilize machine description
  • explicit and expressive model of costs of allocation for given architecture: Global MCNF
– locally optimal
  • approach optimality using progressive solution technique: Lagrangian directed allocators

[Diagram: reg alloc driven by the machine description.]

Page 34: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Outline

I. Motivation

II. Related Work

III. Completed Work

IV. Proposed Work

V. Contributions & Timeline

Page 35: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

A Better Better Register Allocator

Solver Improvements
– Improve initial solution
– Improve quality as prices converge
– Hope to prove approximation bounds

Model Improvements
– Improve accuracy of model
– Model simplification
– Represent uniform register sets efficiently

Proposed Work

Page 36: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Model Simplification

Summarize overly expressive sections of the model

Conservative simplification
– does not change optimal value

Aggressive simplification
– explore tradeoff between model complexity and optimality

Page 37: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Instruction Selection Interaction

[Diagram: alternative instructions that perform the same operation.]

Which instruction is best depends on the register allocator, so let the register allocator decide

Page 38: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Register Allocation Aware Instruction SElection (RA2ISE)

Instruction selection not finalized until register allocation

IR tiled with Register Allocation Aware Tiles (RAATs)

A RAAT represents several instruction sequences
– different costs
– a sequence for every possible register allocation
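One way to picture a RAAT is as a small table from operand locations to instruction sequences with their costs. The classes, field names and the single-operand simplification below are hypothetical, not the representation proposed in the talk; the two x86 variants and their encoded sizes (2 and 3 bytes) are ordinary `addl` forms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    operand_location: str   # where the source operand was allocated: "reg" or "mem"
    assembly: str
    size_bytes: int


@dataclass(frozen=True)
class RAAT:
    """Hypothetical register allocation aware tile: one operation, several variants."""
    operation: str
    variants: tuple

    def select(self, operand_location):
        """Pick the smallest variant that is valid for the given allocation."""
        matches = [v for v in self.variants
                   if v.operand_location == operand_location]
        if not matches:
            raise ValueError("no variant for location %r" % operand_location)
        return min(matches, key=lambda v: v.size_bytes)


ADD_TILE = RAAT("add", (
    Variant("reg", "addl %edx,%eax", 2),
    Variant("mem", "addl 8(%ebp),%eax", 3),
))

if __name__ == "__main__":
    print(ADD_TILE.select("reg").assembly)
    print(ADD_TILE.select("mem").assembly)
```

The final instruction is chosen only once the allocator has decided where the operand lives, which matches the slide's point that selection is not finalized until register allocation.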

Page 39: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

RA2ISE

[Diagram: the IR is tiled with RAATs, a model is created from the tiling, and register allocation produces the final instructions (the slide shows cwtl %eax).]

Page 40: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Implementing RA2ISE

Add side-constraints to Global MCNF model
– implement inter-variable preferences and constraints
  • “if x allocated to r1 and y allocated to r2, then save three bytes”
  • “x and y must be allocated to the same register”

Implement x86 RAATs
– RAAT tables created manually
– GMCNF RAAT representation automatically generated from RAAT table with minimum use of side constraints

Algorithms for tiling RAATs
– leverage existing algorithms
– exploit feedback between passes

Page 41: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Tiling RAATs

[Diagram: an IR graph with numbered nodes is tiled and then register allocated (eax, edx, mem), with feedback from register allocation back to tiling.]

Page 42: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Evaluation

Implement in production quality compiler (gcc)

Evaluate code size and simple code speed metric

Evaluate on three different architectures
– x86 (8 registers)
– 68k/ColdFire (16 registers)
– PPC (32 registers)

Page 43: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Outline

I. Motivation

II. Related Work

III. Completed Work

IV. Proposed Work

V. Contributions & Timeline

Page 44: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Contributions

RA2ISE
– register allocation aware tiles (RAATs) explicitly encode effect of register allocation on instruction sequence
– algorithms for tiling RAATs
– expressive model of register allocation that operates on RAATs and explicitly represents all important components of register allocation
– progressive solver for this model that can quickly find decent solution and approaches optimality as more time is allowed for compilation

Comprehensive evaluation of RA2ISE

Page 45: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Thesis Statement

RA2ISE is a principled and effective system for performing instruction selection and register allocation.

Page 46: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

One Step Towards a More Principled Compiler

[Diagram: the phases (copy prop, loop unroll, DCE, PRE, const prop, code motion, inline, GVN, strength reduction, peephole, CSE, SCCP, reg alloc, branch opt, …), with insn select and reg alloc drawn separately, alongside the machine description and the optimized program.]

Page 47: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Timeline

Fall 2006
– add simple speed metric option to model
– begin model simplification work
– improve model accuracy and solver performance

Winter 2006
– finish model simplification work
– add side-constraints to model
– implement existing gcc tiles as RAATs
– improve model accuracy and solver performance

Spring 2007
– finish implementation of side-constraints and gcc RAATs
– begin work on RA2ISE infrastructure
– create gcc-independent set of RAATs for x86
– improve model accuracy and solver performance

Summer 2007
– finish work on RA2ISE
– investigate and develop tiling algorithms
– improve model accuracy and solver performance

Fall 2007
– add 68k/ColdFire and PowerPC targets
– investigate uniform register set simplifications
– improve model accuracy and solver performance

Winter 2007
– begin writing thesis
– work on improving compile time performance

Spring 2008
– finish writing thesis

Page 48: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Andrew Richard Koes

Page 49: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Questions?

Page 50: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Processor Performance

[Figure: SPEC2000 performance improvement (0% to 7000%), plotted from Nov-95 to May-06, with curves labeled "Performance w/o Compiler Improvements", "Double every 24 months", and "Double every 18 months".]

Page 51: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Instruction Selection & Register Allocation

[Diagram: insn select and reg alloc driven by the machine description.]

– fully utilize machine description
– locally optimal
– tight integration between phases

Page 52: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Costs of Register Allocation

Spilling to/from memory
  movl 8(%ebp), %edx

Direct memory access
  addl 8(%ebp), %eax

Moving between registers
  movl %edx,%ecx

Rematerialization of constant value
  movl $3,%eax

Register usage preferences
  imul %edx,%eax
  vs.
  imul %edx,%ecx

Page 53: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Iterative Heuristic Allocator

Allocate each variable in a heuristic priority order

Find shortest path in each block
– avoid edges that make remaining problem infeasible

Process blocks in topological order
– allocation at block entry fixed by previous blocks

Intuition:
– shortest path is minimum cost allocation for a variable
– allocate most significant variables first

Limitation:
– greedy: can’t undo poor decisions
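The greedy scheme above can be sketched for a single basic block. This is a deliberately simplified model, and everything in it is an assumption made for illustration: one register class, unlimited memory (so no choice ever makes the remaining problem infeasible), a fixed memory access cost per program point, and the cost of 3 for edges to/from memory borrowed from the worked example in this talk.

```python
import math

MEM_MOVE_COST = 3   # "Edges to/from memory cost 3"
CLASSES = ("reg", "mem")


def shortest_path(start, end, free_regs, mem_access_cost):
    """Cheapest placement of one variable over points start..end (inclusive)."""
    def node_cost(cls, point):
        if cls == "mem":
            return mem_access_cost
        return 0 if free_regs[point] > 0 else math.inf

    cost = {cls: node_cost(cls, start) for cls in CLASSES}
    back = []
    for point in range(start + 1, end + 1):
        new_cost, choice = {}, {}
        for cls in CLASSES:
            def arrive(prev, cls=cls):
                return cost[prev] + (0 if prev == cls else MEM_MOVE_COST)
            prev = min(CLASSES, key=arrive)
            new_cost[cls] = arrive(prev) + node_cost(cls, point)
            choice[cls] = prev
        back.append(choice)
        cost = new_cost
    last = min(cost, key=cost.get)
    path = [last]
    for choice in reversed(back):
        path.append(choice[path[-1]])
    path.reverse()
    return cost[last], path


def iterative_allocate(variables, num_points, num_regs):
    """variables: (name, start, end, mem_access_cost), already in priority order."""
    free_regs = [num_regs] * num_points
    result = {}
    for name, start, end, mem_access_cost in variables:
        cost, path = shortest_path(start, end, free_regs, mem_access_cost)
        for offset, cls in enumerate(path):
            if cls == "reg":
                free_regs[start + offset] -= 1   # earlier choices are never undone
        result[name] = (cost, path)
    return result


if __name__ == "__main__":
    a, b = ("a", 0, 3, 2), ("b", 1, 2, 1)
    print(iterative_allocate([a, b], num_points=4, num_regs=1))
    print(iterative_allocate([b, a], num_points=4, num_regs=1))
```

Running the two orders shows the limitation named on the slide: allocating `b` first takes the only register in the middle of `a`'s live range, and `a` then ends up entirely in memory at a much higher total cost.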

Page 54: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Iterative Heuristic Allocator

Allocation order: a, b, c, d

Cost:
a: 0
b: 4
c: 0
d: -2
Total: 2

Edges to/from memory cost 3

Page 55: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Simultaneous Allocator

Scan each block
– maintain an allocation of all live variables
– at variable definition find cheapest allocation
  • allocation with shortest path to variable’s sink or block exit
  • allowed to evict (reallocate) already allocated variable
    – eviction cost: shortest path to edge from current allocation to new allocation in this block
    – cost of eviction added to shortest path cost

Intuition:
– minimizing cost for all variables at once

Limitation:
– path computations limited to single block
– future blocks do not change previous block allocations

Page 56: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Trace-Based Allocation

Decompose function into traces of basic blocks
– run simultaneous allocator on each trace
– control flow internal to trace
  • update only blocks that are necessary (easy-update)
  • update all affected blocks (full-update)

Page 57: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Accuracy of the Model

Global MCNF model correctly predicts costs of register allocation within 2% for 72.5% of functions compiled

[Figure: percent of functions (0% to 60%) by percent difference between predicted and actual size, in bins from >10% larger than predicted (under-predicted), through 0%, to >10% smaller than predicted (over-predicted).]

Page 58: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Compile Time Slowdown :-(

10x slower

[Figure: factor slower than graph allocation (0 to 50) per benchmark (099.go through susan, plus geometric mean), broken down into Initialization, Initial Iterative Allocation, Initial Simultaneous Allocation, and One Iteration.]

Page 59: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Code size improvement

[Figure: code size improvement over graph allocator (0% to 8%) for initial heuristics only, 10 iterations, 100 iterations, 1000 iterations, and the default allocator; series: without traces, with easy traces, with full traces.]

Page 60: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Code Size Improvement

[Figure: code size improvement over graph allocator (-5% to 20%) per benchmark (099.go through susan, plus average); series: initial heuristics only, 10 iterations, 100 iterations, 1000 iterations.]

Page 61: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Code Size Improvement

[Figure: code size improvement over default allocator (-6% to 12%) per benchmark (099.go through susan, plus average); series: initial heuristic only, 10 iterations, 100 iterations, 1000 iterations.]

Page 62: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Code Performance

Page 63: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Integrating Register Allocation and Instruction Selection

int foo(int a, short b) { return a*4+b; }

(left column: instruction size in bytes)

4 movl 4(%esp),%eax
3 sall $2,%eax
4 addl 8(%esp),%eax
1 cwtl
1 ret

5 movswl 8(%esp),%edx
4 movl 4(%esp),%eax
3 leal (%edx,%eax,4),%eax
1 ret

Page 64: Towards a More Principled Compiler: Progressive Backend Compiler Optimization

Another RAAT