Constraint Programming in Compiler Optimization: Lessons Learned Peter van Beek University of Waterloo

DESCRIPTION

Talk given in the "Special Track on SAT and CSP Technologies" at the ICTAI 2013 conference, November, 2013, Washington, DC.

TRANSCRIPT

Page 1: Constraint Programming in Compiler Optimization: Lessons Learned

Constraint Programming in Compiler Optimization: Lessons Learned

Peter van Beek, University of Waterloo

Page 2

Acknowledgements

• Joint work with:

Omer Beg

Alejandro López-Ortiz

Abid Malik

Jim McInnes

Wayne Oldford

Claude-Guy Quimper

John Tromp

Kent Wilken

Huayue Wu

• Funding:

NSERC

IBM Canada

Page 3

Application-driven research

• Idea:

• pick an application—a real-world problem—where, if you solve it, there would be a significant impact

• Along the way, if all goes well, you will also:

• identify and fill gaps in theory

• identify and solve interesting sub-problems whose solutions will have general applicability

Page 4

Optimization problems in compilers

• Instruction selection

• Instruction scheduling

• basic-block scheduling

• super-block scheduling

• loop scheduling: tiling, unrolling, fusion

• Memory hierarchy optimizations

• Register allocation

Page 6

Production compilers

“At the outset, note that basic-block scheduling is an NP-hard problem, even with a very simple formulation of the problem, so we must seek an effective heuristic, rather than an exact approach.”

Steven Muchnick, Advanced Compiler Design & Implementation, 1997

Page 7

Outline

• Introduction

• computer architecture

• superblock scheduling

• Constraint programming approach

• temporal scheduler

• spatial and temporal scheduler

• Experiments

• experimental setup

• experimental results

• Lessons learned

Page 8

Computer architecture: Performing instructions in parallel

• Multiple-issue

• multiple functional units; e.g., ALUs, FPUs, load/store units, branch units

• multiple instructions can be issued (begin execution) each clock cycle

• issue width: max number of instructions that can be issued each clock cycle

• on most architectures, the issue width is less than the number of functional units

Page 9

Computer architecture: Performing instructions in parallel

• Pipelining

• overlap execution of instructions on a single functional unit

• latency of an instruction: number of cycles before result is available

• execution time of an instruction: number of cycles before next instruction can be issued on same functional unit

• serializing instruction: instruction that requires exclusive use of entire processor in cycle in which it is issued

Analogy: vehicle assembly line
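The difference between latency and execution time can be made concrete with a small sketch. The class, the helper names and the numeric values below are illustrative assumptions, not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    name: str
    latency: int         # cycles before the result is available
    execution_time: int  # cycles before the same unit can issue another instruction


def result_ready(issue_cycle: int, instr: Instruction) -> int:
    """First cycle in which a dependent instruction can use the result."""
    return issue_cycle + instr.latency


def unit_free(issue_cycle: int, instr: Instruction) -> int:
    """First cycle in which the same functional unit can issue again."""
    return issue_cycle + instr.execution_time


# Hypothetical instructions: same latency, different execution times.
pipelined = Instruction("fully pipelined", latency=5, execution_time=1)
blocking = Instruction("not fully pipelined", latency=5, execution_time=3)

print(result_ready(2, pipelined), unit_free(2, pipelined))  # 7 3
print(result_ready(2, blocking), unit_free(2, blocking))    # 7 5
```

Both instructions deliver their result at cycle 7, but the one that is not fully pipelined keeps its functional unit busy until cycle 5.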

Page 10

Superblock instruction scheduling

• Instruction scheduling

• assignment of a clock cycle to each instruction

• needed to take advantage of complex features of architecture

• sometimes necessary for correctness (VLIW)

• Basic block

• straight-line sequence of code with single entry, single exit

• Superblock

• collection of basic blocks with a unique entrance but multiple exits

• Given a target architecture, find schedule with minimum expected completion time

Page 11

Example superblock

dependency DAG

• nodes

• one for each instruction

• labeled with execution time

• nodes F and G are branch instructions, labeled with probability the exit is taken

• arcs

• represent precedence

• labeled with latencies

[Figure: dependency DAG with nodes A:1, B:3, C:1, D:1, E:1, F:1, G:1; exit probabilities 40% (F) and 60% (G); arc latencies 0, 1, 2 and 5]

Page 12

Example superblock

optimal cost schedule for 2-issue processor

cycle   ALU   FPU
1       A
2       B
3
4
5       C
6
7             D
8             E
9       F
10      G

[Figure: dependency DAG of the example superblock, as on the previous page]
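As a rough check of what "cost" means for this schedule, the sketch below weights the issue cycle of each branch by its exit probability. It assumes F carries 40% and G carries 60%, as in the cost function 40F + 60G given with the constraint model. The function name is an illustrative choice.

```python
# Issue cycles read off the schedule for the 2-issue processor.
schedule = {"A": 1, "B": 2, "C": 5, "D": 7, "E": 8, "F": 9, "G": 10}

# Exit probabilities in percent (assumed: F = 40, G = 60).
exit_percent = {"F": 40, "G": 60}


def expected_cost(schedule, exit_percent):
    """Probability-weighted issue cycle of the branch (exit) instructions."""
    weighted = sum(pct * schedule[branch] for branch, pct in exit_percent.items())
    return weighted / 100


print(expected_cost(schedule, exit_percent))  # 9.6
```

Moving a likely exit earlier lowers the cost more than moving an unlikely one, which is why the objective is a weighted sum rather than the length of the schedule.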

Page 13

Computer architecture: General purpose architectures

[Figure: a processor in which functional units f, i, b and m share a single register file]

Page 14

Computer architecture: Clustered architectures

[Figure: four clusters, c0 to c3, joined by a cluster interconnect; each cluster has its own register file and its own functional units (f0 i0 b0 m0 through f3 i3 b3 m3)]

Page 15

Computer architecture: Clustered architectures

• Current: digital signal processing

• multimedia, audio processing, image processing

• wireless, ADSL modems, …

• Future trend: general purpose multi-core processors

• large numbers of cores

• fast inter-processor communication

Page 16

Spatial and temporal scheduling

[Figure: dependency DAG with nodes A to H, arc latencies 1 and 2, and exit probabilities 20% and 80%, shown with two schedules over cycles 1 to 10: one on a single cluster (c0) with cost = 9.8, and one on two clusters (c0, c1) with cost = 7.6]

Page 17

Spatial and temporal scheduling

[Figure: the dependency DAG with nodes A to H and the two-cluster schedule (c0, c1), cost = 7.6]

Page 18

Approaches

• Superblock instruction scheduling is NP-complete

• Heuristic approaches in all commercial and open-source research compilers

• greedy list scheduling algorithm coupled with a priority heuristic

• Here: Optimal approach

• useful when longer compile times are tolerable

• e.g., compiling for software libraries, digital signal processing, embedded applications, final production build
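For contrast with the optimal approach, here is a minimal sketch of greedy list scheduling with a critical-path priority heuristic. It is a textbook-style simplification, not the scheduler of any particular compiler: it assumes a single pool of functional units limited only by issue width, fully pipelined instructions, and a hypothetical four-instruction DAG.

```python
def list_schedule(instructions, arcs, issue_width):
    """Greedy list scheduling: each cycle, issue the highest-priority ready instructions."""
    preds = {i: [] for i in instructions}
    succs = {i: [] for i in instructions}
    for (u, v), latency in arcs.items():
        preds[v].append((u, latency))
        succs[u].append((v, latency))

    # Priority heuristic: longest latency path from the instruction to any sink.
    priority = {}

    def critical_path(i):
        if i not in priority:
            priority[i] = max((lat + critical_path(v) for v, lat in succs[i]), default=0)
        return priority[i]

    for i in instructions:
        critical_path(i)

    schedule = {}
    cycle = 1
    while len(schedule) < len(instructions):
        # Ready: every predecessor was issued early enough for its latency to elapse.
        ready = [i for i in instructions
                 if i not in schedule
                 and all(u in schedule and schedule[u] + lat <= cycle
                         for u, lat in preds[i])]
        ready.sort(key=lambda i: -priority[i])
        for i in ready[:issue_width]:
            schedule[i] = cycle
        cycle += 1
    return schedule


# Hypothetical DAG: arcs map (producer, consumer) to a latency.
arcs = {("A", "B"): 1, ("A", "C"): 1, ("B", "D"): 2, ("C", "D"): 1}
print(list_schedule(["A", "B", "C", "D"], arcs, issue_width=2))
# {'A': 1, 'B': 2, 'C': 2, 'D': 4}
```

The heuristic commits to one instruction order and never backtracks, which is what makes it fast and also what leaves room for an exact method on harder blocks.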

Page 19

Outline

• Introduction

• computer architecture

• superblock scheduling

• Constraint programming approach

• temporal scheduler

• spatial and temporal scheduler

• Experiments

• experimental setup

• experimental results

• Lessons learned

Page 20

Temporal scheduler: Basic constraint model

[Figure: dependency DAG of the example superblock, with exit probabilities 40% and 60% and arc latencies 0, 1, 2 and 5]

variables

A, B, C, D, E, F, G

domains

{1, …, m}

constraints

B ≥ A + 1, C ≥ A + 1,

D ≥ B + 5, …, G ≥ F

gcc(A, B, C, F, G, nALU)

gcc(D, E, nFPU)

gcc(A, …, G, issuewidth)

cost function

40F + 60G
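The model can be read as a set of checks on a candidate schedule. The sketch below tests the latency constraints that the slide spells out (the elided ones are left out rather than guessed), treats each gcc as "at most this many of these variables take the same value", and evaluates the cost function. The capacities nALU = 1, nFPU = 1 and issue width 2 are assumptions for a 2-issue machine with one ALU and one FPU.

```python
from collections import Counter

# Latency arcs stated explicitly in the model; the remaining arcs are elided on the slide.
ARCS = {("A", "B"): 1, ("A", "C"): 1, ("B", "D"): 5, ("F", "G"): 0}


def satisfies_latency(schedule, arcs=ARCS):
    """Each successor is issued at least `latency` cycles after its predecessor."""
    return all(schedule[v] >= schedule[u] + lat for (u, v), lat in arcs.items())


def satisfies_gcc(schedule, variables, capacity):
    """No cycle is assigned to more than `capacity` of the given variables."""
    counts = Counter(schedule[v] for v in variables)
    return all(n <= capacity for n in counts.values())


def cost(schedule):
    return 40 * schedule["F"] + 60 * schedule["G"]


def feasible(schedule, n_alu=1, n_fpu=1, issue_width=2):
    return (satisfies_latency(schedule)
            and satisfies_gcc(schedule, "ABCFG", n_alu)
            and satisfies_gcc(schedule, "DE", n_fpu)
            and satisfies_gcc(schedule, "ABCDEFG", issue_width))


candidate = {"A": 1, "B": 2, "C": 5, "D": 7, "E": 8, "F": 9, "G": 10}
print(feasible(candidate), cost(candidate))  # True 960
```

A constraint solver searches for such an assignment instead of being handed one, but the constraints it propagates are the same predicates.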

Page 21

Temporal scheduler: Basic constraint model (con’t)

non-fully pipelined instructions, e.g., B:3

• introduce auxiliary variables

PB,1, PB,2

• introduce additional constraints

B + 1 = PB,1

B + 2 = PB,2

gcc(A, B, PB,1, PB,2, C, F, G, nALU)

serializing instructions

• similar technique
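The auxiliary-variable trick can be sketched as follows: an instruction with execution time t gets t − 1 extra variables pinned to the cycles right after its issue cycle, so the ordinary gcc check also counts the cycles in which the unit is still busy. The helper names and the capacity of one ALU are assumptions made for the illustration.

```python
from collections import Counter


def expand_non_pipelined(schedule, execution_time):
    """Add auxiliary variables (i, k) = schedule[i] + k for k = 1 .. execution_time[i] - 1."""
    expanded = dict(schedule)
    for instr, t in execution_time.items():
        for k in range(1, t):
            expanded[(instr, k)] = schedule[instr] + k
    return expanded


def gcc_ok(values, capacity):
    """No value (cycle) occurs more than `capacity` times."""
    return all(n <= capacity for n in Counter(values).values())


def alu_ok(schedule, n_alu=1):
    expanded = expand_non_pipelined(schedule, {"B": 3})  # B occupies the ALU for 3 cycles
    alu_vars = ["A", "B", ("B", 1), ("B", 2), "C", "F", "G"]
    return gcc_ok([expanded[v] for v in alu_vars], n_alu)


print(alu_ok({"A": 1, "B": 2, "C": 3, "F": 9, "G": 10}))  # False: C collides with PB,1
print(alu_ok({"A": 1, "B": 2, "C": 5, "F": 9, "G": 10}))  # True
```

The benefit of this encoding is that no new constraint type is needed; the existing gcc propagator handles the blocked cycles.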

Page 22

Temporal scheduler: Improving the model

• Add constraints to increase constraint propagation (e.g., Smith 2006)

• implied constraints: do not change set of solutions

• dominance constraints: preserve an optimal solution

• Here:

• many constraints added to constraint model in extensive preprocessing stage that occurs once

• extensive preprocessing effort pays off as model is solved many times

Page 23

Temporal scheduler: Improving the solver

• From optimization to satisfaction

• find bounds on cost function

• enumerate solutions to cost function (knapsack constraint; Trick 2001)

• step through in increasing order of cost

• Improved bounds consistency algorithm for gcc constraints

• Use portfolio to improve performance (Gomes et al. 1997)

• increasing levels of constraint propagation

• Impact-based variable ordering (Refalo 2004)

• Structure-based decomposition technique (Freuder 1994)
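The step from optimization to satisfaction can be sketched as a loop: list the assignments to the exit variables within their bounds, sort them by cost, and return the first one for which the rest of the model is satisfiable. The brute-force enumeration below stands in for the knapsack-constraint technique cited on the slide, and the `feasible` callback stands in for a call to the constraint solver; the bounds and the callback are hypothetical.

```python
from itertools import product


def assignments_by_cost(weights, bounds):
    """All assignments to the exit variables within bounds, in increasing order of cost."""
    names = list(weights)
    ranges = [range(bounds[n][0], bounds[n][1] + 1) for n in names]
    candidates = [dict(zip(names, values)) for values in product(*ranges)]
    return sorted(candidates, key=lambda a: sum(weights[n] * a[n] for n in names))


def first_feasible(weights, bounds, feasible):
    """Step through candidates in cost order; the first satisfiable one is optimal."""
    for assignment in assignments_by_cost(weights, bounds):
        if feasible(assignment):
            return sum(weights[n] * assignment[n] for n in assignment), assignment
    return None


weights = {"F": 40, "G": 60}
bounds = {"F": (8, 10), "G": (8, 10)}             # hypothetical lower and upper bounds
stand_in_solver = lambda a: a["G"] > a["F"]       # placeholder for a satisfaction check

print(first_feasible(weights, bounds, stand_in_solver))  # (860, {'F': 8, 'G': 9})
```

Because candidates are visited in increasing cost order, no proof of optimality is needed beyond the failures already encountered.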

Page 24

Spatial and temporal scheduler: Basic constraint model

variables

cycle of issue: xA, xB, …, xH

cluster: yA, yB, …, yH

domains

dom(x) = {1, …, m}

dom(y) = {0, …, k−1}

communication constraints

yA ≠ yC → xC ≥ xA + 1 + cost

yA = yC → xC ≥ xA + 1

cost function

80 xH + 20 xG

[Figure: dependency DAG with nodes A to H, arc latencies 1 and 2, and exit probabilities 20% and 80%]
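The communication constraints say that a dependent instruction on a different cluster must wait for the arc latency plus an inter-cluster communication cost. A minimal sketch, in which the function name and the communication cost of 2 cycles are assumptions chosen for illustration:

```python
def earliest_issue(x_pred, latency, y_pred, y_succ, comm_cost):
    """Earliest issue cycle of a successor given the clusters of both instructions."""
    if y_pred != y_succ:
        return x_pred + latency + comm_cost   # y_pred != y_succ: pay for the transfer
    return x_pred + latency                   # same cluster: arc latency only


# Arc A -> C with latency 1, A issued in cycle 1, assumed communication cost of 2.
print(earliest_issue(1, 1, y_pred=0, y_succ=0, comm_cost=2))  # 2
print(earliest_issue(1, 1, y_pred=0, y_succ=1, comm_cost=2))  # 4
```

Spreading instructions over clusters adds parallelism but can lengthen dependence chains, so the cluster variables and the cycle variables have to be decided together.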

Page 25

Spatial and temporal scheduler: Improving the model

• Symmetry breaking

• add auxiliary variables: zAC, zBC, …

• dom(z) = {‘=’, ‘≠’}

• instead of backtracking on the y’s, backtrack on the edges with z’s

• preserves at least one optimal solution

[Figure: example DAG with nodes A, B, C, D and arc latencies 2, 1, 1]

Page 26

Spatial and temporal scheduler: Improving the solver

• Preprocess DAG to find instructions which must be on same cluster

• preserve an optimal solution

• Variable ordering

• assign z variables first, in breadth-first order of DAG

• determine assignment for corresponding y variables

• determine cost of temporal schedule for these assignments

Page 27

Outline

• Introduction

• computer architecture

• superblock scheduling

• Constraint programming approach

• temporal scheduler

• spatial and temporal scheduler

• Experiments

• experimental setup

• experimental results

• Lessons learned

Page 28

Experimental setup: Instances

• All 154,651 superblocks from SPEC 2000 integer and floating pt. benchmarks

• standard benchmark suite

• consists of software packages chosen to be representative of types of programming languages and applications

• superblocks generated by IBM’s Tobey compiler when compiling the software packages

• compilations done using Tobey’s highest level of optimization

Page 29

Experimental setup: Target architectures

architecture, issue width, simple int. units, complex int. units, memory units, branch units, floating pt. units

1-issue 1 1

2-issue 2 1 1 1 1

4-issue 4 2 1 1 1 1

6-issue 6 2 2 3 2

Realistic architectures:

• not fully pipelined

• issue width not equal to number of functional units

• serializing instructions

Page 30

Experimental results: Temporal scheduler

               1 sec.             10 sec.            1 min.              10 min.
architecture   time      %        time      %        time       %        time       %
1-issue        1:30:20   97.34    7:15:46   99.38    10:22:36   99.96    15:08:44   99.98
2-issue        3:57:13   91.83    30:53:83  93.90    108:50:01  97.18    665:31:00  97.70
4-issue        2:17:44   95.47    17:09:48  96.60    61:29:31   98.43    343:04:46  98.87
6-issue        3:04:18   93.59    25:03:44  94.76    87:04:34   97.78    511:19:14  98.29

Total time (hh:mm:ss) to schedule all superblocks and percentage solved to optimality, for various time limits for solving each instance

Page 31

Spatial and temporal scheduler: Some related work

• Bottom Up Greedy (BUG) [Ellis. MIT Press ‘86]

• greedy heuristic algorithm

• localized clustering decisions

• Hierarchical Partitioning (RHOP) [Chu et al. PLDI ‘03]

• coarsening and refinement heuristic

• weights of nodes and edges updated as algorithm progresses

Page 32

Experimental results: Spatial and temporal scheduler

[Figure: bar chart "4-cluster-2-issue-2-cyl". x-axis: benchmarks (ammp, applu, apsi, art, bzip2, crafty, eon, equake, facerec, fma3d, galgel, gcc, gzip, lucas, mcf, mesa, mgrid, parser, perlbmk, sixtrack, swim, twolf, vortex, vpr, wupwise, AVERAGE); y-axis: average speedup, 0.4 to 1.6; series: rhop-ls, rhop-opt, cp]

Page 33

Experimental results: Spatial and temporal scheduler

[Figure: bar chart "applu-2-cyl". x-axis: architecture configuration (#clusters-issue width): 1-1, 1-2, 1-4, 1-6, 2-1, 2-2, 2-4, 2-6, 4-1, 4-2, 4-4, 4-6, 8-1, 8-2, 8-4, 8-6; y-axis: average speedup, 0.6 to 3; series: rhop-ls, rhop-opt, cp]

Page 34

Outline

• Introduction

• computer architecture

• superblock scheduling

• Constraint programming approach

• temporal scheduler

• spatial and temporal scheduler

• Experiments

• experimental setup

• experimental results

• Lessons learned

Page 35

Lessons learned (I)

• Pick problem carefully

• is a new solution needed?

• what is the likelihood of success?

• Existing heuristics may not leave any room for improvement

• examples: basic block scheduling, instruction selection

Page 36

Lessons learned (II)

• Be prepared for adversity

• significant overhead

• learning domain of application

• significant implementation

• significant engineering

• different research cultures

• researchers are tribal

• different standards of reviewing (number & contentiousness)

• different standards of evaluation, formalization, assumptions

Page 37

Lessons learned (III)

• Rewards

• can be attractive to students

• can lead to identifying and solving interesting sub-problems whose solutions have general applicability

• bounds consistency for alldifferent and gcc global constraints

• restarts and portfolios

• machine learning of heuristics

Page 38

Optimization problems in compilers

• Instruction selection

• Instruction scheduling

• basic-block scheduling

• super-block scheduling

• loop scheduling: tiling, unrolling, fusion

• Memory hierarchy optimizations

• Register allocation

Page 39

Selected publications

• Applications

A. M. Malik, M. Chase, T. Russell, and P. van Beek. An application of constraint programming to superblock instruction scheduling. CP-2008.

M. Beg and P. van Beek. A constraint programming approach for integrated spatial and temporal scheduling for clustered architectures. ACM TECS, To appear.

• Global constraints

C.-G. Quimper, P. van Beek, A. Lopez-Ortiz, A. Golynski, and S. Bashir Sadjad. An efficient bounds consistency algorithm for the global cardinality constraint. CP-2003.

A. Lopez-Ortiz, C.-G. Quimper, J. Tromp, and P. van Beek. A fast and simple algorithm for bounds consistency of the alldifferent constraint. IJCAI-2003.

• Portfolios and restarts

H. Wu and P. van Beek. On portfolios for backtracking search in the presence of deadlines. ICTAI-2007.

H. Wu and P. van Beek. On universal restart strategies for backtracking search. CP-2007.

• Heuristics and machine learning

T. Russell, A. M. Malik, M. Chase, and P. van Beek. Learning heuristics for the superblock instruction scheduling problem. IEEE TKDE, 2009.

M. Chase, A. M. Malik, T. Russell, R. W. Oldford, and P. van Beek. A computational study of heuristic and exact techniques for superblock instruction scheduling. J. of Scheduling, 2012.

Page 40

Next project: Smart water infrastructure / water analytics

Page 41

Spatial and temporal scheduler: Search tree of basic model

[Figure: example DAG with nodes A, B, C, D and arc latencies 2, 1, 1, next to a search tree that branches on yA, yB, yC, yD in turn, each over the values 0, 1, 2, 3; at each leaf, e.g. y = (0, 0, 0, 2), find temporal schedule]

Page 42

Spatial and temporal scheduler: Search tree of improved model

[Figure: example DAG with nodes A, B, C, D and arc latencies 2, 1, 1, next to a search tree that branches on zAC, zBC, zCD in turn, each over the values ‘=’ and ‘≠’; at each leaf, determine y and find temporal schedule. For example, y = (0,0,0,0) is the same as y = (1,1,1,1) etc., and y = (0,1,1,0) is the same as y = (2,3,3,2), y = (0,2,2,3) etc.]
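The effect of branching on the edge variables can be checked by brute force for the four-node example. The sketch below maps every cluster assignment y = (yA, yB, yC, yD) over 4 clusters to its pattern of z values on the edges AC, BC and CD, and counts the distinct patterns. It assumes those three edges are the only ones carrying z variables, and writes ‘≠’ as "!=" to keep the code ASCII.

```python
from itertools import product

NODES = ["A", "B", "C", "D"]
EDGES = [("A", "C"), ("B", "C"), ("C", "D")]  # edges carrying zAC, zBC, zCD


def edge_pattern(y):
    """Map a cluster assignment (yA, yB, yC, yD) to its z values on the edges."""
    cluster = dict(zip(NODES, y))
    return tuple("=" if cluster[u] == cluster[v] else "!=" for u, v in EDGES)


all_assignments = list(product(range(4), repeat=len(NODES)))
patterns = {edge_pattern(y) for y in all_assignments}

print(len(all_assignments), len(patterns))  # 256 8
print(edge_pattern((0, 1, 1, 0)) == edge_pattern((2, 3, 3, 2)) == edge_pattern((0, 2, 2, 3)))  # True
```

The basic model has 256 leaves for this DAG while the improved model has 8, one per pattern, which is the source of the pruning.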

Page 43

Instruction Selection

[Figure: three panels. DAG: an expression over X, Y and Z built from +f32 and *f32 nodes. TILES: +f32 and *f32 patterns over rf32 operands. OUTPUT: two alternative tilings of the DAG, joined by OR]

Page 44

Instruction Selection

• Given

• an expression DAG G

• a set of tiles representing machine instructions

• Find a mapping of tiles to nodes in G of minimal cost (size) that covers G

• Complexity:

• polynomial for trees

• NP-hard for DAGs
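The polynomial case for trees can be illustrated with a small bottom-up dynamic program. The tile set here is hypothetical (an add, a multiply, and a fused multiply-add that covers a "+" together with one "*" child, each of cost 1), and the tree encoding as nested tuples is an assumption made for the sketch.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def min_tiles(node):
    """Minimum number of tiles needed to cover an expression tree.

    A node is either a string (a value already in a register) or a tuple
    (op, left, right) with op in {"+", "*"}.
    """
    if isinstance(node, str):
        return 0
    op, left, right = node
    # Option 1: a single-operator tile (add or multiply) at this node.
    best = 1 + min_tiles(left) + min_tiles(right)
    # Option 2: a fused multiply-add tile covering this "+" and one "*" child.
    if op == "+":
        for mul, other in ((left, right), (right, left)):
            if not isinstance(mul, str) and mul[0] == "*":
                fused = 1 + min_tiles(mul[1]) + min_tiles(mul[2]) + min_tiles(other)
                best = min(best, fused)
    return best


print(min_tiles(("+", ("*", "X", "Y"), "Z")))               # 1
print(min_tiles(("+", ("*", "X", "Y"), ("*", "Z", "W"))))   # 2
```

This works because subtrees of a tree are independent. In a DAG a shared subexpression can be covered by tiles from several parents, so the per-node optimum no longer composes, which is where the NP-hardness comes from.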

Page 45

Experimental evaluation

[Figure: bar chart. x-axis: benchmarks (8b10b, 802.11a, bmm, vpenta, dbms, beamformer, fmradio, g721, adpcm, epic, basicmath, bitcount, qsort, susan, patricia, dijkstra, blowfish, sha, crc32, fft, gsm, AVERAGE); y-axis: code size (KB), 0 to 90; series: Burg, DP, CP]