spatial computation thesis committee: seth goldstein peter lee todd mowry babak falsafi nevin...

77
Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai Budiu CMU CS

Upload: ella-norris

Post on 26-Mar-2015

213 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

Spatial Computation

Thesis committee:Seth Goldstein

Peter Lee

Todd Mowry

Babak Falsafi

Nevin Heintze

Ph.D. Thesis defense, December 8, 2003

SCS

Mihai BudiuCMU CS

Page 2: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

2

Spatial Computation

Thesis committee:Seth Goldstein

Peter Lee

Todd Mowry

Babak Falsafi

Nevin Heintze

Ph.D. Thesis defense, December 8, 2003

SCSA model of general-purpose computationbased on Application-Specific Hardware.

Page 3: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

3

Thesis StatementApplication-Specific Hardware (ASH):

• can be synthesized by adapting software compilation for predicated architectures,

• provides high-performance for programs withhigh ILP, with very low power consumption,

• is a more scalable and efficient computation substrate than monolithic processors.

Page 4: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

4

Outline• Introduction

• Compiling for ASH

• Media processing on ASH

• ASH vs. superscalar processors

• Conclusions

Page 5: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

5

CPU Problems

• Complexity

• Power

• Global Signals

• Limited ILP

Page 6: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

6

Design Complexity

from Michael Flynn’s FCRC 2003 talk

58%/Year

21%/Year

1981

1983

1985

1987

1989

1991

1993

1995

1997

1999

2003

2001

2005

2007

2009

xxx

x xx

x

Logic transistors/chip

Transistors/staff*month

Source: S. Malik, orig Sematech

Prod

uctiv

ity

10

1,000,000

10,000,000

100,000,000

1000

100

10,000

100,000

10

1000

100

10,000

100,000

1,000,000

10,000,000

Chi

p si

ze (K

tran

sist

ors)

Design Time:CAD productivity favors FPL

2.5

.10

.35

Page 7: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

7

Communication vs. Computation

5ps 20ps

gate wire

Power consumption on wires is also dominant

Page 8: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

8

Our Approach: ASH

Application-Specific Hardware

Page 9: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

9

1.

2.

1.

2.Programs

Programs

Resource Binding Time

CPU ASH

Page 10: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

10

Hardware Interface

CPU ASH

ISA

software

hardware

software

hardwaregates

virtual ISA

Page 11: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

11

Application-Specific HardwareC program

Compiler

Dataflow IR

Reconfigurable/custom hw

Page 12: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

12

Contributions

Compilation

Computerarchitecture

Reconfigurablecomputing

Embeddedsystems

Asynchronouscircuits

High-levelsynthesis

Dataflowmachines

Nanotechnology

theory

syste

ms

Page 13: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

13

Outline• Introduction

• CASH: Compiling for ASH

• Media processing on ASH

• ASH vs. superscalar processors

• Conclusions

Page 14: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

14

Computation = Dataflow

• Operations ) functional units• Variables ) wires• No interpretation

x = a & 7;...

y = x >> 2;

Programs

&

a 7

>>

2

x

Circuits

Page 15: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

15

Basic Operation

+data

valid

ack

latch

Page 16: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

16

+

Asynchronous Computation

data

valid

ack

1

+

2

+

3

+

4

+

8

+

7

+

6

+

5

latch

Page 17: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

17

Distributed Control Logic

+ -

ackrdy

FSM

asynchronous control

short, local wires

Page 18: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

18

Forward Branches

if (x > 0) y = -x;

elsey = b*x;

*

xb 0

y

!

- >

Conditionals ) Speculation critical path

Page 19: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

19

Control Flow ) Data Flow

datapredicate

Merge (label)

Gateway

data

data

Split (branch)p

!

Page 20: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

20

i

+1< 100

0

*

+

sum

0

Loops

int sum=0, i;

for (i=0; i < 100; i++)

sum += i*i;

return sum;return sum; !

ret

Page 21: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

21

no speculation

sequencingof side-effects

Predication and Side-Effects

Load

addr

data

pred

token

token

tomemory

Page 22: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

22

Thesis StatementApplication-Specific Hardware:

• can be synthesized by adapting software compilation for predicated architectures,

• provides high-performance for programs withhigh ILP, with very low power consumption,

• is a more scalable and efficient computation substrate than monolithic processors.

Page 23: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

23

Outline• Introduction• CASH: Compiling for ASH

– An optimization on the SIDE

• Media processing on ASH• ASH vs. superscalar processors• Conclusions

skip to

Page 24: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

24

Availability Dataflow Analysis

y

y = a*b;

...

if (x) {

...

... = a*b;

}

Page 25: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

25

Dataflow Analysis Is Conservative

if (x) {

...

y = a*b;

}

...

... = a*b;y?

Page 26: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

26

Static Instantiation, Dynamic Evaluation

flag = false;

if (x) {

...

y = a*b;

flag = true;

}

...

... = flag ? y : a*b;

Page 27: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

27

SIDE Register Promotion Impact

0

5

10

15

20

25

30

ad

pcm

_e

ad

pcm

_d

gsm

_e

gsm

_d

ep

ic_

e

ep

ic_

d

mp

eg

2_

e

mp

eg

2_

d

jpe

g_

e

jpe

g_

d

pe

gw

it_e

pe

gw

it_d

g7

21

_e

g7

21

_d

pg

p_

e

pg

p_

d

rast

a

me

sa

09

9.g

o

12

4.m

88

ksim

12

9.c

om

pre

ss

13

0.li

13

2.ij

pe

g

13

4.p

erl

14

7.v

ort

ex

18

3.e

qu

ake

18

8.a

mm

p

16

4.g

zip

17

5.v

pr

17

6.g

cc

18

1.m

cf

19

7.p

ars

er

25

4.g

ap

30

0.tw

olf

%st promo

%st PRE

53

0

5

10

15

20

25

30

35

40

45

adp

cm_e

adp

cm_d

gsm

_e

gsm

_d

epic

_e

epic

_d

mpe

g2_e

mpe

g2_d

jpeg

_e

jpeg

_d

peg

wit_

e

peg

wit_

d

g72

1_e

g72

1_d

pgp

_e

pgp

_d

rast

a

mes

a

099

.go

124

.m88

ksim

129

.co

mp

ress

130

.li

132

.ijpe

g

134

.pe

rl

147

.vo

rtex

183

.eq

uake

188

.am

mp

164

.gzi

p

175

.vp

r

176

.gcc

181

.mcf

197

.pa

rser

254

.ga

p

300

.twol

f

% ld promo

% ld PRE

Loads

Stores

% r

educ

tion

Page 28: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

28

Outline• Introduction• CASH: Compiling for ASH• Media processing on ASH

• ASH vs. superscalar processors• Conclusions

Page 29: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

29

Performance Evaluation

ASH

LSQ

limited BW

L18K

L21/4M

Mem

CPU: 4-way OOO

Assumption: all operations have the same latency.

Page 30: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

30

Media Kernels, vs 4-way OOO

0

0.5

1

1.5

2

2.5

3ad

pcm

_d

adpc

m_e

epic

_d

epic

_e

g721

_d

g721

_e

gsm

_d

gsm

_e

jpeg

_d

jpeg

_e

mes

a

mpe

g2_d

mpe

g2_e

pegw

it_d

pegw

it_e

rast

a

Tim

es f

aste

r

125.85.8

Page 31: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

31

Media Kernels, IPC

0

5

10

15

20

25

adpc

m_d

adpc

m_e

epic

_d

epic

_e

g721

_d

g721

_e

gsm

_d

gsm

_e

jpeg

_d

jpeg

_e

mes

a

mpe

g2_d

mpe

g2_e

pegw

it_d

pegw

it_e

rast

a

Base IPC

ASH IPC

4

Page 32: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

32

Speed-up IPC Correlation

0

1

2

3

4

5

6

7

8

9

10ad

pcm

_d

adpc

m_e

epic

_d

epic

_e

g721

_d

g721

_e

gsm

_d

gsm

_e

jpeg

_d

jpeg

_e

mes

a

mpe

g2_d

mpe

g2_e

pegw

it_d

pegw

it_e

rast

a

Tim

es b

igg

er

Speed-up

IPC Ratio

12

Page 33: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

33

Low-Level EvaluationC

CASHcore

Verilog back-end

Synopsys,Cadence P/R

Results shown so far.All results in thesis.

Results in the next two slides.

ASIC

180nm std. cell library, 2V

~1999technology

Page 34: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

34

Area

0

2

4

6

8

10

12

adpc

m_d

g721

_d

g721

_e

gsm

_d

gsm

_e

jpeg_

d

mpe

g2_d

mpe

g2_e

pegw

it_d

pegw

it_e

Sq

uar

e m

m

Reference: P4 in 180nm has 217mm2

Page 35: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

35

Power

vs 4-way OOO superscalar, 600 Mhz, with clock gating (Wattch), ~ 6W

0

50

100

150

200

250

300

350

Tim

es s

mal

ler

than

OO

O

power ratio 70 41 41 129 147 94 121 136 303 303

adpcm_d g721_d g721_e gsm_d gsm_e jpeg_d mpeg2_d mpeg2_e pegwit_d pegwit_e

Page 36: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

36

Thesis StatementApplication-Specific Hardware:

• can be synthesized by adapting software compilation for predicated architectures,

• provides high-performance for programs withhigh ILP, with very low power consumption,

• is a more scalable and efficient computation substrate than monolithic processors.

Page 37: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

37

Outline• Introduction• CASH: Compiling for ASH• Media processing on ASH

– dataflow pipelining

• ASH vs. superscalar processors• Conclusions

skip to

Page 38: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

38

Pipeliningi

+

<=

100

1

*

+

sum

pipelinedmultiplier(8 stages)

int sum=0, i;

for (i=0; i < 100; i++)

sum += i*i;

return sum;

cycle=1

Page 39: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

39

Pipeliningi

+

<=

100

1

*

+

sum

cycle=2

Page 40: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

40

Pipeliningi

+

<=

100

1

*

+

sum

cycle=3

Page 41: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

41

Pipeliningi

+

<=

100

1

*

+

sum

cycle=4

Page 42: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

42

Pipeliningi

+

<=

100

1

i=1

i=0

+

sum

cycle=5

pipeline balancing

Page 43: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

43

Outline• Introduction

• CASH: Compiling for ASH

• Media processing on ASH

• ASH vs. superscalar processors

• Conclusions

Page 44: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

44

This Is Obvious!

ASH runs at full dataflow speed, so CPU cannot do any better(if compilers equally good).

Page 45: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

45

SpecInt95, ASH vs 4-way OOO

-50

-40

-30

-20

-10

0

10

20

300

99

.go

12

4.m

88

ksim

12

9.c

om

pre

ss

13

0.li

13

2.ij

pe

g

13

4.p

erl

14

7.v

ort

ex

Pe

rce

nt

slo

we

r /

fas

ter

Page 46: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

46

Predicted not takenEffectively a noop for CPU!

Predicted taken.

Branch Prediction

for (i=0; i < N; i++) {

...

if (exception) break;

}

i

+

<

1

&

!

exception

result available before inputs

ASH crit path

CPU crit path

Page 47: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

47

SpecInt95, perfect prediction

-60

-40

-20

0

20

40

60

099.

go

124.

m88

ksim

129.

com

pres

s

130.

li

132.

ijpeg

134.

perl

147.

vort

ex

Per

ce

nt

slo

we

r/fa

ster

baseline

prediction

no data

Page 48: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

48

ASH Problems

• Both branch and join not free• Static dataflow (no re-issue of same instr)• Memory is “far”• Fully static

– No branch prediction– No dynamic unrolling– No register renaming

• Calls/returns not lenient• ...

Page 49: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

49

Thesis StatementApplication-Specific Hardware:

• can be synthesized by adapting software compilation for predicated architectures,

• provides high-performance for programs withhigh ILP, with very low power consumption,

• is a more scalable and efficient computation substrate than monolithic processors.

Page 50: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

50

Outline

Introduction

+ CASH: Compiling for ASH

+ Media processing on ASH

+ ASH vs. superscalar processors

= Conclusions

Page 51: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

51

• low power

• simple verification?

• specialized to app.

• unlimited ILP

• simple hardware

• no fixed window

• economies of scale

• highly optimized

• branch prediction

• control speculation

• full-dataflow

• global signals/decision

Strengths

Page 52: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

52

Conclusions

• Compiling “around the ISA” is a fruitful research approach.

• Distributed computation structures require more synchronization overhead.

• Spatial Computation efficiently implements high-ILP computation with very low power.

Page 53: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

53

Backup Slides

• Control logic • Pipeline balancing• Lenient execution• Dynamic Critical Path• Memory PRE• Critical path analysis• CPU + ASH

Page 54: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

54

Control Logic

C

C

Reg

rdyin

ackin

rdyoutackout

datain dataout

back back to talk

Page 55: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

55

Last-Arrival Events

+

data

valid

ack

• Event enabling the generation of a result• May be an ack• Critical path=collection of last-arrival edges

Page 56: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

56

Dynamic Critical Path

3. Some edges may repeat 2. Trace back along

last-arrival edges

1. Start from last node

back back to analysis

Page 57: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

57

Critical Paths

if (x > 0) y = -x;

elsey = b*x;

*

xb 0

y

!

- >

Page 58: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

58

Lenient Operations

if (x > 0) y = -x;

elsey = b*x;

*

xb 0

y

!

- >

Solve the problem of unbalanced pathsback back to talk

Page 59: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

59

Pipeliningi

+

<=

100

1

*i=1

i=0

+

sum

cycle=6

Page 60: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

60

Pipeliningi

+

<=

100

1

*

+

sum

i’s loop

sum’s loop

Longlatency pipe

predicate

cycle=7

Page 61: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

61

Predicate ackedge is on thecritical path.

Pipeliningi

+

<=

100

1

*

+

sum

critical pathi’s loop

sum’s loop

Page 62: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

62

Pipelinine balancing i

+

<=

100

1

*

+

sum

i’s loop

sum’s loop

decouplingFIFO

cycle=7

Page 63: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

63

Pipelinine balancing i

+

<=

100

1

*

+

sum

i’s loop

sum’s loop

critical path

decouplingFIFO

back back to presentation

Page 64: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

64

Register Promotion

…=*p(p2)

*p=…(p1)

…=*p

*p=…(p1)

(p2 Æ : p1)

Load is executed only if store is not

Page 65: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

65

Register Promotion (2)

…=*p(p2)

*p=…(p1)

…=*p(false)

*p=…(p1)

• When p2 ) p1 the load becomes dead...• ...i.e., when store dominates load in CFG

back

Page 66: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

66

¼ PRE

...=*p(p1) ...=*p(p2) ...=*p(p1 Ç p2)

This corresponds in the CFG to lifting the load to a basic block dominating the original loads

Page 67: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

67

Store-store (1)

*p=...(p2)

*p=…(p1)

*p=...(p2)

*p=…(p1 Æ : p2)

• When p1 ) p2 the first store becomes dead...• ...i.e., when second store post-dominates first in CFG

Page 68: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

68

Store-store (2)

*p=...(p2)

*p=…(p1)

*p=...(p2)

*p=…(p1 Æ : p2)

• Token edge eliminated, but...• ...transitive closure of tokens preserved

back

Page 69: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

69

A Code Fragment

for(i = 0; i < 64; i++) {

for (j = 0; X[j].r != 0xF; j++)

if (X[j].r == i)

break;

Y[i] = X[j].q;

}

SpecINT95:124.m88ksim:init_processor, stylized

Page 70: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

70

Dynamic Critical Path

for (j = 0; X[j].r != 0xF; j++)

if (X[j].r == i)

break;

load predicate

loop predicate

sizeof(X[j])

definition

Page 71: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

71

MIPS gcc CodeLOOP:

L1: beq $v0,$a1,EXIT ; X[j].r == i

L2: addiu $v1,$v1,20 ; &X[j+1].r

L3: lw $v0,0($v1) ; X[j+1].r

L4: addiu $a0,$a0,1 ; j++

L5: bne $v0,$a3,LOOP ; X[j+1].r == 0xF

EXIT:

L1! L2 ! L3 ! L5 ! L14-instructions loop-carried dependence

for (j = 0; X[j].r != 0xF; j++)

if (X[j].r == i)

break;

Page 72: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

72

If Branch Prediction Correct

L1! L2 ! L3 ! L5 ! L1Superscalar is issue-limited!2 cycles/iteration sustained

for (j = 0; X[j].r != 0xF; j++)

if (X[j].r == i)

break;

LOOP:

L1: beq $v0,$a1,EXIT ; X[j].r == i

L2: addiu $v1,$v1,20 ; &X[j+1].r

L3: lw $v0,0($v1) ; X[j+1].r

L4: addiu $a0,$a0,1 ; j++

L5: bne $v0,$a3,LOOP ; X[j+1].r == 0xF

EXIT:

Page 73: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

73

Critical Path with Prediction

Loads are notspeculative

for (j = 0; X[j].r != 0xF; j++)

if (X[j].r == i)

break;

Page 74: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

74

Prediction + Load Speculation

~4 cycles!Load not pipelined(self-anti-dependence)

ack edge

for (j = 0; X[j].r != 0xF; j++)

if (X[j].r == i)

break;

Page 75: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

75

OOO Pipe Snapshot

IF DA EX WB CT

L5L1L2

L1L2L3L4

L1L3

L5L3L2

L1L3L3

registerrenaming

LOOP:

L1: beq $v0,$a1,EXIT ; X[j].r == i

L2: addiu $v1,$v1,20 ; &X[j+1].r

L3: lw $v0,0($v1) ; X[j+1].r

L4: addiu $a0,$a0,1 ; j++

L5: bne $v0,$a3,LOOP ; X[j+1].r == 0xF

EXIT:

Page 76: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

76

Unrolling?

for(i = 0; i < 64; i++) {

for (j = 0; X[j].r != 0xF; j+=2) {

if (X[j].r == i)

break;

if (X[j+1].r == 0xF)

break;

if (X[j+1].r == i)

break;

}

Y[i] = X[j].q;

}

when 1 iteration

back back to talk

Page 77: Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai

77

Ideal Architecture

High-ILPcomputation

Low ILP computation+ OS+ VM

CPU ASH

Memory

back