Page 1: Compiling  Application-Specific Hardware

Compiling Application-Specific Hardware

Mihai Budiu

Seth Copen Goldstein

Carnegie Mellon University

Page 2: Compiling  Application-Specific Hardware

Resources

Page 3: Compiling  Application-Specific Hardware

Problems

• Complexity

• Power

• Global Signals

• Limited issue window => limited ILP

We propose a scalable architecture

Page 4: Compiling  Application-Specific Hardware

Outline

• Introduction

• ASH: Application Specific Hardware

• Compiling for ASH

• Conclusions

Page 5: Compiling  Application-Specific Hardware

Application-Specific Hardware

[Figure: C program -> Compiler -> Dataflow IR -> Reconfigurable hardware]

Page 6: Compiling  Application-Specific Hardware

Our Solution

General: applicable to today’s software

- programming languages

- applications

Automatic: compiler-driven

Scalable:

- run-time: with clock, hardware

- compile-time: with program size

Parallelism: exploit application parallelism

Page 7: Compiling  Application-Specific Hardware

Asynchronous Computation

[Figure: an adder (+) linked to its neighbors by three signals: data, data valid, and ack.]
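The three signal names suggest a handshake between neighboring operators. The sketch below is one possible reading, not the deck's circuit: it assumes a producer raises valid together with data, and a consumer raises ack once it has taken the value. The `channel` type and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one channel between two operators. */
typedef struct {
    int  data;   /* value on the data wires */
    bool valid;  /* producer: "data is meaningful" */
    bool ack;    /* consumer: "data has been consumed" */
} channel;

/* Producer side: drive a value only when the channel is idle. */
bool channel_send(channel *ch, int value) {
    assert(ch);
    if (ch->valid) return false;  /* previous value not yet consumed */
    ch->data = value;
    ch->valid = true;
    ch->ack = false;
    return true;
}

/* Consumer side: take the value if one is present and acknowledge it. */
bool channel_receive(channel *ch, int *out) {
    assert(ch && out);
    if (!ch->valid) return false;
    *out = ch->data;
    ch->valid = false;            /* the channel is free again */
    ch->ack = true;
    return true;
}

/* An adder fires only when both inputs are valid and its output is free. */
bool adder_fire(channel *a, channel *b, channel *sum) {
    int x, y;
    if (!a->valid || !b->valid || sum->valid) return false;
    channel_receive(a, &x);
    channel_receive(b, &y);
    return channel_send(sum, x + y);
}
```

In this model no global clock or scheduler decides when the adder runs; it fires whenever its local inputs and output allow it.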

Page 8: Compiling  Application-Specific Hardware

New

• Entire C applications

• Dynamically scheduled circuits

• Custom dataflow machines

- application-specific

- direct execution (no interpretation)

- spatial computation

Page 9: Compiling  Application-Specific Hardware

Outline

• Scalability

• Application Specific Hardware

• CASH: Compiling for ASH

• Conclusions

Page 10: Compiling  Application-Specific Hardware

CASH: Compiling for ASH

[Figure: CASH takes a C program and produces circuits, a memory partitioning, and an interconnection net for the reconfigurable hardware (RH).]

Page 11: Compiling  Application-Specific Hardware

Primitives

• Arithmetic/logic (for example, +)

• Multiplexors (inputs: data, predicates)

• Merge

• Eta (gateway) (inputs: data, predicate)

• Memory (ld, st)
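As a rough illustration of how three of these primitives could behave, here is a small software model. It is a sketch under assumptions: the `token` type and the `prim_*` names are hypothetical, and timing and handshaking are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical token: a value that may or may not be present on a wire. */
typedef struct { bool present; int value; } token;

static token some(int v) { token t = { true, v }; return t; }
static token none(void)  { token t = { false, 0 }; return t; }

/* Decoded mux: one predicate per data input; at most one is assumed true. */
token prim_mux(const int *data, const bool *pred, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (pred[i]) return some(data[i]);
    return none();  /* no predicate true: nothing is selected */
}

/* Merge: forwards whichever input carries a token (never both at once). */
token prim_merge(token a, token b) {
    assert(!(a.present && b.present));
    return a.present ? a : b;
}

/* Eta (gateway): passes the data on if the predicate is true,
 * otherwise consumes it and emits nothing. */
token prim_eta(int data, bool pred) {
    return pred ? some(data) : none();
}
```

The mux chooses among values that are all computed, while the eta decides whether a value leaves a region at all; that difference is what lets merge and eta nodes express loops.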

Page 12: Compiling  Application-Specific Hardware

Forward Branches

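if (x > 0) y = -x;
else y = b*x;

[Figure: dataflow circuit for this code. Inputs x, b, and 0 feed a multiplier (*), a negation (-), a comparison (>), and a predicate negation (!); a decoded mux selects y.]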

Conditionals => Speculation
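To make "Conditionals => Speculation" concrete, the sketch below rewrites the slide's branch so that both arms are always computed and a decoded mux picks the result. The function names are hypothetical, and this is a software analogy of the circuit rather than compiler output.

```c
#include <assert.h>
#include <stdbool.h>

/* The branching code from the slide. */
int branch_version(int x, int b) {
    int y;
    if (x > 0) y = -x;
    else       y = b * x;
    return y;
}

/* Speculative form: both arms execute, the predicates only select. */
int speculative_version(int x, int b) {
    int  neg   = -x;      /* computed whether or not it is needed */
    int  prod  = b * x;   /* computed whether or not it is needed */
    bool p     = x > 0;   /* the ">" node */
    bool not_p = !p;      /* the "!" node */

    /* Decoded mux: one predicate per input, exactly one is true here. */
    assert(p != not_p);
    if (p) return neg;
    return prod;
}
```

Both functions return the same value for every input; the second form simply has no control dependence left, so the multiply and the negation can start as soon as x and b arrive.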

Page 13: Compiling  Application-Specific Hardware

Critical Paths

if (x > 0) y = -x;
else y = b*x;

[Figure: the same circuit as on page 12, with its critical path marked.]

Page 14: Compiling  Application-Specific Hardware

Lenient Operations

if (x > 0) y = -x;
else y = b*x;

[Figure: the same circuit as on page 12.]

Solve the problem of unbalanced paths
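One way to see why lenient operations help with unbalanced paths is to compare when the mux output becomes available. The sketch below is a toy timing model with assumed semantics: a strict mux waits for all of its inputs, while a lenient mux waits only for the predicate and the selected input. The struct, the function names, and any arrival times passed in are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical arrival times (in cycles) at the mux inputs. */
typedef struct {
    int t_pred;  /* when the predicate (x > 0) is known */
    int t_neg;   /* when -x arrives (short path) */
    int t_mul;   /* when b*x arrives (long path) */
} arrival_times;

static int max2(int a, int b) { return a > b ? a : b; }

/* Strict mux: output is ready only after every input has arrived. */
int strict_mux_ready(arrival_times a) {
    assert(a.t_pred >= 0 && a.t_neg >= 0 && a.t_mul >= 0);
    return max2(a.t_pred, max2(a.t_neg, a.t_mul));
}

/* Lenient mux: output is ready once the predicate and the
 * selected input have arrived; the other input is not awaited. */
int lenient_mux_ready(arrival_times a, bool x_positive) {
    int t_selected = x_positive ? a.t_neg : a.t_mul;
    assert(a.t_pred >= 0 && t_selected >= 0);
    return max2(a.t_pred, t_selected);
}
```

With a fast negation and a slow multiplier, the strict mux always pays for the multiplier, while the lenient mux pays for it only when x <= 0.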

Page 15: Compiling  Application-Specific Hardware

Loops

int sum=0, i;
for (i=0; i < 100; i++)
    sum += i*i;
return sum;

[Figure: dataflow circuit for this loop, with nodes for i, + 1, < 100, !, *, +, sum, and ret; both i and sum start at 0.]

Control flow => data flow
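The loop above can be rewritten so that every step corresponds to a node of a dataflow graph: the loop predicate is computed explicitly, and it steers i and sum either around the back edges or out to the return. This is a sketch of the idea only; the function name and the parameter n (100 on the slide) are introduced here for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Sum of i*i for i in [0, n), written in "dataflow order". */
int sum_squares_dataflow(int n) {
    assert(n >= 0);
    int i = 0, sum = 0;      /* initial values enter the loop (merge nodes) */
    for (;;) {
        bool p = i < n;      /* loop predicate, the "<" node */
        if (!p)              /* !p opens the gateway to "ret" */
            return sum;
        int sq = i * i;      /* the "*" node */
        sum = sum + sq;      /* p steers the new sum around its back edge */
        i = i + 1;           /* p steers the new i around its back edge */
    }
}
```

Calling `sum_squares_dataflow(100)` gives 328350, the same value the original loop returns. Note that i's update does not depend on sum, which is what later allows the two loops to run at different speeds.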

Page 16: Compiling  Application-Specific Hardware

Compilation

• Translate C to dataflow machines

• Optimizations: software-, hardware-, dataflow-specific

• Expose parallelism

– predication

– speculation

– localized synchronization

– pipelining

Page 17: Compiling  Application-Specific Hardware

Pipelining

[Figure: dataflow circuit for the loop, with nodes i, +, 1, <=, 100, *, +, and sum; the multiplier (*) is pipelined.]

Page 18: Compiling  Application-Specific Hardware

Pipelining

[Figure: the same loop circuit as on page 17.]

Page 19: Compiling  Application-Specific Hardware

Pipelining

[Figure: the same loop circuit as on page 17.]

Page 20: Compiling  Application-Specific Hardware

Pipelining

[Figure: the same loop circuit as on page 17.]

Page 21: Compiling  Application-Specific Hardware

Pipelining

[Figure: the same loop circuit, with three regions labeled: i's loop, sum's loop, and the long-latency pipe.]

Page 22: Compiling  Application-Specific Hardware

Pipelining

[Figure: the same loop circuit as on page 17.]

Page 23: Compiling  Application-Specific Hardware

Pipelining

[Figure: the same loop circuit, with i's loop, sum's loop, the long-latency pipe, and the predicate edge labeled.]

Page 24: Compiling  Application-Specific Hardware

Pipelining

Predicate ack edge is on the critical path.

[Figure: the same loop circuit, with i's loop, sum's loop, and the critical path labeled.]

Page 25: Compiling  Application-Specific Hardware

Pipelining

[Figure: the same loop circuit, with i's loop, sum's loop, and a decoupling FIFO labeled.]

Page 26: Compiling  Application-Specific Hardware

Pipelining

[Figure: the same loop circuit, with i's loop, sum's loop, the critical path, and the decoupling FIFO labeled.]
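The effect of the decoupling FIFO can be illustrated with a toy timing model. Everything here is an assumption made for the sketch, not data from the deck: i's loop can emit one predicate per cycle, sum's loop can consume one per cycle, a value needs `latency` cycles to cross the long-latency pipe, and at most `depth` predicates can be in flight between the two loops (depth 1 stands for the plain ack edge without a FIFO).

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical model:
 *   p[k] = cycle at which i's loop emits the predicate of iteration k
 *   c[k] = cycle at which sum's loop consumes it
 * Returns the cycle at which the last iteration is consumed.
 */
long pipeline_finish_time(int n, int latency, int depth) {
    assert(n > 0 && latency >= 0 && depth >= 1);
    long *p = malloc((size_t)n * sizeof *p);
    long *c = malloc((size_t)n * sizeof *c);
    assert(p != NULL && c != NULL);

    for (int k = 0; k < n; k++) {
        /* The producer emits at most one predicate per cycle... */
        long t = (k == 0) ? 0 : p[k - 1] + 1;
        /* ...and must wait for a free slot: item k-depth must be acked. */
        if (k >= depth && c[k - depth] > t)
            t = c[k - depth];
        p[k] = t;

        /* The consumer waits for the long-latency result... */
        long u = p[k] + latency;
        /* ...and consumes at most one item per cycle. */
        if (k > 0 && c[k - 1] + 1 > u)
            u = c[k - 1] + 1;
        c[k] = u;
    }

    long done = c[n - 1];
    free(p);
    free(c);
    return done;
}
```

With the made-up values n = 100 and latency = 5, depth 1 finishes at cycle 500 because every iteration waits for the previous ack, while depth 8 finishes at cycle 104 because i's loop can run ahead and keep the pipe full.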

Page 27: Compiling  Application-Specific Hardware

ASH Features

• What you code is what you get

– no hidden control logic

– lean hardware (no CAM, multi-ported files, etc.)

– no global signals

• Compiler has complete control

• Dynamic scheduling => latency tolerant

• Natural ILP and loop pipelining

Page 28: Compiling  Application-Specific Hardware

Conclusions

• ASH: compiler-synthesized hardware from HLL

• Exposes program parallelism

• Dataflow techniques applied to hardware

• ASH promises to scale with:

– circuit speed

– transistors

– program size

Page 29: Compiling  Application-Specific Hardware

Backup slides

• Hyperblocks

• Predication

• Speculation

• Memory access

• Procedure calls

• Recursive calls

• Resources

• Performance

Page 30: Compiling  Application-Specific Hardware

Hyperblocks

[Figure: a procedure divided into hyperblocks.]

Page 31: Compiling  Application-Specific Hardware

Predication

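[Figure: a hyperblock in which the branch on p and !p, and the join block q, become the predicated operations "if (p) ...", "if (!p) ...", followed by q.]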
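A small software analogy of predication (if-conversion) is sketched below. The arithmetic in each arm is invented for illustration, and the function names are hypothetical; only the shape, "if (p) ...", "if (!p) ...", then q, follows the slide.

```c
#include <assert.h>
#include <stdbool.h>

/* Original control flow: two arms that join at q. */
int with_branches(bool p, int a) {
    int r;
    if (p) r = a + 1;
    else   r = a - 1;
    return r * 2;            /* q: the join block */
}

/* Predicated form: one straight-line block, each operation guarded. */
int predicated(bool p, int a) {
    bool not_p = !p;
    int  r = 0;
    assert(p != not_p);      /* exactly one guard holds */
    if (p)     r = a + 1;    /* guarded by p */
    if (not_p) r = a - 1;    /* guarded by !p */
    return r * 2;            /* q runs unconditionally */
}
```

The two functions agree on every input; the predicated one has a single entry and no internal branches, which is the property a hyperblock needs.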

Page 32: Compiling  Application-Specific Hardware

Speculation

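[Figure: the block "if (!p) ..." followed by q, shown before and after speculation; the label "ops w/ side-effects" marks the operations that are treated separately.]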
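The label "ops w/ side-effects" hints at the usual rule for speculation: operations without side effects can lose their predicate and run eagerly, while operations with side effects keep it. The sketch below assumes that rule; the computation and the function name are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Speculate the arithmetic, keep the store predicated. */
void speculate_then_store(bool p, int a, int *mem) {
    assert(mem);
    bool not_p = !p;
    int  t = a * 3 + 1;   /* side-effect free: executed speculatively */
    if (not_p)
        *mem = t;         /* store: has a side effect, stays guarded by !p */
}
```

When p is true the multiply and add still run, but their result is simply dropped and memory is untouched.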

Page 33: Compiling  Application-Specific Hardware

Memory Access

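[Figure: a load node with address, predicate, and token inputs and token and data outputs; a store node with address, pred, token, and data inputs and a token output. Both connect through a load-store queue and the interconnection network to memory.]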

Page 34: Compiling  Application-Specific Hardware

Procedure calls

[Figure: a "call P" node in the caller sends args over the interconnection network to Procedure P, which extracts the args; P's ret node sends the result back to the caller.]

Page 35: Compiling  Application-Specific Hardware

Recursion

[Figure: a hyperblock containing a recursive call; live values are saved to a stack before the call and restored after it.]

Page 36: Compiling  Application-Specific Hardware

Resources

• Estimated SpecINT95 and Mediabench

• Average < 100 bit-operations/line of code

• Routing resources harder to estimate

• Detailed data in paper


Page 37: Compiling  Application-Specific Hardware

Performance

• Preliminary comparison with 4-wide OOO

• Assumed same FU latencies

• Speed-up on kernels from Mediabench

[Figure: bar chart of speed-up (y-axis from 0 to 3.5) for the kernels adpcm_e, adpcm_d, gsm_e, gsm_d, epic_e, epic_d, mpeg2_d, jpeg_e, pegwit_e, pegwit_d, g721_e, and g721_d.]