Simulation of Streaming Applications on Multicore Systems. Saurabh Gayen, Mark Franklin (PI), Eric J. Tyson, Roger D. Chamberlain. Storage-Based Supercomputing Group, Dept. of Computer Science and Engineering, Washington University in St. Louis. Supported by Nat’l Science Foundation grant CCF-0427794.


Simulation of Streaming Applications on Multicore Systems

Saurabh Gayen, Mark Franklin (PI), Eric J. Tyson, Roger D. Chamberlain

Storage-Based Supercomputing Group
Dept. of Computer Science and Engineering
Washington University in St. Louis

Supported by Nat’l Science Foundation grant CCF-0427794

Saurabh Gayen 6/3/2008 2

Problem domain

[Figure: FPGA and network processor devices]

High-performance streaming applications
» Large streams of high-throughput data
  – Networking and communications
  – Scientific computing (offline AND online)
  – Media creation and playback
  – Data mining (e.g., bioinformatics, security)

Hard to develop applications on multicore systems
» Complex programming model (e.g., synchronization)

Other platforms can provide speedups (FPGA, DSP, NP)

Devices are becoming more interconnected
» Hard to simulate
» Hard to debug
» Hard to deploy


Overview

1. Auto-Pipe and the X Language

2. X-Sim: Federated System Simulator

3. Example applications

4. Status and future work


What is Auto-Pipe?

Auto-Pipe is… a set of tools used to create, test, build, deploy, and optimize distributed applications

Auto-Pipe is made for…
• Partitioned, parallel algorithms
• Complex heterogeneous systems
• Time and/or resource-constrained applications

[Figure: CPUs, an FPGA, and an NP connected over PCI]


The X Language

X language files are composed of:

• An algorithm description
  • Made of blocks and edges
• A processing architecture
  • Made of computation and interconnect resources
• A mapping of algorithm to architecture

[Figure: blocks A, B, C, D, and E mapped onto three CPUs and an FPGA]
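The three parts of an X language file can be pictured with a small Python data model. This is only a sketch and is not X language syntax: the class names, field names, the edges between blocks A to E, and the placement of blocks on resources are all hypothetical, chosen to echo the labels on the slide.

```python
from dataclasses import dataclass


@dataclass
class Algorithm:
    blocks: list   # block names
    edges: list    # (source block, destination block) pairs


@dataclass
class Architecture:
    resources: list      # computation resources
    interconnects: list  # (resource, resource) links


@dataclass
class Mapping:
    placement: dict  # block name -> resource name

    def validate(self, algorithm, architecture):
        """Return a list of problems; an empty list means the mapping is usable."""
        problems = []
        for block in algorithm.blocks:
            if block not in self.placement:
                problems.append(f"block {block} is not mapped")
            elif self.placement[block] not in architecture.resources:
                problems.append(f"block {block} is mapped to an unknown resource")
        linked = {frozenset(link) for link in architecture.interconnects}
        for src, dst in algorithm.edges:
            a, b = self.placement.get(src), self.placement.get(dst)
            # Blocks on different resources need an interconnect between them.
            if a and b and a != b and frozenset((a, b)) not in linked:
                problems.append(f"edge {src}->{dst} has no interconnect {a}-{b}")
        return problems


# Hypothetical example: a five-block chain on three CPUs and an FPGA.
algorithm = Algorithm(blocks=["A", "B", "C", "D", "E"],
                      edges=[("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")])
architecture = Architecture(
    resources=["cpu1", "cpu2", "cpu3", "fpga"],
    interconnects=[("cpu1", "fpga"), ("fpga", "cpu2"), ("cpu2", "cpu3")])
mapping = Mapping(placement={"A": "cpu1", "B": "fpga", "C": "cpu2",
                             "D": "cpu2", "E": "cpu3"})
print(mapping.validate(algorithm, architecture))
```

Keeping the algorithm, the architecture, and the mapping as separate objects mirrors the slide's point: the same algorithm can be tried against a different mapping without touching the other two parts.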


X-Sim: Federated Simulation

[Figure: the blocks gen1, gen2, sum, half, and out are assigned to Platform-Specific Simulators (proc[1], proc[2], FPGA), which are connected through Communication Link Models (PCI, Sh. Mem.)]

Saurabh Gayen 6/3/2008 8

X-Sim Mechanism

[Figure: X-Sim mechanism for the blocks gen1, gen2, sum, half, and store on proc[1], proc[2], and the FPGA, connected by PCI and Sh. Mem. Legend: D = Data file, T = Timestamp file. Port labels: in, out, avail, testpoint. Sample timestamps: 0us, 1us.]
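The figure pairs a data file (D) with a timestamp file (T) on block ports. One way to read that pairing is as a timestamped trace that each platform-specific simulator replays and extends. The sketch below illustrates that general trace-driven idea for a single two-input block. It is not X-Sim's actual file format or algorithm: the function name, the service time, and the input traces are all hypothetical.

```python
def simulate_block(input_traces, service_time_us):
    """Replay timestamped inputs through one block.

    input_traces: one list per input port of (data, arrival_time_us) pairs,
        standing in for a data file and its timestamp file.
    Returns the output trace as (data, departure_time_us) pairs.
    """
    output_trace = []
    block_free_at = 0.0
    # The block fires once per set of inputs, one item from each port.
    for items in zip(*input_traces):
        data = [d for d, _ in items]
        ready_at = max(t for _, t in items)      # wait for the slowest input
        start = max(ready_at, block_free_at)     # wait for the block itself
        done = start + service_time_us
        output_trace.append((sum(data), done))   # behaves like a "sum" block
        block_free_at = done
    return output_trace


# Hypothetical traces: gen1 emits at 0, 1, 2 us and gen2 at 0.5, 1.5, 2.5 us.
gen1 = [(1, 0.0), (2, 1.0), (3, 2.0)]
gen2 = [(10, 0.5), (20, 1.5), (30, 2.5)]
sum_out = simulate_block([gen1, gen2], service_time_us=2.0)
print(sum_out)
```

Because the output is again a list of data and timestamp pairs, it could serve as the input trace of whichever simulator hosts the next block, which is the property a federated arrangement relies on.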


Example Application: test1

[Figure: the blocks gen1, gen2, sum, half, and store mapped onto the processors proc[1] to proc[4], which communicate through shared_mem; the diagram is shown for two mappings]

Cumulative time per block (s):
block   gen1   gen2   sum    half   store
time    48.9   48.9   18.2   12.9   138.2

App times (s) by mapping:
mapping   sim     deployed
1-core    267.5   267.5
2-core    138.6   143.3
speedup   1.93x   1.87x
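The per-block cumulative times give a quick sanity check on the app times. The sketch below sums block times per processor and takes the busiest processor as a lower bound on run time. The 2-core split used here (store alone on its own core) is an assumption made for illustration, since the slide's actual mapping is not recoverable from the transcript, and the estimate ignores communication and synchronization costs.

```python
# Cumulative time per block in seconds, from the per-block chart on this slide.
block_time = {"gen1": 48.9, "gen2": 48.9, "sum": 18.2, "half": 12.9, "store": 138.2}


def bottleneck_estimate(mapping):
    """Lower bound on run time: the load of the busiest processor."""
    load = {}
    for block, proc in mapping.items():
        load[proc] = load.get(proc, 0.0) + block_time[block]
    return max(load.values())


one_core = {block: "proc[1]" for block in block_time}
# Assumed split, for illustration only: store on its own core.
two_core = dict(one_core, store="proc[2]")

t1 = bottleneck_estimate(one_core)
t2 = bottleneck_estimate(two_core)
print(f"1-core estimate: {t1:.1f} s, 2-core estimate: {t2:.1f} s, "
      f"speedup {t1 / t2:.2f}x")
```

Under that assumed split the estimates land close to the reported 267.5 s and 138.6 s, which suggests store is the bottleneck block in the 2-core case.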


Example Application: VERITAS

Astrophysics Gamma-ray event parameterization
» Active sources: galactic nuclei, pulsars
» Transient sources: hypernovae, ...

Lots of data: 20TB/year
» Want to process as fast as possible
» Process whole DB for rare events


VERITAS algorithm

[Figure: Front feeds Pipe[1] ... Pipe[6], which feed Back. Each Pipe[i] contains FFT, LowPass, and IFFT blocks. Data labels: Raw Data for 1 Pixel, Charge for 1 Pixel.]

Processing Time:
Front   2.5%
Pipes   94.7%
Back    0.9%
other   1.9%

Pipe[i]:
FFT       47.4%
LowPass   13.7%
IFFT      38.8%
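The FFT, LowPass, IFFT chain inside each pipe can be sketched in a few lines of Python. The slides give no details of the actual VERITAS processing, so several things here are assumptions: the filter simply zeroes frequency bins above a cutoff, the cutoff and the test signal are made up, and a naive O(n^2) DFT stands in for a real FFT to keep the example dependency-free.

```python
import cmath
import math


def dft(samples, inverse=False):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT block)."""
    n = len(samples)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        out.append(acc / n if inverse else acc)
    return out


def low_pass(spectrum, cutoff_bin):
    """Zero every frequency bin above cutoff_bin, keeping both mirror halves."""
    n = len(spectrum)
    return [c if min(k, n - k) <= cutoff_bin else 0
            for k, c in enumerate(spectrum)]


def pipe(samples, cutoff_bin):
    """One Pipe[i]: FFT, then LowPass, then IFFT; returns real samples."""
    filtered = dft(low_pass(dft(samples), cutoff_bin), inverse=True)
    return [c.real for c in filtered]


# Hypothetical input: a slow sine plus a fast one; the filter keeps the slow one.
n = 32
slow = [math.sin(2 * math.pi * 1 * t / n) for t in range(n)]
fast = [0.5 * math.sin(2 * math.pi * 10 * t / n) for t in range(n)]
cleaned = pipe([s + f for s, f in zip(slow, fast)], cutoff_bin=3)
print(max(abs(c - s) for c, s in zip(cleaned, slow)))  # residual error
```

The two transforms do most of the arithmetic in this sketch, which is at least consistent with the slide's breakdown, where FFT and IFFT together account for most of the pipe time.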


2-Processor Mappings

map2a : Vertical Partition
[Figure: Front, the pipes, and Back assigned to proc[1] and proc[2]]

map2b : Horizontal Partition
[Figure: Front, the FFT, LowPass, and IFFT blocks, and Back assigned to proc[1] and proc[2]]


3-Processor Mappings

map3a : Vertical Partition
[Figure: Front, the pipes, and Back assigned to proc[1], proc[2], and proc[3]]

map3b : Horizontal Partition
[Figure: Front, the FFT, LowPass, and IFFT blocks, and Back assigned to proc[1], proc[2], and proc[3]]


2 and 3 Processor Results

VERITAS Configured with 6 Pipes

App times (s) by mapping:
mapping    map1    map2a   map2b   map3a   map3b
sim        127.2   69.3    73.4    46.4    61.5
deployed   127.2   70.2    75.2    48.5    63.2
speedup    1x      1.83x   1.73x   2.74x   2.07x
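One way to judge how well the simulator tracks the deployed system is the relative error between the two sets of app times. The sketch below computes it per mapping. The split of the chart's data labels into a sim series and a deployed series follows the order in which they appear in the transcript, which is an assumption, and the error metric itself is a choice made here rather than something the slides state.

```python
# App times in seconds per mapping as (sim, deployed), read from the chart.
app_time = {
    "map1": (127.2, 127.2),
    "map2a": (69.3, 70.2),
    "map2b": (73.4, 75.2),
    "map3a": (46.4, 48.5),
    "map3b": (61.5, 63.2),
}


def relative_error(sim, deployed):
    """Simulation error as a fraction of the deployed time."""
    return abs(sim - deployed) / deployed


errors = {name: relative_error(*times) for name, times in app_time.items()}
for name, err in errors.items():
    print(f"{name}: {100 * err:.1f}%")

# Mapping where the simulation deviates most from the deployed run.
worst = max(errors, key=errors.get)
print("largest error:", worst)
```

With these values every mapping stays under 5% error, and the largest gap is on map3a.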


SMP Performance Scaling

VERITAS Configured with 16 Pipes

App. time (s) by number of processors:
processors   1      2       3       4       8       16
sim          134    69.1    47.7    35.4    19.6    11.4
dep          134    72.2    48.3    38.8    -       -
speedup      1x     1.94x   2.81x   3.79x   6.84x   11.75x
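The speedup labels can be turned into parallel efficiency (speedup divided by processor count) to show how scaling tapers off. This is a small worked calculation using only the speedups labeled on the chart; the efficiency metric is a standard one added here for illustration and does not appear on the slide.

```python
# Speedups labeled on the chart, keyed by number of processors.
speedup = {1: 1.0, 2: 1.94, 3: 2.81, 4: 3.79, 8: 6.84, 16: 11.75}


def efficiency(n, s):
    """Parallel efficiency: speedup divided by processor count."""
    return s / n


eff = {n: efficiency(n, s) for n, s in speedup.items()}
for n, e in eff.items():
    print(f"{n:2d} processors: {100 * e:.0f}% efficient")
```

Efficiency stays above 90% through 4 processors and falls to roughly 73% at 16 processors.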


Status and Future Work

Currently
» X-Sim is operational

What’s next
» Develop library of validated communication models

Future directions
» Develop X-Opt, an automated performance optimization tool


Acknowledgements

• Storage-Based Supercomputing Group

Michela Becchi Justin Brown Jim Buckley

Jeremy Buhler Roger Chamberlain Patrick Crowley

Mark Franklin (PI) Narayan Ganesan Gregory Galloway

Saurabh Gayen Eric Tyson

• Gamma Ray application: Jim Buckley / VERITAS collab.

• National Science Foundation CCF-0427794