a parallel streaming framework for multi-processing-unit ... · references "parallel data˜ow...

References"Parallel Data�ow Scheme for Streaming (Un)Structured Data" by H. Vo, D. Osmari, B. Summa, J. Comba, V. Pascucci and C. Silva, SCI Institute, University of Utah, Technical Report #UUSCI-2009-004, 2009.

"Multi-Threaded Streaming Pipeline For VTK" by H. Vo and C. Silva, SCI Institute, University of Utah, Technical Report #UUSCI-2009-005, 2009.

www.sci.utah.edu

A Parallel Streaming Framework for Multi-Processing-Unit SystemsHuy Vo, Brian Summa, Valerio Pascucci, Claudio Silva

SCI Institute, University of Utah, USAPresenter: Huy Vo Daniel Osmari, Joao Comba

UFRGS, Brazil

ENew Event E

Data Request

ENew Event

Data Request

POLICY

PUSH PULL

SCHEDULER RESOURCES

Trigger,Execute &Feedback

Distribute

PERFORMCOMPUTATION

OUTPUT

Demand-Driven RequestsPulling Data

Event-Driven RequestsPushing Data

Module

Message

Data Dependency

Executive

Our Uni�ed Model

ENew Event

Data Request

Piece #2

Piece #1

Streamable/Separable Data Structure

Concurrency Execution

Task Pipeline Data

Motivation Pull(), Push() and ReleaseInputs()Current data�ow system problems(with VTK, AVS, Data Explorer,...):• API “polluted” over the years (many designs in 80s-90s)• Do not map well to modern highly-parallel architecture: Multi- Core, GPUs, ...• Large datasets not well supported

Need a uni�ed data�ow architecture supporting:• Streaming• Heterogeneous architecture• Efficiency• Concurrency

Piece #1 Piece #2

Piece #1

RequestPiece #1

Piece #1

RequestPiece #2

Piece #2

Streaming Push Streaming Pull

• Use a priority queue with topological order of the modules as priority ranks• Include streaming piece numbers to priority ranks• Resources distributing strategy is based on execution time• Maximize concurrency but still taking data dependency into account

• Pipelining GPU branch with separate CUDA devices• Load-balancing between CPU/GPU is converged overstreaming time• Achieved a throughput of 250M pairs/s on (2) Xeon w5580 and a Quadro FX5800 gives• Achieved a throughput over 350M pairs/s on (2) Xeon e5550 and (2) Tesla C1060s

Task Scheduler & Resource Distribution

System Architecture

Execution Parallelism

Classi�cation of Data�ow Streaming Model Preliminary Results

Ongoing Progress

Complete Implementation for CPU Thread Resources

Basic Concrete Implementation of CUDA execution for GPU Resources • Connections can direct data transfer to appropriate cudaMemcpy() calls • A custom scheduling strategy to minimize host-device data transfer

• Dynamic Load-Balancing for GPU kernels • Concrete implementation for MPI to run on cluster • Incorporate TBB/OpenMP for Task Scheduling • A generalized streaming data structure

vtkContourFilter

vtkDataSetMapper

vtkActor

vtkDataSetReader

vtkContourFilter

vtkDataSetMapper

vtkActor

vtkDataSetReader

vtkRenderWindow

vtkRenderer

ContourFilter

DataSetMapper

ContourFilter

DataSetMapper

DataSetReader

RenderWindow

Renderer

(a) (b)

Data Block and Priority Rank

viewportChangedEvent()

while (LOD>0)

// Set LOD of DataAccesses

Push(<list of DataAccess>)

LOD = LOD - 1

ProgressiveRenderer

...DataAccess DataAccess DataAccess

Average1 image

12 imagesProgressiveRenderer

Average1 image

12 images

viewportChangedEvent()

while (LOD>0)

// Propagate LOD to DataAccesses

Pull(Average Module)

LOD = LOD - 1

Push Pull

Better Pipeline ParallelismSimpler Interface

Compute() {

// input dependent computation

ReleaseInputs()

// input independent computation

vtkImageDataStreamer5

vtkImageGaussianSmooth

vtkImageLuminance

vtkImageThreshhold

vtkPNGReader

vtkDataSetMapper

vtkActor

2vtkImageDataStreamGenerator

vtkRenderer

vtkImageDataStreamMerger6

vtkImageLuminance

vtkImageThreshhold

vtkPNGReader

vtkDataSetMapper

vtkActor

vtkImageBlend

vtkImageDataStreamMerger

vtkDataSetMapper

vtkActor

vtkImageDataStreamGenerator

vtkImageLuminance

vtkImageThreshhold

vtkPNGReader

vtkImageDataStreamGenerator

vtkPNGReader

CPU/GPU Sort

Streaming Simpli�cation of Tetrahedral Meshes

Multi-View Visualization ~3x speed up on 4-core machine

ProgressiveRenderer

GrayScale GrayScale GrayScale

Average

StandardDeviation

Di�erenceL2 Norm

EdgeDetection

Downsample Downsample Downsample

1 image 12 images

Streaming Simplication of Tetrahedral MeshesModels (Tets) Original Streaming Quadric Pipeline

Inst. Inst.Torso 1.0M 8.5s 8.4s 7.9s 3.0sFighter 1.4M 10.1s 9.5s 8.4s 3.6sRbl 3.8M 37.4s 35.6s 31.5s 8.9sMito 5.5M 52.0s 53.1s 44.9s 10.2s

UpdateQuadrics UpdateQuadrics UpdateQuadrics

TetStreamMerger

TetStreamingMesh

Simpli�cationBuf

Decimate

UpdateQuadrics UpdateQuadrics UpdateQuadrics

Simpli�cationBuf

Decimate

Simpli�cationBuf

Decimate

Simpli�cationBuf

Decimate

TetStreamMerger

TetStreamingMesh

CPU Threaded Radix Sort

GPU Presort

GPU Scatter

Streaming Unsorted Data

CPU Threaded Merge

CPU Threaded Accumulate

Sorted Output

GPU Scan Blocks

GPU Sum Blocks

a parallel streaming framework for multi-processing-unit ... · references "parallel data˜ow...

Documents

os desafios de competitividade de um porto privado osmari de...

block parallel programming for real-time …...preface this...

streaming similarity search over one billion tweets...

modeling performance of a parallel streaming engine ... ·...

joelen osmari da silva obtenÇÃo de acetato de …

lncs 5374 - parsing xml using parallel traversal of...

high-performance broadcast designs for streaming...

anwar rizal – streaming & parallel decision tree in flink

automatic generation of e cient parallel streaming...

streaming parallel gpu acceleration of large-scale ﬁlter...

parallel streaming signature em-tree: a clustering...

winbro: a window and broadcast- based parallel streaming...

streaming for functional data-parallel...

streaming-enabled parallel dataflow architecture for...

parallel sql and streaming expressions in

design of a cooperative video streaming system on community...

fastflow: e cient parallel streaming applications on...

arcserve2000 getting...

streaming similarity search over one billion tweets using...

video streaming © nanda ganesan, ph.d.. video streaming...