portable parallel programming on cloud and hpc: scientific applications of twister4azure

Portable Parallel Programming on Cloud and HPC: Scientific Applications of Twister4Azure Thilina Gunarathne ([email protected]) Bingjing Zhang, Tak-Lon Wu, Judy Qiu School of Informatics and Computing Indiana University, Bloomington.


Post on 24-Feb-2016


Page 1: Portable Parallel Programming on Cloud and HPC:  Scientific Applications of Twister4Azure

Portable Parallel Programming on Cloud and HPC: Scientific Applications of Twister4Azure

Thilina Gunarathne ([email protected]), Bingjing Zhang, Tak-Lon Wu, Judy Qiu

School of Informatics and Computing, Indiana University, Bloomington.

Page 2: Portable Parallel Programming on Cloud and HPC:  Scientific Applications of Twister4Azure

Clouds for scientific computations

No upfront cost

Zero maintenance

Horizontal scalability

Compute, storage and other services

Loose service guarantees

Not trivial to utilize effectively

Page 3: Portable Parallel Programming on Cloud and HPC:  Scientific Applications of Twister4Azure

Scalable Parallel Computing on Clouds

Programming Models

Scalability

Performance

Fault Tolerance

Monitoring

Page 4: Portable Parallel Programming on Cloud and HPC:  Scientific Applications of Twister4Azure

Pleasingly Parallel Frameworks

Classic Cloud Frameworks

[Figure: Cap3 Sequence Assembly. Two plots comparing DryadLINQ, Hadoop, EC2 and Azure: parallel efficiency (50% to 100%) versus number of files, and per-core per-file time (s) versus number of files (512 to 4096).]

Page 5: Portable Parallel Programming on Cloud and HPC:  Scientific Applications of Twister4Azure

MapReduce

Programming Model

Moving Computation to Data

Scalable

Fault Tolerance

Ideal for data intensive applications

Page 6: Portable Parallel Programming on Cloud and HPC:  Scientific Applications of Twister4Azure

MRRoles4Azure

Azure Cloud Services

• Highly available and scalable
• Utilize eventually consistent, high-latency cloud services effectively
• Minimal maintenance and management overhead

Decentralized

• Avoids single point of failure
• Global queue based dynamic scheduling
• Dynamically scale up/down

MapReduce

• First pure MapReduce for Azure
• Typical MapReduce fault tolerance

Page 7: Portable Parallel Programming on Cloud and HPC:  Scientific Applications of Twister4Azure

MRRoles4Azure

Azure Queues for scheduling, Tables to store meta-data and monitoring data, Blobs for input/output/intermediate data storage.
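
The division of labour among the three storage services can be illustrated with a minimal sketch. It is not the MRRoles4Azure implementation: `FakeCloud`, `submit_job`, `worker_step` and `run_word_count` are hypothetical names, and an in-memory `queue.Queue` and two dictionaries stand in for the Azure Queue, Table and Blob services. Workers pull task ids from the shared queue, so no central master assigns work.

```python
import queue
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class FakeCloud:
    """In-memory stand-ins for the three services (assumption, not Azure APIs)."""
    tasks: queue.Queue = field(default_factory=queue.Queue)  # plays the Queue
    table: dict = field(default_factory=dict)                # plays the Table
    blobs: dict = field(default_factory=dict)                # plays Blob storage


def submit_job(cloud, documents):
    """Client side: upload inputs, record metadata, enqueue one task per input."""
    for task_id, text in enumerate(documents):
        cloud.blobs[f"input/{task_id}"] = text
        cloud.table[task_id] = "queued"
        cloud.tasks.put(task_id)


def worker_step(cloud, worker_name):
    """Worker side: pull one task from the global queue; False if none is left."""
    try:
        task_id = cloud.tasks.get_nowait()
    except queue.Empty:
        return False
    text = cloud.blobs[f"input/{task_id}"]
    # Map task: word counts for this input, stored as intermediate data.
    cloud.blobs[f"intermediate/{task_id}"] = Counter(text.split())
    cloud.table[task_id] = f"done by {worker_name}"
    return True


def run_word_count(documents, workers=("worker-0", "worker-1")):
    cloud = FakeCloud()
    submit_job(cloud, documents)
    busy = True
    while busy:  # workers take turns polling until the queue is drained
        busy = False
        for name in workers:
            if worker_step(cloud, name):
                busy = True
    totals = Counter()  # reduce step: sum the intermediate counts
    for name, counts in cloud.blobs.items():
        if name.startswith("intermediate/"):
            totals.update(counts)
    return totals, cloud.table
```

The sketch leaves out fault tolerance and the global barrier between the map and reduce phases; it only shows which kind of data goes to which service.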

Page 8: Portable Parallel Programming on Cloud and HPC:  Scientific Applications of Twister4Azure

MRRoles4Azure

Global Barrier

Page 9: Portable Parallel Programming on Cloud and HPC:  Scientific Applications of Twister4Azure

SWG Sequence Alignment

Smith-Waterman-Gotoh (SWG) alignment to calculate all-pairs dissimilarity

Costs less than EMR

Performance comparable to Hadoop, EMR

Page 10: Portable Parallel Programming on Cloud and HPC:  Scientific Applications of Twister4Azure

Data Intensive Iterative Applications
• Growing class of applications
– Clustering, data mining, machine learning & dimension reduction applications
– Driven by data deluge & emerging computation fields
– Lots of scientific applications

k ← 0
MAX_ITER ← maximum iterations
δ[0] ← initial delta value
while ( k < MAX_ITER && f(δ[k], δ[k-1]) )
    foreach datum in data
        β[datum] ← process(datum, δ[k])
    end foreach
    δ[k+1] ← combine(β[])
    k ← k+1
end while
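
The loop structure above can be written as a small runnable sketch. The names `run_iterations`, `process`, `combine` and `has_converged` are illustrative choices rather than Twister4Azure APIs, and the demo problem (damped averaging that converges to the mean of the data) was chosen only because its fixed point is easy to check.

```python
def run_iterations(data, delta0, process, combine, has_converged, max_iter):
    """Generic iterative computation: process every datum against the current
    loop-variant value, combine the partial results, repeat until converged."""
    delta = delta0
    for k in range(max_iter):
        partials = [process(datum, delta) for datum in data]  # map-like step
        new_delta = combine(partials)                          # reduce-like step
        if has_converged(new_delta, delta):
            return new_delta, k + 1
        delta = new_delta
    return delta, max_iter


def demo():
    data = [1.0, 2.0, 3.0, 6.0]  # loop-invariant data (mean is 3.0)
    return run_iterations(
        data,
        delta0=0.0,  # loop-variant data, updated every iteration
        process=lambda x, delta: delta + 0.5 * (x - delta),
        combine=lambda partials: sum(partials) / len(partials),
        has_converged=lambda new, old: abs(new - old) < 1e-6,
        max_iter=100,
    )
```

Here `data` never changes between iterations while `delta` is small and changes every time, which is the split between loop-invariant and loop-variant data that the following slides build on.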

Page 11: Portable Parallel Programming on Cloud and HPC:  Scientific Applications of Twister4Azure

Data Intensive Iterative Applications

[Figure: structure of an iteration. Labels: Compute, Communication, Reduce/barrier, New Iteration, Larger Loop-Invariant Data, Smaller Loop-Variant Data, Broadcast.]

Page 12: Portable Parallel Programming on Cloud and HPC:  Scientific Applications of Twister4Azure

Twister4Azure – Iterative MapReduce Overview

• Decentralized iterative MR architecture for clouds
• Extends the MR programming model
• Multi-level data caching
– Cache aware hybrid scheduling
• Multiple MR applications per job
• Collective communications *new*
• Outperforms Hadoop in local cluster by 2 to 4 times
• Sustains features of MRRoles4Azure
– Cloud services, dynamic scheduling, load balancing, fault tolerance, monitoring, local testing/debugging

Page 13: Portable Parallel Programming on Cloud and HPC:  Scientific Applications of Twister4Azure

Twister4Azure – Performance Preview

[Figure panels: KMeans Clustering; BLAST sequence search; Multi-Dimensional Scaling]

Page 14: Portable Parallel Programming on Cloud and HPC:  Scientific Applications of Twister4Azure

Iterative MapReduce for Azure Cloud

[Figure: job flow. Labels: Job Start, Map and Combine tasks, Data Cache, Reduce, Merge, "Add Iteration?" (Yes: hybrid scheduling of the new iteration; No: Job Finish).]

http://salsahpc.indiana.edu/twister4azure

Page 15: Portable Parallel Programming on Cloud and HPC:  Scientific Applications of Twister4Azure

Iterative MapReduce for Azure Cloud

Merge step

• Extension to the MapReduce programming model
– Map -> Combine -> Shuffle -> Sort -> Reduce -> Merge
• Receives Reduce outputs and the broadcast data
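
A single pass through the extended pipeline can be sketched in memory as follows. This is an illustration of the stage order only, not Twister4Azure code: `run_job` and the word-count functions are hypothetical, and the broadcast data is used here as a set of stop words purely for the sake of the example.

```python
from collections import defaultdict


def group_by_key(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def run_job(splits, broadcast, map_fn, combine_fn, reduce_fn, merge_fn):
    """Map -> Combine -> Shuffle -> Sort -> Reduce -> Merge, all in memory."""
    combined = []
    for split in splits:
        # Map: every input pair sees the broadcast data.
        mapped = [pair for key, value in split
                  for pair in map_fn(key, value, broadcast)]
        # Combine: local aggregation of one map task's output.
        combined.append([(k, combine_fn(k, vs))
                         for k, vs in group_by_key(mapped).items()])
    # Shuffle: bring values for the same key together across map tasks.
    shuffled = group_by_key(pair for part in combined for pair in part)
    # Sort + Reduce: keys are processed in sorted order.
    reduced = [(k, reduce_fn(k, shuffled[k])) for k in sorted(shuffled)]
    # Merge: receives all Reduce outputs plus the broadcast data.
    return merge_fn(reduced, broadcast)


def count_words(key, line, stop_words):
    return [(word, 1) for word in line.split() if word not in stop_words]


def total(key, values):
    return sum(values)


def merge_counts(reduce_outputs, stop_words):
    return dict(reduce_outputs)


result = run_job(
    splits=[[(0, "a b a")], [(1, "b the c")]],
    broadcast={"the"},
    map_fn=count_words,
    combine_fn=total,
    reduce_fn=total,
    merge_fn=merge_counts,
)
```

In an iterative job, the Merge function is the natural place to compute the next loop-variant value and to decide whether another iteration is needed.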

Page 16: Portable Parallel Programming on Cloud and HPC:  Scientific Applications of Twister4Azure

Iterative MapReduce for Azure Cloud

Merge step
Extensions to support broadcast data

• Loop variant data: comparatively smaller
– Map(Key, Value, List of KeyValue-Pairs (broadcast data), …)
• Can be specified even for non-iterative MR jobs
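
The extended Map signature can be pictured with a short sketch. The function names and the `scale`/`offset` parameters are hypothetical; the only point is that the map task receives a list of key-value pairs as broadcast data in addition to its own input pair, and that nothing about this requires an iterative job.

```python
from typing import Iterable, List, Tuple

KeyValue = Tuple[str, float]


def map_task(key: str, value: float,
             broadcast: List[KeyValue]) -> Iterable[KeyValue]:
    """Map(Key, Value, broadcast data): apply broadcast parameters to one record."""
    params = dict(broadcast)  # broadcast data arrives as key-value pairs
    yield key, params["scale"] * value + params["offset"]


def run_map_phase(records: List[KeyValue],
                  broadcast: List[KeyValue]) -> List[KeyValue]:
    """Run the map task over all records with the same broadcast data."""
    output = []
    for key, value in records:
        output.extend(map_task(key, value, broadcast))
    return output


transformed = run_map_phase(
    [("a", 1.0), ("b", 2.0)],
    broadcast=[("scale", 2.0), ("offset", 1.0)],
)
```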

Page 17: Portable Parallel Programming on Cloud and HPC:  Scientific Applications of Twister4Azure

Iterative MapReduce for Azure Cloud

Merge step
Extensions to support broadcast data
In-Memory/Disk caching of static data

• Loop invariant data (static data): traditional MR key-value pair
– Cached between iterations
• Avoids the data download, loading and parsing cost
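
A two-level cache of this kind might look like the sketch below. `StaticDataCache` is a hypothetical class, the `fetch` callback stands in for a download from cloud storage, and JSON files in a temporary directory stand in for the worker's local disk. A memory hit avoids download, loading and parsing; a disk hit avoids only the download.

```python
import json
import os
import tempfile


class StaticDataCache:
    """Memory cache backed by a disk cache, backed by a remote fetch."""

    def __init__(self, fetch, disk_dir):
        self.fetch = fetch        # hypothetical download function
        self.disk_dir = disk_dir
        self.memory = {}
        self.downloads = 0

    def get(self, name):
        if name in self.memory:   # level 1: already parsed, in memory
            return self.memory[name]
        path = os.path.join(self.disk_dir, name + ".json")
        if os.path.exists(path):  # level 2: on local disk, needs parsing
            with open(path) as handle:
                data = json.load(handle)
        else:                     # miss: download once, then persist
            data = self.fetch(name)
            self.downloads += 1
            with open(path, "w") as handle:
                json.dump(data, handle)
        self.memory[name] = data
        return data


def demo():
    with tempfile.TemporaryDirectory() as disk_dir:
        cache = StaticDataCache(lambda name: [1, 2, 3], disk_dir)
        for _ in range(3):        # three iterations reuse the same partition
            cache.get("partition0")
        restarted = StaticDataCache(lambda name: [1, 2, 3], disk_dir)
        restarted.get("partition0")  # served from disk after a "restart"
        return cache.downloads, restarted.downloads
```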

Page 18: Portable Parallel Programming on Cloud and HPC:  Scientific Applications of Twister4Azure

Iterative MapReduce for Azure Cloud

Merge step
Extensions to support broadcast data
In-Memory/Disk caching of static data
Hybrid intermediate data transfer

• Tasks are finer grained and the intermediate data are relatively smaller than in traditional MapReduce computations
• Table or Blob storage based transport, chosen based on data size
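
The size-based choice of transport can be sketched as follows. Everything specific here is an assumption: the 32 KiB threshold, the function names, the dictionaries standing in for Table and Blob storage, and the idea that a table entry points at the blob for large payloads. The slides state only that the transport is chosen by data size.

```python
SMALL_PAYLOAD_LIMIT = 32 * 1024  # hypothetical threshold in bytes


def send_intermediate(key, payload, table, blobs, limit=SMALL_PAYLOAD_LIMIT):
    """Store small payloads inline in the table, large ones in blob storage."""
    if len(payload) <= limit:
        table[key] = {"inline": payload}
        return "table"
    blob_name = f"intermediate/{key}"
    blobs[blob_name] = payload
    table[key] = {"blob": blob_name}  # table entry records where the data lives
    return "blob"


def receive_intermediate(key, table, blobs):
    """Fetch a payload regardless of which transport carried it."""
    entry = table[key]
    if "inline" in entry:
        return entry["inline"]
    return blobs[entry["blob"]]


table, blobs = {}, {}
small_route = send_intermediate("k1", b"x" * 100, table, blobs)
large_route = send_intermediate("k2", b"y" * 100_000, table, blobs)
```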

Page 19: Portable Parallel Programming on Cloud and HPC:  Scientific Applications of Twister4Azure

Cache Aware Scheduling
• Map tasks need to be scheduled with cache awareness
– A Map task that processes data ‘X’ needs to be scheduled to the worker with ‘X’ in the cache
• Nobody has a global view of the data products cached in workers
– Decentralized architecture
– Impossible to do cache-aware assignment of tasks to workers
• Solution: workers pick tasks based on the data they have in the cache
– Job Bulletin Board: advertises the new iterations

Page 20: Portable Parallel Programming on Cloud and HPC:  Scientific Applications of Twister4Azure

Hybrid Task Scheduling

First iteration through queues

New iteration in Job Bulletin Board

Data in cache + Task meta data history

Left over tasks
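
One way to picture the hybrid scheme is the sketch below. It is a simplification under stated assumptions: `Worker`, `first_iteration` and `later_iteration` are hypothetical names, round-robin dispatch stands in for workers competing on a queue, and routing the left over tasks back through the queue is an assumption about what the diagram intends.

```python
from collections import deque


class Worker:
    def __init__(self, name):
        self.name = name
        self.cache = set()  # ids of data partitions held locally

    def run(self, partition):
        self.cache.add(partition)  # processing a partition caches it


def first_iteration(partitions, workers):
    """No caches exist yet, so tasks go through a queue (round-robin here)."""
    pending = deque(partitions)
    assignments = {}
    turn = 0
    while pending:
        worker = workers[turn % len(workers)]
        partition = pending.popleft()
        worker.run(partition)
        assignments[partition] = worker.name
        turn += 1
    return assignments


def later_iteration(bulletin_board, workers):
    """Workers pick advertised tasks whose data they already hold in cache."""
    remaining = set(bulletin_board)
    assignments = {}
    for worker in workers:
        for partition in sorted(worker.cache & remaining):
            worker.run(partition)
            assignments[partition] = worker.name
            remaining.discard(partition)
    leftovers = sorted(remaining)  # nobody had these cached
    assignments.update(first_iteration(leftovers, workers))
    return assignments, leftovers


def demo():
    workers = [Worker("w0"), Worker("w1")]
    partitions = ["p0", "p1", "p2", "p3"]
    first = first_iteration(partitions, workers)
    workers[1].cache.discard("p3")  # simulate a cache eviction
    second, leftovers = later_iteration(partitions, workers)
    return first, second, leftovers
```

No component in the sketch needs a global view of all caches: each worker consults only its own cache and the shared list of advertised tasks.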

Page 21: Portable Parallel Programming on Cloud and HPC:  Scientific Applications of Twister4Azure

Multiple Applications per Deployment

• Ability to deploy multiple Map Reduce applications in a single deployment

• Capability to chain different MR applications in a single job, within a single iteration.
– Ability to pipeline

• Support for many application invocations in a workflow without redeployment

Page 22: Portable Parallel Programming on Cloud and HPC:  Scientific Applications of Twister4Azure

KMeans Clustering
• Partition a given data set into disjoint clusters
• Each iteration
– Cluster assignment step
– Centroid update step
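
The two steps of each iteration can be sketched in plain Python. This is a sequential illustration, not the Twister4Azure KMeans application; the function names, the tolerance and the choice to keep an empty cluster's old centroid are assumptions made for the example. In MapReduce terms, the assignment step corresponds to the Map tasks over cached data points, with the centroids as broadcast data, and the update step to Reduce and Merge.

```python
import math


def nearest(point, centroids):
    """Cluster assignment step for one point: index of the closest centroid."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(point, centroids[i]))


def kmeans_iteration(points, centroids):
    dims = len(centroids[0])
    sums = [[0.0] * dims for _ in centroids]
    counts = [0] * len(centroids)
    for point in points:                     # assignment step
        cluster = nearest(point, centroids)
        counts[cluster] += 1
        for d, coordinate in enumerate(point):
            sums[cluster][d] += coordinate
    updated = []
    for cluster, centroid in enumerate(centroids):  # centroid update step
        if counts[cluster] == 0:
            updated.append(tuple(centroid))  # keep an empty cluster's centroid
        else:
            updated.append(tuple(s / counts[cluster] for s in sums[cluster]))
    return updated


def kmeans(points, centroids, max_iter=100, tolerance=1e-9):
    for _ in range(max_iter):
        updated = kmeans_iteration(points, centroids)
        shift = max(math.dist(a, b) for a, b in zip(updated, centroids))
        centroids = updated
        if shift <= tolerance:               # centroids stopped moving
            break
    return centroids


clusters = kmeans(
    points=[(0, 0), (0, 2), (10, 10), (10, 12)],
    centroids=[(0, 0), (10, 10)],
)
```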

Page 23: Portable Parallel Programming on Cloud and HPC:  Scientific Applications of Twister4Azure

Performance – KMeans Clustering

[Figure panels: Number of Executing Map Tasks Histogram; Strong Scaling with 128M Data Points; Weak Scaling; Task Execution Time Histogram]

First iteration performs the initial data fetch

Overhead between iterations

Scales better than Hadoop on bare metal

Page 24: Portable Parallel Programming on Cloud and HPC:  Scientific Applications of Twister4Azure

Applications
• Bioinformatics pipeline

[Figure: pipeline with stages Gene Sequences; Pairwise Alignment & Distance Calculation; Distance Matrix; Clustering; Multi-Dimensional Scaling; Visualization. Intermediate outputs: Cluster Indices, Coordinates, 3D Plot. Three stages are annotated O(NxN).]

http://salsahpc.indiana.edu/

Page 25: Portable Parallel Programming on Cloud and HPC:  Scientific Applications of Twister4Azure

Metagenomics Result

http://salsahpc.indiana.edu/

Page 26: Portable Parallel Programming on Cloud and HPC:  Scientific Applications of Twister4Azure

Multi-Dimensional-Scaling
• Many iterations
• Memory & data intensive
• 3 MapReduce jobs per iteration
• X(k) = invV * B(X(k-1)) * X(k-1)
• 2 matrix vector multiplications termed BC and X

Jobs in each iteration:
BC: Calculate BX (Map, Reduce, Merge)
X: Calculate invV (BX) (Map, Reduce, Merge)
Calculate Stress (Map, Reduce, Merge)
New Iteration
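
The update rule above has the form of the standard SMACOF iteration (the Guttman transform). Assuming that is the algorithm in use, the three jobs per iteration correspond to the quantities below. The notation is introduced here rather than taken from the slides: $V^{+}$ is the matrix written as invV, $\Delta_{ij}$ are the input dissimilarities, $w_{ij}$ the weights, and $d_{ij}(X)$ the Euclidean distance between points $i$ and $j$ in the current mapping $X$.

```latex
\begin{aligned}
X^{(k)} &= V^{+}\, B\!\left(X^{(k-1)}\right) X^{(k-1)}, \\[4pt]
b_{ij}(X) &=
  \begin{cases}
    -\,w_{ij}\,\Delta_{ij} / d_{ij}(X) & i \neq j,\ d_{ij}(X) \neq 0, \\
    0 & i \neq j,\ d_{ij}(X) = 0,
  \end{cases}
  \qquad
  b_{ii}(X) = -\sum_{j \neq i} b_{ij}(X), \\[4pt]
v_{ij} &= -\,w_{ij} \quad (i \neq j), \qquad v_{ii} = \sum_{j \neq i} w_{ij}, \\[4pt]
\sigma(X) &= \sum_{i<j} w_{ij}\,\bigl(d_{ij}(X) - \Delta_{ij}\bigr)^{2}.
\end{aligned}
```

Under this reading, the BC job computes $B(X^{(k-1)})\,X^{(k-1)}$, the X job multiplies the result by $V^{+}$, and the stress job evaluates $\sigma(X^{(k)})$ to decide whether another iteration is needed.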

Page 27: Portable Parallel Programming on Cloud and HPC:  Scientific Applications of Twister4Azure

Performance – Multi Dimensional Scaling

[Figure panels: Azure Instance Type Study; Number of Executing Map Tasks Histogram; Weak Scaling; Data Size Scaling]

First iteration performs the initial data fetch

Performance adjusted for sequential performance difference

Page 28: Portable Parallel Programming on Cloud and HPC:  Scientific Applications of Twister4Azure

BLAST sequence search

Scales better than Hadoop & EC2-Classic Cloud

Page 29: Portable Parallel Programming on Cloud and HPC:  Scientific Applications of Twister4Azure

Current Research
• Collective communication primitives
– All-Gather-Reduce
– Sum-Reduce (aka MPI Allreduce)
• Exploring additional data communication and broadcasting mechanisms
– Fault tolerance
• Twister4Cloud
– Twister4Azure architecture implementations for other cloud infrastructures

Page 30: Portable Parallel Programming on Cloud and HPC:  Scientific Applications of Twister4Azure

Collective Communications

[Figure: two applications, App X and App Y, each with map tasks Map1, Map2, ..., MapN; each map task is shown with a δ value.]
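
The semantics of the two kinds of collective can be sketched without any communication layer. The functions below are illustrative only and are not Twister4Azure or MPI calls: each takes the list of per-map-task contributions and returns what every task would hold afterwards, mirroring the behaviour of an all-gather and of an all-reduce with a sum operator.

```python
def all_gather(contributions):
    """Every task ends up with the full list of all tasks' contributions."""
    gathered = [list(item) for item in contributions]
    return [[list(item) for item in gathered] for _ in contributions]


def all_reduce_sum(contributions):
    """Every task ends up with the element-wise sum of all contributions."""
    total = [sum(column) for column in zip(*contributions)]
    return [list(total) for _ in contributions]


# Three map tasks, each contributing a partial δ vector of length 2.
partial_deltas = [[1, 2], [3, 4], [5, 6]]
after_reduce = all_reduce_sum(partial_deltas)
after_gather = all_gather([[1], [2]])
```

With a sum-reduce collective, the combined δ for the next iteration is available at every map task directly, without a separate Reduce, Merge and broadcast round trip.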

Page 31: Portable Parallel Programming on Cloud and HPC:  Scientific Applications of Twister4Azure

Conclusions
• Twister4Azure
– Addresses the challenges of scalability and fault tolerance unique to utilizing the cloud interfaces
– Supports multi-level caching of loop-invariant data across iterations as well as caching of any reused data
– Novel hybrid cache-aware scheduling mechanism
• One of the first large-scale studies of Azure performance for non-trivial scientific applications.
• Twister4Azure in VMs outperforms Apache Hadoop in a local cluster by a factor of 2 to 4
• Twister4Azure exhibits performance comparable to Java HPC Twister running on a local cluster.

Page 32: Portable Parallel Programming on Cloud and HPC:  Scientific Applications of Twister4Azure

Acknowledgements
• Prof. Geoffrey C Fox for his many insights and feedback
• Present and past members of SALSA group, Indiana University.
• Seung-Hee Bae for many discussions on MDS
• National Institutes of Health grant 5 RC2 HG005806-02.
• Microsoft Azure Grant

Page 33: Portable Parallel Programming on Cloud and HPC:  Scientific Applications of Twister4Azure

Questions?

Thank You!

http://salsahpc.indiana.edu/twister4azure