


Ex-MATE: Data-Intensive Computing with Large Reduction Objects and Its Application to Graph Mining

Wei Jiang and Gagan Agrawal


Outline

April 21, 2023

• Background
• System Design of Ex-MATE
• Parallel Graph Mining with Ex-MATE
• Experiments
• Related Work
• Conclusion


Background (I)

• Map-Reduce
  • Simple API: map and reduce
  • Easy to write parallel programs
  • Fault-tolerant for large-scale data centers
  • Performance? Always a concern for the HPC community
• Generalized Reduction
  • First proposed in FREERIDE, which was developed at Ohio State in 2001-2003
  • Shares a similar processing structure
  • The key difference lies in a programmer-managed reduction object
  • Better performance?


Map-Reduce Execution

Comparing Processing Structures

• The reduction object represents the intermediate state of the execution
• The reduce function is commutative and associative
• Sorting, grouping, and similar overheads are eliminated by the reduction function and reduction object
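The difference between the two processing structures can be illustrated with a small word-count sketch. This is not code from MATE or Hadoop; the function names and the choice of word counting are assumptions made purely for illustration. The first function materializes intermediate (key, value) pairs and groups them, while the second updates a reduction object directly and merges the per-chunk objects at the end.

```python
from collections import defaultdict


def word_count_mapreduce(chunks):
    """Map-reduce style: emit pairs, group them by key, then reduce."""
    # Map phase: every word becomes an intermediate (word, 1) pair.
    pairs = [(word, 1) for chunk in chunks for word in chunk.split()]
    # Shuffle phase: sort and group the intermediate pairs by key.
    groups = defaultdict(list)
    for key, value in sorted(pairs):
        groups[key].append(value)
    # Reduce phase: fold each group of values into one result.
    return {key: sum(values) for key, values in groups.items()}


def word_count_generalized_reduction(chunks):
    """Generalized reduction: accumulate into a reduction object in place."""
    partial_objects = []
    for chunk in chunks:  # one reduction object per chunk (stand-in for a worker)
        robj = defaultdict(int)
        for word in chunk.split():
            robj[word] += 1  # commutative and associative update
        partial_objects.append(robj)
    # Combination phase: merge the per-chunk reduction objects.
    combined = defaultdict(int)
    for robj in partial_objects:
        for key, value in robj.items():
            combined[key] += value
    return dict(combined)
```

Both functions return the same counts; the second never builds, sorts, or groups a list of intermediate pairs, which is the overhead the slide refers to.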

Our Previous Work

• A comparative study between FREERIDE and Hadoop:
  • FREERIDE outperformed Hadoop by factors of 5 to 10
  • Possible reasons: Java vs. C++? HDFS overheads? Inefficiency of Hadoop? API difference?
• Developed MATE (Map-Reduce system with an AlternaTE API) on top of Phoenix from Stanford
  • Adopted Generalized Reduction
  • Focused on API differences
  • MATE improved on Phoenix by an average of 50%
  • Avoids a large set of intermediate pairs between Map and Reduce
  • Reduces memory requirements

Extending MATE

• Main issues of the original MATE:
  • Only works on a single multi-core machine
  • Datasets must reside in memory
  • Assumes the reduction object MUST fit in memory
• This paper extends MATE to address these limitations
  • Focus on graph mining: an emerging class of applications
    • Requires large reduction objects as well as large-scale datasets
    • E.g., PageRank could have an 8GB reduction object!
  • Supports managing arbitrary-sized reduction objects
  • Also reads disk-resident input data
• Evaluated Ex-MATE against PEGASUS
  • PEGASUS: a Hadoop-based graph mining system
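A back-of-the-envelope sketch shows how a reduction object can reach this size. The slide does not state the element size or the vertex count behind the 8GB figure, so the 8 bytes per vertex and the 2^30 vertices below are assumptions chosen only to make the arithmetic concrete.

```python
def reduction_object_bytes(num_vertices, bytes_per_element=8):
    """Size of a reduction object holding one value per vertex (assumed layout)."""
    return num_vertices * bytes_per_element


def num_splits(total_bytes, memory_budget_bytes):
    """Number of splits needed so that each split fits in the memory budget."""
    return -(-total_bytes // memory_budget_bytes)  # ceiling division


# Assumed example: 2**30 vertices, one 8-byte rank value each.
size = reduction_object_bytes(2**30)
print(size / 2**30, "GiB")               # size of the whole reduction object
print(num_splits(size, 2**30), "splits")  # with a 1 GiB budget per split
```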


System Design and Implementation

• System design of Ex-MATE
  • Execution overview
  • Support of distributed environments
• System APIs in Ex-MATE
  • One set provided by the runtime
    • Operations on reduction objects
  • Another set defined or customized by the users
    • Reduction, combination, etc.
• Runtime in Ex-MATE
  • Data partitioning
  • Task scheduling
  • Other low-level details
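The split between runtime-provided and user-defined functions might look roughly like the sketch below. None of these names come from Ex-MATE: `ReductionObject`, `accumulate`, `reduction`, `combination`, and `run` are hypothetical, and the toy runtime processes splits sequentially rather than scheduling them across nodes and cores. The user-defined part computes a small histogram.

```python
class ReductionObject:
    """Runtime-provided side (hypothetical): an array of accumulator cells."""

    def __init__(self, size, identity=0):
        self.cells = [identity] * size

    def accumulate(self, index, value, op):
        # Fold a new value into one cell with a commutative, associative op.
        self.cells[index] = op(self.cells[index], value)


def reduction(split, robj):
    """User-defined: process one input split, updating the reduction object."""
    for x in split:
        robj.accumulate(x % len(robj.cells), 1, lambda a, b: a + b)


def combination(robjs):
    """User-defined: merge the reduction objects produced by all splits."""
    merged = ReductionObject(len(robjs[0].cells))
    for robj in robjs:
        for i, value in enumerate(robj.cells):
            merged.accumulate(i, value, lambda a, b: a + b)
    return merged


def run(data, num_splits, robj_size):
    """Toy runtime: partition the data, run reduction per split, then combine."""
    splits = [data[k::num_splits] for k in range(num_splits)]
    robjs = []
    for split in splits:  # a real runtime would schedule these in parallel
        robj = ReductionObject(robj_size)
        reduction(split, robj)
        robjs.append(robj)
    return combination(robjs)
```

For example, `run(list(range(10)), 3, 4).cells` counts how many of the integers 0 to 9 fall into each residue class modulo 4.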

Ex-MATE Runtime Overview

• Basic one-stage execution

Implementation Considerations

• Support for processing very large datasets
  • Partitioning function: partition the data and distribute it to a number of nodes
  • Splitting function: use the multi-core CPU on each node
• Management of a large reduction object (R.O.): reduce disk I/O!
  • Outputs (R.O.) are updated in a demand-driven way
    • Partition the reduction object into splits
    • Inputs are re-organized based on data access patterns
  • Reuse an R.O. split in memory as much as possible
  • Example: Matrix-Vector Multiplication

An MV-Multiplication Example

[Figure: the input matrix divided into blocks labeled (1, 1), (2, 1), and (1, 2), shown with the input vector and the output vector.]
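The idea of reusing one reduction-object split across several input blocks can be sketched for matrix-vector multiplication as follows. This is an illustrative reconstruction, not the Ex-MATE implementation: the block layout, the function names, and the `loads` counter (standing in for disk reads of an output split) are assumptions, and block indices here are 0-based whereas the figure labels are 1-based.

```python
def partition_into_blocks(entries, block_size):
    """Group nonzeros (i, j, m_ij) by the matrix block they fall into."""
    blocks = {}
    for i, j, m in entries:
        key = (i // block_size, j // block_size)
        blocks.setdefault(key, []).append((i, j, m))
    return blocks


def blocked_matvec(blocks, v, n, block_size):
    """Multiply block by block, keeping one output split in memory at a time."""
    out = [0.0] * n
    loads = 0  # how many times an output split is brought into memory
    num_blocks = -(-n // block_size)  # ceiling division
    for block_row in range(num_blocks):
        lo = block_row * block_size
        hi = min(lo + block_size, n)
        split = [0.0] * (hi - lo)  # "load" this output split once
        loads += 1
        # Reuse the same split for every input block in this block row.
        for block_col in range(num_blocks):
            for i, j, m in blocks.get((block_row, block_col), []):
                split[i - lo] += m * v[j]
        out[lo:hi] = split  # "write back" the split once
    return out, loads
```

Ordering the input blocks by block row means each output split is loaded and written once, regardless of how many input blocks contribute to it.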


GIM-V for Graph Mining (I)

• Generalized Iterative Matrix-Vector Multiplication (GIM-V)
  • First proposed at CMU
  • Similar to common matrix-vector (MV) multiplication
• MV multiplication consists of three operations: multiplication, sum, and assignment
• GIM-V generalizes these three operations:
  • combine2: combine m(i, j) and v(j)
    • Corresponds to the multiplication, but does not have to be a multiplication
  • combineAll: combine the n partial results for element i
    • Corresponds to the sum, but does not have to be a sum
  • assign: assign v(new) to v(i)
    • Corresponds to the assignment; the previous value of v(i) is updated by a new value
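One GIM-V iteration can be written as a generic function parameterized by the three operations. The sketch below is a sequential illustration under assumed names (`gim_v`, `combine_all`, and the sparse `(i, j, m_ij)` edge-list representation are choices made here, not taken from the slides); ordinary matrix-vector multiplication falls out as a special case.

```python
def gim_v(edges, v, combine2, combine_all, assign):
    """One GIM-V iteration over a sparse matrix given as (i, j, m_ij) triples."""
    partial = [[] for _ in v]
    for i, j, m in edges:
        partial[i].append(combine2(m, v[j]))  # combine m(i, j) and v(j)
    # Combine the partial results for each element i, then assign.
    return [assign(v[i], combine_all(partial[i])) for i in range(len(v))]


def matvec(edges, v):
    """Plain MV multiplication expressed through the three GIM-V operations."""
    return gim_v(
        edges,
        v,
        combine2=lambda m, vj: m * vj,   # multiplication
        combine_all=lambda xs: sum(xs),  # sum
        assign=lambda old, new: new,     # assignment
    )
```

Swapping in different functions for the three parameters yields the graph mining algorithms discussed on the following slides.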

GIM-V for Graph Mining (II)

• A set of graph mining applications fit into GIM-V
  • PageRank, Diameter Estimation, Finding Connected Components, Random Walk with Restart, etc.
• Parallelization of GIM-V:
  • Using Map-Reduce in PEGASUS
    • A two-stage algorithm: two consecutive map-reduce jobs
  • Using Generalized Reduction in Ex-MATE
    • A one-stage algorithm: simpler code

GIM-V Example: PageRank

• PageRank is used by Google to calculate the relative importance of web pages
  • Direct implementation of GIM-V: v(j) is the ranking value
  • The three customized operations are a multiplication (combine2), a sum (combineAll), and an assignment (assign)
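The formulas from the slide are not preserved in this transcript, so the sketch below uses the standard PageRank formulation from the GIM-V literature: combine2 multiplies by a damping factor c, combineAll adds the random-jump term (1 - c)/n to the sum, and assign takes the new value. The damping factor 0.85, the adjacency-list input, and the function name are assumptions, and the sketch further assumes every node has at least one outgoing link.

```python
def pagerank_gimv(out_links, c=0.85, iterations=50):
    """PageRank via GIM-V; out_links[j] lists the nodes that j links to."""
    n = len(out_links)
    # Column-normalized matrix: m(i, j) = 1 / outdegree(j) for each edge j -> i.
    entries = [
        (i, j, 1.0 / len(targets))
        for j, targets in enumerate(out_links)
        for i in targets
    ]
    v = [1.0 / n] * n  # initial ranking values

    def combine2(m, vj):        # multiplication, scaled by the damping factor
        return c * m * vj

    def combine_all(partials):  # sum, plus the random-jump term
        return (1.0 - c) / n + sum(partials)

    def assign(old, new):       # assignment
        return new

    for _ in range(iterations):
        partial = [[] for _ in range(n)]
        for i, j, m in entries:
            partial[i].append(combine2(m, v[j]))
        v = [assign(v[i], combine_all(partial[i])) for i in range(n)]
    return v
```

On a three-node cycle such as `pagerank_gimv([[1], [2], [0]])`, symmetry gives every node the same rank.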

GIM-V: Other Algorithms

• Diameter Estimation: HADI is an algorithm to estimate the diameter of a given graph
  • The three customized operations are a multiplication (combine2), a bitwise-or (combineAll), and a bitwise-or (assign)
• Finding Connected Components: HCC is a new algorithm to find the connected components of large graphs
  • The three customized operations are a multiplication (combine2), a minimum (combineAll), and a minimum (assign)
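A minimal sequential sketch of HCC under the multiplication / minimum / minimum operations: each vertex starts with its own id as its component label, and every iteration propagates the smallest label seen among neighbors until nothing changes. The function name, the edge-list input, and the stopping test are assumptions for illustration. HADI follows the same pattern with bitwise-or in place of the minimum.

```python
def hcc(num_vertices, undirected_edges):
    """Label each vertex with the smallest vertex id in its connected component."""
    # Symmetric adjacency matrix entries: m(i, j) = 1 for every edge {i, j}.
    entries = []
    for a, b in undirected_edges:
        entries.append((a, b, 1))
        entries.append((b, a, 1))

    v = list(range(num_vertices))  # initial component id = own vertex id
    while True:
        partial = [[] for _ in range(num_vertices)]
        for i, j, m in entries:
            partial[i].append(m * v[j])  # combine2: multiplication
        new_v = []
        for i in range(num_vertices):
            # combineAll: minimum over the partial results (if any).
            candidate = min(partial[i]) if partial[i] else v[i]
            new_v.append(min(v[i], candidate))  # assign: minimum
        if new_v == v:  # converged: no label changed
            return v
        v = new_v
```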

Parallelization of GIM-V (I)

• Using Map-Reduce: Stage I
  • Map: map M(i, j) and V(j) to reducer j

Parallelization of GIM-V (II)

• Using Map-Reduce: Stage I (cont.)
  • Reduce: map "combine2(M(i, j), V(j))" to reducer i

Parallelization of GIM-V (III)

• Using Map-Reduce: Stage II
  • Map:

Parallelization of GIM-V (IV)

• Using Map-Reduce: Stage II (cont.)
  • Reduce:
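The two-stage structure can be emulated in a few lines. Stage I follows the slides (key M(i, j) and V(j) by j, then emit combine2 results keyed by i). The Stage II details are not preserved in this transcript, so the identity map and the combineAll-then-assign reduce below are an assumption based on the usual PEGASUS formulation, as are the function names and the tuple tags.

```python
from collections import defaultdict


def shuffle(pairs):
    """Group (key, value) pairs by key, as the map-reduce shuffle would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def gim_v_two_stage(entries, v, combine2, combine_all, assign):
    """One GIM-V iteration emulated as two consecutive map-reduce jobs."""
    # Stage I map: send both M(i, j) and V(j) to reducer j.
    stage1 = [(j, ("matrix", i, m)) for i, j, m in entries]
    stage1 += [(j, ("vector", vj)) for j, vj in enumerate(v)]

    # Stage I reduce: emit combine2(M(i, j), V(j)) keyed by i, and keep V(j).
    stage2 = []
    for j, values in shuffle(stage1).items():
        vj = next(val[1] for val in values if val[0] == "vector")
        stage2.append((j, ("self", vj)))
        for val in values:
            if val[0] == "matrix":
                _, i, m = val
                stage2.append((i, ("partial", combine2(m, vj))))

    # Stage II map is the identity; Stage II reduce applies combineAll and assign.
    result = list(v)
    for i, values in shuffle(stage2).items():
        old = next(val[1] for val in values if val[0] == "self")
        partials = [val[1] for val in values if val[0] == "partial"]
        result[i] = assign(old, combine_all(partials))
    return result
```

Note that every combine2 result is materialized as an intermediate pair and shuffled a second time before it can be combined.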

Parallelization of GIM-V (V)

• Using Generalized Reduction in Ex-MATE:
  • Reduction:

Parallelization of GIM-V (VI)

• Using Generalized Reduction in Ex-MATE:
  • Finalize:
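The one-stage version might be sketched as below. The pseudocode from the slides is not preserved, so this is a reconstruction under assumptions: the output vector serves as the reduction object, combineAll is expressed as a pairwise commutative and associative operation (`combine_pair`) with an identity value, and all names are hypothetical. Splits are processed sequentially here; a real runtime would process them in parallel.

```python
def gim_v_one_stage(entries, v, combine2, combine_pair, identity, assign,
                    num_splits=2):
    """One GIM-V iteration as a single generalized-reduction pass."""
    n = len(v)
    splits = [entries[k::num_splits] for k in range(num_splits)]

    # Reduction: each split folds combine2 results straight into its own
    # reduction object, with no intermediate pairs and no shuffle.
    robjs = []
    for split in splits:
        robj = [identity] * n
        for i, j, m in split:
            robj[i] = combine_pair(robj[i], combine2(m, v[j]))
        robjs.append(robj)

    # Combination: merge the per-split reduction objects element-wise.
    combined = [identity] * n
    for robj in robjs:
        for i in range(n):
            combined[i] = combine_pair(combined[i], robj[i])

    # Finalize: assign the combined value to each element of the vector.
    return [assign(v[i], combined[i]) for i in range(n)]
```

With `combine_pair` set to addition and identity 0.0 this reproduces plain matrix-vector multiplication; with `min` and `float("inf")` it performs one HCC step. An operation such as the PageRank random-jump term, which is not part of the pairwise fold, would presumably be applied in the finalize step.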


Experiments Design

• Applications:
  • Three graph mining algorithms: PageRank, Diameter Estimation, and Finding Connected Components
• Evaluation:
  • Performance comparison with PEGASUS
    • PEGASUS provides a naïve version and an optimized version
  • Speedups with an increasing number of nodes
  • Scalability: speedups with increasing dataset sizes
• Experimental platform:
  • A cluster of multi-core CPU machines
  • Used up to 128 cores (16 nodes)

Results: Graph Mining (I)

• PageRank: 16GB dataset; a graph of 256 million nodes and 1 billion edges

[Figure: average time per iteration (min) versus number of nodes; annotated with a 10.0 speedup.]

Results: Graph Mining (II)

• HADI: 16GB dataset; a graph of 256 million nodes and 1 billion edges

[Figure: average time per iteration (min) versus number of nodes; annotated with an 11.0 speedup.]

Results: Graph Mining (III)

• HCC: 16GB dataset; a graph of 256 million nodes and 1 billion edges

[Figure: average time per iteration (min) versus number of nodes; annotated with a 9.0 speedup.]

Scalability: Graph Mining (IV)

• HCC: 8GB dataset; a graph of 256 million nodes and 0.5 billion edges

[Figure: average time per iteration (min) versus number of nodes; annotated with speedups of 1.7 and 1.9.]

Scalability: Graph Mining (V)

• HCC: 32GB dataset; a graph of 256 million nodes and 2 billion edges

[Figure: average time per iteration (min) versus number of nodes; annotated with speedups of 1.9 and 2.7.]

Scalability: Graph Mining (VI)

• HCC: 64GB dataset; a graph of 256 million nodes and 4 billion edges

[Figure: average time per iteration (min) versus number of nodes; annotated with speedups of 1.9 and 2.8.]

Observations

• Performance trends are similar for all three applications
  • Consistent with the fact that all three applications are implemented using the GIM-V method
• Ex-MATE outperforms PEGASUS significantly for all three graph mining algorithms
• Reasonable speedups for different datasets
• Better scalability for larger datasets with an increasing number of nodes


Related Work: Academia

• Evaluation of Map-Reduce-like models in various parallel programming environments:
  • Phoenix-rebirth for large-scale multi-core machines
  • Mars for a single GPU
  • MITHRA for GPGPUs in heterogeneous platforms
  • Recent IDAV for GPU clusters
• Improvement of Map-Reduce API:
  • Integrating pre-fetch and pre-shuffling into Hadoop
  • Supporting online queries
  • Enforcing a less restrictive synchronization semantics between Map and Reduce

Related Work: Industry

• Google’s Pregel System:
  • Map-reduce may not be so suitable for graph operations
  • Proposed to target graph processing
  • Open source version: HAMA project in Apache
• Variants of Map-Reduce:
  • Dryad/DryadLINQ from Microsoft
  • Sawzall from Google
  • Pig/Map-Reduce-Merge from Yahoo!
  • Hive from Facebook


Conclusion

• Ex-MATE supports the management of reduction objects of arbitrary sizes
  • Deals with disk-resident reduction objects
• Outperforms PEGASUS for both the naïve and optimized implementations for all three graph mining applications
  • Has simpler code
• Offers a promising alternative for developing efficient data-intensive applications
  • Uses GIM-V for parallelizing graph mining

Thank You, and Acknowledgments

• Questions and comments
  • Wei Jiang - [email protected]
  • Gagan Agrawal - [email protected]
• This project was supported by: