Comparison and Evaluation of Open Source Implementations of Pregel and Related Systems

December 2, 2013

Joshua Woo, Prashant Raghav, Vishnu Prathish

David R. Cheriton School of Computer Science, University of Waterloo

Outline

● Motivation
● Our Project
● Setup
● Preliminary Results
● Preliminary Analysis
● In-Progress
● References

Motivation

Recall: Pregel
● Large-scale graph processing system
● Fault-tolerant framework for graph algorithms
● MapReduce for graph operations?
● Vertex-centric model (“think like a vertex”)
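To make "think like a vertex" concrete, here is a minimal single-process sketch of SSSP written as synchronous supersteps. It is not code from Pregel or from any of the systems compared here; the function name `pregel_sssp` and the dictionary-based graph layout are assumptions made for illustration.

```python
import math
from collections import defaultdict


def pregel_sssp(edges, source):
    """Toy vertex-centric SSSP (hypothetical helper, not a real Pregel API).

    edges: dict mapping vertex -> list of (neighbor, weight) out-edges.
    Returns (distances, number_of_supersteps).
    """
    vertices = set(edges)
    for outs in edges.values():
        vertices.update(w for w, _ in outs)

    dist = {v: math.inf for v in vertices}
    inbox = {source: [0.0]}          # messages delivered in the next superstep
    supersteps = 0

    while inbox:                     # halt once no messages are in flight
        outbox = defaultdict(list)
        for v, messages in inbox.items():
            best = min(messages)     # each vertex only sees its own messages
            if best < dist[v]:
                dist[v] = best
                for w, weight in edges.get(v, []):
                    outbox[w].append(best + weight)
        inbox = outbox               # barrier: messages become visible together
        supersteps += 1

    return dist, supersteps


graph = {"a": [("b", 1.0), ("c", 4.0)], "b": [("c", 2.0)], "c": []}
distances, steps = pregel_sssp(graph, "a")
print(distances["c"], steps)  # 3.0 3
```

A vertex with no incoming messages does no work in a superstep, which is why the superstep count depends on the graph's shape as well as its size.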

Motivation

● Pregel is proprietary
● Many open source graph processing systems
  ○ Pregel clones
  ○ Pregel-inspired
  ○ BSP

Motivation

● Apache Hama
● Signal/Collect
● Apache Giraph
● GPS
● GraphLab
● Phoebus
● GoldenOrb
● HipG
● Mizan

Motivation

System          Impl. Language  Type
Apache Hama     Java            Pure BSP framework
Signal/Collect  Scala           Pregel-inspired
Apache Giraph   Java            Pregel clone
GPS             Java            Advanced Pregel clone
GraphLab        C++             Pregel-inspired
Phoebus         Erlang          Pregel clone
GoldenOrb       Java            Pregel clone
HipG            Java            Advanced Pregel clone
Mizan           C++             Advanced Pregel clone

Motivation

● How do these systems compare?
  ○ In terms of performance (runtime)?
  ○ In terms of memory footprint?
  ○ In terms of network utilization (num. messages)?
  ○ Variables:
    ■ Algorithm
    ■ Graph size (number of vertices)
    ■ Cluster size

Our Project

● Compare at least 3 systems
  ○ Apache Hama - general BSP framework
  ○ Apache Giraph - runs as a Hadoop map-only job, used at Facebook
  ○ GPS - +dynamic repartitioning, +multi vertex-centric
  ○ Signal/Collect - +edges, +async computations
  ○ GraphLab
  ○ Mizan

Our Project

● Measure the runtime of at least two algorithms on each system
  ○ PageRank
    ■ Fixed number of supersteps = 30
  ○ Single Source Shortest Path (SSSP)
  ○ k-means clustering
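For reference, PageRank with a fixed superstep count can be sketched as below. This is a plain single-process illustration, not the implementation run on Hama, Giraph, or GPS. The damping factor of 0.85 and the choice not to redistribute rank from vertices without out-edges are assumptions; the slides only fix the number of supersteps at 30.

```python
def pagerank(out_edges, supersteps=30, damping=0.85):
    """Toy synchronous PageRank (hypothetical helper).

    out_edges: dict mapping vertex -> list of target vertices.
    Runs exactly `supersteps` rounds, with no convergence check.
    """
    vertices = set(out_edges)
    for targets in out_edges.values():
        vertices.update(targets)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}

    for _ in range(supersteps):
        incoming = {v: 0.0 for v in vertices}
        for v in vertices:
            targets = out_edges.get(v, [])
            if targets:
                share = rank[v] / len(targets)   # message sent along each out-edge
                for w in targets:
                    incoming[w] += share
        # All vertices update together, as after a superstep barrier.
        rank = {v: (1 - damping) / n + damping * incoming[v] for v in vertices}

    return rank


ranks = pagerank({"a": ["b"], "b": ["c"], "c": ["a"]})
print(round(sum(ranks.values()), 6))  # 1.0
```

Fixing the superstep count makes runtimes comparable across systems, because every system does the same number of rounds whatever convergence criterion it would otherwise use.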

Setup

● Experiments on AWS
  ○ Ubuntu 12.04 m1.medium EC2 instances
    ■ 2 ECUs, 1 vCPU, 3.7 GiB memory, moderate network performance
    ■ 8 GiB EBS volume per instance
  ○ Cluster sizes:
    ■ Single-node cluster
    ■ 4-node cluster
    ■ 8-node cluster

Setup

● Experiments on AWS
  ○ 5 runs per dataset per algorithm per cluster
    ■ 35 runs per algorithm per cluster
    ■ 70 runs per cluster
    ■ 140 runs in total (single-node, 4-node)
● TODO: another 70 runs (8-node)
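The run counts above follow from enumerating the experiment matrix, sketched here. The totals match a count for a single system; whether the slide's figures are per system is not stated, so that reading is an assumption, as are the names used in the snippet.

```python
from itertools import product

DATASETS = ["tinyEWD", "mediumEWD", "1000EWD", "rome99",
            "10000EWD", "NYC", "largeEWD"]
ALGORITHMS = ["SSSP", "PageRank"]
RUNS_PER_CONFIG = 5


def run_matrix(clusters):
    """List every (cluster, algorithm, dataset, run number) combination."""
    return list(product(clusters, ALGORITHMS, DATASETS,
                        range(1, RUNS_PER_CONFIG + 1)))


done = run_matrix(["single-node", "4-node"])
todo = run_matrix(["8-node"])
print(len(done), len(todo))  # 140 70
```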

Setup

● Dataset
  ○ 7 datasets
    ■ tinyEWD: 8 vertices, 15 edges
    ■ mediumEWD: 250 vertices, 2,546 edges
    ■ 1000EWD: 1,000 vertices, 16,866 edges
    ■ rome99: 3,353 vertices, 8,870 edges
    ■ 10000EWD: 10,000 vertices, 16,866 edges
    ■ NYC: 264,346 vertices, 733,846 edges
    ■ largeEWD: 1,000,000 vertices, 15,172,126 edges
  ○ Source: http://algs4.cs.princeton.edu/44sp/

Setup

● Systems
  ○ Hama
    ■ Hadoop 1.03.0
    ■ Hama 0.6.3
  ○ Giraph
    ■ Hadoop 0.20.203rc1
    ■ Giraph (trunk@37bc2c80564b45d7e4ce95db76f5411a6b8bdb3a)
  ○ GPS
    ■ Hadoop 0.20.203rc1
    ■ GPS (trunk@Revision 112)

Setup

● Input Graph
  ○ Source files converted into a format suitable for each system
    ■ Time for this conversion excluded from results:
      ● Conversion done before algorithms are run (pre-processing?)
      ● Negligible for largeEWD (1,000,000 vertices, 15,172,126 edges)
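As an illustration of the conversion step, the sketch below turns an EWD file (vertex count, edge count, then one `v w weight` triple per edge) into one JSON adjacency line per vertex. The slides do not give the target format for each system, so the `[id, value, [[target, weight], ...]]` layout and the function name are assumptions. The sketch covers only the EWD files, not rome99 or NYC.

```python
import json


def ewd_to_adjacency_lines(text, initial_value=0):
    """Convert EWD edge-list text into JSON adjacency lines (hypothetical helper).

    Output line shape (assumed): [vertex_id, vertex_value, [[target, weight], ...]]
    """
    tokens = text.split()
    num_vertices, num_edges = int(tokens[0]), int(tokens[1])
    body = tokens[2:]
    if len(body) != 3 * num_edges:
        raise ValueError("edge count in header does not match edge list")

    adjacency = {v: [] for v in range(num_vertices)}
    for i in range(0, len(body), 3):
        v, w, weight = int(body[i]), int(body[i + 1]), float(body[i + 2])
        adjacency[v].append([w, weight])

    # Emit every vertex, including those with no out-edges.
    return [json.dumps([v, initial_value, adjacency[v]])
            for v in range(num_vertices)]


sample = "3\n2\n0 1 0.5\n1 2 0.25\n"
for line in ewd_to_adjacency_lines(sample):
    print(line)
```

Running a converter like this once, before any timed job starts, is what keeps the conversion cost out of the reported runtimes.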

Preliminary Results

Dataset     Hama       Giraph   GPS
tinyEWD     14.17      41.60    14.40
mediumEWD   16.36      44.00    36.00
1000EWD     18.06      48.80    46.60
rome99      22.95      66.00    50.00
10000EWD    25.32      67.40    55.00
NYC         165.01     267.00   310.00
largeEWD    6,109.20   602.80   618.70

Average SSSP runtime on 4-node cluster (in seconds)

Preliminary Results

[Figure: SSSP runtime vs. graph size (num. vertices)]

Preliminary Results


Dataset     Hama       Giraph     GPS
tinyEWD     29.36      49.40      58.57
mediumEWD   30.26      53.40      60.42
1000EWD     37.86      54.60      61.03
rome99      29.35      56.20      61.80
10000EWD    302.33     61.80      64.80
NYC         1,001.24   134.40     68.69
largeEWD    Failed     2,100.00   1,213.56

Average PageRank (30 supersteps) runtime on 4-node cluster (in seconds)

Preliminary Results

[Figure: PageRank runtime vs. graph size (num. vertices)]

Preliminary Analysis

● A point of resource crunch
  ○ No significant change in performance until a point
● Hama does not scale well (vertices ~10^4)
● Giraph and GPS scale better
● In general, PageRank runtime > SSSP runtime
● GPS input reader does not guarantee true partitioning for large datasets
● Which ‘knobs’ to keep constant? - Optimization vs. Comparability
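One rough way to locate the "point of resource crunch" is to look for the first dataset where runtime jumps sharply relative to the previous, smaller one. The sketch below does this for the 4-node PageRank averages. The 3x threshold and the function name are arbitrary assumptions, not part of the original analysis.

```python
DATASETS = ["tinyEWD", "mediumEWD", "1000EWD", "rome99",
            "10000EWD", "NYC", "largeEWD"]

# Average PageRank runtimes in seconds on the 4-node cluster.
# None marks the failed Hama run on largeEWD.
PAGERANK_4NODE = {
    "Hama":   [29.36, 30.26, 37.86, 29.35, 302.33, 1001.24, None],
    "Giraph": [49.40, 53.40, 54.60, 56.20, 61.80, 134.40, 2100.00],
    "GPS":    [58.57, 60.42, 61.03, 61.80, 64.80, 68.69, 1213.56],
}


def first_jump(runtimes, factor=3.0):
    """Return the first dataset whose runtime is >= factor x the previous one."""
    for i in range(1, len(runtimes)):
        prev, cur = runtimes[i - 1], runtimes[i]
        if prev is None or cur is None:
            continue  # a failed run gives no ratio to compare
        if cur >= factor * prev:
            return DATASETS[i]
    return None


for system, runtimes in PAGERANK_4NODE.items():
    print(system, first_jump(runtimes))
# Hama 10000EWD
# Giraph largeEWD
# GPS largeEWD
```

Under this threshold, Hama's jump shows up at 10000EWD, while Giraph and GPS stay flat until largeEWD, in line with the observations above.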

In-Progress

● Output validation
● Memory footprint
● Network utilization (num. messages)
● GraphLab and Signal/Collect
● Green-Marl?
  ○ (DSL) → [Compiler] → (Giraph, GPS)

Questions?

Extras

Preliminary Results


Dataset     Hama   Giraph   GPS
tinyEWD     10     7        7
mediumEWD   16     13       18
1000EWD     27     25       23
rome99      105    102      18
10000EWD    85     80       64
NYC         671    905      438
largeEWD    806    670      730

Number of supersteps for SSSP

Preliminary Results

[Figure: Number of supersteps for SSSP]

Really, really Preliminary

PageRank runtime (in seconds) on GPS: native vs. Green-Marl generated

Dataset     Native     Green-Marl generated
tinyEWD     58.57      60.20
mediumEWD   60.42      60.11
1000EWD     61.03      62.30
rome99      61.80      62.32
10000EWD    64.80      65.78
NYC         68.69      71.34
largeEWD    1,213.56   -

Really, really Preliminary

[Figure: PageRank runtime (in seconds) on GPS: native vs. Green-Marl generated]

References

● Our Project Proposal
● http://algs4.cs.princeton.edu/44sp/
● https://github.com/apache/hadoop-common
● https://github.com/apache/giraph
● https://subversion.assembla.com/svn/phd-projects/gps/trunk/
● http://ppl.stanford.edu/main/green_marl.html



