improving graphchi for large graph...
TRANSCRIPT
Improving GraphChi for Large Graph ProcessingFast Radix Sort in Pre-Processing
Stuart Thiel Greg Butler Larry Thiel
Department of Computer Science Software EngineeringConcordia University
July 11, 2016 / IDEAS 2016
Stuart Thiel, Greg Butler, Larry Thiel Improving GraphChi for Large Graph Processing 1 / 33
Who Am I
Stuart Thiel
PhD. Student
Part-time Faculty
Stuart Thiel, Greg Butler, Larry Thiel Improving GraphChi for Large Graph Processing 2 / 33
Contributions
Fast Radix: plug-in replacement Radix Sort
Improved GraphChi performance
Stuart Thiel, Greg Butler, Larry Thiel Improving GraphChi for Large Graph Processing 3 / 33
In this talk
GraphChi: context of large graph processing
GraphChi: pre-processing examined
Fast Radix: our contribution, an improved LSD Radix Sort
Experimental Results: how Fast Radix improved GraphChi
Stuart Thiel, Greg Butler, Larry Thiel Improving GraphChi for Large Graph Processing 4 / 33
Improving GraphChi
The Cloud / Parallel computing for Large GraphsPregelGiraphGraphLabGraphX
GraphChi: competitive performance, one PC
Improvements can make it more competitive
Stuart Thiel, Greg Butler, Larry Thiel Improving GraphChi for Large Graph Processing 5 / 33
Large Graph Processing
What is large?
This context: 10s of billions of edges
Operations such as Page Rank
Stuart Thiel, Greg Butler, Larry Thiel Improving GraphChi for Large Graph Processing 6 / 33
A Focus on Pre-processing
GraphChi Pre-processing 20–80% of total processing time
Only needs to be done once per graph
Can use different graph processing operations after
Sorting happens in pre-processing
Stuart Thiel, Greg Butler, Larry Thiel Improving GraphChi for Large Graph Processing 7 / 33
Big Data, Small Details
Many approaches
Some basics used frequentlyWe consider
General: graph processingSpecific: sorting
Stuart Thiel, Greg Butler, Larry Thiel Improving GraphChi for Large Graph Processing 8 / 33
Saving Resources
If we sort frequentlyBetter sorting means saving
TimeMoney
∼ 20% server time spent sorting [1, 2].
Stuart Thiel, Greg Butler, Larry Thiel Improving GraphChi for Large Graph Processing 9 / 33
Graph Chi
Single-machine graph processing
Edge-wise graph processing
Sorts to grid
Processes graph using “parallel sliding windows” approach [3]
Stuart Thiel, Greg Butler, Larry Thiel Improving GraphChi for Large Graph Processing 10 / 33
Graph Chi preprocssing 1
“parallel sliding windows” needs organized data
Sorting during pre-processing creates a “grid”
Processing follows
Stuart Thiel, Greg Butler, Larry Thiel Improving GraphChi for Large Graph Processing 11 / 33
Graph Chi preprocssing 2
Pre-processing involves:Converting graph input into edge formatFilling a shovel buffer with edgesSorting edges by destination into shovelsPerforming a k-way merge of shovels into a shard buffersorting edges in shard buffer into shards
Stuart Thiel, Greg Butler, Larry Thiel Improving GraphChi for Large Graph Processing 12 / 33
Graph Chi Example
1 2
3
4
5
6
Figure : An example Graph
Stuart Thiel, Greg Butler, Larry Thiel Improving GraphChi for Large Graph Processing 13 / 33
Graph Chi Process
graph.txt
1 2
1 3
4 6
2 4
3 6
3 4
3 5
2 3
4 5
1,2 1,3 4,6 2,4 3,6 3,4 3,5 2,3 4,5
1,2
1,3
2,4
4,6
3,6
2,3
3,4
3,5
4,5 1,2 1,3 2,3 2,4 3,4 3,5 4,5 4,6 3,6
3,6
4,6
2,4
3,4
3,5
4,5
1,2
1,3
2,3
conver
t to
edge
s
shovelbuffer
shovel 1 shovel 2
shard 1 shard 2 shard 3
shovels
shards
shardsink
sort on dst sort on dst
k-way merge
sort on src
sort on src
sort
onsr
c
Figure : Creating Shards for our example graph
Stuart Thiel, Greg Butler, Larry Thiel Improving GraphChi for Large Graph Processing 14 / 33
GraphChi Sorting
GraphChi uses PBBS Radix Sort
Problem Based Benchmark Suite (PBBS)
Stuart Thiel, Greg Butler, Larry Thiel Improving GraphChi for Large Graph Processing 15 / 33
Least Significant Digit Radix Sort
Scan right-most digit
Deal into buckets
Consider next digit and repeat
Like sorting cards by value, then suit
Stuart Thiel, Greg Butler, Larry Thiel Improving GraphChi for Large Graph Processing 16 / 33
PBBS Radix Sort
Algorithm 1 PBBS Radix Sort
Input: input, val()/* Initialization */
1: buf ← initArray[input.length]2: counts← initArray[bucketCount]3: bIndexT ← initArray[input.length]Output: sorted input
/* deal based on current radix and counts */4: for j = 0 to passes−1 do5: zero(counts)
/* build histogram, store value */6: for i = 0 while i < input.length do7: bIndexT [i]← bitsFor(val(input[i]), j)8: counts[bIndexT [i]]++9: end for
/* convert counts to indices */10: for i = 0 to N−1 do11: convertToIndices(counts)12: end for
/* deal to buf */13: for i = 0 while i < input.length do14: buf [counts[bIndexT [i]]] = i15: end for16: swap(input,buf )17: end for
Stuart Thiel, Greg Butler, Larry Thiel Improving GraphChi for Large Graph Processing 17 / 33
Fast Radix
We introduce Fast RadixA better Radix Sort
Stuart Thiel, Greg Butler, Larry Thiel Improving GraphChi for Large Graph Processing 18 / 33
Fast Radix
Estimate bucket sizeSkips initial counting passManages overflow
Given three or four passes:removing a complete read through data is very noticeable
Stuart Thiel, Greg Butler, Larry Thiel Improving GraphChi for Large Graph Processing 19 / 33
Fast Radix Example
Estimate bucket sizes
Deal into buckets based on LSD
Deal from buckets and overflow based on next digit
Repeat untill all significant digits dealt
Everything is sorted
Stuart Thiel, Greg Butler, Larry Thiel Improving GraphChi for Large Graph Processing 20 / 33
Fast Radix Example: Estimation
ind input
0 A1
1 D7
2 5D
3 89
4 B6
5 18
6 A3
7 39
8 98
9 F3
A 52
B F9
C F6
D 67
E 9A
F 05
buf input
39
A1 98
52 F3
A3 F9
F6
05 67
B6 A3
D7 39
18 98
89 F3
9A 52
F9
F6
5D 67
9A
05
Figure : Dealing and counting pass. The ind is the index. Dealing is frominput to buf, shown by arrows. The second input shows overflow dealt back tothe beginning of that buffer.
Stuart Thiel, Greg Butler, Larry Thiel Improving GraphChi for Large Graph Processing 21 / 33
Fast Radix Example: Overflow Management
buff
A1
52
A3
05
B6
D7
18
89
9A
5D
input
05
18
39
52
5D
67
89
98
9A
A1
A3
B6
D7
F3
F6
F9
over
F3
F6
67
98
39
F9
1
2
3
5
6
7
8
9
A
D
3
6
7
8
9
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
Figure : Dealing pass. The buckets’ low-order hex digits are shown withbraces on B and Over. The interleaving is shown in the dealing, representedby numbered arrows.
Stuart Thiel, Greg Butler, Larry Thiel Improving GraphChi for Large Graph Processing 22 / 33
Data Sets
Social media/Web graphs
Between 1M–150M verticesBetween 5M–5.5B edges
google [4]: web graphlive-journal [5]: social media graphhollywood [6, 7]: related actorstwitter-2010 [8]: social media graphfriendster [9]: social media graphuk-union [10]: web graph
Stuart Thiel, Greg Butler, Larry Thiel Improving GraphChi for Large Graph Processing 23 / 33
Setup
4 Intel Xeon E5-2697 2.6GHz processors x14 core
768GB RAM
6x2 TB HDD
Stuart Thiel, Greg Butler, Larry Thiel Improving GraphChi for Large Graph Processing 24 / 33
Results: Sorting
Graph Name Vertices Edges GraphChi:Sort
Fast Radix:Sort
∆ Sort overflow
google 900K 5M 0.15s 0.077s 99% 0.65%live-journal 4.8M 69M 5.4s 5.4s 0% 0.72%hollywood 1.1M 113M 7.2s 4.9s 47% 0.50%
twitter-2010 42M 1.5B 128s 106s 21% 1.54%friendster 66M 1.8B 165s 133s 24% 0.12%uk-union 134M 5.5B 469s 329s 43% 0.49%
Graph Name: Name of the GraphVertices: Number of Vertices in GraphEdges : Number of Edges in GraphGraphChi: Sort : Total sorting time for GraphChi using PBBS radix sortFast Radix: Sort : Total sorting time for GraphChi using Fast Radix∆ Time : percentage improvement GraphChi:Sort
FastRadix:Sortoverflow : overflow percentage during Fast Radix
Stuart Thiel, Greg Butler, Larry Thiel Improving GraphChi for Large Graph Processing 25 / 33
Results: Pre-processing
Graph Name Vertices Edges GraphChi:Pre-processing
Fast Radix:Pre-processing
∆ Time
google 900K 5M 1.7s 1.5s 14%live-journal 4.8M 69M 32s 32s 0%hollywood 1.1M 113M 45s 38s 18%
twitter-2010 42M 1.5B 759s 693s 9.4%friendster 66M 1.8B 1022s 921s 11%uk-union 134M 5.5B 2549s 2135s 19%
Graph Name: Name of the GraphVertices: Number of Vertices in GraphEdges : Number of Edges in GraphGraphChi: Pre-processing : Total pre-processing for GraphChi using PBBS radix sortFast Radix: Pre-processing : Total pre-processing for GraphChi using Fast Radix
∆ Time : percentage improvement GraphChi:Pre−processingFastRadix:Pre−processing
Stuart Thiel, Greg Butler, Larry Thiel Improving GraphChi for Large Graph Processing 26 / 33
Unexpected Results
Graph Name Vertices Edges GraphChi:Sort
Fast Radix:Sort
∆ Sort overflow
google 900K 5M 0.15s 0.077s 99% 0.65%live-journal 4.8M 69M 5.4s 5.4s 0% 0.72%hollywood 1.1M 113M 7.2s 4.9s 47% 0.50%
twitter-2010 42M 1.5B 128s 106s 21% 1.54%friendster 66M 1.8B 165s 133s 24% 0.12%uk-union 134M 5.5B 469s 329s 43% 0.49%
Expected timing difference with fewer than 224 nodes28 bit digits used in a dealing passless than 224−1 = 28+8+8−1 three dealing passesmore than 224 means four dealing passes
livejournal fared worse than expected
uk-union did better than expected
in both cases, we suspect cache-issues, but unconfirmed
Stuart Thiel, Greg Butler, Larry Thiel Improving GraphChi for Large Graph Processing 27 / 33
Conclusion
In general Fast Radix was much better
It was no worse than PBBS in the worst case
Stuart Thiel, Greg Butler, Larry Thiel Improving GraphChi for Large Graph Processing 28 / 33
Questions?
http://users.encs.concordia.ca/~sthiel/IDEAS2016/
Stuart Thiel, Greg Butler, Larry Thiel Improving GraphChi for Large Graph Processing 29 / 33
Bibliography I
S. Chen and J. Reif, “Using difficulty of prediction to decreasecomputation: fast sort, priority queue and convex hull on entropybounded inputs,” in Proceedings, 34th Annual Symposium onFoundations of Computer Science, 1993., pp. 104–112, 1993.
B. Wagar, “System for MSD radix sort bin storage management,”1995.US Patent 5,440,734.
A. Kyrola, G. Blelloch, and C. Guestrin, “Graphchi: Large-scalegraph computation on just a pc,” in Presented as part of the 10thUSENIX Symposium on Operating Systems Design andImplementation (OSDI 12), pp. 31–46, 2012.
Stuart Thiel, Greg Butler, Larry Thiel Improving GraphChi for Large Graph Processing 30 / 33
Bibliography II
J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney,“Community Structure in Large Networks: Natural Cluster Sizes andthe Absence of Large Well-Defined Clusters,” CoRR,vol. abs/0810.1355, 2008.
L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan, “Groupformation in large social networks: membership, growth, andevolution,” in Proceedings of the 12th ACM SIGKDD InternationalConference on Knowledge discovery and data mining, pp. 44–54,ACM, 2006.
P. Boldi and S. Vigna, “The WebGraph framework I: Compressiontechniques,” in Proc. of the Thirteenth International World WideWeb Conference (WWW 2004), (Manhattan, USA), pp. 595–601,ACM Press, 2004.
Stuart Thiel, Greg Butler, Larry Thiel Improving GraphChi for Large Graph Processing 31 / 33
Bibliography III
P. Boldi, M. Rosa, M. Santini, and S. Vigna, “Layered LabelPropagation: A Multiresolution Coordinate-Free Ordering forCompressing Social Networks,” in Proceedings of the 20thInternational Conference on World Wide Web (S. Srinivasan,K. Ramamritham, A. Kumar, M. P. Ravindra, E. Bertino, andR. Kumar, eds.), pp. 587–596, ACM Press, 2011.
H. Kwak, C. Lee, H. Park, and S. Moon, “What is Twitter, a socialnetwork or a news media?,” in WWW ’10: Proceedings of the 19thInternational Conference on World Wide Web, (New York, NY,USA), pp. 591–600, ACM, 2010.
J. Yang and J. Leskovec, “Defining and Evaluating NetworkCommunities based on Ground-truth,” CoRR, vol. abs/1205.6233,2012.
P. Boldi, M. Santini, and S. Vigna, “A Large Time-Aware Graph,”SIGIR Forum, vol. 42, no. 2, pp. 33–38, 2008.
Stuart Thiel, Greg Butler, Larry Thiel Improving GraphChi for Large Graph Processing 32 / 33
Bibliography IV
Stuart Thiel, Greg Butler, Larry Thiel Improving GraphChi for Large Graph Processing 33 / 33