replication-based fault-tolerance for large-scale graph processing
![Page 1: Replication-based Fault-tolerance for Large-scale Graph Processing](https://reader035.vdocuments.mx/reader035/viewer/2022062520/568162ac550346895dd32e1f/html5/thumbnails/1.jpg)
Replication-based Fault-tolerance for Large-scale Graph Processing
Peng Wang, Kaiyuan Zhang, Rong Chen, Haibo Chen, Haibing Guan
Shanghai Jiao Tong University
![Page 2: Replication-based Fault-tolerance for Large-scale Graph Processing](https://reader035.vdocuments.mx/reader035/viewer/2022062520/568162ac550346895dd32e1f/html5/thumbnails/2.jpg)
Graph
• Useful information in graph
• Many applications
– SSSP
– Community Detection
– ……
![Page 3: Replication-based Fault-tolerance for Large-scale Graph Processing](https://reader035.vdocuments.mx/reader035/viewer/2022062520/568162ac550346895dd32e1f/html5/thumbnails/3.jpg)
Graph computing
• Graphs are large
– Require a lot of machines
• Fault tolerance is important
![Page 4: Replication-based Fault-tolerance for Large-scale Graph Processing](https://reader035.vdocuments.mx/reader035/viewer/2022062520/568162ac550346895dd32e1f/html5/thumbnails/4.jpg)
How graph computing works
[Figure: execution flow on two workers, W1 and W2. Each worker runs LoadGraph, then repeats Compute, SendMsg, EnterBarrier, Commit, LeaveBarrier in every iteration. Example graph with vertices 1, 2, 3; legend: Master, Replica.]
PageRank(i)
  // compute its own rank
  total = 0
  foreach (j in in_neighbors(i)):
    total = total + R[j] * Wji
  R[i] = 0.15 + total
  // trigger neighbors to run again
  if R[i] not converged then
    activate all neighbors
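The vertex program above can be sketched as a small single-process Python simulation. This is an illustration only, not the system's code: the function name `pagerank`, the tolerance, and the choice of edge weight `Wji = 0.85 / out_degree(j)` are assumptions added here. The loop mimics the Compute and Commit phases around a barrier.

```python
from collections import defaultdict

DAMPING = 0.85  # assumption: the damping factor is folded into the edge weight Wji


def pagerank(edges, tol=1e-6, max_supersteps=200):
    """Synchronous PageRank with activation, following the slide's vertex program."""
    in_nbrs, out_nbrs = defaultdict(list), defaultdict(list)
    for src, dst in edges:
        out_nbrs[src].append(dst)
        in_nbrs[dst].append(src)
    vertices = set(in_nbrs) | set(out_nbrs)
    rank = {v: 1.0 for v in vertices}
    active = set(vertices)          # every vertex runs in the first iteration
    for _ in range(max_supersteps):
        if not active:
            break
        new_rank, next_active = {}, set()
        for i in active:            # Compute phase: read ranks of the previous iteration
            total = sum(rank[j] * DAMPING / len(out_nbrs[j]) for j in in_nbrs[i])
            new_rank[i] = 0.15 + total
            if abs(new_rank[i] - rank[i]) > tol:
                next_active.update(out_nbrs[i])  # not converged: activate neighbors
        rank.update(new_rank)       # Commit phase: updates become visible after the barrier
        active = next_active
    return rank


ranks = pagerank([(1, 2), (1, 3), (2, 3), (3, 1)])
```

In a distributed run each worker would execute the Compute loop only for its own master vertices and read neighbor ranks from local replicas; the sketch keeps everything in one dictionary for brevity.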
![Page 5: Replication-based Fault-tolerance for Large-scale Graph Processing](https://reader035.vdocuments.mx/reader035/viewer/2022062520/568162ac550346895dd32e1f/html5/thumbnails/5.jpg)
Related work about fault tolerance
• Simple re-execution (MapReduce)
– Ill-suited to graphs: complex data dependency
• Coarse-grained FT (Spark)
– Ill-suited to graphs: fine-grained update on each vertex
• State-of-the-art fault tolerance for graph computing
– Checkpoint
– Trinity, PowerGraph, Pregel, etc.
![Page 6: Replication-based Fault-tolerance for Large-scale Graph Processing](https://reader035.vdocuments.mx/reader035/viewer/2022062520/568162ac550346895dd32e1f/html5/thumbnails/6.jpg)
How checkpoint works
[Figure: timeline for workers W1 and W2 across Loading Graph, Iter X, and Iter X+1. Labels: checkpoint (Partition && Topology, Iter X) written to the DFS at a global barrier; Crash; Recovery from the DFS.]
![Page 7: Replication-based Fault-tolerance for Large-scale Graph Processing](https://reader035.vdocuments.mx/reader035/viewer/2022062520/568162ac550346895dd32e1f/html5/thumbnails/7.jpg)
Problems of checkpoint
• Large execution overhead
– Large amount of states to write
– Synchronization overhead
[Figure: PageRank on LiveJournal. Execution time (sec, axis 0 to 200) versus Checkpoint period (NO, 1, 2, 4), broken down into ckpt, sync, comm, and comp.]
![Page 8: Replication-based Fault-tolerance for Large-scale Graph Processing](https://reader035.vdocuments.mx/reader035/viewer/2022062520/568162ac550346895dd32e1f/html5/thumbnails/8.jpg)
Problems of checkpoint
• Large overhead
• Slow recovery
– A lot of I/O operations
– Require standby node
[Figure: Recovery Time (seconds, axis 0 to 70) versus Checkpoint Period. Categories: avg-time, 1 iteration, 2 iterations, 3 iterations; legend includes w/o CKPT.]
![Page 9: Replication-based Fault-tolerance for Large-scale Graph Processing](https://reader035.vdocuments.mx/reader035/viewer/2022062520/568162ac550346895dd32e1f/html5/thumbnails/9.jpg)
Observation and motivation
• Reuse existing replicas to provide fault tolerance
• Reuse existing replicas → small overhead
• Replicas distributed in different machines → fast recovery
[Figure: Vertices without replica (axis 0% to 16%) for GWeb, LJournal, Wiki, SYN-GL, DBLP, RoadCA. Labeled values: 0.84%, 0.96%, 0.26%, 0.13%; the transcript does not preserve which dataset each value belongs to.]
Almost all the vertices have replicas
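A minimal sketch of how the "vertices without replica" ratio could be measured for a given partitioning. The placement rule is an assumption made for illustration: masters are hash-partitioned by vertex id, and a vertex gets a replica on every other node that owns one of its out-neighbors. The helper names `owner`, `replica_map`, and `fraction_without_replica` are hypothetical.

```python
def owner(v, num_nodes):
    """Assumed hash partitioning: the master of v lives on node v % num_nodes."""
    return v % num_nodes


def replica_map(edges, num_nodes):
    """Return {vertex: set of nodes holding a replica of it}."""
    replicas = {}
    for src, dst in edges:
        replicas.setdefault(src, set())
        replicas.setdefault(dst, set())
        if owner(dst, num_nodes) != owner(src, num_nodes):
            # dst's node consumes src's state, so it needs a local copy of src
            replicas[src].add(owner(dst, num_nodes))
    return replicas


def fraction_without_replica(edges, num_nodes):
    replicas = replica_map(edges, num_nodes)
    missing = [v for v, nodes in replicas.items() if not nodes]
    return len(missing) / len(replicas)


edges = [(0, 1), (1, 2), (2, 0), (3, 0)]
print(fraction_without_replica(edges, num_nodes=2))  # 0.25: only vertex 2 has no replica
```

On this toy graph one vertex in four lacks a replica; the slide reports far smaller fractions on the evaluated datasets.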
![Page 10: Replication-based Fault-tolerance for Large-scale Graph Processing](https://reader035.vdocuments.mx/reader035/viewer/2022062520/568162ac550346895dd32e1f/html5/thumbnails/10.jpg)
Contribution
• Imitator: a replication-based fault tolerance system for graph processing
• Small overhead
– Less than 5% for all cases
• Fast recovery
– Up to 17.5 times faster than checkpoint-based recovery
![Page 11: Replication-based Fault-tolerance for Large-scale Graph Processing](https://reader035.vdocuments.mx/reader035/viewer/2022062520/568162ac550346895dd32e1f/html5/thumbnails/11.jpg)
Outline
• Execution flow
• Replication management
• Recovery
• Evaluation
• Conclusion
![Page 12: Replication-based Fault-tolerance for Large-scale Graph Processing](https://reader035.vdocuments.mx/reader035/viewer/2022062520/568162ac550346895dd32e1f/html5/thumbnails/12.jpg)
Normal execution flow
[Figure: two workers each run LoadGraph, then Compute, SendMsg, EnterBarrier, Commit, LeaveBarrier.]
1. Adding FT support
2. Extending normal synchronization message
![Page 13: Replication-based Fault-tolerance for Large-scale Graph Processing](https://reader035.vdocuments.mx/reader035/viewer/2022062520/568162ac550346895dd32e1f/html5/thumbnails/13.jpg)
Failure before barrier
[Figure: two workers run LoadGraph, Compute, SendMsg, EnterBarrier, Commit, LeaveBarrier. One worker crashes before the barrier. Labels: Crash, enterBarrier, Rollback && Recovery, Recovery, Newbie joins; afterwards the iteration runs again from Compute through LeaveBarrier.]
![Page 14: Replication-based Fault-tolerance for Large-scale Graph Processing](https://reader035.vdocuments.mx/reader035/viewer/2022062520/568162ac550346895dd32e1f/html5/thumbnails/14.jpg)
Failure during barrier
[Figure: two workers run LoadGraph, Compute, SendMsg, EnterBarrier, Commit, LeaveBarrier. One worker crashes during the barrier. Labels: Crash, leaveBarrier, Recovery, Newbie boot; afterwards execution continues with Compute through LeaveBarrier.]
![Page 15: Replication-based Fault-tolerance for Large-scale Graph Processing](https://reader035.vdocuments.mx/reader035/viewer/2022062520/568162ac550346895dd32e1f/html5/thumbnails/15.jpg)
Management of replication
• Fault tolerance replicas
– Every vertex has at least f replicas to tolerate f failures
• Full state replica (mirror)
– Existing replica lacks meta information
– Such as replication location
[Figure: vertices 1 to 7 placed on Node1, Node2, and Node3; legend: Master, Replica. Example metadata: Vertex: 5, Master: n2, Replicas: n1 | n3.]
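The two bullets above can be sketched as a placement pass that tops up every vertex to at least f replicas and marks f of them as mirrors. This is a hedged illustration: the function `place_ft_replicas`, the dictionary layout, and the "first spare node" and "lowest node id becomes mirror" policies are assumptions, not the system's actual policy.

```python
def place_ft_replicas(master, replicas, nodes, f=1):
    """master: {vertex: node}; replicas: {vertex: set of nodes}; nodes: all node ids.

    Returns per-vertex metadata with at least f replicas (when enough nodes exist)
    and f of them designated as full-state mirrors.
    """
    placement = {}
    for v, home in master.items():
        locs = set(replicas.get(v, ()))
        # Nodes that hold neither the master nor a replica of v yet.
        spare = [n for n in nodes if n != home and n not in locs]
        while len(locs) < f and spare:
            locs.add(spare.pop(0))      # add a fault-tolerance replica
        ordered = sorted(locs)
        placement[v] = {
            "master": home,
            "replicas": ordered,
            "mirrors": ordered[:f],     # assumed policy: lowest node ids keep full state
        }
    return placement


meta = place_ft_replicas(
    master={5: "n2", 7: "n3"},
    replicas={5: {"n1", "n3"}},         # vertex 5 as on the slide; vertex 7 has none
    nodes=["n1", "n2", "n3"],
)
```

Vertex 5 already has two replicas, so it only gains a mirror designation, while vertex 7 receives a new fault-tolerance replica on n1.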
![Page 16: Replication-based Fault-tolerance for Large-scale Graph Processing](https://reader035.vdocuments.mx/reader035/viewer/2022062520/568162ac550346895dd32e1f/html5/thumbnails/16.jpg)
Optimization: selfish vertices
• States of selfish vertices have no consumer
• Their states are decided only by their neighbors
• Opt: get their states by re-computation
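A small sketch of the optimization, assuming that "no consumer" means the vertex has no out-edges and that its state follows the PageRank rule shown earlier in the deck. The names `selfish_vertices` and `recompute_rank` and the explicit `weights` dictionary are hypothetical.

```python
def selfish_vertices(edges):
    """Vertices with no out-edges: nobody reads their state."""
    vertices, has_consumer = set(), set()
    for src, dst in edges:
        vertices.update((src, dst))
        has_consumer.add(src)
    return vertices - has_consumer


def recompute_rank(v, edges, rank, weights):
    """Rebuild a lost selfish vertex from its in-neighbors instead of a replica."""
    total = sum(rank[src] * weights[(src, dst)] for src, dst in edges if dst == v)
    return 0.15 + total


edges = [(1, 2), (2, 3), (1, 3)]
print(selfish_vertices(edges))  # {3}
```

Because the state of such a vertex is a pure function of its neighbors, it needs no fault-tolerance replica; recovery simply calls the compute function again.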
![Page 17: Replication-based Fault-tolerance for Large-scale Graph Processing](https://reader035.vdocuments.mx/reader035/viewer/2022062520/568162ac550346895dd32e1f/html5/thumbnails/17.jpg)
How to recover
• Challenges
– Parallel recovery
– Consistent state after recovery
![Page 18: Replication-based Fault-tolerance for Large-scale Graph Processing](https://reader035.vdocuments.mx/reader035/viewer/2022062520/568162ac550346895dd32e1f/html5/thumbnails/18.jpg)
Problems of recovery
[Figure: vertices 1 to 7 spread over Node1, Node2, and Node3 as masters, mirrors, and replicas; one node crashes. Legend: Master, Mirror, Replica.]
• Which vertices have crashed?
• How to recover without a central coordinator?
Rules:
1. Master recovers replicas
2. If master crashed, mirror recovers master and replicas
Replication Location
Vertex 3: Master: n3, Mirror: n2
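The two rules can be read as a purely local decision: each surviving node scans the replication locations it already stores and works out what it must recover, with no coordinator involved. The sketch below assumes a metadata layout (`master`, `mirrors`, `replicas`) and a function name `recovery_plan` that are illustrative only.

```python
def recovery_plan(vertex_meta, crashed, me):
    """Return the (vertex, role, lost_node) items that node `me` must recover.

    vertex_meta: {vertex: {"master": node, "mirrors": [nodes], "replicas": [nodes]}}
    crashed: set of failed node ids.
    """
    plan = []
    for v, meta in sorted(vertex_meta.items()):
        lost = [("replica", n) for n in meta["replicas"] if n in crashed]
        if meta["master"] not in crashed:
            recoverer = meta["master"]                  # rule 1: master recovers replicas
        else:
            alive = [n for n in meta["mirrors"] if n not in crashed]
            if not alive:
                continue                                # more failures than tolerated
            recoverer = alive[0]                        # rule 2: mirror takes over
            lost = [("master", meta["master"])] + lost
        if recoverer == me:
            plan.extend((v, role, node) for role, node in lost)
    return plan


meta = {
    1: {"master": "n1", "mirrors": ["n2"], "replicas": ["n2", "n3"]},
    3: {"master": "n3", "mirrors": ["n2"], "replicas": ["n2"]},   # vertex 3 as on the slide
}
print(recovery_plan(meta, {"n3"}, "n1"))  # [(1, 'replica', 'n3')]
print(recovery_plan(meta, {"n3"}, "n2"))  # [(3, 'master', 'n3')]
```

Every survivor runs the same function on its own metadata, so the recovery work is split across machines and proceeds in parallel.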
![Page 19: Replication-based Fault-tolerance for Large-scale Graph Processing](https://reader035.vdocuments.mx/reader035/viewer/2022062520/568162ac550346895dd32e1f/html5/thumbnails/19.jpg)
Rebirth
[Figure: before the crash, vertices 1 to 7 are placed on Node1, Node2, and Node3 as masters, mirrors, and replicas. After the crash, Node1 and Node2 rebuild the lost vertices on Newbie3.]
Rule:
1. Master recovers replicas
2. If master crashed, mirror recovers master and replicas
![Page 20: Replication-based Fault-tolerance for Large-scale Graph Processing](https://reader035.vdocuments.mx/reader035/viewer/2022062520/568162ac550346895dd32e1f/html5/thumbnails/20.jpg)
Problems of Rebirth
• Standby machine
• A single newbie machine
Migrate tasks to surviving machines
![Page 21: Replication-based Fault-tolerance for Large-scale Graph Processing](https://reader035.vdocuments.mx/reader035/viewer/2022062520/568162ac550346895dd32e1f/html5/thumbnails/21.jpg)
Migration
[Figure: before the crash, vertices 1 to 7 are placed on Node1, Node2, and Node3 as masters, mirrors, and replicas. After the crash, the lost vertices are taken over by Node1 and Node2.]
Procedure:
1. Mirrors upgrade to masters and broadcast
2. Reload missing graph structure and reconstruct
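Step 1 of the procedure can be sketched as a metadata update in which the mirror of each lost master is promoted. This is an assumption-laden illustration: the function `migrate`, the single-mirror layout, and the "first surviving replica becomes the new mirror" policy are hypothetical, and the broadcast and the graph-structure reload of step 2 are not modeled.

```python
def migrate(vertex_meta, crashed):
    """Promote mirrors of masters lost on `crashed`; drop replicas that lived there.

    vertex_meta: {vertex: {"master": node, "mirror": node, "replicas": [nodes]}}
    Assumes the mirror is one of the listed replicas.
    """
    migrated = {}
    for v, meta in vertex_meta.items():
        master = meta["master"]
        survivors = [n for n in meta["replicas"] if n != crashed]
        mirror = meta["mirror"] if meta["mirror"] != crashed else None
        if master == crashed:
            if mirror is None:
                raise RuntimeError(f"vertex {v}: master and mirror both lost")
            master = mirror              # the mirror upgrades to master
            survivors.remove(mirror)
            mirror = None
        if mirror is None and survivors:
            mirror = survivors[0]        # assumed policy: pick a new full-state replica
        migrated[v] = {"master": master, "mirror": mirror, "replicas": survivors}
    return migrated


meta = {
    1: {"master": "n1", "mirror": "n3", "replicas": ["n2", "n3"]},
    3: {"master": "n3", "mirror": "n2", "replicas": ["n1", "n2"]},
}
after = migrate(meta, "n3")
```

After the call, vertex 3 is mastered on n2 and vertex 1 keeps its master on n1 with n2 as its new mirror, so no standby machine is needed.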
![Page 22: Replication-based Fault-tolerance for Large-scale Graph Processing](https://reader035.vdocuments.mx/reader035/viewer/2022062520/568162ac550346895dd32e1f/html5/thumbnails/22.jpg)
Inconsistency after recovery
[Figure: workers W1 and W2 run Compute, SendMsg, EnterBarrier, Commit, LeaveBarrier on vertices 1, 2, 3. State tables as extracted: Replica 2 on W1 (Rank: 0.1, 0.2; Activated: false) and Master 2 on W2 (Rank: 0.1; Activated: false, true).]
![Page 23: Replication-based Fault-tolerance for Large-scale Graph Processing](https://reader035.vdocuments.mx/reader035/viewer/2022062520/568162ac550346895dd32e1f/html5/thumbnails/23.jpg)
Replay Activation
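[Figure: workers W1 and W2 with vertices 1, 2, 3. State tables as extracted: Master 1 on W1 (Rank 0.2, Activated false, ActNgbs true), Replica 1 on W2 (Rank, Activated false, ActNgbs), Replica 2 on W1 (Rank, Activated false), Master 2 on W2 (Rank 0.1, Activated). Animated values shown over the tables: false, true, 0.1, 0.2, 0.4.]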
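One way to read the state tables: each master remembers in an ActNgbs flag whether it activated its neighbors, so that after a recovery it can send those activations again to vertices whose Activated bit was lost. The sketch below is an interpretation under that assumption; `replay_activation` and the dictionary fields are hypothetical names.

```python
def replay_activation(masters, out_nbrs, recovered):
    """Re-apply activations that surviving masters issued before the failure.

    masters:   {vertex: {"rank": float, "activated": bool, "act_ngbs": bool}}
    out_nbrs:  {vertex: [neighbor vertices]}
    recovered: {vertex: state dict} for vertices rebuilt from replicas
    """
    for v, state in masters.items():
        if not state["act_ngbs"]:
            continue                      # v did not activate anyone last iteration
        for n in out_nbrs.get(v, ()):
            if n in recovered:
                recovered[n]["activated"] = True
    return recovered


# Master 1 on W1 survived with ActNgbs = true; master 2 was rebuilt from a replica
# that only carried Activated = false.
masters = {1: {"rank": 0.2, "activated": False, "act_ngbs": True}}
recovered = {2: {"rank": 0.1, "activated": False}}
replay_activation(masters, {1: [2]}, recovered)
print(recovered[2]["activated"])  # True
```

Without the replay, the recovered vertex 2 would stay inactive and skip the next iteration, which is the inconsistency the previous slide illustrates.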
![Page 24: Replication-based Fault-tolerance for Large-scale Graph Processing](https://reader035.vdocuments.mx/reader035/viewer/2022062520/568162ac550346895dd32e1f/html5/thumbnails/24.jpg)
Evaluation
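• 50 VMs (10 GB memory, 4 cores each)
• HDFS (replication factor 3)
• Applications

| Application | Graph | Vertices | Edges |
|---|---|---|---|
| PageRank | GWeb | 0.87M | 5.11M |
| PageRank | LJournal | 4.85M | 70.0M |
| PageRank | Wiki | 5.72M | 130.1M |
| ALS | SYN-GL | 0.11M | 2.7M |
| CD | DBLP | 0.32M | 1.05M |
| SSSP | RoadCA | 1.97M | 5.53M |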
![Page 25: Replication-based Fault-tolerance for Large-scale Graph Processing](https://reader035.vdocuments.mx/reader035/viewer/2022062520/568162ac550346895dd32e1f/html5/thumbnails/25.jpg)
Speedup over Hama
• Imitator is based on Hama, an open-source clone of Pregel
– Replication for dynamic computing [Distributed GraphLab, VLDB’12]
• Evaluated systems
– Baseline: Imitator without fault tolerance
– REP: Baseline + replication-based FT
– CKPT: Baseline + checkpoint-based FT
[Figure: Speedup (axis 0 to 4) on GWeb, LJournal, Wiki, SYN-GL, DBLP, RoadCA.]
![Page 26: Replication-based Fault-tolerance for Large-scale Graph Processing](https://reader035.vdocuments.mx/reader035/viewer/2022062520/568162ac550346895dd32e1f/html5/thumbnails/26.jpg)
Normal execution overhead
Replication has negligible execution overhead
![Page 27: Replication-based Fault-tolerance for Large-scale Graph Processing](https://reader035.vdocuments.mx/reader035/viewer/2022062520/568162ac550346895dd32e1f/html5/thumbnails/27.jpg)
Communication Overhead
![Page 28: Replication-based Fault-tolerance for Large-scale Graph Processing](https://reader035.vdocuments.mx/reader035/viewer/2022062520/568162ac550346895dd32e1f/html5/thumbnails/28.jpg)
Performance of recovery
[Figure: Execution Time (seconds, axis 0 to 20) of recovery with CKPT, Rebirth, and Migration on GWeb, LJournal, Wiki, SYN-GL, DBLP, RoadCA; two bars are annotated 41 and 56.]
![Page 29: Replication-based Fault-tolerance for Large-scale Graph Processing](https://reader035.vdocuments.mx/reader035/viewer/2022062520/568162ac550346895dd32e1f/html5/thumbnails/29.jpg)
Recovery Scalability
[Figure: Recovery Time (seconds, axis 0 to 60) of Rebirth and Migration with 10, 20, 30, 40, and 50 machines.]
The more machines, the faster the recovery
![Page 30: Replication-based Fault-tolerance for Large-scale Graph Processing](https://reader035.vdocuments.mx/reader035/viewer/2022062520/568162ac550346895dd32e1f/html5/thumbnails/30.jpg)
Simultaneous failure
[Figure: Recovery Time, labeled Execution Time (Second), axis 0 to 35, with one, two, and three simultaneous failures; legend text as extracted: Rebirth, Recovery.]
[Figure: Overhead (axis 95% to 111%) when tolerating one, two, and three failures on GWeb, LJournal, Wiki, SYN-GL, DBLP, RoadCA.]
Add more replicas to tolerate the simultaneous failure of more than one machine
![Page 31: Replication-based Fault-tolerance for Large-scale Graph Processing](https://reader035.vdocuments.mx/reader035/viewer/2022062520/568162ac550346895dd32e1f/html5/thumbnails/31.jpg)
Case study
– Application: PageRank on the LiveJournal dataset
– A checkpoint every 4 iterations
– A failure is injected between the 6th and 7th iterations
[Figure: Finished iterations (axis 0 to 20) versus Execution time (seconds, axis 0 to 160) for BASE, CKPT/4, REP, CKPT/4 + 1 Failure, Rebirth + 1 Failure, and Migration + 1 Failure. Annotations: Detect Failure, Replay, and the values 45, 8.8, and 2.6.]
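The Replay annotation for the checkpoint run follows from simple arithmetic, sketched below under the assumption that checkpoints are written at the end of iterations 4, 8, 12, and so on. The helper `rollback_cost` is a hypothetical name.

```python
def rollback_cost(last_finished_iter, ckpt_period):
    """Return (iteration to roll back to, number of iterations to replay)."""
    last_ckpt = (last_finished_iter // ckpt_period) * ckpt_period
    return last_ckpt, last_finished_iter - last_ckpt


# Failure between the 6th and 7th iteration, one checkpoint every 4 iterations.
print(rollback_cost(6, 4))  # (4, 2): reload iteration 4, then redo iterations 5 and 6
```

Replication-based recovery avoids this replay because the replicas already hold the state of the last committed iteration.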
![Page 32: Replication-based Fault-tolerance for Large-scale Graph Processing](https://reader035.vdocuments.mx/reader035/viewer/2022062520/568162ac550346895dd32e1f/html5/thumbnails/32.jpg)
Conclusion
• Imitator: a graph engine which supports fault tolerance
• Imitator’s execution overhead is negligible because it leverages existing replicas
• Imitator’s recovery is fast because of its parallel recovery approach
![Page 33: Replication-based Fault-tolerance for Large-scale Graph Processing](https://reader035.vdocuments.mx/reader035/viewer/2022062520/568162ac550346895dd32e1f/html5/thumbnails/33.jpg)
Backup
![Page 34: Replication-based Fault-tolerance for Large-scale Graph Processing](https://reader035.vdocuments.mx/reader035/viewer/2022062520/568162ac550346895dd32e1f/html5/thumbnails/34.jpg)
Memory Consumption
![Page 35: Replication-based Fault-tolerance for Large-scale Graph Processing](https://reader035.vdocuments.mx/reader035/viewer/2022062520/568162ac550346895dd32e1f/html5/thumbnails/35.jpg)
Partition Impact