
Page 1:

Fast Failure Recovery in Distributed Graph Processing Systems

Yanyan Shen, Gang Chen, H.V. Jagadish, Wei Lu, Beng Chin Ooi, Bogdan Marius Tudor

Page 2:

Graph analytics

• Emergence of large graphs
  – the web, social networks, spatial networks, …
• Increasing demand for querying large graphs
  – PageRank, reverse web-link analysis over the web graph
  – influence analysis in social networks
  – traffic analysis and route recommendation over spatial graphs

Page 3:

Distributed graph processing

• MapReduce-like systems
• Pregel-like systems
• GraphLab-related systems
• Others

Page 4:

Failures of compute nodes

Increasing graph size → more compute nodes → an increase in the number of failed nodes

[Figure: failure probability vs. # of compute nodes (1 to 10,000), when the avg. failure time of a compute node is ~200 hours]

• Failure rate
  – # of failures per unit of time
  – 1/200 (per hour)
• Exponential failure probability
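The exponential failure model on this slide can be made concrete with a small sketch. Only the ~200-hour mean failure time comes from the slide; the 1-hour job length in the loop is an assumed illustration:

```python
import math

MTBF_HOURS = 200.0  # avg. failure time of one compute node (from the slide)

def failure_probability(num_nodes: int, job_hours: float) -> float:
    """P(at least one of num_nodes fails within job_hours), assuming
    independent exponential lifetimes, each with rate 1/MTBF_HOURS."""
    aggregate_rate = num_nodes / MTBF_HOURS  # failures per hour, cluster-wide
    return 1.0 - math.exp(-aggregate_rate * job_hours)

# The probability climbs quickly as the cluster grows.
for n in (1, 10, 100, 1000):
    print(f"{n:5d} nodes: {failure_probability(n, job_hours=1.0):.3f}")
```

This is why recovery speed matters at scale: the per-node failure rate is tiny, but the aggregate rate grows linearly with cluster size.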

Page 5:

Outline

• Motivation & background
• Failure recovery problem
  – Challenging issues
  – Existing solutions
• Solution
  – Reassignment generation
  – In-parallel recomputation
  – Workload rebalance
• Experimental results
• Conclusions

Page 6:

Pregel-like distributed graph processing systems

• Graph model
  – G = (V, E)
  – P: partitions
• Computation model
  – a set of supersteps
  – invoke a compute function for each active vertex
  – each vertex can
    • receive and process messages
    • send messages to other vertices
    • modify its value, its state (active/inactive), and its outgoing edges

[Figure: example graph G with vertices A–J split into partitions P1 = {A,B}, P2 = {C,D}, P3 = {E,F}, P4 = {G,H}, P5 = {I,J}; P1–P3 reside on compute node N1, P4–P5 on N2]
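The computation model above can be sketched as a minimal superstep loop. This is an illustrative toy, not Giraph's actual API; the names (`Vertex`, `run_supersteps`, `propagate_max`) are my own, but message delivery and halting follow the bullets above:

```python
from collections import defaultdict

class Vertex:
    def __init__(self, vid, value, edges):
        self.id, self.value, self.edges = vid, value, edges
        self.active = True  # a vertex votes to halt by setting this False

def run_supersteps(vertices, compute, max_supersteps):
    """Toy Pregel loop: in each superstep, compute() runs for every active
    vertex (or one reactivated by messages); messages sent in superstep i
    are delivered at the start of superstep i+1."""
    inbox = defaultdict(list)
    for step in range(max_supersteps):
        outbox = defaultdict(list)
        for v in vertices.values():
            msgs = inbox.get(v.id, [])
            if v.active or msgs:
                compute(v, msgs, step, outbox)
        inbox = outbox
        if not inbox and not any(v.active for v in vertices.values()):
            break  # no messages in flight and every vertex halted
    return vertices

# Example compute function: propagate the maximum vertex value.
def propagate_max(v, msgs, step, outbox):
    new_val = max([v.value] + msgs)
    if step == 0 or new_val > v.value:
        v.value = new_val
        for nbr in v.edges:
            outbox[nbr].append(v.value)
    v.active = False  # vote to halt; an incoming message reactivates us
```

Running `propagate_max` on a small undirected graph converges with every vertex holding the global maximum, at which point no messages are in flight and the loop terminates.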

Page 7:

Failure recovery problem

• Running example
  – all vertices compute and send messages to all their neighbors in every superstep
  – N1 fails while the job is executing superstep 12
  – two states: record the latest superstep each vertex has completed when the failure occurs (Sf) and when the failure is recovered (Sf*)
• Problem statement
  – for a failure F(Nf, sf), recover the vertex states from Sf to Sf*

[Figure: vertices A–F on node N1, G–J on node N2]

  Sf:  A–F: 10; G–J: 12
  Sf*: A–J: 12

Page 8:

Challenging issues

• Cascading failures
  – new failures may occur during the recovery phase
  – how to handle all the cascading failures, if any?
    • existing solution: treat each cascading failure as an individual failure and restart from the latest checkpoint
• Recovery latency
  – re-execute lost computations to achieve state Sf*
  – forward messages during recomputation
  – recover cascading failures
  – how to perform recovery with minimal latency?

Page 9:

Existing recovery mechanisms

• Checkpoint-based recovery
  – During normal execution
    • every compute node flushes its graph-related information to reliable storage at the beginning of every checkpointing superstep (e.g., C+1, 2C+1, …, nC+1)
  – During recovery
    • let c+1 be the latest checkpointing superstep
    • healthy nodes replace the failed ones; all compute nodes roll back to the latest checkpoint and re-execute the lost computations since then (i.e., from superstep c+1 to sf)

Simple to implement! Can handle cascading failures!
But it replays lost computations over the whole graph and ignores the partially recovered workload!

Page 10:

Existing recovery mechanisms

• Checkpoint + log
  – During normal execution
    • besides checkpointing, every compute node logs its outgoing messages at the end of each superstep
  – During recovery
    • use healthy nodes (replacements) to replace the failed ones
    • replacements:
      – redo the lost computations and forward messages among each other
      – forward messages to all the nodes in superstep sf
    • healthy nodes:
      – hold their original partitions and support the redo of lost computations by forwarding locally logged messages to the failed vertices
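The per-node bookkeeping for checkpoint + log can be sketched as follows. This is an illustrative sketch, not the system's actual code; the class, method names, and the (src, dst, payload) message layout are my own:

```python
class NodeLog:
    """Per-node bookkeeping for the checkpoint + log scheme sketched above."""

    def __init__(self, checkpoint_interval):
        self.C = checkpoint_interval
        self.checkpoints = {}  # superstep -> snapshot of the local partitions
        self.msg_log = {}      # superstep -> outgoing (src, dst, payload) messages

    def end_of_superstep(self, step, partitions, outgoing):
        # Log outgoing messages locally at the end of every superstep.
        self.msg_log[step] = list(outgoing)
        # Snapshot the partitions on checkpointing supersteps C+1, 2C+1, ...
        if step % self.C == 1:
            self.checkpoints[step] = {p: dict(s) for p, s in partitions.items()}

    def replay_to_failed(self, failed_vertices, from_step, to_step):
        """A healthy node does not recompute anything: it just re-sends the
        logged messages whose destinations are failed vertices."""
        return [m for s in range(from_step, to_step + 1)
                  for m in self.msg_log.get(s, [])
                  if m[1] in failed_vertices]
```

The design point the slide makes is visible here: replay is a cheap log scan on healthy nodes, but all actual recomputation still lands on the replacements.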

Page 11:

Existing recovery mechanisms

• Checkpoint + log
  – Suppose the latest checkpoint is made at the beginning of superstep 11, and N1 (A–F) fails at superstep 12
  – During recovery
    • superstep 11: A–F perform computation and send messages to each other; G–J re-send logged messages to A–F
    • superstep 12: A–F perform computation and send messages along their outgoing edges; G–J re-send logged messages to A–F

[Figure: vertices A–F on node N1, G–J on node N2]

Less computation and communication cost!
Overhead of local logging! (negligible)
Limited parallelism: the replacements handle all the lost computation!

Page 12:

Outline

• Motivation & background
• Problem statement
  – Challenging issues
  – Existing solutions
• Solution
  – Reassignment generation
  – In-parallel recomputation
  – Workload rebalance
• Experimental results
• Conclusions

Page 13:

Our solution

• Partition-based failure recovery
  – Step 1: generate a reassignment for the failed partitions
  – Step 2: recompute the failed partitions
    • every node is informed of the reassignment
    • every node loads its newly assigned failed partitions from the latest checkpoint and redoes the lost computations
  – Step 3: exchange partitions
    • re-balance the workload after recovery
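The three steps can be sketched as a small driver. The round-robin placement here is only a stand-in for the greedy reassignment described later, and the function names are illustrative:

```python
from itertools import cycle

def round_robin_reassign(healthy_nodes, failed_partitions):
    """Placeholder for Step 1: spread failed partitions over healthy nodes.
    (The actual system uses a cost-based greedy reassignment instead.)"""
    plan = {n: [] for n in healthy_nodes}
    for node, part in zip(cycle(healthy_nodes), failed_partitions):
        plan[node].append(part)
    return plan

def partition_based_recovery(healthy_nodes, failed_partitions,
                             recompute, rebalance):
    # Step 1: generate a reassignment for the failed partitions.
    plan = round_robin_reassign(healthy_nodes, failed_partitions)
    # Step 2: every node loads its newly assigned partitions from the
    # latest checkpoint and redoes the lost computations in parallel.
    for node, parts in plan.items():
        recompute(node, parts)
    # Step 3: exchange partitions to re-balance the workload.
    rebalance(plan)
    return plan
```

The key contrast with checkpoint + log is in Step 2: the lost work is divided at partition granularity across many nodes, instead of being replayed entirely by the replacements.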

Page 14:

Recompute failed partitions

• In superstep i, every compute node iterates through its active vertices. For each vertex v, we:
  – perform computation for v only if its state after the failure satisfies Sf(v) < i
  – forward a message from u to v only if Sf(v) ≤ i, or i = sf

Intuition: v will need this message to perform computation in superstep i+1
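Reading Sf(v) as the last superstep v completed before the failure, the two conditions can be written as predicates. This is a sketch of the reconstructed rules, checked against the running example's numbers (Sf: A–F = 10, G–J = 12, sf = 12):

```python
def should_compute(last_completed, i):
    """Recompute vertex v in recovery superstep i only if v had not yet
    completed superstep i when the failure occurred: Sf(v) < i."""
    return last_completed < i

def should_forward(dst_last_completed, i, s_f):
    """Forward a message to v only if v will recompute superstep i+1
    (Sf(v) <= i), or this is the last redone superstep (i == s_f)."""
    return dst_last_completed <= i or i == s_f
```

In the running example, A–F recompute supersteps 11 and 12 while G–J recompute nothing, and G–J receive no messages until superstep 12; this is where the computation and communication savings come from.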

Page 15:

Example

• N1 fails in superstep 12
  – redo supersteps 11 and 12

[Figure: (1) reassignment: the failed partitions of N1 are spread over the compute nodes; (2) in-parallel recomputation]

Less computation and communication cost!

Page 16:

Handling cascading failures

• N1 fails in superstep 12
• N2 fails in superstep 11 during recovery

[Figure: (1) reassignment; (2) recomputation after the cascading failure]

No need to recover A and B, since they have already been recovered!
The same recovery algorithm can be used to recover any failure!

Page 17:

Reassignment generation

• When a failure occurs, how do we compute a good reassignment for the failed partitions?
  – minimize the recovery time
• Calculating the recovery time is complicated because it depends on:
  – the reassignment for the failure
  – cascading failures
  – the reassignment for each cascading failure

No knowledge about cascading failures!

Page 18:

Our insight

• When a failure occurs (it can itself be a cascading failure), we prefer a reassignment that benefits the remaining recovery process, taking into account all the cascading failures that have occurred so far
• We collect the state S after the failure and measure the minimum time Tlow needed to achieve Sf*
  – Tlow provides a lower bound on the remaining recovery time

Page 19:

Estimation of Tlow

• Tlow accounts for computation and communication time
  – ignore downtime (it is similar across different recovery methods)
• To estimate computation and communication time, we need to know:
  – which vertices will perform computation
  – which messages will be forwarded (across different nodes)
• Maintain the relevant statistics in the checkpoint

Page 20:

Reassignment generation problem

• Given a failure, find a reassignment that minimizes Tlow
  – problem complexity: NP-hard
  – different from the graph partitioning problem
    • assignment of partitions, not partitioning of vertices
    • not a static graph: depends on runtime vertex states and messages
    • no "balance" requirement
• Greedy algorithm
  – start with a random reassignment for the failed partitions and reach a better one (with smaller Tlow) by "moving" the failed partitions
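The greedy step can be sketched as a simple local search. The `cost` callable stands in for the Tlow estimate maintained from checkpoint statistics; this toy stops at a local optimum and is not the paper's exact algorithm:

```python
def greedy_reassign(failed_partitions, healthy_nodes, cost):
    """Start from an arbitrary reassignment, then repeatedly move one
    failed partition to whichever node lowers the estimated cost (Tlow),
    until no single move improves it."""
    # Arbitrary starting reassignment (round-robin over healthy nodes).
    assign = {p: healthy_nodes[i % len(healthy_nodes)]
              for i, p in enumerate(failed_partitions)}
    improved = True
    while improved:
        improved = False
        for p in failed_partitions:
            # Best node for p, holding every other partition fixed.
            best = min(healthy_nodes, key=lambda n: cost({**assign, p: n}))
            if cost({**assign, p: best}) < cost(assign):
                assign[p] = best
                improved = True
    return assign
```

With a toy cost that charges one unit for each partition placed away from some "preferred" node, the search converges to the preferred placement.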

Page 21:

Outline

• Motivation & background
• Problem statement
  – Challenging issues
  – Existing solutions
• Solution
  – Reassignment generation
  – In-parallel recomputation
  – Workload rebalance
• Experimental results
• Conclusions

Page 22:

Experimental evaluation

• Experiment settings
  – in-house cluster with 72 nodes; each node has one Intel X3430 2.4GHz processor, 8GB of memory, and two 500GB SATA hard disks, and runs Hadoop 0.20.203.0 and Giraph 1.0.0
• Comparisons
  – PBR (our proposed solution) vs. CBR (checkpoint-based recovery)
• Benchmark tasks
  – K-means
  – Semi-clustering
  – PageRank
• Datasets
  – Forest
  – LiveJournal
  – Friendster

Page 23:

PageRank results

[Figures: logging overhead; single node failure]

Page 24:

PageRank results

[Figures: multiple node failure; cascading failure]

Page 25:

PageRank results (communication cost)

[Figures: multiple node failure; cascading failure]

Page 26:

Conclusions

• Developed a novel partition-based recovery method that parallelizes the failure-recovery workload in distributed graph processing
• Addressed the key challenges in failure recovery
  – handling cascading failures
  – reducing recovery latency
• Formulated the reassignment generation problem
• Proposed a greedy strategy

Page 27:

Thank You!

Q & A