PWL Denver: Copysets

Copysets: Reducing the Frequency of Data Loss in Cloud Storage. Aysylu Greenberg, Papers We Love Denver, April 27, 2017

Upload: aysylu-greenberg

Post on 23-Jan-2018


TRANSCRIPT

Page 1: PWL Denver: Copysets

Copysets:

Reducing the Frequency of Data Loss in Cloud Storage

Aysylu Greenberg

Papers We Love Denver

April 27, 2017

Page 2: PWL Denver: Copysets

Welcome Papers We Love Denver!

Page 3: PWL Denver: Copysets

Aysylu Greenberg

@aysylu22

paperswelove.org

Page 4: PWL Denver: Copysets
Page 5: PWL Denver: Copysets

Today

• Random replication

• Copyset Replication

• Copyset Replication with Scatter Width

• Pragmatic aspects

Page 6: PWL Denver: Copysets

RANDOM REPLICATION

Overview & Tradeoffs

Page 7: PWL Denver: Copysets

Random Replication

R = 3, N = 9
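A minimal sketch of random replication for the R = 3, N = 9 setup on this slide, assuming each chunk's replicas go to R distinct nodes picked uniformly at random. The function name `random_replication`, the chunk count, and the seed are illustrative choices, not part of the talk.

```python
import random
from math import comb


def random_replication(num_chunks, n=9, r=3, seed=0):
    """Place each chunk on r distinct nodes chosen uniformly at random."""
    rng = random.Random(seed)
    nodes = range(1, n + 1)
    # Each placement is the set of nodes holding one chunk's replicas.
    return [frozenset(rng.sample(nodes, r)) for _ in range(num_chunks)]


placements = random_replication(num_chunks=1000)
copysets = set(placements)  # distinct replica sets actually in use
print(len(copysets), "distinct copysets out of", comb(9, 3), "possible")
```

With many more chunks than possible 3-node sets, nearly every set ends up holding some chunk, which is what makes simultaneous failures costly under this scheme.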

Page 8: PWL Denver: Copysets

Random Replication: Correlated Failures

Page 9: PWL Denver: Copysets

Recovery from Data Loss

Fixed cost of restoring lost data is high

Lose more data but less often

Increase in R is expensive

Page 10: PWL Denver: Copysets

Random Replication: Tradeoff

{small amount & high frequency} data loss

{large amount & low frequency} data loss

Page 11: PWL Denver: Copysets

COPYSET REPLICATION

Intuition

Page 12: PWL Denver: Copysets

Copyset Replication

R = 3, N = 9

Page 13: PWL Denver: Copysets

Copyset Replication

R = 3, N = 9

S = 2
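One way to picture the S = 2 case is to split the 9 nodes into N / R disjoint groups and only ever place a chunk's replicas inside one group. The sketch below assumes N is divisible by R and uses consecutive node numbers for the groups; the helper names `disjoint_copysets` and `scatter_width` are hypothetical.

```python
def disjoint_copysets(n=9, r=3):
    """Split nodes 1..n into n / r non-overlapping copysets of size r."""
    nodes = list(range(1, n + 1))
    return [frozenset(nodes[i:i + r]) for i in range(0, n, r)]


def scatter_width(copysets, node):
    """Count the distinct other nodes that share a copyset with `node`."""
    peers = set()
    for copyset in copysets:
        if node in copyset:
            peers |= copyset - {node}
    return len(peers)


copysets = disjoint_copysets()
for node in range(1, 10):
    # Every node shares data with exactly R - 1 = 2 other nodes.
    print(node, scatter_width(copysets, node))
```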

Page 14: PWL Denver: Copysets

Recovery from Node Failure

Simpler recovery than random replication:

R – 1 nodes with data

Higher load on small number of nodes

Page 15: PWL Denver: Copysets

Copyset Replication with S = 2: Tradeoff

{small amount & high frequency} data loss

{large amount & low frequency} data loss
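The tradeoff can be checked by brute force on the 9-node example. The sketch below enumerates every possible set of simultaneously failed nodes and asks whether some copyset is wiped out entirely. It assumes that random replication with many chunks ends up using all 84 possible 3-node sets, and that failed nodes are equally likely to be any subset of the given size; `loss_probability` is an illustrative name.

```python
from fractions import Fraction
from itertools import combinations


def loss_probability(copysets, n, failed_count):
    """Fraction of failure sets of the given size that contain a whole copyset."""
    total = 0
    lost = 0
    for failed in combinations(range(1, n + 1), failed_count):
        failed = set(failed)
        total += 1
        # Data is lost when every replica of some chunk is on a failed node.
        if any(copyset <= failed for copyset in copysets):
            lost += 1
    return Fraction(lost, total)


# Assumption: random replication eventually uses every 3-node set.
random_like = [frozenset(c) for c in combinations(range(1, 10), 3)]
disjoint = [frozenset({1, 2, 3}), frozenset({4, 5, 6}), frozenset({7, 8, 9})]

for failed_count in (3, 4):
    print(failed_count,
          loss_probability(random_like, 9, failed_count),
          loss_probability(disjoint, 9, failed_count))
```

Fewer copysets make a loss event rarer, but each copyset then holds a larger share of the data, so each event loses more.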

Page 16: PWL Denver: Copysets

SCATTER WIDTH

Tuning choices

Page 17: PWL Denver: Copysets

Copyset Replication with S=2

R = 3, N = 9

Page 18: PWL Denver: Copysets

Copyset Replication with S=4

R = 3, N = 9

Page 19: PWL Denver: Copysets

Copyset Replication with S = 4

[Diagram: nodes 1 through 9 laid out in three rows of three]

R = 3, N = 9

Page 20: PWL Denver: Copysets

[Diagram: two copies of the node layout, nodes 1 through 9 in three rows of three]

Copyset Replication with S = 4: Permutation Phase

Page 21: PWL Denver: Copysets

[Diagram: nodes 1 through 9 laid out in three rows of three]

Copyset Replication with S = 4: Permutation Phase

Page 22: PWL Denver: Copysets

[Diagram: nodes 1 through 9 laid out in three rows of three]

Copyset Replication with S = 4: Permutation Phase

Page 23: PWL Denver: Copysets

[Diagram: nodes 1 through 9 laid out in three rows of three]

Copyset Replication with S = 4: Permutation Phase
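A rough sketch of a permutation phase, assuming the basic scheme from the Copysets paper: build ceil(S / (R - 1)) permutations of the nodes and cut each one into consecutive groups of R. The function names are hypothetical, N is assumed divisible by R, and the two hand-picked permutations (by rows and by columns of the grid) are chosen here only because they give every node a scatter width of exactly 4.

```python
import math
import random


def chunk_permutation(perm, r):
    """Cut one permutation into consecutive copysets of size r."""
    return [frozenset(perm[i:i + r]) for i in range(0, len(perm), r)]


def permutation_phase(n, r, s, seed=0):
    """Build copysets from ceil(s / (r - 1)) random permutations of 1..n."""
    rng = random.Random(seed)
    num_perms = math.ceil(s / (r - 1))
    copysets = set()
    for _ in range(num_perms):
        perm = list(range(1, n + 1))
        rng.shuffle(perm)
        # Random permutations can overlap, so actual scatter width may be below s.
        copysets.update(chunk_permutation(perm, r))
    return copysets


def scatter_width(copysets, node):
    """Count the distinct other nodes that share a copyset with `node`."""
    peers = set()
    for copyset in copysets:
        if node in copyset:
            peers |= copyset - {node}
    return len(peers)


# Hand-picked permutations for illustration: grid rows, then grid columns.
rows = chunk_permutation([1, 2, 3, 4, 5, 6, 7, 8, 9], 3)
cols = chunk_permutation([1, 4, 7, 2, 5, 8, 3, 6, 9], 3)
grid_copysets = set(rows) | set(cols)
print(sorted(sorted(c) for c in grid_copysets))
print([scatter_width(grid_copysets, node) for node in range(1, 10)])

random_copysets = permutation_phase(n=9, r=3, s=4)
print(len(random_copysets), "copysets from random permutations")
```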

Page 24: PWL Denver: Copysets

Tuning Scatter Width

Set by system designer to control parallelism of data recovery

Control load on each individual node during recovery
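A back-of-the-envelope sketch of how scatter width affects recovery, assuming the failed node's data is spread evenly across its S peers and each peer contributes the same bandwidth. The data size and bandwidth figures are made-up illustrative numbers, and `recovery_seconds` is a hypothetical helper.

```python
def recovery_seconds(data_bytes, scatter_width, peer_bandwidth):
    """Time to re-replicate a failed node's data when S peers send in parallel."""
    return data_bytes / (scatter_width * peer_bandwidth)


DATA_BYTES = 10**12      # assumed: 1 TB stored on the failed node
PEER_BANDWIDTH = 10**8   # assumed: 100 MB/s of recovery traffic per peer

for s in (2, 4, 8):
    seconds = recovery_seconds(DATA_BYTES, s, PEER_BANDWIDTH)
    share = 1 / s  # fraction of the failed node's data each peer must serve
    print(f"S={s}: {seconds:.0f} s, each peer serves {share:.3f} of the data")
```

A larger S shortens recovery and lowers the load on each peer, at the cost of more copysets.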

Page 25: PWL Denver: Copysets

Copyset Replication Scatter Width: Tradeoffs

{small amount & high frequency} data loss

{large amount & low frequency} data loss

Page 26: PWL Denver: Copysets

Scatter Width: Tuning Choices

Random replication: scatter width of N-1, lots of replica sets

Page 27: PWL Denver: Copysets

Scatter Width: Tuning Choices

Random replication: scatter width of N-1, lots of replica sets

S << N

Page 28: PWL Denver: Copysets

Scatter Width: Tuning Choices

Random replication: scatter width of N-1, lots of replica sets

S << N

To reduce frequency of data loss, minimize:
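As a rough illustration of why S << N matters, the sketch below counts replica sets under each scheme. It relies on the Copysets paper's observation that loss frequency tracks the number of distinct copysets, and on the assumption that the permutation scheme yields at most ceil(S / (R - 1)) * (N / R) of them. The 3000-node cluster with S = 10 is a hypothetical example, and `copyset_count` is an illustrative name.

```python
from fractions import Fraction
from math import ceil, comb


def copyset_count(n, r, s):
    """Upper bound on copysets from ceil(s / (r - 1)) permutations of n nodes."""
    return ceil(s / (r - 1)) * (n // r)


def hit_fraction(n, r, s):
    """Share of all r-node sets that are copysets (chance r random failures hit one)."""
    return Fraction(copyset_count(n, r, s), comb(n, r))


# Slide example: N = 9, R = 3.
print(copyset_count(9, 3, 2), copyset_count(9, 3, 4), comb(9, 3))
print(hit_fraction(9, 3, 2), hit_fraction(9, 3, 4))

# Hypothetical larger cluster: N = 3000, R = 3, S = 10.
print(copyset_count(3000, 3, 10), "copysets versus", comb(3000, 3), "possible sets")
```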

Page 29: PWL Denver: Copysets

FROM IDEAS TO PRACTICE

Pragmatic aspects

Page 30: PWL Denver: Copysets

Pragmatic Aspects

• Move randomization to permutation stage

• Low overhead on operations

• Near optimal and fast

• Support for dynamic systems while maintaining guarantees is tricky -> chainsets (http://hackingdistributed.com/2014/02/14/chainsets/)

• Tiered replication: https://www.usenix.org/conference/atc15/technical-session/presentation/cidon

Page 31: PWL Denver: Copysets

Copysets:

Reducing the Frequency of Data Loss in Cloud Storage

Aysylu Greenberg

Papers We Love Denver

April 27, 2017