bolt-on global consistency for the cloud - github pageschallenges for data replication in cloud...

Post on 07-Aug-2020

0 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Bolt-On Global Consistency for the Cloud

Zhe Wu, Edward Wijaya, Muhammed Uluyol,

Harsha V. Madhyastha

University of Michigan

1

Geo-distribution for Low Latency

2

Geo-distribution Requires Data Replication

3

Geo-distribution Requires Data Replication

4

Cloud Simplifies App Deployment

5

Cloud Simplifies App Deployment

6

?

?

Application Needs to Manage Replication

Isolated storage services

7

Application Needs to Manage Replication

Isolated storage services

8

Application Needs to Manage Replication

Isolated storage services

No replication across cloud providers

9

Challenges for Data Replication in Cloud

Conflict?

10

Challenges for Data Replication in Cloud

Megastore(CIDR’11) Spanner(OSDI’12) MDCC(Eurosys’13) Tapir(SOSP’15) …..

Paxos

11

Challenges for Data Replication in Cloud

Paxos logic Paxos logic Paxos logicPropose writes

Paxos

12

Challenges for Data Replication in Cloud

Paxos logic Paxos logic Paxos logicPropose writes

Paxos

13

Challenges for Data Replication in Cloud

Paxos logic Paxos logic Paxos logicPropose writes

Paxos

14

Challenges for Data Replication in Cloud

Paxos logic Paxos logic Paxos logic

PUT/GET

Propose writes

Paxos

15

Challenges for Data Replication in Cloud

Paxos

VMs managedby application

PUT/GET

16

Paxos logic Paxos logic Paxos logic

Challenges for Data Replication in Cloud

Paxos1. High cost

VMs managedby application

PUT/GET

17

Paxos logic Paxos logic Paxos logic

Challenges for Data Replication in Cloud

Paxos1. High cost2. Bottleneck

VMs managedby application

PUT/GET

18

Paxos logic Paxos logic Paxos logic

Challenges for Data Replication in Cloud

Conflict?

Paxos1. High cost2. Bottleneck

19

Challenges for Data Replication in Cloud

Conflict?

Paxos1. High cost2. Bottleneck

Paxos withlimited interface

Disk Paxos(Distributed Computing’03)

pPaxos (ATC’15)

20

Challenges for Data Replication in Cloud

Paxos1. High cost2. Bottleneck

Paxos withlimited interface

DiskPaxos, pPaxos

21

Challenges for Data Replication in Cloud

Paxos1. High cost2. Bottleneck

Paxos withlimited interface

DiskPaxos, pPaxos

1. Conflict-free writePaxos ops

22

Challenges for Data Replication in Cloud

Paxos1. High cost2. Bottleneck

Paxos withlimited interface

DiskPaxos, pPaxos

1. Conflict-free writePaxos ops

2. Read from the logto replay Paxos logic

23

Challenges for Data Replication in Cloud

Paxos1. High cost2. Bottleneck

Paxos withlimited interface

DiskPaxos, pPaxos

1. Conflict-free writePaxos ops

2. Read from the logto replay Paxos logic

1. High latency

24

Challenges for Data Replication in Cloud

Paxos1. High cost2. Bottleneck

Paxos withlimited interface

DiskPaxos, pPaxos

1. Conflict-free writePaxos ops

2. Read from the logto replay Paxos logic

1. High latency2. High cost

25

Problems with Existing Solutions

26

Low latency Compatible with limited interface Low cost

Traditional Paxos

Disk Paxos,pPaxos

Our Solution: Consistent Replication In the Cloud

27

Low latency Compatible with limited interface Low cost

Traditional Paxos

Disk Paxos,pPaxos

CRIC

CRIC Overview

Cloud Storage

Cloud Storage

App VM App VMCRIC Library CRIC Library

Key-value store(reads/writes)

28

CRIC Overview

Cloud Storage

Cloud Storage

App VM App VMCRIC Library CRIC Library

CPaxos

Key-value store(reads/writes)

29

CRIC Overview

Cloud Storage

Cloud Storage

App VM App VMCRIC Library CRIC Library

CPaxos

Key-value store(reads/writes)

30

ü Apps directly read/write data from/to cloud storage

CRIC Overview

Cloud Storage

Cloud Storage

App VM App VMCRIC Library CRIC Library

CPaxos

Key-value store(reads/writes)

31

ü Apps directly read/write data from/to cloud storage

ü Low latency (1 RTT)

CPaxos In Action

Proposer(App) Acceptor Storage

Prepare

Accept

Executing a write in traditional Paxos

32

CPaxos In Action

Proposer(App) Acceptor Storage

Proposer(App)

Storage

Prepare

Accept

Executing a write in traditional PaxosExecuting a write in CPaxos

33

CPaxos In Action

Proposer(App) Acceptor Storage

Proposer(App)

Storage(Passive

acceptor)

Prepare

Accept

Executing a write in traditional PaxosExecuting a write in CPaxos

34

CPaxos In Action

Proposer(App) Acceptor Storage

Proposer(App)

Storage(Passive

acceptor)

Prepare

Read Paxos state

Run Paxos prepare logic

Update Paxos state

Accept

Executing a write in traditional PaxosExecuting a write in CPaxos

35

CPaxos In Action

Proposer(App) Acceptor Storage

Proposer(App)

Storage(Passive

acceptor)

Prepare

Read Paxos state

Run Paxos prepare logic

Update Paxos state

Accept

Executing a write in traditional Paxos

Run Paxos accept logic

Update Paxos state and data

Executing a write in CPaxos

36

CPaxos In Action

Proposer(App)

Storage(Passive

acceptor)

Read Paxos state

Run Paxos prepare logic

Update Paxos state

Run Paxos accept logic

Update Paxos state and data

Executing a write in CPaxos

37

Proposer

CPaxos In Action

Proposer(App)

Storage(Passive

acceptor)

Read Paxos state

Run Paxos prepare logic

Update Paxos state

Run Paxos accept logic

Update Paxos state and data

Executing a write in CPaxos

38

Proposer

Leverage cloud supported conditional-PUT(available in all cloud storage services)

CPaxos In Action

Proposer(App) Acceptor Storage

Proposer(App)

Storage(Passive

acceptor)

Prepare

Read Paxos state

Update Paxos state

Accept

Executing a write in traditional Paxos

Update Paxos state and data

Executing a write in CPaxos

39

2 RTTs 3 RTTs

Preparelogic

Acceptlogic

CPaxos In Action

Proposer(App) Acceptor Storage

Proposer(App)

Storage(Passive

acceptor)

Prepare

Read Paxos state

Update Paxos state

Accept

Executing a write in traditional Paxos

Update Paxos state and data

Executing a write in CPaxos

40

2 RTTs 3 RTTs

Preparelogic

Acceptlogic

Can be omitted when:1. Write follows a read2. Object creation

CPaxos In Action

Proposer(App) Acceptor Storage

Proposer(App)

Storage(Passive

acceptor)

Prepare

Read Paxos state

Update Paxos state

Accept

Executing a write in traditional Paxos

Update Paxos state and data

Executing a write in CPaxos

41

2 RTTs 3 RTTs

Preparelogic

Acceptlogic

Can be omitted when:1. Write follows a read2. Object creation

Leverage Fast Paxos to execute reads and writes in one round

Tradeoff: High Latency under Conflict

Propose 0Propose 1Time

42

Tradeoff: High Latency under Conflict

Propose 0Propose 1Time

43

Tradeoff: High Latency under Conflict

Propose 0Propose 1Time

Retry Retry

44

Tradeoff: High Latency under Conflict

Propose 0Propose 1Time

Higher proposal will succeed in traditional Paxos

Retry Retry

45

Tradeoff: High Latency under Conflict

Propose 0Propose 1Time

Retry Retry

Reason for conflict: variance inlatency to different data centers

46

Optimization: Staggered Requests

Propose 0Propose 1Time

47

Optimization: Staggered Requests

Propose 0Propose 1Time

48

Optimization: Staggered Requests

Propose 0Propose 1Time

49

Optimization: Staggered Requests

Propose 0Propose 1Time

50

Optimization: Staggered Requests

Propose 0Propose 1Time

Detect conflict faster

51

Optimization: Staggered Requests

Propose 0Propose 1Time

Detect conflict faster

52

Observation: low network latency variance between cloud DCs

CRIC Optimizations

¤Reduce latency under conflict¤Staggered Requests

¤Reduce reader-write-back¤Asynchronous commit notification

¤Reduce storage and data transfer cost¤ Separates data and Paxos log¤ Aggressive garbage collection in Accept phase¤ Store data digest in Paxos log

53

CRIC Optimizations

¤Reduce latency under conflict¤Staggered Requests

¤Reduce reader-write-back¤Asynchronous commit notification

¤Reduce storage and data transfer cost¤ Separates data and Paxos log¤ Aggressive garbage collection in Accept phase¤ Store data digest in Paxos log

54

Cost-effectiveOnly one version of the data is stored in each replica data center

Evaluation

¤ Deploy CRIC in 5 Azure data centers and run YCSB workload

¤ Comparison systems:¤ active acceptor Fast Paxos¤ passive acceptor pPaxos

55

Evaluation

¤ Deploy CRIC in 5 Azure data centers and run YCSB workload

¤ Comparison systems:¤ active acceptor Fast Paxos¤ passive acceptor pPaxos

¤How does CRIC compare with respect to cost and performance?

56

Evaluation

¤ Deploy CRIC in 5 Azure data centers and run YCSB workload

¤ Comparison systems:¤ active acceptor Fast Paxos¤ passive acceptor pPaxos

¤How does CRIC compare with respect to cost and performance?

¤How effective are staggered requests?

57

CRIC Enables Low Cost

58

CRIC Enables Low Cost

59

CRICFast Paxos

pPaxos

CRIC Enables Low Cost

60

CRICFast Paxos

pPaxos

Eliminate need for relay VMs

CRIC Enables Low Cost

61

CRICFast Paxos

pPaxos

Reduce I/O and data transfers

CRIC Enables Low Cost

62

CRICFast Paxos

pPaxos

CRIC can reduce cost by 20% ~ 50%

… without Sacrificing Performance

0

50

100

150

200

250

300

350

400

Read Write

Med

ian

late

ncy

(ms)

Und

er lo

w c

onfli

ct

FastPaxospPaxosCRIC

63

… without Sacrificing Performance

0

50

100

150

200

250

300

350

400

Read Write

Med

ian

late

ncy

(ms)

Und

er lo

w c

onfli

ct

FastPaxospPaxosCRIC

Same performance as FastPaxos

64

… without Sacrificing Performance

0

50

100

150

200

250

300

350

400

Read Write

Med

ian

late

ncy

(ms)

Und

er lo

w c

onfli

ct

FastPaxospPaxosCRIC

Better write performance than pPaxos

65

Staggered Requests Lower Latency Under Conflict

100

1000

0 1 2 3 4 5 6 7 8

Med

ian

late

ncy

for

succ

essf

ul w

rites

(ms)

# of client servers per DC

100

1000

0 1 2 3 4 5 6 7 8

Med

ian

late

ncy

for

succ

essf

ul w

rites

(ms)

# of client servers per DC

Without staggeredWith staggered

Increasing conflict rate

66

Staggered Requests Lower Latency Under Conflict

100

1000

0 1 2 3 4 5 6 7 8

Med

ian

late

ncy

for

succ

essf

ul w

rites

(ms)

# of client servers per DC

100

1000

0 1 2 3 4 5 6 7 8

Med

ian

late

ncy

for

succ

essf

ul w

rites

(ms)

# of client servers per DC

Without staggeredWith staggered

Increasing conflict rate

Lower latency for same conflict rate

67

Conclusions

¤Consistent Replication In the Cloud¤Compatible with cloud storage interface ¤One round read/write in common case¤Low cost

Thank youtowuzhe@gmail.com

68

69

top related