

Causal Consistency Algorithms for Partially Replicated and Fully Replicated Systems

TaYuan Hsu, Ajay D. Kshemkalyani, Min Shen
University of Illinois at Chicago


Outline

1 Introduction

2 System Model

3 Algorithms
   Full-Track Algorithm: partially replicated memory
   Opt-Track Algorithm: partially replicated memory
   Opt-Track-CRP: fully replicated memory

4 Complexity Measures and Performance

5 Approximate Causal Consistency
   Approx-Opt-Track
   Performance

6 Conclusions and Future Work


Introduction

Motivation for Replication

Replication makes copies of data/services at multiple sites. Replication improves ...

Reliability (by redundancy): if the primary file server crashes, a standby file server still works
Performance: reduce communication delays
Scalability: prevent overloading a single server (size scalability); avoid communication latencies (geographic scale)

However, concurrent updates become complex. Consistency models define which interleavings of operations are valid (admissible).


Introduction

Consistency Models

Spectrum of consistency models, trading off cost vs. convenient semantics:

Linearizability (Herlihy and Wing 1990)
  non-overlapping ops seen in order of occurrence + overlapping ops seen in common order

Sequential consistency (Lamport 1979)
  all processes see the same interleaving of executions

Causal consistency (Ahamad et al. 1991)
  all processes see the same order of causally related writes

PRAM consistency (Lipton and Sandberg 1988)
  pipeline per pair of processes, for updates

Slow memory (Hutto et al. 1990)
  pipeline per variable per pair of processes, for updates

Eventual consistency (Johnson et al. 1975)
  updates are guaranteed to eventually propagate to all replicas


Introduction

The Motivation of Causal Consistency

Figure: Causal consistency is good for users in social networks.


Introduction

Related Works

Causal consistency in distributed shared memory systems (Ahamad et al. [1])

Causal consistency has been studied by Baldoni et al., Mahajan et al., Belaramani et al., and Petersen et al. Since 2011:

ChainReaction (S. Almeida et al.)
Bolt-on causal consistency (P. Bailis et al.)
Orbe and GentleRain (J. Du et al.)
Wide-Area Replicated Storage (K. Lady et al.)
COPS, Eiger (W. Lloyd et al.)

The above works assume full replication.


Introduction

Partial Replication

Figure: Case for Partial Replication.


Introduction

Case for Partial Replication

Partial replication is more natural for some applications. See the previous figure.

With p replicas at some p of the total of n DCs, each write operation that would have triggered an update broadcast now becomes a multicast to just p of the n DCs.

Savings in storage and infrastructure costs

For write-intensive workloads, partial replication gives a direct savings inthe number of messages.

Allowing flexibility in the number of DCs required in causally consistent replication remains an interesting aspect of future work.

The supposedly higher cost of tracking dependency metadata is relativelysmall for applications such as Instagram.


Introduction

Causal Consistency

Causal consistency: writes that are potentially causally related must be seen by all processors in that same order. Concurrent writes may be seen in a different order on different machines.

causally related writes: one write comes after a read that returned the value of the other write

Examples

Figure: Should the W(x)a < W(x)b ordering be enforced?


System Model

System Model

n application processes - ap1, ap2, ..., apn - interacting through a shared memory Q composed of q variables x1, x2, ..., xq

Each api can perform either a read or a write on any of the q variables.
ri(xj)v : a read operation by api on variable xj returns value v
wi(xj)v : a write operation by api on variable xj writes value v

local history hi : the series of read and write operations generated by process api

global history H : the set of local histories hi from all n processes


System Model

Causality Order

Program Order : a local operation o1 precedes another local operation o2, denoted o1 ≺po o2

Read-from Order : read operation o2 = r(x)v retrieves the value v written by the write operation o1 = w(x)v at a distinct process, denoted o1 ≺ro o2

Causality Order : o1 ≺co o2 iff one of the following conditions holds:
  ∃api s.t. o1 ≺po o2 (program order)
  ∃api, apj s.t. o1 ≺ro o2 (read-from order)
  ∃o3 ∈ OH s.t. o1 ≺co o3 and o3 ≺co o2 (transitive closure)
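As a sketch of the definition above, the causality order can be computed mechanically from the program-order and read-from edges by transitive closure. The operation labels and edge lists below are hypothetical, not from the talk:

```python
# Sketch: derive the causality order ≺co from program order (≺po) and
# read-from order (≺ro) edges by transitive closure.
from itertools import product

def causality_order(po_edges, ro_edges):
    """Return the set of pairs (o1, o2) with o1 ≺co o2."""
    co = set(po_edges) | set(ro_edges)
    changed = True
    while changed:                      # iterate until transitively closed
        changed = False
        for (a, b), (c, d) in product(list(co), list(co)):
            if b == c and (a, d) not in co:
                co.add((a, d))
                changed = True
    return co

# Classic example: r2(x)a reads from w1(x)a, and w2(x)b follows at ap2,
# so the two writes are causally related.
po = [("r2(x)a", "w2(x)b")]            # program order at ap2
ro = [("w1(x)a", "r2(x)a")]            # read-from order
co = causality_order(po, ro)
assert ("w1(x)a", "w2(x)b") in co
```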


System Model

Underlying Distributed Communication System

The shared memory abstraction and its causal consistency model are implemented on top of the distributed message-passing system.

With n sites (connected by FIFO channels), each site si hosts an application process api and holds only a subset of variables xh ∈ Q. When api performs a write operation w(x1)v, it invokes Multicast(m) to deliver the message m containing w(x1)v to all sites replicating x1.

send event, receive event, apply event in msg-passing layer

When api performs a read operation r(x2)v, it invokes RemoteFetch(m) to deliver the message m containing r(x2)v to a pre-designated site replicating x2 to fetch its value.

fetch event, remote_return event, return event in msg-passing layer


System Model

Activation Predicate

Baldoni et al. [2] defined the relation →co on send events. sendi(mw(x)a) →co sendj(mw(y)b) iff one of the following conditions holds:

1. i = j and sendi(mw(x)a) locally precedes sendj(mw(y)b)
2. i ≠ j and returnj(x, a) locally precedes sendj(mw(y)b)
3. ∃sendk(mw(z)c) s.t. sendi(mw(x)a) →co sendk(mw(z)c) →co sendj(mw(y)b)

→co ⊂ → (Lamport's happened-before relation)
A write can be applied when its activation predicate becomes true:

there is no earlier message (under →co) which has not been locally applied
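In the Full-Track algorithm (next section), this predicate reduces to comparing the piggybacked Write clock against the local Apply counts. A minimal sketch, assuming W[j][i] counts writes by apj destined to si that causally precede the message:

```python
# Sketch of the activation predicate at site s_i: every write that
# ≺co-precedes the incoming message and is destined to s_i must
# already have been applied locally.
def can_apply(W_msg, apply_i, i):
    n = len(apply_i)
    return all(apply_i[k] >= W_msg[k][i] for k in range(n))

# Hypothetical 3-site example: the piggybacked clock says one causally
# preceding write by ap_1 was sent to site 2.
W = [[0, 0, 0],
     [0, 0, 1],
     [0, 0, 0]]
assert can_apply(W, [0, 0, 0], 2) is False   # predecessor not yet applied
assert can_apply(W, [0, 1, 0], 2) is True    # applied, so activate
```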


System Model

Implementing the Activation Predicate

Figure: Can the write for M3 be applied at s1? Only after the earlier write for M is applied. The dependency "s1 is a destination of M" is needed as meta-data on M3.

Meta-data grows as computation progresses

Objective: to reduce the size of meta-data


Algorithms

Overview of Algorithms

Two algorithms [3, 4] implement causal consistency in a partially replicated distributed shared memory system:

Full-Track
Opt-Track (a message and space optimal algorithm)

Both implement the →co relation and adopt the activation predicate.

A special case of Opt-Track for full replication:

Opt-Track-CRP (optimal) : lower message size, time, and space complexity than the Baldoni et al. algorithm [2]


Algorithms Full-Track Algorithm: partially replicated memory

Algorithm 1: Full-Track

Algorithm 1 is for a non-fully replicated system.

Each application process performing a write operation will only write to a subset of all the sites.

Each site si needs to track the number of write operations performed by every apj to every site sk, denoted as Writei[j][k].

The Write clock piggybacked with messages generated by Multicast(m) should not be merged with the local Write clock at message reception, but only at a later read operation that reads the value that comes with the message.


Algorithms Full-Track Algorithm: partially replicated memory

Algorithm 1: Full-Track

Data structures
1 Writei - the Write clock
  Writei[j][k] : the number of updates sent by apj to site sk that causally happened before under the →co relation.
2 Applyi - an array of integers
  Applyi[j] : the number of updates written by apj that have been applied at site si.
3 LastWriteOni⟨variable id, Write⟩ - a hash map of Write clocks
  LastWriteOni⟨h⟩ : the Write clock value associated with the last write operation on variable xh locally replicated at site si.
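The pseudocode itself appears as a figure on the next slide; as a rough, hedged sketch of how these three structures interact (class and method names are assumptions, not the paper's), note in particular that the piggybacked clock is merged only at a later read, not at message receipt:

```python
# Rough sketch of Full-Track's bookkeeping at one site.
class FullTrackSite:
    def __init__(self, i, n):
        self.i, self.n = i, n
        self.Write = [[0] * n for _ in range(n)]   # Write_i[j][k]
        self.Apply = [0] * n                        # Apply_i[j]
        self.LastWriteOn = {}                       # var -> Write clock

    def local_write(self, var, dests):
        """Record the multicast and return the clock to piggyback."""
        for k in dests:
            self.Write[self.i][k] += 1
        return [row[:] for row in self.Write]       # deep copy piggybacked

    def on_apply(self, var, sender_j, W_msg):
        """Called once the activation predicate holds for the message."""
        self.Apply[sender_j] += 1
        self.LastWriteOn[var] = W_msg               # merged only at a read

    def local_read(self, var):
        """Merge the clock of the value actually read, not earlier."""
        W = self.LastWriteOn.get(var)
        if W is not None:
            for j in range(self.n):
                for k in range(self.n):
                    self.Write[j][k] = max(self.Write[j][k], W[j][k])
```

Deferring the merge to the read is the point made on this slide: a receipt alone creates no causal dependency until the value is actually observed.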


Algorithms Full-Track Algorithm: partially replicated memory

Algorithm 1: Full-Track

Figure: The Full-Track algorithm (pseudocode).

Algorithms Full-Track Algorithm: partially replicated memory

Algorithm 1: Full-Track

The activation predicate is implemented.


Algorithms Opt-Track Algorithm: partially replicated memory

Algorithm 2: Opt-Track

Each message corresponding to a write operation piggybacks an O(n²) matrix in Algorithm 1.

Algorithm 2 uses propagation constraints to further reduce the message size and storage cost.

Exploits the transitive dependency of causal deliveries of messages (ideasfrom the KS algorithm [5]).

Each site keeps a record of the most recently received message from eachother site (along with the list of its destinations).

optimal in terms of log space and message space overhead
achieves another optimality: no redundant destination information is recorded


Algorithms Opt-Track Algorithm: partially replicated memory

Propagation Constraints: Two Situations for Destination Information to be Redundant

Figure: Meta-data information d = "s2 is a destination of m". The causal future of the relevant apply and return events is shown in dotted lines.

d must not exist in the causal future of:
apply(w) (Propagation Constraint 1)
apply(w') (Propagation Constraint 2)


Algorithms Opt-Track Algorithm: partially replicated memory

Algorithm 2: Opt-Track

Data Structures
1 clocki
  local counter at site si for write operations performed by api.
2 Applyi - an array of integers
  Applyi[j] : the number of updates written by apj that have been applied at site si.
3 LOGi = {⟨j, clockj, Dests⟩} - the local log
  Each entry indicates a write operation in the causal past.
4 LastWriteOni⟨variable id, LOG⟩ - a hash map of LOGs
  LastWriteOni⟨h⟩ : the piggybacked LOG from the most recent update applied at site si for locally replicated variable xh.
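The PURGE and MERGE procedures appear as figures on the following slides. As a hedged sketch of the merge intent only (entry format and names are assumptions): for a write known to both logs, only the intersection of the remaining destination lists is kept, because a destination absent from either copy has already been inferred redundant there:

```python
# Sketch of merging a local log with a piggybacked log in an
# Opt-Track-style protocol. An entry maps (process id, clock) to the
# set of destinations still worth remembering.
def merge_logs(log_a, log_b):
    merged = {}
    for key, dests in list(log_a.items()) + list(log_b.items()):
        if key in merged:
            # Both logs know this write: a destination missing on either
            # side is already known redundant, so intersect.
            merged[key] &= set(dests)
        else:
            merged[key] = set(dests)
    return merged

local = {(1, 4): {"s2", "s6"}}
piggybacked = {(1, 4): {"s6"}, (3, 2): {"s4"}}
m = merge_logs(local, piggybacked)
assert m[(1, 4)] == {"s6"}     # s2 dropped: known redundant on one side
assert m[(3, 2)] == {"s4"}     # new dependency learned from the message
```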


Algorithms Opt-Track Algorithm: partially replicated memory

Algorithm 2: Opt-Track

Figure: Write process at site si

Algorithms Opt-Track Algorithm: partially replicated memory

Algorithm 2: Opt-Track

Figure: Read and receiving processes at site si

Algorithms Opt-Track Algorithm: partially replicated memory

Procedures used in Opt-Track

Figure: PURGE and MERGE functions at site si


Algorithms Opt-Track Algorithm: partially replicated memory

The Propagation Constraints: An example

Figure: Meta-data information d = "P6 is a destination of M5,1". Information about P6 as a destination of the multicast at event (5,1) propagates as piggybacked information and in the logs; the figure shows the propagation of explicit information and the inference of implicit information.

Algorithms Opt-Track Algorithm: partially replicated memory

If the Destination List Becomes ∅, then ...

Figure: Illustration of why it is important to keep a record even if its destination list becomes empty.


Algorithms Opt-Track-CRP: fully replicated memory

Algorithm 3: Opt-Track-CRP

Special case of Algorithm 2 for full replication; same optimizations.

Every write operation is sent to exactly the same set of sites; there is no need to keep a list of the destination information with each write.

Represent each write operation as ⟨i, clocki⟩ at site si.

The cost of a write operation drops from O(n) to O(1); d entries in the local log, where d = number of write operations in the local log.

The local log always gets reset after each write.
Each read adds at most one new entry into the local log.
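The two bullets above can be sketched directly; a minimal model (names assumed, not the paper's code) where a write is just the pair ⟨site, clock⟩, the log is reset on every write, and a read adds at most one entry:

```python
# Sketch of Opt-Track-CRP's O(1)-per-write bookkeeping under full
# replication.
class OptTrackCRPSite:
    def __init__(self, i):
        self.i = i
        self.clock = 0
        self.log = []                       # list of (site, clock) pairs

    def local_write(self):
        self.clock += 1
        piggyback = self.log                # dependencies of this write
        self.log = [(self.i, self.clock)]   # reset: the new write subsumes them
        return (self.i, self.clock), piggyback

    def local_read(self, last_write):
        """last_write: the (site, clock) id of the value read."""
        if last_write not in self.log:      # at most one new entry per read
            self.log.append(last_write)

s = OptTrackCRPSite(1)
wid, deps = s.local_write()
assert wid == (1, 1) and deps == [] and s.log == [(1, 1)]
s.local_read((2, 7))
assert s.log == [(1, 1), (2, 7)]
_, deps = s.local_write()                   # log resets again
assert deps == [(1, 1), (2, 7)] and s.log == [(1, 2)]
```

The reset on write is what keeps d (the log length) small: any earlier dependencies are subsumed by the new write, which every site will apply.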


Algorithms Opt-Track-CRP: fully replicated memory

Further Improved Scalability

In Algorithm 2, keeping entries with an empty destination list is important.

In the fully replicated case, we can also decrease this cost.

Figure: In fully replicated systems, the local log will be reset after each write (example with sites s1, s2, s3 exchanging m(w(x1)v) and m(w'(x2)u)).

Algorithms Opt-Track-CRP: fully replicated memory

Algorithm 3: Opt-Track-CRP

Figure: There is no need to maintain the destination list for each write operation in the local log.


Complexity Measures and Performance

Parameters

n : number of sites in the system

q : number of variables in the system

p : replication factor, i.e., the number of sites where each variable is replicated

w : number of write operations performed in the system

r : number of read operations performed in the system

d : number of write operations stored in the local log (used only in the Opt-Track-CRP algorithm)


Complexity Measures and Performance

Complexity

Table: Complexity measures of causal memory algorithms.

Metric           | Full-Track                      | Opt-Track                                     | Opt-Track-CRP         | optP [2]
Message count    | ((p−1) + (n−p)/n)w + 2r(n−p)/n  | ((p−1) + (n−p)/n)w + 2r(n−p)/n                | (n−1)w                | (n−1)w
Message size     | O(n²pw + nr(n−p))               | O(n²pw + nr(n−p)), amortized O(npw + r(n−p))  | O(nwd)                | O(n²w)
Time complexity  | write O(n²), read O(n²)         | write O(n²p), read O(n²)                      | write O(d), read O(d) | write O(n), read O(n)
Space complexity | O(max(n², npq))                 | O(max(n², npq)), amortized O(max(n, pq))      | O(max(d, q))          | O(nq)


Complexity Measures and Performance

Message Count as a Function of wrate

Define the write rate as wrate = w/(w + r).

Partial replication gives a lower message count than full replication if

pw + 2r(n − p)/n < nw  ⇒  w > 2r/n   (1)

⇒  wrate > 2/(2 + n)   (2)


Complexity Measures and Performance

Message count: Partial Replication vs. FullReplication

Figure: Message count for partial replication vs. full replication (n = 10; p = 1, 3, 5, 7, 10), plotting message count as a function of the write rate wrate = w/(w + r).

Complexity Measures and Performance

Message meta-data structures

Figure: Partial Replication

Figure: Full Replication (only SM)


Complexity Measures and Performance

Simulation Methodology

Time interval Te between two consecutive events: 5 ms – 2000 ms.

Propagation time Tt: 100 ms – 3000 ms.

Number of processes n is varied from 5 up to 40.

wrate is set to 0.2, 0.5, and 0.8, respectively.

Replica factor rate p/n for partial replication is defined as 0.3.

Message meta-data size (ms): the total size of all the meta-data transmitted over all the processes.

Each simulation execution runs 600n operation events in total.


Complexity Measures and Performance

Meta-Data Space Overhead in Partial Replication

Figure: Average message size (KB) of SM, FM, and RM messages for Full-Track and Opt-Track, as the number of processes varies from 5 to 40; wrate = 0.2.

Complexity Measures and Performance

Meta-Data Space Overhead in Partial Replication

Figure: Average message size (KB) of SM, FM, and RM messages for Full-Track and Opt-Track, as the number of processes varies from 5 to 40; wrate = 0.5.

Complexity Measures and Performance

Meta-Data Space Overhead in Partial Replication

Figure: Average message size (KB) of SM, FM, and RM messages for Full-Track and Opt-Track, as the number of processes varies from 5 to 40; wrate = 0.8.

Complexity Measures and Performance

Ratio of Message Overhead of Opt-Track to Full-Track

Figure: Ratio of the message overhead of Opt-Track to Full-Track: total message meta-data space overhead as a function of n and wrate (0.2, 0.5, 0.8) in the partial replication protocols.

Complexity Measures and Performance

Meta-Data Size in Partial Replication

With increasing n, the ratio rapidly decreases. For 40 processes, the Opt-Track overheads are only around 10 – 20% of the Full-Track overheads across the different write rates.

In the Full-Track protocol, the average message space overheads of SM and RM increase quadratically with n. In Opt-Track, however, the increasingly lower overheads of SM and RM are linear in n.


Complexity Measures and Performance

Meta-Data Space Overhead in Full Replication

Figure: Average message size (bytes) of SM messages for optP and Opt-Track-CRP, as the number of processes varies from 5 to 40; wrate = 0.2.

Complexity Measures and Performance

Meta-Data Space Overhead in Full Replication

Figure: Average message size (bytes) of SM messages for optP and Opt-Track-CRP, as the number of processes varies from 5 to 40; wrate = 0.5.

Complexity Measures and Performance

Meta-Data Space Overhead in Full Replication

Figure: Average message size (bytes) of SM messages for optP and Opt-Track-CRP, as the number of processes varies from 5 to 40; wrate = 0.8.

Complexity Measures and Performance

Ratio of Message Overhead of Opt-Track-CRP to optP

Figure: Ratio of the message overhead of Opt-Track-CRP to optP: total message meta-data space overhead as a function of n and wrate (0.2, 0.5, 0.8) in the full replication protocols.

Complexity Measures and Performance

Message Count: Partial Replication vs. FullReplication

Total message size overhead = total message count × (meta-data size + replicated data size),

where the meta-data size ≪ the replicated data size.

Figure: Total message count for Full Replication (Opt-Track-CRP) vs. Partial Replication (Opt-Track).


Complexity Measures and Performance

Message Size: Partial Replication vs. Full Replication

(a) when n = 40, p = 12, wrate = 0.5, in the worst case.

(b) when n = 40, p = 12, wrate = 0.5, in the real case.

Figure: Total message size for Full Replication (Opt-Track-CRP) and Partial Replication (Opt-Track).


Approximate Causal Consistency

Approximate Causal Consistency

For some applications where the data size is small (e.g., wall posts in Facebook), the size of the meta-data can be a problem.

Can further reduce meta-data overheads at the risk of some (rare) violations of causal consistency.

As dependencies age, w.h.p. the messages they represent get delivered, and the dependencies need not be carried around and stored.

The amount of violations can be made arbitrarily small by controlling a parameter called credits.
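The aging idea can be sketched for the hop-count instantiation of credits (data layout and names are assumptions): every dependency entry carries a credit that is decremented each time it is piggybacked across a hop, and at zero the entry is dropped from the meta-data:

```python
# Hedged sketch of credit-based aging of dependency entries.
def age_entries(log):
    """Charge one hop to every entry; keep only entries with credit left."""
    survivors = []
    for entry in log:
        entry = dict(entry, credit=entry["credit"] - 1)
        if entry["credit"] > 0:
            survivors.append(entry)
    return survivors

log = [{"write": (5, 1), "dests": {"P6"}, "credit": 2},
       {"write": (4, 2), "dests": {"P3"}, "credit": 1}]
hop1 = age_entries(log)
assert [e["write"] for e in hop1] == [(5, 1)]   # (4,2) aged out after 1 hop
assert age_entries(hop1) == []                   # all credits exhausted
```

Dropping an entry whose write has, with high probability, already been applied everywhere is exactly the trade-off: smaller meta-data against a small chance of applying a write before one of its aged-out predecessors.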


Approximate Causal Consistency Approx-Opt-Track

Notion of Credits: Case 1

Figure: Reduce the meta-data at the cost of some possible violations of causal consistency. The amount of violations can be made arbitrarily small by controlling a tunable parameter (credits).


Approximate Causal Consistency Approx-Opt-Track

Notion of Credits: Case 2

Figure: Illustration of meta-data reduction when credits are exhausted.


Approximate Causal Consistency Approx-Opt-Track

Approx-Opt-Track

Integrate the notion of credits into the Opt-Track algorithm to give an algorithm – Approx-Opt-Track [6] – that can fine-tune the amount of causal consistency by trading off the size of the meta-data overhead.

Three instantiations of credits are given: hop count, time-to-live, and metric distance.

Violation Error Rate: Re = ne / mc
  ne : number of messages applied by the remote replicated sites with violation of causal consistency
  mc : total number of transmitted messages

Meta-Data Saving Rate: Rs = 1 − ms(cr ≠ ∞) / ms(Opt-Track)
  ms : message meta-data size, the total size of all the meta-data transmitted over all the processes


Approximate Causal Consistency Performance

Simulation Methodology in Approx-Opt-Track

Time interval Te between two consecutive events: 5 ms – 2000 ms.

Propagation time Tt: 100 ms – 3000 ms.

Number of processes n is varied from 5 up to 40.

wrate is set to 0.2, 0.5, and 0.8, respectively.

Replica factor rate p/n for partial replication is defined as 0.3.

Number of variables q used is 100.

Hop Count Credit (cr): denotes the hop count available before the entry meta-data ages out and is removed.

Initial credit cr is specified from one to a critical value cr0, at which there is no message transmission violating causal consistency.

Each simulation execution runs 600n operation events in total.


Approximate Causal Consistency Performance

Violation Error Rates

Figure: Violation error rate as a function of the initial credits (1–9), for 5 to 40 peers; wrate = 0.2.

Approximate Causal Consistency Performance

Violation Error Rates

Figure: Violation error rate as a function of the initial credits (1–9), for 5 to 40 peers; wrate = 0.5.

Approximate Causal Consistency Performance

Violation Error Rates

Figure: Violation error rate as a function of the initial credits (1–9), for 5 to 40 peers; wrate = 0.8.

Approximate Causal Consistency Performance

Critical Initial Credits

Figure: Summary of the critical values of cr0 and cr≤0.5%.


Approximate Causal Consistency Performance

Impact of initial cr on Re

cr0 : major critical initial credit (Re = 0)

cr≤0.5% : minor critical initial credit (Re ≤ 0.5%)

cr≤0.5% does not increase much with n.

By setting the initial cr to a small but sufficient finite value, most of the dependencies become aged and can be removed without violating causal consistency after the associated meta-data is transmitted across a few hops (even for a large number of processes).

The correlation coefficients of cr0 and n are around 0.94 – 0.95. The major critical credit values increase with n to avoid causal violations.


Approximate Causal Consistency Performance

Average Meta-Data Size (KB)

Figure: Average meta-data size (KB) as a function of the initial credits (1–8), for 5 to 40 peers; wrate = 0.2.

Approximate Causal Consistency Performance

Average Meta-Data Size (KB)

Figure: Average meta-data size (KB) vs. initial credits (5, 10, 20, 30, and 40 peers); wrate = 0.5.


Average Meta-Data Size (KB)

Figure: Average meta-data size (KB) vs. initial credits (5, 10, 20, 30, and 40 peers); wrate = 0.8.


Critical Average Message Meta-Data Size mave (KB)

Figure: Summary of the critical average message meta-data sizes.


Critical Average Message Meta-Data Size

With an increasing number of processes, mave increases linearly. mave becomes smaller at a higher wrate when there are more processes.

A read operation invokes a MERGE function to merge the piggybacked log list, so a higher read rate increases the likelihood that the log size grows. Furthermore, although a write operation adds explicit information, it also triggers a PURGE function that deletes the redundant information.
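The interplay of the two functions can be sketched roughly as follows. This is a simplified stand-in, assuming a log represented as a dictionary keyed by (variable, write id) with credit values; Opt-Track's actual redundancy test is more involved than the per-variable pruning shown here.

```python
def merge(local_log, piggybacked_log):
    """On a read: fold the piggybacked log into the local log, keeping
    each dependency with the larger credit value. The log can only
    grow here, which is why read-heavy workloads enlarge the
    meta-data."""
    merged = dict(local_log)
    for dep, credits in piggybacked_log.items():
        merged[dep] = max(merged.get(dep, 0), credits)
    return merged

def purge(log, var, write_id, cr0):
    """On a write to `var`: drop older entries for the same variable
    (a crude proxy for Opt-Track's redundancy test) and record the
    new write with fresh credits."""
    pruned = {dep: c for dep, c in log.items() if dep[0] != var}
    pruned[(var, write_id)] = cr0
    return pruned

# A read pulls in a piggybacked dependency on y; a later write to x
# purges the stale x entry and records the new write with fresh credits.
log = merge({("x", 1): 2}, {("x", 1): 3, ("y", 1): 1})
log = purge(log, "x", 2, cr0=4)
```

Under this reading, writes both add and remove log entries, while reads only add, which matches the observation that mave shrinks as wrate grows.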


Message Meta-Data Saving Rate

Figure: Meta-data size saving rate vs. initial credits (5, 10, 20, 30, and 40 peers); wrate = 0.2.


Message Meta-Data Saving Rate

Figure: Meta-data size saving rate vs. initial credits (5, 10, 20, 30, and 40 peers); wrate = 0.5.


Message Meta-Data Saving Rate

Figure: Meta-data size saving rate vs. initial credits (5, 10, 20, 30, and 40 peers); wrate = 0.8.


Critical Message Meta-Data Size Saving Rates Rs

Figure: Summary of the critical message meta-data size saving rates.


Critical Message Meta-Data Size Saving Rate

Focus on the case of 40 processes:

Rs is around 40%–60% at a very slight cost of violating causal consistency.

Rs reaches around 5%–20% without violating causality order, across different write rates.

This evidence shows that if the initial credit allocation is only a small value, then by the time the corresponding meta-data is removed, the associated message will (very likely) already have reached its destination.
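For concreteness, the saving rate can be read as the relative reduction in piggybacked meta-data size compared with the unbounded (full-credit) log. This is our reading of the metric, not a formula stated on the slides.

```python
def saving_rate(m_full, m_approx):
    """Rs = (m_full - m_approx) / m_full: the fraction of message
    meta-data saved by credit-based purging relative to keeping the
    full dependency log."""
    return (m_full - m_approx) / m_full

# e.g. shrinking the average meta-data from 2.0 KB to 1.0 KB
assert abs(saving_rate(2.0, 1.0) - 0.5) < 1e-9
```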


Conclusions and Future Work

Conclusions

Opt-Track has better network capacity utilization and better scalability than Full-Track for causal consistency under partial replication.

The meta-data overhead of Opt-Track is linear in n.

These improvements grow under more write-intensive workloads.

For the case of 40 processes, the overheads of Opt-Track are only around 10%–20% of those of Full-Track across different write rates.

Opt-Track-CRP can perform better than optP (Baldoni et al. 2006) under full replication.

For 40 processes, the overheads of Opt-Track-CRP are only around 50%–55% of those of optP across different write rates.

Showed the advantages of partial replication and provided the conditions under which partial replication incurs less overhead than full replication.

Introduced Approx-Opt-Track, a modification of Opt-Track that provides approximate causal consistency by reducing the size of the meta-data.

By controlling a parameter cr, we can trade off the level of potential inaccuracy against the size of the meta-data.


Future Work

Reduce the size of the meta-data for maintaining causal consistency inpartially replicated systems.

Dynamic and Data-driven Replication Mechanism (Optimize thereplication mechanism).

Which file? How many replicas? Where? When? (i.e., the replica factor rate p/n is a variable.)

Hierarchical Causal Consistency Protocol in Partial Replication.

A client-cluster model (two-level architecture) for causal consistency underpartial replication.


References I

M. Ahamad, G. Neiger, J. Burns, P. Kohli, and P. Hutto. Causal memory: Definitions, implementation and programming. Distributed Computing, 9(1):37–49, 1994.

R. Baldoni, A. Milani, and S. Tucci-Piergiovanni. Optimal propagation based protocols implementing causal memories. Distributed Computing, 18(6):461–474, 2006.

Min Shen, Ajay D. Kshemkalyani, and Ta Yuan Hsu. OPCAM: Optimal algorithms implementing causal memories in shared memory systems. In Proceedings of the 2015 International Conference on Distributed Computing and Networking (ICDCN 2015), Goa, India, January 4-7, 2015, pages 16:1–16:4.


References II

Min Shen, Ajay D. Kshemkalyani, and Ta Yuan Hsu. Causal consistency for geo-replicated cloud storage under partial replication. In IPDPS Workshops, pages 509–518. IEEE, 2015.

A. Kshemkalyani and M. Singhal. Necessary and sufficient conditions on information for causal message ordering and their optimal implementation. Distributed Computing, 11(2):91–111, April 1998.

Ta Yuan Hsu and Ajay D. Kshemkalyani. Performance of approximate causal consistency for partially replicated systems. In Workshop on Adaptive Resource Management and Scheduling for Cloud Computing (ARMS-CC), pages 7–13. ACM, 2016.


Questions

Thank You!
