Causal Consistency Algorithms for Partially Replicated and Fully Replicated Systems
TaYuan Hsu, Ajay D. Kshemkalyani, Min Shen
University of Illinois at Chicago
Outline
1 Introduction
2 System Model
3 Algorithms
  Full-Track Algorithm: partially replicated memory
  Opt-Track Algorithm: partially replicated memory
  Opt-Track-CRP: fully replicated memory
4 Complexity Measures and Performance
5 Approximate Causal Consistency
  Approx-Opt-Track
  Performance
6 Conclusions and Future Work
Introduction
Motivation for Replication
Replication makes copies of data/services on multiple sites. Replication improves ...
Reliability (by redundancy): if the primary file server crashes, a standby file server still works
Performance: reduced communication delays
Scalability: prevent overloading a single server (size scalability); avoid communication latencies (geographic scale)
However, concurrent updates become complex.
Consistency models define which interleavings of operations are valid (admissible).
Introduction
Consistency Models
Spectrum of consistency models; trade-off: cost vs. convenient semantics
Linearizability (Herlihy and Wing 1990): non-overlapping ops seen in order of occurrence + overlapping ops seen in common order
Sequential consistency (Lamport 1979): all processes see the same interleaving of executions
Causal consistency (Ahamad et al. 1991): all processes see the same order of causally related writes
PRAM consistency (Lipton and Sandberg 1988): pipeline per pair of processes, for updates
Slow memory (Hutto et al. 1990): pipeline per variable per pair of processes, for updates
Eventual consistency (Johnson et al. 1975): updates are guaranteed to eventually propagate to all replicas
Introduction
The Motivation of Causal Consistency
Figure: Causal consistency is good for users in social networks.
Introduction
Related Work
Causal consistency in distributed shared memory systems (Ahamad etal. [1])
Causal consistency has been studied (by Baldoni et al., Mahajan et al., Belaramani et al., Petersen et al.). Since 2011:
ChainReaction (S. Almeida et al.)Bolt-on causal consistency (P. Bailis et al.)Orbe and GentleRain (J. Du et al.)Wide-Area Replicated Storage (K. Lady et al.)COPS, Eiger (W. Lloyd et al.)
The above works assume full replication.
Introduction
Partial Replication
Figure: Case for Partial Replication.
Introduction
Case for Partial Replication
Partial replication is more natural for some applications. See previous fig.
With p replicas at some p of the total of n DCs, each write operation thatwould have triggered an update broadcast now becomes a multicast to justp of the n DCs.
Savings in storage and infrastructure costs
For write-intensive workloads, partial replication gives a direct savings inthe number of messages.
Allowing flexibility in the number of DCs required in causally consistentreplication remains an interesting aspect of future work.
The supposedly higher cost of tracking dependency metadata is relativelysmall for applications such as Instagram.
Introduction
Causal Consistency
Causal consistency: writes that are potentially causally related must beseen by all processors in that same order. Concurrent writes may be seenin a different order on different machines.
causally related writes: the second write comes after a read that returned the value of the first write
Examples
Figure: Should the W(x)a < W(x)b ordering be enforced?
System Model
System Model
n application processes ap1, ap2, ..., apn interact through a shared memory Q composed of q variables x1, x2, ..., xq.
Each api can perform a read or a write on any of the q variables.
ri(xj)v: a read operation by api on variable xj that returns value v
wi(xj)v: a write operation by api on variable xj that writes value v
local history hi: the sequence of read and write operations generated by process api
global history H: the set of local histories hi from all n processes
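As an illustrative sketch only (the class and field names below are ours, not the paper's), the operations and histories of the model can be encoded in a few lines:

```python
from dataclasses import dataclass

# Illustrative encoding of the system model (names are ours).
@dataclass(frozen=True)
class Op:
    proc: int      # index i of the issuing application process ap_i
    kind: str      # "r" for read, "w" for write
    var: str       # variable name x_j
    value: object  # value v read or written

# local history h_i: the sequence of operations generated by ap_i
h1 = [Op(1, "w", "x", 5)]
h2 = [Op(2, "r", "x", 5), Op(2, "w", "y", 7)]

# global history H: the set of local histories from all n processes
H = {1: h1, 2: h2}
assert H[2][0].value == 5   # ap2's read returned the value ap1 wrote
```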
System Model
Causality Order
Program Order: a local operation o1 precedes another local operation o2, denoted o1 ≺po o2
Read-from Order: a read operation o2 = r(x)v retrieves the value v written by the write operation o1 = w(x)v at a distinct process, denoted o1 ≺ro o2
Causality Order: o1 ≺co o2 iff one of the following conditions holds:
  ∃ api s.t. o1 ≺po o2 (program order)
  ∃ api, apj s.t. o1 ≺ro o2 (read-from order)
  ∃ o3 ∈ OH s.t. o1 ≺co o3 and o3 ≺co o2 (transitive closure)
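As a sketch, the causality order is simply the transitive closure of the program-order and read-from pairs; the operation labels below are our own, purely illustrative:

```python
# Sketch: the causality order ≺co as the transitive closure of program
# order and read-from order.

def causality_order(po, ro):
    """po, ro: sets of ordered pairs (o1, o2). Returns all pairs o1 ≺co o2."""
    co = set(po) | set(ro)
    changed = True
    while changed:                      # naive transitive closure
        changed = False
        for (a, b) in list(co):
            for (c, d) in list(co):
                if b == c and (a, d) not in co:
                    co.add((a, d))
                    changed = True
    return co

po = {("r2(x)v", "w2(y)u")}   # ap2 reads x, then writes y (program order)
ro = {("w1(x)v", "r2(x)v")}   # the read returns the value written by ap1
co = causality_order(po, ro)
assert ("w1(x)v", "w2(y)u") in co   # follows by transitivity
```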
System Model
Underlying Distributed Communication System
The shared memory abstraction and its causal consistency model are implemented on top of a distributed message-passing system.
With n sites (connected by FIFO channels), each site si hosts an application process api and holds only a subset of the variables xh ∈ Q. When api performs a write operation w(x1)v, it invokes Multicast(m) to deliver the message m containing w(x1)v to all sites replicating x1.
send event, receive event, apply event in msg-passing layer
When api performs a read operation r(x2)v, it invokes RemoteFetch(m) to deliver the message m containing r(x2)v to a pre-designated site replicating x2 and fetch its value.
fetch event, remote_return event, return event in msg-passing layer
System Model
Activation Predicate
Baldoni et al. [2] defined the relation →co on send events.
sendi(m_w(x)a) →co sendj(m_w(y)b) iff one of the following conditions holds:
1. i = j and sendi(m_w(x)a) locally precedes sendj(m_w(y)b)
2. i ≠ j and returnj(x, a) locally precedes sendj(m_w(y)b)
3. ∃ sendk(m_w(z)c) s.t. sendi(m_w(x)a) →co sendk(m_w(z)c) →co sendj(m_w(y)b)
→co ⊂ → (Lamport's happened-before relation)
A write can be applied when its activation predicate becomes true:
there is no earlier message (under →co) which has not been locally applied
System Model
Implementing the Activation Predicate
Figure: Can the write for M3 be applied at s1? Only after the earlier write for M is applied. The dependency "s1 is a destination of M" is needed as meta-data on M3.
Meta-data grows as computation progresses
Objective: to reduce the size of meta-data
Algorithms
Overview of Algorithms
Two algorithms [3, 4] implement causal consistency in a partiallyreplicated distributed shared memory system.
Full-Track
Opt-Track (a message- and space-optimal algorithm)
Both implement the →co relation and adopt the activation predicate.
Opt-Track-CRP (optimal): a special case of Opt-Track for full replication, with lower message size, time, and space complexity than the Baldoni et al. algorithm [2]
Algorithms Full-Track Algorithm: partially replicated memory
Algorithm 1: Full-Track
Algorithm 1 is for a non-fully replicated system.
Each application process performing a write operation writes to only a subset of all the sites.
Each site si needs to track the number of write operations performed by every apj to every site sk, denoted Writei[j][k].
The Write clock piggybacked on messages generated by Multicast(m) should not be merged with the local Write clock at message reception, but only at a later read operation that reads the value delivered with the message.
Algorithms Full-Track Algorithm: partially replicated memory
Algorithm 1: Full-Track
Data structures
1. Writei – the Write clock
   Writei[j, k]: the number of updates sent by apj to site sk that causally happened before under the →co relation
2. Applyi – an array of integers
   Applyi[j]: the number of updates written by apj that have been applied at site si
3. LastWriteOni⟨variable id, Write⟩ – a hash map of Write clocks
   LastWriteOni⟨h⟩: the Write clock value associated with the last write operation on variable xh locally replicated at site si
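A minimal sketch of how these structures might interact, under our own simplified reading of the activation predicate (this is not the paper's pseudo-code): an update piggybacking a Write clock W can be applied at site si once every causally preceding write destined to si has already been applied there.

```python
# Our own simplified reading of Full-Track's activation predicate. W is the
# n x n Write clock piggybacked on an update; apply_i[k] counts updates
# written by ap_k that have already been applied at site s_i.

def can_apply(apply_i, W, i):
    """True when every causally preceding write destined to s_i is applied."""
    n = len(apply_i)
    return all(apply_i[k] >= W[k][i] for k in range(n))

# n = 3: the piggybacked clock records one earlier write by ap_0 sent to s_1.
W = [[0, 1, 0],
     [0, 0, 0],
     [0, 0, 0]]
assert can_apply([1, 0, 0], W, 1)       # earlier write applied: deliver
assert not can_apply([0, 0, 0], W, 1)   # not yet applied: buffer the update
```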
Algorithms Full-Track Algorithm: partially replicated memory
Algorithm 1: Full-Track
Figure: Pseudo-code of Algorithm 1 (Full-Track).
Algorithms Full-Track Algorithm: partially replicated memory
Algorithm 1: Full-Track
The activation predicate is implemented.
Algorithms Opt-Track Algorithm: partially replicated memory
Algorithm 2: Opt-Track
In Algorithm 1, each message corresponding to a write operation piggybacks an O(n²) matrix.
Algorithm 2 uses propagation constraints to further reduce the message size and storage cost.
Exploits the transitive dependency of causal deliveries of messages (ideasfrom the KS algorithm [5]).
Each site keeps a record of the most recently received message from eachother site (along with the list of its destinations).
Optimal in terms of log space and message space overhead; achieves a further optimality: no redundant destination information is recorded.
Algorithms Opt-Track Algorithm: partially replicated memory
Propagation Constraints: Two Situations for Destination Information to be Redundant
Figure: Meta-data information δ = "s2 is a destination of m". The causal future of the relevant apply and return events is shown in dotted lines.
δ must not exist in the causal future of:
apply(w): Propagation Constraint 1
apply(w'): Propagation Constraint 2
Algorithms Opt-Track Algorithm: partially replicated memory
Algorithm 2: Opt-Track
Data structures
1. clocki – local counter at site si for write operations performed by api
2. Applyi – an array of integers
   Applyi[j]: the number of updates written by apj that have been applied at site si
3. LOGi = {⟨j, clockj, Dests⟩} – the local log
   Each entry indicates a write operation in the causal past.
4. LastWriteOni⟨variable id, LOG⟩ – a hash map of LOGs
   LastWriteOni⟨h⟩: the piggybacked LOG from the most recent update applied at site si for locally replicated variable xh
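The flavor of the log bookkeeping can be sketched as follows; the merge rule shown (intersecting destination lists known to both copies) is our simplified rendering of the KS-style idea that destination information missing from one copy is redundant, not the paper's exact MERGE procedure:

```python
# Sketch of Opt-Track-style log bookkeeping (our simplification). A log maps
# a write id (site, clock) to its destination set; destination info absent
# from one of two copies is treated as redundant and pruned.

def merge(log_a, log_b):
    merged = dict(log_a)
    for key, dests in log_b.items():
        # keep only destinations both copies still consider non-redundant
        merged[key] = merged[key] & dests if key in merged else set(dests)
    return merged

local = {("s1", 4): {"s2", "s6"}}
piggybacked = {("s1", 4): {"s6"}, ("s3", 2): {"s4"}}
out = merge(local, piggybacked)
assert out[("s1", 4)] == {"s6"}    # "s2 is a destination" was redundant
assert out[("s3", 2)] == {"s4"}    # newly learned dependency is kept
```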
Algorithms Opt-Track Algorithm: partially replicated memory
Algorithm 2: Opt-Track
Figure: Write process at site si
Algorithms Opt-Track Algorithm: partially replicated memory
Algorithm 2: Opt-Track
Figure: Read, receiving processes at site si
Algorithms Opt-Track Algorithm: partially replicated memory
Procedures used in Opt-Track
Figure: PURGE and MERGE functions at site si
Algorithms Opt-Track Algorithm: partially replicated memory
The Propagation Constraints: An example
Figure: Meta-data information δ = "P6 is a destination of M5,1". The propagation of explicit information and the inference of implicit information.
Algorithms Opt-Track Algorithm: partially replicated memory
If the Destination List Becomes ∅, then ...
Figure: Illustration of why it is important to keep a record even if its destination listbecomes empty.
Algorithms Opt-Track-CRP: fully replicated memory
Algorithm 3: Opt-Track-CRP
Special case of Algorithm 2 for full replication; same optimizations.
Every write operation is sent to exactly the same set of sites; there is no need to keep a list of destination information with each write.
Represent each write operation as ⟨i, clocki⟩ at site si.
Brings the cost of a write operation down from O(n) to O(1).
The local log holds d entries, where d = number of write operations currently in the local log.
The local log is always reset after each write.
Each read adds at most one new entry to the local log.
Algorithms Opt-Track-CRP: fully replicated memory
Further Improved Scalability
In Algorithm 2, keeping entries with empty destination list is important.
In the fully replicated case, we can also decrease this cost.
Figure: In fully replicated systems, the local log will be reset after each write.
Algorithms Opt-Track-CRP: fully replicated memory
Algorithm 3: Opt-Track-CRP
Figure: There is no need to maintain the destination list for each write operation inthe local log.
Complexity Measures and Performance
Parameters
n : number of sites in the system
q : number of variables in the system
p: replication factor, i.e., the number of sites where each variable isreplicated
w : number of write operations performed in the system
r : number of read operations performed in the system
d : number of write operations stored in the local log (used only in the Opt-Track-CRP algorithm)
Complexity Measures and Performance
Complexity
Table: Complexity measures of causal memory algorithms.
Message Count:
  Full-Track: ((p − 1) + (n − p)/n)w + 2r(n − p)/n
  Opt-Track: ((p − 1) + (n − p)/n)w + 2r(n − p)/n
  Opt-Track-CRP: (n − 1)w
  optP [2]: (n − 1)w
Message Size:
  Full-Track: O(n²pw + nr(n − p))
  Opt-Track: O(n²pw + nr(n − p)), amortized O(npw + r(n − p))
  Opt-Track-CRP: O(nwd)
  optP [2]: O(n²w)
Time Complexity:
  Full-Track: write O(n²), read O(n²)
  Opt-Track: write O(n²p), read O(n²)
  Opt-Track-CRP: write O(d), read O(d)
  optP [2]: write O(n), read O(n)
Space Complexity:
  Full-Track: O(max(n², npq))
  Opt-Track: O(max(n², npq)), amortized O(max(n, pq))
  Opt-Track-CRP: O(max(d, q))
  optP [2]: O(nq)
Complexity Measures and Performance
Message Count as a Function of wrate
Define the write rate as wrate = w/(w + r).
Partial replication gives a lower message count than full replication if
  pw + 2r(n − p)/n < nw  ⇒  w > 2r/n    (1)
  ⇒  wrate > 2/(n + 2)    (2)
Complexity Measures and Performance
Message Count: Partial Replication vs. Full Replication
Figure: Message count for partial replication vs. full replication (n = 10; p = 1, 3, 5, 7, 10), plotted as a function of wrate.
Complexity Measures and Performance
Message meta-data structures
Figure: Partial Replication
Figure: Full Replication (only SM)
Complexity Measures and Performance
Simulation Methodology
Time interval Te between two consecutive events: 5 ms to 2000 ms.
Propagation time Tt: 100 ms to 3000 ms.
Number of processes n is varied from 5 up to 40.
wrate is set to 0.2, 0.5, and 0.8, respectively.
Replication factor rate p/n for partial replication is set to 0.3.
Message meta-data size (ms): the total size of all the meta-data transmitted over all the processes.
Each simulation execution runs 600n operation events in total.
Complexity Measures and Performance
Meta-Data Space Overhead in Partial Replication
Figure: Average message size (KB) vs. number of processes (5–40) for SM, FM, RM under Full-Track and Opt-Track; wrate = 0.2.
Complexity Measures and Performance
Meta-Data Space Overhead in Partial Replication
Figure: Average message size (KB) vs. number of processes (5–40) for SM, FM, RM under Full-Track and Opt-Track; wrate = 0.5.
Complexity Measures and Performance
Meta-Data Space Overhead in Partial Replication
Figure: Average message size (KB) vs. number of processes (5–40) for SM, FM, RM under Full-Track and Opt-Track; wrate = 0.8.
Complexity Measures and Performance
Ratio of Message Overhead of Opt-Track to Full-Track
Figure: Total message meta-data space overhead (ratio of Opt-Track to Full-Track) as a function of n, for write rates 0.2, 0.5, 0.8, in partial replication protocols.
Complexity Measures and Performance
Meta-Data Size in Partial Replication
With increasing n, the ratio rapidly decreases. For 40 processes, the Opt-Track overheads are only around 10–20% of the Full-Track overheads across the different write rates.
In the Full-Track protocol, the average message space overheads of SM and RM increase quadratically with n. In Opt-Track, the increasingly lower overheads of SM and RM are linear in n.
Complexity Measures and Performance
Meta-Data Space Overhead in Full Replication
Figure: Average message size (Bytes) vs. number of processes (5–40) for SM under optP and Opt-Track-CRP; wrate = 0.2.
Complexity Measures and Performance
Meta-Data Space Overhead in Full Replication
Figure: Average message size (Bytes) vs. number of processes (5–40) for SM under optP and Opt-Track-CRP; wrate = 0.5.
Complexity Measures and Performance
Meta-Data Space Overhead in Full Replication
Figure: Average message size (Bytes) vs. number of processes (5–40) for SM under optP and Opt-Track-CRP; wrate = 0.8.
Complexity Measures and Performance
Ratio of Message Overhead of Opt-Track-CRP to optP
Figure: Total message meta-data space overhead (ratio of Opt-Track-CRP to optP) as a function of n, for write rates 0.2, 0.5, 0.8, in full replication protocols.
Complexity Measures and Performance
Message Count: Partial Replication vs. Full Replication
Total message size overhead = total message count × (meta-data size + replicated data size), where in practice meta-data size ≪ replicated data size.
Figure: Total message count for Full Replication (Opt-Track-CRP) VS. PartialReplication (Opt-Track).
Complexity Measures and Performance
Message Size: Partial Replication vs. Full Replication
(a) when n = 40, p = 12, wrate = 0.5, in the worst case.
(b) when n = 40, p = 12, wrate = 0.5, in the real case.
Figure: Total message size for Full Replication (Opt-Track-CRP) and PartialReplication (Opt-Track).
Approximate Causal Consistency
Approximate Causal Consistency
For some applications where the data size is small (e.g., wall posts in Facebook), the size of the meta-data can be a problem.
Can further reduce meta-data overheads at the risk of some (rare)violations of causal consistency.
As dependencies age, w.h.p. the messages they represent get deliveredand the dependencies need not be carried around and stored.
Amount of violations can be made arbitrarily small by controlling aparameter called credits.
Approximate Causal Consistency Approx-Opt-Track
Notion of Credits: Case 1
Figure: Reduce the meta-data at the cost of some possible violations of causalconsistency. The amount of violations can be made arbitrarily small by controlling atunable parameter (credits).
Approximate Causal Consistency Approx-Opt-Track
Notion of Credits: Case 2
Figure: Illustration of meta-data reduction when credits are exhausted.
Approximate Causal Consistency Approx-Opt-Track
Approx-Opt-Track
Integrate the notion of credits into the Opt-Track algorithm to give an algorithm – Approx-Opt-Track [6] – that can fine-tune the amount of causal consistency by trading off the size of the meta-data overhead.
Three instantiations of credits: hop count, time-to-live, and metric distance.
Violation Error Rate: Re = ne / mc
  ne: number of messages applied at remote replicated sites with a violation of causal consistency
  mc: total number of transmitted messages
Meta-Data Saving Rate: Rs = 1 − ms(cr ≠ ∞) / ms(Opt-Track)
  ms: message meta-data size, the total size of all the meta-data transmitted over all the processes
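The hop-count instantiation of credits can be sketched as follows (our own simplification of the aging rule, not the paper's pseudo-code): each dependency entry carries a credit that is charged per hop, and an entry whose credit is exhausted is dropped, trading meta-data size for a small risk of violation.

```python
# Sketch (ours): hop-count credits on dependency entries. Each entry is a
# (dependency, credit) pair; one credit is charged per hop, and exhausted
# entries are purged from the log.

def age_and_prune(log, cost=1):
    """Charge `cost` credits per hop; drop entries whose credit hits zero."""
    aged = [(dep, cr - cost) for (dep, cr) in log]
    return [(dep, cr) for (dep, cr) in aged if cr > 0]

log = [("w_a", 2), ("w_b", 1)]
log = age_and_prune(log)
assert log == [("w_a", 1)]    # w_b's dependency aged out and was dropped
log = age_and_prune(log)
assert log == []              # all meta-data eventually disappears
```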
Approximate Causal Consistency Performance
Simulation Methodology in Approx-Opt-Track
Time interval Te between two consecutive events: 5 ms to 2000 ms.
Propagation time Tt: 100 ms to 3000 ms.
Number of processes n is varied from 5 up to 40.
wrate is set to 0.2, 0.5, and 0.8, respectively.
Replication factor rate p/n for partial replication is set to 0.3.
Number of variables q used is 100.
Hop Count Credit (cr): the hop count available before an entry's meta-data ages out and is removed.
The initial credit cr is varied from one up to a critical value cr0, at which no message transmission violates causal consistency.
Each simulation execution runs 600n operation events in total.
Approximate Causal Consistency Performance
Violation Error Rates
Figure: Violation error rate vs. initial credits (1–9) for 5, 10, 20, 30, 40 peers; wrate = 0.2.
Approximate Causal Consistency Performance
Violation Error Rates
Figure: Violation error rate vs. initial credits (1–9) for 5, 10, 20, 30, 40 peers; wrate = 0.5.
Approximate Causal Consistency Performance
Violation Error Rates
Figure: Violation error rate vs. initial credits (1–9) for 5, 10, 20, 30, 40 peers; wrate = 0.8.
Approximate Causal Consistency Performance
Critical Initial Credits
Figure: Summary of the critical values of cr0 and cr≤0.5%.
Approximate Causal Consistency Performance
Impact of initial cr on Re
cr0: major critical initial credit (Re = 0)
cr≤0.5%: minor critical initial credit (Re ≤ 0.5%)
cr≤0.5% does not increase much with n.
By setting the initial cr to a small but sufficient finite value, most dependencies become aged and can be removed without violating causal consistency after the associated meta-data has been transmitted across a few hops, even for a large number of processes.
The correlation coefficient of cr0 and n is around 0.94–0.95: the major critical credit values increase with n to avoid causal violations.
Approximate Causal Consistency Performance
Average Meta-Data Size (KB)
Figure: Average meta-data size (KB) vs. initial credits (1–8) for 5, 10, 20, 30, 40 peers; wrate = 0.2.
Approximate Causal Consistency Performance
Average Meta-Data Size (KB)
Figure: Average meta-data size (KB) vs. initial credits (1–8) for 5, 10, 20, 30, 40 peers; wrate = 0.5.
Approximate Causal Consistency Performance
Average Meta-Data Size (KB)
Figure: Average meta-data size (KB) vs. initial credits (1–8) for 5, 10, 20, 30, 40 peers; wrate = 0.8.
Approximate Causal Consistency Performance
Critical Average Message Meta-Data Size mave (KB)
Figure: Summary of the critical average message meta-data sizes.
Approximate Causal Consistency Performance
Critical Average Message Meta-Data Size
With an increasing number of processes, m_ave increases linearly. m_ave becomes smaller with a higher wrate for larger numbers of processes.
A read operation invokes a MERGE function to merge the piggybacked log list, so a higher read rate tends to enlarge the log size. In contrast, although a write operation adds explicit information, it also invokes a PURGE function that deletes redundant information.
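The interplay between reads growing the log and writes shrinking it can be sketched as below. This is a simplified stand-in for Opt-Track's MERGE and PURGE, not their actual rules; the dict-of-latest-clocks representation and the `overwritten` set are illustrative assumptions.

```python
# Why reads enlarge the log and writes shrink it (simplified sketch).
# A log maps writer id -> latest known clock of a causally preceding write.

def merge(local_log: dict, piggybacked: dict) -> dict:
    """On a read, union the piggybacked dependency list into the local
    log, keeping the latest clock per writer -- the log can only grow."""
    out = dict(local_log)
    for w, clk in piggybacked.items():
        out[w] = max(out.get(w, 0), clk)
    return out

def purge(log: dict, overwritten: set) -> dict:
    """On a write, remove entries made redundant by the new write
    (dependencies it subsumes)."""
    return {w: clk for w, clk in log.items() if w not in overwritten}

log = merge({1: 4}, {2: 7, 1: 3})   # read: log grows to {1: 4, 2: 7}
log = purge(log, {2})               # write: redundant entry removed
```

This is why a higher write rate keeps m_ave smaller: every write is an opportunity to discard information, while every read is an opportunity to accumulate it.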
Approximate Causal Consistency Performance
Message Meta-Data Saving Rate

[Plot: meta-data size saving rate vs. initial credits (1–9); one curve each for 5, 10, 20, 30, and 40 peers.]

Figure: wrate = 0.2.
Approximate Causal Consistency Performance
Message Meta-Data Saving Rate

[Plot: meta-data size saving rate vs. initial credits (1–9); one curve each for 5, 10, 20, 30, and 40 peers.]

Figure: wrate = 0.5.
Approximate Causal Consistency Performance
Message Meta-Data Saving Rate

[Plot: meta-data size saving rate vs. initial credits (1–9); one curve each for 5, 10, 20, 30, and 40 peers.]

Figure: wrate = 0.8.
Approximate Causal Consistency Performance
Critical Message Meta-Data Size Saving Rates Rs
Figure: Summary of the critical message meta-data size saving rates.
Approximate Causal Consistency Performance
Critical Message Meta-Data Size Saving Rate
Focus on the case of 40 processes:
Rs is around 40% ∼ 60% at a very slight cost of violating causal consistency.
Rs reaches around 5% ∼ 20% without violating causality order, across different write rates.
This shows that if the initial credit allocation is only a small value, then by the time the corresponding meta-data is removed, the associated message will very likely already have reached its destination.
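The saving rate Rs reported in these plots can be read as the fraction of meta-data avoided relative to full tracking. A minimal sketch, assuming Rs = 1 − m_approx / m_full (this formula is our reading of the plots, not quoted from the slides):

```python
def saving_rate(m_full: float, m_approx: float) -> float:
    """Meta-data size saving rate: Rs = 1 - m_approx / m_full,
    where m_full is the average message meta-data size with full
    dependency tracking and m_approx the size under credit-based aging."""
    return 1.0 - m_approx / m_full

# e.g. reducing the average meta-data from 2.0 KB to 1.0 KB:
rs = saving_rate(2.0, 1.0)   # 0.5, i.e. a 50% saving
```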
Conclusions and Future Work
Conclusions
Opt-Track has better network capacity utilization and better scalability than Full-Track for causal consistency in partial replication.
The meta-data overhead of Opt-Track is linear in n. These improvements grow under more write-intensive workloads. For the case of 40 processes, the overheads of Opt-Track are only around 10% ∼ 20% of those of Full-Track across different write rates.
Opt-Track-CRP can perform better than optP (Baldoni et al. 2006) in full replication.
For 40 processes, the overheads of Opt-Track-CRP are only around 50% ∼ 55% of those of optP across different write rates.
Showed the advantages of partial replication and provided the conditions under which partial replication can incur less overhead than full replication.
A modification of Opt-Track, called Approx-Opt-Track, provides approximate causal consistency by reducing the size of the meta-data.
By controlling a parameter cr, we can trade off the level of potential inaccuracy against the size of the meta-data.
Conclusions and Future Work
Future Work
Reduce the size of the meta-data for maintaining causal consistency inpartially replicated systems.
Dynamic and Data-driven Replication Mechanism (Optimize thereplication mechanism).
Which file? How many replicas? Where? When? (i.e., the replication factor p/n is a variable.)
Hierarchical causal consistency protocol in partial replication.
A client-cluster model (two-level architecture) for causal consistency underpartial replication.
Conclusions and Future Work
References I
M. Ahamad, G. Neiger, J. Burns, P. Kohli, and P. Hutto. Causal memory: Definitions, implementation and programming. Distributed Computing, 9(1):37–49, 1994.
R. Baldoni, A. Milani, and S. Tucci-Piergiovanni. Optimal propagation based protocols implementing causal memories. Distributed Computing, 18(6):461–474, 2006.
Min Shen, Ajay D. Kshemkalyani, and Ta Yuan Hsu. OPCAM: Optimal algorithms implementing causal memories in shared memory systems. In Proceedings of the 2015 International Conference on Distributed Computing and Networking (ICDCN 2015), Goa, India, January 4–7, 2015, pages 16:1–16:4.
Conclusions and Future Work
References II
Min Shen, Ajay D. Kshemkalyani, and Ta Yuan Hsu. Causal consistency for geo-replicated cloud storage under partial replication. In IPDPS Workshops, pages 509–518. IEEE, 2015.
A. Kshemkalyani and M. Singhal. Necessary and sufficient conditions on information for causal message ordering and their optimal implementation. Distributed Computing, 11(2):91–111, April 1998.
Ta Yuan Hsu and Ajay D. Kshemkalyani. Performance of approximate causal consistency for partially replicated systems. In Workshop on Adaptive Resource Management and Scheduling for Cloud Computing (ARMS-CC), pages 7–13. ACM, 2016.
Conclusions and Future Work
Questions
Thank You!