data consistency in 3d this talk is about… · [consistency in 3d] guarantee vs. semantics...
TRANSCRIPT
Data consistency in 3D(It’s the invariants, stupid)
Marc Shapiro Masoud Saieda Ardekani
Gustavo Petri
[Consistency in 3D]
This talk is about…Understanding consistency
• Primitive consistency mechanisms • How primitives compose models • How models relate / differ • What they cost
Understanding invariants • Some interesting classes of invariants
Relating consistency to invariants • Which primitives guarantee which invariants
Useful intuitions for app. and system designers
2
[Consistency in 3D]
Shared database
Social, web, e-commerce: shared mutable data Scalability ⇒ replication ⇒ consistency issues
3
q.push(e) c.inc()
c.inc()
q.val() c.val()
q: Queue c: Counter { |q| ≤ c }
[Consistency in 3D]
Geo-replicated database
Social, web, e-commerce: shared mutable data Scalability ⇒ replication ⇒ consistency issues
4
5 ms – ∞
q.push(e) c.inc()
c.inc()
q.val() c.val() q3 ∈ Queue?
q1 = q2 ? |q1| ≤ c4 ?
q: Queue c: Counter { |q| ≤ c } q: Queue
c: Counter { |q| ≤ c }
q: Queue c: Counter { |q| ≤ c }
[Consistency in 3D]
Consistency
More replicas: • Better read availability, responsiveness,
performance, etc. • More work to keep replicas in sync
Consistent = behavior similar to sequential: • Satisfies specs: does q behave like a queue? • Replicas agree: is q identical everywhere? • Objects agree: is |q| ≤ c? • Same flow of time? q1.push() before q2.push()
5 [Consistency in 3D]
Consistency opportunities and costs
CAP Availability
⟹ Parallelism keeps the hardware busy ⟹ More implem. options, scalable
But consistency constrains order of events: • Delay delivery • Stale reads • Waits, synchronisation (mutual wait)
Keeping track of order requires metadata Significant!
6
[Consistency in 3D]
Serrano-SI P-Store-SER GMU-US Jessy2pc-NMSI
RC Walter-PSI SDUR-SER
Credit: Masoud Saeida Ardekani
Costs illustrated
7
3×
4.5×
Term
inat
ion
Late
ncy
ofUp
date
Tra
nsac
tions
(m
s)
90% Read-only transactions; Disaster Tolerant
70% Read-only transactions; Disaster Tolerant
[Consistency in 3D]
Strict Serialisability
8
T1
T1
R1R2R3
Invariant
clientT1
Invariant Invariant
T3
T3
T2
T2
[Consistency in 3D]
Eventual consistency
9
Op1
R1R2R3
Op2
[Consistency in 3D]
Highperformance
Lowperformance
Strong vs. weak?
10
Hard toprogram
Predictable Strict Serialisability
Eventual Consistency
Snapshot Isolation
Strict Serialisability
PRAM
[Consistency in 3D]
Highperformance
Lowperformance
Strong vs. weak?
11
Hard toprogram
Predictable Strict Serialisability
Eventual Consistency
Snapshot Isolation
Strict Serialisability
PRAM
Serialis-ability
[Consistency in 3D]
PL-1
PL-2
Cursor Stability (PL-CS) Monotonic View (PL-2L)
Monotonic SnapshotReads (PL-MSR)
Consistent View (PL-2+)
Forward Consistent View (PL-FCV)
Snapshot Isolation (PL-SI) Update Serializability (PL-3U)
Full Serializability (PL-3)
Strict Serializability (PL-SS)
Repeatable Read (PL-2.99)
Figure 4-1: A partial order to relate various isolation levels.
previous chapter. Various levels can be ranked according to their “strength”: one level is strongerthan another if it allows fewer histories. In the figure, if level Y is stronger than level X, there is adirected path from X to Y; if there is no path between two levels, they are unrelated to each other.
For all intermediate levels, we have also developed corresponding guarantees that can beprovided to transactions as they execute. As in the previous chapter, the levels defined for runningtransactions are similar to the corresponding levels for committed transactions.
The rest of this chapter is organized as follows. In Section 4.1, we present our specifications forPL-2+. In Section 4.2, we present definitions for PL-2L. In Section 4.3, we describe specificationsof Snapshot Isolation. We discuss a new isolation level called Forward Consistent View in 4.4 thathas been inspired by Snapshot Isolation. We describe a level that captures the essence of Oracle’sRead Consistency in Section 4.5 and compare it with level PL-2L. Cursor Stability is presented inSection 4.6. Section 4.7 describes update serializability, a consistency guarantee that is useful forread-only transactions, and compares it with PL-2+ and serializability. Finally, in Section 4.8, weextend our definitions for intermediate levels to provide guarantees for executing transactions.
4.1 Isolation Level PL-2+
Isolation level PL-2+ is motivated by the fact that certain applications only need to observe aconsistent state of the database and serializability may not be required, e.g., a read-only transactionin an inventory application may simply want to observe a consistent state of the current ordersand in-stock items. It is the weakest level that ensures that integrity constraints are not observedas violated as long as update transactions modify the database consistently and are serializable.
67
Strong vs. weak?
12
Linearizability
Sequential
Regular
Safe
Eventual
Causal+ Real-timecausal
Causal
Read-your-writes(RYW)
Monotonic Reads(MR)
Writes-follow-reads (WFR)
Monotonic Writes(MW)
PRAM (FIFO)
Fork
Fork*
Fork-join causal
Boundedfork-join
causal
Fork sequential
Eventual linearizability
Timed serial& ∆,Γ-atomicity
Processor
Fork-basedmodels
Slowmemory
Per-object models
Per-record timeline
&Coherence
Timedcausal
Bounded staleness
&Delta
Weak fork-lin. Strong
eventual
Quiescent
Weak
k-regular
k-safe
PBS k-staleness
k-atomicity
Release
Weak ordering
Location
Scope
Lazy release
Entry
Synchronized models
Causal models
Staleness-basedmodels
Per-objectcausal
Per-keysequential
Prefix linearizable
Prefix sequential
PBS t-visibility
Session models
Eventual serializability
Figu
re1:
Hie
rarc
hyof
non-
trans
actio
nalc
onsi
sten
cym
odel
s.A
dire
cted
edge
from
cons
iste
ncy
sem
antic
sA
toco
nsis
tenc
yse
man
tics
Bm
eans
that
any
exec
utio
nth
atsa
tisfie
sB
also
satis
fies
A.U
nder
lined
mod
els
expl
icitl
yre
ason
abou
ttim
ing
guar
ante
es.
7
Transactional Adya 1999
Non-transactional Viotti & Vukolić 2016
[Consistency in 3D]
Three classes…
13
…of invariant … of protocol
Gen1 Object value Total order of operations
PO Relative ordering of operations Visibility
EQ State equivalence Composition
[Consistency in 3D]
Three dimensions
14
Eventual Consistency
Snapshot Isolation
Txnl CC
Mostly orthogonal (but not all
combinations make sense.)
Gen1 / Total Order
EQ /
Composition
PO / VisibilityCausal
Linearisability
SerialisabilityStrict
SerialisabilityCAP
[Consistency in 3D]
Operation
generator: read, compute, generate effector effector: compute, write side-effect Sequential execution:
• precondition ⟹ invariant • each effector individually safe
15
x:=3x+y>0x=4 y=–2
x+y≥0 y≔–3x+y>0 ¬x+y>0 skip
[Consistency in 3D]
Sequential correctness
generator: read, compute, generate effector effector: compute, write side-effect Sequential execution:
• precondition ⟹ invariant • each effector individually safe
16
x:=3x=4 y=–2
x+y≥0 y≔–3 skip
x=4 y=–2
x=3 y=–2 true
[Consistency in 3D]
Guarantee vs. semantics
Guarantee: • Class of invariants that is always true • Regardless of application code • Assuming sequentially correct
Application can compensate for absence of guarantee • e.g. Inv={ c≥0 }, app: c.inc()
17 [Consistency in 3D]
Data typesRegister
• Update: assign with constant ‣ Not commutative ‣ Absorbing
High-level types • Counter, ORset, Sequence:
effectors commute • Stock, Account, Queue: ¬ commute
Composed data • + structural invariants
18
[Consistency in 3D]
Replicated operation
u: state ⤻ (retval, (state ⤻ state)) Read one, write all (ROWA) Deferred-update replication (DUR)
19
origin replica
u!
u!
u?
client u
replica
uPRE
u!replica
v? v!
uPRE
uPRE
[Consistency in 3D]
Sharded, geo-replicated
20
x1y1z1
x2y2z2
DC1
DC2
z2%2=0
x2:=0
x1:=0x1>0
y2+=1
y1+=1
x=1 y’=1
x=0 y=0
DC3 ¬ read my writes
arbitrary origin
sharded, parallel
concurrent updates
[Consistency in 3D]
Type EQ invariants• A = B • x.friendOf (y) ⟺ y.friendOf (x) • x + y = constant • South ⨄ Boat ⨄ North
= { sheep, dog, wolf } Joint update to two objects Atomicity (all-or-nothing) property of transactions Protocol: single update message
• Asynchronous
21 [Consistency in 3D]
EQ: transactional composition
Airplane reservation • Allocate a seat to me • Pay for the flight
Two EQ relations: • paid = have_seat • my $$ + airline $$ = constant
Ad-hoc grouping (This txn also needs TO + snapshot)
22
[Consistency in 3D]
EQ/Composition axisTransaction groups operations All-or-nothing effects:
• Deliver effectors indivisibly ‣ packaged together
• + same TOE ‣ ≈ 2-phase commit
Snapshot reads: • all generators read from
same set of effectors ‣ maintain versions
• + same TO, VIS guarantees ‣ coordination
23
All-or-nothing effects
+ snapshot
0 = Independent operations
[Consistency in 3D]
EQ/Composition axisTransaction groups operations All-or-nothing effects:
• Deliver effectors indivisibly ‣ packaged together
• + same TOE ‣ ≈ 2-phase commit
Snapshot reads: • all generators read from
same set of effectors ‣ maintain versions
• + same TO, VIS guarantees ‣ coordination
24
All-or-nothing effects
+ snapshot
0 = Independent operations
SerialisabilitySnapshot Isolation
Trans. Causal
RC
LinearisabilityPRAM
[Consistency in 3D]
EQ/Composition axisTransaction groups operations All-or-nothing effects:
• Deliver effectors indivisibly ‣ packaged together
• + same TOE ‣ ≈ 2-phase commit
Snapshot reads: • all generators read from
same set of effectors ‣ maintain versions
• + same TO, VIS guarantees ‣ coordination
25
All-or-nothing effects
+ snapshot
0 = Independent operations
SerialisabilitySnapshot Isolation
Trans. Causal
RC
LinearisabilityPRAM
[Consistency in 3D]
Type PO invariants• employee.manager.salary ≥ employee.salary • S1; S2; S3 ≣ S1 ⟸ S2 ⟸ S3 • dog ∈ S ⟸ sheep ∈ S ∧ wolf ∈ S • Referential integrity • “inode references disk block” • ACL (u, p) ⟸ access (u, p)
Demarcation Protocol: 1. increase LHS by c 2. increase RHS by c' ≤ c ⟹ordered delivery
No synchronisation: Available26
[Consistency in 3D]
PO: transitive / causal visibility
x = 100; y = 100 Inv = { x ≥ y } Ex 1:
• P1: x += 100 • P2: if x > y then y += (x–y)/2 • P3: x ≥ y? • Transitive visibility vis* ⊆ vis
Ex 2: • P1: x += 100; d ≔ 100 • P2: if d > 0 then y += d/2 • P3: x ≥ y?
Causal visibility (vis; po)* ⊆ vis
27
x!
x!
x!y!
y!
x!
[Consistency in 3D]
PO: transitive / causal visibility
x = 100; y = 100 Inv = { x ≥ y } Ex 1:
• P1: x += 100 • P2: if x > y then y += (x–y)/2 • P3: x ≥ y? • Transitive visibility vis* ⊆ vis
Ex 2: • P1: x += 100; d ≔ 100 • P2: if d > 0 then y += d/2 • P3: x ≥ y?
Causal visibility (vis; po)* ⊆ vis
28
x!
x!
y!
y!
x!
d!
d!
x!
x!
y!
y!
x! client is part of DB
[Consistency in 3D]
Monoto
nic cl
ient
Total
caus
al ord
er
Trans
itive V
isibility
Causa
l Visib
ility
PO/Visibility axis
29
Monoto
nic cl
ient
Total
caus
al ord
er
0 = Rollb
acks
Externa
l
Visibility • Which writes visible to
reads Transitive closure property
• Metadata • System-wide
Sender not delayed ⟹ writes available
Stale data ⟹ reads available
[Consistency in 3D]
Monotonic client
• Read My Writes • Monotonic Reads
Often assumed ‣ Buffer
30
Monoto
nic cl
ient
Total
caus
al ord
er
0 = Rollb
acks
Trans
itive V
isibility
Causa
l Visib
ility
Externa
l
Eventual Consistency“Not reasonable”
[Consistency in 3D]
Transitive, causal vis.• Effector: metadata identifies set of
predecessor effectors • Delay delivery after predecessors ‣ Read stale data
• Graph: unbounded • Vector clock: 104—106 entries × 8 bytes! • Approximate VC: stronger order
31
Monoto
nic cl
ient
Total
caus
al ord
er
0 = Rollb
acks
Trans
itive V
isibility
Causa
l Visib
ility
Externa
l
SERNMSI PSI
x
y!
y
x!
d!
d
x!
x!
y!
y!
x!
client is part of DB
[Consistency in 3D]
Total/external causal
Total order extends causal order Metadata: 1 single scalar
• but cost of total order External: real-time clock
32
Monoto
nic cl
ient
Total
caus
al ord
er
0 = Rollb
acks
Trans
itive V
isibility
Causa
l Visib
ility
Externa
l
Gentle Rain LinearisableSSER
[Consistency in 3D]
Gen1 invariantsInv = “0 ≤ x” u! = “x ≔ x–1” { Inv ∧1≤ x} u! { Inv }
Predict that Inv will be true after u!: • Sequential: weakest precondition • Generalises to bounded concurrency
Unbounded concurrency: no sufficient precondition • Invariant is not stable • Limit concurrency: escrow • No concurrency: order updates
33 [Consistency in 3D]
Gen1: total order
34
Gen1
Gapless TO effectors
0 = Concurrent
Total order, capricious
TO generators + TO effectors
TO generators = effectors
CAP
Do replicas observe events in the same order?
Pick a unique number
[Consistency in 3D]
0 = unordered
No: concurrent • Commute ⟹ converge • Stable precondition ⟹ Invariant35
Gapless TO effectors
0 = Concurrent
Total order, capricious
TO generators + TO effectors
TO generators = effectors
CAP
Do replicas observe events in the same order?
Pick a unique number
x!
x!
y!
y!
[Consistency in 3D]
Capricious TO effectors
Pick a number locally: capricious Gap: will arrive later?
• Non-monotonic: rollback • Monotonic ‣ Wait for gap to fill (Lamport 78) ‣ Lost updates (LWW)
36
Gapless TO effectors
0 = Concurrent
Total order, capricious
TO generators + TO effectors
TO generators = effectors
CAP
Do replicas observe events in the same order?
Pick a unique number
10
7
EC
LamportLWW
[Consistency in 3D]
Capricious TO effectors
Pick a number locally: capricious Gap: will arrive later?
• Non-monotonic: rollback • Monotonic ‣ Wait for gap to fill (Lamport 78) ‣ Lost updates (LWW)
37
Gapless TO effectors
0 = Concurrent
Total order, capricious
TO generators + TO effectors
TO generators = effectors
CAP
Do replicas observe events in the same order?
Pick a unique number
10
7
EC
LamportLWW
[Consistency in 3D]
Gapless TO effectors
Gapless: • No lost updates • Consensus, 2PC to uniquely
allocate next free number ⟹ not available
38
Gapless TO effectors
0 = Concurrent
Total order, capricious
TO generators + TO effectors
TO generators = effectors
CAP
Do replicas observe events in the same order?
Pick a unique number
8
7
SMRPSI
NMSI
[Consistency in 3D]
TO generators
TO effectors + TO generators ‣ separate from effectors ‣ same order as effectors
39
Gapless TO effectors
0 = Concurrent
Total order, capricious
TO generators + TO effectors
TO generators = effectors
CAP
Do replicas observe events in the same order?
Pick a unique number
SI
LINSER
SSER
[Consistency in 3D]
Gen1 / Total OrderThree dimensions
40
EQ / Composition
PO / Visibility
Total causal order
0 = Rollbacks
Transitive VisibilityCausal Visibility
External
Monotonic client
All-o
r-not
hing
effec
ts
+ sn
apsh
ot
0 =
Inde
pend
ent
oper
ation
s
Gapless TO effectors
0 = Concurrent
Total order, capricious
TO generators + TO effectors
TO generators = effectors
CAP
[Consistency in 3D]
Gen1 / Total OrderThree dimensions
41
EQ / Composition
PO / Visibility
Total causal order
0 = Rollbacks
Transitive VisibilityCausal Visibility
External
Monotonic client
All-o
r-not
hing
effec
ts
+ sn
apsh
ot
0 =
Inde
pend
ent
oper
ation
s
Gapless TO effectors
0 = Concurrent
Total order, capricious
TO generators + TO effectors
TO generators = effectors
CAP
EC
[Consistency in 3D]
Gen1 / Total OrderThree dimensions
42
EQ / Composition
PO / Visibility
Total causal order
0 = Rollbacks
Transitive VisibilityCausal Visibility
External
Monotonic client
All-o
r-not
hing
effec
ts
+ sn
apsh
ot
0 =
Inde
pend
ent
oper
ation
s
Gapless TO effectors
0 = Concurrent
Total order, capricious
TO generators + TO effectors
TO generators = effectors
CAP
Lin
[Consistency in 3D]
Gen1 / Total OrderThree dimensions
43
EQ / Composition
PO / Visibility
Total causal order
0 = Rollbacks
Transitive VisibilityCausal Visibility
External
Monotonic client
All-o
r-not
hing
effec
ts
+ sn
apsh
ot
0 =
Inde
pend
ent
oper
ation
s
Gapless TO effectors
0 = Concurrent
Total order, capricious
TO generators + TO effectors
TO generators = effectors
CAP
SSER
Lin
[Consistency in 3D]
Gen1 / Total OrderThree dimensions
44
EQ / Composition
PO / Visibility
Total causal order
0 = Rollbacks
Transitive VisibilityCausal Visibility
External
Monotonic client
All-o
r-not
hing
effec
ts
+ sn
apsh
ot
0 =
Inde
pend
ent
oper
ation
s
Gapless TO effectors
0 = Concurrent
Total order, capricious
TO generators + TO effectors
TO generators = effectors
CAP
SER SSER
Lin
[Consistency in 3D]
Gen1 / Total OrderThree dimensions
45
EQ / Composition
PO / Visibility
Total causal order
0 = Rollbacks
Transitive VisibilityCausal Visibility
External
Monotonic client
All-o
r-not
hing
effec
ts
+ sn
apsh
ot
0 =
Inde
pend
ent
oper
ation
s
Gapless TO effectors
0 = Concurrent
Total order, capricious
TO generators + TO effectors
TO generators = effectors
CAP
SSERSER
PSI
Lin
[Consistency in 3D]
Gen1 / Total OrderThree dimensions
46
EQ / Composition
PO / Visibility
Total causal order
0 = Rollbacks
Transitive VisibilityCausal Visibility
External
Monotonic client
All-o
r-not
hing
effec
ts
+ sn
apsh
ot
0 =
Inde
pend
ent
oper
ation
s
Gapless TO effectors
0 = Concurrent
Total order, capricious
TO generators + TO effectors
TO generators = effectors
CAP
SSERSER
PSISI
Lin
[Consistency in 3D]
Gen1 / Total OrderThree dimensions
47
EQ / Composition
PO / Visibility
Total causal order
0 = Rollbacks
Transitive VisibilityCausal Visibility
External
Monotonic client
All-o
r-not
hing
effec
ts
+ sn
apsh
ot
0 =
Inde
pend
ent
oper
ation
s
Gapless TO effectors
0 = Concurrent
Total order, capricious
TO generators + TO effectors
TO generators = effectors
CAP
Txnl CC[Consistency in 3D] 48
<article-no>:12Consistency in 3D
Total Order Composition VisibilityRollbacks Monotonic Transitive Causal External
All-or-Nothing + Snapshot SER SSERAll-or-Nothing E�ectorsTOG=TOE
Single Operation SC LINAll-or-Nothing + Snapshot NMSI PSI SSI
All-or-Nothing E�ectorsGapless TOESingle Operation
All-or-Nothing + Snapshot Bayou ÿAll-or-Nothing E�ectors ÿCapricious TOE
Single Operation LWW ÿAll-or-Nothing + Snapshot Causal HAT ÿ
All-or-Nothing E�ectors RC ÿConcurrent OpsSingle Operation EC PRAM CC ÿ
Table 5 Matrix of features and consistency models
6 Discussion and conclusion
Our system model (Section 2) is very general. The separation between generators and ef-fectors allows for internal parallelism; if unusual, it reflects practical implementations [23].Our total order axis (Section 3), classifies the degree of concurrency between operationsto a single object, including only e�ectors or also generators, and accounts for both avail-able (capricious) and consensus-based (gapless) approaches. The other two axes introducemechanisms that relate multiple objects; however, they serve di�erent purposes and havedi�erent costs. Visibility order (Section 4) relates reads to writes and involves maintaininga system-wide transitive closure, and aims to support PO-type invariants. Transactions(composition, Section 5) serves to enforce ad-hoc EQ and Gen*; a transaction is a one-o�grouping, requested by the application.
In order to be intuitively useful, our classification simplifies the design space into threeapproximately linear axes (which we relate to application invariants). Obviously, this can-not account for the full complexity of the relations between models. We acknowledge thedeficiencies of such a simplification. For instance, we flatten the visibility axis, and abus-ively assume that all TOG=TOE models must be gapless. We defend this simplification aspractically relevant, even if not formally justified. We also ignored hybrid models, such asUpdate Serialisability [16].
We focus on client-monotonic models, as they are the most intuitive, and because mono-tonicity is trivial to implement. While the specifications of SER, NMSI, or RC do not requireMonotonic visibility, all the actual implementations that we know of do provide it.
Table 5 positions some major consistency models within the three axes. Compare forinstance two prominent strong consistency models: SSER and LIN. While LIN considerssingle operations and single objects, SSER is a transactional model requiring All-or-Nothingand Snapshot. Also notice how the visibility axis di�erentiates SSER from SER, and NMSIfrom PSI.
While our results are preliminary, we believe that this classification sheds light on thecrowded space of distributed consistency guarantees, towards a better understanding of theapplication invariants enforced by each of them. We intend, in further work, to formalizeour definitions and prove some interesting meta-properties. This work aims to be an step
[Consistency in 3D]
SummaryDistributed, replicated data
• Improves read availability • Parallel updates may violate invariants • Guarantee: invariants maintained by system • System vs. application cost trade-off ‣ Tools needed
3D consistency design space • Total order (effectors, generators) • Visibility order • Transactional Composition
Work in progress
49 [Consistency in 3D]
Creative Commons Attribution-ShareAlike
4.0 Intl. LicenseYou are free to:
• Share — copy and redistribute the material in any medium or format
• Adapt — remix, transform, and build upon the material for any purpose, even commercially, under the following terms:
Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
50
[Consistency in 3D]
4 session guarantees ≣ causal
w1!
w1!
Monotonic readsr2 r3
Client / No rollback: r3 must include w1
52
w1!
w1!
w1Read My Writes
r2
Client / RMW: r2 must include w1
w1!
w1
w1!
Monotonic writes
w2!
w2
w2!
Global / No rollback: r3 must include w1
Writes Follow Reads
w1!
w1!
w3!
w3
w3!
Global / WR dependence: w3 must follow w1
r2