Implementation of
Cluster-wide Causal
Consistency in
- What is causal consistency- Academics view on Causal Consistency- MongoDB architecture- Causal Consistency building blocks- Making Causal Consistency secure- Making Causal Consistency fast- Making Causal Consistency reliable- Causal consistency for end-users
Outline
Client-side properties of causal consistency
- Read your writes
- Writes follow reads
- Monotonic reads
- Monotonic writes
Implementing with a non causally consistent system- Single server systems are causally consistent
- Read and write from the same node
- Add an application logic to handle the scenarios that have to be causally consistent
Ordering of Events in Distributed System
Process P Process Q Process R
q2: <C2> = 11
p1
r1: <C3> = 11
q1
<C3> = 0<C1> = 10 <C2> = 10
r2: <C3> = 12
q3
Server-side causal consistency
Causal consistency is a partial order of events in a distributed system. If an event A causes another event B, then causal consistency provides an assurance that each other process of the system observes event A before observing event B. If an event A is not causally related to an event B then they are concurrent.
System’s Architecture
Gossiping clusterTime
ClusterTime: {uint64}
Timestamp(1495470881, 6)
Ticking clusterTime
{ts: 6} ...
{ts: 5} ...
{ts: 11} ...
insert({x:1}, clusterTime: 4)
<clusterTime>: 6
<Wall clock>: 11
Reporting operationTime
insert({x:1})
{ts: 11} ...
{ts: 5} ...
{ok:1}, { operationTime: 12} {ts: 12} ...
<clusterTime>: 11
<Wall clock>: 11
Waiting for afterClusterTime
find({x:1}, afterClusterTime: {10},
clusterTime: {15})
{ts: 6} ...
{ts: 5} ...
{x:1}, { operationTime: {11}} {ts: 11} ...
Breaking clusterTime
{ts: Timestamp(1495470881, 6), term: 1},
...
{ts: Timestamp(1495470881, 5), term: 1},
...
{Timestamp(0xFFFFFFFF, 0xFFFFFFFF}
...
insert({x:1}, clusterTime:
{0xFFFFFFFF, 0xFFFFFFFE})
LogicalClock:clusterTime =
Timestamp(0xFFFFFFFF, 0xFFFFFFFE)
Protecting clusterTime
"$clusterTime" : {"clusterTime" : Timestamp(1495470881, 5),
"signature" : {
"hash" : BinData(0,"7olYjQCLtnfORsI9IAhdsftESR4="),
"keyId" : NumberLong("6422998367101517844")
}
}
Protecting against operator errors
{ts: 6} ...
{ts: 5} ...
insert({x:1}, clusterTime: 100,000)
<clusterTime>: 6
<Wall clock>: 11
{Error}, { operationTime: 6}
Signing a range of clusterTime
find({x:1}, clusterTime: <val>,
signature:<hash>)
<timeRange> =
<val> | 0x0000’0000’0000’FFFF
cache:{ <timeRange>:<hash> }
Use dummy signatures
- When the auth is off
- When a user has advanceClusterTime privilege
How end users see it
let session=db.getMongo().startSession({causalConsistency: true})
db = session.getDatabase(db.getName());
{checking:100}
find({name:”misha”})
afterClusterTime: 15
update({name:”misha”
checking:100})
{ok:1} operationTime: 15
startSession()
Misha Tyulenev