Replication Internals: The Life of a Write
Post on 08-Sep-2014
Andy Schwerin, Lead Engineer, MongoDB
• Goals of Replication
• Replication Architecture
• A representative write
• High availability for processing reads and writes
  – Automatic leader election
• Support many network topologies
  – Tag sets
• Accessible consistency model
  – Ordered operation log
• Client can trade latency for durability
  – Write concern
{ ts: 4, op: "i", ns: "d.c", o: { _id: 10, name: "john" } }
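To make the fields of the entry above concrete, here is a minimal sketch of applying it to an in-memory stand-in for a database. The function name `apply_entry` and the dict-of-dicts layout are assumptions made for illustration; they are not the server's data structures, and only the insert op ("i") shown on the slide is handled.

```python
def apply_entry(db, entry):
    """Apply one insert ("i") oplog entry to an in-memory database.

    db is a hypothetical stand-in: {namespace: {_id: document}}.
    """
    if entry["op"] != "i":
        raise ValueError("this sketch only handles inserts")
    collection = db.setdefault(entry["ns"], {})
    doc = entry["o"]
    # Documents are keyed by _id, so applying the same entry twice
    # leaves the collection unchanged.
    collection[doc["_id"]] = doc
    return db


entry = {"ts": 4, "op": "i", "ns": "d.c", "o": {"_id": 10, "name": "john"}}
db = apply_entry({}, entry)
```

Here `ns` selects the collection (database `d`, collection `c`), `o` carries the document, and `ts` orders the entry relative to the others.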
[Diagram: PRIMARY OPLOG (… 4), SECONDARY OPLOG (8 9), SECONDARY OPLOG (4 5)]
When a secondary oplog is not a prefix of the primary oplog…
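The prefix condition itself can be stated in a few lines. This is only a sketch of the check, with oplogs reduced to lists of timestamps; the timestamp values and the name `is_prefix` are invented for illustration, and the sketch does not model what the server does when the check fails.

```python
def is_prefix(secondary_ts, primary_ts):
    """True when the secondary's oplog is a leading slice of the primary's."""
    return secondary_ts == primary_ts[:len(secondary_ts)]


primary = [1, 2, 3, 4]    # hypothetical primary oplog timestamps
behind = [1, 2]           # merely lagging: still a prefix
diverged = [1, 2, 8, 9]   # holds entries the primary never had
```

A lagging secondary satisfies the check and only needs to fetch more entries; a diverged one does not.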
w:?
w:1
A write can be lost, without notification, if the primary disappears.
w:majority
More than half of the nodes must fail for the write to be lost.
And in that case, an outside operator must intervene before new writes are accepted.
w:all
All nodes have the write before the primary responds.
But writes cannot complete if any node is down.
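The three settings above differ only in how many nodes must hold the write before the primary responds. A minimal sketch of that arithmetic follows; the function names are hypothetical and the model ignores timeouts and journaling.

```python
def required_acks(w, n_nodes):
    """Number of nodes that must hold a write before it is acknowledged."""
    if w == "majority":
        return n_nodes // 2 + 1
    if w == "all":
        return n_nodes
    return int(w)


def is_acknowledged(w, n_nodes, nodes_with_write):
    """True once enough nodes hold the write for write concern w."""
    return nodes_with_write >= required_acks(w, n_nodes)
```

For a three-node set, `required_acks("majority", 3)` is 2, so one node can be down and writes still complete, while `is_acknowledged("all", 3, 2)` is false for as long as the third node is unreachable.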
[Diagram: collection d.c and OPLOG. P TS:6, S1 TS:6, S2 TS:2]
d.c.insert({_id: 10, name: 'john'}, {wC: {w: 2}})
1. Fetch oplog entries
2. Apply to collections
3. Write to local oplog
4. Notify primary
5. Repeat
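The five steps can be sketched as a loop over in-memory lists. Everything here is an assumption made for illustration: the dict layout of `secondary`, the function names, the `batch_size` parameter, and the requirement that `primary_oplog` is sorted by `ts`. Notification is modelled as collecting the reported timestamps.

```python
def replicate_once(primary_oplog, secondary, batch_size=2):
    """Run steps 1-3 once and return the timestamp to report in step 4."""
    oplog = secondary["oplog"]
    last_ts = oplog[-1]["ts"] if oplog else 0
    # 1. Fetch oplog entries newer than what this node already has.
    batch = [e for e in primary_oplog if e["ts"] > last_ts][:batch_size]
    # 2. Apply to collections (inserts only in this sketch).
    for e in batch:
        secondary["collections"].setdefault(e["ns"], {})[e["o"]["_id"]] = e["o"]
    # 3. Write to local oplog.
    oplog.extend(batch)
    return oplog[-1]["ts"] if oplog else 0


def replicate_until_caught_up(primary_oplog, secondary, batch_size=2):
    """5. Repeat until nothing new arrives; return the timestamps reported."""
    reported = []
    while True:
        before = len(secondary["oplog"])
        ts = replicate_once(primary_oplog, secondary, batch_size)
        if len(secondary["oplog"]) == before:
            return reported
        reported.append(ts)  # 4. Notify primary of progress.
```

A real secondary keeps polling rather than stopping when it catches up; the loop ends here only so the example terminates.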
[Diagram: secondary components: OBSERVER, BATCH, PREFETCH, BATCH, APPLIER, collections x.y and d.c, OPLOG]
d.c.insert({_id: 10, name: 'john'}, {wC: {w: 2}})
{ ts: 4, op: "i", ns: "d.c", o: { _id: 10, name: "john" } }
[Diagram: collection d.c and OPLOG. P TS:4, S1 TS:2, S2 TS:2]
[Diagram: OBSERVER and BATCH, with collection d.c and OPLOG. P TS:6, S1 TS:2, S2 TS:2]
PREFETCH (readers allowed)
• Split batch into arbitrary work units
• Assign work to prefetch threads
• Entries processed in any order
• All while admitting readers
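The prefetch stage above can be sketched with a thread pool. The slide does not say what prefetching touches, so this sketch assumes it only needs to visit each entry's target document, and it records the `(ns, _id)` pairs visited. The names `split_batch`, `prefetch_unit`, and `prefetch` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_batch(batch, n_units):
    """Split the batch into arbitrary work units (here: strided slices)."""
    return [batch[i::n_units] for i in range(n_units)]


def prefetch_unit(unit):
    """Visit each entry's target document; order within a unit is irrelevant."""
    return {(e["ns"], e["o"]["_id"]) for e in unit}


def prefetch(batch, n_threads=2):
    """Hand one work unit to each prefetch thread and merge what was touched."""
    units = split_batch(batch, n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        touched = pool.map(prefetch_unit, units)
        return set().union(*touched)
```

Because this stage changes no data, the order in which entries are processed does not matter, which is why readers can stay admitted throughout.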
APPLIER (readers allowed)
• Assign entries to workers by target collection
• Disable schema constraints
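Assigning entries by target collection can be sketched as a grouping step. The round-robin placement over sorted namespace names is an assumption chosen to keep the example deterministic, and `assign_by_collection` is a hypothetical name. Disabling schema constraints is not modelled.

```python
from collections import defaultdict


def assign_by_collection(batch, n_workers):
    """Give every entry for a given collection to the same worker queue."""
    by_ns = defaultdict(list)
    for entry in batch:
        by_ns[entry["ns"]].append(entry)  # keeps batch order within a collection
    queues = [[] for _ in range(n_workers)]
    for i, ns in enumerate(sorted(by_ns)):
        queues[i % n_workers].extend(by_ns[ns])
    return queues
```

Keeping a collection's entries on one worker means that two writes to the same document are never applied by different workers.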
APPLIER (readers excluded)
• Concurrency control excludes readers
• Oplog entries applied in timestamp order
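The two points above can be sketched with a single mutex. A plain `threading.Lock` is an assumption that is far simpler than a server's concurrency control, and the `Secondary` class is hypothetical. The point is only that readers never observe a half-applied batch and that later timestamps win.

```python
import threading


class Secondary:
    """Hypothetical in-memory node: one lock guards all documents."""

    def __init__(self):
        self._lock = threading.Lock()
        self.docs = {}

    def apply_batch(self, batch):
        # Readers are excluded for the whole batch.
        with self._lock:
            # Entries are applied in timestamp order.
            for entry in sorted(batch, key=lambda e: e["ts"]):
                self.docs[(entry["ns"], entry["o"]["_id"])] = entry["o"]

    def read(self, ns, _id):
        # A reader waits here while a batch is being applied.
        with self._lock:
            return self.docs.get((ns, _id))
```

If a batch holds two entries for the same `_id`, the one with the higher `ts` is applied last regardless of its position in the batch.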
• Readmit readers
• Move entries from batch to oplog
• Begin processing next batch
[Diagram: collection d.c and OPLOG. P TS:6, S1 TS:6, S2 TS:2]
• Consults list of waiting clients
• Looks for those waiting for ts:6 or earlier on S1
• Sends acknowledgement!
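The acknowledgement step can be sketched as a scan over waiting clients each time a node reports progress. The waiter records, the `progress` map, and the function name are hypothetical; `w` is the number of nodes, including the primary, that must hold the write.

```python
def acknowledge_waiters(waiters, progress):
    """Split waiters into those that can be acknowledged and those still waiting.

    waiters:  list of {"client": ..., "ts": ..., "w": ...}
    progress: {node name: latest timestamp that node has applied}
    """
    acked, still_waiting = [], []
    for waiter in waiters:
        have_write = sum(1 for ts in progress.values() if ts >= waiter["ts"])
        if have_write >= waiter["w"]:
            acked.append(waiter)
        else:
            still_waiting.append(waiter)
    return acked, still_waiting
```

With progress `{"P": 6, "S1": 6, "S2": 2}`, a client waiting on ts 6 with w:2 is acknowledged, while one waiting with w:3 keeps waiting until S2 catches up.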