Replication Internals: The Life of a Write

Post on 08-Sep-2014


Andy Schwerin, Lead Engineer, MongoDB

• Goals of Replication
• Replication Architecture
• A representative write

• High availability for processing reads and writes
  – Automatic leader election

• Support many network topologies
  – Tag sets

• Accessible consistency model
  – Ordered operation log

• Client can trade latency for durability
  – Write concern

{ ts: 4, op: "i", ns: "d.c", o: { _id: 10, name: "john" } }

OPLOG

[Diagram: a primary oplog and two secondary oplogs. Entry labels shown: primary 4; first secondary 8, 9; second secondary 4, 5.]

When a secondary oplog is not a prefix of the primary oplog…
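The prefix condition on this slide can be checked mechanically. The sketch below is illustrative only: it models an oplog as a plain list of timestamps, the function names and sample values are hypothetical, and it only detects the condition without modelling what the server does about it.

```python
def is_prefix(secondary_ts, primary_ts):
    """Return True if the secondary's oplog is a prefix of the primary's."""
    return secondary_ts == primary_ts[:len(secondary_ts)]


def common_point(secondary_ts, primary_ts):
    """Return how many leading entries the two oplogs share."""
    shared = 0
    for s, p in zip(secondary_ts, primary_ts):
        if s != p:
            break
        shared += 1
    return shared


# Hypothetical timestamps, not taken from the slides.
primary = [1, 2, 3, 4]
in_sync = [1, 2, 3]          # simply behind the primary
diverged = [1, 2, 3, 8, 9]   # holds entries the primary never had

print(is_prefix(in_sync, primary))
print(is_prefix(diverged, primary))
print(common_point(diverged, primary))
```

A secondary that is merely behind passes the check; one that has diverged fails it, and `common_point` reports how much history the two logs still share.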

w:?

w:1

Could lose write when primary disappears, without notification.

w:majority

Over half of nodes must fail to lose the write.

And, an outside operator must intervene before new writes are accepted.

w:all

All nodes have the write before primary responds.

But, cannot complete writes if any nodes are down.
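To make the trade-off concrete, here is a minimal sketch of how many nodes must hold a write before the primary may respond, using the slide's labels for `w`. The function names are hypothetical and this is not the server's implementation.

```python
def required_acks(w, node_count):
    """Number of nodes that must have the write for the given write concern."""
    if w == "majority":
        return node_count // 2 + 1   # strictly more than half
    if w == "all":
        return node_count            # every node, following the slide's label
    return int(w)                    # a plain number such as w:1 or w:2


def write_acknowledged(w, node_count, nodes_with_write):
    """True once enough nodes have the write to answer the client."""
    return nodes_with_write >= required_acks(w, node_count)


# Three-node set in which one secondary is down, so only two nodes can hold the write.
for w in (1, "majority", "all"):
    print(w, write_acknowledged(w, node_count=3, nodes_with_write=2))
```

With one node down, w:1 and w:majority can still be satisfied, while w:all cannot complete.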

[Diagram: OPLOG and collection d.c. Replication progress: P TS:6, S1 TS:6, S2 TS:2]

d.c.insert({_id: 10, name: 'john'}, {wC: {w: 2}})

1. Fetch oplog entries
2. Apply to collections
3. Write to local oplog
4. Notify primary
5. Repeat
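The five steps can be sketched as a small in-memory model. Everything here is an assumption made for illustration: the `Primary` and `Secondary` classes, their method names, and the handling of insert entries only. Real secondaries talk to the primary over the network.

```python
class Primary:
    def __init__(self):
        self.oplog = []       # entries ordered by ts
        self.progress = {}    # secondary name -> last ts it reported

    def entries_after(self, ts):
        return [e for e in self.oplog if e["ts"] > ts]

    def notify(self, name, ts):
        self.progress[name] = ts


class Secondary:
    def __init__(self, name):
        self.name = name
        self.oplog = []
        self.collections = {}  # namespace -> {_id: document}

    def last_ts(self):
        return self.oplog[-1]["ts"] if self.oplog else 0

    def sync_once(self, primary):
        batch = primary.entries_after(self.last_ts())      # 1. fetch oplog entries
        for entry in batch:                                 # 2. apply to collections
            if entry["op"] == "i":
                doc = entry["o"]
                self.collections.setdefault(entry["ns"], {})[doc["_id"]] = doc
        self.oplog.extend(batch)                            # 3. write to local oplog
        if batch:
            primary.notify(self.name, self.last_ts())       # 4. notify primary
        return len(batch)                                   # 5. the caller repeats


primary = Primary()
primary.oplog.append({"ts": 4, "op": "i", "ns": "d.c", "o": {"_id": 10, "name": "john"}})
s1 = Secondary("S1")
s1.sync_once(primary)
print(s1.collections["d.c"][10], primary.progress)
```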

[Diagram: secondary replication pipeline with components OBSERVER, BATCH, PREFETCH, BATCH, APPLIER, the collections x.y and d.c, and the OPLOG]

d.c.insert({_id: 10, name: 'john'}, {wC: {w: 2}})

[Diagram frames showing the primary's collection d.c and OPLOG, and a secondary's OBSERVER and BATCH]

{ ts: 4, op: "i", ns: "d.c", o: { _id: 10, name: "john" } }

Replication progress across the frames:
P TS:6, S1 TS:6, S2 TS:2
P TS:4, S1 TS:2, S2 TS:2
P TS:6, S1 TS:2, S2 TS:2

PREFETCH
• Split batch into arbitrary work units
• Assign work to prefetch threads
• Entries processed in any order
• All while admitting readers

Allow readers
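A rough sketch of the prefetch stage follows, assuming a thread pool stands in for the prefetch threads. The slides do not say what prefetching does with each entry, so `prefetch_unit` is a hypothetical stand-in that only records which documents were touched.

```python
from concurrent.futures import ThreadPoolExecutor


def split_batch(batch, units):
    """Split a batch into at most `units` work units, ignoring entry order."""
    return [batch[i::units] for i in range(units) if batch[i::units]]


def prefetch_unit(unit):
    # Stand-in for the real work: note the (namespace, _id) pairs in this unit.
    return {(entry["ns"], entry["o"]["_id"]) for entry in unit}


def prefetch(batch, threads=4):
    """Process every work unit on a pool of threads; order does not matter."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = pool.map(prefetch_unit, split_batch(batch, threads))
        return set().union(*results)


batch = [{"ts": t, "op": "i", "ns": "d.c", "o": {"_id": t}} for t in range(1, 6)]
touched = prefetch(batch)
print(len(touched))
```

Because the work units are independent and nothing is modified, no reader needs to be blocked during this stage.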

APPLIER
• Assign entries to workers by target collection
• Disable schema constraints

Allow readers
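Assigning by target collection keeps all entries for one namespace on the same worker. A minimal sketch, assuming a simple byte-sum hash chosen only because it is deterministic; the function name and the hash are hypothetical.

```python
from collections import defaultdict


def assign_by_collection(batch, worker_count):
    """Group entries so that every entry for a namespace lands on one worker."""
    queues = defaultdict(list)
    for entry in batch:
        # Hypothetical hash: sum of the namespace bytes, modulo the worker count.
        worker = sum(entry["ns"].encode()) % worker_count
        queues[worker].append(entry)   # arrival order within a worker is preserved
    return dict(queues)


batch = [
    {"ts": 1, "ns": "d.c"},
    {"ts": 2, "ns": "x.y"},
    {"ts": 3, "ns": "d.c"},
]
queues = assign_by_collection(batch, worker_count=4)
for worker, entries in sorted(queues.items()):
    print(worker, [e["ts"] for e in entries])
```

Entries for different collections can then be applied in parallel, while entries for the same collection stay together on one worker.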

APPLIER
• Concurrency control excludes readers
• Oplog entries applied in timestamp order

Exclude readers
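One way to picture these two rules is a single lock shared by readers and the applier, with entries sorted by timestamp before they are applied. This is a simplified assumption for illustration, not the server's concurrency control. The `Store` class is hypothetical and handles only insert ("i") and delete ("d") entries.

```python
import threading


class Store:
    def __init__(self):
        self.lock = threading.Lock()   # hypothetical: shared by readers and the applier
        self.collections = {}          # namespace -> {_id: document}

    def read(self, ns, _id):
        with self.lock:                # readers wait while a batch is being applied
            return self.collections.get(ns, {}).get(_id)

    def apply_batch(self, entries):
        with self.lock:                # exclude readers for the whole batch
            for entry in sorted(entries, key=lambda e: e["ts"]):
                docs = self.collections.setdefault(entry["ns"], {})
                if entry["op"] == "i":
                    docs[entry["o"]["_id"]] = entry["o"]
                elif entry["op"] == "d":
                    docs.pop(entry["o"]["_id"], None)


store = Store()
# Entries arrive out of order; the delete at ts 5 must take effect after the insert at ts 4.
store.apply_batch([
    {"ts": 5, "op": "d", "ns": "d.c", "o": {"_id": 10}},
    {"ts": 4, "op": "i", "ns": "d.c", "o": {"_id": 10, "name": "john"}},
])
print(store.read("d.c", 10))
```

Holding the lock for the whole batch means a reader never observes a half-applied batch.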


APPLIER
• Readmit readers
• Move entries from batch to oplog
• Begin processing next batch

Allow readers

[Diagram: the pipeline after the batch, with readers allowed. Replication progress: P TS:6, S1 TS:6, S2 TS:2]

• Consults list of waiting clients
• Looks for those waiting for ts:6 or earlier on S1
• Sends acknowledgement!
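The acknowledgement step can be sketched as a small bookkeeping structure. The `AckTracker` class and its method names are hypothetical. It assumes each waiting client is recorded with the timestamp of its write and the number of nodes its write concern requires.

```python
class AckTracker:
    def __init__(self):
        self.progress = {}   # node name -> latest ts known to be in its oplog
        self.waiting = []    # (client, ts of the write, nodes required)

    def wait_for(self, client, ts, w):
        self.waiting.append((client, ts, w))

    def update(self, node, ts):
        """Record a node's progress and return the clients that can now be acknowledged."""
        self.progress[node] = ts
        acked, still_waiting = [], []
        for client, write_ts, w in self.waiting:
            have = sum(1 for t in self.progress.values() if t >= write_ts)
            target = acked if have >= w else still_waiting
            target.append((client, write_ts, w))
        self.waiting = still_waiting
        return [client for client, _, _ in acked]


tracker = AckTracker()
tracker.wait_for("client-1", ts=6, w=2)   # the insert issued with w:2
print(tracker.update("P", 6))    # only the primary has the write
print(tracker.update("S2", 2))   # S2 is still behind
print(tracker.update("S1", 6))   # S1 reports ts 6, so w:2 is satisfied
```

The client is released only when the second node reaches the write's timestamp, which matches the final progress shown on the slides (P TS:6, S1 TS:6, S2 TS:2).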
