Building Real Time Systems on MongoDB Using the Oplog at Stripe

MongoDB and the Oplog EVAN BRODER @ebroder

Uploaded by stripe on 06-Jul-2015

DESCRIPTION

MongoDB's oplog is possibly its most underrated feature. The oplog is vital as the basis on which replication is built, but its value doesn't stop there. Unlike the MySQL binlog, which is poorly documented and not directly exposed to MySQL clients, the oplog is a well-documented, structured record of changes that can be queried through the same mechanisms as your data. This makes many kinds of powerful, application-driven streaming and transformation possible.

At Stripe, we've used the MongoDB oplog to create PostgreSQL, HBase, and Elasticsearch mirrors of our data. We've built a simple real-time trigger mechanism for detecting new data. And we've even used it to recover data. In this talk, we'll show you how we use the MongoDB oplog, and how you can build powerful reactive streaming data applications on top of it.

If you'd like to see the presentation with presenter's notes, I've published my Google Docs presentation at https://docs.google.com/presentation/d/19NcoFI9BG7PwLoBV7zvidjs2VLgQWeVVcUd7Xc7NoV0/pub

Originally given at MongoDB World 2014 in New York.

TRANSCRIPT

Page 1: Building Real Time Systems on MongoDB Using the Oplog at Stripe

MongoDB and the Oplog

EVAN BRODER @ebroder

Page 2

AGENDA

- INTRO TO THE OPLOG
- EXAMPLE APPLICATIONS

Page 3

INTRO TO THE OPLOG

Page 4

PRIMARY

SECONDARIES

APPLICATION

Page 5

APPLICATION

save {_id: 1, a: 2}

Page 6

THINGS I’VE DONE:

- save {_id: 1, a: 2}

Page 7

APPLICATION

update where {a: 2}, {$set: {a: 3}}

Page 8

THINGS I’VE DONE:

- save {_id: 1, a: 2}
- update {_id: 1}, {$set: {a: 3}}

Page 9

THINGS I’VE DONE:

- save {_id: 1, a: 2}
- update {_id: 1}, {$set: {a: 3}}
- insert…
- delete…
- delete…
- save…
- update…
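To make the "things I've done" list concrete, here is a minimal pure-Ruby sketch of how replaying such a log in order rebuilds the same state elsewhere. It assumes oplog-style entries using the field names the later slides rely on ('op' is i/u/d, 'ns' is database.collection, 'o' is the document or modification, 'o2' identifies the updated document), models the save as an insert, and uses a made-up test.things namespace. It is an in-memory illustration, not how mongod applies the oplog. Because the update is logged against {_id: 1} rather than the original {a: 2} predicate, replaying it again leaves the result unchanged.

```ruby
# Hypothetical oplog-style entries for the two writes on the slides, in a
# made-up 'test.things' namespace. The save is modelled as an insert.
entries = [
  { 'op' => 'i', 'ns' => 'test.things', 'o' => { '_id' => 1, 'a' => 2 } },
  { 'op' => 'u', 'ns' => 'test.things',
    'o2' => { '_id' => 1 }, 'o' => { '$set' => { 'a' => 3 } } }
]

# Apply one entry to an in-memory "collection" (a Hash keyed by _id)
# and return that collection.
def apply(collection, entry)
  case entry['op']
  when 'i'
    collection[entry['o']['_id']] = entry['o'].dup
  when 'u'
    id = entry['o2']['_id']
    if entry['o'].key?('$set')
      collection[id] = collection.fetch(id, {}).merge(entry['o']['$set'])
    else
      # Whole-document replacement, keyed by the _id recorded in 'o2'
      collection[id] = entry['o'].merge('_id' => id)
    end
  when 'd'
    collection.delete(entry['o']['_id'])
  end
  collection
end

# Replaying the log in order on an empty "secondary" rebuilds the state:
# replica ends up as {1 => {'_id' => 1, 'a' => 3}}.
replica = entries.each_with_object({}) { |entry, coll| apply(coll, entry) }
```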

Page 10

THINGS I’VE DONE:

save…

THINGS I’VE DONE:

Page 11

THINGS I’VE DONE:

save…

THINGS I’VE DONE:

save…

Page 20

TRIGGERS

Page 21

GOAL: EVENT PROCESSING

Page 25

GOAL: DETECT INSERTIONS

Page 27

WARNING: THIS CODE IS NOT PRODUCTION-READY

Page 28

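# mongo_connection: an already-open connection to a replica-set member (assumed)
oplog = mongo_connection['local']['oplog.rs']
ns = 'eventdb.events'

# Find every insert ('op' => 'i') into eventdb.events and print its _id
oplog.find({'op' => 'i', 'ns' => ns}) do |cursor|
  cursor.each do |op|
    puts op['o']['_id']
  end
end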
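The same filter can be expressed over plain Ruby hashes, which is handy for unit-testing consumer logic without a replica set. This is a sketch assuming entries shaped like the oplog documents queried above; the method name and sample entries are made up for illustration.

```ruby
# Pure-Ruby stand-in for oplog.find({'op' => 'i', 'ns' => ns}):
# return the _id of every document inserted into the given namespace.
def inserted_ids(entries, ns)
  entries.select { |op| op['op'] == 'i' && op['ns'] == ns }
         .map { |op| op['o']['_id'] }
end

# Made-up sample entries (not real oplog output).
sample = [
  { 'op' => 'i', 'ns' => 'eventdb.events', 'o' => { '_id' => 'evt_1' } },
  { 'op' => 'i', 'ns' => 'eventdb.other',  'o' => { '_id' => 'x_1' } },
  { 'op' => 'u', 'ns' => 'eventdb.events', 'o2' => { '_id' => 'evt_1' },
    'o' => { '$set' => { 'seen' => true } } },
  { 'op' => 'i', 'ns' => 'eventdb.events', 'o' => { '_id' => 'evt_2' } }
]

inserted_ids(sample, 'eventdb.events')  # => ["evt_1", "evt_2"]
```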

Page 32

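oplog.find({'op' => 'i', 'ns' => ns}) do |cursor|
  # Tailable: keep the cursor open at the end of the capped collection
  cursor.add_option(Mongo::Constants::OP_QUERY_TAILABLE)
  # Await data: at the end of the data, block for a while instead of returning nothing
  cursor.add_option(Mongo::Constants::OP_QUERY_AWAIT_DATA)
  loop do
    cursor.each do |op|
      puts op['o']['_id']
    end
  end
end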

Page 42

start_entry = oplog.find_one({}, {:sort => {'$natural' => -1}})
start = start_entry['ts']

oplog.find({'ts' => {'$gt' => start},
            'op' => 'i',
            'ns' => ns}) do |cursor|
  cursor.add_option(Mongo::Constants::OP_QUERY_TAILABLE)
  cursor.add_option(Mongo::Constants::OP_QUERY_AWAIT_DATA)
  loop do
    cursor.each do |op|
      puts op['o']['_id']
    end
  end
end
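Oplog `ts` values are BSON timestamps, which order by seconds first and then by an increment that distinguishes operations within the same second. The pure-Ruby sketch below models each `ts` as a `[seconds, increment]` pair (an illustrative assumption; the driver actually returns a BSON::Timestamp) to show what the `$gt` filter selects. Starting from the newest entry skips all history; a consumer that must not miss anything across restarts would more likely persist the last `ts` it processed and resume from that instead.

```ruby
# Each 'ts' is modelled as [seconds, increment] (illustrative assumption).
# Array has no >, but Array#<=> compares element by element, which matches
# the seconds-then-increment ordering.

# Like find_one({}, sort: {'$natural' => -1}): the oplog is in ts order,
# so the newest entry is the last one.
def newest_ts(entries)
  entries.last['ts']
end

# Like the {'ts' => {'$gt' => start}} part of the query.
def entries_after(entries, start)
  entries.select { |entry| (entry['ts'] <=> start) > 0 }
end

oplog_entries = [
  { 'ts' => [1_400_000_000, 1], 'op' => 'i', 'o' => { '_id' => 1 } },
  { 'ts' => [1_400_000_000, 2], 'op' => 'i', 'o' => { '_id' => 2 } }
]
start = newest_ts(oplog_entries)  # everything up to here is skipped
oplog_entries << { 'ts' => [1_400_000_005, 1], 'op' => 'i', 'o' => { '_id' => 3 } }

entries_after(oplog_entries, start).map { |e| e['o']['_id'] }  # => [3]
```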

Page 43

start_entry = oplog.find_one({}, {:sort => {'$natural' => -1}})
start = start_entry['ts']

oplog.find({'ts' => {'$gt' => start},
            'op' => 'i',
            'ns' => ns}) do |cursor|
  # Oplog replay: lets the server seek to the requested ts efficiently
  cursor.add_option(Mongo::Constants::OP_QUERY_OPLOG_REPLAY)
  cursor.add_option(Mongo::Constants::OP_QUERY_TAILABLE)
  cursor.add_option(Mongo::Constants::OP_QUERY_AWAIT_DATA)
  loop do
    cursor.each do |op|
      puts op['o']['_id']
    end
  end
end
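As far as we understand it, the oplog has no index on `ts`, and `OP_QUERY_OPLOG_REPLAY` lets the server locate the first matching `ts` efficiently instead of scanning the whole collection from the beginning. The sketch below is only an analogy for why that is possible: because entries are appended in `ts` order, the first entry after `start` can be found by binary search. It models each `ts` as a `[seconds, increment]` pair (an illustrative assumption) and is not how mongod implements the flag.

```ruby
# Each 'ts' is modelled as [seconds, increment] (illustrative assumption).
# Binary search works because entries are appended in ts order.
# Array#bsearch_index needs Ruby 2.3 or later.
def first_index_after(entries, start)
  entries.bsearch_index { |entry| (entry['ts'] <=> start) > 0 }
end

# Every entry after start, or [] if start is already the newest entry.
def replay_from(entries, start)
  index = first_index_after(entries, start)
  index ? entries.drop(index) : []
end

sorted = [[1, 1], [1, 2], [2, 1], [3, 1]].map { |ts| { 'ts' => ts } }
first_index_after(sorted, [1, 2])  # => 2
replay_from(sorted, [3, 1])        # => []
```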

Page 46

DATA TRANSFORMATIONS

Page 47

GOAL: MONGODB TO POSTGRESQL

Page 49

start_entry = oplog.find_one({}, {:sort => {'$natural' => -1}})
start = start_entry['ts']

# No 'op' or 'ns' filter this time: every operation on every namespace
oplog.find({'ts' => {'$gt' => start}}) do |cursor|
  cursor.add_option(Mongo::Constants::OP_QUERY_OPLOG_REPLAY)
  cursor.add_option(Mongo::Constants::OP_QUERY_TAILABLE)
  cursor.add_option(Mongo::Constants::OP_QUERY_AWAIT_DATA)
  loop do
    cursor.each do |op|
      puts op['o']['_id']
    end
  end
end

Page 50

cursor.each do |op|
  puts op['o']['_id']
end

Page 51

cursor.each do |op|
  case op['op']
  when 'i'
    puts op['o']['_id']
  else
    # ¯\_(ツ)_/¯
  end
end

Page 52

cursor.each do |op|
  case op['op']
  when 'i'
    query = "INSERT INTO #{op['ns']} (" +
      op['o'].keys.join(', ') +
      ') VALUES (' +
      op['o'].values.map(&:inspect).join(', ') + ')'
  else
    # ¯\_(ツ)_/¯
  end
end
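One caveat with this slide: `inspect` produces Ruby syntax, not SQL. A string value comes out double-quoted, and PostgreSQL reads double-quoted text as an identifier rather than a string literal. A minimal sketch of SQL-style quoting follows; the helper names are hypothetical, the table name is passed in rather than derived from `op['ns']`, and a real system would more likely use bind parameters (for example the pg gem's `exec_params`) than build strings by hand.

```ruby
# Hypothetical helper: render a Ruby value as a PostgreSQL literal.
# Strings are single-quoted with embedded single quotes doubled.
def sql_literal(value)
  case value
  when nil then 'NULL'
  when true then 'TRUE'
  when false then 'FALSE'
  when Integer, Float then value.to_s
  else "'" + value.to_s.gsub("'", "''") + "'"
  end
end

# Hypothetical helper: double-quote an identifier such as a column name.
def sql_identifier(name)
  '"' + name.to_s.gsub('"', '""') + '"'
end

# Build an INSERT for one document into an already-chosen table name.
def insert_sql(table, doc)
  columns = doc.keys.map { |k| sql_identifier(k) }.join(', ')
  values  = doc.values.map { |v| sql_literal(v) }.join(', ')
  "INSERT INTO #{table} (#{columns}) VALUES (#{values})"
end

insert_sql('events', { '_id' => 1, 'name' => "O'Brien" })
# => INSERT INTO events ("_id", "name") VALUES (1, 'O''Brien')
```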

Page 56

cursor.each do |op|
  case op['op']
  when 'i'
    query = "INSERT INTO #{op['ns']} (" +
      op['o'].keys.join(', ') +
      ') VALUES (' +
      op['o'].values.map(&:inspect).join(', ') + ')'
  when 'd'
    query = "DELETE FROM #{op['ns']} WHERE _id=" +
      op['o']['_id'].inspect
  else
    # ¯\_(ツ)_/¯
  end
end

Page 61

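# Demo only: inspect renders Ruby literals, not properly quoted SQL
cursor.each do |op|
  case op['op']
  when 'i'
    query = "INSERT INTO #{op['ns']} (" +
      op['o'].keys.join(', ') + ') VALUES (' +
      op['o'].values.map(&:inspect).join(', ') + ')'
  when 'd'
    query = "DELETE FROM #{op['ns']} WHERE _id=" +
      op['o']['_id'].inspect
  when 'u'
    # 'o' is either a {'$set' => {...}} modifier or a full replacement document;
    # 'o2' holds the _id of the document that was updated
    updates = op['o']['$set'] ? op['o']['$set'] : op['o']
    # Comma-separate the assignments so multi-field updates are valid SQL
    query = "UPDATE #{op['ns']} SET " +
      updates.map { |k, v| "#{k}=#{v.inspect}" }.join(', ') +
      " WHERE _id=" + op['o2']['_id'].inspect
  else
    # ¯\_(ツ)_/¯
  end
end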
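Update entries come in more shapes than this slide handles: besides `$set`, the `o` field can carry `$unset` or be a whole replacement document. The hedged sketch below (a hypothetical helper, not MoSQL's code) turns `o` into a column-to-value Hash that could then be joined with `', '` into a SET clause. `$unset` maps to nil (NULL), and any other operator raises so it is not silently mis-mirrored. A full replacement should arguably also NULL out columns missing from the new document, which this sketch does not do.

```ruby
# Hypothetical helper: map the 'o' field of an update entry to a Hash of
# column => new value. $set copies values, $unset maps columns to nil
# (i.e. NULL), and a document with no $-operators is treated as a full
# replacement (every field except _id). Other operators raise.
def update_assignments(o)
  if o.keys.any? { |k| k.start_with?('$') }
    o.each_with_object({}) do |(operator, fields), out|
      case operator
      when '$set'   then fields.each { |k, v| out[k] = v }
      when '$unset' then fields.each_key { |k| out[k] = nil }
      else raise ArgumentError, "unhandled update operator #{operator}"
      end
    end
  else
    o.reject { |k, _| k == '_id' }
  end
end

# Returns a Hash mapping 'a' to 3 and 'b' to nil.
update_assignments({ '$set' => { 'a' => 3 }, '$unset' => { 'b' => 1 } })
```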

Page 62

github.com/stripe/mosql

Page 63

github.com/stripe/zerowing

Page 65

DISASTER RECOVERY

Page 66

# Pick up a task that has not been marked finished
task = collection.find_one({'finished' => nil})

# do something with task…

# Mark it finished with a Unix timestamp
collection.update({'_id' => task['_id']},
                  {'$set' => {'finished' => Time.now.to_i}})

Page 67

# Intended: every 10 seconds, remove tasks that finished more than 30 seconds ago
loop do
  collection.remove({'finished' => {'$lt' => Time.now.to_i - 30}})
  sleep(10)
end

Page 68

evan@caron:~$ mongo
MongoDB shell version: 2.4.10
connecting to: test
normal:PRIMARY> null < (Date.now() / 1000) - 30
true
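The shell session shows the trap: in JavaScript, `null < (Date.now() / 1000) - 30` evaluates to `true`. Ruby refuses that comparison outright (`nil < 1` raises NoMethodError), but it is still worth making "unfinished tasks are never eligible" explicit in code. A minimal sketch of such a predicate, assuming `finished` holds either a Unix timestamp or a nil/false placeholder; the method name is hypothetical and the 30-second default mirrors the slide.

```ruby
# Hypothetical predicate: a task may be removed only if 'finished' is a
# real numeric timestamp older than now - grace. Any non-numeric value
# (nil, false, missing) never qualifies.
def expired?(task, now, grace = 30)
  finished = task['finished']
  finished.is_a?(Numeric) && finished < now - grace
end

expired?({ 'finished' => nil }, 1_000)  # => false
expired?({ 'finished' => 900 }, 1_000)  # => true
```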

Page 71

THINGS I’VE DONE:

- insert
- delete…

THINGS I’VE DONE:

Page 72

> db.getReplicationInfo()
{
  "logSizeMB" : 48964.3541015625,
  "usedMB" : 46116.4,
  "timeDiff" : 316550,
  "timeDiffHours" : 87.93,
  "tFirst" : "Thu Apr 11 2013 07:24:29 GMT+0000 (UTC)",
  "tLast" : "Sun Apr 14 2013 23:20:19 GMT+0000 (UTC)",
  "now" : "Sat May 24 2014 07:52:35 GMT+0000 (UTC)"
}
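`timeDiff` is the span between the oldest (`tFirst`) and newest (`tLast`) entries still held in the capped oplog, i.e. its replication window: a consumer that falls further behind than this would find the entries it needs already overwritten. The figures above can be checked with a few lines of Ruby; the method name is hypothetical.

```ruby
# Oplog window between two Time values, as [seconds, hours rounded to 2 dp].
def oplog_window(t_first, t_last)
  seconds = (t_last - t_first).to_i
  [seconds, (seconds / 3600.0).round(2)]
end

t_first = Time.utc(2013, 4, 11, 7, 24, 29)   # tFirst above
t_last  = Time.utc(2013, 4, 14, 23, 20, 19)  # tLast above
oplog_window(t_first, t_last)  # => [316550, 87.93]
```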

Page 74

# Replay the newer oplog against the restored (old) database, watching for
# deletes of tasks that had not finished yet
new_oplog.find({'ts' => {'$gt' => start}}) do |cursor|
  cursor.add_option(Mongo::Constants::OP_QUERY_OPLOG_REPLAY)
  cursor.each do |op|
    if op['op'] == 'd' && op['ns'] == 'monsterdb.tasks'
      old_task = old_tasks.find_one({'_id' => op['o']['_id']})
      if old_task && old_task['finished'] == nil
        # found one!
        # save old_task to a file, and we'll re-queue it later
      end
    end

    # Apply the entry to the old database so its state keeps up
    old_connection['admin'].command({'applyOps' => [op]})
  end
end

Page 75

new_oplog.find({'ts' => {'$gt' => start}}) do |cursor|
  cursor.add_option(Mongo::Constants::OP_QUERY_OPLOG_REPLAY)
  cursor.each do |op|
    if op['op'] == 'd' && op['ns'] == 'monsterdb.tasks'
      old_task = old_tasks.find_one({'_id' => op['o']['_id']})
      if old_task && old_task['finished'] == false
        # found one!
        # save old_task to a file, and we'll re-queue it later
      end
    end

    old_connection['admin'].command({'applyOps' => [op]})
  end
end
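The recovery loop's logic can be modelled in plain Ruby, which makes it easy to test before pointing it at production data. This sketch simplifies under stated assumptions: `old_tasks` is a Hash from `_id` to the task as it existed in the restored database, only inserts, `$set` updates and deletes are modelled (standing in for `applyOps`), and any non-numeric `finished` value (nil or false, the two values the slides check) counts as unfinished. The method name and sample data are hypothetical.

```ruby
# Hypothetical pure-Ruby model of the recovery scan. Returns the tasks that
# were deleted before they ever finished.
def lost_tasks(old_tasks, entries, ns = 'monsterdb.tasks')
  tasks = old_tasks.dup  # leave the caller's Hash untouched
  lost = []
  entries.each do |op|
    next unless op['ns'] == ns
    case op['op']
    when 'i'
      tasks[op['o']['_id']] = op['o']
    when 'u'
      # Only $set updates are modelled; other shapes merge nothing
      id = op['o2']['_id']
      tasks[id] = tasks.fetch(id, {}).merge(op['o'].fetch('$set', {}))
    when 'd'
      task = tasks.delete(op['o']['_id'])
      # Any non-numeric 'finished' (nil, false) means it never completed
      lost << task if task && !task['finished'].is_a?(Numeric)
    end
  end
  lost
end

restored = { 1 => { '_id' => 1, 'finished' => nil },
             2 => { '_id' => 2, 'finished' => 1_400_000_000 } }
newer = [
  { 'op' => 'i', 'ns' => 'monsterdb.tasks', 'o' => { '_id' => 3, 'finished' => false } },
  { 'op' => 'u', 'ns' => 'monsterdb.tasks', 'o2' => { '_id' => 3 },
    'o' => { '$set' => { 'finished' => 1_400_000_100 } } },
  { 'op' => 'd', 'ns' => 'monsterdb.tasks', 'o' => { '_id' => 1 } },
  { 'op' => 'd', 'ns' => 'monsterdb.tasks', 'o' => { '_id' => 2 } },
  { 'op' => 'd', 'ns' => 'monsterdb.tasks', 'o' => { '_id' => 3 } }
]
lost_tasks(restored, newer).map { |t| t['_id'] }  # => [1]
```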

Page 77

THINGS I’VE DONE:

save…

THINGS I’VE DONE:

save…

Page 82

QUESTIONS?