building a scalable inbox system with mongodb and java

Technical Account Manager Lead, MongoDB Inc @antoinegirbal Antoine Girbal JavaOne 2013 Building a scalable inbox system with MongoDB and Java

Upload: antoinegirbal

Post on 12-May-2015

DESCRIPTION

Many user-facing applications present some kind of news feed/inbox system. You can think of Facebook, Twitter, or Gmail as different types of inboxes where the user can see data of interest, sorted by time, popularity, or other parameter. A scalable inbox is a difficult problem to solve: for millions of users, varied data from many sources must be sorted and presented within milliseconds. Different strategies can be used: scatter-gather, fan-out writes, and so on. This session presents an actual application developed by 10gen in Java, using MongoDB. This application is open source and is intended to show the reference implementation of several strategies to tackle this common challenge. The presentation also introduces many MongoDB concepts.

TRANSCRIPT

Page 1: Building a Scalable Inbox System with MongoDB and Java

Technical Account Manager Lead, MongoDB Inc

@antoinegirbal

Antoine Girbal

JavaOne 2013

Building a scalable inbox system with MongoDB and Java

Page 2: Building a Scalable Inbox System with MongoDB and Java

Agenda

• Problem Overview

• Schema and queries

• Java Development

• Design Options
– Fan out on Read
– Fan out on Write
– Bucketed Fan out on Write
– Cached Inbox

• Discussion

Page 3: Building a Scalable Inbox System with MongoDB and Java

Problem Overview

Page 4: Building a Scalable Inbox System with MongoDB and Java

Let’s get Social

Page 5: Building a Scalable Inbox System with MongoDB and Java

Sending Messages

?

Page 6: Building a Scalable Inbox System with MongoDB and Java

Reading my Inbox

?

Page 7: Building a Scalable Inbox System with MongoDB and Java

Schema and Queries

Page 8: Building a Scalable Inbox System with MongoDB and Java

Basic CRUD

• Save your first document:
> db.test.insert({ firstName: "Antoine", lastName: "Girbal" })

• Find the document:
> db.test.find({ firstName: "Antoine" })
{ _id: ObjectId("524495105889411fab0cdfa3"), firstName: "Antoine", lastName: "Girbal" }

• Update the document:
> db.test.update({ _id: ObjectId("524495105889411fab0cdfa3") }, { x: 1, y: 2 })

• Remove the document:
> db.test.remove({ _id: ObjectId("524495105889411fab0cdfa3") })

• No schema definition or other declaration, it's easy!

Page 9: Building a Scalable Inbox System with MongoDB and Java

The User Document

{
  "_id": ObjectId("519c12d53004030e5a6316d2"),
  "address": {
    "streetAddress": "2600 Rafe Lane",
    "city": "Jackson",
    "state": "MS",
    "zip": 39201,
    "country": "US"
  },
  "birthday": ISODate("1980-12-26T00:00:00.000Z"),
  "company": "Parade of Shoes",
  "domain": "SanFranciscoAgency.com",
  "email": "[email protected]",
  "firstName": "Anthony",
  "gender": "male",
  "lastName": "Dacosta",
  "location": [ -90.183518, 32.368619 ],
  …
}

Page 10: Building a Scalable Inbox System with MongoDB and Java

The User Collection

The collection statistics:

> db.users.stats()
{
  "ns": "edges.users",
  "count": 1000000, // number of documents
  "size": 637864480, // size of all documents
  "avgObjSize": 637.86448,
  "storageSize": 845197312,
  "numExtents": 16,
  "nindexes": 2,
  "lastExtentSize": 227786752,
  "paddingFactor": 1.0000000000260925, // padding after documents
  "systemFlags": 1,
  "userFlags": 0,
  "totalIndexSize": 66070256,
  "indexSizes": { "_id_": 29212848, "uid_1": 36857408 },
  "ok": 1
}

Page 11: Building a Scalable Inbox System with MongoDB and Java

Queries on Users

Finding a user by email address…
> db.users.find({ "email": "[email protected]" }).pretty()
{
  "_id": ObjectId("519c12d53004030e5a6316d2"),

By default will use a slow table scan…
> db.users.find({ "email": "[email protected]" }).explain()
{
  "cursor": "BasicCursor",
  "nscannedObjects": 1000000, // 1m objects scanned
  "nscanned": 1000000,
  …

Use an index for fast performance…
> db.users.ensureIndex({ "email": 1 }) // does not do anything if index is there
> db.users.find({ "email": "[email protected]" }).explain()
{
  "cursor": "BtreeCursor email_1", // Btree, sweet!
  "nscannedObjects": 1, // document is found almost right away
  "nscanned": 1,
  …

Page 12: Building a Scalable Inbox System with MongoDB and Java

Users Relationships

• Here the follower / followee relationships are of "many-to-many" type. They can be stored as:
1. a list of followers in user
2. a list of followees in user
3. a relationship collection: "followees"
4. two relationship collections: "followees" and "followers"

• Ideal solutions:
– a few million users and a 1000 followee limit: Solution #2
– no boundaries and relative scaling: Solution #3
– no boundaries and max scaling: Solution #4

Page 13: Building a Scalable Inbox System with MongoDB and Java

Relationship Data

Let's look at a sample document:

> use edges
switched to db edges
> db.followees.findOne()
{
  "_id": ObjectId(),
  "user": "17052001",
  "followee": "31554261"
}

And the statistics:

> db.followees.stats()
{
  "ns": "edges.followees",
  "count": 1000000,
  "size": 64000048,
  "avgObjSize": 64.000048,
  "storageSize": 86310912,
  "numExtents": 10,
  "nindexes": 2,
  "lastExtentSize": 27869184,
  "paddingFactor": 1,
  "systemFlags": 1,
  "userFlags": 0,
  "totalIndexSize": 85561840,
  "indexSizes": { "_id_": 32458720, "user_1_followee_1": 53103120 },
  "ok": 1
}

Page 14: Building a Scalable Inbox System with MongoDB and Java

Relationship Queries

To find all the users that a user follows:

> db.followees.ensureIndex({ user: 1, followee: 1 }) // why not just index on user? We shall see
> db.followees.find({user: "11622712"})
{ "_id" : ObjectId("51641c02e4b0ef6827a34569"), "user" : "11622712", "followee" : "30432718" }
…
> db.followees.find({user: "11622712"}).explain()
{
  "cursor" : "BtreeCursor user_1_followee_1",
  "n" : 66,
  "indexOnly" : false,
  "millis" : 0, // this is fast

Even faster if using a “covered” index:

> db.followees.find({user: "11622712"}, {followee: 1, _id: 0}).explain()
{
  "cursor" : "BtreeCursor user_1_followee_1",
  "n" : 66,
  "nscannedObjects" : 0,
  "nscanned" : 66,
  "indexOnly" : true, // this means covered

To find all the followers of a user, we just need the opposite index:

> db.followees.ensureIndex({followee: 1, user: 1})
> db.followees.find({followee: "30313973"}, {user: 1, _id: 0})
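
The two indexes above answer the same edge list from opposite directions. A minimal in-memory sketch of that idea follows. It is plain JavaScript rather than MongoDB API, and the names edges, buildIndex, followeesOf and followersOf, as well as the sample ids, are assumptions made for illustration.

```javascript
// In-memory stand-in for the "followees" collection (sample ids are made up).
const edges = [
  { user: "u1", followee: "u2" },
  { user: "u1", followee: "u3" },
  { user: "u2", followee: "u3" },
];

// Group the values of one field by the other field. This loosely mirrors what a
// compound index such as { user: 1, followee: 1 } lets the server answer
// without touching the documents themselves (a "covered" query).
function buildIndex(docs, keyField, valueField) {
  const index = new Map();
  for (const doc of docs) {
    if (!index.has(doc[keyField])) index.set(doc[keyField], []);
    index.get(doc[keyField]).push(doc[valueField]);
  }
  return index;
}

const byUser = buildIndex(edges, "user", "followee");     // like { user: 1, followee: 1 }
const byFollowee = buildIndex(edges, "followee", "user"); // like { followee: 1, user: 1 }

// Who does this user follow?
function followeesOf(user) {
  return byUser.get(user) || [];
}

// Who follows this user?
function followersOf(user) {
  return byFollowee.get(user) || [];
}

console.log(followeesOf("u1"));
console.log(followersOf("u3"));
```

Each direction needs its own lookup structure, which is why the slide adds a second index instead of reusing the first one.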

Page 15: Building a Scalable Inbox System with MongoDB and Java

Message Document

The message document:

> db.messages.findOne()
{
  "_id": ObjectId("519d4858e4b079162fe7eb12"),
  "uid": "48268973", // the author id
  "username": "Abiall", // why store the username?
  "text": "Lorem ipsum dolor sit amet, consectetur ...",
  "created": ISODate("2013-05-22T22:36:08.663Z"),
  "location": [ -95.470188, 37.366044 ],
  "tags": [ "gadgets" ]
}

Collection statistics:

> db.messages.stats()
{
  "ns": "msg.messages",
  "count": 21440518,
  "size": 14184598000,
  "avgObjSize": 661.5790719235422,
  "storageSize": 15749418944,
  "numExtents": 27,
  "nindexes": 2,
  "lastExtentSize": 2146426864,
  "paddingFactor": 1,
  "systemFlags": 1,
  "userFlags": 0,
  "totalIndexSize": 1454289648,
  "indexSizes": { "_id_": 695646784, "uid_1_created_1": 758642864 },
  "ok": 1
}

Page 16: Building a Scalable Inbox System with MongoDB and Java

Implementing the Outbox

The query is on "uid" and needs to be sorted by descending "created" time:

> db.messages.ensureIndex({ "uid": 1, "created": 1 }) // use a compound index

> db.messages.find({ "uid": "31837072" }).sort({ "created": -1 }).limit(100)
{
  "_id": ObjectId("519d626ae4b07916312e15b1"),
  "uid": "31837072",
  "username": "Royague",
  "text": "Lorem ipsum dolor sit amet, consectetur adipisicing elit , sed do eiusmod tempor …",
  "created": ISODate("2013-05-23T00:27:22.369Z"),
  "location": [ "-118.296138", "33.772832" ],
  "tags": [ "Art" ]
}
…

> db.messages.find({ "uid": "31837072" }).sort({ "created": -1 }).limit(100).explain()
{
  "cursor": "BtreeCursor uid_1_created_1 reverse",
  "n": 18,
  "nscannedObjects": 18,
  "nscanned": 18,
  "scanAndOrder": false,
  "millis": 0
  …
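
The "reverse" cursor and "scanAndOrder": false above mean the ascending index is simply walked backward, so no separate sort is needed. A rough in-memory sketch of that traversal follows. It assumes a sorted array as a stand-in for the B-tree; indexEntries, latestByAuthor and the sample values are hypothetical names and data.

```javascript
// Stand-in for the { uid: 1, created: 1 } index: ordered by uid, then by created.
const indexEntries = [
  { uid: "a", created: 1 },
  { uid: "a", created: 5 },
  { uid: "a", created: 9 },
  { uid: "b", created: 2 },
  { uid: "b", created: 7 },
];

// Return the newest `limit` timestamps for one author by walking that author's
// contiguous range from its end toward its start.
function latestByAuthor(entries, uid, limit) {
  // Locate the last entry for this uid. A real B-tree would seek directly;
  // the linear search here only keeps the sketch short.
  let end = entries.length - 1;
  while (end >= 0 && entries[end].uid !== uid) end--;

  const newestFirst = [];
  for (let i = end; i >= 0 && entries[i].uid === uid && newestFirst.length < limit; i--) {
    newestFirst.push(entries[i].created);
  }
  return newestFirst;
}

console.log(latestByAuthor(indexEntries, "a", 2));
```

Because all entries for one uid sit next to each other and are already ordered by created, the walk can stop as soon as the limit is reached.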

Page 17: Building a Scalable Inbox System with MongoDB and Java

Java Development

Page 18: Building a Scalable Inbox System with MongoDB and Java

Java support

• The Java driver is open source and available on GitHub and Maven.

• mongo.jar is the driver, bson.jar is a subset with BSON library only.

• Java driver is probably the most used MongoDB driver

• It receives active development by MongoDB Inc and the community

Page 19: Building a Scalable Inbox System with MongoDB and Java

Driver Features

• CRUD

• Support for replica sets

• Connection pooling

• Distributed reads to slave servers

• BSON serializer/deserializer (lazy option)

• JSON serializer/deserializer

• GridFS

Page 20: Building a Scalable Inbox System with MongoDB and Java

Message Store

public class MessageStoreDAO implements MessageStore {

    private Morphia morphia;
    private Datastore ds;

    public MessageStoreDAO(MongoClient mongo) {
        this.morphia = new Morphia();
        this.morphia.map(DBMessage.class);
        this.ds = morphia.createDatastore(mongo, "messages");
        this.ds.getCollection(DBMessage.class)
            .ensureIndex(new BasicDBObject("sender", 1).append("sentAt", 1));
    }

    // get a message
    public Message get(String user_id, String msg_id) {
        return (Message) this.ds.find(DBMessage.class)
            .filter("sender", user_id)
            .filter("_id", new ObjectId(msg_id))
            .get();
    }

Page 21: Building a Scalable Inbox System with MongoDB and Java

Message Store

    // save a message
    public Message save(String user_id, String message, Date date) {
        Message msg = new DBMessage(user_id, message, date);
        ds.save(msg);
        return msg;
    }

    // find message by author sorted by descending time
    public List<Message> sentBy(String user_id) {
        return (List) this.ds.find(DBMessage.class)
            .filter("sender", user_id)
            .order("-sentAt")
            .limit(50)
            .asList();
    }

    // find message by several authors sorted by descending time
    public List<Message> sentBy(List<String> user_ids) {
        return (List) this.ds.find(DBMessage.class)
            .field("sender").in(user_ids)
            .order("-sentAt")
            .limit(50)
            .asList();
    }

Page 22: Building a Scalable Inbox System with MongoDB and Java

Graph Store

Below uses Solution #4: both a follower and followee list

public class GraphStoreDAO implements GraphStore {

    private DBCollection friends;
    private DBCollection followers;

    public GraphStoreDAO(MongoClient mongo) {
        this.followers = mongo.getDB("edges").getCollection("followers");
        this.friends = mongo.getDB("edges").getCollection("friends");
        followers.ensureIndex(
            new BasicDBObject("u", 1).append("o", 1),
            new BasicDBObject("unique", true));
        friends.ensureIndex(
            new BasicDBObject("u", 1).append("o", 1),
            new BasicDBObject("unique", true));
    }

    // find users that are followed
    public List<String> friendsOf(String user_id) {
        List<String> theFriends = new ArrayList<String>();
        DBCursor cursor = friends.find(
            new BasicDBObject("u", user_id),
            new BasicDBObject("_id", 0).append("o", 1));
        while (cursor.hasNext())
            theFriends.add((String) cursor.next().get("o"));
        return theFriends;
    }

Page 23: Building a Scalable Inbox System with MongoDB and Java

Graph Store

    // find followers of a user
    public List<String> followersOf(String user_id) {
        List<String> theFollowers = new ArrayList<String>();
        DBCursor cursor = followers.find(
            new BasicDBObject("u", user_id),
            new BasicDBObject("_id", 0).append("o", 1));
        while (cursor.hasNext())
            theFollowers.add((String) cursor.next().get("o"));
        return theFollowers;
    }

    public void follow(String user_id, String toFollow) {
        friends.save(new BasicDBObject("u", user_id).append("o", toFollow));
        followers.save(new BasicDBObject("u", toFollow).append("o", user_id));
    }

    public void unfollow(String user_id, String toUnFollow) {
        friends.remove(new BasicDBObject("u", user_id).append("o", toUnFollow));
        followers.remove(new BasicDBObject("u", toUnFollow).append("o", user_id));
    }

Page 24: Building a Scalable Inbox System with MongoDB and Java

Design Options

Page 25: Building a Scalable Inbox System with MongoDB and Java

4 Approaches (there are more)

• Fan out on Read

• Fan out on Write

• Bucketed Fan out on Write

• Inbox Caches

Page 26: Building a Scalable Inbox System with MongoDB and Java

Fan out on read

• Generally, not the right approach

• 1 document per message sent

• Reading an inbox means finding all messages sent by the people the user follows

• Requires scatter-gather on sharded cluster

• Then a lot of random IO on a shard to find everything

Page 27: Building a Scalable Inbox System with MongoDB and Java

Fan out on Read

Put the followee ids in a list:

> var fees = []
> db.followees.find({user: "11622712"}).forEach( function(doc) { fees.push( doc.followee ) } )

Use $in and sort() and limit() to gather the inbox:

> db.messages.find({ uid: { $in: fees } }).sort({ created: -1 }).limit(100)
{ "_id": ObjectId("519d627ce4b07916312f0a09"), "uid": "34660390", "username": "Dingdowas" } …
{ "_id": ObjectId("519d627ce4b07916312f0a10"), "uid": "34661390", "username": "John" } …
{ "_id": ObjectId("519d627ce4b07916312f0a11"), "uid": "34662390", "username": "Brenda" } …
…

Page 28: Building a Scalable Inbox System with MongoDB and Java

Fan out on read – Send Message

Shard 1 Shard 2 Shard 3

Send Message

Page 29: Building a Scalable Inbox System with MongoDB and Java

Fan out on read – Inbox Read

Shard 1 Shard 2 Shard 3

Read Inbox

Page 30: Building a Scalable Inbox System with MongoDB and Java

Fan out on read

> db.messages.find({ uid: { $in: fees } }).sort({ created: -1 }).limit(100).explain()
{
  "cursor": "BtreeCursor uid_1_created_1 multi",
  "isMultiKey": false,
  "n": 100,
  "nscannedObjects": 1319,
  "nscanned": 1384,
  "nscannedObjectsAllPlans": 1425,
  "nscannedAllPlans": 1490,
  "scanAndOrder": true, // it is sorting in RAM??
  "indexOnly": false,
  "nYields": 0,
  "nChunkSkips": 0,
  "millis": 31 // takes about 30ms
}
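
One way to read "scanAndOrder": true above: the index is ordered by uid first, so entries gathered for several authors are not globally ordered by created and have to be sorted after collection. The sketch below illustrates this with an in-memory array. It is not MongoDB code, and indexEntries, fanOutOnRead and the sample values are assumptions.

```javascript
// Stand-in for the { uid: 1, created: 1 } index: ordered by uid, then by created.
const indexEntries = [
  { uid: "a", created: 1 },
  { uid: "a", created: 9 },
  { uid: "b", created: 2 },
  { uid: "b", created: 7 },
  { uid: "c", created: 5 },
];

// Gather every entry written by any followee, then sort the candidates in
// memory and keep the newest `limit`. The number of entries scanned grows with
// the followees' total output, not with the size of the page returned.
function fanOutOnRead(entries, followees, limit) {
  const wanted = new Set(followees);
  const candidates = entries.filter((entry) => wanted.has(entry.uid));
  const scanned = candidates.length;
  const inbox = candidates
    .slice()
    .sort((x, y) => y.created - x.created) // the in-memory sort step
    .slice(0, limit);
  return { scanned, inbox };
}

const result = fanOutOnRead(indexEntries, ["a", "b"], 3);
console.log(result.scanned, result.inbox.map((entry) => entry.created));
```

This matches the shape of the explain output, where nscanned (1384) is far larger than n (100).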

Page 31: Building a Scalable Inbox System with MongoDB and Java

Fan out on read - sort

Page 32: Building a Scalable Inbox System with MongoDB and Java

Fan out on write

• Tends to scale better than fan out on read

• 1 document per recipient

• Reading my inbox is just finding all of the messages with me as the recipient

• Can shard on recipient, so inbox reads hit one shard

• But still lots of random IO on the shard

Page 33: Building a Scalable Inbox System with MongoDB and Java

Fan out on Write

// Shard on "recipient" and "sent"
sh.shardCollection("myapp.inbox", { "recipient": 1, "sent": 1 })

msg = { from: "Joe", sent: new Date(), message: "Hi!" }

// Send a message, write one copy per follower
// (followersOf is a helper assumed by the slide)
followersOf(msg.from).forEach(function(follower) {
    db.inbox.insert({ from: msg.from, recipient: follower, sent: msg.sent, message: msg.message });
});

// Read my inbox, super easy
db.inbox.find({ recipient: "Joe" }).sort({ sent: -1 })
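
The write amplification can be seen in a small in-memory model. This is a sketch in plain JavaScript under stated assumptions, not the application's code: the followers map, the inbox map, and the send and readInbox functions are hypothetical stand-ins for the collections on the slide.

```javascript
// Hypothetical follower graph: author -> list of followers.
const followers = new Map([
  ["joe", ["ann", "bob"]],
  ["ann", ["joe"]],
]);

// Stand-in for the "inbox" collection, grouped by recipient (the shard key prefix).
const inbox = new Map();

// Sending costs one write per follower. Each recipient gets a separate copy.
function send(from, text, sent) {
  let writes = 0;
  for (const recipient of followers.get(from) || []) {
    if (!inbox.has(recipient)) inbox.set(recipient, []);
    inbox.get(recipient).push({ from, recipient, sent, message: text });
    writes++;
  }
  return writes;
}

// Reading touches a single recipient's messages only, newest first.
function readInbox(recipient, limit) {
  return (inbox.get(recipient) || [])
    .slice()
    .sort((x, y) => y.sent - x.sent)
    .slice(0, limit);
}

console.log(send("joe", "Hi!", 1));   // number of copies written
console.log(readInbox("ann", 10));
```

The read is a lookup on one key, which is what lets a sharded deployment route it to one shard. The cost is moved to the sender, whose write count equals the follower count.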

Page 34: Building a Scalable Inbox System with MongoDB and Java

Fan out on write – Send Message

Shard 1 Shard 2 Shard 3

Send Message

Page 35: Building a Scalable Inbox System with MongoDB and Java

Fan out on write– Read Inbox

Shard 1 Shard 2 Shard 3

Read Inbox

Page 36: Building a Scalable Inbox System with MongoDB and Java

Bucketed Fan out on write

• Each “inbox” document is an array of messages

• Append a message onto “inbox” of recipient

• Bucket inbox documents so there are not too many messages per document

• Can shard on recipient, so inbox reads hit one shard

• 1 or 2 documents to read the whole inbox

Page 37: Building a Scalable Inbox System with MongoDB and Java

Bucketed Fan out on Write

// Shard on "owner / sequence"
sh.shardCollection("myapp.buckets", { "owner": 1, "sequence": 1 })
sh.shardCollection("myapp.users", { "user_name": 1 })

msg = { from: "Joe", sent: new Date(), message: "Hi!" }

// Send a message, have to find the right sequence document
// (followersOf is a helper assumed by the slide)
followersOf(msg.from).forEach(function(recipient) {
    // increment the recipient's message counter and derive an integer bucket number
    var sequence = Math.floor(db.users.findAndModify({
        query: { user_name: recipient },
        update: { $inc: { msg_count: 1 } },
        upsert: true,
        new: true
    }).msg_count / 50);

    db.buckets.update({ owner: recipient, sequence: sequence },
                      { $push: { messages: msg } },
                      { upsert: true });
});

// Read my inbox
db.buckets.find({ owner: "Joe" }).sort({ sequence: -1 }).limit(2)
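
The bucket arithmetic is easy to get wrong, so here is a small in-memory sketch of it. It assumes the slide's bucket size of 50 and the same counter-then-divide rule; counters, buckets, deliver and readInbox are hypothetical names, and nothing here calls MongoDB.

```javascript
const BUCKET_SIZE = 50;      // messages per bucket, as on the slide
const counters = new Map();  // owner -> msg_count (stand-in for the users collection)
const buckets = new Map();   // "owner/sequence" -> bucket document

// Append one message to the owner's current bucket.
function deliver(owner, msg) {
  const count = (counters.get(owner) || 0) + 1; // like $inc: { msg_count: 1 }
  counters.set(owner, count);
  // Integer division picks the bucket. With this rule the first bucket holds
  // 49 messages (counts 1..49) and every later bucket holds 50.
  const sequence = Math.floor(count / BUCKET_SIZE);
  const key = owner + "/" + sequence;
  if (!buckets.has(key)) buckets.set(key, { owner, sequence, messages: [] }); // like upsert
  buckets.get(key).messages.push(msg);                                        // like $push
  return sequence;
}

// Read the two newest buckets for an owner.
function readInbox(owner) {
  return [...buckets.values()]
    .filter((bucket) => bucket.owner === owner)
    .sort((x, y) => y.sequence - x.sequence)
    .slice(0, 2);
}

for (let i = 1; i <= 120; i++) deliver("joe", { from: "ann", sent: i });
console.log(readInbox("joe").map((b) => [b.sequence, b.messages.length]));
```

Reading two buckets rather than one covers the case where the newest bucket has only just been started and holds very few messages.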

Page 38: Building a Scalable Inbox System with MongoDB and Java

Bucketed fan out on write - Send

Shard 1 Shard 2 Shard 3

Send Message

Page 39: Building a Scalable Inbox System with MongoDB and Java

Bucketed fan out on write - Read

Shard 1 Shard 2 Shard 3

Read Inbox

Page 40: Building a Scalable Inbox System with MongoDB and Java

Cached inbox

• Recent messages are fast, but older messages are slower

• Store a cache of last N messages per user

• Use a capped array to age out older messages

• Create cache lazily when user accesses inbox

• Only write the message if cache exists.

• Use TTL collection to time out caches for inactive users

Page 41: Building a Scalable Inbox System with MongoDB and Java

Cached Inbox

// Shard on "owner"
sh.shardCollection("myapp.caches", { "owner": 1 })

// Send a message, add it to the existing caches of followers
// (no upsert: followers without a cache document are skipped)
followersOf(msg.from).forEach(function(recipient) {
    db.caches.update({ owner: recipient },
        { $push: { messages: { $each: [ msg ], $sort: { sent: 1 }, $slice: -50 } } });
});

// Read my inbox
var cache = db.caches.findOne({ owner: "Joe" });
if (cache) {
    // cache document exists
    return cache.messages;
} else {
    // fall back to "fan out on read" and cache it
    // (followeesOf is a hypothetical helper returning the users that Joe follows)
    var msgs = db.outbox.find({ sender: { $in: followeesOf("Joe") } })
                        .sort({ sent: -1 }).limit(50).toArray();
    msgs.reverse(); // store oldest first, matching $sort: { sent: 1 }
    db.caches.save({ owner: "Joe", messages: msgs });
    return msgs;
}
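
The capped-array behavior and the lazy cache creation can be modeled in memory as follows. This is a sketch under assumptions, not the application's code: caches, pushToCache and readInbox are hypothetical names, and the rebuild callback stands in for the "fan out on read" fallback query.

```javascript
const CACHE_SIZE = 50;    // keep the newest 50 messages, as on the slide
const caches = new Map(); // owner -> messages ordered oldest first

// Add a message only when the owner already has a cache (no upsert).
function pushToCache(owner, msg) {
  const cached = caches.get(owner);
  if (!cached) return false;                    // inactive user: skip the write
  cached.push(msg);                             // like $each: [ msg ]
  cached.sort((x, y) => x.sent - y.sent);       // like $sort: { sent: 1 }
  caches.set(owner, cached.slice(-CACHE_SIZE)); // like $slice: -50
  return true;
}

// Return the inbox newest first, building the cache on the first access.
function readInbox(owner, rebuild) {
  if (!caches.has(owner)) {
    const rebuilt = rebuild(owner).slice().sort((x, y) => x.sent - y.sent);
    caches.set(owner, rebuilt.slice(-CACHE_SIZE));
  }
  return caches.get(owner).slice().reverse();
}

console.log(pushToCache("joe", { sent: 1 })); // no cache exists yet
console.log(readInbox("joe", () => [{ sent: 3 }, { sent: 1 }]));
```

Skipping writes for users without a cache is what keeps the send cost proportional to the number of active followers instead of the total follower count.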

Page 42: Building a Scalable Inbox System with MongoDB and Java

Cached Inbox – Send

Shard 1 Shard 2 Shard 3

Send Message

Page 43: Building a Scalable Inbox System with MongoDB and Java

Cached Inbox- Read

Shard 1 Shard 2 Shard 3

Read Inbox

1

2

Cache Hit

Cache Miss

Page 44: Building a Scalable Inbox System with MongoDB and Java

Discussion

Page 45: Building a Scalable Inbox System with MongoDB and Java

Tradeoffs

Send Message Performance
• Fan out on Read: Best. Single shard, single write
• Fan out on Write: Good. Shard per recipient, multiple writes
• Bucketed Fan out on Write: Worst. Shard per recipient, appends (grows)
• Inbox Cache: Mixed. Depends on how many users are in cache

Read Inbox Performance
• Fan out on Read: Worst. Broadcast to all shards, random reads
• Fan out on Write: Good. Single shard, random reads
• Bucketed Fan out on Write: Best. Single shard, single read
• Inbox Cache: Mixed. Recent messages fast, older messages are slow

Data Size
• Fan out on Read: Best. Message stored once
• Fan out on Write: Worst. Copy per recipient
• Bucketed Fan out on Write: Worst. Copy per recipient
• Inbox Cache: Good. Same as Fan out on Read + size of cache
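
The data size row can be turned into rough arithmetic. The sketch below is only a back-of-the-envelope estimate: estimateBytes is a hypothetical helper, the message count and average size come from the messages collection statistics shown earlier, and the average follower count of 100 is an assumption chosen purely for illustration.

```javascript
// Estimate raw document bytes for the two extremes of the tradeoff.
function estimateBytes(messageCount, avgMessageBytes, avgFollowers) {
  const fanOutOnRead = messageCount * avgMessageBytes; // each message stored once
  const fanOutOnWrite = fanOutOnRead * avgFollowers;   // one copy per recipient
  return { fanOutOnRead, fanOutOnWrite };
}

// 21440518 messages at about 661.58 bytes each (from db.messages.stats()).
// The follower average of 100 is assumed and does not come from the deck.
const estimate = estimateBytes(21440518, 661.58, 100);
console.log((estimate.fanOutOnRead / 1e9).toFixed(1) + " GB stored once");
console.log((estimate.fanOutOnWrite / 1e9).toFixed(1) + " GB with a copy per recipient");
```

Index size and padding are ignored here, so real storage would be higher in both cases.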

Page 46: Building a Scalable Inbox System with MongoDB and Java

Things to consider

• Lots of recipients
– Fan out on write might become prohibitive
– Consider introducing a “Group”
– Make fan out asynchronous

• Very large message size
– Multiple copies of messages can be a burden
– Consider single copy of message with a “pointer” per inbox

• More writes than reads
– Fan out on read might be okay

Page 47: Building a Scalable Inbox System with MongoDB and Java

Summary

• Multiple ways to model status updates

• Think about characteristics of your network
– Number of users
– Number of edges
– Publish frequency
– Access patterns

• Try to minimize random IO

Page 48: Building a Scalable Inbox System with MongoDB and Java

Technical Account Manager Lead, MongoDB Inc

Antoine Girbal

JavaOne 2013

Thank You