Page 1: Schema Tricks & Tips

Schema Design: 4 Real World Use Cases

Alvin Richards
Technical Director, 10gen
@jonnyeight  alvinonmongodb.com

#MongoDBdays


One size fits all?


Agenda

•  Why is schema design important?

•  4 Real World Schemas
   –  Inbox
   –  History
   –  Indexed Attributes
   –  Multiple Identities

•  Conclusions


Why is Schema Design important?

•  Largest factor for a performant system

•  Schema design with MongoDB is different
   –  RDBMS: "What answers do I have?"
   –  MongoDB: "What questions will I have?"


#1 - Message Inbox


Let’s get Social


Sending Messages

?


Design Goals

•  Efficiently send new messages to recipients

•  Efficiently read inbox


Reading my Inbox

?


3 Approaches (there are more)

•  Fan out on Read

•  Fan out on Write

•  Fan out on Write with Bucketing


Fan out on read

// Shard on "from"
sh.shardCollection( "mongodbdays.inbox", { from: 1 } )
// Make sure we have an index to handle inbox reads
db.inbox.ensureIndex( { to: 1, sent: 1 } )

msg = { from: "Joe", to: [ "Bob", "Jane" ], sent: new Date(), message: "Hi!" }

// Send a message
db.inbox.save( msg )
// Read my inbox
db.inbox.find( { to: "Joe" } ).sort( { sent: -1 } )


Fan out on read – Send Message

[Diagram: Shard 1, Shard 2, Shard 3; Send Message]


Fan out on read – Inbox Read

[Diagram: Shard 1, Shard 2, Shard 3; Read Inbox]


Considerations

•  1 document per message sent

•  Multiple recipients in an array key

•  Reading an inbox is finding all messages with my own name in the recipient field

•  Requires scatter-gather on sharded cluster

•  Then a lot of random IO on a shard to find everything
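The fan-out-on-read trade-off can be seen in a small in-memory model: sending is a single write, while reading has to look through every stored message for your name. This is plain JavaScript with hypothetical helper names (`sendMessage`, `readInbox`), a sketch of the idea rather than MongoDB code.

```javascript
// In-memory sketch of "fan out on read": one document per message sent,
// with all recipients kept in the "to" array. Not the mongo shell API.
const inbox = [];

function sendMessage(msg) {
  // A single write, however many recipients there are
  inbox.push(msg);
}

function readInbox(user) {
  // Similar in spirit to find({ to: user }).sort({ sent: -1 }). Here every
  // stored message is scanned; on a cluster sharded by "from", the query on
  // "to" has to be sent to every shard.
  return inbox
    .filter((m) => m.to.includes(user))
    .sort((a, b) => b.sent - a.sent);
}

sendMessage({ from: "Joe", to: ["Bob", "Jane"], sent: new Date("2013-03-01T09:00:00Z"), message: "Hi!" });
sendMessage({ from: "Jane", to: ["Bob"], sent: new Date("2013-03-01T10:00:00Z"), message: "Hello" });

console.log(readInbox("Bob").map((m) => m.from)); // [ 'Jane', 'Joe' ]
```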


Fan out on write

// Shard on "recipient" and "sent"
sh.shardCollection( "mongodbdays.inbox", { recipient: 1, sent: 1 } )

msg = { from: "Joe", to: [ "Bob", "Jane" ], sent: new Date(), message: "Hi!" }

// Send a message: one new document per recipient
// (for...in over an array yields indexes, not names, so iterate with forEach)
msg.to.forEach( function( recipient ) {
    db.inbox.insert( { from: msg.from, to: msg.to, recipient: recipient,
                       sent: msg.sent, message: msg.message } )
} )

// Read my inbox
db.inbox.find( { recipient: "Joe" } ).sort( { sent: -1 } )


Fan out on write – Send Message

[Diagram: Shard 1, Shard 2, Shard 3; Send Message]


Fan out on write – Read Inbox

[Diagram: Shard 1, Shard 2, Shard 3; Read Inbox]


Considerations

•  1 document per recipient

•  Reading my inbox is just finding all of the messages with me as the recipient

•  Can shard on recipient, so inbox reads hit one shard

•  But still lots of random IO on the shard
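Fan out on write turns one sent message into one document per recipient. The plain JavaScript sketch below (the helper name `fanOutOnWrite` is hypothetical) builds those documents with a fresh object per recipient, which is what makes the inbox read a simple filter on a single field that can also serve as the shard key.

```javascript
// In-memory sketch of "fan out on write": one document per recipient.
// Plain JavaScript, not the mongo shell API.
function fanOutOnWrite(msg) {
  // Iterate over the names themselves (for...in would give "0", "1") and
  // create a separate object for each recipient
  return msg.to.map((recipient) => ({ ...msg, recipient }));
}

const msg = { from: "Joe", to: ["Bob", "Jane"], sent: new Date("2013-03-01T09:59:42Z"), message: "Hi!" };
const docs = fanOutOnWrite(msg);
console.log(docs.map((d) => d.recipient)); // [ 'Bob', 'Jane' ]

// Reading an inbox is a filter on one field, which can also be the shard key
const bobsInbox = docs.filter((d) => d.recipient === "Bob");
console.log(bobsInbox.length); // 1
```

The cost moves to the write side: a message to N people is N inserts, while each read touches only documents addressed to one user.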


Fan out on write with buckets

•  Each “inbox” document is an array of messages

•  Append each message onto the recipient's “inbox”

•  Bucket inbox documents so no single document holds too many messages

•  Can shard on recipient, so inbox reads hit one shard

•  Reading the whole inbox touches only a few documents
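The bucketing idea can be modelled in memory: a per-user counter stands in for the `msg_count` field that `findAndModify` increments, and `Math.floor(count / 50)` picks the bucket. This is a plain JavaScript sketch with hypothetical names (`send`, `readInbox`, `BUCKET_SIZE`), not MongoDB code.

```javascript
// In-memory sketch of bucketed fan out on write.
const BUCKET_SIZE = 50;     // messages per bucket, as on the slide
const counters = new Map(); // user -> msg_count
const buckets = new Map();  // "owner:sequence" -> { owner, sequence, messages }

function send(msg) {
  for (const owner of msg.to) {
    // Like findAndModify with $inc, upsert and new: returns the incremented count
    const count = (counters.get(owner) || 0) + 1;
    counters.set(owner, count);
    const sequence = Math.floor(count / BUCKET_SIZE);
    const key = owner + ":" + sequence;
    // Like update({ owner, sequence }, { $push: { messages: msg } }, { upsert: true })
    if (!buckets.has(key)) buckets.set(key, { owner, sequence, messages: [] });
    buckets.get(key).messages.push(msg);
  }
}

function readInbox(owner, limit) {
  // Newest buckets first, like find({ owner }).sort({ sequence: -1 }).limit(2)
  return [...buckets.values()]
    .filter((b) => b.owner === owner)
    .sort((a, b) => b.sequence - a.sequence)
    .slice(0, limit);
}

for (let i = 0; i < 120; i++) {
  send({ from: "Joe", to: ["Bob"], sent: new Date(2013, 2, 1, 0, i), message: "msg " + i });
}
console.log(readInbox("Bob", 2).map((b) => [b.sequence, b.messages.length])); // [ [ 2, 21 ], [ 1, 50 ] ]
```

Because the counter is incremented before the bucket is chosen, bucket 0 ends up with 49 messages and later buckets with 50; reading the latest two buckets returns up to 100 recent messages from two documents.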


Fan out on write – with buckets

// Shard on "owner / sequence"
sh.shardCollection( "mongodbdays.inbox", { owner: 1, sequence: 1 } )
sh.shardCollection( "mongodbdays.users", { user_name: 1 } )

msg = { from: "Joe", to: [ "Bob", "Jane" ], sent: new Date(), message: "Hi!" }

// Send a message
msg.to.forEach( function( recipient ) {
    // Atomically increment the recipient's message count and read the new value
    count = db.users.findAndModify( {
        query: { user_name: recipient },
        update: { $inc: { msg_count: 1 } },
        upsert: true, new: true } ).msg_count
    // Every 50 messages start a new bucket document
    sequence = Math.floor( count / 50 )
    db.inbox.update( { owner: recipient, sequence: sequence },
                     { $push: { messages: msg } }, { upsert: true } )
} )

// Read my inbox (the two most recent buckets)
db.inbox.find( { owner: "Joe" } ).sort( { sequence: -1 } ).limit( 2 )


Bucketed fan out on write - Send

[Diagram: Shard 1, Shard 2, Shard 3; Send Message]


Bucketed fan out on write - Read

[Diagram: Shard 1, Shard 2, Shard 3; Read Inbox]


#2 – History


Design Goals

•  Need to retain a limited amount of history, e.g.
   –  Hours, Days, Weeks
   –  May be a legislative requirement (e.g. HIPAA, SOX, DPA)

•  Need to query efficiently by
   –  match
   –  ranges


3 Approaches (there are more)

•  Bucket by Number of messages

•  Fixed size Array

•  Bucket by Date + TTL Collections


Inbox – Bucket by # messages

db.inbox.find()
{ owner: "Joe", sequence: 25,
  messages: [ { from: "Joe", to: [ "Bob", "Jane" ], sent: ISODate("2013-03-01T09:59:42.689Z"), message: "Hi!" }, … ] }

// Query with a date range
db.inbox.find( { owner: "friend1", messages: { $elemMatch: { sent: { $gte: ISODate("2013-04-04…") } } } } )

// Remove elements based on a date (multi: true updates every bucket, not just the first)
db.inbox.update( { owner: "friend1" },
                 { $pull: { messages: { sent: { $gte: ISODate("2013-04-04…") } } } },
                 { multi: true } )


Considerations

•  Documents shrink as elements are removed; the space can be reclaimed with
   –  db.runCommand( { compact: '<collection>' } )

•  Remove the document once the last element in its array has been removed, e.g.
   –  { "_id" : …, "messages" : [ ], "owner" : "friend1", "sequence" : 0 }
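Both considerations can be sketched together, assuming the goal is to drop history older than a cutoff date: pull the old messages from every bucket, then delete buckets that are left empty. The plain JavaScript below uses a hypothetical helper `expireOlderThan` and sample data; it models the operations in memory and is not MongoDB code.

```javascript
// Plain JavaScript sketch: each bucket stands for one inbox document.
function expireOlderThan(buckets, cutoff) {
  for (const bucket of buckets) {
    // Like { $pull: { messages: { sent: { $lt: cutoff } } } } applied with { multi: true }
    bucket.messages = bucket.messages.filter((m) => m.sent >= cutoff);
  }
  // Like remove( { messages: { $size: 0 } } ): delete buckets left empty
  return buckets.filter((b) => b.messages.length > 0);
}

const buckets = [
  { owner: "friend1", sequence: 0, messages: [{ sent: new Date("2013-01-10T00:00:00Z") }] },
  { owner: "friend1", sequence: 1, messages: [{ sent: new Date("2013-03-01T00:00:00Z") }, { sent: new Date("2013-04-05T00:00:00Z") }] },
];

const kept = expireOlderThan(buckets, new Date("2013-03-15T00:00:00Z"));
console.log(kept.map((b) => [b.sequence, b.messages.length])); // [ [ 1, 1 ] ]
```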


Maintain the latest – Fixed Size Array

msg = { from: "Your Boss", to: [ "Bob" ], sent: new Date(), message: "CALL ME NOW!" }

// 2.4 introduces $each, $sort and $slice for $push:
// append, sort by "sent", keep only the last 50 elements
db.messages.update( { _id: 1 },
    { $push: { messages: { $each: [ msg ], $sort: { sent: 1 }, $slice: -50 } } } )
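What the `$push` modifiers do to the array can be modelled in a few lines: append the new elements, sort ascending by `sent`, then keep only the last N. The helper name `pushSortSlice` is hypothetical and this is an in-memory JavaScript sketch, not the server's implementation.

```javascript
// In-memory model of { $push: { messages: { $each: newMessages,
//   $sort: { sent: 1 }, $slice: -keepLast } } }
function pushSortSlice(messages, newMessages, keepLast) {
  const merged = messages.concat(newMessages);
  merged.sort((a, b) => a.sent - b.sent);
  // slice(-0) would return everything, while $slice: 0 empties the array
  return keepLast === 0 ? [] : merged.slice(-keepLast);
}

let history = [];
for (let day = 1; day <= 5; day++) {
  history = pushSortSlice(history, [{ sent: new Date(Date.UTC(2013, 2, day)), message: "day " + day }], 3);
}
console.log(history.map((m) => m.message)); // [ 'day 3', 'day 4', 'day 5' ]
```

Sorting before slicing is what makes the array a "latest N" window even if messages arrive out of order.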


Considerations

•  Need to compute the size of the array based on the retention period
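One way to turn a retention period into a `$slice` value is to multiply an expected message rate by the number of days, then check that the array still fits in a single document (MongoDB caps BSON documents at 16 MB). The rate and average message size are assumptions you would have to supply, and `sliceForRetention` is a hypothetical helper; this is a back-of-the-envelope sketch.

```javascript
// Hypothetical sizing helper for the fixed-size array approach.
const MAX_BSON_BYTES = 16 * 1024 * 1024; // MongoDB's maximum document size

function sliceForRetention(expectedMessagesPerDay, retentionDays, avgMessageBytes) {
  // Round up so a full retention period always fits
  const count = Math.ceil(expectedMessagesPerDay * retentionDays);
  if (count * avgMessageBytes > MAX_BSON_BYTES) {
    throw new Error("array would exceed the 16 MB document limit");
  }
  return -count; // $slice takes a negative number to keep the last N elements
}

console.log(sliceForRetention(7, 30, 200)); // -210
```

Since the array holds a fixed count rather than a fixed time span, a burst above the expected rate shortens the effective retention window.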


TTL Collections

// messages: one doc per user per day
db.inbox.findOne()
{ _id: 1, to: "Joe", sequence: ISODate("2013-02-04T00:00:00.392Z"), messages: [ ] }

// Auto expires data after 31536000 seconds = 1 year
// (the TTL index must be on the collection that holds the documents)
db.inbox.ensureIndex( { sequence: 1 }, { expireAfterSeconds: 31536000 } )
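The arithmetic behind the per-day TTL layout is shown below: truncate the arrival time to midnight UTC to get the bucket date, then add `expireAfterSeconds` to find when the document becomes eligible for deletion. The helpers `dayBucket` and `expiresAt` are hypothetical; this is a plain JavaScript sketch of the calculation, not a MongoDB API.

```javascript
// Sketch of the "one document per user per day" TTL layout.
const ONE_YEAR_SECONDS = 365 * 24 * 60 * 60; // 31536000, as on the slide

// Hypothetical helper: the "sequence" value for the day a message arrives
function dayBucket(date) {
  return new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
}

// Earliest time the TTL monitor (which runs every 60 seconds) may remove the bucket
function expiresAt(bucketDate, expireAfterSeconds) {
  return new Date(bucketDate.getTime() + expireAfterSeconds * 1000);
}

const bucket = dayBucket(new Date("2013-02-04T15:30:00Z"));
console.log(bucket.toISOString()); // 2013-02-04T00:00:00.000Z
console.log(expiresAt(bucket, ONE_YEAR_SECONDS).toISOString()); // 2014-02-04T00:00:00.000Z
```

Because a whole day's bucket expires at once, a message that arrives late in the day is kept for slightly less than the full period.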


#3 – Indexed Attributes


Design Goal

•  Application needs to store a variable number of attributes, e.g.
   –  User defined Form
   –  Meta Data tags

•  Queries needed
   –  Equality
   –  Range based

•  Need to be efficient, regardless of the number of attributes


2 Approaches (there are more)

•  Attributes as a Sub-Document

•  Attributes as Objects in an Array


Attributes as a Sub-Document

db.files.insert( { _id: "local.0", attr: { type: "text", size: 64, created: ISODate("2013-03-01T09:59:42.689Z") } } )
db.files.insert( { _id: "local.1", attr: { type: "text", size: 128 } } )
db.files.insert( { _id: "mongod", attr: { type: "binary", size: 256, created: ISODate("2013-04-01T18:13:42.689Z") } } )

// Need to create an index for each item in the sub-document
db.files.ensureIndex( { "attr.type": 1 } )
db.files.find( { "attr.type": "text" } )

// Can perform range queries
db.files.ensureIndex( { "attr.size": 1 } )
db.files.find( { "attr.size": { $gt: 64, $lte: 16384 } } )


Considerations

•  Each attribute needs an Index

•  Each time you extend, you add an index

•  Lots and lots of indexes
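To see how the index count tracks the number of distinct attribute names, the sketch below collects the `"attr.<name>"` index specs a set of documents would need. `indexSpecsFor` is a hypothetical helper written in plain JavaScript; each resulting spec is what you would pass to `ensureIndex`.

```javascript
// Sketch: with attributes as a sub-document, each distinct attribute name
// needs its own index on "attr.<name>".
function indexSpecsFor(docs) {
  const names = new Set();
  for (const doc of docs) {
    for (const name of Object.keys(doc.attr || {})) names.add(name);
  }
  return [...names].map((name) => ({ ["attr." + name]: 1 }));
}

const files = [
  { _id: "local.0", attr: { type: "text", size: 64, created: new Date("2013-03-01T09:59:42.689Z") } },
  { _id: "local.1", attr: { type: "text", size: 128 } },
  { _id: "mongod", attr: { type: "binary", size: 256, created: new Date("2013-04-01T18:13:42.689Z") } },
];

console.log(JSON.stringify(indexSpecsFor(files))); // [{"attr.type":1},{"attr.size":1},{"attr.created":1}]
```

Adding a new attribute name to any document adds one more entry to this list, which is the "lots and lots of indexes" problem.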


Attributes as Objects in Array

db.files.insert( { _id: "local.0", attr: [ { type: "text" }, { size: 64 }, { created: ISODate("2013-03-01T09:59:42.689Z") } ] } )
db.files.insert( { _id: "local.1", attr: [ { type: "text" }, { size: 128 } ] } )
db.files.insert( { _id: "mongod", attr: [ { type: "binary" }, { size: 256 }, { created: ISODate("2013-04-01T18:13:42.689Z") } ] } )

// One multikey index covers every attribute
db.files.ensureIndex( { attr: 1 } )


Queries

// Range queries
db.files.find( { attr: { $gt: { size: 64 }, $lte: { size: 16384 } } } )
db.files.find( { attr: { $gte: { created: ISODate("2013-02-01T00:00:01.689Z") } } } )

// Multiple conditions – only the first predicate in the query can use the index,
// so make sure it is the most selective.
// Index intersection will allow multiple indexes to be used, see SERVER-3071
db.files.find( { $and: [ { attr: { $gte: { created: ISODate("2013-02-01T…") } } },
                         { attr: { $gt: { size: 128 }, $lte: { size: 16384 } } } ] } )

// Each $or clause can use an index
db.files.find( { $or: [ { attr: { $gte: { created: ISODate("2013-02-01T…") } } },
                        { attr: { $gt: { size: 128 }, $lte: { size: 16384 } } } ] } )
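Why does a bound like `{ size: 64 }` select only size attributes? MongoDB orders embedded documents field by field, comparing the BSON type of each value first, then the field name, then the value; numbers sort below strings, which sort below dates. The plain JavaScript sketch below models only those three types and single-field documents, so it is a simplification rather than the server's comparator. It also tests each array element on its own, which is what `$elemMatch` guarantees; without `$elemMatch`, current MongoDB versions let different array elements satisfy the `$gt` and `$lte` bounds separately.

```javascript
// Simplified model of MongoDB's ordering for single-field embedded documents.
const TYPE_RANK = { number: 1, string: 2, date: 3 }; // numbers < strings < dates

function typeRank(value) {
  return value instanceof Date ? TYPE_RANK.date : TYPE_RANK[typeof value];
}

// Compare e.g. { size: 64 } with { type: "text" }: type, then name, then value
function compareAttr(a, b) {
  const [keyA, valA] = Object.entries(a)[0];
  const [keyB, valB] = Object.entries(b)[0];
  if (typeRank(valA) !== typeRank(valB)) return typeRank(valA) - typeRank(valB);
  if (keyA !== keyB) return keyA < keyB ? -1 : 1;
  if (valA < valB) return -1;
  if (valA > valB) return 1;
  return 0;
}

// $elemMatch semantics: one array element must satisfy both bounds
function elemInRange(attrs, gt, lte) {
  return attrs.some((el) => compareAttr(el, gt) > 0 && compareAttr(el, lte) <= 0);
}

const files = [
  { _id: "local.0", attr: [{ type: "text" }, { size: 64 }, { created: new Date("2013-03-01T09:59:42.689Z") }] },
  { _id: "local.1", attr: [{ type: "text" }, { size: 128 }] },
  { _id: "mongod", attr: [{ type: "binary" }, { size: 256 }, { created: new Date("2013-04-01T18:13:42.689Z") }] },
];

const hits = files.filter((f) => elemInRange(f.attr, { size: 64 }, { size: 16384 }));
console.log(hits.map((f) => f._id)); // [ 'local.1', 'mongod' ]
```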


#4 – Multiple Identities


Design Goal

•  Ability to look up by a number of different identities, e.g.
   –  Username
   –  Email address
   –  FB Handle
   –  LinkedIn URL


2 Approaches (there are more)

•  Identifiers in a single document

•  Separate Identifiers from Content


Single Document by User

db.users.findOne()
{ _id: "joe",
  email: "[email protected]",
  fb: "joe.smith",    // facebook
  li: "joe.e.smith",  // linkedin
  other: {…} }

// Shard collection by _id
sh.shardCollection( "mongodbdays.users", { _id: 1 } )

// Create indexes on each key
db.users.ensureIndex( { email: 1 } )
db.users.ensureIndex( { fb: 1 } )
db.users.ensureIndex( { li: 1 } )


Read by _id (shard key)

[Diagram: Shard 1, Shard 2, Shard 3; find( { _id: "joe" } )]


Read by email (non-shard key)

[Diagram: Shard 1, Shard 2, Shard 3; find( { email: "[email protected]" } )]


Considerations

•  Lookup by shard key is routed to 1 shard

•  Lookup by other identifier is scatter gathered across all shards

•  Secondary identifiers cannot have a unique index (on a sharded collection, only an index prefixed by the shard key can enforce uniqueness)
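The routing difference can be sketched with a toy router: a query that includes the shard key (`_id`) is sent to the one shard whose chunk range contains it, while a query on any other identifier must go to every shard. The chunk boundaries, shard names and the `targetShards` helper are hypothetical; this models the idea in plain JavaScript and is not how mongos is implemented.

```javascript
// Toy model of routing for a collection sharded on _id (range-based chunks).
const chunks = [
  { min: "", max: "h", shard: "shard1" },       // _id in ["", "h")
  { min: "h", max: "p", shard: "shard2" },      // _id in ["h", "p")
  { min: "p", max: "\uffff", shard: "shard3" }, // _id in ["p", "\uffff")
];

function targetShards(query) {
  if (typeof query._id === "string") {
    // Shard key present: route to the single chunk that owns this value
    const chunk = chunks.find((c) => query._id >= c.min && query._id < c.max);
    return [chunk.shard];
  }
  // No shard key: scatter-gather across every shard
  return chunks.map((c) => c.shard);
}

console.log(targetShards({ _id: "joe" }));               // [ 'shard2' ]
console.log(targetShards({ email: "joe@example.com" })); // [ 'shard1', 'shard2', 'shard3' ]
```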


Document per Identity

// Create unique index
db.identities.ensureIndex( { identifier: 1 }, { unique: true } )

// Create a document for each of the user's identifiers
db.identities.save( { identifier: { hndl: "joe" }, user: "1200-42" } )
db.identities.save( { identifier: { email: "[email protected]" }, user: "1200-42" } )
db.identities.save( { identifier: { li: "joe.e.smith" }, user: "1200-42" } )

// Shard collection by identifier
sh.shardCollection( "mongodbdays.identities", { identifier: 1 } )

// _id already has a unique index, so no extra index is needed
// Create a document that holds all the other user attributes
db.users.save( { _id: "1200-42", ... } )

// Shard collection by _id
sh.shardCollection( "mongodbdays.users", { _id: 1 } )


Read requires 2 reads

[Diagram: Shard 1, Shard 2, Shard 3; db.identities.find( { "identifier": { "hndl": "joe" } } ), then db.users.find( { _id: "1200-42" } )]


Solution

•  Lookup to Identities is a routed query

•  Lookup to Users is a routed query

•  Unique indexes available
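The two-read lookup and the uniqueness guarantee can be modelled with two maps: one keyed by identifier (playing the role of the `identities` collection with its unique index) and one keyed by user id (the `users` collection). Each lookup is a keyed read, which is why both would be routed queries when each collection is sharded on its lookup key. The helpers `addIdentity` and `findUser` are hypothetical; this is an in-memory JavaScript sketch.

```javascript
// In-memory sketch of "document per identity".
const identities = new Map(); // JSON of identifier -> user id
const users = new Map();      // user id -> user document

function addIdentity(identifier, userId) {
  const key = JSON.stringify(identifier);
  // Stands in for the unique index on "identifier"
  if (identities.has(key)) throw new Error("duplicate identifier " + key);
  identities.set(key, userId);
}

function findUser(identifier) {
  const userId = identities.get(JSON.stringify(identifier)); // read 1: identities
  if (userId === undefined) return null;
  return users.get(userId) || null;                          // read 2: users
}

users.set("1200-42", { _id: "1200-42" });
addIdentity({ hndl: "joe" }, "1200-42");
addIdentity({ li: "joe.e.smith" }, "1200-42");

console.log(findUser({ li: "joe.e.smith" })._id); // 1200-42
```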


Conclusion


Summary

•  Multiple ways to model a domain problem

•  Understand the key use cases of your app

•  Balance between ease of query vs. ease of write

•  Random IO should be avoided


Thank You

