schema tricks & tips

Technical Director, 10gen @jonnyeight [email protected] alvinonmongodb.com Alvin Richards #MongoDBdays Schema Design 4 Real World Use Cases

Author: alvin-john-richards

Post on 07-Nov-2014


DESCRIPTION

In this session, we'll examine schema design insights and trade-offs using real world examples. We'll look at three example applications: building an email inbox, selecting a shard key for a large scale web application, and using MongoDB to store user profiles. From these examples you should leave the session with an idea of the advantages and disadvantages of various approaches to modeling your data in MongoDB. Attendees should be well versed in basic schema design and familiar with concepts in the morning's basic schema design talk. No beginner topics will be covered in this session.

TRANSCRIPT

#MongoDBdays

Schema Design 4 Real World Use Cases
Alvin Richards
Technical Director, 10gen @jonnyeight [email protected] alvinonmongodb.com

One size fits all?

Agenda
  Why is schema design important
  4 Real World Schemas
    Inbox
    History
    Indexed Attributes
    Multiple Identities
  Conclusions

Single Table En

Why is Schema Design important?
  Largest factor for a performant system
  Schema design with MongoDB is different
    RDBMS: "What answers do I have?"
    MongoDB: "What question will I have?"

#1 - Message Inbox

Let's get Social

Sending Messages

?

Design Goals
  Efficiently send new messages to recipients
  Efficiently read inbox

Reading my Inbox

?

3 Approaches (there are more)
  Fan out on Read
  Fan out on Write
  Fan out on Write with Bucketing

Fan out on read
// Shard on "from"
sh.shardCollection( "mongodbdays.inbox", { from: 1 } )

// Make sure we have an index to handle inbox reads
db.inbox.ensureIndex( { to: 1, sent: 1 } )

msg = {
  from: "Joe",
  to: [ "Bob", "Jane" ],
  sent: new Date(),
  message: "Hi!"
}

// Send a message
db.inbox.save( msg )

// Read my inbox
db.inbox.find( { to: "Joe" } ).sort( { sent: -1 } )

Fan out on read: Send Message
[Diagram: Send Message across Shard 1, Shard 2, Shard 3]

Fan out on read: Read Inbox
[Diagram: Read Inbox across Shard 1, Shard 2, Shard 3]

Considerations
  1 document per message sent
  Multiple recipients in an array key
  Reading an inbox is finding all messages with my own name in the recipient field
  Requires scatter-gather on sharded cluster
  Then a lot of random IO on a shard to find everything

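Fan out on write
// Shard on recipient and sent
sh.shardCollection( "mongodbdays.inbox", { recipient: 1, sent: 1 } )

msg = {
  from: "Joe",
  to: [ "Bob", "Jane" ],
  sent: new Date(),
  message: "Hi!"
}

// Send a message: one copy per recipient.
// for...in yields array indexes, so look the name up in msg.to,
// and insert a fresh document each time so every copy gets its own _id
for ( i in msg.to ) {
  db.inbox.insert( { from: msg.from, to: msg.to, recipient: msg.to[i],
                     sent: msg.sent, message: msg.message } )
}

// Read my inbox
db.inbox.find( { recipient: "Joe" } ).sort( { sent: -1 } )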

Fan out on write: Send Message
[Diagram: Send Message across Shard 1, Shard 2, Shard 3]

Fan out on write: Read Inbox
[Diagram: Read Inbox across Shard 1, Shard 2, Shard 3]

Considerations
  1 document per recipient
  Reading my inbox is just finding all of the messages with me as the recipient
  Can shard on recipient, so inbox reads hit one shard
  But still lots of random IO on the shard
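The difference in read routing between the first two approaches can be sketched with a toy in-memory model. This is only an illustration under stated assumptions: the three arrays stand in for shards, and `shardFor` is a made-up hash rather than MongoDB's actual chunk routing; all function names are hypothetical.

```javascript
// Hypothetical in-memory model of a 3-shard cluster; not MongoDB's actual router.
const SHARD_COUNT = 3;

// Toy hash that maps a shard-key value to a shard number (assumption for illustration).
function shardFor(key) {
  let sum = 0;
  for (const ch of key) sum += ch.charCodeAt(0);
  return sum % SHARD_COUNT;
}

const newCluster = () => Array.from({ length: SHARD_COUNT }, () => []);

// Fan out on read: one document per message, placed by sender.
function sendFanOutOnRead(shards, msg) {
  shards[shardFor(msg.from)].push({ ...msg });
}

// Recipients are not the shard key, so a read has to ask every shard.
function readFanOutOnRead(shards, user) {
  let touched = 0;
  const inbox = [];
  for (const shard of shards) {
    touched += 1;
    inbox.push(...shard.filter((m) => m.to.includes(user)));
  }
  return { inbox, touched };
}

// Fan out on write: one document per recipient, placed by recipient.
function sendFanOutOnWrite(shards, msg) {
  for (const recipient of msg.to) {
    shards[shardFor(recipient)].push({ ...msg, recipient });
  }
}

// The read is routed to the single shard that owns the recipient.
function readFanOutOnWrite(shards, user) {
  const inbox = shards[shardFor(user)].filter((m) => m.recipient === user);
  return { inbox, touched: 1 };
}

const demoMsg = { from: "Joe", to: ["Bob", "Jane"], sent: new Date(), message: "Hi!" };

const readCluster = newCluster();
sendFanOutOnRead(readCluster, demoMsg);
const writeCluster = newCluster();
sendFanOutOnWrite(writeCluster, demoMsg);

console.log("fan out on read, shards consulted:", readFanOutOnRead(readCluster, "Bob").touched);
console.log("fan out on write, shards consulted:", readFanOutOnWrite(writeCluster, "Bob").touched);
```

The trade-off shows up in the document counts as well: the first model stores one document for the message, the second stores one per recipient.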

Fan out on write with buckets
  Each inbox document is an array of messages
  Append a message onto inbox of recipient
  Bucket inbox documents so there's not too many messages per document
  Can shard on recipient, so inbox reads hit one shard
  A few documents to read the whole inbox

Fan out on write with buckets
// Shard on owner / sequence
sh.shardCollection( "mongodbdays.inbox", { owner: 1, sequence: 1 } )
sh.shardCollection( "mongodbdays.users", { user_name: 1 } )

msg = {
  from: "Joe",
  to: [ "Bob", "Jane" ],
  sent: new Date(),
  message: "Hi!"
}

// Send a message
for ( recipient in msg.to ) {
  // Atomically increment the recipient's message counter and read the new value
  count = db.users.findAndModify({
    query: { user_name: msg.to[recipient] },
    update: { "$inc": { "msg_count": 1 } },
    upsert: true,
    new: true
  }).msg_count;

  // 50 messages per bucket document
  sequence = Math.floor(count / 50);

  // Append to the current bucket, creating it if needed
  db.inbox.update(
    { owner: msg.to[recipient], sequence: sequence },
    { $push: { "messages": msg } },
    { upsert: true }
  );
}

// Read my inbox: the two newest buckets
db.inbox.find( { owner: "Joe" } ).sort( { sequence: -1 } ).limit( 2 )
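To see how the counter and `Math.floor(count / 50)` spread messages over bucket documents, here is a minimal sketch in plain JavaScript. The two `Map` objects are hypothetical stand-ins for the users and inbox collections, and `deliver` and `readInbox` are illustrative names, not MongoDB calls.

```javascript
const BUCKET_SIZE = 50; // same divisor as Math.floor(count / 50) on the slide

// Hypothetical in-memory stand-ins for the two collections.
const msgCounts = new Map(); // user_name -> msg_count
const buckets = new Map();   // "owner/sequence" -> { owner, sequence, messages }

function deliver(owner, message) {
  // Mirrors findAndModify with $inc and new: true: bump the counter, use the new value.
  const count = (msgCounts.get(owner) || 0) + 1;
  msgCounts.set(owner, count);
  const sequence = Math.floor(count / BUCKET_SIZE);

  // Mirrors update with $push and upsert: true.
  const key = `${owner}/${sequence}`;
  if (!buckets.has(key)) buckets.set(key, { owner, sequence, messages: [] });
  buckets.get(key).messages.push(message);
  return sequence;
}

// Mirrors find({ owner }).sort({ sequence: -1 }).limit(limit).
function readInbox(owner, limit) {
  return [...buckets.values()]
    .filter((b) => b.owner === owner)
    .sort((a, b) => b.sequence - a.sequence)
    .slice(0, limit);
}

for (let i = 1; i <= 120; i++) {
  deliver("Bob", { from: "Joe", message: `msg ${i}` });
}
console.log(readInbox("Bob", 2).map((b) => [b.sequence, b.messages.length]));
```

Because the counter is incremented before the division, the first bucket (sequence 0) holds 49 messages and every later full bucket holds 50.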

Bucketed fan out on write - Send Message
[Diagram: Send Message across Shard 1, Shard 2, Shard 3]

Bucketed fan out on write - Read Inbox
[Diagram: Read Inbox across Shard 1, Shard 2, Shard 3]

#2 - History

Design Goals
  Need to retain a limited amount of history, e.g. hours, days, weeks
  May be a legislative requirement (e.g. HIPAA, SOX, DPA)
  Need to query efficiently by match, ranges

3 Approaches (there are more)
  Bucket by Number of messages
  Fixed size Array
  Bucket by Date + TTL Collections

Inbox: Bucket by # messages
db.inbox.find()
{ owner: "Joe", sequence: 25,
  messages: [
    { from: "Joe", to: [ "Bob", "Jane" ],
      sent: ISODate("2013-03-01T09:59:42.689Z"), message: "Hi!" },
  ] }

// Query with a date range
db.inbox.find( { owner: "friend1",
                 messages: { $elemMatch: { sent: { $gte: ISODate("2013-04-04") } } } } )

// Remove elements based on a date
db.inbox.update( { owner: "friend1" },
                 { $pull: { messages: { sent: { $gte: ISODate("2013-04-04") } } } } )
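With bucketed documents, a date query using $elemMatch returns whole buckets that contain at least one matching message, while $pull removes only the matching array elements and can leave an empty bucket behind. A minimal sketch of those semantics in plain JavaScript, assuming hypothetical in-memory documents in place of the inbox collection (the function names are illustrative, not MongoDB APIs):

```javascript
// Hypothetical bucket documents shaped like the inbox documents on the slide.
const inboxDocs = [
  { owner: "friend1", sequence: 0, messages: [
    { sent: new Date("2013-03-01T09:59:42.689Z"), message: "old" },
    { sent: new Date("2013-04-05T00:00:00Z"), message: "new" },
  ] },
  { owner: "friend1", sequence: 1, messages: [
    { sent: new Date("2013-04-06T00:00:00Z"), message: "newer" },
  ] },
];

// Like find with $elemMatch and $gte: returns whole buckets holding at least one match.
function findBucketsSince(docs, owner, cutoff) {
  return docs.filter(
    (d) => d.owner === owner && d.messages.some((m) => m.sent >= cutoff)
  );
}

// Like $pull with $gte. Assumption: applied to every bucket of the owner,
// whereas the shell update on the slide touches one document unless multi: true is passed.
function pullSince(docs, owner, cutoff) {
  for (const d of docs) {
    if (d.owner === owner) {
      d.messages = d.messages.filter((m) => m.sent < cutoff);
    }
  }
}

// Buckets whose array has been emptied can be deleted outright.
function dropEmptyBuckets(docs) {
  return docs.filter((d) => d.messages.length > 0);
}

const cutoff = new Date("2013-04-04T00:00:00Z");
console.log("buckets matched:", findBucketsSince(inboxDocs, "friend1", cutoff).length);
pullSince(inboxDocs, "friend1", cutoff);
console.log("buckets left after cleanup:", dropEmptyBuckets(inboxDocs).length);
```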

Considerations
  Shrinking documents, space can be reclaimed with
    db.runCommand ( { compact: '' } )
  Removing the document after the last element in the array has been removed
    { "_id" : , "messages" : [ ], "owner" : "friend1", "sequence" : 0 }

Maintain the latest: Fixed Size Array
msg = {
  from: "Your Boss",
  to: [ "Bob" ],
  sent: new Date(),
  message: "CALL ME NOW!"
}

// 2.4 introduces $each, $sort and $slice for $push
db.messages.update(
  { _id: 1 },
  { $push: { messages: { $each: [ msg ], $sort: { sent: 1 }, $slice: -50 } } }
)

Considerations
  Need to compute the size of the array based on retention period
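Sizing the array is simple arithmetic once a message rate is assumed. The sketch below is a rough illustration only: `sliceSizeFor` assumes a steady per-day rate, which is an assumption of this example and not something the talk specifies, and `pushSortSlice` emulates the effect of $push with $each, $sort and a negative $slice on a plain array.

```javascript
// Array size needed to cover a retention period at an assumed steady message rate.
function sliceSizeFor(messagesPerDay, retentionDays) {
  return Math.ceil(messagesPerDay * retentionDays);
}

// Emulates $push with $each, $sort: { sent: 1 } and $slice: -keep on a plain array:
// append, sort ascending by sent, then keep only the last `keep` elements.
function pushSortSlice(messages, newMessages, keep) {
  const merged = messages.concat(newMessages);
  merged.sort((a, b) => a.sent - b.sent);
  return keep > 0 ? merged.slice(-keep) : [];
}

const existing = [
  { sent: new Date("2013-03-01T10:00:00Z"), message: "first" },
  { sent: new Date("2013-03-01T11:00:00Z"), message: "second" },
];
const lateArrival = { sent: new Date("2013-03-01T09:00:00Z"), message: "late arrival" };

// With room for only 2 messages, the oldest one by sent date is dropped.
const updated = pushSortSlice(existing, [lateArrival], 2);
console.log(updated.map((m) => m.message));
console.log("slots for 10 messages/day kept 5 days:", sliceSizeFor(10, 5));
```

Because the sort runs before the slice, a message that arrives late but carries an old `sent` date can be discarded immediately.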

TTL Collections
// messages: one doc per user per day
db.messages.findOne()
{ _id: 1,
  to: "Joe",
  sequence: ISODate("2013-02-04T00:00:00.392Z"),
  messages: [ ] }

// Auto expires data after 31536000 seconds = 1 year
db.messages.ensureIndex( { sequence: 1 }, { expireAfterSeconds: 31536000 } )
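The 31536000 figure is 365 days expressed in seconds. A small sketch of that arithmetic, plus one way to derive a per-day bucket key and check expiry, is given below. The helper names are hypothetical, and `isExpired` only models the rule "indexed date plus TTL is in the past"; the server removes expired documents in a periodic background pass, so actual deletion is not instantaneous.

```javascript
const SECONDS_PER_DAY = 24 * 60 * 60;

// Converts a retention period in days to an expireAfterSeconds value.
function expireAfterSeconds(days) {
  return days * SECONDS_PER_DAY;
}

// Truncates a timestamp to midnight UTC, giving one bucket key per day.
function dayBucket(date) {
  return new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
}

// A document is eligible for expiry once its indexed date plus the TTL has passed.
function isExpired(indexedDate, ttlSeconds, now) {
  return indexedDate.getTime() + ttlSeconds * 1000 < now.getTime();
}

const oneYear = expireAfterSeconds(365);
console.log("one year in seconds:", oneYear);

const bucket = dayBucket(new Date("2013-02-04T09:59:42.689Z"));
console.log("bucket key:", bucket.toISOString());
console.log("expired by 2014-03-01:", isExpired(bucket, oneYear, new Date("2014-03-01T00:00:00Z")));
```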

#3 - Indexed Attributes

Design Goal
  Application needs to store a variable number of attributes, e.g.
    User-defined form
    Metadata tags
  Queries needed
    Equality
    Range based
  Need to be efficient, regardless of the number of attributes

2 Approaches (there are more)
  Attributes as a Sub-Document
  Attributes as Objects in an Array

Attributes as a Sub-Document
db.files.insert( { _id: "local.0",
                   attr: { type: "text", size: 64,
                           created: ISODate("2013-03-01T09:59:42.689Z") } } )
db.files.insert( { _id: "local.1",
                   attr: { type: "text", size: 128 } } )
db.files.insert( { _id: "mongod",
                   attr: { type: "binary", size: 256,
                           created: ISODate("2013-04-01T18:13:42.689Z") } } )

// Need to create an index for each item in the sub-document
db.files.ensureIndex( { "attr.type": 1 } )
db.files.find( { "attr.type": "text" } )

// Can perform range queries
db.files.ensureIndex( { "attr.size": 1 } )
db.files.find( { "attr.size": { $gt: 64, $lte: 16384 } } )

Considerations
  Each attribute needs an Index
  Each time you extend, you add an index
  Lots and lots of indexes

Attributes as Objects in Array
db.files.insert( { _id: "local.0",
                   attr: [ { type: "text" }, { size: 64 },
                           { created: ISODate("2013-03-01T09:59:42.689Z") } ] } )
db.files.insert( { _id: "local.1",
                   attr: [ { type: "text" }, { size: 128 } ] } )
db.files.insert( { _id: "mongod",
                   attr: [ { type: "binary" }, { size: 256 },
                           { created: ISODate("2013-04-01T18:13:42.689Z") } ] } )

// A single index covers every attribute in the array
db.files.ensureIndex( { attr: 1 } )
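Moving between the two layouts is mechanical: each key of the sub-document becomes its own single-key object in the array. A minimal sketch in plain JavaScript, with hypothetical helper names that are not part of any MongoDB API:

```javascript
// Sub-document form -> array-of-objects form: one single-key object per attribute.
function toAttrArray(attr) {
  return Object.entries(attr).map(([key, value]) => ({ [key]: value }));
}

// Array-of-objects form -> sub-document form (later duplicates of a key win).
function toAttrObject(attrArray) {
  return Object.assign({}, ...attrArray);
}

const subDocument = { type: "text", size: 64 };
const asArray = toAttrArray(subDocument);

console.log(JSON.stringify(asArray));
console.log(JSON.stringify(toAttrObject(asArray)));
```

In the array form a new attribute is just another array element, so the single index on `attr` keeps covering it without any new index being created.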

Queries
// Range queries
db.files.find( { attr: { $gt: { size: 64 }, $lte: { size: 16384 } } } )
db.files.find( { attr: { $gte: { created: ISODate("2013-02-01T00:00:01.689Z") } } } )

// Multiple conditions: only the first predicate in the query can use the index,
// so ensure that it is the most selective.
// Index intersection will allow multiple indexes, see SERVER-3071
db.files.find( { $and: [
  { attr: { $gte: { created: ISODate("2013-02-01") } } },
  { attr: { $gt: { size: 128 }, $lte: { size: 16384 } } }
] } )

// Each $or clause can use an index
db.files.find( { $or: [
  { attr: { $gte: { created: ISODate("2013-02-01") } } },
  { attr: { $gt: { size: 128 }, $lte: { size: 16384 } } }
] } )

#4 - Multiple Identities

Design Goal
  Ability to look up by a number of different identities, e.g.
    Username
    Email address
    FB Handle
    LinkedIn URL

2 Approaches (there are more)
  Identifiers in a single document
  Separate Identifiers from Content

Single Document by User
db.users.findOne()
{ _id: "joe",
  email: "[email protected]",
  fb: "joe.smith",    // facebook
  li: "joe.e.smith",  // linkedin
  other: { } }

// Shard collection by _id
sh.shardCollection( "mongodbdays.users", { _id: 1 } )

// Create indexes on each key
db.users.ensureIndex( { email: 1 } )
db.users.ensureIndex( { fb: 1 } )
db.users.ensureIndex( { li: 1 } )

Read by _id (shard key)
find( { _id: "joe" } )
[Diagram: Shard 1, Shard 2, Shard 3]

Read by email (non-shard key)
find( { email: "[email protected]" } )
[Diagram: Shard 1, Shard 2, Shard 3]

Considerations
  Lookup by shard key is routed to 1 shard
  Lookup by other identifier is scatter-gathered across all shards
  Secondary keys cannot have a unique index (on a sharded collection, a unique index must start with the shard key)

Document per Identity
// Create unique index
db.identities.ensureIndex( { identifier: 1 }, { unique: true } )

// Create a document for each identity of the users document
db.identities.save( { identifier: { hndl: "joe" }, user: "1200-42" } )
db.identities.save( { identifier: { email: "[email protected]" }, user: "1200-42" } )
db.identities.save( { identifier: { li: "joe.e.smith" }, user: "1200-42" } )

// Shard collection by identifier
sh.shardCollection( "mongodbdays.identities", { identifier: 1 } )

// Create unique index
db.users.ensureIndex( { _id: 1 }, { unique: true } )

// Create a document that holds all the other user attributes
db.users.save( { _id: "1200-42", ... } )

// Shard collection by _id
sh.shardCollection( "mongodbdays.users", { _id: 1 } )

Read requires 2 reads
db.identities.find( { "identifier": { "hndl": "joe" } } )
db.users.find( { _id: "1200-42" } )
[Diagram: Shard 1, Shard 2, Shard 3]

Solution
  Lookup to Identities is a routed query
  Lookup to Users is a routed query
  Unique indexes available
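The two-read pattern can be sketched with two in-memory maps. This is a hedged illustration only: the maps are hypothetical stand-ins for the identities and users collections, the duplicate check plays the role of the unique index, and the `name` field on the user document is invented for the example.

```javascript
// Hypothetical stand-ins for the identities and users collections.
const identities = new Map(); // serialized identifier -> user id
const users = new Map();      // user id -> user document

// Serializes a single-key identifier such as { hndl: "joe" } into a map key.
function identityKey(identifier) {
  const [kind, value] = Object.entries(identifier)[0];
  return `${kind}:${value}`;
}

// The duplicate check plays the role of the unique index on identifier.
function addIdentity(identifier, userId) {
  const key = identityKey(identifier);
  if (identities.has(key)) throw new Error(`duplicate identifier ${key}`);
  identities.set(key, userId);
}

// Read 1 resolves the identifier to a user id; read 2 fetches the user document.
function findUser(identifier) {
  const userId = identities.get(identityKey(identifier));
  if (userId === undefined) return null;
  return users.get(userId) || null;
}

users.set("1200-42", { _id: "1200-42", name: "Joe" }); // "name" is an invented field
addIdentity({ hndl: "joe" }, "1200-42");
addIdentity({ li: "joe.e.smith" }, "1200-42");

console.log(findUser({ li: "joe.e.smith" }));
console.log(findUser({ hndl: "nobody" }));
```

Each lookup is keyed by the value its collection is sharded on, which is why both reads can be routed to a single shard.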

Conclusion

Summary
  Multiple ways to model a domain problem
  Understand the key use cases of your app
  Balance between ease of query vs. ease of write
  Random IO should be avoided


Thank You
Alvin [email protected]
Technical Director, 10gen [email protected] alvinonmongodb.com