mongouk 2011 schema design

Alvin Richards -‐ [email protected]

Schema Design

Topics

Introduction• Basic Data Modeling• Evolving a schema

Common patterns• Single table inheritance• One-to-Many & Many-to-Many• Trees• Queues

So why model data?

http://www.flickr.com/photos/42304632@N00/493639870/

A brief history of normalization• 1970 E.F.Codd introduces 1st Normal Form (1NF)• 1971 E.F.Codd introduces 2nd and 3rd Normal Form (2NF, 3NF)• 1974 Codd & Boyce define Boyce/Codd Normal Form (BCNF)• 2002 Date, Darween, Lorentzos define 6th Normal Form (6NF)

Goals:• Avoid anomalies when inserting, updating or deleting• Minimize redesign when extending the schema• Make the model informative to users• Avoid bias towards a particular style of query

* source : wikipedia

The real benefit of relational

• Before relational• Data and Logic combined

• After relational• Separation of concerns• Data modeled independent of logic• Logic freed from concerns of data design

• MongoDB continues this separation

Relational made normalized data look like this

Document databases make normalized data look like this

Terminology

RDBMS MongoDB

Table Collection

Row(s) JSON Document

Index Index

Join Embedding & Linking

Partition Shard

Partition Key Shard Key

So today’s example will use...

Design Session

Design documents that simply map to your application> post = {author: "Hergé", date: ISODate("2011-‐09-‐18T09:56:06.298Z"), text: "Destination Moon", tags: ["comic", "adventure"]}

> db.posts.save(post)

> db.posts.find()

{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"), author: "Hergé", date: ISODate("2011-‐09-‐18T09:56:06.298Z"), text: "Destination Moon", tags: [ "comic", "adventure" ] } Notes:• ID must be unique, but can be anything you’d like• MongoDB will generate a default ID if one is not supplied

Find the document

Secondary index for “author”

// 1 means ascending, -‐1 means descending

> db.posts.ensureIndex({author: 1})

> db.posts.find({author: 'Hergé'}) { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"), date: ISODate("2011-‐09-‐18T09:56:06.298Z"), author: "Hergé", ... }

Add and index, find via Index

Examine the query plan> db.blogs.find({author: 'Hergé'}).explain(){ "cursor" : "BtreeCursor author_1", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 5, "indexBounds" : { "author" : [ [ "Hergé", "Hergé" ] ] }}

Query operatorsConditional operators: $ne, $in, $nin, $mod, $all, $size, $exists, $type, .. $lt, $lte, $gt, $gte, $ne,

// find posts with any tags> db.posts.find({tags: {$exists: true}})



Regular expressions:// posts where author starts with h> db.posts.find({author: /^h/i })



Regular expressions:// posts where author starts with h> db.posts.find({author: /^h/i })

Counting: // number of posts written by Hergé> db.posts.find({author: “Hergé”}).count()

Extending the Schema new_comment = {author: “Kyle”, date: new Date(), text: “great book”}

> db.posts.update( {text: “Destination Moon” }, { ‘$push’: {comments: new_comment}, ‘$inc’: {comments_count: 1}})

{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "Hergé", date : ISODate("2011-‐09-‐18T09:56:06.298Z"), text : "Destination Moon", tags : [ "comic", "adventure" ], comments : [ { author : "Kyle", date : ISODate("2011-‐09-‐19T09:56:06.298Z"), text : "great book" } ], comments_count: 1 }

Extending the Schema

// create index on nested documents:> db.posts.ensureIndex({"comments.author": 1})

> db.posts.find({comments.author:”Kyle”})




// find last 5 posts:> db.posts.find().sort({date:-‐1}).limit(5)




// find last 5 posts:> db.posts.find().sort({date:-‐1}).limit(5)

// most commented post:> db.posts.find().sort({comments_count:-‐1}).limit(1)

When sorting, check if you need an index


Watch for full table scans

> db.blogs.find({text: 'Destination Moon'}).explain() { "cursor" : "BasicCursor", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 0, "indexBounds" : { }}

Group

• Equivalent to a Group By in SQL

• Specific the attributes to group the data

• Process the results in a Reduce function

Group - Count post by Authorcmd = { key: { "author":true }, initial: {count: 0}, reduce: function(obj, prev) { prev.count++; }, };result = db.posts.group(cmd);

[ { "author" : "Hergé", "count" : 1 }, { "author" : "Kyle", "count" : 3 }]

Common Patterns

Inheritance

Single Table Inheritance - RDBMS

shapes tableid type area radius d length width

1 circle 3.14 1

2 square 4 2

3 rect 10 5 2

Single Table Inheritance - MongoDB> db.shapes.find() { _id: "1", type: "circle",area: 3.14, radius: 1} { _id: "2", type: "square",area: 4, d: 2} { _id: "3", type: "rect", area: 10, length: 5, width: 2}


// find shapes where radius > 0 > db.shapes.find({radius: {$gt: 0}})


// find shapes where radius > 0 > db.shapes.find({radius: {$gt: 0}})

// create index> db.shapes.ensureIndex({radius: 1})

One to ManyOne to Many relationships can specify• degree of association between objects• containment• life-cycle

One to Many- Embedded Array / Array Keys - slice operator to return subset of array - some queries harder e.g find latest comments across all documents

blogs: { author : "Hergé", date : ISODate("2011-‐09-‐18T09:56:06.298Z"), comments : [ { author : "Kyle", date : ISODate("2011-‐09-‐19T09:56:06.298Z"), text : "great book" } ]}

One to Many- Embedded tree - Single document - Natural - Hard to query

blogs: { author : "Hergé", date : ISODate("2011-‐09-‐18T09:56:06.298Z"), comments : [ { author : "Kyle", date : ISODate("2011-‐09-‐19T09:56:06.298Z"), text : "great book", replies: [ { author : “James”, ...} ] } ]}

One to Many- Normalized (2 collections) - most flexible - more queries

blogs: { author : "Hergé", date : ISODate("2011-‐09-‐18T09:56:06.298Z"), comments : [ {comment : ObjectId(“1”)} ]}

comments : { _id : “1”, author : "James", date : ISODate("2011-‐09-‐19T09:56:06.298Z")}

One to Many - patterns

- Embedded Array / Array Keys

- Embedded Array / Array Keys- Embedded tree- Normalized

Many - ManyExample: - Product can be in many categories- Category can have many products

products: { _id: ObjectId("10"), name: "Destination Moon", category_ids: [ ObjectId("20"), ObjectId("30”]}

Many - Many

products: { _id: ObjectId("10"), name: "Destination Moon", category_ids: [ ObjectId("20"), ObjectId("30”]} categories: { _id: ObjectId("20"), name: "adventure", product_ids: [ ObjectId("10"), ObjectId("11"), ObjectId("12"]}

Many - Many

products: { _id: ObjectId("10"), name: "Destination Moon", category_ids: [ ObjectId("20"), ObjectId("30”]} categories: { _id: ObjectId("20"), name: "adventure", product_ids: [ ObjectId("10"), ObjectId("11"), ObjectId("12"]}

//All categories for a given product> db.categories.find({product_ids: ObjectId("10")})

Many - Many

products: { _id: ObjectId("10"), name: "Destination Moon", category_ids: [ ObjectId("20"), ObjectId("30”]} categories: { _id: ObjectId("20"), name: "adventure"}

Alternative


// All products for a given category> db.products.find({category_ids: ObjectId("20")})

Alternative


// All products for a given category> db.products.find({category_ids: ObjectId("20")})

// All categories for a given productproduct = db.products.find(_id : some_id)> db.categories.find({_id : {$in : product.category_ids}})

Alternative

TreesFull Tree in Document

{ comments: [ { author: “Kyle”, text: “...”, replies: [ {author: “James”, text: “...”, replies: []} ]} ]}

Pros: Single Document, Performance, Intuitive

Cons: Hard to search, Partial Results, 16MB limit

TreesParent Links- Each node is stored as a document- Contains the id of the parent

Child Links- Each node contains the id’s of the children- Can support graphs (multiple parents / child)

Array of Ancestors- Store all Ancestors of a node { _id: "a" } { _id: "b", ancestors: [ "a" ], parent: "a" } { _id: "c", ancestors: [ "a", "b" ], parent: "b" } { _id: "d", ancestors: [ "a", "b" ], parent: "b" } { _id: "e", ancestors: [ "a" ], parent: "a" } { _id: "f", ancestors: [ "a", "e" ], parent: "e" }


//find all descendants of b:

> db.tree2.find({ancestors: ‘b’})

//find all direct descendants of b:

> db.tree2.find({parent: ‘b’})


//find all descendants of b:

> db.tree2.find({ancestors: ‘b’})

//find all direct descendants of b:

> db.tree2.find({parent: ‘b’})

//find all ancestors of f:> ancestors = db.tree2.findOne({_id:’f’}).ancestors> db.tree2.find({_id: { $in : ancestors})

Trees as PathsStore hierarchy as a path expression- Separate each node by a delimiter, e.g. “/”- Use text search for find parts of a tree

{ comments: [ { author: “Kyle”, text: “initial post”, path: “/” }, { author: “Jim”, text: “jim’s comment”, path: “/jim” }, { author: “Kyle”, text: “Kyle’s reply to Jim”, path : “/jim/kyle”} ] }

// Find the conversations Jim was part of > db.posts.find({path: /^jim/i})

Queue• Need to maintain order and state• Ensure that updates to the queue are atomic

{ inprogress: false, priority: 1, ... }

Queue• Need to maintain order and state• Ensure that updates to the queue are atomic

{ inprogress: false, priority: 1, ... }

// find highest priority job and mark as in-‐progressjob = db.jobs.findAndModify({ query: {inprogress: false}, sort: {priority: -‐1}, update: {$set: {inprogress: true, started: new Date()}}, new: true})

Summary

Schema design is different in MongoDB

Basic data design principals stay the same

Focus on how the apps manipulates data

Rapidly evolve schema to meet your requirements

Enjoy your new freedom, use it wisely :-)

@mongodb

conferences, appearances, and meetupshttp://www.10gen.com/events

http://bit.ly/mongo1 Facebook | Twitter | LinkedIn

http://linkd.in/joinmongo

download at mongodb.org

[email protected]

mongouk 2011 schema design

Documents