mongoboulder - schema design

Post on 02-Apr-2015


DESCRIPTION

Schema Design: Data as Documents

One of the challenges that comes with moving to MongoDB is figuring out how best to model your data. While most developers have internalized the rules of thumb for designing schemas for RDBMSs, these rules don't always apply to MongoDB. The simple fact that documents can represent rich, schema-free data structures means that we have a lot of viable alternatives to the standard, normalized, relational model. Not only that, MongoDB has several unique features, such as atomic updates and indexed array keys, that greatly influence the kinds of schemas that make sense. Understandably, this begets good questions:
• Are foreign keys permissible, or is it better to represent one-to-many relations within a single document?
• Are join tables necessary, or is there another technique for building out many-to-many relationships?
• What level of denormalization is appropriate?
• How do my data modeling decisions affect the efficiency of updates and queries?

In this session, we'll answer these questions and more, provide a number of data modeling rules of thumb, and discuss the tradeoffs of various data modeling strategies.

TRANSCRIPT

Schema Design

Alvin Richards

alvin@10gen.com

Topics

Introduction
• Basic Data Modeling
• Evolving a schema

Common patterns
• Single table inheritance
• One-to-Many & Many-to-Many
• Trees
• Queues

So why model data?

http://www.flickr.com/photos/42304632@N00/493639870/

A brief history of normalization
• 1970 E.F. Codd introduces 1st Normal Form (1NF)
• 1971 E.F. Codd introduces 2nd and 3rd Normal Form (2NF, 3NF)
• 1974 Codd & Boyce define Boyce/Codd Normal Form (BCNF)
• 2002 Date, Darwen, Lorentzos define 6th Normal Form (6NF)

Goals:
• Avoid anomalies when inserting, updating or deleting
• Minimize redesign when extending the schema
• Make the model informative to users
• Avoid bias towards a particular style of query

* source : wikipedia

The real benefit of relational

• Before relational
  • Data and Logic combined

• After relational
  • Separation of concerns
  • Data modeled independent of logic
  • Logic freed from concerns of data design

• MongoDB continues this separation

Relational made normalized data look like this

Document databases make normalized data look like this

Terminology

RDBMS            MongoDB
Table            Collection
Row(s)           JSON Document
Index            Index
Join             Embedding & Linking
Partition        Shard
Partition Key    Shard Key

DB Considerations

How can we manipulate this data?

• Dynamic Queries

• Secondary Indexes

• Atomic Updates

• Map Reduce

Considerations
• No Joins
• Document writes are atomic

Access Patterns ?

• Read / Write Ratio

• Types of updates

• Types of queries

• Data life-cycle

So today’s example will use...

Design Session

Design documents that simply map to your application

post = {author: "Hergé",
        date: new Date(),
        text: "Destination Moon",
        tags: ["comic", "adventure"]}

> db.posts.save(post)

> db.posts.find()

  { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
    author: "Hergé",
    date: "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
    text: "Destination Moon",
    tags: [ "comic", "adventure" ] }

Notes:
• ID must be unique, but can be anything you'd like
• MongoDB will generate a default ID if one is not supplied

Find the document

Secondary index for “author”

// 1 means ascending, -1 means descending

> db.posts.ensureIndex({author: 1})

> db.posts.find({author: 'Hergé'})
  { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
    date: "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
    author: "Hergé",
    ... }

Add an index, find via index

Verifying indexes exist

> db.system.indexes.find()

// Index on ID
  { name: "_id_",
    ns: "test.posts",
    key: { "_id" : 1 } }

// Index on author
  { _id: ObjectId("4c4ba6c5672c685e5e8aabf4"),
    ns: "test.posts",
    key: { "author" : 1 },
    name: "author_1" }

Examine the query plan

> db.posts.find({author: 'Hergé'}).explain()
{
  "cursor" : "BtreeCursor author_1",
  "nscanned" : 1,
  "nscannedObjects" : 1,
  "n" : 1,
  "millis" : 5,
  "indexBounds" : {
    "author" : [ [ "Hergé", "Hergé" ] ]
  }
}
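To see why the index keeps "nscanned" low, here is a minimal sketch in plain JavaScript (no MongoDB required). It models an index as a Map from author to document positions. The helpers `findByScan`, `buildIndex`, and `findByIndex` and the sample data are hypothetical, and a real B-tree index also keeps its keys ordered, which this sketch ignores.

```javascript
// In-memory stand-in for the posts collection (sample data is illustrative).
const posts = [
  { _id: 1, author: "Hergé", text: "Destination Moon" },
  { _id: 2, author: "Kyle", text: "MongoDB in Action" },
  { _id: 3, author: "Hergé", text: "Explorers on the Moon" },
];

// Full scan: every document is examined, as with a BasicCursor.
function findByScan(docs, author) {
  let scanned = 0;
  const hits = docs.filter((d) => {
    scanned++;
    return d.author === author;
  });
  return { hits, scanned };
}

// Hypothetical index: field value -> positions of matching documents.
function buildIndex(docs, field) {
  const index = new Map();
  docs.forEach((d, pos) => {
    if (!index.has(d[field])) index.set(d[field], []);
    index.get(d[field]).push(pos);
  });
  return index;
}

// Indexed lookup: only the matching entries are touched.
function findByIndex(docs, index, key) {
  const positions = index.get(key) || [];
  return { hits: positions.map((p) => docs[p]), scanned: positions.length };
}

const authorIndex = buildIndex(posts, "author");
const scanResult = findByScan(posts, "Kyle");
const indexResult = findByIndex(posts, authorIndex, "Kyle");
console.log(scanResult.scanned, indexResult.scanned); // 3 1
```

The cost of the index is paid on writes: every insert or update of `author` would also have to update the Map.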

Query operators

Conditional operators: $ne, $in, $nin, $mod, $all, $size, $exists, $type, ..., $lt, $lte, $gt, $gte

// find posts with any tags
> db.posts.find({tags: {$exists: true}})

Regular expressions:
// posts where author starts with h
> db.posts.find({author: /^h/i})

Counting:
// number of posts written by Hergé
> db.posts.find({author: "Hergé"}).count()
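The semantics of a few of these operators can be sketched in plain JavaScript. This is a toy matcher covering only `$exists`, `$gt`, `$lt`, `$ne`, `$in`, regular expressions, and equality. The functions `matchesField` and `find` are hypothetical and are not how the server evaluates queries.

```javascript
// Decide whether one field value satisfies one query condition.
function matchesField(value, cond) {
  if (cond instanceof RegExp) {
    return typeof value === "string" && cond.test(value);
  }
  if (cond !== null && typeof cond === "object") {
    return Object.entries(cond).every(([op, arg]) => {
      switch (op) {
        case "$exists": return (value !== undefined) === arg;
        case "$gt": return value > arg;
        case "$lt": return value < arg;
        case "$ne": return value !== arg;
        case "$in": return arg.includes(value);
        default: throw new Error("unsupported operator: " + op);
      }
    });
  }
  // Plain equality; an array field matches if it contains the value.
  return Array.isArray(value) ? value.includes(cond) : value === cond;
}

// A document matches when every field condition in the query holds.
function find(docs, query) {
  return docs.filter((d) =>
    Object.entries(query).every(([field, cond]) => matchesField(d[field], cond))
  );
}

// Illustrative sample data.
const posts = [
  { author: "Hergé", tags: ["comic", "adventure"], comments_count: 1 },
  { author: "Kyle", comments_count: 3 },
  { author: "hugo", tags: [] },
];

console.log(find(posts, { tags: { $exists: true } }).length); // 2
console.log(find(posts, { author: /^h/i }).length);           // 2
console.log(find(posts, { author: "Hergé" }).length);         // 1
```

Note that `{tags: {$exists: true}}` also matches the post whose `tags` array is empty: the operator tests for the presence of the field, not for a non-empty value.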

Extending the Schema

new_comment = {author: "Kyle",
               date: new Date(),
               text: "great book"}

> db.posts.update(
      {text: "Destination Moon"},
      {'$push': {comments: new_comment},
       '$inc':  {comments_count: 1}})

  { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
    author: "Hergé",
    date: "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
    text: "Destination Moon",
    tags: [ "comic", "adventure" ],
    comments: [
      { author: "Kyle",
        date: "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)",
        text: "great book" }
    ],
    comments_count: 1 }
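The update above adds two fields that the original document never had. A rough in-memory sketch of that behavior follows. The `applyUpdate` helper is hypothetical and handles only `$push` and `$inc`; on the server both modifications are applied to the document as a single atomic write.

```javascript
// Apply a subset of update operators to a plain object, creating
// missing fields on first use (no schema migration step).
function applyUpdate(doc, update) {
  for (const [field, value] of Object.entries(update.$push || {})) {
    if (!Array.isArray(doc[field])) doc[field] = [];
    doc[field].push(value);
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    doc[field] = (doc[field] || 0) + amount;
  }
  return doc;
}

const post = { author: "Hergé", text: "Destination Moon", tags: ["comic", "adventure"] };
const newComment = { author: "Kyle", date: new Date(), text: "great book" };

applyUpdate(post, { $push: { comments: newComment }, $inc: { comments_count: 1 } });
console.log(post.comments.length, post.comments_count); // 1 1
```

Keeping `comments_count` next to the array is a denormalization: it is redundant with `comments.length`, but it lets the "most commented post" query sort on a plain numeric field.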

Extending the Schema

// create index on nested documents:
> db.posts.ensureIndex({"comments.author": 1})

> db.posts.find({"comments.author": "Kyle"})

// find last 5 posts:
> db.posts.find().sort({date: -1}).limit(5)

// most commented post:
> db.posts.find().sort({comments_count: -1}).limit(1)

When sorting, check if you need an index

Extending the Schema

Watch for full table scans

> db.posts.find({text: 'Destination Moon'}).explain()
{
  "cursor" : "BasicCursor",
  "nscanned" : 1,
  "nscannedObjects" : 1,
  "n" : 1,
  "millis" : 0,
  "indexBounds" : { }
}

Map Reduce

Map reduce: count tags

mapFunc = function () {
    this.tags.forEach(function (z) { emit(z, {count: 1}); });
}

reduceFunc = function (k, v) {
    var total = 0;
    for (var i = 0; i < v.length; i++) {
        total += v[i].count;
    }
    return {count: total};
}

res = db.posts.mapReduce(mapFunc, reduceFunc)

> db[res.result].find()
  { _id: "comic", value: { count: 1 } }
  { _id: "adventure", value: { count: 1 } }
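The two functions can be exercised without a server. The sketch below supplies a stand-in `emit` and a tiny driver loop, both of which are assumptions of this example; the map and reduce bodies are the ones from the slide, and the sample posts are illustrative.

```javascript
// Illustrative input: two posts sharing one tag.
const posts = [
  { text: "Destination Moon", tags: ["comic", "adventure"] },
  { text: "Explorers on the Moon", tags: ["comic"] },
];

// Stand-in for the server's emit(): collect values per key.
const emitted = new Map();
function emit(key, value) {
  if (!emitted.has(key)) emitted.set(key, []);
  emitted.get(key).push(value);
}

// Map and reduce functions as on the slide.
const mapFunc = function () {
  this.tags.forEach(function (z) { emit(z, { count: 1 }); });
};

const reduceFunc = function (k, v) {
  var total = 0;
  for (var i = 0; i < v.length; i++) {
    total += v[i].count;
  }
  return { count: total };
};

// Driver: run map with each document bound to `this`, then reduce per key.
posts.forEach((doc) => mapFunc.call(doc));
const result = {};
for (const [key, values] of emitted) {
  result[key] = reduceFunc(key, values);
}
console.log(result); // { comic: { count: 2 }, adventure: { count: 1 } }
```

The reduce function returns a value of the same shape as the emitted values. That matters on a real server, where reduce can be called again on its own earlier output for the same key.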

   

Group

• Equivalent to a Group By in SQL

• Specify the attributes to group the data by

• Process the results in a Reduce function

Group - Count posts by Author

cmd = { key: { "author": true },
        initial: {count: 0},
        reduce: function(obj, prev) {
            prev.count++;
        }
      };
result = db.posts.group(cmd);

[ { "author" : "Hergé", "count" : 1 },
  { "author" : "Kyle", "count" : 3 } ]
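What the group command does with `key`, `initial`, and `reduce` can be approximated in a few lines of plain JavaScript. The `group` function below is a hypothetical stand-in, not the shell helper, and the four sample posts are made up to reproduce the counts shown above.

```javascript
// Group documents by the fields named in `key`, starting each group
// from a copy of `initial` and folding documents in with `reduce`.
function group(docs, { key, initial, reduce }) {
  const groups = new Map();
  for (const doc of docs) {
    const keyDoc = {};
    for (const field of Object.keys(key)) keyDoc[field] = doc[field];
    const id = JSON.stringify(keyDoc);
    if (!groups.has(id)) groups.set(id, { ...keyDoc, ...initial });
    reduce(doc, groups.get(id)); // mutates the accumulator in place
  }
  return [...groups.values()];
}

const posts = [
  { author: "Hergé", text: "Destination Moon" },
  { author: "Kyle", text: "post one" },
  { author: "Kyle", text: "post two" },
  { author: "Kyle", text: "post three" },
];

const counts = group(posts, {
  key: { author: true },
  initial: { count: 0 },
  reduce: function (obj, prev) { prev.count++; },
});
console.log(counts);
// [ { author: 'Hergé', count: 1 }, { author: 'Kyle', count: 3 } ]
```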

Review

So Far:
- Started out with a simple schema
- Queried Data
- Evolved the schema
- Queried / Updated the data some more

http://devilseve.blogspot.com/2010/06/like-drinking-from-fire-hose.html

Inheritance

Single Table Inheritance - RDBMS

shapes table

id   type     area   radius   d   length   width
1    circle   3.14   1
2    square   4               2
3    rect     10                  5        2

Single Table Inheritance - MongoDB

> db.shapes.find()
  { _id: "1", type: "circle", area: 3.14, radius: 1 }
  { _id: "2", type: "square", area: 4, d: 2 }
  { _id: "3", type: "rect", area: 10, length: 5, width: 2 }

// find shapes where radius > 0
> db.shapes.find({radius: {$gt: 0}})

// create index
> db.shapes.ensureIndex({radius: 1})
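On the application side, documents of mixed shape are usually told apart by a discriminator field such as `type`. A minimal sketch in plain JavaScript, using the three sample documents; the `describe` helper is hypothetical.

```javascript
const shapes = [
  { _id: "1", type: "circle", area: 3.14, radius: 1 },
  { _id: "2", type: "square", area: 4, d: 2 },
  { _id: "3", type: "rect", area: 10, length: 5, width: 2 },
];

// Dispatch on the discriminator; each branch reads only the fields
// that its own type carries.
function describe(shape) {
  switch (shape.type) {
    case "circle": return `circle r=${shape.radius}`;
    case "square": return `square d=${shape.d}`;
    case "rect": return `rect ${shape.length}x${shape.width}`;
    default: return `unknown type ${shape.type}`;
  }
}

// In-memory analogue of {radius: {$gt: 0}}: documents that lack
// the field never match.
const withRadius = shapes.filter((s) => s.radius > 0);

console.log(shapes.map(describe));
console.log(withRadius.map((s) => s._id)); // [ '1' ]
```

Unlike the RDBMS table, no document carries empty columns for attributes that belong to another type.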

One to Many

One to Many relationships can specify
• degree of association between objects
• containment
• life-cycle

One to Many
- Embedded Array / Array Keys
  - slice operator to return subset of array
  - some queries harder, e.g. find latest comments across all documents

blogs: {
    author: "Hergé",
    date: "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
    comments: [
      { author: "Kyle",
        date: "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)",
        text: "great book" }
    ]}

One to Many
- Embedded tree
  - Single document
  - Natural
  - Hard to query

blogs: {
    author: "Hergé",
    date: "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
    comments: [
      { author: "Kyle",
        date: "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)",
        text: "great book",
        replies: [ { author: "James", ... } ] }
    ]}

One to Many
- Normalized (2 collections)
  - most flexible
  - more queries

blogs: {
    author: "Hergé",
    date: "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
    comments: [
      {comment: ObjectId("1")}
    ]}

comments: { _id: ObjectId("1"),
            author: "James",
            date: "Sat Jul 24 2010 20:51:03 ..."}
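The tradeoff between embedding and normalizing shows up in how a cross-document question such as "latest comments anywhere" gets answered. The sketch below contrasts the two layouts in memory. All data, string ids, and function names are assumptions of this example, and the normalized posts hold a plain array of comment ids rather than the `{comment: ...}` subdocuments shown above.

```javascript
// Embedded layout: comments live inside each post.
const embeddedPosts = [
  { _id: "p1", comments: [
    { author: "Kyle", date: new Date("2010-07-24T20:51:03Z"), text: "great book" },
  ] },
  { _id: "p2", comments: [
    { author: "James", date: new Date("2010-07-25T09:00:00Z"), text: "even better" },
    { author: "Kyle", date: new Date("2010-07-23T08:00:00Z"), text: "first!" },
  ] },
];

// Every post has to be opened and its array flattened before sorting.
function latestEmbedded(posts, n) {
  return posts
    .flatMap((p) => p.comments)
    .sort((a, b) => b.date - a.date)
    .slice(0, n);
}

// Normalized layout: posts hold ids, comments are a separate collection.
const posts = [
  { _id: "p1", comments: ["c1"] },
  { _id: "p2", comments: ["c2", "c3"] },
];
const comments = [
  { _id: "c1", author: "Kyle", date: new Date("2010-07-24T20:51:03Z") },
  { _id: "c2", author: "James", date: new Date("2010-07-25T09:00:00Z") },
  { _id: "c3", author: "Kyle", date: new Date("2010-07-23T08:00:00Z") },
];

// Cross-post query is a sort over one collection...
function latestNormalized(n) {
  return [...comments].sort((a, b) => b.date - a.date).slice(0, n);
}

// ...but showing one post now needs a second lookup, like an $in query.
function commentsFor(post) {
  return comments.filter((c) => post.comments.includes(c._id));
}

console.log(latestEmbedded(embeddedPosts, 2).map((c) => c.author)); // [ 'James', 'Kyle' ]
console.log(latestNormalized(2).map((c) => c._id));                 // [ 'c2', 'c1' ]
console.log(commentsFor(posts[1]).length);                          // 2
```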

One to Many - patterns

- Embedded Array / Array Keys
- Embedded tree
- Normalized

Many - Many

Example:
- Product can be in many categories
- Category can have many products

products:
  { _id: ObjectId("10"),
    name: "Destination Moon",
    category_ids: [ ObjectId("20"),
                    ObjectId("30") ] }

categories:
  { _id: ObjectId("20"),
    name: "adventure",
    product_ids: [ ObjectId("10"),
                   ObjectId("11"),
                   ObjectId("12") ] }

// All categories for a given product
> db.categories.find({product_ids: ObjectId("10")})

Many - Many: Alternative

products:
  { _id: ObjectId("10"),
    name: "Destination Moon",
    category_ids: [ ObjectId("20"),
                    ObjectId("30") ] }

categories:
  { _id: ObjectId("20"),
    name: "adventure" }

// All products for a given category
> db.products.find({category_ids: ObjectId("20")})

// All categories for a given product
product = db.products.findOne({_id: some_id})
> db.categories.find({_id: {$in: product.category_ids}})
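With the references kept on one side only, one direction of the relationship is a single query and the other takes two steps. A minimal in-memory sketch follows; plain numbers stand in for ObjectIds, and the second product, the second category name, and the helper names are made up for illustration.

```javascript
// References are stored only on the product side.
const products = [
  { _id: 10, name: "Destination Moon", category_ids: [20, 30] },
  { _id: 11, name: "Explorers on the Moon", category_ids: [20] },
];
const categories = [
  { _id: 20, name: "adventure" },
  { _id: 30, name: "comic" },
];

// One step, like db.products.find({category_ids: id}).
function productsIn(categoryId) {
  return products.filter((p) => p.category_ids.includes(categoryId));
}

// Two steps: load the product, then fetch categories with an $in-style match.
function categoriesFor(productId) {
  const product = products.find((p) => p._id === productId);
  if (!product) return [];
  return categories.filter((c) => product.category_ids.includes(c._id));
}

console.log(productsIn(20).map((p) => p.name));
console.log(categoriesFor(10).map((c) => c.name)); // [ 'adventure', 'comic' ]
```

The gain over the two-sided version is that adding a product to a category touches one document instead of two, so the two id arrays cannot drift apart.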

Trees

Full Tree in Document

{ comments: [
    { author: "Kyle", text: "...",
      replies: [
        { author: "James", text: "...",
          replies: [] }
      ] }
  ] }

Pros: Single Document, Performance, Intuitive

Cons: Hard to search, Partial Results, 4MB document size limit (raised to 16MB in later MongoDB releases)

   

Trees

Parent Links
- Each node is stored as a document
- Contains the id of the parent

Child Links
- Each node contains the ids of the children
- Can support graphs (multiple parents / children)

Array of Ancestors
- Store all Ancestors of a node

  { _id: "a" }
  { _id: "b", ancestors: [ "a" ], parent: "a" }
  { _id: "c", ancestors: [ "a", "b" ], parent: "b" }
  { _id: "d", ancestors: [ "a", "b" ], parent: "b" }
  { _id: "e", ancestors: [ "a" ], parent: "a" }
  { _id: "f", ancestors: [ "a", "e" ], parent: "e" }

// find all descendants of b:
> db.tree2.find({ancestors: 'b'})

// find all direct descendants of b:
> db.tree2.find({parent: 'b'})

// find all ancestors of f:
> ancestors = db.tree2.findOne({_id: 'f'}).ancestors
> db.tree2.find({_id: {$in: ancestors}})
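The three queries, plus the bookkeeping an insert has to do, can be tried in memory. This is a sketch over the six sample nodes; the helper names and the extra node "g" are hypothetical.

```javascript
const tree = [
  { _id: "a" },
  { _id: "b", ancestors: ["a"], parent: "a" },
  { _id: "c", ancestors: ["a", "b"], parent: "b" },
  { _id: "d", ancestors: ["a", "b"], parent: "b" },
  { _id: "e", ancestors: ["a"], parent: "a" },
  { _id: "f", ancestors: ["a", "e"], parent: "e" },
];

// Like find({ancestors: id}): array membership test on every node.
const descendantsOf = (id) =>
  tree.filter((n) => (n.ancestors || []).includes(id)).map((n) => n._id);

// Like find({parent: id}).
const childrenOf = (id) =>
  tree.filter((n) => n.parent === id).map((n) => n._id);

// Ancestors are read straight off the node, no traversal needed.
const ancestorsOf = (id) => {
  const node = tree.find((n) => n._id === id);
  return node ? node.ancestors || [] : [];
};

// The price: a new node must copy its parent's ancestor list.
function addChild(id, parentId) {
  const parent = tree.find((n) => n._id === parentId);
  if (!parent) throw new Error("unknown parent: " + parentId);
  const node = {
    _id: id,
    parent: parentId,
    ancestors: [...(parent.ancestors || []), parentId],
  };
  tree.push(node);
  return node;
}

console.log(childrenOf("a"));  // [ 'b', 'e' ]
console.log(ancestorsOf("f")); // [ 'a', 'e' ]
addChild("g", "c");
console.log(descendantsOf("b")); // [ 'c', 'd', 'g' ]
```

Moving a subtree is the expensive operation in this layout, since every node below the moved one needs its `ancestors` array rewritten.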

Trees as Paths

Store hierarchy as a path expression
- Separate each node by a delimiter, e.g. "/"
- Use text search to find parts of a tree

{ comments: [
    { author: "Kyle", text: "initial post",
      path: "/" },
    { author: "Jim", text: "jim's comment",
      path: "/jim" },
    { author: "Kyle", text: "Kyle's reply to Jim",
      path: "/jim/kyle" } ] }

// Find the conversations Jim was part of
> db.posts.find({"comments.path": /^\/jim/i})
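A bare prefix match on the path can also catch unrelated nodes whose names share the prefix. The sketch below anchors the match at a delimiter boundary. The `subtree` and `escapeRegExp` helpers and the extra "/jimmy" comment are assumptions of this example, and it is case-sensitive, unlike the query above.

```javascript
const comments = [
  { author: "Kyle", text: "initial post", path: "/" },
  { author: "Jim", text: "jim's comment", path: "/jim" },
  { author: "Kyle", text: "Kyle's reply to Jim", path: "/jim/kyle" },
  { author: "Jimmy", text: "unrelated thread", path: "/jimmy" },
];

// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Match the node itself or anything below it, but not siblings that
// merely share a name prefix (e.g. "/jimmy").
function subtree(path) {
  const re = new RegExp("^" + escapeRegExp(path) + "(/|$)");
  return comments.filter((c) => re.test(c.path));
}

console.log(subtree("/jim").map((c) => c.path)); // [ '/jim', '/jim/kyle' ]
```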

Queue
• Need to maintain order and state
• Ensure that updates to the queue are atomic

  { inprogress: false,
    priority: 1,
    ... }

// find highest priority job and mark as in-progress
job = db.jobs.findAndModify({
        query:  {inprogress: false},
        sort:   {priority: -1},
        update: {$set: {inprogress: true,
                        started: new Date()}},
        new: true})
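The reason for doing the select and the update in one operation is that two workers must never claim the same job. The in-memory sketch below imitates that contract. The `claimNextJob` function and the sample jobs are hypothetical; because the function runs synchronously on one thread, nothing can interleave between picking the job and marking it, which is the guarantee findAndModify provides on the server.

```javascript
const jobs = [
  { _id: 1, inprogress: false, priority: 1 },
  { _id: 2, inprogress: false, priority: 3 },
  { _id: 3, inprogress: false, priority: 2 },
];

// Pick the highest-priority job that is not in progress and mark it,
// returning the updated job (like new: true), or null if none is left.
function claimNextJob(queue, now = new Date()) {
  const candidates = queue.filter((j) => !j.inprogress);
  if (candidates.length === 0) return null;
  const job = candidates.reduce((best, j) => (j.priority > best.priority ? j : best));
  job.inprogress = true;
  job.started = now;
  return job;
}

const first = claimNextJob(jobs);
const second = claimNextJob(jobs);
console.log(first._id, second._id); // 2 3
```

Doing the same thing as a separate find followed by an update would leave a window in which a second worker could read the same job before the first one marks it.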

Remember me?

http://devilseve.blogspot.com/2010/06/like-drinking-from-fire-hose.html

Summary

Schema design is different in MongoDB

Basic data design principles stay the same

Focus on how the app manipulates data

Rapidly evolve schema to meet your requirements

Enjoy your new freedom, use it wisely :-)

@mongodb

conferences, appearances, and meetups

http://www.10gen.com/events

http://bit.ly/mongo>  Facebook                    |                  Twitter                  |                  LinkedIn

http://linkd.in/joinmongo

download at mongodb.org

We're Hiring!

alvin@10gen.com

Competition!

1 - Tweet a picture of Mongo Boulder before 3pm

2 - Include the #mongoboulder hashtag

3 - You must be following @mongodb or @10gen

4 - Winner announced during the roadmap session gets free copy of MongoDB in Action and t-shirt
