Advanced Schema Design Patterns

FEBRUARY 15, 2018 | BELL HARBOR #MDBlocal Advanced Schema Design Patterns

Upload: mongodb

Post on 16-Mar-2018


TRANSCRIPT

Page 1: Advanced Schema Design Patterns

FEBRUARY 15, 2018 | BELL HARBOR

#MDBlocal

Advanced Schema Design Patterns

Page 2: Advanced Schema Design Patterns

Who Am I?

{
  "name": "Daniel Coupal",
  "jobs_at_MongoDB": [
    { "job": "Senior Curriculum Engineer", "from": new Date("2016-11") },
    { "job": "Senior Technical Service Engineer", "from": new Date("2013-11") }
  ],
  "previous_jobs": [
    "Consultant",
    "Developer",
    "Manager Quality & Tools Team",
    "Manager Software Team",
    "Tools Developer"
  ],
  "likes": [ "food", "beers", "movies", "MongoDB" ],
  "email": "[email protected]"
}

Page 3: Advanced Schema Design Patterns

The "Gang of Four":

A design pattern systematically names, explains, and evaluates an important and recurring design in object-oriented systems.

MongoDB systems can also be built using their own patterns.

Page 4: Advanced Schema Design Patterns

Why this Talk?

• 10 years with the document model
• Use of a common methodology and vocabulary when designing schemas for MongoDB
• Ability to model schemas using building blocks
• Less art and more methodology

Page 5: Advanced Schema Design Patterns

Why do we Create Models?

Ensure:
• Good performance
• Scalability

despite constraints:
• Hardware
  • RAM faster than Disk
  • Disk cheaper than RAM
  • Network latency
  • Reduce costs $$$
• Database Server
  • Maximum size for a document
  • Atomicity of a write
• Data set
  • Size of data

Page 6: Advanced Schema Design Patterns

However, Don't Over-Design!

Page 7: Advanced Schema Design Patterns

WMDB - World Movie Database

Any events, characters and entities depicted in this presentation are fictional. Any resemblance or similarity to reality is entirely coincidental.

Page 8: Advanced Schema Design Patterns

WMDB - World Movie Database

First iteration, 3 collections:
A. movies
B. moviegoers
C. screenings

Page 9: Advanced Schema Design Patterns

Mission Possible

Our mission, should we decide to accept it, is to fix this solution, so it can perform well and scale.

As always, should I or anyone in the audience do it without training, WMDB will disavow any knowledge of our actions.

This tape will self-destruct in five seconds. Good luck!

Page 10: Advanced Schema Design Patterns

Page 11: Advanced Schema Design Patterns

Patterns by Category

• Frequency of Access
  • Subset ✔️
  • Approximation ✔️
  • Extended Reference
• Grouping
  • Computed ✔️
  • Bucket
  • Outlier
• Representation
  • Attribute ✔️
  • Schema Versioning ✔️
  • Document Versioning
  • Tree
  • Polymorphism
  • Pre-Allocation

Page 12: Advanced Schema Design Patterns

Issue #1: Big Documents, Many Fields and Many Indexes

{
  title: "Dunkirk",
  ...
  release_USA: "2017/07/23",
  release_Mexico: "2017/08/01",
  release_France: "2017/08/01",
  release_Festival_San_Jose: "2017/07/22"
}

Would need the following indexes:

{ release_USA: 1 }
{ release_Mexico: 1 }
{ release_France: 1 }
...
{ release_Festival_San_Jose: 1 }
...

Page 13: Advanced Schema Design Patterns

Pattern #1: Attribute

{
  title: "Dunkirk",
  ...
  release_USA: "2017/07/23",
  release_Mexico: "2017/08/01",
  release_France: "2017/08/01",
  release_Festival_San_Jose: "2017/07/22"
}

Page 14: Advanced Schema Design Patterns

Attribute Pattern

Problem:
• Lots of similar fields
• Common characteristic to search across those fields together
• Fields present in only a small subset of documents

Use cases:
• Product attributes like ‘color’, ‘size’, ‘dimensions’, ...
• Release dates of a movie in different countries, festivals

Page 15: Advanced Schema Design Patterns

Attribute Pattern - Solution

Solution:
• Field pairs in an array

Benefits:
• Allows for a non-deterministic list of attributes
• Easy to index:
  { "releases.location": 1, "releases.date": 1 }
• Easy to extend with a qualifier, for example:
  { descriptor: "price", qualifier: "euros", value: NumberDecimal("100.00") }
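As a rough illustration of "field pairs in an array", the sketch below reshapes the Dunkirk document from Issue #1 in plain JavaScript. The helper name `toAttributePattern` is hypothetical, and the sub-field names `location` and `date` are assumed from the index shown on the slide; the deck itself does not show the reshaped document.

```javascript
// Hypothetical helper: move every `release_<location>` field into a
// `releases` array of { location, date } pairs (field names assumed from
// the index { "releases.location": 1, "releases.date": 1 }).
function toAttributePattern(movie) {
  const prefix = "release_";
  const rest = {};
  const releases = [];
  for (const [key, value] of Object.entries(movie)) {
    if (key.startsWith(prefix)) {
      releases.push({ location: key.slice(prefix.length), date: value });
    } else {
      rest[key] = value;
    }
  }
  return { ...rest, releases };
}

const dunkirk = {
  title: "Dunkirk",
  release_USA: "2017/07/23",
  release_Mexico: "2017/08/01",
  release_France: "2017/08/01",
  release_Festival_San_Jose: "2017/07/22",
};

const reshaped = toAttributePattern(dunkirk);
console.log(JSON.stringify(reshaped, null, 2));

// In the mongo shell, a single compound index would then cover all release
// locations instead of one index per field, for example:
//   db.movies.createIndex({ "releases.location": 1, "releases.date": 1 })
```

Adding a new country or festival then means pushing one more array element, with no new field and no new index.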

Page 16: Advanced Schema Design Patterns

Issue #2: Working Set doesn’t fit in RAM

Possible solutions:
A. Reduce the size of your working set
B. Add more RAM per machine
C. Start sharding or add more shards

Page 17: Advanced Schema Design Patterns

WMDB - World Movie Database

First iteration, 3 collections:
A. movies
B. moviegoers
C. screenings

Page 18: Advanced Schema Design Patterns

Pattern #2: Subset

In this example, we can:
• Limit the list of actors and crew to 20
• Limit the embedded reviews to the top 20
• …

Page 19: Advanced Schema Design Patterns

Subset Pattern

Problem:
• There is a 1-N or N-N relationship, and only a few documents always need to be shown
• You only infrequently need to pull all of the dependent documents

Use cases:
• Main actors of a movie
• List of reviews or comments

Page 20: Advanced Schema Design Patterns

Subset Pattern - Solution

Solution:
• Keep duplicates of a small subset of fields in the main collection

Benefits:
• Allows for fast data retrieval and a reduced working set size
• One query brings all the information needed for the "main page"
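A minimal sketch of the subset idea in plain JavaScript, assuming the full reviews live in their own collection and "top" means highest rating. The helper `embedTopReviews` and the field names `movie_id`, `reviewer`, `rating`, `text` and `top_reviews` are all hypothetical; the deck does not specify them.

```javascript
// Hypothetical helper: copy a small subset of fields from the top `limit`
// reviews into the movie document. The complete reviews stay in their own
// collection and are only fetched when a user asks for all of them.
function embedTopReviews(movie, allReviews, limit = 20) {
  const top = allReviews
    .filter((review) => review.movie_id === movie._id) // filter() copies, so sort() below does not touch the input
    .sort((a, b) => b.rating - a.rating)               // assumption: "top" means highest rating
    .slice(0, limit)
    .map(({ reviewer, rating }) => ({ reviewer, rating })); // duplicate only what the main page shows
  return { ...movie, top_reviews: top };
}

// Example data: 25 reviews for one movie, each with a long `text` field
// that the main page does not need.
const sampleMovie = { _id: 1, title: "Dunkirk" };
const sampleReviews = Array.from({ length: 25 }, (_, i) => ({
  movie_id: 1,
  reviewer: `user${i}`,
  rating: i % 10,
  text: "long review body...",
}));

const moviePage = embedTopReviews(sampleMovie, sampleReviews);
console.log(moviePage.top_reviews.length);
```

The trade-off is duplication: the embedded copies need a refresh policy, which is the consistency question raised later in the deck.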

Page 21: Advanced Schema Design Patterns

Quiz A: Subset Pattern

Question:
• Which new MongoDB 3.6 feature will allow me to notify an application if the name of an actor is changed?

Page 22: Advanced Schema Design Patterns

Issue #3: Lots of CPU Usage

• CPU is on fire!

Page 23: Advanced Schema Design Patterns

Issue #3: ... caused by repeated calculations

{
  title: "The Shape of Water",
  ...
  viewings: 5000,
  viewers: 385000,
  revenues: 5074800
}

Page 24: Advanced Schema Design Patterns

Pattern #3: Computed

For example:
• Apply a sum, count, ...
• Roll up data by minute, hour, day
• As long as you don’t mess with your source, you can recreate the rollups

Page 25: Advanced Schema Design Patterns

Computed Pattern

Problem:
• There is data that needs to be computed
• The same calculations would happen over and over
• Reads outnumber writes:
  • Example: 1K writes per hour vs 1M reads per hour

Use cases:
• Have revenues per movie showing, want to display sums
• Time series data, Event Sourcing

Page 26: Advanced Schema Design Patterns

Computed Pattern - Solution

Solution:
• Apply a computation or operation on data and store the result

Benefits:
• Avoid re-computing the same thing over and over
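One way to picture "store the result" is to fold each new screening into the totals at write time, so reads never have to sum the screenings. The sketch below does this on plain JavaScript objects; the helper names and the screening fields `viewers` and `revenue` are assumptions, while `viewings`, `viewers` and `revenues` come from the document shown under Issue #3.

```javascript
// Hypothetical write-path helper: fold one screening into the stored totals.
// In the mongo shell the same effect could be had with an $inc update, e.g.
//   db.movies.updateOne({ _id: id }, { $inc: { viewings: 1, viewers: n, revenues: r } })
function applyScreening(movie, screening) {
  return {
    ...movie,
    viewings: (movie.viewings || 0) + 1,
    viewers: (movie.viewers || 0) + screening.viewers,
    revenues: (movie.revenues || 0) + screening.revenue,
  };
}

// As long as the source screenings are untouched, the rollup can always be
// rebuilt from scratch.
function recomputeTotals(movie, screenings) {
  const zeroed = { ...movie, viewings: 0, viewers: 0, revenues: 0 };
  return screenings.reduce(applyScreening, zeroed);
}

const screenings = [
  { viewers: 100, revenue: 1200 },
  { viewers: 50, revenue: 600 },
];

const totals = recomputeTotals({ title: "The Shape of Water" }, screenings);
console.log(totals);
```

With roughly 1K writes against 1M reads per hour, paying a small extra cost on each write is usually cheaper than repeating the sum on every read.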

Page 27: Advanced Schema Design Patterns

Quiz B: Computed Pattern

Question:
• Which Relational Database feature is typically used to mimic the computed pattern?

Page 28: Advanced Schema Design Patterns

Issue #4: Lots of Writes

Page 29: Advanced Schema Design Patterns

Issue #4: … for non-critical data

Page 30: Advanced Schema Design Patterns

Pattern #4: Approximation

• Only increment once in X iterations
• Increment by X

Page 31: Advanced Schema Design Patterns

Page 32: Advanced Schema Design Patterns

Approximation Pattern

Problem:
• Data is difficult to calculate correctly
• May be too expensive to update the document every time to keep an exact count
• No one gives a damn if the number is exact

Use cases:
• Population of a country
• Web site visits

Page 33: Advanced Schema Design Patterns

Approximation Pattern - Solution

Solution:
• Fewer, stronger writes

Benefits:
• Fewer writes, reducing contention on some documents
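"Only increment once in X iterations" and "increment by X" can be combined into a counter that writes on average once per X events and adds X each time. The sketch below is one possible reading, not necessarily what the talk demonstrated: `makeApproximateCounter` is a hypothetical name, the in-memory `stored` variable stands in for a field in a document, and the random source is injectable only to make the behaviour easy to check.

```javascript
// Hypothetical approximate counter: each event has a 1-in-x chance of
// triggering a write, and each write adds x. Over many events the stored
// value stays close to the true count with roughly x times fewer writes.
function makeApproximateCounter(x, random = Math.random) {
  let stored = 0; // stands in for the counter field in the document
  let writes = 0; // number of "database writes" actually performed
  return {
    record() {
      if (random() < 1 / x) {
        stored += x; // the "stronger" write
        writes += 1;
      }
    },
    get value() {
      return stored;
    },
    get writes() {
      return writes;
    },
  };
}

// Example: record 10,000 page visits with x = 100.
const visits = makeApproximateCounter(100);
for (let i = 0; i < 10000; i++) {
  visits.record();
}
console.log(`approximate count: ${visits.value}, writes: ${visits.writes}`);
```

The result varies from run to run, which is acceptable only for the kind of non-critical data this pattern targets.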

Page 34: Advanced Schema Design Patterns

Issue #5: Need to change the list of fields in the documents

• Keeping track of the schema version of a document

Page 35: Advanced Schema Design Patterns

Pattern #5: Schema Versioning

Add a field to track the schema version number, per document

Does not have to exist for version 1

Page 36: Advanced Schema Design Patterns

Schema Versioning Pattern

Problem:
• Updating the schema of a database is:
  • Not atomic
  • Long operation
• May not want to update all documents, only do it on updates

Use cases:
• Practically any database that will go to production

Page 37: Advanced Schema Design Patterns

Schema Versioning Pattern - Solution

Solution:
• Have a field keeping track of the schema version

Benefits:
• Don't need to update all the documents at once
• May not have to update documents until their next modification
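A small sketch of how an application might use such a field to upgrade documents lazily, on read or just before the next update. Everything here is illustrative: the field name `schema_version`, the helper `upgradeMovie`, and the choice of migration (version 1 uses `release_<location>` fields, version 2 uses a `releases` array) are assumptions, not details from the talk.

```javascript
const CURRENT_VERSION = 2;

// Hypothetical lazy migration. Version 1 documents carry no version field,
// so a missing `schema_version` is treated as 1.
function upgradeMovie(doc) {
  const version = doc.schema_version ?? 1;
  if (version >= CURRENT_VERSION) {
    return doc; // already current, nothing to do
  }

  const upgraded = { schema_version: CURRENT_VERSION, releases: [] };
  for (const [key, value] of Object.entries(doc)) {
    if (key.startsWith("release_")) {
      // Assumed v2 shape: one { location, date } pair per old field.
      upgraded.releases.push({ location: key.slice("release_".length), date: value });
    } else if (key !== "schema_version") {
      upgraded[key] = value;
    }
  }
  return upgraded;
}

const oldDoc = { title: "Dunkirk", release_USA: "2017/07/23" };
const newDoc = upgradeMovie(oldDoc);
console.log(newDoc);
```

Because the application can read both shapes, old and new documents can coexist in the same collection while the migration proceeds at its own pace.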

Page 38: Advanced Schema Design Patterns

BACK to reality

Page 39: Advanced Schema Design Patterns

Aspect of Patterns: Consistency

• How duplication is handled
  A. Update both source and target in real time
  B. Update target from source at regular intervals. Examples:
     • Most popular items => update nightly
     • Revenues from a movie => update every hour
     • Last 10 reviews => update hourly? daily?

Page 40: Advanced Schema Design Patterns

What our Patterns did for us

Problem                          Pattern
Messy and Large Documents        Attribute
Too much RAM                     Subset
Too much CPU                     Computed
Too many disk accesses           Approximation
No downtime to upgrade schema    Schema Versioning

Page 41: Advanced Schema Design Patterns

Other Patterns

• Bucket
  • Grouping documents together, to have fewer documents
• Document Versioning
  • Tracking of content changes in a document
• Outlier
  • Avoid letting a few documents drive the design and impact performance for all
• Extended Reference
• Tree(s)
• Polymorphism
• Pre-allocation

Page 42: Advanced Schema Design Patterns

Takeaways

A. Simple grouping from tables to collections is not optimal
B. Learn a common vocabulary for designing schemas with MongoDB
C. Use patterns as "plug-and-play" to improve performance

Page 43: Advanced Schema Design Patterns

References for complete Solutions

A full design example for a given problem:
• E-commerce site
• Content Management System
• Social Networking
• Single view
• …

Page 44: Advanced Schema Design Patterns

How Can I Learn More About Schema Design?

• More patterns in a follow-up to this presentation
• MongoDB in-person training courses on Schema Design
• Upcoming online course at MongoDB University:
  • https://university.mongodb.com
  • Data Modeling

Page 45: Advanced Schema Design Patterns

Quiz C: Which Pattern is Used?

Question:
• Which Pattern is used in the following document?

{
  "name": "Daniel Coupal",
  "jobs_at_MongoDB": [
    { "job": "Senior Curriculum Engineer", "from": new Date("2016-11") },
    { "job": "Senior Technical Service Engineer", "from": new Date("2013-11") }
  ],
  "previous_jobs": [
    "Consultant",
    "Developer",
    "Manager Quality & Tools Team",
    "Manager Software Team",
    "Tools Developer"
  ],
  "likes": [ "food", "beers", "movies", "MongoDB" ],
  "email": "[email protected]"
}

Page 46: Advanced Schema Design Patterns

Thank You for Using MongoDB!