Tales from the Field
TRANSCRIPT
Adam Comerford, Senior Solutions Engineer, MongoDB
@comerford #MongoDBLondon
Tales from the Field
Or:
● Cautionary Tales
● Don’t solve the wrong problems
● Bad schemas hurt ops too
● etc.
● Are (mostly) true, and (mostly) actually happened
● Names have been changed to protect the (mostly) innocent
● No animals were harmed during the making of this presentation
○ Perhaps a few DBAs and engineers had light emotional scarring
● Some of the people that inspired the stories may well be here today at MongoDB London
The Stories
Story #1: Bill the Bulk Updater
● Bill built a system that tracked status information for entities in his business domain
● State changes for this system happened in batches:
○ Sometimes 10% of entities get updated
○ Sometimes 100% get updated
● Essentially, lots of random updates
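To make the workload concrete, here is an illustrative sketch (not Bill's actual code) of what a batch of status updates looks like, modelled on the updateOne entries a MongoDB bulkWrite takes. The entity IDs and the "status" field are hypothetical, and the in-memory Map stands in for the collection; a real system would send the ops to mongod via a driver.

```javascript
// Build bulkWrite-style updateOne operations for the entities whose
// state changed in this batch (field names are hypothetical).
function buildStatusOps(changedEntities) {
  return changedEntities.map(({ id, status }) => ({
    updateOne: {
      filter: { _id: id },
      update: { $set: { status: status } },
    },
  }));
}

// In-memory stand-in for the collection, to show the effect of a batch:
// each op finds its document by _id and applies the $set fields.
function applyOps(collection, ops) {
  for (const { updateOne: { filter, update } } of ops) {
    const doc = collection.get(filter._id);
    if (doc) Object.assign(doc, update.$set);
  }
}
```

Because any subset of entities (10% or 100%) can appear in a batch, the resulting writes land all over the keyspace: lots of random I/O.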
Bill’s Initial Architecture
Application / mongos → mongod
What about production?
● Bill’s system was a success!
● The product grew, and the number of entities increased by a factor of 5
● Not a problem - add more shards!
Bill’s Eventual Architecture
Application / mongos → mongod, mongod, … (16 more shards) …
Linear Scaling
● Bill’s cluster scaled linearly, as intended
● But, Bill’s TCO scaled linearly too
● More growth was forecast
Large Cluster, Large Expense
● Entity growth predicted at 10x
● Rough calculations called for ~200 shards
● Linear scaling of cost
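The forecast arithmetic can be sketched as below. The talk gives the 10x growth prediction and the ~200-shard result; the current shard count here is an assumption backed out from those two figures.

```javascript
// Hypothetical numbers illustrating linear scaling: if each shard
// handles a roughly fixed number of entities, 10x the entities means
// 10x the shards, and per-shard cost is roughly constant, so 10x TCO.
const currentShards = 20;        // assumed current cluster size
const growthFactor = 10;         // predicted entity growth (from the talk)
const projectedShards = currentShards * growthFactor;  // ~200 shards
```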
What problem did Bill overlook?
● Horizontal Scaling = Linear Scaling
● Not necessarily the most efficient option
The “Golden Hammer” Tendency
What did we recommend?
● Scale the random I/O vertically, not horizontally
● Sometimes a combination of vertical & horizontal scaling is the best approach
Bill’s Final Architecture
Application / mongos → mongod (SSD)
Story #2: Gary the Game Developer
● Gary was launching a AAA game title
● MongoDB would provide the backend for the players’ online experience
● Launched worldwide, same day, midnight launches
Complex Cloud Deployment
● Deploying in the cloud, but on very beefy instances
● 32 vCPU, 244 GiB RAM, 8 x SSD
● A single mongod was unable to stress these instances
● Hence “Micro-Sharding” was required to get the most out of the instances
Micro-What?
Micro-Sharding is the practice of deploying multiple relatively small (hence “micro”) shards on large hosts, to better take advantage of available resources that are difficult to utilise with a single mongod instance.
For example, 9 shards evenly distributed across 3 hosts:

Host1: Primary 1–3, Secondary 4–9
Host2: Secondary 1–3, Primary 4–6, Secondary 7–9
Host3: Secondary 1–6, Primary 7–9
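The layout rule behind the slide can be sketched as below: each of N shards is a 3-member replica set spread across the hosts, and the host holding the primary rotates in blocks so primaries end up evenly distributed. The function and its host/shard counts are illustrative, not a real deployment tool.

```javascript
// Assign each shard's replica-set members to hosts, rotating which
// host holds the primary so that primaries are spread evenly.
function microShardLayout(numShards, numHosts) {
  const hosts = Array.from({ length: numHosts }, () => []);
  const block = Math.ceil(numShards / numHosts);
  for (let s = 0; s < numShards; s++) {
    const primaryHost = Math.floor(s / block) % numHosts;
    for (let h = 0; h < numHosts; h++) {
      hosts[h].push({
        shard: s + 1,
        role: h === primaryHost ? "Primary" : "Secondary",
      });
    }
  }
  return hosts;
}
```

With 9 shards on 3 hosts this reproduces the slide: each host runs 3 primaries and 6 secondaries.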
Extensive Pre-Production Testing
● Load tested
● Failover and backups tested
● Procedures and architecture reviewed
● Basically, lots of testing/reviewing was done (all passed)
However…

The production layout of mongod processes was actually 8 shards on 3 hosts:

Host1: Primary 1–3, Secondary 4–8
Host2: Secondary 1–3, Primary 4–6, Secondary 7–8
Host3: Secondary 1–6, Primary 7–8

This layout caused a problem in production. But it was tested and had no issues, right?
Almost: the backup process was tested, and load was tested, but not together…
The Backup Process
Backups took place on a single host (Host2 in the layout above).
The databases were locked, then an LVM snapshot was taken, then the lock was released.
This was almost instantaneous in pre-production testing (no load); not so in production.
Backup Under Load
Once load was introduced to the equation, the snapshots were no longer instantaneous. On the host taking the backup, this essentially caused the primaries to become unresponsive without failing over.
Which eventually caused a cascading failure, bringing the whole cluster down.
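A back-of-the-envelope sketch of the cascade, with entirely hypothetical numbers: while the lock is held for a slow snapshot, writes to the primaries on the backup host block, requests pile up waiting for replies, and once the pile-up exceeds what the hosts can service, the failure spreads to the rest of the cluster.

```javascript
// Model the backlog that builds while the snapshot holds the lock.
// Returns how many requests queued up and whether that exceeds the
// cluster's capacity to absorb them (all figures illustrative).
function snapshotStall(stallSeconds, newRequestsPerSecond, capacity) {
  const queued = stallSeconds * newRequestsPerSecond;
  return { queued, cascades: queued > capacity };
}
```

In unloaded pre-production testing the stall was near zero, so nothing queued; under production load, even a one-minute stall builds a backlog no host can absorb.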
What did we recommend?
New process layout proposed; backups still taken on Host2:

Host1: Primary 1–4, Secondary 5–8
Host2: Secondary 1–8
Host3: Secondary 1–4, Primary 5–8

The database lock was no longer necessary, because an LVM snapshot gives a point-in-time copy, so it was removed.
Also, some limits were put on max connections, just in case.
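A connection limit can be set in the mongod configuration file; the fragment below is a sketch with a placeholder value, not the figure from the engagement.

```yaml
# mongod configuration fragment (illustrative value)
net:
  maxIncomingConnections: 20000
```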
Summary

No one single cause:
● Small issue with deployment layout
● Small error with backup process
● Lack of integration in the testing plan
● Relatively new system
● Some bad luck
Led to:
● Large outage, slow and cautious recovery
Story #3: Rita the Retailer
Rita the Retailer had an ecommerce site, selling diverse goods in 20+ countries.
Product Catalog: Original Schema

{
  _id: 375,
  en_US : { name : ..., description : ..., <etc...> },
  en_GB : { name : ..., description : ..., <etc...> },
  fr_FR : { name : ..., description : ..., <etc...> },
  de_DE : ...,
  de_CH : ...,
  <... and so on for other locales... >
}
What’s good about this schema?
● Each document contains all the data about a given product, across all languages/locales
● Very efficient way to retrieve the English, French, German, etc. translations of a single product’s information in one query
However…
That is not how the product data is actually used
(except perhaps by translation staff)
Dominant Query Pattern

db.catalog.find( { _id : 375 }, { en_US : true } );
db.catalog.find( { _id : 375 }, { fr_FR : true } );
db.catalog.find( { _id : 375 }, { de_DE : true } );
... and so forth for other locales ...
Which means…
The Product Catalog’s data model did not fit the way the
data was accessed.
Consequences
● Each document contained ~20x more data than any common use case needed
● MongoDB lets you request just a subset of a document’s contents (using a projection), but…
o Typically the whole document will get loaded into RAM to serve the request
● There are other overheads for reading from disk into memory (like readahead)
Therefore…
Less than 5% of data loaded into RAM from disk is actually required at the time - highly inefficient
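Where the "less than 5%" figure comes from, in one line of arithmetic (the locale count is taken from the "20+ countries" in the story):

```javascript
// With 20+ locales embedded per document and exactly one locale needed
// per query, at most 1/20 of each loaded document is useful - before
// readahead waste is even counted.
const localesPerDocument = 20;
const usefulFraction = 1 / localesPerDocument;  // 0.05, i.e. at most 5%
```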
Visualising the problem

Documents are laid out contiguously on disk, and each read pulls in a whole document plus readahead padding around it:

{ _id: 42, en_US : {...}, en_GB : {...}, fr_FR : {...}, <other locales...> }
<READAHEAD OVERHEAD>
{ _id: 709, en_US : {...}, en_GB : {...}, fr_FR : {...}, <other locales...> }
<READAHEAD OVERHEAD>
{ _id: 3600, en_US : {...}, en_GB : {...}, fr_FR : {...}, <other locales...> }

- Only the single requested locale in each document is loaded into RAM and used.
- The rest of each document takes up memory but is not required.
- Readahead padding makes things even more inefficient.
More RAM? It’s not that simple
What did we recommend?
● Design for your use case, your dominant query pattern
○ In this case: 99.99% of queries want the product data for exactly one locale at a time
○ Hence, alter the schema appropriately
● Eliminate inefficiencies on the system
○ Make reading from disk less wasteful, maximise I/O capabilities: reduce readahead settings
Schema: Before & After

Schema Before (embedded):
{
  _id: 375,
  en_US : { name : ..., description : ..., <etc...> },
  en_GB : { name : ..., description : ..., <etc...> },
  fr_FR : { name : ..., description : ..., <etc...> },
  <... and so on for other locales... >
}

Query Before:
db.catalog.find( { _id : 375 }, { en_US : true } );
db.catalog.find( { _id : 375 }, { fr_FR : true } );
db.catalog.find( { _id : 375 }, { de_DE : true } );

Schema After (document per locale):
{ _id: "375-en_US", name : ..., description : ..., <etc...> }
{ _id: "375-en_GB", name : ..., description : ..., <etc...> }
{ _id: "375-fr_FR", name : ..., description : ..., <etc...> }
... and so on for other locales ...

Query After:
db.catalog.find( { _id : "375-en_US" } );
db.catalog.find( { _id : "375-fr_FR" } );
db.catalog.find( { _id : "375-de_DE" } );
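The migration from the embedded schema to the per-locale schema can be sketched as below; the helper name and document shapes are ours, not from the talk. One multi-locale document becomes one document per locale, keyed "<productId>-<locale>".

```javascript
// Split one embedded multi-locale catalog document into per-locale
// documents whose _id combines the product id and the locale code.
function splitByLocale(product) {
  const { _id, ...locales } = product;
  return Object.entries(locales).map(([locale, fields]) => ({
    _id: `${_id}-${locale}`,
    ...fields,
  }));
}
```

Each resulting document is small and self-contained, so the dominant query ("one product, one locale") loads only what it needs.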
Consequences of Changes
● Queries induced minimal overhead
● More than 20x as many distinct products fit in memory at once
● Disk I/O utilisation reduced
● UI latency decreased
● Happier customers
● Profit (well, we hope)
Conclusions
● MongoDB can be used for a wide range of (sometimes pretty cool) use cases
● A small problem can seem much bigger when it happens in production
● We are here to help - if you hit a problem, it’s likely you are not the first to hit it
● We can provide a fresh perspective, advice based on experience to prevent and solve issues
Adam Comerford, Senior Solutions Engineer, MongoDB
@comerford #MongoDBLondon
Questions?
Further Reading for Retail/Catalogs
● Antoine Girbal (my teammate) has produced a full reference architecture for this type of application
o Blog part 1: http://tmblr.co/ZiOADx1RRsAWe
o Blog part 2: http://tmblr.co/ZiOADx1LfVmfm
● Detailed presentations and talks from MongoDB World:
o http://www.mongodb.com/presentations/retail-reference-architecture-part-1-flexible-searchable-low-latency-product-catalog
o http://www.mongodb.com/presentations/retail-reference-architecture-part-2-real-time-geo-distributed-inventory
o http://www.mongodb.com/presentations/retail-reference-architecture-part-3-scalable-insight-component-providing-user-history