Tales from the Field
TRANSCRIPT
Adam Comerford, Senior Solutions Engineer, MongoDB
@comerford #MongoDBLondon
Tales from the Field
Or:
● Cautionary Tales
● Don’t solve the wrong problems
● Bad schemas hurt ops too
● etc.
● Are (mostly) true, and (mostly) actually happened
● Names have been changed to protect the (mostly) innocent
● No animals were harmed during the making of this presentation
○ Perhaps a few DBAs and engineers had light emotional scarring
● Some of the people that inspired the stories may well be here today at MongoDB London
The Stories
Story #1: Bill the Bulk Updater
● Bill built a system that tracked status information for entities in his business domain
● State changes for this system happened in batches:
○ Sometimes 10% of entities get updated
○ Sometimes 100% get updated
● Essentially, lots of random updates
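To make the workload concrete, here is an illustrative sketch (not Bill's actual code) of what a batch of status updates looks like, modelled on the updateOne entries a MongoDB bulkWrite takes. The entity IDs and the "status" field are hypothetical, and the in-memory Map stands in for the collection; a real system would send the ops to mongod via a driver.

```javascript
// Build bulkWrite-style updateOne operations for the entities whose
// state changed in this batch (field names are hypothetical).
function buildStatusOps(changedEntities) {
  return changedEntities.map(({ id, status }) => ({
    updateOne: {
      filter: { _id: id },
      update: { $set: { status: status } },
    },
  }));
}

// In-memory stand-in for the collection, to show the effect of a batch:
// each op finds its document by _id and applies the $set fields.
function applyOps(collection, ops) {
  for (const { updateOne: { filter, update } } of ops) {
    const doc = collection.get(filter._id);
    if (doc) Object.assign(doc, update.$set);
  }
}
```

Because any subset of entities (10% or 100%) can appear in a batch, the resulting writes land all over the keyspace: lots of random I/O.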
Bill’s Initial Architecture
Application / mongos → mongod
What about production?
● Bill’s system was a success!
● The product grew, and the number of entities increased by a factor of 5
● Not a problem - add more shards!
Bill’s Eventual Architecture
Application / mongos → mongod, mongod, … (16 more shards) …
Linear Scaling
● Bill’s cluster scaled linearly, as intended
● But, Bill’s TCO scaled linearly too
● More growth was forecast
Large Cluster, Large Expense
● Entity growth predicted at 10x
● Rough calculations called for ~200 shards
● Linear scaling of cost
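The forecast arithmetic can be sketched as below. The talk gives the 10x growth prediction and the ~200-shard result; the current shard count here is an assumption backed out from those two figures.

```javascript
// Hypothetical numbers illustrating linear scaling: if each shard
// handles a roughly fixed number of entities, 10x the entities means
// 10x the shards, and per-shard cost is roughly constant, so 10x TCO.
const currentShards = 20;        // assumed current cluster size
const growthFactor = 10;         // predicted entity growth (from the talk)
const projectedShards = currentShards * growthFactor;  // ~200 shards
```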
What problem did Bill overlook?
● Horizontal Scaling = Linear Scaling
● Not necessarily the most efficient option
The “Golden Hammer” Tendency
What did we recommend?
● Scale the random I/O vertically, not horizontally
● Sometimes a combination of vertical & horizontal scaling is the best approach
Bill’s Final Architecture
Application / mongos → mongod (SSD)
Story #2: Gary the Game Developer
● Gary was launching a AAA game title
● MongoDB would provide the backend for the players’ online experience
● Launched worldwide, same day, midnight launches
Complex Cloud Deployment
● Deploying in the cloud, but on very beefy instances
● 32 vCPU, 244 GiB RAM, 8 x SSD
● A single mongod was unable to stress these instances
● Hence “Micro-Sharding” was required to get the most out of the instances
Micro-What?
Micro-Sharding is the practice of deploying multiple relatively small (hence “micro”) shards on large hosts, to better take advantage of available resources that are difficult to utilise with a single mongod instance.
For example, 9 shards evenly distributed across 3 hosts:

Host1: Primary 1–3, Secondary 4–9
Host2: Secondary 1–3, Primary 4–6, Secondary 7–9
Host3: Secondary 1–6, Primary 7–9
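The layout rule behind the slide can be sketched as below: each of N shards is a 3-member replica set spread across the hosts, and the host holding the primary rotates in blocks so primaries end up evenly distributed. The function and its host/shard counts are illustrative, not a real deployment tool.

```javascript
// Assign each shard's replica-set members to hosts, rotating which
// host holds the primary so that primaries are spread evenly.
function microShardLayout(numShards, numHosts) {
  const hosts = Array.from({ length: numHosts }, () => []);
  const block = Math.ceil(numShards / numHosts);
  for (let s = 0; s < numShards; s++) {
    const primaryHost = Math.floor(s / block) % numHosts;
    for (let h = 0; h < numHosts; h++) {
      hosts[h].push({
        shard: s + 1,
        role: h === primaryHost ? "Primary" : "Secondary",
      });
    }
  }
  return hosts;
}
```

With 9 shards on 3 hosts this reproduces the slide: each host runs 3 primaries and 6 secondaries.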
Extensive Pre-Production Testing
● Load tested
● Failover and backups tested
● Procedures and architecture reviewed
● Basically, lots of testing/reviewing was done (all passed)
However…

The production layout of mongod processes was actually 8 shards on 3 hosts:

Host1: Primary 1–3, Secondary 4–8
Host2: Secondary 1–3, Primary 4–6, Secondary 7–8
Host3: Secondary 1–6, Primary 7–8

This layout caused a problem in production. But it was tested and had no issues, right?
Almost: the backup process was tested, and load was tested, but not together…
The Backup Process
Backups took place on a single host (Host2 in the layout above).
The databases were locked, then an LVM snapshot was taken, then the lock was released.
This was almost instantaneous in pre-production testing (no load); not so in production.
Backup Under Load
Once load was introduced to the equation, the snapshots were no longer instantaneous. On the host taking the backup, this essentially caused the primaries to become unresponsive without failing over.
Which eventually caused a cascading failure, bringing the whole cluster down.
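A back-of-the-envelope sketch of the cascade, with entirely hypothetical numbers: while the lock is held for a slow snapshot, writes to the primaries on the backup host block, requests pile up waiting for replies, and once the pile-up exceeds what the hosts can service, the failure spreads to the rest of the cluster.

```javascript
// Model the backlog that builds while the snapshot holds the lock.
// Returns how many requests queued up and whether that exceeds the
// cluster's capacity to absorb them (all figures illustrative).
function snapshotStall(stallSeconds, newRequestsPerSecond, capacity) {
  const queued = stallSeconds * newRequestsPerSecond;
  return { queued, cascades: queued > capacity };
}
```

In unloaded pre-production testing the stall was near zero, so nothing queued; under production load, even a one-minute stall builds a backlog no host can absorb.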
What did we recommend?
New process layout proposed; backups still taken on Host2:

Host1: Primary 1–4, Secondary 5–8
Host2: Secondary 1–8
Host3: Secondary 1–4, Primary 5–8

The database lock was no longer necessary, because an LVM snapshot gives a point-in-time copy, so it was removed.
Also, some limits were put on max connections, just in case.
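A connection limit can be set in the mongod configuration file; the fragment below is a sketch with a placeholder value, not the figure from the engagement.

```yaml
# mongod configuration fragment (illustrative value)
net:
  maxIncomingConnections: 20000
```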
Summary

No one single cause:
● Small issue with deployment layout
● Small error with backup process
● Lack of integration in the testing plan
● Relatively new system
● Some bad luck
Led to:
● Large outage, slow and cautious recovery
Story #3: Rita the Retailer
Rita the Retailer had an ecommerce site, selling diverse goods in 20+ countries.
Product Catalog: Original Schema

{
  _id: 375,
  en_US : { name : ..., description : ..., <etc...> },
  en_GB : { name : ..., description : ..., <etc...> },
  fr_FR : { name : ..., description : ..., <etc...> },
  de_DE : ...,
  de_CH : ...,
  <... and so on for other locales... >
}
What’s good about this schema?
● Each document contains all the data about a given product, across all languages/locales
● Very efficient way to retrieve the English, French, German, etc. translations of a single product’s information in one query
However…
That is not how the product data is actually used
(except perhaps by translation staff)
Dominant Query Pattern

db.catalog.find( { _id : 375 }, { en_US : true } );
db.catalog.find( { _id : 375 }, { fr_FR : true } );
db.catalog.find( { _id : 375 }, { de_DE : true } );
... and so forth for other locales ...
Which means…
The Product Catalog’s data model did not fit the way the
data was accessed.
Consequences
● Each document contained ~20x more data than any common use case needed
● MongoDB lets you request just a subset of a document’s contents (using a projection), but…
o Typically the whole document will get loaded into RAM to serve the request
● There are other overheads for reading from disk into memory (like readahead)
Therefore…
Less than 5% of data loaded into RAM from disk is actually required at the time - highly inefficient
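Where the "less than 5%" figure comes from, in one line of arithmetic (the locale count is taken from the "20+ countries" in the story):

```javascript
// With 20+ locales embedded per document and exactly one locale needed
// per query, at most 1/20 of each loaded document is useful - before
// readahead waste is even counted.
const localesPerDocument = 20;
const usefulFraction = 1 / localesPerDocument;  // 0.05, i.e. at most 5%
```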
Visualising the problem

Documents are laid out contiguously on disk, and each read pulls in a whole document plus readahead padding around it:

{ _id: 42, en_US : {...}, en_GB : {...}, fr_FR : {...}, <other locales...> }
<READAHEAD OVERHEAD>
{ _id: 709, en_US : {...}, en_GB : {...}, fr_FR : {...}, <other locales...> }
<READAHEAD OVERHEAD>
{ _id: 3600, en_US : {...}, en_GB : {...}, fr_FR : {...}, <other locales...> }

- Only the single requested locale in each document is loaded into RAM and used.
- The rest of each document takes up memory but is not required.
- Readahead padding makes things even more inefficient.
More RAM? It’s not that simple
What did we recommend?
● Design for your use case, your dominant query pattern
○ In this case: 99.99% of queries want the product data for exactly one locale at a time
○ Hence, alter the schema appropriately
● Eliminate inefficiencies on the system
○ Make reading from disk less wasteful, maximise I/O capabilities: reduce readahead settings
Schema: Before & After

Schema Before (embedded):
{
  _id: 375,
  en_US : { name : ..., description : ..., <etc...> },
  en_GB : { name : ..., description : ..., <etc...> },
  fr_FR : { name : ..., description : ..., <etc...> },
  <... and so on for other locales... >
}

Query Before:
db.catalog.find( { _id : 375 }, { en_US : true } );
db.catalog.find( { _id : 375 }, { fr_FR : true } );
db.catalog.find( { _id : 375 }, { de_DE : true } );

Schema After (document per locale):
{ _id: "375-en_US", name : ..., description : ..., <etc...> }
{ _id: "375-en_GB", name : ..., description : ..., <etc...> }
{ _id: "375-fr_FR", name : ..., description : ..., <etc...> }
... and so on for other locales ...

Query After:
db.catalog.find( { _id : "375-en_US" } );
db.catalog.find( { _id : "375-fr_FR" } );
db.catalog.find( { _id : "375-de_DE" } );
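The migration from the embedded schema to the per-locale schema can be sketched as below; the helper name and document shapes are ours, not from the talk. One multi-locale document becomes one document per locale, keyed "<productId>-<locale>".

```javascript
// Split one embedded multi-locale catalog document into per-locale
// documents whose _id combines the product id and the locale code.
function splitByLocale(product) {
  const { _id, ...locales } = product;
  return Object.entries(locales).map(([locale, fields]) => ({
    _id: `${_id}-${locale}`,
    ...fields,
  }));
}
```

Each resulting document is small and self-contained, so the dominant query ("one product, one locale") loads only what it needs.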
Consequences of Changes
● Queries induced minimal overhead
● More than 20x as many distinct products fit in memory at once
● Disk I/O utilisation reduced
● UI latency decreased
● Happier customers
● Profit (well, we hope)
Conclusions
● MongoDB can be used for a wide range of (sometimes pretty cool) use cases
● A small problem can seem much bigger when it happens in production
● We are here to help - if you hit a problem, it’s likely you are not the first to hit it
● We can provide a fresh perspective, advice based on experience to prevent and solve issues
Adam Comerford, Senior Solutions Engineer, MongoDB
@comerford #MongoDBLondon
Questions?
Further Reading for Retail/Catalogs
● Antoine Girbal (my teammate) has produced a full reference architecture for this type of application
o Blog part 1: http://tmblr.co/ZiOADx1RRsAWe
o Blog part 2: http://tmblr.co/ZiOADx1LfVmfm
● Detailed presentations and talks from MongoDB World:
o http://www.mongodb.com/presentations/retail-reference-architecture-part-1-flexible-searchable-low-latency-product-catalog
o http://www.mongodb.com/presentations/retail-reference-architecture-part-2-real-time-geo-distributed-inventory
o http://www.mongodb.com/presentations/retail-reference-architecture-part-3-scalable-insight-component-providing-user-history