Scaling MongoDB for real-time analytics
![Page 1: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/1.jpg)
SHORTCUTS AROUND THE MISTAKES WE’VE MADE SCALING MONGODB
David Tollmyr, Platform lead
@effata
slideshare.net/tollmyr
![Page 2: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/2.jpg)
What we do
We want to make digital advertising an amazing user experience. There is more to metrics than clicks.
![Page 3: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/3.jpg)
Ads
![Page 4: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/4.jpg)
Data
![Page 5: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/5.jpg)
Assembling sessions
(Diagram: an exposure followed by a stream of pings and events ➔ assembled into one session)
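The slide only shows the flow: an exposure followed by pings and events ends up as one session. A minimal in-memory sketch of that grouping, assuming each log fragment is a hypothetical `{ sessionId, type, ts }` record (the talk doesn't show the real log format):

```javascript
// Hypothetical fragment shape: { sessionId, type: "exposure" | "ping" | "event", ts }
function assembleSessions(fragments) {
  const sessions = new Map();
  for (const f of fragments) {
    // Create the session the first time its id shows up
    if (!sessions.has(f.sessionId)) {
      sessions.set(f.sessionId, { sessionId: f.sessionId, exposure: null, pings: 0, events: [] });
    }
    const s = sessions.get(f.sessionId);
    if (f.type === "exposure") s.exposure = f.ts;
    else if (f.type === "ping") s.pings += 1;
    else if (f.type === "event") s.events.push(f.ts);
  }
  return [...sessions.values()];
}

const fragments = [
  { sessionId: "a", type: "exposure", ts: 0 },
  { sessionId: "a", type: "ping", ts: 10 },
  { sessionId: "b", type: "exposure", ts: 5 },
  { sessionId: "a", type: "event", ts: 12 },
  { sessionId: "a", type: "ping", ts: 20 },
];
console.log(assembleSessions(fragments));
```

In the iterations described later in the talk, this per-session state lived in MongoDB (and eventually Redis) rather than in process memory.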
![Page 6: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/6.jpg)
Information
![Page 7: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/7.jpg)
Crunching
(Diagram: many sessions ➔ crunched into a single number: 42)
![Page 8: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/8.jpg)
![Page 9: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/9.jpg)
Metrics
![Page 10: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/10.jpg)
Reports
![Page 11: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/11.jpg)
What we do
Track ads, make pretty reports.
![Page 12: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/12.jpg)
That doesn’t sound so hard
- We don’t know when sessions end
- There’s a lot of data
- It’s all done in (close to) real time
![Page 13: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/13.jpg)
Numbers
- 200 GB of logs and 100 million data points per day
- ~300 metrics per data point
- All of which adds up to 6000 updates/s at peak
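The slide doesn't show how these figures combine into 6000 updates/s. Putting them on a per-second basis (plain arithmetic; the only assumption is an even spread over the 86,400 seconds of a day) shows the peak is roughly five times the daily average, and that writing every metric value as its own update would be far beyond 6000/s, so each update presumably carries many metrics at once:

```javascript
// Figures taken from the slide
const dataPointsPerDay = 100000000;
const metricsPerDataPoint = 300;
const peakUpdatesPerSecond = 6000;
const secondsPerDay = 24 * 60 * 60; // 86,400

// Average rate if the day's data points arrived evenly
const avgDataPointsPerSecond = dataPointsPerDay / secondsPerDay;
// How far the stated peak sits above that flat average
const peakToAverage = peakUpdatesPerSecond / avgDataPointsPerSecond;
// Individual metric values produced per second, on average
const avgMetricValuesPerSecond = avgDataPointsPerSecond * metricsPerDataPoint;

console.log(avgDataPointsPerSecond.toFixed(0));   // 1157
console.log(peakToAverage.toFixed(1));            // 5.2
console.log(avgMetricValuesPerSecond.toFixed(0)); // 347222
```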
![Page 14: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/14.jpg)
How we use(d) MongoDB
- “Virtual memory” to offload data while we wait for sessions to finish
- Short-term storage (<48 hours) for batch jobs, replays and manual analysis
- Metrics storage
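Since there is no explicit "session ended" signal, something has to decide when a buffered session is finished. The talk doesn't say how; one common approach, shown here purely as an assumption, is an inactivity timeout: a session counts as done once no fragment has arrived for some period. The 30-minute value and the `lastSeen` field are hypothetical.

```javascript
// Hypothetical inactivity timeout; the talk does not specify one
const SESSION_TIMEOUT_MS = 30 * 60 * 1000;

// sessions: Map of sessionId -> { lastSeen: epoch millis of the latest fragment }
// Returns the ids of sessions that have been idle long enough to finalize
// and remove from the buffer.
function finishedSessions(sessions, nowMs, timeoutMs = SESSION_TIMEOUT_MS) {
  const done = [];
  for (const [id, s] of sessions) {
    if (nowMs - s.lastSeen >= timeoutMs) done.push(id);
  }
  return done;
}

const buffer = new Map([
  ["a", { lastSeen: 0 }],
  ["b", { lastSeen: 25 * 60 * 1000 }],
]);
// At the 40-minute mark only "a" has been idle for 30+ minutes
console.log(finishedSessions(buffer, 40 * 60 * 1000));
```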
![Page 15: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/15.jpg)
Why we use MongoDB
- Schemalessness makes things so much easier: the data we collect changes as we come up with new ideas
- Sharding makes it possible to scale writes
- Secondary indexes and a rich query language are great features (for the metrics store)
- It’s just… nice
![Page 16: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/16.jpg)
Btw, we use JRuby. It’s awesome.
![Page 17: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/17.jpg)
STANDING ON THE SHOULDERS OF GIANTS WITH JRUBY
slideshare.net/iconara
![Page 18: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/18.jpg)
A story in 9 iterations
![Page 19: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/19.jpg)
1st iteration: secondary indexes and updates
One document per session, updated as new data comes along.
Outcome: 1000% write lock
![Page 20: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/20.jpg)
#1 Everything is about working around the GLOBAL WRITE LOCK
![Page 21: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/21.jpg)
MongoDB 1.8.1
```javascript
db.coll.update({_id: "xyz"}, {$inc: {x: 1}}, true)      // upsert: increment counter x
db.coll.update({_id: "abc"}, {$push: {x: "..."}}, true)  // upsert: append to array x
```
![Page 22: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/22.jpg)
MongoDB 2.0.0
```javascript
db.coll.update({_id: "xyz"}, {$inc: {x: 1}}, true)      // upsert: increment counter x
db.coll.update({_id: "abc"}, {$push: {x: "..."}}, true)  // upsert: append to array x
```
![Page 23: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/23.jpg)
2nd iteration: using scans for two-step assembling
Instead of updating, save each fragment, then scan over _id to assemble sessions.
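Scanning over `_id` only assembles sessions if the key sorts all fragments of a session next to each other, in time order. A minimal sketch, assuming a hypothetical `_id` of the form `<sessionId>:<zero-padded timestamp>` (the talk doesn't show the actual key): after sorting by `_id`, a single pass groups consecutive fragments into sessions, mirroring what a range scan over the `_id` index would return.

```javascript
// Hypothetical key layout: "<sessionId>:<13-digit zero-padded epoch millis>".
// Assumes session ids never contain ":", so each session's keys share a unique prefix.
function fragmentId(sessionId, tsMs) {
  return `${sessionId}:${String(tsMs).padStart(13, "0")}`;
}

// Sort by _id (plain string order, which matches byte order for ASCII keys),
// then group consecutive fragments that share a session prefix.
function assembleByScan(fragments) {
  const sorted = [...fragments].sort((x, y) => (x._id < y._id ? -1 : x._id > y._id ? 1 : 0));
  const sessions = [];
  let current = null;
  for (const f of sorted) {
    const sessionId = f._id.slice(0, f._id.lastIndexOf(":"));
    if (!current || current.sessionId !== sessionId) {
      current = { sessionId, fragments: [] };
      sessions.push(current);
    }
    current.fragments.push(f.type);
  }
  return sessions;
}

const saved = [
  { _id: fragmentId("s2", 5000), type: "exposure" },
  { _id: fragmentId("s1", 1000), type: "exposure" },
  { _id: fragmentId("s1", 3000), type: "event" },
  { _id: fragmentId("s1", 2000), type: "ping" },
];
console.log(assembleByScan(saved));
```

Against MongoDB the sort would come from the index, e.g. `db.fragments.find().sort({_id: 1})`; the point of the sketch is the key design, which is what the "primary key" lesson is about.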
![Page 24: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/24.jpg)
Outcome: less locking, but performance still wasn’t great. We also realised we couldn’t remove data fast enough.
![Page 25: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/25.jpg)
#2 Everything is about working around the GLOBAL WRITE LOCK
![Page 26: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/26.jpg)
#3 Give a lot of thought to your PRIMARY KEY
![Page 27: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/27.jpg)
3rd iteration: partitioning
Partitioning the data by writing to a new collection every hour.
Outcome: a complicated, fragmented database
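The appeal of hourly collections is presumably that old data can be removed by dropping whole collections instead of deleting documents one by one. A minimal sketch of the bookkeeping, assuming a hypothetical `fragments_YYYYMMDDHH` naming scheme (UTC hours) and the 48-hour retention from the short-term storage slide:

```javascript
// Hypothetical naming scheme: one collection per UTC hour, e.g. "fragments_2011112514"
function hourlyCollection(date, prefix = "fragments") {
  const pad = (n) => String(n).padStart(2, "0");
  return `${prefix}_${date.getUTCFullYear()}${pad(date.getUTCMonth() + 1)}` +
    `${pad(date.getUTCDate())}${pad(date.getUTCHours())}`;
}

// An hourly collection can be dropped once its whole hour is older than the retention window
function collectionsToDrop(existingHourStarts, nowDate, retentionHours = 48) {
  const hourMs = 3600 * 1000;
  const cutoff = nowDate.getTime() - retentionHours * hourMs;
  return existingHourStarts
    .filter((hourStart) => hourStart.getTime() + hourMs <= cutoff)
    .map((hourStart) => hourlyCollection(hourStart));
}

const now = new Date(Date.UTC(2011, 10, 25, 14, 30)); // 2011-11-25 14:30 UTC
console.log(hourlyCollection(now)); // fragments_2011112514
```

Every reader and writer has to know about this naming scheme, which is one way to end up with the "complicated, fragmented database" from the outcome above.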
![Page 28: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/28.jpg)
#4 Make sure you can REMOVE OLD DATA
![Page 29: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/29.jpg)
4th iteration: sharding
To get around the global write lock and get higher write performance, we moved to a sharded cluster.
Outcome: higher write performance, lots of problems, lots of ops time spent debugging
![Page 30: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/30.jpg)
#5 Everything is about working around the GLOBAL WRITE LOCK
![Page 31: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/31.jpg)
#6 SHARDING IS NOT A SILVER BULLET, and it’s complex: if you can, avoid it
![Page 32: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/32.jpg)
![Page 33: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/33.jpg)
#7 IT WILL FAIL: design for it
![Page 34: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/34.jpg)
![Page 35: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/35.jpg)
![Page 36: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/36.jpg)
5th iteration: moving things to separate clusters
We saw very different loads on the shards and realised we had databases with very different usage patterns, some of which kept autosharding from working. We moved these off the cluster.
Outcome: a more balanced and stable cluster
![Page 37: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/37.jpg)
#8 Everything is about working around the GLOBAL WRITE LOCK
![Page 38: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/38.jpg)
#9 ONE DATABASE with one usage pattern PER CLUSTER
![Page 39: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/39.jpg)
#10 MONITOR EVERYTHING: look at your health graphs daily
![Page 40: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/40.jpg)
6th iteration: monster machines
We ran into new problems removing data and needed some room to breathe and think.
Solution: upgraded the servers to High-Memory Quadruple Extra Large (with cheese).
![Page 41: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/41.jpg)
#11 Don’t try to scale up, SCALE OUT
![Page 42: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/42.jpg)
#12 When you’re out of ideas, CALL THE EXPERTS
![Page 43: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/43.jpg)
7th iteration: partitioning (again) and pre-chunking
We rewrote the database layer to write to a new database each day, and we created all chunks in advance. We also reduced the size of our documents considerably.
Outcome: no more problems removing data.
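Creating chunks in advance means splitting a freshly sharded collection and placing the chunks on shards before any data arrives, presumably so the cluster doesn't have to split and migrate chunks while the day's writes are pouring in. A sketch that only builds the admin command documents, assuming a hypothetical setup where the shard key is `_id`, ids start with a uniformly distributed hex digit, the daily database is named like `sessions_20111125`, and the shards are called `shard0000`…:

```javascript
// Hypothetical key distribution: _id values start with a uniformly random hex digit
const HEX = "0123456789abcdef";

// Build admin commands that pre-split one day's collection into 16 chunks
// and spread them round-robin over the shards. Nothing is executed here.
function preChunkCommands(ns, shards) {
  const commands = [];
  // Split points "1".."f": chunk i then holds ids starting with HEX[i]
  for (const c of HEX.slice(1)) {
    commands.push({ split: ns, middle: { _id: c } });
  }
  // Move each chunk, identified by a key it contains, to its target shard
  for (let i = 0; i < HEX.length; i++) {
    commands.push({ moveChunk: ns, find: { _id: HEX[i] }, to: shards[i % shards.length] });
  }
  return commands;
}

// Hypothetical per-day database name
const ns = "sessions_20111125.fragments";
const cmds = preChunkCommands(ns, ["shard0000", "shard0001", "shard0002"]);
console.log(cmds.length); // 31 (15 splits + 16 moves)
```

In the mongo shell each document would be run with `db.adminCommand(cmd)` after the collection has been sharded with `shardCollection`. Command objects keep their key order, so the command name stays the first field as MongoDB requires.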
![Page 44: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/44.jpg)
#13 Smaller objects mean a smaller database, and a smaller database means LESS RAM NEEDED
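The slides don't say how the documents were shrunk. One well-known lever, shown here only as an assumption, is field-name length: BSON stores every field name inside every document, so long names are paid for once per document. A rough sketch with hypothetical field names, using `JSON.stringify` length as a stand-in for BSON size (the two are not identical, but field names cost bytes in both):

```javascript
// Hypothetical mapping from descriptive to abbreviated field names
const SHORT_NAMES = { sessionId: "s", exposureTime: "e", pingCount: "p", events: "v" };

// Rename top-level fields using the mapping; unknown fields keep their names
function shorten(doc) {
  const out = {};
  for (const [key, value] of Object.entries(doc)) {
    out[SHORT_NAMES[key] ?? key] = value;
  }
  return out;
}

const longDoc = { sessionId: "abc123", exposureTime: 1322230200000, pingCount: 12, events: [3, 7] };
const shortDoc = shorten(longDoc);

// Compare serialized sizes before and after renaming
const before = JSON.stringify(longDoc).length;
const after = JSON.stringify(shortDoc).length;
console.log(before, after);
```

The saving is per document, so it grows with the document count; the trade-off is data that is harder to read when querying by hand.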
![Page 45: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/45.jpg)
#14 Give a lot of thought to your PRIMARY KEY
![Page 46: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/46.jpg)
#15 Everything is about working around the GLOBAL WRITE LOCK
![Page 47: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/47.jpg)
8th iteration: realize when you have the wrong tool
Transient data might not need all the bells and whistles.
Outcome: Redis gave us 100x performance in the assembling step
![Page 48: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/48.jpg)
#16 When all you have is a HAMMER, everything looks like a NAIL
![Page 49: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/49.jpg)
9th iteration: rinse and repeat
We now have the same scaling issues later in the chain.
Outcome: an upcoming rewrite to make writes/updates more efficient. Redis was actually slower.
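The talk doesn't describe that rewrite. Given the recurring lesson about the global write lock, one plausible direction, sketched here as an assumption, is to coalesce many small increments in memory and send one `$inc` upsert per document, so the lock is taken far fewer times. The document ids and metric names are hypothetical.

```javascript
// Accumulates metric increments per document id, then emits one update per id
class IncrementBuffer {
  constructor() {
    this.pending = new Map(); // docId -> { metricName: delta }
  }

  add(docId, metric, delta = 1) {
    const inc = this.pending.get(docId) ?? {};
    inc[metric] = (inc[metric] ?? 0) + delta;
    this.pending.set(docId, inc);
  }

  // Returns [query, update] pairs shaped like the arguments of a shell upsert,
  // e.g. db.metrics.update(query, update, true), and empties the buffer
  flush() {
    const updates = [];
    for (const [docId, inc] of this.pending) {
      updates.push([{ _id: docId }, { $inc: inc }]);
    }
    this.pending.clear();
    return updates;
  }
}

const buf = new IncrementBuffer();
buf.add("ad42:2011-11-25", "exposures");
buf.add("ad42:2011-11-25", "clicks");
buf.add("ad42:2011-11-25", "exposures");
buf.add("ad7:2011-11-25", "exposures");
const updates = buf.flush();
console.log(updates.length); // 2
```

Four increments become two writes; the cost is that anything still buffered is lost if the process dies before flushing.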
![Page 50: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/50.jpg)
#17 Everything is about working around the GLOBAL WRITE LOCK
![Page 51: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/51.jpg)
Thank you
@effata
slideshare.net/tollmyr
engineering.burtcorp.com
burtcorp.com
richmetrics.com
![Page 52: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/52.jpg)
Since we’ve got time…
![Page 53: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/53.jpg)
Tips: EC2
You have three copies of your data, so do you really need EBS? Instance store disks are included in the price and have predictable performance. An m1.xlarge comes with 1.7 TB of storage.
![Page 54: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/54.jpg)
Tips: avoid bulk inserts
Very dangerous if there’s a possibility of duplicate key errors
It’s not fixed in 2.0 even though the driver has a flag for it.
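With a bulk insert, a duplicate key error on one document can keep the rest of the batch from being written, which is why the slide calls it dangerous (the driver flag it mentions is presumably a continue-on-error option). A minimal client-side guard, offered as an assumption rather than the talk's fix: drop in-batch duplicates before inserting. It cannot catch collisions with documents already in the collection.

```javascript
// Keep the first document for each _id; later duplicates are reported, not inserted.
// Uses JavaScript Set equality, so this suits string or numeric ids
// (distinct ObjectId instances with the same value would not be detected).
function dedupeById(batch) {
  const seen = new Set();
  const unique = [];
  const duplicates = [];
  for (const doc of batch) {
    if (seen.has(doc._id)) {
      duplicates.push(doc);
    } else {
      seen.add(doc._id);
      unique.push(doc);
    }
  }
  return { unique, duplicates };
}

const { unique, duplicates } = dedupeById([{ _id: 1 }, { _id: 2 }, { _id: 1 }, { _id: 3 }]);
console.log(unique.length, duplicates.length); // 3 1
```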
![Page 55: Scaling MongoDB for real time analytics](https://reader034.vdocuments.mx/reader034/viewer/2022051313/548da26db47959e20c8b66b9/html5/thumbnails/55.jpg)
Tips: safe mode
Run every Nth insert in safe mode. This will give you warnings when bad things happen, like failovers.
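A minimal sketch of the sampling rule, assuming a hypothetical N of 100. In the drivers of the time, safe mode meant following the write with a `getLastError` call; in today's write-concern terms that is roughly acknowledged `{ w: 1 }` versus unacknowledged `{ w: 0 }`.

```javascript
// Hypothetical sampling interval; the talk leaves N open
const SAFE_EVERY_N = 100;

// Write-concern-style option for the i-th insert (0-based):
// acknowledged ({ w: 1 }) for every Nth insert, fire-and-forget ({ w: 0 }) otherwise
function writeConcernFor(i, n = SAFE_EVERY_N) {
  return i % n === n - 1 ? { w: 1 } : { w: 0 };
}

let acknowledged = 0;
for (let i = 0; i < 1000; i++) {
  if (writeConcernFor(i).w === 1) acknowledged++;
}
console.log(acknowledged); // 10
```

An acknowledged insert that fails, for example during a failover, surfaces an error, which is the warning the tip is after; the unacknowledged inserts in between stay cheap.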