MongoDB and the MEAN Stack
Ger Hartnett & Alan Spencer MongoDB Dublin
Overview
• A fictional story of a startup using MongoDB and the MEAN stack to build an IoT application
• We'll take a DevOps perspective and show you what to watch out for when using a framework like MEAN
• Tips you can use to help the development team focus on the right things when close to production
• Questions: How many of you are from operations? How many from development?
5 Things We Learned
• Capacity planning/prototyping is a good idea, but performance is sensitive to the sample test data
• The MEAN stack rocks - fast to get started - and the profiler can help you understand what's under the hood
• Realtime/incremental aggregation works well with IoT workloads - the "MMS approach"
• With NodeJS/Express, the number of app servers becomes the bottleneck before MongoDB does
• Performance tuning patterns apply - "bottleneck whack-a-mole" & "slam-dunk optimization"
Context: IoT & MEAN
Internet of Things
“The rise of device oriented development … new architectural and workflow challenges … distinctly different from … web and mobile development so far.” - Morten Bagai
Big Data => Humongous Data
Internet of Things
• Bosch: “IoT brings root and branch changes to the world of business”
• Richard Kreuter's webinar, May 2013
• An earlier bootcamp looked at sharding for IoT
Photo by jurvetson - Creative Commons Attribution License - http://www.flickr.com/photos/jurvetson/916142
MEAN stack
MongoDB - the database
Express - web app framework/router
Angular - browser HTML/JS MVC
Node - JavaScript application server
Photo by benmizen - Creative Commons ShareAlike License - http://www.flickr.com/photos/benmizen/9456440635
Learn more about MEAN
Valeri Karpov - MongoDB Kernel Tools Team: http://thecodebarbarian.wordpress.com/2013/07/22/introduction-to-the-mean-stack-part-one-setting-up-your-tools/
MEAN.io: http://mean.io
About MongoDB Bootcamp
We invest in technical new hires
Everyone does "bootcamp":
• NYC for 2 weeks - product internals
• Then work on a longer project for 3-4 weeks
In our case, we wanted to do a bit of everything: capacity planning, iterating on user stories, with MongoDB as one component
The Application
Location-based advertising - IoMT
• IoT example 3 from Richard's webinar
[Diagram: a customer surrounded by several advertisers]
User Stories - for the application
US1 - customer looks for advertisers nearby
US2 - advertiser wants to see how many customers saw the offer
US3 - find hot spots with many customers but few advertisers
Photo by consumerist - Creative Commons Attribution License - http://www.flickr.com/photos/consumerist/2158190589
Document / Model / Controller
Document:
{ name: 'Long Hall', pos: [-6.265535, 53.3418364], kind: 'pub' }
Model (advertiser.js):
var mongoose = require('mongoose'),
    Schema = mongoose.Schema;
var AdvertiserSchema = new Schema({
  name: { type: String, default: '' },
  pos:  [Number],                      // [longitude, latitude]
  kind: { type: String, default: 'place' }
});
// geoSearch requires a geoHaystack index, e.g. { pos: 'geoHaystack', kind: 1 } with a bucketSize
mongoose.model('Advertiser', AdvertiserSchema);
Controller (advertisers.js):
var mongoose = require('mongoose'),
    Advertiser = mongoose.model('Advertiser');
exports.all = function(req, res) {
  var findQuery = {
    near: [Number(req.query.lng), Number(req.query.lat)],
    maxDistance: Number(req.query.dist)
  };
  Advertiser.geoSearch({ kind: 'pub' }, findQuery, function(err, advertisers) {
    // error handling (elided on the slide)
    res.jsonp(advertisers);
  });
};
The haystack examples sent us in the wrong direction initially
CRUD interface & Mongoose
Raised & fixed a bug in Mongoose; pull request merged
US1 Initial Measurements
MongoDB shell scripts: 9 advertisers, small area, distance 10km
MongoDB has 5 kinds of geo query and 3 kinds of geo index
geoSearch (haystack) looked much better than the others (our 1st mistake)
TIP: performance is sensitive to the test data & query
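Because the first measurements used only 9 advertisers in a small area, it can help to generate test data with a more realistic spatial spread. Below is a minimal sketch, assuming a seeded pseudo-random generator (mulberry32, chosen here only so runs are reproducible), hypothetical cluster centres for Dublin and New York, and an assumed 20% share of pubs; the document shape mirrors the Advertiser document above.

```javascript
// Deterministic PRNG (mulberry32) so every test run loads identical data.
function mulberry32(seed) {
  let a = seed | 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

const KM_PER_DEGREE_LAT = 111.195; // spherical approximation

// Hypothetical cluster centres and spreads, for illustration only.
const TEST_CLUSTERS = [
  { name: "DUB", lng: -6.26, lat: 53.35, spreadKm: 5 },
  { name: "NYC", lng: -73.99, lat: 40.73, spreadKm: 8 },
];

function generateAdvertisers(count, seed, clusters = TEST_CLUSTERS) {
  const rand = mulberry32(seed);
  // Box-Muller transform: standard normal sample from two uniforms.
  const gaussian = () =>
    Math.sqrt(-2 * Math.log(1 - rand())) * Math.cos(2 * Math.PI * rand());
  const docs = [];
  for (let i = 0; i < count; i++) {
    const c = clusters[i % clusters.length]; // round-robin over clusters
    const kmPerDegLng = KM_PER_DEGREE_LAT * Math.cos((c.lat * Math.PI) / 180);
    const dLat = (gaussian() * c.spreadKm) / KM_PER_DEGREE_LAT;
    const dLng = (gaussian() * c.spreadKm) / kmPerDegLng;
    docs.push({
      name: `${c.name}-${i}`,
      pos: [c.lng + dLng, c.lat + dLat], // [longitude, latitude]
      kind: rand() < 0.2 ? "pub" : "place", // assumed share of pubs
    });
  }
  return docs;
}
```

For example, `generateAdvertisers(110000, 42)` produces a repeatable data set that can be written to a file for `mongoimport` or inserted in batches; changing the seed or the clusters is a cheap way to check how sensitive a query is to the data.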
The good thing about frameworks is… they do lots of things for developers
…and the bad thing about frameworks? They do lots of things for developers
Find out what's happening - debug
Console:
Mongoose: clients.findOne({ _id: ObjectId("…") })
Mongoose: advertisers.geoHaystack({…[-6.267765, 53.34087]})
We used passport-http with Express to add Basic/Digest auth (client id lookup)
It can be hard to figure out what a framework like Express/Mongoose really does
Tip: mongoose.set('debug', true) - detailed logging
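`mongoose.set('debug', true)` prints every operation to the console. Mongoose also accepts a function for the `debug` option, called with the collection name, the method name and the method's arguments, which makes it possible to keep only the collections you are investigating. Below is a minimal sketch of such a function; `makeDebugLogger` and its `collections` filter are hypothetical, and the `sink` array stands in for a real log.

```javascript
// Builds a function suitable for mongoose.set('debug', fn).
// Mongoose calls it as fn(collectionName, methodName, ...methodArgs).
function makeDebugLogger(sink, collections) {
  const wanted = collections ? new Set(collections) : null; // null: log everything
  return function (collectionName, methodName, ...methodArgs) {
    if (wanted && !wanted.has(collectionName)) return;
    const args = methodArgs.map((arg) => JSON.stringify(arg)).join(", ");
    sink.push(`Mongoose: ${collectionName}.${methodName}(${args})`);
  };
}
```

For example, `mongoose.set('debug', makeDebugLogger(lines, ['sessions']))` would record only the operations on the sessions collection, leaving the client lookups and geo queries out of the log.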
Find out what's happening - profiler
> db.setProfilingLevel(2)   // profile all operations
> db.system.profile.find()
{ "op" : "query", "ns" : "tings.clients", ...
{ "op" : "command", "command" : { "geoSearch" ...
{ "op" : "update", "ns" : "tings.sessions" ...
Tip: the MongoDB profiler shows the operations really happening on the DB - check them with the developers
Where did that session update come from? Fixing it is not obvious
exports.all = function(req, res) {
  . . .
    req.session = null;   // drop the session so nothing is written to tings.sessions
    res.jsonp(advertisers);
  . . .
};
10% performance improvement
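Once profiling is on, grouping the `system.profile` entries by operation type and namespace makes unexpected work, such as the update on `tings.sessions`, stand out. A minimal sketch of that grouping as a plain function over profiler documents, using their `op`, `ns` and `millis` fields; the input could be, for example, the array returned by `db.system.profile.find().toArray()`. The `summarizeProfile` name is hypothetical.

```javascript
// Groups profiler entries by "op ns" and totals their count and time.
function summarizeProfile(entries) {
  const byKey = new Map();
  for (const e of entries) {
    const key = `${e.op} ${e.ns}`;
    const s = byKey.get(key) || { key, count: 0, totalMillis: 0 };
    s.count += 1;
    s.totalMillis += e.millis || 0; // fast operations may report 0
    byKey.set(key, s);
  }
  // Most expensive first, then most frequent, so surprises are at the top.
  return [...byKey.values()].sort(
    (a, b) => b.totalMillis - a.totalMillis || b.count - a.count
  );
}
```

Reviewing this list with the developers is one way to act on the tip above: every line should correspond to something the application is meant to do.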
Back to the application
US1 - customer looks for advertisers nearby
• Need to store the customer location
US2 - advertiser wants to see how many customers are nearby
US2 means we built on US1
Photo by consumerist - Creative Commons Attribution License - http://www.flickr.com/photos/consumerist/2158190589
Being a startup, we decided to take a naive, pragmatic approach:
• Store all samples
• US2 aggregates on demand
US2 - Aggregation of Raw Samples
[Diagram: Insert writes Raw Samples; Query runs Aggregate over the Raw Samples]
1 hour of raw samples @ 2k RPS = 7.2M documents
Aggregation on 7.2M raw samples took 1 second on our instances
Significant impact when run every 2 seconds:
• RPS dropped by a factor of 4! (single instance)
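The sample volume follows directly from the request rate, and a small helper reproduces the 7.2M figure. The duty-cycle helper is an added back-of-envelope assumption, not a measurement from the talk: one 1-second aggregation every 2 seconds keeps the database busy aggregating roughly half the time. Both function names are hypothetical.

```javascript
// Raw documents accumulated, assuming one stored sample per request.
function rawSampleCount(requestsPerSecond, hours) {
  return requestsPerSecond * hours * 3600;
}

// Fraction of wall-clock time spent in on-demand aggregation,
// assuming runs do not overlap.
function aggregationDutyCycle(runtimeSeconds, intervalSeconds) {
  if (runtimeSeconds >= intervalSeconds) return 1; // runs back to back
  return runtimeSeconds / intervalSeconds;
}
```

Since the aggregation scans every sample in the window, its runtime grows with `rawSampleCount`, which is why on-demand aggregation gets worse as traffic increases.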
US2 - Pre-aggregation
[Diagram: raw-sample aggregation (left) vs. pre-aggregation (right): each Insert of a raw sample also Updates a pre-aggregate document, and Query aggregates the pre-aggregates]
An MMS-type approach:
• A document per advertiser-customer-month
• Using update with multi: true (more on this later)
• The query now only needs to aggregate unique customers
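A minimal sketch of the advertiser-customer-month idea, assuming hypothetical field names (`advertiser`, `customer`, `month`, `count`, `days`) in the spirit of the MMS time-series schema post listed at the end. `preAggUpdate` builds the filter, update and options you would pass to an upsert (for example the Node driver's `updateOne(filter, update, { upsert: true })`); the in-memory `PreAggStore` mimics the same `$inc` upsert so the idea can be tested without a database.

```javascript
// One document per advertiser, customer and calendar month (UTC).
function preAggUpdate(advertiserId, customerId, when) {
  const month = when.toISOString().slice(0, 7); // e.g. "2013-11"
  const day = when.getUTCDate();                // 1..31
  return {
    filter: { advertiser: advertiserId, customer: customerId, month },
    update: { $inc: { count: 1, [`days.${day}`]: 1 } },
    options: { upsert: true },
  };
}

// In-memory stand-in for the collection, applying the same $inc upsert.
class PreAggStore {
  constructor() {
    this.docs = new Map();
  }
  apply({ filter, update }) {
    const key = `${filter.advertiser}|${filter.customer}|${filter.month}`;
    const doc = this.docs.get(key) || { ...filter, count: 0, days: {} };
    for (const [field, n] of Object.entries(update.$inc)) {
      if (field.startsWith("days.")) {
        const d = field.slice("days.".length);
        doc.days[d] = (doc.days[d] || 0) + n;
      } else {
        doc[field] = (doc[field] || 0) + n;
      }
    }
    this.docs.set(key, doc);
  }
  // US2: unique customers for an advertiser in a month = number of documents.
  uniqueCustomers(advertiserId, month) {
    let n = 0;
    for (const doc of this.docs.values()) {
      if (doc.advertiser === advertiserId && doc.month === month) n += 1;
    }
    return n;
  }
}
```

Against a real collection, the US2 query then becomes a count of documents matching `{ advertiser, month }` instead of an aggregation over millions of raw samples.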
US1 measurements revisited
MongoDB shell scripts with more realistic data (the old measurements used repeated locations)
110k advertisers, with clusters in DUB and NYC
Performance was best for $near and $nearSphere (2x better than haystack)
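The geo operators use different distance units, which is easy to get wrong when comparing them. A minimal sketch of building the two `$nearSphere` query shapes for the `pos` field: with legacy `[lng, lat]` coordinates in the query, `$maxDistance` is in radians; with a GeoJSON `$geometry`, it is in meters. The helper names are hypothetical; 6378.1 km is the radius MongoDB's documentation uses for converting kilometres to radians.

```javascript
const MONGO_EARTH_RADIUS_KM = 6378.1; // radius used by the MongoDB docs for km -> radians

// Legacy coordinate pair (e.g. with a 2d index): $maxDistance in radians.
function nearSphereLegacy(lng, lat, distKm) {
  return {
    pos: { $nearSphere: [lng, lat], $maxDistance: distKm / MONGO_EARTH_RADIUS_KM },
  };
}

// GeoJSON point (e.g. with a 2dsphere index): $maxDistance in meters.
function nearSphereGeoJSON(lng, lat, distKm) {
  return {
    pos: {
      $nearSphere: {
        $geometry: { type: "Point", coordinates: [lng, lat] }, // longitude first
        $maxDistance: distKm * 1000,
      },
    },
  };
}
```

In the controller, assuming `dist` is given in kilometres, `Advertiser.find(nearSphereLegacy(Number(req.query.lng), Number(req.query.lat), Number(req.query.dist)), callback)` is one way the geoSearch call could be replaced.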
Where does the time go?
Chart legend (request time by stage):
• Express/Mongoose/Node
• Customer lookup
• Find ($near)
• Save sample to DB
• Save sample to file
• Pre-aggregation: multiple docs (6)
• Pre-aggregation: multi-update, 1 doc
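One way to answer "where does the time go?" inside the app server is to time each stage of a request and accumulate the totals. A minimal sketch using Node's `process.hrtime.bigint()`; `StageTimer` is hypothetical, and its stage names would mirror the chart legend above.

```javascript
// Accumulates wall-clock time per named stage across many requests.
class StageTimer {
  constructor() {
    this.totals = new Map(); // stage -> { calls, nanos }
  }

  // Runs fn (sync or async) and records its elapsed time, even if it throws.
  async time(stage, fn) {
    const start = process.hrtime.bigint();
    try {
      return await fn();
    } finally {
      const elapsed = process.hrtime.bigint() - start;
      const t = this.totals.get(stage) || { calls: 0, nanos: 0n };
      t.calls += 1;
      t.nanos += elapsed;
      this.totals.set(stage, t);
    }
  }

  // Share of the total measured time per stage, largest first.
  report() {
    let sum = 0n;
    for (const t of this.totals.values()) sum += t.nanos;
    return [...this.totals.entries()]
      .map(([stage, t]) => ({
        stage,
        calls: t.calls,
        share: sum === 0n ? 0 : Number(t.nanos) / Number(sum),
      }))
      .sort((a, b) => b.share - a.share);
  }
}
```

For example, wrapping the customer lookup as `await timer.time("customer lookup", () => query.exec())` (where `query` is whatever Mongoose query the handler builds) and printing `timer.report()` after a load test gives a per-stage breakdown similar to the chart.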
Deployment
[Diagram: Chrome/Postman and NodeLoad send requests to HAproxy, which balances across three NodeJS servers backed by MongoD]
Scaling
Bottlenecks we hit:
1 - number of Node.JS servers
2 - HAproxy
3 - load generator threads/bandwidth
Pattern: "slam dunk optimization"
[Diagram: the deployment (Chrome/Postman, NodeLoad, HAproxy, 3 x NodeJS, MongoD*) with bottlenecks 1, 2 and 3 marked]
Performance tips
1. Increase the number of Node.JS servers
2. Increase the performance of the proxy/balancer instance
• HAproxy balanced load more evenly than Amazon ELB
3. Tweak Nodeload (generates/measures REST load)
• Nodeload concurrency: 3x the number of Node servers
• Run Nodeload on the same machine as HAproxy
Development recommendation: the Postman Chrome extension - generates REST requests with Basic Auth
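A load generator such as Nodeload essentially keeps a fixed number of requests in flight and measures throughput. A minimal sketch of that idea (not Nodeload's API), with concurrency tied to the number of Node servers following the 3x rule of thumb above; `runLoad`, `concurrencyFor` and the `sendRequest` callback are hypothetical, and in practice `sendRequest` would issue an HTTP request through HAproxy.

```javascript
// Keeps `concurrency` requests in flight until `total` have completed,
// then reports successes, failures and requests per second.
async function runLoad({ total, concurrency, sendRequest }) {
  let started = 0;
  let ok = 0;
  let failed = 0;
  const begin = process.hrtime.bigint();

  async function worker() {
    while (started < total) {
      started += 1; // claim the next request slot (single-threaded, no race)
      try {
        await sendRequest();
        ok += 1;
      } catch (err) {
        failed += 1;
      }
    }
  }

  await Promise.all(Array.from({ length: concurrency }, () => worker()));
  const seconds = Number(process.hrtime.bigint() - begin) / 1e9;
  return { ok, failed, rps: seconds > 0 ? (ok + failed) / seconds : Infinity };
}

// Rule of thumb from the slide: concurrency = 3x the number of Node servers.
function concurrencyFor(nodeServers) {
  return 3 * nodeServers;
}
```

If the measured RPS stops rising as Node servers are added, the load generator itself (threads or bandwidth) may have become the bottleneck, which matches bottleneck 3 above.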
Back to the application
US3 Overview
What are the top 10 hot sales areas?
• What is an "area"…?
Requirements:
• Little impact, easy to calculate
• Approximately regular size
• Optimal approximate distance - "bounding areas"
• Plays nice with sharding
Candidates: internals of haystack, 2dsphere? Polygon? MGRS?
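Once every customer sample and advertiser carries an area id (such as a box number), ranking the hot areas is a small computation. A minimal sketch, assuming a hypothetical score of customers per advertiser with one added to the denominator so that areas without any advertiser still rank; other scoring rules would be equally reasonable, and `topHotAreas` is a hypothetical name.

```javascript
// customerAreas / advertiserAreas: one area id per customer (or sample)
// and per advertiser. Returns the n areas with the highest score.
function topHotAreas(customerAreas, advertiserAreas, n = 10) {
  const customers = new Map();
  for (const a of customerAreas) customers.set(a, (customers.get(a) || 0) + 1);
  const advertisers = new Map();
  for (const a of advertiserAreas) advertisers.set(a, (advertisers.get(a) || 0) + 1);

  return [...customers.entries()]
    .map(([area, c]) => {
      const adv = advertisers.get(area) || 0;
      // Many customers, few advertisers => high score.
      return { area, customers: c, advertisers: adv, score: c / (adv + 1) };
    })
    .sort((x, y) => y.score - x.score || y.customers - x.customers)
    .slice(0, n);
}
```

Because the counts are simple group-bys on the area id, they could also be produced by an aggregation in MongoDB and fed to the ranking step.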
US3 - Hot box - Sales, go sell!
MGRS - Military Grid Reference System
• 4QFJ123678 - precision level 100m
Image by Mikael Rittri - Creative Commons ShareAlike License http://en.wikipedia.org/wiki/File:MGRSgridHawaiiSchemeAARealigned.png
MGRS - But at the poles…
Image by Mikael Rittri - Creative Commons ShareAlike License http://en.wikipedia.org/wiki/File:MGRSgridNorthPole.png
Introducing the ‘box’
The "box" - the poor man's MGRS
• Reinvented the sphere
• Long/lat -> box number
• Tailored to a specific distance
• Boxes are at least 1km
• Search in the current and 8 neighbouring boxes
• Filter out points outside the circle in JS
• Performed relatively well
• Can be used to shard
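A minimal sketch of a box scheme along these lines, assuming a spherical Earth (R = 6371 km), equal-height latitude rows, and each row split into as many columns as fit at its most poleward edge, so every box is at least `boxKm` across. The row/column numbering and all names are hypothetical, not the authors' actual scheme. Searching the 9 boxes around a point approximately covers a circle of radius up to `boxKm`; near the poles, where longitude spans widen, the coverage is not exact.

```javascript
// Hypothetical box grid: equal-height latitude rows, each split into as many
// columns as fit at the row's most poleward (narrowest) edge.
const BOX_EARTH_RADIUS_KM = 6371; // mean radius, spherical model
const BOX_KM_PER_DEG = (Math.PI * BOX_EARTH_RADIUS_KM) / 180;
const BOX_COL_FACTOR = 100000; // boxId = row * factor + col (cols < 40031 for 1km)

function makeBoxGrid(boxKm) {
  if (!(boxKm >= 1)) throw new Error("boxKm must be at least 1 km");
  const rows = Math.floor((180 * BOX_KM_PER_DEG) / boxKm); // each row >= boxKm tall
  const dLat = 180 / rows;

  const rowOf = (lat) => Math.min(rows - 1, Math.floor((lat + 90) / dLat));

  function colsIn(row) {
    const lo = -90 + row * dLat;
    const edge = Math.max(Math.abs(lo), Math.abs(lo + dLat)); // narrowest edge
    const circumference =
      2 * Math.PI * BOX_EARTH_RADIUS_KM * Math.cos((edge * Math.PI) / 180);
    return Math.max(1, Math.floor(circumference / boxKm)); // polar rows: one cap
  }

  function colOf(row, lng) {
    const cols = colsIn(row);
    return Math.min(cols - 1, Math.floor(((lng + 180) / 360) * cols));
  }

  // Long/lat -> box number.
  function boxOf(lng, lat) {
    const row = rowOf(lat);
    return row * BOX_COL_FACTOR + colOf(row, lng);
  }

  // The current box plus its (up to) 8 neighbours, wrapping at +/-180 degrees.
  function neighbourBoxes(lng, lat) {
    const r = rowOf(lat);
    const ids = new Set();
    for (let row = r - 1; row <= r + 1; row++) {
      if (row < 0 || row >= rows) continue;
      const cols = colsIn(row);
      const c = colOf(row, lng);
      for (let dc = -1; dc <= 1; dc++) {
        ids.add(row * BOX_COL_FACTOR + ((((c + dc) % cols) + cols) % cols));
      }
    }
    return [...ids];
  }

  return { boxOf, neighbourBoxes };
}

// Great-circle (haversine) distance on the same spherical model.
function haversineKm([lng1, lat1], [lng2, lat2]) {
  const rad = (d) => (d * Math.PI) / 180;
  const dLat = rad(lat2 - lat1);
  const dLng = rad(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(rad(lat1)) * Math.cos(rad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * BOX_EARTH_RADIUS_KM * Math.asin(Math.min(1, Math.sqrt(a)));
}

// "Filter outside circle in JS": drop candidates that are too far away.
function filterWithinKm(docs, centre, km) {
  return docs.filter((d) => haversineKm(d.pos, centre) <= km);
}
```

In use, each advertiser document would store a hypothetical indexed `box: grid.boxOf(lng, lat)` field; a lookup queries `{ box: { $in: grid.neighbourBoxes(lng, lat) } }` and then applies `filterWithinKm`. Because the box number is a plain integer that keeps nearby points together, it could also serve as part of a shard key, as the slide suggests.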
Replication
Impact of Replication
Secondary reads: worked for this app
Beware - don't try this at home!
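Secondary reads are usually enabled through the read preference. A minimal sketch that builds a standard MongoDB connection string with the `replicaSet` and `readPreference` options; the host names and replica set name are hypothetical. Reads from secondaries can return stale data, which is one reason for the warning above.

```javascript
const READ_PREFERENCES = [
  "primary",
  "primaryPreferred",
  "secondary",
  "secondaryPreferred",
  "nearest",
];

// Builds a replica-set connection string; by default reads go to a
// secondary when one is available (readPreference=secondaryPreferred).
function replicaSetUri(hosts, database, replicaSet, readPreference = "secondaryPreferred") {
  if (!READ_PREFERENCES.includes(readPreference)) {
    throw new Error(`unknown read preference: ${readPreference}`);
  }
  const params = new URLSearchParams({ replicaSet, readPreference });
  return `mongodb://${hosts.join(",")}/${database}?${params.toString()}`;
}
```

The resulting string could be passed to `mongoose.connect`; keeping writes and read-your-own-write paths on the primary limits the impact of replication lag.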
Apply the production notes
• Change readahead from the default
• Disable NUMA & THP
• ext4 or XFS, mounted with noatime
• Load test the workload on different configurations:
  • Instance Store / EBS (PIOPS)
  • SSDs / spinning rust
  • AWS instance types
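On Linux, the active transparent huge pages (THP) mode is the bracketed word in `/sys/kernel/mm/transparent_hugepage/enabled` (for example `always madvise [never]`). A minimal sketch of a pre-flight check that reads and parses it; the helper names are hypothetical, and some older distributions use a different path (for example `redhat_transparent_hugepage` on RHEL 6).

```javascript
const fs = require("fs");

// Returns the active mode ("always", "madvise" or "never") from the kernel's
// "always madvise [never]" format, or null if no mode is bracketed.
function parseThpMode(text) {
  const m = /\[(\w+)\]/.exec(text);
  return m ? m[1] : null;
}

// Reads the setting from sysfs; returns null when the file does not exist
// (non-Linux hosts or kernels without THP).
function thpMode(path = "/sys/kernel/mm/transparent_hugepage/enabled") {
  try {
    return parseThpMode(fs.readFileSync(path, "utf8"));
  } catch (err) {
    if (err.code === "ENOENT") return null;
    throw err;
  }
}
```

A deployment script could, for example, warn when `thpMode()` returns anything other than `null` or `"never"` before starting mongod.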
Recap
5 Things We Learned
• Capacity planning/prototyping is a good idea, but performance is sensitive to the sample test data
• The MEAN stack rocks - fast to get started, but the profiler can help you understand what's under the hood
• Realtime/incremental aggregation works well with IoT workloads - the "MMS approach"
• Performance tuning patterns apply - "bottleneck whack-a-mole" & "slam-dunk optimization"
• With NodeJS/Express, the number of app servers becomes the bottleneck before MongoDB does
Next Steps
• Plan to publish as a blog post series and a GitHub project
• Check blog.mongodb.org
• Continue to explore…
Next Steps - continuation
• Hadoop/YARN for aggregations
• Use the "box" to geo-shard
• Try 2.6 bulk updates
• Dynamic angular-google-maps with socket.io
• Implement in another framework (Go/Clojure) to load MongoDB with less hardware
• Find the balance between batch and pre-aggregation (see next slide)
Learn More & Thank You
• Introduction to MEAN - Valeri Karpov: http://thecodebarbarian.wordpress.com/2013/07/22/introduction-to-the-mean-stack-part-one-setting-up-your-tools/
• MEAN.io: http://mean.io
• Richard Kreuter's webinar - M2M: http://www.mongodb.com/presentations/webinar-realizing-promise-machine-machine-m2m-mongodb
• Building MongoDB Into Your Internet of Things: http://blog.mongohq.com/building-mongodb-into-your-internet-of-things-a-tutorial/
• Schema design for time series data (MMS): http://blog.mongodb.org/post/65517193370/schema-design-for-time-series-data-in-mongodb