Running MongoDB in the Cloud
DESCRIPTION
A talk about how Wordnik migrated from EC2 to physical servers and back again, largely thanks to the cloud-friendliness of MongoDB
TRANSCRIPT
![Page 1: Running MongoDB in the Cloud](https://reader033.vdocuments.mx/reader033/viewer/2022052823/554f909ab4c905d25b8b51ae/html5/thumbnails/1.jpg)
Running MongoDB in the Cloud
Tony Tam
@fehguy
![Page 2: Running MongoDB in the Cloud](https://reader033.vdocuments.mx/reader033/viewer/2022052823/554f909ab4c905d25b8b51ae/html5/thumbnails/2.jpg)
What this Talk is About
Wordnik left the cloud and came back
• What?!?
• Why we left
• Decisions
• Why we came back (and what we did differently)
![Page 3: Running MongoDB in the Cloud](https://reader033.vdocuments.mx/reader033/viewer/2022052823/554f909ab4c905d25b8b51ae/html5/thumbnails/3.jpg)
Who is Wordnik?
•World’s fastest updating English dictionary
• Based on input of text at ~8k words/second
• Word Graph as basis to our analysis
• Synchronous & asynchronous processing
•Tens of billions of documents in NR storage
•Concept & Meaning Discovery Engine
•> 20M daily REST API calls, billions served
![Page 4: Running MongoDB in the Cloud](https://reader033.vdocuments.mx/reader033/viewer/2022052823/554f909ab4c905d25b8b51ae/html5/thumbnails/4.jpg)
So Why the Detour?
•Architectural Choices
•Business Choices
•Feedback, tooling, infrastructure
•Learning
•Changes in use case
•Progress!
![Page 5: Running MongoDB in the Cloud](https://reader033.vdocuments.mx/reader033/viewer/2022052823/554f909ab4c905d25b8b51ae/html5/thumbnails/5.jpg)
Architecture History
•EC2-based LAMP Stack
• POC (and seed funding)
• A manageable corpus < 1M records
•REST API
• Web + public
• MySQL in master/slave
• ~1B documents
• Operational nightmare
![Page 6: Running MongoDB in the Cloud](https://reader033.vdocuments.mx/reader033/viewer/2022052823/554f909ab4c905d25b8b51ae/html5/thumbnails/6.jpg)
Architecture History
•MongoDB
• First-order MySQL issues solved
• But it got slow…
•Real Servers to the rescue!
• Faster, bigger disks
•MongoDB for Corpus, Structured Data
• Faster Reads + Writes!
• More metal (72GB RAM)
• More cores
• “cold” query from 400ms to < 100ms
![Page 7: Running MongoDB in the Cloud](https://reader033.vdocuments.mx/reader033/viewer/2022052823/554f909ab4c905d25b8b51ae/html5/thumbnails/7.jpg)
Why Change?
Easy!
•Can’t beat metal…except
• Quick expansion
• Batch jobs/experiments
• Add a datacenter
• Full cluster migration
• The bill for unused capacity
![Page 8: Running MongoDB in the Cloud](https://reader033.vdocuments.mx/reader033/viewer/2022052823/554f909ab4c905d25b8b51ae/html5/thumbnails/8.jpg)
Architectural Mindshift
1. Anything can die, anytime
2. Centralized, redundant state (see point 1)
3. Server performance is *different*
• CPU, I/O, Memory—choose one
• Smart design makes it work!
![Page 9: Running MongoDB in the Cloud](https://reader033.vdocuments.mx/reader033/viewer/2022052823/554f909ab4c905d25b8b51ae/html5/thumbnails/9.jpg)
Architectural Mindshift
•Your software will need to change!
• So will the components you rely on
![Page 10: Running MongoDB in the Cloud](https://reader033.vdocuments.mx/reader033/viewer/2022052823/554f909ab4c905d25b8b51ae/html5/thumbnails/10.jpg)
Your Infrastructure
•Deploying Servers
• Going to need a lot!
•Configuration
•Updates to your software
What about Data?
Cloud Hero
![Page 11: Running MongoDB in the Cloud](https://reader033.vdocuments.mx/reader033/viewer/2022052823/554f909ab4c905d25b8b51ae/html5/thumbnails/11.jpg)
Let’s make this Work!
•MySQL Master Slave
• Take a snapshot (yes, this will block)
• Keep your binlogs!
change master to MASTER_HOST='app1', MASTER_USER='XXXX', MASTER_PASSWORD='XXXX', MASTER_LOG_FILE='app1-relay.0038774', MASTER_LOG_POS=6754205951;
![Page 12: Running MongoDB in the Cloud](https://reader033.vdocuments.mx/reader033/viewer/2022052823/554f909ab4c905d25b8b51ae/html5/thumbnails/12.jpg)
Let’s make this Work!
But…
•Your master is down!
• Quick, promote a slave!
• Point the other slaves to the new master
•As for the clients…
“Well, we never really tried that…”
![Page 13: Running MongoDB in the Cloud](https://reader033.vdocuments.mx/reader033/viewer/2022052823/554f909ab4c905d25b8b51ae/html5/thumbnails/13.jpg)
Better with Mongo
•Easy up, easy down!
• Startup: Sync your data, and announce to clients when ready for business
• Shutdown: Announce your departure and leave
•Replica sets
rs.add("db4.wordnik.com:27017");
rs.remove("db1.wordnik.com:27017");
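A rough sketch of the bookkeeping behind adding and removing members, written as plain JavaScript over a config document so it runs without a server. The helper names `addMember` and `removeMember`, the set name `wordnik`, and the hosts db2 and db3 are assumptions for illustration; this is a simplified model, not the shell's actual implementation.

```javascript
// Hypothetical helpers: pure functions over a replica set config document.
// They model the idea behind rs.add()/rs.remove(): change the member list
// and bump the config version so the set knows a reconfig happened.

function addMember(config, host) {
  // Pick the next free member _id (one above the current maximum).
  const maxId = config.members.reduce((max, m) => Math.max(max, m._id), -1);
  return {
    ...config,
    version: config.version + 1,
    members: [...config.members, { _id: maxId + 1, host }],
  };
}

function removeMember(config, host) {
  const members = config.members.filter((m) => m.host !== host);
  if (members.length === config.members.length) {
    throw new Error(`no member with host ${host}`);
  }
  return { ...config, version: config.version + 1, members };
}

// Assumed starting config: set name and hosts db2/db3 are made up.
const config = {
  _id: "wordnik",
  version: 1,
  members: [
    { _id: 0, host: "db1.wordnik.com:27017" },
    { _id: 1, host: "db2.wordnik.com:27017" },
    { _id: 2, host: "db3.wordnik.com:27017" },
  ],
};

const grown = addMember(config, "db4.wordnik.com:27017");
const shrunk = removeMember(grown, "db1.wordnik.com:27017");
console.log(shrunk.members.map((m) => m.host));
```

Each call returns a new config rather than mutating the old one, which keeps the "easy up, easy down" steps easy to reason about and to roll back.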
![Page 14: Running MongoDB in the Cloud](https://reader033.vdocuments.mx/reader033/viewer/2022052823/554f909ab4c905d25b8b51ae/html5/thumbnails/14.jpg)
Better with Mongo
![Page 15: Running MongoDB in the Cloud](https://reader033.vdocuments.mx/reader033/viewer/2022052823/554f909ab4c905d25b8b51ae/html5/thumbnails/15.jpg)
But what about Performance?
•Software Design
• It’s slow! (What is *it*?)
• Profile everything
import com.wordnik.util.perf._
...
def findUser(id:Long): User = {
  Profile("UserDao::findUserById", dao.findUserById(id))
}
http://github.com/wordnik/wordnik-oss
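To make the idea concrete, here is a minimal sketch of a named-timer wrapper in JavaScript. It is not the wordnik-oss implementation; the `profile` function, the `counters` map, and the in-memory `users` store are all hypothetical stand-ins.

```javascript
// Hypothetical profiler: wraps a call, records count / total / max per name.
const counters = new Map();

function profile(name, fn) {
  const start = process.hrtime.bigint();
  try {
    return fn();
  } finally {
    // Record timing even if fn throws, so failures are still visible.
    const elapsedNs = Number(process.hrtime.bigint() - start);
    const c = counters.get(name) || { count: 0, totalNs: 0, maxNs: 0 };
    c.count += 1;
    c.totalNs += elapsedNs;
    c.maxNs = Math.max(c.maxNs, elapsedNs);
    counters.set(name, c);
  }
}

// Stand-in for a DAO backed by a database (assumption for the example).
const users = new Map([[1, { id: 1, name: "tony" }]]);

function findUser(id) {
  return profile("UserDao::findUserById", () => users.get(id));
}

findUser(1);
findUser(2);
console.log(counters.get("UserDao::findUserById"));
```

Because every call site is named, "it's slow" turns into a specific counter with a call count and a worst-case time.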
![Page 16: Running MongoDB in the Cloud](https://reader033.vdocuments.mx/reader033/viewer/2022052823/554f909ab4c905d25b8b51ae/html5/thumbnails/16.jpg)
But what about Performance?
![Page 17: Running MongoDB in the Cloud](https://reader033.vdocuments.mx/reader033/viewer/2022052823/554f909ab4c905d25b8b51ae/html5/thumbnails/17.jpg)
But what about Performance?
•“It’s the database!”
• What is it?
•Mapping layer
• MySQL (12+ joins) => 50 records/sec
• Mongo JSON POJO => 1000 records/sec
• Mongo DBO POJO => 35,000 records/sec
•How do you know? Profile it!
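The gap between the two Mongo numbers comes from the mapping path, not the database. The sketch below assumes "JSON POJO" means serializing the driver's document to JSON text and parsing it back before building the object, while "DBO POJO" means reading fields straight off the document. That reading is an assumption, as are the `WordRecord` class and the generated sample data. The rates it prints will not match the slide's figures.

```javascript
// Hypothetical domain object standing in for a POJO.
class WordRecord {
  constructor(id, word, count) {
    this.id = id;
    this.word = word;
    this.count = count;
  }
}

// Path 1: document -> JSON text -> parsed object -> domain object.
function viaJson(doc) {
  const parsed = JSON.parse(JSON.stringify(doc));
  return new WordRecord(parsed._id, parsed.word, parsed.count);
}

// Path 2: document -> domain object, no intermediate text.
function direct(doc) {
  return new WordRecord(doc._id, doc.word, doc.count);
}

// Measure throughput of a mapper over a batch of documents.
function recordsPerSecond(mapper, docs) {
  const start = process.hrtime.bigint();
  for (const doc of docs) mapper(doc);
  const seconds = Number(process.hrtime.bigint() - start) / 1e9;
  return docs.length / seconds;
}

// Made-up sample documents.
const docs = Array.from({ length: 50000 }, (_, i) => ({
  _id: i,
  word: `word${i}`,
  count: i % 7,
}));

console.log("via JSON:", Math.round(recordsPerSecond(viaJson, docs)), "records/sec");
console.log("direct:  ", Math.round(recordsPerSecond(direct, docs)), "records/sec");
```

Both paths produce the same object; only the cost of getting there differs, which is why it has to be measured rather than guessed.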
![Page 18: Running MongoDB in the Cloud](https://reader033.vdocuments.mx/reader033/viewer/2022052823/554f909ab4c905d25b8b51ae/html5/thumbnails/18.jpg)
It’s Still Slow!
•It’s the index!
• How do you know?
• AHHHHH
![Page 19: Running MongoDB in the Cloud](https://reader033.vdocuments.mx/reader033/viewer/2022052823/554f909ab4c905d25b8b51ae/html5/thumbnails/19.jpg)
It’s Still Slow!
•Balance your B-Tree
• Can't always keep the index in RAM. MMF (memory-mapped files) "does its thing"
• Right-balanced b-tree keeps necessary index hot
• If you hit indexes on disk, mute your pager
[Figure: B-tree diagram with node values 17, 15, 27]
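One way to see why key order matters: model the index as a sorted array and watch where new keys land. This is a toy model, not MongoDB's B-tree. The key generators, the 10% "right edge" cutoff, and the function names are assumptions chosen for illustration.

```javascript
// Insert key into a sorted array using binary search; return its position.
function insertSorted(keys, key) {
  let lo = 0;
  let hi = keys.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (keys[mid] < key) lo = mid + 1;
    else hi = mid;
  }
  keys.splice(lo, 0, key);
  return lo;
}

// Fraction of inserts that land in the right-most 10% of the index.
function rightEdgeFraction(makeKey, n) {
  const keys = [];
  let rightEdge = 0;
  for (let i = 0; i < n; i++) {
    const pos = insertSorted(keys, makeKey(i));
    if (pos >= Math.floor((keys.length - 1) * 0.9)) rightEdge++;
  }
  return rightEdge / n;
}

// Increasing keys (a zero-padded counter stands in for a time-ordered id).
const increasing = (i) => String(i).padStart(12, "0");
// Scattered keys (deterministic pseudo-random spread across the key space).
const scattered = (i) => String((i * 7919) % 10007).padStart(12, "0");

console.log("increasing keys:", rightEdgeFraction(increasing, 1000));
console.log("scattered keys: ", rightEdgeFraction(scattered, 1000));
```

With increasing keys every insert touches the right edge, so only that part of the index needs to stay in memory. Scattered keys touch the whole index, which is when pages start coming from disk.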
![Page 20: Running MongoDB in the Cloud](https://reader033.vdocuments.mx/reader033/viewer/2022052823/554f909ab4c905d25b8b51ae/html5/thumbnails/20.jpg)
But it’s Still Slow!
•Look at your Schema design
• Design to limit index size/number
• _id is your friend—make it meaningful
• Record size consistency
• Hierarchical Data beware!
• Split documents even in same collection!
db.posts.find({_id:/^tony_posts_/})
{_id:"tony_posts_1", posts:[...]}
{_id:"tony_posts_2", posts:[...]}
{_id:"tony_posts_3", posts:[...]}
YOUR app knows best
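A small sketch of the split-document pattern in plain JavaScript, with no database involved. The `splitPosts` helper, the bucket size, and the sample posts are assumptions. The `_id` naming follows the slide.

```javascript
// Split one user's posts into fixed-size documents with meaningful _ids,
// so no single document grows without bound.
function splitPosts(user, posts, perDoc) {
  const docs = [];
  for (let i = 0; i < posts.length; i += perDoc) {
    docs.push({
      _id: `${user}_posts_${docs.length + 1}`,
      posts: posts.slice(i, i + perDoc),
    });
  }
  return docs;
}

// Made-up posts, two per document.
const postDocs = splitPosts("tony", ["a", "b", "c", "d", "e"], 2);

// Same anchored prefix pattern the slide passes to db.posts.find().
const prefix = /^tony_posts_/;
const allPosts = postDocs
  .filter((d) => prefix.test(d._id))
  .flatMap((d) => d.posts);

console.log(postDocs.map((d) => d._id));
console.log(allPosts);
```

Because the pattern is anchored at the start of `_id`, MongoDB can serve it from the `_id` index, so no extra index is needed to fetch every document for one user.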
![Page 21: Running MongoDB in the Cloud](https://reader033.vdocuments.mx/reader033/viewer/2022052823/554f909ab4c905d25b8b51ae/html5/thumbnails/21.jpg)
Really, it’s STILL slow!
•Your monolithic app/DB won’t scale the same on VMs
•Specialize!
• Wordnik uses mSOA
• Data tiers follow service types
• Smaller *everything*
Powered API: swagger.wordnik.com
![Page 22: Running MongoDB in the Cloud](https://reader033.vdocuments.mx/reader033/viewer/2022052823/554f909ab4c905d25b8b51ae/html5/thumbnails/22.jpg)
Really, it’s STILL slow!
Powered API: swagger.wordnik.com
A contract for your clients
![Page 23: Running MongoDB in the Cloud](https://reader033.vdocuments.mx/reader033/viewer/2022052823/554f909ab4c905d25b8b51ae/html5/thumbnails/23.jpg)
Be the Boss of your Data
•Your app *should* be smarter than your DB
• Lots of users?
• Lots of blog posts?
• Lots of images?
• Shard? On what?
•Data dimensionality
• Keep active data hot
• Don’t try to boil the ocean
![Page 24: Running MongoDB in the Cloud](https://reader033.vdocuments.mx/reader033/viewer/2022052823/554f909ab4c905d25b8b51ae/html5/thumbnails/24.jpg)
Cloud Computing + Mongo
•It can work extremely well
• No “Save as Cloud!” menu item
•Shifting constraints
• Optimize for RAM on VM
• Virtual disk => virtual performance
•Be “Deployable”
• Mongo Replica Sets are made for this
![Page 25: Running MongoDB in the Cloud](https://reader033.vdocuments.mx/reader033/viewer/2022052823/554f909ab4c905d25b8b51ae/html5/thumbnails/25.jpg)
Cloud Computing + Mongo
•System Durability
• Design your software for abuse
• Your old design doesn’t apply
• Add APM hooks, now!
•Dissect your app
• Build toward microservices with dedicated MongoDB clusters
•Deployment Infrastructure
• Don’t wait until it’s too late
![Page 26: Running MongoDB in the Cloud](https://reader033.vdocuments.mx/reader033/viewer/2022052823/554f909ab4c905d25b8b51ae/html5/thumbnails/26.jpg)
See More
• See more about Wordnik APIs
http://developer.wordnik.com
• Migrating from MySQL to MongoDB
http://www.slideshare.net/fehguy/migrating-from-mysql-to-mongodb-at-wordnik
• Maintaining your MongoDB Installation
http://www.slideshare.net/fehguy/mongo-sv-tony-tam
• Swagger API Framework
http://swagger.wordnik.com
• Mapping Benchmark
https://github.com/fehguy/mongodb-benchmark-tools
• Wordnik OSS Tools
https://github.com/wordnik/wordnik-oss
![Page 27: Running MongoDB in the Cloud](https://reader033.vdocuments.mx/reader033/viewer/2022052823/554f909ab4c905d25b8b51ae/html5/thumbnails/27.jpg)
Questions?