humongous mongodb · agenda • scaling mongodb - concepts • replica sets & sharding • read...
TRANSCRIPT
![Page 1: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/1.jpg)
Humongous MongoDBSean Corfield
World Singles llc
1
![Page 2: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/2.jpg)
Agenda
• Scaling MongoDB - Concepts
• Replica Sets & Sharding
• Read Preference, Write Concern, Etc
• Map/Reduce
• Aggregation
2
![Page 3: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/3.jpg)
You Might Prefer...
• Queries and Looping and Grouping
• Enterprise JavaScript Workflows
• Using PhoneGap to Build Mobile Apps
• Deep Dive - two hours, halfway thru!
• Caching in ColdFusion 10
3
![Page 4: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/4.jpg)
Me
• Functional Programming in the 80's
• Object-Oriented Programming in the 90's
• Web / Dynamic Programming in the 00's
• Mostly Clojure today
4
![Page 5: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/5.jpg)
Me & MongoDB
• Using MongoDB in production (2 years)
• Took 10gen "MongoDB for Developers" course (Python + MongoDB)
• Lead maintainer for Clojure's MongoDB wrapper "CongoMongo"
5
![Page 6: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/6.jpg)
Agenda
• Scaling MongoDB - Concepts
• Replica Sets & Sharding
• Read Preference, Write Concern, Etc
• Map/Reduce
• Aggregation
6
![Page 7: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/7.jpg)
Scaling Concepts
• Master / slave for traditional DBs
• One master
• One or more slaves
• Usually scale "up" not "out"
7
![Page 8: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/8.jpg)
Scaling Concepts
• MongoDB replaces "master / slave" with "replica set" for failover & load distribution
• MongoDB adds sharded clusters to support very large data sets - horizontal scale out
8
![Page 9: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/9.jpg)
Replica Set Concepts
• A replica set contains any number of (mostly) identical nodes
• A subset of these vote to elect a primary
• All remaining nodes are then secondaries
• Writes go to the primary node and replicate to the secondary nodes
9
![Page 10: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/10.jpg)
Replica Set Concepts
• Reads generally go to the primary node but can be performed against secondaries
• Per connection or per operation
• Can also be targeted to specific nodes
• Tagged of nodes and reads
10
![Page 11: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/11.jpg)
Replica Set Concepts
• Nodes may also be
• Secondary only
• Hidden
• Arbiter
• Non-voting
• Delayed
11
![Page 12: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/12.jpg)
Sharding Concepts
• Multiple "shards"
• Servers or clusters (replica sets)
• Configuration server (or a cluster)
• Shard server proxy (mongos process)
• Lightweight, can have one per app server
12
![Page 13: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/13.jpg)
mongos
app
mongos
app
mongos
app
config
config
config
shard primary
shard primary
shard primary
shard sec'dary
shard sec'dary
shard sec'dary
shard sec'dary
shard sec'dary
shard sec'dary
13
![Page 14: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/14.jpg)
Sharding Concepts
• A collection is split across the shards
• Automatic, based on a key column
• Automatically balanced across shards
• Reads directed to appropriate shards
• May run on all shards, then aggregate
14
![Page 15: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/15.jpg)
Agenda
• Scaling MongoDB - Concepts
• Replica Sets & Sharding
• Read Preference, Write Concern, Etc
• Map/Reduce
• Aggregation
15
![Page 16: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/16.jpg)
Replica Set Setup
• Start MongoDB servers as replica nodes
• add --replSet {rsName}
• specify unique ports and --dbpath folders!
16
![Page 17: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/17.jpg)
Replica Set Setup
• Connect via mongo shell to one server
• initiate the replica set
• add the other servers to that
17
![Page 18: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/18.jpg)
Replica Sets
• Either:conf = { _id: "rsName", members: [ { _id: 0, host: "server-x:27017" } ] }rs.initiate( conf )
18
![Page 19: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/19.jpg)
Replica Set Setup
• Or:rs.initiate()
• Creates default rs.conf()
• Now add the others:rs.add( "server-y:27017" )rs.add( "server-z:27017" )
19
![Page 20: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/20.jpg)
Replica Set Demo
• Let's see a real replica set running locally!
• Must use local machine name
• Must use different ports for each instance
20
![Page 21: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/21.jpg)
Sharding Setup
• Start config server(s)
• mongod --configsvr --dbpath /data/cfg
• Start mongos process(es)
• mongos --configdb server-c:27019
• Start server (or replica set) for each shard
• mongod --dbpath /data/sh1
21
![Page 22: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/22.jpg)
Sharding Setup
• Add each shard to configuration
• sh.addShard("server-s:27100")
• Enable sharding for the database
• sh.enableSharding("mydb")
• Enable sharding for a collection
• sh.shardCollection("mydb.coll",{thekey:1})
22
![Page 23: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/23.jpg)
Sharding In Use
• Connect to the mongos server (or cluster)
• Interact with the database as usual
• Data automatically moved between shards
• Reads automatically routed based on key
23
![Page 24: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/24.jpg)
Sharding In Use
• If a query includes the shard key, it will be routed directly to the appropriate server
• If a query does not include the shard key (or uses a range), the query will be sent to multiple servers and the results merged in the mongos process
24
![Page 25: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/25.jpg)
Sharding Demo
• Let's see it running!
25
![Page 26: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/26.jpg)
Agenda
• Scaling MongoDB - Concepts
• Replica Sets & Sharding
• Read Preference, Write Concern, Etc
• Map/Reduce
• Aggregation
26
![Page 27: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/27.jpg)
Read Preference
• Normally all reads go to the primary
• Just like writes
• As we saw, you cannot normally read from a secondary unless you explicitly allow it
• Sometimes you want to spread the read load or read from nearby servers (in a geographically distributed cluster)
27
![Page 28: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/28.jpg)
Read Preference
• Available options:
• primary - default
• primaryPreferred
• secondary
• secondaryPreferred
• nearest
28
![Page 29: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/29.jpg)
Read Preference
• Secondaries can return stale data!
• Can specify preference
• Per connection
• Per collection
• Per operation
• Not currently supported by cfmongodb
29
![Page 30: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/30.jpg)
Write Concern
• By default, receipt of a write command is acknowledged by the server (but may not yet have been written to disk)
• You can control what operations you wait for before a write command returns
30
![Page 31: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/31.jpg)
Write Concern
• Errors Ignored - do not use!
• Unacknowledged
• Fire and forget
• Network errors are detected
• Used to be the default
31
![Page 32: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/32.jpg)
Write Concern
• Acknowledged
• Current default (as of late 2012)
• Network errors, duplicate keys etc
• Journaled
• The update is written to local journal
• Durable - will survive shutdown / crash
32
![Page 33: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/33.jpg)
Write Concern
• Replica Acknowledged
• The update is written to one or more secondaries in a replica set
• Can specify number or "majority"
• Specifying number will block until that many secondaries have the write (and therefore it can block "forever"!)
33
![Page 34: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/34.jpg)
Write Concern
• cfmongodb supports per-connection only
• Not per-operation
• mongoClientOptions arg to MongoConfig
• writeConcern field of that struct
34
![Page 35: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/35.jpg)
WriteConcern
• Retrieve from a Java class
• mongoFactory.getObject( "com.mongodb.WriteConcern" ).UNACKNOWLEDGED
• http://api.mongodb.org/java/2.10.1/com/mongodb/WriteConcern.html
35
![Page 36: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/36.jpg)
Agenda
• Scaling MongoDB - Concepts
• Replica Sets & Sharding
• Read Preference, Write Concern, Etc
• Map/Reduce
• Aggregation
36
![Page 37: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/37.jpg)
Map/Reduce
• Intended for complex data processing
• Batch operation, not real time!
• You provide map, reduce, finalize functions written in JavaScript (as strings!)
37
![Page 38: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/38.jpg)
Map/Reduce
• people = mongo.getDBCollection("people");people.mapReduce( map = "function(){ ...}", reduce = "function(key,values){ ... }", outputTarget = ... );
• For finalize you must use a DB command
• Examples in cfmongodb aggregation folder
38
![Page 39: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/39.jpg)
Agenda
• Scaling MongoDB - Concepts
• Replica Sets & Sharding
• Read Preference, Write Concern, Etc
• Map/Reduce
• Aggregation
39
![Page 40: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/40.jpg)
Aggregation Framework
• Added in MongoDB 2.2
• Native, pipeline-based functions
• project (SELECT), match (WHERE), group (GROUP BY), sort (ORDER BY), unwind, skip, limit, geoNear (new in 2.4)
• Simple aggregate() function takes each operation as an argument in order
40
![Page 41: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/41.jpg)
Aggregation Framework
• result = musicians.aggregate( { "$group" : { "_id" : "$status", "total" : { "$sum" = 1 } } }, { "$project" : { "status" : "$_id", "numberOfMusicians" : "$total", "_id" : 0 } });
41
![Page 42: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/42.jpg)
Aggregation Framework
• Equivalent to
• SELECT COUNT(*) AS total, statusFROM musiciansGROUP BY status
• More examples in cfmongodb aggregation folder
42
![Page 43: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/43.jpg)
Summary
• MongoDB
• Supports simple, robust clustering with automatic failover
• Supports data sharding to provide automatic horizontal scalability
• Provides plenty of control over reading and writing in a clustered environment
43
![Page 44: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/44.jpg)
Summary
• cfmongodb supports
• Robust, scalable clustering
• Big Data manipulation through map/reduce and the aggregation framework
• Write Concern (partially)
• It's open source - contribute!
44
![Page 45: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/45.jpg)
Resources
• http://cfmongodb.riaforge.org
• http://mongodb.org
• http://docs.mongodb.org
• http://www.10gen.com
45
![Page 46: Humongous MongoDB · Agenda • Scaling MongoDB - Concepts • Replica Sets & Sharding • Read Preference, Write Concern, Etc • Map/Reduce • Aggregation 2](https://reader034.vdocuments.mx/reader034/viewer/2022051815/603fe4f7eb21f62e7917d5ea/html5/thumbnails/46.jpg)
Q & A?
• @seancorfield
• http://corfield.org
46