Introduction to new high performance storage engines in MongoDB 2.8
Henrik Ingo, Solutions Architect, MongoDB
(MongoDB 2.8 was released as MongoDB 3.0)
Hi, I am Henrik Ingo
@h_ingo
Agenda:
- MongoDB and NoSQL
- Storage Engine API
- WiredTiger configuration + performance
Most popular NoSQL database
5 NoSQL categories
• Key Value: Redis, Riak
• Wide Column: Cassandra
• Document: MongoDB
• Graph: Neo4j
• Map Reduce: Hadoop
MongoDB is a Document Database
MongoDB
Rich Queries
• Find Paul's cars
• Find everybody in London with a car built between 1970 and 1980
Geospatial
• Find all of the car owners within 5km of Trafalgar Sq.
Text Search
• Find all the cars described as having leather seats
Aggregation
• Calculate the average value of Paul's car collection
Map Reduce
• What is the ownership pattern of colors by geography over time? (is purple trending up in China?)
{
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123, 47.232],
  cars: [
    { model: 'Bentley', year: 1973, value: 100000, … },
    { model: 'Rolls Royce', year: 1965, value: 330000, … }
  ]
}
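Two of the questions above can be sketched as mongo shell arguments. This is only an illustration: the collection name `owners` is an assumption, the elided fields ("…") are left out, and the last lines recompute the aggregation result locally in plain JavaScript rather than on a server.

```javascript
// The example document from the slide, without the elided fields.
const paul = {
  first_name: "Paul",
  surname: "Miller",
  city: "London",
  location: [45.123, 47.232],
  cars: [
    { model: "Bentley", year: 1973, value: 100000 },
    { model: "Rolls Royce", year: 1965, value: 330000 },
  ],
};

// Rich query: everybody in London with a car built between 1970 and 1980.
// $elemMatch makes both year bounds apply to the same array element.
const londonSeventiesCars = {
  city: "London",
  cars: { $elemMatch: { year: { $gte: 1970, $lte: 1980 } } },
};

// Aggregation: average value of Paul's car collection.
const averageCarValue = [
  { $match: { first_name: "Paul", surname: "Miller" } },
  { $unwind: "$cars" },
  { $group: { _id: "$_id", avg_value: { $avg: "$cars.value" } } },
];

// In the mongo shell (hypothetical collection "owners"):
//   db.owners.find(londonSeventiesCars)
//   db.owners.aggregate(averageCarValue)

// Local check of what the pipeline should compute for this one document.
const localAverage =
  paul.cars.reduce((sum, car) => sum + car.value, 0) / paul.cars.length;
console.log(localAverage); // 215000
```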
Operational Database Landscape
MongoDB 3.0 & storage engines
Current state in MongoDB 2.6
Read-heavy apps
• Great performance
  – B-tree
  – Low overhead
• Good scale-out perf
  – Secondary reads
  – Sharding
Write-heavy apps
• Good scale-out perf
  – Sharding
• Per-node efficiency wish-list:
  – Doc level locking
  – Write-optimized data structures (LSM)
  – Compression
Other
• Complex transactions
• In-memory engine
• SSD optimized engine
• etc...
How to get all of the above?
MongoDB 3.0 Storage Engine API
• MMAP: Read-heavy app
• WiredTiger: Write-heavy app
• 3rd party: Special app
• One at a time:
  – Many engines built into mongod
  – Choose 1 at startup
  – All data stored by the same engine
  – Incompatible on-disk data formats (obviously)
  – Compatible client API
• Compatible Oplog & Replication
  – Same replica set can mix different engines
  – No-downtime migration possible
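Because a replica set can mix engines, a no-downtime migration can be done one member at a time. The sketch below only orders the steps; it assumes each member is resynced from an empty dbpath after restarting with the new engine, and `planRollingMigration` and the host names are hypothetical, not part of MongoDB.

```javascript
// Hypothetical helper: lists the steps of a rolling engine migration,
// secondaries first, the primary last (after stepping it down).
function planRollingMigration(members, targetEngine) {
  const stepsFor = (host) => [
    `stop mongod on ${host}`,
    `empty the dbpath on ${host}`,
    `start mongod on ${host} with --storageEngine ${targetEngine}`,
    `wait for ${host} to finish its initial sync`,
  ];
  const steps = [];
  for (const member of members.filter((m) => m.state === "SECONDARY")) {
    steps.push(...stepsFor(member.host));
  }
  const primary = members.find((m) => m.state === "PRIMARY");
  if (primary) {
    // rs.stepDown() hands the primary role to an already migrated member.
    steps.push(`run rs.stepDown() on ${primary.host}`);
    steps.push(...stepsFor(primary.host));
  }
  return steps;
}

const migrationPlan = planRollingMigration(
  [
    { host: "db1.example.net", state: "PRIMARY" },
    { host: "db2.example.net", state: "SECONDARY" },
    { host: "db3.example.net", state: "SECONDARY" },
  ],
  "wiredTiger"
);
migrationPlan.forEach((step, i) => console.log(`${i + 1}. ${step}`));
```

At every point in the plan at most one member is offline, so a three-member set keeps a majority and stays writable.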
Some existing engines
• MMAPv1
  – Improved MMAP (collection-level locking)
• WiredTiger
  – Discussed next
• RocksDB
  – LSM style engine developed by Facebook
  – Based on LevelDB
• TokuMXse
  – Fractal Tree indexing engine from Tokutek
Some rumored engines
• Heap
  – In-memory engine
• Devnull
  – Write all data to /dev/null
  – Based on idea from famous flash animation...
  – Oplog stored as normal
• SSD optimized engine (e.g. Fusion-IO)
• KV simple key-value engine
https://github.com/mongodb/mongo/tree/master/src/mongo/db/storage
WiredTiger
What is WiredTiger
• Modern NoSQL database engine
  – flexible schema
• Advanced database engine
  – Secondary indexes, MVCC, non-locking algorithms
  – Multi-statement transactions (not in MongoDB 3.0)
• Very modular, tunable
  – Btree, LSM and columnar indexes
  – Snappy, Zlib, 3rd-party compression
  – Index prefix compression, etc...
• Built by creators of BerkeleyDB
• Acquired by MongoDB in 2014
• source.wiredtiger.com
Choosing WiredTiger at server startup
mongod --storageEngine wiredTiger
http://docs.mongodb.org/master/reference/program/mongod/#cmdoption--storageEngine
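After startup it is worth confirming which engine is actually running. A minimal sketch, assuming the `storageEngine` section that `db.serverStatus()` reports in MongoDB 3.0; the sample object below stands in for a real server reply, and `activeStorageEngine` is a hypothetical helper.

```javascript
// Returns the engine name from a serverStatus document,
// or "unknown" when the section is missing.
function activeStorageEngine(serverStatus) {
  return serverStatus.storageEngine
    ? serverStatus.storageEngine.name
    : "unknown";
}

// Stand-in for the reply of db.serverStatus() on a WiredTiger node.
const sampleStatus = { storageEngine: { name: "wiredTiger" } };
console.log(activeStorageEngine(sampleStatus)); // wiredTiger

// In the mongo shell: activeStorageEngine(db.serverStatus())
```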
Main tunables exposed as MongoDB options
mongod --storageEngine wiredTiger \
  --wiredTigerCacheSizeGB 8 \
  --wiredTigerDirectoryForIndexes /data/indexes \
  --wiredTigerCollectionBlockCompressor zlib \
  --syncDelay 30
http://docs.mongodb.org/master/reference/program/mongod/#cmdoption--storageEngine
All WiredTiger options via configString (hidden)
mongod --storageEngine wiredTiger \
  --wiredTigerEngineConfigString "cache_size=8GB,eviction=(threads_min=4,threads_max=8),checkpoint=(wait=30)" \
  --wiredTigerCollectionConfigString "block_compressor=zlib" \
  --wiredTigerIndexConfigString "type=lsm,block_compressor=zlib" \
  --wiredTigerDirectoryForIndexes /data/indexes
See docs for wiredtiger_open() & WT_SESSION::create():
http://source.wiredtiger.com/2.5.0/group__wt.html#ga9e6adae3fc6964ef837a62795c7840ed
http://source.wiredtiger.com/2.5.0/struct_w_t___s_e_s_s_i_o_n.html#a358ca4141d59c345f401c58501276bbb
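WiredTiger configuration strings are comma-separated `key=value` pairs, with nested groups written as `key=(...)`. Hand-writing them invites misplaced parentheses, so one option is to generate them from an object. `toConfigString` is a hypothetical helper for illustration, not a MongoDB or WiredTiger API.

```javascript
// Serializes a nested options object into WiredTiger's
// "key=value,group=(key=value,...)" configuration syntax.
function toConfigString(options) {
  return Object.entries(options)
    .map(([key, value]) =>
      value !== null && typeof value === "object"
        ? `${key}=(${toConfigString(value)})`
        : `${key}=${value}`
    )
    .join(",");
}

const engineConfig = toConfigString({
  cache_size: "8GB",
  eviction: { threads_min: 4, threads_max: 8 },
  checkpoint: { wait: 30 },
});
console.log(engineConfig);
// cache_size=8GB,eviction=(threads_min=4,threads_max=8),checkpoint=(wait=30)
```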
Also via createCollection(), createIndex()
db.createCollection( "users", { storageEngine: { wiredTiger: { configString: "block_compressor=none" } } } )
http://docs.mongodb.org/master/reference/method/db.createCollection/#db.createCollection
http://docs.mongodb.org/master/reference/method/db.collection.createIndex/#db.collection.createIndex
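The same `storageEngine` option document works for both calls. A small sketch: `wiredTigerOptions` is a hypothetical helper, and the index on `surname` with `prefix_compression=true` is only an example of a WT_SESSION::create() setting, not a recommendation.

```javascript
// Wraps a WiredTiger config string in the option document that
// createCollection() and createIndex() accept.
function wiredTigerOptions(configString) {
  return { storageEngine: { wiredTiger: { configString } } };
}

const collectionOptions = wiredTigerOptions("block_compressor=none");
const indexOptions = wiredTigerOptions("prefix_compression=true");

// In the mongo shell:
//   db.createCollection("users", collectionOptions)
//   db.users.createIndex({ surname: 1 }, indexOptions)

console.log(JSON.stringify(collectionOptions));
// {"storageEngine":{"wiredTiger":{"configString":"block_compressor=none"}}}
```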
More...
• db.serverStatus()
• db.collection.stats()
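One use of these statistics is checking how full the WiredTiger cache is. The sketch below reads two counters from the `wiredTiger.cache` section of `db.serverStatus()`; the sample numbers are made up, and `cacheFillRatio` is a hypothetical helper.

```javascript
// Fraction of the configured WiredTiger cache currently in use.
function cacheFillRatio(serverStatus) {
  const cache = serverStatus.wiredTiger.cache;
  return (
    cache["bytes currently in the cache"] / cache["maximum bytes configured"]
  );
}

// Stand-in for db.serverStatus() on a node with an 8 GB cache, 6 GB used.
const sampleCacheStatus = {
  wiredTiger: {
    cache: {
      "bytes currently in the cache": 6 * 1024 ** 3,
      "maximum bytes configured": 8 * 1024 ** 3,
    },
  },
};
console.log(cacheFillRatio(sampleCacheStatus)); // 0.75

// In the mongo shell: cacheFillRatio(db.serverStatus())
```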
Understanding and Optimizing WiredTiger
Understanding WiredTiger architecture
[Diagram, top to bottom]
WiredTiger SE
• Btree, LSM, Columnar
• Cache (default: 50%)
• None, Snappy, Zlib
OS Disk Cache (default: 50%)
Physical disk
Covering 90% of your optimization needs
[Architecture diagram, annotated with]
• Decompression time
• Disk seek time
Strategy 1: fit working set in Cache
[Architecture diagram, annotated with]
• WiredTiger cache: cache_size = 80% (default: 50%)
Strategy 2: fit working set in OS Disk Cache
[Architecture diagram, annotated with]
• WiredTiger cache: cache_size = 10% (default: 50%)
• OS Disk Cache (Remaining: 90%)
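The percentages in Strategies 1 and 2 have to be turned into an absolute size for `--wiredTigerCacheSizeGB`. A rough sketch, assuming the flag takes a whole number of gigabytes and using a hypothetical 64 GB host; `cacheSizeGB` is not a MongoDB function.

```javascript
// Converts a share of RAM into a whole-GB cache size (at least 1 GB).
function cacheSizeGB(totalRamGB, fraction) {
  return Math.max(1, Math.floor(totalRamGB * fraction));
}

const ramGB = 64; // hypothetical host

// Strategy 1: working set in the WiredTiger cache (80% of RAM).
const strategy1 = cacheSizeGB(ramGB, 0.8);
// Strategy 2: small WiredTiger cache (10%), rest left to the OS disk cache.
const strategy2 = cacheSizeGB(ramGB, 0.1);

console.log(`mongod --storageEngine wiredTiger --wiredTigerCacheSizeGB ${strategy1}`);
console.log(`mongod --storageEngine wiredTiger --wiredTigerCacheSizeGB ${strategy2}`);
```

The trade-off between the two is that the WiredTiger cache holds uncompressed pages, while the OS disk cache holds the on-disk (possibly compressed) blocks, so Strategy 2 fits more data in RAM at the price of decompression time.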
Strategy 3: SSD disk + compression to save €
[Architecture diagram, annotated with]
• Physical disk: SSD
Strategy 4: SSD disk (no compression)
[Architecture diagram, annotated with]
• Physical disk: SSD
What problem is solved by LSM indexes?
Performance:
• Fast reads. Easy: Add indexes
• Fast writes. Easy: No indexes
• Both. Hard: Smart schema design (hire a consultant), LSM index structures (or columnar)
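To see why an LSM structure helps with indexed writes, here is a toy log-structured merge tree. It is a teaching sketch only, not WiredTiger's implementation: writes go to an in-memory table, full tables are flushed as immutable sorted runs, and reads check the newest data first. The class name and the tiny flush limit are assumptions for illustration.

```javascript
// Binary search in a sorted run of [key, value] pairs.
function findInRun(run, key) {
  let lo = 0;
  let hi = run.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const [k, v] = run[mid];
    if (k === key) return v;
    if (k < key) lo = mid + 1;
    else hi = mid - 1;
  }
  return undefined;
}

const byKey = ([a], [b]) => (a < b ? -1 : a > b ? 1 : 0);

class ToyLsm {
  constructor(memtableLimit = 4) {
    this.memtableLimit = memtableLimit;
    this.memtable = new Map(); // recent writes, unsorted, in memory
    this.runs = []; // immutable sorted runs, newest first
  }

  // A write never touches existing runs, so it needs no random I/O.
  put(key, value) {
    this.memtable.set(key, value);
    if (this.memtable.size >= this.memtableLimit) this.flush();
  }

  // Sort the memtable once and write it out as a new run.
  flush() {
    this.runs.unshift([...this.memtable.entries()].sort(byKey));
    this.memtable = new Map();
  }

  // Reads pay for the cheap writes: they may probe several runs.
  get(key) {
    if (this.memtable.has(key)) return this.memtable.get(key);
    for (const run of this.runs) {
      const value = findInRun(run, key);
      if (value !== undefined) return value;
    }
    return undefined;
  }

  // Merge all runs into one so later reads probe fewer places.
  compact() {
    const merged = new Map();
    for (let i = this.runs.length - 1; i >= 0; i--) {
      for (const [k, v] of this.runs[i]) merged.set(k, v); // newer wins
    }
    this.runs = merged.size ? [[...merged.entries()].sort(byKey)] : [];
  }
}

const index = new ToyLsm(2);
index.put("b", 1);
index.put("a", 2); // memtable full: flushed as run [a, b]
index.put("b", 3);
index.put("c", 4); // flushed as a second, newer run [b, c]
console.log(index.get("b"), index.runs.length); // 3 2
index.compact();
console.log(index.get("b"), index.runs.length); // 3 1
```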
2B inserts (with 3 secondary indexes)
http://smalldatum.blogspot.fi/2014/12/read-modify-write-optimized.html