Post on 06-Jan-2017

Understanding and tuning WiredTiger, the new high performance database engine in MongoDB

Henrik Ingo, Solutions Architect, MongoDB

Agenda:

- MongoDB and NoSQL
- Storage Engine API
- WiredTiger configuration + performance

Most popular NoSQL database

5 NoSQL categories

• Key Value: Redis, Riak
• Wide Column: Cassandra
• Document
• Graph: Neo4j
• Map Reduce: Hadoop

MongoDB is a Document Database

Rich Queries
• Find Paul’s cars
• Find everybody in London with a car built between 1970 and 1980

Geospatial
• Find all of the car owners within 5km of Trafalgar Sq.

Text Search
• Find all the cars described as having leather seats

Aggregation
• Calculate the average value of Paul’s car collection

Map Reduce
• What is the ownership pattern of colors by geography over time? (is purple trending up in China?)

{
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123, 47.232],
  cars: [
    { model: 'Bentley', year: 1973, value: 100000, … },
    { model: 'Rolls Royce', year: 1965, value: 330000, … }
  ]
}
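To make two of the example questions concrete, here is a minimal sketch in plain JavaScript that runs them over an in-memory array. It is an illustration of the logic only, not MongoDB query syntax. The second owner and the function names are hypothetical and exist only so the filters have something to exclude.

```javascript
// In-memory illustration only: plain JavaScript, not MongoDB query syntax.
const owners = [
  {
    first_name: "Paul", surname: "Miller", city: "London",
    cars: [
      { model: "Bentley", year: 1973, value: 100000 },
      { model: "Rolls Royce", year: 1965, value: 330000 },
    ],
  },
  // Hypothetical second owner, added so the filters have something to exclude.
  {
    first_name: "Anna", surname: "Smith", city: "Paris",
    cars: [{ model: "Mini", year: 1975, value: 20000 }],
  },
];

// "Find everybody in London with a car built between 1970 and 1980"
function londonOwnersWithSeventiesCar(docs) {
  return docs.filter(
    (d) => d.city === "London" && d.cars.some((c) => c.year >= 1970 && c.year <= 1980)
  );
}

// "Calculate the average value of Paul's car collection"
function averageCarValue(docs, firstName) {
  const cars = docs.filter((d) => d.first_name === firstName).flatMap((d) => d.cars);
  if (cars.length === 0) return null;
  return cars.reduce((sum, c) => sum + c.value, 0) / cars.length;
}

console.log(londonOwnersWithSeventiesCar(owners).map((d) => d.first_name)); // [ 'Paul' ]
console.log(averageCarValue(owners, "Paul")); // 215000
```

In a real deployment the database would evaluate these conditions server-side, ideally with the help of indexes, rather than scanning an array in the client.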

Operational Database Landscape

MongoDB 3.0 & storage engines

MongoDB until 3.0

Read-heavy apps
• Great performance
  – B-tree
  – Low overhead
• Good scale-out perf
  – Secondary reads
  – Sharding

Write-heavy apps
• Good scale-out perf
  – Sharding
• Per-node efficiency wish-list:
  – Doc level locking
  – Write-optimized data structures (LSM)
  – Compression

Other
• Multi statement transactions
• In-memory engine
• SSD optimized engine
• etc...

Current state in MongoDB 2.6

Read-heavy apps
• Great performance
  – B-tree
  – Low overhead
• Good scale-out perf
  – Secondary reads
  – Sharding

Write-heavy apps
• Good scale-out perf
  – Sharding
• Per-node efficiency wish-list:
  – Doc level locking
  – Write-optimized data structures (LSM)
  – Compression

Other
• Complex transactions
• In-memory engine
• SSD optimized engine
• etc...

How to get all of the above?

MongoDB 3.0 Storage Engine API

• MMAP: Read-heavy app
• WiredTiger: Write-heavy app
• 3rd party: Special app

MongoDB 3.0 Storage Engine API

• One at a time:
  – Many engines built into mongod
  – Choose 1 at startup
  – All data stored by the same engine
  – Incompatible on-disk data formats (obviously)
  – Compatible client API
• Compatible Oplog & Replication
  – Same replica set can mix different engines
  – No-downtime migration possible

Some existing engines

• MMAPv1
  – Improved MMAP (collection-level locking)
• WiredTiger
  – Discussed next
• RocksDB
  – LSM style engine developed by Facebook
  – Based on LevelDB
• TokuMXse
  – Fractal Tree indexing engine from Percona

Some rumored engines

• Heap
  – In-memory engine
• Devnull
  – Write all data to /dev/null
  – Based on idea from famous flash animation...
• SSD optimized engine (e.g. Fusion-IO)
• KV simple key-value engine

https://github.com/mongodb/mongo/tree/master/src/mongo/db/storage

WiredTiger

What is WiredTiger

• Modern NoSQL database engine
  – flexible schema
• Advanced database engine
  – Secondary indexes, MVCC, non-locking algorithms
  – Multi-statement transactions (not in MongoDB)
• Very modular, tunable
  – Btree, LSM and columnar indexes
  – Snappy, Zlib, 3rd-party compression
  – Index prefix compression, etc...
  – Encryption at rest
• Built by creators of BerkeleyDB
• Acquired by MongoDB in 2014
• source.wiredtiger.com, @WiredTigerInc

Choosing WiredTiger at server startup

mongod --storageEngine wiredTiger

http://docs.mongodb.org/master/reference/program/mongod/#cmdoption--storageEngine

Default engine:
• MongoDB 3.0 = MMAP
• MongoDB 3.2 = WiredTiger

Main tunables exposed as MongoDB options

mongod --storageEngine wiredTiger --wiredTigerCacheSizeGB 8 --wiredTigerDirectoryForIndexes /data/indexes --wiredTigerCollectionBlockCompressor zlib --dbpath /data/datafiles

http://docs.mongodb.org/master/reference/program/mongod/#cmdoption--storageEngine

All WiredTiger options via configString (hidden)

mongod --storageEngine wiredTiger \
  --wiredTigerEngineConfigString "cache_size=8GB,eviction=(threads_min=4,threads_max=8),checkpoint=(wait=30)" \
  --wiredTigerCollectionConfigString "block_compressor=zlib" \
  --wiredTigerIndexConfigString "type=lsm,block_compressor=zlib" \
  --wiredTigerDirectoryForIndexes /data/indexes

See docs for wiredtiger_open() & WT_SESSION::create()
http://source.wiredtiger.com/2.5.0/group__wt.html#ga9e6adae3fc6964ef837a62795c7840ed
http://source.wiredtiger.com/2.5.0/struct_w_t___s_e_s_s_i_o_n.html#a358ca4141d59c345f401c58501276bbb
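WiredTiger config strings are comma-separated key=value pairs, with nested groups written as key=(...). Hand-writing them is error-prone, so a small helper can generate them from a nested object. This is a minimal sketch: the function name toConfigString is hypothetical, and it does not validate keys or values against what WiredTiger accepts.

```javascript
// Serialize a nested options object into "key=value,key=(...)" form.
// Hypothetical helper: it performs no validation of keys or values.
function toConfigString(options) {
  return Object.entries(options)
    .map(([key, value]) =>
      value !== null && typeof value === "object"
        ? `${key}=(${toConfigString(value)})`
        : `${key}=${value}`
    )
    .join(",");
}

const engineConfig = toConfigString({
  cache_size: "8GB",
  eviction: { threads_min: 4, threads_max: 8 },
  checkpoint: { wait: 30 },
});

console.log(engineConfig);
// cache_size=8GB,eviction=(threads_min=4,threads_max=8),checkpoint=(wait=30)
```

The resulting string is what would be passed, quoted, to --wiredTigerEngineConfigString.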

Also via createCollection(), createIndex()

db.createCollection( "users", { storageEngine: { wiredTiger: { configString: "block_compressor=none" } } } )

http://docs.mongodb.org/master/reference/method/db.createCollection/#db.createCollection
http://docs.mongodb.org/master/reference/method/db.collection.createIndex/#db.collection.createIndex
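The slide shows only the createCollection() form. As a sketch of the createIndex() counterpart, the same storageEngine document can be passed as the index options. The helper name wiredTigerOptions and the surname index are hypothetical, and the index config string is simply reused from the command-line example. Whether a given server version accepts a particular string is not verified here.

```javascript
// Wrap a WiredTiger config string in the storageEngine options document.
// Hypothetical helper, used for both collections and indexes.
function wiredTigerOptions(configString) {
  return { storageEngine: { wiredTiger: { configString } } };
}

const collectionOptions = wiredTigerOptions("block_compressor=none");
const indexOptions = wiredTigerOptions("type=lsm,block_compressor=zlib");

console.log(JSON.stringify(indexOptions));
// {"storageEngine":{"wiredTiger":{"configString":"type=lsm,block_compressor=zlib"}}}

// Only runs inside the mongo shell, where the global `db` exists.
if (typeof db !== "undefined") {
  db.createCollection("users", collectionOptions);
  db.users.createIndex({ surname: 1 }, indexOptions);
}
```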

More...

• db.serverStatus()
• db.collection.stats()
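One practical use of db.serverStatus() is checking how full the WiredTiger cache is. The sketch below reads two counters from the wiredTiger.cache section and divides them. The sample object is hand-made with hypothetical numbers, and the function name cacheFillRatio is an illustration rather than a MongoDB API.

```javascript
// Fraction of the configured WiredTiger cache that is currently in use.
function cacheFillRatio(status) {
  const cache = status.wiredTiger.cache;
  return cache["bytes currently in the cache"] / cache["maximum bytes configured"];
}

// Hand-made sample shaped like db.serverStatus() output; the numbers are hypothetical.
const sampleStatus = {
  wiredTiger: {
    cache: {
      "maximum bytes configured": 8 * 1024 ** 3,
      "bytes currently in the cache": 6 * 1024 ** 3,
    },
  },
};

// In the mongo shell, use the live server status; elsewhere fall back to the sample.
const currentStatus = typeof db !== "undefined" ? db.serverStatus() : sampleStatus;
console.log(cacheFillRatio(currentStatus)); // 0.75 for the sample above
```

A ratio that stays close to 1 suggests the working set may not fit in the configured cache.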

Understanding and Optimizing WiredTiger

Understanding WiredTiger architecture

[Diagram: WiredTiger SE stack, top to bottom]
• Btree, LSM, Columnar
• Cache (default: 50%)
• None, Snappy, Zlib
• OS Disk Cache (Default: 50%)
• Physical disk

Covering 90% of your optimization needs

[Architecture diagram repeated, annotated with:]
• Decompression time
• Disk seek time

Strategy 1: fit working set in Cache

[Architecture diagram repeated, annotated with:]
• cache_size = 80%

Strategy 2: fit working set in OS Disk Cache

[Architecture diagram repeated, annotated with:]
• cache_size = 10%
• OS Disk Cache (Remaining: 90%)
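The slides give cache_size as a percentage of RAM, while the --wiredTigerCacheSizeGB option shown earlier takes gigabytes. A minimal sketch of the conversion for Strategies 1 and 2 follows. The 64 GB host, the rounding down to whole gigabytes, and the function name cacheSizeGB are all assumptions made for illustration.

```javascript
// Round a fraction of RAM down to whole gigabytes, never below 1 GB.
// Whole-GB rounding is an assumption made for this sketch.
function cacheSizeGB(totalRamGB, fraction) {
  return Math.max(1, Math.floor(totalRamGB * fraction));
}

const ramGB = 64; // hypothetical host size, not from the slides

// Strategy 1: most of RAM goes to the WiredTiger cache.
console.log(`--wiredTigerCacheSizeGB ${cacheSizeGB(ramGB, 0.8)}`); // --wiredTigerCacheSizeGB 51

// Strategy 2: small WiredTiger cache, the rest is left to the OS disk cache.
console.log(`--wiredTigerCacheSizeGB ${cacheSizeGB(ramGB, 0.1)}`); // --wiredTigerCacheSizeGB 6
```

The trade-off between the two: data in the WiredTiger cache is held uncompressed, while the OS disk cache holds file blocks as they are stored on disk, so Strategy 2 can keep more data in RAM at the price of decompression time.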

Strategy 3: SSD disk + compression to save €

[Architecture diagram repeated, with the physical disk shown as SSD]

Strategy 4: SSD disk (no compression)

[Architecture diagram repeated, with the physical disk shown as SSD]

Compression benchmarks

What problem is solved by LSM indexes?

[Chart: Performance (vertical axis) for three goals]
• Fast reads: Easy: Add indexes
• Fast writes: Easy: No indexes
• Both: Hard: Smart schema design (hire a consultant), LSM index structures (or columnar)

2B inserts (with 3 secondary indexes)

http://smalldatum.blogspot.fi/2014/12/read-modify-write-optimized.html
