Is it fast? Measuring MongoDB performance
TRANSCRIPT
Me?
• Database Consumer (1992-2009)
• VoltDB (2009-2011)
• Tokutek (2011-2015)
• CrunchTime! (2015-)
• Semi-professional database benchmarker
Why measure performance?
4. Improve your skillset (or become sympathetic)
Code → Build → Test → Deploy → Operate → Monitor
(Developers ↔ Operations)
MongoDB performance timeline (2009-2018)
• MongoDB v1.0 (2/2009): Global Lock; MMAPv1
• MongoDB v2.2 (8/2012): Database Lock
• TokuMX v1.0 (6/2013): Concurrency, Compression
Not fast.
• MongoDB v3.0 (3/2015): Storage Engines (MMAPv1, TokuMX, WiredTiger, RocksDB, TokuMXse, Others?)
(Prediction) Interesting times ahead
Storage Engine == Performance
• MongoDB v3.0 = Storage Engine API v1.0
  • It’s going to take some time
• TokuMX currently has serious performance advantages
  • Read-free replication, partitioning, read-free $ operations
• Competition FTW!
• Available now
  • Some performance improvements
  • Compression
• Future features
  • Additional performance improvements
  • Transactions
  • Joins
Important performance concepts
• Throughput
  • How many “transactions” per “second” were completed
• Latency
  • How many “seconds” did each “transaction” take
• Which is important to your use-case? Both?
• Each should be measured in detail
  • Overall average
  • Interval average (every 10 seconds)
  • Exit (last 10% of run)
  • Percentiles (99%, 95%)
  • Outliers (find a way to catch them)
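The measurements above can be sketched in a few lines of Python. The function name and the simple position-based percentile are my own illustration, not something from the talk:

```python
import statistics

def summarize_latencies(latencies_ms):
    """Summarize per-operation latencies (milliseconds) along the lines the
    slide suggests: overall average, exit average, percentiles, outliers."""
    s = sorted(latencies_ms)
    n = len(s)

    def pct(p):
        # value at the p% position of the sorted list (integer arithmetic
        # avoids floating-point index surprises)
        return s[min(n - 1, p * n // 100)]

    return {
        "overall_avg_ms": statistics.fmean(latencies_ms),
        "exit_avg_ms": statistics.fmean(latencies_ms[-max(1, n // 10):]),  # last 10% of run
        "p95_ms": pct(95),
        "p99_ms": pct(99),
        "outliers_ms": s[-5:],  # the five slowest operations
    }
```

Interval averages fall out of the same idea: bucket each operation by its timestamp into 10-second windows and average each bucket separately.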
What is A/B benchmarking?
• Always have two “sides” for comparison
  • Today vs. yesterday
  • directIO vs. bufferedIO
  • WiredTiger vs. RocksDB
  • Snappy compression vs. zlib
  • EC2 m3.large vs. m3.2xlarge
• Change 1 thing
• Compare to prior run
• Repeat
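A minimal comparison helper for the two “sides” might look like this. The names and the 5% regression threshold are illustrative assumptions, and it presumes a higher-is-better metric such as throughput:

```python
def ab_compare(side_a, side_b, threshold_pct=5.0):
    """Compare one metric across the two 'sides' of an A/B run, e.g.
    yesterday's throughput (side_a) vs. today's (side_b). Flags a
    regression when side B is more than threshold_pct percent worse."""
    delta_pct = (side_b - side_a) / side_a * 100.0
    return {
        "delta_pct": round(delta_pct, 2),
        "regression": delta_pct < -threshold_pct,
    }
```

For latency you would invert the sign convention, since lower is better there.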
Step 1: Model your workload
• Three techniques
  • Use your real data and real workload if possible
    • But you probably can’t share it with others
  • Capture/replay tools
    • Same sharing downside as above
    • Also might be hard to modify the data or workload
  • Create a synthetic representation, i.e., a benchmark
    • Open source it and share it
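A synthetic representation can start as a generator that mimics your schema’s shapes. Every field name, size, and distribution below is invented for illustration; in practice you would mirror your real schema’s field sizes and cardinalities:

```python
import random
import string

def synthetic_docs(n, seed=42):
    """Yield documents shaped like a made-up order-entry workload."""
    rng = random.Random(seed)  # fixed seed => reproducible runs
    for i in range(n):
        yield {
            "_id": i,
            "customer": rng.randrange(10_000),                         # ~10k distinct customers
            "sku": "".join(rng.choices(string.ascii_uppercase, k=8)),  # 8-char product code
            "qty": rng.randint(1, 20),
            "note": "x" * rng.randint(50, 200),                        # variable-size payload
        }
```

Pinning the seed matters for A/B runs: both sides should load identical data so that only the one thing you changed differs.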
Step 2: Run it often
• Every day, or at least weekly
• Look for measurable changes
  • Throughput, latency, CPU, RSS, IO
  • Compare to yesterday, last week, last month
• Automation is a must
  • Tutorial at http://bit.ly/benchmarkmongodb
• Use it for testing any upcoming changes
  • OS, hardware, application version, MongoDB upgrade
• Measure and save everything
  • Save the data forever
  • You are only measuring too much when it impacts performance
  • Start with mongostat, iostat, ps
Step 3: Share with others (if possible)
• Open source your benchmark
• Blog about your results
• File crashes or performance issues (bug hunt!)
  • https://jira.mongodb.org
• Encourage storage engine competition
Is it fast ENOUGH?
• What if your application is performing fine?
  • But you’d like to reduce your infrastructure
• MongoDB v3.0 allows mixed storage engines within a replica set
• Add a hidden replica set member running a new storage engine to your production environment
  • Compare CPU, RSS, IO, and disk space with the other secondaries
• You won’t see how it will perform as a primary
  • The concurrency model there is far different
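The hidden-member experiment above can be set up from the mongo shell roughly as follows. The host name and `_id` are placeholders, and the new member’s mongod must already be running with the candidate engine (e.g. started with `--storageEngine wiredTiger`):

```javascript
// Run on the primary. A hidden member must have priority 0.
rs.add({
  _id: 3,                 // any _id not already used in the set
  host: "newhost:27017",  // placeholder for the new member
  hidden: true,           // invisible to client reads
  priority: 0,            // can never be elected primary
  votes: 0                // does not affect elections
})
```

Because the member is hidden and non-voting, it receives the full production write stream without serving clients or influencing failover.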
Things to look forward to, part 1
• MMAPv1 journal performance
  • Collection-level locking in v3.0 only moved the bottleneck
  • Group commit algorithm?
  • WiredTiger as default makes this unimportant
• Capped collections are hard
  • How “large” is a transactional data store at a given point in time?
  • They are natural in MMAPv1 (CLL), but nowhere else
  • TokuMX solved this by partitioning the oplog
    • But used time-based partitioning (by hour or by day)
  • Interesting solutions are surely coming
• TokuMX
  • Currently based on MongoDB v2.4; needs v2.6 or v3.0
  • Public feature roadmap?
Things to look forward to, part 2
• The oplog gates performance
  • It’s a capped collection (see the prior slide)
  • It’s a serious point of contention (writers and readers)
• Replication bottleneck
  • Write concurrency on primaries is far higher than on secondaries
  • Running multiple mongod processes per physical server is a workaround
    • But it adds significant operational complexity
  • MySQL is constantly improving this, as will MongoDB
• TTL indexes are painful
  • In a write-optimized storage engine, inserts are far less work than deletes
  • Extremely busy systems might fall behind and never catch up
DO TRY THIS AT HOME!
Tim Callaghan
Acme Benchmarking
www.acmebenchmarking.com
@acmebench