Advanced Benchmarking at Parse
Travis Redman, Parse + Facebook
Parse?
• Parse is a backend service for mobile apps
• Data Storage
• Server-side code
• Push Notifications
• Analytics
• … all by dropping an SDK into your app
Parse Stats
• Parse has 400,000 apps
• Rapidly growing MongoDB deployment with:
• 500 databases
• 2.5M collections
• 8M indexes
• 50 TB of storage (excluding replication)
• We have all kinds of workloads!
Variety is Fun
• We support just about any kind of workload you can imagine
• Games, social networking, events, travel, music, etc.
• Apps that are read heavy or write heavy
• Heavy push users (time sensitive notifications)
• Apps that store large objects
• Apps that use us for backups
• Inefficient queries
2.6 - Why Upgrade?
• General desire to stay current, precursor for 2.8 and pluggable storage engines
• Specific features in 2.6
• Background indexing on secondaries
• Index intersection
• Query plan summary logging
Upgrading is Scary
• In the early days, we just upgraded
• Put a new version on a secondary
• ???
• Upgrade primaries
• ???
• Fix bugs as we find them - LIVE!
Upgrading
• We’re too big now to cowboy it up
• Upgrading blindly is a potential catastrophe
• In particular, we want to avoid:
• Significant performance regressions
• Unexpected bugs that break customer apps
Benchmarking
• We know that:
• Benchmarking can detect performance regressions between versions
• Tools and sample workloads (sysbench, YCSB, …) already exist
• MongoDB runs its own benchmarks
• Our workload is complex - we want more confidence
A Customized Approach
• Why not test with production workloads?
• Flashback: https://github.com/ParsePlatform/flashback
• Record - python tool to record ops
• Replay - go tool to play back ops
Record
• Record leverages mongo’s profiling and oplog
• Profiling is enabled on all DBs
• Inserts are collected from the oplog
• All other ops taken from profile db
• Ops are recorded for a specified time period (24h) and then merged
• Produces a JSON file of ops to feed the replay tool
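Conceptually, the merge step looks like the sketch below. This is a minimal Python illustration, not Flashback's actual code, and the op fields (`ts`, `op`, `ns`) are simplified stand-ins for the real profiler and oplog documents.

```python
import json

# Hypothetical op records: profiler-style ops plus oplog inserts.
# Field names loosely mirror MongoDB conventions; the merge logic
# here is only illustrative.
profile_ops = [
    {"ts": 3, "op": "query",  "ns": "app.Installation"},
    {"ts": 1, "op": "update", "ns": "app.Activity"},
]
oplog_inserts = [
    {"ts": 2, "op": "insert", "ns": "app.Activity"},
]

def merge_ops(profile_ops, oplog_inserts):
    """Merge both sources into one timestamp-ordered op stream."""
    return sorted(profile_ops + oplog_inserts, key=lambda op: op["ts"])

merged = merge_ops(profile_ops, oplog_inserts)

# Serialize one op per line, ready to feed a replay tool.
for op in merged:
    print(json.dumps(op))
```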
Recording
Base Snapshot
• Need to replay prod ops on prod data
• It’s best to play back ops on a consistent copy of the data, otherwise:
• inserts are duplicate key errors
• deletes are no-ops
• queries don’t return the right data
• Using EBS snapshots, we grab a copy of the db during the recording
• Discard ops before the snapshot
Recording Timeline
Base Snapshot
• Snapshot is restored to our benchmark server(s)
• EBS volume has to be “warmed” because snapshot blocks are not instantiated
• Multi TB volumes can take a few hours to warm
• After warming we create an LVM snapshot
• We can “rewind” (merge) after each playback, iterating faster
Playback
1. Freeze the LVM volume
2. Start the version of mongo being tested
3. Adjust replay parameters
• # workers
• # num ops
• timestamp to start at (when base snapshot was taken)
4. Go!
5. Client-side results are logged to file, server-side collected from monitoring tools
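The playback steps above can be sketched roughly as follows. This is a minimal Python illustration of the worker-count, op-count, and start-timestamp parameters, not the actual Go replay tool; a no-op stands in for issuing each op against mongod.

```python
import queue
import threading
import time

def replay(ops, num_workers=10, start_ts=0, max_ops=None):
    """Toy flashback-style replay driver; parameter names are
    illustrative, not the real tool's flags."""
    # Discard ops recorded before the base snapshot was taken.
    ops = [op for op in ops if op["ts"] >= start_ts]
    if max_ops is not None:
        ops = ops[:max_ops]

    q = queue.Queue()
    for op in ops:
        q.put(op)

    done = []  # ops "executed" (list.append is safe under the GIL)

    def worker():
        while True:
            try:
                op = q.get_nowait()
            except queue.Empty:
                return
            # A real replay would issue the op against mongod here,
            # as fast as possible.
            done.append(op["op"])

    start = time.time()
    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = max(time.time() - start, 1e-9)
    return len(done), len(done) / elapsed

executed, rate = replay(
    [{"ts": i, "op": "query"} for i in range(1000)],
    num_workers=10, start_ts=100, max_ops=500)
print(executed)  # 500
```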
Playback
Our Workload
• 24h of ops collected
• 10M ops at a time, as fast as possible
• 10 workers
• No warming of the replica set
• LVM snapshot reset, mongod restarted for each version
• Rinse and repeat for multiple replica sets
Our Results
2.4.10
3061.96 ops/sec (avg)
Results
2.6.3
2062.69 ops/sec (avg)
• 33% loss in throughput.
• A second workload showed a 75% drop in throughput
• 3669.73 ops/sec vs 975.64 ops/sec
• Ouch! What do we do next?
Results

         2.4.10 P99   2.4.10 MAX   2.6.3 P99   2.6.3 MAX
query    18.45 ms     20,953 ms    19.21 ms    60,001 ms
insert   23.5 ms      6,290 ms     50.29 ms    48,837 ms
update   21.87 ms     3,835 ms     21.79 ms    48,776 ms
FAM      21.99 ms     6,159 ms     24.91 ms    49,254 ms

(FAM = findAndModify)
Replay Data
Bug Hunt!
• Old fashioned troubleshooting begins
• Began isolating query patterns and collections with high max times
• Reproduced issue, confirmed slowness in 2.6
• Lots of documentation and log gathering, including extremely verbose QLOG
• Started investigation with the Mongo team that ran several weeks
What we found
• Basically, new query planner in 2.6 meets Parse auto-indexer
• We create lots of indexes automatically
• More indexes to score and potentially race
• Increased likelihood of running into query planner bugs
Example 1
Remove op on “Installation”
{ "installationId": { "$ne": "?" }, "appIdentifier": "?", "deviceToken": "?" }
• 9M documents
• installationId is UUID, unique value
• "installationId": {"$ne": ? } matches most documents
• deviceToken is a unique token identifying the device
{ "installationId": { "$ne": "?" }, "appIdentifier": "?", "deviceToken": "?" }
• Three candidate indexes:
• {installationId: 1, deviceToken: 1}
• {deviceToken: 1, installationId: 1}
• {deviceToken: 1}
• The second and third indexes are clearly better candidates for this query, since the device token is a simple point lookup.
• Mongo bug where the work required to skip keys was not factored into the plan ranking, causing the inefficient plan to sometimes tie
• Since it’s a remove op, it held the write lock for the DB
• Fixed in: https://jira.mongodb.org/browse/SERVER-14311
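A toy simulation makes the asymmetry between these plans visible. It uses synthetic data (scaled down from 9M to 1,000 documents) and simple loops in place of MongoDB's actual index scans and plan ranker: an index leading with installationId must examine nearly every key to satisfy the $ne predicate, while one leading with deviceToken narrows to a single key.

```python
import uuid

# Miniature of the "Installation" case: every document has a unique
# installationId (UUID) and a unique deviceToken.
docs = [{"installationId": str(uuid.uuid4()),
         "deviceToken": "token-%d" % i} for i in range(1000)]

target = docs[0]
ne_installation_id = target["installationId"]  # $ne: matches all but one doc
device_token = target["deviceToken"]           # equality: matches one doc

# Plan A, index leading with installationId: the $ne predicate forces
# the scan to examine nearly every index key.
keys_plan_a = sum(1 for d in docs
                  if d["installationId"] != ne_installation_id)

# Plan B, index leading with deviceToken: a point lookup on a unique
# field narrows the scan to a single key.
keys_plan_b = sum(1 for d in docs
                  if d["deviceToken"] == device_token)

print(keys_plan_a, keys_plan_b)  # 999 1
```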
Example 2
Query on “Activity”:
{ $or: [ { _p_project: "?" }, { _p_newProject: "?" } ], acl: { $in: [ "a", "b", "c" ] } }
• 25M documents
• _p_project and _p_newProject are pointers to unique IDs of other objects
• acl matches most documents
• Four candidate indexes for this query
• { _p_newProject: 1 }
• { _p_project: 1 }
• { _p_project: 1, _created_at: 1 }
• { acl: 1 }
{ $or: [ { _p_project: "?" }, { _p_newProject: "?" } ], acl: { $in: [ "a", "b", "c" ] } }
• Query Planner would race multiple plans using indexes
• Due to a bug, one of the raced indexes would do a full index scan (acl)
• Index scan was non-yielding, tying up the lock until it had completed
• Parse query killer job kills non-yielding queries after 45s
• Query planner would fail to cache plan, and would re-run on next query with the same pattern
• Fixed: https://jira.mongodb.org/browse/SERVER-15152
Example 3
Query on "Activity" (same as the previous example):
{ $or: [ { _p_project: "?" }, { _p_newProject: "?" } ], acl: { $in: [ "a", "b", "c" ] } }
• Usually fast, but occasionally saw high nscanned and query time > 60s
• Since there were indexes on all fields in AND condition, this was a candidate for index intersection
• planSummary: IXSCAN { _p_project: 1 }, IXSCAN { _p_newProject: 1 }, IXSCAN { acl: 1.0 }
• acl was not selective, but _p_project and _p_newProject would sometimes match 0 documents during race
• intersection-based query plan would get cached, subsequent queries slow
• Fixed in https://jira.mongodb.org/browse/SERVER-14961
Success?
2.6.5
4443.10 ops/sec (vs 3061.96 in 2.4.10)
Comparison

         2.4.10 P99  2.4.10 MAX  2.6.4 P99  2.6.4 MAX  2.6.5 P99  2.6.5 MAX
query    18 ms       20,953 ms   19 ms      60,001 ms  10 ms      4,352 ms
insert   23 ms       6,290 ms    50 ms      48,837 ms  24 ms      2,225 ms
update   22 ms       3,835 ms    21 ms      48,776 ms  23 ms      4,535 ms
FAM      22 ms       6,159 ms    24 ms      49,254 ms  23 ms      4,353 ms
More Results

                     2.4.10           2.6.5
Ops: 10M, W: 10      3,061 ops/sec    4,443 ops/sec
Ops: 10M, W: 250     10,666 ops/sec   12,248 ops/sec
Ops: 20M, W: 1000    11,735 ops/sec   14,335 ops/sec
What now?
• 2.6 has a green light on performance
• Working through functionality testing
• Unit/integration testing is catching the majority of issues
• Bonus: Flashback error log helping us to identify problems not caught by tests
Wrap Up
• Benchmarking with something representative of your production workload is worth the time
• Saved us from discovering slowness in production, and from inevitable, painful rollbacks
• Using actual production data is even better
• Helped us avoid new bugs
• Learned a lot about our own service (indexing algorithms need some work)
• Initial work can be reused to efficiently test future versions
Questions?
• Flashback: https://github.com/ParsePlatform/flashback
• Links to bugs:
• https://jira.mongodb.org/browse/SERVER-14311
• https://jira.mongodb.org/browse/SERVER-15152
• https://jira.mongodb.org/browse/SERVER-14961