Harmony in Tune: How we Refactored Cube to Terabyte Scale. Philip (flip) Kromer, Huston Hoburg, infochimps.com. Feb 15 2013.

Post on 22-Mar-2017

Page 1: Harmony intune final

Harmony in Tune

Philip (flip) Kromer

Huston Hoburg

infochimps.com

Feb 15 2013

How we Refactored Cube to Terabyte Scale

Page 2: Harmony intune final

Big Data for All

Page 4: Harmony intune final

why dashboards?

Page 5: Harmony intune final

Lightweight Dashboards

• Understand what’s happening

• Understand data in context

• NOT exploratory analytics

• real-time insight...but not just about real-time

mainline: j.mp/sqcube

hi-scale branch: j.mp/icscube

Page 6: Harmony intune final

The “Church of Graphs”

Page 7: Harmony intune final

Predictive Kvetching

Page 8: Harmony intune final

Lightweight Dashboards

Page 9: Harmony intune final

Approach to Tuning

• Measure: “Why can’t it be faster?”

• Harmonize: “Use it right”

• Tune: “Align it to production resources”

Page 10: Harmony intune final

cube is awesome

Page 11: Harmony intune final

What’s so great?

• Streaming, real-time

• Ad-hoc data: write whatever you want

• Ad-hoc queries: make up new queries whenever

• Efficient (“pyramidal”) calculations

Page 12: Harmony intune final

Event Stream

• { time: "2013-02-15T01:02:03Z", type: "webreq", data: { path: "/order", method: "POST", duration: 50.7, status: 400, ua:"...MSIE 6.0..." } }

• { time: "2013-02-15T01:02:03Z", type: "tweet", id: 8675309, data: { text: "MongoDB talk yay", retweet_count: 121, user: { screen_name: "infochimps", followers_count: 7851, lang: "en", ...} } }

Page 13: Harmony intune final

Events vs Metrics

Event:

• { time: "2013-02-15T01:02:03Z", type: "tweet", id: 8675309, data: { text: "MongoDB talk yay", retweet_count: 121, user: { screen_name: "infochimps", followers_count: 7851, lang: "en", ...} } }

Metrics:

• “# of tweets in 10s bucket at 1:02:10 on 2013-02-15”

• “# of non-english-language tweets in 1hr bucket at ...”

Page 14: Harmony intune final

Events vs Metrics

Event:

• { time: "2013-02-15T01:02:03Z", type: "webreq", data: { path: "/order", method: "POST", duration: 50.7, status: 400, ua:"...MSIE 6.0..." } }

Metrics:

• “# of requests in 10s bucket at 3:05:10 on 2013-02-15”

• “Average duration of requests with 4xx status in the 5 minute bucket at 3:05:00 on 2013-02-15”
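The two metrics quoted above can be sketched as plain functions over an array of events. This is only an illustration of the event-to-metric idea, not Cube's evaluator; the helper names (`bucketStart`, `countByBucket`, `avgDuration4xx`) and the sample events are made up for the example.

```javascript
// Hypothetical helper names; not taken from Cube's source.
const TEN_SECONDS = 10 * 1000;
const FIVE_MINUTES = 5 * 60 * 1000;

// Floor a timestamp to the start of the bucket that contains it.
function bucketStart(time, stepMs) {
  const ms = new Date(time).getTime();
  return new Date(Math.floor(ms / stepMs) * stepMs).toISOString();
}

// "# of requests" per bucket.
function countByBucket(events, stepMs) {
  const counts = {};
  for (const ev of events) {
    const key = bucketStart(ev.time, stepMs);
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

// "Average duration of requests with 4xx status" per bucket.
function avgDuration4xx(events, stepMs) {
  const acc = {};
  for (const ev of events) {
    const status = ev.data.status;
    if (status < 400 || status >= 500) continue;
    const key = bucketStart(ev.time, stepMs);
    const a = acc[key] || (acc[key] = { sum: 0, n: 0 });
    a.sum += ev.data.duration;
    a.n += 1;
  }
  const out = {};
  for (const key of Object.keys(acc)) out[key] = acc[key].sum / acc[key].n;
  return out;
}

// Made-up sample events in the same shape as the slide's "webreq" event.
const events = [
  { time: "2013-02-15T03:05:11Z", type: "webreq", data: { status: 400, duration: 50.7 } },
  { time: "2013-02-15T03:05:14Z", type: "webreq", data: { status: 200, duration: 12.0 } },
  { time: "2013-02-15T03:07:59Z", type: "webreq", data: { status: 404, duration: 30.3 } },
];

console.log(countByBucket(events, TEN_SECONDS));
console.log(avgDuration4xx(events, FIVE_MINUTES));
```

Each metric comes out as one number per time bucket, which is the "timestamped number" shape the following slides contrast with events.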

Page 15: Harmony intune final

Events vs Metrics

• Events:

• baskets of facts

• narcissistic

• LOTS AND LOTS

{ time: "2013-02-15T01:02:03Z", type: "webreq", data: { path: "/order", method: "POST", duration: 50.7, status: 400, ua:"...MSIE 6.0..." } }

Page 16: Harmony intune final

Events vs Metrics

• Events:

• baskets of facts

• narcissistic

• LOTS AND LOTS

• Metrics:

• a timestamped number

• look like the graph

• one per time bucket

{ time: "2013-02-15T01:02:03Z", type: "webreq", data: { path: "/order", method: "POST", duration: 50.7, status: 400, ua:"...MSIE 6.0..." } }

{ time: "2013-02-15T01:02:03Z", value: 90 }

Page 17: Harmony intune final

billions and billions

Page 18: Harmony intune final

3000 events/second

Page 19: Harmony intune final

tuning methodology

Page 20: Harmony intune final

Monkey See Monkey Do

Google for the #s the cool kids use

Page 21: Harmony intune final

Spinal Tap

Turn everything to 11!!!!

Page 22: Harmony intune final

Hillbilly Mechanic

Rewrite for memcached HBase on Cassandra!!!

Page 23: Harmony intune final

Moneybags

SSD plz

Moar CPU

Moar RAM

Moar Replica

Page 24: Harmony intune final

Tuning: How to do it

• Measure: “Why can’t it be faster?”

• Harmonize: “Use it right”

• Tune: “Align it to production resources”

Page 25: Harmony intune final

see through the magic

Page 26: Harmony intune final

• Why can’t it be faster than it is now?

Page 27: Harmony intune final

• dstat (http://j.mp/dstatftw): dstat -drnycmf -t 5

• htop

• mongostat

Page 28: Harmony intune final

Grok: client-side

• Made a sprayer to inject data

• invalidate a time range at max speed

• writes variously-shaped data: noise, ramp, sine, etc

• Or just reach into the DB and poke

• delete range of metrics, leave events

• delete range of events, leave metrics
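A sprayer along these lines might look like the sketch below. The shape names come from the slide; everything else (`shapes`, `spray`, the `"spray"` event type, the `value` field) is a hypothetical stand-in, and actually sending the events to a collector is left out.

```javascript
// Value generators for differently-shaped test data (all names hypothetical).
const shapes = {
  ramp: (i, n) => i / n,                            // rises linearly from 0 toward 1
  sine: (i, n) => Math.sin((2 * Math.PI * i) / n),  // one full cycle over n samples
  noise: () => Math.random(),                       // uniform noise in [0, 1)
};

// Build n events of the given shape, spaced stepMs apart starting at startMs.
function spray(shape, startMs, stepMs, n) {
  const gen = shapes[shape];
  if (!gen) throw new Error("unknown shape: " + shape);
  const events = [];
  for (let i = 0; i < n; i++) {
    events.push({
      time: new Date(startMs + i * stepMs).toISOString(),
      type: "spray",
      data: { shape: shape, value: gen(i, n) },
    });
  }
  return events;
}

// Example: four ramp-shaped events, one second apart.
const sample = spray("ramp", Date.UTC(2013, 1, 15), 1000, 4);
console.log(sample.map((ev) => ev.data.value));
```

Known shapes make it easy to eyeball whether the resulting graph is right: a ramp that comes out as a staircase or a sine with gaps points straight at the bucket or time range that went wrong.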

Page 29: Harmony intune final

Fault injection

• raise when packet comes in with certain flag

• { time: "2013...", data: {...}, _raise:"db_write" }

• (only in development mode, obvs.)
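One way to wire up the `_raise` flag is sketched below. The stage names, `maybeRaise`, `handleEvent`, and the `env` argument are assumptions for illustration; only the `_raise: "db_write"` flag and the development-only rule come from the slide.

```javascript
// Throw if the event asks for a fault at this stage, but only in development.
function maybeRaise(event, stage, env) {
  if (env !== "development") return; // never honor the flag elsewhere
  if (event._raise === stage) {
    throw new Error("injected fault at stage: " + stage);
  }
}

// Hypothetical collector path: parse the event, then write it.
function handleEvent(event, env, writeToDb) {
  maybeRaise(event, "parse", env);
  const record = { time: new Date(event.time), data: event.data };
  maybeRaise(event, "db_write", env);
  writeToDb(record);
  return record;
}
```

Sending `{ time: "2013...", data: {...}, _raise: "db_write" }` through this path in development fails just before the write, which lets you watch how the rest of the system reacts to a failed insert.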

Page 30: Harmony intune final

app-side tracing

• “Metalog” announces lifecycle progress:

• writes to log...

• ... or as cube metrics!

metalog.event('connect', { method: 'ws', ip: connection.remoteAddress, path: request.url }, 'minor');
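A minimal stand-in for such a tracer is sketched here. The `event(name, data, level)` call shape and the `'minor'` level match the line above; the `makeMetalog` factory, the `"major"` level, the `minLevel` option, and the `"cube_"` type prefix are assumptions, and the real Metalog may differ.

```javascript
// Build a tracer that can write to a log, emit cube events, or both.
function makeMetalog(options) {
  const log = options.log;            // function(string), optional
  const putEvent = options.putEvent;  // function(event), optional
  const minLevel = options.minLevel || "minor";
  const levels = { minor: 0, major: 1 }; // "major" is an assumed second level
  return {
    event: function (name, data, level) {
      const lvl = level || "major";
      if (levels[lvl] < levels[minLevel]) return; // too chatty for this config
      const record = {
        time: new Date().toISOString(),
        type: "cube_" + name,
        data: data,
      };
      if (log) log(JSON.stringify(record)); // writes to log...
      if (putEvent) putEvent(record);       // ...or as cube metrics
    },
  };
}
```

Feeding the lifecycle events back in as ordinary events means the dashboard can graph its own connects, requests and writes.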

Page 31: Harmony intune final

app-side tracing

Page 32: Harmony intune final

fits on machine

Page 33: Harmony intune final

• Rate:

• 3000 ev/sec ≈ 250 M ev/day ≈ 2 BILLION/wk

• Expensive. Difficult.

• 250 GB accumulated per day (@1000 bytes/ev)

• 95 TB accumulated per year (@1000 bytes/ev)

3000 events/second
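The rounded figures above follow from straightforward arithmetic, spelled out here using the slide's own inputs (3000 events/second, 1000 bytes/event).

```javascript
const eventsPerSecond = 3000;
const bytesPerEvent = 1000;
const secondsPerDay = 24 * 60 * 60;

const eventsPerDay = eventsPerSecond * secondsPerDay;   // 259,200,000 (≈ 250 M)
const eventsPerWeek = eventsPerDay * 7;                  // 1,814,400,000 (≈ 2 billion)
const gbPerDay = (eventsPerDay * bytesPerEvent) / 1e9;   // about 259 GB
const tbPerYear = (gbPerDay * 365) / 1000;               // about 95 TB

console.log({ eventsPerDay, eventsPerWeek, gbPerDay, tbPerYear });
```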

Page 34: Harmony intune final

Metrics

• Rate:

• 3M ten-second buckets/year (π × 10⁷ sec/year)

• < 100 bytes/metric ...

• Manageable!

• a 30 metric dashboard is ~ 10 GB/year @10sec

• a 30 metric dashboard is ~ 170 MB/year @ 5min
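The same kind of back-of-envelope check works for metrics. The sketch below uses the slide's upper bound of 100 bytes per metric; at that bound the 10-second tier comes to roughly 9.5 GB/year and the 5-minute tier to roughly 315 MB/year, so the slide's ~170 MB figure would correspond to something closer to 54 bytes per metric.

```javascript
const secondsPerYear = 365 * 24 * 60 * 60; // 31,536,000, close to pi * 10^7
const bytesPerMetric = 100;                // the slide's upper bound
const metricsOnDashboard = 30;

// Storage per year for one dashboard at a given bucket size.
function bytesPerYear(stepSeconds) {
  const bucketsPerYear = secondsPerYear / stepSeconds;
  return metricsOnDashboard * bucketsPerYear * bytesPerMetric;
}

const tenSecGb = bytesPerYear(10) / 1e9;    // about 9.5
const fiveMinMb = bytesPerYear(300) / 1e6;  // about 315

console.log({ tenSecGb, fiveMinMb });
```

Either way the conclusion holds: metrics are several orders of magnitude smaller than the raw event stream.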

Page 35: Harmony intune final

20% gains are boring

At scale, your first barriers are either:

• Easy

• Impossible

Metrics: 10 GB/year

Events: 10 TB/month

Page 36: Harmony intune final

Scalability sí

Performance no

Page 37: Harmony intune final

Still CPU and Memory Use

• Problem

• Mongo seems to be working

• but high resident memory and fault rate

• Memory-mapped Files

• 1 TB of data served by 4 GB of RAM is no good

Page 38: Harmony intune final

Capped Collections

[Diagram: queue slots holding records A B C D E F]

• Fixed size circular queue

• records are in order of insertion

• oldest records are discarded when full

[Diagram: once the queue is full, new records G and H overwrite the oldest records]
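The behaviour described above can be modelled in a few lines. This is an in-memory toy, not how MongoDB implements capped collections; the class and method names are invented for the example. (In the mongo shell, a real capped collection is created with `db.createCollection(name, { capped: true, size: <bytes> })`.)

```javascript
// Toy model of a capped collection: fixed capacity, insertion order,
// oldest record overwritten once the buffer is full.
class CappedBuffer {
  constructor(capacity) {
    this.capacity = capacity;
    this.slots = new Array(capacity);
    this.next = 0;  // slot the next insert will use
    this.count = 0; // number of live records
  }

  insert(record) {
    this.slots[this.next] = record;
    this.next = (this.next + 1) % this.capacity;
    this.count = Math.min(this.count + 1, this.capacity);
  }

  // Records in insertion order, oldest first.
  toArray() {
    const start = this.count < this.capacity ? 0 : this.next;
    const out = [];
    for (let i = 0; i < this.count; i++) {
      out.push(this.slots[(start + i) % this.capacity]);
    }
    return out;
  }
}

const buf = new CappedBuffer(6);
for (const r of ["A", "B", "C", "D", "E", "F"]) buf.insert(r);
buf.insert("G"); // overwrites A
buf.insert("H"); // overwrites B
console.log(buf.slots.join(" "));     // physical layout
console.log(buf.toArray().join(" ")); // logical (insertion) order
```

Writes never move existing data, and records that arrived together sit next to each other, which is why time-local queries stay cheap.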

Page 39: Harmony intune final

Capped Collections

• Extremely efficient on write

• Extremely efficient for insertion-order reads

• Very efficient if queries are ‘local’

• events in same timebucket typically arrived at nearby times and so are nearby on disk

Page 40: Harmony intune final

don’t like the answer?

change the question.

Page 41: Harmony intune final

uncapped events

capped metrics:

metrics are a view on data

mainline

Page 42: Harmony intune final

capped events

uncapped metrics:

events are ephemeral

hi-scale branch

Page 43: Harmony intune final

Harmony

• Make your pattern of access match your system’s strengths and rhythm

Page 44: Harmony intune final

Validate Mental Model

Page 45: Harmony intune final

Easy fixes

• Duplicate requests = duplicate calculations

• Cube patch for request queues exists

• Easy fix!

• Non-pyramidal aggregates are inefficient

• Remove until things are under control

• ( solve paralyzing problems first )
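The duplicate-request point can be illustrated by coalescing identical in-flight requests so the calculation runs once. This is a generic sketch, not the Cube request-queue patch the slide mentions; `makeCoalescer` and the example expression string are hypothetical.

```javascript
// Wrap a compute function so that concurrent requests for the same key
// share one calculation instead of each triggering their own.
function makeCoalescer(compute) {
  const inFlight = new Map(); // key -> pending promise
  return function request(key) {
    if (inFlight.has(key)) return inFlight.get(key);
    const promise = new Promise((resolve) => resolve(compute(key)))
      .finally(() => inFlight.delete(key)); // forget the key once settled
    inFlight.set(key, promise);
    return promise;
  };
}

// Example: two dashboards ask for the same metric at the same moment.
let calculations = 0;
const request = makeCoalescer((expression) => {
  calculations += 1;
  return expression.length; // stand-in for an expensive metric calculation
});
const first = request("sum(webreq)");
const second = request("sum(webreq)");
console.log(first === second, calculations); // same promise, one calculation
```

The entry is dropped once the calculation settles, so this only removes concurrent duplicates; repeat requests over time are what the metric cache is for.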

Page 46: Harmony intune final

cube 101

Page 47: Harmony intune final

Cube Systems

Page 48: Harmony intune final

Collector

• Receives events

• writes to MongoDB

• marks metrics for re-calculation (“invalidates”)
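Invalidation can be pictured as: for each tier, find the bucket containing the new event's timestamp and flag the cached metric for it. The sketch below is a simplified model with an in-memory `Map` as the cache; the tier list, key format and `invalid` flag are assumptions, not Cube's schema.

```javascript
// Tier sizes in milliseconds: 10 seconds, 1 minute, 5 minutes.
const TIERS = [10e3, 60e3, 300e3];

// The bucket at each tier that contains the event's timestamp.
function bucketsToInvalidate(eventTime) {
  const ms = new Date(eventTime).getTime();
  return TIERS.map((tier) => ({ tier: tier, start: Math.floor(ms / tier) * tier }));
}

// Flag affected cached metrics so the evaluator recalculates them on demand.
function invalidate(cache, eventTime) {
  for (const { tier, start } of bucketsToInvalidate(eventTime)) {
    const entry = cache.get(tier + ":" + start);
    if (entry) entry.invalid = true;
  }
}
```

Every incoming event touches one bucket per tier, so at thousands of events per second the invalidation traffic against the metrics store is far from free.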

Page 49: Harmony intune final

Evaluator

• receives, parses requests for metrics

• calculates metrics “pyramidally”

• then stores them, cached

Page 50: Harmony intune final

Pyramidal Aggregation

5min: 90

1min: 10 20 15 25 10 10

10s: 1 5 2 0 2 0 | 6 4 7 1 0 2 | 2 3 2 4 2 2 | 5 5 4 6 4 1 | 2 7 0 0 0 1 | 6 0 0 1 0 3

ev ev ev ev ev ev ...

Page 51: Harmony intune final

Pyramidal Aggregation

5min:

1min:

10s: 1 5 2 0 2 0 | 6 4 7 1 0 2 | 2 3 2 4 2 2

ev ev ev ev ev ev ...

Page 52: Harmony intune final

Uses Cached Results

5min:

1min: 10 20 15 25 10

10s: 1 5 2 0 2 0 | 6 4 7 1 0 2 | 2 3 2 4 2 2 | 5 5 4 6 4 1 | 2 7 0 0 0 1

ev ev ev ev ev ev ...

Page 53: Harmony intune final

Pyramidal Aggregation

5 min

1 min

10 sec

ev ev ev ev ev....

• calculates metrics...

• from metrics and constants ... from metrics ...

• from events

• (then stores them, cached)
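For a sum, the pyramid is just repeated roll-ups: each tier is computed from the tier below rather than from raw events. The sketch below reuses the 10-second counts from the earlier diagram; `rollUp` is a made-up helper and caching is omitted.

```javascript
// Sum a lower tier into the next tier up, fanIn buckets at a time.
function rollUp(values, fanIn) {
  const out = [];
  for (let i = 0; i < values.length; i += fanIn) {
    out.push(values.slice(i, i + fanIn).reduce((a, b) => a + b, 0));
  }
  return out;
}

// 10-second counts from the diagram, six per minute.
const tenSec = [
  1, 5, 2, 0, 2, 0,  6, 4, 7, 1, 0, 2,  2, 3, 2, 4, 2, 2,
  5, 5, 4, 6, 4, 1,  2, 7, 0, 0, 0, 1,  6, 0, 0, 1, 0, 3,
];

const oneMin = rollUp(tenSec, 6);          // one value per minute
const top = rollUp(oneMin, oneMin.length); // the diagram's top-level total

console.log(oneMin, top);
```

When a single 10-second bucket changes, only its minute and the total above it need recomputing. This works for aggregates that combine cleanly (sum, min, max); a median cannot be built from the medians below it.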

Page 54: Harmony intune final

fast writes

Page 55: Harmony intune final

how fast can we write?

Page 56: Harmony intune final

how fast can we write?

FAST

streaming writes: way efficient

Page 57: Harmony intune final

locked out

Page 58: Harmony intune final

Writes and Invalidations

Page 59: Harmony intune final

Inserts Stop Every 5s

• working

• working

• ANGRY

• ANGRY

• working

• working

Page 60: Harmony intune final

Thanks, mongostat!

• working

• working

• ANGRY

• ANGRY

• working

• working

...

(simulated)

Page 61: Harmony intune final

Inserts Stop Every 5s

Events Collection: hi-speed writes, localized reads

Metrics Collection: randomish reads, hi-speed deletes, updates

Page 63: Harmony intune final

Inserts Stop Every 5s

• What’s really going on?

• Database write locks

• Events and metrics have conflicting locks

• Solution: split the databases

Events Collection: hi-speed writes, localized reads

Metrics Collection: randomish reads, hi-speed deletes
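In configuration terms, the split amounts to pointing the event store and the metric store at different databases. The sketch assumes a MongoDB version whose write lock is scoped to a database (as in the 2.2 series); the function name and the database and collection names are hypothetical.

```javascript
// Hypothetical config: events and metrics live in separate databases, so a
// write lock taken for metric invalidation does not stall event inserts.
function storeConfig(host, port) {
  const base = "mongodb://" + host + ":" + port + "/";
  return {
    events: { url: base + "cube_events", collection: "events" },
    metrics: { url: base + "cube_metrics", collection: "metrics" },
  };
}

console.log(storeConfig("localhost", 27017));
```

The two URLs could equally point at different hosts, which is where the later "dedicated databases" advice leads.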

Page 64: Harmony intune final

fast reads

Page 65: Harmony intune final

Pre-cache Metrics

• Keep metrics fresh (Warmer)

• Only calculate recent updates (Horizons)
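Both ideas are small in code terms. The sketch below is a guess at their shape, not the hi-scale branch's implementation: `withinHorizon`, `warmOnce`, `startWarmer` and the expression strings are all hypothetical.

```javascript
// Horizon: only react to events recent enough to matter.
function withinHorizon(eventTimeMs, nowMs, horizonMs) {
  return nowMs - eventTimeMs <= horizonMs;
}

// Warmer: request every dashboard metric once so its cache entry is fresh.
function warmOnce(expressions, requestMetric) {
  for (const expression of expressions) requestMetric(expression);
}

// Repeat the warming pass on a timer; returns a function that stops it.
function startWarmer(expressions, requestMetric, intervalMs) {
  const timer = setInterval(() => warmOnce(expressions, requestMetric), intervalMs);
  return () => clearInterval(timer);
}
```

With a warmer running, the person opening the dashboard reads cached values instead of paying for the calculation; the horizon keeps a late-arriving old event from triggering recalculation of history.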

Page 66: Harmony intune final

fancy metrics

Page 67: Harmony intune final

Non-pyramidal Aggregates

• Can’t calculate from warmed metrics

• Store values with counts in metrics

• Counts can be vivified for aggregations

• Smaller footprint than full events

• Works best for dense, finite values
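The "values with counts" idea can be sketched as a per-bucket histogram: each metric bucket stores how often each value occurred, and buckets are merged before computing the aggregate. The function names and the status-code example are illustrative assumptions.

```javascript
// Merge per-bucket histograms (Map of value -> count) into one.
function mergeCounts(buckets) {
  const merged = new Map();
  for (const bucket of buckets) {
    for (const [value, count] of bucket) {
      merged.set(value, (merged.get(value) || 0) + count);
    }
  }
  return merged;
}

// Lower median of the multiset described by a value -> count map.
function medianFromCounts(counts) {
  const entries = [...counts].sort((a, b) => a[0] - b[0]);
  const total = entries.reduce((n, entry) => n + entry[1], 0);
  if (total === 0) return undefined;
  const target = Math.ceil(total / 2);
  let seen = 0;
  for (const [value, count] of entries) {
    seen += count;
    if (seen >= target) return value;
  }
}

// Example: HTTP status codes seen in two one-minute buckets.
const minute1 = new Map([[200, 5], [404, 1]]);
const minute2 = new Map([[200, 2], [500, 3]]);
const merged = mergeCounts([minute1, minute2]);
console.log(medianFromCounts(merged), merged.size); // median and distinct count
```

The histogram stays small when the values are dense and finite (status codes, small integers); for something like raw durations it degenerates toward storing every event, which is the caveat in the last bullet.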

Page 68: Harmony intune final

finally, scaling

Page 69: Harmony intune final

Multicore

• MongoDB

• Writes limited to single core

• Requires sharding for multicore

Page 70: Harmony intune final

Multicore

• Cube (node.js)

• Concurrent, but not multi-threaded

• Easy solution

• Multiple collectors on different ports

• Produces redundant invalidations

• Requires external load balancing
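The load-balancing half of this is ordinary round-robin over the collector ports. In production that job belongs to an external balancer, as the slide says; the sketch below only shows the rotation, and the port numbers are made up.

```javascript
// Rotate through a fixed list of collector ports.
function makeRoundRobin(ports) {
  let i = 0;
  return function nextPort() {
    const port = ports[i % ports.length];
    i += 1;
    return port;
  };
}

// Hypothetical ports, one collector process per core.
const nextPort = makeRoundRobin([1180, 1181, 1182]);
const picks = [nextPort(), nextPort(), nextPort(), nextPort()];
console.log(picks);
```

Because events for the same time bucket now land on different collectors, each collector issues its own invalidation for that bucket, which is the redundancy noted above.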

Page 71: Harmony intune final

Multicore

Page 72: Harmony intune final

Hardware

• High Memory

• Capped events size scales with memory

• CPU

• Mongo / cube not optimized for multicore

• Faster cores

• EC2 Best value: m2.2xlarge

• < $700/mo, 34.2GB RAM, 13 bogo-hertz

Page 73: Harmony intune final

Cloud helps

• Tune machines to application

• Dedicating databases for each application makes life a lot easier

Page 76: Harmony intune final

good ideas that didn’t help

Page 77: Harmony intune final

Queues

• Different queueing methods

• Should optimize metric calculations

• No significant improvement

Page 78: Harmony intune final

Locks: update VS remove

• Uncapped metrics allow ‘remove’ as invalidation option

• Remove doesn’t help with database locks

• It was a stupid idea anyway: that’s OK

• “Hey, poke it and see what happens!”

Page 79: Harmony intune final

Mongo Aggregations

• Mongo has aggregations!

• Node ends up working better

• Mongo aggregations aren’t faster

• Less flexible

• Would require query language rewrite

Page 80: Harmony intune final

Why not Graphite?

• Data model

• Metrics-centric vs Events-centric (metrics code not intertwingled with app code)

• Environment familiarity

• Cube: d3, node.js, mongo

• Graphite: Django, Whisper, C