why visibility into your stack matters
Why visibility into your stack matters
or, Do you see it all?
Mike Fiedler
Operations
Datadog.com
Twitter: @mikefiedler
GitHub: @miketheman
OpsSchool.org
Chef Community
Roller Derby Referee
Skydiver
©Alex Erde
–CEO calling your cellphone at 03:00
“The site is slow.”
What?
• typical monitoring implementation story
• an alternative approach
(CC BY 2.0) http://www.gotcredit.com/ https://flic.kr/p/6439SA
LB
Data
User
Web
(CC BY 2.0) www.futurealpha.com https://flic.kr/p/8PhF4g
(CC BY 2.0) Aristocrats-hat https://flic.kr/p/6qdTC1
–W. Edwards Deming, as quoted in The Elements of Statistical Learning
“In God we trust; all others bring data.”
You want more?
• graphite
• ganglia
• mongodb
• mysql
• influxdb
• socket.io
• datadog
• …
Time is a Cruel Master
(CC BY-SA 2.0) https://www.flickr.com/theilr/ https://flic.kr/p/8MC5YM
Have
• systems
• applications
• services
• developers
• operators
• customers
Polyglot Platforms
Complex Systems
Disparate Locations
Information Overload
–CEO calling your cellphone at 03:00
“The site is slow.”
(CC BY 2.0) www.futurealpha.com https://flic.kr/p/8PhF4g
What exactly are we
monitoring, anyhow?
Top-down
• work metrics
• resource metrics
• events
Work Metrics: throughput, success vs error, performance
Resource Metrics: utilization, saturation, errors, availability
Events: change/build/deploy, alerts, etc.
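As a rough illustration of the work metrics listed above, here is a minimal Python sketch that derives throughput, error rate, and a latency percentile from one window of request records. The `Request` record shape, the `work_metrics` function, and the choice of a nearest-rank p95 are all assumptions made for this example; they are not taken from the talk or from any particular monitoring product.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """One handled request (hypothetical record shape for this sketch)."""
    duration_ms: float
    ok: bool


def work_metrics(requests, window_seconds):
    """Summarize throughput, error rate, and p95 latency for one time window."""
    total = len(requests)
    if total == 0:
        return {"throughput_per_s": 0.0, "error_rate": 0.0, "p95_ms": 0.0}

    errors = sum(1 for r in requests if not r.ok)
    durations = sorted(r.duration_ms for r in requests)
    # Nearest-rank percentile: take the value at 1-indexed position ceil(0.95 * n).
    p95 = durations[math.ceil(0.95 * total) - 1]

    return {
        "throughput_per_s": total / window_seconds,  # throughput
        "error_rate": errors / total,                # success vs error
        "p95_ms": p95,                               # performance
    }


if __name__ == "__main__":
    # Made-up sample data: nine fast successes and one slow failure in 10 seconds.
    sample = [Request(100.0, True) for _ in range(9)] + [Request(900.0, False)]
    print(work_metrics(sample, window_seconds=10))
```

The three returned values map onto the three work-metric groups named on the slide; resource metrics and events would need their own collection paths.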
Trend resource metrics,
notify on changes
Wake people up when
work metrics go awry
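The two rules above (trend and notify on resource metrics, wake people up on work metrics) can be sketched as a small routing function. This is only an illustration under stated assumptions: the `Kind` and `Action` enums, the `route` function, and the simple "value above threshold" check are hypothetical names and logic invented for the example, not an API of any monitoring tool.

```python
from enum import Enum


class Kind(Enum):
    WORK = "work"          # throughput, errors, performance
    RESOURCE = "resource"  # utilization, saturation, availability


class Action(Enum):
    NONE = "none"      # within bounds: keep recording the trend
    NOTIFY = "notify"  # low urgency: email, chat, ticket
    PAGE = "page"      # high urgency: wake someone up


def route(kind, value, threshold):
    """Pick an alert action for one metric reading (hypothetical policy)."""
    if value <= threshold:
        return Action.NONE
    # Only a work metric going out of bounds justifies paging a human;
    # a resource metric crossing its threshold is just a notification.
    return Action.PAGE if kind is Kind.WORK else Action.NOTIFY


if __name__ == "__main__":
    # Made-up readings: disk at 91% against a 90% limit, error rate 7% against 5%.
    print(route(Kind.RESOURCE, 0.91, 0.90))
    print(route(Kind.WORK, 0.07, 0.05))
```

Keeping the page/notify decision tied to the metric kind rather than to each individual check is one way to keep overnight pages limited to user-visible problems.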
Slice and Dice
Set-and-Forget
Just-In-Time
Information
Does it scale?
Customer Stats
• AdRoll, ~2m transactions/second
• SimpleReach, ~7b measurements/day
• MercadoLibre, ~18k hosts monitored
• AirBnB, 3000+ monitors defined
–M. Fiedler
“If you don’t measure, you don’t know.”
Questions?
Mike Fiedler
Operations
Twitter: @mikefiedler
GitHub: @miketheman
OpsSchool.org
Chef Community
Roller Derby Referee
Skydiver
©Alex Erde