A Lean Approach to Monitoring
September 15, 2015
About Ernest
• Product Manager at IDERA in Austin, TX
• 20 years of IT experience, from startups to enterprise shops
• Runs CloudAustin user group, DevOpsDays Austin conference
• Twitter: @ernestmueller
• Blog: theagileadmin.com
Agenda
The Monitoring Landscape
What Is Lean?
MVP Monitoring Areas
Next Steps
Monitoring Your Systems
Monitoring Your Applications
Monitoring Tools
• Network (SNMP, Netflow)
• Server (SNMP, WMI, system)
• Virtualization/Cloud/Container
• Real User Monitoring (network, browser)
• Service Endpoint (simple/transactional, local/remote)
• Application (management interface, instrumentation)
• Software metrics (database, web/app server)
• Custom metrics (application)
• Logging, Security, Analytics, Reporting, More…
What To Do?
• Monitor it all?
  – Expensive
  – Complex
• How deep?
  – Monitor parts of it?
  – Gaps in visibility
  – Which parts?
Monitoring Pitfalls
• “I have 100,000 metrics, but still can’t tell if the site is down?”
• “Did you know we’re generating 30% of our system load from monitoring?”
• “It’s going to cost how much? Maybe, but the procurement cycle will be 9 months…”
• “We’re spending 2 headcount just on maintaining our monitoring systems!”
• “We get so many alerts we need a secondary triage system so we know which ones to pay attention to.”
What Is Lean?
• Eliminate Waste
• Amplify Learning
• Decide as late as possible
• Deliver as fast as possible
• Empower the team
• Build quality in
• See the whole
Lean Principles
Your Monitoring Is A Product
• Build – Minimum Viable Monitoring
• Measure – All the Monitoring Points
• Learn – About the App and the Monitoring
• Repeat – Go Deeper Where It’s Needed
Iterate Through A Development Cycle
Monitoring MVP Areas
1. Service Performance and Uptime
2. Software Component Metrics
3. System Metrics
4. Application Metrics
What are the most important areas to cover?
Service Performance and Uptime
• Remember lean principle “see the whole”
• “What do my users see?”
• MVP: external synthetic probe of the end service
• Next: RUM, waterfalls, transactions
• Later: transaction warehousing, cross-tier transaction tracing
The end user view is always the most critical
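One way the MVP external synthetic probe might look, as a minimal sketch rather than a finished tool: fetch the end service the way a user would, then record whether it answered and how long it took. The names `probe` and `ProbeResult`, the 5-second timeout, and the injectable `fetch` parameter are assumptions made for this illustration; only `urllib.request.urlopen` comes from the Python standard library.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    url: str
    up: bool                  # True when the service answered with a 2xx/3xx status
    status: Optional[int]     # HTTP status code, or None if no response arrived
    seconds: float            # wall-clock time spent on the request
    error: Optional[str] = None


def probe(url, timeout=5.0, fetch=urllib.request.urlopen):
    """Request the end service once and report uptime and response time.

    `fetch` is a hypothetical injection point so the probe can be exercised
    without a network; by default it is urllib.request.urlopen.
    """
    start = time.monotonic()
    try:
        with fetch(url, timeout=timeout) as response:
            response.read()   # pull the whole body, as a browser would
            status = response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return ProbeResult(url, False, exc.code, time.monotonic() - start, str(exc))
    except OSError as exc:
        # DNS failures, refused connections and timeouts are OSError subclasses.
        return ProbeResult(url, False, None, time.monotonic() - start, str(exc))
    return ProbeResult(url, 200 <= status < 400, status, time.monotonic() - start)
```

To reflect what users see, a probe like this would run on a schedule from outside the hosting environment, for example `probe("https://www.example.com/")` every minute, with the `up` flag and `seconds` value fed into whatever alerting is already in place.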
Remember the Process
• Build – Minimum Viable Monitoring
• Measure – All the Monitoring Points
• Learn – About the App and the Monitoring
• Repeat – Go Deeper Where It’s Needed
Lean Development Cycle
Software Component Metrics
• “Is my service up?”
• Check ports/processes for actionable outages
• MVP: local probes
• Next: More metrics beyond uptime and response time (most have a set they expose)
• Later: Advanced deep dive database and other app component APM
What you can page people on
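A local probe for "is my service up?" can be as small as a TCP connect against each component's port. The sketch below assumes the components listen on TCP; the helper names `port_open` and `down_components` and the example component map are hypothetical, chosen only for illustration.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat all of these as "down".
        return False


def down_components(components, check=port_open):
    """Return the names of components whose port is not accepting connections.

    `components` maps a name to a (host, port) pair; `check` can be swapped
    out for another probe with the same signature.
    """
    return [
        name
        for name, (host, port) in components.items()
        if not check(host, port)
    ]
```

Called with something like `down_components({"web": ("127.0.0.1", 80), "db": ("127.0.0.1", 5432)})`, a non-empty result is the kind of actionable outage worth paging on.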
System and Network Metrics
• “What is the root cause?”
• Load on your systems and network devices
• MVP: basic system metrics (CPU/mem/disk/network)
• Next: More depth, cloud/virt/container layer stats
• Later: Netflow, deeper dive into specific hardware platform metrics (SANs, etc.)
Diagnosing Issues
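As a rough illustration of "basic system metrics" at the MVP stage, the sketch below collects two of them using only the Python standard library and flags any that exceed a limit. It assumes a Unix-like host (`os.getloadavg` is not available on Windows) and covers only CPU load and disk; memory and network would need another source. The function names, metric names, and limits are hypothetical.

```python
import os
import shutil


def collect_basic_metrics(path="/"):
    """Gather a small set of host metrics (Unix-like systems only)."""
    usage = shutil.disk_usage(path)
    load1, _load5, _load15 = os.getloadavg()   # 1-, 5- and 15-minute load averages
    cpus = os.cpu_count() or 1
    return {
        # Load normalised by CPU count, so about 1.0 means "fully busy".
        "load1_per_cpu": load1 / cpus,
        "disk_used_pct": 100.0 * usage.used / usage.total,
    }


def breaches(metrics, limits):
    """Return only the metrics whose value is above its configured limit."""
    return {
        name: value
        for name, value in metrics.items()
        if name in limits and value > limits[name]
    }
```

A check such as `breaches(collect_basic_metrics(), {"disk_used_pct": 90.0, "load1_per_cpu": 1.0})` returns an empty dict on a healthy host, which keeps the starting point small before going deeper where it is needed.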
Application Metrics
• “What is really going on?”
• The app knows, get the app to tell you
• MVP: Logging and log aggregation
• Later: Better logging
• Next: Specific app metric emission, application instrumentation (Management API or bytecode)
Business value and troubleshooting specifics
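To "get the app to tell you" in a form a log aggregator can parse, one option is to emit each log record as a single JSON line. This is a sketch built on Python's standard `logging` module; the `JsonFormatter` class, the `build_logger` helper, and the convention of passing extra values under a `fields` key are assumptions for this example, not a feature of any particular aggregation product.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Hypothetical convention: callers pass extra={"fields": {...}} to
        # attach app-specific values such as timings or business counts.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def build_logger(name, stream=None):
    """Create a logger that writes JSON lines to `stream` (stderr by default)."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False   # avoid duplicate lines via the root logger
    return logger
```

With this in place, a call like `log.info("order placed", extra={"fields": {"order_ms": 182}})` produces a line carrying both the message and the `order_ms` value, which is a small first step toward specific app metric emission.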
Think About The Principles
• Eliminate Waste
• Amplify Learning
• Decide as late as possible
• Deliver as fast as possible
• Empower the team
• Build quality in
• See the whole
Lean Principles
Quick Demo
• CopperEgg – Ultra quick-start SaaS-based monitoring with basics on systems, endpoints, RUM, custom metrics
• Uptime – Download and install infrastructure and application monitoring
• Precise – APM suite with deep support for everything from SAP to Java to SQL
Monitor At the Right Depth
Questions?
Monitor the Lean way…