Nagios Conference 2014 - David Josephsen - Alert on What You Draw
DESCRIPTION
David Josephsen's presentation "Alert on What You Draw". The presentation was given during the Nagios World Conference North America, held Oct 13th - Oct 16th, 2014 in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/conference
TRANSCRIPT
Alert on What You Draw
tim.mycorp.com bob.mycorp.com
Developer cat: wants to change things
Change control: says no.
You guys ok?
YUP YUP
Process-Oriented Model
Given a finite number of reliable systems and full environmental control, run processes for as long as possible.
The Cloud
Virtual
Multi-Tenant
Massive Infrastructure
Compulsory Maintenance
US-EAST
AZ-1
AZ-2
AZ-3
XXXX
Service-Oriented Model
Design reliable services atop an infinite number of unreliable and uncontrollable systems.
Here’s a log line wrapped in an HTTP GET request
HTTP 200 OK!
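A minimal Python sketch of the idea on this slide, shipping a log line inside an HTTP GET request. The collector URL `http://collector.mycorp.com/log` and the `host`/`line` query parameters are hypothetical, not from the talk. The sketch only builds and decodes the URL so it runs offline; a real client would send it (for example with `urllib.request.urlopen`) and expect `HTTP 200 OK` back.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical collector endpoint (an assumption, not part of the original talk).
COLLECTOR = "http://collector.mycorp.com/log"


def log_line_to_get_url(line: str, host: str) -> str:
    """Wrap a single log line in the query string of an HTTP GET URL."""
    # urlencode percent-encodes spaces, slashes, '=', '&', etc. so the line survives transport.
    query = urlencode({"host": host, "line": line})
    return f"{COLLECTOR}?{query}"


def get_url_to_log_line(url: str) -> str:
    """What the receiving side would do: recover the original log line."""
    return parse_qs(urlsplit(url).query)["line"][0]


if __name__ == "__main__":
    url = log_line_to_get_url("GET /api/users 200 87ms", host="tim.mycorp.com")
    print(url)
    print(get_url_to_log_line(url))
```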
} < 100ms
} < 100ms
tim.mycorp.com
It Scales
I can change how it works
It’s resilient
I can change how it works
Latency, Queues, Workers
Monitored from within…
…by Developers
Summarized at the service level
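Instrumenting latency, queues, and workers from within each process and summarizing them at the service level might look like the following Python sketch. The `InstanceMetrics` class and `summarize_service` function are hypothetical names, and choosing a 95th-percentile latency plus summed queue depth and busy-worker counts as the service-level summary is an assumption for illustration.

```python
import statistics
from dataclasses import dataclass, field


@dataclass
class InstanceMetrics:
    """Metrics one process records about itself (instrumented from within)."""
    name: str
    latencies_ms: list[float] = field(default_factory=list)
    queue_depth: int = 0
    busy_workers: int = 0

    def record_request(self, latency_ms: float) -> None:
        # Called by the application code itself, e.g. at the end of each request.
        self.latencies_ms.append(latency_ms)


def summarize_service(instances: list[InstanceMetrics]) -> dict[str, float]:
    """Roll per-instance metrics up into one service-level view."""
    # Pool raw samples from every instance before taking a percentile;
    # averaging per-instance percentiles would not give a true service percentile.
    all_latencies = [x for inst in instances for x in inst.latencies_ms]
    if len(all_latencies) < 2:
        raise ValueError("need at least two latency samples")
    # quantiles(n=20) returns 19 cut points; index 18 is the 95th percentile.
    # The "inclusive" method keeps the result within the observed range.
    p95 = statistics.quantiles(all_latencies, n=20, method="inclusive")[18]
    return {
        "latency_p95_ms": p95,
        "queue_depth": sum(i.queue_depth for i in instances),
        "busy_workers": sum(i.busy_workers for i in instances),
    }
```

Summing queue depth and busy workers across instances, while pooling latency samples before computing the percentile, keeps the summary about the service rather than about any single host.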
Can be polled externally
State data about hosts, on the order of minutes
Operations controls and configures hosts and services

Must be instrumented internally
Performance data about services, on the order of seconds
Any engineer can create new ad-hoc metrics
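One way to connect service performance data back to Nagios is a check whose alert and graph come from the same number. The sketch below follows the standard Nagios plugin conventions (exit codes 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN; performance data after a `|` as `label=value[UOM];warn;crit;min;max`). The `check_latency` name, the mean-of-samples aggregation, and the 100 ms / 250 ms thresholds are assumptions; the 100 ms figure only echoes the latency budget on the earlier slide.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_latency(samples_ms: list[float], warn_ms: float = 100.0,
                  crit_ms: float = 250.0) -> tuple[int, str]:
    """Return (exit_code, output_line) for the mean of recent latency samples.

    The same value is emitted as perfdata after the '|', so the number that
    gets graphed is exactly the number the alert was evaluated on.
    """
    if not samples_ms:
        return UNKNOWN, "UNKNOWN - no latency samples"
    mean_ms = sum(samples_ms) / len(samples_ms)
    if mean_ms >= crit_ms:
        code = CRITICAL
    elif mean_ms >= warn_ms:
        code = WARNING
    else:
        code = OK
    perfdata = f"latency={mean_ms:.1f}ms;{warn_ms:g};{crit_ms:g};0"
    return code, f"{STATUS_NAMES[code]} - mean latency {mean_ms:.1f}ms | {perfdata}"


def main() -> int:
    # Hypothetical samples; a real check would read the service's own metrics.
    code, line = check_latency([80.0, 95.0, 130.0])
    print(line)
    return code

# Nagios reads the exit status, so a real plugin would end with:
#     raise SystemExit(main())
```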
I can change how it works
internally instrumented
metrics measured every few seconds
write-accessible by every engineer
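Ad-hoc metrics that any engineer can create, flushed every few seconds, can be sketched with a named-counter registry. This assumes a Graphite-style plaintext protocol (`<metric path> <value> <timestamp>` per line, conventionally sent over TCP port 2003), which the talk does not specify; `format_metric`, `AdHocCounters`, and the `tim.api` prefix are hypothetical.

```python
import time
from collections import defaultdict
from typing import Optional


def format_metric(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Format one sample as a Graphite-style plaintext line (an assumed wire format)."""
    if timestamp is None:
        timestamp = int(time.time())
    if " " in path:
        raise ValueError("metric paths must not contain spaces")
    return f"{path} {value:g} {timestamp}\n"


class AdHocCounters:
    """Any code path can bump a counter by name; no central configuration required."""

    def __init__(self, prefix: str):
        self.prefix = prefix
        self.counts: defaultdict[str, int] = defaultdict(int)

    def incr(self, name: str, amount: int = 1) -> None:
        # A brand-new name simply starts at zero, so creating a metric is free.
        self.counts[name] += amount

    def flush(self, timestamp: int) -> list[str]:
        """Emit one line per counter, then reset; meant to run every few seconds."""
        lines = [format_metric(f"{self.prefix}.{name}", count, timestamp)
                 for name, count in sorted(self.counts.items())]
        self.counts.clear()
        return lines
```

The design choice here is that creating a metric needs no operations-side configuration: calling `incr` with a new name is enough, which is what makes the metrics write-accessible to every engineer.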
I can change how it works
Undermines Credibility
Silos Knowledge
Multiplies Burden
Heka: http://hekad.readthedocs.org/en/v0.7.2/
Riemann: http://riemann.io/
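Heka and Riemann both process metrics as event streams. The general idea of evaluating an alert continuously on the same stream that feeds the graphs can be sketched in Python; this does not use either tool's API or configuration language, and the `WindowedThreshold` class, the 10-second window, and the threshold of 100 are hypothetical.

```python
from collections import deque
from typing import Optional


class WindowedThreshold:
    """Alert on a rolling mean over the last `window_s` seconds of a metric stream."""

    def __init__(self, window_s: float, threshold: float):
        self.window_s = window_s
        self.threshold = threshold
        self.events: deque[tuple[float, float]] = deque()  # (timestamp, value), oldest first
        self.alerting = False

    def observe(self, ts: float, value: float) -> Optional[str]:
        """Feed one event; return 'alert' or 'recover' on a state change, else None."""
        self.events.append((ts, value))
        # Drop events that have fallen out of the time window.
        while self.events and self.events[0][0] <= ts - self.window_s:
            self.events.popleft()
        mean = sum(v for _, v in self.events) / len(self.events)
        now_alerting = mean > self.threshold
        if now_alerting != self.alerting:
            self.alerting = now_alerting
            return "alert" if now_alerting else "recover"
        return None
```

Reporting only state changes, rather than every event over the threshold, keeps a noisy stream from producing a flood of duplicate notifications.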
LIVE Demo Ahead!
Questions?