Life On-Call, Availa-liberty, & the Pursuit of Happiness Runbooks
Dave Cliffe (@CliffeHangers)

Upload: dave-cliffe

Post on 14-Feb-2017


Category:

Software



TRANSCRIPT

Page 1: Life On-Call, Availa-liberty, and the Pursuit of Happiness

Life On-Call, Availa-liberty, & the Pursuit of Happiness Runbooks

Dave Cliffe

@CliffeHangers

Page 2: Life On-Call, Availa-liberty, and the Pursuit of Happiness
Page 3: Life On-Call, Availa-liberty, and the Pursuit of Happiness

Incident #1: Oct 27, 2011

Incident #2: May 1, 2013

Incident #3: Nov 2, 2015

Page 4: Life On-Call, Availa-liberty, and the Pursuit of Happiness
Page 5: Life On-Call, Availa-liberty, and the Pursuit of Happiness

[Diagram: "Your Fastest Path to Incident Resolution". Events from monitoring, deployment, and ticketing tools (app, system, log, web, mobile app) covering microservices, apps & services, containers, cloud, network, database, and servers are correlated, clustered, and managed into alerts. On-call scheduling and automatic escalations bring in developers, NOC, helpdesk, and IT ops for collaboration and resolution. Other labels: People, Data, Process; System and User Efficiency.]

Page 6: Life On-Call, Availa-liberty, and the Pursuit of Happiness
Page 7: Life On-Call, Availa-liberty, and the Pursuit of Happiness

Availability

Page 8: Life On-Call, Availa-liberty, and the Pursuit of Happiness

Every software-powered company experiences downtime

http://www.evolven.com/blog/downtime-outages-and-failures-understanding-their-true-costs.html

Cost of outages:

$7,400,000 annual cost at 175 hours of downtime (Gartner)
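To put the figure above in per-hour terms, here is a minimal arithmetic sketch. The function names and the 8,760-hour year are assumptions made for illustration; only the $7,400,000 and 175-hour inputs come from the slide.

```python
# Illustrative arithmetic only; function names are hypothetical.
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years (assumption)


def cost_per_hour(annual_cost: float, downtime_hours: float) -> float:
    """Average cost of one hour of downtime."""
    return annual_cost / downtime_hours


def availability_pct(downtime_hours: float,
                     period_hours: float = HOURS_PER_YEAR) -> float:
    """Availability over the period, as a percentage."""
    return 100.0 * (1 - downtime_hours / period_hours)


if __name__ == "__main__":
    print(round(cost_per_hour(7_400_000, 175)))   # 42286
    print(round(availability_pct(175), 2))        # 98.0
```

Under these assumptions, 175 hours of downtime a year works out to roughly $42,000 per hour and about 98% availability.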

Page 9: Life On-Call, Availa-liberty, and the Pursuit of Happiness

“The most important ability is availability.”

All CEOs everywhere

Page 10: Life On-Call, Availa-liberty, and the Pursuit of Happiness

Why is Availability a terrible metric?

Page 11: Life On-Call, Availa-liberty, and the Pursuit of Happiness

The Tyranny of the SLA

credit: J. Paul Reed (@jpaulreed)

Page 12: Life On-Call, Availa-liberty, and the Pursuit of Happiness

“System Availability” means the percentage of total time during which the Hosted Service network is available to Client and Client is able to access the Hosted Service system interface.

______ warrants the following minimum levels of Hosted Service System Availability during each calendar month: 99.95%

The following definitions will apply to the calculation of “availability”: “Hosted Service System Availability” means the percentage of total time during each calendar month during which the Hosted Service is available to Client, excluding Scheduled Downtime and Emergency Maintenance

An actual SaaS SLA
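One way to read the SLA language above is sketched below. The SLA does not say exactly how excluded time is treated, so this sketch assumes that Scheduled Downtime and Emergency Maintenance are removed from the measured period and that `unavailable_minutes` counts only outages outside those windows. All names are hypothetical.

```python
# A sketch of one possible reading of the SLA text, not the vendor's formula.

def system_availability(total_minutes: float,
                        unavailable_minutes: float,
                        excluded_minutes: float = 0.0) -> float:
    """Percent of measured time the service was available.

    Assumption: excluded (scheduled/emergency maintenance) minutes are
    dropped from the measured period entirely.
    """
    measured = total_minutes - excluded_minutes
    if measured <= 0:
        raise ValueError("measured period must be positive")
    return 100.0 * (measured - unavailable_minutes) / measured


def downtime_budget_minutes(target_pct: float, total_minutes: float) -> float:
    """Minutes of downtime allowed before the target is missed."""
    return total_minutes * (1 - target_pct / 100.0)


if __name__ == "__main__":
    month = 30 * 24 * 60  # a 30-day calendar month, in minutes
    print(round(downtime_budget_minutes(99.95, month), 1))       # 21.6
    # 30 minutes of unplanned outage plus 120 minutes of maintenance:
    print(round(system_availability(month, 30, 120), 2))         # 99.93
```

On this reading, a 99.95% monthly target leaves about 21.6 minutes of downtime in a 30-day month, and any time labelled as maintenance never counts against it, whatever the customer experienced.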

Page 13: Life On-Call, Availa-liberty, and the Pursuit of Happiness

Are you Available?

Page 14: Life On-Call, Availa-liberty, and the Pursuit of Happiness

Happiness

Page 15: Life On-Call, Availa-liberty, and the Pursuit of Happiness
Page 16: Life On-Call, Availa-liberty, and the Pursuit of Happiness

Measuring (Un)Happiness

Page 17: Life On-Call, Availa-liberty, and the Pursuit of Happiness

Responsiveness

Page 18: Life On-Call, Availa-liberty, and the Pursuit of Happiness

Pain

Page 19: Life On-Call, Availa-liberty, and the Pursuit of Happiness

Health Checks

https://labs.spotify.com/2014/09/16/squad-health-check-model/

Page 20: Life On-Call, Availa-liberty, and the Pursuit of Happiness

Happiness++

Page 21: Life On-Call, Availa-liberty, and the Pursuit of Happiness

http://www.activestate.com/blog/2014/01/devops-hero-culture

Beware the ‘Hero Culture’

Page 22: Life On-Call, Availa-liberty, and the Pursuit of Happiness

Eliminate Single Points of Dependence

Page 23: Life On-Call, Availa-liberty, and the Pursuit of Happiness

Reduce Alert Fatigue

https://www.pinterest.com/pin/497929302524908289

Page 24: Life On-Call, Availa-liberty, and the Pursuit of Happiness
Page 25: Life On-Call, Availa-liberty, and the Pursuit of Happiness

On a regular basis, for every alert, ask:

1) Is it actionable?
2) Is it urgent?
3) Could we consolidate?
4) Did the right person get it?
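The four questions above could be recorded per alert during a periodic review. The sketch below is one hypothetical way to do that; the `AlertReview` type and the suggested follow-ups are illustrative assumptions, not something the talk prescribes.

```python
# Hypothetical review helper; names and follow-up wording are assumptions.
from dataclasses import dataclass
from typing import List


@dataclass
class AlertReview:
    name: str
    actionable: bool            # 1) Is it actionable?
    urgent: bool                # 2) Is it urgent?
    could_consolidate: bool     # 3) Could we consolidate?
    reached_right_person: bool  # 4) Did the right person get it?


def follow_ups(review: AlertReview) -> List[str]:
    """Turn the review answers into suggested follow-up actions."""
    actions = []
    if not review.actionable:
        actions.append("remove or rewrite: nobody can act on it")
    elif not review.urgent:
        actions.append("downgrade: notify without paging")
    if review.could_consolidate:
        actions.append("consolidate with related alerts")
    if not review.reached_right_person:
        actions.append("fix routing or escalation")
    return actions


if __name__ == "__main__":
    review = AlertReview("disk usage above 80%", actionable=True, urgent=False,
                         could_consolidate=True, reached_right_person=True)
    for action in follow_ups(review):
        print(f"{review.name}: {action}")
```

An alert that passes all four questions produces an empty list, which is the outcome the review is aiming for.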

Page 26: Life On-Call, Availa-liberty, and the Pursuit of Happiness

“The most important on-call responsibility is to understand customer impact.”

-Anonymous Customer (who I didn’t verify I could quote)

Page 27: Life On-Call, Availa-liberty, and the Pursuit of Happiness

Sharing Operational Responsibility

Page 28: Life On-Call, Availa-liberty, and the Pursuit of Happiness

SHARED OPERATIONAL RESPONSIBILITY

“Giving developers operational responsibilities has greatly enhanced the QUALITY of the services, both from a customer and a technology point of view.

The TRADITIONAL model is that you take your software to the wall that separates development and operations, and throw it over and then forget about it.

… You build it, you run it.”

-Werner Vogels, CTO Amazon

Page 29: Life On-Call, Availa-liberty, and the Pursuit of Happiness

SHARED OPERATIONAL RESPONSIBILITY

“For developers to take responsibility for the systems they create, they need support from operations to understand how to build ‘reliable software that can be continuous deployed to an unreliable platform that scales horizontally’.”

-Jez Humble, quoting Jesse Robbins (Chef)

Page 30: Life On-Call, Availa-liberty, and the Pursuit of Happiness

Thanks!

Runbooks

Dave [email protected]

@CliffeHangers