handling incidents

Post on 06-Aug-2015

How to handle incidents, downtime & outages

Devopsdays Amsterdam 2015

David Mytton, Founder, Server Density

Cost of uptime?

• $2.9bn (Q1 2015)

• $870m (Q1 2015)

• $4.1bn (Q1 2015)

How much are you spending?

Expect downtime

• Prepare

• Respond

• Postmortem

Prepare

• On call

• Primary/secondary

• Reachability
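The primary/secondary and reachability points above can be sketched as a simple paging order: try the primary, and fall through to the secondary if nobody acknowledges. This is only an illustration. The `Responder` type, the `page_in_order` function, the names, and the `acknowledge` callback are all hypothetical, not part of any paging tool mentioned in the talk.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Responder:
    """A person on the rota (hypothetical type for this sketch)."""
    name: str
    role: str  # "primary" or "secondary"


def page_in_order(
    responders: Sequence[Responder],
    acknowledge: Callable[[Responder], bool],
) -> Optional[Responder]:
    """Page each responder in rota order; return the first who acknowledges.

    Returns None when nobody is reachable, which is the signal to escalate.
    """
    for responder in responders:
        if acknowledge(responder):
            return responder
    return None


# Made-up rota: the primary is unreachable, so the secondary takes over.
rota = [Responder("alice", "primary"), Responder("bob", "secondary")]
reachable = {"alice": False, "bob": True}
owner = page_in_order(rota, lambda r: reachable[r.name])
```

In a real setup the `acknowledge` callback would wrap whatever paging channel the team uses, with a timeout before moving to the next person.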

Prepare

• On call

• Off call

• Docs

• Searchable

• Independent

Prepare

• Key info

• Team contacts

• Vendor contacts

• Key credentials

Prepare

• Key info

• Unexpected situations

• Communication

• Internet access

• Support access

Respond

• First responder

1. Load incident response checklist

2. Log into Ops War Room

3. Log incident in JIRA

4. Begin investigation
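One way to keep the first responder on track is to hold the four steps above as an ordered checklist and timestamp each step as it is completed. The sketch below is a minimal illustration under that assumption. The `Checklist` class and its method names are hypothetical, and it deliberately does not call any chat or JIRA API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional, Tuple

# The four first-responder steps, in the order given on the slide.
FIRST_RESPONDER_STEPS: Tuple[str, ...] = (
    "Load incident response checklist",
    "Log into Ops War Room",
    "Log incident in JIRA",
    "Begin investigation",
)


@dataclass
class Checklist:
    """Ordered checklist that records when each step was completed."""
    steps: Tuple[str, ...] = FIRST_RESPONDER_STEPS
    completed: List[Tuple[str, datetime]] = field(default_factory=list)

    def next_step(self) -> Optional[str]:
        """Return the next outstanding step, or None when all are done."""
        if len(self.completed) < len(self.steps):
            return self.steps[len(self.completed)]
        return None

    def complete_next(self, now: Optional[datetime] = None) -> str:
        """Mark the next step as done and return its description."""
        step = self.next_step()
        if step is None:
            raise RuntimeError("checklist already complete")
        self.completed.append((step, now or datetime.now(timezone.utc)))
        return step


checklist = Checklist()
first = checklist.complete_next()
```

Enforcing the order means the investigation cannot be marked as started before the incident has been logged.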

Respond

• Key response principles

• Log everything

• Frequent public updates

• Gather the team

• Escalate!
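"Log everything" and "frequent public updates" can be combined in one timeline: every action is recorded with a timestamp, and the log reports when the next public update is due. This is a rough sketch only. The `IncidentLog` class is hypothetical, the 30-minute interval is an assumed cadence rather than a figure from the talk, and the timestamps and messages are made up.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class IncidentLog:
    """Timestamped incident timeline with a public-update reminder."""
    # Assumed cadence for public updates; pick whatever suits your service.
    update_interval: timedelta = timedelta(minutes=30)
    entries: List[Tuple[datetime, str, bool]] = field(default_factory=list)

    def record(self, when: datetime, message: str, public: bool = False) -> None:
        """Append an entry; public=True marks a customer-facing update."""
        self.entries.append((when, message, public))

    def last_public_update(self) -> Optional[datetime]:
        """Return the time of the most recent public update, if any."""
        times = [when for when, _, public in self.entries if public]
        return max(times) if times else None

    def public_update_due(self, now: datetime) -> bool:
        """True if an update is overdue, or none has been posted yet."""
        last = self.last_public_update()
        if last is None:
            return bool(self.entries)
        return now - last >= self.update_interval


# Made-up timeline for illustration.
start = datetime(2015, 1, 1, 10, 0)
log = IncidentLog()
log.record(start, "Alert received: API latency high")
due_before_first_update = log.public_update_due(start)
log.record(start + timedelta(minutes=5), "Status page: investigating", public=True)
due_soon_after = log.public_update_due(start + timedelta(minutes=10))
due_later = log.public_update_due(start + timedelta(minutes=40))
```

The same entries double as the raw timeline for the postmortem.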

Postmortem

• Within a few days

• Tell the story

• Appropriate technical detail

• What failed, why?

• How it’s going to be fixed

Thank you

david@serverdensity.com

@davidmytton
