Incident Management System (IMS)
© 2014 Blackrock 3 Partners LLC
Chris Hawley, Rob Schnepp, and Ron Vidal
About Us
• Who We Are
  – Deep global experience in Incident Management and Critical Infrastructure
  – Fire, Special Operations & Law Enforcement
    • Chemicals, Technical Rescue, Anti-Terrorism, Counter Proliferation
  – Critical Infrastructure
    • Fiber Networks, Data Centers, Oil & Gas, Power, Capital Markets
  – Market Leader in IMS for IT
• What We Do
  – Maximize Uptime During High Severity IT Incidents
    • Assess, Train, Evaluate & Exercise Incident Response Teams
  – Engage with Teams Across the Customer’s Organization
    • NOC, Site Reliability, Cybersecurity, Mission Critical Support, Executives
  – Customers: Global Cloud Providers, Fortune 500 Enterprises & Developers
    • Incorporate IMS into ITIL, DevOps, Agile, Lean Practices
    • Publicly traded and privately held companies
www.blackrock3.com
Chris Hawley – [email protected]
Rob Schnepp – [email protected]
Ron Vidal – [email protected]
San Francisco & Baltimore
Day 1
09:00-10:00 Introductions and course overview
10:00-10:30 Team building exercise
10-minute break
10:40-11:50 IMS terminology and CAN report exercise; 3-minute paper
12:00-13:00 Lunch
13:00-14:00 Incident response decision making
14:00-15:30 Role of the Incident Commander
10-minute break
15:40-16:45 Span of control
16:45-17:00 Wrap up and questions
Day 2
09:00-10:30 Problem solving exercise
10-minute break
10:40-11:30 Exercise debrief and discussion
11:30-12:00 Lunchtime exercise briefing
12:00-13:00 Lunchtime exercise
14:00-14:30 Lunchtime exercise debrief
10-minute break
14:50-16:00 Personalities!
16:00-16:45 After Action Reviews (AAR)
16:45-17:00 Wrap up and questions
TIME and transitions: On Duty, On Call
Life clock / Game clock
Tone, Interaction, Management, Engagement
Resolution
Incident resolution is a people-to-people activity
• Incident Management System (IMS)
  – National standard for managing all-hazard incidents for the last 40 years
• IMS specifically designed for Emergency Operations
  – High-stakes, life-or-death situations
IMS Overview
The Process (Management): Incident Management
The People (Leadership): Incident Command
How Do You Respond?
Predictable • Repeatable • Optimized • Clear • Evaluated • Scalable • Sustainable
De-risking the response in advance of the incident
Responder states: Available • Notified • Responding • Engaged • Assigned (Staged) • Released
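The six responder states listed above can be tracked in a small state model. That way the team knows, before an incident, who is available. This is a minimal sketch with two assumptions. First, the states form a linear path in the order the slide lists them. Second, a responder can be released from any state. The `Status` and `Roster` names and the example responder names are hypothetical.

```python
from enum import Enum


class Status(Enum):
    """Responder states, in the order the slide lists them."""
    AVAILABLE = "Available"
    NOTIFIED = "Notified"
    RESPONDING = "Responding"
    ENGAGED = "Engaged"
    STAGED = "Assigned (Staged)"
    RELEASED = "Released"


# Assumption: responders move forward one state at a time in this order.
ORDER = list(Status)


class Roster:
    """Hypothetical tracker of every responder's current state."""

    def __init__(self, names):
        # Everyone starts as Available before the incident.
        self.status = {name: Status.AVAILABLE for name in names}

    def advance(self, name):
        """Move one responder to the next state on the assumed path."""
        current = self.status[name]
        if current is Status.RELEASED:
            raise ValueError(f"{name} has already been released")
        self.status[name] = ORDER[ORDER.index(current) + 1]
        return self.status[name]

    def release(self, name):
        """Release a responder from whatever state they are in."""
        self.status[name] = Status.RELEASED

    def count(self, status):
        """How many responders are currently in the given state."""
        return sum(1 for s in self.status.values() if s is status)


roster = Roster(["dba-1", "dba-2", "network-1"])
roster.advance("dba-1")  # dba-1 is now Notified
print(roster.count(Status.AVAILABLE))  # 2
```

A real roster would probably allow more transitions than this linear path does. Those transitions are left out here so the sketch stays close to the slide.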
Peacetime vs. Wartime
• A process must be in place to accept the rapid change
• Everybody has to be on board with the change
• Emergency services come to work expecting wartime operations
• Hope is never a viable plan
Do you see problem or solution?
The world of Operations
Incident vs. Emergency vs. Event vs. Problem vs. Alert
Respond or React
Bodies and Players
Incident Lifecycle
Issue • Monitoring • Notification • Response • Resolution • AAR
Responders: Incident Commander; Network; Database (DBA-1, DBA-2); SAN / Storage; Customer Liaison; Executive Liaison
Severity
MTTA • MTTR • AAR
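MTTA and MTTR are usually computed as averages over past incidents. This minimal sketch assumes that MTTA means mean time to acknowledge and MTTR means mean time to resolve, both measured from detection. Organizations define MTTR differently (for example repair or recovery), and some measure it from acknowledgment. The record field names are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean


def _mean_minutes(deltas):
    """Average a sequence of timedeltas, in minutes."""
    return mean(d.total_seconds() / 60 for d in deltas)


def mtta_mttr(incidents):
    """Return (MTTA, MTTR) in minutes.

    Each incident is a dict with hypothetical 'detected', 'acknowledged'
    and 'resolved' datetime fields; both metrics are measured from detection.
    """
    mtta = _mean_minutes(i["acknowledged"] - i["detected"] for i in incidents)
    mttr = _mean_minutes(i["resolved"] - i["detected"] for i in incidents)
    return mtta, mttr


t0 = datetime(2024, 1, 1, 9, 0)
incidents = [
    {"detected": t0, "acknowledged": t0 + timedelta(minutes=4),
     "resolved": t0 + timedelta(minutes=50)},
    {"detected": t0, "acknowledged": t0 + timedelta(minutes=6),
     "resolved": t0 + timedelta(minutes=70)},
]
print(mtta_mttr(incidents))  # (5.0, 60.0)
```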
CAN Report
Conditions: What’s happening?
Actions: What’s being done?
Needs: What are the needs?
Always consider the consumer of the CAN report before giving it!
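A CAN report always has the same three parts, so it maps naturally onto a fixed structure, which keeps updates to the bridge consistent. This minimal sketch assumes a plain-text rendering with one bullet per item. The class name, field names and example content are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CANReport:
    """Conditions, Actions, Needs: the three parts of a CAN report."""
    conditions: list[str]
    actions: list[str]
    needs: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Plain-text report; an empty section is stated explicitly as 'None'."""
        lines = []
        for title, items in (("Conditions", self.conditions),
                             ("Actions", self.actions),
                             ("Needs", self.needs)):
            lines.append(f"{title}:")
            lines.extend(f"  - {item}" for item in items or ["None"])
        return "\n".join(lines)


report = CANReport(
    conditions=["Checkout API returning errors for about 20% of requests"],
    actions=["DBA-1 failing over the primary database"],
)
print(report.render())
```

Stating "None" under Needs tells the consumer of the report that nothing is needed. The reader does not have to guess whether the section was skipped.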
Developing the "Battle Plan" for the Incident
Size-Up → Triage → Act → Review
Getting Oriented to the Incident
Size-up
• Size-up is a mental process of evaluating the incident: a 360-degree view
• Gather as much information as quickly as you can, while realizing the incident is not going to be on hold while you do it
• Facts, possibilities, and probabilities
• Distilled down to the most important pieces of information
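The size-up step sorts facts, possibilities and probabilities, then distills them to the most important few. That step can be made concrete as a ranking. This minimal sketch assumes each observation gets a rough impact score, with ties broken by certainty. The scoring scheme, names and example observations are hypothetical and are not part of the IMS material.

```python
from dataclasses import dataclass

# Assumed tie-breaker: facts before probabilities before possibilities.
CERTAINTY = {"fact": 0, "probability": 1, "possibility": 2}


@dataclass
class Observation:
    text: str
    kind: str    # "fact", "probability" or "possibility"
    impact: int  # hypothetical 1-5 score assigned during size-up


def distill(observations, top_n=3):
    """Keep the top_n observations: highest impact first, then most certain."""
    ranked = sorted(observations, key=lambda o: (-o.impact, CERTAINTY[o.kind]))
    return ranked[:top_n]


size_up = [
    Observation("Primary database unreachable", "fact", 5),
    Observation("Storage array fault", "possibility", 5),
    Observation("Network partition between sites", "probability", 4),
    Observation("Log volume elevated", "fact", 2),
]
for obs in distill(size_up, top_n=2):
    print(obs.kind, "-", obs.text)
```

This sketch sorts by impact before certainty. As a result, a high-impact possibility still surfaces ahead of a low-impact fact, which matches the point that the incident will not wait for certainty.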
Terminology
Dispatch - Notification
Specific - Accurate
Responder - Reactor
Investigating
Size Up
Stand By
Staging
CAN
Span of Control
LNO
IC
Making Decisions – Rule #1
Decouple the process of decision making from the outcome you anticipate
Making Decisions – Rule #2
Own the Process, not the Problem!
Making Decisions – Rule #3
In most cases, idea generation isn’t the challenge – it’s idea selection
Making Decisions – Rule #4
It’s not about making quick decisions – it’s about making the best decision in the shortest amount of time…
based on what you know at the time!
Making Decisions
Pitfalls of group decision making:
• Conformity
• Group polarization
• Obedience to authority
• Wonder becomes wander
Making Decisions
• Bigger pool of responders = more differentiation of opinion and perspective
• ISTP = widespread responsibility for decision
• Support vs. Consensus
Making Decisions
• As decision-making cycles get tighter, communications must keep up with the pace of the incident.
• Incident response communication is based on TRUST:
  – TIME
  – Recognize Fact Pattern
  – Understand the Circumstances
  – See the Linkages
  – Transmit the decision(s), thinking, opinion(s), etc.
DISASTER
Linking Response to Risk
Low Risk • Moderate Risk • High Risk • Extreme Risk
See it – Fix it
Basic solution set is available. Finding the right one is the trick.
No solution may be readily available, or a new solution needs to be determined.
• Trial and error
• Creative thinking
It becomes a brave new world
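The four risk levels above tie the kind of solution available to the style of response. This minimal sketch of that link makes two assumptions. First, the four slides that follow correspond, in order, to low, moderate, high and extreme risk. Second, a hypothetical triage rule sets the level from how many candidate fixes the team has and whether the situation is novel. The rule and names are illustrative and do not come from the source.

```python
from enum import IntEnum


class Risk(IntEnum):
    LOW = 1
    MODERATE = 2
    HIGH = 3
    EXTREME = 4


# Response posture per risk level, paraphrasing the slides.
RESPONSE = {
    Risk.LOW: "See it, fix it",
    Risk.MODERATE: "Pick the right fix from a known solution set",
    Risk.HIGH: "No ready solution: trial and error, creative thinking",
    Risk.EXTREME: "Brave new world: no existing playbook applies",
}


def classify(candidate_fixes: int, novel: bool) -> Risk:
    """Hypothetical rule mapping what is known about fixes to a risk level."""
    if novel:
        return Risk.EXTREME    # nothing like this has been seen before
    if candidate_fixes == 1:
        return Risk.LOW        # one obvious fix
    if candidate_fixes > 1:
        return Risk.MODERATE   # the trick is choosing among known fixes
    return Risk.HIGH           # no known fix: a new solution is needed


print(RESPONSE[classify(candidate_fixes=3, novel=False)])
```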
Manageable Span of Control
• Span of control:
  – How many individuals or resources should a supervisor lead during an incident?
IMS
Incident Commander
• Applications: App 1, App 2
• Database: DBA-1, DBA-2
• Continued Operations
• Comms
• Liaison (LNO)
• Scribe
• Disaster Recovery
Span of Control
Incident Commander (IC)
• Subject Matter Experts (SME): DBA, Network, Triage & Diagnostics
• Liaison (LNO)
• Scribe/Comms
Span of Control
Incident Commander
• Database Group Lead
• Applications Group Lead
• Infrastructure Group Lead
• Disaster Recovery Lead
• LNO
• Plans
• Scribe
• Comms
Span of Control
Incident Commander
• DBA Group Lead
  – DBA-1, DBA-2, DBA-3
• Network
• Triage & Diagnostics
• Liaison (LNO)
• Communications
• Scribe
Span of Control
Incident Commander
• DBA Group Lead
  – DBA-1, DBA-2, DBA-3
• Applications Lead
  – App-1, App-2
• Site Switch Lead
• LNO
• Plans
• Scribe
• Comms
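As the number of responders grows, the org charts above move from a flat structure to one with group leads. This minimal sketch checks span of control. It assumes the commonly cited ICS guideline of three to seven direct reports per supervisor, with five as the optimum. The `org` dictionary mirrors the last chart above. The function names are hypothetical.

```python
import math

# Assumed ICS rule of thumb: 3 to 7 direct reports, 5 optimal.
MIN_SPAN, MAX_SPAN, OPTIMAL_SPAN = 3, 7, 5


def span_report(org):
    """Return {supervisor: (span, verdict)} for each supervisor in the chart."""
    report = {}
    for supervisor, reports in org.items():
        span = len(reports)
        if span > MAX_SPAN:
            verdict = "too wide: add group leads"
        elif span < MIN_SPAN:
            verdict = "narrow: consider consolidating"
        else:
            verdict = "ok"
        report[supervisor] = (span, verdict)
    return report


def regroup(reports, optimal=OPTIMAL_SPAN):
    """Split a flat list of reports into ceil(n / optimal) round-robin groups."""
    groups = math.ceil(len(reports) / optimal)
    return [reports[i::groups] for i in range(groups)]


org = {
    "Incident Commander": ["DBA Group Lead", "Applications Lead", "Site Switch Lead",
                           "LNO", "Plans", "Scribe", "Comms"],
    "DBA Group Lead": ["DBA-1", "DBA-2", "DBA-3"],
    "Applications Lead": ["App-1", "App-2"],
}
for supervisor, (span, verdict) in span_report(org).items():
    print(f"{supervisor}: {span} ({verdict})")
```

With 12 SMEs reporting directly to one IC, `regroup` would suggest three groups of four, each of which could then be placed under its own group lead.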
Unified Command
During complex incidents, group leaders coordinate their own actions and report up to the IC.
Unified Command (UC)
• Incident Commander (IC)
  – Operations: Network Lead, Storage Lead, App 2 Lead, App 1 Lead
    • Each lead directs two SMEs
Lost at Sea
• You have chartered a yacht for a trip across the Atlantic with 3 friends. As none of you has any sailing experience, you hire a crew of 3. In the mid-Atlantic there is a fire and the entire crew is lost. The yacht is sinking. You do not know where you are, but you are hundreds of miles from land. You have saved 15 items that are undamaged. You also have a four-person lifeboat and a box of matches.
• Your individual task is to rank these items from the most important (1) to the least important (15).
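Ranking exercises like this one are commonly scored in two steps. First, each ranking is compared with an expert key by summing the absolute differences in rank. Second, the average individual score is compared with the group's score. The source does not list the 15 items or the key, so this sketch uses hypothetical items A to C. The function names are also hypothetical.

```python
from statistics import mean


def ranking_error(ranking, key):
    """Sum of absolute rank differences against the key; 0 is a perfect match."""
    if set(ranking) != set(key):
        raise ValueError("ranking and key must cover the same items")
    return sum(abs(ranking[item] - key[item]) for item in key)


def team_gain(individual_rankings, team_ranking, key):
    """Average individual error minus team error; positive means the team did better."""
    average_individual = mean(ranking_error(r, key) for r in individual_rankings)
    return average_individual - ranking_error(team_ranking, key)


key = {"A": 1, "B": 2, "C": 3}            # hypothetical expert key
individuals = [{"A": 3, "B": 2, "C": 1},  # error 4
               {"A": 1, "B": 3, "C": 2}]  # error 2
team = {"A": 1, "B": 2, "C": 3}           # error 0
print(team_gain(individuals, team, key))
```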
The job of an IC
Why is the IC critical?
• The IC must direct the group to accept the existence of second- and third-order consequences
• Developing a Plan B, C…
• Forward thinking
• Taking care of their people
• Making notifications
Operational Periods
• The IC should provide regular operational briefings
  – Timing is determined by the anticipated MTTR
  – For incidents under 4 hours, briefings are helpful every 30-45 minutes
  – For incidents over 4 hours, briefings should be every 60-95 minutes
  – The need for information dissemination also drives the timing
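The briefing cadence above is a simple function of the anticipated MTTR. This minimal sketch makes two assumptions. An incident of exactly 4 hours is treated as a long one, since the slide does not say. The schedule uses the shorter end of each range. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def briefing_interval(anticipated_mttr_hours: float) -> tuple[int, int]:
    """(min, max) minutes between operational briefings, per the slide."""
    if anticipated_mttr_hours < 4:
        return (30, 45)
    return (60, 95)


def briefing_schedule(start: datetime, anticipated_mttr_hours: float) -> list[datetime]:
    """Briefing times before the anticipated resolution, at the shorter interval."""
    shortest, _longest = briefing_interval(anticipated_mttr_hours)
    step = timedelta(minutes=shortest)
    end = start + timedelta(hours=anticipated_mttr_hours)
    times, t = [], start + step
    while t < end:
        times.append(t)
        t += step
    return times


for t in briefing_schedule(datetime(2024, 1, 1, 9, 0), anticipated_mttr_hours=2):
    print(t.strftime("%H:%M"))  # 09:30, 10:00, 10:30
```

As the slide notes, the need to get information out can still justify an extra briefing outside this schedule.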
Transfer of Command
• May be passed temporarily
  – Announced to the bridge
  – No formal transfer
• Formal transfer includes:
  – IC-to-IC discussion (offline)
  – Off-going IC announces change
  – On-coming IC provides situational status and operational period briefing
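A temporary pass and a formal transfer differ most clearly in what gets recorded on the bridge. This minimal sketch assumes a simple text log. The `CommandLog` class, its methods and the example names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CommandLog:
    """Hypothetical bridge log of who holds incident command."""
    ic: str
    entries: list[str] = field(default_factory=list)

    def pass_temporarily(self, to: str) -> None:
        """Temporary pass: announced to the bridge, no formal transfer steps."""
        self.entries.append(f"ANNOUNCE: {self.ic} temporarily passes command to {to}")
        self.ic = to

    def formal_transfer(self, to: str, status: str) -> None:
        """Formal transfer: the three steps from the slide, in order."""
        self.entries.append(f"OFFLINE: {self.ic} and {to} discuss the incident")
        self.entries.append(f"ANNOUNCE: {self.ic} hands command to {to}")
        self.entries.append(f"BRIEFING: {to} gives status and operational period briefing: {status}")
        self.ic = to


log = CommandLog(ic="Alex")
log.pass_temporarily("Sam")
log.formal_transfer("Jordan", status="database failover 50% complete")
print("\n".join(log.entries))
```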
Role of SMEs in Response
• Your role: a Subject Matter Expert who makes recommendations
• Arrive in a timely fashion
• Identify yourself when entering the bridge
• Ensure your work environment is quiet
• Speak up and speak clearly
• Be direct and factual
• Respect the IC’s timeline
• If you need more help, ask for it
• Never let the IC fail!
WHAT: Description
WHY: Explanation
HOW: Solution
Response by SMEs: “I support the plan”
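The WHAT / WHY / HOW pattern above gives SME recommendations a fixed shape, which helps keep them direct and factual. This minimal sketch assumes each recommendation must fill all three parts before it is put to the IC. The class name and example text are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    what: str  # description of what is happening
    why: str   # explanation of why it matters
    how: str   # proposed solution

    def __post_init__(self):
        # Refuse to create a recommendation with an empty part.
        for part in ("what", "why", "how"):
            if not getattr(self, part).strip():
                raise ValueError(f"recommendation is missing the {part.upper()}")

    def say(self) -> str:
        """One short, direct statement for the bridge."""
        return f"What: {self.what}. Why: {self.why}. How: {self.how}."


rec = Recommendation(
    what="Replica lag on DB-2 is over 10 minutes",
    why="failing over now would lose recent writes",
    how="pause writes, let DB-2 catch up, then fail over",
)
print(rec.say())
```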
Listen to the Chatter
• “I don’t know, maybe we should…”
• “I guess we could…”
• “I always knew…”
• “This always happens…”
• “Listen, I’m not here to…”
• “Whatever…”
• “That’s impossible…”
• “That never happens…”
• “We’ve always done it this way before…”
Personalities
• The Joker
• The gunslinger/savior
• Overbearing
• Over-explainer
• Uncertain SME
• The interrupter
• The grenade thrower
Personalities
• Quiet one
• Naysayer
• Bridge lurker
• Tunnel rat
• Jumper to conclusions
• 20/20 hindsight – Monday morning quarterback
• Chicken Little
Comparison
Situations:
• Executive swoop
• Long conversations
• Bad SME
• Team friction
• Background noise
• Language challenges
• Cultural challenges
• Lack of progress
• Lack of sense of urgency
• Fatigue
Personalities:
• The savior
• Over-explainer
• Uncertain SME
• The interrupter
• The grenade thrower
• Naysayer
• Bridge lurker
• Tunnel rat
• 20/20 hindsight – Monday morning quarterback
• Chicken Little
Three parts of an AAR
• Determine the root CAUSE of the problem
• Evaluate the impact to the business and how to PREVENT future incidents
• Evaluate the people’s RESPONSE
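The three parts of an AAR can also serve as a completeness check before the review is closed. This minimal sketch assumes each part is captured as free text or a list of items. The class, field names and example cause are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AfterActionReview:
    cause: str = ""                                    # root CAUSE of the problem
    prevent: list[str] = field(default_factory=list)   # steps to PREVENT recurrence
    response: list[str] = field(default_factory=list)  # notes on the people's RESPONSE

    def missing_parts(self) -> list[str]:
        """Names of the three AAR parts that are still empty."""
        parts = {"cause": self.cause.strip(), "prevent": self.prevent,
                 "response": self.response}
        return [name for name, value in parts.items() if not value]


aar = AfterActionReview(cause="Expired TLS certificate on the load balancer")
print(aar.missing_parts())  # ['prevent', 'response']
```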
Looking for “Some Guy”
• Failure
• Operations
• Process
• Software
• Hardware
• Response
• Responsibilities
Evaluate the TALENT
Training • Accountability • Leadership • Empowerment • Notification • Trust/Temptation
Putting it All Together – AAR Exercise
Identify a Group Leader (GL)
Timeline for the exercise: 30 minutes, in three 10-minute blocks
(1) Identify a recent incident from the last 12 months and write a brief incident synopsis, including incident benchmarks, challenges, barriers to success, response time against SLAs, etc.
(2) Discuss the actions retrospectively as they relate to the principles of IMS.
(3) Determine points on the timeline where adherence to IMS may have changed the trajectory of the incident for the better.
Identify the top 3 lessons learned (Q/A and Q/I) as they relate to the incident and to TALENT.
GL provides debrief to larger group
What does a good SME look like?
• Just the facts (Dragnet)
• Straight shooter, no sugar coating
• Trusted advisor
• Answers quickly and concisely
• Responds quickly
• Anticipates the needs of the IC (Radar O’Reilly)
• Skill and temperament of a surgeon and pilot
Establishing a Culture
• Things WILL go wrong
• There should be a process to address them
• Incident Management System (IMS)
  – Handling of an occurrence
  – Handling of an issue
  – Having a really bad day
• AAR review
• Retrospective and prospective
• The key is a commitment to making change