Scenario-Based Recovery Metrics

A Method to Measure the Ability to Recover from Simple to Complex Disaster Scenarios

Rod Davis, CRISC, CBCP

SIL International

Version 1.01


Post on 12-Apr-2017



Thanks so much to SIL Global Technology and Information Services for their patience and suggestions in helping craft our disaster recovery methodology. Our current format of disaster recovery fed directly into the development of this presentation.

What is ‘Scenario-based Recovery Metrics’?

Benefits of this Approach

Quick Review of some Key DR Concepts

Principles to apply with this Approach

How it Works

Recovery Procedures

Exercises

Where to find this Stuff

Takeaways

Scenario-based Recovery Metrics give continuity planners a method to measure the business unit’s (or IT Department’s) ability to recover from a range of scenarios, from the simple to the more complex.

Prevents unrealistic expectations by the business owners

Identifies gaps to meeting recovery requirements

Supports budgets and projects for gap remediation.

Provides a basis for exercises

Gives continuity planners an opportunity to talk with stakeholders and explain existing limitations

Dialogue … ‘We can meet your requirements for this set of simpler scenarios, but are (currently) unable to meet them for these more disruptive scenarios.’

Professional Practice* Three – Business Impact Analysis, Item 6.c: Identify gaps between current recovery capabilities and requirements defined by the results of the BIA.

Current State

Desired State

Steps required for change

Action Plan

*Professional Practices for Business Continuity Practitioners – DRII.org

Business Continuity Planning Cycle

Business Impact Analysis

Recovery Point Objective (RPO)

Recovery Time Objective (RTO)

Project Initiation

Risk Assessment

Business Impact Analysis

Business Continuity Strategies

Business Continuity Plan Development

Training, Testing, Evaluation

Business Continuity Planning is ...

project oriented

iterative

ongoing and multi-phased

requires testing

The Business Continuity Planning Cycle

This approach touches the Business Impact Analysis.

A process designed to assess the potential quantitative (financial) and qualitative (non-financial) impacts that might result if an organization was to experience a business disruption.

- Business Continuity Glossary by DRJ.com

• RPO – Recovery Point Objective: The maximum data loss that an organization will tolerate. Data and systems must be restored to this point after a disruption.

• RTO – Recovery Time Objective: The maximum period of time that an organization accepts for recovery of business functions, systems, and processes.

Timeline: Point of last data backup → Disaster strikes! → Systems fully recovered

Data loss spans the interval from the last data backup to the disaster. Downtime spans the interval from the disaster to full recovery.
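The timeline above can be turned into a small calculation. The following is only an illustrative sketch in Python: the function names, the timestamps, and the 24-hour RPO and 8-hour RTO are hypothetical and do not come from the presentation. It measures data loss and downtime for one incident (or exercise) and compares them with the objectives.

```python
from datetime import datetime, timedelta


def recovery_metrics(last_backup, disaster, recovered):
    """Return (data_loss, downtime) as timedeltas for one incident or exercise."""
    data_loss = disaster - last_backup  # work since the last backup is lost
    downtime = recovered - disaster     # time until systems are fully recovered
    return data_loss, downtime


def meets_objectives(data_loss, downtime, rpo, rto):
    """Report whether each measured value stays within its objective."""
    return {"rpo_met": data_loss <= rpo, "rto_met": downtime <= rto}


# Hypothetical timeline, for illustration only.
last_backup = datetime(2017, 4, 10, 23, 0)
disaster = datetime(2017, 4, 11, 14, 30)
recovered = datetime(2017, 4, 12, 9, 30)

data_loss, downtime = recovery_metrics(last_backup, disaster, recovered)
result = meets_objectives(
    data_loss, downtime, rpo=timedelta(hours=24), rto=timedelta(hours=8)
)
print(data_loss, downtime, result)
```

With these assumed values the RPO is met (15.5 hours of data loss against a 24-hour objective) but the RTO is not (19 hours of downtime against an 8-hour objective), which is the kind of gap the approach is meant to surface.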

Inequality of Disasters and of Organizations

Build Recovery Plans for Simple, then Complex Scenarios

Build Recovery Metrics around ‘Scenario Themes’

Keep your Scenarios Plausible

“Start small, think big. Don’t worry about too many things at once. Take a handful of simple things to begin with, and then progress to more complex ones.” – Steve Jobs

Simple

Developing a robust disaster recovery capability requires developing the capability to recover from simple scenarios first …

Single failed server

Complex

Then you can build upon this base to develop the capability to recover from the more complex scenarios …

Server Room Fire

Scenario Themes

Scenario themes enable disaster recovery planners to model the various levels of disaster events … from less severe to catastrophic.

• Scenario Theme 1: An individual (critical) component of a system fails.

• Scenario Theme 2: Multiple critical components (or a single super-critical component) of a system fail.

• Scenario Theme 3: An on-site disaster takes down multiple mission-critical systems.

• Scenario Theme 4: A regional disaster affects infrastructure, the power grid, and Internet services.

All these things are possible, but keep the scenario plausible in the context of your specific environment.

Cyber-Attack

Device theft
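One way to picture how the scenario themes feed a gap analysis is the Python sketch below. It is a hypothetical illustration, not the presenter's templates: the `ThemeMetric` class, the `find_gaps` function, and every number are assumptions. Each theme records the recovery time and data loss currently achievable, and the function lists the themes where the business owner's requirements cannot be met.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ThemeMetric:
    theme: int                          # scenario theme number, 1 to 4
    description: str
    actual_rto_hours: Optional[float]   # None means no recovery capability yet
    actual_rpo_hours: Optional[float]


def find_gaps(metrics: List[ThemeMetric],
              required_rto_hours: float,
              required_rpo_hours: float) -> List[int]:
    """Return the themes whose actual metrics miss either requirement."""
    gaps = []
    for m in metrics:
        rto_gap = m.actual_rto_hours is None or m.actual_rto_hours > required_rto_hours
        rpo_gap = m.actual_rpo_hours is None or m.actual_rpo_hours > required_rpo_hours
        if rto_gap or rpo_gap:
            gaps.append(m.theme)
    return gaps


# Hypothetical figures for a single system, for illustration only.
metrics = [
    ThemeMetric(1, "Single critical component fails", 4, 24),
    ThemeMetric(2, "Multiple critical components fail", 8, 24),
    ThemeMetric(3, "On-site disaster", 72, 24),
    ThemeMetric(4, "Regional disaster", None, None),
]

gaps = find_gaps(metrics, required_rto_hours=8, required_rpo_hours=24)
print("Requirements not met for scenario themes:", gaps)
```

In this made-up case the requirements are met for themes 1 and 2 but not for themes 3 and 4, which mirrors the stakeholder conversation described earlier: requirements can be met for the simpler scenarios but not yet for the more disruptive ones.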

Disaster Recovery Priorities

Scenario Themes

Actual Recovery Metrics vs. Business Owner’s Requirements

Instructions – How to write

Recovery Steps

Dependencies

Expected Outcomes

Exercises leverage scenarios already developed

Exercise step-by-step

Exercise Observations

Exercises

Scenario Theme One

Scenario Theme Two

Scenario Theme Three

Scenario Theme Four

Identify problems in system documentation or recovery procedures.

Identify problems with access to recovery documentation.

Identify any instances where critical recovery information resides only in the head of one individual.

Identify shortcomings in recovery technologies.

Identify potential problems with access to security credentials.

The tester must not be the author of the documentation or recovery procedure, but should be an individual who could perform recovery.

The tester ‘walks through’ the recovery procedure and system documentation for the system being tested.

All participants are encouraged to note problems in the Exercise Observations, though the Scribe is responsible for this.

Go to http://missionresiliency.org/resources/

Here you can download the slides, examples, and templates from this presentation.

Proverbs 21:5: Good planning and hard work lead to prosperity, but hasty shortcuts lead to poverty.

Not all disasters and businesses are created equal.

Scenario-based recovery metrics identify gaps in meeting stakeholders’ recovery requirements and support projects for gap remediation.

They prevent unrealistic expectations by stakeholders, and provide a solid basis for exercises.

Scenario themes enable disaster recovery planners to model disaster events from less severe to catastrophic.

Scenario themes support building disaster recovery capability first for simple scenarios, then for increasingly complex scenarios.