Structured Approach to IT Business System Availability and Continuity Planning, Analysis and Design
Alan McSweeney
February 18, 2010 2
Objectives
• To provide details on a structured approach to analyse and define availability and continuity requirements for IT systems
• To provide background information on the changing landscape of availability and continuity
Agenda
• Availability and Continuity Overview
• Availability Management
• Continuity Management
• Summary
Availability and Continuity
• Availability is the ability of a system or service to perform its required function at a stated instant or over a stated period of time.
• Availability is expressed as the availability ratio
− The proportion of time that the service is actually available for use by the customers within the agreed service hours
• Continuity is concerned with preparing to address unwanted occurrences
− May relate to the recovery of IT systems or entire business processes
• Continuity is concerned with ensuring that IT services are recovered within agreed timescales
• Availability is a superset of Continuity and encompasses the continued operation of systems in the event of a disaster
• Continuity ensures availability in extreme circumstances
• Availability defines what is to be available in these extreme circumstances
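The availability ratio described above can be illustrated with a short calculation. This is a minimal Python sketch that assumes the ratio is computed as (agreed service hours minus downtime) divided by agreed service hours; the function name and the sample figures are illustrative and do not come from the presentation.

```python
def availability_ratio(agreed_service_hours: float, downtime_hours: float) -> float:
    """Proportion of the agreed service hours during which the service was usable."""
    if agreed_service_hours <= 0:
        raise ValueError("agreed_service_hours must be positive")
    if not 0 <= downtime_hours <= agreed_service_hours:
        raise ValueError("downtime_hours must lie within the agreed service hours")
    return (agreed_service_hours - downtime_hours) / agreed_service_hours


# Hypothetical month: 200 agreed service hours, 2 hours of downtime within them.
ratio = availability_ratio(200.0, 2.0)
print(f"{ratio:.2%}")
```

Only downtime that falls inside the agreed service hours counts against the ratio, which matches the definition given on the slide.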
Availability and Continuity Relationship
Availability Continuity
Continuity Provides Business Impact Analysis to Availability
Availability Provides Availability Criteria to Continuity
Availability and Continuity Relationships with Other IT Management Processes
Availability Continuity
Continuity Provides Business Impact Analysis to Availability
Availability Provides Availability Criteria to Continuity
• Finance Management: Puts a Cost on Lack of Availability; Controls Expenditure on Availability and Continuity
• Capacity Planning and Management: Defines the Capacity Required for Continuity and Availability
• IT Architecture: Ensures Systems and Infrastructure are Designed to Incorporate Continuity and Availability
• Change Management: Controls Change that May Impact Availability or Require Continuity to be Invoked
• Service Planning and Management: Ensures that Continuity and Availability are Incorporated into Service Agreements and Provisions
• Security Management: Controls Security that May Impact Continuity and Availability
Availability and Continuity
• Availability
−Defines availability of service during operating hours
• Under normal circumstances
• Under extraordinary circumstances
• Continuity
−Defines continued operations of critical services and their availability
• Time until services are available and state of service after recovery
• Under extraordinary circumstances
Availability and Continuity
Primary IT Facilities (Availability of Services During Normal Operations)
• Service 1: Component 1, Component 2, Component 3
• Service 2: Component 1, Component 4, Component 5
• Service 3: Component 1, Component 5, Component 6
• Service 4: Component 1, Component 2, Component 7
Continuity of Operations
Recovery IT Facilities (Availability of Services After Continuity)
• Service 1: Component 1, Component 2, Component 3
• Service 3: Component 1, Component 5, Component 6
Availability and Continuity
Full View of Availability
Primary IT Facilities
• Service 1: Component 1, Component 2, Component 3
• Service 2: Component 1, Component 4, Component 5
• Service 3: Component 1, Component 5, Component 6
• Service 4: Component 1, Component 2, Component 7
Continuity of Operations
Recovery IT Facilities
• Service 1: Component 1, Component 2, Component 3
• Service 3: Component 1, Component 5, Component 6
Availability and Continuity
Business Continuity combines:
• Continuous Operation: Non-disruptive system maintenance such as data backup combined with continuous availability of agreed business systems
• Disaster Recovery: Protection against unplanned outages such as disasters through reliable and predictable recovery and continuity of operations
• High Availability: Fault-tolerant, failure-resistant infrastructure supporting continuous availability of agreed business systems
Availability and Continuity
Availability During Normal Operations
Availability During Housekeeping and Maintenance Operations
Availability After Some Component Failures
Availability After Complete Failure of Primary Facility
Availability
Continuity
Availability and Continuity Heat Map
Heat map axes:
• Recovery Time Objective (RTO) – Time to Recover Service/Time By Which Service Needs to be Recovered: Instantly, Seconds, Minutes, Hours, Days
• Recovery Point Objective (RPO) – Amount of Data Loss Tolerable After Recovery: Last Transaction, Minutes, Hours, Days
• Arrow: Increasing Availability (and Continuity) Requirements
RTO and RPO
• Recovery Point Objective (RPO)
−Amount of Data Loss Tolerable After Recovery
• Either amount of data immediately available after recovery or amount of data available for some time after recovery
• Can be different
• Provide some data for minimal operations initially
• Provide more/all data
• Recovery Time Objective (RTO)
− Time to Recover Service/Time By Which Service Needs to be Recovered
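The two objectives above can be checked against an actual recovery event. The following Python sketch is illustrative only: the class, the function and the sample timings are assumptions made for the example, not part of the presentation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryObjectives:
    rto: timedelta  # maximum tolerable time to recover the service
    rpo: timedelta  # maximum tolerable window of lost data


def meets_objectives(objectives, outage_start, service_restored, last_recoverable_data):
    """Return (rto_met, rpo_met) for one recovery event."""
    time_to_recover = service_restored - outage_start
    data_loss_window = outage_start - last_recoverable_data
    return time_to_recover <= objectives.rto, data_loss_window <= objectives.rpo


# Hypothetical service: recover within 4 hours, lose at most 15 minutes of data.
objectives = RecoveryObjectives(rto=timedelta(hours=4), rpo=timedelta(minutes=15))
outage = datetime(2010, 2, 18, 9, 0)
result = meets_objectives(
    objectives,
    outage_start=outage,
    service_restored=outage + timedelta(hours=3),
    last_recoverable_data=outage - timedelta(minutes=30),
)
print(result)  # (True, False): recovered in time, but more data lost than tolerated
```

Keeping the two checks separate reflects the point that RTO and RPO are independent requirements: a service can be restored quickly and still breach its data-loss tolerance.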
RTO and RPO With Cost of Lack of Availability
Chart axes:
• Recovery Point Objective (RPO) – Amount of Data Loss Tolerable After Recovery
• Recovery Time Objective (RTO) – Time to Recover Service/Time By Which Service Needs to be Recovered
• Cost of Lack of Availability of Service/Cost Benefit of Providing High Availability and High Continuity
Highlighted region: Business Critical Services Requiring Immediate Access With Very Limited/No Data Loss and Requiring Continued Operation in the Event of a Disaster
• Add extra dimension to Availability and Continuity Heat Map to allow for explicit identification of those systems that need to be continuously available
What is a Business Critical Application?
• Applications deemed business/mission critical
− 2006 – 16%
− 2007 – 36%
− 2008 – 56%
− 2009 – 60%
• Availability and continuity are merging as most applications are being deemed mission critical
How Often Have You Had to Invoke the Continuity Plan in the Last Five Years?
None 73%
Once 14%
Twice 6%
Three 3%
Four 2%
Five or More 2%
• 27% of organisations have declared at least one disaster in the last five years
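The 27% figure follows directly from the survey responses on this slide. A small Python check, using only the percentages shown, makes the arithmetic explicit:

```python
# Survey responses from the slide: number of invocations -> share of organisations (%).
invocations = {"None": 73, "Once": 14, "Twice": 6, "Three": 3, "Four": 2, "Five or More": 2}

# Organisations that invoked their continuity plan at least once.
at_least_once = sum(share for answer, share in invocations.items() if answer != "None")
print(at_least_once)  # 27
```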
What Were the Causes of Having to Invoke Continuity Plans?
Power Failure 22.5%
Hardware Failure 16.6%
Network Failure 11.2%
Software Failure 8.9%
Human Error 8.4%
Flood 6.3%
Other 6.3%
Hurricane 5.6%
Fire 3.9%
Winter Storm 3.5%
Terrorism 1.9%
Not Specified 1.9%
Earthquake 1.5%
Tornado 1.1%
Chemical Spill 0.4%
Continuity Testing Seen as Disruptive
• 40% of organisations state that continuity testing impacts customers
• 32% of organisations state that continuity testing impacts sales
• Reasons for lack of testing
− Lack of time and resources
− Lack of technology
− Disruption to employees
− Budget
− Disruption to customers
− Disruption to sales
− Disruption to production systems
− Not seen as a priority
Business Impact of Lack of Availability and Continuity Increases Exponentially Over Time
Chart: Financial Loss against Duration of Loss of Continuity (Seconds, Minutes, Hour, Hours, Day, Days)
Series: Revenue Loss, Staff Productivity Loss, Reputational Damage, Financial Performance
Availability Design and Management
• Availability design optimises the capability of the IT infrastructure, services and supporting organisation to deliver a cost effective and sustained level of availability that enables the business to satisfy its business objectives
− Ensures IT systems and infrastructure are designed to deliver the levels of availability required by the business
− Provides a range of availability reporting to ensure that agreed levels of Availability are continuously measured and monitored
− Optimises the availability of the IT infrastructure to deliver cost effective improvements that deliver real benefits to the business
− Ensures shortfalls in availability are recognised and corrective actions are identified and performed
− Reduces problems and incidents that impact availability
− Creates and maintains an Availability Plan aimed at improving the overall availability of services and infrastructure components to ensure business availability requirements can be satisfied
Continuity Design and Management
• Continuity design is concerned with responding to and recovering business operations in the event of an outage or disaster rendering significant impact on the organisation
− Supports the business by ensuring that the required IT facilities can be recovered within required and agreed business timescales
− Provides the strategic and operational framework to review the way the organisation continues to provide its services while increasing its ability to recover from disruption, interruption or loss
− Depends both on management and operations
− Requires management commitment
People, Process, Technology
• Start availability and continuity design with a business impact analysis and risk assessment
• Technology exists to support availability and continuity design, but technology does not constitute a plan
• Focus on prevention before investing in technology
• However, availability and continuity is seen as the preserve of IT
− The business frequently does not have the required project focus or experience
• Embed availability and continuity into IT architectures
Questions
• Do you have adequate control over prevention of business process or IT infrastructure downtime?
• Do you have adequate IT capabilities to ensure continuous operations?
• Do you know the risks your business and its business systems face?
• What would the cost and impact of downtime be to your business?
• Is your current continuity plan sufficient to meet your RPO and RTO objectives?
• Do you know how much business continuity will cost?
• What business problems will implementing availability and continuity solve even if you do not experience an unplanned IT outage?
• What is the overall business value of availability and continuity to the business?
• How should we define what level of business continuity we really need?
Availability Design and Management
Availability Design and Management Process
Availability Design and Management Consists of Two Parallel Sub-Processes
Availability Process Design and Management
1. Availability Requirements Analysis
2. Document System and Application Architecture
3. Gap Analysis and Recommendations
4. Availability Review
Availability Process Quality Control
1. Availability Reporting
2. Availability Report Evaluation and Improvement
3. Management Escalations of Service Availability Violations
Structured Approach to Availability Design and Management
• Can be used for an individual system or application or a service that consists of a number of systems or applications or the entire IT landscape
• Scope is to define a plan to implement agreed availability
Scope of Availability Design and Management
• Planning for service availability
• Designing for service availability by anticipating disruptions, estimating and measuring reliability and maintainability
• Planning for availability within SLAs and reporting on it
• Ensuring cost effectiveness of availability solutions
• Reducing the duration of problems and incidents affecting availability
• Ensuring that security requirements are defined and incorporated within the overall availability design
Availability Design and Management Driven by Requirements
• Availability requirements are based on the needs of the business
• Requirements are gathered, defined, and validated by the key users and business management
• Includes hours of uptime as well as planned and unplanned downtime
• Includes ongoing support and procedures to address service disruptions
Benefits of a Structured Approach to Availability Design and Management
• Reduce Risks
− SLAs will incorporate availability design based on architecture
− Reduced risk of violating SLAs
• Cost Reduction
− A defined and agreed acceptable level of service prevents over-delivery
− Unnecessary expenditure on maintenance and resilience building is avoided
• Improved Service Agility
− Changing business availability requirements are addressed quickly
− Cost of changes in availability of different levels is defined or can be assessed quickly.
• Improved Service Quality
− Improvement in Service Quality results from reduced Incidents as well as a reduced time to restore service
Structured Approach to Availability Design and Management
Availability Analysis and Design
1. Availability Requirements Analysis
1.1 Understand Service Goals
1.2 Document Availability Requirements
1.3 Validate with Service Level Management Function
2. Document System and Application Architecture
2.1 Define Service Critical Components
2.2 Document Service Critical Components and Their Relationships
2.3 Document and Review Components Monitoring Capability
2.4 Document System and Application Architecture
3. Gap Analysis and Recommendations
3.1 Perform Gap and Risk Analysis
3.2 Identify Single Points of Failure
3.3 Evaluate Alternative Approaches and Costs
3.4 Produce Gap Closure Recommendation and Specification
3.5 Plan and Summarise Downtime
3.6 Create Statement of Work to Implement
4. Availability Review
4.1 Define Availability Measurement Model
4.2 Perform Trend Analysis
4.3 Analyse Expanded Incident Lifecycle
4.4 Investigate Major Outages
4.5 Analyse Availability Reports
Step 1 - Availability Requirements Analysis
1. Availability Requirements Analysis
− Scope: Determine availability requirements related to supporting the needs of the business; validate with other IT management processes; create draft service agreement and assess for feasibility from availability perspective
− Inputs: Request for new service or changes to existing service; request for change to availability
− Outputs: Documented and agreed availability requirements
1.1 Understand Service Goals
− Scope: Document business goals for the service
− Inputs: Service design specification
− Outputs: Documented and agreed business goals
1.2 Document Availability Requirements
− Scope: Produce draft availability requirements based on understanding of business goals
− Inputs: Draft service level agreement
− Outputs: Documented and agreed availability requirements
1.3 Validate with Service Level Management Function
− Scope: Validate availability draft requirements with service level agreements and overall service management plan
− Inputs: Overall service management plan
− Outputs: Validated availability requirements
Step 2 - Document System and Application Architecture
2. Document System and Application Architecture
− Scope: Analyse operating environment of the individual components that comprise the service
− Inputs: Service design specification; configurations of individual components that comprise the service level agreement
− Outputs: Documented and agreed existing architecture for service delivery
2.1 Define Service Critical Components
− Scope: Define the configurations of individual components that comprise the service
− Inputs: Service design specification; configurations of individual components that comprise the service
− Outputs: Documented and agreed list of individual components that comprise the service
2.2 Document Service Critical Components and Their Relationships
− Scope: Document the structure of the service breakdown: the individual components and their relationships that deliver the service
− Inputs: Configurations of individual components, their attributes and relationships
− Outputs: Representation of individual components, their attributes and relationships
2.3 Document and Review Components Monitoring Capability
− Scope: Review existing service monitoring facilities and update or replace if required
− Inputs: Existing service monitoring procedures
− Outputs: Defined service monitoring criteria
2.4 Document System and Application Architecture
− Scope: Complete architecture document that describes how the service is delivered according to the service level agreement
− Inputs: Representation of individual components, their attributes and relationships; defined service monitoring criteria
− Outputs: Architecture document
Step 3 - Gap Analysis and Recommendations
3. Gap Analysis and Recommendations
− Scope: Perform gap analysis and recommend suitable approach, create specifications and cost justification
− Inputs: Validated availability requirements; architecture document; service problem and incident history
− Outputs: Availability design
3.1 Perform Gap and Risk Analysis
− Scope: Based on knowledge derived from Incident and Problem data identify gaps in current services
− Inputs: Problem and incident data; availability requirements; architecture document
− Outputs: Gaps analysed and risks identified and documented
3.2 Identify Single Points of Failure
− Scope: Identify individual components whose failure can cause service disruption
− Inputs: Component attributes and relationships
− Outputs: Identified points of failure
3.3 Evaluate Alternative Approaches and Costs
− Scope: Explore various options within the approved range and identify a suitable approach based on requirements and cost justification
− Inputs: IT strategy and architecture; gaps analysed and risks identified and documented
− Outputs: Approach for required availability
3.4 Produce Gap Closure Recommendation and Specification
− Scope: Decision on how the closure should be implemented based on financial and business reasons; develop specifications for the availability design and architecture
− Inputs: Approach for required availability; cost information
− Outputs: Decision on design and implementation; specifications for the availability design and architecture
3.5 Plan and Summarise Downtime
− Scope: Plan downtime for components and aggregate downtime across services
− Inputs: Decision on design and implementation
− Outputs: Planned downtime
3.6 Create Statement of Work to Implement
− Scope: Initiate project for implementing changes to address availability issues
− Inputs: Specifications for the availability design and architecture
− Outputs: Statement of work for project
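Step 3.2 (identifying components whose failure can cause service disruption) can be sketched from a service-to-component map. The Python example below reuses the service and component layout from the Primary IT Facilities diagram earlier in the deck; the redundancy counts and the function name are hypothetical and are included only to illustrate the idea.

```python
from collections import defaultdict

# Service-to-component map from the Primary IT Facilities diagram earlier in the deck.
services = {
    "Service 1": {"Component 1", "Component 2", "Component 3"},
    "Service 2": {"Component 1", "Component 4", "Component 5"},
    "Service 3": {"Component 1", "Component 5", "Component 6"},
    "Service 4": {"Component 1", "Component 2", "Component 7"},
}

# Hypothetical redundancy data: number of independent instances of each component.
instances = {f"Component {n}": 1 for n in range(1, 8)}
instances["Component 5"] = 2  # assumed here to be deployed redundantly


def single_points_of_failure(services, instances):
    """Map each non-redundant component to the services its failure would disrupt."""
    affected = defaultdict(set)
    for service, components in services.items():
        for component in components:
            if instances.get(component, 1) < 2:
                affected[component].add(service)
    return dict(affected)


spof = single_points_of_failure(services, instances)
# List components by the number of services they would take down, largest first.
for component in sorted(spof, key=lambda c: (-len(spof[c]), c)):
    print(component, sorted(spof[component]))
```

Ranking by the number of affected services gives a simple starting point for step 3.3, where alternative approaches and their costs are evaluated.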
Step 4 - Availability Review
4. Availability Review
− Scope: Assess, review and update availability design if required
− Inputs: Incident, problem, fault reports
− Outputs: Identified availability concerns and amended design if required
4.1 Define Availability Measurement Model
− Scope: Define availability measurement model
− Inputs: Documented and agreed availability requirements
− Outputs: Defined data sources for availability measurement
4.2 Perform Trend Analysis
− Scope: Analyse incident and problem data to arrive at a high level view of availability
− Inputs: Incident and problem trend reports
− Outputs: Identified availability concerns
4.3 Analyse Expanded Incident Lifecycle
− Scope: Analyse expanded incident lifecycle; analyse breakdown of incident resolution to validate and update design considerations
− Outputs: Identified specific areas which need improvement
4.4 Investigate Major Outages
− Scope: Investigate large outages and update availability design if required
− Inputs: Detailed incident analysis for specific incidents, fault, problems and performance reports
− Outputs: Identified availability concerns
4.5 Analyse Availability Reports
− Scope: Review availability reports and update infrastructure if required
− Inputs: Availability reports
− Outputs: Identified availability concerns; statement of work for identified changes
Core Principles
• Core principles ensure consistency of work and outputs
• Ensure processes will meet the requirements of the business
• Work will be of a high quality
• Core principles should serve as a checklist against which all work is assessed
Availability Design and Management Core Principles
1. Availability requirements are based on the agreed and defined needs of the business
2. The IT function will determine the overall requirement of availability, performance and recoverability of systems under the terms of a service agreement with the business
3. Infrastructure needs to be designed to routinely incorporate availability requirements
4. The availability design and management process must adhere to security policies and procedures
5. An availability plan will be used to track and manage availability requirements and information collected
6. Data on service reliability, maintainability, resiliency must be collected and monitored
7. The IT function will use continuous process improvement to achieve and maintain level of service availability
8. Planned downtime must be minimised for business-critical functions and unplanned downtime is handled by service management processes including Incident Management, Service Request Management, Continuity Management
Core Principle 1 - Availability Requirements Are Based On The Agreed And Defined Needs Of The Business
• Elements
− Conditions for availability must be aligned with the needs of the business
− Relevant availability data must be gathered and analysed
− Input and validation of requirements must be solicited from the business
− Availability requirements must be documented and distributed for agreement and approval
• Benefits
− Expectations are clearly defined and accepted
− User satisfaction is increased
− Growth can be forecast more easily
− Problem areas can be identified
Core Principle 2 - The IT Function Determines The Overall Requirement Of Availability, Performance And Recoverability Of Systems
• Elements
− Requirements are met under defined and agreed service agreements
− Good working relationships need to exist with key suppliers and vendors
− Changes to environment must be reflected in service agreements
• Benefits
− There is a structure of supporting contracts in place from suppliers and vendors to meet business availability requirements
Core Principle 3 - Infrastructure Needs To Be Designed To Routinely Incorporate Availability Requirements
• Elements
− Changes in infrastructure and business needs must be reflected in availability planning and design
− Availability and recovery requirements need to be explicitly incorporated at the design stage
• Benefits
− Availability requirements and expectations are clearly defined and accepted
Core Principle 4 - Availability Design And Management Process Must Adhere To Security Policies And Procedures
• Elements
− Access to IT services must be provided in a secure environment
− Availability processes must be aligned with security policies
• Benefits
− Security measures will be followed
− There will be an ability to differentiate between security problems and availability problems
Core Principle 5 - Availability Plan Will Be Used To Track And Manage Availability Requirements And Information Collected
• Elements
− An availability plan must be developed and distributed
− Availability planning must be defined and outlined
− The availability plan must define the details of the data to be collected: what, how often, analysis, reporting, distribution, responses required
• Benefits
− Availability management goals are clearly defined and documented
− There will be a clearly communicated process for availability planning and reporting
− Data is provided for availability reporting, analysis and forecasting
Core Principle 6 - Data On Service Reliability, Maintainability, Resiliency Must Be Collected And Monitored
• Elements
− The data to be collected and monitored must be defined, documented and communicated
− A supporting procedure to collect and monitor data, including response to potential problems must be defined
− Data needs to be reviewed on a regular and consistent basis
• Benefits
− Availability management will be proactive and responsive rather than reactive
− The expectations of the business can be set accurately
− There will be an ability to prepare for potentially increased future requirements
− Availability trends can be identified and addressed
Core Principle 7 - IT Function Will Use Continuous Process Improvement To Achieve And Maintain Level Of Service Availability
• Elements
− Collected availability data will be used to identify areas requiring improvement
− Implementation of any availability process improvement must be controlled by the change management process to control impact
• Benefits
− The business is enabled to make recommendations on availability improvements
Core Principle 8 - Planned Downtime Must Be Minimised For Business-Critical Functions And Unplanned Downtime Is Handled By Service Management Processes
• Elements
− Planned and unplanned downtime must be clearly notified to the business
− Acceptable versus unacceptable unplanned downtime for business-critical functions must be defined
− Escalation procedures will be developed and distributed
• Benefits
− Expectations are set with the business
− IT demonstrates commitment to supporting business-critical functions
Use Core Principles as Checklist for Independent Verification of Availability Design and Processes
☐ 1 Availability requirements are based on the agreed and defined needs of the business
☐ 1.1 Conditions for availability must be aligned with the needs of the business
☐ 1.2 Relevant availability data must be gathered and analysed
☐ 1.3 Input and validation of requirements must be solicited from the business
☐ 1.4 Availability requirements must be documented and distributed for agreement and approval
☐ 2 The IT function will determine the overall requirement of availability, performance and recoverability of systems under the terms of a service agreement with the business
☐ 2.1 Requirements are met under defined and agreed service agreements
☐ 2.2 Good working relationships need to exist with key suppliers and vendors
☐ 2.3 Changes to environment must be reflected in service agreements
☐ 3 Infrastructure needs to be designed to routinely incorporate availability requirements
☐ 3.1 Changes in infrastructure and business needs must be reflected in availability planning and design
☐ 3.2 Availability and recovery requirements need to be explicitly incorporated at the design stage
☐ 4 Availability Design And Management Process Must Adhere To Security Policies And Procedures
☐ 4.1 Access to IT services must be provided in a secure environment
☐ 4.2 Availability processes must be aligned with security policies
Continuity Design and Management
Continuity Design and Management Process
Continuity Design and Management Consists of Two Parallel Sub-Processes
Continuity Process Design and Management
1. Conduct Risk and Disaster Avoidance Assessment
2. Conduct Business Impact Analysis
3. Determine Data Backup and Recovery Options
4. Form Continuity and Disaster Recovery Team
5. Design and Develop Disaster Recovery Plan
6. Continuity Processing for Critical Service Components
7. Conduct Continuity and Disaster Recovery Rehearsal
8. Maintain Continuity and Disaster Recovery Plan
Continuity Process Quality Control
1. Continuity Reporting
2. Continuity Report Evaluation and Improvement
3. Management Escalations of Service Continuity Violations
Structured Approach to Continuity Design and Management
• Can be used for an individual system or application or a service that consists of a number of systems or applications or the entire IT landscape
• Scope is to define a plan to implement agreed continuity
Scope of Continuity Design and Management
• Conducting impact analyses on loss of business systems
• Designing for service continuity by anticipating disruptions, estimating and measuring reliability and maintainability
• Supporting business critical functions
• Designing and developing a Disaster Recovery Plan
• Designing and developing Disaster Recovery Training
• Planning for and performing disaster mitigation and avoidance
• Assessing and managing risk
Structured Approach to Continuity Design and Management
Continuity Analysis and Design
1. Conduct Risk and Disaster Avoidance Assessment
1.1 Identify Potential Threats
1.2 Assess Probability of Threats
1.3 Evaluate Current Disaster Avoidance Measures
1.4 Assess Risk Controls to Mitigate Threats
1.5 Determine Impact of Reduced Controls
1.6 Determine Value of Additional Controls
2. Conduct Business Impact Analysis
2.1 Define Business Impact Analysis Methodology
2.2 Identify Business Functions to be Analysed
2.3 Define Business Function Criticality Categorisation
2.4 Design Questions and Conduct Interviews
2.5 Analyse Results of Interviews
2.6 Summarise and Present Results
3. Determine Backup and Recovery Options
3.1 Identify Backup and Recovery Options for Critical Functions
3.2 Evaluate Operation of Backup and Recovery Options
3.3 Determine Backup and Recovery Options for Critical Functions
3.4 Design Backup and Recovery Procedures
4. Form Continuity and Disaster Recovery Team
4.1 Define Recovery Team Structure
4.2 Define Recovery Team Functions
4.3 Define Team Leaders and Members
4.4 Define Team Charter
5. Design and Develop Disaster Recovery Plan (DRP)
5.1 Determine DRP Structure and Methodology
5.2 Define DRP Notification Schedule and Process
5.3 Define DRP Escalation Process
5.4 Define Key Recovery Objectives
5.5 Define Recovery Steps
5.6 Define Critical Function Restoration Process
6. Continuity Processing for Critical Service Components
6.1 Identify Critical Components for Continuity
6.2 Develop Options for Continuity
6.3 Develop Continuity Processing Steps
6.4 Develop Return from Continuity Process
7. Conduct Continuity and Disaster Recovery Rehearsal
7.1 Design Rehearsal Programme
7.2 Develop Rehearsal Scenarios
7.3 Plan and Schedule Rehearsals
7.4 Develop Rehearsal Evaluation Criteria
7.5 Conduct Rehearsals
7.6 Review and Analyse Rehearsals
8. Maintain Continuity and Disaster Recovery Plan
8.1 Assign Responsibility for DRP Maintenance
8.2 Establish DRP Review and Maintenance Procedures and Schedule
8.3 Integrate DRP Maintenance into Change Management
8.4 Agree and Maintain DRP Distribution List
Step 1 - Conduct Risk and Disaster Avoidance Assessment
1. Conduct Risk and Disaster Avoidance Assessment
− Scope: Identify and quantify risks and vulnerabilities to the organisation
− Inputs: Risks and threats, historical data, current environment, current policies, processes and procedures
− Outputs: Risk assessment report with recommendations for improvements
1.1 Identify Potential Threats
− Scope: Identify potential threats, internal and external, including weaknesses in the organisation that will cause failure of IT systems
− Inputs: Agreement on scope of continuity recovery plan
− Outputs: Identified potential threats affecting IT systems
1.2 Assess Probability of Threats
− Scope: Assess the probability of the identified potential threats affecting IT systems
− Inputs: Identified potential threats affecting IT systems
− Outputs: Assessment of probability of identified potential threats
1.3 Evaluate Current Disaster Avoidance Measures
− Scope: Evaluate current disaster avoidance measures
− Inputs: Identified potential threats affecting IT systems and their probability
− Outputs: Evaluation of current disaster avoidance measures
1.4 Assess Risk Controls to Mitigate Threats
− Scope: Determine the effectiveness of controls in deterring threats
− Inputs: Current avoidance measures
− Outputs: Assessment of risk controls to reduce threats
1.5 Determine Impact of Reduced Controls
− Scope: Determine how effective a control would be in deterring the threat, limiting the cost of the risk and minimising the impact of threats
− Inputs: Assessment of risk controls to reduce threats
− Outputs: Impact to organisation without adequate disaster recovery controls
1.6 Determine Value of Additional Controls
− Scope: Determine which risks the organisation is willing to accept and which are to be controlled
− Inputs: Assessment of risk controls to reduce threats, impact to organisation
− Outputs: Value to organisation of additional controls
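Sub-steps 1.2 to 1.6 can be illustrated with a small risk register. This is only a sketch and is not taken from the original slides: it assumes exposure is scored as probability multiplied by impact, that current controls remove a stated fraction of the probability, and that a hypothetical acceptance threshold separates accepted risks from those to be controlled. All names and figures are invented.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    name: str
    annual_probability: float            # assumed estimate, 0.0 to 1.0
    impact_cost: float                   # assumed cost of a single occurrence
    control_effectiveness: float = 0.0   # assumed fraction of probability removed by current controls

    def exposure(self) -> float:
        """Expected annual loss with the current controls in place."""
        return self.annual_probability * (1.0 - self.control_effectiveness) * self.impact_cost


def classify(threats, acceptance_threshold):
    """Split threats into accepted risks and risks needing additional control."""
    accepted, to_control = [], []
    for threat in threats:
        if threat.exposure() <= acceptance_threshold:
            accepted.append(threat.name)
        else:
            to_control.append(threat.name)
    return accepted, to_control


# Illustrative figures only.
threats = [
    Threat("Power failure", 0.5, 40_000, control_effectiveness=0.75),
    Threat("Data centre flood", 0.01, 2_000_000),
    Threat("Disk array failure", 0.1, 100_000, control_effectiveness=0.5),
]
accepted, to_control = classify(threats, acceptance_threshold=10_000)
print("Accepted:", accepted)
print("To be controlled:", to_control)
```

A real assessment would normally use the organisation's own scoring scheme; the point here is only that the outputs of 1.2 and 1.4 feed the accept-or-control decision in 1.6.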
Step 2 - Conduct Business Impact Analysis
2. Conduct Business Impact Analysis
− Scope: Conduct a business impact analysis to establish which functions are the most critical to the survival of the organisation
− Inputs: Risk and disaster avoidance assessment
− Outputs: Critical function categorisation; list of recovery requirements for processing critical functions
2.1 Define Business Impact Analysis Methodology
− Scope: Define the methodology and process to be used in the Business Impact Analysis, based on the risk and disaster avoidance assessment
− Inputs: Business systems
− Outputs: Agreed methodologies and processes to be used in Business Impact Analysis
2.2 Identify Business Functions to be Analysed
− Scope: Identify business functions to be analysed for risk and disasters
− Inputs: Agreed methodologies and processes to be used in Business Impact Analysis
− Outputs: Business functions identified for analysis
2.3 Define Business Function Criticality Categorisation
− Scope: Define categorisation criteria for each business function
− Inputs: Identified business functions
− Outputs: Criteria for categorising business functions
2.4 Design Questions and Conduct Interviews
− Scope: Design and validate questions and conduct interviews
− Inputs: Defined criteria for categories of business functions
− Outputs: Validation of business losses
2.5 Analyse Results of Interviews
− Scope: Analyse the data and validate findings if necessary
− Inputs: Validation of business losses
− Outputs: Analysis of data
2.6 Summarise and Present Results
− Scope: Develop conclusions and present final report regarding Business Impact Analysis
− Inputs: Analysis of data
− Outputs: Conclusions and final report of Business Impact Analysis
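The criticality categorisation in 2.3 can be pictured as a simple lookup applied to interview results from 2.4. The sketch below is illustrative only: the category names, the downtime thresholds and the sample business functions are all assumptions and do not come from the slides.

```python
# Hypothetical categories keyed by maximum tolerable downtime in hours.
# Both the labels and the thresholds are assumptions for illustration.
CATEGORIES = [
    (4, "Critical"),
    (24, "Essential"),
    (72, "Necessary"),
]


def categorise(max_tolerable_downtime_hours: float) -> str:
    """Return the first category whose downtime limit covers the function."""
    for limit, label in CATEGORIES:
        if max_tolerable_downtime_hours <= limit:
            return label
    return "Deferrable"


# Invented interview results: business function -> tolerable downtime in hours.
interview_results = {
    "Order processing": 2,
    "Payroll": 48,
    "Management reporting": 168,
}
categorised = {name: categorise(hours) for name, hours in interview_results.items()}
for name, label in categorised.items():
    print(f"{name}: {label}")
```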
Step 3 - Determine Data Backup and Recovery Options
3. Determine Data Backup and Recovery Options
− Scope: Determine data backup and recovery options based on the requirements for recovering critical functions and the type of disaster or interruption being catered for
− Inputs: Available time to backup and recover; acceptable downtime; recovery requirements
− Outputs: Recovery objectives; list of backup options, supporting procedures
3.1 Identify Backup and Recovery Options for Critical Functions
− Scope: Work with business units to identify possible backup options for critical business functions
− Inputs: Conclusions and final report of Business Impact Analysis
− Outputs: Backup options for critical functions
3.2 Evaluate Operation of Backup and Recovery Options
− Scope: Evaluate previously identified backup options for various scenarios
− Inputs: Backup options for critical functions
− Outputs: Evaluated backup options for critical business functions
3.3 Determine Backup and Recovery Options for Critical Functions
− Scope: Determine backup options for those critical business functions that currently do not have any backup options or where the options do not work correctly
− Inputs: Evaluated backup options for critical business functions
− Outputs: Backup options for all critical business functions
3.4 Design Backup and Recovery Procedures
− Scope: Design backup procedures for all critical business functions
− Inputs: Backup options for critical business functions
− Outputs: Backup procedures for critical business functions
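The evaluation in 3.2 amounts to checking each candidate option against two of the inputs listed for Step 3: the time available to back up and recover, and the acceptable downtime. A minimal sketch follows, with hypothetical option names and timings chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class BackupOption:
    name: str
    backup_hours: float    # assumed time needed to take the backup
    restore_hours: float   # assumed time needed to restore service from it


def evaluate(options, backup_window_hours, acceptable_downtime_hours):
    """Return names of options that fit both the backup window and the acceptable downtime."""
    return [
        option.name
        for option in options
        if option.backup_hours <= backup_window_hours
        and option.restore_hours <= acceptable_downtime_hours
    ]


# Illustrative figures only.
options = [
    BackupOption("Nightly tape backup", backup_hours=6, restore_hours=12),
    BackupOption("Disk snapshot", backup_hours=0.5, restore_hours=2),
    BackupOption("Replicated standby system", backup_hours=0, restore_hours=0.25),
]
suitable = evaluate(options, backup_window_hours=4, acceptable_downtime_hours=4)
print(suitable)
```

Options that fail the check are the ones 3.3 would need to replace or supplement.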
Step 4 - Form Continuity and Disaster Recovery Team
4. Form Continuity and Disaster Recovery Team
− Scope: Establish recovery teams and specify what each team is to do in the event of a broad range of possibilities
− Inputs: Business needs; recovery requirements
− Outputs: Recovery team structure; recovery team charter and members; recovery procedures
4.1 Define Recovery Team Structure
− Scope: Define structure of disaster recovery team
− Inputs: Decision to proceed
− Outputs: Structure of disaster recovery team
4.2 Define Recovery Team Functions
− Scope: Define the function of each individual disaster recovery team in each business unit
− Inputs: Structure of disaster recovery team
− Outputs: Functions for recovery team
4.3 Define Team Leaders and Members
− Scope: Define team leader, alternate leader and other team members for each type of disaster and each business unit
− Inputs: Functions for recovery team
− Outputs: Recovery team leader, alternate team leader and members
4.4 Define Team Charter
− Scope: Define charter for each team along with the defined roles and responsibilities; define recovery procedures for each team relevant to their team role and charter
− Inputs: Recovery team leader, alternate team leader and members
− Outputs: Charter and recovery procedures along with roles and responsibilities for each recovery team
Step 5 - Design and Develop Disaster Recovery Plan
5. Design and Develop Disaster Recovery Plan
− Scope: Develop and validate processes and procedures to support the critical business functions
− Inputs: Recovery objectives; scope of plan; business function classification; disaster definitions and classification; recovery team organisation
− Outputs: Recovery plan
5.1 Determine DRP Structure and Methodology
− Scope: Determine the structure and methodology of how the plan will be developed
− Inputs: Structure of disaster recovery team
− Outputs: Structure and methodology of developing DRP
5.2 Define DRP Notification Schedule and Process
− Scope: Define the notification schedule and process of recovery
− Inputs: Structure and methodology of developing DRP
− Outputs: Notification schedule and recovery process
5.3 Define DRP Escalation Process
− Scope: Define the DRP escalation criteria and procedure
− Inputs: Notification schedule and recovery process
− Outputs: Escalation procedure
5.4 Define Key Recovery Objectives
− Scope: Consider the organisation’s key recovery objectives and policies while designing DRP
− Inputs: Escalation procedure
− Outputs: Consideration of key recovery objectives and policies
5.5 Define Recovery Steps
− Scope: Define the framework for disaster recovery to ensure it contains the required recovery steps
− Inputs: Consideration of key recovery objectives and policies
− Outputs: Disaster recovery steps
5.6 Define Critical Function Restoration Process
− Scope: Discuss the DRP with business units to gain acceptance, define the final restoration process and define the training to be provided
− Inputs: Disaster recovery steps
− Outputs: Accepted restoration process
Step 6 - Alternate Processing for Critical Service Components
6. Alternate Processing for Critical Service Components
− Scope: Evaluate critical business function components to determine if alternate processing procedures are necessary and feasible for the period between a disaster and recovery and how recovery should be achieved
− Inputs: Critical business function components; alternatives for processing critical components
− Outputs: Critical business function components timelines; alternate procedures
6.1 Identify Critical Components for Continuity
− Scope: Work with business units to identify critical components that need alternate processing
− Inputs: Accepted restoration process
− Outputs: Critical components identified
6.2 Develop Options for Continuity
− Scope: Develop options for alternate processing for critical components in coordination with business units
− Inputs: Critical components identified
− Outputs: Options for alternate processing
6.3 Develop Continuity Processing Steps
− Scope: Develop processing steps based on the options for alternate processing for critical components
− Inputs: Options for alternate processing
− Outputs: Alternate processing steps
6.4 Develop Return from Continuity Process
− Scope: Develop procedure to return from alternate processing to normal processing
− Inputs: Alternate processing steps
− Outputs: Steps to return critical components to normal processing from alternate processing
Step 7 - Conduct Continuity and Disaster Recovery Rehearsal
7. Conduct Continuity and Disaster Recovery Rehearsal
− Scope: Conduct rehearsals to validate the organisation’s ability to respond to and recover from a disaster
− Inputs: Rehearsal plan; recovery procedures; alternate procedures; rehearsal objectives
− Outputs: Lessons learned; rehearsal report
7.1 Design Rehearsal
− Scope: Design programmes for rehearsals
− Inputs: Disaster Recovery Plan
− Outputs: Programmes for rehearsals
7.2 Develop Rehearsal Scenarios
− Scope: Develop rehearsal scenarios based on the design of rehearsals
− Inputs: Programmes for rehearsals
− Outputs: Rehearsal scenarios
7.3 Plan and Schedule Rehearsals
− Scope: Plan and schedule rehearsals, both planned and unannounced
− Inputs: Rehearsal scenarios
− Outputs: Scheduled rehearsals
7.4 Develop Rehearsal Evaluation Criteria
− Scope: Develop evaluation techniques and criteria for each rehearsal scenario
− Inputs: Scheduled rehearsals
− Outputs: Evaluation techniques and criteria
7.5 Conduct Rehearsals
− Scope: Conduct rehearsals in coordination with all other members
− Inputs: Scheduled rehearsals
− Outputs: Conducted rehearsals
7.6 Review and Analyse Rehearsals
− Scope: Document and distribute outcomes of the rehearsals to all the members along with lessons learned and review reports
− Inputs: Conducted rehearsals
− Outputs: Reports on conducted rehearsals
Step 8 - Maintain Continuity and Disaster Recovery Plan
8. Maintain Continuity and Disaster Recovery Plan
− Scope: Conduct scheduled reviews of the contents of the continuity plan; update the plan as part of the change management process and with other related changes
− Inputs: Disaster recovery plan; review schedule; list of reviewers; review criteria and objectives
− Outputs: Recommendations for improvements or changes; approval list from reviewer
8.1 Assign Responsibility for DRP Maintenance
− Scope: Identify reviewers responsible for plan maintenance and assign responsibility
− Inputs: Rehearsal review reports; DRP; review criteria and objectives
− Outputs: Assigned responsibilities for review and maintenance of DRP
8.2 Establish DRP Review and Maintenance Procedures and Schedule
− Scope: Establish review and maintenance procedures and schedules
− Inputs: Assigned responsibilities for review and maintenance of DRP
− Outputs: Procedure for review and maintenance of DRP
8.3 Integrate DRP Maintenance into Change Management
− Scope: Integrate the maintenance process with change management processes so that changes are assessed for their potential impact on the continuity plans
− Inputs: Review feedback and inputs
− Outputs: Updated DRP
8.4 Agree and Maintain DRP Distribution List
− Scope: After updating the DRP, create a list of those to whom the DRP is to be distributed
− Inputs: Updated DRP
− Outputs: Distribution list
Continuity Design and Management Core Principles
1. Scope of continuity plan must contain clear and realistic recovery objectives and recovery timeframes
2. Risk management and disaster avoidance measures should be in place and practiced
3. Continuity plan including disaster recovery should be designed and developed to support recovery of agreed critical business functions
4. Continuity plan should be rehearsed regularly
5. Continuity and recovery strategies or plans should be integrated into design and deployment of changes to infrastructure
6. Continuity and recovery processes or plans should be reviewed and updated on a regular basis
Core Principle 1 - Scope Of Continuity Plan Must Contain Clear And Realistic Recovery Objectives And Recovery Timeframes
• Elements
− Recovery process must be aligned to support business objectives
− It must be ensured that business impact and recovery investments have direct relationship
− Recovery time and objectives need to be communicated and validated
− The disasters that the continuity plan will and will not address must be defined
− Scope of planning efforts must be stated
• Benefits
− Clear objectives
− Defined scope of efforts
− Expectations are agreed and defined
− Coordinated recovery efforts
Core Principle 2 - Risk Management And Disaster Avoidance Measures Should Be In Place And Practiced
• Elements
− Ensure that environment is constructed and operated to prevent potential disasters
− As infrastructure changes and business needs change, ensure risks and exposures are addressed
• Benefits
− Control of preventable, predictable disasters
− Minimising and deterring potential disasters
Core Principle 3 - Continuity Plan Including Disaster Recovery Should Be Designed And Developed To Support Recovery Of Agreed Critical Business Functions
• Elements
− Investment for adequate preventative, proactive, and recovery methods for critical business functions
− All business functions and their criticality must be defined and communicated to the organisation
− It must be ensured that key customers are reassured about the continuity management process
• Benefits
− Expectations are set and agreed upon
− Minimise significant losses to the organisation in terms of financial, legal, and operational issues
Core Principle 4 - Continuity Plan Should Be Rehearsed Regularly
• Elements
− Regular rehearsals must be conducted, both planned and unannounced
− Partial and full rehearsals must be conducted
− A variety of rehearsal techniques must be used
− Rehearsal objectives and success criteria must be clearly defined
• Benefits
− Potential for successful recovery is high
− Reinforces learning and commitment
− Demonstrates value to organisation
− Identification of potential weaknesses in plan
Core Principle 5 - Continuity And Recovery Strategies Or Plans Should Be Integrated Into Design And Deployment Of Changes To Infrastructure
• Elements
− Must ensure the plans for changes to infrastructure are considered with continuity in mind
− Recovery procedures must be requested for new applications, systems, networks
• Benefits
− Continuity is a critical component of the operating environment
− Continuity strategies and plans have an important role in design and deployment decisions and plans
Core Principle 6 - Continuity And Recovery Processes Or Plans Should Be Reviewed And Updated On A Regular Basis
• Elements
− Regular reviews of continuity plans must be defined and scheduled
− Reviewers must be objective and must not have been involved in developing the plan
− Integration into the change management process for plan updates must be ensured
− Revision, tracking, and distribution list must be defined and documented
• Benefits
− Keeps continuity plan as a living document
− Ensures the plan is kept current
− Reminder of continuing purpose of plan and its benefits to the organisation
Use Core Principles as Checklist for Independent Verification of Continuity Design and Processes
☐ 1 Scope Of Continuity Plan Must Contain Clear And Realistic Recovery Objectives And Recovery Timeframes
☐ 1.1 Recovery process must be aligned to support business objectives
☐ 1.2 It must be ensured that business impact and recovery investments have a direct relationship
☐ 1.3 Recovery time and objectives need to be communicated and validated
☐ 1.4 The disasters that the continuity plan will and will not address must be defined
☐ 2 Risk Management And Disaster Avoidance Measures Should Be In Place And Practiced
☐ 2.1 Ensure that environment is constructed and operated to prevent potential disasters
☐ 2.2 As infrastructure changes and business needs change, ensure risks and exposures are addressed
☐ 3 Continuity Plan Including Disaster Recovery Should Be Designed And Developed To Support Recovery Of Agreed Critical Business Functions
☐ 3.1 Investment for adequate preventative, proactive, and recovery methods for critical business functions
☐ 3.2 All business functions and their criticality must be defined and communicated to the organisation
☐ 3.3 It must be ensured that key customers are reassured about the continuity management process
☐ 4 Continuity Plan Should Be Rehearsed Regularly
☐ 4.1 Regular rehearsals must be conducted, both planned and unannounced
☐ 4.2 Partial and full rehearsals must be conducted
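An independent reviewer could record the checklist results in a simple structure and roll them up by principle. The sketch below is an illustration only; the review results are invented, and the rule that a principle passes only when every recorded element passes is an assumption rather than something stated on the slide.

```python
# Hypothetical review results keyed by checklist item number.
# True means the reviewer found evidence that the element is satisfied.
review = {
    "1.1": True, "1.2": True, "1.3": False, "1.4": True,
    "2.1": True, "2.2": True,
    "4.1": True, "4.2": False,
}


def principle_status(results):
    """Roll element results up to principle level (assumed rule: all elements must pass)."""
    status = {}
    for item, satisfied in results.items():
        principle = item.split(".")[0]
        status[principle] = status.get(principle, True) and satisfied
    return status


status = principle_status(review)
failed_items = sorted(item for item, satisfied in review.items() if not satisfied)
print("Principle status:", status)
print("Elements needing attention:", failed_items)
```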
Process Quality Control
Common Process Quality Control Procedures for Availability and Continuity
Availability Process Quality Control
1. Availability Reporting
2. Availability Report Evaluation and Improvement
3. Management Escalations of Service Availability Violations
Continuity Process Quality Control
1. Continuity Reporting
2. Continuity Report Evaluation and Improvement
3. Management Escalations of Service Continuity Violations
Structured Approach to Availability and Continuity Process Quality Control
Availability and Continuity Process Quality Control
1. Generate Report Metrics and Reports
1.1 Develop Management Reports Based on Agreed Metrics
1.2 Schedule Report
1.3 Generate Reports
1.4 Distribute Reports
1.5 Review Report Schedule
1.6 Update Reporting Schedule
2. Evaluation and Improvement
2.1 Evaluate Process for Improvement
2.2 Develop Improvements and Implementation Plan
2.3 Create and Submit Improvement Implementation Plan
2.4 Implement Improvement Plan
2.5 Review Implementation
2.6 Update Process Improvement Plan
3. Management Escalations of Service Continuity Violations
Step 1 - Generate Report Metrics and Reports
1. Generate Report Metrics and Reports
− Scope: Generate report metrics and periodic and ad hoc reports as per requirement or plan
− Inputs: Report schedule; requests for ad hoc reports
− Outputs: Generated or distributed reports
1.1 Develop Management Reports Based on Agreed Metrics
− Scope: Report to management the contributions made by this process to overall service management
− Inputs: Report requirements
− Outputs: Accepted reports, frequency and costs
1.2 Schedule Report
− Scope: Update the report schedule
− Inputs: Report schedule
− Outputs: Updated report schedule
1.3 Generate Reports
− Scope: Generate reports according to schedule or in response to ad hoc requirements
− Inputs: Collected metrics
− Outputs: Generated reports
1.4 Distribute Reports
− Scope: Distribute the generated report to the target recipients
− Inputs: Generated reports
− Outputs: Distributed reports
1.5 Review Report Schedule
− Scope: Review the report requirements regularly
− Inputs: Report schedule; report details
− Outputs: Review results
1.6 Update Reporting Schedule
− Scope: Update report schedule with the new reports
− Inputs: Report schedule
− Outputs: Updated report schedule
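One metric an availability report might carry is the availability ratio defined at the start of the deck: the proportion of the agreed service hours during which the service was actually available. The sketch below computes it and flags a breach for management escalation. The function names, the monthly figures and the 99.9% target are assumptions made for illustration.

```python
def availability_ratio(agreed_service_hours: float, downtime_hours: float) -> float:
    """Proportion of agreed service hours during which the service was available."""
    if agreed_service_hours <= 0:
        raise ValueError("agreed_service_hours must be positive")
    if not 0 <= downtime_hours <= agreed_service_hours:
        raise ValueError("downtime_hours must lie within the agreed service hours")
    return (agreed_service_hours - downtime_hours) / agreed_service_hours


def needs_escalation(ratio: float, target: float) -> bool:
    """True when measured availability falls below the agreed target."""
    return ratio < target


# Illustrative month: 200 agreed service hours with 1 hour of downtime.
monthly = availability_ratio(agreed_service_hours=200.0, downtime_hours=1.0)
escalate = needs_escalation(monthly, target=0.999)  # hypothetical target
print(f"Availability: {monthly:.3%}, escalate: {escalate}")
```

Downtime outside the agreed service hours is deliberately excluded, in line with the definition of the ratio.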
Step 2 - Evaluation and Improvement
2. Evaluation and Improvement
− Scope: Perform periodic reviews for process performance improvement
− Inputs: Process metrics; future directives; service level expectations; review schedule; improvement plan
− Outputs: Implemented improvements; reduced costs; improved process efficiency and effectiveness
2.1 Evaluate Process for Improvement
− Scope: Review the effectiveness and efficiency of the continuity management process regularly
− Inputs: Improvement plan
− Outputs: Gap analysis report
2.2 Develop Improvements and Implementation Plan
− Scope: Develop and review proposed process improvements
− Inputs: Improvement plan; gap analysis report; revised business requirements
− Outputs: Improvement strategy
2.3 Create and Submit Improvement Implementation Plan
− Scope: Create and submit improvement implementation plan
− Inputs: Improvement strategy
− Outputs: Submitted improvement implementation plan
2.4 Implement Improvement Plan
− Scope: Manage and coordinate the implementation of the process improvement plan
− Inputs: Approved improvement implementation plan; improvement strategy
− Outputs: Implemented improvements; reduced costs; improved process efficiency and effectiveness
2.5 Review Implementation
− Scope: Monitor implementation to ensure that process is not disrupted and that the changes are working as intended
− Inputs: Implemented improvements
− Outputs: Closed improvement implementation plan; review results
2.6 Update Process Improvement Plan
− Scope: Update the process improvement plan with any changes
− Inputs: Process improvement plan; review cycle
− Outputs: Updated process improvement plan
Summary
• Availability and continuity are merging into a single unbroken requirement
• Availability and continuity can be a significant overhead to an organisation so their cost should yield benefits elsewhere
• Most business systems and processes are defined as business critical
• Management commitment is needed to ensure availability and continuity receive the required attention and resources
• Use core principles for availability and continuity for independent verification of processes and designs
• Availability and continuity should be embedded into system architectures and designs rather than being an afterthought