DISASTER RECOVERY CAPSTONE PROJECT
Advisor: Jim French, Dept of Ecology
Team Members: Scott Andersen, WSDOT; Gary Duffield, DIS; Doug Selix, OFM; Thelma Smith, WSDOT; Brian Sylvester, DOP
Capstone Assignment
How can the state achieve a coordinated approach to IT disaster recovery?
How will we recover critical services and infrastructure, knowing that we share services, platforms, and customers that rely on each other for data during the recovery?
How do we expose the risks, identify the gaps and move toward meeting recovery time objectives?
How do we ensure that the capacity to recover aligns with the risk tolerance of state leadership?
5 Guiding Principles
1. Establish and empower a central authority for ‘Enterprise’ (Statewide) D/R Planning
2. Standardize and consolidate IT Infrastructure wherever possible to ease D/R Planning
3. Practice D/R Planning at the ‘Enterprise’ (not agency) level
4. Mandate D/R planning for all IT systems
5. Develop and document State guidelines on ‘risk appetite’
New Term!
Resilience and Recoverability (R/R)
Leadership
Leadership is about change!
Shared Vision: Changes on the horizon
Standardization & Consolidation
System Level R/R Focus
R/R Designed into All Systems
Risk Tolerance and Oversight
Planning
Senior Level Sponsorship
State Agencies’ Partnership
Strategic and Tactical Leadership
Strategic = Resilience
Tactical = Recoverability
Governor
Emergency Management Council
State Agency Liaisons
DIS, OFM, DOP, DOT, etc.
Existing Channels
Comprehensive Emergency Management Plan (CEMP)
ISB Standards
State Agencies’ Plans
Existing Plans
Budgeting
Existing Catch-22
Change agency-centric approach to statewide R/R solution
Establish shared vision for funding R/R
Integrate R/R into Spending Plans
Develop policy that cements R/R funding into IT initiatives
Control
Establish Ownership and Oversight
Align R/R efforts with similar or preexisting efforts:
Emergency management groups
Agencies’ leadership teams
Establish new teams or partnerships as needed
Establish policies for:
Compliance
Success Metrics
Change Management
Summary
LEADERSHIP!!!
Proactive = Resilience
Reactive = Recovery
Close Gaps and Remove Roadblocks
Leverage Existing or Empower new Program
Standardization & Consolidation
Hardware and software consolidation and standardization are becoming the driving force behind organizations' evaluation of their Disaster Recovery plans.
A 2009 survey from Symantec Corporation found that 64% of organizations are creating or re-evaluating their DR plans based on a plan to consolidate and standardize their infrastructure.
Hosting Service Matrix
[Figure: Hosting Service Matrix. Labels: Maturity; Transition; Target. Annotations: increase provider management, reduce agency resources; leverage common infrastructure, consolidate hardware, reduce cost.]
Resiliency and Recoverability
[Diagram: three sites, each with servers and virtual storage.
Olympia Production Data Center 1: Virtual Cluster 1
Olympia Production Data Center 2: Virtual Cluster 2
Eastern WA Recovery Site: Recovery Cluster 1
Links shown: Load Balanced; Data Replication; Server Image Replication.]
Summary
Adopt a cost-effective enterprise High Availability Architecture solution (Resilience).
Future investments in Infrastructure and Applications should include Resilience and Recoverability.
Enterprise Level R/R Focus
Planning for Resilience and Recoverability should be at the Enterprise Level.
Planning for recovery by agency, technology, or individual application is not effective for an enterprise class system.
Enterprise Level R/R Focus
Enterprise Level Planning is complex, and must be done for Essential Systems.
Essential Systems support Essential Agency Functions as defined in agency COOP plans
Must consider core agency systems - run by agency or service provider
Must consider dependencies such as infrastructure and interface services
Must consider dependent trading partner systems
Must consider enterprise data at recovery point
Must include procedures for assuring data integrity at recovery point
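The dependency considerations above can be sketched as a small graph exercise. This is only an illustration, not the team's method: the system names and the dependency map below are hypothetical, and a real inventory would come from agency COOP plans. The sketch collects everything an essential system relies on and orders the recovery so that dependencies come up first.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: each system lists what must be running first.
DEPENDS_ON = {
    "payment_system": {"mainframe", "interface_service"},
    "interface_service": {"network"},
    "mainframe": {"network", "storage"},
    "network": set(),
    "storage": set(),
}


def recovery_scope(system, depends_on):
    """Return the system plus everything it transitively depends on."""
    scope, stack = set(), [system]
    while stack:
        current = stack.pop()
        if current not in scope:
            scope.add(current)
            stack.extend(depends_on.get(current, ()))
    return scope


def recovery_order(system, depends_on):
    """Order the recovery scope so dependencies come before dependents."""
    scope = recovery_scope(system, depends_on)
    graph = {name: depends_on.get(name, set()) & scope for name in scope}
    return list(TopologicalSorter(graph).static_order())


order = recovery_order("payment_system", DEPENDS_ON)
print(order)
```

Planning from the essential system outward, rather than per technology, is what surfaces shared infrastructure that several agencies would otherwise each assume someone else recovers.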
Enterprise Level R/R Focus
OFM Example - The State Payment Process
Payment Process based upon AFRS and all systems that it connects to
Historical DR Plan: “DIS will recover the mainframe and all will be good”
Look at interfaces to partner agencies
Look at known single points of failure
Enterprise Level R/R Focus
Enterprise Class Planning requires someone to focus on getting it done for essential systems!
A single organization must facilitate Enterprise planning
Enterprise system owner and Stakeholders must fully participate in development and testing of R/R Plans
Summary
Enterprise Planning is HARD!
Enterprise Class Systems are COMPLEX!
Someone Needs to GET ‘er DONE!
R/R Designed into All IT systems
Many, if not most, recent IT systems were developed without Disaster Recovery. Why?
Elimination is viewed as a ‘Cost Reduction’ strategy.
This is a ‘false economy’: a calculated risk.
Real consequences to State citizens:
Missing vital systems after a disaster
Or spending too much to ensure their availability
R/R Designed into All IT systems
Creation of WSRRO
Mandate that all new IT systems include R/R
Review and approve
Criteria:
Agency impact analysis
Integration impact analysis
Validate appropriateness of plan
R/R Designed into All IT systems
Types of ‘valid’ plans:
‘Resilience’
‘Warm site’
‘Cold site’
Data protection only
No recovery plan
R/R Designed into All IT systems
[Chart: Resilience, Recovery (Warm), Recovery (Cold), and Data Protection Only compared on Time, Cost, and Assurance.]
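One way to read the time and cost comparison is as a selection problem: pick the least expensive plan type that still meets a system's recovery time objective. The sketch below is illustrative only. The recovery hours and cost ranks are hypothetical placeholders, not figures from this project, and they assume the common pattern that faster recovery costs more.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class PlanType:
    name: str
    recovery_hours: float  # hypothetical time to restore service
    relative_cost: int     # hypothetical cost rank; higher = more expensive


# Illustrative numbers only; real values would come from an impact analysis.
PLAN_TYPES = (
    PlanType("Resilience", 0.0, 4),
    PlanType("Recovery (Warm)", 24.0, 3),
    PlanType("Recovery (Cold)", 168.0, 2),
    PlanType("Data Protection Only", 720.0, 1),
)


def cheapest_plan_meeting(
    rto_hours: float, plans: Sequence[PlanType] = PLAN_TYPES
) -> Optional[PlanType]:
    """Return the lowest-cost plan whose recovery time fits the RTO, if any."""
    candidates = [p for p in plans if p.recovery_hours <= rto_hours]
    if not candidates:
        return None
    return min(candidates, key=lambda p: p.relative_cost)


choice = cheapest_plan_meeting(48)
print(choice.name if choice else "No plan type meets this objective")
```

A result of `None` is itself useful: it flags a system whose stated objective cannot be met by any funded option.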
Summary
Mandate R/R planning for all IT systems
Scope for critical functions only
Ensure ‘Enterprise’ context
What Do We Save?
If your house were on fire, what would you save?
We all live in the same house, so we need to decide what is going to be saved, and how much!
We won’t be able to save it all.
Be careful what you choose!
Risk Tolerance & Governance
What is important to the WA State Enterprise?
Public Safety (EMD/WSP/DOC/Roads/others?)
Citizen Systems: Licensing, Social Systems, others?
Financial Systems: How we dispense and receive funds.
H/R Systems, Data Centers?
State Enterprise Approach!
Acceptable Risk Tolerance
How much and what loss is acceptable?
Data?
E-mail?
File Systems?
Hardware/infrastructure?
Networks, communications?
Applications used by Citizens?
Applications used by Agencies?
What does this look like?
How do we determine what and how much?
Identify and Develop a Risk Matrix!
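A risk matrix is commonly built by rating each potential loss on likelihood and impact and combining the two. The following is a minimal sketch under that assumption. The three-level scale and the score thresholds are hypothetical, and actual levels would need to be set by state leadership as part of defining risk tolerance.

```python
LEVELS = ("low", "medium", "high")


def risk_rating(likelihood: str, impact: str) -> str:
    """Combine likelihood and impact (each low/medium/high) into one rating."""
    score = (LEVELS.index(likelihood) + 1) * (LEVELS.index(impact) + 1)
    # Hypothetical thresholds on the 1..9 product score.
    if score >= 6:
        return "high"
    if score >= 3:
        return "medium"
    return "low"


def build_matrix() -> dict:
    """Return the full matrix keyed by (likelihood, impact)."""
    return {(l, i): risk_rating(l, i) for l in LEVELS for i in LEVELS}


matrix = build_matrix()
print("likelihood \\ impact: " + ", ".join(LEVELS))
for likelihood in LEVELS:
    row = [matrix[(likelihood, impact)] for impact in LEVELS]
    print(f"{likelihood:>8}: " + ", ".join(row))
```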
Governance!
Now we know what. How do we really know it will work?
What are our expectations for Disaster Recovery?
How do we ensure that RECOVERY WILL work?
LEADERSHIP!
Identify and apply standardized comprehensive testing (Know what and how much to test, and test it the same way across the board!)
Perform Resilience and Recoverability Plans
Review Results and apply Process Improvement! (Do it better next time!)
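Testing "the same way across the board" implies that every test produces results in a common shape that can be compared against the recovery time objective. The sketch below illustrates one such comparison. The record fields, system names, and hours are hypothetical, not results from any actual test.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class DrillResult:
    system: str
    rto_hours: float       # agreed recovery time objective
    measured_hours: float  # time actually taken in the test


def rto_gaps(results: Iterable[DrillResult]) -> List[Tuple[str, float]]:
    """Return (system, hours over RTO) for each missed objective, worst first."""
    missed = [
        (r.system, r.measured_hours - r.rto_hours)
        for r in results
        if r.measured_hours > r.rto_hours
    ]
    return sorted(missed, key=lambda gap: gap[1], reverse=True)


# Hypothetical results from one round of testing.
drills = [
    DrillResult("payments", rto_hours=24, measured_hours=30),
    DrillResult("licensing", rto_hours=72, measured_hours=48),
    DrillResult("e-mail", rto_hours=8, measured_hours=20),
]

for system, overrun in rto_gaps(drills):
    print(f"{system}: {overrun:g} hours over objective")
```

Sorting by the size of the overrun gives the process-improvement step an obvious starting point for the next round.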
Summary
Target Enterprise (State Level) Programs/Systems, NOT siloed agencies
Identify how much of it we really need!
RISK MATRIX!
Standardized Comprehensive Testing applied
Regularly perform Resilience and Recoverability Testing
Process Improvement
5 Principles to Implement!
1. Establish and empower a central authority for ‘Enterprise’ (Statewide) R/R Planning
2. Standardize and consolidate IT Infrastructure wherever possible to ease R/R Planning
3. Practice R/R Planning at the ‘Enterprise’ (not agency) level
4. Mandate R/R planning for all new IT systems
5. Develop and document State guidelines on ‘risk appetite’
Questions? Thank you!
Scott Andersen, WSDOT; Gary Duffield, DIS; Doug Selix, OFM; Thelma Smith, WSDOT; Brian Sylvester, DOP