1
Category 5 Services, LLC
Welcome to
Trends in Disaster Recovery: The Past, Today and the Future
September 2012
Glen Curole
(918) 344-9998
Lafayette, Louisiana
Martin Myers
(804) 332-3013
Richmond, Virginia
2
Agenda
• Introductions & Expectations
• General
• Past
• Present
• Future
• Extras
• Questions & Answers
3
Introductions & Expectations
Category 5 Services, LLC
We are a team of experienced professionals with over 100 years of combined experience, dedicated to providing our clients with cost-effective solutions to business issues in the following areas:
• Business Continuity & Disaster Recovery Management
• Data Center & IT Strategic Planning and Management
• Project & Program Management
• Security Assessments and Management
• Web Site Design and Implementation
4
Introductions & Expectations
“An expert is a person who has made all the mistakes that can be made in a very narrow field.”
Niels Bohr
5
Introductions & Expectations
Goal of this session is to discuss the past, present and future of Disaster Recovery and to share information that you can use when you get back to your respective companies.
Novice/Intermediate/Advanced
How have disaster recovery and systems availability changed? This presentation will provide three views of systems recovery and the required hardware components that are needed to support each recovery method. The presenters will provide a look back on the old days of recovery, a current look into today and the future vision. Systems recovery for disasters and/or single event outages must change and evolve to support higher systems availability by using prevailing and advanced hardware and software techniques that support our customers' needs for 24 X 7 systems availability.
6
Introductions & Expectations
Legalese
The opinions expressed by Glen Curole and Martin Myers are their own, and should not be interpreted as those of Category 5 Services or Bank of America.
The mention of any companies should not be considered as endorsements.
7
Agenda
• Introductions & Expectations
• General
• Past
• Present
• Future
• Extras
• Questions & Answers
8
What is a Disaster?
• Generally, a sudden unplanned event.
• That results in the inability to provide critical business
functions for a predetermined period of time.
• Resulting in a loss as defined by the BIA.
• With the requirement that you not replicate your disaster
at the recovery site.
9
Statement of Fact
• Publicly traded companies have a responsibility to stockholders, customers and employees to ensure the continuity of business operations.
• In the event of a business outage or disaster this responsibility continues to exist.
10
Surveys & Studies Show …
A 2006 AT&T Corp. survey of the state of business continuity and disaster recovery planning in the United States, covering more than 1,000 senior executives of U.S. companies with more than US$10 million in annual revenue, indicated that:
• 28 percent of U.S.-based companies do not have adequate plans in place to cope with natural or other potential disasters.
• Of those companies with business continuity plans in place, 40 percent say that they have not tested their plans in the past 12 months.
But is this something business leaders really need to worry about, or is there a sense of paranoia about disasters? If disasters do occur, are they really a significant hurdle to overcome? The AT&T survey results shed more light:
• Nearly 30 percent of those surveyed said that their companies have suffered from a disaster.
• 9 percent of those companies hit with a disaster reported that repairs and business losses cost at least US$500,000 a day. If normal business operations cannot be restored, that translates to a loss of US$2.5 million a week.
A Gartner Group report found that two out of five organizations go out of business within five years after a disaster. This clearly shows that business continuity planning requires management attention.
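The weekly figure in the AT&T bullet can be checked with a few lines of Python. This is only a sketch: the 5-day business week is an assumption inferred from the slide's numbers, and the function name is illustrative.

```python
# Figure from the AT&T survey bullet: at least US$500,000 lost per day.
DAILY_LOSS_USD = 500_000
# Assumption: the slide's weekly figure counts a 5-day business week.
BUSINESS_DAYS_PER_WEEK = 5


def outage_loss(days, daily_loss=DAILY_LOSS_USD):
    """Return the minimum loss in US$ for an outage lasting `days` business days."""
    return days * daily_loss


weekly_loss = outage_loss(BUSINESS_DAYS_PER_WEEK)
print(f"US${weekly_loss:,} per week")
```

The same function can be used to estimate longer outages, for example `outage_loss(20)` for roughly a month of business days.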
11
Why bother with DR/BC Plans?
• Ensure employee safety
• Minimize customer impact even under adverse situations
• Minimize financial impact
• Meet due diligence expectations of employees, customers, regulators, shareholders and public
• Meet regulatory & compliance requirements
• Protect company against negligence or dereliction-of-duty accusations / legal action
• Maintain high reputation even in the face of a crisis
• Provide a competitive advantage
12
Concepts
Identifying your optimal BC investment
[Chart: Losses and Cost curves, marking the Optimal Down Time and the Optimal Investment]
13
Concepts
Identifying your optimal BC investment
[Chart: Losses and Cost curves. Less Down Time than the Optimal Down Time: Spend More, Lose Less. Appropriate Investment marked.]
14
Concepts
Identifying your optimal BC investment
[Chart: Losses and Cost curves. More Down Time than the Optimal Down Time: Spend Less, Lose More. Appropriate Investment marked.]
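The trade-off in these three charts can be sketched numerically. The example below assumes the optimum is the down time at which recovery cost plus expected loss is smallest; the slides do not state that rule explicitly. The two curves and all dollar figures are hypothetical, chosen only to show the shape of the calculation.

```python
def expected_loss(downtime_hours, loss_per_hour=20_000):
    """Hypothetical loss curve: losses grow with every hour of down time."""
    return loss_per_hour * downtime_hours


def recovery_cost(downtime_hours, cost_factor=2_000_000):
    """Hypothetical cost curve: a shorter target down time costs more to achieve."""
    return cost_factor / downtime_hours


def optimal_downtime(candidates):
    """Pick the candidate down time with the lowest combined cost and loss."""
    return min(candidates, key=lambda h: expected_loss(h) + recovery_cost(h))


candidates = [1, 2, 4, 8, 10, 12, 24, 48, 72]  # target down times in hours
best_hours = optimal_downtime(candidates)
best_total = expected_loss(best_hours) + recovery_cost(best_hours)
print(best_hours, best_total)
```

Choosing a target to the left of `best_hours` corresponds to "spend more, lose less"; choosing one to the right corresponds to "spend less, lose more".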
Plan
15
Concepts
When a disaster strikes you need …
• Place
• People
• Communications
• Equipment
• Data
16
Concepts
Plan for the Worst Case!
Anything less can be handled
by the worst case plan.
What happens isn't as important as the fact that something happens!
17
Where Most Organizations Are Headed
[Chart: cost ($ to $$$$) against Service Level for Traditional, Operating System, Advanced Recovery and Continuous Availability]
To summarize …
• What are the key factors for any backup and recovery technology?
– Data storage media and data replication time
– Network capability
• How fast and how much data can it move?
18
19
Agenda
• Introductions & Expectations
• General
• Past
• Present
• Future
• Extras
• Questions & Answers
20
Average Time to Recover
[Chart: recovery approaches (Traditional Recovery, Standby Operating System, Database Replication, Replication, Continuous Availability) plotted against Hours of Lost Transactions (RPO) and Hours Required to Resume Business (RTO). Phases shown: Transactions Not Captured, Declaration, Transaction Recreation, Data Retrieval, Transit, System Restore, IPL & Network, Database Restore]
Recovery Strategies
• Wait till it happens
• Emergency Equipment Replacement
• Reciprocal Agreement (sometimes known as ‘sharing’ your disaster)
• Hot Site
• Cold Site
Buckets and Bins of Data
IBM Cards and Dollar (1930)
Note that in 1890 Herman Hollerith made the size of the data-processing card (later commonly known as the IBM card) equal to the dollar bill of that time. Choosing that size allowed reuse of existing filing bins and adaptation of other currency manipulating equipment. We show a 1930 Silver Certificate Dollar bill that still had the same size. In 1929 the dollar bill was reduced by 20% in both dimensions.
26
Trip Down Memory Lane…
27
1958 IBM 350
1 Gig circa 1992 & today
28
Line Speed 110 BPS
29
300 BPS
30
300 BPS w/o Ma Bell
31
Still 300 BPS but now ‘Smart’
32
Early 1980s 1200 BPS
33
Mid-1980s: 4.8 kbps up to 28.8 kbps
34
Mid-1990s: up to 33.6 kbps
35
Late 1990s: maxed at 56 kbps
36
Line speed comparison
Dial Up
• 4.8 kbps
• 9.6 kbps
• 14.4 kbps
• 19.2 kbps
• 28.8 kbps
• 56.0 kbps
Dedicated
• ISDN 128 kbps
• T1 1.54 Mbps
• DSL 8.0 Mbps
• Cable 52 Mbps
• T3 44.7 Mbps
• Ethernet 1 Gbps
• OC-256 13 Gbps
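To make the comparison concrete, the sketch below estimates how long a data set would take to cross some of the lines listed above. It is an idealized calculation that ignores protocol overhead, latency and line sharing, and the 1 GB data set size is an arbitrary assumption picked for illustration.

```python
# Nominal line speeds from the slide, in bits per second.
LINE_SPEEDS_BPS = {
    "Dial-up 56 kbps": 56_000,
    "ISDN 128 kbps": 128_000,
    "T1 1.54 Mbps": 1_540_000,
    "DSL 8.0 Mbps": 8_000_000,
    "T3 44.7 Mbps": 44_700_000,
    "Cable 52 Mbps": 52_000_000,
    "Ethernet 1 Gbps": 1_000_000_000,
    "OC-256 13 Gbps": 13_000_000_000,
}


def transfer_seconds(size_bytes, bits_per_second):
    """Ideal transfer time: payload bits divided by the nominal line speed."""
    return size_bytes * 8 / bits_per_second


DATA_SET_BYTES = 1_000_000_000  # assumption: a 1 GB (decimal) data set

for name, bps in LINE_SPEEDS_BPS.items():
    hours = transfer_seconds(DATA_SET_BYTES, bps) / 3600
    print(f"{name:<18} {hours:10.4f} hours")
```

Real replication windows will be longer than these ideal values, so the numbers are best read as lower bounds when sizing a network for recovery.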
37
1982 versus 2012
1982 VAX 11-780
16 MB of RAM
1 Gig of Storage
$500,000
2012 SONY Laptop
8 Gig of RAM
750 Gig storage
$680
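The gap between the two machines is easier to appreciate as ratios. The sketch below uses only the figures on this slide, reading the VAX memory figure as 16 megabytes and treating 1 Gig as 1,024 MB; both readings are assumptions.

```python
# Figures from the slide; RAM in megabytes, storage in gigabytes, cost in US$.
VAX_1982 = {"ram_mb": 16, "storage_gb": 1, "cost_usd": 500_000}
LAPTOP_2012 = {"ram_mb": 8 * 1024, "storage_gb": 750, "cost_usd": 680}

ram_ratio = LAPTOP_2012["ram_mb"] / VAX_1982["ram_mb"]
storage_ratio = LAPTOP_2012["storage_gb"] / VAX_1982["storage_gb"]
cost_ratio = VAX_1982["cost_usd"] / LAPTOP_2012["cost_usd"]

# Whole-system price divided by storage capacity, as a rough cost per gigabyte.
vax_usd_per_gb = VAX_1982["cost_usd"] / VAX_1982["storage_gb"]
laptop_usd_per_gb = LAPTOP_2012["cost_usd"] / LAPTOP_2012["storage_gb"]

print(f"RAM: {ram_ratio:.0f}x more, storage: {storage_ratio:.0f}x more")
print(f"Price: about {cost_ratio:.0f}x lower")
print(f"US$ per GB: {vax_usd_per_gb:,.0f} versus {laptop_usd_per_gb:.2f}")
```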
38
[Charts: RAM, Disk Storage and Cost ($) for the VAX 11-780 versus the SONY Laptop]
39
Recovery Reality
• Tape to Remote Hot Site
– 1 day to get tapes and people ‘there’
– 1 to 2 days to restore network & system data
– 1 to 2 days to restore applications
– 3 to 5 days total
• Critical Issues
– Staff availability / Day or regional incident
– Inertia / Starting is different from Running
– Synchronization / All apps need to resynch
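The 3 to 5 day total for tape-based recovery is simply the sum of the phase ranges listed above. A minimal sketch, with the phase names paraphrased from the slide:

```python
# (phase, minimum days, maximum days) taken from the Tape to Remote Hot Site bullets.
PHASES = [
    ("Get tapes and people to the hot site", 1, 1),
    ("Restore network & system data", 1, 2),
    ("Restore applications", 1, 2),
]


def total_days(phases):
    """Add up the best-case and worst-case durations across sequential phases."""
    best = sum(low for _, low, _ in phases)
    worst = sum(high for _, _, high in phases)
    return best, worst


best_case, worst_case = total_days(PHASES)
print(f"{best_case} to {worst_case} days total")
```

The calculation assumes the phases run strictly one after another, which is what makes traditional tape recovery slow; the critical issues listed on the slide can only add to the worst case.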
40
• Data Center
• Call Center
• Corporate HQ
• Reputation Protection
• Critical Business Units
• Non-Critical Bus
• Enterprise
• Critical Vendors
• Supply Chain
Evolution from DR (past) to BC (present)
41
Agenda
• Introductions & Expectations
• General
• Past
• Present
• Future
• Extras
• Questions & Answers
42
Average Time to Recover
[Chart: Hours of Lost Transactions (RPO) and Hours Required to Resume Business (RTO) by recovery approach, from Traditional Recovery through Standby Operating System, Database Replication and Replication to Continuous Availability]
Recovery Strategies
• Mirrored Site
44
Recovery Strategies
• Split Site
[Diagram: sites A and B; labels Production, Development, Test and Sandbox; capacity figures 100 % and 60 - 75 %]
45
Recovery Strategies
• Non-stop Operations
[Diagram: sites A, B and C, each marked 150 %]
How did we get here? Moore’s Law
Moore's law is the observation that over the history of computing hardware, the number of transistors on integrated circuits doubles approximately every two years.
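As a rough illustration of what doubling every two years implies, the sketch below computes the growth factor over a span of years. The 30-year span matches the deck's 1982 versus 2012 comparison; treating the doubling period as exactly two years is a simplification.

```python
def moore_growth_factor(years, doubling_period_years=2):
    """Growth factor after `years` if a quantity doubles every `doubling_period_years`."""
    return 2 ** (years / doubling_period_years)


factor_1982_to_2012 = moore_growth_factor(2012 - 1982)
print(f"{factor_1982_to_2012:,.0f}x over 30 years")
```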
46
47
• Data Center
• Call Center
• Corporate HQ
• Reputation Protection
• Critical Business Units
• Non-Critical Bus
• Enterprise
• Critical Vendors
• Supply Chain
Evolution from Continuity / Recovery (present) to Resiliency (future)
• Moving to non-stop business and operations thanks to innovations in technology.
• Making this capability affordable
48
Agenda
• Introductions & Expectations
• General
• Past
• Present
• Future
• Extras
• Questions & Answers
49
Average Time to Recover
[Chart: Hours of Lost Transactions (RPO) and Hours Required to Resume Business (RTO) by recovery approach, from Traditional Recovery through Standby Operating System, Database Replication and Replication to Continuous Availability]
The future is now
50
Recovery Strategies
• Non-stop Operations
[Diagram: sites A, B and C, each marked 150 %]
Solid State Storage
51
• Flash memory is a non-volatile computer storage chip that can be electrically erased and reprogrammed. It was developed from EEPROM (electrically erasable programmable read-only memory).
• Primarily used in memory cards, USB flash drives, solid-state drives, and similar products, for general storage and transfer of data.
Phase-change Memory (PCM)
52
• Around for more than 40 years
• PCM is a key component of rewritable CDs, DVDs and Blu-ray storage disks that use laser optics.
• Flash disks are limited to holding one bit of data per storage cell. However, IBM’s PCM research team in Zurich found a way to enable each PCM cell to hold multiple data bits securely; previously, bits often became lost or corrupt at unpredictable times.
• Same storage capacity as NAND flash disks, which now are up to 1TB in capacity, but delivers about 100 times faster data movement speed along with a much longer lifespan.
• “Today’s enterprise flash can endure about 30,000 read/write cycles; today’s PCM chips can do in excess of 10 million cycles,” Pozidis said.
From the Researchers at IBM
53
Atomic-Scale Storage
54
Wow
55
PCIe
56
• PCIe-based flash storage has the ability to bypass traditional storage overhead by reducing latencies, increasing throughput and enabling efficient processing of massive quantities of data.
• PCIe is an expansion-card standard based on point-to-point serial links.
Virtual Machine Cloning
57
• “Virtual machine image cloning” is an alternative to file-system and data-store snapshots.
• “Right now, when you have a virtual machine running, you create a snapshot, which is a child of the current virtual machine,” explained Wim Coekaerts, who serves as Oracle’s senior vice president of Linux and virtualization engineering. “But that’s not something that can independently grow afterward. With a clone, you have a new entity that can have its own life …”
• A snapshot is an object and a part of the virtual disk, ... Clones are completely new virtual disk objects, independent units that can have new lives of their own.
• Automated disaster recovery—either on-premises or from a cloud service—is coming. In the past, reconnecting data stores with systems and getting those systems running after a power outage was done manually. However, software now available is smart enough to get large portions of a virtualized system back online much faster and with less effort.
• Dell EqualLogic, EMC Data Domain, Hewlett-Packard and VMware are some of the vendors that offer this.
58
Agenda
• Introductions & Expectations
• General
• Past
• Present
• Future
• Extras
• Questions & Answers
59
Category 5 Services, LLC
Thank You!
Questions?
Glen Curole
(918) 344-9998
Lafayette, Louisiana
Martin Myers
(804) 332-3013
Richmond, Virginia
60
Agenda
• Introductions & Expectations
• General
• Past
• Present
• Future
• Extras
• Questions & Answers
61
Measuring BC Competency Improvement
62
BC Maturity Model
63
Tangible Disaster Impact
Annual Revenue After Disaster
[Chart: annual revenue in $ millions (axis $0 to $1,400), Year 0 to Year 5, for three scenarios]
Scenario                 Total Rev.   % Decrease
0% Loss / 20% Growth     $ 5.0 B      0%
5% Loss / 10% Growth     $ 3.4 B      32%
10% Loss / 5% Growth     $ 3.0 B      40%
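The totals on this slide can be reproduced with a simple compounding model. The sketch below rests on two assumptions that are not stated in the deck: a Year 0 revenue of $500 million, and a rule that the disaster scenarios take their loss in Year 1 and grow only from Year 2 onward, while the no-loss scenario grows every year. Under those assumptions the six-year totals round to the slide's figures.

```python
BASE_REVENUE_MUSD = 500  # assumption: Year 0 revenue in $ millions

# (loss in Year 1, annual growth afterwards), labels taken from the slide.
SCENARIOS = {
    "0% Loss / 20% Growth": (0.00, 0.20),
    "5% Loss / 10% Growth": (0.05, 0.10),
    "10% Loss / 5% Growth": (0.10, 0.05),
}


def yearly_revenue(base, loss, growth, years=5):
    """Revenue for Year 0..years; a disaster year replaces growth in Year 1 (assumed rule)."""
    revenue = [base]
    year_one = base * (1 - loss) if loss > 0 else base * (1 + growth)
    revenue.append(year_one)
    for _ in range(years - 1):
        revenue.append(revenue[-1] * (1 + growth))
    return revenue


totals = {name: sum(yearly_revenue(BASE_REVENUE_MUSD, loss, growth))
          for name, (loss, growth) in SCENARIOS.items()}
baseline = totals["0% Loss / 20% Growth"]

for name, total in totals.items():
    decrease = 100 * (1 - total / baseline)
    print(f"{name:<22} ${total / 1000:.1f} B  {decrease:.0f}% decrease")
```

The point the model makes is that a modest one-year loss combined with slower growth compounds into a much larger cumulative revenue gap.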
64
Crisis Management & Business Continuity Policy
• Policy #
• Sections
– Statement of Purpose
– Organization & Structure
– Incident Management – Corporate Responsibility
– Business Continuity – Local Management Responsibility
• Business Resumption Planning
• IT Resumption Planning
• Supplier BC Planning
• Funding
65
BC Roles & Responsibilities
BU Responsibilities: Create their plan. Review/approve IT & Vendor plans. FUND IT Plan.
IT Responsibilities: IT portion of BU plan, Bunker & IT Office plans, IT Network plans. Review & approve IT Vendor plans. Request FUNDING. Implement and Test.
Vendor Responsibilities: Their plans
[Diagram: BU Site, BU Recovery Site, Vendor Site, Alt Vendor Site, Data Center (IT Raised Floor, Office Area), Hot Site (IT Raised Floor), IT Office Recovery Site, Network]
66
6 KEYS for Recovery
• Sufficient capability at the recovery site
• High degree of confidence in ability to restore critical operations in the timeframe you require
• Network adequate to meet needs at time of disaster
• Recovery / resiliency plans complete
• Primary site(s) restoration plan complete
• Plans are tested & updated regularly
• Link Change Control in production to recovery platform & network
67
Danger Signals
• Maturity Level 1 or 2 - ‘At risk’
• Single - Data Center, Call Center, Critical Production or Office Function
• No BIA and/or don’t know MAD, RTO, RPO
– Or can’t meet RTOs
• No exercise program or unsuccessful exercise history
• Not knowing
– Revenue loss over time
– Market share loss over time
– Time to regain market share
68
Danger Signals
• Significant change in risk profile
– Consolidations & Supply chain efforts
– Decrease in bench strength or depth
– Outsourcing, Single points of failure
• Decrease in training or maintenance $s
• Unplanned outages, or an increase in unplanned outages
• No BC/DR plans and/or no restoration documentation
• Equipment and/or software no longer supported by vendor
• DR is IT’s problem
69
How will you know when you’re There?
• You’ll KNOW who’s going to go where!
• When!
[Diagram: groups (DMT, Exec, IT Systems, EEs who can be redeployed, EEs needing an alt. work site, EEs who can work at home) mapped to destinations (Location A, Location B, Hot Site, Home, Alt. Site, Redeployed)]