© 2011 IBM Corporation Architect’s 2013 Guide to Designing HA, BC, and DR - Best Practices Industry Best Practices - IT HA DR BC Provided by: John Sing, Executive IT Consultant, San Jose, California [email protected].


© 2011 IBM Corporation

Architect’s 2013 Guide to Designing

HA, BC, and DR - Best Practices

Industry Best Practices - IT HA DR BC

Provided by: John Sing, Executive IT Consultant, San Jose, California [email protected]

© 2013 IBM Corporation2

Industry Best Practices – IT HA DR BC

September 2013

Contents

Principles of architecting traditional IT HA, DR, BC

Technology and location considerations

Traditional Workloads vs. Internet Scale Workloads

Best Practices Step by Step Methodology


Four Stages of Data Center Efficiency: (pre-req’s for HA/BC/DR)

http://public.dhe.ibm.com/common/ssi/ecm/en/rlw03007usen/RLW03007USEN.PDF http://www-935.ibm.com/services/us/igs/smarterdatacenter.html

April 2012

[Diagram: three stacked layers labeled Infrastructure, Application, and Business. Elements shown: Application 1, Application 2, Application 3, analytics report, management reports, decision point, http://xyz.xml, MQSeries, WebSphere, SQL, DB2, and business processes A through G.]

1. An error occurs on a storage device that correspondingly corrupts a database

2. The error impacts the ability of two or more applications to share critical data

3. The loss of both applications affects two distinctly different business processes

IT Business Continuity must recover at the business process level

Business Process is the Recoverable Unit
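The three-step failure chain above can be sketched as a walk over a dependency graph. This is a minimal sketch only: the component names and the mapping of applications to business processes are hypothetical, because the slide does not say which application serves which process.

```python
from collections import deque

# Hypothetical dependency map (an assumption for illustration):
# each key is a component, each value lists what depends directly on it.
DEPENDENTS = {
    "storage_array_1": ["db2"],
    "db2": ["application_1", "application_2"],
    "application_1": ["business_process_A"],
    "application_2": ["business_process_D"],
}


def impacted(failed):
    """Return every component downstream of the failed one (breadth-first)."""
    seen = set()
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for dependent in DEPENDENTS.get(node, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


def impacted_business_processes(failed):
    """Filter the downstream set to the business-process layer only."""
    return sorted(c for c in impacted(failed) if c.startswith("business_process_"))
```

Calling `impacted_business_processes("storage_array_1")` on this map returns both business processes, which is why the recovery plan has to be scoped at the business process rather than at the device.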

Still true: synergistic overlap of valid data protection techniques

IT Data Protection

– Protection of critical business data
– Operations continue after a disaster
– Costs are predictable and manageable
– Recovery is predictable and reliable

1. High Availability: fault-tolerant, failure-resistant streamlined infrastructure with affordable cost foundation

2. Continuous Operations: non-disruptive backups and system maintenance coupled with continuous availability of applications

3. Disaster Recovery: protection against unplanned outages such as disasters through reliable, predictable recovery

Still true: Timeline of an IT Recovery

[Timeline diagram, running from production through "Outage!" to "Now we're done!":]

1. Outage. The Recovery Point Objective (RPO) answers: how much data must be recreated?
2. Assess.
3. Execute hardware, operating system, and data integrity recovery. This point marks the Recovery Time Objective (RTO) of hardware data integrity. Done? Not yet.
4. Application transaction integrity recovery. This point marks the Recovery Time Objective (RTO) of transaction integrity. Now we're done!

Layers shown: physical facilities, telecom network, management control, operating system, data, applications.

Staff shown: network staff, operations staff, applications staff.

Telecom bandwidth is still the major delimiter for any fast recovery
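To illustrate why telecom bandwidth limits recovery speed, here is a back-of-the-envelope sketch. The function names, the 70% link efficiency, and the sample figures are assumptions chosen for illustration; they do not come from the slides.

```python
def transfer_hours(data_terabytes, link_megabits_per_sec, efficiency=0.7):
    """Hours needed to move the data set over a telecom link.

    Uses decimal units (1 TB = 10**12 bytes). `efficiency` is an assumed
    fraction of the nominal link rate that is actually usable.
    """
    bits = data_terabytes * 1e12 * 8
    usable_bits_per_sec = link_megabits_per_sec * 1e6 * efficiency
    return bits / usable_bits_per_sec / 3600


def transactions_to_recreate(rpo_minutes, transactions_per_minute):
    """Work lost between the last recoverable copy and the outage."""
    return rpo_minutes * transactions_per_minute
```

Under these assumptions, moving 10 TB over a 1,000 Mbit/s link takes `transfer_hours(10, 1000)`, roughly 32 hours, so the link rather than the servers sets the floor on recovery time.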

Still true: value of Automation for real-time failover

[Timeline diagram: the same recovery timeline from "Outage!" to "Now we're done!", shown in shortened form. Labels: assess, RPO (how much data must be recreated?), H/W recovery, transaction recovery, RTO H/W, RTO transaction integrity. Layers and staff are the same as on the previous timeline.]

Value of automation

•Reliability

•Repeatability

•Scalability

•Frequent Testing

Still true: Replication Technology Drives RPO

[Chart: a Recovery Point scale and a Recovery Time scale, each running from seconds through minutes, hours, and days to weeks. For example, the technologies placed along the Recovery Point scale are tape backup, periodic replication, asynchronous replication, and synchronous replication / HA.]

Still true: Recovery Automation Drives Recovery Time

Recovery Time includes:

– Fault detection

– Recovering data

– Bringing applications back online

– Network access

[Chart: a Recovery Point scale and a Recovery Time scale, each running from seconds through minutes, hours, and days to weeks. For example, the approaches placed along the Recovery Time scale are manual tape restore, storage automation, and end to end automated clustering.]
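The four components of recovery time listed on this slide can be added up as below. This is a sketch only: it assumes the phases run one after another, and every duration is an invented sample value, not a measurement from the deck.

```python
from dataclasses import dataclass


@dataclass
class RecoveryPlan:
    """Durations in minutes for the four components named on the slide."""
    fault_detection_min: int
    data_recovery_min: int
    application_restart_min: int
    network_access_min: int

    def recovery_time_min(self):
        # Assumes the phases are strictly sequential (no overlap).
        return (self.fault_detection_min + self.data_recovery_min
                + self.application_restart_min + self.network_access_min)


# Hypothetical sample values for the two ends of the chart.
manual_tape_restore = RecoveryPlan(60, 1440, 240, 120)
automated_clustering = RecoveryPlan(1, 2, 5, 2)
```

With these sample numbers the manual plan is dominated by data recovery, while the automated plan shrinks every component, which is the point the chart makes about automation.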

Still true: “ideal world” construct for IT High Availability and Business Continuity

[Diagram: resilience program cycle with three phases: Business Prioritization, Integration into IT, and Manage. Labels shown: risk assessment (risks, vulnerabilities, and threats); business impact analysis (impacts of outage); RTO/RPO; program assessment (current capability; maturity model, measure ROI, roadmap for program); strategy design; program design; implement; program validation (estimated recovery time); resilience program management (awareness, regular validation, change management, quarterly management briefings). Also shown: crisis team, business resumption, disaster recovery, and high availability, spanning 1. People, 2. Processes, 3. Plans, 4. Strategies, 5. Networks, 6. Platforms, 7. Facilities. High availability design covers database and software design; high availability servers; storage and data replication.]

Business processes drive strategies and they are integral to the Continuity of Business Operations. A company cannot be resilient without having strategies for alternate workspace, staff members, call centers and communications channels.

Source: IBM STG, IBM Global Services

The 2013 Bottom line: (IT Business Continuity Planning Steps)

For today’s real world environment...

[Diagram: the same resilience program cycle as on the previous slide.]

I.e., how to streamline this “ideal” process? We need a faster way than even this simplified 2007 version:

1. Collect information for prioritization

2. Vulnerability, risk assessment, scope

3. Define BC targets based on scope

4. Solution option design and evaluation

5. Recommend solutions and products

6. Recommend strategy and roadmap

[Steps 4 to 6 are highlighted on the slide.]

2013 key #1: need a basic Data Strategy

2013 key #2: Workload type

Streamlined BC Actions (2005 version)

1. Collect info for prioritization
   Input: Business processes, Key Performance Indicators, IT inventory
   Output: Scope, resource business impact, component effect on business processes

2. Vulnerability / Risk Assessment
   Input: List of vulnerabilities
   Output: Defined vulnerabilities

3. Define desired HA/BC targets based on scope
   Input: Existing BC capability, KPIs, targets, and success rate
   Output: Defined BC baseline targets, architecture, decision and success criteria

4. Solution design and evaluation
   Input: Technologies and solution options
   Output: Business process segments and solutions

5. Recommend solutions and products
   Input: Generic solutions that meet criteria
   Output: Recommended IBM solutions and benefits

6. Recommend strategy and roadmap
   Input: Budget, major project milestones, resource availability, business process priority
   Output: Baseline Business Continuity strategy, roadmap, benefits, challenges, financial implications and justification

Scope definition of Business Continuity program

[Chart: frequency of occurrences per year (vertical axis, frequent to infrequent, with ticks at 1,000, 100, 10, 1, 1/10, 1/100, 1/1,000, 1/10,000, and 1/100,000) versus consequences, the single occurrence loss in dollars per occurrence (horizontal axis, lower to higher). Threats plotted: virus, worms, disk failure, component failure, power failure, application outage, data corruption, network problem, building fire, natural disaster, terrorism/civil unrest. The threats are grouped as availability-related or recovery-related.]

This becomes the scope of the HA/BC program
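The two axes of this chart multiply into an expected yearly loss, which is one common way to rank threats when setting scope. A minimal sketch follows; the frequencies and dollar figures are invented sample values, not data from the chart.

```python
def annualized_loss(frequency_per_year, loss_per_occurrence):
    """Expected yearly loss = occurrences per year x dollars per occurrence."""
    return frequency_per_year * loss_per_occurrence


# Hypothetical sample values: (occurrences per year, dollars per occurrence).
threats = {
    "disk failure": (10, 5_000),
    "power failure": (1, 30_000),
    "building fire": (0.001, 20_000_000),
}

# Rank threats from highest to lowest expected yearly loss.
ranked = sorted(threats, key=lambda name: annualized_loss(*threats[name]), reverse=True)
```

With these sample values the frequent, low-consequence threat ranks above the rare, high-consequence one, which shows why both axes are needed to decide what belongs in scope.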

Define scope based on prioritized vulnerabilities

Set expectation for phased implementation

Example chart at left shows Vulnerability / Risk Assessment:

– Define what will be on the chart
– This defines the scope of the Business Continuity solution

Divide scope into implementation phases:

– Do not try to solve all vulnerabilities at once
– Instead, focus on delivering tangible visible value in each project step
– Portray that scope expands as project progresses
– This matches expenditure with increasing probability over time

[Chart: Risk Assessment. Individual risks are plotted by likelihood versus impact, and the total scope expands in phases at 6 months, 12 months, and 18 months.]
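The phased approach above can be sketched as a simple prioritization: score each risk, sort, and let the scope grow with each phase. The scoring rule (likelihood times impact on 1 to 5 scales), the equal split across phases, and the risk names are all assumptions for illustration.

```python
def plan_phases(risks, phase_months=(6, 12, 18)):
    """Map each phase deadline (in months) to its cumulative scope.

    `risks` maps a risk name to (likelihood, impact), both on an assumed
    1..5 scale. Higher likelihood x impact is addressed first.
    """
    ordered = sorted(risks, key=lambda n: risks[n][0] * risks[n][1], reverse=True)
    per_phase = -(-len(ordered) // len(phase_months))  # ceiling division
    return {
        months: ordered[: (index + 1) * per_phase]
        for index, months in enumerate(phase_months)
    }


# Hypothetical risks with (likelihood, impact) scores.
sample_risks = {
    "risk_a": (5, 4), "risk_b": (4, 4), "risk_c": (3, 3),
    "risk_d": (2, 4), "risk_e": (1, 5), "risk_f": (1, 2),
}
phases = plan_phases(sample_risks)
```

Each phase includes everything from the earlier phases, so the plan portrays scope expanding as the project progresses instead of promising all fixes at once.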

Step by Step: Typical three phase approach to implementing High Availability, Business Continuity Technologies

Balancing recovery time objective with cost / value

[Chart: cost / value (vertical axis) versus Recovery Time Objective, guidelines only (horizontal axis: 15 min, 1-4 hr, 4-8 hr, 8-12 hr, 12-16 hr, 24 hr, days). One end of the tier range is marked "recovery from a disk image" and the other "recovery from tape copy".]

BC Tier 7 – Add Server or Storage replication with end-to-end automated server recovery

BC Tier 6 – Add real-time continuous data replication, server or storage

BC Tier 5 – Add Application/database integration to Backup/Restore

BC Tier 4 – Add Point in Time replication to Backup/Restore

BC Tier 3 – VTL, Data De-Dup, Remote vault

BC Tier 2 – Tape libraries + Automation

BC Tier 1 – Restore from Tape

Step by Step Virtualization, High Availability, Business Continuity data strategy

Balancing recovery time objective with cost / value

[Chart: the same BC Tier 1 to Tier 7 chart of cost / value versus Recovery Time Objective, with the tiers grouped into three bands: Continuous Availability, Rapid Data Recovery, and Backup/Restore. Callouts: workload types, storage pools, cloud deployment if needed.]

HA/BC pre-requisite: IT Virtualization and Consolidation

Strategic Approach: Data protection is an intended side-benefit of IT Virtualization

IT Virtualization, Consolidation enhances Data Protection

– Funding given today’s cost crunch?
– Complexity of infrastructure to recover?
– Priorities?
– Resources?

Data Protection is an intended side benefit of Consolidation, Virtualization

Fact: accelerating IT Consolidation, Virtualization, will accelerate Data Protection

– Data Protection
– Fewer Components to Recover
– Invest percentage of Savings
– Invest in more robust Business Resiliency
– Standardize and optimize IT and Business Resiliency solution design
– Load Balancing
– Solution architecture
– Cost-Effective Storage and IT Efficiency

[Diagram: end users, application servers, high-end workstations, and database reach shared storage over several protocols (SAN, CIFS, NFS, HTTP, FTP). Management: central administration, monitoring, file management. Availability: data migration, replication, backup.]


For traditional IT - Virtualization is fundamental to addressing today’s IT diversity

Virtualization


IT Virtualization is the means to achieve IT Business Continuity

I.e. consolidate Servers, Storage, into virtualized systems

Provides the change agent and political momentum to enable Business Continuity implementation

Reduces management complexity using integrated virtualization and management software

Provides workload optimization needed for affordable maximum performance and efficiency

Becomes possible to identify what to replicate and manage that replication

Implements key tools such as virtual resource mobility within the ensemble

Is perfect foundation to implement the necessary IT strategy, design, tools, procedures, and testing to create IT Business Continuity

Because it also provides the umbrella and political change-agent required to allow IT Business Continuity to be implemented as a by-product


Virtualized IT infrastructure Business Processes

Virtualized systems become the resource pools that enable the recoverability

For traditional IT - Consolidated virtualized systems become the Recoverable Units for IT Business Continuity

Virtualization

IT storage infrastructure, before:

[Diagram: end users, application servers, high-end workstations, and database, with separate servers and storage. Labels: underutilized, segmented storage; copies of data.]

Transformation To Standardization, Virtualization

[Animated chart: the "before" environment (end users, application servers, high-end workstations, database, servers and storage, underutilized segmented storage, copies of data) is transformed into virtualized storage behind a virtualization layer (SAN, NAS).]

Here are the benefits:

– Management: central administration, monitoring, file management
– Availability: data migration, replication, backup
– Virtualized storage: ability to move data between storage pools
– Tiered storage: virtualized; de-dup, tape; high performance; petabyte scale

Key strategy: using standardized virtualization, segment data into logical data storage pools by appropriate Data Protection characteristics

Continuous Availability (CA) – E2E automation enhances RDR
– RTO = near continuous, RPO = small as possible (Tier 7)
– Priority = uptime, with high value justification

Rapid Data Recovery (RDR) – enhance backup/restore
– For data that requires it
– RTO = minutes, to (approx. range) 2 to 6 hours
– BC Tiers 6, 4
– Balanced priorities = uptime and cost/value

Backup/Restore (B/R) – assure efficient foundation
– Standardize base backup/restore foundation
– Provide universal 24 hour - 12 hour (approx.) recovery capability
– Address requirements for archival, compliance, green energy
– Priority = cost

[Scale shown beside the three pools: from "Mission Critical" to "Lower cost".]

Know and categorize your data - Provides foundation for affordable data protection

Enabled by virtualization
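Categorizing data into the three pools named on this slide can be sketched as a lookup on the required RTO. The two cut-off constants are assumptions: the slide only gives approximate ranges ("near continuous", "minutes to 2 to 6 hours", "24 hour - 12 hour"), so the exact boundaries below are illustrative.

```python
# Assumed cut-offs in minutes; the slide gives approximate ranges only.
CA_MAX_RTO_MIN = 5       # "near continuous" (assumption)
RDR_MAX_RTO_MIN = 6 * 60  # upper end of the approximate 2 to 6 hour range


def storage_pool(rto_minutes):
    """Return the data protection pool for a required RTO in minutes."""
    if rto_minutes <= CA_MAX_RTO_MIN:
        return "Continuous Availability"
    if rto_minutes <= RDR_MAX_RTO_MIN:
        return "Rapid Data Recovery"
    return "Backup/Restore"


# Hypothetical data sets and their required RTOs in minutes.
data_sets = {"order_entry_db": 1, "reporting_db": 120, "archive_share": 1440}
pools = {name: storage_pool(rto) for name, rto in data_sets.items()}
```

Only the data that justifies the cost lands in the top pool; everything else falls through to the cheaper foundation, which is the affordability argument the slide makes.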

High Availability, Business Continuity Step by Step virtualization journey

Balancing recovery time objective with cost / value

[Chart: the same BC Tier 1 to Tier 7 chart of cost / value versus Recovery Time Objective (15 min to days), from recovery from a disk image to recovery from tape copy. Callouts: foundation, storage pools.]

Storage Pools

Apply appropriate server, storage technology

Real-time replication (storage or server or software)
– Add automated failover to replicated storage

Point in time: file, application, or disk-to-disk periodic replication
– File system
– Point in Time disk
– VTL to VTL with dedup

Removable media
– Foundation backup/restore
– Physical or electronic transport

Petabyte unstructured
– Due to usage and large scale, typically uses an application-level intelligent redundancy, failure-toleration design


Step by step – architecting remote solution

Methodology, Traditional IT: HA / BC / DR in stages, from bottom up

[Chart: cost (vertical axis) versus Recovery Time Objective (horizontal axis). Two sites, each with a SAN, disk, and VTL/de-dup.]

Stages, from the bottom up:

– Foundation: standardized, automated tape backup (Tier 2, 1)
– Foundation: electronic vaulting, automation, tape library (Tier 3)
– Add: point-in-time copy, disk to disk, tiered storage (Tier 4)

Products and techniques named on the slide:

•IBM FlashCopy, SnapShot
•IBM XIV, SVC, DS, SONAS
•IBM Tivoli Storage Productivity Center 5.1
•IBM ProtecTier
•IBM Virtual Tape Library
•IBM Tivoli Storage Manager backup/restore
•VTL, de-dup, remote replication at tape level

Methodology, Traditional IT: HA / BC / DR in stages, from bottom up

[Chart: cost (vertical axis) versus Recovery Time Objective (horizontal axis). Two sites, each with a SAN, disk, and VTL/de-dup, now with application integration, data replication, and dynamic end to end automated failover (server, storage, applications) between them.]

Stages, from the bottom up:

– Foundation: standardized, automated tape backup (Tier 2, 1)
– Foundation: electronic vaulting, automation, tape library (Tier 3)
– Add: point-in-time copy, disk to disk for backup/restore (Tier 4)
– Automate applications, database for replication and automation (Tier 5)
– Consolidate and implement real time data availability (Tier 6)
– End to end automated site failover of servers, storage, applications (Tier 7)

Products and techniques named on the slide:

•If storage: Metro Mirror, Global Mirror, Hitachi UR
•XIV, SVC, DS, other storage
•TPC 5.1
•VMWare
•PowerHA on p
•Tivoli FlashCopy Manager
•Server virtualization

IBM Disk Mirroring Technology naming

[Product map with the categories Entry, Midrange, Enterprise, NAS, SAN, and Virtualization. Products shown: DS8000, DS6000, ESS, DS5000, DS4000, DCS3700, DS3000, V3700, V7000, N series, SVC, XIV, SONAS.]

FlashCopy: point in time copy
– SVC, V7000, DS3000, DS4000, DS5000, DS6000, DS8000, ESS, XIV, SONAS, N series

Metro Mirror: synchronous mirroring
– SVC, V7000, DS3500, DCS3700, DS4000, DS5000, DS6000, DS8000, ESS, XIV, N series

Global Mirror: asynchronous mirroring
– SVC, V7000, DCS3700, DS4000, DS5000, DS6000, DS8000, ESS, XIV, SONAS, N series

Metro / Global Mirror: three site synchronous and asynchronous mirroring
– DS8000 (sync+async)
– N series (only async)
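The naming list on this slide works as a lookup table. The sketch below transcribes the product lists exactly as the slide gives them (a 2013 snapshot, not a statement of current product capability); the `SUPPORT` name and the `functions_for` helper are hypothetical.

```python
# Transcribed from the slide's lists (2013 snapshot, not verified elsewhere).
SUPPORT = {
    "FlashCopy": {"SVC", "V7000", "DS3000", "DS4000", "DS5000", "DS6000",
                  "DS8000", "ESS", "XIV", "SONAS", "N series"},
    "Metro Mirror": {"SVC", "V7000", "DS3500", "DCS3700", "DS4000", "DS5000",
                     "DS6000", "DS8000", "ESS", "XIV", "N series"},
    "Global Mirror": {"SVC", "V7000", "DCS3700", "DS4000", "DS5000", "DS6000",
                      "DS8000", "ESS", "XIV", "SONAS", "N series"},
    "Metro / Global Mirror": {"DS8000", "N series"},
}


def functions_for(product):
    """Return the copy functions the slide lists for a product, sorted by name."""
    return sorted(name for name, products in SUPPORT.items() if product in products)
```

For example, `functions_for("SONAS")` returns FlashCopy and Global Mirror but not Metro Mirror, matching the slide's lists.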

Today’s world: High Availability, Business Continuity is a Step by Step data strategy / workload journey

Balancing recovery time objective with cost / value

[Chart: the same BC Tier 1 to Tier 7 chart of cost / value versus Recovery Time Objective (15 min to days), from recovery from a disk image to recovery from tape copy. Callouts: workload types, data strategy, cloud deployment if needed.]

Summary – IT High Availability / Business Continuity Best Practices 2012

Production site: Backup/Restore Tier 1, 2 foundation: storage, server virtualization and consolidation
– Understand my data
– Define scope of recovery
– Implement remote sites (Tier 1, 2)

Recovery site: Backup/Restore Tier 1, 2 replicated foundation: SAN and server virtualization and consolidation

Backup/Restore
– Implement Tier 3 – Consolidate and standardize Backup/Restore methods. Implement tape VTL, data de-dup, Server / Storage Virtualization / Mgmt tools, basic automation

Rapid Data Recovery
– Implement Tier 4 – Standardize use of disk to disk and Point in Time disk copy
– Implement Tier 5 – Standardize DB / Application Mirroring methods
– Implement Tier 6 – Standardize high volume data replication method

Continuous Availability
– Implement BC Tier 7 – Standardize use of Continuous Availability automated Failover

Callouts: workload types, data strategy

Key IT High Availability, Business Continuity Requirements Questions (in proper order):

1. What applications or databases to recover?

2. What platform? (z, p, i, x and Windows, Linux, heterogeneous open, heterogeneous z+Open)

3. What is desired Recovery Time Objective (RTO)?

4. What is distance between the sites? (if there are 2 sites)

5. What is the connectivity, infrastructure, and bandwidth between sites?

6. What specific hardware equipment needs to be recovered?

7. What is the Level of Recovery?
   - Planned Outage
   - Unplanned Outage
   - Transaction Integrity

8. What is the Recovery Point Objective?

9. What is the amount of data to be recovered (in GB or TB)?

10. Who will design the solution?

11. Who will implement the solution?

12. Remaining solutions are valid choices to give to detailed DR evaluation team

[Graphic: BC Tier 1 through Tier 7.]

Summary

Cloud deployment options

Principles of architecting traditional IT HA, DR, BC

Technology and location considerations

Traditional Workloads vs. Internet Scale Workloads

Best Practices Step by Step Methodology
