Introduction to IBM HA

© Copyright IBM Corporation 2004. Course materials may not be reproduced in whole or in part without the prior written permission of IBM.

Welcome to:

3.0.2 3.0.3

Introduction to High-Availability

Upload: harikrishnan-arun

Post on 04-Jun-2018


Page 1: Introduction to IBM HA

8/13/2019 Introduction to IBM HA

http://slidepdf.com/reader/full/introduction-to-ibm-ha 1/22



Unit Objectives

 After completing this unit, you should be able to:

Understand what high availability is

Understand why you might need high availability

Outline the various options for implementing high availability

Compare and contrast the high availability options

State the benefits of using highly available clusters

Understand the key considerations when designing and implementing a high availability cluster

Be familiar with the basics of risk analysis


So, What Is High Availability?

High Availability is...

The masking or elimination of both planned and unplanned downtime.

The elimination of single points of failure (SPOFs).

Fault resilience, but NOT fault tolerance.

[Figure: a Client connected over a WAN to a Production node and a Standby node, with Workload Fallover between the two nodes]


So Why Is Planned Downtime Important?

High availability solutions should reduce both planned and unplanned downtime.

Planned downtime:

Hardware upgrades
Repairs
Software updates
Backups
Testing
Development

Unplanned downtime:

Administrator Error
Application failure
Hardware faults
Environmental Disasters

[Figure: pie chart of downtime causes: Planned downtime (85%), Other unplanned downtime (14%), Hardware Failure (1%)]


Continuous Availability Is the Goal

Continuous Availability: elimination of downtime

Continuous Operations: masking or elimination of planned downtime

High Availability: masking or elimination of unplanned downtime


Eliminating Single Points of Failure

Cluster Object      Eliminated as a single point of failure by . . .
Node                Using multiple nodes
Power Source        Using multiple circuits or uninterruptible power supplies
Network adapter     Using redundant network adapters
Network             Using multiple networks to connect nodes
TCP/IP Subsystem    Using serial networks to connect adjoining nodes and clients
Disk adapter        Using redundant disk adapters
Disk                Using redundant hardware and disk mirroring and/or striping
Application         Assigning a node for application takeover; configuring an application monitor

A fundamental design goal of (successful) cluster design is the elimination of single points of failure (SPOFs).
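The table above pairs each cluster object with the redundancy that removes it as a SPOF. As a rough illustration only, the following Python sketch flags objects in a hypothetical inventory that have fewer than two independent instances. The inventory format, the names `find_spofs` and `inventory`, and the "at least two" threshold are assumptions made for this example; they are not part of HACMP or of the course material.

```python
def find_spofs(inventory, minimum=2):
    """Return the cluster objects that have fewer than `minimum` instances.

    `inventory` maps a cluster object name to the number of independent
    instances of it (a hypothetical format used only in this sketch).
    """
    return sorted(name for name, count in inventory.items() if count < minimum)


# Hypothetical two-node cluster that still shares one network and one power circuit.
inventory = {
    "node": 2,
    "power source": 1,
    "network adapter": 2,
    "network": 1,
    "disk adapter": 2,
    "disk": 2,
}

print(find_spofs(inventory))  # ['network', 'power source']
```

A count alone cannot show that two instances are truly independent (for example, two adapters on the same circuit), so a check like this is at best a starting point for the design review.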


Availability - from Simple to Complex

Stand-alone
Enhanced
High Availability Cluster
Fault Tolerant


The Stand-alone System

The stand-alone system may offer limited availability benefits:

Journaled Filesystem
Dynamic CPU Deallocation
Service Processor
Redundant Power
Redundant Cooling
ECC Memory
Hot Swap Adapters
Dynamic Kernel
Disk Mirroring

Example single points of failure:

Disk Adapter/Data Paths
No Hot Swap Storage
Power for Storage Arrays
Cooling for Storage Arrays
Hot Spare Storage
Node/Operating System
Network
Network Adapter
Application
Site Failure (SAN distance)
Site Failure (via mirroring)


The Enhanced System

The enhanced system may offer increased availability benefits:

Journaled Filesystem
Dynamic CPU Deallocation
Service Processor
Redundant Power
Redundant Cooling
ECC Memory
Hot Swap Adapters
Dynamic Kernel
Disk Mirroring
Redundant Disk Adapters/Multiple Paths
Hot Swap Storage
Redundant Power for Storage Arrays
Redundant Cooling for Storage Arrays
Hot Spare Storage

Example single points of failure:

Node/Operating System
Network Adapter
Network
Application
Site Failure (SAN distance)
Site Failure (via mirroring)


High-Availability Clusters (HACMP)

Clustering technologies offer high-availability:

Journaled Filesystem
Dynamic CPU Deallocation
Service Processor
Redundant Power
Redundant Cooling
ECC Memory
Hot Swap Adapters
Dynamic Kernel
Redundant Data Paths
Data Mirroring
Hot Swap Storage
Redundant Power for Storage Arrays
Redundant Cooling for Storage Arrays
Hot Spare Storage
Dual Disk Adapters
Redundant Nodes (operating system)
Redundant Network Adapters
Redundant Networks
Application Monitoring
Site Failure (SAN distance)

Example single points of failure:

Site Failure (via mirroring)


Fault-Tolerant Computing

Fault-tolerant solutions should not fail:

Lock Step CPUs
Hardened Operating System
Hot Swap Storage
Continuous Restart

Example single points of failure:

Site Failure (SAN distance)
Site Failure (via mirroring)


Availability Solutions

Solutions, from simple to complex: Stand-alone, Enhanced Standalone, High Availability Clusters, Fault-tolerant Computers

Availability benefits:

Stand-alone: Journaled Filesystem, Dynamic CPU Deallocation, Service Processor, Redundant Power, Redundant Cooling, ECC Memory, Hot Swap Adapters, Dynamic Kernel

Enhanced Standalone: Redundant Data Paths, Data Mirroring, Hot Swap Storage, Redundant Power for Storage Arrays, Redundant Cooling for Storage Arrays, Hot Spare Storage

High Availability Clusters: Redundant Servers, Redundant Networks, Redundant Network Adapters, Heartbeat Monitoring, Failure Detection, Failure Diagnosis, Automated Fallover, Automated Reintegration

Fault-tolerant Computers: Lock Step CPUs, Hardened Operating System, Redundant Memory, Continuous Restart

Solution                      Downtime                        Data Availability                Relative Cost*
Stand-alone                   Couple of days                  Good as your last full backup    1
Enhanced Standalone           Couple of hours                 Last transaction                 1.5
High Availability Clusters    Depends, but typically 3 mins   Last transaction                 2-3
Fault-tolerant Computers      In theory, none!                No loss of data                  10+

* All other parameters being equal.
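To relate the downtime figures in the comparison above to an availability percentage, one common calculation is availability = uptime / total time. The Python sketch below assumes, purely for illustration, a single outage of the stated length in a 365-day year. The outage lengths chosen for "couple of days" (48 hours) and "couple of hours" (2 hours), the one-outage-per-year assumption, and the name `availability_percent` are assumptions for this example, not figures from the course.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def availability_percent(downtime_hours, period_hours=HOURS_PER_YEAR):
    """Percentage of the period during which the service was up."""
    if not 0 <= downtime_hours <= period_hours:
        raise ValueError("downtime must be between 0 and the period length")
    return 100.0 * (period_hours - downtime_hours) / period_hours


# Assumed single outage per year for each kind of solution (hypothetical values).
assumed_outage_hours = {
    "Stand-alone": 48.0,                # "couple of days"
    "Enhanced Standalone": 2.0,         # "couple of hours"
    "High Availability Cluster": 0.05,  # "typically 3 mins"
    "Fault-tolerant": 0.0,              # "in theory, none"
}

for solution, hours in assumed_outage_hours.items():
    print(f"{solution}: {availability_percent(hours):.4f}%")
```

Real availability depends on how often outages occur as well as how long each lasts, so the percentages printed here only illustrate the arithmetic.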


So, What About Site Failure?

[Figure: data replication between two sites, Toronto and London]

Near distance (using SAN): supported by HACMP 5.2

Far distance (requires data mirroring): invest in a Geographic Clustering Solution (for example, HACMP XD*)

Distance unlimited
Data replication across a geography
Application, disk and network independent
Automated site failover and reintegration
A single cluster across two sites

*The HACMP XD feature of HACMP contains IBM's HAGEO product and PPRC support.


Why Might I Need High Availability?

60% of all large companies now operate round the clock (7x24)

Losses on failure:

US$330,000 per hour (industry average)

Peak losses: US$130,000 per minute (telephone network)

Loss of customer loyalty

Loss of customer confidence

And, if there is no disaster recovery:

50% of affected companies will never reopen

90% of affected companies are out of business in less than two years

Note: High Availability is NOT a Disaster Recovery solution.

[Figure: chart of loss of revenue in $M, with an axis from 0 to 200]


Benefits of High-Availability Solutions

High-availability solutions offer the following benefits:

Standard components (no specialized hardware)
Can be built from existing hardware (no need to invest in new kit)
Work with just about any application
Work with wide range of disk and network types
No specialized operating system or microcode
Excellent availability at low cost

[Figure: Standard Components combined into a High Availability Solution]


Other Considerations for High-Availability

High-availability solutions require the following:

Thorough design and detailed planning
Elimination of single points of failure
Selection of appropriate hardware
Correct implementation
Disciplined system administration practices
Documented operational procedures
Comprehensive testing

[Figure labels: High availability, Continuous operation, Continuous availability; Systems Management, People, Data, Hardware, Software, Environment, Networking]


A Philosophical View of High Availability

The goal of an HA cluster is to make a service highly available.

Users aren't interested in highly available hardware.

Users aren't even interested in highly available software.

Users are interested in the availability of services.

Therefore, use the hardware and the software to make the services highly available.

Cluster design decisions should be judged on the basis of whether or not they:

Contribute to availability (for example, eliminate a SPOF)
Detract from availability (for example, gratuitous complexity)

Since it is impractical if not impossible to truly eliminate all SPOFs, be prepared to use risk analysis techniques to determine which SPOFs are tolerated and which must be eliminated.


Classic Risk Analysis

1. Identify relevant policies

What existing risk tolerance policies are available?

2. Study the current environment

Understand what strengths (for example, server room is on a properly sized UPS) and weaknesses (for example, no disk mirroring) exist today

3. Perform requirements analysis

Just how much availability is required?

What is the acceptable likelihood of a long outage?

4. Hypothesize vulnerabilities

What can possibly go wrong?

5. Identify and quantify risks

The statistical probability of something going wrong over the life of the project (or the likely number of times something will go wrong over the life of the project) multiplied by the cost of an occurrence

6. Evaluate countermeasures

What does it take to reduce the risk (by reducing the likelihood or consequences of an occurrence) to an acceptable level?

7. Make decisions, create a budget and plan the cluster
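Steps 5 and 6 can be sketched numerically. The Python example below multiplies the likely number of occurrences over the project life by the cost per occurrence, then compares that risk with the cost of a countermeasure. The function names and all figures are hypothetical, and the sketch assumes the countermeasure removes the risk entirely, which real countermeasures often do not.

```python
def quantified_risk(occurrences_per_year, project_years, cost_per_occurrence):
    """Expected loss over the project life: likely occurrences times cost of each."""
    return occurrences_per_year * project_years * cost_per_occurrence


def countermeasure_is_worthwhile(risk, countermeasure_cost):
    """True when eliminating the risk costs less than the expected loss.

    Assumes (for this sketch) that the countermeasure removes the risk completely.
    """
    return countermeasure_cost < risk


# Hypothetical vulnerability: about one occurrence every two years,
# over a three-year project, at $8,000 per occurrence.
risk = quantified_risk(0.5, 3, 8_000)
print(risk)                                        # 12000.0
print(countermeasure_is_worthwhile(risk, 5_000))   # True
print(countermeasure_is_worthwhile(risk, 20_000))  # False
```

A purely financial comparison like this ignores risk tolerance policies from step 1, which may require eliminating a vulnerability even when the numbers alone would not justify it.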


What Do We Plan to Achieve This Week?

[Figure: two systems, each running applications A and B]

Your mission this week is to build a two-node highly available cluster using two previously separate pSeries systems, each of which has an application which needs to be made highly available.


Checkpoint

1. Which of the following is a characteristic of high availability?

a. High availability always requires specially designed hardware components.
b. High availability solutions always require manual intervention to ensure recovery following failover.
c. High availability solutions never require customization.
d. High availability solutions offer excellent price performance when compared with Fault Tolerant solutions.

2. True or False? High availability solutions never fail.

3. True or False? A thorough design and detailed planning is required for all high availability solutions.

4. True or False? The cluster shown on the foil titled "What Do We Plan to Achieve This Week?" has no obvious single points of failure.

5. A proposed cluster with a two year life (for planning purposes) has a vulnerability which is likely to occur twice per year at a cost of $10,000 per occurrence. It costs $25,000 in additional hardware costs to eliminate the vulnerability. Should the vulnerability be eliminated?

a. yes
b. no


Checkpoint Answers

1. Which of the following is a characteristic of high availability?

a. High availability always requires specially designed hardware components.
b. High availability solutions always require manual intervention to ensure recovery following failover.
c. High availability solutions never require customization.
d. High availability solutions offer excellent price performance when compared with Fault Tolerant solutions.

Answer: d

2. True or False? High availability solutions never fail.

Answer: False

3. True or False? A thorough design and detailed planning is required for all high availability solutions.

Answer: True

4. True or False? The cluster shown on the foil titled "What Do We Plan to Achieve This Week?" has no obvious single points of failure.

Answer: False (the local area network is a SPOF)

5. A proposed cluster with a two year life (for planning purposes) has a vulnerability which is likely to occur twice per year at a cost of $10,000 per occurrence. It will cost $25,000 in additional hardware costs to eliminate the vulnerability. Should the vulnerability be eliminated?

a. yes
b. no

Answer: a. yes. Twice per year over two years is four likely occurrences, so the quantified risk is $10,000 times four, or $40,000, and $25,000 is less than that.


Unit Summary

Having completed this unit, you should be able to:

Understand what high availability is

Understand why you might need high availability

Outline the various options for implementing high availability

Compare and contrast the high-availability options

State the benefits of using highly available clusters

Understand the key considerations when designing and implementing a high-availability cluster

Be familiar with the basics of risk analysis