introduction to ibm ha
TRANSCRIPT
8/13/2019 Introduction to IBM HA
http://slidepdf.com/reader/full/introduction-to-ibm-ha 1/22
© Copyright IBM Corporation 2004
Course materials may not be reproduced in whole or in part without the prior written permission of IBM.
Welcome to:
3.0.23.0.3
Introduction to High-Availability
Unit Objectives
After completing this unit, you should be able to:
Understand what high availability is
Understand why you might need high availability
Outline the various options for implementing high availability
Compare and contrast the high availability options
State the benefits of using highly available clusters
Understand the key considerations when designing and implementing a high availability cluster
Be familiar with the basics of risk analysis
So, What Is High Availability?
High Availability is...
The masking or elimination of both planned and unplanned downtime.
The elimination of single points of failure (SPOFs).
Fault resilience, but NOT fault tolerance.
[Diagram labels: Workload Fallover; Production; Standby; Client; WAN]
So Why Is Planned Downtime Important?
High availability solutions should reduce both planned and unplanned downtime.
Planned downtime:
Hardware upgrades
Repairs
Software updates
Backups
Testing
Development
Unplanned downtime:
Administrator error
Application failure
Hardware faults
Environmental disasters
[Pie chart legend:]
Hardware failure (1%)
Other unplanned downtime (14%)
Planned downtime (85%)
Continuous Availability Is the Goal
Continuous Availability: elimination of downtime
Continuous Operations: masking or elimination of planned downtime
High Availability: masking or elimination of unplanned downtime
Eliminating Single Points of Failure
Cluster object: Eliminated as a single point of failure by . . .
Node: Using multiple nodes
Power source: Using multiple circuits or uninterruptible power supplies
Network adapter: Using redundant network adapters
Network: Using multiple networks to connect nodes
TCP/IP subsystem: Using serial networks to connect adjoining nodes and clients
Disk adapter: Using redundant disk adapters
Disk: Using redundant hardware and disk mirroring and/or striping
Application: Assigning a node for application takeover; configuring an application monitor
A fundamental design goal of (successful) cluster design is
the elimination of single points of failure (SPOFs).
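The table above can be read as a checklist: any cluster object that exists only once is a candidate SPOF. The following is a minimal sketch of that check, not an HACMP tool. The function name `find_spofs` and the inventory values are hypothetical and serve only as illustration.

```python
from typing import Dict, List


def find_spofs(inventory: Dict[str, int]) -> List[str]:
    """Return the cluster objects that have fewer than two instances."""
    return sorted(name for name, count in inventory.items() if count < 2)


# Hypothetical inventory for a two-node cluster (illustrative values only).
inventory = {
    "node": 2,
    "power source": 2,
    "network adapter": 2,
    "network": 1,
    "disk adapter": 2,
    "disk": 2,
}

# Lists every object that is present only once.
print(find_spofs(inventory))
```

Counting instances is only a first pass. Objects such as the application are protected by takeover and monitoring rather than by simple duplication, so a real review would still need to walk through each row of the table.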
Availability - from Simple to Complex
Stand-alone
Enhanced
High Availability Cluster
Fault Tolerant
The Stand-alone System
The stand-alone system may offer limited availability benefits:
Journaled Filesystem
Dynamic CPU Deallocation
Service Processor
Redundant Power
Redundant Cooling
ECC Memory
Hot Swap Adapters
Dynamic Kernel
Disk Mirroring
Example single points of failure:
Disk Adapter/Data Paths
No Hot Swap Storage
Power for Storage Arrays
Cooling for Storage Arrays
Hot Spare Storage
Node/Operating System
Network
Network Adapter
Application
Site Failure (SAN distance)
Site Failure (via mirroring)
The Enhanced System
The enhanced system may offer increased availability benefits:
Journaled Filesystem
Dynamic CPU Deallocation
Service Processor
Redundant Power
Redundant Cooling
ECC Memory
Hot Swap Adapters
Dynamic Kernel
Disk Mirroring
Redundant Disk Adapters/Multiple Paths
Hot Swap Storage
Redundant Power for Storage Arrays
Redundant Cooling for Storage Arrays
Hot Spare Storage
Example single points of failure:
Node/Operating System
Network Adapter
Network
Application
Site Failure (SAN distance)
Site Failure (via mirroring)
High-Availability Clusters (HACMP)
Clustering technologies offer high-availability:
Journaled Filesystem
Dynamic CPU Deallocation
Service Processor
Redundant Power
Redundant Cooling
ECC Memory
Hot Swap Adapters
Dynamic Kernel
Redundant Data Paths
Data Mirroring
Hot Swap Storage
Redundant Power for Storage Arrays
Redundant Cooling for Storage Arrays
Hot Spare Storage
Dual Disk Adapters
Redundant Nodes (operating system)
Redundant Network Adapters
Redundant Networks
Application Monitoring
Site Failure (SAN distance)
Example single points of failure:
Site Failure (via mirroring)
Fault-Tolerant Computing
Fault-tolerant solutions should not fail:
Lock Step CPUs
Hardened Operating System
Hot Swap Storage
Continuous Restart
Example single points of failure:
Site Failure (SAN distance)
Site Failure (via mirroring)
Availability Solutions
Solutions, from simple to complex:

Solution: Stand-alone
Availability benefits: Journaled Filesystem, Dynamic CPU Deallocation, Service Processor, Redundant Power, Redundant Cooling, ECC Memory, Hot Swap Adapters, Dynamic Kernel
Downtime: Couple of days
Data availability: Good as your last full backup
Relative cost*: 1

Solution: Enhanced Stand-alone
Availability benefits: Redundant Data Paths, Data Mirroring, Hot Swap Storage, Redundant Power for Storage Arrays, Redundant Cooling for Storage Arrays, Hot Spare Storage
Downtime: Couple of hours
Data availability: Last transaction
Relative cost*: 1.5

Solution: High Availability Clusters
Availability benefits: Redundant Servers, Redundant Networks, Redundant Network Adapters, Heartbeat Monitoring, Failure Detection, Failure Diagnosis, Automated Fallover, Automated Reintegration
Downtime: Depends, but typically 3 mins
Data availability: Last transaction
Relative cost*: 2-3

Solution: Fault-tolerant Computers
Availability benefits: Lock Step CPUs, Hardened Operating System, Redundant Memory, Continuous Restart
Downtime: In theory, none!
Data availability: No loss of data
Relative cost*: 10+

* All other parameters being equal.
So, What About Site Failure?
[Diagram: data replication between Toronto and London]
Near distance (using SAN): supported by HACMP 5.2
Far distance (requires data mirroring): invest in a geographic clustering solution (for example, HACMP XD*)
Distance unlimited
Data replication across a geography
Application, disk and network independent
Automated site failover and reintegration
A single cluster across two sites
*The HACMP XD feature of HACMP contains IBM's HAGEO product and PPRC support.
Why Might I Need High Availability?
60% of all large companies now operate round the clock (7x24)
Losses on failure:
US$330,000 per hour (industry average)
Peak losses: US$130,000 per minute (telephone network)
Loss of customer loyalty
Loss of customer confidence
And, if there is no disaster recovery:
50% of affected companies will never reopen
90% of affected companies are out of business in less than two years
Note: High Availability is NOT a Disaster Recovery solution.
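To put the industry-average figure in perspective, the sketch below multiplies it by a few outage durations. The durations are assumptions chosen to echo the downtime row of the Availability Solutions slide ("typically 3 mins", "couple of hours", "couple of days"), and the function name `outage_cost` is hypothetical.

```python
LOSS_PER_HOUR_USD = 330_000  # industry-average figure quoted on this slide


def outage_cost(minutes: int, loss_per_hour: int = LOSS_PER_HOUR_USD) -> float:
    """Estimated revenue loss for an outage of the given length in minutes."""
    return minutes * loss_per_hour / 60


# Assumed durations: 3 minutes, 2 hours, 2 days (illustrative only).
for label, minutes in [("3 minutes", 3), ("2 hours", 120), ("2 days", 2 * 24 * 60)]:
    print(f"{label}: ${outage_cost(minutes):,.0f}")
```

This is a straight-line estimate. It ignores the peak-rate figure and the softer losses (customer loyalty and confidence) listed above.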
[Chart: Loss of Revenue ($M)]
Benefits of High-Availability Solutions
High-availability solutions offer the following benefits:
Standard components (no specialized hardware)
Can be built from existing hardware (no need to invest in new kit)
Work with just about any application
Work with wide range of disk and network types
No specialized operating system or microcode
Excellent availability at low cost
[Diagram: standard components add up to a high availability solution]
Other Considerations for High-Availability
High-availability solutions require the following:
Thorough design and detailed planning
Elimination of single points of failure
Selection of appropriate hardware
Correct implementation
Disciplined system administration practices
Documented operational procedures
Comprehensive testing
[Diagram labels: High availability; Continuous operation; Continuous availability; Systems Management; People; Data; Hardware; Software; Environment; Networking]
A Philosophical View of High Availability
The goal of an HA cluster is to make a service highly available.
Users aren't interested in highly available hardware.
Users aren't even interested in highly available software.
Users are interested in the availability of services.
Therefore, use the hardware and the software to make the services highly available.
Cluster design decisions should be judged on the basis of whether or not they:
Contribute to availability (for example, eliminate a SPOF)
Detract from availability (for example, gratuitous complexity)
Since it is impractical if not impossible to truly eliminate all SPOFs, be prepared to use risk analysis techniques to determine which SPOFs are tolerated and which must be eliminated.
Classic Risk Analysis
1. Identify relevant policies
What existing risk tolerance policies are available?
2. Study the current environment
Understand what strengths (for example, server room is on a properly sized UPS) and weaknesses (for example, no disk mirroring) exist today
3. Perform requirements analysis
Just how much availability is required?
What is the acceptable likelihood of a long outage?
4. Hypothesize vulnerabilities
What can possibly go wrong?
5. Identify and quantify risks
The statistical probability of something going wrong over the life of the project (or the likely number of times something will go wrong over the life of the project) multiplied by the cost of an occurrence
6. Evaluate countermeasures
What does it take to reduce the risk (by reducing the likelihood or consequences of an occurrence) to an acceptable level?
7. Make decisions, create a budget and plan the cluster
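Steps 5 and 6 can be sketched in a few lines: quantify each risk as expected occurrences over the project life multiplied by the cost per occurrence, then compare that exposure with the cost of the countermeasure. The `Risk` class, its field names and the sample figures below are hypothetical. The sketch also assumes a countermeasure removes the risk entirely, which real countermeasures often do not.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    expected_occurrences: int   # likely number of occurrences over the project life
    cost_per_occurrence: int    # cost of a single occurrence, in dollars
    countermeasure_cost: int    # cost to eliminate the risk, in dollars

    def exposure(self) -> int:
        """Step 5: expected occurrences multiplied by the cost of an occurrence."""
        return self.expected_occurrences * self.cost_per_occurrence

    def worth_eliminating(self) -> bool:
        """Step 6 (simplified): eliminate when the countermeasure is cheaper."""
        return self.countermeasure_cost < self.exposure()


# Hypothetical risks with made-up figures, for illustration only.
risks = [
    Risk("unmirrored disk", expected_occurrences=3,
         cost_per_occurrence=5_000, countermeasure_cost=2_000),
    Risk("single network", expected_occurrences=1,
         cost_per_occurrence=4_000, countermeasure_cost=12_000),
]

for risk in risks:
    decision = "eliminate" if risk.worth_eliminating() else "tolerate"
    print(f"{risk.name}: exposure ${risk.exposure():,} -> {decision}")
```

A purely financial comparison like this feeds step 7 but does not replace it. Policies identified in step 1 may require eliminating a risk even when the arithmetic says to tolerate it.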
What Do We Plan to Achieve This Week?
[Diagram labels: A, B]
Your mission this week is to build a two-node highly available cluster using two previously separate pSeries systems, each of which has an application which needs to be made highly available.
Checkpoint
1. Which of the following is a characteristic of high availability?
a. High availability always requires specially designed hardware components.
b. High availability solutions always require manual intervention to ensure recovery following failover.
c. High availability solutions never require customization.
d. High availability solutions offer excellent price performance when compared with Fault Tolerant solutions.
2. True or False?
High availability solutions never fail.
3. True or False?
A thorough design and detailed planning is required for all high availability solutions.
4. True or False?
The cluster shown on the foil titled "What Do We Plan to Achieve This Week?" has no obvious single points of failure.
5. A proposed cluster with a two year life (for planning purposes) has a vulnerability which is likely to occur twice per year at a cost of $10,000 per occurrence. It costs $25,000 in additional hardware costs to eliminate the vulnerability. Should the vulnerability be eliminated?
a. yes
b. no
Checkpoint Answers
1. Which of the following is a characteristic of high availability?
a. High availability always requires specially designed hardware components.
b. High availability solutions always require manual intervention to ensure recovery following failover.
c. High availability solutions never require customization.
d. High availability solutions offer excellent price performance when compared with Fault Tolerant solutions.
Answer: d
2. True or False?
High availability solutions never fail.
Answer: False
3. True or False?
A thorough design and detailed planning is required for all high availability solutions.
Answer: True
4. True or False?
The cluster shown on the foil titled "What Do We Plan to Achieve This Week?" has no obvious single points of failure.
Answer: False (the local area network is a SPOF)
5. A proposed cluster with a two year life (for planning purposes) has a vulnerability which is likely to occur twice per year at a cost of $10,000 per occurrence. It will cost $25,000 in additional hardware costs to eliminate the vulnerability. Should the vulnerability be eliminated?
a. yes
b. no
Answer: a. yes (four expected occurrences at $10,000 each come to $40,000, which is more than the $25,000 cost of eliminating the vulnerability)
Unit Summary
Having completed this unit, you should be able to:
Understand what high availability is
Understand why you might need high availability
Outline the various options for implementing high availability
Compare and contrast the high-availability options
State the benefits of using highly available clusters
Understand the key considerations when designing and implementing a high-availability cluster
Be familiar with the basics of risk analysis