Post on 18-Apr-2020


Duane Wente Advisory Software Consultant BMC Software

Will you have a successful local or disaster recovery?


© Copyright 5/2/2012 BMC Software, Inc 2

Recovery is a Real Challenge

Cost of Downtime varies
– By Industry
– By Business Cycle

Staff Productivity and Expertise pressures
– Harder to get and keep good technicians
– Recovery is a ‘part time’ job, skills may wane
– A lot of hours can go into DR test ‘preparations’

Planned downtime (backups) pressures
– Consistent Copies may/may not require outage
– Even a brief outage may impact business

Unplanned outages happen at painful times


Definition of Maturity Class


Time to Recover from Business Interruptions


When Availability is Critical, Recovery is Crucial!

Unplanned downtime is an unfortunate fact of life...

Up to 80% of all unplanned downtime is caused by software or human error*

Up to 70% of recovery is “think time”!

*Source: Gartner

[Chart segments: Recover 30%, Build 30%, Diagnose 20%, Detect 20%]


What can cause a database application outage?

Some events are planned:
- Application database maintenance
- Data migration
- Structure change implementation
- Hardware upgrades (processor, storage)
- Operating system or DBMS maintenance
- Disaster recovery preparation

Other events are unplanned:
- Site disasters (floods, power outages, storms, fire, etc.)
- Hardware failures (disk, CPU, network, etc.)
- Operating system failures
- DBMS failures
- Operation errors
- Batch cycle errors
- Improper data feeds
- User errors
- Deliberate data corruption
- Application software errors
- Application performance degradation
- Fallback from application change migrations


How customers spend their money

Recovery Type         Budget   Attention   Probability
Disaster              $$$$$    High        Low
Volume                $$$      Medium      Medium
Application/Logical   $        Low         Very high – it’s sure to happen!


Cost Components of Backup

What do you spend doing Database Backups?
- CPU time, overhead on system resources
- Output resources (tape or disk)
- Operations and Support resources

What’s the value to the business?
- Recoverability of critical data asset
- Possible side benefit – use backup to migrate data to ‘clone’ system

What’s the business impact?
- Availability impact (maybe)
- Data integrity and consistency risk (maybe)
- Conflicts with business processing (maybe)


Cost Components of Log Processing

What do you spend doing Log Processing (accums)?
- CPU time, overhead on system resources
- Output resources (tape or disk)
- Operations and Support resources

What’s the value to the business?
- Faster Recovery of critical data asset

What’s the business impact?
- Availability impact (maybe)
- Conflicts with business processing (maybe)


Cost Components of Application Recovery

What do you spend doing Local Recovery?
- Business is DOWN – cost can be $$$$$$$’s per hour!
- CPU time, overhead on system resources
- Output resources (tape or disk)
- Operations/Support resources – do you have Recovery Experts?
  – ‘Think Time’ can be a significant part of total outage time
  – Remember – MOST outages are LOCAL outages, not Disaster Recovery

What’s the value to the business?
- Recovery of critical data asset - eventually
- Business Resumption
  – Identify and Reapply lost transactions

What’s the business impact?
- Availability impact
  – Lost sales, lost opportunity, fees and fines, supply chain impact, etc.


Examples of ISV Innovation for Backup and Recovery

Multi-vendor storage exploitation for consistent image copies with minimal outage

High-speed recovery with a variety of techniques

Point-In-Time recovery to any timestamp with consistency

Point-In-Time change accumulation

Disaster Recovery preparation automation

Data Replication for testing

Using DBMS log data for reporting and transaction recovery

Dynamic RECON management and use

Monitoring DBMS recovery actions with solution recommendations

Simplification and automation for complex tasks


The 3 P’s - Performance

Performance
- Externalize Sorting
- Backout Recovery
- Monitoring Recovery


Performance – External Sort

Schedule sort tasks in separate address space
- Reduces the amount of virtual storage in utility address space
- Add more sort tasks to further distribute workload

Log sorting
- Change Accum
- Recovery

Index rebuild sorting
- Recovery
- Allows for greater overlap of the Index Rebuild functions
- As each index build completes, resources are released


Performance – Backout Recovery

ISV recovery method

Starts with current, existing database data sets

Reverses (backs out) logged changes

Returns the database to the condition it was in at the specified recovery time stamp

Determines whether to run a forward recovery or a backout recovery based on the logs needed

Lives within the rules of DBRC

Supports full-function databases and HALDBs
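
One way to picture the "based on logs needed" decision is to compare how much log data each direction would have to read. The sketch below is illustrative only: the function name, the byte-count inputs, and the simple "read less log" rule are assumptions made for this example, not the product's actual algorithm.

```python
from dataclasses import dataclass


@dataclass
class RecoveryPlan:
    method: str      # "forward" or "backout"
    log_bytes: int   # log volume the chosen method must read


def choose_recovery_method(logs_since_image_copy: int,
                           logs_since_recovery_point: int) -> RecoveryPlan:
    """Pick the direction that reads less log data (hypothetical rule).

    logs_since_image_copy: log bytes between the last image copy and the
        recovery time stamp (applied by a forward recovery).
    logs_since_recovery_point: log bytes between the recovery time stamp
        and now (reversed by a backout recovery).
    """
    if logs_since_recovery_point < logs_since_image_copy:
        return RecoveryPlan("backout", logs_since_recovery_point)
    # Ties fall back to forward recovery.
    return RecoveryPlan("forward", logs_since_image_copy)
```

With a recovery point only a few minutes old, the backout side is usually the smaller number, which is the case where starting from the existing data sets pays off.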


Performance - Recovery Monitor

What is going on with this recovery?

Which databases were recovered?

How many logs did the recovery job read?

Were there any problems with the recovery?


Monitor Recovery Actions


Consolidated DBA Worklist


The 3 P’s - Protection

Protection
- Image Copy Encryption
- Automatic Database Allocation
- RECON Reorganization
- Recovery Extensions
- RECON Cleanup Utility


Protection – Image Copy Encryption

› Satisfies need for SOX compliance to protect financial and customer information
› Standard z/OS data encryption - DES (64-bit) or AES (128-bit) keys
› Encryption key file is allocated dynamically
  – MDALIB or STEPLIB

[Diagram labels: IC ENCRYPTION, RECOVER DB, Encryption key file, plain data "Joe Blogs 123 45 6789", encrypted data "$je Lb*(1 C18 bo 3(7V", Encrypted, Image copies 2-10]


Protection – Automatic Database Definitions

[Diagram labels: Original IMS DBs, System Catalogs, DBRC RECONS, Capture Prod Allocation Info (Batch), JCL PDS Del/Def Mbr, ALT JCL PDS Del/Def]


Protection - RECON Reorganization Utility

› Purpose – Restore the RECON data sets to optimum availability and performance levels.
› Two modes
  – Replace
  – Reorganize All

[Flowchart: RECON Reorg Utility, Reorg All Mode. Labels: RECON Contention? (Y/N), CHANGE.RECON REPLACE, REORG ALL Mode, Delete/Define, Issue Command /DIS OLDS, All Reorged? (Y/N), EOJ]


Protection – Recovery Extensions

– Store additional image copy and change accum data set information in an externally maintained repository
– Functions using the additional data set retrieval
  • Incremental image copy
  • Change accum
  • Recovery

[Diagram labels: Image Copy, Change Accum, Recovery, DBRC, Manager, Repository, Copies 1 or 2, Copies 1 … n]


Protection - RECON Cleanup

RECON Data Set
- Closes open PRILOGs
- Closes open SECLOGs
- Deletes PRIOLDs
- Deletes SECOLDs
- Deletes SUBSYS records
- Performs other cleanup...
- Updates/deletes ALLOCs
- Updates/deletes LOGALLs
- Marks CA runs “invalid”
- Closes open SECSLDs
- Closes open PRISLDs
- Provides detailed reports
- Marks DBs as “recov needed”
- Provides suggested PIT
- Provides suggested CA time
- Marks Primary Logs in ERROR
- Optionally: Marks Primary ICs in ERROR

[Diagram labels: Subsys, Subsys, Bad, Good, Good]


The 3 P’s - Productivity

Productivity
- Conditional Image Copy
- Change Accumulation File Management


Productivity - Conditional Image Copy

› Am I making too many batch image copies?
› Can I save money on image copies without changing the schedule?

[Flowchart labels: Start IMAGE COPY PLUS; Any updates since last image copy? (Yes/No); Has it been too long since last image copy? (Yes/No); Create Image Copy; Bypass Image Copy]
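
The two questions in the flowchart can be read as a small decision function. This is a minimal sketch under one plausible reading of the branches (copy when there are updates, or when the last copy is too old, otherwise bypass). The function and parameter names are hypothetical and are not IMAGE COPY PLUS options.

```python
from datetime import datetime, timedelta


def should_take_image_copy(updated_since_last_copy: bool,
                           last_copy_time: datetime,
                           now: datetime,
                           max_age: timedelta) -> bool:
    """Return True to create an image copy, False to bypass it."""
    if updated_since_last_copy:
        return True  # changes exist, so a fresh copy has value
    # No updates: still copy if the last one is older than the allowed age.
    return now - last_copy_time > max_age
```

Run on the existing schedule, a check like this skips the copy for unchanged databases, which is how the schedule can stay the same while the number of copies drops.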


Productivity – Change Accum File Management with IC Triggering

› How can I manage the size of the change accum dataset?
› Can I trigger an image copy when the change accum is too big?

[Flowchart labels: Start IMAGE COPY; Is my change accum file too big? (Yes/No); Has it been too long since last image copy? (Yes/No); Create Image Copy; Bypass Image Copy; CHANGE ACCUMULATION; Repository Statistics (* CA is TOO BIG); Change Accum Dataset (TOO BIG!!)]
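
Image copy triggering can be sketched the same way, with the change accum size taken from repository statistics as the first test. The function name, the byte limit, and the returned reason strings are assumptions for illustration, and the branch order is one reading of the flowchart.

```python
from datetime import datetime, timedelta


def image_copy_trigger_reason(ca_size_bytes: int,
                              ca_size_limit_bytes: int,
                              last_copy_time: datetime,
                              now: datetime,
                              max_age: timedelta) -> str:
    """Return why an image copy is triggered, or "" to bypass it."""
    if ca_size_bytes > ca_size_limit_bytes:
        return "change accum too big"
    if now - last_copy_time > max_age:
        return "image copy too old"
    return ""
```

Returning the reason rather than a plain yes/no lets a report say which threshold caused the copy.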


Performance, Protection, & Productivity = Effective Recovery

Use recovery examination to detect common problems that affect the recoverability of IMS databases.
- READ the DBRC RECON data sets and analyze the records against appropriate threshold parameters
- REPORT problems as exceptions along with flexible notification email
- RECOMMEND a solution and generate JCL that can solve reported problems

Conditionally image copy to bypass unnecessary image copies.

Use image copy triggering to maintain size of change accumulation file

Monitor Recovery functions to proactively watch the progress of recovery jobs.


Automated Maintenance Cycle

AUTOMATE database and resource maintenance

1 Configure: Auto-configure feature and review default analysis options
2 Gather: Gathers recovery information about databases and resources
3 Analyze: Analyzes databases and resources and reports the current and potential problems
4 Execute: Recommends solutions to correct the problem and lets you execute the solution


Recovery Management – Step 1 - Configure

Establish thresholds based on your Business Requirements

Need to detect the following and more:
- Unavailable databases
- Unavailable RECONS
- Missing assets – image copies, logs, and change accums
- Not enough image copies and change accums exist
- Databases missing from change accum groups
- RECONS are out of space
- Too many logs are needed for recovery

[Cycle graphic: AUTOMATE database and resource maintenance, step 1 Configure]
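
Threshold-based detection of a few of the conditions listed on this slide might look like the sketch below. The field names, default threshold values, and exception texts are all hypothetical, and a real tool would read this information from the DBRC RECON data sets rather than from hand-built records.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    min_image_copies: int = 2        # assumed default, set per business need
    max_logs_for_recovery: int = 50  # assumed default, set per business need


@dataclass
class DatabaseStatus:
    name: str
    available: bool
    image_copies: int
    logs_needed: int
    in_change_accum_group: bool


def find_exceptions(db: DatabaseStatus, t: Thresholds) -> list[str]:
    """Compare one database's recovery status against the thresholds."""
    problems = []
    if not db.available:
        problems.append("database unavailable")
    if db.image_copies < t.min_image_copies:
        problems.append("not enough image copies")
    if db.logs_needed > t.max_logs_for_recovery:
        problems.append("too many logs needed for recovery")
    if not db.in_change_accum_group:
        problems.append("missing from change accum group")
    return problems
```

Keeping the thresholds in their own record is what allows them to follow business requirements: a critical database can be checked against stricter values than a test copy.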


Recovery Management – Step 2 - Gather

Collect recovery information based on your Business Requirements

Recovery collection possibilities
- Automatically via program
- Via your own Scheduler
- On demand at your request
- Collect by RECON
- Collect by Group
- Against a RECON backup

[Cycle graphic: AUTOMATES database and resource maintenance, step 2 Gather]


Recovery Management – Step 3 - Analyze

Analyze recovery exceptions based on your Business Requirements

Recovery exception requirements:
- Consolidated exceptions
  – Both recovery and space issues
  – Includes databases, change accum, logs, and RECONs
- Flexibility
  – Enterprise-wide down to specific groups
  – Most severe down to warnings
  – General down to specific
- EMAIL flexibility
  – Individual or consolidated
  – Limit by severity

[Cycle graphic: AUTOMATES database and resource maintenance, step 3 Analyze]
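
The email options on this slide (individual or consolidated, limited by severity) can be sketched as a filter over the exception list. The severity levels, tuple layout, and message format below are assumptions for illustration only.

```python
from enum import IntEnum


class Severity(IntEnum):
    WARNING = 1
    ERROR = 2
    CRITICAL = 3


def build_emails(exceptions, min_severity, consolidated):
    """Build notification bodies from (resource, Severity, message) tuples.

    Only exceptions at or above min_severity are kept, most severe first.
    consolidated=True yields one combined body, otherwise one per exception.
    """
    selected = [e for e in exceptions if e[1] >= min_severity]
    selected.sort(key=lambda e: e[1], reverse=True)
    lines = [f"{sev.name} {resource}: {msg}" for resource, sev, msg in selected]
    if consolidated:
        return ["\n".join(lines)] if lines else []
    return lines
```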


Recovery Management: Work Prioritization


Recovery Management – Step 4 - Execute

Process Recovery Exceptions based on your Business Requirements

Recovery management capabilities
- Resolution flexibility
  – Fix all problems for a database
  – Fix selected problems for a database
- JCL Creation flexibility
  – Create as needed for a database
  – Batch request against all exceptions and have JCL created for all databases

[Cycle graphic: AUTOMATE database and resource maintenance, step 4 Execute]


Recovery Management - Execute


How Many Potential Recovery Exceptions?


Learn more at www.bmc.com