Duane Wente Advisory Software Consultant BMC Software
Will you have a successful local or disaster recovery?
© Copyright 5/2/2012 BMC Software, Inc 2
Recovery is a Real Challenge
Cost of Downtime varies
  – By Industry
  – By Business Cycle
Staff Productivity and Expertise pressures
  – Harder to get and keep good technicians
  – Recovery is a ‘part time’ job, skills may wane
  – A lot of hours can go into DR test ‘preparations’
Planned downtime (backups) pressures
  – Consistent Copies may/may not require outage
  – Even a brief outage may impact business
Unplanned outages happen at painful times
Definition of Maturity Class
Time to Recover from Business Interruptions
When Availability is Critical, Recovery is Crucial!
Unplanned downtime is an unfortunate fact of life...
Up to 80% of all unplanned downtime is caused by software or human error*
Up to 70% of recovery is “think time”!
*Source: Gartner
[Chart: Detect 20%, Diagnose 20%, Build 30%, Recover 30%]
What can cause a database application outage?
Some events are planned:
- Application database maintenance
- Data migration
- Structure change implementation
- Hardware upgrades (processor, storage)
- Operating system or DBMS maintenance
- Disaster recovery preparation
Other events are unplanned:
- Site disasters (floods, power outages, storms, fire, etc.)
- Hardware failures (disk, CPU, network, etc.)
- Operating system failures
- DBMS failures
- Operation errors
- Batch cycle errors
- Improper data feeds
- User errors
- Deliberate data corruption
- Application software errors
- Application performance degradation
- Fallback from application change migrations
How customers spend their money
Recovery Type        Budget  Attention  Probability
Disaster             $$$$$   High       Low
Volume               $$$     Medium     Medium
Application/Logical  $       Low        Very high – it’s sure to happen!
Cost Components of Backup
What do you spend doing Database Backups?
- CPU time, overhead on system resources
- Output resources (tape or disk)
- Operations and Support resources
What’s the value to the business?
- Recoverability of critical data asset
- Possible side benefit – use backup to migrate data to ‘clone’ system
What’s the business impact?
- Availability impact (maybe)
- Data integrity and consistency risk (maybe)
- Conflicts with business processing (maybe)
Cost Components of Log Processing
What do you spend doing Log Processing (accums)?
- CPU time, overhead on system resources
- Output resources (tape or disk)
- Operations and Support resources
What’s the value to the business?
- Faster Recovery of critical data asset
What’s the business impact?
- Availability impact (maybe)
- Conflicts with business processing (maybe)
Cost Components of Application Recovery
What do you spend doing Local Recovery?
- Business is DOWN – cost can be $$$$$$$’s per hour!
- CPU time, overhead on system resources
- Output resources (tape or disk)
- Operations/Support resources – do you have Recovery Experts?
  – ‘Think Time’ can be a significant part of total outage time
  – Remember – MOST outages are LOCAL outages, not Disaster Recovery
What’s the value to the business?
- Recovery of critical data asset - eventually
- Business Resumption
  – Identify and Reapply lost transactions
What’s the business impact?
- Availability impact
  – Lost sales, lost opportunity, fees and fines, supply chain impact, etc.
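As a rough illustration of how hourly downtime cost and 'think time' combine, the sketch below multiplies an hourly cost by the outage length and splits out the think-time share. The dollar figure and outage length are made-up inputs, and the 0.7 default simply reuses the "up to 70%" figure quoted earlier in this deck; none of these are measurements.

```python
def outage_cost(hourly_cost, outage_hours, think_time_fraction=0.7):
    """Return (total_cost, think_time_cost) for a single outage.

    All inputs are hypothetical. think_time_fraction defaults to the
    "up to 70%" think-time figure quoted in the deck.
    """
    if not 0.0 <= think_time_fraction <= 1.0:
        raise ValueError("think_time_fraction must be between 0 and 1")
    total = hourly_cost * outage_hours
    return total, total * think_time_fraction


# Hypothetical example: $100,000 per hour, 4-hour local outage.
total, think = outage_cost(100_000, 4)
print(f"total=${total:,.0f}  think time=${think:,.0f}")
```

Under these assumptions, shortening the think-time portion is where most of the cost can be recovered.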
Examples of ISV Innovation for Backup and Recovery
Multi-vendor storage exploitation for consistent image copies with minimal outage
High-speed recovery with a variety of techniques
Point-In-Time recovery to any timestamp with consistency
Point-In-Time change accumulation
Disaster Recovery preparation automation
Data Replication for testing
Using DBMS log data for reporting and transaction recovery
Dynamic RECON management and use
Monitoring DBMS recovery actions with solution recommendations
Simplification and automation for complex tasks
The 3 P’s - Performance
Performance
- Externalize Sorting
- Backout Recovery
- Monitoring Recovery
Performance – External Sort
Schedule sort tasks in separate address space
- Reduces the amount of virtual storage in utility address space
- Add more sort tasks to further distribute workload
Log sorting
- Change Accum
- Recovery
Index rebuild sorting
- Recovery
- Allows for greater overlap of the Index Rebuild functions
- As each index build completes, resources are released
Performance – Backout Recovery
ISV recovery method
Starts with current, existing database data sets
Reverses (backs out) logged changes
Returns the database to the condition it was in at the specified recovery time stamp
Determines whether to run a forward recovery or a backout recovery based on the logs needed
Lives within the rules of DBRC
Supports full-function databases and HALDBs
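The idea of backing out logged changes, as opposed to rolling forward from an image copy, can be sketched with a toy key/value "database". Everything here is an illustrative assumption: the `LogRecord` layout, the in-memory dictionaries, and the rule of picking whichever direction needs fewer log records. It is not how IMS, DBRC, or any vendor utility is implemented.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical change record holding before and after images."""
    timestamp: int
    key: str
    before: Optional[str]  # None: the key did not exist before the change
    after: Optional[str]   # None: the change deleted the key


def backout_recover(current: Dict[str, str], log: List[LogRecord],
                    recover_to: int) -> Dict[str, str]:
    """Start from the current data and undo changes newer than recover_to.

    The log is assumed to be in chronological order.
    """
    db = dict(current)
    for rec in reversed(log):          # newest change first
        if rec.timestamp <= recover_to:
            break                      # everything older stays as it is
        if rec.before is None:
            db.pop(rec.key, None)
        else:
            db[rec.key] = rec.before
    return db


def forward_recover(image_copy: Dict[str, str], copy_time: int,
                    log: List[LogRecord], recover_to: int) -> Dict[str, str]:
    """Start from an image copy and reapply changes up to recover_to."""
    db = dict(image_copy)
    for rec in log:                    # oldest change first
        if copy_time < rec.timestamp <= recover_to:
            if rec.after is None:
                db.pop(rec.key, None)
            else:
                db[rec.key] = rec.after
    return db


def choose_method(log: List[LogRecord], copy_time: int, recover_to: int) -> str:
    """Assumed rule: pick the direction that processes fewer log records."""
    forward = sum(copy_time < r.timestamp <= recover_to for r in log)
    backout = sum(r.timestamp > recover_to for r in log)
    return "backout" if backout < forward else "forward"


log = [LogRecord(10, "A", "1", "2"),
       LogRecord(20, "B", None, "x"),
       LogRecord(30, "A", "2", "3")]
current = {"A": "3", "B": "x"}
print(choose_method(log, copy_time=0, recover_to=20))
print(backout_recover(current, log, recover_to=20))
```

In this toy case both directions reach the same state at timestamp 20, but backout touches one log record where forward recovery touches two, which is the kind of trade-off the slide alludes to.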
Performance - Recovery Monitor
What is going on with this recovery?
Which databases were recovered?
How many logs did the recovery job read?
Were there any problems with the recovery?
Monitor Recovery Actions
Consolidated DBA Worklist
The 3 P’s - Protection
Protection
- Image Copy Encryption
- Automatic Database Allocation
- RECON Reorganization
- Recovery Extensions
- RECON Cleanup Utility
Protection – Image Copy Encryption
› Satisfies need for SOX compliance to protect financial and customer information
› Standard z/OS data encryption - DES (64-bit) or AES (128-bit) keys
› Encryption key file is allocated dynamically
  – MDALIB or STEPLIB
[Diagram labels: IC ENCRYPTION; RECOVER DB; Encryption key file; clear-text record "Joe Blogs 123 45 6789"; encrypted record "$je Lb*(1 C18 bo 3(7V"; Image copies 2-10]
Protection – Automatic Database Definitions
[Diagram labels: Original IMS DBs; System Catalogs; DBRC RECONS; Capture Prod Allocation Info (Batch); JCL PDS Del/Def Mbr; ALT JCL PDS Del/Def]
Protection - RECON Reorganization Utility
› Purpose – Restore the RECON data sets to optimum availability and performance levels.
› Two modes
  – Replace
  – Reorganize All
[Flowchart, RECON Reorg Utility in Reorg All mode. Steps and decisions shown: RECON Contention? (Y/N); CHANGE.RECON REPLACE; Delete/Define; Issue Command /DIS OLDS; All Reorged? (Y/N); EOJ]
Protection – Recovery Extensions
– Store additional image copy and change accum data set information in an externally maintained repository
– Functions using the additional data set retrieval
  • Incremental image copy
  • Change accum
  • Recovery
[Diagram labels: DBRC; Manager; Repository; Image Copy; Change Accum; Recovery; Copies 1 or 2; Copies 1 … n]
Protection - RECON Cleanup
[Diagram labels: RECON Data Set; Subsys, Subsys; Bad, Good, Good]
- Closes open PRILOGs
- Closes open SECLOGs
- Closes open PRISLDs
- Closes open SECSLDs
- Deletes PRIOLDs
- Deletes SECOLDs
- Deletes SUBSYS records
- Updates/deletes ALLOCs
- Updates/deletes LOGALLs
- Marks CA runs “invalid”
- Marks DBs as “recov needed”
- Marks Primary Logs in ERROR
- Optionally: Marks Primary ICs in ERROR
- Provides suggested PIT
- Provides suggested CA time
- Provides detailed reports
- Performs other cleanup...
The 3 P’s - Productivity
Productivity
- Conditional Image Copy
- Change Accumulation File Management
Productivity - Conditional Image Copy
› Am I making too many batch image copies?
› Can I save money on image copies without changing the schedule?
[Flowchart: Start IMAGE COPY PLUS; decision "Any updates since last image copy?" (Yes/No); decision "Has it been too long since last image copy?" (Yes/No); outcomes: Create Image Copy, Bypass Image Copy]
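The two questions in the flowchart above can be read as a small decision function. The sketch below assumes one plausible wiring of the branches (copy when there are updates, otherwise copy only when the last copy is older than a limit). The extracted slide does not preserve the arrows, so this ordering, the function name, and the parameters are all assumptions.

```python
from datetime import datetime, timedelta


def should_image_copy(updates_since_last_copy: int,
                      last_copy_time: datetime,
                      now: datetime,
                      max_age: timedelta) -> bool:
    """Return True to create an image copy, False to bypass it.

    Assumed logic: any update forces a copy; with no updates, a copy is
    still taken once the previous one is older than max_age.
    """
    if updates_since_last_copy > 0:
        return True
    return now - last_copy_time > max_age


now = datetime(2012, 5, 2, 12, 0)
# No updates and a one-day-old copy: the scheduled job can bypass the copy.
print(should_image_copy(0, now - timedelta(days=1), now, timedelta(days=7)))
```

The point of such a check is that the job schedule stays the same while the unnecessary copies are skipped at run time.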
Productivity – Change Accum File Management with IC Triggering
› How can I manage the size of the change accum dataset?
› Can I trigger an image copy when the change accum is too big?
[Flowchart: Start IMAGE COPY; decision "Is my change accum file too big?" (Yes/No); decision "Has it been too long since last image copy?" (Yes/No); outcomes: Create Image Copy, Bypass Image Copy. Side labels: CHANGE ACCUMULATION; Repository Statistics; Change Accum Dataset flagged "CA is TOO BIG"]
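A minimal sketch of image copy triggering driven by change accum size follows. It assumes repository statistics are available as a simple record and that the thresholds are site-chosen values; the `CaStats` fields, the threshold names, and the returned reason strings are hypothetical and do not come from any product.

```python
from typing import NamedTuple, Optional


class CaStats(NamedTuple):
    """Hypothetical statistics kept for one change accum group."""
    ca_size_mb: float
    days_since_image_copy: float


def image_copy_trigger(stats: CaStats,
                       max_ca_size_mb: float,
                       max_days: float) -> Optional[str]:
    """Return the reason an image copy should run, or None to bypass it."""
    if stats.ca_size_mb > max_ca_size_mb:
        return "change accum too big"
    if stats.days_since_image_copy > max_days:
        return "image copy too old"
    return None


print(image_copy_trigger(CaStats(900.0, 1.0), max_ca_size_mb=500.0, max_days=7.0))
```

Returning a reason rather than a bare flag makes it easier to report why a copy was taken.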
Performance, Protection, & Productivity = Effective Recovery
Use recovery examination to detect common problems that affect the recoverability of IMS databases.
- READ the DBRC RECON data sets and analyze the records against appropriate threshold parameters
- REPORT problems as exceptions along with flexible notification email
- RECOMMEND a solution and generate JCL that can solve reported problems
Conditionally image copy to bypass unnecessary image copies.
Use image copy triggering to maintain size of change accumulation file
Monitor Recovery functions to proactively watch the progress of recovery jobs.
Automated Maintenance Cycle
AUTOMATE database and resource maintenance
1 Configure: Auto-configure feature and review default analysis options
2 Gather: Gathers recovery information about databases and resources
3 Analyze: Analyzes databases and resources and reports the current and potential problems
4 Execute: Recommends solutions to correct the problems and lets you execute the solution
Recovery Management – Step 1 - Configure
AUTOMATE database and resource maintenance (1 Configure)
Establish thresholds based on your Business Requirements
Needs to detect the following and more:
- Unavailable databases
- Unavailable RECONs
- Missing assets – image copies, logs, and change accums
- Not enough image copies and change accums exist
- Databases missing from change accum groups
- RECONs are out of space
- Too many logs are needed for recovery
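The threshold checks listed above could be expressed roughly as follows. This is a sketch under stated assumptions: the `Thresholds` and `DbStatus` fields, their default values, and the exception wording are invented for illustration, and a real tool would read this information from the RECON data sets instead of taking it as arguments.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Thresholds:
    """Hypothetical site-defined limits."""
    min_image_copies: int = 2
    max_logs_for_recovery: int = 50
    max_recon_used_pct: float = 80.0


@dataclass
class DbStatus:
    """Hypothetical recovery facts gathered for one database."""
    name: str
    available: bool
    image_copies: int
    logs_needed: int
    in_ca_group: bool


def find_exceptions(db: DbStatus, recon_used_pct: float,
                    t: Thresholds) -> List[str]:
    """Compare one database's status with the thresholds."""
    problems = []
    if not db.available:
        problems.append("database unavailable")
    if db.image_copies < t.min_image_copies:
        problems.append("not enough image copies")
    if not db.in_ca_group:
        problems.append("missing from change accum group")
    if db.logs_needed > t.max_logs_for_recovery:
        problems.append("too many logs needed for recovery")
    if recon_used_pct > t.max_recon_used_pct:
        problems.append("RECON nearly out of space")
    return problems


db = DbStatus("PAYDB01", available=True, image_copies=1,
              logs_needed=120, in_ca_group=True)
print(find_exceptions(db, recon_used_pct=45.0, t=Thresholds()))
```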
Recovery Management – Step 2 - Gather
AUTOMATES database and resource maintenance (2 Gather)
Collect recovery information based on your Business Requirements
Recovery collection possibilities
- Automatically via program
- Via your own Scheduler
- On demand at your request
- Collect by RECON
- Collect by Group
- Against a RECON backup
Recovery Management – Step 3 - Analyze
AUTOMATES database and resource maintenance (3 Analyze)
Analyze recovery exceptions based on your Business Requirements
Recovery exception requirements:
- Consolidated exceptions
  - Both recovery and space issues
  - Includes databases, change accum, logs, and RECONs
- Flexibility
  - Enterprise-wide down to specific groups
  - Most severe down to warnings
  - General down to specific
- EMAIL flexibility
  - Individual or consolidated
  - Limit by severity
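The email options above (individual or consolidated, limited by severity) might look like this in outline. The severity names, the tuple layout, and the message format are assumptions made for the sketch. No real notification interface is implied, and nothing is actually sent.

```python
from typing import List, Tuple

# Hypothetical severity levels, lowest to highest.
SEVERITY_ORDER = {"warning": 0, "error": 1, "severe": 2}

# (group, database, severity, message)
RecoveryException = Tuple[str, str, str, str]


def build_notifications(exceptions: List[RecoveryException],
                        min_severity: str = "warning",
                        consolidated: bool = True) -> List[str]:
    """Return message bodies: one per exception, or a single combined one."""
    floor = SEVERITY_ORDER[min_severity]
    kept = [e for e in exceptions if SEVERITY_ORDER[e[2]] >= floor]
    kept.sort(key=lambda e: -SEVERITY_ORDER[e[2]])  # most severe first
    lines = [f"[{sev}] {grp}/{db}: {msg}" for grp, db, sev, msg in kept]
    if consolidated:
        return ["\n".join(lines)] if lines else []
    return lines


found = [
    ("PAYROLL", "DB1", "warning", "image copy is old"),
    ("PAYROLL", "DB2", "severe", "database unavailable"),
    ("ORDERS", "DB3", "error", "missing log"),
]
for body in build_notifications(found, min_severity="error"):
    print(body)
```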
Recovery Management: Work Prioritization
Recovery Management – Step 4 - Execute
AUTOMATE database and resource maintenance (4 Execute)
Process Recovery Exceptions based on your Business Requirements
Recovery management capabilities
- Resolution flexibility
  - Fix all problems for a database
  - Fix selected problems for a database
- JCL Creation flexibility
  - Create as needed for a database
  - Batch request against all exceptions and have JCL created for all databases
Recovery Management - Execute
How Many Potential Recovery Exceptions?
Learn more at www.bmc.com