backing up very large databases - university of …tsm-symposium.oucs.ox.ac.uk/2007/papers/charles...


Page 1: Backing up Very Large Databases - University of …tsm-symposium.oucs.ox.ac.uk/2007/papers/Charles Silvan...C. Silvan Backing up Very Large Databases 3 SAN L tSAN Layout Site A Site

Backing up Very Large Databases

Charles Silvan
GATE Informatic
Switzerland

Agenda
Environment
Flashcopy Principles
TSM for ACS
Numbers
Issues
Scalability

C. Silvan Backing up Very Large Databases 2

Customer Environment
Four datacenters worldwide
Each one over two sites, separated by 20-40 km
Everything LVM mirrored, HACMP
Factory approach to installation and maintenance
Each datacenter hosting 10-20 SAP DB2 databases of n TB

SAN Layout
[Diagram: Site A and Site B. Elements shown: AIX and Win2K systems, TSM Server A and TSM Server B, DS8000 disk subsystems, directors (Core 1A / Edge 1A, Core 1B / Edge 1B, Core 2A / Edge 2A, Core 2B / Edge 2B), and tape libraries posst001 and posst002 with drives dr1-dr6 and dr11-dr16.]

Flashcopy/Flashrestore Principles
Set of LUN pairs: source-target, same sizes
Flashcopy establishment: pointers copied to target LUN's = Point-in-time snapshot
Data on target LUN immediately accessible
After all data is copied to targets: can reverse relationship: Flashrestore
After first full copy: can use Incremental Flashcopy
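
The principles above can be illustrated with a toy copy-on-write model. This is only a sketch of the general idea (target readable at once, data copied in the background, old blocks preserved before the source is overwritten); it is not the DS8000 implementation, and the class and method names are invented for illustration.

```python
class FlashcopyPair:
    """Toy model of one source/target LUN pair (hypothetical, not DS8000 code)."""

    def __init__(self, source_blocks):
        self.source = list(source_blocks)          # live production data
        self.target = [None] * len(self.source)    # target has the same size
        self.copied = [False] * len(self.source)   # per-block "already copied" map

    def write_source(self, index, value):
        # Copy-on-write: save the point-in-time block before overwriting it.
        if not self.copied[index]:
            self._copy_block(index)
        self.source[index] = value

    def read_target(self, index):
        # The target is usable immediately: blocks not yet copied
        # are read through to the (still unchanged) source block.
        return self.target[index] if self.copied[index] else self.source[index]

    def background_copy(self):
        # Physically copy everything that has not been copied yet.
        for index in range(len(self.source)):
            if not self.copied[index]:
                self._copy_block(index)

    def fully_copied(self):
        # Only once this is true could the relationship be reversed.
        return all(self.copied)

    def _copy_block(self, index):
        self.target[index] = self.source[index]
        self.copied[index] = True


pair = FlashcopyPair(["a", "b", "c"])   # establishment: no data moved yet
pair.write_source(1, "B")               # production keeps writing
snapshot = [pair.read_target(i) for i in range(3)]
pair.background_copy()
```

In this model `snapshot` still reflects the data as it was at establishment time, even though the source has changed since.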

Flashcopy Backup to Tape
Database set to "write suspend" mode
Flashcopy pairs establishment
Database resumed
Target LUN's are accessed from Backup Mover system
Importvg, mount, etc
Database is started on Mover System
Online DB backup to TSM tape, in LAN-less mode
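
The sequence above could be orchestrated roughly as follows. This is a sketch only: `run` is a placeholder that records step names instead of calling any real DB2, AIX, or TSM command, and the database and host names are invented. The one design point it tries to show is that the database should be resumed even if establishing the Flashcopy pairs fails.

```python
from contextlib import contextmanager

steps = []  # order of operations, recorded for illustration


def run(step):
    # Placeholder for a real command invocation (hypothetical).
    steps.append(step)


@contextmanager
def write_suspended(db):
    run(f"suspend writes on {db}")
    try:
        yield
    finally:
        # Always resume production, whatever happened in between.
        run(f"resume writes on {db}")


def flashcopy_backup(db, mover):
    with write_suspended(db):
        run("establish Flashcopy pairs")
    # Everything below runs on the Backup Mover, not on production.
    run(f"import volume groups and mount filesystems on {mover}")
    run(f"start {db} on {mover}")
    run(f"online backup of {db} to TSM tape (LAN-less)")


flashcopy_backup("PRD", "bm1")
```

Keeping the suspended window limited to the pair establishment is what gives the short quiesce time on the production side.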

Flashcopy Benefits
Flashcopy Backups: minimize impact on production database systems
  short quiesce time
  Zero load on production server
  only reading load on production disks
Costs:
  double the disk space requirement
  setup

Flashrestore Benefits
Reduces restore time from hours to minutes (basically independent of database size)
Eliminates data transfer from tape
Forward recovery can start immediately after reverse Flashcopy relationship is established
Note: in case of tape restore: LAN-less to production system

Flashcopy Landscape
[Diagram: production hosts Prd1 and Prd2 (takeover) and TSM servers TSM1 and TSM2 (takeover) under HACMP, with LVM mirroring; Backup Movers BM1 and BM2; SANs joined by MAN links; DS8K#1 and DS8K#2, each with source and target volumes; two LTO tape libraries.]

TSM for ACS (= TDP for Flashcopy)
TSM for Advanced Copy Services provides a solution for Flashcopy backup in SAP environment
Software stack:
  SAP, DB2, DP for SAP, DP for Flashcopy, TSM Client, TSM Storage agent, CIM Agent, DS8K SMI-S API, CIM client (Pegasus)
Main limitation: with LVM mirroring, DB must reside on one disk subsystem

Complexity Aspects
Multiple systems
Java: installers, products
Code level dependencies: DS8000, CIM Agent
Log files
Passwords (DB2, TSM, CIM, DS8000)
All of this makes automated installs and maintenance complicated

Production Issues
High percentage of failed backups, for many different reasons:
  Setup: NFS, passwords, rexec, LVM/DB changes, code upgrades, DS8000 LUNs, etc
  Tapes: drives not available, tape preemption
  Performance: MAN link limits, disk subsystems
As DB's were expected to grow by a factor of 5 to 10, customer asked IBM:
  How do you back up a database of 20, 40, 80 TB?

IBM Montpellier Study
Real size tests, over several months, by top IBM specialists, and lots of hardware
Documented in a Redbook: SG24-7289 « Infrastructure Solutions: Design, Manage, and Optimize a 20 TB SAP NetWeaver Business Intelligence Data Warehouse »
Also includes proposals for scaling up to 60 TB

Some Conclusions from Montpellier
Dedicate DS8K ports for tape backups, one per tape stream
Spread Flashcopy source and targets on all arrays: when no backup, use full DS8K power; when backups, equivalent to separating them
Backup time: 4.5 hours for 20 TB, using 16 LTO-3 drives = 4.4 TB/hr, 300 GB/hr/stream
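
The backup-time figure can be checked with a few lines of arithmetic. The sketch assumes decimal units (1 TB = 1000 GB), which the slides do not state. With that assumption the per-stream rate comes out near 278 GB/hr, which the slide appears to round up to 300 GB/hr.

```python
total_tb = 20.0   # database size from the slide
hours = 4.5       # measured backup time
drives = 16       # LTO-3 drives, one stream each

aggregate_tb_per_hr = total_tb / hours
per_stream_gb_per_hr = aggregate_tb_per_hr * 1000 / drives  # assumes 1 TB = 1000 GB

print(f"{aggregate_tb_per_hr:.1f} TB/hr aggregate")
print(f"{per_stream_gb_per_hr:.0f} GB/hr per stream")
```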

Transfer Rates Arithmetic
Backup or restore 42 TB in 7 hours = 6 TB/hr. Assuming 300 GB/hr per stream (=83 MB/s):
  need 20 parallel streams
  aggregate data rate: 1660 MB/s
For 63 TB DB in 7 hours: 9 TB/hr:
  30 parallel streams
  aggregate: 2500 MB/s
If over the MAN: 9 - 13 x 2 Gbps links
Conclusions:
  cannot afford backups going over the MAN
  cannot afford taking a second copy
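
The stream and link counts above can be reproduced with a short calculation. Two things here are assumptions rather than facts from the slides: decimal units, and a usable payload of about 200 MB/s per 2 Gbps link, chosen because it matches the 9 to 13 links quoted. The aggregate for 42 TB comes out at 1667 MB/s; the slide's 1660 MB/s follows from rounding the per-stream rate down to 83 MB/s.

```python
import math

STREAM_GB_PER_HR = 300   # per-stream rate from the slides
LINK_MB_PER_S = 200      # assumption: usable payload of one 2 Gbps link


def plan(db_tb, window_hr):
    """Return (streams, aggregate MB/s, 2 Gbps links) for a backup window."""
    rate_gb_per_hr = db_tb * 1000 / window_hr
    streams = math.ceil(rate_gb_per_hr / STREAM_GB_PER_HR)
    aggregate_mb_per_s = streams * STREAM_GB_PER_HR * 1000 / 3600
    links = math.ceil(aggregate_mb_per_s / LINK_MB_PER_S)
    return streams, round(aggregate_mb_per_s), links


for size_tb in (42, 63):
    streams, mb_per_s, links = plan(size_tb, 7)
    print(f"{size_tb} TB in 7 h: {streams} streams, {mb_per_s} MB/s, {links} links")
```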

Flashcopy Numbers
Full Flashcopy of 42 TB DB: assuming 1.8 TB/hr, or 500 MB/sec read + write (Montpellier numbers), requires 23 hours on one DS8000
For 63 TB: 33 hours
To go down to 8 hours elapsed: spread the load on 3 to 4 DS8000's
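
A rough check of these durations, assuming decimal units and assuming that copy throughput scales linearly with the number of DS8000's (an idealization that the slides do not claim). Under these assumptions 42 TB takes about 23.3 hours on one subsystem, while 63 TB works out to 35 hours, slightly more than the 33 hours quoted; the transcript does not show how that figure was derived.

```python
FLASHCOPY_MB_PER_S = 500
FLASHCOPY_TB_PER_HR = FLASHCOPY_MB_PER_S * 3600 / 1_000_000   # 1.8 TB/hr


def flashcopy_hours(db_tb, subsystems=1):
    # Assumes the load spreads evenly and throughput scales linearly.
    return db_tb / (FLASHCOPY_TB_PER_HR * subsystems)


for db_tb in (42, 63):
    single = flashcopy_hours(db_tb)
    print(f"{db_tb} TB: {single:.1f} h on one DS8000, "
          f"{single / 8:.1f} subsystems' worth of throughput for 8 h")
```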

How many DS8000's?
Capacity-wise: 2 DS8000 (per site) for a 42 TB DB
Performance-wise: I/O loads:
  Production: ? read, ? write, over 24 hours
  Full Flashcopy: 1700 MB/s read, 1700 MB/s write, over 7 hours
  Backup to tape: 1660 MB/s read, over 7 hours
  Logs: 23 MB/s write, 23 MB/s read. Peak: 50 MB/s?
Total I/O Capacity of one DS8000: about 1100 MB/sec
Backup to tape plus Full Flashcopy represent about 3 x DS8000 I/O capability
So: need at least 4 x DS8000, for performance

Backup Movers
20 / 30 tape streams: need more than one Backup Mover
Go for 4 Backup Movers, one per DS8000
Each one with 6 / 8 tape streams
Using 4 DS8000 ports, and 4 tape HBA's (at 4 Gbps)
Have HACMP Clusters of Backup Movers?
If one Backup Mover fails, the whole DB backup has failed

New 5x2 LPAR Database

Tape Drive Allocation
Need to guarantee that 20 tape drives will be available at backup time.
If the TSM server is used for more than one DB: not possible
The only way is to use a Library Manager-only TSM server, which allows drives to be dedicated

Alternate Backups
Goal: avoid Copy storagepool by backing up alternately to two tape libraries, one on each site
One site disaster: recovery from -1 backup
Intelligence is in the (SAP) client scripts: back up to other Mgmtclass (=library) than last backup
Still to one TSM server
Needs new definitions: primary stgpools on remote library and new set of Management classes: more complexity
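
The alternation logic in the client scripts might look something like this. It is a sketch under stated assumptions: the management class names, the JSON state file, and both function names are hypothetical, and the actual backup call is left out. The choice is recorded only after a successful backup, so a failed run is retried against the same library.

```python
import json
import tempfile
from pathlib import Path

MGMT_CLASSES = ("MC_SITE_A", "MC_SITE_B")   # hypothetical names, one per library


def choose_mgmt_class(state_file):
    """Pick the management class that the last successful backup did not use."""
    path = Path(state_file)
    last = json.loads(path.read_text())["last"] if path.exists() else None
    return MGMT_CLASSES[1] if last == MGMT_CLASSES[0] else MGMT_CLASSES[0]


def record_success(state_file, mgmt_class):
    """Remember which class the latest successful backup went to."""
    Path(state_file).write_text(json.dumps({"last": mgmt_class}))


with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "last_backup.json"
    sequence = []
    for _ in range(3):
        chosen = choose_mgmt_class(state)
        # ... the backup itself would run here, using `chosen` ...
        record_success(state, chosen)
        sequence.append(chosen)
```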

Archive Logs
Archive Logs are even more critical than DB backups: if one log is missing, cannot roll-forward past that point
Continuous flow of 2 GB objects, about 100 GB/hr
Want to keep logs from last N days on disk, for restore
If need to restore from tape, must have parallelism
Cannot live more than a few hours without archiving (and deleting) the logs
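
The log volumes above translate into disk requirements as follows. The sketch assumes a steady 100 GB/hr around the clock and decimal units; real log rates will vary with load, so these are order-of-magnitude figures only.

```python
LOG_OBJECT_GB = 2          # size of one archived log object
LOG_RATE_GB_PER_HR = 100   # approximate log volume from the slide


def logs_per_hour():
    return LOG_RATE_GB_PER_HR / LOG_OBJECT_GB


def disk_needed_tb(days_on_disk):
    # Space to keep the last N days of archived logs on disk.
    return LOG_RATE_GB_PER_HR * 24 * days_on_disk / 1000


def backlog_gb(hours_without_archiving):
    # Logs piling up on the database host while archiving is down.
    return LOG_RATE_GB_PER_HR * hours_without_archiving


print(f"{logs_per_hour():.0f} log objects per hour")
print(f"{disk_needed_tb(7):.1f} TB for 7 days on disk")
print(f"{backlog_gb(4)} GB backlog after 4 hours without archiving")
```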

MIGDELAY
To keep last N days on disk: use MIGDELAY
Problems:
  MIGDELAY vs MIGCONTINUE
  No practical way to get the age of archive files in a disk storage pool

Collocation Issues
Collocation was invented to keep files of one node or filespace from being spread over many tape volumes
In this case, we need to migrate files from one filespace to multiple tape volumes (for parallel restores): not possible if same node and filespace.

TSM Passwords
TSM password must be correct, at all times, on:
  All production hosts and their take-over hosts
  All Backup Mover systems
Cannot use CLUSTER=YES
Use ASNODE, but still need to authenticate from each host
Each node still needs to be backed up
-> in TSM server: 18 x 2 nodenames, 36 password files

Testing
Can you afford a test environment with 20 hosts, 8 DS8000, 40 - 60 tape drives?
Test/validation needed for:
  Setup changes
  Software level changes
  DS8000 microcode changes
Development Labs have the same problem!
How do you verify that a backup image is OK, without having to do a full restore?

General Issues
Most products were not designed for these sizes:
  Large system effects
  Restart capabilities
  Failure containment
  Error analysis
  Performance monitoring

Scalability
Scalability options:
  technology: faster disks and tapes. Will it keep up with data growth?
  horizontal scaling: more DS8000's, more tape drives, more Backup Movers, more LPAR's. Can go a very long way.
But only up to a point:
  complexity
  skills
  exposure to one failing element

Conclusion
Q: How do you back up very large databases?
A: Very carefully
… or not at all.