Open Source Backup Conference 2014: Bareos in Scientific Environments, by Dr. Stefan Vollmar

Max Planck Institute for Metabolism Research, Cologne, Germany. Dr. Stefan Vollmar, Head of IT Group, [email protected]. Bareos @ MPI SF

Upload: netways

Post on 10-Jun-2015


DESCRIPTION

To back up 110 (partly virtualized) Linux servers, the Max Planck Institute for Radio Astronomy has been using Bareos for 5 years now. The full backup volume is constantly growing and has just passed the 35 TiB mark, with up to 6 million files per TiB. Naturally there were problems with scalability and flexibility which needed to be addressed. We are using 2 Spectra Logic T950 (LTO-5/LTO-6) tape libraries, 40 TiB of disk backup space, and a dedicated 1 GbE/10 GbE backup LAN. As it may be an inspiration to other users, we would like to share our experience utilizing virtual full backups, concurrent jobs, backup of Heartbeat/DRBD failover clusters, and integrating Bareos with REAR for disaster recovery. Coming from TSM, passing Bacula on the way, we finally found our destination with Bareos!

The Max Planck Institute for Neurological Research operates several brain scanners for human and animal studies. Imaging techniques used here comprise magnetic resonance imaging (MRI), positron emission tomography (PET), optical imaging and microscopy. Research is often interdisciplinary, including contributions from the fields of biology, physics, medicine, psychology, genetics, biochemistry and radiochemistry, with very heterogeneous characteristics of data and analysis methods. Backup requirements range from file systems with literally millions of very small files (DICOM raw data or FSL intermediate results) to files of 200 GB+ size (PET listmode). “Good Scientific Practice” mandates backing up and archiving primary data and “everything else needed to reproduce published results” (tools, documentation of tool chains, intermediate results), which is a veritable challenge in a high-end, dynamic lab environment. Until recently, we have used an HSM system from Sun/Oracle (SAM-FS) to meet our backup and archiving requirements, in particular using HSM-type file systems for scientific computing in order to have a fine-grained backup. However, a significantly larger and more powerful system was needed and we are now migrating to a Quantum i6000 (LTO-6) tape library with Grau OpenArchive as HSM frontend. With help from our colleagues in Bonn (MPI for Radio Astronomy), we were able to use Bareos for archiving some vital file systems (backup-to-disk using an HSM file system with WORM tapes; one job per file; file archives < 5 GB; mostly unixoid backup clients). We are very pleased with the performance, ease of handling and flexibility this approach offers: for example, when using incremental backups of virtual machines, listing the 5 largest files can tell a lot about a system’s “health”; pre- and post-hooks allow some interesting security features in an ESX cluster environment (automatically bringing network interfaces up before saving sensitive data and shutting them down afterwards); and analysing backup reports reveals long-term trends for hot spots, etc.

TRANSCRIPT

Page 1

Max Planck Institute for Metabolism Research

Cologne, Germany

Dr. Stefan Vollmar, Head of IT Group

[email protected]

Bareos @ MPI SF

Page 2

Diffusion MRI - Fibre Tracking

• “Nothing defines the function of a neuron better than its connections” Mesulam (2006)
• analyzing fiber structure (anatomical connectivity) in vivo in human brains

Page 3

The High Resolution Research Tomograph

Page 4

Migration from Sun SAM-FS to Grau OpenArchive (1)

• Interdisciplinary and complex projects, heterogeneous groups: fine-grained backup/archiving needed for “Good Scientific Practice”
• Sun SAM-FS, 50 TB online, 374 x LTO-4 in Sun SL500 library
• Problem: not (nearly) enough online space
• Problem: users find it difficult to work efficiently with offline data
• Problem: one central file server (Sun V490)
• Problem: ESX cluster increasingly important, unsatisfactory backup concept (Acronis Backup & Recovery 11.5 Virtual Edition)

Page 5

Migration from Sun SAM-FS to Grau OpenArchive (2)

• VMware ESX cluster now with 80 TB HA virtual storage - DataCore SANsymphony-V (HP EVA, HP P2000)
• new Grau OpenArchive HSM with 300 x LTO-6 in Quantum i6000 library for offline storage
• But: GAM setup is not suitable for the original SAM-FS concept - no shared pools, need fewer file systems
• So try Bareos for backup of selected VMs with a concept of D. Jahn (contac Datentechnik GmbH): backup-to-disk into an HSM file system
• Sliced Bread
• New approach: fewer HSM-type "production" file systems for scientific computing - use Bareos!

Page 6

Services: Virtually Fat Free?

4 x HP DL585 G7, > 1 TB RAM, approx. 64 cores each, 10 GbE networking, FC SAN

approx. 100 virtual machines, VMware vSphere 5 ESX cluster

Page 7

Storage and Archiving (3)

LTO-6 WORM, approx. 6 TB per tape

Quantum i6000, 600 slots, 6 x LTO-6 (for two institutes)

Page 8

Adding Clients with Templates

Schedule {
  Name = "XXX-all"
  Run = Level=Full sun at 2:25
  Run = Level=Incremental mon-sat at 22:30
}

Job {
  Name = "XXX-all"
  Type = Backup
  Level = Incremental
  Client = XXX-fd
  FileSet = "XXX-all"
  Messages = Standard
  Storage = XXX
  Pool = XXX
  Accurate = true
  Schedule = XXX-all
}

Client {
  Name = XXX-fd
  Address = XXX
  Catalog = MyCatalog
  ...
}

Pool {
  Name = XXX
  Pool Type = Backup
  LabelFormat = "XXX-"
  Maximum Volume Jobs = 1
  Maximum Volume Bytes = 5G
  Recycle = no
}

Device {
  Name = XXX
  Media Type = XXX
  Archive Device = /var/lib/bareos/storage/XXX
  LabelMedia = yes;       # lets Bareos label unlabeled media
  Random Access = yes;
  AutomaticMount = yes;   # when device opened, read it
  RemovableMedia = no;
  AlwaysOpen = no;
}

echo "creating storage dir"mkdir "/var/lib/bareos/storage/$1"chown bareos "/var/lib/bareos/storage/$1"cp _template-linux.dir.conf_ "$1.dir.conf"cp _template-linux.sd.conf_ "$1.sd.conf"sed -i "s/XXX/$1/g" $1.dir.conf sed -i "s/XXX/$1/g" $1.sd.conf

template-linux.dir.conf

template-linux.sd.conf

root@:conf.d# add-client.sh marvin

add-client.sh

templates courtesy of J. Behrend, thanks!
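For illustration, running "add-client.sh marvin" as shown above copies the two templates to marvin.dir.conf and marvin.sd.conf and replaces every XXX with the client name, so the generated files contain resources such as the following (a sketch derived from the templates above; the real files contain all the resources shown):

Client {
  Name = marvin-fd
  Address = marvin
  Catalog = MyCatalog
  ...
}

Device {
  Name = marvin
  Media Type = marvin
  Archive Device = /var/lib/bareos/storage/marvin
  ...
}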

Page 9

Backup-to-Disk on HSM


-rw-r----- 1 bareos bareos  18M Sep 17 13:35 dbintern-5797
-rw-r----- 1 bareos bareos  25M Sep 17 21:35 dbintern-5806
-rw-r----- 1 bareos bareos  43M Sep 18 10:36 dbintern-5824
-rw-r----- 1 bareos bareos  17M Sep 18 13:35 dbintern-5826
-rw-r----- 1 bareos bareos  23M Sep 18 21:35 dbintern-5835
-rw-r----- 1 bareos bareos  30M Sep 19 10:35 dbintern-5853
-rw-r----- 1 bareos bareos  17M Sep 19 13:35 dbintern-5855
-rw-r----- 1 bareos bareos  19M Sep 19 21:35 dbintern-5864
-rw-r----- 1 bareos bareos  27M Sep 20 10:35 dbintern-5882
-rw-r----- 1 bareos bareos  11M Sep 20 13:35 dbintern-5883
-rw-r----- 1 bareos bareos  15M Sep 20 21:35 dbintern-5894
-rw-r----- 1 bareos bareos 5.0G Sep 21 01:12 dbintern-5908
-rw-r----- 1 bareos bareos 5.0G Sep 21 01:27 dbintern-5909
-rw-r----- 1 bareos bareos 2.9G Sep 21 01:41 dbintern-5910
-rw-r----- 1 bareos bareos  49M Sep 22 10:36 dbintern-5933
-rw-r----- 1 bareos bareos  16M Sep 22 13:35 dbintern-5935

-rw-r----- 1 bareos bareos 309M Aug  3 01:08 nfhome-4560
-rw-r----- 1 bareos bareos 187M Aug  4 01:08 nfhome-4580
-rw-r----- 1 bareos bareos 4.4G Aug  5 01:10 nfhome-4607
-rw-r----- 1 bareos bareos 1.4G Aug  6 01:07 nfhome-4634
-rw-r----- 1 bareos bareos 1.1G Aug  7 01:13 nfhome-4661
-rw-r----- 1 bareos bareos 3.7G Aug  8 01:07 nfhome-4688
-rw-r----- 1 bareos bareos 5.0G Aug  9 00:58 nfhome-4715
-rw-r----- 1 bareos bareos 5.0G Aug  9 01:00 nfhome-4716
-rw-r----- 1 bareos bareos 1.2G Aug  9 01:05 nfhome-4717
-rw-r----- 1 bareos bareos 196M Aug 10 01:35 nfhome-4746
-rw-r----- 1 bareos bareos 243M Aug 11 01:04 nfhome-4761
-rw-r----- 1 bareos bareos 5.0G Aug 12 01:01 nfhome-4788
-rw-r----- 1 bareos bareos 5.0G Aug 12 01:08 nfhome-4789
-rw-r----- 1 bareos bareos 1.2G Aug 12 01:11 nfhome-4790
-rw-r----- 1 bareos bareos 5.0G Aug 13 01:18 nfhome-4817
-rw-r----- 1 bareos bareos  86M Aug 13 01:28 nfhome-4818
-rw-r----- 1 bareos bareos 1.9G Aug 14 01:10 nfhome-4845
-rw-r----- 1 bareos bareos 2.8G Aug 15 01:04 nfhome-4873

• Files should have a sensible size limit (here: 5 GB)
• HSM needs a suitable “MinFileAge”
• one job per virtual tape (file) to avoid writing earlier data of a particular file more than once (see the pool sketch below)
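These constraints map directly onto the Pool directives from the template on Page 8; for a volume set like "nfhome" above this amounts to the following sketch (the pool name is taken from the listing, the directives from the template):

Pool {
  Name = nfhome
  Pool Type = Backup
  LabelFormat = "nfhome-"        # volume files are named nfhome-<n>, as in the listing above
  Maximum Volume Jobs = 1        # one job per virtual tape (file)
  Maximum Volume Bytes = 5G      # sensible size limit per file on the HSM file system
  Recycle = no                   # volumes are never recycled, so earlier data is not rewritten
}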

Page 10

Backup of small DB servers

• mostly (very) small MySQL databases
• Cronjob 1: create dump files (for each database) several times a day, store them locally
• Cronjob 2: run daily and make sure that older dumps are deleted as necessary (a sketch of both cron jobs follows the listing below)
• Run Bareos several times daily

-rw-r--r-- 1 bcka bcka 2.1M 2014-09-17 11:05 itdb_2014-09-17T1105.sql.gz
-rw-r--r-- 1 bcka bcka 2.1M 2014-09-17 14:05 itdb_2014-09-17T1405.sql.gz
-rw-r--r-- 1 bcka bcka 2.1M 2014-09-17 18:05 itdb_2014-09-17T1805.sql.gz
-rw-r--r-- 1 bcka bcka 2.1M 2014-09-18 11:05 itdb_2014-09-18T1105.sql.gz
-rw-r--r-- 1 bcka bcka 2.1M 2014-09-18 14:05 itdb_2014-09-18T1405.sql.gz
-rw-r--r-- 1 bcka bcka 2.1M 2014-09-18 18:05 itdb_2014-09-18T1805.sql.gz
-rw-r--r-- 1 bcka bcka 2.1M 2014-09-19 11:05 itdb_2014-09-19T1105.sql.gz
-rw-r--r-- 1 bcka bcka 2.1M 2014-09-19 14:05 itdb_2014-09-19T1405.sql.gz
-rw-r--r-- 1 bcka bcka 2.1M 2014-09-22 11:05 itdb_2014-09-22T1105.sql.gz
-rw-r--r-- 1 bcka bcka 2.1M 2014-09-22 14:05 itdb_2014-09-22T1405.sql.gz
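A minimal sketch of the two cron jobs described above, assuming passwordless local MySQL access for the backup user, a dump directory of /var/local/db-dumps and a 14-day retention period (directory name and retention are assumptions, not taken from the slides):

#!/bin/sh
# Cronjob 1 (run several times a day): dump each database into a timestamped, compressed file
dumpdir="/var/local/db-dumps"          # assumed location
stamp=$(date +%Y-%m-%dT%H%M)           # matches the file naming in the listing above
for db in $(mysql -N -e 'SHOW DATABASES' | grep -Ev '^(information_schema|performance_schema)$')
do
    mysqldump --single-transaction "$db" | gzip > "$dumpdir/${db}_${stamp}.sql.gz"
done

#!/bin/sh
# Cronjob 2 (run daily): delete dumps older than the retention period
find /var/local/db-dumps -name '*.sql.gz' -mtime +14 -delete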

Page 11

Improved Security with RunBefore- and RunAfter-Hooks

• use a “private” network which only exists during the backup window and which contains two virtual network cards: one in the database server, one in the Bareos server
• grant the user “bareos” (which runs the local director daemon) permission via sudo to bring the virtual network interface eth1 up and to take it down again
• add these commands to the backup job (a sketch of the scripts follows below):
  RunBeforeJob = "dbsec-run-before.sh"
  RunAfterJob = "dbsec-run-after.sh"
• these shell scripts simply bring eth1 up and down, respectively
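A minimal sketch of what the hook scripts and the matching sudo rule could look like (the interface name eth1 comes from the slide; the exact paths, commands and sudoers entry are assumptions):

# /etc/sudoers.d/bareos-dbsec: allow the bareos user to toggle eth1 only
bareos ALL=(root) NOPASSWD: /sbin/ip link set eth1 up, /sbin/ip link set eth1 down

#!/bin/sh
# dbsec-run-before.sh: bring the private backup interface up just before the job starts
sudo /sbin/ip link set eth1 up

#!/bin/sh
# dbsec-run-after.sh: shut the private backup interface down again after the job has finished
sudo /sbin/ip link set eth1 down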

Page 12

Reporting (Mail)

Date: Sat, 29 Dec 2012 05:19:15 +0100
From: [email protected]
Subject: [ABR11.5]: Task 'Inkrementelles Backup' erfolgreich abgeschlossen auf Maschine 'RESERVE'.
To: [email protected]

(The subject translates to: task 'Incremental Backup' completed successfully on machine 'RESERVE'.)

1 Information 29.12.2012 05:01:15 Befehl 'Backup-Plan 'DBSEC_inkrementell_taeglich' wird ausgeführt' wird ausgeführt.
2 Information 29.12.2012 05:02:35 Befehl 'Backup' wird ausgeführt.
3 Information 29.12.2012 05:02:41 Backup von VM 'vm://host-859/421d7d4c-28e3-2497-d9b7-b3a2837b7dfd?type%3dvmwesx'.
4 Information 29.12.2012 05:02:45 Gewählter 'Changed Block Tracking'-Modus: 'CBT aktivieren und verwenden'.
5 Information 29.12.2012 05:02:46 Changed Block Tracking ist bereits für die virtuelle Maschine 'dbsec' aktiviert.
6 Information 29.12.2012 05:02:54 Snapshot 'Sat Dec 29 05:02:48 2012' wird erstellt...
7 Information 29.12.2012 05:03:02 Virtuelle Maschine 'vm://host-859/421d7d4c-28e3-2497-d9b7-b3a2837b7dfd?type%3dvmwesx' wird in das Laufwerk-Subsystem geladen.
8 Information 29.12.2012 05:03:03 VMware_VDDK: SSLVerifyIsEnabled: failed to open the product registry key. Falling back to default behavior: verification off. LastError = 0
...
14 Information 29.12.2012 05:03:08 Ermittelter 'GRUB 2 Loader' auf Laufwerk '\comp_emu(vm://host-859/421d7d4c-28e3-2497-d9b7-b3a2837b7dfd?type%3dvmwesx)
15 Information 29.12.2012 05:03:14 Der Linux-Gerätename kann 'sda' sein.
16 Information 29.12.2012 05:03:14 Der Linux-Gerätename kann 'sda1' sein.
...
19 Information 29.12.2012 05:03:18 Volume 'C:' wird analysiert...
20 Information 29.12.2012 05:03:18 Volume '1-5' wird analysiert...
21 Information 29.12.2012 05:03:21 Ermittelter 'GRUB 2 Loader' auf Laufwerk '\comp_emu(vm://host-859/421d7d4c-28e3-2497-d9b7-b
22 Information 29.12.2012 05:03:26 Der Linux-Gerätename kann 'sda' sein.
...
37 Information 29.12.2012 05:05:39 Ausstehende Aktion 152 wurde gestartet: 'Partitionsstruktur sichern'.
38 Information 29.12.2012 05:05:42 Entfernen des Snapshots (snapshot-2061).
39 Information 29.12.2012 05:06:01 Befehl 'Backup' wurde erfolgreich abgeschlossen.
40 Information 29.12.2012 05:06:23 Befehl 'Validierung' wird ausgeführt.
41 Information 29.12.2012 05:19:11 Befehl 'Validierung' wurde erfolgreich abgeschlossen.

Task 'Inkrementelles Backup' erfolgreich abgeschlossen auf Maschine 'RESERVE'.

• Several pages of useless gibberish
• How much data? On which media? </rant>

Page 13

:-)

#!/bin/sh
dbname="bareos"
username="postgres"
psql $dbname $username << EOF
SELECT starttime, round(readbytes/1024.0/1024.0, 2) AS "read [MB]"
FROM job
WHERE name LIKE '%$1%'
ORDER BY starttime DESC
LIMIT 500;
EOF

root@:conf.d# list-jobs.sh marvin

list-jobs.sh

• We even like the special appeal of bconsole
• But then we also like Emacs and even the other editor…

Page 14

Know thy Increments

http://adsm.org/lists/html/Bacula-users/2011-06/msg00159.html

#!/bin/sh
dbname="bareos"
username="postgres"
jobid=$1
psql $dbname $username << EOF
SELECT (SELECT st_size FROM decode_lstat(File.lstat)) AS Size,
       Path.Path AS Path, Filename.Name, File.MD5
FROM File, Filename, Path
WHERE File.JobId = '$jobid'
  AND Filename.FilenameId = File.FilenameId
  AND File.PathId = Path.PathId
ORDER BY Size DESC
LIMIT 20;
EOF

   size   |        path         |                 name                 |             md5
----------+---------------------+---------------------------------------+-----------------------------
 29420997 | /var/log/mysql/     | mysql-bin.000888                      | QdHJreWvGWiis4ZXNiqJSCqryuU
 19301500 | /var/lib/mlocate/   | mlocate.db                            | A5Nm9Aozf3pT2uSL0AXLNylp05I
  1147109 | /var/log/           | auth.log ...                          | n6pkSuuJxGhwAf+SXGeNv7fuw5k
   584751 | /var/cache/man/     | index.db                              | O4YFMt4ycElhgpV87dy5PF/rvRI
    58264 | /var/lib/apt/lists/ | de.archive.ubuntucid-updates_Release  | yNB+bjLB8eDvHHgqaq3AstMRB8M
    57254 | /var/lib/apt/lists/ | security.ubuntu.com_ubuntu_dists_se   | OJoiBYptg8/DYpYJ7Xf9NbNnwfw
    57242 | /var/lib/apt/lists/ | de.archive.ubuntu.com_ubuntu_dist     | BZKDq7xmy44tApRrvMLRkNeHjiw

largest-from-jobid.sh

root@:conf.d# ./largest-from-jobid.sh 5023

original code by John H. Pierce