Combining IBM Real-time Compression and IBM ProtecTIER Deduplication


IBM Systems and Technology

Thought Leadership White Paper 

 July 2011

Combining IBM Real-time Compression and IBM ProtecTIER Deduplication

Benchmark tests show that combining storage optimization technologies achieves compelling results


Contents

Introduction
Landmarks in the data optimization landscape
The need for data optimization in database backup and recovery
The test environment: An overview
Test 1: ProtecTIER deduplication only
Test 2: IBM Real-time Compression and ProtecTIER deduplication
Summary
For more information

Introduction

As the capacity and overhead of powering, cooling and managing larger amounts of storage continue to outpace the growth of storage budgets, IT decision makers are increasingly looking to optimization technologies to meet capacity demands while minimizing capital expenditures. Recently, two storage optimization approaches in particular have been receiving significant attention in the industry: real-time compression for primary and secondary data, and data deduplication for highly redundant backup data sets. Although sometimes viewed as mutually exclusive, the two technologies are, in fact, very complementary.

This paper discusses the compelling financial and operational advantages of deploying real-time compression and data deduplication in conjunction, as demonstrated by the results of tests in which IBM Real-time Compression and IBM® ProtecTIER® Deduplication solutions were combined to optimize Oracle database physical backups in a Network File System (NFS) environment.

The compelling results of combining IBM solutions for real-time compression and data deduplication in the Oracle database environment include:

● Greater than 82 percent immediate savings on initial write to disk.
● Greater than 96 percent overall data reduction when combined with deduplication.
● Up to 71 percent reduction in backup time.
● Less CPU utilization on the deduplication engine.
● Less disk activity in the deduplication subsystem.
● Less network traffic on the deduplication backup network.


 Figure 1: The data optimization landscape

Detailed test results are included later in this document. The bottom line is that combining real-time compression and data deduplication optimizes your overall storage footprint by reducing data on your primary NAS devices as well as throughout your data life cycle. By combining these two technologies, you can achieve maximum data reduction, which maximizes your return on investment and dramatically improves your data protection performance and capabilities.

Landmarks in the data optimization landscape

To fully appreciate the benefits of combining real-time compression and data deduplication, it’s important first to understand how each technology works, the differences between them, and where they fit in the overall storage architecture, as shown in Figure 1.


Real-time compression

Data compression reduces the size of data files so that less space is required to store them. Real-time compression, as the name implies, is the ability to compress data in real time, before it is written to the hard disk rather than after, without any noticeable performance degradation.
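The idea of compressing data in the write path can be sketched in a few lines of Python. This is a generic illustration built on the standard-library zlib module, not IBM's compression algorithm; the function name `write_compressed` and the sample records are assumptions made for the example.

```python
import io
import zlib


def write_compressed(chunks, sink):
    """Compress each chunk as it arrives and write only compressed bytes.

    `chunks` is any iterable of bytes; `sink` is a writable binary
    file-like object. Returns (bytes_in, bytes_out).
    """
    compressor = zlib.compressobj()
    bytes_in = 0
    bytes_out = 0
    for chunk in chunks:
        bytes_in += len(chunk)
        # Data is compressed before it reaches the sink, never after.
        bytes_out += sink.write(compressor.compress(chunk))
    # Flush whatever the compressor is still buffering.
    bytes_out += sink.write(compressor.flush())
    return bytes_in, bytes_out


# Hypothetical, highly repetitive "database-like" records.
records = [b"id=%08d;status=ACTIVE;region=EMEA\n" % i for i in range(10000)]
sink = io.BytesIO()
raw, stored = write_compressed(records, sink)

# Reading the data back returns exactly what was written.
restored = zlib.decompress(sink.getvalue())
print(f"raw={raw} bytes, stored={stored} bytes")
```

Because compression happens before the write, the uncompressed form never occupies space on the sink, which is the property the text describes.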

Designed to sit transparently in front of primary Network Attached Storage (NAS), IBM Real-time Compression offers the unique advantage of making it possible to shrink primary, online data in real time with no loss in speed. Backed by more than 30 patents, it reduces the size of every file you create by up to five times, depending upon the file type. It significantly reduces the physical capacity required to store a file (or copies and permutations of a file) through the entire data life cycle, including backup. IBM Real-time Compression also has a feature called the Compression Accelerator that enables the non-disruptive compression of data that has already been saved to disk, while applications continue to have random, read-write access to the data. IBM Real-time Compression can also significantly enhance overall network and storage performance, since less data is written to disk and more data can be stored in the storage cache.

As more server workloads become virtualized, real-time compression becomes increasingly valuable as a tool for storage optimization in virtualized environments. The technology works particularly well given the compression rates associated with virtualized files (see Table 1). As a result, many companies that have adopted file virtualization technologies are also exploring deployment of IBM Real-time Compression in conjunction with file virtualization. IBM Real-time Compression solutions transparently integrate with file virtualization solutions and can dramatically extend the cost reductions that file virtualization enables.

File type                          Compression rate
Database                           Up to 85 percent
Microsoft Office                   20 to 60 percent
VMware VMDK (virtualized files)    Up to 72 percent
CAD/CAM                            Up to 70 percent
Oil and Gas                        Up to 50 percent

Table 1: Compression rates


Data deduplication

Data deduplication is designed to reduce the physical storage required to store redundant data. The deduplication process removes duplicate data and replaces it with a pointer to the main copy, leaving only one copy of the data that actually has to be stored. This is why it is well suited for backup data, where there are typically multiple data sets (daily/weekly, for example) of mostly redundant data. The more copies of redundant data you have, the higher your effective deduplication rate.
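The pointer-to-a-single-copy idea can be illustrated with a toy chunk store. The sketch below uses fixed-size chunks and SHA-256 fingerprints purely because they are simple to show; it is a generic textbook approach and is not how ProtecTIER works. The class name `DedupStore` and the sample data are assumptions.

```python
import hashlib


class DedupStore:
    """Toy chunk store: each unique chunk is kept once; backups hold pointers."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}   # fingerprint -> chunk bytes (the single stored copy)
        self.backups = {}  # backup name -> ordered list of fingerprints

    def put(self, name, data):
        pointers = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            fp = hashlib.sha256(chunk).hexdigest()
            # Store the chunk only if it has not been seen before.
            self.chunks.setdefault(fp, chunk)
            pointers.append(fp)
        self.backups[name] = pointers

    def get(self, name):
        # Rebuild a backup by following its pointers.
        return b"".join(self.chunks[fp] for fp in self.backups[name])

    def stored_bytes(self):
        return sum(len(c) for c in self.chunks.values())


# Four distinct 4 KiB chunks on "monday"; "tuesday" changes only the last one.
monday = b"".join(bytes([i]) * 4096 for i in range(4))
tuesday = monday[:12288] + bytes([9]) * 4096

store = DedupStore()
store.put("monday", monday)
store.put("tuesday", tuesday)

nominal = len(monday) + len(tuesday)
print(f"nominal={nominal} bytes, stored={store.stored_bytes()} bytes")
```

Two backups of four chunks each are presented, but only five unique chunks are stored, and each additional mostly-redundant backup would add even less.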

IBM ProtecTIER Deduplication solutions feature revolutionary and patented HyperFactor® data deduplication technology. They provide enterprise-class performance, scalability and proven enterprise-level data integrity to meet disk-based data protection needs while enabling significant infrastructure cost reductions. Specifically, they provide improved backup performance, with sustained inline deduplication at up to 2000 MB/sec (7.2 TB/hour), and even faster restores at up to 2800 MB/sec (10 TB/hour). They also provide:

● The ability to scale to 1 PB of physical storage.
● A reduction in storage capacity consumption of up to 25 times or more.
● A non-hash-based approach that protects data integrity by reducing the risk of data loss due to hash collision.
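The hourly throughput figures quoted above follow directly from the per-second rates. A minimal check, assuming decimal units (1 TB = 1,000,000 MB); the function name is chosen for this example:

```python
def mb_per_sec_to_tb_per_hour(mb_per_sec):
    """Convert MB/sec to TB/hour using decimal units (1 TB = 1,000,000 MB)."""
    return mb_per_sec * 3600 / 1_000_000


backup_rate = mb_per_sec_to_tb_per_hour(2000)
restore_rate = mb_per_sec_to_tb_per_hour(2800)
print(backup_rate)   # 7.2
print(restore_rate)  # 10.08
```

The restore rate works out to 10.08 TB/hour, which the text rounds to 10 TB/hour.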

Technology differences

As described above and illustrated in Figure 1, real-time compression and data deduplication technologies address different problems and sit at different points in the data life cycle. But more importantly, the two technologies are complementary; in particular, deploying real-time compression significantly enhances the value and performance of data deduplication. This conclusion has been demonstrated in a series of performance tests, which are described in detail in this paper.

The need for data optimization in database backup and recovery

In general, backup and recovery refers to the various strategies and procedures involved in protecting a database against data loss and allowing for the reconstruction of the database after any kind of disaster. The performance and reliability of backup and recovery operations are critical to effective database operation.

Physical backups are backups of the physical files used in storing and recovering your database, such as data files, control files and archived redo logs.

Ultimately, every physical backup is a copy of files storing database information to some other location, whether on disk or some offline storage media such as tape. Backup performed after a database is properly shut down is called cold database backup. Conversely, backup performed when a database is online and fully functional is called hot database backup. During a cold backup, the database is shut down and unavailable; obviously, any technology that reduces the period of time that the database is offline is advantageous. In either case, due to the tremendous growth in data, it is becoming increasingly difficult for backups to complete within designated backup windows.


Figure 2: The test environment (components shown: IBM TS7610 ProtecTIER Deduplication Appliance Express, Tivoli Storage Manager, NDMP backup, Gigabit Ethernet switch, Oracle Database 10.2.0.4, three database clients, Quest Benchmark Factory, IBM Real-time Compression, IBM N5600-A10, Brocade 4100, LAN)


Although the benefits of utilizing database backups (either cold or hot) are clear, their use can result in the creation of large amounts of data that must reside in the storage environment, taking up precious disk space and increasing the complexity and cost of backup procedures. This is why primary storage optimization is so important.

The test environment: An overview

In order to simulate accurate and realistic data storage scenarios, IBM used Quest Software’s Benchmark Factory and Data Factory to create and populate an Oracle database running over NFS to an IBM System Storage® N5600-A10 storage controller. A 37 GB baseline database was then used to test the effects of data deduplication and compression, respectively. In each test, a seven percent daily change rate was simulated between successive database copies. Seven database copies were then taken to simulate a week’s worth of Oracle data sets in an enterprise environment through a combination of updates to existing data, additions of new data, and other database activities such as delete, drop, create and remove.
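To get a feel for the data volumes this setup implies, the following back-of-the-envelope model may help. It is an idealization, not a result from the tests: it assumes the database stays at 37 GB and that exactly seven percent of each later copy is new data, which ignores growth from additions and any inefficiency in detecting duplicates. All names are chosen for this example.

```python
BASELINE_GB = 37.0    # baseline database size from the test description
DAILY_CHANGE = 0.07   # seven percent change between successive copies
COPIES = 7            # one copy per day for a week


def nominal_gb(baseline, copies):
    """Total data presented for backup if every copy is stored in full."""
    return baseline * copies


def idealized_unique_gb(baseline, change, copies):
    """Unique data if only the changed fraction of each later copy is new."""
    return baseline + (copies - 1) * change * baseline


nominal = nominal_gb(BASELINE_GB, COPIES)
unique = idealized_unique_gb(BASELINE_GB, DAILY_CHANGE, COPIES)
print(f"nominal: {nominal:.1f} GB")          # nominal: 259.0 GB
print(f"idealized unique: {unique:.2f} GB")  # idealized unique: 52.54 GB
```

The gap between the two numbers is the redundancy that deduplication can exploit across a week of backups.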

Backup using Network Data Management Protocol (NDMP)

The test environment consisted of an IBM TS7610 ProtecTIER Deduplication Appliance Express acting as a Virtual Tape Library attached to an IBM Tivoli® Storage Manager server. In such a configuration, the Tivoli Storage Manager server controls the virtual tape library through a direct physical connection to the library robotics control port. (The library robotics, the IBM Tivoli Storage Manager server and the NAS file server are all connected over Fibre Channel.) For NDMP operations, the drives in the library were connected directly to the NAS file server, with a path defined from the NAS head to the virtual drives. The NAS file server transfers data to the virtual tape drives at the request of the IBM Tivoli Storage Manager server.

As shown in Figure 3, to allow Tivoli Storage Manager to use the virtual tape drives for non-NDMP operations, the virtual tape drives were also connected to the Tivoli Storage Manager server, with paths defined from the server to the drives.

This configuration also supports an IBM Tivoli Storage Manager storage agent having access to the virtual tape drives for its LAN-free operations.


Figure 3: NDMP architecture (components shown: Tivoli Storage Manager server, NAS file server, file system disks, optional web client, virtual tape library; legend: SCSI or Fibre Channel connection, TCP/IP connection, data flow, robotics control, drive access)


SAN configuration

All components with Fibre Channel connectivity (Tivoli Storage Manager server, TS7610 ProtecTIER Deduplication Appliance Express and IBM System Storage N5600 storage controller) were connected to a Brocade 4100 SAN switch in the test environment. Following Fibre Channel SAN zoning best practices, five zones were defined, each including one initiator and one target. Four zones were defined to connect ProtecTIER’s virtual tape library robotics and its virtual tape drives to the Tivoli Storage Manager server in a redundant manner, in order to enable Control Path Failover (CPF) and Data Path Failover (DPF) between the Tivoli Storage Manager server and the ProtecTIER virtual tape library. Control Path Failover and Data Path Failover were handled transparently by the IBM Tape Device Driver for Windows (IBM tape) installed on the Tivoli Storage Manager server. In addition, a single virtual drive was zoned to the N series to enable the NAS file server to transfer data directly to a virtual drive.
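The single-initiator, single-target zoning rule described above can be expressed as a small data model with a validation check. This is only a sketch: the paper does not list port names, so every alias below is hypothetical, and the layout of two Tivoli Storage Manager HBA ports crossed with two ProtecTIER front-end ports is an assumption about how the four redundant zones might be arranged.

```python
from collections import namedtuple

Zone = namedtuple("Zone", ["name", "initiators", "targets"])

# All zone names and port aliases are hypothetical placeholders.
zones = [
    Zone("tsm_a_pt_fe0", ["tsm_hba_a"], ["protectier_fe0"]),
    Zone("tsm_a_pt_fe1", ["tsm_hba_a"], ["protectier_fe1"]),
    Zone("tsm_b_pt_fe0", ["tsm_hba_b"], ["protectier_fe0"]),
    Zone("tsm_b_pt_fe1", ["tsm_hba_b"], ["protectier_fe1"]),
    Zone("nseries_pt_fe0", ["nseries_fc0"], ["protectier_fe0"]),
]


def zoning_violations(zone_list):
    """Return names of zones that are not single-initiator, single-target."""
    return [
        z.name
        for z in zone_list
        if len(z.initiators) != 1 or len(z.targets) != 1
    ]


def paths_for(zone_list, initiator_prefix):
    """List (initiator, target) pairs whose initiator alias starts with the prefix."""
    return [
        (i, t)
        for z in zone_list
        for i in z.initiators
        for t in z.targets
        if i.startswith(initiator_prefix)
    ]


violations = zoning_violations(zones)
tsm_paths = paths_for(zones, "tsm_")
nas_paths = paths_for(zones, "nseries_")
print(f"violations: {violations}")
print(f"TSM paths: {len(tsm_paths)}, NAS paths: {len(nas_paths)}")
```

With four independent paths between the backup server and the library, the loss of any single port still leaves a working control and data path, which is the precondition for the failover behavior the text describes.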

N Series configuration

The IBM N5600-A10 storage controller configuration included 28 Fibre Channel drives (144 GB, 15k rpm each) in a RAID-DP environment. The N5600 operating environment was ONTAP 7.3.3 for the tests.

IBM ProtecTIER Deduplication configuration

A ProtecTIER virtual tape library was defined with two virtual tape drives, each assigned to one Fibre Channel front-end port. To enable Control Path Failover, the virtual robot was assigned to both Fibre Channel front-end ports. ProtecTIER’s LUN masking feature was used to assign specific virtual devices to a specific host running backup application modules. This feature enables multiple initiators to share the same target Fibre Channel port on the ProtecTIER system without having conflicts on the devices that are being emulated.

Ten cartridges were defined to store backup data. To limit the nominal cartridge size, maximum cartridge growth was set to 200 GB. Under these conditions, as soon as a cartridge stores 200 GB of nominal data, it is marked “full” and another cartridge is used to back up data. Ten virtual tape slots were defined in the virtual tape library to house the ten virtual tape cartridges. By default, eight import/export slots were defined.
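The cartridge rollover rule can be simulated with a short Python sketch. It assumes that a backup may span cartridges and that only nominal (pre-deduplication) data counts toward the 200 GB limit; the function name and the error handling are choices made for this example, not ProtecTIER behavior taken from the paper.

```python
def fill_cartridges(backup_sizes_gb, max_growth_gb=200.0, cartridge_count=10):
    """Return nominal GB written to each cartridge, in order of use.

    A cartridge is treated as full once it holds `max_growth_gb` of
    nominal data; writing then continues on the next cartridge.
    """
    cartridges = [0.0]
    for size in backup_sizes_gb:
        remaining = size
        while remaining > 0:
            free = max_growth_gb - cartridges[-1]
            if free <= 0:
                # Current cartridge is full: move to the next one.
                if len(cartridges) == cartridge_count:
                    raise RuntimeError("no empty cartridges left")
                cartridges.append(0.0)
                continue
            written = min(free, remaining)
            cartridges[-1] += written
            remaining -= written
    return cartridges


# Seven nominal 37 GB backups, as in the test description.
usage = fill_cartridges([37.0] * 7)
print(usage)  # [200.0, 59.0]
```

Under these assumptions, a week of 37 GB backups (259 GB nominal) fills one cartridge and part of a second, well within the ten cartridges defined.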

IBM Real-time Compression configuration

The IBM Real-time Compression Appliance STN6800 (Version 3.7.0) with Gigabit Ethernet ports was used for the tests. 1-Gigabit Ethernet connections were established to the Gigabit Ethernet switch and the N5600 for connectivity between the Oracle server and the N5600 storage controller.


Tivoli Storage Manager configuration

A server running IBM Tivoli Storage Manager 5.5.4.3 Extended Edition was installed using Tivoli Storage Manager’s built-in configuration wizards. (An Extended Edition license is required to allow NDMP backups of NAS devices.) The Tivoli Storage Manager database size was configured to 2048 MB and the log size was configured to 1024 MB.

To initiate an NDMP backup from the Tivoli Storage Manager server, the backup node command was used to perform a full backup of the Oracle database files on the N5600 storage controller. A table of contents (TOC) was not created, as it is needed only for single file restore. The backup NAS process in Tivoli Storage Manager was monitored to measure backup time.

Test 1: ProtecTIER deduplication only

To illustrate the benefits of data deduplication, tests were performed to deduplicate the seven 37 GB cold backup sets using the ProtecTIER deduplication appliance only. Deduplication was performed while the data was being copied using NDMP from the N5600 storage controller to the ProtecTIER deduplication appliance.

Test 2: IBM Real-time Compression and ProtecTIER deduplication

To illustrate the added benefits of using real-time compression with deduplication, an IBM Real-time Compression STN6800 appliance was installed in front of the IBM N5600 storage controller. IBM Real-time Compression provided an immediate footprint reduction of the database file size from 37 GB to 6.6 GB, a reduction of over 82 percent. The introduction of IBM Real-time Compression provided immediate space savings, since the compression was performed in real time, when the data was written to storage. No post processing or configuration changes were required to realize these savings.
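The 82 percent figure can be checked from the two sizes given above. A small sketch, with function and variable names chosen for this example:

```python
def reduction_percent(original_gb, reduced_gb):
    """Percentage of space saved relative to the original size."""
    return (original_gb - reduced_gb) / original_gb * 100


saving = reduction_percent(37.0, 6.6)
ratio = 37.0 / 6.6
print(f"{saving:.1f} percent saved")  # 82.2 percent saved
print(f"{ratio:.1f}:1 compression")   # 5.6:1 compression
```

Expressed as a ratio, the same result is roughly 5.6:1.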

Clearly, both data deduplication and compression standing alone offer significant space savings over traditional, non-optimized storage. However, the benefits of combining these technologies are even more compelling.

Figure 4: Backup times for IBM Real-time Compression combined with ProtecTIER Deduplication, compared to deduplication alone (y-axis: backup time in seconds, 0 to 900; x-axis: day of backup, 1 to 7; series: IBM Real-time Compression with IBM ProtecTIER, and IBM ProtecTIER)


Figure 5: Space used with IBM Real-time Compression combined with ProtecTIER Deduplication, compared to deduplication alone (y-axis: used space in GB, 0 to 25; x-axis: day of backup, 1 to 7; series: IBM Real-time Compression with IBM ProtecTIER, and IBM ProtecTIER)

When the ProtecTIER deduplication solution was used to back up the data compressed by IBM Real-time Compression, the seven compressed backup sets were further reduced by an average of 39 percent. In addition, backup of compressed data took an average of 68 percent less time than backup in the absence of IBM Real-time Compression.
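Successive reductions multiply rather than add, which the sketch below makes explicit. It is a simplified composition of the two per-set figures quoted in the text (37 GB compressed to 6.6 GB, then a further 39 percent from deduplication); it is not a reconstruction of the overall data reduction reported elsewhere in this paper, which comes from the measured results. The function name is an assumption.

```python
def combined_reduction(*stage_reductions):
    """Compose per-stage reductions given as fractions (0.39 means 39 percent)."""
    remaining = 1.0
    for reduction in stage_reductions:
        # Each stage shrinks whatever the previous stage left behind.
        remaining *= 1.0 - reduction
    return 1.0 - remaining


compression = (37.0 - 6.6) / 37.0  # from the Test 2 sizes
dedup_on_compressed = 0.39         # average further reduction quoted above
per_set = combined_reduction(compression, dedup_on_compressed)
print(f"per-set reduction: {per_set * 100:.1f} percent")
```

Under this simple model, a second stage that removes 39 percent of already-compressed data adds a few percentage points to the total rather than 39, because it acts only on the fraction that compression left behind.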

Summary

While IBM Real-time Compression and IBM ProtecTIER Deduplication solutions both offer compelling storage and data protection benefits when used individually, the combination of the two technologies has been shown to produce far greater storage efficiency, significantly reduce backup times, and improve utilization of resources. Tests involving Oracle database physical backups have shown that together, these data compression and deduplication solutions are capable of producing benefits far exceeding those found with either technology alone, including 96 percent overall data reduction and up to 71 percent reduction in backup time, as well as lower CPU utilization on the deduplication engine, less deduplication disk activity and less deduplication network traffic. These results demonstrate strong synergies between real-time compression and deduplication and present a powerful argument for using both in order to achieve storage optimization.

For more information

To learn more about how IBM Real-time Compression and IBM ProtecTIER Deduplication solutions can optimize storage efficiency in your environment, contact your IBM representative or visit ibm.com/storage/solutions/rtc and ibm.com/systems/storage/tape/protectier/

Additionally, financing solutions from IBM Global Financing can enable effective cash management, protection from technology obsolescence, improved total cost of ownership and return on investment. Also, our Global Asset Recovery Services help address environmental concerns with new, more energy-efficient solutions. For more information on IBM Global Financing, visit: ibm.com/financing


Please Recycle

© Copyright IBM Corporation 2011

IBM Systems and Technology Group
Route 100
Somers, NY 10589
U.S.A.

Produced in the United States of America
July 2011
All Rights Reserved

IBM, the IBM logo, ibm.com and ProtecTIER are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarks are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the web at “Copyright and trademark information” at ibm.com/legal/copytrade.shtml

Other company, product and service names may be trademarks or service marks of others.

This paper is intended to provide information regarding IBM Real-time Compression Appliance (RTC) in combination with ProtecTIER Deduplication solutions. It discusses findings based on configurations that were created and tested under laboratory conditions. These findings may not be realized in all customer environments, and implementation in such environments may require additional steps, configurations and performance, compression and deduplication analysis. This information does not constitute a specification or form part of the warranty for any IBM or non-IBM products.

Information in this document was developed in conjunction with the use of the equipment specified and is limited in application to those specific hardware and software products and levels.

The information contained in this document has not been submitted to any formal IBM test and is distributed as-is. The use of this information or the implementation of these techniques is a customer responsibility and depends on the customer’s ability to evaluate and integrate them into the customer’s operational environment. While each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere. Customers attempting to adapt these techniques to their own environments do so at their own risk.

IBM may not officially support techniques mentioned in this document. For questions regarding officially supported techniques, please refer to the product documentation or announcement letters, or contact IBM Support.

This document could include technical inaccuracies or typographical errors. IBM may not offer the products, services or features discussed in this document in other countries, and the product information may be subject to change without notice. Consult your local IBM business contact for information on the product or services available in your area. Any statements regarding IBM’s future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. The information contained in this document is current as of the initial date of publication only and is subject to change without notice. All performance information was determined in a controlled environment. Actual results may vary. Performance information is provided “AS IS” and no warranties or guarantees are expressed or implied by IBM. Information concerning non-IBM products was obtained from the suppliers of their products, their published announcements or other publicly available sources. Questions on the capabilities of the non-IBM products should be addressed with the suppliers. IBM does not warrant that the information offered herein will meet your requirements or those of your distributors or customers. IBM disclaims all warranties, express or implied, including the implied warranties of noninfringement, merchantability and fitness for a particular purpose or noninfringement. IBM products are warranted according to the terms and conditions of the agreements under which they are provided.

 TSW03093-USEN-00