
Catalogic DPX™ 4.3

Best Practices Guide

dpx43211/18/2014bp


© Catalogic Software, Inc.™, 2014. All rights reserved.

This publication contains proprietary and confidential material, and is only for use by licensees of Catalogic DPX™, Catalogic BEX™, or Catalogic ECX™ proprietary software systems. This publication may not be reproduced in whole or in part, in any form, except with written permission from Catalogic Software.

Catalogic, Catalogic Software, DPX, BEX, ECX, and NSB are trademarks of Catalogic Software, Inc. Backup Express is a registered trademark of Catalogic Software, Inc. All other company and product names used herein may be the trademarks of their respective owners.


Table of Contents

Chapter 1: Technology and Solution Overview
    Audience and Purpose

Chapter 2: NetApp Storage System Guidelines
    General Considerations
    Storage and Sizing for Secondary Data
    Storage Configuration
    Existing and Multi-Use Storage

Chapter 3: Managing NetApp Storage Systems
    Servers and Data Grouping
    Job Creation and Scheduling
    Miscellaneous Considerations
    External Media Management and Device Control (Tape Libraries)
    Troubleshooting and Known Issues

Chapter 4: External Resource List
    Catalogic
    NetApp
    VMware

Chapter 5: Conclusion

TRADEMARKS

INDEX


Chapter 1: Technology and Solution Overview

Catalogic DPX™ is designed to protect data, applications, and servers using a myriad of storage technologies. This guide specifically describes the combination of DPX software and NetApp storage systems. The hardware and software components are configured to implement a system that protects data on supported client systems to NetApp storage and optionally archives the data to tape. This guide offers specific recommendations for system configuration, as well as general guidelines across all components, including data protection software, storage system hardware and software, and tape library configuration. This ensures that the overall solution operates optimally and fulfills a customer's specific data protection needs.

DPX is compatible with a wide range of NetApp storage offerings, including FAS and V-Series hardware, IBM N-series branded hardware, as well as the NetApp software product Data ONTAP Edge. Data ONTAP 7-mode is a supported destination for DPX Block backups. 7-mode and Cluster mode (CDOT) are both supported for NDMP backup.

For the latest system requirements and compatibility details regarding supported hardware, file systems, applications, operating systems, and service packs, go to System Requirements and Compatibility. Data ONTAP 7.3.x and later are supported; however, it is strongly recommended to run Data ONTAP 8.x or later. Data ONTAP 8.1/8.2 or later is preferred to take advantage of all current fixes and storage efficiency features.

This guide has been updated for DPX 4.3 and Data ONTAP 8.2. Differences in feature support on prior versions are noted where important.

Audience and Purpose

This guide is targeted at DPX implementation professionals and advanced DPX administrators. The guidelines listed are based on deployment and administration experience, as well as the best practices of the respective technology vendors. The document lists known parameters and configurations that lead to a successful DPX implementation. Use it as a tool when architecting a solution that fits a customer's specific data protection needs.

Implementing these best practice guidelines requires knowledge and understanding of the following published materials:

• DPX Deployment Guide at MySupport

• TR-3487 SnapVault Best Practices Guide

• TR-3466 Open Systems SnapVault (OSSV) Best Practices Guide

• TR-3446 SnapMirror Async Overview and Best Practices Guide

• TR-3505 NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide

• TR-3505i.a When to Select NetApp Deduplication and/or Data Compression Best Practices (available on request from NetApp or partner)

• TR-3965 NetApp Thin Provisioning Deployment and Implementation Guide

• Data ONTAP 8 Documentation


• Data ONTAP 8.2 Data Protection Online Backup and Recovery Guide for 7-Mode (lists specific limits for SnapVault, SnapMirror, and other limitations for specific devices)

• Data ONTAP 8.2 Storage Efficiency Management Guide for 7-Mode

• Data ONTAP 8.2 Data Protection Tape Backup and Recovery Guide for 7-Mode

• Data ONTAP 8.2 MultiStore Management Guide for 7-Mode

• Data ONTAP 8.2 Storage Management Guide for 7-Mode

• Data ONTAP 8.2 SAN Administration Guide for 7-Mode

• For additional information about NetApp licensing, read knowledge base article 42502.

• For additional information about FlexClone, read knowledge base article 45779.

Catalogic Software documentation and knowledge base articles can be obtained from MySupport. NetApp documents can be obtained from the NetApp Support site and, in some cases, directly from the NetApp reseller.

The following are also required to architect a successful data protection plan using DPX:

• The implementation of a monitoring and alerting framework and a storage procurement plan to manage the NetApp storage systems. These requirements are described in TR-3965 and are essential when utilizing thin provisioned space to avoid performance degradation and storage availability disruptions in production environments. NetApp technical support can make specific recommendations on software and procedures necessary to fulfill this requirement.

• Familiarity with the System Requirements and Compatibility information.

• Detailed knowledge of the environment to be protected:

• types of servers, versions of operating systems

• applications to be protected (structured/unstructured data)

• data volatility

• data compressibility

• locations of servers and data (local/remote)

• bandwidth and latency of network links between systems

• Firm understanding of data protection needs including:

• backup frequency and retention needs

• short-term recovery requirements

• long-term archival requirements

• replication/offsite requirements


• disaster recovery needs and facilities

DPX is integrated with and dependent on the following key NetApp technologies:

• Data ONTAP 8.0 or later with 64-bit aggregates, to support larger data stores and built-in storage efficiency features

• NetApp FlexVol and thin provisioning technology for efficient space management

• iSCSI LUNs and NetApp FlexClone, used to support fast lightweight access to protected data

• NetApp SnapVault Primary, SnapVault Secondary, and NetApp Open Systems SnapVault licensing, used for backup data transfer

• NetApp NearStore licensing to increase data transfer limits on the NetApp secondary storage device

• NDMP protocol for Block backup and recovery control, as well as tape operations

• Data ONTAP 8.2.1 or later for non-root account usage and MultiStore (vFiler) integration

Knowledge of these technologies and how they interoperate is crucial to understanding how the best practice recommendations build a strong foundation for data protection success.

The best practice guidelines follow the chronological flow of DPX implementation, starting with the initial setup of the NetApp storage system, configuration and sizing of storage requirements, followed by creation and scheduling of backup jobs to fulfill a data protection plan. This document also covers items specific to utilizing tape libraries.


Chapter 2: NetApp Storage System Guidelines

General Considerations

DPX supports Data ONTAP 7.3.x and later. 7-mode is required for the DPX Block solutions including agent, agentless, NetApp OSSV, controller to controller SnapVault, and NDMP backup. NetApp Cluster mode is supported for NDMP tape backup only.

It is strongly suggested that the NetApp controller run the most recent version of Data ONTAP to take advantage of storage efficiency features, general improvements to NDMP and resource management, and miscellaneous corrected issues. For older 32-bit controllers, Data ONTAP 7.3.7 is suggested. Newer 64-bit controllers should run either 8.1.3P1 or 8.2P3 or later. For additional information on known issues and important fixes, see "Troubleshooting and Known Issues."

Note: There is a critical A-SIS related Data ONTAP bug in early versions of Data ONTAP 8.2, described in "Troubleshooting and Known Issues."

When configuring network interfaces, ensure that the management interface, typically e0M, is not on the same subnet as other interfaces intended to transfer data. The e0M interface is typically a low bandwidth interface, 100Base-T on many controllers. Placing it in a data subnet, especially when ip.fastpath is enabled, leads to low performance because the e0M interface may be selected for sending or receiving data transfer traffic. Configure the management interface on its own subnet. If it cannot be isolated, the interface can be completely disabled, with management operations taking place over one of the provisioned data interfaces. Disabling ip.fastpath may also be a solution. For more information, see "NetApp Management Interface."
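As an illustrative 7-mode console sketch only (the interface name is typical, but the addresses, subnet, and option availability are assumptions to verify against your Data ONTAP version):

ifconfig e0M 10.10.50.5 netmask 255.255.255.0    # place e0M on a dedicated management-only subnet
options interface.blocked.mgmt_port_traffic on   # block data protocols on e0M, where the option is supported
options ip.fastpath.enable off                   # fallback if e0M cannot be isolated

Verify with ifconfig -a that no data interface shares the e0M subnet.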

Data ONTAP 8.x 7-mode high-availability pair controller configurations are supported when each controller is utilized as an independent storage device. However, "cluster failover" features and storage migration between NetApp nodes are not directly managed by DPX. A controller takeover that occurs during a backup fails the backup. Backup and restore operations to the failover controller are expected to work as long as the failover maintains the SnapVault relationship list and all SnapVault qtrees have been rolled back and quiesced by Data ONTAP. Controller takeover and giveback occur entirely outside of the data protection solution; assuming these operations are properly configured and successfully executed, the data protection software should not need any special configuration for backup and restore operations. The NetApp controllers move the necessary storage and IP addresses needed by DPX.

Data ONTAP versions earlier than 8.2.1 require use of the root credential to support DPX backup and restore operations. If the NetApp storage server is only used as an NDMP backup source, a non-root account can be used in any supported Data ONTAP version.

Data ONTAP versions earlier than 8.2.1 configured with a MultiStore license are limited to using vFiler0 as the backup destination and recovery source. See the "SnapVault and MultiStore" section of the Data ONTAP 8.2 7-Mode Data Protection Online Backup and Recovery Guide from the NetApp Data ONTAP 8 Documentation.

Data ONTAP 8.2.1 and later introduces a new NDMP security method used to support non-root accounts and to enable MultiStore configured controllers to use any vFiler for data protection operations. For additional information on setting up vFilers and scanning in nodes with non-root accounts, read knowledge base article 46640. However, note that use of non-root accounts and vFiler access may have security implications that are important to consider. The Deployment Guide makes the following specific security related recommendations:

• options ndmpd.authtype challenge

Page 8: Catalogic DPX 4.3 BestPracticesGuidedoc.catalogicsoftware.com/kb/Content/kb/docs/DPX43_BestPractices.… · SnapVault,SnapMirror,andotherlimitationsforspecificdevices) ... spacetoavoidperformancedegradationandstorageavailabilitydisruptionsinproductionenvironments.NetApp

Best Practices Guide General Considerations

Catalogic DPX™4.3 © 2014 Catalogic Software, Inc.

8

• options httpd.admin.ssl.enable on

The first option instructs the NDMP service to transmit account credentials using an MD5 hash, and the second sets up secure HTTPS access for Data ONTAP API control. Using these options, which encrypt user credentials over the wire, is strongly suggested; however, the combination of these secure features is only available to the root account, and only for vFiler0 where MultiStore is installed. For more information, see Enable Options and Services in the Deployment Guide.

Non-root account setup, including any account used for vFiler access, requires the use of the new NDMP authtype "plaintext_sso", which is effectively equivalent to "plaintext" in that it transmits NDMP credentials in the clear over the network. Note that vFilers, except for vFiler0, do not have an HTTPS service available to them and only support HTTP API access, which also transmits user credentials in the clear.
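The following console sketch consolidates the settings discussed above; which combination applies depends on whether the root account or a non-root/vFiler account is in use:

options ndmpd.authtype challenge        # root account, vFiler0: credentials sent as an MD5 hash
options httpd.admin.ssl.enable on       # HTTPS for Data ONTAP API control
options ndmpd.authtype plaintext_sso    # set instead for non-root or vFiler accounts; credentials in the clear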

The root volume (usually /vol/vol0) should not be housed on a large aggregate. It is recommended to leave the root volume in the default NetApp supplied configuration and not expand the aggregate containing the root volume to host other data. If the root volume encounters a data consistency issue and requires a Data ONTAP wafliron correction, the entire aggregate where the root volume is located needs to be scanned; large aggregates can take several days to run through such maintenance tasks. This condition is not a common occurrence. Consult your NetApp implementation engineer or NetApp technical support for any specific questions and concerns about adjusting Data ONTAP's root volume and containing aggregate.

FlexClone licensing is strongly suggested for all DPX implementations. DPX integrates with FlexClone features to streamline the DPX condense process, assist with recovery features such as Instant Access and virtualization, and avoid the need to break SnapMirror replication relationships. For more information, read knowledge base article 45779, or discuss FlexClone with your sales representative or data protection engineer. If you choose not to use FlexClone, consider reviewing the Data ONTAP SAN Administration Guide for 7-Mode with respect to setting the snapshot_clone_dependency option on each DPX data volume to avoid errors in the DPX condense process. Without FlexClone, most restore operations using a SnapMirror destination volume require that you break the SnapMirror relationship before the recovery action can succeed.

NearStore licensing is required to increase resource limits for concurrent backup and restore operations. NearStore is usually included with most modern Data ONTAP versions.

NetApp storage systems have specific NDMP kernel thread concurrency limits. In general, each data protection task requires an NDMP connection, which uses a Data ONTAP NDMP kernel thread. A DPX job generally initiates a separate task to control a backup or restore operation initiated for a specific device. For example, a Block backup of a Microsoft Windows server with two source volumes consumes two NDMP connections. Similarly, an NDMP tape backup consumes an NDMP session for each source volume in the backup job.

• Total concurrent NDMP operations are the sum of all backup, restore, NetApp OSSV, and SnapVault Primary transfer operations performed by a NetApp storage system. Each of these types of operations has its own specific limits; however, concurrency of all these operations is bounded by the system's NDMP kernel thread limitation.

• When architecting new backup jobs, it is important to account for all concurrent tasks that may already be running at the time a new job starts. Ensure that all concurrent jobs do not exceed the system's NDMP kernel thread limit. It is strongly advised to reserve at least 20 NDMP kernel threads at any given time as a buffer for unexpected job overlaps, emergency restore operations, and other ad hoc Data ONTAP data transfers.

• Very large client servers can generate a significant number of tasks in a backup job. Large Microsoft Exchange DAG clusters are especially prone to this. When the number of DAG data devices is large and replicated to many DAG hosts, there is a multiplicative effect on task generation that must be accounted for.


For example, if the nodes of a 4-node DAG cluster each contain 25 disk devices used for Exchange data, this translates into roughly 100 tasks to back up the entire cluster. Thus, a single job containing this cluster could affect other backup jobs running in parallel. Consider job scheduling and setup strategies which effectively avoid NDMP task concurrency concerns.

• NDMP backup/restore is limited to 40 concurrent operations. However, each NDMP transfer also counts against the NDMP kernel thread limits mentioned previously.

• Exceeding NDMP kernel thread or concurrent operation limits may lead to job failure.

When architecting a DPX backup solution to work with NetApp functionality managed outside of DPX, such as NetApp SnapMirror, qtree SnapMirror, Volume Copy, externally triggered NetApp SnapVault transfers, SnapDrive, SnapManager, and Snap Creator, confirm the specific technical limits that apply to your storage system models. Give special consideration when DPX is added to an existing NetApp storage system that serves multiple purposes, for example, primary and secondary storage. The above-mentioned functionality generally does not count against NDMP kernel thread limits; however, other storage system specific Data ONTAP limits may apply.

System limits and current usage can be determined directly from the Data ONTAP command line interface using the following commands:

priv set advanced

rsm show_limits

Most of these limits are subject to system queuing when concurrent resource requests exceed system maximums. However, note that backup and restore operations consume resources bound by the storage controller limits, especially the SV SRC and SV DST limits reported by rsm show_limits. Exceeding these limits may result in DPX job failures. One common conflict exists when an aggressive backup schedule, utilizing the SnapVault protocol, is run in parallel with aggressive SnapMirror schedules.

rsm show_limits prints a detailed review of available system resources and their cost. At the top of the output, this command prints Reservations, Tokens, and Transfers summaries. Reservations refer to resources that are reserved for specific operations. Reservations for volume SnapMirror are reported in the VSM figure and are controlled by the option replication.volume.reserved_transfers. SnapVault is reported by the QSM reserve figure and is controlled by the option replication.logical.reserved_transfers. Careful use of these options and reservations is recommended, as reserving these resources will prevent other operations from running even when no reserved transfers are in progress; the reservations remain idle. The Transfers section displays real-time information about current system use. Each operation has an assigned cost that, when underway, removes Avail_Tokens from the system resource pool. The MP VSM SRC/DST figures report resources used for volume SnapMirror transfers. SV SRC/DST figures report SnapVault protocol use, including DPX agent Block backup and controller to controller backup. Legacy SV SRC/DST figures report on NetApp OSSV agent transfers.
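As a hedged example, the reservation options referenced above can be reviewed and set as follows; the counts shown are placeholders that must be sized against your controller's reported limits:

priv set advanced
rsm show_limits                                     # review Reservations, Tokens, and Transfers
options replication.volume.reserved_transfers 2     # reserve VSM transfer slots (example value)
options replication.logical.reserved_transfers 2    # reserve SnapVault/QSM transfer slots (example value)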

For a secondary NetApp system dedicated to DPX storage, NDMP kernel thread use is generally the limiting factor for agent-based backup and NDMP tape use. The rsm show_limits resource figures are mainly of concern if you are coordinating multiple controller operations, such as running SnapVault and SnapMirror operations in parallel, using vol copy, coordinating OSSV transfers outside of the DPX product, or using other NetApp technologies in a mixed storage environment.

Agentless backups are not generally constrained by NDMP kernel thread limits, as these operations transfer data directly to and from LUNs over either iSCSI or Fibre Channel. Agentless backup also does not use any resources reported by rsm show_limits. NetApp LUN concurrency limits are significantly higher and do not have an effect on this functionality.


Consider how concurrent agentless data transfers to NetApp storage systems could affect overall performance; however, these jobs can be freely run concurrently with other agent-based jobs.

Customers familiar with the NetApp DFM and Protection Manager products have documented SnapVault fan-in limitations that are quite low, typically four relationships or less. DPX is different from those products, and the published limits for DFM and Protection Manager do not apply to DPX. It is typical for DPX to manage more than four relationships fanning into a single volume, including DPX client backup, OSSV agent, and controller to controller SnapVault. DPX has implemented SnapVault control to conform with all NetApp guidelines and best practices and has verified that its use of SnapVault and fan-in is well within the limits and expectations for Data ONTAP.

Storage and Sizing for Secondary Data

Any physical disk drives supported by NetApp can be used for secondary storage. It is typical to see secondary storage needs met with lower-cost and larger capacity SATA drives. When architecting DPX, consider the usable space available at the volume level after aggregate creation.

It is strongly recommended to follow NetApp's best practices for provisioning disks and creating storage aggregates. It is generally recommended to take a conservative approach, use the typical RAID-DP options, and provision the recommended number of hot spares. The following are not recommended:

• Using RAID4 with aggregate setup

• Eliminating hot spares

• Extending the aggregate used for the root/boot volume

Storage tuning is a useful method to increase the usable space for secondary storage. These are topics that should be reviewed and approved by NetApp support and the NetApp hardware implementation engineer, who can assess the storage configuration risks to the enterprise.

Storage needs for DPX depend on the size of existing data, frequency of backup, retention period, and change rate. Consult your Catalogic Software sales engineer for approximate storage requirement estimates for your specific environment and data protection policies. It is advised to take a conservative approach to initial storage provisioning, as it can be difficult to estimate what an environment's change rate and growth will be over time. Additionally, note that storage efficiency savings are not absolute and are inherently data dependent. A-SIS and compression may not be appropriate for all secondary storage volumes, and the savings achieved with either are highly dependent on the similarity and compressibility of the source data.

Short-term iSCSI restore operations, for example IA map and BMR, generally do not consume much space. However, longer term use, such as long-running RRP restores or use of IV for I/O intensive large data sets, could consume significant space in the volume containing the LUN. You may either reserve aggregate space to account for such use cases or regularly monitor aggregate space usage to avoid out-of-space conditions.

FlexClone volume clones temporarily count against the storage system's maximum volume limitation. This is of special concern if the total number of volumes used on a storage system for both primary and secondary data is very close to the storage system's published limits.


Storage Configuration

Create the largest 64-bit aggregates the system can support. This maximizes usable space by minimizing the number of drives dedicated to meeting RAID parity and hot-spare requirements. 64-bit aggregates are required to take maximum advantage of various storage efficiency options.

Aggregates for secondary storage may span storage shelves; this helps to create aggregates of the largest size supported by the storage controller.
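A minimal 7-mode sketch, assuming a hypothetical aggregate name and disk count; size the disk count to your shelves and spare policy:

aggr create aggr_dpx01 -B 64 -t raid_dp 44    # 64-bit RAID-DP aggregate from 44 spare disks
aggr status -s                                # confirm the recommended hot spares remain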

DPX does not use aggregate-level snapshots, so it is highly suggested that aggregate-level snapshots be disabled on aggregates hosting secondary backup data. Aggregate-level snapshots unnecessarily trap blocks that otherwise expire and are removed in the course of normal operations. Before disabling aggregate-level snapshots, check that the NetApp storage system does not require the use of these snapshots for other Data ONTAP features such as NetApp SyncMirror.

Disable snapshot reservation and NetApp scheduled snapshots on all volumes that contain backup data managed by DPX. The snapshot reservation wastes space in this case, and snapshot scheduling can have unintended side effects, such as retaining blocks of data that should be expired and using up limited snapshot copies that are needed for new backups.
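A sketch of the corresponding 7-mode commands, using placeholder aggregate and volume names:

snap sched -A aggr_dpx01 0 0 0    # disable aggregate-level scheduled snapshots
snap reserve -A aggr_dpx01 0      # release the aggregate snapshot reserve
snap sched dpx_vol01 0 0 0        # disable scheduled snapshots on the backup volume
snap reserve dpx_vol01 0          # remove the volume snapshot reserve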

Data ONTAP 8.0.x cannot migrate existing SnapVault data from 32-bit to 64-bit aggregates. It is advised to create 64-bit aggregates and re-base any existing SnapVault data to the 64-bit aggregate. Once the retention period requirements are met on the new destination, the 32-bit aggregate can be removed and the disks merged into an existing or new 64-bit aggregate.

Data ONTAP 8.1 and later can automatically convert 32-bit aggregates to 64-bit. You must add enough disks to the aggregate such that the newly expanded storage exceeds the 32-bit storage limit. The aggregate converts automatically and in the background. A storage administrator can expect to see some performance degradation on aggregates that are in the process of migrating to 64-bit; however, all existing data and SnapVault relationships should be unaffected.
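Assuming Data ONTAP 8.1 or later, the in-place conversion described above is triggered when added disks push the aggregate past the 32-bit limit; a hedged sketch with placeholder names and counts:

aggr add aggr32_old -64bit-upgrade check 16     # preview whether 16 added disks cross the 32-bit limit
aggr add aggr32_old -64bit-upgrade normal 16    # add the disks and start the background conversion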

NetApp systems have a file size and LUN size limitation of 16 TB. This limitation applies to all of the recent versions of Data ONTAP and all NetApp controller modules, and it is the limit for any single volume you need to protect. For example, a client machine configured for agent-based backup cannot have any one file system larger than 16 TB. For agentless backup, the VM cannot have a VMDK that is larger than 16 TB. Other methods supported by DPX, such as file level or OSSV backup, can be used to protect file systems exceeding the 16 TB limit.

Use thin provisioning with proper monitoring, event alerting, and mitigation plans in place; see "General Considerations." Proactively monitoring space utilization at the volume and aggregate level is necessary to avoid the aggregate filling up. An aggregate running out of space has significant effects on any operation that the aggregate supports, including performance degradation and I/O errors. An aggregate that runs out of space sends all transferring SnapVault relationships into a quiescing or rollback state that deadlocks on resource contention. The only solutions are to add more space to the aggregate or to kill the SnapVault relationship and delete affected data. It is suggested that you use space monitoring and threshold alerting tools to closely monitor storage system space utilization.

A general guideline is to create thin provisioned destination volumes that are two to four times the space required for the initial base backup of the source data. The actual space required depends on the retention duration, backup frequency, and the expected change rate of the source data. For a typical server that is backed up once per day, a 14-day retention would require two to three times the destination storage, and a 30-day retention three to four times. For more complex backup/retention/change rate scenarios, contact Catalogic Software sales engineering or professional services for assistance.
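A worked example of this guideline, with entirely hypothetical numbers: a server with 500 GB of source data, daily backups, and a 14-day retention falls at the two-to-three-times end of the range, so a thin provisioned volume of roughly 1.5 TB is a reasonable starting point:

vol create dpx_vol01 -s none aggr_dpx01 1500g    # -s none = thin provisioned (no space guarantee)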


Enable deduplication (A-SIS) on all volumes used for secondary storage:

• Check the version of Data ONTAP in use. Older versions of Data ONTAP 8.2 must be upgraded to Data ONTAP 8.2P3 or later to avoid a potential data corruption issue when using A-SIS. See "Troubleshooting and Known Issues."

• For agent-based backup target volumes, disable the default automatic deduplication schedule. The completion of a SnapVault transfer automatically starts a deduplication task. There is no way to alter or control the automatic SnapVault deduplication process other than to disable A-SIS storage efficiency features at the volume level.

• For agentless backup target volumes, enable scheduled A-SIS deduplication on the volume if it is not already enabled. If a deduplication run is not scheduled, the data in the volume does not benefit from additional storage efficiency gains even if the destination volume is configured to support this. Schedule the operation to occur after the data transfer has taken place; in general, schedule a deduplication operation to occur one to two hours after the job is expected to complete. Scheduling frequency or exact timing is not critical; however, it is suggested to run deduplication at least once per day, preferably completing before any volume SnapMirror operations.

Change deduplication configuration and scheduling using the NetApp OnCommand System Manager utility or the command line interface:

• For agent-based backup, use the NetApp OnCommand System Manager utility to select the On-demand option for the volume. The same can also be accomplished using sis config -s - <path> to disable all deduplication schedules. The deduplication process is automatically triggered by the completion of a SnapVault operation and cannot be scheduled or overridden. If deduplication is undesirable, for reasons such as the data does not deduplicate well or there are performance concerns, it can be disabled on a volume-by-volume basis or across the entire NetApp storage system.

• For agentless backup, it is suggested to use the OnCommand System Manager Scheduled volume option. See also the sis config manual page for command line scheduling options. Configure a schedule to begin the deduplication process after the backup job is complete, or at some point during the day where starting such a process has minimal overlap with other processes that may consume storage system resources.

• Current versions of Data ONTAP 8.x and later support up to eight concurrent deduplication processes across a storage system controller and one active process per volume. Data ONTAP queues any outstanding deduplication requests.

• The Automated schedule requires that the default data growth threshold of 20% be crossed before initiating deduplication. See TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide. This is set using the OnCommand System Manager Automated volume option or by issuing sis config -s auto <path> from the command line.

• It is not generally suggested to use the Automated deduplication setting for all volumes. Consider using this feature on some of the volumes if the number of concurrent deduplication processes is of concern or if the average size of an incremental data transfer generally exceeds the 20% threshold. Automated deduplication is not suggested for volumes receiving DPX agent Block data through SnapVault, as the SnapVault protocol triggers A-SIS automatically.
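The following sketch summarizes the per-volume A-SIS settings discussed in this list; volume names and the schedule time are placeholders:

sis on /vol/dpx_agent01                       # enable A-SIS on an agent-based (SnapVault) target
sis config -s - /vol/dpx_agent01              # no schedule; SnapVault completion triggers deduplication
sis on /vol/dpx_vmware01                      # enable A-SIS on an agentless target
sis config -s sun-sat@22 /vol/dpx_vmware01    # nightly run at 22:00, after backups complete
sis config -s auto /vol/dpx_other01           # Automated: runs after roughly 20% new data
sis status                                    # review active and queued deduplication operations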

Compression features may be considered to help optimize secondary storage usage. Enabling compression may impact storage system CPU and memory usage patterns and should be tested within your specific environment prior to ongoing use. See TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide, for NetApp recommendations, as well as the additional suggestions for overall DPX implementation in this document.


Inline compression is available in Data ONTAP 8.0 and later and should be tested on a volume-by-volume basis until an acceptable balance between resource use, performance, and data protection goals is achieved. Data ONTAP 8.1 and later offer the post-process compression feature, which can also be considered and tested to help defer compression operations and normalize storage system resource usage throughout the day.

Disable volume auto-grow for secondary data volumes used by DPX. It is generally better to thin provision and over-size the volume rather than use auto-grow features. In the field, auto-grow does not work well with SnapVault data transfers, generally causing them to fail and enter a quiesced state when the auto-grow action is initiated.

Disable auto-delete of older snapshot copies. Leaving this option enabled can potentially lead to recovery point data loss if the volume or aggregate starts to run out of space. During low space conditions, Data ONTAP attempts to delete older snapshots to reclaim space. Deletions of DPX-created snapshots are not synchronized with the DPX catalog; thus the catalog reflects recovery points that should be available but are in fact removed from the secondary storage.

Do not configure snapshot reserve or fractional reserve for secondary storage volumes containing DPX Block backup data. Configuring these does not interfere with any operations, but is unnecessary. Fractional reserve is a primary storage feature that is not used with DPX data. Snapshot reserve is not necessary since all recovery points are retained in snapshots. Setting a snapshot reserve removes the reserved percentage from certain storage reporting and only serves to confuse storage administration.
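A combined sketch of the volume settings from the last three recommendations, again with a placeholder volume name:

vol autosize dpx_vol01 off                    # disable volume auto-grow
snap autodelete dpx_vol01 off                 # never let Data ONTAP delete recovery point snapshots
snap reserve dpx_vol01 0                      # no snapshot reserve on Block backup volumes
vol options dpx_vol01 fractional_reserve 0    # fractional reserve is a primary storage feature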

There are no explicit requirements with regard to volume naming conventions; however, the following recommendations may help simplify ongoing management of the DPX solution:

• Keep volume names as short as possible because NetApp limits qtree path names to a total of 63 characters. qtrees are used for primary storage (CIFS/NFS), NetApp OSSV backups, and all other DPX Block backup data repositories:

• Data ONTAP imposes limitations on the total path name length, and some of the file names created during the backup process can be lengthy. Keeping the volume name short helps avoid conflicts in the backup and recovery process.

• There are some recovery processes that may require keying in a volume name manually or selecting it from a list. Shorter names are easier to identify and find.

• qtree names generated by the data protection solution are constructed from a combination of the backup job name, node name, and a task-specific identifier. The NetApp volume path name, backup job name, and the server node name can be controlled by the user. The task-specific identifier is an internal reference that is generated based on the job type, device type, or both.

• Avoid naming conventions where volume names are very similar. For example, avoid using volume names that are prefixed or suffixed with long strings of identical characters. Doing so generally makes it difficult to sort, search, and enter volume names.

• It is best to make volume names as unique and as readable as possible, preferably using some consistent naming convention that is obvious and easy to understand for all individuals involved in managing and maintaining the solution.

• Consider adding a short prefix that associates the volume with some function. Examples include designation of location, department, or server function. Additionally, consider suffixing the volume name with something that identifies the retention period. Large storage systems with many aggregates may also benefit from a keyword or letter to designate the purpose or location of a specific volume.


• Avoid special characters, for example *, &, !, non-printable ASCII, and other extended character sets when naming volumes, aggregates, jobs, and logical node names, as these may be misinterpreted by the software. Data ONTAP does not allow qtrees to contain non-ASCII characters.

Monitor aggregate and volume space usage on an ongoing basis and avoid situations where the aggregate or individual volumes run out of space. Full or nearly-full aggregates can suffer from performance degradation, and it is important to configure your storage system to avoid such circumstances. It can be challenging to correct a low space condition without deleting data, adding new storage, or moving data around. A volume running out of space is easier to address, provided the containing aggregate is not full or nearly full; however, a full volume affects all backup and restore operations for which it is a destination. A volume running out of space also leads to qtree rollback operations, which generally require adding more space to correct and, depending on the data size, can take several hours or days to remedy. See "Storage and Sizing for Secondary Data" for the space monitoring, alerting, and mitigation plan that must be in place to properly manage NetApp storage servers using thin provisioning.

Consider the Data ONTAP total volume count limits. Lower-end NetApp storage systems are limited to 200 volumes per controller, while higher-end models have a limit of 500 volumes per controller. This includes volumes used for any purpose, such as primary storage, secondary storage, the root volume, and volume FlexClones. This is an important factor to consider when grouping servers into volumes and in any scenario where NetApp storage is not dedicated entirely to DPX.

Configure and enable storage efficiency features, such as deduplication and compression, on the volume prior to running the first backup job. The greatest benefits are realized when new, empty volumes are used for the initial base backups, followed by scheduled incremental backups. Although it may be possible to run deduplication and compression on existing data, this does not generally result in immediate storage savings and may require additional space and I/O for processing. When carrying out storage efficiency post-process operations on existing data, maximum benefits are typically not attained until all data trapped in existing snapshots has expired.

Enabling inline compression indiscriminately on all configured volumes is not recommended. Inline compression algorithms consume CPU and memory resources that can overburden a NetApp storage system, resulting in degraded performance. Test compression features prior to ongoing use on specific volumes containing representative data. This is especially true for lower-end FAS 2xxx series models and storage systems with mixed workloads, for example, production data and backup on a single storage system. Conduct testing to determine if compression should be enabled, on which volumes, and how the use of the feature affects overall storage system performance. The backup secondary storage use case is much different than typical primary storage I/O use; backup tasks send a continuous stream of data, all of which is subject to compression as the controller receives it. Inline compression is not recommended when controller CPU use is above 50% utilization. Inline compression slows down the backup process by as much as half or more, based on controller size and available capacity.

As data compressibility and performance impact are inversely related, consider disabling compression on volumes where efficiency gains from compression are not significant. Leaving compression enabled in such cases does not produce meaningful storage efficiency gains, yet it consumes storage system resources. Similarly, do not attempt to compress data that is already highly compressed (for example, file servers holding many JPG images or ZIP files). Note that the more compressible the data is, the lower the system resource overhead and performance impact on the storage system.

Deferred volume compression has also delivered favorable results. Deferred compression is a scheduled task which is monitored, limited, and controlled by Data ONTAP, similar in concept to how A-SIS is controlled. Data ONTAP schedules and limits resources consumed by deferred compression so that it has minimal effect on other important controller operations such as backup and primary storage use.

Consider the impact of deduplication and compression on NDMP tape backup:

• Dump backup – data is rehydrated and uncompressed in memory before being written to tape. This inflates the amount of data which goes to tape and consequently requires media equal to the amount of logical data being stored.


• SMTape backup – all volume attributes are preserved and the data is written to tape with deduplication and compression savings intact. Note that the storage system used for the restore operation needs the same or a later Data ONTAP release and all options and licenses enabled for the restored data to be accessible.

• For a more detailed description of the Dump and SMTape backup methods, see the DPX Deployment Guide at MySupport.

Consider storage efficiency features such as deduplication, compression, or both on agentless backup destination volumes when the VMware source of that data resides on an NFS data store. For VMware NFS data stores, the initial base backup may transfer the entire allocated VMDK, even if the virtual disk is thin provisioned and not all of the source blocks are occupied by data. Although the base backup needs to transfer the entire VMDK footprint once, deduplication, compression, or both should eliminate the destination storage impact of the unallocated blocks, thus saving space on the NetApp volume. Ongoing incremental backups transfer only blocks changed since the initial base backup. The behavior described above is consistent with the VMware Changed Block Tracking features detailed in VMware KB article 1020128.

Inline compression can be used as a tool to reduce the required landing zone for agentless backup data originating from NFS datastores where initial CBT tracking is not possible. This does not reduce the amount of base backup data sent to the controller, but it may drastically reduce the initial storage required to hold the base backup. Once the base backup is completed, inline compression can be disabled or replaced with deferred compression.

DPX can track VM locations when VMs are part of a higher-level object, such as a resource group. Note that VMware does not maintain CBT when a VM is storage vMotioned to another datastore. If a storage vMotion occurs, the CBT is lost, and this results in a DPX base backup transfer for the affected VM. If the moved storage is hosted on an NFS datastore, then the base transfer might also include all data for the device, not just allocated blocks. If you expect VMs to be moved frequently, account for this in your secondary storage plan. You will either need to monitor and extend the volume size each time a base transfer is needed, or back up the VM resource group on a short retention cycle which expires and frees up space frequently. The agent-based solution is preferred here, as agent-based backup continues incrementally regardless of how or when the VM's storage is migrated.

When utilizing SnapMirror between NetApp storage systems, consider the bandwidth needed to transfer the base backup data sets and a prudent synchronization schedule:

• If tape is available, consider using SMTape backup and restore to seed volume SnapMirror operations.

• Where tape is not available or inconvenient, configure the SnapMirror relationship on empty volumes, before any other data transfer takes place. This establishes the relationship and initializes checkpoint facilities, so subsequent updates benefit from checkpoint restart should data transfer interruptions occur. SnapMirror can be established after the source volumes contain data; however, this requires that all of the source volume data be successfully transferred before checkpoint data can be established. An interruption in the SnapMirror initialization may require a larger set of data to transfer from the beginning.

• Configure SnapMirror synchronization to occur once per day, after the backup is completed (see the sketch after this list). Do not configure SnapMirror synchronization to run multiple times per day; this is unnecessary, wastes available system resources, and can interfere with other important operations. A conservative approach would be to configure the volume SnapMirror to occur 12 hours after the backup has completed.

• Consider running SnapMirror schedules outside of the backup window when possible. This avoids the controller resource conflicts between SnapVault and SnapMirror operations indicated in "General Considerations."


• It is not recommended to use SnapMirror Sync or Semi-Sync with DPX. These Data ONTAP features are intended to be used as near real time storage mirroring for primary storage use cases. The SnapMirror Sync and Semi-Sync features do not add any additional protection to DPX secondary storage volumes. Using them may impose unnecessary load on your controller when backup, restore, A-SIS, compression, or condense operations are underway. Additionally, it is unknown whether data on the SnapMirror Sync/Semi-Sync destination will be usable for restore if the mirror is interrupted in the middle of a backup, snapshot, or A-SIS operation. If you want to replicate secondary storage, asynchronous SnapMirror is the preferred and supported method.

• Creating FlexClone copies from SnapMirror destination snapshots may lead to SnapMirror replication errors if the source DPX host continues a normal cycle of backup and condense. If a recovery operation such as an IA, IV, or agentless recovery lasts for a very long time, the SnapMirror source may expire that snapshot and replicate the change to the SnapMirror destination. The replication fails because the destination snapshot data is being held busy by the recovery operation. This condition can also occur if a recovery operation fails in a way that leaves FlexClone volumes behind, or if a manual FlexClone is created from the SnapMirror destination. This is a known and expected behavior of Data ONTAP. You will need to either remove the FlexClone copy or arrange to manually split off that clone; the resulting data copy operation could take significant time.
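A hedged sketch of the once-daily asynchronous SnapMirror setup described in this list; controller and volume names are placeholders, and the 10:00 update time assumes the backup window ends earlier in the morning. On the destination controller:

vol create dpx_vol01_mir -s none aggr_dpx01 1500g
vol restrict dpx_vol01_mir                            # destination must be restricted before initialize
snapmirror initialize -S filer1:dpx_vol01 dpx_vol01_mir

Entry in /etc/snapmirror.conf on the destination (schedule fields: minute hour day-of-month day-of-week):

filer1:dpx_vol01 filer2:dpx_vol01_mir - 0 10 * *      # one update per day at 10:00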

iSCSI is a core requirement of the DPX data protection solution. iSCSI is mainly used for Block restore, verification, BMR, and agentless backup/restore. If the NetApp controller is not configured for iSCSI or is isolated from other production networks, DPX may not be able to perform the desired functions. NetApp interfaces that block iSCSI traffic via the interface.blocked.iscsi option will fail backup and restore operations that require the use of that interface. Entities that require access to a NetApp interface for iSCSI include hosts attempting IA maps, ESX hosts attempting agent or agentless virtualization restore, virtualization proxy nodes coordinating agentless backups, and BMR.

VMware ESX servers must be able to interact with iSCSI. The ESX server must have the software iSCSI initiator installed and enabled, and it must have one or more local NICs that can host a vmkernel service; iSCSI LUN attachment to the ESX server takes place on an available vmkernel interface. The vmkernel requirement disqualifies use of any NICs that are used for high availability and clustering functions where the interface and available IP addresses are not usable by the local ESX server. For agentless backup, your designated virtualization proxy nodes, which could be physical or virtual hosts, must have access to a NetApp interface to write iSCSI data. Additionally, the individual VMs hosted on an ESX cluster must be able to route to and access iSCSI sources on a routable NetApp interface to perform restore operations.

Existing and Multi-Use Storage

For NetApp systems that host primary (production) and secondary (backup) data, additional considerations apply. DPX backup jobs produce a different load characteristic than typical production storage use cases, for example, file sharing, application back-end data storage, and VMware storage. Block-level transfers produce concurrent data streams that are non-cacheable sequential write operations. Consider scheduling backup jobs outside of peak production hours to minimize degrading the performance of critical applications.

When architecting a data protection solution, note that hosting primary and secondary data on the same storage system introduces a critical single point of failure. A loss or failure of the sole storage system results in the loss of both production and backup data. If a storage system is to host primary and secondary data, it is strongly suggested that the secondary data be periodically moved to another location via tape or transmitted offsite via SnapMirror replication.

Do not attempt to use primary storage volumes as backup destinations.


Avoid mixing primary and secondary storage within the same aggregate. Create separate aggregates for primary and secondary storage needs. This helps isolate performance issues and prevent out-of-space conditions.

Where aggregates must host both primary and secondary data, it is strongly recommended to have monitoring, alerting, and space mitigation facilities in place as described previously. If these facilities are not in place and you do not have predictable and known storage growth patterns, consider utilizing space reservation on your primary data volumes, especially where LUNs are configured. Although this conservative approach may require allocating more space, it helps protect the primary data volumes from I/O errors if the secondary data fills the aggregate.

Give special consideration to backup of NetApp volumes containing primary LUN data. This includes NDMP backup or controller to controller SnapVault backup of volumes containing live LUN data. Typical scenarios include Fibre Channel or iSCSI attached LUNs used directly by applications such as SQL Server, Exchange, or VMware ESX data stores. When designing a solution that manages the backup of LUN data, verify that there are no attempts to quiesce the source LUNs, and note that the contents of the LUNs are neither cataloged nor searchable. Review how the LUN data can be placed into a mode that results in an application consistent backup, and confirm that the tools, systems, and procedures necessary to restore the LUN data are available in a usable way to the appropriate application. Data consistency is usually accomplished by quiescing the LUN data in some way such that a valid volume snapshot can be taken. DPX can then back up this application consistent data either via Data ONTAP SnapVault or NDMP backup, or the data can be replicated via SnapMirror. Some applications such as SnapManager can arrange to create a predictable Snapshot copy name, which is used to build a scheduled NDMP backup job.

Similar consideration applies to NFS data stores used by ESX servers for VMDK virtual machine storage. When backing up these volumes directly, with either NDMP tape backup or SnapVault Data ONTAP primary to secondary transfer, DPX does not quiesce VMDK storage nor catalog the contents of VMDK files. This is similar to the behavior described previously for volumes containing primary LUN storage.


Chapter 3: Managing NetApp Storage Systems

Install NetApp OnCommand System Manager onto one or more management workstations. This is the NetApp recommended free tool for storage system management and monitoring. Do not rely on the built-in web-based NetApp FilerView application, as it is unavailable with Data ONTAP 8.1.0 or later.

Install a full-featured telnet/SSH terminal emulator that can easily capture output. The Windows telnet client is generally cumbersome to use. SSH is preferred for its security and ability to be scripted. UNIX and Linux machines are especially useful for SSH and scripting; however, Windows variants do exist. For an example of SSH scripting, read knowledge base article 46630. RSH can also be used for scripting purposes; it is much easier to set up, but much less secure than SSH.
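
As a minimal illustration of the SSH scripting approach (key-based login and the controller name filer1 are assumptions for this sketch; the knowledge base article covers a fuller example):

# Capture basic controller status to a timestamped log file for later review.
ssh root@filer1 "version; df -A; snapvault status" > filer1-status-$(date +%Y%m%d).log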

Install an FTP client. Command-line FTP utilities are acceptable if you are familiar with them. Otherwise, a graphical FTPclient is easier to use. An FTP client is useful for occasionally collecting log files from the NetApp device.

Configure and test FTP access to the NetApp root volume (usually /vol/vol0). This is required for log collection if troubleshooting is necessary. The FTP password is not tied to the system’s NDMP or system password and can easily get out of sync. It is suggested to keep the FTP and NDMP/system passwords the same, so that the included support and troubleshooting utilities can get equal access as needed. As indicated previously, this should use the system’s root account, as the included software and tools all pull NetApp configuration information from a common place. If root account use with FTP is unacceptable, there may be some log collecting tools that will not work. However, the NetApp log files can usually be collected manually, either via an alternate FTP account or via other methods such as CIFS and NFS sharing. For more information regarding FTP access errors and NetApp log files, read knowledge base articles 45648 and 41798. If FTP access to your controller does not work, call NetApp technical support for assistance. Depending on what kinds of access your controller is licensed for, this could be a delicate operation to correct. Note that troubleshooting NetApp operations often requires reviewing a large number of logs by both NetApp Support and Catalogic Software Data Protection Technical Support, and the OnCommand System Manager tools currently do not permit downloading these files.
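
A quick illustrative check of FTP access and log collection (the controller name is hypothetical, and on 7-Mode systems the message logs conventionally live under /vol/vol0/etc/log; confirm the path for your release):

ftp filer1                    # log in with the root account when prompted
ftp> cd /vol/vol0/etc/log
ftp> get messages             # retrieve the current messages log
ftp> bye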

It is extremely important that the NetApp storage administrators understand that they should not delete, rename, relocate, or alter DPX snapshots or underlying destination volumes in any way. Do not use tools such as FilerView, System Manager, or the Data ONTAP command line interface to alter backup data in any way. This includes attempting to manually clean up space when aggregates or volumes fill up, unless instructed to do so by Catalogic Software Data Protection Technical Support. DPX manages the lifecycle of snapshots and will add, remove, and clone snapshots as needed and requested through the management console. The appropriate way to manage recovery data is to either alter retention periods or remove servers/jobs from within the management console. The space is reclaimed after the next condense operation.

Servers and Data Grouping

The master server component of DPX can generally support approximately 300 nodes protected via agent-based backups. This estimate assumes an average of two to three source volumes per server and once-daily backups.

Master server capacity is ultimately dependent on overall task scheduling and resources; if the total number of backup tasks or the job frequency requirements are higher, the master server capacity is lower.

Agentless backups are bound by different system constraints. Agent-based and agentless jobs can be freely intermixed in the backup schedule. Agentless backups do not conflict with NDMP kernel thread usage nor with other Data ONTAP limits for SnapVault and SnapMirror.

For predominantly agent-based deployments, it is recommended to deploy no more than 300 nodes on a master server. A small number of agentless nodes, 20 or fewer, can be added on top of this.

For mixed agent/agentless environments, it is recommended to limit the agent-based nodes to 220, with a maximum of 400 nodes including agentless. For large agentless environments, the 400-node maximum applies. Where necessary, lower and distribute the agent-based load across multiple masters to make room for additional agentless nodes.

Multiple NetApp storage systems can be configured for use with a single master server/Enterprise.

Each NetApp storage system added to the backup Enterprise should have a dedicated client node, otherwise known as an NDMP proxy server. Do not use a single server as a proxy for more than one NetApp controller. For small Enterprise configurations, 50 nodes or less, the master server can be the proxy for the NetApp controller. For larger Enterprises, avoid using the master server as a proxy for any NetApp controller. Instead, choose an individual DPX client machine to serve as a dedicated proxy node for each NetApp controller in the Enterprise, preferably on the same subnet.

An NDMP proxy may be a virtual machine; however, in larger environments CPU and memory resources may need to be reserved for this VM to ensure adequate performance.

When considering where to locate an NDMP proxy server, give preference to nodes on the same subnet as the NetAppcontroller.

Configuring multiple master servers with a single common storage controller is strongly discouraged. Sharing a single storage controller across multiple master servers/Enterprises complicates job scheduling and makes resource conflict management significantly more difficult. It is possible to scan a single NetApp controller into multiple Enterprises. However, care must be taken to ensure that the master server jobs are carefully scheduled to avoid exhausting NDMP kernel thread resources and causing SnapVault and SnapMirror resource contention.

Servers using DPX agent-based backup must be current on all recommended vendor operating system patches and application patches. Defragment Windows servers prior to running a DPX base backup. Configure Linux servers to fulfill all minimum requirements for LVM2 and free extents prior to DPX installation.

A Block backup job must back data up to its own dedicated volume. A NetApp volume must not host data for more than one backup job. Do not configure multiple backup jobs to share volumes; otherwise data retention, condense operations, and reclaiming of space become problematic. Sharing volumes across backup jobs also complicates deduplication and compression operations on the NetApp. See “Storage Configuration” on page 11 for more information on the NetApp file/LUN limitation.

Large implementations (more than 100 clients) should strongly consider grouping servers with like function, similar data, or both into backup jobs. Design job groupings around:

• Geographic location: Do not mix local and remote hosts in a single backup job.

• Data retention: Combine servers with similar data retention requirements.

• Source server data size: Do not combine server backups with drastically different data sizes. Ideally, for data transfer concurrency and job completion, all incremental backup transfers within the job would be roughly the same size, and therefore complete around the same time.

• Server type or server function: Avoid mixing dissimilar operating systems or dissimilar applications, as there may be limited deduplication benefits across diverse operating system platforms and data sets.

• Cluster nodes: Consider creating independent jobs for large clusters with many nodes and/or physical volumes.

• Business function: Consider organizational separation; however, note that grouping servers with disparate characteristics as described above may not benefit from deduplication.

• Application functionality: Consider breaking up large application servers into multiple jobs that focus on specific resources. This is especially helpful for large Microsoft Exchange DAG clusters, but can also be used for Microsoft SQL Server. Break jobs up with a focus on server devices, as the Block backup is inherently device-centric. Avoid creating such jobs where data is shared on common devices, as each backup job creates its own base backup and consumes the necessary secondary storage space to protect it.

In smaller implementations, it is recommended to dedicate a backup job to each server, backing up each job to a single volume. The primary reason is to improve scheduling flexibility and maintain stricter separation between servers, groups, and departments. Note the previously mentioned Data ONTAP model-specific volume limits. In addition, not grouping servers in a single job reduces potential deduplication benefits.

A DPX node cannot perform block-level backup for attached CIFS/NFS mount points. DPX can back up CIFS/NFS primary data on a NetApp controller either via Data ONTAP SnapVault primary backup to a secondary system or NDMP backup to tape. Alternatively, this data can be SnapMirrored to a second NetApp controller, and then the SnapMirror destination volume can be backed up via NDMP tape backup.

For remote site server backup, tuning the TCP socket keepalive settings for the master and clients is strongly recommended.

For more information about socket keepalive settings for Windows, Linux, and Solaris, read knowledge base article 46021. It is recommended to reduce the keepalive transmission interval to 10 minutes or less. The overhead that TCP keepalive packets generate is negligible even for very low bandwidth links, and reducing this setting helps avoid many common firewall, router, and latency issues that may time out idle or very slow moving network connections. Tuning keepalive settings on the master server is generally safe and has no adverse effect on the local high performance network. All remote clients should receive similar tuning. Local nodes generally do not need this kind of tuning.
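
As a hedged example for Linux clients (the sysctl key is standard Linux; the 600-second value simply reflects the 10-minute guidance above, and the knowledge base article remains the authoritative reference for each platform):

# Start sending TCP keepalive probes after 10 minutes of idle time.
sysctl -w net.ipv4.tcp_keepalive_time=600
# Persist the setting across reboots.
echo "net.ipv4.tcp_keepalive_time = 600" >> /etc/sysctl.conf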

Job Creation and Scheduling

General NetApp Storage Recommendations

NetApp storage systems impose a limit of 255 snapshots per volume. When considering job frequency and retention time, plan for no more than 245 snapshots in a volume. NetApp requires a few snapshots for general activities such as maintaining SnapVault relationships, managing rollback operations, Dump/SMTape tape backup requests, SnapMirror transfers, and others. Each backup job run results in one snapshot copy that preserves that recovery point-in-time; this is regardless of the number of source servers, tasks, or amount of data contained in the destination volume.
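
Because each job run produces exactly one snapshot on its destination volume, the retention arithmetic is straightforward. A worked example with assumed figures, plus a spot check of the current count (the controller and volume names are hypothetical):

# Budget: 4 job runs per day x 60 days of retention = 240 snapshots,
# just under the 245-snapshot ceiling recommended above.
# Spot-check the current snapshot count on the destination volume:
ssh root@filer1 snap list -n vol_dpx_dest1 | wc -l   # output includes a few header lines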

Also consider the size of the storage system when grouping servers into jobs and mixing job types. Higher-end storage systems such as the FAS6xxx series have higher resource limits and more CPU and memory to accommodate greater concurrency. Lower-end storage systems such as the FAS2040 have lower resource limits, which in turn necessitate less aggressive concurrency and more conservative job distribution and scheduling. Some features such as online compression might not be appropriate for lower-end storage controllers.

Note that backup traffic is not like typical primary storage use. Backup traffic generally consists of a continuous stream of sequential data writes that do not benefit from caching. Primary storage I/O is generally more random access and can benefit tremendously from caching. Backup traffic generally requires more CPU resources than typical primary storage use cases.

DPX Agent-Based Backup

For agent-based backup jobs, the number of concurrent tasks is limited by Data ONTAP’s internal NDMP kernel thread resources. Each DPX task corresponds to one NDMP session, which in turn uses one NDMP kernel thread. Concurrency of jobs and tasks must be planned and scheduled to ensure that the storage system’s resources are not exhausted. DPX tasks include the following:

• NetApp OSSV backup: Each path backed up corresponds to one task.

• NDMP backup: Each path backed up corresponds to one task.

• Restore: Restore jobs are similar to backup jobs for task usage; however, restores are usually performed on an ad hoc basis.

• Linux BMR: Each target disk that transfers data corresponds to one task; however, BMR is usually an ad hoc operation.

The total number of concurrent tasks must not exceed the storage system limits. Jobs can consist of one or more servers, and each source volume on each server uses at least one task. Thus, populate and spread out the schedule such that the total number of concurrent tasks running across jobs at any given time does not exceed the storage system’s internal limits.
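
A worked example of this task accounting, using assumed figures:

# 8 servers per job x 3 source volumes each = 24 concurrent tasks per job.
# Two such jobs overlapping in the schedule:
echo $((8 * 3 * 2))   # 48 concurrent NDMP kernel threads; compare against
                      # the model-specific Data ONTAP limit before scheduling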

In environments where SnapMirror scheduling is used and/or other NetApp data transfers are coordinated outside of DPX, you must account for this additional resource use and adjust the job schedule concurrency to accommodate it. See “NetApp Storage System Guidelines” on page 7 for additional details and references to Data ONTAP's resource limits.

A recommended starting point for DPX SnapVault job creation is four to eight servers per job, backing up to a single volume. However, the optimal server count depends on the number of source volumes, each requiring a single task, the amount of data to be backed up, and the expected duration of the incremental data transfer. See “Servers and Data Grouping” on page 18 for additional details on how to group servers into jobs.

The range of servers per job is flexible and should be adjusted based upon the data and servers included, as described above. For example, if the environment has a lot of very small servers with few source volumes, it is recommended to group more of such servers into a single job. Conversely, jobs for servers with large data footprints and many source volumes, such as large clusters, should contain fewer servers. Isolate clusters to their own jobs. Very large servers, especially those with a high data change rate, may benefit from being isolated in their own jobs.

Any job that is likely to initiate more than 75 parallel tasks is better broken down into several jobs, which can then be scheduled to reduce the overall parallel operations against a single controller. Very large application clusters, such as large Exchange DAG implementations, also benefit from splitting the data protection work across multiple jobs where possible.

Avoid mixing very large servers with small servers. It is better to group servers of similar size so that all parallel data transfers start and end around the same time; see “Servers and Data Grouping” on page 18. Mixing different sized servers still works; however, the NetApp snapshot does not take place until all of the servers in a backup volume have completed their transfer. Thus, the smaller servers that finish sooner may wait a very long time before their data is truly protected in a snapshot.

Some agent environments may require data transfer throttling to help manage job or server bandwidth utilization. Consider the following where bandwidth management is needed:

• For Windows 2008 nodes and later, node throttling is best handled with a local group policy. This applies a QoS setting to the node that can be global to the system or limited to DPX-related transfer. For additional information, read knowledge base article 45991.

• The DPX Block backup job “Set Source Options” dialog offers a throttling parameter that caps the maximum data transfer rate of each task in a DPX job. This management console parameter applies to each individual task; the total bandwidth consumed is limited to the number of tasks multiplied by the per-task limit (see the arithmetic sketch after this list).

• There is a new node throttling option for the nibbler process that can be used on nodes with Windows 2003 and later where local group policies are not desired. The option utilizes Windows QoS features at a per-process level. For more information, read knowledge base article 46361.

• NetApp OSSV agents also offer bandwidth management parameters. Consult the Overview section of the NetApp TR-3466 Open Systems SnapVault (OSSV) Best Practices Guide. Modification of several parameters in the source server’s snapvault.cfg, wan.cfg, and server.cfg files is necessary. Note that you cannot mix the throttling methods above; you must select only one for a job or a node. Attempting to mix methods may result in erratic or unpredictable transfer.
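
To illustrate the per-task arithmetic noted in the “Set Source Options” bullet above (the figures are hypothetical):

# A per-task cap of 10 Mbit/s applied to a job running 6 concurrent tasks:
echo $((10 * 6))   # 60 Mbit/s maximum aggregate transfer for the job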

DPX provides agents that are designed specifically to handle backup and restore integration for key applications. The block-level agent generally covers application support for Active Directory, Exchange, SQL Server, SharePoint, and Oracle. The DPX agent interacts directly with VSS on Windows platforms to quiesce applications for backup and to coordinate restore. The Linux agent integrates with the application and LVM2 to coordinate backup and facilitate restore.

NDMP Tape Backup

NDMP tape jobs share the pool of NDMP kernel threads and also have their own specific NDMP data transfer limitations. See “NetApp Storage System Guidelines” on page 7. When laying out NDMP tape and agent-based SnapVault jobs, create the jobs and populate the schedule such that the total number of NDMP kernel threads is not exhausted at any given time and the total number of concurrent NDMP tape operations is also not exceeded.

Note that NDMP backup has some specific limitations in regard to encryption and tape migration. NDMP encryption requires that the tape drive be connected to the NetApp controller. Tape migration can only be performed via the use of a DPX device server. Additionally, the migration of NDMP tape data has some limitations on the type of NDMP data, for both SMTape and Dump.

The following table describes the limitations of NDMP data migration:

Backup Type | Encrypted Backup | Migration Supported | Migration with Encryption
----------- | ---------------- | ------------------- | -------------------------
Dump | Y | Physically connect tape libraries to controller nodes and device servers, and then install device services on each device server. | N
Dump | N | Use automatic setup for multiple device servers and tape libraries using the Device Configuration Wizard. Note: The Device Configuration Wizard performs all the steps described in the manual process and acts on the entire Enterprise. | Y
SMTape | Y | N | N
SMTape | N | Y | Y

For a comprehensive review of NDMP backup and restore covering primary data, secondary data, and NetApp Cluster, see NDMP Backup and Restore in the Reference Guide. Note that NDMP dump of DPX Block-level client data, including agent and agentless backup, does not support incremental or differential backup due to how Data ONTAP tracks files by date for backup inclusion. All NDMP dump backups for DPX client data must be defined as full backups. DPX support for SMTape, at this writing, covers full volume backup only.

NetApp OSSV Agent and SnapVault Controller Management

NetApp OSSV agent backup is very similar to DPX Block backup with respect to job tasks and NDMP kernel thread consumption. SnapVault controller-to-controller backup management also consumes NDMP kernel threads. By abiding by the NDMP kernel thread limit suggestions above, OSSV and SnapVault management jobs can be freely intermixed with other jobs.

In addition to NDMP kernel threads, these backups also consume resources as described previously within rsm show_limits. Use caution when attempting to mix these jobs with other NetApp operations that are not directly controlled by DPX; for example, SnapMirror, SnapVault, volume copy, and similar operations initiated outside of the DPX management console.

It is strongly recommended that the host servers have the latest version of the OSSV agent installed. V3.0.1P3 is the minimum requirement, but newer versions for specific platforms may exist. If the version you have access to on the NetApp support portal is not this minimum version or later, you may need to use the search facility at the bottom of the software download window to find the latest version for your platform.

If you cannot locate the appropriate OSSV software version, contact NetApp technical support, specify the platform you are interested in, and indicate that you need access to the latest version, v3.0.1P3 or later.

Agentless Backup

The NetApp storage system constraints do not apply to agentless backups. Agentless backups work via iSCSI and are not subject to NDMP kernel threads or resource contention between SnapVault and SnapMirror. Agentless backup is bound by controller limits relating to LUNs and iSCSI. However, iSCSI is inherently limited by the bandwidth allotted to it via available network controllers. Although agentless and iSCSI are bound by limits that differ from SnapVault, the NetApp snapshot still does not take place until after all tasks in the job complete; thus it is still prudent to group servers of similar size where possible.

With any disk and network transfer architecture design, the available resources and limiting factors are different for each environment. The master server, backup job, and proxy suggestions provided below are generalities that can be extended if system resources and bandwidth are available for heavier workloads. Start with the suggested figures and incrementally increase them as needed until a balance between resources and backup speed is achieved. For agentless workloads, the general resource limits to consider are ESX server performance for VM I/O activity, VM snapshotting, Network File Copy (NFC) access, VMDK read performance, and network interface performance. Proxies are generally limited by the ESX NFC operations, storage performance, and network bandwidth. The NetApp storage is generally limited by iSCSI network transfer performance and the other familiar constraints on memory, CPU, and controller limits.

VMware’s vSphere 5 documentation contains many references describing these methods and their associated limits. See the vSphere 5 Documentation Center for various VM backup transport methods and ESX/ESXi NFC connection limits.

Note: DPX only supports backup of VMs controlled by a vCenter, so only the “through vCenter Server” figures apply.

At the start of each job, a resolution phase takes place. The job resolution connects to each vCenter to query for the VMs to back up. This query can take some time depending on the number of VMware resources, such as VMs and resource groups, that the environment contains. The VMs are then matched up against the capabilities of the proxies defined in the job. The SAN method is always preferred. The Hot-add method is the next preference, with NBD network transfer being the lowest. Proxies hosted on virtual machines automatically discover the appropriate method to use; this is controlled by the ESX server. There are DPX proxy parameters that can be used to prevent use of a particular method, for example, to prevent Hot-add use in favor of NBD; however, these settings are generally discouraged. It is better to use the highest performance proxy method the environment has available and/or select specific proxy nodes in a job definition to achieve the desired results. Note that there are no options to promote a proxy to be recognized as a specific type that the ESX API does not recognize. For additional information on proxy server optional settings, contact Catalogic Software Data Protection Technical Support.

The agentless proxy node function is divided into three different access methods:

• SAN: Applies only to proxy nodes with VMFS LUN attached storage. This is usually a physical node with a Fiber Channel HBA that can attach to VMFS SAN storage; however, iSCSI can also be used. The proxy coordinates VM snapshots, reads the VMDK data directly from the LUN, and transfers the backup to the NetApp controller via iSCSI.

• Hot-add: Applies to virtual proxy nodes only. The virtual proxy node must exist on an ESX server that has access to all of the datastores for the VMs you intend to back up. The proxy node must reside on a datastore that has a block size similar to the VM datastores to be accessed for backup. The proxy coordinates the VM snapshot, directly attaches to and reads the VM's VMDK data, and transmits this to the NetApp controller via iSCSI.

• NBD: Applies to physical or virtual proxy nodes. All data transfers take place across the network. Consequently, this method is generally the lowest performing option for backup. The proxy connects to the vCenter, requests a snapshot of the VM, connects to the ESX server, reads the VMDK data over the network, and retransmits this data to the NetApp controller via iSCSI. The VMDK data is read using the Network File Copy protocol, and each version of ESX has specific resource limits on how many NFC connections it supports.

All proxy servers are required to have iSCSI access to the NetApp controller. The proxy must also have access to the vCenter node. NBD proxies must have access to each ESX server. For restore operations, the ESX server must permit the ESX iSCSI software initiator to have access to the NetApp iSCSI storage interface.

Note that DPX attempts to match as many nodes to a preferred proxy as possible. The proxy type and order of preference is important to remember to avoid unexpected issues. For example, in a job that contains 10 NBD proxies and one SAN proxy where all VM node storage can be accessed via SAN, all VMs prefer the SAN proxy and ignore the other 10.

Before reading data, all VMs being backed up perform a VMware snapshot. For VMware-supported Windows guests with the VMware Tools service installed, the snapshot process invokes VSS to quiesce VSS-aware applications. The snapshot process may take time or possibly fail if the VM is under heavy I/O load. VM snapshots impose processing overhead on an ESX server, so it is important to consider how many snapshots an ESX server concurrently performs within a short period of time. The backup job schedule should correspond to a time when the VMs have a lower level of activity and the overall ESX server is under light load.

For reading data, the following guidelines are offered:

• SAN is limited by the number of simultaneous read operations made to the storage. Even for larger Enterprises where a proxy might open hundreds of VMDK files, read operations are not expected to impose any specific SAN fabric issues. The backup operation is generally limited by the maximum number of open file handles the proxy OS supports and the available bandwidth for reading data. Fiber Channel SAN is preferred here; however, iSCSI can also be used. Note that the data is written to the NetApp controller via iSCSI; thus, if there is only one interface available for iSCSI traffic, the read and write operations share the same bandwidth, which limits the performance gains the SAN method was designed to provide.

• Hot-add is similar to SAN in that read operations are generally limited by the number of open files permitted by VMware and the bandwidth available on the ESX server to read data from the datastores. The VM hosting the proxy must be on a datastore of identical block size to the VM datastores being backed up. The VM also has other VMware configuration requirements for Hot-add, including that all VM disks participating in backup are required to be SCSI and not IDE. The proxy VM must also have permission to read the datastores of interest, and the VM itself must reside in the same datacenter as the virtual machines it is backing up.

• NBD has tighter limits for concurrent data access to vCenter and ESX servers. The ESX server has version-specific NFC connection limits that cannot be exceeded. The link provided above describes the specific limits for each ESX version. The NBD proxy must have network access to connect directly to both the vCenter server and the ESX servers where the VMs exist.

For writing data, the primary resource limitation is the maximum network bandwidth between the proxy and the NetApp. Write performance is also limited by overall NetApp controller performance, such as I/O load, CPU use, and so on. One proxy attempts to write multiple parallel streams, and multiple proxies may operate in parallel against one NetApp. Since all iSCSI connections attach to one target, it is likely that all such transfers move data across the same NetApp controller network interface. It is recommended to configure and test which NetApp interface is receiving the inbound iSCSI backup traffic. NetApp can be configured with single port management interfaces, isolated/dedicated network interfaces, and interfaces that bond multiple links into one logical network connection. Ensure that the data protection iSCSI traffic uses the interfaces you intend.

Consult your NetApp documentation at NetApp Support for the specific controller limits pertaining to connected iSCSI hosts and LUNs. LUN maximums are shared between iSCSI and Fiber Channel. Lower-end controllers max out at 128 hosts per controller and 1024 LUNs. Larger controllers can handle up to 512 iSCSI hosts and 2048 LUNs. Each proxy that connects to a NetApp controller for iSCSI transfer counts as a host, and each VMDK backup in progress corresponds to one LUN on the controller.
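
A worked example of this accounting, with assumed figures:

# 3 proxies connected to the controller = 3 iSCSI hosts.
# 40 VMs in flight x 2 VMDKs per VM:
echo $((40 * 2))   # 80 LUNs in use, well within a 1024-LUN lower-end limit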

Avoid using NIC bonding on physical proxy nodes. In the field, NIC bonding has at times been shown to be unreliable, especially where the expectation is to increase network throughput. Also note that NetApp systems can have trouble with NIC-bonded hosts when the ip.fastpath option is enabled; see “Troubleshooting and Known issues” on page 30.

Although job and task concurrency are theoretically similar to agent-based backup, the pattern of load on the master server is different. Agentless jobs query source vCenters for the entire VM enterprise at the start of each job. For smaller, less complex Enterprises of fewer than 100 nodes, this should not be an issue. For larger and more complex Enterprises, the environment discovery can take some time and processing to accomplish. For this reason, it is recommended to avoid starting large numbers of agentless jobs at the same time. Rather, it is a good policy to determine how long the job resolution phase will take and space job starts apart in the schedule by this time. If job spacing for all jobs is not possible, limit the number of simultaneous job starts to five or less and observe the load imposed on the vCenter server and master server when simultaneous jobs begin. Based on performance, adjust the simultaneous job starts as needed. The number of concurrent jobs running at any one time can remain the same as the agent-based figures suggest.

From a host perspective, consider how busy the VMs might be at the time of backup and how much data they are likely to transfer. Consider constructing smaller or even single-node backup jobs for servers that will be I/O intense or generally have a lot of data to regularly transfer. For smaller nodes and nodes with little or no data to transmit, consider batching these into larger jobs. VDI machines are generally very lightweight and usually good candidates for batching into larger jobs.

Creating backup jobs can be more manageable and flexible if the VMs are organized into higher-level containers such as resource groups; each high-level container then gets a backup job to protect it. Using VMware containers such as resource groups also adds flexibility, as the backup job dynamically discovers all VMs in the group; all VM additions and deletions are accommodated.

For any agentless deployment, note the overall number of concurrent parallel iSCSI transfers a NetApp controller is hosting. Although the LUN and host limits for NetApp are fairly high, the published NetApp figures do not account for all of these connections participating in concurrent bulk inbound transfer of data. When architecting a job, strive to limit concurrent iSCSI transfers to about 50 and adjust this figure as needed based on the controller size and the performance observed on the controller.

For environments where NBD is the only option, consider a conservative approach to job creation. Each ESX server must not exceed its maximum number of parallel NFC connections or the tasks in the job fail. Recall that each VM is a task, each task utilizes one thread per VMDK, each VMDK consumes a VMware NFC stream, and each VMDK uses an iSCSI LUN to the NetApp for data transfer. Do not create jobs where the sum of all servers and protected devices exceeds the ESX server NFC limit. To generalize this case, the recommendation is five to ten servers per job, of average size one to three VMDK files, and to avoid job concurrency against a common ESX server. Evaluate the job tasks needed against the NFC resource limits and adjust your job accordingly. Since the limiting factor here is ESX NFC connections, job concurrency can be configured if jobs running in parallel back up VMs from different ESX servers.
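
As a worked example of the NFC accounting above (the per-host NFC ceiling varies by ESX version, so treat the comparison as illustrative):

# A job of 8 VMs averaging 2 VMDKs each, all hosted on one ESX server:
echo $((8 * 2))   # 16 concurrent NFC streams to compare against that
                  # ESX host's version-specific NFC connection limit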

The Hot-add transfer resource limitation is higher, but likely not as high as SAN. To generalize this case, limit the number of concurrent backups against a single ESX server to 25 concurrent servers of average size (one to three VMDK devices). This may be accomplished in a single job or in multiple jobs with their job startups spread out as indicated above.

SAN is generally considered the highest performing solution. Your resource limitations are likely to relate to the available bandwidth the proxies can read and the network bandwidth the NetApp controller can accept. Using the NetApp iSCSI recommendation reviewed previously, limit concurrent iSCSI VMDK transfers for a single NetApp controller to about 50. Adjust the job based on the resources consumed and data transfer performance. With a higher number of parallel nodes in a job, note the load imposed upon the ESX servers during the VM snapshot phase and adjust accordingly.

The number of proxies you select for backup jobs depends on the available data paths and the backup methods available to them. For a medium to large environment, it is strongly advised to plan out your proxy use and avoid selecting all proxies on your jobs. Large Enterprises with large numbers of proxies may consume significant time in the resolution phase reading the VMs in the environment and matching them up against proxy capabilities.

NBD has the lowest resource limits. If the NBD proxies are virtual, limit the job to two to four proxies; this is meant to avoid excessive network interface traffic for the ESX server. If the NBD nodes are physical, limit the proxy number to 10 or less, not to exceed the number of nodes being backed up in the job. The number of proxies can be adjusted; however, given the limits applied to NFC transfers, increasing the number of proxies does not necessarily improve performance.

Hot-add has higher performance but does consume resources from an ESX server. Limit the number of Hot-add proxies in a job to 10 and adjust this as needed based on ESX load and NetApp iSCSI performance. If your ESX datastores have varying block sizes, limit proxy selection to only include proxies hosted on VMFS datastores with identical block size.

SAN is the highest performing method. On a relatively fast SAN and a high performing NetApp iSCSI interface, start with 15 proxies for large backup jobs and adjust this as resources permit.

If your ESX storage is hosted on NFS, the NBD proxy method is the only option available.

In general, it is inefficient to configure more proxies in a job than there are nodes in the job. Proxies can be used in parallel for multiple concurrent jobs; however, note that the proxy resources and performance are shared across all requests. So, it may be a better use of resources to define jobs that do not overlap proxy use for concurrent job runs.

Agentless backup of Windows VSS-enabled applications may be possible but should be fully validated with the application vendor and tested (backup and restore) prior to production deployment. VMware makes specific requirements on the VM to support VSS backup; most notably, IDE disks are not supported, dynamic disks are not supported, and SCSI controllers must have an equal number of available IDs to perform backup. Some application configurations are not supported by their vendor for backup and restore through the VMware snapshot process. VSS-aware standalone server applications can be backed up and restored via agentless backup. Confirm with the application vendor that they support restored data from a VMware snapshot. SharePoint farms and most clustering configurations, such as failover clusters, Exchange DAG, and SQL AlwaysOn, require the use of an agent for proper backup and restore. Validate Active Directory server backup and restore with Microsoft; there are specific requirements needed for proper backup and to ensure that recovery does not cause undesired replication issues. Note that VSS integration only handles the backup and does not integrate with restore at the application level. Many applications restored in this way require a manual procedure to ensure that the application is online. For older servers and for applications that do not have VSS integration, a VMware Tools pre/post snapshot scripting facility is available to help pause the application for application-consistent backup.

In general, the DPX agent is preferred for application-aware backup, as the agent is specifically designed to support application interfaces for backup and restore.

Agentless backup of Linux does not have any special application integration. The Linux VMware Tools interface does provide a pre/post snapshot scripting facility that can be used to quiesce or pause applications for consistent backup.

For additional information on VMware virtual machine requirements and support for VMware Tools service VSS integration, see the vSphere 5 Documentation Center.

If you have questions regarding a specific application, Catalogic Software Data Protection Technical Support can supply some field-gathered data upon request. In general, refer agentless application consistency questions to VMware to validate proper configuration of the VM to support this, and to the application vendor to ensure hypervisor-initiated VSS snapshots are a supported backup and restore method.

File-level Backup and Catalog Maintenance

NetApp controller resource constraints do not apply to file-level backups. Since file-level jobs do not directly interact with the NetApp controller, these jobs can be freely interleaved with SnapVault, NDMP, agentless, and other backup jobs.

File-level job scheduling is constrained by the general master server limits described in “Servers and Data Grouping” on page 18.

When defining the data protection schedule, schedule master server catalog backup and condense operations. The general recommendations are:

• Run Condense at least once per day, preferably at a time when there is no activity or minimal activity.

• Run Catalog backup at least once per day, preferably at a time when there is no activity or minimal activity. Enable email notifications and archive catalog backup job logs for future reference. Retain your catalog backup for at least as long as your longest backup job retention.

For a comprehensive review of condense and catalog backup recommendations, see Maintaining and Protecting the Catalog in the User's Guide.

It is generally recommended to configure the tape library media changer on a supported DPX client; generally, Windows and Linux platforms work best. Note that if the media changer is configured on a NetApp controller, tape library operations could compete for NDMP kernel thread resources. Although rare, tape library operations could fail if there are not enough NDMP kernel threads available.

Miscellaneous Considerations

Consider the following recommendations and caveats if you are implementing DPX concurrently with the SnapDrive/SnapManager suite of products:

• SnapManager products must be the only backup software that controls an application’s transaction logs. If more than one backup application manipulates transaction logs, neither application can make use of the transaction logs for point-in-time recovery. DPX does not interact with Microsoft SQL Server transaction logs; however, DPX has options to disable Exchange transaction log and Oracle archive log processing, which may hinder backup and recovery with the NetApp SnapManager suite of products.

• SnapDrive/SnapManager restores may completely revert a volume to a previous point-in-time. If this is the case, the DPX change journal becomes inconsistent and a new base backup of that source volume is required.

• Develop proper scheduling to avoid a situation where SnapManager’s backup kicks off while a DPX job is active.

• Under some circumstances, SnapManager may copy large data files between previous backup snapshots and the current/live volume. In such cases, the DPX change journal registers the copy as changed blocks. The next incremental backup transfers all of the data contained in these blocks, leading to a snapshot that may be larger than expected and a longer-running job.

DPX cannot run in parallel with any other backup products. Backup products that install drivers and/or hooks into the Microsoft VSS framework must be disabled or possibly removed. The latter may be the case for products that utilize low-level kernel modules.

It is not recommended to run manually scheduled Microsoft VSS snapshots of volumes that are protected with DPX agent-based backups. These client-side snapshots consume space, use I/O resources, and may create conflicts for scheduled DPX backups.

Avoid manual VSS schedules with agentless backups unless they are needed to support application consistency with agentless backup. In this case, avoid scheduling the VSS snapshots at the same time as agentless VM backup. The VMware Tools interface also invokes VSS, and the extra processing of multiple concurrent VSS snapshots could lead to timeouts when quiescing and backing up the VM.

It is recommended to configure VSS shadow copy storage to remain on the drive that VSS is managing. As a general rule, do not move a volume’s VSS shadow copy storage to an alternate location unless instructed by Catalogic Software Data Protection Technical Support to address specific server or product-related issues. The device used to relocate VSS storage may become a bottleneck for DPX server backup, and if the reserved space fills up, it causes a backup failure for multiple source volumes. There is no specific issue with relocating VSS storage; however, note that VSS writes can significantly increase I/O load on a system and device. A VSS storage device must be properly sized to hold all VSS system writes for the entire duration of the backup. If VSS storage is full and VSS is unable to maintain the snapshot, the Windows system automatically removes the snapshot, causing the backup operation to fail.

External Media Management and Device Control (Tape Libraries)

A tape library can be controlled by only one master server Enterprise. Any DPX node, including the master server, can be configured to control the media changer for a tape library; the DPX node controlling the tape library can only participate in one master server Enterprise. You cannot configure a tape library media changer on multiple DPX nodes, and a tape library cannot be shared between DPX and another backup product.

It is possible to share a tape library between Enterprises and products if the tape library supports hardware partitioning. In this case, one physical tape library is configured to emulate two or more library systems, each of which has dedicated media changer devices, tape drives, and tape slots. Do not attempt to overlap slots between library partitions, as this introduces the possibility of unintended tape overwrite and uncoordinated tape reuse between Enterprises and products.

The tape library hardware and all associated tape devices must be under the manufacturer’s support. Any device that the manufacturer no longer supports is not supported by DPX.

Tape libraries must include a bar code reader, and the tapes must be labeled with unique bar codes. This greatly simplifies the backup operator’s job, especially in environments where tapes need to be regularly removed, sent offsite, and replaced.

The media changer device (tape library arm) must be connected to a DPX node. It is strongly suggested that the master server have an HBA installed and the arm be controlled by the master server. However, any physical DPX client can serve this purpose. Adhering to this recommendation significantly simplifies tape library control management and troubleshooting with Catalogic Software and the tape library vendor.

It is not recommended to attach the media changer device directly to the NetApp controller. Media changer control directly via the NetApp controller is more complex to configure, more difficult to troubleshoot, and may require NetApp support coordination.

Attach and zone tape libraries to the SAN in a way that allows the tape drives to be accessible to the NetApp storage server. It is strongly suggested that at least one tape drive be zoned to the master server to perform Catalog backups. If attachment to the master server is not possible, then alternate strategies for Catalog backup can be employed.

The DPX Deployment Guide recommends the option setting options tape.reservations scsi. This is necessary for sharing access to tape drives between NetApp controllers and DPX nodes on a SAN. See Enable Options and Services in the Deployment Guide.
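
On a 7-Mode controller, the setting named above is applied from the Data ONTAP command line. The sketch below runs it over SSH from an admin host, with a hypothetical controller name:

# Enable SCSI reservations so tape drives can be shared on the SAN.
ssh root@filer1 "options tape.reservations scsi"
# Confirm the current value.
ssh root@filer1 "options tape.reservations"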

Do not open the tape library door or manually refresh tape media outside of DPX. Using DPX to control the tape library import/export slots for adding or removing media allows it to remain in sync with the library. If the library inventory gets out of sync, it may be necessary to force a re-inventory from the DPX management console, shut down the management console and reinitiate the session, or in extreme cases reset the tape library and force it to re-inventory all of the available tapes, followed by a re-inventory from the DPX tape library console.

Any NetApp systems that require tape backup should have tape drives zoned for access. Although NetApp-to-NetApp NDMP backup can be configured, large data footprints and network transfer speed limitations usually make this option unattractive. Additionally, configure the NDMP tape devices with their short-name form. Older versions of Data ONTAP allowed for a device name which was similar to UNIX devices, for example /dev/nrst0. More recent versions of Data ONTAP have permitted both the longer path name and the shorter one, and the most recent Data ONTAP versions enforce using the short name. If you are using NetApp hardware with older versions of Data ONTAP, it is recommended to use the shorter form (nrst0) without the /dev path qualifier. If you use the longer form and then later upgrade to a later version of Data ONTAP which enforces the shorter name, the DPX device setup and use fails and requires reconfiguration. See Set Up Tape Library in the Deployment Guide.

Troubleshooting and Known issues

Critical A-SIS related Bug, Data ONTAP 8.2

BURT 723354, summarized in NetApp Support Bulletin 7010088, describes a potential A-SIS race condition that can lead to data corruption. Contact NetApp support for additional details and advice when using Data ONTAP 8.2 versions earlier than 8.2P3. It is strongly recommended to use A-SIS to maximize storage efficiency.

The only corrections available for this issue are to disable A-SIS on all volumes or upgrade Data ONTAP. DPX advises discussing your storage configuration with NetApp support to plan for an upgrade.

RFC1323

Windows 2003 systems appear to have trouble properly negotiating TCP RFC 1323 sliding window sizes. The symptom is that an agent-based backup begins and transfers some data and then enters a state where the DPX job is still active, but no data transfer between the affected client and the storage takes place. At some point during the data transfer, both the Windows server and the NetApp controller deadlock waiting for one another to answer protocol negotiation. Windows 2003 is no longer supported by Microsoft, so there is little that can be done with affected servers; an upgrade to a more modern operating system like 2008 or 2012 is generally recommended. There are two possible solutions on the NetApp controller to work around this issue:

• Disable RFC 1323. This setting works well but does have some drawbacks. For more information, read knowledge base article 42467. The fix suppresses the controller's use of date-based TCP time stamps and the negotiation of sliding window sizes. Time stamps are of primary concern with packet loss on very high-speed networks, where the legacy TCP sequence number may wrap and a delayed retransmission request is confused due to the sequence wrap. Sliding window negotiation is generally not a serious issue except for high-bandwidth, high-latency connections where very large window sizes are desired. Disabling RFC 1323 limits the window size and affects connection performance, especially for SnapMirror transfers across such links.

Note: Setting the tcp_do_rfc1323 flag does not automatically persist across controller reboots. Consult NetApp technical support if persisting this setting across reboots is necessary. Generally this requires editing the /etc/rc script to include priv set diag; setflag tcp_do_rfc1323 0 (see the sketch following this list).

• Disable sliding window negotiation for SnapVault transfers. There is a hidden Data ONTAP option, snapmirror.window_size. Although the option name implies SnapMirror, it does not affect SnapMirror's ability to negotiate RFC 1323 window scaling; however, the setting applies to all SnapVault transfers. The default value for this is 1994752. The following option change has been shown to work well for bypassing RFC 1323 issues:

options snapmirror.window_size 65535

This method should only be used when RFC 1323 is enabled. Perform priv set diag; printflag tcp_do_rfc1323 to check this setting. You should see output similar to tcp_do_rfc1323 = 1. If the value is not 1, set it to 1 using setflag tcp_do_rfc1323 1. The value 1 is the system default; however, if you modified the controller's startup script to alter this, you need to revert that change to avoid conflicting settings at system restart.
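
An annotated sketch of both workarounds as typed at the controller console, built only from the commands quoted above; apply the one workaround you have selected:

# Workaround 1: disable RFC 1323 entirely (note the drawbacks above).
# Persist it by placing the same two commands in /etc/rc.
priv set diag
setflag tcp_do_rfc1323 0

# Workaround 2: keep RFC 1323 enabled and cap the SnapVault window size.
priv set diag
printflag tcp_do_rfc1323                # expect: tcp_do_rfc1323 = 1
setflag tcp_do_rfc1323 1                # only if the flag was not already 1
options snapmirror.window_size 65535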

There are some details on NetApp NOW which suggest this has been fixed. However, the fix has not been observed in the field. To date, all supported versions of Data ONTAP from 7.x through 8.x appear to have this RFC 1323 issue with certain Windows nodes and may need the changes suggested above.

IP FastPath

IP Fastpath is a technology that allows a storage controller to automatically respond to client requests using the NIC controller that the initial request came in on. This avoids the overhead of routing and transmitting responses over NICs that are less optimal for client communication.

Windows servers using NIC teaming seem to be particularly vulnerable to data transfer failures when IP Fastpath is set; this seems especially predominant on HP servers. The observed behaviors include very slow network performance, unusual diminishing of network performance over time, and apparent transfer hangs. There are currently two known solutions to work around this issue:

• Disable NIC teaming on the client machine. This is a step employed when troubleshooting this kind of transfer issue. Once the issue is narrowed down to NIC teaming, you may need to pursue the issue with your server hardware vendor.

• Disable ip.fastpath on the NetApp storage server; this fix has a high success rate. For more information, read knowledge base article 40090. Any additional overhead incurred due to disabling this feature seems to be negligible, especially for secondary storage data protection use.

The NetApp default is to enable the ip.fastpath.enable option. It is not recommended to disable this unless doing so is deemed necessary during troubleshooting. Use caution when altering this option. If the NetApp controller does not have default routes in place or has applied any other special tuning which is affected by IP fastpath, disabling this feature can effectively render the controller inaccessible to other nodes on its connected network.
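
If troubleshooting does point to this feature, the option named above is toggled from the Data ONTAP command line (shown over SSH with a hypothetical controller name); heed the routing caution above before disabling it:

# Check the current setting, then disable IP fastpath only if advised.
ssh root@filer1 "options ip.fastpath.enable"
ssh root@filer1 "options ip.fastpath.enable off"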

NetApp Management Interface

The NetApp management interface, usually e0M, is a low bandwidth interface that cannot reside in the same subnet as the higher performance 1G and 10G interfaces intended for primary data transfer. For more information, read NetApp KB 2013683. When ip.fastpath is enabled and the management interface is configured to share a subnet with a high performance data interface, the low performance interface is included in traffic routing, resulting in very low performance transfers. Consult NetApp support regarding the proper configuration of this interface and the use of ip.fastpath. Configure the management interface to use a separate subnet; if that is not possible, the management interface can also be disabled.

Hostname Resolution

Data ONTAP depends upon a properly functioning DNS infrastructure. This includes forward and reverse name resolution for the NetApp controller NIC interfaces and all nodes in the DPX enterprise. Where hostname resolution is problematic or non-functional, it is typical to place entries for the NetApp and DPX nodes into the Data ONTAP /etc/hosts file.

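Such entries follow the conventional hosts file layout, as in this sketch (the addresses and names are hypothetical; substitute the actual values for your environment):

10.1.1.10   dpxmaster.example.com   dpxmaster
10.1.1.20   client01.example.com    client01
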
SnapVault, the protocol used for agent-based DPX Block backup, is especially sensitive to DNS name resolution. The protocol operates via the master server requesting the controller to initiate a connection to a node to transfer data; thus the controller must be able to perform DNS resolution on names and IP addresses that are valid for it to connect to. This is of special concern to firewalled networks using NAT, where the IP address used on either side of the firewall may be different.

SnapVault also depends on proper name resolution for controller-to-controller transfers, as well as SnapVault quiescing operations. It is critical that a controller have accurate name resolution for itself, as a SnapVault rollback can deadlock if the controller cannot resolve itself to initiate the rollback data transfer. For more information, read knowledge base article 46200.

Protection of vCenter in Agentless Deployments

When deploying agentless solutions, scan the vCenter node as type VMWARE. This enables DPX to store the login credentials and coordinate agentless backup/restore operations. However, scanning in a node as type VMWARE prevents its inclusion as part of an agent-based backup. To scan the vCenter in for agent-based backup, first install the agent and then modify the master server preferences to allow this node to be scanned in again. The following syncui procedure can be used to enable a system preference that prevents checking the database for duplicate IP addresses:

syncui

c s localhost ssdb

db login sysadmin SYSADMINPASSWORD

pref add DONOT_CHECK_DUP_HOST Y

quit

The vCenter node needs to be scanned into the DPX Enterprise with another logical name. Once the node is scanned in, it is generally recommended to revert the DONOT_CHECK_DUP_HOST preference to N.

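The revert can use the same syncui procedure, as sketched below (this assumes that issuing pref add with a new value overwrites the existing preference; confirm the syntax with Catalogic Software Data Protection Technical Support if in doubt):

syncui
c s localhost ssdb
db login sysadmin SYSADMINPASSWORD
pref add DONOT_CHECK_DUP_HOST N
quit
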

For additional information on using syncui or the option specified above, contact your DPX deployment specialist or Catalogic Software Data Protection Technical Support.

Agentless backup of a vCenter is not specifically supported by DPX; however, it might be supported by VMware. Contact VMware technical support about the vCenter implementation and its suitability for ESX VADP snapshot-based backup and restore.

Agentless Backup of Virtual Machines with Agents

DPX agentless backup is designed to skip virtual machines where the DPX agent is installed. This is usually desired to avoid backing up the same servers more than once. However, there are some cases where backup via both methods is desired. The identification of VMs containing the agent is through a VMware tools property, guestinfo.BEXInstalled.

The property includes the DPX version that is installed on the VM. As of this writing, valid values are 4.0, 4.1, 4.2, and 4.3. If this property is empty or null, it is assumed that DPX is not installed. Thus, this property can be used to change the state of a VM to include it in or exclude it from agentless backups.

It is strongly advised to have the latest version of VMware tools installed on the VM in question. Both Windows and Linux servers have similar commands that can be used to manipulate this value. For older versions of VMware tools (not recommended) or for other operating systems that VMware supports, consult VMware technical support on the command line syntax needed to set or clear this value.

The following are examples for Windows and Linux:

Windows: vmtoolsd --cmd "info-set guestinfo.BEXInstalled 4.3"

Linux: vmware-guestd -cmd "info-set guestinfo.BEXInstalled 4.3"

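To return a VM to the agentless backup set, the same mechanism can clear the property. The following is a sketch only; setting an empty value is an assumption based on the empty-or-null behavior described above, so verify the effect on a test VM first:

Windows: vmtoolsd --cmd "info-set guestinfo.BEXInstalled ''"

Linux: vmware-guestd -cmd "info-set guestinfo.BEXInstalled ''"
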
Miscellaneous Data ONTAP Concerns

Avoid performing a Data ONTAP upgrade/downgrade/upgrade on a given controller where A-SIS storage efficiency is activated. The Data ONTAP revert and re-upgrade can leave A-SIS metadata in a state where A-SIS operations fail. Contact NetApp support for additional details. It appears that stopping and starting A-SIS with privileged rights can clear this metadata and restore storage efficiency functionality. If this condition exists, the Data ONTAP /etc/log/sis file includes error messages similar to "Error (Previously downgraded SIS metafiles exists )". In most cases, it is better to upgrade Data ONTAP and, if any issues arise, work through them with NetApp and Catalogic Software Data Protection Technical Support.

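The stop/start sequence referenced above might look like the following on a 7-Mode console (a sketch under stated assumptions: /vol/backup_vol is a hypothetical volume name, and sis start -s, which rebuilds the fingerprint database by rescanning the volume, is assumed to be the appropriate restart; confirm the exact procedure with NetApp support before running it):

priv set advanced
sis stop /vol/backup_vol
sis start -s /vol/backup_vol
priv set
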
Data ONTAP 8.1.3P1 and 8.2.1 are suggested to help avoid a few issues encountered in the field. In addition to the specifics noted below, these versions of Data ONTAP are known to correct issues with Java heap memory management, improve space reclamation priority, and improve storage efficiency processing; all of these issues have been known to affect NetApp controllers as the solution scales up.

SnapVault qtree quiescing is a known NetApp behavior, and this topic is of particular interest to customers that implemented a data protection solution prior to DPX 4.3. The SnapVault protocol has specific behaviors that are built in to assure data resiliency. If there is any error or interruption in SnapVault data transfer into a volume, for example a network interruption or a controller panic/reboot, one or more qtree relationships can enter into a quiescing status, which can take significant time to complete. It is not unusual for a qtree rollback to operate for several hours or possibly even a


whole day to recover. In extreme examples where a busy master server or NetApp controller is rebooted, many qtrees can enter quiescing and consume significant time and resources to complete. The Data ONTAP versions indicated above have a specific fix for this, which is to set the following option:

options replication.logical.inomap_snapdiff_copy on

SnapVault quiescing cannot be completely eliminated; however, the new Data ONTAP feature dramatically reduces the amount of time needed for qtrees to finish their rollback operations. If the data protection environment started block-based backup with DPX 4.3 or later, rollback timing issues are minimized. When managing the data protection solution and secondary storage, avoid taking drastic actions that may result in many qtrees having to roll back and consume all system resources for long periods of time. Such actions to avoid during the backup cycle are:

• rebooting or panicking the controller

• performing a controller failover/giveback

• abruptly power cycling the master server

• reconfiguring or abruptly removing network access for either the master server or the controller

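To check whether any relationships are currently quiescing, inspect the SnapVault relationship state from the secondary controller console (a minimal sketch; on 7-Mode systems this command lists each qtree relationship and its state, including quiescing):

snapvault status
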
Older versions of Data ONTAP have a SnapVault qtree management error which affects DPX storage management. The behavior is that storage space may leak over time as servers are deprecated and removed from the backup schedule. The specific issue relates to deleting qtree relationships that are no longer needed; deleting these relationships can orphan SnapVault management snapshots that follow the naming pattern *-dst.N. These snapshots can be easily removed manually when they are found. This issue is reported in NetApp Bug ID 87808 and seems to be corrected in the most recent updates to 7-Mode Data ONTAP, including 7.3.7, 8.0.5, 8.1.4, and 8.2.

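Locating and removing an orphaned management snapshot is a two-step console operation, sketched here with a hypothetical volume and snapshot name (delete only snapshots matching the *-dst.N pattern that are confirmed to belong to removed relationships):

snap list backup_vol
snap delete backup_vol client01_data-dst.3
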
Data ONTAP can be slow to reclaim space when snapshots are deleted in bulk. The space reclamation process is a low priority thread designed to run on a daily schedule and to concede resources to other higher priority tasks. This process is also single threaded, generally being isolated to a single CPU core regardless of available and idle cores on the controller. Use caution when performing bulk snapshot removals manually. The bulk removal may not result in immediate space savings and may also consume significant system resources to complete. With the DPX solution, manual cleanup is generally unnecessary since the condense process takes care of this on a daily basis. Data ONTAP 8.1.1 and later includes significant performance enhancements that help prioritize and finish these space reclamation processes faster.


Chapter 4: External Resource List

The DPX Best Practices Guide contains references to the following external resources:

Catalogic

• MySupport

• DPX Deployment Guide

• Knowledge Base Article 42502 - NetApp Licensing and BEX/NSB

• Knowledge Base Article 45779 - FlexClone Licensing and Use with NSB

• Knowledge Base Article 46640 - NetApp 7-Mode Non-root and MultiStore vFiler Support

• Knowledge Base Article 46630 - Using PuTTY's plink Command to Automate SSH Actions on Windows

• Knowledge Base Article 45648 - Data ONTAP Log File Locations

• Knowledge Base Article 41798 - Network Traffic Throttling with Windows 2008 R2

• Knowledge Base Article 45991 - Network Traffic Throttling with Windows 2008 R2

• Knowledge Base Article 46361 - Throttling Windows Data Transfers in Advanced Backup Jobs

• Knowledge Base Article 42467 - Block Backup to NetApp Storage Hangs

• Knowledge Base Article 40090 - SnapVault Backup Performance Periodically Slows in NIC Teaming Environment

• Knowledge Base Article 46200 - Advanced Recovery Backups Fail Due to SnapVault Qtrees in a Quiescing Forever State

NetApp

• NetApp Support

• TR-3487 SnapVault Best Practices Guide

• TR-3466 Open Systems SnapVault (OSSV) Best Practices Guide

• TR-3446 SnapMirror Async Overview and Best Practices Guide

• TR-3505 NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide

• TR-3505i.a When to Select NetApp Deduplication and/or Data Compression Best Practices (available on request from NetApp or partner)


• TR-3965 NetApp Thin Provisioning Deployment and Implementation Guide

• NetApp Support Bulletin 7010088

• Data ONTAP 8 Documentation

VMware

• vSphere 5 Documentation Center

• Knowledge Base Article 1020128


Chapter 5: Conclusion

The DPX Best Practices Guide is a broad collection of known constraints and collected field information from DPX implementations. Many of the guidelines are presented to aid a DPX architect with implementing and customizing a data protection strategy based on the customer's environmental constraints. If you have any specific questions or concerns regarding the contents of this guide, contact your DPX implementation partner, pre-sales engineer, or a representative of Catalogic Software Data Protection Technical Support.


TRADEMARKS

This publication contains proprietary and confidential material, and is only for use by licensees of Catalogic DPX™, Catalogic BEX™, or Catalogic ECX™ proprietary software systems. This publication may not be reproduced in whole or in part, in any form, except with written permission from Catalogic Software.

Commonly Used Company and Product Names

The following company and product names might appear in the Catalogic DPX™ documentation and management console:

Adobe®

Reader®, PDF

Flexera Software®

InstallShield®

FreeBSD®

Hewlett Packard or HP

HP-UX, HP Tru64 UNIX®

IBM®

DB2®, Lotus Notes®, Domino®, AIX®, Magstar®, Tivoli Storage Manager

Kroll

Application Recovery Options, Exchange Mailbox Recovery, SharePoint Object Recovery

Linux®

Microsoft®

AlwaysOn, Excel®, Exchange, Hyper-V®, Internet Explorer®, Internet Information Services (IIS), iSCSI Initiator, Notepad, SharePoint®, SQL Server®, Vista®, Visual SourceSafe® (VSS), Windows®, Windows Server®, Word®, WordPad

NetApp®

Data ONTAP®, FilerView®, FlexVol®, NearStore®, NOW®, RAID-DP®, SnapMirror®, Snapshot™, OSSV, WAFL®, FlexClone®, SnapVault®, SnapManager®, OnCommand™, MultiStore®

Novell®

NetWare®, Open Enterprise Server, GroupWise®, SUSE®, eDirectory™

Oracle®

Java®, Solaris, RMAN, StorageTek Tape Storage

Quantum®


Advanced Digital Information Corp.

Red Hat®

CentOS

SAP®

SGI®

IRIX®, XFS®

Sybase®

UNIX®

VMware®

ESX server, ESXi server, vCenter, vSphere, VMware Consolidated Backup, vMotion, VDDK, VMDK

Additional Trademark Information

The following list contains additional trademark information:

• Catalogic, Catalogic Software, DPX, BEX, ECX, and NSB are trademarks of Catalogic Software, Inc. Backup Express is a registered trademark of Catalogic Software, Inc. All other company and product names used herein may be the trademarks of their respective owners.

• NetApp, the NetApp logo, Go further, faster, Data ONTAP, FilerView, FlexClone, FlexVol, NearStore, RAID-DP, Snapshot, and SnapVault are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

• Windows and SharePoint are registered trademarks of Microsoft Corporation.

• Oracle and Java are registered trademarks of Oracle and/or its affiliates.

• VMware is a registered trademark of VMware, Inc.

• Novell is a service mark of Novell, Inc. and a registered trademark of Novell, Inc. in the United States and other countries.

• SUSE is a registered trademark of Novell, Inc.

• Linux is the registered trademark of Linus Torvalds in the U.S. and other countries.


• UNIX® is a registered trademark of The Open Group in the United States and other countries.

• Red Hat is a registered trademark of Red Hat, Inc. in the United States and other countries.

• AIX is a registered trademark of International Business Machines Corporation, registered in many jurisdictions worldwide.

• Sybase is a registered trademark of Sybase, Inc. or its affiliates.

• SAP is the trademark(s) or registered trademark(s) of SAP AG in Germany and in several other countries.

• Quantum is a registered trademark of Quantum Corporation, registered in the U.S. and other countries.

• FreeBSD is a registered trademark of The FreeBSD Foundation.

• SGI and IRIX are registered trademarks of Silicon Graphics International Corp. or its subsidiaries in the United States and/or other countries.

• This document might contain certain diagrams created using the official VMware icon and diagram library. Copyright © 2010 VMware, Inc. All rights reserved.

• Copyright © 2011 VMware, Inc. All rights reserved. This product is protected by U.S. and international copyright and intellectual property laws. VMware products are covered by one or more patents listed at http://www.vmware.com/go/patents. VMware is a registered trademark or trademark of VMware, Inc. in the United States and/or other jurisdictions. All other marks and names mentioned herein may be trademarks of their respective companies.

• All other company and product names used herein may be the trademarks of their respective owners.


INDEX

B

Block backup
agent based 21

C

compatibility considerations
Data ONTAP 4, 6-7, 11, 13, 18, 30, 33-34
OSSV Agent 23
Windows 22, 30

D

Data ONTAP
aggregates 11
command line 9
compatibility 4, 7
concerns 33
hostname resolution 32

deduplication 12

DPX
agent 22
agentless 23-24
catalog maintenance 27
file level backup 27
master server 18
NDMP proxy 19
NDMP tape 22
server configuration 22
tape libraries 29

E

external resources 35

F

FlexClone
licensing 8
volume limitation 10

I

iSCSI
requirements 16
restore 10

K

Knowledge Base Articles
40090 31
41798 18
42467 30
42502 5
45648 18
45779 5, 8
45991 22
46021 20
46200 32
46361 22
46630 18
46640 7

L

licensing 8

M

Microsoft VSS 29

N

NearStore
licensing 8

NetApp storage 4
aggregates 10, 17
compression 12
controller limits 9
devices 7
FTP access 18
guidelines 7
management 18
OnCommand 12
OSSV agent 23
resources 9
Snapshot limits 20

R

resources 35


S

SnapManager
recommendations 28

SnapMirror
bandwidth 15
replication 16
scheduling 15, 21

Snapshot
aggregate level 11
alterations 18
limits 20

SnapVault
aggregates 11
job creation 21
qtree quiescing 33

solution overview 4

T

technology overview 4
Trademarks 38

V

VMware 27

volume
compression 14
naming 13