
Page 1: Best Practice Bpg1

Best Practices for Reliability
New System Installations

June 2001


Copyright information

Copyright © 1994–2001 Network Appliance, Inc. All rights reserved. Printed in the U.S.A.

No part of this book covered by copyright may be reproduced in any form or by any means (graphic, electronic, or mechanical, including photocopying, recording, taping, or storage in an electronic retrieval system) without prior written permission of the copyright owner.

Portions of this product are derived from the Berkeley Net2 release and the 4.4-Lite-2 release, which are copyrighted and publicly distributed by The Regents of the University of California.

Copyright © 1980–1995 The Regents of the University of California. All rights reserved.

Portions of this product are derived from NetBSD, which is copyrighted by Carnegie Mellon University.

Copyright © 1994, 1995 Carnegie Mellon University. All rights reserved. Author Chris G. Demetriou.

Permission to use, copy, modify, and distribute this software and its documentation is hereby granted, provided that both the copyright notice and this permission notice appear in all copies of the software, derivative works or modified versions, and any portions thereof, and that both notices appear in supporting documentation.

CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS" CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.

Software derived from copyrighted material of The Regents of the University of California and Carnegie Mellon University is subject to the following license and disclaimer:

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notices, this list of conditions, and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notices, this list of conditions, and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. All advertising materials mentioning features or use of this software must display the following acknowledgment:
   This product includes software developed by the University of California, Berkeley and its contributors.
4. Neither the name of the University nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Software derived from copyrighted material of Network Appliance, Inc. is subject to the following license and disclaimer:

Network Appliance reserves the right to change any products described herein at any time, and without notice. Network Appliance assumes no responsibility or liability arising from the use of products described herein, except as expressly agreed to in writing by Network Appliance. The use and purchase of this product do not convey a license under any patent rights, trademark rights, or any other intellectual property rights of Network Appliance.

The product described in this manual may be protected by one or more U.S. patents, foreign patents, or pending applications.

RESTRICTED RIGHTS LEGEND: Use, duplication, or disclosure by the government is subject to restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in Technical Data and Computer Software clause at DFARS 252.227-7013 (October 1988) and FAR 52.227-19 (June 1987).


Trademark information

NetApp and the Network Appliance design are registered trademarks of Network Appliance, Inc. in the United States, Canada, and the European Union. Network Appliance is a registered trademark of Network Appliance, Inc. in Monaco and a trademark of Network Appliance, Inc. in the United States and Canada. FAServer is a registered trademark of Network Appliance, Inc. in the United States and the European Union. NetCache is a registered trademark of Network Appliance, Inc. in the European Union and Japan, and a trademark of Network Appliance, Inc. in the United States. SnapCopy is a registered trademark of Network Appliance, Inc. in the European Union and a trademark of Network Appliance, Inc. in the United States. WAFL is a registered trademark of Network Appliance, Inc. in the European Union and a trademark of Network Appliance, Inc. in the United States and Canada. FilerView and SecureShare are registered trademarks of Network Appliance, Inc. in the United States. Data ONTAP is a trademark of Network Appliance, Inc. in the United States and Canada. Snapshot is a trademark of Network Appliance, Inc. in the United States and the European Union. ApplianceWatch, BareMetal, ContentDirector, ContentReporter, DataFabric, SecureAdmin, Serving Data by Design, Smart SAN, SnapManager, SnapMirror, SnapRestore, and Web Filer are trademarks of Network Appliance, Inc. in the United States.

Network Appliance is a licensee of the CompactFlash and CF Logo trademarks.

All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such.


Use the NetApp Hardware Site Requirements Guide

What: You should install the filer in a clean, air-conditioned location with sufficient electrical power and ample space to allow good airflow for all components.

Why: The filer may not operate properly if installed in an environment that does not meet operating requirements. Environmental and site power problems can cause unnecessary downtime.

How: Review the NetApp Hardware Site Requirements Guide found in your filer documentation. You can also view this guide online on the NOW site.

Ensure that the environment you install your filer in meets the requirements specified in the guide.


Separate redundant power supply connections

What: Connect your system's redundant power supplies to separate power circuits.

Why: Redundant power supplies can protect your system from both mechanical failures and power circuit failures.

How: Ensure that sufficient power connections are available for your system (filer head and disk shelves), and connect each power supply to a circuit that is protected by its own circuit breaker. Use the Site Preparation Guide to determine the amount of power required.

If you have an uninterruptible power supply (UPS), you can connect one power cable of each head and shelf to the UPS and the other power cable directly to an unprotected outlet; or you can connect the power cables to two different UPS units to ensure continued operation in the event of a UPS failure.


Connect filer to a UPS

What: Connect the filer, disk shelves, and any terminal (console) or network equipment used to manage the filer to a UPS or line conditioner.

Why: Redundant power supplies protect your filer and related equipment only from single power supply failures. They do not protect your filer from a complete power outage at your facility.

In the case of an extended power failure, a line-conditioning UPS can provide power to your filer until you are able to shut down the filer gracefully. A UPS also can prevent the filer from halting during momentary power failures.

Many UPS units have line-conditioning capabilities that provide clean power.

How: Determine the amount of standby power required to support your filer, disk shelves, console, and any network equipment (such as a terminal server used to manage your filer remotely) for at least 15 minutes.

If your data center has an existing UPS unit with sufficient capacity to support the filer and its components, connect the filer and its components to the UPS.

If your data center does not have a UPS unit, purchase a UPS unit with sufficient capacity to support your filer and its components for at least 15 minutes.
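The sizing step above is simple arithmetic: add up the draw of everything that must stay powered and multiply by the required runtime. The following Python sketch assumes you already know each component's power draw in watts and that your UPS vendor quotes both a maximum output in watts and a usable battery energy in watt-hours. The component names and wattages are hypothetical placeholders; take real figures from your equipment documentation and the Site Preparation Guide. Real UPS runtime is not linear in load, so treat the result only as a first check before consulting the vendor's runtime tables.

```python
"""Rough UPS sizing check: a hypothetical sketch, not a NetApp tool."""
from __future__ import annotations


def total_load_w(loads_w: dict[str, float]) -> float:
    """Sum the power draw (watts) of every component that must stay up."""
    return sum(loads_w.values())


def required_energy_wh(loads_w: dict[str, float], minutes: float = 15.0) -> float:
    """Energy (watt-hours) needed to carry the whole load for `minutes`."""
    return total_load_w(loads_w) * minutes / 60.0


def ups_is_sufficient(ups_max_w: float, ups_usable_wh: float,
                      loads_w: dict[str, float], minutes: float = 15.0) -> bool:
    """True if the UPS can both deliver the load and sustain it long enough."""
    return (total_load_w(loads_w) <= ups_max_w
            and ups_usable_wh >= required_energy_wh(loads_w, minutes))


if __name__ == "__main__":
    # Hypothetical loads in watts; substitute your own measured values.
    loads = {
        "filer head": 400.0,
        "disk shelf 1": 350.0,
        "disk shelf 2": 350.0,
        "console": 50.0,
        "terminal server": 20.0,
    }
    print(f"Total load: {total_load_w(loads):.0f} W")
    print(f"Energy for 15 minutes: {required_energy_wh(loads):.1f} Wh")
    print("Sufficient:", ups_is_sufficient(1500.0, 400.0, loads))
```

With these placeholder figures the load is 1170 W, so 15 minutes of standby power needs about 292.5 Wh of usable battery energy.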


Follow configuration guidelines

What: Your filer comes with comprehensive hardware and software documentation. Follow the configuration guidelines contained in this documentation.

Periodically review the System Configuration Guide on the NOW site for updates to recommended hardware configurations.

Why: Your filer documentation outlines supported and tested filer configurations and environments. If you install hardware in your filer that has not been tested and is not supported, you run the risk of causing unnecessary downtime or service disruptions.

How: Review the configuration guidelines included in your filer documentation when you receive your filer. Periodically review the System Configuration Guide to get up-to-date information about newly supported configurations.


Ground disk shelves

What: Ground your filer's disk shelves.

Why: For disk storage to work reliably, your filer's disk shelves must be grounded. The Fibre Channel StorageShelf Hardware Guide or the SCSI StorageShelf Hardware Guide for your disk shelves contains specific grounding instructions.

How: Follow the grounding instructions in the disk shelf guide that came with your disk shelves.


Check all Fibre Channel cables for proper installation

What: Make sure all Fibre Channel cables are installed properly.

Why: Improperly installed Fibre Channel cables can cause loop instability or other errors in the Fibre Channel loop.

How: Make sure the cables are:

● Of the proper length
● Properly and firmly connected to both filer and shelves
● Not kinked, bent, or twisted

You can refer to the Fibre Channel StorageShelf Hardware Guide and the Easy Installation Instructions shipped with your filer for examples of proper cable installation.

Maintain a console device

What: Maintain a hardwired serial console connection to the filer's serial port.

Why: If a filer stops communicating over the network, the only way to diagnose and fix the problem is from a console device connected to the filer's serial port.

If a filer stops functioning and needs to be started with the boot or diagnostics diskettes, a console device is necessary.

How: Connect a serial console directly to the filer's serial port, or connect the filer's serial port to a console server that can be accessed remotely through the network or through a console adapter.

The filer's console port has a standard DB9 male connector configured as a DTE device. Refer to your filer's Hardware Guide for more details on pinouts and required terminal configuration settings.


Maintain hot spare disks

What: Maintain sufficient hot spare disks in your disk shelves to enable the filer to reconstruct data from failed disks.

Why: When a disk fails, your filer uses a hot spare disk to reconstruct the failed disk's data. Your filer runs in degraded mode until the failed disk is reconstructed on a hot spare disk. While your filer is running in degraded mode, it is subject to data loss if a second disk fails before the first failed disk is reconstructed. It is therefore very important for hot spare disks to be available so that failed disks can be reconstructed automatically.

Network Appliance recommends keeping at least one hot spare disk for each disk size and disk type installed in your filer. This allows the filer to use a disk of the same size and type as a failed disk when reconstructing a failed disk. If a disk of the same size is not available, the filer uses the next-larger available spare disk.

Note: If a spare disk is not available for reconstruction when a disk fails, the filer runs in degraded mode for 24 hours or until a suitable spare disk is added to the system.

How: Purchase sufficient spare disks and keep them installed in your disk shelves.

Example: If your filer has a combination of 9-GB, 18-GB, and 36-GB drives, you should have at least one 9-GB, one 18-GB, and one 36-GB drive installed as hot spares.

Note: In a cluster configuration, each filer must have its own set of hot spare disks. Filers in a cluster cannot use their partner's hot spare disks to reconstruct their failed disks.
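The spare-disk rule above can be expressed as a small inventory check. This Python sketch assumes you have listed the size and type of every disk in use and every hot spare; the `Disk` class and the "FC"/"SCSI" type labels are hypothetical. It reports which size/type combinations have no spare, and models the selection order described above: a spare of the same size if one exists, otherwise the next-larger spare. Restricting candidates to the same disk type is an assumption made for the example; the real selection is performed by Data ONTAP, not by this code.

```python
"""Hot spare coverage check: a hypothetical sketch of the rule described above."""
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Disk:
    size_gb: int
    disk_type: str  # hypothetical labels such as "FC" or "SCSI"


def missing_spare_classes(in_use: list[Disk], spares: list[Disk]) -> set[tuple[int, str]]:
    """Return (size, type) combinations that are in use but have no hot spare."""
    needed = {(d.size_gb, d.disk_type) for d in in_use}
    covered = {(s.size_gb, s.disk_type) for s in spares}
    return needed - covered


def choose_spare(failed: Disk, spares: list[Disk]) -> Optional[Disk]:
    """Pick a same-size spare if available, otherwise the next-larger one.

    Assumption: only spares of the same type are considered.
    Returns None when no spare is large enough (the filer stays degraded).
    """
    candidates = [s for s in spares
                  if s.disk_type == failed.disk_type and s.size_gb >= failed.size_gb]
    return min(candidates, key=lambda s: s.size_gb, default=None)
```

For the example above with only 18-GB and 36-GB spares installed, `missing_spare_classes` reports that the 9-GB disks are unprotected, and a failed 9-GB disk would be rebuilt on the 18-GB spare.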


Maintain spare power supplies

What: Maintain spare power supplies.

Why: The redundant power supplies in your filer and disk shelves ensure continued operation if a power supply fails. If you experience a power supply failure, you need to replace the failed power supply as quickly as possible so that your equipment is not affected if the remaining power supply fails.

How: Purchase and maintain at least one spare power supply for your filer and disk shelves.

If a disk shelf power supply fails, you can hot swap it with a spare power supply.

If a filer power supply fails on a C720 or F720, you must power off the filer to replace the failed power supply. On all other filer platforms, you can hot swap the power supply.


Install spare NICs

What: Install redundant NICs.

Why: A single NIC is a potential single point of failure. If the NIC fails, your users cannot access data on the filer.

How: As a rule of thumb, you should have one spare NIC on hand for each type of interface card. If possible, install the spare NICs in supported slots in the filer so they can be brought online quickly.

Example: Your filer contains the following networking interfaces:

● One onboard 10Base-T/100Base-TX Ethernet interface
● One quad-port 10Base-T/100Base-TX Ethernet card
● One Gigabit Ethernet card

Your filer is configured to use the onboard interface, two ports on the quad card, and the Gigabit Ethernet card.

To provide spare NICs, you should order and install two additional NICs:

● One quad-port 10Base-T/100Base-TX Ethernet card
● One Gigabit Ethernet card

This configuration would handle the following scenarios with a minimum of disruption to your clients:

● If the onboard interface fails, you can configure an interface on one of the other Ethernet cards with the same IP address and bring that interface online.
● If the quad Ethernet card you are using fails, you can configure the other quad Ethernet card and bring it online.
● If the Gigabit Ethernet card you are using fails, you can bring the standby Gigabit Ethernet card online.

If you use a redundant NIC as described above without using a virtual interface (VIF), you might need to unmount and remount file systems on your UNIX clients to resolve stale file handles. For more information on VIFs, see Determine when to use VIFs.


Use autosupport

What: Configure autosupport to send information automatically to NetApp Technical Support and your own support staff.

Why: The autosupport feature sends information to NetApp Technical Support and the filer's administrators when your filer encounters errors and when it reboots. NetApp Technical Support monitors autosupport mail to identify problems before they cause downtime. For example, if NetApp Technical Support receives an autosupport e-mail indicating that a disk has failed, and you have a contract that supplies four-hour hardware replacement, a replacement disk will be delivered to your site within four hours without your having to contact Network Appliance.

To make the automatic replacement and notification process more reliable, be sure to include a name, address, phone number, and pager number for your site contact so that NetApp Technical Support can contact your support personnel quickly if a problem is detected. A lack of specific contact information is the most common reason for delay in addressing autosupport-initiated customer service activities.

How:

Configuring autosupport with FilerView

1. Select Autosupport or Configure Autosupport from the Filer tree.
2. Set Autosupport Enabled to Yes.
3. Enter one or more names of SMTP mail hosts under the Mailhosts: column.
4. Enter the e-mail name for an administrator under the From: column.
5. Enter [email protected] under the To: column.
6. If you want one or more administrators to receive copies of the full e-mail sent to autosupport, enter their e-mail addresses under the To: column.
7. If you want one or more administrators to receive summary information when e-mail is sent to autosupport, enter their e-mail addresses under the Note to: column.

Configuring autosupport from the command line

Enter options autosupport commands to configure autosupport.

Command: options autosupport.enable on
Description: Turns autosupport on.

Command: options autosupport.mailhost hostname1, ..., hostname5
Description: Specifies the names of up to five SMTP mail hosts that the filer can use to send autosupport e-mail.

Command: options autosupport.to [email protected], ..., address5
Description: Specifies NetApp Technical Support and up to four additional e-mail addresses that receive autosupport e-mail messages.

Command: options autosupport.noteto address1, ..., address5
Description: Specifies up to five e-mail addresses that receive a note whenever an autosupport message is sent.

Command: options autosupport.from name@domain
Description: Specifies contact information for an administrator that NetApp Technical Support can contact when an autosupport message indicates the need for action to be taken.

Refer to the System Administrator's Guide for additional information on configuring autosupport.
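If you configure several filers, it can help to generate the command-line settings from one place. The following Python sketch builds the options autosupport lines listed above and enforces the limits the list states: up to five mail hosts, up to five "to" addresses including NetApp Technical Support, and up to five "noteto" addresses. Two assumptions: list values are joined with commas and no spaces, and every host name and address in the example is a hypothetical placeholder. Verify the exact syntax for your Data ONTAP release in the System Administrator's Guide before entering the output at the console.

```python
"""Generate autosupport `options` commands: a hypothetical helper, not a NetApp tool."""
from __future__ import annotations


def autosupport_commands(mailhosts: list[str], to: list[str],
                         noteto: list[str], admin_from: str) -> list[str]:
    """Return console command lines for the autosupport settings in this guide."""
    # Limits taken from the command descriptions in this guide.
    if not 1 <= len(mailhosts) <= 5:
        raise ValueError("specify between one and five mail hosts")
    if not 1 <= len(to) <= 5:
        raise ValueError("specify NetApp Technical Support plus at most four addresses")
    if len(noteto) > 5:
        raise ValueError("specify at most five noteto addresses")

    commands = [
        "options autosupport.enable on",
        # Assumption: multiple values are comma-separated without spaces.
        "options autosupport.mailhost " + ",".join(mailhosts),
        "options autosupport.to " + ",".join(to),
        "options autosupport.from " + admin_from,
    ]
    if noteto:
        commands.append("options autosupport.noteto " + ",".join(noteto))
    return commands


if __name__ == "__main__":
    # Every host name and address here is a hypothetical placeholder; the first
    # "to" entry stands for the NetApp Technical Support address in this guide.
    for line in autosupport_commands(
            mailhosts=["mail1.example.com", "mail2.example.com"],
            to=["netapp-support-address", "storage-admins@example.com"],
            noteto=["oncall-pager@example.com"],
            admin_from="filer-admin@example.com"):
        print(line)
```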

Identify multiple name servers

What: Provide pointers to two or more servers if your filer is configured to use DNS, NIS, or WINS name servers.

Why: If a filer is configured to access only a single name server and that server becomes unavailable, the filer may deny users access to its file systems or respond to requests slowly.

How:

During initial setup

With the setup command

The setup command prompts you to enter the IP addresses of up to three servers. Provide the addresses of at least two servers.

With the Setup Wizard

The Setup Wizard forms contain fields that enable you to enter the IP addresses of up to two servers. Provide the addresses of two servers.

After initial setup

From FilerView

1. Select Name Service or Manage DNS, NIS Name Service from the Network tree to modify the DNS server information.
2. Select Config or Configure > General Setup from the CIFS tree to modify the WINS server information.

From the command line

1. Edit the /etc/resolv.conf file in the filer's root volume to specify the names and IP addresses of the name servers.
2. Enter the following commands to specify the DNS domain name and to enable DNS:

    options dns.domainname domainname
    options dns.enable on

domainname is the DNS domain name.

Refer to the System Administrator's Guide for more information on adjusting these configuration settings after your filer is set up.
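Because the point of this practice is to avoid depending on a single name server, it is worth checking the resolver file before relying on it. The Python sketch below produces /etc/resolv.conf content in the conventional one-line-per-server "nameserver IP-address" format and refuses to produce a file with fewer than two valid, distinct addresses. It assumes the filer's resolver accepts that conventional format; the addresses in the example come from the 192.0.2.0/24 documentation range and are placeholders.

```python
"""Build resolv.conf text with at least two name servers (hypothetical sketch)."""
from __future__ import annotations

import ipaddress


def build_resolv_conf(nameservers: list[str], minimum: int = 2) -> str:
    """Return resolv.conf content with one `nameserver` line per address.

    Raises ValueError if an address is invalid or fewer than `minimum`
    distinct servers are given.
    """
    seen: list[str] = []
    for raw in nameservers:
        addr = str(ipaddress.ip_address(raw.strip()))  # raises ValueError if invalid
        if addr not in seen:  # ignore duplicates, keep the original order
            seen.append(addr)
    if len(seen) < minimum:
        raise ValueError(f"need at least {minimum} distinct name servers, got {len(seen)}")
    return "".join(f"nameserver {addr}\n" for addr in seen)


if __name__ == "__main__":
    print(build_resolv_conf(["192.0.2.10", "192.0.2.11"]), end="")
```

Listing the same server twice is treated as a single server, so it does not satisfy the two-server requirement.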


Limit root volume size

What: Limit the size of the root volume to two disks.

Why: Limiting the size of the root volume to two disks, which is the smallest possible RAID group size, reduces the amount of downtime if the volume needs to be restored from tape.

For filers that are configured in high-availability environments, Network Appliance recommends that you separate the root volume from the volumes on which you store data.

How:

During initial setup with the Setup Wizard

When you use the Setup Wizard to set up a new filer, the root volume is created automatically as a two-disk volume.

During initial setup with the setup command

When you use the setup command to set up a new filer, the default size for the root volume is two disks. The setup command provides an option for creating a larger root volume. When you are presented with the volume size option, select the default volume size of two disks.

After initial setup

If your filer has been set up already, complete the following steps to create a root volume with two disks.

1. Use FilerView or the vol command to create a new volume with two disks.
2. Create an /etc directory in the new volume.
3. Copy the contents of the /etc directory from the original root volume to the new volume.
4. If the current root volume is storing user data, home directories, or Web pages that are being served via the HTTP protocol, move the files to other volumes.
5. Use FilerView or the vol options volume_name root command to set the new volume as the root volume; volume_name is the name you assigned to the new volume.
6. Reboot the filer.

The data that was on the original root volume is still available under the original volume name.


Limit root volume access

What: Do not permit the root volume to be used to store user data.

Why: Prohibiting user data storage on the root volume ensures that the volume has sufficient space to store logs and other system data, and prevents unauthorized access to the filer's configuration files.

When your filer is set up initially, the root volume (named vol0) is automatically exported. If you specified an adminhost during setup, the root volume is exported to the adminhost; if you did not specify an adminhost during setup, the root volume is exported to all hosts. When CIFS is set up, two shares are created automatically: C$ and HOME. You need to modify these configurations to limit access to administrators only.

How:

To limit access by NFS clients

● Use UNIX permissions to control access to directories.
● Use NFS export options to prevent NFS clients other than the administration host from mounting the root volume.

The following is an example of the default NFS exports file when no adminhost is specified during setup:

    #Auto-generated by setup Fri Feb 2 23:38:38 GMT 2001
    /vol/vol0 -anon=0
    /vol/vol0/home

The following is an example of a recommended NFS exports file; adminhost is the host name of the administrator's computer:

    /vol/vol0 -root=adminhost,access=adminhost
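To audit an existing exports file against this recommendation, a short script can flag root-volume exports that are open to every host. This Python sketch assumes the simple line format shown in the examples above (a path, optionally followed by a dash and comma-separated options, with # starting a comment) and treats an export as too open if it has no access= option or maps anonymous users to root with anon=0. The root volume path /vol/vol0 comes from the examples; pass another path if your root volume has a different name. The parser is deliberately minimal and does not understand every exports option.

```python
"""Flag over-permissive root volume exports (hypothetical audit sketch)."""
from __future__ import annotations


def parse_exports(text: str) -> dict[str, dict[str, str]]:
    """Map each exported path to its options, e.g. {"access": "adminhost"}."""
    exports: dict[str, dict[str, str]] = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue  # skip blank and comment-only lines
        fields = line.split(None, 1)
        path = fields[0]
        opts = fields[1] if len(fields) > 1 else ""
        options: dict[str, str] = {}
        for item in opts.strip().lstrip("-").split(","):
            item = item.strip()
            if item:
                key, _, value = item.partition("=")
                options[key] = value
        exports[path] = options
    return exports


def open_root_exports(text: str, root: str = "/vol/vol0") -> list[str]:
    """Return root-volume paths exported without access= or with anon=0."""
    flagged = []
    for path, options in parse_exports(text).items():
        if path != root and not path.startswith(root + "/"):
            continue  # not on the root volume
        if "access" not in options or options.get("anon") == "0":
            flagged.append(path)
    return flagged
```

Run against the default file shown above, the sketch flags both /vol/vol0 and /vol/vol0/home; against the recommended file it flags nothing.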

To limit access by CIFS clients

Use CIFS ACLs and share permissions to limit access to the root volume to the filer's administrators.

The following is an example of the default CIFS shares on the root volume (Data ONTAP 5.3.x):

    Name   Mount Point      Description
    ----   -----------      -----------
    HOME   /vol/vol0/home   Default Share
                            Everyone / Full Control
    C$     /vol/vol0        Remote Administration
                            BUILTIN\Administrators / Full Control

The following is an example of the recommended CIFS shares on the root volume:

    Name   Mount Point      Description
    ----   -----------      -----------
    C$     /vol/vol0        Remote Administration
                            BUILTIN\Administrators / Full Control


Determine when to use VIFs

What: Determine when a virtual interface (VIF) can be used in your environment, and configure your filer and network equipment to support VIFs.

Why: A single NIC is a potential single point of failure. If the NIC fails, your clients will not be able to access data on the filer. Therefore, it is a good idea to install redundant NICs in your filer.

In addition to redundant NICs, you can configure VIFs to provide additional, transparent fault tolerance. If configured correctly, VIFs allow one interface to take over for a failed interface with no manual intervention.

The VIF will respond to both a complete hardware failure of one of the NICs and a link failure on one of the interfaces.

For example, if the switch to which one interface is connected fails, and the other interface is attached to a different switch, the VIF will allow the other interface to take over communications for the failed link.

How:

● Purchase at least two network adapters of the same type.
● Purchase network switches from vendors that support VIFs or EtherChannel Trunking.
● Install both cards in supported slots in the filer.
● Configure the filer and the switch to be attached to the filer. For the best fault tolerance, attach each interface in the VIF to a different switch.

You can find more information about configuring VIFs in the System Administrator's Guide and on the NOW site. The NOW site also has a list of switches that have been tested to work with NetApp VIF technology.


Configure the Snapshots feature

What: Use the Snapshot feature for online backup to reduce planned downtime and to enable users to restore deleted files.

Why: Snapshots provide a copy-on-write, online, read-only backup image of a file system. When snapshots are available online, users frequently can restore lost files without system administrator intervention. File recovery from online snapshots enables an application to resume operations after several seconds of file or directory copy time, as opposed to what is frequently 30 minutes to several hours when tape media recovery is used. Because snapshots can be taken several times a day, the files and directories in a snapshot can be more up to date than is possible with nightly tape archival operations.

You can use an individual snapshot as the source for creating the archived tape images. Using the snapshot image as the source enables users to continue to access their files while archiving to tape media takes place.

How: Up to 31 separate snapshots can be maintained for each volume. (The exact number depends on the version of Data ONTAP your filer is running.) You can configure snapshots to occur at specific hours of the day, and on a daily, weekly, and monthly basis.

Configuring snapshots with FilerView

Select Snapshot Config from the Disks tree or Snapshots > Configure from the Volumes tree.

Configuring snapshots from the command line

Use the snap sched command to define a schedule for automatic snapshots. Refer to the Data Protection Guide for complete information on scheduling snapshots.

Creating snapshots manually

Enter the following command at the console:

    snap create volume_name snapshot_name

volume_name is the name of the volume for which you want a snapshot created.

snapshot_name is a unique name for the snapshot.
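When planning a schedule, it helps to confirm that the snapshots it retains, plus any you create manually, stay within the per-volume limit. This Python sketch assumes the schedule is expressed as the number of weekly, daily, and hourly snapshots to keep (the form the snap sched command takes; confirm the exact syntax in the Data Protection Guide) and uses the limit of 31 stated above only as a default, since the real limit depends on your Data ONTAP version. The function names are hypothetical.

```python
"""Check a snapshot retention plan against the per-volume limit (sketch)."""


def retained_snapshots(weekly: int, daily: int, hourly: int) -> int:
    """Number of scheduled snapshots kept once the schedule reaches steady state."""
    if min(weekly, daily, hourly) < 0:
        raise ValueError("retention counts cannot be negative")
    return weekly + daily + hourly


def schedule_fits(weekly: int, daily: int, hourly: int,
                  manual: int = 0, limit: int = 31) -> bool:
    """True if scheduled plus manually created snapshots stay within `limit`."""
    return retained_snapshots(weekly, daily, hourly) + manual <= limit
```

For example, keeping 4 weekly, 14 daily, and 8 hourly snapshots retains 26, which leaves room for at most 5 manual snapshots under a 31-snapshot limit.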


Determine the best RAID group size

What: Determine and configure the correct RAID group size for your environment.

Network Appliance recommends a RAID group size of 14 disks for RAID groups that contain 9-GB, 18-GB, or 36-GB drives. If you are using drives larger than 36 GB, Network Appliance recommends that you set the RAID group size to 8 disks.

Data ONTAP 5.3.x has a default RAID group size of 14 disks. Data ONTAP 6.x has a default RAID group size of 8 disks. Depending on the types and sizes of disks used in your filer, and on the version of Data ONTAP your filer is running, you may need to modify the RAID group size.

These recommendations are based on balancing the amount of time it takes to reconstruct a failed disk with the performance characteristics of a RAID group. If you want quicker RAID reconstruction times, you can decrease the RAID group size further.

Why: The default RAID group size is based on the amount of time it takes to reconstruct a disk after a disk failure. Reducing the amount of time to reconstruct a disk reduces the likelihood of a double disk failure.

How:

Note: You cannot set a volume's RAID group size to a value smaller than the current number of disks in its RAID group.

To change the RAID group size with FilerView

1. Run FilerView.
2. Select Manage from the Volumes tree.
3. Click Modify for the volume you want to change.
4. View the current RAID group size.
5. Enter the new RAID group size for the volume.
6. Click Apply to save your changes.

To change the RAID group size from the command line

Run the vol status -v command to check the current RAID group size setting (raidsize) for each volume on your filer.

Run the vol options volume_name raidsize size command to set the volume's RAID group size. volume_name is the name of the volume and size is the new size for the RAID group.

The following example shows the RAID group size for vol0 before and after it is changed; the example would not work if vol0 already had more than eight disks in its RAID group:

    filer> vol status -v
    Volume   State    Status   Options
    vol0     online   normal   root, nosnap=off, nosnapdir=off, minra=off,
                               no_atime_update=off, raidsize=14, nvfail=off,
                               checksum_blocks=default(off)
             raid group 0: normal

    filer> vol options vol0 raidsize 8

    filer> vol status -v
    Volume   State    Status   Options
    vol0     online   normal   root, nosnap=off, nosnapdir=off, minra=off,
                               no_atime_update=off, raidsize=8, nvfail=off,
                               checksum_blocks=default(off)
             raid group 0: normal
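The trade-off behind these recommendations is easier to see with the disk counts worked out. The following Python sketch encodes the size recommendation given above (14 disks for drives up to 36 GB, 8 disks for larger drives) and the rule that a new RAID group size cannot be smaller than the number of disks already in a RAID group. It also estimates how many RAID groups and parity disks a volume needs, assuming one parity disk per RAID group (single-parity RAID, which this guide does not state explicitly) and ignoring hot spares and the case where the last group would be too small. All function names are hypothetical.

```python
"""RAID group sizing helpers: a hypothetical sketch of the guidance above."""
from __future__ import annotations

import math


def recommended_raidsize(largest_disk_gb: float) -> int:
    """14 disks for drives up to 36 GB, otherwise 8 (per this guide)."""
    return 14 if largest_disk_gb <= 36 else 8


def can_set_raidsize(new_size: int, disks_per_group: list[int]) -> bool:
    """A new size must not be smaller than any existing RAID group."""
    return new_size >= max(disks_per_group, default=0)


def layout(total_disks: int, raidsize: int) -> tuple[int, int, int]:
    """Return (raid_groups, parity_disks, data_disks) for a volume.

    Assumption: one parity disk per RAID group; groups fill up to `raidsize`.
    """
    if raidsize < 2:
        raise ValueError("a RAID group needs at least two disks")
    groups = math.ceil(total_disks / raidsize)
    parity = groups
    return groups, parity, total_disks - parity
```

Under these assumptions, a 28-disk volume with a RAID group size of 14 uses two parity disks, while a size of 8 uses four; the two extra parity disks are the capacity you trade for faster reconstruction.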


Use diagnostics diskettes

What: Run diagnostics when

● You install a new filer.
● You install new hardware in a filer.
● You suspect your filer has a hardware problem.

Your filer ships with a set of diagnostics diskettes. You can download the latest diagnostics from the NOW site.

Why: Running hardware diagnostics when you first install a filer gives you a baseline status for the filer and gives you experience using the diagnostics tools.

Running diagnostics when you install new hardware in a filer enables you to confirm that the new hardware has not affected the filer's operation negatively.

Running diagnostics when you suspect your filer has a hardware problem can help you determine whether a problem exists.

How:

1. Download the diagnostics software image and the Diagnostics User's Guide from NOW.
2. Read the Diagnostics User's Guide.
3. Create a diagnostics diskette from the software image.
4. Insert the diskette into your filer's diskette drive.
5. From a console or terminal server, halt and restart the filer.
6. When the console displays the Diagnostics Monitor and the Enter Diag, Command or Option prompt, enter the following command:

    all

The name and results of each test appear on the console as the tests are run. The diagnostics stop if a problem occurs so that you can pinpoint where the error occurs.



Register on NOW

What: Create an account on the NetApp on the Web (NOW) site.

Why: NOW provides access to the NetApp Knowledgebase, technical assistance and documentation, subscriptions, new software releases, tools and utilities, and more.

NOW also enables you to maintain information about your service contracts and installed products, submit RMAs, track orders, and check the status of support cases.

You must register on NOW to get access to new releases of Data ONTAP, firmware, and technical documentation.

How:

1. Point a browser at http://now.netapp.com/Self-Service/Start/Register.asp.
2. Fill in the forms with your e-mail address, your contract information, a filer serial number or system ID, a login ID, and a password.

You will receive a temporary guest account, which provides limited access to NOW. You will receive a confirmation of your NOW login ID within 24 hours. After your login ID is confirmed, you have full access to NOW when you use your login ID.


Remove failed disks quickly

What: When a disk fails, remove it from the disk shelf as soon as possible. Return the failed disk to Network Appliance after you receive the replacement drive.

Why: A failed disk can disrupt a Fibre Channel loop or SCSI chain, which may result in apparent double disk failures or other downtime.

How:
1. Physically remove the failed disk from your disk shelf.
2. Obtain a replacement disk:
   ● If you are not running autosupport, request a replacement disk via an RMA. You can request a replacement on the NOW site or by phoning NetApp Technical Support.
   ● If you are running autosupport and have the required entitlements, you should receive a replacement disk automatically without requesting an RMA.
3. When you receive your replacement disk, install it in the disk shelf to provide a hot spare disk. Follow the steps outlined in the Field Service Guide to replace failed disks.
4. Return the failed disk to Network Appliance.


Check drive seating regularly

What: Check drive seating regularly.

Why: Drives can walk forward in the disk shelves if the shelves are subject to excessive vibration. If a drive becomes unseated, the filer may mark the drive as bad, or the drive may start to cause errors on the Fibre Channel loop. An unseated drive can also cause an apparent double disk failure, which can cause the filer to panic.

How: Press firmly on the front of each disk drive casing. If the drive was unseated, you will hear a click as the drive is reseated and locked into position.


Read hardware flyers

What: Read the flyers distributed with NetApp hardware and follow the instructions they contain.

Why: New drives and newly supported hardware often require specific software releases and firmware updates to function properly.

Reading the flyers ensures that you are aware of the minimum software or firmware requirements for any hardware you receive.

How: When you receive a new piece of hardware, open the top of the box and check whether there is a flyer on top of the hardware. If you find a flyer, read it.

If you have any questions or concerns about a new piece of hardware, check the NOW site or contact NetApp Technical Support.


Upgrade to General Availability Release

What: Upgrade to the latest General Availability release of Data ONTAP software.

Why: Network Appliance releases new versions of the Data ONTAP software regularly. After a period of time that allows for customer feedback, a new version of Data ONTAP becomes the General Availability release.

By running the General Availability release of Data ONTAP, you ensure that your organization benefits from new features and bug fixes.

How: Follow the upgrade instructions in the Data ONTAP documentation.


Maintain boot & diagnostics diskettes

What: Maintain a current set of boot and diagnostics diskettes in case you need them.

Why: Boot diskettes enable you to boot your filer when something prevents the filer from booting from the system disks. A diagnostics diskette enables you to troubleshoot problems with your filer when it will not boot.

Boot and diagnostics diskettes are not always forward compatible with new releases. To avoid discovering that your diskettes are out of date at the moment you need them, make a current set of diskettes whenever you upgrade the version of Data ONTAP installed on your filer.

How: To create a set of boot diskettes, follow the instructions in the Data ONTAP documentation.

To create a diagnostics diskette:
1. Download the diagnostics software image and the Diagnostics User's Guide from NOW.
2. Read the Diagnostics User's Guide.
3. Create a diagnostics diskette from the software image.
4. Label the diskette and store it where it can be located quickly and easily when it is needed.


Have spare diskettes available

What: To ensure that you can create boot and diagnostics diskettes when you need them, keep a supply of spare diskettes on hand near your filer.

Why: When you determine that you need to create new boot or diagnostics diskettes, you do not want to have to go looking for spare diskettes.

How: Purchase diskettes through your normal channels.


Monitor filer health regularly

What: Monitor your filer's health regularly.

Why: The Data ONTAP software monitors filer performance and provides indicators of potential problems.

The following can indicate potential problems:
● Error events identified in the messages log file
● Reports of network errors
● Capacity issues with file systems
● CPU running at maximum capacity for extended periods of time
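As a rough illustration of the first indicator, the sketch below filters a copy of the messages log file down to lines that look like errors or link failures. The keyword pattern and the sample lines are assumptions made for illustration; this guide does not specify the wording of Data ONTAP log messages, so the pattern would need tuning to what your filer actually logs.

```python
import re
from typing import Iterable, List

# Illustrative keywords only: the exact wording of Data ONTAP log messages is
# not given in this guide, so adjust this pattern to what your filer logs.
SUSPICIOUS = re.compile(r"\b(errors?|fail(ed|ure|s)?|link down|panic)\b", re.IGNORECASE)


def flag_suspicious_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines (without trailing newlines) that match the pattern."""
    return [line.rstrip("\n") for line in lines if SUSPICIOUS.search(line)]


if __name__ == "__main__":
    # Made-up sample lines; real messages-file entries will look different.
    sample = [
        "Mon Jun  4 09:00:01 all disks initialized\n",
        "Mon Jun  4 09:05:12 e0: link down, check cable\n",
        "Mon Jun  4 09:06:40 disk 8a.3 failed\n",
    ]
    for hit in flag_suspicious_lines(sample):
        print(hit)
```

A keyword filter both misses messages and raises false alarms, so treat its output as a prompt for the regular manual review of the log rather than a replacement for it.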

How:
● Install SNMP monitoring tools such as DataFabric Manager, ApplianceWatch, HP OpenView Network Node Manager, Tivoli, and Multi Router Traffic Grapher (filer-mrtg, which is available on NOW).
● Regularly review the filer messages file (/etc/messages in the filer's default volume) for errors, link failures, and Health Monitor messages.
● Configure autosupport to send automated messages to NetApp Technical Support and to your system administrator.
● Use Health Monitor in FilerView.
● Do not let the used space in a volume exceed 90% of available capacity for an extended period of time.
● Run the sysstat l command periodically during peak loads to see whether the filer CPU is running at maximum capacity for extended periods of time.
● Run the netdiag command periodically to check the condition of the network interfaces and the network attached to the filer.
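The two numeric guidelines above (volume usage and sustained CPU load) can be expressed as simple checks, for example in a script that post-processes numbers you collect from the filer, such as used and total space per volume and CPU busy percentages from periodic sysstat runs. This is a minimal sketch: parsing real command output is left out, and the 95% CPU threshold and ten-sample window are assumptions standing in for "maximum capacity" and "extended period", not NetApp-defined values.

```python
from typing import Sequence

# From the guideline above: keep volume usage at or below 90%.
VOLUME_USED_LIMIT_PERCENT = 90
# Hypothetical stand-ins for "maximum capacity" and "extended period".
CPU_BUSY_THRESHOLD_PERCENT = 95
SUSTAINED_SAMPLES = 10


def volume_over_limit(used_kb: int, total_kb: int) -> bool:
    """True if used space exceeds 90% of capacity (integer math avoids rounding)."""
    return used_kb * 100 > VOLUME_USED_LIMIT_PERCENT * total_kb


def cpu_saturated(busy_samples: Sequence[float]) -> bool:
    """True if CPU busy % stays at or above the threshold for SUSTAINED_SAMPLES samples in a row."""
    run = 0
    for busy in busy_samples:
        # Count consecutive busy samples; any quieter sample resets the run.
        run = run + 1 if busy >= CPU_BUSY_THRESHOLD_PERCENT else 0
        if run >= SUSTAINED_SAMPLES:
            return True
    return False


if __name__ == "__main__":
    print(volume_over_limit(used_kb=95_000, total_kb=100_000))  # True: 95% used
    print(cpu_saturated([99.0] * 12))                           # True: 12 busy samples in a row
```

Requiring consecutive samples, rather than a single high reading, keeps a brief peak from being reported as the sustained saturation the guideline warns about.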


Maintain tape backups

What: Maintain regular tape backups of your filer's volumes.

Why: Data stored on your filer is protected against loss by RAID 4 technology.

When a disk fails and a hot spare disk is available, your filer reconstructs the failed disk's data onto the hot spare disk. However, RAID 4 protects your data from only a single disk failure. If a second disk fails during data reconstruction, the data in the affected volume is lost and must be restored from a backup.
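The single-failure limit follows from how RAID 4 parity works: a dedicated parity disk stores the XOR of the corresponding blocks on the data disks, so any one missing block can be recomputed from the surviving blocks and the parity, but two missing blocks cannot. The toy Python model below illustrates only that principle; the function names are hypothetical, and this is not how Data ONTAP implements RAID 4.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_for(data_blocks: List[bytes]) -> bytes:
    """Parity block that RAID 4 keeps on its single, dedicated parity disk."""
    return xor_blocks(data_blocks)


def reconstruct(data_blocks: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild one missing data block (marked None) from the survivors and parity."""
    missing = [i for i, block in enumerate(data_blocks) if block is None]
    if len(missing) > 1:
        # Parity holds one equation, so two unknown blocks cannot be solved for.
        raise ValueError("two or more disks lost: restore this data from backup")
    survivors = [block for block in data_blocks if block is not None]
    rebuilt = list(survivors)
    if missing:
        rebuilt.insert(missing[0], xor_blocks(survivors + [parity]))
    return rebuilt


if __name__ == "__main__":
    disks = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    parity = parity_for(disks)
    degraded = [disks[0], None, disks[2]]          # one data disk has failed
    print(reconstruct(degraded, parity) == disks)  # True
```

Calling reconstruct with two blocks marked None raises ValueError, which is the model's version of the double disk failure that only a backup can recover from.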

RAID 4 technology and snapshots do not protect your data from a physical disaster in your facility. Storing tape backups off site ensures that you can recover from such a disaster.

How:
1. Use a locally attached or network-attached tape backup device, together with the dump command or NDMP-compliant third-party software, to create backups.
2. Keep a copy of your backup tapes off site in a safe location.


Subscribe to Field Alerts

What: The Field Alerts mailing list provides notification of urgent product information that may affect product performance or reliability. The information sent to this list includes confidential material intended to be shared with eligible customers only.

Why: Subscribe to Field Alerts to ensure that you receive important information as soon as it becomes available.

How:
1. Log in to NOW.
2. Click the Subscriptions tab.
3. Check the checkbox next to Field Alerts.
4. Click Save.
5. Read e-mail from the Field Alerts mailing list as soon as you receive it, and take action to remedy any problem that could affect your filers.


Subscribe to Bug Notifications

What: The Bug Notifications mailing list sends a list of bugs that were made public through Bugs Online. An e-mail is sent to the list weekly.

Why: The Bugs Online feature of NOW enables you to view public information about bugs. By subscribing to the Bug Notifications mailing list, you automatically receive a weekly summary of new public bugs.

How:
1. Log in to NOW.
2. Click the Subscriptions tab.
3. Check the checkbox next to Bug Notifications.
4. Click Save.


Deal with known issues

What: Evaluate each notification you receive from the Field Alerts and Bug Notifications mailing lists to determine whether it describes issues that affect your filers. For each issue that affects your filers, take the recommended steps as quickly as possible to remedy or work around the problem.

Why: The majority of filer crashes and problems are due to bugs that are quickly identified and fixed by Network Appliance. When hardware problems that affect reliability are identified, you will be notified. You must respond to these notifications to maintain your filer's reliability.

How: Follow the recommendations listed in the notifications you receive. If you are not sure whether an issue affects your filers, contact NetApp Technical Support for help in determining the potential impact on your filers.
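If you manage several filers, the evaluation step can be partly scripted. The sketch below is purely illustrative: notifications arrive as e-mail, so the Notification and Filer records, their fields, and the release and hardware strings are hypothetical structures you would fill in by hand or from your own inventory, not a NOW or Data ONTAP interface. It deliberately flags a filer if either its release or any of its hardware is listed, leaving the final judgment (and any call to NetApp Technical Support) to you.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Notification:
    """Hypothetical, simplified summary of one Field Alert or bug notice."""
    ident: str
    affected_releases: Set[str]
    affected_hardware: Set[str] = field(default_factory=set)


@dataclass
class Filer:
    """Hypothetical inventory record for one filer."""
    name: str
    ontap_release: str
    hardware: Set[str]


def affected_filers(notices: List[Notification], filers: List[Filer]) -> Dict[str, List[str]]:
    """Map each notification ID to the names of filers it may affect."""
    result: Dict[str, List[str]] = {}
    for notice in notices:
        # Err on the side of caution: a release match OR a hardware match flags the filer.
        hits = [
            f.name
            for f in filers
            if f.ontap_release in notice.affected_releases
            or notice.affected_hardware & f.hardware
        ]
        if hits:
            result[notice.ident] = hits
    return result


if __name__ == "__main__":
    notices = [
        Notification("alert-1", affected_releases={"6.0.1"}),
        Notification("bug-2", affected_releases=set(), affected_hardware={"shelf-model-x"}),
    ]
    filers = [
        Filer("filer-a", "6.0.1", {"shelf-model-y"}),
        Filer("filer-b", "6.1", {"shelf-model-x"}),
    ]
    print(affected_filers(notices, filers))
    # {'alert-1': ['filer-a'], 'bug-2': ['filer-b']}
```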