TRANSCRIPT
Dell™ Training for PowerEdge™ Systems
Contents
Dell™ Training for PowerEdge™ Systems .................................................................................................. 1
Introduction .................................................................................................................................................. 8
Safety Guidelines ...................................................................................................................................... 8
Additional Information .............................................................................................................................. 8
PowerEdge Server Families ........................................................................................................................... 9
Objectives.................................................................................................................................................. 9
What Is a Server? ...................................................................................................................................... 9
Power Supplies ...................................................................................................................................... 9
Fans ..................................................................................................................................................... 10
Memory ............................................................................................................................................... 10
PERC RAID Controller .......................................................................................................................... 10
Critical Components for Redundancy in Servers ................................................................................ 10
PowerEdge Server Types ......................................................................................................................... 11
Rack‐Optimized Servers ...................................................................................................................... 11
SC Server Models ................................................................................................................................ 12
Blade Servers ....................................................................................................................................... 12
Dell PowerEdge Server Families Naming Schema ................................................................................... 12
Optional Components ............................................................................................................................. 13
Summary ................................................................................................................................................. 13
Configuration and Assembly ....................................................................................................................... 14
Objectives................................................................................................................................................ 14
PowerEdge System Components ............................................................................................................ 14
System Board Features and Connectors ............................................................................................. 14
PowerEdge System Components ............................................................................................................ 15
System Components ........................................................................................................................... 15
RAID Controller ..................................................................................................................................... 15
Disk Drive I/O Connectors ................................................................................................................... 16
Peripheral Connectors ........................................................................................................................ 16
I/O Expansion Cards/Risers ................................................................................................................. 16
Expansion Slots ................................................................................................................................... 17
PCI Bus ................................................................................................................................................. 18
Hot‐Plug PCI ........................................................................................................................................ 19
Indicators on the Hot‐Plug PCI ............................................................................................................ 19
Processors ........................................................................................................................................... 19
Types of Memory ................................................................................................................................ 20
Error Checking and Correcting ............................................................................................................ 21
Chipkill Memory ................................................................................................................................. 22
Memory Interleaving .......................................................................................................................... 22
Memory Technologies ........................................................................................................................ 22
Power Subsystem: An Overview ......................................................................................................... 24
Power Subsystem ................................................................................................................................ 24
Storage Subsystems: An Overview ..................................................................................................... 25
Storage Subsystems ............................................................................................................................ 25
Backplane ............................................................................................................................................ 26
Backplanes and Split Backplanes ........................................................................................................ 27
BIOS ......................................................................................................................................................... 27
BIOS: An Overview .............................................................................................................................. 27
BIOS Sequence: The following are the steps that a typical boot sequence involves: ........................ 28
Configuring Your BIOS .......................................................................................................................... 29
Restoring BIOS/Factory Defaults ......................................................................................................... 29
Clearing a System’s NVRAM ................................................................................................................ 30
Summary ................................................................................................................................................. 30
Networking ................................................................................................................................................. 31
Objectives................................................................................................................................................ 31
NIC and NIC Teaming .............................................................................................................................. 31
NIC Teaming ............................................................................................................................................ 31
What Is NIC Teaming? ......................................................................................................................... 31
NIC Teaming In PowerEdge Servers .................................................................................................... 32
NIC Installation Guidelines .................................................................................................................. 32
TCP/IP Offload Engine ............................................................................................................................. 32
What Is TOE? ....................................................................................................................................... 32
TOE and OSI ......................................................................................................................................... 33
Broadcom Advanced Control Suite (BACS) Version 2 ......................................................................... 34
TOE Recommendations ....................................................................................................................... 34
SUMMARY ............................................................................................................................................... 34
Server Management ................................................................................................................................... 35
Module Objectives .................................................................................................................................. 35
What Is Management? ............................................................................................................................ 35
OpenManage System Overview .............................................................................................................. 35
Overview ............................................................................................................................................. 35
The OpenManage Model ........................................................................................................................ 36
OpenManage Model ........................................................................................................................... 36
Remote Management ............................................................................................................................. 37
In‐band Interfaces ............................................................................................................................... 38
Out‐of‐Band Management .................................................................................................................. 38
Dell Remote Access Controller ................................................................................................................ 39
DRAC Integrates With BMC/ESM ........................................................................................................ 39
DRAC 5 .................................................................................................................................................... 39
DRAC Terminology .............................................................................................................................. 39
DRAC 5 Overview ................................................................................................................................ 39
RACADM and DRAC 5 .......................................................................................................................... 40
Virtual Media .......................................................................................................................................... 41
Overview ............................................................................................................................................. 41
Virtual Media Plug‐in .......................................................................................................................... 41
Virtual Media Support ......................................................................................................................... 42
Virtual Media on Microsoft Windows ................................................................................................. 42
Virtual Flash ........................................................................................................................................ 43
Configuring a Bootable Virtual Flash................................................................................................... 43
Console Redirection ............................................................................................................................ 43
Viewing and Saving System Event Logs .................................................................................................. 44
Dell OpenManage Server Administrator ............................................................................................. 46
Summary ................................................................................................................................................. 46
Storage ........................................................................................................................................................ 47
Module Objectives .................................................................................................................................. 47
Storage Technologies .............................................................................................................................. 47
Serial ATA (SATA) ................................................................................................................................ 47
Small Computer System Interface (SCSI) ............................................................................................. 48
SCSI Limitations ................................................................................................................................... 48
Serial Attached SCSI (SAS) ................................................................................................................... 48
SCSI vs. SAS Technologies ................................................................................................................... 49
Serial Attached SCSI (SAS) ....................................................................................................................... 49
The PHY: Basis of all SAS Communication ........................................................................................... 49
Links and Ports .................................................................................................................................... 50
Expanders ............................................................................................................................................ 50
Connection Rates ................................................................................................................................ 51
SAS Transport Protocol ....................................................................................................................... 51
Direct/Expander Attached End Devices .............................................................................................. 52
SAS Topology ....................................................................................................................................... 53
SAS Domains ....................................................................................................................................... 53
SAS Device Detection .......................................................................................................................... 55
SAS Controllers .................................................................................................................................... 56
SAS/SATA Hard Drives ......................................................................................................................... 56
Disk Data Format (DDF)....................................................................................................................... 57
Redundant Array of Independent Disks (RAID) ...................................................................................... 57
Redundant Array of Independent Disks (RAID) .................................................................................. 57
RAID Levels .......................................................................................................................................... 58
Global Hot Spare ................................................................................................................................. 60
Disk Failure .......................................................................................................................................... 60
PowerEdge Expandable RAID Controller (PERC) ..................................................................................... 61
Introduction ........................................................................................................................................ 61
PERC Nomenclature ............................................................................................................................ 61
PERC 4 ................................................................................................................................................. 62
PERC 5 ................................................................................................................................................. 62
PERC 5 SAS and SATA Support ............................................................................................................ 63
PERC 5 Performance ........................................................................................................................... 63
PERC 5 Battery Backup Unit (BBU) ...................................................................................................... 63
Battery Thermal Impacts ..................................................................................................................... 64
Battery Learn Cycle ............................................................................................................................. 64
Native and Foreign Configurations ..................................................................................................... 64
Foreign Arrays ..................................................................................................................................... 64
Auto Import of Foreign Configurations after Migration ..................................................................... 65
<Ctrl><R> Overview .............................................................................................................................. 65
Functions of the < Ctrl > < R > Utility .................................................................................................. 65
Ctrl‐R User Interface ........................................................................................................................... 66
Multiple Adapters ............................................................................................................................... 67
Not in the BIOS Utilities ...................................................................................................................... 67
Foreign Configurations in CTRL‐R........................................................................................................ 67
OpenManage Server Administrator ........................................................................................................ 68
Introduction ........................................................................................................................................ 68
Connecting to OMSA ........................................................................................................................... 68
Component Properties ........................................................................................................................ 69
Health: ................................................................................................................................................. 69
Component Status .............................................................................................................................. 70
Storage Information / Configuration .................................................................................................. 70
Storage Management Features .......................................................................................................... 71
Controller Object................................................................................................................................. 71
Battery Object ..................................................................................................................................... 72
Connector Object ................................................................................................................................ 73
Enclosure/Backplane Object ............................................................................................................... 73
Physical Disk Object ............................................................................................................................ 73
EMMs Object ....................................................................................................................................... 74
Fans Object ......................................................................................................................................... 74
Power Supplies Object ........................................................................................................................ 74
Temperatures Object .......................................................................................................................... 74
Firmware/Driver Versions Object ....................................................................................................... 74
Virtual Disks Object ............................................................................................................................. 74
Server Internal Storage Troubleshooting ................................................................................................ 76
Drive(s) going offline ........................................................................................................................... 76
SMART errors on the physical drive .................................................................................................... 76
Dell OpenManage Server Administrator warnings about firmware ................................................... 76
Driver not ready or unrecoverable errors ........................................................................................... 76
RAID Troubleshooting ............................................................................................................................. 76
Drive in a Fail State ............................................................................................................................. 77
Drive in a Missing State ....................................................................................................................... 77
Multiple Drive Failure ......................................................................................................................... 78
Hard Drive with Pre‐Failure Warning .................................................................................................. 78
Summary ................................................................................................................................................. 78
Troubleshooting and Diagnostics ................................................................................................................ 79
Objectives................................................................................................................................................ 79
Troubleshooting Techniques .................................................................................................................. 79
Broad‐Level Steps ............................................................................................................................... 79
Check the Obvious .............................................................................................................................. 79
Communicate ...................................................................................................................................... 79
Post Messages and Other Error Indications ............................................................................................ 80
What Is an LCD Panel? ........................................................................................................................ 80
Interpreting LCD Message Codes ........................................................................................................ 80
Display of Error Messages ................................................................................................................... 81
BIOS Progress Code Display ................................................................................................................ 81
LCD Display .......................................................................................................................................... 81
Hard Drive Indicator Codes ................................................................................................................. 82
NIC Activity/Link Indicators ................................................................................................................. 83
DRAC Activity/Link Indicators .............................................................................................................. 83
System Status Indicator ...................................................................................................................... 83
Power Status/Fault/Present Indicators ............................................................................................... 84
BIOS Messages ........................................................................................................................................ 84
System Messages ................................................................................................................................ 84
Baseboard Management Controller ....................................................................................................... 85
What Is Baseboard Management Controller? .................................................................................... 85
BMC and the LCD Panel ...................................................................................................................... 85
BMC Connections ................................................................................................................................ 86
Connection Modes .............................................................................................................................. 86
BMC KG Key......................................................................................................................................... 87
Configuring the BMC ........................................................................................................................... 88
Intelligent Platform Management Interface ........................................................................................... 88
Intelligent Platform Management Interface (IPMI) ............................................................................ 88
IPMI 2.0 ............................................................................................................................................... 89
Virtual Media on Linux ........................................................................................................................ 89
Dell Diagnostics Distribution Package ..................................................................................................... 90
What Is Dell Diagnostics Distribution Package (DDDP)? ..................................................................... 90
DDDP Media Options .......................................................................................................................... 95
Boot Order: Flash Key ......................................................................................................................... 96
Test Selection Menu ........................................................................................................................... 97
Running Diagnostics ............................................................................................................................ 98
Overview of DDDP in Linux ................................................................................................................. 98
DDDP in Linux ...................................................................................................................................... 98
MP Memory Tests ............................................................................................................................. 100
Dell PowerEdge Diagnostics: An Overview ....................................................................................... 100
Dell PowerEdge Diagnostics Features ............................................................................................... 101
Dell 32‐Bit Hardware Diagnostics ......................................................................................................... 104
Troubleshooting and Diagnostics .......................................................................................................... 105
Summary ............................................................................................................................................... 105
Navigating Dell Information and Tools ..................................................................................................... 107
Module Objectives ................................................................................................................................ 107
Objectives.......................................................................................................................................... 107
Dell System E‐Support Tool (DSET) ....................................................................................................... 107
Support.Dell.com .................................................................................................................................. 107
Dell Solution Network ........................................................................................................................... 108
Appendix A: RAID/PERC Terms ................................................................................................................. 109
Appendix B: Glossary ................................................................................................................................ 111
Introduction
Safety Guidelines
You must take precautions to prevent electrostatic discharge (ESD).
• Static Electricity ‐ A charge stored in any body.
• Electrostatic Discharge ‐ A sudden transfer of electrostatic charge between bodies at different electrostatic potential, usually as a spark as the bodies approach one another.
ESD is a major concern when handling components, especially expansion cards and system boards. Very slight charges can damage circuits. Damage from ESD can occur immediately, or it may not become apparent for some time. ESD may also result in intermittent problems or a shortened product lifespan.
You can minimize the chances of a discharge by wearing the wrist‐grounding strap that comes in the ESD kit and taking several precautions:
• While the system is plugged into the Earth circuit via the power socket, attach the wrist‐grounding strap to your wrist and clip the other end to a grounded object. If a wrist‐grounding strap is not available, you can discharge the static electricity in your body by touching an unpainted metal surface, such as the computer chassis.
• Unplug the machine.
Static‐sensitive components arrive wrapped in anti‐static packing material. Do the following when handling static‐sensitive components:
• Use an ESD wrist‐grounding strap.
• Handle all sensitive components in a static‐safe area.
• If possible, use anti‐static floor mats and workbench pads.
• When unpacking a static‐sensitive component, do not remove the component from the antistatic packing material until you are ready to install the component into your system.
Additional Information
• This course should take 4 hours to complete (excluding the assessment).
• For additional technical information, refer to support.dell.com.
PowerEdge Server Families
Objectives
Upon completion of this section, you will be able to:
• Identify the features that are inherent to servers.
• Compare the different physical configurations of Dell servers.
• Recognize the characteristics of the Dell PowerEdge™ server families.
• Identify components that are inherent to servers.
What Is a Server?
A server is a computer designed with the capability to survive a given level of hardware component failure. Inherent to the server is a level of redundancy for all critical components. The following describes the functionality of components that provide failover capabilities in a server.
Power Supplies
Redundant power supplies are available on most models. Redundant power supplies failover (automatically take over) if the primary power supply fails.
Fans
Fans keep the server at the proper temperature.
Memory
Memory can be configured to provide different levels of redundancy:
• Spare bank
• Memory mirroring
• Memory RAID
PERC RAID Controller
PERC RAID controllers provide the capability of using different levels of RAID data protection for the contents of the disks.
Critical Components for Redundancy in Servers
In addition to the available redundancy, servers have a hardware monitoring system built on top of the executing environment. This system of monitoring probes is controlled by a Baseboard Management Controller (BMC) and provides the capability to be proactive in forecasting the possible failure of a server component. The information available from the BMC is accessible by various means and can be used for notification to support staff.
• Failovers ‐ Failover is monitored and managed by a power supply controller that will detect a failure and utilize the redundant component. Also, for servers that require multiple power supplies, the controller will perform power load balancing to reduce the load on any single power supply, which extends the mean‐time‐to‐failure (MTTF).
• Fans ‐ All fans are controlled by the BIOS to ensure that if a fan fails, the remaining fans will speed up to ensure proper cooling. Most fans are hot‐swappable.
• Memory redundancy configurations:
o Spare bank configurations provide the capability of keeping the server running when a bank of memory chips fails. Upon bank failure, the designated spare bank of memory will take over the work of the failed bank.
o Memory mirroring splits the total amount of memory and will provide two copies of memory contents so that if multiple banks fail, the surviving side of the mirror will continue supporting memory activity.
o Memory RAID provides memory content protection by calculating parity values stored within memory.
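The parity idea behind memory RAID can be sketched in a few lines of Python. This is an illustration of the principle only, not Dell's memory-controller implementation: XOR parity stored alongside the data lets the contents of any single failed bank be reconstructed from the survivors.

```python
# Illustrative sketch (not Dell firmware): XOR parity across memory banks.

def parity(banks):
    """Compute the parity of corresponding byte values across banks."""
    p = 0
    for b in banks:
        p ^= b
    return p

def reconstruct(surviving, stored_parity):
    """Rebuild the byte from a failed bank using the survivors plus parity."""
    return parity(surviving) ^ stored_parity

banks = [0x3A, 0x7F, 0xC4]          # one byte from each of three banks
p = parity(banks)                   # stored in the parity region
# Suppose the bank holding 0x7F fails; rebuild its contents:
rebuilt = reconstruct([0x3A, 0xC4], p)
```

Because XOR is its own inverse, the parity of the surviving banks combined with the stored parity yields exactly the missing value.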
PowerEdge Server Types
Dell offers four types of servers, as well as accessories and options, for network administrators who face an ever‐changing technology environment. These servers and options include: tower servers, rack‐optimized servers, SC servers, blade servers, and optional components. SC servers are available in both rack‐mount and tower configurations. A rack‐mount configuration is designed to be installed in a rack (a large steel cage into which servers can be mounted for centralization). A tower configuration is better for environments where a rack is not available and the server is to be installed on or under a desk or in another similar location.
Tower Servers
Dell PowerEdge tower servers are manufactured with the same performance characteristics as the other Dell server types, but with several key advantages:
• Form factor is better suited for unracked implementations.
• These servers typically offer more expansion slots than rack‐mounted systems, due to greater space capacity.
• Some tower configurations can also be rack‐mounted, such as the Dell PowerEdge 6800 and 2900.
Rack‐Optimized Servers
Dell PowerEdge rack servers offer computing power and scalability in a form‐fitting housing. These servers typically give you the following advantages:
• Better form factor for racking (1U = 1.75 inches).
• Small profile takes up less rack space than a typical tower server, enabling configurations of 1 to 2U, as compared to up to 7U for some tower models.
• Rack servers give you the same computing power but greater computing density, by allowing you to fit more servers into valuable floor space.
SC Server Models
The Dell SC line of servers offers an entry‐level, scalable solution to businesses at an aggressive price. Since these systems are generally cost‐driven solutions, there are some limitations compared with other Dell PowerEdge solutions, such as:
• Limited number of processors (two maximum)
• Single power supplies
These limitations are often overshadowed by the ease with which these servers can be scaled out to fit your business's expansion.
Blade Servers
A blade server is often referred to as a server on a system board. These servers contain memory, processors, storage, and network connections, yet they share a common set of power supplies and cooling fans that are housed in a blade chassis mounted in a standard rack. Additionally, blade chassis can contain commonly shared storage area network (SAN) connections, switches, and InfiniBand connections. Blade servers are typically hot‐swappable within the blade chassis, and their architecture is closer to that of a mainframe. These types of architecture are typically used in large server farm implementations where space and cooling capabilities are at a premium.
Dell PowerEdge Server Families Naming Schema
Dell PowerEdge servers have adopted a server naming schema that allows you to easily determine a server's generation, the model within the generation, and whether the system is tower or rack‐optimized. Server designation uses a very simple MGR0 schema, which can be deciphered as follows:
• M is the server model within a generation
• G is the generation number
• R is 0 for tower, 5 for rack‐optimized
• 0 can also be 5 for a blade server
Note that SC servers and some server families do not necessarily follow strict nomenclature rules (for example, the PowerEdge 860 and 840). NOTE: This will change in the next generation of servers.
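The MGR0 decoding rule can be sketched as a small Python helper. This is an illustrative decoder only; as noted above, SC servers and some families do not follow the strict nomenclature, so the sketch simply rejects anything that is not a four-digit model number.

```python
# Illustrative decoder for the MGR0 naming schema (strict cases only).

def decode_model(name):
    digits = name.strip()
    if len(digits) != 4 or not digits.isdigit():
        raise ValueError("not a strict MGR0 model number")
    m, g, r, last = digits
    if last == "5":                      # trailing 5 marks a blade server
        form = "blade"
    elif r == "5":                       # third digit 5 = rack-optimized
        form = "rack-optimized"
    elif r == "0":                       # third digit 0 = tower
        form = "tower"
    else:
        raise ValueError("unrecognized form-factor digit")
    return {"model": m, "generation": int(g), "form_factor": form}
```

For example, `decode_model("2950")` reports a 9th-generation rack-optimized server, while `decode_model("1955")` reports a blade.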
Optional Components
Servers also utilize optional components such as power distribution units (PDUs), racks, and network switches. The following graphic describes the various components and their use.
Summary
About servers:
• Servers increase the mean‐time‐to‐failure (MTTF), thereby providing a greater amount of up‐time.
• Dell servers are available in a tower, rack‐optimized, or blade configuration. Rack‐mounted configurations are ideal for saving space. Some tower form factors are rackable.
• Spare bank memory, memory mirroring, and memory RAID are three types of memory configurations that allow redundancy in servers.
• Other components provide redundancy in servers, such as fans, hard drives, and power supplies. These components can also be hot‐swappable, or be removed and/or replaced while the server is powered up.
• Servers have a hardware monitoring system built on top of the executing environment that is controlled by a baseboard management controller (BMC). The BMC monitors the hardware and can help predict when a component will fail.
• Dell servers use PowerEdge Expandable RAID Controllers, or PERCs, Dell's proprietary RAID controllers. Remember, the "E" stands for "expandable," not "Edge!"
Configuration and Assembly
Objectives Upon completion of this section you will be able to:
• Identify the locations of different Dell PowerEdge™ components.
• Describe the functionalities of the Dell PowerEdge components.
• Interpret the variety of component and operating system technologies.
PowerEdge System Components
Powerful components are integral in the design of the PowerEdge server. This section covers the components that are specific to Dell servers, including all the individual components on the system board.
System Board Features and Connectors
The Dell PowerEdge system board is the foundation of all Dell PowerEdge servers. All major components are either integrated into the system board, or sockets are provided for components to attach through various methods. Each Dell PowerEdge server has a similar system board. While the size and shape may vary, the component functionality is similar. The PowerEdge 2950 system board shown here is the "heart" of the server.
• Gigabit Ethernet connector: Port for the network interface controller (NIC).
• PCI riser connector: Used to connect the peripheral component interconnect (PCI) riser.
• System board power connectors: Used to supply power to the system board.
• Side‐plane connector: Used for connecting the Serial Attached SCSI (SAS) controller to the system board.
• Backplane: Interface between the hard drives and controller (drives plug into this component).
• Backplane power connector: Used to supply power to the backplane.
• Secondary CPU: The secondary processor in the system.
• Primary CPU: The primary processor in the system.
• DRAC‐5 daughter card connectors: Used to connect the Dell Remote Access Card 5 (DRAC‐5) to the system board. The DRAC‐5 enables IT personnel to manage the system remotely.
• DIMM sockets: Sockets for system memory, sometimes located directly on the system board and other times located on the memory riser card.
• PSPB/PDB: Power Supply Paralleling Board or Power Distribution Board. Distributes power from multiple power supplies.
• PERC: PowerEdge Expandable RAID Controller. Dell's line of RAID controllers.
System Components
This section covers the specific components that are found in most Dell servers.
RAID Controller
Redundant Array of Independent Disks (RAID) is a disk subsystem that employs two or more drives in combination for fault tolerance and performance. The Dell PowerEdge Expandable RAID Controller (PERC) is discussed in greater detail later in this course.
Disk Drive I/O Connectors
A disk drive I/O connector is an internal connector that attaches directly to components without the use of cables. The following list describes the available internal connections.
1) Small Computer Systems Interface (SCSI) connector: Used to connect internal SCSI devices.
2) Serial Advanced Technology Attachment (SATA) connector: Used to connect internal Serial ATA devices.
3) Serial Attached SCSI (SAS) connector: Used to connect SAS devices.
Peripheral Connectors
• DRAC ENET: A dedicated Ethernet port for the DRAC‐5.
• Serial: Used to connect serial devices.
• VGA (Video Graphics Array): Used to attach a monitor.
• USB (Universal Serial Bus): Used to connect various peripheral devices.
• GB ENET: Used to connect the server to the network via high‐speed Ethernet.
I/O Expansion Cards/Risers
Peripheral component interconnect (PCI) specifies a computer bus for attaching devices to a system board or peripheral. These devices may include RAID controllers, network interface cards (NICs), and
host bus adapters (HBAs). The PCI connection to the system board provides an electrical path to the expansion cards from the processor and other components, allowing for two‐way communication. A riser is an expansion card that extends a slot for a chip or card in a fully loaded computer to make room to plug it in. The cards are plugged into the riser card in a parallel orientation to the system board. There are three types of PCI buses.
1) PCI was the initial bus standard, with transfer rates such as 133 megabytes (MB) per second.
2) PCI‐Extended (PCI‐X) is an extension of the initial PCI bus, with transfer speeds up to 1 gigabyte (GB) per second.
3) PCI‐Express (PCI‐E) speeds start at 2+ GB per second and above, depending on the number of lanes used.
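The throughput figures quoted for the three bus types follow from simple width-times-clock arithmetic, sketched below. The clock rates and widths are assumed first-generation values (33 MHz/32-bit PCI, 133 MHz/64-bit PCI-X, ~250 MB/s per PCI-E gen-1 lane), not figures from this course.

```python
# Rough peak-throughput arithmetic for the parallel PCI buses.

def parallel_bus_mb_s(width_bits, clock_mhz):
    """Peak MB/s = bytes per transfer x transfers per microsecond."""
    return width_bits / 8 * clock_mhz

pci = parallel_bus_mb_s(32, 33)      # 132 MB/s (~133 with a 33.33 MHz clock)
pci_x = parallel_bus_mb_s(64, 133)   # 1064 MB/s, roughly 1 GB/s

# PCI-E is serial: roughly 250 MB/s per lane per direction (gen 1),
# so throughput scales with the lane count.
pcie_x8 = 250 * 8                    # 2000 MB/s, i.e. the "2+ GB/s" figure
```

The same arithmetic explains why lane count, not clock speed alone, determines PCI‐E throughput.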
Expansion Slots
PCI Bus
A bus is a set of conductors that connect the functional units in a computer. The type of PCI bus manufactured into a system is usually determined by the manufacturer. Typically only the latest, best, and fastest version is used. However, a user might choose an older version of bus architecture to preserve an investment in expansion cards. The table displayed compares PCI‐X with PCI‐E.
Hot‐Plug PCI
Some PowerEdge servers have hot‐plug PCI support. The following conditions must be met for that support:
• Server must support hot‐plug PCI
• Operating system must support hot‐plug PCI
• PCI adapter must support hot‐plug PCI
• Fibre Channel HBAs and network adapters support hot‐plug PCI
• Dell RAID and SCSI controllers do not support hot‐plug PCI
Indicators on the Hot‐Plug PCI
1) Off: Expansion slot power is off. No action is required.
2) Green: Expansion slot power is on. No action is required.
3) Green, blinking fast: Expansion slot is being identified by an application program or driver. No action is required.
4) Amber, blinking slowly: Expansion card is faulty or improperly installed and is causing a problem with power supply to the card.
5) Amber, blinks twice and pauses, then repeats the sequence: Hot‐plugged expansion card is operating at a slower speed than other cards on the same PCI bus.
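The indicator states above amount to a small lookup table. The sketch below shows how a hypothetical monitoring script (an illustration, not a Dell tool) might map an observed LED state to its meaning:

```python
# Lookup table mirroring the hot-plug PCI slot indicator states above.
PCI_SLOT_LED = {
    "off": "Slot power is off; no action required.",
    "green": "Slot power is on; no action required.",
    "green_blink_fast": "Slot is being identified by an application or driver.",
    "amber_blink_slow": "Card is faulty or improperly installed (power problem).",
    "amber_blink_2_pause": "Card is operating at a slower speed than others on the bus.",
}

def describe(state):
    """Return the meaning of an observed LED state."""
    return PCI_SLOT_LED.get(state, "Unknown indicator state")
```

For example, `describe("amber_blink_slow")` points a technician toward a faulty or improperly installed card.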
Processors
• The processor, also known as the CPU, is the primary chip within your system.
• The processor comprises the Arithmetic Logic Unit and the Control Unit. The processor performs the calculations and executes the processes that cause your server to function properly.
• An increase in the speed of your processor will increase the overall performance of your system.
Types of processors installed in the Dell PowerEdge systems include:
• AMD Opteron 8200 Series • AMD Opteron 2200 Series
• Intel Xeon 7100 Series • Intel Xeon 5300 Series • Intel Xeon 5100 Series • Intel Xeon 5000 Series • Intel Xeon 3000 Series • Intel Pentium D915 • Intel Celeron D336
Operating System Support
An operating system is the core program running on a computer. It is responsible for providing a platform for other applications to run from. It coordinates input (from a keyboard, for example) and output (to a printer, for example) activities on behalf of the other applications. The operating system is always running and is the first software to load during the boot process.
Types of Memory
The differences in the types of memory are related to how individual cells of memory are accessed or the degree of error correction available. Dell determines the type of memory in its servers.
1) Dynamic Random Access Memory (DRAM): DRAM stores each bit of data in a separate capacitor. This type of memory needs to be refreshed in order to stay current because the capacitors leak electrons, hence the name dynamic.
2) Synchronous DRAM (SDRAM): SDRAM is a specialized DRAM chip that uses an internal clock coordinated with the system processor in order to synchronize the input and output of data. The speed of the SDRAM chip is therefore limited by the speed
of the processor. A faster set of processors means that a faster SDRAM chipset can be used.
3) Dual In‐line Memory Module (DIMM): A DIMM is a type of memory module that uses a 64‐bit bus to transfer data.
4) Single In‐line Memory Module (SIMM): A SIMM is a type of memory module that uses a 32‐bit bus to transfer data.
5) Double Data Rate (DDR): DDR is a type of SDRAM that sends data on both the rising and the falling edges of the clock cycle. This effectively doubles the rate at which data can be read compared with a standard SDRAM module.
6) Double Data Rate 2 (DDR2): DDR2 is the latest generational evolution of the SDRAM module set. It improves on the DDR family with an increased number of buffers, a faster pre‐fetch rate, improved packaging, and reduced electrical demands.
7) Fully Buffered DIMM (FB‐DIMM): FB‐DIMM combines the high‐speed internal architecture of DDR2 memory with a new point‐to‐point serial memory interface, which links the FB‐DIMM modules together in a chain.
Error Checking and Correcting
• Error Checking and Correcting (ECC) checks and corrects data in real time (on the fly).
• Non‐ECC using SDRAM or DDR‐SDRAM does not check for errors during the data read and/or transmission process.
• Registered memory has a register chip that:
o Clocks data in and out
o Is slower than non‐registered modules (takes one extra clock cycle)
o Improves data transfer by "re‐driving" the control signals in the memory chips
• DDR (Double‐Data Rate) SDRAM uses a double‐data‐rate clocking technique to push its peak burst bandwidth to 1.6 GB/sec, compared to 1 GB/sec for PC133 SDRAM.
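The single-bit correction that ECC performs can be illustrated with a toy Hamming(7,4) code. This is only the principle: real server ECC uses much wider codes (for example the 144-bit words described under Chip-kill below), but the syndrome-based correction works the same way.

```python
# Toy Hamming(7,4) code: 4 data bits protected by 3 parity bits, able to
# locate and correct any single flipped bit. Illustration only.

def hamming_encode(d):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4      # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4      # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4      # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming_correct(c):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3     # 0 = clean; else 1-based error position
    if syndrome:
        c[syndrome - 1] ^= 1            # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]]

word = hamming_encode([1, 0, 1, 1])
word[4] ^= 1                            # simulate a single-bit memory error
```

The syndrome directly names the corrupted position, which is why the correction happens "on the fly" with no retransmission.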
Chip‐kill Memory
• Chip‐kill is a function of the system's memory controller and works with ECC memory.
• Failure of DRAM devices will not cause uncorrectable data errors.
• Chip‐kill is accomplished by spreading two 144‐bit ECC words across four (4) dual in‐line memory modules (DIMMs) (32 bytes).
• Each ECC word has the capability of correcting any single‐bit error or any four (4) adjacent bits (nibble).
• When failure is detected and memory DRAM is bypassed, the amount of RAM available for use is decreased by the size of the now unusable DRAM device.
• The operating system reports the full system memory even when chip‐kill has bypassed DRAM devices on memory modules.
• Use the Dell System E‐Support Tool (DSET) or other utility to review the system logs for information related to DRAM failure.
Memory Interleaving
Non‐Interleaved Memory
• System memory is accessed sequentially one DIMM at a time.
Interleaved Memory
• Interleaving memory addressing is found in higher‐end servers.
• Interleaving works by dividing the system memory into multiple blocks.
• Multiple chunks of data (either in chunks of two or four) are accessed, and then processed simultaneously.
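The block-division idea can be sketched as an address-mapping function. The 64-byte chunk size and modulo mapping below are illustrative assumptions, not the actual layout used by any particular PowerEdge memory controller:

```python
# Sketch of interleaved addressing: consecutive chunks of the address
# space are spread round-robin across banks so they can be fetched in
# parallel.

def bank_and_offset(address, n_banks, chunk=64):
    """Map a physical address to (bank number, offset within that bank)."""
    chunk_index = address // chunk
    bank = chunk_index % n_banks
    offset = (chunk_index // n_banks) * chunk + address % chunk
    return bank, offset

# With 4-way interleaving, four consecutive 64-byte chunks land in four
# different banks and can be accessed simultaneously:
banks_hit = [bank_and_offset(a, 4)[0] for a in (0, 64, 128, 192)]
```

A sequential read therefore keeps all four banks busy at once, which is the source of the performance gain over one-DIMM-at-a-time access.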
Memory Technologies
Memory is the electronic holding place for instructions and data that a system's processor accesses. The following list describes the different memory technologies.
1) Spare bank: Allows a backup memory module to be used when a primary memory module fails. This backup allows the server to continue functioning until the failed primary module is replaced.
2) Memory mirroring: Enables data to be mirrored identically across two memory banks. This mirroring allows the server to remain functioning if the primary memory bank fails.
3) Memory RAID: Memory RAID provides memory content protection by calculating parity values stored within memory.
Power Subsystem: An Overview
Dell PowerEdge servers can be configured with redundant, hot‐swappable power supplies. Redundant power supplies are an integral part of a high‐availability system.
The power requirements of your server should follow the N+1 rule. Simply stated, if your server requires N power supplies you should install N+1. In this manner, if one of your power supplies goes down, the server can still have the required amount of power to operate properly. Additionally, by having hot‐swappable power supplies installed, you can replace the defective power supply without powering down the server.
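The N+1 rule is simple enough to state as a one-line check. The helper below is a hypothetical illustration of the arithmetic, not a Dell sizing tool:

```python
# The N+1 rule: if the load needs N supplies, install N+1 so the loss of
# any single supply leaves enough capacity to keep the server running.

def n_plus_one_ok(required, installed):
    """required: supplies needed to carry the load; installed: supplies fitted."""
    return installed >= required + 1

n_plus_one_ok(required=2, installed=3)   # redundant: survives one failure
n_plus_one_ok(required=2, installed=2)   # not redundant: one failure = outage
```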
Power Subsystem
The following list provides details on the functionality of each component of the PowerEdge power subsystem.
• Single or Redundant Power Supplies
o A Dell PowerEdge system may be equipped with redundant power supplies in an N + 1 configuration.
o Indicator LED:
Red or amber indicator light: failure
Green indicator light: normal operation
• Power Board
o Systems with redundant power supplies contain a power board.
o Two types of power boards are available:
PSPB: Power Supply Paralleling Board
PDB: Power Distribution Board
• Power Supply Paralleling Board (PSPB)
o Distributes the power load of the server across multiple power supplies.
o Provides redundancy by supporting a failover power supply.
o In an N + 1 configuration of three power supplies:
Two power supplies work together to provide the system with power. The third power supply waits to become active if either of the other two fails.
• Power Distribution Board (PDB)
o The power distribution board (PDB) provides:
Hot‐plug logic
Power distribution for the system
o The PDB is used in smaller servers with a 1 + 1 configuration: one active power supply and one redundant power supply.
o The PDB does not spread the power load between multiple active power supplies like a PSPB does.
o The PDB provides redundancy and hot swapping, allowing a backup power supply to take over for a failing active supply.
Storage Subsystems: An Overview
A storage subsystem is a collection of components that allow for the storage of data.
Those components may include:
• A SCSI or RAID controller
• Internal hard disk drives
• An external enclosure containing disk drives
• CD or floppy‐disk drives
• Data and power cabling
• Software applications that enable the configuration of the subsystem
The available storage options are listed below.
1) Internal Single Disks: Separate disk drives are used to store system information. These drives can be set up with simple software‐based RAID mirroring configurations but provide the least amount of redundancy and availability.
2) Internal RAID with PERC: Using a Dell PowerEdge Expandable RAID Controller (PERC), you can set up a storage array composed of a series of internal storage disks. This is the preferred method of creating an internal RAID array, as the PERC controller offloads the processing of the data access routines from the processors to the card.
3) External RAID with PERC: Using a combination of Dell PERCs and PowerVault Direct Attached Storage systems, an external RAID array can be set up to provide an improved level of performance, scalability, and reliability.
4) Fibre Channel SAN or NAS: Dell Storage Area Network (SAN) and Network Attached Storage (NAS) solutions can be connected to the Dell PowerEdge server series to provide the ultimate in data access and reliability.
Storage Subsystems
Internal Storage Devices
The following describes items that are performance and reliability technologies built into disks and drives.
SCSI: Small Computer System Interface is an industry standard for providing high‐speed access to peripheral devices.
SATA: Serial Advanced Technology Attachment is a generational upgrade of the parallel ATA or Integrated Drive Electronics (IDE) interface. Additionally, the cables used with SATA‐enabled devices are much smaller, which improves cooling efficiency and allows for smaller chassis designs.
SAS: Serial Attached SCSI hard drives deliver the next generation of SCSI performance and reliability for critical business applications.
NOTE: The CD/DVD options for most systems include:
• 8X DVD‐ROM
• 24X CD‐ROM
• 48X CD‐ROM
• 16X DVD‐ROM
• 24X CD‐RW/DVD‐ROM
Backplane
A backplane is a high‐speed communications circuit board that contains sockets with which devices and other electronic components can interface. The purpose of the disk storage backplane is to provide one interface for the storage devices, providing better logical control and faster connection speeds. Two additional features are available to increase system performance and reliability on PowerEdge systems:
• Drives attached to the backplane are hot‐swappable
• Splitting is possible
Replacing a Backplane
Replace a SCSI backplane when:
• The backplane and/or hard drives experience failure
• Failures are intermittent
• An error message suggests replacement as an option to correct the error
• Replace or reset the backplane when experiencing I/O failure
Install/Replace Guidelines:
1. Power down the server.
2. Move or disconnect any components obstructing the removal of the backplane.
3. Disconnect the backplane cables.
4. Pull the backplane off its grounding tabs and out of the system.
Reconnect/reset all disconnected components and cables to ensure correct operation of the new backplane.
Backplanes and Split Backplanes
1) Backplane: A backplane allows storage devices to be replaced within the array without the need to power down the system. This increases availability of the system for users in the event that one of the devices attached to the backplane fails. Users should always double‐check the system indicator light to ensure that they have the correct drive before attempting to remove it. Drives that are not in a RAID array need to undergo routine backups in order to avoid any loss of information.
2) Split backplane: Split backplanes provide two SCSI channels, with drives on both Channel A and Channel B. Splitting the backplane across two channels increases the performance of the system, as the traffic is now fanned out across both channels. A common configuration is to have mirrored sets of disk drives on separate controller channels. This can improve redundancy as well as performance. Split configurations are based on the model that is chosen.
BIOS
BIOS: An Overview
The Basic Input Output System (BIOS) takes the system from <off> mode to the <operating system loading> mode. The system BIOS has these key functions:
• Enables the system to run when you turn on the system
• Attempts to boot to the operating system
• Ensures all chips, hard drives, ports, and processors function together
BIOS Sequence: The following are the steps that a typical boot sequence involves:
1. The internal power supply board turns on and initializes (defective power supplies can damage a system).
2. The chipset receives the signal from the power supply board and passes it along to the processors.
3. The BIOS performs the Power‐On Self Test (POST).
4. The BIOS looks for the video chip and the video's built‐in BIOS program and runs it.
5. The BIOS does more tests on the system, including the memory count‐up test which you sometimes see on the screen.
6. The BIOS performs a “system inventory,” doing more tests to determine what sort of hardware is in the system.
7. Some BIOSes will now display a summary screen about your system's configuration.
8. The BIOS begins the search for a boot drive.
9. If it finds what it is looking for, the BIOS starts the process of booting the operating system, using the information in the boot sequence.
BIOS Setup Screen (F2)
You can use the System Setup program to change and/or view the system's basic configuration after you add, change, or remove any hardware on your system. The server's basic configuration is held in the system BIOS. On servers, the BIOS contains all of the code required to control the keyboard, display screen, disk drives, and more. The following graphic depicts the System Setup program screen.
Configuring Your BIOS
Dell suggests updating the BIOS to the current release and as a general rule, leaving the BIOS at factory default settings. Changes to the BIOS should be made only if specifically noted in the documentation for your system and additional hardware. Only the most basic configuration tasks should be performed: • Modifications to time or date • Boot order change • Enabling or disabling integrated devices
Restoring BIOS/Factory Defaults
Resetting your system to factory default settings will automatically reset the integrated device option to "SCSI Enabled." Ensure that you reset the integrated device to the correct setting to avoid data loss. Prior to setting the BIOS to defaults, document the system's current settings. Actions to restore the BIOS defaults include:
• Press <Alt><F> to restore factory defaults for the entire BIOS.
• Press <ALT><D> to restore single fields in the BIOS.
Clearing a System’s NVRAM
If the configuration settings become corrupted to the point where the system will not boot, follow the procedure below to clear the system configuration stored in the system's NVRAM. CAUTION: Only trained service technicians are authorized to remove the system cover and access any of the components inside the system. See the Safety Guidelines for complete information about safety precautions, working inside the computer, and protecting against electrostatic discharge.
1. Power down the server. Refer to your hardware owner's manual for information on the NVRAM jumper location and settings for your system.
2. Unplug the system power before removing the cover for your system. Remove any mechanical components necessary (i.e., shrouds, covers, etc.) in order to obtain access to the NVRAM jumper. Change the jumper to the clear setting.
3. Power up the server and have the system run completely through its boot process. Exercise caution, as the inside of the system is exposed.
4. Power the system down. Change the jumper back to the default setting.
5. Replace all of the system's components that were removed previously. Replace the system cover. Power up the system.
Summary
• The BIOS is the mechanism that takes a system from the OFF mode to the operating system mode, which puts the system in ON mode. The BIOS assists in the communication between devices.
• Dell suggests updating the BIOS to the current release and, as a general rule, leaving the BIOS at its factory default settings.
• Storage subsystems are a collection of components that allow for the storage of data. Dell servers utilize RAID controllers, hard drives (SAS, SATA, and SCSI), and CD and/or floppy drive storage components.
• SAS and SATA drives are both available on PowerEdge 9th‐generation systems.
Networking
Objectives
Upon completion of this section, you will be able to:
• Define networking and how data moves between systems. • Review NICs and TCP options for 9th‐generation (9G) servers. • Understand various teaming methods. • Describe the functionalities of TCP/IP offload engine (TOE).
NIC and NIC Teaming
Network Interface Cards (NIC)
A Network Interface Card is used to connect Dell PowerEdge™ servers to an Ethernet network. This can be an expansion board that you insert into an expansion slot inside a computer, or it may be built into the system board of your computer. Dell PowerEdge servers may be configured with single or dual NICs. You may also install an add‐on network card, provided you have an open expansion slot available for use. When specifying a NIC, you must consider which model fits your server configuration and then choose between available speeds. The card you choose will only work as fast as the infrastructure it connects to. When choosing a NIC, you might also want to consider future computing needs.
NIC Teaming
What Is NIC Teaming?
Teaming is a method of creating a virtual network adapter (a group of multiple devices that functions as a single device). The teaming function allows you to group any available network devices together to function as a team. The benefit of this approach is that it enables load balancing and failover.
Installing multiple NICs allows teaming to provide:
1) Fault tolerance: Assures network availability. If one controller fails, the server will remain available to the network by using another controller.
2) Load balancing: Allows multiple controllers to share large data loads, preventing one controller from being overwhelmed.
3) Generic trunking: Using multiple Ethernet network cables/ports in parallel to increase the link speed beyond the limits of any one single cable or port, and to increase redundancy for higher availability. It is also known as Link Aggregation.
NIC Teaming In PowerEdge Servers
Dell PowerEdge servers use NICs manufactured by Intel and Broadcom.
• NICs are teamed using Adapter Configuration Utilities.
• Broadcom teaming uses the BASP utility.
• Intel teaming uses the PROSet utility.
• When teaming a mixture of Broadcom and Intel adapters, the Broadcom utility is preferred.
NIC Installation Guidelines
• Pre‐check: Check your system to ensure it meets the minimum requirements and compatibility for your NIC. Check the PCI slot, system RAM, and recommended operating system.
• Driver installation: If necessary, install the latest drivers for your adapter.
TCP/IP Offload Engine
What Is TOE?
The TCP/IP offload engine (TOE) is a technology used in high‐speed Ethernet systems to optimize throughput. TOE components are incorporated into one of the printed circuit boards, such as the network interface card (NIC) or the host bus adapter (HBA).
TOE increases overall system efficiency and performance by:
• Removing processor bottlenecks
• Reducing excessive traffic across internal memory buses
• Reducing traffic on the internal PCI buses
TOE requires a TOE key.
With a standard NIC, all TCP/IP processing is done on the host processor with the exception of checksum and packet re‐assembly. A standard NIC is an inexpensive solution, but costly in terms of processor utilization, because TCP/IP places a heavy burden on host processors. One solution for iSCSI initiators is to use an iSCSI storage adapter instead of a standard NIC. Another solution is to use a TCP/IP offload engine (TOE). With a TOE, the processing requirements for four layers, including TCP, are moved from the host processor to hardware. The result is faster servers, an accelerated network, and superior application performance.
TIP: Install the TOE software before installing TOE.
NOTE: TOE does not support iSCSI technology. You must purchase iSCSI‐specific TOE software if you want iSCSI support.
In 9th‐generation TOE, the NIC is online before POST. You must install the TOE key before AC power is applied.
TOE and OSI
TOE takes layer 3 and 4 functions of the OSI (Open System Interconnection) model out of the operating system and leaves more resources for applications. TOE cards are sometimes called "Layer 4 Ethernet Cards."
Broadcom Advanced Control Suite (BACS) Version 2
BACS 2 is a configuration application for Broadcom NetXtreme cards which offers the following:
• Second‐generation configuration utility incorporating Broadcom NetXtreme II devices.
• Basic diagnostics on Broadcom Ethernet devices.
• More visibility into network statistics.
• Teaming Wizard for easy team creation and teardown.
• A tab specifically for accessing TOE information, such as TOE statistics and TOE resource allocation.
• A display of available technology licenses which have been found and validated by the Ethernet device.
• Display of the absolute maximum number of connections for each license.
• A resource allocation view that shows available connections for all converged technologies.
TOE Recommendations
DO:
• Use the Resource Allocation/Configure option to enable/disable TOE.
  o 17% of resources are reserved by the TOE NIC. Only 83% of resources can be allocated for offloading.
• Use netstat –nt to check for offload connections.
• Use the Umbrella Installer to install, upgrade, and move the drivers.
• Only SLB is supported for TOE teaming.
  o Use Broadcom Advanced Control Suite for creating a TOE team.
DON'T:
• Don't use Plug‐and‐Play installation of drivers.
• No 802.3ad or GEC/FEC teaming types are supported for TOE teaming.
• No multi‐vendor teaming for TOE teaming.
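The 17%/83% resource split described above can be illustrated with a quick calculation. This is a sketch only: the connection-pool size of 1024 is a hypothetical license limit for illustration, not a figure from this document.

```shell
# Hypothetical illustration of the TOE resource split described above.
# total_connections is an assumed license limit, not a Dell-documented value.
total_connections=1024
reserved=$(( total_connections * 17 / 100 ))   # held back by the TOE NIC itself
available=$(( total_connections - reserved ))  # pool left for offloaded connections
echo "reserved=${reserved} offloadable=${available}"
```

In practice, the actual license limit is shown per license in BACS 2, and `netstat –nt` confirms which connections are currently offloaded.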
SUMMARY
• Networking revolves around sharing of devices or information.
• Teaming is a method of creating a virtual LAN (a group of multiple devices that functions as a single device).
• TOE teaming does not support generic trunking.
• Dell PowerEdge servers use NICs manufactured by Intel and Broadcom.
Server Management
Module Objectives
Upon completion of this section, you will be able to:
• Identify the concepts of server management.
• Understand the role of the Remote Access Controller (RAC) and its uses.
• Identify the Dell Remote Access Controller (DRAC) 5 features.
• Configure basic DRAC 5 functions.
• View and save system event logs (SEL).
What Is Management?
Systems Management
Systems Management involves using tools (hardware and software) to perform mundane, simple, or complex tasks. Systems management can call on multiple components:
• Hardware
• Remote management hardware, in addition to the base hardware that makes the server work
• Software agents to monitor status and generate alerts
• Software to manage the same or different systems
OpenManage System Overview
Overview
Dell OpenManage™ systems management software is a suite of application programs for PowerEdge™ systems that allows you to manage your system with proactive monitoring, diagnosis, notification, and remote access. Each of your managed systems will use applications that include Server Administrator, OpenManage Storage Management, and remote access controller (RAC) software. A management station can be used to remotely manage one or more managed systems from a central location. By installing IT Assistant on a management station, you can effectively manage from one to thousands of remote managed systems. This graphic illustrates the relationship between a management station and its managed systems and also shows the operating systems and the Dell OpenManage software products that may be installed on the managed systems.
The OpenManage Model
OpenManage Model
Review the OpenManage model on this page. Note that OpenManage consists of Deployment Tools, Monitoring Tools and Maintenance Tools. Review the names of all the individual tools in the OpenManage suite, and then focus on the tools in the Deployment category.
Remote Management
1) Remote Management Hardware: Remote management hardware is often optional. It generally allows you to do extra management tasks that would not otherwise be possible. For Dell systems, this additional hardware is called the DRAC (Dell Remote Access Controller). It provides the following:
• Often works "out‐of‐band"
• No operating system interaction required
• Can work even when the system is powered down
Systems management software uses management protocols such as SNMP, DMI, and CIM to interrogate IPMI components (such as ESM) and to change settings, but these protocols are embedded within the operating system and therefore are "in‐band" – only available when the operating system is loaded.
2) Out‐of‐Band Management: To perform out‐of‐band management, it has been a requirement to install an additional management adapter card. Dell recommends the Dell Remote Access Controller (DRAC) family of adapters, which allow you to remotely access the information supplied by IPMI and remotely control manageable hardware. The interfaces supplied by the DRAC hardware include:
• Serial
• Modem
• Ethernet
3) In‐Band Management: In‐band management is accomplished via instrumentation in the operating system. In‐band management means that a management system is dealing with a managed node through the operating system and through agents and drivers. The Dell Remote Access hardware supports "in‐band" management and allows you to perform three major types of tasks:
• View system‐related management information (such as temperature, voltages, etc.) via ESM or the new BMC
• Perform power management
• Take control of the server GUI remotely
In‐band Interfaces
4) OpenManage Server Administrator: Server Administrator provides a comprehensive, one‐to‐one systems management solution from an integrated, Web browser‐based GUI (the Server Administrator home page), and from a command line interface (CLI) through the operating system. Server Administrator is designed for system administrators to both locally and remotely manage systems on a network.
5) IT Assistant: IT Assistant is installed on the management station and is used to configure the RAC. IT Assistant provides a one‐to‐many systems management solution. Server Administrator must be installed on the managed system in order to capture events.
Out‐of‐Band Management
Out‐of‐band management is accomplished completely independently of the operating system. It does not use the host operating system of the server, but connects to the Dell Remote Access hardware directly using a management program or console via one of its external interfaces.
NOTE: The previous image has been simplified, and does not show the IPMI interface, the ESM, or the BMC with which the DRAC interfaces.
Out‐of‐band management is useful because so many failure situations render the operating system unusable. Using Dell Remote Access hardware, we can connect to the ESM subsystem and perform tasks such as:
• Check for line voltage
• Verify BIOS levels
• View logs
• Power the server on and off
• Force the server to boot from a diskette held at the management station
Dell Remote Access hardware has special hardware interfaces for performing out‐of‐band management. For example:
• Network port • Optional modem • Optional serial cable
Dell Remote Access Controller
DRAC Integrates With BMC/ESM
The Dell recommended interface for BMC is either OMSA or the Remote Access Controller (RAC). The Dell Remote Access Controller (DRAC) provides the remote connections into BMC/ESM. RAC hardware has been IPMI compliant since the 3rd generation. When you plug a RAC into a Dell server, you are interfacing and working with the systems management controller – either ESM or BMC. The DRAC adapter becomes part of the management subsystem of the server. Upon installation of a RAC, the direct interface to the BMC is disabled – you cannot use IPMI to talk directly to the BMC.
DRAC 5
DRAC 5 Overview
DRAC 5 is a hardware and software systems management solution. It interfaces with the BMC and takes over BMC function once installed in x9xx servers. It offers enhanced functions including:
• Friendly and intuitive interfaces
• Additional security access configuration options
• IPMI/BMC configuration capabilities
• Enable/disable access points and features
• SM‐CLP (new CLI for RAC)
• RAC users are from the same domain as BMC users
• Virtual Media feature is USB‐based rather than IDE‐based
• Local host interface is IPMI instead of Virtual UART (serial)
• Graphical console redirection application
• Hardware chipset, faster processor
RACADM and DRAC 5
RACADM
RACADM is the CLI configuration utility for DRAC 5. DRAC 5 RACADM features include:
• User friendly (messages, errors, etc.)
• Accepts both hexadecimal format and decimal format for numeric values
• Displays numeric values in a more intuitive decimal format
• IPMI configuration groups
• Extended RAC tuning properties for better security and access point control
• Firmware update process that is more accurate and informative
• Help system that provides more information and examples
• Remote RACADM interfaces to DRAC 5 and DRAC/MC
DRAC 5 GUI
The DRAC 5 has a firmware‐based GUI application hosted on its own Web server. The DRAC 5 GUI enhanced features include:
• Consistency with RACADM in terms of manageability features
• Target‐based rather than function‐based navigation
• Consistency with the OMSA GUI user experience
• Enhanced server sensor display (temperature, voltage, fan speed, intrusion)
• User friendly and intuitive (descriptive error messages, etc.)
• Service Access Point configuration (ssh, serial, telnet, etc.)
DRAC 5 Configuration
Configuring the DRAC 5 includes network settings, users, alerts, etc. Configure DRAC 5 settings using one of the following:
• Web‐based interface
• RACADM CLI – "cfgLanNetworking"
• BIOS BMC Binary (BBB) (Ctrl‐E setup)
After installation, configure the DRAC 5 properties (network, users, alerts, etc.). You can configure the DRAC 5 network settings using one of the following tools: • Web‐based interface • RACADM CLI • BIOS BMC Binary (BBB)
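As a sketch of the RACADM route, the commands below read and set properties in the "cfgLanNetworking" group mentioned above. The IP values are placeholders, and the exact group/object names should be verified against the DRAC 5 User's Guide for your firmware revision.

```shell
# Sketch: configuring DRAC 5 network settings with the RACADM CLI.
# Addresses are example values; verify object names in the DRAC 5 docs.
racadm getconfig -g cfgLanNetworking                               # view current settings
racadm config -g cfgLanNetworking -o cfgNicIpAddress 192.168.0.120 # set static IP
racadm config -g cfgLanNetworking -o cfgNicNetmask   255.255.255.0
racadm config -g cfgLanNetworking -o cfgNicGateway   192.168.0.1
```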
Virtual Media
Overview
DRAC 5 enables virtual media support. Virtual USB devices are always present in the server BIOS. DRAC can "hot‐plug" virtual devices into the operating system. It has virtual floppy and CD support and physical drive or image support. Using virtual media, administrators can:
• Remotely boot their managed systems.
• Install applications.
• Update drivers.
• Install new operating systems remotely from the virtual CD/DVD and diskette drives.
The virtual CD and floppy drives are two electronic devices embedded in the DRAC 5 that are controlled by the DRAC 5 firmware. These two devices are present on the managed system's operating system and BIOS at all times, whether virtual media is connected or disconnected.
Virtual Media Plug‐in
The management station provides the physical media or image file across the network. When you launch the RAC browser for the first time and you access the virtual media page, the virtual media plug‐in is downloaded from the DRAC 5 web server and is automatically installed on the management station. NOTE: The virtual media plug‐in must be installed on the management station for the virtual media feature to function properly.
Virtual Media Support
Using Virtual Media, you can "virtualize" a diskette image or drive, enabling a floppy image, floppy drive, or CD drive on your management console to become an available drive on the remote system. The DRAC 5 Virtual Media feature is based on USB technology and can take advantage of the USB plug and play features. DRAC 5 adds the option to attach and detach the virtual devices from the USB bus. The following media are supported:
• Floppy drives
  o Legacy 1.44 MB floppy drive with a 1.44 MB floppy diskette
  o USB floppy drive with a 1.44 MB floppy diskette
  o 1.44 MB floppy image
• CD drives
  o CD, DVD, CD‐RW, or combination drive with CD media
  o CD image file in the ISO‐9660 format
  o USB CD drive with CD media
Virtual Media on Microsoft Windows
To run the virtual media feature on a management station running the Microsoft Windows operating system:
• Install a supported version of Internet Explorer with the ActiveX Control plug‐in.
• Set the browser security to medium or a lower setting to enable Internet Explorer to download and install signed ActiveX controls.
• You must have administrator rights to install and use the virtual media feature.
• Before installing the ActiveX control, Internet Explorer may display a security warning. To complete the ActiveX control installation procedure, accept the ActiveX control when Internet Explorer prompts you with a security warning.
On Windows systems, the virtual media drives are auto‐mounted and configured with a drive letter. Using the virtual drives from within Windows is similar to using your physical drives. When you connect to the media at a management station, the media is available at the system by clicking the drive and browsing its content.
Virtual Flash
The DRAC 5 provides persistent Virtual Flash – 16 MB of flash memory that resides in the DRAC 5 file system and can be used for persistent storage accessed by the system. It can hold uploaded ISO images. Once enabled, it appears to the host OS as a USB flash disk and is available as a drive letter. It can be formatted and also made bootable. You can enable or disable it via RACADM or the DRAC GUI. When enabled, Virtual Flash is configured as a third virtual drive and appears in the BIOS boot order, allowing a user to boot from the Virtual Flash. Unlike a CD or floppy drive that requires an external client connection or a functional device in the host system, implementing Virtual Flash requires only the DRAC 5 persistent Virtual Flash feature. The 16 MB key appears as an unformatted, removable USB drive in the host environment.
Use the following guidelines when implementing Virtual Flash:
• Attaching or detaching the Virtual Flash performs a USB enumeration, which attaches and detaches all virtual media devices (for example, the CD drive and floppy drive).
• When you enable or disable Virtual Flash, the virtual media CD/floppy drive connection status does not change.
Configuring a Bootable Virtual Flash
1. Insert a bootable diskette into the diskette drive or a bootable CD into the CD drive.
2. Restart your system and boot to the selected media drive.
3. Add a partition to the virtual flash and enable the partition.
4. Use fdisk if the virtual flash is emulating a hard drive. If the virtual flash is configured as the b:\ drive, the virtual flash is floppy‐emulated and does not require a partition to be configured as a bootable drive.
5. Using the format command, format the drive with the /s switch to transfer the system files to the virtual flash. For example: format x: /s, where "x" is the drive letter assigned to the virtual flash.
6. Shut down the system and remove the bootable floppy or CD from the appropriate drive.
7. Turn on the system and verify that the system boots from the virtual flash to the c:\ or a:\ prompt.
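Steps 4 and 5 above boil down to two DOS commands run from the bootable media. This is a sketch: "x:" is a placeholder for whatever drive letter your system assigns to the virtual flash.

```shell
# DOS commands for steps 4-5 above, run from the bootable diskette/CD.
# "x:" stands for the drive letter assigned to virtual flash (placeholder).
fdisk            # partition the virtual flash only when it emulates a hard drive
format x: /s     # /s copies the system files so the drive can boot
```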
Console Redirection
DRAC 5 uses hardware‐based console redirection. You cannot configure a console redirection session on the local system; it is launched from the DRAC web GUI. A minimum available network bandwidth of 128 Kbps is required.
Console Redirection Configuration Page
Viewing and Saving System Event Logs
A user has three options to view the System Event Logs (SEL):
• Baseboard Management Controller (BMC)
• Dell System E‐Support Tool (DSET)
• Dell OpenManage Server Administrator
The Baseboard Management Controller (BMC) interface is also known as the Remote Access Configuration Utility. When working with 9G servers, the user can press <Ctrl><E> during POST to be given the option to view the system event logs.
With the 9G Servers, configurations for the DRAC 5 and the BMC have been combined into one boot interface called the “Remote Access Configuration Utility”. The option to view the system event log is on the main menu.
Dell System E‐Support Tool (DSET) is an application that is used to extract all the system event logs and allow offline viewing. This information is consolidated into a single System Configuration Report. DSET does not come standard on systems but can be downloaded from http://support.dell.com/support/topics/global.aspx/support/en/dell_system_tool?c=us&l=en&s=gen
Once you have the Dell System E‐Support Tool (DSET) installed, you create a new report while running the operating system. Within Windows, you find the "Create DSET Report" option in the DSET listing of the Start Programs menu.
When you launch the program, DSET opens a DOS window listing the information that is being extracted from the system.
The DSET report is created and appears as a .zip file on the system’s desktop. This file will be just over 1MB.
Dell OpenManage Server Administrator is another method for viewing and extracting system event logs. To view, save or export the log using Dell OpenManage, open Server Administrator and select “system” in the tree on the left side of the Dell OpenManage Server Administrator screen and then select the “logs” tab as pictured here. Above the log entries are buttons that allow the user to perform different functions with the log.
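Server Administrator also ships with a CLI that parallels the GUI steps above. As a sketch (assuming the OMSA CLI is installed on the managed system), the hardware log that corresponds to the ESM/SEL view can be displayed and saved like this:

```shell
# Sketch: viewing and saving the hardware (ESM) log with the OMSA CLI.
# Assumes OpenManage Server Administrator is installed on this system.
omreport system esmlog                  # display the hardware log entries
omreport system esmlog > esmlog.txt    # redirect a copy for offline review
```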
Summary
• The Server Update Utility is a program used to update server BIOS, RAC, and RAID firmware.
• OpenManage Server Administrator (OMSA), the remote access controller (RAC), and the baseboard management controller (BMC) together form a management solution that provides a consolidated and consistent way to monitor, configure, update, and manage Dell systems.
• OMSA does not support SC servers.
• IPMI is a management interface that can "discover" the capabilities of manageable devices.
• The DRAC 5 is a hardware and software systems management solution that interfaces with the BMC and takes over BMC function after it is installed in 9th‐generation servers.
• DRAC 5 enables virtual media support. Virtual USB devices are always present in the server BIOS. DRAC can "hot‐plug" virtual devices into the operating system. It has virtual floppy and CD support and physical drive or image support.
• RACADM is the CLI configuration utility for DRAC 5.
Storage
Module Objectives
Upon completion of this section, you will be able to:
• Identify the differences between IDE/SATA/SCSI technologies.
• Use the new terminology necessary to describe SAS technology.
• Describe the fundamental components of SAS technology and how they are connected.
• Identify the different disk layouts and configurations offered.
• Configure hardware for RAID storage.
• Manage storage using BIOS utilities.
• Manage storage using Dell OpenManage™.
• Troubleshoot RAID hardware issues.
Storage Technologies
Serial ATA (SATA)
Serial ATA is an evolutionary replacement for the parallel ATA (traditionally used by IDE) physical storage interface.
• Point‐to‐point configuration – no master‐slave.
  o Two data channels, one for sending and one for receiving.
  o Smaller, easier‐to‐use cables.
  o Improved data robustness.
  o Backward compatibility.
Small Computer System Interface (SCSI)
SCSI is based on an older, proprietary parallel (bus) style architecture in which:
• The controller serves as the interface between all of the other devices on the SCSI bus and the computer.
• The controller can be a card that you plug into an available slot, or it can be built right into the system board.
• Serial data bytes are converted into parallel data bits.
• Parallel bit sets move down a bus system.
• Parallel data bits are re‐serialized at the receiving end.
SCSI Limitations
As with all older technologies, there are U320 SCSI limitations:
• Increased crosstalk with increased bus speed.
• Timing errors at higher bus speeds, and the need for termination to prevent reflections on the bus.
• SCSI cables are often wider and bulkier than SAS cables, creating airflow and cooling problems in rack‐dense servers.
• SCSI cable is strongly affected by electromagnetic interference (EMI).
Serial Attached SCSI (SAS)
• SAS, the successor technology to the parallel SCSI interface, leverages proven SCSI functionality and greatly builds on the existing capabilities of the enterprise storage connection.
• SAS offers many features not found in today's mainstream storage solutions:
  o SCSI command set delivered via a serial topology
  o Up to 16,384 SAS devices in a SAS domain
  o SAS is local, not SAN
  o Uses 3.5‐inch and smaller 2.5‐inch hard drives
• SAS is a new, faster version of SCSI designed to meet the demands of enterprise IT – the next evolution of SCSI beyond Ultra320 SCSI.
• Leverages enhanced Serial ATA (SATA) while adding support for a second drive port.
• Uses features of Fibre Channel, and maintains compatibility with SATA drives, in a point‐to‐point, switched architecture.
SCSI vs. SAS Technologies
Serial Attached SCSI (SAS)
The PHY: Basis of all SAS Communication
The PHY is a transceiver (one transmit and one receive). The pathway between two PHYs consists of a transmit circuit at one end of the connection and a receive circuit at the other end. A connection of transmit and receive pairs is called a "physical link." This physical link works very much like a phone call – the mouthpiece on each end transmits to the earpiece at the other end, and both circuits can carry conversation at the same time. The call consists of:
• The source PHY transmitting an OPEN address frame, containing a destination SAS address: dialing the phone call.
• The destination PHY replying with an OPEN_ACCEPT: "Hello."
• An established connection that remains open: the conversation.
• Ending the conversation, with both sides exchanging CLOSE primitives to close the connection: "Goodbye."
Links and Ports
Narrow Links and Narrow Ports
The link, called a "narrow link," is a PHY with a SAS address attached to another PHY that also has a SAS address. Both directions have the same "physical link rate," and the link is full duplex. At the software level, the link is seen as a "port." A SAS device is a device that contains one or more ports, and each port has a distinct SAS address. The graphic shows a narrow link that also represents a narrow "port" from the viewpoint of the software.
Wide Links and Wide Ports
Two or more narrow links can be combined (aggregated) to become a wide link. Each PHY has a unique identifier, but they share a common SAS address and are part of the same SAS device.
Expanders The expander’s main purpose is as a port multiplier.
• Similar in function to an Ethernet switch ‐ as a transparent device through which packets are routed to their destination.
• It is not seen as an “end device,” like an HBA or SAS hard drive.
In this example the SAS HBA initiator is using a wide port connection to the expander, while the SAS drives are each using only one port.
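The point of the wide port in this example is aggregate bandwidth: each member PHY contributes its full link rate. A minimal sketch, assuming a 4-lane wide port of 3 Gbps PHYs (typical for the HBA-to-expander connection shown):

```shell
# Sketch: aggregate bandwidth of a wide port is lanes x per-link rate.
# 4 lanes and 3 Gbps are assumed example values for this topology.
lanes=4
link_gbps=3
wide_gbps=$(( lanes * link_gbps ))
echo "wide port bandwidth: ${wide_gbps} Gbps"
```

Each narrow-ported drive, by contrast, is limited to a single link's rate.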
Connection Rates
• A connection runs at 3 Gbps or greater.
• The connection rate <= physical link rate.
• If the connection rate is slower than the physical link rate, rate matching will be used.
Example: If a 6 Gbps physical link is “connected” to a 3 Gbps physical link, the 6 Gbps link will insert an ALIGN primitive every other DWORD to halve the effective throughput. The ALIGN primitive is immediately stripped out by the receiver at the 3 Gbps device.
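The arithmetic behind that example can be sketched as follows: one ALIGN for every data DWORD means only half the DWORDs on the faster wire carry payload.

```shell
# Sketch of the rate-matching example above: an ALIGN primitive inserted
# every other DWORD halves the effective payload rate of the 6 Gbps link.
fast_gbps=6
payload_dwords=1   # DWORDs carrying data out of every...
total_dwords=2     # ...two DWORDs on the wire
effective=$(( fast_gbps * payload_dwords / total_dwords ))
echo "effective rate: ${effective} Gbps"   # matches the 3 Gbps device
```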
SAS Transport Protocol
• Serial Attached SCSI (SAS) supports both SCSI and Serial ATA
• SAS has three transport protocols
o Serial SCSI Protocol (SSP) ‐ supports SAS (SCSI) disk drives, tape drives, etc. o Serial ATA Tunneling Protocol (STP) ‐ supports Serial ATA disk drives o Serial Management Protocol (SMP) ‐ supports SAS expanders
Direct/Expander Attached End Devices
In Direct Attach, the number of drives attached is limited by the number of ports available on the HBA.
Expander Attach
SAS Topology
The following graphic provides details on SAS topology.
SAS Domains
A simple SAS domain contains SAS devices and one or more expander devices. In this simplistic example, the hosts are all connected to the expander as are the drives. Hosts could also be attached to both drives and expanders. The following graphics depict additional types of SAS topology.
1) Two‐Edge Expander Device Sets:
2) Expander Routing:
3) Connection of Two Device Sets:
4) Fanout Expander:
SAS Device Detection
When a SAS controller comes online, it:
• Goes out through each of its PHYs
• Registers every device it comes across
• Notes the device WWN
• Maintains the list of physical disks by WWN
If the SAS controller supports RAID, it examines the RAID information and builds a map of virtual disks to physical disks. Information about backplanes, slots, and enclosures is obtained by asking any discovered SEPs.
SAS Controllers
• A single SAS controller supports both native SAS drives and SATA II drives.
• Basic SAS controller: SAS 5i
  o Daughter card
  o 'i' denotes internal cables only
  o No external connectors
  o Ctrl‐C BIOS setup
• With added RAID hardware: PERC 5i
  o Daughter card
  o Ctrl‐R BIOS setup
SAS 5i Connectivity PowerEdge 2900
• The SAS 5i connects to the backplane via a four‐lane cable.
  o This allows four (4) PHY connections.
  o The SAS 5i supports only four (4) hard disks.
SAS/SATA Hard Drives
• There are a number of drives available for PowerEdge™ servers:
  o 3.5" SATA II hard drives
  o 3.5" SAS hard drives
  o 2.5" SAS hard drives
• Some servers will allow a mix of SAS and SATA.
• Drives must use the correct carrier:
  o 3.5" – SATA
  o 3.5" – SATA or SAS (interposer card)
  o 2.5" – SAS
Disk Data Format (DDF)
PERC 5 uses DDF – the SNIA specification‐driven standard format for RAID disk data. This structure allows a basic level of interoperability between different suppliers of RAID technology.
• DDF is stored twice at the end of each physical disk.
• PERC 5/E uses 512 MB on each disk for DDF.
• No configuration data is stored on the controller; it is cached into the controller during normal operation. When all of the member physical disks of a disk group are removed, the virtual disk is deleted.
PERC 5 Initialization
When the server is powered on, the PERC 5 BIOS initializes. The firmware then performs SAS domain discovery (finds disks). PERC 5 then looks for the DDF information on the hard drives. Disks can be:
• Unassigned: Not members of any array or hot spare, etc.
• Members: Members of an array known to the controller (native).
• Foreign Good: Members of an unknown complete/working array.
• Foreign Bad: Members of an unknown broken array.
Redundant Array of Independent Disks (RAID)
Redundant Array of Independent Disks (RAID)
RAID is a system of using multiple hard drives to share or span data across multiple drive devices. RAID provides:
• Additional technology which sits on top of SCSI, SATA, or IDE.
• A system for storing data on multiple array disks to ensure availability and performance.
RAID has been developed mainly for server systems to help increase performance and protect disk data. RAID makes a group of disks work as one "virtual disk," in different ways known as RAID levels. This can be done with:
• Dedicated hardware.
• Software within the operating system.
RAID information is stored on a PowerEdge Expandable RAID Controller (PERC) and on each disk:
• Disks know where they are in "stripe order."
• Adapters know where the disks are.
• If disks get moved, there will be a mismatch.
• If the PERC gets replaced, there will be a mismatch.
RAID Levels
Depending upon the type of RAID configuration chosen, the system can benefit from increased measures of fault tolerance, capacity, and throughput. Dell PowerEdge RAID Controllers (PERC) can handle several different versions of RAID.
RAID 0: RAID 0 is commonly referred to as a basic "striped set," meaning that data is evenly distributed across two or more drives with no parity or mirroring involved. Data redundancy is not provided, but RAID 0 is very fast.
RAID 1: RAID 1 is commonly referred to as drive mirroring: data is mirrored or duplicated on one or more drives. If one drive fails, the data can be rebuilt using the mirror. RAID 1 is fast, but is a high‐cost redundant solution.
RAID 3: RAID 3 stripes data across the array disks, with one disk dedicated to parity information. If a drive fails, the data can be reconstructed from the parity. RAID 3 is generally not used in servers because the parity disk is "over‐used."
RAID 5: RAID 5 is the most popular configuration in use with most of today's systems. RAID 5 provides data redundancy by using data striping in combination with parity information. Rather than dedicating a drive to parity, the parity information is striped across all disks in the array.
RAID 10: RAID 10, also known as RAID 1+0, is a combination of the RAID concepts of striping and mirroring. In a RAID 10 configuration, the data is striped across sets of mirrored drives. The mirrored sets allow for added redundancy not found in basic striped sets.
RAID 50: RAID 50 is a different spin on the RAID 10 concept. RAID 50 stripes data (RAID 0) across two or more RAID 5 spans of at least three drives each. RAID 50 therefore provides the features of both RAID 0 and RAID 5: disk striping across multiple drives plus parity within each span.
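The parity mechanism behind RAID 5 (and each span of RAID 50) can be shown in miniature: parity is the XOR of the data blocks in a stripe, and a lost block is recovered by XOR-ing the survivors with the parity. The byte values below are arbitrary examples.

```shell
# Sketch: RAID 5 redundancy in miniature. Parity is the XOR of the data
# blocks in a stripe; a failed drive's block is rebuilt by XOR-ing the
# surviving blocks with the parity. Values are arbitrary example bytes.
d1=$(( 0xA5 )); d2=$(( 0x3C )); d3=$(( 0x0F ))
parity=$(( d1 ^ d2 ^ d3 ))
# Simulate losing drive 2 and rebuilding its block from the survivors:
rebuilt=$(( d1 ^ d3 ^ parity ))
[ "$rebuilt" -eq "$d2" ] && echo "rebuild OK"
```

This is also why RAID 5 survives only a single drive failure per span: with two blocks missing, the XOR equation can no longer be solved.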
Global Hot Spare
A global hot spare can replace any failed drive in a redundant array, as long as its capacity is equal to or larger than the coerced capacity of the failed drive. A global hot spare defined on any SAS/SATA II target is available to replace a failed drive on either SAS/SATA II target. Observe the following parameters when using hot spares:
• Use hot spares only in arrays with redundancy: RAID levels 1, 5, 10, and 50.
• A hot spare connected to a specific controller can rebuild only a drive connected to that same controller.
• You must assign the hot spare to one or more virtual drives through the controller BIOS, or use RAID management software to place it in the hot-spare pool.
• A hot spare must be as large as, or larger than, the drive it is replacing.
• If the rebuild on a global hot spare fails, the global hot spare returns to HOTSPARE state and the virtual disk goes into FAIL state.
• If a dedicated hot spare fails during a rebuild, the dedicated hot spare becomes UNCONFIGURED_GOOD and the virtual disk goes into FAIL state.
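The first two eligibility rules above can be expressed as a small predicate. This is a minimal sketch under the stated rules; the function and argument names are illustrative, not part of any PERC firmware API.

```python
REDUNDANT_LEVELS = {1, 5, 10, 50}   # only redundant arrays can use a hot spare

def spare_can_cover(spare_gb, failed_coerced_gb, raid_level):
    """Return True if a hot spare of `spare_gb` can replace a failed drive
    of `failed_coerced_gb` (coerced capacity) in an array of `raid_level`."""
    return raid_level in REDUNDANT_LEVELS and spare_gb >= failed_coerced_gb

print(spare_can_cover(300, 300, 5))   # equal capacity, RAID 5 -> True
print(spare_can_cover(146, 300, 5))   # spare too small -> False
print(spare_can_cover(300, 146, 0))   # RAID 0 has no redundancy -> False
```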
Disk Failure
Disk Failure with a Hot Spare Disk
If you lose a disk, the system automatically spins up the spare disk and begins the rebuild. Afterward, replace the failed disk so that the replacement becomes the new spare disk.
NOTE: If the disks are located in bays 1, 2, and 3, with the spare disk in bay 4, the configuration before the rebuild operation uses bays 1, 2, and 3. The rebuild completes onto the spare disk in bay 4; therefore, after the rebuild operation is completed, the configuration uses bays 1, 3, and 4 (the disk in bay 2 has not been physically replaced; its data has been rebuilt onto the disk in bay 4).
Disk Failure without a Hot Spare Disk
If a spare disk is not installed, the missing data is rebuilt in cache when required. This scenario is slow compared with having a spare installed, but the system continues to work. Until the failed disk has been replaced, the system is in critical mode. The failed disk must be replaced, and then either an automatic rebuild begins or the process must be started manually.
PowerEdge Expandable RAID Controller (PERC)
Introduction
Dell PowerEdge Expandable RAID Controllers (PERC) are adapters or chips that offer a cost-effective way to implement RAID features for reliable, high-performance, fault-tolerant disk subsystem management. PERC controllers:
• Reside on the system board
• Have one or more channels that connect to storage devices such as disk drives or enclosures
• Provide the logic that interacts with the microprocessor and memory, or the microprocessor and storage devices, to write and retrieve data and perform RAID functions
PERC Nomenclature
The graphic describes the nomenclature of PERCs. PERC allows you to have total control over how your array is implemented. Additionally, PERC can support the following features (together or separately):
• Hot Spare: RAID can be configured with hot spares in place to take over in case of drive failure.
• Drive Roaming: Used to identify drives that have been moved to different slots on the SCSI backplane. The PERC associates a drive with the SCSI backplane slot to which the drive is connected during configuration. When it finds a drive in a different backplane slot, it re-associates the drive's SCSI ID with the new slot.
PERC 4
The PERC is attached through the backplane to individual disk drives. The following table provides information about the PERC 4 family of adapters. PERC 4 was largely used on 8G PowerEdge servers.
PERC 5
PERC 5 is a PCI Express-based Serial Attached SCSI (SAS) RAID controller that provides enterprise-class protection and performance. It provides multiple protection options to suit a wide range of applications, including non-critical data. PERC 5 is available in different versions, as noted in the table. The PERC 5 SAS RAID controller features include:
• SAS performance of up to 3 Gb/s
• RAID levels 0 (striping), 1 (mirroring), 5 (distributed parity), 10 (combination of striping and mirroring), and 50 (combination of striping and distributed parity)
• Advanced virtual disk configuration and management utilities
• Ability to boot from any virtual disk
PERC 5 SAS and SATA Support
Each port on the controller supports SAS and SATA II devices using the following protocols:
• Serial SCSI Protocol (SSP): enables communication with SAS devices such as disk drives and tape drives.
• Serial ATA Tunneling Protocol (STP): enables communication with SATA II devices.
• Serial Management Protocol (SMP): communicates topology management information directly with an attached SAS expander, supporting expander discovery and configuration.
PERC 5 Performance
The SAS connection contains two transmit/receive paths. From the controller's perspective, one path transmits and the other receives, and vice versa from the drive's perspective. Both paths operate at the same speed: 3 Gb/s. As with other interconnects (such as Fibre Channel), two communication modes are possible:
• Half-duplex: the controller and drive do not transmit at the same time
• Full-duplex: the controller and drive can transmit at the same time
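The 3 Gb/s figure is the raw line rate per direction. A quick calculation shows what that means in usable bandwidth, assuming the 8b/10b line encoding used at this signaling rate (8 data bits are carried in every 10 line bits); this is a back-of-the-envelope sketch, not a measured number.

```python
LINE_RATE_BPS = 3_000_000_000      # 3 Gb/s raw line rate per direction

def throughput_mb_s(full_duplex=False):
    """Usable MB/s after stripping 8b/10b encoding overhead."""
    data_bits_per_s = LINE_RATE_BPS * 8 // 10    # 8 data bits per 10 line bits
    per_direction_mb = data_bits_per_s // 8 // 1_000_000  # bits -> bytes -> MB
    return per_direction_mb * (2 if full_duplex else 1)

print(throughput_mb_s())       # 300 MB/s in one direction
print(throughput_mb_s(True))   # 600 MB/s aggregate in full duplex
```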
PERC 5 Battery Backup Unit (BBU)
All PERC 5 controllers support battery-backed cache. The optional battery pack is a lithium-ion battery with a three-year life cycle. It offers an inexpensive way to protect the data on the cache memory module, and the lithium battery stores more power in a smaller form factor than previous battery technologies. The Battery Backup Unit features:
• Up to 72 hours of cache protection
• The battery is optional on the PERC 5/i Adapter
• The battery is transportable* on the PERC 5/e, but not on the PERC 5/i
• On the PERC 5/e the battery is on the DIMM, but on the PERC 5/i it is on the card (and thus not transferable)
• One-year warranty on the battery
* Transportable means that a battery-maintained dirty cache can be moved to a new card.
Battery Thermal Impacts
• The battery charger is disconnected at 60°C/140°F.
• The battery is disconnected at 70°C/158°F.
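The two thresholds form a simple escalation, which can be sketched as follows. The function name and returned strings are illustrative; the firmware's actual states and hysteresis behavior are not documented here.

```python
CHARGER_CUTOFF_C = 60   # charger disconnects at this temperature
BATTERY_CUTOFF_C = 70   # battery itself disconnects at this temperature

def bbu_thermal_state(temp_c):
    """Map a BBU temperature reading to the thermal action described above."""
    if temp_c >= BATTERY_CUTOFF_C:
        return "battery disconnected"
    if temp_c >= CHARGER_CUTOFF_C:
        return "charger disconnected"
    return "normal"

print(bbu_thermal_state(25))   # normal
print(bbu_thermal_state(65))   # charger disconnected
```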
Battery Learn Cycle
The battery learn cycle is a battery-calibration feature used by all PERC 5 controllers. The battery learn cycle:
• Determines the condition of the battery by discharging and recharging it
• Is performed by the controller approximately every 3 months
• Occurs automatically; however, it can be delayed by up to 168 hours (7 days)
The PERC 5 always switches from write-back to write-through caching when the battery drops below threshold, and this occurs during the learn cycle.
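The scheduling constraints above (roughly every 3 months, delayable by at most 168 hours) can be sketched with standard date arithmetic. The 90-day interval is an approximation of "every 3 months"; the controller firmware decides the exact timing.

```python
from datetime import datetime, timedelta

LEARN_INTERVAL = timedelta(days=90)   # "approximately every 3 months"
MAX_DELAY = timedelta(hours=168)      # delayable by up to 7 days

def next_learn_cycle(last_cycle, delay=timedelta(0)):
    """Estimate when the next learn cycle will start, given an optional
    user-requested delay (capped at 168 hours)."""
    if delay > MAX_DELAY:
        raise ValueError("learn cycle can be delayed by at most 168 hours")
    return last_cycle + LEARN_INTERVAL + delay

print(next_learn_cycle(datetime(2024, 1, 1)))  # 2024-03-31 00:00:00
```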
Native and Foreign Configurations
Native Configurations: A native configuration is a DDF (Disk Data Format) configuration that is currently active on a controller. A native configuration remains native until it is cleared from the active configuration set, either by the user or by the system.
Foreign Configurations: A foreign configuration is a DDF configuration that is not part of the active configuration. Foreign configurations can be merged with existing native configurations after migration. A "cable pull" or enclosure disconnect will cause virtual disks to become foreign.
Foreign Arrays
An array is foreign if it is not listed in the controller NVRAM. An array is listed in NVRAM if:
• It was created on that controller
• It was imported as a working array (foreign -> imported)
Auto Import of Foreign Configurations after Migration
This function is supported at boot if the target controller does not have any configuration AND the foreign configuration is complete and consistent. A configuration is complete and consistent if and only if all the configured drives are present in the system and all of them have the same controller GUID, header GUID, timestamp, and sequence number.
Importing Configurations
If a new configuration is detected, you can "import" it in addition to the existing configuration. The following sequence creates foreign/bad drives:
1. Good configuration
2. Drives removed
3. Reboot
4. Controller sees the missing drives
5. Configuration changes to a broken RAID 5
6. You realize the problem
7. Power off, insert the drives
8. Power on - "foreign config"
9. The broken configuration cannot be imported
NOTE: Forgetting to plug in drives/enclosures can result in a foreign/bad configuration.
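The "complete and consistent" test above can be sketched as a predicate over per-drive metadata. The dictionary keys here are illustrative stand-ins for the DDF fields named above, not the actual on-disk DDF layout.

```python
def is_complete_and_consistent(drives):
    """Return True only if every configured drive is present and all drives
    agree on controller GUID, header GUID, timestamp, and sequence number."""
    if not drives or any(d.get("missing") for d in drives):
        return False          # incomplete: a configured drive is absent
    keys = ("controller_guid", "header_guid", "timestamp", "sequence")
    first = drives[0]
    return all(d[k] == first[k] for d in drives for k in keys)

sample = [
    {"controller_guid": "C1", "header_guid": "H1", "timestamp": 100, "sequence": 5},
    {"controller_guid": "C1", "header_guid": "H1", "timestamp": 100, "sequence": 5},
]
print(is_complete_and_consistent(sample))   # True: import is allowed
```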
<Ctrl><R> Overview
Ctrl-R is a basic configuration utility with a limited feature set; it is designed to get you into the operating system. Its purpose is to allow you to:
• Create a RAID configuration in the pre-operating system environment
• Modify adapter BIOS settings
• Provide error-recovery functions to troubleshoot problems when the operating system is not accessible
Functions of the <Ctrl><R> Utility
The BIOS Setup Utility for PERC 5 adapters is designed to be a basic configuration utility with a limited feature set. Functions of the <Ctrl><R> utility include:
• Configuration
o Create/delete virtual disks and hot spares
o Initialize virtual disks
o View configuration status
o Display physical layout of virtual disks and free space(s) in disk groups
o Perform fully supported (text-based) console redirection
o Import or clear foreign configurations
• Modify adapter BIOS settings
o Enable/disable BIOS
o Specify boot volume
• Error recovery
o Force disks online/offline
o Manually start rebuilds
o Virtual disk recovery
Ctrl‐R User Interface
The user interface is simpler than the interface used in earlier PERC adapters. The keystrokes have been simplified to increase usability in remote-connection scenarios (through text interfaces). All operations exist in one of three primary screens:
• Virtual Disk Mgmt
• Physical Disk Mgmt
• Controller Mgmt
A keystroke switches between the screens. Operation menus are context sensitive and no longer buried among multi-layered menus. The procedure to create virtual disks is straightforward and intuitive, and the interface is simpler than <Ctrl><M>.
Multiple Adapters
If you have multiple PERC 5 adapters in a system, when you use the Ctrl‐R sequence during POST to enter the utility, you will be asked which card you want to run Ctrl‐R for. This is because each adapter could have a different firmware version (and therefore a different version of the Ctrl‐R utility). Each card is configured using its own Ctrl‐R firmware.
Not in the BIOS Utilities
A number of features that you might expect to find in the BIOS configuration utilities are not available there; for these, OpenManage Server Administrator (OMSA) must be used. These features include:
• Battery management
• Patrol Read management
• Modifying background operation rates (BGI, CC, rebuild, reconstruction)
• Modifying the auto-rebuild setting
• Modifying the cache flush interval
Foreign Configurations in CTRL‐R
Foreign configurations can be imported or cleared from <Ctrl><R>.
NOTE: The Foreign Config option is only available if a foreign configuration has been detected.
OpenManage Server Administrator
Introduction
Dell OpenManage Server Administrator Storage Management (formerly known as: Array Manager) provides enhanced features for configuring a system's locally‐attached RAID and non‐RAID disk storage. Storage Management enables you to perform controller and enclosure functions for all supported RAID and non‐RAID controllers and enclosures from a single graphical or command‐line interface without requiring use of the controller BIOS utilities. The graphical interface is wizard‐driven with features for novice and advanced users and detailed online help.
Connecting to OMSA
Storage Management is installed as a Dell OpenManage™ Server Administrator service. All Storage Management features are accessible by selecting the 'Storage' object in the Server Administrator tree view. To start a Server Administrator session on a remote system, open a Web browser, type one of the following in the address field, and press <Enter>:
• https://<hostname>:1311 (where <hostname> is the assigned name for the managed system and 1311 is the default port)
• https://<IP address>:1311 (where <IP address> is the IP address for the managed system and 1311 is the default port)
NOTE: You must type https:// (not http://) in the address field to receive a valid response in your browser.
To start a Server Administrator session on a local system, click the Dell OpenManage icon on the system's desktop. A login window is displayed where the user must enter an account with Administrator privileges.
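The address format above is simple enough to capture in a tiny helper. This is an illustrative sketch of the URL construction only; it does not talk to a server, and the function name is not part of any Dell tool.

```python
def omsa_url(host, port=1311):
    """Build a Server Administrator address for the given managed system.
    Note the scheme must be https, not http, per the documentation above."""
    return f"https://{host}:{port}"

print(omsa_url("server01"))        # https://server01:1311
print(omsa_url("192.168.1.10"))    # https://192.168.1.10:1311
```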
Component Properties
After OpenManage is launched, select an object in the system tree and then review the Properties tab. There are two main sections for each component displayed in the Properties tab: Health and Information/Configuration.
Health: The Health sub-tab displays the current status of the storage components. The tree object reflects the status of all lower-level objects. For example, if the storage system is compromised due to a degraded enclosure, both the enclosure Health sub-tab and the Storage Health sub-tab display a yellow exclamation point (!) to indicate a Warning severity. A quick way to review the status of all storage components is to select the Storage tree-view object and view the Health sub-tab. Click a storage component on the Health sub-tab to display detailed information about that component.
Component Status
Component status is indicated by the severity. A component with a Warning or Critical/Failure status requires immediate attention to avoid data loss. Review the Alert Log for events indicating why a component has a Warning or Critical status.
Storage Information / Configuration
The OpenManage Server Administrator (OMSA) storage menu lists the storage controllers installed in the system. Two tasks are performed at the global level:
• Global Rescan: A global rescan updates configuration changes (such as new or removed devices) for all controllers and their attached components.
• Enable/Disable Smart Thermal Shutdown: By default, the operating system and server shut down when an enclosure reaches a critical temperature below 0 degrees Celsius or above 50 degrees Celsius. Using the Enable Smart Thermal Shutdown task, however, you can specify that only the enclosure, and not the operating system and server, be shut down when the enclosure reaches a critical temperature. To restore the default behavior, use the Disable Smart Thermal Shutdown task.
Storage Management Features
There are many tasks associated with controllers, enclosures, virtual disks, and array disks that users can perform with the Storage Management Service. Expanding the controller listing in the system tree on the left side of the OpenManage Server Administrator screen displays the list of available tasks. Depending on the controllers and storage attached to the system, the expanded Storage object may display the following lower-level objects, each of which may have related tasks underneath:
• Controller
• Battery
• Connector
• Enclosure or Backplane
• Physical Disks
• EMMs (Enclosure Management Modules)
• Fans
• Power Supplies
• Temperatures
• Firmware/Driver Versions
• Virtual Disks
Controller Object
The Controller object provides information about your controllers and the various components attached to the controller. The components attached to the controller can include battery, virtual disks, and so on. The following controller tasks are available when the Controller object is selected.
1) Set Check Consistency Rate: Changes the amount of system resources dedicated to the check consistency task.
2) Set Reconstruct Rate: Changes the amount of system resources dedicated to the reconstruct task.
3) Set Patrol Read Mode: Patrol Read identifies disk errors in order to avoid disk failures and data loss or corruption.
4) Start and Stop Patrol Read: Enable you to start a Patrol Read task, or stop a running task, when the Patrol Read mode is set to manual.
5) Import/Recover Foreign Configuration: Imports and recovers virtual disks that reside on physical disks that were moved from another controller.
6) Rescan Controller: Updates configuration changes (such as new or removed devices) for all components attached to the controller.
7) Create Virtual Disk: Launches the Express and Advanced Create Virtual Disk wizards.
8) Enable, Disable, Quiet, and Test Alarm: Enable you to manage the controller alarm. For example, you can set the alarm to sound in the event of a device failure, or quiet the alarm once it is sounding.
9) Set Rebuild Rate: The rebuild rate refers to how much of the system's resources are dedicated to rebuilding a failed physical disk. This task enables you to adjust this setting.
10) Reset Configuration: Erases all information on the controller so that you can perform a fresh configuration. This operation destroys all virtual disks on the controller.
11) Export Log File: Exports the controller log to a text file.
12) Import Foreign Configuration: Imports virtual disks that reside on physical disks that have been moved from another controller.
13) Clear Foreign Configuration: Clears or erases the virtual disk information from newly attached physical disks.
14) Set Background Initialization Rate: Changes the amount of system resources dedicated to the background initialization task.
Battery Object
The following battery tasks are available when the Battery object is selected. These tasks are only available for controllers that have batteries that require reconditioning. 1) Recondition Battery: This task fully discharges and recharges the controller battery.
2) Start Learn Cycle: Use the Start Learn Cycle task to initiate the battery Learn Cycle.
3) Battery Delay Learn Cycle: Use this task to delay the start time of the Learn Cycle by up to seven days.
Connector Object
The Connector object provides information about the connector and the enclosure or backplane attached to it. The following connector task is available when the Connector object is selected.
• Rescan Connector: Rescans the controller connectors to verify the currently connected devices or to recognize new devices that have been added to the connectors. Performing a rescan on a connector is similar to performing a rescan on the controller.
Enclosure/Backplane Object
The Enclosure or Backplane object provides information about the physical disks, temperature probes, and other components attached to the enclosure or backplane. The following enclosure tasks are available when the Enclosure object is selected.
1) Enable and Disable Alarm: Use these tasks to manage the enclosure alarm. When enabled, the alarm sounds when the enclosure encounters an error condition.
2) Set Asset Data: Use this task to change the enclosure's asset tag and asset name.
3) Set Temperature Probe Values: Each temperature probe has a warning and a failure threshold. The warning threshold indicates that the enclosure is approaching an unacceptably warm or cool temperature. Use this task to modify the warning threshold.
Physical Disk Object
The Physical Disks object provides information about the physical disks attached to the enclosure or backplane. The following physical disk tasks are available when the Physical Disks object is selected.
1) Blink and Unblink: The Blink task allows you to find a disk within an enclosure by blinking one of the light-emitting diodes (LEDs) on the disk. The Unblink task cancels the Blink task.
2) Remove Dead Segments: In certain circumstances, this task enables you to recover disk space that is currently unusable.
3) Assign and Unassign Global Hot Spare: Assign or unassign one or more physical disks as a global hot spare.
4) Prepare to Remove: Use this task before removing a disk from an enclosure.
5) Online and Offline: Use the Offline task to deactivate a disk before removing it. Use the Online task to reactivate an offline disk.
6) Initialize: On some controllers, the Initialize task prepares a physical disk for use as a member of a virtual disk.
7) Rebuild: Rebuilds a failed physical disk.
8) Cancel Rebuild: Use the Cancel Rebuild task to cancel a rebuild that is in progress.
9) Clear Physical Disk and Cancel Clear: Use the Clear Physical Disk task to erase data residing on a physical disk.
EMMs Object
The EMMs object provides information about the Enclosure Management Modules (EMMs).
Fans Object
The Fans object provides information about the enclosure fans.
Power Supplies Object
The Power Supplies object provides information about the enclosure power supplies.
Temperatures Object
The Temperatures object provides information about the enclosure temperature probes. The following temperature probe task is available when the Temperatures object is selected. Set Temperature Probe: The temperature probes monitor the enclosure's temperature. Each temperature probe has a Warning and a Failure threshold. The Warning threshold indicates that the enclosure is approaching an unacceptably warm or cool temperature. Use this task to modify the Warning threshold.
Firmware/Driver Versions Object
The Firmware/Driver Version object provides information about the version of the driver and firmware that are currently installed on the controller. The firmware and driver properties can vary depending on the model of the controller.
Virtual Disks Object
The Virtual Disks object provides information about the virtual disks configured on the controller. The following virtual disk tasks are available when the Virtual Disks object is selected.
1) Blink and Unblink: The Blink and Unblink tasks blink or unblink the lights on the physical disks included in the virtual disk.
2) Rename: Use this task to rename a virtual disk.
3) Reconfigure: Launches the Reconfigure Virtual Disk Wizard, which enables you to change the virtual disk configuration.
4) Cancel Rebuild: Use the Cancel Rebuild task to cancel a rebuild while it is in progress.
5) Cancel Reconfigure: Use the Cancel Reconfigure task to cancel a virtual disk reconfiguration while it is in progress.
6) Format and Initialize; Slow and Fast Initialize: Use the Format, Initialize, Slow Initialize, or Fast Initialize task to erase files and remove the file systems on a virtual disk.
7) Cancel Background Initialization: On some controllers, background initialization of redundant virtual disks begins automatically after the virtual disk is created. Use this task if you need to cancel the background initialization.
8) Restore Dead Segments: Use the Restore Dead Segments task to recover data from a RAID-5 virtual disk that has been corrupted.
9) Delete: Use this task to destroy all data on the virtual disk.
10) Assign and Unassign Dedicated Hot Spare: Assign or unassign one or more physical disks as a dedicated hot spare.
11) Check Consistency, Cancel Check Consistency, Pause Check Consistency, and Resume Check Consistency: If you have created a redundant virtual disk, the Check Consistency task verifies the accuracy of the redundant (parity) information. This task applies only to redundant virtual disks. When necessary, the Check Consistency task rebuilds the redundant data.
Server Internal Storage Troubleshooting
Below is a short list of issues that may be encountered when troubleshooting server internal storage.
NOTE: Before starting any of the troubleshooting suggested below, ensure that you have a complete backup of the data contained in the storage system.
Drive(s) going offline
When a drive or drives have gone offline, attempt to rebuild the drive(s) into the array. Monitor the rebuild process and restart troubleshooting if issues are experienced.
SMART errors on the physical drive
When SMART errors on a physical drive are experienced, the drive will need to be replaced. After replacing the drive, monitor the rebuild process and restart troubleshooting if issues are experienced. After the successful completion of the rebuild, ensure that the firmware on all the drives is up‐to‐date with the firmware on the PERC and the operating system drivers.
Dell OpenManage Server Administrator warnings about firmware
Verify that the firmware on the hard drive, backplane, and PERC are current with the drivers in use by the operating system.
Drive not ready or unrecoverable errors
There are three options that allow the user to perform a drive self-test. The first two tests can be performed with the server still running; all of them require that a Dell application be installed on the server.
• OpenManage Diagnostics (Quick Test)
• Dell Online Diagnostics (Quick Test)
• 32-Bit Diagnostics (Quick Test; reboot required)
RAID Troubleshooting
The first step in RAID troubleshooting is to determine the category of issue the server is having and then collect some information on the following items to help troubleshoot the issue.
• Server model number
• PERC controller version
• Current RAID level in use
• Current RAID status
• Firmware version installed (drives and PERC)
• OS PERC driver version installed
• Hard drive size(s) and configuration
The following are some of the issues commonly encountered.
Drive in a Fail State
• Look in the Dell OpenManage Storage Manager log to see why the drive was taken offline (Sense Key error).
• If no reason for the drive failure is seen, pull the PERC controller log (TTY log for an LSI controller, controller log for an Adaptec controller) to determine which drive has an error.
• Run diagnostics on the drives in Dell OpenManage Storage Manager. While the diagnostics are not exhaustive, they will help in analyzing array and drive failures.
• If the logs do not indicate an error with the drive or array (sense key 03 or 04, or a predictive failure), run Online Diagnostics on the PERC and hard drives. The Online Diagnostics offer more robust testing of the hardware than the diagnostics within Array Manager.
• If you can reboot the server, use the PERC BIOS instead of running the full diagnostics. On an LSI controller, go to Objects > Physical Drives and press <F2> on each drive to look for media errors and to verify that the size of the drive is correct (not 0 MB). The drive that is offline may not be the one causing the problem. On an Adaptec controller, go to Disk Utilities, select each drive, and perform a verify on each drive. The rule of thumb is to verify the first 10% of the drive; if no errors occur by then, press <Esc> and move to the next drive. If you find an error on a drive, verify to 10% past the last error before moving to the next drive (most drive errors are seen at the beginning of the disk).
Drive in a Missing State
• Determine whether the storage manager lists the drive size information correctly, or whether it shows 0 MB.
• If the size is listed as 0 MB, move the drive to another slot to see if the drive size is detected correctly.
o If the drive is seen correctly after moving it to another slot, the backplane or cabling is most likely the issue.
o If moving the drive does not allow the drive to be seen correctly, the issue is most likely a bad hard drive.
• If the size is listed correctly, check the array configuration for inconsistencies.
Multiple Drive Failure
• Use log files to determine the order in which the drives failed.
• If the operating system is functional, pull the logs from Dell OpenManage Storage Manager.
o Use the DSN to look up any error messages in the logs and how to rebuild the array.
• If the operating system drive is offline, pull the TTY log from an LSI controller, or go into the PERC BIOS of the Adaptec controller and then into the <Ctrl><P> menu to view the controller log for errors that would cause the drives to go offline.
o Check the drives on both controllers for media errors that would have caused the PERC controller to lose the drive configurations.
Hard Drive with Pre‐Failure Warning
• Take the drive offline before replacing it. Failure to do this can cause further problems.
• If the hard drive is in a hot-plug backplane, replace the failed hard drive while the server is powered on.
• If the hard drive is not in a hot-plug backplane, power down the server to replace the drive. The server should detect the drive replacement and initiate a drive rebuild when the PERC/CERC controller boots.
Using the above information, you can find additional, more specific troubleshooting items on the Dell Solution Network (DSN). There is more on the DSN in the Navigating Dell Information and Tools section.
Summary
• Serial Attached SCSI (SAS) is a newer, faster version of SCSI designed to meet the demands of enterprise IT.
• If the SAS controller supports RAID, it examines the RAID information and builds a map of virtual disks to physical disks.
• A single SAS controller supports both native SAS drives and SATA II drives.
• PERC stands for PowerEdge Expandable RAID Controller. Many people think that the "E" stands for "Edge."
• All PERC 5 controllers support both SAS and Dell-compliant SATA II hard drives.
• A phy is a physical link made up of two pairs (receive and transmit).
• Every port in a domain has a World Wide Name.
• <Ctrl><R> is designed to allow you to get into the operating system.
• The benefits of using RAID include performance, capacity, and data protection.
Troubleshooting and Diagnostics
Objectives
Upon completion of this section, you will be able to:
• Understand how to interpret POST messages and other error indications.
• Locate and use the DDDP to create stand-alone diagnostic test media.
• Use OpenManage™ Server Administrator (OMSA) to find logs and run diagnostic tests.
• Examine the PERC 5/i TTY logs.
• Use 32-bit Diagnostics tools to run tests.
• Understand the proper application of the diagnostic tools available for PowerEdge™ server systems.
Troubleshooting Techniques
Broad‐Level Steps
Those who attend PowerEdge training report that troubleshooting is a major skill required to perform their job effectively. This section offers a few tips to get you started.
Check the Obvious
• Reboot the system.
• Clear NVRAM.
• Ensure that your BIOS supports all of your hardware and software.
• Ensure all cables and external attachments are correctly connected.
• Carefully read any errors that are displayed on the screen and follow any prescribed remedy options.
• Ensure compatibility between new hardware/software and the existing system configuration.
Communicate
• First, talk to the user about the problem.
• Check to make sure that connections and power are OK.
• Does the system power on, and if so, how far does it get?
• If you cannot find any problems with the system, it could be an intermittent error.
• Check the manufacturer's website; you may even need to call the support desk.
Post Messages and Other Error Indications
What Is an LCD Panel?
The LCD panel is the primary error indicator on 9G systems. Errors may appear before, during, and after POST. The physical layout of the panel includes:
• A 1-row x 5-column alphanumeric dot-matrix display
• A power button (green when powered on)
Display colors include:
1) System ID state: White letters on a blinking blue background.
2) Non-alert state: White letters on a solid blue background.
3) Alert state: Very light amber letters on a dark amber background.
NOTE: If you press and hold the ID button for approximately 5 seconds, the LCD enters the BIOS Progress Code state. This feature is very useful when troubleshooting a no-POST or no-video situation.
Interpreting LCD Message Codes
The first five characters of errors contain the Message Code.
• Error text scrolls through: for example, I/O Channel Chk.
Display of Error Messages
• The LCD will display up to three (3) error messages, based on the priority of the message.
• All messages are logged in the SEL.
• When more than three (3) messages are displayed, you will see the first three (3) messages scroll the LCD, followed by: I1991 > 3 ERRs Chk Log.
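The display logic above can be sketched in a few lines of Python. This is a hypothetical illustration of the described behavior, not Dell firmware code; the sample error codes used in the comments and tests are placeholders.

```python
def lcd_messages(errors, limit=3):
    """Return the messages the LCD would cycle through.

    `errors` is a list of (priority, code, text) tuples; lower
    priority numbers are shown first. All errors are assumed to be
    logged to the SEL regardless of what the LCD shows.
    """
    ordered = sorted(errors, key=lambda e: e[0])
    shown = [f"{code} {text}" for _, code, text in ordered[:limit]]
    if len(errors) > limit:
        # After the first three messages, the LCD indicates that
        # more errors are waiting in the log.
        shown.append("I1991 > 3 ERRs Chk Log")
    return shown
```

With four pending errors, the three highest-priority messages scroll first, followed by the overflow indicator directing you to check the log.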
BIOS Progress Code Display
The BIOS generates progress codes during POST to identify boot stages. The BIOS sends each code to the BMC before the corresponding stage begins.
1) The BIOS will send a code such as "MEMC" before it begins configuring the system memory.
2) After a stage successfully completes, the BIOS sends the next progress code and continues.
If the user presses and holds the ID button for approximately 5 seconds, the LCD shows the last BIOS state code.
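The handshake above can be modelled as the BMC simply remembering the last code it received, which is what the ID-button feature reads back. A toy sketch follows; "MEMC" comes from the text, while the other stage codes are invented for illustration.

```python
class ProgressCodeLog:
    """Toy model of a BMC retaining BIOS POST progress codes."""

    def __init__(self):
        self.codes = []

    def stage_starting(self, code):
        # The BIOS sends the code *before* the stage begins, so if
        # POST hangs, the failing stage's code is the last entry.
        self.codes.append(code)

    def last_code(self):
        """What the LCD shows after holding the ID button ~5 seconds."""
        return self.codes[-1] if self.codes else None


bmc = ProgressCodeLog()
for stage in ["PROC", "MEMC", "PCIB"]:  # "MEMC" from the text; others invented
    bmc.stage_starting(stage)
# If POST hung during the last stage, last_code() points at it.
```

Because each code is sent before its stage runs, the retained code identifies the stage in progress when a hang occurred.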
LCD Display
Clearing of the LCD Display
When an error is displayed, the LCD panel continues to display errors until:
• The condition is cleared (for sensors)
• System reset
BIOS-related messages are usually cleared this way:
• The SEL is cleared
• A/C power is cycled
LED Indicators
• Present on:
o PSU
o Disks
o Power Button
o Rear Cyclops
o Not on internal components
• Pay attention to the control panel messages.
o They should direct you straight to the issue.
Error Messages
If it is able to, the system will display errors on the screen during POST.
NOTE: If the technician presses and holds the ID button for approximately 5 seconds, the LCD will enter the BIOS Progress Code state. This feature is very useful when troubleshooting a no-POST or no-video situation.
Hard Drive Indicator Codes
The hard‐drive carriers have two indicators: the drive‐activity indicator and the drive‐status indicator.
1) Blinks green two times per second: Identify drive or preparing for removal
2) Off: Drive ready for removal
3) Blinks green, amber, and off: Drive predicted failure
4) Blinks amber four times per second: Drive failed
5) Blinks green slowly: Drive rebuilding
6) Steady green: Drive online
7) Blinks green three seconds, amber three seconds, and off six seconds: Rebuild aborted
NOTE: For non‐RAID configurations, only the drive‐activity indicator is active. The drive‐status indicator is off.
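As a quick reference, the status-indicator patterns above can be captured in a small lookup table. The pattern names below are informal labels transcribed from the list, not official Dell terminology.

```python
# Informal mapping of drive-status indicator patterns to drive states,
# transcribed from the indicator codes listed above.
DRIVE_STATUS = {
    "blinks green 2x/sec": "identify drive / preparing for removal",
    "off": "drive ready for removal",
    "blinks green, amber, off": "drive predicted failure",
    "blinks amber 4x/sec": "drive failed",
    "blinks green slowly": "drive rebuilding",
    "steady green": "drive online",
    "green 3s, amber 3s, off 6s": "rebuild aborted",
}


def drive_state(pattern):
    """Look up a drive state by indicator pattern; unknown patterns return None."""
    return DRIVE_STATUS.get(pattern)
```

Remember that in non-RAID configurations the status indicator stays off, so only the activity indicator carries information.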
NIC Activity/Link Indicators
Each NIC on the back panel has an indicator that provides information about network activity and link status.
1) Link and activity indicators are off: Network interface card (NIC) is not connected to the network
2) Link indicator is green: NIC is connected to a valid link partner on the network
3) Activity indicator is blinking amber: Data is being sent or received
NOTE: Depending on the speed of the NIC, you will get different light combinations.
DRAC Activity/Link Indicators
1) Link and activity indicators are off: Network interface card (NIC) is not connected to the network
2) Link indicator is green: NIC is connected to a valid link partner on the network
3) Activity indicator is blinking amber: Data is being sent or received
System Status Indicator
The identification buttons on the front and back panels can be used to locate a particular system within a rack.
1) Indicator off: System is not turned on.
2) Indicator solid blue: The system is on and the BMC did not report any errors.
3) Indicator blinking blue: System is turned on and is in system ID state.
4) Indicator blinking amber: System is turned on and is in system alert state. Alert state means that the server has an error.
Power Status/Fault/Present Indicators
1) Power supply status: Illuminated green, indicating the power supply is operational
2) Power supply fault: Illuminated amber, indicating the power supply has a fault
3) Power status: Illuminated green, indicating a power source is connected to the power supply
BIOS Messages
System Messages
System messages appear on the screen to notify users about possible problems. The graphic lists the system messages that can occur in a PowerEdge 2950 as of BIOS version 1.0.1.
Baseboard Management Controller
What Is Baseboard Management Controller?
Baseboard Management Controller (BMC) is a microcontroller present on all 9G PowerEdge servers and is controlled by updateable firmware. The BMC senses and reports on temperature, voltage, and fan speeds. The BMC provides several communication interfaces:
• Host: in‐band (local)
• Serial: out‐of‐band (remote)
• LAN: out‐of‐band (remote)
The behavior of the BMC changes when you install a DRAC 5 adapter. The System Event Log (SEL) stores the error conditions of the components that are monitored and reported on. Thresholds are preset and predefined; an event is reported only if a component falls outside of its threshold (indicator LEDs will flash amber on the front of the server when a component fails). You cannot change any of the thresholds EXCEPT the ambient temperature range.
BMC and the LCD Panel
The BMC monitors the managed components. When an error occurs, the BMC:
• Logs an event in the System Event Log.
• Displays the error on the LCD.
o Only 3 errors will remain on the LCD, by priority.
• Can optionally send an SNMP trap.
• Can take power actions (reset, power off).
What Is an SNMP Trap?
An SNMP trap is generated by management software or the firmware in a management device when a message needs to be delivered somewhere.
• The message is usually about a failure or a security breach, but could simply be to say that a PC has booted correctly.
• Dell servers can generate SNMP traps from the BMC and also from the Dell OpenManage Server Administrator (OMSA) management software.
• SNMP traps have to go to a specified destination, so you usually have to configure an "SNMP Trap Destination" somewhere in your system.
BMC Connections
You can connect to the BMC when the server is off by:
• Serial: out‐of‐band (remote): Using the system's COM port.
• LAN: out‐of‐band (remote): Using one or more of the onboard NICs.
Connection Modes
The BMC can be run in three modes: serial, terminal, and LAN.
Serial Mode
Serial mode is the basic mode of connection for the BMC.
• Set the server BIOS external serial connector setting to Remote Access.
• Set the BMC serial connection mode to Basic Mode.
• Configure the BMC serial baud rate.
• Run the ipmish CLI command to connect and use.
A successful serial connection requires two (2) configuration steps:
1. Configure the server BIOS (<F2>) serial connection.
2. Configure the BMC (<Ctrl><E>); ensure that baud rates match your remote system.
Terminal Mode
Terminal mode is used when utilizing console redirection.
• Set the server BIOS external serial connector setting to Remote Access.
• Set the server BIOS console redirection setting to On with Console Redirection via COM2.
• Set the BMC serial connection mode to Terminal Mode.
• Configure the BMC serial baud rate.
• Run HyperTerminal to connect to the BMC.
User IDs: In terminal mode, you will need to log in to the OpenManage hardware user ID, traditionally:
o User: root
o Password: calvin
• This is a pun: IPMI is also known as the "Hobbes bus."
LAN Connection
Perhaps more practical than serial, you can use the LAN. The BMC can share one or more of the LOM ports. For this type of connection, you will still use ipmish.
o Remote control using LAN
o BMC KG key
o ipmish
o IPMI shell issues
Remote control using LAN: PowerEdge servers support console redirection:
• Independently of the BMC
• POST console sent through COM1
• Operating system CLI sent through COM1 after the operating system loads
o Linux Console and Microsoft SAC
BMC serial connection allows:
• HyperTerminal version of ipmish
• CLI version of ipmish
BMC LAN connection allows:
• ipmish
BMC KG Key
The KG key is an optional security measure for LAN‐based communications.
• The key is a hexadecimal value that can be set in the BMC Setup Utility, called the RMCP+ Encryption Key.
• The hexadecimal value must be given when connecting to the BMC.
In 9G systems, the BMC implements teaming and failover using both system LOMs. This functionality can be configured in the "NIC Selection" setting using OpenManage Server Assistant (OMSA), DTK, and the Remote Access Configuration Utility.
Available options include:
1) Shared: BMC shares the network interface with the host operating system.
2) Failover: Same as shared mode except if NIC 1 fails, the BMC fails over to NIC 2. 3) Dedicated: Only available when the DRAC is present and using a dedicated NIC on the DRAC.
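Because the KG key must be entered as a hexadecimal value, a configuration front end might validate it before handing it to the BMC. Here is a minimal sketch; the 20-byte maximum is an assumption drawn from IPMI 2.0's key size, not from this document.

```python
def valid_kg_key(key, max_bytes=20):
    """Check that a KG key string is plausible hexadecimal.

    Assumes the key is entered as an even-length hex string of at
    most `max_bytes` bytes. The 20-byte cap is an assumption based
    on IPMI 2.0, not a value taken from the training text.
    """
    if not key or len(key) % 2 != 0:
        return False
    if len(key) // 2 > max_bytes:
        return False
    try:
        bytes.fromhex(key)  # rejects any non-hex characters
        return True
    except ValueError:
        return False
```

A check like this catches typos before a failed RMCP+ session setup, which is otherwise hard to diagnose remotely.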
Configuring the BMC
• The BMC can be configured using the DTK, OMSA, or the BMC Setup Utility (accessed using <Ctrl><E>).
• The Remote Access Configuration Utility is the last option ROM that runs before booting to the operating system.
• Most of the configuration options in the BMC Configuration Utility have either been seen before or are quite simple. One exception is the RMCP+ Encryption Key.
Intelligent Platform Management Interface
Intelligent Platform Management Interface (IPMI)
IPMI was designed by Intel in 1998 to push forward manageability in server systems. It is a command language interface to the Baseboard Management Controller that can be used to manage controlled devices. Servers based on IPMI use “intelligent” or autonomous hardware that remains operational even
when the processor is down, so that platform management information and control capabilities are always accessible. Other partner technologies include:
• ICMB: Intelligent Chassis Management Bus
• IPMB: Intelligent Platform Management Bus
Many servers are equipped with more than 100 on‐board sensors. These sensors are connected to the BMC via the IPMB bus. The BMC/IPMB/ICMB implementation is designed to make the hardware management and monitoring architecture a stand‐alone computer subsystem.
IPMI 2.0
Intelligent Platform Management Interface (IPMI) 2.0 is an enhancement of IPMI 1.5, designed to extend customers' IT capabilities and further improve remote management by introducing enhanced security, remote access, and configuration capabilities while maintaining compatibility with previous IPMI versions. Features of IPMI 2.0 include:
• New authentication and encryption algorithms: Enhance security for remote management access.
• "Serial Over LAN": Supports remote interaction with serial‐based applications, BIOS, and operating system.
• SMBus System Interface: Provides a low‐pin-count connection for low‐cost management controllers.
• Firmware Firewall: Supports partitioning and protection of management between blades in modular system implementations.
• IPMI "Payloads": Provide infrastructure for Serial Over LAN and OEM value‐added redirection capabilities.
• New user login and configuration options: Enable users' access rights and security configuration capabilities to be tailored to the needs of the user's facility.
Virtual Media on Linux
To run the virtual media feature on a management station running the Linux operating system:
• Install a supported version of Mozilla or Firefox.
• If the virtual media plug‐in is not installed, or if a newer version is available, a dialog box appears during the installation procedure to confirm the plug‐in installation on the management station.
• Ensure that the user ID running the browser has write permissions in the browser's directory tree. If the user ID does not have write permissions, you cannot install the virtual media plug‐in.
Dell Diagnostics Distribution Package
What Is Dell Diagnostics Distribution Package (DDDP)?
DDDP is a distribution mechanism for Dell diagnostics and MP Memory diagnostics. DDDP includes new downloadable executables that allow you to create diagnostic media. The following media types are supported with DDDP:
• Diskettes
• CDs
• USB flash drives
• PXE images
BE AWARE: There are 8G and 9G versions of DDDP. They are NOT backward compatible!
DDDP: Install to a USB Flash Drive
This dialog box is displayed when the Install to USB Flash Drive button from the main window is selected. You can select a USB flash drive from the drop‐down list box.
• Flash drives can be hot‐plugged and will automatically be detected and added to the list without the need to restart the DDDP application.
• The drop‐down list includes the drive letter (if assigned) in parentheses, followed by the volume name in brackets ([ ]), followed by the size and description of the flash drive.
o If the Cancel button is selected, nothing will be written to the flash drive, and the application will return to the main window.
o If OK is selected, then the flash drive will be reformatted to make it bootable, and the diagnostic files will be copied to it.
Caution: Clicking OK will erase all data already on the flash drive.
NOTE: The following items apply:
• Local administrator rights are necessary to create a bootable flash drive from DDDP.
• Only flash drives up to two (2) GB in size are supported by DDDP.
• Boot functionality has been tested with Dell‐branded USB flash drives. Other products may not work correctly, even if DDDP can successfully write the image to them.
DDDP: Create a Bootable CD
This dialog box is displayed when the Create a Bootable CD button from the main window is selected. By default, this option writes an ISO image file at the specified location. You can enter the full path to the file directly in the field at the bottom of the dialog, or select the ... button to display a dialog box to select the location and the filename to use. An option of burning CD‐R or CD‐RW media directly from the DDDP application is also available: select the Burn the image to CD‐R or CD‐RW media instead of saving to a file check box. The check box is grayed out if the system does not have a drive capable of burning CD‐R or CD‐RW media.
• If the Cancel button is selected, nothing will be written and the application will return to the main window.
• If OK is selected, then the image file will be saved in the filename and location selected.
• If the Burn the image to CD‐R or CD‐RW media instead of saving to a file check box has been checked, then the Burn CD dialog box at right is displayed.
Select the Location to Store the CD Image
This dialog appears when the ... button from the Create a Bootable CD dialog box is selected. This dialog is a standard Windows save-file dialog, and you can navigate through the directory structure to the location to store the file. Enter the name of the file to save into the File name: field. Click Save and DDDP returns to the Create a Bootable CD dialog box.
1. Click OK to write the image file.
2. Click Cancel to return to the main DDDP window.
Burn a CD
After OK is selected from the Create a Bootable CD dialog box, and when the Burn the image to CD‐R or CD‐RW media instead of saving to a file check box has been checked, the Burn CD dialog box is displayed. Next, select a CD recorder to use to create the CD from the drop‐down list box. Click OK. The CD is erased if the media type is CD‐RW. Then, the diagnostic data is copied to the CD. This option supports only CD‐R and CD‐RW media. The system must have a drive capable of writing to one or both of these media types.
NOTE: The capability of burning a CD directly from DDDP is not intended to support a wide variety of different CD burners and configurations. If the direct burn capability does not work in a particular hardware configuration, then it is best to create an ISO image and use commercial CD creation software to burn the image to a CD.
NOTE: Local administrator rights are necessary to burn a CD directly from DDDP.
DDDP: Create a Bootable Diskette Set
This dialog box is displayed when the Create Bootable Diskette Set button on the main window is selected. Select a 1.44 MB diskette drive from the drop‐down list box. USB diskette drives are hot‐pluggable and are automatically detected and added to the list without restarting the DDDP application. If Cancel is selected, then nothing is written to the diskette drive, and the application will return to the main window. If OK is selected, then the Create Diskette Set dialog is displayed.
NOTE: This option only supports 1.44 MB diskettes.
Create Diskette Set
This dialog box is displayed after selecting OK from the Create Bootable Diskette Set dialog box shown in the previous section. It is updated for each diskette in the set to indicate the current diskette number and the total number of diskettes in the set. Click the Cancel button to return to the main DDDP window. After clicking OK, the diskette is formatted and the diagnostic files are copied to it. All existing data on the diskettes used is erased.
DDDP: Create a Hard Drive Image
This dialog box is displayed when the Create a Bootable HDD Image File button is selected from the main window. The full path to the file can be entered directly in the field at the bottom of the dialog, or select the ... button to display a dialog box to select the location and the filename to use.
• If the Cancel button is selected, nothing will be written and the application will return to the main window.
• If OK is selected, then the image file will be saved in the filename and location selected.
Select Location for Hard Drive Image File
This dialog box is displayed when the user clicks the ... button from the Create a Hard Drive Image dialog box. This dialog box is a standard Windows save-file dialog box and can be navigated through the directory structure to the location to store the file.
1. Enter the name of the file to save in the File name: field.
2. Click Save. The DDDP returns to the Create a Hard‐Drive Image dialog box.
3. Click OK to write the image file, or click Cancel to return to the main DDDP window.
PXE Boot
To PXE‐boot the DDDP hard drive ISO into the diagnostics, additional software and network infrastructure are required, including:
• TFTP server
• DHCP server
• Boot loader capable of network booting
Specific instructions on how to PXE‐boot are beyond the scope of this training course. However, the following steps provide a high‐level overview of how to set this up using the pxelinux bootloader. Pxelinux (http://syslinux.zytor.com) is one example of an open source bootloader capable of booting DOS boot images. If you already have a TFTP server and a DHCP server configured on your network, you can easily use pxelinux to boot the hard drive image created by DDDP. Follow these basic steps to perform this task:
1. Download the SYSLINUX package.
2. On the TFTP server, create the directory /tftpboot and copy the files pxelinux.0 and memdisk (from the SYSLINUX distribution) to that directory.
3. Using the DDDP application, create an HDD image file called diags.img and copy it to the /tftpboot directory.
4. Create a directory called /tftpboot/pxelinux.cfg on the TFTP server. In that directory, create an empty text file called default (with no extension) and add the following text to the file:
DEFAULT diagnostics
LABEL diagnostics
kernel memdisk
append initrd=diags.img
5. Consult the documentation for your DHCP server and configure the following scope options:
043 Vendor Specific Info: 01 04 00 00 00 00 ff
066 Boot Server Host Name: <Enter the IP address of your TFTP server>
067 Bootfile Name: pxelinux.0
You may also need to configure the following additional options:
013 Boot File Size: <Take the size in bytes of the pxelinux.0 file, divide by 512, and put the resulting number here>
060 ClassID: PXEClient
PXE booting should now be enabled. Try booting a client and selecting the boot time option to PXE boot (usually <F12>) and the system should boot to the diagnostic image.
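The configuration in steps 4 and 5 can also be generated programmatically. This sketch reproduces the pxelinux default file shown above and computes DHCP option 013; rounding the boot file size up to whole 512-byte blocks is our interpretation of the "divide by 512" instruction.

```python
import math

# Contents of /tftpboot/pxelinux.cfg/default, as given in step 4.
PXELINUX_DEFAULT = """DEFAULT diagnostics
LABEL diagnostics
kernel memdisk
append initrd=diags.img
"""


def boot_file_size_option(size_in_bytes):
    """DHCP option 013: size of pxelinux.0 expressed in 512-byte blocks.

    Rounding up is an assumption; the text only says to divide by 512.
    """
    return math.ceil(size_in_bytes / 512)


def write_default(path):
    """Write the pxelinux.cfg/default file used in step 4."""
    with open(path, "w") as f:
        f.write(PXELINUX_DEFAULT)
```

For example, a 13,148-byte pxelinux.0 would need option 013 set to 26 blocks.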
DDDP Media Options
The DDDP executable can be downloaded from support.dell.com. The executable file will also be on your Dell Server Installation and Management CD. DDDP offers the following advantages:
• Provides support for users without a CD drive, floppy drive, or hard drive through the use of USB flash drives/keys.
• Improves ease‐of‐use for downloading and running Dell diagnostics and MP Memory diagnostics.
• Reduces the number of downloadable images from six to two. Dell has one DDDP image for Windows and one image for Linux.
The USB flash drive/key can only have one bootable image on it at a time.
Boot Order: Flash Key
With the 9G servers, you can make the USB key bootable in two ways:
• BIOS Setup
• Booting using F11
BIOS Setup
Booting Using F11
Test Selection Menu
• The Dell Diagnostics procedures run in a special version of DOS called DRMK (Dell Real Mode Kernel).
• The DRMK boots into what becomes the c:\ drive for the server and expands the diagnostics into a RAM drive, which becomes drive d:\.
• From the DDDP menu, you can select:
o MP Memory Diagnostics menu (interactive)
o Dell Diagnostics GUI (interactive)
o Loop the MP Memory and Diagnostic tests (non‐interactive)
o Quit
Running Diagnostics
After the bootable media is created using DDDP, boot to the media to display the following menu-driven screen. Select the appropriate diagnostic from the list.
NOTE: The diagnostics can also be executed from the command line. Exit the menu, perform a directory listing to find the diagnostic name, and then execute it from the command line.
Overview of DDDP in Linux
DDDP supports Dell systems running the following Linux operating systems:
• Red Hat® Enterprise Linux version 3 (AS)
• Red Hat® Enterprise Linux version 4 (AS)
• SUSE™ LINUX Enterprise Server version 9
DDDP is a shell script that has a binary data file appended to the end of the script file. This single-file release format is executed using the bash shell.
DDDP in Linux
Linux Script Execution
• The DDDP script must be executed by root, because it directly accesses the CD-ROM, floppy, and USB key devices.
o You will need to chmod the file to make it executable.
• DDDP will prompt the user for the one type of media to be created.
o Default pathnames for the working and output filenames will be shown.
Creating Media: Linux
• DDDP will create bootable media and images that contain the following components:
o The DRMK boot image used as the execution platform.
o The DellDiags diagnostics programs used to test system components.
• DDDP will only write the boot image to the media if it can detect the media device type the user selected.
o DDDP will prompt the user for confirmation of device writes before any media writing is performed.
o DDDP will identify the error output and the program being executed in case of errors during the script execution.
Running DDDP in Linux
Launching the Test
• The DRMK environment starts.
• The virtual disk is created and files expanded into it.
MP Memory Tests
Multi‐Processor Memory Tests:
• Are considered by many to be the conclusive memory tester
• Test processor cache, but not memory below one (1) MB
Dell PowerEdge Diagnostics: An Overview
Dell PowerEdge Diagnostics is a suite of diagnostic programs, or test modules, that run locally on your system. When you start PowerEdge Diagnostics, the devices on your system are discovered. You select diagnostic tests to run from the Diagnostic Selection tree, which contains the hardware that PowerEdge Diagnostics discovers on your system. The PowerEdge Diagnostics graphical user interface (GUI) provides several options.
Tools to Test, View, and Save Results
You can select tests for various parts of a system and run them by clicking the Run Tests button in the Tests Selected tab. You can select to run the diagnostic tests in Normal or Quick Test mode.
Dell PowerEdge Diagnostics Features
Configuration Tab
The Configuration tab allows you to:
• View the information about the devices discovered in your system.
• View additional hardware information for a device, if available. (When additional hardware information is available, the information icon appears next to the device name in the Configuration tab.)
Status Tab
The Status tab allows you to:
• View the status of the running tests and abort or suspend diagnostic tests if the specific test supports this feature.
• Abort or suspend a test by highlighting the test in the Status tab and right‐clicking your mouse to display the Abort/Suspend/Abort All menu.
Results Tab
The Results tab allows you to:
• View the results for each individually selected test.
• View the entire result message for a particular test by highlighting the device and the test and double‐clicking the mouse. A dialog box displays the entire test result message.
• View the subtest results for any item that has a plus sign (+) next to the ID by double‐clicking on that item, as shown in the screen capture.
Save Results or Save Configuration Tab
You can save the results and configuration information collected by PowerEdge Diagnostics by selecting the appropriate tab from the File menu. You are prompted for a file name to use for archiving the information.
View Saved Results Tab
View the saved results by extracting the .html file from the .zip file into a directory that you create, then opening the .html file in a browser.
Dell 32‐Bit Hardware Diagnostics
The Dell 32‐Bit Hardware Diagnostics are used to isolate hardware failures. Dell 32‐Bit Hardware Diagnostics may be located on the utility partition, depending on how and when the server was installed. Dell 32‐Bit Hardware Diagnostics can also be downloaded and made into boot media. A system administrator can perform individual tests to isolate which field replaceable unit has failed. Testing options are:
• Single Devices • On‐board RAM Memory (known as ‘MpMemory’) • All Devices Using the Quick Test Option • All Devices Using the Extended Test Option
Troubleshooting and Diagnostics
During this course, several diagnostic tools have been discussed. The following matrix illustrates when and how these diagnostic tools for PowerEdge server systems should be used. This matrix represents a general guideline and is not meant to replace the troubleshooting guides found in Dell Solution Network (DSN).
Summary
• The LCD panel is the primary error indicator in 9G systems. Errors may appear before, during, and after POST.
• The LCD panel can display up to three error messages, based upon the priority of the message.
• The BMC senses and reports on temperature, voltage, and fan speeds.
• The NIC connector that belongs to the DRAC is indicated by the wrench icon.
• The BMC provides the following communication interfaces: host (in‐band, local), serial (out‐of‐band, remote), and LAN (out‐of‐band, remote).
• Dell Diagnostics Distribution Package allows users to: install to a USB flash drive, create a bootable CD, create a bootable diskette set, create a bootable HDD image file, and create a PXE boot image.
• DDDP stands for Dell Diagnostics Distribution Package. DDDP is a distribution method for Dell 32‐Bit Hardware Diagnostics.
Navigating Dell Information and Tools
Module Objectives
Dell Enterprise is committed to offering customers information and support through many channels. Our engineers and subject matter experts are dedicated to bringing you experience and guidance, technical "how to" tips, tools, best practices, and solutions.
Objectives
Upon completion of this section, you will be able to:
• Navigate through many of the support sites and tools designed and offered by Dell Enterprise.
• Recognize the benefits of these tools.
• Relate the relevancy of this information to your job.
Dell System E‐Support Tool (DSET)
The Dell System E‐Support Tool (DSET) provides the ability to collect hardware, storage, and operating system information from a Dell PowerEdge™ server. This information is consolidated into a single System Configuration Report that can be used for troubleshooting or inventory collection of a system. The browser user interface provides a convenient means to view specific data through hierarchical menu trees.
Access to DSET is through support.dell.com.
Support.Dell.com
This comprehensive support site offers a wide variety of technical support options. You may research your issues using Dell manuals and forums, as well as check on order status and download the drivers you need. The site offers comprehensive information on all Dell products and features links to many tech support, customer care, and chat options. Navigate to Dell's Support page at this link: support.dell.com/
1. Find the Troubleshooting and FAQs section and click the link. Follow these steps to use the troubleshooting wizard:
a. Choose Select a Model.
b. With Select your Product Model highlighted, select Servers, Storage, Networking.
c. With Servers, Storage, Networking highlighted, select PowerEdge Server.
d. With PowerEdge Server highlighted, select the model of your choice, then confirm your choice.
e. Spend 10 minutes using the wizard. Be sure to read some of the FAQ materials and enter search criteria.
2. Support.dell.com offers all customers options for signing up and managing Dell Technical Update information. The options include Dell Technical Updates and Management of Your Updates. The site requires that you sign up initially for an account after which updates concerning your equipment will automatically be e‐mailed to you.
Dell Solution Network
Another online tool is the Dell Solution Network (DSN) tool that you can use to solve technical issues. DSN includes Dell‐specific instructional guidance and advice about software and hardware troubleshooting. DSN has detailed Decision Trees (DTs) that help resolve any issue quickly and efficiently. For example, you can use the operating system DTs to solve operating system issues, which could prevent unnecessary reinstallation. Access to DSN is through support.dell.com.
• Click on Troubleshooting.
• Choose a model, enter a service tag, or choose from a list to access information relevant to your system.
• Use the Search function under DSN to find the information you need.
Appendix A: RAID/PERC Terms
Before this course goes into detail on RAID and PERC concepts, it is helpful to familiarize yourself with terms you will encounter. Working individually or as a class, review each term and its associated definition. Spend extra time on terms that warrant discussion.
Background Patrol Read ‐ Proactively helps prevent data loss in a redundant array by continuously monitoring the hard drives for sector errors. If an error is found, it proceeds to recover the data. As its name suggests, this feature is a background, or secondary, process that is designed to dynamically throttle its usage of the hard drives so as not to interfere with I/O performance.
Baseboard Management Controller ‐ Delivers a complete set of tools that are designed to help you maximize ongoing server performance.
Battery Learn Cycle ‐ Composed of a shallow battery discharge followed by a battery re‐charge. It is used to allow the battery gas gauge integrated circuit (IC) to recalibrate and accurately predict the battery's charge capacity when fully charged. Learn cycles are initiated by the firmware automatically, without user intervention.
Consistency Check ‐ A test performed to determine if the data has any internal conflicts.
Disk Data Format (DDF) ‐ Structure that allows a basic level of interoperability between different suppliers of RAID technology. The PERC 5 implementations conform to Dell's Profile 1 structure of this specification.
Disk Migration ‐ Physically moving virtual disks and hot spares from one controller to another without regard to sequence of drive installation. Firmware supports migration of configurations from one controller to another, regardless of whether the target controller has an existing configuration, without a controller restart, host reboot, or reconfiguration of existing virtual disk parameters and/or parameters of the virtual disk being migrated.
Disk Rebuilds ‐ When a physical disk in a RAID array fails, you can rebuild the drive by recreating the data that was stored on the drive before it failed. The controller uses hot spares to rebuild failed drives automatically and transparently, at user‐defined rebuild rates.
Disk Roaming ‐ Used to identify drives that have been moved to different slots on the SCSI backplane. The PERC associates a drive with the SCSI backplane slot to which the drive is connected during configuration. When it finds a drive in a different backplane slot, it re‐associates the drive's SCSI ID with the new slot.
Failover ‐ The capability to switch over automatically to a redundant or standby computer server, system, or network upon the failure or abnormal termination of the previously active server, system, or network. Failover happens without human intervention and generally without warning, unlike switchover.
Foreign Configuration ‐ A DDF configuration that is not part of the active configuration. This implies that not only a configuration moved from controller B to controller A is considered as a foreign configuration
110
on Controller A but also the configuration that was previously created on controller A that is not part of the current configuration on Controller A. Hot Spare (Global) ‐ Can be used to replace any failed drive in a redundant array as long as its capacity is equal to or larger than the coerced capacity of the failed drive. A global hot spare defined on any SAS/SATA II target should be available to replace a failed drive on both SAS/SATA II targets. Hot Spare (Dedicated) ‐ Dedicated hot spare can be used to replace a failed drive only in a selected disk group. A dedicated hot spare is used before one of the global hot spares is used. Hot spare drives can be located on any RAID SAS/SATA II target. Hot swapping ‐ The ability to remove and replace components of a machine, usually a computer, while it is operating. Once the appropriate software is installed on the computer, a user can plug and unplug the component without rebooting. Native Configuration ‐ A DDF configuration that is currently active on a controller. It will remain native until it is cleared from the active configuration set either by the user or by the system. RAID Level Migration ‐ PERC controllers support migrating from one RAID level to another. A RAID 1 or 5 can be migrated to a RAID 0, but you cannot migrate a RAID 0 to RAID 1 or 5. A RAID 0 cannot be migrated to RAID 1 or 5 because the size of one drive must be set aside for the storage of parity data. RAID Migration ‐ Porting an existing RAID array from one mass storage controller to another. Rebuild Checkpoint ‐ Resumes a rebuild on a physical disk in case of an abrupt power loss or if the server reboots in the middle of a rebuild operation. A rebuild of an array will not resume if that array undergoes reconstruction. Rebuild Rate ‐ The percentage of the compute cycles dedicated to rebuilding failed drives. A rebuild rate of 100 percent means the system gives priority to rebuilding the failed drives. 
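The RAID level migration rule above comes down to capacity arithmetic: a RAID 0 devotes every drive to data, so migrating it to RAID 1 or RAID 5 would have to reclaim space that is already in use. A back‐of‐the‐envelope sketch (drive counts and sizes are hypothetical):

```python
def usable_capacity(level, drives, drive_gb):
    """Usable capacity in GB for a few common RAID levels."""
    if level == 0:          # pure striping: every drive holds data
        return drives * drive_gb
    if level == 1:          # mirroring: half the raw space holds the copy
        return drives * drive_gb // 2
    if level == 5:          # striping with parity: one drive's worth is parity
        return (drives - 1) * drive_gb
    raise ValueError(level)

# Four hypothetical 73 GB drives:
print(usable_capacity(0, 4, 73))  # 292 GB -> all space holds data
print(usable_capacity(5, 4, 73))  # 219 GB -> one drive's worth reserved for parity
# A full 292 GB RAID 0 cannot shrink to 219 GB in place, which is why
# RAID 0 -> RAID 5 migration is not supported, while the reverse direction is.
```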
Spanned Drives ‐ Arranging arrays sequentially with an identical number of drives so that the drives in the different arrays are spanned. Spanned drives can be treated as one large drive. Data can be striped across multiple arrays as one virtual disk. The maximum number of spans is eight.
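As a worked example of spanning, a hypothetical RAID 10 virtual disk stripes data across several identical RAID 1 spans; its capacity is simply the per‐span usable capacity times the number of spans. The figures below are invented for illustration:

```python
def spanned_capacity(spans, drives_per_span, drive_gb, mirrored=True):
    """Capacity of a spanned virtual disk, e.g. RAID 10 = striping over RAID 1 spans."""
    assert spans <= 8, "at most eight spans"
    per_span = drives_per_span * drive_gb
    if mirrored:                      # each span is a RAID 1 set: half is the copy
        per_span //= 2
    return spans * per_span

# Four RAID 1 spans of two 146 GB drives, striped as one virtual disk:
print(spanned_capacity(4, 2, 146))   # 584 GB usable from eight drives
```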
Appendix B: Glossary

Background Patrol Read ‐ Proactively helps prevent data loss in a redundant array by continuously monitoring the hard drives for sector errors. If an error is found, it proceeds to recover the data. As its name suggests, this feature is a background, or secondary, process that is designed to dynamically throttle its usage of the hard drives so as not to interfere with I/O performance.
Backplane ‐ The interface between the hard drives and the controller (drives plug into this component). A backplane is a high‐speed communications circuit board that contains sockets with which devices and other electronic components can interface.
Backplane Power Connector ‐ Used to supply power to the backplane.
Baseboard Management Controller (BMC) ‐ A microcontroller present on all 9G PowerEdge servers, controlled by updatable firmware. The BMC monitors, controls, and reports system health, and delivers a complete set of tools designed to help you maximize ongoing server performance.
Battery Learn Cycle ‐ Composed of a shallow battery discharge followed by a battery re‐charge. It is used to allow the battery gas gauge integrated circuit (IC) to recalibrate and accurately predict the battery's charge capacity when fully charged. Learn cycles are initiated by the firmware automatically, without user intervention.
ChipKill Memory Technology ‐ A function of the system's memory controller that works with ECC memory.
Consistency Check ‐ A test performed to determine if the data has any internal conflicts.

DDDP ‐ A distribution mechanism for Dell diagnostics and MP Memory diagnostics.

Dell OpenManage Systems Management ‐ A suite of application programs for PowerEdge systems that allows you to manage your system with proactive monitoring, diagnosis, notification, and remote access.
Dell PowerEdge Diagnostics ‐ A suite of diagnostic programs, or test modules, that run locally on a system.
DIMM Sockets ‐ Sockets for system memory, sometimes located directly on the system board and other times on a memory riser card.
Disk Data Format (DDF) ‐ Structure that allows a basic level of interoperability between different suppliers of RAID technology. The PERC 5 implementations conform to Dell's Profile 1 structure of this specification.
Disk Drive I/O Connectors ‐ A disk drive I/O connector is an internal connector that attaches directly to components without the use of cables.
Disk Migration ‐ Physically moving virtual disks and hot spares from one controller to another without regard to the sequence of drive installation. Firmware supports migration of configurations from one controller to another, regardless of whether the target controller has an existing configuration, without a controller restart, host reboot, or reconfiguration of existing virtual disk parameters and/or parameters of the virtual disk being migrated.
Disk Rebuilds ‐ When a physical disk in a RAID array fails, you can rebuild the drive by recreating the data that was stored on the drive before it failed. The controller uses hot spares to rebuild failed drives automatically and transparently, at user‐defined rebuild rates.

Disk Roaming ‐ Used to identify drives that have been moved to different slots on the SCSI backplane. The PERC associates a drive with the SCSI backplane slot to which the drive is connected during configuration. When it finds a drive in a different backplane slot, it re‐associates the drive's SCSI ID with the new slot.
Double Data Rate (DDR) ‐ A type of SDRAM that sends data on both the rising and falling edges of the clock cycle. This effectively doubles the rate at which data can be read compared with a standard SDRAM module.
Double Data Rate 2 (DDR 2) ‐ DDR 2 is the latest generational evolution of the SDRAM module set. It improves on the DDR family with an increased number of buffers, a faster pre‐fetch rate, improved packaging, and reduced electrical demands.
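Because DDR transfers data on both clock edges, a module's peak bandwidth is the bus clock times two transfers per cycle times the 8‐byte (64‐bit) bus width. A quick arithmetic sketch:

```python
def peak_bandwidth_mb_s(bus_clock_mhz, transfers_per_cycle=2, bus_bytes=8):
    """Peak DIMM bandwidth in MB/s: clock x transfers per cycle x 64-bit bus width."""
    return bus_clock_mhz * transfers_per_cycle * bus_bytes

# DDR-400 runs its bus at 200 MHz but transfers twice per cycle:
print(peak_bandwidth_mb_s(200))                         # 3200 MB/s, hence "PC-3200"
# The same 200 MHz clock with one transfer per cycle (plain SDRAM) moves half as much:
print(peak_bandwidth_mb_s(200, transfers_per_cycle=1))  # 1600 MB/s
```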
DRAC 5 ‐ A hardware and software systems management solution.

DRAC‐5 Daughter Card Connectors ‐ Used to connect the Dell Remote Access Card 5 (DRAC‐5) to the system board. The DRAC‐5 enables IT personnel to manage the system remotely.
Dual Inline Memory Module (DIMM) ‐ A type of memory that uses a 64‐bit bus to transfer data.

Dynamic Random Access Memory (DRAM) ‐ Stores each bit of data in a separate capacitor. This type of memory needs to be refreshed in order to stay current because the capacitors leak electrons, hence the name "dynamic."
ENET ‐ A dedicated Ethernet port for DRAC‐5.

Error Checking and Correcting (ECC) ‐ Checks and corrects data in real time (on the fly).
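To make "checks and corrects on the fly" concrete, the sketch below implements a classic Hamming(7,4) code, which can locate and flip a single corrupted bit. This illustrates the principle only; server ECC DIMMs use wider codes, and the function names here are invented:

```python
def hamming74_encode(d):
    """Encode 4 data bits into 7 by placing parity at positions 1, 2, 4 (1-indexed)."""
    c = [0] * 8                      # index 0 unused; positions 1..7 hold the codeword
    c[3], c[5], c[6], c[7] = d       # data bits go at positions 3, 5, 6, 7
    c[1] = c[3] ^ c[5] ^ c[7]
    c[2] = c[3] ^ c[6] ^ c[7]
    c[4] = c[5] ^ c[6] ^ c[7]
    return c[1:]

def hamming74_correct(bits):
    """XOR the positions of all set bits; a nonzero syndrome is the error's position."""
    syndrome = 0
    for pos, bit in enumerate(bits, start=1):
        if bit:
            syndrome ^= pos
    if syndrome:                     # single-bit error detected: flip it back
        bits[syndrome - 1] ^= 1
    return bits

word = hamming74_encode([1, 0, 1, 1])
damaged = list(word)
damaged[2] ^= 1                      # one bit flipped in flight
assert hamming74_correct(damaged) == word   # located and corrected
```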
Failover ‐ The capability to switch over automatically to a redundant or standby computer server, system, or network upon the failure or abnormal termination of the previously active server, system, or network. Failover happens without human intervention and generally without warning, unlike switchover.

Fault Tolerance ‐ Assures network availability. If one controller fails, the server will remain available to the network by using another controller.
Foreign Configuration ‐ A DDF configuration that is not part of the active configuration. This means that not only is a configuration moved from controller B to controller A considered foreign on controller A, but so is a configuration previously created on controller A that is no longer part of controller A's current configuration.
Fully Buffered DIMM (FB‐DIMM) ‐ FB‐DIMM combines the high‐speed internal architecture of DDR 2 memory with a brand‐new point‐to‐point serial memory interface, which links each FB‐DIMM module together in a chain.
GB ENET ‐ The Gigabit Ethernet port is used to connect the server to the network via high‐speed Ethernet.

Gigabit Ethernet Connector ‐ Port for the network interface controller (NIC).
Hot Spare (Dedicated) ‐ A dedicated hot spare can be used to replace a failed drive only in a selected disk group. A dedicated hot spare is used before one of the global hot spares is used. Hot spare drives can be located on any RAID SAS/SATA II target.

Hot Spare (Global) ‐ Can be used to replace any failed drive in a redundant array as long as its capacity is equal to or larger than the coerced capacity of the failed drive. A global hot spare defined on any SAS/SATA II target should be available to replace a failed drive on both SAS/SATA II targets.

Hot Swapping ‐ The ability to remove and replace components of a machine, usually a computer, while it is operating. Once the appropriate software is installed on the computer, a user can plug and unplug the component without rebooting.
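The hot spare rules above (dedicated spares are tried before global ones, and a spare must be at least as large as the failed drive's coerced capacity) can be sketched as a simple selection routine; the drive records and field names are invented for illustration:

```python
def pick_hot_spare(failed, dedicated, global_spares):
    """Pick a replacement drive: dedicated spares assigned to the failed drive's
    disk group are tried before global spares, and a candidate qualifies only if
    it is at least as large as the failed drive's coerced capacity."""
    candidates = [s for s in dedicated if s["group"] == failed["group"]]
    candidates += global_spares
    for spare in candidates:
        if spare["gb"] >= failed["gb"]:
            return spare["name"]
    return None                      # no qualifying spare: array stays degraded

failed = {"name": "slot3", "group": "DG0", "gb": 73}
dedicated = [{"name": "slot5", "group": "DG0", "gb": 73}]
global_spares = [{"name": "slot7", "group": None, "gb": 146}]
print(pick_hot_spare(failed, dedicated, global_spares))  # slot5: dedicated wins
```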
Hot‐Swappable Memory ‐ Works with memory mirroring (not with spare bank) and enables the user to replace failed memory modules while the system is still powered on.

HVAC Cooling/Rack Advisor ‐ Advisory services for setting up a rack system properly with the right amount of clean airflow to keep the server running optimally.
Intelligent Platform Management Interface (IPMI) ‐ Designed by Intel in 1998 to push forward manageability in server systems. It is a command language interface to the Baseboard Management Controller that can be used to manage controlled devices.
Keyboard/Video/Mouse (KVM) Switches ‐ Used to provide a single point of entry (monitor, mouse, and keyboard) for servers within a rack. They can be analog, digital, or even over‐IP, which allows direct access through an Internet browser.
LCD Panel ‐ The primary error indicator in 9G systems.

Load‐Balancing ‐ Allows multiple controllers to share large data loads, preventing one controller from being overwhelmed.

Memory Mirroring ‐ Enables data to be mirrored identically across two memory banks. This mirroring allows the server to remain functioning if the primary memory bank fails.
Native Configuration ‐ A DDF configuration that is currently active on a controller. It will remain native until it is cleared from the active configuration set, either by the user or by the system.
Networking ‐ The sharing of devices or information.

Operating System ‐ The core program running on a computer. It is responsible for providing a platform for other applications to run on.

OSI Model ‐ Open System Interconnection. Developed by the International Standards Organization (ISO) in 1984 to provide a reference model for the complex aspects of network communication.
PCI Riser Connector ‐ Used to connect the peripheral component interconnect (PCI) riser.

PDU (Power Distribution Unit) ‐ A device that distributes and organizes the power used in a Dell rack.

PERC ‐ PowerEdge Expandable RAID Controller: Dell's line of RAID controllers.

PHY ‐ A transceiver (one transmit and one receive).

Primary CPU ‐ The primary processor in the system.

Processor ‐ The processor, also known as the CPU, is the primary processing chip within your system.

PSPB/PSDB ‐ Power Supply Paralleling Board or Power Supply Distribution Board: distributes power from multiple power supplies.

RACADM ‐ The command‐line interface configuration utility for DRAC 5.

Racks ‐ Enclosures for PowerEdge rack‐optimized servers.

RAID Controller ‐ Redundant Array of Independent Disks (RAID). A disk subsystem that employs two or more drives in combination for fault tolerance and performance.
RAID Level Migration ‐ PERC controllers support migrating from one RAID level to another. A RAID 1 or RAID 5 can be migrated to a RAID 0, but you cannot migrate a RAID 0 to RAID 1 or RAID 5, because the size of one drive must be set aside for the storage of parity data.

RAID Migration ‐ Porting an existing RAID array from one mass storage controller to another.

Rails ‐ Provide easy access to rack‐mounted server components without the need to dismount the device from the rack.

Rebuild Checkpoint ‐ Resumes a rebuild on a physical disk in case of an abrupt power loss or if the server reboots in the middle of a rebuild operation. A rebuild of an array will not resume if that array undergoes reconstruction.

Rebuild Rate ‐ The percentage of the compute cycles dedicated to rebuilding failed drives. A rebuild rate of 100 percent means the system gives priority to rebuilding the failed drives.

SAS ‐ Serial Attached SCSI. Hard drives that deliver the next generation of SCSI performance and reliability for critical business applications.

SAS Controller ‐ A single SAS controller supports both native SAS drives and SATA II drives.

SAS Domain ‐ A simple SAS domain contains SAS devices and one or more expander devices.
SATA ‐ Serial Advanced Technology Attachment. A generational upgrade of the parallel ATA or Integrated Drive Electronics (IDE) interface. The cables used with SATA‐enabled devices are much smaller and allow for smaller chassis designs due to their improved cooling efficiency.
SCSI (Small Computer System Interface) ‐ An industry‐standard interface for providing high‐speed access to peripheral devices.

Secondary CPU ‐ The secondary processor in the system.

Serial Advanced Technology Attachment (SATA) Connector ‐ Used to connect internal Serial ATA devices.

Serial Attached SCSI (SAS) Connector ‐ Used to connect SAS devices.
Serial Port ‐ Used to connect serial devices such as a mouse.

Server ‐ A computer designed with the capabilities to survive a given level of hardware component failure.

Side‐Plane Connector ‐ Used for connecting the small computer systems interface (SCSI) side‐plane to the system board.
Single In‐line Memory Module (SIMM) ‐ A type of memory that uses a 32‐bit bus to transfer data.

Small Computer Systems Interface (SCSI) Connector ‐ Used to connect internal SCSI devices.
SNMP Trap ‐ Generated by management software or the firmware in a management device when a message needs to be delivered somewhere.

Spanned Drives ‐ Arranging arrays sequentially with an identical number of drives so that the drives in the different arrays are spanned. Spanned drives can be treated as one large drive. Data can be striped across multiple arrays as one virtual disk. The maximum number of spans is eight.

Spare Bank ‐ Allows a backup memory module to be used when a primary memory module fails. This backup allows the server to continue functioning until the failed primary module is replaced.

Storage Subsystem ‐ A collection of components that allow for the storage of data.

Switches ‐ Dimensionally correct switches can be used to provide a localized switching option for servers within your rack.

Synchronous DRAM (SDRAM) ‐ A specialized DRAM chip that uses an internal clock coordinated with the system processor to synchronize the input and output of data. The speed of the SDRAM chip is therefore limited by the speed of the processor. A faster set of processors means that a faster SDRAM chipset can be used.
System Board ‐ The Dell PowerEdge system board, often referred to as the "motherboard," is the foundation of all Dell PowerEdge servers. All major components are either integrated into the system board, or sockets are provided for components to attach through various methods.

System Board Power Connectors ‐ Used to supply power to the system board.
Systems Management ‐ Involves using tools (hardware and software) to perform mundane, simple, or complex tasks.

TOE ‐ TCP/IP Offload Engine. A technology in high‐speed Ethernet systems that optimizes throughput.
UPS (Uninterruptible Power Supply) ‐ Used to power a system in the event of a power failure.
USB (Universal Serial Bus) Connector ‐ Used to connect various peripheral devices.

VGA (Video Graphics Array) Connector ‐ Used to attach a monitor.