hp nmi watchdog


Upload: sprdd

Post on 10-May-2015


Page 1: HP NMI WATCHDOG

Technical white paper

Performing an HP ProLiant server NMI crash dump

Table of contents

Introduction
NMI crash dump overview
Initiating NMI crash dumps
    NMI crash jumper pins and dump switches
    ROM-Based NMI Debug button
NMI crash dump compliant operating systems
    Microsoft Windows
    VMware
    Linux
iLO Virtual NMI Button
Resources


Technical white paper | ProLiant diagnostic tools

Introduction

This document describes the implementation of non-maskable interrupt (NMI)-based crash dump capabilities in HP ProLiant servers, including ProLiant Gen8 servers. The ability to perform an NMI-based crash dump can be beneficial to system administrators in their root cause failure analysis.

An NMI crash dump allows you to obtain critical diagnostic information in the event of system failures. We present both user-initiated and automatic crash dump methods.

NMI crash dump overview

The NMI crash dump is a diagnostic mechanism that allows the creation of crash dump files in situations when a system is unresponsive and traditional debugging mechanisms are unsuccessful.

Crash dump analysis is an essential diagnostic tool for addressing reliability problems in operating systems, device drivers, and applications. Many crashes will freeze a system in such a way that your only recourse is to do a hard reset (cycling power on the system). Since resetting the system erases any information supporting an analysis of the problem, the system must execute a memory dump before you perform a hard reset. A hardware jumper, dump switch, or virtual NMI button along with supported operating systems provide this function.

Figure 1 shows the course of events that occur when you force the operating system to invoke the NMI handler, generate a crash dump log, and then use that log to diagnose software failures. The crash dump log can provide critical information for root-cause analysis that may be difficult or impossible to obtain through other means. You initiate an NMI event by shorting the jumper pins, by pressing the dump switch, or through the HP iLO Virtual NMI Button feature. The NMI can allow a frozen system to become responsive enough to generate a crash dump log.

Figure 1.

Warning: Using the NMI crash jumper pins or dump switch on a functioning system (under any operating system) will cause an abrupt halt. Never use the NMI crash dump during normal operation.

The jumper pins and dump switch operate even if the appropriate driver is not loaded. If present, the driver disables the Automatic Server Recovery (ASR) feature so that the server does not reboot when a debug session is in progress.


The NMI crash dump jumper pins or dump switch may not work in all situations: after another NMI has already occurred in the system, when the OS crash handler is incapable of running properly, and following some hardware failures. Table 1 highlights ProLiant server NMI crash dump capabilities and benefits.

Table 1.

ProLiant hardware

ProLiant NMI crash dump compatibility: Newer ProLiant servers only include NMI jumpers, not dump switches. Consult the product documentation for your server. ProLiant server blades do not include physical NMI debug jumper pins or dump switches. You can only use iLO-based Virtual NMI functions. See the iLO Virtual NMI Button section later in this paper.

NMI benefits for ProLiant servers: Jumper pins, a dump switch, or a virtual NMI function cause a ProLiant server to initiate an NMI (PCI SERR) event and create a crash dump file.

ProLiant software

ProLiant NMI crash dump compatibility: Beginning with ProLiant Gen8 servers, an HP NMI Sourcing driver is not required for any operating system. The system ROM logs the NMI event. Older servers require the appropriate HP NMI Sourcing driver to create a crash dump file.

NMI benefits for ProLiant servers: The NMI crash dump function is dependent upon the ProLiant server Generation, whether the ProLiant is a rack or blade server, and on the appropriate driver being installed when necessary. All drivers are distributed with Service Pack for ProLiant (SPP) hp.com/go/spp. The SPP detects and installs the appropriate driver for the server automatically.

Note: The NMI jumper pins or dump switch will cause an NMI upon activation. This feature does not require any software to generate the NMI. An NMI event by itself will not create a crash dump log.
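The rules in Table 1 can be restated as a small decision helper. The following is only an illustrative sketch: the `Server` type and both function names are hypothetical, it assumes iLO is present on the server, and whether a given rack server has jumper pins or a dump switch still has to be confirmed from its hood label or user guide.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Server:
    """Hypothetical description of a ProLiant server (not an HP data model)."""
    generation: int   # ProLiant generation number, e.g. 7 for G7, 8 for Gen8
    is_blade: bool    # True for ProLiant server blades


def nmi_trigger_methods(server: Server) -> List[str]:
    """List the ways to raise an NMI, following Table 1 (assumes iLO is present)."""
    methods = ["iLO Virtual NMI button"]
    if not server.is_blade:
        # Non-blade servers have a physical control; the exact type depends on the model.
        methods.insert(0, "NMI jumper pins or dump switch")
    return methods


def needs_sourcing_driver(server: Server) -> bool:
    """Pre-Gen8 servers need the HP NMI Sourcing driver to create a crash dump file."""
    return server.generation < 8
```

For example, `nmi_trigger_methods(Server(generation=8, is_blade=True))` yields only the iLO option, and `needs_sourcing_driver` returns `False` for the same server.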

Initiating NMI crash dumps

You can initiate an NMI event through the jumper pins or dump switch provided on the ProLiant server, or remotely through the Virtual NMI button in iLO (see the “iLO Virtual NMI Button” section).

NMI crash jumper pins and dump switches

The NMI crash dump jumper pins or dump switch generate a PCI SERR under all operating systems.

Figures 2 and 3 are examples of jumper pins and dump switches found on ProLiant servers. For exact placement, refer to the illustration on the hood label of the server or in the user guide.

Figure 2. Figure 3.


Note: Newer ProLiant servers only include NMI jumpers, not dump switches. Consult the product documentation for your server.

ProLiant server blades do not include physical NMI debug jumper pins or dump switches. You can only use iLO-based Virtual NMI functions. See the iLO Virtual NMI Button section later in this paper.

ROM-Based NMI Debug button

The NMI Debug Button option is a toggle setting (Figure 4) that allows you to enable debug functionality when the system has experienced a software lock-up. The NMI Debug Button generates an NMI to enable the use of the operating system debugger. The NMI Debug Button is enabled by default.

Options include:

• Enabled (default)

• Disabled

Figure 4.

NMI crash dump compliant operating systems

The operating systems discussed here give you the ability to initiate crash memory dumps.

Note: Beginning with ProLiant Gen8 servers, an HP NMI sourcing driver is not required for any operating system. The system ROM logs the NMI event.


Microsoft Windows

You can find the latest guidelines for generating an NMI crash dump file or a kernel crash dump file on a Windows-based system in this Microsoft support article: support.microsoft.com/kb/927069

The article also contains a current list of applicable Microsoft operating systems. If your ProLiant server requires an HP sourcing driver (a ProLiant pre-Gen8 server), you will need to use the SPP software appropriate for your operating system to supply that driver.

Use either the SPP version that shipped with your ProLiant server or a later version from hp.com. You can download the latest version of SPP from: hp.com/go/spp_download

Note: For optimal functionality, we recommend using SPP to obtain the appropriate HP NMI sourcing driver for servers older than ProLiant Gen8. This is an optional step since the HP NMI Sourcing driver is not required with compliant Microsoft operating systems.

Warning: Before making changes in the Registry, HP recommends that you make a copy of the system settings. This will allow you to restore the system settings if there are errors.
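As an illustration of the Registry change involved, the sketch below builds the text of a .reg file that sets the `NMICrashDump` value under the `CrashControl` key, which is the setting described in the Microsoft article cited above. The function name is hypothetical, and the key and value should be checked against the current article for your Windows version before anything is imported. The sketch only produces text for review; it does not modify the Registry.

```python
# Key and value name as described in Microsoft KB 927069 (verify for your Windows version).
CRASH_CONTROL_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl"


def nmi_crash_dump_reg(enable: bool = True) -> str:
    """Return .reg file text that turns the NMICrashDump setting on or off."""
    value = 1 if enable else 0
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{CRASH_CONTROL_KEY}]",
        f'"NMICrashDump"=dword:{value:08x}',   # REG_DWORD, written as 8 hex digits
        "",
    ]
    # .reg files conventionally use Windows line endings.
    return "\r\n".join(lines)
```

Writing the returned text to a file and reviewing it before import keeps the change auditable, and the backup recommended in the warning above still applies.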

VMware

VMware 5 is compliant with the HP NMI Sourcing driver, but only on ProLiant pre-Gen8 servers. On ProLiant Gen8 servers, even those without the HP NMI driver, VMware will halt the VMkernel with a purple diagnostic screen (panic) when an NMI occurs. The sole purpose of the HP NMI Sourcing driver is to tell the OS to panic and log the NMI event to the HP Integrated Management Log (IML).

For more information about managing a VMware NMI event, see the VMware Knowledge Base. You can access the VMware Knowledge Base at either of the following locations:

• kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1014767

• kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&docTypeID=DT_KB_1_1&externalId=2002955

You can download the latest HP NMI sourcing driver for VMware at hp.com/products/servers/software/vmware-esxi/driver_version.html.

Linux

Linux uses the kdump facility to create a crash dump when an NMI event occurs. Most Linux kernels are configured to be "kdump-ready". You can typically find a description of the configuration in the Linux kernel source tree file “Documentation/kdump/kdump.txt” (in recent kernel trees, “Documentation/admin-guide/kdump/kdump.rst”).

This is generic information regarding Linux and NMI-related crash dumps. You should look for specific information relating to your version of Linux.
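A quick way to see whether a Linux host is likely to produce a dump on an NMI is to inspect a few kernel interfaces. The sketch below is a minimal check, not an HP or distribution tool: the function names are hypothetical, and it assumes the standard procfs and sysfs locations for the `crashkernel=` boot parameter, the loaded crash kernel flag, and the `unknown_nmi_panic` and `panic_on_unrecovered_nmi` sysctls. Which of the two sysctls matters depends on how the platform reports the NMI, so treat the result as a hint only.

```python
from pathlib import Path
from typing import Dict, Optional


def read_flag(path: Path) -> Optional[int]:
    """Return the integer stored in a procfs/sysfs file, or None if unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def kdump_nmi_readiness(root: Path = Path("/")) -> Dict[str, bool]:
    """Report kdump and NMI panic settings found under the given filesystem root."""
    try:
        cmdline = (root / "proc/cmdline").read_text()
    except OSError:
        cmdline = ""
    return {
        # Memory must be reserved for the capture kernel at boot.
        "crashkernel_reserved": "crashkernel=" in cmdline,
        # 1 when a capture kernel has been loaded with kexec.
        "crash_kernel_loaded": read_flag(root / "sys/kernel/kexec_crash_loaded") == 1,
        # Sysctls that make the kernel panic (and so start kdump) on an NMI.
        "unknown_nmi_panic": read_flag(root / "proc/sys/kernel/unknown_nmi_panic") == 1,
        "panic_on_unrecovered_nmi": read_flag(root / "proc/sys/kernel/panic_on_unrecovered_nmi") == 1,
    }
```

Calling `kdump_nmi_readiness()` with no arguments inspects the running system; the `root` parameter exists only so the check can be pointed at a copied or mock directory tree.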


Table 2 provides an overview of compliant operating systems and benefits.

Table 2.

Compliant Microsoft Windows operating systems

NMI crash dump compliance: Registry changes are required to generate a crash dump when the NMI dump switch is used. No special installation requirements are needed.

NMI benefits: Allows user level settings for a crash dump file generation.

VMware

NMI crash dump compliance: VMware 5 is compliant with the HP NMI sourcing driver, but only on ProLiant pre-Gen8 servers. No drivers are required on ProLiant Gen8 servers.

NMI benefits: VMware is compatible with NMI crash dump for ProLiant Gen8 and earlier servers.

Linux

NMI crash dump compliance: Linux uses the kdump facility.

NMI benefits: Linux is compatible with ProLiant server NMI crash dumps.

iLO Virtual NMI Button

ProLiant servers with iLO can initiate an NMI crash dump through a web browser. The iLO-based Virtual NMI button allows users to trigger an NMI without requiring physical access to the server chassis or knowing the precise location of the NMI control for the host. Access to this control is restricted to users with the “iLO Virtual Power & Reset” privilege. The same NMI crash dump conditions and restrictions apply when using iLO.

To generate an NMI using iLO, you must:

1. Log in to the iLO processor of the target server using an account with the Virtual Power & Reset privilege.

2. Navigate to the iLO Diagnostics screen as shown in Figure 5.

3. Click the Generate NMI to System button.

Figure 5.
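This paper describes only the browser-based procedure. As a hedged illustration of how the same action might be scripted, the sketch below builds (but does not send) a Redfish `ComputerSystem.Reset` request with `ResetType` set to `Nmi`. It assumes iLO firmware that exposes the Redfish API and lists `Nmi` among the allowable reset types, which this paper does not cover. The function name, host name, and system ID `1` are placeholders, authentication is omitted, and the action target should be read from the system resource rather than taken from this sketch.

```python
import json
import urllib.request


def build_nmi_request(ilo_host: str, system_id: str = "1") -> urllib.request.Request:
    """Build a Redfish reset request asking the management processor to raise an NMI.

    Assumption: the firmware supports Redfish and accepts ResetType "Nmi".
    The request is only constructed here; nothing is sent.
    """
    url = (
        f"https://{ilo_host}/redfish/v1/Systems/{system_id}"
        "/Actions/ComputerSystem.Reset"
    )
    body = json.dumps({"ResetType": "Nmi"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

The account used for such a request would need the same Virtual Power & Reset privilege as the browser procedure, and the warning against raising an NMI on a functioning system applies equally.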


© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.

Microsoft and Windows are U.S. registered trademarks of Microsoft Corporation.

4AA4-7853ENW, July 2013

Resources

HP ROM-Based Setup Utility User Guide hp.com/bc/docs/support/SupportManual/c00191707/c00191707.pdf

iLO product information and user guide hp.com/go/ilo

ProLiant server information hp.com/go/proliant

Industry Standard Server Technology Papers hp.com/servers/technology