intel(r) cluster checker · 2013. 10. 22. · step 3 run intel® cluster checker you can now run...

Intel® Cluster Checker 1.8 User's Guide


Contents

About This Document
Getting Started With Intel® Cluster Checker
1. Configuring Intel® Cluster Checker
    1.1. Defining the Nodes to Check
    1.2. Defining Intel® Cluster Checker Configuration
        1.2.1. List of Nodes
        1.2.2. Altering the Runtime Behavior of the Tool
        1.2.3. Selecting Test Modules
        1.2.4. Configuring Test Modules
        1.2.5. Using Multiple Configuration Files
    1.3. License File Path Configuration
    1.4. Updating Old Configuration Files
2. Running Intel® Cluster Checker
    2.1. Verifying Cluster Correctness
        2.1.1. Console Output
        2.1.2. Log Files
        2.1.3. Additional Output
        2.1.4. Command Line Options
        2.1.5. Environment Variables
    2.2. Gathering Cluster Information
        2.2.1. Command Line Options
3. User-Defined Checking
    3.1. Correctness Checking
    3.2. Uniformity Checking
4. Copy Exactly
5. Intel® Cluster Checker Test Modules
    5.1. Compliance Test Modules
    5.2. SDK Compliance Test Modules
    5.3. Default Test Modules
    5.4. Optional Test Modules
6. Performance Test Modules
    6.1. Single-Node Benchmarks
    6.2. Pair-Wise Benchmarks
    6.3. Cluster-Wide Benchmarks
7. Heterogeneous Clusters
    7.1. Nominal Hardware Variation
    7.2. Sub-Clusters
    7.3. Fat Nodes
8. Automatic Configuration
    8.1. Overview
    8.2. Command Line Options
        8.2.1. Automatic Configuration Options
    8.3. Console Output and Logs
    8.4. Cluster Nodes Automatic Discovery
        8.4.1. Configuration Options
    8.5. Performance Thresholds Automatic Configuration
        8.5.1. Hardware Scanning
        8.5.2. Additional Output
        8.5.3. Benchmarking and Performance Disclaimers
    8.6. Automatic Configuration Advanced Usage
        8.6.1. Group Configuration Alternatives
        8.6.2. Heterogeneous Hardware Support
        8.6.3. Single Node Performance
9. Third Party Copyright Notices


Disclaimer and Legal Information

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

Intel products are not intended for use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear facility applications.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or by visiting Intel's Web Site.

The Intel® Cluster Checker tool is intended to be used by registered Intel® Cluster Ready partners only. The Intel® Cluster Checker tool executes only in systems which run over Intel® processors.

Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Intel recommends that you evaluate other related products to determine which best meets your requirements.

* Other names and brands may be claimed as the property of others

Copyright © 2006-2011, Intel Corporation. All rights reserved.


About This Document

Intel® Cluster Checker verifies the configuration and performance of Linux-based clusters and checks compliance with the Intel® Cluster Ready Specification. This User's Guide provides step-by-step instructions for using Intel® Cluster Checker.

This guide is organized into the following sections:

• A Getting Started With Intel® Cluster Checker introduction covering the basic steps to run the tool.

• A complete guide with details on how to configure, execute, and tune Intel® Cluster Checker for specific clusters, divided into the following chapters:

Chapter 1 describes how to configure Intel® Cluster Checker for a specific cluster.
Chapter 2 describes how to execute Intel® Cluster Checker.
Chapter 3 describes how to run custom checks without creating new test modules.
Chapter 4 describes the Copy Exactly feature.
Chapter 5 lists all the test modules included with Intel® Cluster Checker.
Chapter 6 describes the performance benchmarks included in Intel® Cluster Checker.
Chapter 7 describes how to configure Intel® Cluster Checker to recognize and verify heterogeneous clusters.
Chapter 8 describes the automatic configuration feature.
Chapter 9 contains copyright notices for the third-party tools that are distributed with Intel® Cluster Checker.

Other documents included with Intel® Cluster Checker distribution

• Intel® Cluster Checker Developer's Guide: Information on how to create new test modules using the Intel® Cluster Checker plug-in architecture.

• Intel® Cluster Checker Test Module Reference Guide: Detailed information about each test module.

Further information and support can be found online at http://www.intel.com/go/cluster.


Getting Started With Intel® Cluster Checker

Overview

Intel® Cluster Checker runs as a sequence of individual tests called test modules. Each test module conducts a specific test on a single cluster node or on the whole cluster, depending on its type. You can find information about what each test module checks and how to configure it in the separate Test Module Reference Guide.

Simply put, using Intel® Cluster Checker involves the following steps (see footnote 1):

0. Setting up the run-time environment.
1. Creating a nodes file, which contains the list of cluster nodes.
2. Creating a configuration file.
3. Executing Intel® Cluster Checker, passing your configuration file as a parameter.
4. Analyzing Intel® Cluster Checker's output.
5. Starting over, until all test modules produce the desired results.

This tutorial guides you through the process of running Intel® Cluster Checker to get you started.

Step 0 Set up the run-time environment

To get started, set up the run-time environment for Intel® Cluster Checker by sourcing the initialization script clckvars.sh, located in the Intel® Cluster Checker install path.

$ source <install-path>/clckvars.sh

The initialization script:
• Adds cluster-check to the execution path.
• Enables automatic completion of command line options (by pressing TAB).
• Enables the tool's man pages with the man utility. To see the general man page, execute:

$ man cluster-check

1 For installation help and information on run-time prerequisites, please read the Release Notes for your version of Intel® Cluster Checker.


To open the man page of a test module, use the following syntax: clck-<test_module_name>. For example, to open the hardware_uniformity man page:

$ man clck-hardware_uniformity

Step 1 Create a nodes file

Intel® Cluster Checker must know which nodes to process when running test modules. A nodes file is a list of the hostnames or IP addresses of your cluster's nodes, one name per line. As a working example, consider a computer cluster called Cluster, made up of four nodes named node1, node2, node3 and node4. In our example, the nodes file looks like this:

# /home/icr/nodesfile: Cluster nodes to process
node1
node2
node3
node4

Listing 1. A simple node definition file. Note the use of '#' to include comments in nodes files. See Defining the Nodes to Check for more information.

Step 2 Create a configuration file

Intel® Cluster Checker's behavior can be customized with configuration files. A configuration file is a validated XML file that holds values that model Intel® Cluster Checker's run-time behavior. The simplest configuration file consists only of the <nodefile> element, which points to an Intel® Cluster Checker nodes file. Following our example:

<!-- My configuration file: /home/icr/myconfig.xml -->
<cluster>
    <nodefile>/home/icr/nodesfile</nodefile>
</cluster>

Listing 2. A simple configuration file.
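The two listings can also be generated programmatically, which may be convenient when the node list comes from another tool. The following Python sketch is only an illustration: the function names nodes_file_text and build_minimal_config are hypothetical helpers, not part of Intel® Cluster Checker, and the output omits the XML comment shown in Listing 2.

```python
import xml.etree.ElementTree as ET


def nodes_file_text(nodes, header=None):
    """Return the text of a nodes file: one node name per line.

    An optional header is written as a '#' comment on the first line.
    """
    lines = []
    if header:
        lines.append("# " + header)
    lines.extend(nodes)
    return "\n".join(lines) + "\n"


def build_minimal_config(nodefile_path):
    """Return a minimal configuration document containing only <nodefile>."""
    cluster = ET.Element("cluster")
    ET.SubElement(cluster, "nodefile").text = nodefile_path
    return ET.tostring(cluster, encoding="unicode")


if __name__ == "__main__":
    print(nodes_file_text(["node1", "node2", "node3", "node4"],
                          header="/home/icr/nodesfile: Cluster nodes to process"))
    print(build_minimal_config("/home/icr/nodesfile"))
```

The generated strings would still need to be written to /home/icr/nodesfile and /home/icr/myconfig.xml before running the tool.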


The configuration file may also contain test module configurations, including a list of which test modules to run and their custom parameters. If no test module configuration is provided (as in this example) then Intel® Cluster Checker runs a predefined subset of test modules. However, bear in mind that test modules must be properly configured to provide meaningful results.

Step 3 Run Intel® Cluster Checker

You can now run Intel® Cluster Checker by calling the cluster-check binary, passing your configuration file as a parameter, for instance:

$ cluster-check myconfig.xml

When executed, Intel® Cluster Checker runs the pertinent test modules in sequence, providing feedback on the outcome of each test.

NOTE: If no configuration file is passed on the command line, cluster-check searches for a configuration file in the following locations:

1. <install_path>/etc/config.xml
2. /etc/intel/clck/config.xml

Note that this works as a fallback mechanism. The first file found is used.
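The lookup order described in the note can be sketched in Python as follows. This is an illustration of the documented order only, not the tool's implementation: the function name find_config, its parameters, and the injectable exists check are assumptions made for the example.

```python
import os.path
import posixpath


def find_config(cli_path, install_path, exists=os.path.isfile):
    """Return the configuration file that would be used, or None.

    cli_path     -- path given on the command line, or None if omitted
    install_path -- the tool's install path (hypothetical parameter)
    exists       -- predicate used to test for a file (injectable for testing)
    """
    # A file named on the command line always wins.
    if cli_path is not None:
        return cli_path
    # Otherwise fall back to the documented locations, in order.
    candidates = [
        posixpath.join(install_path, "etc", "config.xml"),
        "/etc/intel/clck/config.xml",
    ]
    for candidate in candidates:
        if exists(candidate):
            return candidate
    return None
```

For example, find_config(None, "<install_path>") returns the first of the two fallback files that exists, or None when neither is present.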

Step 4 Analyze Intel® Cluster Checker output

During and after execution, Intel® Cluster Checker provides reports on the results of each of the test modules executed. From these reports you can detect flaws or recognize opportunities to improve your cluster's operation.

Intel® Cluster Checker generates 2 output reports with information about its execution. The names of the log files are the same as the name of the configuration file that was used, plus a time-stamp and a specific suffix. Following our example, these would be the log files created from the execution of Intel® Cluster Checker:

File name                       Description

myconfig-20110304.085149.out    The .out log file contains the console output that
                                Intel® Cluster Checker prints during execution.

myconfig-20110304.085149.xml    The .xml log file is the output of Intel® Cluster
                                Checker's highest verbosity. Being an XML file, it
                                is suitable for parsing with other tools.
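When post-processing results with scripts, it can help to predict or match the log file names. The Python sketch below reproduces the naming pattern of the example above. It assumes that the time-stamp has the form YYYYMMDD.HHMMSS, which is inferred from the single example and not stated by this guide; the function name log_file_names is hypothetical.

```python
import os.path
from datetime import datetime


def log_file_names(config_path, when):
    """Return the expected [.out, .xml] log file names for a run.

    config_path -- path of the configuration file used for the run
    when        -- datetime of the run (assumed time-stamp source)
    """
    # Configuration file name without directory or extension, e.g. "myconfig".
    base = os.path.splitext(os.path.basename(config_path))[0]
    # Assumed time-stamp layout: YYYYMMDD.HHMMSS
    stamp = when.strftime("%Y%m%d.%H%M%S")
    return [f"{base}-{stamp}{suffix}" for suffix in (".out", ".xml")]


if __name__ == "__main__":
    for name in log_file_names("/home/icr/myconfig.xml",
                               datetime(2011, 3, 4, 8, 51, 49)):
        print(name)
```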

Step 5 Start over

After addressing the issues raised by the test module results, start over from step 2. Make changes to the nodes file, the configuration file or the cluster itself until the test modules produce no more warnings.

Further reading

Try the following:

$ man cluster-check
$ cluster-check --help

And see also:
• Configuring Intel® Cluster Checker for information on configuration options.
• Running Intel® Cluster Checker for information on customizing Intel® Cluster Checker's run-time behavior.


1. Configuring Intel® Cluster Checker

Intel® Cluster Checker is highly configurable. The default settings are appropriate in most cases, but may not be appropriate for all clusters. Consequently, the most valuable check is one that is optimized for your cluster.

1.1. Defining the Nodes to Check

Typically, clusters are composed of many individual nodes, where some nodes are used as computational resources, some are used to control the cluster, and some may be used for other purposes, such as storage servers. Intel® Cluster Checker recognizes 3 functional types of nodes: compute, head, and other. Some test modules are only executed to check nodes of a certain type, while others may behave differently depending on the node type.

The cluster nodes are defined in a text file. The most basic file only lists the node names, one per line. For example, the following file defines 4 compute nodes:

# list of nodes to check
node1
node2
node3  # fails intermittently
node4

The ‘#’ symbol has different uses in the nodes file. It may be used to introduce comments to the file by placing it at the beginning of a line or after the name of a node (as in above example). However, the ‘#’ may also be used for configuration options if it is followed by one of the keywords: ‘type:’ or ‘group:’

If the keyword '# type: head' appears in the comment text on the same line as a node, Intel® Cluster Checker considers the node to be a head node. Similarly, nodes of functional types 'compute' and 'other' may also be defined. By default, a node without an explicitly defined type is considered a compute node. For example, the following file defines 4 nodes: node1 is a head node, node2 and node3 are compute nodes, and node4 belongs to type other:

# list of nodes to check
node1  # type: head
node2
node3  # type: compute   fails intermittently
node4  # type: other

One node may have multiple functions. For example, on small clusters, it is not uncommon for one node to serve as both the head node and as a computational resource. The node type definition may be repeated to assign more than 1 type to a particular node. A comma-separated list may also be used to assign a node to more than 1 type. For example, the following nodelist file defines 4 nodes, node1 is a combined head and compute node, node2 and node3 are compute/other nodes, and node4 belongs to types head, compute, and other:

# list of nodes to check
node1  # type: head type: compute
node2  # type: compute, other
node3  # type: compute type: other fails intermittently
node4  # type: other, head type: compute

For backwards compatibility, a head node may also be designated using the bare word 'head' if it immediately follows the '#' character (see footnote 2):

node1  # head

Clusters are typically homogeneous, but nodes may differ in some known aspects. For example, some nodes may have more memory than others. Intel® Cluster Checker can be configured to recognize this kind of heterogeneity using the 'group' property. Group assignments are similar to types, except that the label is '# group:' and the string is arbitrary. For example, the following file defines 4 nodes: node1 is a head/compute node with extra memory (belongs to the bigmem group), node2 is a compute node with extra memory and higher frequency processors (belongs to bigmem and fastcpu), and node3 and node4 are compute nodes with the standard hardware:

# list of nodes to check
node1  # type: compute, head group: bigmem
node2  # group: bigmem group: fastcpu
node3  # fails intermittently
node4

Assigning a node to a group is necessary but not sufficient for Intel® Cluster Checker to recognize heterogeneous nodes. The group name must also be used in the XML configuration file; for instructions, see 1.2.4.
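To make the nodes file rules in this section concrete, the following Python sketch parses a nodes file into types and groups. It is a reading of the rules as described here, not the parser used by Intel® Cluster Checker: the function name parse_nodes_file and the exact tokenization (for example, treating type and group values as single words) are assumptions.

```python
import re

# "type:" or "group:" followed by one word or a comma-separated list of words.
KEYWORD = re.compile(r"\b(type|group):\s*(\w+(?:\s*,\s*\w+)*)")


def parse_nodes_file(text):
    """Map each node name to its functional types and groups."""
    nodes = {}
    for raw in text.splitlines():
        # Everything after the first '#' is comment text (possibly with keywords).
        name, _, comment = raw.partition("#")
        name = name.strip()
        if not name:
            continue  # blank line or pure comment line
        types, groups = [], []
        for key, values in KEYWORD.findall(comment):
            target = types if key == "type" else groups
            for value in re.split(r"\s*,\s*", values):
                if value not in target:
                    target.append(value)
        # Deprecated form: the bare word 'head' right after the '#'.
        if re.match(r"\s*head\b", comment) and "head" not in types:
            types.append("head")
        # A node without an explicit type is a compute node.
        nodes[name] = {"types": types or ["compute"], "groups": groups}
    return nodes


if __name__ == "__main__":
    sample = (
        "# list of nodes to check\n"
        "node1  # type: compute, head group: bigmem\n"
        "node2  # group: bigmem group: fastcpu\n"
        "node3  # fails intermittently\n"
        "node4\n"
    )
    for node, info in parse_nodes_file(sample).items():
        print(node, info)
```

Free-form comment text after the keywords, such as "fails intermittently", is ignored by the sketch.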

2 This feature should be considered deprecated and may be removed in the future.

1.2. Defining Intel® Cluster Checker Configuration

The Intel® Cluster Checker configuration parameters are specified in an XML file. These parameters control the execution of each test module and define the list of nodes to be checked.

Several example XML configuration files are included with the tool and are located in the <installation-path>/examples/ directory.

Before execution, the configuration file is validated against the Intel® Cluster Checker XML schema. This confirms that the correct parameters are used in the correct locations, ensuring that the tool behaves as expected. Although not recommended, this validation can be disabled using the --force switch (see 2.1.4 for details of this option).

Tip for manually validating the XML configuration file: A W3C XML Schema (clck.xsd) and an XSLT transformer style sheet (clck.xsl) are included with Intel® Cluster Checker. The schema and style sheet may be used with third-party XML editors to generate and/or validate a configuration file.

The xmllint tool included in the libxml2 package can be used to check whether a given configuration file follows the required schema. Because schema validation requires the elements of the XML file to be in a specific order, it is advisable to first sort the file with the xsltproc command. The command below shows how to sort the file examples/example.xml:

xsltproc --output examples/sorted.xml clck.xsl examples/example.xml

Then use xmllint to verify that the file examples/sorted.xml complies with the required schema:

xmllint --schema clck.xsd examples/sorted.xml --noout

If the configuration file includes references to external XML files (see 1.2.5 for instructions on how to include other files), the --xinclude option must also be used in the validation.
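The two validation steps can be scripted. The Python sketch below only assembles the command lines; it does not run them. The function name validation_commands is hypothetical, and adding --xinclude to the xmllint step only (rather than to xsltproc as well) is an assumption, since the text above does not say which command needs it.

```python
def validation_commands(config, sorted_config, xinclude=False,
                        schema="clck.xsd", stylesheet="clck.xsl"):
    """Return (sort_command, check_command) as argument lists.

    config        -- configuration file to validate
    sorted_config -- where the sorted copy is written
    xinclude      -- set when the configuration includes external XML files
    """
    # Step 1: sort the configuration file with the supplied style sheet.
    sort_cmd = ["xsltproc", "--output", sorted_config, stylesheet, config]
    # Step 2: validate the sorted file against the schema, printing no document.
    check_cmd = ["xmllint", "--schema", schema, sorted_config, "--noout"]
    if xinclude:
        check_cmd.append("--xinclude")
    return sort_cmd, check_cmd


if __name__ == "__main__":
    for cmd in validation_commands("examples/example.xml", "examples/sorted.xml"):
        print(" ".join(cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) on a system where xsltproc and xmllint are installed.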


1.2.1. List of Nodes

The nodes are typically read from the path specified in the XML configuration file. The --nodefile command line option may also be used to specify the file containing the list of nodes (see 2.1.4 for the complete list of command line options).

<nodefile> file </nodefile>
Read the cluster nodes from file. If file begins with '$', it is interpreted as an environment variable, e.g. <nodefile>$PBS_NODEFILE</nodefile>.
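The '$' rule for <nodefile> can be illustrated with a short Python sketch. The function name resolve_nodefile is hypothetical, and returning None for an unset variable is an assumption; this guide does not state how the tool reacts in that case.

```python
import os


def resolve_nodefile(value, env=os.environ):
    """Return the nodes file path for a <nodefile> value.

    A value starting with '$' names an environment variable whose content
    is the path; any other value is used as the path itself.
    """
    value = value.strip()
    if value.startswith("$"):
        # Assumption: an unset variable yields None.
        return env.get(value[1:])
    return value


if __name__ == "__main__":
    print(resolve_nodefile("$PBS_NODEFILE", {"PBS_NODEFILE": "/var/spool/nodes"}))
    print(resolve_nodefile("/home/icr/nodesfile"))
```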

The following parameters are supported only for backwards compatibility. Their use is discouraged:

<head> value </head>
This is an alternative method for defining a head node. The value is added to the list of nodes to be checked as if it appeared in the node list file. This option may be specified more than once to define more than one head node in this manner.

<mixed-head/>
Binary flag denoting that all head nodes are also compute nodes.

<node_suffix> value </node_suffix>
The value is appended to the cluster node names, e.g., mycluster1.mydomain. If the suffix is a domain name, the '.' must be explicitly included with the suffix. The default is no suffix.

1.2.2. Altering the Runtime Behavior of the Tool

The following parameters control how Intel® Cluster Checker operates and are optional.

<alltoall-threshold> value </alltoall-threshold>
Override the default method used to control the combinatorial explosion in all-to-all checks as the number of nodes increases. The runtime behavior is to check all node pairs as long as the number of nodes is less than or equal to the provided value (64 by default). Above that, each node is checked only against its nearest neighbors in the node list, or against more neighbors if alltoall-throttle is specified.

<alltoall-throttle> value </alltoall-throttle>
The value corresponds to the degree of neighbors used to generate pairs when the number of nodes is greater than the value specified by alltoall-threshold. Node neighbors are determined by position in the node list file, not by the physical arrangement of the nodes. The default value is 1.
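The interaction between the two parameters can be sketched as follows. This Python example is one plausible reading of the description, not the tool's algorithm: the function name node_pairs is hypothetical, and it assumes that neighbors do not wrap around from the end of the node list to the beginning.

```python
from itertools import combinations


def node_pairs(nodes, threshold=64, throttle=1):
    """Return the node pairs an all-to-all check would cover.

    threshold -- up to this many nodes, every pair is checked
    throttle  -- above the threshold, each node is paired with the next
                 `throttle` nodes in list order (assumed: no wrap-around)
    """
    if len(nodes) <= threshold:
        return list(combinations(nodes, 2))
    pairs = []
    for i, node in enumerate(nodes):
        for distance in range(1, throttle + 1):
            if i + distance < len(nodes):
                pairs.append((node, nodes[i + distance]))
    return pairs


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4"]
    print(len(node_pairs(nodes)))           # all pairs
    print(node_pairs(nodes, threshold=3))   # nearest neighbors only
```

With n nodes, the full check covers n(n-1)/2 pairs, while the throttled check grows roughly linearly with n, which is the reason for the threshold.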

<head_tempdir> value </head_tempdir>
Change the default location where temporary directories and files are created locally on the head node to prepare the checks to execute. The default location is /tmp. The value should be the absolute path to an existing directory on the head node with read, write and execute permissions. This value can also be specified with the environment variable CLCK_HEAD_TEMPDIR (see 2.1.5 for details of environment variables). Note that the environment variable takes precedence over the configuration file.

<node_tempdir> value </node_tempdir>
Change the default location where temporary directories and files are created on the nodes for testing purposes. Test modules that include the head node in their checks use this path when verifying it. The default location is /tmp. The value should be the absolute path to an existing directory on all nodes with read, write and execute permissions. This value can also be specified with the environment variable CLCK_NODE_TEMPDIR (see 2.1.5 for details of environment variables). Note that the environment variable takes precedence over the configuration file.
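The precedence between the environment variable, the configuration file, and the default can be summarized in a few lines of Python. The function name effective_tempdir is hypothetical, and treating an empty environment variable as unset is an assumption.

```python
import os


def effective_tempdir(config_value, env_var, env=os.environ, default="/tmp"):
    """Return the temporary directory that takes effect.

    Precedence: environment variable, then configuration file value,
    then the default location.
    """
    return env.get(env_var) or config_value or default


if __name__ == "__main__":
    # Configuration file sets /scratch, but the environment variable wins.
    print(effective_tempdir("/scratch", "CLCK_NODE_TEMPDIR",
                            env={"CLCK_NODE_TEMPDIR": "/local/tmp"}))
```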

<process-limit> value </process-limit>
Override the default number of nodes that can be checked simultaneously (checks are parallelized by forking a process for each node). No more than value processes ever run concurrently. The default value is 64.

<retry> value </retry>
The number of times a test module is re-executed if it fails.

<user> value </user>
The system user name to use when running test modules. This setting only affects test modules that are intended for regular users when the tool is run as a privileged user.

<env> export NAME=VALUE </env>
Set user-defined environment variables before executing the test module commands on the cluster nodes. The tag may be repeated to set more than one environment variable. This tag may be used globally for all tests or inside each test module configuration. The resulting environment for a specific test module is the merge of the globally configured environment variables and those configured in the test module. If a globally configured environment variable is redefined inside a test module configuration, the global value is overridden for that test module. Example:

<cluster>
   <env> export HOME=/home/$USER </env>
   <nodefile>/etc/intel/clcknodelist</nodefile>
   <test>
       <intel_mpi_rt>
          <env> export I_MPI_PERHOST=1 </env>
          <env> export I_MPI_DEBUG=5 </env>
          <device>rdssm</device>
          <mpi-path>/opt/intel/mpi-rt/3.1</mpi-path>
          <process-number>2</process-number>
       </intel_mpi_rt>
   </test>
</cluster>

Also take into account that the environment variables are set in the order they are entered (top to bottom). This is especially important for environment variables whose values depend on other variables. For the C shell, change the keyword 'export' to 'set'.
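The merge of global and per-module <env> entries can be pictured with the following Python sketch. It is an illustration of the override rule only: the function names parse_env and merge_env are hypothetical, and the sketch does not expand references such as $USER, which the shell on the nodes would do.

```python
def parse_env(entry):
    """Split an 'export NAME=VALUE' entry into (NAME, VALUE)."""
    text = entry.strip()
    if text.startswith("export "):
        text = text[len("export "):]
    name, _, value = text.partition("=")
    return name.strip(), value.strip()


def merge_env(global_entries, module_entries):
    """Return the environment seen by one test module.

    Global entries are applied first, in order; module entries follow and
    override any global variable with the same name.
    """
    merged = {}
    for entry in list(global_entries) + list(module_entries):
        name, value = parse_env(entry)
        merged[name] = value
    return merged


if __name__ == "__main__":
    print(merge_env(["export HOME=/home/$USER"],
                    ["export I_MPI_PERHOST=1", "export I_MPI_DEBUG=5"]))
```

Applied to the example above, the intel_mpi_rt test module would see HOME together with I_MPI_PERHOST and I_MPI_DEBUG.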

1.2.3. Selecting Test Modules

The default list of test modules to be executed may be altered at run time by including the following set of tags in the configuration file (see section 5.3 for the list of default test modules). Equivalent command-line options are also available (see 2.1.4 for details of each of these options). The configuration file tags and their respective command-line options may be freely mixed. However, command-line options take precedence over the configuration tags.

<exclude_module> test_module </exclude_module>
Exclude the test module named test_module. If other test modules depend on the excluded one, they are also excluded. This tag may be used multiple times to exclude several test modules and is also available as a command-line option.

<include_module> test_module </include_module>
Include the test module named test_module. The included test module runs in addition to the standard set of test modules. If the included test module depends on other test modules that are not explicitly included, the required dependencies are also executed. This tag may be used multiple times to include more than one test module and is also available as a command-line option.

<include_only_module> test_module </include_only_module>
Ignore the default set of test modules and include only the one named test_module. Test modules required to satisfy the dependencies of the included one are also included. This tag may be used multiple times and is also available as a command-line option. It also has precedence over <include_module>, so any modules added with that tag are ignored.

Each default test module has an associated execution level (see Table 6.3). Changing the default execution level with the --level command-line option also changes the list of test modules to be executed (for details of this option see 2.1.4).

In addition to the default set of test modules, Intel® Cluster Checker has different predefined sets that can be executed using command-line options. These options are: --compliance, --sdk-compliance, --certification and --deployment (for details of these options see 2.1.4). Note that the three configuration tags described in this section also alter the list of test modules included by these options.
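As a rough model of how the three selection tags interact, the following Python sketch resolves a final module list from a default set and a dependency map. It is a sketch under stated assumptions: the function names, the module names (mod_a to mod_d) and the dependency graph are all hypothetical, and execution levels are left out.

```python
def with_dependencies(modules, deps):
    """Return the given modules plus everything they transitively depend on."""
    result = set()
    stack = list(modules)
    while stack:
        module = stack.pop()
        if module not in result:
            result.add(module)
            stack.extend(deps.get(module, ()))
    return result


def select_modules(default, deps, include=(), exclude=(), include_only=()):
    """Model <include_module>, <exclude_module> and <include_only_module>."""
    if include_only:
        # include_only replaces the default set and overrides plain includes.
        selected = with_dependencies(include_only, deps)
    else:
        selected = with_dependencies(set(default) | set(include), deps)
    excluded = set(exclude)
    # Drop excluded modules and every module that depends on one of them.
    return {m for m in selected if not (with_dependencies([m], deps) & excluded)}


# Hypothetical dependency graph: mod_b needs mod_a, mod_c needs mod_b.
DEPS = {"mod_a": set(), "mod_b": {"mod_a"}, "mod_c": {"mod_b"}, "mod_d": set()}
DEFAULT = ["mod_a", "mod_b"]

print(sorted(select_modules(DEFAULT, DEPS, include=["mod_c"])))
print(sorted(select_modules(DEFAULT, DEPS, include_only=["mod_d"], include=["mod_c"])))
```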

1.2.4. Configuring Test Modules

The individual test modules may be configured for your cluster. The configuration parameters are contained inside the <test> ... </test> XML container. The <test> block should only be specified once in a configuration file. Parameters for a specific test module are further enclosed in a tag matching the name of the test module. For example, to configure the clock_sync test module:

<cluster>
   ...
   <test>
      ...
      <clock_sync>
         <deviation>30</deviation>
      </clock_sync>
      ...
   </test>
</cluster>

Additional documentation about the specific configuration options for each test module can be found in the Intel® Cluster Checker Test Module Reference Guide.

1.2.4.1. Creating Configuration Groups

The group mechanism allows a test module to have different configuration values for different groups of nodes within the cluster. Configuration values are matched to nodes based on the groups that each node belongs to, which are defined in the nodelist (see 1.1 for details on how to create the nodelist). The <group> tag in the configuration file must be located at the nesting level directly below the name of the test module. Otherwise, it will not pass the schema validation. Although not recommended, if the --force command-line option is used and the tag is located at a deeper XML level, its configuration values are ignored.

For example, the following configuration for the system_memory test module specifies that nodes belonging to the 'bigmem' group should have 8GB of physical memory, while the default amount of physical memory for the remaining nodes is 4GB, and all nodes should have 4GB of virtual memory:

<system_memory>
   <group name="bigmem">
      <physical>8388608</physical>
   </group>
   <physical>4194304</physical>
   <swap>4194304</swap>
</system_memory>

Group configuration parameters always supersede the default parameters for the nodes belonging to that group. Multiple group containers may be defined per module. The group configuration values are used only for the nodes that belong to the specified group. If a node does not belong to any group, the general configuration values of the test module are used.

Group names may also be combined. Using the reserved keywords 'OR' and 'AND', expressions can be written so that a configuration container applies to more than one group and/or to the intersection of two or more groups. For example, the following configuration shows four different values for an item: the first applies to nodes that belong to group g1, g2 or g3; the second applies only to nodes that belong to groups g1, g4 and g5; the third applies to nodes that belong to g5 or g6 and also belong to g7; and the last is the default value for nodes that do not match any of the above.

<system_memory>
   <group name="g1 OR g2 OR g3">
      <physical>8388608</physical>
   </group>
   <group name="g1 AND g4 AND g5">
      <physical>2097152</physical>
   </group>
   <group name="g5 OR g6 AND g7">
      <physical>16777216</physical>
   </group>
   <physical>4194304</physical>
</system_memory>

It is important to consider that the expressions are evaluated from left to right and the operations have the same precedence. Also note that each node can match only one configuration option for each test module. If more than one configuration option applies to a node, a message indicating this will be printed and the test module will be skipped. A detailed list of the conflicting nodes will be printed if Intel® Cluster Checker is executed with verbosity 3 or higher.
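The left-to-right, equal-precedence evaluation and the one-match-per-node rule can be illustrated with a small Python sketch. It is not the tool's parser: the function names are hypothetical, expressions are assumed to be whitespace-separated, and a conflict is modelled as an exception, whereas the tool prints a message and skips the test module.

```python
def matches(expression, node_groups):
    """Evaluate an expression such as 'g5 OR g6 AND g7' strictly left to right."""
    tokens = expression.split()
    result = tokens[0] in node_groups
    # Tokens alternate: name, operator, name, operator, name, ...
    for operator, name in zip(tokens[1::2], tokens[2::2]):
        member = name in node_groups
        if operator == "AND":
            result = result and member
        elif operator == "OR":
            result = result or member
        else:
            raise ValueError(f"unexpected operator: {operator}")
    return result


def pick_value(group_values, default, node_groups):
    """Return the value for a node, or raise if several containers match."""
    hits = [value for expr, value in group_values if matches(expr, node_groups)]
    if len(hits) > 1:
        raise LookupError("node matches more than one group container")
    return hits[0] if hits else default


# The <physical> values from the example above.
PHYSICAL = [
    ("g1 OR g2 OR g3", 8388608),
    ("g1 AND g4 AND g5", 2097152),
    ("g5 OR g6 AND g7", 16777216),
]

print(pick_value(PHYSICAL, 4194304, {"g2"}))
print(matches("g5 OR g6 AND g7", {"g5"}))
```

With conventional precedence (AND before OR) a node that belongs only to g5 would match the third expression; under the left-to-right rule it does not, because the expression is read as (g5 OR g6) AND g7.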

1.2.4.2. Using Global Configuration Options

Most test modules have configuration options that are specific to them (refer to the Test Modules Reference Guide for details). However, some test modules share configuration options. In these cases, the <global_configuration> container can be used to write the configuration option once and avoid repeating it for every test module that uses it. The container should be placed outside the <test> container.

Two kinds of global configuration options are available: single-entry options, used to set paths to utilities and libraries, and one multiple-entry option, used to configure the network fabrics.

Single-entry global configuration options have only one XML tag that provides the information needed for that option. Options of this kind are used to configure specific paths required by the test modules. If a global configuration option is defined, all test modules for which the option has meaning use the globally defined value. However, if a test module has its own value for the option, that value is used instead. In other words, the local configuration of a test module takes precedence over the global one. See Table 2.1 for details of the configuration options available and the test modules that they apply to.

For example, the base path of the Intel® MPI Library is used by imb_collective_intel_mpi, imb_pingpong_intel_mpi and hpcc test modules. Therefore, the following configuration:

<test>
   <imb_collective_intel_mpi>
      <benchmark>barrier</benchmark>
      <fabric>
         <device>sock</device>
      </fabric>
      <mpi-path>/opt/intel/mpi/3.0</mpi-path>
   </imb_collective_intel_mpi>
   <imb_pingpong_intel_mpi>
      <fabric>
         <bandwidth>110</bandwidth>
         <device>sock</device>
         <latency>35</latency>
      </fabric>
      <mpi-path>/opt/intel/mpi/3.0</mpi-path>
   </imb_pingpong_intel_mpi>
   <hpcc>
      <mpi-path>/opt/intel/impi/4.0.0</mpi-path>
   </hpcc>
</test>

can be simplified to:

<global_configuration>
   <mpi-path>/opt/intel/mpi/3.0</mpi-path>
</global_configuration>
<test>
   <imb_collective_intel_mpi>
      <benchmark>barrier</benchmark>
      <fabric>
         <device>sock</device>
      </fabric>
   </imb_collective_intel_mpi>
   <imb_pingpong_intel_mpi>
      <fabric>
         <bandwidth>110</bandwidth>
         <device>sock</device>
         <latency>35</latency>
      </fabric>
   </imb_pingpong_intel_mpi>
   <hpcc>
      <mpi-path>/opt/intel/impi/4.0.0</mpi-path>
   </hpcc>
</test>

Note that hpcc will use a different version of Intel® MPI Library because its local configuration overrides the global one.
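The precedence of local over global single-entry options amounts to a two-level lookup. The following Python sketch restates the example above as plain dictionaries; the function name effective_option and the dictionary layout are hypothetical and only serve the illustration.

```python
def effective_option(name, module_config, global_config):
    """Return the module's own value if present, else the global one, else None."""
    if name in module_config:
        return module_config[name]
    return global_config.get(name)


GLOBAL_CONFIGURATION = {"mpi-path": "/opt/intel/mpi/3.0"}
TEST = {
    "imb_collective_intel_mpi": {"benchmark": "barrier"},
    "imb_pingpong_intel_mpi": {},
    "hpcc": {"mpi-path": "/opt/intel/impi/4.0.0"},
}

for module, config in TEST.items():
    print(module, effective_option("mpi-path", config, GLOBAL_CONFIGURATION))
```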

The global configuration option with multiple entries is used to define the network fabrics to be used by test modules that exercise Intel® MPI Library (see Table 2.1 for the list of test modules). The <network> container is used to hold the different fabrics that should be entered using the <fabric> container. Multiple fabrics can be configured and each one must include the corresponding MPI device (<device>). Optional attributes are available for the user to specify a custom name to the fabric (name=) and its state (enabled=). The default state is enabled.

There are three options to configure network fabrics in test modules:

1. No local network fabric configuration in the test module. The test module will use every global network in enabled state. If no global network fabric is available (or enabled) the test module will use its default.

2. In the test module local configuration use the <device> tag to refer to a globally configured network fabric by its name (attribute name=), instead of defining an MPI device. In this case, only the specified network fabric will be exercised by the test module.

3. In the test module local configuration use the <device> tag to specify an MPI device. In this case globally configured network fabrics will be ignored.

Note that cases 1 and 2 allow network fabrics to be enabled or disabled in one place, affecting several test modules.

The following example shows how to create a configuration in which two network fabrics are globally defined.

<global_configuration>
   <mpi-path>/opt/intel/impi/3.2</mpi-path>
   <network>
      <fabric name="IB" enabled="on">
         <device>rdssm</device>
      </fabric>
      <fabric name="SM-ETH" enabled="on">
         <device>shm:tcp</device>
      </fabric>
   </network>
</global_configuration>
<test>
   <intel_mpi>
      <device>IB</device>
   </intel_mpi>
   <imkl_hpl>
      <fabric>
         <device>rdssm</device>
         <hpl>0.5</hpl>
         <process-number> 3 </process-number>
      </fabric>
   </imkl_hpl>
</test>

The runtime behavior of this configuration is:

• Every test module that uses Intel® MPI Library with exception of intel_mpi and imkl_hpl will exercise both network fabrics (both are enabled).

• intel_mpi will only test the network fabric named “IB” in global configuration.

• imkl_hpl will ignore globally configured network fabrics and will test rdssm.
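The three ways of configuring network fabrics, applied to the example above, can be sketched in Python as follows. This is an illustrative model only: the function name and data layout are hypothetical, the module default 'sock' is a placeholder, and the sketch assumes that a fabric referenced by name is used regardless of its enabled state, which the text does not specify.

```python
def fabrics_to_test(local_device, global_fabrics, module_default):
    """Return the list of MPI devices a test module would exercise."""
    if local_device is None:
        # Case 1: no local configuration, use every enabled global fabric.
        enabled = [f["device"] for f in global_fabrics if f["enabled"]]
        return enabled or [module_default]
    for fabric in global_fabrics:
        if fabric["name"] == local_device:
            # Case 2: <device> names a globally configured fabric.
            return [fabric["device"]]
    # Case 3: <device> is an MPI device, global fabrics are ignored.
    return [local_device]


GLOBAL_FABRICS = [
    {"name": "IB", "device": "rdssm", "enabled": True},
    {"name": "SM-ETH", "device": "shm:tcp", "enabled": True},
]

print(fabrics_to_test(None, GLOBAL_FABRICS, "sock"))     # any other MPI test module
print(fabrics_to_test("IB", GLOBAL_FABRICS, "sock"))     # intel_mpi
print(fabrics_to_test("rdssm", GLOBAL_FABRICS, "sock"))  # imkl_hpl
```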

Note that <device> supports I_MPI_DEVICE and I_MPI_FABRICS styles to specify an MPI network fabric. An I_MPI_DEVICE definition must use one of: sock, shm, ssm, rdma, rdssm. In the case of the I_MPI_FABRICS style, the definition must match {shm,dapl,tcp,ptl,tmi,ofa}:{dapl,tcp,ptl,tmi,ofa}.

Additional Intel® MPI Library options can be provided by using an 'options' XML attribute for the <device> tag. The options will be reordered as required by MPI, placing global ones first.

For instance, in the following examples the first increases the verbosity of the MPI library run-time messages, the second specifies the TCP network to use, the third enables the multi-rail fabric combination feature using MPI options, and the last shows how to select the tag matching interface (TMI*) transport.

<device options="-genv I_MPI_DEBUG 5">ssm</device>
<device options="-genv I_MPI_TCP_NETMASK ib0">ssm</device>
<device options="-genv I_MPI_OFA_NUM_ADAPTERS 2">shm:ofa</device>
<device options="-genv I_MPI_TMI_LIBRARY /usr/lib/libtmi.so -genv I_MPI_TMI_CONFIG /etc/tmi.conf -genv I_MPI_TMI_PROVIDER mx"> tmi </device>

See the Intel® MPI Library Reference Manual for more details on MPI device selection and the available configuration options.

The following table shows the global configuration options available and the test modules for which they apply.

| Global configuration option | Test modules |
|---|---|
| <cc-path> | clomp, hpcc, intel_cc, intel_cc_rtl, intel_cce_rtl, intel_mpi_testsuite, memory_bandwidth_stream |
| <fc-path> | intel_fc_rtl, intel_fce_rtl, intel_mpi_testsuite |
| <gcc-path> | gcc |
| <ibstat-path> | dat_conf, openib |
| <mkl-path> | hpcc, mflops_intel_mkl |
| <mpi-path> | hpcc, imb_collective_intel_mpi, imb_message_integrity_intel_mpi, imb_pingpong_intel_mpi, imkl_hpl, intel_mpi, intel_mpi_internode, intel_mpi_rt, intel_mpi_rt_internode, intel_mpi_testsuite |
| <network> <fabric> <device> | hpcc, imb_collective_intel_mpi, imb_message_integrity_intel_mpi, imb_pingpong_intel_mpi, imkl_hpl, intel_mpi, intel_mpi_internode, intel_mpi_rt, intel_mpi_rt_internode, intel_mpi_testsuite |
| <perl-path> | perl |
| <python-path> | python |
| <ssh-path> | ssh_version |

Table 2.1 Global Configuration Options

When used in local configuration scope these parameters are not recognized inside group containers.

1.2.4.3. Altering the Test Module Dependencies

The relation between test modules is defined by a hierarchical structure of dependencies. The hierarchy is built with simple test modules at the top and more complicated ones at the bottom. A diagram of the test module dependency hierarchy is provided in the documentation folder in the file doc/ICR_Cluster_Checker_Dependencies_Graph.jpg. The dependencies between test modules imply that if one fails, all the test modules that depend on it are skipped. The test module dependencies may be modified using the following parameters:

<add_dependency> test_module </add_dependency>
Add test_module to the list of dependencies for the test module where this option appears. This option may appear multiple times to specify more than one additional dependency.

<remove_dependency> test_module </remove_dependency>
Remove test_module from the list of dependencies for the test module where this option appears. This option may appear multiple times to remove more than one dependency. Warning: this option should be used with extreme caution since it may result in unstable checking behavior.

Note that the genuine_intel test module cannot be removed using this option.
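The effect of dependencies on skipped test modules, and of <add_dependency> and <remove_dependency> on that effect, can be modelled with a short Python sketch. The function names, module names (mod_a to mod_c) and the dependency graph are hypothetical and chosen only for illustration.

```python
def effective_dependencies(builtin, added=(), removed=()):
    """Apply <add_dependency> and <remove_dependency> to a built-in list."""
    return (set(builtin) | set(added)) - set(removed)


def modules_to_skip(deps, failed):
    """Return every module that depends, directly or indirectly, on a failed one."""
    skipped = set()
    changed = True
    while changed:
        changed = False
        for module, needs in deps.items():
            if module in skipped or module in failed:
                continue
            if needs & (failed | skipped):
                skipped.add(module)
                changed = True
    return skipped


# Hypothetical graph: mod_b depends on mod_a, mod_c depends on mod_b.
deps = {"mod_a": set(), "mod_b": {"mod_a"}, "mod_c": {"mod_b"}}
print(sorted(modules_to_skip(deps, {"mod_a"})))

# Removing mod_c's dependency on mod_b lets mod_c run even if mod_a fails.
deps["mod_c"] = effective_dependencies({"mod_b"}, removed=["mod_b"])
print(sorted(modules_to_skip(deps, {"mod_a"})))
```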

1.2.4.4. Altering the Nodes Checked by a Test Module

Some test modules universally apply to all nodes, while others may only be appropriate for specific types of nodes. The type of nodes checked by a test module may be modified by using the following parameters:

<check_compute> value </check_compute>
Override the default test module behavior as to which types of nodes are checked. Value may be set to any of ‘true’, ‘on’, or ‘1’ to configure the test module to include compute nodes in the check, or any of ‘false’, ‘off’, or ‘0’ to configure the test module to exclude compute nodes from the check.

<check_dedicated_head> value </check_dedicated_head> [3]
Override the default test module behavior as to which types of nodes are checked. Value may be set to ‘true’, ‘on’, or ‘1’ to configure the test module to include 'dedicated' head nodes in the check, or ‘false’, ‘off’, or ‘0’ to configure the test module to exclude 'dedicated' head nodes from the check. Dedicated head nodes are nodes that are exclusively head nodes and do not belong to any other node types.

<check_head> value </check_head>
Override the default test module behavior as to which types of nodes are checked. Value may be set to ‘true’, ‘on’, or ‘1’ to configure the test module to include head nodes in the check, or ‘false’, ‘off’, or ‘0’ to configure the test module to exclude head nodes from the check.

<check_other> value </check_other>
Override the default test module behavior as to which types of nodes are checked. Value may be set to ‘true’, ‘on’, or ‘1’ to configure the test module to include other nodes in the check, or ‘false’, ‘off’, or ‘0’ to configure the test module to exclude other nodes from the check.
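The accepted values for the <check_...> parameters map onto a simple boolean parser. The Python sketch below is illustrative only: the function name is hypothetical, and both the trimming of surrounding whitespace and the case-sensitive comparison are assumptions not confirmed by the text.

```python
TRUE_VALUES = {"true", "on", "1"}
FALSE_VALUES = {"false", "off", "0"}


def parse_check_flag(text):
    """Convert the content of a <check_...> tag into a boolean."""
    value = text.strip()  # assumption: surrounding whitespace is not significant
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError(f"not a valid check value: {text!r}")


print(parse_check_flag(" on "))
print(parse_check_flag("0"))
```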

1.2.5. Using Multiple Configuration Files

After defining the XInclude* namespace, a configuration file may use <include> tags to reference external files in order to reuse settings or provide custom alternatives.

[3] This option is provided for backward compatibility only. It is deprecated; the <check_head> parameter should be used instead. If the <check_head> parameter is defined, this option will be ignored.

For instance, the following configuration:

<cluster>
   <test>
      <speedstep>
         <state>on</state>
      </speedstep>
   </test>
</cluster>

can be replaced by:

<cluster xmlns:xi="http://www.w3.org/2001/XInclude">
   <test>
      <xi:include href="speedstep.zml"/>
   </test>
</cluster>

where the contents of speedstep.zml are:

<speedstep>
   <state>on</state>
</speedstep>

1.3. License File Path Configuration

The configuration of the license file path can be done in two ways:

1. Setting the environment variable INTEL_LICENSE_FILE to point to a folder containing a valid Intel® Cluster Checker license file. The folder and the file must have read and execute permissions. Example for Bash* shell:

export  INTEL_LICENSE_FILE=/opt/intel/licenses

2. Creating a text file named .flexlmrc in the user's home directory with the following content:

INTEL_LICENSE_FILE=<path_to_folder_containing_the_license>

If option 1 is used, after the first execution of the tool option 2 will automatically be performed.

1.4. Updating Old Configuration Files

Before trying to execute Intel® Cluster Checker with a configuration file used with old versions of the tool, the following items need to be considered:

• The configuration file contents are automatically verified using an XML validation schema. Any non-compliance prevents execution until it is resolved. Although not recommended, execution can be forced by using the --force flag (see 2.1.4 for details of this option).

• The <version_id> configuration tag is no longer required. The configuration format is now automatically verified and no explicit version definition is needed.

• Updated configuration options for the following test modules:
   • the intel_mpi_rt and hpcc test modules allow configuration of the Intel® MPI tuning feature
   • the openib test module offers configuration uniformity and correctness checks
   • the intel_mpi_testsuite test module can be configured to select among several different suites, and to exclude tests if required
   • the imb_pingpong_intel_mpi test module offers extra configuration tags to customize the behavior of the Intel® MPI Benchmark

For further details on Intel® Cluster Checker test modules and the configuration options refer to the Intel® Cluster Checker Test Modules Reference Guide.

2. Running Intel® Cluster Checker

2.1. Verifying Cluster Correctness

Intel® Cluster Checker has two execution modes: running checks on the cluster and gathering cluster information (see 2.2 for instructions on gathering cluster information). The general form of the command to execute Intel® Cluster Checker is:

cluster-check xml_config_file options

Intel® Cluster Checker simultaneously prints output to the terminal and writes two timestamp-labeled output reports in the current directory (see 2.1.5 for instructions on how to save the output files in an alternative location). The text output report (.out) is identical to the output printed to the terminal. The XML-based report corresponds to the highest verbosity level output and is suitable for parsing by other programs.

The tool may be run by both privileged and unprivileged users. Some test modules may require privileged access to function properly. Such test modules are automatically flagged and skipped if run by an unprivileged user.

Tip for running Intel® Cluster Checker: Creating a special user account expressly for the purpose of running Intel® Cluster Checker is recommended. This setup provides a stable, reproducible environment for performing checks and a convenient way to store the records of past checks.

Tips for systems with limited disk space: By setting the environment variable TEMP, it is possible to have Intel® Cluster Checker decompress its temporary libraries at a user-defined path (the default is /tmp). For example:

TEMP=/home/icr cluster-check

Additionally, use the environment variables CLCK_HEAD_TEMPDIR and CLCK_NODE_TEMPDIR (see 2.1.5 for the complete list of environment variables) to change the temporary locations used during execution.

2.1.1. Console Output

During execution, several configuration and diagnostic messages are provided through console output.

The default output contains a header showing the configuration to be used during execution, the status of each test module and its sub-tests, and finally the overall result. However, the output may change depending on some configuration options and execution modes.

The header shows information such as the command line, user credentials and start date of the execution. Some settings details are also shown; for instance the path to the configuration file used and its contents, and the list of nodes checked.

The results report lists all the test modules included in the execution, detailing their names and descriptions. In the default verbosity level, only the failed sub-tests will be shown in console. However, if verbosity is increased by the user (see 2.1.3) the entire list of executed sub-tests is shown detailing each one's status.

The sub-tests are grouped and sorted by their associated severity, and the severity level is shown before each set of results. The severity of the issues found by each sub-test helps to determine whether troubleshooting is required.

The available severity levels are SUCCESS, NOTICE, WARNING, ERROR, and CRITICAL. A brief description of each of them can be found in the following table.

| Severity Level | Description |
|---|---|
| SUCCESS | The sub-test passed successfully. |
| NOTICE | Informational findings or potential errors, such as inconsistencies on uniformity. |
| WARNING | Non-urgent failures, such as minimum performance deviations. |
| ERROR | Items to be corrected, such as non-functional benchmarks. |
| CRITICAL | Significant errors, such as non-working cluster-wide subsystems. |

Table 3.1 Sub Tests Severity Levels

In the test modules included in the Intel Cluster Ready compliance set (see 5 for details) the severity of the findings is set to ERROR.

For base test modules such as ping and ssh, the severity of the findings is considered CRITICAL. In the case of performance threshold and deviation sub-tests, the severity depends on how far the results are from the expected values. For performance threshold sub-tests, the result is compared against the value set in the configuration file. For deviation sub-tests, the result is compared against the median (see 6 for more details). A difference of up to 5% is considered WARNING severity, over 20% ERROR severity, and over 50% CRITICAL severity. If a performance threshold sub-test has no value set in the configuration file, the severity is NOTICE.

2.1.2. Log Files

Two types of files are created with the results of each execution. The first is a simple text file that replicates the text shown in the console output. As with the console output, the amount of information in the file is controlled with the --verbose option (see 2.1.4 for details of command line options). The name of this file is formed from the input configuration file name plus a time-stamp (indicating the time of execution) and a .out suffix.

The second is an XML* file with the complete list of tests and sub-tests executed. This file always contains all the results for every node in the cluster, so it is significantly larger than the simple text file. The XML* format makes it suitable as input for other tools. The name of this file is also formed from the input file name plus a time-stamp, but with an .xml suffix.

The tool uses a simple fallback mechanism to define where the log files are created. When a valid directory with write permission is found, it is used and the fallback stops at that step. The steps are:

1. The directory defined in the environment variable CLCK_LOG_DIRECTORY if it is defined (see 2.1.5 for details of environment variables).

2. /var/log/intel/clck/ if the tool was installed in the standard way.

3. <installation_directory>/logs/ if the tool was installed in a custom fashion.

4. The current directory from which the tool is executed.

Note that the --report option will use the same fallback mechanism when searching for log files.
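The fallback steps above can be sketched in Python as a first-match search over candidate directories. This is a simplified model, not the tool's code: the function names are hypothetical, and the sketch simply tries all four locations in order rather than distinguishing standard from custom installations.

```python
import os


def candidate_directories(install_dir, environ=os.environ):
    """List the log locations in the documented fallback order."""
    return [
        environ.get("CLCK_LOG_DIRECTORY"),   # step 1, may be undefined (None)
        "/var/log/intel/clck/",              # step 2
        os.path.join(install_dir, "logs"),   # step 3
        os.getcwd(),                         # step 4
    ]


def pick_log_directory(candidates):
    """Return the first existing directory with write permission, or None."""
    for path in candidates:
        if path and os.path.isdir(path) and os.access(path, os.W_OK):
            return path
    return None


# "/opt/intel/clck" is a placeholder installation directory.
print(pick_log_directory(candidate_directories("/opt/intel/clck")))
```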

For details on how to install the tool refer to section 4 of the Release Notes. Note that the creation of log files may be disabled with the --nolog option (see 2.1.4 for details of command line options).

2.1.3. Additional Output

<verbose> value </verbose>
Control the output verbosity. A higher value produces more output. Can also be set by command line option (see 2.1.4 for the complete list of command line options).

<debug/>
Setting this option creates files named testmodule.timestamp.debug that contain the command(s) executed on each node and the corresponding output. This option can be individually enabled for each test module. The creation of debug files can also be enabled with the --debug command line option (see 2.1.4 for details of command line options). The location of debug files follows the same mechanism as log files (see 2.1.2 for details).

Tip for resolving reported issues: The reason a test module is failing may not always be apparent from the diagnostic messages. The debug file may reveal a more complete error message than the one printed in the console. The user can also run the same commands manually to confirm that the output is the same.

2.1.4. Command Line Options

Command line options may be used to alter the runtime behavior of Intel® Cluster Checker.

--autoconfigure
Enable the automatic configuration capabilities. Also available as the --auto short option. See 8 for more details.

--compliance version
Check compliance of the cluster with the Intel® Cluster Ready Specification. Versions 1.0.x are all functionally equivalent. Multiple versions can be entered, separated by commas, to verify compliance with several of them in a single execution. If no values are provided, they are read from the /etc/intel/icr file. If that file is not available, all existing versions are checked. See Table 6.1 for the list of test modules executed. This list can be altered with the options described in 1.2.3. A successful run should not be interpreted as complete Intel® Cluster Ready compliance, as other requirements must also be met. This option replaces the default set of test modules with the compliance set.

--certification version
Check requirements for certification of the cluster against the Intel® Cluster Ready program, under the provided Intel® Cluster Ready specification version. The values for the version are the same as for the --compliance option. A successful run should not be interpreted as certifying Intel® Cluster Ready compliance, as other requirements must also be met. This option replaces the default set of test modules, as required by the certification procedure. It includes a different set of test modules when executed by a privileged user.

--debug
Generate debug files for every test module executed. To enable debug files only for specific test modules use the XML configuration file (see 2.1.3 for instructions). The location of debug files follows the same mechanism as log files (see 2.1.2 for details).

--deployment version
Check a cluster after first deployment. This option executes both compliance and 'wellness' test modules. The values for the version are the same as for the --compliance option. See the --list option for more details on each set of test modules.

--exclude test_module
Exclude the test module named test_module. If other test modules depend on the excluded ones, they will be skipped. This option may be used multiple times to exclude several test modules and is also available in the XML configuration file (see 1.2.3).

Note that the genuine_intel test module cannot be removed using this option.

--force
Do not apply validation of the XML configuration file before execution.

--help
Print a short help menu describing command line options.

--include test_module
Include the test module named test_module. The included test module will run in addition to the standard set. If the included test module depends on other ones that are not included, the required dependencies will also be included. This option may be used multiple times to include more than one test module and is also available in the XML configuration file (see 1.2.3).

--include_only test_module
Ignore the default set of test modules and execute only the one named test_module and all its dependencies (dependencies are executed first). This option is particularly useful when working to resolve a failure and the user wants to run only the failing test module. This option may be specified multiple times and is also available in the XML configuration file (see 1.2.3). The command line option has precedence over the configuration file: if one test module is configured with this option in the XML file and another is requested on the command line, only the command line one is included. This option also has precedence over --include, so any modules added with that option are ignored. Combining this option with --exclude is discouraged.

­­level valueEach test module has a check level assigned: Fast, non-intrusive tests have a low level (i.e., 1) while slow, intrusive tests have higher levels assigned. The ­­level option tells the tool to run only test modules with levels less than or equal to value. However, by explicitly including test modules with higher check level, this option is overridden. The minimum value is 1, the maximum is 5 and default is 3. See chapter 5 for the complete list of test modules and their check level.

­­listPrint the list of test modules to the screen and exit.

--nodefile
Check the nodes listed in file. Note: this option overrides the <nodefile> parameter from the XML configuration file.

--nodeps
Add no dependencies to exclusively included modules. This option works only when used together with --include_only or <include_only_module>, and is meant for troubleshooting purposes only.

To guarantee that all cluster nodes to be tested are reachable, the ping test module is always executed first.

--noheader
Do not print the tool version, XML file, or information on included / excluded test modules as part of the output. The configuration file contents are written to the XML output report regardless of this value.

--nolog
Do not write the output report to disk.

--packages
Generate the set of files required for the packages test module. The head node and one node from each defined node group are analyzed, and their installed packages are listed in a file that is used for comparison during the execution of the test module.
The reference list may contain comments on each line (after a package entry or on a new line) following the '#' character.

--report value
Instruct cluster-check to look for the latest output logs in the log directory (see 2.1.2) and generate a report with descriptive information about each execution. The value entered represents the number of latest logs that will be examined to generate the output. The files are ordered by date, taking the latest files first. A maximum of five logs are examined. To specify a directory that contains the output logs to be analyzed, use the CLCK_LOG_DIRECTORY environment variable.
In addition to the latest five logs, all certification logs are checked for a successful execution.
Verbosity can be changed using --verbose value, where higher values produce more output. These values range from 1 to 4:

1 - reports whether successful logs were found, and provides the date of the execution and the path to the log.
2 - adds a list of details containing: full path to log, check type, overall status, date, version of Intel® Cluster Checker run, command line and, if the overall status is failed, a short message with a possible reason.
3 - adds the list of failing modules.
4 - adds the list of passing modules.

If --report is executed without a value, then only certification logs are examined until a successful one is found or no more certification logs are left, and a report with a verbosity of 2 is generated.

--reverse (experimental)
Enable a reverse dependency tracking mode that assumes optimistic behavior. cluster-check runs without adding dependencies to the list of test modules to be executed. This list is sorted with the modules with more dependencies first. If a test module fails, root-cause analysis is triggered and the list of modules to be executed is replaced by the failed module's dependencies, sorted in the same way.

To guarantee that all cluster nodes to be tested are reachable, the ping test module is always executed first.

This option can be used together with --level, but not with the --deployment, --certification, --compliance, or --sdk-compliance options.

WARNING: This feature is experimental and included here for feedback. The dependency handling mechanism is altered and test module assumptions may not be properly handled. Output logs generated using this option cannot be used for certification purposes.

--sdk-compliance
Check compliance of the cluster with the Development Cluster section of the Intel® Cluster Ready Specification version 1.1. If the --compliance option is not also specified, --compliance=1.1 is implied by the selection of this option. Note that this option is only available for version 1.1 of the Specification. See Table 6.2 for the list of test modules. The list of test modules may be altered with the options described in 1.2.3.

A successful run should not be interpreted as complete Intel® Cluster Ready compliance, as other requirements must also be met. This option replaces the default set of test modules with the complete set of compliance test modules.

--verbose value
Control the amount of output. Higher values produce more output:

1 Report the overall success / failure only with no information on the status of the test modules.

2 Report the success / failure of each test module, the overall success / failure and the total elapsed time. Failing or indeterminate test modules will print additional output. This is the default verbosity level.

3 Same as 2, but also prints the name of the failing test modules that cause other ones to be skipped.

4 Report the success / failure of each test module and the overall success / failure. All test modules, regardless of status, will print additional output.

5 Report the success / failure of each test module and the overall success / failure. All test modules, regardless of status, will print additional output. In addition, the version of each one will be displayed.

Tip for running Intel® Cluster Checker: Use the ­­level command line option to develop an automated, periodic check for your cluster. For example, consider running the relatively quick, unobtrusive level one test modules daily or as part of a resource manager job preamble script while saving the higher level test modules for weekly or monthly preventive maintenance periods.
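The selection rule behind --level can be sketched in Python. This is only an illustration under stated assumptions: the function name select_modules and the small MODULE_LEVELS mapping are hypothetical (the levels are copied from Table 6.3), and the sketch ignores dependencies and does not reflect the tool's actual implementation.

```python
# Hypothetical sketch of the --level selection rule; not the tool's implementation.
# Check levels below are copied from Table 6.3 for a handful of default modules.
MODULE_LEVELS = {
    "ping": 1,
    "clean_ipc": 2,
    "hdparm": 3,
    "hpcc": 4,
    "packages": 4,
}


def select_modules(level=3, included=()):
    """Return modules whose check level is <= level, plus explicitly included ones."""
    if not 1 <= level <= 5:
        raise ValueError("level must be between 1 and 5")
    selected = [name for name, lvl in MODULE_LEVELS.items() if lvl <= level]
    # Explicitly including a module overrides the level limit for that module.
    for name in included:
        if name in MODULE_LEVELS and name not in selected:
            selected.append(name)
    return sorted(selected)


print(select_modules())                            # default level 3
print(select_modules(level=1))                     # quick daily check
print(select_modules(level=1, included=["hpcc"]))  # level 1 plus one explicit module
```

A daily job would correspond to the level=1 call, while a maintenance window would use a higher level.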

2.1.5. Environment Variables

These are the environment variables recognized by the tool:

CLCK_LOG_DIRECTORY
Write the output reports to the specified directory rather than the one from which the tool is executed.

CLCK_MODULE_PATH
Set the path to search for third-party Intel® Cluster Checker test modules. The environment variable may contain multiple directories separated by a colon.

CLCK_HEAD_TEMPDIR
Change the default location where temporary directories/files are created on the head node to prepare the checks to execute. The default location is /tmp. CLCK_HEAD_TEMPDIR should contain the absolute path to an existing directory on the head node with read, write and execute permissions. This value can also be specified in the XML configuration file by using the <head_tempdir> tag. Note that the environment variable has precedence over the configuration file.
Take into account that the MPD subsystem of Intel® MPI Library may create its own temporary files; look for I_MPI_MPD_TMPDIR in the Intel® MPI Library user guide for details on how to configure their location.

CLCK_NODE_TEMPDIR
Change the default location where temporary directories/files are created on compute nodes for testing purposes. Test modules that include the head node in their checks will use this path when verifying it. The default location is /tmp. CLCK_NODE_TEMPDIR should contain the absolute path to an existing directory on all nodes with read, write and execute permissions. This value can also be specified in the XML configuration file by using the <node_tempdir> tag. Note that the environment variable has precedence over the configuration file.

CLCK_REGULAR_USER
The user to act as when the tool is run by a privileged user. This setting only affects test modules which are intended for regular users. This has precedence over the <user> configuration tag.
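The precedence rules for the temporary directory variables (environment variable first, then the XML tag, then /tmp) and the colon-separated CLCK_MODULE_PATH can be sketched as follows. The helper names resolve_tempdir and module_search_path are hypothetical, and the sketch only mirrors the behavior described above; it is not taken from the tool.

```python
import os

DEFAULT_TEMPDIR = "/tmp"


def resolve_tempdir(env_name, config_value=None, environ=None):
    """Pick the temporary directory: environment variable, then XML value, then /tmp."""
    environ = os.environ if environ is None else environ
    value = environ.get(env_name) or config_value or DEFAULT_TEMPDIR
    # The guide requires an absolute path.
    if not value.startswith("/"):
        raise ValueError(f"{env_name} must be an absolute path: {value!r}")
    return value


def module_search_path(environ=None):
    """Split CLCK_MODULE_PATH into its colon-separated directories."""
    environ = os.environ if environ is None else environ
    raw = environ.get("CLCK_MODULE_PATH", "")
    return [entry for entry in raw.split(":") if entry]


# Explicit dictionaries stand in for the process environment in this demonstration.
env = {"CLCK_HEAD_TEMPDIR": "/scratch", "CLCK_MODULE_PATH": "/opt/a:/opt/b"}
print(resolve_tempdir("CLCK_HEAD_TEMPDIR", "/var/tmp", env))  # environment wins
print(resolve_tempdir("CLCK_NODE_TEMPDIR", "/var/tmp", env))  # falls back to XML value
print(module_search_path(env))
```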

2.2. Gathering Cluster Information

It is possible to use Intel® Cluster Checker to generate a report with useful cluster information for statistical sales analysis. The report can be created using the command line option --sales-report to process an existing log file generated by the tool.

A text file in comma-separated values (CSV) format will be generated with the collected information. The following items will be available in the file:

1. vendor sales order number
2. analyzed log file name
3. Intel® Cluster Checker time-stamp
4. Intel® Cluster Checker serial number
5. number of nodes
6. overall Intel® Cluster Checker pass/fail
7. total amount of memory
8. number of CPUs
9. number of cores
10. types of CPUs
11. brand of interconnect if available
12. kernel version
13. OFED* version
14. Intel® MPI Library version
15. Intel® Cluster Runtime version
16. Intel® Cluster Ready Reference Implementation identifier
17. Intel® Cluster Checker version

If an item is not found in the cluster-check-log.xml file, the matching field will be filled with the text “Not available”. To generate a report with all the items completed, information from the core_count, cpuinfo, kernel, pci, and system_memory test modules must be available in the provided log. This is normally the log of a default execution (‘wellness’ with check level 3).

2.2.1. Command Line Options

Sales report mode is enabled using custom command line options.

cluster-check --sales-report cluster-check-log.xml --order-number sales-order-number

--sales-report cluster-check-log.xml
Instructs Intel® Cluster Checker to run in XML parsing mode on the provided log file and create the sales report with the information described above.

--order-number sales-order-number
Sales order number of the cluster for which the sales report is being generated. If this option is not provided, the corresponding field in the output file will be filled with the tag "sales-order-number".


3. User-Defined Checking

The Intel® Cluster Checker plug-in architecture allows the user to create new test modules (see the Developer's Guide for complete details). However, two special test modules permit the user to define custom checks without writing any code. The generic_correctness and generic_uniformity test modules execute commands specified by the user and check the correctness and uniformity of the output.

These test modules are not part of the default set and must be included using the <include_module> tag or the --include command line option.

Tip for running user-defined checking: Since the generic test modules run arbitrary commands, by default they will not be evaluated if the tool is run as a privileged user. See the entries for the generic test modules in the Test Modules Reference Guide to learn how to override this behavior.

3.1. Correctness Checking

The generic_correctness test module executes an arbitrary command and compares the output to a specified value. An exact match (case and whitespace sensitive) is considered a successful result. The output of multiple commands may be checked by using multiple <item> container tags:

<generic_correctness>
  <item>
    <command>uname -r</command>
    <result>2.4.21-20.EL</result>
  </item>
  <item>
    <command>/sbin/lsmod | grep e1000</command>
    <result>e1000        171104   1</result>
  </item>
</generic_correctness>

Consult the generic_correctness entry in the Test Modules Reference Guide for more information.
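The exact-match comparison performed by generic_correctness can be illustrated with a short Python sketch. The function names run_shell and check_correctness are hypothetical, and dropping trailing newlines before the comparison is an assumption made here; the Test Modules Reference Guide remains the authority on the module's real behavior.

```python
import subprocess


def run_shell(command):
    """Run a shell command and return its standard output as text."""
    completed = subprocess.run(command, shell=True, capture_output=True, text=True)
    return completed.stdout


def check_correctness(items, run=run_shell):
    """Return {command: passed} for a list of (command, expected_result) pairs."""
    results = {}
    for command, expected in items:
        # Assumption: trailing newlines are dropped before the comparison.
        output = run(command).rstrip("\n")
        # The match is exact: case and inner whitespace must agree.
        results[command] = (output == expected)
    return results


# Canned outputs stand in for real command execution in this demonstration.
fake_outputs = {"uname -r": "2.4.21-20.EL\n"}
print(check_correctness([("uname -r", "2.4.21-20.EL")], run=fake_outputs.get))
print(check_correctness([("uname -r", "2.4.21-20.el")], run=fake_outputs.get))
```

The second call fails only because of letter case, which shows why expected values are best copied directly from the command output of a known-good node.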


3.2. Uniformity Checking

The generic_uniformity test module executes an arbitrary command and compares the output to the other cluster nodes. The same result on all nodes is considered a successful result.

<generic_uniformity>
  <command>uname -r</command>
  <command>/sbin/lsmod | grep e1000</command>
</generic_uniformity>

Consult the generic_uniformity entry in the Test Modules Reference Guide for more information.
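A uniformity check of this kind can be sketched in Python as below. The function check_uniformity is hypothetical, and reporting the nodes that differ from the most common output is an assumption added for illustration; the guide only states that identical output on all nodes counts as success.

```python
from collections import Counter


def check_uniformity(outputs_by_node):
    """Return (passed, outliers) given a {node: command output} mapping."""
    if not outputs_by_node:
        return True, []
    counts = Counter(outputs_by_node.values())
    # Assumption: nodes are reported as outliers relative to the most common output.
    majority_output, _ = counts.most_common(1)[0]
    outliers = sorted(
        node for node, out in outputs_by_node.items() if out != majority_output
    )
    return len(counts) == 1, outliers


outputs = {"node1": "2.6.18", "node2": "2.6.18", "node3": "2.6.32"}
print(check_uniformity(outputs))
```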


4. Copy Exactly

copy_exactly is a test module that Intel® Cluster Checker executes to verify that nodes are an exact copy of a reference node. Using a list of reference file checksums, the copy_exactly test module (see footnote 4) confirms that the checksums match the actual files present on the nodes. The list of reference file checksums may be provided by a third party, or may be generated using one of your nodes as reference. The node_checksum utility is provided with Intel® Cluster Checker to generate this reference file.

The node_checksum program generates a file containing the checksum for most of the files on a node. In most cases, the node_checksum utility should be run on the reference node without any command line options. It automatically excludes files that are known to vary between nodes because they contain MAC or IP addresses, hostnames, or other node-specific information, or because they are temporary. Additionally, user-specified files may be excluded from the checksum list via an exceptions file during the generation step.

The exceptions file is a list of basic regular expressions, one per line. All special characters, such as '+', may need to be escaped with '\' to be literally interpreted. Only the characters '.' and '$' are accepted as part of a regular expression. For example, to exclude the /usr/bin/gcc and /usr/bin/g++ files, all files ending in gconf.xml, and all files in the /usr/local/node/ folder, the exceptions file would contain:

/usr/bin/gcc
/usr/bin/g\+\+
gconf.xml$
/usr/local/node/

The exceptions file should not contain any blank lines. To apply the exceptions file, run node_checksum with the path to the exceptions file as the first and only command line option. For more information on using extended regular expressions, see the man page for grep.
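To see which paths the example exceptions would exclude, the patterns can be tried out in Python before running node_checksum. This is a rough approximation only: it assumes that Python's re module treats these particular patterns the same way the utility does and that a pattern may match anywhere in the path, as grep does. The names load_patterns and is_excluded are hypothetical.

```python
import re

# The example exceptions file from the text, one pattern per line.
EXCEPTIONS = r"""/usr/bin/gcc
/usr/bin/g\+\+
gconf.xml$
/usr/local/node/"""


def load_patterns(text):
    """Compile one regular expression per non-empty line."""
    return [re.compile(line) for line in text.splitlines() if line]


def is_excluded(path, patterns):
    """Return True if any pattern matches somewhere in the path (unanchored)."""
    return any(pattern.search(path) for pattern in patterns)


patterns = load_patterns(EXCEPTIONS)
for path in ["/usr/bin/g++", "/usr/local/node/id", "/usr/bin/make"]:
    print(path, is_excluded(path, patterns))
```

Because the match is unanchored in this sketch, the first pattern would also match a path such as /usr/bin/gccbug; add '$' to a pattern when only an exact file name is intended.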

4 See http://www.intel.com/design/quality/mq_ce.htm for more about the Intel® Copy Exactly philosophy.


Please see the copy_exactly documentation in the Test Modules Reference Guide for more information. A similar validation at the installed-package level can be done with the packages test module.


5. Intel® Cluster Checker Test Modules

The test modules are divided into 4 sets:

1. One set is used to check compliance with the base Intel® Cluster Ready Specification. The set is used with the --compliance command line option (see 2.1.4 for the complete list of command line options).

2. The second set is used to check compliance with the Developer Cluster section of the Intel® Cluster Ready Specification. The set is used with the --sdk-compliance command line option (see 2.1.4).

3. The third set is used to check if the cluster is configured correctly and performs according to expectation. This set is the default (no command line option) and is named ‘wellness’ mode.

4. The fourth set is composed of optional test modules.

The --list switch is available to print the list of available test modules (see 2.1.4 for the complete list of command line options).

5.1. Compliance Test Modules

Running the tool in compliance mode using the --compliance command line option loads the following set of test modules. Except for the gige test module, which is only included in compliance mode for 1.0, and glibc_version and openssh_version, which are included for 1.0 and 1.1, all test modules are included in the 1.0, 1.1 and 1.2 compliance modes.

Test module name          Description
1GiB_memory               Minimum of 1 GiB of memory per node and 0.5 GiB of memory per core
65GiB_storage_head        Minimum of 65 GiB of direct access storage on the head node
base_libs                 Base and runtime libraries are provided
cluster_size              Cluster size >= 4 nodes
file_tree                 Materially identical software images / file trees
genuine_intel             GenuineIntel processors
gige                      Gigabit Ethernet port present
glibc_version             32-bit and 64-bit GNU runtime (glibc) version compliance
home                      Shared, common /home
icr_version_compliance    Intel® Cluster Ready version compliance (/etc/intel/icr)
intel_cc_rtl_version      32-bit Intel® C++ Compiler runtime libraries version compliance
intel_cce_rtl_version     64-bit Intel® C++ Compiler runtime libraries version compliance
intel_cmkl_rtl_version    Intel® Math Kernel Library, Cluster Edition runtime version compliance
intel_fc_rtl_version      32-bit Intel® Fortran Compiler runtime libraries version compliance
intel_fce_rtl_version     64-bit Intel® Fortran Compiler runtime libraries version compliance
intel_mpi_rtl_version     Intel® MPI Library runtime version compliance
intel_tbb_rtl_version     Intel® Threading Building Blocks runtime version compliance
java_version              Java Runtime Environment version compliance
kernel_version            Kernel version compliance
Lib32_counterpart_lib64   32-bit libraries have 64-bit counterparts
mpi_consistency           Consistent MPI image (mpirun / mpiexec)
ip_consistency            All-to-all Network Connectivity
openssh_version           OpenSSH version compliance
perl_version              Perl version compliance
python_version            Python version compliance
single_authentication     Single authentication domain
tcl_version               Tcl version compliance
X11_clients               X11 clients are provided on head node
X11_libs                  X11 runtime libraries are provided

Table 6.1 Compliance Test Modules

5.2. SDK Compliance Test Modules

Running the tool in development cluster compliance mode using the --sdk-compliance command line option loads the test modules from Table 6.1 and adds the following set of test modules:

Test module name          Description
binutils_version          GNU binutils version compliance
gcc_version               GNU C Compiler suite (gcc and g++) version compliance
gdb_version               GNU debugger (gdb) version compliance
gmake_version             GNU make version compliance
intel_devtools_version    Intel® Cluster Ready Developer Tools version compliance
Jdk_version               Java Software Development Kit version compliance

Table 6.2 SDK Compliance Test Modules (in addition to Table 6.1)

5.3. Default Test Modules

By default, Intel® Cluster Checker runs in 'wellness' mode with check level 3. So, the tests from the list below with check level 3 or lower will be executed by default. Changing the check level with the --level command line option will include/exclude test modules only from this set.

Test module name                  Description                                                        Level
arch                              System Architecture Uniformity                                     1
available_disk                    Available Disk                                                     1
bash                              Bourne Again Shell                                                 1
clean_ipc                         System V Interprocess Communication                                2
clock_granularity                 gettimeofday() Clock Granularity                                   1
clock_sync                        Clock Synchronization                                              1
core_count                        Core Count (Multi-core & Hyper-Threading Technology)               1
core_frequency                    Core frequency uniformity                                          1
cpuinfo                           /proc/cpuinfo Uniformity                                           1
csh                               C Shell                                                            1
dat_conf                          Valid /etc/dat.conf entries                                        1
disk_bandwidth                    Single-node Disk Bandwidth                                         3
dmidecode                         SMBIOS/DMI Uniformity                                              1
environment                       Uniform environment variables                                      1
file_permissions                  File Existence, Ownership, and Permissions                         1
genuine_intel                     GenuineIntel processors                                            1
hdparm                            Single-node Disk Performance (hdparm)                              3
hardware_uniformity               Hardware uniformity                                                3
hostname                          Hostname Correctness                                               1
hpcc                              HPC Challenge Benchmark (Intel® C++ Compiler, Intel® MPI
                                  Library, Intel® Math Kernel Library)                               4
imb_collective_intel_mpi          MPI Collectives (Intel® MPI Benchmarks; Intel® MPI Library)        3
imb_message_integrity_intel_mpi   MPI Message Integrity (Intel® MPI Benchmarks; Intel® MPI Library)  3
imb_pingpong_intel_mpi            Network Performance (Intel® MPI Benchmarks; Intel® MPI Library)    3
intel_cce_rtl                     Intel® C++ Compiler runtime libraries                              2
intel_fce_rtl                     Intel® Fortran Compiler runtime libraries                          2
intel_mpi_rt                      Intel® MPI Library Runtime Environment (Single-node)               1
intel_mpi_rt_internode            Intel® MPI Library (All nodes)                                     2
kernel                            Kernel Version Uniformity                                          1
kernel_modules                    Kernel Module Correctness and Uniformity                           1
kernel_parameters                 Linux Kernel Runtime Parameters                                    1
ksh                               Korn Shell                                                         1
loopback                          Loopback Address                                                   1
memory_bandwidth_stream           Single-node Memory Bandwidth (STREAM)                              3
mflops_intel_mkl                  Single-node floating point performance (Intel® Math Kernel
                                  Library)                                                           3
mount_proc                        procfs Filesystem                                                  1
nfs_mounts                        NFS mounts                                                         1
nsswitch                          Name Service Configuration (/etc/nsswitch.conf)                    1
packages                          Installed packages                                                 4
pci                               PCI Device Consistency                                             1
perl                              Perl Interpreter                                                   1
ping                              Basic Network Connectivity                                         1
ip_consistency                    All-to-all Network Connectivity                                    1
process_check                     Stale Process Check                                                2
python                            Python Interpreter                                                 1
sh                                Bourne Shell                                                       1
shm_mount                         /dev/shm mount test                                                1
single_authentication             Single authentication domain                                       1
speedstep                         Intel® SpeedStep® Technology                                       1
ssh                               Node SSH Connectivity                                              1
stray_uids                        Files ownership UID/GID check                                      1
system_memory                     System Memory Uniformity                                           1
tcsh                              Enhanced C Shell                                                   1
tmp                               Permissions on /tmp                                                1
uid_sync                          User and Group Uniformity                                          1

Table 6.3 Default Test Modules

5.4. Optional Test Modules

For these test modules to be executed they should be explicitly requested by the user, either through the command line options (see 2.1.4 for the complete list of command line options) or editing the configuration file (see 1.2.3 for instructions using the configuration file).

Test module name          Description
clomp                     Intel® C++ Compiler Cluster OpenMP runtime library
copy_exactly              Copy Exactly! file trees
cron                      Cron Disabled
dmidecode                 Check the uniformity of the SMBIOS/DMI information
etc_hosts                 IP entries in /etc/hosts file
gcc                       GNU C/C++ compiler
generic_correctness       Generic correctness test
generic_uniformity        Generic uniformity test
host_conf                 /etc/host.conf Configuration
ibadm                     Mellanox InfiniBand In-band Monitor
imkl_hpl                  Intel® Optimized HPL Benchmark
intel_cc                  Intel® C++ Compiler
intel_fc                  Intel® Fortran Compiler
intel_ethernet_driver     Intel® Ethernet Network Drivers
intel_mpi                 Intel® MPI Library (Single-node)
intel_mpi_internode       Intel® MPI Library (All nodes)
intel_mpi_testsuite       Intel® MPI Library Test Suite
ipoib                     IP over InfiniBand
iwarp                     Check uniformity of iWarp devices
lsb                       Linux Standard Base (LSB*) Compliance
nisdomain                 NIS Domain
nismaps                   NIS Password Map Consistency
numactl                   Check NUMA Hardware and Policy Uniformity
openib                    InfiniBand Adapter Status (OpenIB)
portal                    Portal name resolution
processor_cache           Processor multiple layers cache test
processor_msr             Processor Model Specific Registers (MSRs)
ssh_version               SSH version uniformity
subnet_manager            InfiniBand Subnet Manager

Table 6.4 Optional Test Modules


6. Performance Test Modules

Intel® Cluster Checker includes several test modules that exercise the most widely used High Performance Computing benchmarks for clusters. This allows performance comparisons against a reference system to ensure proper health and functionality of the systems.

The following table summarizes the performance-related test modules. More details of each one can be found in the Intel® Cluster Checker Test Module Reference Guide.

Test module           Description                              Type
hdparm                hard disk read timings                   single-node
memory_bandwidth      STREAM*                                  single-node
mflops_intel_mkl      Intel® Math Kernel Library DGEMM         single-node
imb_pingpong_intel    Intel® MPI Benchmark - Ping Pong         pair-wise
imkl_hpl              Intel® Optimized HPL*                    cluster-wide
hpcc                  Intel® Optimized HPCC*                   cluster-wide

Table 7.1 Performance Test Modules

By default, these test modules are tuned to keep execution time short. However, the benchmarks can be configured to gather better performance numbers if required, by increasing their input problem size or changing their execution approach. Performance-related test modules can also validate the results obtained against user-provided thresholds if explicitly configured. In addition to the binary version of the benchmark, some of the test modules have a build option to compile it from the sources at execution time if required.

Intel® Cluster Checker includes three types of benchmarks: single-node, pair-wise and cluster-wide.

6.1. Single-node Benchmarks

The results of these benchmarks do not depend on the number of nodes in the cluster, so they can easily be used to compare nodes from clusters of different sizes. Most of the single-node benchmarks have a performance deviation check to verify that all nodes report similar performance.


The performance deviation is measured using the median and the standard deviation of the gathered performance values. Each value should be within a predefined number of standard deviations of the overall median. The allowed range can be summarized as (median ± factor × stddev), with factor equal to 3 by default. However, this factor can be configured by the user in each test module.
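The deviation rule can be written out as a short Python sketch. The function name deviation_outliers is hypothetical, and the use of the population standard deviation (statistics.pstdev) is an assumption, since the guide does not say which estimator the tool uses.

```python
import statistics


def deviation_outliers(values_by_node, factor=3.0):
    """Return nodes whose value lies outside median ± factor × stddev."""
    values = list(values_by_node.values())
    median = statistics.median(values)
    # Assumption: population standard deviation; the tool may use another estimator.
    stddev = statistics.pstdev(values)
    low, high = median - factor * stddev, median + factor * stddev
    return sorted(
        node for node, value in values_by_node.items() if not low <= value <= high
    )


# Twenty nodes report 100.0 and one slow node reports 50.0 (illustrative numbers).
results = {f"node{i}": 100.0 for i in range(1, 21)}
results["node21"] = 50.0
print(deviation_outliers(results))              # default factor of 3
print(deviation_outliers(results, factor=5.0))  # a looser factor accepts the slow node
```

With very few nodes a single outlier inflates the standard deviation enough to hide itself, so a check of this form is most informative on larger node counts.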

6.2. Pair-wise Benchmarks

Pair-wise benchmarks select combinations of two nodes in the cluster and test the communication performance between them. This helps to detect possible degradation in the fabrics used in the cluster. This kind of benchmark also has a performance deviation check to verify that all node pairs report similar performance.

6.3. Cluster-wide Benchmarks

Cluster-wide benchmarks do depend on the total number of nodes in the cluster. Therefore, their results can be used to understand the behavior of the cluster as a whole.

The HPCC* benchmark is a set of benchmarks used to exercise different components of a system. It includes benchmarks that are also provided as independent test modules, which allows both single-node and cluster-wide performance measurements to be gathered.


7. Heterogeneous Clusters

7.1. Nominal Hardware Variation

Intel® Cluster Checker must be configured to recognize nominal hardware variations. There is no limit on the number of nodes or types of nominal variations allowed within the cluster.

Intel® Cluster Checker should be configured to recognize nominal variation using the 'group' property. This feature requires editing the nodes list file and the XML configuration file.

The nodes list file supports groups through the 'group:' label followed by the name of the group. For example, the following file defines 4 nodes where node2 and node3 have different processor models and the other ones have similar hardware.

# list of nodes to check
node1  # head
node2  # group: XeonE5506
node3  # group: XeonX5560
node4

In addition, the XML configuration file should be edited to use the created groups. Every test module that checks hardware uniformity should contain a <group> tag with the corresponding group name. The following example shows how to configure the hardware_uniformity test module.

<hardware_uniformity>
  <group name="XeonE5506"/>
  <group name="XeonX5560"/>
</hardware_uniformity>
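As a quick way to confirm that group names in the nodes list match those in the XML configuration file, the nodes list format can be parsed with a few lines of Python. The parser below is a sketch based only on the examples in this chapter; the function name parse_nodes_file and the returned tuple layout are hypothetical, and the real file format may accept more than is handled here.

```python
def parse_nodes_file(text):
    """Return a list of (node, is_head, group) tuples; group is None when unlabeled."""
    nodes = []
    for line in text.splitlines():
        entry, _, comment = line.partition("#")
        name = entry.strip()
        if not name:
            continue  # blank line or comment-only line
        words = comment.split()
        group = None
        if "group:" in words:
            index = words.index("group:")
            if index + 1 < len(words):
                group = words[index + 1]
        nodes.append((name, "head" in words, group))
    return nodes


NODES = """# list of nodes to check
node1  # head
node2  # group: XeonE5506
node3  # group: XeonX5560
node4
"""

for node in parse_nodes_file(NODES):
    print(node)
```

The set of group names collected this way can then be compared by hand with the name attributes of the <group> tags.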

7.2. Sub-clusters

Sub-clusters are internal divisions of a single cluster where some nodes may have completely different hardware capabilities from others. As defined by the resource manager, jobs may or may not span sub-clusters; the application proxy test modules in Intel® Cluster Checker (e.g. HPCC) may be configured to check sub-clusters independently if jobs will not span sub-clusters.


It is possible to configure Intel® Cluster Checker to use a different configuration for each sub-cluster. This must be reflected in both the nodes file and the XML configuration file. The nodes file should be configured using the ‘group:’ label. The following example shows how to define five nodes in the subCluster1 group and four nodes in the subCluster2 group.

# list of nodes to check
node1  # head group: subCluster1
node2  # group: subCluster1
node3  # group: subCluster1
node4  # group: subCluster1
node5  # group: subCluster1
node6  # group: subCluster2
node7  # group: subCluster2
node8  # group: subCluster2
node9  # group: subCluster2

The XML configuration file should use the <group> tag. The name for each group should be the same in both files. The following example shows how to configure the hpcc test module for the two different groups.

<hpcc>
  <group name="subCluster1">
    <cc-path>/opt/intel/cce/9.1.038/</cc-path>
    <fabric>
      <bandwidth>.015</bandwidth>
      <device>sock</device>
      <dgemm>8.5</dgemm>
      <fft>.5</fft>
      <hpl>.023</hpl>
      <latency>60</latency>
      <ptrans>.15</ptrans>
      <randomaccess>.002</randomaccess>
      <stream>0.9</stream>
    </fabric>
    <fabric>
      <bandwidth>.25</bandwidth>
      <device>rdssm</device>
      <dgemm>8.5</dgemm>
      <fft>.5</fft>
      <hpl>.026</hpl>
      <latency>20</latency>
      <ptrans>1.0</ptrans>
      <randomaccess>.012</randomaccess>
      <stream>0.9</stream>
    </fabric>
    <mkl-path>/opt/intel/cmkl/9.0/</mkl-path>
    <mpi-path>/opt/intel/impi/3.0/</mpi-path>
    <process-number>4</process-number>
    <thread-number>1</thread-number>
  </group>
  <group name="subCluster2">
    <cc-path>/opt/intel/cce/9.1.038/</cc-path>
    <fabric>
      <bandwidth>.03</bandwidth>
      <device>sock</device>
      <dgemm>8.4</dgemm>
      <fft>1.04</fft>
      <hpl>.020</hpl>
      <latency>25</latency>
      <ptrans>.2</ptrans>
      <randomaccess>.002</randomaccess>
      <stream>2.69</stream>
    </fabric>
    <fabric>
      <bandwidth>.8</bandwidth>
      <device>rdssm</device>
      <dgemm>8.4</dgemm>
      <fft>1.025</fft>
      <hpl>.022</hpl>
      <latency>8</latency>
      <ptrans>.32</ptrans>
      <randomaccess>.012</randomaccess>
      <stream>2.69</stream>
    </fabric>
    <mkl-path>/opt/intel/cmkl/9.0/</mkl-path>
    <mpi-path>/opt/intel/impi/3.0/</mpi-path>
    <thread-number>1</thread-number>
  </group>
</hpcc>

7.3. Fat Nodes

‘Fat nodes’ are nodes with a hardware super-set relative to other nodes used for the same purpose, e.g. extra memory or additional secondary storage compared to a 'regular' compute node.


Intel® Cluster Checker must be configured to recognize the 'fat' node variation from 'regular' nodes. For example, in order to support nodes with extra memory or secondary storage, the nodes on the node list should be configured as part of a ‘fat’ group.

    # list of nodes to check
    node1  # head
    node2  # group: fat
    node3
    node4
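As an illustration of how group annotations in a nodelist can be read, the following Python sketch maps each node name to the groups declared in its trailing comment. It is not part of Intel® Cluster Checker; the function name parse_nodelist and the rule that a group appears as "group: <name>" inside the comment are assumptions drawn from the examples in this guide.

```python
import re


def parse_nodelist(text):
    """Return {node_name: [group, ...]} for every node line in the text.

    Assumption: a node line is "<node>  # <annotations>", and a group is
    declared inside the comment as "group: <name>".
    """
    nodes = {}
    for raw in text.splitlines():
        entry, _, comment = raw.partition("#")
        name = entry.strip()
        if not name:
            continue  # blank line or a pure comment such as the header line
        nodes[name] = re.findall(r"group:\s*(\S+)", comment)
    return nodes


NODELIST = """\
# list of nodes to check
node1  # head
node2  # group: fat
node3
node4
"""

print(parse_nodelist(NODELIST))
```

Only node2 is reported as a member of the fat group; the remaining nodes carry no group and would therefore receive the default values of each test module.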

The special characteristics of the fat node should be specified in the XML Configuration file by defining different parameters for specific test modules. The following example shows how to configure the system_memory test module.

    <system_memory>
      <group name="fat">
        <physical>8388608</physical>
      </group>
      <physical>4194304</physical>
      <swap>4194304</swap>
    </system_memory>


8. Automatic Configuration

This section provides the details of the Intel® Cluster Checker automatic configuration feature.

8.1. Overview

The --autoconfigure option simplifies the initial configuration required to run the tool. At the beginning of the execution, the tool scans the cluster nodes to gather the information used to complete the basic configuration file provided by the user. The feature can detect the cluster nodes, configure the paths to the Intel® Cluster Runtimes tools, and configure the thresholds of the single-node performance test modules. This section explains the details of each one.

8.2. Command Line Options

--autoconfigure [OPTIONS]
Instructs the tool to automatically set some configuration parameters. The option may be shortened, for instance to --auto.

Note that automatic configuration mode requires a basic configuration file to run. Therefore, the configuration file must either be passed on the command line together with the above option or be available at one of its default locations (see 1.2).

8.2.1. Automatic Configuration Options

The optional additional parameters are distributed across two categories: options to control the targets for automatic configuration and options to control how the new configuration is stored. The options provided should be entered separated by commas as a single string with no spaces in between.

8.2.1.1. Automatic Configuration Targets

These options control which parts of the configuration are subject to automatic configuration. When a specific target option is provided, only that target will be automatically configured. If no option is provided, the tool will default to all the targets available to the user running the tool.

• global (enabled by default)
Use the global configuration feature (see 1.2.4.2) to automatically set the path to the Intel® Cluster Runtimes Tools. The target configuration options are <cc-path>, <fc-path>, <mpi-path> and <mkl-path>. If the paths are already configured, they will be redefined.

The tools must be installed according to the Intel® Cluster Ready Specification and be uniform across all nodes. If more than one version of a tool is detected, the latest one will be used.

The following example will attempt to perform global path configuration. A backup file will be stored in the same directory as the provided configuration file before any modification is made.

Example:

cluster-check --auto global config.xml

• nodes (enabled by default)
Automatically discover the nodes of the cluster and, if applicable, create a new nodelist file. Automatic node discovery is disabled if the configuration file contains the path to a nodelist file or if the --nodefile command line option is provided. For more details on discovery, see 8.4. If performance automatic configuration is also used, the nodelist file will also contain the hardware discovered on each node, for advanced usage of the grouping feature (for more details on hardware grouping, see 8.5.1).

The following example will attempt to perform compute node discovery. A new nodelist file will be automatically generated in the same location as the provided configuration file.

Example:

cluster-check --auto nodes /home/icr/clck_conf/config.xml

• performance (privileged user only)
Scan the cluster compute nodes to detect the main hardware components and, based on the information gathered, perform a basic heuristic calculation to set the thresholds for single-node performance test modules. Although the directly targeted test modules are mflops_intel_mkl and memory_bandwidth_stream, other test modules may benefit from this feature. See 8.6 for more details. This option is available only to the privileged user and is enabled by default.

The following example will attempt to configure the performance thresholds. A backup configuration file will be generated before the default configuration file is modified. Note that since no configuration file is provided, the tool will look for it at the default locations (see Getting Started With Intel® Cluster Checker for more details).

Example:

cluster-check --auto performance

The value of the <user> tag is automatically detected when running as a privileged user and, if applicable, it will be included in the generated configuration file.

The feature checks whether 'clck' or 'icr' is a valid user on the system. An alternative user name can be provided with the <user> tag or with an environment variable named CLCK_REGULAR_USER.
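The lookup described above can be pictured with the following Python sketch. It is only an illustration: the function names are hypothetical, and the order in which the sources are consulted (the <user> tag first, then CLCK_REGULAR_USER, then 'clck' and 'icr') is an assumption, since this guide does not state the precedence.

```python
import os
import pwd  # Unix-only standard library module

DEFAULT_CANDIDATES = ("clck", "icr")


def user_exists(name):
    """Return True if the name is a valid user on this system."""
    try:
        pwd.getpwnam(name)
        return True
    except KeyError:
        return False


def pick_regular_user(configured=None, env=os.environ, exists=user_exists):
    """Return the first valid user name, or None if no candidate is valid.

    Assumed order: <user> tag value, CLCK_REGULAR_USER, then the defaults.
    """
    candidates = [configured, env.get("CLCK_REGULAR_USER"), *DEFAULT_CANDIDATES]
    for name in candidates:
        if name and exists(name):
            return name
    return None


# Demonstration with a stand-in user database instead of the real one.
known = {"icr", "alice"}
print(pick_regular_user(None, {}, lambda n: n in known))
print(pick_regular_user("alice", {}, lambda n: n in known))
```

Passing the existence check as a parameter keeps the sketch testable on machines where none of these accounts exist.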

8.2.1.2. Files Handling

These options control how files will be handled to save the results of auto-configuration. They are mutually exclusive, meaning that only one can be used at a time. If no file handling option is provided, the tool will set backup by default.

If no configuration file is provided, a set of default locations will be searched, as detailed above.

• backup (enabled by default)
Create a backup of the files provided by the user and edit the provided ones. The backup will have the same file base name with the .backup suffix and a time-stamp. This targets the XML configuration file and the nodelist file, when applicable.

It optionally accepts a path to define the destination of the backups; otherwise the path of the original files is used.

The following example will attempt to perform compute node discovery, global path configuration and performance threshold configuration if executed as root. A backup file will be created inside the /home/icr/clck_conf directory before the provided configuration file is modified.

Example:

cluster-check --auto backup=/home/icr/clck_conf config.xml

• newfile
Create new files containing the automatically configured parameters. The new files will have the same file base name with the .new suffix and a time-stamp. This includes the XML configuration file and the nodelist file, when applicable. It optionally accepts a path to define the destination of the new files; otherwise the path of the original files is used.

The following example will attempt to perform compute node discovery, global path configuration and performance threshold configuration if executed as root. The results will be written into a new configuration file inside the /home/icr/clck_conf directory.

Example:

cluster-check --auto newfile=/home/icr/clck_conf config.xml

• overwrite
Overwrite the file provided by the user with the automatically configured parameters. This will target the XML configuration file and the nodelist file, if required. Non-compliant XML information may be lost as a side effect.

The following example will attempt to perform compute node discovery, global path configuration and performance threshold configuration if executed as root. The results will be written into the provided configuration file.

Example:

cluster-check --auto overwrite config.xml

• nowrite
Do not save the automatic configuration; just use it in the current execution.


The following example will attempt to perform compute node discovery, global path configuration and performance threshold configuration if executed as root. The results will not affect the configuration file provided.

Example:

cluster-check --auto nowrite config.xml

In the cases where nodes automatic discovery is used and no nodelist file is provided, a new nodelist file will be created at the same path of the XML configuration file targeted by automatic configuration. The name of the created nodelist file will be nodelist.<timestamp>.auto.

Note that the automatic configuration will maintain any configuration provided in the user configuration file. If this file references external XML files using XInclude, the automatic option parses the referenced XML files and includes them in the new file. Therefore, no XInclude directives will appear in the final configuration file.

As a complex example, the following command will attempt automatic configuration of global path configuration and compute nodes discovery. The config.xml file will be used as a starting point but a new file will be created at the /etc/intel/clck directory.

Example:

cluster-check --auto newfile=/etc/intel/clck,global,nodes config.xml
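The rules for the option string (a single comma-separated string with no spaces, at most one file handling option, a path only for backup and newfile) can be sketched in Python as follows. The helper names build_auto_option and build_command are hypothetical and only assemble an argument list; they do not run the tool, and placing the file handling option before the targets simply mirrors the example above.

```python
TARGETS = ("global", "nodes", "performance")
FILE_MODES = ("backup", "newfile", "overwrite", "nowrite")
PATH_MODES = ("backup", "newfile")  # modes documented as accepting "=<path>"


def build_auto_option(targets=(), file_mode=None, path=None):
    """Return the comma-separated string that follows --auto (may be empty)."""
    unknown = [t for t in targets if t not in TARGETS]
    if unknown:
        raise ValueError(f"unknown target(s): {unknown}")
    parts = []
    if file_mode is not None:
        if file_mode not in FILE_MODES:
            raise ValueError(f"unknown file handling option: {file_mode}")
        if path is not None and file_mode not in PATH_MODES:
            raise ValueError(f"{file_mode} does not take a path")
        parts.append(file_mode if path is None else f"{file_mode}={path}")
    elif path is not None:
        raise ValueError("a path needs backup or newfile")
    parts.extend(targets)
    option = ",".join(parts)
    if any(ch.isspace() for ch in option):
        raise ValueError("the option string must not contain spaces")
    return option


def build_command(config, **kwargs):
    """Return the argument list for a cluster-check automatic configuration run."""
    option = build_auto_option(**kwargs)
    return ["cluster-check", "--auto"] + ([option] if option else []) + [config]


print(build_command("config.xml", targets=["global", "nodes"],
                    file_mode="newfile", path="/etc/intel/clck"))
```

Because file_mode is a single argument, the mutual exclusivity of the file handling options is enforced by the function signature itself.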

8.3. Console output and Logs

At the beginning of the execution, the tool prints on the screen the configuration with all the automatically configured parameters and, if applicable, the name of the file that contains it. When the default verbosity is increased, the tool also displays the hardware information gathered and the thresholds calculated by the performance automatic configuration option. See 8.5.2 for more details.

If the input parameters are modified by auto-configuration, the output XML log will include both configurations: the one modified by automatic configuration (which was actually used during execution) and the original one provided by the user.

Additionally, a log of the commands executed during node discovery and hardware scanning can be generated with the usual debugging option (--debug or <debug/>; see 2.1.3 for more details).


8.4. Cluster Nodes Automatic Discovery

When the nodes auto-configuration option is enabled the tool will discover the compute nodes available in the cluster using a cluster-wide command (it assumes it runs on the front-end/head node). Currently the tool has built in support for: ROCKS+* up to version 5.3, PCM* up to version 2.1 and Perceus* 1.5.3. For other provisioning systems use the <nodelist_cmd> configuration tag (see 8.4.1).

8.4.1. Configuration Options

This configuration will be used by the tool only when running in automatic configuration mode.

<nodelist_cmd>print_cluster_nodes_command</nodelist_cmd>
For provisioning systems not supported by default, use the <nodelist_cmd> configuration tag to indicate the exact command that returns the list of compute nodes in the following format:

    head-node
    compute-node-1
    compute-node-2
    compute-node-3

Note that only the names of the nodes should be present in the output, one per line with no extra characters. Also, white-space and comments will be removed from the output before the actual execution.

Example:

    <cluster>
      <nodelist_cmd>/opt/rocks/bin/dbreport nodes</nodelist_cmd>
      . . .
    </cluster>
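To check that a custom command produces output in the expected shape, a clean-up step similar to the one described above can be sketched in Python. The function name clean_node_output is hypothetical, and treating '#' as the comment marker is an assumption based on the nodelist examples in this guide.

```python
def clean_node_output(output):
    """Return the node names found in the command output, one per line.

    Comments (assumed to start with '#') and surrounding white-space are
    dropped; lines left empty after that are ignored.
    """
    names = []
    for line in output.splitlines():
        name = line.split("#", 1)[0].strip()
        if name:
            names.append(name)
    return names


SAMPLE = """\
head-node
  compute-node-1
# maintenance note
compute-node-2   # rack 1

compute-node-3
"""

print(clean_node_output(SAMPLE))
```

A command whose cleaned output still contains anything other than plain node names is unlikely to be usable with <nodelist_cmd>.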

8.5. Performance Thresholds Automatic Configuration

Because the performance of each node depends on its hardware components, the auto-configuration performs a scan on each compute node to discover its bill of materials. With the information gathered a simplistic heuristic calculation is performed. If the tool cannot calculate the value, a fallback mechanism is used. This mechanism also allows the user to alter the default behavior. For details see 8.6.3.


8.5.1. Hardware Scanning

During performance auto-configuration Intel® Cluster Checker queries each compute node and executes the commands listed below:

• dmidecode
• lspci
• cat /proc/cpuinfo

With the information gathered, the compute node list is edited to detail the main hardware components of each compute node. Each compute node is associated with the description of its hardware components through the "group" feature of Intel® Cluster Checker. The configuration is then edited with the created "groups" so that the different settings are matched to each node during the testing phase (see 1.1 and 1.2.4 for more details on the group feature).

The following table shows the hardware components that are used to create groups and differentiate compute nodes. It also includes some examples of group names for each hardware component.

| Hardware | Source | Group name examples |
| --- | --- | --- |
| Processor | Processor model string | X5355, E5506 |
| Sockets | Processors quantity | 1_PROCESSOR, 2_PROCESSOR |
| Base Board | Base board identifier | S5400SF, X38ML |
| Memory Speed | <type>_SPEED_<MHz> | DIMM_SPEED_800, MM_SPEED_800 |
| Memory Size | MM_SIZE_<MB> | MM_SIZE_4096, MM_SIZE_12288 |
| Ethernet | Ethernet device identifier | 82575EB, 82598EB |
| Infiniband* | Infiniband device identifier | MT25208, MT25204 |

Table 9.1 Hardware Scanning components
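The naming patterns in the table can be illustrated with a short Python sketch that turns a set of hardware facts into group names. The function hardware_groups and its parameters are hypothetical; how the tool actually extracts these values from dmidecode, lspci and /proc/cpuinfo is not described in this guide.

```python
def hardware_groups(processor_model, sockets, base_board,
                    mem_type, mem_mhz, mem_mb,
                    ethernet=None, infiniband=None):
    """Return group names that follow the patterns listed in the table."""
    groups = [
        processor_model,                # e.g. X5355
        f"{sockets}_PROCESSOR",         # e.g. 2_PROCESSOR
        base_board,                     # e.g. S5400SF
        f"{mem_type}_SPEED_{mem_mhz}",  # e.g. DIMM_SPEED_800
        f"MM_SIZE_{mem_mb}",            # e.g. MM_SIZE_4096
    ]
    # Network devices are optional: a node may lack one or both.
    groups += [dev for dev in (ethernet, infiniband) if dev]
    return groups


print(hardware_groups("X5355", 2, "S5400SF", "DIMM", 800, 4096,
                      ethernet="82575EB", infiniband="MT25208"))
```

Each node ends up in several groups at once, which is what makes combined expressions such as "X38ML AND X3230" possible in the configuration file.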

8.5.2. Additional Output

If required, the output to the screen can be increased with --verbose (or <verbose>) to show more information.

Verbose level:

4 The types and groups for each node.

5 Everything shown at level 4, plus the configuration values for single-node performance test modules: node, test module, parameter, value, step in which that value was obtained, and groups used for that value (if applicable).

8.5.3. Benchmarking and Performance Disclaimers

Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel® products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering to purchase. For more information on performance tests and on the performance of Intel® products, visit the Intel® Performance Benchmark Limitations website.

8.6. Automatic Configuration Advanced Usage

8.6.1. Group Configuration Alternatives

Because compute nodes are automatically grouped during performance automatic configuration, it is possible to create beforehand a configuration file that can be used in different clusters. Such a configuration file has parameters that match different values according to the hardware available on the compute nodes and the groups created. It is also possible to configure default values for compute nodes that do not match a specific hardware combination, simply by placing the configuration parameters outside any group. Single-node performance test modules (mflops_intel_mkl, memory_bandwidth_stream, imb_pingpong_intel_mpi and hdparm) follow specific steps to determine the configuration thresholds against which compute node performance is tested (see details in 8.6.3).

The following example shows how to configure the core_count test module to use different values of logical and physical cores based on the detected base board and processor model. Also note that default values (4 logical and 4 physical cores) are set for all compute nodes that do not match any group.

    <core_count>
      <group name="X38ML AND X3230">
        <logical-cores>4</logical-cores>
        <physical-cores>4</physical-cores>
      </group>
      <group name="S5000PAL AND X5355">
        <logical-cores>8</logical-cores>
        <physical-cores>8</physical-cores>
      </group>
      <group name="S5520UR AND X5355">
        <logical-cores>16</logical-cores>
        <physical-cores>8</physical-cores>
      </group>
      <logical-cores>4</logical-cores>
      <physical-cores>4</physical-cores>
    </core_count>
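The way such a configuration could be resolved for one node is sketched below in Python, using the standard xml.etree.ElementTree module. This is an illustration, not the tool's implementation: the function settings_for is hypothetical, and it assumes that "A AND B" means the node belongs to both groups, that the first matching <group> wins, and that a matching group is not merged with the defaults.

```python
import xml.etree.ElementTree as ET

CORE_COUNT_XML = """
<core_count>
  <group name="X38ML AND X3230">
    <logical-cores>4</logical-cores>
    <physical-cores>4</physical-cores>
  </group>
  <group name="S5000PAL AND X5355">
    <logical-cores>8</logical-cores>
    <physical-cores>8</physical-cores>
  </group>
  <group name="S5520UR AND X5355">
    <logical-cores>16</logical-cores>
    <physical-cores>8</physical-cores>
  </group>
  <logical-cores>4</logical-cores>
  <physical-cores>4</physical-cores>
</core_count>
"""


def settings_for(module, node_groups):
    """Return the parameters that apply to a node with the given groups."""
    def values(element):
        # Collect direct parameter children, skipping nested <group> elements.
        return {child.tag: child.text for child in element if child.tag != "group"}

    for group in module.findall("group"):
        required = [term.strip() for term in group.get("name").split(" AND ")]
        if all(term in node_groups for term in required):
            return values(group)
    return values(module)  # defaults placed outside every group


module = ET.fromstring(CORE_COUNT_XML)
print(settings_for(module, {"S5520UR", "X5355", "2_PROCESSOR"}))
print(settings_for(module, {"S5400SF", "X5472"}))
```

The first call matches the "S5520UR AND X5355" group; the second matches no group and falls back to the default values.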

8.6.2. Heterogeneous Hardware Support

Most of the test modules that check hardware homogeneity may be configured to take advantage of the on-the-fly node grouping of the performance automatic configuration. By editing the configuration file, it is possible to have test modules compare nodes in groups. The following configuration example shows how to tell the dmidecode test module to differentiate compute nodes (by base board and processor model) and compare each one only with other nodes belonging to the same group. Note that nodes that do not match any configured group will be considered part of the "default" group.

    <dmidecode>
      <group name="S5400SF AND X5472"></group>
      <group name="X38ML AND X5355"></group>
    </dmidecode>

It is important to have a comprehensive knowledge of the hardware components of each compute node to create the correct configuration. The following is a short list of the test modules checking hardware homogeneity:

• arch
• core_count
• core_frequency
• cpuinfo
• dmidecode
• hardware_uniformity
• iwarp
• pci
• processor_cache

8.6.3. Single Node Performance

In the case of the single-node performance test modules, the automatic configuration mode offers a fallback mechanism that looks for different configurations available at execution time. The following list shows the order of precedence used to obtain the required configuration:

A- Direct group match in XML configuration file.
B- Heuristic thresholds calculation.
C- Default threshold value in XML configuration file.
D- Fallback floor value according to historic figures.

The following table shows the steps that apply to each single-node performance test module:

| Test Module | Steps |
| --- | --- |
| mflops_intel_mkl | A, B, C, D |
| memory_bandwidth_stream | A, B, C, D |
| imb_pingpong_intel_mpi | A, C, D |
| hdparm | A, C, D |

Table 9.2 Single Node Performance thresholds completion steps

This mechanism is used to obtain the threshold for each configuration parameter of the above listed test modules. Therefore, for test modules with more than one configuration parameter it is possible to obtain different parameter values at different steps.
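The order of precedence can be summarized with a small Python sketch. The function resolve_threshold is hypothetical, and the numeric values in the demonstration are placeholders chosen only to show which step is selected; they are not real thresholds.

```python
# Steps that apply to each single-node performance test module (Table 9.2).
STEPS = {
    "mflops_intel_mkl": "ABCD",
    "memory_bandwidth_stream": "ABCD",
    "imb_pingpong_intel_mpi": "ACD",
    "hdparm": "ACD",
}


def resolve_threshold(module, candidates):
    """Return (value, step) from the first applicable step that has a value.

    candidates maps a step letter (A, B, C or D) to a value, or to None
    when that step produced nothing for the parameter.
    """
    for step in STEPS[module]:
        value = candidates.get(step)
        if value is not None:
            return value, step
    raise LookupError(f"no threshold available for {module}")


# No group match (A), but the heuristic calculation (B) produced a value.
print(resolve_threshold("mflops_intel_mkl", {"A": None, "B": 30000, "C": 21328, "D": 1000}))
# hdparm has no heuristic step, so a value under B is never considered.
print(resolve_threshold("hdparm", {"B": 99, "C": 50, "D": 10}))
```

Calling the function once per configuration parameter reflects the point above: two parameters of the same test module may be resolved at different steps.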

8.6.3.1. Direct Group Match

In the case that the user knows the expected performance figures for each compute node with different hardware components, the user can build a configuration file with the specific values for each one. This option has the highest precedence. So, if a match for a node is found, it will be used.


The following table shows the hardware components that affect single-node performance test modules and gives examples of how to create groups to differentiate configuration values. Note that this table is only a reference and that nodes may be grouped according to the user's criteria.

| Test Module | Hardware | Group name examples |
| --- | --- | --- |
| imb_pingpong_intel_mpi | Ethernet Device, Infiniband Device | 82575EB AND MT25204 |
| memory_bandwidth_stream | Base Board Identifier, Memory Type | X38ML AND DIMM_SPEED_667 |
| mflops_intel_mkl | Processor Count, Processor Identifier | 1_PROCESSOR AND X3230 |

Table 9.3 Hardware components for single-node performance

8.6.3.2. Heuristic Thresholds Calculation

A simple theoretical value is set based on the characteristics of the hardware components detected. Only the memory_bandwidth_stream and mflops_intel_mkl test modules support this approach.

8.6.3.3. Default Threshold Configuration

If neither a direct group match nor a heuristic calculation is possible, the tool searches the configuration file for a default value. This value should be configured for each test module by leaving it outside all groups. The following example shows how to configure the default value in the mflops_intel_mkl test module. The value 21328 will be used to test all nodes that do not belong to the group "2_PROCESSOR AND E5506".

    <mflops_intel_mkl>
      <group name="2_PROCESSOR AND E5506">
        <mflops>34128</mflops>
      </group>
      <mflops>21328</mflops>
    </mflops_intel_mkl>

8.6.3.4. Fallback Floor Value


If none of the previous alternatives succeeds, a historical fallback value is used. This value is taken from the lowest-performing system known at the moment. It is intended only to show that the device or feature being tested is working, and by no means that it is performing optimally. In the imb_pingpong_intel_mpi test module, floor values are filled in only for the rdssm fabric.


9. Third Party Copyright Notices

The product is comprised of the following software and the following information is made available in compliance with these licenses:

The Intel® MPI Library Test Suite is based in part on the MPI C++ Test Suite. This product includes software developed at the Ohio Supercomputer Center at The Ohio State University, the University of Notre Dame and the Pervasive Technology Labs at Indiana University with original ideas contributed from Cornell University. For technical information contact Andrew Lumsdaine at the Pervasive Technology Labs at Indiana University. For administrative and license questions contact the Advanced Research and Technology Institute at 1100 Waterway Blvd. Indianapolis, Indiana 46202, phone 317-274-5905, fax 317-274-5902.

Software License for LAM/MPI

Copyright (c) 2001-2003 The Trustees of Indiana University. All rights reserved.
Copyright (c) 1998-2001 University of Notre Dame. All rights reserved.
Copyright (c) 1994-1998 The Ohio State University. All rights reserved.

Indiana University has the exclusive rights to license this product under the following license.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1) All redistributions of source code must retain the above copyright notice, the list of authors in the original source code, this list of conditions and the disclaimer listed in this license;

2) All redistributions in binary form must reproduce the above copyright notice, this list of conditions and the disclaimer listed in this license in the documentation and/or other materials provided with the distribution;

3) Any documentation included with all redistributions must include the following acknowledgement:

"This product includes software developed at the Ohio Supercomputer Center at The Ohio State University, the University of Notre Dame and the Pervasive Technology Labs at Indiana University with original ideas contributed from Cornell University. For technical information contact Andrew Lumsdaine at the Pervasive Technology Labs at Indiana University. For administrative and license questions contact the Advanced Research and Technology Institute at 1100 Waterway Blvd. Indianapolis, Indiana 46202, phone 317-274-5905, fax 317-274-5902."

Alternatively, this acknowledgement may appear in the software itself, and wherever such third-party acknowledgments normally appear.

4) The name "LAM" or "LAM/MPI" shall not be used to endorse or promote products derived from this software without prior written permission from Indiana University. For written permission, please contact Indiana University Advanced Research & Technology Institute.

5) Products derived from this software may not be called "LAM" or "LAM/MPI", nor may "LAM" or "LAM/MPI" appear in their name, without prior written permission of Indiana University Advanced Research & Technology Institute.

Indiana University provides no reassurances that the source code provided does not infringe the patent or any other intellectual property rights of any other entity. Indiana University disclaims any liability to any recipient for claims brought by any other entity based on infringement of intellectual property rights or otherwise.

LICENSEE UNDERSTANDS THAT SOFTWARE IS PROVIDED "AS IS" FOR WHICH NO WARRANTIES AS TO CAPABILITIES OR ACCURACY ARE MADE. INDIANA UNIVERSITY GIVES NO WARRANTIES AND MAKES NO REPRESENTATION THAT SOFTWARE IS FREE OF INFRINGEMENT OF THIRD PARTY PATENT, COPYRIGHT, OR OTHER PROPRIETARY RIGHTS. INDIANA UNIVERSITY MAKES NO WARRANTIES THAT SOFTWARE IS FREE FROM "BUGS", "VIRUSES", "TROJAN HORSES", "TRAP DOORS", "WORMS", OR OTHER HARMFUL CODE. LICENSEE ASSUMES THE ENTIRE RISK AS TO THE PERFORMANCE OF SOFTWARE AND/OR ASSOCIATED MATERIALS, AND TO THE PERFORMANCE AND VALIDITY OF INFORMATION GENERATED USING SOFTWARE.

Indiana University has the exclusive rights to license this product under this license.

Intel® MPI Benchmarks is made available under the Common Public License. The source code is made available with this product and may also be downloaded from http://www.intel.com/cd/software/products/asmo-na/eng/219848.htm.

Intel® MPI Benchmarks (Common Public License)

IMPORTANT - READ BEFORE COPYING, INSTALLING OR USING

Common Public License Version 1.0


THE ACCOMPANYING PROGRAM IS PROVIDED UNDER THE TERMS OF THIS COMMON PUBLIC LICENSE ("AGREEMENT"). ANY USE, REPRODUCTION OR DISTRIBUTION OF THE PROGRAM CONSTITUTES RECIPIENT'S ACCEPTANCE OF THIS AGREEMENT.

DEFINITIONS

"Contribution" means:

    * in the case of the initial Contributor, the initial code and documentation distributed under this Agreement, and
    * in the case of each subsequent Contributor:
          o changes to the Program, and
          o additions to the Program;

where such changes and/or additions to the Program originate from and are distributed by that particular Contributor. A Contribution 'originates' from a Contributor if it was added to the Program by such Contributor itself or anyone acting on such Contributor's behalf. Contributions do not include additions to the Program which: (i) are separate test modules of software distributed in conjunction with the Program under their own license agreement, and (ii) are not derivative works of the Program.

"Contributor" means any person or entity that distributes the Program.

"Licensed Patents " mean patent claims licensable by a Contributor which are necessarily infringed by the use or sale of its Contribution alone or when combined with the Program.

"Program" means the Contributions distributed in accordance with this Agreement.

"Recipient" means anyone who receives the Program under this Agreement, including all Contributors.

GRANT OF RIGHTS

Subject to the terms of this Agreement, each Contributor hereby grants Recipient a non-exclusive, worldwide, royalty-free copyright license to reproduce, prepare derivative works of, publicly display, publicly perform, distribute and sublicense the Contribution of such Contributor, if any, and such derivative works, in source code and object code form.

Subject to the terms of this Agreement, each Contributor hereby grants Recipient a non-exclusive, worldwide, royalty-free patent license under Licensed Patents to make, use, sell, offer to sell, import and otherwise transfer the Contribution of such Contributor, if any, in source code and object code form. This patent license shall apply to the combination of the Contribution and the Program if, at the time the Contribution is added by the Contributor, such addition of the Contribution causes such combination to be covered by the Licensed Patents. The patent license shall not apply to any other combinations which include the Contribution. No hardware per se is licensed hereunder.

Recipient understands that although each Contributor grants the licenses to its Contributions set forth herein, no assurances are provided by any Contributor that the Program does not infringe the patent or other intellectual property rights of any other entity. Each Contributor disclaims any liability to Recipient for claims brought by any other entity based on infringement of intellectual property rights or otherwise. As a condition to exercising the rights and licenses granted hereunder, each Recipient hereby assumes sole responsibility to secure any other intellectual property rights needed, if any. For example, if a third party patent license is required to allow Recipient to distribute the Program, it is Recipient's responsibility to acquire that license before distributing the Program.

Each Contributor represents that to its knowledge it has sufficient copyright rights in its Contribution, if any, to grant the copyright license set forth in this Agreement.

REQUIREMENTSA Contributor may choose to distribute the Program in object code form under its own license agreement, provided that:

    * it complies with the terms and conditions of this Agreement; and    * its license agreement:          o effectively disclaims on behalf of all Contributors all warranties   and   conditions,   express   and   implied,   including warranties or conditions of title and non­infringement, and implied warranties   or   conditions   of   merchantability   and   fitness   for   a particular purpose;          o effectively excludes on behalf of all Contributors all liability   for   damages,   including   direct,   indirect,   special, incidental and consequential damages, such as lost profits;                   o states that any provisions which differ from this Agreement are offered by that Contributor alone and not by any other party; and          o states that source code for the Program is available from such Contributor, and informs licensees how to obtain it in a reasonable   manner   on   or   through   a   medium   customarily   used   for software exchange.

      When the Program is made available in source code form:

          o it must be made available under this Agreement; and

          o a copy of this Agreement must be included with each copy of the Program.

Contributors may not remove or alter any copyright notices contained within the Program.

Each Contributor must identify itself as the originator of its Contribution, if any, in a manner that reasonably allows subsequent Recipients to identify the originator of the Contribution.

COMMERCIAL DISTRIBUTION

Commercial distributors of software may accept certain responsibilities with respect to end users, business partners and the like. While this license is intended to facilitate the commercial use of the Program, the Contributor who includes the Program in a commercial product offering should do so in a manner which does not create potential liability for other Contributors. Therefore, if a Contributor includes the Program in a commercial product offering, such Contributor ("Commercial Contributor") hereby agrees to defend and indemnify every other Contributor ("Indemnified Contributor") against any losses, damages and costs (collectively "Losses") arising from claims, lawsuits and other legal actions brought by a third party against the Indemnified Contributor to the extent caused by the acts or omissions of such Commercial Contributor in connection with its distribution of the Program in a commercial product offering. The obligations in this section do not apply to any claims or Losses relating to any actual or alleged intellectual property infringement. In order to qualify, an Indemnified Contributor must:

    * promptly notify the Commercial Contributor in writing of such claim, and
    * allow the Commercial Contributor to control, and cooperate with the Commercial Contributor in, the defense and any related settlement negotiations. The Indemnified Contributor may participate in any such claim at its own expense.

For example, a Contributor might include the Program in a commercial product offering, Product X. That Contributor is then a Commercial Contributor. If that Commercial Contributor then makes performance claims, or offers warranties related to Product X, those performance claims and warranties are such Commercial Contributor's responsibility alone. Under this section, the Commercial Contributor would have to defend claims against the other Contributors related to those performance claims and warranties, and if a court requires any other Contributor to pay any damages as a result, the Commercial Contributor must pay those damages.

NO WARRANTY

EXCEPT AS EXPRESSLY SET FORTH IN THIS AGREEMENT, THE PROGRAM IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OR CONDITIONS OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Each Recipient is solely responsible for determining the appropriateness of using and distributing the Program and assumes all risks associated with its exercise of rights under this Agreement, including but not limited to the risks and costs of program errors, compliance with applicable laws, damage to or loss of data, programs or equipment, and unavailability or interruption of operations.

DISCLAIMER OF LIABILITY

EXCEPT AS EXPRESSLY SET FORTH IN THIS AGREEMENT, NEITHER RECIPIENT NOR ANY CONTRIBUTORS SHALL HAVE ANY LIABILITY FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING WITHOUT LIMITATION LOST PROFITS), HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OR DISTRIBUTION OF THE PROGRAM OR THE EXERCISE OF ANY RIGHTS GRANTED HEREUNDER, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

GENERAL

If any provision of this Agreement is invalid or unenforceable under applicable law, it shall not affect the validity or enforceability of the remainder of the terms of this Agreement, and without further action by the parties hereto, such provision shall be reformed to the minimum extent necessary to make such provision valid and enforceable.

If Recipient institutes patent litigation against a Contributor with respect to a patent applicable to software (including a cross-claim or counterclaim in a lawsuit), then any patent licenses granted by that Contributor to such Recipient under this Agreement shall terminate as of the date such litigation is filed. In addition, if Recipient institutes patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Program itself (excluding combinations of the Program with other software or hardware) infringes such Recipient's patent(s), then such Recipient's rights granted under Section 2(b) shall terminate as of the date such litigation is filed.

All Recipient's rights under this Agreement shall terminate if it fails to comply with any of the material terms or conditions of this Agreement and does not cure such failure in a reasonable period of time after becoming aware of such noncompliance. If all Recipient's rights under this Agreement terminate, Recipient agrees to cease use and distribution of the Program as soon as reasonably practicable. However, Recipient's obligations under this Agreement and any licenses granted by Recipient relating to the Program shall continue and survive.

Everyone is permitted to copy and distribute copies of this Agreement, but in order to avoid inconsistency the Agreement is copyrighted and may only be modified in the following manner. The Agreement Steward reserves the right to publish new versions (including revisions) of this Agreement from time to time. No one other than the Agreement Steward has the right to modify this Agreement. IBM is the initial Agreement Steward. IBM may assign the responsibility to serve as the Agreement Steward to a suitable separate entity. Each new version of the Agreement will be given a distinguishing version number. The Program (including Contributions) may always be distributed subject to the version of the Agreement under which it was received. In addition, after a new version of the Agreement is published, Contributor may elect to distribute the Program (including its Contributions) under the new version. Except as expressly stated in Sections 2(a) and 2(b) above, Recipient receives no rights or licenses to the intellectual property of any Contributor under this Agreement, whether expressly, by implication, estoppel or otherwise. All rights in the Program not expressly granted under this Agreement are reserved.

This Agreement is governed by the laws of the State of New York and the intellectual property laws of the United States of America. No party to this Agreement will bring a legal action under this Agreement more than one year after the cause of action arose. Each party waives its rights to a jury trial in any resulting litigation.

License for Use of "Intel® MPI Benchmarks" Name and Trademark

In addition to the provisions of the Common Public License as included in the Intel® MPI Benchmarks distribution, Intel® grants the recipient the right to use the name and trademark "Intel® MPI Benchmarks" in relation to disclosures or publications of results, provided that bespoke results were obtained by running the benchmarks generated from the original, unchanged source code as distributed by Intel®.

Under no circumstances shall the recipient be permitted to use the name and trademark "Intel® MPI Benchmarks" in relation to results obtained by running benchmarks generated from source code that is different from the original source code as distributed by Intel®, regardless of whether the differences are caused by modifying the existing benchmark components or by adding new components.

dmidecode is made available under the GNU General Public License (GPL):

Copyright (C) 2000-2002 Alan Cox <[email protected]>
Copyright (C) 2002-2007 Jean Delvare <khali@linux-fr.org>

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

A copy of the HPC Challenge Benchmark License is made available in accordance with the requirements of the license.

License

Copyright © 2011 The University of Tennessee. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

    * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer listed in this license in the documentation and/or other materials provided with the distribution.
    * Neither the name of the copyright holders nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

This software is provided by the copyright holders and contributors "as is" and any express or implied warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose are disclaimed. In no event shall the copyright owner or contributors be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.

External XML parsing software is included as part of this product.

Libexpat library for XML parsing is used in this product under the MIT License: 

Copyright (c) 1998, 1999, 2000 Thai Open Source Software Center Ltd and Clark Cooper
Copyright (c) 2001, 2002, 2003, 2004, 2005, 2006 Expat maintainers.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

The tool includes software developed at the University of Tennessee, Knoxville, Innovative Computing Laboratory, and neither the University nor ICL endorses or promotes this product. Although HPL 2.0 is redistributable under certain conditions, this particular package is subject to the MKL license.

HPL Copyright Notice and Licensing Terms

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions, and the following disclaimer in the documentation and/or other materials provided with the distribution.

3. All advertising materials mentioning features or use of this software must display the following acknowledgment: This product includes software developed at the University of Tennessee, Knoxville, Innovative Computing Laboratory.
4. The name of the University, the name of the Laboratory, or the names of its contributors may not be used to endorse or promote products derived from this software without specific written permission.

Disclaimer

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS `AS IS' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
