Guide to IBM Power HA templates for IBM i


Copyright Halcyon - A Division of HelpSystems. All rights reserved.

This document is intended as a guide to the IBM PowerHA for IBM i monitoring templates available with Halcyon software products for IBM i.

This documentation contains Halcyon proprietary and confidential information and may not be disclosed, used, or copied without the prior consent of Halcyon Software, or as set forth in the applicable license agreement. Users are solely responsible for the proper use of the software and the application of the results obtained.

Although Halcyon Software has tested the software and reviewed the documentation, the sole warranty for the software may be found in the applicable license agreement between Halcyon Software and the user.

Publication Revision: June 2019

Overview

Halcyon Templates are designed to provide the same level of monitoring across a number of similar devices by applying a set of user-defined filters with a single click. This greatly reduces set-up time and ensures that all systems are covered by at least a basic level of monitoring.

Should you need to make a system-wide change at a later date, a single update covers all systems using the template.
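To make the single-update behavior concrete, the sketch below is a hypothetical Python model, not Halcyon code: each monitored system holds a reference to one shared template rather than a private copy, so a filter added to the template is seen by every system. The class names, host names and the use of message IDs as filters are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Template:
    """Hypothetical template: a named, shared set of filters."""
    name: str
    filters: list[str] = field(default_factory=list)


@dataclass
class MonitoredSystem:
    """A system that references the shared template instead of copying it."""
    hostname: str
    template: Template

    def active_filters(self) -> list[str]:
        # Read from the shared template each time, so one change to the
        # template is seen by every system that uses it.
        return list(self.template.filters)


powerha = Template("PowerHA", ["CPFBB52", "CPFBB53"])
systems = [MonitoredSystem(h, powerha) for h in ("NODEA", "NODEB", "NODEC")]

powerha.filters.append("CPFBB54")  # a single update to the template...

for s in systems:                  # ...is reflected on every system
    print(s.hostname, s.active_filters())
```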


CHAPTER 1: INSTALLATION

If you already use any of Halcyon's Level 1 to 4 software suites, follow these instructions to install templates to a customized environment on IBM i.

If you do not currently use Halcyon's tools in your IBM i environment but would like more information, an online demonstration or a free trial, please see the list of contacts on the back page of this guide.

Installation of Customized Environment

Follow these instructions to install templates to a customized environment.

1 Install the Halcyon solution, using the appropriate installation guide.

2 Once successfully installed, log into the environment to which you wish to apply the customized template, for example, HALPROD/HALCYON.

3 From the command line, type ENDMON and press F4. Follow the prompts to end the monitors.

Note: The installation of the customized environment fails if the monitors are not stopped.

4 From the command line type CSTENV and press F4.

5 Type the required authorization code for the template you wish to apply and press Enter.


Figure 1.1 Entering the customization code

Note: Each customized environment requires an authorization code.

Please contact [email protected] or your local Halcyon reseller for details on how to obtain this code.

The customized environment is now installed.

6 From the main menu of your Halcyon solution, select option 5=Work with Rules. The template rules applicable to the customized environment that you installed can be found in the listed queue and rule groups. Default action schedules are installed and, where appropriate, changes to system defaults may also be made.

In most cases, the templates supplied can be used immediately upon completion of installation, but there may be instances where you need to change rule properties to match those of your own environment. This can be done by taking option 2=Change against the rule and making the required changes. Similarly, should you require multiple rules for different message queues, devices and so on, you can use option 3=Copy against the rule and then make the required amendments.


Rule Actions

Unless otherwise specified, all template rules are implemented with a default action schedule which sends an alert message to your local console (option 10=Message Console from the main menu). Should you wish to amend this option, take option 2=Change against the action within the rule and make the amendments as required.

Note: Please refer to the user reference guide for your Halcyon solution for details of actions that may be applied to rules.


CHAPTER 2: IBM POWER HA FOR IBM i CUSTOMIZATION TEMPLATES

IBM Power HA for IBM i Customization Templates

PowerHA® for IBM i is the IBM® Power Systems™ solution for high availability and disaster recovery. It is an IBM storage-based clustering solution that is an integrated extension of the storage management architecture and the IBM i operating system.

Template Assignment

In most cases, the templates supplied can be used immediately upon completion of installation, but there may be instances where you need to change rule properties to match those of your own environment. This can be done by taking option 2=Change against the rule and making the required changes. Similarly, should you require multiple rules for different message queues, devices and so on, you can use option 3=Copy against the rule and then make the required amendments.

Actions

Unless otherwise specified, all template rules are implemented with a default action schedule which sends an alert message to your local console (option 10=Message Console from the main menu). Should you wish to amend this option, take option 2=Change against the action within the rule and make the amendments as required.

Note: Please refer to the user reference guide for your Halcyon solution for details of actions that may be applied to rules.


Note: The message queue used for failover-related messages can be set by defining a cluster message queue (CLUMSGQ) on the Change Cluster (CHGCLU) command. If no cluster message queue is set, failover messages will not appear when a failover occurs. If a message queue is defined, you may be asked to confirm the failover.

Message Queue Rules

The Message Queue rules for the Power HA for IBM i template are located in the QSYSOPR Message Queue Rule Group.

QSYSOPR: QSYSOPR Message Queue

All the rules defined in this message queue rule group run every 60 seconds.

No Global Exclusions Defined

Sequence number 0 is a special sequence number that cannot be deleted, even if you do not define any global exclusions for the named message queue.

If you have messages that are to be totally excluded from any type of action, take option 2=Change against sequence 0 to define the exact criteria for the exclusion. If a global exclusion exists for a message, none of the other sequence numbers for this message queue are searched for a match and no action is taken.
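The exclusion logic above can be sketched as follows. This is a hypothetical Python model, not Halcyon's implementation: rule matching is simplified to a predicate on the message ID, `Rule` and `matching_rules` are illustrative names, and the function returns every matching rule in sequence order because this guide does not say whether evaluation stops at the first match. Conceptually, a check like this would be applied to each new QSYSOPR message on every 60-second cycle.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    sequence: int
    name: str
    matches: Callable[[str], bool]  # simplified: a predicate on the message ID


def matching_rules(msg_id: str,
                   exclusions: list[Callable[[str], bool]],
                   rules: list[Rule]) -> list[Rule]:
    """Return the rules that match msg_id, in sequence-number order.

    The sequence 0 global exclusions are checked first: if any matches,
    no other sequence is searched and no rule (so no action) is returned.
    """
    if any(excluded(msg_id) for excluded in exclusions):
        return []
    ordered = sorted(rules, key=lambda rule: rule.sequence)
    return [rule for rule in ordered if rule.matches(msg_id)]


rules = [
    Rule(1210, "Cluster Resource Services active. Request failed",
         lambda m: m == "CPFBB53"),
    Rule(1200, "Cluster node could not be added to CRG",
         lambda m: m == "CPFBB52"),
]
exclusions = [lambda m: m == "CPFBB53"]  # hypothetical global exclusion

print([r.sequence for r in matching_rules("CPFBB52", exclusions, rules)])  # [1200]
print([r.sequence for r in matching_rules("CPFBB53", exclusions, rules)])  # []
```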

1200: Cluster node could not be added to CRG

This message rule, which operates on a 24/7 basis, monitors for any cluster node that cannot be added to a Cluster Resource Group (CRG).

It does this by checking for message CPFBB52 ‘Cluster node &1 could not be added to cluster resource group &2’ being received in Message File QCPFMSG within library QSYS.
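The message texts quoted in this guide contain substitution variables such as &1 and &2, which are filled in from the message data when the message is sent. The hypothetical Python helper below illustrates the idea using values that are already decoded; real IBM i message data is laid out according to the field formats in the message description, which this sketch does not parse. The node and CRG names are made up.

```python
import re


def expand(message_text: str, values: dict[int, str]) -> str:
    """Replace &1, &2, ... with supplied values; unknown variables are kept."""
    return re.sub(r"&(\d+)",
                  lambda m: values.get(int(m.group(1)), m.group(0)),
                  message_text)


text = expand("Cluster node &1 could not be added to cluster resource group &2",
              {1: "NODEB", 2: "PRODCRG"})
print(text)
# Cluster node NODEB could not be added to cluster resource group PRODCRG
```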

1210: Cluster Resource Services active. Request failed

This message rule, which operates on a 24/7 basis, monitors for any request to delete a Cluster Resource Group that fails because the requested Cluster Resource Group is not inactive.

It does this by checking for message CPFBB53 ‘Cluster Resource Services is active. Request cannot be processed’ being received in Message File QCPFMSG within library QSYS.

1220: Node cannot join cluster

The Start Cluster Node (QcstStartClusterNode) API is used to start Cluster Resource Services on a node in the cluster. This message rule, which operates on a 24/7 basis, monitors for any failed request to add a node to a cluster.

It does this by checking for message CPFBB54 ‘Node &1 not be added to the cluster &2’ being received in Message File QCPFMSG within library QSYS.


1230: Cluster node cannot be removed from cluster

The Remove Cluster Node Entry (QcstRemoveClusterNodeEntry) API is used to remove a node from a cluster.

This message rule, which operates on a 24/7 basis, monitors for any failed request to remove a cluster node from a cluster.

It does this by checking for message CPFBB6A ‘Cluster node &1 cannot be removed from cluster &2’ being received in Message File QCPFMSG within library QSYS.

1240: Primary node not current owner of cluster devices

The Initiate Switchover (QcstInitiateSwitchOver) API changes the current roles of nodes in the recovery domain of a cluster resource group.

This message rule, which operates on a 24/7 basis, monitors for any event where the primary node is not the current owner of cluster devices.

It does this by checking for message CPFBB6A ‘Primary node &1 not current owner of hardware resource &2’ being received in Message File QCPFMSG within library QSYS.

1250: Request not valid for cluster resource group

The Change Cluster Resource Group Device Entry (QcstChgClusterResourceGroupDev) API changes information about one or more configuration objects in a device cluster resource group.

This message rule, which operates on a 24/7 basis, monitors for any invalid request to change cluster resource group information.

It does this by checking for message CPFBB6B ‘Request not valid for type &1 cluster resource group’ being received in Message File QCPFMSG within library QSYS.

1260: Hardware configuration not complete

Before configuring a CRG for a switchable IASP, you must configure an I/O pool and add the disk units to that pool. Having an I/O pool allows the disks to be switched. If this is not configured correctly, errors such as message CPFBB6C, which indicates that hardware configuration is not complete, may result.

This message rule, which operates on a 24/7 basis, monitors for message CPFBB6C ‘Hardware configuration not complete’ being received in Message File QCPFMSG within library QSYS.

1270: Configuration object type not in CRG

The Change Cluster Resource Group Device Entry (QcstChgClusterResourceGroupDev) API changes information about one or more configuration objects in a device cluster resource group.


This message rule, which operates on a 24/7 basis, monitors for any request to change a configuration object that is not in the cluster resource group.

It does this by checking for message CPFBB6D ‘Configuration object &1 not in cluster resource group &2’ being received in Message File QCPFMSG within library QSYS.

1280: Cluster message not received from cluster node

This message rule, which operates on a 24/7 basis, monitors for any cluster message not being received from the cluster node.

It does this by checking for message CPFBB60 ‘Offset to configuration object array is not valid’ being received in Message File QCPFMSG within library QSYS.

1290: Exit program name *NONE not valid

The Create Cluster Resource Group API creates a cluster resource group object. The exit program is required during the recovery process.

This message rule, which operates on a 24/7 basis, monitors for any request to use an invalid exit program name of *NONE.

It does this by checking for message CPFBB62 ‘Exit program name *NONE not valid’ being received in Message File QCPFMSG within library QSYS.

1300: Cluster node in different device domain

The Add Node To Recovery Domain API is used to add a new node to the recovery domain of an existing cluster resource group.

This message rule, which operates on a 24/7 basis, monitors for any cluster node that is in a different device domain.

It does this by checking for message CPFBB65 ‘Cluster node &1 in different device domain’ being received in Message File QCPFMSG within library QSYS.

1310: Request failed for device cluster resource group

This message rule, which operates on a 24/7 basis, monitors for any request that fails for the named device cluster resource group.

It does this by checking for message CPFBB66 ‘Request failed for device cluster resource group &3’ being received in Message File QCPFMSG within library QSYS.

1320: CRG has no configuration object entries

The cluster resource group must have at least one configuration object entry. The configuration objects specified for the cluster resource group must exist on all active nodes in the recovery domain, and the resource name specified in a configuration object must be the same on all active nodes in the recovery domain.


This message rule, which operates on a 24/7 basis, monitors for a cluster resource group which has no configuration object entries defined.

It does this by checking for message CPFBB68 ‘Cluster resource group &1 has no configuration object entries’ being received in Message File QCPFMSG within library QSYS.
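The configuration object requirements described for rule 1320 can be expressed as a simple validation pass. The sketch below is a hypothetical Python model, not an IBM API: `ConfigObject`, `crg_config_errors` and the node, object and resource names are illustrative, and only the three conditions stated in this guide are checked (at least one entry, the object present on every active node, and the same resource name on every active node).

```python
from dataclasses import dataclass


@dataclass
class ConfigObject:
    name: str
    # Resource name seen on each node, e.g. {"NODEA": "DC01", "NODEB": "DC01"}
    resource_by_node: dict[str, str]


def crg_config_errors(objects: list[ConfigObject], active_nodes: set[str]) -> list[str]:
    """Check a CRG's configuration object entries against the stated rules."""
    if not objects:
        return ["cluster resource group has no configuration object entries"]
    errors = []
    for obj in objects:
        # Every active node in the recovery domain must know the object.
        missing = sorted(active_nodes - set(obj.resource_by_node))
        if missing:
            errors.append(f"{obj.name}: missing on {', '.join(missing)}")
        # The resource name must be identical on all active nodes.
        names = {obj.resource_by_node[n] for n in active_nodes
                 if n in obj.resource_by_node}
        if len(names) > 1:
            errors.append(f"{obj.name}: resource names differ {sorted(names)}")
    return errors


active = {"NODEA", "NODEB"}
print(crg_config_errors([], active))
print(crg_config_errors([ConfigObject("IASP1", {"NODEA": "DC01", "NODEB": "DC02"})], active))
print(crg_config_errors([ConfigObject("IASP1", {"NODEA": "DC01"})], active))
```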

1330: Primary node not owner of hardware resource

If the node being removed is the current primary node, ownership of the devices specified in the cluster resource group is switched from the current primary to the new primary, provided none of the configuration objects are varied on at the current primary.

This message rule, which operates on a 24/7 basis, monitors for a primary node that is not the current owner of a hardware resource.

It does this by checking for message CPFBB69 ‘Primary node &1 not current owner of hardware resource &2’ being received in Message File QCPFMSG within library QSYS.
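The ownership-switch condition described for rule 1330 reduces to a small predicate. In this hypothetical Python sketch, `ownership_switch_target` returns the node that device ownership moves to, or None when no switch takes place. This guide does not describe what happens when configuration objects are still varied on, so the sketch simply reports no switch in that case; the node and object names are made up.

```python
from typing import Optional


def ownership_switch_target(removed: str, primary: str, new_primary: str,
                            varied_on: set[str]) -> Optional[str]:
    """Return the node that device ownership switches to, or None.

    varied_on: configuration objects currently varied on at the primary.
    Ownership switches only when the primary itself is being removed and
    none of its configuration objects are varied on.
    """
    if removed == primary and not varied_on:
        return new_primary
    return None


print(ownership_switch_target("NODEA", "NODEA", "NODEB", set()))      # NODEB
print(ownership_switch_target("NODEA", "NODEA", "NODEB", {"IASP1"}))  # None
print(ownership_switch_target("NODEC", "NODEA", "NODEB", set()))      # None
```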

1340: Node in CRG not current owner of specified devices

If the primary node in a named Cluster Resource Group (CRG) does not currently own the specified devices, the API fails with an error message.

This message rule, which operates on a 24/7 basis, monitors for the primary node in the CRG not being the owner of the specified devices.

It does this by checking for message CPFBB7A ‘Primary node &1 in cluster resource group &2 not current owner of specified devices’ being received in Message File QCPFMSG within library QSYS.

1350: Device type not correct for configuration object and node

Hardware configuration must be complete so that the physical hardware has been associated with the configuration object and node.

This message rule, which operates on a 24/7 basis, monitors for a mismatch between the hardware and the configuration object and node.

It does this by checking for message CPFBB7B ‘Device type incorrect for configuration object &1 on node &2’ being received in Message File QCPFMSG within library QSYS.

1360: Resource name already used in CRG

The resource name specified in the configuration object must be the same on all nodes in the recovery domain.

This message rule, which operates on a 24/7 basis, monitors for the resource name already being in use by a configuration object in a named cluster resource group.


It does this by checking for message CPFBB7C ‘Resource name &1 already used by configuration object &2 in cluster resource group &4’ being received in Message File QCPFMSG within library QSYS.

1370: Configuration object already in CRG

When configuration objects are added, they cannot already be specified in another cluster resource group.

This message rule, which operates on a 24/7 basis, monitors for configuration objects that already exist in a Cluster Resource Group.

It does this by checking for message CPFBB7D ‘Configuration object &1 already in cluster resource group &2’ being received in Message File QCPFMSG within library QSYS.
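A hypothetical Python sketch of the constraint behind rule 1370: a mapping from configuration object to owning CRG, with a helper that refuses an object that is already assigned. `add_config_object` and the object and CRG names are illustrative; the error text simply mirrors the CPFBB7D message quoted above.

```python
def add_config_object(owner_of: dict[str, str], obj: str, crg: str) -> None:
    """Assign configuration object `obj` to cluster resource group `crg`.

    `owner_of` maps each configuration object to the CRG that holds it.
    Raises ValueError if the object is already in a cluster resource group.
    """
    current = owner_of.get(obj)
    if current is not None:
        raise ValueError(
            f"Configuration object {obj} already in cluster resource group {current}")
    owner_of[obj] = crg


owners: dict[str, str] = {}
add_config_object(owners, "IASP1", "PRODCRG")
try:
    add_config_object(owners, "IASP1", "TESTCRG")
except ValueError as exc:
    print(exc)  # Configuration object IASP1 already in cluster resource group PRODCRG
```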

1380: Configuration object resource name already in CRG

The resource name specified in the configuration object must be the same on all nodes in the recovery domain.

This message rule, which operates on a 24/7 basis, monitors for the named resource name already existing in a Cluster Resource Group.

It does this by checking for message CPFBB7E ‘Resource name &1 already in cluster resource group &2’ being received in Message File QCPFMSG within library QSYS.

1390: Too many I/O processors / bridges specified in a cluster resource group

If devices attached to different IOPs or high-speed link I/O bridges are grouped, such as for an auxiliary storage pool, all devices for the affected IOPs or high-speed link I/O bridges must be specified in the same cluster resource group.

This message rule, which operates on a 24/7 basis, monitors for too many I/O processors or bridges being specified in a named Cluster Resource Group.

It does this by checking for message CPFBB7F ‘Too many I/O processors or high-speed link I/O bridges specified for cluster resource group &1’ being received in Message File QCPFMSG within library QSYS.
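The grouping requirement described for rule 1390 is an all-or-nothing check per IOP or bridge. The hypothetical Python function below (not part of any IBM interface; the device and IOP names are invented) reports, for each IOP or bridge that the CRG touches, any attached devices that were left out of the CRG.

```python
def incomplete_iops(crg_devices: set[str],
                    devices_by_iop: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each IOP or bridge that the CRG touches, list the devices left out.

    An empty result means every affected IOP/bridge is fully covered.
    """
    gaps = {}
    for iop, devices in devices_by_iop.items():
        # Touched (at least one device in the CRG) but not fully included.
        if devices & crg_devices and not devices <= crg_devices:
            gaps[iop] = devices - crg_devices
    return gaps


layout = {"IOP1": {"DD001", "DD002"}, "IOP2": {"DD003"}}
print(incomplete_iops({"DD001", "DD003"}, layout))  # {'IOP1': {'DD002'}}
```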

1400: Request not compatible with cluster version

This message rule, which operates on a 24/7 basis, monitors for a request being made which is incompatible with the current cluster version.

It does this by checking for message CPFBB70 ‘API request &1 not compatible with current cluster version’ being received in Message File QCPFMSG within library QSYS.


1410: Potential cluster node version not compatible

The potential node version of the node being started must be equal to the current cluster version or up to one level higher than the current cluster version.

This message rule, which operates on a 24/7 basis, monitors for a node whose potential node version is not compatible with the current cluster version.

It does this by checking for message CPFBB71 ‘Potential node version &1 of node &2 not compatible’ being received in Message File QCPFMSG within library QSYS.
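The version rule for 1410 is a simple range check. This is a hypothetical Python sketch in which cluster versions are modeled as plain integers; it ignores the modification level covered by rule 1420.

```python
def node_version_compatible(potential_node_version: int,
                            current_cluster_version: int) -> bool:
    """True if the starting node's potential node version equals the current
    cluster version or is at most one level higher."""
    return (current_cluster_version
            <= potential_node_version
            <= current_cluster_version + 1)


for potential in (8, 9, 10, 11):
    print(potential, node_version_compatible(potential, 9))
# 8 False / 9 True / 10 True / 11 False
```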

1420: Potential cluster version mod not compatible

The potential cluster version represents the most advanced level of cluster function available for a given node. This is the version at which the node is capable of communicating with the other cluster nodes.

This message rule, which operates on a 24/7 basis, monitors for a cluster node whose potential cluster version modification level is not compatible.

It does this by checking for message CPFBB72 ‘Potential cluster node version modification level of cluster node &2 not compatible’ being received in Message File QCPFMSG within library QSYS.

1430: Cluster node cannot be added to device domain

It may not be possible to add a cluster node to a device domain for a number of reasons. The API fails if any member of the device domain to which the node is being added has a status of Partition. If the node is the first node being added to the device domain, the API also fails if any node in the cluster has a status of Partition.

This message rule, which operates on a 24/7 basis, monitors for an instance where a named cluster node cannot be added to a named device domain.

It does this by checking for message CPFBB73 ‘Cluster node &1 could not be added to device domain &2’ being received in Message File QCPFMSG within library QSYS.
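The two Partition-status conditions in rule 1430 can be sketched as a hypothetical Python predicate. It covers only the conditions stated above, not every reason the API may fail, and the status strings and node names are illustrative.

```python
def can_add_to_device_domain(domain_members: dict[str, str],
                             cluster_nodes: dict[str, str]) -> bool:
    """Check only the two Partition-status conditions for rule 1430.

    domain_members: status of each current member of the target device
                    domain (empty if the node would be the first member).
    cluster_nodes:  status of every node in the cluster.
    """
    if any(status == "Partition" for status in domain_members.values()):
        return False
    is_first_node = not domain_members
    if is_first_node and any(s == "Partition" for s in cluster_nodes.values()):
        return False
    return True


cluster = {"NODEA": "Active", "NODEB": "Active", "NODEC": "Partition"}
print(can_add_to_device_domain({"NODEA": "Active"}, cluster))  # True
print(can_add_to_device_domain({}, cluster))                   # False
```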

1440: Cluster node already a member of device domain

A node can only be a member of one device domain.

This message rule, which operates on a 24/7 basis, monitors for an instance where a named cluster node is already a member of a named device domain.

It does this by checking for message CPFBB74 ‘Cluster node &1 already a member of device domain &2’ being received in Message File QCPFMSG within library QSYS.


1450: Cluster node not a member of device domain

This message rule, which operates on a 24/7 basis, monitors for a named cluster node not being a member of a named device domain.

It does this by checking for message CPFBB75 ‘Cluster node &1 not a member of device domain &2’ being received in Message File QCPFMSG within library QSYS.

1460: Cluster node cannot be removed from device domain

It may not be possible to remove a cluster node from a device domain for a number of reasons. The API fails if the node to be removed is in the recovery domain of any device cluster resource group. The node to be removed and at least one other member of the device domain must be active. Under certain conditions, all current members of the device domain must be active.

This message rule, which operates on a 24/7 basis, monitors for instances where a named cluster node cannot be removed from a named device domain.

It does this by checking for message CPFBB76 ‘Cluster node &1 cannot be removed from device domain &2’ being received in Message File QCPFMSG within library QSYS.
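A hypothetical Python sketch that collects the removal conditions listed for rule 1460. It models only the stated conditions; the unspecified case in which all current members must be active is not modeled, and the function name, status strings and node/CRG names are illustrative.

```python
def removal_blockers(node: str,
                     member_status: dict[str, str],
                     device_crg_domains: dict[str, set[str]]) -> list[str]:
    """Return reasons why `node` cannot leave its device domain (empty if none).

    member_status:      status of each device domain member, e.g. "Active".
    device_crg_domains: recovery domain (set of nodes) of each device CRG.
    """
    reasons = []
    for crg in sorted(device_crg_domains):
        if node in device_crg_domains[crg]:
            reasons.append(f"{node} is in the recovery domain of {crg}")
    if member_status.get(node) != "Active":
        reasons.append(f"{node} is not active")
    if not any(s == "Active" for n, s in member_status.items() if n != node):
        reasons.append("no other device domain member is active")
    return reasons


members = {"NODEA": "Active", "NODEB": "Active", "NODEC": "Inactive"}
print(removal_blockers("NODEC", members, {"PRODCRG": {"NODEA", "NODEB"}}))
print(removal_blockers("NODEB", members, {"PRODCRG": {"NODEA", "NODEB"}}))
```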

1470: Device domain does not exist in cluster

This message rule, which operates on a 24/7 basis, monitors for a named device domain not existing in a named cluster.

It does this by checking for message CPFBB77 ‘Device domain &1 does not exist in cluster &2’ being received in Message File QCPFMSG within library QSYS.

1480: Internal cluster resource services mismatch

This message rule, which operates on a 24/7 basis, monitors for an internal mismatch between cluster resource services.

It does this by checking for message CPFBB79 ‘Internal cluster resource services mismatch’ being received in Message File QCPFMSG within library QSYS.

1490: Cluster node failed to join or merge with cluster

This message rule, which operates on a 24/7 basis, monitors for instances where a named cluster node fails to join or merge with a named cluster.

It does this by checking for message CPFBB8A ‘Cluster node &1 failed to join or merge with cluster &2’ being received in Message File QCPFMSG within library QSYS.


1500: Internal error occurred in operation

This message rule, which operates on a 24/7 basis, monitors for instances where an internal error occurred during a named operation.

It does this by checking for message CPFBB8D ‘Internal error occurred during operation &1’ being received in Message File QCPFMSG within library QSYS.

1510: Internal error occurred during method

This message rule, which operates on a 24/7 basis, monitors for instances where an internal error occurred during a named method.

It does this by checking for message CPFBB8E ‘Internal error occurred during method &1’ being received in Message File QCPFMSG within library QSYS.

1520: Enqueue on distribute information queue failed

The Distribute Information (QcstDistributeInformation) API is used to deliver information from a node in the recovery domain to other nodes in the recovery domain.

This message rule, which operates on a 24/7 basis, monitors for instances where an enqueue of data on a named distribution queue fails.

It does this by checking for message CPFBB8F ‘Enqueue on distribute information queue &1 in library &2 failed’ being received in Message File QCPFMSG within library QSYS.

1530: Request failed for device CRG

This message rule, which operates on a 24/7 basis, monitors for a request failing for a named device Cluster Resource Group.

It does this by checking for message CPFBB90 ‘Request failed for device cluster resource group &3’ being received in Message File QCPFMSG within library QSYS.

1540: Request failed for device CRG

This message rule, which operates on a 24/7 basis, monitors for a request failing for a named device Cluster Resource Group.

It does this by checking for message CPFBB80 ‘Request failed for device cluster resource group &3’ being received in Message File QCPFMSG within library QSYS.

1550: New primary node not active

This message rule, which operates on a 24/7 basis, monitors for requests that fail because the named new primary node, where assigned, is not active.

It does this by checking for message CPFBB81 ‘New primary node &1 not active’ being received in Message File QCPFMSG within library QSYS.


1560: Cluster node defined in Cluster Resource Group

This message rule, which operates on a 24/7 basis, monitors for instances of a named cluster node already being defined in a named cluster resource group.

It does this by checking for message CPFBB82 ‘Cluster node &2 defined in cluster resource group &1’ being received in Message File QCPFMSG within library QSYS.

1570: Internal cluster object damaged

This message rule, which operates on a 24/7 basis, monitors for instances where an internal cluster object has been identified as damaged.

It does this by checking for message CPFBB83 ‘Internal cluster object damaged’ being received in Message File QCPFMSG within library QSYS.

1580: Device domain entry for node being removed

This message rule, which operates on a 24/7 basis, monitors for instances where the device domain entry for a named node is being removed.

It does this by checking for message CPFBB84 ‘Device domain entry for node &1 being removed’ being received in Message File QCPFMSG within library QSYS.

1590: Cluster detected damage

This message rule, which operates on a 24/7 basis, monitors for instances where a named cluster has detected damage.

It does this by checking for message CPFBB85 ‘Cluster &3 detected damage’ being received in Message File QCPFMSG within library QSYS.

1600: Job already exists

This message rule, which operates on a 24/7 basis, monitors for instances where a job already exists.

It does this by checking for message CPFBB87 ‘Job already exists’ being received in Message File QCPFMSG within library QSYS.

1610: Node not in recovery domain for CRG

The Distribute Information (QcstDistributeInformation) API can only be run on a node that is an active member in the recovery domain of a cluster resource group.

This message rule, which operates on a 24/7 basis, monitors for instances where a named node is not in the recovery domain for a named cluster resource group.

It does this by checking for message CPFBB88 ‘Node &1 not in recovery domain for cluster resource group &2’ being received in Message File QCPFMSG within library QSYS.


1620: Current status of cluster node can’t be changed

The Change Cluster Node Entry (QcstChangeClusterNodeEntry) API is used to change the fields in the cluster node entry.

This message rule, which operates on a 24/7 basis, monitors for instances where the current status of a named cluster node cannot be changed.

It does this by checking for message CPFBB89 ‘The current status &2, of cluster node &1 cannot be changed’ being received in Message File QCPFMSG within library QSYS.

1630: Online value not valid for configuration object

A value of 2 for the device's 'configuration object online' attribute can be specified only for a secondary auxiliary storage pool.

This message rule, which operates on a 24/7 basis, monitors for instances where the online value is invalid for the named configuration object.

It does this by checking for message CPFBB9A ‘Online value not valid for configuration object &1 type &3’ being received in Message File QCPFMSG within library QSYS.

1640: Auxiliary storage pool group member not specified

All members of an auxiliary storage pool group must be configured in the cluster resource group before ownership can be changed.

This message rule, which operates on a 24/7 basis, monitors for instances where a member of an auxiliary storage pool group has not been specified in the cluster resource group.

It does this by checking for message CPFBB9B ‘Auxiliary storage pool group member &1 not specified’ being received in Message File QCPFMSG within library QSYS.
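The precondition for rule 1640 is a set difference: ownership can change only when no ASP group member is missing from the CRG. In this hypothetical Python sketch the function name and ASP names are invented for illustration.

```python
def ownership_change_blockers(asp_group: set[str], crg_objects: set[str]) -> set[str]:
    """Return ASP group members not yet configured in the CRG.

    Ownership can only be changed when the returned set is empty.
    """
    return asp_group - crg_objects


group = {"IASP1", "IASP1SEC"}  # hypothetical primary ASP plus one secondary ASP
print(ownership_change_blockers(group, {"IASP1"}))  # {'IASP1SEC'}
```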

1650: Not all ASP group members added or removed together

This message rule, which operates on a 24/7 basis, monitors for instances in a cluster resource group where not all ASP group members were added or removed together.

It does this by checking for message CPFBB9C ‘Not all auxiliary storage pool group members added or removed together’ being received in Message File QCPFMSG within library QSYS.

1660: Config object not compatible with cluster version

This message rule, which operates on a 24/7 basis, monitors for instances where a named configuration object is not compatible with the current cluster version.

It does this by checking for message CPFBB9D ‘Configuration object &1 not compatible with current cluster version’ being received in Message File QCPFMSG within library QSYS.


1670: Relational DB name not correct for config object on node

If a database name has been specified for a configuration object, it must be the same on all active nodes in the recovery domain.

This message rule, which operates on a 24/7 basis, monitors for instances where the relational database name is not correct for the named configuration object on a named node.

It does this by checking for message CPFBB9E ‘Data base name &1 not correct for configuration object &2 on node &3’ being received in Message File QCPFMSG within library QSYS.

1680: ASP storage pool configuration changes in progress

This message rule, which operates on a 24/7 basis, monitors for instances where an auxiliary storage pool cannot be accessed because configuration changes are in progress.

It does this by checking for message CPFBB9F ‘Auxiliary storage pool configuration changes are in progress’ being received in Message File QCPFMSG within library QSYS.

1690: Request failed for device cluster resource group

This message rule, which operates on a 24/7 basis, monitors for instances where a request failed for a named device cluster resource group.

It does this by checking for message CPFBB90 ‘Request failed for device cluster resource group &3’ being received in Message File QCPFMSG within library QSYS.

1700: Configuration object still varied on
This message rule, which operates on a 24/7 basis, monitors for instances where a named configuration object is still varied on.

It does this by checking for message CPFBB91 ‘Configuration object &1 still varied on’ being received in Message File QCPFMSG within library QSYS.

1710: Hardware resource not owned by node
This message rule, which operates on a 24/7 basis, monitors for instances where a named hardware resource is not owned by a named node.

It does this by checking for message CPFBB92 ‘Hardware resource &1 not owned by node &3 or node &4’ being received in Message File QCPFMSG within library QSYS.
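
Message texts such as CPFBB92 contain substitution variables (&1, &3, &4). These are filled in with values when the message is sent, which is how a rule can refer to a "named" resource or node. The Python sketch below shows the idea with a simple dictionary. On IBM i, the values actually come from the message data, laid out according to the message description, so the dictionary is a simplification. The substitute helper and the names DC01, NODEA and NODEB are hypothetical.

```python
import re
from typing import Dict


def substitute(text: str, values: Dict[int, str]) -> str:
    """Replace &n substitution variables with values from a dict.

    Variables with no supplied value are left untouched, so gaps in the
    numbering (CPFBB92 uses &1, &3 and &4 but not &2) are harmless.
    """
    def repl(match: "re.Match[str]") -> str:
        return values.get(int(match.group(1)), match.group(0))

    # \d+ is greedy, so "&10" is read as variable 10, not variable 1 + "0".
    return re.sub(r"&(\d+)", repl, text)


template = "Hardware resource &1 not owned by node &3 or node &4"
print(substitute(template, {1: "DC01", 3: "NODEA", 4: "NODEB"}))
```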

1720: Base operating system option 41 not installed
If the node being started is in a device domain, the Start Cluster Node (QcstStartClusterNode) API requires that IBM® i option 41, HA Switchable Resources, is installed and a valid license key exists on that node.

This message rule, which operates on a 24/7 basis, monitors for instances where the base operating system option 41 is not installed or properly licensed.

It does this by checking for message CPFBB93 ‘Base operating system option 41 not installed or license key not valid’ being received in Message File QCPFMSG within library QSYS.

1730: Hardware resources not returned to node
This message rule, which operates on a 24/7 basis, monitors for instances where hardware resources are not returned to a named node.

It does this by checking for message CPFBB94 ‘Hardware resources not returned to node &2’ being received in Message File QCPFMSG within library QSYS.

1740: Vary configuration failed for configuration object
This message rule, which operates on a 24/7 basis, monitors for instances where a vary configuration operation failed for a named configuration object.

It does this by checking for message CPFBB95 ‘Vary configuration failed for configuration object &1, type &7’ being received in Message File QCPFMSG within library QSYS.

1750: Internal device domain mismatch
This message rule, which operates on a 24/7 basis, monitors for instances where there is an internal device domain mismatch.

It does this by checking for message CPFBB96 ‘Internal device domain mismatch’ being received in Message File QCPFMSG within library QSYS.

1760: Primary node does not own hardware for configuration object
If the primary node does not currently own the specified devices, the API fails with an error message.

This message rule, which operates on a 24/7 basis, monitors for instances where the primary node does not own the hardware for a named configuration object.

It does this by checking for message CPFBB97 ‘Primary node does not own hardware for configuration object &1’ being received in Message File QCPFMSG within library QSYS.

1770: Cluster node cannot be started by cluster node
This message rule, which operates on a 24/7 basis, monitors for instances where a named cluster node cannot be started by another named cluster node.

It does this by checking for message CPFBB98 ‘Cluster node &1 cannot be started by cluster node &2’ being received in Message File QCPFMSG within library QSYS.

1780: Request failed for device cluster resource group
With an active device cluster resource group (CRG), the CRG job fails to start while trying to start clustering on the mirror site primary node because hardware is missing on that node.

This message rule, which operates on a 24/7 basis, monitors for instances where a request to start clustering on a mirror site primary node fails.

It does this by checking for message CPFBB99 ‘Request failed for device cluster resource group &3’ being received in Message File QCPFMSG within library QSYS.

1790: Lock space not found
The Retrieve Lock Space Attributes (QTRXRLSA) API returns information for the specified lock space. A lock space is an internal object that is used by other objects to hold object and record locks.

This message rule, which operates on a 24/7 basis, monitors for instances where a named lock space is not found.

It does this by checking for message CPFBDD1 ‘Lock space &1 not found’ being received in Message File QCPFMSG within library QSYS.

1800: No authority to lock space
The caller of the API must be running under a user profile that has job control (*JOBCTL) special authority.

This message rule, which operates on a 24/7 basis, monitors for instances where a user profile does not have the required level of authority to a named lock space.

It does this by checking for message CPFBDD2 ‘No authority to lock space &1’ being received in Message File QCPFMSG within library QSYS.

1810: Lock space state not valid
The Retrieve Lock Space Attributes (QTRXRLSA) API returns information for the specified lock space. A lock space is an internal object that is used by other objects to hold object and record locks.

This message rule, which operates on a 24/7 basis, monitors for instances where the state of a named lock space is not valid.

It does this by checking for message CPFBDD3 ‘Lock space &1 state not valid’ being received in Message File QCPFMSG within library QSYS.

1820: Clustered hash table server internal error
The Connect Clustered Hash Table (QcstConnectCHT) API establishes a connection to the specified clustered hash table server. This API returns a connection handle that is used on all subsequent requests to that server.

This message rule, which operates on a 24/7 basis, monitors for instances where a named clustered hash table server reports an internal error.

It does this by checking for message CPFBD0A ‘Clustered hash table server &1 internal error’ being received in Message File QCPFMSG within library QSYS.

1830: Connection handle not active
The Generate Clustered Hash Table Key (QcstGenerateCHTKey) API returns a universally unique key that can be used to store an entry into the clustered hash table. A connection must have been established with the clustered hash table server.

This message rule, which operates on a 24/7 basis, monitors for instances where a connection handle is not active.

It does this by checking for message CPFBD0B ‘Connection handle not active’ being received in Message File QCPFMSG within library QSYS.

1840: Hash table server error QcstPrintHashTable
This message rule, which operates on a 24/7 basis, monitors for instances where a hash table server error occurs when using the QcstPrintHashTable API.

It does this by checking for message CPFBD01 ‘Clustered hash table server internal error during QcstPrintHashTable’ being received in Message File QCPFMSG within library QSYS.

1850: Start clustered hash table server failed
The Start Clustered Hash Table Server (STRCHTSVR) command is used to define a clustered hash table server on each cluster node specified in the NODE parameter.

This message rule, which operates on a 24/7 basis, monitors for instances where the command to start a clustered hash table server has failed.

It does this by checking for message CPFBD02 ‘Start clustered hash table server failed’ being received in Message File QCPFMSG within library QSYS.

1860: End clustered hash table server failed
The End Clustered Hash Table Server (ENDCHTSVR) command is used to end the specified clustered hash table server on the cluster nodes specified by the NODE parameter.

This message rule, which operates on a 24/7 basis, monitors for instances where the command to end a clustered hash table server has failed.

It does this by checking for message CPFBD03 ‘End clustered hash table server failed’ being received in Message File QCPFMSG within library QSYS.

1870: Clustered hash table server failed
This message rule, which operates on a 24/7 basis, monitors for instances where a clustered hash table server fails with a stated reason.

It does this by checking for message CPFBD04 ‘Clustered hash table server failed with reason &1’ being received in Message File QCPFMSG within library QSYS.

1880: Key not found
The Retrieve Clustered Hash Table Entry (QcstRetrieveCHTEntry) API retrieves an entry from the clustered hash table specified by the connection handle parameter. The entry to be retrieved is identified by the key parameter.

This message rule, which operates on a 24/7 basis, monitors for instances where a key cannot be found in a named clustered hash table.

It does this by checking for message CPFBD06 ‘Key not found in clustered hash table &1’ being received in Message File QCPFMSG within library QSYS.

1890: Profile not authorized clustered hash table entry
The Store Clustered Hash Table Entry (QcstStoreCHTEntry) API stores an entry in the clustered hash table identified by the connection handle. The user who originally stores the entry becomes its owner, and the owning user profile is used to determine authorization to the entry.

This message rule, which operates on a 24/7 basis, monitors for instances where a named user profile is not authorized to a clustered hash table entry.

It does this by checking for message CPFBD07 ‘User profile &1 not authorized to clustered hash table entry’ being received in Message File QCPFMSG within library QSYS.

1900: Key already exists
Information stored in the clustered hash table is associated with a key. The key can be generated by using the Generate Clustered Hash Table Key (QcstGenerateCHTKey) API, or users can generate their own. Unique keys can be added from any cluster partition. However, Cluster Resource Services does not guarantee that keys are unique between cluster partitions. Managing unique keys across cluster partitions is the user's responsibility.

This message rule, which operates on a 24/7 basis, monitors for instances where a key already exists in a named clustered hash table.

It does this by checking for message CPFBD08 ‘Key already exists in clustered hash table &1’ being received in Message File QCPFMSG within library QSYS.
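
Because uniqueness across cluster partitions is left to the user, one common approach is to make each key carry the identity of the node that created it. The Python sketch below illustrates this under stated assumptions. The key format, the make_partition_safe_key and colliding_keys helpers, and the node names are all hypothetical. The real clustered hash table APIs may impose their own key format or length, which this guide does not describe. QcstGenerateCHTKey remains the documented way to obtain a key.

```python
import uuid
from typing import Iterable, Set


def make_partition_safe_key(node_name: str) -> str:
    """Build a key prefixed with the node that created it.

    Node names are unique within a cluster and cluster partitions are
    disjoint sets of nodes, so two partitions never produce the same
    prefix; the random UUID keeps keys created on one node distinct.
    """
    return f"{node_name.upper()}:{uuid.uuid4().hex}"


def colliding_keys(partition_a: Iterable[str], partition_b: Iterable[str]) -> Set[str]:
    """Keys present in both partitions, e.g. to check before a merge."""
    return set(partition_a) & set(partition_b)


keys_a = {make_partition_safe_key("NODEA") for _ in range(3)}
keys_b = {make_partition_safe_key("NODEB") for _ in range(3)}
print(colliding_keys(keys_a, keys_b))  # prints set()
```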

1910: Clustered hash table server not active
The Store Clustered Hash Table Entry (QcstStoreCHTEntry) API stores an entry in the clustered hash table identified by the connection handle. When this API is called, the clustered hash table server must be active on the requesting node.

This message rule, which operates on a 24/7 basis, monitors for instances where a named clustered hash table server is inactive or not responding.

It does this by checking for message CPFBD09 ‘Clustered hash table server &1 not active or not responding’ being received in Message File QCPFMSG within library QSYS.

1920: Cluster resource group is failing over
Within IBM® i high availability environments, you can specify a cluster message queue where you can receive and respond to messages that provide details on failover events in the cluster. This message provides information on all cluster resource groups (CRGs) that are failing over to the same node when the primary node for the CRGs ends or fails.

This message rule, which operates on a 24/7 basis, monitors for instances where a named cluster resource group is failing over from one named node to another.

It does this by checking for message CPABB01 ‘Cluster resource group &1 is failing over from node &2 to node &3’ being received in Message File QCPFMSG within library QSYS.

1930: Cluster resource groups are failing over
For node-level failovers, one message (CPABB02) is sent to the cluster message queue on the first backup node. That single message controls all CRGs failing over to that node.

This message rule, which operates on a 24/7 basis, monitors for instances where cluster resource groups are failing over to a named node.

It does this by checking for message CPABB02 ‘Cluster resource groups are failing over to node &1’ being received in Message File QCPFMSG within library QSYS.

1940: ASP resources exceeded
Exceeding ASP resources may have serious consequences. In most cases, cancelling the offending job relieves the problem, but the system does not cancel the offending job automatically. If the job is from a single-threaded JOBQ or a single-threaded subsystem, other jobs behind it are held up until the offending job is handled, which may affect job scheduling.

This message rule, which operates on a 24/7 basis, monitors for instances where ASP resources are exceeded.

It does this by checking for message MCH2814 ‘ASP resources exceeded’ being received in Message File QCPFMSG within library QSYS.

1950: File system failure, IASP may not be available
When an IASP is not available (or before the IASP is created), it is possible to create a directory with the name of the IASP. If a directory with the same name as an IASP exists when the IASP is varied on, the MOUNT operation succeeds if the existing directory is empty.

However, the MOUNT operation fails if the existing directory contains any objects, even though the vary-on itself does not fail. The first indication of failure is likely to occur when users try to access objects in the directory: the objects will be missing or incorrect.

The only indication that the MOUNT operation failed is message CPDB414 (file system failure) with reason code 1 (The directory to be mounted over is not empty) in the job log of the thread that performed the vary-on operation. If the IASP environment uses the IFS, check each vary-on operation to ensure that the IFS mounted properly.

This message rule, which operates on a 24/7 basis, monitors for instances of a file system failure that may leave an independent ASP unusable.

It does this by checking for message CPDB414 ‘File system failure. Independent ASP may not be usable’ being received in Message File QCPFMSG within library QSYS.
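
The per-vary-on check described above comes down to looking for one message ID with one reason code in the vary-on thread's job log. The Python sketch below models only that test and assumes the job log has already been read into a list. The JoblogEntry type, its reason_code field and the ifs_mount_failed helper are hypothetical. In practice, the reason code would have to be decoded from the CPDB414 message data.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class JoblogEntry:
    """Hypothetical job log entry; reason_code would come from message data."""
    message_id: str
    reason_code: Optional[int] = None


def ifs_mount_failed(entries: Iterable[JoblogEntry]) -> bool:
    """True if the vary-on thread's job log holds CPDB414 with reason code 1.

    Reason code 1 means the directory to be mounted over was not empty,
    so the vary-on completed but the IASP's file system was not mounted.
    """
    return any(e.message_id == "CPDB414" and e.reason_code == 1 for e in entries)


vary_on_log = [JoblogEntry("CPDB414", reason_code=1)]
if ifs_mount_failed(vary_on_log):
    print("IFS mount failed: check the directory named after the IASP")
```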

1960: Cluster Resource Services communications fail
This message rule, which operates on a 24/7 basis, monitors for instances where Cluster Resource Services communications fail on a named cluster node.

It does this by checking for message CPFBB22 ‘Cluster Resource Services communications failure on cluster node &2’ being received in Message File QCPFMSG within library QSYS.

1970: Automatic fail over not started
This problem affects active application cluster resource groups that have a failover message queue defined. When an application fails, the user may receive message CPFBB1E followed by CPFBB4F even though an active backup node exists, and the failover does not take place.

This message rule, which operates on a 24/7 basis, monitors for instances where an automatic failover is not started for a named cluster resource group.

It does this by checking for message CPFBB4F ‘Automatic fail over not started for cluster resource group &1 in cluster &2’ being received in Message File QCPFMSG within library QSYS.
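
The problem pattern described above is a sequence: CPFBB1E followed by CPFBB4F. A monitor may want to flag that order specifically. The Python sketch below shows one way to detect it. It assumes that messages are processed in arrival order and that each message can be associated with a CRG name. That association is an assumption, because the text of CPFBB1E is not given here. The ClusterMessage type, the failovers_not_started helper and the CRG names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class ClusterMessage:
    """Hypothetical message record: message ID plus the CRG it refers to."""
    message_id: str
    crg: str


def failovers_not_started(messages: Iterable[ClusterMessage]) -> List[str]:
    """CRGs for which CPFBB4F arrived after an earlier CPFBB1E.

    Messages are assumed to be supplied in arrival order.
    """
    saw_cpfbb1e: Set[str] = set()
    flagged: List[str] = []
    for msg in messages:
        if msg.message_id == "CPFBB1E":
            saw_cpfbb1e.add(msg.crg)
        elif msg.message_id == "CPFBB4F" and msg.crg in saw_cpfbb1e:
            flagged.append(msg.crg)
    return flagged


queue = [
    ClusterMessage("CPFBB1E", "APPCRG1"),
    ClusterMessage("CPFBB4F", "APPCRG1"),
    ClusterMessage("CPFBB4F", "APPCRG2"),
]
print(failovers_not_started(queue))  # prints ['APPCRG1']
```

A CPFBB4F without a preceding CPFBB1E (APPCRG2 above) is still caught by the 1970 rule itself. The sketch only isolates the specific sequence described in the text.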

1980: Attempt to change asp session while varied on
This message rule, which operates on a 24/7 basis, monitors for instances where an attempt is made to change an ASP session while the device is varied on.

It does this by checking for message CPD26B9 ‘Device &1 must be varied off for this change’ being received in Message File QCPFMSG within library QSYS.

1990: Cross-site Mirroring synchronization complete
After geographic mirroring is configured, the production copy and mirror copy function as a disk unit. When the production copy is made available, the mirror copy is brought to a state that allows geographic mirroring to be performed. Synchronization occurs when you make the disk pool available after you configure geographic mirroring. When geographic mirroring is active, changes to the production copy data are transmitted to the mirror copy across TCP/IP connections. Changes can be transmitted either synchronously or asynchronously.

This message rule, which operates on a 24/7 basis, monitors for instances where cross-site mirroring synchronization for a named IASP is complete.

It does this by checking for message CPI095D ‘Cross-site Mirroring (XSM) synchronization for IASP &1 is complete’ being received in Message File QCPFMSG within library QSYS.

2000: Cross-site Mirroring is not active
Cross-site mirroring is a collective term that covers several i5/OS® supported high availability mirroring technologies, which provide disaster recovery and high availability by maintaining a mirrored copy of the data. These technologies also manage the replication process and control the point of access to the data. In the event of an outage on the source or production system, the mirrored data stored on the target system can be made available, either automatically or manually.

This message rule, which operates on a 24/7 basis, monitors for instances where cross-site mirroring is not active.

It does this by checking for message HAI2001 ‘Cross-site Mirroring (XSM) is not active’ being received in Message File QCPFMSG within library QSYS.

2010: Cluster resource group is failing over from node to node
This message rule, which operates on a 24/7 basis, monitors for instances of a cluster resource group failing over between named nodes.

It does this by checking for message CPIBB18 ‘Cluster resource group &1 is failing over from node &3 to node &2’ being received in Message File QCPFMSG within library QSYS.

2020: Cluster resource group deleted
This message rule, which operates on a 24/7 basis, monitors for instances of a named cluster resource group being deleted.

It does this by checking for message CPIBB17 ‘Cluster resource group &1 deleted’ being received in Message File QCPFMSG within library QSYS.

2030: Reattach of ASP session was requested
If you change from one disk pool to another and resources are registered with commitment control on the disk pool, the SETASPGRP command fails with message CPDB8EC, reason code 2: The thread has an uncommitted transaction.

This message is followed by message CPFB8E9. If you change disk pools and no resources are registered with commitment control, the commitment definitions are moved to the independent disk pool to which you are switching. If you change from the system disk pool (ASP group *NONE), commitment control is not affected and the commitment definitions stay on the system disk pool.

If you use a notify object, the notify object must reside on the same independent disk pool or independent disk pool group as the commitment definition. If you move the commitment definition to another independent disk pool or independent disk pool group, the notify object must also reside on that other independent disk pool or independent disk pool group.

The notify object on the other independent disk pool or independent disk pool group is updated if the commitment definition ends abnormally. If the notify object is not found on the other independent disk pool or independent disk pool group, the update fails with message CPF8358.

This message rule, which operates on a 24/7 basis, monitors for instances of a request to reattach an ASP session.

It does this by checking for message HAA2000 ‘Reattach of ASP session IASP_MM was requested (C, G)’ being received in Message File QCPFMSG within library QSYS.
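
The commitment control cases above can be summarized as a small decision function. The Python sketch below encodes them under one stated assumption: the *NONE case is checked first, because the text treats it separately. The SetAspGrpOutcome enum, the setaspgrp_outcome helper and the ASP group name IASP1 are hypothetical.

```python
from enum import Enum


class SetAspGrpOutcome(Enum):
    NOT_AFFECTED = "commitment definitions stay on the system disk pool"
    FAILS_CPDB8EC = "SETASPGRP fails with CPDB8EC, reason code 2"
    DEFINITIONS_MOVED = "commitment definitions move to the new disk pool"


def setaspgrp_outcome(current_group: str, resources_registered: bool) -> SetAspGrpOutcome:
    """Expected commitment-control effect of switching ASP groups.

    current_group is the thread's ASP group before the switch; "*NONE"
    means the thread is using the system disk pool.
    """
    if current_group.upper() == "*NONE":
        # Changing away from the system disk pool: commitment control is not affected.
        return SetAspGrpOutcome.NOT_AFFECTED
    if resources_registered:
        # Resources registered with commitment control on the current disk pool
        # cause the switch to fail.
        return SetAspGrpOutcome.FAILS_CPDB8EC
    return SetAspGrpOutcome.DEFINITIONS_MOVED


print(setaspgrp_outcome("IASP1", resources_registered=True).value)
```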

2040: ASP group not set for thread
Running SETASPGRP is not allowed in PDM, SEU, and the other Application Development Toolset tools. To resolve the issue, run the SETASPGRP command outside of PDM and the other tools.

This message rule, which operates on a 24/7 basis, monitors for instances of a named ASP group not being set for a named thread.

It does this by checking for message CPDB8EC ‘ASP group &2 not set for thread &3’ being received in Message File QCPFMSG within library QSYS.

2050: ASP group not set for thread
Running SETASPGRP is not allowed in PDM, SEU, and the other Application Development Toolset tools. To resolve the issue, run the SETASPGRP command outside of PDM and the other tools.

This message rule, which operates on a 24/7 basis, monitors for instances of a named ASP group not being set for a named thread.

It does this by checking for message CPDB8E9 ‘ASP group &2 not set for thread &3’ being received in Message File QCPFMSG within library QSYS.

2060: Notify object not updated
A notify object is a message queue, data area, or database file that contains information identifying the last successful transaction completed for a particular commitment definition if that commitment definition did not end normally.

For independent disk pools, the notify object must reside on the same independent disk pool or independent disk pool group as the commitment definition. If you move the commitment definition to another independent disk pool or independent disk pool group, the notify object must also reside on that other independent disk pool or independent disk pool group. The notify object on the other independent disk pool or independent disk pool group is updated if the commitment definition ends abnormally. If the notify object is not found on the other independent disk pool or independent disk pool group, the update fails with message CPF8358.

This message rule, which operates on a 24/7 basis, monitors for instances of a named notify object not being updated.

It does this by checking for message CPF8358 ‘Notify object &1 in &2 not updated’ being received in Message File QCPFMSG within library QSYS.

2070: ASP storage threshold reached
During system operation, if the threshold value for the disk pool is exceeded, message CPI0953 is generated. The message data for CPI0953 contains the auxiliary storage capacity, the auxiliary storage used, the percentage of threshold, and the percentage of auxiliary storage available.

This message rule, which operates on a 24/7 basis, monitors for instances of an ASP storage threshold being reached.

It does this by checking for message CPI0953 ‘ASP storage threshold reached’ being received in Message File QCPFMSG within library QSYS.
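
The percentages carried in CPI0953 follow from simple arithmetic on capacity and used storage. The Python sketch below recomputes them to show how the threshold test works. The check_asp_threshold helper and the AspUsage type are hypothetical. The sizes are illustrative, and both must be expressed in the same unit. The sketch uses "greater than" because the text describes the message as generated when the threshold is exceeded.

```python
from typing import NamedTuple


class AspUsage(NamedTuple):
    percent_used: float
    percent_available: float
    over_threshold: bool


def check_asp_threshold(capacity: int, used: int, threshold_pct: float) -> AspUsage:
    """Recompute CPI0953-style figures from capacity and used storage.

    capacity and used must be in the same unit; threshold_pct is the
    disk pool's storage threshold as a percentage (for example 90).
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    percent_used = used * 100 / capacity
    return AspUsage(
        percent_used=percent_used,
        percent_available=100 - percent_used,
        # CPI0953 is described as sent when the threshold is exceeded.
        over_threshold=percent_used > threshold_pct,
    )


# 920 of 1000 units used against a 90% threshold: 92% used, 8% available.
print(check_asp_threshold(capacity=1000, used=920, threshold_pct=90))
```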

2080: Severe error occurred during a GMIR Target Flash
After a hardware issue that caused a fatal condition on the Global Mirror session, the Global Copy (H1-I2) pairs stay in a suspending state because the target volume was write-inhibited in the revertible state.

This message rule, which operates on a 24/7 basis, monitors for instances of a severe error occurring during a GMIR Target Flash.

It does this by checking for messages IAS0383, CPA0701 (IAS0383) or CPF9999 (IAS0383) being received in Message File QCPFMSG within library QSYS.

2090: ASP not in correct state for operation
This message rule, which operates on a 24/7 basis, monitors for instances of an ASP not being in the correct state for the intended operation.

It does this by checking for message CPFBA5A ‘ASP not in correct state for operation’ being received in Message File QCPFMSG within library QSYS.

2100: CRG not in correct state for operation
This message rule, which operates on a 24/7 basis, monitors for instances of a cluster resource group not being in the correct state for the intended operation.

It does this by checking for message CPFBA5B ‘CRG not in correct state for operation’ being received in Message File QCPFMSG within library QSYS.

2110: ASP not eligible for operation
This message rule, which operates on a 24/7 basis, monitors for instances of an ASP not being considered eligible for the intended operation.

It does this by checking for message CPFBA5C ‘ASP not eligible for operation’ being received in Message File QCPFMSG within library QSYS.

2120: CRG not in correct state for operation
This message rule, which operates on a 24/7 basis, monitors for instances of a CRG not being in the correct state for the intended operation.

It does this by checking for message CPFBA54 ‘CRG not in correct state for operation’ being received in Message File QCPFMSG within library QSYS.

2130: Geographic mirroring operation failed
The Start Disk Management Operation (QYASSDMO) API performs various disk management operations as indicated by the Operation key input parameter.

This message rule, which operates on a 24/7 basis, monitors for instances of a geographic mirroring operation failing.

It does this by checking for message CPFBA55 ‘Geographic mirroring operation failed’ being received in Message File QCPFMSG within library QSYS.

2140: Disk management operation failed
The Start Disk Management Operation (QYASSDMO) API performs various disk management operations as indicated by the Operation key input parameter.

This message rule, which operates on a 24/7 basis, monitors for instances of a disk management operation failure.

It does this by checking for message CPFBA56 ‘Disk management operation failed’ being received in Message File QCPFMSG within library QSYS.

2150: Disk balancing request not processed
This message rule, which operates on a 24/7 basis, monitors for instances of a disk balancing request not being processed.

It does this by checking for message CPFBA57 ‘Disk balancing request not processed’ being received in Message File QCPFMSG within library QSYS.

2160: Disk management operation session is in use
This message rule, which operates on a 24/7 basis, monitors for instances where a disk management operation session is in use.

It does this by checking for message CPFBA58 ‘Disk management operation session is in use’ being received in Message File QCPFMSG within library QSYS.

2170: Auxiliary storage pool not found
This message rule, which operates on a 24/7 basis, monitors for instances where a requested auxiliary storage pool is not found.

It does this by checking for message CPFBA59 ‘Auxiliary storage pool not found’ being received in Message File QCPFMSG within library QSYS.

2180: Mismatch on node role and auxiliary storage pool
This message rule, which operates on a 24/7 basis, monitors for instances of a mismatch between a node role and an auxiliary storage pool role.

It does this by checking for message CPFBBAA ‘Mismatch on node role and auxiliary storage pool role’ being received in Message File QCPFMSG within library QSYS.

2190: Node role not valid
This message rule, which operates on a 24/7 basis, monitors for instances of a named node role not being valid.

It does this by checking for message CPFBBAB ‘Node role &1 not valid’ being received in Message File QCPFMSG within library QSYS.

2200: Recovery domain node role is not valid
A recovery domain is a subset of nodes in the cluster that are grouped together in a cluster resource group for a common purpose, such as performing a recovery action.

This message rule, which operates on a 24/7 basis, monitors for instances of a recovery domain node role not being valid.

It does this by checking for message CPFBBAF ‘Recovery domain node role is not valid’ being received in Message File QCPFMSG within library QSYS.

2210: Cluster node not responding
This message rule, which operates on a 24/7 basis, monitors for instances of a named cluster node in a named cluster resource group not responding.

It does this by checking for message CPFBBA0 ‘Cluster node &1 in cluster resource group &2 is not responding’ being received in Message File QCPFMSG within library QSYS.

2220: All cluster command user queues busy
Queues do not correspond to individual hosts; each queue can use all server hosts in the cluster, or a configured subset of the server hosts.

This message rule, which operates on a 24/7 basis, monitors for instances where all cluster command user queues are busy.

It does this by checking for message CPFBBA1 ‘All cluster command user queues busy’ being received in Message File QCPFMSG within library QSYS.

2230: Connection for service cluster cannot be made
This message rule, which operates on a 24/7 basis, monitors for instances where a connection for service as400-cluster cannot be established.

It does this by checking for message CPFBBBB ‘Connection for service as400-cluster cannot be established’ being received in Message File QCPFMSG within library QSYS.

2240: Cluster resource services has an internal error
Cluster Resource Services consists of a set of multi-threaded jobs.

This message rule, which operates on a 24/7 basis, monitors for instances where Cluster Resource Services has an internal error.

It does this by checking for message CPFBBBC ‘Cluster resource services has an internal error’ being received in Message File QCPFMSG within library QSYS.

2250: All cluster command user spaces busy
The Work with Cluster Administrative Domain Monitored Resource Entries (WRKCADMRE) command displays the current monitored resource entries for the given cluster administrative domain.

This message rule, which operates on a 24/7 basis, monitors for instances where all cluster command user spaces are busy.

It does this by checking for message CPFBBB3 ‘All cluster command user spaces busy’ being received in Message File QCPFMSG within library QSYS.

2260: Create cluster administrative domain failed
The Create Cluster Administrative Domain (CRTADMDMN) command creates a peer cluster resource group object that represents the cluster administrative domain. The cluster administrative domain provides synchronization of monitored resources across the active nodes in the domain. The cluster nodes defined within the cluster administrative domain participate in the synchronization process. The cluster administrative domain name is the name of the cluster resource group being created.

This message rule, which operates on a 24/7 basis, monitors for instances where the Create Cluster Administrative Domain command has failed.

It does this by checking for message CPFBBB4 ‘Create cluster administrative domain failed’ being received in Message File QCPFMSG within library QSYS.

2270: Create cluster administrative domain already exists

The cluster administrative domain name is the name of the cluster resource group being created.

This message rule, which operates on a 24/7 basis, monitors for instances where the Create Cluster Administrative Domain command has failed because the domain already exists on a named cluster node.

It does this by checking for message CPFBBB8 ‘Cluster administrative domain already exists on cluster node &1’ being received in Message File QCPFMSG within library QSYS.

2280: Cluster administrative domain internal error

This message rule, which operates on a 24/7 basis, monitors for instances where a named cluster administrative domain reports an internal error.

It does this by checking for message CPFBBB9 ‘Cluster administrative domain &1 internal error’ being received in Message File QCPFMSG within library QSYS.
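Several message texts in this template contain substitution variables such as &1 and &3. These are filled in with names (a domain, node or cluster) when the message is sent. The following is a minimal Python sketch of expanding such placeholders. It assumes the replacement values have already been extracted into a dictionary keyed by variable number. Decoding real message replacement data is not shown, and the domain name `ADMDMN01` is hypothetical.

```python
import re

# Matches substitution variables such as &1, &2 ... &99 in message text.
_SUBST_VAR = re.compile(r"&(\d+)")


def expand_message_text(template: str, values: dict[int, str]) -> str:
    """Replace each &n with values[n]; leave variables without a value as-is."""

    def replace(match: re.Match) -> str:
        index = int(match.group(1))
        return values.get(index, match.group(0))

    return _SUBST_VAR.sub(replace, template)


text = expand_message_text(
    "Cluster administrative domain &1 internal error", {1: "ADMDMN01"}
)
print(text)  # Cluster administrative domain ADMDMN01 internal error
```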

2290: Cluster monitor error on node

This message rule, which operates on a 24/7 basis, monitors for instances where a cluster monitor error occurs on a named node.

It does this by checking for message CPFBBCB ‘Cluster monitor error on node &3’ being received in Message File QCPFMSG within library QSYS.

2300: Request failed for device cluster resource group

This message rule, which operates on a 24/7 basis, monitors for instances where a request fails for a named device cluster resource group.

It does this by checking for message CPFBBCC ‘Request failed for device cluster resource group &3’ being received in Message File QCPFMSG within library QSYS.

2310: Cluster message cannot be sent

A cluster queue definition is advertised to other queue managers in the cluster. The other queue managers in the cluster can put messages to a cluster queue without needing a corresponding remote-queue definition.

This message rule, which operates on a 24/7 basis, monitors for instances where a cluster message cannot be sent.

It does this by checking for message CPFBBCF ‘Cluster message cannot be sent’ being received in Message File QCPFMSG within library QSYS.

2320: Cluster resource group not allowed to be changed

The Change Cluster Resource Group (CHGCRG) command changes some of the attributes of a cluster resource group.

This message rule, which operates on a 24/7 basis, monitors for instances where a named cluster resource group is not allowed to be changed.

It does this by checking for message CPFBBC0 ‘Cluster resource group &1 not allowed to be changed’ being received in Message File QCPFMSG within library QSYS.

2330: Failover action information could not be retrieved

This message rule, which operates on a 24/7 basis, monitors for instances where failover action information could not be retrieved.

It does this by checking for message CPFBBC1 ‘Failover action information could not be retrieved’ being received in Message File QCPFMSG within library QSYS.

2340: Application identifier not valid

This message rule, which operates on a 24/7 basis, monitors for instances where an invalid application identifier has been discovered.

It does this by checking for message CPFBBC2 ‘Application identifier not valid’ being received in Message File QCPFMSG within library QSYS.

2350: Cluster administrative domain does not exist

The cluster administrative domain name is the name of the cluster resource group that represents the domain.

This message rule, which operates on a 24/7 basis, monitors for instances where a named cluster administrative domain does not exist.

It does this by checking for message CPFBBC3 ‘Cluster administrative domain &1 does not exist’ being received in Message File QCPFMSG within library QSYS.

2360: Request not valid for cluster node

This message rule, which operates on a 24/7 basis, monitors for instances where a request is not valid for a named cluster node.

It does this by checking for message CPFBBC4 ‘Request not valid for cluster node &1’ being received in Message File QCPFMSG within library QSYS.

2370: Request failed for device cluster resource group

This message rule, which operates on a 24/7 basis, monitors for instances where a request failed for a named device cluster resource group.

It does this by checking for message CPFBBC5 ‘Request failed for device cluster resource group &2’ being received in Message File QCPFMSG within library QSYS.

2380: Duplicate nodes found in domain node list

The domain node list identifies the nodes that make up the administrative domain. Nodes in the administrative domain must be unique.

This message rule, which operates on a 24/7 basis, monitors for instances where duplicate nodes are found in the domain node list.

It does this by checking for message CPFBBC6 ‘Duplicate nodes found in domain node list’ being received in Message File QCPFMSG within library QSYS.
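Rule 2380 reflects the requirement that nodes in an administrative domain be unique. The following is a minimal Python sketch of detecting duplicates in a domain node list before a request is submitted. The function name and node names are hypothetical, and names are compared exactly as given.

```python
def duplicate_nodes(domain_node_list: list[str]) -> list[str]:
    """Return each node name that occurs more than once, in first-repeat order."""
    seen: set[str] = set()
    duplicates: list[str] = []
    for name in domain_node_list:
        if name in seen and name not in duplicates:
            duplicates.append(name)
        seen.add(name)
    return duplicates


print(duplicate_nodes(["NODEA", "NODEB", "NODEA"]))  # ['NODEA']
```

An empty result means the list satisfies the uniqueness requirement. This sketch only reports duplicates and does not reproduce how the system itself rejects the request.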

2390: Cluster Resource Services job cancelled

This message rule, which operates on a 24/7 basis, monitors for instances where a Cluster Resource Services job has been cancelled.

It does this by checking for message CPFBBC7 ‘Cluster Resource Services job cancelled’ being received in Message File QCPFMSG within library QSYS.

2400: Cluster message is not valid

This message rule, which operates on a 24/7 basis, monitors for instances where a cluster message is not valid.

It does this by checking for message CPFBBD0 ‘Cluster message is not valid’ being received in Message File QCPFMSG within library QSYS.

2410: Local cluster node &1 not found in member array

This message rule, which operates on a 24/7 basis, monitors for instances where a named local cluster node is not found in the member array.

It does this by checking for message CPFBBD1 ‘Local cluster node &1 not found in member array’ being received in Message File QCPFMSG within library QSYS.

2420: Unable to perform PowerHA request

This message rule, which operates on a 24/7 basis, monitors for instances where a PowerHA request could not be performed.

It does this by checking for message CPFBBD4 ‘Unable to perform PowerHA request’ being received in Message File QCPFMSG within library QSYS.

2430: Multiple nodes not allowed for site

This message rule, which operates on a 24/7 basis, monitors for instances where multiple nodes are not allowed for a named site.

It does this by checking for message CPFBBD6 ‘Multiple nodes not allowed for site &4’ being received in Message File QCPFMSG within library QSYS.

2440: Cluster node not active

This message rule, which operates on a 24/7 basis, monitors for instances where a named cluster node is not active in a named cluster.

It does this by checking for message CPFBB0A ‘Node &1 is not active in cluster &2’ being received in Message File QCPFMSG within library QSYS.

2450: Cluster node ID specified more than once

This message rule, which operates on a 24/7 basis, monitors for instances where a named cluster node ID is specified more than once.

It does this by checking for message CPFBB0C ‘Cluster node ID &1 specified more than once’ being received in Message File QCPFMSG within library QSYS.

2460: Cluster interface address specified more than once

The cluster interface address is an IP address that Cluster Resource Services uses to communicate with other nodes in the cluster.

This message rule, which operates on a 24/7 basis, monitors for instances where a named cluster interface address is specified more than once.

It does this by checking for message CPFBB0D ‘Cluster interface address &2 specified more than once’ being received in Message File QCPFMSG within library QSYS.

2470: Cluster resource group type not valid

This message rule, which operates on a 24/7 basis, monitors for instances where a named cluster resource group type is not valid.

It does this by checking for message CPFBB0E ‘Cluster resource group type &1 not valid’ being received in Message File QCPFMSG within library QSYS.

2480: Cluster resource group does not exist in cluster

This message rule, which operates on a 24/7 basis, monitors for instances where a cluster resource group does not exist in the cluster.

It does this by checking for message CPFBB0F ‘Cluster resource group does not exist in cluster’ being received in Message File QCPFMSG within library QSYS.

2490: Cluster already exists

This message rule, which operates on a 24/7 basis, monitors for instances where a cluster already exists.

It does this by checking for message CPFBB01 ‘Cluster already exists’ being received in Message File QCPFMSG within library QSYS.

2500: Cluster does not exist

This message rule, which operates on a 24/7 basis, monitors for instances where a named cluster does not exist.

It does this by checking for message CPFBB02 ‘Cluster &1 does not exist’ being received in Message File QCPFMSG within library QSYS.

2510: Number of cluster node entries not valid

The Create Cluster (QcstCreateCluster) API is used to create a new cluster of one or more nodes. Each node specified on the “Cluster membership information” parameter will be placed in the cluster membership list.

This message rule, which operates on a 24/7 basis, monitors for instances where the number of cluster node entries is not valid.

It does this by checking for message CPFBB03 ‘Number of cluster node entries not valid’ being received in Message File QCPFMSG within library QSYS.

2520: Number of cluster interface addresses not valid

The cluster interface address is an IP address that Cluster Resource Services uses to communicate with other nodes in the cluster.

This message rule, which operates on a 24/7 basis, monitors for instances where the number of cluster interface addresses is not valid.

It does this by checking for message CPFBB04 ‘Number of cluster interface addresses not valid’ being received in Message File QCPFMSG within library QSYS.
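Rules 2460 and 2520 both concern the cluster interface addresses supplied for a node. The Python sketch below checks one node's address list for duplicates and for an invalid count. It uses the standard `ipaddress` module so that equivalent spellings of an address compare equal. The limit of two addresses per node is an assumption made for illustration; check the IBM documentation for your release. The sketch does not detect an address that is repeated across different nodes. The function name is hypothetical.

```python
import ipaddress

# ASSUMPTION: per-node limit used for illustration only; confirm the
# actual limit for your IBM i release.
MAX_INTERFACE_ADDRESSES = 2


def check_interface_addresses(addresses: list[str]) -> list[str]:
    """Return the problems found in one node's cluster interface addresses."""
    problems = []
    if not 1 <= len(addresses) <= MAX_INTERFACE_ADDRESSES:
        problems.append("number of cluster interface addresses not valid")
    # Parsing normalizes the text, so equal addresses compare equal.
    parsed = [ipaddress.ip_address(a) for a in addresses]
    if len(set(parsed)) != len(parsed):
        problems.append("cluster interface address specified more than once")
    return problems


print(check_interface_addresses(["10.0.0.1", "10.0.0.1"]))
```

`ipaddress.ip_address` raises `ValueError` for a string that is not a valid address, so malformed input fails loudly rather than being counted.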

2530: Cluster node cannot be started

The Start Cluster Node (QcstStartClusterNode) API is used to start Cluster Resource Services on a node in the cluster.

This message rule, which operates on a 24/7 basis, monitors for instances where a cluster node cannot be started.

It does this by checking for message CPFBB05 ‘Cluster node cannot be started’ being received in Message File QCPFMSG within library QSYS.

2540: Incoming request from cluster node rejected

This message rule, which operates on a 24/7 basis, monitors for instances where an incoming request from a named cluster node in a named cluster is rejected.

It does this by checking for message CPFBB06 ‘Incoming request from cluster node &3 in cluster &2 rejected’ being received in Message File QCPFMSG within library QSYS.

2550: Node could not be added to cluster

The Add Cluster Node Entry (QcstAddClusterNodeEntry) API is used to add a node to the membership list of an existing cluster.

This message rule, which operates on a 24/7 basis, monitors for instances where a named node could not be added to a named cluster.

It does this by checking for message CPFBB07 ‘Node &1 could not be added to cluster &2’ being received in Message File QCPFMSG within library QSYS.

2560: Cluster name not valid

This message rule, which operates on a 24/7 basis, monitors for instances where a named cluster name is found to be invalid.

It does this by checking for message CPFBB08 ‘Cluster name &1 not valid’ being received in Message File QCPFMSG within library QSYS.

2570: Cluster node does not exist in cluster

This message rule, which operates on a 24/7 basis, monitors for instances where a named cluster node does not exist in a named cluster.

It does this by checking for message CPFBB09 ‘Cluster node &1 does not exist in cluster &2’ being received in Message File QCPFMSG within library QSYS.

2580: >=1 node in recovery domain of CRG must be active

Within i5/OS cluster technology, a recovery domain is a subset of cluster nodes that are grouped together in a cluster resource group (CRG) for a common purpose, such as performing a recovery action or synchronizing events. There must be at least one active node in the recovery domain.

This message rule, which operates on a 24/7 basis, monitors for instances where there are no active nodes in the recovery domain of a named cluster resource group.

It does this by checking for message CPFBB1A ‘At least one node in the recovery domain of cluster resource group &1 must be active’ being received in Message File QCPFMSG within library QSYS.
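The requirement behind rule 2580 can be expressed as a simple check. This Python sketch assumes each recovery domain node carries a status string. The `RecoveryDomainNode` class, the `has_active_node` function and the status values are hypothetical, not an IBM data structure.

```python
from dataclasses import dataclass


@dataclass
class RecoveryDomainNode:
    """Hypothetical record for one node in a CRG recovery domain."""

    name: str
    status: str  # illustrative values such as "Active" or "Inactive"


def has_active_node(recovery_domain: list[RecoveryDomainNode]) -> bool:
    """True if at least one node in the recovery domain is active."""
    return any(node.status == "Active" for node in recovery_domain)


domain = [
    RecoveryDomainNode("NODEA", "Inactive"),
    RecoveryDomainNode("NODEB", "Inactive"),
]
if not has_active_node(domain):
    print("At least one node in the recovery domain must be active")
```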

2590: Cluster node does not exist in the recovery

A recovery domain is a subset of cluster nodes that are grouped together in a cluster resource group (CRG) for a common purpose, such as performing a recovery action or synchronizing events.

This message rule, which operates on a 24/7 basis, monitors for instances where a named cluster node does not exist in the recovery domain.

It does this by checking for message CPFBB1B ‘Cluster node &1 does not exist in the recovery domain’ being received in Message File QCPFMSG within library QSYS.

2600: Cluster node cannot be ended

The End Cluster Node (QcstEndClusterNode) API is used to end Cluster Resource Services on one or all of the nodes in the membership list of an existing cluster.

This message rule, which operates on a 24/7 basis, monitors for instances where a named cluster node in a named cluster cannot be ended.

It does this by checking for message CPFBB1C ‘Cluster node &1 in cluster &2 cannot be ended’ being received in Message File QCPFMSG within library QSYS.

2610: Cluster node cannot be changed

The Change Cluster Node Entry (CHGCLUNODE) command is used to change cluster membership information for a cluster node entry. The information that can be changed is the cluster interface addresses defined for the node and the status of the node.

This message rule, which operates on a 24/7 basis, monitors for instances where a named cluster node in a named cluster resource group cannot be changed.

It does this by checking for message CPFBB1D ‘Cluster node &1 in cluster resource group &2 cannot be changed’ being received in Message File QCPFMSG within library QSYS.

2620: A switch over cannot be done for cluster resource group

The Initiate Switchover (QcstInitiateSwitchOver) API changes the current roles of nodes in the recovery domain of a cluster resource group.

This message rule, which operates on a 24/7 basis, monitors for instances where a switchover cannot be performed for a named cluster resource group.

It does this by checking for message CPFBB1E ‘A switch over can not be done for cluster resource group &1’ being received in Message File QCPFMSG within library QSYS.

2630: Cluster node never started on this system

Use the Cluster Node attribute group to monitor the status of nodes defined in an i5/OS cluster. Node Status shows the status of the node in the cluster as an integer field with enumerated values. One of these values is New: the node has been added to the cluster membership list, but Cluster Resource Services has never been started on that node.

This message rule, which operates on a 24/7 basis, monitors for instances where a cluster node has been added but never started on this system.

It does this by checking for message CPFBB1F ‘Cluster node never started on this system’ being received in Message File QCPFMSG within library QSYS.

2640: Specified cluster interface not defined on system

This message rule, which operates on a 24/7 basis, monitors for instances where the specified cluster interface has not been defined on this system.

It does this by checking for message CPFBB10 ‘Specified cluster interface not defined on this system’ being received in Message File QCPFMSG within library QSYS.

2650: Cluster node already exists in cluster

This message rule, which operates on a 24/7 basis, monitors for instances where a named cluster node already exists in a named cluster.

It does this by checking for message CPFBB11 ‘Cluster node &1 already exists in cluster &2’ being received in Message File QCPFMSG within library QSYS.

2660: Cluster node could not be started

The Start Cluster Node (QcstStartClusterNode) API is used to start Cluster Resource Services on a node in the cluster. If Cluster Resource Services is successfully started on the specified node, the status of the node is set to Active.

This message rule, which operates on a 24/7 basis, monitors for instances where a named cluster node could not be started in a named cluster.

It does this by checking for message CPFBB12 ‘Cluster node &1 in cluster &2 could not be started’ being received in Message File QCPFMSG within library QSYS.

2670: Request is not allowed for cluster resource group

The Start Cluster Resource Group (QcstStartClusterResourceGroup) API enables resiliency for the specified cluster resource group.

This message rule, which operates on a 24/7 basis, monitors for instances where a named request is not allowed for a named cluster resource group.

It does this by checking for message CPFBB18 ‘Request &1 is not allowed for cluster resource group &2’ being received in Message File QCPFMSG within library QSYS.

2680: Cluster node in cluster is already active

The Start Cluster Node (QcstStartClusterNode) API is used to start Cluster Resource Services on a node in the cluster. If Cluster Resource Services is successfully started on the specified node, the status of the node is set to Active.

This message rule, which operates on a 24/7 basis, monitors for instances where a named cluster node in a named cluster is already active.

It does this by checking for message CPFBB19 ‘Cluster node &1 in cluster &2 is already active’ being received in Message File QCPFMSG within library QSYS.
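Rules 2630, 2660 and 2680 together describe part of a node's life cycle. A node added to the membership list starts with status New. A successful start sets it to Active. Starting an already active node is reported by CPFBB19. The following is a minimal Python sketch of those transitions. The `ClusterNode` class, the `NodeAlreadyActive` exception and the node and cluster names are hypothetical, and other node statuses are not modelled.

```python
class NodeAlreadyActive(Exception):
    """Raised when starting a node that is already active (compare CPFBB19)."""


class ClusterNode:
    def __init__(self, name: str, cluster: str) -> None:
        self.name = name
        self.cluster = cluster
        # Added to the membership list, but never started.
        self.status = "New"

    def start(self) -> None:
        if self.status == "Active":
            raise NodeAlreadyActive(
                f"Cluster node {self.name} in cluster {self.cluster} is already active"
            )
        # A successful start of Cluster Resource Services sets the status to Active.
        self.status = "Active"


node = ClusterNode("NODEA", "PRODCLU")
node.start()
print(node.status)  # Active
```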

2690: Cluster version cannot be adjusted

The Adjust Cluster Version (QcstAdjustClusterVersion) API is used to adjust the current version of the cluster. The current cluster version is the version at which the nodes in the cluster are actively communicating with each other.

This message rule, which operates on a 24/7 basis, monitors for instances where the cluster version cannot be adjusted.

It does this by checking for message CPFBB2F ‘Cluster version &1 cannot be adjusted’ being received in Message File QCPFMSG within library QSYS.

2700: Cluster partition detected for cluster

A cluster partition occurs whenever contact is lost between one or more nodes in the cluster and a failure of the lost nodes cannot be confirmed.

This message rule, which operates on a 24/7 basis, monitors for instances where a partition has been detected for a named cluster by a named cluster node.

It does this by checking for message CPFBB20 ‘Cluster partition detected for cluster &1 by cluster node &2’ being received in Message File QCPFMSG within library QSYS.
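The definition of a cluster partition given for rule 2700 can be illustrated in a few lines. This Python sketch assumes the caller already knows whether the failure of each unreachable node has been confirmed. How Cluster Resource Services actually makes that determination is not shown. The function and node names are hypothetical.

```python
def partitioned_nodes(lost_contact: dict[str, bool]) -> list[str]:
    """Return the lost nodes whose failure cannot be confirmed.

    lost_contact maps each node the cluster has lost contact with to True
    if that node's failure has been confirmed, or False if it has not.
    A non-empty result corresponds to a partition condition.
    """
    return sorted(
        node for node, failure_confirmed in lost_contact.items()
        if not failure_confirmed
    )


print(partitioned_nodes({"NODEB": False, "NODEC": True}))  # ['NODEB']
```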

2710: Cluster partition condition no longer exists

This message rule, which operates on a 24/7 basis, monitors for instances where a cluster partition condition is reported to no longer exist for a named cluster.

It does this by checking for message CPFBB21 ‘Cluster partition condition no longer exists for cluster &1’ being received in Message File QCPFMSG within library QSYS.

2720: Cluster Resource Services communications fail

An advanced node failure detection function is provided. It can be used to reduce the number of failure scenarios that result in cluster partitions. A failure might be caused by a communication failure between cluster nodes or by the failure of an entire cluster node.

This message rule, which operates on a 24/7 basis, monitors for instances where a Cluster Resource Services communications failure has occurred on a named cluster node.

It does this by checking for message CPFBB22 ‘Cluster Resource Services communications failure on cluster node &2’ being received in Message File QCPFMSG within library QSYS.

2730: Cluster interface responding in cluster

This message rule, which operates on a 24/7 basis, monitors for instances where a named cluster interface is responding in a named cluster.

It does this by checking for message CPFBB23 ‘Cluster interface &3 responding in cluster &1’ being received in Message File QCPFMSG within library QSYS.

2740: Cluster Resource Services not active or responding

This message rule, which operates on a 24/7 basis, monitors for instances where Cluster Resource Services is found to be inactive or not responding.

It does this by checking for message CPFBB26 ‘Cluster Resource Services not active or not responding’ being received in Message File QCPFMSG within library QSYS.

2750: Primary node was not specified for recovery domain

A recovery domain is a subset of cluster nodes that are grouped together in a cluster resource group (CRG) for a common purpose, such as performing a recovery action or synchronizing events. The primary node is the cluster node that is the primary point of access for the resilient cluster resource.

This message rule, which operates on a 24/7 basis, monitors for instances where a primary node was not specified for a recovery domain.

It does this by checking for message CPFBB27 ‘A primary node was not specified for the recovery domain’ being received in Message File QCPFMSG within library QSYS.

2760: Timeout detected waiting for response

This message rule, which operates on a 24/7 basis, monitors for instances where a timeout has been detected while waiting for a response.

It does this by checking for message CPFBB3C ‘Timeout detected waiting for a response from &1’ being received in Message File QCPFMSG within library QSYS.

2770: Exit program failed during switch over of CRG

The exit program performs actions when the CRG detects certain events, such as a new node being added to the recovery domain or the current primary node failing.

This message rule, which operates on a 24/7 basis, monitors for instances where an exit program fails during the switchover of a named cluster resource group.

It does this by checking for message CPFBB3D ‘Exit program failed during switch over of cluster resource group &1’ being received in Message File QCPFMSG within library QSYS.

2780: CRG could not join or merge with cluster

This message rule, which operates on a 24/7 basis, monitors for instances where a named cluster resource group on a named node could not join or merge with a named cluster.

It does this by checking for message CPFBB3E ‘Cluster resource group &1 on node &2 could not join or merge with cluster &3’ being received in Message File QCPFMSG within library QSYS.

2790: All nodes in cluster

This message rule, which operates on a 24/7 basis, monitors for instances where all nodes in a named cluster failed while another named node was attempting to join the cluster.

It does this by checking for message CPFBB3F ‘All nodes in cluster &1 failed while node &2 was attempting to join’ being received in Message File QCPFMSG within library QSYS.

2800: Cluster node already exists in recovery domain

The Add Node To Recovery Domain API is used to add a new node to the recovery domain of an existing cluster resource group. This API causes the preferred and current roles of all nodes in the recovery domain to be updated.

This message rule, which operates on a 24/7 basis, monitors for instances where a named cluster node already exists in the recovery domain for a named cluster resource group.

It does this by checking for message CPFBB33 ‘Cluster node &1 already exists in recovery domain for cluster resource group &4’ being received in Message File QCPFMSG within library QSYS.

2810: CRG already exists in cluster

The Create Cluster Resource Group (CRTCRG) command creates a cluster resource group object. The cluster resource group serves as the control object for a collection of resilient resources.

This message rule, which operates on a 24/7 basis, monitors for instances where a named cluster resource group already exists in a named cluster.

It does this by checking for message CPFBB34 ‘Cluster resource group &1 already exists in cluster &2’ being received in Message File QCPFMSG within library QSYS.

2820: Number of cluster nodes invalid

This message rule, which operates on a 24/7 basis, monitors for instances where the number of cluster nodes specified for the recovery domain is invalid.

It does this by checking for message CPFBB36 ‘The number of cluster nodes specified for the recovery domain is not valid’ being received in Message File QCPFMSG within library QSYS.

2830: Cluster resource group has two primary nodes

This message rule, which operates on a 24/7 basis, monitors for instances where a named cluster resource group has two primary nodes defined.

It does this by checking for message CPFBB4A ‘Cluster resource group &1 has two primary nodes defined’ being received in Message File QCPFMSG within library QSYS.
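Rules 2750 and 2830 cover two ways a recovery domain's primary node can be misconfigured: no primary is specified, or more than one is defined. The following is a minimal Python sketch of that check. It assumes roles are supplied as plain strings. The function name and the role strings are illustrative, not an IBM API.

```python
from typing import Optional


def primary_node_problem(roles: dict[str, str]) -> Optional[str]:
    """Return a problem description, or None if exactly one node is primary.

    roles maps node name to role; "primary" and "backup" are illustrative
    strings here, not values taken from an IBM API.
    """
    primaries = [node for node, role in roles.items() if role == "primary"]
    if not primaries:
        # Condition reported by CPFBB27.
        return "A primary node was not specified for the recovery domain"
    if len(primaries) > 1:
        # CPFBB4A reports two primary nodes; this sketch flags any number above one.
        return f"{len(primaries)} primary nodes defined"
    return None


print(primary_node_problem({"NODEA": "primary", "NODEB": "backup"}))  # None
```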

2840: Node removed from cluster resource group

The Remove Cluster Node Entry (RMVCLUNODE) command is used to remove a node from a cluster. The specified node is removed from the cluster membership list and is no longer considered a member of the cluster. The node is also removed from the membership of the device domain to which it belongs. The cluster resource group objects on the node being removed are deleted only if the node has a status of Active or if the command is called on the node that is being removed.

This message rule, which operates on a 24/7 basis, monitors for instances of a named node being removed from a named cluster resource group.

It does this by checking for message CPFBB4B ‘Node &1 removed from cluster resource group &2’ being received in Message File QCPFMSG within library QSYS.
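The condition under which RMVCLUNODE deletes the cluster resource group objects on the removed node, as described for rule 2840, reduces to a single boolean expression. This Python sketch uses a hypothetical function name and a plain status string.

```python
def crg_objects_deleted(node_status: str, called_on_removed_node: bool) -> bool:
    """Whether RMVCLUNODE deletes the CRG objects on the node being removed.

    Per the description for rule 2840: only if the node has a status of
    Active, or if the command is called on the node being removed.
    """
    return node_status == "Active" or called_on_removed_node


print(crg_objects_deleted("Inactive", called_on_removed_node=False))  # False
```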

2850: Node not removed from cluster resource group

This message rule, which operates on a 24/7 basis, monitors for instances of a named node not being removed from a named cluster resource group as expected.

It does this by checking for message CPFBB4C ‘Node &1 could not be removed from cluster resource group &2’ being received in Message File QCPFMSG within library QSYS.

2860: Cluster Resource Services cannot process request

Cluster Resource Services consists of a set of multi-threaded jobs. When clustering is active on an IBM i system, these jobs run in the QSYSWRK subsystem.

This message rule, which operates on a 24/7 basis, monitors for instances where Cluster Resource Services cannot process a request.

It does this by checking for message CPFBB4D ‘Cluster Resource Services cannot process the request’ being received in Message File QCPFMSG within library QSYS.

2870: Automatic failover may not have completed for CRG

This message rule, which operates on a 24/7 basis, monitors for instances where automatic failover may not have completed for a named cluster resource group in a named cluster.

It does this by checking for message CPFBB4E ‘Automatic fail over may not have completed for cluster resource group &1 in cluster &2’ being received in Message File QCPFMSG within library QSYS.

2880: Automatic fail over not started for CRG

This message rule, which operates on a 24/7 basis, monitors for instances where automatic failover has not started for a named cluster resource group in a named cluster.

It does this by checking for message CPFBB4F ‘Automatic fail over not started for cluster resource group &1 in cluster &2’ being received in Message File QCPFMSG within library QSYS.

2890: CRG exit program ended abnormally

For most action codes, Cluster Resource Services waits for the exit program to finish before continuing, and no timeout is used. If the exit program goes into a long wait, such as waiting for a response to a message sent to an operator, no other work will be started for the affected cluster resource group. If a long wait occurs during failover processing for a node failure, all Cluster Resource Services jobs are affected and no other cluster work will be started.

This message rule, which operates on a 24/7 basis, monitors for instances where a cluster resource group exit program has ended abnormally.

It does this by checking for message CPFBB41 ‘CRG exit program ended abnormally’ being received in Message File QCPFMSG within library QSYS.

2900: CRG exit program already running

This message rule, which operates on a 24/7 basis, monitors for instances where the exit program for a named cluster resource group is already running.

It does this by checking for message CPFBB42 ‘Cluster resource group &1 exit program already running’ being received in Message File QCPFMSG within library QSYS.
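Rule 2900 reports an exit program that is already running for a cluster resource group. One way to picture this condition is a per-CRG lock that refuses a second concurrent run. The Python sketch below uses `threading.Lock` for that purpose. It is a hypothetical illustration of the concept, not a description of how Cluster Resource Services is implemented, and the `ExitProgramGuard` class is invented for the example. Like Cluster Resource Services for most action codes (see rule 2890), the sketch applies no timeout and simply waits for the exit program to return.

```python
import threading
from typing import Callable


class ExitProgramGuard:
    """Hypothetical guard: at most one exit program run per CRG at a time."""

    def __init__(self) -> None:
        self._locks: dict[str, threading.Lock] = {}
        self._registry_lock = threading.Lock()

    def _lock_for(self, crg: str) -> threading.Lock:
        # Create the per-CRG lock on first use; the registry lock keeps the
        # lookup safe when several threads ask at once.
        with self._registry_lock:
            if crg not in self._locks:
                self._locks[crg] = threading.Lock()
            return self._locks[crg]

    def try_run(self, crg: str, exit_program: Callable[[], None]) -> bool:
        """Run exit_program unless a run is already in progress for this CRG."""
        lock = self._lock_for(crg)
        if not lock.acquire(blocking=False):
            # A run is already in progress for this CRG (compare CPFBB42).
            return False
        try:
            exit_program()  # no timeout: wait for the program to return
            return True
        finally:
            lock.release()
```

A second call for the same CRG made while the first run is still executing returns False. Runs for different CRGs do not block each other.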

2910: CRG internal error

This message rule, which operates on a 24/7 basis, monitors for instances where Cluster Resource Services has encountered an internal error.

It does this by checking for message CPFBB46 ‘Cluster Resource Services internal error’ being received in Message File QCPFMSG within library QSYS.

2920: CRG detected error and may have ended abnormally

This message rule, which operates on a 24/7 basis, monitors for instances where Cluster Resource Services has detected an error and may have ended abnormally.

It does this by checking for message CPFBB47 ‘Cluster Resource Services ended abnormally’ being received in Message File QCPFMSG within library QSYS.

2930: CRG error detected

While it is unlikely you will ever experience a damaged object, it is possible for cluster resource services objects to become damaged.

This message rule, which operates on a 24/7 basis, monitors for instances where Cluster Resource Services has detected an error.

It does this by checking for message CPFBB48 ‘Cluster Resource Services error detected’ being received in Message File QCPFMSG within library QSYS.

2940: Automatic merge of cluster partitions failed

This message rule, which operates on a 24/7 basis, monitors for instances where the automatic merge of cluster partitions has failed.

It does this by checking for message CPFBB49 ‘Automatic merge of cluster partitions failed’ being received in Message File QCPFMSG within library QSYS.

2950: Device domain for recovery domains incorrect

The IOP or high-speed link I/O bridge controlling the devices specified in a cluster resource group must be accessible by all nodes in the cluster resource group's recovery domain, or by all nodes within the same site (for cross-site mirroring).

This message rule, which operates on a 24/7 basis, monitors for instances where the device domain of the nodes in a cluster resource group's recovery domain is not correct.

It does this by checking for message CPFBB5A ‘Device domain for recovery domain nodes not correct’ being received in Message File QCPFMSG within library QSYS.

2960: Resource name conflict for configuration object

The resource name specified in the configuration object must be the same on all nodes.

This message rule, which operates on a 24/7 basis, monitors for instances where a resource name conflict exists for a configuration object.

It does this by checking for message CPFBB5B ‘Resource name conflict for configuration object &2 on node &3’ being received in Message File QCPFMSG within library QSYS.

2970: Configuration object already in cluster resource group

Configuration objects cannot be specified in another cluster resource group.

This message rule, which operates on a 24/7 basis, monitors for instances where a configuration object is already in another cluster resource group.

It does this by checking for message CPFBB5C ‘Configuration object &1 already in cluster resource group &2’ being received in Message File QCPFMSG within library QSYS.

2980: Other related devices already in cluster resource group

This message rule, which operates on a 24/7 basis, monitors for instances where other related devices already exist in the cluster resource group.

It does this by checking for message CPFBB5D ‘Other related devices already in cluster resource group &1’ being received in Message File QCPFMSG within library QSYS.

2990: User profile to run exit program not specified

This message rule, which operates on a 24/7 basis, monitors for instances where the user profile required to run the exit program has not been specified.

It does this by checking for message CPFBB5E ‘User profile to run exit program not specified’ being received in Message File QCPFMSG within library QSYS.

3000: Cluster node not removed from cluster resource group

The Remove Node From Recovery Domain (QcstRemoveNodeFromRcvyDomain) API is used to remove a node from the recovery domain of a cluster resource group. The node being removed does not need to be active in the cluster to be removed from the recovery domain.

This message rule, which operates on a 24/7 basis, monitors for instances where the cluster node has not been removed from the cluster resource group as expected.

It does this by checking for message CPFBB50 ‘Cluster node &1 not removed from cluster resource group &2’ being received in Message File QCPFMSG within library QSYS.

3010: IP address already in use by cluster

If the takeover IP address is active on the node being changed, the Change Cluster Resource Group (QcstChangeClusterResourceGroup) API will fail.

This message rule, which operates on a 24/7 basis, monitors for instances where a takeover IP address is already in use by the cluster.

It does this by checking for message CPFBB51 ‘IP address &4 already in use by cluster &3’ being received in Message File QCPFMSG within library QSYS.
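All of the message rules above follow one pattern: a rule number is bound to a message ID, and a message only matches when it arrives from Message File QCPFMSG in library QSYS. The Python sketch below illustrates that matching logic under stated assumptions. The `ReceivedMessage` type, the `POWERHA_MESSAGE_RULES` table, and `match_message_rule` are hypothetical names, not part of the Halcyon product, and retrieving messages from the system is out of scope.

```python
from dataclasses import dataclass
from typing import Optional

# Message ID -> rule number, taken from the rule descriptions in this guide.
# The table layout and all names here are hypothetical illustrations.
POWERHA_MESSAGE_RULES = {
    "CPFBB4F": 2880,  # Automatic fail over not started for CRG
    "CPFBB41": 2890,  # CRG exit program ended abnormally
    "CPFBB42": 2900,  # CRG exit program already running
    "CPFBB46": 2910,  # CRG internal error
    "CPFBB47": 2920,  # CRG detected error and may have ended abnormally
    "CPFBB48": 2930,  # CRG error detected
    "CPFBB49": 2940,  # Automatic merge of cluster partitions failed
    "CPFBB5A": 2950,  # Device domain for recovery domains incorrect
    "CPFBB5B": 2960,  # Resource name conflict for configuration object
    "CPFBB5C": 2970,  # Configuration object already in cluster resource group
    "CPFBB5D": 2980,  # Other related devices already in cluster resource group
    "CPFBB5E": 2990,  # User profile to run exit program not specified
    "CPFBB50": 3000,  # Cluster node not removed from cluster resource group
    "CPFBB51": 3010,  # IP address already in use by cluster
}


@dataclass(frozen=True)
class ReceivedMessage:
    """A message as seen by the monitor (hypothetical shape)."""
    msg_id: str
    msg_file: str
    library: str


def match_message_rule(msg: ReceivedMessage) -> Optional[int]:
    """Return the number of the rule that should alert, or None if no rule matches."""
    # Every rule in this group only considers messages from QCPFMSG in QSYS.
    if msg.msg_file != "QCPFMSG" or msg.library != "QSYS":
        return None
    return POWERHA_MESSAGE_RULES.get(msg.msg_id)
```

With this table, a CPFBB42 message from QCPFMSG in QSYS maps to rule 2900, while the same message ID from any other message file or library maps to no rule.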

Job Log Rules

There are two Job Log rules contained within the POWERHA Job Log Rule Group.

POWERHA PowerHA Rules

10: Cluster node cannot be started by cluster node

This issue can be caused by the following scenario. A STRCLUNOD command failed, and a second attempt was made from the node being started. The second STRCLUNOD failed because the first, failed STRCLUNOD did not clean up on some or all of the nodes in the cluster. The failure to clean up on some nodes was caused by an intermittent communication failure during a specific step in the STRCLUNOD protocol.

This job log rule, which operates on a 24/7 basis, monitors all batch jobs, across all users and subsystems, for message CPFBB98 ‘Cluster node &1 cannot be started by cluster node &2’ being received in Message File QCPFMSG within library QSYS.

20: Cluster partition status

A cluster partition occurs in a cluster whenever contact is lost between one or more nodes in the cluster and a failure of the lost nodes cannot be confirmed. This is not to be confused with a partition in a logical partition (LPAR) environment.

If you receive error message CPFBB20 in either the history log (QHST) or the QCSTCTL job log, a cluster partition has occurred.

This job log rule, which operates on a 24/7 basis, monitors all jobs, across all users and subsystems, for message CPFBB20 ‘Cluster partition detected for cluster &1 by cluster node &2’ being received in the QCSTCTL job log.
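The two job log rules differ mainly in scope: rule 10 looks at every batch job, while rule 20 only looks at the QCSTCTL job log. The Python sketch below illustrates that difference. It assumes job log messages have already been collected into `JobLogEntry` records (a hypothetical type, not a Halcyon or IBM i API) and that batch jobs carry the job type code BCH.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class JobLogEntry:
    """One message found in a job log (hypothetical shape)."""
    job_name: str
    job_type: str  # assumed: "BCH" for a regular batch job
    msg_id: str


def job_log_alerts(entries: Iterable[JobLogEntry]) -> List[Tuple[int, str]]:
    """Return (rule number, job name) pairs for entries that should raise an alert."""
    alerts = []
    for entry in entries:
        # Rule 10: CPFBB98 in any batch job, for any user or subsystem.
        if entry.msg_id == "CPFBB98" and entry.job_type == "BCH":
            alerts.append((10, entry.job_name))
        # Rule 20: CPFBB20 (cluster partition) in the QCSTCTL job log only.
        elif entry.msg_id == "CPFBB20" and entry.job_name == "QCSTCTL":
            alerts.append((20, entry.job_name))
    return alerts
```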

Device Rules

There are two Device rules contained within the POWERHA Device Rule Group.

POWERHA PowerHA Rules

10: PowerHA: Cluster node active

This rule, which runs 24/7, checks that all cluster node objects (*CLN) have a configuration status of active, and raises an alert if any are found in any other status.

20: PowerHA: Cluster Resource Group Active

This rule, which runs 24/7, checks that all cluster resource group objects (*CRG) have a configuration status of active, and raises an alert if any are found in any other status.
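Both Device rules apply the same status test to a different object type. The Python sketch below shows that test under the assumption that object names, types, and configuration statuses have already been retrieved as plain tuples; the function name, the tuple layout, and the status strings used are hypothetical.

```python
from typing import Iterable, List, Tuple

# Device rule number per cluster object type, as listed above.
RULE_FOR_TYPE = {"*CLN": 10, "*CRG": 20}


def inactive_cluster_objects(
    objects: Iterable[Tuple[str, str, str]],
) -> List[Tuple[int, str, str]]:
    """objects holds (name, object type, configuration status) tuples.

    Returns (rule number, name, status) for every *CLN or *CRG object whose
    status is anything other than active. Other object types are ignored.
    """
    alerts = []
    for name, obj_type, status in objects:
        rule = RULE_FOR_TYPE.get(obj_type)
        if rule is not None and status.upper() != "ACTIVE":
            alerts.append((rule, name, status))
    return alerts
```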

Performance Rules

There are 13 Performance rules contained within the POWERHA Performance Rule Group.

POWERHA PowerHA Rules

10: Subsystem QHTTPSVR active

PowerHA for System i provides a graphical user interface within IBM Systems Director Navigator for i, which is part of the HTTP Server ADMIN instance. We recommend that you have the current level of group PTFs for HTTP Server and Java.

This performance *SUBSYSTEM rule, which operates on a 24/7 basis, monitors the QHTTPSVR subsystem to ensure that it is always active. An alert is raised if the QHTTPSVR subsystem is found to be in an inactive state.

20: QHTTPSVR Admin* jobs active

This Performance *JOB rule checks the ADMIN and ADMIN1-4 jobs in the QHTTPSVR subsystem to ensure that they exist. If any of these jobs is found not to exist, an alert is raised and sent to the system console specified within the action schedule.

30: Subsystem QSYSWRK active

This performance *SUBSYSTEM rule, which operates on a 24/7 basis, monitors the QSYSWRK subsystem to ensure that it is always active. All of the daemon jobs (with the exception of the file server daemon job and the database server daemon job) run in this subsystem. An alert is raised if the QSYSWRK subsystem is found to be in an inactive state.

40: QSYSWRK job QUMECIMOM active

The Common Information Model (CIM) is a language-independent programming model that defines the properties, operations, and relationships of objects in enterprise and Internet environments.

This Performance *JOB rule checks the QUMECIMOM job in the QSYSWRK subsystem to ensure that it exists. If this job is found not to exist, an alert is raised and sent to the system console specified within the action schedule.

50: QSYSWRK job QTOGINTD active

The Internet Daemon (INETD) super server listens for client requests for many different programs. Using INETD saves system resources because processes do not need to be started and listening on ports for services that are not used often. When a client request is received, INETD starts a process to run the configured program that handles the request.

This Performance *JOB rule checks the QTOGINTD job in the QSYSWRK subsystem to ensure that it exists. If this job is found not to exist, an alert is raised and sent to the system console specified within the action schedule.

60: QCSTCTL Cluster Control job active

Cluster control consists of one job, named QCSTCTL. The QCSTCTL and QCSTCRGM jobs are cluster-critical jobs; that is, they must be running for the node to be active in the cluster.

This Performance *JOB rule checks the QCSTCTL Cluster Control job in all subsystems to ensure that it exists. If this job is found not to exist, an alert is raised and sent to the system console specified within the action schedule.

70: QCSTCRGM Cluster Resource Group Manager job active

The cluster resource group manager consists of one job, named QCSTCRGM. The QCSTCTL and QCSTCRGM jobs are cluster-critical jobs; that is, they must be running for the node to be active in the cluster.

This Performance *JOB rule checks the QCSTCRGM Cluster Resource Group Manager job in all subsystems to ensure that it exists. If this job is found not to exist, an alert is raised and sent to the system console specified within the action schedule.

80: QSYSWRK job QHASVR active

QHASVR allows clustering commands to be funneled through it, so that any node in the cluster can run a cluster-related command against another node, even if the node on which the command is run is not active in the cluster or not active in the associated CRG or Admin Domain.

This Performance *JOB rule checks the QHASVR job in the QSYSWRK subsystem for a job type of BCH (Batch - Regular) to ensure that it exists. If this job is found not to exist, an alert is raised and sent to the system console specified within the action schedule.

90: Subsystem QBATCH active

This performance *SUBSYSTEM rule, which operates on a 24/7 basis, monitors the QBATCH subsystem to ensure that it is always active. An alert is raised if the QBATCH subsystem is found to be in an inactive state.

100: QSYSWRK job CHT active

The Clustered Hash Table (CHT) Server enables applications to store and retrieve data that must be highly available across the cluster.

This Performance *JOB rule checks the CHT job in the QSYSWRK subsystem to ensure that it exists. If this job is found not to exist, an alert is raised and sent to the system console specified within the action schedule.

110: QSYSWRK job CRG_name active

This Performance *JOB rule checks the CRG_name job in the QSYSWRK subsystem to ensure that it exists. If this job is found not to exist, an alert is raised and sent to the system console specified within the action schedule.
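The subsystem and job rules above reduce to two presence checks: a required subsystem must be active, and a required job must exist, optionally restricted to a subsystem and a job type. The Python sketch below encodes those requirements as data. It is a hypothetical illustration: `ActiveJob`, `REQUIRED_SUBSYSTEMS`, `REQUIRED_JOBS`, and `performance_alerts` are invented names, the list of active jobs is assumed to be collected elsewhere, and rule 110 is left out because its job name depends on the site's own CRG name.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class ActiveJob:
    """A job seen on the system (hypothetical shape)."""
    name: str
    subsystem: str
    job_type: str


# Rule number -> subsystem that must be active.
REQUIRED_SUBSYSTEMS: Dict[int, str] = {10: "QHTTPSVR", 30: "QSYSWRK", 90: "QBATCH"}

# Rule number -> (required subsystem, or None for any subsystem;
#                 job names that must all exist;
#                 required job type, or None for any type).
REQUIRED_JOBS: Dict[int, Tuple[Optional[str], List[str], Optional[str]]] = {
    20: ("QHTTPSVR", ["ADMIN", "ADMIN1", "ADMIN2", "ADMIN3", "ADMIN4"], None),
    40: ("QSYSWRK", ["QUMECIMOM"], None),
    50: ("QSYSWRK", ["QTOGINTD"], None),
    60: (None, ["QCSTCTL"], None),
    70: (None, ["QCSTCRGM"], None),
    80: ("QSYSWRK", ["QHASVR"], "BCH"),
    100: ("QSYSWRK", ["CHT"], None),
}


def performance_alerts(
    active_subsystems: Set[str], active_jobs: Iterable[ActiveJob]
) -> List[Tuple[int, str]]:
    """Return (rule number, subsystem or job name) for every missing item."""
    jobs = list(active_jobs)  # iterated once per required job, so materialize it
    alerts = []
    for rule, sbs in REQUIRED_SUBSYSTEMS.items():
        if sbs not in active_subsystems:
            alerts.append((rule, sbs))
    for rule, (sbs, names, job_type) in REQUIRED_JOBS.items():
        for name in names:
            found = any(
                job.name == name
                and (sbs is None or job.subsystem == sbs)
                and (job_type is None or job.job_type == job_type)
                for job in jobs
            )
            if not found:
                alerts.append((rule, name))
    return alerts
```

Keeping the requirements in tables rather than in code means that adding a job to a rule, as the ADMIN1-4 jobs show, is a data change only.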

120: Disk Busy >40%

This Performance *DISK rule checks disk busy activity for all ASPs and disk units. If any ASP or disk unit is found to be more than 40% busy, an alert is raised and sent to the system console specified within the action schedule.

130: Any ASP using >90%

This Performance *ASP rule runs 24/7 and is set to alert on the first occurrence of the performance threshold being breached; in this case, any ASP using more than 90% of its capacity.
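Rules 120 and 130 are simple "greater than" thresholds applied to every sample. The Python sketch below shows the comparison, assuming busy and usage percentages have already been gathered into dictionaries keyed by disk unit or ASP name; the function and dictionary names are hypothetical.

```python
from typing import Dict, List, Tuple

DISK_BUSY_LIMIT = 40.0  # rule 120: alert when busy is greater than 40%
ASP_USED_LIMIT = 90.0   # rule 130: alert when usage is greater than 90%


def threshold_alerts(
    disk_busy_pct: Dict[str, float], asp_used_pct: Dict[str, float]
) -> List[Tuple[int, str, float]]:
    """Return (rule number, unit or ASP, value) for each breached threshold.

    Because the comparison is strictly "greater than", a value of exactly
    40% or 90% does not raise an alert. Alerting on the first occurrence
    means no repeat count is needed: one breaching sample is enough.
    """
    alerts = []
    for unit, busy in disk_busy_pct.items():
        if busy > DISK_BUSY_LIMIT:
            alerts.append((120, unit, busy))
    for asp, used in asp_used_pct.items():
        if used > ASP_USED_LIMIT:
            alerts.append((130, asp, used))
    return alerts
```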

Audit Journal Rules

There are five Audit Journal rules contained within the POWERHA Audit Journal Rule Group.

POWERHA PowerHA Rules

10: SV - System value changed QCFGMSGQ

This rule, which runs on a 24/7 basis, checks for any change to QCFGMSGQ (the message queue for lines, controllers and devices). If the value has changed, an alert is raised and sent to the system console specified within the action schedule.

20: SV - System value changed QRETSVRSEC

The Retain Server Security (QRETSVRSEC) system value determines whether decryptable authentication information associated with user profiles or validation list (*VLDL) entries can be retained on the host system.

This rule, which runs on a 24/7 basis, checks for any change to QRETSVRSEC. If the value has changed, an alert is raised and sent to the system console specified within the action schedule.

30: CP - User profile changed QCLUSTER

The cluster resource group (CRG) is owned by the QCLUSTER user profile. To use the cluster resource group commands with the cluster administrative domain, you need to be authorized to the cluster resource group and to the QCLUSTER user profile.

This rule, which runs on a 24/7 basis, checks for any change to the QCLUSTER user profile properties since the last check was made. If any of these properties have changed, an alert is raised and sent to the system console specified within the action schedule.

40: CP - User profile changed QHAUSRPRF

The QHAUSRPRF profile is used by PowerHA to perform PowerHA functions. Users should not sign in and do work with this profile.

This rule, which runs on a 24/7 basis, checks for any change to the QHAUSRPRF user profile since the last check was made. If any of its properties have changed, an alert is raised and sent to the system console specified within the action schedule.

50: ZC - CRG QUSRSYS Objects changed

This rule, which runs on a 24/7 basis, checks for any change to Cluster Resource Group objects in the QUSRSYS library since the last check was made. If any of these objects have changed, an alert is raised and sent to the system console specified within the action schedule.
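The audit journal rules can be read as a filter over journal entries: SV entries for two system values, CP entries for two user profiles, and ZC entries for cluster resource group objects. The Python sketch below is a hypothetical illustration of that filter. `AuditEntry`, `WATCHED`, and `audit_alerts` are invented names, entries are assumed to have been read from the audit journal already, and limiting the ZC rule to objects of type *CRG in QUSRSYS is an assumption about how the rule is scoped.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class AuditEntry:
    """One audit journal entry (hypothetical, simplified shape)."""
    entry_type: str  # two-character entry type, e.g. "SV", "CP", "ZC"
    name: str        # system value, user profile, or object name
    library: str = ""
    obj_type: str = ""


# (entry type, name) -> rule title, following the rule list above.
WATCHED = {
    ("SV", "QCFGMSGQ"): "SV - System value changed QCFGMSGQ",
    ("SV", "QRETSVRSEC"): "SV - System value changed QRETSVRSEC",
    ("CP", "QCLUSTER"): "CP - User profile changed QCLUSTER",
    ("CP", "QHAUSRPRF"): "CP - User profile changed QHAUSRPRF",
}
ZC_TITLE = "ZC - CRG QUSRSYS Objects changed"


def audit_alerts(entries: Iterable[AuditEntry]) -> List[Tuple[str, str]]:
    """Return (rule title, name) for each audit entry that should raise an alert."""
    alerts = []
    for entry in entries:
        title = WATCHED.get((entry.entry_type, entry.name))
        # Assumption: the ZC rule matches changes to *CRG objects in QUSRSYS.
        if (
            title is None
            and entry.entry_type == "ZC"
            and entry.library == "QUSRSYS"
            and entry.obj_type == "*CRG"
        ):
            title = ZC_TITLE
        if title is not None:
            alerts.append((title, entry.name))
    return alerts
```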

Halcyon Templates

The following system templates are available for use with Halcyon IBM i and Windows monitoring solutions:

• AIX
• AIX TEMENOS 24
• AIX VIOS
• HP DATA PROTECTOR
• IBM SERVICES MONITORING
• iCLUSTER
• INFOR M3
• INFOR SYSTEM 21
• INFOR XA
• JD EDWARDS
• LINUX
• MAXAVA
• MISYS EQUATION
• MISYS MIDAS PLUS
• POWER HA
• QUICK EDD
• ROBOT HA
• SAP
• STAND GUARD ANTI VIRUS
• SYMANTEC BACKUP EXEC
• SYMANTEC NETBACKUP
• VISION iTERA
• VISION OMS/ODS REPLICATION
• WEBSPHERE MQ MONITORING
• WINDOWS

Learn More

For white papers, online product tours, datasheets, technical tips and manuals, please visit: https://www.helpsystems.com/halcyon

Contact

www.helpsystems.com

US: Toll-free: 800-328-1000

+1 952-933-0609

Outside the U.S.: +44 (0) 1252 618030

Trademarks

IBM®, iSeries®, Power/System i®, IBM i®, i5/OS® and AIX® are registered trademarks of International Business Machines Corporation in the United States and in other countries.

All other trademarks are the property of their respective owners.