
eHealth™ Discover Process Best Practices

Managing and Troubleshooting the eHealth Discovery Process

The eHealth discover mechanism is the main avenue to integrate existing network devices into the eHealth Fault and Performance management environment. Through that process, network devices are added to the eHealth configuration and provide the user with the data necessary to successfully manage their network infrastructure and to maximize the benefit of the eHealth suite.

Prepared by: Jason Normandin Concord Technical Support

Copyright © 2004 Concord Communications, Inc. eHealth, the Concord Logo, Live Health, Live Status, SystemEDGE, AdvantEDGE and/or other Concord marks or products referenced herein are either registered trademarks or trademarks of Concord Communications, Inc. Other trademarks are the property of their respective owners.

Concord Communications – Discover Best Practices

I. Introduction
II. Prerequisites
III. Overview of the eHealth Discover Process
   1. How does the eHealth Discover Process Work?
   2. What are the Differences between Ad-hoc and Scheduled Discoveries?
   3. How do Discoveries Impact My License Consumption?
   4. Explanation of the eHealth Merge Algorithm
   5. How does selecting the ‘MIB2’ Option Impact My Discovery Results?
IV. Troubleshooting Common Discovery Issues
   1. Troubleshooting ‘No Response to SNMP’ or ‘No Response to Ping’ errors
   2. Troubleshooting ‘No MIB Support for this Agent’ errors
   3. Troubleshooting SystemEDGE Discovery Issues
   4. Reconciling and Avoiding Duplicate Element Creation
V. General Discovery Best Practices
   1. Avoiding Duplicate Elements by Implementing a Strong Change Control Process
   2. Minimizing Data Loss through Statistics Poller Error Analysis and ‘nodbdatafor’ Tools
   3. Using Seed Files to Automate Incremental Configuration Updates
   4. Self Monitoring the eHealth System Using Process Set Creation
   5. Effectively Interfacing with Concord Technical Support to Resolve Discovery Issues
VI. Changes to the Discovery Process in eHealth 5.6.x
VII. Other Resources


I. Introduction

The eHealth™ discover process is an integral piece of a successful eHealth implementation. The eHealth discover mechanism is the main avenue to integrate existing network devices into the eHealth Fault and Performance management environment. Through that process, network devices are added to the eHealth configuration and provide the user with the data necessary to successfully manage their network infrastructure and to maximize the benefit of the eHealth suite.

Although relatively simple, the eHealth discover process does require a ‘hands on’ approach to ensure success. This document provides the reader with the knowledge and tools necessary to manage that process successfully.

II. Prerequisites

This document is not intended as a replacement for the standard eHealth suite documentation such as the Administration Guide or User’s Guide. This document should be used in conjunction with the existing eHealth manuals and Concord Knowledgebase. Additionally, the reader should possess a basic understanding of the eHealth application, an understanding of their network infrastructure, and a basic understanding of SNMP and device MIBs.

III. Overview of the eHealth Discover Process

1. How does the eHealth Discover Process Work?

There are three key steps to the discovery process.

1. The finder process searches the network for everything it can find within preset limits.

The preset limits, for either a scheduled or an interactive discovery, are determined by the user and fall into three major categories: the IP addresses to search, the technology type, and the community string.

The finder is a program written in the Tcl ("Tool Command Language") scripting language. Tcl is an interpreted language and, like any interpreted language, requires a runtime interpreter. The Tcl interpreter and related libraries are included in the $NH_HOME/bin/sys directory. Finder is composed of several logical pieces that operate sequentially to perform one primary mission: the creation of poll records in the $NH_HOME/poller/poller.cfg file. Finder is never run directly; rather, it is called from other scripts or programs (depending on the operating system), which check the environment, set variables, and so on.

First, finder queries the sysObjectID of the device in question (DiQ). The sysObjectID is an entry in the MIB2 system table that identifies the vendor who wrote and/or implemented the MIB being queried. Depending on the object class being discovered (i.e. LAN/WAN, Router, Probe, or Server), finder goes to the class's main table and iterates through the list of possible OIDs until it finds a match for the sysObjectID retrieved from the DiQ's MIB. If a match is found, the table tells finder where to go next. In the case of LAN/WAN, the main table points finder toward a vendor-specific (or perhaps IETF standard) algorithm and a vendor-specific (or IETF standard) interface table to be used as input to that algorithm. In this way finder can cover any situation where a device is supported by standard MIBs, such as an RMON probe; by vendor-specific MIBs, such as a Cabletron EMME concentrator; or by a combination of both, such as a Cisco Catalyst 5000, which populates the MIB2 interface table (IETF standard table) but places the module and port numbers in the enterprise MIB (vendor-specific algorithm).

Next, finder uses a collection of tables for interface types. These tables are used to choose the interface types (ifType in MIB2) that will be added to the poller configuration. We generally do not want to use all entries in the ifTable, as some entries are not relevant to eHealth. For example, when discovering an RMON probe for Ethernet statistics, we most likely do not want to discover the out-of-band (OOB) 9600 bps SLIP port, which is also in the MIB2 ifTable. These tables therefore act as an inclusive decision filter that passes only the interface types we want downstream to the algorithm that generates poll records.

Finder also looks at the ifAdminStatus or the ifOperStatus to determine whether the device/interface is down or up. In most cases ifAdminStatus is used; in a few cases ifOperStatus is used instead. This is determined in finder.tcl. ifOperStatus is the actual electrical connection of the device (plugged in or not plugged in). ifAdminStatus is the desired state that the administrator chooses.

both up = active and discoverable
both down = not active and not discoverable
ifOperStatus up, ifAdminStatus down = active but not discoverable
ifOperStatus down, ifAdminStatus up = not active, but the administrator does want it to be discovered

Once an interface has passed through this table, it is passed off to an algorithm to generate the actual poll record (poller entry). In some cases the algorithm performs some additional exclusive interface-type filtering. Sometimes only a single interface entry is sent and we iterate through an array of entries, and other times we go interface by interface and generate a poll record for each one.
Which approach is used depends on whether we are doing standard support or enterprise-specific support and on how many interfaces exist on the device. Sometimes the enterprise support is extremely easy, using a simple table and the standard algorithm, and sometimes it is quite complex. It all depends on the complexity of the MIB implementation and the statistics that the customer is requesting. If there is a requirement to cross-reference variables from one table to others, the algorithm can be quite intricate.

For example, an RMON probe is quite simple. A stand-alone probe will typically populate its MIB2 ifTable with one or more Ethernet ports and an OOB SLIP port. We filter out everything but the Ethernet ports and send them down to the standard ifTable algorithm, which generates poll records for RMON "etherstats" elements.

An example of more complex support is the Bay 5000 chassis. Like most chassis designs, the Bay 5000 can accept many types of blades (Ethernet, token-ring, FDDI, ATM, management, etc.), each with different capabilities and/or numbers of ports. The management software (i.e. Optivity) allows the user to define logical groups of ports (virtual LANs, if you will), which are either partitioned from the rest of the network or connected to other ports on another card or chassis. In order to provide utilization statistics for all of these complex blade types and the virtual LANs that the user can build, the vendor had to come up with a very complex group of MIBs. Consequently, finder needs to sort through this jungle of options, and its support for this device is rather involved.

Based on the above information, finder assigns an agent type to each element found on the device. The agent type is associated with a MIB translation file in the $NH_HOME/poller directory. A list of these associations may be found in $NH_HOME/poller/agent.types.

2. The newly created DCI file is passed through the eHealth Merge Algorithm.


The collected data is put into a single, temporary, internal DCI file. DCI files are comma-separated flat files. The merge process then takes the temporary DCI file created by the finder process and compares that information against the known elements in the poller configuration. It does this to determine one of three things about the new information:

a) Is the newly discovered item identical to a pre-existing item in the poller?
b) Does the new item already exist in the poller but need to be updated?
c) Is the discovered item an actual new discovery?

For more information on the eHealth merge process, please see section 3.4, Explanation of the eHealth Merge Algorithm.

3. The new elements or updates to existing elements are saved to the eHealth configuration.

For more information on the save process, please see section 3.2, What are the Differences between Ad-hoc and Scheduled Discoveries?
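The interface selection described in step 1 can be illustrated with a short Python sketch. This is not the finder code (finder is a Tcl program and its tables are not reproduced in this document); the function name, the dictionary layout of an ifTable row, and the set of wanted interface types are assumptions made for illustration only. It shows the two ideas from the text: an inclusive ifType filter, followed by an up/down decision that normally uses ifAdminStatus and only occasionally ifOperStatus.

```python
# Hypothetical sketch of the finder's interface selection, NOT the real finder.tcl.
# Rows are plain dicts standing in for MIB2 ifTable entries.

def select_interfaces(if_table, wanted_types, use_oper_status=False):
    """Return the ifIndex of every row that passes both filters.

    1. Inclusive filter: keep only rows whose ifType is in wanted_types.
    2. Status check: keep only rows that are "up", judged by ifAdminStatus
       in the common case or by ifOperStatus when use_oper_status is True.
    """
    selected = []
    for row in if_table:
        if row["ifType"] not in wanted_types:
            continue  # e.g. the out-of-band SLIP port on an RMON probe
        status_field = "ifOperStatus" if use_oper_status else "ifAdminStatus"
        if row[status_field] == "up":
            selected.append(row["ifIndex"])
    return selected


# Illustrative ifTable for a stand-alone probe (values are made up).
PROBE_IF_TABLE = [
    {"ifIndex": 1, "ifType": "ethernetCsmacd", "ifAdminStatus": "up", "ifOperStatus": "down"},
    {"ifIndex": 2, "ifType": "slip", "ifAdminStatus": "up", "ifOperStatus": "up"},
    {"ifIndex": 3, "ifType": "ethernetCsmacd", "ifAdminStatus": "down", "ifOperStatus": "up"},
]

if __name__ == "__main__":
    print(select_interfaces(PROBE_IF_TABLE, {"ethernetCsmacd"}))
    print(select_interfaces(PROBE_IF_TABLE, {"ethernetCsmacd"}, use_oper_status=True))
```

With ifAdminStatus as the deciding field, interface 1 is kept even though it is not electrically active, which matches the "not active, but the administrator does want it to be discovered" case in the status table.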

2. What are the Differences between Ad-hoc and Scheduled Discoveries?

Scheduled and ad-hoc discoveries perform the same duties; they differ only in how and when the results are saved to the eHealth configuration.

A scheduled discovery can be configured either to save the discovery results or simply to log them for review by the eHealth administrator. If the scheduled discovery is configured to log the results only, the eHealth administrator should review the logged changes and re-run the discovery at a later time to actually save the results. When a scheduled discovery is configured to save the results, the merge process and the save process take place at the same time in the config server, because a scheduled discovery does not allow the user to review the discovery information before it is saved to the database.

In contrast, during an interactive discovery, eHealth gives the user the option to edit the findings before saving. When "Edit before Save" is selected, all new elements found are brought up in the poller configuration editor, where the user may modify the information found by the finder. The save process generates a DCI file that contains the original discovery information along with the modifications made by the user through "Edit before Save". This DCI file is then sent to the config server to update the poller configuration/database.

The interactive discovery has been engineered to be the more aggressive tool. Because the user is allowed to edit the findings before committing them to the database, the user has more control over what will be polled on the network and how it is polled.

The eHealth discover log files are an invaluable tool for managing the discovery process. The log files created by eHealth vary depending on the type of discovery run.
For ad-hoc/interactive discoveries, eHealth creates discoverInteractive logs in the $NH_HOME/log directory, which contain detailed information about each element discovered, possible duplicates, and unresolved elements. The log files have the following naming convention:

discoverInteractive.mm.dd.yyyy.nnnnnn.log

If the ad-hoc/interactive results are not saved, .unsaved is appended to the log file name. The $NH_HOME/log/discoverResults.log file is created for an ad-hoc/interactive discovery as well. This log contains the listing of findings seen in the discover results window. A poller audit log is also created in the $NH_HOME/log directory. This log contains a listing of all of the changes made to the poller configuration when the results are saved. These log files have the following format:

pollerAudit.date.time.log

For scheduled discoveries with the 'Save Results' option selected, a discover..log is created in the $NH_HOME/log directory. It contains the information that would have been displayed in the Discover UI if the discovery had been run interactively. As with the ad-hoc discovery above, a discoverResults.log and a pollerAudit log are created containing the same information as documented above. The scheduled discovery process also creates a discoverScheduled log. This log file has the same naming convention and contents as the discoverInteractive log described above.

For scheduled discoveries with the 'Report only' option selected, the same log files are created with the same information as for a scheduled discovery with the 'Save Results' option selected, with the exception of the pollerAudit log. Since no changes are made to the configuration, that log is not created.
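When many discoveries have been run, it can help to sort the discover logs by type and by whether the results were saved. The following Python sketch parses file names according to the naming convention given above. It assumes that mm, dd, yyyy and nnnnnn are all numeric and that ".unsaved" is appended after ".log"; the function and type names are hypothetical and are not part of eHealth.

```python
import re
from typing import NamedTuple, Optional

# Pattern for discoverInteractive.mm.dd.yyyy.nnnnnn.log and the equivalent
# discoverScheduled name, with an optional trailing ".unsaved".
_LOG_NAME = re.compile(
    r"^discover(Interactive|Scheduled)"
    r"\.(\d{2})\.(\d{2})\.(\d{4})"
    r"\.(\d+)\.log(\.unsaved)?$"
)


class DiscoverLog(NamedTuple):
    kind: str     # "interactive" or "scheduled"
    date: str     # ISO form, yyyy-mm-dd
    serial: str   # the nnnnnn portion, kept as text
    saved: bool   # False when the name ends in ".unsaved"


def parse_discover_log_name(name: str) -> Optional[DiscoverLog]:
    """Parse one file name; return None if it is not a discover log."""
    match = _LOG_NAME.match(name)
    if match is None:
        return None
    kind, month, day, year, serial, unsaved = match.groups()
    return DiscoverLog(
        kind=kind.lower(),
        date=f"{year}-{month}-{day}",
        serial=serial,
        saved=unsaved is None,
    )


if __name__ == "__main__":
    for example in (
        "discoverInteractive.03.31.2004.000123.log",
        "discoverInteractive.03.31.2004.000124.log.unsaved",
        "discoverResults.log",
    ):
        print(example, parse_discover_log_name(example))
```

The file names in the usage section are invented examples that follow the documented pattern. To apply the function to a real system, pass it the names returned by listing the $NH_HOME/log directory.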

3. How do Discoveries Impact My License Consumption?

Discovering a particular element within a device does not necessarily consume a license. For example, if a router discovery is performed without the LAN/WAN option selected, all of the router's certified interfaces will also be discovered. These interfaces do not consume a license, however, as they simply contribute to the aggregate values reported by the parent router element. If the LAN/WAN option is selected, or the 'Include in LAN/WAN reports' option is selected within the poller configuration UI, the interfaces are actively polled to report individual statistics, and a poller license is therefore consumed for each interface.

The same scenario exists during Server discoveries. Several disks, partitions, CPUs, etc. may be discovered and actively polled, but once again these elements simply provide aggregate variables to the parent Server element and therefore do not consume a poller license. The LAN/WAN elements of a server are subject to the same scenario as described in the Router discovery example above.

Turning off polling for an aggregated element does not impact the total available licenses, while disabling polling for non-aggregated elements does. It must be noted, however, that disabling polling for aggregate elements will impact the total statistics reported by the parent device.

Other element types such as RAS and Process Sets share similar parent-child relationships, and license usage for those technologies is similar to that described above. In addition, weighted licensing of certain technology types such as Wireless Access Points and Mobile Wireless devices affects license consumption. Weighted licensing simply means that certain element types consume more than one license per element. For example, PDSN elements utilize 1000 statistical licenses per element. This is due to the amount of information that one PDSN element provides.


Understanding eHealth element license consumption allows the eHealth administrator to better manage and track license usage. The eHealth console provides administrators with information regarding license usage. Additionally, starting in eHealth 5.0.2 P05 and eHealth 5.5 P03, the command nhListElementLicenses can be used to identify which elements are using a license. This command outputs a list of all elements with a number next to each. If the element has a -1 next to it, it does not use a license. If the element has a positive number next to it, it does use a license. If the same number is next to several elements, all of those elements together use only one license. The numbers increment with each license used, so the bottom number is the total number of licenses being used. For example, the following is the output from the nhListElementLicenses command:

2 sysName-SH
2 sysName-SH-/
2 sysName-SH-/export
2 sysName-SH-/opt
2 sysName-SH-/tmp
2 sysName-SH-/var
2 sysName-SH-/var/run
2 sysName-SH-Cpu-1
2 sysName-SH-disk-dad0c0t0d0s0
2 sysName-SH-disk-dad1c0t2d0s0
2 sysName-SH-disk-sd0c0t1d0s0
3 sysName-SH-enet-port-2

In this case, one license is being used by all the elements except sysName-SH-enet-port-2. This element utilizes its own license, as can be seen by the count incrementing to three. Each additional element that uses its own license increments the count by one.
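For a large configuration, the nhListElementLicenses listing is easier to review once the elements are grouped by license number. The Python sketch below does that grouping. It assumes the output has one element per line in the form "number, whitespace, element name", which is how the example above reads; the exact output layout is not specified in this document, and the function name is hypothetical.

```python
from collections import defaultdict

# Example listing taken from the nhListElementLicenses output shown above.
SAMPLE_OUTPUT = """\
2 sysName-SH
2 sysName-SH-/
2 sysName-SH-/export
2 sysName-SH-/opt
2 sysName-SH-/tmp
2 sysName-SH-/var
2 sysName-SH-/var/run
2 sysName-SH-Cpu-1
2 sysName-SH-disk-dad0c0t0d0s0
2 sysName-SH-disk-dad1c0t2d0s0
2 sysName-SH-disk-sd0c0t1d0s0
3 sysName-SH-enet-port-2
"""


def summarize_licenses(output):
    """Group element names by license number.

    Returns (by_license, unlicensed): by_license maps each positive license
    number to the elements sharing it, and unlicensed lists the elements
    marked with -1. Lines that do not start with an integer are ignored.
    """
    by_license = defaultdict(list)
    unlicensed = []
    for line in output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        try:
            number = int(parts[0])
        except ValueError:
            continue
        name = parts[1].strip()
        if number == -1:
            unlicensed.append(name)
        else:
            by_license[number].append(name)
    return dict(by_license), unlicensed


if __name__ == "__main__":
    licenses, free = summarize_licenses(SAMPLE_OUTPUT)
    for number in sorted(licenses):
        print(number, len(licenses[number]), "element(s)")
    print("elements not using a license:", len(free))
```

In the sample, the eleven elements that share number 2 consume one license between them, and sysName-SH-enet-port-2 consumes its own.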

4. Explanation of the eHealth Merge Algorithm

Upon element discovery or rediscovery, eHealth executes the merge process to determine whether each element discovered by the finder matches an existing element in the poller configuration. The merge process is invoked after the discovery process has finished creating an incoming DCI file. The following is the DCI attribute search order used to determine if a discovered element is a "resolved update", an "unresolved update" or a "new element":

For eHealth 5.5 and earlier:

1. nmsSource
   o The default nmsSource is NH:DISCOVER, and it is hard coded in the discovery process
   o Integration modules and Application Response elements have a different nmsSource
   o The matching search is limited to those elements having the same nmsSource and nmsId not empty; if a match is found, move to item 2, otherwise a "new element" is created
   o If nmsId is empty, move to item 3

2. uniqueDeviceId
   o Unique attribute for each device in the network used to distinguish one from another
   o Assigned by the finder upon discovery
   o By default, it is set to the lowest MAC address found in the device
   o For Cisco routers the chassis-Id is used, except when the first 4 alpha characters in a row
   o In environments where neither the MAC address nor the Cisco SNMP ChassisId is unique, the variable NH_DISCOVER_DEVID_IP can help to set the uniqueDeviceId to be one of the following:
     § sysName
     § ipAddress
     § sysName-MAC (sysName-ChassisId or sysName-ipAddress for Cisco Routers)
   o The matching search is limited to those elements having the same uniqueDeviceId
   o If a match is found, move to item 4, otherwise a "new element" is created

3. nmsId (Discover key)
   o Assigned by the finder upon discovery, usually: sysName Descr
   o Unique for each element, except for parent elements
   o Parent elements do not have a discover key, therefore nmsId is not used for matching
   o If a match is found, a "resolved update" is issued, otherwise move to item 4
   o If within the incoming DCI file more than 1 element is found with the same nmsId (not empty), the nmsId is marked as "poisoned" and it is not used for element matching
   o When nmsId is empty or "poisoned", the following applies:
     § If there is a match of the first item and a combination of the others listed below, an "unresolved update" is issued, even if it has a different nmsSource and/or uniqueDeviceId
     § 2 out of 3 match of uniqueDeviceId, sysName and ipAddr
     § mibTranslationFile
     § All indices (index1, index2, index3 and index4)
     § community string

4. ipAddress, mibTranslationFile, community string, All indices
   o Last ditch resort used to match an element when matching by nmsId failed
   o If there is a match of these attributes, an "unresolved update" is issued
   o If there is no match found, a new element will be created
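Two of the rules in item 3 above are easy to misread, so a small Python sketch may help: the "poisoned" nmsId rule and the "2 out of 3" device match. This is an illustration only and not the eHealth implementation. Elements are modeled as plain dictionaries keyed by the attribute names used in the list above, and the function names are hypothetical.

```python
from collections import Counter


def poisoned_nms_ids(incoming_elements):
    """Return every non-empty nmsId that occurs more than once in the
    incoming DCI data. Per the rule above, such an nmsId is "poisoned"
    and is not used for element matching."""
    counts = Counter(
        element["nmsId"] for element in incoming_elements if element.get("nmsId")
    )
    return {nms_id for nms_id, seen in counts.items() if seen > 1}


def two_of_three_match(found, existing):
    """True when at least two of uniqueDeviceId, sysName and ipAddr are
    non-empty in the discovered element and equal in both elements."""
    fields = ("uniqueDeviceId", "sysName", "ipAddr")
    hits = sum(
        1 for field in fields
        if found.get(field) and found.get(field) == existing.get(field)
    )
    return hits >= 2
```

A device that has been given a new IP address but kept its sysName and uniqueDeviceId would still satisfy `two_of_three_match`, which is the situation the rule appears designed to tolerate.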

For eHealth version 5.6 and above:

1. nmsSource
   o The default nmsSource is NH:DISCOVER; it is hard coded in the discovery process
   o Integration modules and Application Response elements have a different nmsSource
   o The matching search is limited to those elements having the same nmsSource
   o If a match is found, move to item 2, otherwise a "new element" is created

2. deviceHashKey
   o New DCI field added in eHealth 5.6
   o NOT visible from the GUI, only through DCI
   o Assigned during the merge to uniquely identify each device within the configuration
   o The matching search is limited to those elements having the same deviceHashKey
   o If a match is found, move to item 3, otherwise a "new element" is created
   o The following attributes are used to determine the uniqueness of the device:
     § uniqueDeviceId
     § ipAddress
     § sysName
     § ifPhysicalAddress cloud (list of all the physical addresses in the device)
     § ifIpAddress cloud (list of all the IP addresses in the device)

3. UDP Port, SNMP enterprise ID, parent mtf (if any)
   o Used to identify multiple SNMP agents running in the same host
   o UDP Port and enterprise ID must match
   o The matching search is limited to those elements sharing these attributes
   o If a match is found, move to item 4, otherwise a "new element" is created

4. nmsId
   o Used to uniquely identify an element within the device
   o If a match is found, a "resolved update" is issued, otherwise move to item 5

5. ifPhysAddr, dbId (in case of remote polled elements)
   o Used to match a particular interface using its MAC address
   o If a match is found, a "resolved update" is issued, otherwise move to item 6

6. mtf_name, All indices (index1, index2, index3, index4)
   o Last ditch resort to match an element
   o If a match is found for all attributes, a "resolved update" is issued, otherwise a "new element" is created

The merge algorithm was rewritten in eHealth release 5.6 to perform a more reliable comparison between existing and new elements. The new algorithm greatly reduces the likelihood of duplicate elements or unresolved new elements being created during the merge. To further reduce the likelihood of duplicate element creation, please refer to section 4.4, Reconciling and Avoiding Duplicate Element Creation.
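The six-item search order for eHealth 5.6 and above can be read as a cascade: items 1 to 3 narrow the set of candidate elements, and items 4 to 6 try progressively weaker keys within that set. The Python sketch below illustrates that reading under several stated assumptions. It is not the eHealth merge code. Elements are plain dictionaries, the key names udpPort and enterpriseId are hypothetical spellings, the discovered element is assumed to already carry its deviceHashKey, and the parent mtf and dbId attributes are left out for brevity.

```python
INDEX_FIELDS = ("index1", "index2", "index3", "index4")


def same(a, b, required, optional=()):
    """True when every required field is non-empty in a and equal in b,
    and every optional field is equal (two missing values count as equal)."""
    return (
        all(a.get(f) and a.get(f) == b.get(f) for f in required)
        and all(a.get(f) == b.get(f) for f in optional)
    )


def merge_5_6(found, existing):
    """Classify one discovered element against the existing configuration.

    Returns ("resolved update", matched_element) or ("new element", None).
    """
    # Items 1-3: each step limits the search to elements sharing the
    # attribute(s); if nothing is left, the element is new.
    candidates = existing
    for fields in (("nmsSource",), ("deviceHashKey",), ("udpPort", "enterpriseId")):
        candidates = [e for e in candidates if same(found, e, fields)]
        if not candidates:
            return "new element", None

    # Items 4-6: try nmsId, then the MAC address, then the MIB translation
    # file plus all indices. The first key that matches resolves the update.
    fallbacks = (
        (("nmsId",), ()),
        (("ifPhysAddr",), ()),
        (("mtf_name",), INDEX_FIELDS),
    )
    for required, optional in fallbacks:
        for element in candidates:
            if same(found, element, required, optional):
                return "resolved update", element
    return "new element", None
```

One design point worth noting: in this reading, an interface whose nmsId has changed (for example after a description change on the device) is still resolved through its MAC address in item 5, rather than producing a duplicate.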

5. How does selecting the ‘MIB2’ Option Impact My Discovery Results?

The ‘Find MIB2 LAN’ option allows the finder to locate LAN interfaces which only contain basic MIB2 statistics such as In/Out/Total packets. This method of discovery is useful when a device has an uncertified SNMP agent installed. When discovering this device, eHealth will generate a basic element which will allow for reporting of availability and basic packet count information. This option is not recommended for devices running a certified firmware version as the vendor specific interface will be discovered allowing for a more robust reporting solution.

IV. Troubleshooting Common Discovery Issues

1. Troubleshooting ‘No Response to SNMP’ or ‘No Response to Ping’ errors

These errors indicate that the eHealth discover process timed out waiting for the ping or SNMP response from the target device. The most probable causes of this scenario are:

1. The device is unable to respond to ping or responds to ping outside of the timeout threshold due to network load.

Ping the device from the command line using the configured eHealth ping packet size (default = 100 bytes). There are 3 steps that can be taken to resolve this issue:

   1. Ensure the device is able to respond to ping and attempt to reduce the load by discovering during ‘off-peak’ hours.
   2. Disable the discovery ping as described in section 4.1.2
   3. Increase the timeout as described in section 4.1.3

2. The device is unable to respond to ping due to protocol restrictions placed on the device or the network segment on which the device resides.

If a device is unable to respond to ping due to configuration restrictions, the discover ping can be disabled via the NH_DISCOVER_DISABLE_PING variable. When this variable is set to ‘yes’, eHealth will not attempt to ping the device prior to sending the SNMP requests. This allows the discovery of devices which are unable to respond to ping.

3. Either network latency or load on the target device caused the SNMP request to either be dropped by the device or received/transmitted outside the threshold of the discover timeout.

The NH_DISCOVER_TIMEOUT environment variable specifies the time in seconds that the discover process waits for a ping response and an SNMP response from a device. Increasing the value of this variable allows eHealth to wait longer for device responses. The default value of NH_DISCOVER_TIMEOUT is 1 second. To determine the most appropriate 'timeout' value, perform a discovery from the command line.

*NOTE: Command line discovery results are output to the display (or a file) and are not saved to the poller configuration and database.

As the $NH_USER:

1) CD to the $NH_HOME/bin directory
2) Run command:

nhDiscover -c community string -mode mode1 [mode2] -t timeout IP Address

where:
mode = "lanWan", "router/Switch", "dialog", "server", "application", "modemPool", "ras", "respelements"
timeout = timeout in seconds (2,3,4...)

Example:
nhDiscover -c public -mode "router", "lanwan" -t 10 192.168.25.25

Once the minimum timeout has been determined, modify the setting of the NH_DISCOVER_TIMEOUT variable to that value.

4. The SNMP agent on the target device is not running or is unresponsive.

Verify that the SNMP agent is properly configured by obtaining a MIB dump of the device using the nhSnmpTool utility.

5. The port on which the SNMP agent on the target device is running is not configured in the following eHealth variables:

• NH_DISCOVER_PORTS
• NH_DISCOVER_SERVER_PORTS
• NH_DISCOVER_APPLICATION_PORTS
• NH_DISCOVER_RESPONSE_PORTS

Determine the port on which the SNMP agent is running and add that port number to the appropriate NH_DISCOVER_* variable(s).
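Finding the minimum workable timeout for cause 3 above usually means repeating the command line discovery with increasing -t values. The Python sketch below only builds those command lines so that they can be reviewed and run by the $NH_USER; it does not run nhDiscover or interpret its output. It uses the -c, -mode and -t options from the syntax given earlier. Passing the modes as separate arguments follows the "mode1 [mode2]" syntax line and is an assumption, as are the function names and the list of timeouts to try.

```python
import shlex


def discover_command(community, modes, timeout, ip_address):
    """Build one nhDiscover argument list using the documented options.

    Assumption: multiple modes are passed as separate arguments after -mode,
    per the "mode1 [mode2]" syntax line.
    """
    return ["nhDiscover", "-c", community, "-mode", *modes,
            "-t", str(timeout), ip_address]


def timeout_sweep(community, modes, ip_address, timeouts=(2, 3, 4, 5, 10)):
    """Return one command per candidate timeout, smallest first."""
    return [
        discover_command(community, modes, timeout, ip_address)
        for timeout in sorted(timeouts)
    ]


if __name__ == "__main__":
    # Values taken from the example earlier in this section.
    for argv in timeout_sweep("public", ["router", "lanwan"], "192.168.25.25"):
        print(shlex.join(argv))
```

Run the printed commands in order from the $NH_HOME/bin directory and stop at the first timeout for which the device is discovered; that value is the candidate for NH_DISCOVER_TIMEOUT.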


2. Troubleshooting ‘No MIB Support for this Agent’ errors

This error message indicates that the finder process was unable to match the agent in question with the coded list of supported agents. Verify that the device in question is in fact certified via the Concord Communications Device Certification matrix: http://www.concord.com/devices/html/default.html

If the device in question is not listed as certified, you can either:

1. Submit a certification request to have the device agent reviewed for certification via: http://license.concord.com/custserv/certification.htm

Additional information on the Concord Communications certification policy can be found at: http://www.concord.com/devices/cert_policy.asp

2. Rediscover the device using the ‘Find MIB2 Lans’ option to attempt discovery of any MIB2 LAN ports on the device. See section 3.5 for additional information regarding this option.

3. Troubleshooting SystemEDGE Discovery Issues

Troubleshooting SystemEDGE discovery issues can be accomplished via the following method:

1. Verify valid licensing on SystemEDGE server

a. Windows: use the Sysedge_Home/setup -c -v command to validate the license and check for installation issues.

Example of valid return:
setup: Found valid license key.
setup: Found valid license key.
No problems detected in SystemEDGE installation or license.

If a valid key was not found: verify the license string from sysedge.lic in the winnt/system32 directory. If necessary, obtain a valid license from Concord Communications Licensing.

b. UNIX: examine the system log for errors relating to SystemEDGE licensing.

If a valid key was not found: verify the license string from sysedge.lic in the /etc directory. If necessary, obtain a valid license from Concord Communications Licensing.

2. Ensure the agent is running on the system in question.

a. Windows: ensure the SNMP service is running.
b. UNIX: verify the agent is running using the ps -ef | grep sysedge command.


3. Verify agent’s running port

a. UNIX: Use the 'ps -ef | grep sysedge' command to locate the process and port entry.
b. Windows: SystemEDGE runs as a sub-agent of the Windows master SNMP agent. This will usually be port 161, but can be verified in the winnt/system32/drivers/etc/services file.

Example:
NeWS 144/tcp news
sgmp 153/udp sgmp
tcprepo 158/tcp repository
snmp 161/udp snmp
snmp-trap 162/udp snmp

4. Verify agent is running in “full” mode from the SystemEDGE system:

Syntax: sysvariable ipaddress:port community string -V -O -a

Example of correct output:
System: CORVALLIS Build 1381, Service Pack 5 4.0 4.0 Patchlevel 1 SystemEDGE Mode: fullMode(1) AgentVersion: 4.0 Patchlevel

Example of error indicating problem:
snmprecv timeout

5. Verify agent can communicate with eHealth system

a. Ping the SystemEDGE system from the eHealth system to verify network connectivity.
b. Run the sysvariable command from the eHealth system against the SystemEDGE system's IP address.
c. Use the walktree command to verify valid SNMP communication from the agent to eHealth.
Syntax: walktree community IP addr:port mibpath outfile number of retries
Example: walktree public 192.168.18.208:161 1.3.6.1.2.1.1.3 walk.out 3
d. Use the nhSnmpTool command to verify that eHealth can complete a successful “mibdump” of the agent.
Syntax: nhSnmpTool -c community -Server -t 8 -ret 8 IP address of sysedge system
Example: nhSnmpTool -c public -Server -t 8 -ret 8 192.168.18.208

6. Verify environment variable settings

a. Verify that the NH_DISCOVER_SERVER_PORTS variable is set to the port on which the agent is running.


b. Verify that discover is using this port by examining the discover.log created

UNIX: examine the appropriate resource file for the correct discover port entry.
NT: use the “system” tab to examine variable settings.

7. Run a command line discovery and modify the configurable values

a. Determine if any “lan/wan” ports can be discovered.
Syntax: nhDiscover -mode "lanwan" -c community string -ret 8 -t 8 -o $NH_HOME/tmp/out.log -res $NH_HOME/tmp/res.log ip address
b. Determine if the agent can be discovered from the command line with forced timeout and retry values.
Syntax: nhDiscover -mode "server" -c community string -ret 8 -t 8 -o $NH_HOME/tmp/out.force.log -res $NH_HOME/tmp/res.force.log ip address

8. Examine files created by command line discover for potential problems:

a. out.log: output of the lan/wan discovery
b. res.log: DCI formatted output of the lan/wan discovery
c. out.force.log: output of the server discovery run with increased timeout and retry values
d. res.force.log: DCI formatted output of the server discovery run with increased timeout and retry values
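Steps 3 and 4 above both come down to reading text: the services file for the SNMP port, and the sysvariable output for the agent mode. The following Python sketch shows one way to automate those two checks. The function names are hypothetical, the services parsing assumes the usual "name port/protocol aliases" layout shown in the example, and the sysvariable check only looks for the strings "fullMode" and "snmprecv timeout" that appear in the sample output above; the real output may contain other states that this sketch reports as "unknown".

```python
def find_service_port(services_text, service="snmp", protocol="udp"):
    """Return the port for a service/protocol pair in services-file text, or None."""
    for raw_line in services_text.splitlines():
        # Ignore trailing comments and blank lines.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2 or "/" not in fields[1]:
            continue
        port, _, proto = fields[1].partition("/")
        # Match on the service name in the first column only, so that the
        # "snmp" alias on the snmp-trap line is not mistaken for the agent port.
        if (fields[0].lower() == service.lower()
                and proto.lower() == protocol.lower()
                and port.isdigit()):
            return int(port)
    return None


def classify_sysvariable_output(output):
    """Classify captured sysvariable output as 'full', 'timeout' or 'unknown'."""
    text = output.lower()
    if "snmprecv timeout" in text:
        return "timeout"
    if "fullmode" in text:
        return "full"
    return "unknown"


# Sample taken from the services file excerpt in step 3.
SAMPLE_SERVICES = """\
NeWS 144/tcp news
sgmp 153/udp sgmp
tcprepo 158/tcp repository
snmp 161/udp snmp
snmp-trap 162/udp snmp
"""

print(find_service_port(SAMPLE_SERVICES))
print(classify_sysvariable_output("SystemEDGE Mode: fullMode(1)"))
```

A result of "timeout" or "unknown" from the second helper is a prompt to continue with steps 5 through 8, not a diagnosis in itself.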

4. Reconciling and Avoiding Duplicate Element Creation

In most cases, the rediscovery of existing elements will result in resolved updates. However, network environments are always changing, which creates a chance of duplicate elements when the merge algorithm fails to resolve an update because of differences between the attributes of the original and the newly discovered element. A duplicate element is simply an element whose name, under the eHealth element naming convention, duplicates an already existing element name. eHealth will attempt to ensure uniqueness by appending a -A (or -B, -C, etc.) to the newly found element's name. To minimize this possibility, we recommend rediscovering the elements within the eHealth configuration on a regular basis. This limits the number of updates that occur at once by minimizing the time between updates. If a duplicate is created, examine the elements (original and duplicate) and determine whether eHealth should have merged them into one. Once that assessment has been made, take note of the following attributes of the duplicate element from the eHealth Discover UI:

• Hardware ID (uniqueDeviceId)
• Discover Key (nmsId)
• System Name (sysName)
• Agent Type (mibTranslationFile)

First delete the new element and update the original with these attributes, then rediscover to update all other attributes.


If there are more than a few elements to be reconciled, it is advisable to write a script to perform these operations using DCI. In some cases, the duplicate element might have been collecting data for quite a while so it's up to the customer/end user whether to delete the duplicate and keep the original element or just delete the original element and keep the duplicate.
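As a starting point for such a script, the Python sketch below groups element names that look like duplicates with the original they may belong to. It relies only on the naming behaviour described above (a -A, -B, -C suffix appended to the new element's name); the function name and the sample element names are hypothetical, and a name that legitimately ends in a single capital letter will be flagged too, so the result is a list of candidates to review, not a list to delete.

```python
import re
from collections import defaultdict

# A candidate duplicate is "<existing name>-<single capital letter>".
_DUP_SUFFIX = re.compile(r"^(?P<base>.+)-(?P<tag>[A-Z])$")


def group_possible_duplicates(names):
    """Map each original element name to the suffixed names that may duplicate it."""
    known = set(names)
    groups = defaultdict(list)
    for name in names:
        match = _DUP_SUFFIX.match(name)
        # Only report the name if the un-suffixed original also exists.
        if match and match.group("base") in known:
            groups[match.group("base")].append(name)
    return {base: sorted(dups) for base, dups in groups.items()}


# Hypothetical element names for illustration only.
elements = [
    "router1-Fa0/1",
    "router1-Fa0/1-A",
    "router1-Fa0/1-B",
    "switch-A",   # no element named "switch" exists, so this is not flagged
    "server2",
]
for original, duplicates in group_possible_duplicates(elements).items():
    print(original, "->", ", ".join(duplicates))
```

Each reported pair still needs the manual assessment described above before any attributes are copied or elements deleted.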

V. General Discovery Best Practices

1. Avoiding Duplicate Elements by Implementing a Strong Change Control Process

A strong change management process is essential to ensuring an accurate and stable eHealth element configuration. The eHealth Administrator should work in concert with the Network Administrators so that the eHealth Administrator knows when changes are going to be implemented. It is strongly suggested that, before a device change occurs, the eHealth elements associated with that device be rediscovered. This ensures that the eHealth device configuration is current before the change. After the device change has been made, an additional rediscovery should be performed so that the new configuration is reflected in the eHealth configuration. This method minimizes the number of device changes detected by eHealth at one time, thereby minimizing the chance of duplicate element creation. For additional information on this topic, please view sections 3.4 and 4.4.

2. Minimizing data loss through Statistics poller error analysis and ‘noDbDataFor’ tools

The eHealth installation includes valuable tools such as the nhListElements command to assist in configuration management. That utility includes a ‘noDbDataFor’ flag which creates a list of all elements that have not reported data in the configured amount of time. This usually means that an element either has polling disabled or is experiencing polling errors which are causing eHealth to not insert data into the database for that element. That list can be used as a ‘to-do’ list of elements which should be rediscovered or investigated further. The rediscovery should resolve any conflicts which may be causing the polling errors in question.

nhListElements

The nhListElements command displays a simple list of eHealth element names using selected criteria. You can use arguments to filter the list and create specific lists of elements. You can also redirect the output of this command as input to other commands to modify your poller configuration file, such as nhModifyElements, nhDeleteElements and nhPopulateGroup.

Syntax

The nhListElements command uses the following syntax:

nhListElements [-h] [-rev] [-showTypes] [-showDciFields]
nhListElements [-elements] [-outfile filename]
nhListElements -rebooted [-outfile filename]
nhListElements -where "whereClause" [-outfile filename]
nhListElements -elemType type [-outfile filename]


nhListElements -noDbDataFor hours [-outfile filename]
nhListElements -groupType groupType -inGroup group [-outfile filename]

-noDbDataFor hours

Lists only those elements for which eHealth has not collected data and added it to the database for the number of hours specified, and for which there is not any alarm data. This command allows you to produce a list of elements that eHealth is not currently polling, or elements that it is currently polling but that have poll errors. You cannot use this argument in combination with any other nhListElements argument except for -outfile. You cannot use this argument on the central site to return data in a remote polling environment. You must run it on the remote systems.

NOTE: in eHealth 5.0.2 and prior, only elements with polling turned on would be output from the nhListElements command.
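The -noDbDataFor list lends itself to simple automation. The Python sketch below builds the command line from the syntax shown above and turns the resulting output into a list of element names. The function names are hypothetical, and the parser assumes that the output contains one element name per line, which matches the description of a "simple list" but should be checked against real output.

```python
def no_db_data_argv(hours, outfile=None):
    """Build the nhListElements -noDbDataFor command line as an argument list."""
    # -noDbDataFor may only be combined with -outfile, so nothing else is added.
    argv = ["nhListElements", "-noDbDataFor", str(int(hours))]
    if outfile:
        argv += ["-outfile", outfile]
    return argv


def parse_element_list(text):
    """Return the non-empty lines of the command output, stripped of whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


# Hypothetical output for illustration only.
sample_output = "router1-Fa0/1\n\nserver2-cpu\n"
print(no_db_data_argv(24, "silent.txt"))
print(parse_element_list(sample_output))
```

The argument list can be passed to subprocess.run on the eHealth system, and the parsed names can then feed a seed file or a review list.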

3. Using seed files to automate incremental configuration updates

The eHealth discover mechanism allows for the use of ‘seed files’ during the discovery process. A seed file is simply a text file containing a list of IP address and community string combinations. This provides the eHealth Administrator with an easy way to discover groups of elements and to automate the discover process. Example:

# Server 1
10.100.10.32 private
# Server 2
10.100.10.33 public

Each seed file should also contain only like technology types, so that the discover is run against the correct technology and there are no mismatches. For example, a router discovery of a server may actually produce a router element, because a server can act as a routing device. The eHealth poller configuration file ($NH_HOME/poller/poller.cfg) can also be used as a rediscovery seed file, but this is only recommended for small configurations. Larger configurations should not use this method, as a rediscovery of the entire configuration causes a severe performance impact on the eHealth server. In large configurations, it is recommended that each seed file rediscovery target only a portion of the configuration.
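One way to keep rediscoveries incremental is to generate several small seed files instead of one large one. The Python sketch below formats entries in the layout of the example above (a comment line followed by an "IP address community" line) and splits a device list into fixed-size batches. The function names and the batch size are hypothetical, and the one-comment-per-entry layout is an assumption based on that example.

```python
def format_seed_file(entries):
    """Render (label, ip, community) tuples as seed file text."""
    lines = []
    for label, ip, community in entries:
        lines.append(f"# {label}")
        lines.append(f"{ip} {community}")
    return "\n".join(lines) + "\n"


def split_into_batches(entries, size):
    """Split the entries into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    entries = list(entries)
    return [entries[i:i + size] for i in range(0, len(entries), size)]


servers = [
    ("Server 1", "10.100.10.32", "private"),
    ("Server 2", "10.100.10.33", "public"),
]
for number, batch in enumerate(split_into_batches(servers, 1), start=1):
    print(f"--- seed file {number} ---")
    print(format_seed_file(batch), end="")
```

Keeping one device list per technology type (servers, routers, and so on) and batching each list separately follows the recommendation above to avoid mixing technologies in one seed file.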

4. Self Monitoring the eHealth System using Process Set Creation

Using the SystemEDGE agent, the eHealth system can be set up to monitor itself through eHealth Process Sets. An eHealth and an Oracle (or Ingres) process set can be created to provide detailed statistics on the eHealth and Database processes. These statistics can be used in capacity planning as well as within LiveHealth monitoring to ensure application stability. Process Sets are created in the eHealth discover UI via the ‘Find Processes’ -> Define option. Two new process sets should be created for eHealth and the Database using the following processes and parameters:

• Process Set – eHealth

o 5.5 and Above:


§ nhiArControl
§ nhiCfgServer
§ nhiDbServer
§ nhiLiveExSvr
§ nhiMsgServer
§ nhiNotifierSvr
§ nhiPoller
  • Argument: –remote No
§ nhiPoller
  • Argument: -live
§ nhiPoller
  • Argument: -dlg
§ nhiPoller
  • Argument: -import
§ nhiReplServer
§ nhiRespServer
§ nhiRftIn
§ nhiRftOut
§ nhiRmtIn
§ nhiRmtOut
§ nhiServer
§ nhiTrapServerCmu

o 5.0.2 and Earlier:

§ nhiArControl
§ nhiCfgServer
§ nhiConsole
§ nhiDbServer
§ nhiLiveExSvr
§ nhiMsgServer
§ nhiNotifierSvr
§ nhiPoller
  • Arguments: none
§ nhiPoller
  • Arguments: -live
§ nhiPoller
  • Arguments: -dlg
§ nhiPoller
  • Arguments: -import
§ nhiRespServer
§ nhiServer
§ nhiTrapServerCmu

• Process Set Oracle (for eHealth 5.5 and above)

o ora_arc0_EHEALTH
o ora_arc1_EHEALTH
o ora_ckpt_EHEALTH
o ora_dbw0_EHEALTH
o ora_lgwr_EHEALTH
o ora_pmon_EHEALTH
o ora_reco_EHEALTH


o ora_smon_EHEALTH

• Process set Ingres (for eHealth 5.0.2 and earlier)

o dmfacp
o iigcn
o iidbms
  § Argument: recovery
o iidbms
  § Argument: dbms

Each process should have the ‘create if found’ flag set, match full name set, and the appropriate Operating System set. Once the process set has been defined, the eHealth server should be discovered using the ‘read-write’ community string to create the appropriate MIB rows. Once discovered, the ‘Record Detailed Data’ option can be enabled to allow for individual process data to be reported along with aggregate process set data.

5. Effectively Interfacing with Concord Technical Support to Resolve Discovery Issues

When it becomes necessary to contact Concord Technical Support, it is important to provide Support with the information needed to troubleshoot the issue. When dealing with Discovery issues, the following information is often vital to the troubleshooting process:

• A clear description of the problem
• What may have changed in the environment since the issue has occurred
• The current eHealth patch and certification level (including any verification kits that may have been installed)
• The OS of the eHealth system
• Any discovery logs generated detailing the issue
• The poller configuration file ($NH_HOME/poller/poller.cfg)
• A tar archive of the $NH_HOME/log directory
• A tar archive of the $NH_HOME/tmp/nhiCfgServer directory
• A full MIB dump (stages 1 & 2) of the device in question using the nhSnmpTool command

Although the above information and files may appear to be unrelated, in the majority of instances this information is required during the troubleshooting process. Providing this information to the Technical Support Engineer when initially contacting Concord Technical Support will dramatically reduce the time taken to obtain everything needed to resolve the issue.
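The two tar archives in the list above can be produced with any tar utility. As an illustration, the Python sketch below creates them with the standard tarfile module. The function names and the destination directory are hypothetical; the caller supplies the value of $NH_HOME, and directories that do not exist on a given system are simply skipped.

```python
import os
import tarfile


def archive_directory(source_dir, archive_path):
    """Write an uncompressed tar archive of source_dir and return its path."""
    # Store entries under the directory's own name, not its full path.
    top_level = os.path.basename(os.path.normpath(source_dir))
    with tarfile.open(archive_path, "w") as tar:
        tar.add(source_dir, arcname=top_level)
    return archive_path


def collect_support_archives(nh_home, dest_dir):
    """Archive the log and tmp/nhiCfgServer directories found under nh_home."""
    os.makedirs(dest_dir, exist_ok=True)
    targets = {
        "log": os.path.join(nh_home, "log"),
        "nhiCfgServer": os.path.join(nh_home, "tmp", "nhiCfgServer"),
    }
    created = []
    for name, path in targets.items():
        if os.path.isdir(path):
            archive = os.path.join(dest_dir, name + ".tar")
            created.append(archive_directory(path, archive))
    return created
```

For example, collect_support_archives(os.environ["NH_HOME"], "/tmp/support") would write log.tar and nhiCfgServer.tar into the hypothetical /tmp/support directory, ready to send along with the poller configuration file and discovery logs.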

VI. Changes to the Discovery Process in eHealth 5.6.x

The main change to the Discover process in eHealth 5.6 is in the merge algorithm. eHealth no longer uses an element’s discoverKey to determine uniqueness; it now relies on a deviceHashKey. This new key allows for a greater level of accuracy when determining whether changes to a device constitute a new element or an update to an existing element. Additional information on the discover algorithm changes can be found in section 3.4, Explanation of the eHealth Merge Algorithm.


VII. Other Resources

In addition to this document, there are many other resources available to the eHealth Administrator to assist in the management of the Discover process and the eHealth element configuration. These include, but are not limited to:

• Standard eHealth Documentation (TotalDoc)
  o eHealth Administration Guide
  o http://www.concord.com/support/secure/products/search_prod.shtml

• Concord Communications White Papers
  o http://www.concord.com/support/secure/tech_wht_papers.shtml
  o The eHealth Discovery and Certification Process White Paper
    § http://www.concord.com/support/secure/n_nhdisc.shtml
  o The MIB2 LAN Element Type White Paper
    § http://www.concord.com/support/secure/n_mib2lan.shtml
  o Customizing Element Names White Paper
    § http://www.concord.com/support/secure/n_customize.shtml

• Concord Communications Knowledgebase
  o http://search.support.concord.com
  o Applicable Solution IDs:
    § PrimusTrain202
    § TS2906
    § TS15008
    § TS13051
    § TS11641
    § TS13242
    § PrimusTrain195
    § PrimusTrain90
    § TS11673
    § TS15008
    § TS4602
    § TS13577
    § TS14359

• Scripts Contributed by Concord Employees and Customers:
  o Concord Knowledgebase Solution # TS13791