79408639 intro scope sizing guide

Upload: rockinever

Post on 14-Oct-2015


  • CA Wily Introscope

    Sizing and Performance Guide

    Version 8.0

    Date: 08-2008

  • Copyright 2008, CA. All rights reserved.

    Wily Technology, the Wily Technology Logo, Introscope, and All Systems Green are registered trademarks of CA.

    Blame, Blame Game, ChangeDetector, Get Wily, Introscope BRT Adapter, Introscope ChangeDetector, Introscope Environment Performance Agent, Introscope ErrorDetector, Introscope LeakHunter, Introscope PowerPack, Introscope SNMP Adapter, Introscope SQL Agent, Introscope Transaction Tracer, SmartStor, Web Services Manager, Whole Application, Wily Customer Experience Manager, Wily Manager for CA SiteMinder, and Wily Portal Manager are trademarks of CA. Java is a trademark of Sun Microsystems in the U.S. and other countries. All other names are the property of their respective holders.

    For help with Introscope or any other product from CA Wily Technology, contact Wily Technical Support at 1-888-GET-WILY ext. 1 or [email protected].

    If you are the registered support contact for your company, you can access the support Web site directly at http://support.wilytech.com.

    We value your feedback

    Please take this short online survey to help us improve the information we provide you. Link to the survey at: http://tinyurl.com/6j6ugb

    6000 Shoreline Court, Suite 200

    South San Francisco, CA 94080

    US Toll Free 888 GET WILY ext. 1

    US +1 630 505 6966

    Fax +1 650 534 9340

    Europe +44 (0)870 351 6752

    Asia-Pacific +81 3 6868 2300

    Japan Toll Free 0120 974 580

    Latin America +55 11 5503 6167

    www.wilytech.com

  • Table of Contents

    Chapter 1 Introscope Sizing and Performance Introduction . . . . . . 9

    New and changed features in Introscope 8.0 . . . . . . . . . 10

    Agent load balancing . . . . . . . . . . . . . . . . . 10

    Agent metric aging . . . . . . . . . . . . . . . . . . 10

    Changed Heap Capacity (%) metric . . . . . . . . . . . . 10

    Changed Metric Count metric . . . . . . . . . . . . . . 10

    Changed way of determining events . . . . . . . . . . . . 11

    Changed Number of Inserts metric . . . . . . . . . . . . 11

    Changed Overall Capacity (%) metric . . . . . . . . . . . 11

    Dynamic instrumentation . . . . . . . . . . . . . . . 11

    Enterprise Manager dead metric removal . . . . . . . . . . 11

    How to detect metric explosions . . . . . . . . . . . . . 11

    Metric clamping . . . . . . . . . . . . . . . . . . . 11

    MOM hot failover . . . . . . . . . . . . . . . . . . 12

    MOM sizing limits examples . . . . . . . . . . . . . . . 12

    New metric for Collector Metrics Received Per Interval . . . . . 12

    New metric for Historical Metric Count . . . . . . . . . . . 12

    New metric for Number of Historical Metrics . . . . . . . . . 12

    New metric for Transaction Traces Dropped Per Interval . . . . . 12

    New tab for CPU Overview . . . . . . . . . . . . . . . 12

    New tab for Enterprise Manager Overview . . . . . . . . . 13

    New tab for Metric Count . . . . . . . . . . . . . . . 13

    Ping time threshold properties . . . . . . . . . . . . . . 13

    Running multiple Collectors on one machine . . . . . . . . . 13

    Scalability . . . . . . . . . . . . . . . . . . . . . 13

    SmartStor metadata stored in uncompressed format . . . . . . 14

    SQL statements, statement normalizers, and metric explosions . . 14

    Support for RAID 5 data storage . . . . . . . . . . . . . 14

    Transaction Trace component clamp . . . . . . . . . . . 14

    Chapter 2 EM Requirements and Recommendations . . . . . . . . . 15

    Enterprise Manager overview . . . . . . . . . . . . . . . 17

    Enterprise Manager databases . . . . . . . . . . . . . . 20

    Factors that affect the Introscope environment . . . . . . . . 20

    Factors that affect EM maximum capacity . . . . . . . . . . 21

    Differences between EMs and J2EE servers . . . . . . . . . 22

    About Introscope system size . . . . . . . . . . . . . . 26

    Enterprise Manager health . . . . . . . . . . . . . . . . 27

    About the Enterprise Manager Overview tab . . . . . . . . . 27

    About EM health and supportability metrics . . . . . . . . . 28

    Harvest Duration metric . . . . . . . . . . . . . . . . 29

    Number of Collector Metrics . . . . . . . . . . . . . . 30

    Collector Metrics Received Per Interval metric . . . . . . . . 31

    Converting Spool to Data metric . . . . . . . . . . . . . 32

    Overall Capacity (%) metric . . . . . . . . . . . . . . 33

    Heap Capacity (%) metric . . . . . . . . . . . . . . . 34

    Troubleshooting Enterprise Manager health . . . . . . . . . 35

    Additional supportability metrics . . . . . . . . . . . . . 38

    SmartStor overview . . . . . . . . . . . . . . . . . . 40

    About SmartStor spooling and reperiodization . . . . . . . . 40

    Report generation and performance . . . . . . . . . . . . 43

    Concurrent historical queries and performance . . . . . . . . 43

    About SmartStor and flat file archiving . . . . . . . . . . . 43

    MOM overview . . . . . . . . . . . . . . . . . . . . 44

    Collector overview. . . . . . . . . . . . . . . . . . . 44

    Collector metric capacity and CPU usage . . . . . . . . . . 45

    About the CPU Overview tab . . . . . . . . . . . . . . 46

    Enterprise Manager basic requirements . . . . . . . . . . . 47

    Enterprise Manager file system requirements . . . . . . . . 47

    EM OS disk file cache memory requirements . . . . . . . . . 47

    Enterprise Manager heap sizing . . . . . . . . . . . . . 48

    SmartStor requirements . . . . . . . . . . . . . . . . 49

    Each EM requires SmartStor on a dedicated disk or I/O subsystem . 49

    SmartStor Duration metric limit . . . . . . . . . . . . . 50

    MOM and Collector EM requirements . . . . . . . . . . . . 51

    Local network requirement for MOM and Collectors . . . . . . 51

    When to run reports, custom scripts, and large queries . . . . . 52

    Introscope 8.0 EM settings and capacity . . . . . . . . . . . 53

    Estimating Enterprise Manager databases disk space needs . . . 53

    SmartStor settings and capacity . . . . . . . . . . . . . . 55

    Setting the SmartStor dedicated controller property . . . . . . 55

    Planning for SmartStor storage using SAN . . . . . . . . . 57

    Planning for SmartStor storage using SAS controllers . . . . . 57

    Enterprise Manager thread pool and available CPUs . . . . . . 57

    Collector and MOM settings and capacity . . . . . . . . . . . 58

    MOM disk subsystem sizing requirements . . . . . . . . . . 58

    MOM hardware requirements . . . . . . . . . . . . . . 59

    MOM to Collectors connection limits . . . . . . . . . . . . 59

    MOM to Workstation connection limits . . . . . . . . . . . 60

    Metric load limit on MOM-Collector systems . . . . . . . . . 60

    Configuring a cluster to support 1,000,000 MOM metrics . . . . 61

    MOM hot failover . . . . . . . . . . . . . . . . . . 62

    Agent load balancing on MOM-Collector systems . . . . . . . 63

    Avoid Management Module hot deployments . . . . . . . . . 68

    Collector applications limits . . . . . . . . . . . . . . . 69

    Collector metrics limits . . . . . . . . . . . . . . . . 69

    Collector events limits . . . . . . . . . . . . . . . . 70

    Collector agent limits . . . . . . . . . . . . . . . . . 70

    Collector hardware requirements . . . . . . . . . . . . . 71

    Collector with metrics alerts limits . . . . . . . . . . . . 71

    Collector to MOM clock drift limit . . . . . . . . . . . . . 71

    Reasons Collectors combine slices . . . . . . . . . . . . 72

    Increasing Collector capacity with more and faster CPUs . . . . 73

    Standalone EM hardware requirements example . . . . . . . 74

    Running multiple Collectors on one machine . . . . . . . . . 74

    Chapter 3 Metrics Requirements and Recommendations . . . . . . . 77

    Metrics background . . . . . . . . . . . . . . . . . . 78

    About metrics groupings and metric matching . . . . . . . . 78

    8.0 metrics setup, settings, and capacity . . . . . . . . . . . 79

    Matched metrics limits . . . . . . . . . . . . . . . . 79

    Inactive and active metric groupings and EM performance . . . . 80

    Performance and metrics groupings using the wildcard (*) symbol . 80

    SmartStor metrics limits . . . . . . . . . . . . . . . . 80

    Virtual agent metrics match limits . . . . . . . . . . . . 80

    About alerted metrics and slow Workstation startup . . . . . . 81

    About aggregated metrics and Management Module hot deployments 81

    Detecting metrics leaks . . . . . . . . . . . . . . . . . 81

    Metrics leak causes . . . . . . . . . . . . . . . . . 82

    Finding a metrics leak. . . . . . . . . . . . . . . . . 82

    Metrics for diagnosing a metrics leak . . . . . . . . . . . 83

    Detecting metric explosions . . . . . . . . . . . . . . . 84

    Metric explosion causes . . . . . . . . . . . . . . . . 84

    Finding a metric explosion . . . . . . . . . . . . . . . 85

    Investigator metrics and tab for diagnosing metric explosions. . . 85

    How Introscope prevents metric explosions . . . . . . . . . 91

    SQL statements and metric explosions . . . . . . . . . . . 92

    SQL statement normalizers . . . . . . . . . . . . . . . 94

    Enterprise Manager dead metric removal . . . . . . . . . . 96

    Metric clamping . . . . . . . . . . . . . . . . . . . 96

    SmartStor metadata files are uncompressed . . . . . . . . . 98

    Chapter 4 Workstation and WebView Requirements and Recommendations 99

    Workstation and WebView background . . . . . . . . . . . 100

    8.0 Workstation and WebView requirements . . . . . . . . . 100

    OS RAM requirements for Workstations running in parallel . . . . 100

    WebView and Enterprise Manager hosting requirement . . . . . 100

    8.0 Workstation and WebView setup, settings, and capacity . . . . 101

    Workstation to standalone EM connection capacity. . . . . . . 101

    Workstation to MOM connection capacity . . . . . . . . . . 102

    WebView server capacity . . . . . . . . . . . . . . . 103

    WebView server guidelines . . . . . . . . . . . . . . . 103

    Top N graph metrics limit per Workstation . . . . . . . . . 103

    Chapter 5 Agent Requirements and Recommendations . . . . . . . 105

    Agent background . . . . . . . . . . . . . . . . . . . 106

    About virtual agents . . . . . . . . . . . . . . . . . 106

    Agent sizing setup, settings, and capacity . . . . . . . . . . 107

    Agent metrics reporting limit . . . . . . . . . . . . . . 107

    About the Metric Count tab . . . . . . . . . . . . . . . 107

    Transaction Trace component clamp . . . . . . . . . . . 108

    Agent maximum load when disabling Boundary Blame . . . . . 109

    Configuring agent heuristics subsets . . . . . . . . . . . 109

    Virtual agent metrics match limits . . . . . . . . . . . . 109

    Virtual agent reported applications capacity . . . . . . . . . 110

    Agents limits per Collector . . . . . . . . . . . . . . . 110

    Agent heap sizing . . . . . . . . . . . . . . . . . . 110

    High agent CPU overhead from deep nested front-end transactions . 111

    Dynamic instrumentation . . . . . . . . . . . . . . . . 112

    Appendix A Introscope 8.0 Sizing and Performance FAQs . . . . . . . 113

    Appendix B Sample Introscope 8.0 Collector and MOM Sizing Limits by OS 119

    Sample Introscope 8.0 Collector sizing limits table . . . . . . . 119

    Sample Introscope 8.0 MOM sizing limits table . . . . . . . . . 122

    Index . . . . . . . . . . . . . . . . . . . . . . . . . . 125

  • CHAPTER 1

    Introscope Sizing and Performance Introduction

    Where to get the latest version of this book

    You can find the most current version of this book on the CA Wily Community site at https://community.wilytech.com/. Check back periodically to see if the book has been updated.

    NOTE: The Wily Community Site is for use by registered members of the Wily User Community. If you need a user account, you can request one at the site.

    This document contains background, instructions, best practices, and tips for optimizing the sizing and performance of your Introscope 8.0 deployment and environment. Use it in conjunction with the following Introscope 8.0 documentation:

    Introscope Configuration and Administration Guide

    Introscope Installation and Upgrade Guide

    Introscope Java Agent Guide

    Introscope .NET Agent Guide

    Introscope Overview Guide

    Introscope WebView Guide

    Introscope Workstation User Guide

    For additional information about this product, you can take the CA Wily Technology Education Services class, Introscope: Enterprise Manager (EM) Capacity Management. For more information, go to http://www.wilytech.com/services/education.html.

    In addition, CA Wily Technology Professional Services and Technical Support have service offerings to address specific needs in your application management environment.

  • New and changed features in Introscope 8.0

    The following sections detail new or changed features in Introscope 8.0 that affect sizing and performance.

    Agent load balancing

    Introscope 8.0 agents (8.0 only) in a clustered environment can connect to the MOM and be load-balanced to a Collector. Pre-8.0 agents must connect directly to a Collector. The MOM also keeps the metric load balanced between Collectors by ejecting participating 8.0 agents from over-burdened Collectors. A participating agent is one that connected to the MOM. The ejected agents reconnect to the MOM and are reallocated to under-burdened Collectors. To configure agent load balancing, see the Introscope Configuration and Administration Guide. To understand how agent load balancing affects Introscope performance, see Agent load balancing on MOM-Collector systems on page 63.
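    The eject-and-reallocate behavior described above can be pictured with a small sketch. This is only an illustration under stated assumptions, not the MOM's actual algorithm: the `rebalance` function, the `max_load` threshold, and the "move the smallest agents first to the least-loaded Collector" policy are all hypothetical.

```python
from typing import Dict, List, Tuple

# Collector name -> {agent name: metric count reported by that agent}
Assignments = Dict[str, Dict[str, int]]


def collector_load(agents: Dict[str, int]) -> int:
    """Total metric load on one Collector."""
    return sum(agents.values())


def rebalance(collectors: Assignments, max_load: int) -> List[Tuple[str, str, str]]:
    """Move agents off over-burdened Collectors (hypothetical policy).

    max_load is an assumed per-Collector metric threshold.
    Returns the moves as (agent, from_collector, to_collector).
    """
    moves = []
    for source, agents in collectors.items():
        # Consider the smallest agents first; sorted() copies the items,
        # so deleting from the dict inside the loop is safe.
        for agent, metrics in sorted(agents.items(), key=lambda kv: kv[1]):
            if collector_load(agents) <= max_load:
                break  # this Collector is no longer over-burdened
            # The ejected agent "reconnects to the MOM", which picks the
            # least-loaded other Collector that can absorb it.
            candidates = [
                name for name, other in collectors.items()
                if name != source and collector_load(other) + metrics <= max_load
            ]
            if not candidates:
                continue  # nowhere to place this agent; leave it where it is
            target = min(candidates, key=lambda name: collector_load(collectors[name]))
            del agents[agent]
            collectors[target][agent] = metrics
            moves.append((agent, source, target))
    return moves
```

    For example, with one Collector carrying 900 metrics, another carrying 100, and an assumed threshold of 600, the sketch moves the two smallest agents and leaves the Collectors at 400 and 600 metrics.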

    Agent metric aging

    By default, agent metric aging periodically removes dead metrics from the agent memory cache. This helps prevent metric explosions. See About agent metric aging on page 91.

    Changed Heap Capacity (%) metric

    The Heap Capacity (%) metric is created when the Enterprise Manager periodically asks the JVM how much maximum heap there is and how much it is currently using (based on the GC Heap: In Use Post GC (mb) metric). Formerly this metric was calculated based on a ratio of current heap total and how much heap is in use. See Overall Capacity (%) metric on page 33 and Heap Capacity (%) metric on page 34.
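    The difference between the former and the current calculation can be shown with a little arithmetic. The sketch below assumes the metric is a simple percentage ratio; the text names the inputs but not the exact formula, and the function names are hypothetical.

```python
def heap_capacity_percent(in_use_post_gc_mb: float, max_heap_mb: float) -> float:
    """8.0-style reading: heap in use after GC as a share of the maximum heap.

    Assumes a plain ratio; the guide does not spell out the formula.
    """
    if max_heap_mb <= 0:
        raise ValueError("max_heap_mb must be positive")
    return 100.0 * in_use_post_gc_mb / max_heap_mb


def former_heap_capacity_percent(in_use_mb: float, current_total_mb: float) -> float:
    """Pre-8.0-style reading: heap in use as a share of the current heap total."""
    if current_total_mb <= 0:
        raise ValueError("current_total_mb must be positive")
    return 100.0 * in_use_mb / current_total_mb


# Illustrative numbers only: 512 MB in use, 1024 MB currently allocated,
# 2048 MB maximum heap.
new_value = heap_capacity_percent(512, 2048)
old_value = former_heap_capacity_percent(512, 1024)
```

    With these illustrative numbers the same JVM reads 25% under the maximum-heap basis and 50% under the former current-total basis, so values from before and after the upgrade are not directly comparable.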

    Changed Metric Count metric

    The Metric Count metric Investigator node, which was previously under the Agent Stats node, is now here:

    Custom Metric Host (Virtual)| Custom Metric Process (Virtual)| Custom Metric Agent (Virtual)(*SuperDomain*)| Metric Count

    See Metric Count metric on page 85.

  • Changed way of determining events

    The way that the Enterprise Manager handles Transaction Trace incoming events has changed, and uses new and changed metrics. See Events and Transaction Traces on page 36.

    Changed Number of Inserts metric

    The former Data Store|Transactions:Number of Inserts metric was renamed to Data Store|Transactions:Number of Inserts Per Interval. This metric value now shows the number of Transaction Traces placed into the Transaction Trace insert queue during an interval. Previously this metric showed the number of Transaction Traces that were reported to the Enterprise Manager. See Events and Transaction Traces on page 36 and Collector events limits on page 70.

    Changed Overall Capacity (%) metric

    The Overall Capacity (%) metric is calculated using an additional value from the CPU Capacity (%) metric value. See Overall Capacity (%) metric on page 33.

    Dynamic instrumentation

    Introscope uses dynamic instrumentation (also called dynamic ProbeBuilding) to implement new and changed PBDs without restarting managed applications or the Introscope agent. Dynamic instrumentation affects CPU utilization, memory, and disk utilization. See Dynamic instrumentation on page 112.

    Enterprise Manager dead metric removal

    Starting with Introscope 8.0, when a metric has not produced data for more than eight minutes (default), it is removed from the Investigator tree. See Enterprise Manager dead metric removal on page 96.
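    As a rough illustration of time-based removal, the sketch below drops any metric whose last data point is older than a timeout. Only the eight-minute default comes from the text; the data structure and the `prune_dead_metrics` function are hypothetical and do not reflect the Enterprise Manager's implementation.

```python
from typing import Dict, List

DEAD_METRIC_TIMEOUT_S = 8 * 60  # eight-minute default mentioned in the text


def prune_dead_metrics(
    last_seen: Dict[str, float],
    now: float,
    timeout_s: float = DEAD_METRIC_TIMEOUT_S,
) -> List[str]:
    """Remove metrics with no data for more than timeout_s seconds.

    last_seen maps a metric name to the timestamp (in seconds) of its most
    recent data point. Returns the removed metric names in sorted order.
    """
    dead = sorted(name for name, ts in last_seen.items() if now - ts > timeout_s)
    for name in dead:
        del last_seen[name]
    return dead
```

    A metric last seen exactly eight minutes ago stays in this sketch, because the text says "more than eight minutes".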

    How to detect metric explosions

    Introscope 8.0 includes a number of new metrics and capabilities to help you detect metric explosions. For more information, see Detecting metric explosions on page 84.

    Metric clamping

    Several properties that limit, or clamp, the number of metrics on the agent and the Enterprise Manager help to prevent spikes in the number of reported metrics (metric explosions) on the Enterprise Manager. See Metric clamping on page 96.

  • MOM hot failover

    If the MOM gets disconnected or goes down because of, for example, a hardware or network failure, you can configure a second MOM to take over using hot failover. See MOM hot failover on page 62.

    MOM sizing limits examples

    CA Wily now provides MOM hardware and cluster requirements examples. See Sample Introscope 8.0 MOM sizing limits table on page 122.

    New metric for Collector Metrics Received Per Interval

    The Collector Metrics Received Per Interval metric is a simple way of gauging how much load metric data queries are placing on the cluster. The Number of Collector Metrics metric is the total sum of Collector metric data points that the MOM has received in each 15-second time period, including data queries. See Collector Metrics Received Per Interval metric on page 31.

    New metric for Historical Metric Count

    The Historical Metric Count metric shows the total number of metrics from an agent that are either live or recently active. The Enterprise Manager uses this metric to decide whether to start clamping more metrics from the agent. For more information, see Historical Metric Count metric on page 88.

    New metric for Number of Historical Metrics

    A new metric, Number of Historical Metrics, tracks the number of metrics for which Introscope has historical data in SmartStor. For more information, see Number of Historical Metrics metric on page 89.

    New metric for Transaction Traces Dropped Per Interval

    A new metric, Performance.Transactions.Num.Dropped.Per.Interval, shows the number of Transaction Traces that the Enterprise Manager could not handle during the interval and were dropped. See Events and Transaction Traces on page 36.

    New tab for CPU Overview

    By viewing the CPU Overview tab you can assess agent CPU health and performance-related statistics in one centralized location. See About the CPU Overview tab on page 46.

  • New tab for Enterprise Manager Overview

    By viewing the EM Overview tab you can assess a number of EM health and performance-related statistics and components in one centralized location. See About the Enterprise Manager Overview tab on page 27 and Enterprise Manager Overview tab on page 90.

    New tab for Metric Count

    By viewing the Metric Count tab you can assess the number and distribution of agent and resource metrics in one centralized location. See About the Metric Count tab on page 107.

    Ping time threshold properties

    For optimal Workstation response times, Collector ping times should average no higher than 500 ms. Ping times of 10 seconds or longer indicate a slow Collector that may be overloaded. Ping times over the 10 second threshold cause the Enterprise Manager|MOM|Collectors|:Connected metric to display a value of 2. You can adjust this threshold for your environment.

    In Introscope 8.0, there is an additional ping time threshold of 60 seconds. If the ping time exceeds this value, the MOM automatically disconnects from the Collector associated with the slow ping time. A disconnected Collector causes the Enterprise Manager|MOM|Collectors|:Connected metric to display a value of 3. You can adjust this threshold for your environment. See Local network requirement for MOM and Collectors on page 51.
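    The two thresholds can be summarized in a short sketch. The 10-second and 60-second defaults and the Connected values 2 and 3 come from the text above. The function name is hypothetical, the use of a strict "greater than" comparison is an assumption, and the value shown for a healthy Collector is left as `None` because this section does not state it.

```python
from typing import Optional, Tuple

SLOW_THRESHOLD_S = 10.0        # default slow-Collector threshold from the text
DISCONNECT_THRESHOLD_S = 60.0  # default disconnect threshold from the text


def classify_ping(
    ping_s: float,
    slow_s: float = SLOW_THRESHOLD_S,
    disconnect_s: float = DISCONNECT_THRESHOLD_S,
) -> Tuple[str, Optional[int]]:
    """Map a Collector ping time to (state, Connected metric value).

    The Connected value for a healthy Collector is not given in this
    section, so it is returned as None rather than guessed.
    """
    if ping_s > disconnect_s:
        return ("disconnected", 3)  # MOM disconnects from the Collector
    if ping_s > slow_s:
        return ("slow", 2)          # Collector may be overloaded
    return ("ok", None)
```

    Both thresholds are parameters here because the text notes that you can adjust them for your environment.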

    Running multiple Collectors on one machine

    By following CA Wily's guidelines, you can set up multiple Collectors on a single machine. See Running multiple Collectors on one machine on page 74.

    Scalability

    Introscope 8.0 includes a number of scalability improvements, which are documented across this guide:

    Each Collector Enterprise Manager can handle up to 500 K metrics (varies according to hardware), about twice the Introscope 7.x Enterprise Manager metric limit.

    Collectors can take advantage of additional CPUs to increase these limits:

      number of applications per Collector

      number of agents per Collector

      number of metrics that can be placed in metric groupings (if using a standalone Enterprise Manager)

    Each MOM can connect to a five million metric cluster (10 Collectors, 500 K metrics per Collector), which is a five-fold increase in clustered Enterprise Manager scale.

    The MOM now requires more powerful hardware than Collectors. See MOM hardware requirements on page 59.

    Support for 50 concurrent Workstation connections.

    Important: The limits may differ substantially depending on the specific platform and hardware used in your environment.

    SmartStor metadata stored in uncompressed format

    To increase SmartStor's speed in reading stored metadata files, starting with Introscope 8.0, all new metadata files are written in an uncompressed format. See SmartStor metadata files are uncompressed on page 98.

    SQL statements, statement normalizers, and metric explosions

    Metric explosions can be caused by a number of factors, including poorly written and long SQL statements. Introscope includes four SQL statement normalizers to address long SQL statements. The regular expression SQL statement normalizer is new for Introscope 8.0. CA Wily recommends that you use this normalizer before the other normalizers provided with Introscope, as the regular expression SQL statement normalizer allows you to configure regular expressions and normalize any characters or sequence of characters in the SQL statement. See SQL statements and metric explosions on page 92.
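    To show the general idea of regular-expression normalization, the sketch below replaces literals in a SQL statement with a placeholder so that statements differing only in their literal values collapse to one text. It is not the configuration syntax of the Introscope normalizer: the patterns and the `normalize_sql` function are hypothetical, and the sketch assumes that each distinct statement text would otherwise be reported separately.

```python
import re

# Hypothetical patterns for illustration; not Introscope's shipped configuration.
_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")      # 'text', with '' as an escaped quote
_NUMBER_LITERAL = re.compile(r"\b\d+(?:\.\d+)?\b")   # standalone integers and decimals
_WHITESPACE = re.compile(r"\s+")


def normalize_sql(statement: str) -> str:
    """Replace string and numeric literals with '?' and collapse whitespace."""
    statement = _STRING_LITERAL.sub("?", statement)
    statement = _NUMBER_LITERAL.sub("?", statement)
    return _WHITESPACE.sub(" ", statement).strip()


first = normalize_sql("SELECT * FROM orders WHERE id = 42 AND name = 'O''Brien'")
second = normalize_sql("SELECT *  FROM orders WHERE id = 7 AND name = 'Smith'")
```

    Both calls produce the same normalized text, `SELECT * FROM orders WHERE id = ? AND name = ?`. Digits embedded in identifiers such as `table1` are left alone because the number pattern requires word boundaries.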

    Support for RAID 5 data storage

    CA Wily now supports Redundant Array of Inexpensive Disks (RAID) 5 for data storage. See Setting the SmartStor dedicated controller property on page 55.

    Transaction Trace component clamp

    In the case of an infinitely expanding transaction (for example, when a servlet executes hundreds of object interactions and backend SQL calls), Introscope clamps the Transaction Trace, resulting in a truncated trace. This helps prevent the JVM from running out of memory. The clamped Transaction Traces are marked as truncated in the Workstation Transaction Trace Viewer. See Transaction Trace component clamp on page 108.
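    A minimal sketch of the clamping idea follows. The `clamp_trace` function and its `max_components` parameter are hypothetical; this section does not give the actual clamp value or the property that sets it.

```python
from typing import List, Tuple


def clamp_trace(components: List[str], max_components: int) -> Tuple[List[str], bool]:
    """Keep at most max_components trace components.

    Returns the (possibly shortened) component list and a flag saying
    whether the trace was truncated, mirroring the "truncated" marker
    described in the text. max_components is an assumed limit.
    """
    if max_components < 1:
        raise ValueError("max_components must be at least 1")
    truncated = len(components) > max_components
    return components[:max_components], truncated
```

    Bounding the number of retained components is what keeps memory use predictable: a runaway transaction costs at most `max_components` entries rather than growing without limit.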

  • CHAPTER 2

    EM Requirements and Recommendations

    This chapter provides background and specifics to help you understand how to size and tune your Enterprise Manager for good performance. In this chapter you'll find the following topics:

    Enterprise Manager overview . . . . . . . . . . . . . . . 17

    Factors that affect the Introscope environment . . . . . . . . . 20

    Factors that affect EM maximum capacity . . . . . . . . . . . 21

    Differences between EMs and J2EE servers . . . . . . . . . . 22

    Enterprise Manager health . . . . . . . . . . . . . . . . 27

    About EM health and supportability metrics . . . . . . . . . . 28

    SmartStor overview . . . . . . . . . . . . . . . . . . 40

    About SmartStor spooling and reperiodization . . . . . . . . . 40

    Report generation and performance . . . . . . . . . . . . . 43

    Concurrent historical queries and performance . . . . . . . . . 43

    About SmartStor and flat file archiving . . . . . . . . . . . . 43

    MOM overview . . . . . . . . . . . . . . . . . . . . 44

    Collector overview . . . . . . . . . . . . . . . . . . 44

    Enterprise Manager basic requirements . . . . . . . . . . . . 47

    Enterprise Manager file system requirements . . . . . . . . . . 47

    EM OS disk file cache memory requirements . . . . . . . . . . 47

    SmartStor requirements . . . . . . . . . . . . . . . . . 49

    Each EM requires SmartStor on a dedicated disk or I/O subsystem . . . 49

    MOM and Collector EM requirements . . . . . . . . . . . . . 51

    Local network requirement for MOM and Collectors . . . . . . . . 51

    When to run reports, custom scripts, and large queries . . . . . . . 52

    Introscope 8.0 EM settings and capacity . . . . . . . . . . . 53

    Estimating Enterprise Manager databases disk space needs . . . . . 53

    SmartStor settings and capacity . . . . . . . . . . . . . . 55

    Setting the SmartStor dedicated controller property . . . . . . . . 55

    Collector and MOM settings and capacity . . . . . . . . . . . 58

    MOM disk subsystem sizing requirements . . . . . . . . . . . 58

    MOM to Collectors connection limits . . . . . . . . . . . . . 59

    MOM to Workstation connection limits . . . . . . . . . . . . 60

    Metric load limit on MOM-Collector systems . . . . . . . . . . 60

    Configuring a cluster to support 1,000,000 MOM metrics . . . . . . 61

    MOM hot failover . . . . . . . . . . . . . . . . . . . 62

    Agent load balancing on MOM-Collector systems . . . . . . . . . 63

    Avoid Management Module hot deployments . . . . . . . . . . 68

    Collector applications limits . . . . . . . . . . . . . . . 69

    Collector metrics limits . . . . . . . . . . . . . . . . . 69

    Collector events limits . . . . . . . . . . . . . . . . . 70

    Collector agent limits . . . . . . . . . . . . . . . . . . 70

    Collector hardware requirements . . . . . . . . . . . . . . 71

    Collector with metrics alerts limits . . . . . . . . . . . . . 71

    Collector to MOM clock drift limit . . . . . . . . . . . . . . 71

    Reasons Collectors combine slices . . . . . . . . . . . . . 72

    Increasing Collector capacity with more and faster CPUs . . . . . . 73

    Standalone EM hardware requirements example . . . . . . . . . 74

    Running multiple Collectors on one machine . . . . . . . . . . 74

  • Enterprise Manager overview

    The Enterprise Manager (EM) is an integral component of the Introscope system. An Enterprise Manager is a server that collects, performs calculations on, and stores metrics reported by multiple agents. In a simple Introscope environment such as the one shown in the figure below, a single standalone Enterprise Manager collects, persists, and processes all the agent metrics, then supplies the resulting data for viewing in the Introscope Workstation or WebView browser instances.

  • In a more complex environment, as shown in the figure below, Enterprise Managers in the role of Collectors can be clustered so that their collected metrics data is compiled in a single Manager of Managers (MOM) Enterprise Manager. The MOM provides a unified view of all the metrics to the connected Workstation and WebView instances.

    Note: In cases where the data is specific to a single Enterprise Manager or where clustering makes no difference to the topic, this guide uses the generic term Enterprise Manager. However, in some cases, Collectors and MOM Enterprise Managers perform different functions that require different sizing capacity guidelines or result in different performance behaviors. In these cases, the term Collector or MOM is used as appropriate. While the Collector and MOM perform very different functions within a cluster, their system requirements are quite similar, with the exception of data persistence, as the MOM persists relatively little data in its role.

  • In an Introscope deployment, the agent collects application and environmental metrics and relays them to the Enterprise Manager. Multiple physical agents can be configured into a single virtual agent, which enables an aggregated, logical view of the metrics reported by multiple agents.

    To an Introscope Enterprise Manager, an application is an agent-specific association of metrics that is derived from the Java application .war files deployed on the managed J2EE application server. In an Introscope Enterprise Manager Investigator metric tree, applications, which are agent-specific, are found under the Frontends node, as shown in the following figure.

    Note: You can have multiple applications running within a single JVM, but you can assign only one Introscope agent per JVM to collect the performance data.

  • Enterprise Manager databases

    The Enterprise Manager writes to three separate databases: SmartStor, the Transaction Event database (traces.db), and the metrics baselining (heuristics) database (baselines.db).

    Introscope features such as Transaction Tracing, Transaction Trace sampling, and metrics baselining (heuristics) incur additional load on the disk subsystem. The Transaction Event database (traces.db) and the metrics baselining (heuristics) database (baselines.db) can be located on the same disk as each other. However, because of this additional load, SmartStor MUST be located on a separate dedicated disk or I/O subsystem.

    In the default Enterprise Manager installation process, the SmartStor data directory defaults to the target Enterprise Manager installation directory. However, for optimal performance, move the SmartStor data directory to a separate physical disk from the Enterprise Manager installation directory. For heavy-duty, production Enterprise Managers, disk I/O is the primary bottleneck for Enterprise Manager capacity, so CA Wily strongly recommends the use of multiple drives.

    For more information, see SmartStor requirements on page 49.
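    One quick sanity check for the separate-disk recommendation is to compare the device IDs of the two directories. The sketch below uses only the Python standard library; the function name and the example paths are hypothetical. A matching device ID shows that the directories share a file system. Differing IDs do not prove separate physical disks, because two partitions of one disk also report different IDs.

```python
import os


def on_same_device(path_a: str, path_b: str) -> bool:
    """Return True when both paths live on the same device (same st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


# Hypothetical usage; substitute your own installation and data directories:
#   if on_same_device("/opt/introscope", "/data/smartstor"):
#       print("SmartStor shares a file system with the EM installation")
```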

    Factors that affect the Introscope environment

    The first questions to answer when considering your Introscope environment are: "How many Java application server processes do I want to monitor (number of agents)?" and "How many metrics per server on average (metrics per agent) will be generated?" The answers to those questions depend on the complexity of the server and the agent instrumentation settings. For more information, see the Introscope Configuration and Administration Guide.
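    Those two numbers give a first, rough estimate of cluster size. The sketch below uses the figures quoted in the Scalability section of this guide (up to 500 K metrics per Collector and 10 Collectors per MOM). It is a back-of-the-envelope aid with a hypothetical function name. It ignores the agent, application, and event limits covered elsewhere, and the real limits vary with platform and hardware.

```python
import math
from typing import Dict, Union

COLLECTOR_METRIC_LIMIT = 500_000  # "up to 500 K metrics" per Collector; varies with hardware
MAX_COLLECTORS_PER_MOM = 10       # from the five-million-metric cluster example


def estimate_cluster(
    agents: int,
    metrics_per_agent: int,
    per_collector_limit: int = COLLECTOR_METRIC_LIMIT,
) -> Dict[str, Union[int, bool]]:
    """Estimate total metric load and the number of Collectors it implies."""
    total = agents * metrics_per_agent
    collectors = max(1, math.ceil(total / per_collector_limit))
    return {
        "total_metrics": total,
        "collectors": collectors,
        "fits_one_mom": collectors <= MAX_COLLECTORS_PER_MOM,
    }


# Illustrative input: 400 agents averaging 3,000 metrics each.
estimate = estimate_cluster(agents=400, metrics_per_agent=3000)
```

    For the illustrative input, the estimate is 1,200,000 metrics, which rounds up to three Collectors at the 500 K figure and fits within a single MOM.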

    The capacity of the Enterprise Manager is dependent on the hardware it is running on as well as other complicating factors. For example, one factor is the JVM being used for the Enterprise Manager on the platform under consideration. The Enterprise Manager performs much better when its underlying JVM uses concurrent garbage collection (traditional garbage collection can halt the system when it is busy), and JVMs that support concurrent garbage collection are preferred.

    If the CA Wily sizing recommendations are exceeded, the system becomes more likely to suddenly experience sluggish behavior if too many operations all occur simultaneously. You can use the Overall Capacity metric for alerting purposes. For more information, see Overall Capacity (%) metric on page 33. For example, the metrics limit is the number of metrics that can be written safely to the disk I/O system. 20 EM Requirements and Recommendations

    Important On typical server configurations, the metrics limit is usually the primary limitation on the capacity of the Enterprise Manager. This is a critical factor when sizing an Enterprise Manager.

    CPU performance, network bandwidth, and availability of RAM are also influential, but disk I/O seek time is typically the primary bottleneck. In Introscope 8.0, exceeding the limits found in the Sample Introscope 8.0 Collector sizing limits table on page 119 can bring the system to a state where you begin to see performance problems. The symptoms depend on which resource is affected. Overloaded disk I/O typically causes combined time slices and sluggish Workstation refresh times. Lack of RAM causes memory exceptions during spool file conversion, as too many metrics are tracked. Network bandwidth problems cause slow cluster response time and, more rarely, may cause agents to be dropped. Lagging CPU causes performance problems, including calculators not updating and alerts being missed.

    As another example, the Sample Introscope 8.0 Collector sizing limits table on page 119 shows that the recommended limit for monitored applications (maximum number of applications) for a Windows-based Enterprise Manager is about 170% of that found on a Solaris machine. In the case of applications, the limit is strongly dependent on the performance characteristics of the CPUs available to the Enterprise Manager, since applications create alerts that must be calculated every time slice.

    Factors that affect EM maximum capacity

    The maximum capacity of an Enterprise Manager can be reduced by the factors listed in the table below.

    Factor Reducing Enterprise Manager Maximum Capacity | For More Information See

    SmartStor NOT on a separate disk drive or I/O subsystem | Each EM requires SmartStor on a dedicated disk or I/O subsystem on page 49

    If metric groupings are used, exceeds the maximum number of metrics placed in metric groupings. | Matched metrics limits on page 79

    Boundary Blame is disabled and maximum loads are not redistributed across all Enterprise Managers. | Sample Introscope 8.0 Collector sizing limits table on page 119

    The Enterprise Manager runs at greater than 40-50% average CPU utilization range | Collector metric capacity and CPU usage on page 45

    The sum of all metrics behind every TOP N graph viewed by every Workstation instance exceeds 100,000 | Top N graph metrics limit per Workstation on page 103

    Differences between EMs and J2EE servers

    Users who maintain enterprise application servers are accustomed to purchasing hardware that scales well with their applications, and have a general understanding of target utilization levels and capacity. Although the Enterprise Manager itself is a Java server, the Enterprise Manager neither behaves nor performs like a typical J2EE server. Therefore, it should not be modeled as such when purchasing hardware or performing an Enterprise Manager capacity forecast.

    J2EE servers and web applications receive requests for work at irregular intervals, with varying load throughout the day. Therefore the J2EE server only performs as much work as is requested of it in a given interval.

    Under standard usage, when incoming user requests come into a J2EE server, the requests are serviced by a pool of worker threads, which perform necessary business logic in servlets and pools of EJBs. The servlets and EJBs in turn make requests to external databases or systems. In well-designed J2EE applications, each of these worker threads is:

    largely independent from one another

    free to obtain the necessary resources and information needed to satisfy the request

    not forced through a common checkpoint for synchronization (although J2EE applications often aren't designed well).

    Factor Reducing Enterprise Manager Maximum Capacity | For More Information See

    More than 4 concurrent historical queries are issued against SmartStor. | Concurrent historical queries and performance on page 43

    SmartStor is used in conjunction with flat file archiving | About SmartStor and flat file archiving on page 43

    Improper sizing is used for Enterprise Managers, Workstations, metrics, and agents | All chapters in this book.

    Therefore, in most situations, application servers scale well in throughput by adding additional CPUs, because each CPU can run additional worker threads to satisfy more requests. Occasionally one request might be slowed down, but whether it takes 100 milliseconds (ms) or 5 seconds doesn't cause the rest of the system to come to a halt. Only in the event of an external bottleneck, such as a database, can all threads come to a halt waiting for data. Eventually the request threads all become busy, and the application server slows to a crawl, maintaining most throughput while rejecting additional requests for work. When the bottleneck is relieved, the system begins to service requests again, and returns to normal.

    In contrast, the Enterprise Manager behaves very differently because of its architecture and the nature of the work it performs. Introscope monitors production systems in real time, and provides information, warnings, and alerts in real time. In order to accomplish this, the Enterprise Manager performs as a real time system as well. The Enterprise Manager receives a continual flow of data from agents every 7.5 seconds. Once every 15 seconds, the Enterprise Manager must do all of the following:

    examine all of the metric data that it has received for the interval for consistency

    perform calculations

    perform actions, such as fire alerts or send messages

    store the data to disk

    respond to Workstation requests for live data

    handle incoming events (Transaction Traces, errors, and so on) and persist them.

    For the most part, the Enterprise Manager can only use two threads to perform calculations and actions on the large set of agent-generated data, and only a single thread to perform the data storage. If the Enterprise Manager is unable to complete these operations within the 15 second interval, it may fall behind and not catch up with all the processing that needs to be completed because another set of data arrives. The Enterprise Manager then continually combines data or suffers from sluggish performance as it attempts to process and write more data than it can handle. There are internal buffers to allow for bursts of activity so that the Enterprise Manager can catch up, but if the Enterprise Manager has too many metrics being reported, these buffers fill up quickly. The Enterprise Manager is very different from a J2EE server in this regard, because the standard J2EE server does not examine data requests on a regularly scheduled basis to decide what to do with them. The Enterprise Manager's scenario is more similar to the classic factory production conveyor belt analogy, in which a continual set of finished products (data) arrives for two workers to examine. Then the two workers must transfer the product packages (metric data) to a single worker who drives the packaged data in a truck down a single-lane road to a warehouse, where several more workers off load the packages from the truck into storage (SmartStor database).
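    The burst-versus-overload behavior described above can be illustrated with a toy model. This is only a sketch, not the Enterprise Manager's actual scheduling logic: the function name, the simple backlog accounting, and the sample processing times are assumptions made for illustration. Only the 15-second interval comes from this guide.

```python
INTERVAL_S = 15.0  # harvest interval from this guide


def simulate_backlog(processing_times_s):
    """Return the unfinished work (in seconds) carried over after each interval.

    processing_times_s: hypothetical time needed to process the data that
    arrives in each 15-second interval.
    """
    backlog = 0.0
    history = []
    for needed in processing_times_s:
        # Work for this interval = leftover work + work for the new data.
        total = backlog + needed
        # Only INTERVAL_S seconds of work fit before the next data set arrives.
        backlog = max(0.0, total - INTERVAL_S)
        history.append(backlog)
    return history


# A short burst (20 s of work in one interval) is absorbed afterwards...
print(simulate_backlog([10, 20, 10, 10]))  # [0.0, 5.0, 0.0, 0.0]
# ...but a sustained overload never catches up.
print(simulate_backlog([18, 18, 18, 18]))  # [3.0, 6.0, 9.0, 12.0]
```

    In this model a single slow interval is recovered in the next one, while a steady 18 seconds of work per 15-second interval makes the backlog grow without bound, which corresponds to the "buffers fill up quickly" case.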

    Because of the nature of the tasks that the Enterprise Manager performs, there are currently limitations in the number of CPUs that the Enterprise Manager can use effectively. A minimum of 2 CPUs is required for optimum performance. However, the use of 4 CPUs increases performance by allowing more of the following:

    number of applications per Collector

    number of agents per Collector

    number of metrics that can be placed in metric groupings (if using a standalone Enterprise Manager).

    More than 4 CPUs do not enhance performance. However, CA Wily recommends faster CPUs because each of the threads can then examine the data much faster. For the maximum limits on 4 CPU Enterprise Managers for matched metrics, see Matched metrics limits on page 79.

    Another difference between J2EE servers and Enterprise Managers is in how they perform data processing. J2EE servers largely perform batch processing, while Enterprise Managers largely perform real-time processing. J2EE applications are batch processors. Work queues up and is handled as quickly as possible. As the machine slows down, the batch processes take longer and longer. In contrast, the Enterprise Manager, which has some batch processing functions (for example, responding to historical data query requests), handles most data flow in real-time. This means that the Enterprise Manager can take whatever time it needs to process incoming data, as long as it finishes within the 15-second harvest duration period. Once the Enterprise Manager takes longer than that time frame, it starts to combine data. Sizing a real-time system can be difficult because you need to size for the maximum load, not the average load on the machine. If you only size for the average load, then during maximum load times you'll lose data.
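    To make the point about sizing for the maximum load concrete, the following sketch compares average and peak load against a capacity figure. The function name, the sample metric counts, and the capacity value are hypothetical and are not Introscope sizing limits.

```python
def load_summary(metrics_per_interval, capacity):
    """Summarize observed load against a capacity limit.

    metrics_per_interval: hypothetical metric counts, one per time slice.
    capacity: hypothetical number of metrics that can be processed per slice.
    """
    average = sum(metrics_per_interval) / len(metrics_per_interval)
    peak = max(metrics_per_interval)
    return {
        "average": average,
        "peak": peak,
        "average_fits": average <= capacity,
        # A real-time system must fit the peak, not just the average.
        "peak_fits": peak <= capacity,
    }


summary = load_summary([100_000, 120_000, 250_000, 110_000], capacity=200_000)
print(summary["average_fits"], summary["peak_fits"])  # True False
```

    Here the average load fits comfortably, yet the one peak interval exceeds capacity, which is the situation in which a real-time system starts to combine or lose data.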

    Enterprise Managers also perform additional work, and have limitations that affect performance, in ways that are atypical of standard J2EE systems. These include:

    Introscope Workstations provide different load characteristics than typical Web clients. Workstations allow users to view live data in real time. Depending on the feature or data requested, a Workstation can be a continual tax on the Enterprise Manager even if no user is watching the console, as the Enterprise Manager continues to serve data. In contrast, if a user stops interacting with a browser-based Web application, the data/refresh requests typically stop.

    Workstations can perform historical queries for data, which cause the Enterprise Manager to retrieve data from storage. This can interfere with the Enterprise Manager's ability to effectively process and store incoming agent data due to disk contention. J2EE systems don't typically serve requests directly from databases or have disk contention issues.

    The Enterprise Manager periodically reorders and reperiodizes stored data. Incoming metric data is written sequentially to a spool file, which is reorganized and indexed once every hour. This reorganization process is a resource expensive (CPU and disk I/O intensive) operation that can interfere with the Enterprise Manager's ability to process and store incoming data. J2EE servers don't typically perform periodic intense housekeeping operations such as reperiodization.

    Agents can experience metric leaks over time, without the user knowing, which causes more data to be processed by the Enterprise Manager. A metric leak occurs when the number of registered metrics being reported by agents is continually increasing. This means that a properly configured system can drift over time into a problem state.

    An Enterprise Manager, for all configurations, should run AT MOST within the 40% to 50% CPU utilization range in a steady state. This provides the additional headroom necessary for periodic operations, such as SmartStor spooling, reperiodizing, and user Workstation requests (alert requests) that may saturate the CPU. Typically J2EE systems can be run much closer to saturation because there are no hidden operations that can consume CPU above and beyond steady state. In the event the system is saturated, the J2EE server refuses incoming requests to alleviate the pressure.

    No other applications/processes should be running on an Enterprise Manager in order to avoid contention for system resources available to Enterprise Manager.

    Enterprise Managers (both Collectors and MOM) queue up incoming data query requests and aggregate the data as it is read in from SmartStor.

    About Introscope system size

    Introscope system size is determined by workload and business logic.

    Introscope workload is comprised of:

    total applications monitored

    total metrics monitored

    total agents monitored

    number of Enterprise Managers.

    Introscope business logic handles the data collected in the monitoring operations and determines what will be done with the data. Introscope business logic operations include determining or handling the following:

    total number of metric groupings

    maximum number of metrics in a metric grouping

    number of metrics persisted per minute

    calculators

    alerts

    management modules containing a lot of dashboards, calculators, alerts, and so on

    large numbers of reports

    Top N graphs.

    Enterprise Manager health

    You can monitor and assess Enterprise Manager health in two ways, by viewing the:

    Enterprise Manager Overview tab (see About the Enterprise Manager Overview tab, below)

    Enterprise Manager health and supportability metrics (see About EM health and supportability metrics on page 28)

    The Enterprise Manager generates and collects metrics about itself that are useful in assessing its health and determining how well it is performing under its workload. These are sometimes referred to as supportability metrics because these metrics help support the healthy functioning of the Enterprise Manager.

    About the Enterprise Manager Overview tab

    By viewing the Enterprise Manager Overview tab you can assess a number of Enterprise Manager health and performance-related statistics and components in one centralized location.

    To view the Enterprise Manager Overview tab

    1 Select the Enterprise Manager node under the Custom Metric Agent.

    2 Click the Overview tab in the right pane.

    Study these graphs as shown in the figure below.

    EM Capacity (%)

    EM CPU Utilization

    Heap Utilization

    Harvest, SmartStor, and GC Durations

    Number of Metrics

    EM Databases (MB)

    Number of Agents

    Number of Workstations

    About EM health and supportability metrics

    Enterprise Manager metrics appear in the Investigator tree, under:

    Custom Metric Host (Virtual)
      Custom Metric Process (Virtual)
        Custom Metric Agent (Virtual)(SuperDomain)
          Enterprise Manager

    In a clustered environment, the MOM's metrics also appear under the tree path shown above. However, in a clustered environment, Collector supportability metrics show up in the same Custom Metric Host (Virtual) and Custom Metric Process (Virtual) path location, but the last name includes (CollectorHostName@PortNumber).

    The Investigator tree with the MOM and one Collector looks like this:

    Custom Metric Host (Virtual)
      Custom Metric Process (Virtual)
        Custom Metric Agent (Virtual)(SuperDomain)
          Enterprise Manager
        Custom Metric Agent (Virtual)(Collector1@5001)(SuperDomain)
          Enterprise Manager

    For more information, see the Introscope Configuration and Administration Guide.

    When you deploy Enterprise Managers into your Introscope environment, you'll need to look at the Enterprise Manager health and supportability metrics to find out what's really happening in your monitoring solution.

    Harvest duration, Collector Metrics Received Per Interval, SmartStor spool file conversion, and Overall Capacity (%) are several of the more significant indicators of problems in an Enterprise Manager.

    For more information, see

    Harvest Duration metric on page 29

    Collector Metrics Received Per Interval metric on page 31

    Converting Spool to Data metric on page 32

    Overall Capacity (%) metric on page 33

    Additional supportability metrics on page 38.

    Harvest Duration metric

    The Harvest Duration metric shows the time in milliseconds (during a 15-second time slice) spent harvesting data. It is generally a good indicator in determining whether or not the Enterprise Manager is keeping up with the current workload. You can find this metric at the following location in the Investigator tree, as shown in the figure below.

    Custom Metric Host (Virtual)| Custom Metric Process (Virtual)| Custom Metric Agent (Virtual)(*SuperDomain*)| Enterprise Manager | Tasks | Harvest Duration (ms)

    The Harvest Duration metric value should be less than 3000 ms [3 seconds] and should not exceed 7,500 ms [7.5 seconds]. The harvest operation usually causes the CPU activity to spike for the full harvest duration and the CPU is often almost idle for the rest of the 15 seconds. If the harvest duration is too long, investigate reducing the metric load on the overloaded Enterprise Manager by having agents report to separate Enterprise Managers or consider moving the Enterprise Manager to a platform with faster CPUs.
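    The two Harvest Duration thresholds above can be expressed as a simple check, for example when post-processing exported metric values. This is a minimal sketch: the function name and the status labels are illustrative assumptions, and only the 3000 ms and 7,500 ms thresholds come from this guide.

```python
GOOD_LIMIT_MS = 3000  # "should be less than 3000 ms"
MAX_LIMIT_MS = 7500   # "should not exceed 7,500 ms"


def classify_harvest_duration(duration_ms):
    """Map a Harvest Duration (ms) sample to an illustrative status label."""
    if duration_ms < GOOD_LIMIT_MS:
        return "ok"
    if duration_ms <= MAX_LIMIT_MS:
        return "watch"
    return "too long"


for sample in (1200, 5000, 9000):
    print(sample, classify_harvest_duration(sample))
```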

    Number of Collector Metrics

    The Number of Collector Metrics metric shows the total number of metrics currently being tracked in the cluster. You can find the Number of Collector Metrics metric here in the Investigator tree:

    Custom Metric Host (Virtual)| Custom Metric Process (Virtual)| Custom Metric Agent (Virtual)(*SuperDomain*)| Enterprise Manager | MOM | Number of Collector Metrics.


    Collector Metrics Received Per Interval metric

    The Collector Metrics Received Per Interval metric is an extremely simple way of gauging how much load metric data queries are placing on the cluster. This metric is the total sum of Collector metric data points that the MOM has received each 15-second time period, including data queries. You can find the Collector Metrics Received Per Interval metric here in the Investigator tree:

    Custom Metric Host (Virtual)| Custom Metric Process (Virtual)| Custom Metric Agent (Virtual)(*SuperDomain*)| Enterprise Manager | MOM | Collector Metrics Received Per Interval

    Tip Consult this metric regularly.

    A large Collector Metrics Received Per Interval metric value, coupled with degradation of the cluster, indicates that the MOM has been asked to read too much metric data from the Collectors. This overloading is the result of some combination of the following:

    too many Workstations connected

    too many queries (especially historical queries) being run

    user alerts and calculators set up to evaluate too many metrics

    Although all resource loading issues combine to affect overall cluster performance, a large Collector Metrics Received Per Interval metric value, which reflects too many metric reads, is different from a metric explosion (see Detecting metric explosions on page 84), which is the result of too many metric writes by the agents. This means, in particular, that reducing metric load on your Collectors may not solve issues on the MOM related to a high Collector Metrics Received Per Interval metric value.

    If your Collector Metrics Received Per Interval value seems too high, check the number of Workstations attached, and that most are in Live mode. If this fails to solve the issue, check that you do not have alerts set up to evaluate too many metrics in the system. You can do this by searching for all metrics with the following name and sorting them by value:

    Enterprise Manager | Internal | Alerts: Number of Evaluated Metrics

    If the Collector Metrics Received Per Interval value remains high after carrying out the suggestions above, you can also set the introscope.enterprisemanager.query.datapointlimit property in the EnterpriseManager.properties file to specify a maximum number of metric data points the Enterprise Manager will return from any single query. This read clamp ensures that user queries that accidentally match too much metric data do not negatively impact system performance.

    Important Clamping the Collector metrics prevents cluster degradation, but queries and alerts that are clamped do not fully evaluate all metrics they match.

    Converting Spool to Data metric

    The Converting Spool to Data metric tracks whether or not the spool to data conversion task is running. You can find this metric at the following location in the Investigator tree:

    Custom Metric Host (Virtual)| Custom Metric Process (Virtual)| Custom Metric Agent (Virtual)(*SuperDomain*)| Enterprise Manager | Data Store | SmartStor | Tasks | Converting Spool to Data

    When this task is running, the metric has a value of 1. When this task is not running, it has a value of 0. If this metric stays at a value of 1 for more than 10 minutes per hour, this indicates that reorganizing the SmartStor spool file is taking too long. This problem is often progressive. As the spooling time gets longer hour after hour, the Enterprise Manager usually becomes noticeably less responsive overall because the Enterprise Manager is putting more and more effort into reorganizing the spool file.
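    The 10-minutes-per-hour rule above can be checked from the metric's 0/1 values. The sketch below assumes one sample per 15-second time slice (240 samples per hour). The function names and the sample data are hypothetical.

```python
SAMPLE_PERIOD_S = 15  # assumed reporting period: one value per time slice
LIMIT_MINUTES = 10    # threshold from the text above


def conversion_minutes(samples):
    """Return minutes the task was running, given 0/1 samples for one hour."""
    return sum(samples) * SAMPLE_PERIOD_S / 60


def conversion_too_long(samples):
    """True if the conversion task ran for more than LIMIT_MINUTES."""
    return conversion_minutes(samples) > LIMIT_MINUTES


hour = [1] * 48 + [0] * 192  # hypothetical hour: 48 samples at 1 = 12 minutes
print(conversion_minutes(hour), conversion_too_long(hour))  # 12.0 True
```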

    For better performance, add more physical memory (RAM) to the machine. Adding more RAM can help increase the size of the OS disk file cache and should reduce the amount of time the conversion task takes. The amount of RAM that will help varies between operating systems; however, a good general rule is to dedicate 1 GB RAM for the OS disk cache. In general, at full load, you should configure a Collector to use 1.5 GB heap memory. If you are running a MOM near maximum capacity (for example, a 5 million metric cluster or 1 million subscribed MOM metrics), the MOM must run on a 64-bit JVM with a 12 GB heap size. The machine must have physical RAM of at least 14 GB. For more information, see Configuring a cluster to support 1,000,000 MOM metrics on page 61.

    Additionally, a server host typically requires approximately 500 MB for the operating system (this varies based on hardware and OS). When SmartStor starts the re-spooling operation, the operating system starts reading the spool file into the file cache memory (which is part of the OS, not the Enterprise Manager Java virtual machine). If reading 200,000 metrics into memory, for example, the spool file will usually be over 1.5 GB. For optimum performance, the file cache should be large enough to accommodate the entire spool file, so the host machine should have between 3 and 4 GB of physical RAM. 32-bit Windows machines use a fixed file cache limited to approximately 1 GB, whereas UNIX systems generally have a configurable file cache limit. This must be physical memory, not virtual memory (swap space). Enterprise Manager performance degrades dramatically if the host machine starts paging to and from virtual memory.
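    One plausible reading of how the 3 to 4 GB figure arises is to add up the three memory consumers mentioned in this topic: the operating system, the Collector heap, and a file cache large enough for the spool file. The guide does not state this sum explicitly, so treat the breakdown below as an assumption. The constant and function names are illustrative.

```python
OS_GB = 0.5              # operating system, "approximately 500 MB"
COLLECTOR_HEAP_GB = 1.5  # Collector heap at full load
SPOOL_CACHE_GB = 1.5     # file cache for a spool file of about 1.5 GB


def estimated_ram_gb(os_gb=OS_GB, heap_gb=COLLECTOR_HEAP_GB, cache_gb=SPOOL_CACHE_GB):
    """Sum the three memory consumers discussed in this topic."""
    return os_gb + heap_gb + cache_gb


print(estimated_ram_gb())  # 3.5
```

    Because the spool file is described as usually over 1.5 GB for 200,000 metrics, the result is a lower-end estimate, which is consistent with a recommendation that extends up to 4 GB.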

    For more information about the converting spool to data task, see About SmartStor spooling and reperiodization on page 40.

    Overall Capacity (%) metric

    The Enterprise Manager Overall Capacity (%) metric estimates the percentage of the Enterprise Manager's capacity that is consumed. You can find it at this location in the Investigator tree:

    Custom Metric Host (Virtual)| Custom Metric Process (Virtual)| Custom Metric Agent (Virtual)(*SuperDomain*)| Enterprise Manager: Overall Capacity (%)

    The Overall Capacity (%) metric is computed in part from the following metrics, which you can find at this location in the Investigator tree:

    Custom Metric Host (Virtual)| Custom Metric Process (Virtual)| Custom Metric Agent (Virtual)(*SuperDomain*)| Enterprise Manager | Health

    CPU Capacity (%) (added into the computation in Release 8.0). See Additional supportability metrics on page 38.

    Harvest Capacity (%). See Additional supportability metrics on page 38.

    Heap Capacity (%). See Heap Capacity (%) metric, below.

    Incoming Data Capacity (%). See Additional supportability metrics on page 38.

    SmartStor Capacity (%). See Additional supportability metrics on page 38.

    The Overall Capacity (%) metric is more valuable over a long period of time than for a specific 15-second time slice. Since the Overall Capacity metric is based on real-time metrics, you may see the Overall Capacity value spike quite a bit higher than 100% because, for example, the hardware's I/O subsystem could be briefly overloaded. However, the Enterprise Manager tends to recover from these spike situations automatically if they are not long-lasting. In general, a spike (for example, to 200%) isn't cause for concern if it's only for a brief moment, but over a long period of time, the Overall Capacity should ideally average about 75%. As a general guide, if the Overall Capacity value is 50%, you should be able to double the load (+/- 15%) before you see a 100% capacity value.
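    The rule of thumb above (50% capacity means roughly double the load) can be generalized as a rough linear estimate. This sketch assumes capacity scales linearly with load and applies the +/- 15% tolerance to the resulting multiplier. Both are assumptions about how to read the guidance, and the function name is illustrative.

```python
def load_headroom(overall_capacity_pct, tolerance=0.15):
    """Estimate how far the current load could be multiplied before reaching 100%.

    Returns (low, nominal, high) multipliers. Applying the tolerance to the
    multiplier is an assumption about what "+/- 15%" means.
    """
    if overall_capacity_pct <= 0:
        raise ValueError("capacity must be positive")
    nominal = 100.0 / overall_capacity_pct
    return nominal * (1 - tolerance), nominal, nominal * (1 + tolerance)


low, nominal, high = load_headroom(50)
print(round(low, 2), nominal, round(high, 2))  # 1.7 2.0 2.3
```

    Use long-term averages as input rather than a single 15-second sample, since brief spikes are expected.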

    Note SmartStor hourly and nightly conversion times are not factored into the Overall Capacity metric; however, hourly and nightly operations do affect how much metric load the Enterprise Manager is capable of handling.

    During time periods that the Overall Capacity (%) metric spikes to high values (for example 600%), at least one of the other metrics listed above should also show a spike. Investigating and understanding the source of the secondary spike might help pinpoint the root cause of the resource issue.

    For example, the problem might be found by looking at the Heap Capacity (%) metric, which feeds into Overall Capacity (%) metric. See Heap Capacity (%) metric, below.

    Heap Capacity (%) metric

    The Heap Capacity (%) metric is determined by what percentage of heap the JVM is currently using (based on the GC Heap: In Use Post GC (mb) metric).

    Note A 25% buffer remains between the point at which the Heap Capacity (%) metric reports 100% and the point at which the actual heap would be 100% full. For example, if the total heap is 1000 MB and the current heap usage is 750 MB, then this metric value is 100%. This buffer is included because Java needs heap space for normal operations.
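    The example in the note suggests that the metric compares heap in use with 75% of the current heap size. The formula below is inferred from that single example and may not match the exact internal computation. The function and parameter names are illustrative.

```python
def heap_capacity_pct(in_use_post_gc_mb, current_heap_mb, buffer_fraction=0.25):
    """Heap Capacity (%) as inferred from the example: usage vs. heap minus buffer."""
    usable_mb = current_heap_mb * (1 - buffer_fraction)
    return in_use_post_gc_mb / usable_mb * 100


print(heap_capacity_pct(750, 1000))  # 100.0
```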

    Depending on how you've set and launched the JVM with heap options, the JVM may start with a very small heap but grow it over time. The Heap Capacity (%) metric is based on the current JVM heap size, not what the heap size could become. CA Wily recommends that you set the Introscope heap settings so that heap min equals heap max.

    Troubleshooting Enterprise Manager health

    Every 15 seconds the Enterprise Manager gathers and records health metrics about itself. There are two ways you can view these metrics to troubleshoot Enterprise Manager health and performance:

    examine the Enterprise Manager health and supportability metrics in the Investigator tree. For more information see About EM health and supportability metrics on page 28.

    examine the perflog.txt file.

    Related Knowledge Base article(s):

    Perflog Values in Introscope 7.1

    The Investigator tree Enterprise Manager health and supportability metrics are easy to view and interpret, so this is the first place you should look to understand your Enterprise Manager's current health. Perflog.txt is often valuable to CA Wily Support.

    Several examples of how you can use the perflog.txt file are provided in the topics below.

    Harvest Duration

    You can find the Harvest.HarvestDuration metric value in perflog.txt, as shown in the figure below.

    Note This figure shows perflog.txt output in verbose mode. By default, perflog.txt is generated in a compacted mode.

    SmartStor Duration

    You can find the Smartstor.Duration metric value in perflog.txt as shown in the figure below.

    Note This figure shows perflog.txt output in verbose mode. By default, perflog.txt is generated in a compacted mode.

    Events and Transaction Traces

    The Enterprise Manager attempts to insert all incoming events into a Transaction Trace insert queue. The number of events in the queue at any time is shown in the Performance.Transactions.TT.Queue.Size metric.

    If the Transaction Trace insert queue is not full, an incoming event is counted by the performance.transaction.num.inserts.per.interval metric.

    If the Transaction Trace insert queue is full when a new event comes in, the event is dropped. For Introscope 8.0, you can view a new metric, Performance.Transactions.Num.Dropped.Per.Interval that shows the number of Transaction Traces that the Enterprise Manager could not handle during the interval and were dropped.

    You can find these metric values in perflog.txt, as shown in the figure below.

    If you want to know how many events the Enterprise Manager received from agents for an interval, add the value of the performance.transaction.num.inserts.per.interval metric to the value of the Performance.Transactions.Num.Dropped.Per.Interval metric.

    Although one would expect the values for the performance.transaction.num.inserts.per.interval metric and Performance.Transactions.TT.Queue.Size metric for an interval to be identical, that is generally not the case due to these factors:

    metric counts are based on frequent samples of the system

    samples of these two metrics are not taken at the same time

    the system is very active (numeric counts vary quickly and greatly)

    If, for example, at one sample time the number of inserted events is 500, this implies that the Transaction Trace insert queue should have a positive value and you would expect to see a value of 500 as well for the Performance.Transactions.TT.Queue.Size metric. However, by the time the Transaction Trace insert queue is sampled, it can be empty and record a sample number of zero.

    Additional supportability metrics

    There are a number of supportability metrics to help you monitor the health of your system and Enterprise Manager. See the table below for brief descriptions. See the Introscope Configuration and Administration Guide for more information.

    Supportability metric name: CPU Capacity (%)
    Investigator tree location: Enterprise Manager|Health
    Description: Same as EM CPU Used (%) (see below). Duplicated to easily relate to the Overall Capacity (%) metric, which now takes this metric into account.

    Supportability metric name: Number of Agents
    Investigator tree location: Enterprise Manager|Connections
    Description: The number of currently connected agents. The Enterprise Manager's perflog.txt file records and reports the number of actual agents connected in the Agent.NumberOfAgents metric value.

    Supportability metric name: EM CPU Used (%)
    Investigator tree location: Enterprise Manager|CPU
    Description: The percentage of the total available CPU that the running Enterprise Manager used during the time period specified.
    Note: This number does not reflect other processes running on the server or overall server CPU in use, but rather how much CPU the particular Enterprise Manager used. This metric is acquired from the JVM using an API introduced in JDK 1.5. Therefore, it is supported only on some platforms.

    Supportability metric name: Harvest Capacity (%)
    Investigator tree location: Enterprise Manager|Health
    Description: Percent of time needed for the data harvest in a 15000 ms (15 second) time slice, where 100% is the full 15 seconds. For example, if the data harvest takes 15000 ms, then this metric value is 100.

    Supportability metric name: Incoming Data Capacity (%)
    Investigator tree location: Enterprise Manager|Health
    Description: The capacity of the Enterprise Manager to handle incoming data, based on an internal metric that indicates the number of incoming metrics yet to be processed. This internal metric is divided by twice the total number of metrics. For example, if 150,000 metrics are in the to-be-processed queue and the Enterprise Manager has a total of 300,000 metrics, the incoming data capacity will be 25%.

    Supportability metric name: Number of Metrics
    Investigator tree location: Enterprise Manager|Connections
    Description: The metric load on an Enterprise Manager. When an agent disconnects, this number drops.

    Supportability metric name: SmartStor Capacity (%)
    Investigator tree location: Enterprise Manager|Health
    Description: Percent of time needed for the SmartStor write process in a 15000 ms (15 second) time slice, where 100% is the full 15 seconds. For example, if the SmartStor write duration is 15000 ms, then this metric value is 100.

    Supportability metric name: Write Duration (ms)
    Investigator tree location: Data Store|SmartStor|MetaData
    Description: The portion of the SmartStor Capacity (%) metric time (see above) spent writing metadata. If this metric value doesn't change proportionately as the SmartStor Capacity (%) metric value increases or decreases, there may be an issue with the file system.
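    The percentage arithmetic behind the capacity metrics in the table can be sketched as follows. This is a minimal illustration based only on the descriptions above; the function names are hypothetical and are not part of Introscope.

```python
TIME_SLICE_MS = 15000  # 15-second time slice described in the table


def time_slice_capacity(duration_ms):
    """Percent of a 15000 ms time slice used (Harvest or SmartStor Capacity)."""
    return 100.0 * duration_ms / TIME_SLICE_MS


def incoming_data_capacity(queued_metrics, total_metrics):
    """Queued metrics divided by twice the total metric count, as a percent."""
    return 100.0 * queued_metrics / (2 * total_metrics)


print(time_slice_capacity(15000))              # 100.0
print(incoming_data_capacity(150000, 300000))  # 25.0
```

    Both printed values match the worked examples given in the metric descriptions.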

  • SmartStor overview

    Introscope 7.1 included significant optimizations in disk read/write synchronization that take advantage of a dedicated SmartStor disk. All performance improvements and sizing increases starting with Introscope 7.1 depend on those optimizations.

    SmartStor first writes to disk the data that agents send to the Enterprise Manager/Collector, and performs all other operations after that. For example, if 10 users are running large historical queries (over 1000 metrics/query) at the same time, an Enterprise Manager performs more slowly. The users experience sluggish Workstation response times because SmartStor is simultaneously writing new agent metric data, running extensive user queries, doing reports, and converting files to the faster query file format. The Workstation queries are slow (or metric data is aggregated) because the disk is overloaded.

    About SmartStor spooling and reperiodization

    SmartStor writes live incoming data to disk in a spool format that is fast to write, but slow to query. Every hour, at the top of the hour, SmartStor takes the spool file from the previous hour and reformats the file into a SmartStor data file. The SmartStor data file, which is faster and easier to search than the spool file, optimizes historic query responses. This Introscope process, which is referred to as spool to data conversion (or conversion), typically takes 10 minutes. However, conversion times vary across hardware because of differences in memory, CPU power, and disk read/write speeds. A conversion time longer than 10 minutes is a potential warning sign of an overloaded Enterprise Manager. Most importantly, the conversion time should not be getting longer every hour. This is a sure sign that the system is becoming overloaded and often indicates metric creep, in which the number of registered metrics being reported by agents is continually increasing.
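    The two warning signs described above (a conversion longer than 10 minutes, and conversion times that grow every hour) can be checked with a short script. The sketch below is illustrative only: the function name is hypothetical, the durations are assumed to have been collected by hand from the Enterprise Manager log or the Investigator, and the requirement of at least three samples before reporting a trend is an assumption, not an Introscope rule.

```python
CONVERSION_WARNING_MINUTES = 10  # typical conversion time given in the text


def assess_conversions(durations_min):
    """Return warnings for a chronological list of hourly conversion times."""
    warnings = []
    if any(d > CONVERSION_WARNING_MINUTES for d in durations_min):
        warnings.append("conversion longer than 10 minutes")
    # Strictly increasing over every consecutive hour suggests growing load.
    if len(durations_min) >= 3 and all(
        later > earlier
        for earlier, later in zip(durations_min, durations_min[1:])
    ):
        warnings.append("conversion time is getting longer every hour")
    return warnings


print(assess_conversions([6, 7, 9, 12]))  # both warnings
print(assess_conversions([8, 7, 8, 7]))   # []
```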

    The most common cause of excessively long SmartStor spool to data conversion times is a file cache size that is too small to perform the required operations. This situation can be addressed by adding more physical memory. The conversion process is usually the first process to show problems if SmartStor is not using a dedicated I/O subsystem.

    SmartStor reperiodization is the process by which archived data files are compressed to reduce the total size of the SmartStor directory. Reperiodization is performed in two stages after midnight by default. For information about how to configure this multi-tier reperiodization, see the Introscope Configuration and Administration Guide.

  • Reperiodization is both I/O and CPU intensive, as the data archive files are read, the data is compacted by aggregating multiple time slices, and then the resulting data is written back to SmartStor. This means that the period after midnight is the busiest time for an Enterprise Manager. The entire reperiodization process should not take more than two hours. During this time, no other Enterprise Manager operation such as report generation (see Report generation and performance on page 43) or OS-level operation should be scheduled.

    Note: If the Enterprise Manager is stopped in the middle of reperiodization, it will, upon restart, delete the partially written files and restart reperiodization after 45 minutes. This restart may not occur during the regularly scheduled reperiodization time. The 45-minute delay allows the system to register all its agents and metrics before launching the restart of this compute-intensive reperiodization task.

    SmartStor spooling and reperiodization can be verified in the Enterprise Manager log in verbose mode, which records that the spooling process starts at the top of the hour. Under standard conditions, within 10 minutes, a second recorded message reports that the spooling process has completed. In addition there are three SmartStor management metrics, which you can find at this location in the Investigator tree:

    Custom Metric Host (Virtual)| Custom Metric Process (Virtual)| Custom Metric Agent (Virtual)(*SuperDomain*)| Enterprise Manager | Data Store | SmartStor | Tasks.

    As shown in the figure below, the three tasks that are monitored are:

    Spool to Data Conversion

    Data Appending

    Reperiodization

  • These tasks have metric values that oscillate from 0 to 1 when the respective task is running. You can see when those tasks are running and how long they are taking by selecting a task in the tree, then picking an appropriate time from the Time Range drop down list in the Viewer pane.
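    If the values of one of these task metrics are exported for a time range, the length of each run can be derived from the sequence of 0 and 1 values. The following is a minimal sketch; the function name is hypothetical, and the 15-second spacing between data points is an assumption that you should replace with the interval of your own data.

```python
def task_run_lengths(values, interval_s=15):
    """Return the length in seconds of each run of consecutive 1 values."""
    runs = []
    current = 0
    for v in values:
        if v == 1:
            current += interval_s
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)  # task was still running at the end of the range
    return runs


samples = [0, 1, 1, 1, 1, 0, 0, 1, 1]
print(task_run_lengths(samples))  # [60, 30]
```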

    Top of the hour problems are generally related to slow SmartStor spooling. Early morning (after 6 A.M.) problems are usually due to reperiodization not being completed quickly enough. This usually implies that the Enterprise Manager is excessively loaded. For more information, see EM OS disk file cache memory requirements on page 47.

  • Report generation and performance

    Generating Introscope reports is very expensive in terms of CPU and disk access. The cost is primarily based on two factors:

    the number of graphs (total amount of data)

    the report time period (historical range)

    Reports that are either larger than 50 graphs or longer than 24 hours should not be scheduled during the hours when SmartStor is reperiodizing (usually midnight to 3:00 A.M.) because of high CPU activity and the large amount of disk activity.
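    The scheduling guideline above can be expressed as a simple check. This is a sketch only: the function name is hypothetical, the window boundaries are the usual midnight to 3:00 A.M. period mentioned in the text, and only the report start time is tested, so a report that starts before midnight and runs into the window is not detected.

```python
from datetime import time

REPERIODIZATION_START = time(0, 0)  # midnight
REPERIODIZATION_END = time(3, 0)    # 3:00 A.M., the usual end given in the text


def should_reschedule(graph_count, range_hours, start):
    """True if a large report starts inside the reperiodization window."""
    is_large = graph_count > 50 or range_hours > 24
    in_window = REPERIODIZATION_START <= start < REPERIODIZATION_END
    return is_large and in_window


print(should_reschedule(60, 12, time(1, 30)))  # True
print(should_reschedule(10, 12, time(1, 30)))  # False
```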

    Concurrent historical queries and performance

    The best way to avoid disk performance problems from historical queries is to have most Introscope Workstation users view data in Live mode. Use Historical mode only for in-depth analysis, like troubleshooting and reports. On systems under heavy metric load, make sure that users are not all attempting to perform historical queries (which access the SmartStor historical archive) at the same time. CA Wily recommends a maximum of four concurrent historical queries, although this limit may differ depending on the performance of your hardware. You should also be aware that this limit decreases during spool-to-data file conversion at the top of each hour, and at midnight during reperiodization.

    You can also set the introscope.enterprisemanager.query.datapointlimit property in the EnterpriseManager.properties file to specify a maximum number of metric data points the Enterprise Manager will return from any single query. This read clamp ensures that user queries that accidentally match too much metric data do not negatively impact system performance.

    About SmartStor and flat file archiving

    Flat file archiving is an alternate format that can be used for metric data storage instead of SmartStor. Unlike SmartStor, the flat file format writes the data in readable ASCII format, which takes considerably more space than SmartStor's format. If you use flat file archiving, you have the option of configuring flat file data to be gzipped. This reduces the amount of disk space needed considerably, but is CPU intensive to write and extremely CPU intensive to read.

    CA Wily has three recommendations about SmartStor and flat file archiving.

  • First, avoid using SmartStor and flat file archiving at the same time. Flat file archiving duplicates some of the functionality of SmartStor. In addition, flat file archiving's compression feature (if enabled) requires noticeable CPU resources that can adversely affect the Enterprise Manager's performance when the compression feature periodically runs. In the event that flat file archiving must be used, configure the smallest possible number of metrics to be logged.

    Second, do not use flat file archiving in production. Readable metric values are most useful in a QA debug environment.

    Third, SmartStor should not be located on the same disk as a flat file archive. SmartStor should be on its own dedicated disk. For more information, see SmartStor settings and capacity on page 55.

    MOM overview

    MOMs are CPU intensive, in contrast to Collectors, which are I/O and CPU intensive. For more information about MOM requirements, see MOM and Collector EM requirements on page 51 and Collector and MOM settings and capacity on page 58.

    Collector overview

    Collectors are I/O intensive, and perform most of Introscope's difficult and intensive calculation processing work.

    Cluster performance is dominated by the Collectors. Given the synchronous communication model between MOM and Collectors, the responsiveness of a MOM (in terms of data refresh to the Workstation) is related to the responsiveness of the Collectors. Any performance problems causing response problems in the Collector will be magnified by the MOM. For more information, see Collector to MOM clock drift limit on page 71.

    If upgrading a Collector from 6.x to 8.0, as long as there is a dedicated disk for SmartStor and Boundary Blame is turned on, there should be enough resources left over on the same host to handle the new functionality including metric baselining (heuristics) and creating virtual agents. If you need to migrate a 6.x Enterprise Manager to become an 8.0 Collector, see

    Related Knowledge Base article(s):

    Migrating a 6.x Enterprise Manager to an 8.0 Collector (KB 1630)

  • Collector metric capacity and CPU usage

    If a Collector is at maximum capacity, as shown in the Sample Introscope 8.0 Collector sizing limits table on page 119, you may look at the CPU and find that the system doesn't appear busy. You may wonder why Introscope requirements don't allow adding more metrics or agents to the system.

    The reason is that CPU monitoring tools show a snapshot. The behavior of the Collector is 100% CPU usage for 3-4 seconds (at full load), and then idle until the next agent data processing. This happens every 7.5 seconds, which is how the 45% average CPU utilization recommendation is derived. The initial 3-4 seconds is the harvest time, recorded as the Harvest Duration metric and it must be less than 4 seconds. For more information about the Harvest Duration metric, see Enterprise Manager health on page 27.

    The time between harvests allows the Collector to service Workstations, perform Transaction Traces, and handle SmartStor spooling and reperiodization. Unless you're looking at a high resolution CPU/Memory/I/O trace of the Collector between 12:00 midnight and 3:00 A.M., you can't get a true picture of a Collector's resource usage.

    At midnight the usage pattern of everything the Collector does changes dramatically because it's about to start reperiodization. At that point, the Collector gets very busy and typically CPU utilization jumps to 80% to 90%.

    Also, if your CPU monitoring tool is sampling or averaging CPU snapshots over an interval longer than one second, you may not see the intense activity spikes that can cause the Collector to back up and run into problems.
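    The effect of the sampling interval can be illustrated with a small simulation of the burst pattern described above. This is a sketch under stated assumptions: the function names are hypothetical, and the 3.375-second burst is simply a value within the 3-4 second range that yields a 45% duty cycle over a 7.5-second period (the guide does not state how the 45% figure was calculated).

```python
def utilization_series(busy_s, period_s, total_s, step_s):
    """CPU percent per step: 100 while harvesting, 0 while idle."""
    steps = int(total_s / step_s)
    return [100.0 if (i * step_s) % period_s < busy_s else 0.0
            for i in range(steps)]


def window_averages(series, step_s, window_s):
    """Average the series over consecutive windows of window_s seconds."""
    n = int(window_s / step_s)
    return [sum(series[i:i + n]) / n
            for i in range(0, len(series) - n + 1, n)]


STEP = 0.125  # simulation resolution in seconds
series = utilization_series(3.375, 7.5, 60, STEP)

print(max(window_averages(series, STEP, 1)))  # 100.0
print(window_averages(series, STEP, 15))      # [45.0, 45.0, 45.0, 45.0]
```

    With one-second windows the full 100% bursts are visible, while 15-second averages show only a flat 45%, which is why a coarse monitoring interval can make a fully loaded Collector look idle.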

    There are certain operations that can easily saturate the Collector's CPU, such as Transaction Tracing, large numbers of connected Workstations, large numbers of events, large historical queries, and large reports. The Collector must have additional headroom in order to handle those peaks of activity, or else it will fall behind in its processing tasks, resulting in undesirable system behavior.

    While a Collector's CPU usage may not look busy at one point in time, it will look busy if you turn on a large Transaction Trace or if you connect 10 more Workstations, or run a big historical query. That's why CA Wily recommends so much additional CPU headroom.

    On average, you can't have any more than 40% steady state usage because there are too many other operations that can immediately cause the Collector to use 100% CPU. At that point you'll start to see Workstation sluggishness and combined time slices.

  • About the CPU Overview tab

    By viewing the CPU Overview tab you can assess agent CPU health and performance-related statistics in one centralized location.

    To view the CPU Overview tab

    1 Select the CPU node under the agent.

    2 Click the Overview tab in the right pane.

    Study the CPU Utilization graph as shown in the figure below.

  • Enterprise Manager basic requirements

    There are several basic requirements for every Enterprise Manager.

    Typically an Enterprise Manager needs 2 to 4 CPUs depending on the hardware platform.

    More CPUs will not improve performance. An Enterprise Manager with fewer CPUs than recommended performs poorly.

    All Enterprise Managers need a minimum of 3 GB OS RAM to effectively run at anything close to full load.

    EVERY Collector Enterprise Manager must have a dedicated disk I/O subsystem for SmartStor with no other processes competing for it.

    After those basic requirements, system performance is determined by the speed of the CPUs, the speed of the I/O subsystems, and the file cache performance.

    WARNING The recommendations for maximum metrics/Enterprise Manager, agents/Enterprise Manager, physical memory, and so on, should be strictly followed. If you are seeing less CPU utilization than the recommended maximum threshold (at full metrics load), it is NOT a reason to add additional load (above CA Wily recommendations) to the Collector. In general, metrics load is highly I/O bound rather than CPU intensive, so even with CPU cycles available, the Enterprise Manager can get I/O bound on metric data and the whole system can start slowing down.

    Enterprise Manager file system requirements

    Make sure that the file system used for the Enterprise Manager files baselines.db and traces.db is on a local disk and not a network file system (NFS). Otherwise, serious performance degradation can result.

    EM OS disk file cache memory requirements

    How much OS memory does each Enterprise Manager need? At full load, it's typically 1.5 GB of JVM heap space allocated to the Enterprise Manager process in JVM properties, but on top of that there must be OS memory (physical RAM) with at least another 1 GB free over and above the requirements for the OS. The CA Wily recommendation is a minimum of 3 GB for a system running an Enterprise Manager, and preferably 4 GB.

    Note: If you are running a MOM near maximum capacity (for example, a 5 million metric cluster or 1 million subscribed MOM metrics), the MOM must run on a 64-bit JVM with a 12 GB heap size. The machine must have physical RAM of at least 14 GB. For more information, see Configuring a cluster to support 1,000,000 MOM metrics on page 61.

  • If your hardware allows it, CA Wily recommends running the OS in 64-bit mode to take advantage of the large file cache. The file cache is important for the Enterprise Manager when doing SmartStor maintenance like spooling and reperiodization. This cache resides in physical RAM, and is dynamically adjusted by the OS during runtime based on available physical RAM. Therefore, our recommendation is for 4 GB RAM.

    As general guidance, each Enterprise Manager should have about 1.5 GB of OS file cache available in its memory.

    Top of the hour problems are usually related to SmartStor spooling, and are best addressed by additional physical memory, especially disk file cache. The biggest single influencing factor for SmartStor spooling is the file cache size. Typically, 32-bit Windows allows a file cache just under 1 GB, and typically SmartStor spooling files for a full load are closer to 2 GB. That difference in size causes performance pressure. In providing a larger OS file cache, you are providing a large enough Enterprise Manager file cache to allow the OS to read the entire spool file into memory, then process the profile and dump it straight back out into the SmartStor archive as a data file.

    Enterprise Manager heap sizing

    The appropriate Enterprise Manager heap settings depend on your Enterprise Manager OS, hardware, and the metric load. The Enterprise Manager GC parallel flag you'll need to set also depends on the Enterprise Manager OS version.

    In the heap settings examples below, note that when the total number of metrics that the Enterprise Manager monitors changes, the heap settings also change.

    Enterprise Manager hardware (OS version): 2x2.8 GHz Xeon HT (Win 2K Adv Server)
    RAM (GB): 2
    Total metrics monitored: 90,000
    Example Enterprise Manager GC flag settings: lax.nl.java.option.additional=-server -Xms512m -Xmx512m -showversion -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+DisableExplicitGC -XX:NewSize=128m -XX:MaxNewSize=128m -XX:PermSize=64m

    Enterprise Manager hardware (OS version): 2x2.8 GHz Xeon HT (Win 2K Adv Server)
    RAM (GB): 3
    Total metrics monitored: 210,000
    Example Enterprise Manager GC flag settings: lax.nl.java.option.additional=-server -Xms800m -Xmx800m -showversion -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+DisableExplicitGC -XX:NewSize=128m -XX:MaxNewSize=128m -XX:PermSize=64m
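    When reviewing settings like these, it can help to extract the heap sizes from the option line and compare them with the machine's RAM. The following is a minimal sketch: the function name is hypothetical, and the parser only understands sizes written with the m (megabyte) suffix used in the examples above.

```python
import re


def heap_settings_mb(option_line):
    """Extract -Xms and -Xmx (in MB) from a lax.nl.java.option.additional line."""
    sizes = {}
    for flag in ("Xms", "Xmx"):
        match = re.search(r"-%s(\d+)m\b" % flag, option_line)
        if match:
            sizes[flag] = int(match.group(1))
    return sizes


line = ("lax.nl.java.option.additional=-server -Xms512m -Xmx512m -showversion "
        "-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+DisableExplicitGC "
        "-XX:NewSize=128m -XX:MaxNewSize=128m -XX:PermSize=64m")
print(heap_settings_mb(line))  # {'Xms': 512, 'Xmx': 512}
```

    In both examples the initial and maximum heap sizes are equal, so the extracted values match.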

  • If you are operating a high-performance Introscope environment, contact CA Wily Professional Services for the appropriate Enterprise Manager JVM heap settings.

    SmartStor requirements

    Each EM requires SmartStor on a dedicated disk or I/O subsystem

    In Introscope 7, significant performance improvements were made in SmartStor that freed up CPU resources for other features such as virtual agents, calculators, Transaction Tracing and sampling, and applications with associated heuristic calculations (baselining). What matters to SmartStor is concurrent I/O throughput and how many disk spindles are servicing those requests. Having SmartStor on a second dedicated disk is required to take advantage of these enhancements.

    Point the SmartStor location to a dedicated disk or disk array that is separate from the one holding the Transaction Event database (traces.db) and the metrics baseline (heuristics) database (baselines.db). Verify that the SmartStor file persistence is actually going to that different disk. Ensuring that the SmartStor data directory is on its own disk is the top solution to many Introscope performance issues.
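    One quick way to catch an obvious misconfiguration is to compare the file system device of the SmartStor directory with that of the other database files. The sketch below is not an Introscope tool; the function name is hypothetical. Note the limits of the check: a shared device proves the paths share a file system, but different devices can still be partitions of the same physical disk, so a passing result does not replace checking the storage layout.

```python
import os


def on_same_device(path_a, path_b):
    """True if both existing paths live on the same file system device."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


# Trivial self-check: a directory is always on the same device as itself.
print(on_same_device(os.getcwd(), os.getcwd()))  # True
```

    In practice you would pass the SmartStor data directory as one argument and the location of traces.db or baselines.db as the other, and treat a True result as a problem.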

    When SmartStor is not on its own dedicated disk, the first indication that there is a problem is when there are SmartStor spooling problems. For more information, see About SmartStor spooling and reperiodization on page 40.

    Note: For information about a spreadsheet to help you determine your SmartStor disk requirements, see the Introscope Configuration and Administration Guide.