Performance Management (Best Practices) REF:www.cisco.com Document ID 15115

Upload: vernon-walters

Post on 22-Dec-2015

Introduction

• Performance management involves optimizing network response time and managing the consistency and quality of individual and overall network services.

• The most important performance management service is measuring user/application response time.

• For most users, response time is the critical performance success factor.

Background (1)

• Performance problems often correlate with the capacity of resources (CPU, RAM, bandwidth).
– In networks, this is typically bandwidth and data that must wait in queues before it can be transmitted through the network.
– In voice applications, this wait time almost certainly impacts users because factors such as delay and jitter affect the quality of the voice call.

Performance management issues

• User performance
• Application performance
• Capacity planning
• Proactive fault management

• It is important to note that with newer applications such as video and voice, performance management is the key to success.

Performance management process flow (1)

1. Develop a network management concept of operation

2. Measure performance

3. Perform a proactive fault analysis

Performance management process flow (2)

1. Develop a network management concept of operation
– Define the required features: services, scalability objectives
– Define availability and network management objectives
– Define performance SLAs and metrics
– Define SLAs

Performance management process flow (3)

2. Measure performance
– Gather network baseline data
– Measure availability
– Measure response time
– Measure accuracy
– Measure utilization
– Capacity planning

Performance management process flow (4)

3. Perform a proactive fault analysis
– Use thresholds for proactive fault management
– Network management implementation
– Network operation metrics

Develop a network management concept of operation

• The purpose of the concept of operation document is to describe the overall desired system characteristics from an operational standpoint.

• Its focus is to shape the long-range operational planning activities for network management and operation.

• It also provides guidance for the development of all subsequent definition documentation, such as service level agreements.

Define the required features: Services, Scalability Objectives

• Define service objectives:
– Describe the objectives that the network and its services are supposed to meet.
– This step requires that you understand applications, basic traffic flows, user and site counts, and required network services.

• Define scalability objectives:
– Help network engineers design networks that meet future growth requirements and do not experience resource constraints (media capacity, number of routes, etc.).

Define availability and network management Objectives (1)

• Defining availability objectives explains the level of service needed (service level requirements).

• This helps to ensure the solution meets end availability requirements.

• It might lead to:
– Categorizing different classes of service for each availability requirement
– A higher availability objective, which might necessitate increased redundancy and support procedures

Define availability and network management objectives (2)

• Define manageability objectives to ensure that overall network management does not lack management functionality.

• It might lead to:
– Understanding the processes and tools used by the organization
– Uncovering all important MIB or network tool information required to support a potential network
– Providing the training required to support the new network service

Define performance SLAs and Metrics

• Performance SLAs and metrics help define and measure the performance of new network solutions to ensure they meet performance requirements.

• The performance SLAs should include the average expected volume of traffic, the peak volume of traffic, the average response time, and the maximum response time allowed.
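
The four SLA quantities listed above can be held in a small record and compared against measured samples. The following is only a sketch: the class name, field names, units (Mbit/s and milliseconds) and the wording of the violation messages are assumptions made for illustration, not part of the referenced document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerformanceSla:
    """The four targets named on the slide; units are assumed for this sketch."""
    avg_traffic_mbps: float   # average expected volume of traffic
    peak_traffic_mbps: float  # peak volume of traffic
    avg_response_ms: float    # average response time
    max_response_ms: float    # maximum response time allowed


def sla_violations(sla, traffic_mbps, response_ms):
    """Return a list of violated targets for the given measurement samples."""
    if not traffic_mbps or not response_ms:
        raise ValueError("need at least one traffic and one response sample")
    violations = []
    if sum(traffic_mbps) / len(traffic_mbps) > sla.avg_traffic_mbps:
        violations.append("average traffic above expected volume")
    if max(traffic_mbps) > sla.peak_traffic_mbps:
        violations.append("peak traffic above expected peak")
    if sum(response_ms) / len(response_ms) > sla.avg_response_ms:
        violations.append("average response time above target")
    if max(response_ms) > sla.max_response_ms:
        violations.append("maximum response time exceeded")
    return violations
```

An empty list means every sample set stayed within the agreed targets.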

Define SLAs (1)

• SLA (Service Level Agreement): customer (enterprise) side; SLM (Service Level Management): provider side

• SLAs include definitions for problem types and severity, and help desk responsibilities:
– Escalation path, and time before escalation at each tier of support
– Time to start work on the problem
– Target time to close, based on priority
– Services to provide in the areas of capacity planning and hardware replacement

Measure Performance

• Gather network baseline data
– Perform a baseline of the network before and after a new solution deployment
– A typical router/switch baseline report includes capacity issues related to CPU, memory, buffers, link/media utilization, and throughput
– Application baseline: bandwidth used by the application per time period

Measure availability

• Availability is the measure of time for which a network system or application is available to a user.
– Coordinate the help desk phone calls with the statistics collected from managed devices
– Check scheduled outages
– etc.
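
As a rough illustration, availability over a reporting period can be expressed as a percentage. The sketch below assumes that scheduled outages are excluded from the service window, which is one common convention and not something the slide states; the function and parameter names are hypothetical.

```python
def availability_percent(period_s, unscheduled_outage_s, scheduled_outage_s=0.0):
    """Percentage of the service window during which the service was available.

    Assumption: scheduled outages are subtracted from the period before
    availability is computed, so they do not count against the result.
    """
    service_window = period_s - scheduled_outage_s
    if service_window <= 0:
        raise ValueError("scheduled outages cover the whole period")
    if not 0 <= unscheduled_outage_s <= service_window:
        raise ValueError("unscheduled outage must fit inside the service window")
    return 100.0 * (service_window - unscheduled_outage_s) / service_window
```

The outage durations would come from correlating help desk calls with device statistics, as the bullets above suggest.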

Measure Response Time

• Network response time is the time traffic requires to travel between two points in the network.

• Simple level: pings from the network management station to key points in the network (not accurate).

• Server-centric polling: SAA (Service Assurance Agent) on a Cisco router measures response time to a destination device.

• Generate traffic that resembles the particular application or technology of interest.
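
One step up from a plain ping is to time a TCP connection to the port the application actually uses. The sketch below does that with the Python standard library only; it is not SAA and does not reproduce what a router-based probe measures. The function names, the timeout default and the choice of TCP connect time as the metric are assumptions for illustration.

```python
import socket
import time


def tcp_connect_time_ms(host, port, timeout_s=2.0):
    """Time how long a TCP connection to host:port takes, in milliseconds."""
    start = time.perf_counter()
    # create_connection raises OSError on failure or timeout.
    with socket.create_connection((host, port), timeout=timeout_s):
        pass
    return (time.perf_counter() - start) * 1000.0


def summarize(samples_ms):
    """Minimum, average and maximum of a non-empty list of samples."""
    return {
        "min": min(samples_ms),
        "avg": sum(samples_ms) / len(samples_ms),
        "max": max(samples_ms),
    }
```

Collecting several samples per destination and summarizing them gives the average and maximum response times that a performance SLA would be checked against.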

Measure accuracy

• Accuracy is the measure of interface traffic that does not result in errors, and can be expressed as a percentage.

• Accuracy = 100 – error rate

• Error rate = (ΔifInErrors × 100) / (ΔifInUcastPkts + ΔifInNUcastPkts), where Δ is the change in each counter over the polling interval
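
The accuracy formula can be sketched as two small functions. Because the SNMP interface counters are cumulative, the inputs here are taken to be the differences between two polls; the function names and the decision to report a 0% error rate for an idle interval are assumptions of this sketch.

```python
def error_rate_percent(d_in_errors, d_in_ucast_pkts, d_in_nucast_pkts):
    """Inbound error rate in percent, from counter deltas over one interval."""
    total_pkts = d_in_ucast_pkts + d_in_nucast_pkts
    if total_pkts == 0:
        # Assumption: with no packets in the interval, report no errors.
        return 0.0
    return d_in_errors * 100.0 / total_pkts


def accuracy_percent(d_in_errors, d_in_ucast_pkts, d_in_nucast_pkts):
    """Accuracy = 100 - error rate."""
    return 100.0 - error_rate_percent(d_in_errors, d_in_ucast_pkts, d_in_nucast_pkts)
```

For example, 5 errors against 900 unicast and 100 non-unicast packets gives an error rate of 0.5% and an accuracy of 99.5%.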

Measure Utilization (1)

• Utilization measures the use of a particular resource over time.

• It is a percentage in which the usage of a resource is compared with its maximum operational capacity.

• High utilization is not necessarily bad.

• A sudden jump in utilization can indicate an abnormal condition.

Measure Utilization (2)

• Input utilization = (ΔifInOctets × 8 × 100) / ((time in seconds) × ifSpeed)

• Output utilization = (ΔifOutOctets × 8 × 100) / ((time in seconds) × ifSpeed)

where Δ is the change in the octet counter over the measured time.
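
A minimal sketch of the utilization calculation follows. It assumes the octet count passed in is the difference between two polls of the counter, that `ifSpeed` is in bits per second, and that the counter is 32 bits wide and wraps at most once between polls; the function names are hypothetical.

```python
def counter_delta(previous, current, bits=32):
    """Difference between two counter readings, allowing for a single wrap."""
    return (current - previous) % (2 ** bits)


def utilization_percent(delta_octets, interval_s, if_speed_bps):
    """Link utilization in percent for one direction over one interval."""
    if interval_s <= 0 or if_speed_bps <= 0:
        raise ValueError("interval and interface speed must be positive")
    # Octets are converted to bits (x8) and scaled to a percentage (x100).
    return delta_octets * 8 * 100.0 / (interval_s * if_speed_bps)
```

With 3,750,000 octets received in 300 seconds on a 10 Mbit/s interface, this yields 1% input utilization. The same function serves for output utilization when fed the `ifOutOctets` delta.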

Capacity planning

• The following are potential areas for concern:
– CPU
– Backplane or I/O
– Memory
– Interface and pipe sizes
– Queuing, latency and jitter
– Speed and distance
– Application characteristics

Perform a Proactive fault analysis

• One method to perform fault management is through the use of the RMON alarm and event groups.

• Another is a distributed management system that enables polling at a local level, with aggregation of data at a manager of managers.

Use threshold for proactive fault management (1/2)

• A threshold is a point of interest in a specific data stream; an event is generated when the threshold is triggered.

• There are two classes of threshold for numeric data:
– Continuous thresholds apply to continuous or time series data, such as data stored in SNMP counters or gauges
– Discrete thresholds apply to enumerated objects or discrete numeric data, such as Boolean objects

Use threshold for proactive fault management (2/2)

• There are two forms of continuous threshold:
– Absolute: use with gauges
– Relative (delta): use with counters

• Steps to determine thresholds:
– 1. Select the objects
– 2. Select the devices and interfaces
– 3. Determine the threshold values for each object or interface
– 4. Determine the severity for the event generated by each threshold
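
The difference between absolute and relative (delta) thresholds, and the four steps above, can be sketched as follows. This is an illustration only: the record layout, the event dictionary and the rule that a threshold fires when the observed value is greater than or equal to the limit are assumptions, not the behavior of any particular RMON implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    obj: str       # step 1: the monitored object (name is up to the operator)
    target: str    # step 2: the device or interface
    kind: str      # "absolute" for gauges, "relative" for counter deltas
    value: float   # step 3: the threshold value
    severity: str  # step 4: severity of the generated event


def check(threshold, previous, current):
    """Return an event dict if the threshold is crossed, otherwise None."""
    if threshold.kind == "absolute":
        observed = current             # gauges are compared directly
    elif threshold.kind == "relative":
        observed = current - previous  # counters are compared by their change
    else:
        raise ValueError("kind must be 'absolute' or 'relative'")
    if observed >= threshold.value:
        return {
            "object": threshold.obj,
            "target": threshold.target,
            "observed": observed,
            "severity": threshold.severity,
        }
    return None
```

A CPU gauge would use an absolute threshold, while an error counter such as `ifInErrors` would use a relative one so that only the growth between polls is judged.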

Network management implementation

• The organization should have an implemented network management system.

• SNMP/RMON or other network management system tools

Network operation metrics (1/2)

• Number of problems that occur, by call priority

• Minimum, maximum and average time to close in each priority

• Breakdown of problems by problem type (hardware, software crash, configuration, power, user error)

Network operation metrics (2/2)

• Breakdown of time to close for each problem type

• Availability by availability or SLA

• How often you met or missed SLA requirements
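
Several of these operation metrics can be derived from a list of closed trouble tickets. The sketch below assumes each ticket is a dictionary with hypothetical keys `priority`, `type`, `hours_to_close` and `sla_hours`; a real help desk system will have its own schema.

```python
from collections import defaultdict


def time_to_close_by(tickets, key):
    """Count plus min/max/average time to close, grouped by the given key."""
    groups = defaultdict(list)
    for ticket in tickets:
        groups[ticket[key]].append(ticket["hours_to_close"])
    return {
        name: {
            "count": len(hours),
            "min": min(hours),
            "max": max(hours),
            "avg": sum(hours) / len(hours),
        }
        for name, hours in groups.items()
    }


def sla_met_ratio(tickets):
    """Fraction of tickets closed within their SLA target."""
    if not tickets:
        raise ValueError("no tickets to evaluate")
    met = sum(1 for t in tickets if t["hours_to_close"] <= t["sla_hours"])
    return met / len(tickets)
```

Calling `time_to_close_by(tickets, "priority")` gives the per-priority figures, and `time_to_close_by(tickets, "type")` gives the breakdown by problem type.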