
1

Capacity Planning Case Studies

2

Case 1: The Organization: An Online Travel Agency

• Offers a variety of travel services, including fare finder, hotel and car rental information, reservations, and destination information.

• Over 5.7 million unique visitors monthly

• The site's visitors view an average of 3.3 unique pages per day.

• Visitors to the site spend roughly 51 seconds on each pageview and a total of five minutes on the site during each visit.

• Average load time was slow (2.29 seconds), and average response time was 20-50 s (slow)

• Business people constantly complained to IT that the load and response times were not acceptable (their requirements changed all the time because of competitors)

• There was never any capacity planning approach

• Performance tuning was performed by the vendors on the trouble areas, one at a time

• Processes were all reactive, and actions were all temporary

• Fixing a problem led to other problems

• IT services had to be optimized to meet the needs of the business

3

Their Capacity Planning Approaches

• A process was introduced to the IT team. The main steps of the methodology were:
– Characterizing the Business Case
– Functional Analysis
– Characterizing the User Behavior
– Characterizing the IT Infrastructure
– Characterizing the Workload
– Performance Model Development
– Performance Prediction
– Cost Modeling

4

Process Issues

• They had been given servers that were all sized by the vendors

• Vendors had always done the performance tuning

• The IT team didn’t have the time, skills, and patience to do full-fledged capacity planning

• So the process was reduced to smaller steps

5

Simple Capacity Planning Model

Three steps for capacity planning were taken for simplicity:

1. Determine Service Level Requirements
• The first step was to categorize the work done by systems and to quantify users’ expectations for how that work gets done.

2. Analyze Current Capacity
• Next, the current capacity of the system was analyzed to determine how it was meeting the needs of the users.

3. Planning for the Future
• Finally, using forecasts of future business activity, future system requirements were determined. Implementing the required changes in system configuration would ensure that sufficient capacity would be available to maintain service levels, even as circumstances change in the future.

6

Other Issues

• Translating business plans to technology
– Difficult: IT people didn’t have the skills to listen to business people, and business people didn’t understand the IT terms

• Benchmarking (load testing)
– Was hard because of the business people’s perspective on working systems

• Trending (linear trend analysis, statistical approaches)
– Was never done

• Modeling (analytic modeling, simulation modeling)
– The team didn’t know what it was

7

Proactive Resource Management Approach

• Ensure service delivered matches expectations

• Ensure service quality not impacted by cost-saving consolidation efforts

• Determine impact of growth changes to existing infrastructure plans

8

Determine Service Level Requirements

• The overall process of establishing service level requirements first demanded an understanding of workloads.

• We began by looking at workloads on a system running a back-end Oracle database.

• Before setting service levels, we needed to determine what unit we would use to measure the incoming work.

9

What a Service Level Agreement (SLA) Is

• An SLA sets the expectations between the consumer and provider. It helps define the relationship between the two parties. It is the cornerstone of how the service provider sets and maintains commitments to the service consumer.

• A good SLA addresses five key aspects:• What the provider is promising.• How the provider will deliver on those promises.• Who will measure delivery, and how.• What happens if the provider fails to deliver as promised.• How the SLA will change over time.

10

A Simple SLA Template

• 1.0 Statement of Intent
– This section states the objectives of the document.
• 1.1 Approvals
– All parties must agree on the SLA. This section contains a list of who approved the SLA.
• 1.2 Review Dates
– This section contains the track record of the SLA reviews.
• 1.3 Time and Percent Conventions
– This section contains the descriptions of what time conventions and metrics are being used.
• 2.0 About the Service
– This section introduces the service addressed by this SLA.
• 2.1 Description
– This section describes the service in detail.
• 2.2 User Environment
– This section describes the architecture and technologies that are used by the consumers of the service.
• 3.0 About Service Availability
– This section introduces the availability concepts used in this SLA.
• 3.1 Normal Service Availability Schedule
– This section describes what is considered normal service availability.
• 3.2 Scheduled Events That Impact Service Availability
– This section describes what scheduled outages are to be expected.
• 3.3 Non-emergency Enhancements
– This section describes the process that inserts enhancements into the infrastructure.
• 3.4 Change Process
– This section describes the complete process of how changes are introduced in the service, including the associated availability impact.
• 3.5 Requests for New Users
– This section describes the provisioning process of new users/customers.
• 4.0 About Service Measures
– This section contains a detailed description of how the service availability is measured and reported.

11

Service Level Management (SLM) Approaches

• For the company, Service Level Management was the disciplined, proactive methodology and set of procedures used to ensure that adequate levels of service are delivered to all IT users in accordance with business priorities and at an acceptable cost.

• This is what they standardized through their ITIL processes

12

Purpose of SLM

• Customer satisfaction
• Set expectations
• Define bounds of the services
• Resource management
• Cost optimization

13

Adding an SLA to an Existing Service

• Build a model of the production environment

• Predict projected growth

• Compare predictions to service level parameters

• Make necessary changes to planned provisioning

14

Service Level Metrics

• Performance

• Demand levels

• Security

• Recoverability

• Availability

15

Service Level Considerations – Model was created by IT and Business Collaboration

• Average Load Time < 2 Seconds

• Average Response Time < 10 s (response time for search transactions)

• Number of requests that should be processed within the peak hour >= 300,000

• Availability >= 99.999% (NEBS 3)

• Site has to be recoverable within 5 seconds
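To make these targets concrete, the sketch below converts the availability target into a yearly downtime budget and the peak-hour request count into an average arrival rate. The function names are illustrative, and a 365-day year with arrivals spread evenly over the peak hour is an assumption, not something stated in the case study.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: non-leap year


def downtime_budget_seconds(availability_pct, period_seconds=SECONDS_PER_YEAR):
    """Maximum downtime allowed in the period for a given availability target."""
    return (1 - availability_pct / 100) * period_seconds


def average_rate_per_second(requests_in_peak_hour):
    """Average arrival rate implied by a peak-hour request count."""
    return requests_in_peak_hour / 3600


budget = downtime_budget_seconds(99.999)     # target from the slide
peak_rate = average_rate_per_second(300_000)  # target from the slide

print(f"Downtime budget: {budget:.1f} s/year")
print(f"Peak-hour rate:  {peak_rate:.1f} requests/s")
```

Under these assumptions the budget is a little over five minutes of downtime per year, and the peak hour averages roughly 83 requests per second.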

16

Workload Definition and Classifications

• A workload is a logical classification of work performed on a system.

• If you consider all the work performed on your systems as pie, a workload can be thought of as some piece of that pie.

• Workloads can be classified by a wide variety of criteria

17

Workload

• The workload of a system can be defined as the set of all inputs that the system receives from its environment during any given period of time.

[Figure: HTTP requests arriving at a Web server]

18

Our Approach to Find out the Workloads

• Who is doing the work (particular user or department)

• What type of work is being done (order entry, financial reporting)

• How the work is being done (online inquiries, batch database backups)

19

Workload characterization

• Critical transactions: Property Search, Book a Reservation, Change a Reservation, Cancel a Reservation, Payment Processing

• Searches, submission of property via forms, requests for changes

20

Processes ran on the System

• Examining the processes running on the system over the same 24-hour period identified the individual processes that ran on it.

21

Partitioning the Workload

• Resource usage
• Applications
• Objects
• Geographical orientation
• Functional
• Organizational units
• Mode

22

Workload Characterization

• Common steps were:
– specification of a point of view from which the workload will be analyzed;
– choice of the set of relevant parameters;
– monitoring the system;
– analysis and reduction of performance data;
– construction of a workload model.

23

Workloads in the System (extracted from the processes)

• AmenityInformation
• Availability
• BookReservation
• BrandInformation
• CancelReservation
• Error
• ModifyReservation
• MultiAvailability
• POISearch
• PropertyInformation
• PropertySearch
• RateRules
• PaymentProcessing

24

List of Workloads

• All the processes were attributed to one of four (4) workloads. These workloads are defined according to the type of work being done on the system.

• All we did was define workloads based on the type of work being performed on this server.

25

Clustering Analysis

[Figure: scatter plot of service demands, with CPU time on the horizontal axis (0 to 1.5) and I/O time on the vertical axis (0 to 1.6)]
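The slide does not say which clustering algorithm was used. As one possible illustration, the sketch below groups transactions by their (CPU time, I/O time) service demands with a basic k-means loop. The data points and the starting centroids are hypothetical and are not taken from the case study.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Basic k-means: assign each point to its nearest centroid, then recompute centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical (CPU time, I/O time) service demands, in seconds.
demands = [(0.01, 0.10), (0.02, 0.12), (0.10, 0.60),
           (0.12, 0.70), (0.90, 1.20), (1.00, 1.30)]
# Hypothetical starting centroids, fixed so the run is repeatable.
start = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]

centroids, clusters = kmeans(demands, start)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

Each resulting cluster is a candidate workload class: transactions with similar CPU and I/O demands end up together.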

26

Workload Partitioning

Workload Class   Workload Type
1                BrandInformation, Error
2                CancelReservation, MultiAvailability, RateRules
3                BookReservation, Availability, ModifyReservation, PaymentProcessing
4                AmenityInformation, POISearch, PropertyInformation, PropertySearch

27


Workload Class   Frequency   Max CPU time (msec)   Max I/O time (msec)
1                40%         8                     120
2                30%         20                    300
3                20%         100                   700
4                10%         900                   1200
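One way to summarize this table is a frequency-weighted service time per request. The sketch below is illustrative only: because the table lists maximum times per class, the result is an upper bound on the average demand, not a measured mean.

```python
# Class -> (frequency, max CPU msec, max I/O msec), copied from the table above.
WORKLOAD_CLASSES = {
    1: (0.40, 8, 120),
    2: (0.30, 20, 300),
    3: (0.20, 100, 700),
    4: (0.10, 900, 1200),
}


def weighted_demand_ms(classes):
    """Frequency-weighted CPU and I/O time per request, in msec."""
    cpu = sum(freq * cpu_ms for freq, cpu_ms, _ in classes.values())
    io = sum(freq * io_ms for freq, _, io_ms in classes.values())
    return cpu, io


cpu_ms, io_ms = weighted_demand_ms(WORKLOAD_CLASSES)
print(f"Weighted CPU time per request: {cpu_ms:.1f} msec (upper bound)")
print(f"Weighted I/O time per request: {io_ms:.1f} msec (upper bound)")
```

Although class 4 is only 10% of the requests, it contributes most of the weighted CPU time, which is why the heavy class deserves separate attention in the forecast.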

28

Validating Workload Models

[Figure: flow diagram with the elements Actual Workload, Synthetic Workload, System, an "Acceptable?" decision (Yes/No), Model Calibration, and the measured metrics: RT, page load, availability, throughput]


29

Analyze Current Capacity

• Several steps had to be performed during the analysis of capacity measurement data:

• a. First, we compared the measurements of any items referenced in service level agreements with their objectives. This provided the basic indication of whether the system had adequate capacity.

• b. Next, we checked the usage of the various resources of the system (CPU, memory, and I/O devices). This analysis identified highly used resources that might prove problematic now or in the future.

• c. We looked at the resource utilization for each workload and ascertained which workloads were the major users of each resource. This helped narrow the attention to only the workloads that were making the greatest demands on system resources.

• d. We determined where each workload was spending its time by analyzing the components of response time. This showed which system resources were responsible for the greatest portion of the response time for each workload.
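Step d can be sketched as a simple breakdown of time per resource. The numbers below reuse the maximum CPU and I/O times per workload class from Case 1 as stand-ins for measured response-time components, which is an assumption made only for illustration.

```python
# Class -> time spent per resource, in msec (stand-in values, see lead-in).
TIME_COMPONENTS_MS = {
    1: {"cpu": 8, "io": 120},
    2: {"cpu": 20, "io": 300},
    3: {"cpu": 100, "io": 700},
    4: {"cpu": 900, "io": 1200},
}


def time_breakdown(components):
    """Fraction of the total time spent at each resource."""
    total = sum(components.values())
    return {resource: t / total for resource, t in components.items()}


def dominant_resource(components):
    """Resource responsible for the largest share of the time."""
    return max(components, key=components.get)


for workload_class, components in TIME_COMPONENTS_MS.items():
    shares = time_breakdown(components)
    print(workload_class, dominant_resource(components),
          {r: f"{s:.0%}" for r, s in shares.items()})
```

With these stand-in values every class is dominated by I/O, so I/O would be the first resource to examine.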

30

Plan for the Future

• How did we make sure that a year from now the systems won’t be overwhelmed?

• The best weapon was a capacity plan based on forecasted processing requirements.

• We needed to know the expected amount of incoming work, by workload. Then we could calculate the optimal system configuration for satisfying service levels.

31

The Approach

• Followed these steps:

• First, we needed to forecast what the business would require of its IT systems in the future.
– The business didn’t know

• Once we knew what to expect in terms of incoming work, we used basic Excel tools to determine the optimal system configuration for meeting service levels into the future.
– This method didn’t work because of the lack of involvement from business people

Workload forecasting with Excel was used

32

Workload Forecasting with Excel (the weighted exponential smoothing time-series forecasting method was recommended)

Transaction Type   Current   1-Year Forecast
Trivial            40%       40%
Light              30%       28%
Medium             20%       22.1%
Heavy              10%       12.1%

33

Workload Forecasting (the weighted exponential smoothing time-series forecasting method, with the last three years of data, was used to drive the forecast)

Transaction Type   3 years ago   2 years ago   Last year
Trivial            50%           43%           36%
Light              20%           23%           24%
Medium             16%           20%           26%
Heavy              14%           10%           14%

The tool used is Exponential-Smoothing-genworth (given to you today)

34

Exponential Smoothing

• This is a very popular scheme to produce a smoothed time series. Whereas in single moving averages the past observations are weighted equally, exponential smoothing assigns exponentially decreasing weights as the observations get older.
– In other words, recent observations are given relatively more weight in forecasting than the older observations.

• In the case of moving averages, the weights assigned to the observations are the same and are equal to 1/N.
– In exponential smoothing, however, there are one or more smoothing parameters to be determined (or estimated), and these choices determine the weights assigned to the observations.
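A minimal sketch of single exponential smoothing follows. It is not the Excel tool used in the case study, and the smoothing constant of 0.5 is an arbitrary illustrative choice; the input series is the "Trivial" row of the three-year history.

```python
def exponential_smoothing(observations, alpha):
    """Single exponential smoothing; returns the final smoothed value,
    which serves as the forecast for the next period."""
    if not observations:
        raise ValueError("need at least one observation")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = observations[0]  # initialize with the oldest observation
    for value in observations[1:]:
        # The newest observation gets weight alpha; all earlier ones decay by (1 - alpha).
        smoothed = alpha * value + (1 - alpha) * smoothed
    return smoothed


trivial_share = [50, 43, 36]  # percent, oldest first
forecast = exponential_smoothing(trivial_share, alpha=0.5)
print(f"Next-year forecast for Trivial: {forecast:.2f}%")
```

A larger alpha tracks recent changes more closely, while a smaller alpha gives a smoother and slower-moving forecast.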

35

About the Accuracy and the Errors

• MAD = Mean Absolute Deviation
• MAPE = Mean Absolute Percentage Error
• MSE = Mean Squared Error
• Smoothing factor for correlation factor

36

Quantity Accuracy Common Measurements

Mean Error

$$FE = \frac{1}{n}\sum_{i=1}^{n}\left(A_i - F_i\right)$$

Where:

FE : Forecast Error

A_i : The actual value in time period i

F_i : The forecast value in time period i

Mean Square Error

$$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(A_i - F_i\right)^2$$

Where:

MSE : Mean Square Error

37

Quantity Accuracy Common Measurements

Mean Absolute Deviation

$$MAD = \frac{1}{n}\sum_{i=1}^{n}\left|A_i - F_i\right|$$

Where:

MAD : Mean Absolute Deviation

Mean Absolute Percentage Error

$$MAPE = \frac{100}{n}\sum_{i=1}^{n}\frac{\left|A_i - F_i\right|}{A_i}$$

Where:

MAPE : Mean Absolute Percentage Error
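The four accuracy measures can be sketched in a few lines, with A_i as the actual values and F_i as the forecasts. The forecast values in the example are hypothetical (one-step-ahead values from a smoothing run with a constant of 0.5), and MAPE assumes every actual value is positive.

```python
def forecast_errors(actual, forecast):
    """Per-period errors A_i - F_i."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("series must be non-empty and of equal length")
    return [a - f for a, f in zip(actual, forecast)]


def mean_error(actual, forecast):
    errors = forecast_errors(actual, forecast)
    return sum(errors) / len(errors)


def mean_square_error(actual, forecast):
    errors = forecast_errors(actual, forecast)
    return sum(e * e for e in errors) / len(errors)


def mean_absolute_deviation(actual, forecast):
    errors = forecast_errors(actual, forecast)
    return sum(abs(e) for e in errors) / len(errors)


def mean_absolute_percentage_error(actual, forecast):
    # Assumes every actual value is positive (true for workload shares).
    errors = forecast_errors(actual, forecast)
    return 100 * sum(abs(e) / a for e, a in zip(errors, actual)) / len(errors)


actual = [50, 43, 36]          # "Trivial" share, oldest first
forecast = [50, 50, 46.5]      # hypothetical one-step-ahead forecasts

print("FE  :", round(mean_error(actual, forecast), 4))
print("MSE :", round(mean_square_error(actual, forecast), 4))
print("MAD :", round(mean_absolute_deviation(actual, forecast), 4))
print("MAPE:", round(mean_absolute_percentage_error(actual, forecast), 4))
```

A negative mean error here means the forecasts ran above the actual values on average, which MAD and MSE alone would not reveal.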

38

Value of “a”

• An exponentially weighted moving average with a smoothing constant a, corresponds roughly to a simple moving average of length (i.e., period) n, where a and n are related by:

• a = 2/(n+1), or equivalently n = (2 - a)/a.

• Thus, for example, an exponentially weighted moving average with a smoothing constant equal to 0.1 would correspond roughly to a 19-day moving average, and a 40-day simple moving average would correspond roughly to an exponentially weighted moving average with a smoothing constant equal to 0.04878.
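The two conversions can be checked with a short sketch; the function names are illustrative.

```python
def alpha_from_window(n):
    """Smoothing constant roughly equivalent to an n-period simple moving average."""
    return 2 / (n + 1)


def window_from_alpha(alpha):
    """Moving-average length roughly equivalent to smoothing constant alpha."""
    return (2 - alpha) / alpha


print(f"n = 40    -> a = {alpha_from_window(40):.5f}")
print(f"a = 0.1   -> n = {window_from_alpha(0.1):.1f}")
```

The two functions are inverses of each other, so converting a window length to a constant and back returns the original length.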

39

Determine Future Processing Requirements

• The one-year forecasting method was acceptable, but the future processing requirements had to come from a variety of sources, based on commitments from the CTO, CFO, COO, and CEO.

• Inputs from management that were committed for the next year were:
– Expected growth in the business
– Requirements for implementing new applications
– Planned acquisitions or divestitures
– IT budget limitations
– Requests for consolidation of IT resources

40

Plan Future System Configuration

• After the system capacity requirements for year 1 were identified, a capacity plan was developed to prepare for them.

• The first step in doing this was to create a model of the current configuration.

• From this starting point, the model could be modified to reflect the future capacity requirements.

• If the results of the model indicated that the current configuration did not provide sufficient capacity for the future requirements, then the model could be used to evaluate configuration alternatives to find the optimal way to provide sufficient capacity.

41

Forecasted Workload – Configuration Changes

[Figure: flow diagram with the elements Actual Workload, Synthetic Workload, System, an "Acceptable?" decision (Yes/No), Configuration for the New System (CPU, I/O, Memory), and Cost $]

42

The Capacity Plan

• The Capacity Plan included:
– Identifying Server Transactions
– Defining Server Transaction Throughput Requirements
– Choosing Hardware
– Obtaining Measurements/Throughput Rates
– Calculating the Required Number of Machines
– Making Network Topology Choices
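The "required number of machines" step can be sketched as a simple division with headroom. The per-machine throughput of 25 requests per second and the 70% utilization ceiling below are hypothetical values chosen for illustration; only the 300,000 requests per peak hour comes from the Case 1 service level targets.

```python
import math


def machines_required(required_rate, per_machine_rate, max_utilization=0.7):
    """Smallest machine count that carries the required rate
    while keeping each machine at or below max_utilization."""
    if per_machine_rate <= 0:
        raise ValueError("per_machine_rate must be positive")
    if not 0 < max_utilization <= 1:
        raise ValueError("max_utilization must be in (0, 1]")
    usable_rate = per_machine_rate * max_utilization
    return math.ceil(required_rate / usable_rate)


required_rate = 300_000 / 3600  # requests/s during the peak hour
count = machines_required(required_rate, per_machine_rate=25.0)  # hypothetical measurement
print(f"Machines required: {count}")
```

In practice the per-machine rate would come from the measured throughput rates in the previous step of the plan.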

43

Capacity Plan Template Used

• Production Environment Servers
– Server Capacity Requirements
– Processing Capacity Requirements - Server Class and CPUs
– Memory Requirements
– Disk Capacity Requirements

• User Learning Environment Servers
– Processing Capacity Requirements - Server Class and CPUs
– Memory Requirements
– Disk Capacity Requirements

44

Capacity Plan Template Used

• Testing Environment Servers
– Processing Capacity Requirements - Server Class and CPUs
– Memory Requirements
– Disk Capacity Requirements

• Desktop Client Machines
– Specifications
– Processing Requirements

• Network Capacity
– Bandwidth Requirements

45

Business Impacts

• Business Benefits Expected
• Potential Impact of Recommendation
• Risks Involved
• Resources Required
• Setup and Ongoing Costs

46

Assumptions and Risks

• Assumptions
– <Assumption 1>
– <Assumption 2>

• Risks
– <Risk 1>
– <Risk 2>

• Open and Closed Issues for this Deliverable
– Open Issues
– Closed Issues

47

Template

1. SLA

2. Business process and workload

3. Forecasting

4. Cost

48

Case Study 2: Capacity Planning for Admin Server

49

Methodology

[Figure: capacity planning methodology flow with the elements Understanding the Environment; Workload Model (Validation and Calibration); Workload Forecasting; Performance Model and Performance Prediction; Developing a Cost Model, Cost Model, and Cost Prediction; Valid Model; Cost/Performance Analysis; and the outputs Configuration Plan, Investment Plan, and Personnel Plan]

50

Capacity Planning Approach

Four Steps for Capacity Planning

1. Define the Business Requirements
– Business Process Description/Flow
– Business Realignment
– Decomposition of the Business Process (outline the business functions)

2. Determine Service Level Requirements
– The work done by systems was categorized, and users’ expectations for how that work gets done were quantified.

3. Analyze Current Capacity
– Next, the current capacity of the system was analyzed to determine how it was meeting the needs of the users.

4. Planning for the Future
– Finally, using forecasts of future business activity, future system requirements were determined. Implementing the required changes in system configuration would ensure that sufficient capacity would be available to maintain service levels, even as circumstances change in the future.

51

Business Requirements

• Business Case

• Business Process Description/Flow

• Business Realignment

• Decomposition of the Business Process (outline the business functions)

52

Business Requirements Example

• Business Case
– ACH

• Business Process Description/Flow
– Automated Clearing House (ACH)
– It is a way for businesses to transmit money through a banking system (EFT)

• Business Realignment
– 1 year changes: 25% increase

• Decomposition of the Business Process (outline the business functions)
– ACH creates a file which will be transmitted to the finance team; the finance team will then validate it against the actual transactions. SWD creates a record in the ACH file. PAY will create a record in the ACH file.

53

SLA ACH

• This process impacts money flowing in and out of the company.

• The files have to leave the server (be transmitted) by 8 am

54

Determine Service Level Requirements for all the Activities

For each Activity:

1. Map the activity to a business process

2. Define the SLA parameters for each activity based on the business process

55

Service Level Requirements for Activity “ACH”

ACH                 Target         Description
Transmission Time   8 am Mon-Fri   It has to leave the system by 8 am on weekdays
Response Time       1800 s         Time to process an ACH
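Taken together, the two targets imply a latest start time for ACH processing. The sketch below works this out; the calendar date is an arbitrary Monday chosen for illustration, and treating the 1800 s response time as the full processing window before transmission is an assumption.

```python
from datetime import datetime, timedelta


def latest_start(deadline, processing_seconds):
    """Latest moment processing can begin and still meet the deadline."""
    return deadline - timedelta(seconds=processing_seconds)


def is_weekday(moment):
    """True for Monday through Friday."""
    return moment.weekday() < 5


deadline = datetime(2024, 1, 8, 8, 0)   # 8 am on an arbitrary Monday
start_by = latest_start(deadline, 1800)  # 1800 s processing target

print("Transmission deadline:", deadline.strftime("%a %H:%M"))
print("Start processing by:  ", start_by.strftime("%a %H:%M"))
```

Any growth in the ACH file size that pushes processing beyond 1800 s moves the required start time earlier, which is what the capacity plan has to protect.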

56

Service Level Requirements for the System

System           Target    Description
Availability     99.999%
Recoverability   900 s

57

Business SLA Mapping to Activity SLA

• The Business SLA has to be mapped to the Activity SLA

• Activity SLA can be measured and monitored through testing and operations laws.

58

ACH Example

• Business Process

• Business Process SLA

• ACH SLA

59

Business SLA Requirements

• Automated Clearing House (ACH)

• SLA Parameters:
– Transmission Time: 8 am Mon-Fri (it has to leave the system by 8 am on weekdays)
– Response Time: 1800 s (time to process an ACH)

• What are the main business process SLA parameters? Rx, processing, Tx

• SLA Parameters:
– Rx: by 8:05 they have to see the funds; processing time = 20 minutes; Tx before 9:30

• What is the mapping?

60

SLA Template

• 1.0 Statement of Intent
– This section states the objectives of the document.
• 1.1 Approvals
– All parties must agree on the SLA. This section contains a list of who approved the SLA.
• 1.2 Review Dates
– This section contains the track record of the SLA reviews.
• 1.3 Time and Percent Conventions
– This section contains the descriptions of what time conventions and metrics are being used.
• 2.0 About the Service
– This section introduces the service addressed by this SLA.
• 2.1 Description
– This section describes the service in detail.
• 2.2 User Environment
– This section describes the architecture and technologies that are used by the consumers of the service.
• 3.0 About Service Availability
– This section introduces the availability concepts used in this SLA.
• 3.1 Normal Service Availability Schedule
– This section describes what is considered normal service availability.
• 3.2 Scheduled Events That Impact Service Availability
– This section describes what scheduled outages are to be expected.
• 3.3 Non-emergency Enhancements
– This section describes the process that inserts enhancements into the infrastructure.
• 3.4 Change Process
– This section describes the complete process of how changes are introduced in the service, including the associated availability impact.
• 3.5 Requests for New Users
– This section describes the provisioning process of new users/customers.
• 4.0 About Service Measures
– This section contains a detailed description of how the service availability is measured and reported.

61

Analyze Current Capacity

• Several steps had to be performed during the analysis of capacity measurement data:

• a. First, we compared the measurements of any items referenced in service level agreements with their objectives. This provided the basic indication of whether the system had adequate capacity.

• b. Next, we checked the usage of the various resources of the system (CPU, memory, and I/O devices). This analysis identified highly used resources that might prove problematic now or in the future.

• c. We looked at the resource utilization for each workload and ascertained which workloads were the major users of each resource. This helped narrow the attention to only the workloads that were making the greatest demands on system resources.

• d. We determined where each workload was spending its time by analyzing the components of response time. This showed which system resources were responsible for the greatest portion of the response time for each workload.

62

Workload Partitioning For Admin Server (ACH)

Workload Classification based on Process Count

Workload Type   Frequency in week   %
Class 1         128                 0.000724078
Class 2         2,685               0.015188658
Class 3         22,756              0.128727416
Class 4         125,500             0.70993543
Class 5         826,735             4.676720861
Class 6         3,992,124           22.58287068
Class 7         12,707,736          71.88583288

Total Count: 17,677,664
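The percentage column follows directly from the weekly process counts. A short sketch that recomputes the total and each class share from the counts in the table:

```python
# Weekly process counts per workload class, copied from the table.
WEEKLY_COUNTS = {
    "Class 1": 128,
    "Class 2": 2685,
    "Class 3": 22756,
    "Class 4": 125500,
    "Class 5": 826735,
    "Class 6": 3992124,
    "Class 7": 12707736,
}


def class_shares(counts):
    """Percentage of the total count contributed by each class."""
    total = sum(counts.values())
    return {name: 100 * count / total for name, count in counts.items()}


total_count = sum(WEEKLY_COUNTS.values())
shares = class_shares(WEEKLY_COUNTS)

print(f"Total count: {total_count:,}")
for name, pct in shares.items():
    print(f"{name}: {pct:.6f}%")
```

Classes 6 and 7 together account for more than 94% of the processes, so they dominate any count-based forecast.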

63

Workload Forecasting For Admin Server/Count

Class Type   Frequency in week   %             Forecast 1 Year
Class 1      128                 0.000724078
Class 2      2,685               0.015188658
Class 3      22,756              0.128727416
Class 4      125,500             0.70993543
Class 5      826,735             4.676720861
Class 6      3,992,124           22.58287068
Class 7      12,707,736          71.88583288

Total Count: 17,677,664

64

Workload Partitioning For Admin Server – Service Demand

Workload Class   Frequency   Max I/O Time (msec)   Max CPU time (msec)   Memory Usage (MB)
1
2
3
4
5
6
7
8

65

Forecasted Workload For Admin Server – Frequency and Service Demand

Workload Class   Frequency Forecast   Forecast I/O Time (msec)   Forecast CPU time (msec)   Forecast Memory Usage (MB)
1
2
3
4
5
6
7
8

66

Performance Modeling and Prediction for ACH

[Figure: System and Workload Description, leading to the performance metrics: response time, transmission time]

67

Estimating Performance Measures

System Description (input to the Queuing Network Model)

• System parameters
• Resource parameters
• Workload parameters
– service demands
– workload intensity

Performance Measures (output of the Queuing Network Model)

• Response time
• Throughput
• Utilization
• Queue length
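As a rough illustration of how such a model turns workload parameters into performance measures, the sketch below solves a single-class open queuing network using the utilization law and Little's law. The arrival rate and service demands are hypothetical, and treating each resource as a simple load-independent queue is an assumption; the case study does not state which model was actually used.

```python
def open_queueing_model(arrival_rate, service_demands):
    """Single-class open queuing network with load-independent resources.

    arrival_rate: requests per second (workload intensity)
    service_demands: resource name -> seconds of service per request
    Returns (response_time_seconds, per-resource measures).
    """
    measures = {}
    response_time = 0.0
    for resource, demand in service_demands.items():
        utilization = arrival_rate * demand            # utilization law
        if utilization >= 1:
            raise ValueError(f"{resource} is saturated")
        residence = demand / (1 - utilization)         # time spent at the resource
        measures[resource] = {
            "utilization": utilization,
            "residence_time": residence,
            "queue_length": arrival_rate * residence,  # Little's law
        }
        response_time += residence
    return response_time, measures


# Hypothetical workload: 3 requests/s, 50 msec CPU and 200 msec I/O per request.
response_time, measures = open_queueing_model(3.0, {"cpu": 0.05, "io": 0.20})

print(f"Response time: {response_time:.3f} s, throughput: 3.0 requests/s")
for resource, m in measures.items():
    print(resource, {k: round(v, 3) for k, v in m.items()})
```

In an open model the throughput equals the arrival rate as long as no resource is saturated, so the interesting outputs are response time, utilization, and queue length.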

68

Validating Performance Models

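[Figure: flow diagram. Measurements on the Real System give the measured RT, throughput, etc.; calculations on the Performance Model give the calculated RT, throughput, etc.; an "Acceptable?" decision leads to Yes (*) or, on No, to Model Calibration]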

69

Planning for the Future

• Finally, using forecasts of future business activity, future system requirements were determined. Implementing the required changes in system configuration would ensure that sufficient capacity would be available to maintain service levels, even as circumstances change in the future.

70

Plan for the Future

• How did we make sure that a year from now the systems won’t be overwhelmed?

• The best weapon was a capacity plan based on forecasted processing requirements.

• We needed to know the expected amount of incoming work, by workload. Then we could calculate the optimal system configuration for satisfying service levels.

71

Future Performance Modeling and Prediction

• How are performance measures estimated?

[Figure: Future System and Workload, leading to the predicted performance metrics: response time, throughput, utilization, etc.]

72

Forecasted Workload – Configuration Changes

[Figure: flow diagram with the elements Actual Workload, Synthetic Workload, System, an "Acceptable?" decision (Yes/No), Configuration for the New System (CPU, I/O, Memory), and Cost $]

73

Plan Future System Configuration

• After the system capacity requirements for year 1 were identified, a capacity plan was developed to prepare for them.

• The first step in doing this was to create a model of the current configuration.

• From this starting point, the model could be modified to reflect the future capacity requirements.

• If the results of the model indicated that the current configuration did not provide sufficient capacity for the future requirements, then the model could be used to evaluate configuration alternatives to find the optimal way to provide sufficient capacity.

74

The Capacity Plan

75

The Capacity Plan

• The Capacity Plan included:
– Identifying Server Transactions
– Defining Server Transaction Throughput Requirements
– Choosing Hardware
– Obtaining Measurements/Throughput Rates
– Calculating the Required Number of Machines
– Making Network Topology Choices

76

Production Environment Servers

• Server Capacity Requirements

• Processing Capacity Requirements - Server Class and CPUs

• Memory Requirements

• Disk Capacity Requirements

77

User Learning Environment Servers

• Processing Capacity Requirements - Server Class and CPUs

• Memory Requirements

• Disk Capacity Requirements

78

Testing Environment Servers

• Processing Capacity Requirements - Server Class and CPUs

• Memory Requirements

• Disk Capacity Requirements

79

Network Capacity

• Bandwidth Requirements

80

Assumptions and Risks

• Assumptions
– <Assumption 1>
– <Assumption 2>

• Risks
– <Risk 1>
– <Risk 2>

• Open and Closed Issues for this Deliverable
– Open Issues
– Closed Issues

81

Capacity Planning Case Study 3

82

The Organization

• IT department for a Fortune 100 communications service provider

• They used standard capacity planning procedures to handle growth; no structured methodology was used

83

Capacity Planning for Servers in a WebSEAL Environment

• WebSEAL provided an authentication and authorization mechanism based on Tivoli Access Manager.

• It enabled an end-to-end Single Sign On (SSO) solution for secure transactions for WebSphere application servers.

• In a WebSEAL environment, several servers handled the workload.

84

Service Level

• No service level requirements were defined

• Capacity planning couldn’t be completed with a structured methodology like the one in the previous case study, for the following reasons:

• The IT team didn’t have the time and patience to deal with the business people

• The IT team trusted the vendor models

• Forecasting was done ad hoc: a 20% increase for the next 3 years

85

Servers

Server

WebSEAL (PDWeb)

Lightweight Directory Access Protocol (LDAP)

Backend Web servers

86

WebSEAL Network Topology Choices

• WebSEAL was placed in a DMZ connecting Internet users on one physical network to Intranet backend Web servers on another physical network.

• LDAP was used as the registry and resided on the Intranet for security purposes.

• All servers supported replication.

87


88

Server Transactions

• The high-level transaction consisted of authenticated Web page access.

• Page access required Secure Sockets Layer (SSL), used LDAP authentication, averaged about 5 KB in size, and flowed through a WebSEAL TCP junction.

• This work request visited the WebSEAL server, the LDAP server, a backend Web server, and the physical network.

89

Server Transactions Used

Server               Transaction
WebSEAL              SSL authenticated Web page access to a TCP junctioned backend
LDAP                 WebSEAL-driven authentication
Backend Web server   TCP Web page access
Physical network     Approximately 10 KB of traffic, including a small overhead for LDAP server communication

90

Transactions in a WebSEAL Environment

• The primary, high-level transaction in the WebSEAL environment was access to a Web page.

• There were several parameters that defined Web page access.
– For example, one parameter was whether the page access included a user login (authentication).

• Another transaction in a WebSEAL environment was the administrative update, such as creating, deleting, listing, and modifying users or ACL entries.

91

Parameters associated with transaction in a WebSEAL environment

• WebSEAL server
– Page access
» TCP or SSL
» Authenticated, post-authenticated, not authenticated
» If SSL, new browser each request (new SSL session) or same browser (SSL session reuse)
» If SSL, the SSL cache size and timeouts
» If authenticated, user in Tivoli SecureWay Policy Director cache or not
» If authenticated, failed login or not
» Logout (pkmslogout page request) or not
» Junctioned or not junctioned to a backend Web server
» If junctioned, TCP or SSL junction type
» If junctioned, backend requires authentication or not
» Web page size
– Administrative
» Create, delete, list, or modify a user
» Create, delete, list, or modify an ACL

• LDAP server
– Authentication
» User from LDAP cache, DB2(R) cache, or disk (non-cached)
» Failed authentication or not
» If failed authentication, why? User not found or invalid password
» Registry size (number of users)
– Administrative
» Create, delete, list, or modify
» Update to an LDAP master or propagated to an LDAP replica
» If update to an LDAP master, replication setting (on or off)
» Frequency of propagation (immediate or delayed)
» Registry size (number of users)

• Backend Web server
– TCP or SSL
– Authenticated, post-authenticated, not authenticated
– Web page size

• Physical network
– Number of bytes in each transaction
– Line speed

92

Basing Estimates on Registered Users

• Another method for estimating transaction throughputs, or requirements, was to base estimates on the number of users in the registry.

• The idea was that some percentage of these users will use the system in any given time period.

• Along with this, there was an idea of what an average user does in terms of putting load on the system.

• For example, the percentage of users in any given time period was the authentication rate.

• The authentication rate times the load applied by an average user gave the throughput for the average workload.

93

Calculations

• There were 4 million registered users, and approximately 20% of them logged in (authenticated) each day. It was also assumed that each user accessed 10 Web pages per session.

• The required authentication rate was 9.26 authentications per second, calculated as follows:

4,000,000 users * 0.20 (20 percent) / 24 hours in a day / 60 minutes in an hour / 60 seconds in a minute = 9.26 auths/sec

• The required page access rate was 92.6 pages per second, calculated as follows:

9.26 auths/sec * 10 pages per user session = 92.6 pages/sec
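The same arithmetic as a short sketch. Note that it spreads the daily logins evenly over 24 hours, as the slide does, so it gives an average rate rather than a peak rate; the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def auth_rate_per_second(registered_users, daily_login_fraction):
    """Average authentications per second, with logins spread evenly over the day."""
    return registered_users * daily_login_fraction / SECONDS_PER_DAY


def page_rate_per_second(auth_rate, pages_per_session):
    """Average page accesses per second, given pages viewed per authenticated session."""
    return auth_rate * pages_per_session


auth_rate = auth_rate_per_second(4_000_000, 0.20)
page_rate = page_rate_per_second(auth_rate, 10)

print(f"Authentication rate: {auth_rate:.2f} auths/sec")
print(f"Page access rate:    {page_rate:.1f} pages/sec")
```

If most logins fall within business hours, the divisor would be smaller and the required rates correspondingly higher.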

94

Identifying Server Transactions

• Assumed that the common parameters that define the Web page access transaction were as follows:

• Tivoli SecureWay Policy Director connected to the backend Web server using a TCP junction with options that filter the original identity of the user and replace it with one provided by WebSEAL (-B, -U, and -W options).

• ACLs were defined in WebSEAL to protect backend Web server resources.

• Authentication was tuned as described in the Tivoli SecureWay Policy Director Base Performance Tuning Guide.

• The average Web page was 10 KB in size.

95

Identifying Server Transactions

• The following high-level transactions were identified:

• TCP page: A user makes a request for an HTTP (TCP) Web page going through WebSEAL to a backend Web server. No authentication occurs.

• SSL authentication page: A user makes an initial request for an HTTPS (SSL) Web page going through WebSEAL to a backend Web server. Authentication occurs.

• SSL post-authentication page: A user makes a subsequent request for an HTTPS (SSL) Web page going through WebSEAL to a backend Web server. No authentication occurs.

96

Server Transactions

Server | Transaction
WebSEAL | 10 KB TCP page access through a TCP junction
WebSEAL | 10 KB SSL page access with authentication through a TCP junction
WebSEAL | 10 KB SSL page access, already authenticated, through a TCP junction
LDAP | Authentication
Backend Web server | 10 KB TCP page access, already authenticated (Tivoli SecureWay Policy Director authenticates to backend Web servers infrequently as defined by backend server, infrequent enough to be insignificant)
Internet network | 10 KB Web page requests plus SSL, TCP, HTTP, and IP headers
Intranet network | 10 KB Web page requests and 200 byte LDAP lookups plus TCP, HTTP, and IP headers

97

Server Transaction Throughput Requirements

IP traces showed that a typical user session consisted of the following requests:

• 3 TCP pages
• 1 SSL page requiring authentication
• 9 SSL pages after authentication

The traces showed that user authentication occurred at a rate of 5 per second.

• The system-wide throughput requirements were as follows:
  • SSL page accesses requiring authentication: 5 per second
  • SSL page accesses after authentication: 9*5 = 45 per second
  • TCP page accesses: 3*5 = 15 per second
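As a minimal sketch, the per-transaction requirements follow from multiplying the session mix by the authentication rate. The dictionary keys and function name are illustrative labels, not identifiers from the case study.

```python
def transaction_requirements(auth_rate_per_sec, session_mix):
    """Scale the per-session request counts by the number of sessions started per second."""
    return {name: count * auth_rate_per_sec for name, count in session_mix.items()}


# Session mix observed in the IP traces: requests of each type per user session.
session_mix = {"tcp_page": 3, "ssl_auth_page": 1, "ssl_post_auth_page": 9}

requirements = transaction_requirements(5, session_mix)
backend_total = sum(requirements.values())  # every page also reaches the backend Web server

print(requirements)   # {'tcp_page': 15, 'ssl_auth_page': 5, 'ssl_post_auth_page': 45}
print(backend_total)  # 65
```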

98

Throughput Requirements

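Server | Transaction | Throughput requirement
WebSEAL | 10 KB TCP page access through a TCP junction | 15 /sec
WebSEAL | 10 KB SSL page access with authentication through a TCP junction | 5 /sec
WebSEAL | 10 KB SSL page access, already authenticated, through a TCP junction | 45 /sec
LDAP | Authentication | 5 /sec
Backend Web server | 10 KB TCP page access | 65 /sec
Internet network | 10 KB Web page requests plus headers | 780 KB/sec
Intranet network | 10 KB Web page requests and 200 byte LDAP lookups plus headers | 781 KB/sec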

99

Network Throughput

• The Internet network throughput requirement was calculated as follows:

(15 TCP pages/sec + 50 SSL pages/sec) * 10 KB/page => 650 KB/sec + 20% headers = 780 KB/sec

• The Intranet network throughput requirement was calculated as follows:

650 KB/sec Internet requirement + (5 SSL auth/sec) * 200 bytes => 651 KB/sec + 20% headers = 781 KB/sec
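The two network calculations can be checked with a small sketch. The function names are illustrative, and treating 1 KB as 1024 bytes is an assumption (the slide simply rounds the LDAP traffic up to about 1 KB/sec).

```python
HEADER_OVERHEAD = 0.20  # 20% allowance for protocol headers, as on the slide
BYTES_PER_KB = 1024     # assumption: 1 KB = 1024 bytes


def internet_throughput_kb(page_rate_per_sec, page_size_kb):
    """Page payload plus header overhead, in KB/sec."""
    return page_rate_per_sec * page_size_kb * (1 + HEADER_OVERHEAD)


def intranet_throughput_kb(page_rate_per_sec, page_size_kb,
                           auth_rate_per_sec, ldap_lookup_bytes):
    """Page payload plus LDAP lookups, plus header overhead, in KB/sec."""
    payload_kb = (page_rate_per_sec * page_size_kb
                  + auth_rate_per_sec * ldap_lookup_bytes / BYTES_PER_KB)
    return payload_kb * (1 + HEADER_OVERHEAD)


internet = internet_throughput_kb(15 + 50, 10)
intranet = intranet_throughput_kb(15 + 50, 10, 5, 200)
print(f"Internet: {internet:.0f} KB/sec, intranet: {intranet:.0f} KB/sec")
# Internet: 780 KB/sec, intranet: 781 KB/sec
```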

100

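Throughput Measurements

Server | Transaction | Maximum throughput measurement
WebSEAL | 100 byte TCP page access through a TCP junction | 750 /sec
WebSEAL | 5 KB TCP page access through a TCP junction | 700 /sec
WebSEAL | 10 KB TCP page access through a TCP junction | 655 /sec (estimated, see the formula described following this table)
WebSEAL | 10 KB SSL page access with authentication through a TCP junction | 40 /sec
WebSEAL | 10 KB SSL page access, already authenticated, through a TCP junction | 250 /sec
LDAP | Authentication | 35 /sec
Backend Web server | 10 KB TCP page access | 750 /sec
Internet network | | 7+ MB/sec
Intranet network | | 7+ MB/sec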

101

TCP page access

• The throughput for a WebSEAL 10 kilobyte (KB) TCP page access was estimated from the throughput measurements of the 100 byte and 5 KB cases. The estimate was 655 pages/sec, as calculated in the following formula, where (x1, y1) and (x2, y2) are the measured page sizes in bytes and their throughputs, and x is the target page size:

y = 1 / ((x - x1) * (1/y1 - 1/y2) / (x1 - x2) + 1/y1)
y = 1 / ((x - 100) * (1/750 - 1/700) / (100 - 5*1024) + 1/750)
y = 1 / ((10*1024 - 100) * (1/750 - 1/700) / (100 - 5*1024) + 1/750) = 655
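The formula is a straight-line fit of time per page (1/throughput) against page size, extended to the 10 KB case. A minimal sketch, with an illustrative function name:

```python
def estimate_throughput(x, x1, y1, x2, y2):
    """Estimate throughput at size x from two measured (size, throughput) points.

    The time per page, 1/y, is assumed to grow linearly with the page size.
    """
    slope = (1 / y1 - 1 / y2) / (x1 - x2)  # change in time per page, per byte
    return 1 / ((x - x1) * slope + 1 / y1)


# Measured points from the slide: 100 bytes -> 750/sec, 5 KB -> 700/sec.
estimate = estimate_throughput(10 * 1024, 100, 750, 5 * 1024, 700)
print(round(estimate))  # 655
```

The estimate lies outside the measured range (10 KB is larger than both measured sizes), so it is an extrapolation and carries more uncertainty than the measured values.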

102

Transactions, requirements, and measurements

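Server | Transaction | Throughput requirement | Maximum throughput measurement
WebSEAL | 10 KB TCP page access through a TCP junction | 15 /sec | 655 /sec (estimated)
WebSEAL | 10 KB SSL page access with authentication through a TCP junction | 5 /sec | 40 /sec
WebSEAL | 10 KB SSL page access, already authenticated, through a TCP junction | 45 /sec | 250 /sec
LDAP | Authentication | 5 /sec | 35 /sec
Backend Web server | 10 KB TCP page access | 65 /sec | 750 /sec
Internet network | | 780 KB/sec | 7+ megabytes (MB)/sec
Intranet network | | 781 KB/sec | 7+ MB/sec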

103

Calculating the machine factor

Server | Calculation | Machine factor | Increased by 20%
WebSEAL | 15/655 + 5/40 + 45/250 | 0.33 | 0.39
LDAP | 5/35 | 0.14 | 0.17
Backend Web server | 65/750 | 0.09 | 0.10
Internet network | 780/7000 | 0.11 | 0.13
Intranet network | 781/7000 | 0.11 | 0.13

104

Calculating the machine factor

• Since all machine factors were less than one, each server utilized only a portion of a machine, given by the machine factor.

• In other words, the machine factor, multiplied by 100, gave the percentage of the machine utilized.

105

Calculating the Required Number of Machines

• The number of machines was based on the throughput requirements divided by the achievable throughput, that is, the measurements.

• The ratios were calculated for each transaction type for a given server, and then added together to get the full story for that server. The calculation was repeated for each server. This included the physical network, which was treated as a special type of server for capacity planning purposes.

• If the number of machines needed was less than one, it represented the portion of a machine that was needed.

• The formula was as follows:

• Machine factor = R1/M1 + R2/M2 + R3/M3 + . . . + Rn/Mn

• Variable definitions in this formula were as follows:

  • Machine factor specifies the portion of or number of machines needed for a given server.
  • n specifies the number of transactions identified for the given server.
  • R1 ... Rn specifies the throughput requirements for transactions 1 through n.
  • M1 ... Mn specifies the throughput measurements for transactions 1 through n.
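The machine factor formula can be sketched as follows. The function name and the `headroom` parameter are illustrative; the 20% headroom mirrors the "Increased by 20%" column of the machine factor table, and rounding the factor up to whole machines is an assumption about how the result would be used.

```python
import math


def machine_factor(requirement_measurement_pairs, headroom=0.0):
    """Sum R_i / M_i over a server's transactions, optionally inflated by headroom."""
    base = sum(required / measured
               for required, measured in requirement_measurement_pairs)
    return base * (1 + headroom)


# (requirement, maximum measured throughput) per transaction, from the slides.
servers = {
    "WebSEAL": [(15, 655), (5, 40), (45, 250)],
    "LDAP": [(5, 35)],
    "Backend Web server": [(65, 750)],
}

for name, pairs in servers.items():
    factor = machine_factor(pairs, headroom=0.20)
    print(f"{name}: factor {factor:.2f}, machines needed {math.ceil(factor)}")
```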

106

Scaling for CPU Utilizations Less Than 100%

• It is possible to estimate maximum throughput from measurements where less than maximum throughput has been achieved, but it requires knowledge of the CPU utilization. It also results in a larger margin of error, since software systems do not always behave well when they reach or go beyond maximum throughput.

• Following is the formula for estimating maximum throughput from measured throughput at less than maximum:

• Maximum throughput = measured throughput / CPU utilization

• For example, if throughput is measured at 50% CPU utilization, the estimated maximum throughput is twice the measured rate, since only half (50%) the machine is utilized.
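A minimal sketch of this scaling rule follows. The function name and the measured rate of 100 requests/sec are hypothetical; only the formula comes from the slide.

```python
def estimate_max_throughput(measured_throughput, cpu_utilization):
    """Scale a measured rate up to 100% CPU; cpu_utilization is a fraction in (0, 1]."""
    if not 0 < cpu_utilization <= 1:
        raise ValueError("cpu_utilization must be in the range (0, 1]")
    return measured_throughput / cpu_utilization


# Hypothetical measurement: 100 requests/sec observed at 50% CPU utilization.
print(estimate_max_throughput(100, 0.50))  # 200.0
```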

107

Scaling for Hardware Differences

• One method for bridging hardware differences was to use published benchmarks, available from the following Web site:

• http://www.spec.org

• The Standard Performance Evaluation Corporation (SPEC) provides hardware manufacturers a way to publish the results of certain performance benchmarks.

• Since communication programs tend to act like integer arithmetic programs, the benchmark of interest was SPECint. SPECint results were found by selecting SPEC CPU95 or SPEC CPU2000 from the Web site, then selecting Submitted Results and either SPECint95, SPECint_rate95, SPECint2000, or SPECint_rate2000.

108

Capacity Planning Case Study 4: Characterizing the Workload of a Corporate B2B Portal

109

Understanding the Problem

• Business Case: Business portal of a Fortune 100 semiconductor manufacturing company

• Roll out a new B2B portal
• Access for employees, partners, and suppliers
• First-year estimate: 10,000 people will use the portal
• Business goal: 40,000 users will access the portal by the end of 2014
• Management wants to analyze the performance of the portal application to make sure the given SLA is not violated.
• Portal applications and services:
  • Registration
  • Login
  • Employee directory
  • HR
  • Health insurance payments
  • On-demand interactive training
  • Simple text to video and audio
  • Issuing PO, viewing PO, tracking payments

110

Other Portal Functions

• Accounts Payable
• Ethics Reporting
• Advanced Shipment Notification
• Audiocasts
• Commercial Invoices and Packing Lists (pdf)
• Explore becoming a supplier
• Info for potential suppliers
• Vested Outsourcing Overview for Logistics Suppliers

111

Other Transactions

• New Users
• Supplier Pages
• Employee and Contingent Worker
• Registered Users
• Manage My Account
• Need help?
• Check out Frequently Asked Questions.

112

Tasks 1 and 2

• Task 1: Identify all possible Workloads

• Task 2: Characterize the workloads

113

SLA

• The SLA covers the response time of the portal for page views.

• It is expressed as a 90th or 95th percentile response time.

• The 90th percentile response time of all portal transactions shall be within 3 seconds. This means that no more than 10% of the transactions may have a response time higher than 3 seconds, which makes the percentile a meaningful measure.
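A percentile SLA of this kind can be checked against measured response times. The sketch below uses the nearest-rank definition of a percentile, which is one of several common definitions and an assumption here; the sample response times are hypothetical.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_sla(response_times_sec, p=90, limit_sec=3.0):
    """True when the p-th percentile response time is within the limit."""
    return percentile(response_times_sec, p) <= limit_sec


# Hypothetical page-view response times in seconds.
samples = [0.8, 1.1, 1.4, 0.9, 2.2, 1.7, 2.9, 1.2, 4.6, 1.0]
print(percentile(samples, 90))  # 2.9
print(meets_sla(samples))       # True
```

One slow request (4.6 seconds) does not violate the SLA here, because it falls within the 10% of transactions that the 90th percentile allows to exceed the limit.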

114


• Common steps were:
  • specification of the point of view from which the workload will be analyzed;
  • choice of the set of relevant parameters;
  • monitoring the system;
  • analysis and reduction of performance data;
  • construction of a workload model.

115

Workloads in the System (extract them from the portal functions)

• Supplier_Registration

• etc

116

List of Workloads

• All the portal functions shall be attributed to a list of workload classes. These workloads are defined according to the type of work being done on the system.

• In other words, workloads were defined solely by the type of work being performed on the physical infrastructure.

117


Workload Class

Workload

Type

1

2

3

4

5

118

Workload Partitioning After the Portal Has Been Up and Running for 3 Months

Workload Class | Intensity | Max CPU time (msec) | Max I/O time (msec)
1 | 12500 | 8 | 120
2 | 26% | 20 | 300
3 | 20% | 100 | 700
4 | 14% | 900 | 1200
5 | 10% | 3000 | 2800

119

Workload Partitioning Forecasting

Workload Class | 12 months | 18 months | 24 months
1 | 30% | 33% | 36%
2 | 26% | 20% | 22%
3 | 20% | 17% | 24%
4 | 14% | 18% | 7%
5 | 10% | 12% | 11%

120

Capacity Planning Process

121

Capacity Planning Process

• 1. Determine service level requirements
  • a. Define workloads
  • b. Determine the unit of work
  • c. Identify service levels for each workload

• 2. Analyze current system capacity
  • a. Measure service levels and compare to objectives
  • b. Measure overall resource usage
  • c. Measure resource usage by workload
  • d. Identify components of response time

• 3. Plan for the future
  • a. Determine future processing requirements
  • b. Plan future system configuration

122

[Figure: methodology diagram relating the Workload Model, Performance Model, and Cost Model to the Configuration Plan, Investment Plan, and Personnel Plan]

123


The goal is to learn what kind of

• hardware (clients and servers)
• software (OS, middleware, and applications)
• network connectivity and protocols

are present in the environment.

124

Elements in Understanding the Environment

Element | Description
Client platform | Quantity and type
Server platform | Quantity, type, configuration, and functions
Middleware | Type (e.g., TP monitors)
DBMS | Type
Application | Main types of applications, criticality, etc.
Network connectivity | Network diagrams with LANs, WANs, routers, servers, etc.
SLAs | Existing SLAs per application
Procurement procedures | Elements of the procurement process, expenditure limits, justification procedures for acquisitions

125

[Figure: methodology diagram relating the Workload Model, Performance Model, and Cost Model to the Configuration Plan, Investment Plan, and Personnel Plan]

126


Workload characterization is the process of precisely describing the system’s global workload in terms of its main components.

The basic components are then characterized by intensity and service demand parameters at each resource of the system.

127

Workload Characterization Process

Global Workload
• Wkl component #1 (e.g., C/S transactions)
  • Basic component 1.1 (e.g., personnel transactions)
  • Basic component 1.2 (e.g., sales transactions)
• . . .
• Wkl component #n (e.g., Web doc. requests)
  • Basic component n.1 (e.g., small HTML docs.)
  • Basic component n.k (e.g., video requests)

128

Workload Description: example

Basic Components and Parameters | Type
Sales transaction | --
  Number of transactions submitted per client | WI
  Number of clients | WI
  Total number of I/Os to the Sales DB | SD
  CPU utilization at the DB server | SD
  Avg. messages sent/received by the DB server | SD
Web-based training | --
  Avg. number of training sessions per day | WI
  Avg. size of image files retrieved | SD
  Avg. size of http documents retrieved | SD
  Avg. number of image files retrieved/session | SD
  Avg. number of documents retrieved/session | SD
  Avg. CPU utilization of the httpd server | SD

SD = service demand
WI = workload intensity

129

Workload Characterization: concepts and ideas

• A basic component of a workload is a generic unit of work that arrives at the system from external sources, for example:
  • transaction
  • interactive command
  • process
  • HTTP request
• The choice of basic component depends on the nature of the service provided.

130

Workload Characterization: concepts and ideas

• Workload characterization
  • A workload model is a representation that mimics the workload under study.

• Workload models can be used for:
  • selection of systems
  • performance tuning
  • capacity planning

131

Workload Description

[Figure: levels of workload description. User: business description. Software: functional description. Hardware: resource-oriented description.]

132

Workload Description

• Business characterization: a user-oriented description that describes the load in terms such as number of employees, invoices per customer, etc.

• Functional characterization: describes programs, commands and requests that make up the workload

• Resource-oriented characterization: describes the consumption of system resources by the workload, such as processor time, disk operations, memory, etc.

133

Workload Models

• A model should be representative and compact.
• Natural models are constructed either using basic components of the real workload or using traces of the execution of the real workload.
• Artificial models do not use any basic component of the real workload.
  • Executable models (e.g., synthetic programs, artificial benchmarks, etc.)
  • Non-executable models, which are described by a set of parameter values that reproduce the same resource usage as the real workload.

134

Workload Models

• The basic inputs to analytical models are parameters that describe the service centers (i.e., hardware and software resources) and the customers (e.g., requests and transactions):
  • component (e.g., transaction) interarrival times;
  • service demands;
  • execution mix (e.g., levels of multiprogramming).

135

Selection of characterizing parameters

• Each workload component is characterized by two groups of information:
  • Workload intensity
    • arrival rate
    • number of clients and think time
    • number of processes or threads in execution simultaneously
  • Service demands (Di1, Di2, …, DiK), where Dij is the service demand of component i at resource j.
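One way to hold these two groups of parameters together is sketched below. The class, the component names, and all numeric values are hypothetical. The utilization calculation applies the standard operational relationship that the utilization of resource j is the sum over components of arrival rate times Dij; that relationship is not stated on the slides and is brought in here only to show how the parameters are used.

```python
from dataclasses import dataclass


@dataclass
class WorkloadComponent:
    name: str
    arrival_rate: float     # workload intensity, in requests/sec
    service_demands: dict   # resource name -> D_ij, in seconds per request


def resource_utilization(components):
    """Utilization per resource: sum over components of arrival_rate * D_ij."""
    utilization = {}
    for component in components:
        for resource, demand in component.service_demands.items():
            utilization[resource] = (utilization.get(resource, 0.0)
                                     + component.arrival_rate * demand)
    return utilization


# Hypothetical components and demands, for illustration only.
workload = [
    WorkloadComponent("login", 10.0, {"cpu": 0.008, "disk": 0.020}),
    WorkloadComponent("search", 2.0, {"cpu": 0.100, "disk": 0.050}),
]

for resource, value in resource_utilization(workload).items():
    print(f"{resource}: {value:.0%}")
```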

136

Partitioning the workload

• Motivation: real workloads can be viewed as a collection of heterogeneous components.

• Partitioning techniques divide the workload into a series of classes such that their populations are composed of quite homogeneous components.

• What attributes can be used for partitioning a workload into classes of similar components?

137

Workload Partitioning: Internet Applications

Application Classes KB Transmitted

WWW 4,216

ftp 378

telnet 97

Mbone 595

Others 63

138

Workload Partitioning: Document Types

Document Class Percentage of Access (%)

HTML (html file types) 30

Images (e.g., gif or jpeg) 40

Sound (e.g., au or wav) 4.5

Video (e.g., mpeg, avi or mov) 7.3

Dynamic (e.g., cgi or perl) 12.0

Formatted (e.g., ps, dvi or doc) 5.4

Others 0.8

139

Workload Partitioning: Geographical Orientation

Classes Percentage of Total Requests

East Coast 32

West Coast 38

Midwest 20

Others 10

140

Calculating the class parameters

• How should one calculate the parameter values that represent a class of components?

• Averaging: when a class consists of homogeneous components concerning service demands, an average of the parameter values of all components may be used.

• Clustering of workloads is a process in which a large number of components are grouped into clusters of similar components.
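The averaging approach can be sketched as follows, assuming each measured component has already been assigned to a class. The class labels and demand values are hypothetical; clustering would be needed first when the classes are not known in advance.

```python
from collections import defaultdict
from statistics import mean


def class_averages(components):
    """Average (cpu_ms, io_ms) over the components of each class.

    components: iterable of (class_label, cpu_ms, io_ms) tuples.
    """
    grouped = defaultdict(list)
    for label, cpu_ms, io_ms in components:
        grouped[label].append((cpu_ms, io_ms))
    return {
        label: (mean(cpu for cpu, _ in rows), mean(io for _, io in rows))
        for label, rows in grouped.items()
    }


# Hypothetical measurements: (class, CPU time in msec, I/O time in msec).
measurements = [
    ("query", 8, 120),
    ("query", 12, 100),
    ("report", 900, 1200),
    ("report", 1100, 1000),
]

print(class_averages(measurements))
```

Averaging is only representative when the components in a class are homogeneous; a class that mixed the two groups above would average to demands that match neither.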

141

Data Collection Issues

• How to determine the parameter values for each basic component?

Data collection facilities | Approach
None | Use benchmarks, industry practice, and ROTs (rules of thumb) only
Some | Use benchmarks, industry practice, ROTs, and measurements
Detailed | Use measurements only

142

Data Collection Issues: example

• The service demand at the server for a given application was 10 msec, obtained in a controlled environment on a server with a SPECint rating of 3.11.

• What would be the service demand if the server used in the actual system were faster and had a SPECint rating of 10.4?

ActualServiceDemand = MeasuredServiceDemand x ScalingFactor

ScalingFactor = ControlledResourceThroughput / ActualResourceThroughput

ActualServiceDemand = 10 * (3.11/10.4) = 3.0 msec.
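The same scaling can be written as a small function. The function and parameter names are illustrative; the approach assumes that service demand scales inversely with the benchmark rating, which is what the formula above implies.

```python
def scale_service_demand(measured_demand_ms, controlled_rating, actual_rating):
    """Scale a measured service demand to a server with a different benchmark rating."""
    scaling_factor = controlled_rating / actual_rating
    return measured_demand_ms * scaling_factor


# Values from the slide: 10 msec measured at SPECint 3.11, target server rated 10.4.
actual_demand = scale_service_demand(10, 3.11, 10.4)
print(f"{actual_demand:.1f} msec")  # 3.0 msec
```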

143

[Figure: methodology diagram relating the Workload Model, Performance Model, and Cost Model to the Configuration Plan, Investment Plan, and Personnel Plan]

144

[Figure: methodology diagram relating the Workload Model, Performance Model, and Cost Model to the Configuration Plan, Investment Plan, and Personnel Plan]

145


Workload Forecasting

• How will the number of e-mail messages handled per day by the server vary over the next 6 months?

• How will the number of hits to the corporate intranet’s Web server vary over time?

146

Workload Forecasting (cont’d)

• Answering these questions involves:

• evaluating the organization’s workload trends;
• analyzing historical usage data;
• analyzing business or strategic plans;
• mapping plans into business processes (e.g., paperwork reduction will add 50% more e-mail).

• Workload forecasting techniques: moving averages, exponential smoothing, etc.
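The two techniques named above can be sketched in a few lines. The function names, the window size, the smoothing constant, and the message counts are all hypothetical; both functions produce a forecast for the next period only.

```python
def moving_average_forecast(history, window):
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0 or window > len(history):
        raise ValueError("window must be between 1 and len(history)")
    return sum(history[-window:]) / window


def exponential_smoothing_forecast(history, alpha):
    """Simple exponential smoothing: f(t+1) = alpha * y(t) + (1 - alpha) * f(t)."""
    forecast = history[0]  # initialize with the first observation
    for observed in history[1:]:
        forecast = alpha * observed + (1 - alpha) * forecast
    return forecast


# Hypothetical e-mail messages handled per day over four periods.
messages_per_day = [1000, 1100, 1200, 1300]

print(moving_average_forecast(messages_per_day, window=3))        # 1200.0
print(exponential_smoothing_forecast(messages_per_day, alpha=0.5))  # 1212.5
```

Both forecasts lag behind a steadily growing series like this one, so a trend-aware method may be a better fit when the workload is clearly increasing.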

147

[Figure: methodology diagram relating the Workload Model, Performance Model, and Cost Model to the Configuration Plan, Investment Plan, and Personnel Plan]

148

Performance Modeling and Prediction

• How are performance measures estimated?

[Figure: a system and workload description is the input; performance metrics (response time, throughput, utilization, etc.) are the output]

149

Estimating Performance Measures

[Figure: a system description is fed to a queuing network model, which produces performance measures]

System description:
• System parameters
• Resource parameters
• Workload parameters
  • service demands
  • workload intensity

Performance measures:
• Response time
• Throughput
• Utilization
• Queue length

150

[Figure: methodology diagram relating the Workload Model, Performance Model, and Cost Model to the Configuration Plan, Investment Plan, and Personnel Plan]

151

Validating Performance Models

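[Figure: validation loop. Measurements on the real system (measured RT, throughput, etc.) are compared with calculations from the performance model (calculated RT, throughput, etc.). If the agreement is acceptable (*), the model is accepted; if not, the model is calibrated and the comparison is repeated.]

(*) Accuracy from 10 to 30% is acceptable in CP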
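The acceptance test in the validation loop can be sketched as a relative error check. The function names and the sample response times are hypothetical, and the default tolerance of 30% is taken from the upper end of the range quoted on the slide.

```python
def relative_error(measured, calculated):
    """Relative difference between the model's value and the measured value."""
    return abs(calculated - measured) / measured


def is_acceptable(measured, calculated, tolerance=0.30):
    """True when the model's prediction is within the tolerance of the measurement."""
    return relative_error(measured, calculated) <= tolerance


# Hypothetical response times in seconds: measured on the real system vs. model output.
print(is_acceptable(2.0, 2.3))  # True: about 15% error
print(is_acceptable(2.0, 3.0))  # False: 50% error, so the model needs calibration
```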

152

[Figure: methodology diagram relating the Workload Model, Performance Model, and Cost Model to the Configuration Plan, Investment Plan, and Personnel Plan]

153

Basic Tools and Utilities Used

154

Sysstat Utilities

• The sysstat utilities are a collection of performance monitoring tools for Linux.

• These include sar, sadf, mpstat, iostat, pidstat and sa tools.

155

Sysstat Utilities

• iostat reports CPU statistics and input/output statistics for devices, partitions, and network file systems.
• mpstat reports individual or combined processor-related statistics.
• pidstat reports statistics for Linux tasks (processes): I/O, CPU, memory, etc.
• sar collects, reports, and saves system activity information (CPU, memory, disks, interrupts, network interfaces, TTY, kernel tables, etc.).
• sadc is the system activity data collector, used as a backend for sar.
• sa1 collects and stores binary data in the system activity daily data file. It is a front end to sadc designed to be run from cron.
• sa2 writes a summarized daily activity report. It is a front end to sar designed to be run from cron.
• sadf displays data collected by sar in multiple formats (CSV, XML, etc.). This is useful for loading performance data into a database or importing it into a spreadsheet to make graphs.
• nfsiostat reports input/output statistics for network file systems (NFS).
• cifsiostat reports CIFS statistics.

159

Find out who was monopolizing or eating the CPUs

• Finally, we needed to determine which process was monopolizing or eating the CPUs.

161

iostat command

• “iostat” command reports CPU statistics and input/output statistics for devices and partitions.

• It can be used to find out your system's average CPU utilization since the last reboot.
