Software Performance Testing Based on Workload Characterization
Elaine Weyuker
Alberto Avritzer
Joe Kondek
Danielle Liu
AT&T Labs
Workload Characterization
A probability distribution associated with the input domain that describes how the system is used when it is operational in the field. Also called an operational profile or operational distribution. It is derived by monitoring field usage.
We have used it to select (correctness) test cases, predict risk, assess software reliability, predict scalability, and select performance test cases, among other things.
Steps in Characterizing the Workload
• Model the software system.
  – Identify key parameters that characterize the system's behavior.
  – Get the granularity right.
• Collect data while the system is operational, or from a related operational system.
• Analyze the data and determine the probability distribution (a sketch of this step follows).
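The last step can be made concrete. A minimal sketch, assuming the usage logs have already been reduced to one page-type label per request (the log format and page names here are hypothetical, not from the study):

```python
from collections import Counter

def operational_profile(requests):
    """Estimate the operational distribution: the relative
    frequency of each operation (page type) seen in field usage."""
    counts = Counter(requests)
    total = sum(counts.values())
    return {op: n / total for op, n in counts.items()}

# Hypothetical reduced log: one label per observed request.
log = ["static", "static", "search_form", "error", "static", "search_result"]
print(operational_profile(log))
# e.g. {'static': 0.5, 'search_form': 0.167, 'error': 0.167, 'search_result': 0.167}
```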
System Description
Automated customer care system, built by another company, that can be accessed by both customer care agents and customers. It contains a large database with a web browser front-end and a cache facility.
For this system, data was collected for 2½ months, and page hits were analyzed at 15-minute intervals.
Implementation Information
The system was implemented as an extension to the http web server daemon, with a mutex semaphore used to implement database locking. The system was single-threaded; queued processes executed spin lock operations until the semaphore was free.
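To see why spinning waiters matter for capacity, here is a minimal Python sketch of a spin lock. The production system was a C http daemon, not Python; this toy only illustrates that a busy-waiting acquire consumes CPU the whole time it waits, unlike a blocking mutex:

```python
import threading

class SpinLock:
    """Toy spin lock: failed acquire attempts loop ('spin') at full
    speed, so every queued caller burns CPU until the lock frees up."""
    def __init__(self):
        self._inner = threading.Lock()

    def acquire(self):
        # Busy-wait instead of blocking: this loop is the kind of CPU
        # cost that appears when requests queue on a locked database.
        while not self._inner.acquire(blocking=False):
            pass

    def release(self):
        self._inner.release()
```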
Before Performance Testing
Prior to doing performance testing, users were complaining about poor performance, and the database was “hanging” several times a day.
The hypothesis was that these problems were capacity-related, and the vendor was contacted but unable to solve the problems.
Performance Testing Goals
Help the project team determine:
• Which resources were overloaded.
• Effects of the database size on performance.
• Effects of single vs. multiple transactions.
• Effects of the cache hit rate.
• Effects of the number of http servers.
System Information
• The web-server database was modeled as an M/D/1 queue.
• The arrival process was assumed to be Poisson.
• The cache hit rate was determined to be central to the system's performance. It ranged between 80% and 87%, with an average of 85%. (A sketch of the resulting response-time model follows.)
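A minimal sketch of the M/D/1 mean response time, using the Pollaczek–Khinchine formula for deterministic service. Blending the cache-hit and cache-miss service times into one mean, and treating that mean as if deterministic, is a simplifying assumption on our part; the ~100 ms / ~5 s figures are the rough times reported later in the talk:

```python
def md1_response_time(lam, s):
    """Mean response time (sec) of an M/D/1 queue.
    lam: Poisson arrival rate (req/sec); s: deterministic service time (sec).
    Queue wait (Pollaczek-Khinchine, deterministic service):
        Wq = lam * s**2 / (2 * (1 - rho)),  where rho = lam * s
    """
    rho = lam * s
    if rho >= 1.0:
        raise ValueError(f"unstable queue: utilization rho = {rho:.2f} >= 1")
    wq = lam * s * s / (2.0 * (1.0 - rho))
    return s + wq

# Blend cache-hit (~0.1 s) and cache-miss (~5 s) service into one mean.
hit_rate = 0.85
s_mean = hit_rate * 0.1 + (1.0 - hit_rate) * 5.0   # ~0.835 s per request
print(md1_response_time(0.5, s_mean))  # predicted response time at 0.5 hits/sec
```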
Distribution of User Requests
Page Type       Agent Requests   Customer Requests
Static Page     50%              23%
Error Code      10%              23%
Search Form      7%              30%
Search Result    8%              16%
Other Pages     25%               8%
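One way this table feeds performance testing is to generate synthetic load whose page-type mix matches the measured distribution. A sketch under that assumption (the weights are the agent column above; the key names are hypothetical labels, not identifiers from the system):

```python
import random

# Agent request mix from the table above.
AGENT_MIX = {
    "static_page":   0.50,
    "error_code":    0.10,
    "search_form":   0.07,
    "search_result": 0.08,
    "other":         0.25,
}

def next_page_type(mix=AGENT_MIX):
    """Draw one page type with probability proportional to its share."""
    return random.choices(list(mix), weights=mix.values(), k=1)[0]

# Example: a 10-request workload slice matching the operational profile.
print([next_page_type() for _ in range(10)])
```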
Computing the Cache Hit Probability
Page Type       Frequency   P(occurrence)   P(cache hit)   Weighted prob
Home                2707       0.2236          0.9996         0.2235
Static              2515       0.2077          0.9407         0.1954
Error Code          1316       0.1087          0.6915         0.0752
Screen Shot         1076       0.0889          0.6078         0.0540
Search Result       1035       0.0855          0.0463         0.0040
Search Form          832       0.0687          0.9218         0.0633
Index                494       0.0408          0.9797         0.0400
Other               2132       0.1761          0.9484         0.1670
Total             12,106       1.0000                         0.8224
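The bottom-line 0.8224 is the law of total probability: overall hit rate = Σ P(page) × P(hit | page). A quick reproduction of the table's arithmetic, with the per-page frequencies and cache-hit probabilities taken from the rows above:

```python
# (frequency, cache-hit probability) per page type, from the table above.
pages = {
    "home":          (2707, 0.9996),
    "static":        (2515, 0.9407),
    "error_code":    (1316, 0.6915),
    "screen_shot":   (1076, 0.6078),
    "search_result": (1035, 0.0463),
    "search_form":   ( 832, 0.9218),
    "index":         ( 494, 0.9797),
    "other":         (2132, 0.9484),
}

total = sum(freq for freq, _ in pages.values())
overall_hit = sum(freq / total * p for freq, p in pages.values())
print(f"{overall_hit:.4f}")  # ~0.8224, matching the table's total
```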
System Observations
Heavier workload for agents on weekdays, peak hours in the afternoon, with little day-to-day variation.
Customer peak hours occurred during the evening.
Little change in workload as users become familiar with the system. (Agents are already expert users and execute a well-defined process, while individual customers tend to use the system rarely and therefore also maintain the same usage pattern over time.)
What We Achieved
• Characterized the agent and customer workload, and used it as a basis for performance testing.
• Identified performance limits and as a result detected a software bottleneck.
• Provided recommendations for performance improvement.
• Increased understanding of how data collection can support performance analysis.
What We Learned
The system was running at only about 20% utilization.
The CISCO routers were not properly load balanced.
The spin lock operations consumed CPU time, which led to a steep increase in the response time. (We used the SUN SE toolkit to record the number of spin locks.)
No Caching
[Figure: average CPU cost (sec/request) vs. load (hits/sec, 0–2.5); Customer and Agent curves]
No Caching
[Figure: average response time (ms) vs. load (hits/sec, 0–2.5); Customer and Agent curves]
All Requests Retrieved From Cache
[Figure: average response time (ms) vs. load (hits/sec, 0–180); Customer and Agent curves]
Simulated 85% Cache Hit Rate
[Figure: average response time (ms) vs. load (hits/sec, 0–4); Customer and Agent curves]
In Particular
Delay strongly depends on caching:
  Found in cache: ~100 ms
  Retrieved from database: ~5 sec

Current available capacity:
  Customer: 2 hits/sec
  Agent: 2.5 hits/sec

Average demand:
  Customer: 10,000 hits/day = 0.12 hits/sec
  Agent: 25,000 hits/day = 0.29 hits/sec

Busy-hour demand:
  Customer: 784 hits/hour = 0.22 hits/sec
  Agent: 2,228 hits/hour = 0.62 hits/sec
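The per-second figures are just unit conversions of the measured counts (86,400 seconds per day, 3,600 per hour); a quick check:

```python
# hits/day -> hits/sec and hits/hour -> hits/sec
print(10_000 / 86_400)  # ~0.116 -> 0.12 hits/sec (customer average)
print(25_000 / 86_400)  # ~0.289 -> 0.29 hits/sec (agent average)
print(784 / 3_600)      # ~0.218 -> 0.22 hits/sec (customer busy hour)
print(2_228 / 3_600)    # ~0.619 -> 0.62 hits/sec (agent busy hour)
```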
Effect of Database Size: Customer – Cache Off
[Figure: average response time (ms) vs. load (hits/sec, 0–2); curves for small (200 MB), medium (400 MB), and large (600 MB) databases]
Effect of Database Size: Agent – Cache Off
[Figure: average response time (ms) vs. load (hits/sec, 0–2); curves for small (200 MB), medium (400 MB), and large (600 MB) databases]
Adding Servers
For this system, n servers meant n service queues, each operating independently, and hence less lock contention. This led to a significant increase in the workload that could be handled.
However, since each server maintained its own caching mechanism, there was a distinct decrease in the cache hit probability, and an associated increase in the response time.
The response time is dominated by the cache hit probability when the load is low; as the load increases, the queuing for the database also increases. (A toy model of this trade-off is sketched below.)
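A toy illustration of the trade-off, reusing the M/D/1 sketch from earlier. Splitting traffic evenly across n independent queues is an assumption, and the per-server cache hit rate is left as an input because the talk does not quantify how much it dropped (the 70% figure below is illustrative only):

```python
def md1_response_time(lam, s):
    """Mean M/D/1 response time (sec); see the earlier sketch."""
    rho = lam * s
    if rho >= 1.0:
        raise ValueError("unstable queue")
    return s + lam * s * s / (2.0 * (1.0 - rho))

def n_server_response(lam, n, hit_rate, s_hit=0.1, s_miss=5.0):
    """n independent servers, each with its own queue and its own cache.
    More servers -> each queue sees lam/n (less contention), but a
    smaller per-server cache hit rate raises the mean service time."""
    s = hit_rate * s_hit + (1.0 - hit_rate) * s_miss
    return md1_response_time(lam / n, s)

# One shared cache at 85% vs. four split caches at an assumed 70%:
for lam in (0.2, 1.0):
    print(lam, n_server_response(lam, 1, 0.85), n_server_response(lam, 4, 0.70))
# At low load the single shared cache wins; at higher load the split
# queues win, matching the qualitative behavior described above.
```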
Multiple Servers
[Figure: average response time (ms) vs. load (hits/sec, 0–8); curves for 1 through 6 web servers]
Recommendations
• Projects should collect traffic data on a daily basis.
• Performance measurements should be made while the software is being tested in the lab – both for new systems and when changes are being made.
• Workload-based testing is a very cost-effective way to do performance testing.
THE END
No Caching
[Figure: average response time (ms) vs. load (hits/sec, 0–2.5); curves for Error, Result, Form, and Static page types]
Cache Off
[Figure: average CPU cost (sec/request) vs. load (hits/sec, 0–2.5); curves for Error, Result, Form, and Static page types]