
Page 1: Icse2013 malik

Automatic Detection of Performance Deviations in the Load Testing of Large Scale Systems

Haroon Malik
Software Analysis and Intelligence Lab (SAIL)
Queen's University, Kingston, Canada

Hadi Hemmati
Software Architecture Group (SWAG)
University of Waterloo, Waterloo, Canada

Ahmed E. Hassan
Software Analysis and Intelligence Lab (SAIL)
Queen's University, Kingston, Canada

Performance Team, Canada

Page 2: Icse2013 malik

Large scale systems need to satisfy performance constraints

Page 3: Icse2013 malik

LOAD TESTING

Page 4: Icse2013 malik

Current Practice

1. Environment Setup
2. Load Test Execution
3. Load Test Analysis
4. Report Generation


Page 7: Icse2013 malik

Load Test Execution

[Diagram: Load Generator 1 and Load Generator 2 drive the system under test; a monitoring tool records counters into a system performance repository]


Page 11: Icse2013 malik

Challenges with Load Test Analysis

1. Limited Knowledge
2. Limited Time
3. Large Number of Counters


Page 15: Icse2013 malik

We Propose 4 Deviation Detection Approaches

3 Unsupervised, 1 Supervised

Page 16: Icse2013 malik

Using Performance Counters to Construct a Performance Signature

Example counters: %CPU Idle, %CPU Busy, Byte Commits, Disk Writes/sec, %Cache Faults/sec, Bytes Received

Page 17: Icse2013 malik

Performance Counters Are Highly Correlated

[Diagram: correlated counters across CPU, Disk (IOPS), Network, Memory, and Transactions/sec]
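This redundancy is easy to check directly. The sketch below is only an illustration (the counter names and values are made up, and pandas is assumed to be available); it computes a Pearson correlation matrix over a handful of hypothetical counter samples:

    import pandas as pd

    # Hypothetical counter samples, one row per minute of a load test.
    counters = pd.DataFrame({
        "cpu_busy_pct":     [55, 60, 72, 80, 78, 90],
        "disk_writes_sec":  [120, 130, 160, 175, 170, 200],
        "bytes_received":   [3.1e6, 3.4e6, 4.0e6, 4.6e6, 4.5e6, 5.2e6],
        "transactions_sec": [40, 44, 52, 58, 57, 66],
    })

    # Pearson correlations; values near 1 indicate redundant counters, which is
    # why a compact signature of a few counters can stand in for the full set.
    print(counters.corr().round(2))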

Page 18: Icse2013 malik

High Level Overview of our Approach

[Diagram: the baseline test and the new test (the input load tests) each pass through Data Preparation (sanitization, standardization), then Signature Generation, then Deviation Detection, which produces a Performance Report]
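As a rough illustration of the data-preparation step (our own sketch, not the paper's exact rules), the helper below sanitizes and standardizes a counter table; both the baseline test and the new test pass through this step before signature generation:

    import pandas as pd

    def prepare(load_test: pd.DataFrame) -> pd.DataFrame:
        # Sanitization: drop counters with missing values or no variance;
        # they carry no signal for deviation detection.
        cleaned = load_test.dropna(axis="columns")
        cleaned = cleaned.loc[:, cleaned.std() > 0]
        # Standardization: z-score each counter so that counters on different
        # scales (percentages, bytes/sec, ...) become comparable.
        return (cleaned - cleaned.mean()) / cleaned.std()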

Page 19: Icse2013 malik

Unsupervised Signature Generation

Random Sampling Approach: Prepared Load Test -> Random Sampling -> Signature

Clustering Approach: Prepared Load Test -> Data Reduction -> Clustering -> Extracting Centroids -> Signature

PCA Approach: Prepared Load Test -> Dimension Reduction (PCA) -> Identifying Top k Performance Counters (Mapping, Ranking; the analyst tunes the weight parameter) -> Signature
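A minimal sketch of the PCA-style signature, assuming a prepared (sanitized, standardized) counter table; folding the component loadings into per-counter weights as done below is our simplification of the mapping/ranking step, not the paper's exact procedure:

    import pandas as pd
    from sklearn.decomposition import PCA

    def pca_signature(prepared: pd.DataFrame, k: int = 5) -> pd.Series:
        pca = PCA().fit(prepared.values)
        # Map components back to counters: weight each counter by the absolute
        # component loadings, scaled by the variance each component explains.
        loadings = pd.DataFrame(pca.components_, columns=prepared.columns).abs()
        weights = loadings.mul(pca.explained_variance_ratio_, axis=0).sum()
        # Rank the counters and keep the top k as the performance signature;
        # in the actual approach the analyst tunes a weight parameter here.
        return weights.sort_values(ascending=False).head(k)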

Page 20: Icse2013 malik

Supervised Signature Generation

WRAPPER Approach: Prepared Load Test -> Labeling (only for the baseline) -> Partitioning the Data (SPC1, SPC2, ..., SPC10) -> Attribute Selection (OneR, Genetic Search, ...) -> Identifying Top k Performance Counters (i. Count, ii. % Frequency) -> Signature
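A simplified sketch of the wrapper idea, assuming labeled baseline observations. The actual approach partitions the data and runs attribute-selection searchers such as OneR and genetic search; here scikit-learn's greedy sequential selector stands in, and the counters chosen most frequently across partitions form the signature:

    from collections import Counter
    import pandas as pd
    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import KFold

    def wrapper_signature(prepared: pd.DataFrame, labels: pd.Series, k: int = 5):
        votes = Counter()
        for part_idx, _ in KFold(n_splits=10).split(prepared):
            part_X, part_y = prepared.iloc[part_idx], labels.iloc[part_idx]
            selector = SequentialFeatureSelector(
                LogisticRegression(max_iter=1000), n_features_to_select=k)
            selector.fit(part_X, part_y)
            votes.update(prepared.columns[selector.get_support()])
        # Counters selected most often (count / % frequency) form the signature.
        return [counter for counter, _ in votes.most_common(k)]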

Page 22: Icse2013 malik

Two Deviation Detection Approaches

1. Using a Control Chart (for the Clustering and Random Sampling approaches)
2. Using Approach-Specific Techniques (for the PCA and WRAPPER approaches)

Page 23: Icse2013 malik

Control Chart

[Figure: control chart of a performance counter's value over time (min) for the baseline load test and the new load test]


Page 26: Icse2013 malik

Control Chart

[Figure: baseline vs. new load test counter values over time]

Control charts use control limits (CLs) to represent the limits of variation that should be expected from a process.

Page 27: Icse2013 malik

Control Chart

[Figure: baseline vs. new load test counter values over time, with the baseline CL marked]

Center Line (CL): the median of all values of the performance counter in the baseline load test.

Page 28: Icse2013 malik

Control Chart

[Figure: baseline vs. new load test counter values over time, with the baseline CL, LCL, and UCL marked]

The Upper and Lower Control Limits (UCL/LCL) bound the range of a counter under the normal behavior of the system.
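As a small sketch of how these limits could be computed for a single counter (the percentile-based limits below are our assumption; sigma-based limits are equally common), with the violation ratio of the new test as the output:

    import numpy as np

    def control_chart(baseline, new_test):
        cl = np.median(baseline)                      # center line: baseline median
        lcl, ucl = np.percentile(baseline, [5, 95])   # lower/upper control limits
        # Fraction of new-test observations outside the limits; a high violation
        # ratio suggests the new test deviates from the baseline behavior.
        violations = sum(1 for v in new_test if v < lcl or v > ucl) / len(new_test)
        return cl, lcl, ucl, violations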

Page 29: Icse2013 malik

Deviation Detection

Clustering and Random Sampling Approaches: Baseline Signature + New Test Signature -> Control Chart -> Performance Report

PCA Approach: Baseline Signature + New Test Signature -> Comparing PCA Counter Weights -> Performance Report

WRAPPER Approach: Baseline Signature (training data) + New Test Signature (evaluation data) -> Logistic Regression -> Performance Report
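For the PCA approach, deviation detection amounts to comparing the counter weights of the two signatures. A minimal sketch, assuming both signatures are weight vectors indexed by counter name (the tolerance is illustrative, not the paper's threshold):

    import pandas as pd

    def compare_signatures(baseline_sig: pd.Series, new_sig: pd.Series,
                           tolerance: float = 0.2) -> pd.DataFrame:
        counters = baseline_sig.index.union(new_sig.index)
        report = pd.DataFrame({
            "baseline_weight": baseline_sig.reindex(counters).fillna(0.0),
            "new_weight": new_sig.reindex(counters).fillna(0.0),
        })
        # Counters whose weight shifted by more than the tolerance are reported
        # to the analyst as performance deviations.
        report["deviated"] = (
            (report["baseline_weight"] - report["new_weight"]).abs() > tolerance)
        return report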


Page 31: Icse2013 malik

Case Study

RQ: How effective are our signature-based approaches in detecting performance deviations in load tests?

An ideal approach should predict a minimal and correct set of performance deviations.

Evaluation using: Precision, Recall, and F-measure
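For reference, the three metrics on a tiny made-up example (both the ground-truth labels and the detector flags below are hypothetical):

    from sklearn.metrics import precision_recall_fscore_support

    truth = [0, 0, 1, 1, 1, 0, 1, 0]   # 1 = an injected performance deviation
    flags = [0, 0, 1, 1, 0, 0, 1, 1]   # 1 = observation flagged by an approach

    p, r, f, _ = precision_recall_fscore_support(truth, flags, average="binary")
    print(p, r, f)   # 0.75 0.75 0.75 for this example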

Page 32: Icse2013 malik

Subjects of Study

1. System: Open Source (DVD Store)
   Domain: E-commerce
   Type of data: data from our experiments with an open-source benchmark application

2. System: Industrial System
   Domain: Telecom
   Type of data: (1) a load test repository, (2) data from our experiments on the company's testing platform

Page 33: Icse2013 malik

Fault Injection

Category          | Failure Trigger     | Faults
Software Failure  | Resource Exhaustion | CPU Stress, Memory Stress
System Overload   | Abnormal Workload   | Interfering Workload
Operator Errors   | Procedural Errors   | Unscheduled Replication

Page 34: Icse2013 malik

Case Study Findings

1. Effectiveness (Precision / Recall / F-measure)
2. Practical Differences

Page 35: Icse2013 malik

Case Study Findings (Effectiveness)

[Figure: Precision, Recall, and F-Measure of the WRAPPER, PCA, Clustering, and Random Sampling approaches]

Random Sampling has the lowest effectiveness.

On average, and in all experiments, PCA performs better than the Clustering approach.

WRAPPER dominates the best unsupervised approach, i.e., PCA.

Page 36: Icse2013 malik

Case Study Findings (Effectiveness)

[Figure: Precision, Recall, and F-Measure of the WRAPPER, PCA, Clustering, and Random Sampling approaches]

Overall, both the WRAPPER and PCA approaches achieve an excellent balance of high precision and recall for deviation detection (on average 0.95 and 0.94 for WRAPPER, and 0.82 and 0.84 for PCA).

Page 37: Icse2013 malik

Case Study Findings (Practical Differences)

1. Real-Time Analysis
2. Stability
3. Manual Overhead

Page 38: Icse2013 malik

Real-Time Analysis

WRAPPER: detects deviations on a per-observation basis.

PCA: requires a certain number of observations before it can report (wait time).

Page 39: Icse2013 malik

Stability

We refer to "stability" as the ability of an approach to remain effective as its signature size is reduced.


Page 43: Icse2013 malik

Stability

[Figure: two plots of F-Measure vs. signature size (45 down to 1 counters, and 64 down to 1 counters) for the unsupervised (PCA) and supervised (WRAPPER) approaches]

The WRAPPER approach is more stable than the PCA approach.

Page 44: Icse2013 malik

Manual Overhead

The WRAPPER approach requires all observations of the baseline performance counter data to be labeled as Pass/Fail.

Page 45: Icse2013 malik

Manual Overhead

Labeling each observation is time-consuming.

Page 46: Icse2013 malik

Limitations of Our Approaches

1. Hardware Differences
2. Sensitivity

Page 47: Icse2013 malik

Hardware Differences

Large systems have many testing labs so that multiple load tests can run simultaneously. Labs may have different hardware.

Our approaches may interpret the same test run in different labs as a deviated test.

Recent work by K. C. Foo et al. proposes several techniques to overcome this problem: "Mining Performance Regression Testing Repositories for Automated Performance Analysis" (QSIC 2010).

Page 48: Icse2013 malik

Sensitivity

We can tune the sensitivity of our approaches:

PCA: the analyst tunes the weight parameter.
WRAPPER: the analyst decides, in the labeling phase, how large a deviation must be in order to be flagged.

Lowering the sensitivity reduces false alarms, but it may overlook some important outliers.
