metamorphic testing - moodle

Metamorphic Testing Largely based on Christian Murphy material and research


Page 1: Metamorphic Testing - Moodle

Metamorphic Testing

Largely based on Christian Murphy's material and research

Page 2: Metamorphic Testing - Moodle


Motivating Example: Machine Learning

Page 3: Metamorphic Testing - Moodle

Motivating Example: Simulation

[Figure: "Length of Stay versus Utilization". x-axis: number of beds (0 to 10); y-axis: units of time (0 to 300). Series: LOS, Doctor Utilization, Nurse Utilization, Triage Utilization, Clerk Utilization.]

Page 4: Metamorphic Testing - Moodle

Problem Statement

■ Partial oracles may exist for a limited subset of the input domain in applications such as Machine Learning, Discrete Event Simulation, Scientific Computing, Optimization, etc.

■ Obvious errors (e.g., crashes) can be detected with certain inputs or testing techniques

■ However, it is difficult to detect subtle computational defects in applications without test oracles in the general case

Page 5: Metamorphic Testing - Moodle

What do I mean by “defect”?

■ Deviation of the implementation from the specification

■ Violation of a sound property of the software

■ “Discrete localized” calculation errors
  • Off-by-one
  • Incorrect sentinel values for loops
  • Wrong comparison or mathematical operator

■ Misinterpretation of specification
  • Parts of input domain not handled
  • Incorrect assumptions made about input

Page 6: Metamorphic Testing - Moodle

Observation

■ Many programs without oracles have properties such that certain changes to the input yield predictable changes to the output

■ We can detect defects in these programs by looking for any violations of these “metamorphic properties”

■ This is known as “metamorphic testing” [T.Y. Chen et al., Info. & Soft. Tech vol.4, 2002]

Page 7: Metamorphic Testing - Moodle

Hypotheses

■ For programs that do not have a test oracle, an automated approach to metamorphic testing is more effective at detecting defects than other approaches

■ An approach that conducts function-level metamorphic testing in the context of a running application will further increase the effectiveness

■ It is feasible to continue this type of testing in the deployment environment, with minimal impact on the end user

Page 8: Metamorphic Testing - Moodle

Other Approaches [Baresi & Young, 2001]

■ Formal specifications
  • A complete specification is essentially a test oracle

■ Embedded assertions
  • Can check that the software behaves as expected

■ Algebraic properties
  • Used to generate test cases for abstract datatypes

■ Trace checking & Log file analysis
  • Analyze intermediate results and sequence of executions

Page 9: Metamorphic Testing - Moodle

Complexity vs. Effectiveness

[Figure: approaches plotted by complexity (x-axis) against effectiveness (y-axis): Embedded Assertions, Algebraic Specifications, Formal Specifications, Trace Checking & Log Analysis, System-level Metamorphic Testing, Metamorphic Runtime Checking.]

Page 10: Metamorphic Testing - Moodle

Metamorphic Testing [Chen et al., 2002]

■ Initial test case: x → f → f(x)

■ New test case: t(x) → f → f(t(x))

■ t is a transformation function based on metamorphic properties of f

■ f(x) and f(t(x)) are “pseudo-oracles”

■ If the new test case output f(t(x)) is as expected, it is not necessarily correct

■ However, if f(t(x)) is not as expected, either f(x) or f(t(x)) – or both! – is wrong

Page 11: Metamorphic Testing - Moodle

Metamorphic Testing Example

■ Consider a function to determine the standard deviation of a set of numbers

Initial input:      a b c d e f              → std_dev → s
New test case #1:   c e b a f d              → std_dev → s ?
New test case #2:   a+2 b+2 c+2 d+2 e+2 f+2  → std_dev → s ?
New test case #3:   2a 2b 2c 2d 2e 2f        → std_dev → 2s ?
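The three follow-up test cases can be sketched in Python as below. This is only an illustration: the helper name check_std_dev_properties is hypothetical, statistics.pstdev stands in for the function under test, and a list reversal is used as the permutation so that the run is repeatable.

```python
import math
import statistics

def check_std_dev_properties(values, std_dev=statistics.pstdev, tol=1e-9):
    """Return the names of the metamorphic properties that std_dev violates."""
    s = std_dev(values)  # output of the initial test case
    follow_ups = {
        "permute": (list(reversed(values)), s),             # expect s
        "add 2": ([v + 2 for v in values], s),              # expect s
        "multiply by 2": ([2 * v for v in values], 2 * s),  # expect 2s
    }
    return [
        name
        for name, (new_input, expected) in follow_ups.items()
        if not math.isclose(std_dev(new_input), expected, rel_tol=tol, abs_tol=tol)
    ]
```

An empty list means no property was violated, which does not prove that the implementation is correct. Passing a variance function instead of a standard deviation is flagged by the multiplicative property only, because the variance grows by a factor of 4 rather than 2.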

Page 12: Metamorphic Testing - Moodle

Metamorphic Properties

Page 13: Metamorphic Testing - Moodle

Categories of Metamorphic Properties

■ Additive: Increase (or decrease) numerical values by a constant

■ Multiplicative: Multiply numerical values by a constant

■ Permutative: Randomly permute the order of elements in a set

■ Invertive: Negate the elements in a set

■ Inclusive: Add a new element to a set

■ Exclusive: Remove an element from a set

■ Compositional: Compose a set

Page 14: Metamorphic Testing - Moodle

Classifiers Metamorphic Properties

1. Permuting the order of the examples in the training data should not affect the model

2. If all attribute values in the training data are multiplied by a positive constant, the model should stay the same

3. If all attribute values in the training data are increased by a positive constant, the model should stay the same

4. Updating a model with a new example should yield the same model created with training data originally containing that example

5. If all attribute values in the training data are multiplied by -1, and an example to be classified is also multiplied by -1, the classification should be the same

6. Permuting the order of the examples in the testing data should not affect their classification

7. If all attribute values in the training data are multiplied by a positive constant, and an example to be classified is also multiplied by the same positive constant, the classification should be the same

8. If all attribute values in the training data are increased by a positive constant, and an example to be classified is also increased by the same positive constant, the classification should be the same
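A minimal sketch of how some of these properties could be checked, assuming a toy 1-nearest-neighbour classifier (not one of the classifiers studied in these slides). The function name, data, and constants are hypothetical. Integer attributes keep the distance arithmetic exact, and the data is chosen so that no two distances tie.

```python
def nearest_neighbor(training, example):
    """Classify example with the label of the closest training point (1-NN)."""
    def sq_dist(point):
        return sum((p - e) ** 2 for p, e in zip(point, example))
    _, label = min(training, key=lambda row: sq_dist(row[0]))
    return label

def transform(training, example, fn):
    """Apply fn to every attribute value of the training data and the example."""
    new_training = [(tuple(fn(v) for v in x), y) for x, y in training]
    return new_training, tuple(fn(v) for v in example)

training = [((1, 1), "A"), ((2, 1), "A"), ((8, 9), "B"), ((9, 7), "B")]
example = (3, 2)
baseline = nearest_neighbor(training, example)

# Permuting the training examples (in the spirit of property 1)
permuted = nearest_neighbor(list(reversed(training)), example)
# Property 5: multiply training data and example by -1
negated = nearest_neighbor(*transform(training, example, lambda v: -v))
# Property 7: multiply training data and example by the same positive constant
scaled = nearest_neighbor(*transform(training, example, lambda v: 3 * v))
# Property 8: increase training data and example by the same positive constant
shifted = nearest_neighbor(*transform(training, example, lambda v: v + 5))

violated = [name for name, result in
            [("permute", permuted), ("negate", negated),
             ("scale", scaled), ("shift", shifted)]
            if result != baseline]
```

For this toy classifier the "model" is simply the training set, so the sketch checks the observable consequence (an unchanged classification) rather than comparing models directly.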

Page 15: Metamorphic Testing - Moodle

Other Classes of Properties (1)

■ Statistical
  • Same mean, variance, etc. as the original

■ Heuristic
  • Approximately equal to the original

■ Semantically Equivalent
  • Domain specific

Page 16: Metamorphic Testing - Moodle

Other Classes of Properties (2)

■ Noise Based
  • Add/Change data that should not affect result

■ Partial
  • Change to part of input only affects part of output

■ Compositional
  • New input relies on original output
  • ShortestPath(a, b) = ShortestPath(a, c) + ShortestPath(c, b), where c is a node on the shortest path from a to b
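The compositional shortest-path property can be sketched as follows, assuming a small weighted graph stored as a dictionary and a plain Dijkstra search. The graph and the function names are hypothetical. The relation is checked only for nodes c that lie on the path returned for (a, b); for other nodes the sum is merely an upper bound.

```python
import heapq

def shortest_path(graph, source, target):
    """Dijkstra search; graph maps node -> {neighbor: weight}. Returns (distance, path)."""
    queue = [(0, source, [source])]
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == target:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (dist + weight, neighbor, path + [neighbor]))
    return float("inf"), []

def composition_holds(graph, a, b):
    """Check ShortestPath(a, b) == ShortestPath(a, c) + ShortestPath(c, b) for each c on the path."""
    total, path = shortest_path(graph, a, b)
    return all(
        shortest_path(graph, a, c)[0] + shortest_path(graph, c, b)[0] == total
        for c in path
    )

graph = {
    "a": {"c": 2, "d": 5},
    "c": {"b": 3, "d": 1},
    "d": {"b": 4},
    "b": {},
}
```

Here the new inputs (a, c) and (c, b) are derived from the original output, which is what makes the property compositional.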

Page 17: Metamorphic Testing - Moodle

Automatic Detection of Properties

■ Static
  • Use machine learning to model what code looks like that exhibits certain properties, then determine whether other code matches that model
  • Use symbolic execution to check “algebraically”

■ Dynamic
  • Observe multiple executions and infer properties

Page 18: Metamorphic Testing - Moodle

Computer Vision

Objects do not change

Page 19: Metamorphic Testing - Moodle

Given an Image

• Objects do not change their nature if we

  • rotate them

  • change the colour

  • change the shadow(s)

  • superimpose effects such as rain, fog, shadow(s)

• If two or more objects of the same type are in an image

  • they still remain of the same type

  • if a different object is placed in the scene, the original objects are still there and unchanged

Page 20: Metamorphic Testing - Moodle

Graphs

• Given a directed graph, if we reverse every edge

  • the shortest-path length from a to b equals the shortest-path length from b to a in the reversed graph

  • the maximum flow from s to t equals the maximum flow from t to s in the reversed graph

  • the strongly connected components remain the same
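As an illustration of the strongly-connected-components property, the sketch below computes components by mutual reachability and compares them before and after reversing every edge. The graph and function names are hypothetical, every node is assumed to appear as a key, and the quadratic algorithm is chosen for brevity rather than speed.

```python
def reachable(graph, start):
    """Return the set of nodes reachable from start (iterative depth-first search)."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in graph.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def strongly_connected_components(graph):
    """Group nodes that can reach each other; returns a set of frozensets."""
    reach = {node: reachable(graph, node) for node in graph}
    return {
        frozenset(m for m in graph if m in reach[n] and n in reach[m])
        for n in graph
    }

def reverse_edges(graph):
    """Return a new graph with the direction of every edge flipped."""
    reversed_graph = {node: [] for node in graph}
    for node, neighbors in graph.items():
        for neighbor in neighbors:
            reversed_graph[neighbor].append(node)
    return reversed_graph

graph = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": ["e"], "e": ["d"]}
property_holds = (
    strongly_connected_components(reverse_edges(graph))
    == strongly_connected_components(graph)
)
```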

Page 21: Metamorphic Testing - Moodle

Strings

• If a string X contains the substring Y

• then the reverse of X contains the reverse of Y
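A short sketch of the string property, with a hypothetical helper name. It checks slightly more than containment: the first occurrence of the reversed substring in the reversed string should sit at the mirror image of the last occurrence in the original.

```python
def reversal_property_holds(x, y):
    """If y occurs in x, reversed y must occur in reversed x at the mirrored position."""
    if y not in x:
        return True  # the property says nothing when y is absent
    mirrored = len(x) - x.rfind(y) - len(y)
    return x[::-1].find(y[::-1]) == mirrored
```

A substring search that mishandles, say, matches at the very end of the string would satisfy the property on some inputs and break it on others, which is the kind of inconsistency this check is meant to expose.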

Page 22: Metamorphic Testing - Moodle

Mathematical functions

SqRt(X)*SqRt(Y) == SqRt(X*Y)

• Test 1: SqRt(9) == 3

• GenTest 2: SqRt(9)*SqRt(2) == SqRt(9*2)

log(k*X) == log(X) + log(k)

exp(k+X) == exp(X)*exp(k)

(x+k)^2 == x^2 + k^2 + 2kx
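These relations can be checked numerically, as in the sketch below. It assumes positive x, y, and k, uses the identity exp(k + x) = exp(x) * exp(k), and compares with a relative tolerance because floating-point results rarely match exactly. The function name is hypothetical.

```python
import math

def math_relations_hold(x, y, k):
    """Check the relations for positive x, y, k, allowing for floating-point rounding."""
    checks = [
        (math.sqrt(x) * math.sqrt(y), math.sqrt(x * y)),
        (math.log(k * x), math.log(x) + math.log(k)),
        (math.exp(k + x), math.exp(x) * math.exp(k)),
        ((x + k) ** 2, x ** 2 + k ** 2 + 2 * k * x),
    ]
    return all(math.isclose(left, right, rel_tol=1e-9) for left, right in checks)
```

The exact-equality form written on the slide (==) is best read as a mathematical statement; in a real test the comparison has to tolerate rounding error.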

Page 23: Metamorphic Testing - Moodle

Automated Metamorphic Testing

Page 24: Metamorphic Testing - Moodle

Automated Metamorphic Testing

■ Tester specifies the application’s metamorphic properties

■ Test framework does the rest:
  • Transform inputs
  • Execute program with each input
  • Compare outputs according to specification
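The division of labour described above can be sketched as a tiny driver. This is not the framework used in the studies; run_metamorphic_tests and the property triples (name, input transformation, output relation) are hypothetical, and the program under test is assumed to sum a list of integers.

```python
def run_metamorphic_tests(program, properties, inputs):
    """Apply every property to every input and collect the violations."""
    violations = []
    for x in inputs:
        original = program(x)                     # execute with the original input
        for name, transform, relation in properties:
            follow_up = program(transform(x))     # execute with the transformed input
            if not relation(original, follow_up): # compare according to the specification
                violations.append((name, x))
    return violations

# Properties the tester would specify for a program that sums a list
properties = [
    ("permutative", lambda xs: xs[::-1], lambda out, new: new == out),
    ("multiplicative", lambda xs: [2 * v for v in xs], lambda out, new: new == 2 * out),
    ("inclusive", lambda xs: xs + [0], lambda out, new: new == out),
]

violations = run_metamorphic_tests(sum, properties, [[1, 2, 3], [4, 5]])
```

Only the properties list is specific to the application; the loop that transforms, executes, and compares is the part a framework can automate.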

Page 25: Metamorphic Testing - Moodle


AMST Model

Page 26: Metamorphic Testing - Moodle


Specifying Metamorphic Properties

Page 27: Metamorphic Testing - Moodle

Empirical Study

■ Is metamorphic testing more effective than other approaches in detecting defects in applications without test oracles?

■ Approaches investigated
  • Metamorphic Testing: using metamorphic properties of the entire application
  • Runtime Assertion Checking: using Daikon-detected program invariants
  • Partial Oracle: simple inputs for which correct output can easily be determined

Page 28: Metamorphic Testing - Moodle

Applications Investigated

■ Machine Learning
  • C4.5: decision tree classifier
  • MartiRank: ranking
  • Support Vector Machines (SVM): vector-based classifier
  • PAYL: anomaly-based intrusion detection system

■ Discrete Event Simulation
  • JSim: used in simulating hospital ER

■ Information Retrieval
  • Lucene: Apache framework’s text search engine

■ Optimization
  • gaffitter: genetic algorithm approach to bin-packing problem

Page 29: Metamorphic Testing - Moodle

Methodology

■ Mutation testing was used to seed defects into each application
  • Comparison operators were reversed
  • Math operators were changed
  • Off-by-one errors were introduced

■ For each program, we created multiple versions, each with exactly one mutation

■ We ignored mutants that yielded outputs that were obviously wrong, caused crashes, etc.

■ Effectiveness is determined by measuring what percentage of the mutants were “killed”

Page 30: Metamorphic Testing - Moodle

Experimental Results

[Figure: bar chart of % of mutants killed (0 to 100) for C4.5, MartiRank, SVM, PAYL, JSim, Lucene, gaffitter, and TOTAL, comparing Metamorphic Testing, Runtime Assertion Checking, and Partial Oracle.]

Page 31: Metamorphic Testing - Moodle

SVM Results

■ Permuting the input was very effective at killing off-by-one mutants

■ Many functions in SVM analyze a set of numbers (mean, standard dev, etc.)

■ Off-by-one mutants caused some element of the set to be omitted

■ By permuting, a different number would be omitted

■ This revealed the defect
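The effect can be reproduced with a deliberately defective mean function. The code below is a made-up illustration, not the SVM code from the study: the loop stops one element early, so reversing the input changes which element is dropped and therefore changes the output.

```python
def buggy_mean(values):
    """Mean with a seeded off-by-one defect: the loop stops one element early."""
    total = 0
    for i in range(len(values) - 1):  # defect: should be range(len(values))
        total += values[i]
    return total / len(values)

data = [4, 8, 15, 16, 23, 42]
original = buggy_mean(data)        # omits 42
permuted = buggy_mean(data[::-1])  # omits 4

# A correct mean is unaffected by the order of its input, so a mismatch reveals the defect
defect_revealed = original != permuted
```

Neither output needs to be known in advance; the disagreement between the two runs is enough.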

Page 32: Metamorphic Testing - Moodle

SVM Example: Off-by-one

■ Permuting the input reveals this defect because both m_I1 and m_I4 will be different

■ The partial oracle does not reveal it: only one element is omitted, so one of the two values remains the same, and for small data sets this did not affect the overall result

Page 33: Metamorphic Testing - Moodle

C4.5 Results

■ Negating the input was very effective

■ C4.5 creates a decision tree in which nodes contain clauses like “if attr_n > α then class = C”

■ If the data set is negated, those nodes should change to “if attr_n ≤ -α then class = C”, i.e. both the operator and the sign of α should change

■ In most cases, only one of the changes occurred

Page 34: Metamorphic Testing - Moodle

Analysis of Results

■ Assertions are good for checking bounds and relationships but not for changes to values

■ Metamorphic testing is particularly good for detecting errors in loop conditions

■ Metamorphic testing was not very effective for PAYL (5%) and gaffitter (33%)
  • fewer properties identified
  • defects had little impact on output

Page 35: Metamorphic Testing - Moodle

Metamorphic Runtime Checking

Page 36: Metamorphic Testing - Moodle

Metamorphic Runtime Checking

■ Results of previous study revealed limitations of scope and robustness in metamorphic testing

■ What if we consider the metamorphic properties of individual functions and check those properties as the entire program is running?

■ A combination of metamorphic testing and runtime assertion checking

Page 37: Metamorphic Testing - Moodle

Metamorphic Runtime Checking

■ Tester specifies the metamorphic properties of individual functions using a special notation in the code (based on JML)

■ Pre-processor instruments code with corresponding metamorphic tests

■ Tester runs entire program as normal (e.g., to perform system tests)

■ Violation of any property reveals a defect

Page 38: Metamorphic Testing - Moodle


Extensions to JML

Page 39: Metamorphic Testing - Moodle

MRC Model of Execution

1. Function f is about to be executed with input x in state S

2. Create a sandbox for the test

3. In the program: execute f(x) to get the result, send the result to the test, and the program continues

4. In the metamorphic test: transform the input to get t(x), execute f(t(x)), compare outputs, and report violations

The metamorphic test is conducted at the same point in the program execution as the original function call

The metamorphic test runs in parallel with the rest of the application
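The flow can be approximated in a few lines of Python. This is a simplified sketch, not the JML-based tooling from the slides: the decorator name, the violations log, and the example functions are hypothetical, the "sandbox" is only a deep copy of the input, and the test runs inline rather than in parallel.

```python
import copy
import functools

violations = []  # hypothetical report log for property violations

def metamorphic(transform, relation):
    """Decorator: at each call f(x), also run f(t(x)) on a copied input and compare."""
    def decorate(f):
        @functools.wraps(f)
        def wrapper(x):
            snapshot = copy.deepcopy(x)          # stand-in for the sandbox
            result = f(x)                        # execute f(x) to get the result
            follow_up = f(transform(snapshot))   # execute f(t(x))
            if not relation(result, follow_up):  # compare outputs
                violations.append((f.__name__, x))  # report, do not interrupt
            return result                        # the program continues
        return wrapper
    return decorate

@metamorphic(transform=lambda xs: xs[::-1], relation=lambda a, b: a == b)
def total(xs):
    return sum(xs)

@metamorphic(transform=lambda xs: xs[::-1], relation=lambda a, b: a == b)
def buggy_total(xs):
    return sum(xs[:-1])  # seeded defect: the last element is ignored
```

Calling total([1, 2, 3]) returns 6 and records nothing; calling buggy_total([1, 2, 3]) still returns its (wrong) result to the caller but adds an entry to violations, because the reversed input gives a different sum.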

Page 40: Metamorphic Testing - Moodle

Empirical Study

■ Can Metamorphic Runtime Checking detect defects not found by system-level metamorphic testing?

■ Same mutants used in previous study
  • 29% were not found by metamorphic testing

■ Metamorphic properties identified at function level using suggested guidelines

Page 41: Metamorphic Testing - Moodle

Experimental Results

[Figure: stacked bar chart of % of defects detected (0 to 100) for C4.5, MartiRank, SVM, PAYL, JSim, Lucene, gaffitter, and TOTAL, split into MT only, Both, and MRC only.]

Page 42: Metamorphic Testing - Moodle

Analysis of Results

■ Scope: Function-level testing allowed us to:
  • identify additional metamorphic properties
  • execute more tests

■ Robustness: Metamorphic testing “inside” the application detected subtle defects that did not have much effect on the overall program output

Page 43: Metamorphic Testing - Moodle

Combined Results

[Figure: bar chart (0 to 100) for C4.5, MartiRank, SVM, PAYL, JSim, Lucene, gaffitter, and TOTAL, comparing MT + MRC, Metamorphic Testing, Runtime Assertion Checking, and Partial Oracle.]

Page 44: Metamorphic Testing - Moodle

MRC: Lucene Results

■ MRC killed three mutants not killed by MT

■ All three were in the idf function

Page 45: Metamorphic Testing - Moodle

Study #2: Lucene Example

Search query results are ordered according to a score

Query “ROMEO or JULIET”, correct scores:
  Act 3 Scene 5: 5.837
  Act 2 Scene 4: 4.681
  Act 5 Scene 1: 3.377

Consider a defect in which the scores are off by one. The results stay the same because only the order is important.

Query “ROMEO or JULIET”, scores with the defect:
  Act 3 Scene 5: 6.837
  Act 2 Scene 4: 5.681
  Act 5 Scene 1: 4.377

Partial oracle does not reveal this defect because the scores cannot be calculated in advance.

Page 46: Metamorphic Testing - Moodle

Study #2: Lucene Example

System-level metamorphic property: changing the query order shouldn’t affect result

Query “ROMEO or JULIET”:
  Act 3 Scene 5: 6.837
  Act 2 Scene 4: 5.681
  Act 5 Scene 1: 4.377

Query “JULIET or ROMEO”:
  Act 3 Scene 5: 6.837
  Act 2 Scene 4: 5.681
  Act 5 Scene 1: 4.377

Even though the defect exists, the property still holds and the defect is not detected.

Page 47: Metamorphic Testing - Moodle


Study #2: Lucene Example

The score itself is computed as the result of many sub-calculations.

Score(q) = ∑Similarity(f)*Weight(qi) + … + idf(q) + …

Metamorphic Runtime Checking can detect that there is an error in this function by checking its individual (mathematical) properties.

Page 48: Metamorphic Testing - Moodle

Results

■ Demonstrated that metamorphic testing advances the state of the art in detecting defects in applications without test oracles

■ Proved that Metamorphic Runtime Checking will reveal defects not found by using system-level properties

■ Showed that it is feasible to continue this type of testing in the deployment environment, with minimal impact on the end user

Page 49: Metamorphic Testing - Moodle

Heuristic Metamorphic Testing

Page 50: Metamorphic Testing - Moodle

Statistical Metamorphic Testing

■ Introduced by Guderlei & Mayer in 2007

■ The application is run multiple times with the same input to get a mean value µ0 and variance σ0

■ Metamorphic properties are applied

■ The application is run multiple times with the new input to get a mean value µ1 and variance σ1

■ If the means are not statistically similar, then the property is considered violated
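One way to make "statistically similar" concrete is a two-sample test on the means. The sketch below computes Welch's t statistic by hand with the standard library; the function names and the cut-off of 3.0 are assumptions made for illustration, not values from the slides or from Guderlei & Mayer. A real study would choose the test and significance level deliberately.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch's t statistic for the difference between two sample means."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    if standard_error == 0:
        # No spread at all: the means either match exactly or differ with certainty
        return 0.0 if mean_a == mean_b else math.copysign(math.inf, mean_a - mean_b)
    return (mean_a - mean_b) / standard_error

def property_violated(original_outputs, follow_up_outputs, threshold=3.0):
    """Flag a violation when the means differ by more than `threshold` standard errors."""
    return abs(welch_t(original_outputs, follow_up_outputs)) > threshold
```

Here original_outputs would hold the results of the repeated runs on the original input, and follow_up_outputs the results of the repeated runs on the transformed input, for a property that expects the mean to be unchanged.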

Page 51: Metamorphic Testing - Moodle

Heuristic Metamorphic Testing

■ When we expect that a change to the input will produce “similar” results, but cannot determine the expected similarity in advance

■ Use input X to generate outputs M1 through Mk

■ Use some metric to create a profile of the outputs

■ Use input X’ (created according to a metamorphic property) to generate outputs N1 through Nk

■ Create a profile of those outputs

■ Use statistical techniques (e.g. Student t-test) to check that the profile of outputs N is similar to that of outputs M

Page 52: Metamorphic Testing - Moodle

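Heuristic Metamorphic Testing

[Figure: the non-deterministic function nd_f is run n times on input x to give outputs y1 … yn, and n times on t(x) to give outputs y’1 … y’n. The profile of y1 … yn is compared with the profile of y’1 … y’n: do the profiles demonstrate the expected relationship?]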

Page 53: Metamorphic Testing - Moodle

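HMT Example

[Figure: an input containing unknown (“?”) values, 2 ? 1 ? 4 3, is sorted several times, giving different orderings such as 1 ? 2 3 4 ?, 1 2 3 ? 4 ?, and ? 1 ? 2 3 4. Build a profile P based on normalized equivalence. The input is then permuted to 4 1 ? 3 2 ? and sorted several times, giving orderings such as 1 ? ? 2 3 4 and 1 2 ? 3 4 ?. Build a profile P’ based on normalized equivalence and compare it statistically to the first profile.]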

Page 54: Metamorphic Testing - Moodle

HMT Empirical Study

■ Is Heuristic Metamorphic Testing more effective than other approaches in detecting defects in non-deterministic applications without test oracles?

■ Approaches investigated
  • Heuristic Metamorphic Testing
  • Embedded Assertions
  • Partial Oracle

■ Applications investigated
  • MartiRank: sorting sparse data sets
  • JSim: non-deterministic event timing

Page 55: Metamorphic Testing - Moodle

HMT Study Results & Analysis

■ Heuristic Metamorphic Testing killed 59 of the 78 mutants

■ Partial oracle and assertion checking were ineffective for JSim because no single execution was outside the specified range

Page 56: Metamorphic Testing - Moodle

Challenges

■ Automatic detection of metamorphic properties
  • Using dynamic and/or static techniques

■ Fault localization
  • Once a defect has been detected, figure out where it occurred and how to fix it

■ Implementation issues
  • Reducing overhead
  • Handling external databases, network traffic, etc.

Page 57: Metamorphic Testing - Moodle

Long-Term Directions

■ Testing of multi-process or distributed applications in these domains

■ Collaborative defect detection and notification

■ Investigate the impact on the software development processes used in the domains of non-testable programs

Page 58: Metamorphic Testing - Moodle

References

1. A set of metamorphic testing guidelines
  ■ [Murphy, Kaiser, Hu, Wu; SEKE’08]

2. New empirical studies
  ■ [Xie, Ho, Murphy, Kaiser, Xu, Chen; QSIC’09]

3. Heuristic Metamorphic Testing
  ■ [Murphy, Shen, Kaiser; ISSTA’09]

4. Metamorphic Runtime Checking
  ■ [Murphy, Shen, Kaiser; ICST’09]

5. In Vivo Testing
  ■ [Murphy, Kaiser, Vo, Chu; ICST’09]
  ■ [Murphy, Vaughan, Ilahi, Kaiser; AST’10]