CS 4723: Lecture 5: Test Coverage
2
Test Coverage
  After we have done some testing, how do we know the testing is enough?
  The most straightforward measure: input coverage
    # of inputs tested / # of possible inputs
  Unfortunately, the # of possible inputs is typically infinite
  Not feasible, so we need approximations…
3
Test Coverage
  Code Coverage
  Input Combination Coverage
  Specification Coverage
  Mutation Coverage
4
Code Coverage
  Basic idea:
    Bugs in code that has never been executed will not be exposed
    So a test suite that leaves code unexecuted is definitely not sufficient
  Definition:
    Divide the code into elements
    Calculate the proportion of elements that are executed by the test suite
5
Control Flow Graph
How many test cases to achieve full statement coverage?
6
Statement Coverage in Practice
  Microsoft reports 80-90% statement coverage
  Safety-critical software must achieve 100% statement coverage
  Usually about 85% coverage; 100% is usually very hard for large systems
7
Statement Coverage: Example
8
Branch Coverage
  Cover the branches in a program
  A branch is considered covered when both (all) of its outcomes are executed
  Also called decision coverage
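The difference between statement coverage and branch coverage can be sketched with an `if` that has no `else`. The method `clampToZero` and the recording set `outcomes` below are hypothetical names made up for this illustration; the `else` arm exists only to record the false outcome and is not part of the method being measured.

```java
import java.util.HashSet;
import java.util.Set;

class BranchCoverageSketch {
    // Records which outcomes of the single decision have been taken.
    static final Set<String> outcomes = new HashSet<>();

    // Hypothetical method under test: conceptually an `if` without an `else`.
    static int clampToZero(int x) {
        if (x < 0) {
            outcomes.add("true");
            x = 0;
        } else {
            outcomes.add("false"); // instrumentation only
        }
        return x;
    }

    // Branch coverage of the one decision: outcomes taken / 2.
    static double branchCoverage() {
        return outcomes.size() / 2.0;
    }

    public static void main(String[] args) {
        clampToZero(-5); // executes every statement of the uninstrumented method
        System.out.println("after one test: " + branchCoverage());
        clampToZero(3);  // exercises the false outcome as well
        System.out.println("after two tests: " + branchCoverage());
    }
}
```

A single negative input already executes every statement of the uninstrumented method, yet only one of the two outcomes has been taken; the second test is what branch coverage asks for.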
9
Control Flow Graph
How many test cases to achieve full branch coverage?
10
Branch Coverage: Example
11
Branch Coverage: Example
  An untested flow of data from an assignment to a use of the assigned value could hide an erroneous computation,
  even though we have 100% statement and branch coverage
12
Data Flow Coverage
  Cover all def-use pairs in a program
  Def: a write to a variable
  Use: a read of a variable
  Use u and def d are paired when d is the direct precursor of u in some execution
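A minimal sketch of why def-use pairs matter, assuming a made-up method `compute` (not from the slides): two tests reach 100% statement and branch coverage, yet one def-use pair of `x` is never exercised, and that pair is the faulty one.

```java
class DefUseSketch {
    // Hypothetical method: two defs of x can reach the single use of x.
    static int compute(boolean a, boolean b) {
        int x = 1;          // def d1 of x
        if (a) {
            x = 0;          // def d2 of x
        }
        int y = 10;
        if (b) {
            y = 10 / x;     // use u of x
        }
        return y;
    }

    // Returns true if the call fails with a division by zero.
    static boolean fails(boolean a, boolean b) {
        try {
            compute(a, b);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // These two tests take both outcomes of both decisions,
        // but only the pair (d1, u) is exercised.
        System.out.println(fails(true, false));
        System.out.println(fails(false, true));
        // Covering the pair (d2, u) exposes the fault.
        System.out.println(fails(true, true));
    }
}
```

Data flow coverage would list (d2, u) as an uncovered requirement, which points directly at the missing test.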
13
Data Flow Coverage
  Formula: [not preserved in this transcript]
  Not easy to locate all def-use pairs
    Easy intra-procedurally (inside a method)
    Very difficult inter-procedurally
      Consider a write to a field in one method and a read of it in another method
14
Path coverage
  The strongest code coverage criterion
    Try to cover all possible execution paths in a program
    Covers all previous coverage criteria?
  Usually not feasible
    Exponentially many paths in acyclic programs
    Infinitely many paths in some programs with loops
15
Path coverage
  N conditions: $2^N$ paths
  Many are not feasible, e.g., L1L2L3L4L6
  X = 0 => L1L2L3L4L5L6
  X = -1 => L1L3L4L6
  X = -2 => L1L3L4L5L6
16
Control Flow Graph
How many paths? How many test cases to cover them?
17
Path coverage, not enough
  main() {
    int x, y, z, w;
    read(x);
    read(y);
    if (x != 0)
      z = x + 10;
    else
      z = 1;
    if (y > 0)
      w = y / z;
    else
      w = 0;
  }
  Test requirements: 4 paths
  Test cases:
    (x = 1, y = 22)
    (x = 0, y = 10)
    (x = 1, y = -22)
    (x = 0, y = -10)
  We are still not exposing the fault!
  Faulty if x = -10 and y > 0: z becomes 0, so w = y / z divides by zero
  Structural coverage cannot reveal this error
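The program above can be rendered as a runnable Java sketch. The two `read` calls become parameters, and the fourth test uses (x = 0, y = -10) so that each of the four paths is taken exactly once. The fault-triggering input (x = -10, y = 5) is an assumption: any y > 0 would do.

```java
class PathCoverageSketch {
    // Java rendering of the slide's program; the two read() calls become parameters.
    static int run(int x, int y) {
        int z;
        int w;
        if (x != 0) {
            z = x + 10;
        } else {
            z = 1;
        }
        if (y > 0) {
            w = y / z;   // divides by zero when x == -10
        } else {
            w = 0;
        }
        return w;
    }

    public static void main(String[] args) {
        // One test per path: all four pass.
        int[][] tests = {{1, 22}, {0, 10}, {1, -22}, {0, -10}};
        for (int[] t : tests) {
            System.out.println(run(t[0], t[1]));
        }
        // Same path as (1, 22), but this input triggers the fault.
        try {
            run(-10, 5);
        } catch (ArithmeticException e) {
            System.out.println("fault exposed: " + e.getMessage());
        }
    }
}
```

The failing input follows a path that is already covered, which is why no structural criterion asks for it.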
18
Code Coverage Questions
  Statement coverage and basic block coverage: are they the same?
  Branch coverage (cover all edges in a control flow graph): is it the same as basic block coverage?
19
Method coverage
  So far, all examples are intra-method
    Quite useful in unit testing
  It is very hard to achieve 100% statement coverage in system testing
    Need a higher-level code element
  Method coverage
    Similar to statements
    Node coverage: method coverage
    Edge coverage: method invocation coverage
    Path coverage: stack trace coverage
20
Method coverage
21
Code coverage: summary
  Coverage of code elements and their connections
  Node coverage: class / method / statement / predicate coverage
  Edge coverage: branch / data flow / method invocation coverage
  Path coverage: path / use-def chain / stack trace coverage
22
Code coverage: limitations
  Not enough
    Some bugs cannot be revealed even with full path coverage
    Cannot reveal bugs due to missing code
23
Code coverage: practice
  Though not perfect, code coverage is the most widely used technique for test evaluation
  Also used to measure progress made in testing
  The criteria used in practice are mainly:
    Method coverage
    Statement coverage
    Branch coverage
    Loop coverage with a heuristic (0, 1, many iterations)
24
Code coverage: practice
  Far from perfect
    The commonly used criteria are the weakest; recall our examples
    A lot of corner cases can never be found (and they are not really corner cases if they are missed only because statement coverage cannot find them)
  100% code coverage is rarely achieved
    Mature commercial software products are released with 85% to 90% statement coverage
    Some commercial software products are released with around 60% statement coverage
    Many open source projects are even lower than 50%
25
Input Combination Coverage
  Basic idea
    Originates from the most straightforward idea: input coverage
    In theory, achieving 100% coverage proves 100% correctness
    In practice, achievable only in very trivial cases
  Main problems
    Combinations are exponential
    Possible values are infinite
26
Input Combination Coverage
  An example on a simple automatic sales machine
    Accepts only a single $1 bill at a time, and all beverages cost $1
    Coke, Sprite, Juice, Water
    Icy or normal temperature
    Want receipt or not
  All combinations = 4*2*2 = 16 combinations
  Trying all 16 combinations will make sure the system works correctly
27
Input Combination Coverage
  Sales Machine Example
    Input 1 (beverage): Coke, Sprite, Juice, Water
    Input 2 (temperature): Normal, Icy
    Input 3 (receipt): Receipt, No-Receipt
28
Combination Explosion
  Combinations are exponential in the number of inputs
  Consider an annual tax report system with 50 yes/no questions to generate a customized form for you
    $2^{50}$ combinations = about $10^{15}$ test cases
    Running 1000 test cases per second -> about 30,000 years
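The arithmetic behind this estimate can be checked with a short sketch. The class and method names are made up for this illustration, and a year is assumed to have 365 days. With the exact value of 2^50 the result is somewhat above 30,000 years, the same order of magnitude as the slide's rounded figure.

```java
class CombinationExplosionSketch {
    // Number of combinations of k yes/no inputs: 2^k (k must be below 63 to fit in a long).
    static long combinations(int k) {
        return 1L << k;
    }

    // Years needed to run `tests` test cases at `perSecond` tests per second,
    // assuming 365-day years.
    static double yearsToRun(long tests, long perSecond) {
        double seconds = (double) tests / perSecond;
        return seconds / (365.0 * 24 * 60 * 60);
    }

    public static void main(String[] args) {
        long total = combinations(50);
        System.out.println(total + " combinations");
        System.out.printf("%.0f years%n", yearsToRun(total, 1000));
    }
}
```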
29
Observation
  When there are many inputs, a relationship among inputs usually involves only a small number of them
  The previous example: maybe only Coke and Sprite can be icy, but receipt is independent
30
Example of Tax Report
  Input 1: Family combined report or single report
  Input 2: Home loans or not
  Input 3: Received gift or not
  Input 4: Age over 60 or not
  …
  Input 1 is related to all other inputs
  Other inputs are independent of each other
31
Studies
  A long-term study from NIST (National Institute of Standards and Technology)
    A combination width of 4 to 6 is enough for detecting almost all errors
32
N-wise coverage
  Coverage of the N-wise combinations of the possible values of all inputs
  Example: 2-wise combinations
    (coke, icy), (sprite, icy), (water, icy), (juice, icy)
    (coke, normal), (sprite, normal), …
    (coke, receipt), (sprite, receipt), …
    (coke, no-receipt), (sprite, no-receipt), …
    (icy, receipt), (normal, receipt)
    (icy, no-receipt), (normal, no-receipt)
    20 combinations in total
  We had 16 3-wise combinations, now we have 20: did it get worse?
33
N-wise coverage
  Note: one test case may cover multiple N-wise combinations
    E.g., (Coke, Icy, Receipt) covers three 2-wise combinations: (Coke, Icy), (Coke, Receipt), (Icy, Receipt)
  100% N-wise coverage implies 100% (N-1)-wise coverage: is this true?
  For k Boolean inputs
    Full combination coverage = $2^k$ combinations: exponential
    Full n-wise coverage = $2^n \cdot k(k-1)\cdots(k-n+1)/n!$ combinations: polynomial in k
    For 2-wise combinations: $2k(k-1)$
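The n-wise count for k Boolean inputs is 2^n times the binomial coefficient C(k, n). The sketch below evaluates it exactly; `choose` and `nWise` are names made up for this illustration.

```java
import java.math.BigInteger;

class NWiseCountSketch {
    // Binomial coefficient C(k, n), computed exactly; each intermediate
    // value is itself a binomial coefficient, so the division is exact.
    static BigInteger choose(int k, int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 1; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(k - n + i))
                           .divide(BigInteger.valueOf(i));
        }
        return result;
    }

    // Number of n-wise value combinations for k Boolean inputs: 2^n * C(k, n).
    static BigInteger nWise(int k, int n) {
        return BigInteger.valueOf(2).pow(n).multiply(choose(k, n));
    }

    public static void main(String[] args) {
        System.out.println("2-wise, k = 50: " + nWise(50, 2));
        System.out.println("full,   k = 50: " + BigInteger.valueOf(2).pow(50));
    }
}
```

For k = 50 the 2-wise count is 2 * 50 * 49 = 4900 combinations to cover, compared with 2^50 full combinations.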
34
N-wise coverage: Example
  How many test cases for 100% 2-wise coverage of our sales machine example?
    (coke, icy, receipt) covers 3 new 2-wise combinations
    (sprite, icy, no-receipt) covers 3 new …
    (juice, icy, receipt) covers 2 new …
    (water, icy, receipt) covers 2 new …
    (coke, normal, no-receipt) covers 3 new …
    (sprite, normal, receipt) covers 3 new …
    (juice, normal, no-receipt) covers 2 new …
    (water, normal, no-receipt) covers 2 new …
  8 test cases cover all 20 2-wise combinations
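The claim that these eight test cases cover all 20 pairs can be checked mechanically. The sketch below collects every pair of values that appears together in some test case; the string encoding of a pair is an arbitrary choice made for this illustration.

```java
import java.util.HashSet;
import java.util.Set;

class PairwiseCheckSketch {
    // The eight test cases from the slide: (beverage, temperature, receipt).
    static final String[][] TESTS = {
        {"coke", "icy", "receipt"},
        {"sprite", "icy", "no-receipt"},
        {"juice", "icy", "receipt"},
        {"water", "icy", "receipt"},
        {"coke", "normal", "no-receipt"},
        {"sprite", "normal", "receipt"},
        {"juice", "normal", "no-receipt"},
        {"water", "normal", "no-receipt"},
    };

    // Collects every 2-wise combination covered by the given test cases.
    static Set<String> coveredPairs(String[][] tests) {
        Set<String> pairs = new HashSet<>();
        for (String[] t : tests) {
            for (int i = 0; i < t.length; i++) {
                for (int j = i + 1; j < t.length; j++) {
                    // Prefix values with their input position so that values
                    // of different inputs can never collide.
                    pairs.add(i + "=" + t[i] + "," + j + "=" + t[j]);
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(coveredPairs(TESTS).size() + " of 20 pairs covered");
    }
}
```

Dropping any one of the eight test cases from `TESTS` leaves at least two pairs uncovered, since each test case contributes at least two pairs that no other test case covers.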
35
Combination Coverage in Practice
  2-wise combination coverage is very widely used
    Pair-wise testing
    All-pairs testing
  Mostly used in configuration testing
    Example: configuration of gcc
      A lot of variables
      Several options for each variable
    For command line tools: add or remove an option
36
Input model
  What happens if an input has infinitely many possible values?
    Integer
    Float
    Character
    String
  Note: all these are actually finite, but the set of possible values is so large that they are treated as infinite
  Idea: map infinite values to finite value baskets (ranges)
37
Input model
  Equivalence class partitioning
    Partition the possible value set of an input into several value ranges
    Transform numeric variables (integer, float, double, character) into enumerated variables
  Example:
    int exam_score => {less than 0}, {0-59}, {60-69}, {70-79}, {80-89}, {90-100}, {over 100}
    char c => {a-z}, {A-Z}, {0-9}, {other}
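A minimal sketch of the exam score partition, assuming the classes are meant to be: below 0, 0 to 59, 60 to 69, 70 to 79, 80 to 89, 90 to 100, and above 100. The method name and the representative values are made up for this illustration.

```java
class ScorePartitionSketch {
    // Maps an exam score to the label of its equivalence class.
    static String partitionOf(int score) {
        if (score < 0) return "below 0";
        if (score <= 59) return "0-59";
        if (score <= 69) return "60-69";
        if (score <= 79) return "70-79";
        if (score <= 89) return "80-89";
        if (score <= 100) return "90-100";
        return "above 100";
    }

    public static void main(String[] args) {
        // One representative per class (chosen arbitrarily).
        int[] representatives = {-5, 30, 65, 75, 85, 95, 120};
        for (int s : representatives) {
            System.out.println(s + " -> " + partitionOf(s));
        }
    }
}
```

With this mapping the integer input behaves like an enumerated input with seven values, so it can take part in N-wise combination coverage.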
38
Input model
  Feature extraction
    For string and structure inputs
    Split the possible value set by a certain feature
    Example:
      String passwd => {contains space}, {no space}
    It is possible to extract multiple features from one input
    Example:
      String name => {capitalized first letter}, {not}
                  => {contains space}, {not}
                  => {length >10}, {2-10}, {1}, {0}
    One test case may cover multiple features
39
Input model
  Feature extraction: structure input
    A word binary tree (data at all nodes are strings)
      Depth: integer -> partition {0, 1, 1+}
      Number of leaves: integer -> partition {0, 1, <10, 10+}
      Root: null / not
      A node with only a left child / not
      A node with only a right child / not
      Null data value on any node / not
      Root value: string -> further feature extraction
      Value on the leftmost leaf: string -> further feature extraction
      …
40
Input model
  Infeasible feature combination?
    Example:
      String name => {capitalized first letter}, {not}
                  => {contains space}, {not}
                  => {length >10}, {2-10}, {1}, {0}
    Length = 0 ^ contains space
    Length = 0 ^ capitalized first letter
    Length = 1 ^ contains space ^ capitalized first letter
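The infeasible combinations can be filtered out before test cases are chosen. The sketch below enumerates all 2 * 2 * 4 = 16 combinations of the three features of `name` and applies the three constraints listed above; the integer encoding of the length classes is an assumption made for this illustration.

```java
class FeatureFeasibilitySketch {
    // Length classes: 0 = empty, 1 = one character, 2 = 2 to 10 characters, 3 = more than 10.
    static boolean feasible(boolean capitalized, boolean hasSpace, int lengthClass) {
        if (lengthClass == 0) {
            // An empty string has no characters at all.
            return !capitalized && !hasSpace;
        }
        if (lengthClass == 1) {
            // A single character cannot be both a capital letter and a space.
            return !(capitalized && hasSpace);
        }
        return true;
    }

    // Counts the feasible combinations among all 16.
    static int countFeasible() {
        int count = 0;
        for (boolean cap : new boolean[] {true, false}) {
            for (boolean space : new boolean[] {true, false}) {
                for (int len = 0; len < 4; len++) {
                    if (feasible(cap, space, len)) {
                        count++;
                    }
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countFeasible() + " of 16 combinations are feasible");
    }
}
```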
41
Input combination coverage
  Summary:
    Try to cover the combinations of possible values of inputs
    Exponential combinations: N-wise coverage
      2-wise coverage is most popular (all-pairs testing)
    Infinite possible values
      Input partition
      Input feature extraction
    Coverage is usually 100% once adopted
      It is easy to achieve, compared with code coverage
      Models are not easy to write
42
Specification Coverage
  A type of input coverage
  Covers the written formal specification in the requirement document
  Example
    When a number smaller than 0 is fed in, the system should report an error => test case: -1
  Sometimes can be a sequence of inputs
    When you input a correct user name, a passwd prompt is shown; after you input the correct passwd, the user profile is shown, …
    => test case: xiaoyin, xxxxx, …
43
Specification Coverage
  Widely used in industry
  Advantages
    Targets the specification
    No need for writing oracles
    Usually can achieve 100% coverage
  Disadvantages
    Very hard to automate
      Can only be automated with formal specifications
    No guarantee of completeness
    Quality depends highly on the specification
44
Test coverage
  So far, covering inputs and code
  The final goal of testing
    Find all bugs in the software
  So there should be a bug coverage
    The coverage that best represents the adequacy of a test suite
    50% bug coverage = half done!
    100% bug coverage = done!
45
But it is impossible
  Bugs are unknown
    Otherwise we would not need testing
  So we have the number of bugs found, but we do not know what to divide it by
  One possible solution
    Estimation: 1-10 bugs per KLOC
    Depends on the type of software and the stage of development; imprecise
    When you find many bugs, do you think you have found all the bugs, or that the code is really of low quality?
46
Mutation coverage
  How can we know how many bugs there are in the code?
    If only we planted those bugs ourselves!
  Mutation coverage checks the adequacy of a test suite by how many human-planted bugs it can expose
47
Concepts
  Mutant
    A software version with planted bugs
    Usually each mutant contains only one planted bug. Why?
  Mutant kill
    Given a test suite S and a mutant m, if there is a test case t in S such that execute(original, t) != execute(m, t), we say that S kills m
    In other words, a test suite kills a mutant when it is able to detect the planted bug represented by the mutant
48
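Illustration
  [Figure: the test cases are run on the original and on Mutant 1, Mutant 2, ..., Mutant n; the oracles compare each mutant's results with the original's results. Same: the mutant survived. Different: the mutant is killed.]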
49
Concepts
  Mutation coverage
    $\text{mutation coverage} = \dfrac{\#\text{ of killed mutants}}{\#\text{ of generated mutants}}$
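A minimal sketch of the computation, taking mutation coverage as killed mutants divided by generated mutants. The original program `z = x * y + 1`, the three hand-written mutants, and both test suites are assumptions chosen for this illustration; real tools generate mutants automatically.

```java
import java.util.function.IntBinaryOperator;

class MutationScoreSketch {
    // Original program: z = x * y + 1.
    static final IntBinaryOperator ORIGINAL = (x, y) -> x * y + 1;

    // Hand-written mutants, one planted change each.
    static final IntBinaryOperator[] MUTANTS = {
        (x, y) -> x * y - 1,   // arithmetic operator replaced
        (x, y) -> x + y + 1,   // arithmetic operator replaced
        (x, y) -> x * x + 1,   // variable replaced
    };

    // A mutant is killed if some test input makes its result differ from the original's.
    static boolean killed(IntBinaryOperator mutant, int[][] tests) {
        for (int[] t : tests) {
            if (mutant.applyAsInt(t[0], t[1]) != ORIGINAL.applyAsInt(t[0], t[1])) {
                return true;
            }
        }
        return false;
    }

    // Mutation coverage = killed mutants / generated mutants.
    static double mutationCoverage(int[][] tests) {
        int killedCount = 0;
        for (IntBinaryOperator m : MUTANTS) {
            if (killed(m, tests)) {
                killedCount++;
            }
        }
        return (double) killedCount / MUTANTS.length;
    }

    public static void main(String[] args) {
        int[][] weak = {{2, 2}};
        int[][] stronger = {{2, 2}, {2, 3}};
        System.out.println(mutationCoverage(weak));
        System.out.println(mutationCoverage(stronger));
    }
}
```

The input (2, 2) kills only the first mutant, because 2 * 2 and 2 + 2 happen to coincide and x equals y; adding (2, 3) kills the other two.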
50
Mutant generation
  Traditional mutation operators
    Statement deletion
    Replace Boolean expression with true/false
    Replace arithmetic operators (+, -, *, /, …)
    Replace comparison relations (>=, ==, <=, !=)
    Replace variables
    …
51
Mutation Example: Operator
  Mutation operator                    In original   In mutant
  Statement deletion                   z=x*y+1;      (statement deleted)
  Boolean expression to true | false   if (x<y)      if (true)  /  if (false)
  Replace arithmetic operators         z=x*y+1;      z=x*y-1;   /  z=x+y-1;
  Replace comparison operators         if (x<y)      if (x<=y)  /  if (x==y)
  Replace variables                    z=x*y+1;      z=z*y+1;   /  z=x*x+1;
52
Mutant generation
  Object-oriented mutation operators
    Insert/delete overriding method
    Add/delete “this”
    Instantiation as child class
    Cast to subtype
    …
53
Mutation Example: Object-Oriented
  Insert/Delete overriding method

  Original:
    class Shape {
      public void setID(String id) { this.id = id; }
      public void draw() { ... }
    }
    class Circle extends Shape {
      public void draw() { ... }
    }

  Mutant with an overriding method inserted:
    class Shape {
      public void setID(String id) { this.id = id; }
      public void draw() { ... }
    }
    class Circle extends Shape {
      public void setID(String id) { }
      public void draw() { ... }
    }

  Mutant with the overriding method deleted:
    class Shape {
      public void setID(String id) { this.id = id; }
      protected void draw() { ... }
    }
    class Circle extends Shape {
    }
54
Problems of mutation testing
  Large time overhead
    Need to run the test suite over a large number of mutants
    Causes an extra burden for collecting test coverage
  Equivalent mutants
    A mutant that does not affect the behavior of the software
55
Time overhead
  For n mutants, requires n times the overhead of running the test suite once
  How to reduce time overhead?
    Reuse execution info
    Rule out early
      Mutants that are not covered
      Mutants that cannot be killed
56
Reduce Time Overhead

  original:
    int index = read;
    while (…) {
      …;
      index++;
      if (index == 10) {
        break;
      }
    }
    return value > 0;

  m1 (reuse the program states before the return statement):
    int index = read;
    while (…) {
      …;
      index++;
      if (index == 10) {
        break;
      }
    }
    return value < 0;

  m2 (if value is not 0, nothing is changed):
    int index = read;
    while (…) {
      …;
      index++;
      if (index == 10) {
        break;
      }
    }
    return value + 1 > 0;

  m3 (if index reads 100, the mutant is not covered):
    int index = read;
    while (…) {
      …;
      index++;
      if (index == 10) {
        return true;
      }
    }
    return value > 0;
57
Equivalent mutants
  Another main problem in mutation coverage is equivalent mutants
    A mutant is an equivalent mutant if its semantics is identical to that of the original software

    int index = 0;
    while (…) {
      …;
      index++;
      if (index == 10) {
        break;
      }
    }

    =>

    int index = 0;
    while (…) {
      …;
      index++;
      if (index >= 10) {
        break;
      }
    }
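The two loops behave the same because `index` starts at 0 and grows by exactly 1 per iteration, so it reaches 10 before it could ever exceed 10. The sketch below compares both versions over a range of inputs. The parameter `limit` is a hypothetical stand-in for the elided loop condition, and agreement on sampled inputs only fails to kill the mutant; it does not prove equivalence.

```java
class EquivalentMutantSketch {
    // Original: leave the loop when index == 10.
    // `limit` is a hypothetical stand-in for the elided loop condition.
    static int original(int limit) {
        int index = 0;
        while (index < limit) {
            index++;
            if (index == 10) {
                break;
            }
        }
        return index;
    }

    // Mutant: comparison operator == replaced by >=.
    static int mutant(int limit) {
        int index = 0;
        while (index < limit) {
            index++;
            if (index >= 10) {
                break;
            }
        }
        return index;
    }

    // Returns true if no limit in [0, maxLimit] distinguishes the two versions.
    static boolean sameOn(int maxLimit) {
        for (int limit = 0; limit <= maxLimit; limit++) {
            if (original(limit) != mutant(limit)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("indistinguishable on 0..1000: " + sameOn(1000));
    }
}
```

No test case can kill such a mutant, so it lowers the mutation coverage without saying anything about the test suite.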
58
Equivalent mutants
  Another main problem in mutation coverage is equivalent mutants
    Equivalent mutants cause mutation coverage to never reach 100%
    So you do not know whether there are too many equivalent mutants, or the test suite is not adequate
59
Reduce equivalent mutants
  Using compiler optimization
    Check whether the compiled bytecode is the same as that of the original software
      Mutating dead code
      Mutating an unused variable
  After the mutated code, write a conditional path, and check whether the path is feasible

    //result = a + b;
    result = a - b;

    =>

    //result = a + b;
    result = a - b;
    if (a + b != a - b) {
      not equivalent;
    }
60
Mutation testing tools
  MILU: http://www0.cs.ucl.ac.uk/staff/Y.Jia/#tools
  MuJava: http://cs.gmu.edu/~offutt/mujava/
  Javalanche: https://github.com/david-schuler/javalanche/
61
Summary on all coverage measures
  Code coverage
    Target: code
    Adequacy: no -> 100% code coverage != no bugs
    Approximation: dataflow, branch, method/statements
    Usability: medium (requires code for instrumentation)
    Preparation: none
    Overhead: low (instrumentation causes some overhead)
62
Summary on all coverage measures
  Input combination coverage
    Target: inputs
    Adequacy: yes -> 100% input coverage == no bugs
    Approximation: n-wise coverage, input partition, input feature extraction
    Usability: none
    Preparation: hard (requires input mapping)
    Overhead: none
63
Summary on all coverage measures
  Mutation coverage
    Target: bugs
    Adequacy: no -> 100% mutation coverage != no bugs
    Approximation: mutation is already an approximation
    Usability: medium (requires code changes for mutants)
    Preparation: none
    Overhead: very high (execution on instrumented mutated versions)