TRANSCRIPT
Muffler: An Approach Using Mutation to Facilitate Fault Localization
Software Engineering Laboratory, Department of Computer Science, Sun Yat-Sen University
Department of Computer Science and Engineering, HKUST
November 2011, Sun Yat-Sen University, Guangzhou, China
1/34
Key hypothesis
Mutating the faulty statement tends to maintain the outcome of passed test cases: before and after the mutation, only one statement is faulty. By contrast, mutating a correct statement tends to toggle the outcome of passed test cases (from passed to failed): now two statements are faulty.
Intuition: two faulty statements can trigger more failures than one faulty statement.
2/34
Intuition: fault point (F) and mutant point (M)
[Figures, slides 3-5 ("In branches"): F and M placed in different branches, then the mutation applied at the fault point itself (F + M).]
[Figures, slides 6-7 ("In series"): F and M placed in series, then the mutation applied at the fault point itself (F + M).]
7/34
A motivating example: Siemens tcas v27
8/34
A motivating example: Siemens tcas v27

Golden version, line 118:
enabled = High_Confidence && (Own_Tracked_Alt_Rate <= OLEV) && (Cur_Vertical_Sep > MAXALTDIFF);

Faulty version, line 118:
enabled = High_Confidence && (Own_Tracked_Alt_Rate <= OLEV); // missing '&& (Cur_Vertical_Sep > MAXALTDIFF)'
9/34
A motivating example
Suspiciousness by Ochiai
enabled = High_Confidence && (Own_Tracked_Alt_Rate <= OLEV) && (Cur_Vertical_Sep > MAXALTDIFF);
enabled = High_Confidence && (Own_Tracked_Alt_Rate <= OLEV);
10/34
A motivating example
[Figure: the faulty line's worst and best rank under Ochiai.]
11/34
A motivating example
Result changes by Muffler
[Violin plots from tcas v27. Left: y-axis is the number of changes from passed to failed; right: y-axis is the proportion of such changes. Red line: the faulty line's change.]
Formula: Susp(s) = Failed(s) * TotalPassed - Passed(s) - PassedToFailed(s)
12/34
Improving the fault's worst rank
Formula
Susp(s) = Failed(s) * TotalPassed - Passed(s) - PassedToFailed(s)
13/34
Formula
Susp(s) = Failed(s) * TotalPassed - Passed(s) - PassedToFailed(s)
Primary key, Failed(s) * TotalPassed: imprecise when multiple faults occur.
Secondary key, -Passed(s): invalid when the rate of coincidental correctness is high.
Additional key, -PassedToFailed(s): inclined to handle coincidental correctness.
14/34
[Diagram: the total tests split into failed and passed. Among the passed tests, coincidental correctness yields P(f). After mutating each executable statement, the passed tests that turn failed give PtoF(f) when the mutated statement is the fault and PtoF(c) when it is correct; empirically, PtoF(c) is greater than PtoF(f).]
Model of our approach: Muffler
Each input and step has a simple, clear target. We list all possible research questions, focus first on the key issue, and leave the others as future work.
15/34
Model of our approach – Muffler
16/34
Input variables
- Program
- Test suite
- Mutant operators
(No further inputs.)
17/34
Program
- Number of faults
- Failure-triggering ability when the faulty statement is covered (is coincidental correctness likely?)
- Fault type
- Statement type of the fault
- Fault position
- Program structure
- Program size
- Etc.
18/34
Test suite
- Number of test cases in total
- Number of passed/failed test cases
- Number of passed test cases covering each statement to be mutated: these are the test cases whose results mutation may alter
- Number of statements covered by each failed test case: this helps us select suspicious statements
- Can we locate faults without a test oracle?
19/34
Mutant operators
- Types of mutation operators
- Applicable conditions for each operator
- The probability of triggering a failure by covering the mutant point
- The ability of a mutant to kill test cases
20/34
Steps
We list the steps for separation of concerns; this helps us locate where our hypothesis takes effect.
21/34
Step I: execute test cases
- Get testing results
- Get coverage:
  - of failed test cases, for the selection of suspicious statements
  - of passed test cases, for the selection of test cases to re-run against mutants
22/34
Step II: select statements to mutate
Input: coverage of failed test cases
Output: statements to be mutated, namely those covered by the failed test case with the smallest coverage
Discussion: multi-fault scenarios
- Our approach depends more on passed test cases.
- Practical situation: e.g., in gcc bug reports, most faults are reported with one failure.
- Precision of failure clustering is …
23/34
Step III: mutate selected statements
Input: program; statements to be mutated; mutant operators
Output: mutants generated by mutating each suspicious statement
Discussion:
- How many mutants should we take from each suspicious statement?
- What if no mutant operator applies to a statement (i.e., no mutant is generated)?
- What if no mutant of a statement changes any passed testing result to failed (i.e., its mutation impact is 0.0)? Two possible reasons: equivalent mutants, and coincidental correctness.
24/34
Step IV: select passed test cases
- Select passed test cases that cover the mutant point.
- Prefer passed test cases that cover fewer statements.
- Reduce the number of passed test cases by discarding similar ones.
25/34
Step V: run mutants against the passed test cases that cover the mutated statement
Input: mutants of each suspicious statement; selected test cases that passed on the original program
Output: the number of test cases that fail when run on each mutant
26/34
Step VI: weight statements
- By dynamic impact
- By clustering mutated statements through analysis of the program structure
27/34
Step VII: compute mutation impact and rank the suspicious statements
Discussion:
- Why not use failed test cases?
- How should the mutation-impact formula be designed?
- What if no passed test case covers the mutant point?
28/34
Is Muffler robust to potential issues in CBFL?
- Coincidental correctness
- Multiple faults
- Coverage equivalence
29/34
Coincidental correctness
- When coincidental correctness occurs frequently, our approach still works.
- When coincidental correctness occurs rarely, our approach is as good as CBFL. This is why we use '- PassedToFailed(s)' rather than '- PassedToFailed(s)/Passed(s)'.
30/34
Multi-fault
- CBFL techniques rely mainly on the coverage of passed test cases and of failed test cases caused by the same fault (and more heavily on the failed test cases).
- In multi-fault scenarios, the differing coverage among failed test cases caused by different faults degrades their effectiveness.
- Our approach depends more on passed tests.
31/34
Equivalent coverage
Statements inside a basic block share identical coverage, yet different statements have different def-use pairs, and so have different impact on statements outside the block; mutating them can therefore tell them apart.
32/34
Q & A
33/34
35/34
Guideline of our approach
Each step has a simple, clear target. We list all possible research questions, focus first on the key issue, and leave the others as future work.
36/34
37/34