TRANSCRIPT
Muffler: An Approach Using Mutation to Facilitate Fault Localization
Software Engineering Laboratory, Department of Computer Science, Sun Yat-Sen University
Department of Computer Science and Engineering, HKUST
November 2011, Sun Yat-Sen University, Guangzhou, China
1/34
Key hypothesis
Mutating the faulty statement tends to maintain the outcome of passed test cases: before and after the mutation, only one statement is faulty. By contrast, mutating a correct statement tends to toggle the outcome of passed test cases (from passed to failed): now two statements are faulty.
Intuition: two faulty statements can trigger more failures than one faulty statement.
2/34
Intuition: fault point (F) and mutant point (M)
[Figures, slides 3-5 ("In branches"): F and M placed in different branches, then the mutation applied at the fault point itself (F + M).]
[Figures, slides 6-7 ("In series"): F and M placed in series, then the mutation applied at the fault point itself (F + M).]
7/34
A motivating example: Siemens tcas v27
8/34
A motivating example: Siemens tcas v27

Golden version, line 118:
enabled = High_Confidence && (Own_Tracked_Alt_Rate <= OLEV) && (Cur_Vertical_Sep > MAXALTDIFF);

Faulty version, line 118:
enabled = High_Confidence && (Own_Tracked_Alt_Rate <= OLEV); // missing '&& (Cur_Vertical_Sep > MAXALTDIFF)'
9/34
A motivating example
Suspiciousness by Ochiai
enabled = High_Confidence && (Own_Tracked_Alt_Rate <= OLEV) && (Cur_Vertical_Sep > MAXALTDIFF);
enabled = High_Confidence && (Own_Tracked_Alt_Rate <= OLEV);
10/34
A motivating example
[Figure: the faulty line's worst and best rank under Ochiai.]
11/34
A motivating example
Result changes by Muffler
[Violin plots from tcas v27. Left: y-axis is the number of changes from passed to failed; right: y-axis is the proportion of such changes. Red line: the faulty line's change.]
Formula: Susp(s) = Failed(s) * TotalPassed - Passed(s) - PassedToFailed(s)
12/34
Improving the fault's worst rank
Formula
Susp(s) = Failed(s) * TotalPassed - Passed(s) - PassedToFailed(s)
13/34
Formula
Susp(s) = Failed(s) * TotalPassed - Passed(s) - PassedToFailed(s)
Primary key, Failed(s) * TotalPassed: imprecise when multiple faults occur.
Secondary key, -Passed(s): invalid when the rate of coincidental correctness is high.
Additional key, -PassedToFailed(s): inclined to handle coincidental correctness.
14/34
[Diagram: the total tests split into failed and passed. Among the passed tests, coincidental correctness yields P(f). After mutating each executable statement, the passed tests that turn failed give PtoF(f) when the mutated statement is the fault and PtoF(c) when it is correct; empirically, PtoF(c) is greater than PtoF(f).]
Model of our approach: Muffler
Each input and step has a simple, clear target. We list all possible research questions, focus first on the key issue, and leave the others as future work.
15/34
Model of our approach – Muffler
16/34
Input variables
- Program
- Test suite
- Mutant operators
(No further inputs.)
17/34
Program
- Number of faults
- Failure-triggering ability when the faulty statement is covered (is coincidental correctness likely?)
- Fault type
- Statement type of the fault
- Fault position
- Program structure
- Program size
- Etc.
18/34
Test suite
- Number of test cases in total
- Number of passed/failed test cases
- Number of passed test cases covering each statement to be mutated: these are the test cases whose results mutation may alter
- Number of statements covered by each failed test case: this helps us select suspicious statements
- Can we locate faults without a test oracle?
19/34
Mutant operators
- Types of mutation operators
- Applicable conditions for each operator
- The probability of triggering a failure by covering the mutant point
- The ability of a mutant to kill test cases
20/34
Steps
We list the steps for separation of concerns; this helps us locate where our hypothesis takes effect.
21/34
Step I: execute test cases
- Get testing results
- Get coverage:
  - of failed test cases, for the selection of suspicious statements
  - of passed test cases, for the selection of test cases to re-run against mutants
22/34
Step II: select statements to mutate
Input: coverage of failed test cases
Output: statements to be mutated, namely those covered by the failed test case with the smallest coverage
Discussion: multi-fault scenarios
- Our approach depends more on passed test cases.
- Practical situation: e.g., in gcc bug reports, most faults are reported with one failure.
- Precision of failure clustering is …
23/34
Step III: mutate selected statements
Input: program; statements to be mutated; mutant operators
Output: mutants generated by mutating each suspicious statement
Discussion:
- How many mutants should we take from each suspicious statement?
- What if no mutant operator applies to a statement (i.e., no mutant is generated)?
- What if no mutant of a statement changes any passed testing result to failed (i.e., its mutation impact is 0.0)? Two possible reasons: equivalent mutants, and coincidental correctness.
24/34
Step IV: select passed test cases
- Select passed test cases that cover the mutant point.
- Prefer passed test cases that cover fewer statements.
- Reduce the number of passed test cases by discarding similar ones.
25/34
Step V: run mutants against the passed test cases that cover the mutated statement
Input: mutants of each suspicious statement; selected test cases that passed on the original program
Output: the number of test cases that fail when run on each mutant
26/34
Step VI: weight statements
- By dynamic impact
- By clustering mutated statements through analysis of the program structure
27/34
Step VII: compute mutation impact and rank the suspicious statements
Discussion:
- Why not use failed test cases?
- How should the mutation-impact formula be designed?
- What if no passed test case covers the mutant point?
28/34
Is Muffler robust to potential issues in CBFL?
- Coincidental correctness
- Multiple faults
- Coverage equivalence
29/34
Coincidental correctness
- When coincidental correctness occurs frequently, our approach still works.
- When coincidental correctness occurs rarely, our approach is as good as CBFL. This is why we use '- PassedToFailed(s)' rather than '- PassedToFailed(s)/Passed(s)'.
30/34
Multi-fault
- CBFL techniques rely mainly on the coverage of passed test cases and of failed test cases caused by the same fault (and more heavily on the failed test cases).
- In multi-fault scenarios, the differing coverage among failed test cases caused by different faults degrades their effectiveness.
- Our approach depends more on passed tests.
31/34
Equivalent coverage
Statements inside a basic block share identical coverage, yet different statements have different def-use pairs, and so have different impact on statements outside the block; mutating them can therefore tell them apart.
32/34
Q & A
33/34
35/34
Guideline of our approach
Each step has a simple, clear target. We list all possible research questions, focus first on the key issue, and leave the others as future work.
36/34
37/34