Bug Localization with Association Rule Mining

Wujie Zheng

wjzheng@cse.cuhk.edu.hk

Outline

- Introduction
  - Background of Bug Localization
  - From Predicate to Predicate Sets
- Mining Suspicious Predicate Sets as Strong Association Rules
  - Modeling
  - The AllRules Algorithm
- Redundant Rule Pruning
  - Definition
  - Sufficient Condition of Redundant Rules
  - The ClosedRules Algorithm
- Experiments and Case Study
- Conclusions

Introduction

Background of Bug Localization

Motivation
- Software is far from bug-free.
- Manual debugging is laborious and expensive.

Definition
- Bug localization finds, through automatic analysis, a set or a ranking of source code locations that are likely to be buggy.

General Setting
- A set of failing executions
- A set of passing executions

Background of Bug Localization

An Example [Jones02]

Background of Bug Localization

xSlice [Agrawal95]

Set Operation

Background of Bug Localization

TARANTULA [Jones02]

Visualization:

Background of Bug Localization

LIBLIT05 [Liblit05]

Background of Bug Localization

SOBER [Liu05]
- SOBER estimates the probability density functions of the evaluation bias of a predicate P on passing runs and on failing runs, respectively.
- The bug relevance score of P is then defined as the difference between these two distributions.

From Predicate to Predicate Sets

Motivation
- A failure is caused not only by the bug but also by other trigger conditions.

Criterion for an interesting (suspicious) predicate set Ps = {P1, …, Pn}
- Every Pi in Ps should be related to the bug.
- The whole set Ps should be related to the bug.

Benefits
- Improved accuracy
- The mined implicit relationships may give programmers more hints.

Potential problems
- High computational complexity
- High redundancy

Existing work
- Considers only combinations of two predicates

Mining Suspicious Predicate Sets as Strong Association Rules

Modeling

Criterion for an interesting (suspicious) predicate set Ps = {P1, …, Pn}
- Every Pi in Ps should be related to the bug.
- The whole set Ps should be related to the bug.

How such a Ps appears in the execution traces
- When Pi appears, the program has a high probability of running into failure; when the program runs into failure, Pi always appears.
- When Ps appears, the program has a high probability of running into failure; when the program runs into failure, Ps always appears.

Strong association rule representation
- Pi => failure should have high support and confidence.
- Ps => failure should have high support and confidence.
- Support(X => Y) = p(X, Y); Confidence(X => Y) = p(Y | X)

Benefit
- The approach can build on advances in data mining techniques.
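The two measures above can be estimated directly from a set of labeled runs. The following is a minimal sketch, assuming each run is stored as a pair (set of predicates observed in the trace, failed flag); the function name `support_and_confidence` and the sample data are illustrative and do not come from the slides.

```python
def support_and_confidence(runs, predicates):
    """Return (support, confidence) of the rule `predicates => failure`.

    runs: list of (observed_predicates, failed) pairs (assumed layout).
    """
    ps = frozenset(predicates)
    # Outcome of every run whose trace contains all predicates in ps.
    covered = [failed for observed, failed in runs if ps <= observed]
    both = sum(covered)  # runs where ps appears AND the run failed
    support = both / len(runs)                           # p(X, failure)
    confidence = both / len(covered) if covered else 0.0  # p(failure | X)
    return support, confidence


# Hypothetical data: four runs, two of them failing.
runs = [
    (frozenset({"P1", "P2"}), True),
    (frozenset({"P1"}), False),
    (frozenset({"P1", "P2"}), True),
    (frozenset({"P3"}), False),
]
print(support_and_confidence(runs, {"P1", "P2"}))  # (0.5, 1.0)
```

In this toy data {P1, P2} => failure has the same support as P1 => failure but a higher confidence, which is the kind of gain predicate sets are meant to capture.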

The AllRules Algorithm

Given a database of execution traces, the items are the predicates {P1, …, Pn} and the label failing/passing.

1st phase: select the buggy single predicates
1. Mine all the frequent itemsets {Pi, failure}.
2. Calculate the confidence of each rule Pi => failure.
3. Select the top-20 rules and construct a new database with the corresponding Pi.

2nd phase: select the buggy predicate sets
1. Mine all the frequent itemsets {Ps, failure} from the new database.
2. Calculate the confidence of each rule Ps => failure.
3. Select the top rules as the results.
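The two phases can be sketched as follows. This is only an illustration under stated assumptions: the frequent-itemset mining step is replaced by brute-force enumeration with `itertools.combinations` (a real implementation would presumably use a dedicated miner), and the names `all_rules`, `min_support`, `max_size` and `top_n` are hypothetical. Only the top-20 cutoff comes from the slides.

```python
from itertools import combinations


def rule_stats(runs, itemset):
    """(support, confidence) of `itemset => failure` over labeled runs."""
    covered = [failed for observed, failed in runs if itemset <= observed]
    both = sum(covered)
    return both / len(runs), (both / len(covered) if covered else 0.0)


def all_rules(runs, min_support=0.1, top_k=20, max_size=3, top_n=5):
    # Phase 1: rank single predicates by the confidence of Pi => failure.
    predicates = sorted(set().union(*(obs for obs, _ in runs)))
    singles = []
    for p in predicates:
        sup, conf = rule_stats(runs, frozenset({p}))
        if sup >= min_support:  # {Pi, failure} is frequent
            singles.append((conf, sup, p))
    singles.sort(key=lambda t: (-t[0], -t[1], t[2]))
    kept = [p for _, _, p in singles[:top_k]]  # the "new database" items

    # Phase 2: enumerate predicate sets over the kept predicates only.
    rules = []
    for size in range(1, max_size + 1):
        for combo in combinations(kept, size):
            sup, conf = rule_stats(runs, frozenset(combo))
            if sup >= min_support:
                rules.append((conf, sup, tuple(sorted(combo))))
    rules.sort(key=lambda t: (-t[0], -t[1], t[2]))
    return rules[:top_n]


# Hypothetical traces: (predicates observed, run failed?)
runs = [
    ({"P1", "P2", "P3"}, True),
    ({"P1", "P2"}, True),
    ({"P1", "P3"}, False),
    ({"P2", "P3"}, False),
    ({"P3"}, False),
    ({"P1"}, False),
]
for conf, sup, ps in all_rules(runs):
    print(ps, round(sup, 2), round(conf, 2))
```

Restricting the second phase to the top-ranked single predicates is what keeps the enumeration tractable: with 20 items the search space is bounded regardless of how many predicates were instrumented.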

Redundant Rule Pruning

Redundant Rules

Definition
- X => failure is redundant when there exists a superset Y of X such that the support and confidence of Y => failure are not less than those of X => failure.
- Some superset of such a predicate set Ps ranks at least as high, so it would already have been checked before Ps itself.

Sufficient Condition of Redundant Rules
- If {X, failure} is not a closed frequent itemset, then X => failure is a redundant rule. (An itemset is closed if no proper superset has the same support.)
- So we only need to mine the closed frequent itemsets.
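The sufficient condition follows from a short argument: if {X, failure} is not closed, some predicate set Y that strictly contains X gives {Y, failure} the same support, and since Y appears in no more runs than X, the confidence of Y => failure cannot be lower. The check can be sketched as below, assuming the standard characterization that an itemset is closed exactly when it equals the intersection of all transactions containing it. The names `to_transactions`, `closure` and `is_redundant` are illustrative.

```python
FAILURE = "failure"  # label item added to failing runs


def to_transactions(runs):
    """Turn (predicates, failed) pairs into itemsets with a failure label."""
    return [
        frozenset(obs) | ({FAILURE} if failed else frozenset())
        for obs, failed in runs
    ]


def closure(transactions, itemset):
    """Largest itemset shared by every transaction that contains `itemset`."""
    containing = [t for t in transactions if itemset <= t]
    if not containing:  # zero support: nothing to intersect
        return frozenset(itemset)
    return frozenset.intersection(*containing)


def is_redundant(runs, predicates):
    """True if {predicates, failure} is not closed, so the rule is redundant."""
    itemset = frozenset(predicates) | {FAILURE}
    return closure(to_transactions(runs), itemset) != itemset


# Hypothetical traces: P1 only ever fails together with P2.
runs = [
    ({"P1", "P2", "P3"}, True),
    ({"P1", "P2"}, True),
    ({"P1", "P3"}, False),
    ({"P2", "P3"}, False),
]
print(is_redundant(runs, {"P1"}))        # True
print(is_redundant(runs, {"P1", "P2"}))  # False
```

Here P1 => failure is pruned because every failing run containing P1 also contains P2, so {P1, P2} => failure carries at least as much information.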

The ClosedRules Algorithm

Given a database of execution traces, the items are the predicates {P1, …, Pn} and the label failing/passing.

1st phase: select the buggy single predicates
1. Mine all the frequent itemsets {Pi, failure}.
2. Calculate the confidence of each rule Pi => failure.
3. Select the top-20 rules and construct a new database with the corresponding Pi.

2nd phase: select the non-redundant buggy predicate sets
1. Mine all the closed frequent itemsets {Ps, failure} from the new database.
2. Calculate the confidence of each rule Ps => failure.
3. Select the top rules as the results.

Experiments and Case Study

Subject Programs and Performance Metrics

Subject Programs
- Siemens suite: originally prepared by Siemens Corp. It contains 130 faulty versions of 7 programs: print_tokens, print_tokens2, replace, schedule, schedule2, tcas, and tot_info.

Performance Metrics: T-score
- The T-score is based on the program dependence graph (PDG), in which each statement is a node and two nodes are connected by an edge if the statements have a data and/or control dependency.
- Given a bug localization report, a programmer is assumed to start from the suspicious statements and perform a breadth-first search along the PDG until reaching the faulty statements.
- The T-score is the percentage of code examined during this process. It estimates the programmer effort required to find bugs with a bug localization algorithm: the less code that has to be examined, the higher the quality of the algorithm.
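The T-score computation described above can be sketched as a layer-by-layer breadth-first search. This is a simplified illustration, not the evaluation code used in the experiments: it assumes the PDG is given as an undirected adjacency dictionary with one key per statement, and that the search expands whole BFS layers until one of them contains a faulty statement. The name `t_score` and the toy graph are hypothetical.

```python
def t_score(graph, reported, faulty):
    """Percentage of PDG nodes examined before a faulty node is reached.

    graph:    dict mapping each statement to the set of its PDG neighbors
    reported: statements named in the bug localization report
    faulty:   statements that actually contain the bug
    """
    faulty = set(faulty)
    examined = set(reported)
    frontier = set(reported)
    # Expand one BFS layer at a time until a faulty statement is covered
    # (or the reachable part of the graph is exhausted).
    while frontier and not (examined & faulty):
        neighbors = set()
        for node in frontier:
            neighbors.update(graph.get(node, ()))
        frontier = neighbors - examined
        examined |= frontier
    return 100.0 * len(examined) / len(graph)


# Toy PDG: a chain of five statements, 1 - 2 - 3 - 4 - 5.
pdg = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
print(t_score(pdg, reported=[1], faulty=[3]))  # 60.0
```

A report that names the faulty statement directly would score 20.0 on this graph, since only the reported statement needs to be examined.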

Predicate Sets vs. Single Predicate

Fig. 1. Predicate Sets vs. Single Predicate

Comparison with Other Algorithms

Fig. 2. Performance of BLARM, LIBLIT05 and SOBER

Case Study

Subject Programs and Performance Metrics

We tested this buggy program with 1608 test cases, of which 1538 passed and 70 failed.

LIBLIT05: 12th; SOBER: 10th; BLARM

Conclusions

- A general method to exploit the relationships between predicates
- Compact results
- Better performance

Thank you!