Bug Localization with Machine Learning Techniques Wujie Zheng wjzheng@cse.cuhk.edu.hk.


Post on 31-Dec-2015






  • Bug Localization with Machine Learning Techniques
    Wujie Zheng
    wjzheng@cse.cuhk.edu.hk

  • Background of Bug Localization
    - Software is far from bug-free
    - Debugging is a methodical process of finding and correcting the bugs in a program
    - Manual debugging is laborious and expensive
    - It is possible to automate, or semi-automate, this process
    - Bug localization finds a set or a ranking of source code locations that are likely to be buggy, through automatic analysis

  • Techniques of Bug Localization
    - Program Slicing
    - Experimental Methods
    - Machine Learning Methods

  • Program Slicing
    - Debugging with program slicing
        - Start from the inputs or the variables that cause the failure (but are not the root cause), and find all statements that may cause the failure
    - Program Dependence Graph (PDG)
        - Control dependence
        - Data dependence
    - Static slice vs. dynamic slice
        - Static slice: all statements that may affect the value of a variable at a program point for any arbitrary execution of the program
        - Dynamic slice: all statements that actually affect the value of a variable at a program point for a particular execution of the program
    - Forward slice vs. backward slice
        - Forward slice: the statements that are affected by the value of variable v at statement s
        - Backward slice: the statements that affect the value of variable v at statement s
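
A minimal sketch of backward and forward slicing, assuming the PDG is already available as a map from each statement to the statements it depends on (control or data dependence). The seven-statement program in the comments and the names `DEPS`, `backward_slice`, and `forward_slice` are hypothetical and serve only as an illustration.

```python
from collections import deque

# Hypothetical program (statement ids on the left):
#   1: n = read()
#   2: total = 0
#   3: count = 0
#   4: if n > 0:
#   5:     total = total + n
#   6: count = count + 1
#   7: print(total)
# DEPS[s] = statements that s depends on (data or control dependence).
DEPS = {4: {1}, 5: {1, 2, 4}, 6: {3}, 7: {2, 5}}


def backward_slice(deps, criterion):
    """Return all statements that may affect `criterion` (itself included)."""
    seen = {criterion}
    work = deque([criterion])
    while work:
        stmt = work.popleft()
        for dep in deps.get(stmt, ()):
            if dep not in seen:
                seen.add(dep)
                work.append(dep)
    return seen


def forward_slice(deps, criterion):
    """Return all statements that `criterion` may affect (itself included)."""
    users = {}
    for stmt, ds in deps.items():
        for dep in ds:
            users.setdefault(dep, set()).add(stmt)
    # A forward slice is a backward slice over the inverted edges.
    return backward_slice(users, criterion)


print(sorted(backward_slice(DEPS, 7)))  # [1, 2, 4, 5, 7]
print(sorted(forward_slice(DEPS, 3)))   # [3, 6]
```

Here the backward slice from statement 7 leaves out statements 3 and 6, which cannot influence the printed value.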

  • PDG void main() { int sum = 0; int i = 1; while (i
  • Experimental Methods
    - Delta Debugging
        - Generalize and simplify a failing test case to a minimal test case that still produces the failure
        - Separate the test case into two sub-tests, choose the sub-test that fails, and repeat the process. When both sub-tests pass, divide and test again
        - Essentially a binary search algorithm
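
The divide-and-test loop can be sketched as follows. This is a simplified minimizer in the spirit of delta debugging, not a definitive implementation: the function name `minimize`, the `fails` callback, and the list-of-items input model are assumptions. Testing the complement of each chunk and refining the split when nothing fails are borrowed from common descriptions of the technique and go beyond what the slide states.

```python
def minimize(failing_input, fails):
    """Shrink a failing input (a list) while `fails(candidate)` stays True."""
    data = list(failing_input)
    n = 2  # target number of chunks for the current split
    while len(data) >= 2:
        size = len(data) // n
        chunks = [data[i:i + size] for i in range(0, len(data), size)]
        reduced = False
        # 1. Does any single chunk still fail on its own?
        for chunk in chunks:
            if fails(chunk):
                data, n, reduced = chunk, 2, True
                break
        # 2. Otherwise, does the input still fail with one chunk removed?
        if not reduced:
            for i in range(len(chunks)):
                rest = [x for j, c in enumerate(chunks) if j != i for x in c]
                if fails(rest):
                    data, n, reduced = rest, max(n - 1, 2), True
                    break
        # 3. Otherwise split more finely, or stop at single-item chunks.
        if not reduced:
            if n >= len(data):
                break
            n = min(len(data), n * 2)
    return data


# Hypothetical failure: the program crashes when items 3 and 7 are both present.
print(minimize(list(range(10)), lambda xs: 3 in xs and 7 in xs))  # [3, 7]
```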

  • Delta Debugging

  • Machine Learning Methods
    - Problem setting
        - A set of failing and passing test cases is observed (test cases as the samples)
        - The statements are instrumented and some traces are collected (statements as the features)
        - Find a set or a ranking of the statements that are likely to be buggy

  • Machine Learning Methods
    - Direct correlation analysis
        - Tarantula
        - LIBLIT05
    - Build a classification model of the test cases (feature selection)
        - Logistic regression
        - Decision tree
    - Build a behavior model of the statements from passing test cases, and then check violations in failing test cases
        - Dynamic invariants
        - SOBER
        - PPDG

  • Tarantula
    - Visualization (figure not included)
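
For context, the suspiciousness score usually associated with Tarantula can be sketched as below. The slide itself only shows the visualization, so the formula here is the commonly cited ratio rather than something taken from the slide; the function name `tarantula` and the coverage data are hypothetical.

```python
def tarantula(coverage, passed):
    """Score each statement by how strongly it is tied to failing tests.

    coverage: list of sets, the statements executed by each test
    passed:   list of bools, True if the corresponding test passed
    """
    total_pass = sum(passed)
    total_fail = len(passed) - total_pass
    scores = {}
    for stmt in set().union(*coverage):
        p = sum(1 for cov, ok in zip(coverage, passed) if ok and stmt in cov)
        f = sum(1 for cov, ok in zip(coverage, passed) if not ok and stmt in cov)
        pass_ratio = p / total_pass if total_pass else 0.0
        fail_ratio = f / total_fail if total_fail else 0.0
        denom = pass_ratio + fail_ratio
        scores[stmt] = fail_ratio / denom if denom else 0.0
    return scores


# Hypothetical data: four tests over three statements, the last two tests fail.
coverage = [{1, 2, 3}, {1, 2}, {1, 3}, {1, 2, 3}]
passed = [True, True, False, False]
scores = tarantula(coverage, passed)
ranking = sorted(scores, key=lambda s: -scores[s])
print(ranking)  # [3, 1, 2]
```

A score near 1 means the statement is executed mostly by failing tests; a score near 0 means mostly by passing tests.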

  • Logistic regression
    - The logistic function
    - Maximize the likelihood of the training set
    - The regularization term
    - The input features are normalized first
    - The larger the resulting coefficient, the more suspicious the statement
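
A minimal sketch of this ranking scheme, assuming an L2 regularization term and plain gradient ascent on the log-likelihood (the slide does not say which regularizer or optimizer is used). The names `normalize`, `fit`, the hyperparameters `lam`, `lr`, `steps`, and the data are all hypothetical.

```python
import math


def sigmoid(z):
    """The logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def normalize(X):
    """Scale every feature column to zero mean and unit variance."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero spread; divide by 1.0 instead.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def fit(X, y, lam=0.1, lr=0.5, steps=500):
    """Gradient ascent on the mean log-likelihood minus (lam/2) * ||w||^2."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            err = label - sigmoid(b + sum(wi * xi for wi, xi in zip(w, row)))
            grad_b += err
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
        w = [wi + lr * (g / n - lam * wi) for wi, g in zip(w, grad_w)]
        b += lr * grad_b / n
    return w, b


# Hypothetical traces: one row per test, one column per statement (1 = executed).
X = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 0, 0], [1, 1, 0], [1, 0, 1]]
y = [0, 1, 1, 0, 0, 1]  # 1 = failing test
w, b = fit(normalize(X), y)
ranking = sorted(range(len(w)), key=lambda j: -w[j])
print(ranking[0])  # 2: the third statement is executed exactly by the failing tests
```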

  • Dynamic invariants
    - Given a set of test cases, mine the implicit rules over the variables
    - The templates
        - Invariants over any variable: x=a
        - Invariants over two numeric variables: y=ax+b
        - Invariants over three numeric variables: z=ax+by+c
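
A minimal sketch of mining the first two templates from values observed in passing runs and then checking a failing run against them. The function names and the observed values are hypothetical, and real invariant detectors handle many more templates and noise.

```python
def mine_constant(values):
    """Template x = a: return a if every observation equals a, else None."""
    first = values[0]
    return first if all(v == first for v in values) else None


def mine_linear(xs, ys, tol=1e-9):
    """Template y = a*x + b: return (a, b) if it holds on every sample, else None."""
    x0, y0 = xs[0], ys[0]
    for x1, y1 in zip(xs, ys):
        if x1 != x0:  # two distinct x values are needed to fix a and b
            a = (y1 - y0) / (x1 - x0)
            b = y0 - a * x0
            break
    else:
        return None
    if all(abs(a * x + b - y) <= tol for x, y in zip(xs, ys)):
        return a, b
    return None


def violates_linear(invariant, x, y, tol=1e-9):
    """True if the observation (x, y) breaks the mined invariant y = a*x + b."""
    a, b = invariant
    return abs(a * x + b - y) > tol


# Hypothetical values of two variables observed at one program point in passing runs.
inv = mine_linear([1, 2, 3, 4], [3, 5, 7, 9])
print(inv)                          # (2.0, 1.0)
print(violates_linear(inv, 5, 12))  # True: a failing run breaks y = 2x + 1
```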

  • SOBER
    - SOBER [Liu05]
    - Two distributions are compared: the probability density functions of the evaluation bias of predicate P on passing runs and on failing runs, respectively
    - The bug relevance score of P is then defined as the difference between them
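
To make the idea concrete, the sketch below computes the evaluation bias of a predicate per run (the fraction of its evaluations that are true) and compares passing and failing runs by the gap between the mean biases. This gap is a simplified stand-in chosen for illustration; it is not the statistic SOBER itself uses to measure the difference. All names and counts are hypothetical.

```python
def evaluation_bias(n_true, n_false):
    """Fraction of evaluations in one run where the predicate was true."""
    total = n_true + n_false
    return n_true / total if total else None  # None: never evaluated


def bias_gap(passing, failing):
    """Absolute gap between mean evaluation bias on passing and failing runs.

    passing, failing: lists of (n_true, n_false) counts, one pair per run.
    """
    def mean_bias(runs):
        biases = [evaluation_bias(t, f) for t, f in runs]
        biases = [b for b in biases if b is not None]
        return sum(biases) / len(biases) if biases else None

    p, f = mean_bias(passing), mean_bias(failing)
    if p is None or f is None:
        return 0.0
    return abs(p - f)


# Hypothetical counts for one predicate: mostly false when tests pass,
# mostly true when they fail.
gap = bias_gap(passing=[(1, 9), (2, 8)], failing=[(9, 1), (8, 2)])
print(round(gap, 2))  # 0.7
```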

  • Probabilistic Program Dependence Graph (PPDG)
    - Conditional independence
        - The state of a statement is independent of the previous statements, conditioned on its immediate parents
    - Dependence network
        - Allows cycles, so it is suitable for loops in programs
    - Usage in debugging
        - The distributions of the conditional probabilities are learned from the passing runs
        - The lower the conditional probability of a state, the more suspicious the statement it belongs to
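
A minimal sketch of that usage: count how often each node state occurs given its parents' states in passing runs, then rank the nodes of a failing run by how unlikely their observed states are. The graph, the state encoding ("T"/"F" branch outcomes), the add-one smoothing, and all names are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def learn_counts(parents, passing_runs):
    """Count node states per parent-state configuration over passing runs.

    parents:      dict node -> tuple of parent nodes
    passing_runs: list of dicts node -> observed state
    """
    counts = defaultdict(Counter)
    for run in passing_runs:
        for node, ps in parents.items():
            if node in run:
                key = (node, tuple(run.get(p) for p in ps))
                counts[key][run[node]] += 1
    return counts


def conditional_probability(counts, parents, run, node, alpha=1.0, n_states=2):
    """P(state of node | states of its parents), with add-alpha smoothing."""
    key = (node, tuple(run.get(p) for p in parents[node]))
    seen = counts.get(key, Counter())
    return (seen[run[node]] + alpha) / (sum(seen.values()) + alpha * n_states)


# Hypothetical three-node chain s1 -> s2 -> s3.
parents = {"s1": (), "s2": ("s1",), "s3": ("s2",)}
passing = [{"s1": "T", "s2": "T", "s3": "T"}] * 3 + [{"s1": "F", "s2": "F", "s3": "F"}]
failing = {"s1": "T", "s2": "T", "s3": "F"}

counts = learn_counts(parents, passing)
probs = {n: conditional_probability(counts, parents, failing, n) for n in parents}
ranking = sorted(probs, key=lambda n: probs[n])
print(ranking[0])  # s3: its state was never seen with this parent state
```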

  • Our Work
    - Belongs to the group "Build a classification model of the test cases (feature selection)"
    - Association rule mining
    - The procedure
        - Mine all the strong (high-frequency and high-confidence) association rules X=>fail
        - Select the rule X=>fail with the highest confidence (the best classification model of this kind)
        - Output the statement set X as the result
    - Problems:
        - It is hard to justify that such a rule-based model (conjunctions only) is suitable
        - The resulting rule may contain many statements, so a further ranking scheme is still needed
        - High computational overhead
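
The procedure can be sketched with a brute-force enumeration of small statement sets. This is an illustration under stated assumptions, not the authors' implementation: the function name `mine_fail_rules`, the thresholds, the size cap `max_size`, and the coverage data are hypothetical, and a practical miner would prune candidates (for example Apriori-style) instead of enumerating them.

```python
from itertools import combinations


def mine_fail_rules(coverage, failed, min_support=2, min_conf=0.8, max_size=2):
    """Return rules X => fail as (confidence, support, X), best rule first.

    coverage: list of sets, the statements executed by each test
    failed:   list of bools, True if the corresponding test failed
    """
    statements = sorted(set().union(*coverage))
    rules = []
    for size in range(1, max_size + 1):
        for items in combinations(statements, size):
            x = set(items)
            # Outcomes of the tests that execute every statement in X.
            outcomes = [f for cov, f in zip(coverage, failed) if x <= cov]
            support = sum(outcomes)  # failing tests that execute all of X
            if support < min_support:
                continue
            confidence = support / len(outcomes)
            if confidence >= min_conf:
                rules.append((confidence, support, items))
    rules.sort(key=lambda r: (-r[0], -r[1], len(r[2]), r[2]))
    return rules


# Hypothetical data: only the tests that execute both 2 and 3 fail.
coverage = [{1, 2, 3}, {1, 2, 3, 4}, {1, 2}, {1, 3}, {1, 4}]
failed = [True, True, False, False, False]
print(mine_fail_rules(coverage, failed))  # [(1.0, 2, (2, 3))]
```

Even in this tiny example the number of candidate sets grows combinatorially with `max_size`, which is the computational overhead noted above.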

  • Our Work
    - Belongs to the group "Build a classification model of the test cases (feature selection)"
    - Feature subset selection
    - The procedure
        - Iteratively evaluate a candidate subset of features, then modify the subset and check whether the new subset is an improvement over the old one (greedy forward selection)
        - Evaluating the subsets requires a scoring metric that grades a subset of features
        - F-measure (i.e., the harmonic mean of precision and recall, 2/(1/precision+1/recall))
    - Problems
        - Too simple, just a traditional method
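
A minimal sketch of greedy forward selection scored by F-measure, computed as the harmonic mean 2PR/(P+R). It assumes a particular classifier that the slide does not specify: a test is predicted to fail when it executes every statement in the selected subset. The function names and the data are hypothetical.

```python
def f_measure(predicted, actual):
    """Harmonic mean of precision and recall for boolean predictions."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    if tp == 0:
        return 0.0
    precision = tp / sum(predicted)
    recall = tp / sum(actual)
    return 2 * precision * recall / (precision + recall)


def greedy_forward_selection(coverage, failed):
    """Add one statement at a time while the F-measure strictly improves."""
    candidates = set().union(*coverage)
    chosen = set()
    # The empty subset predicts "fail" for every test.
    best = f_measure([True] * len(coverage), failed)
    while True:
        improved = None
        for stmt in sorted(candidates - chosen):
            trial = chosen | {stmt}
            predicted = [trial <= cov for cov in coverage]
            score = f_measure(predicted, failed)
            if score > best:
                best, improved = score, stmt
        if improved is None:
            return chosen, best
        chosen.add(improved)


# Hypothetical data: only the tests that execute both 2 and 3 fail.
coverage = [{1, 2, 3}, {1, 2, 3, 4}, {1, 2}, {1, 3}, {1, 4}]
failed = [True, True, False, False, False]
print(greedy_forward_selection(coverage, failed))  # ({2, 3}, 1.0)
```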

  • How to improve the work?
    - Adopt the association rule mining method to mine the behavior models
        - Association rule mining is better suited to mining frequent patterns than to building classification models
    - Difficulties:
        - Usage patterns may dominate, while patterns related to the failing statements may be infrequent
        - If we consider relatively frequent patterns for every statement, there may be too many results to mine and use effectively
        - Considering the program dependence graph may help

  • How to improve the work?
    - Consider the debugging problem for specific kinds of bugs, such as data races
        - Data races are an important kind of bug, and they are difficult to debug
        - Sequential pattern mining?
    - Combine dynamic analysis with machine learning
        - Existing dynamic analysis methods can provide valuable features, e.g., the potential data races in a certain execution (with many false positives)

  • How to improve the work?

  • Other topics of automated debugging
    - Failure classification
        - NLP
        - Failure trace clustering
        - NLP + failure trace clustering
    - Failure replaying
        - Record the failures at user sites compactly and reproduce them at developer sites
    - Debugging with replaying
        - Traverse to any point of an execution without restarting the program
        - Query the trace database

  • Thank you!

