
The Impact of Test Case Summaries on Bug Fixing Performance: An Empirical Investigation

Sebastiano Panichella

Annibale Panichella

Moritz Beller

Andy Zaidman

Harald Gall

Why?

@Test
public void test0() throws Throwable {
    Option option0 = new Option("aaabbb", true, "aaabbb");
    Option option1 = new Option("aaabbb", true, "aaabbb");
    boolean boolean0 = option1.equals((Object) option0);
    assertEquals("arg", option1.getArgName());
    assertTrue(option0.hasArg());
    assertTrue(boolean0);
}

@Test
public void test1() throws Throwable {
    Option option0 = new Option("aaabbb", true, "aaabbb");
    Option option1 = new Option("aaabbb", true, "aaabbb");
    option0.setLongOpt("adafv");
    option1.setLongOpt("adafv");
    boolean boolean0 = option1.equals((Object) option0);
    assertEquals("arg", option1.getArgName());
    assertTrue(option0.hasArg());
    assertTrue(boolean0);
}

Class Name: Option.java

Library: Apache Commons CLI




Q1: What are the main differences?

Q2: Do they cover different parts of the code?




Candidate Assertions


Q3: Are these assertions correct?


Test Code Comprehension

Generated Tests

Production Code

public class Options implements Serializable {
    private static final long serialVersionUID = 1L;

    /** a map of the options with the character key */
    private Map shortOpts = new HashMap();

    /** a map of the options with the long key */
    private Map longOpts = new HashMap();

    /** a map of the required options */
    private List requiredOpts = new ArrayList();

    /** a map of the option groups */

Earl T. Barr et al., “The Oracle Problem in Software Testing: A Survey”. IEEE Transactions on Software Engineering, 2015.

Are Generated Tests Helpful?

G. Fraser et al., “Does Automated Unit Test Generation Really Help Software Testers? A Controlled Empirical Study”. TOSEM 2015.

Generated tests do not lead to the detection of more faults.

[Figure: test comprehension as a share of testing time; axis marks at 0%, 75%, and 100%]

Our Solution

Test Case


Test Coverage Analysis

COBERTURA

Test Suite Generation

Option.java

TestDescriber

@Test
public void testProva() throws Throwable {
    Option option0 = new Option("aaa", true, "aaa");
    Option option1 = new Option("aaa", true, "aaa");
    boolean boolean0 = option1.equals((Object) option0);
    assertEquals("arg", option1.getArgName());
    assertTrue(option0.hasArg());
    assertTrue(boolean0);
}

@Test
public void testProva2() throws Throwable {
    Option option0 = new Option("aaa", true, "aaa");
    Option option1 = new Option("aaa", true, "aaa");
    option0.setLongOpt("adafv");
    option1.setLongOpt("adafv");
    boolean boolean0 = option1.equals((Object) option0);
    assertEquals("arg", option1.getArgName());
    assertTrue(option0.hasArg());
    assertTrue(boolean0);
}

Summary Generation


Summary Generator

Software Words Usage Model: deriving <actions>, <themes>, and <secondary arguments> from class, methods, attributes and variable identifiers

E. Hill et al., “Automatically capturing source code context of NL-queries for software maintenance and reuse”. ICSE 2009.

SWUM in TestDescriber:

Covered Code

public class Option {
    public Option(String opt, String longOpt, boolean hasArg, String descr) throws IllegalArgumentException {
        OptionValidator.validateOption(opt);
        this.opt = opt;
        this.longOpt = longOpt;
        if (hasArg) {
            this.numberOfArgs = 1;
        }
        this.description = descr;
    }
    ...
}



1) Select the covered statements

    OptionValidator.validateOption(opt);
    this.opt = opt;
    this.longOpt = longOpt;
    if (hasArg) { // FALSE
        this.numberOfArgs = 1;
    }
    this.description = descr;
}
... }



2) Filter out Java keywords, etc.

    OptionValidator.validateOption(opt);
    this opt = opt;
    this longOpt = longOpt;
    if (hasArg) { false }
    this description = descr;
}
... }
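The keyword-filtering step can be sketched roughly as follows. This is only an illustration, not TestDescriber's actual implementation: the class name `KeywordFilter`, the small keyword set, and the choice to treat every non-identifier character as a separator are all assumptions made for this example. Following the slide, `this` and `if` are deliberately left out of the filtered set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class KeywordFilter {
    // Hypothetical, deliberately incomplete list of Java keywords to drop.
    private static final Set<String> JAVA_KEYWORDS = new HashSet<>(Arrays.asList(
        "public", "private", "protected", "static", "final", "class",
        "new", "return", "throws", "void", "boolean", "int"));

    // Splits a statement on non-identifier characters and keeps
    // only the tokens that are not in the keyword list.
    static List<String> filter(String statement) {
        List<String> kept = new ArrayList<>();
        for (String token : statement.split("[^A-Za-z0-9_]+")) {
            if (!token.isEmpty() && !JAVA_KEYWORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

For instance, `KeywordFilter.filter("this.longOpt = longOpt;")` keeps the tokens `this`, `longOpt`, `longOpt`.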


3) Identifier Splitting (Camel case)

public Option(String opt, String long Opt, boolean has Arg, String descr) throws IllegalArgumentException {
    Option Validator.validate Option(opt);
    this opt = opt;
    this long Opt = long Opt;
    if (has Arg) { false; }
    this description = descr;
}
... }
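Camel-case identifier splitting can be sketched with a single regular expression. This is a minimal illustration under assumptions: the class name `IdentifierSplitter` is hypothetical, and the slides do not say how TestDescriber handles acronyms or digits, so the rule for acronym runs below is a guess.

```java
import java.util.ArrayList;
import java.util.List;

final class IdentifierSplitter {
    // Splits before an upper-case letter that follows a lower-case letter or digit,
    // and before the last capital of an acronym run ("XMLParser" -> "XML", "Parser").
    static List<String> split(String identifier) {
        List<String> words = new ArrayList<>();
        String boundary = "(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])";
        for (String part : identifier.split(boundary)) {
            if (!part.isEmpty()) {
                words.add(part);
            }
        }
        return words;
    }
}
```

Applied to the identifiers on the slide, `longOpt` becomes `long`, `Opt` and `OptionValidator` becomes `Option`, `Validator`.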


4) Abbreviation Expansion (using external vocabularies)

public Option(String option, String long Option, boolean has Argument, String description) throws IllegalArgumentException {
    Option Validator.validate Option(option);
    this option = option;
    this long Option = long Option;
    if (has Argument) { false }
    this description = description;
}
... }
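Abbreviation expansion amounts to a dictionary lookup per word. The sketch below assumes a tiny in-memory map as a stand-in for the external vocabularies mentioned on the slide; the class name `AbbreviationExpander` and the three entries are hypothetical and chosen only to reproduce the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class AbbreviationExpander {
    // Hypothetical stand-in for an external vocabulary of abbreviations.
    private static final Map<String, String> VOCABULARY = new HashMap<>();
    static {
        VOCABULARY.put("opt", "option");
        VOCABULARY.put("arg", "argument");
        VOCABULARY.put("descr", "description");
    }

    // Replaces each known abbreviation with its expansion,
    // preserving an initial capital letter; unknown words pass through.
    static List<String> expand(List<String> words) {
        List<String> expanded = new ArrayList<>();
        for (String word : words) {
            String full = VOCABULARY.get(word.toLowerCase(Locale.ROOT));
            if (full == null) {
                expanded.add(word);
            } else if (Character.isUpperCase(word.charAt(0))) {
                expanded.add(Character.toUpperCase(full.charAt(0)) + full.substring(1));
            } else {
                expanded.add(full);
            }
        }
        return expanded;
    }
}
```

With this map, the words `has`, `Arg` become `has`, `Argument`, matching the expanded code shown on the slide.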





5) Part-of-Speech tagger

<actions> = Verbs
<themes> = Nouns/Subjects
<secondary arguments> = Nouns/objects, adjectives, etc.

public class Option {
    Option(String option, String long Option, boolean has Argument, String description) throws IllegalArgumentException
    Option Validator.validate Option(option);
    this option = option;
    this long Option = long Option;
    if (has Argument) { false }
    this description = description;
}

[Figure: each token of the code above is annotated with its part-of-speech tag (NOUN, VERB, ADJ, CON)]

Natural Language Sentences generated from the Parsed Code:

The test case instantiates an "Option" with:
- option equal to “...”
- long option equal to “...”
- it has no argument
- description equal to “…”

An option validator validates it

The test exercises the following condition:
- "Option" has no argument
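One simple way to picture the last step is a fixed sentence template filled with the values observed in the test. The sketch below is an assumption for illustration only: the slides do not show how TestDescriber assembles its sentences, and the class `SummaryTemplate`, its method, and its parameters are hypothetical.

```java
final class SummaryTemplate {
    // Fills a fixed template with the constructor arguments used by the test.
    // The article "an" is hard-coded because the example class is "Option".
    static String describeInstantiation(String className, String option, String longOption,
                                        boolean hasArgument, String description) {
        StringBuilder sb = new StringBuilder();
        sb.append("The test case instantiates an \"").append(className).append("\" with:\n");
        sb.append("- option equal to \"").append(option).append("\"\n");
        sb.append("- long option equal to \"").append(longOption).append("\"\n");
        sb.append(hasArgument ? "- it has an argument\n" : "- it has no argument\n");
        sb.append("- description equal to \"").append(description).append("\"");
        return sb.toString();
    }
}
```

Calling `SummaryTemplate.describeInstantiation("Option", "aaa", "adafv", false, "aaa")` yields a five-line description in the same shape as the sentences above.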




Summarisation Levels

- Class Level
- Method Level
- Statement Level
- Branch Level

Do Test Summaries Improve Test Readability?

Do Test Summaries Help Developers?

Case Study

Bug Fixing Tasks Involving 30 Developers

Context

Object: two Java classes (ArrayIntList.java and Rational.java) from Apache Commons Primitives and Math4J that have been used in previous studies on search-based software testing [Fraser et al., TOSEM 2015]

Subjects: 30 participants (23 researchers and 7 developers)

Study Procedure

Bug Fixing Tasks

[Figure: assignment of ArrayIntList.java and Rational.java to Group 1 and Group 2, with TestDescriber comments]

Experiment conducted offline via a survey platform

Each participant received the experiment package consisting of:
1. A pretest questionnaire
2. Instructions and materials to perform the experiment
3. A post-test questionnaire

We did not reveal the goal of the study

45 minutes of time for each task

RQ1: How do test case summaries impact the number of bugs fixed by developers?


Participants WITHOUT TestDescriber summaries fixed 40% of injected bugs. None of them was able to fix all bugs.


Participants WITH TestDescriber summaries fixed 60%-80% of injected bugs. 31% of them fixed all the bugs.



With summaries, the participants were able to fix up to twice as many bugs (+50% to +100%) in the same time window (45 minutes).

The differences are statistically significant (Wilcoxon test, p-value < 0.05). The A12 effect size is always LARGE.
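For readers unfamiliar with the A12 statistic, the Vargha-Delaney effect size estimates the probability that a value drawn from one group exceeds a value drawn from the other, counting ties as half. The sketch below is a generic implementation of that definition, not the authors' analysis script; the class name `EffectSize` is hypothetical and any sample data passed to it would be illustrative, not the study's results.

```java
final class EffectSize {
    // Vargha-Delaney A12: fraction of (treatment, control) pairs in which the
    // treatment value is larger, with ties counted as 0.5.
    // 0.5 means no difference; 1.0 means the treatment always scores higher.
    // Both arrays are assumed to be non-empty.
    static double a12(double[] treatment, double[] control) {
        double wins = 0.0;
        for (double t : treatment) {
            for (double c : control) {
                if (t > c) {
                    wins += 1.0;
                } else if (t == c) {
                    wins += 0.5;
                }
            }
        }
        return wins / ((double) treatment.length * control.length);
    }
}
```

A value of 0.5 indicates that the two groups are indistinguishable, while values close to 1.0 correspond to the large effects reported here.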



Results are not influenced by developers’ experience:

(i) the number of bugs fixed is not significantly influenced by the programming experience;

(ii) there is no significant interaction between the programming experience and the presence of test case summaries.

Summary: Using automatically generated test case summaries significantly helps developers to identify and fix more bugs.

RQ2: How do test case summaries impact developers to change test cases in terms of structural and mutation coverage?



Only for Rational is there an improvement of the mutation score (+10%) when tests are enriched with summaries.

Summary: Test case summaries do not influence how the developers manage the test cases in terms of structural coverage.

Test Case Summaries and Comprehension

[Figure: perceived test comprehensibility WITH and WITHOUT TestDescriber summaries, rated on a five-point scale (Very Low, Low, Medium, High, Very High)]


WITH Summaries:

(i) 46% of participants consider the test cases as “easy to understand”.

(iii) Only 18% of participants considered the test cases as incomprehensible.

WITHOUT Summaries:

(i) Only 15% of participants consider the test cases as “easy to understand”.

(iii) 40% of participants considered the test cases as incomprehensible.

Summary: Test summaries statistically improve the comprehensibility of automatically generated test cases according to human judgments.

Quality of TestDescriber’s Summaries

Expressiveness [pie chart; slices shown: 30%, 70%]
- Is easy to read and understand
- Is somewhat readable and understandable
- Is hard to read and understand

Conciseness [pie chart; slices shown: 10%, 52%, 38%]
- Has no unnecessary information
- Has some unnecessary information
- Has a lot of unnecessary information

Content adequacy [pie chart; slices shown: 13%, 37%, 50%]
- Is not missing any information
- Missing some information
- Missing some very important information

Conclusion

1) Using automatically generated test case summaries significantly helps developers to identify and fix more bugs.

2) Test case summaries do not influence how the developers manage the test cases in terms of structural coverage.

3) Test summaries statistically improve the comprehensibility of automatically generated test cases according to human judgments.

Panichella et al., “The Impact of Test Case Summaries on Bug Fixing Performance: An Empirical Investigation”. ICSE 2016.
