java unit testing tool competition — fifth round

Annibale Panichella, Urko Rueda Molina
software verification & validation

Upload: annibale-panichella

Post on 22-Jan-2018



Previous Editions

Round    Year  Venue    Coverage tool  Mutation tool       #CUTs  #Projects  #Participants  Statistical tests
Round 1  2013  ICST     Cobertura      Javalanche          77     5          2              ✗
Round 2  2014  FITTEST  JaCoCo         PITest              63     9          4              ✗
Round 3  2015  SBST     JaCoCo         PITest              63     9          8              ✗
Round 4  2016  SBST     Defects4J (real faults)            68     5          4              ✗


New Edition

Round    Year  Venue    Coverage tool  Mutation tool       #CUTs  #Projects  #Participants  Statistical tests
Round 5  2017  SBST     JaCoCo         PITest + our env.   69     8          2+2            ✓


The Infrastructure


• The previous edition used Defects4J to detect flaky tests and to measure effectiveness

• In the new edition, we modified the infrastructure to work with libraries that are not in Defects4J

• We developed our own tool to detect flaky tests

• Effectiveness is now based on mutation analysis: PITest + JaCoCo


Test Management

Flaky tests:
• Pass during generation but fail when re-executed
• Detection mechanism: we run each test suite five times
• Ignored when computing the coverage scores

Non-compiling tests:
• Generated test suites were re-compiled in our own execution environment
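The flaky-test check described above can be sketched as follows. This is only an illustration, not the competition's actual tool: the class `FlakyCheck`, the method `isFlaky`, and the modelling of a test as a `BooleanSupplier` (true means the test passed) are all assumptions made for this example.

```java
import java.util.function.BooleanSupplier;

final class FlakyCheck {
    /**
     * Re-executes a generated test `runs` times. The test is assumed to have
     * passed during generation, so any failing re-execution marks it as flaky.
     * (Hypothetical helper, not part of the competition infrastructure.)
     */
    static boolean isFlaky(BooleanSupplier test, int runs) {
        boolean failedOnce = false;
        for (int i = 0; i < runs; i++) {
            if (!test.getAsBoolean()) {
                failedOnce = true; // keep going so every repetition is executed
            }
        }
        return failedOnce;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        BooleanSupplier stable = () -> true;
        BooleanSupplier unstable = () -> ++calls[0] != 3; // fails on its third execution only
        System.out.println("stable flaky?   " + isFlaky(stable, 5));
        System.out.println("unstable flaky? " + isFlaky(unstable, 5));
    }
}
```

Tests flagged this way would then be left out when the coverage scores are computed.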


Metric Computation

Code Coverage:
• Statement coverage
• Condition coverage

Mutation Score:
• We did not use PITest’s running engine, since it gave errors for test cases with ad-hoc/non-standard JUnit runners (e.g., in EvoSuite)
• We only use the PITest engine for the generation of mutants
• Combining PITest with JaCoCo: we execute only the mutants that infect covered lines
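A minimal sketch of the last point, assuming the mutants and the covered line numbers have already been extracted from the PITest and JaCoCo reports. The `Mutant` type, its fields, and `onCoveredLines` are hypothetical names for this example; they are not PITest or JaCoCo APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class CoveredMutants {
    /** Hypothetical, simplified view of a mutant: an identifier and the line it changes. */
    static final class Mutant {
        final String id;
        final int line;

        Mutant(String id, int line) {
            this.id = id;
            this.line = line;
        }
    }

    /** Keeps only the mutants whose mutated line was covered by the test suite. */
    static List<Mutant> onCoveredLines(List<Mutant> mutants, Set<Integer> coveredLines) {
        List<Mutant> selected = new ArrayList<>();
        for (Mutant m : mutants) {
            if (coveredLines.contains(m.line)) {
                selected.add(m);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Mutant> mutants = Arrays.asList(
                new Mutant("m1", 10), new Mutant("m2", 42), new Mutant("m3", 77));
        Set<Integer> covered = new HashSet<>(Arrays.asList(10, 77));
        for (Mutant m : onCoveredLines(mutants, covered)) {
            System.out.println("execute " + m.id + " (line " + m.line + ")");
        }
    }
}
```

A mutant on a line that no test executes cannot be killed, so skipping it saves execution time without changing which mutants are detected.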


Scoring Formula

We apply the same formula used in the last competition, since it combines coverage metrics, effectiveness, execution time, and the number of flaky/non-compiling tests.

T = generated test, B = search budget, C = class under test, r = independent run
Cov_i = statement coverage, Cov_b = branch coverage, Cov_m = strong mutation
genTime = generation time, L = 2 × B
penalty = percentage of flaky tests and non-compiling tests

$$covScore_{\langle T,B,C,r \rangle} = 1 \times Cov_i + 2 \times Cov_b + 4 \times Cov_m$$

$$tScore_{\langle T,B,C,r \rangle} = covScore_{\langle T,B,C,r \rangle} \times \min\left(1, \frac{L}{genTime}\right)$$

$$Score_{\langle T,B,C,r \rangle} = tScore_{\langle T,B,C,r \rangle} + penalty_{\langle T,B,C,r \rangle}$$
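As a worked illustration of the scoring formula, here is a small sketch. It assumes coverage values are fractions in [0, 1] and times are in seconds, and it takes the penalty as a ready-made input because the slides do not spell out how it is computed or signed. The class and method names are invented for this example.

```java
final class ScoreSketch {
    /** Weighted sum of statement, branch, and strong-mutation coverage (weights 1, 2, 4). */
    static double covScore(double covI, double covB, double covM) {
        return 1 * covI + 2 * covB + 4 * covM;
    }

    /** Scales covScore down when generation took longer than L = 2 x B. */
    static double tScore(double covScore, double budgetSeconds, double genTimeSeconds) {
        double limit = 2 * budgetSeconds; // L = 2 x B
        return covScore * Math.min(1.0, limit / genTimeSeconds);
    }

    /** Final score as written on the slide: tScore plus the penalty term. */
    static double score(double tScore, double penalty) {
        return tScore + penalty;
    }

    public static void main(String[] args) {
        // Assumed example values: 80% statements, 50% branches, 25% mutants.
        double cov = covScore(0.8, 0.5, 0.25); // 0.8 + 1.0 + 1.0
        // Budget of 60s but 150s of generation time: factor min(1, 120/150).
        double t = tScore(cov, 60, 150);
        System.out.println("covScore = " + cov);
        System.out.println("tScore   = " + t);
        System.out.println("score    = " + score(t, 0.0));
    }
}
```

With these inputs covScore is about 2.8 and the time factor is 0.8, so tScore is about 2.24; a tool that finishes within twice its budget keeps its full covScore.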


The Competition


The Tools

• EvoSuite
• JTExpert
• Randoop (automatic unit test generation for Java)
• T3


Selection of the Benchmark Classes

Project  Source                # Classes  # Selected classes  Application domain
BCEL     Apache commons        431        10                  Bytecode manipulation
Jxpath   Apache commons        180        10                  Java Beans manipulation with Path syntax
Imaging  Apache commons        427        4                   Framework to write/read images with various formats
Gson     Google                174        9                   Conversion of Java Objects into their JSON representation and vice versa
Re2j     Google                47         8                   Regular expression engine for time-linear regular expression matching
Freehep  Java Analysis Studio  180        10                  Open-source repository providing Java utilities for high energy physics applications
LA4j     Github                208        10                  Linear Algebra primitives (matrices and vectors) and algorithms
Okhttp   Github                193        8                   HTTP and HTTP/2 client for Android and Java applications


Selection Procedure

HOW:
• Computing McCabe’s cyclomatic complexity (MCC) for all methods in each Java library
• Filtering out all trivial classes, i.e., classes that contain only methods with an MCC < 3
• Random sampling from the pruned projects

WHAT/WHY:
• Removing (likely) trivial classes that are not challenging for the tools
• Developers may use automated tools for complex classes
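The selection steps above can be sketched like this, assuming the per-method MCC values have already been computed by some metrics tool. The class `BenchmarkSelection`, its methods, the input map, and the fixed seed are assumptions for illustration; the slides do not describe the actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

final class BenchmarkSelection {
    /** Drops trivial classes, i.e. classes in which every method has an MCC below 3. */
    static List<String> nonTrivial(Map<String, List<Integer>> mccByClass) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> e : mccByClass.entrySet()) {
            boolean hasComplexMethod = false;
            for (int mcc : e.getValue()) {
                if (mcc >= 3) {
                    hasComplexMethod = true;
                }
            }
            if (hasComplexMethod) {
                kept.add(e.getKey());
            }
        }
        Collections.sort(kept); // stable order so that seeded sampling is reproducible
        return kept;
    }

    /** Randomly samples up to n classes from the pruned candidates. */
    static List<String> sample(List<String> candidates, int n, long seed) {
        List<String> shuffled = new ArrayList<>(candidates);
        Collections.shuffle(shuffled, new Random(seed));
        return new ArrayList<>(shuffled.subList(0, Math.min(n, shuffled.size())));
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> mcc = new HashMap<>();
        mcc.put("Getters", Arrays.asList(1, 1, 2)); // trivial: filtered out
        mcc.put("Parser", Arrays.asList(1, 12, 4));
        mcc.put("Solver", Arrays.asList(3));
        List<String> pruned = nonTrivial(mcc);
        System.out.println("pruned:  " + pruned);
        System.out.println("sampled: " + sample(pruned, 1, 42L));
    }
}
```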


Benchmark Statistics

Largest Class: Name = XPathParserTokenManager, Project = JXPATH, N. Statements = 1029, N. Branches = 872

Smallest Class: Name = ForwardBackSubstitutionSolver, Project = LA4J, N. Statements = 26, N. Branches = 20

[Figure: two histograms showing the frequency of classes by # Branches and by # Statements]


The Methodology

• Search Budgets = 10s, 30s, 60s, 120s, 240s, 300s, 480s

• Number of CUTs = 69

• Number of repetitions = 3

• All tools have been executed in parallel (multi-threading) on the same machine

• Statistical analysis:
  • Friedman’s test: non-parametric test for multiple-problem analysis
  • Post-hoc Conover’s procedure for pairwise multiple comparisons
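To make the Friedman step concrete, the sketch below ranks the tools on each problem (rank 1 for the highest score, ties sharing the average rank), averages the ranks per tool, and computes the textbook Friedman chi-square statistic without tie correction. It is a generic illustration with made-up scores, not the analysis script used for the competition, and it leaves out the p-value and Conover's post-hoc procedure.

```java
import java.util.Arrays;

final class FriedmanSketch {
    /** scores[p][t] is the score of tool t on problem p; higher is better. */
    static double[] meanRanks(double[][] scores) {
        int n = scores.length;
        int k = scores[0].length;
        double[] sum = new double[k];
        for (double[] row : scores) {
            for (int t = 0; t < k; t++) {
                int better = 0;
                int equal = 0; // counts tool t itself as well
                for (int u = 0; u < k; u++) {
                    if (row[u] > row[t]) {
                        better++;
                    } else if (row[u] == row[t]) {
                        equal++;
                    }
                }
                // Tied tools occupy ranks better+1 .. better+equal; use their average.
                sum[t] += better + (equal + 1) / 2.0;
            }
        }
        for (int t = 0; t < k; t++) {
            sum[t] /= n;
        }
        return sum;
    }

    /** Friedman chi-square statistic from mean ranks over n problems (no tie correction). */
    static double statistic(double[] meanRanks, int n) {
        int k = meanRanks.length;
        double sumSquares = 0;
        for (double r : meanRanks) {
            sumSquares += r * r;
        }
        return 12.0 * n / (k * (k + 1)) * (sumSquares - k * (k + 1) * (k + 1) / 4.0);
    }

    public static void main(String[] args) {
        // Made-up scores for three tools on three problems.
        double[][] scores = {{3, 2, 1}, {9, 5, 4}, {7, 6, 2}};
        double[] ranks = meanRanks(scores);
        System.out.println("mean ranks = " + Arrays.toString(ranks));
        System.out.println("statistic  = " + statistic(ranks, scores.length));
    }
}
```

A lower mean rank indicates a tool that scores better across problems; the statistic is then compared against a chi-square distribution with k - 1 degrees of freedom.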


The Results


Coverage Results

[Figures: coverage results for search budgets of 10s, 30s, 60s, and 480s]

For 43 of the 69 classes (≈ 62%), at least one of the two participant tools could not generate any test case.

What happens if we consider only the classes for which both EvoSuite and JTExpert could generate tests?

[Figure: filtered results with search budget = 480s]


Scalability

[Figure: % branch coverage (0 to 100) and % strong mutation coverage (0 to 50) versus search budget (10s, 30s, 60s, 120s, 240s, 300s, 480s) for EvoSuite, JTExpert, T3, and Randoop]

Comparison for the class Parser.java extracted from the library Re2j. N. Statements = 760, N. Branches = 565, N. Mutants = 203


Scoring

[Figure: score (0 to 300) versus search budget (10s, 30s, 60s, 120s, 240s, 300s, 480s) for EvoSuite, JTExpert, T3, and Randoop]


Generated vs. Manually-written Tests

Comparison of the scores achieved by:
• EvoSuite after 480s
• JTExpert after 480s
• T3 after 480s
• Randoop after 480s
• Manually-written tests
• Optimal score

N.B.: We only considered the 63 subjects for which we found developer-written tests.

[Figure: bar chart of scores (y-axis 0 to 500) for Optimal, EvoSuite, JTExpert, T3, Randoop, and Manual; value labels as extracted from the slide: 268, 6178125, 251]


Statistical Analysis

Tool      Total score  St. dev.  Friedman’s test rank  Friedman’s test score  Statistically better than (Conover’s procedure)
EvoSuite  1457         193       1                     1.55                   JTExpert, T3, Randoop
JTExpert  849          102       2                     2.71                   T3, Randoop
T3        526          82        3                     2.81                   Randoop
Randoop   448          34        4                     2.92


Lessons Learnt

• Using multi-problem statistical tests

• Selection procedure to filter out (likely) trivial classes

• Subject categories: string manipulation, computationally intensive, object manipulation, etc.

• What next:
  • Publishing the benchmark infrastructure
  • Performing a more in-depth analysis for each subject category
  • More tools, new languages (e.g., C, C#)?
