Towards Building a Universal Defect Prediction Model
DESCRIPTION
To predict files with defects, a suitable prediction model must be built for a software project from either itself (within-project) or other projects (cross-project). A universal defect prediction model that is built from the entire set of diverse projects would relieve the need for building models for each individual project. A universal model could also be interpreted as a basic relationship between software metrics and defects. However, the variations in the distribution of predictors pose a formidable obstacle to building a universal model. Such variations exist among projects with different context factors (e.g., size and programming language). To overcome this challenge, we propose context-aware rank transformations for predictors. We cluster projects based on the similarity of the distribution of 26 predictors, and derive the rank transformations using quantiles of predictors for each cluster. We then fit the universal model on the transformed data of 1,398 open source projects hosted on SourceForge and GoogleCode. Adding context factors to the universal model improves its predictive power. The universal model obtains prediction performance comparable to the within-project models and yields similar results when applied to five external projects (one Apache and four Eclipse projects). These results suggest that a universal defect prediction model may be an achievable goal.
TRANSCRIPT
![Page 1: Towards Building a Universal Defect Prediction Model](https://reader034.vdocuments.mx/reader034/viewer/2022051514/5484e0a1b4af9f870d8b4cbf/html5/thumbnails/1.jpg)
Towards Building a Universal Defect Prediction Model
Feng Zhang
Audris Mockus
Iman Keivanloo
Ying Zou
![Page 2: Towards Building a Universal Defect Prediction Model](https://reader034.vdocuments.mx/reader034/viewer/2022051514/5484e0a1b4af9f870d8b4cbf/html5/thumbnails/2.jpg)
ONE ring that rules the other rings of power.
![Page 3: Towards Building a Universal Defect Prediction Model](https://reader034.vdocuments.mx/reader034/viewer/2022051514/5484e0a1b4af9f870d8b4cbf/html5/thumbnails/3.jpg)
A universal model that predicts defects for all the projects.
![Page 4: Towards Building a Universal Defect Prediction Model](https://reader034.vdocuments.mx/reader034/viewer/2022051514/5484e0a1b4af9f870d8b4cbf/html5/thumbnails/4.jpg)
Most successful prediction models are within-project models
![Page 5: Towards Building a Universal Defect Prediction Model](https://reader034.vdocuments.mx/reader034/viewer/2022051514/5484e0a1b4af9f870d8b4cbf/html5/thumbnails/5.jpg)
How about cross-project models?
![Page 6: Towards Building a Universal Defect Prediction Model](https://reader034.vdocuments.mx/reader034/viewer/2022051514/5484e0a1b4af9f870d8b4cbf/html5/thumbnails/6.jpg)
Deriving a universal model with cross-project models?
![Page 7: Towards Building a Universal Defect Prediction Model](https://reader034.vdocuments.mx/reader034/viewer/2022051514/5484e0a1b4af9f870d8b4cbf/html5/thumbnails/7.jpg)
Select the training set of projects like this?
![Page 8: Towards Building a Universal Defect Prediction Model](https://reader034.vdocuments.mx/reader034/viewer/2022051514/5484e0a1b4af9f870d8b4cbf/html5/thumbnails/8.jpg)
Or select the training set of projects like this?
![Page 9: Towards Building a Universal Defect Prediction Model](https://reader034.vdocuments.mx/reader034/viewer/2022051514/5484e0a1b4af9f870d8b4cbf/html5/thumbnails/9.jpg)
Is it still possible to build a universal model? If so, then how?
![Page 10: Towards Building a Universal Defect Prediction Model](https://reader034.vdocuments.mx/reader034/viewer/2022051514/5484e0a1b4af9f870d8b4cbf/html5/thumbnails/10.jpg)
What context factors to consider?
![Page 11: Towards Building a Universal Defect Prediction Model](https://reader034.vdocuments.mx/reader034/viewer/2022051514/5484e0a1b4af9f870d8b4cbf/html5/thumbnails/11.jpg)
Steps towards building a universal model
1. Partition
Context factors: programming languages (C++, Java) and system size (small size, large size)
Groups: C++ S, C++ L, Java S, Java L
![Page 12: Towards Building a Universal Defect Prediction Model](https://reader034.vdocuments.mx/reader034/viewer/2022051514/5484e0a1b4af9f870d8b4cbf/html5/thumbnails/12.jpg)
Steps towards building a universal model
1. Partition: C++ S, C++ L, Java S, Java L
2. Cluster: C++ S, C++ L, Java (from Java S and Java L)
3. Obtain Ranking Functions: R1(x), R1(x), R3(x)
4. Rank: using quantiles of metric values
(−∞, 10%] => level 1
(10%, 20%] => level 2
…
(90%, +∞) => level 10
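The quantile-based ranking in step 4 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `decile_cutoffs` and `rank_level` are hypothetical, and the nearest-rank quantile definition is an assumption, since the slides do not say which quantile estimator is used.

```python
from bisect import bisect_left


def decile_cutoffs(values):
    """Return the 10%, 20%, ..., 90% cutoffs of a non-empty list of metric values.

    Assumption: nearest-rank quantiles (the slides do not name an estimator).
    """
    ordered = sorted(values)
    n = len(ordered)
    # (k * n + 9) // 10 equals ceil(k * n / 10) in integer arithmetic.
    return [ordered[(k * n + 9) // 10 - 1] for k in range(1, 10)]


def rank_level(x, cutoffs):
    """Map a raw metric value to a level from 1 to 10.

    bisect_left counts the cutoffs strictly below x, so a value equal to a
    cutoff stays in the lower level, matching the (a, b] intervals.
    """
    return bisect_left(cutoffs, x) + 1


if __name__ == "__main__":
    metric_values = list(range(1, 101))  # toy data: 1, 2, ..., 100
    cutoffs = decile_cutoffs(metric_values)
    print(cutoffs)
    print([rank_level(v, cutoffs) for v in (5, 10, 11, 95)])
```

With the toy data above, the cutoffs are 10, 20, ..., 90, so a value of 10 falls in level 1 and a value of 11 in level 2.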
![Page 13: Towards Building a Universal Defect Prediction Model](https://reader034.vdocuments.mx/reader034/viewer/2022051514/5484e0a1b4af9f870d8b4cbf/html5/thumbnails/13.jpg)
Build a universal model
1. Partition: C++ S, C++ L, Java S, Java L
2. Cluster: C++ S, C++ L, Java
3. Obtain Ranking Functions: R1(x), R1(x), R3(x)
4. Rank
Build a universal defect prediction model using rank-transformed values.
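To illustrate why the ranking is context-aware, the sketch below derives one set of cutoffs per cluster and then pools the rank-transformed rows into a single training set. It assumes cluster membership is already known (the clustering step itself is not shown), and the helper names, the toy metric values, and the nearest-rank quantile definition are all hypothetical choices made for this example.

```python
from bisect import bisect_left


def cutoffs_for(values):
    """10%..90% cutoffs of a non-empty list (nearest-rank, an assumption)."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n + 9) // 10 - 1] for k in range(1, 10)]


def build_ranking_functions(cluster_values):
    """cluster_values: dict mapping a cluster name to its raw metric values."""
    return {name: cutoffs_for(vals) for name, vals in cluster_values.items()}


def transform(rows, ranking):
    """rows: (cluster, raw_value, defective) tuples -> (level, defective) tuples.

    Each raw value is ranked against the cutoffs of its own cluster, so the
    pooled output is on one common 1..10 scale.
    """
    return [(bisect_left(ranking[c], v) + 1, d) for c, v, d in rows]


if __name__ == "__main__":
    # Toy data: the second cluster has metric values ten times larger.
    ranking = build_ranking_functions({
        "C++ S": list(range(1, 101)),
        "Java": list(range(10, 1010, 10)),
    })
    pooled = transform([("C++ S", 95, True), ("Java", 95, False)], ranking)
    print(pooled)
```

In this toy setup the same raw value, 95, is near the top of one cluster's distribution and near the bottom of the other's, so it maps to different levels. The pooled rows could then be fed to any classifier.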
![Page 14: Towards Building a Universal Defect Prediction Model](https://reader034.vdocuments.mx/reader034/viewer/2022051514/5484e0a1b4af9f870d8b4cbf/html5/thumbnails/14.jpg)
Case study setup
[Figure: bar charts of the studied projects; panels: Version Control System, Issue Tracking System, Programming languages; category labels: Using, Not Using; extracted bar labels: 937, 461]
![Page 15: Towards Building a Universal Defect Prediction Model](https://reader034.vdocuments.mx/reader034/viewer/2022051514/5484e0a1b4af9f870d8b4cbf/html5/thumbnails/15.jpg)
Research Questions
![Page 16: Towards Building a Universal Defect Prediction Model](https://reader034.vdocuments.mx/reader034/viewer/2022051514/5484e0a1b4af9f870d8b4cbf/html5/thumbnails/16.jpg)
RQ1. Is our rank transformation good?
[Bar chart: Precision, Recall, and AUC for Rank Transformation vs. Log Transformation; y-axis from 0 to 0.7; bar labels in extraction order: 0.48, 0.48, 0.57, 0.58, 0.62, 0.61]
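For readers unfamiliar with the three measures used in the RQ charts, the sketch below computes precision, recall, and AUC from scratch. It uses the standard textbook definitions, with AUC computed as the fraction of (defective, clean) pairs that the score orders correctly. The function names and the toy labels and scores are illustrative only and are not taken from the study.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (truthy = defective)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison; ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 1, 0, 0]            # toy ground truth
    scores = [0.9, 0.4, 0.6, 0.1]    # toy predicted defect probabilities
    y_pred = [s >= 0.5 for s in scores]
    print(precision_recall(y_true, y_pred))
    print(auc(y_true, scores))
```

Precision and recall depend on the chosen cut-off (0.5 here), while AUC does not, which is one reason the two kinds of measure are usually reported together.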
![Page 17: Towards Building a Universal Defect Prediction Model](https://reader034.vdocuments.mx/reader034/viewer/2022051514/5484e0a1b4af9f870d8b4cbf/html5/thumbnails/17.jpg)
RQ2. How good is the universal model?
[Bar chart: Precision, Recall, and AUC for Universal Model vs. Within-project Model; y-axis from 0 to 0.7; bar labels in extraction order: 0.45, 0.48, 0.58, 0.63, 0.64, 0.62]
![Page 18: Towards Building a Universal Defect Prediction Model](https://reader034.vdocuments.mx/reader034/viewer/2022051514/5484e0a1b4af9f870d8b4cbf/html5/thumbnails/18.jpg)
RQ3. Does the universal model work for external projects?
[Diagram label: Predict]
![Page 19: Towards Building a Universal Defect Prediction Model](https://reader034.vdocuments.mx/reader034/viewer/2022051514/5484e0a1b4af9f870d8b4cbf/html5/thumbnails/19.jpg)
RQ3. Precision comparison
[Bar chart: Precision of Universal Model vs. Within-project Model on Eclipse, Equinox, PDE, Mylyn, and Lucene; y-axis from 0 to 0.7; bar labels in extraction order: 0.31, 0.47, 0.63, 0.66, 0.21, 0.13, 0.23, 0.28, 0.23, 0.28]
![Page 20: Towards Building a Universal Defect Prediction Model](https://reader034.vdocuments.mx/reader034/viewer/2022051514/5484e0a1b4af9f870d8b4cbf/html5/thumbnails/20.jpg)
RQ3. Recall comparison
[Bar chart: Recall of Universal Model vs. Within-project Model on Eclipse, Equinox, PDE, Mylyn, and Lucene; y-axis from 0 to 0.9; bar labels in extraction order: 0.57, 0.79, 0.54, 0.61, 0.61, 0.34, 0.47, 0.72, 0.42, 0.60]
![Page 21: Towards Building a Universal Defect Prediction Model](https://reader034.vdocuments.mx/reader034/viewer/2022051514/5484e0a1b4af9f870d8b4cbf/html5/thumbnails/21.jpg)
RQ3. AUC comparison
[Bar chart: AUC of Universal Model vs. Within-project Model on Eclipse, Equinox, PDE, Mylyn, and Lucene; y-axis from 0.6 to 0.8; bar labels in extraction order: 0.76, 0.77, 0.78, 0.79, 0.69, 0.67, 0.70, 0.70, 0.68, 0.69]
![Page 22: Towards Building a Universal Defect Prediction Model](https://reader034.vdocuments.mx/reader034/viewer/2022051514/5484e0a1b4af9f870d8b4cbf/html5/thumbnails/22.jpg)
Summary