Analysing Defect Inflow Distribution of Automotive & Large Software Projects
![Page 1: Analysing Defect Inflow Distribution of Automotive & Large Software Projects](https://reader030.vdocuments.mx/reader030/viewer/2022032507/55cef08cbb61ebfd2c8b4582/html5/thumbnails/1.jpg)
Rakesh Rana
University of Gothenburg, Sweden
Analyzing defect inflow distribution of large software projects
![Page 2: Analysing Defect Inflow Distribution of Automotive & Large Software Projects](https://reader030.vdocuments.mx/reader030/viewer/2022032507/55cef08cbb61ebfd2c8b4582/html5/thumbnails/2.jpg)
Defect inflow distribution
Tracking and predicting quality is a challenge.
Software defects are an observable and useful indicator for tracking and forecasting software reliability.
Software reliability measures are primarily used for [1]:
• Planning and controlling the allocation of testing resources, and
• Evaluating maturity or release readiness.
[1] C.-Y. Huang, M. R. Lyu, and S.-Y. Kuo, “A unified scheme of some nonhomogenous Poisson process models for software reliability estimation,” IEEE Trans. Softw. Eng., vol. 29, no. 3, pp. 261–269, 2003.
![Page 3: Analysing Defect Inflow Distribution of Automotive & Large Software Projects](https://reader030.vdocuments.mx/reader030/viewer/2022032507/55cef08cbb61ebfd2c8b4582/html5/thumbnails/3.jpg)
SRGMs: Software Reliability Growth Models
![Page 4: Analysing Defect Inflow Distribution of Automotive & Large Software Projects](https://reader030.vdocuments.mx/reader030/viewer/2022032507/55cef08cbb61ebfd2c8b4582/html5/thumbnails/4.jpg)
Defect inflow distribution
According to Okamura, Dohi and Osaki [1], understanding the underlying family of the defect distribution is important:
“When the number of total software faults is given by a Poisson random variable, the mean value function of NHPP-based SRGMs is dominated by only failure time distribution. That is, the essential problem can be reduced to what kind of probability distribution is suitable for representing the failure time distribution.”
[1] H. Okamura, T. Dohi, and S. Osaki, “Software reliability growth model with normal distribution and its parameter estimation,” in Quality, Reliability, Risk, Maintenance, and Safety Engineering (ICQR2MSE), 2011 International Conference on, 2011, pp. 411–416.
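The quoted point can be made concrete with a small sketch. Assuming a finite-failure NHPP model, the mean value function has the form m(t) = ω·F(t), where ω is the expected total number of faults and F is the failure-time CDF, so swapping F is what distinguishes one such SRGM from another (an exponential F gives the Goel–Okumoto model). The function names and parameter values below are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical helpers sketching finite-failure NHPP mean value functions:
# m(t) = omega * F(t), with omega the expected total number of faults and
# F the CDF of the failure-time distribution.

def exponential_cdf(t, rate):
    """CDF of Exp(rate); plugged into m(t) it gives the Goel-Okumoto model."""
    return 1.0 - math.exp(-rate * t) if t >= 0 else 0.0

def weibull_cdf(t, scale, shape):
    """CDF of Weibull(scale, shape)."""
    return 1.0 - math.exp(-((t / scale) ** shape)) if t >= 0 else 0.0

def mean_value(t, omega, cdf):
    """Expected cumulative number of faults detected by time t."""
    return omega * cdf(t)

if __name__ == "__main__":
    omega = 100.0  # hypothetical expected total number of faults
    for t in (0, 5, 10, 50):
        m_exp = mean_value(t, omega, lambda x: exponential_cdf(x, rate=0.1))
        m_wei = mean_value(t, omega, lambda x: weibull_cdf(x, scale=10.0, shape=2.0))
        print(f"t={t:>3}: exponential m(t)={m_exp:6.2f}, Weibull m(t)={m_wei:6.2f}")
```

Only the failure-time distribution changes between the two curves; ω stays fixed, which is why choosing the right distribution family matters.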
![Page 5: Analysing Defect Inflow Distribution of Automotive & Large Software Projects](https://reader030.vdocuments.mx/reader030/viewer/2022032507/55cef08cbb61ebfd2c8b4582/html5/thumbnails/5.jpg)
Why?
Finding the distribution that best fits the observed defect inflow data helps with:
1. Understanding the underlying process of defect discovery
2. Choosing the correct statistical analysis
3. Visualization and simulation
4. Selecting an appropriate model for modelling/forecasting reliability growth
5. Bayesian analysis, to model the prior probability
![Page 6: Analysing Defect Inflow Distribution of Automotive & Large Software Projects](https://reader030.vdocuments.mx/reader030/viewer/2022032507/55cef08cbb61ebfd2c8b4582/html5/thumbnails/6.jpg)
Research objectives
Explore which statistical distribution best fits the defect inflow of large software projects, and
Explore how different information criteria differ in selecting the best-fitting distribution.
![Page 7: Analysing Defect Inflow Distribution of Automotive & Large Software Projects](https://reader030.vdocuments.mx/reader030/viewer/2022032507/55cef08cbb61ebfd2c8b4582/html5/thumbnails/7.jpg)
Research methodology
Context: Large Software Projects
Case: Defect Inflow Distribution (best fit)
Unit 1: VCC
Four large automotive software projects
Unit 2: Ericsson
Five consecutive releases of a large telecom product
Unit 3: OSS
Five large open-source software projects

| Unit of analysis | Application domain | Software development process for studied projects |
|---|---|---|
| Volvo Cars Group | Automotive | V-shaped software development, mostly using sub-suppliers for implementation |
| Ericsson | Telecom | Agile development, mostly in-house |
| OSS | Open-source projects | Open-source software development; projects from Apache and Mozilla |

![Page 8: Analysing Defect Inflow Distribution of Automotive & Large Software Projects](https://reader030.vdocuments.mx/reader030/viewer/2022032507/55cef08cbb61ebfd2c8b4582/html5/thumbnails/8.jpg)
Projects used in study

| Case Unit | Project/Release | Time Period | Total Number of Defects*/Issues |
|---|---|---|---|
| VCG | Project-A1 | NA | 6.7X |
| | Project-A2 | NA | 14.4X |
| | Project-A3 | NA | 2.0X |
| | Project-A4 | NA | X |
| Ericsson | Release-B1 | NA | 2.2Y |
| | Release-B2 | NA | Y |
| | Release-B3 | NA | 1.3Y |
| | Release-B4 | NA | 1.2Y |
| | Release-B5 | NA | 1.6Y |
| OSS | Project-HTTPClient | Nov-2001 – Apr-2012 | 305 |
| | Project-Jackrabbit | Sep-2004 – Apr-2012 | 938 |
| | Project-Lucene-Java | Mar-2004 – Mar-2012 | 697 |
| | Project-Rhino | Nov-1999 – Feb-2012 | 302 |
| | Project-Tomcat5 | May-2002 – Dec-2011 | 670 |

![Page 9: Analysing Defect Inflow Distribution of Automotive & Large Software Projects](https://reader030.vdocuments.mx/reader030/viewer/2022032507/55cef08cbb61ebfd2c8b4582/html5/thumbnails/9.jpg)
Overview of distributions
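
| No | Distribution | Notation | Parameters | Probability Density Function |
|---|---|---|---|---|
| 1 | Exponential | $\mathrm{Exp}(\lambda)$ | $\lambda > 0$ | $\lambda e^{-\lambda x}$ |
| 2 | Weibull | $\mathrm{Weibull}(\lambda, k)$ | $\lambda > 0;\ k > 0$ | $\frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^{k}}$ for $x \ge 0$; $0$ for $x < 0$ |
| 3 | Beta | $\mathrm{Beta}(\alpha, \beta)$ | $\alpha > 0;\ \beta > 0$ | $\frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}$, where $B(\alpha, \beta) = \int_0^1 u^{\alpha-1}(1-u)^{\beta-1}\,du$ |
| 4 | Gamma | $\mathrm{Gamma}(k, \theta)$ | $k > 0;\ \theta > 0$ | $\frac{1}{\Gamma(k)\,\theta^{k}}\, x^{k-1} e^{-x/\theta}$, where $\Gamma(k) = \int_0^\infty t^{k-1} e^{-t}\,dt$ |
| 5 | Logistic | $\mathrm{Logistic}(\mu, s)$ | $\mu \in \mathbb{R};\ s > 0$ | $\frac{e^{-(x-\mu)/s}}{s\left(1 + e^{-(x-\mu)/s}\right)^{2}}$ |
| 6 | Normal | $\mathcal{N}(\mu, \sigma^2)$ | $\mu \in \mathbb{R};\ \sigma^2 > 0$ | $\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ |
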
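The six densities in the table can be written down directly as a sketch using only Python's `math` module. The function names are illustrative, not from the study; the Gamma is parameterised by scale θ as in the table, Beta and Gamma use `lgamma` in log space to avoid overflow, and boundary points where the formula would raise `0 ** negative` are handled explicitly.

```python
import math

# Illustrative implementations of the six densities in the table.
# Parameterisations: Exp(lam), Weibull(lam, k), Beta(a, b),
# Gamma(k, theta) with scale theta, Logistic(mu, s), Normal(mu, sigma^2).

def exp_pdf(x, lam):
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def weibull_pdf(x, lam, k):
    if x < 0:
        return 0.0
    if x == 0 and k < 1:
        return math.inf  # density diverges at the origin when k < 1
    return (k / lam) * (x / lam) ** (k - 1) * math.exp(-((x / lam) ** k))

def beta_pdf(x, a, b):
    # Evaluated on the open interval 0 < x < 1.
    if not 0 < x < 1:
        return 0.0
    # log B(a, b) = log Gamma(a) + log Gamma(b) - log Gamma(a + b)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log1p(-x) - log_beta)

def gamma_pdf(x, k, theta):
    if x < 0:
        return 0.0
    if x == 0:
        # Boundary value: infinite for k < 1, 1/theta for k == 1, 0 for k > 1.
        return math.inf if k < 1 else (1.0 / theta if k == 1 else 0.0)
    return math.exp((k - 1) * math.log(x) - x / theta
                    - math.lgamma(k) - k * math.log(theta))

def logistic_pdf(x, mu, s):
    # The density is symmetric in (x - mu), so using |x - mu| avoids overflow.
    z = math.exp(-abs(x - mu) / s)
    return z / (s * (1 + z) ** 2)

def normal_pdf(x, mu, sigma2):
    return math.exp(-((x - mu) ** 2) / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)
```

A quick consistency check is that Weibull with k = 1 and Gamma with k = 1 both reduce to an exponential density, and Beta(1, 1) is uniform on (0, 1).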
![Page 10: Analysing Defect Inflow Distribution of Automotive & Large Software Projects](https://reader030.vdocuments.mx/reader030/viewer/2022032507/55cef08cbb61ebfd2c8b4582/html5/thumbnails/10.jpg)
Overview of information criteria

| No | Short | Long Name | Definition |
|---|---|---|---|
| 1 | LogLik | Log likelihood | Logarithm of the probability of the observed outcomes given a set of parameter values |
| 2 | ML | Maximum Likelihood | $ML = -2 \cdot \mathrm{loglik}$ |
| 3 | AIC | Akaike Information Criterion | $AIC = -2 \cdot \mathrm{loglik} + 2k$ |
| 4 | AICc | Akaike Information Criterion (corrected) | $AICc = -2 \cdot \mathrm{loglik} + \frac{2kn}{n-k-1}$ |
| 5 | BIC | Bayesian Information Criterion | $BIC = -2 \cdot \mathrm{loglik} + k \log(n)$ |
| 6 | HQC | Hannan–Quinn Information Criterion | $HQC = -2 \cdot \mathrm{loglik} + 2k \log(\log(n))$ |

Where $k$ = number of parameters, $n$ = number of observations, and $\log$ is the natural logarithm.
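The criteria above can be computed from a maximised log-likelihood with a few lines of code. This is a sketch, not the study's tooling: the function names are hypothetical, and the closed-form exponential fit (rate = n / Σx) is included only to produce a log-likelihood to feed in. Higher LogLik is better; lower ML, AIC, AICc, BIC and HQC are better.

```python
import math

def information_criteria(loglik, k, n):
    """Criteria from the table, given a maximised log-likelihood.

    loglik: maximised log-likelihood; k: number of estimated parameters;
    n: number of observations. log is the natural logarithm.
    """
    if n <= k + 1:
        raise ValueError("AICc is undefined unless n > k + 1")
    ml = -2.0 * loglik
    return {
        "LogLik": loglik,
        "ML": ml,
        "AIC": ml + 2 * k,
        "AICc": ml + 2 * k * n / (n - k - 1),
        "BIC": ml + k * math.log(n),
        "HQC": ml + 2 * k * math.log(math.log(n)),
    }

def exponential_loglik(data):
    """Maximised log-likelihood of Exp(rate) using the MLE rate = n / sum(x)."""
    n, total = len(data), sum(data)
    rate = n / total
    return n * math.log(rate) - rate * total

if __name__ == "__main__":
    data = [0.2, 0.5, 0.1, 0.4, 0.3]  # hypothetical observations
    scores = information_criteria(exponential_loglik(data), k=1, n=len(data))
    for name, value in scores.items():
        print(f"{name:>6}: {value:.3f}")
```

Note that the AICc form in the table equals AIC + 2k(k + 1)/(n − k − 1), so the two ways of writing it agree.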
![Page 11: Analysing Defect Inflow Distribution of Automotive & Large Software Projects](https://reader030.vdocuments.mx/reader030/viewer/2022032507/55cef08cbb61ebfd2c8b4582/html5/thumbnails/11.jpg)
Defect Inflow Profiles
![Page 12: Analysing Defect Inflow Distribution of Automotive & Large Software Projects](https://reader030.vdocuments.mx/reader030/viewer/2022032507/55cef08cbb61ebfd2c8b4582/html5/thumbnails/12.jpg)
Probability density plots
![Page 13: Analysing Defect Inflow Distribution of Automotive & Large Software Projects](https://reader030.vdocuments.mx/reader030/viewer/2022032507/55cef08cbb61ebfd2c8b4582/html5/thumbnails/13.jpg)
Quantile–quantile plots (QQ-plots)
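A QQ-plot pairs each sorted observation with the quantile a fitted distribution predicts at the same cumulative probability; points near the diagonal indicate a good fit. The sketch below computes those pairs with Python's `statistics.NormalDist` and a hand-written exponential inverse CDF. The helper names and the sample are hypothetical, and the plotting positions (i − 0.5)/n are one common convention among several.

```python
import math
from statistics import NormalDist, fmean, pstdev

def qq_points(sample, inv_cdf):
    """Return (theoretical, observed) quantile pairs for a QQ-plot.

    Uses plotting positions p_i = (i - 0.5) / n, one common convention.
    """
    xs = sorted(sample)
    n = len(xs)
    return [(inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]

def exponential_inv_cdf(rate):
    """Inverse CDF of Exp(rate): F^-1(p) = -ln(1 - p) / rate."""
    return lambda p: -math.log1p(-p) / rate

if __name__ == "__main__":
    data = [0.12, 0.35, 0.08, 0.51, 0.27, 0.19, 0.44]  # hypothetical sample
    normal_fit = NormalDist(mu=fmean(data), sigma=pstdev(data))  # MLE of mu, sigma
    exp_fit = exponential_inv_cdf(rate=1 / fmean(data))  # MLE of the rate
    pairs = zip(qq_points(data, normal_fit.inv_cdf), qq_points(data, exp_fit))
    for (tn, x), (te, _) in pairs:
        print(f"observed={x:.2f}  normal={tn:.3f}  exponential={te:.3f}")
```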
![Page 14: Analysing Defect Inflow Distribution of Automotive & Large Software Projects](https://reader030.vdocuments.mx/reader030/viewer/2022032507/55cef08cbb61ebfd2c8b4582/html5/thumbnails/14.jpg)
Different information criteria: Project-Jack

| Project | Distribution | LogLik | ML | AIC | AICc | BIC | HQC |
|---|---|---|---|---|---|---|---|
| Jack | Exponential | 7.29 | -14.57 | -12.57 | -12.53 | -10.05 | -11.56 |
| | Weibull | 36.25 | -72.50 | -68.50 | -68.36 | -63.45 | -66.46 |
| | Beta | 36.72 | -73.44 | -69.44 | -69.31 | -64.40 | -67.41 |
| | Gamma | 36.05 | -72.10 | -68.10 | -67.96 | -63.06 | -66.06 |
| | Logistic | 31.43 | -62.86 | -58.86 | -58.72 | -53.81 | -56.82 |
| | Normal | 30.79 | -61.58 | -57.58 | -57.44 | -52.53 | -55.54 |
| | Selected criterion value | 36.72 | -73.44 | -69.44 | -69.31 | -64.40 | -67.41 |
| | Selected distribution | Beta | Beta | Beta | Beta | Beta | Beta |

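The "Selected" rows follow from one rule: take the largest LogLik, and the smallest value for every other criterion. A minimal sketch of that rule, using the Project-Jack values transcribed from the table (the function name is hypothetical):

```python
# LogLik, ML, AIC, AICc, BIC and HQC values transcribed from the Project-Jack table.
CRITERIA = ("LogLik", "ML", "AIC", "AICc", "BIC", "HQC")
JACK = {
    "Exponential": (7.29, -14.57, -12.57, -12.53, -10.05, -11.56),
    "Weibull": (36.25, -72.50, -68.50, -68.36, -63.45, -66.46),
    "Beta": (36.72, -73.44, -69.44, -69.31, -64.40, -67.41),
    "Gamma": (36.05, -72.10, -68.10, -67.96, -63.06, -66.06),
    "Logistic": (31.43, -62.86, -58.86, -58.72, -53.81, -56.82),
    "Normal": (30.79, -61.58, -57.58, -57.44, -52.53, -55.54),
}

def select_best(table):
    """Pick the best distribution per criterion.

    LogLik is maximised; ML, AIC, AICc, BIC and HQC are minimised.
    """
    best = {}
    for i, name in enumerate(CRITERIA):
        choose = max if name == "LogLik" else min
        best[name] = choose(table, key=lambda dist: table[dist][i])
    return best

if __name__ == "__main__":
    for criterion, dist in select_best(JACK).items():
        print(f"{criterion:>6}: {dist}")
```

On these values every criterion selects Beta, matching the last row of the table.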
![Page 15: Analysing Defect Inflow Distribution of Automotive & Large Software Projects](https://reader030.vdocuments.mx/reader030/viewer/2022032507/55cef08cbb61ebfd2c8b4582/html5/thumbnails/15.jpg)
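Log-likelihood values for the selected distributions

| Project | Exponential | Weibull | Beta | Gamma | Logistic | Normal |
|---|---|---|---|---|---|---|
| A1 | 59.0 | 63.6 | 105.9 | 66.0 | 8.4 | 5.2 |
| A2 | 5.5 | 7.0 | 19.8 | 5.5 | -6.8 | -3.1 |
| A3 | 56.9 | 82.9 | 104.6 | 98.6 | 11.6 | 10.9 |
| A4 | 119.8 | 188.3 | 491.2 | 199.5 | 50.3 | 35.3 |

| Release | Exponential | Weibull | Beta | Gamma | Logistic | Normal |
|---|---|---|---|---|---|---|
| B1 | 88.1 | 92.6 | 167.1 | 97.9 | 48.8 | 39.6 |
| B2 | -4.3 | 4.6 | 24.8 | 6.0 | 2.7 | 2.0 |
| B3 | -7.8 | 2.8 | 5.0 | 2.9 | -0.7 | 0.3 |
| B4 | 38.8 | 38.9 | 86.7 | 40.4 | 30.2 | 24.7 |
| B5 | -8.4 | 5.4 | 11.9 | 5.3 | 1.7 | 2.2 |

| Project | Exponential | Weibull | Beta | Gamma | Logistic | Normal |
|---|---|---|---|---|---|---|
| Http | 195.4 | 202.1 | 406.2 | 211.9 | 174.6 | 169.0 |
| Jack | 7.3 | 36.2 | 36.7 | 36.0 | 31.4 | 30.8 |
| Lucene | 62.0 | 67.6 | 70.7 | 71.9 | 59.4 | 58.8 |
| Rhino | 52.9 | 63.7 | 59.4 | 77.3 | 31.4 | 28.1 |
| TomCat | 58.5 | 73.0 | 76.9 | 71.9 | 12.3 | 10.7 |
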
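Picking the highest log-likelihood per row can be automated with a short sketch over the values transcribed from the three tables. Because every candidate except the exponential has two parameters, and all are fitted to the same n, ranking those five by log-likelihood gives the same order as ranking them by AIC, AICc, BIC or HQC. The names below are hypothetical.

```python
from collections import Counter

DISTRIBUTIONS = ("Exponential", "Weibull", "Beta", "Gamma", "Logistic", "Normal")

# Log-likelihood values transcribed from the three tables above.
LOGLIK = {
    "A1": (59.0, 63.6, 105.9, 66.0, 8.4, 5.2),
    "A2": (5.5, 7.0, 19.8, 5.5, -6.8, -3.1),
    "A3": (56.9, 82.9, 104.6, 98.6, 11.6, 10.9),
    "A4": (119.8, 188.3, 491.2, 199.5, 50.3, 35.3),
    "B1": (88.1, 92.6, 167.1, 97.9, 48.8, 39.6),
    "B2": (-4.3, 4.6, 24.8, 6.0, 2.7, 2.0),
    "B3": (-7.8, 2.8, 5.0, 2.9, -0.7, 0.3),
    "B4": (38.8, 38.9, 86.7, 40.4, 30.2, 24.7),
    "B5": (-8.4, 5.4, 11.9, 5.3, 1.7, 2.2),
    "Http": (195.4, 202.1, 406.2, 211.9, 174.6, 169.0),
    "Jack": (7.3, 36.2, 36.7, 36.0, 31.4, 30.8),
    "Lucene": (62.0, 67.6, 70.7, 71.9, 59.4, 58.8),
    "Rhino": (52.9, 63.7, 59.4, 77.3, 31.4, 28.1),
    "TomCat": (58.5, 73.0, 76.9, 71.9, 12.3, 10.7),
}

def best_by_loglik(values):
    """Name of the distribution with the highest log-likelihood."""
    best_index = max(range(len(DISTRIBUTIONS)), key=lambda i: values[i])
    return DISTRIBUTIONS[best_index]

# Winner per project/release, then how often each distribution wins.
winners = {project: best_by_loglik(v) for project, v in LOGLIK.items()}
tally = Counter(winners.values())

if __name__ == "__main__":
    for project, dist in winners.items():
        print(f"{project:>6}: {dist}")
    print(dict(tally))
```

On these values Beta has the highest log-likelihood for 12 of the 14 projects and releases, and Gamma for Lucene and Rhino.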
![Page 16: Analysing Defect Inflow Distribution of Automotive & Large Software Projects](https://reader030.vdocuments.mx/reader030/viewer/2022032507/55cef08cbb61ebfd2c8b4582/html5/thumbnails/16.jpg)
Quantile–Quantile plots (QQ-plots)
![Page 17: Analysing Defect Inflow Distribution of Automotive & Large Software Projects](https://reader030.vdocuments.mx/reader030/viewer/2022032507/55cef08cbb61ebfd2c8b4582/html5/thumbnails/17.jpg)
Conclusions
Research objectives:
• Explore which statistical distribution best fits the defect inflow of large software projects, and
• Explore how different information criteria differ in selecting the best-fitting distribution.
Knowing the best-fitting distribution is useful for:
1. Understanding the underlying process of defect discovery
2. Choosing the correct statistical analysis
3. Visualization and simulation
4. Selecting an appropriate model for modelling/forecasting reliability growth
5. Bayesian analysis, to model the prior probability
![Page 18: Analysing Defect Inflow Distribution of Automotive & Large Software Projects](https://reader030.vdocuments.mx/reader030/viewer/2022032507/55cef08cbb61ebfd2c8b4582/html5/thumbnails/18.jpg)