
DEFECT PREDICTION MODEL FOR TESTING PHASE

MUHAMMAD DHIAUDDIN BIN MOHAMED SUFFIAN

UNIVERSITI TEKNOLOGI MALAYSIA


A project report submitted in fulfillment of the requirements for the award of the degree of

Master of Science (Computer Science – Real Time Software Engineering)

Faculty of Computer Science and Information System

Universiti Teknologi Malaysia

MAY 2009


ALHAMDULILLAH….

To my beloved parents, my wife, brothers and sisters

who have given me courage and strength


ACKNOWLEDGEMENT

Many individuals contributed to the success of this research. First and foremost, special thanks go to my academic supervisor, Prof. Dr. Shamsul Sahibuddin, who guided me throughout this research work. My appreciation also goes to my industrial supervisor, Mr. Mohamed Redzuan Abdullah, Senior Manager of the Test Centre of Excellence department, for his support and constructive comments in completing this project.

I am very grateful to my parents and parents-in-law, who always put their trust and faith in me to continue working on this research. Special gratitude goes to my wife, who continually gave me her dedicated encouragement throughout the tough period. Thank you also to the members of the Test COE department for their cooperation and valuable input in ensuring the success of this project, and to my Six Sigma coach for his constant cooperation and technical guidance.

Last but not least, I express my gratitude to my colleagues in Part Time 9 of the Real Time Software Engineering programme. My thanks also go to the staff of the Centre for Advanced Software Engineering (CASE) who were involved directly or indirectly in the project.


ABSTRACT

Predicting defects in the testing phase has become an important part of the improvement initiatives for the software production process. As the group responsible for the successful implementation of the verification and validation process area, all test engineers in the Test Centre of Excellence (Test COE) department are required to discover as many software defects as possible and contain them within the testing phase. This research aims to achieve zero known post-release defects in the software delivered to end users. To achieve this target, the research focuses on establishing a defect prediction model for the testing phase using the Six Sigma methodology. It identifies the customers' requirements for the prediction model and how the model can benefit them. It also outlines the possible factors associated with defect discovery in the testing phase. An analysis of the repeatability and capability of test engineers in finding defects is elaborated. This research also describes the process of identifying the types of data to be collected and the techniques for obtaining them. The relationship between customer needs and technical requirements is then explained. Finally, the proposed defect prediction model for the testing phase is demonstrated via regression analysis, considering the faults found in phases prior to testing and the code size of the software. The achievements of the whole research effort are described at the end of this report, together with the challenges faced and recommendations for future research work.


ABSTRAK

Keperluan terhadap meramalkan kecacatan dalam fasa pengujian adalah penting

pada masa kini sebagai sebahagian daripada inisiatif pembaikan untuk proses penghasilan

perisian. Menjadi kumpulan yang memastikan kejayaan perlaksanaan bidang proses

verifikasi dan validasi, semua jurutera pengujian di jabatan Pusat Kecemerlangan

Pengujian adalah diperlukan dalam memainkan peranan mereka untuk menjumpai

kecacatan perisian sebanyak yang mungkin dan membendung kecacatan tersebut dalam

lingkungan fasa pengujian. Penyelidikan ini menyasarkan untuk mencapai kecacatan

sifar diketahui bagi pasca pelepasan untuk perisian yang diserahkan kepada pengguna

akhir. Untuk mencapai sasaran tersebut, usaha penyelidikan bertumpu kepada

mewujudkan model ramalan kecacatan untuk fasa pengujian dengan menggunakan

kaedah Six Sigma. Ia mengenal pasti keperluan pengguna ke atas keperluan model

ramalan dan juga bagaimana model tersebut memberi manfaat kepada mereka. Ia juga

menggariskan faktor-faktor yang berpotensi dikaitkan dengan penemuan kecacatan dalam

fasa pengujian. Analisa mengenai kebolehulangan dan kemampuan para jurutera

pengujian dalam menjumpai kecacatan turut dihuraikan. Penyelidikan ini juga

menerangkan proses mengenal pasti jenis data yang perlu dikumpul dan teknik untuk

memperolehnya. Kaitan keperluan pengguna dengan keperluan teknikal kemudiannya

diterangkan dengan jelas. Akhirnya, cadangan model ramalan kecacatan untuk fasa

pengujian ditunjukkan melalui analisa regresi. Ini dicapai dengan menimbang kesilapan-

kesilapan yang dijumpai dalam fasa-fasa sebelum fasa pengujian dan juga saiz kod untuk

perisian tersebut. Kejayaan untuk keseluruhan usaha penyelidikan dijelaskan di akhir

tesis bersama-sama dengan cabaran yang dihadapi dan cadangan untuk kerja

penyelidikan seterusnya.


TABLE OF CONTENTS

CHAPTER  TITLE

DECLARATION
DEDICATION
ACKNOWLEDGEMENT
ABSTRACT
ABSTRAK
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS

1 INTRODUCTION
1.1 Introduction
1.2 Introduction to Defect Prediction Model for Software Testing
1.3 Background of Company
1.4 Background of Problem
1.5 Statement of Problem
1.6 Objectives of Study
1.7 Importance of Study
1.8 Scope of Work
1.9 Project Schedule
1.10 Project Outline

2 LITERATURE REVIEW ON DEFECT PREDICTION MODEL FOR TESTING PHASE
2.1 Introduction
2.2 Defect Prediction across Software Development Life Cycle (SDLC)
2.3 Reviews on the Defect Prediction across SDLC and Testing Phase
2.4 Applications and Issues of Defect Prediction
2.5 Summary of the Proposed Solution

3 METHODOLOGY
3.1 Introduction
3.2 Six Sigma - DMADV Methodology
3.3 Supporting Tools

4 PROJECT DISCUSSION
4.1 Introduction
4.2 Findings of Define Phase
4.3 Findings of Measure Phase
4.4 Findings of Analyze Phase

5 CONCLUSION
5.1 Achievements
5.2 Constraints and Challenges
5.3 Recommendation

REFERENCES


LIST OF TABLES

TABLE NO.  TITLE

1.1  Project schedule
2.1  Short-term defect inflow prediction example
2.2  Strength and weakness of defect prediction techniques
3.1  Project team
3.2  Customer identification


LIST OF FIGURES

FIGURE NO.  TITLE

2.1  Defects detection techniques
2.2  Defects per life cycle phase
2.3  Defects based on testing metrics
2.4  Relationship between CMM levels and delivered defects
2.5  Short-term defect inflow prediction example
2.6  Normalized results from the application of CDM Model to test process
2.7  Process Performance Model
2.8  Graphical representation of Rayleigh model parameters
2.9  Prediction without process metrics
2.10  Prediction with process metrics
2.11  High level schematic of whole phase BN
3.1  DMADV phases
4.1  MIMOS software production process
4.2  Schematic diagram
4.3  Detail schematic – Y to X tree diagram
4.4  Team charter
4.5  Customer need statement
4.6  1st level of KJ analysis
4.7  2nd level of KJ analysis
4.8  Kano analysis
4.9  House of quality for defect prediction model
4.10  Test case experiment result
4.11  Assessment agreement
4.12  Assessment agreement for within appraiser
4.13  Assessment agreement for each appraiser against standard
4.14  Assessment agreement for all appraisers against standard
4.15  Operational definition
4.16  Data collection plan
4.17  Data for regression
4.18  Regression result


LIST OF ABBREVIATIONS

BN - Bayesian Network
CMM - Capability Maturity Model
CMMI - Capability Maturity Model Integration
COE - Centre of Excellence
COQUALMO - Constructive Quality Model
CUT - Code and Unit Testing
DfSS - Design for Six Sigma
DMADV - Define, Measure, Analyze, Design, Verify
FMEA - Failure Mode and Effect Analysis
FP - Function Point
IPF - In-Process Fault
ISP - Internet Service Provider
JARING - Joint Advanced Research Integrated Networking
KJ - Kawakita Jiro
KLOC - Kilo Lines of Code
LOC - Lines of Code
MEMS - Micro-Electro-Mechanical Systems
MIMOS - Malaysian Institute for Microelectronic Systems
MOF - Ministry of Finance
MSA - Measurement System Analysis
NEMS - Nano-Electro-Mechanical Systems
PC - Personal Computer
PDF - Probability Density Function
QFD - Quality Function Deployment
R&D - Research and Development
SDLC - Software Development Life Cycle
SEI - Software Engineering Institute
TER - Test Effectiveness Ratio
UAT - User Acceptance Test
V&V - Verification and Validation


CHAPTER 1

INTRODUCTION

1.1 Introduction

This chapter introduces the research effort presented in this project. It gives an overview of the research that motivates the establishment of a defect prediction model for the software testing phase. The discussion continues with the background of the research, the problem statement, the research objectives and the importance of the research. The scope of work and the project outline are then explained in the last sections of this chapter.

1.2 Introduction to Defect Prediction Model for Software Testing

As an organization that aims to become a premier applied research centre in frontier technologies, MIMOS has always been committed to developing, producing and releasing high-quality software to the market. One of the keys to achieving this is an effective and efficient software development process throughout the entire SDLC. Thus, predicting or estimating the defects of a particular software product during the testing phase is crucial to enhancing the testing process as part of process improvement in the SDLC.

Testing is the last gate before a software product is acknowledged as ready to go to market, and this decision requires strong and accurate data and metrics. A defect prediction model for the testing phase helps in determining the defects that are likely to occur during test execution and contributes relevant software quality metrics. It also contributes to zero known post-release defects of a software product, which is determined by defect containment in the testing phase. Predicting the total number of defects at the start of testing allows wider test coverage to be put in place. As more defects are contained within the testing phase, the quality of the software product delivered to the end user improves. Using testing metrics to predict total defects also demonstrates the stability of the development effort in releasing a software product.

1.3 Background of Company

MIMOS, the Malaysian Institute for Microelectronic Systems, was established on 1 January 1985 as a unit of the Prime Minister’s Department, following an initiative by a group of academicians led by Tengku Dr. Mohd Azzman Shariffadeen. The initial objective was to conduct microelectronics research to support industry and to develop indigenous products. After being corporatized as a company under the Ministry of Finance (MOF), MIMOS focused on three (3) core functions: Research and Development (R&D), National IT Policy Development and Business Development. Since then, MIMOS has embarked on various initiatives and projects, including manufacturing an affordable personal computer (PC), commissioning an industrial-class wafer fabrication plant, launching Malaysia’s first Internet Service Provider (ISP), called JARING, initiating Computer Forensic Services and launching AgriBazaar. On 1 July 2006, Dato’ Abdul Wahab Abdullah was appointed as the new President and Chief Executive Officer of MIMOS, replacing Tengku Dr. Mohd Azzman Shariffadeen. Under Dato’ Abdul Wahab, MIMOS has been transformed from an R&D organization in ICT and microelectronics into a world-class R&D Centre of Excellence.

With the tagline “Innovation for Life”, MIMOS is now the premier applied research centre in frontier technologies, aimed at growing globally competitive indigenous industries. Through smart partnerships with local and international universities, research institutes, industries and the Malaysian Government, MIMOS focuses on frontier technologies by pursuing exploratory and industry-driven applied research. To date, research and technology areas in MIMOS are organized into eight (8) technology clusters: Advanced Informatics, Communication Technology, Cyberspace Security, Encryption Systems, Grid Computing, Knowledge Technology, Micro Energy, and Micro Systems (MEMS/NEMS).

1.4 Background of the Problem

Because finding defects is the main purpose of software testing, test engineers of the Test Centre of Excellence (Test COE) are expected to discover any errors, bugs and faults in the software through various testing techniques and strategies. Estimating the number of defects to be found at the start of the testing phase is very important for planning how to execute the tests for the software. This research concentrates on formulating a defect prediction model for testing, and several issues contribute to the research problem.

1.4.1 Issue on Better Resource Planning for Test Execution Across Projects

Currently, the number of test engineers allocated to a particular testing project is based on the size of the project, including how detailed the requirements are and the complexity of the software being developed. At the same time, one test engineer can work on more than one testing project. Thus, there is a need to estimate the number of defects to be found in the testing phase so that an appropriate number of test engineers can be planned and allocated across multiple projects. The estimated number of defects is required to support resource planning within the testing department, to ensure that resources are optimized and that the productivity of every test engineer is high.

1.4.2 Issue on Wider Test Coverage to Find Defect

Defects are found while test execution is in progress. A test engineer applies various testing techniques to discover as many defects as possible based on the baseline requirements. There is no specific predetermined factor that a test engineer can use as a basis for finding defects. As a result, the test engineer continues to look for defects until the end of the execution schedule. A prediction of defects could improve the way test engineers find defects by giving them a target number of defects to be found, leading to wider and better test coverage. This can take the form of adding more types of testing or adding more relevant scenarios of how users will use the software, which results in better root cause analysis of the defects found and improves the engineer’s understanding of the software under test.

1.4.3 Issue on Improving Test Execution Time to Meet Project Deadline

As test engineers need to discover as many defects as possible during test execution, the schedule might slip and delay the delivery of the software work product beyond the planned release date. This is due to the necessity of ensuring that all testing requirements for the software are fulfilled and covered. Putting defect prediction in place is expected to reduce the schedule slippage contributed by testing activities. With a target for the estimated number of defects to be discovered, every test engineer would be able to plan test execution accordingly to ensure that the project deadline is met.


1.4.4 Issue on Reliability of Software to be Delivered

Defects found during the testing phase are returned to the development team for fixing. The fixed software is then retested over several iterations to validate that the defects have been resolved. The more defects found within the testing phase, the more defects are prevented from escaping to the field or market where the software will be used. However, test engineers cannot give the exact number of defects that can be contained within the testing phase. This is where defect prediction is needed, to provide direction on how many defects engineers should discover and contain within the phase. Having this estimated figure contributes to zero known post-release defects of the software. In the long run, the metrics associated with defect prediction will portray the stability of the development effort in completing and releasing a software product.

1.5 Statement of the Problem

This research is intended to tackle the issues in the system testing process explained in Section 1.4. The main question to address is “How to predict the total number of defects to be found at the start of system testing phase using a model?”. The following sub-questions support the main research question:

i. What are the key contributors to a test defect prediction model?

ii. What are the factors that contribute to the defects found in the system testing phase?

iii. How can the relationship between these factors and the total number of defects in the system testing phase be measured?

iv. Which categories of defect need to be considered in calculating the total defects of the software?

v. How can the prediction model help in improving the testing process and the software quality?

vi. What type of data should be gathered, and how can it be obtained?


1.6 Objectives of the Study

The research aims to achieve the following objectives to address the issues and problems described above:

1) To establish a defect prediction model for the software testing phase

2) To demonstrate the approach to building a defect prediction model using the Design for Six Sigma (DfSS) methodology

3) To identify the significant factors that contribute to a reliable defect prediction model

4) To determine the importance of a defect prediction model for improving the testing process

1.7 Importance of Study

As the organization moves towards achieving CMMI Level 5 status, it needs to improve and refine the Verification and Validation (V&V) process area. For this reason, the Test Centre of Excellence (Test COE) plays an important role in ensuring that this goal is achieved by improving the internal testing process. As the testing approach being applied is based on the V-Model, it is essential to refine all V&V processes across all phases of the life cycle, from requirements through to the actual system testing phase. A model to estimate and predict the total number of defects in the system testing phase helps the testing team contribute to achieving this target.

Putting defect prediction into the process serves as a preventive mechanism for reducing the occurrence of defects (Mohapatra and Mohanty, 2001). Furthermore, it could be a beneficial tool for reducing testing time, which is accomplished by implementing an effective test strategy that minimizes escaped defects while using resources efficiently. At the same time, the development team can use the model to guide them in producing higher-quality code. From the overall perspective of the software development life cycle, a defect prediction model for testing improves the verification and validation process, specifically in ensuring zero known post-release defects of a software product.

1.8 Scope of Work

The scope of this research is to explore and establish a defect prediction model specifically for the system testing phase. The study of defects covers the identification of faults, errors and bugs in the software during the system testing phase of the software development life cycle. These range from functional, security and usability defects to performance defects. To predict these defects, analysis is done to determine the factors that contribute to the introduction of defects in the testing phase. This involves identifying all possible significant factors, such as faults found in the requirement phase, design phase, and code and unit test phase, the size of the software, fault density and historical defects. The scope of work also includes measuring the capability of test engineers in discovering defects.

The modeling work establishes the relationship between the identified predictors and the defects found in the testing phase. Each possible predictor needs to be analyzed individually to determine which factors have a strong connection with defects. The output is the proposed model, which will be normalized to suit all kinds of projects.
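
One simple way to screen each candidate predictor individually is to compute its Pearson correlation with the number of defects found in testing. The sketch below is illustrative only: the choice of Pearson correlation as the screening statistic, the predictor names and all data values are assumptions made for this example and are not taken from the project data.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length (at least 2 points)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


def rank_predictors(
    candidates: Dict[str, List[float]], defects: List[float]
) -> List[Tuple[str, float]]:
    """Return (name, r) pairs ordered from strongest to weakest |r|."""
    scored = [(name, pearson_r(values, defects)) for name, values in candidates.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical per-project data, invented for illustration only.
    candidates = {
        "requirement_faults": [5, 9, 12, 20, 25],
        "design_faults": [3, 8, 4, 10, 7],
        "kloc": [10, 18, 22, 41, 50],
    }
    defects_in_testing = [14, 22, 30, 51, 63]
    for name, r in rank_predictors(candidates, defects_in_testing):
        print(f"{name}: r = {r:.3f}")
```

A predictor with a coefficient close to +1 or -1 would be a natural candidate for the regression model, while one close to 0 would need further justification before being included.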

1.9 Project Schedule

The research, which also serves as professional training, ran from 20 October 2008 until 17 April 2009. However, the actual end date for this project is 30 May 2009, since it needs to follow the agreed schedule of the Six Sigma Green Belt project methodology, which proceeds through the Define, Measure, Analyze, Design and Verify phases. For the purpose of this project, the results and discussion are presented up to the end of the scheduled Verify phase. The schedule is presented below:

Table 1.1: Project schedule

Phase     Start Date   End Date
Define    20/10/2008   30/11/2008
Measure   01/12/2008   31/01/2009
Analyze   01/02/2009   31/03/2009
Design    01/04/2009   30/04/2009
Verify    01/05/2009   30/05/2009

1.10 Project Outline

This research covers several topics related to establishing a defect prediction model for the testing phase. The remainder of the report is organized as follows.

Chapter 2: This chapter reviews the literature on defect prediction models. The discussion covers an overview of several techniques for predicting software defects across the Software Development Life Cycle, issues with regard to defect prediction, strategies for predicting defects in the software testing phase, and the application of defect prediction models in improving the software process.

Chapter 3: This chapter discusses the research methodology applied in analyzing the research problem and formulating the proposed solution, following the Six Sigma Green Belt DMADV track.

Chapter 4: This chapter discusses the outcome of the research activities, covering the characteristics of the data gathered, the analysis of the relationship between possible factors that contribute to defect prediction, and the establishment of the proposed model, which is to be verified and validated to ensure it is fit to be incorporated into the software process.

Chapter 5: This chapter summarizes the research, including the achievements obtained and how the proposed defect prediction model for testing could benefit its users. It concludes with the limitations of the proposed solution together with recommendations for future research work.


CHAPTER 2

LITERATURE REVIEW ON DEFECT PREDICTION MODEL FOR TESTING PHASE

2.1 Introduction

This chapter describes the approaches to predicting the number of defects to be discovered in a software product, particularly in the software testing phase. It presents an overview of various techniques and models for predicting software defects across the Software Development Life Cycle (SDLC). It then focuses on strategies for estimating defects in the software testing phase using various models. Next, it describes the application and use of defect estimation with regard to software process improvement and software quality. Several critiques of defect prediction models are also presented. Finally, this chapter presents the proposed model for predicting and estimating defects in the software testing phase.

2.2 Defect Prediction across Software Development Life Cycle (SDLC)

This section describes the approaches to defect prediction throughout the Software Development Life Cycle (SDLC). It covers perspectives on defects and defect prediction, approaches and techniques of defect prediction, and the relationship between defect prediction and reliability.


2.2.1 Perspectives of Defect and Defect Prediction

The term defect can be expressed in various ways. A defect is a flaw in a component or system that can cause the component or system to fail to perform its required function (Graham, Veenendaal, Evans and Black, 2007). According to Fenton and Neil (1999), a deviation from the specifications or expectations of a particular software product is also called a defect. Clark and Zubrow (2001) describe a defect as any flaw or imperfection in a software work product or software process. A defect that has been found is referred to as a fault or bug. Whichever definition is used, they share a common understanding: when a defect occurs, it may cause failures in the operation of a component or system.

In the context of defect prediction, it is vital to analyze defects and understand the rationale for predicting them. Predicting defects is important for assessing project progress and planning defect detection activities. It also helps those involved in producing the software work product to judge its quality. Process performance can be assessed, which in turn improves the capability of the process. That is why the defect is always the main subject of defect prediction.

2.2.2 Defect Detection and Defect Prediction

As defects are the main focus of defect prediction, we should be able to distinguish between defect severities, namely major and minor defects. Minor defects should not be taken into consideration, as they inflate the estimate of product defects. From the observations made, most defect prediction depends on historical data. Furthermore, the techniques used to predict defects vary, especially in terms of the data required (Clark and Zubrow, 2001). A prediction technique may require little or much data, and it may rely on work product characteristics or use only defect data. These differences in the inputs used for predicting defects determine the strengths and weaknesses of a particular defect prediction technique.

Before estimating defects, we must first be aware of how defects are generated and detected. The purpose of understanding defect detection is to identify the sources of defects and how they are discovered. Defects can be detected either through the verification and validation (V&V) process or after deployment. Figure 2.1 summarizes the defect detection techniques outlined in the study by Clark and Zubrow (2001).

Figure 2.1: Defects detection techniques

In general, defect prediction deals with estimating the number of defects or faults. The term is often used interchangeably with defect estimation, fault prediction and fault estimation. Nayak and Naidya (2003) describe defect estimation as a proactive process of identifying various kinds of defects in the design, content and code of a software product, with the aim of enhancing product quality and performance capability. Defect prediction helps in estimating the quality of software before it is released to users. Fenton and Neil (1999) observed that efforts to do this fall into three areas: predicting the number of defects in a system, estimating the reliability of systems in terms of time to failure, and understanding the impact of design and testing processes on defect counts and defect densities.

Defect prediction is expressed in the form of equations that describe the defect inflow as a function of other selected measurements, such as milestone completion status or lines of code (LOC), from either a short-term or a long-term standpoint (Staron and Meding, 2007). Both standpoints help in monitoring project status and progress during software development.

2.2.3 Approaches and Techniques of Defect Prediction

Various approaches and techniques have been formulated and applied to predict the number of defects throughout the SDLC. These techniques, presented in the form of models or equations, are developed from several sources and metrics. Neil and Fenton (1999) presented their findings on how defects are predicted. The first approach is prediction using size and complexity metrics, which predicts defects directly from program code, mostly from lines of code and McCabe’s cyclomatic complexity. According to them, a study by Akiyama of Fujitsu, Japan showed that linear models of some simple metrics provide reasonable estimates of the total number of defects. Of the four equations he computed, one relates defects (D) to lines of code (L):

$D = 4.86 + 0.018\,L$

They also cite Gaffney’s argument that the relationship between D and L is not language dependent and implies an optimal size for individual modules with regard to defects. Using Lipow’s data, the prediction is:

$D = 4.2 + 0.0015\,L^{4/3}$


Further analysis was conducted by Compton and Withrow, who derived a polynomial equation and concluded that the optimum size of an Ada module, with respect to minimizing error density, is 83 source statements. The equation is:

$D = 0.069 + 0.00156\,L + 0.00000047\,L^{2}$
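
The three size-based equations quoted above can be evaluated directly for a given code size. The following is a minimal sketch: the coefficients are those quoted in the text, while the function names and the sample sizes are chosen here for illustration. It also assumes that L is measured in the same unit for all three models, which may not hold for the original studies, and since each model was fitted to a different data set the outputs are not directly comparable.

```python
def akiyama_defects(loc: float) -> float:
    """Akiyama's linear model: D = 4.86 + 0.018 L."""
    return 4.86 + 0.018 * loc


def gaffney_defects(loc: float) -> float:
    """Gaffney's model: D = 4.2 + 0.0015 L^(4/3)."""
    return 4.2 + 0.0015 * loc ** (4 / 3)


def compton_withrow_defects(loc: float) -> float:
    """Compton and Withrow's polynomial: D = 0.069 + 0.00156 L + 0.00000047 L^2."""
    return 0.069 + 0.00156 * loc + 0.00000047 * loc ** 2


if __name__ == "__main__":
    # Sample sizes are arbitrary values picked for illustration.
    for size in (100, 1000, 5000):
        print(
            size,
            round(akiyama_defects(size), 2),
            round(gaffney_defects(size), 2),
            round(compton_withrow_defects(size), 2),
        )
```

All three functions take only code size as input, which illustrates the limitation of this family of models: nothing about the process or the testing effort enters the estimate.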

The second approach outlined by Neil and Fenton is predicting defects using Function Points (FP), a measure of the amount of functionality in the requirements for a particular software product. Albrecht’s Function Points support defect density prediction using a metric extracted at the specification stage, based on the belief that a function point-based metric is better than lines of code and is language independent.

Figure 2.2: Defects per life cycle phase

Testing metrics are another approach described by Neil and Fenton for predicting defects. This involves careful collection of data on defects found during the inspection and testing phases. Test coverage is one of the testing metrics used to predict defects via a structural testing strategy. The resulting metric is called the Test Effectiveness Ratio (TER), which covers statement coverage, branch coverage or Linear Code Sequence and Jump (LCSAJ) coverage. An example of how defects are found based on testing metrics is presented in Figure 2.3.


Figure 2.3: Defects based on testing metrics
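
The text does not give a formula for the Test Effectiveness Ratio. The sketch below assumes the usual form of a coverage measure, namely the number of items exercised by the tests divided by the total number of items, applied in turn to statements, branches and LCSAJs. The function name and all counts are hypothetical.

```python
def coverage_ratio(exercised: int, total: int) -> float:
    """Fraction of items (statements, branches or LCSAJs) exercised by the tests."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= exercised <= total:
        raise ValueError("exercised must lie between 0 and total")
    return exercised / total


if __name__ == "__main__":
    # Hypothetical counts for one module under test.
    statement_ter = coverage_ratio(exercised=180, total=200)
    branch_ter = coverage_ratio(exercised=45, total=60)
    lcsaj_ter = coverage_ratio(exercised=30, total=50)
    print(f"statement: {statement_ter:.2f}")
    print(f"branch:    {branch_ter:.2f}")
    print(f"LCSAJ:     {lcsaj_ter:.2f}")
```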

Finally, Neil and Fenton described the use of process quality data to predict software defects, expressed through the SEI Capability Maturity Model (CMM) ranking. Figure 2.4 outlines the relationship between CMM levels and delivered defects.

Figure 2.4: Relationship between CMM levels and delivered defects

For large software projects, Staron and Meding (2007) produced two types of defect inflow prediction model: one for short-term and another for long-term defect inflow prediction. Historical data from defect inflow trends and project plans are used to construct the short-term prediction model. From the data, a multivariate linear regression model is created, which is then applied in new projects to predict the number of defects for a particular week. This multivariate regression model for short-term prediction is represented as an equation based on several independent variables, as below:

Y = a_0 x_0 + a_1 x_1 + … + a_n x_n

In the equation, each ‘a’ is a coefficient calculated using statistical regression, while each ‘x’ is an independent variable. Based on the short-term prediction model, project team members can predict the number of inflow defects to be found in a future week of project execution. As presented in the example below from their study, the project manager should pay more attention to week 13, based on the current situation of the project, including the number of defects reported in the current and previous weeks as well as the status of the planned and accumulated number of packages.

Figure 2.5: Short-term defect inflow prediction example
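
The short-term model described above is an ordinary multivariate linear regression, so its mechanics can be sketched briefly. The example below is only an illustration and not Staron and Meding's actual model: the three predictors (defects in the previous week, planned packages, accumulated packages), all the numbers, and the choice of fixing x_0 = 1 so that a_0 acts as an intercept are assumptions made here, and NumPy's least-squares solver stands in for whatever statistical tool is used in practice.

```python
import numpy as np

# Hypothetical history: one row per past week.
# Columns: defects reported in previous week, planned packages,
# accumulated packages.
X_hist = np.array([
    [4.0, 2.0, 2.0],
    [6.0, 3.0, 5.0],
    [5.0, 1.0, 6.0],
    [9.0, 4.0, 10.0],
    [7.0, 2.0, 12.0],
    [11.0, 5.0, 17.0],
])
# Hypothetical defect inflow observed in the week that followed each row.
y_hist = np.array([6.0, 5.0, 9.0, 7.0, 11.0, 10.0])


def fit_coefficients(X, y):
    """Least-squares estimate of a_0..a_n in Y = a_0*x_0 + ... + a_n*x_n,
    with x_0 fixed at 1 (an assumption) so that a_0 is the intercept."""
    design = np.column_stack([np.ones(len(X)), X])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs


def predict(coeffs, x_new):
    """Predicted defect inflow for one new week of predictor values."""
    return float(coeffs[0] + np.dot(coeffs[1:], x_new))


coeffs = fit_coefficients(X_hist, y_hist)
next_week = predict(coeffs, np.array([10.0, 3.0, 20.0]))
print("coefficients:", np.round(coeffs, 3))
print("predicted inflow for next week:", round(next_week, 1))
```

With so few hypothetical observations the coefficients carry no real meaning; the point is only the shape of the workflow: fit on historical weeks, then apply the coefficients to the coming week.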

The Software Engineering Institute (SEI) of Carnegie Mellon University also conducted a study on how good a particular piece of software is. That study, by Clark and Zubrow (2001), emphasized techniques for defect prediction. They categorized defect prediction techniques into three areas: project management, work product assessment and process improvement. Project management covers prediction techniques such as empirical defect prediction, defect discovery profile, COQUALMO and orthogonal defect classification. The work product assessment area involves fault proneness evaluation and capture/recapture analysis. In the process improvement area, defect prevention and statistical process control techniques are used. The descriptions of all these techniques are presented in the table below:

Table 2.1: Description of defect prediction techniques

| Area | Technique | Description |
|---|---|---|
| Project Management | Empirical Defect Prediction | Number of defects per size (defect density): defect density (number of defects per thousand lines of code) based on historical data; enhanced with historical data on injection distribution and yield |
| Project Management | Defect Discovery Profile | Projection, based on time or phases, of defect density found in process onto a theoretical discovery curve (Rayleigh) |
| Project Management | COQUALMO | Defect prediction model for requirements, design and coding phases based on sources of introduction and discovery techniques used |
| Project Management | Orthogonal Defect Classification | Classification and analysis of defects to identify project status based on comparison of current defects with historical patterns |
| Work Product Assessment | Fault Proneness Evaluation (Size, Complexity, Prior History) | Analysis of work product attributes to plan for allocation of defect detection resources (inspection and testing) |
| Work Product Assessment | Capture/Recapture Analysis | Analysis of pattern of defects detected within an artifact by independent defect detection activities (inspectors or inspection versus test) |
| Process Improvement | Defect Prevention Program | Root cause analysis of most frequently occurring defects |
| Process Improvement | Statistical Process Control | Use of control charts to determine whether inspection performance was consistent with prior process performance |

2.2.4 Defect Prediction in Testing Phase of Software Development Life Cycle

The approaches in the previous findings on defect prediction mostly covered the potential number of defects to be found across all phases of the Software Development Life Cycle (SDLC), but no prediction technique was explained specifically for the testing phase. Although some techniques address defects to be found in the System Test phase, their findings also take into account defects to be found before and after that phase.

The main intention is to understand more about the techniques for predicting defects to be found specifically in the software testing phase of the SDLC. Bertolino and Marchetti (2003) introduced a simple model called the Bemar model. This model is used to predict the expected number of remaining failures in early test phases. It is quite simple since it predicts the number of defects based on the intervals of time between subsequent failures. The model is represented as below:

N_{F,k} = N_{FTI,k} · E_k[F]

N_{F,k} is the number of failures for k test intervals, N_{FTI,k} is the number of failed test intervals based on test information collected during k test intervals, and E_k[F] is the expectation of failures for k intervals. The Bemar model has been applied to functional testing and also to operational test data. From the results, they concluded that the model assumes detected defects are distributed over the whole test period. They also suggested that the model works well as a complement to reliability growth models.

Sun Microsystems has come up with an approach to simulate and predict test process behavior, including prediction of the number of remaining defects in a product release. In a case study conducted at Sun Microsystems, Karcich, Cangussu and Earl (2003) proposed a state variable model called the CDM Model. The name CDM comes from the developers of this state variable model of the software test process: Cangussu, DeCarlo and Mathur. Besides controlling the test process using failure intensity as the control variable, the CDM model is also used to calculate the estimated total number of remaining defects. With these figures, the manager can decide the number of testing cycles as well as when to stop testing.

Figure 2.6: Normalized results from the application of CDM Model to test process

2.3 Reviews on the Defect Prediction across SDLC and Testing Phase

Many studies have been conducted to predict the number of defects across the Software Development Life Cycle (SDLC). However, few reviews have specifically addressed estimating defects for the System Test phase. Most of them focused on estimating faults or defects for every phase of the SDLC.

From the studies presented, various techniques and approaches have been applied to predict the number of defects across the SDLC. As the basis for estimating defects, the techniques use software size, function points, historical defect data, process-related data, quality-related data or test process data, or are derived from existing models such as the Rayleigh model and Bayesian networks. Mostly, the results are presented in the form of a mathematical equation.

This project is intended to analyze the various approaches and techniques that have been put in place in order to come up with a prediction model for defects, narrowed to defects to be found in the System Test phase of the SDLC. The first step is to identify internal predictors or factors contributing to the defects found in a particular software product. This will involve collecting available software, quality, process and test data. The next step is to analyze the strength of each possible factor and its relationship with the defects detected. From the results, further analysis will be carried out to develop a mathematical model suitable for predicting defects in the testing phase.

2.4 Applications and Issues of Defect Prediction

This section describes the applications and issues of defect prediction. It covers building defect prediction in practice, applications of defect prediction, enhancements to defect prediction and issues in defect prediction.

2.4.1 Building Defect Prediction in Practice

It is crucial to ensure that the process of collecting data for predicting defects is proper and accurate, so that the data used for analysis are correct. Generally, most studies follow these steps or guidelines to arrive statistically at the estimated defects:

1. Identify parameters or factors that have an impact on defect injection in a software product
2. Gather defect data for past projects in terms of total number of defects detected
3. Analyze the correlation patterns between the parameters and the total defects found in past projects
4. Estimate independent parameters for the new project
5. Use Linear Regression to estimate the total number of defects that may get injected based on the estimated independent parameters
6. Calculate the total number of latent defects
7. Calculate the efficiency required by the project
8. Calculate the estimated defect rate for each period using the Rayleigh Distribution
9. Calculate the estimated defect injection rate by phase based on the project schedule
10. Plot the S-shaped curve for the defect detection pattern
11. Compare the Rayleigh curve and actual data to get a quantitative estimate
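
Steps 6, 8 and 10 in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of any cited study: the total of 200 defects, the peak at period 3, the 10-period horizon and both function names are hypothetical, and the curve used is the standard Rayleigh cumulative form.

```python
import math


def rayleigh_cumulative(t, total_defects, t_peak):
    """Cumulative defects expected by time t under a Rayleigh curve
    whose discovery rate peaks at t_peak (standard form, assumed here)."""
    return total_defects * (1.0 - math.exp(-(t ** 2) / (2.0 * t_peak ** 2)))


def defects_per_period(total_defects, t_peak, n_periods):
    """Expected defects found in each of n_periods equal-length periods."""
    return [
        rayleigh_cumulative(k + 1, total_defects, t_peak)
        - rayleigh_cumulative(k, total_defects, t_peak)
        for k in range(n_periods)
    ]


# Hypothetical inputs: 200 total defects (for example from a regression
# estimate), discovery peaking at period 3, tracked over 10 periods.
TOTAL, PEAK, PERIODS = 200, 3, 10
profile = defects_per_period(TOTAL, PEAK, PERIODS)
latent = TOTAL - sum(profile)  # defects the curve places after period 10

cumulative = 0.0
for period, expected in enumerate(profile, start=1):
    cumulative += expected  # running total traces the S-shaped curve
    print(f"period {period:2d}: {expected:6.1f} expected, {cumulative:6.1f} cumulative")
print(f"expected to remain latent after period {PERIODS}: {latent:.1f}")
```

Comparing the per-period values with the defects actually logged in each period gives the quantitative comparison mentioned in step 11.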

Banerjee and Sekhar (2004) presented their own view of the process to be followed in establishing a defect prediction model. Their view is based on using Regression Analysis as the basis for predicting defects, because it is a long-proven and established statistical technique whose goodness of fit can be verified via statistical analysis. The process of developing a prediction model based on Regression Analysis is as follows:

1. Gather data on the given independent variables and the corresponding dependent variables
2. Determine the form of equation to fit by plotting the dependent and independent data sets on a graph such as a scatter plot to show the existence of a statistical relationship
3. Fit an equation, using either simple or multiple regression depending on the number of independent variables
4. Evaluate the fit using statistics such as the Coefficient of Determination (R^2) or the Standard Error of Estimate (SE)
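
The fit-and-evaluate steps above can be sketched for the simple (one-variable) case. The data are hypothetical and the helper names are introduced here only for illustration; the formulas are the usual ordinary least squares estimates, the coefficient of determination, and the standard error of estimate with n - 2 degrees of freedom.

```python
import math


def fit_simple_regression(x, y):
    """Ordinary least squares for y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def evaluate_fit(x, y, b0, b1):
    """Return (R^2, standard error of estimate) for the fitted line."""
    n = len(x)
    mean_y = sum(y) / n
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    # n - 2 degrees of freedom because two parameters were estimated.
    std_error = math.sqrt(ss_res / (n - 2))
    return r_squared, std_error


# Hypothetical data: size in KLOC vs. defects found for six past projects.
kloc = [10, 20, 30, 40, 50, 60]
defects = [25, 41, 58, 83, 95, 122]
b0, b1 = fit_simple_regression(kloc, defects)
r2, se = evaluate_fit(kloc, defects, b0, b1)
print(f"defects = {b0:.2f} + {b1:.3f} * KLOC, R^2 = {r2:.3f}, SE = {se:.2f}")
```

An R^2 close to 1 together with a small SE relative to the typical defect count suggests the chosen form of equation describes the historical data well; it does not by itself show that the model will predict new projects.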

2.4.2 Application of Defect Prediction

Defect prediction is used for various purposes throughout the Software Development Life Cycle (SDLC). A defect prediction model can be used to plan the quality of a software project based on the capability baseline (Banerjee and Sekhar, 2004). This is described in the Process Performance Model, in which the defect prediction model is one of the important contributors. The Process Performance Model predicts effort, number of defects and other related data based on parameters such as schedule and size.

Figure 2.7: Process Performance Model

One of the items in quality planning outlined by the two authors with regard to process performance is to control the number of defects in the User Acceptance Testing (UAT) phase. This starts with predicting the total number of defects using the defect prediction model; the prediction is then adjusted according to project parameters such as customer quality goals, past data from similar projects and the type of development methodology used. Then, the defects are distributed amongst the phases of the software life cycle. Next, these distributed defects are adjusted for three purposes: to distribute defects early in the life cycle so as to achieve zero defects at the acceptance phase, to distribute the remaining defects in other phases as per project scope, and to support the verification and validation strategy, which involves using various types of test strategy to tackle more defects. From the result, the project team should be able to derive several measures such as defects per function point per phase, defects per person-month and review effectiveness. The data will then be recorded and tracked.

Defect prediction is also used to determine the reliability of software, because defect prediction is part of the software reliability model. A software reliability model aims to estimate reliability with respect to the latent defects of software, especially when it is available to customers. The defects estimated across the SDLC provide a basis for describing the probability of the software operating without failure in a given environment within the design range of input (Thangarajan and Biswas, 2000). The Rayleigh Model is chosen as a suitable software reliability model because it predicts the expected value of defect density at different stages of the project life cycle. The equation in the Rayleigh Model is used to predict the number of defects over time. For the duration and magnitude predicted by the Rayleigh Model to be accurate, specific inputs must be selected. Having good inputs to the model allows an accurate forecast for a specified scenario. Three main factors of the model are mentioned in several studies: source lines of code, representing the size required to build the software functionality; productivity index, representing product efficiency and complexity; and peak staffing, representing the human effort required to build and test the software.

Thangarajan and Biswas then explained that the shape of the Rayleigh model curve indicates the defect removal pattern across the entire life cycle. The total number of defects likely to occur in the software being constructed is represented by the area bounded by the x-axis and the curve, as depicted in the figure below:

Figure 2.8: Graphical representation of Rayleigh model parameters

From the above figure, a Probability Density Function (PDF) is produced, which is F(t) = f(K, t_m, t). K denotes the cumulative defect density, t represents the actual time unit and t_m is the time at the peak of the curve.
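
For reference, the Rayleigh curve is commonly written in the form below in the software reliability literature. The source gives only the functional dependence on K, t_m and t, so the explicit expressions are the standard textbook form and an assumption about what the cited figure depicts. The notation assumed here uses t for elapsed time and t_m for the time at which the curve peaks.

```latex
f(t) = K \, \frac{t}{t_m^{2}} \, e^{-t^{2} / (2 t_m^{2})},
\qquad
F(t) = \int_0^t f(s)\, ds = K \left( 1 - e^{-t^{2} / (2 t_m^{2})} \right)
```

In this form f(t) is the defect rate at time t and F(t) is the cumulative number of defects. F(t) approaches K as t grows, which matches the description of the total as the area under the curve, and setting the derivative of f(t) to zero gives its maximum at t = t_m.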

Good software maintenance also depends on a good prediction model. Selecting a good defect prediction model is important for pricing maintenance contracts and insurance (Li, Shaw and Herbsleb, 2003). It also helps in predicting support costs for software, including maintenance staffing. A defect prediction model helps in planning maintenance activities and the timing for resolving reported defects, because a good model should be able to simulate occurrences of similar defects in the field. The essential thing to consider here is the type of operational setting in which the model is applied. The model should be able to work in environments of user-reported defects, widely-used systems, multi-release systems or commercial systems so that suitable maintenance activities can be adopted.

2.4.3 Enhancement to Defect Prediction

One approach to enhancing defect prediction is to use process metrics. Process metrics, or process data, cover the data gathered in and by the problem tracking system and the configuration management system (Kaszycki, 1999). The data can take the form of the number of changes since the last release, the number of faults found since the last release, the number of different developers who turned over, the number of versions of a module since the last release or the number of added features that affected the module. Using process metrics contributes to a more accurate defect prediction model and helps in detecting defects earlier in the development process. The figures below depict the differences between prediction without process metrics and prediction with process metrics.

Figure 2.9: Prediction without process metrics

Figure 2.10: Prediction with process metrics

Another approach to enhancing defect prediction is through an advanced model. This is achieved via phase-level Bayesian Networks (BN) for defect prediction. The objective is to predict defects and defect rates at different periods of a software development project based on the information available at any stage of development and testing (Neil, 2006). This advanced model takes into account several things: how big the software is, how good the development process is, how good the testing process is and the chances of successfully removing defects.

Figure 2.11: High level schematic of whole phase BN

Phase-level BN for defect prediction is very useful for predicting defect introduction, prevention, detection and removal. It also covers a wider scope, from specification through design and development to testing and rework. However, the successful implementation of this advanced technique depends on the capability and maturity of the organization.

2.4.4 Issues in Defect Prediction

In producing and implementing a defect prediction model in actual software development practice, several issues and concerns have been discussed and put forward. In describing the common techniques used to predict defects, Clark and Zubrow (2001) also provided the strengths and weaknesses of each technique. The details are presented in the table below:

Table 2.2: Strength and weakness of defect prediction techniques

| Technique | Strength | Weakness |
|---|---|---|
| Empirical Defect Prediction | Easy to use and understand; can be implemented with minimal data | Requires stable processes and standardized life cycle; does not account for changes in project, personnel or platform |
| Defect Discovery Profile | Predicts defect density by time period, enabling estimation of defects to be found in test | No adjustment mechanism for efficiency of discovery processes to account for changes in product, personnel, platform or project that will impact defect predictions |
| COQUALMO | Predicts defects for three phases; quantifies effects of different discovery techniques on detection and removal of defects | Covers small number of phases; does not predict test or post-deployment defects |
| Orthogonal Defect Classification | Classifications linked to process provide valuable insight; classification takes little time | Requires development of classification scheme; does not account for changes in people, process or product |
| Fault Proneness Evaluation (Size, Complexity, Prior History) | Efficient and effective focus of defect detection activities | “In-process” fault density by module or component may not predict operational fault density; effort may be misdirected |
| Capture/Recapture Analysis | Can be used as soon as data are available | Estimates of number of remaining defects best when stringent assumptions are met |
| Defect Prevention Program | Allows for comparison of defect trends over time to assess impact and ROI for defect prevention activities | Requires sampling of defects and in-depth analysis and participation by engineers to identify root cause |
| Statistical Process Control | Gives indication of inspection and development process performance | Requires stable process and real time data collection and analysis |

Several critiques have been raised with regard to current approaches to defect prediction (Fenton and Neil, 1999). The critiques involve the unknown relationship between defects and failures (1), problems with the multivariate statistical approach (2), problems of using size and complexity metrics as sole predictors of defects (3), problems in statistical methodology and data quality (4), and false claims about software decomposition and the “Goldilock’s Conjecture” (5). Critique 1 arises from the difficulty of determining upfront the importance of defects by classifying them into different classes, and from the variety in how different users use the system, which results in varied operational profiles and difficulty in predicting which defects cause which failures. Critique 2 relates to multivariate techniques, such as factor analysis, that produce metrics which cannot be interpreted directly in terms of program features. Critique 3 arises because such predictions ignore programmers and designers as causal factors: they introduce faulty code, poor design ability leads to complex programs, and complex design leads to inconsistencies between design modules. The issues in critique 4 are caused by lack of attention to the essential assumptions of a particular statistical technique, removal of data points without proper justification and insufficient distinction between model prediction and model fitting. Finally, critique 5 arises because of inaccurate modeling and inference due to the unclear relationship between module size and defect density.

2.4.5 Reviews and Remarks on Application and Issues in Defect Prediction

In order to build a good defect prediction model, it is imperative to follow appropriate steps. The process should start with identifying suitable parameters that influence the introduction of defects in software, before moving on to collecting data on the identified parameters. Then, the collected data should be plotted against the defects found in order to study and establish the statistical relationship between them in the form of an equation. The equation obtained is fitted and evaluated using statistical techniques, and verified against current and future projects until it satisfies the purpose of the model.

From the various reviews conducted, having defect prediction in place helps increase software product quality while saving maintenance cost and effort. Defect prediction also facilitates the distribution of testing resources in line with defect density. Sources of defects can also be identified through predicting defects. Putting defect prediction into practice enables the creation of quantifiable metrics to aid decision making on software product delivery.

Several critiques of existing defect prediction have revealed the strengths and weaknesses of each approach and technique. They also show the capabilities of existing models to cater for different objectives of defect prediction, and thus provide opportunities to improve these models in order to raise the quality and reliability of software.

2.5 Summary of the Proposed Solution

From the above discussion, the author draws several conclusions with regard to a new approach to a defect prediction model for the software testing phase of the SDLC. First, it is important to set a clear objective for what the proposed model needs to achieve when it is implemented in real software development operations, which is to estimate the total number of defects to be discovered in the software testing phase. Getting started with a simple technique will do. Second, identifying and collecting data on appropriate factors that have strong significance for defects, especially historical data, by following proper steps or processes, is essential in defect prediction. Whatever data are available could help in determining a suitable prediction technique, because the historical data may drive the model selection. This leads to the third conclusion: the statistical relationship between the factors and defects must be established when building the model, in order to determine the correlation between those parameters. Instead of just focusing on fixing defects, analysis of defect patterns can be carried out. Fourth, verification of the model, which takes the form of an equation, must be performed to ensure that the model works and suits the internal software production process.

CHAPTER 3

METHODOLOGY

3.1 Introduction

This chapter discusses the research methodology applied towards the establishment of a defect prediction model for the testing phase. As this project is a Six Sigma Green Belt project, the methodology follows the Design for Six Sigma (DfSS) approach, DMADV, which comprises the Define (D), Measure (M), Analyze (A), Design (D) and Verify (V) phases. The chapter then explains the supporting tools used throughout this research for data gathering, data analysis and establishing the proposed model.

3.2 Six Sigma - DMADV Methodology

As mentioned, the research uses the Six Sigma DMADV methodology, which consists of the Define, Measure, Analyze, Design and Verify phases. As this research finishes at the end of May 2009, it has been completed up to the end of the Analyze phase. However, the complete activities of all phases are explained throughout this chapter.

Figure 3.1: DMADV phases

3.2.1 Define Phase

In this phase, business opportunity that leads to the establishment of this research is identified. It involves producing business scorecard drill-down as well as building a tree diagram. These are derived from the organization software production process. This is where the Big Y or business target and detail tree diagram are defined. Then, it is required to develop the team charter and build effective team. Building a team charter involves identifying Project Sponsor, Project Champion, Leader and Team Members.

Table 3.1: Project team

| Project Type | DMADV |
| Sponsor | Mohamed Redzuan Abdullah |
| Champion | Mohamed Redzuan Abdullah |
| Leader | Muhammad Dhiauddin Mohamed Suffian |
| Team Members | V. Veeranjeneya Reddy; Mohd Khairulnizam Md Dahari; Vivek Kumar; Nageswari Kumaran |

Project scheduling is outlined in this phase to determine start date, end date and signoff date by Project Champion for every phase. Then, the activity moves to identifying the customers that are directly related and impacted by the project.

Table 3.2: Customer identification

| Customers | Segments | Priorities |
|---|---|---|
| Software Tester | Test COE | Planning & Benchmarking |
| Software Developers | Software Development | Benchmarking Improvement |

After that, analysis of the voice of the customer is done by identifying the customer need statements, conducting KJ Analysis and performing Kano Analysis. Customer need statements are important for knowing what the identified customers need in order to predict defects and what they will do when the model is in place. From this exercise, the author conducts KJ Analysis to observe the relationships among the customer need statements. A survey form on “What is the contributor for test defect prediction model” is distributed to all test engineers to obtain the total score of their needs. Then, the author performs Kano Analysis to obtain the exact customer requirements and conduct further analysis.

3.2.2 Measure Phase

Customer requirements identified in the Define phase are translated into system or technical requirements. This is done using Quality Function Deployment (QFD), or House of Quality. QFD helps to determine product development characteristics by combining customer needs with technical requirements. After building the QFD, the next activity is Measurement System Analysis (MSA). The MSA done here is for attribute data, since the result of test case execution is only PASS or FAIL. For this MSA, ten (10) test cases with known results of PASS or FAIL are selected. Then, three (3) testers are chosen to execute all test cases in random order three times. The results are used to determine the repeatability of test engineers in finding defects and their capability against the specified standard.
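
The attribute MSA described above can be sketched as a simple agreement calculation. The sketch below is an assumption-laden illustration rather than the study's actual procedure (which used a statistical package): the known verdicts, the tester labels, the error rates and the simulated trials are all hypothetical, and the two functions compute plain percentage agreement only.

```python
import random


def within_appraiser_agreement(trials):
    """Repeatability: share of test cases on which one tester gave the
    same verdict in every trial. `trials` is a list of verdict lists."""
    n_cases = len(trials[0])
    consistent = sum(
        1 for i in range(n_cases) if len({trial[i] for trial in trials}) == 1
    )
    return consistent / n_cases


def agreement_with_standard(trials, standard):
    """Share of test cases on which every trial matched the known result."""
    matched = sum(
        1 for i, expected in enumerate(standard)
        if all(trial[i] == expected for trial in trials)
    )
    return matched / len(standard)


def simulate_trial(standard, error_rate, rng):
    """Stand-in for a real execution: flips each verdict with error_rate."""
    flip = {"PASS": "FAIL", "FAIL": "PASS"}
    return [flip[v] if rng.random() < error_rate else v for v in standard]


# Hypothetical study: 10 test cases with known verdicts, 3 testers, 3 trials.
standard = ["PASS", "FAIL", "PASS", "PASS", "FAIL",
            "PASS", "FAIL", "FAIL", "PASS", "PASS"]
rng = random.Random(2009)
for tester, error_rate in [("Tester A", 0.0), ("Tester B", 0.05), ("Tester C", 0.15)]:
    trials = [simulate_trial(standard, error_rate, rng) for _ in range(3)]
    print(tester,
          f"repeatability={within_appraiser_agreement(trials):.0%}",
          f"agreement with standard={agreement_with_standard(trials, standard):.0%}")
```

In a real study the simulated trials would be replaced by the recorded PASS or FAIL verdicts of each tester, and a tester who is consistent but consistently wrong would show high repeatability with low agreement against the standard.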

After completing the MSA, the next activity is to describe the Operational Definition. It defines the customer needs, followed by a description of each customer need and the unit of measurement to be used for each need. Then, the author outlines the data collection plan, which covers the data that need to be collected with regard to the defined Operational Definition. The plan also identifies the sample size, the sources of data, the time to gather the data, the mechanism for obtaining the data, the persons who will collect the data and the measurement unit for the data being collected.

3.2.3 Analyze Phase

This important phase involves several activities. From the data collected in the Measure phase, the author identifies significant factors that contribute to defect detection in the software testing phase. To do so, the author selects data from a small number of projects for analyzing the factors. The author needs to quantify any issues with the collected data and determine the significant factors by using comparative methods or by quantifying the design relationship. In this case of establishing the defect prediction model, the author performs regression and correlation analysis of the identified factors against the number of defects.
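
The correlation part of this analysis can be sketched as below. The factor names and values are hypothetical placeholders and not the data collected in this research; the function computes the ordinary Pearson correlation coefficient, which is one common way to screen candidate factors before regression.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-project values for three candidate factors.
factors = {
    "requirement pages": [12, 30, 18, 45, 25, 38],
    "test cases": [80, 150, 95, 260, 140, 210],
    "developer experience (years)": [6, 3, 5, 2, 4, 3],
}
# Hypothetical number of defects found in system test for the same projects.
defects = [14, 35, 20, 58, 30, 47]

# Rank the factors by the strength (absolute value) of their correlation.
ranked = sorted(
    ((name, pearson_r(values, defects)) for name, values in factors.items()),
    key=lambda pair: abs(pair[1]),
    reverse=True,
)
for name, r in ranked:
    print(f"{name:30s} r = {r:+.3f}")
```

A strong correlation only flags a factor as a candidate predictor; whether it stays in the model is decided by the regression analysis that follows.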

After completing the regression, the author can observe the strong factors that lead to defect discovery in the testing phase. From the regression, an equation relating the significant factors to the defects is generated using a statistical software tool. Next, the author needs to identify and prevent design failure modes. Failure Mode and Effect Analysis (FMEA) is done to identify potential failure modes, potential effects on customers, potential causes of failures and the probability of failure occurrence with regard to defect discovery. The author then identifies the design alternatives and the conceptual design of the proposed model. This is done using the Pugh Method, in which alternative design concepts are evaluated and compared against the proposed model.
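
A Pugh comparison of this kind can be sketched as a small scoring table. Everything specific in the sketch is hypothetical: the alternative concepts, the criteria, the weights and the ratings are invented for illustration, and the proposed model is taken as the baseline so that it scores zero on every criterion.

```python
def pugh_scores(ratings, weights=None):
    """Sum each concept's ratings relative to the baseline
    (+1 better, 0 same, -1 worse), optionally weighting the criteria."""
    weights = weights or {}
    scores = {}
    for concept, by_criterion in ratings.items():
        scores[concept] = sum(
            weights.get(criterion, 1) * rating
            for criterion, rating in by_criterion.items()
        )
    return scores


# Hypothetical alternatives rated against the proposed model (the baseline).
ratings = {
    "Rayleigh curve only": {"accuracy": -1, "ease of use": 1, "data needed": 1},
    "Bayesian network": {"accuracy": 1, "ease of use": -1, "data needed": -1},
}
# Hypothetical importance of each criterion.
weights = {"accuracy": 2, "ease of use": 1, "data needed": 2}

for concept, score in pugh_scores(ratings, weights).items():
    verdict = "better than" if score > 0 else "not better than"
    print(f"{concept}: {score:+d} ({verdict} the baseline on these criteria)")
```

A positive total suggests an alternative is worth a closer look, while a zero or negative total supports keeping the baseline concept.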

3.2.4 Design Phase

As this research focuses on establishing a model and not a product, several activities in the Design phase are skipped. The outcome of the Analyze phase is used to perform tolerance analysis. The predicted defects from the model are compared with the actual defects found, and the tolerance from that comparison is recorded and analyzed. It has been set that the actual defects found must be within 10% below or above the predicted defects. Then, the performance of the proposed model is evaluated using a scorecard.
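
The tolerance rule above amounts to a simple range check, sketched below. The project names and defect counts are hypothetical, and the function names are introduced here for illustration only; the 10% band is the one stated in the text.

```python
def within_tolerance(predicted, actual, tolerance=0.10):
    """True if actual defects fall within +/- tolerance of the prediction."""
    lower = predicted * (1 - tolerance)
    upper = predicted * (1 + tolerance)
    return lower <= actual <= upper


def deviation(predicted, actual):
    """Signed relative deviation of actual from predicted defects."""
    return (actual - predicted) / predicted


# Hypothetical (predicted, actual) defect counts for three projects.
results = {
    "Project A": (120, 127),
    "Project B": (80, 68),
    "Project C": (200, 195),
}
for name, (predicted, actual) in results.items():
    status = "within" if within_tolerance(predicted, actual) else "outside"
    print(f"{name}: predicted {predicted}, actual {actual}, "
          f"deviation {deviation(predicted, actual):+.1%} ({status} tolerance)")
```

Recording the signed deviation, and not only the pass or fail outcome, shows whether the model tends to over-predict or under-predict across projects.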

3.2.5 Verify Phase

In this last phase, the reliability of the proposed model is assessed using statistical methods. Then, the author needs to perform capability flow-up and scorecard to ensure customer requirements are met. If needed, the FMEA will be updated. Finally, a transition plan will be prepared to ensure that the proposed defect prediction model for test is implemented and incorporated into the software production process. Final sign-off will be obtained from the relevant parties, including the Project Sponsor, Project Champion, Leader and Process Owner.

3.3 Supporting Tools

Throughout the research on establishing the defect prediction model for the testing phase, several software tools or applications are used. They serve as the basis for gathering related data, conducting surveys, building graphs and performing regression. The following tools are used throughout the research:

i) Rational ClearCase – software for acquiring the Test Summary Report that contains the list of defects for particular projects
ii) Rational ClearQuest – software for obtaining defect data for particular projects
iii) Microsoft Excel – software for recording survey results and MSA results
iv) Microsoft PowerPoint – software for documenting the results of each phase in slide format
v) Minitab 15 – statistical software tool for building related graphs and performing regression
vi) MyMetrics – centralized repository of software quality metrics

CHAPTER 4

PROJECT DISCUSSION

4.1 Introduction

This chapter discusses the results and outcome of each phase. For the Define phase, it explains the MIMOS software production process, the schematic diagrams, the team charter, the customer need statements, the KJ Analysis of customer needs and the Kano Analysis. The discussion of the Measure phase outcome covers the House of Quality for the Defect Prediction Model, the MSA results, the Operational Definition and the Data Collection Plan. Then, it describes the outcome of the Analyze phase, including the data for regression, the regression analysis, the FMEA result and the Pugh Method result.

4.2 Findings of Define Phase

4.2.1 MIMOS Software Production Process

As presented in Figure 4.1, the testing team is involved in all review sessions for each phase, from planning until the end of the system testing phase, throughout the software production process. Test engineers are involved in reviewing the planning document, requirement analysis document, design document, test planning document and test cases. The software production process is governed by project management, quality management, configuration and change management, integral and support as well as process improvement initiatives, namely CMMI. From Figure 4.1, the area of study is the functional or system test phase. In order to perform further analysis and establish a defect prediction model for the system test phase, faults and errors captured in the phases prior to the testing phase must be considered and investigated.

Figure 4.1: MIMOS software production process

4.2.2 Schematic Diagram

Two (2) schematic diagrams have been produced: a high level schematic diagram and a detail schematic diagram. The high level schematic diagram establishes the Big Y or business target, the little Ys, the vital Xs and the goal statement against the business scorecard. In this research, the Big Y is to produce software with zero known post-release defects. The little Ys, the elements that contribute to achieving the Big Y, are defect containment in the test phase, customer satisfaction, quality of the process imposed to produce the software, and project management. Of the little Ys, the testing team is clearly responsible for ensuring defect containment in the test phase. Two (2) aspects relate to this little Y: the potential number of defects before the test phase, which is the research interest, and the number of defects after completing the test phase. The goal statement for this research is “To achieve and implement Defect Prediction Model for Test in Test Centre of Excellence by 30th May 2009”. This is presented in Figure 4.2 below:

Figure 4.2: Schematic diagram

Going into the detail schematic diagram, from the vital X, which is the potential number of defects before test, the possible factors that contribute to defect prediction are defined and summarized in a Y to X tree diagram. The author defines seven (7) main factors associated with defect prediction: software complexity, developer, tester, test process, fault, historical defects and project. Software complexity could take the form of requirements, programming language used or code size. The developer factor involves the knowledge developers have in developing the software. The knowledge of testers in testing the software product is also considered. The test process factor includes test case design coverage, test case execution productivity, test tool used and test strategy applied. The fault factor comprises requirement faults, design faults, code and unit testing (CUT) faults, integration faults and test case faults. The historical defect factor consists of defect severity, defect category and defect validity, while the project factor involves the type of project domain and the project thread, that is, whether it is application-based or component-based software. These descriptions are exhibited in Figure 4.3 below:

Figure 4.3: Detail schematic – Y to X tree diagram

4.2.3 Team Charter

In the team charter, the author defines the business case, opportunity statement, specific goal statement, project scope, and the in scope and out of scope items for the project. The business case explains why defect prediction is needed and how it can improve the business. The opportunity statement outlines the customer of the project and the potential volume and market share for the project. The specific goal statement is the same as the one defined in the high level schematic diagram. The project scope defines the application of Design for Six Sigma (DfSS) using DMADV in the testing phase of the Software Development Life Cycle. The in scope and out of scope section details the location of the project and the business that is and is not related to the project.

Figure 4.4: Team charter

4.2.4 Customer Need Statement

The customer need statement involves two main things. First, the author identifies customer needs in terms of what is required, or which factors could help, in predicting the total number of defects in the testing phase. Second, the author observes what the customer of the project will do once the defect prediction model is established and incorporated in the process. This is explained in Figure 4.5.

Figure 4.5: Customer need statement

4.2.5 KJ Analysis and Kano Analysis

From the customer need statement, the author establishes the relationship between the listed customer needs. For this purpose, the author prepares a survey form and distributes it to all test engineers. The survey collects scores from all test engineers on the key contributors to a test defect prediction model. The scores are assigned to the related customer needs. Using KJ Analysis, the author establishes the relationship between the customer needs and observes their relative significance. The outcomes of KJ Analysis are presented below in Figure 4.6 and Figure 4.7.

Figure 4.6: 1st level of KJ analysis

Figure 4.7: 2nd level of KJ analysis

After completing KJ Analysis, the author performs Kano Analysis, which determines the Kano identifier of each customer need. Since the project focuses on establishing a defect prediction model for the testing phase, the customer need considered is “Estimated total number of defects to be discovered per project”. Its Kano identifier is “Must Be”, and it is carried forward for further analysis. Figure 4.8 shows the Kano Analysis.

Figure 4.8: Kano analysis

4.3 Findings of Measure Phase

4.3.1 House of Quality

The customer needs are translated into technical requirements using Quality Function Deployment (QFD). QFD determines the model characteristics by combining customer needs with technical requirements. It consists of customer requirements, direction of goodness, system or technical requirements, competitive analysis, importance, technical analysis and the relationship matrix. From the QFD, the author observed that project name, Problem Report number, submission date of defect, fault and in-process fault are strong factors for defect prediction.

Project scheduling is outlined in this phase to determine the start date, end date and sign-off date by the Project Champion for every phase. Then, the activity moves to identifying the customers that are directly related to and impacted by the project.

Figure 4.9: House of Quality for defect prediction model

4.3.2 Measurement System Analysis

The Measurement System Analysis (MSA) conducted is an attribute MSA, since the result of test case execution is either PASS or FAIL. The MSA begins with identifying ten (10) test cases with known results of PASS or FAIL. Then, three (3) test engineers are selected to execute the test cases in random order. This is repeated three (3) times for every engineer. The results are recorded in Microsoft Excel as below:

Figure 4.10: Test case experiment result

The results above are then transferred to Minitab for attribute agreement analysis. The assessment evaluates the agreement within appraisers, of each appraiser against the standard and of all appraisers against the standard. For attribute agreement within appraisers, the MSA result is PASS since it shows 100% assessment agreement and a Kappa value of 1, which demonstrates perfect agreement. This indicates strong repeatability of test results within each tester. For attribute agreement of each appraiser against the standard, the result is also PASS since the Kappa value is more than 0.7. This demonstrates acceptable accuracy against the standard. Finally, for the MSA of all appraisers against the standard, the result is PASS. Overall, the MSA conducted is therefore PASS, with a Kappa value of more than 0.7. These results are shown in the following figures.
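To illustrate the agreement statistic discussed here, the following is a minimal sketch of Cohen's kappa for one appraiser's trial against the standard. The ratings are hypothetical and are not the data in Figure 4.10. Minitab's attribute agreement analysis may report a different kappa variant (for example Fleiss' kappa), so the sketch only shows how a chance-corrected agreement value can be compared with the 0.7 threshold.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equal-length sequences of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("rating sequences must be non-empty and equal in length")
    n = len(ratings_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    categories = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical data: known results for ten test cases and one appraiser's trial.
standard = ["PASS", "PASS", "FAIL", "PASS", "FAIL",
            "PASS", "FAIL", "PASS", "PASS", "FAIL"]
appraiser = ["PASS", "PASS", "FAIL", "PASS", "FAIL",
             "PASS", "PASS", "PASS", "PASS", "FAIL"]

kappa = cohen_kappa(appraiser, standard)
# Acceptance threshold of 0.7 as used in this MSA.
msa_result = "PASS" if kappa > 0.7 else "FAIL"
print(f"kappa = {kappa:.3f}, MSA result = {msa_result}")
```

With one disagreement in ten test cases, the appraiser agrees with the standard 90% of the time, but kappa is lower than 0.9 because part of that agreement would be expected by chance.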

Figure 4.11: Assessment agreement

Figure 4.12: Assessment agreement for within appraiser

Figure 4.13: Assessment agreement for each appraiser against standard

Figure 4.14: Assessment agreement for all appraisers against standard

4.3.3 Operational Definition and Data Collection Plan

The operational definition describes the types of data that need to be collected, the definition of each data item and the unit of measurement used for each. The operational definition prepared is summarized below:

Figure 4.15: Operational definition

From the operational definition, a plan has been established to determine when to collect the data from the respective sources. The data collection plan consists of the data to be collected as specified in the operational definition, a description of each data item, sample size, sources of data, time to collect the data, methods to extract the data, the person responsible for extracting the data and the unit of measurement used for every data item extracted. The plan is presented in the following figure:

Figure 4.16: Data collection plan

4.4 Findings of Analyze Phase

4.4.1 Regression

To perform the regression correctly, the right data must be obtained. Below is the data collected and used to perform the regression analysis.

Figure 4.17: Data for regression

Using the above data, the author performs the regression in Minitab, choosing the multiple regression option. The factors considered are faults in requirement, faults in design, faults in CUT, total faults, in-process fault (faults divided by code size in KLOC) and code size itself. Defects in this case are those raised as functional defects. Non-functional defects, such as usability or performance defects, are not included in the regression. The regression result is presented below:

Figure 4.18: Regression result

From the regression result, total faults is removed from the equation because it is highly correlated with the other factors. With an R-Square value of 80.2%, the model equation to predict the defects is summarized as:

Defect = -1.27 - 0.025 Requirement - 0.026 Design + 0.320 CUT + 0.207 IPF + 0.604 KLOC

KLOC and CUT are also observed to be the strong factors in predicting defects, with P-values of 0.009 and 0.091 respectively. However, all factors are retained to avoid bias in establishing the defect prediction model. This regression result will be used to complete the remaining two phases: Design and Verify.
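As a worked illustration, the model equation can be applied as in the minimal sketch below. Two points are assumptions rather than statements from the regression output: the coefficient 0.207 is taken to belong to in-process fault (IPF), the only remaining factor without a named term, and total faults is taken to be the sum of requirement, design and CUT faults. The function names and input values are hypothetical.

```python
def in_process_fault(total_faults, kloc):
    """In-process fault: faults divided by code size in KLOC."""
    if kloc <= 0:
        raise ValueError("code size in KLOC must be positive")
    return total_faults / kloc


def predict_defects(requirement_faults, design_faults, cut_faults, kloc):
    """Predicted functional defects in the testing phase from the model equation."""
    # Assumption: total faults is the sum of the three pre-test fault counts.
    total_faults = requirement_faults + design_faults + cut_faults
    ipf = in_process_fault(total_faults, kloc)
    return (
        -1.27
        - 0.025 * requirement_faults
        - 0.026 * design_faults
        + 0.320 * cut_faults
        + 0.207 * ipf  # assumption: 0.207 is the in-process fault coefficient
        + 0.604 * kloc
    )


# Hypothetical project: 10 requirement faults, 8 design faults,
# 20 CUT faults and 30 KLOC of code.
predicted = predict_defects(10, 8, 20, 30.0)
print(f"Predicted defects: {predicted:.1f}")
```

In this sketch the KLOC and CUT terms dominate the prediction, which is in line with their P-values in the regression result.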

CHAPTER 5

CONCLUSION

5.1 Achievements

Although the research is not yet complete, since the specified end date is the end of May 2009, several beneficial achievements have been observed with regard to establishing a defect prediction model for the testing phase. Towards the end of the project, the objectives outlined beforehand have been achieved. The mathematical equation generated from the regression analysis has demonstrated that a defect prediction model can be constructed from the identified factors. From the model equation, the author is able to identify the strong factors that contribute to the number of defects in the testing phase. In addition, the author realized that other important factors also need to be considered and incorporated, since they too are significant in predicting defects in the testing phase.

Moreover, throughout the research, the author has been able to demonstrate the success of the Six Sigma methodology in building a defect prediction model for the testing phase. Each phase of the Six Sigma approach has allowed the research to be conducted in a structured and systematic way, with proper planning and analysis for every deliverable. The Design for Six Sigma (DfSS) methodology gives the author the opportunity to clearly determine what needs to be achieved from the research, the issues to be addressed, the data to be collected, what needs to be measured and how the model is generated and constructed.

Technically, in building the defect prediction model, it is observed that many factors contribute to defect discovery in the testing phase. Faults in requirement, design and coding, as well as in-process faults, each have their own relationship with defects. Code size, in kilo lines of code, also affects the number of defects found in the testing phase. By extracting the correct data from the right sources, the author is able to conduct a proper and detailed analysis of the identified factors and, at the same time, show that all factors must be considered in predicting defects for the testing phase. In addition, while performing the measurement analysis and regression, the author has been exposed to Minitab, a powerful statistical tool. This has given the author deeper knowledge of statistics and of how important statistics is in improving the internal process.

As outlined in the business case of the team charter, this research has demonstrated the importance of a defect prediction model in improving the internal software production process, specifically the testing process. Although the project is ongoing until the end of May 2009, the research shows that a defect prediction model contributes strongly to zero known post-release defects for a particular software product, since testing is the last gate in the process before the software can be declared fit for release and use. Test engineers aim to discover as many defects as possible to ensure that all defects are contained within the testing phase and do not escape to the end-user. Additionally, having a predicted number of defects allows better utilization of test engineers on a project, by allocating an appropriate number of testers to test the software. A better test strategy and wider test coverage can also be implemented when the predicted number of defects is known. This is achievable in practice since every test engineer will be aware of the number of defects they can expect to discover. A tolerance of 10% below or 10% above the estimated defects for the actual defects found can serve as their guide in testing the software product. Indirectly, having an estimated number of defects in the testing phase supports the initiatives of the whole software development process, especially in ensuring stability of the development effort in releasing a software product.
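The 10% tolerance can be expressed as a simple check, sketched below under the assumption that the band is taken around the estimated number of defects. The function name and the sample figures are hypothetical.

```python
def within_tolerance(actual_defects, estimated_defects, tolerance=0.10):
    """Return True if actual defects lie within +/- tolerance of the estimate."""
    if estimated_defects <= 0:
        raise ValueError("estimated defects must be positive")
    lower = estimated_defects * (1.0 - tolerance)
    upper = estimated_defects * (1.0 + tolerance)
    return lower <= actual_defects <= upper


# Hypothetical example: 23 defects estimated, so the band is 20.7 to 25.3.
estimated = 23.0
for actual in (20, 22, 27):
    status = "within" if within_tolerance(actual, estimated) else "outside"
    print(f"{actual} actual defects: {status} tolerance")
```

An actual count below the band may suggest that testing has not yet found the expected defects, while a count above it may suggest that the estimate or its input data should be reviewed.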

Furthermore, this project shows the importance of effective communication between the team members as well as the related parties involved in gathering the software quality metrics. The author has successfully delegated related tasks to the respective team members and ensured that they were completed. This can be seen in the data gathering, measurement system analysis and identification of customer needs. Besides that, effective communication is also applied when acquiring data on software quality metrics from the MyMetrics application. This is crucial to ensure the data acquired are correct and reliable.

5.2 Constraints and Challenges

Throughout the research period, the author has faced several constraints and challenges. However, those challenges have been tackled to ensure the success of the research effort up to the end of the Analyze phase. The first challenge arose when the author needed to collect the historical defect data of the selected projects. There are two sources for the data: the Test Summary Report and Rational Clear Quest. The author needs to extract only valid defect data from Rational Clear Quest, which means that defects with rejected status and defects raised outside the testing phase are not considered. The author needs to go through the defect data one by one for each selected project, with the assistance of the query provided in the system. Next, the author also needs to check that the defect data extracted from the system tally with those reported in the Test Summary Report. However, sometimes the data from the two sources do not match. In that case, the author has to verify with the respective test lead for that particular project to get the correct results.
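The filtering and tallying described above can be sketched as follows. The record fields (`status`, `phase_raised`), their values and the sample data are hypothetical, since the actual export format of the defect tracking system is not described here.

```python
def valid_defects(records):
    """Keep defects that are not rejected and were raised in the testing phase."""
    return [
        r for r in records
        if r["status"] != "Rejected" and r["phase_raised"] == "Testing"
    ]


def tallies_with_report(records, reported_count):
    """Compare the count of valid extracted defects with the reported count."""
    extracted_count = len(valid_defects(records))
    return extracted_count == reported_count, extracted_count


# Hypothetical defect records for one project.
records = [
    {"id": "PR-001", "status": "Closed", "phase_raised": "Testing"},
    {"id": "PR-002", "status": "Rejected", "phase_raised": "Testing"},
    {"id": "PR-003", "status": "Closed", "phase_raised": "Design"},
    {"id": "PR-004", "status": "Open", "phase_raised": "Testing"},
]

# Hypothetical count taken from the Test Summary Report.
matches, extracted = tallies_with_report(records, reported_count=3)
if not matches:
    print(f"Mismatch: {extracted} valid defects extracted, 3 reported; "
          "verify with the test lead")
```

A mismatch such as this one is the case in which the figures would be verified with the test lead.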

Another challenge was faced during the measurement system analysis of the test case results. The MSA had to be conducted twice due to a FAIL result in the first MSA activity. The author had to identify why inconsistency occurred in executing the random test cases, which led to test case results that differed from the standard result. The next constraint is the difficulty in obtaining data on software quality metrics. Full access to the MyMetrics system was not given, so less quality data could be extracted for further analysis. The author had to wait for quality engineers to provide the data, which sometimes delayed the schedule.

One more challenge arose when conducting regression analysis on the extracted data. The first round of regression used data containing outliers, due to a larger number of requirement faults recorded. The regression result looked promising, but since the data contained outliers, it could not be considered the best model. A second round of regression was done and its result also looks promising. To adopt this latest equation, the author has to get consensus from the Project Champion before proceeding with the next phases.

5.3 Recommendation

To date, analysis of the proposed defect prediction model is still in progress and will continue until its completion at the end of May 2009. However, from the research effort since the start of this project, the author already observes improvements and recommendations that could be made. The first recommendation is to consider more factors beyond those currently identified. The next research effort can focus on analysing other factors in detail. In the current effort, the author only considers code size, in-process fault (IPF) and faults found in the phases prior to the testing phase. Moving forward, the author can consider test case fault, test case coverage, test case productivity and defect severity as other factors that lead to defect discovery in the testing phase. Other than that, as the current research focuses on predicting the total number of defects regardless of severity or duration of testing activities, future effort can focus on improving the model to predict defect severity in the testing phase. For example, the model could predict how many critical or major defects can be found in the testing phase. The model could also predict the number of defects found over time until the end of test execution activities.

Another recommendation is to incorporate this defect prediction model with other established software reliability models, such as the Musa model or Shooman’s model. This could help enhance the confidence in and reliability of the software released to the customer or end-user. Finally, this model can be improved by splitting it to accommodate different project threads. The current model serves as a generalized model that governs the prediction of defects for all project threads. In the future, specific defect prediction models can be constructed for different project threads, meaning that there would be one model for application-based projects and another for component-based projects.

REFERENCES

1. Clark, B. and Zubrow, D. (2001). How Good is the Software: A Review of Defect Prediction Techniques. Software Engineering Symposium. Carnegie Mellon University.

2. Fenton, N.E. and Neil, M. (1999). A Critique of Software Defect Prediction Models. IEEE Transactions on Software Engineering. Volume 25, No. 5.

3. Grottke, M. and Dussa-Zieger, K. (2001). Prediction of Software Failures Based on Systematic Testing. Ninth European Conference on Software Testing Analysis and Review. Stockholm.

4. Mohanty, B. and Mohapatra, S. (2001). Defect Prevention Through Defect Prediction: A Case Study at Infosys. Proceedings of IEEE International Conference on Software Maintenance.

5. Nayak, V. and Naidya, D. (2003). Defect Estimation Strategies. Patni Computer Systems Limited. Mumbai.

6. Neuendorf, S. (2004). Prediction of Software Defects. SASQAG 2004.

7. Ostrand, T.J. and Weyuker, E.J. (2007). How to Measure Success of Fault Prediction Models. SOQUA ’07. 25-30.

8. Ostrand, T.J., Weyuker, E.J., Bell, R.M. and Ostrand, R.C. (2005). A Different View of Fault Prediction. Proceedings of the 29th Annual International Computer Software and Applications Conference (COMPSAC ’05).

9. Rana, Z.A., Shamail, S. and Awais, M.M. (2008). Towards a Generic Model for Software Quality Prediction. WoSQ ’08. Leipzig.

10. Thangarajan, M. and Biswas, B. (2002). Software Reliability Prediction Model. Tata Elxsi Whitepaper.

APPENDICES

SURVEY: DEFECT PREDICTION MODEL FOR TEST

Name: ____________________________________________________________

From your point of view, what is the key contributor for test defect prediction model?

(Please rank from most important to least important)

Requirements for Software

Programming Language Used

Software Size/Code Size (KLOC)

Errors/Mistakes Captured in Phase Prior to Testing

Historical data of defects logged (Historical PRs)

Others (Please identify): ___________________________________________

Thank you
