

WWW.IJITECH.ORG
ISSN 2321-8665
International Journal of Innovative Technologies
Vol.04, Issue.10, August-2016, Pages: 1737-1739
Copyright @ 2016 IJIT. All rights reserved.

Bug Triage with Bug Data Reduction
DR. G. PRAKASH BABU 1, BHAVANA REDDY 2
1 Professor, Dept of CSE, Intell Engineering College, Anantapur, AP, India, E-mail: [email protected].
2 PG Scholar, Dept of CSE, Intell Engineering College, Anantapur, AP, India, E-mail: [email protected].

Abstract: Bug triage is the step of the bug-fixing process that aims to assign the right developer to a new bug. Software companies spend a large share of their cost dealing with bugs. To reduce the time and cost of bug triage, we present an automatic approach that predicts a developer with relevant experience to resolve an incoming bug report. The proposed approach applies data reduction to the bug data set, which reduces the scale of the data and improves its quality. We apply instance selection and feature selection together with historical bug data. We also add a new module that reports the status of each bug: whether it has been assigned to a developer and whether it has been rectified.

Keywords: Bug, Bug Triage, Data Reduction, Instance Selection, Data Mining.

I. INTRODUCTION

A bug repository plays an important role in managing software bugs. Many open source software projects have an open bug repository that allows both developers and users to submit defects or issues in the software, suggest possible enhancements, and comment on existing bug reports. For large-scale open source projects, the number of bugs reported daily is so large that triage becomes difficult. Software companies spend over 45 percent of their cost on fixing bugs. Two challenges related to bug data may affect the effective use of bug repositories in software development tasks: the large scale and the low quality of the data. In a bug repository, a bug is maintained as a bug report, which records the textual description of how to reproduce the bug and is updated according to the status of bug fixing.

The primary contribution of this paper is as follows: we use feature selection and instance selection together with historical data to reduce the bug data in a bug repository, so that the resulting data is both smaller in scale and higher in quality. We also add a graph module for representing bug reports. Section II describes the architecture of the proposed system. Section III gives the details of instance selection, feature selection, the graph module, and the use of historical data, and Section IV concludes the paper.

A. Existing System

We review existing work on modeling bug data, bug triage, and the quality of bug data with defect prediction.

Modeling Bug Data: To investigate the relationships in bug data, Sandusky et al. form a bug report network to examine the dependency among bug reports. Besides studying relationships among bug reports, Hong et al. build a developer social network to examine the collaboration among developers based on the bug data in the Mozilla project. This developer social network helps in understanding the developer community and the project evolution. By mapping bug priorities to developers, Xuan et al. identify the developer prioritization in open source bug repositories. The developer prioritization can distinguish developers and assist tasks in software maintenance.

Bug Triage: Bug triage aims to assign an appropriate developer to fix a new bug, i.e., to determine who should fix a bug. Čubranić and Murphy first proposed the problem of automatic bug triage to reduce the cost of manual bug triage. They apply text classification techniques to predict related developers. Anvik et al. examine multiple techniques on bug triage, including data preparation and typical classifiers.
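As an illustration of treating bug triage as text classification, the following minimal sketch trains a naive Bayes classifier on the text of fixed bug reports and predicts a developer for a new report. The choice of naive Bayes, the function names, and the sample reports are assumptions made for this example; the text above does not prescribe a particular classifier here.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(reports):
    """reports: list of (text, developer) pairs taken from fixed bugs."""
    word_counts = defaultdict(Counter)  # developer -> word frequencies
    doc_counts = Counter()              # developer -> number of fixed reports
    vocabulary = set()
    for text, developer in reports:
        words = text.lower().split()
        word_counts[developer].update(words)
        doc_counts[developer] += 1
        vocabulary.update(words)
    return word_counts, doc_counts, vocabulary


def predict_developer(model, text):
    """Return the developer with the highest posterior log-probability."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_developer, best_score = None, float("-inf")
    for developer in doc_counts:
        score = math.log(doc_counts[developer] / total_docs)
        total_words = sum(word_counts[developer].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Laplace smoothing avoids zero probabilities.
            score += math.log((word_counts[developer][word] + 1)
                              / (total_words + len(vocabulary)))
        if score > best_score:
            best_developer, best_score = developer, score
    return best_developer


# Hypothetical history of fixed bug reports.
history = [
    ("crash when opening settings dialog", "alice"),
    ("settings dialog layout broken", "alice"),
    ("network timeout during sync", "bob"),
    ("sync fails on slow network", "bob"),
]
model = train_naive_bayes(history)
predicted = predict_developer(model, "sync timeout on network")
print(predicted)
```

In this toy history the new report shares its vocabulary with the reports fixed by the second developer, so that developer is the one suggested.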

B. Proposed System

The primary contributions of this paper are as follows:

We present the problem of data reduction for bug triage. This problem aims to augment the data set of bug triage in two aspects, namely a) to simultaneously reduce the scales of the bug dimension and the word dimension and b) to improve the accuracy of bug triage.

We propose a combination approach to addressing the problem of data reduction. This can be viewed as an application of instance selection and feature selection in bug repositories.

We build a binary classifier to predict the order of applying instance selection and feature selection. To our knowledge, the order of applying instance selection and feature selection has not been investigated in related domains.

This paper is an extension of our previous work. In this extension, we add new attributes extracted from bug data sets, prediction for reduction orders, and experiments on four instance selection algorithms, four feature selection algorithms, and their combinations.

In this paper, we address the problem of data reduction for bug triage, i.e., how to reduce the bug data to save the labor cost of developers and improve the quality to facilitate the process of bug triage. Data reduction for bug triage aims to build a small-scale and high-quality set of bug data by removing bug reports and words that are redundant or non-informative. In our work, we combine existing techniques of instance selection and feature selection to simultaneously reduce the bug dimension and the word dimension. The reduced bug data contain fewer bug reports and fewer words than the original bug data and provide similar information. We evaluate the reduced bug data according to two criteria: the scale of the data set and the accuracy of bug triage. To avoid the bias of a single algorithm, we empirically examine the results of four instance selection algorithms and four feature selection algorithms.
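The effect of combining the two reductions, and of the order in which they are applied, can be sketched as follows. This is only an illustration under simple assumptions: feature selection keeps the most frequent words, and instance selection drops empty or duplicate reports. These stand-ins, the function names, and the sample data are hypothetical and are not the four algorithms of each kind examined in the paper.

```python
from collections import Counter


def select_features(reports, keep_words):
    """Keep the `keep_words` most frequent words (by document frequency)."""
    doc_freq = Counter(word for words, _ in reports for word in set(words))
    # Break frequency ties alphabetically so the result is deterministic.
    ranked = sorted(doc_freq, key=lambda w: (-doc_freq[w], w))
    kept = set(ranked[:keep_words])
    return [([w for w in words if w in kept], dev) for words, dev in reports]


def select_instances(reports):
    """Drop empty reports and reports that duplicate an earlier one."""
    seen, selected = set(), []
    for words, dev in reports:
        key = (frozenset(words), dev)
        if words and key not in seen:
            seen.add(key)
            selected.append((words, dev))
    return selected


def reduce_bug_data(reports, keep_words, order="FS->IS"):
    """Apply both reductions in the requested order."""
    if order == "FS->IS":
        return select_instances(select_features(reports, keep_words))
    return select_features(select_instances(reports), keep_words)


def scale(reports):
    """Return (number of bug reports, number of distinct words)."""
    return len(reports), len({w for words, _ in reports for w in words})


# Hypothetical bug data: (words of the report, developer who fixed it).
raw = [
    ("crash on startup".split(), "alice"),
    ("crash on startup again".split(), "alice"),
    ("sync timeout on startup".split(), "bob"),
    ("sync timeout".split(), "bob"),
]
fs_then_is = reduce_bug_data(raw, keep_words=5, order="FS->IS")
is_then_fs = reduce_bug_data(raw, keep_words=5, order="IS->FS")
print(scale(raw), scale(fs_then_is), scale(is_then_fs))
```

On this toy data the two orders produce reduced sets of different sizes, because removing a word first can turn two reports into duplicates. This is the kind of difference that motivates predicting the order of reduction.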

II. ARCHITECTURE

A. Bug Triage

The aim of bug triage is to assign a developer to fix a bug. Once a developer is assigned to a new bug report, he or she tries to fix the bug and then reports its status, i.e., whether or not it has been rectified [1].

Fig.1.

B. Data Reduction

We reduce the bug data by applying instance selection and feature selection, so that the resulting data is both smaller in scale and higher in quality.

Fig.2.

III. INSTANCE SELECTION

Instance selection methods are associated with data mining tasks such as classification and clustering.

Data mining is a nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.

Instance selection chooses a subset of the data that achieves the original purpose of a data mining application as if the whole data set were used.

The ideal outcome of instance selection is model independent.

(1)
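One way to picture instance selection is the condensed nearest neighbor rule, a classic instance selection technique. The sketch below keeps only the reports needed for a 1-nearest-neighbor classifier to label every original report correctly. The choice of this rule, the Jaccard distance on word sets, and the sample reports are assumptions made for illustration, not necessarily what the paper uses.

```python
def jaccard_distance(a, b):
    """Distance between two word sets: 0 for identical, 1 for disjoint."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def nearest_label(store, words):
    """Developer of the stored report closest to `words`."""
    return min(store, key=lambda item: jaccard_distance(item[0], words))[1]


def condensed_nearest_neighbor(reports):
    """Grow a subset until 1-NN on the subset labels every report correctly."""
    store = [reports[0]]
    changed = True
    while changed:
        changed = False
        for words, dev in reports:
            if (words, dev) not in store and nearest_label(store, words) != dev:
                store.append((words, dev))  # misclassified, so it is needed
                changed = True
    return store


# Hypothetical bug reports as (word set, developer) pairs.
reports = [
    ({"crash", "startup"}, "alice"),
    ({"crash", "startup", "again"}, "alice"),
    ({"sync", "timeout"}, "bob"),
    ({"sync", "timeout", "network"}, "bob"),
]
selected = condensed_nearest_neighbor(reports)
print(len(reports), "->", len(selected))
```

Here one report per developer is enough to classify the other two, so half of the instances can be dropped without changing the 1-NN predictions on this data.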

A. Evaluation Measures

Direct Measure: Keep as much resemblance as possible between the selected data and the original data. Examples: entropy, moments, and histograms.

Indirect Measure: Judge the selected data by its effect on a data mining task. For example, a classifier can be used to check whether instance selection results in better, equal, or worse predictive accuracy. Conventional evaluation methods in sampling, classification, and clustering can also be used to assess the performance of instance selection. Examples: precision and recall.
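Both kinds of measure can be computed in a few lines. The sketch below uses the entropy of the developer labels as a direct measure and per-developer precision and recall as an indirect measure. The function names and the sample labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def precision_recall(actual, predicted, developer):
    """Precision and recall of the predictions for one developer."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == developer and p == developer for a, p in pairs)
    fp = sum(a != developer and p == developer for a, p in pairs)
    fn = sum(a == developer and p != developer for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Direct measure: compare label entropy before and after selection.
original_labels = ["alice", "alice", "bob", "bob"]
selected_labels = ["alice", "bob"]
entropy_gap = abs(entropy(original_labels) - entropy(selected_labels))

# Indirect measure: score a classifier trained on the selected data.
actual = ["alice", "alice", "bob", "bob"]
predicted = ["alice", "bob", "bob", "bob"]
precision, recall = precision_recall(actual, predicted, "bob")
print(entropy_gap, precision, recall)
```

A small entropy gap suggests that the selected data preserves the class distribution of the original data, while precision and recall show whether triage quality survives the reduction.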

B. Feature Selection

Feature selection selects a minimum set of features such that the probability distribution of the different classes, given the values of those features, is as close as possible to the original distribution given the values of all features [1]. With fewer features, the mined patterns are fewer and easier to understand. The goal is to use the smallest representation that is enough to solve the task.

Heuristic Methods:

Step-wise forward selection

Step-wise backward elimination

New attributes can also be created that capture the important information in a data set much more efficiently than the original attributes. General methodologies include:

Feature extraction (domain-specific)

Mapping data to a new space (see: data reduction)

Fig.3. Instance selection in Mozilla and feature selection in Mozilla.
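Step-wise forward selection can be sketched as a greedy loop that starts from an empty feature set and adds the single word that most improves a score, stopping when no word helps. The consistency score used below (how many reports agree with the majority developer among reports that look identical under the chosen words) is an assumption chosen for brevity, and the data is hypothetical.

```python
from collections import Counter, defaultdict


def consistency(reports, features):
    """Count reports whose developer is the majority label among reports
    that look identical when only `features` are visible."""
    groups = defaultdict(Counter)
    for words, dev in reports:
        groups[frozenset(words & features)][dev] += 1
    return sum(max(c.values()) for c in groups.values())


def forward_selection(reports):
    """Greedily add the word that most improves the consistency score."""
    candidates = sorted({w for words, _ in reports for w in words})
    selected = set()
    best = consistency(reports, selected)
    while True:
        gains = [(consistency(reports, selected | {w}), w)
                 for w in candidates if w not in selected]
        if not gains:
            break
        score, word = max(gains, key=lambda g: g[0])
        if score <= best:
            break  # no remaining word improves the score
        selected.add(word)
        best = score
    return selected


# Hypothetical bug reports as (word set, developer) pairs.
reports = [
    ({"crash", "on", "startup"}, "alice"),
    ({"crash", "on", "exit"}, "alice"),
    ({"sync", "on", "startup"}, "bob"),
    ({"sync", "timeout"}, "bob"),
]
chosen = forward_selection(reports)
print(chosen)
```

Step-wise backward elimination follows the same pattern in reverse: it starts from all words and repeatedly removes the word whose removal hurts the score least.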


C. Graph Module

This module shows four parts, as follows:

First, it shows how many bugs are not assigned to any developer, so that the admin knows which bugs are not assigned yet.

Second, it shows how many bugs are assigned to a developer, so that the admin knows which bugs are assigned.

Third, it shows how many bugs have been rectified by the developers, so that the admin knows which bugs are rectified completely.

Fourth, it shows how many bugs have not been rectified by the developers, so that the admin knows which bugs are not rectified yet.
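The four counts shown by such a module can be derived directly from the bug records. The sketch below assumes each bug is a dictionary with hypothetical `developer` and `rectified` fields and prints a simple text bar per status; the actual module presents the data graphically.

```python
from collections import Counter


def status_counts(bugs):
    """Count bugs per status for the four parts of the graph module."""
    counts = Counter()
    for bug in bugs:
        counts["assigned" if bug["developer"] else "not assigned"] += 1
        counts["rectified" if bug["rectified"] else "not rectified"] += 1
    return counts


# Hypothetical bug records; `developer` is None until the bug is assigned.
bugs = [
    {"id": 1, "developer": "alice", "rectified": True},
    {"id": 2, "developer": "bob", "rectified": False},
    {"id": 3, "developer": None, "rectified": False},
]
counts = status_counts(bugs)
for label in ("not assigned", "assigned", "rectified", "not rectified"):
    # One '#' per bug gives a minimal text bar chart.
    print(f"{label:>13} | {'#' * counts[label]} {counts[label]}")
```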

Historical Data: Historical data is also used for reducing the bug data. Here we enter a date for the last accessed bugs, and the bug data retrieved for that date is then used for data reduction [3].
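One possible reading of this step is a date filter applied before the other reductions. The sketch below assumes each bug record carries a hypothetical `last_accessed` date and keeps only the reports accessed on or after a cutoff entered by the user. The field name, the cutoff semantics, and the sample records are assumptions.

```python
from datetime import date


def reports_accessed_since(bugs, cutoff):
    """Keep only the bug reports last accessed on or after `cutoff`."""
    return [bug for bug in bugs if bug["last_accessed"] >= cutoff]


# Hypothetical bug records with the date each report was last accessed.
bugs = [
    {"id": 1, "last_accessed": date(2016, 7, 30)},
    {"id": 2, "last_accessed": date(2015, 1, 12)},
    {"id": 3, "last_accessed": date(2016, 8, 2)},
]
recent = reports_accessed_since(bugs, date(2016, 1, 1))
print([bug["id"] for bug in recent])
```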

IV. CONCLUSION

In this paper we have focused on reducing the bug data set in order to obtain data that is smaller in scale and higher in quality. To do so, we used the instance selection and feature selection techniques of data mining together with historical data. Our experimental results showed that this data reduction technique yields quality data while reducing the data scale. Compared with earlier work, we have added a new module that presents various bug-related details to the administrator in graphical format. In future work, we plan to further improve the results of data reduction in bug triage and to explore how to prepare a high-quality bug data set.

V. REFERENCES

[1] Jifeng Xuan, He Jiang, Yan Hu, Zhilei Ren, Weiqin Zou, Zhongxuan Luo, and Xindong Wu, “Towards Effective Bug Triage with Software Data Reduction Techniques,” IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 1, January 2015.

[2] Mamdouh Alenezi, Kenneth Magel, and Shadi Banitaan, “Efficient Bug Triaging Using Text Mining,” © 2013 Academy Publisher.

[3] Francisco Servant, “Supporting Bug Investigation using History Analysis,” 978-1-4799-0215-6/13, © 2013 IEEE.

[4] Pamela Bhattacharya, Iulian Neamtiu, and Christian R. Shelton, “Automated, Highly-Accurate, Bug Assignment Using Machine Learning and Tossing Graphs,” May 2, 2012.

[5] K. Balog, L. Azzopardi, and M. de Rijke, “Formal models for expert finding in enterprise corpora,” in Proc. 29th Annu. Int. ACM SIGIR Conf. Res. Develop. Inform. Retrieval, Aug. 2006, pp. 43–50.

[6] P. S. Bishnu and V. Bhattacherjee, “Software fault prediction using quad tree-based k-means clustering algorithm,” IEEE Trans. Knowl. Data Eng., vol. 24, no. 6, pp. 1146–1150, Jun. 2012.

[7] H. Brighton and C. Mellish, “Advances in instance selection for instance-based learning algorithms,” Data Mining Knowl. Discovery, vol. 6, no. 2, pp. 153–172, Apr. 2002. A. K. Uysal and S. Gunal, “A novel probabilistic feature selection method for text classification,” Knowledge-Based Systems, vol. 36, no. 0, pp. 226–235, 2012.

[8] S. Kim, H. Zhang, R. Wu, and L. Gong, “Dealing with noise in defect prediction,” in Proc. 32nd ACM/IEEE Int. Conf. Softw. Eng., May 2010, pp. 481–490.

[9] A. Lamkanfi, S. Demeyer, E. Giger, and B. Goethals, “Predicting the severity of a reported bug,” in Proc. 7th IEEE Working Conf. Mining Softw. Repositories, May 2010, pp. 1–10.

Author’s Profile:

Dr. G Prakash Babu, M.Tech., Ph.D., is working as a Professor at Intell Engineering College, Anantapur, affiliated to JNTUA University, Anantapur. He has vast experience in the teaching field and has published in many national and international journals in various disciplines.

Bhavana Reddy is pursuing her M.Tech in the Dept of CSE, Intell Engineering College, affiliated to JNTUA University, Ananthapur.