WWW.IJITECH.ORG
ISSN 2321-8665
Vol.04,Issue.10,
August-2016,
Pages:1737-1739
Copyright @ 2016 IJIT. All rights reserved.
Bug Triage with Bug Data Reduction
DR. G. PRAKASH BABU1, BHAVANA REDDY2
1Professor, Dept of CSE, Intell Engineering College, Anantapur, AP, India, E-mail: [email protected].
2PG Scholar, Dept of CSE, Intell Engineering College, Anantapur, AP, India, E-mail: [email protected].
Abstract: Bug triage is the step of the bug-fixing process
that aims to correctly assign a developer to a new bug.
Software companies spend a large share of their cost dealing
with bugs. To reduce the time and cost of bug triage, we
present an automatic approach that predicts a developer with
relevant experience to solve a newly arriving report. In the
proposed approach we perform data reduction on the bug data
set, which reduces the scale of the data and increases its
quality. We apply instance selection and feature selection
simultaneously, together with historical bug data. We have
also added a new module that describes the status of a bug,
i.e., whether it is assigned to a developer and whether it has
been rectified.
Keywords: Bug, Bug Triage, Data Reduction, Instance
Selection, Data Mining.
I. INTRODUCTION
A bug repository plays an important role in managing
software bugs. Many open source software projects have an
open bug repository that allows both developers and users to
submit defects or issues in the software, suggest possible
enhancements, and comment on existing bug reports. For
large-scale open source software projects, the number of bugs
reported daily is so large that the triaging process becomes
very difficult and challenging. Software companies spend over
45 percent of their cost on fixing bugs. Two challenges
related to bug data may affect the effective use of bug
repositories in software development tasks, namely the large
scale and the low quality of the data. In a bug repository, a
bug is maintained as a bug report, which records the textual
description needed to reproduce the bug and is updated
according to the status of bug fixing. The primary
contribution of this paper is as follows: we use feature
selection and instance selection together with historical data
to reduce the bug data in a bug repository, so that we obtain
data of smaller scale and higher quality. We also add a graph
module for representing the status of bug reports. Section II
describes the architecture of the proposed system. The details
of instance selection, feature selection, the use of
historical data, and the graph module are given in Section
III, and Section IV concludes the paper.
A. Existing System
We review existing work on modeling bug data and on bug
triage.
Modeling Bug Data: To investigate the relationships in bug
data, Sandusky et al. form a bug report network to examine
the dependency among bug reports. Besides studying
relationships among bug reports, Hong et al. build a
developer social network to examine the collaboration among
developers based on the bug data in the Mozilla project. This
developer social network is helpful for understanding the
developer community and the project evolution. By mapping
bug priorities to developers, Xuan et al. identify the
developer prioritization in open source bug repositories. The
developer prioritization can distinguish developers and assist
tasks in software maintenance.
Bug Triage: Bug triage aims to assign an appropriate
developer to fix a new bug, i.e., to determine who should fix
a bug. Čubranić and Murphy first propose the problem of
automatic bug triage to reduce the cost of manual bug triage.
They apply text classification techniques to predict related
developers. Anvik et al. examine multiple techniques for bug
triage, including data preparation and typical classifiers.
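To make the idea of bug triage as text classification concrete, the following is a minimal sketch rather than the method of any cited work. It assumes a multinomial Naive Bayes classifier with Laplace smoothing; the function names, the toy reports, and the developer names are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(reports):
    """Fit a multinomial Naive Bayes model from (text, developer) pairs."""
    word_counts = defaultdict(Counter)  # developer -> word frequencies
    doc_counts = Counter()              # developer -> number of reports fixed
    vocabulary = set()
    for text, developer in reports:
        words = text.lower().split()
        word_counts[developer].update(words)
        doc_counts[developer] += 1
        vocabulary.update(words)
    return word_counts, doc_counts, vocabulary


def predict_developer(model, text):
    """Return the developer with the highest posterior log-probability."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_developer, best_score = None, -math.inf
    for developer in sorted(doc_counts):
        score = math.log(doc_counts[developer] / total_docs)
        total_words = sum(word_counts[developer].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Laplace smoothing avoids zero probabilities.
            score += math.log((word_counts[developer][word] + 1)
                              / (total_words + len(vocabulary)))
        if score > best_score:
            best_developer, best_score = developer, score
    return best_developer


# Hypothetical history of fixed bug reports (illustrative data only).
history = [
    ("crash when rendering svg image", "alice"),
    ("rendering glitch in canvas image", "alice"),
    ("login fails with wrong password error", "bob"),
    ("password reset email not sent", "bob"),
]
model = train_naive_bayes(history)
print(predict_developer(model, "image rendering crash"))  # alice
```

The classifier recommends the developer whose past reports share the most vocabulary with the new report, which is why the scale and quality of the historical bug data matter for triage.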
B. Proposed System
The primary contributions of this paper are as follows:
We present the problem of data reduction for bug triage.
This problem aims to augment the data set of bug triage
in two aspects, namely a) to simultaneously reduce the
scales of the bug dimension and the word dimension and
b) to improve the accuracy of bug triage.
We propose a combination approach to address the
problem of data reduction. This can be viewed as an
application of instance selection and feature selection in
bug repositories.
We build a binary classifier to predict the order of
applying instance selection and feature selection. To our
knowledge, the order of applying instance selection and
feature selection has not been investigated in related
domains.
This paper is an extension of our previous work. In this
extension, we add new attributes extracted from bug data
sets, prediction for reduction orders, and experiments on
four instance selection algorithms, four feature selection
algorithms, and their combinations.
In this paper, we address the problem of data reduction for
bug triage, i.e., how to reduce the bug data to save the labor
cost of developers and improve the quality of the data to
facilitate the process of bug triage.
Data reduction for bug triage aims to build a small-scale and
high-quality set of bug data by removing bug reports and
words, which are redundant or non-informative. In our work,
we combine existing techniques of instance selection and
feature selection to simultaneously reduce the bug dimension
and the word dimension. The reduced bug data contain fewer
bug reports and fewer words than the original bug data and
provide similar information over the original bug data. We
evaluate the reduced bug data according to two criteria: the
scale of a data set and the accuracy of bug triage. To avoid the
bias of a single algorithm, we empirically examine the results
of four instance selection algorithms and four feature selection
algorithms.
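The combination of the two reductions, and the effect of their order, can be sketched as follows. This is an illustration under stated assumptions, not the algorithms evaluated in this work: feature selection is replaced by a simple document-frequency threshold, instance selection by dropping empty and repeated reports, and all function names, parameters, and toy data are hypothetical.

```python
from collections import Counter


def select_features(reports, min_df=2, max_df_ratio=0.9):
    """Keep words that are neither too rare nor present in nearly all reports."""
    df = Counter(word for words, _ in reports for word in set(words))
    limit = max_df_ratio * len(reports)
    kept = {w for w, n in df.items() if min_df <= n <= limit}
    return [([w for w in words if w in kept], dev) for words, dev in reports]


def select_instances(reports):
    """Drop empty reports and reports repeating an earlier (words, developer) pair."""
    seen, selected = set(), []
    for words, dev in reports:
        key = (frozenset(words), dev)
        if words and key not in seen:
            seen.add(key)
            selected.append((words, dev))
    return selected


def reduce_bug_data(reports, order="FS->IS"):
    """Apply feature selection (FS) and instance selection (IS) in the given order."""
    steps = {"FS": select_features, "IS": select_instances}
    for name in order.split("->"):
        reports = steps[name](reports)
    return reports


def scale(reports):
    """Return (number of bug reports, number of distinct words)."""
    vocabulary = {w for words, _ in reports for w in words}
    return len(reports), len(vocabulary)


# Hypothetical bug reports (illustrative data only).
raw = [
    ("bug crash on startup", "alice"),
    ("bug crash on startup again", "alice"),
    ("bug login password rejected", "bob"),
    ("bug password reset rejected", "bob"),
    ("bug typo", "carol"),
]
reports = [(text.split(), dev) for text, dev in raw]

print(scale(reports))                            # (5, 10)
print(scale(reduce_bug_data(reports, "FS->IS")))  # (2, 5)
print(scale(reduce_bug_data(reports, "IS->FS")))  # (5, 5)
```

On this toy data the two orders give different scales, because reports only become redundant after uninformative words are removed. This is the kind of difference that motivates predicting the reduction order.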
II. ARCHITECTURE
A. Bug Triage
The aim of bug triage is to assign a developer to fix a bug.
Once a developer is assigned to a new bug report, the
developer fixes the bug or attempts to rectify it, and then
reports the status of the bug, i.e., whether it has been
rectified or not [1].
Fig.1.
B. Data Reduction
Here we reduce the bug data by using instance selection and
feature selection, so that we obtain data of smaller scale and
higher quality.
Fig.2.
III. INSTANCE SELECTION
Instance selection methods are associated with data mining
tasks such as classification and clustering.
Data mining is a nontrivial process of identifying valid,
novel, potentially useful, and ultimately understandable
patterns in data.
Instance selection chooses a subset of the data that achieves
the original purpose of a data mining application as if the
whole data set were used.
The ideal outcome of instance selection is model
independent.
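As one concrete example of choosing a subset that serves as well as the whole data, the sketch below implements Hart's condensed nearest neighbor rule. This algorithm is chosen here only for illustration and is not claimed to be one of the algorithms used in this work; the two-dimensional feature vectors and the labels are hypothetical.

```python
def nearest_label(store, point):
    """Label of the stored instance closest to `point` (squared Euclidean)."""
    def dist(item):
        return sum((a - b) ** 2 for a, b in zip(item[0], point))
    return min(store, key=dist)[1]


def condensed_nearest_neighbor(instances):
    """Keep only the instances a 1-NN classifier needs to label the
    whole training set correctly (Hart's condensed nearest neighbor)."""
    store = [instances[0]]
    changed = True
    while changed:
        changed = False
        for features, label in instances:
            # An instance misclassified by the current store is added to it.
            if nearest_label(store, features) != label:
                store.append((features, label))
                changed = True
    return store


# Hypothetical bug reports already mapped to numeric feature vectors.
data = [
    ((1.0, 1.0), "ui"), ((1.2, 0.9), "ui"), ((0.8, 1.1), "ui"),
    ((5.0, 5.0), "network"), ((5.1, 4.8), "network"), ((4.9, 5.2), "network"),
]
subset = condensed_nearest_neighbor(data)
print(len(data), "->", len(subset))  # 6 -> 2
```

The selected subset still classifies every original instance correctly with a 1-NN classifier, which matches the goal of achieving the original purpose as if the whole data set were used.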
A. Evaluation Measures
Direct Measure: Keep as much resemblance as possible
between the selected data and the original data.
Ex: Entropy, moments, and histograms.
Indirect Measure: Judge the selected data by its effect on a
downstream task. For example, a classifier can be used to
check whether instance selection results in better, equal, or
worse predictive accuracy.
Conventional evaluation methods in sampling,
classification, and clustering can be used to assess the
performance of instance selection. Ex: Precision, recall.
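Both kinds of measure can be computed in a few lines. The sketch below assumes class labels are developer names, uses Shannon entropy of the label distribution as the direct measure, and uses per-class precision and recall as the indirect measure; the function names and data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def precision_recall(predicted, actual, positive):
    """Precision and recall with one class treated as the positive class."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == positive and a == positive for p, a in pairs)
    fp = sum(p == positive and a != positive for p, a in pairs)
    fn = sum(p != positive and a == positive for p, a in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Direct measure: compare label entropy before and after selection.
original = ["alice"] * 4 + ["bob"] * 4
selected = ["alice"] * 2 + ["bob"] * 2
entropy_gap = abs(entropy(original) - entropy(selected))

# Indirect measure: score a classifier trained on the selected data.
predicted = ["alice", "alice", "bob", "bob"]
actual = ["alice", "bob", "bob", "bob"]
precision, recall = precision_recall(predicted, actual, positive="bob")
print(entropy_gap, precision, round(recall, 3))
```

A small entropy gap indicates that the selected data keeps the class balance of the original data, while precision and recall show whether predictive quality survives the reduction.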
B. Feature Selection
Feature selection selects a minimum set of features such
that the probability distribution of the different classes
given the values of those features is as close as possible to
the original distribution given the values of all features [1].
It reduces the number of attributes appearing in the
discovered patterns, which makes the patterns easier to
understand.
It uses the smallest representation that is enough to solve
the task.
Heuristic Methods:
Step-wise forward selection
Step-wise backward elimination
Feature creation creates new attributes that can capture the
important information in a data set much more efficiently
than the original attributes. General methodologies include:
Domain-specific feature extraction
Mapping data to a new space (see: data reduction)
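Step-wise forward selection, listed above under heuristic methods, can be sketched as a greedy loop. The scoring function used here, the training accuracy of a simple lookup classifier, is an assumption made for illustration, as are all names and the toy reports.

```python
from collections import Counter, defaultdict


def forward_selection(features, score, max_features=None):
    """Greedy step-wise forward selection: start from the empty set and
    repeatedly add the feature that raises `score` the most."""
    selected = []
    best = score(selected)
    remaining = list(features)
    while remaining and (max_features is None or len(selected) < max_features):
        candidate_score, candidate = max(
            (score(selected + [f]), f) for f in remaining)
        if candidate_score <= best:
            break  # no remaining feature improves the score
        selected.append(candidate)
        remaining.remove(candidate)
        best = candidate_score
    return selected


def lookup_accuracy(reports, selected):
    """Training accuracy when predicting the majority developer among
    reports that share the same pattern of selected words."""
    groups = defaultdict(Counter)
    for words, dev in reports:
        signature = frozenset(w for w in selected if w in words)
        groups[signature][dev] += 1
    return sum(max(c.values()) for c in groups.values()) / len(reports)


# Hypothetical bug reports as word sets (illustrative data only).
reports = [
    ({"bug", "crash", "render"}, "alice"),
    ({"bug", "crash", "image"}, "alice"),
    ({"bug", "login", "password"}, "bob"),
    ({"bug", "password", "reset"}, "bob"),
]
words = ["bug", "crash", "image", "login", "password", "render", "reset"]
chosen = forward_selection(words, lambda sel: lookup_accuracy(reports, sel))
print(chosen)  # ['password']
```

A single word already separates the two developers on this toy data, so the loop stops after one step. Step-wise backward elimination works the other way around, starting from all features and removing the least useful one at each step.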
Fig.3. Instance selection in Mozilla and feature selection in
Mozilla.
C. Graph Module
This module shows four parts, as follows:
First, it shows how many bugs are not assigned to any
developer. This gives the admin the complete status of the
bugs, so that he knows which bugs are not assigned yet.
Second, it shows how many bugs are assigned to a
developer. This gives the admin the complete status of the
bugs, so that he knows which bugs are assigned.
Third, it shows how many bugs have been rectified by the
developers. This gives the admin the complete status of the
bugs, so that he knows which bugs are rectified completely.
Fourth, it shows how many bugs have not been rectified by
the developers. This gives the admin the complete status of
the bugs, so that he knows which bugs are not rectified yet.
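The four counts shown by the graph module can be computed from the bug reports as in the sketch below. The `BugReport` record and its field names are hypothetical and stand in for whatever schema the bug repository actually uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BugReport:
    bug_id: int
    developer: Optional[str]  # None means the bug is not assigned yet
    rectified: bool


def status_summary(reports):
    """Count bugs in the four categories presented to the admin."""
    summary = {"unassigned": 0, "assigned": 0,
               "rectified": 0, "not_rectified": 0}
    for report in reports:
        summary["assigned" if report.developer else "unassigned"] += 1
        summary["rectified" if report.rectified else "not_rectified"] += 1
    return summary


# Hypothetical reports (illustrative data only).
bugs = [
    BugReport(1, None, False),
    BugReport(2, "alice", True),
    BugReport(3, "bob", False),
]
summary = status_summary(bugs)
print(summary)
```

Each of the four values can then be drawn as one bar or one slice of the graph.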
Historical Data: Historical data is also used for reducing the
bug data. Here we enter the date on which bugs were last
accessed, and the data retrieved for that date is used for
data reduction [3].
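One possible reading of this step is a date filter over the bug reports. The sketch below assumes that each report carries a last-accessed date and that the entered date acts as a cutoff; both the cutoff interpretation and the field names are assumptions, not details given in this paper.

```python
from datetime import date


def select_by_last_access(reports, cutoff):
    """Keep only reports last accessed on or after `cutoff`."""
    return [r for r in reports if r["last_accessed"] >= cutoff]


# Hypothetical reports (illustrative data only).
reports = [
    {"id": 101, "last_accessed": date(2016, 1, 15)},
    {"id": 102, "last_accessed": date(2016, 6, 30)},
    {"id": 103, "last_accessed": date(2015, 11, 2)},
]
recent = select_by_last_access(reports, date(2016, 1, 1))
print([r["id"] for r in recent])  # [101, 102]
```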
IV. CONCLUSION
In this paper we have focused on reducing the bug data set
in order to obtain data of smaller scale and higher quality.
For that purpose we have used the feature selection and
instance selection techniques of data mining, together with
historical data. Our experimental results showed that this
data reduction technique yields quality data and reduces the
data scale. Compared with the earlier work, we have added a
new module that presents various details related to the bugs
to the administrator in graphical format. In future work, we
plan to further improve the results of data reduction in bug
triage and to explore how to prepare a high-quality bug data
set.
V. REFERENCES
[1] J. Xuan, H. Jiang, Y. Hu, Z. Ren, W. Zou, Z. Luo, and
X. Wu, “Towards effective bug triage with software data
reduction techniques,” IEEE Trans. Knowl. Data Eng., vol. 27,
no. 1, Jan. 2015.
[2] M. Alenezi, K. Magel, and S. Banitaan, “Efficient bug
triaging using text mining,” Academy Publisher, 2013.
[3] F. Servant, “Supporting bug investigation using history
analysis,” IEEE, 2013, 978-1-4799-0215-6/13.
[4] P. Bhattacharya, I. Neamtiu, and C. R. Shelton,
“Automated, highly-accurate, bug assignment using machine
learning and tossing graphs,” May 2, 2012.
[5] K. Balog, L. Azzopardi, and M. de Rijke, “Formal models
for expert finding in enterprise corpora,” in Proc. 29th Annu.
Int. ACM SIGIR Conf. Res. Develop. Inform. Retrieval, Aug.
2006, pp. 43–50.
[6] P. S. Bishnu and V. Bhattacherjee, “Software fault
prediction using quad tree-based k-means clustering
algorithm,” IEEE Trans. Knowl. Data Eng., vol. 24, no. 6, pp.
1146–1150, Jun. 2012.
[7] H. Brighton and C. Mellish, “Advances in instance
selection for instance-based learning algorithms,” Data
Mining Knowl. Discovery, vol. 6, no. 2, pp. 153–172,
Apr. 2002.
[8] A. K. Uysal and S. Gunal, “A novel probabilistic
feature selection method for text classification,”
Knowledge-Based Systems, vol. 36, no. 0, pp. 226–235, 2012.
[9] S. Kim, H. Zhang, R. Wu, and L. Gong, “Dealing with
noise in defect prediction,” in Proc. 32nd ACM/IEEE
Int. Conf. Softw. Eng., May 2010, pp. 481–490.
[10] A. Lamkanfi, S. Demeyer, E. Giger, and B. Goethals,
“Predicting the severity of a reported bug,” in Proc. 7th
IEEE Working Conf. Mining Softw. Repositories, May
2010, pp. 1–10.
Author’s Profile:
Dr. G Prakash Babu, M.Tech., Ph.D., is working as a
Professor at Intell Engineering College, Anantapur, affiliated
to JNTUA University, Anantapur. He has vast experience in
teaching and has published many papers in national and
international journals in various disciplines.
Bhavana Reddy is pursuing her M.Tech in the Dept of CSE,
Intell Engineering College, affiliated to JNTUA University,
Ananthapur.