Categorizing Bugs with Social Networks: a case study of four open source software communities
Presented by A. Ibrahim
ECE 654 – Spring 2013
Bug Reports Categorization (Problem)
• Open source systems such as Firefox receive a very large number of bug reports every month through their bug tracking systems.
• Mozilla Firefox
• Community has processed 64,000 bug reports (until current release)
• 50,000 (~ 79 %) of those were faulty
• Automatic categorization of bug reports would be valuable
• Automated prioritization of valid reports
• Can decrease response and fix time
• Can increase productivity
Source: Mozilla Quest Magazine (mozillaquest.com)
Classification of Bugs with Social Network Analysis (solution)
• The social organization of bug reporters provides a novel dimension for categorizing bug reports.
• In particular, topological measures that quantify the position of bug reporters in a collaborative development network can predict the status of a bug report (valid or faulty).
[Figure: Social Organization of Bug Reporters (Zanetti et al. 2013); legend: Valid, Faulty]
Classification of Bugs with Social Network Analysis (cont.)
[Timeline figure: time axis from t to t+30]
The position of a bug reporter in the monthly collaboration network is indicative of the eventual outcome of the bug handling process.
Bug reports → Predictive Model
Bugzilla is a general-purpose bug tracker and testing tool originally developed and used by the Mozilla project.
Time Frame (30 days)
Network Construction
• When a new user joins an OSS community, they can add their name to the CC or Assign list.
• CC is a list of users who subscribe to receive information about future updates on bug reports.
• Assign records a user assigning the task of handling a bug to another user.
Social Organization of Bug Reporters
[Zanetti et al. 2013]
• To construct the social networks, the authors capture the pairwise interactions between two types of users (community managers and developers) in the communities of the selected open source systems.
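The construction described above can be sketched roughly as follows. This is an illustration, not the authors' code: the event tuple layout `(timestamp, source, target, kind)`, the function name `build_monthly_network`, the edge direction (from the acting user to the target user), and the sample users are all assumptions. Only the CC and Assign event types and the 30-day time frame come from the slides.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical event records: (timestamp, source_user, target_user, kind).
# Only the event kinds ("CC", "ASSIGN") and the 30-day window come from the slides.
EVENTS = [
    (datetime(2012, 1, 3), "alice", "bob", "ASSIGN"),
    (datetime(2012, 1, 10), "carol", "alice", "CC"),
    (datetime(2012, 1, 20), "alice", "bob", "CC"),
    (datetime(2012, 3, 1), "dave", "carol", "ASSIGN"),  # outside the window used below
]

def build_monthly_network(events, t, window_days=30):
    """Return a directed adjacency map for interactions in [t - window, t)."""
    start = t - timedelta(days=window_days)
    graph = defaultdict(set)
    for timestamp, source, target, kind in events:
        if kind in ("CC", "ASSIGN") and start <= timestamp < t:
            graph[source].add(target)
            graph[target]  # touching the key registers the target as a node
    return {node: sorted(neighbors) for node, neighbors in graph.items()}

network = build_monthly_network(EVENTS, t=datetime(2012, 2, 1))
print(network)
```

Repeated interactions between the same pair collapse into a single edge here; a weighted variant would count them instead.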
Methodology Verification
[Figure: timeline from t-30 to t+30 with a 30-day time frame. Eigenvector centrality is tested under two hypotheses, H1 (<) and H2 (>), labeled "Valid Report" and "Faulty Report".]
H3: The position of a bug reporter in the monthly collaboration network preceding the time of the report is indicative of the eventual outcome of the bug handling process.
Bug Report Submission
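A minimal sketch of how a reporter's eigenvector centrality in the preceding 30-day network could drive a prediction. Everything named here is hypothetical: the sample network, the function names, the threshold value of 0.25, and in particular the rule that a centrality at or above the threshold means "valid" are assumptions for illustration. The centrality is computed by plain power iteration on an undirected graph.

```python
def eigenvector_centrality(adjacency, iterations=200, tol=1e-9):
    """Power iteration on (A + I) for an undirected graph {node: set_of_neighbors}."""
    nodes = sorted(adjacency)
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Adding the node's own score (the "+ I" shift) avoids oscillation
        # on bipartite graphs without changing the leading eigenvector.
        new = {n: score[n] + sum(score[m] for m in adjacency[n]) for n in nodes}
        norm = sum(v * v for v in new.values()) ** 0.5
        new = {n: v / norm for n, v in new.items()}
        if sum(abs(new[n] - score[n]) for n in nodes) < tol:
            return new
        score = new
    return score

# Hypothetical 30-day collaboration network (treated as undirected for simplicity).
NETWORK = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}

def predict_status(centrality, threshold=0.25):
    # Assumption: reporters at or above the threshold are predicted "valid".
    return "valid" if centrality >= threshold else "faulty"

scores = eigenvector_centrality(NETWORK)
predictions = {user: predict_status(score) for user, score in scores.items()}
print(predictions)
```

In this toy network the well-connected node `c` scores highest and the peripheral node `e` falls below the threshold.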
Building a Predictive Model (Classifier)
Simple Classifiers
LCC (the network's largest connected component) is the largest group of nodes in a network that are all reachable from each other; its size is the number of nodes in that group.
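The LCC definition above can be illustrated with a breadth-first search over an undirected graph. This is a sketch under assumptions: the sample graph and the function name are hypothetical, and using LCC membership of the reporter as the prediction rule is one plausible reading of "simple classifier", not something the slides spell out.

```python
from collections import deque

def largest_connected_component(adjacency):
    """Return the largest set of mutually reachable nodes in an undirected graph."""
    seen, best = set(), set()
    for start in adjacency:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

# Hypothetical network: one group of three, one pair, one isolated user.
GRAPH = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b"},
    "d": {"e"}, "e": {"d"},
    "f": set(),
}

lcc = largest_connected_component(GRAPH)
in_lcc = {user: user in lcc for user in GRAPH}  # assumed classifier feature
print(sorted(lcc), len(lcc))
```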
Support Vector Machine
• Topological measures used to predict bug report quality:
• Centrality: eigenvector, closeness, betweenness
• k-coreness
• Degree: total, in-degree, out-degree
• Clustering coefficient
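Three of the listed measures (degree, clustering coefficient, and k-coreness) are simple enough to sketch in plain Python. The edge list and function names below are hypothetical, and closeness and betweenness are omitted for brevity. Degree is computed on the directed edges; the other two treat the network as undirected, which is an assumption of this sketch.

```python
def degrees(edges):
    """In-, out-, and total degree per node for directed (source, target) edges."""
    result = {}
    for source, target in edges:
        result.setdefault(source, {"in": 0, "out": 0})["out"] += 1
        result.setdefault(target, {"in": 0, "out": 0})["in"] += 1
    for counts in result.values():
        counts["total"] = counts["in"] + counts["out"]
    return result

def clustering_coefficient(adjacency, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    neighbors = sorted(adjacency[node])
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if neighbors[j] in adjacency[neighbors[i]])
    return 2.0 * links / (k * (k - 1))

def k_coreness(adjacency):
    """Core number per node: repeatedly peel off the node of minimum remaining degree."""
    remaining = {n: set(nbrs) for n, nbrs in adjacency.items()}
    core, k = {}, 0
    while remaining:
        node = min(remaining, key=lambda n: len(remaining[n]))
        k = max(k, len(remaining[node]))
        core[node] = k
        for neighbor in remaining.pop(node):
            remaining[neighbor].discard(node)
    return core

# Hypothetical interactions: a triangle a-b-c plus a pendant node d.
EDGES = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
undirected = {}
for s, t in EDGES:
    undirected.setdefault(s, set()).add(t)
    undirected.setdefault(t, set()).add(s)

degree_table = degrees(EDGES)
core_numbers = k_coreness(undirected)
print(degree_table["c"], clustering_coefficient(undirected, "c"), core_numbers)
```

Values like these, computed per reporter, would form the feature vector passed to a classifier such as the SVM.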
Data retrieval (4 open source programs)
• The authors used a data set of more than 700,000 bug reports obtained from the Bugzilla installations of four OSS communities (Firefox, Thunderbird, Eclipse and NetBeans), spanning more than 10 years (Jan 1999 to Jun 2012).
• This data set contains the full history of change events (5.8 million).
Open Software Systems Selected for Study (# of bug reports)
Project        Total bug reports    Resolved bugs
Firefox        112,968              64,088
Thunderbird    35,388               21,644
Eclipse        356,415              158,957
NetBeans       210,921              42,851
Results
• Social networks can be used to automatically categorize bug reports.
• Precision and recall higher than for competing methods (except?).
Precision (%)
Project        LCC     EVCENT    SVM
Firefox        44.1    60.4      82.5
Thunderbird    62.1    68.6      90.3
Eclipse        76.3    76.3      88.7
NetBeans       71.9    76.7      78.9
Recall (%)
Project        LCC     EVCENT    SVM
Firefox        50.9    30.5      44.5
Thunderbird    44.5    5.4       38.9
Eclipse        62.6    62.6      91
NetBeans       62.4    38.8      87
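For readers less familiar with the two metrics reported above, a small sketch of how precision and recall are computed from labeled outcomes. The labels below are made up for illustration, and treating "valid" as the positive class is an assumption; the slides do not state which class the reported numbers refer to.

```python
def precision_recall(actual, predicted, positive="valid"):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical outcomes for six bug reports.
actual    = ["valid", "valid",  "valid", "faulty", "faulty", "valid"]
predicted = ["valid", "faulty", "valid", "valid",  "faulty", "faulty"]

precision, recall = precision_recall(actual, predicted)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

A classifier can score high on precision while missing many valid reports, which is why the two tables need to be read together.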
Questions?