Towards a Benchmark for Evaluating Reverse Engineering Tools
Lajos Jeno Fulop, Peter Hegedus, Rudolf Ferenc and Tibor Gyimothy
University of Szeged, Department of Software Engineering
{flajos|ferenc|gyimi}@inf.u-szeged.hu, [email protected]
Abstract
In this paper we present work in progress towards imple-
menting a benchmark called BEFRIEND (BEnchmark For
Reverse engInEering tools workiNg on source coDe), with
which the outputs of reverse engineering tools can be evalu-
ated and compared easily and efficiently. Such tools are e.g.
design pattern miners, duplicated code detectors and cod-
ing rule violation checkers. BEFRIEND supports different
kinds of tool families, programming languages and software
systems, and it enables the users to define their own evalu-
ation criteria.
Keywords
Benchmark, reverse engineering tools, tool evaluation
1 Introduction
Several design pattern recognition tools have been introduced in the literature, and so far they have proven to be rather efficient. Despite this, it is difficult to claim that the performance of design pattern recognition tools is well understood in terms of the accuracy and completeness of the recognized patterns. Assessing this has been difficult because comparing different tools requires a common measuring environment and a common set of test data. To address this problem, we developed the DEEBEE (DEsign pattern Evaluation BEnchmark Environment) benchmark system in our previous work [2].
The current work introduces the further development of the DEEBEE system, which has been made more widely applicable by generalizing the evaluation aspects and the presented data. The new system is called BEFRIEND
(BEnchmark For Reverse engInEering tools workiNg on
source coDe). With BEFRIEND, the results of reverse engi-
neering tools from different domains recognizing arbitrary
characteristics of source code can be subjectively evaluated
and compared with each other. Such tools are, e.g. bad code
smell miners, duplicated code detectors, and coding rule vi-
olation checkers.
BEFRIEND largely differs from its predecessor in five
areas: (1) it enables uploading and evaluating results related
to different domains, (2) it enables adding and deleting the
evaluating aspects of the results arbitrarily, (3) it introduces
a new user interface, (4) it generalizes the definition of sib-
ling relationships [2], and (5) it enables uploading files in
different formats by adding the appropriate uploading plug-
in. BEFRIEND is a freely accessible online system avail-
able at http://www.inf.u-szeged.hu/befriend/.
2 Motivation
More and more papers evaluating reverse engineering tools are being published. Such evaluations are needed because the number of reverse engineering tools keeps growing, and it is difficult to decide which of these tools is the most suitable for a given task.
Pettersson et al. [3] summarized problems arising during the evaluation of accuracy in pattern detection, with the goal of making accuracy measurements more comparable. They argued that a community effort is needed to create control sets for a set of applications.
Bellon et al. [1] presented an experiment to evaluate and
compare clone detectors. The experiment involved several
researchers who applied their tools on carefully selected
large C and Java programs. Their benchmark gives a stan-
dard procedure for every new clone detector.
Wagner et al. [5] compared three Java bug-finding tools on one university project and five industrial projects. A five-level severity scale, which can be integrated into BEFRIEND, served as the basis for comparison.
Moreover, several further articles compared and evaluated various kinds of reverse engineering tools, but they lacked the support of an automated framework like BEFRIEND.
3 Architecture
We use the well-known issue and bug tracking system
called Trac [4] (version 0.10.3) as the basis of the bench-
mark. Issue tracking is based on tickets, where a ticket
stores all information about an issue or a bug. Trac is writ-
ten in Python and it is an easily extendible and customizable
plug-in oriented system.
2008 15th Working Conference on Reverse Engineering
1095-1350/08 $25.00 © 2008 IEEE
DOI 10.1109/WCRE.2008.18
335
Although the Trac system provides many useful services,
we had to do a lot of customization and extension work
to create a benchmark from it. The two major extensions
were the customization of the graphical user interface and
the customization of the system’s tickets. In the case of the
graphical user interface we had to inherit and implement
some core classes of the Trac system. In the case of the tickets, we had to extend them to describe design pattern, duplicated code and rule violation instances (the name of the pattern or rule violation, its position in the source code, its evaluation, etc.). Furthermore, we extended the database schema to support different kinds of reverse engineering tools.
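As an illustration only (the paper does not show the actual schema), the extra information such an extended ticket might carry on top of a stock Trac ticket can be sketched as follows; all field names here are assumptions, not BEFRIEND's real data model:

```python
# Hypothetical sketch of a result instance stored in an extended ticket.
# Field names are illustrative, not the system's actual schema.
from dataclasses import dataclass, field

@dataclass
class ResultInstance:
    domain: str            # e.g. "Design Pattern", "Duplicated Code"
    name: str              # pattern or rule name, e.g. "Strategy"
    path: str              # source file the instance refers to
    start_line: int
    end_line: int
    start_col: int = 0
    end_col: int = 0
    evaluations: list = field(default_factory=list)  # per-user ratings

inst = ResultInstance("Design Pattern", "Strategy",
                      "src/Shape.java", 10, 42)
inst.evaluations.append(("alice", 4))  # e.g. a 4-out-of-5 rating
```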
4 BEFRIEND
BEFRIEND serves the evaluation of tools working on
source code, which hereafter will be called tools. The tools
can be classified into domains. The tools in a given do-
main produce different results which refer to one or more
positions in the analyzed source code. We refer to these positions as result instances. Often, several instances can be grouped, which greatly speeds up their evaluation; moreover, without grouping, the interpretation of tool results may lead to false conclusions. In order to group instances, their relations need to be defined: if two instances are related to each other, they are called siblings. BEFRIEND supports different kinds of sibling mechanisms, which cannot be detailed here due to space limitations.
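As one hypothetical example of such a mechanism (the paper does not detail them), siblings could be defined by source-position overlap and grouped transitively with a union-find pass; the relation and names below are purely illustrative:

```python
# Sketch of one possible sibling mechanism: instances that overlap in
# the same file are siblings, and sibling groups are the transitive
# closure of that relation. This rule is an assumption for illustration.

def overlaps(a, b):
    """Two instances touch if they share lines of the same file."""
    return a["path"] == b["path"] and not (
        a["end"] < b["start"] or b["end"] < a["start"])

def sibling_groups(instances):
    """Group instances transitively by the sibling relation (union-find)."""
    parent = list(range(len(instances)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(instances)):
        for j in range(i + 1, len(instances)):
            if overlaps(instances[i], instances[j]):
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(instances)):
        groups.setdefault(find(i), []).append(instances[i])
    return list(groups.values())

clones = [
    {"path": "a.c", "start": 10, "end": 30},
    {"path": "a.c", "start": 25, "end": 40},   # overlaps the first
    {"path": "b.c", "start": 5,  "end": 12},
]
# The two overlapping a.c instances form one group; b.c stands alone.
```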
During the development of BEFRIEND, we were striv-
ing for full generalization: an arbitrary number of domains
can be created (design patterns, bad smells, rule violations,
duplicated code, etc.), the domain evaluation aspects and
the setting of instance siblings can be customized. Further-
more, for uploading the results of different tools, the bench-
mark provides an import filter plug-in mechanism.
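A minimal sketch of what such an import filter plug-in might do, assuming a simple semicolon-separated output format; both the format and the function name are invented for illustration, and each real tool needs its own plug-in:

```python
# Hypothetical import filter: parse one tool's textual output into
# result instances. The semicolon-separated format is an assumption.
import csv
import io

def parse_tool_output(text):
    """Turn a tool's output (one instance per line) into dictionaries."""
    instances = []
    for row in csv.reader(io.StringIO(text), delimiter=";"):
        name, path, start, end = row
        instances.append({
            "name": name.strip(),
            "path": path.strip(),
            "start": int(start),
            "end": int(end),
        })
    return instances

sample = ("CloneClass1; src/util.c; 120; 168\n"
          "CloneClass1; src/net.c; 40; 88\n")
# parse_tool_output(sample) yields two instances of the same clone class.
```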
In the following, we show the steps needed to perform
a tool evaluation and comparison task in a concrete domain
(e.g. duplicated code) with the help of BEFRIEND:
1. Add the new domain to the benchmark (e.g. Duplicated Code).
2. Add one or more evaluation criteria with evaluation queries for the new domain (e.g. Procedure abstraction: is it worth substituting the duplicated code fragments with a new function and function calls?).
3. Upload the results of the tools and set the appropriate sibling relations.
4. Evaluate the uploaded results against the criteria defined in step 2.
5. Evaluate and compare the tools using the statistics and comparison functionality of the benchmark.
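Step 5 could, for instance, compute per-tool precision from the collected evaluations; the sketch below only illustrates the kind of statistic meant, not BEFRIEND's actual implementation:

```python
# Sketch: treating instances confirmed by evaluators as correct, a
# tool's precision is the share of its reported instances confirmed.
# The input shape (tool name, accepted flag) is an assumption.

def precision_per_tool(evaluated):
    """evaluated: list of (tool, accepted: bool) pairs."""
    totals, hits = {}, {}
    for tool, accepted in evaluated:
        totals[tool] = totals.get(tool, 0) + 1
        hits[tool] = hits.get(tool, 0) + (1 if accepted else 0)
    return {t: hits[t] / totals[t] for t in totals}

results = [("ToolA", True), ("ToolA", False),
           ("ToolB", True), ("ToolB", True)]
# precision_per_tool(results) → {"ToolA": 0.5, "ToolB": 1.0}
```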
Benefits. Previously, anyone wanting to evaluate and compare reverse engineering tools had to examine and search the relevant source code fragments by hand: traversing several directories and files, and locating the exact lines and columns in a source file. Consequently, mistakes were easy to make, such as examining the wrong files. Furthermore, the evaluation results had to be stored somewhere for later use, which is now automatically supported by BEFRIEND.
To evaluate and compare reverse engineering tools without BEFRIEND, one also has to define evaluation criteria, find test cases, and evaluate the results by hand. With BEFRIEND, the only cost is implementing the appropriate plug-in for uploading a tool's results. This cost is small: the plug-ins developed so far each contain fewer than 100 lines of code. Evaluating tools with BEFRIEND is therefore clearly faster and cheaper than doing so without it.
5 Conclusion and future work
This work is the first step towards creating a generally applicable benchmark that can help to evaluate and compare many
kinds of reverse engineering tools. In the future, we will
need the opinion and advice of reverse engineering tool de-
velopers in order for the benchmark to achieve this aim and
satisfy all needs.
In the future, we would like to examine further reverse
engineering domains, prepare the benchmark for these do-
mains and deal with the possible shortcomings.
References
[1] S. Bellon, R. Koschke, G. Antoniol, J. Krinke, and E. Merlo. Comparison and Evaluation of Clone Detection Tools. IEEE Transactions on Software Engineering, 33:577–591, 2007.
[2] L. J. Fulop, R. Ferenc, and T. Gyimothy. Towards a Benchmark for Evaluating Design Pattern Miner Tools. In Proceedings of the 12th European Conference on Software Maintenance and Reengineering (CSMR 2008). IEEE Computer Society, Apr. 2008.
[3] N. Pettersson, W. Lowe, and J. Nivre. On Evaluation of Accuracy in Pattern Detection. In First International Workshop on Design Pattern Detection for Reverse Engineering (DPD4RE'06), October 2006.
[4] The Trac Homepage. http://trac.edgewall.org/.
[5] S. Wagner, J. Jurjens, C. Koller, and P. Trischberger. Comparing Bug Finding Tools with Reviews and Tests. In Proceedings of the 17th International Conference on Testing of Communicating Systems (TestCom'05), pages 40–55. Springer, 2005.