
Towards a Benchmark for Evaluating Reverse Engineering Tools

Lajos Jenő Fülöp, Péter Hegedűs, Rudolf Ferenc and Tibor Gyimóthy

University of Szeged, Department of Software Engineering

{flajos|ferenc|gyimi}@inf.u-szeged.hu, [email protected]

Abstract

In this paper we present work in progress towards implementing a benchmark called BEFRIEND (BEnchmark For Reverse engInEering tools workiNg on source coDe), with which the outputs of reverse engineering tools can be evaluated and compared easily and efficiently. Such tools are, e.g., design pattern miners, duplicated code detectors and coding rule violation checkers. BEFRIEND supports different kinds of tool families, programming languages and software systems, and it enables the users to define their own evaluation criteria.

Keywords

Benchmark, reverse engineering tools, tool evaluation

1 Introduction

Several design pattern recognition tools have been introduced in the literature, and so far they have proven to be rather efficient. Despite this, it would be difficult to state that the performance of design pattern recognition tools is well defined and well known as far as the accuracy and the rate of the recognized patterns are concerned. This has been quite difficult to achieve so far, since comparing different tools requires a common measuring tool and a common set of test data. To solve this problem, we developed the DEEBEE (DEsign pattern Evaluation BEnchmark Environment) benchmark system in our previous work [2].

The current work introduces the further development of the DEEBEE system, which has become more widely applicable by generalizing the evaluation aspects and the data to be presented. The new system is called BEFRIEND (BEnchmark For Reverse engInEering tools workiNg on source coDe). With BEFRIEND, the results of reverse engineering tools from different domains that recognize arbitrary characteristics of source code can be subjectively evaluated and compared with each other. Such tools are, e.g., bad code smell miners, duplicated code detectors, and coding rule violation checkers.

BEFRIEND largely differs from its predecessor in five areas: (1) it enables uploading and evaluating results related to different domains, (2) it enables arbitrarily adding and deleting the evaluation aspects of the results, (3) it introduces a new user interface, (4) it generalizes the definition of sibling relationships [2], and (5) it enables uploading files in different formats by adding the appropriate uploading plug-in. BEFRIEND is a freely accessible online system available at http://www.inf.u-szeged.hu/befriend/.

2 Motivation

Nowadays, more and more papers are being published that introduce the evaluation of reverse engineering tools. These are needed because the number of reverse engineering tools is increasing and it is difficult to decide which of them is the most suitable for a given task.

Pettersson et al. [3] summarized problems in evaluating the accuracy of pattern detection. Their goal was to make accuracy measurements more comparable. They stated that community effort is greatly needed to create control sets for a set of applications.

Bellon et al. [1] presented an experiment to evaluate and compare clone detectors. The experiment involved several researchers who applied their tools to carefully selected large C and Java programs. Their benchmark provides a standard procedure for every new clone detector.

Wagner et al. [5] compared three Java bug finding tools on one university project and five industrial projects. A five-level severity scale, which can be integrated into BEFRIEND, served as the basis for the comparison.

Moreover, several other articles have dealt with comparing and evaluating particular kinds of reverse engineering tools, but they lacked the support of an automated framework like BEFRIEND.

3 Architecture

We use the well-known issue and bug tracking system called Trac [4] (version 0.10.3) as the basis of the benchmark. Issue tracking is based on tickets, where a ticket stores all information about an issue or a bug. Trac is written in Python and it is an easily extensible and customizable plug-in oriented system.


Although the Trac system provides many useful services, we had to do a lot of customization and extension work to create a benchmark from it. The two major extensions were the customization of the graphical user interface and the customization of the system's tickets. In the case of the graphical user interface we had to inherit from and implement some core classes of the Trac system. In the case of the tickets we had to extend them to be able to describe design pattern, duplicated code and rule violation instances (name of the pattern or rule violation, information about its position in the source code, information about its evaluation, etc.). Furthermore, we extended the database schema to support different kinds of reverse engineering tools.
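
To make the ticket extension more concrete, the following minimal sketch shows how a single detected instance could be stored as a Trac ticket through Trac's Python API. The custom field names (pattern_name, file_path, start_line, end_line) and the environment path are assumptions for illustration only; the paper does not describe BEFRIEND's actual field set, and such fields would also have to be declared in the Trac environment's configuration.

```python
# Minimal sketch (assumption): one result instance stored as a Trac ticket.
# The custom field names below are hypothetical and must be declared in the
# Trac environment's configuration before they are persisted.
from trac.env import Environment
from trac.ticket.model import Ticket

env = Environment('/path/to/benchmark-environment')  # placeholder path

ticket = Ticket(env)
ticket['summary'] = 'Adapter instance reported in scheduler.py'
ticket['description'] = 'Candidate reported by a design pattern miner.'
ticket['pattern_name'] = 'Adapter'        # assumed custom field
ticket['file_path'] = 'src/scheduler.py'  # assumed custom field
ticket['start_line'] = '120'              # assumed custom field
ticket['end_line'] = '185'                # assumed custom field
ticket.insert()
```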

4 BEFRIEND

BEFRIEND serves the evaluation of tools working on source code, which will hereafter simply be called tools. The tools can be classified into domains. The tools in a given domain produce different results which refer to one or more positions in the analyzed source code. We refer to these positions as result instances. It often happens that several instances can be grouped, which can largely speed up their evaluation. Furthermore, without grouping, the interpretation of tool results may lead to false conclusions. In order to group instances, their relations need to be defined. If two instances are related to each other, they are called siblings. BEFRIEND supports different kinds of sibling mechanisms, which cannot be detailed here because of space limitations.
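
As a hedged example of what such a relation could look like (the paper does not detail the mechanisms BEFRIEND actually implements), one plausible sibling criterion is that two instances overlap in the same source file:

```python
# Illustrative sketch only: an overlap-based sibling test and grouping.
# BEFRIEND supports several sibling mechanisms; this is merely one
# plausible example, not necessarily one of them.
from collections import namedtuple

Instance = namedtuple('Instance', 'tool file start_line end_line')

def are_siblings(a, b):
    """Two instances are siblings if their line ranges overlap in the same file."""
    if a.file != b.file:
        return False
    return a.start_line <= b.end_line and b.start_line <= a.end_line

def group_siblings(instances):
    """Group instances into sibling sets; groups connected by a new instance are merged."""
    groups = []
    for inst in instances:
        touching = [g for g in groups if any(are_siblings(inst, o) for o in g)]
        merged = [inst]
        for g in touching:
            merged.extend(g)
            groups.remove(g)
        groups.append(merged)
    return groups

# Example: two clone detectors report overlapping duplicated code.
reports = [
    Instance('tool-A', 'src/util.c', 10, 40),
    Instance('tool-B', 'src/util.c', 35, 60),
    Instance('tool-A', 'src/main.c', 5, 20),
]
print(group_siblings(reports))  # the first two instances end up in one group
```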

During the development of BEFRIEND we strove for full generality: an arbitrary number of domains can be created (design patterns, bad smells, rule violations, duplicated code, etc.), and the domain evaluation aspects and the setting of instance siblings can be customized. Furthermore, for uploading the results of different tools, the benchmark provides an import filter plug-in mechanism.

In the following, we show the steps needed to perform a tool evaluation and comparison task in a concrete domain (e.g. duplicated code) with the help of BEFRIEND; a small illustrative sketch of the data involved follows the list:

1. Add the new domain to the benchmark (e.g. Duplicated Code).

2. Add one or more evaluation criteria with evaluation queries for the new domain (e.g. Procedure abstraction: Is it worth substituting the duplicated code fragments with a new function and function calls?).

3. Upload the results of the tools and set the appropriate sibling relations.

4. Evaluate the uploaded results with the evaluation criteria defined in step 2.

5. By using the statistics and comparison functionality of the benchmark, the tools can be easily evaluated and compared.
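
As a hedged illustration of steps 1-3, the following snippet shows how a domain, an evaluation criterion and one uploaded result instance could be represented as data. BEFRIEND itself is configured through its web interface; the structures, field names and the tool name CloneFinder below are hypothetical.

```python
# Schematic illustration of the data involved in steps 1-3 above.
# All names and structures here are assumptions for illustration only.
domain = {
    'name': 'Duplicated Code',
    'criteria': [
        {
            'name': 'Procedure abstraction',
            'query': 'Is it worth substituting the duplicated code '
                     'fragments with a new function and function calls?',
        },
    ],
}

# One uploaded result instance of a (hypothetical) clone detector.
uploaded_instance = {
    'tool': 'CloneFinder',          # hypothetical tool name
    'domain': 'Duplicated Code',
    'positions': [                  # an instance may span several positions
        {'file': 'src/util.c', 'start_line': 10, 'end_line': 40},
        {'file': 'src/util.c', 'start_line': 200, 'end_line': 230},
    ],
}

print(domain['name'], len(uploaded_instance['positions']))
```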

Benefits. Previously, if one wanted to evaluate and compare reverse engineering tools, he had to examine and search the appropriate source code fragments by hand. For example, he had to traverse several directories and files and search for the exact lines and columns in a source file. Consequently, he could make mistakes and examine the wrong files. Furthermore, he had to store his evaluation results somewhere for later use, which is now automatically supported by BEFRIEND.

To evaluate and compare reverse engineering tools without BEFRIEND, one also has to define evaluation criteria, find test cases, and evaluate the results by hand. BEFRIEND only has the cost of implementing the appropriate plug-in for uploading the results of a tool. However, this cost is small: the plug-ins developed so far contain less than 100 lines of code each.
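
To give an impression of how small such a plug-in can be, here is a hedged sketch of an import filter for a hypothetical clone detector whose output contains one "file;start_line;end_line;group_id" record per line. The input format and function names are assumptions; the actual plug-in interface of BEFRIEND is not described in the paper.

```python
# Hedged sketch of an import filter plug-in. The semicolon-separated input
# format and the entry point are hypothetical; real BEFRIEND plug-ins target
# the output formats of the concrete tools being evaluated.
import csv

def parse_tool_output(path):
    """Parse 'file;start_line;end_line;group_id' records into result instances."""
    instances = []
    with open(path, newline='') as fh:
        for row in csv.reader(fh, delimiter=';'):
            if not row or row[0].startswith('#'):
                continue  # skip blank lines and comments
            file_name, start, end, group = row
            instances.append({
                'file': file_name,
                'start_line': int(start),
                'end_line': int(end),
                'group': group,   # sibling group reported by the tool itself
            })
    return instances

if __name__ == '__main__':
    for inst in parse_tool_output('clonefinder_output.txt'):  # placeholder file
        print(inst)
```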

The evaluation of the tools with BEFRIEND is thus clearly faster, and therefore cheaper, than without it.

5 Conclusion and future work

This work is the first step towards creating a generally applicable benchmark that can help to evaluate and compare many kinds of reverse engineering tools. To achieve this aim and to satisfy all needs, we will need the opinion and advice of reverse engineering tool developers.

In the future, we would like to examine further reverse engineering domains, prepare the benchmark for these domains, and deal with the possible shortcomings.

References

[1] S. Bellon, R. Koschke, G. Antoniol, J. Krinke, and E. Merlo. Comparison and Evaluation of Clone Detection Tools. IEEE Transactions on Software Engineering, 33:577-591, 2007.

[2] L. J. Fülöp, R. Ferenc, and T. Gyimóthy. Towards a Benchmark for Evaluating Design Pattern Miner Tools. In Proceedings of the 12th European Conference on Software Maintenance and Reengineering (CSMR 2008). IEEE Computer Society, Apr. 2008.

[3] N. Pettersson, W. Löwe, and J. Nivre. On Evaluation of Accuracy in Pattern Detection. In First International Workshop on Design Pattern Detection for Reverse Engineering (DPD4RE'06), October 2006.

[4] The Trac Homepage. http://trac.edgewall.org/.

[5] S. Wagner, J. Jürjens, C. Koller, and P. Trischberger. Comparing Bug Finding Tools with Reviews and Tests. In Proceedings of the 17th International Conference on Testing of Communicating Systems (TestCom'05), pages 40-55. Springer, 2005.
