2010 6th International Conference on Emerging Technologies (ICET), Islamabad, Pakistan (2010.10.18-2010.10.19)

Role of Relationships during Clustering of Object-Oriented Software Systems

Siraj Muhammad∗, Onaiza Maqbool†, Abdul Qudus Abbasi‡

∗†Dept. of Computer Science, Quaid-e-Azam University, Islamabad

‡Elixir Technology Pakistan (Pvt) Ltd

Email: ∗[email protected], †[email protected], †[email protected]

Abstract: Clustering has been applied by researchers for the architecture recovery of software systems. Clustering algorithms form clusters of similar entities, where similarity is determined by the characteristics of an entity or the relationships that exist between entities. Thus selecting appropriate relationships is important for improving cluster quality. Compared to structured systems, for which relationships have been evaluated, relatively little work has been done for object-oriented software systems to determine which relationships produce better clustering results. In this paper, we divide relationships within object-oriented systems into different categories and evaluate them. We conduct experiments on three test systems using well-known hierarchical clustering algorithms. Our experimental results indicate which relationships improve the quality of clustering results.

I. INTRODUCTION

It is a well-known fact that software maintenance is an expensive activity. According to Glass [1], maintenance is the single most expensive software activity, and hence perhaps also the most important. Studies report that system understanding takes up 47% or more of the software maintenance effort [2]. These facts have led researchers to explore techniques that ease software system understanding.

System understanding may be gained at the detailed (program) level or at the high (architectural) level. Architecture, which refers to the structure of a system, “comprises software elements, the externally visible properties of those elements, and relationships among them” [3]. Architectural-level understanding is important for many reasons, including determining whether a system is able to fulfill its requirements and to adapt to changing requirements, and enabling the reuse of components [4]. A widely used technique for gaining architectural-level understanding of software systems is clustering, the process of grouping similar entities together. Researchers have employed this technique for gaining architectural understanding [5], [6] and for architecture recovery [4], [7], and have also developed new clustering algorithms for this purpose [8], [9].

When the clustering process is applied, the first step is to identify the entities to be clustered, together with the features (characteristics) or relationships on the basis of which the similarity between entities will be determined. Since clustering results depend on identifying appropriate relationships, there is a need to evaluate which relationships are more important and yield better results.

For structured systems, researchers have evaluated individual relationships [8], [10]. Moreover, they have categorized relationships into direct and indirect and have evaluated both categories. Direct relationships represent an immediate connection between two entities (e.g. if function f1 calls another function f2, f1 and f2 are directly related), whereas indirect relationships represent the proportion of common features that two entities share (e.g. if functions f1 and f2 both call a function f3, then f1 and f2 are indirectly related). Although researchers have worked on clustering object-oriented software systems, there has been little focus on identifying which relationships are more useful and lead to better architectural understanding. Also, relationships in object-oriented systems have not been categorized into direct and indirect. This matters because the number of relationships in object-oriented systems is larger than in structured systems, due to features such as inheritance, so evaluating relationships is of greater significance.
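To make the direct/indirect distinction in the f1, f2, f3 example concrete, the following Python sketch derives both kinds of relationship from a small call graph. The call graph and function names are hypothetical, and expressing the "proportion of common features" as shared callees divided by all callees of the pair (a Jaccard-style ratio) is an assumption made for illustration, not a measure taken from the cited studies.

```python
from itertools import combinations

# Hypothetical call graph mirroring the example in the text:
# f1 calls f2 (direct), and f1 and f2 both call f3 (indirect).
calls = {
    "f1": {"f2", "f3"},
    "f2": {"f3"},
    "f3": set(),
}


def direct_pairs(graph):
    """Unordered pairs of functions where one calls the other."""
    pairs = set()
    for caller, callees in graph.items():
        for callee in callees:
            if callee != caller:
                pairs.add(tuple(sorted((caller, callee))))
    return pairs


def indirect_pairs(graph):
    """Pairs sharing at least one callee, mapped to the proportion of shared
    callees (|common| / |union|, an assumed Jaccard-style measure)."""
    shared = {}
    for a, b in combinations(sorted(graph), 2):
        common = graph[a] & graph[b]
        if common:
            shared[(a, b)] = len(common) / len(graph[a] | graph[b])
    return shared


print(sorted(direct_pairs(calls)))  # [('f1', 'f2'), ('f1', 'f3'), ('f2', 'f3')]
print(indirect_pairs(calls))        # {('f1', 'f2'): 0.5}
```

Only f1 and f2 share a callee (f3), so they are the only indirectly related pair, even though f3 is directly related to both of them.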

In this paper, we identify important relationships that may exist between entities within object-oriented systems. We place these relationships into direct and indirect categories and then evaluate these categories experimentally by clustering three real-life software systems with well-known clustering algorithms. This paper thus addresses the issue of identifying, through experiments, the relationship categories that produce better clustering results, and hence better architectural understanding, for object-oriented systems.

This paper is organized as follows. Section II describes related work. Section III discusses our clustering approach. Section IV describes the experimental setup, results and analysis. Section V presents conclusions and future work.

II. RELATED WORK

For recovering the architecture of software systems, architectural and non-architectural inputs can be used [11]. Non-architectural inputs include static, dynamic, formal and non-formal relationships, whereas architectural inputs are architectural styles and viewpoints.

Static relationships are extracted from the source code and directly impact software behavior. They have been used by researchers for architecture recovery of structured as well as object-oriented systems. Koschke [4] used a number of relationships, e.g. function calls, global variable access and function parameters, for architectural component recovery in structured systems. These relationships are also used in the Bunch tool [5]. For object-oriented systems, static relationships including inheritance and containment have been explored in [12].

978-1-4244-8058-6/10/$26.00 ©2010 IEEE, 2010 6th International Conference on Emerging Technologies (ICET)

TABLE I. CATEGORIES OF RELATIONSHIPS BETWEEN ENTITIES IN AN OBJECT-ORIENTED SYSTEM

| Category | Name | Description | Direct/Indirect |
|---|---|---|---|
| Inheritance based (IC) | Inheritance | The inheritance relationship between a base class and a derived class | D |
| | Same Inheritance Hierarchy | The relationship between classes that are derived from the same class | I |
| | Inheritance Type | The inheritance type that exists between classes, i.e. private, protected or public | D |
| | Virtual Method Override | Virtual methods written in a base class are overridden in a derived class | D |
| | Base Class Variable Access | At least one member variable of the base class is accessed by the derived class | D |
| | Base Class Method Access | At least one method of the base class is accessed by the derived class | D |
| Containment based (CC) | Containment as Object | Formed by declaring an object of a class in another class (container) | D |
| | Same Class Containment | Classes contain objects of the same class | I |
| | Variable Access | At least one public data member of the contained class is accessed by the container class | D |
| | Method Access | At least one method of the contained class is accessed by the container class | D |
| Association based (AC) | Maintaining Pointer | An address variable of one class is declared in another class | D |
| | Maintaining Reference | Formed by declaring a reference variable of one class in another class | D |
| | Method Parameter | A method of a class takes an object/pointer/reference of another class as its parameter | D |
| | Method Local | A method of a class declares an object/pointer of another class as a local variable | D |
| | Same Class in Methods | Classes contain objects of the same class declared in a method, locally or as a parameter | I |
| Files based (FC) | Same File | The source code of both classes is written in the same file | I |
| | Include Source File | The source file of one class includes the source file of the other class using an include statement | D |
| | Same Folder | The files containing the source code of two classes reside in the same folder | I |

Dynamic relationships are determined during program execution. Dynamic information supplements the information obtained statically and thus may also be useful in architecture recovery, as shown by results in [13].

Non-formal relationships do not have a direct impact on the behavior of software. For example, file names and directory structure may be helpful in gaining architectural understanding. Researchers have used different non-formal relationships in their experiments [8], [10].

For structured systems, researchers have divided relationships into direct and indirect and have evaluated these categories separately [14], [15]. Results indicate that indirect relationships produce better results [15].

For object-oriented systems, Abbasi used twenty-six relationships for architecture recovery [16]. He used the relationships together and did not evaluate them individually.

III. OUR CLUSTERING APPROACH

As described in Section I, clustering has been used by many researchers for software architecture recovery of structured [7], [10] as well as object-oriented systems [12], [17]. Clustering is concerned with grouping entities based on their characteristics (features) and/or relationships. For software, an entity may be a file, a function or a class. Relationships represent the dependencies between entities. In the first step of clustering, an m × n feature matrix is formed, where m is the number of entities and n is the number of features representing the relationships between entities. For our experiments we selected classes as entities. We used a subset of the 26 relationships used by Abbasi [16] to find similar entities. The relationships we used are given in Table I, where ‘D’ represents a direct relationship and ‘I’ represents an indirect relationship. We selected this set because it represents commonly used relationships within object-oriented systems.
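A minimal sketch of the m × n feature matrix described above, assuming binary features. The class names and the feature encoding (a relationship kind paired with its target class, e.g. "inherits:Shape") are hypothetical; the paper does not state how its features are encoded.

```python
# Hypothetical classes, each mapped to its binary features. A feature here is
# "relationship kind:target class"; this encoding is an assumption.
class_features = {
    "Circle": {"inherits:Shape", "contains:Point"},
    "Square": {"inherits:Shape", "contains:Point"},
    "Canvas": {"contains:Shape", "param:Point"},
}


def feature_matrix(entity_features):
    """Return (entities, features, matrix) for an m x n 0/1 feature matrix.

    Rows follow the sorted entity names (m rows) and columns the sorted
    feature names (n columns); a cell is 1 if the entity has the feature.
    """
    entities = sorted(entity_features)
    features = sorted(set().union(*entity_features.values()))
    matrix = [[1 if f in entity_features[e] else 0 for f in features]
              for e in entities]
    return entities, features, matrix


entities, features, matrix = feature_matrix(class_features)
print(len(entities), "x", len(features))
for name, row in zip(entities, matrix):
    print(f"{name:<8}", row)
```

Circle and Square end up with identical rows, which is the kind of shared-feature evidence an indirect relationship captures.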

Different similarity measures can be used to find the similarity between entities, e.g. Euclidean distance and the Jaccard coefficient [14]. To evaluate both direct and indirect relationships using the same similarity measure, we used an objective function that counts the number of relationships that exist between entities. A greater number of relationships between two entities indicates a higher similarity between them.
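The count-based objective function can be sketched as follows, under stated assumptions: relationship facts are taken to be (kind, class_a, class_b) triples produced by some fact extractor, the kind identifiers are hypothetical names for a few rows of Table I, and each fact adds one to the similarity of its pair. Restricting `keep` to "D" or "I" mimics evaluating direct or indirect relationships separately.

```python
from collections import Counter

# Direct (D) / indirect (I) labels for a few Table I rows. The identifiers
# are hypothetical names chosen for this sketch.
KIND_TYPE = {
    "inheritance": "D",
    "same_inheritance_hierarchy": "I",
    "containment_as_object": "D",
    "method_parameter": "D",
    "same_file": "I",
}

# Hypothetical extracted facts: (relationship kind, class_a, class_b).
facts = [
    ("inheritance", "Shape", "Circle"),
    ("inheritance", "Shape", "Square"),
    ("same_inheritance_hierarchy", "Circle", "Square"),
    ("same_file", "Circle", "Square"),
    ("containment_as_object", "Canvas", "Shape"),
    ("method_parameter", "Canvas", "Circle"),
]


def similarity_matrix(facts, entities, keep=("D", "I")):
    """Similarity of two entities = number of selected relationships between them."""
    counts = Counter()
    for kind, a, b in facts:
        if KIND_TYPE[kind] in keep and a != b:
            counts[frozenset((a, b))] += 1  # unordered pair
    return [[counts[frozenset((x, y))] if x != y else 0 for y in entities]
            for x in entities]


entities = ["Canvas", "Circle", "Shape", "Square"]
sim_all = similarity_matrix(facts, entities)
sim_direct = similarity_matrix(facts, entities, keep=("D",))
sim_indirect = similarity_matrix(facts, entities, keep=("I",))
for name, row in zip(entities, sim_all):
    print(f"{name:<7}", row)
```

Because every relationship counts once, the "All" matrix is simply the sum of the direct and indirect matrices, which keeps the two categories comparable under one measure.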

After the similarity matrix is produced, a clustering algorithm is applied to group similar entities. Hierarchical agglomerative clustering algorithms, which merge the two most similar entities (or clusters) at every step, are commonly used because they can represent the hierarchical structure of a software system’s architecture. The algorithm continues merging until all entities are in a single cluster or the specified number of clusters has been formed. We used the well-known hierarchical agglomerative clustering algorithms Complete linkage, Weighted average and Unweighted average [18].
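The three algorithms differ only in how the similarity between a newly merged cluster and every other cluster is updated. The sketch below uses the standard update rules, assuming similarities (higher means closer) rather than distances: Complete linkage keeps the smaller of the two similarities, Weighted average takes their plain mean, and Unweighted average weights each side by its cluster size. The function name, the tie-breaking rule and the toy matrix are illustrative choices, not details from the paper.

```python
def agglomerate(sim, labels, method="complete", n_clusters=1):
    """Similarity-based hierarchical agglomerative clustering (a sketch).

    method: "complete" (Complete linkage), "weighted" (Weighted average,
    WPGMA) or "average" (Unweighted average, UPGMA). Merging stops when
    n_clusters clusters remain; clusters are returned as lists of labels.
    """
    if method not in ("complete", "weighted", "average"):
        raise ValueError(f"unknown method: {method}")
    clusters = {k: [label] for k, label in enumerate(labels)}
    # Similarity between clusters a < b, keyed by their ids.
    pair_sim = {(a, b): sim[a][b] for a in clusters for b in clusters if a < b}
    next_id = len(labels)
    while len(clusters) > max(n_clusters, 1):
        # Merge the most similar pair; on ties the earliest pair wins.
        a, b = max(pair_sim, key=pair_sim.get)
        size_a, size_b = len(clusters[a]), len(clusters[b])
        merged = clusters.pop(a) + clusters.pop(b)
        for k in clusters:
            s_a = pair_sim.pop((min(a, k), max(a, k)))
            s_b = pair_sim.pop((min(b, k), max(b, k)))
            if method == "complete":    # least similar members decide
                new = min(s_a, s_b)
            elif method == "weighted":  # each side counts equally
                new = (s_a + s_b) / 2
            else:                       # "average": weight by cluster size
                new = (size_a * s_a + size_b * s_b) / (size_a + size_b)
            pair_sim[(k, next_id)] = new
        del pair_sim[(a, b)]
        clusters[next_id] = merged
        next_id += 1
    return list(clusters.values())


# Toy similarity matrix (e.g. relationship counts) for four classes.
labels = ["A", "B", "C", "D"]
sim = [
    [0, 5, 1, 0],
    [5, 0, 2, 1],
    [1, 2, 0, 4],
    [0, 1, 4, 0],
]
for method in ("complete", "weighted", "average"):
    print(method, agglomerate(sim, labels, method, n_clusters=2))
```

Stopping at a fixed number of clusters corresponds to cutting the hierarchy; recording the clusters after every merge instead would give the per-step decompositions that are compared with the expert decompositions.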

To assess the results produced by the clustering algorithms, we compared them with architectures produced manually by human experts. To reduce bias, three expert decompositions were prepared for each test system. The results of the algorithms were compared with the expert decompositions using the MoJoFM assessment measure [19], which is the latest version of the MoJo measure [20]. The value of MoJoFM lies between 0 and 100, where 0 indicates no similarity between the decompositions being compared and 100 indicates total similarity. The decomposition produced automatically was compared at every step with each of the three expert decompositions, and the MoJoFM values thus obtained were then averaged.
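For reference, MoJoFM as defined by Wen and Tzerpos [19] for an automatically produced decomposition A and an expert decomposition B is given below. Here mno(A, B) is the minimum number of Move and Join operations needed to transform A into B, and the denominator is the largest such number over all decompositions of the same entities. The second line is one reading of the averaging step described above, for a decomposition A at a given step and expert decompositions E_1, E_2, E_3; this notation is introduced here for illustration only.

$$
\mathrm{MoJoFM}(A, B) = \left(1 - \frac{mno(A, B)}{\max\bigl(mno(\forall A, B)\bigr)}\right) \times 100\%
$$

$$
\overline{\mathrm{MoJoFM}}(A) = \frac{1}{3} \sum_{k=1}^{3} \mathrm{MoJoFM}(A, E_k)
$$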

IV. EXPERIMENTAL SETUP AND RESULTS

A. Dataset Description

For our experiments we selected three software systems developed in Visual C++. These are proprietary systems developed by a software company with a fairly large customer base. SAVT helps in the analysis and visualization of statistical data. FES is a fact extractor: it reads C++ source code files, extracts entities, and finds relationships among the entities. PEDS is related to electrical power systems; it solves the economic power dispatch problem using conventional and evolutionary computing techniques. An overview of these systems is provided in Table II and Table III.

TABLE II. TEST SOFTWARE SYSTEMS

| S. No. | | SAVT | FES | PEDS |
|---|---|---|---|---|
| 1 | Total number of source code lines | 27311 | 10402 | 16360 |
| 2 | Total number of header (.h) files | 70 | 39 | 31 |
| 3 | Total number of implementation (.cpp, .cxx) files | 76 | 37 | 27 |
| 4 | Total number of classes | 97 | 47 | 41 |

TABLE III. RELATIONSHIPS WITHIN TEST SYSTEMS (D - DIRECT RELATIONSHIPS, I - INDIRECT RELATIONSHIPS)

| Relationship specification | SAVT | FES | PEDS |
|---|---|---|---|
| Total relationships among classes | 5201 | 1229 | 473 |
| Inheritance based (IC) | 1242 | 365 | 180 |
| Inheritance depth (D) | 26 | 54 | 13 |
| Same inheritance hierarchy (I) | 986 | 166 | 70 |
| Inheritance type (D) | 26 | 54 | 13 |
| Virtual method override (D) | 21 | 12 | 6 |
| Base class variable access (D) | 100 | 33 | 43 |
| Base class method access (D) | 83 | 46 | 35 |
| Containment based (CC) | 1199 | 143 | 61 |
| Containment as object (D) | 41 | 26 | 12 |
| Same class containment (I) | 1032 | 56 | 12 |
| Variable access (D) | 49 | 23 | 20 |
| Method access (D) | 77 | 38 | 17 |
| Association based (AC) | 2171 | 514 | 136 |
| Maintaining pointer (D) | 41 | 11 | 9 |
| Maintaining reference (D) | 0 | 0 | 0 |
| Method parameter (D) | 77 | 63 | 22 |
| Method local (D) | 153 | 56 | 29 |
| Same class in methods (I) | 1900 | 384 | 76 |
| File and folder based (FFC) | 528 | 84 | 72 |
| Same file (I) | 264 | 42 | 36 |
| Include source file (D) | 264 | 42 | 36 |
| Same folder (I) | 0 | 0 | 0 |
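The category subtotals in Table III can be cross-checked against their rows, and the share of indirect relationships in each category computed, with a few lines of Python. The counts are transcribed from the table; the variable names are illustrative. The low indirect share for containment in PEDS (12 of 61) is relevant to the category-wise results discussed in Section IV-B.

```python
# Counts transcribed from Table III, one tuple per row: (SAVT, FES, PEDS).
SYSTEMS = ("SAVT", "FES", "PEDS")
TABLE_III = {
    "IC": {"total": (1242, 365, 180),
           "D": [(26, 54, 13), (26, 54, 13), (21, 12, 6), (100, 33, 43), (83, 46, 35)],
           "I": [(986, 166, 70)]},
    "CC": {"total": (1199, 143, 61),
           "D": [(41, 26, 12), (49, 23, 20), (77, 38, 17)],
           "I": [(1032, 56, 12)]},
    "AC": {"total": (2171, 514, 136),
           "D": [(41, 11, 9), (0, 0, 0), (77, 63, 22), (153, 56, 29)],
           "I": [(1900, 384, 76)]},
    "FFC": {"total": (528, 84, 72),
            "D": [(264, 42, 36)],
            "I": [(264, 42, 36), (0, 0, 0)]},
}


def column_sums(rows):
    """Sum a list of (SAVT, FES, PEDS) tuples column by column."""
    return tuple(sum(col) for col in zip(*rows))


indirect_share = {}
for cat, data in TABLE_III.items():
    direct, indirect = column_sums(data["D"]), column_sums(data["I"])
    # Each category subtotal should equal its direct plus indirect rows.
    assert tuple(d + i for d, i in zip(direct, indirect)) == data["total"], cat
    indirect_share[cat] = {s: round(i / t, 2)
                           for s, i, t in zip(SYSTEMS, indirect, data["total"])}

for cat, shares in indirect_share.items():
    print(cat, shares)
```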

B. Experimental Results

We conducted different sets of experiments to evaluate relationship categories. These are described in the following subsections.

1) Direct and indirect relationships: As described in Section II, indirect relationships have shown better results than direct relationships for structured systems. To evaluate the performance of indirect relationships for object-oriented systems, in our first set of experiments we combined all indirect relationships in Table III and clustered the software systems. The results were compared with those obtained by combining all direct relationships. The experimental results using MoJoFM are given in Table IV, where “All” indicates combined direct and indirect relationships.

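TABLE IV. RESULTS OF DIRECT, INDIRECT AND ALL RELATIONSHIPS USING MOJOFM (D - DIRECT, I - INDIRECT, CL - COMPLETE LINKAGE, WA - WEIGHTED AVERAGE, UWA - UNWEIGHTED AVERAGE)

| System | CL D | CL I | CL All | WA D | WA I | WA All | UWA D | UWA I | UWA All |
|---|---|---|---|---|---|---|---|---|---|
| SAVT | 32 | 43 | 46 | 32 | 43 | 42 | 35 | 30 | 45 |
| FES | 30 | 42 | 37 | 45 | 47 | 44 | 41 | 50 | 41 |
| PEDS | 35 | 55 | 32 | 34 | 52 | 34 | 34 | 48 | 36 |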

Fig. 1. Experimental Results for Direct, Indirect and All Relationships

It can be seen from Table IV and Figure 1 that indirect relationships produce better results than direct relationships for all algorithms and test systems except one case (SAVT with the Unweighted average algorithm). This indicates that the information contained within indirect relationships is more meaningful for architectural understanding: the higher MoJoFM values show that it is closer to the way human experts view the system.

An interesting observation is that indirect relationships also produce better results than All relationships in seven of the nine cases. This is contrary to results obtained earlier for structured systems, where including a larger number of relationships improved results [14]. Our results indicate that combining direct and indirect relationships may degrade results, so it is important to differentiate between them, especially for object-oriented systems.

Fig. 2. Experimental results of category-wise direct and indirect relationships using MoJoFM
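The comparisons drawn from Table IV can be checked directly against its values. This small Python sketch, using numbers transcribed from the table, lists the cells where indirect relationships do not beat direct ones and counts the cells where indirect relationships beat All.

```python
# MoJoFM values transcribed from Table IV: system -> algorithm -> (D, I, All).
TABLE_IV = {
    "SAVT": {"CL": (32, 43, 46), "WA": (32, 43, 42), "UWA": (35, 30, 45)},
    "FES": {"CL": (30, 42, 37), "WA": (45, 47, 44), "UWA": (41, 50, 41)},
    "PEDS": {"CL": (35, 55, 32), "WA": (34, 52, 34), "UWA": (34, 48, 36)},
}


def cases(predicate):
    """(system, algorithm) cells whose (D, I, All) values satisfy predicate."""
    return [(system, alg) for system, row in TABLE_IV.items()
            for alg, vals in row.items() if predicate(*vals)]


indirect_not_above_direct = cases(lambda d, i, a: i <= d)
indirect_above_all = cases(lambda d, i, a: i > a)
print(indirect_not_above_direct)        # [('SAVT', 'UWA')]
print(len(indirect_above_all), "of 9")  # 7 of 9
```

The two cells where All beats indirect relationships are SAVT with Complete linkage and SAVT with Unweighted average.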

2) Category-wise direct and indirect relationships: To gain further insight, we conducted experiments dividing the relationships into the four categories (inheritance, containment, association, and file and folder based) presented in Table III. Within each category, we divided relationships into direct and indirect. Experimental results are given in Table V.

It can be seen from Table V and Figure 2 that, in general, indirect relationships give better results than direct relationships in each category. For FES, results for indirect relationships are better than or the same as for direct ones, except for Complete linkage in the containment category. For SAVT, results for indirect relationships are better than or the same as for direct ones in all cases except Unweighted average in the association category. The same holds for PEDS in the inheritance, association, and file and folder categories, where direct relationships are slightly better than indirect ones only for Unweighted average in the association category. However, all three algorithms produce better results for direct relationships in the containment category for PEDS. This may be due to the small number of indirect relationships in the containment-based category (12 out of 61, as can be seen from Table III). Such a small number may provide insufficient information for an algorithm to produce meaningful clusters.

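TABLE V. CATEGORY-WISE COMPARISON BETWEEN CLUSTERING ALGORITHM AND EXPERT DECOMPOSITION RESULTS USING MOJOFM (D - DIRECT, I - INDIRECT, CL - COMPLETE LINKAGE, WA - WEIGHTED AVERAGE, UWA - UNWEIGHTED AVERAGE)

| Category | System | CL D | CL I | WA D | WA I | UWA D | UWA I |
|---|---|---|---|---|---|---|---|
| Inheritance (IC) | SAVT | 27 | 37 | 28 | 37 | 28 | 42 |
| | FES | 31 | 38 | 31 | 37 | 33 | 45 |
| | PEDS | 24 | 35 | 25 | 43 | 25 | 49 |
| Containment (CC) | SAVT | 30 | 32 | 37 | 42 | 33 | 49 |
| | FES | 33 | 32 | 37 | 42 | 45 | 49 |
| | PEDS | 29 | 26 | 36 | 26 | 43 | 26 |
| Association (AC) | SAVT | 29 | 29 | 34 | 35 | 39 | 31 |
| | FES | 32 | 34 | 29 | 37 | 30 | 34 |
| | PEDS | 26 | 27 | 28 | 28 | 41 | 40 |
| File and Folder (FFC) | SAVT | 28 | 28 | 31 | 31 | 34 | 34 |
| | FES | 32 | 32 | 37 | 37 | 36 | 36 |
| | PEDS | 28 | 28 | 38 | 38 | 38 | 38 |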
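The exceptions listed for Table V can likewise be recovered by scanning for cells where the direct value exceeds the indirect one. The values below are transcribed from Table V; ties (such as every File and folder cell) are counted separately rather than as exceptions.

```python
# MoJoFM values transcribed from Table V: (category, system) -> {algorithm: (D, I)}.
TABLE_V = {
    ("IC", "SAVT"): {"CL": (27, 37), "WA": (28, 37), "UWA": (28, 42)},
    ("IC", "FES"): {"CL": (31, 38), "WA": (31, 37), "UWA": (33, 45)},
    ("IC", "PEDS"): {"CL": (24, 35), "WA": (25, 43), "UWA": (25, 49)},
    ("CC", "SAVT"): {"CL": (30, 32), "WA": (37, 42), "UWA": (33, 49)},
    ("CC", "FES"): {"CL": (33, 32), "WA": (37, 42), "UWA": (45, 49)},
    ("CC", "PEDS"): {"CL": (29, 26), "WA": (36, 26), "UWA": (43, 26)},
    ("AC", "SAVT"): {"CL": (29, 29), "WA": (34, 35), "UWA": (39, 31)},
    ("AC", "FES"): {"CL": (32, 34), "WA": (29, 37), "UWA": (30, 34)},
    ("AC", "PEDS"): {"CL": (26, 27), "WA": (28, 28), "UWA": (41, 40)},
    ("FFC", "SAVT"): {"CL": (28, 28), "WA": (31, 31), "UWA": (34, 34)},
    ("FFC", "FES"): {"CL": (32, 32), "WA": (37, 37), "UWA": (36, 36)},
    ("FFC", "PEDS"): {"CL": (28, 28), "WA": (38, 38), "UWA": (38, 38)},
}

# Cells where direct relationships score strictly higher than indirect ones.
exceptions = [(cat, system, alg)
              for (cat, system), row in TABLE_V.items()
              for alg, (d, i) in row.items() if d > i]

# Cells where both kinds of relationship score the same.
ties = [(cat, system, alg)
        for (cat, system), row in TABLE_V.items()
        for alg, (d, i) in row.items() if d == i]

for cell in exceptions:
    print("direct > indirect:", cell)
print(len(ties), "ties")
```

The scan returns the containment cell for FES under Complete linkage, all three containment cells for PEDS, and the association cells for SAVT and PEDS under Unweighted average, matching the cases discussed in the text.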

Figure 2 shows that, in the File and folder category, direct and indirect relationships give the same results for every algorithm and test system. This is because the number of direct and indirect relationships is the same in this category (Table III). Moreover, these relationships exist between the same entities, which leads to the same clustering results.

Fig. 3. Category-wise results using MoJoFM

3) Category-wise relationships: Table VI and Figure 3 present category-wise results for the inheritance, containment, association, and file and folder based categories for the three clustering algorithms. It can be seen from Table VI and Figure 3 that no single category produces better results for all test systems. Based on the results of two out of three algorithms, the inheritance category performs better for SAVT, the containment category performs better for FES and the association category performs better for PEDS. From the experimental results, it appears that the results of individual categories depend on system structure. For example, for SAVT, the inheritance relationship has played a major role in arriving at a subsystem structure. Thus, even though the results do not show that one category consistently performs better, they are useful because they may be used to gain insight into the design of a system.

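TABLE VI. CATEGORY-WISE RESULTS FOR COMPLETE LINKAGE, WEIGHTED AVERAGE AND UNWEIGHTED AVERAGE USING MOJOFM (IC - INHERITANCE, CC - CONTAINMENT, AC - ASSOCIATION, FC - FILE AND FOLDER)

| Algorithm | System | IC | CC | AC | FC |
|---|---|---|---|---|---|
| CL | SAVT | 37 | 33 | 29 | 28 |
| | FES | 36 | 34 | 36 | 32 |
| | PEDS | 27 | 26 | 26 | 28 |
| WA | SAVT | 37 | 37 | 31 | 31 |
| | FES | 41 | 50 | 34 | 37 |
| | PEDS | 28 | 31 | 43 | 38 |
| UWA | SAVT | 34 | 37 | 32 | 34 |
| | FES | 35 | 45 | 38 | 36 |
| | PEDS | 28 | 42 | 43 | 38 |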
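To see how the per-system conclusions relate to Table VI, the following sketch records, for each algorithm and system, which category attains the highest MoJoFM value, and tallies these per system. The values are transcribed from Table VI; treating a tie as a win for every tied category is an assumption of this sketch.

```python
from collections import Counter

CATEGORIES = ("IC", "CC", "AC", "FC")
# MoJoFM values transcribed from Table VI: algorithm -> system -> (IC, CC, AC, FC).
TABLE_VI = {
    "CL": {"SAVT": (37, 33, 29, 28), "FES": (36, 34, 36, 32), "PEDS": (27, 26, 26, 28)},
    "WA": {"SAVT": (37, 37, 31, 31), "FES": (41, 50, 34, 37), "PEDS": (28, 31, 43, 38)},
    "UWA": {"SAVT": (34, 37, 32, 34), "FES": (35, 45, 38, 36), "PEDS": (28, 42, 43, 38)},
}


def best_categories(scores):
    """Categories sharing the highest score (every tied category is kept)."""
    top = max(scores)
    return [cat for cat, s in zip(CATEGORIES, scores) if s == top]


tally = {system: Counter() for system in ("SAVT", "FES", "PEDS")}
for alg, row in TABLE_VI.items():
    for system, scores in row.items():
        winners = best_categories(scores)
        tally[system].update(winners)
        print(alg, system, winners)

for system, counts in tally.items():
    print(system, dict(counts))
```

Ties affect the tally: under Complete linkage, IC and AC both score 36 for FES, and under Weighted average, IC and CC both score 37 for SAVT, so for SAVT containment also reaches the top value in two of the three algorithms when ties are counted.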

V. CONCLUSION AND FUTURE WORK

Clustering has been applied by various researchers for gaining architectural understanding and recovering the architecture of software systems. Relationships play a very important role during clustering, since they are used to determine the similarity between the entities to be clustered. For structured systems, relationships have been evaluated and categorized into direct and indirect. For object-oriented systems, the number of relationships is larger than for structured systems due to features such as inheritance, so it is important to evaluate which relationships are more useful. However, no attempt has been made to determine the usefulness of these relationships.

In this paper we divided relationships in object-oriented systems into direct and indirect, and evaluated different categories of relationships. From our experimental results, we conclude that, in general, indirect relationships produce better results than direct relationships. An evaluation of the inheritance, containment, association, and file and folder based categories reveals that no single category produces better results for all datasets. Thus clustering results depend upon the structure of the software system.

In the future, we intend to evaluate different combinations of relationships. The relationships may also be evaluated for other datasets and algorithms, and the role of relationships may be explored for refactoring.

REFERENCES

[1] R. L. Glass, “Frequently forgotten fundamental facts about software engineering,” IEEE Software, vol. 18, no. 3, pp. 111–112, May/Jun 2001.

[2] R. Hall, “Seven ways to cut software maintenance costs,” Datamation, vol. 33, no. 14, pp. 81–83, 1987.

[3] L. Bass, P. Clements and R. Kazman, Software Architecture in Practice, 2nd ed. Pearson Education, 2004.

[4] R. Koschke, “Atomic architectural component recovery for program understanding and evolution,” Ph.D. dissertation, Institut für Informatik, Universität Stuttgart, 2000.

[5] B. S. Mitchell and S. Mancoridis, “On the automatic modularization of software systems using the Bunch tool,” IEEE Trans. Software Eng., vol. 32, no. 3, pp. 193–208, March 2006.

[6] V. Tzerpos, “Comprehension driven software clustering,” Ph.D. dissertation, Graduate Department of Computer Science, University of Toronto, 2001.

[7] O. Maqbool and H. A. Babri, “Hierarchical clustering for software architecture recovery,” IEEE Trans. Software Eng., vol. 33, no. 11, pp. 759–780, November 2007.

[8] P. Andritsos and V. Tzerpos, “Information theoretic software clustering,” IEEE Trans. Software Eng., vol. 31, no. 2, pp. 150–165, February 2005.

[9] O. Maqbool and H. A. Babri, “The weighted combined algorithm: a linkage algorithm for software clustering,” Proc. Int’l Conf. Software Maintenance and Reeng., pp. 15–24, 2004.

[10] N. Anquetil and T. C. Lethbridge, “Recovering software architecture from the names of source files,” Journal of Software Maintenance: Research and Practice, vol. 11, pp. 201–221, December 1999.

[11] S. Ducasse and D. Pollet, “Software architecture reconstruction: A process-oriented taxonomy,” IEEE Trans. Software Eng., vol. 35, no. 4, pp. 573–591, July-Aug 2009.

[12] M. Trifu, “Architecture-aware, adaptive clustering of object-oriented systems,” Master’s thesis, Forschungszentrum Informatik Karlsruhe, 2003.

[13] C. Xiao and V. Tzerpos, “Software clustering based on dynamic dependencies,” Proc. Int’l Conf. Software Maintenance and Reeng., pp. 124–133, 2005.

[14] J. Davey and E. Burd, “Evaluating the suitability of data clustering for software remodularisation,” Proc. Working Conf. Reverse Eng., pp. 268–276, November 2000.

[15] N. Anquetil and T. C. Lethbridge, “Experiments with clustering as a software remodularization method,” Proc. Working Conf. Reverse Eng., pp. 235–255, 1999.

[16] A. Q. Abbasi, “Application of appropriate machine learning techniques for automatic modularization of software systems,” MPhil. thesis, Quaid-e-Azam University, Islamabad, 2008.

[17] T. Systä, “Static and dynamic reverse engineering techniques for Java software systems,” Ph.D. dissertation, University of Tampere, 2000.

[18] J. Han and M. Kamber, Data Mining: Concepts and Techniques. Morgan Kaufmann, 2006.

[19] Z. Wen and V. Tzerpos, “An effectiveness measure for software clustering algorithms,” Proc. Int’l Workshop Program Comprehension, pp. 194–203, June 2004.

[20] V. Tzerpos and R. C. Holt, “MoJo: A distance metric for software clusterings,” Proc. Working Conf. Reverse Eng., pp. 187–193, October 1999.