Modeling the Search Landscape of Metaheuristic Software Clustering Algorithms

Dagstuhl – Software Architecture. Brian S. Mitchell, [email protected] or http://www.mcs.drexel.edu/~bmitchel, Department of Computer Science, College of Engineering, Drexel University, Philadelphia, PA 19104, USA


Page 1: Modeling the Search Landscape  of Metaheuristic  Software Clustering Algorithms


Modeling the Search Landscape of Metaheuristic Software Clustering Algorithms

Dagstuhl – Software Architecture

Brian S. Mitchell
[email protected] or http://www.mcs.drexel.edu/~bmitchel
Department of Computer Science, College of Engineering
Drexel University, Philadelphia, PA 19104, USA


Drexel University Software Engineering Research Group (SERG), http://serg.mcs.drexel.edu

Understanding Large Systems is HARD

Example: RedHat Linux 7.1
• Kernel: 1,400 modules, 2.5M LOC
• System: 350K modules, 30M LOC
• Languages: > 19 (including scripting)
[http://www.dwheeler.com/sloc]

(1) Manual Analysis is Tedious and Error Prone
(2) Source Code Analysis Approaches Create Large Repositories
(3) Software Clustering Approaches Create Abstract Representations


Software Clustering

[Figure: the Bunch Tool]

Requires a Representation…
…A Clustering Algorithm…
…And a Way to Represent Results…

Researchers Have Examined Many Different Approaches for Software Clustering


Search-Based Software Clustering with Bunch

Preparation Phase
• Source Code → Source Code Analysis (e.g., cia, Acacia) → Software Structure Graph Generation (e.g., MDG)

Clustering Phase
• MDG → Search Space → Metaheuristic Search Software Clustering Algorithms (e.g., Bunch)

Analysis & Visualization Phase
• Visualization
• Additional Analysis, with the clustered result shown as GXL:
    <gxl>
      <graph id="G1">
        <node id="C1">
          <node id="M1"/>
          <node id="M2"/>
          <edge from="M1" to="M2"/>
          ...
        </node>
      </graph>
    </gxl>

Bunch Uses Metaheuristic Search Algorithms for Software Clustering
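The slides use MQ as Bunch’s objective function but do not define it. As a minimal sketch, assuming the TurboMQ formulation from the Bunch literature, each cluster i contributes a cluster factor CF_i = 2μ_i / (2μ_i + Σ_j (ε_ij + ε_ji)), where μ_i counts edges inside cluster i and ε_ij counts edges from cluster i to cluster j, and MQ is the sum of the CF_i. The class and method names are hypothetical (not Bunch’s API), and edge weights are ignored.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not Bunch's API): TurboMQ-style MQ for an unweighted MDG. */
final class MqSketch {

    /**
     * @param edges     directed MDG edges, each given as {from, to} module indices
     * @param partition partition[m] is the cluster id assigned to module m
     * @return the sum of the cluster factors CF_i over all clusters
     */
    static double mq(int[][] edges, int[] partition) {
        Map<Integer, Integer> intra = new HashMap<>(); // mu_i: edges inside cluster i
        Map<Integer, Integer> inter = new HashMap<>(); // epsilon terms: edges crossing cluster i's boundary
        for (int[] e : edges) {
            int from = partition[e[0]];
            int to = partition[e[1]];
            if (from == to) {
                intra.merge(from, 1, Integer::sum);
            } else {
                // An inter-edge counts against both clusters (epsilon_ij for one, epsilon_ji for the other).
                inter.merge(from, 1, Integer::sum);
                inter.merge(to, 1, Integer::sum);
            }
        }
        double total = 0.0;
        // Clusters with no intra-edges have CF_i = 0, so only clusters present in 'intra' contribute.
        for (Map.Entry<Integer, Integer> entry : intra.entrySet()) {
            int mu = entry.getValue();
            int eps = inter.getOrDefault(entry.getKey(), 0);
            total += (2.0 * mu) / (2.0 * mu + eps);
        }
        return total;
    }

    public static void main(String[] args) {
        // Two triangles (modules 0-2 and 3-5) joined by a single edge 2 -> 3.
        int[][] edges = {{0, 1}, {1, 2}, {2, 0}, {3, 4}, {4, 5}, {5, 3}, {2, 3}};
        int[] twoClusters = {0, 0, 0, 1, 1, 1};
        int[] oneCluster = {0, 0, 0, 0, 0, 0};
        System.out.println("two clusters: " + mq(edges, twoClusters)); // about 1.714 (= 12/7)
        System.out.println("one cluster:  " + mq(edges, oneCluster));  // 1.0
    }
}
```

In this toy graph the two-cluster partition scores higher than putting every module into one cluster, which is the kind of difference a search over partitions can exploit.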


Bunch Example

[Figure: three panels labeled The MDG, The Random Start Point, and The Solution]
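The example moves from a random start point to a solution. The following is a simplified sketch of that idea, not Bunch’s actual algorithm: starting from a random partition, it repeatedly moves one module to a different cluster whenever the move raises MQ, and it stops at a local optimum. So that it runs on its own, it re-implements a compact TurboMQ-style objective (an assumption, since the slides do not define MQ). All names are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch (not Bunch's implementation): hill climbing on MQ from a random start. */
final class HillClimbSketch {

    // Assumed TurboMQ-style objective: sum over clusters of 2*mu / (2*mu + eps).
    // Cluster ids are kept in [0, n), so plain arrays suffice.
    static double mq(int[][] edges, int[] part, int n) {
        int[] intra = new int[n];
        int[] inter = new int[n];
        for (int[] e : edges) {
            int a = part[e[0]];
            int b = part[e[1]];
            if (a == b) {
                intra[a]++;
            } else {
                inter[a]++;
                inter[b]++;
            }
        }
        double sum = 0.0;
        for (int c = 0; c < n; c++) {
            if (intra[c] > 0) {
                sum += 2.0 * intra[c] / (2.0 * intra[c] + inter[c]);
            }
        }
        return sum;
    }

    /** First-improvement climbing over single-module moves; stops at a local optimum. */
    static int[] climb(int n, int[][] edges, long seed) {
        Random rnd = new Random(seed);
        int[] part = new int[n];
        for (int m = 0; m < n; m++) {
            part[m] = rnd.nextInt(n); // random start point
        }
        double best = mq(edges, part, n);
        boolean improved = true;
        while (improved) {
            improved = false;
            for (int m = 0; m < n && !improved; m++) {
                int original = part[m];
                for (int c = 0; c < n; c++) { // includes currently empty cluster ids
                    if (c == original) {
                        continue;
                    }
                    part[m] = c;
                    double candidate = mq(edges, part, n);
                    if (candidate > best + 1e-12) { // strict gain, so the loop terminates
                        best = candidate;
                        improved = true;
                        break; // keep this move and rescan from module 0
                    }
                    part[m] = original; // undo a non-improving move
                }
            }
        }
        return part;
    }

    public static void main(String[] args) {
        int[][] edges = {{0, 1}, {1, 2}, {2, 0}, {3, 4}, {4, 5}, {5, 3}, {2, 3}};
        int[] result = climb(6, edges, 42L);
        System.out.println(Arrays.toString(result) + " MQ=" + mq(edges, result, 6));
    }
}
```

Different seeds may end at different local optima, which is one reason repeated runs are worth comparing.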


Evaluating Bunch’s Results

• Observation: Bunch produces similar results across runs
   • This is desirable, but
   • This is unexpected, considering the use of metaheuristic search algorithms
• Some evaluation has been done:
   • “Good enough” via empirical studies
   • Similarity analysis [WCRE01, ICSM01]
   • Comparison to spectral clustering techniques [WCRE02]
• We were intrigued to investigate why Bunch’s results are consistently similar

Bunch Produces a “Family” of Related Results


The Search Landscape

Search Landscape Modeler: MDG → Bunch Tool → Clustering Results
• Structural Landscape: What are some common properties, if any, in the MDG partitions?
• Similarity Landscape: How similar are the contents of the MDG partitions?

Cluster a System Many Times, Look for Patterns in the Clustering Results that Provide Insight into the Search Space

Can Modeling the Search Space Be Useful for Evaluation?


The Structural Landscape – What do we Expect?

The Structural Landscape is Modeled Using a Series of Views:

• MQ vs. Number of Clusters: We expect to see a relationship between MQ and the number of clusters. Both MQ and the number of clusters in the partitioned MDG should not vary widely across clustering runs.
• Intra-Edge Density: We expect a good result to consistently produce a high percentage of intra-edges (edges that start and end in the same cluster).
• MQ Value: We expect repeated clustering runs to produce similar MQ results.
• Number of Clusters: We expect the number of clusters to remain relatively consistent across multiple clustering runs.

Comparing Bunch’s Final Results Against the Initial Randomly Partitioned MDG
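Two of these views can be computed directly from a single partitioned MDG: the intra-edge percentage (|IntraEdges|/|E|)% and the number of clusters. This is a minimal sketch, assuming a partition is stored as an array that maps each module index to a cluster id. The class and method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical helpers for two structural-landscape views. */
final class StructuralViews {

    /** (|IntraEdges| / |E|) as a percentage: edges whose endpoints share a cluster. */
    static double intraEdgePercent(int[][] edges, int[] partition) {
        if (edges.length == 0) {
            return 0.0;
        }
        int intra = 0;
        for (int[] e : edges) {
            if (partition[e[0]] == partition[e[1]]) {
                intra++;
            }
        }
        return 100.0 * intra / edges.length;
    }

    /** Number of distinct (non-empty) clusters in the partition. */
    static int clusterCount(int[] partition) {
        return (int) Arrays.stream(partition).distinct().count();
    }

    public static void main(String[] args) {
        int[][] edges = {{0, 1}, {1, 2}, {2, 0}, {3, 4}, {4, 5}, {5, 3}, {2, 3}};
        int[] clustered = {0, 0, 0, 1, 1, 1}; // a "final result"-style partition
        int[] random = {0, 1, 2, 0, 1, 2};    // a scattered, random-looking partition
        System.out.printf("clustered: %.1f%% intra-edges, %d clusters%n",
                intraEdgePercent(edges, clustered), clusterCount(clustered));
        System.out.printf("random:    %.1f%% intra-edges, %d clusters%n",
                intraEdgePercent(edges, random), clusterCount(random));
    }
}
```

Computing both metrics for Bunch’s final partition and for the initial random partition of the same MDG gives the black/gray pairs that the views compare.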


The Similarity Landscape – What do we Expect?

[Figure: a CLUSTER containing nodes a, b, and c; edges within the cluster are intra-edges, and edges to other clusters are inter-edges]

1. Create a counter C<u,v> for each edge, initialized to zero.
2. Cluster a system many times. For each run:
   • For each edge <u,v>, increment C<u,v> if <u,v> is an intra-edge.
3. After all runs, determine P<u,v>, the percentage of runs in which each <u,v> appears as an intra-edge.

Aggregate the P<u,v> based on the level of agreement: None, Low, Medium, High.

Our Expectations: [chart annotated LARGE Dissimilarity, MODERATE Dissimilarity, NOT Similar, VERY Similar]
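The three steps and the aggregation can be sketched as follows. The sketch assumes each clustering run is stored as a module-to-cluster array and that edges are indexed so that counter[i] plays the role of C<u,v>. The boundaries between the Low, Medium, and High levels are placeholders chosen for illustration, because the slides name the levels but not their cut-offs. All names are hypothetical.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the similarity-landscape bookkeeping (steps 1-3 plus aggregation). */
final class SimilarityLandscape {

    enum Agreement { NONE, LOW, MEDIUM, HIGH }

    // Placeholder boundaries chosen for illustration; the slides name the levels, not the cut-offs.
    static final double LOW_MAX = 10.0;    // 0 < P <= 10  -> LOW
    static final double MEDIUM_MAX = 75.0; // 10 < P < 75  -> MEDIUM, P >= 75 -> HIGH

    /** P<u,v>: percentage of runs in which edge i = <u,v> is an intra-edge. */
    static double[] intraEdgePercentages(int[][] edges, List<int[]> runs) {
        if (runs.isEmpty()) {
            throw new IllegalArgumentException("need at least one clustering run");
        }
        int[] counter = new int[edges.length]; // step 1: C<u,v>, all zero
        for (int[] partition : runs) {         // step 2: one partition per clustering run
            for (int i = 0; i < edges.length; i++) {
                if (partition[edges[i][0]] == partition[edges[i][1]]) {
                    counter[i]++;
                }
            }
        }
        double[] p = new double[edges.length]; // step 3: convert counts to percentages
        for (int i = 0; i < edges.length; i++) {
            p[i] = 100.0 * counter[i] / runs.size();
        }
        return p;
    }

    static Agreement level(double p) {
        if (p == 0.0) {
            return Agreement.NONE;
        } else if (p <= LOW_MAX) {
            return Agreement.LOW;
        } else if (p < MEDIUM_MAX) {
            return Agreement.MEDIUM;
        }
        return Agreement.HIGH;
    }

    /** Aggregates the P<u,v> into the percentage of edges at each agreement level. */
    static Map<Agreement, Double> aggregate(double[] p) {
        Map<Agreement, Double> share = new EnumMap<>(Agreement.class);
        for (Agreement a : Agreement.values()) {
            share.put(a, 0.0);
        }
        for (double value : p) {
            share.merge(level(value), 100.0 / p.length, Double::sum);
        }
        return share;
    }

    public static void main(String[] args) {
        int[][] edges = {{0, 1}, {1, 2}, {2, 3}};
        List<int[]> runs = List.of( // four toy clustering runs over modules 0-3
                new int[] {0, 0, 0, 1},
                new int[] {0, 0, 1, 2},
                new int[] {0, 0, 0, 1},
                new int[] {0, 0, 1, 2});
        double[] p = intraEdgePercentages(edges, runs);
        System.out.println(aggregate(p));
    }
}
```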


Case Study

System Name   Modules   Relations   Description
Telnet             28          81   Terminal Emulator
PHP                62         191   Internet Scripting Language
Bash               92         901   Unix Terminal Environment
Lynx              148       1,745   Text-Based HTML Browser
Bunch             220         764   Software Clustering Tool
Swing             413       1,513   Standard Java User Interface Framework
Kerberos 5        558       3,793   Security Services Infrastructure

We also looked at 6 randomly generated MDGs
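The slides do not describe how the six random MDGs were built. Their chart labels (RND5, RND50, RND75 and BIP5, BIP50, BIP75) suggest an edge-density parameter. Under that assumption, a generator could include each possible directed edge between two distinct modules independently with a given probability, which is a G(n, p) random-graph model. This is an illustration only, not the generator used in the study. The BIP variants are not modeled because the slides do not explain them, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical G(n, p)-style random MDG generator (not the study's generator). */
final class RandomMdg {

    /**
     * Each ordered pair (u, v) with u != v becomes an edge
     * independently with probability densityPercent / 100.
     */
    static List<int[]> generate(int modules, double densityPercent, long seed) {
        if (densityPercent < 0 || densityPercent > 100) {
            throw new IllegalArgumentException("density must be in [0, 100]");
        }
        Random rnd = new Random(seed);
        double p = densityPercent / 100.0;
        List<int[]> edges = new ArrayList<>();
        for (int u = 0; u < modules; u++) {
            for (int v = 0; v < modules; v++) {
                // nextDouble() is in [0, 1), so p = 1.0 keeps every edge and p = 0.0 keeps none.
                if (u != v && rnd.nextDouble() < p) {
                    edges.add(new int[] {u, v});
                }
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        for (double density : new double[] {5, 50, 75}) {
            List<int[]> edges = generate(100, density, 7L);
            System.out.printf("density %.0f%%: %d edges out of %d possible%n",
                    density, edges.size(), 100 * 99);
        }
    }
}
```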


Structural Landscape (1)

[Figure: TELNET, PHP, and BASH, with four panels per system: MQ against Cluster Count; (|IntraEdges|/|E|)% vs. Sample Number; MQ vs. Sample Number; Cluster Count vs. Sample Number. Black = Bunch, Gray = Random.]

The independent samples were ordered by MQ to highlight some relationships that would not be obvious otherwise.
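Ordering independent samples by MQ amounts to sorting per-run summaries. This minimal sketch assumes each run is summarized by a hypothetical Sample record (Java 16+) that holds its MQ, cluster count, and intra-edge percentage. After sorting, the list index serves as the sample number on the x-axis. The sketch also reports the MQ and cluster-count ranges, which is one way to check the expectation that neither varies widely across runs. The sample values are made up for illustration and are not data from the case study.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.IntSummaryStatistics;
import java.util.List;

/** Hypothetical sketch: order independent clustering samples by MQ. */
final class SampleOrdering {

    /** Hypothetical per-run summary of one clustering result. */
    record Sample(double mq, int clusterCount, double intraEdgePercent) {}

    /** Returns a copy of the samples in ascending MQ order (sample number = list index). */
    static List<Sample> orderByMq(List<Sample> samples) {
        List<Sample> ordered = new ArrayList<>(samples);
        ordered.sort(Comparator.comparingDouble(Sample::mq));
        return ordered;
    }

    public static void main(String[] args) {
        // Illustrative values only; not data from the case study.
        List<Sample> runs = List.of(
                new Sample(2.91, 7, 71.0),
                new Sample(2.85, 6, 69.5),
                new Sample(2.97, 7, 73.2));
        List<Sample> ordered = orderByMq(runs);
        for (int i = 0; i < ordered.size(); i++) {
            System.out.println(i + ": " + ordered.get(i));
        }
        IntSummaryStatistics clusters =
                ordered.stream().mapToInt(Sample::clusterCount).summaryStatistics();
        System.out.printf("MQ range %.2f..%.2f, cluster count range %d..%d%n",
                ordered.get(0).mq(), ordered.get(ordered.size() - 1).mq(),
                clusters.getMin(), clusters.getMax());
    }
}
```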


Structural Landscape (2)

[Figure: LYNX, BUNCH, SWING, and KERBEROS5, with four panels per system: MQ against Cluster Count; (|IntraEdges|/|E|)% vs. Sample Number; MQ vs. Sample Number; Cluster Count vs. Sample Number. Black = Bunch, Gray = Random.]


Structural Landscape (3) – Random MDGs

[Figure: random MDGs RND5, RND50, and RND75, with four panels per graph: MQ against Cluster Count; (|IntraEdges|/|E|)% vs. Sample Number; MQ vs. Sample Number; Cluster Count vs. Sample Number. Black = Bunch, Gray = Random.]


Structural Landscape (4) – Random MDGs

[Figure: random MDGs BIP5, BIP50, and BIP75, with four panels per graph: MQ against Cluster Count; (|IntraEdges|/|E|)% vs. Sample Number; MQ vs. Sample Number; Cluster Count vs. Sample Number. Black = Bunch, Gray = Random.]


Structural Landscape - Observations

• There was significant commonality across the clustering results
• The results showed many desirable aspects
• There was a lot of commonality between the random and open source systems
• There was some additional variability in the MQ vs. Cluster Size relationship for the random MDGs
• There was more variability in the clustering results for the random graphs with higher edge densities


Similarity Landscape (1)

[Figure: bar chart of the share of edges (y-axis, 0 to 100) at each agreement level (Zero, Low, Medium, High) for the Open Source Systems and the Random MDGs]


Similarity Landscape (2)

[Figure: bar chart of the share of edges (y-axis, 0 to 100) at each agreement level (Zero, Low, Medium, High) for the Open Source Systems, Random MDGs - Low, and Random MDGs - High]


Observations – Similarity Landscape

• Open source systems exhibited the expected trends:
   • High dissimilarity and high similarity
   • Low medium similarity
• Random MDGs had much higher medium similarity, and almost no high similarity
   • We think that this might be due to isomorphism in the clustering results
   • Why: the variability, observed in the structural landscape, in the number of clusters for results with similar MQ


Conclusions
• Ideally, evaluation would be performed by comparing Bunch’s results to a benchmark
   • Not possible: graph partitioning is NP-hard, so an optimal benchmark partition is impractical to compute for systems of realistic size
   • Empirical feedback indicates that the results are “good enough”
• Until now, no investigation has been performed on why Bunch produces consistent results
• The search landscape model provided a lot of intuition into Bunch’s behavior
   • We examined both the structural and similarity aspects of the search landscape
• The search landscape approach seems appropriate for modeling other metaheuristic search algorithms
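Counting candidate solutions shows why an exhaustive benchmark is out of reach. The number of ways to partition n modules into non-empty clusters is the Bell number B(n), which grows faster than exponentially in n. The short sketch below computes B(n) exactly with the Bell triangle. It illustrates the size of the search space and is not part of the slides’ own analysis. The class name is hypothetical.

```java
import java.math.BigInteger;

/** Hypothetical helper: size of the partition search space for n modules. */
final class PartitionCount {

    /** Bell number B(n): the number of ways to partition n modules into non-empty clusters. */
    static BigInteger bell(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        BigInteger[] row = {BigInteger.ONE}; // Bell triangle row 0
        for (int i = 1; i <= n; i++) {
            BigInteger[] next = new BigInteger[i + 1];
            next[0] = row[row.length - 1];   // each row starts with the last entry of the previous row
            for (int j = 1; j <= i; j++) {
                next[j] = next[j - 1].add(row[j - 1]); // left neighbor + entry above-left
            }
            row = next;
        }
        return row[0]; // B(n) is the first entry of row n
    }

    public static void main(String[] args) {
        // 28, 62, and 92 are the module counts of Telnet, PHP, and Bash in the case-study table.
        int[] sizes = {10, 28, 62, 92};
        for (int n : sizes) {
            System.out.println("B(" + n + ") = " + bell(n));
        }
    }
}
```

Even Telnet, the smallest system in the case study at 28 modules, has far too many partitions to enumerate.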