A Heuristic Approach Towards Solving the Software Clustering Problem

ICSM03

Brian S. Mitchell
bmitchell@drexel.edu / http://www.mcs.drexel.edu/~bmitchel
Department of Computer Science, College of Engineering
Drexel University
Philadelphia, PA, 19104 USA



Drexel University Software Engineering Research Group (SERG) http://serg.cs.drexel.edu

Understanding Large Systems is HARD

Example: RedHat Linux 7.1
Kernel: 1,400 modules, 2.5M LOC
System: 350K modules, 30M LOC
Languages: > 19 (including scripting)
[http://www.dwheeler.com/sloc]

(1) Manual analysis is tedious and error prone

(2) Source code analysis approaches create large repositories

(3) Software clustering approaches create abstract representations


Software Clustering

Software clustering simplifies program maintenance and program understanding

The abstract views produced by software clustering techniques can be used to help developers fix defects or add features to existing software systems


Software Clustering Environments

A software clustering environment (the Bunch tool, or other tools) requires a representation, a clustering algorithm, a way to represent results, and a way to compare results.

[Figure: graph of the classes TestSuite, TestCase, TestResult, TestFailure, ComparisonFailure, Assert and AssertionFailedError, shown twice]

Bunch works by partitioning a software graph, and uses a fitness function called MQ to evaluate the quality of individual partitions
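MQ is not defined on these slides. As a rough illustration only, the sketch below assumes one formulation from the Bunch literature, in which each cluster contributes a factor 2·intra / (2·intra + inter) and MQ is the sum of those factors. The function name `mq`, the edge-tuple layout and the toy graph are assumptions made for this example, not Bunch's API.

```python
from collections import defaultdict

def mq(edges, cluster_of):
    """Assumed MQ: sum over clusters of 2*intra / (2*intra + inter).

    edges      -- iterable of (source, target, weight) tuples
    cluster_of -- dict mapping each module to its cluster label
    """
    intra = defaultdict(float)  # edge weight inside each cluster
    inter = defaultdict(float)  # edge weight crossing each cluster's boundary
    for src, dst, weight in edges:
        a, b = cluster_of[src], cluster_of[dst]
        if a == b:
            intra[a] += weight
        else:
            inter[a] += weight
            inter[b] += weight
    total = 0.0
    for cluster in set(cluster_of.values()):
        if intra[cluster] > 0:  # a cluster with no internal edges contributes 0
            total += 2 * intra[cluster] / (2 * intra[cluster] + inter[cluster])
    return total

# Hypothetical 4-module chain A -> B -> C -> D
toy_edges = [("A", "B", 1), ("B", "C", 1), ("C", "D", 1)]
two_clusters = {"A": 0, "B": 0, "C": 1, "D": 1}
one_cluster = {"A": 0, "B": 0, "C": 0, "D": 0}
singletons = {"A": 0, "B": 1, "C": 2, "D": 3}
print(mq(toy_edges, two_clusters), mq(toy_edges, one_cluster), mq(toy_edges, singletons))
```

Under this assumed definition the two-cluster partition scores higher than either extreme (everything in one cluster, or every module alone), which is the kind of trade-off between cohesion and coupling a fitness function of this sort is meant to reward.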


Software Clustering Techniques

A variety of techniques for software clustering have been studied by the reverse engineering community:

Source code component similarity (or dissimilarity)
Concept Analysis
Subsystem Patterns
Implementation-Specific Information

My Research Contribution Was Applying Search Techniques to the Software Clustering Problem, and Improving the State of Practice for Evaluating Software Clustering Results


Problem: There are too many partitions to search all of them…

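Number of modules = number of partitions:

1 = 1; 2 = 2; 3 = 5; 4 = 15; 5 = 52
6 = 203; 7 = 877; 8 = 4140; 9 = 21147; 10 = 115975
11 = 678570; 12 = 4213597; 13 = 27644437; 14 = 190899322; 15 = 1382958545
16 = 10480142147; 17 = 82864869804; 18 = 682076806159; 19 = 5832742205057; 20 = 51724158235372

$$
S_{n,k} =
\begin{cases}
1 & \text{if } k = 1 \text{ or } k = n \\
S_{n-1,k-1} + k\,S_{n-1,k} & \text{otherwise}
\end{cases}
$$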

A 15 Module System is about the limit for performing Exhaustive Analysis

The number of partitions (ways to cluster a system) of a software graph grows very quickly as the number of modules in the system increases…
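The counts on this slide can be reproduced from the Stirling numbers of the second kind: S(n, k) is 1 when k = 1 or k = n, and S(n-1, k-1) + k·S(n-1, k) otherwise; summing S(n, k) over k gives the number of partitions of an n-module system. A minimal sketch (the function names are chosen for this example):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n, k):
    """Ways to split n modules into exactly k non-empty clusters."""
    if k < 1 or k > n:
        return 0
    if k == 1 or k == n:
        return 1
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)

def partitions(n):
    """Total number of partitions of an n-module system (sum over all k)."""
    return sum(stirling2(n, k) for k in range(1, n + 1))

for n in (5, 8, 15, 20):
    print(n, partitions(n))
```

Even at one million partitions evaluated per second, the roughly 1.4 billion partitions of a 15-module system take about 23 minutes to enumerate, and 20 modules would take more than a year, which is consistent with the stated practical limit for exhaustive analysis.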


Applying Heuristic Search Techniques To The Software Clustering Problem

[Figure: Source code (e.g. void main(){ printf("hello"); }) is processed by source code analysis tools (Acacia, Chava) to produce an MDG with modules M1 to M8. Software clustering search algorithms (Hill Climbing, Genetic Algorithm, Simulated Annealing) explore the search space, the set of all MDG partitions (total = 4140 partitions), and return a "GOOD" MDG partition.]

Note that a “good” partition may not be an optimal solution
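To make the search idea concrete, here is a small hill-climbing sketch over partitions of an 8-module graph. It is an illustration under stated assumptions, not Bunch's algorithm: the edge list is made up, the fitness function is one assumed MQ-style formulation (sum over clusters of 2·intra / (2·intra + inter)), and a neighbor is defined as moving one module to another cluster label.

```python
import random

def mq(edges, cluster_of):
    """Assumed fitness: sum over clusters of 2*intra / (2*intra + inter)."""
    intra, inter = {}, {}
    for src, dst in edges:
        a, b = cluster_of[src], cluster_of[dst]
        if a == b:
            intra[a] = intra.get(a, 0) + 1
        else:
            inter[a] = inter.get(a, 0) + 1
            inter[b] = inter.get(b, 0) + 1
    return sum(2 * i / (2 * i + inter.get(c, 0)) for c, i in intra.items())

def hill_climb(modules, edges, seed=0):
    """Start from a random partition and accept any single-module move that raises the fitness."""
    rng = random.Random(seed)
    labels = range(len(modules))  # enough labels to represent any partition
    current = {m: rng.randrange(len(modules)) for m in modules}  # random start point
    score = mq(edges, current)
    improved = True
    while improved:
        improved = False
        for m in modules:
            for label in labels:
                if label == current[m]:
                    continue
                neighbor = dict(current)
                neighbor[m] = label
                s = mq(edges, neighbor)
                if s > score:  # strictly better, so the loop must terminate
                    current, score, improved = neighbor, s, True
    return current, score

# Hypothetical MDG: the module names follow the slide, the edges are invented
modules = ["M1", "M2", "M3", "M4", "M5", "M6", "M7", "M8"]
edges = [("M1", "M2"), ("M2", "M3"), ("M3", "M1"), ("M4", "M5"), ("M5", "M6"),
         ("M6", "M4"), ("M7", "M8"), ("M3", "M4"), ("M6", "M7")]

partition, score = hill_climb(modules, edges, seed=1)
print(score, partition)
```

The search stops at a partition that no single move can improve. That is a local optimum, which is why a "good" result is not guaranteed to be the best of the 4140 possible partitions; different seeds may end at different partitions.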


Software Developed as Part of my Ph.D. Research

Bunch: An Automatic Clustering Tool

CRAFT: A Reference Decomposition Generator

Both tools also have a documented API to support integration into other tools


Bunch Example

JUnit is a Unit Testing Framework for Java (Framework Package Shown Below)

[Figure: three views of the JUnit framework package (TestSuite, TestCase, TestResult, TestFailure, ComparisonFailure, Assert, AssertionFailedError): the MDG, the random start point (MQ = 0.2857), and a solution (MQ = 1.7889)]

(My Dissertation Discusses Several MQ Measurements)


Clustering Large Software Systems Efficiently

Our goal was to cluster large and interesting systems in a reasonable amount of time:

Linux Kernel: > 1,000 modules in ~90 seconds
Swing Framework: > 450 classes in ~20 seconds
Kerberos: > 500 modules in ~35 seconds
Other Popular Systems Examined: Xerces, Apache HTTP Server, Jigsaw HTTP Server, Mozilla, Ant …
Overall we examined over 50 reference systems during the course of my Ph.D. research

Since the source code analysis and clustering activities are separated, Bunch can cluster software developed in any programming language.


Research into Evaluating Software Clustering Results

Most software clustering results are evaluated subjectively

For a limited set of well-studied systems a reference is available, but for many systems no benchmark decomposition exists for comparison

WCRE’01: Paper described the CRAFT system to generate a reasonable reference decomposition by highlighting similarities in a collection of software clustering results

One important aspect of evaluation is being able to compare software clustering results to each other

ICSM’01: Paper introduced 2 measurements to determine similarity: MeCl and EdgeSim
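Neither measurement is defined on this slide. As a hedged illustration of what an edge-based similarity could look like, the sketch below assumes a reading in the spirit of EdgeSim: count the edges that are classified the same way (inside a cluster, or between clusters) by both decompositions, and report that as a percentage. The name `edge_agreement` is hypothetical, edges are unweighted here, and the exact ICSM'01 definition may differ.

```python
def edge_agreement(edges, part_a, part_b):
    """Percentage of edges that are intra-cluster in both partitions or inter-cluster in both.

    edges          -- list of (source, target) pairs
    part_a, part_b -- dicts mapping each module to a cluster label
    """
    if not edges:
        return 100.0  # nothing to disagree about
    same = sum(
        1
        for src, dst in edges
        if (part_a[src] == part_a[dst]) == (part_b[src] == part_b[dst])
    )
    return 100.0 * same / len(edges)

# Hypothetical 4-module chain clustered two different ways
chain = [("A", "B"), ("B", "C"), ("C", "D")]
result_1 = {"A": 0, "B": 0, "C": 1, "D": 1}
result_2 = {"A": 0, "B": 0, "C": 0, "D": 1}
print(edge_agreement(chain, result_1, result_1))
print(edge_agreement(chain, result_1, result_2))
```

Because only the same-cluster relation of each edge's endpoints matters, the cluster labels themselves do not need to match between the two results.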


What’s Been Done Since Completing my Ph.D. Research

Applying a formal Architectural Constraint Language (ISF) to software clustering results to reverse engineer the software architecture of a system
Modeling the Search Landscape to better understand why Bunch produces consistent results given the size of the search space
Integration of Bunch’s software clustering services into the RePortal online reverse engineering portal (http://reportal.cs.drexel.edu)
Support for GXL as both input and output representation into Bunch


Additional Research Opportunities Identified in my Thesis

Improved Visualization Services
Clustering the Dynamic Behavior of Systems
Clustering Distributed and Heterogeneous Systems
Investigating other Heuristics Appropriate for Clustering Software Systems
Investigating other Representations of Systems being Clustered


Summary

Application of search techniques to the software clustering problem
Developed software clustering algorithms and software to cluster large and interesting systems efficiently
Developed software and techniques to improve the state of practice for evaluating software clustering results


Recognition

Special Thanks To:

My Advisor: Dr. Spiros Mancoridis
My Committee: Dr. J. Johnson, Dr. C. Rorres, Dr. A. Shokoufandeh, Dr. R. Chen, and Dr. L. Perkovic (former member)
My Sponsors: AT&T Research, Sun Microsystems, DARPA, NSF, US Army
Bunch Project Contributors: D. Doval, M. Traverso, S. Mancoridis
Dr. E. Gansner & Dr. R. Chen (AT&T Labs - Research) for test data and validation of Bunch’s clustering results.
The gang at the SERG lab…


Questions / More Information

Reverse Engineering Tools @ Drexel

Bunch – Software Clustering Tool

CRAFT – Benchmark Generation Tool

RePortal – Online Reverse Engineering Portal

Where to Download & Evaluate