Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

An Exploration of Power-law in Use-relation of Java Software Systems

Makoto Ichii, Makoto Matsushita, Katsuro Inoue

Osaka University

2008/3/26 ASWEC 2008

Software Component Graph

A software system is composed of software components.
  Software component (component): a building unit of a software system

Components form complex use-relations with one another.

A software component graph (component graph) represents the use-relations between components.
  node: component / edge: use-relation

Many studies use component graphs to analyze software systems.

It is therefore important to know the nature of component graphs.

Power-law distribution

A graph is characterized by its degree distribution.

Graphs whose degree distribution follows a power law attract attention in various research domains:
  Link structure of WWW pages
  Hosts on the Internet

Such graphs tend to have interesting characteristics:
  Self-similarity
  Fault tolerance

$p(x) = C x^{-\alpha}$

Explore the component graphs to determine whether their degree distributions follow the power law.

Questions [1-2/4]

Q. 1 Do the in- and out-degree distributions of a component graph of a software system follow the power law?

Q. 2 Do the in- and out-degree distributions of a component graph of multiple software systems follow the power law?

Questions [3-4/4]

Q. 3 Do the in- and out-degree distributions of a subgraph of a component graph follow the power law?

Q. 4 What aspects of components affect the in- and out-degree distributions of component graphs?

Definitions [1/2]

Component: a Java class (including interfaces)

Use-relation: any of the following six relation types, acquired by static analysis of the component source files.
  A class or an interface extends another class or interface, respectively.
  A class implements an interface.
  A class or an interface declares a variable of a class or an interface.
  A class instantiates an object of a class.
  A class calls a method of a class or an interface.
  A class or an interface references a field of a class or an interface.
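The six relation types can be pictured with a small Java fragment. This is only an illustrative sketch: every type name in it (`Shape`, `Solid`, `Base`, `Square`, `UseRelationDemo`) is hypothetical and does not come from the analyzed systems. The numbered comments refer to the six relation types in the order listed above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types, used only to illustrate the six use-relation kinds.
interface Shape { double area(); }
interface Solid extends Shape { double volume(); }    // (1) an interface extends an interface

class Base { static int created = 0; }

class Square extends Base implements Shape {          // (1) class extends class, (2) class implements interface
    private final double side;
    Square(double side) {
        this.side = side;
        Base.created++;                                // (6) references a field of class Base
    }
    @Override public double area() { return side * side; }
}

class UseRelationDemo {
    static double totalArea() {
        List<Shape> shapes = new ArrayList<>();        // (3) declares a variable of an interface type, (4) instantiates ArrayList
        shapes.add(new Square(2.0));                   // (4) instantiates Square, (5) calls List.add
        double sum = 0.0;
        for (Shape s : shapes) {
            sum += s.area();                           // (5) calls Shape.area
        }
        return sum;
    }
}
```

Under the definitions above, `UseRelationDemo` would get outgoing edges to `List`, `ArrayList`, `Shape` and `Square`, however many statements refer to each of them.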

Definitions [2/2]

Component graph: a directed simple graph
  node: component
  edge: use-relation between components

In-(Out-)degree: the number of incoming (outgoing) edges of a node

Example:

class A { void exec() { … }}
class B { … A.exec(); …}
class C { … A a = new A(); …}

Component  in-degree  out-degree
A          2          0
B          0          1
C          0          1
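The example can be reproduced with a minimal graph structure. This is a sketch, not the authors' tooling: the class `ComponentGraph` and its methods are hypothetical names. It assumes that "simple graph" means at most one edge per ordered pair of components and no self-loops.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Minimal directed simple graph: at most one edge per ordered pair of nodes. */
class ComponentGraph {
    private final Map<String, Set<String>> out = new LinkedHashMap<>();
    private final Map<String, Set<String>> in = new LinkedHashMap<>();

    void addNode(String component) {
        out.computeIfAbsent(component, k -> new LinkedHashSet<>());
        in.computeIfAbsent(component, k -> new LinkedHashSet<>());
    }

    /** Records that user uses used; duplicate relations collapse into one edge. */
    void addUse(String user, String used) {
        addNode(user);
        addNode(used);
        if (user.equals(used)) {
            return; // assumption: self-loops are not part of a simple graph
        }
        out.get(user).add(used);
        in.get(used).add(user);
    }

    int inDegree(String component) { return in.get(component).size(); }

    int outDegree(String component) { return out.get(component).size(); }

    /** Builds the three-class example: B calls A.exec(), C instantiates A. */
    static ComponentGraph slideExample() {
        ComponentGraph g = new ComponentGraph();
        g.addUse("B", "A");
        g.addUse("C", "A");
        return g;
    }
}
```

Because edges are stored in sets, a class that both declares and instantiates `A` (as `C` does) still contributes a single edge.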

Observing the power law

Plot the cumulative frequency against the in-(or out-)degree on log-log axes.

The data form a straight line if the distribution follows the power law.

[Figure: $p(x) = C x^{-\alpha}$ on log-log axes, gradient $-\alpha$; cumulative frequency versus in-(or out-)degree on log-log axes, gradient $-(\alpha - 1)$]

M. E. J. Newman, "Power laws, Pareto distributions and Zipf's law", Contemporary Physics 46, 323-351 (2005)
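The gradient $-(\alpha - 1)$ of the cumulative plot follows from integrating the density. The short derivation below treats the degree as a continuous variable and assumes $\alpha > 1$, which is the usual approximation and not something stated on the slide.

```latex
P(X \ge x) = \int_{x}^{\infty} C\,t^{-\alpha}\,dt
           = \frac{C}{\alpha - 1}\,x^{-(\alpha - 1)}
           \qquad (\alpha > 1)

\log P(X \ge x) = -(\alpha - 1)\,\log x + \log\frac{C}{\alpha - 1}
```

The second line is a straight line in $\log x$ with gradient $-(\alpha - 1)$, so $\alpha$ can be read off as one minus the gradient of the cumulative plot.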

Values shown in the experiments

α: the exponent
  Derived from the gradient of the regression line

R*²: the coefficient of determination adjusted for the degrees of freedom
  Fitness of a regression model for the data
  Range [0..1]; a large value means a good fit

[Figure: cumulative frequency versus in-(or out-)degree on log-log axes with the regression line, gradient $-(\alpha - 1)$, for $p(x) = C x^{-\alpha}$]
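A minimal sketch of how the two values could be computed is an ordinary least-squares line through the points (log degree, log cumulative frequency). The class `LogLogFit` and its field names are hypothetical, and the slides do not say which tool the authors used. The sketch does not reproduce the ± error terms reported in the result tables.

```java
/** Least-squares line fit on log-log data; a sketch, not the authors' tooling. */
class LogLogFit {
    final double slope;
    final double intercept;
    final double alpha;
    final double adjustedR2;

    /** Both arrays must contain strictly positive values and at least three points. */
    LogLogFit(double[] degree, double[] cumulativeFrequency) {
        int n = degree.length;
        if (n != cumulativeFrequency.length || n < 3) {
            throw new IllegalArgumentException("need two arrays of equal length >= 3");
        }
        double[] x = new double[n];
        double[] y = new double[n];
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            x[i] = Math.log10(degree[i]);
            y[i] = Math.log10(cumulativeFrequency[i]);
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxx = 0.0;
        double sxy = 0.0;
        double syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxx += dx * dx;
            sxy += dx * dy;
            syy += dy * dy;
        }
        slope = sxy / sxx;
        intercept = meanY - slope * meanX;

        double residual = 0.0;
        for (int i = 0; i < n; i++) {
            double r = y[i] - (intercept + slope * x[i]);
            residual += r * r;
        }
        double r2 = 1.0 - residual / syy;
        // One predictor: n - 1 total and n - 2 residual degrees of freedom.
        adjustedR2 = 1.0 - (1.0 - r2) * (n - 1) / (n - 2);
        // The cumulative plot has gradient -(alpha - 1), hence alpha = 1 - slope.
        alpha = 1.0 - slope;
    }
}
```

For instance, `new LogLogFit(degrees, cumulative).alpha` gives the exponent estimate for one degree distribution, where `degrees` and `cumulative` are the plotted x and y values.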

Experiment 1

Q. 1 Do the in- and out-degree distributions of a component graph of a software system follow the power law?

Set up component sets.
  Each set contains a single software system.

Analyze the component sets to create component graphs.

Plot the cumulative frequency of the degrees on log-log axes.

Name     # of components  Description
JDK      11,556           Java 2 SE Software Development Kit 1.4
ECLIPSE  13,941           Eclipse 3.0.1
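The plotting step needs, for each observed degree, the number of components whose degree is at least that large. The sketch below is one way to compute it. The class name `CumulativeFrequency` is hypothetical, and two choices are assumptions rather than facts from the slides: the cumulative count uses "greater than or equal", and nodes of degree 0 are skipped because they cannot appear on a logarithmic axis.

```java
import java.util.Map;
import java.util.TreeMap;

class CumulativeFrequency {
    /** Maps each observed degree d >= 1 to the number of nodes whose degree is >= d. */
    static TreeMap<Integer, Integer> of(int[] degrees) {
        TreeMap<Integer, Integer> count = new TreeMap<>();
        for (int d : degrees) {
            if (d >= 1) {
                count.merge(d, 1, Integer::sum); // degree 0 has no position on a log axis
            }
        }
        TreeMap<Integer, Integer> cumulative = new TreeMap<>();
        int running = 0;
        // Walk from the largest degree down, accumulating node counts.
        for (Map.Entry<Integer, Integer> e : count.descendingMap().entrySet()) {
            running += e.getValue();
            cumulative.put(e.getKey(), running);
        }
        return cumulative;
    }
}
```

The keys and values of the returned map are the x and y coordinates of the log-log plot.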

Result of experiment 1 / JDK

# of Nodes  11,556
# of Edges  107,198

            α                 R*²
in-degree   2.1 ± 8.6×10^-3   0.99
out-degree  3.1 ± 8.2×10^-2   0.88

[Figure: cumulative frequency versus out-degree and versus in-degree, log-log axes]

► The in-degree follows the power law

► The out-degree does not follow the power law

Result of experiment 1 / ECLIPSE

# of Nodes  13,941
# of Edges  140,678

            α                 R*²
in-degree   2.2 ± 1.6×10^-2   0.96
out-degree  3.0 ± 7.7×10^-2   0.86

[Figure: cumulative frequency versus out-degree and versus in-degree, log-log axes]

► Similar characteristics to JDK
  The in-degree follows the power law
  The out-degree does not follow the power law

Experiment 2

Q. 2 Do the in- and out-degree distributions of a component graph for multiple software systems follow the power law?

Set up component sets.
  Each set contains multiple software systems.
  Use-relations across the systems exist.

Analyze the component sets to create component graphs.

Plot the cumulative frequency of the degrees on log-log axes.

Name      # of components  Description
ASF       59,486           Various projects checked out from the repository of the Apache Software Foundation
SPARS_DB  180,637          The components stored in the database of demo.spars.info (includes ASF, JDK)

Result of experiment 2 / ASF

# of Nodes  59,486
# of Edges  303,755

            α                 R*²
in-degree   2.4 ± 1.1×10^-2   0.98
out-degree  3.4 ± 6.4×10^-2   0.94

[Figure: cumulative frequency versus out-degree and versus in-degree, log-log axes]

► Similar characteristics to Exp. 1
  The in-degree follows the power law
  The out-degree does not follow the power law

Result of experiment 2 / SPARS_DB

# of Nodes  180,637
# of Edges  1,808,982

            α                 R*²
in-degree   2.0 ± 1.5×10^-3   1.00
out-degree  3.7 ± 7.0×10^-2   0.90

[Figure: cumulative frequency versus out-degree and versus in-degree, log-log axes]

► Similar characteristics to Exp. 1
  The in-degree follows the power law
  The out-degree does not follow the power law completely

► The in-degree distribution fits the power-law straight line almost ideally.

Experiment 3

Q. 3 Do the in- and out-degree distributions of a subgraph of a component graph for software systems follow the power law?

Construct subsets of SPARS_DB.
  Keyword: the components that contain a specified keyword in the source code. The keywords are randomly selected so that the number of resulting components is about 1,000 or 10,000.
  Random: 1,000 or 10,000 random components.

Analyze the component sets to create component graphs.

Plot the cumulative frequency of the degrees on log-log axes.

Name    # of components  Description
KWD1K   1,002            The components that contain "labels"
KWD10K  8,938            The components that contain "getstring"
RND1K   1,000            Randomly-selected 1,000 components
RND10K  10,000           Randomly-selected 10,000 components
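Building a subset graph can be sketched as taking the induced subgraph: keep the selected components and only those edges whose two endpoints are both selected. That reading is an assumption, since the slides do not spell out how edges to components outside the subset are handled. The class `SubgraphSampling`, its methods, and the integer node ids are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

class SubgraphSampling {
    /** Keeps only the edges {from, to} whose two endpoints are both in keep. */
    static List<int[]> inducedEdges(List<int[]> edges, Set<Integer> keep) {
        List<int[]> kept = new ArrayList<>();
        for (int[] e : edges) {
            if (keep.contains(e[0]) && keep.contains(e[1])) {
                kept.add(e);
            }
        }
        return kept;
    }

    /** Picks k distinct node ids uniformly at random from 0..n-1 (requires k <= n). */
    static Set<Integer> randomNodes(int n, int k, long seed) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            ids.add(i);
        }
        Collections.shuffle(ids, new Random(seed));
        return new HashSet<>(ids.subList(0, k));
    }
}
```

A keyword-based subset would use the same `inducedEdges` step, with `keep` filled from a text search over the source files instead of `randomNodes`.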

Result of experiment 3 / KWD1K

# of Nodes  1,002
# of Edges  1,564

            α                 R*²
in-degree   2.2 ± 3.3×10^-2   0.98
out-degree  3.7 ± 2.0×10^-1   0.93

[Figure: cumulative frequency versus out-degree and versus in-degree, log-log axes]

► Similar characteristics to SPARS_DB
  The in-degree follows the power law
  The out-degree does not follow the power law

Result of experiment 3 / KWD10K

# of Nodes  8,938
# of Edges  24,317

            α                 R*²
in-degree   2.1 ± 9.3×10^-3   0.99
out-degree  3.4 ± 2.7×10^-1   0.93

[Figure: cumulative frequency versus out-degree and versus in-degree, log-log axes]

► Similar characteristics to SPARS_DB
  The in-degree follows the power law
  The out-degree does not follow the power law

Result of experiment 3 / RND1K

# of Nodes  1,000
# of Edges  52

            α                 R*²
in-degree   2.3 ± 1.8×10^-1   0.93
out-degree  N/A               N/A

[Figure: cumulative frequency versus in-degree and versus out-degree, log-log axes]

► The original characteristics are almost lost

Result of experiment 3 / RND10K

# of Nodes  10,000
# of Edges  6,184

            α                 R*²
in-degree   1.9 ± 2.1×10^-2   0.98
out-degree  4.3 ± 3.3×10^-1   0.91

[Figure: cumulative frequency versus out-degree and versus in-degree, log-log axes]

► Similar characteristics to SPARS_DB; however, the number of edges is small
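A back-of-the-envelope estimate shows why the random subsets keep so few edges. It assumes that the subset graph keeps an edge only when both of its endpoints are sampled and that the $n$ components are drawn uniformly without replacement. Both are assumptions, not statements from the slides. With $N = 180{,}637$ nodes and $E = 1{,}808{,}982$ edges in SPARS_DB:

```latex
\mathbb{E}[\text{edges kept}] = E \cdot \frac{n(n-1)}{N(N-1)} \approx E \left(\frac{n}{N}\right)^{2}

n = 1{,}000:\quad 1{,}808{,}982 \times \frac{1{,}000 \times 999}{180{,}637 \times 180{,}636} \approx 55

n = 10{,}000:\quad 1{,}808{,}982 \times \frac{10{,}000 \times 9{,}999}{180{,}637 \times 180{,}636} \approx 5.5 \times 10^{3}
```

These estimates are of the same order as the observed 52 edges for RND1K and 6,184 edges for RND10K. The keyword-based subsets keep far more edges per node (1,564 edges for 1,002 nodes in KWD1K).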

Experiment 4

Q. 4 What aspects of components affect the in- and out-degree distributions of component graphs?

List the top-ten components by in- and out-degree.

Calculate the correlation between degrees and metric values.
  Spearman's rank correlation coefficient

Target: SPARS_DB

Metric  Description
LOC     Non-comment source lines of code
WMC1    A variation of weighted methods per class (WMC); weight of a method: constant value (1)
WMC2    A variation of WMC; weight of a method: cyclomatic complexity
LCOM    A variation of lack of cohesion of methods: LCOM5
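Spearman's rank correlation coefficient is the Pearson correlation of the ranks of the two variables. The sketch below computes it with average ranks for ties. The class `Spearman` and its method names are hypothetical, and the tie handling is a common convention rather than something the slides specify.

```java
import java.util.Arrays;

class Spearman {
    /** Average ranks (1-based); tied values share the mean of their positions. */
    static double[] ranks(double[] v) {
        int n = v.length;
        Integer[] idx = new Integer[n];
        for (int i = 0; i < n; i++) {
            idx[i] = i;
        }
        Arrays.sort(idx, (a, b) -> Double.compare(v[a], v[b]));
        double[] r = new double[n];
        int i = 0;
        while (i < n) {
            int j = i;
            while (j + 1 < n && v[idx[j + 1]] == v[idx[i]]) {
                j++; // extend the run of tied values
            }
            double avg = (i + j) / 2.0 + 1.0;
            for (int k = i; k <= j; k++) {
                r[idx[k]] = avg;
            }
            i = j + 1;
        }
        return r;
    }

    /** Spearman's rho: Pearson correlation of the two rank vectors. */
    static double rho(double[] x, double[] y) {
        double[] rx = ranks(x);
        double[] ry = ranks(y);
        int n = rx.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += rx[i];
            meanY += ry[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0.0;
        double sxx = 0.0;
        double syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = rx[i] - meanX;
            double dy = ry[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }
}
```

Calling `Spearman.rho(outDegrees, loc)` over all components would give one entry of the correlation table, where `outDegrees` and `loc` are hypothetical arrays holding one value per component.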

Result of experiment 4 / In-degree

Top-ten components
  The components that have a fundamental/general role

Correlation with metrics
  The in-degree has low correlation with the metrics

The in-degree relates to the role of a component.

    Name                 LOC  In-degree  Out-degree
 1  java.lang.String     675  116,239    21
 2  java.lang.Object     35   98,261     4
 3  java.lang.Class      605  29,682     41
 4  java.lang.Exception  15   21,046     2
 5  java.lang.Throwable  136  19,519     12
 6  java.lang.System     170  19,175     27
 7  java.util.Iterator   5    15,522     1
 8  java.util.List       27   14,462     4
 9  java.util.ArrayList  200  13,656     19
10  java.lang.Integer    285  12,736     9

Correlation of in-degree with:

Out-degree  LOC   WMC1  WMC2  LCOM
0.00        0.07  0.24  0.08  0.12

Result of experiment 4 / Out-degree

Top-ten components
  Simply large/complex classes

Correlation with metrics
  High correlation with LOC and WMC

The out-degree relates to the size/complexity of a component.

    Name                          LOC    In-degree  Out-degree
 1  org.apache...FunctionEval     364    1          354
 2  org.jgraph.GPGraphpad         2,196  130        255
 3  com.jgraph.GPGraphpad         2,200  131        253
 4  org.jgraph.GPGraphpad         542    209        252
 5  org.eclipse...ASTConverter    4,520  3          223
 6  org.eclipse...JavaEditor      1,368  115        220
 7  net.sourceforge...GanttProject  3,055  98       216
 8  it.businesslogic...MainFrame  7,177  46         204
 9  org...InstConstraintVisitor   1,626  3          197
10  org...ASTInstructionCompiler  2,449  1          189

Correlation of out-degree with:

In-degree  LOC   WMC1  WMC2  LCOM
0.00       0.82  0.64  0.75  0.39

Answers: summary of experiments [1/4]

Q. 1 Do the in- and out-degree distributions of a component graph of a software system follow the power law?

The in-degree follows the power law

The out-degree does not follow the power law
  Mixture of the power-law distribution and the lognormal distribution

[Figure: cumulative frequency versus out-degree and versus in-degree on log-log axes, repeated from the results of experiment 1]

Answers: summary of experiments [2/4]

Q. 2 Do the in- and out-degree distributions of a component graph for multiple software systems follow the power law?

The in-degree follows the power law

The out-degree does not follow the power law

The results are similar to those for single software systems.

[Figure: cumulative frequency versus out-degree and versus in-degree on log-log axes, repeated from the results of experiment 2]

Answers: summary of experiments [3/4]

Q. 3 Do the in- and out-degree distributions of a subgraph of a component graph for software systems follow the power law?

It depends on how the subgraph is created.

A keyword-based subgraph has characteristics similar to those of its superset.
  Related components are likely to share words.

A random-selection-based subgraph with a small number of nodes has different characteristics.
  • Few edges exist.

[Figure: cumulative frequency versus out-degree and versus in-degree on log-log axes, repeated from the results of experiment 3]

Answers: summary of experiments [4/4]

Q. 4 What aspects of components affect the in- and out-degree distributions of component graphs?

In-degree relates to the roles of components.
  Most components are used only in a specific part of a system.
  Components with a fundamental/general role are used from everywhere.
  The larger the component set grows, the larger the in-degree of such components becomes.

Out-degree relates to the size/complexity of components.
  Many components have a reasonable size/complexity.
  Some components may have a relatively large size/complexity.
  Extremely large components are unreasonable.

Summary

Component graphs were investigated to determine whether the in- and out-degree distributions follow the power law.

The results revealed the following characteristics.
  The in-degree distribution follows the power law.
    The in-degree of a component relates to the role of the component.
  The out-degree distribution does not follow the power law.
    The out-degree of a component relates to the size/complexity of the component.
  Some kinds of subgraph of a component graph have the same degree-distribution characteristics as the whole graph.

Future work
  Explore other types of component graph.

Discussion

Generative models of a power-law graph
  When a node is added to a graph, the nodes that already have a large degree tend to be the ones connected to the new node ("rich get richer").

Meanings for component graphs
  When a new component is added to (developed for) a software system, the new component tends to use components that are already used by many components.
  The set of frequently used components hardly changes as software development proceeds.
  If its members change, the fundamental structure (design, architecture) of the software has changed.
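The "rich get richer" mechanism can be tried out with a very small simulation. The model below is a simplified preferential-attachment process chosen for illustration. The slides do not specify a particular generative model, so the class `PreferentialAttachment`, the single edge per new component, and the weight "in-degree + 1" are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class PreferentialAttachment {
    /**
     * Adds components one at a time; each new component uses exactly one
     * existing component, chosen with probability proportional to
     * (in-degree + 1). Returns the in-degree of every component.
     */
    static int[] simulateInDegrees(int n, long seed) {
        Random rnd = new Random(seed);
        int[] inDegree = new int[n];
        // The urn holds one entry per component plus one entry per incoming
        // edge, so a uniform draw realises the (in-degree + 1) weighting.
        List<Integer> urn = new ArrayList<>();
        urn.add(0);
        for (int v = 1; v < n; v++) {
            int used = urn.get(rnd.nextInt(urn.size()));
            inDegree[used]++;
            urn.add(used); // the used component becomes more likely to be used again
            urn.add(v);    // the new component enters with base weight 1
        }
        return inDegree;
    }
}
```

Feeding the returned in-degrees into a cumulative-frequency plot on log-log axes is one way to check informally whether such a process produces a heavy-tailed in-degree distribution.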
