Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
An Exploration of Power-law in Use-relation of Java Software Systems
Makoto Ichii, Makoto Matsushita, Katsuro Inoue
Osaka University
2008/3/26 ASWEC 2008
Software Component Graph
A software system is composed of software components.
  Software component (component): building unit of a software system
  Complex use-relations are formed between components
A software component graph (component graph) represents the use-relations between components
  node: component / edge: use-relation
Various studies utilize component graphs to analyze software systems
  It is important to know the nature of component graphs
Power-law distribution
A graph is characterized by its degree distribution
Graphs whose degree distribution follows the power-law distribution attract attention in various research domains
  Link structure of WWW pages
  Hosts on the Internet
Such graphs tend to have interesting characteristics
  Self-similarity
  Fault tolerance
Explore the component graphs to see whether the degree distributions follow the power law
  $p(x) = C x^{-\alpha}$
Questions [1-2/4]
Q. 1 Do the in- and out-degree distributions of a component graph of a software system follow the power law?
Q. 2 Do the in- and out-degree distributions of a component graph of multiple software systems follow the power law?
Questions [3-4/4]
Q. 3 Do the in- and out-degree distributions of a subgraph of a component graph follow the power law?
Q. 4 What aspects of components affect the in- and out-degree distributions of component graphs?
Definitions [1/2]
Component: Java class (including interface)
Use-relation: any of the following six relation types, acquired by static analysis of the component source files.
  A class or an interface extends another class or interface, respectively.
  A class implements an interface.
  A class or an interface declares a variable of a class or an interface.
  A class instantiates a class object.
  A class calls a method of a class or an interface.
  A class or an interface references a field variable of a class or an interface.
Definitions [2/2]
Component graph: directed simple graph
  node: component
  edge: use-relation between components
In-(out-)degree: the number of incoming (outgoing) edges of a node
Example:
  class A { void exec() { … } }
  class B { … A.exec(); … }
  class C { … A a = new A(); … }
  A: in-degree 2, out-degree 0
  B: in-degree 0, out-degree 1
  C: in-degree 0, out-degree 1
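The A, B, C example can be reproduced with a small directed simple graph. The following is only a minimal sketch: the class `ComponentGraph` and its methods are hypothetical names, not the tool used in the study, and ignoring self-use (no self-loops) is an assumption the slides do not state.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Directed simple graph of components; all names here are illustrative. */
class ComponentGraph {
    // Adjacency sets: a set per node keeps at most one edge per ordered pair.
    private final Map<String, Set<String>> uses = new HashMap<>();
    private final Map<String, Set<String>> usedBy = new HashMap<>();

    void addComponent(String name) {
        uses.computeIfAbsent(name, k -> new HashSet<>());
        usedBy.computeIfAbsent(name, k -> new HashSet<>());
    }

    /** Records "from uses to"; repeated relations between the same pair collapse to one edge. */
    void addUse(String from, String to) {
        addComponent(from);
        addComponent(to);
        if (from.equals(to)) return; // assumption: self-use is ignored (no self-loops)
        uses.get(from).add(to);
        usedBy.get(to).add(from);
    }

    int inDegree(String name)  { return usedBy.get(name).size(); }
    int outDegree(String name) { return uses.get(name).size(); }

    /** Builds the three-class example from the slide. */
    static ComponentGraph slideExample() {
        ComponentGraph g = new ComponentGraph();
        g.addUse("B", "A"); // B calls A.exec()
        g.addUse("C", "A"); // C declares a variable of type A
        g.addUse("C", "A"); // C instantiates A: same pair, so still one edge
        return g;
    }

    public static void main(String[] args) {
        ComponentGraph g = slideExample();
        for (String c : new String[] {"A", "B", "C"}) {
            System.out.println(c + ": in-degree " + g.inDegree(c) + ", out-degree " + g.outDegree(c));
        }
    }
}
```

Class C is related to A by two relation types (variable declaration and instantiation), yet it contributes a single edge, which is what "simple graph" implies here.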
Observing the power-law
Plot the cumulative frequency on log-log axes
  The data form a straight line if the distribution follows the power law
  Frequency plot of $p(x) = C x^{-\alpha}$: gradient $-\alpha$
  Cumulative frequency plot: gradient $-(\alpha - 1)$
[Figure: frequency and cumulative frequency against in-(or out-)degree on log-log axes]
M. E. J. Newman, "Power laws, Pareto distributions and Zipf's law", Contemporary Physics 46, 323-351 (2005)
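The difference between the two gradients follows from integrating the density. A short derivation, assuming a continuous degree variable and $\alpha > 1$ (the notation $P(x)$ for the cumulative distribution is introduced here for illustration):

```latex
\begin{align}
P(x) &= \int_{x}^{\infty} C\,x'^{-\alpha}\,dx' = \frac{C}{\alpha-1}\,x^{-(\alpha-1)} \\
\log P(x) &= \log\frac{C}{\alpha-1} - (\alpha-1)\log x
\end{align}
```

On log-log axes the cumulative frequency is therefore a straight line with gradient $-(\alpha-1)$, one less steep than the gradient $-\alpha$ of the density itself.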
Values shown in the experiments
α: exponent
  Derived from the gradient of the regression line
R*2: the coefficient of determination adjusted for the degrees of freedom
  Goodness of fit of a regression model to the data
  Range [0..1]; a large value means a good fit
[Figure: regression line with gradient $-(\alpha - 1)$ fitted to the cumulative frequency of $p(x) = C x^{-\alpha}$ against in-(or out-)degree]
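One way to obtain both values is an ordinary least-squares fit on the log-log cumulative plot. The sketch below is an assumption about the procedure, not the authors' code: `PowerLawFit`, `cumulativeFrequency` and `fit` are hypothetical names, nodes of degree 0 are dropped because they cannot appear on a log axis, and the fit needs at least three points with distinct x and y values.

```java
import java.util.Map;
import java.util.TreeMap;

/** Log-log regression on the cumulative degree frequency; names are illustrative. */
class PowerLawFit {
    /** Maps each observed degree d >= 1 to the number of nodes whose degree is >= d. */
    static TreeMap<Integer, Integer> cumulativeFrequency(int[] degrees) {
        TreeMap<Integer, Integer> counts = new TreeMap<>();
        for (int d : degrees) {
            if (d >= 1) counts.merge(d, 1, Integer::sum); // degree 0 has no place on a log axis
        }
        TreeMap<Integer, Integer> cumulative = new TreeMap<>();
        int running = 0;
        for (Map.Entry<Integer, Integer> e : counts.descendingMap().entrySet()) {
            running += e.getValue();
            cumulative.put(e.getKey(), running);
        }
        return cumulative;
    }

    /** Returns {alpha, adjusted R^2} from ordinary least squares on (log x, log y). */
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double[] lx = new double[n];
        double[] ly = new double[n];
        double sx = 0, sy = 0;
        for (int i = 0; i < n; i++) {
            lx[i] = Math.log(x[i]);
            ly[i] = Math.log(y[i]);
            sx += lx[i];
            sy += ly[i];
        }
        double mx = sx / n, my = sy / n;
        double sxx = 0, sxy = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            sxx += (lx[i] - mx) * (lx[i] - mx);
            sxy += (lx[i] - mx) * (ly[i] - my);
            syy += (ly[i] - my) * (ly[i] - my);
        }
        double slope = sxy / sxx;
        double r2 = (sxy * sxy) / (sxx * syy);              // R^2 of a simple linear regression
        double adjusted = 1 - (1 - r2) * (n - 1) / (n - 2); // one explanatory variable
        double alpha = 1 - slope;                            // cumulative gradient is -(alpha - 1)
        return new double[] {alpha, adjusted};
    }
}
```

For points lying exactly on a line of gradient -1, such as (1, 8), (2, 4), (4, 2), (8, 1), the sketch gives α = 2 and an adjusted R*2 of 1, up to floating-point rounding.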
Experiment 1
Q. 1 Do the in- and out-degree distributions of a component graph of a software system follow the power law?
Set up component sets
  Each set contains a single software system
Analyze the component sets to create component graphs.
Plot the cumulative frequency of the degrees on log-log axes.
Component set | Description | # of components
JDK | Java 2 SE Software Development Kit 1.4 | 11,556
ECLIPSE | Eclipse 3.0.1 | 13,941
Result of experiment 1 / JDK
# of Nodes: 11,556
# of Edges: 107,198
Degree | α | R*2
in-degree | 2.1 ± 8.6×10^-3 | 0.99
out-degree | 3.1 ± 8.2×10^-2 | 0.88
[Figure: cumulative frequency against out-degree and against in-degree for JDK, log-log axes]
► The in-degree follows the power law
► The out-degree does not follow the power law
Result of experiment 1 / ECLIPSE
# of Nodes: 13,941
# of Edges: 140,678
Degree | α | R*2
in-degree | 2.2 ± 1.6×10^-2 | 0.96
out-degree | 3.0 ± 7.7×10^-2 | 0.86
[Figure: cumulative frequency against out-degree and against in-degree for ECLIPSE, log-log axes]
► Similar characteristics to JDK
  The in-degree follows the power law
  The out-degree does not follow the power law
Experiment 2
Q. 2 Do the in- and out-degree distributions of a component graph for multiple software systems follow the power law?
Set up component sets
  Each set contains multiple software systems
  Use-relations across the systems exist
Analyze the component sets to create component graphs.
Plot the cumulative frequency of the degrees on log-log axes.
Component set | Description | # of components
ASF | Various projects checked out from the repository of the Apache Software Foundation | 59,486
SPARS_DB | The components stored in the database of demo.spars.info (includes ASF, JDK) | 180,637
Result of experiment 2 / ASF
# of Nodes: 59,486
# of Edges: 303,755
Degree | α | R*2
in-degree | 2.4 ± 1.1×10^-2 | 0.98
out-degree | 3.4 ± 6.4×10^-2 | 0.94
[Figure: cumulative frequency against out-degree and against in-degree for ASF, log-log axes]
► Similar characteristics to Exp. 1
  The in-degree follows the power law
  The out-degree does not follow the power law
Result of experiment 2 / SPARS_DB
# of Nodes: 180,637
# of Edges: 1,808,982
Degree | α | R*2
in-degree | 2.0 ± 1.5×10^-3 | 1.00
out-degree | 3.7 ± 7.0×10^-2 | 0.90
[Figure: cumulative frequency against out-degree and against in-degree for SPARS_DB, log-log axes]
► Similar characteristics to Exp. 1
  The in-degree follows the power law
  The out-degree does not follow the power law completely
► The in-degree distribution fits the power-law straight line almost ideally.
Experiment 3
Q. 3 Do the in- and out-degree distributions of a subgraph of a component graph for software systems follow the power law?
Construct subsets of SPARS_DB
  Keyword: the components that contain a specified keyword in the source code
    The keywords are randomly selected so that the number of resulting components is about 1,000/10,000
  Random: 1,000/10,000 random components
Analyze the component sets to create component graphs.
Plot the cumulative frequency of the degrees on log-log axes.
Component set | Description | # of components
KWD1K | The components that contain "labels" | 1,002
KWD10K | The components that contain "getstring" | 8,938
RND1K | Randomly selected 1,000 components | 1,000
RND10K | Randomly selected 10,000 components | 10,000
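Constructing such a subset amounts to taking the subgraph induced by the selected components. The sketch below illustrates the random variant under assumptions not stated in the slides: nodes are numbered 0..n-1, edges are `{from, to}` integer pairs, and `SubgraphSampler` with its methods is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

/** Induced subgraph of a component graph; names and edge encoding are illustrative. */
class SubgraphSampler {
    /** Picks k distinct node ids out of 0..n-1 using the given random source. */
    static Set<Integer> randomNodes(int n, int k, Random rnd) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < n; i++) ids.add(i);
        Collections.shuffle(ids, rnd);
        return new HashSet<>(ids.subList(0, k));
    }

    /** Keeps only the edges {from, to} with both endpoints in the selected set. */
    static List<int[]> inducedEdges(List<int[]> edges, Set<Integer> selected) {
        List<int[]> kept = new ArrayList<>();
        for (int[] e : edges) {
            if (selected.contains(e[0]) && selected.contains(e[1])) kept.add(e);
        }
        return kept;
    }
}
```

An edge survives only if both of its endpoints are selected, so under uniform selection of k out of N nodes roughly a fraction (k/N)^2 of the edges remains. For 1,000 of the 180,637 components in SPARS_DB that is about 55 of its 1,808,982 edges, which is a back-of-the-envelope estimate rather than a figure from the slides.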
Result of experiment 3 / KWD1K
# of Nodes: 1,002
# of Edges: 1,564
Degree | α | R*2
in-degree | 2.2 ± 3.3×10^-2 | 0.98
out-degree | 3.7 ± 2.0×10^-1 | 0.93
[Figure: cumulative frequency against out-degree and against in-degree for KWD1K, log-log axes]
► Similar characteristics to SPARS_DB
  The in-degree follows the power law
  The out-degree does not follow the power law
Result of experiment 3 / KWD10K
# of Nodes: 8,938
# of Edges: 24,317
Degree | α | R*2
in-degree | 2.1 ± 9.3×10^-3 | 0.99
out-degree | 3.4 ± 2.7×10^-1 | 0.93
[Figure: cumulative frequency against out-degree and against in-degree for KWD10K, log-log axes]
► Similar characteristics to SPARS_DB
  The in-degree follows the power law
  The out-degree does not follow the power law
Result of experiment 3 / RND1K
# of Nodes: 1,000
# of Edges: 52
Degree | α | R*2
in-degree | 2.3 ± 1.8×10^-1 | 0.93
out-degree | N/A | N/A
[Figure: cumulative frequency against in-degree and against out-degree for RND1K, log-log axes]
► The original characteristics are almost lost
Result of experiment 3 / RND10K
# of Nodes: 10,000
# of Edges: 6,184
Degree | α | R*2
in-degree | 1.9 ± 2.1×10^-2 | 0.98
out-degree | 4.3 ± 3.3×10^-1 | 0.91
[Figure: cumulative frequency against out-degree and against in-degree for RND10K, log-log axes]
► Similar characteristics to SPARS_DB; however, the # of edges is small
Experiment 4
Q. 4 What aspects of components affect the in- and out-degree distributions of component graphs?
List the top-ten components by in- and out-degree
Calculate the correlation between degrees and metric values.
  Spearman's rank correlation coefficient
Target: SPARS_DB
Metric | Description
LOC | Non-comment source lines of code
WMC1 | A variation of weighted methods per class (WMC); weight of a method: constant value (1)
WMC2 | A variation of WMC; weight of a method: cyclomatic complexity
LCOM | A variation of lack of cohesion of methods: LCOM5
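Spearman's rank correlation coefficient is the Pearson correlation of the ranks of the two variables. A minimal sketch follows; the class `Spearman` and its methods are hypothetical names, and giving tied values the average of their ranks is the usual convention but an assumption about what the study did.

```java
import java.util.Arrays;

/** Spearman's rank correlation via Pearson correlation of average ranks; illustrative only. */
class Spearman {
    /** Ranks start at 1; tied values share the average of their positions. */
    static double[] ranks(double[] v) {
        int n = v.length;
        Integer[] idx = new Integer[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        Arrays.sort(idx, (a, b) -> Double.compare(v[a], v[b]));
        double[] r = new double[n];
        int i = 0;
        while (i < n) {
            int j = i;
            while (j + 1 < n && v[idx[j + 1]] == v[idx[i]]) j++;
            double avg = (i + j) / 2.0 + 1; // average of the 1-based positions i+1..j+1
            for (int k = i; k <= j; k++) r[idx[k]] = avg;
            i = j + 1;
        }
        return r;
    }

    /** Pearson correlation of the two rank vectors; both inputs need some variation. */
    static double correlation(double[] x, double[] y) {
        double[] rx = ranks(x);
        double[] ry = ranks(y);
        int n = x.length;
        double mx = 0, my = 0;
        for (int i = 0; i < n; i++) {
            mx += rx[i];
            my += ry[i];
        }
        mx /= n;
        my /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            sxy += (rx[i] - mx) * (ry[i] - my);
            sxx += (rx[i] - mx) * (rx[i] - mx);
            syy += (ry[i] - my) * (ry[i] - my);
        }
        return sxy / Math.sqrt(sxx * syy);
    }
}
```

Because only ranks enter the computation, the coefficient is insensitive to the heavy tail of the degree values: a component with an in-degree of 100,000 counts no more than the next one in rank order.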
Result of experiment 4 / In-degree
Top-ten components
  The components that have a fundamental/general role
Correlation with metrics
  In-degree has low correlation with the metrics
The in-degree relates to the role
Rank | Name | LOC | In-degree | Out-degree
1 | java.lang.String | 675 | 116,239 | 21
2 | java.lang.Object | 35 | 98,261 | 4
3 | java.lang.Class | 605 | 29,682 | 41
4 | java.lang.Exception | 15 | 21,046 | 2
5 | java.lang.Throwable | 136 | 19,519 | 12
6 | java.lang.System | 170 | 19,175 | 27
7 | java.util.Iterator | 5 | 15,522 | 1
8 | java.util.List | 27 | 14,462 | 4
9 | java.util.ArrayList | 200 | 13,656 | 19
10 | java.lang.Integer | 285 | 12,736 | 9
Correlation of in-degree with:
Out-degree | LOC | WMC1 | WMC2 | LCOM
0.00 | 0.07 | 0.24 | 0.08 | 0.12
Result of experiment 4 / Out-degree
Top-ten components
  Simply large/complex classes
Correlation with metrics
  High correlation with LOC and WMC
The out-degree relates to the size/complexity of a component
Rank | Name | LOC | In-degree | Out-degree
1 | org.apache...FunctionEval | 364 | 1 | 354
2 | org.jgraph.GPGraphpad | 2,196 | 130 | 255
3 | com.jgraph.GPGraphpad | 2,200 | 131 | 253
4 | org.jgraph.GPGraphpad | 542 | 209 | 252
5 | org.eclipse...ASTConverter | 4,520 | 3 | 223
6 | org.eclipse...JavaEditor | 1,368 | 115 | 220
7 | net.sourceforge...GanttProject | 3,055 | 98 | 216
8 | it.businesslogic...MainFrame | 7,177 | 46 | 204
9 | org...InstConstraintVisitor | 1,626 | 3 | 197
10 | org...ASTInstructionCompiler | 2,449 | 1 | 189
Correlation of out-degree with:
In-degree | LOC | WMC1 | WMC2 | LCOM
0.00 | 0.82 | 0.64 | 0.75 | 0.39
Answers: summary of experiments [1/4]
Q. 1 Do the in- and out-degree distributions of a component graph of a software system follow the power law?
  The in-degree follows the power law
  The out-degree does not follow the power law
    Mixture of the power-law distribution and the lognormal distribution
[Figure: cumulative frequency plots of out-degree and in-degree from experiment 1, log-log axes]
Answers: summary of experiments [2/4]
Q. 2 Do the in- and out-degree distributions of a component graph for multiple software systems follow the power law?
  The in-degree follows the power law
  The out-degree does not follow the power law
    Results similar to those for single software systems
[Figure: cumulative frequency plots of out-degree and in-degree from experiment 2, log-log axes]
Answers: summary of experiments [3/4]
Q. 3 Do the in- and out-degree distributions of a subgraph of a component graph for software systems follow the power law?
  Depends on how the subgraph is created.
  A keyword-based subgraph has characteristics similar to those of the superset
    Related components likely share words
  A random-selection-based subgraph with a small number of nodes has different characteristics
    • Few edges exist.
[Figure: cumulative frequency plots of out-degree and in-degree for KWD1K, KWD10K, RND1K and RND10K, log-log axes]
Answers: summary of experiments [4/4]
Q. 4 What aspects of components affect the in- and out-degree distributions of component graphs?
  In-degree relates to the roles of components
    Most components are used only in a specific part
    Components with a fundamental/general role are used from everywhere
      The larger the component set grows, the larger their in-degree becomes.
  Out-degree relates to the size/complexity of components
    Many components have reasonable size/complexity
    Some components may have relatively large size/complexity
    Extremely large components are unreasonable
Summary
Component graphs were investigated to see whether the in- and out-degree distributions follow the power law
The results reveal the following characteristics.
  The in-degree distribution follows the power law
    The in-degree of a component relates to the role of the component
  The out-degree distribution does not follow the power law
    The out-degree of a component relates to the size/complexity of the component
  Some kinds of subgraphs of a component graph have the same degree-distribution characteristics as the whole graph.
Future work
  Explore other types of component graphs
Discussion
Generative models of a power-law graph
  When a node is added to a graph, the nodes with large degree tend to receive the edge from the new node: "rich get richer"
Meanings for component graphs
  When a new component is added to (developed for) a software system, the new component uses the components that are already used by many components
  The members of the set of frequently used components hardly change even as the software development proceeds
    If the members change, it means that the fundamental structure (design, architecture) of the software has changed
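The "rich get richer" idea can be illustrated with a toy growth process. This is a simplified sketch under loud assumptions, not a model taken from the study: every new node adds exactly one outgoing edge, the target is drawn with probability proportional to (in-degree + 1), and `PreferentialAttachment` and `grow` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Toy "rich get richer" growth; a simplification, not the model used in the study. */
class PreferentialAttachment {
    /**
     * Grows a graph to n nodes. Each new node adds one outgoing edge to an existing node
     * chosen with probability proportional to (in-degree + 1).
     * Returns the in-degree of every node.
     */
    static int[] grow(int n, Random rnd) {
        int[] inDegree = new int[n];
        // Each node sits in the urn once for itself plus once per incoming edge,
        // so a uniform draw from the urn is a draw proportional to (in-degree + 1).
        List<Integer> urn = new ArrayList<>();
        urn.add(0);
        for (int v = 1; v < n; v++) {
            int target = urn.get(rnd.nextInt(urn.size()));
            inDegree[target]++;
            urn.add(target); // the target gained an incoming edge
            urn.add(v);      // the new node enters with base weight 1
        }
        return inDegree;
    }

    public static void main(String[] args) {
        int[] in = grow(100_000, new Random(42));
        int max = 0;
        for (int d : in) max = Math.max(max, d);
        System.out.println("largest in-degree: " + max);
    }
}
```

Feeding the returned in-degrees into a cumulative log-log plot is one way to see how a few early, general-purpose nodes accumulate most of the incoming edges while the majority stay near zero.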