Modularity in Design: Formal Modeling & Automated Analysis
Yuanfang Cai
Longhorn is late
“With each patch and enhancement, it became harder to strap new features onto the software, since new code could affect everything else in unpredictable ways” ---The Wall Street Journal
“60-m-lines-of-code mess of spaghetti” ---Financial Times
What we have known for decades
Low coupling, high cohesion [Constantine 1974]
Information hiding [Parnas 1972]
Open to extension, closed to modification
Seek to modularize for parallel implementation, change accommodation, …
Success depends on designers’ intuition and experience.
What we still don’t do well
Intuition and experience do not prevent
  unexpected dependencies
  modularity decay
  delay in bringing software to market
It remains difficult to
  estimate the consequences of a change
  analyze options to accommodate a change
  make decisions with significant consequences
We seek
Description: Why are some architectures more adaptive than others?
Prediction: What is going to happen if the requirements change?
Prescription: What is the best way to accommodate a change? Shall we refactor?
A Formal Model and Theory
General enough
  Spans language paradigms
  Spans the software lifecycle
Explicitly represents decisions
  Design is a decision-making procedure [Alexander 1970]
Computable
Scalable
Captures the essence of informal principles
We seek
We first need an analyzable design representation
Roadmap
An Emerging Approach
  Design rule theory [Baldwin 2000]
  Design structure matrices (DSM)
  How DSM and DR explain
Augmented Constraint Network
  Formal model as the basis of automation
  DSM derivation and design impact analysis
  Splitting formal designs
  ACN in practice
Future work
“Design Rules: The Power of Modularity” [Baldwin 2000]
Design Rules and Modular Operators
Modeling: Design Structure Matrix (DSM) [Steward81, Eppinger91]
Economic Analysis: Net Option Value (NOV)
Design Structure Matrix
  Design variables
  Dependences
  Proto-modules
Design Rule Theory & Design Structure Matrices
Design Rule Theory
  Design rule
  Splitting
  Substitution
  Modules create options
  Net option value analysis
How DR and DSM Explain
The characteristics of a good design
  Clearly defined design rules
  Blocks along the diagonal model modules
  Modules create options
  No off-diagonal dependencies among blocks
    Informally, low coupling
    Formally, splitting and substitution
  Small blocks
    Informally, high cohesion
    Formally, more options, higher value
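The "no off-diagonal dependencies among blocks" criterion can be checked mechanically once a DSM and a clustering are written down. The following is a minimal sketch, not the author's tooling: the variable names, the toy DSM, and the function `cross_module_dependencies` are all hypothetical, and dependencies on design rules are assumed to be acceptable.

```python
def cross_module_dependencies(dsm, modules, design_rules):
    """Return (variable, dependency) pairs that cross module boundaries.

    Dependencies on design rules are skipped: modules are expected to
    depend on the design rules that decouple them (an assumption here).
    """
    module_of = {var: i for i, module in enumerate(modules) for var in module}
    crossing = []
    for var in sorted(dsm):
        for dep in sorted(dsm[var]):
            if dep in design_rules:
                continue
            if module_of[var] != module_of[dep]:
                crossing.append((var, dep))
    return crossing


# Hypothetical toy design: each variable maps to the variables it depends on.
dsm = {
    "interface": set(),
    "parser": {"interface"},
    "lexer": {"interface", "parser"},
    "renderer": {"interface"},
}
modules = [{"interface"}, {"parser", "lexer"}, {"renderer"}]

# The same design after an unexpected dependency has crept in.
decayed = {**dsm, "renderer": {"interface", "lexer"}}

print(cross_module_dependencies(dsm, modules, {"interface"}))
print(cross_module_dependencies(decayed, modules, {"interface"}))
```

An empty result corresponds to a block-diagonal DSM below the design rules; every reported pair is an off-diagonal mark between blocks.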
How DR and DSM Explain
Tomcat: Why is it successful? What is the key enabler?
Server 2: Why refactor? Is the refactoring successful?
Linux and Mozilla [MacCormack et al. 2006]
Tomcat
version                   v3.3.1  v4.0  v4.1.31  v5.0.28
Tomcat-main change ratio  1.0     1.1   0.5      0.4
Jasper change ratio       0.2     0.2   0.5      0.5
Tomcat DSM
  Classes as variables
  Reverse-engineered dependencies
  A DSM for each version
Why is it successful? It allows different rates of evolution in different modules.
What is the key enabler?
Server 2: Before and After Refactoring
Mozilla Evolution
The Power of Description
The indicators of healthy evolution
  Fewer off-diagonal dependencies among blocks
    Informally, low coupling
    Formally, splitting and substitution
  Smaller blocks
    Informally, high cohesion
    Formally, more options, higher value
Intuitively or unconsciously, a good designer
  defines design rules
  splits
  substitutes
Models at Design Level
Retrospective conclusions are not sufficient
Designers need to make decisions at early stages
Changes start from requirements
First Attempt: Design DSM
[Figure: (A) Sequential Design DSM, NOV = 0.26]
Variables: X - Computer, Y - Corpus, Z - User, A - In Type, D - Circ Type, G - Alph Type, J - Out Type, B - In Data, E - Circ Data, H - Alph Data, K - Out Data, C - In Alg, F - Circ Alg, I - Alph Alg, L - Out Alg, M - Master
[Figure: (B) Information Hiding Design DSM, NOV = 1.56]
Variables: X - Computer, Y - Corpus, Z - User, N - Line Type, A - In Type, D - Circ Type, G - Alph Type, J - Out Type, O - Line Data, P - Line Alg, B - Input Data, C - Input Alg, E - Circ Data, F - Circ Alg, H - Alph Data, I - Alph Alg, K - Out Data, L - Out Alg, M - Master
“The Structure and Value of Modularity” [SWC01]
First Attempt: Design DSM
General: Object-Oriented (OO), Aspect-Oriented (AO) [SGSC05] [Lopes05]
Generalized Information Hiding Interface
Make Information Hiding Criterion Precise: Design Rules are Invariant to Environment Change
Analyze Software Quantitatively (Net Option Value Analysis)
Design Level DSM Limitations
Ambiguous: what is a “dependence”?
Can’t represent possible choices: Input condition? Core size?
Design impact analysis? What if x changes from x1 to x2? In how many ways?
1. Variables: design dimensions
2. Values: possible choices
3. Constraints: relations among decisions
Constraint Network
input_ds:{core4,disk,core0,other};
envr_input_size:{small,medium,large};
input_ds = disk => envr_input_size = large;
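The constraint network above can be explored by brute force when it is this small. The sketch below enumerates every assignment of the two variables and keeps those that satisfy the single implication. It is only an illustration: the function names are hypothetical, and a real tool would presumably hand the constraints to a solver instead of enumerating.

```python
from itertools import product

# Variables and their possible values, copied from the slide.
variables = {
    "input_ds": ["core4", "disk", "core0", "other"],
    "envr_input_size": ["small", "medium", "large"],
}


def disk_needs_large_input(assignment):
    # input_ds = disk => envr_input_size = large
    return assignment["input_ds"] != "disk" or assignment["envr_input_size"] == "large"


constraints = [disk_needs_large_input]


def solutions(variables, constraints):
    """Enumerate all assignments that satisfy every constraint."""
    names = list(variables)
    result = []
    for values in product(*(variables[name] for name in names)):
        assignment = dict(zip(names, values))
        if all(check(assignment) for check in constraints):
            result.append(assignment)
    return result


consistent = solutions(variables, constraints)
print(len(consistent), "consistent designs out of", 4 * 3)
```

Each surviving assignment is one consistent design; the implication rules out `disk` combined with a `small` or `medium` input size.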
[Figure: Information Hiding Design DSM over variables X, Y, Z, N, A, D, G, J, O, P, B, C, E, F, H, I, K, L, M]
1. Constraint Network
2. Dominance Relation: Design rule, Environment
3. Clustering
Augmented Constraint Network
(input_impl, input_ADT), (input_impl, input_format)
[Figure: Information Hiding Design DSM over variables X, Y, Z, N, A, D, G, J, O, P, B, C, E, F, H, I, K, L, M]
Environment: {envr_input_format, envr_core,…}
Design Rules: {input_ADT, circ_ADT…}
Analyzable Models
1. Constraint Network
DesignSpace matrix{
client:{dense, sparse};
ds:{list_ds, array_ds, other_ds};
alg:{array_alg, list_alg, other_alg};
ds = array_ds => client = dense;
ds = list_ds => client = sparse;
alg = array_alg => ds = array_ds;
alg = list_alg => ds = list_ds;
}
2. Dominance Relation
{(ds, client), (alg, client)}
3. Clustering
Environment Cluster: {client}
Design Cluster: {ds, alg}
Analyses
  Design Change Impacts
  Precise Dependence
  DSM Analyses
Design Automaton
  Change Dynamics
  Design Space
  Design Evolution
Design Automaton
[Figure: design automaton with states S1 to S6, one per solution of the constraint network]
States:
  client = dense, ds = array_ds, alg = array_alg
  client = sparse, ds = list_ds, alg = list_alg
  client = dense, ds = array_ds, alg = other_alg
  client = sparse, ds = list_ds, alg = other_alg
  client = dense, ds = other_ds, alg = other_alg
  client = sparse, ds = other_ds, alg = other_alg
Transition labels: client = sparse; client = sparse, alg = other_alg; client = sparse, ds = other_ds; ds = list_ds; alg = other_alg
Properties: 1. Non-deterministic; 2. Minimal perturbation; 3. Respects dominance relation
Design Impact Analysis
[Figure: the design automaton with states S1 to S6, highlighting transitions labeled client = sparse; client = sparse, alg = other_alg; client = sparse; ds = other_ds]
Precise Definition of Pair-wise Dependence: DSM Derivation
[Figure: DSM derived from the design automaton over 1. client, 2. ds, 3. alg]
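A brute-force sketch of the design automaton and the DSM derivation for the matrix example follows. It rests on assumptions that the slides do not spell out: a dominance pair (u, v) is read as "a change to u must not force a change to v", minimal perturbation is read as "the set of additionally changed variables is minimal under set inclusion", and a variable x is taken to depend on y when some change to y forces x to change in some destination state. All function names are hypothetical.

```python
from itertools import product

DOMAINS = {
    "client": ["dense", "sparse"],
    "ds": ["list_ds", "array_ds", "other_ds"],
    "alg": ["array_alg", "list_alg", "other_alg"],
}
# ASSUMPTION: (u, v) means a change to u must not force a change to v.
DOMINANCE = {("ds", "client"), ("alg", "client")}


def consistent(s):
    """The four implications of the matrix constraint network."""
    return ((s["ds"] != "array_ds" or s["client"] == "dense")
            and (s["ds"] != "list_ds" or s["client"] == "sparse")
            and (s["alg"] != "array_alg" or s["ds"] == "array_ds")
            and (s["alg"] != "list_alg" or s["ds"] == "list_ds"))


NAMES = list(DOMAINS)
SOLUTIONS = [s for s in (dict(zip(NAMES, vals))
                         for vals in product(*DOMAINS.values()))
             if consistent(s)]


def destinations(state, var, value):
    """States reachable from `state` when `var` is changed to `value`."""
    frozen = {v for (u, v) in DOMINANCE if u == var}
    candidates = []
    for s in SOLUTIONS:
        if s[var] != value or any(s[f] != state[f] for f in frozen):
            continue
        changed = frozenset(n for n in NAMES if n != var and s[n] != state[n])
        candidates.append((changed, s))
    # Minimal perturbation: drop candidates that change a strict superset
    # of the variables changed by another candidate.
    return [s for changed, s in candidates
            if not any(other < changed for other, _ in candidates)]


def pairwise_dependence():
    """Pairs (x, y): some change to y forces x to change."""
    depends = set()
    for state in SOLUTIONS:
        for var, values in DOMAINS.items():
            for value in values:
                if value == state[var]:
                    continue
                for dest in destinations(state, var, value):
                    for other in NAMES:
                        if other != var and dest[other] != state[other]:
                            depends.add((other, var))
    return depends


start = {"client": "dense", "ds": "array_ds", "alg": "array_alg"}
for dest in destinations(start, "client", "sparse"):
    print("client = sparse ->", dest)
print(sorted(pairwise_dependence()))
```

Under these assumptions the network has six consistent states, a change of `client` to `sparse` from the dense/array design has three possible destinations, and `client` never appears as a dependent variable because the dominance relation freezes it whenever `ds` or `alg` drives the change.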
Simon
[Figure: overview of the Simon tool]
User Input
Modeling: Augmented Constraint Network (Constraint Network, Dominance Relation, Cluster Set)
Derive: Design Automaton, Pair-wise Dependence
Analysis: Design Impact Analysis, Design Structure Matrices, Net Option Value, other DSM analyses (scheduling, cycle detection...)
KWIC Regenerated
Sequential Design (SQ), Information Hiding Design (IH)
Information Hiding Reformulated
[Figure: design automata (a) KWIC SQ DA and (b) KWIC IH DA, with transitions for changes C1 to C5]
Design Impact Analysis
(A) Sequential Design (B) Information Hiding Design

Changes                          SQ  IH
C1 envr_input_format = new       1   1
C2 envr_input_size = large       7   2
C3 envr_input_size = small       0   0
C4 envr_alph_policy = partial    3   2
C5 envr_alph_policy = search     3   2

Variables shown on the transitions in the figure:
  SQ: input_impl; alph_ds, alph_impl, output_impl; input_ds, alph_ds, circ_ds, input_impl, circ_impl, alph_impl, output_impl
  IH: input_impl; alph_ds, alph_impl; linestorage_ds, linestorage_impl
Scalability Issue
Constraint Solving
Explicit Solution Enumeration
Our approach: using design rules (dominance relation) to split logical constraints
Model Decomposition
1: linestorage_impl = orig => linestorage_ADT = orig && linestorage_ds = core4;
2: linestorage_ds = core4 => envr_input_size = medium || envr_input_size = small;
3: linestorage_ds = core0 => envr_input_size = small && envr_core_size = large;
4: linestorage_ds = disk => envr_input_size = large;
5: circ_ds = copy => envr_input_size = small || envr_core_size = large;
6: circ_impl = orig => circ_ADT = orig && circ_ds = index && linestorage_ADT = orig;
(1) Construct CNF Graph
(2) Cut Edges According to Dominance Relation
(3) Create Condensation Graph
(4) Compose Sub-ACN
Construct CNF Graph
(¬linestorage_impl = orig ∨ linestorage_ADT = orig) ∧
(¬linestorage_impl = orig ∨ linestorage_ds = core4) ∧
(¬linestorage_ds = core4 ∨ envr_input_size = medium ∨ envr_input_size = small) ∧
(¬linestorage_ds = core0 ∨ envr_input_size = small) ∧
(¬linestorage_ds = core0 ∨ envr_core_size = large) ∧
(¬linestorage_ds = disk ∨ envr_input_size = large) ∧
(¬circ_ds = copy ∨ envr_input_size = small ∨ envr_core_size = large) ∧
(¬circ_impl = orig ∨ circ_ADT = orig) ∧
(¬circ_impl = orig ∨ circ_ds = index) ∧
(¬circ_impl = orig ∨ linestorage_ADT = orig)
Construct CNF Graph
[Figure: CNF graph with nodes envr_input_size, envr_core_size, circ_ds, linestorage_ds, circ_impl, linestorage_impl, linestorage_ADT, circ_ADT; example clauses (¬circ_ds = copy ∨ envr_input_size = small ∨ envr_core_size = large) and (¬linestorage_ds = core0 ∨ envr_input_size = small)]
(1) Construct CNF Graph
(2) Cut Edges According to Dominance Relation
Construct Condensation Graph
[Figure: condensation graph over envr_input_size, envr_core_size, linestorage_ADT, circ_ADT, linestorage_ds, linestorage_impl, and the merged node {circ_ds, circ_impl}]
Line Storage Function sub-ACN: envr_input_size, envr_core_size, linestorage_ADT, linestorage_ds, linestorage_impl
Circular Shift Function sub-ACN: envr_input_size, envr_core_size, linestorage_ADT, circ_ADT, circ_ds, circ_impl
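The four decomposition steps can be sketched on the six constraints above. This is an illustration under stated assumptions, not the author's algorithm as implemented. The dominance relation for this fragment is not given on the slides, so the code assumes that every `_ds` and `_impl` variable is dominated by the environment variables and the ADT variables. Strongly connected components are found by plain mutual reachability, which is adequate only for small graphs, and all names are hypothetical.

```python
from itertools import permutations

ENV_AND_RULES = ["envr_input_size", "envr_core_size", "linestorage_ADT", "circ_ADT"]
SUBORDINATE = ["linestorage_ds", "linestorage_impl", "circ_ds", "circ_impl"]
# ASSUMPTION: (u, v) means u cannot force v to change.
DOMINANCE = {(u, v) for u in SUBORDINATE for v in ENV_AND_RULES}

# Variables mentioned by each CNF clause (duplicate variable sets merged).
CLAUSES = [
    {"linestorage_impl", "linestorage_ADT"},
    {"linestorage_impl", "linestorage_ds"},
    {"linestorage_ds", "envr_input_size"},
    {"linestorage_ds", "envr_core_size"},
    {"circ_ds", "envr_input_size", "envr_core_size"},
    {"circ_impl", "circ_ADT"},
    {"circ_impl", "circ_ds"},
    {"circ_impl", "linestorage_ADT"},
]


def build_edges(clauses, dominance):
    """Steps 1 and 2: connect co-occurring variables, then cut dominated edges."""
    edges = set()
    for clause in clauses:
        for u, v in permutations(clause, 2):
            if (u, v) not in dominance:
                edges.add((u, v))  # u may influence v
    return edges


def reachable(start, edges):
    """All nodes reachable from `start`, including `start` itself."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for u, v in edges:
            if u == node and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def decompose(clauses, dominance):
    """Steps 3 and 4: condense into components, then build one sub-ACN per sink."""
    variables = set().union(*clauses)
    edges = build_edges(clauses, dominance)
    reach = {v: reachable(v, edges) for v in variables}
    components = {frozenset(w for w in reach[v] if v in reach[w]) for v in variables}
    sub_acns = []
    for component in components:
        member = next(iter(component))
        if reach[member] <= component:  # sink: influences nothing outside itself
            sub_acns.append(frozenset(w for w in variables if member in reach[w]))
    return sub_acns


for sub in decompose(CLAUSES, DOMINANCE):
    print(sorted(sub))
```

With the assumed dominance relation, the two sub-ACNs share the environment variables and `linestorage_ADT`, and differ only in the variables that are private to each function.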
KWIC Decomposed
Information Hiding
Sequential Design
Result Integration
[Figure: solutions of one sub-ACN (L0, L2, L3; variables 1 to 5) and of the other (C0, C1; variables 1, 2, 3, 6, 7, 8) are integrated into full solutions over variables 1 to 8]
Full solutions shown:
  1: envr_input_size = medium
  2: envr_core_size = small
  3: linestorage_ADT = orig
  4: linestorage_ds = core4
  5: linestorage_impl = orig
  6: circ_ADT = orig
  7: circ_ds = index
  8: circ_impl = orig

  1: envr_input_size = large
  2: envr_core_size = small
  3: linestorage_ADT = orig
  4: linestorage_ds = disk
  5: linestorage_impl = other
  6: circ_ADT = orig
  7: circ_ds = core4
  8: circ_impl = orig

  1: envr_input_size = large
  2: envr_core_size = small
  3: linestorage_ADT = orig
  4: linestorage_ds = other
  5: linestorage_impl = other
  6: circ_ADT = orig
  7: circ_ds = core4
  8: circ_impl = orig
Design Impact Analysis
Input 1: Original Design
Input 2: A Change (envr_input_size = large)
Output
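Integrating the results of separately solved sub-ACNs can be pictured as a join on the variables they share. The sketch below assumes that this is how integration works, which the slides do not state explicitly; the function `integrate` and the grouping of the partial solutions are hypothetical, while the variable values are taken from the full solutions listed above.

```python
def integrate(left_solutions, right_solutions):
    """Merge partial solutions that agree on every shared variable."""
    merged = []
    for left in left_solutions:
        for right in right_solutions:
            shared = left.keys() & right.keys()
            if all(left[name] == right[name] for name in shared):
                merged.append({**left, **right})
    return merged


# Partial solutions over variables 1 to 5 (line storage side).
line_storage = [
    {"envr_input_size": "medium", "envr_core_size": "small",
     "linestorage_ADT": "orig", "linestorage_ds": "core4", "linestorage_impl": "orig"},
    {"envr_input_size": "large", "envr_core_size": "small",
     "linestorage_ADT": "orig", "linestorage_ds": "disk", "linestorage_impl": "other"},
]
# Partial solutions over variables 1, 2, 3, 6, 7, 8 (circular shift side).
circular_shift = [
    {"envr_input_size": "medium", "envr_core_size": "small",
     "linestorage_ADT": "orig", "circ_ADT": "orig", "circ_ds": "index", "circ_impl": "orig"},
    {"envr_input_size": "large", "envr_core_size": "small",
     "linestorage_ADT": "orig", "circ_ADT": "orig", "circ_ds": "core4", "circ_impl": "orig"},
]

full = integrate(line_storage, circular_shift)
for solution in full:
    print(solution)
```

Only pairs that agree on `envr_input_size`, `envr_core_size`, and `linestorage_ADT` are combined, so the four possible pairings yield two full solutions here.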
Result Integration
Pair-wise Dependence Relation
Generalizability: WineryLocator
6 Main Functions
5 “Crosscutting” Functions
No Crosscutting
Generalizability: HyperCast
Vodka Case Study
VODKA Organizational Device for Keeping Assets (VODKA)
  An online financial management system for student societies on campus
  Three-tier and service-oriented architecture
  Follows software engineering standards
    Requirement
    Design
    Implementation
    Testing
    Iterative process
Modeling and Analysis
Decisions that span the overall lifecycle
  Standard Requirement Specification
  Design Document
  Testing Plan
Decomposition
  Modules (Responsibility Assignments)
  Traceability Analysis
  Changeability Analysis
Proactively Control Design Evolution
Model Decisions in Requirements
Model Decisions in Design
Model Relations among Decisions at Different Stages
Model Testing Decisions
Model Dominance Relations
Model Clustering
162 variables in total
Vodka Design Analysis: Decomposition
Independent Responsibility Assignments
  Find parts that are not well modularized
  Find an incomplete testing plan
  An error in a sequence diagram
  …
Vodka Design Analysis Results
Vodka Design Analysis: Traceability and Changeability
Vodka Case Study Summary
  Model decisions that span the software lifecycle
  Automatically split the design into independent responsibility assignments
  Identify big modules that need to be further decomposed
  Automatic traceability and changeability analysis
  Proactively control design evolution
A Formal Model and Theory
Description: Design Structure Matrix
Prediction: Design Impact Analysis (a preliminary step)
Prescription: Decision-tree Analysis
General enough
  Spans language paradigms
  Spans the software lifecycle
Explicitly represents decisions
Computable
Scalable
Captures the essence of informal principles
Analyzable design representations
  Design Structure Matrix (DSM)
  Augmented Constraint Network (ACN)
Future Work
Design Level Modularity vs. Source Code Modularity
Make Complex Design Decision
Further Evaluation
Questions?