xml schema metrics - university of...
Post on 11-Apr-2018
228 Views
Preview:
TRANSCRIPT
60-510 Literature Review and SurveyWinter 2008
XML Schema Metrics
Instructor: Dr. Richard Frost
Yanyin Zhang102537020
zhang13c@uwindsor.ca
1
CONTENTS
ABSTRACT...................................................................................................................31. INTRODUCTION.....................................................................................................42. General Discussion.................................................................................................5
2.1 The purpose of XML Schemas.........................................................................52.2 Why XML Schema Metrics.............................................................................5
3. Survey of Research................................................................................................63.1 Metrics for XML document collections...........................................................63.2 Metrics for DTDs.............................................................................................8
3.2.1 DTD Metrics...................................................................................93.2.2 DTD vs. XML Schema....................................................................9
3.3 Metrics for XML Schema...............................................................................113.3.1 Structural Metrics........................................................................113.3.2 XML Schema Usage.....................................................................123.3.3 XML Schema Complexity and Quality Index...............................143.3.4 A Quality Model for XML Schema Evaluation.............................153.3.5 Metrics for XML Complexity........................................................163.3.6 Metrics for XML Schema Complexity..........................................163.3.7 Summary of Research on XML Schema and DTD.......................17
3.4 Metrics for Software Component...................................................................193.4.1 Component Metrics to Measure Component Quality...................193.4.2 Software Component Reusability Metrics....................................213.4.3 Software Component Reusability Metrics....................................223.4.4 Components Interface Metrics for Reusability............................233.4.5 Metrics for the Integration of Software Components...................253.4.6 Summary of Research on software component metrics................27
3.5 Metrics for Objected Oriented Design...........................................................293.5.1 Summary of Research on Object Oriented metrics......................32
3.6 Metrics for Semantic Web Schemas...............................................................323.6.1 Summary of Research on Semantic Web metrics..........................33
4. Conclusions............................................................................................................335. Acknowledgements...............................................................................................34REFERENCES...........................................................................................................35APPENDIX..................................................................................................................42
Annotations on the Milestone Papers...................................................................42
2
ABSTRACT
With the increasing application of XML as a popular data-format in different
communities, the XML Schema is becoming a central component in software
construction projects. The application of schemas varies from being the interface
definitions, or protocol specifications, to being the input to some code generators for
software components. The wide application of schemas has created the necessity of
research in schema metrics for the software engineering process. Particularly, schema
metrics should be developed to enable quantification of schema size, complexity, quality
and other properties for controlling the software processes. Before the application of
XML Schema, DTD was the data-exchange standard in XML Storage, XML Publishing,
and optimization of XML queries. But the shortcomings of DTD limit its further
application and it was eventually superseded by XML Schema. The metric studies of
DTDs are insightful for XML schema metrics research. Besides, XML Schema being as
an artifact of software component, the metrics in software components which started over
three decades ago can provide meaningful illustrations for XML Schema. In this paper, a
comprehensive survey on the research into XML Schema Metrics, DTDs and software
component metrics is reviewed.
Keywords: XML; Schema metrics; DTDs, software components;
3
1 INTRODUCTION
XML schemas provide a consistent way to validate XML documents. The XML Schema
grammar specifies a language that constrains and documents corresponding XML. The
greatest strength of an XML schema is that each schema defines a “contract”. This
contract provides the foundation for an application to accept or reject XML data before
operating on that data. XML provides a grammar for parsing a particular file or stream
format. XML schemas provide a mechanism for specifying more extensive grammar
constraints. An XML Schema specifies valid elements and attributes in an XML instance.
Furthermore, a schema specifies the exact element hierarchy [Binstock et al. 2002]. Also,
a schema specifies options and various other constraints placed on the XML instance. In
addition, a schema specifies a range for the value of an element or attribute. Suppose you
have the XML instance shown in Figure 1. It consists of a product element that has two
children (number and size) and an attribute (effDate). Figure 2 shows a schema that
describes the instance. It contains element and attribute declarations that assign data types
and element type names to elements and attributes.
Figure 2 Product schema [Binstock et al. 2002, 21]
4
<product effDate=”2001-04-02”><number>557</number><size>10</size>
</product>
Figure 1 Product instance [Binstock et al. 2002, 21]
<xsd: schema xmlns : xsd=http://www.w3.org/2001/XMLSchema>
<xsd: element name=”product” type=”ProductType”/><xsd: complexType name=”ProductType”> <xsd: sequence> <xsd: element name=”number” type=”xsd: integer”/> </xsd: sequence> <xsd: attribute name=”effDate” type=”xsd:date”/></xsd: complexType><xsd: simpleType name=”SizeType”> <xsd: restriction base=”xsd: integer”> <xsd: minInclusive value=”2”/> <xsd: maxInclusive value=”18”/></xsd: restriction>
</xsd: simpleType></xsd: schema>
2 GENERAL DISCUSSIONS
2.1 The purpose of XML schemas
XML has developed rapidly from its original idea, which is an electronic data interchange
standard, To use in the new type of distributed application of Web Service which use
XML documents for their data representation, XML Schema has become very
convenient for the definition of interface specifications, definition of data models,
protocol specifications and so on. Meanwhile, XML schema is a powerful tool for
contracts specification between different business subjects in the E-business industry and
it is beneficial in verifying an XML document to determine if it is valid according to a
defined set of rules, being a contract with trading partners, providing documentation
about the data in an XML instance, adding some data to the instance, inserting default
and fixed values for elements and attributes, normalizing whitespace according to the
data type and providing a way for additional information about the data to be supplied to
the application when processing a particular type of document. These advantages benefit
XML Schema in its role in software development process and need to be quantified for
ease of maintainability in its development and use.
2.2 Why XML Schema metrics?
All XML Schema applications related to XML documents need to be properly designed,
so that they can be easily maintained for the purpose of XML data to be effectively and
properly used by various applications. Further, schema metrics must be developed to
enable quantification of schema size, complexity, quality and the other properties. In
software engineering, XML and XML Schema documents also have a great impact on the
overall quality of the software. The metrics on them, for predicting quality and
complexity of the software development process, are important components.
5
3 SURVEY RESEARCH
Not much research that deals with schema quality and complexity metric. But the
research in software component metrics and object-oriented metrics has been studied
thoroughly. Since XML Schema also can be said to be software components, in this
survey, I will start the investigation from xml and XML Schema metrics, and then survey
software component metrics.
3.1 Metrics for XML document collections
This part of the survey presents the metrics for XML documents, two papers focus on this
subject.
There is little research of metrics for XML documents. The most relevant research was
done by [Klettke et al. 2002]. In it, they developed a set of five metrics for Document
Type Definition (DTD) documents to measure the complexity of XML documents. It was
an important first step in XML metrics research. But it focused mainly on complexity
rather than quality. In this paper, the authors did not develop a new quality model, but
used the ISO 9126 quality model [58] and concentrated on the characteristics usability
and maintainability. The first metric proposed in this paper is the size. The software
metric of LOC (lines of code) was applied and developed to evaluate the number of
elements, attributes, entities and notations in the metric. The second metric is to evaluate
the structural complexity. The McCabe complexity [McCabe 1976] was adopted and
added additional edges in the DTD graph to derive a new metric. The third metric
developed in the paper is structure depth. Since a DTD can be represented as a graph.
From the graph, the depth of the leaf nodes on the graph is 0. The depth of all other nodes
is the maximal depth of all child nodes plus 1. The depth of the whole graph is equal to
the depth of the root node. The fourth metric proposed is the fan-in metric. Different from
the depth structure, this metric determines the “width” of each element declaration. It
expresses how many child elements and attributes it has. This value correlates with the
number of child nodes in the DTD graph. The formula for this metric is:
is child node of n}|. The fifth metric is the fan-out metric. It is the
6
counterpart of the fan-in metric. It is derived from the DTD graph and it counts how often
every element occurs in the DTD declaration or how often an element declaration is
reused. The formula for it is: is parent node of n}|. Both the fan-in and
fan-out metrics can be calculated using an adjacency matrix. In these five metrics, size
and complexity can be applied to complete DTDs. The metrics of depth, fan-in and fan-
out can be determined for each element and attribute of a DTD and for the complete
graph.
In [Mlynkova et al 2006], the authors make a thorough investigation of existing XML
data and their real complexity and structure. Different from the formal definition in
computer programming of O(n) or O(n2), the complexities here means counting the
number of each component in a file with its related weight value. The authors gathered
more than a total of 20 GB of real XML collections and studied the relationship between
schemas and their instances. The total number of collected XML documents is 16534 and
the number of XML collections is 133. The maximum size of a document is 1.3 MB. The
median size of a document is 10 KB. In the total collection, the documents with DTDs
accounts for 74.6%, and documents with XSDs is 38.2%. The collection is categorized
into data-centric documents, document-centric documents, documents for data exchange,
reports, research documents and semantic web documents. The authors use 10 logical
categories for better understanding the analysis. Their global statistics considers the
overall properties of XML data such as number of elements of various types, number of
attributes, paths, depths and portion of text in documents. The authors claim that their
results show most of the documents are constructed simply using only a small number of
distinct element and attribute names (usually less than 150). The maximum depth
exceeded 20 only for few specific, recursive instances. The maximum depth of
corresponding schemas is higher but has a maximum around 80. For depth statistics, it
shows that the typical depth is always less than 10. This confirms the previous works
[Choi 2002, Geert et al. 2004] related to maximum depth of XML data. In level statistics,
it focuses on distribution of elements, attributes, text nodes, and mixed contents at each
level of XML documents. The numbers show that the highest amounts of analyzed nodes
are always at the first levels and then the number of occurrences rapidly decreases. The
7
steep exponential decrease ends around level 20 and then the drop is much slower and
shows more fluctuations. For fan-out statistics, the numbers show that the characteristics
of the graph are similar at each level but with the growing depth it gets thinner. The fan-
in statistics are just an “inverse” of the fan-out characteristic. It indicates that the highest
values of fan-in naturally occur in the doc and ex categories which show more
complicated schema definitions than the rest. The recursive statistics deal with types and
complexity of recursion. The results conclude that XML Schemas cannot be taken as the
only source of information for benchmarking though they are reliable. There is a mixed-
content statistic which further analyzes the structure and complexity of mixed contents
more deeply. They focus on average and maximum depth of mixed content and the
percentage of simple mixed-content elements. The authors claim that the structure of
mixed-content elements is not complex. The average depth is less than 5 and most of
them are simple types which consist only of trivial subelements. Another category is
relational statistics. This pattern can be easily processed in a relational database, or using
relational approaches because they always are a product of various database export
routines. The final category is schema statistics. This statistic analyzes XML Schema-
specific constructs. From the analysis in this paper, the authors conclude that many
pattern usages are not as complex as they are expected to be. In conclusion, the authors
suggest two ways of producing the XML documents, one is by flexible schema-driven
database mapping methods and the other one is to focus on an autoconfigurable XML
processing system.
3.2 Metrics for DTDs
The XML Schema language has a stronger capability than DTD to describe the
vocabularies of XML documents, and has a good chance of being the schema language of
the future for XML. The study on DTDs can provide a better understanding of XML
Schema design. This subsection presents the metrics study on DTDs. There are two
papers dealing with this problem.
8
3.2.1 DTD Metrics
In the paper [Choi 2002], the author surveyed 60 DTDs collected from the web and
provided statistics with respect to a variety of criteria. In the paper, the statistic study of
DTDs was categorized into local and global properties. For local properties, it focused on
the structure and complexity of the content model. It also studied the properties that
affected the parsing of the DTDs or its ambiguity. For global properties, it was focused
on some properties that might be important in the mapping of the XML into some
database format. In local properties, DTDs were grouped into app, data meta categories
based on their function. The first metric was the syntactic complexity. It defined the
depth function as a rough measure of this metric. Then the determinism of DTDs was
discussed. It applies this standard to analyze the DTDs and find out how many DTDs
were non-deterministic. Also the ambiguity of DTDs was explained. The ambiguity of
DTDs means that the the maping is not unique. For global properties, the reachability of
element was first discussed which means that for each node n, there is exactly one node
reachable from x along the path n. The author claims that separating the unreachable
parts in DTD into smaller DTDs appear to be a better design. Secondly, the recursion in
DTDs was defined. The recursion of the DTD means that it has an element which can be
reached by itself. The linear recursive means that it is recursive and for any reachable
element, this element occurs only once in a content model and the occurrence is not
enclosed in “*” or “+”. Following, the simple path, simple cycle, chain of Stars and hub
were introduced. The simple paths happened in non-recursive DTDs and the simple cycle
happens in recursive DTDs. The chain of Stars means the continuous stars are appeared
in the DTD content model. The hub means an element name with a large fan-in value.
The author collected the fan-in value for data DTDs and meta DTDs and represented
them on graph. The analysis in this paper provides a good reference for other people to
develop the research in XML.
3.2.2 DTD vs. XML Schema
In [Bex et al 2004], the authors claimed that DTDs has some shortcomings and inspected
a number of DTDs and XSDs to answer two questions. (1) Which of the extra feature
9
/expressiveness of XML Schema not allowed by DTDs are effectively used in practice?
To answer these two questions, the authors firstly discussed the expressiveness of XML
schema. They investigated whether the expressive power of single-type SDTDs was used
in real-world XSDs. The result showed that only about 15% of total investigated 30
XSDs are true single-type SDTDs. The reason for this is that expressiveness beyond local
tree languages is simply rarely needed. The other reason is that because the relatively
new nature of XML Schema and its complicated definition, most users have no clear
view on what can be expressed. Secondly, they discussed the derived types of XSDs.
Two kinds of XML Schema were provided: simple types and complex types. For XML
Schema, the new types can be created by two mechanisms: extension and restriction. The
author made a statistic analysis about application of these two mechanisms to simple and
complex types of XML schema. Thirdly, they discussed the additional features that XML
schema possesses. One feature that XML Schema has is the application of &-operator.
The other is that the utilization of ID attributes and referred to by IDREF or IDREFS for
elements in XML document. In addition, referring to elements can be expressed by
key/keyref pairs and the use of namespaces for modularity is the feature that XML
Schema differs from the DTDs. The last feature is the ability to redefine types and
groups. The fourth question discussed is about the regular expression characterization. It
also answers the second question introduced at the beginning of the paper that how
sophisticated regular expressions tend to be in real world DTDs and XSDs. The simple
expressions were widely used and it indicates the simplicity that the DTDs and XSDs
expressed. The fifth question discussed in the paper is about the schema and ambiguity.
The author applies the term of one-unambiguous to check whether the DTDs and XSDs
used in the paper satisfy the requirement. It is showed that almost all of them meet this
requirement. Only few of them are not. In final, the authors collected total 109 DTDs and
93 XSDs for study. The author applied each metric and standard introduced in the paper
to these samples. The results showed that there is no absolute advantage or disadvantage
that which schema has, this helps the reader to get a better understanding of what the
DTDs and XSDs is.
10
The conclusion made in this paper is that many features defined in the XML Schema
specification are not widely used, especially those that are related to object oriented data
modeling such as derivation of complex types by extension. More importantly, it turns
out that almost all XSDs are local tree grammars. The expressive power encountered in
real world XSDs is mostly equivalent to that of DTDs. Also it concluded that the data
type of XML Schema overcomes the shortcoming of DTDs that it has the ability to
specify the format and type of the text of an element by restriction of simple types. It can
give software engineer some suggestions to avoid the utilization of complex types when
developing new XML implementations.
3.3 Metrics for XML Schema
This subsection presents the studies of Metrics on XML Schema. There are 5 papers deal
with this subject.
3.3.1 Structural Metrics
With the development of software engineering, metrics for XML schemas are needed for
quantification of schema size, complexity, quality, and other properties, instrumental to
control the processes in which the schema re involved. The first definition of a suite of
metrics for the XML Schema language was provided by [Lammel et al. 2005], the
metrics proposed in that paper range from simple counters of various types of schema
nodes to more involved metrics such as McCabe, depth, and breadth.
11
Table 1. Overview of XML Schema metrics defined by Lammel et al. [Visser 2006, 2]
The authors in this paper collected 63 schema projects from different IT sectors and each
metric was applied to the schemas. The works done in this paper helped to introduce the
fundamental metrics for the XSD language and identified the basic feature model of the
XSD language at a basic level. At a more problem-specific level, it looked into problems
that are related to the so-called impedance mismatch in data-model mapping which is
relevant for XML data binding.
3.3.2 XML Schema usage
The metrics proposed in [Lammel et al. 2005] mainly focus on size metrics. As an
extension of this paper, the work in [Visser 2006] proposed a number of more advanced
schema metrics that can be applied to measure other properties than size. The metrics in
this paper are all defined over the graph representations of schema structure. Then they
are called structure metrics. The metrics defined in this paper were listed in following
table.
12
Table 2. Metrics defined in paper [Visser 2006, 8]
Some formulas used in this paper are:
Impurity: , e: denotes the edge and n: is the node;
Instability based on fan: ;
Instability based on coupling: ;
Coherence: ;
In the paper, the authors collected two megabyte file and applied the prototype tool of
XsdMetz to measure the metrics proposed in the paper. It gave some intuitions behind the
metrics and their potential use.
13
3.3.3 XML Schema Complexity and Quality index
In [McDowell 2004], the authors focused on the quality measurement of XML Schema
and the complexity measurement of conforming XML documents. The authors in this
paper proposed metrics based on the metrics enumerated in [Klettke et al. 2002], they
include
Number of Complex Type Declarations:
-Number of Text Complex Type Declaration
-Number of Element Complex Type Declaration
-Number of Mixed Complex Type Declaration
Number of Simple Type Declarations
Number of Annotations
Number of Derived Complex types
Average Number of Attributes per Complex Type
Number of Global Type Declaration
Number of Global Type Reference
Number of Unglobal Element
Number of Unbounded Element Multiplicity
Average Bounded Element Multiplicity
Average Number of Restrictions per Simple Type
Element Fanning
Table 3. Metrics defined and experimented in [McDowell 2004, 543]
Based on these metrics, two indices for measuring quality and complexity are proposed.
The formulae for calculating them are:
Quality index= (Ratio if simple to complex type declarations)*5+(Percentage of
annotations over total number of elements)*4+(Average restrictions per simple type
declarations)*4+(Percentage of derived complex type declarations over total number of
complex type declarations)*3-(Average bounded element multiplicity size)*2-(Average
attributes per complex type declaration)*2;
14
Complexity index= (Number of unbounded elements)*5+(Element fanning)*3+(Number
of complex type declarations)+(Number of simple type declarations)+(Number of
attributes per complex type declaration);
The quality index is intended to provide an indication of the quality of Schema
documents. The complexity index is used for XML documents that are validated by the
XML Schema. Both the values are used to compare with the average index calculated by
the analyzing tool and give a relative indication of quality and complexity.
3.3.4 A Quality model for XML Schema evaluation
Based on above XML Metrics, in [Sumak et al. 2007], the authors propose an XML
Schema quality model for the purpose of assuring a quality XML schema. The model is
composed of following factors.
Figure 2 XML Schema Quality Model [Sumak et al. 2007, 787]
These factors provide a basic framework for programmers when design and analyzing the
XML Schema documents for how to evaluate them a good quality.
15
3.3.5 Metrics for XML Complexity
In [Qureshi et al. 2005], the authors focused on different ways of determining the
complexity of XML documents based on different syntactic and structural aspects to
lower the complexity of XML documents and improve their reusability and
maintainability. The method proposed by others for measuring the complexity has a
weakness that sometimes it can not distinguish the two different documents which have
same expressions. To overcome this problem, the authors in this paper proposed a new
algorithm of Weight Allocation Algorithm (WA) by assigning weights to the elements of
XML trees according to their distance from the root node (element). Since XML
documents can be denoted as a tree representation by applying the DOM (Document
Object Model). Through a recursive pre-order traversal of the XML tree can get the
corresponding XML documents. The formula used to decide the complexity of a
document is: weight(i)=1, when i=root(ele(D)) and weight(i)=weight(parent(i))+1, when i
belongs to elem(D)-Root(elem(D) where D denotes a given document and elem(D) is the
collection of all elements contained within that document. This can be expressed by
During the calculation process, the value allocation is that the root element has weight
‘1’, all immediate successors of the root element have weight ‘2’, and so on. Weights are
assigned to each element in elem (D) on the basis from the root element. This algorithm
gives some ways of gauging the quality and comprehensibility of XML documents.
3.3.6 Metrics for XML Schema complexity
In [Basci 2007 et al], a new complexity metric for XML Schema Documents was
proposed. This metric was based on the internal architecture of the XSD components.
Different from other methods in measuring the complexity by counting the number of
schema components, the authors in this paper considered the design architecture of XSD.
Each component was assigned a weight value called complexity degree which reflects the
complexity of each component. The total complexity was obtained by summing up each
of component’s weight value. For a XSD, the formula for total complexity is:
. Here, is the total complexity value of all
unreferenced global elements and attributes that is assigned by the weight values of
16
reflected by their type complexity values. is the total complexity values of
unreferenced global elements and attributes group, and is the total complexity
values of unreferenced global complex and simple type definitions of XSD. Here, the
word “unreferenced” means components that have no reference made within any
component definitions of the current schema. In the calculation process of each variable,
the weight value is defined by the type of group definitions that may have different
complexity weight values. In the paper, the author illustrated the metric by applying it on
a wsdl file and calculated its complexity. The result show that this metric gave better
indication than the metrics which measures the complexity based on the counts of
schema’s each components.
3.3.7 Summary of Research on XML Schema and DTD
Authors Conference/Journal Paper Title Main Contributions
Arnaud Sahuguet Proceedings of WebDB
2000
Everything You Ever
Wanted to Know
About DTDs, But
Were Afraid to Ask
It explores some shortcomings of DTDs and how people actually (mis)understand them. It also give some replacement for this problem.
Byron Choi Proceedings of WebDB
2002
What Are Real DTDs
Like
It analyzes the structures
of XML and provides the
statistics on some
structures of real DTDs
Meike Klettke, Lars
Schneider, and Andreas
Heuer
Workshops on EDBT
2002
Metrics for XML
Document Collections
Five Metrics are proposed
to evaluate the Schema of
XML documents.
Geert Jan Bex, Frank
Neven, Jan Van den
Bussche
Workshop on the Web
and Databases 2004
DTDs versus XML
Schema: A Practical
Study
It explores the features of
XML Schema not allowed
in DTDs, and the
structural properties of
two foms
Andrew McDowell,
Chris Schmidt, Kwok-
Bun Yue
Proceedings of CSREA
Press 2004
Analysis and Metrics
of XML Schema
It proposes eleven metrics
to measure the quality and
complexity of XML
17
Schema
Ralf Lammel, Stan
Kitsis, Dave Remy
Proceeding of XML
2005 Conference
Analysis of XML
schema usage
It introduces essential concepts of XML schema analysis based on software code metrics. It analyzes the quantitative and qualitative information for XML Schema.
Mustafa H. Quereshi,
M. H. Samadzadeh
Proceedings of ITCC
2005
Determining the
Complexity of XML
Documents
It focuses on determining
the complexity of XML
documents based on
different syntactic and
structural characteristic.
Zi Lin, Bingsheng He,
Byron Choi
Proceedings of ER
2006
A quantitative Summa-
ry of XML Structures
It proposes some metrics
for major structural
properties of XML,
especially the nestings of
entities and one-to-many
relationships.
Irena Mlynkova, Kamil
Toman, and Jaroslav
Pokorny
Conference on
Management of Data,
2006
Statistical Analysis of
Real XML Data
Collections
Analyzing real XML data
and their structure, real
complexity
Joost Visser Proceedings of XATA
2006
Structure Metrics for
XML Schema
It proposes a suite of
metrics for XML Schema
to measure structural
properties.
Bostjan Sumak, Marjan
Hericko, Maja Pusnik
Proceedings of the ITI
conference, 2007
Towards a Framework
for Quality XML
Schema Evaluation
It analyzes the XML
metrics and proposes a
XML Schema quality
framework.
Dilek Basic, Sanjay
Misra
OOPSLA 2007 Complexity Metric for
XML Schema Docum-
ents
It proposes a metric based
on the internal architect-
ture of the XSD compon-
ents and considers the
complexities of its build-
ing components.
18
3.4 Metrics for Software components
XML Schemas are software artifacts that are claiming an increasingly central role in
software construction projects. Schemas have come into use as interface definitions, data
models, protocol specifications and more. With increased reliance on schemas comes the
necessity of properly embedding them in the software engineering process. In software
engineering, the designing XML Schemas play an important role in software
development process. There are lots of researches have been done in software metrics.
The study of software metrics may give some inspirations to XML Schema metrics. In
following part, some researches of software component metrics will be surveyed.
3.4.1 Component metrics to measure component quality
Component-based software development (CBD) requires a considerably different
approach from OO methods. OO methods develop systems by defining functional and
object models, by comparison, CBD methods utilizes commonality and variability (C&V)
analysis, components, component’s interfaces and relationships among components.
Because OO metrics only focus on objects or classes, but component consists of one or
more classes as well as interfaces. Therefore, various metrics developed for OO
programming can’t be applied to CBD equally [Cho et. Al 2001].
In [Cho et. Al 2001], the authors proposed metrics for measuring the complexity,
customizability, and reusability of software components. For measuring the complexity,
the authors proposed a new complexity metric by combining cyclomatic complexity
(McCabe) [MaCabe 1976] is called Component Complexity Metric (CCM). The CCM
was then classified into four kinds of complexity metrics: component plain complexity
(CPC), component static complexity (CSC), component dynamic complexity (CDC), and
component cyclomatic complexity (CCC). The formula for calculating these metrics are
following.
For CPC, , Where:
19
is calculated by counting classes, abstract classes, interfaces and methods.
: the complexity of each class, and : the complexity of each method.
For calculating Component Static Complexity, it focuses on how complex the
component’s internal structure is. The formula is defined as:
, where Count(R) is the count of each relationship between
classes, and W(R): Weight value of each relationship.
The Component Dynamic Complexity measures the complexity of internal message in a
component with a dynamic view. The formula is defined as:
, here, the formula represents the complexity f each interface
method.
The Component Cyclomatic Complexity is used after the component implementation is
finished. The formula is defined as:
, where is the sum of classes,
interfaces, and interface methods. is the sum of complexity of each classed
contained in a component, and is the sum of complexity of each interface
method. The cyclomatic component method is defined as the number of edges minus the
number of nodes plus 2.
For measuring the Reusability, two approaches were proposed. One is a metric that
measures how a component has reusability. The other is a metric that measures how a
20
component is reused in a particular application. For the former approach, the formula is
defined as:
, where Count(CCM) is the count of each interface method for
providing common functions among several applications in a domain. Count(CIM) is the
count of methods declared in interfaces provided by a component. The latter approach is
a metric to measure particular component’s reuse level per application in a CBSD. The
formula is defined as:
, where Reuse(C) is the line of code reused component
in an application. And Size(C) is the total lines of code delivered in the application. In the
paper, the authors applied the metrics into several projects in the banking domain for
evaluation. The results show that the proposed metrics help evaluates the development
and testing efforts. It also provides valuable information to component developers,
component assemblers, and application developers.
3.4.2 Software component reusability metric
In [Washizaki et al. 2005], a metrics suite for measuring the reusability of software
components was proposed. The novelty of this metrics suite compared to [Barnard et al.
1998] and [Cho et. Al 2001] lies that the author metrics defined with confidence intervals
that were set by statistical analysis based on a certain number of JavaBean components.
Also the authors combined the metrics proposed in this paper and provided a reusability
model for indentifying black-box components with high reusability. The metrics defined
are: 1) Existence of Meta-Information (EMI), measures whether the BeanInfo class
corresponding to the target component is provided. 2) Rate of Component Observability
(RCO), is a percentage of readable properties in all fields implemented within the Façade
class of a component. It indicates the component’s degree of observability for users of the
component. 3) Rate of Component Customizabilty (RCC), is a percentage of writable
21
properties in all fields implemented within a Façade class of a component. It indicates the
component’s degree of customizability for users of the component. 4) Self-Completeness
of Component’s Return Value (SCCr), is a percentage of business methods without any
return value in all business methods implemented within a component. It indicates the
component’s degree of self-completeness, and low degree of external dependency for
users of the component. 5) Self-Completeness of Component’s parameter (SCCp), is the
percentage of business methods without any parameters in all business methods
implemented within a component. It indicates the component’s degree of self-
completeness, and the low degree of external dependency for users of the component. In
final, a reusability metrics named Component Overall Reusability (COR) by combining
all above five metrics based on the reusability model. It indicates the component’s degree
of reusability for users of the component.
The component c is reusable if the value of COR(c) is larger than 0. It can be effectively
used for the component selection in terms of reusability when there are several black-box
components implementing the same specification.
In [Potaru et 1. 2005], the authors proposed reusability metrics of compose-ability and
adaptability for software metrics. Both these two metrics were initially designed to for
database components. The degree of compose-ability of a component is intended to
quantify the easiness in combining a component with others. This metric is qualitatively
defined by studying the parameters and return values of its interface methods. A software
component interfaced only by methods with no parameters and no return values has the
largest compose-ability degree because it doesn’t have any external data dependencies.
Another metric measures the ability of a component to accommodate changes in the
environment and it is evaluated by the complexity of its interface.
3.4.3 Usability metrics for software component
22
The same work has been done by [Bertoa et al. 2004], the authors presented a suite of
metrics for usability of software component based on the ISO 9126 Quality Model. The
authors in this paper pointed out that one of deficiencies that other metrics for software
component lies they don’t consider attributes, even they don’t associate any quality
characteristic to metrics themselves. So the metrics defined in this paper focused on these
two questions. The measurable concepts related to usability metrics are quality of
documentation, complexity of the problem and complexity of the solution. The metrics
for these three attributes are presented respectively. The advantage of propose the metrics
for attributes lies that metrics can measure attributes directly.
As an extension of [Bertoa et al. 2004], in [Bertoa et al. 2006], the authors presents a set
of measures to evaluate the usability of software component and how to validate these
measures. It also proves how the appropriate combinations of measures can evaluate
better the usability of a component than any individual measure. The metrics used in this
paper is proposed in [Bertoa et al. 2004], but the authors conducted experiments for
measuring the understandability, learnability and operability of a set of software
components. The results showed that the understandability depends on the structure and
organization of the manual as well as on the simplicity of the method’s signature. The
learnability depends both the quality of the manuals and the complexity of the
component’s design. The operability mainly depends on a combination of the amount of
information available about the component configurable parameters. The future work in
this paper is to conduct more experiments and to work on analysis model to define the
appropriate quality indicators for usability.
3.4.4 Components Interface metrics for reusability
In [Boxall et al. 2004], a set of metrics for measuring the understandability of component
interface is proposed. For measuring the reusability of components, using some public
and static properties are easier than some properties, like efficiency, safety which require
runtime environment for measurement. The authors in the paper collected 12 component
interfaces written in C and C++ and measured them thoroughly. The metrics proposed
include:
23
1) Interface size: Argument Per Procedure (APP) measures the mean size of procedure
declarations of an interface. It is defined as:
, where is the total count of procedures that are publicly declared by an
interface. is the total count of arguments of the publicly declared procedures.
2) Distinct Argument Count, it measures the consistency of the naming and typing of
arguments. It is defined as: , where A is the set of name-type pairs used as
arguments in an interface. The interfaces with lower DAC will have arguments that
are declared more consistently.
3) Argument Repetition Scale (ARS), it still measures the consistency of the naming and
typing of arguments. It is an alternative to DAC.
, A is the set of name-type pairs. |a| is the count of procedures in which
argument name-type a is used, and is the total count of arguments. ARS will be in the
range . Interface with higher ARS will tend to be dominated by fewer distinct
arguments which are repeated more often.
4) Mean String Commonality (MSC), it measures the commonality of a set of
identifiers. It is defined as:
, where A is a set of identifiers, n is the count of
identifiers, comb(A) computes the set of combinations of two elements from the set A.
lcs(x,y) computes the longest common subsequence of x and y. If the set of identifiers
has zero or one elemens, MSC is undefined. Sets of identifiers with higher MSC will
have more commonality which suggests that the identifiers have been named consistently
and indicates that a naming convention has been adopted.
24
5) Mean Identifier Length (MIL) is the weighted mean length of identifiers occurring in
the interface. Median Identifier Length (MeIL) is the weighted median length of
identifiers. It is believed that longer identifiers will tend to contain more information
and be more self-documenting. The results in the paper showed that MeIL appeared
to be better than MIL because it is resistant to outliers.
6) Reference Argument Density (RAD), it measures the occurrence of reference
arguments in an interface. It is defined as:
, where is the count of pass by reference arguments and is the total
Argument Count. A higher RAD indicates that the interface tends to be more difficult to
understand.
The interface metrics can increase the understanding of the reusability components. They
can provide some accurate and efficient information for reuse analysis because
sometimes, other source information is unavailable and they can be accessed
conveniently.
3.4.5 Metrics for the integration of software components
Lots of researches focused on reusability of software components. In [Narasimhan et al.
2004], the authors start from the reality and propose a suite of metrics to measure
complexity and criticality for the integration of software components. The complexity
metrics consist of component packing density (CPD) metrics and component interaction
density (CID) metrics. CPD metrics is defined as the form of a ratio of constituent to the
number of components. It is used to identify the density of integrated components. The
formula is:
, where #<constituent> is the number of
lines of code, operations, classes, and/or modules in the related components. The number
of constituent depends on the information that might come from the definition of
component. This is related to the information on the number of classes, number of lines
25
of code (LOC), number of modules, or number of operations for each component.
Interaction Density Metrics measures the interaction happens at one component’s
interface or through consuming other component’s events. It also happens when a
component submits an event and other components receive it. The interaction density
metric proposed in this paper is called Average Interaction Density (AID). To measure
AID, first the Interaction Density of a Component (IDC) is introduced as the ratio
between the actual numbers of interactions to the available interactions in a component.
So the AID is defined as: , where to is
interactions density for component 1 to n, and #components is the number of the existing
component in actual system. For these metrics, a component with high value of density
indicates the need to utilize high quality professionals to do the design. In this paper, also
criticality metrics for component were proposed. It measures the critical component
existed in software. In a system, a component is called critical if it is a link criticality,
bridge criticality, inheritance criticality and size criticality. The metrics defined for them
are: for links criticality, it is the sum of the number of links connected to a component.
For bridge criticality, it can be obtained from the incidence matrix and identify which
component functions as bridge. For inheritance criticality, the information can be got
from the inheritance tree. For size criticality, it is the number of lines of code, modules,
operations or classes. By getting this critical number, it provides a better understanding of
which component face higher risk than other components. It is an indication of the risk
associated with the component.
As an extension of [Narasimhan et al. 2004], in [Narasimhan et al. 2006], the authors
present two suites of metrics, one is static metrics which measure complexity and
criticality of component assembly. Here, the complexity is measured by using
Component Packing Density and Component Interaction Density metrics. Further, the
complexity and criticality metrics can be combined to form a Triangular Metric. The
complexity metrics can be categorized into two groups. One deals with component
packing density (CPD) and the other deal with component interaction density (CID)
metrics. The metrics are defined as: and
26
, respectively. The #constituent is the number of LOC, object/classes,
operations or modules. The #I is the number of actual interactions and #Imax is the number
of maximum available interactions. For criticality Metrics, four metrics are presented.
They are Link Criticality, Bridge Criticality, Inheritance Criticality and Size Criticality.
The authors define the Link criticality metrics as the number of components. The bridge
Criticality metrics as the number of bridge components. The Inheritance Criticality
Metrics as the number of root components which has inheritance and the Size Criticality
Metrics as the number of component which exceeds a given critical value. The final
Criticality Metrics is the sum of above four metrics. The other suite is Dynamic metrics
which are collected during the runtime of a complete application. The dynamic metrics
defined in the paper are: the number of Cycle (NC) metric, the number of Average
number of active components which is defined as the number of active component
divided by the time to execute the application, the active component density (ACD)
which is defined as the number of active components divided by the number of available
components. The fourth one is the average active component density (AACD) which is
defined as the where is the sum of ACD and Te is the time to
execute of a function. The fifth one is the Peak Number of Active Components, it is
defined as , where is the number of active component
at time n and is the time interval in seconds. The dynamic metrics are helpful to
identify super-component and to evaluate utilization of components.
Continually as an extension of [Narasimhan et al. 2006], in [Narasimhan et al. 2007], the
authors detailed illustrate the static and dynamic metrics proposed in [Narasimhan et al.
2006] and validated these metrics by using Weyuker’s properties. Most of these metrics
fulfill the Weyuker’s property criteria. Also the authors checked the impact of these
metrics in the context of McCall’s Quality Model. Therefore, it concluded that these
metrics can help component-based developers to identify complexity and criticality in an
integrated system.
3.4.6 Summary of Research on software component metrics
Authors Conference/Journal Paper Title Main Contributions
27
Eun Sook Cho, Min
Sun Kim, Soo Dong
Kim
Conference of APSEC
2001
Component Metrics to
Measure Component
Quality
It proposes metrics for
measuring the complexity,
customizability and
reusability of software
components
Manuel F. Bertoa,
Antonio Vallecillo
Workshop of
QAOOSE, 2002
Usability metrics for
software components
A suite of usability
metrics is defined based
on ISO 9126 Quality
Model
H Washizaki, H
Yamamoto and Y
Fukazawa
Proceedings of
METRICS 2003
A metrics suite for
measuring reusability
of software
components
It proposes 5 metrics to
measure a component’s
understandability,
adaptability, and portability
with confidence interval
MAS Boxall, Saeed
Araban
Conference of
ASWEC 2004
Interface Metrics for
Reusability Analysis of
Components
It presents a set of metrics
for measuring the
understandability and
reusability
VL Narasimhan, B
Hendradjaya
Workshop of WOSSA
2004
A new suite of metrics
for the integration of
software components
It presents two sets of
metrics to measure
complexity and criticality
of software systems
Geert Jan Bex, Frank
Neven, Jan Van den
Bussche
Workshop on the Web
and Databases 2004
DTDs versus XML
Schema: A Practical
Study
It explores the features of
XML Schema not allowed
in DTDs, and the
structural properties of
two foms
OP Rotaru, Marian
Dobre
Conference on
Computer Systems
and Applications,
2005
Reusability Metrics for
Software Components
It proposes metrics and a
mathematical model for
measuring the adaptability
and compose-ability
Manuel F. Bertoa, Jose
M. Troya, Antonio
Vallecillo
The Journal of
Systems and Software,
2006
Measuring the usability
of software components
It presents a set of
measures to assess the
usability of software
28
components for building a
quality model.
VL Narasimhan, B
Hendradjaya
Advances in Systems,
Computing Sciences
and Software
Engineering, 2006
Detailed theoretical
considerations for a
suite of metrics for
integration of software
components
It presents two set of
metrics based on static
and dynamic aspects of
component assembly
VL Narasimhan, B
Hendradjaya
Information Sciences,
2007
Some theoretical
considerations for a
suite of metrics for the
integration of software
components
It presents two set of
metrics based on static and
dynamic aspects of
component assembly and
evaluates them using
Weyuker’s set of properties
3.5 Metrics for Object oriented design
There are two papers in this paper deal with the OO metrics for software
In [Chidamber et. Al 1994], the authors proposed a new suite of theoretically and
mathematically based metrics for OO design. The proposed metrics include: a) Weighted
Methods per class (WMC), , here is the complexity of the method in
class. b) Depth of Inheritance tree (DIT). c) Number of Children (NOC), this is
represented by the number of immediate subclasses subordinated to a class in the class
hierarchy. d) Coupling between object classes (CBO), is a count of the number of other
classes to which it is coupled. e) Response for a Class (RFC). The value of this metric is
the value of the response set for the class and the response set is a set of methods that can
potentially be executed in response to a message received by an object of that class. f)
Lack of Cohesion in Methods (LCOM). The LCOM is a count of the number of method
pairs whose similarity is 0 minus the count of method pairs whose similarity is not zero.
It measures the inter-relatedness between portions of a program. A high LCOM value
indicates disparateness in the functionality provided by the class. To evaluate the metrics,
the authors applied the criterion proposed by Weyker for each metric. The work in this
29
paper laid a solid foundation of later software metrics analysis and same to the design of
software systems.
In [Barnard et al. 1998], the authors defined a new reusability metric that can be used to
give values for the reusability of OO code. Before the metrics were developed, the
authors considered all factors that might affect the reusability of OO softwares. All these
factors can be grouped into four categories, namely for classes, for attributes, for methods
and for input parameters. All of these factors are listed below:
Metrics for class
Table 4 metrics for class defined in [Barnard et al. 1998, 37]
Metrics for Attribute
Table 5 metrics for Attribute defined in [Barnard et al. 1998, 37]
Metrics for method
30
Table 6. Metrics for Method defined in [Barnard et al. 1998, 37]
Method input parameter metrics
Table 7 metrics for Method input parameter defined in [Barnard et al. 1998, 38]
But the experiments showed in the paper that the results conflicted with some published
works which said that the size and complexity of program are relevant to reusability. In
this paper, it showed that only certain factors are relevant to reusability. For example,
some reusable codes don’t necessarily have low code complexity. The authors then
suggested a “black box” approach which is only the interface important for reusability.
The new model is following:
31
Table 8 The new model for reusability metric defined in [Barnard et al. 1998, 46]
Then the reusability metric is defined as:
Where
or if
cm: the number of methods in the class; ca: the number of attributes in the class;
p: the number of input parameters of method i; CBO: the number of classes this class
coupled to; MNc: meaningful name; MDc: meaningful description; MC: method
coverage;
For class Low coupling: Low number of calls to foreign classes (should be zero) Low depth of inheritanceGood documentation:Meaninful class name and descriptions, including methods and attributes
For Attributes Simple type-few sub-types Meaningful name and descriptionFor methods Method must perform one function onlyLow number of calls to foreign classes (should be zero)
Method must be robust-full coverage for all input parameters Meaningful name and descriptionFor input parameters Simple type-few sub-types Meaningful name and description
32
PC: parameter complexity; NF: number of functions performed; NFC: number of calls to
foreign classes. This metric is helpful for programmers to write more reusable code when
working on software design.
3.5.1 Summary of Research on Object Oriented metrics
Authors Conference/Journal Paper Title Main Contributions
Shyam R. Chidamber,
Chris F. Kemerer
IEEE Transactions on
Software Engineering,
1994
A metrics Suite for
Object Oriented Design
This is a milestone paper.
It first proposes six
metrics for OO design
Judith Barnard Software Quality
Journal, 1998
A new reusability
metric for object-
oriented software
It identifies the factors
affecting the reusability
and proposes a reusability
metric for OO software
3.6 metrics for Semantic Web Schemas
The latest study of semantic web schemas is by [Theoharis et al. 2008], in which the
authors analyze the graph features of semantic web schemas (RDF/S). The main finding
in this paper is that the majority of SW schemas with a significant number of properties
approximate a power law distribution for total-degree. To investigate whether the
property exhibits a power-law degree distribution, the authors analyzed the power laws
on two different functions of a discrete random variable (DRV) X, the first one is
Complementary Cumulative Density Function (CCDF) which measures the frequencies
of X values. The second is denoted by Value versus Rank (VR), measures the
relationship between the ith biggest X value and its rank. The result showed that both
CCDF and VR showed a power-law distribution. This provides a good enlightment to
investigate the m of XML schema whether has the same property of power-law
distribution
3.6.1 Summary of Research on Semantic Web metrics
Authors Conference/Journal Paper Title Main Contributions
Y Theoharis, Y. IEEE Transactions on On graph features of It’s the first work
33
Tzitzikas, D.
Kotzinos and V.
Christophides
knowledge and data
engineering, 2008
semantic web schemas analyzing the property
graph for semantic web
schemas
4 CONCLUSIONSIn this survey I have reviewed the current status of XML Schema metrics research and
the metrics for software components. Different claims have been made about the metrics
for XML and XML Schema documents, and the research derived from the metrics for
software components which has been reached a high level. The studies on XML Schema
need to be further improved which mainly focus on the document size, structure,
complexity and reusability analysis. To develop a specialized metrics that evaluate XML
documents is one of future tasks [Klettke et al. 2002]. Schema analysis is a key factor in
software developing, schema-based data management and schema-aware XML
programming. Designing a suite of metrics for schema analysis to analyze its features,
styles, and others becomes an integral part of XML schema-development process, on the
basis of appropriate tool support. This is helpful for future software/schema developers to
express intentions about the complexity, features and styles when they consider them into
designing the schema or software components [Lammel et al. 2005]. Combining with the
latest study on semantic web schemas, such as RDF/RDFS, OWL to see whether the
XML Schema documents have a same property of power-law distribution will be a
subject of continuous research in my following semesters.
5 ACKNOWLEDGEMENTSI gratefully thank Dr. Richard Frost for his invaluable guidance and support I received is
this semester. He teaches me a lot on how to conduct a literature review and how to
compose a survey. I also sincerely express my thanks to my supervisor Dr. Jiaoguo Lu for
his suggestions and help on my research topic.
34
REFERENCES1 Klettke, M., Schneider, L., and Heuer A.. Metrics for XML Document Collections.
Proceedings of the Worshops XMLDM, MDDE, and YRWS on XML-Based Data
Management and Multimedia Engineering. Lecture Notes In Computer Science, 2490:15-
28, 2002.
2 Sahuguet A.. Everything You Ever Wanted to Know About DTDs, But Were Afraid to
Ask. In Third International Workshop WEBDB2000. Lecture Notes in Computer Science.
1997: 171–183, 2000.
35
3 McDowell A., Schmidt C., Yue K.. Analysis and Metrics of XML Schema.
Proceedings of the International Conference on Software Engineering Research and
Practice, CSREA Press(2004), 538-544, 2004.
4 Martens W., Neven F., Schwentick T., Bex GJ.. Expressiveness and complexity of
XML Schema. ACM Transactions on Database Systems (TODS). 31(3), 770-813, 2006.
5 Lammel R., Kitsis S., Remy D.. Analysis of XML schema usage. Conference
Proceedings XML, Atlanta, Georgia, U.S.A, 2005. Available as:
http://www.idealliance.org/proceedings/xml05/ship/49/paper.PDF
6 Bex GJ., Neven F., Bussche JV.. DTDs versus XML Schema: A Practical Study. In
Proceedings of the Seventh International Workshop on the Web and Databases, WebDB
Paris, France, 2004, 79-84, 2004.
7 What are real DTDs like? B. Choi., In Proceedings WebDB, Madison, Wisconsin, USA
2002, 43-48, 2002;
8 Mlynkova I., Toman K., Pokorny J.. Statistical Analysis of Real XML Data
Collections. Proceedings of the 13th International Conference on Management of Data,
Delhi, India, 2006. Available as: http://www.cs.cas.cz/semweb/download.php?file=06-
05-mlynkova-toman-pokorny&type=pdf
9 Mustafa H. Qureshi, M. Samadzadeh H.. Determining the Complexity of XML
Documents. Proceedings of the International Conference on Information Technology:
Coding and Computing (ITCC'05). 2, 416 – 421, 2005.
10 Basci D. and Misra S.. Complexity Metric for XML Schema Documents. Proceedings
of Object-Oriented Programming, Systems, Languages, and Applications, 2007.
Available as: http://boss.bekk.no/oopsla2007/papers/p5.pdf
36
11 Visser J.. Structure Metrics for XML Schema. Proceedings of XATA, 2006. Available
as: http://www.di.uminho.pt/~joostvisser/publications/StructureMetricsForXMLSchema.pdf
12 Binstock, C., Peterson D., Smith, M., Wooding, M., Dix, C., Galtenberg, C.. The
XML Schema Complete Reference. Addison Wesley Professional Publishers, 2002.
13 Chidamber, SR., Kemerer CF. A metrics suite for object oriented design. Software
Engineering, IEEE Transactions on. 20(6), 476 – 493,1994.
14 Washizaki H., Yamamoto H., Fukazawa Y.. A Metrics Suite for Measuring
Reusability of Software Components. Proceedings of the 9th International Symposium on
Software Metrics, 2003, 211 – 223, 2003.
15 Felix Michel. Representation of XML Schema Components. Master Thesis, School of
Information University of California, Berkeley, 2007.
16 Goulão M., Abreu FB.. Software Components Evaluation: an Overview. 5ª
Conferência da APSI, 2004. Available as: http://ctp.di.fct.unl.pt/QUASAR/Resources/
Papers/2004/goulaoCAPSI2004Final.pdf
17 Cho ES., Min Sun Kim MS., Kim SD.. Component metrics to measure component
quality. Software Engineering Conference on, 2001, Eighth Asia-Pacific, Macau, China,
APSEC 2001, 419- 426, 2001.
18 Hilbert DM., Redmiles DF.. Extracting usability information from user interface
events. ACM Computing Surveys (CSUR), 32(4), Pages: 384 – 421, 2000.
19 Kafura, D., Reddy, GR..The Use of Software Complexity Metrics in Software
Maintenance. Software Engineering, IEEE Transactions on, Volume: SE-13(3), 335-
343,1987.
37
20 Lanubile, F., Visaggio, G.. Maintainability via structure models and software metrics.
Software Engineering and Knowledge Engineering, 1992. Proceedings., Fourth
International Conference on, 590-599, 1992.
21 Lin, Z., He, BS., Choi, B.. A Quantitative Summary of XML Structures. Lecture Notes
in Computer Science, Conceptual Modeling –ER, 2006, 4215, 228-240.
22 Mayer, T., Halla, T.. Critical Analysis of Current OO Design Metrics. Software
Quality Journal, 8, 97-110, 1999.
23 Narasimhan, VL., Hendradjaya, B.. Theoretical Considerations for Software
Component Metrics. Proceedings of World Academy of Science, Engineering and
Technology, 10, 165-170, 2005.
24 Narasimhan, VL., Hendradjaya, B.. Detailed Theoretical Considerations for a Suite of
Metrics for Integration of Software Components. Advances in Systems, Computing
Sciences and Software Engineering, Springer Netherlands, 257-264, 2006.
25 Allen, EB., Gottipati, S., Govindarajan, R.. Measuring size, complexity, and coupling
of hypergraph abstractions of software: An information-theory approach. Software
Quality Journal, 15(2), pages:179-212, 2007.
26 Jeffrey S. Poulin. Measuring software reusability; Software Reuse: Advances in
Software Reusability, 1994. Proceedings., Third International Conference on, 126-138,
1994.
27 Rotaru, OP., Dobre, M.. Reusability metrics for software components. Proceedings of
the ACS/IEEE 2005 International Conference on Computer Systems and Applications,
24-31, 2005.
38
28 Narasimhan, VL., Hendradjaya, B.. Some theoretical considerations for a suite of
metrics for the integration of software components; - Information Sciences 177, 844-864,
2007.
29 Jianguo Lu, Ju Wang, Shengrui Wang. XML Schema Matching. International Journal
of Software Engineering and Knowledge Engineering, 17(5), 575-597, 2007.
30 Sumak, Bostjan, Hericko, Marjan, Pusnik, Maja. Towards a Framework for Quality
XML Schema Evaluation. Information Technology Interfaces, 2007. ITI 2007. 29th
International Conference on, 25-28 June 2007, 783 – 788, 2007.
31 Mignet, L., Barbosa, D. and Veltri, P..The XML web: a first study. In Proc. of
International World Wide Web Conf, Budapest, Hungary, May 2003, 500—510, 2003.
32 Bernauer, M., Kappel, G., Kramler, G.. Approaches to implementing active semantics
with XML schema. Database and Expert Systems Applications, 2003. Proceedings. 14th
International Workshop on, 1-5 Sept. 2003, 559 – 565, 2003.
33 Yijun Yu, Jianguo Lu, Jinghao Xue, Yi Zhang, Weiwei Sun. Localizing XML
Documents through XSLT. Applied Informatics, 1059-1064, 2003.
34 Kotsakis, E., Bohm, K..XML Schema Directory: a data structure for XML data
processing. Web Information Systems Engineering, 2000. Proceedings of the First
International Conference on, June 2000, 1(19-21), 62 – 69, 2000.
35 Torabi, T, Dillon, Th, Rahayu, W.. XML schema for software process framework.
21st IASTED International Multi-Conference on Applied Informatics, Innsbruck, Austria,
10-13 Feb. 2003, 948-954, 2003.
36 Ramanath, M.. Schema-based Statistics and Storage for XML. A Doctoral Thesis,
Indian Institute of Science, April, 2006.
39
37 Jianguo Lu, Yijun Yu, John Mylopoulos. A Lightweight Approach to Semantic Web
Service Synthesis. ICDE Workshop, International Workshop on Challenges in Web
Information Retrieval and Integration, Tokyo, 240-247, 2005.
38 Farooq, A., Braungarten, R., Dumke, RR.. An Empirical Analysis of Object-Oriented
Metrics for Java Technologies. 9th International Multitopic Conference, IEEE INMIC
2005, Dec. 2005, 1-6, 2005.
39 Narasimhan, VL., Hendradjaya, B.. A New Suite of Metrics for the Integration of
Software Components. The First International Workshop on Object Systems and oftware
Architectures (WOSSA'2004), Victor Harbor, South Australia, 2004. Available as:
http://www.cs.adelaide.edu.au/~wossa2004/HTML/09-bayu-paper.pdf
40 Washizaki, H., Yamamoto, Y., Fukazawa, Y.. Software Component Metrics and It's
Experimental Evaluation. Proc. of International Symposium on Empirical Software
Engineering (ISESE2002), Vol.II, 2002.
41 Fenton, NE., Neil, M.. Software metrics: roadmap. International Conference on
Software Engineering, 2000, 357 - 370 , 2000.
42 Basili, V.R., Briand, LC., Melo, WL.. A validation of object-oriented design metrics
as quality indicators. Transactions on Software Engineering, 22(10), 751-761, 1996.
43 Coleman, D., Ash, D., Lowther, B., Oman, P.. Using Metrics to Evaluate Software
System Maintainability. Computer, 27(8), pages: 44-49, 1994.
44 Yu, X., Lamb, DA.. Metrics applicable to software design. Annals of Software
Engineering, 1995 – Springer, 1(1), 23-41, 1995.
40
45 Weyuker, E.J.. Evaluating software complexity measures. IEEE Transactions on
Software Engineering, 14(9), 1357-1365, 1988.
46 McCabe, T.J.. A Complexity Measure. IEEE Transactions on Software Engineering,
Volume: SE-2(4), 308- 320, 1976.
47 Boxall, M., and Araban, S.. Interface Metrics for Reusability Analysis of Components.
Proceedings of the 2004 Australian Software Engineering Conference (ASWEC'04), 40-
51, 2004.
48 Power, JF., Malloy, BA.. A metrics suite for grammar-based software. Journal of
Software Maintenance and Evolution, John Wiley & Sons, Inc., 16(6), 405-426, 2004.
49 Purao, S., and Vaishnavi, V.. Product metrics for object-oriented systems. ACM
Comput. Surv., 35, 191-221, 2003.
50 Evanco, WM.. The confounding effect of class size on the validity of object-oriented
metrics. Software Engineering, IEEE Transactions on, 29(7), 670 - 672, 2003.
51 Etzkorn, L., Delugach, H.. Towards a semantic metrics suite for object-oriented
design. Technology of Object-Oriented Languages and Systems, 2000. TOOLS 34.
Proceedings. 34th International Conference on, 30 July-4 Aug. 2000. 71 – 80, 2000.
52 Barnard, J.. A new reusability metric for object-oriented software. Software Quality
Control, 7, 35-50, 1998.
53 Alkadi, G., Carver, D.L.. Application of metrics to object-oriented designs. Aerospace
Conference, 1998. Proceedings., IEEE; Volume 4, 21-28 March 1998, 159 – 163, 1994.
41
54 Gowda, R.G., Winslow, LE.. An approach for deriving object-oriented metrics.
Aerospace and Electronics Conference, 1994. NAECON 1994., Proceedings of the IEEE
1994 National, 23-27 May 1994. 897 – 904, 1994.
55 Xenos, M., Stavrinoudis, D., Zikouli, K., Christodoulakis, D.. Object-Oriented Metrics
– A Survey. Proceedings of the FESMA 2000. Available as: http://edu.eap.gr/pli/pli10
/info/old-xenos/2000_p04.pdf
56 Frakes, W., Terry, C.. Software reuse: metrics and models. ACM Computing Surveys
(CSUR), 28(2), 415 – 435, 1996.
57 Bertoa, MF., Troya, JM., and Vallecillo, A.. Measuring the Usability of Software
Components. Journal of Systems and Software, 79(3), 427-439, 2006.
58 http://www.sqa.net/iso9126.html
APPENDIX
ANNOTATED BIBLIOGRAPHY
A.1 [Visser 2006]
Problem to be solved: A suite of metrics are developed to measure the structural
properties of XML Schema.
Previous work referred to: Metrics developed by other authors to quantify the schema
include size, complexity, quality and other properties. The metrics developed in this
paper is an extension of the work done by [Lammel et al. Analysis of XML Schema
42
Usage] which proposed 11 size metrics. The metrics proposed in this paper focus on
measuring some properties other than size of Schemas.
New idea: The authors in this paper defined a graph to represent the schema structure.
The components in XML Schema are related to each other, based on this dependence, a
graph called successor graph representation of the schema structure can be obtained. In
this graph, the nodes represent the components (such as elements, types, groups and
attributes) and the edges are successor relations between two components. In the
successor graph, any nodes which are strongly connected can be combined into one
component as a specific notation of module. This kind of graph is called CG. Further,
considering the namespaces, schema documents, or global declarations/definitions, each
of them can be applied to group the nodes of a successor graph into modules. In this
paper, the global declarations/definitions are adopted to develop the GG graph.
The metrics proposed in this paper include: 1) Tree impurity . It indicates the extant a
given graph deviates from a tree structure with the same number of nodes. 2) Efferent
coupling, which is the number of edges from nodes outside the module to nodes inside
the module. 3) Afferent coupling, is the number of edges from nodes inside the module to
nodes outside .4) Fan-in; is the number of incoming edges of a node. 5) Fan-out: the
number of outgoing edges of a node. 6) Instability: it is defined as the fan-out fraction of
total fan which may be interpreted as resistance to change. 7) Coherence: it concerns the
degree to which its internal nodes are connected with each other. 8) Normalized count of
modules: is defined as a ratio of potential module count to the total node in the graph.
Value of 100% means that No grouping has occurred.. Correspondingly, a low
normalized count of modules indicates a higher level of encapsulation. 9) Count of nodes
per module: is the number of nodes in each module.
Experiments/analysis carried out: A prototype tool XsdMetz was implemented in the
functional programming language with functional graph representations and algorithms.
A series of freely available XML Schema up to two megabyte in ascii file size were
collected for analysis.
43
The Results of all metrics for each file were collected. Statistically, it shows that for all
schemas, the normalized count of modules for the strong components graph is close to
100%, which indicates the a very low degree of recursiveness. Besides, none f the
schemas have more than 3 groups of mutually dependent nodes and the schema for XML
Schema has a very large strong component. But the normalized count of modules for
global declarations/definitions graph has a relatively lower percentage which means the
low encapsulation. For tree impurity metric, it was analyzed on each kind of graphs. The
other metrics, such as fan, coupling, coherence and instability are only applied on each
node or each module only.
Claims/conlusions made: A suite of eleven structures for XML Schema has been
defined based on the graph representations of schema structures. The metrics suite was
implemented and the intuitions in metrics and their potential use in schema assessment
has been described. In future, the validation of these metrics will be focused and the
correlations between different metrics will be figured out.
Papers that refer to (cited) this paper: This paper is referred by 4 times. In this survey,
it is referred by: Dilek Basci and Sanjay Misra. Complexity Metric for XML Schema
Documents. OOPSLA 2007.
A.2 [Mustafa et al. 2005]
Problem to be solved: This paper proposed a new method to determine the complexity
of XML documents based on their structures and syntactic aspects. The purpose is to
lower the complexity of XML documents and improve their reusability and
maintainability by the new method.
Previous work referred to measure the software metrics ranges from simple measures,
such as counting the number of source lines of code, to the more sophisticated one, such
as the number of variable definition/usage associations. Some better known metrics
44
theory for measuring the software complexity are Crawford’s size metrics, McCabe’s
complexity, structural decomposition metrics and residual complexity metric. For
measuring the complexity of XML documents, since DTDs can be expressed as regular
expressions either by direct mapping of DTDs to regular expressions or by applying some
analyzing tools. Through assigning a certain value to existing symbols and alphabets in
the expression, the complexity of the regular expressions can be evaluated.
New algorithm: The previous work for measuring the complexity has a weakness that
sometimes it can not distinguish the two different documents which have same
expressions. To overcome this problem, the authors in this paper proposed a new
algorithm of Weight Allocation Algorithm (WA) by assigning weights to the elements of
XML trees according to their distance from the root node (element). Since XML
documents can be denoted as a tree representation by applying the DOM(Document
Object Model). Through a recursive pre-order traversal of the XML tree can get the
corresponding XML documents. The formula used to decide the complexity of a
document is: weight(i)=1, when i=root(ele(D)) and weight(i)=weight(parent(i))+1, when i
belongs to elem(D)-Root(elem(D) where D denotes a given document and elem(D) is the
collection of all elements contained within that document.
Experiments/analysis carried out for the algorithm is to test a suite containing 90
different XML documents which have various sizes, areas, nesting levels by
implementing a JAVA program.
Results obtained from the experiments include the total number of elements, number of
distance elements, document size, tree height and the complexity value. The results can
be expressed by applying graphical representations.
Claims/conlusions made in this paper is that some works should be focused on including
the WA algorithm still needs to be improved, some sophisticated methods to determine
the complexity if regular expressions need to be further explored, in-depth analysis of
XML Schemas to overcome the shortcoming of DTDs should be carried on.
45
Papers that refer to (cite) this paper in this survey is:
Complexity Metric for XML Schema Documents; Dilek Basci and Sanjay Misra;
OOPSLA 2007;
A.3 [McDowell et al. 2004]
Problem to be solved: In this paper, the authors only extended the work by other authors
and didn’t address a substantial problem.
Previous work referred to: [Klettke et al.,2002 ] developed a set of metrics for
measuring DTD documents. The work they did only focuses on the complexity of
documents. It has limitations for evaluating the XML documents.
New idea/algortihm/architecture etc. As an extension of the work done by [Klettke et
al. ], the authors in this paper considered the quality of XML schemas and proposed
eleven metrics for evaluating both the complexity and quality of XML documents. The
eleven metrics were: number of Complex type declarations, number of simple type
declarations, number of annotations, number of derived complex types, average number
of attributes per complex type declaration, number of global type declarations, number of
global type references, number of unbounded elements, average bounded element
multiplicity size, average number of restrictions per simple type declaration, element
fanning. Besides, two formulas were developed to calculate the complexity indices and
quality indices of a XML Schema document by considering all metrics and allocate the
different weight to different metrics.
Experiments/analysis carried out: in this paper, the author didn’t conduct the
experiment for analyzing the metrics. One comprehensive example was shown to give the
illustration of each metric. Also a Metric Analyzer tool was developed to process the
XML Schemas according to user’s needs and the output is metrics and corresponding
indices.
46
Claims/conlusions made: The metrics developed here is not enough to interpreting the
XML Schema. A more systematic approach and classification need to be developed for
different using situation of XML Schemas.
Papers that refer to (cited) this paper: This paper is referred 3 times. In this survey, it
is referred by:
Mlynkova I., Toman K., Pokorny J.. Statistical Analysis of Real XML Data Collections.
Proceedings of the 13th International Conference on Management of Data, 2006.
Basci D. and Misra S.. Complexity Metric for XML Schema Documents. Proceedings of
Object-Oriented Programming, Systems, Languages, and Applications, 2007.
A.4 [Lammel et al. 2005]
Problem addressed: The authors in this paper claimed that the XML analysis is derived
from the software analysis and of software code metrics. Since XML Schema itself can
be viewed as software component. Starting from this, this paper introduces some essential
concepts of XML schema analysis, and then they are applied to real-world XML schema
usage for a better understanding.
Previous work referred to: The XML schema metrics done by other people only
focused on some structural statistical collections, such as the size or number of element,
complexity and number of types, etc. In [A. McDowell et al 2004], a detailed description
of XML Schema was presented. In [D. Basci et al 2007], the complexity metrics of XML
schema documents was investigated. In that paper, considers the factors of elements and
attributes definitions /declarations as well as their group definitions /declarations to the
complexity of XSD. In this paper, the metrics used to are McCabe complexity [T.
McCabe et al 1976] and a metrics suite for software [J. Power et al 2004].
New algorithm/Analysis: The purpose of XML Schema analysis aims to extract
quantitative and qualitative information from actual XML schemas. In this paper, the
47
schema usage analysis was based on analyzing for feature counts, idiosyncrasy counts,
size metrics, complexity metrics, and XML schema styles. For XSD Metrics, The first
metrics proposed is XML-agnostic schema size. This metric measure the file size of all
XSD files included to a schema. The second is lines of code (LOC). This metric is
sensitive to line-break style and is sensitive to element-closing style. Both metrics can not
be expected to reveal any XSD-specific insight and they just provide baseline measures
for understanding the overall size category of a given schema project. The third metrics is
related to XSD-agnostic schema size, It includes the number of all XML nodes (both
attributes and elements) and the number of all XML nodes for annotation. (including
documentation and appinfos). The fourth one is XSD-aware counts. It counts the global
building blocks of schema. It includes the number of global element declarations, the
number of global complex-type definitions, the number of global simple-type definitions,
the number of global model-group definitions, the number of global attribute-group
declarations, the number of global attribute declarations and the sum of all of the above.
In practice, the sum of the number of global element declarations and the number of
global complex type definitions is often used in informal conversions. The fifth metric is
the McCabe complexity for XSD. Mcc is a structural complexity metrics and its typical
use is to approximate the psychological complexity of code. Typically, MCC measures
the number of linearly independent paths through the control-flow graph of a program
module. In XML schema, it can be adopted for the schema grammars. It is assumed that
the schema understanding process can be approximated by the number of decisions. The
important decision nodes in XML schema include: choices, occurrence constraints,
element references to substitution groups, type references to types that are extended or
restricted, the multiplicity of root element declarations and nillable elements. For each of
the above factors, the different value is counted when calculating the total MCC. The
sixth metric is the code-oriented breadth. This measures for the number of parties in a
content model. Here, the content model doesn’t necessarily include attributes. The parties
include local element declarations, element references, model-group references, base-
type references, local attribute declarations, attribute references and attribute-group
references. The seventh metric is the Instance-oriented breadth. It tries to approximate the
number of children in actual XML trees, and it must deference model-group and
48
attribute-group references. It measures the number of children and selectively includes
the attributes. The eighth metrics is the Code-oriented depth. It investigates the level of
nesting in content models and instances thereof. On the other hand, it is the depth of its
content model. The ninth metrics is Instance-oriented depth. It measures the height of
XML trees and it defines the greatest lower bound on the depth of XML trees that are
derivable from a given root-element declaration,.
New Experiment/Analysis: This article is a survey on XML schema usages. The authors
collected 63 schema projects from different IT sectors. Some of them come from
Microsoft Webdata/XML’s internal suite of business-critical schemas. The others were
scanned and obtained freely on the internet. All analysis was based on these schemas.
Result obtained: Each metric was applied to the schemas and the analyzing result was
obtained. The authors listed all result for each metric in a table in the paper and a brief
analysis was carried out. For the first and second metric, they just provide a basic
measures for understanding the overall size category of a given schema project. For
second metric, the amount of documentation varies greatly while the appinfos are only
used very sporadically. For XSD-aware counts metric, the number of highs and lows for
each measure was counted and for #CT measure, a #CT-based categorization table was
obtained as the #CT metrics is biased to the XML paradigm , just as much as the “number
of “classes” is biased to the OO paradigm. Also it measures the number of hierarchically
structured “concepts” in a schema. For MCC metric, the graph sowed that between MCC
and #NODE, there is a weak correlation and the additional information contributed by
MCC was indicated. For Code-oriented breadth, the result showed that half of the
schemas have at least one content model with more than 25 parties. The difference
between “parties including attributes” versus “parties excluding attributes” is visually
striking in at least 10 cases. And there are 11 schemas with a number of parties above
100. For instance-oriented breadth metric, the graph is much similar to that of code-
oriented breadth. For Code-oriented depth metric, the pure nesting of element
declarations is limited to 2 for the majority of content models. There is a single schema
that exercise element nesting up to the level of 14 and the full description depth is
49
systematically greater. For Instance-oriented depth metric, the result showed that
approximately half of all schemas define a maximum depth greater than 7. With a early
ceasing, there are a good number of schemas with a maximum depth greater than 5. It
was seen that much smaller depth values can be obtained if the glb-depth algorithm was
applied.
Conclusions made: This paper introduced all necessary concepts of XML schema
analysis and executed a detailed empirical study of schema usage. At the basic conceptual
level, it introduced the fundamental metrics for the XSD language and identified the basic
feature model of the XSD language. At a more problem-specific level, it looked into
problems that are related to the so-called impedance mismatch in data-model mapping, as
it is relevant for XML data binding. The authors in this paper envisaged that the future
schema developer can easily observe trends for schema metrics along development. At
the same time, the schema developer is able to express intentions about complexity,
features and styles so that warnings can be produced by schema editor when they are
violated.
This paper was referred by 7 times. In this survey, it is referred by:
Visser J.. Structure Metrics for XML Schema. Proceedings of XATA, 2006. Felix Michel.
Representation of XML Schema Components. Master Thesis, School of Information
University of California, Berkeley, 2007.
A.5 [Basci et al. 2007]
Problem to be solved: In this paper, the author proposed a new method to measure the
complexity of xml schema based on the internal architecture of the XSD components
rather than counting the number of schema’s each components.
Previous work referred to by the authors in this paper are some procedural metrics for
evaluating the complexity of DTD, such as LOC, McCabe, Fan-in and Fan-out proposed
by [Klettke et al. ]. Also [Mustafa et al. ] proved that the XML documents with higher
50
nesting levels have higher complexity and more complicated compared to the documents
with lower nesting levels. [Besides, Ralf et al ] measure the structural complexity of
XSDs with the metrics of Tree impurity, Efferent and Afferent Coupling, Instability,
Cohesion, Normalized Count of Modules and so on.
New idea proposed in this paper is that authors consider the complexity of each
independent component which means the internal complexity of each component is
closely related to that of whole xml document. Therefore, a weight value called
complexity degree for each component is assigned. The complexity of the schema
document is evaluated by summing up each of it’s component’s weight values. Based on
design style, a schema may have various number of components declared locally or
globally, or the elements and attributes may be defined or declared in group, or some
user-defined or built-in simple type and complex type definitions. All of above have an
impact on the complexity of XSD. A formula for how to calculate the complexity has
been developed and assigning the weights to each element and attribute according to their
definition type. Element’s weight value is 1 if the element is declared by using < any >
element and anytype. The attribute’s weight value is 1 if attribute is declared by <
anyAttribute > element. The weight value of a simple-type is defined to 1 if it is built-in
simple type.
Analysis carried out to demonstrate the author’s claim is by analyzing a wsdl document
and its XSD file. No other experiments are conducted to support the performance of this
method.
Claims/conlusions made in the paper is that the proposed metric works better than other
methods which are based on counting the number of components since the internal
architecture of XSD documents affect the overall complexity of document. To reliably
applying the metric, the schema quantification should be validated. Lacking of validation
for the metric is one shortcoming of this paper.
Papers that refer to (cite) this paper: N/A
A.6 [Lin et al. 2006]
51
Problem to be solved: In relational databases, statistical summaries mainly focus on the
distribution of data values. While this technique is applicable to the data values in XML
documents, new applied techniques are required for generalization of the structures of
XML documents. In this paper, the authors propose a suite of metrics for major structures
of XML documents, namely the nesting of entities and one-to-many relationships.
Previous work referred to: Some works have been done for summarizing the structures.
Some of their approaches are graph-based. Also some of them derived for estimating
selectivity of a query workload. There is method that counts the number of simple paths
in XML documents and determines the correlation between paths for estimating the
selectivity of a given query workload. In addition, the method that just proposes
statistical synopses for XML for path query selectivity estimation. Some properties of
real-world DTDs and XML Schemas are surveyed in [G.J.Bex et al. 2004] and [B.choi et
al. 2002].
New Metrics: The proposed metrics for XML structures in this paper are included
following. The number of paths, the length of a path, the number of star edges in the
prefix tree, the number of star edges of a path (implies the number of nested one-to-many
relationship in a tree), the number of recursive /non-recursive element types (measures
the number of recursive and non-recursive element types in a document), the number of
recursive elements of a path (quantifies the recursiveness of an XML document), the
number of a particular kind of star edges of a node (measures the multiplicity of each
kind of star edges of a node) and the skewness of the number of a particular kind of star
edges of a node.
Experiments/analysis carried out: There is no experiments in this paper, but the authors
apply the metrics on a few popular XML datasets. They are XML benchmark datasets,
NASA, SP and some synthetic datasets of XMARK, XBENCH. Here, the XMARK
datasets contain synthetic auction transactions and it allows users to vary the size of the
generated dataset by providing a scaling factor. The XBENCH dataset has four types,
52
they are TC/SD, TC/MD, DC/SD and DC/ND. Here, TC, DC, SD and MD stand for text-
centric, data-centric, single document and multiple documents respectively.
Results obtained: The results for the survey in this paper are listed as tables in the paper.
The authors apply different metrics on different datasets. They summarizes the result as
that the majority of surveyed XML datasets are mild generalization of relations, they are
not “tree-like”. Further, these datasets are highly skewed. All the datasets are gardly
hierarchical and contain a small number of recursions.
Claims/conclusions made: The metrics proposed in this paper are based on the
derivation of statistics from a prefix tree of XML structures and used simple paths and
star edges. These metrics answer the question of whether an SML structure is tree-like or
relation-like. The analysis results indicates that the structures of these XML documents
are highly skewed, non-hierarchical and mostly non-recursive which means the datasets
are relational-like.
Papers that refer to (cited) this paper: 1
A.7 [Bex et al. 2004]
Problem to be addressed: The authors in this paper claimed that DTDs has some
shortcomings and XML Schema is an extension of DTDs with a restricted form of
specialization. In this paper, the authors inspected a number of DTDs and XSDs to
answer two questions. (1) Which of the extra feature /expressiveness of XML Schema not
allowed by DTDs are effectively used in practice; and (2) how sophisticated are the
structural properties of two formalisms.
Previous work related to: This paper is an extension of the paper of [Choi 2004], in
which three types of schemas were proposed to identify features that are characteristic for
DTDs. They are application, data and meta-data related. Also the content model created
on that paper was used in this paper to consider XSDs beside DTDs. In addition, from a
53
structural view, the DTDs and XSDs were defined by using tree languages. The
specialized DTD as well as single-type SDTD were defined in the paper using the tree
languages.
New algorithm/analysis: The first problem discussed by the author is the expressiveness
of XML schema. It investigated whether the expressive power of single-type SDTDs was
used in real-world XSDs. The result showed that only about 15% of total investigated 30
XSDs are true single-type SDTDs. The reason for this is that expressiveness beyond lical
tree languages is simply rarely needed. The other reason is that because the relatively
new nature of XML Schema and its complicated definition most users have no clear view
on what can be expressed. The second problem discussed is about the derived types of
XSDs. Two kinds of types that XML Schema provides: simple types and complex types.
For XML Schema, the new types can be created by two mechanisms: extension and
restriction. The author made a statistic about application of this two mechanisms to
simple and complex types of XML schema. The third problem discussed in the paper is
additional features that XML schema possesses. One feature that XML Schema has is the
application of &-operator. The other is that the utilization of ID attributes and referred to
by IDREF or IDREFS for elements in XML document. In addition, referring to elements
can be expressed by key/keyref pairs and the use of namespaces for modularity is the
feature that XML Schema differs from the DTDs. The last feature is the ability to
redefine types and groups. The fourth question discussed is about the regular expression
characterization. It also answers the second question introduced at the beginning of the
paper that how sophisticated regular expressions tend to be in real world DTDs and
XSDs. The simple expressions were widely used and it indicates the simplicity that the
DTDs and XSDs expressed. The fifth question discussed in the paper is about the schema
and ambiguity. The author applies the term of one-unambiguous to check whether the
DTDs and XSDs used in the paper respect the requirement. It is showed that almost all of
them meet this requirement. Only few of them are not.
Experiment carried out: In this paper, the author didn’t conduct the experiment. There
are total 109 DTDs and 93 XSDs collected for study. The author applied each metric and
54
standard introduced in the paper to these samples. The author gives a detailed statistic
data for each sample.
Result obtained: In each section, the author listed the result for the comparison of DTDs
and XSDs. There is no absolute advantage or disadvantage that which schema has, it
helps the reader to get a better understanding of what the DTDs and XSDs is.
Conclusion made: The analysis showed that many features defined in the XML Schema
specification are not widely used, especially those that are related to object oriented data
modeling such as derivation of complex types by extension. More importantly, it turns
out that almost all XSDs are local tree grammars. The expressive power encountered in
real world XSDs is mostly equivalent to that of DTDs. Maybe in future with the
development of technology, the level of sophistication offered by XML Schema will have
a wide application.
The data type of XML Schema overcomes the shortcoming of DTDs that it has the ability
to specify the format and type of the text of an element by restriction of simple types. The
content models in both samples tend to be simple types. It can give software engineer
some suggestions to avoid the utilization of complex types when developing new XML
implementations.
This paper was referred by 43 times. In this survey, it is referred by:
Martens W., Neven F., Schwentick T., Bex GJ.. Expressiveness and complexity of XML
Schema. ACM Transactions on Database Systems (TODS). Volume 31 , Issue 3, Pages:
770 - 813, 2006.
Mlynkova I., Toman K., Pokorny J.. Statistical Analysis of Real XML Data Collections.
Proceedings of the 13th International Conference on Management of Data, 2006.
Lin, Z., He, BS., Choi, B.. A Quantitative Summary of XML Structures. Lecture Notes in
Computer Science, Conceptual Modeling –ER, 2006.
55
A.8 [Choi 2002]
Problem to be solved: In previous research for DTDs, it was assumed that the recursion
and nondeterminism were absence. This assumption brought a problem that it is
necessary to justify these assumptions against DTDs in the real world. For this reason, the
author in this paper collected a number of collections and made a survey to provide some
statistics with respect to a variety of criteria commonly discussed in XML research.
Previous work referred to: DTDs have a wide application in implementing efficient
storage systems for XML and in typechecking programming and query languages for
XML. This work has been studied in literatures. The collection of DTDs in this paper was
implemented by “DTD Inquisitor” which was developed by the author earlier. This
program can read a DTD and identify problems, compute a number of graph-theoretic
properties of it.
New analysis: The statistic study of DTDs was categorized into local and global
properties. For local properties, it focused on the structure and complexity of the content
model. It also studied the properties that affected the parsing of the DTDs or its
ambiguity. For global properties, it was focused on some properties that might be
important in the mapping of the XML into some database format. In local properties,
DTDs were grouped into app, data meta categories based on their function. The first
metric was the syntactic complexity. It defined the depth function as a rough measure of
this metric. Then the determinism of DTDs was discussed. It applies this standard to
analyze the DTDs and find out how many DTDs were non-deterministic. Also the
ambiguity of DTDs was explained. The ambiguity of DTDs means that the the maping is
not unique. For global properties, the reachability of element was first discussed. The
reachability means that for each node Qi, there is exactly one node reachable from x
along the path Qi. The author claims that separating the unreachable parts in DTD into
smaller DTDs appear to be a bettwr design. Secondly, the recursion in DTDs was
defined. The recursion of the DTD means that it has an element which can be reached by
itself. The linear recursive means that it is recursive and for any reachable element , this
56
element occurs only once in a content model and the occurrence is not enclosed in “*” or
“+”. Following, the simple path, simple cycle, chain of Stars and hub were introduced.
The simple paths happened in non-recursive DTDs and the simple cycle happens in
recursive DTDs. The chain of Stars means the continuous stars are appeared in the DTD
content model. The hub means that an element name with a large fan-in value. The author
collected the fan-in value for data DTDs and meta DTDs and represented them on graph.
Results obtained: In this paper, there is no experiment carried out. The author analyzed
the collection of total 60 DTDs and applied them with the metrics and standards
presented in the paper. Of 60 DTDs, 7 were app, 13 were data and 40 were meta. For
each metric, the author made a statistic analysis and draws a table or a graph to show an
intuitive understanding and a comparison.
Conclusion made: The author in this paper provided statistics on some structures of real
DTDs. The structures were analyzed that may be assumed in XML research. The analysis
will provide a good reference for other people to develop the research in XML.
This paper was referred by 67 times. In this survey, it is referred by:
Bex GJ., Neven F., Bussche JV.. DTDs versus XML Schema: A Practical Study. In
Proceedings of the Seventh International Workshop on the Web and Databases, WebDB
2004. pages 79—84, 2004.
Mlynkova I., Toman K., Pokorny J.. Statistical Analysis of Real XML Data Collections.
Proceedings of the 13th International Conference on Management of Data, 2006.
Lin, Z., He, BS., Choi, B.. A Quantitative Summary of XML Structures. Lecture Notes in
Computer Science, Conceptual Modeling –ER, 2006.
Bex GJ., Neven F., Bussche JV.. DTDs versus XML Schema: A Practical Study. In
Proceedings of the Seventh International Workshop on the Web and Databases, WebDB
2004. pages 79—84, 2004.
A.9 [Sahaguet 2000]
57
Problems to be solved: With the increasing popular use of XML as data format, the
authors in this paper survey some preliminary results that explore how XML DTDs are
actually being used and how people are actually (mis)using DTDs. Some shortcomings
and requirements for possible replacement are described.
Previous work referred to: DTDs have been invented for SGML. The purpose of it is to
determine whether the mark-up for an individual document is correct and also to supply
the mark-up that is missing because it can be inferred unambiguously from other mark-up
present. This definition is made by ISO. The historical functions of DTDs have been
parsing and validation. It was studied from a database perspective for it can be useful to
trigger optimizations for XML query languages. This was studied by [S. Abiteboul et al.
1999]. Also DTDs were analyzed by applying them for the purpose of storage and
compression because the structure information of XML can be learned in advance.
Besides, some studies were done by using DTD information to create a language binding
from XML to a programming language such as C++, Java etc.
New analysis: This paper is a survey of DTDs. There is no new metrics or architectures
are presented.
Analysis/experiment carried out: The authors collected the DTDs from some
repositories online. For some DTDs which are missing some declarations are cleaned and
then normalized them by expanding parameter entities and translating the DTD structure
into a convenient data-structure. The last step of the analysis is reporting and
visualization.
Results obtained: The authors analyze the DTDs properties in terms of its structure size,
complexity. Also some specific aspects such as use of mixed-content are analyzed as well
as the graphs were drawn to get some insights. The results firstly show that the most
published DTDs are not correct with missin elements, wrong syntax or incompatible
attribute declarations. Secondly, it shows that a DTD is not always a connected graph.
The third result concerns the encoding tuples. It uses the different operator to create
58
unordered sequences. Fourthly, it shows that the inheritance via entity references is
purely syntactic and sometimes lead to some cascading mistakes. Finally, it shows that
most of the features of DTDs are not being used, such as notations and fancy attribute
types.
Conclusions made: (1) DTD have all sort of shapes and sizes and are used for many
diverse purposes; (2) DTD features are not properly understood; (3) People tend to use
only one solution to apply DTDs; (4) people use hacks to solve DTD shortcomings,
sometimes for better, sometimes for worse. Besides, since DTDs are often a misleading
approximation of the intended structure, it is not a wise strategy to rely on DTDs for
storage, compression and optimization.
Paper referred to: 49 times. In this survey, it is referred by:
Bex GJ., Neven F., Bussche JV.. DTDs versus XML Schema: A Practical Study. In
Proceedings of the Seventh International Workshop on the Web and Databases, WebDB
2004. pages 79—84, 2004.
Martens W., Neven F., Schwentick T., Bex GJ.. Expressiveness and complexity of XML
Schema. ACM Transactions on Database Systems (TODS). Volume 31 , Issue 3, Pages:
770 - 813, 2006.
Klettke, M., Schneider, L., and Heuer A.. Metrics for XML Document Collections.
Proceedings of the Worshops XMLDM, MDDE, and YRWS on XML-Based Data
Management and Multimedia Engineering. Lecture Notes In Computer Science, 2490:15
– 28, 2002.
Mlynkova I., Toman K., Pokorny J.. Statistical Analysis of Real XML Data Collections.
Proceedings of the 13th International Conference on Management of Data, 2006.
A.10 [Washizaki et al. 2003]
Problems to be solved: In Component-based software development, it is difficult to use
conventional metrics since the source codes of components cannot be seen, but these
metrics require the analysis of source codes. The authors in this paper propose a metrics
59
suite for measuring the reusability of such black-box components which is based on
limited information from outside of components without any source codes.
Previous work referred to: Some metrics measure the reusability of OO software, such
as metrics proposed in [S. Chidamber et al. 1994] and [W. Frakes et al. 1996]. But these
metrics require analysis of source codes and can not be applied to black-box components.
In previous works, different version of the definition of software component were
described.
New metrics: In this paper, a metrics suite of five metrics for black-box components was
proposed. The confidence intervals for these metrics were set by using the rating scores
from JARS.COM. The metrics defined in this paper are: EMI (existence of meta-
information), RCO (Rate of Component Observability), RCC (Rate of Component
Customizability), SCCr (Self-Completeness of Component’s Return Value) and SCCp
(Self-completeness of component’s parameter). They measure the existence of meta-
information, observability, customizablity, and external dependency of a black-box
JavaBeans component.
Analysis carried out: The authors define a new reusability model based on McCall’s
FactorCriteria-Metrics issued by ISO for black-box components. This model is structured
in three levels in a top-down manner. These three levels are management level, design
level and product level. The metrics are used at product level. Besides, a component
analysis tool for black-box JavaBeans components in Java language has been developed.
It automatically measures primitive values and calculates the values of five metrics.
There are total 125 JavaBeans components provided and were applied to to calculate the
confidence intervals.
Results obtained: The result of analysis indicate that the components with too high an
observability and a customizability are not reusable. In all samples, the percentage of
components whose values of SCCr are in the confidence interval is 86%. This means that
a user of a component is preferable obtaining the results of the business method’s
60
invocation by invoking the read methods of readable properties, not by capturing the
return value of the business methods. For SCCp, the percentage of components whose
values of SCCp are in the confidence interval is 22%. It means it is difficult to judge the
reusability only by using the value of SCCp. Also the authors analyze the correlation
between different metrics. The result shows that the metrics of RCO and RCC are
independent of the metrics of SCCr and SCCp. If the understandability of a component is
high, then the adaptability would also be high. The SCCp cannot reflect the portability of
a component compared with SCCr. The user should use only SCCr for measuring the
portability of a component.
Conclusion made: The authors in this paper propose a metrics suite for evaluating the
component’s reusability. Using the metrics and the confidence interval proposed in this
paper, it is easy for users to select the components with higher reusability. This approach
can help and promote the activity of development with the reuse of existing black-box
components.
This paper was referred by 47 times. In this survey, it is referred by:
Narasimhan, VL., Hendradjaya, B.. A New Suite of Metrics for the Integration of
Software Components. The First International Workshop on Object Systems and oftware
Architectures (WOSSA'2004), 2004.
Bertoa, MF., Troya, JM., and Vallecillo, A.. Measuring the Usability of Software
Components. Journal of Systems and Software, vol. 79, no. 3, pages: 427-439, 2006
Narasimhan, VL., Hendradjaya, B.. Some theoretical considerations for a suite of metrics
for the integration of software components; - Information Sciences 177, pages: 844-864,
2007.
A.11 [Chidamber et al. 1994]
Problem to be solved: The authors in this paper focus on the problem of managing the
process when do software design and developing. A new suite of theoretically and
mathematically based metrics for OO design was proposed.
61
Previous work referred to: Lots of work related to object oriented design metrics had
been done. [Moreau. Et al.] proposed three metrics for OO graphical information system,
but didn’t give formal definitions. [Lieberherr et. al] presented a more formalized method
in defining the rules for OO programming style, and the coupling and cohesion were used
in traditional programming. [Pfleeger ] gave the measure of counting the number of
objects and methods to develop a cost estimation model for OO development. There are
also other methods available for OO design metrics. But the weaknesses of these metrics
are short of theory foundation and empirical support.
New idea: A suite of metrics include Weighte d Methods per class, Depth of Inheritance
tree, number of children, coupling between object classes, response for a class and lack of
cohesion in methods are proposed. Different from other methods, the authors evaluated
each metric with a set of criterion proposed by Weyker. Theses properties are notions of
monotonicity, interaction, noncoarseness, nonuniqueness and permutation. ………..
Experiments/analysis carried out: To test the metrics, a tool was developed and the
data were collected by this tool. The data were collected from 2 different organizations.
One is a software vendor and the other is a semiconductor manufacture. Both of them
uses OOD in their development work. The former has a collection of different 634 C++
class libraries and the latter uses Smalltalk programming and has 1459 classes.
Results obtained: All the metrics satisfy the most of the properties criterion. But only the
property 6(interaction increases complexity) is a exception. This implies that the
complexity metric can increase when a class is divided into more classes. This is
confirmed by the engineers and software designers who participated in the study. It
indicates that satisfying this property is not an essential feature for OO software design.
In addition, the metrics of DIT and LCOM don’t satisfy the property 4(monotonicity)
when 2 classes are in a parent-descendent relationship. This is right because the distance
from the root of a parent can’t become greater than one of its descendants.
62
Claims/conlusions made:The metrics suite developed in this paper provides software
designers and mangers an indication with the design details of an application. It is helpful
for them to solve the architectural and structural consistency problems. In addition, by
using the metrics, the flaws and other leverage points in the design can be identified
earlier. It added the insight about the trade-offs between conflicting requirements and
ease of testing. This suite is the first empirically validated proposal for formal metrics for
OOD.
Papers that refer to (cite) this paper: this paper is referred by total 1520 times. In this
survey, it is referred by:
Frakes, W., Terry, C.. Software reuse: metrics and models. ACM Computing Surveys
(CSUR),Volume 28 , Issue 2, Pages: 415 – 435, 1996.
Sahuguet A.. Everything You Ever Wanted to Know About DTDs, But Were Afraid to
Ask. In Third International Workshop WEBDB2000. Lecture Notes in Computer Science.
1997: 171–183, 2000.
Washizaki H., Yamamoto H., Fukazawa Y.. A Metrics Suite for Measuring Reusability
of Software Components. Proceedings of the 9th International Symposium on Software
Metrics, 2003.
Cho ES., Min Sun Kim MS., Kim SD.. Component metrics to measure component
quality. Software Engineering Conference on, 2001. APSEC 2001. Eighth Asia-Pacific.
pages: 419- 426, 2001.
Boxall, M., and Araban, S.. Interface Metrics for Reusability Analysis of Components.
Proceedings of the 2004 Australian Software Engineering Conference (ASWEC'04),
2004.
Narasimhan, VL., Hendradjaya, B.. A New Suite of Metrics for the Integration of
Software Components. The First International Workshop on Object Systems and oftware
Architectures (WOSSA'2004), 2004.
Barnard, J.. A new reusability metric for object-oriented software. Software Quality
Control, vol. 7, pages: 35-50, 1998.
A.12 [Barnard et al. 1998]
63
Problem addressed: The author in this paper claimed that for both managers of the reuse
library and the developers, a way of measuring the reusability of the component is what
they really need. Only with reusability metric, the usefulness of a component can be
determined. For this reason, the author in this paper defines reusability metric that can be
used to give values for the reusability of OO code.
Previous work referred to: This paper is based on a project an experiment the author
involved to get the result and analyze the developed metrics. The previous work referred
to in this paper mainly focus on the reuse of component and some measurements. In some
literatures, the factors that affect software reusability were summarized. In general, it
includes low module complexity, good documentation, few external dependences, proven
module reliability and so on.
New algorithm/Metrics suite: The new metrics developed in this paper can be grouped
into for classes, for attributes, for methods and for input parameters. For each category,
the metrics were presented for measurement. Or classes, low coupling and good
documentation were included. For attributes, simple type-few sub-types and meaningful
name and description should be considered. For methods, low number of calls to foreign
classes and method should cover full coverage for all input parameters should be
considered. Also the author developed a formula for how to calculate the reusability
value.
Experiment carried out: Two phases experiment was carried out. The experiment 1
was easy and some programmers were involved in writing a apiece of specified object-
oriented code following the requirement. The programmers were separated into 2 groups.
One group was asked to write the code in a very unreusable manner while the other group
members are component on OO and write the most reusable code. The experiment 2 was
much more professional. It analyzes a wide selection of reusable classes (30 classes with
over 200 methods) taken from well used and accepted programming libraries.
64
Result obtained: The result of experiment 1 doesn’t provide much information about the
classes and documentation. Only few factors, such as Coupling Between Object (CBO)
and Lack of Cohesion in Methods (LCOM) differ from the manner expected: highly
reusable code has low CBO and LCOM while the unreusable code has high CBO and
high LCOM values. The attributes and method input parameters show a general trend that
need to be simple, well-named types for highly reusable codes. From method factors, it
showed that he highly reusable code has methods which perform only one function. From
the experiment 2, the author listed all the classes from the graoh that showed a constant
trend and didn’t show a constant trend. It appears that calls to library classes do not
follow the pattern of having low values for highly reusable classes. For highly reusable
classes, it only has high reusability within its own library language domain but not to
other domains.
Conclusion made: The presented reusability metric in this paper are based on empirical
evidence in which the classes were assumed to be highly reusable. From the experiment,
it can be seen that a class can be regarded as a black box which means that its content are
irrelevant to its reusability and the reuser can only concern its documentation and
interface. This metric is helpful for developer to write more reusable code.
This paper was referred by 8 times.
A.13 [Narasimhan et al. 2004]
Problem addressed: Modern approach to software re-use has been through Component
Based Software Engineering (CBSE). Lots of applications and system are created by
assembling independent components which are developed by various developers. It is
necessary to develop metrics for measuring the efficiency and effectiveness of such
applications, including predicting the cost, effort, and time to develop and testing. In this
paper, the authors propose a suite of metrics for measuring component integration.
65
Previous work referred to: The previous work related to this paper includes the research
in using the software components to build new system and the challenges it poses, the
white box reuse and black reuse of components for system integration. In literatures, the
metric of lines of code was used in counting the size of software; also, counting the
reusability of component was used as a metric. In addition, by focusing on the process
and measurement framework to evaluate the software component was investigated by
other authors.
New algorithm/Analysis: The metrics proposed in this paper include the complexity
Metrics for components, criticality metrics for component and triangular metrics three
parts. In complexity metrics for components, the component packing density (CPD) and
interaction density metrics for component integration (IDC) were proposed. The CPD is
defined in the form of a ratio of constituent to the number of components. The IDC is the
ratio between the actual number of interactions to the available interactions in a
component. IDC=#I/#Imax, where I is Actual Interactions and Imax is Maximum
Available Interactions. Similarly, the metrics for incoming and outgoing interaction
densities can be defined. The Incoming Interaction Density of a Component (IIDC) is a
ratio of incoming interaction used to the number of incoming interaction available. The
Outgoing Interaction Density of a Component (OIDC) is a ratio of the number of
outgoing interactions used to the number of incoming interaction available. For metrics
for critical component, the link criticality, bridge criticality, inheritance criticality and
size criticality have been proposed. For triangular metrics, it is a combination of CPD,
AID and Critical metrics into two-axis diagram of density and three-axis diagram to
represent the relationship among them.
Experiment carried out: There is no experiment carried out in the paper. The author
only used an example of Personal Digital Assistant software to illustrate the metrics. It
has three main components which have corresponding functions. In addition, the authors
proposed an experimental design and gave the expected outcome which wasn’t executed.
66
Result obtained: Each metric was applied to the example and the corresponding value
was calculated. The result showed the relationship between the metrics and its functions
in the component integration.
Conclusion made: The metrics proposed in this paper is based on a measurement theory.
It will help component-based developer to identify the level of complexity and criticality
in integrated software system. An experimental design was also proposed to validate the
metrics theoretically and empirically. A further study on component metric will be
continued.
This paper was referred by 8 times.
A.14 [Alkadi et al. 1998]
Problem addressed: The authors claimed that existing OO design methodologies do not
generally require adherence to design metrics and there is a lack of available methods to
measure the quality of the various components during the software design phase. The
method proposed in this paper is described for this purpose.
Previous work referred to: For object-oriented design, four key elements are considered
importantly, namely abstraction, encapsulation, hierarchy and modularity. These factors
have been investigated and corresponding metrics are presented by other people. In [S.
Chidamber et al, 1994] proposed six software metrics on software maintenance. In this
paper, the authors perform design evaluation by measuring the class structure of object-
oriented designs using the above metrics criteria.
New Metrics/Formulas: In this paper, a volume metric is defined to help characterize
the activity in a class hierarchy based on NOC (number of children) metric. The volume
metric represents the complexity of the object based on the number of methods and its
attributes. The formula of volume metric V0 is defined as the sum of the number of
inherited (NOI), overridden (NOO) and local (NOL) “internal or hidden” methods in an
67
object. In a class hierarchy, the volume of all the children “objects” V* is defined as the
sum of total number of methods in each object. The volume of a class Vc is defined as
the sum of number of attributes NOA and V*. In addition, two other formulas were
defined to compute the average of local and overridden/inherited methods (AIM) and the
percentage of overridden/inherited methods (POM). Based on above formula, a heuristic
was proposed to restructuring the process of software design.
Experiment carried out: There is no experiment carried out to evaluate the performance
of metrics proposed in this paper. The author only made an analysis for an illustration of
property of each metric.
Conclusion made: To evaluate the OO designs, the OO metrics should be formulated.
This paper introduced a method to evaluate NOC, measure the volume of an object and
measure the volume of the class. The formulas proposed to computes the ratio and local
and overridden methods over inherited methods. The heuristic was introduced to indicate
when object and class hierarchies need to be reexaminzed and supported with experiment
data.
This paper was referred by 6 times.
A.15 [Cho et al. 2001]
Problem addressed: Component-based software development requires quite different
approach from OO methods for it applies commonality and variability analysis,
components and component’s interfaces in developing the system. Various metrics
developed for OO programming cannot be equally applied to CBD process. The authors
in this paper presented component metrics that can be efficiently applied in CBD process
for this purpose.
Previous work referred to: Many different metrics were proposed for OO systems and
they were applied to the concept of classes, coupling and inheritance. It include:
68
Weighted Methods per Class (WMC), Response for a Class (RFC), Lack of Cohesion
(LCOM), Coupling Between Object Classes (CBO), Depth of Inheritance Tree (DIT) and
Number of Children (NOC). These metrics studied by others and they had some
limitations in application to CBD process.
New Metrics/Analysis: The new metrics proposed in the paper was to measure
complexity, customizability and reusability. The proposed metrics were categorized into
design metrics and implementation metrics. The metrics proposed for measuring the
component complexity were based on MCC. They were divided into four kinds:
component plain complexity (CPC), component static complexity (CSC), component
dynamic complexity (CDC) and component cyclomatic complexity (CCC). For
measuring the customizability, the metric of Component Variability (CV) was defined
and assigned weight value into behaviour customization methods and workflow methods
to be involved the calculation. For measuring the Reusability, the first metric is
component reusability (CR) which is defined by dividing sum of interface methods
providing commonality functions in a domain into the sum of total interface methods.
The second metric is Component Reuse Level (CRL) which is divided into two kinds.
The first one is measured by using lines of code (LOC) and the second one is by dividing
functionality that a component supports into required functionality in an application.
Experiments carried out: There is no experiment carried out in this paper. The authors
applied metrics into several projects proceeded in the bank domain and illustrated the
results.
Results obtained: The value of each metric proposed in the paper was calculated. By
applying metrics into component design and implementation, the result showed that the
number of factors of metrics applied in design time is fewer than the number of factors of
metrics applied in implementation time. Thus, measurement results of complexity or
reusability by using CCC and CRL are more accurate than of by using CPC, CSC, CDC,
and CR.
69
Conclusion made: The proposed metrics were tested and it was found that the
complexity of a component may help to estimate the component’s size. Also, the
reusability and customizability have great effect on software development. In addition,
the lines of code of components are suitable for measurements of reusability in CBSD
(Component-based Software Development).
This paper was referred by 33 times. It was referred by
Washizaki H., Yamamoto H., Fukazawa Y.. A Metrics Suite for Measuring Reusability
of Software Components. Proceedings of the 9th International Symposium on Software
Metrics, 2003.
Narasimhan, VL., Hendradjaya, B.. Detailed Theoretical Considerations for a Suite of
Metrics for Integration of Software Components. Advances in Systems, Computing
Sciences and Software Engineering, Springer Netherlands, pages: 257-264, 2006
Narasimhan, VL., Hendradjaya, B.. Some theoretical considerations for a suite of metrics
for the integration of software components; - Information Sciences 177, pages: 844-864,
2007.
A.16 [Boxall et al. 2004]
Problem addressed: Component-based software development relies in reusable
components in order to improve quality and flexibility of products. To assess the
reusability of a component before reusing it would be very helpful to reusers. Between
the two static sources of information for components, the component interfaces are easier
to be measure while the external documentation is hard to measure. Thus, in this paper,
the authors presented a set of metrics for measuring properties of understandability and
reusability.
Previous work referred to: In previous work, the study of the interfaces affecting
component reusability has been studied in literatures. In [S. Atkinson 1997], it studied
how interfaces affect the knowledge transferability. In other papers, it is indicated that
component interfaces are understandable and well documented is essential for users to
70
reuse them. In addition, the internal properties can affect the reusability and the interface
may be more important than software modules. These works were done by other people.
New Metrics: There is no algorithm proposed in this paper. The metrics presented in this
paper include: (1) Arguments Per Procedure (APP), it measures the mean size of
procedure declarations of an interface and is defined as , where is the
total count of arguments of the publicly declared procedures and is the total count of
procedures that are declared by an interface. (2) Distinct Argument Count (DAC), it
measures the consistency of the naming and typing of arguments. It is defined as DAC=|
A|, here A is the set of name-type pairs used as arguments in an interface. (3) Argument
Repetition Scale (ARS), it is an alternative to DAC, it is defined as , |a|
is the count of procedures in which argument name-type a is used and is the Argument
Count of the interface. ARS will be in the range between 1 and n. (4) Mean String
Commonality (MSC), it measures the commonality of a set of identifiers and defined as
, where A is a set of identifiers, n is the count of
identifiers, comb(A) computes the set of combinations of two elements from the set A,
and lcs(x,y) computes the longest common subsequence of x and y. (5) Identifier Length.
It includes the Lean Identifier Length (MIL) and Median Identifier Length (MeIL). MIL
is the weighted mean length of identifiers occurring in the interface and MeIL is the
weighted median length of identifiers. (6) Reference Argument Density (RAD), it
measures the occurrence of reference arguments in an interface and defined as ,
where is the count of pass by reference arguments and is the total Argument Count.
Experiment/Analysis carried out: Interfaces of 12 components were chosen to be
measured to provide empirical data for metrics analysis. They are Apple Frameworks,
(including the component of CoreAudio, CoreMIDI, DrawSprocket,
SystemConfiguration andveclib), Financial Software Components (including the
component of Core, DB, GUI, MemDB and XML) and MFC component and MLibrary
71
component. Besides, three command-line programs were developed to yield the values of
the metrics.
Result obtained: For each section, the result was obtained and analyzed. The result
considered as whether the interfaces are relevant to the reusability of components. For
some metrics, the linear relationship exists, or there is the possibility that they are
corrected with interface size. In addition, MFC has large interface and this makes it an
outlier. The detailed result analysis is available in the paper. Here, I don’t give them due
to the space and scope of the paper.
Conclusion made: The metrics developed in the paper can provide useful information of
reusability of component and can give a better understanding of the properties of
component’s interface. The theory used in analysis was consistent with expert
knowledge. In this paper, the metrics are not complexity measures, it is necessary to
consider evaluating the complexity. Also it is desirable to conduct more controlled and
extensive testing of the metrics. This will be author’s future work.
This paper was referred by 12 times.
A.17 [Theoharis et al. 2008]
Problems to be solved: Lots of semantic web schemas expressed either in RDF/S or
OWL and they have been widely analyzed. Different from previous analysis, in this
paper, the authors study the features of individual SW schema graphs and find that the
graph features of SW schemas has a power-law degree distributions. In addition, they
analyze the feature of morphology for each schema.
Previous work referred to: An SW schema can be viewed as a directed labelled graph
where nodes are classes or literal types and arcs are properties. The degree distribution is
a main feature of the graph. In [R. Gil et al. 2004], it essentially analyzes the aggregation
of all DAML library ontology which motivates the author in this paper to study the
72
detailed analysis of individual SW schema feature. In that paper, the total degree of
CCDF approximates a power law distribution. In [L. Ding et al. 2005] and [H. Halpin et
al. 2006], the authors collected a voluminous set of online FOFA documents and
analyzed the resulting RDF instance graph. It was observed a power-law for both in and
out degree distributions. Besides, the internet topology, the WWW and the call graph also
exhibit a power-law distribution for their in and out degrees.
New analysis: In this paper, the authors analyze the graphs of property graph and
subsumption graph. They investigate power laws on two different functions of a Discrete
Rndom Variable (DRV) X. The first called Complementary Cumulative Density Function
(CCDF) which is P(X>x), measures the frequencies of X values. The second is denoted
by Value versus Rank (VR), measures the relationship between the ith biggest X value
and its rank i. Also the authors use the findings in this paper to generate synthetic SW
schemas. The authors developed the algorithm and used the Linear Programming
reduction to generate SW graphs in polynomial time.
Analysis carried out: The authors conducted the experiments based on 250 schemas
collected from the RDFsuite, SchemaWeb and Swoogle schema registries.
Based on the properties and transitive relationship, the graph of a SW schema can be
divided into property graph and consumption graph. This corpus contains the biggest
schemas published on the WWW until May 2006. According to the size of each schema,
the corpus was categoeized into 2 groups. One consists of 83 schemas with more than
100 classes. The other one has 58 schemas that have more than 100 properties. The
schema which the size is smaller in extended 250 schemas is not analyzed separately.
Result obtained: The results show that the majority of SW schemas, 94.8% for VR and
67.2% for CCDF approximate a power law for property total-degree functions. By
analyzing the subsumed classes’ VR, the VR function approximates a power law for
87.9%, while that is 60.2% for CCDF. From sketching an abstract morphology of SW
73
schemas, it can be observed that the classes with big degrees in the property graph are
usually located at the higher levels of the subsumption graph. In addition, the class
subsumption hierarchies are mostly unbalanced with large branches and many leaves.
Conclusions made: This paper is the first work analyzing the property graph for
individual SW schemas. The majority of schemas with a significant number of classes
and properties approximate a power law for class and property total-degree distributions.
It can be concluded that there exist few “focal” classes that form the conceptual backbone
of the defined schema and many “peripheral” ones that are used to further detail the
analysis of the former.
This paper was referred: N/A
A.18 [Rotaru et al. 2005]
Problems to be solved: In component based software development (CBSD) system,
some metrics designed for object-oriented software (OOS) can’t be used for measuring
the reusability of CBSD. The authors in this paper proposed methods for this problem,
and studied the adaptability and compose-ability f software components. A mathematical
model for these characteristics and for evaluating non-functional parameters of software
component was presented.
Previous work referred to: For software metrics, the most widely used one is McCabe
Cyclomatic Complexity (MCC) [McCabe 1976]. It measures the number of linearly
independent paths through a program module and it is a count of the number of test cases.
It is calculated from a connected graph of the program module that shows the topology of
control flow and is defined as: CC=E-N+p, where E is the number of edges of the greaph,
N is the number of nodes and p is the number of connected components. Other metrics
for measuring the complexity include : Halstead Complexity (by counting operators and
operands), Bowels metrics (by evaluating the coupling via parameters and global
variables), Ligier metrics (modularity of the structure chart), troy and Zweben metrics
74
(coupling, complexity of structure, calls-to and called-by), Henry and Kafura metrics
(Fan-in – fan out). For OO metrics, they are mainly: WMC (Weighted Methods per
class), DIT (Depth of Inheritance Tree), NOC (Number of Children), CBO (Coupling
Between Object Classes), LCOM (Lack of Cohesion), and RFC (Response for a class).
New Metrics: The first metric proposed in this paper is the Compose-Ability metric. It is
defined qualitatively by studying the parameters and return values of its interface
methods. This metric is inversely proportional to its multiplicity. The multiplicity of an
interface method of a software component is the sum of its return and signature
multiplicities. Here, the return multiplicity is zero when the method doesn’t return
anything and one when the method has a return value. The signature multiplicity is the
sum of the multiplicities of the method’s parameters. Also, the authors analyzed the
component adaptability and its relationship to complexity. A model for adaptability
calculation was developed based on the formulas developed previously.
Analysis carried out: There is no experiment implemented. The authors further analyzed
the component adaptability with its calculation. The formula developed has a problem of
circular dependency, to solve this problem, the solution is to replace the interface
complexity with the complexity of the problem that is solved by the component. The
authors followed this solution and gave the concrete analysis.
Result obtained: The result for above analysis is illustrated in last paragraph.
Conclusion made: The authors in this paper addresses the market needs for uniform
metrics and methods of assessment the adaptability and compose-ability. The authors
proposed a mathematical model for practical assessment of software component. Both
from the perspective of the problem to be solved and the interface characterization, the
component complexity was assessed. An implementation quality metric was also defined.
It was generated by the necessity to break the circular dependency between the
adaptability of a component and its complexity.
75
Papers that refer to (cite) this paper: 4
A.19 [Kafura et al. 1987]
Problem addressed: The authors in this paper claimed that the high cost of maintenance
of the software is a major concern for software engineers. One of this problems perhaps
lies in no a comparable cohesive method for requirement analysis, design and testing of
the software system. In this paper, the authors explored the relationships between
different software complexity metrics and proposed a way to utilize the quantitative
information to form a complete maintenance method.
Previous work referred to: This paper was written in 1987. Before that, the studies on
software reuse had been on for some time. The metrics proposed in other literatures were
applied in this paper include: McCabe’s Cyclomatic complexity number, Halstead’s
Effort metric E, Lines of code, Henry and Kafura’s Information Flow Metric, McClure’s
Control Flow Metric, Woodfield’s Syntactic Interconnection Measure and Yau and
Collofello’s Logical Stability Metric. The first three is classed into code metrics and the
other four is structure metrics. They are applied in the experiments in this paper.
New Algorithm/Analysis: There is no new metrics proposed in this paper.
Experiments carried out: The system studied in this paper is a data base management
system called Mini Data Base (MDB). It’s based on a relational model and VMS
operating system on a VAX 11/780. The MDB is a medium size system with 16,000 lines
of FORTRAN code. The system has been developed for several years. There are total 4
versions before the system was applied in the paper for analysis. The analysis of
complexity changes was based on 4 versions.
Result obtained: The analysis and comparisons between the changes of system’s three
versions were made at two different levels. The first is system level. At system level, the
total complexity of the system for each of the seven metrics was computed and compared
for each version. The second is procedure level, in which the percent change in
76
complexity from one version to the next for each procedure was computed. From the
graph obtained in the system, it showed that in each later version of the system, there is
an obvious increase in complexity. This reflects the enhancement made and new
commands and capabilities added to the system. Also it showed the improvement in
maintenance and most of the maintenance preformed in Version was error correction. It
confirmed that the metrics reflect he growth in complexity of the system introduced as a
result of maintenance activities. At second level, it recorded the percent increases in
complexities for the procedures. From the result, it was observed that the metrics can be
used to identify improper integration of enhancements and some complex procedures can
lead to future improper structuring of the system, since maintainers would avoid using
such complex procedures when making enhancements. The other interesting observation
is that the result reflects the distinction between code and structure metrics for which
when the code metrics increases, the structure metrics don’t correspondingly to increase.
Conclusion made: From three different versions of the same single system, seven
software metrics were analyzed to assess and control software maintenance activities.
The study confirmed the results obtained in previous works. It also confirmed that it is
useful to examine carefully components which are “outliers” of one or more metrics. This
paper provided a good experience in building the software metric analysis tool.
This paper was referred by 65 times.
A.20 [Narasimhan et al. 2006]
Problems to be solved : In Component-Based Software Engineering (CBSE),
component based metrics have been proposed by several researchers. Some of them aim
at the reusability, while others focus on the process and measurement framework in
developing software components. In this paper, the authors’s research is an attempt to
build a suite of software metrics on software assembly.
Previous work referred to: The previous works focused on the component based metrics
are the use of Lines of Code (LOC) in counting the size of software and estimate the
77
number of LOC early in the software life-cycle. The shortcoming of this method is it is
hard to predict the size of the software prior to implementing. For some specific issues on
integration, [Sedigh Ali et al] discussed the complexity of interfaces and their integration
as quality metrics. [Cho et al.] define metrics for complexity, customisability and
reusability. The complexity metrics in this way is calculated by using the combination of
the number of classes, interfaces and relationships among classes. The cyclometric
complexity is calculated by sum the classes and interfaces.
New Metrics: Two metrics are proposed in this paper. They are static and dynamic
metrics respectively. The static metrics measure complexity and criticality of component
assembly. Here, the complexity is measured using Component Packing Density and
Component Interaction Density metrics. The criticality are defined as Link, Bridge,
Inheritance and Size criticalities. The complexity and criticality metrics are combined
onto a Triangular Metric. For Dynamic metric, it characterizes the dynamic behaviour of
a software application by recognizing component activities during run time.
Experiments/analysis carried out: A stock-broker system is applied to evaluate the
importance of the proposed metrics. In this system, a StockDistributor (SD) component
monitors a stock database. If some values change in this database, this component
generates an event via an event source to corresponding event sinks. Several StockBroker
(SB) components can respond with their event sinks. Then the authors apply the
framework proposed by Weyuker to validate the metrics proposed in this paper. That
framework identifies 9 properties.
Results obtained: For stock-broker system, the author calculates all its static and
dynamic metrics. Then triangular metrics is analyzed and it is concluded that the
StockQuoter system is a simple application. The number of cycle metric is can’t be
showed, since there is no actual cycle in the graph representation. For the analysis of
Weyuker’s framework, most metrics fulfill the Weyuker’s property criteria, while few do
not. It shows that the metrics presented in this paper can be used both at the design level
and at the implementation level.
78
Claims/conlusions made: The metrics proposed in this paper can help developer in
reasoning how complex a system is and locating critical areas in a component assembly.
It also can help to identify new super-components, and high extent of use of particular
components.
Papers that refer to (cited) this paper: 0
A. 21 [Bertoa et al. 2004]
Problems to be solved: ISO 9126 part 2 and 4 define a set of metrics to measure the
quality characteristics and sub-characteristics that constitute a Quality Model. But the
metrics defined here are mostly indirect metrics and they are defined without any
reference to the attributes they measure. This situation doesn’t improve the quality of
following metrics for software components. Even worsen, the metrics are defined, but
with no indication to either the attributes they measure or to the quality characteristics
they try to assess.
Previous work referred to: ISO 9126 Quality Model is a benchmark to software design.
Before this work, the authors adapt this quality model to software components. Based on
this model, plus the information collected from software component vendors, the metrics
developed were not adequate and even ill-defined. The purpose of developing metrics is
for the usability of software component. This purpose was widely studied in other
literatures. The definition of usability is also various. According to ISO 9126, usability
means understandability, learnability, operability, attractiveness and usability
compliance. In [B. W. Boehm et al. 1978] and [N. Fenton et al. 1998], the definition of
usability are also described. They are a basis for following improvement of usability
definition.
New metrics: There are three measurable concepts related with software component
usability: Quality of documentation, complexity of the problem and complexity of the
79
solution. The authors developed a set of metrics for each attributes. The tables are listed
in the paper.
Analysis carried out: There is no detailed experiments are carried out in this paper. The
authors only analyze the metrics developed in this paper and the relation to attributes they
measured.
Result obtained: In author’s own understanding and experience, they define the link
between attributes measured by metrics. The connection defined is between the quality of
documentation and the complexity of design to the understandability, learnability and
operability. Though there is not a unique direct relationship between a metric and a
quality subcharacteristic, but there are different degrees of relation between every metric
with each subcharacteristic. The authors concluded that the quality of documentation
mainly affects understandability and learnability. The complexity of design affects to
operability.
Conclusion made: Some of the metrics proposed in this paper are subjective and they are
difficult to automate. There is some limitation for this process. The degree of influence
that component attributes have over component subcharacteristics are analyzed and listed
in a table in the paper. Also some problems in this research are left for author’s future
work.
Papers that refer to (cited) this paper: 3
80
top related