a benchmark diagnostic model generation system

23
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 40, NO. 5, SE PTEMBER 2010 959 A Benchmark Diagnostic Model Generation System Jun Wang and Gregory Provan  Abstract—It is critical to use automate d gener ators for syn- theti c models and data give n the sparsit y of benc hmark mod- els for empirical analysis and the cost of generating models by hand. We describe an automated generator for benchmark models that is based on using a compositional modeling framework and employs graphical models for the system topology. We propose a three-step process for synthetic model generation: 1) domain analysis; 2) topology generation; and 3) system-level behavioral model generation. To demonstrate our approach on two highly dif- ferent domains, we generate models using this process for circuits drawn from the Inter national Symposium on Circuits and Systems benchmark suite and a process-control system. We then analyze the synthetic models acc ord ing to two crite ria : 1) topologic al deli ty and 2) diagno stic efcien cy . Based on this comparis on, we identify parameters necessary for the autogenerated models to generate benchmark diagnosis circuit and process-control models with realistic properties.  Index Terms —Benchmark model gener ation , compo sitional modeling, diagnosis. I. I NTRODUCTION B ENCHMARK mode l su it es ar e vi tal to fa ci li tati ng pro gre ss in a va rie ty of domain s, and the pre sence of good benchmarks has had big impacts on several areas. For example, the benchmarks for SATISFIABILITY (SAT), e.g., SATLIB 1 and DIMACS, 2 have spu rred pro gre ss in that area; further, it has enabled SAT algorithms to be applied to a variety of oth er domain s, suc h as pla nni ng [1]. Ben chmark mod el suites are becoming increasin gly important for val idatin g a variety of algorithms in other domains, including very large scale integration (VLSI) design [2], [3], process control [4]–[6], and bioinformatics [7]. Diagnosis, in contrast to areas such as SAT and constraint satisfaction (CSP), has very few benchmarks. To our knowl- edge, there are only two public ly av ailab le bench marks for diagnostics: 1) the International Symposium on Circuits and Systems (ISCAS) benchmark models for discrete-valued mod- els [8] and 2) the DA MADICS ben chmark for contin uous- valued models [9]. Given the sparsity of benchmark models and the cost of generating models by hand, it is critical to design an automated generator for synthetic models and data. Manuscript received May 2, 2008; revised March 3, 2009. Date of pub- lication July 1, 2010; date of current version August 18, 2010. This work was supported by the Science Foundation of Ireland under Grant 04/IN3/I524. This paper was recommended by Guest Editor G. Biswas. J. Wang is with Fujitsu Laboratories of America, Inc., Sunnyvale, CA 94085 USA (e-mail: jwang@a.fujits u.com). G. Provan is with the Department of Computer Science, University College Cork, Cork, Ireland (e-mail: [email protected] c.ie). Color versions of one or more of the gures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identier 10.1109/TSMCA.2010.2052039 1 http://www.cs.ubc.ca/hoos/SATLIB/benchm.html. 2 ftp://dimacs.rutgers.ed u/pub/challen ge/satisability/b enchmarks/. T o sat isf y the nee d for ben chmark models , we descri be a domain-independent Comple x Sys tem Mod el Genera tor (CoSyMGen), which is based on using a compositional mod- eling framework and employs graphical models for the system topolo gy . Compo sitio nal model ing [10] is the predo minan t knowledge-based approach to automated model construction. It assumes that a system can be decomposed into a collection of components, each of which can be dened using a behav- ioral model. These component models are then integrated into the full system model using a system topology graph, which describes the component interactions. Standard compositional modeling tools, e.g., [11]–[14], re- quire manual construction of the component models and then use the comp onent li br ar y to sp eed up th is te di ous ma n- ual process of system-level model development. This manual process is necessary to capture target systems but is costly for compiling a suite of similar benchmark models for tasks like algorithm analysis. Our use of automated topology generator s overcomes the drawback of hand-generated topologies typical of compositional modeling [10] by using topology generators for this task. We base our automated topology generation on the recent discovery that the topology of virtually all real-world systems, from domains as diverse as World Wide Web (WWW), social networks, biological systems, and technological systems [15], [16], can be modeled using a graph framework [17]. A range of graph model s has been propose d, e.g., [17]– [19], which are signicant improvements over the classic random graph models traditionally used for the empirical analysis of algorithms in that they capture the topological properties of realistic systems much better than do classic random graphs [15]. Although it is known that different domains have different properties, e.g., [15 ] and [20 ], the re has been lit tle work on cha rac ter izi ng domain s bas ed on the und erl yin g pro per tie s. Further, unt il now, most analyzes of such models have been conned to the models’ global statistical properties (e.g., degree distribution, av era ge sho rte st connecting pat hs and clu ste rin g coe fcients) or the statistics of specic local connectivity patterns (motif) [15]. In contrast, little research has focused on the functionality and the corresponding complexity of generated graphs in practical applications. Further, existing models have inherently been inaccurate due to discrepancies between the graphical parameters of the real systems and those of the autogenerated graphs. For example, the well-known Watts–Stogatz [19] model requires an integral mean degree, whereas the mean degree of many systems is nonintegral [21]. We ad dress the va lidi ty of mode ls ge ne rated no t only in terms of their topological properties but also in terms of their behav- ioral properties. The behavioral property that we examine in this paper is diagnostics, specically the inference efciency of model-based diagnosis (MBD) [22]–[24]. The MBD problem 1083-4427 /$26.00 © 2010 IEEE

Upload: jun-wang

Post on 08-Apr-2018

219 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A Benchmark Diagnostic Model Generation System

8/7/2019 A Benchmark Diagnostic Model Generation System

http://slidepdf.com/reader/full/a-benchmark-diagnostic-model-generation-system 1/23

IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 40, NO. 5, SEPTEMBER 2010 959

A Benchmark Diagnostic Model Generation SystemJun Wang and Gregory Provan

 Abstract—It is critical to use automated generators for syn-thetic models and data given the sparsity of benchmark mod-els for empirical analysis and the cost of generating models byhand. We describe an automated generator for benchmark modelsthat is based on using a compositional modeling framework andemploys graphical models for the system topology. We proposea three-step process for synthetic model generation: 1) domainanalysis; 2) topology generation; and 3) system-level behavioralmodel generation. To demonstrate our approach on two highly dif-ferent domains, we generate models using this process for circuitsdrawn from the International Symposium on Circuits and Systemsbenchmark suite and a process-control system. We then analyzethe synthetic models according to two criteria: 1) topologicalfidelity and 2) diagnostic efficiency. Based on this comparison,we identify parameters necessary for the autogenerated models togenerate benchmark diagnosis circuit and process-control modelswith realistic properties.

  Index Terms—Benchmark model generation, compositionalmodeling, diagnosis.

I. INTRODUCTION

B ENCHMARK model suites are vital to facilitatingprogress in a variety of domains, and the presence of 

good benchmarks has had big impacts on several areas. Forexample, the benchmarks for SATISFIABILITY (SAT), e.g.,SATLIB1 and DIMACS,2 have spurred progress in that area;further, it has enabled SAT algorithms to be applied to a variety

of other domains, such as planning [1]. Benchmark modelsuites are becoming increasingly important for validating avariety of algorithms in other domains, including very largescale integration (VLSI) design [2], [3], process control [4]–[6],and bioinformatics [7].

Diagnosis, in contrast to areas such as SAT and constraintsatisfaction (CSP), has very few benchmarks. To our knowl-edge, there are only two publicly available benchmarks fordiagnostics: 1) the International Symposium on Circuits andSystems (ISCAS) benchmark models for discrete-valued mod-els [8] and 2) the DAMADICS benchmark for continuous-valued models [9]. Given the sparsity of benchmark models andthe cost of generating models by hand, it is critical to design an

automated generator for synthetic models and data.

Manuscript received May 2, 2008; revised March 3, 2009. Date of pub-lication July 1, 2010; date of current version August 18, 2010. This work was supported by the Science Foundation of Ireland under Grant 04/IN3/I524.This paper was recommended by Guest Editor G. Biswas.

J. Wang is with Fujitsu Laboratories of America, Inc., Sunnyvale, CA 94085USA (e-mail: [email protected]).

G. Provan is with the Department of Computer Science, University CollegeCork, Cork, Ireland (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available onlineat http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TSMCA.2010.2052039

1http://www.cs.ubc.ca/hoos/SATLIB/benchm.html.2ftp://dimacs.rutgers.edu/pub/challenge/satisfiability/benchmarks/.

To satisfy the need for benchmark models, we describea domain-independent  Complex System Model Generator(CoSyMGen), which is based on using a compositional mod-eling framework and employs graphical models for the systemtopology. Compositional modeling [10] is the predominantknowledge-based approach to automated model construction.It assumes that a system can be decomposed into a collectionof components, each of which can be defined using a behav-ioral model. These component models are then integrated intothe full system model using a system topology graph, whichdescribes the component interactions.

Standard compositional modeling tools, e.g., [11]–[14], re-quire manual construction of the component models and then

use the component library to speed up this tedious man-ual process of system-level model development. This manualprocess is necessary to capture target systems but is costly forcompiling a suite of similar benchmark models for tasks likealgorithm analysis. Our use of automated topology generatorsovercomes the drawback of hand-generated topologies typicalof compositional modeling [10] by using topology generatorsfor this task.

We base our automated topology generation on the recentdiscovery that the topology of virtually all real-world systems,from domains as diverse as World Wide Web (WWW), socialnetworks, biological systems, and technological systems [15],[16], can be modeled using a graph framework [17]. A rangeof graph models has been proposed, e.g., [17]–[19], which aresignificant improvements over the classic random graph modelstraditionally used for the empirical analysis of algorithms inthat they capture the topological properties of realistic systemsmuch better than do classic random graphs [15]. Although itis known that different domains have different properties, e.g.,[15] and [20], there has been little work on characterizingdomains based on the underlying properties. Further, untilnow, most analyzes of such models have been confined to themodels’ global statistical properties (e.g., degree distribution,average shortest connecting paths and clustering coefficients) orthe statistics of specific local connectivity patterns (motif) [15].In contrast, little research has focused on the functionality andthe corresponding complexity of generated graphs in practicalapplications.

Further, existing models have inherently been inaccurate dueto discrepancies between the graphical parameters of the realsystems and those of the autogenerated graphs. For example,the well-known Watts–Stogatz [19] model requires an integralmean degree, whereas the mean degree of many systems isnonintegral [21].

We address the validity of models generated not only in termsof their topological properties but also in terms of their behav-ioral properties. The behavioral property that we examine inthis paper is diagnostics, specifically the inference efficiency of model-based diagnosis (MBD) [22]–[24]. The MBD problem

1083-4427/$26.00 © 2010 IEEE

Page 2: A Benchmark Diagnostic Model Generation System

8/7/2019 A Benchmark Diagnostic Model Generation System

http://slidepdf.com/reader/full/a-benchmark-diagnostic-model-generation-system 2/23

960 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 40, NO. 5, SEPTEMBER 2010

focuses on isolating the root faults given an observation (e.g., of sensor values). More formally, MBD determines whether an as-signment of failure status to a set of mode variables is consistentwith a system description and an observation (e.g., of sensorvalues). This problem is addressed in various engineering fields,and the underlying structure of the problem that affects thediagnosis complexity is governed by the graph framework [21].

In this paper, we assume that we have a library of behavioralcomponent models for the domain in question, so the mainfocus of benchmark generation is on creating ensembles of random but “realistic” topologies. A range of methods exist togenerate system topologies, each of which has a set of specificinput parameters that must be optimized to create a modelthat accurately depicts a domain-specific topology. As we willshow, the different generation methodologies produce quitedifferent models, with different topological properties, such asdegree distribution, etc. Since each application domain requiresdifferent topological properties, the key to generating goodbenchmark models is to match the generation methodology to

the domain requirements.Our contributions are as follows:

1) We propose a domain-independent synthetic model gen-eration system, i.e., CoSyMGen, that can create modelswhose parameters can be optimized to conform to a rangeof different criteria.

2) We describe the following main phases of modelgeneration:a) domain analysis, which extracts topology and compo-

nent statistics and selects fidelity metrics Φ;b) topology generation, which automatically, rather than

manually, generates a high-fidelity system topologyG by optimizing the corresponding parameters Π in

terms of Φ;c) behavioral model generation, which uses the domainlibrary, the component statistics, and the system topol-ogy (G) to generate the behavioral equations.

3) We illustrate the domain-analysis and model-generationprocedure on two quite different domains: 1) MBD of discrete Boolean circuits (where we compare the topo-logical fidelity of the generated models to that of realcircuit models) and 2) process control diagnostics for apulp mill.

We organize the remainder of this paper as follows. Section IIdescribes the assumptions underlying our model-generationapproach. Section III illustrates the system architecture of 

CoSyMGen. Sections IV, V-A, and V-B describe the domainanalysis, the system topology generation, and the system behav-ioral model generation in the proposed benchmark generationprocess, respectively. Section VI presents the experimentalresults of diagnosis benchmark generation on the two differentapplication domains. Section VII compares our contributions toprevious results in the literature. Finally, Section VIII summa-rizes our contributions.

II. KEY ASSUMPTIONS

This section discusses two key assumptions underlying thisapproach: 1) model compositionality and 2) the ability to gener-

ate realistic system topologies. We first introduce the necessarynotation to discuss these topics.

  A. Notation

The models used in standard MBD frameworks, such as thequalitative (logical) [22] or fault diagnosis and isolation (FDI)[23], [24] approaches, typically define a system model in termsof a set B of behavioral equations, defined over sets of failuremodes and observable/controllable variables. The underlying

assumption is that such systems are not explicitly viewed asbeing decomposable, and the system model is treated mono-lithically. In contrast to these approaches, we are interested insystem models that are explicitly decomposable, so we mustcapture not only the behavioral equations but also a framework that captures the system’s compositional properties.

A system is compositional if its behavior consists of thecombination of the behaviors of its constituent components’behaviors. Due to this assumption, modeling/analysis of com-positional models can be more efficient than noncompositionalmodeling/analysis and scales better. A precondition for compo-sitional modeling is the existence of an underlying structure forthe model. This will describe how the different components of 

the system constrain each other.We can thus describe a decomposable model Ψ using two

orthogonal aspects: 1) behavior  and 2) topology (interaction).The behavior model describes the (possibly dynamic) behaviorsof the system and components, whereas the topology modeldescribes the component connectivity in terms of componentsand their connections and defines the constraints on componentbehaviors that enable their interactions to specify the system-level interaction [25].

  Definition 1: A composable system Ψ is defined using thepair (G, B), where B is the behavior model, and G is thetopology model.

We assume that a system Ψ can be decomposed into sub-

systems. There are two types of subsystem: 1) a component,which is a primitive subsystem, and 2) a composite subsystem,which can further be decomposed. A component represents thespecification of a primitive behavioral specification of  Ψ, i.e.,no further decomposition of behavior is possible that allowseach subfunction to coherently describe a process. We assume aset C  = {C 1, . . . , C  m} of components and that the input/outputtuple for each C j can be specified. By merging componentsand/or subsystems, we can define a hierarchical model; wedefine a flat model to consist of a system represented only interms of components C i ∈ C  and their interconnections.

In this paper, we focus on generating two classes of diagnos-tic benchmark model: 1) a propositional logic MBD model and

2) a Bayesian network (BN) model, as subsequently described.3

1) MBD Model: This section describes the general MBDframework, which we will use to illustrate our model con-struction process throughout this paper. We can characterize anMBD problem using the triple Φ = COMPS, SD, OBS [22],where we have the following:

1) COMPS = {C 1, . . . , C  m} describes the operating modesof the set of  m components into which the system isdecomposed.

3Note that we start from a propositional logic model as well as a differential

equation model, and our approach is not limited to generating discrete models;the approach is applicable to any class of model that satisfies the assumptionsdescribed in this paper.

Page 3: A Benchmark Diagnostic Model Generation System

8/7/2019 A Benchmark Diagnostic Model Generation System

http://slidepdf.com/reader/full/a-benchmark-diagnostic-model-generation-system 3/23

WANG AND PROVAN: BENCHMARK DIAGNOSTIC MODEL GENERATION SYSTEM 961

2) SD, or system description, describes the behavior of thesystem. This model encodes the system’s topology withinthe equations in SD.

3) OBS, the set of observations, denotes possible sen-sor/actuator measurements, which may be control inputs,outputs, or intermediate variable values.

We adopt a propositional logic framework for our MBD sys-tem behavior model SD. Component i has the associated modevariable C i; C i can be functioning normally ([C i = OK ]) orcan take on a finite set of abnormal behaviors.

MBD inference, using weak fault models [26], initially as-sumes that all components are functioning normally: [C i =OK ], i = 1, . . . , m. Diagnosis is necessary when SD ∪OBS  ∪ {[C i = OK ]|C i ∈ CO MPS } is proved to be in-consistent. Hypothesizing that component i is faulty meansswitching from [C i = OK ] to [C i = OK ]. Given some mini-mality criterion ω, a (minimal) diagnosis is a (ω-minimal) sub-set C  ⊆ CO MPS such that SD ∪ OBS ∪ {[C i = OK ]|C i ∈CO MPS \ C } ∪ {[C i = OK ]|C i ∈ C } is consistent.

In this paper, we adopt a multivalued propositional logicusing standard connectives (¬, ∨, ∧, ⇒). We denote variable Ataking on value α using [A = α]. An example equation for abuffer X  is [In = t] ∧ [X  = OK ] ⇒ [Out = t].

2) BN Diagnosis Model: We can frame a diagnosis problemas a BN, as done in [27]. Using our (G, B) framework, in a BN,the behavior model B consists of a set of factorized probabilitydistributions, and the topology model G is a graph.

  Definition 2 (BN): A BN is a tuple (G, B), where Gis a directed acyclic graph, and B is a set of factorized

  probability distributions constructed from vertices V  in Gbased on the topological structure of  G. B satisfies P r(V ) =ni=1 P r(vi|π(vi)), where π(vi)’s are the parents of vi in G.

If we model a diagnosis system as in [27], then given anobservation OB S, a diagnosis consists of the posterior distrib-ution of the failure modes, i.e., Pr(COMPS|OBS). We typicallywill select those components with the highest probability asthe most likely diagnoses. We can also compute multiple-faultdiagnoses within this framework [28].

  B. Compositionality Assumption

A domain D is compositional if a system model from Dcan be composed from model components, each of whichis defined by a component behavioral model. CoSyMGen isapplicable to any compositional domain for which structural

models and component libraries exist. There is a large bodyof evidence that virtually all mechanisms, whether natural orman made, exhibit properties of modularity, compositionality,and hierarchical design and hence are amenable to component-based modeling approaches.

Biological systems, which have evolved according to naturalevolutionary principles, clearly exhibit modularity and com-positionality [29]–[31]. The issue in the analysis of biosys-tems, given the assumption that modularity is ubiquitous, ishow to integrate the biological modules; integration theoriesinclude evolutionary frameworks, such as that governing theintegration of skeletal modules [32]. Further, several tools thatbuild biosystems from components have been implemented,

such as [7], [33], and [34]. In addition, component-based toolsfor systems as complex as ecosystems are becoming standard.

As an example, the Ecological Component Library for ParallelSpatial Simulation tool has been developed to build ecosystemmodels, at multiple spatial scales, from a library of reusableinterchangeable components [35].

Technological (or man-made) systems exhibit modularity,regularity, and hierarchy to an even greater degree than biosys-tems; in fact, modularity, regularity, and hierarchy are funda-mental principles of the theory and practice of engineeringdesign [36]. Further, engineering design has adopted the prin-ciples of modularity, regularity, and hierarchy as a key to cost-effective and reliable design, both in theory and in practice [36].As the complexity of technological systems increases, modulereuse increases based on 1) symmetrical and regular structuresand 2) developing standards for components and dimensions,since this regularity and component-based methodology trans-lates into reduced design, fabrication, and operation costs.

Component-based approaches have been used to model awide range of technological systems, with model representa-tions ranging from discrete logic to the complexity of hybrid

systems [37], [38]. Component-based hybrid system modelshave been developed for systems ranging from mechatronicsystems [39] to electrical power systems [40].

Since the focus of this paper is not on the theory of systemcompositionality or of the domain-dependent methods for cre-ating component-based system models, we will assume the allof the domains to which the synthetic generator will be appliedare compositional domains, and refer the reader to the literaturefor precise expositions of system compositionality, for example,[10], [38], [41], and [42]. Note that, although we restrict ouranalysis to systems where components can be connected byunidirectional arcs, component composition methods exist forhandling bidirectional component connections, e.g., [43].

In this paper, we focus on two forms of behavior equation:1) Logic: The compositionality of propositional logicmodels has been described in [44].

2) Probability equations: If we model a probability distribu-tion in terms of a graphical model, then the composition-ality of probability distributions is well defined [45].

We assume that a model can be generated from the tuple(G, B), where G denotes the topology graph, and B denotesthe system behavioral specification. The topology graph G =(V, E ) consists of vertices V  and edges E  and specifies thetopological relations among the system components. For anysystem Ψ that can be decomposed into a set of components, wedefine the graph G(V, E ) for Ψ such that we have the following:

• the vertices V  of G correspond to the components, inputs,or outputs of Ψ;

• the edges E  of G are such that (vi, vj) ∈ E  if (vi, vj) cor-respond to the component pair (C i, C j), Ψ and (C i, C j)are coupled.

If the coupling relations indicate directionality, then weassume that all edges of  G are directed. Our component libraryspecifies a behavioral description Bi for each component vi inthe system being modeled.

1) Compositionality Requirements: When we assume com-positionality, we assume that the components can be rep-resented in terms of a block , which consists of the tuple(I,O ,X , B), where a block has two types of ports (i.e., I 

is a set of input ports, and O is a set of output ports), X defines the internal variables, and B is the block’s behavioral

Page 4: A Benchmark Diagnostic Model Generation System

8/7/2019 A Benchmark Diagnostic Model Generation System

http://slidepdf.com/reader/full/a-benchmark-diagnostic-model-generation-system 4/23

962 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 40, NO. 5, SEPTEMBER 2010

description (behavioral equations) defined over (I , O , X  ). Theprocess of composing a system from components consists of selecting appropriate blocks and connecting the outputs of particular blocks to the inputs of other blocks. The output–inputconnection is possible if the types of the corresponding portsmatch [12]. Formal studies of causal block diagrams can befound in [12], [41], and [46]–[48].

Bond graphs are one modeling framework that explicitlycaptures a block composition framework for a broad range of continuous-time physical systems. Cellier [49] describes howto interconnect a set of basic bond graph elements within theobject-oriented modeling language Dymola.

In this paper, we focus on discrete-valued systems. As anexample of composing such systems, consider the case wherewe have two simple blocks Bi and Bj , each with one input andone output. We further assume that the equations for blocks Biand Bj are Oi = φ(I i, X i) and Oj = φ(I j , X j), respectively.Assume that we connect the output of block  Bi, which isdenoted by Oi, to the input of block  Bj , which is denoted

by I j . By our assumption of compositionality, we can composethe equations of  Bi and Bj by equating Oi and I j or byrenaming the input I j to the name of the output Oi.

Consider an example where Bi and Bj are both inverters,with modes M i and M j , respectively, where each mode hasvalues {OK, bad}, and with the equations given by

Bi : (M i = OK ) ∧ (I i ⇒ ¬Oi)

Bj : (M j = OK ) ∧ (I j ⇒ ¬Oj).

If we rename I j to be Oi, then we obtain the composedequations

(M i

= OK ) ∧ (I i

⇒ ¬Oi) (M j

= OK ) ∧ (Oi

⇒ ¬Oj

).

If the system equations are defined by differential equations,then an analogous example on block compositionality can beprovided, e.g., [12] and [46].

In our model generation, we assume that we can representthe block diagram, which consists of a set of blocks connectedby directed arcs, in terms of a directed graph G.

2) Topology Graph: The topology graph G defines the con-nectivity over the system components. For example, electroniccircuits can be viewed as graphs in which nodes are elec-tronic components (such as logic gates in digital circuits) andedges are wires in a broad sense [16]. In gene transcriptionalregulatory networks (TRNs), nodes represent genes and edgescorrespond to regulatory interactions at the transcriptional levelbetween the genes [7], [50].

G can be either directed or undirected, depending on the se-mantics of the component coupling relations. It is important tonote, however, that most compositional modeling frameworksassume directionality, either explicitly [41] or implicitly (i.e.,deriving the directionality) [12], [46].

C. Topology Generation Assumption

A second key assumption that we make is that the topologyof real-world systems can be captured using a random graphframework. In the past several years, several recent theoretical

studies and extensive data analyses have shown that a variety of complex systems, including biological [15], [17], social [15],

[17], and technological [16], [17] systems, share a common un-derlying structure, which is characterized by a class of randomgraph models. In this structure, the nodes form several looselyconnected clusters, every node can be reached from every othernode by a small number of hops or steps, and the degreedistribution P k, which is the probability of finding a node withk links, displays a heavy tail [17]. Several random graph modelshave been proposed to capture the real-world graph properties,such as the Watts–Strogatz [or small-world graph (SWG)][19] and the Barabasi–Albert [or preferential attachment (PA)]models [18].

Throughout this paper, we use this topological property toautomatically generate the structure of our diagnosis models.

III. SYSTEM ARCHITECTURE

The use of random-graph generators is appealing becausethey can generate an infinite number of models, in each of which the model parameters (e.g., number and type of com-

ponents) can be specified. However, only a small subset of random-graph models can be considered reasonable with re-spect to technological constraints and topological properties[51]. The key question for any work on benchmark generationis “How good are the synthetic models that are produced?” Forsynthetic benchmark models to be useful, they must be shownto be realistic proxies for real models. Thus, it is importantto have both a strong experimental platform and objectivemeasures of benchmark model quality with which to evaluatethe output of the automated generation process. Naturally,we can demonstrate realism by comparing real benchmark models to “clones” generated synthetically from the charac-terization of the existing real models. Real and clone models

could be compared on the basis of an important structural andbehavioral properties, and this realism assessment approachhas been widely used in the synthetic model generation of various application domains, including electronic circuits [51]–[53], the Internet [54], and biological systems [7]. We extendthese domain-specific approaches and propose our domain-independent automated benchmark model generators.

Generation Algorithm: We generate diagnostic (bench-mark) models in a three-step process.

1) Domain analysis: Analyze existing domain models toextract important model properties.

2) Topology generation: Generate the (topology) graph Gunderlying each synthetic model.

3) System-level behavioral model generation: Assign

components to each node in G and create the system-level

behavioral model B.

Fig. 1 depicts the system architecture. The figure shows howStep 1, domain analysis, characterizes two types of information:1) domain topological properties, which are used for topologygeneration (Step 2), and 2) component statistics, which are usedfor behavioral-model generation (Step 3). Step 2 is concerned

with generating a system topology graph G that captures thedomain topology with high fidelity. We have implemented thisstep in terms of model selection, i.e., selecting the model type(from among a large set of model types that can be created by

different model generators) that best matches the key topologi-cal properties of the original domain model. Finally, in Step 3,

Page 5: A Benchmark Diagnostic Model Generation System

8/7/2019 A Benchmark Diagnostic Model Generation System

http://slidepdf.com/reader/full/a-benchmark-diagnostic-model-generation-system 5/23

WANG AND PROVAN: BENCHMARK DIAGNOSTIC MODEL GENERATION SYSTEM 963

Fig. 1. Automated model generation framework.

using G, the component statistics, and a component library, we

create the behavioral model for the synthetic system.In the following sections, we will show the main model

generation steps in detail.

IV. DOMAIN ANALYSIS MODULE

We perform domain analysis to capture two types of data:

1) component statistical properties;2) topological statistical properties.

We now describe each data type in turn.

 A. Component Analysis

The analysis of the component properties extracts informa-tion about the distribution of components in Ψ and also poten-tially the connectivity patterns, or motifs, for the components.The statistical data on components help us to generate morerealistic behavioral models.

For example, for a circuit, we must classify the componenttypes based on their connectivity and obtain the relative distrib-ution for each connectivity class. Hence, in a circuit, which hasa directed graph as its underlying structure such that every nodecorresponds to a gate, we identify the following:

1) the set of connectivity classes, where each class is distin-guished by the pair ηij = (#-inputs, #-outputs) of everycomponent (node v ∈ G)4;

2) for each ηij , we identify the relative proportion of gates;for example, for η11, we compute the relative proportionof buffers and inverters.

It is possible to generate models using component clustersof size larger than the primitive components for model gen-eration. This approach has been advocated as the proper onefor biological domains, among others. For example, variousresearchers, e.g., [55]–[57], have argued that the underlyingbuilding blocks of biosystems, or motifs, consist of interactinggroups of between two and four genes, which control transcrip-tional regulation [58]. Given this evidence, Shen-Orr et al. [56]

4

Our current automated procedure deals only with the number of inputs, butin the future, we plan to extend the implementation to cover both inputs andoutputs.

have proposed motifs as the basic building blocks in biologicaland technological networks and further argue that such motifspossess direct analogs in technological systems. A motif  hasbeen defined as a subgraph that occurs significantly morefrequently in real-world networks than expected by chancealone [59]. The observed overrepresentation of motifs has beeninterpreted as a manifestation of behavioral constraints anddesign principles that have shaped the network architecture atthe local level. Some researchers argue that motifs reflect theunderlying processes that generated different networks and mayhave specific functions as elementary computational circuits inthe networks [59]. Motifs may then predict what system-levelfunction a network performs and how it performs it.

The proposed model generation approach can be used forsynthetic generation based on motifs as long as the domainanalysis is focused on the motif level rather than the componentlevel.

  B. Topological Analysis

We extract topological properties that are widely used tocharacterize complex systems [54], [60]. The obtained statis-tical data, based on key topological properties, can help us toselect the most plausible topology generation algorithms andbehavioral model.

We assume that we have a graph G(V, E ) with n vertices andm edges. Some key topological properties are now summarized.

1) Global Graph Properties: Two fundamental global graphproperties, which are highly domain dependent, are the charac-teristic path length L and the clustering coefficient C .

Shortest paths play an important role in the transport andcommunication within a network. For such a reason, shortestpaths have also played an important role in the characterization

of the internal structure of a graph. A measure of the typical sep-aration between two nodes in the graph is given by the averageshortest path length, which is also known as the characteristicpath length L, defined as the mean of geodesic lengths over allcouples of nodes.

Graph clustering characterizes the degree of cliquishnessof a typical neighborhood (a node’s immediately connectedneighbors). The clustering coefficient C i for a vertex vi is theproportion of links between the vertices within its neighbor-hood divided by the number of links that could possibly existbetween them. The graph clustering coefficient C  is the averageof the clustering coefficient C i for each vertex vi [15].

2) Graph Degree Properties: The degree (or connectivity)

ki of a node vi is the number of edges incident with the node. Alist of the node degrees of a graph is called the degree sequence.The most basic topological characterization of a graph G can beobtained in terms of the degree distribution P k, which is definedas the probability that a node chosen uniformly at random hasdegree k or, equivalently, as the fraction of nodes in the graphhaving degree k.

The degree distributions of some complex systems, such aspower grids, appear to have exponential tails: P k ∝ e−k/κ, asindicated by their approximately straight line forms on thesemilogarithmic scales [17].

Many real-world systems, such as the WWW and geneTRNs, are heavy tailed in their degree distributions. Power laws

can characterize their tails, i.e., P k ∝ k−γ , as indicated by theirapproximately straight line forms on the double-logarithmic

Page 6: A Benchmark Diagnostic Model Generation System

8/7/2019 A Benchmark Diagnostic Model Generation System

http://slidepdf.com/reader/full/a-benchmark-diagnostic-model-generation-system 6/23

964 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 40, NO. 5, SEPTEMBER 2010

scales [17]. They are also referred to as scale-free networks,although it is only their degree distributions that are scalefree [17].

Other common forms for degree distributions are power lawswith cutoffs [17], [20], [61], such as those seen in electroniccircuits and airport networks. The degree distribution looks likea power law over the lower range of values but decays quicklyfor higher values. Often, this decay is exponential, and hence,this is usually called an exponential cutoff: P k ∝ k−γe−k/κ,where e−k/κ is the exponential cutoff term, and k−γ is thepower law term.

While most of the focus regarding node degrees has fallenon degree distributions, there are higher-order statistics thatcould also be considered. Mahadevan et al. [54] introduces thedK  series of probability distributions that specify all degreecorrelations within d-sized subgraphs of a given graph G. Inthis framework, the degree distribution P k is the 1K  distrib-ution. The 2K  distribution is the joint degree distribution thatdescribes degree correlations for pairs of connected nodes. The

 joint degree distribution is

P k1,k2 =m(k1, k2)μ(k1, k2)

2m

where m(k1, k2) is the number of edges between nodes of degree k1 and k2, and μ(k1, k2) is 2if  k1 = k2, and 1 otherwise.The higher-order distributions can be defined analogously. Thestatistical data of  dK  distribution can be used as input con-straints of random graph generation approach in the subsequentsection.

3) Spatial Properties: A particular class of networks arethose embedded in real space, i.e., networks whose nodesoccupy a precise position in 2-D or 3-D Euclidean space,

and whose edges are real physical connections. Along with acomplex topological structure, many spatial networks display alarge heterogeneity in the wire length of the connections [15].For example, both electronic circuits and brain networks haveheavy-tailed wire length distributions [62], [63].

4) Design Objectives: Various researchers have also pro-posed optimization (OPT) approaches as a means of generatingsystem topologies [64], [65]. By investigating plausible objec-tives and constraints in the design of actual networks, observedtopological properties such as node degree distributions canbe understood as the natural byproduct of an approximatelyoptimal solution to a network design problem. For example,empirical analysis demonstrates that the wire length optimiza-

tion is among the underlying driving forces creating power lawdegree distributions with cutoffs in both electronic circuits andbrain networks [53], [62], [63].

C. Topological Metric Selection

We assume that we have a correct set of behavioral compo-nents, meaning that the system topology is the source of modelfidelity. In this case, we need to identify metrics for topologycomparison, i.e., methods to measure the topological distance

δ(G, G) between real topology graph G and synthetic topology

graph G.Naturally, the topological properties discussed in the previ-

ous section, i.e., the characteristic path length L, the clusteringcoefficient C , and the degree distribution P k, can be used

as standard metrics. In addition to these standard metrics,recently, some extended metrics have also been introduced forvarious applications [15], [54], [66]. We focus on the followingextended metrics.

s-Metric: The s-Metric of graph G is defined as s(G) =edge(vi,vj)

didj , where (vi, vj) are the edges in the graph, and

di and dj are the degrees of the node vi and vj , respectively. Thes-Metric is closely related to betweenness, degree correlation,and graph assortativity [54]. Recent research in both techno-logical and biological systems has shown that the correlationstructure has an important impact on system function andperformance [67].

Subgraph Frequency Distribution: P (F x(G)) defines thatprobability of subgraph of type x occurring in graph G. Thedistribution P (F x(G)) enables us to analyze the frequencies of all subgraphs with specified sizes, and such subgraph frequencystatistics have been successfully applied on the evaluation of biological network models [68], [69].

 Rent Exponent: For the evaluation of algorithms that are re-

lated to circuit partitioning, the main characterization parameterof interconnection complexity is the Rent exponent [53], whichreflects the relationship between the number of terminals in apartitioned circuit and the number of blocks per partition.5

  Inference Complexity Metric: In measuring the diagnosisinference complexity, we use the treewidth [70] of the system’stopology graph.6 We now explain our choice of metric formeasuring the diagnosis inference complexity.

The worst-case complexity of MBD has been thoroughlystudied. For the propositional models considered in this paper,the complexity is Σ p2-hard [71]. The average-case complexityhas not been heavily studied, apart from an empirical analysisof the ISCAS circuit benchmarks [72], nor are there detailed

comparative analyses of the different MBD algorithms. Exist-ing empirical evidence suggests that the diagnostic inferencecomplexity of a model Φ is a function of two parameters: 1) theobservation vector OBS  and 2) the treewidth τ  of the Gaifmangraph of  Φ [73].7 The observation vector OBS  is closely re-lated to the minimal cardinality diagnosis β  [74]; search-baseddiagnosis algorithms have complexity proportional to β , butcompilation-based algorithms, such as the Assumption-basedTruth Maintenance System (ATMS) [75] or causal network ap-proach [44], or stochastic algorithms like SAFARI, are relativelyinsensitive to β  [76]. The treewidth τ  of a diagnosis model Φgoverns the worst-case complexity of  Φ: the abduction, SAT,and CSP problems are fixed-parameter tractable in τ , which is

5These differences in complexity of the interconnect topology have beenexperimentally observed by Rent, and his observations led to the well-knownRent’s rule: a relationship between the average number of elementary blocks Bin the modules of a partitioned circuit and the average number of the module’sexternal connections (terminals) T , i.e., T  = tBr , where t is the averagenumber of terminals per logic block, and r is called the Rent exponent. Thisexponent is a measure of the interconnection complexity of the circuit. Its valueis always smaller than 1, with increasing values for increasing interconnectioncomplexity. Generally, it ranges from 0.47 for regular circuits up to 0.75 forcomplex circuits [53].

6Roughly speaking, the treewidth is a metric of the join-tree T   of a graph G,which is a topological transformation of G into a tree of cliques, where a cliqueis a fully connected subgraph [70].

7The Gaifman graph of a conjunctive normal form (CNF) formula is a graph

having a vertex for each variable and an edge (v1, v2) if the variables v1 andv2 occur in the same clause of the formula. By treewidth of a CNF formula, werefer to the treewidth of its Gaifman graph.

Page 7: A Benchmark Diagnostic Model Generation System

8/7/2019 A Benchmark Diagnostic Model Generation System

http://slidepdf.com/reader/full/a-benchmark-diagnostic-model-generation-system 7/23

WANG AND PROVAN: BENCHMARK DIAGNOSTIC MODEL GENERATION SYSTEM 965

TABLE ITOPOLOGY GENERATION APPROACHES. INPUT PARAMETERS FOR GENERATION ALGORITHMS ARE AS FOLLOWS: n—NOD E NUMBER;

m—EDG E NUMBER; pr—REWIRING PROBABILITY; α—SPATIAL FACTOR; gs—SEED NETWORK; pd—DUPLICATION PROBABILITY; λi—TRADEOFF WEIGHT; d—SUBGRAPH SIZE

an algorithm-independent measure [77]; see also [78].8 Provan[72] showed the proportionality of inference complexity to τ for the causal network approach, in addition to the fact that the

parameter τ  dominates β .Given that τ  is an algorithm-independent measure, we adopt

it in this paper as a diagnosis complexity measure for ourpropositional models. For our BN models, τ  is accepted as thede facto inference complexity metric. Rather than computingthe treewidth τ , we adopt the metric of maximum clique sizeμ(τ ), noting that μ(τ ) = 1 + τ  [70].

We need to select suitable metrics to validate the synthetictopology based on the requirements of the particular applica-tion. A parsimonious model can capture the general principlesor structures of real-world systems, but it is hard to match alltopological metrics simultaneously and perfectly. As a conse-quence, we need to identify and understand the essential metrics

that are responsible for key behaviors of each application torank the performance of synthetic models primarily in termsof the specified metrics. For example, if the performance of a routing algorithm depends only on the distribution of theshortest path length in the network, then the topologies of areal-world and synthetic network match perfectly as soon astheir distance distributions are the same, independent of othercharacteristics [54], [79]. In the evaluation of different physicaldesign algorithms, not all aspects of realistic circuits will betaken into account, but only those that influence the aspects thatone wants to study [52]. For algorithms related to partitioning,the Rent exponent is used as one of the main objective metrics,but for some other applications, particularly for timing-driven

applications, the delay distribution is an important metric [52].Our proposed framework enables a user to specify any

evaluation metric. To empirically demonstrate the use of met-rics, in this paper, focus on generating benchmark models forevaluating the complexity of discrete MBD algorithms, andadopt a join-tree metric that focuses on diagnostic inferencecomplexity, as described in Section IV. We have previously

8The MBD problem can be framed as an abduction problem, and it canmake use of SAT algorithms for its inference. Although the SAT problem isrelated to the diagnosis problem, their complexity is different, and the entailedalgorithms are different. It was empirically shown in [76] that a SAT solver willnot solve diagnosis problems as well as custom diagnosis algorithms like the

one proposed in [76]. In fact, many diagnosis solvers use a SAT solver as asubroutine in the diagnosis algorithm; hence, SAT is often a subset of diagnosisinference.

Fig. 2. Selecting plausible algorithms depending on domain analysis. Candi-date algorithms include SWG, PA, SPA, PD, and OPT.

shown that this inference metric is more important than othernetwork characteristic [21] metrics.

V. MODEL GENERATION

This section describes the topology generation module, aswell as the instantiation of the topology with a behavioralmodel.

 A. Topology Generation Module

To generate a synthetic topology graph G using analgorithm A, we provide to A a set Π of input parameters.

We then measure the properties of  G (e.g., degree distribution)using a set Φ of graph metrics [54] to compare the properties of 

the real and synthetic topology graphs and make adjustments tothe generation process, if necessary.

There is a wide range of generation algorithms availablefor synthetic topology generation, e.g., [15] and [80]. Table Iclassifies the space of topology generation approaches that ourmodel generation tool supports, together with their key proper-ties, corresponding parameters, recommended applications, andassociated model generation computational costs. Based on theresults of domain analysis, the key properties can be appliedto select suitable algorithms. Fig. 2 shows an overview of thisselection process. For example, the PA [15] algorithm requires

the number of nodes and edges of  G as input parameters andis a viable model when the degree distribution P k ∝ k−γ (is

power-law distributed) with no cutoff. We classify the generatormodels into two main groups, as shown in column 1 of Table I:

Page 8: A Benchmark Diagnostic Model Generation System

8/7/2019 A Benchmark Diagnostic Model Generation System

http://slidepdf.com/reader/full/a-benchmark-diagnostic-model-generation-system 8/23

966 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 40, NO. 5, SEPTEMBER 2010

1) explanatory models, which attempt to capture the underlyinggrowth or evolution process of the system topology in theresulting model, or 2) descriptive models, which randomlygenerate topology graphs under the constraints on the specifiedtopological properties, independent of any complex systemgrowth process. For example, the PA model captures a specificnetwork growth process of complex systems, in which a newstructure preferentially forms around existing substructures[15], and provides a plausible explanation for the origin of the corresponding power law degree distributions. In contrast,given a specified degree sequence, the descriptive generalizedrandom graph (GRG) model [15], [81] randomly forms edgesby pairing nodes with uniform probability and satisfies thedegree constraint on each node simultaneously. The process weadopt to generating high-fidelity synthetic models differs basedon this basic classification. In the following, we will summarizeour model selection process (using these two classes) and thenreview the different generation approaches.

1) Topology Generation Using Explanatory Models: Given

key properties obtained from the topological analysis of real-world network  G in domain D and specified metric set Φaccording to domain-specific requirements, we select an ex-planatory model from a set A of possible generators (seeTable I) as follows:

1) Generate a potential algorithm set A ⊆ A based on theresults of domain analysis.

2) Optimize the parameters Πi of each algorithm Ai ∈ A

to match G in terms of specified topological metrics Φ

and put Ai into the result set A if it can match G withappropriate values of Πi.

3) If A contains multiple algorithms, we compute additionalmetrics Φ, according to further requirements in D, andcontinue to evaluate and select algorithms in terms of  Φ.

As discussed in Section IV, the degree distribution is oneof the most fundamental topological properties, and as shownin Table I, many current topology generation approaches focuson the capability to capture the degree distributions of real-world systems. Fig. 2 shows a typical example of step 1in the above topology generation process, in which potentialalgorithms are selected according to the analysis on degreedistribution. For example, in gene expression simulation, weanalyzed the topology graph of the E. coli TRN and found thatit displays a clear power law degree distribution. Accordingto Fig. 2, the PA and partial duplication (PD) models seem

viable choices for topology generation. Since the syntheticTRN topology graph G is used to generate gene expressiondata (on which the accuracy of reverse-engineering algorithmsis evaluated), we only need to measure the model fidelity interms of regular topological metrics. For this task, we can usethe degree distribution P k as the basic fidelity metric Φ for the

synthetic topology G. The metric of P k can be simplified as thecorresponding exponent γ when following a power law; the γ of the E. coli TRN is about 2.5. The parameters in the P A(n, m)and P D(n,m,gs, pd) algorithms are optimized in terms of  γ ;n and m are assigned as the numbers of nodes and edges in theactual TRN model, respectively. In step 2, we further optimizevalues of the input parameters to minimize the difference in

terms of the γ  between the synthesized topology and the actualTRN. The PA model can only generate graphs with γ  around 3,

Fig. 3. GeneratingSWG froma regularring lattice withrewiringprobability pr .

deviating from that of the TRN, but the PD model can generateγ  in a wide range (1–3). Finally, the PD model with ( pd =0.2) closely matches the actual TRN, much better than thePA model [82]. In some more complicated cases, both PA andPD models can closely match the degree distribution of a realsystem, such as the yeast protein interaction network (PIN),and we need to compute additional metrics Φ like subgraphfrequency distribution to further evaluate and select algorithms,

as presented in step 3 [83]. Our topology generation approachescan be used in a wide range of applications, including di-agnosis benchmark generation and bioinformatics simulation.More concrete examples of the topology generation process fordiagnosis are demonstrated in Section VI.

When using an explanatory model, we first restrict the possi-ble algorithms based on Model Focus (cf. column 2 in Table I),i.e., whether the domain D provides information to generate amodel from topological properties or using an OPT approachgiven the system’s global objective function. We briefly discussthese two types of approaches.

Topology-Based Generators: Given the wide range of graphgenerators defined in the literature, e.g., [15] and [80], we have

selected four of the most important approaches, i.e., SWG,PA, spatial PA (SPA), and PD models. These models show thegeneral and fundamental principles underlying the topologiesof real-world systems, and we can extend them and achievehigher fidelity by introducing richer sets of domain-specificexternal parameters. Each approach has particular properties,which lend themselves to modeling particular domains withdiffering fidelity. We now summarize each model in turn.

SWG model: This model aims to capture “small-world”properties observed in many real-world systems like electroniccircuits [16] and power grids [19], such as low characteristicpath length L relative to that of a classic random-graph (ER)model Lr (L Lr), and high clustering coefficient C  relative

to that of an ER model C r (C  C r). The SWG generatorextends a regular ring lattice with a set of random connectionsdetermined by a rewiring probability pr . We adopt the extendedSWG approach of [21], which can model the arbitrary meandegrees k that occur in real systems, and not just integral meandegrees, as in the standard generator. pr 0 corresponds toa regular graph, and pr 1 corresponds to a random graph;graphs with real-world structure occur in between these ex-tremes. Fig. 3 depicts the graph generation process, wherewe control the proportion of random edges using a rewiringprobability pr. By continually increasing pr , the regularity andmodularity of the generated graph will keep decreasing, morelong-range links and nodes with higher degree will appear, and

consequentially, the characteristic path length L will becomesmaller.

Page 9: A Benchmark Diagnostic Model Generation System

8/7/2019 A Benchmark Diagnostic Model Generation System

http://slidepdf.com/reader/full/a-benchmark-diagnostic-model-generation-system 9/23

WANG AND PROVAN: BENCHMARK DIAGNOSTIC MODEL GENERATION SYSTEM 967

Fig. 4. Graphs generated by the SPA model by enhancing the spatialconstraint.

PA model: This model focuses on capturing the power lawof the degree distribution using a WWW-inspired generationprocess [18]. Starting with n0 isolated nodes, at each t =1, 2, . . . , n − n0, a new node vj with mnew links is added to thegraph. The probability P (vi, vj) that a link will connect vj toan existing node vi is linearly proportional to the actual degreedi of the node vi.

SPA model: The SPA model extends the PA model witha parameter α that improves the ability to capture networkswith spatial constraints embedded in physical space, such aselectronic circuits, telecommunications networks, and trans-portation networks [15]. In the SPA model, the node positionis chosen randomly in a 2-D square space with uniform density.Connections of a new node vj with each existing node vi areestablished with probability P (vi, vj) ∝ diw

−αij , where wij is

the spatial (Euclidean or Manhattan) distance between the nodepositions, di is the degree of the node vi, and α ≥ 0 is atunable parameter used to adjust spatial constraint and shapethe connection probability in the PA process. When α = 0, themodel corresponds to the standard PA model. By continually

increasing α, the modularity of the generated graph will keepincreasing, fewer long-range links and high-degree nodes willappear, and consequentially, the characteristic path length Lwill become larger. Finally, the degree distribution will degradefrom the power law distribution to the exponential distributionwith sharper cutoff. Fig. 4 displays the SPA graph generationby adjusting the geometric constraint. Similarly, we also extendthe PA process to match the mean degree k of the real circuit.

PD model: The PD model aims to capture the duplicationmechanism, which is a dominant evolutionary force in shapingbiological networks [80], in contrast to other mechanisms suchas PA. Given a initial seed network  gs, the network is updatedby randomly choosing a node vi, adding a duplicate of  vi

called node vj , and connecting vj to each neighbor of  viwith probability pd. This model and its variants have beenwidely applied on bioinformatic applications related to PINsand TRNs.

Optimization-Based Generators: Rather than explicitlyreplicate the statistical properties, the OPT approach usesan optimization framework to model the mechanisms drivingnetwork growth or evolution. The OPT model formulates aweighted objective function over conflicting system propertiesξi and weights λi, e.g., f  =

ni=1 ξi · λi, and trades off the

properties using the weights λi. For example, in some sys-tems embedded in physical space, such as electronic circuitsand brain networks, the topological structures are shaped and

optimized under two conflicting constraints: 1) informationtransmission steps (characteristic path length L) and 2) cost

of constructing connection (average wire length W ) [53], [63],[84]. The objective function is formulated as follows: f  =λL + (1 − λ)W , where 0 ≤ λ ≤ 1. For this OPT model, theoptimization function concentrates only on minimizing the totalwire length at λ = 0, and a regular network emerges with anearly uniform degree and high characteristic path length L.

At λ = 1, the optimization function concentrates only on mini-mizing the average shortest path length, and a star-like network emerges with highly connected hubs. The topology graphs withthe power law distributions with cutoffs should emerge when0 < λ < 1 [64], [65]. The optimization process is looking fora solution that minimizes the above objective function at anappropriate value of  λ. This approach can also give rise topower laws in graph degree distributions with cutoffs underappropriate values of  λ [64], [65]. Starting from a connectedrandom network in which nodes are evenly put on a 2-Dsquare, we use simulated annealing to search for the minimumcost of the objective function [85], [86]. In each annealingrearrangement step, an edge is randomly selected and rewired.

In rearrangements, duplicated edges and self-loops are notallowed to ensure that no node will be disconnected or isolated.2) Topology Generation Using Descriptive Models: The

dK -series Model generator [54] has as its primary input pa-rameter an integer d, which allows one to specify all degreecorrelations within d-size subgraphs of a given graph G.9 1K captures the degree distribution P k and is equal to the GRGapproach [15], [81]. 2K  randomly generates synthetic graphsby maintaining of the joint degree distribution of the giventopology graph G, and 3K  considers interconnectivity amongtriples of nodes.

Generally, the set of  (d + 1)K -graphs is a subset of  dK -graphs, and larger values of  d further constrain the number of 

possible graphs. Given a descriptive dK -series algorithm, wegenerate a synthetic model G by increasing the input parameter

d until the generated graph G matches the properties of the real-world graph G with sufficient fidelity in terms of the specifiedmetrics Φ. Increasing values of  d capture progressively moreproperties of  G at the cost of more complex representationof the probability distribution and dramatically increasing thecomputational complexity.

Mahadevan et al. [54] found that the d = 2 case is sufficientfor most practical purposes, whereas d = 3 essentially recon-structs the Internet AS- and router-level topologies exactly interms of regular graph metrics. Our experiments show similarresults on the TRN of the E. coli and electronic circuits [82].

Another dimension in model selection encompasses tradeoffsbetween 1) the complexity of a model and the number of metrics it tries to reproduce, and 2) its explanatory power andassociated generality. Although the dK -series model can gener-ally capture regular topological metrics better than explanatorymodels due to the number of constraints imposed, we cannotuse it to discover laws governing the topology growth processof a particular system. It lacks the predictive and rescalingpower necessary for benchmark generation. Our experimentson diagnosis model generation also showed that the dK -seriesmodel is not flexible enough for fitting more complicated join-tree metrics.

9A large number parameters are needed for every value of d in real imple-mentations, but d is the governing parameter.

Page 10: A Benchmark Diagnostic Model Generation System

8/7/2019 A Benchmark Diagnostic Model Generation System

http://slidepdf.com/reader/full/a-benchmark-diagnostic-model-generation-system 10/23

968 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 40, NO. 5, SEPTEMBER 2010

Fig. 5. Graph created for simple circuit topology. Inputs are denoted byyellow nodes, components by blue nodes, and outputs by dotted green nodes.

 B. System Behavioral Model Generation

To generate a behavioral model, we assign components toG and then merge the behavioral equations for each assignedsystem component.

1) Assign Components to Graph G: Given a topology graphG, we associate with each node in G a component based on thenumber of incoming and outgoing arcs for the node. Hence,

given a node v ∈ G with i inputs and o outputs, we assign acomponent denoted as ΔZ(i,o,τ, BZ , w), where τ  denotes thetype (e.g., AN D gate, OR gate), BZ defines the functionality(behavioral equations) of component Z , and w denotes theweights assigned to variables, e.g., probabilities assigned to thecomponent failure modes of Z .

  Example 1: Consider that the topology generation processhas created a graph G, as shown in Fig. 5. This graph has beencreated with appropriate proportions of input, component, andoutput nodes based on our domain analysis. Given this structureas input, together with a component library and componentstatistics, we can now assign components to G.

Given a node v ∈ G, we randomly assign to v a suitable

component with probability based on the computed componentdistribution. For example, the single-input nodes correspondto single-input gates (NOT and buffer), and the dual-inputnodes correspond to dual-input gates (AN D, OR, NAND, NO R,and XO R).

2) Generate the System Behavioral Model: In the final step,we generate the system functionality in terms of the union of the component functions such that we match the correspondinginputs and outputs. As an example of input/output matching,consider the following: if output 1 of component X , which isdenoted as OX,1, is the second input to component Y , which isdenoted as OY,2, then we set OX,1 = OY,2. Once this is done,we merge the component behavioral descriptions to generate

the system behavioral model B. At present, we assume thatwe can simply take the union of the component behavioraldescriptions; in future work, we plan to explore systems forwhich the behavioral composition is more complicated.

VI. CAS E STUDIES: ISCAS BENCHMARK CIRCUITS

AN D PROCESS-C ONTROL SYSTEMS

In this section, we summarize experimental results compar-ing the structure and diagnostic inference complexity propertiesof the autogenerated models with the ISCAS circuits, which arean established benchmark for circuit optimization [8], and a realpulp mill benchmark model developed by Castro and Doyle [6].

The ISCAS benchmark suites consist of multiple setsof circuits, which include the ISCAS-85, ISCAS-89, and

Fig. 6. Two-input XO R functional block implemented by four NAND gates.

ISCAS-99 circuits [8]. We have run experiments for the fullsuite of ISCAS-85 benchmarks. We present only a few demon-strative results here due to space limitations. The pulp millbenchmark model consists of modular representations of unitoperations in a complete pulp mill. The pulping process bench-mark is based on a nonlinear dynamic mathematical modelof an actual pulp mill process. The benchmark can be usedas a basis for studying several process-system tasks, includingmodeling, control, estimation, and fault diagnosis [6].

 A. Domain Analysis

Component Analysis:

 ISCAS-85 circuits: The ISCAS-85 benchmark circuits arepresented in netlists of fundamental logic gates, which pro-vide a standard nonhierarchical representation specifying bothnetwork topology and functionality (in terms of the functional-ity of primitive gates) [87]. Our component analysis revealedthat only seven types of primitive gates (NAND, NO R, NOT,AN D, OR, XO R, and BUFF) appear in ISCAS-85 circuits, andin general, each circuit contains only four or five types of gates.

The NAND gates are the most common components in every

ISCAS-85 circuit. For instance, the percentage of  NAND gatesin C17, C432, and C1355 is 100%, 74%, and 84%, respectively.One possible reason for the prevalence of the NAND gate isthat it is the cheapest gate to manufacture. Additionally, NAND

gates alone can be used to reproduce the functions of all theother logic gates. Fig. 6 shows how an XO R functional block can be implemented by four NAND gates, and the prevalence inC1355 of many NAND gates may be due to the XO R functionsit repeatedly performs.

Another property of note is that the same type of componentmay have various numbers of inputs. For example, in C432,most AN D gates have two inputs, but a small number of  AN D

gates have four, eight, and even nine inputs. We need to

carefully consider above circuit component statistics in systembehavioral model generation.

Pulp mill: According to the detailed schematic diagramsof the pulp mill in [6], there are 130 basic physical componentsand about 180 connections in the pulp mill benchmark. Themost common basic components are valves, which are used toconnect components in and between various key units. Table IIlists the statistics of the major components used in the pulp millbenchmark.

Fig. 7 shows several typical devices used in our pulp millprocess control component library, together with the variablesrepresenting the devices’ primary behavioral role. Only thepulp washers are modeled by algebraic equations. All the other

units are modeled by ordinary differential equations or partialdifferential equations [6].

Page 11: A Benchmark Diagnostic Model Generation System

8/7/2019 A Benchmark Diagnostic Model Generation System

http://slidepdf.com/reader/full/a-benchmark-diagnostic-model-generation-system 11/23

WANG AND PROVAN: BENCHMARK DIAGNOSTIC MODEL GENERATION SYSTEM 969

TABLE IISTATISTICS OF MAJOR COMPONENTS IN THE PUL P MILL BENCHMARK

Fig. 7. Partial component library for pulp mill process control domain. Eachcomponent has specific process measurement variables associated with itsfunctionality, as shown under the component’s name.

Topological Analysis:

 ISCAS-85 circuits: Cancho et al. found SWG patterns fora small collection of electronic circuits and observed powerlaw tails with cutoffs in degree distributions [16]. Fig. 8 showscumulative degree distributions for the full suite of ISCAS-85benchmark circuits in log–log scale. We can see that most cir-cuits exhibit long tails with cutoffs in their degree distributions.

Existing analysis has conjectured that the cutoffs in power lawdegree distributions might result from the presence of spatialconstraints limiting the number of links when connections arecostly [20], and this has been confirmed by further studies ondiverse networks such as the Internet, power grids, transportnetworks, and brain networks [16], [20], [63], [88]–[90].

In circuit design, the wire length has been treated as theprime parameter for performance evaluation since it has adirect impact on several important design parameters [53], [91].Recent research on circuit placement showed that the wirelength of real circuits exhibits power law distributions [53],[62], [84]. Another driving force underlying circuit design istiming. Many design cost metrics can be treated as technolog-

ical parameters that can be optimized by trading off delay andwire length [84]. The delay of signal transmission among com-

Fig. 8. Cumulative degree distributions of ISCAS-85 benchmark circuits inlog–log scale.

TABLE IIICHARACTERISTIC PATH LENGTHS OF REA L CIRCUITS

AN D CORRESPONDING RANDOM GRAPHS

ponents can approximately be simplified as the characteristicpath length.

In an electronic circuit, a cluster of components correspondsto components that together serve a particular task, e.g., asubsystem; the relatively small number of connections between

clusters corresponds to the fact that subsystems are typicallyloosely coupled. As shown in Table III, the characteristic pathlength of each ISCAS-85 circuit is close to that of the 1K random graph with corresponding size. Fig. 9(a) and (b) showsdifferent views of a typical ISCAS-85 benchmark circuit, i.e.,C432. It is important to note the density of shortcut edges joining nodes that have long paths (based on the circle connec-tivity). This circular network view displays such connectionsclearly and demonstrates that the overall graph distance orcharacteristic path length for this network will be relatively lowdue to such connections.

Pulp mill: Based on the flow sheet and detailed schematicdiagrams in [6], we decomposed the pulp mill benchmark into

fundamental device components and generated the underlyingtopology graph according to the physical connections amongthese fundamental components. The topology graph of the pulpmill displays a clear power law degree distribution with cutoff,which is discovered in many technological complex systems.Fig. 10(a) and (b) shows different views of the topology graphof the pulp mill, and both figures show that most connectionsare local connections between near neighbors, and there area few long-range connections. The pattern displayed in bothfigures is consistent with the modular structure of the topologyof the pulp mill. The detailed schematic diagrams in [6] showthat the pulp mill consists of eight loosely coupled submodules,in each of which the nodes are densely connected. We also

found that the characteristic path length L of the topology graphis much larger than that of a corresponding random graph.

Page 12: A Benchmark Diagnostic Model Generation System

8/7/2019 A Benchmark Diagnostic Model Generation System

http://slidepdf.com/reader/full/a-benchmark-diagnostic-model-generation-system 12/23

970 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 40, NO. 5, SEPTEMBER 2010

Fig. 9. (a) Directed graph depicting the topology of circuit C432, displayed in a circular view. (b) Directed graph depicting the topology of circuit C432,displayed in a regular view.

Fig. 10. (a) Topology graph of the physical components in the pulp mill displayed in a circular view. This figure shows far fewer shortcuts than the correspondinggraph of C432. (b) Topology graph of the physical components in the pulp mill displayed in a regular view. (c) Topology graph of the physical components andembedded control interfaces in the pulp mill displayed in a circular view. (d) Topology graph of the physical components and embedded control interfaces in thepulp mill displayed in a regular view.

In our topological analysis, we only consider connections be-tween physical components; however, some components, such

as valves and condensers, have an embedded control interfacein addition to the inlet and outlet flow. The topology shown in

Fig. 10(a) and (b) ignores the embedded control interfaces of these components. If we treat each embedded control interface

as an “input” component, then we can generate a new topol-ogy graph with 232 nodes and nearly 300 edges. Fig. 10(c)

Page 13: A Benchmark Diagnostic Model Generation System

8/7/2019 A Benchmark Diagnostic Model Generation System

http://slidepdf.com/reader/full/a-benchmark-diagnostic-model-generation-system 13/23

WANG AND PROVAN: BENCHMARK DIAGNOSTIC MODEL GENERATION SYSTEM 971

and (d) shows different views of the corresponding topologygraph, which share a similar pattern with the topology graph inFig. 10(a) and (b). The new topology graph also displays a clearpower law degree distribution with cutoff.

If we compare the two circular views of the pulp mill[Fig. 10(a) and (c)] with that of a typical circuit, e.g., C432[Fig. 9(a)], then we see that the pulp mill has far fewer shortcutedges than does the circuit. As noted earlier, these shortcutedges connect components that would otherwise be connectedby long paths. Hence, these figures clearly show that the pulpmill has a longer mean graph distance or characteristic pathlength than does the circuit. In addition, even with the controlstructure, the essential linear aspect of the process controlsystem is the dominant structural feature.

1) Objective MetricSelection: This section addressesways inwhich we analyze the fidelity of the synthetic models. The prim-ary metric that we use is derived from the MBD inference task.

  MBD Autogeneration Task: Given a real-world model (B,G), the objective of MBD autogeneration is to create an “equiv-

alent” synthetic model (ˆ

B,ˆ

G) that minimizes |γ A(ˆ

B,OBS ) −γ A(B,OBS )|, where A is an MBD inference algorithm thathas complexity γ A(B,OBS ) when computing a probabilityminimal diagnosis given B and observations OBS .10

As noted earlier, the treewidth of the topology graph is thekey parameter for determining the inference complexity. Thetreewidth is closely related to the largest clique size μ(T  ) of G[44], which is the parameter we adopt as a complexity measurefor an MBD model defined in terms of propositional logic oras a BN.

The main objective of our study is to provide as close acomparison of the system models of the circuit and pulp milldomains as possible. For the circuits, it is straightforward to

create propositional logic and BN diagnostic models. Althoughdiagnostic models based on propositional logic and BN differfrom the FDI approach taken for most diagnostics studies of the pulp mill, e.g., [92] and [93], system-level BN modelscan provide useful diagnostics for the global behaviors of apulp mill. To build a qualitative BN model for the pulp millbenchmark, we discretized all of our variables. For example, theinput signals (manipulated variables) are scaled and constrainedbetween ±1.0, and output signals are scaled based on thenominal/steady-state value of outputs and the maximum rangeof change of the outputs [6]. The scaled continuous-valuedvariables can further be converted into discrete-valued variablesaccording to the specification of the pulp process.

  B. Topology Generation

  Explanatory Model Approach: We generated topologygraphs using the steps shown below.

 ISCAS-85 circuits:

1) Based on the evidence of power laws with cutoffs indegree distribution and wire length, the SPA model,combining PA with the constraint of spatial layout, is aplausible candidate for the topology generation of cir-cuits. The OPT model can give rise to power laws withcutoffs in both degree distribution and wire length distri-

10We assume that γ (·) returns a complexity parameter such as CPU time ornumber of nodes searched.

bution under appropriate λ. We, along with Barthelemy[94], have found that, under appropriate parameters, theSPA model can generate structures similar to that of theOPT model. However, the computational cost of modelgeneration using the OPT model is significantly higherthan that using the SPA model, so we use the SPA modelas an efficient alternative of the OPT model in practicalapplications. The SWG model is also a possible choiceto fit the SWG pattern observed in circuits. The SWGmodel naturally has a sharp cutoff in its exponentialdegree distribution and can vary the tail length of degreedistribution in a limited range.

2) We automatically optimized parameters in each model tomatch the μ(T  ) of real circuits. Experiments showed thattwo selected models can both match real circuits withappropriate parameters. For example, the typical circuitC432 can be matched by the SWG model with pr 0.28[21], as shown in Fig. 11(a), and the SPA model withα 3.7, as shown in Fig. 11(b). Fig. 11(c)–(f) shows the

results of some other circuits.3) Since both SPA and SWG models fit the real circuitswell in terms of  μ(T  ), we can further refine the modelselection by other topological metrics, such as degreedistribution P k. As shown in Fig. 12, the SPA model canmatch real circuits better than the SWG model in termsof P k .

Pulp mill:

1) Since the pulp mill displays a power law degree distribu-tion with cutoff, the SPA model is a natural candidate fortopology generation. According to the connection patternshown in Fig. 10(a) and (c), the SWG model seemsanother possible candidate. Due to the large characteristic

path length L, the pr of the SWG model should haverewiring probability with a relatively small value.

2) We automatically optimized parameters in each model tomatch the μ(T  ) of the above two types of topology graphsof the pulp mill benchmark, respectively.

Topology graph of physical components: Fig. 13(a) showsthat the pulp mill benchmark can be matched by the SWGmodel with pr 0.15. However, as shown in Fig. 13(b), thecorresponding SPA model cannot match the low μ(T  ) of thepulp mill benchmark. By increasing α, the SPA model cangenerally produce the graphs with sharper cutoffs in degreedistributions and consequent lower μ(T  ), but as shown inFig. 13(b), tuning α becomes counterproductive when α > 5

due to the limited size of the pulp mill benchmark. As shownin Fig. 14(a), the SPA model can match the pulp mill bench-mark better than the SWG model in terms of  P k. The data inFig. 14(a) are averaged over 100 graph instances and demon-strate that the SPA model can generally closely match thedegree distribution of the pulp mill benchmark, although asmall fraction of graph instances, with overly long tails in theirdegree distributions, contribute to a high μ(T  ).

Topology graph of physical components and control inter-

 faces: The results are similar to those of the topology graphof physical components. Fig. 13(c) shows that the pulp millbenchmark can be matched by the SWG model with  pr 0.08.As shown in Fig. 13(d), the SPA model cannot match the low

μ(T  ) of the pulp mill benchmark. Fig. 14(b) shows similarresults on P k as well.

Page 14: A Benchmark Diagnostic Model Generation System

8/7/2019 A Benchmark Diagnostic Model Generation System

http://slidepdf.com/reader/full/a-benchmark-diagnostic-model-generation-system 14/23

972 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 40, NO. 5, SEPTEMBER 2010

Fig. 11. We use the SWG and SPA models to create synthetic circuits with the number of nodes and edges identical to that of real circuits, and analyze thecorresponding average-case inference complexity and average maximum degree (averaged over 100 runs). We automatically optimize the appropriate parameter(rewiring probability parameter pr of the SWG model or spatial constraint parameter α of the SPA model) to match the inference complexity (measured bymaximum clique size) of real circuits. Experiments show that the circuit depicted can be matched by the SWG/SPA model with corresponding pr/α parametersin terms of  maximum clique size. For each plot, we show the generator type and the parameter value. (a) C432, SWG ( pr 0.28). (b) C432, SPA (α 3.7).(c) C499, SWG ( pr 0.18). (d) C499, SPA (α 5.4). (e) C880, SWG ( pr 0.14). (f) C880, SPA (α 5.8).

dK -Series Approach:

  ISCAS-85 circuits: As shown in [95]–[97], when d = 1,

the dK -series model can match the topological properties of ISCAS-85 circuits well.

Pulp mill: As shown in [97], even when d = 3, the dK -series model cannot capture the basic topological properties of 

the pulp mill benchmark, although the same approach can per-fectly match all the regular topological metrics of the Internet,

Page 15: A Benchmark Diagnostic Model Generation System

8/7/2019 A Benchmark Diagnostic Model Generation System

http://slidepdf.com/reader/full/a-benchmark-diagnostic-model-generation-system 15/23

WANG AND PROVAN: BENCHMARK DIAGNOSTIC MODEL GENERATION SYSTEM 973

Fig. 12. Cumulative degree distributions of C432 and graphs generated by theSWG ( pr = 0.28) and SPA (α = 3.7) model (averaged over 100 graphs).

TRNs, and electronic circuits. The characteristic path lengths L

of both topology graphs of the pulp mill benchmark are around30% larger than that of the corresponding 3K  models.

C. Behavioral Model Generation

We illustrate how we construct behavioral models for thecircuits and pulp mill, respectively, using a continuation of ourrunning example. We show how we create behavioral modelsusing propositional logic and BNs.

 ISCAS-85 circuits:

 Example 2: Given a desired system with n components, wegenerate the required topology, as shown in Fig. 15(a). Thetopology in Fig. 15(a) depicts the schematic of a simple circuit

with arbitrary components A, B, C, D, and E. The circuit hastwo inputs I 1 and I 2, with the output of component i denotedby Oi.

 Example 3: Propositional logic: Given the topology of thenine-node graph G in Fig. 5, we assign components to G asfollows:

1) For each input node, we assign an input.2) For each output node, we assign an output.3) For each component node, we assign a component based

on the distribution over component types.

Fig. 15(a) depicts the schematic of the circuit with twoinputs I 1 and I 2, with the output of component i denotedby Oi, and arbitrary components A, B, C, D, and E. Given

the component distribution, Fig. 15(b) shows the circuit withinstantiated components.

Finally, we generate the system behavioral specification interms of the union of the component functions. For our di-agnostic application, we generate the behavioral descriptionas the union of the components’ normal-mode equations (and

potentially failure-mode equations). In the following, we definethe normal-mode equations for the components Inverter, Buffer,and AN D:

Inverter i (M i = OK ) ⇔ (I i ⇒ ¬Oi)Buffer i (M i = OK ) ⇔ (I i ⇒ Oi)

AND i (M i = OK ) ⇔ [(I i1 ∧ I i2) ⇒ Oi](M i = OK ) ⇔ [(¬I i1 ∨ ¬I i2) ⇒ ¬Oi] .

If we rename appropriate inputs and outputs, then we obtainthe composed equations, shown at the bottom of the page.

  Example 4: BN: In a BN [98], we assign to each node v ∈G a probability distribution P r(v|π(v)), where π(v)’s are theparents of v in G.

Given the component assignment in Fig. 15(b), we assigna distribution to failure-mode values by assuming that nor-mal behavior is highly likely, i.e., P r{C i = OK } 0.99,and faulty behavior is unlikely, i.e., P r{C i = OK } 0.01.Fig. 15(b) depicts the instantiated failure mode for the com-

ponents in shaded boxes: Components B, C, and E have SA1fault modes,11 component A has SA0 fault mode, and com-ponent D has INVERT fault mode. Given this information,we can generate a system description with distributions cor-responding to the component types and fault mode types, as just described. The distributions required are P r(OA|I 1, M A),P r(OB|OA, M B), P r(OC |I 2, OA, M C ), P r(OD|OB , M D),and P r(OE |OC , M E). Table IV shows a sample distributionfor P r(OA|I 1, M A).

Pulp mill: According to the process described in the pre-vious section, we can generate a synthetic topology graph G.Fig. 16(a) shows a snippet of the generated topology graphG (including both physical components and control interfaces)

with nodes A, B, C, and D. We associate with each node inG a component based on the number of inputs and outputs forthe node. All four nodes in Fig. 16(a) have two inputs and oneoutput, and we randomly select a component for each node withprobability that is proportional to its frequency in the actualpulp mill. As shown in Fig. 16(b), the four nodes are randomlyassociated with the valve and condenser, respectively, which aretwo types of the most common components in the pulp mill, asshown in Table II.

In the component library of the pulp mill domain, a com-ponent is denoted as ΔZ(i,o,τ, BZ , wZ), where τ  denotes thetype (e.g., valve, condenser, or tank), BZ defines the function-ality (behavioral equations) of component Z , and wZ denotes

the probabilities assigned to the component failure modes of Z . For example, the valve component V  is represented asSDV (2, 1, valve, BV , wV ) in the component library.

11A component with a stuck-at-1(stuck-at-0) fault outputs t(f ) (resp.) inde-pendent of the input(s).

(M A = OK ) ⇔ (I 1 ⇒ ¬OA)(M B = OK ) ⇔ (OA ⇒ OB)(M D = OK ) ⇔ (I B ⇒ ¬OD)(M E = OK ) ⇔ (I C ⇒ ¬OE)

(M C  = OK ) ⇔ [(OA ∧ I 2) ⇒ OC ](M C  = OK ) ⇔ [(OA ∧ I 2) ⇒ ¬OC ]

Page 16: A Benchmark Diagnostic Model Generation System

8/7/2019 A Benchmark Diagnostic Model Generation System

http://slidepdf.com/reader/full/a-benchmark-diagnostic-model-generation-system 16/23

974 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 40, NO. 5, SEPTEMBER 2010

Fig. 13. We use the SWG and SPA models to create synthetic networks with the number of nodes and edges identical to that of real networks, and analyze thecorresponding average-case inference complexity and average maximum degree (averaged over 100 runs). We automatically optimize the appropriate parameter(rewiring probability parameter pr of the SWG model or spatial constraint parameter α of the SPA model) to match the inference complexity (measured bymaximum clique size) of real networks. Experiments show that the pulp mill network depicted can be matched by the SWG model with appropriate pr parameterin terms of maximum clique size but cannot be matched by the SPA model. For each plot, we show the generator type and the parameter value. (a) Only physicalcomponents: SWG ( pr 0.15). (b) Only physical components: SPA. (c) Physical components and embedded control interfaces: SWG ( pr 0.08). (d) Physicalcomponents and embedded control interfaces: SPA.

Fig. 14. Cumulative degree distributions of the pulp mill and synthetic networks generated by the SWG and SPA models (averaged over 100 graphs). (a) Onlyphysical components: SWG ( pr = 0.15) and SPA (α = 5). (b) Physical components and embedded control interfaces: SWG ( pr = 0.08) and SPA (α = 4).

According to the technological specification of the valve V ,

we can discretize the flow rate of  V  based on appropriatethresholds and define the flow rate as the following three states:

f  (flow), rf  (reduced flow), and nf  (no flow) [99]. The possible

fault modes M V  for valves can be modeled as OK , sc(stuck closed), lk (leaking), and hb (half-blocked) [99]. We assume

Page 17: A Benchmark Diagnostic Model Generation System

8/7/2019 A Benchmark Diagnostic Model Generation System

http://slidepdf.com/reader/full/a-benchmark-diagnostic-model-generation-system 17/23

WANG AND PROVAN: BENCHMARK DIAGNOSTIC MODEL GENERATION SYSTEM 975

Fig. 15. Schematic of a simple electronic circuit.

TABLE IVTRUTH TABLE FOR INVERTER A, WIT H SA0 FAULT MOD E

(WHERE THE OUTPUT IS f  INDEPENDENT OF THE INPUT)

Fig. 16. Schematic of a snippet of the synthetic pulp mill benchmark.

that the control condition ctrlV  of the valve V  (i.e., whetherit has been commanded to be open or closed) is known. Thepropositional equations for a behavioral model BV  of the valveV  can be defined as shown in Table V. The group of formulasin Table V describing the behavior of a valve V  shows that theoutput flow outV  depends on the behavioral mode M V , on theinput flow inV , and on the current control condition ctrlV  of the valve V ; for example, the first formula of the group assertsthat, if valve V  is in the OK  behavioral mode and the controlcondition is o (open), it will simply report its input flow inv (forexample, f ) to its output. On the contrary, if valve V  is in the

hb behavioral mode (half-blocked), then its control condition isc (closed), the input inV  is f , the output outV  is rf  (reduced

TABLE VBEHAVIORAL EQUATIONS BV   FOR THE VALVE COMPONENT V 

flow). In general, we use the symbol ∗ to denote any admissiblevalue for a variable to limit the number of formulas [99].

Similarly, we can also define the behavioral model for con-densers, on which the temperatures of the flow are measured.

In the final step, we generate the system functionality interms of the union of the component functions such that wematch the corresponding inputs and outputs. The system modelfor our model fragment consists of three sets of valve equations(as depicted in Table V), together with a set of condenserequations.

To construct a BN model, we need conditional probabilitydistributions for the components V A, COB , V C , and V D ratherthan propositional equations. The set of distributions needed forour model fragment is

P r (outV A |inV A ,ctrlV A , M V A)

P r (outCOB |outV A ,ctrlCOB , M COB)

P r

outV C |outCOB , ctrlV C,D, M V C

P r

outV D |inCOB , ctrlV C,D, M V D

.

For the distributions, we assume that all the faulty behavioralmodes are equally likely, i.e., P r{M V  = OK } = 0.01, andhave a much smaller probability than the nominal mode, i.e.,P r{M V  = OK } = 0.99. We extract the remaining probabili-ties based on the analysis of the differential equation models.

 D. Analysis of Synthetic Model Explanatory Model Approach:

  ISCAS-85 circuits: As shown in Fig. 11(a)–(f), the taillengths of the degree distributions are highly correlated withinference complexity, and the tail of  P k must be modeledwell since it defines the high-degree nodes that contributeto large cliques in the join-tree and hence high complexitiesusing join-tree metrics. The PA and PD models generate degreedistributions with tails that are longer than those of real circuits,and therefore, the resulting models have inference complexityhigher than those of the corresponding real circuits.

We analyzed all ISCAS-85 benchmark circuits with theSWG and SPA models, and these two models can capture the

inference complexity of all real circuits under appropriate pa-rameters. The experimental results of all ISCAS-85 benchmark 

Page 18: A Benchmark Diagnostic Model Generation System

8/7/2019 A Benchmark Diagnostic Model Generation System

http://slidepdf.com/reader/full/a-benchmark-diagnostic-model-generation-system 18/23

976 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 40, NO. 5, SEPTEMBER 2010

TABLE VISTATISTICS ON THE AVERAGE CLUSTERING COEFFICIENT C ,

CHARACTERISTIC PATH LENGTH L, AN D s-METRIC OF CIRCUITS AND

TH E GRAPHS GENERATED BY THE CORRESPONDING 3K-SERIES

MODEL (AVERAGED OVE R 100 GRAPHS)

TABLE VIIINFERENCE COMPLEXITY OF C432 AN D CORRESPONDING dK -SERIES

MODELS (d = 1, 2, 3). A LL VALUES OF THREE MODELS ARE

AVERAGED OVER 100 GRAPHS, RESPECTIVELY

circuits are similar, so we list the experimental results of thethree circuits due to limited space.

Pulp mill: Although the SWG generally cannot matchthe degree distribution well, it is a good and flexible modelfor fitting small-scale systems in terms of  μ(T  ). As shown

in Fig. 13(a) and (c), even for systems without “small-world”properties like the pulp mill benchmark (with large characteris-tic path length), the SWG model with a small pr is appropriate.In contrast, as shown in Fig. 13(b) and (d), the SPA model isnot suitable for generating the diagnosis benchmark of the pulpmill, which has relatively low complexity.

dK -Series Approach:

 ISCAS-85 circuits: Although this model captures the de-gree distribution and other regular topological properties well(Table VI), Table VII shows, however, that even d = 3 pro-vides insufficient fidelity to match μ(T  ) metrics for MBDbenchmark generation; it also shows that increasing d cangenerate random graphs with increasing levels of fidelity of 

inference complexity. For d > 3, the computational complexityincreases dramatically, and the size of the generated random-graph ensemble decreases exponentially as well. In this case,the dK -series model is unsuitable for diagnosis benchmark generation compared with the SPA and the SWG model. ThedK -series model is not flexible enough for diagnosis bench-mark generation due to the number of constraints imposed.

Pulp mill: In the existing topological analysis of complexsystems [54], [82], researchers found that k = 3 is gener-ally sufficient for reproducing regular topological propertiesperfectly. However, we found that the characteristic pathlength L of the pulp mill benchmark is much larger thanthe corresponding 3K  model, and the result shows that the

underlying topology of the pulp mill benchmark has someorganizational principles different from the Internet, TRNs, andelectronic circuits. To distinguish underlying mechanisms willbe an interesting topic in the future.

  Diagnosis Complexity Analysis: Table VIII lists the infer-ence complexity data of the ISCAS-85 benchmark circuits. Forall circuits of nontrivial size, the maximum clique size is verylarge, indicating that the diagnostic inference complexity formultiple-fault diagnoses would be high.

The topology graph shown in Fig. 10(a) and (b) and thetopology graph shown in Fig. 10(c) and (d) of the pulp millhave the same μ(T  ) value of 512, which is a really smallvalue compared with that of C432 (1.4 e14), although C423

has approximately the same number of components as the pulpmill benchmark.

TABLE VIIIMBD INFERENCE COMPLEXITY IN TERMS OF THE LARGEST

CLIQUE SIZE IN THE COMPILED BN MODEL

VII. RELATED WOR K

  A. Automated Model Generation

The topology generation method that we adopt was orig-

inally developed based on the theory of random graphs andcomplex networks—see [15] and [17] for background in thisarea. However, this method focuses solely on the topologi-cal structural characteristics (as captured by the graph) andignores the system behavioral specification. Such topologicalcharacteristics pertain only to a relatively abstract level, anddeveloping high-fidelity topologies for applications like simu-lation requires more precise high-order properties, which mustbe obtained by domain analysis and parameter optimizationbased on application-specific requirements. In [15] and [17],the SWG and SPA models are also discussed, but only thebasic topological properties of these models are analyzed. Aswe show in the case studies in this paper, if we want to

use these models to capture the systems in real applicationswith high fidelity, we need to optimize the parameters of the models based on practical application requirements (suchas inference complexity for MBD applications) beyond justtopological “accuracy.” We also extend this approach in [15]and [17] by adopting the system structure based on the random-graph generators and then encoding system functionality usinga component library.

This paper is most closely related to domain-specific modelgenerators, which exist for circuits [53] and biological interac-tion kinetic models [7]. Our approach is different from either of these approaches in that we make no prior assumptions aboutdomain properties but rather compute the domain properties

necessary for model generation. Our model generation ap-proach differs from related work in VLSI autogeneration, e.g.,[53], in several ways. The VLSI approach emphasizes circuitsimulation of physical design for partitioning, floorplanning,placement, and routing; in contrast, our approach focuses ongenerative topological and organizational principles of circuitsand can be used for a wider variety of applications, includingthe diagnostic applications we report.

In [53], Stroobandt developed a synthetic benchmark gen-eration method called gnl for generate netlist  that generatescircuits according to Rent’s rule. A bottom-up clustering ap-proach is used for gnl. As a result, any distribution of cellscan be used as a starting point. A piecewise-linear Rent model

is used, which allows the creation of circuits with complexityspecified in terms of Rent characteristics. gnl is mainly used to

Page 19: A Benchmark Diagnostic Model Generation System

8/7/2019 A Benchmark Diagnostic Model Generation System

http://slidepdf.com/reader/full/a-benchmark-diagnostic-model-generation-system 19/23

WANG AND PROVAN: BENCHMARK DIAGNOSTIC MODEL GENERATION SYSTEM 977

Fig. 17. We use gnl to generate benchmark circuits with the size and cell distribution identical to those of the ISCAS circuit by tuning the input parameter of the

Rent exponent in a wide range (the Rent exponent R of each ISCAS circuit is noted) [53]. (a) C432 (R 0.62). (b) C499 (R 0.55). (c) C880 (R 0.57).

evaluate the partitioning applications of electronic circuits, andwe found that it is not suitable for benchmark generation forMBD applications. As shown in Fig. 17(a)–(c), we use gnl togenerate benchmark circuits with the size and cell distributionidentical to those of real circuits by tuning the input parameterof the Rent exponent in a wide range. According to the objectivemetrics used in partitioning applications, the synthetic circuitgenerated by gnl best matches the real circuit when the inputparameter of the Rent exponent is same as the exponent of thereal circuit [53], but the corresponding inference complexity (inMBD applications) of synthetic circuits deviates far from that of 

real circuits. There is no correlation between the Rent exponentand the inference complexity, so gnl has less control on theinference complexity of generated circuits compared with ourbenchmark generation approach. It is interesting to note that in[21], we found that our autogenerated circuits had reasonableRent exponents similar to those of the real circuits. Moreover,as shown in Fig. 18, we can easily generate synthetic circuitswith expected Rent exponents by tuning input parameters, suchas rewiring probability. In physical design, the Rent exponentcharacterizes the interconnection complexity of circuits, andFig. 18 showed it is also correlated with the rewiring proba-bility. The Rent exponent and the inference complexity havethe same trend rising with the rewiring probability. The results

showed that a higher interconnection complexity correspondsto a higher inference complexity.

Fig. 18. We use the SWG model to generate synthetic circuits with thenumber of nodes and edges identical to that of the real circuits, and analyzethe corresponding Rent exponents over a wide range of rewiring probabilities.

This paper improves upon the model generation approachof [21] in several ways. First, it explicitly defines a domain-analysis phase. Second, it extends and improves upon the topol-ogy generation algorithms for creating the underlying system

structure and examines a wider range of metrics for empiricallyevaluating synthetic networks.

Page 20: A Benchmark Diagnostic Model Generation System

8/7/2019 A Benchmark Diagnostic Model Generation System

http://slidepdf.com/reader/full/a-benchmark-diagnostic-model-generation-system 20/23

978 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 40, NO. 5, SEPTEMBER 2010

Complex systems such as electronic circuits and the Internetkeep growing up to a larger scale with new technologicaldevelopment, but their structures still share some common orga-nizational principles and topological parameters. We start fromexisting models because we want to show that our approachcan generate models with realistic properties of real systems.After obtaining optimized topological parameters by comparingwith existing real models, we can select reasonable values of topological parameters and specify larger values of the sizeparameters to create synthetic models on a much larger scale.

Although we focused on the ISCAS-85 circuits, other re-cent studies also indicate that much larger circuits also havesimilar topological properties [100], [101]. These studies alsoshow that the PA mechanism is plausible for VLSI circuits:when designing a circuit, one creates new components andconnects them to existing components, with higher probabil-ity to connect to larger fanouts (power supply, clock, reset,global buses or control signals, etc.) than smaller fanouts [101].Cheng et al. [101] demonstrated the possibility of non-Rent

based yet equally plausible and well-fitting structural modelsfor VLSI circuits and their interconnections. These resultsare consistent with our analyses on ISCAS-85 circuits. Giventhat PA-based models have been “empirically validated” withsimilar procedures as in previous empirical studies of Rent’srule, Cheng et al. [101] even suggest that a power-law fanoutdistribution or other interconnect structural characteristics mayneither necessarily follow from, nor imply, that the circuit is“Rentian.” Hence, they claim that emphasis on structure basedon Rent’s rule may be inappropriate.

 B. Model Composability and Modeling Tools

Compositional modeling uses a set of behavioral component

models, together with a specification of component interactions(called a “scenario” in [10]), to generate useful (mathematical)models. Our approach differs from that of [10] in that we createthe system structure or scenario using model generators insteadof manual work. Further, although the model generation (orcompositional modeling) approach has primarily been appliedto physical systems, it can be applied to other domains, suchas socioeconomic, ecological, and biological systems [7], [50],[102], [103].

Biswas et al. explore the use of bond graphs [104], [105]for compositional modeling. Manders et al. [38] describe howa meta-programmable visual modeling tool called the genericmodeling environment can be used for compositional model-

ing, and in [40], they describe the use of this approach forsystem simulation. Along similar lines, Bouamama et al. [106]describe a tool for designing FDI algorithms for thermofluidprocesses, together with a library of component models [107].Our proposed model-generation approach is fully compatiblewith bond graph modeling and model generation for FDI.

Several authors have addressed the formal aspects of compo-sitional modeling. For example, Gößler and Sifakis and Gosslerand Sifakis [25], [108] address formal modeling for compo-sitional modeling from the point of view of heterogeneousinteraction and execution and extend the framework in [42].Denckla and Mosterman [41] defined a formal block diagramlanguage. They have applied it to hybrid systems [41]; in

addition, they [109] provide translational semantics for block diagrams using the programming language Haskell. Rather than

focus of the formal aspects of compositional modeling, we haveaddressed issues of model generation, assuming such formalproperties will hold.

A variety of  causal (or block-oriented) languages has beendeveloped and implemented in tools such as Modelica [11] andSimulink [13]. Such languages often support object-orientedtools such as Dymola [110] and Modelica [11], [111]. Ourproposed model generation approach can be viewed as anextension of such modeling tools in that our proposed tool canact as a generator for models using the component librariesassociated with such tools. The objectives of our approach aredifferent than those of such tools. We start with existing modelsto generate additional benchmark models. Most modeling toolscreate new models from scratch.

VIII. CONCLUSION

  A. Summary

This paper has described a tool called CoSyMGen for

generating behavioral models that have real-world topology.CoSyMGen provides a key advance for benchmark generation:given a set of exemplar system models and a component library,it can automatically analyze the exemplar models to generatesynthetic benchmark models, of arbitrary size, with propertiesthat closely match those of the exemplar models. In addition,users can provide a range of experimental metrics for thesynthetic models, such as diagnostics complexity or simulationfidelity, to fine tune the synthetic models. This method cir-cumvents problems with using random graphs for experimentalstudies of diagnostics and provides an alternative to manuallydeveloping suites of benchmark models.

We have demonstrated CoSyMGen on two real-world do-

mains: those of diagnostics for discrete circuits and pulp millprocess control. Our experiments have shown that CoSyMGencan be used for two domains with significant differences,from the discrete-valued models typically studied in the MBDcommunity to the continuous-valued process control modelstypically studied in the FDI community. We argue that thisapproach can be used for any domain where systems can becomposed from a library of components. For example, if weuse a library of pump/engine components, CoSyMGen couldbuild systems from fluid flow or engine domains.

  B. Discussion

In addition to the benchmark generation capabilities, thedomain analysis approach of CoSyMGen reveals several usefulsystem properties that can be used to guide diagnosticiansin the selection of inference and analysis techniques. In thissense, one can view domain analysis as the meta-analysis of 

domain properties. For example, in comparing the circuit andprocess-control domains, our topological analysis has revealedsignificant differences in topology, which affect the choiceof system-level inference algorithms that use structure-basedapproaches, e.g., methods based on SAT or CSP techniques[112]. Circuits have large clusters and high treewidth, leading tointractability for any algorithm whose complexity is governedby treewidth. Hence, one will need to use additional techniques,

such as component abstraction [113], or stochastic algorithms,e.g., [77], to render inference more tractable in this domain. In

Page 21: A Benchmark Diagnostic Model Generation System

8/7/2019 A Benchmark Diagnostic Model Generation System

http://slidepdf.com/reader/full/a-benchmark-diagnostic-model-generation-system 21/23

WANG AND PROVAN: BENCHMARK DIAGNOSTIC MODEL GENERATION SYSTEM 979

contrast, process-control systems have relatively small clustersand low treewidth, leading to tractability for any algorithmwhose complexity is governed by treewidth.

CoSyMGen has been used for empirical analysis and di-agnostic algorithm analysis [77], [114], [115]. As discussed,CoSyMGen can not only conveniently generate and optimizerealistic diagnostic benchmark models by tuning parameters butalso outperform some domain-specific generators. However, inspecific cases, it is not easy to identify the essential metrics,which are responsible for key behaviors of each application, andoptimize synthetic models in terms of the specified key metrics[116]. We plan to extend and improve model optimization byintegrating the maximum likelihood estimation principle [116],which can provide a statistically objective criterion.

In the future, we also plan to extend our domain analysisto cover properties governing continuous-valued inference pa-rameters, analogous to the structure-based parameters (clustersize and treewidth) that our current system covers. In addition,we plan to make this tool available as an extension to compo-

sitional modeling tools for benchmark generation from existingcomponent libraries.

REFERENCES

[1] H. Kautz, B. Selman, and J. Hoffmann, “SatPlan: Planning as satisfiabil-ity,” in Abstracts of the 5th Int. Plan. Competition, 2006.

[2] P. D. Kundarewich and J. Rose, “Synthetic circuit generation usingclustering and iteration,” in Proc. ACM/SIGDA 11th Int. Symp. FPGA,2003, p. 245.

[3] D. Stroobandt, P. Verplaetse, and J. van Campenhout, “Towards syntheticbenchmark circuits for evaluating timing-driven CAD tools,” in Proc.

 ISPD, 1999, pp. 60–66.[4] M. Bhushan and R. Rengaswamy, “Comprehensive design of a sensor

network for chemical plants based on various diagnosability and reli-ability criteria. 1. Framework,” Ind. Eng. Chem. Res., vol. 41, no. 7,

pp. 1826–1839, Apr. 2002.[5] D. Dvorak and B. Kuipers, “Process monitoring and diagnosis: A model-based approach,” IEEE Expert , vol. 6, no. 3, pp. 67–74, Jun. 1991.

[6] J. Castro and F. J. Doyle, III, “A pulp mill benchmark problem forcontrol: Problem description,” J. Process Control, vol. 14, no. 1, pp. 17–29, Feb. 2004.

[7] T. Van den Bulcke, K. Van Leemput, B. Naudts, P. van Remortel, H. Ma,A. Verschoren, B. De Moor, and K. Marchal, “SynTReN: A generatorof synthetic gene expression data for design and analysis of structurelearning algorithms,” BMC Bioinformatics, vol. 7, no. 1, p. 43, 2006.

[8] J. E. Harlow, “Overview of popular benchmark sets,” IEEE Des. Test 

Comput., vol. 17, no. 3, pp. 15–17, Jul.–Sep. 2000.[9] M. Barty, R. Patton, M. Syfert, S. de las Heras, and J. Quevedo, “In-

troduction to the DAMADICS actuator FDI benchmark study,” Control

 Eng. Pract., vol. 14, no. 6, pp. 577–596, Jun. 2006.[10] J. Keppens and Q. Shen, “On compositional modelling,” Knowl. Eng.

 Rev., vol. 16, no. 2, pp. 157–200, Mar. 2001.

[11] J. Broenink, “Bond-graph modeling in Modelica,” in Proc. ESS, 1997,pp. 137–141.

[12] J. Broenink, “20-sim software for hierarchical bond-graph/block-diagram models,” Simul. Pract. Theory, vol. 7, no. 5/6, pp. 481–492,Dec. 1999.

[13] MATLAB Simulink. [Online]. Available: www.mathworks.com/ products/simulink/ 

[14] N. Mazzocca and V. Vittorini, “Compositional modeling of complexsystems: Contact center scenarios in OsMoSys,” in Proc. 25th ICATPN ,Bologna, Italy, Jun. 21–25, 2004, pp. 177–196.

[15] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, and D.-U. Hwang,“Complex networks: Structure and dynamics,” Phys. Rep., vol. 424,no. 4/5, pp. 175–308, Feb. 2006.

[16] R. F. I. Cancho, C. Janssen, and R. V. Solé, “Topology of technologygraphs: Small world patterns in electronic circuits,” Phys. Rev. E, Stat.

Phys. Plasmas Fluids Relat. Interdiscip. Top., vol. 64, no. 4, p. 046 119,

Sep. 2001.[17] M. E. J. Newman, “The structure and function of complex networks,”SIAM Rev., vol. 45, no. 2, pp. 167–256, Jun. 2003.

[18] A. L. Barabasi and R. Albert, “Emergence of scaling in random net-works,” Science, vol. 286, no. 5439, pp. 509–512, Oct. 1999.

[19] D. J. Watts and S. H. Strogatz, “Collective dynamics of ‘small-world’networks,” Nature, vol. 393, no. 6684, pp. 440–442, Jun. 1998.

[20] L. A. Amaral, A. Scala, M. Barthelemy, and H. E. Stanley, “Classes of small-world networks,” Proc. Nat. Acad. Sci. U.S.A., vol. 97, no. 21,pp. 11 149–11 152, Oct. 2000.

[21] G. M. Provan and J. Wang, “Automated benchmark model generators for

model-based diagnostic inference,” in Proc. IJCAI , 2007, pp. 513–518.[22] R. Reiter, “A theory of diagnosis from first principles,” Artif. Intell.,vol. 32, no. 1, pp. 57–95, Apr. 1987.

[23] P. Frank, “Fault diagnosis in dynamic systems using analytical andknowledge-based redundancy—A survey and some new results,” Auto-

matica, vol. 26, no. 3, pp. 459–474, May 1990.[24] P. Frank, S. Ding, and T. Marcu, “Model-based fault diagnosis in tech-

nical processes,” Trans. Inst. Meas. Control, vol. 22, no. 1, pp. 57–101,2000.

[25] G. Gößler and J. Sifakis, “Composition for component-based modeling,”Sci. Comput. Program., vol. 55, no. 1–3, pp. 161–183, Mar. 2005.

[26] J. de Kleer, A. Mackworth, and R. Reiter, “Characterizing diagnoses andsystems,” Artif. Intell., vol. 56, no. 2/3, pp. 197–222, Aug. 1992.

[27] S. Srinivas, “A probabilistic approach to hierarchical model-based diag-nosis,” in Proc. UAI , 1994, pp. 538–545.

[28] P. Lucas, “Bayesian model-based diagnosis,” Int. J. Approx. Reason.,vol. 27, no. 2, pp. 99–119, Aug. 2001.

[29] W. Callebaut and D. Rasskin-Gutman, Modularity: Understanding the  Development and Evolution of Natural Complex Systems. Cambridge,MA: MIT Press, 2005.

[30] L. Hartwell, J. Hopfield, S. Leibler, and A. Murray, “From molecular tomodular cell biology,” Nature, vol. 402, no. 6761Supp, pp. C47–C52,Dec. 1999.

[31] G. Wagner and L. Altenberg, “Complex adaptations and the evolution of evolvability,” Evolution, vol. 50, no. 3, pp. 967–976, 1996.

[32] R. Thomas, “Hierarchical integration of modular structures in the evo-lution of animal skeletons,” in Modularity: Understanding the Devel-

opment and Evolution of Natural Complex Systems. Cambridge, MA:MIT Press, 2005.

[33] R. Randhawa, C. Shaffer, and J. Tyson, “Model composition for macro-molecular regulatory networks,” IEEE/ACM Trans. Comput. Biol. Bioin-

 formatics, vol. 7, no. 2, pp. 278–287, Apr.–Jun. 2010.[34] H. Soueidan, M. Nikolski, and D. Sherman, “BioRica: A multi model de-

scription and simulation system,” in Proc. FOSBE , Stuttgart, Germany,Sep. 2007, pp. 279–287.[35] P. Woodbury, R. Beloin, D. Swaney, B. Gollands, and D. Weinstein,

“Using the ECLPSS software environment to build a spatially explicitcomponent-based model of ozone effects on forest ecosystems,” Eco-

logical Model., vol. 150, no. 3, pp. 211–238, May 2002.[36] N. Suh, The Principles of Design. New York: Oxford Univ. Press,

1990.[37] J. Liu and E. Lee, “A component-based approach to modeling and sim-

ulating mixed-signal and hybrid systems,” ACM Trans. Model. Comput.

Simul. (TOMACS), vol. 12, no. 4, pp. 343–368, Oct. 2002.[38] E. Manders, G. Biswas, N. Mahadevan, and G. Karsai, “Component-

oriented modeling of hybrid dynamic systems using the generic mod-eling environment,” in Proc. 4th Workshop Model-Based Develop.

Comput. Based Syst., Potsdam, Germany, 2006, pp. 159–168.[39] S. Burmester, H. Giese, and O. Oberschelp, “Hybrid UML components

for the design of complex self-optimizingmechatronic systems,” in Proc.

1st ICINCO, Setubal, Portugal, 2004, pp. 222–229.[40] M. Daigle, I. Roychoudhury, G. Biswas, and X. Koutsoukos, “Efficient

simulation of component-based hybrid models represented as hybridbond graphs,” in Proc. HSCC , vol. 4416, Lecture Notes in Computer 

Science, 2007, pp. 680–683.[41] B. Denckla and P. Mosterman, “Formalizing causal block diagrams for

modeling a class of hybrid dynamic systems,” in Proc. 44th IEEE CDC-

 ECC , 2005, pp. 4193–4198.[42] G. Gößler, S. Graf, M. E. Majster-Cederbaum, M. Martens, and

J. Sifakis, “An approach to modelling and verification of componentbased systems,” in Proc. SOFSEM (1), 2007, pp. 295–308.

[43] H. Nilsson, J. Peterson, and P. Hudak, “Functional hybrid modeling,”in Proc. PADL, vol. 2562, Lecture Notes in Computer Science, 2003,pp. 376–390.

[44] A. Darwiche, “Model-based diagnosis using structured system descrip-tions,” J. Artif. Intell. Res., vol. 8, no. 1, pp. 165–222, Jan. 1998.

[45] J. Keppens and Q. Shen, “Causality enabled compositional modellingof Bayesian networks,” in Proc. 18th Int. Workshop Qualitative Reason.

 About Phys. Syst., 2004, pp. 33–40.

Page 22: A Benchmark Diagnostic Model Generation System

8/7/2019 A Benchmark Diagnostic Model Generation System

http://slidepdf.com/reader/full/a-benchmark-diagnostic-model-generation-system 22/23

980 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 40, NO. 5, SEPTEMBER 2010

[46] A. Breunese, J. Broenink, J. Top, and J. Akkermans, “Libraries of reusable models: Theory and application,” Simulation, vol. 71, no. 1,pp. 7–22, 1998.

[47] E. Posse, J. de Lara, and H. Vangheluwe, “Processing causal block diagrams with graph-grammars in AToM 3,” in Proc. ETAPS, Workshop

 AGT , 2002, pp. 23–34.[48] L. Han, C. J. J. Paredis, and P. K. Khosla, “Object-oriented libraries of 

physical components in simulation and design,” in Proc. 2001 Summer 

Comput. Simul. Conf., Orlando, FL, Jul. 15–19, 2001, pp. 1–8.[49] F. Cellier, “Hierarchical non-linear bond graphs: A unified methodologyfor modeling complex physical systems,” Simulation, vol. 58, no. 4,pp. 230–248, 1992.

[50] P. Mendes, W. Sha, and K. Ye, “Artificial gene networks for objectivecomparison of analysis algorithms,” Bioinformatics, vol. 19 Suppl 2,pp. 122–129, Oct. 2003.

[51] M. D. Hutton, J. Rose, J. P. Grossman, and D. G. Corneil, “Characteri-zation and parameterized generation of synthetic combinational bench-mark circuits,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.,vol. 17, no. 10, pp. 985–996, Oct. 1998.

[52] P. Verplaetse, J. Van Campenhout, and D. Stroobandt, “On syntheticbenchmark generation methods,” in Proc. IEEE Int. Symp. Circuits Syst.,2000, pp. 213–216.

[53] D. Stroobandt, “Analytical methods for a priori wire length estimatesin computer systems,” Ph.D. dissertation, Ghent Univ., Ghent, Belgium,2001.

[54] P. Mahadevan, D. V. Krioukov, K. R. Fall, and A. Vahdat, “Systematictopology analysis and generation using degree correlations,” in Proc.

SIGCOMM , 2006, pp. 135–146.[55] C. T. Harbison, B. D. Gordon, T. I. Lee, N. J. Rinaldi, K. D. Macisaac,

T. W. Danford, N. M. Hannett, J.-B. Tagne, D. B. Reynolds, J. Yoo,E. G. Jennings, J. Zeitlinger, D. K. Pokholok, M. Kellis, A. P. Rolfe,K. T. Takusagawa, E. S. Lander, D. K. Gifford, E. Fraenkel, andR. A. Young, “Transcriptional regulatory code of a eukaryotic genome,”

 Nature, vol. 431, no. 7004, pp. 99–104, Sep. 2004.[56] S. S. Shen-Orr, R. Milo, S. Mangan, and U. Alon, “Network motifs in

the transcriptional regulation network of Escherichia coli,” Nat. Genet.,vol. 31, no. 1, pp. 64–68, May 2002.

[57] D. E. Zak, G. E. Gonye, J. S. Schwaber, and F. J. Doyle, “Importanceof input perturbations and stochastic gene expression in the reverse en-gineering of genetic regulatory networks: Insights from an identifiabilityanalysis of an in silico network,” Genome Res., vol. 13, no. 11, pp. 2396–

2405, Nov. 2003.[58] N. Barkai and S. Leibler, “Circadian clocks limited by noise,” Nature,vol. 403, no. 6636, pp. 267–268, Jan. 2000.

[59] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, andU. Alon, “Network motifs: Simple building blocks of complex net-works,” Science, vol. 298, no. 5594, pp. 824–827, Oct. 2002.

[60] L. Costa, F. Rodrigues, G. Travieso, and P. Boas, “Characterization of complex networks: A survey of measurements,” Adv. Phys., vol. 56,no. 1, pp. 167–242, Jan. 2007.

[61] D. Chakrabarti and C. Faloutsos, “Graph mining: Laws, generators, andalgorithms,” ACM Comput. Surv., vol. 38, no. 1, pp. 1–69, 2006.

[62] D. Christie and P. Stroobandt, “The interpretation and application of Rent’s rule,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst. , vol. 8,no. 6, pp. 639–648, Dec. 2000.

[63] Y. He, Z. J. J. Chen, and A. C. C. Evans, “Small-world anatomicalnetworks in the human brain revealed by cortical thickness from MRI,”Cereb Cortex, vol. 17, no. 10, pp. 2407–2419, Oct. 2007.

[64] R. M. D’Souza, C. Borgs, J. T. Chayes, N. Berger, and R. D. Kleinberg,“Emergence of tempered preferential attachment from optimization,”Proc. Nat. Acad. Sci. U.S.A., vol. 104, no. 15, pp. 6112–6117, Apr. 2007.

[65] N. Mathias and V. Gopal, “Small worlds: How and why,” Phys. Rev.

 E, Stat. Phys. Plasmas Fluids Relat. Interdiscip. Top., vol. 63, no. 2,p. 021117, Feb. 2001.

[66] N. Przulj, “Biological network comparison using graphlet degree distri-bution,” Bioinformatics, vol. 23, no. 2, pp. e177–e183, Jan. 2007.

[67] L. Li, J. C. Doyle, and W. Willinger, “Towards a theory of scale-freegraphs: Definition, properties, and implications,” Internet Math., vol. 2,no. 4, pp. 431–523, Mar. 2006.

[68] N. Przulj, D. G. Corneil, and I. Jurisica, “Modeling interactome: Scale-free or geometric?” Bioinformatics, vol. 20, no. 18, pp.3508–3515,Dec. 2004.

[69] M. Middendorf, E. Ziv, and C. H. Wiggins, “Inferring network mecha-nisms: The Drosophila melanogaster protein interaction network,” Proc.

 Nat. Acad. Sci. U.S.A., vol. 102, no. 9, pp. 3192–3197, Mar. 1, 2005.[70] H. L. Bodlaender, “Treewidth of graphs,” in Encyclopedia of Algorithms.New York: Springer-Verlag, 2008.

[71] T. Eiter and G. Gottlob, “The complexity of logic-based abduction,” J.

 ACM , vol. 42, no. 1, pp. 3–42, Jan. 1995.[72] G. M. Provan, “An empirical analysis of the complexity of model-based

diagnosis,” in Proc. ECAI , 2006, pp. 783–784.[73] M. Grohe and J. Flum, Parameterized Complexity Theory. New York:

Springer-Verlag, 2006.[74] A. Feldman, G. M. Provan, and A. J. C. van Gemund, “Computing

observation vectors for max-fault min-cardinality diagnoses,” in Proc.

 AAAI , 2008, pp. 919–924.[75] J. de Kleer, “An assumption-based TMS,” Artif. Intell., vol. 28, no. 2,pp. 127–162, Mar. 1986.

[76] A. Feldman, G. M. Provan, and A. J. C. van Gemund, “Computingminimal diagnoses by greedy stochastic search,” in Proc. AAAI , 2008,pp. 911–918.

[77] G. Gottlob and S. Szeider, “Fixed-parameter algorithms for artificialintelligence, constraint satisfaction and database problems,” Comput. J.,vol. 51, no. 3, pp. 303–325, May 2008.

[78] A. Ferrara, G. Pan, and M. Vardi, “Treewidth in verification: Local vs.global,” in Proc. LPAR, vol. 3835, Lecture Notes in Computer Science,2005, pp. 489–503.

[79] D. Krioukov, K. Claffy, M. Fomenkov, F. Chung, A. Vespignani, andW. Willinger, “The workshop on Internet topology (WIT) report,”SIGCOMM Comput. Commun. Rev., vol. 37, no. 1, pp. 69–73, 2007.

[80] F. R. K. Chung, L. Lu, T. G. Dewey, andD. J. Galas,“Duplication modelsfor biological networks,” J. Comput. Biol., vol. 10, no. 5, pp. 677–687,

2003.[81] F. Viger and M. Latapy, “Efficient and simple generation of random

simple connected graphs with prescribed degree sequence,” in Proc.

COCOON , 2005, pp. 440–449.[82] J. Wang and G. M. Provan, “Generating application-specific benchmark 

model for complex systems,” in Proc. AAAI , 2008, pp. 566–571.[83] F. Hormozdiari, P. Berenbrink, N. Przulj, and S. C. C. Sahinalp, “Not

all scale-free networks are born equal: The role of the seed graph inPPI network evolution,” PLoS Comput. Biol., vol. 3, no. 7, p. e118,Jul. 2007.

[84] J. Dambre, “Prediction of interconnect properties for digital circuit de-sign and technology exploration,” Ph.D. dissertation, Ghent Univ., Fac-ulty Eng., Ghent, Belgium, 2003.

[85] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, “Optimization bysimulated annealing,” Science, vol. 220, no. 4598, pp. 671–680,May 13, 1983.

[86] W. H. Press, S. A. Teukolsky, W. T. Vettering, and B. P. Flannery,  Numerical Recipes in C++. The Art of Scientific Computing, 2nd ed.Cambridge, U.K.: Cambridge Univ. Press, Feb. 2002.

[87] M. C. Hansen, H. Yalcin, and J. P. Hayes, “Unveiling the ISCAS-85benchmarks: A case study in reverse engineering,” IEEE Des. Test Com-

 put., vol. 16, no. 3, pp. 72–80, Jul.–Sep. 1999.[88] R. Guimera and L. Amaral, “Modeling the world-wide airport net-

work,” Eur. Phys. J. B—Condens. Matter , vol. 38, no. 2, pp. 381–385,Mar. 2004.

[89] R. Guimera, S. Mossa, A. Turtschi, and L. A. N. Amaral, “Theworldwide air transportation network: Anomalous centrality, communitystructure, and cities’ global roles,” Proc. Nat. Acad. Sci., vol. 102, no. 22,pp. 7794–7799, May 2005.

[90] S.-H. Yook, H. Jeong, and A.-L. Barabasi, “Modeling the Internet’slarge-scale topology,” Proc. Nat. Acad. Sci. U.S.A., vol. 99, no. 21,pp. 13 382–13386, Oct. 2002.

[91] D. Stroobandt, “A priori wire length distribution models with multiter-

minal nets,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 11,no. 1, pp. 35–43, Feb. 2003.

[92] G. Lee, T. Tosukhowong, and J. Lee, “Fault detection and diagnosis of pulp mill process,” Comput. Aided Chem. Eng., vol. 21, no. B, pp. 1461–1466, 2006.

[93] A. Samantaray and S. Ghoshal, “Sensitivity bond graph approach tomultiple fault isolation through parameter estimation,” Proc. Inst. Mech.

  Eng., Part I: J. Syst. Control Eng., vol. 221, no. 4, pp. 577–587,2007.

[94] M. Barthélemy, “Crossover from scale-free to spatial networks,” Euro-

 phys. Lett., vol. 63, no. 6, pp. 915–921, Sep. 2003.[95] J. Wang and G. Provan, “Topological analysis of specific spatial complex

networks,” Adv. Complex Syst., vol. 12, no. 1, pp. 45–71, Feb. 2009.[96] J. Wang and G. Provan, “A comparative analysis of specic spatial net-

work topological models,” in Proc. Int. Conf. Complex Sci.: Theory

 Appl., 2009, vol. 5, pp. 1514–1525.

[97] J. Wang and G. Provan, “Characterizing the structural complexity of real-world complex networks,” in Proc. Int. Conf. Complex Sci.: Theory

 Appl., 2009, vol. 4, pp. 1178–1189.

Page 23: A Benchmark Diagnostic Model Generation System

8/7/2019 A Benchmark Diagnostic Model Generation System

http://slidepdf.com/reader/full/a-benchmark-diagnostic-model-generation-system 23/23

WANG AND PROVAN: BENCHMARK DIAGNOSTIC MODEL GENERATION SYSTEM 981

[98] F. Jensen, S. Lauritzen, and K. Olesen, “Bayesian updating in recursivegraphical models by local computations,” Comput. Statist. Q., vol. 4,pp. 269–282, 1990.

[99] P. Torasso, “Compact diagnoses representation in diagnostic problemsolving,” Comput. Intell., vol. 21, no. 1, pp. 27–68, Feb. 2005.

[100] Y. Zhao and A. Doboli, “Finding broad-scale patterns in large sizeelectronic circuit netlists,” in Proc. 48th Midwest Symp. Circuits Syst.,Aug. 2005, pp. 307–310.

[101] C.-K. Cheng, A. Kahng, and B. Liu, “Interconnect implications of growth-based structural models for VLSI circuits,” in Proc. ACM Int.

Workshop Syst.-Level Interconnect Prediction, Apr. 2001, pp. 99–106.[102] R. Duboz and C. Cambier, “Small world properties in a DSDEVS model

of ecosystem,” in Proc. OICMS, 2005, pp. 65–71.[103] J. Keppens and Q. Shen, “Granularity and disaggregation in composi-

tional modelling with applications to ecological systems,” Appl. Intell.,vol. 25, no. 3, pp. 269–292, Dec. 2006.

[104] F. E. Cellier, H. Elmqvist, and M. Otter, “Modeling from physical prin-ciples,” in The Control Handbook , W. S. Levine, Ed. Boca Raton, FL:CRC Press, 1995, pp. 99–108.

[105] D. Karnopp, D. Margolis, and R. Rosenberg, Systems Dynamics: A

Unified Approach., 2nd ed. Hoboken, NJ: Wiley, 1990.[106] B. Bouamama, A. Samantaray, K. Medjaher, M. Staroswiecki, and

G. Dauphin-Tanguy, “Model builder using functional and bond graphtools for FDI design,” Control Eng. Pract., vol. 13, no. 7, pp. 875–891,Jul. 2005.

[107] B. Bouamama, “Bond graph approach as analysis tool in thermofluidmodel library conception,” J. Franklin Inst., vol. 340, no. 1, pp. 1–23,Jan. 2003.

[108] G. Gossler and J. Sifakis, “Composition for component-based mod-eling,” in Proc. 1st Int. Symp. FMCO, Leiden, The Netherlands,Nov. 5–8, 2002.

[109] B. Denckla and P. Mosterman, “Block diagrams as a syntactic extensionto Haskell,” in Proc. Workshop on Multi-Paradigm Modeling: Concepts

and Tools at the 9th Int. Conf. Model-Driven Eng. Lang. Syst., Genoa,Italy, Oct. 3, 2006, pp. 67–79.

[110] F. Cellier and R. McBride, “Object-oriented modeling of complex phys-ical systems using the Dymola bond-graph library,” in Proc. ICBGM ,2003, pp. 19–23.

[111] P. Fritzson, Principles of Object-Oriented Modeling and Simulation With

 Modelica 2.1. New York: Wiley-IEEE, 2004.

[112] G. Gottlob, R. Pichler, and F. Wei, “Bounded treewidth as a key totractability of knowledge representation and reasoning,” in Proc. AAAI ,2006, pp. 250–256.

[113] S. Siddiqi and J. Huang, “Hierarchical diagnosis of multiple faults,” inProc. 20th IJCAI , 2007, pp. 581–586.

[114] A. Santana and G. M. Provan, “An analysis of Bayesian network model-approximation techniques,” in Proc. ECAI , 2008, pp. 851–852.

[115] A. Venturini and G. M. Provan, “Incremental algorithms for approximate

compilation,” in Proc. AAAI , 2008, pp. 1495–1498.[116] I. Bezáková, A. Kalai, and R. Santhanam, “Graph model selection usingmaximum likelihood,” in Proc. 23rd ICML, 2006, pp. 105–112.

Jun Wang received the Bachelor and Master degreesin computer science from the Huazhong Universityof Science and Technology, Wuhan, China, and thePh.D. degree in computer science from UniversityCollege Cork, Cork, Ireland.

He was on the research staff with the Fujitsu R&DCenter, Beijing, China, and a Visiting Researcherwith Fujitsu Laboratories, Kawasaki, Japan. He iscurrently a member of research staff with FujitsuLaboratories of America, Inc., Sunnyvale, CA. Hiscurrent research focuses on intelligent Web comput-

ing and complex networks.

Gregory Provan is currently a Professor with theComputer Science Department and the Directorof the Complex Systems Laboratory, UniversityCollege Cork (UCC), Cork, Ireland. He haspublished over 90 peer-reviewed articles. He hasexpertise in modeling, diagnosis, and control of complex systems. The types of complex systems thathave been studied range from aerospace systems,such as the Space Shuttle and Boeing/Airbuscommercial aircraft, to zero-energy buildings.