Proceedings of the Third European Workshop on Probabilistic Graphical Models (PGM'06)

Prague, Czech Republic, September 12-15, 2006

Edited by Milan Studený and Jiří Vomlel

Prague, 2006, Action M Agency




Typeset in LaTeX.
The cover page and PGM logo: Jiří Vomlel
Graphic art: Karel Solařík

ISBN 80-86742-14-8


Preface

These are the proceedings of the third European workshop on Probabilistic Graphical Models (PGM'06), to be held in Prague, Czech Republic, September 12-15, 2006. The aim of this series of workshops on probabilistic graphical models is to provide a discussion forum for researchers interested in this topic. The first European PGM workshop (PGM'02) was held in Cuenca, Spain in November 2002. It was a successful workshop and several of its participants expressed interest in having a biennial European workshop devoted particularly to probabilistic graphical models. The second PGM workshop (PGM'04) was held in Leiden, the Netherlands in October 2004. It was also a success; more emphasis was put on collaborative work, and the participants also discussed how to foster cooperation between European research groups.

There are two trends which can be observed in connection with the PGM workshops. First, each workshop has been held during an earlier month than the preceding one: PGM'02 was held in November, PGM'04 in October, and PGM'06 will be held in September. Nevertheless, I think this is a coincidence. The second trend is the increasing number of contributions (to be) presented. I would like to believe that this is not a coincidence, but an indication of increasing research interest in probabilistic graphical models.

A total of 60 papers were submitted to PGM'06 and, after the reviewing and post-reviewing phases, 41 of them were accepted for presentation at the workshop (21 talks, 20 posters) and appear in these proceedings. The authors of these papers come from 17 different countries, mainly European ones.

To handle the reviewing process (from May 15, 2006 to June 28, 2006), the PGM Program Committee was considerably extended. This made it possible for every submitted paper to be reviewed by three independent reviewers. The reviews were handled electronically using the START conference management system.

Most of the PC members reviewed around 7 papers. Some of them also helped in the post-reviewing phase when revisions to some of the accepted papers were desired. I would therefore like to express my sincere thanks to all of the PC members for all of the work they have done towards the success of PGM'06. I think the PC members helped very much to improve the quality of the contributions. Of course, my thanks are also addressed to the authors.

Further thanks are due to the sponsor, the DAR UTIA research centre, which covered the expenses related to the START system. I am also indebted to the members of the organizing committee for their help, in particular to my co-editor, Jirka Vomlel.

My wish is for the participants in the PGM'06 workshop to appreciate both the beauty of Prague and the scientific program of the workshop. I believe there will be fruitful discussion during the workshop, and I hope that the tradition of PGM workshops will continue.

Prague, June 26, 2006
Milan Studený
PGM'06 PC Chair


Workshop Organization

Program Committee

Chair:
Milan Studený, Czech Republic

Committee Members:
Concha Bielza, Technical University of Madrid, Spain
Luis M. de Campos, University of Granada, Spain
Francisco J. Díez, UNED Madrid, Spain
Marek J. Druzdzel, University of Pittsburgh, USA
Ad Feelders, Utrecht University, Netherlands
Linda van der Gaag, Utrecht University, Netherlands
José A. Gámez, University of Castilla-La Mancha, Spain
Juan F. Huete, University of Granada, Spain
Manfred Jaeger, Aalborg University, Denmark
Finn V. Jensen, Aalborg University, Denmark
Radim Jiroušek, Academy of Sciences, Czech Republic
Pedro Larrañaga, University of the Basque Country, Spain
Peter Lucas, Radboud University Nijmegen, Netherlands
Fero Matúš, Academy of Sciences, Czech Republic
Serafín Moral, University of Granada, Spain
Thomas D. Nielsen, Aalborg University, Denmark
Kristian G. Olesen, Aalborg University, Denmark
José M. Peña, Linköping University, Sweden
José M. Puerta, University of Castilla-La Mancha, Spain
Silja Renooij, Utrecht University, Netherlands
David Ríos-Insua, Rey Juan Carlos University, Spain
Antonio Salmerón, University of Almería, Spain
James Q. Smith, University of Warwick, UK
Marco Valtorta, University of South Carolina, USA
Jiří Vomlel, Academy of Sciences, Czech Republic
Marco Zaffalon, IDSIA Lugano, Switzerland

Additional Reviewers

Alessandro Antonucci, IDSIA Lugano, Switzerland
Janneke Bolt, Utrecht University, Netherlands
Julia Flores, University of Castilla-La Mancha, Spain
Marcel van Gerven, Radboud University Nijmegen, Netherlands
Yimin Huang, University of South Carolina, USA
Søren H. Nielsen, Aalborg University, Denmark
Jana Novovičová, Academy of Sciences, Czech Republic
Valerie Sessions, University of South Carolina, USA
Petr Šimeček, Academy of Sciences, Czech Republic
Marta Vomlelová, Charles University, Czech Republic
Peter de Waal, Utrecht University, Netherlands
Changhe Yuan, University of Pittsburgh, USA


Organizing Committee

Chair:
Radim Jiroušek, Academy of Sciences, Czech Republic

Committee members:
Milena Zeithamlová, Action M Agency, Czech Republic
Marta Vomlelová, Charles University, Czech Republic
Radim Lněnička, Academy of Sciences, Czech Republic
Petr Šimeček, Academy of Sciences, Czech Republic
Jiří Vomlel, Academy of Sciences, Czech Republic


Contents

Some Variations on the PC Algorithm ... 1
    Joaquín Abellán, Manuel Gómez-Olmedo, and Serafín Moral

Learning Complex Bayesian Network Features for Classification ... 9
    Peter Antal, Andras Gezsi, Gabor Hullam, and Andras Millinghoffer

Literature Mining using Bayesian Networks ... 17
    Peter Antal and Andras Millinghoffer

Locally specified credal networks ... 25
    Alessandro Antonucci and Marco Zaffalon

A Bayesian Network Framework for the Construction of Virtual Agents with Human-like Behaviour ... 35
    Olav Bangsø, Nicolaj Søndberg-Madsen, and Finn V. Jensen

Loopy Propagation: the Convergence Error in Markov Networks ... 43
    Janneke H. Bolt

Preprocessing the MAP Problem ... 51
    Janneke H. Bolt and Linda C. van der Gaag

Quartet-Based Learning of Shallow Latent Variables ... 59
    Tao Chen and Nevin L. Zhang

Continuous Decision MTE Influence Diagrams ... 67
    Barry R. Cobb

Discriminative Scoring of Bayesian Network Classifiers: a Comparative Study ... 75
    Ad Feelders and Jevgenijs Ivanovs

The Independency tree model: a new approach for clustering and factorisation ... 83
    M. Julia Flores, José A. Gámez, and Serafín Moral

Learning the Tree Augmented Naive Bayes Classifier from incomplete datasets ... 91
    Olivier C.H. François and Philippe Leray

Lattices for Studying Monotonicity of Bayesian Networks ... 99
    Linda C. van der Gaag, Silja Renooij, and Petra L. Geenen

Multi-dimensional Bayesian Network Classifiers ... 107
    Linda C. van der Gaag and Peter R. de Waal

Dependency networks based classifiers: learning models by using independence tests ... 115
    José A. Gámez, Juan L. Mateo, and José M. Puerta

Unsupervised naive Bayes for data clustering with mixtures of truncated exponentials ... 123
    José A. Gámez, Rafael Rumí, and Antonio Salmerón

Selecting Strategies for Infinite-Horizon Dynamic LIMIDS ... 131
    Marcel A. J. van Gerven and Francisco J. Díez

Sensitivity analysis of extreme inaccuracies in Gaussian Bayesian Networks ... 139
    Miguel A. Gómez-Villegas, Paloma Maín, and Rosario Susi

Learning Bayesian Networks Structure using Markov Networks ... 147
    Christophe Gonzales and Nicolas Jouve

Estimation of linear, non-gaussian causal models in the presence of confounding latent variables ... 155
    Patrik O. Hoyer, Shohei Shimizu, and Antti J. Kerminen

Symmetric Causal Independence Models for Classification ... 163
    Rasa Jurgelenaite and Tom Heskes

Complexity Results for Enhanced Qualitative Probabilistic Networks ... 171
    Johan Kwisthout and Gerard Tel

Decision analysis with influence diagrams using Elvira's explanation facilities ... 179
    Manuel Luque and Francisco J. Díez

Dynamic importance sampling in Bayesian networks using factorisation of probability trees ... 187
    Irene Martínez, Carmelo Rodríguez, and Antonio Salmerón

Learning Semi-Markovian Causal Models using Experiments ... 195
    Stijn Meganck, Sam Maes, Philippe Leray, and Bernard Manderick

Geometry of rank tests ... 207
    Jason Morton, Lior Pachter, Anne Shiu, Bernd Sturmfels, and Oliver Wienand

An Empirical Study of Efficiency and Accuracy of Probabilistic Graphical Models ... 215
    Jens D. Nielsen and Manfred Jaeger

Adapting Bayes Network Structures to Non-stationary Domains ... 223
    Søren H. Nielsen and Thomas D. Nielsen

Diagnosing Lyme disease - Tailoring patient specific Bayesian networks for temporal reasoning ... 231
    Kristian G. Olesen, Ole K. Hejlesen, Ram Dessau, Ivan Beltoft, and Michael Trangeled

Predictive Maintenance using Dynamic Probabilistic Networks ... 239
    Demet Ozgur-Unluakin and Taner Bilgic

Reading Dependencies from the Minimal Undirected Independence Map of a Graphoid that Satisfies Weak Transitivity ... 247
    José M. Peña, Roland Nilsson, Johan Björkegren, and Jesper Tegnér

Evidence and Scenario Sensitivities in Naive Bayesian Classifiers ... 255
    Silja Renooij and Linda van der Gaag

Abstraction and Refinement for Solving Continuous Markov Decision Processes ... 263
    Alberto Reyes, Pablo Ibargüengoytia, L. Enrique Sucar, and Eduardo Morales

Bayesian Model Averaging of TAN Models for Clustering ... 271
    Guzmán Santafé, José A. Lozano, and Pedro Larrañaga

Dynamic Weighting A* Search-based MAP Algorithm for Bayesian Networks ... 279
    Xiaoxun Sun, Marek J. Druzdzel, and Changhe Yuan

A Short Note on Discrete Representability of Independence Models ... 287
    Petr Šimeček

Evaluating Causal effects using Chain Event Graphs ... 293
    Peter Thwaites and Jim Smith

Severity of Local Maxima for the EM Algorithm: Experiences with Hierarchical Latent Class Models ... 301
    Yi Wang and Nevin L. Zhang

Optimal Design with Design Networks ... 309
    Yang Xiang

Hybrid Loopy Belief Propagation ... 317
    Changhe Yuan and Marek J. Druzdzel

Probabilistic Independence of Causal Influences ... 325
    Adam Zagorecki and Marek J. Druzdzel


Some Variations on the PC Algorithm

J. Abellán, M. Gómez-Olmedo, and S. Moral
Department of Computer Science and Artificial Intelligence
University of Granada
18071 Granada, Spain

Abstract

This paper proposes some possible modifications of the basic PC learning algorithm and reports some experiments studying their behaviour. The variations are: to determine minimum-size cut sets between two nodes when studying the deletion of a link; to make the statistical decisions using a Bayesian score instead of a classical chi-square test; to study the refinement of the learned network by a greedy optimization of a Bayesian score; and to solve link ambiguities by taking into account a measure of their strength. It will be shown that some of these modifications can improve the performance of PC, depending on the objective of the learning task: discovering the causal structure or approximating the joint probability distribution of the problem variables.

1 Introduction

There are two main approaches to learning Bayesian networks from data. One is based on scoring and searching (Cooper and Herskovits, 1992; Heckerman, 1995; Buntine, 1991). Its main idea is to define a global measure (score) which evaluates a given Bayesian network model as a function of the data. The problem is solved by searching the space of possible Bayesian network models, trying to find the network with optimal score. The other approach (constraint-based learning) is based on carrying out several independence tests on the database and building a Bayesian network in agreement with the test results. The main example of this approach is the PC algorithm (Spirtes et al., 1993). It can be applied to any source providing information about whether a given conditional independence relation is verified.

In the past years, searching and scoring procedures have received more attention, due to some clear advantages (Heckerman et al., 1999). One is that constraint-based learning makes categorical decisions from the very beginning. These decisions are based on statistical tests that may be erroneous, and these errors will affect all of the algorithm's subsequent behaviour. Another one is that scoring and searching procedures allow very different models to be compared by a score that can be interpreted as the probability of being the true model. As a consequence, we can also follow a Bayesian approach, considering several alternative models, each one of them with its corresponding probability, and using them to determine posterior decisions (model averaging). Finally, in scoring and searching approaches different combinatorial optimization techniques (de Campos et al., 2002; Blanco et al., 2003) can be applied to maximize the evaluation of the learned network. On the other hand, the PC algorithm has some advantages. One of them is that it has an intuitive basis and, under some ideal conditions, it is guaranteed to recover a graph equivalent to the one that is a true model for the data. It can be considered a smart selection and ordering of the questions that have to be asked in order to recover a causal structure.

The basic point of this paper is that the PC algorithm provides a set of strategies that can be combined with other ideas to produce good learning algorithms adapted to different situations. An example of this is the proposal of van Dijk et al. (2003) to combine the order 0 and 1 tests of the PC algorithm with a scoring and searching procedure. Here, we propose several variations on the original PC algorithm. The first is a generalization of the necessary path condition (Steck and Tresp, 1999); the second is to replace the statistical tests for independence with decisions based on a Bayesian score; the third is to allow the possibility of refining the network learned with PC by applying a greedy optimization of a Bayesian score; and the last proposal is to delete edges from triangles in the graph following an order given by a Bayesian score (removing weaker edges first). We will show the intuitive basis for all of them, and we will report experiments showing their performance when learning the Alarm network (Beinlich et al., 1989). The quality of the learned networks will be measured by the number of missing and added links and by the Kullback-Leibler distance from the probability distribution associated with the learned network to the original one.

The paper is organized as follows: Section 2 describes the fundamentals of the PC algorithm; Section 3 introduces the four variations of the PC algorithm; in Section 4 the results of the experiments are reported and discussed; Section 5 is devoted to the conclusions.

2 The PC Algorithm

Assume that we have a set of variables X = (X_1, ..., X_n) with a global probability distribution P over them. By an uppercase bold letter A we will represent a subset of the variables of X. By I(A,B|C) we will denote that the sets A and B are conditionally independent given C.

The PC algorithm assumes faithfulness. This means that there is a directed acyclic graph, G, such that the independence relationships among the variables in X are exactly those represented by G by means of the d-separation criterion (Pearl, 1988). The PC algorithm is based on the existence of a procedure which is able to say whether I(A,B|C) is verified in the graph G. It first tries to find the skeleton (the underlying undirected graph) and in a posterior step orients the edges. Our variations are mainly applied to the first part (determining the skeleton), so we shall describe it in some detail:

1. Start with a complete undirected graph G'
2. i = 0
3. Repeat
4.   For each X ∈ X
5.     For each Y ∈ ADJ_X
6.       Test whether there exists S ⊆ ADJ_X − {Y} with |S| = i and I(X,Y|S)
7.       If this set exists
8.         Set S_XY = S
9.         Remove the X − Y link from G'
10.  i = i + 1
11. Until |ADJ_X| ≤ i for all X

In this algorithm, ADJ_X is the set of nodes adjacent to X in graph G'. The basis is that if the set of independencies is faithful to a graph, then there is no link between X and Y if and only if there is a subset S of the adjacent nodes of X such that I(X,Y|S). For each pair of variables, S_XY will contain such a set if one is found. This set will be used in the posterior orientation stage.
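As a minimal illustration, the skeleton loop above can be sketched in Python. The independence test is passed in as a callable, so it can be either a statistical test or, as in this hypothetical toy example, an oracle faithful to a known graph (the function and variable names below are ours, not the paper's):

```python
from itertools import combinations

def pc_skeleton(nodes, indep):
    """Skeleton phase of the PC algorithm.

    nodes: list of variable names.
    indep: callable indep(x, y, s) that returns True when I(x, y | s) holds.
    Returns the remaining links and the separating sets S_XY.
    """
    adj = {x: set(nodes) - {x} for x in nodes}   # start from the complete graph
    sepset = {}
    i = 0
    while any(len(adj[x]) > i for x in nodes):
        for x in nodes:
            for y in sorted(adj[x]):
                # test every S subset of ADJ_X - {Y} with |S| = i
                for s in combinations(sorted(adj[x] - {y}), i):
                    if indep(x, y, set(s)):
                        sepset[frozenset((x, y))] = set(s)  # kept for orientation
                        adj[x].discard(y)
                        adj[y].discard(x)
                        break
        i += 1
    links = {frozenset((x, y)) for x in nodes for y in adj[x]}
    return links, sepset

# Toy oracle faithful to the chain X - Z - Y: the only independence is I(X, Y | {Z}).
def oracle(x, y, s):
    return {x, y} == {"X", "Y"} and "Z" in s

links, sepset = pc_skeleton(["X", "Y", "Z"], oracle)
# The X - Y link is removed with separating set {Z}; X - Z and Z - Y remain.
```

With the oracle above, the order-1 test I(X,Y|{Z}) fires, so the returned skeleton contains exactly the two chain links.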

The orientation step proceeds by looking for sets of three variables X, Y, Z such that the edges X − Z and Y − Z are in the graph but not the edge X − Y. Then, if Z ∉ S_XY, it orients the edges from X to Z and from Y to Z, creating a v-structure: X → Z ← Y. Once these orientations are done, it tries to orient the rest of the edges following two basic principles: not to create cycles and not to create new v-structures. It is possible that the orientation of some of the edges has to be selected arbitrarily.

If the set of independencies is faithful to a graph and we have a perfect way of determining whether I(X,Y|S), then the algorithm is guaranteed to produce a graph equivalent to the original one (representing the same set of independencies).

However, in practice neither of these conditions is verified. Independencies are decided in the light of statistical independence tests based on a set of data D. The usual way of doing these tests is by means of a chi-square test based on the cross entropy statistic measured in the sample (Spirtes et al., 1993). Statistical tests make errors, and so, even if the faithfulness hypothesis is verified, it is possible that we do not recover the original graph. The number of errors of statistical tests increases when the sample is small or the cardinality of the conditioning set S is large (Spirtes et al., 1993, p. 116). In both cases, due to the nature of frequentist statistical tests, there is a tendency to always decide independence (Cohen, 1988). This is one reason for doing the statistical tests in increasing order of the cardinality of the sets on which we are conditioning.

Apart from not recovering the original graph, other effects are possible, such as finding cycles when orienting v-structures. In our implementation, we have always avoided cycles by reversing the arrows when necessary.

3 The Variations

3.1 Necessary Path Condition

In the PC algorithm it is possible to delete the link between X and Y by testing the independence I(X,Y|S) when S is a set containing nodes that do not appear in any (cycle-free) path from X to Y. The inclusion of these nodes is not theoretically wrong, but statistical tests make more errors when the size of the conditioning set increases, so it can be a source of problems in practice. For this reason, Steck and Tresp (1999) proposed to reduce ADJ_X − {Y} in Step 6 by removing all the nodes that are not on a path from X to Y. In this paper, we go a step further by considering any subset CUT_{X,Y} disconnecting X and Y in the graph in which the link X − Y has been deleted, playing the role of ADJ_X − {Y}. Suppose that, in the skeleton, we want to see whether the link X − Y can be deleted; we first remove it, and if the situation is the one in Figure 1, we could consider CUT_{X,Y} = {Z}. However, the actual algorithm (even with the necessary path condition) considers the set ADJ_X − {Y}, which is larger, and therefore has an increased possibility of error.

Our proposal is to apply the PC algorithm, but considering in Step 6 a cut set of minimum size in the graph without the X − Y link, as Acid and de Campos (2001) did in a different context. The computation of this set needs some extra time, but it can be done in polynomial time with a modification of the Ford-Fulkerson algorithm (Acid and de Campos, 1996).

Figure 1: A small cut set (the chain X — Z — Y).
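To illustrate the idea, a minimum-size cut set can be found by brute force, checking candidate separating sets in increasing size with a BFS connectivity test. This exponential sketch is only for clarity; the paper's actual proposal uses the polynomial-time max-flow computation cited above. The example graph is hypothetical:

```python
from itertools import combinations

def connected(adj, x, y, removed):
    """True if y is reachable from x once the nodes in `removed` are deleted."""
    stack, seen = [x], {x}
    while stack:
        u = stack.pop()
        if u == y:
            return True
        for v in adj[u]:
            if v not in seen and v not in removed:
                seen.add(v)
                stack.append(v)
    return False

def minimum_cut_set(adj, x, y):
    """Smallest set of nodes separating x and y after removing the x-y link.

    Brute force over candidate sets in increasing size; the paper instead uses
    a polynomial-time modification of the Ford-Fulkerson algorithm.
    """
    adj = {u: set(vs) for u, vs in adj.items()}
    adj[x].discard(y)
    adj[y].discard(x)
    others = sorted(set(adj) - {x, y})
    for size in range(len(others) + 1):
        for cut in combinations(others, size):
            if not connected(adj, x, y, set(cut)):
                return set(cut)

# X and Y are linked directly and through A (both directly and via B): after
# removing X - Y, the single node A separates them, while ADJ_X - {Y} = {A, B}.
adj = {"X": {"A", "B", "Y"}, "A": {"X", "B", "Y"},
       "B": {"X", "A"}, "Y": {"X", "A"}}
# minimum_cut_set(adj, "X", "Y") -> {"A"}
```

The example shows the point of the variation: the minimum cut set {A} is strictly smaller than the adjacency set {A, B}, so the conditional independence test needs a smaller conditioning set.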

3.2 Bayesian Statistical Tests

The PC algorithm performs a chi-square statistical test to decide about independence. However, as shown by Moral (2004), statistical tests sometimes make too many errors. They try to keep the Type I error (deciding dependence when there is independence) constant at the significance level. However, if the sample is large enough this error can be made much lower by using a different decision procedure, without an important increase in the Type II error (deciding independence when there is dependence). Margaritis (2003) has proposed to make statistical tests of independence for continuous variables by using a Bayesian score after discretizing them. Previously, Cooper (1997) proposed a different independence test based on a Bayesian score, but only when conditioning on 0 or 1 variables. Here we propose to do all the statistical tests by using a Bayesian Dirichlet score¹ (Heckerman, 1995) with a global sample size s equal to 1.0. The test I(X,Y|S) is carried out by comparing the score of X with S as parents and the score of X with S ∪ {Y} as parents. If the former is larger than the latter, the variables are considered independent; in the other case, they are considered dependent. The score of X with a set of parents Pa(X) = Z is the logarithm of:

∏_z ( Γ(s′) / Γ(N_z + s′) · ∏_x Γ(N_{z,x} + s′′) / Γ(s′′) )

where N_z is the number of occurrences of [Z = z] in the sample, N_{z,x} is the number of occurrences of [Z = z, X = x] in the sample, s′ is s divided by the number of possible values of Z, and s′′ is equal to s′ divided by the number of values of X.

¹We have chosen this score instead of the original K2 score (Cooper and Herskovits, 1992) because it is considered more correct from a theoretical point of view (Heckerman et al., 1995).
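The score and the resulting independence decision can be written directly from the formula above. This is a minimal sketch assuming discrete data given as a list of dicts; the helper names are ours, not the paper's:

```python
from collections import Counter
from math import lgamma

def bd_score(data, x, parents, s=1.0):
    """Log of the Bayesian Dirichlet score of X with the given parent set Z:

        log prod_z [ Gamma(s')/Gamma(N_z + s')
                     prod_x Gamma(N_{z,x} + s'')/Gamma(s'') ]

    with s' = s / #values(Z) and s'' = s' / #values(X).
    data: list of dicts mapping variable names to discrete values.
    """
    vals = {v: sorted({row[v] for row in data}) for v in (x, *parents)}
    n_conf = 1
    for p in parents:
        n_conf *= len(vals[p])
    s1 = s / n_conf               # s'
    s2 = s1 / len(vals[x])        # s''
    n_z = Counter(tuple(row[p] for p in parents) for row in data)
    n_zx = Counter((tuple(row[p] for p in parents), row[x]) for row in data)
    score = 0.0
    for z, n in n_z.items():      # configurations with N_z = 0 contribute 0
        score += lgamma(s1) - lgamma(n + s1)
        for xv in vals[x]:
            score += lgamma(n_zx[(z, xv)] + s2) - lgamma(s2)
    return score

def is_independent(data, x, y, s_set, s=1.0):
    """Decide I(X, Y | S): independence is accepted when X with parents S
    scores at least as high as X with parents S plus {Y}."""
    return bd_score(data, x, tuple(s_set), s) >= bd_score(data, x, tuple(s_set) + (y,), s)

# Balanced data in which X and Y are independent ...
indep_data = [{"X": x, "Y": y} for x in (0, 1) for y in (0, 1) for _ in range(10)]
# ... and data in which Y is a noiseless copy of X.
dep_data = [{"X": v, "Y": v} for v in (0, 1) for _ in range(20)]
# is_independent(indep_data, "X", "Y", ()) -> True
# is_independent(dep_data, "X", "Y", ()) -> False
```

The marginal-likelihood comparison automatically penalizes the extra parameters of the larger parent set, which is what replaces the significance level of the chi-square test.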

3.3 Refinement

If the statistical tests do not make errors and the faithfulness hypothesis is verified, then the PC algorithm will recover a graph equivalent to the original one, but this can never be assured with finite samples. Also, even if we recover the original graph, when our objective is to approximate the joint distribution of all the variables, then, depending on the sample size, it can be more convenient to use a simpler graph than the true one. Imagine that the variables follow the graph of Figure 2. This graph can be recovered by the PC algorithm by doing only statistical independence tests of order 0 and 1 (conditioning on none or one variable). However, when we estimate the parameters of the network, we have to estimate a high number of probability values. This can be a too complex model (too many parameters) if the database is not large enough. In this situation, it can be reasonable to try to refine this network, taking into account the actual orientation and the size of the model. In this sense, the result of the PC algorithm can be used as a starting point for a greedy search algorithm optimizing a concrete metric.

In particular, our proposal is based on the following steps:

1. Obtain an order compatible with the graph learned by the PC algorithm.

2. For each node, try to delete each one of its parents, or to add one of the non-parent preceding nodes as a parent, measuring the resulting Bayesian Dirichlet score. We make the movement with the highest score difference, as long as this difference is positive.

Figure 2: A too complex network.

Figure 3: A simple network (over the nodes X, Y, and Z).
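The two steps above amount to a greedy hill-climbing loop over parent sets. A minimal sketch follows, with the Bayesian Dirichlet score of Section 3.2 reproduced so that the snippet runs on its own; the data and names are hypothetical illustrations, not the paper's experimental setup:

```python
from collections import Counter
from math import lgamma

def bd_score(data, x, parents, s=1.0):
    # Bayesian Dirichlet score of Section 3.2, repeated so this sketch is self-contained.
    vals = {v: sorted({row[v] for row in data}) for v in (x, *parents)}
    n_conf = 1
    for p in parents:
        n_conf *= len(vals[p])
    s1 = s / n_conf
    s2 = s1 / len(vals[x])
    n_z = Counter(tuple(row[p] for p in parents) for row in data)
    n_zx = Counter((tuple(row[p] for p in parents), row[x]) for row in data)
    score = 0.0
    for z, n in n_z.items():
        score += lgamma(s1) - lgamma(n + s1)
        for xv in vals[x]:
            score += lgamma(n_zx[(z, xv)] + s2) - lgamma(s2)
    return score

def refine(data, order, parents, s=1.0):
    """Greedy refinement: repeatedly apply the single parent deletion, or parent
    addition restricted to nodes preceding the child in `order`, with the
    highest positive score gain; stop when no move has positive gain."""
    parents = {v: set(ps) for v, ps in parents.items()}
    while True:
        best_gain, best_move = 0.0, None
        for i, v in enumerate(order):
            base = bd_score(data, v, tuple(sorted(parents[v])), s)
            moves = [parents[v] - {p} for p in parents[v]]
            moves += [parents[v] | {p} for p in order[:i] if p not in parents[v]]
            for cand in moves:
                gain = bd_score(data, v, tuple(sorted(cand)), s) - base
                if gain > best_gain:
                    best_gain, best_move = gain, (v, cand)
        if best_move is None:
            return parents
        v, cand = best_move
        parents[v] = cand

# B is a noiseless copy of A; starting from an empty graph with order A < B,
# refinement adds A as a parent of B and then stops.
data = [{"A": v, "B": v} for v in (0, 1) for _ in range(20)]
# refine(data, ["A", "B"], {"A": set(), "B": set()}) -> {"A": set(), "B": {"A"}}
```

Fixing a node order compatible with the PC result keeps the search space acyclic by construction, which is why only preceding nodes are considered as new parents.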

Refinement can also solve some of the problems associated with the non-verification of the faithfulness hypothesis. Assume, for example, that we have a problem with 3 variables, X, Y, Z, and that the set of independencies is given by the independencies of the graph in Figure 3 plus the independence I(Y,Z|∅). The PC algorithm will estimate a network where the link between Y and Z is lost. Even if the sample is large, we will estimate a too simple network which is not an I-map (Pearl, 1988). If we orient the link X → Z in the PC algorithm, refinement can produce the network in Figure 3, by checking that the Bayesian score is increased (as should be the case if I(Z,Y|X) is not verified).

The idea of refining a learned Bayesian network by means of a greedy optimization of a Bayesian score has been used in a different context by Dash and Druzdzel (1999).

3.4 Triangle Resolution

Imagine that we have 3 variables, X, Y, Z, and that no independence relationship involving them is verified: each pair of variables is dependent, and conditionally dependent given the third one. As there is a tendency to decide for independence when the size of the conditioning set is larger, it is possible that all order-0 tests produce dependence, but when we test the independence of two variables given a third one, we obtain independence. In this situation, the result of the PC algorithm will depend on the order in which the tests are carried out. For example, if we first test the independence I(X,Y|Z), then the link X − Y is deleted, but not the other two links, which will be oriented in a posterior step without creating a v-structure. If we first test I(X,Z|Y), then the deleted link will be X − Z, but not the other two.

It seems reasonable that if one of the links is going to be removed, we should choose the weakest one. In this paper, for each 3 variables forming a triangle (the graph contains all 3 links) after the order-0 tests, we measure the strength of the link X − Y as the Bayesian score of X with Y, Z as parents minus the Bayesian score of X with Z as parent. For each triangle we delete the link with the lowest strength (if this value is lower than 0). This is done as an intermediate step between the order-0 and order-1 conditional independence tests.
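This link-strength computation can be sketched directly on top of the score of Section 3.2 (reproduced so the snippet is self-contained); the data below are counts we constructed to mimic a noisy chain X → Z → Y, a hypothetical example rather than anything from the paper's experiments:

```python
from collections import Counter
from math import lgamma

def bd_score(data, x, parents, s=1.0):
    # Bayesian Dirichlet score of Section 3.2, repeated so this sketch is self-contained.
    vals = {v: sorted({row[v] for row in data}) for v in (x, *parents)}
    n_conf = 1
    for p in parents:
        n_conf *= len(vals[p])
    s1 = s / n_conf
    s2 = s1 / len(vals[x])
    n_z = Counter(tuple(row[p] for p in parents) for row in data)
    n_zx = Counter((tuple(row[p] for p in parents), row[x]) for row in data)
    score = 0.0
    for z, n in n_z.items():
        score += lgamma(s1) - lgamma(n + s1)
        for xv in vals[x]:
            score += lgamma(n_zx[(z, xv)] + s2) - lgamma(s2)
    return score

def strength(data, x, y, z, s=1.0):
    """Strength of the link X - Y in the triangle {X, Y, Z}: the score of X
    with parents {Y, Z} minus the score of X with parent {Z} alone."""
    return bd_score(data, x, (y, z), s) - bd_score(data, x, (z,), s)

def resolve_triangle(data, x, y, z, s=1.0):
    """Return the weakest link of the triangle (the one to delete) if its
    strength is negative, and None otherwise."""
    strengths = {
        frozenset((x, y)): strength(data, x, y, z, s),
        frozenset((x, z)): strength(data, x, z, y, s),
        frozenset((y, z)): strength(data, y, z, x, s),
    }
    weakest = min(strengths, key=strengths.get)
    return weakest if strengths[weakest] < 0 else None

# Counts imitating the chain X -> Z -> Y (each link flips with probability 0.2):
# all three pairs are marginally dependent, but X and Y are independent given Z,
# so X - Y is the weakest link and is the one removed.
counts = {(0, 0, 0): 32, (0, 0, 1): 8, (0, 1, 1): 8, (0, 1, 0): 2,
          (1, 1, 1): 32, (1, 1, 0): 8, (1, 0, 0): 8, (1, 0, 1): 2}
data = [{"X": x, "Z": z, "Y": y}
        for (x, z, y), n in counts.items() for _ in range(n)]
# resolve_triangle(data, "X", "Y", "Z") -> frozenset({"X", "Y"})
```

Here the two genuine chain links get positive strengths while the spurious X − Y link gets a negative one, so the deletion no longer depends on the order in which the tests happen to be run.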

In this paper, this has been implemented only in the case in which the independence tests are based on a Bayesian score, but it could also be considered in the case of chi-square tests, by taking the strength of a link to be the p-value of the statistical independence test.

A deeper study of this type of interdependency between link deletions (the presence of a link depends on the absence of another one, and vice versa) has been carried out by Steck and Tresp (1999), but the resolution of these ambiguities is not addressed there. The Hugin system (Madsen et al., 2003) allows deciding between the different possibilities by asking the user. Our procedure could be extended to this more general setting, but at this stage the implementation has been limited to triangles, as this is, at the same time, the most usual and the simplest situation.

4 Experiments

We have carried out some experiments with the Alarm network (Beinlich et al., 1989) to test the PC variations. In all of them, we have started with the original network and generated samples of different sizes by logic sampling. Then, we have tried to recover the original network from the samples by using the different variations of the PC algorithm, including the orientation step. We have considered the following measures of error in this process: the number of missing links, the number of added links, and the Kullback-Leibler distance (Kullback, 1968) of the learned probability distribution to the original one². The Kullback-Leibler distance is a more appropriate measure of error when the objective is to approximate the joint probability distribution of all the variables, while the number of link differences is more appropriate when our objective is to recover the causal structure of the problem. We do not consider the number of wrong orientations, as our variations are mainly focused on the skeleton discovery phase of the PC algorithm. The number of added or deleted links depends only on the first part of the learning algorithm (selection of the skeleton).

The experiments have been carried out in the Elvira environment (Elvira Consortium, 2002), where a local computation of the Kullback-Leibler distance is implemented. The different sample sizes we have used are 100, 500, 1000, 5000, and 10000, and for each sample size we have repeated the experiment 100 times. The combinations of algorithms we have tested are the following:

Alg1 This is the algorithm with minimal separating sets, score-based tests, no refinement, and triangle resolution.

Alg2 Algorithm with minimal separating sets, score-based tests, refinement, and triangle resolution.

Alg3 Algorithm with adjacent nodes as separating sets, score-based tests, no refinement, and triangle resolution.

Alg4 Algorithm with minimal separating sets, chi-square tests, no refinement, and no resolution of triangles.

²The parameters are estimated with a Bayesian Dirichlet approach with a global sample size of 2.
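The footnote's estimation rule can be sketched per conditional distribution. This is a simplified view under assumptions: the hypothetical `bd_estimate` below smooths the counts of one multinomial with a symmetric Dirichlet prior of global sample size s = 2; in a full network that prior mass is further divided among the parent configurations of each variable.

```python
def bd_estimate(counts, s=2.0):
    """Posterior mean under a symmetric Dirichlet prior:
    (N_k + s/r) / (N + s) for a variable with r states."""
    r = len(counts)
    n = sum(counts)
    return [(c + s / r) / (n + s) for c in counts]

# A binary variable observed 7 times in state 0 and 3 times in state 1:
estimate = bd_estimate([7, 3])   # (7 + 1) / 12 and (3 + 1) / 12
```

With no data at all the rule falls back to the uniform distribution, which is what makes the learned network's parameters well defined even for rare parent configurations.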

Some Variations on the PC Algorithm 5


        100     500     1000    5000    10000
Alg1    17.94    8.76    6.02    3.25    2.56
Alg2    16.29    8.16    5.59    3.7     3.28
Alg3    26.54   18.47   14.87    8.18    7.08
Alg4    29.07   11.49    8.42    3.53    2.14
Alg5    17.87    8.83    6.08    3.03    2.57

Table 1: Average number of missing links

        100     500     1000    5000    10000
Alg1    10.42    4.96    2.87    2.2     1.98
Alg2    26.13   17.52   16.32   16.01   15.38
Alg3     3.06    0.63    0.14    0.03    0.01
Alg4    14.73    5.96    5.68    4.78    4.79
Alg5    10.46    4.99    3.21    1.92    1.9

Table 2: Average number of added links

Alg5 This is the algorithm with minimal separating sets, score-based tests, no refinement, and no resolution of triangles.

These combinations were designed so that, taking Alg1 as our basic algorithm for recovering the graph structure, we can study the effect of applying each of the variations to it.

Table 1 contains the average number of missing links, Table 2 the average number of added links, Table 3 the average Kullback-Leibler distance, and finally Table 4 the average running times of the different algorithms. From these results we highlight the following facts:

Refinement (Alg2) increases the number of errors in recovering the causal structure (mainly through more added links), but decreases the Kullback-Leibler distance to the original distribution. So its application will depend on our objective: approximating the joint distribution or recovering the causal structure. Refinement is fast and does not add a significant amount of extra time.

        100     500     1000    5000    10000
Alg1     4.15    2.46    1.81    0.98    0.99
Alg2     2.91    0.96    0.56    0.19    0.11
Alg3     4.98    3.27    2.58    1.11    0.77
Alg4     5.96    2.19    1.49    1.05    0.91
Alg5     4.15    2.36    1.86    1.11    0.98

Table 3: Average Kullback-Leibler distance

        100     500     1000    5000    10000
Alg1     2.16    4.1     5.88   21.98   42
Alg2     2.25    4.12    5.89   21.98   42.1
Alg3     0.33    1.56    3.34   20.73   44.14
Alg4     2.45    8.9    13.89   39.42   68.85
Alg5     2.21    4.15    6.02   22.95   44.4

Table 4: Average time

When comparing Alg1 with Alg3 (minimum-size cut sets vs. sets of adjacent nodes) we observe that with adjacent nodes fewer links are added and more are missing. The total number of errors favours Alg1 (minimum-size cut sets). This is due to the fact that Alg3 performs more conditional independence tests, with larger conditioning sets of variables, which makes it more likely to delete links. The Kullback-Leibler distance is better for Alg1 except for the largest sample size. A possible explanation is that with this large sample size we no longer miss any important link of the network, and added links can be more harmful than deleted ones (when we delete links we are averaging distributions). With smaller sample sizes, Alg3 had a worse Kullback-Leibler distance, as it can miss some important links. We have no explanation for the fact that Alg1 does not improve the Kullback-Leibler distance when increasing the sample size from 5000 to 10000. When comparing the times of both algorithms, we see that Alg1 needs more time (to compute minimum-size cut sets); however, when the sample size is large this extra time is compensated by the lower number of statistical tests, the total time for size 10000 being lower for Alg1 (with minimum-size cut sets).

6 J. Abellán, M. Gómez-Olmedo, and S. Moral

When comparing Alg1 and Alg4 (score-based tests and triangle resolution vs. chi-square tests and no triangle resolution) we observe that Alg4 always adds more links and misses more links (except for the largest sample size). The total number of errors is lower for Alg1. It is meaningful that the number of added links does not decrease when going from a sample of 5000 to a sample of 10000. This is due to the fact that the probability of concluding dependence when there is independence is fixed (the significance level) for large sample sizes. So all the extra information in the larger sample is devoted to decreasing the number of missing links (2.14 for Alg4 against 2.56 for Alg1), while the difference in added links is 4.79 for Alg4 against 1.98 for Alg1. Thus the small decrease in missing links comes at the cost of a more important error in the number of added links. Bayesian score tests are more balanced between the two types of errors. When considering the Kullback-Leibler distance, we observe the same situation as when comparing Alg1 and Alg2: a greater number of errors in the structure does not always imply a greater Kullback-Leibler distance. The time is always greater for Alg4.
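The fixed type-I error discussed here is a property of any fixed-significance chi-square test. A hand-rolled sketch on a 2x2 contingency table (the counts are illustrative; 3.841 is the standard 0.05 critical value for one degree of freedom, not a value from the paper's experiments):

```python
def chi2_statistic(table):
    """Pearson chi-square statistic for a two-way contingency table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i in range(len(rows)):
        for j in range(len(cols)):
            expected = rows[i] * cols[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

CRIT_1DF_05 = 3.841          # chi-square critical value, df = 1, alpha = 0.05

table = [[30, 20], [15, 35]]             # joint counts of two binary variables
keep_link = chi2_statistic(table) > CRIT_1DF_05   # independence rejected?
```

Whatever the sample size, a true independence is wrongly rejected with probability close to alpha, which is why the number of added links stops decreasing; the implicit penalty of a Bayesian score grows with the sample size instead.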

The differences between Alg1 and Alg4 are not due to the triangle resolution in Alg1. As we will see below, triangle resolution does not really imply important changes in Alg1's performance. In fact, the effect of chi-square tests against Bayesian tests, without any other additional factor, can be seen by comparing Alg5 and Alg4; in this case, we observe the same differences as when comparing Alg1 and Alg4.

When comparing Alg1 and Alg5 (no resolution of triangles) we see that there are no important differences in performance (errors and time) when triangle resolution is applied. It seems that the total number of errors decreases for intermediate sample sizes (500-1000) and that there are no important differences for the other sample sizes, but more experiments are necessary. Triangle resolution does not really add meaningful extra time: applying this step takes some time, but the graph is simplified and later steps can be faster.

5 Conclusions

In this paper we have proposed four variations of the PC algorithm and we have tested them on learning the Alarm network. Our final recommendation would be to use the PC algorithm with score-based tests, minimum-size cut sets, and triangle resolution. The application of the refinement step would depend on the final aim: if we want to learn the causal structure, then refinement should not be applied, but if we want to approximate the joint probability distribution, then it should. We recognize that more extensive experiments are necessary to evaluate these modifications, especially triangle resolution. But we feel that this modification is intuitively well supported, and that it could play a more important role in other situations, especially if the faithfulness hypothesis is not verified.

Other combinations could be appropriate if the objective is different; for example, if we want to minimize the number of added links, then Alg3 (with adjacent nodes as cut sets) could be considered.

In the future we plan to carry out more extensive experiments, testing different networks and different combinations of these modifications. At the same time, we will consider other possible variations, for example an algorithm mixing the skeleton and orientation steps: it is possible that some of the independencies are tested conditional on sets that, after orientation, no longer separate the two nodes. We also plan to study alternative scores and the effect of using different sample sizes. Partial orientations can also help to make the separating sets even smaller, as there can be paths that are not active without observations; this can make the algorithms faster and more accurate. Finally, we think that the use of PC and its variations as starting points for greedy search algorithms needs further research effort.

Acknowledgments

This work has been supported by the Spanish Ministry of Science and Technology under the Algra project (TIN2004-06204-C03-02).

References

S. Acid and L.M. de Campos. 1996. Finding minimum d-separating sets in belief networks. In Proceedings of the Twelfth Annual Conference on Uncertainty in Artificial Intelligence (UAI-96), pages 3-10, Portland, Oregon.



S. Acid and L.M. de Campos. 2001. A hybrid methodology for learning belief networks: BENEDICT. International Journal of Approximate Reasoning, 27:235-262.

I.A. Beinlich, H.J. Suermondt, R.M. Chavez, and G.F. Cooper. 1989. The ALARM monitoring system: A case study with two probabilistic inference techniques for belief networks. In Proceedings of the Second European Conference on Artificial Intelligence in Medicine, pages 247-256. Springer-Verlag.

R. Blanco, I. Inza, and P. Larrañaga. 2003. Learning bayesian networks in the space of structures by estimation of distribution algorithms. International Journal of Intelligent Systems, 18:205-220.

W. Buntine. 1991. Theory refinement in bayesian networks. In Proceedings of the Seventh Conference on Uncertainty in Artificial Intelligence, pages 52-60. Morgan Kaufmann, San Francisco, CA.

J. Cohen. 1988. Statistical power analysis for the behavioral sciences (2nd edition). Erlbaum, Hillsdale, NJ.

Elvira Consortium. 2002. Elvira: An environment for probabilistic graphical models. In J.A. Gámez and A. Salmerón, editors, Proceedings of the 1st European Workshop on Probabilistic Graphical Models, pages 222-230.

G.F. Cooper and E.A. Herskovits. 1992. A bayesian method for the induction of probabilistic networks from data. Machine Learning, 9:309-347.

G.F. Cooper. 1997. A simple constraint-based algorithm for efficiently mining observational databases for causal relationships. Data Mining and Knowledge Discovery, 1:203-224.

D. Dash and M.J. Druzdzel. 1999. A hybrid anytime algorithm for the construction of causal models from sparse data. In Proceedings of the Fifteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-99), pages 142-149. Morgan Kaufmann.

L.M. de Campos, J.M. Fernández-Luna, J.A. Gámez, and J.M. Puerta. 2002. Ant colony optimization for learning bayesian networks. International Journal of Approximate Reasoning, 31:511-549.

D. Heckerman, D. Geiger, and D.M. Chickering. 1995. Learning bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20:197-243.

D. Heckerman, C. Meek, and G. Cooper. 1999. A bayesian approach to causal discovery. In C. Glymour and G.F. Cooper, editors, Computation, Causation, and Discovery, pages 141-165. AAAI Press.

D. Heckerman. 1995. A tutorial on learning with Bayesian networks. Technical Report MSR-TR-95-06, Microsoft Research.

S. Kullback. 1968. Information Theory and Statistics. Dover, New York.

A.L. Madsen, M. Land, U.F. Kjærulff, and F. Jensen. 2003. The Hugin tool for learning networks. In T.D. Nielsen and N.L. Zhang, editors, Proceedings of ECSQARU 2003, pages 594-605. Springer-Verlag.

D. Margaritis. 2003. Learning Bayesian Network Model Structure from Data. Ph.D. thesis, School of Computer Science, Carnegie Mellon University.

S. Moral. 2004. An empirical comparison of score measures for independence. In Proceedings of the Tenth International Conference IPMU 2004, Vol. 2, pages 1307-1314.

J. Pearl. 1988. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Mateo.

P. Spirtes, C. Glymour, and R. Scheines. 1993. Causation, Prediction and Search. Springer-Verlag, Berlin.

H. Steck and V. Tresp. 1999. Bayesian belief networks for data mining. In Proceedings of the 2nd Workshop on Data Mining und Data Warehousing als Grundlage moderner entscheidungsunterstuetzender Systeme, pages 145-154.

S. van Dijk, L.C. van der Gaag, and D. Thierens. 2003. A skeleton-based approach to learning bayesian networks from data. In Nada Lavrac, Dragan Gamberger, Ljupco Todorovski, and Hendrik Blockeel, editors, Proceedings of the Seventh Conference on Principles and Practice of Knowledge Discovery in Databases, pages 132-143. Springer-Verlag.



Learning Complex Bayesian Network Features for Classification

Péter Antal, András Gézsi, Gábor Hullám and András Millinghoffer
Department of Measurement and Information Systems

Budapest University of Technology and Economics

Abstract

The increasing complexity of the models, the abundant electronic literature and the relative scarcity of the data make it necessary to use the Bayesian approach to complex queries based on prior knowledge and structural models. In this paper we discuss the probabilistic semantics of such statements, the computational challenges, and possible solutions for Bayesian inference over complex Bayesian network features, particularly over features relevant in conditional analysis. We introduce a special feature called the Markov Blanket Graph. Then we present an application of the ordering-based Monte Carlo method over Markov Blanket Graphs and Markov Blanket sets.

In the Bayesian approach to a structural feature F with values F(G) ∈ {fi}, i = 1, . . . , R, we are interested in the feature posterior induced by the model posterior given the observations DN, where G denotes the structure of the Bayesian network (BN):

p(fi|DN) = ∑_G 1(F(G) = fi) p(G|DN)    (1)
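Given any enumeration (or sample) of graph posteriors, Eq. (1) is a plain expectation of an indicator. A toy sketch, where the three-graph posterior is made up for illustration:

```python
# Hypothetical posterior over DAGs on {X, Y}, each DAG given by its edge set.
posterior = {
    frozenset(): 0.1,
    frozenset({("X", "Y")}): 0.6,
    frozenset({("Y", "X")}): 0.3,
}

def feature_posterior(feature, value):
    """p(F = value | D_N): sum p(G | D_N) over graphs G with F(G) = value."""
    return sum(p for g, p in posterior.items() if feature(g) == value)

# Structural feature: are X and Y adjacent (an edge in either direction)?
adjacent = lambda g: ("X", "Y") in g or ("Y", "X") in g
p_adj = feature_posterior(adjacent, True)   # 0.6 + 0.3
```

The paper's point is that this sum is intractable over real DAG spaces, which is what motivates the MCMC estimators discussed next.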

The importance of such inference results from (1) the frequently impractically high sample and computational complexity of the complete model, (2) a subsequent Bayesian decision-theoretic phase, (3) the availability of stochastic methods for estimating such posteriors, and (4) the focusedness of the data and the prior on certain aspects (e.g. through the pattern of missing values or better-understood parts of the model). Correspondingly, there is a general expectation that, for small amounts of data, some properties of complex models can be inferred with high confidence and relatively low computational cost while preserving a model-based foundation.

The irregularity of the posterior over the discrete model space of Directed Acyclic Graphs (DAGs) poses serious challenges when such feature posteriors are to be estimated. This motivated research on the application of Markov Chain Monte Carlo (MCMC) methods

for elementary features (Madigan et al., 1996; Friedman and Koller, 2000). This paper extends these results by investigating Bayesian inference about high-cardinality BN features relevant to classification. In Section 1 we present a unified view of BN features enriched with free-text annotations as a probabilistic knowledge base (pKB) and discuss the corresponding probabilistic semantics. In Section 2 we overview earlier approaches to feature learning. In Section 3 we discuss structural BN features and introduce a special feature called the Markov Blanket Graph or Mechanism Boundary Graph. Section 4 discusses its relevance in conditional modeling. In Section 5 we report an algorithm using ordering-based MCMC methods to perform inference over Markov Blanket Graphs and Markov Blanket sets. Section 6 presents results for the ovarian tumor domain.

1 BN features in pKBs

Probabilistic and causal interpretations of BNs ensure that structural features can express a wide range of relevant concepts based on conditional independence statements and causal assertions (Pearl, 1988; Pearl, 2000; Spirtes et al., 2001). To enrich this approach with subjective domain knowledge via free-text annotations, we introduce the concept of a Probabilistic Annotated Bayesian Network knowledge base.


Definition 1. A Probabilistic Annotated Bayesian Network knowledge base K for a fixed set V of discrete random variables is a first-order logical knowledge base including standard graph, string and BN-related predicates, relations and functions. Let Gw represent a target DAG structure including all the target random variables. K includes free-text descriptions for the subgraphs and for their subsets. We assume that the models M of the knowledge base vary only in Gw (i.e. there is a mapping G ↔ M) and that a distribution p(Gw|ξ) is available.

A sentence α is any well-formed first-order formula in K, the probability of which is defined as the expectation of its truth value:

E_{p(M|K)}[αM] = ∑_G αM(G) p(G|K),

where αM(G) denotes its truth value in the model M(G). This hybrid approach defines a distribution over models by combining a logical knowledge base with a probabilistic model. The logical knowledge base describes the certain knowledge in the domain, defining a set of models (legal worlds), while the probabilistic part (p(Gw|ξ)) expresses the uncertain knowledge over these worlds.

2 Earlier works

To avoid the statistical and computational burden of identifying complete models, related local algorithms for identifying causal relations were reported in (Silverstein et al., 2000) and in (Glymour and Cooper, 1999; Mani and Cooper, 2001). The majority of feature learning algorithms target the learning of relevant variables for the conditional modeling of a central variable, i.e. they target the so-called feature subset selection (FSS) problem (Kohavi and John, 1997). Examples are the Markov Blanket Approximating Algorithm (Koller and Sahami, 1996) and the Incremental Association Markov Blanket algorithm (Tsamardinos and Aliferis, 2003). The subgraphs of a BN as features were targeted in (Pe'er et al., 2001). The bootstrap approach, inducing confidence measures for features such as compelled edges, Markov blanket membership and pairwise precedence, was investigated in (Friedman et al., 1999).

In contrast, the Bayesian framework offers many advantages, such as the normative, model-based combination of prior and data, allowing unconstrained application in the small-sample region. Furthermore, the feature posteriors can be embedded in a probabilistic knowledge base, and they can be used to induce priors for other model spaces and for subsequent learning. Buntine (1991) proposed the concept of a posterior knowledge base conditioned on a given ordering for the analysis of BN models. Cooper and Herskovits (1992) discussed the general use of the posterior over BN structures to compute the posterior of arbitrary features. Madigan et al. (1996) proposed an MCMC scheme over the space of DAGs and orderings of the variables to approximate Bayesian inference. Heckerman et al. (1997) considered the application of the full Bayesian approach to causal BNs. Another MCMC scheme, the ordering-based MCMC method utilizing the ordering of the variables, was reported in (Friedman and Koller, 2000). They developed and used a closed form for the order-conditional posteriors of Markov blanket membership, besides the earlier closed form for parental sets.

3 BN features

The prevailing interpretation of BN feature learning assumes that the feature set is significantly simpler than the complete domain model, providing an overall characterization as marginals, and that the number of features and their values is tractable (e.g. linear or quadratic in the number of variables). Another interpretation is to identify high-scoring arbitrary subgraphs, parental sets, or Markov blanket subsets and estimate their posteriors. A set of simple features is a fragmentary representation of the distribution over the complete domain model from multiple, though simplified, aspects, whereas a given complex feature is a focused representation from a single, but complex, point of view. A feature F is complex if the number of its values is exponential in the number of domain variables. First we cite a central concept and a theorem about relevance for variables V = {X1, . . . , Xn} (Pearl, 1988).

Definition 2. A set of variables MB(Xi) is called a Markov Blanket of Xi w.r.t. the distribution P(V) if (Xi ⊥⊥ V \ MB(Xi) | MB(Xi)). A minimal Markov blanket is called a Markov boundary.

Theorem 1. If a distribution P(V) factorizes w.r.t. a DAG G, then

∀ i = 1, . . . , n : (Xi ⊥⊥ V \ bd(Xi, G) | bd(Xi, G))_P,

where bd(Xi, G) denotes the set of parents, children and the children's other parents of Xi.

So the set bd(Xi, G) is a Markov blanket of Xi, and we will also refer to bd(Xi, G) as a Markov blanket of Xi in G, using the notation MB(Xi, G) and implicitly assuming that P factorizes w.r.t. G.

The induced symmetric pairwise relation is Markov Blanket Membership MBM(Xi, Xj, G) w.r.t. G between Xi and Xj (Friedman et al., 1999):

MBM(Xi, Xj, G) ↔ Xj ∈ bd(Xi, G)    (2)
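Both bd(Xi, G) and the MBM relation of Eq. (2) can be read directly off a DAG stored as parent lists; a small sketch (the example graph is hypothetical):

```python
def markov_blanket(v, parents):
    """bd(v, G): parents of v, children of v, and the children's other parents."""
    children = [x for x, ps in parents.items() if v in ps]
    mb = set(parents[v]) | set(children)
    for c in children:
        mb |= set(parents[c])        # the children's other parents
    mb.discard(v)
    return mb

def mbm(x, y, parents):
    """Markov Blanket Membership (Eq. 2); symmetric in x and y."""
    return y in markov_blanket(x, parents)

# Hypothetical DAG: A -> Y, B -> Y, Y -> C, D -> C
parents = {"A": [], "B": [], "D": [], "Y": ["A", "B"], "C": ["Y", "D"]}
print(sorted(markov_blanket("Y", parents)))   # ['A', 'B', 'C', 'D']
```

Note that D enters the blanket of Y only as a co-parent of C, which is exactly the "children's other parents" clause of Theorem 1.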

Finally, we define the Markov Blanket Graph.

Definition 3. A subgraph of G is called the Markov Blanket Graph or Mechanism Boundary Graph MBG(Xi, G) of a variable Xi if it includes the nodes from MB(Xi, G) and the incoming edges into Xi and into its children.

It is easy to show that the characteristic property of the MBG feature is that it completely defines the distribution P(Y|V \ Y) through the local dependency models of Y and its children in a BN model G, in the case of point parameters (G, θ) and of parameter priors satisfying global parameter independence (Spiegelhalter and Lauritzen, 1990) and parameter modularity (Heckerman et al., 1995). This property offers two interpretations of the MBG feature. From a probabilistic point of view, the MBG(G) feature defines an equivalence relation over DAGs w.r.t. P(Y|V \ Y), but clearly the MBG feature is not a unique representative of a class of conditionally equivalent BNs. From a causal point of view, this feature uniquely represents the minimal set of mechanisms including Y. In short, under the conditions mentioned above, this structural feature of the causal BN domain model is necessary and sufficient to support the manual exploration and automated construction of a conditional dependency model.

There is no closed formula for the posterior p(MBG(Y, G)|DN), which excludes the direct use of the MBG space in optimization or in Monte Carlo methods. However, there exists a formula for the order-conditional posterior with polynomial time complexity if the size of the parental sets is bounded by k:

p(MBG(Y, G) = mbg | ≺, DN) =
    p(pa(Y, mbg)|DN)
    · ∏_{Y ≺ Xi, Y ∈ pa(Xi, mbg)} p(pa(Xi, mbg)|DN)
    · ∏_{Y ≺ Xi, Y ∉ pa(Xi, mbg)} ∑_{pa(Xi): Y ∉ pa(Xi)} p(pa(Xi)|DN).    (3)
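Assuming the per-variable parental-set posteriors p(pa(Xi)|DN) have been cached (the dictionaries below are made up), Eq. (3) multiplies one factor for pa(Y), one factor for each child of Y in the mbg, and one marginal factor for each later variable that is not a child:

```python
def mbg_posterior(y, order, mbg_pa, pa_post):
    """Order-conditional MBG posterior in the spirit of Eq. (3).

    mbg_pa maps y and each child of y (in the MBG) to its parental set;
    pa_post[v] maps candidate parental sets of v to p(pa(v) | D_N).
    """
    p = pa_post[y][mbg_pa[y]]                     # factor for pa(Y)
    for x in order[order.index(y) + 1:]:          # variables with Y before X
        if x in mbg_pa:                           # X is a child of Y in the MBG
            p *= pa_post[x][mbg_pa[x]]
        else:                                     # X is not a child of Y
            p *= sum(q for pa, q in pa_post[x].items() if y not in pa)
    return p

pa_post = {
    "Y": {frozenset(): 0.2, frozenset({"A"}): 0.8},
    "B": {frozenset(): 0.5, frozenset({"Y"}): 0.5},
    "C": {frozenset(): 0.9, frozenset({"Y"}): 0.1},
}
mbg = {"Y": frozenset({"A"}), "B": frozenset({"Y"})}    # C not a child of Y
p = mbg_posterior("Y", ["A", "Y", "B", "C"], mbg, pa_post)   # 0.8 * 0.5 * 0.9
```

The last branch is the summed "Y is not a parent of Xi" factor, which is what keeps the computation polynomial for bounded parental-set size.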

The cardinality of the MBG(Y) space is still super-exponential (even if the number of parents is bounded by k). Consider an ordering of the variables in which Y is first and all the other variables are children of it; then the parental sets can be selected independently, so the number of alternatives is of the order (n − 1)^(n²) (or (n − 1)^((k−1)(n−1))). However, if Y is last in the ordering, then the number of alternatives is of the order 2^(n−1) (or (n − 1)^k). In the case of MBG(Y, G), a variable Xi can be (1) non-occurring in the MBG, (2) a parent of Y (Xi ∈ pa(Y, G)), (3) a child of Y (Xi ∈ ch(Y, G)), or (4) a (pure) other parent (Xi ∉ pa(Y, G) ∧ Xi ∈ pa(ch(Y)_j, G)). These types correspond to the irrelevant (1) and strongly relevant (2, 3, 4) categories (see Def. 4). The number of DAG models G(n) compatible with a given MBG and ordering ≺ can be computed from the contribution of the variables Xi ≺ Y, which are unconstrained, and the contribution of the variables Y ≺ Xi that are not children of Y, which is still 2^(O(k n log n)) (note that certain sparse graphs are compatible with many orderings).

4 Features in conditional modeling

In the conditional Bayesian approach, the relevance of predictor variables (features in this context) can be defined in an asymptotic, algorithm-, model- and loss-free way as follows.

Definition 4. A feature Xi is strongly relevant iff there exist some xi, y and si = {x1, . . . , xi−1, xi+1, . . . , xn} with p(xi, si) > 0 such that p(y|xi, si) ≠ p(y|si). A feature Xi is weakly relevant iff it is not strongly relevant, and there exists a subset of features S′i of Si for which there exist some xi, y and s′i with p(xi, s′i) > 0 such that p(y|xi, s′i) ≠ p(y|s′i).

A feature is relevant if it is either weakly or strongly relevant; otherwise it is irrelevant (Kohavi and John, 1997).

In the so-called filter approach to feature selection we have to select a minimal subset X′ that fully determines the conditional distribution of the target (p(Y|X) = p(Y|X′)). If conditional modeling is not applicable and a domain model-based approach is necessary, then the Markov boundary property (feature) seems an ideal candidate for identifying relevance. The following theorem gives a sufficient condition for uniqueness and minimality (Tsamardinos and Aliferis, 2003).

Theorem 2. If the distribution P is stable w.r.t. the DAG G, then the variables bd(Y, G) form a unique and minimal Markov blanket of Y, MB(Y). Furthermore, Xi ∈ MB(Y) iff Xi is strongly relevant.

However, the MBG feature provides a more detailed description of relevancies. As an example, consider that a logistic regression (LR) model without interaction terms and a Naive BN model can be made conditionally equivalent using a local and transparent parameter transformation. If the distribution contains additional dependencies, then the induced conditional distribution has to be represented by an LR model with interaction terms.
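The conditional equivalence mentioned here is the classical Naive BN / logistic regression correspondence: the LR weights are log-likelihood-ratio differences of the local tables. A numerical check under made-up parameters:

```python
import math

p_y = 0.4                                  # hypothetical p(Y = 1)
p_x = {"X1": {0: 0.2, 1: 0.7},             # hypothetical p(Xi = 1 | Y = y)
       "X2": {0: 0.5, 1: 0.9}}

def nb_posterior(x):
    """p(Y = 1 | x) computed directly in the Naive BN."""
    num, den = p_y, 1 - p_y
    for v, xi in x.items():
        num *= p_x[v][1] if xi else 1 - p_x[v][1]
        den *= p_x[v][0] if xi else 1 - p_x[v][0]
    return num / (num + den)

# Equivalent LR parameters: the bias absorbs the prior and the "feature off"
# terms; each weight is the on/off log-likelihood-ratio difference.
bias = math.log(p_y / (1 - p_y)) + sum(
    math.log((1 - q[1]) / (1 - q[0])) for q in p_x.values())
w = {v: math.log(q[1] / q[0]) - math.log((1 - q[1]) / (1 - q[0]))
     for v, q in p_x.items()}

def lr_posterior(x):
    z = bias + sum(w[v] * x[v] for v in x)
    return 1 / (1 + math.exp(-z))
```

The two functions agree on every input, which is the conditional equivalence in the text; extra dependencies among the Xi would force interaction terms into the LR model.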

5 Estimating complex features

The basic task is the estimation of the expectation of a given random variable over the space of DAGs in Eq. 1, with a specified confidence level. We assume complete data, discrete domain variables, multinomial local conditional distributions and Dirichlet parameter priors. This ensures efficiently computable closed formulas for the (unnormalized) posteriors of DAGs. As this posterior cannot be sampled directly and the construction of an approximating distribution is frequently not feasible, the standard approach is to use MCMC methods such as Metropolis-Hastings over the DAG space (see e.g. (Gamerman, 1997; Gelman et al., 1995)).

The DAG-based MCMC method for estimating a given expectation is generally applicable, but for certain types of features, such as Markov blanket membership, an improved method, the so-called ordering-based MCMC method, can be applied; it utilizes closed forms of the order-conditional feature posteriors, computable in O(n^(k+1)) time, where k denotes the maximum number of parents (Friedman and Koller, 2000).

In these approaches the problem is simplified to the estimation of separate posteriors. However, the number of target features can be as high as 10^4-10^6 even for a given type of pairwise feature and moderate domain complexity. This calls for a decision-theoretic treatment of the selection and estimation of features, but here we use a simplified approach targeting the selection and estimation of the K most probable feature values. Because of the exponential number of feature values, a search method has to be applied either iteratively or in an integrated fashion. The iterative approach requires the offline storage of orderings and corresponding common factors, so we investigated the latter, integrated option. Integrated feature selection-estimation is particularly relevant for the ordering-based MC methods, because it does not implicitly generate "high-scoring" features, and features that are not part of the solution cause extra computational costs in estimation.

The goal of the search within the MC cycle at step l is the generation of MBGs with high order-conditional posterior, potentially using the already generated MBGs and the posteriors p(MBG| ≺l, DN). To facilitate the search we define an order-conditional MBG state space, based on the observation that the order-conditional MAP MBG can be found in O(n^(k+1)) time with only a negligible constant-factor increase. An MBG state is represented by an n′-dimensional vector s, where n′ is the number of variables not preceding the target variable Y in the ordering ≺:

n′ = ∑_{i=1}^{n} 1(Y ⪯ Xi)    (4)

The values are integers si = 0, . . . , ri, representing either separate parental sets or (in the case of Xi with Y ≺l Xi) a special set of parental sets not including the target variable. The product of the order-conditional posteriors of the represented sets of parental sets gives the order-conditional posterior of the represented MBG state in Eq. 3. We ensure that the conditional posteriors of the represented sets of parental sets are monotone decreasing w.r.t. their indices:

∀ si < s′i : p(si|DN, ≺) ≥ p(s′i|DN, ≺)    (5)

which can be constructed in O(n^(k+1) log(max_i ri)) time, where k is the maximum parental set size.

This MBG space allows the application of efficient search methods. We experimented with direct sampling, top-sampling and a deterministic search. Direct sampling was used as a baseline, because it does not need the MBG space. The top-sampling method is biased towards sampling MBGs with high order-conditional posterior, by sampling only from the L most probable sets of parental sets for each dimension Y ⪯ Xi. The uniform-cost search constructs the MBG space, then performs a uniform-cost search up to a maximum number of MBGs or down to the threshold p(MBG_{MAP,≺}| ≺, DN)/ρS.

The pseudo-code for searching and estimating MAP values of the MBG feature of a given variable is shown in Alg. 1 (for simplicity, the estimation of simple classification features such as edge and MBM relations, and the estimation of the MB features of a given variable using the estimated MAP MBG collection, are not shown).

Algorithm 1 Search and estimation of classification features using the MBG-ordering spaces

Require: p(≺), p(pa(Xi)| ≺), k, R, ρ, LS, ρS, LT, M
Ensure: K MAP feature values with estimates

  Cache order-free parental posteriors Π = {p(pa(Xi)|DN) : ∀i, |pa(Xi)| ≤ k}
  Initialize MCMC, the MBG-tree T, the MBM and edge posterior matrices R, E
  Insert a priori specified MBGs in T
  for l = 0 to M do   {the sampling cycle}
    Draw the next ordering ≺l
    Cache order-specific common factors Ψ for |pa(Xi)| ≤ k:
      p(pa(Xi)| ≺l) for all Xi
      p(Y ∉ pa(Xi)| ≺l) for Y ≺l Xi
    Compute p(≺l |DN)
    Construct the order-conditional MBG subspace Φ = (Π, Ψ, R, ρ)
    SS = Search(Φ, LS, ρS)
    for all mbg ∈ SS do
      if mbg ∉ T then
        Insert(T, mbg)
        if LT < |T| then
          T = PruneToHPD(T, LT)
    for all mbg ∈ T do
      p(mbg|DN) += p(mbg| ≺l, DN)
  Report the K MAP MBGs from T
  Report the K′ MAP MBs using the MBGs in T

Parameters R and ρ allow the restriction of the MBG subspace, separately for each dimension, to at most R values, by requiring that the corresponding posteriors be above an exp(−ρ) ratio of the respective MAP value. The uniform-cost search starts from the order-conditional MAP MBG, and stops after the expansion of LS states or when the most probable MBG in its search list drops below an exp(ρS) ratio of the order-conditional posterior of the starting MBG.
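Because of the monotone per-dimension ordering guaranteed by Eq. (5), a best-first (uniform-cost) expansion with a heap enumerates MBG states in decreasing order of their posterior product. A minimal sketch with made-up per-dimension posteriors (the state representation follows Eq. 4-5; all function and variable names are illustrative):

```python
import heapq

def best_first(dims, max_expansions):
    """Enumerate joint states (one index per dimension) in decreasing
    order of the product of per-dimension scores; each dims[d] is
    sorted decreasingly, as in Eq. (5)."""
    def product(state):
        p = 1.0
        for d, i in enumerate(state):
            p *= dims[d][i]
        return p

    start = (0,) * len(dims)               # the order-conditional MAP state
    heap = [(-product(start), start)]
    seen = {start}
    out = []
    while heap and len(out) < max_expansions:
        negp, state = heapq.heappop(heap)
        out.append((state, -negp))
        for d in range(len(dims)):         # successors: advance one index
            if state[d] + 1 < len(dims[d]):
                nxt = state[:d] + (state[d] + 1,) + state[d + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (-product(nxt), nxt))
    return out

dims = [[0.6, 0.3, 0.1], [0.7, 0.3]]       # two dimensions of parental-set posteriors
top = best_first(dims, 3)                  # the 3 most probable joint states
```

The monotone sorting is what makes the first popped state the MAP state and every later pop no more probable than the previous one, mirroring the stopping rules described above.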

Generally, the expansion phase has high computational cost, but for large LT the cost of updating the MBGs in T is high as well. To maintain tractability, more refined methods such as partial updating are required. Within the explored OC domain, however, the full exact update has acceptable cost if the size of the estimated MBG set is LT ∈ [10^5, 10^6]. This LT ensures that newly inserted MBGs are not pruned before their estimates can reliably indicate their high-scoring potential, while still allowing an exact update. In larger domains this balance can be different; the analysis of this question in general is left for future research.

The analysis of the MBG space showed that the conditional posteriors of the ranked parental sets beyond rank 10 are negligible, so subsequently we report results using the values R = 20, ρ = 4, LS = 10^4 and ρS = 10^−6. Note that expansion with the LS conditionally most probable MBGs in each step guarantees neither that the LS most probable MBGs are estimated, nor even that the MAP MBG is.

6 Results

We used a data set consisting of 35 discrete variables and 782 complete cases related to the preoperative diagnosis of ovarian cancer (see (Antal et al., 2004)).

First we report the estimation-selection of MB features for the central variable Pathology. We applied the heuristic deterministic search-estimation method in the inner cycle of the MCMC method. The length of the burn-in and of the MCMC simulation was 10000, the probability of the pairwise replace operator was 0.8, the parameter prior was the BDeu and the structure prior was uniform over the parental set sizes (Friedman and Koller, 2000). The maximum number of parents was 4 (the posteriors of larger sets are insignificant). For preselected high-scoring MB values, after the 10000-step burn-in the single-chain convergence test of Geweke comparing averages has a z-score of approximately 0.5, and the R value of the multiple-chain method of Gelman-Rubin with 5 chains drops below 1.05 (Gamerman, 1997; Gelman et al., 1995). The variances of the MCMC estimates of these preselected test feature values drop below 10^-2. We also applied the deterministic search-estimation
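The two convergence diagnostics mentioned above can be sketched as follows; this is a simplified iid-variance version (the proper Geweke statistic uses spectral density estimates), with toy synthetic chains rather than the paper's MCMC output.

```python
import numpy as np

def geweke_z(chain, first=0.1, last=0.5):
    """Geweke z-score: compare the mean of the first 10% of the chain
    with the mean of the last 50% (iid-variance sketch)."""
    n = len(chain)
    a, b = chain[: int(first * n)], chain[int((1 - last) * n):]
    return (a.mean() - b.mean()) / np.sqrt(
        a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))

def gelman_rubin(chains):
    """Potential scale reduction factor R-hat for m parallel chains
    (rows of `chains`); values near 1 indicate convergence."""
    m, n = chains.shape
    means = chains.mean(axis=1)
    B = n * means.var(ddof=1)               # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()   # within-chain variance
    var_hat = (n - 1) / n * W + B / n
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(0)
chains = rng.normal(size=(5, 10000))        # 5 well-mixed toy chains
print(round(float(gelman_rubin(chains)), 3))  # close to 1.0
```

For well-mixed chains the z-score stays within a few standard normal deviations and R-hat approaches 1, matching the thresholds (|z| ≈ 0.5, R < 1.05) reported in the text.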

method for a single ordering, because a total ordering of the variables was available from an expert. Fig. 1 reports the estimated posteriors of the MAP MB sets for Pathology with their MBM-based approximated values assuming the independence of the MBM values, and Table 1 shows the members of the MB sets. Note that the two monotone decreasing curves correspond to independent rankings, one for the expert's total ordering and one for the unconstrained case. It also reports the MB set spanned by a prior BN specified by the expert (E), the MB set spanned by the MAP BN (BN) and the set spanned by the MAP MBG (MBG) (see Eq. 6). Furthermore, we generated another reference set (LR) from a conditional standpoint using the logistic regression model class and the SPSS 14.0 software with the default setting for forward model construction (Hosmer and Lemeshow, 2000). MBp reports the result of the deterministic select-estimate method using the total ordering of the expert, and MB1, MB2, MB3 report the results of the unconstrained ordering-based MCMC with the deterministic select-estimate method. The variables FamHist, CycleDay, HormTherapy, Hysterectomy, Parity, PMenoAge are never selected and the variables Volume, Ascites, Papillation, PapFlow, CA125, WallRegularity are always selected, so they are not reported.

The MBM-based approximation performs relatively well, particularly w.r.t. ranking in the case of the expert's ordering ≺0, but it performs poorly in the unconstrained case both w.r.t. estimations and ranks (see the difference of the MBp set to MB1 w.r.t. the variables Age, Meno, PI, TAMX, Solid).

We compared the MBG(Y, GMAP) and MB(Y, GMAP) feature values defined by the MAP BN structure GMAP against the MAP MBG feature value MBG(Y)MAP and the MAP MB feature value MB(Y)MAP, including the MB feature value defined by the MAP MBG feature value MB(Y, MBG(Y)MAP) (see Eq. 6).

14 P. Antal, A. Gézsi, G. Hullám, and A. Millinghoffer


Table 1: Markov blanket sets of Pathology among the thirty-five variables.

           E  LR  BN  MBG  MBp  MB1  MB2  MB3
Age        1   0   1    0    1    0    0    0
Meno       1   0   1    1    0    1    1    1
PMenoY     0   1   0    0    0    0    0    0
PillUse    1   0   0    0    0    0    0    0
Bilateral  1   0   1    1    1    1    1    1
Pain       0   1   0    0    0    0    0    0
Fluid      1   0   1    0    0    0    0    0
Septum     1   0   1    1    1    1    1    1
ISeptum    0   0   1    1    1    1    1    1
PSmooth    1   0   0    0    0    0    0    0
Loc.       1   1   0    0    0    1    0    1
Shadows    1   0   1    1    1    1    1    1
Echog.     1   0   1    1    1    1    1    1
ColScore   1   0   1    1    1    1    1    1
PI         1   1   0    0    1    0    0    0
RI         1   0   1    1    1    1    1    1
PSV        1   0   1    1    1    1    1    1
TAMX       1   1   0    1    1    0    0    0
Solid      1   1   1    1    1    0    1    0
FHBrCa     0   0   0    0    0    0    0    1
FHOvCa     0   0   0    1    0    0    0    0

GMAP = arg max_G p(G | DN)                          (6)

MBG(Y)MAP = arg max_{mbg(Y)} p(mbg(Y) | DN)

MB(Y)MAP = arg max_{mb(Y)} p(mb(Y) | DN)
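In practice the arg max in Eq. 6 is approximated from the MCMC sample of structures; a minimal frequency-count sketch (the helper name and toy data are ours, not the paper's):

```python
from collections import Counter

def map_feature(samples, k=1):
    """Return the k most frequent feature values in a sample of
    structures: a frequency-count approximation of the arg max
    in Eq. 6 over sampled feature values."""
    return Counter(samples).most_common(k)

# Toy example: MB sets sampled for a target variable (hypothetical data).
mbs = [frozenset("AB"), frozenset("AB"), frozenset("ABC"), frozenset("A")]
print(map_feature(mbs, k=1))  # the most frequent MB set and its count
```

The same counting applies to any feature value (MB sets, MBGs, edges) as long as the value is hashable, which is why the sets are frozen.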

We performed the comparison using the best BN structure found in the MCMC simulation. The MAP MBG feature value MBG(Y)MAP differed significantly from the MAP domain model, because of the additional Age and Fluid variables in the domain model. The MAP MB feature value MB(Y)MAP similarly differs from the MB sets defined by the MAP domain models, for example w.r.t. the vascularization variables such as PI. Interestingly, the MAP MB feature value also differs from the MB feature value defined by the MAP MBG feature value MB(Y, MBG(Y)MAP), for example w.r.t. the TAMX and Solid variables. In conclusion, these results, together with the comparison against simple feature-based analyses such as the MBM-based analysis reported in Fig. 1, show the relevance of the complex feature-based analysis.

We also constructed an offline probabilistic knowledge base containing 10^4 MAP MBGs. It

Figure 1: The ranked posteriors and their MBM-based approximations of the 20 most probable MB(Pathology) sets for the single/unconstrained orderings. (Curves: P(MB | ≺, D), P(MB | ≺, D, MBM), P(MB | D), P(MB | D, MBM).)

is connected with the annotated BN knowledge base defined in Def. 1, which allows an offline exploration of the domain from the point of view of conditional modeling. The histogram of the number of parameters and inputs for the MBGs using only the fourteen most relevant variables is reported in Fig. 2.

Figure 2: The histogram of the number of parameters and inputs of the MAP MBGs (axes: |Inputs|, |Params|, Frequency).

7 Conclusion

In the paper we presented a Bayesian approach to complex BN features, namely the so-called Markov Blanket set and Markov Blanket Graph features. We developed and applied a select-estimate algorithm using ordering-based MCMC, which uses the efficiently computable order-conditional posterior of the MBG feature and the proposed MBG space. The comparison of the most probable MB and MBG feature values with simple feature-based approximations and with complete domain modeling showed the separate significance of the analysis based on these complex BN features in the investigated medical domain. The proposed algorithm and the offline knowledge base, in the context of the introduced probabilistic annotated BN knowledge base, allow new types of analysis and fusion of expertise, data and literature.

Acknowledgements

We thank T. Dobrowiecki for his helpful comments. Our research was supported by grants from Research Council KUL: IDO (IOTA Oncology, Genetic networks).

References

P. Antal, G. Fannes, Y. Moreau, D. Timmerman, and B. De Moor. 2004. Using literature and data to learn Bayesian networks as clinical models of ovarian tumors. AI in Med., 30:257–281. Special issue on Bayesian Models in Med.

W. L. Buntine. 1991. Theory refinement of Bayesian networks. In Proc. of the 7th Conf. on Uncertainty in Artificial Intelligence, pages 52–60. Morgan Kaufmann.

G. F. Cooper and E. Herskovits. 1992. A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9:309–347.

N. Friedman and D. Koller. 2000. Being Bayesian about network structure. In Proc. of the 16th Conf. on Uncertainty in Artificial Intelligence, pages 201–211. Morgan Kaufmann.

N. Friedman, M. Goldszmidt, and A. Wyner. 1999. Data analysis with Bayesian networks: A bootstrap approach. In Proc. of the 15th Conf. on Uncertainty in Artificial Intelligence, pages 196–205. Morgan Kaufmann.

D. Gamerman. 1997. Markov Chain Monte Carlo. Chapman & Hall, London.

A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin. 1995. Bayesian Data Analysis. Chapman & Hall, London.

C. Glymour and G. F. Cooper. 1999. Computation, Causation, and Discovery. AAAI Press.

D. Heckerman, D. Geiger, and D. Chickering. 1995. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20:197–243.

D. Heckerman, C. Meek, and G. Cooper. 1997. A Bayesian approach to causal discovery. Technical Report MSR-TR-97-05.

D. W. Hosmer and S. Lemeshow. 2000. Applied Logistic Regression. Wiley & Sons, Chichester.

R. Kohavi and G. H. John. 1997. Wrappers for feature subset selection. Artificial Intelligence, 97:273–324.

D. Koller and M. Sahami. 1996. Toward optimal feature selection. In International Conference on Machine Learning, pages 284–292.

D. Madigan, S. A. Andersson, M. Perlman, and C. T. Volinsky. 1996. Bayesian model averaging and model selection for Markov equivalence classes of acyclic digraphs. Comm. Statist. Theory Methods, 25:2493–2520.

S. Mani and G. F. Cooper. 2001. A simulation study of three related causal data mining algorithms. In International Workshop on Artificial Intelligence and Statistics, pages 73–80. Morgan Kaufmann, San Francisco, CA.

J. Pearl. 1988. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Francisco, CA.

J. Pearl. 2000. Causality: Models, Reasoning, and Inference. Cambridge University Press.

D. Pe'er, A. Regev, G. Elidan, and N. Friedman. 2001. Inferring subnetworks from perturbed expression profiles. Bioinformatics, Proceedings of ISMB 2001, 17(Suppl. 1):215–224.

C. Silverstein, S. Brin, R. Motwani, and J. D. Ullman. 2000. Scalable techniques for mining causal structures. Data Mining and Knowledge Discovery, 4(2/3):163–192.

D. J. Spiegelhalter and S. L. Lauritzen. 1990. Sequential updating of conditional probabilities on directed acyclic graphical structures. Networks, 20:579–605.

P. Spirtes, C. Glymour, and R. Scheines. 2001. Causation, Prediction, and Search. MIT Press.

I. Tsamardinos and C. Aliferis. 2003. Towards principled feature selection: Relevancy, filters, and wrappers. In Proc. of the Artificial Intelligence and Statistics, pages 334–342.


Literature Mining using Bayesian Networks

Peter Antal and Andras Millinghoffer
Department of Measurement and Information Systems

Budapest University of Technology and Economics

Abstract

In biomedical domains, free text electronic literature is an important resource for knowledge discovery and acquisition, particularly to provide a priori components for evaluating or learning domain models. Aiming at the automated extraction of this prior knowledge, we discuss the types of uncertainties in a domain with respect to causal mechanisms, formulate assumptions about their report in scientific papers and derive generative probabilistic models for the occurrences of biomedical concepts in papers. These results allow the discovery and extraction of latent causal dependency relations from the domain literature using minimal linguistic support. Contrary to the currently prevailing methods, which assume that relations are sufficiently formulated for linguistic methods, our approach assumes only the report of causally associated entities without their tentative status or relations, and can discover new relations and prune redundancies by providing a domain-wide model. Therefore the proposed Bayesian network based text mining is an important complement to the linguistic approaches.

1 Introduction

Rapid accumulation of biological data and the corresponding knowledge posed a new challenge of making this voluminous, uncertain and frequently inconsistent knowledge accessible. Despite recent trends to broaden the scope of formal knowledge bases in biomedical domains, free text electronic literature is still the central repository of the domain knowledge. This central role will probably be retained in the near future, because of the rapidly expanding frontiers. The extraction of explicitly stated or the discovery of implicitly present latent knowledge requires various techniques ranging from purely linguistic approaches to machine learning methods. In the paper we investigate a domain-model based approach to statistical inference about dependence and causal relations given the literature, using minimal linguistic preprocessing. We use Bayesian networks (BNs) as causal domain models to introduce generative models of publication, i.e. we examine the relation of domain models and generative models of the corresponding literature.

In a wider sense our work provides support for statistical inference about the structure of the domain model. This is a two-step process, which consists of the reconstruction of the beliefs in mechanisms from the literature by model learning and their usage in a subsequent learning phase. Here, the Bayesian framework is an obvious choice. Earlier applications of text mining provided results for the domain experts or data analysts, whereas our aim is to go one step further and use the results directly in the statistical learning of the domain models.

The paper is organized as follows. Section 2 presents a unified view of the literature, the data and their models. In Section 3 we review the types of uncertainties in biomedical domains from a causal, mechanism-oriented point of view. In Section 4 we summarize recent approaches to information extraction and literature mining based on natural language processing (NLP) and "local" analysis of occurrence patterns. In Section 5 we propose generative probabilistic models for the occurrences of biomedical concepts in scientific papers. Section 6 presents textual aspects of the application domain, the diagnosis of ovarian cancer. Section 7 reports results on learning BNs given the literature.

2 Fusion of literature and data

The relation of experimental data DN, probabilistic causal domain models formalized as BNs (G, θ), domain literature DL_N′ and models of publication (GL, θL) can be approached at different levels. For the moment, let us assume that probabilistic models are available describing the generation of observations P(DN | (G, θ)) and literature P(DL_N′ | (GL, θL)). This latter may include stochastic grammars for modeling the linguistic aspects of the publication; however, we will assume that the literature has a simplified agrammatical representation and the corresponding generative model can be formalized as a BN (GL, θL) as well.

The main question is the relation of P(G, θ) and P(GL, θL). In the most general approach the hypothetical posteriors P(G, θ | DN, ξi), expressing personal beliefs over the domain models conditional on the experiments and the personal background knowledge ξi, determine or at least influence the parameters of the model (GL, θL) in P(DL_N′ | (GL, θL), ξi).

The construction or the learning of a full-fledged decision theoretic model of publication is currently not feasible regarding the state of quantitative modeling of scientific research and publication policies, not to mention the cognitive and even stylistic aspects of explanation, understanding and learning (Rosenberg, 2000). In a severely restricted approach we will focus only on the effect of the belief in domain models P(G, θ) on that in publication models P(GL, θL). We will assume that this transformation is "local", i.e. there is a simple probabilistic link between the model spaces, specifically between the structure of the domain model and the structure and parameters of the publication model p(GL, θL | G). Probabilistically linked model spaces allow the computation of the posterior over domain models given the literature(!) data as:

P(G | DL_N′) = [P(G) / P(DL_N′)] Σ_{GL} P(DL_N′ | GL) P(GL | G).

The formalization (DN ← G → GL → DL_N′) also allows the computation of the posterior over the domain models given both the clinical and the literature(!) data as:

P(G | DN, DL_N′) = P(G) [P(DL_N′ | G) / P(DL_N′)] [P(DN | G) / P(DN | DL_N′)]
                 ∝ P(G) P(DN | G) Σ_{GL} P(DL_N′ | GL) P(GL | G),

The order of the factors shows that the prior is first updated by the literature data, then by the clinical data. A considerable advantage of this approach is the integration of literature and clinical data at the lowest level and not through feature posteriors, i.e. by using literature posteriors in feature-based priors for the (clinical) data analysis (Antal et al., 2004).

We will assume that a bijective relation exists between the domain model structures G and the publication model structures GL (T(G) = GL), whereas the parameters θL may encode additional aspects of publication policies and explanation. We will focus on the logical link between the structures, where the posterior given the literature and possibly the clinical data is:

P(G | DN, DL_N′, ξ) ∝ P(G | ξ) P(DN | G) P(DL_N′ | T(G)).    (1)

This shows the equal status of the literature and the clinical data. In integrated learning from heterogeneous sources, however, the scaling of the sources is advisable to express our confidence in them.
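The scaling of sources can be sketched as a weighted combination of log-scores over candidate structures; the weights w_data and w_lit below are illustrative assumptions, not values from the paper, and the toy numbers are ours.

```python
def combined_log_posterior(log_prior, log_lik_clinical, log_lik_literature,
                           w_data=1.0, w_lit=0.5):
    """Sketch of Eq. 1 with source scaling: the literature likelihood
    is down-weighted by w_lit to express lower confidence in that
    source (illustrative weights, not from the paper)."""
    return log_prior + w_data * log_lik_clinical + w_lit * log_lik_literature

# Two candidate structures scored against both sources (toy numbers).
scores = {
    "G1": combined_log_posterior(-1.0, -10.0, -20.0),
    "G2": combined_log_posterior(-1.0, -12.0, -14.0),
}
best = max(scores, key=scores.get)
print(best)  # G2: its literature fit outweighs the small clinical gap
```

With w_lit = 1 the two sources would have the equal status noted above; lowering w_lit shifts the balance toward the clinical data.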

3 Concepts, associations, causation

Frequently a biomedical domain can be characterized by a dominant type of uncertainty w.r.t. the causal mechanisms. Such types of uncertainty show a certain sequentiality, described below, related to the development of biomedical knowledge, though a strictly sequential view is clearly an oversimplification.


(1) Conceptual phase: Uncertainty over the domain ontology, i.e. the relevant entities.

(2) Associative phase: Uncertainty over the association of entities, reported in the literature as indirect, associative hypotheses, frequently as clusters of entities. Though we accept the general view of causal relations behind associations, we assume that the exact causal functions and direct relations are unknown.

(3) Causal relevance phase: (Existential) uncertainty over causal relations (i.e. over mechanisms). Typically, direct causal relations are theorized as processes and mechanisms.

(4) Causal effect phase: Uncertainty over the strength of the autonomous mechanisms embodying the causal relations.

In this paper we assume that the target domain is already in the Associative or Causal phase, i.e. that the entities are more or less agreed upon, but their causal relations are mostly in the discovery phase. This holds in many biomedical domains, particularly in those linking biological and clinical levels. There the Associative phase is a crucial but lengthy knowledge accumulation process, where a wide range of research methods is used to report associated pairs or clusters of the domain entities. These methods admittedly produce causally oriented associative relations which are partial, biased and noisy.

4 Literature mining

Literature mining methods can be classified into bottom-up (pairwise) and top-down (domain model based) methods. Bottom-up methods attempt to identify individual relationships, and the integration is left to the domain expert. Linguistic approaches assume that the individual relations are sufficiently known, formulated and reported for automated detection methods. On the contrary, top-down methods concentrate on identifying consistent domain models by analyzing the domain literature jointly. They assume that mainly causally associated entities are reported, with or without tentative relations and direct structural knowledge. Their linguistic formulation is highly variable, not conforming to simple grammatical characterization. Consequently top-down methods typically use agrammatical text representations and minimal linguistic support. They autonomously prune the redundant, inconsistent, indirect relations by evaluating consistent domain models, and can deliver results in domains already in the Associative phase.

Until recently mainly bottom-up methods have been analyzed in the literature: linguistic approaches extract explicitly stated relations, possibly with qualitative ratings (Proux et al., 2000; Hirschman et al., 2002); co-occurrence analysis quantifies the pairwise relations of variables by their relative frequency (Stapley and Benoit, 2000; Jenssen et al., 2001); kernel similarity analysis uses the textual descriptions or the occurrence patterns of variables in publications to quantify their relation (Shatkay et al., 2002); Swanson and Smalheiser (1997) discover relationships through the heuristic pattern analysis of citations and co-occurrences; in (Cooper, 1997) and (Mani and Cooper, 2000) local constraints were applied to cope with possible hidden confounders, to support the discovery of causal relations; joint statistical analysis in (Krauthammer et al., 2002) fits a generative model to the temporal pattern of corroborations, refutations and citations of individual relations to identify "true" statements. The top-down method of the joint statistical analysis of de Campos (1998) learns a restricted BN thesaurus from the occurrence patterns of words in the literature. Our approach is closest to this and to those of Krauthammer et al. and Mani.

The reconstruction of informative and faithful priors over domain mechanisms or models from research papers is further complicated by the multiple aspects of uncertainty about the existence, scope (conditions of validity), strength, causality (direction), robustness to perturbation and relevance of mechanisms, and by the incompleteness of reported relations, because these are assumed to be well-known parts of common sense knowledge or of the paradigmatic, already reported knowledge of the community.


5 BN models of publications

Considering (biomedical) abstracts, we adopt the central role of causal understanding and explanation in scientific research and publication (Thagard, 1998). Furthermore, we assume that the contemporary (collective) uncertainty over mechanisms is an important factor influencing the publications. According to this causal stance, we accept the 'causal relevance' interpretation, more specifically the 'explained' (explanandum) and 'explanatory' (explanans) statuses; in addition, we allow the 'described' status. This is appealing, because in the assumed causal publications both the name occurrence and the preprocessing kernel similarity method (see Section 6) express the presence or relevance of the concept corresponding to the respective variable. This implicitly means that we assume that publications contain either descriptions of the domain concepts without considering their relations, or the occurrences of entities participating in known or latent causal relations. We assume that there is only one causal mechanism for each parental set, so we will equate a given parental set and the mechanism based on it.

Furthermore, we assume that mainly positive statements are reported, and we treat negation and refutation as noise; we also assume that exclusive hypotheses are reported, i.e. we treat alternatives as one aggregated hypothesis. Additionally, we presume that the dominant type of publication is causally ("forward") oriented. We attempt to model the transitive nature of causal explanation over mechanisms, e.g. that causal mechanisms with a common cause or with a common effect are surveyed in an article, or that subsequent causal mechanisms are tracked to demonstrate a causal chain. On the other hand, we also have to model the lack of transitivity, i.e. the incompleteness of causal explanations, e.g. that certain variables are assumed to be explanatory, others potentially explained, except for survey articles that describe an overall domain model. Finally, we assume that the reports of the causal mechanisms and the univariate descriptions are independent of each other.

5.1 The intransitive publication model

The first generative model is a two-layer BN. The upper-layer variables represent the pragmatic functions (described or explanandum) of the corresponding concepts, while lower-layer variables represent their observable occurrences (described, explanatory or explained). Upper-layer variables can be interpreted as the intentions of the authors or as the property of the given experimental technique. We assume that lower-layer variables are influenced only by the upper-layer ones denoting the corresponding mechanisms, and not by any other external quantities, e.g. by the number of the reported entities in the paper. A further assumption is that the belief in a compound mechanism is the product of the beliefs in the pairwise dependencies. Consequently we use noisy-OR canonical distributions for the children in the lower layer. In a noisy-OR local dependency (Pearl, 1988), the edges can be labeled with a parameter inhibiting the OR function, which can also be interpreted structurally as the probability of an implicative edge.
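A noisy-OR conditional distribution of the kind used for the lower-layer children can be sketched as follows; the function and the toy parameters are our illustration of the standard noisy-OR (Pearl, 1988), not the paper's fitted model.

```python
def noisy_or(parent_states, inhibitors, leak=0.0):
    """P(child = 1 | parents) under a noisy-OR CPD: each active parent
    independently fails to trigger the child with its inhibition
    probability; `leak` is the probability of spontaneous activation."""
    p_off = 1.0 - leak
    for x, q in zip(parent_states, inhibitors):
        if x:                 # only active parents can trigger the child
            p_off *= q        # q = inhibition probability of that edge
    return 1.0 - p_off

# Two active parents with inhibition probabilities 0.2 and 0.5:
print(noisy_or([1, 1, 0], [0.2, 0.5, 0.9]))  # 1 - 0.2*0.5 = 0.9
```

Structurally, 1 − q for an edge can be read as the probability that the implicative edge is present, which is the interpretation used above.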

This model extends the atomistic, individual-mechanism oriented information extraction methods by supporting the joint learning of all the mechanisms, i.e. by the search for a domain-wide coherent model. However, it still cannot model the dependencies between the reported associations, and the presence of hidden variables considerably increases the computational complexity of parameter and structure learning.

5.2 The transitive publication model

To devise a more advanced model, we relax the assumption of independence between the variables in the upper layer representing the pragmatic functions, and we adapt the models to the bag-of-words representation of publications (see Section 6). Consequently we analyze the possible pragmatic functions corresponding to the domain variables, which could be represented by hidden variables. We assume here that the explanatory roles of a variable are not differentiated, and that if a variable is explained, then it can be explanatory for any other

20 P. Antalband A. Millinghoffer

Page 31: Proceedings of the Third European Workshop on Probabilistic

variable. We also assume full observability of causal relevance, i.e. that the lack of occurrence of an entity in a paper means causal irrelevance w.r.t. the mechanisms and variables in the paper, and not a neutral omission. These assumptions allow the merging of the explanatory, explained and described statuses with the observable reported status, i.e. we can represent them jointly with a single binary variable. Note that these assumptions remain tenable in the case of reports of experiments, where the pattern of relevancies has a transitive-causal bias.

These would imply that we can model only full survey papers, but the general, unconstrained multinomial dependency model used in the transitive BNs provides enough freedom to avoid this. A possible semantics of the parameters of a binary, transitive literature BN can be derived from the causal stance that the presence of an entity Xi is influenced only by the presence of its potential explanatory entities, i.e. its parents. Consequently, P(Xi = 1 | PaXi = paXi) can be interpreted as the belief that the present parental variables can explain the entity Xi (PaXi denotes the parents of Xi and PaXi → Xi denotes the parental substructure). In that way the parameters of a complete network can represent the priors for parental sets compatible with the implied ordering:

P(Xi = 1 | PaXi = paXi) = P(PaXi = paXi)    (2)

where for notational simplicity paXi denotes both the parental set and a corresponding binary representation.

The multinomial model allows entity-specific modifications at each node, combined into the parameters of the conditional probability model that are independent of the other variables (i.e. unstructured noise). This permits the modeling of the description of the entities (P(X^D_i)), the beginning of the transitive scheme of causal explanation (P(X^B_i)) and the reverse effect of interrupting the transitive scheme (P(X^I_i)). These auxiliary variables model simplistic interventions, i.e. authors' intentions about publishing an observational model. Note that a "backward" model corresponding to an effect-to-cause or diagnostic interpretation and explanation method has a different structure with opposite edge directions.

In the Bayesian framework, there is structural uncertainty as well, i.e. uncertainty over the structure of the generative models (literature BNs) themselves. So to compute the probability of a parental set PaXi = paXi given a literature data set DL_N′, we have to average over the structures using the posterior given the literature data:

P(PaXi = paXi | DL_N′)                                      (3)
    = Σ_{(paXi → Xi) ⊂ G} P(Xi = 1 | paXi, G) P(G | DL_N′)
    ≈ Σ_G 1((paXi → Xi) ⊂ G) P(G | DL_N′)                   (4)

Consequently, the result of learning BNs from the literature can take multiple forms, e.g. a maximum a posteriori (MAP) structure and the corresponding parameters, or the posterior over the structures (Eq. 3). In the first case, the parameters can be interpreted structurally and converted into a prior for a subsequent learning. In the latter case, we neglect the parametric information, focusing on the structural constraints, and transform the posterior over the literature network structures into a prior over the structures of the real-world BNs (see Eq. 1).
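The indicator-averaging approximation of Eq. 4 can be sketched over a posterior sample of structures; the representation (each structure as a dict mapping a child to its frozen parent set) and the toy sample are our assumptions for illustration.

```python
def parental_set_posterior(sampled_structures, child, parents):
    """Approximate Eq. 4: the posterior probability that `parents` is
    the parental set of `child`, estimated as the fraction of sampled
    structures containing that parental substructure."""
    target = frozenset(parents)
    hits = sum(1 for g in sampled_structures if g.get(child) == target)
    return hits / len(sampled_structures)

# Toy posterior sample of three structures over variables X, Y:
sample = [{"Y": frozenset({"X"})},
          {"Y": frozenset({"X"})},
          {"Y": frozenset()}]
print(parental_set_posterior(sample, "Y", {"X"}))  # 2/3
```

In a Monte Carlo setting the sample would come from an MCMC run over literature network structures, with the equal-weight average standing in for P(G | DL_N′).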

6 The literature data sets

For our research we used the same collection of abstracts as that described in (Antal et al., 2004), which was a preliminary work using pairwise methods. The collection contains 2256 abstracts about ovarian cancer, mostly between 1980 and 2002. Also a name, a list of synonyms and a text kernel are available for each domain variable. The presence of the name (and synonyms) of a variable in documents is denoted by a binary value. Another binary representation of the publications is based on the kernel documents:

R^K_ij = 1 if 0.1 < sim(kj, di), and 0 otherwise,    (5)

which expresses the relevance of kernel document kj to document di using the 'term frequency-inverse document frequency' (TF-IDF) vector representation and the cosine similarity metric (Baeza-Yates and Ribeiro-Neto, 1999). We use the term literature data to denote both binary representations of the relevance of concepts in publications, usually denoted by DL_N′ (containing N′ publications).
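The thresholded TF-IDF cosine similarity of Eq. 5 can be sketched as follows; the tokenisations and the simple log(N/df) weighting are our illustrative assumptions, not the exact preprocessing of the paper.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build TF-IDF vectors (term -> weight) for tokenised documents,
    using raw term frequency and idf = log(N / df)."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log(n / df[t]) for t in df}
    return [{t: c * idf[t] for t, c in Counter(d).items()} for d in docs]

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# A kernel document and two abstracts (toy tokenisations):
docs = [["ca125", "marker", "serum"],          # kernel for CA125
        ["ca125", "serum", "level", "high"],   # abstract 1
        ["ultrasound", "septum", "wall"]]      # abstract 2
vecs = tfidf_vectors(docs)
relevance = [1 if cosine(vecs[0], v) > 0.1 else 0 for v in vecs[1:]]
print(relevance)  # [1, 0]: only abstract 1 passes the 0.1 threshold
```

The 0.1 threshold from Eq. 5 converts the continuous similarity into the binary relevance entries R^K_ij of the literature data matrix.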

7 Results

The structure learning of the transitive model is achieved by an exhaustive evaluation of parental sets of up to 4 variables, followed by the K2 greedy heuristics using the BDeu score (Heckerman et al., 1995) and an ordering of the variables from an expert, in order to be compatible with the learning of the intransitive model. The structure learning of the two-layer model has a higher computational cost, because the evaluation of a structure requires the optimization of parameters, which can be performed e.g. by gradient-descent algorithms. Because of the use of the "forward" explanation scheme, only those variables in the upper layer can be the parents of an external variable that succeed it in the causal order. Note that besides the optional parental edges for the external variables, we always force a deterministic edge from the corresponding non-external variable. During the parameter learning of a fixed network structure, the non-zero inhibitory parameters of the lower-layer variables are adjusted by a gradient descent method to maximize the likelihood of the data (see (Russell et al., 1995)). After having found the best structure, according to its semantics, it is converted into a flat, real-world structure without hidden variables. This conversion involves the merging of the corresponding pairs of nodes of the two layers, and then the reversal of the edges (since in the explanatory interpretation effects precede causes).

We compared the trained models to the expert model using a quantitative score based on the comparison of the pairwise relations in the model, which are defined w.r.t. the causal interpretation as follows (Cooper and Yoo, 1999; Wu et al., 2001): Causal edge (E): an edge between the nodes. Causal path (P): a directed path linking the nodes. (Pure) Confounded (C): the two nodes have a common ancestor; the relation is pure if there is no edge or path between the nodes. Independent (I): none of the previous (i.e. there is no causal connection).

The difference between two model structures can be represented in a matrix containing the number of relations of a given type in the expert model and in the trained model (the type of the relation in the expert model is the row index and the type in the trained model is the column index). These matrices (i.e. the comparison of the transitive and the intransitive models to the expert's) are shown in Table 1.

Table 1: Causal comparison of the intransitive and the transitive domain models (columns with 'i' and 't' in the subscript, respectively) to the expert model (rows).

      Ii   Ci   Pi   Ei    It   Ct   Pt   Et
I     12    0    0    0     0    4    2    6
C    106   20    2    4     4   90   26   12
P    756   72   80   18   188  460  216   62
E     70    6    8   36     6   38   24   52

Scalar scores can be derived from this matrix to evaluate the goodness of the trained model; the standard choice is to sum the elements with different weights (Cooper and Yoo, 1999; Wu et al., 2001). One possibility, for example, is to take the sum of the diagonal elements as a measure of similarity. By this comparison the intransitive model achieves 148 points, while the transitive achieves 358, so the transitive model reconstructs the underlying structure more faithfully. Particularly important is the (E, E) element, according to which 52 of the 120 edges of the expert model remain in the transitive model; on the contrary, the intransitive model preserves only 36 edges. Similarly, the independent relations of the expert model are well respected by both models.

Another score penalizes only the incorrect identification of independence: those and only those weights have a value of 1 which belong to the elements (I, .) or (., I), excluding the correctly identified (I, I) cell; the others are 0. This gives a score of 210 for the transitive model and 932 for the intransitive one.
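Both scalar scores can be reproduced directly from Table 1. The sketch below recomputes the diagonal (similarity) score and the independence-error score for the two trained models; the matrix and function names are illustrative, not from the paper:

```python
# Relation-comparison matrices from Table 1; rows and columns are ordered
# I, C, P, E (expert relation = row, trained-model relation = column).
intransitive = [
    [12, 0, 0, 0],
    [106, 20, 2, 4],
    [756, 72, 80, 18],
    [70, 6, 8, 36],
]
transitive = [
    [0, 4, 2, 6],
    [4, 90, 26, 12],
    [188, 460, 216, 62],
    [6, 38, 24, 52],
]

def weighted_score(matrix, weights):
    """Generic weighted sum over the 4x4 relation-comparison matrix."""
    return sum(weights[i][j] * matrix[i][j] for i in range(4) for j in range(4))

# Similarity: weight 1 on the diagonal (relation type preserved), 0 elsewhere.
diagonal = [[int(i == j) for j in range(4)] for i in range(4)]

# Independence errors: weight 1 where exactly one of the two relations is I
# (row or column index 0), so the correct (I, I) cell is not penalized.
indep_error = [[int((i == 0) != (j == 0)) for j in range(4)] for i in range(4)]
```

With these weight matrices, `weighted_score` yields 148 and 358 for the diagonal score and 932 and 210 for the independence-error score of the intransitive and transitive models, respectively, matching the figures quoted in the text.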

22 P. Antal and A. Millinghoffer


Figure 1: The expert-provided (dotted), the MAP transitive (dashed) and the intransitive (solid) BNs compatible with the expert's total ordering of the thirty-five variables, using the literature data set (PMCOREL R3), the K2 noninformative parameter priors, and noninformative structure priors.

This demonstrates that the intransitive model is extremely conservative in comparison with both the other learning method and the knowledge of the expert: it is only capable of detecting the most important edges. Note that the proportion of its false positive predictions regarding the edges is only 38%, while in the transitive model it is 61%.

Furthermore, we investigated the Bayesian learning of BN features, particularly using the temporal sequence of the literature data sets. An important feature indicating relevance between two variables is the so-called 'Markov Blanket Membership' (Friedman and Koller, 2000). We have examined the temporal characteristics of the posterior of this relation between the target variable 'Pathology' and the other variables, using the approximation in Eq. 4. This feature is a good representative of the diagnostic importance of variables according to the community. We have found four types of variables: the posterior of the relevance increases in time fast or slowly, decreases slowly, or fluctuates. Figure 2 shows examples of variables with a slow rise in time.

Figure 2: The probability of the relation Markov Blanket Membership between Pathology and the variables with a slow rise (Septum, PI, ColScore, TAMX, PapSmooth, RI, FamHistBrCa, Fluid, Volume, WallRegularity, Bilateral, PSV; x-axis: year, 1980-2005; y-axis: posterior probability, 0-1).
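The Markov Blanket Membership posterior can be approximated by averaging the feature over a sample of plausible structures (cf. Friedman and Koller, 2000). A minimal sketch, assuming the structure sampler is given and each sampled DAG is a child-to-parent-set dictionary; the names `markov_blanket` and `mbm_posterior` are illustrative, not from the paper:

```python
def markov_blanket(dag, node):
    """Markov blanket of `node` in a DAG given as {child: set_of_parents}:
    its parents, its children, and its children's other parents."""
    parents = dag.get(node, set())
    children = {c for c, ps in dag.items() if node in ps}
    spouses = {p for c in children for p in dag[c]} - {node}
    return parents | children | spouses

def mbm_posterior(sampled_dags, target, other):
    """Fraction of sampled structures in which `other` lies in the
    Markov blanket of `target` (a Monte Carlo feature posterior)."""
    hits = sum(1 for dag in sampled_dags if other in markov_blanket(dag, target))
    return hits / len(sampled_dags)
```

Tracking this fraction over structure samples learned from successive yearly slices of the literature data would produce curves like those in Figure 2.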

8 Conclusion

In the paper we proposed generative BN models of scientific publication to support the construction of real-world models from free-text literature. The advantage of this approach is its domain-model-based foundation; hence it is capable of constructing coherent models by autonomously pruning redundant or inconsistent relations. The preliminary results support this expectation. In the future we plan to use the evaluation methodology applied there, including rank-based performance metrics, and to investigate the issue of negation and refutation, particularly through time.

Literature Mining using Bayesian Networks 23

References

P. Antal, G. Fannes, Y. Moreau, D. Timmerman, and B. De Moor. 2004. Using literature and data to learn Bayesian networks as clinical models of ovarian tumors. Artificial Intelligence in Medicine, 30:257-281. Special issue on Bayesian Models in Medicine.

R. Baeza-Yates and B. Ribeiro-Neto. 1999. Modern Information Retrieval. ACM Press, New York.

G. F. Cooper and C. Yoo. 1999. Causal discovery from a mixture of experimental and observational data. In Proc. of the 15th Conf. on Uncertainty in Artificial Intelligence (UAI-1999), pages 116-125. Morgan Kaufmann.

G. Cooper. 1997. A simple constraint-based algorithm for efficiently mining observational databases for causal relationships. Data Mining and Knowledge Discovery, 2:203-224.

L. M. de Campos, J. M. Fernandez, and J. F. Huete. 1998. Query expansion in information retrieval systems using a Bayesian network-based thesaurus. In Gregory Cooper and Serafin Moral, editors, Proc. of the 14th Conf. on Uncertainty in Artificial Intelligence (UAI-1998), pages 53-60. Morgan Kaufmann.

N. Friedman and D. Koller. 2000. Being Bayesian about network structure. In Craig Boutilier and Moises Goldszmidt, editors, Proc. of the 16th Conf. on Uncertainty in Artificial Intelligence (UAI-2000), pages 201-211. Morgan Kaufmann.

D. Geiger and D. Heckerman. 1996. Knowledge representation and inference in similarity networks and Bayesian multinets. Artificial Intelligence, 82:45-74.

D. Heckerman, D. Geiger, and D. Chickering. 1995. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20:197-243.

L. Hirschman, J. C. Park, J. Tsujii, L. Wong, and C. H. Wu. 2002. Accomplishments and challenges in literature data mining for biology. Bioinformatics, 18:1553-1561.

T. K. Jenssen, A. Laegreid, J. Komorowski, and E. Hovig. 2001. A literature network of human genes for high-throughput analysis of gene expression. Nature Genetics, 28:21-28.

M. Krauthammer, P. Kra, I. Iossifov, S. M. Gomez, G. Hripcsak, V. Hatzivassiloglou, C. Friedman, and A. Rzhetsky. 2002. Of truth and pathways: chasing bits of information through myriads of articles. Bioinformatics, 18:249-257.

S. Mani and G. F. Cooper. 2000. Causal discovery from medical textual data. In AMIA Annual Symposium.

J. Pearl. 1988. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Francisco, CA.

D. Proux, F. Rechenmann, and L. Julliard. 2000. A pragmatic information extraction strategy for gathering data on genetic interactions. In Proc. of the 8th International Conference on Intelligent Systems for Molecular Biology (ISMB'2000), La Jolla, California, pages 279-285.

A. Rosenberg. 2000. Philosophy of Science: A Contemporary Introduction. Routledge.

S. J. Russell, J. Binder, D. Koller, and K. Kanazawa. 1995. Local learning in probabilistic networks with hidden variables. In IJCAI, pages 1146-1152.

H. Shatkay, S. Edwards, and M. Boguski. 2002. Information retrieval meets gene analysis. IEEE Intelligent Systems, 17(2):45-53.

B. Stapley and G. Benoit. 2000. Biobibliometrics: Information retrieval and visualization from co-occurrences of gene names in Medline abstracts. In Proc. of Pacific Symposium on Biocomputing (PSB00), volume 5, pages 529-540.

D. R. Swanson and N. R. Smalheiser. 1997. An interactive system for finding complementary literatures: a stimulus to scientific discovery. Artificial Intelligence, 91:183-203.

P. Thagard. 1998. Explaining disease: Correlations, causes, and mechanisms. Minds and Machines, 8:61-78.

X. Wu, P. Lucas, S. Kerr, and R. Dijkhuizen. 2001. Learning Bayesian-network topologies in realistic medical domains. In Medical Data Analysis: Second International Symposium, ISMDA, pages 302-308. Springer-Verlag, Berlin.



! #"$%&

'%(*),++-/.101234'5.7637.981:;:;<=-/.10?>@-A2:B3DC-FE-G(*37.H +6<*68163JIK5-G(L(*)>3/(L(*),M4K5<ONP68Q0Q<RN181(L(TS H ./6),(L(L<*UG)V.1W,-4'%26<*XY:;<L-G(*)

Z\[^]`_GaGbGc >-/.Q.13dTeO8#U/-/.#3#fgNPh^<*6W;);2(L-/.Q0iFj1k/lQmPmGjGn#oPp1q!rs#j7tPt#j1k/qGnvuxwQyAoYmPy/jz|V

~R,xVYG!/A,GV -A2)53P0Q),(L+61-A6)BP6)V.Q0\-,G),+<L-/..#);6+630Q),-G(Yh^<*6<Q2),:;<L+<*37.<.123/ Y- ] Y<L(L<*6¡Gg9-/.Q0:;-/.-G:B68Q-G(L(* v)2);U/-A201),0-G++);6+!3G¢\-,G),+<L-/.D.#);6+;£!¤¥#<L01)V.1:B)¦+81UGUG),+6+61-A6:B2),0Y-G(.1);6+=-A2)§-v3Vh);2¢T8Q(¨),-/.Q+63©2),12),+)V./6-/.10ª0Q),-G(h^<*6«4-/./¬<v3G26-/.76=-/.Q0ª:­1-G(L(*)V.#U/<.#UQ23/ Q(*)V+¦<.8Q.1:B);26-G<.2),-G+37.1<.#U1£¨®¬)U/<*¥G))B#-/4Q(*),+¦634+13Vh?6Q-A6^+37)¯3G¢!6#),+)°123/ Q(*)V4+:;-/.37.Q(* v)3P01),(L(*),0© 9:B2),0Q-G(.#);6+¯:;-G(L(*),0 GQ±¡;`²1G³/¡B*´¡²1µV¶ · £=¸%#),+)Gg#3Vh);¥G);2,gR-A2)+6<L(L(<L++<.1U-DUG2-G1<L:;-G(2),12),+)V./6-A6<*37.¹(L-/.#U781-AUG)4-/.10¬+3/(8#6<*37.¬-G(*UG3G2<*6Q4+;£¸^#)+<*681-A6<*37.<L+=ºP8Q<*6)46#)3/Y3/+<*6)4h^<*6J+),Q-A2-A6),(*¹+v),:;<*XQ),0ª:B2),0Q-G(.#);6+;g»h%1<L:|ª1-,¥G)D v);)V.¬6#)4+|81 #¼),:B63G¢\81:|+6810Q-/.10¬-G(*UG3G2<*6Q4<L:01);¥G),(*3/Y)V./6;£¸^1<L+5Q-Gv);25U/<*¥G),+6¡h3-F¼3G2¯:B37./62|<L Y8#6<*37.1+;£½ <*2+6;g<*6^01),(L<*¥G);2|+%-4.1);h?UG2-GYQ<L:;-G(R(L-/.#U781-AUG)=634¢¾3G2|81(L-A6)°-/.76¡#)=3G¢:B2),0Q-G(O.1);6¡h3G2¿gv v3G6+),Q-A2-A6),(*-/.10.#37. ] +),Y-A2-A6),(*+v),:;<*XQ),0£%N#),:B37.10g<*6À+#3xh^+^6Q-A6 GQ´ .137. ] +),Q-A2-A6),(*@+v),:;< ]XY),0.#);6À2),12),+)V./6),0h^<*6@6#).#);hÁ(L-/.#U781-AUG):;-/. v)=),-G+<L(*62-/.1+¢Â3G2|),0<.763§-/.),º#81<*¥A-G(*)V./6+),Q-A2-A6),(*+v),:;<*XQ),0.#);6;g01);X.#),03V¥G);2-4(L-A2UG);250Q37-G<.£¦¸^1<L+%2),+8Q(*6^3/v)V.1+%8Q-.P8Q v);23G¢.1);hÃv);2+v),:B6<*¥G),+5-/.10©:B37.1:B2);6)378#6:B37),+;ÄXQ2|+6%3G¢¨-G(L(Tg<*6À<),0Q<L-A6),(*)V.Q-G Q(*),+À6#)=)B1<L+6<.#U-G(*UG3G2|<*6Q+=¢Â3G2+),Q-A2|-A6),(*ª+v),:;<*XQ),0Å:B2),0Q-G(\.#);6+=63 v)-GYQ(L<*),0J63¬.#37. ] +),Q-A2-A6),(*ª+),:;<*XY),037.1),+;£

Æ ÇxÈOÉGÊ/ËÌÍÎPÉ/Ï¡ËRÈ

®¬)¢¾3P:,8Q+%37.©:B2),0Q-G(.#);6¡h3G2¿#+4dÐNP),:B6<*37.ªÑ9f=d Z 3GW ]-/.g bAÒGÒ/Ó fgÃh%1<L:­Ô-A2)Õ-ÖUG)V.#);2-G(L<*W,-A6<*37.×3G¢\-,G),+<L-/.Ø.#);6+,£Ù¸^#)§UG)V.#);2|-G(L<*W,-A6<*37.Ø<L+-G:|1<*);¥G),0 9¯2),(L-F#<.1U^6#)¨2),ºP8Q<*2)V)V.7661-A6O6#)¨:B37.10Y<*6<*37.1-G(-G++¢8Q.1:B6<*37.1+3G¢¯6#)¬3#01),( v)12),:;<L+)GĬh^<*6:B2),0Q-G(.1);6+¦),-G:|3G¢»6#)VÚ<L+¦37.Q(*42),º#81<*2),063 v) ](*37.#U63-:;(*3/+),0©:B37.7¥G)B+);6;£ Z (*3/+),0:B37./¥G)B+);6+3G¢-G++^¢8Q.1:B6<*37.1+%-A2)-G(L+3¿1.#3xh%.-G+ µV/A»;,¾-A¢¾6);2\eR);¥P<!dTe);¥#<TgÛ aGcAÒ f£¨ÜÀ+<.#U:B2),0Q-G(v+);6+\<.6#)Q(L-G:B)3G¢\4-G++¯¢8Q.1:B6<*37.1+°4-A¿G),+:B2),0Q-G(¨.#);6¡h3G2¿#+-/. ¶ÂÝ^²vµV¶,=²v7Þ7Þ;¶Â߶¾´ 3P0Q),(%d¾®¹-G(L(*);Gg5Û aGa ÛFf£H 6\:;-/. )5+#3xh%.D61-A6-:B2),0Q-G(.1);6¡h3G2¿D<L+),ºP81<*¥ ]-G(*)V.7663©- ;,5à4á^G´/­B¶¾/â;¾ h^<*6ª6#)+-/)UG2-GY£'5.¬<v3G26-/.76¯ºP81),+6<*37.¹<L+¯hÀ#);6#);23G2=.13G6°-G(L(

:B2),0Q-G(.#);6¡h3G2¿#+©:;-/.? v)¹2),Q2),+)V.76),0ã<.ã-âh\-,61-A6)V4Y1-G+<*W;),+(*3P:;-G(L<*6`G£¸^#)-/.1+h¨);2<L+v3/+<*6<*¥G)<*¢»h¨)2),+62<L:B66#)=-A66)V./6<*37.636#)=3/+6¦v3/Y81(L-A26¡#v)°3G¢:B2),0Y-G(».#);6`h¨3G2¿P+;gv6#3/+):;-G(L(*),0 ;`²1G³/¡B*´Ð²Qµ,¶ · dÐNP),:B6<*37.åä#f£ H .$61<L+:;-G+)Gg¯),-G:|?:B37. ]

0Q<*6<*37.1-G(4-G++5¢8Q.1:B6<*37.©<L+¯-G(L(*3Vh),0¹63§¥A-A2©<.©<*6+:B2),0Q-G(1+);6<.101),v)V.101)V.76(*3G¢v6#)¦3G6#);2+,£!¸%#)\2), ]2),+)V.76-A6<*37.¬<L+.1-A6812-G(L(*(*3#:;-G( ),:;-/8Q+)6#);2)-A2).#3«2),(L-A6<*37.1+|1<LQ+ v);6¡h);)V.æ0Q<ßE);2)V./6§:B2),0Y-G(À+);6+;£¸^#)ªºP8#),+6<*37.?<L+©3G2)ª:B37Q(L<L:;-A6),0ãh^<*6Ã3G2)UG)V.#);2-G(R+v),:;<*XY:;-A6<*37.1+3G¢!:B2),0Q-G(O.#);6¡h3G2¿#+;gQh%1<L:­h)Å:;-G(L( YGY±Ð,`²#GG¡;ß´$¡²1µV¶ · £ç¸^#)Å<L01),-Ù3G¢.#37. ] +),Q-A2-A6),(*+),:;<*XY),0:B2),0Y-G(v.#);6+\<L+<.¢¾-G:B663-G(L(*3Vh«¢¾3G2R2),(L-A6<*37.Q+1<LQ+» v);6¡h);)V.°:B37.10Y<*6<*37.1-G(P4-G++¢8Q.1:B6<*37.1+<.40Q<ßE);2)V./6¨:B2),0Q-G(Q+);6+,g7h%1<L:­:;-/.4-G(L+3 v)¯¢¾-A2%-,h\-,<.6#).#);6;£'À(*6#378#U746#)5<L01),-°3G¢R.137. ] +),Q-A2-A6),(*D+v),:;<*XQ),0

:B2),0Q-G(.1);6+<L+2),(L-A6<*¥G),(*â<.7681<*6<*¥G)Gg¦<*6+#3781(L0è v)+62),++),0J6Q-A66Q<L+°¿#<.10ª3G¢%.#);6+1-G+ );)V.ª<.7¥G),+ ]6<*U/-A6),0$¥G);2è(L<*66(*)GĬ<.è¢T-G:B6;g¦6#);2)@1-G+ v);)V.â.#3-A66)VQ6+3Å¢¾-A2D63«01);¥G),(*3/Ø-JUG)V.#);2|-G(^UG2-G1<L:;-G((L-/.#U781-AUG)63@01),+:B2<L v)6#)V§é!-/.10¬61);2)4<L+=.13-G( ]UG3G2<*6Qê63§:B374Y8#6)°h%<*661)V§£*문^1<L+5-GQv),-A2+

ìÐíRîï³ðxñ³ïòVóÐôöõ­îôö÷¦óÐøFïñ³ùûú÷÷ôöüAñúóÐôöõ|î@úùöý­õ|þÐôÿóÐøVïBïùõ­òGï*õþóÐøFï ! "#$%'&7ú)(7ú|ùöõ­î+*-,..0/)1 * 243Vóôÿó\ôö÷¦úøxõ,ñßõ|þú5Bï³þ6÷òGïñ³ôÿüGñó76VòGïõOîFï³ó98õ|þ!:<;>=Àõ|þÐïý­ï³îFï³þ¡úùûù64÷òGïú:Vôöîxý*ôÿóôö÷¦îFõó?3Fîxùöô@:­ïù6óÐøAúó¦÷õ¦ï%õ»óÐøxï

Page 36: Proceedings of the Third European Workshop on Probabilistic

63« v)-/.Ø8Q.#¢¾3G268Q.Q-A6)U/-GØ<. 6#)(L<*6);2|-A68#2)¬-G+6#)§.#37. ] +),Q-A2-A6)+v),:;<*XY:;-A6<*37.«+);)V+=63© )6#)¿G);563%3P01),(/4-/./À<v3G26-/.76123/ Q(*)V4+;gV-G+O<L(L(81+ ]62-A6),0©<.NP),:B6<*37. _ £ÀNP),Q-A2|-A6),(*+v),:;<*XQ),0:B2),0Q-G(.#);6+,gV37.¯61)3G6#);2»1-/.Q0gx1-V¥G)¨ v);)V.56#)+8Q #¼³),:B63G¢81:­-G(*UG3G2<*6Q4<L:01);¥G),(*3/)V.76\d Z 3GWV4-/.g bAÒGÒ/Ó f£H .À61<L+OQ-Gv);2h¨)U/<*¥G)¨6¡h3À4-F¼³3G2R:B37.762<L Y816<*37.1+;£

½ <*2+6;gYh¨)01);X.#)-48Y.1<*XQ),0§UG2-G1<L:;-G((L-/.#U781-AUG)63(*3#:;-G(L(*å+v),:;<*¢ÂØ:B2),0Q-G(=.#);6¡h3G2¿#+<.Ø6#)JUG)V.#);2-G(:;-G+)§dÐNP),:B6<*37. Ó f£%¸%#).#);hÃ2),12),+)V./6-A6<*37.<L+5<. ]+Y<*2),0g¥#<L-¬6#) Z^Z > 62|-/.1+¢¾3G2|-A6<*37.åd Z -/.#3¹);6-G(T£LgÛ aGa ä#fg° 9$6#)¹¢¾3G2|-G(L<L+×3G¢<. 81)V.1:B)¹0Q<L- ]UG2-/4+;g-/.10æ3G2)UG)V.#);2-G(L(* 3G¢=01),:;<L+<*37.ÙUG2|-GY1+dÐCQ-/.#U¯);6¨-G(T£LgÛ aGa Ñ9f£ H .61<L+(L-/.1U781-AUG)^6#)^UG2-GY3G¢R-:B2),0Q-G(.#);6\<L+\-/8#U7)V.76),0Dh^<*64:B37.7623/(.#3#01),+61-A6\)B112),++\61)À2),(L-A6<*37.1+1<LQ+¦ v);6`h¨);)V.§0Y<ßEv);2)V.76:B2),0Y-G(+);6+;£®¬)°U/<*¥G)°)B1-/Q(*),+\634+|#3Vhã61-A6¦6#).#);hÁ(L-/.#U78Q-AUG)4123x¥P<L01),+37.#)h^<*6©-.1-A68#2-G(!h\-,6301);Xv.#).#37. ] +),Q-A2|-A6),(*+v),:;<*XQ),0¹.1);6+;éO-/.10©h)U/<*¥G)-123#:B),0Y8#2)°63§2);¢¾3G2|81(L-A6)-/./+),Q-A2-A6),(*+v),:;<*XQ),0.1);6^<.6#)°.#);hã(L-/.#U781-AUG)G£N#),:B37.10gh¨)©-A¿G)-J¥G);2â+<Y(*)§3/ Q+);2¥A-A6<*37.

dÐNP),:B6<*37.7fg7h%Q<L:|1-G++|8#212<L+<.#U/(*=v3Vh);2¢81(#< ]Q(L<L:;-A6<*37.Q+;Äh¨)5+#3xh 6Q-A6¢¾3G2-/.74:B2),0Q-G(.#);6`h¨3G2¿+v),:;<*XQ),0@h^<*6@6#).#);hÁ(L-/.#U781-AUG)6#);2)<L+5-+), ]-A2-A6),(*@+v),:;<*XQ),0:B2),0Y-G(».#);6`h¨3G2¿g01);X.#),03V¥G);2¯-(L-A2UG);2§01374-G<.g¦h%1<L:­Ù<L+D),º#81<*¥A-G(*)V./6;£Á¸^#)123 ]:B),0Y812)463©62-/.1+¢Â3G2| 6#)D¢Â3G2­);2=<.763©6#)(L-A66);2.#);6`h¨3G2¿ <L+¥G);2$+<Q(*)GgÀ-/.10$6-A¿G),+§37.1(*$(L<.#),-A26<)G£^¸^#)=¿G);v3/<./6À<L+%6Q-A6%61<L+À123#:B),0Y8#2)°:;-/. v)81+),0¬-G+¯-63P3/(63ÙI+),Y-A2-A6),M§6#):B2),0Y-G(+);6+3G¢¨.137. ] +),Q-A2-A6),(*©+v),:;<*XQ),0¬.#);6+;£=¸^1<L+¯-A¿G),+5<*6v3/++<L Y(*)°63§3#01),(Tg 9+),Q-A2|-A6),(*+v),:;<*XQ),0©.#);6+;g123/ Q(*)V+°¢Â3G2|);2(*¹3#01),(L(*),0Å 7ª.137. ] +),Q-A2-A6),(*+v),:;<*XQ),0=37.#),+;é9-/.101)V.1:B)¨6381+) GY´ dT 3G6=)B1-G:B6-/.10«-GQQ23,1<-A6)Af)B1<L+6<.#UJ-G(*UG3G2|<*6Q ¢¾3G2+),Q- ]2-A6),(*§+v),:;<*XQ),0@.#);6+¦634+3/(*¥G)+81:­§123/ Q(*)V4+;£N#37):B374)V.76+=37.Å61<L+2),+81(*6-/.10èv);2+),: ]

6<*¥G),+¢Â3G2¢T8168#2)01);¥G),(*3/)V.76+-A2)@0Q<L+:,81++),0Ù<.NP),:B6<*37. c £¦¸^#)=3G2)¯6),:|Y.1<L:;-G(!Q-A26+3G¢!61<L+%Q- ]v);2-A2)°:B3/(L(*),:B6),0<.'ÀQv)V.10Q<ßD'°£ ]

ï³ðVôö÷TóÐôûîxý¯ú|ùöý|õ|þÐôÿóÐø4¦÷ *õþ÷ï³òAúþ¡úóÐïù6÷òGïñ³ôÿüAï îxï³óÐ÷ñú|î 2Gïï³ð,óÐïî Vï °óÐõ÷òGïñ`ô úùñú÷ï÷õOîxõ­î0Â÷ï³òAúþ¡úóÐï÷òGïñ³ôÿüGñúóÐôöõ­î *243Vó 8ïRúþÐïîFõóú8úþÐïõ7úî 6¨ò432Fùöôö÷øFï 8õ|þ!:xïúùöôûîxý 8ôÿóÐøóÐøFôö÷ôö÷÷3Fï;

FÏ`ιÈËÉ vÉ/ÏÐËOÈȨÌAÏÈ È1ÉeR);6¨81+XY2+601);X.1)+37)%.13G6-A6<*37.4-/.106#)^¢8Q.10Q- ])V./6-G(¦.#3G6<*37.«3G¢5\-,G),+<L-/.$.#);6`h¨3G2¿v£«e);6d ë! ! ! #"f v)è-æ:B3/(L(*),:B6<*37. 3G¢2-/.10137 ¥A-A2< ]-G Q(*),+,g%h%Q<L:|$6-A¿G)¬¥A-G(8#),+<. X.1<*6)¹+);6+,gÀ-/.10%$-Å0Q<*2),:B6),0Ø-G:B#:;(L<L:@UG2|-GYãdTK^''&=fgh%#3/+).#3#01),+-A2)-G++3#:;<L-A6),0ª63@6#)¥A-A2<L-G Q(*),+<.(£ ½ 3G2=),-G:­#)*+g-,/.10<L+¦6#)°v3/++<L Q<L(L<*6¡§+Q-G:B)¯3G¢2#)³g43)!-UG)V.#);2|<L:°),(*)V)V./653G¢5,/.10g764d#)³f\--G++%¢8Q.1:B6<*37.¢¾3G28#)¨-/.1096d:3)³f61)123/ Q-G Q<L(L<*6¡61-A6;#)3)³£¸^1)Q-A2)V.76+3G¢2 ) g-G:;:B3G20Q<.#U63$¨gY-A2)°0Q)V.#3G6),0 96#)¼3/<./6¨¥F-A2|<L-G Q(*)=<=)`gPh%#3/+)^v3/++<L Q<L(L<*6`+Q-G:B)<L+1,/>-0£ ½ 3G2»),-G:|@?4)*A,/>B0gC6d#)ED ?)¡f<L+6#)¨:B37.10Q< ]6<*37.1-G(-G++»¢8Q.1:B6<*37.¢¾3G2#)U/<*¥G)V.6#)!¼3/<./6¥A-G(8#)?)O3G¢»6#)Q-A2)V.76+¦3G¢#)`£¨¸^1<L+\¢¾3G2|-G(L<L+Ú<L+^+8GF ]:;<*)V.76634123/);2|(*D<.7623#0Y81:B)¯6#)¯¢¾3/(L(*3Vh^<.1U1ÄHJILKMN N:OBMQP4R '«\-,G),+<L-/..1);6¡h3G2¿dTTS°f3x¥G);21<L+!-5Q-G<*2VUW$ YX1Z +81:­°6Q-A6 X <L+-5+);6!3G¢Y:B37.10Y<*6<*37.1-G(4-G++¢T8Y.1:B6<*37.1+64d#)[D ?)¡fg37.#)¢¾3G2),-G:|#)*9-/.10J?)C*=,/>-0£®¬)-G++|8Q)61)]\ GA_^DµGY/¶Â¶TG 634-A¿G)#$

2),Q2),+)V.76¹123/ Q-G Q<L(L<L+6<L:â<.101),v)V.101)V.Q:B)ª2),(L-A6<*37.Q+ v);6¡h);)V. 6#)¥F-A2<L-G Y(*),+D<.`Ĭ);¥G);2â¥A-A2<L-G Q(*)¬<L+<.10Q),)V.Q01)V./6À3G¢\<*6+°.#37. ] 01),+:B)V.10Q-/.76°.#37. ] Q-A2)V.76+:B37.10Y<*6<*37.1-G(37.J<*6+Q-A2)V./6+;£©¸^P81+;g-TS 01);6);2 ]4<.#),+-¯¼3/<.76À-G++¢8Q.1:B6<*37.+6df\-G:;:B3G20Q<.#UD636#)¯¢¾3/(L(*3Vh%<.#U4¢T-G:B63G2<*W,-A6<*37.Ä

6d:a!f"b)dc ë

6d:3 ) D ? ) f dÛFf

¢¾3G2),-G:|a(*A,/e5g/h%1);2)\¢Â3G2),-G:­gf1ÁÛ ! ! ! hEi 6#)¥A-G(8#),+°d:3-) ?)¡f-A2)°:B37.1+<L+6)V./6h%<*6a£j kÊYÌl8C1ÉJÈ¨Ì Î#ÊYÌ7l5È1ÉmËÊn2Z 2),0Q-G(.#);6¡h3G2¿#+)B#6)V.10Ù\-,G),+<L-/. .#);6+63J01),-G(h^<*6 <12),:;<L+<*37.Á<. 123/ Q-G Q<L(L<*6¡G£ ¸^1<L+J<L+¬3/ ]6-G<.#),0¬ 9),-/.1+À3G¢¨:;(*3/+),0J:B37./¥G)B¹+);6+53G¢123/ ]-G Q<L(L<*6`«4-G++¢T8Y.1:B6<*37.1+;ghÀ1<L:|Å-A2):;-G(L(*),0 µV/A,;¾ dTe);¥#<TgÛ aGcAÒ f£®¬)¯¢¾3/(L(*3Vh d Z 3GWV4-/.g bAÒGÒGÒ f¨<.:B37.1+<L01);2<.#U37.1(*¹X.Q<*6),(*JUG)V.#);2-A6),0è:B2),0Q-G(\+);6+;g<T£ÿ)G£LgP3/ 16-G<.#),0-G+61)%:B37.7¥G)BP81(L(13G¢-Xv.1<*6)À.P8Q ] v);23G¢-G++¯¢T8Y.1:B6<*37.1+;£J&À);37);62<L:;-G(L(*Gg-:B2),0Q-G(+);6¯<L+¯- ²#A*´AÐ|²1 £' :B2),0Q-G(!+);6:B37.76-G<.1+¯-/.©<.#X ].1<*6)5.98Y= v);23G¢R4-G++¢8Q.1:B6<*37.Q+;gP 8#637.1(*-=X.1<*6).P8Q v);2»3G¢ Eo7,Ý5ÝFRàqp#Yµ,¶TG# Ä!6#3/+)%:B3G22) ]+v37.10Y<.#U=6361) ^F;­¶Tµ|­ 3G¢»6#)v3/(*963/v)Gg1h%Q<L:|

26 A. Antonucci and M. Zaffalon

Page 37: Proceedings of the Third European Workshop on Probabilistic

-A2)GgY<.§UG)V.#);2-G(Tg-4+81 Q+);6¦3G¢!6#)¯UG)V.1);2-A6<.#UD4-G++¢8Q.1:B6<*37.1+;£å' :B2),0Q-G(%+);643V¥G);2J <L+D01)V.#3G6),0$-G+ d¬f£§®¬)+<<L(L-A2|(*©0Q)V.#3G6)-G+ d D Yf5- µGQ±G¶¾¶¾/YA°µ,7A¯;, 3x¥G);29 U/<*¥G)V.å-«¥A-G(8#)Ù3G¢-/.#3G6#);22-/.10137¥A-A2<L-G Q(*)4g¯<T£ÿ)G£Lg-â:B2),0Y-G(¯+);63G¢=:B37.Q0Q<*6<*37.1-G(¯-G++¢8Q.1:B6<*37.1+64d D Qf£ &¯<*¥G)V.- G¶¾Qµ,7AR,; d 4fg6#) ÝGG¶¾YAµV/A;, ¢¾3G2 <L+61):B2),0Q-G(\+);6 d¬f3/ 16-G<.1),0â 76#)v3/<.76 ] h^<L+)%4-A2U/<.1-G(L<*W,-A6<*37.3G¢ã¢Â237 -G(L(16#)¼3/<./6§-G++¢8Q.1:B6<*37.`64d f9* d f£ ½ < ].1-G(L(*Gg!U/<*¥G)V.ª-+81 Y+);6 , . ,/.g-Y-A26<L:,81(L-A2(*<v3G26-/.76!:B2),0Q-G(#+);6!¢Â3G237812!8#2v3/+),+»<L+!6#) ^F/µ,±p1_p9µV/A!,; ¢¾3G2 , . g<T£ÿ)G£Lg61)+);653G¢-G(L(!4-G++¢8Q.1:B6<*37.1+3V¥G);2+ -G++<*U7.1<.#U«123/ Q-G Q<L(L<*6`â37.#)63, . £ H .6#)^¢Â3/(L(*3xh^<.#U°h)h^<L(L(81+)6#)^h¨),(L(1¿1.#3xh%.¢T-G:B661-A66#)¦¥G);26<L:B),+3G¢v+8Q:|-¯:B2),0Q-G(1+);6-A2)¦6#)D , . D70Q);UG)V.#);2-A6)-G++¨¢8Q.1:B6<*37.1+\-G++<*U7.Q<.#U123/ ]-G Q<L(L<*6`§37.#)¯6346#)+<.#U/(*)),(*)V)V.76+3G¢2, . £HJILKMN N:O-MR ' :B2),0Q-G(¨.#);6`h¨3G2¿ d Z S°f%3V¥G);2g<L+\-=Y-G<*2 UW$ i X ë ! ! ! hYX u Z +81:­61-A6 UW$ YX Z <L+\-\-,G),+<L-/..#);6¡h3G2¿3x¥G);2' ¢¾3G2),-G:­ Û ! ! ! C £¸^1)^ S%+ i UW$ YX_Z u c ë -A2)^+-G<L0463° v)61) µGݱ²#G¶ÞB SÀ+¹3G¢6#) Z S:B37.1+<L01);2),0 <. KÀ);X.1< ]

6<*37. b £¸^1) Z S UW$ i X ë ! ! ! hYX u Z :;-/.§ v)58Q+),06301) ]6);2|4<.#)¯6#)¯¢¾3/(L(*3Vh^<.#UD:B2),0Q-G(+);6;Ä

d@fÄ Z\[ i 6 ë d@f ! ! ! C 6 d@f u d b f

h%#);2) Z¦[ 01)V.13G6),+è6#)å:B37.7¥G)B P81(L(D3G¢©-Á+);63G¢Ø¢T8Y.1:B6<*37.1+;gå-/.10 6#)Á¼3/<./6Ú-G++ ¢T8Q.Q:B6<*37.1+i 6 d@f u c ë -A2)D6#3/+)01);6);2|4<.#),0J 9¬6#):B37 ]Q-A6<L Q(*) S%+3G¢61) Z S=£O®Ø<*6Å-/.J-G 81+)D3G¢¦6);2 ]<.13/(*3GUGGgPh)¯:;-G(L(6#)5:B2),0Q-G(+);6¦<.D¤º#81-A6<*37.¬d b f6#) ­G[o/¡,#­¶TG 3G¢Y6#) Z S=gG 7-/.1-G(*3GUGh^<*66#).#3G6<*37.JQ23V¥#<L01),0J<.¹6#)+),:;<L-G(¨:;-G+)43G¢ ,`²v±GG¡B*´DвQµ,¶ ·Z SÀ+dT+);)NP),:B6<*37.@ä#f£ H .#¢¾);2)V.1:B)3V¥G);2- Z SÙ<L+<.76)V.101),0-G+61)\:B374Y8#6-A6<*37.3G¢81 ]v);2-/.10§(*3Vh);2À)B#v),:B6-A6<*37.1+^¢Â3G2^-U/<*¥G)V.¢T8Y.1:B6<*37.3G¢5 3V¥G);261):B2),0Q-G(!+);6 d@fg3G25),º#81<*¥F-G(*)V.76(*3V¥G);25<*6+¦¥G);26<L:B),+d¾®¹-G(L(*);Gg!Û aGa ÛFf£

15vÊ É_l /YÎ1Ï5YÌÁÎ#ÊÌ l5È 1ɸ^#) -G<. ¢¾),-A68#2)Ú3G¢ Q23/ Q-G Q<L(L<L+6<L: UG2-G1<L:;-G(3#01),(L+;g%h%1<L:­Ø<L+6#)¬+v),:;<*XY:;-A6<*37.æ3G¢-«U/(*3/ Q-G(3#01),(O6#2378#U7-:B3/(L(*),:B6<*37.¬3G¢+81 ] 3P0Q),(L+%(*3#:;-G(6361).#3#01),+¯3G¢\6#)UG2-GgO:B37.762-G+6+=h^<*6¹KÀ);¢ ]

<.1<*6<*37. b ghÀ1<L:|2),Q2),+)V.76+À- Z S -G+¯-/.)B1Q(L<L:;<*6)V.P8Q);2-A6<*37.§3G¢! SÀ+;£S%);¥G);26#),(*),++,g6#);2)-A2)+v),:;<*XY:+|81 :;(L-G++),+3G¢

Z SÀ+61-A6À01);X.#)-+);6^3G¢ SÀ+%-G+À<.KÀ);X.1<*6<*37. b6#23781U7=(*3#:;-G(1+v),:;<*XY:;-A6<*37.1+,£!¸%1<L+O<L+!¢¾3G2!)B#-/4Q(*)6#)5:;-G+)À3G¢ Z S%+¨h^<*6+),Q-A2-A6),(*D+v),:;<*XQ),04:B2),0Q-G(+);6+;g! @h%1<L:­è-A2)+<Q(*Å:;-G(L(*),0 ,`²#GG¡;ß´ªÐ²Qµ,±¶ ·æ"¯ <.Ù6#)¢¾3/(L(*3Vh%<.#U1£Á¸^1<L+4+),:;<*X:;-A6<*37.2),º#81<*2),+=),-G:|«:B37.10Y<*6<*37.1-G(-G++°¢T8Y.1:B6<*37.ª63 v) ](*37.#U63-¬dT:B37.10Q<*6<*37.1-G(¾fÀ:B2),0Q-G(+);6;g-G:;:B3G20Y<.#U§636#)¢¾3/(L(*3Vh%<.#U1ÄHJILKMN N:O-M$# R ' +),Q-A2-A6),(*ª+),:;<*XY),0 Z SÚ3x¥G);2Á<L+R-^Q-G<*2TUW$ &%8Z g#h%#);2) % <L+O-+);63G¢Q:B37.10Y<*6<*37.1-G(:B2),0Q-G(!+);6+ d#)[D ?4)`fg37.#)¢Â3G25),-G:|Q#)/* -/.10?4)*A,/>B0£¸^1) ­G Eo7¡;#B¶¾G d@f3G¢°-Å+),Q-A2-A6),(*

+v),:;<*XQ),0 Z S <L+@01);X.#),0æ-G+6#)¹:B37./¥G)BãP81(L(¯3G¢6#)¼3/<./644-G++¢8Q.1:B6<*37.1+g64d@fgh^<*6g¢¾3G2),-G:­a(*A,/e5Ä

64d:a!f"b)dc ë

64d:3)[D ?)¡f 64d#)[D ?4)`f5* d#)ED ?4)³f ¢Â3G2^),-G:|9#)*A ?)1*A<')

dÑ9f[ );2) d ) D ? ) f4:;-/.å v)©2),Q(L-G:B),0? 9$6#)J+);63G¢<*6+\¥G);26<L:B),+dT+);)'23/v3/+<*6<*37.Û%<.6#)¯-GQv)V.10Y<ßf£NP),Q-A2|-A6),(*+v),:;<*XQ),0 Z S%+¯-A2)61)3/+6¯v3/Y81(L-A26¡#v)3G¢ Z S°£'À+-Å3G2)@UG)V.1);2-G(5:;-G+)Gg%+37)-/816#3G2+D:B37. ]

+<L01);2),0+3 ] :;-G(L(*),0 Eo7¡;#B¶ ^A +),:;<*X:;-A6<*37.1+3G¢ Z SÀ+d ½ );22),<*2-0Q-)(^3P:­1--/.10 Z 3GWV-/.g bAÒGÒ/b fg»hÀ#);2)<.1+6),-G03G¢»-+),Q-A2|-A6)°+v),:;<*XY:;-A6<*37.¢¾3G2),-G:|:B37. ]0Q<*6<*37.1-G(¯4-G++D¢T8Q.Q:B6<*37. -G+<.$KÀ);X.1<*6<*37.æÑPgÀ6#)UG)V.#);2<L:=123/ Q-G Y<L(L<*6¡§6-G Q(*)@64d#)[D <')`fg<T£ÿ)G£Lg-¢8Q.1: ]6<*37.@3G¢ v3G6A ) -/.Q09< ) g<L+501);X.#),063D v),(*37.#U63-X.1<*6)§+);63G¢6-G Q(*),+,£ª¸^#)+6237.1U)B#6)V.1+<*37.Å3G¢-/.4)BP6)V.1+<*¥G) Z Sæ<L+¨3/ Q6-G<.#),0-G+\<.¤º#81-A6<*37.©dÑ9fg 9ª+<4Q(*¬2),Q(L-G:;<.#U©6#)§+),Y-A2-A6)§2),º#81<*2)V)V.76+¢¾3G2),-G:|¬+<.#U/(*)4:B37.10Q<*6<*37.Q-G(-G++À¢8Q.1:B6<*37.gRh^<*6)B#6)V.1+<*¥G)D2),ºP8Q<*2)V)V.76+°-G v378#6=6#)46-G Q(*),+=h%1<L:­6-A¿G)°¥A-G(8#),+%<.6#):B3G22),+37.Q0Q<.#UX.1<*6)+);6+;£H .6#)À.#)BP6+),:B6<*37.gPh¨)À123x¥P<L01)%-/.4-G(*6);2|.Q-A6<*¥G)

01);X.Q<*6<*37.Ù3G¢ Z S=gh%<*6$6#)¬+-/)UG)V.#);2-G(L<*6`$3G¢KÀ);X.1<*6<*37. b g» Y8#653/ 16-G<.#),0¹6#2378#U7©(*3#:;-G(+v),:;< ]XY:;-A6<*37.-G+^<.§KÀ);X.1<*6<*37.@ÑP£*,+,õ¦ïú3xóÐøxõ|þÐ÷ 3F÷ïúùû÷õ\óÐøxï!ï³ðVòxþÐï³÷÷ôûõ|î !- ! . " -

/10 ÷ % / õ32ú|î+*<,...1 ;

Locally specified credal networks 27

Page 38: Proceedings of the Third European Workshop on Probabilistic

ËÎGl /Î1ÏÎGvÉ/ÏÐËOÈãË°Î#ÊYÌ7l5È1ÉH . 61<L+§+),:B6<*37.Øh)¹123x¥P<L01)¬-/.Ø-G(*6);2|.Q-A6<*¥G)J-/.10G);6!),º#81<*¥F-G(*)V.7601);X.1<*6<*37.=¢¾3G2 Z S%+»h^<*6=2),+),:B6»63KÀ);X.1<*6<*37. b gPh%1<L:­<L+<.1+Y<*2),04 96#)%¢Â3G2|4-G(L<L+3G¢ 9µ,¶ ­¶TGD;/x dÐC1-/.#UÀ);6!-G(T£LgQÛ aGa Ñ9fO¥P<L-%6#)ZZ >Ö62-/.Q+¢Â3G2­-A6<*37.¬d Z -/.#34);6^-G(T£LgOÛ aGa ä#f£HJILKMN N:OBMR ' (*3#:;-G(L(*å+v),:;<*XQ),0å:B2),0Y-G(°.#);6 ]h3G2¿â3x¥G);2+ <L+D-ª62<LQ(*);6(UW$ d f d YX f Z+8Q:|61-A6;ÄdT<¾f=$$<L+¯-§K''&Ú3x¥G);2 A ¾édT<L<¾f <L+^-4:B3/(L(*),:B6<*37.3G¢!+);6+8, 0.10 ,/.10gQ37.#)°¢¾3G2),-G:­`#)J* -/.Q0+?4)J* <=)édT<L<L<¾f X <L+-Å+);63G¢:B37.10Q<*6<*37.Q-G(O-G++¢8Q.1:B6<*37.1+V6d ) D ? ) fg37.#)°¢¾3G2),-G:­+#)2*+ -/.Q0+?4)*A,/>£®¬)<.76)V.10§63+|#3Vhæ61-A6KÀ);X.Q<*6<*37.ä+v),:;<*XQ),+

- Z S 3x¥G);26#)¥A-A2<L-G Q(*),+<. é6#).#3#01),+:B3G2 ]2),+37.Q0Q<.#U63] -A2)°61);2);¢Â3G2)°:;-G(L(*),0 pPµ|;­ÐG¶¾-/.10Dh^<L(L( v)501),Q<L:B6),0§ 94:;<*2:;(*),+;gQhÀ1<L(*)56#3/+)¯:B3G2 ]2),+37.Q0Q<.#U63A -A2)4+-G<L0 7µV¶B¶¾/èY,7B -/.10h^<L(L(% v)01),Q<L:B6),0$ 7â+º#81-A2),+,£ØeR);6D81+-G++3#:;<L-A6)),-G:­Ù01),:;<L+<*37.Ù.#3#01)+#) * h^<*6â-¹+3 ] :;-G(L(*),07µ,¶ ­¶TG¯àqp#Yµ,¶TG . 0!Ä ,/>-0 ,/. 0R2);68#2|.Q<.#U°-/.),(*)V)V./63G¢ , 0.10 ¢¾3G2^),-G:­A?)2*9,/>B0£ Z -G(L( ­G¡ G´ -/.â-A22|-,â3G¢01),:;<L+<*37.â¢8Q.1:B6<*37.1+,g¦37.#)¢¾3G24),-G:­#);*%£§®¬)D0Q)V.#3G6)-G+#,©6#)D+);6°3G¢-G(L(6#)v3/++<L Y(*)+62-A6);U/<*),+;£¤¨-G:|¬+62-A6);UG * ,01);6);2|<.1),+¯- S 3x¥G);2

¥#<L-¯K%);Xv.1<*6<*37.ä1£'$:B37.10Q<*6<*37.1-G(Y4-G++!¢8Q.1:B6<*37.64d#)ED ?4)³f¦¢¾3G2¯),-G:|¹8Q.Q:B);26-G<.©.#3P0Q)##)/* »-/.10? ) *A, >-0 <L+»-G(*2),-G01°+v),:;<*XQ),0 7 X £!¸O3À01);6);2­<.#)-Å SÖh)¬Q-,¥G)¬6#)V. 63è+<Y(*è2),12),+)V.76§01),:;< ]+<*37.Q+^¢8Q.1:B6<*37.1+À 9@-G++^¢T8Y.1:B6<*37.1+;Ħ¢¾3G25),-G:|01) ]:;<L+<*37..#3#01)8#)*JÙ-/.10]?)1*9,/>-0g#h)À:B37.Q+<L01);26#):B37.10Q<*6<*37.1-G(14-G++¢8Q.1:B6<*37.1+16xd#)[D ?4)`f-G++<*U7. ]<.#U$-G(L(6#)«4-G++@63 6#)Å¥A-G(8#) .10d?4)`fgh%#);2) .10<L+@6#)ª01),:;<L+<*37.?¢8Q.1:B6<*37.?:B3G22),+v37.10Y<.#Uâ63 £ ¸^1)¹TS 3/ 16-G<.#),0?<.Ø61<L+§h\-,Øh^<L(L(¯ v)¬01) ].#3G6),0Ã-G+ UW$ YX Z g=h%1<L(*)J¢¾3G26#)Å:B3G22),+v37.10Q<.#U¼3/<./6-G++4¢T8Y.1:B6<*37.g\h¨)©:;(*),-A2(* Q-,¥G)Gg¢¾3G24),-G:­a9Ád:a a f5*A,Ve5Ä6Fd:a a f b

.Ce 6xd:3 D ?Tf! b.10Ce#" 64d:3-) D ?)³f

d¾ä#f¸%#)@.#)B#64+6),$<L+461)V.è3/ 9¥P<*3781+,Ä@h)h\-/./6D63

01);Xv.#)=- Z S 9),-/.1+%3G¢6#)+);653G¢ SÀ+¯01);6);2 ]$&%'(' 0 ñ`õ|þþÐï÷òGõ|î x÷óÐõúòFúþÐïî;óÐùöï÷÷îxõ xï»õ*)#*xú÷ôûîxý­ùöï

÷ï³óRï,+ 3Aúù/óÐõ¨óÐøFï 8øxõ­ùöï.-0/1ôö÷Oñ³õ­îx÷ô@xï³þÐï -;

4<.#),0â 9â-G(L(6#)©v3/++<L Q(*)+62-A6);U/<*),+;£ã¦8#66#)v3/<.76^<L+¦h%#);61);2^3G2À.13G6^-G(L(6#),+).#);6¡h3G2¿#+%1-V¥G)6#)+-/)K''&gv-G+2),º#81<*2),0§ 9K%);Xv.1<*6<*37. b £¸O3+13Vhå61<L+h¨)=.1);),0§6#)¢¾3/(L(*3Vh^<.1U1Ä2 x M 43 O 65¹Q N:O-M PR87 ¶ ^A; *,µ|A¾ß´ вQµ,¶ ·" UW$ d9 f d X f Z;: 7Þ;Ð/¶Âè=<?> 7 $ ^F­±;,µ,¶TG¡JйA@#+^xG­¶T7ÞB­ ¶¾¡;G¶¾ : à;G/µB@DC]* : A@#^àBGÂLG¶¾|²1;G¶TG#D^F; $EGF ¶IHµ|GQµ,5¶ÂA@ª/«/³µGÂA@#²#G;Q¾à DC ¶¾A@A¾A@1µJ@P¶ LG;@à DCLKMF ¶¾¶NH,Ý_^FA@#¯YV7µ|G|±­¡²#GYG¶¾Ð DCPO

½ <*U78#2)ÛAÄ!¸^#)¦K''& $ 2);6812|.#),0 7¸O2-/.1+¢¾3G2|- ]6<*37.ªÛ¯U/<*¥G)V.-4(*3#:;-G(L(*+v),:;<*XQ),0 Z SãhÀ#3/+)=K^''&<L+»61-A6<. ½ <*U78#2) b d¾3G2-G(L+3 ½ <*U78#2)¦Ñ%3G2 ½ <*U78#2)\ä#f£

½ <*U78#2)ÛAg2),v3G26+À-/.)B1-/Q(*)=3G¢¸O2-/.1+¢¾3G2|- ]6<*37. ÛA£@¸^1)K''& $ 2);6812|.#),0J 9¹¸O2-/.1+¢¾3G2|- ]6<*37.¬Û°<L+^:B37.1+<L0Q);2),0 96#)°¢Â3/(L(*3xh^<.#U1Ä2RQ I O I 5 PRTS @#ÝGG¶¾YAOàB/ ;*/¶ ^A©ÐUW$ YX Z;: ¶ O O A@#°ÝA!à p#YµV¶¾G 6Fd f qp1µJ@DA@#G

6 d:a f1 UV 6W*X

6Vd:a a f d Ó f

à;G/µB@ a * , e " : à;/µ,Ð/|¶ZYVBAA@1 G¶¾QÝFàqp#Yµ,¶TGèàá " UW$ YX Z _^F, : @#; $ ¶ A@#?<[> 7 7Þ,ÐG¶¾à­GÝ $ Þ;´ S ³/#Âà;G­Ý/¶¾G]\ O½ 237 h%Q<L:|gY:B37.Q+<L01);2<.#U46#)= SÀ+ UW$ YX Z ¢¾3G2),-G:­§+62-A6);UG *A,[Yg1<*6<L+¦v3/++<L Y(*)À63:B37.Q:;(8101)GÄ

^ O O0_`_ Y4a PR > L,µ|AÂ*´4в1µ,¶ ·©" F¶Â<TàB±¶¾Q¶¾¶¾GTb²v|²1;|*´©9T·Bª" ^F; : ÞF;G©A@#c<?> 7 $ ;Wp#­Þ,´ S ³G1Âà;G|ÝG¶TGd\ OH 6<L+h3G267©63.#3G6)6Q-A6°-/./ Z SÚ01);Xv.#),0¬-G+

<.KÀ);X.Q<*6<*37. b :;-/. v)¦2);¢¾3G2|81(L-A6),0-G+<.KÀ);X.1< ]6<*37.Dä1g1 94+<Y(*-G0Q0Q<.#U-+<.#U/(*)¯01),:;<L+<*37.§.#3#01)Ggh%Q<L:|<L+5Q-A2)V./6%3G¢-G(L(R6#)3G6#);2¯.#3#01),+=dT+);) ½ <*U ]8#2) b f£¸%#)@:B37.10Y<*6<*37.1-G(¯-G++4¢T8Q.Q:B6<*37.1+:B3G22),+v37.10 ]

<.#U§63@0Q<ßEv);2)V./6¯¥A-G(8#),+3G¢6#)01),:;<L+<*37.J.#3#01)-A2)-G++|8Q),0=63¯ )6#3/+)+),:;<*XY),0 96#)¦:B374Q-A6<L Q(*) SÀ+;£»¸%1<L+»),-/.1+6Q-A6;g/<*¢eÁ01)V.#3G6),+O6#)0Q),:;<L+<*37.

28 A. Antonucci and M. Zaffalon

Page 39: Proceedings of the Third European Workshop on Probabilistic

½ <*U78#2) b ÄOeR3P:;-G(9+v),:;<*XY:;-A6<*37.=3G¢Y-%.#37. ] +),Q-A2-A6),(*+v),:;<*XQ),0 Z S 3V¥G);2§6#)K''& <. ½ <*U7812)ªÛA£ (^) ])V v);2¬61-A6«:;<*2:;(*),+«01)V.#3G6)æ8Q.1:B);26-G<. .#3#01),+;gh%1<L(*)¯61)°+º#81-A2)<L+^81+),0§¢Â3G2^6#)°01),:;<L+<*37.@.#3#01)G£

.#3#01)GgA6#)\+6-A6),+3G¢ e <.Q01)B°6#)\:B37Q-A6<L Y(*) SÀ+;g-/.10 64d#) D ?) f°Ä 6C7d#)ED ?4)³fg»h%#);2) 6C7d#)ED ?4)³f-A2)6#):B37.10Q<*6<*37.Q-G(-G++%¢T8Y.1:B6<*37.1+5+v),:;<*XQ),0© 76#) ] 6J:B374Q-A6<L Q(*)4 SÚ¢¾3G2°),-G:­ #);* -/.10?4)5* ,/>B0-/.Q0 * ,%£¸^1<L+^¢Â3G2|81(L-A6<*37.gYh%1<L:­<L+R-/.)B#-/4Q(*)3G¢16#) Z^Z > 62|-/.1+¢¾3G2|-A6<*37.d Z -/.#3);6¦-G(T£LgÛ aGa ä#fg<L+37.1(*D+);)V<.#U/(*D(*3P:;-G(TgQ v),:;-/81+)À3G¢6#)-A2:;+:B37.Q.#),:B6<.#U©6#)§01),:;<L+<*37.Å.13P01)h^<*6Å-G(L(6#)8Q.1:B);26-G<.Å.#3#01),+;£ H .¹6#)42)V-G<.1<.#U@Q-A26°3G¢61<L+\+),:B6<*37.g1h)¯+#3xhæ13Vhå0Q<ßE);2)V.76 Z SÀ+\+v),:;<*X ]:;-A6<*37.1+\:;-/.D v)2);¢¾3G2|81(L-A6),0-G+2),º#81<*2),0 7KÀ);¢ ]<.1<*6<*37.ä1£½ 3G2\)B1-/Q(*)Gg#h)À:;-/.2),Q2),+)V.76¨)B#6)V.1+<*¥G) Z SÀ+

9@<.7623P081:;<.#U§-01),:;<L+<*37.©Q-A2)V.76¯¢¾3G2À),-G:­J.#3#01)3G¢!6#)¯3G2<*U/<.1-G( Z SÚd ½ <*U7812)=Ñ9f£

½ <*U78#2)ÑPÄ!e3#:;-G(1+v),:;<*XY:;-A6<*37.3G¢-/.)B#6)V.1+<*¥G) Z S3V¥G);2À6#)K''& <. ½ <*U78#2)ÛA£

¸^1)°:B37.10Q<*6<*37.1-G(»-G++¢T8Q.Q:B6<*37.1+3G¢!6#)=8Y.1:B);2 ]6-G<.Å.#3P0Q),+:B3G22),+v37.10Y<.#U§63@0Q<ßE);2)V.76¥A-G(8#),+°3G¢6#)©2),(L-A6),0å0Q),:;<L+<*37.æ.#3#01),+-A2)¹-G++8Q),0Ù63â v)6#3/+)¯+v),:;<*XQ),0 96#)¯0Q<ßE);2)V.76¨6-G Q(*),+^<.461)%)B ]6)V.1+<*¥G)\+v),:;<*XY:;-A6<*37.=3G¢Q6#)\8Q.1:B);26-G<..#3#01)G£!¸^1<L+),-/.Q+@6Q-A6;g<*¢]#)<. -/.Á8Y.1:B);26-G<.Á.#3#01)è-/.10e ) 6#):B3G22),+v37.10Y<.#UD0Q),:;<L+<*37.J.#3#01)Gg6#)+6-A6),+ ) *Q, 0<.101)B§6#)°6-G Q(*),+=6C 0 d#)[D <=)³f3G¢6#))B ]6)V.1+<*¥G)+),:;<*X:;-A6<*37.¢Â3G2 #)³g-/.10Q64d#)[D ) ?4)`f<L+6#)¼³3/<.76\4-G++¢8Q.1:B6<*37.]6C 0 d#)[D ?4)`f»-G++3#:;<L-A6),0D636#) ) ] 66-G Q(*)3G¢»6#))B#6)V.1+<*¥G)+v),:;<*XY:;-A6<*37.£

>3G2)UG)V.#);2|-G(L(*Gg»:B37.1+62-G<./6+°¢Â3G2¯61)+v),:;<*XY:;- ]6<*37.1+43G¢À:B37.10Y<*6<*37.1-G(%4-G++¢T8Y.1:B6<*37.1+2),(L-A6<*¥G)@630Q<ßE);2)V.76O.#3#01),+-A2)+<4<L(L-A2(*À2),12),+)V.76),0° 9501),:;< ]+<*37..13P01),+!hÀ1<L:|-A2)¦61)Q-A2)V.76+3G¢61),+)^.#3#01),+;£½ <.1-G(L(*Gg#63(*3P:;-G(L(*4+),:;<*¢¾Gg#-G+2),º#81<*2),0 7KÀ);¢ ]

<.1<*6<*37.$ä1g5-«+),Q-A2-A6),(*$+v),:;<*XQ),0 Z S°g¦<*6h3781(L0+8GF:B)¹63Ù2);¢¾3G2|81(L-A6)ª6#)J+),Q-A2|-A6),(*?+v),:;<*XQ),0Z S -G+§-/.Ù)B#6)V.1+<*¥G) Z S h%#3/+)6-G Y(*),+-A2)©3/ ]6-G<.#),0â:B37.Q+<L01);2<.#U¹-G(L(\61):B37= Y<.1-A6<*37.1+3G¢À6#)¥G);26<L:B),+43G¢%6#)+),Q-A2-A6),(*è+v),:;<*XQ),0â:B37.10Y<*6<*37.1-G(:B2),0Q-G(+);6+=3G¢6#)D+-/)¥A-A2<L-G Q(*)G£);6;g61<L+-G ]123/-G:­D+8PE);2+\-/.43/ 9¥#<*3781+)B#v37.#)V.76<L-G(v)B#Q(*3/+<*37.3G¢!6#)°.P8Q );2\3G¢!6-G Q(*),+%<.6#)<.1Y816+<*W;)G£'æ3G2))BEv),:B6<*¥G)¯123P:B),0Y812):B37.1+<L+6+<.-G0Y0Q<.#U

-¹01),:;<L+<*37.Ù.#3#01)<.è v);6¡h);)V.è),-G:­ .13P01)-/.Q0â<*6+Q-A2)V.76+;g2);U/-A20Q<.#UÁ6#)Ã.#3P0Q),+«3G¢@6#)æ3G2|<*U/<.1-G(3#01),(!-G+8Q.1:B);26-G<.¹.#3#01),+d ½ <*U78#2)ä#f£¸O3:B37 ]Q(*);6)6#)(*3#:;-G(+v),:;<*XY:;-A6<*37.©123P:B);),0¬-G+¯¢¾3/(L(*3Vh%+;£½ 3G2¨),-G:|D8Q.1:B);26-G<..#3P0Q)V#)³g76#)^+6-A6),+ )2*A, 03G¢56#):B3G22),+37.Q0Q<.#U¬01),:;<L+<*37. .#3P0Q) e#)5-A2)-G+ ]+8Q),0Ù63â<.Q01)B 6#)©¥G);26<L:B),+3G¢ A¾ 61)¹:B37.10Q< ]6<*37.1-G(O:B2),0Q-G(+);6+ d#)ED ?)¡fT* % gRh^<*6+?4)2*A,/>£H .è61<L+4h-VGg¦¢Â3G2),-G:|Ø8Q.1:B);26-G<. .#3#01)+ ) g\<*6D<L+v3/++<L Q(*)ª63æ+);6@61)Å:B37.10Y<*6<*37.1-G(-G++¢T8Y.1:B6<*37.64d#)[D )³f§63å v)«6#)«¥G);26)BÁ3G¢46#)â:B37.10Y<*6<*37.1-G(:B2),0Q-G(%+);6 d#)ED ?4)³f-G++3#:;<L-A6),0 63 )³g¦¢¾3G2),-G:­ )g* , 0£ ();U/-A20Q<.1U¹01),:;<L+<*37.â.#3#01),+;g¢Â3G2),-G:­01),:;<L+<*37.D.#3#01)[e ) -/.1046#)^2),(L-A6),04¥A-G(8#)=? ) 3G¢6#)Q-A2)V.76+;gh¨)@+<Q(*Å+);66#)@+81 Q+);6 , 00 , 063J v)§+81:­Å61-A6 i 64d#) D )³f u C 0 6W 0 0 -A2)§61)§¥G);2 ]6<L:B),+è3G¢ d ) D ? ) f£ ¸^1<L+Å-GQ123/-G:|g§h%Q<L:| <L+:;(*),-A2(*(L<.#),-A2\<.61)^<.1Y8#6+<*W;)Gg96-A¿G),+\<.1+Y<*2-A6<*37.¢¾237 ²v7Þ7Þ;¶Â߶¾´¯­ 2),12),+)V./6-A6<*37.1+\d Z -/.#3À-/.10>3G2-G(Tg bAÒGÒ/b f£

½ <*U78#2)èä1Ä e3#:;-G(+v),:;<*XY:;-A6<*37. 3G¢D-æ+),Q-A2-A6),(*+v),:;<*XQ),0 Z S?3x¥G);2%6#)K^''& <. ½ <*U78#2)4ÛA£

Locally specified credal networks 29

Page 40: Proceedings of the Third European Workshop on Probabilistic

NQ8Q44-A2<*W,<.#U1g#h¨):;-/.§3/ 16-G<.@<.§)!F:;<*)V./6À6<)-(*3#:;-G(P+v),:;<*XY:;-A6<*37.3G¢1- Z S°gF 7¯+<4Q(*%:B37.1+<L01);2 ]<.#U37.#)53G¢6#)562-/.Q+¢Â3G2­-A6<*37.1+01),+:B2<L v),0<.61<L++),:B6<*37.£ H 6¦<L+\h3G26§.#3G6<.#U61-A6(*3#:;-G(+v),:;<*XY:;- ]6<*37.1+¨3G¢ Z SÀ+ Q-G+),0437.6#)%K''&¯+\2),+v),:B6<*¥G),(*4<.½ <*U78#2)ä1g ½ <*U78#2)Ñ-/.Q0 ½ <*U78#2) b :B3G22),+v37.10©63Z SÀ+R Y-G+),0°37.°6#)\K''&$<. ½ <*U7812)%Û¨h%1<L:­°-A2)¨2) ]+v),:B6<*¥G),(*+),Y-A2-A6),(*°+v),:;<*XQ),0gF)B#6)V.1+<*¥G),(*°+v),: ]<*XQ),0«-/.10è.#37. ] +),Q-A2-A6),(*ØdT-/.10â.#37. ] )B#6)V.1+<*¥G),(*f+v),:;<*XQ),0£À<L:B)@¥G);2+-#g¦<*64<L+),-G+Å63Å:|#),:¿è61-A6¸O2-/.1+¢¾3G2|4-A6<*37.4Û\:;-/. v)¨2);U/-A201),0-G+!6#)¦<.7¥G);2+)3G¢»6#),+)612);)62-/.1+¢Â3G2|4-A6<*37.1+;£

6 The need for non-separately specified credal nets

Let us illustrate by a few examples that the necessity of non-separately specified credal nets arises naturally in a variety of problems.

Conservative updating. The conservative inference rule (CIR) is a new rule for updating beliefs with incomplete observations (Zaffalon, 2005). CIR models the case of mixed knowledge about the incompleteness process, which is assumed to be nearly unknown for some variables and unselective for the others. This leads to an imprecise probability model, where all the possible completions of the incomplete observations of the first type of variables are considered. In a recent work, updating beliefs with CIR has been considered for BNs (Antonucci and Zaffalon, 2006). The problem is proved to be equivalent to a traditional belief updating problem in a CN: the equivalence is made possible by non-separately specified credal nets.¹

¹As a side note, it is important to be aware that a credal set can have a very large number of vertices. This can still be a source of computational problems for algorithms (such as those based on the CCM transformation) that explicitly enumerate the vertices of a net's credal sets. This is a well-known issue, which in the present setup is related to the possibly large number of states for the decision nodes in the locally specified representation of a credal net.

Qualitative networks. Qualitative probabilistic networks (Wellman, 1990) are an abstraction of BNs, where the probabilistic assessments are replaced by qualitative relations describing the influences or the synergies between the variables. If we regard qualitative nets as credal nets, we see that not all types of relations can be represented by separate specifications of the conditional credal sets. This is, for instance, the case of (positive) qualitative influences, which requires, for two boolean variables A and B, that

64d"!7D #Vf$64d"!7D&%'#Vf d _ f

The qualitative influence between A and B can therefore be modeled by requiring P(a|b) and P(a|¬b) to belong to credal sets, which cannot be separately specified because of the constraint in Equation (6). An extensive specification should therefore be considered to model the positive influence of B on A (Cozman et al., 2004).
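To see the coupling concretely, the constraint of Equation (6) can be checked over vertex pairs. This is our own illustration, with assumed extreme values for the two conditional probabilities: the admissible pairs are not the product of independent choices from the two credal sets.

```python
from itertools import product

# Assumed extreme values (illustrative, not from the paper):
vertices_p_a_given_b    = [0.3, 0.8]   # candidate values of P(a|b)
vertices_p_a_given_notb = [0.2, 0.6]   # candidate values of P(a|~b)

# Positive qualitative influence of B on A: P(a|b) >= P(a|~b).
admissible = [(p, q)
              for p, q in product(vertices_p_a_given_b, vertices_p_a_given_notb)
              if p >= q]
print(admissible)  # [(0.3, 0.2), (0.8, 0.2), (0.8, 0.6)]

# A separate specification would allow all four combinations; the pair
# (0.3, 0.6) violates the constraint, so the admissible set is not a
# product set and the two credal sets cannot be specified separately.
assert len(admissible) == 3
```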

Equivalent graphs. Remember that DAGs represent independencies between variables according to the Markov condition. Different DAGs describing the same independencies are said to be equivalent (Verma and Pearl, 1990). Thus, a BN can be reformulated using an equivalent DAG. The same holds with CNs, when (as implicitly done in this paper) strong independence replaces standard probabilistic independence in the Markov condition (Moral and Cano, 2002). Consider, for example, the two-node DAGs X → Y and Y → X,

which are clearly equivalent DAGs. One problem with separately specified CNs is that they are not closed under this kind of (equivalent) structure changes: if we define a separately specified CN for X → Y, and then reverse the arc, the resulting net will not be separately specified in general. Consider the following separate specifications of the conditional credal sets for a CN over X → Y:

K(Y|x): {[.9, .1]},   K(Y|¬x): {[.8, .2]},   K(X): CH{[.7, .3], [.5, .5]},

where two-dimensional horizontal arrays are used to denote mass functions for boolean variables. Such a separately specified CN has two compatible BNs, one for each vertex of K(X). From the joint mass functions corresponding to

30 A. Antonucci and M. Zaffalon


these BNs, say P1(X,Y) and P2(X,Y), we obtain the conditional mass functions for the corresponding BNs over Y → X:

P1(Y) = [.87, .13],   P1(X|y) = [.72, .28],   P1(X|¬y) = [.54, .46],

P2(Y) = [.85, .15],   P2(X|y) = [.52, .48],   P2(X|¬y) = [.33, .67].

According to Definition 2, these two distinct specifications define a CN over (X,Y), which cannot be separately specified as in Definition 3. To see this, note for example that the specification P(x|y) = .52, P(x|¬y) = .33 and P(y) = .85, which is clearly a possible specification if the conditional credal sets were separately specified, would lead to the unacceptable mass function P(X) = [.49, .51] ∉ K(X). It is useful to observe that general, non-separately specified, CNs do not suffer from these problems just because they are closed under equivalent changes in the structure of the DAG.

Learning from incomplete data. Given three boolean random variables X, Y and Z, let the DAG X → Y → Z express independencies between them. We want to learn the model probabilities for such a DAG from the incomplete data set in Table 1, assuming no information about the process making the observation of Y missing in the last record of the data set. The most conservative approach is therefore to learn two distinct BNs from the two complete data sets corresponding to the possible values of the missing observation, and to consider indeed the CN made of these compatible BNs.

X    Y    Z
x    y    z
¬x   ¬y   z
x    y    ¬z
x    ∗    z

Table 1: A data set about three boolean variables; ∗ denotes a missing observation.

To make things simple, we compute the probabilities for the joint states by means of the relative frequencies in the complete data sets. Let P1(X,Y,Z) and P2(X,Y,Z) be the joint mass

functions obtained in this way, which define the same conditional mass functions for:

P1(X) = P2(X) = [.75, .25],
P1(Y|¬x) = P2(Y|¬x) = [0, 1],
P1(Z|¬y) = P2(Z|¬y) = [1, 0];

and different conditional mass functions for:

P1(Y|x) = [1, 0],   P1(Z|y) = [.67, .33],
P2(Y|x) = [.67, .33],   P2(Z|y) = [.5, .5].

We have therefore obtained two BNs over (X,Y,Z), which can be regarded as the compatible BNs of a CN. Such a CN is clearly non-separately specified, because the two BNs specify different conditional mass functions for more than a variable.
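The two compatible BNs can be reproduced with a few lines of code. This is our own illustration; the records follow Table 1 as read here, with '~' for negation and '*' for the missing observation of Y.

```python
from collections import Counter

records = [('x', 'y', 'z'), ('~x', '~y', 'z'), ('x', 'y', '~z'), ('x', '*', 'z')]

def joint(completion):
    """Relative frequencies after filling the missing Y with `completion`."""
    data = [(x, completion if y == '*' else y, z) for x, y, z in records]
    counts = Counter(data)
    return {rec: n / len(data) for rec, n in counts.items()}

def p_y_given_x(p):
    """P(y|x) in the joint mass function p."""
    num = sum(v for (x, y, z), v in p.items() if x == 'x' and y == 'y')
    den = sum(v for (x, y, z), v in p.items() if x == 'x')
    return num / den

p1, p2 = joint('y'), joint('~y')
print(p_y_given_x(p1), round(p_y_given_x(p2), 2))  # 1.0 0.67
```

The two completions disagree on P(Y|x) (and, similarly, on P(Z|y)), which is exactly why the resulting CN is non-separately specified.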

7 From local to separate specification

In this section, we prove that any locally specified CN over X can equivalently be regarded as a separately specified CN over a larger set of variables. The transformation is technically straightforward: it is based on representing decision nodes by uncertain nodes with vacuous conditional credal sets, as formalized below.

Transformation 2. Given a locally specified CN over X, obtain a separately specified CN over (X, D) by keeping the same DAG and defining the conditional credal sets as follows: for each node Z_i and each π_i ∈ Ω_{Π_i},

K'(Z_i | π_i) := {P(Z_i | π_i)}       if Z_i ∈ X,
K'(Z_i | π_i) := K_{Ω'_{Z_i}}(Z_i)    if Z_i ∈ D,    (7)

where P(Z_i|π_i) is the mass function specified in the local specification, and K_{Ω'_{Z_i}}(Z_i) the vacuous credal set for the subset Ω'_{Z_i} ⊆ Ω_{Z_i}. Figure 5 reports an example of Transformation 2, which is clearly linear in the input size.
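As a programmatic illustration (ours, not from the paper), Transformation 2 can be sketched as follows; the dictionary-based encoding of a locally specified CN is an assumption made for this example. The key fact used is that the vertices of a vacuous credal set over a finite set of states are the degenerate mass functions on those states.

```python
def transformation_2(local_cn):
    """Turn a locally specified CN into a separately specified one.

    `local_cn` maps each node name to a pair (kind, states):
      - ('uncertain', states)        : a precisely specified node
      - ('decision', allowed_states) : the subset Omega' of allowed states
    Uncertain nodes are left untouched; decision nodes become uncertain
    nodes with vacuous credal sets, represented by their vertices.
    """
    separate_cn = {}
    for node, (kind, states) in local_cn.items():
        if kind == 'uncertain':
            separate_cn[node] = states  # precise part is kept as is
        else:
            # vertices of the vacuous set: degenerate mass functions
            separate_cn[node] = [{s: 1.0 if s == t else 0.0 for s in states}
                                 for t in states]
    return separate_cn

net = {'X': ('uncertain', ['x', '~x']), 'D': ('decision', ['d1', 'd2'])}
print(len(transformation_2(net)['D']))  # 2 vertices for the vacuous set
```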

The (strong) relation between a locally specified CN over X and the separately specified CN returned by Transformation 2 is outlined by the following:



Figure 5: The DAG associated to the separately specified CN returned by Transformation 2, from the locally specified CN based on the DAG in Fig. 4. The conditional credal sets of the white nodes (corresponding to the original uncertain nodes) are precisely specified, while the grey nodes (i.e., new uncertain nodes corresponding to the former decision nodes) represent variables whose conditional credal sets are vacuous.

Theorem 2. Let K'(X) be the marginal for X of the strong extension of the separately specified CN returned by Transformation 2, and K(X) the strong extension of the locally specified CN. Then:

K'(X) = K(X).    (8)

From Theorem 2, it is straightforward to conclude the following:

Corollary 2. Any inference problem on a locally specified CN can be equivalently solved in the separately specified CN returned by Transformation 2.

Let us stress that Transformation 2 is very

simple, and it is surprising that it is presented here for the first time, as it is really the key to "separate" the credal sets of non-separately specified nets: in fact, given a non-separately specified CN, one can locally specify the CN, using the prescriptions of the second part of Section 5, and apply Transformation 2 to obtain a separately specified CN. According to Corollary 2, then, any inference problem on the original CN can equivalently be represented on this new separately specified CN. To make an example, this procedure can immediately solve the

CIR updating problem described in Section 6, which is a notable and ready-to-use result.

8 Conclusions and outlooks

We have defined a new graphical language to formulate any type of credal network, both separately and non-separately specified. We have also showed that any net represented with the new language can be easily transformed into an equivalent separately specified credal net. This implies, in particular, that non-separately specified nets have an equivalent separately specified representation, for which solution algorithms are available in the literature.

The transformation proposed also shows that

a subclass of separately specified credal networks can be used to solve inference problems for arbitrarily specified credal nets: this is the class of nets in which the credal sets are either vacuous or precise. It is worth noting that a recent development of the approximate L2U algorithm (Antonucci et al., 2006) seems to be particularly suited just for such a class, and should therefore be considered in future work. Finally, the strong connection between the

language for credal networks introduced in this paper and the formalism of decision networks (including influence diagrams) seems to be particularly worth exploring for cross-fertilization between the two fields.

Acknowledgments

This research was partially supported by the Swiss NSF grant 200020-109295/1.

Proofs

The obvious proofs of Corollary 1 and Corollary 2 are omitted.

Proof of Theorem 1. Let us start the marginalization in Equation (5) from a decision node D_j ∈ D. According to Equation (4), for each x ∈ Ω_X:

Σ_{d_j ∈ Ω_{D_j}} P(d_j | π_j) · ∏_{X_i ≠ D_j} P(x_i | π_i).    (9)



Thus, moving out of the sum the conditional probabilities which do not refer to the states of D_j (which are briefly denoted by a dot), Equation (9) becomes:

· Σ_{d_j ∈ Ω_{D_j}} P(d_j | π_j) ∏_{X_i ∈ ch(D_j)} P(x_i | d_j, π̃_i),    (10)

where ch(D_j) denotes the children of D_j and, for each X_i ∈ ch(D_j), π̃_i are the parents of X_i deprived of D_j. Therefore, considering that the mass function P(D_j | π_j) assigns all the mass to the value δ_{D_j}(π_j) ∈ Ω_{D_j}, where δ_{D_j} is the decision function associated to D_j, Equation (10) rewrites as

· ∏_{X_i ∈ ch(D_j)} P(x_i | δ_{D_j}(π_j), π̃_i).    (11)

It is therefore sufficient to set Π̃_i := Π_i \ {D_j}, and

P(X_i | π̃_i) := P(X_i | δ_{D_j}(π_j), π̃_i),    (12)

to regard Equation (11) as the joint mass function of a BN over the remaining nodes, based on the DAG returned by Transformation 1 considered for the single decision node D_j. The thesis therefore follows from a simple iteration over all the D_j ∈ D. □

The following well-known and (relatively) intuitive proposition is required to obtain Theorem 2, and will be proved here because of the seeming lack of its formal proof in the literature.

Proposition 1. The vertices {P_k(X)}_{k=1}^{v} of the strong extension K̄(X) of a separately specified CN are joint mass functions obtained by the combination of vertices of the conditional credal sets, i.e., for each x ∈ Ω_X:

P_k(x) = ∏_{i=1}^{n} P_k(x_i | π_i),    (13)

for each k = 1, ..., v, where, for each i = 1, ..., n and π_i ∈ Ω_{Π_i}, P_k(X_i | π_i) is a vertex of K(X_i | π_i).

Proof of Proposition 1. We prove the proposition by a reductio ad absurdum, assuming that at least a vertex P(X) of K̄(X) is not obtained by a local combination of vertices of the conditional credal sets. This means that, for each x ∈ Ω_X, P(x) factorizes as in Equation (13), but at least a conditional probability in this product comes from a conditional mass function which is not a vertex of the relative conditional credal set. This conditional mass function, say P(X_j | π_j), can be expressed as a convex combination of vertices of K(X_j | π_j), i.e., P(X_j | π_j) = Σ_l α_l P_l(X_j | π_j), with Σ_l α_l = 1 and, for each l, α_l ≥ 0 and P_l(X_j | π_j) a vertex of K(X_j | π_j). Thus, for each x ∈ Ω_X:

P(x) = Σ_l α_l P_l(x_j | π_j) ∏_{i ≠ j} P(x_i | π_i),    (14)

which can be easily reformulated as a convex combination. Thus, P(X) is a convex combination of elements of the strong extension K̄(X). This violates the assumption that P(X) is a vertex of K̄(X). □

h%1<L:­Ù:;-/.$ v)),-G+<L(*Ù2);¢Â3G2­8Q(L-A6),0 -G+-J:B37.7¥G)B:B37 Q<.1-A6<*37.£¸%981+,g 64d@f<L+\-:B37./¥G)B§:B37 Q<.1- ]6<*37.@3G¢),(*)V)V.76+À3G¢6#)+6237.#U4)B#6)V.1+<*37. d@f£¸^1<L+R¥P<*3/(L-A6),+!61)¨-G++8Y16<*37.56Q-A6 64d@f<L+O-^¥G);2 ]6)B§3G¢ d@f£ ,àà S @#G;Ý O\'%:;:B3G2|0Q<.#U 63 ¸^#) ]3G2)V ÛAg 6#) +6237.#U )B#6)V.1+<*37. d f 3G¢UW$ d f d EX f Z :;-/.Õ v)ã2);U/-A201),0 -G+$6#)-A2U/<.1-G(¢Â3G2= 3G¢

K̄(X ∪ D) = CH {P_s(X ∪ D)}_{s ∈ S},    (15)

where, for each s ∈ S, P_s(X ∪ D) is the joint mass function associated to the strategy s. For each D_j ∈ D, the conditional mass functions P_s(D_j | π_j), specified for each π_j ∈ Ω_{Π_j}, assign all the mass to the single state δ_s(π_j) ∈ Ω'_{D_j}, where δ_s are the decision functions associated to the strategy s ∈ S, and represent therefore a vertex of the vacuous conditional credal set K_{Ω'_{D_j}}(D_j) specified in Equation (7). Thus, P_s(X ∪ D) is a vertex of the strong extension of the separately specified CN because of Proposition 1. As s varies in S, all the vertices of that strong extension are obtained; therefore the two credal sets coincide, from which the thesis follows marginalizing. □



References

A. Antonucci and M. Zaffalon. 2006. Equivalence between Bayesian and credal nets on an updating problem. In Proceedings of the Third International Conference on Soft Methods in Probability and Statistics (SMPS 2006). To appear.

A. Antonucci, M. Zaffalon, J. S. Ide, and F. G. Cozman. 2006. Binarization algorithms for approximate updating in credal nets. In Proceedings of the Third European Starting AI Researcher Symposium (STAIRS 2006). To appear.

A. Cano and S. Moral. 2002. Using probability trees to compute marginals with imprecise probabilities. International Journal of Approximate Reasoning, 29(1):1–46.

A. Cano, J. E. Cano, and S. Moral. 1994. Convex sets of probabilities propagation by simulated annealing. In Proceedings of the Fifth International Conference on Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU '94).

F. G. Cozman, C. P. de Campos, J. S. Ide, and J. C. Ferreira da Rocha. 2004. Propositional and relational Bayesian networks associated with imprecise and qualitative probabilistic assessments. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence. AUAI Press.

F. G. Cozman. 2000. Credal networks. Artificial Intelligence, 120:199–233.

F. G. Cozman. 2005. Graphical models for imprecise probabilities. International Journal of Approximate Reasoning, 39(2–3):167–184.

J. C. Ferreira da Rocha and F. G. Cozman. 2002. Inference with separately specified sets of probabilities in credal networks. In Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence, pages 430–437. Morgan Kaufmann.

I. Levi. 1980. The Enterprise of Knowledge. MIT Press.

S. Moral and A. Cano. 2002. Strong conditional independence for credal sets. Annals of Mathematics and Artificial Intelligence, 35(1–4):295–321.

T. Verma and J. Pearl. 1990. Equivalence and synthesis of causal models. In Proceedings of the Sixth Annual Conference on Uncertainty in Artificial Intelligence. Elsevier.

P. Walley. 1991. Statistical Reasoning with Imprecise Probabilities. Chapman and Hall, New York.

M. P. Wellman. 1990. Fundamental concepts of qualitative probabilistic networks. Artificial Intelligence, 44(3):257–303.

M. Zaffalon. 2001. Statistical inference of the naive credal classifier. In Proceedings of the Second International Symposium on Imprecise Probabilities and Their Applications (ISIPTA '01), pages 384–393. Shaker.

M. Zaffalon. 2005. Conservative rules for predictive inference with incomplete data. In Proceedings of the Fourth International Symposium on Imprecise Probabilities and Their Applications (ISIPTA '05). SIPTA.

N. L. Zhang, R. Qi, and D. Poole. 1994. A computational theory of decision networks. International Journal of Approximate Reasoning, 11(2):83–158.



A Bayesian Network Framework for the Construction of Virtual Agents with Human-like Behaviour

Olav Bangsø, Nicolaj Søndberg-Madsen, and Finn Verner Jensen
{bangsy, nicolai, fvj}@cs.aau.dk
Department of Computer Science
Aalborg University, Denmark

Abstract

It seems tempting to equip virtual agents, for example in a computer game, with graphical models for decision making. That is, if a virtual agent possesses an influence diagram of its relevant world, then it will be able to take rational decisions and appear very intelligent. However, this approach misses the point that agents are supposed to act human-like, regardless of how alien they may appear. What is missing is that emotions influence the choice of action, and rather than trying to incorporate emotions in the utilities, it is preferable to make explicit the dynamics of emotions as well as their influence on the choice. The Twilight project performed at Aalborg University is based on the so-called EMS, which is a system for simulating human emotions. According to this theory, emotions are affected when events influence goals or resources. The Comparator analyses how an event may influence goals and resources. The Evaluator determines the change of emotional profile (a vector containing degrees of anger, fear, joy, and sorrow), and the Action Proposer picks the action mediating possibilities and emotional profile. This paper is inspired by the EMS. We focus on the Comparator and outline how the developed Bayesian network models also can be utilized for the Evaluator and the Action Proposer.

1 Introduction

Virtual agents acting credibly as human beings would be of interest in computer games. The class of games that would benefit the most from virtual agents acting credibly as humans is Role-Playing Games (RPGs). RPGs usually give the player more freedom than other classes of games. With this freedom the player can be expected to interact with any object in the game, including any virtual agent. Games are getting more and more realistic in all aspects; however, in many cases little effort is put into the virtual agents. This is in spite of computer agents acting credibly as humans being among the oldest investigated topics (Turing, 1950). We present a method to produce agents with human-like behaviour which is well suited for virtual agents in RPGs. Some popular games, which have very simple

virtual agents, are Diablo (Blizzard Entertainment, 2000) and Neverwinter Nights (BioWare,

2002). In Diablo each agent exhibits the same behaviour except from certain scripted events. In Neverwinter Nights the agents do not have independent behaviour; they wait only for the player to come and interact with them. Usually, rational virtual agents attempt to maximize their expected utility. This solution, however, results in predictable behaviour, which usually is unwanted in computer games.

One of the aims of the Twilight project is to develop a system for simulating human emotions; the system is called an Emotional and Motivational System (EMS). Other aims include a system for dynamic narration and modeling of the player's preferences based on the player's actions. Nikolaj Hyldig from the Department of Communication and Psychology, Aalborg University, has proposed to base the system on theories on the human emotional and motivational system (i.e. psychology) which guides human


actions (Hyldig, 2005). The EMS is similar to the OCC model

(Ortony, Clore, and Collins, 1988). In (Bartneck, 2002) some practical problems with implementing the OCC model for facial expressions are pointed out, along with solutions for some of them. Most notable are a reduction of the number of emotions and keeping a history of events. Expressing several emotions at once remains a problem, along with modeling different personality types. In this work we focus on actions and utilize

Bayesian Networks (BNs) to simulate emotional responses. The driving force in the EMS is quality of life (QoL) and it is used to determine both emotional state and choice of action. The choice of action is influenced by the value of all emotions, allowing several emotions to be expressed through the agent's actions. Using the resources available to the agent to model the world state for the agent, we eliminate the need for the history function proposed by (Bartneck, 2002). By allowing predefined preferences to influence both the change in emotion and the choice of action, it is possible to model different personality types. Each agent will get a certain amount of QoL

by performing activities. Each activity requires a number of resources in order to be carried out, and if a resource is lost it affects which activities the agent can carry out. The QoL received from a certain activity is determined not only by how effective the activity is, but also by how motivated the agent is and by his general preference for that activity. We use QoL to determine the Emotional Profile (EP) of the virtual agents.

We work with four kinds of emotions: Joy, Sorrow, Anger and Fear. The exact emotional response to an event depends on the change in the level of expected QoL in the situation after the event. Anger occurs when the agent estimates that it is difficult to retain the level of QoL it had before the event, while Sorrow occurs when it has become unachievable. Fear occurs when the agent estimates a significant decrease in future expected QoL, while Joy occurs when the expected QoL rises. The intensity of the emotion is determined by the change in expected QoL. The EMS is decomposed into five steps. The

Comparator compares how likely it is that the agent can retain or maintain its QoL before and after an event has occurred. The Evaluator determines the emotional reaction to the new situation. It also determines the intensity of the emotional response. The Action Proposer proposes the action the agent will take in order to deal with the event. The action chosen should be influenced by the expected QoL and the agent's EP. The Physiological Change Generator handles changes in the agent's appearance in the game to exhibit the emotions, i.e. textures, animations etc. The Actor carries out the chosen action, if any. In this paper we will concentrate on the Comparator and describe how the problem can be solved using Bayesian networks. The next section will describe the general structure of the models through an example. Section 3 will show how the models can support the Comparator. The last two sections outline future work: Section 4 will outline how the developed Bayesian network models can be utilized for the Evaluator and the Action Proposer, while Section 5 describes what must be specified for our framework to be utilized.
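The four emotional responses described above can be summarized by a small decision rule. This sketch is our own illustration: the inputs and the threshold for a "significant" decrease are assumptions, not part of the EMS specification.

```python
def emotional_response(delta_expected_qol, retainability):
    """Map an event's effect on expected QoL to one of the four emotions.

    delta_expected_qol : change in expected QoL caused by the event
    retainability      : how hard it is to get back to the pre-event QoL
                         level; one of 'easy', 'difficult', 'unachievable'
    Returns (emotion, intensity); intensity grows with the change in
    expected QoL, as described in the text.
    """
    intensity = abs(delta_expected_qol)
    if delta_expected_qol > 0:
        return 'Joy', intensity
    if retainability == 'unachievable':
        return 'Sorrow', intensity
    if retainability == 'difficult':
        return 'Anger', intensity
    # significant decrease in future expected QoL; 0.5 is an assumed threshold
    return ('Fear', intensity) if intensity > 0.5 else (None, 0.0)

print(emotional_response(-0.8, 'difficult'))  # ('Anger', 0.8)
```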

2 Virtual Agents

The goal of this work is to have virtual agents performing close to how humans would. To this end we use QoL as the basic concept driving the agents. Activities and ownership are the only ways agents can receive QoL, and the QoL an agent receives from a specific activity is influenced by how motivated the agent is for doing the activity. The motivation for the different activities may be influenced by different factors, called resources, and the motivation for each of the activities is influenced by the agent's preference for the specific activity. A rational agent would try to maximize the expected QoL, but in this work the EP of the agent also impacts the choice of activity. The methodology for deciding which activity to perform can be described by the following

36 O. Bangsø, N. Søndberg-Madsen, and F. V. Jensen


steps:

1. An event changes the resources of the agent.

2. The expected QoL may change as an effect of the change in resources.

3. A change in expected QoL triggers an emotional response.

%Yp|Y\b\n[v\%P^w \%X WR1IrOdDK\^G6w2Y\P BHG+IK\%X6] \n_J RSs+] P%\%JzBHGYXfw2Y\GY\^ −−→

EPX[OQP^wBHw \?w2Y\(BSP^w2OdD<_

Odwaiw R~v\^]7kRS]2xfp

\(OdAdA"OdG<w2] RX[sYP%\9w2Y\?kl]BHxy\^RS]2t0w2+] RSs+IrBHG\nYBHx~v+AQ\rp\sYJ \@ Z[T2\%P^w"@ ]2OQ\^G<w \%X(EB%iK\%J2OCBHG(NW\^w7_RS]2t[J~¥lEBHG+IKJ LBHGYX£s+OdAdAQ\^x~OdG.M.ÊrÈrÈrÈK¦?¥j@;@WENWJn¦OdGw2Y\\nYBHx~v+AQ\"kRS].\qBSJ \RHk+\n[vRrJ2Odw2OQRSG.MHBSJw2Y\Od]2] \^A_\^DrBHG<w;X+\^wBHOdAQJ;BH] \?+OQX+X+\^GbOdG0w2Y\(RSZ[T2\%P^w JqM\~BHAQJ RZ\^AdOQ\^DK\w2BHw|J RSxy\P^ACBSJ J \%JPqBHGyZ\] \^sYJ \%XM<J RJ RSxy\I<BHOdG6OdG6J2v\%P^O¤PqBHw2OQRSG6w2Odxy\J2YRSs+AQXZ\BSP +OQ\^DK\%Xp Rq\^DK\^]Ww2Y\;OQX+\qBSJOdG£w2+OQJvBHv\^]WBH] \\%<sBHAdAdifBHv[_v+AdOQPqBHZ+AQ\w RENWJqp"VRSxy\RHk[w2Y\GYRSwBHw2OQRSG(RHk@;@WENWJBH] \OdAdAdsYJ2w2]BHw \%XÌOdGÌcOdIrs+] \ÅrpY\XYBSJ2Y\%XÎGYRX+\%JBH] \£OdG+v+s+wGYRX+\%J] \^v+] \%J \^G<w2OdG+I¢J RSxy\£GYRX+\0J2v\%Pn_O¤\%X6RSs+w J2OQX+\0w2Y\J P%RSv\bRHk?w2Y\P^ACBSJ JB%DOdG+IBHGOdG[YsY\^GYP%\(RSG£GYRX+\%JOdGYJ2OQX+\OdGYJ2wBHGYP%\%JWRHkw2Y\?P^ACBSJ JqpY\?ZR%+\%JW] \^v+] \%J \^G<wzOdGYJ2wBHGYP%\%JzRHkP^ACBSJ J \%JqMY\^] \w2Y\GYRX+\%J|RSG~w2Y\s+v+v\^]BHAk.RHkw2Y\ZR%BH] \OdG+v+s+wGYRX+\%Jw R;w2Y\OdGYJ2wBHGYP%\rMBHGYXyw2Y\GYRX+\%JBHww2Y\ZRSw7_w RSxãBH] \ RSs+w2v+s+w|GYRX+\%JqÍ<GYRX+\%JOdGYJ2OQX+\w2Y\OdGYJ2wBHGYP%\w2BHwPqBHGZ\vBH] \^G<w JRHkGYRX+\%J|OdGw2Y\WJ2s+]2] RSs+GYX[OdG+IP^ACBSJ Jqp

cOdIrs+] \&Å Gã@;@WENP%RSG<wBHOdG+OdG+I1BHG OdGYJ2wBHGYP%\Odw2FwaRyOdG+v+s+wGYRX+\%J BHGYXfBHGfRSs+w2v+s+wGYRX+\rpY\@;@WENãP^ACBSJ J;Odw J \^AkBSJ?RSGY\~OdG+v+s+wGYRX+\BHGYXRSGY\RSs+w2v+s+wGYRX+\rp

Jw2+OQJRS]2tOQJOdGw2Y\bP%RSG<w \n[wFRHk(P%RSx~v+s+w \^]I<BHxy\%JqMw2Y\yxBHOdGkRP^sYJOQJ?RSGB%DOdG+I0w2Y\BHIK\^G<w J\n[+OdZ+Odw2OdG+I<s+xBHG[_jAdOdtK\(Z\^B%DOQRSs+]qMRSZYJ \^]2DrBHZ+AQ\?Z<iw2Y\Wv+ACB%iK\^]qpY\;BHIK\^G<w JOdGw2Y\z\nYBHx~v+AQ\WOdAdAB%DK\w2+] \%\waiv\%JfRHkBSP^w2OdDOdw2OQ\%Jfw2BHw£iOQ\^AQX WRMOdG BP%RSx~xy\^] P^OCBHAI<BHxy\;w2Y\^] \;OdAdA.GY\%\%XFw R9Z\zxyRS] \

•cBH]2x~OdG+IBSP^w2OdDOdw2OQ\%J(BSJ(w R0X+R£Odw2kjBH]2x~OdG+Iw2Y\zACBHGYXp

• RSxy\BSP^w2OdDOdw2OQ\%J9BH] \yw2Y\BSP^w2OdDOdw2OQ\%J?\yBHG<ww2Y\WBHIK\^G<w Jw RZ\WBHZ+AQ\ w R?v\^]7kRS]2xOdw2+OdGw2Y\P%RSG[¤YGY\%J RHk"w2Y\^Od]YRSxy\rp

•V<vBH] \n_jw2Odxy\fBSP^w2OdDOdw2OQ\%JBH] \FBHG<iÌBSP^w2OdDOdw2OQ\%J9\OQJ2w2Y\BHIK\^G<w9w RfZ\X+RSOdG+I0Y\^GÉOdw?OQJGYRSwRS]2tOdG+IRS]WBHwYRSxy\rp

Y\ WR] \%P%\^OdDK\%Xkl] RSx \qBSP ÎBSP^w2OdDOdwaiwaiv\~OQJOdG[YsY\^GYP%\%XyZ<i~w2Y\ BHIK\^G<w J|waiv\rM<\rp IYp<B;J RSAQX[OQ\^]OdAdAGYRSw|IK\^wx?sYP WRfkl] RSxX+RSOdG+IkjBH]2x~OdG+I9BSP^w2OdDOdw2OQ\%JY\^] \qBSJ;Odw;OdAdAZ\(w2Y\9xy\qBHG+OdG+I£RHk\n[OQJ2w \^GYP%\(kRS]B?kjBH]2xy\^]qp + 9r C<Y\zxyRSw2OdDrBHw2OQRSGFkRS] BHAdAkjBH]2x~OdG+IBSP^w2OdDOdw2OQ\%J BH] \;OdG[_YsY\^GYP%\%XyZ<i~w2Y\J \qBSJ RSGBHGYX~w2Y\w2Odxy\ RHkXYB%iKpY\kjBH]2x~OdG+IBSP^w2OdDOdw2OQ\%JWBH] \

•§AQRSs+Ir+OdG+IOdw2bB~v+AQRSs+IrbBHGYX0B9YRS] J \rpY\xyRrJ2w\ P^OQ\^G<wB%iRHk.v+AQRSs+Ir+OdG+IYp \%<s+Od] \%JB¤\^AQXMB(v+AQRSs+Ir.MBHGYXfB9YRS] J \rp

•§AQRSs+Ir+OdG+IOdw25BÎv+AQRSs+Ir.p k9GYRÌYRS] J \OQJB%DrBHOdACBHZ+AQ\rMWv+AQRSs+Ir+OdG+I¢Odw2&BÉv+AQRSs+Ir6OQJw2Y\xyRrJ2w&\ P^OQ\^G<wqp  \%<s+Od] \%JÄB¤\^AQX BHGYX Bv+AQRSs+Ir.p

•§AQRSs+Ir+OdG+I(Odw2BJ2YRqDK\^AjpY\AQ\qBSJ2w|\ P^OQ\^G<wB%iÌRHk;v+AQRSs+Ir+OdG+IYp \%<s+Od] \%JB£¤\^AQXBHGYXBJ2YRqDK\^Ajp

Y\ WR] \%P%\^OdDK\%Xkl] RSx*\qBSP FkjBH]2x~OdG+IBSP^w2OdDOdwaiOQJOdG[YsY\^GYP%\%XZ<i9w2Y\WBHIK\^G<w J\^GY\^]2Iri~AQ\^DK\^AjM\rp IYpKOkw2Y\BHIK\^G<wBSJB?AQRq&\^GY\^]2IriAQ\^DK\^AjM[OdwOQJGYRSw\n[v\%X[OQ\^G<ww R~v+AQRSs+IrfOdw20B9J2YRqDK\^Ajp £9r C<Y\;YRSxy\?BSP^w2OdDOdw2OQ\%JWBH] \

•|AQ\qBHG+OdG+IYp Y\0xyRSw2OdDrBHw2OQRSGkRS]FP^AQ\qBHG+OdG+IÎOQJBU¡\%P^w \%XZ<iw2Y\zw2Odxy\zRHk"XYB%iFBHGYXw2Y\;J \qBSJ RSG.p

A Bayesian Network Framework for the Construction of Virtual Agents with Human-like Behaviour 37

Page 48: Proceedings of the Third European Workshop on Probabilistic

•§ACB%iOdG+IOdw25P +OdAQX[] \^G.p$ \%<s+Od] \%JbBHw£AQ\qBSJ2wRSGY\9P +OdAQXpzY\(xyRSw2OdDrBHw2OQRSGbOQJWOdG[YsY\^GYP%\%X0Z<iw2Y\?BHIK\^G<w J \^GY\^]2IriAQ\^DK\^AjpY\ WR] \%P%\^OdDK\%XOQJFOdG[YsY\^GYP%\%X6Z<iw2Y\bw2Odxy\RHk9XYB%i6BHGYXw2Y\BHIK\^G<w JWP BH]2OQJ2xB[p

•§ACB%iOdG+IOdw2ÄJ2vRSsYJ \rp= \%<s+Od] \%J£BÎJ2vRSsYJ \rpY\xyRSw2OdDrBHw2OQRSGOQJOdG[YsY\^GYP%\%XZ<i9w2Y\ J \qBSJ RSG.Mw2Y\¢BHIK\^G<w J\^GY\^]2Iri AQ\^DK\^AyBHGYX5w2Y\ÎJ2vRSsYJ \%JP BH]2OQJ2xB[p Y\WROQJFOdG[YsY\^GYP%\%X&Z<iw2Y\w2Odxy\RHkXYB%ifBHGYXFw2Y\?BHIK\^G<w J P BH]2OQJ2xB[p

+ 09r C<Y\;J2vBH] \n_jw2Odxy\(BSP^w2OdDOdw2OQ\%JWBH] \

•§s+Z0DOQJ2OdwqpY\xyRSw2OdDrBHw2OQRSG£kRS] IKRSOdG+Iw Ryw2Y\v+s+Z9OQJOdG[YsY\^GYP%\%X(Z<i?w2Y\BHIK\^G<w J"xBH]2OdwBHA[J2wBU_w2sYJ|BHGYX9\^GY\^]2Iri(AQ\^DK\^Ajp"Y\.WRFOQJ"OdG[YsY\^GYP%\%XZ<ibw2Y\9w2Odxy\yRHkXYB%iKMY\^w2Y\^]w2Y\yBHIK\^G<w?BSJP%RSx~vBHG<iKMBHGYXYRq x?sYP ÉxyRSGY\^i0w2Y\BHIK\^G<wBSJqp

•u BHw2OdG+IYpb \%<s+Od] \%J9B£XYBHw \rp0Y\xyRSw2OdDrBHw2OQRSGkRS]XYBHw2OdG+I~OQJOdG[YsY\^GYP%\%XZ<iw2Y\;BHIK\^G<w JxBH]7_OdwBHA;J2wBHw2sYJBHGYX\^GY\^]2IriAQ\^DK\^Ajp Y\&WR5OQJOdG[YsY\^GYP%\%XÉZ<iw2Y\yw2Odxy\RHk XYB%iKMw2Y\yxBH]2OdwBHAJ2wBHw2sYJqMSw2Y\BHIK\^G<w JP BH]2OQJ2xB[M<BHGYX(YRqÌx?sYP xyRSGY\^iw2Y\?BHIK\^G<wBSJqp

•§ACB%iOdG+IP Y\%J Jqp* \%<s+Od] \%J0BÌP Y\%J JfvBH]2w2GY\^]qpY\ xyRSw2OdDrBHw2OQRSGOQJOdG[YsY\^GYP%\%XZ<i~w2Y\w2Odxy\WRHkXYB%iKp(Y\$WRÌOQJzOdG[YsY\^GYP%\%XbZ<i0w2Y\~BHIK\^G<w JOdG<w \^AdAdOdIK\^GYP%\rp

GcOdIrs+] \yÊBHGÉ@;@WEN1kRS]zw2Y\~J2vBH] \n_jw2Odxy\yBSPn_w2OdDOdwaibwaiv\PqBHGZ\yJ \%\^G.pY\yIK\^GY\^]BHAJ2w2]2sYP^w2s+] \kRS]BSP^w2OdDOdwaiwaiv\?@;@WENWJPqBHGFZ\;\n[w2]BSP^w \%Xkl] RSxw2+OQJ.\nYBHx~v+AQ\rpY\vBH]BHxy\^w \^] J.kRS]w2Y\xyRX+\^AQJBH] \\^AdOQP^Odw \%Xkl] RSxw2Y\ I<BHxy\WX+\%J2OdIrGY\^]qM+BSJw2Y\^iyw \^GYXw RB%DK\BHG(s+]2IK\w R B%DK\P%RSG<w2] RSA+RqDK\^]"YRqÎP BH]BSP^w \^] JOdGw2Y\^Od]I<BHxy\%JZ\^B%DK\rpc+s+]2w2Y\^]2xyRS] \w2Y\^] \OdAdA<OdGIK\^GY\^]BHA+Z\GYRzXYBHwB;B%DrBHOdACBHZ+AQ\kRS]"w2Y\Z\^B%DOQRSs+]RHkI<BHxy\;P BH]BSP^w \^] JqMYw2Y\X+\%J2Od] \%XZ\^B%DOQRSs+] RHk"w2Y\^xGY\%\%XGYRSwZ\Ww2Y\;JBHxy\BSJw2BHwRHkBHG<ivRSv+s+ACBHw2OQRSGOdGf\n[OQJ2w \^GYP%\rpY\^] \ÉOQJbBHG5OdGYJ2wBHGYP%\kRS]b\qBSP BSP^w2OdDOdwai RSs+w7_

v+s+w2w2OdG+IbBFxy\qBSJ2s+] \RHkw2Y\ WRRHkw2BHw9BSP^w2OdDOdwaiKp |BSP ÄBSP^w2OdDOdwaiBHAQJ RBSJfBHGOdGYJ2wBHGYP%\RSs+w2v+s+w2w2OdG+Iw2Y\BHIK\^G<w J(xyRSw2OdDrBHw2OQRSGkRS]w2Y\J2v\%P^O¤PBSP^w2OdDOdwaiKpY\v+] \nk\^] \^GYP%\kRS] w2Y\9BSP^w2OdDOdwaiFwaiv\OQJzB~vBH] \^G<w

RHkw2Y\;s+w2OdAdOdwaiGYRX+\rpY\zOdG+v+s+wGYRX+\%J BH] \;w2Y\z] \n_J RSs+] P%\%JqMw2Y\(X+\%P^OQJ2OQRSGbGYRX+\?OQJ;BX[s+x~x?i£DrBH]2OCBHZ+AQ\BHGYX?OQJsYJ \%XkRS]] \qBSX[OdG+Izw2Y\\n[v\%P^w \%X!WRFRHk\qBSP BSP^w2OdDOdwaiKp U b Y\ X+\^DK\^AQRSv\%XF@;@WENxyRX+\^AQJJ2s+v+vRS]2w|w2Y\ `VOdG¢w2+] \%\fB%i[JqÍWY\b@;@WENWJBH] \fxyRX+\^AQJRHk\n_v\%P^w \%X WRÍ|\£PqBHG\^G<w \^]\^DK\^G<w Jw Rw2Y\FxyRX_\^AQJw R9IK\^ww2Y\z] \%J2s+AdwBHG<w P BHG+IK\?RHk"\n[v\%P^w \%X4WR¥½vBH]2wRHk[w2Y\wBSJ2tzRHk[w2Y\RSx~vBH]BHw RS]"RHk[w2Y\ `V+¦npY\ `V&J2YRSs+AQXÄRSs+w2v+s+wBHGÄs+vXYBHw \%X −−→

EPM?J R

Odw;GY\%\%X+Jw RfX+\^w \^]2x~OdGY\w2Y\yJ2w2] \^G+Irw2RHkw2Y\y\^xyRH_w2OQRSGBHA] \%J2vRSGYJ \Ww R9w2Y\;\^DK\^G<w(¥½vBH]2wRHkw2Y\zwBSJ2tRHkw2Y\ DrBHAdsBHw RS]n¦nÍvBH]2w|RHkw2+OQJwBSJ2t9OdAdAYOdGYP^AdsYX+\w2Y\J2Od¨q\ RHkw2Y\WP BHG+IK\WOdG\n[v\%P^w \%X WRpY\;@;@WENxyRX+\^AQJPqBHG¢v+] RqDOQX+\bBb]BHG+tK\%X¢AdOQJ2wRHkPqBHGYX[OQXYBHw \BSP^w2OdDOdw2OQ\%J+OQP FOQJsYJ \%XOdGFP%RSGST7s+GYP^w2OQRSGFOdw2 −−→

EPw RP YRRrJ \Fw2Y\£BSP^w2OdDOdwaiÎw Rv\^]7kRS]2x ¥½w2Y\FwBSJ2tÌRHkw2Y\ P^w2OQRSG0§] RSvRrJ \^]n¦np G£w2Y\;kRSAdAQRqOdG+IFJ \%P^w2OQRSGYJ \(X+\%J P^]2OdZ\OdG0xyRS] \

X+\^wBHOdA.YRq&w2Y\;xyRX+\^AQJJ2s+v+vRS]2ww2Y\; `Vp

2S²a«*/¬*0=®¯*0 + 27®.­£¬*0 ® )­3)¬r®.­

§"BH]2w(RHkw2Y\y `VbOQJw2Y\RSx~vBH]BHw RS](w2BHw(P%RSx9_vBH] \%J?w2Y\BHIK\^G<w J~\n[v\%P^w \%XWRZ\nkRS] \BHGYXÎBUk½_w \^]BHG\^DK\^G<wBSJRP%P^s+] \%Xp Gw2+OQJJ \%P^w2OQRSG\ OdAdAJ2YRqÌYRqÌw2Y\ @;@WENxyRX+\^AQJPqBHG~Z\sYJ \%X9w RWw2+OQJ\^GYXp G<i¢\^DK\^G<wFPqBHGZ\fxyRX+\^AQ\%XZ<iÌYRq OdwOdG[Ys[_

\^GYP%\%Jw2Y\BHIK\^G<w JF] \%J RSs+] P%\%JqM \rp IYpBw2Y\nklwF\^DK\^G<w] \%J2s+Adw J9OdGÎw2Y\FJ2w RSAQ\^GÎ] \%J RSs+] P%\%J~GYRSw9Z\^OdG+IB%DrBHOdA_BHZ+AQ\zkRS]w2Y\?BHIK\^G<wqp@ GYP%\(BHG£\^DK\^G<w OQJxyRX+\^AQ\%XFOdwPqBHGÌZ\OdGYJ \^]2w \%XÎOdG<w Rw2Y\fBSP^w2OdDOdwaiÉwaiv\£@;@WENWJBHGYX£w2Y\WRÎiOQ\^AQX+\%X£Z<i£w2Y\?Z\%J2w;BSP^w2OdDOdwai£IrOdDK\^G−−→EP

PqBHGZ\ P%RSx~vBH] \%Xw R?w2Y\ WRbiOQ\^AQX+\%XZ\nkRS] \w2Y\\^DK\^G<w|BSJOdG<w2] RX[sYP%\%Xp+OQJIrOdDK\%Jw2Y\P BHG+IK\OdG\n[v\%P^w \%X WRÎkRS]Ww2Y\~\^DK\^G<wqp?NWRSw \(w2BHw;\qBSP ] \%J RSs+] P%\WBHAQJ RBSJB WRbBHw2wBSP Y\%XkRS]RqGY\^] J2+Odv.MOjp \rp+w2Y\(BHIK\^G<wWIK\^w J WRkRS]B%DOdG+Iw2Y\] \%J RSs+] P%\rMBHGYX¢Okw2Y\f] \%J RSs+] P%\fOQJAQRrJ2ww2+OQJOdAdAzBHAQJ RÉBU¡\%P^ww2Y\P BHG+IK\OdG£\n[v\%P^w \%X4WRpY\\^xyRSw2OQRSGBHA] \n_J2vRSGYJ \Ww2Y\^GfX+\^v\^GYX+JRSGfw2+OQJP BHG+IK\(BSJRSs+w2AdOdGY\%XOdG0V\%P^w2OQRSGÉÅrp GbcOdIrs+] \#BHG@;@WEN1kRS]Ww2Y\9\^DK\^G<w;w2BHwzw2Y\

BHIK\^G<w J?xyRSGY\^iOQJ?J2w RSAQ\^GÉPqBHGZ\J \%\^G.pY\RSs+w7_v+s+wFGYRX+\ÎÒb´S¼¶nºJ2YRSs+AQX&P%RSG<wBHOdG&w2Y\\n[v\%P^w \%X

38 O. Bangsø, N. Søndberg-Madsen, and F. V. Jensen

Page 49: Proceedings of the Third European Workshop on Probabilistic

cOdIrs+] \?Ê GfRSZ[T2\%P^w RS]2OQ\^G<w \%X£EN&kRS]w2Y\J2vBH] \n_jw2Odxy\?BSP^w2OdDOdw2OQ\%Jqp

cOdIrs+] \# xyRX+\^A+kRS]B;w2Y\nklw|\^DK\^G<wz¥½w2Y\nklwRHkw2Y\BHIK\^G<w J.xyRSGY\^iY¦npY\.Þ+¶½Ú^ÐSGYRX+\"xyRX+\^AQJ.Y\^w2Y\^]w2Y\\^DK\^G<wBSJRP%P^s+] \%XfRS]Ok"OdwOQJ B9w2+] \qBHwqp

DrBHAdsY\?kRS]Ww2Y\~BHIK\^G<w J;] \%J RSs+] P%\(xyRSGY\^iKMIrOdDK\^Gbw2Y\P^s+]2] \^G<wyDrBHAdsY\£BHGYXÎw2Y\v+] RSZBHZ+OdAdOdwai¢RHkWw2Y\f\^DK\^G<wRP%P^s+]2]2OdG+IYp*Y\OdG+v+s+w£GYRX+\ÎÒb´S¼¶nºJ2YRSs+AQX6Z\BSJ J RP^OCBHw \%XÉOdw2Îw2Y\BHIK\^G<w JÒb´S¼¶nºGYRX+\¥Ckl] RSxw2Y\BSP^w2OdDOdwaibxyRX+\^AQJn¦npY\~GYRX+\.Þ+¶½Ú^ÐP%RSG<wBHOdGYJw2Y\v+] RSZBHZ+OdAdOdwai~w2BHww2Y\\^DK\^G<wOdAdAYRP%P^s+]qMrOjp \rpSw2Y\v+] RSZBHZ+OdAdOdwaiw2BHww2Y\~w2+] \qBHw?OdAdA|Z\~PqBH]2]2OQ\%XRSs+wqp kw2+OQJ.Þ+¶½Ú^ÐGYRX+\?OQJWOdGbw2Y\(J2wBHw \?kjBHAQJ \rMw2Y\(RSs+w7_v+s+w|GYRX+\;Òb´S¼¶nºOdAdAB%DK\ w2Y\ JBHxy\X[OQJ2w2]2OdZ+s+w2OQRSGBSJ;w2Y\9OdG+v+s+wGYRX+\Òb´S¼¶nºrMOjp \rp9GYRw2Y\nklw;BSJRPn_P^s+]2] \%XM|J R0w2Y\FBHIK\^G<w J~xyRSGY\^iBSJ9GYRSw9Z\%\^G¢BUk½_k\%P^w \%Xp kOdwOQJ?OdGw2Y\J2wBHw \yw2]2sY\rM"w2Y\\^DK\^G<w(BSJRP%P^s+] \%XyJ RWw2Y\ BHIK\^G<w JxyRSGY\^i9OdAdA+Z\BU¡\%P^w \%X~Z<iw2Y\;\^DK\^G<wqMBHGYXw2Y\;RSs+w2v+s+wzÒb´S¼¶nºzOdAdAw2Y\^GfP%RSG[_

wBHOdG0w2Y\?GY\^ DrBHAdsY\kRS]Ww2Y\9BHIK\^G<w JWxyRSGY\^iKpz+OQJxyRX+\^AOQJ~OdGYJ \^]2w \%XÌOdG¢w2Y\£BSP^w2OdDOdwaiÎwaiv\0@;@WENWJZ\^wa\%\^G¢w2Y\`0RSGY\^iOdG+v+s+w~GYRX+\FBHGYXÉw2Y\FBSP^w2OdD<_Odw2OQ\%JFY\^] \`0RSGY\^iOQJ£BHG6OdG+v+s+wfGYRX+\rp=+OQJFOQJJ2YRqGfOdG£cOdIrs+] \ %Yp

£°¬3+a²a«*0®¬*0 ),+a°*)¬r®.­ )«|¯5¬*0-£±¬K²j®«­K®® 2 0Y­

Gw2+OQJ(J \%P^w2OQRSGÉ\RSs+w2AdOdGY\OQX+\qBSJ(P%RSGYP%\^]2G+OdG+I0w2Y\ DrBHAdsBHw RS]fBHGYXw2Y\ P^w2OQRSG6§] RSvRrJ \^]qMw2Y\%J \bBH] \J2w2OdAdAv+] \^AdOdx~OdGBH]2iKMBHGYX¢J2YRSs+AQXÎZ\FJ \%\^GBSJ9w RSv+OQP%JkRS]kls+w2s+] \;] \%J \qBH] P .pY\ DrBHAdsBHw RS]X+\^w \^]2x~OdGY\%J"w R +OQP ~\n[w \^G<ww2Y\

kRSs+];\^xyRSw2OQRSGYJ?BH] \(OdG[YsY\^GYP%\%XZ<i£w2Y\9\^DK\^G<wqp u \n_w \^]2x~OdG+OdG+Iw2Y\\^xyRSw2OQRSGBHA?OdG[YsY\^GYP%\OQJfX+RSGY\BSPn_P%RS] X[OdG+Iw R0J RSxy\J2Odx~v+AQ\]2s+AQ\%J9sYJ2OdG+I0w2Y\P BHG+IK\OdGf\n[v\%P^w \%X4WRÉBSJOdG+v+s+w

Årp|§RrJ2Odw2OdDK\(P BHG+IK\?OdG0\n[v\%P^w \%XWR] \%J2s+Adw J OdGT2RqiKp

Ê[p|NW\^I<BHw2OdDK\ P BHG+IK\ OdG \n[v\%P^w \%X WRMw2Y\P BHG+IK\PqBHG+GYRSw(\qBSJ2OdAdi0Z\~] \%P^w2O¤\%X¥l\rp IYpyw2Y\x?s+] X+\^]WRHkB(AQRqDK\%X£RSGY\U¦nM+] \%J2s+Adw JOdGfJ RS]2] Rqp

A Bayesian Network Framework for the Construction of Virtual Agents with Human-like Behaviour 39

Page 50: Proceedings of the Third European Workshop on Probabilistic

cOdIrs+] \.% Y\V<vBH] \n_jw2Odxy\;BSP^w2OdDOdwaiwaiv\@;@WEN6Y\^] \;BxyRX+\^AkRS]Bw2Y\nklw\^DK\^G<w?¥½w2Y\nklwRHkw2Y\zBHIK\^G<w JxyRSGY\^iY¦BSJZ\%\^GFOdGYJ \^]2w \%Xp

#[pNW\^I<BHw2OdDK\!P BHG+IK\ OdG \n[v\%P^w \%X WRMw2Y\P BHG+IK\9PqBHG£Z\;] \%P^w2O¤\%X£Odw2YRSs+w;B~AQRSwWRHk|\nk½_kRS]2wqM+] \%J2s+Adw JOdG0BHG+IK\^]qp

%YpNW\^I<BHw2OdDK\P BHG+IK\OdGÉ\n[v\%P^w \%XWR6BSJ(BF] \n_J2s+AdwRHkBw2+] \qBHwqM.w2Y\yBHG<wBHIKRSG+OQJ2w(PqBHGZ\~X+\n_k\qBHw \%XM+] \%J2s+Adw JOdG£BHG+IK\^]qp

Ç[pNW\^I<BHw2OdDK\P BHG+IK\OdGÉ\n[v\%P^w \%XWR6BSJ(BF] \n_J2s+AdwyRHkzB0w2+] \qBHwqMw2Y\FBHG<wBHIKRSG+OQJ2wyOQJ9GY\n[w9w ROdx~vRrJ J2OdZ+AQ\;w R~X+\nk\qBHwqM+] \%J2s+Adw JOdGFk\qBH]qp

u OQJ2w2OdG+Irs+OQJ2+OdG+I;Z\^wa\%\^GJ2Odw2sBHw2OQRSGYJY\^] \Ê[pRS]#[p9OQJz¤Y] \%XOQJ;GYRSw(Bw2]2OdDOCBHAxBHw2w \^]qp GBHv+v+] R%[O_xBHw2OQRSGPqBHGZ\yBSP +OQ\^DK\%XZ<ibxBHtOdG+I£EN xyRX+\^AQJRHkWw2Y\X[OP^s+AdwaiÌRHkWIK\^w2w2OdG+I] \%J RSs+] P%\%JqMBHGYXÎw2Y\^GX+\^w \^]2x~OdG+OdG+I0YRqãX[OP^s+Adw(Odw?OQJw RfZ+]2OdG+I£w2Y\\n_v\%P^w \%X4WRÉZBSP tFw RBHw AQ\qBSJ2wBHw Odw BSJ Z\nkRS] \w2Y\ \^DK\^G<wqpY\%J \xyRX+\^AQJOdAdA] \%J \^x?Z+AQ\Ww2Y\xyRX_\^AQJBHAd] \qBSX[iX+\%J P^]2OdZ\%Xp Y\bX[O¡\^] \^GYP%\%JOdAdAzZ\w2BHwBSP^w2OdDOdw2OQ\%J BH] \Wv+ACBHGYJkRS]BSP%<s+Od]2OdG+I~] \%J RSs+] P%\%JqMw2Y\BSP^w2OdDOdwaiwaiv\%JBH] \X[O¡\^] \^G<wv+ACBHGYJkRS] BSP%<s+Od]7_OdG+I9B] \%J RSs+] P%\rM+BHGYXyOdGYJ2w \qBSXRHk\n[v\%P^w \%X WR0w2Y\s+w2OdAdOdw2OQ\%J"OdAdA+Z\B xy\qBSJ2s+] \RHkw2Y\\n[v\%P^w \%X9OdGYP%RSG[_DK\^G+OQ\^GYP%\rpY\|v+ACBHGYJ.OdAdA+BHAQJ ROdGYP^AdsYX+\BHG?OdGYJ2wBHGYP%\

RHkBFP^ACBSJ JRSs+w2v+s+w2w2OdG+Ifw2Y\~v+] RSZBHZ+OdAdOdwaiRHkJ2sYP%P%\%J JRHkw2Y\?v+ACBHG.MsYJ \%X£w RX+\^w \^]2x~OdGY\(w2Y\(\n[v\%P^w \%Xb\nk½_k\%P^wRSGyw2Y\] \%J RSs+] P%\%J|RHk.BHw2w \^x~v+w2OdG+I?w2Y\v+ACBHGBHGYXkRS] BSJ J \%J J2OdG+I9w2Y\X[OP^s+AdwaiKp+OQJ|BHv+v+] R%[OdxBHw2OQRSGyxy\qBHGYJw2BHw\]2OQJ2t?OdIrGYRS]7_

OdG+IÎw2Y\bx?s+] X+\^]£RHk~BOk\rM s+G+AQ\%J JFw2Y\ WR1kRS]RqG+OdG+IRSGY\FOQJyJ \^w~+OdIr\^GYRSs+Ir.MBSJ~w2Y\F+OdIrY\%J2w\n[v\%P^w \%XWRxB%iGYRSw?OdGYP^AdsYX+\BFOk\rp u OQJ2w2OdG[_Irs+OQJ2+OdG+IÌZ\^wa\%\^G %Yp=BHGYX&Ç[pPqBHG&Z\bX+RSGY\bZ<iP%RSx~vBH]2OdG+IÎw2Y\£] \%J RSs+] P%\%JRHkw2Y\bBHIK\^G<wfBHGYXw2Y\] \%J RSs+] P%\%JRHk"w2Y\?BHG<wBHIKRSG+OQJ2wqpu \^w \^]2x~OdG+OdG+IFw2Y\9J2w2] \^G+Irw2RHk|w2Y\(OdG[YsY\^GYP%\~RSG

w2Y\\^xyRSw2OQRSGYJfOQJ0BÎx~O[w2s+] \RHk9w2Y\X[OP^s+Adwai&RHkRqDK\^] P%RSx~OdG+Iw2Y\v+] RSZ+AQ\^xBHGYX9YRqJ \^]2OQRSsYJ"w2Y\\n_v\%P^w \%X$WRFOQJBU¡\%P^w \%XMrOjp \rpSYRq¢ACBH]2IK\w2Y\P BHG+IK\OdG \n[v\%P^w \%X WRãOQJqp cOdGYX[OdG+Iw2Y\ÉJ2w2] \^G+Irw25OQJX+RSGY\FX[s+]2OdG+Iw2Y\FX+\^w \^]2x~OdGBHw2OQRSGRHkzw2Y\F\^xyRSw2OQRSGBU¡\%P^w \%XM?w2Y\ÎJ \^]2OQRSsYJ2GY\%J J0OQJ£w2Y\ÎRSs+w2v+s+wRHkw2Y\RSx~vBH]BHw RS]qpY\ P^w2OQRSG§] RSvRrJ \^]zJ2YRSs+AQXbRSs+w2v+s+wWw2Y\?GY\n[w

w2+OdG+I0kRS](w2Y\FBHIK\^G<w~w RbX+R+p w(IK\^w JyBSJ(OdG+v+s+wyBHG−−→EP

RHkw2Y\9BHIK\^G<w;BHGYXbB~AdOQJ2wWRHkBSP^w2OdDOdw2OQ\%JqM]BHG+tK\%XZ<iw2Y\^Od] \n[v\%P^w \%X WRp u O¡\^] \^G<wzBHIK\^G<w JOdAdA] \n_BSP^wyX[O¡\^] \^G<w2AdiÎw RBHG¢\^xyRSw2OQRSGBHAWP BHG+IK\rM\rp IYp|BSJ

40 O. Bangsø, N. Søndberg-Madsen, and F. V. Jensen

Page 51: Proceedings of the Third European Workshop on Probabilistic

B£Z+s+AdAdiÉwaiv\FBHIK\^G<wyOQJyBHG+IK\^] \%XÎOdw~OQJ9xyRS] \AdOdtK\^Adiw RbP YRRrJ \FBHIrIr] \%J J2OdDK\FBSP^w2OdDOdw2OQ\%JqM+OdAQ\fBfvBSP^O¤J2wBHIK\^G<wOdAdA] \qBSP^w;w R+OdIrY\^](BHG+IK\^]Z<i0Z\^OdG+IFxyRS] \AdOdtK\^Adi w RP YRRrJ \¢BSP^w2OdDOdw2OQ\%JY\^] \Îw2Y\¢BHIK\^G<wOQJBHAQRSGY\rpy\y\^G<DOQJ2OQRSGÎBJ2i[J2w \^xÕY\^] \\qBSP ÎBHIK\^G<wBSJ£BHG ÏÀy´rÐl»´S¼¹Sµ..¹KÝ%Ðj´SÙbÓS¶2Ý%Ðj´SÙ%¥ −−→

EF¦Odw2 BHG

\^G<w2]BHGYP%\?kRS]z\qBSP b\^xyRSw2OQRSGBSJ J RP^OCBHw \%X£Odw2b\qBSP BSP^w2OdDOdwaiKpY\ P^w2OQRSG§] RSvRrJ \^]|OdAdAYw2Y\^GykRS]|\qBSP BSP^w2OdDOdwaiKMWx?s+Adw2Odv+Adiw2Y\X+RSw7_jv+] RX[sYP^w£RHk −−→

EPBHGYX

−−→EF

Odw2Îw2Y\\n[v\%P^w \%X5WRiOQ\^AQX[OdG+I0w2Y\fÏÀy´S·Ðl»´S¼¹Sµ+¹Sµ ÜY¶%¥

EV¦

EV = QoL ∗−−→EP ·

−−→EF

RWB%DKRSOQX?v+] \%X[OQP^wBHZ+OdAdOdwai\OdAdA<v+OQP t?BHG(BSP^w2OdDOdwaiBHw]BHGYX+RSx sYJ2OdG+I~w2Y\%J \;DrBHAdsY\%J BSJ\^OdIr<w JqpWJ2OdG+IÉRSG+AdiÌw2+OQJOdAdA ] \%J2s+AdwOdGw2Y\bBHIK\^G<w JFBHA_

B%i[Jv\^]7kRS]2x~OdG+I£J RSxy\BSP^w2OdDOdwaiKMw2<sYJ?GY\^DK\^]w2]2i<_OdG+Iw R;Z\^w2w \^]w2Y\^Od]J2Odw2sBHw2OQRSGyZ<i9w2]2iOdG+Iw R?BSP%<s+Od] \xyRS] \~] \%J RSs+] P%\%Jqp Jz\BHAd] \qBSX[ibGY\%\%XxyRX+\^AQJw RBSJ J \%J Jw2Y\|X[OP^s+AdwaiRHk¥½] \U¦7BSP%<s+Od]2OdG+IW] \%J RSs+] P%\%J.kRS]w2Y\ DrBHAdsBHw RS]qMY\;x~OdIr<wzBSJ\^AdAsYJ \zw2Y\%J \;xyRX_\^AQJ~kRS]X+\^w \^]2x~OdG+OdG+IÉY\^w2Y\^]w2Y\£BHIK\^G<w JJ2YRSs+AQXw2]2iw R~BSP%<s+Od] \z] \%J RSs+] P%\%JqpY\;\n[v\%P^w \%XOdGYP%RSG<DK\n_G+OQ\^GYP%\(RHk|\qBSP bv+ACBHGbPqBHG£Z\?J2s+Z+w2]BSP^w \%X£kl] RSx*w2Y\\n[v\%P^w \%X WR RHkzw2Y\fJ2Odw2sBHw2OQRSGY\^] \Fw2Y\Fv+ACBHGBSJ|Z\%\^GBHw2w \^x~v+w \%Xp+OQJDrBHAdsY\OQJ|w2Y\^GsYJ \%X~kRS]w2Y\£PqBHAQP^s+ACBHw2OQRSG6RHk;w2Y\

EVRHk;w2Y\£v+ACBHG.p6§ACBHGYJ

kRS] BSP%<s+Od]2OdG+I] \%J RSs+] P%\%J PqBHGfw2Y\^GfZ\;sYJ \%XFZ<iw2Y\ P^w2OQRSGɧ] RSvRrJ \^]?OdGw2Y\JBHxy\~B%iBSJ(BSP^w2OdDOdw2OQ\%JqpNWRSw \Ww2BHww2+OQJBHv+v+] RKBSP FOQJx?iKRSv+OQPrM+Z+s+wPqBHGF\qBSJ7_OdAdi?Z\\n[vBHGYX+\%X~w RWOdGYP^AdsYX+\ BHG<i(G<s+x?Z\^]|RHkJ2s+ZYJ \n_<sY\^G<wv+ACBHGYJqp"+OQJqMSYRq\^DK\^]qMP%RSxy\%JBHww2Y\P%RrJ2wRHkBHGy\n[vRSGY\^G<w2OCBHA+Ir] Rqw2yOdG~w2Y\G<s+x?Z\^]|RHkvRrJ J2OdZ+AQ\P%RSx?Z+OdGBHw2OQRSGbRHkv+ACBHGYJw Ry\^DrBHAdsBHw \rp Adw \^]2GBHw2OdDK\^AdiB Ir] \%\%X[i9BHv+v+] RKBSP ~xB%i?Z\BHw2w \^x~v+w \%XMKY\^] \w2Y\¤Y] J2wzJ2w \^v0OQJWw Ry¤YGYX0w2Y\(v+ACBHG.MRS]zv+ACBHGYJqMOdw20w2Y\+OdIrY\%J2w\n[v\%P^w \%X WR¢¥BUklw \^]J2s+Z+w2]BSP^w2OdG+I9w2Y\W\n_v\%P^w \%X£OdGYP%RSG<DK\^G+OQ\^GYP%\~RHk|w2Y\?v+ACBHG¦nMw2Y\^G0w2Y\(\n_v\%P^w \%XÌJ2Odw2sBHw2OQRSGBUklw \^]yPqBH]2]2iOdG+IRSs+w~w2Y\v+ACBHGÌOQJsYJ \%Xfw Ry¤YGYX£w2Y\Z\%J2w v+ACBHG0w R\n+\%P^s+w \?GY\n[wqMBHGYXJ RbRSG.Ms+G<w2OdAGYR0I<BHOdGÎOdGÌ\n[v\%P^w \%X WR6OQJ(vRrJ J2O_Z+AQ\rpY\9Ir] \%\%X[ibBHv+v+] RKBSP OdAdA"B%DK\9w2Y\~BSX[DrBHG[_wBHIK\RHk IrOdDOdG+IB£Z\^w2w \^]$WR6\%J2w2OdxBHw \rMZ+s+w9OdAdAwBHtK\zAQRSG+IK\^]w R~PqBHAQP^s+ACBHw \rM[kRS]\qBSP £J2w \^vFOdGFw2Y\;J \n_<sY\^GYP%\rMK\qBSP ~vRrJ J2OdZ+AQ\|GY\n[wJ2w \^v9GY\%\%X+J"w RWZ\\^DrBHA_sBHw \%Xp£c+s+]2w2Y\^]2xyRS] \w2+OQJ9BHv+v+] RKBSP ÎOdAdAOdGYP^AdsYX+\

RSG+Adi?RSGY\P YRSOQP%\RHk] \%J RSs+] P%\BSP%<s+OQJ2Odw2OQRSGUÍSw2Y\|Z\%J2wJ \%<sY\^GYP%\ RHk.v+ACBHGYJqMJ R?RSGYP%\ w2Y\

EVOQJPqBHAQP^s+ACBHw \%XM

\zPqBHG+GYRSwZ\WJ2s+] \Ww2BHww2+OQJBSJOdGkjBSP^ww2Y\WZ\%J2wJ \%<sY\^GYP%\rpR&B%DKRSOQX5w2Y\ÉDrBHAdsY\ÎRHk\^xyRSw2OQRSGYJbZ\^OdG+I6P%RSG[_

J2wBHG<w2AdiIr] RqOdG+IYM|\OdAdAxBHOdG<wBHOdGB£AdOQJ2w~RHk w2Y\P%RSG<w2]2OdZ+s+w2OQRSGYJWw R −−→

EPMBHGYXFAQ\^w w2Y\DrBHAdsY\%J RHk\qBSP

P%RSG<w2]2OdZ+s+w2OQRSGkjBSX+\~RqDK\^];w2Odxy\rpyVRSxy\~\^DK\^G<w JxB%iKMYRq\^DK\^]qMyZ\ÌJ RvRq\^]7kls+Ayw2BHww2Y\^i1BH] \ÎGY\^DK\^]P%RSx~v+AQ\^w \^Adi] \^xyRqDK\%Xkl] RSx −−→

EP¥½w2]BHs+xBSJn¦nM YRq

w2+OQJRP%P^s+] JOdG<s+xBHGYJOQJBxBHw2w \^]RHk;vYJ2i[P YRSA_RSIriKM9BHGYX OdwbOQJ0Z\^OdG+IOdG<DK\%J2w2OdI<BHw \%X5OdG5w2Y\Î.O_AdOdIr<w~v+] RHT2\%P^wqpÌ|s+]2] \^G<w\^DK\^G<w J~xB%iÌBHAQJ R0w2]2OdIrIK\^]w2Y\] \%P%RSAdAQ\%P^w2OQRSGRHkv+] \^DOQRSsYJW\^DK\^G<w JqMIrOdDOdG+Iw2Y\^x] \^GY\^\%XÉJ2w2] \^G+Irw2ÉOdG −−→

EPMw2+OQJ?OQJqMRSGYP%\BHI<BHOdG.MB

w2+OdG+Iw2BHwz\9YRSv\?w RAQRRStFkls+]2w2Y\^]zOdG<w RYRq1w ROdGYP^AdsYX+\rp

0±+²± )¬K²j®«),2 2

Gw2+OQJJ \%P^w2OQRSGf\zOdAdAIKR9w2+] RSs+Ir0BHAdAw2Y\zw2+OdG+IKJw2BHwGY\%\%Xw RZ\0J2v\%P^O¤\%X¢kRS]w2Y\0RSs+w2AdOdGY\%X6BHv[_v+] RKBSP w RZ\?Odx~v+AQ\^xy\^G<w \%XP%RSx~v+AQ\^w \^AdiKp?\~BHOdxBHwX+\^DK\^AQRSv+OdG+IB(J2i[J2w \^xkRS]BHs+w RSxBHw2OQPqBHAdAdiIK\^GY\^]7_BHw2OdG+I@;@WEN xyRX+\^AQJkl] RSx=w2Y\J2v\%P^O¤PqBHw2OQRSGYJqp

< H <Y\£] \%J RSs+] P%\%Jw2BHwx~OdIr<wFZ\0B%DrBHOdACBHZ+AQ\0kRS]w2Y\BHIK\^G<w JBHGYX?w2Y\^Od]"vRrJ J2OdZ+AQ\|DrBHAdsY\%JqpVRSxy\|] \%J RSs+] P%\%JBH] \IrAQRSZBHADrBHAdsY\%JqM\rp IYp9w2Y\ J \qBSJ RSG.MRSw2Y\^] JBH] \zBHw7_w2]2OdZ+s+w \%JWRHkw2Y\(BHIK\^G<w JqM\rp IYpP BH]2OQJ2xB[M+OdAQ\?RSw2[_\^] JBH] \RSZ[T2\%P^w J"RS]BHZYJ2w2]BSP^w2OQRSGYJw2BHwPqBHG9Z\|kRSs+GYXOdGw2Y\0I<BHxy\n_jRS]2AQXMz\rp IYpBv+AQRSs+IrOQJFBHGRSZ[T2\%P^w+OdAQ\xyRSGY\^ix~OdIr<wWZ\BHGbBHZYJ2w2]BSP^w2OQRSG.pNWR~xBHw7_w \^]BHww2Y\^ifBH] \rM+\;GY\%\%Xw R~tGYRq&\^] \;w R9IK\^ww2Y\bDrBHAdsY\%Jkl] RSxÖw2Y\bI<BHxy\rp=Y\WR RHk~\qBSP ] \%J RSs+] P%\Z\^OdG+IbOdGÌB0J2v\%P^O¤PJ2wBHw \x?sYJ2wyBHAQJ R£Z\J2v\%P^O¤\%XMBHAQRSG+IbOdw2¢B£J2wBHGYXYBH] XÉv+] \nk\^] \^GYP%\kRS]w2Y\f] \%J RSs+] P%\rp wyxB%iBHAQJ RZ\f\n[v\%X[OQ\^G<ww ROdG[_P^AdsYX+\9w2Y\9J2wBHGYXYBH] Xb]BHG+IK\yBHGB%DK\^]BHIK\BHIK\^G<w;OdAdAB%DK\ RHk.\qBSP ] \%J RSs+] P%\rMKY\^] \OdwxBHtK\%JJ \^GYJ \rMK\rp IYpOdwX+R\%J"GYRSw"xBHtK\J \^GYJ \|kRS]BW] \%J RSs+] P%\AdOdtK\J \qBSJ RSG.MZ+s+wOdwX+R\%JkRS]OdG<w \^AdAdOdIK\^GYP%\rp

! "$#%&' ()+*,"! !!-($&.($&'!!/!0&' "$&.+#213(5476)&!89:&<;=5>?';@( +*,"! !!A&+0&' &' B

A Bayesian Network Framework for the Construction of Virtual Agents with Human-like Behaviour 41

Page 52: Proceedings of the Third European Workshop on Probabilistic

9r C <9r C<

Y\BSP^w2OdDOdwaibwaiv\%J(BHAQRSG+IfOdw2ÎBHAdAw2Y\BSP^w2OdDOdw2OQ\%JOdGb\qBSP BSP^w2OdDOdwai£waiv\rpWcYRS]z\qBSP BSP^w2OdDOdwaiKMw2Y\?] \n_J RSs+] P%\%Jw2BHwOdG[YsY\^GYP%\ w2Y\xyRSw2OdDrBHw2OQRSGkRS]w2Y\ BSPn_w2OdDOdwaiKM.+OQP X[Od] \%P^w2OQRSG.M"BHGYXw2Y\~J2w2] \^G+Irw2RHkw2Y\OdG[YsY\^GYP%\fBHAQRSG+IbOdw2Ìw2Y\JBHxy\kRS](OdG[YsY\^GYP%\FRSGw2Y\ WRRHkw2Y\BSP^w2OdDOdwaiKp wOQJ9BHAQJ R£BFIKRRXOQX+\qBw RÉOdGYP^AdsYX+\BÉJ2wBHGYXYBH] Xv+] \nk\^] \^GYP%\0kRS]ZRSw2w2Y\BSP^w2OdDOdwai0waiv\%JBHGYX0w2Y\yBSP^w2OdDOdw2OQ\%Jqp( |BSP ÉBSP^w2OdDOdwaiJ2YRSs+AQXBHAQJ RZ\(IrOdDK\^GBykjBSP^w RS]zkRS];\qBSP \^xyRSw2OQRSG.MsYJ \%X9kRS]PqBHAQP^s+ACBHw2OdG+I9w2Y\

EVM<Z<i~w2Y\ P^w2OQRSG§] RH_

vRrJ \^]qp wxB%iZ\zIKRRXOQX+\qB9w R9AQ\^w BHAdA −−→EF

kRS] BHGBHIK\^G<wJ2s+x1w Rzw2Y\JBHxy\DrBHAdsY\rMKJ Rzw2BHw"¤YGY\n_jw2s+G+OdG+Iw2Y\Z\^B%DOQRS]RHkBHIK\^G<w JPqBHGZ\X+RSGY\ J RSAQ\^Adi~RSGyw2Y\v+] \nk\^] \^GYP%\%Jqp§ACBHGYJ~kRS]yBSP%<s+Od]2OdG+I] \%J RSs+] P%\%JJ2YRSs+AQXÎOdGYP^AdsYX+\

w2Y\JBHxy\BSJfBSP^w2OdDOdw2OQ\%JqMWOdw2&OdGYP%RSG<DK\^G+OQ\^GYP%\] \n_v+ACBSP^OdG+I WRpc+s+]2w2Y\^]2xyRS] \w2Y\$J2v\%P^O¤PqBHw2OQRSGJ2YRSs+AQXOdGYP^AdsYX+\|] \%J RSs+] P%\%Jw2BHwOdG[YsY\^GYP%\|w2Y\|v+] RSZ[_BHZ+OdAdOdwaiRHk?w2Y\£v+ACBHGZ\^OdG+IÎJ2sYP%P%\%J J7kls+AjMYRq!w2Y\^iOdG[YsY\^GYP%\yw2Y\~v+] RSZBHZ+OdAdOdwaiKMYRq] \%J RSs+] P%\%J(BH] \BUk½_k\%P^w \%XZ<iãBHw2w \^x~v+w2OdG+I1w2Y\v+ACBHG.MfBHGYXYRq ] \n_J RSs+] P%\%JBH] \0BU¡\%P^w \%X¢Z<i¢w2Y\fv+ACBHGZ\^OdG+IÉPqBH]2]2OQ\%XRSs+wJ2sYP%P%\%J J7kls+AdAdiKp

<y qcYRS]f\qBSP &waiv\RHk IK\^G<wfw2Y\v+] \nk\^] \^GYP%\%JfRHkyBHAdARHkw2Y\9BHZRqDK\rMY\^] \?w2Y\9BHIK\^G<wzwaiv\(X[O¡\^] Jkl] RSxBHG&B%DK\^]BHIK\BHIK\^G<wqMWGY\%\%X+Jw RÉZ\bJ2v\%P^O¤\%Xp k9Bv+] \nk\^] \^GYP%\|OQJ.s+GYJ2v\%P^O¤\%X?w2Y\|J2wBHGYXYBH] Xv+] \nk\^] \^GYP%\OdAdA.Z\zsYJ \%XpY\(BHIK\^G<wwaiv\PqBHG0BHAQJ R~OdGYP^AdsYX+\?BJ2wBHGYXYBH] Xy]BHG+IK\kRS]w2Y\] \%J RSs+] P%\%JY\^] \WBHGBHIK\^G<wRHkw2BHwwaiv\9OdAdAX[O¡\^]zkl] RSxÕw2Y\B%DK\^]BHIK\BHIK\^G<wqp IK\^G<wwaiv\%JxB%iBHAQJ RzB%DK\ −−→

EFJw2BHw|X[O¡\^]kl] RSx

w2Y\|B%DK\^]BHIK\BHIK\^G<wqMr\rp IYpqw2Y\Z+s+AdAdi;waiv\xy\^G<w2OQRSGY\%Xp wJ2w2OdAdAxBHtK\%JJ \^GYJ \zw R9AQ\^w −−→

EFJJ2s+x=w R(w2Y\;JBHxy\

DrBHAdsY\rM.\rp IYpw2Y\9Z+s+AdAdi0waiv\~BHIK\^G<w JOdGYP^AdOdGBHw2OQRSGw RZ\qBHw2OdG+Ifv\%RSv+AQ\~s+vOdAdA|X[] RSvxyRS] \~w2BHGw2Y\B%D<_\^]BHIK\;v\^] J RSG1 JqM[Y\^G~T2Rqi]2OQJ \%Jqp |BSP BHIK\^G<wGY\%\%X+J|B waiv\BHGYXyBzJ2v\%P^O¤PqBHw2OQRSGyRHk

w2Y\zv+] \nk\^] \^GYP%\%JY\^] \;OdwX[O¡\^] Jkl] RSx*B9J2wBHGYXYBH] XBHIK\^G<wRHkw2BHwwaiv\rMGYRSw \Ww2BHwv+] \nk\^] \^GYP%\%JkRS]BSPn_w2OdDOdwai Uv+ACBHGwaiv\%JBH] \ ¤++\%XOdGYJ2OQX+\zBHGFBHIK\^G<wwaiv\rp kJ RSxy\9] \%J RSs+] P%\9GY\%\%X+J;w RZ\(¤++\%XBHw?BJ2v\%P^O¤PDrBHAdsY\rMRS]|OdGYJ2OQX+\zB;]BHG+IK\ w2BHwX[O¡\^] Jkl] RSxw2Y\zB%D<_\^]BHIK\FBHIK\^G<wyRHkWw2BHw9waiv\rMOdw9x?sYJ2w9Z\J2v\%P^O¤\%Xp

wWOQJWsYJ2sBHAdAdi0ByIKRRX£OQX+\qBw RxBHtK\9J2s+] \?w2BHwzw2Y\] \%J RSs+] P%\%J?IrOdDK\^Gw R0BHGÎBHIK\^G<w?kjBHAdAOdw2+OdGÉBHw?OQJvRrJ J2OdZ+AQ\(kRS]w2BHw(BHIK\^G<w?waiv\~w RFB%DK\rp w;OQJ?BHAQJ RvRrJ J2OdZ+AQ\~w RfRqDK\^]2]2Odw \ −−→

EFJOdGw2Y\BHIK\^G<w(J2v\%P^O¤+_

PqBHw2OQRSG.p

-£±2«®4+ 0¯*/ 0«¬32Y\£BHs+w2YRS] JyRSs+AQX¢OdGÌvBH]2w2OQP^s+ACBH]AdOdtK\Fw Rw2BHG+tN OdtKRSACBUT iAQX[OdIÉkRS]DrBHAdsBHZ+AQ\X[OQJ P^sYJ J2OQRSGYJRSG+OQJOQX+\qBSJ9RSGÉw2Y\ `V+OQP Ì\B%DK\IrOdDK\^G¢BfGYRS]7_xBHw2OdDK\~BHG+IrAQ\rp;Y\~BHs+w2YRS] JWRSs+AQXbAdOdtK\(w Rw2BHG+t s+IrOdG¢kRS]v+] RqDOQX[OdG+IÌBw RRSAkRS]J2v\%P^O¤PqBHw2OQRSGRHk@;@WENWJqp.\|RSs+AQXyBHAQJ RAdOdtK\|w R w2BHG+tw2Y\BHGYRSG<i<_xyRSsYJ] \^DOQ\^\^] JkRS]DrBHAdsBHZ+AQ\P%RSx~xy\^G<w Jqp

0 0Y­30«|± 0 2 ! #"%$'&)(+*-,/.1024365879,;:=<?>83651@BADCE:-51FG3652H@BI6I 36J7%@B51<?@BKL ML$'"N O%P=P-QRO%S=&

T)U VV#W'?MYX+%Z[\Z?W' ]^[%Z#`_'&=&=&a3b:=cBI *ed\dDT f-ghW'[=_'&=&%_i@Bj-@BADkl3652H@BAmin3 7o>82bpDq r/s U M _'&=&%$tu9v`JHw8x=x=y=z-w=w%'JHw=1g| U =%Z)~+f'[#DZg)f=YMfB]^[%Z#

`m\Zf=8s= U f=[=W'MG1f U U #"==+1>@Y(+*7'J5362436j-@v`24AD0<B240A@*t),;*-243b*-5pD)W']M=[/ =[\Q\ ZsY~+[#

m T W'\Z[#?1/_'&=&%_mdD52H@R7'A:-24365872>@(h(uE*#F=@BI*t),;*-243b*-5p365t),;c?*#F-3b@?F(>:-A:=<B2H@BA?pDH~+fB[[#M8Q %f'|Z[Gf=\f=f=m \ZW U f=8=[?W-Z f=W UW'?W=DZ[?N/ U W-Z f='KL[BZfM'W'M|[#\[#W'??W U U [=[#KL[ U `f=[=

T W'%+~! r+ U U []^ |_'&=&=&^*.1J\F=*-kl5(+*-5JpD24AD0<B243b*-5 :-51F¢¡m@R.@B24362436j-@v`24AD0<B240A@Dp£¡[email protected]@Dp@B52H:-J243b*-5365¤m:-C=@DpD3b:-5i@B24k*-A¥op/H£~+fB[[#M %mf'lZ[h \Z[[%ZYH%Z[W-Z f=W U`¦§U f=MW/\Z ¨`BW U H%Z[ U U Q=[B[©8fB [BZs9f=[[B[=W'=[#|_'%_oQ_'=S

42 O. Bangsø, N. Søndberg-Madsen, and F. V. Jensen

Page 53: Proceedings of the Third European Workshop on Probabilistic

Loopy Propagation: the Convergence Error in Markov Networks

Janneke H. Bolt
Institute of Information and Computing Sciences, Utrecht University

P.O. Box 80.089, 3508 TB Utrecht, The Netherlands
[email protected]

Abstract

Loopy propagation provides for approximate reasoning with Bayesian networks. In previous research, we distinguished between two different types of error in the probabilities yielded by the algorithm: the cycling error and the convergence error. Other researchers analysed an equivalent algorithm for pairwise Markov networks. For such networks with just a simple loop, a relationship between the exact and the approximate probabilities was established. In their research, there appeared to be no equivalent for the convergence error, however. In this paper, we indicate that the convergence error in a Bayesian network is converted to a cycling error in the equivalent Markov network. Furthermore, we show that the prior convergence error in Markov networks is characterised by the fact that the previously mentioned relationship between the exact and the approximate probabilities cannot be derived for the loop node in which this error occurs.

1 Introduction

A Bayesian network uniquely defines a joint probability distribution and as such provides for computing any probability of interest over its variables. Reasoning with a Bayesian network, more specifically, amounts to computing (posterior) probability distributions for the variables involved. For networks without any topological restrictions, this reasoning task is known to be NP-hard (Cooper 1990). For networks with specific restricted topologies, however, efficient algorithms are available, such as Pearl's propagation algorithm for singly connected networks. Also the task of computing approximate probabilities with guaranteed error bounds is NP-hard in general (Dagum and Luby 1993). Although their results are not guaranteed to lie within given error bounds, various approximation algorithms are available that yield good results on many real-life networks. One of these algorithms is the loopy-propagation algorithm. The basic idea of this algorithm is to apply Pearl's propagation algorithm to a Bayesian network regardless of its topological structure. While the algorithm results in exact probability distributions for a singly connected network, it yields approximate probabilities for the variables of a multiply connected network. Good approximation performance has been reported for this algorithm (Murphy et al. 1999).

In (Bolt and van der Gaag 2004), we studied the performance of the loopy-propagation algorithm from a theoretical point of view and argued that two types of error may arise in the approximate probabilities yielded by the algorithm: the cycling error and the convergence error. A cycling error arises when messages are being passed on within a loop repetitively and old information is mistaken for new by the variables involved. A convergence error arises when messages that originate from dependent variables are combined as if they were independent.

Many other researchers have addressed the performance of the loopy-propagation algorithm. Weiss and his co-workers, more specifically, investigated its performance by studying the application of an equivalent algorithm on pairwise Markov networks (Weiss 2000; Weiss and Freeman 2001). Their use of Markov networks for this purpose was motivated by the relatively easier analysis of these networks and

Page 54: Proceedings of the Third European Workshop on Probabilistic

justified by the observation that any Bayesian network can be converted into an equivalent pairwise Markov network. Weiss (2000) derived an analytical relationship between the exact and the computed probabilities for the loop nodes in a network including a single loop. In the analysis of loopy propagation in Markov networks, however, no distinction between different error types was made, and at first sight there is no equivalent for the convergence error. In this paper we investigate this difference in results; we do so by constructing the simplest situation in which a convergence error may occur, and analysing the equivalent Markov network. We find that the convergence error in the Bayesian network is converted to a cycling error in the equivalent Markov network. Furthermore, we find that the prior convergence error in Markov networks is characterised by the fact that the relationship between the exact and the approximate probabilities, as established by Weiss, cannot be derived for the loop node in which this error occurs.

2 Bayesian Networks

A Bayesian network is a model of a joint probability distribution Pr over a set of stochastic variables V, consisting of a directed acyclic graph and a set of conditional probability distributions¹. Each variable A is represented by a node A in the network's digraph². (Conditional) independency between the variables is captured by the digraph's set of arcs according to the d-separation criterion (Pearl 1988). The strength of the probabilistic relationships between the variables is captured by the conditional probability distributions Pr(A | p(A)), where p(A) denotes the instantiations of the parents of A. The joint probability distribution

¹Variables are denoted by upper-case letters (A), and their values by indexed lower-case letters (aᵢ); sets of variables by bold-face upper-case letters (A) and their instantiations by bold-face lower-case letters (a). The upper-case letter is also used to indicate the whole range of values of a variable or a set of variables.

²The terms node and variable will be used interchangeably.

is presented by

Pr(V) = ∏_{A∈V} Pr(A | p(A))

For the scope of this paper we assume all variables of a Bayesian network to be binary. We will often write a for A = a₁ and ā for A = a₂. Fig. 1 depicts a small binary Bayesian network.

[Figure 1 digraph: arcs A → B, A → C, B → C]

Pr(a) = x

Pr(b | a) = p
Pr(b | ā) = q

Pr(c | ab) = r
Pr(c | ab̄) = s
Pr(c | āb) = t
Pr(c | āb̄) = u

Figure 1: An example Bayesian network.
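The factorization Pr(V) = ∏ Pr(A | p(A)) can be spelled out for this example. Below is a minimal sketch (Python; the True/False encoding of values and the helper name `joint` are illustrative, not from the paper), using the numeric parameters adopted later in Section 3 (r = 1, s = 0, t = 0, u = 1, p = 0.4, q = 0.1, Pr(a) = 0.5):

```python
from itertools import product

# CPTs of the example network of Fig. 1; a1/a2 encoded as True/False.
pr_a = {True: 0.5, False: 0.5}                              # Pr(A) = x
pr_b_given_a = {True: 0.4, False: 0.1}                      # Pr(b | A): p, q
pr_c_given_ab = {(True, True): 1.0, (True, False): 0.0,
                 (False, True): 0.0, (False, False): 1.0}   # r, s, t, u

def joint(a, b, c):
    """Pr(V) = Pr(A) * Pr(B | A) * Pr(C | A, B) for V = {A, B, C}."""
    pb = pr_b_given_a[a] if b else 1.0 - pr_b_given_a[a]
    pc = pr_c_given_ab[(a, b)] if c else 1.0 - pr_c_given_ab[(a, b)]
    return pr_a[a] * pb * pc

total = sum(joint(a, b, c) for a, b, c in product([True, False], repeat=3))
pr_c = sum(joint(a, b, True) for a, b in product([True, False], repeat=2))
print(total, pr_c)  # total ≈ 1, and pr_c is the exact Pr(c) used in Section 3
```

Marginalising the joint over A and B gives the exact probability of c that Section 3 compares against the loopy approximation.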

A multiply connected network includes one or more loops. We say that a loop is simple if none of its nodes are shared by another loop. A node that has two or more incoming arcs on a loop will be called a convergence node of this loop. Node C is the only convergence node in the network from Fig. 1. Pearl's propagation algorithm (Pearl 1988) was designed for exact inference with singly connected Bayesian networks. The term loopy propagation used throughout the literature refers to the application of this algorithm to networks with loops.
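The definition of a convergence node (two or more incoming arcs on a loop) is easy to check mechanically. A small sketch, with the function name and arc encoding chosen here for illustration only:

```python
def convergence_nodes(arcs, loop_nodes):
    """Return the loop nodes with >= 2 incoming arcs from within the loop."""
    indeg = {n: 0 for n in loop_nodes}
    for parent, child in arcs:
        if parent in loop_nodes and child in loop_nodes:
            indeg[child] += 1
    return {n for n, d in indeg.items() if d >= 2}

# The loop of Fig. 1: A -> B, A -> C, B -> C.
print(convergence_nodes({("A", "B"), ("A", "C"), ("B", "C")}, {"A", "B", "C"}))
# -> {'C'}
```

Applied to the network of Fig. 1 it recovers C as the loop's only convergence node.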

3 The Convergence Error in Bayesian Networks

When applied to a singly connected Bayesian network, Pearl's propagation algorithm results in exact probabilities. When applied to a multiply connected network, however, the computed probabilities may include errors. In previous work we distinguished between two different types of error (Bolt and van der Gaag 2004).

The first type of error arises when messages are being passed on in a loop repetitively and old information is mistaken for new by the variables involved. The error that thus arises will be termed a cycling error. A cycling error can only occur if for each convergence node of a loop either the node itself or one of its descendants is observed. The second type of error originates from the combination of causal messages by the

44 J. H. Bolt

Page 55: Proceedings of the Third European Workshop on Probabilistic

convergence node of a loop. A convergence node combines the messages from its parents as if the parents are independent. They may be dependent, however, and by assuming independence, a convergence error may be introduced. A convergence error may already emerge in a network in its prior state. In the sequel we will denote the probabilities that result upon loopy propagation with P̃r, to distinguish them from the exact probabilities, which are denoted by Pr.

Moreover, we studied the prior convergence error. Below, we apply our analysis to the example network from Figure 1. For the network in its prior state, the loopy-propagation algorithm establishes

P̃r(c) = ∑_{A,B} Pr(c | AB) · Pr(A) · Pr(B)

as probability for node C. Nodes A and B, however, may be dependent and the exact probability Pr(c) equals

Pr(c) = ∑_{A,B} Pr(c | AB) · Pr(B | A) · Pr(A)

The difference between the exact and approximate probabilities is

Pr(c) − P̃r(c) = x · y · z

where

x = Pr(c | ab) − Pr(c | ab̄) − Pr(c | āb) + Pr(c | āb̄)

y = Pr(b | a) − Pr(b | ā)

z = Pr(a) − Pr(a)²

The factors that govern the size of the prior convergence error in the network from Figure 1 are illustrated in Figure 2; for the construction of this figure we used the following probabilities: r = 1, s = 0, t = 0, u = 1, p = 0.4, q = 0.1 and x = 0.5. The line segment captures the exact probability Pr(c) as a function of Pr(a); note that each specific Pr(a) corresponds with a specific Pr(b). The surface captures P̃r(c) as a function of Pr(a) and Pr(b). The convergence error equals the distance between the point on the


Figure 2: The probability of c as a function of Pr(a) and Pr(b), assuming independence of the parents A and B of C (surface), and as a function of Pr(a) (line segment).

line segment that matches the probability Pr(a) from the network and its orthogonal projection on the surface. For Pr(a) = 0.5, more specifically, the difference between Pr(c) and P̃r(c) is indicated by the vertical dotted line segment and equals 0.65 − 0.5 = 0.15. Informally speaking:

• the more curved the surface is, the larger the distance between a point on the line segment and its projection on the surface can be; the curvature of the surface is reflected by the factor x;

• the distance between a point on the segment and its projection on the surface depends on the orientation of the line segment; the orientation of the line segment is reflected by the factor y;

• the distance between a point on the line segment and its projection on the surface depends on its position on the line segment; this position is reflected by the factor z.

We recall that the convergence error originates from combining messages from dependent nodes as if they were independent. The factors y and z now in essence capture the degree of dependence between the nodes A and B; the factor x indicates to which extent this dependence can affect the computed probabilities.
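With the parameters used for Figure 2, the error formula can be checked numerically. The sketch below (plain Python) assumes the parameterisation that is consistent with the potentials shown for the converted network in Figure 4: Pr(a) = x, Pr(b | a) = p, Pr(b | ā) = q, and Pr(c | AB) given by r, s, t and u.

```python
# Parameters used for Figure 2: Pr(a)=x, Pr(b|a)=p, Pr(b|~a)=q,
# Pr(c|ab)=r, Pr(c|a~b)=s, Pr(c|~ab)=t, Pr(c|~a~b)=u.
r, s, t, u = 1.0, 0.0, 0.0, 1.0
p, q, x = 0.4, 0.1, 0.5

pr_a = x
pr_b = p * pr_a + q * (1 - pr_a)          # marginal Pr(b)

# Exact probability: sum over A, B of Pr(c|AB) * Pr(B|A) * Pr(A).
exact = (r * p * x + s * (1 - p) * x
         + t * q * (1 - x) + u * (1 - q) * (1 - x))

# Loopy approximation: the parents are treated as independent.
approx = (r * pr_a * pr_b + s * pr_a * (1 - pr_b)
          + t * (1 - pr_a) * pr_b + u * (1 - pr_a) * (1 - pr_b))

# The three factors of the prior convergence error.
fx = r - s - t + u                        # curvature of the surface
fy = p - q                                # orientation of the line segment
fz = pr_a - pr_a ** 2                     # position on the line segment

print(round(exact, 10), round(approx, 10), round(fx * fy * fz, 10))
# 0.65 0.5 0.15
```

The product of the three factors indeed equals the difference 0.65 − 0.5 read from Figure 2.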

Loopy Propagation: the Convergence Error in Markov Networks 45


4 Markov Networks

Like a Bayesian network, a Markov network uniquely defines a joint probability distribution over a set of statistical variables V. The variables are represented by the nodes of an undirected graph and (conditional) independency between the variables is captured by the graph's set of edges; a variable is (conditionally) independent of every other variable given its Markov blanket. The strength of the probabilistic relationships is captured by clique potentials. Cliques C are subsets of nodes that are completely connected; ∪C = V. For each clique, a potential function ψ_C(A_C) is given that assigns a non-negative real number to each configuration of the nodes A of C. The joint probability is given by:

Pr(V) = 1/Z · Π_C ψ_C(A_C)

where Z = Σ_V Π_C ψ_C(A_C) is a normalising factor, ensuring that Σ_V Pr(V) = 1. A pairwise Markov network is a Markov network with cliques of at most two nodes.

For pairwise Markov networks, an algorithm can be specified that is functionally equivalent to Pearl's propagation algorithm (Weiss, 2000). In this algorithm, in each time step, every node sends a probability vector to each of its neighbours. The probability distribution of a node is obtained by combining the steady-state values of the messages from its neighbours.

In a pairwise Markov network, the transition matrices M_AB and M_BA can be associated with any edge between nodes A and B:

M_AB[j, i] = ψ(A = a_i, B = b_j)

Note that matrix M_BA equals M_AB^T.

Example 1 Suppose we have a Markov network with two binary nodes A and B, and suppose that for this network the potentials ψ(ab) = p, ψ(ab̄) = q, ψ(āb) = r and ψ(āb̄) = s are specified, as in Figure 3. We then associate the transition matrix

M_AB = [ p  r
         q  s ]

with the link from A to B and its transpose

M_BA = [ p  q
         r  s ]

with the link from B to A. □


Figure 3: An example pairwise Markov network and its transition matrices.

The propagation algorithm now is defined as follows. The message from A to B equals M_AB · v after normalisation, where v is the vector that results from the component-wise multiplication of all message vectors sent to A except for the message vector sent by B. The procedure is initialised with all message vectors set to (1, 1, ..., 1). Observed nodes do not receive messages and they always transmit a vector with 1 for the observed value and zero for all other values. The probability distribution for a node is obtained by combining all incoming messages, again by component-wise multiplication and normalisation.
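For the two-node network of Figure 3 one round of this propagation can be traced explicitly. The sketch below (with hypothetical numbers for the potentials p, q, r and s) applies the update rule once and checks the resulting marginal of B against direct normalisation of the joint potential.

```python
import numpy as np

# Potentials of the two-node network of Figure 3 (illustrative numbers).
p, q, r, s = 0.6, 0.4, 0.3, 0.7
M_AB = np.array([[p, r],            # M_AB[j, i] = psi(A = a_i, B = b_j)
                 [q, s]])

def message(M, v):
    """Message along an edge: the transition matrix applied to the
    combined incoming vector, normalised to sum to one."""
    m = M @ v
    return m / m.sum()

# A has no other neighbours, so its outgoing vector is the initial (1, 1).
msg_A_to_B = message(M_AB, np.array([1.0, 1.0]))

# B's marginal combines all incoming messages (here just one).
pr_B = msg_A_to_B / msg_A_to_B.sum()

# Direct computation from the joint: Pr(b_j) is proportional to the
# row sum of the potential table over the values of A.
pr_B_exact = M_AB.sum(axis=1) / M_AB.sum()

print(pr_B, pr_B_exact)
```

Because this network is singly connected the result is exact; in a network with a loop the same update would be iterated until the messages reach a steady state.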

5 Converting a Bayesian Network into a Pairwise Markov Network

In this section, the conversion of a Bayesian network into an equivalent pairwise Markov network is described (Weiss, 2000). In the conversion of the Bayesian network into a Markov network, for any node with multiple parents, an auxiliary node is constructed into which the common parents are clustered. This auxiliary node is connected to the child and its parents and the original arcs between child and parents are removed. Furthermore, all arc directions in the network are dropped. The clusters are all pairs of connected nodes. For a cluster with an auxiliary node and a former parent node, the potential is set to 1 if the nodes have a similar value for the former parent node and to 0 otherwise. For the other clusters, the potentials are equal to the conditional probabilities of the former child given the former parent. Furthermore, the prior probability of a former root node is incorporated by multiplication into one of the potentials of the clusters in which it takes part.



Figure 4: A pairwise Markov network that represents the same joint probability distribution as the Bayesian network from Figure 1.

Example 2 The Bayesian network from Figure 1 can be converted into the pairwise Markov network from Figure 4 with clusters AB, AX, BX and XC. Node X is composed of A′ and B′ and has the values a′b′, a′b̄′, ā′b′ and ā′b̄′. Given that the prior probability of root node A is incorporated in the potential of cluster AB, the network has the following potentials: ψ(AB) = Pr(B | A) · Pr(A); ψ(XC) = Pr(C | AB); ψ(AX) = 1 if A′ = A and 0 otherwise; and ψ(BX) = 1 if B′ = B and 0 otherwise.
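The conversion can be checked by brute-force enumeration: with the potentials of Example 2, the Markov network's joint, marginalised over the auxiliary node X, should reproduce the Bayesian network's joint Pr(C | AB) · Pr(B | A) · Pr(A). A sketch, using the parameter names p, q, x, r, s, t, u from Section 3 (the numbers are the ones used for Figure 2):

```python
import itertools

# Bayesian-network parameters: Pr(a)=x, Pr(b|a)=p, Pr(b|~a)=q,
# and Pr(c|AB) given by r, s, t, u.
p, q, x = 0.4, 0.1, 0.5
r, s, t, u = 1.0, 0.0, 0.0, 1.0

pr_a = {1: x, 0: 1 - x}
pr_b_given_a = {1: {1: p, 0: 1 - p}, 0: {1: q, 0: 1 - q}}
pr_c_given_ab = {(1, 1): r, (1, 0): s, (0, 1): t, (0, 0): u}

def psi_AB(a, b):            # incorporates the prior of root node A
    return pr_b_given_a[a][b] * pr_a[a]

def psi_XC(xa, xb, c):       # X ranges over the pairs (A', B')
    pc = pr_c_given_ab[(xa, xb)]
    return pc if c == 1 else 1 - pc

def psi_copy(v, xv):         # psi(AX) and psi(BX): 1 iff the values agree
    return 1.0 if v == xv else 0.0

# Markov-network joint over A, B, C, marginalised over X = (A', B').
mn, bn = {}, {}
for a, b, xa, xb, c in itertools.product([0, 1], repeat=5):
    w = (psi_AB(a, b) * psi_copy(a, xa) * psi_copy(b, xb)
         * psi_XC(xa, xb, c))
    mn[(a, b, c)] = mn.get((a, b, c), 0.0) + w
# Bayesian-network joint for comparison.
for a, b, c in itertools.product([0, 1], repeat=3):
    pc = pr_c_given_ab[(a, b)]
    bn[(a, b, c)] = (pc if c == 1 else 1 - pc) * pr_b_given_a[a][b] * pr_a[a]

assert all(abs(bn[k] - mn[k]) < 1e-12 for k in bn)
```

Because the potentials are built from a normalised Bayesian network and the copy potentials are deterministic, the normalising constant Z equals 1 here.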

6 The Analysis of Loopy Propagation in Markov Networks

Weiss (2000) analysed the performance of the loopy-propagation algorithm for Markov networks with a single loop and related the approximate probabilities found for the nodes in the loop to their exact probabilities. He noted that in the application of the algorithm messages will cycle in the loop and errors will emerge as a result of the double counting of information. The main idea of his analysis is that for a node in the loop, two reflexive matrices can be derived: one for the messages cycling clockwise and one for the messages cycling counterclockwise. The probability distribution computed by the loopy-propagation algorithm for the loop node in the steady state can now be inferred from the principal eigenvectors of the reflexive matrices plus the other incoming vectors. Subsequently, he showed that the reflexive matrices also include the exact probability distribution and used those two observations to derive an analytical relationship between the approximated and the exact probabilities.


Figure 5: An example Markov network with just one loop.

More in detail, Weiss considered a Markov network with a single loop with n nodes L_1, ..., L_n and with, connected to each node in the loop, an observed node O_1, ..., O_n, as shown in Figure 5. During propagation, a node O_i will constantly send the same message into the loop. This vector is one of the columns of the transition matrix M_{O_i L_i}. In order to enable the incorporation of this message into the reflexive matrices, this vector is transformed into a diagonal matrix D_i, with the vector elements on the diagonal. For

example, suppose that

M_{O_i L_i} = [ p  r
                q  s ]

and suppose that the observation O_i = o_i¹ is made; then

D_i = [ p  0
        0  q ]

Furthermore, M_1 is the

transition matrix for the message from L_1 to L_2 and M_1^T the transition matrix for the message from L_2 to L_1, etc. The reflexive matrix C for the transition of a counterclockwise message from node L_1 back to itself is defined as M_1^T D_2 ... M_{n−1}^T D_n M_n^T D_1. The message that L_2 sends to L_1 in the steady state now is in the direction of the principal eigenvector of C. The reflexive matrix C_2 for the transition of a clockwise message from node L_1 back to itself is defined as M_n D_n M_{n−1} D_{n−1} ... M_1 D_1. The message that node L_n sends to L_1 in the steady state is in the direction of the principal eigenvector of C_2. Component-wise multiplication of


the two principal eigenvectors and the message from O_1 to L_1, and normalisation of the resulting vector, yields a vector of which the components equal the approximated values for L_1 in the steady state. Furthermore, Weiss proved that the elements on the diagonals of the reflexive matrices equal the correct probabilities of the relevant value of L_1 and the evidence; for example, C_{1,1} equals Pr(l_1¹, o). Subsequently, he related the exact probabilities for a node A in the loop to its approximate probabilities by

Pr(a_i) = [ λ_1 · P̃r(a_i) + Σ_{j≥2} P_{ij} λ_j P⁻¹_{ji} ] / Σ_j λ_j        (1)

in which P is a matrix that is composed of the eigenvectors of C, with the principal eigenvector in the first column, and λ_1, ..., λ_j are the eigenvalues of the reflexive matrices, with λ_1 the maximum eigenvalue. We note that from this formula it follows that correct probabilities will be found if λ_1 equals 1 and all other eigenvalues equal 0.
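Weiss's observation that the reflexive matrices contain the exact probabilities is easy to verify numerically. The sketch below builds a binary three-node loop with an observed node attached to each loop node (all potential values are randomly drawn and purely illustrative), forms the counterclockwise reflexive matrix C = M_1^T D_2 M_2^T D_3 M_3^T D_1, and checks its diagonal, as well as the eigenvector form of Equation 1, against brute-force enumeration.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Pairwise potentials on the loop edges L1-L2, L2-L3, L3-L1 and on the
# edges O_i-L_i to the observed nodes (arbitrary positive numbers).
psi12, psi23, psi31 = rng.uniform(0.1, 1.0, (3, 2, 2))
psiO = rng.uniform(0.1, 1.0, (3, 2, 2))   # psiO[i][o, l] = psi(O_i=o, L_i=l)
obs = [0, 1, 0]                           # the observations o_i

# Transition matrices M_i for the messages L_i -> L_{i+1} and the diagonal
# matrices D_i holding the constant messages from the observed nodes.
M1, M2, M3 = psi12.T, psi23.T, psi31.T
D1, D2, D3 = (np.diag(psiO[i][obs[i]]) for i in range(3))

# Counterclockwise reflexive matrix for node L1.
C = M1.T @ D2 @ M2.T @ D3 @ M3.T @ D1

# Brute-force computation of the joint Pr(L1, o), up to normalisation.
joint = np.zeros(2)
for l1, l2, l3 in itertools.product([0, 1], repeat=3):
    joint[l1] += (psi12[l1, l2] * psi23[l2, l3] * psi31[l3, l1]
                  * psiO[0][obs[0], l1] * psiO[1][obs[1], l2]
                  * psiO[2][obs[2], l3])

assert np.allclose(np.diag(C), joint)     # the diagonal holds Pr(l1, o)

# Equation 1 in eigenvector form: summing lambda_j P_ij P^-1_ji over all
# j and dividing by the sum of the eigenvalues recovers the exact
# posterior (the lambda_1 term corresponds to the loopy belief).
lam, P = np.linalg.eig(C)
Pinv = np.linalg.inv(P)
posterior = np.array([(P[i] * lam * Pinv[:, i]).sum().real
                      for i in range(2)]) / lam.sum().real
assert np.allclose(posterior, joint / joint.sum())
```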

In the above analysis, all nodes O_i are considered to be observed. Note that given unobserved nodes outside the loop, the analysis is essentially the same. In that case a transition matrix

M_{O_i L_i} = [ p  r
                q  s ]

will result in the diagonal matrix

D_i = [ p + r    0
          0    q + s ]

7 The Convergence Error in Markov Networks

As discussed in Section 3, in Bayesian networks a distinction can be made between the cycling error and the convergence error. In the previous section it appeared that for Markov networks such a distinction does not exist. All errors result from the cycling of information and, at first sight, there is no equivalent for the convergence error. However, any Bayesian network can be converted into an equivalent pairwise Markov network on which an algorithm equivalent to the loopy-propagation algorithm can be used. In this section, we investigate this apparent incompatibility of results and indicate how the convergence error yet is embedded in the analysis of loopy propagation in Markov networks.

We do so by constructing the simplest situation in which a convergence error may occur, that is, the Bayesian network from Figure 1 in its prior state, and analysing this situation in the equivalent Markov network. The focus thereby is on the node that replaces the convergence node in the loop. We then argue that the results have a more general validity.

Consider the Bayesian network from Figure 1. In its prior state, there is no cycling of information, and exact probabilities will be found for nodes A and B. In node C, however, a convergence error may emerge. The network can be converted into the pairwise Markov network from Figure 4. For this network we find the transition matrices

M_XA = [ 1  1  0  0
         0  0  1  1 ]

M_XB = [ 1  0  1  0
         0  1  0  1 ]

M_XC = [   r      s      t      u
         1−r    1−s    1−t    1−u ]

M_AB = [   px      q(1−x)
         (1−p)x  (1−q)(1−x) ]

and their transposes.

In the prior state of the network, C will send the message M_CX · (1, 1) = (1, 1, 1, 1) to X. In order to enable the incorporation of this message into the reflexive loop matrices, it is transformed into D_CX which, in this case, is the 4×4 identity matrix.

We first evaluate the performance of the loopy-propagation algorithm for the regular loop node A. This node has the following reflexive matrices for its clockwise and counterclockwise messages, respectively:

M_A = M_XA · D_CX · M_BX · M_AB = [ x  1−x
                                    x  1−x ]

with eigenvalues 1 and 0 and principal eigenvector (1, 1), and

M̃_A = M_BA · M_XB · D_CX · M_AX = [  x    x
                                     1−x  1−x ]

with eigenvalues 1 and 0 and principal eigenvector (x, 1−x). Note that the correct probabilities for node A indeed are found on the diagonal of the reflexive matrices. Furthermore, λ_1 = 1 and λ_2 = 0 and therefore correct approximations are


expected. We indeed find that the approximations (1 · x, 1 · (1−x)) equal the exact probabilities. Note also that, as expected, the messages from node A back to itself do not change any more after the first cycle. As in the Bayesian network, for node A no cycling of information occurs in the Markov network. For node B a similar evaluation can be made.
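The two reflexive matrices for node A and their eigenstructure can be reproduced with a few lines of numpy; the sketch below uses the transition matrices listed above with the example parameters p = 0.4, q = 0.1 and x = 0.5 from Figure 2.

```python
import numpy as np

p, q, x = 0.4, 0.1, 0.5

M_XA = np.array([[1., 1., 0., 0.], [0., 0., 1., 1.]])
M_XB = np.array([[1., 0., 1., 0.], [0., 1., 0., 1.]])
M_AB = np.array([[p * x, q * (1 - x)],
                 [(1 - p) * x, (1 - q) * (1 - x)]])
D_CX = np.eye(4)                        # C is unobserved

# Reflexive matrices for node A (M_BX and M_AX are the transposes).
M_A  = M_XA @ D_CX @ M_XB.T @ M_AB      # expected [[x, 1-x], [x, 1-x]]
M_A2 = M_AB.T @ M_XB @ D_CX @ M_XA.T    # expected [[x, x], [1-x, 1-x]]

for M in (M_A, M_A2):
    lam = np.linalg.eigvals(M).real
    assert np.allclose(sorted(lam), [0.0, 1.0])   # eigenvalues 1 and 0

# The component-wise product of the principal eigenvectors (1, 1) and
# (x, 1-x) yields, after normalisation, the exact marginal Pr(A).
belief = np.array([1.0, 1.0]) * np.array([x, 1 - x])
belief /= belief.sum()
assert np.allclose(belief, [x, 1 - x])
```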

We now turn to the convergence node C. In the Bayesian network in its prior state a convergence error may emerge in this node. In the conversion of the Bayesian network into the Markov network, the convergence node C is placed outside the loop and the auxiliary node X is added. For X, the following reflexive matrices are computed for the clockwise and counterclockwise messages from node X back to itself, respectively:

M_X = M_BX · M_AB · M_XA · D_CX =

[   px        px      q(1−x)      q(1−x)
  (1−p)x    (1−p)x  (1−q)(1−x)  (1−q)(1−x)
    px        px      q(1−x)      q(1−x)
  (1−p)x    (1−p)x  (1−q)(1−x)  (1−q)(1−x) ]

with eigenvalues 1, 0, 0, 0; principal eigenvector ((px + q(1−x))/((1−p)x + (1−q)(1−x)), 1, (px + q(1−x))/((1−p)x + (1−q)(1−x)), 1) and other eigenvectors (0, 0, −1, 1), (−1, 1, 0, 0) and (0, 0, 0, 0).

M̃_X = M_AX · M_BA · M_XB · D_CX =

[   px      (1−p)x      px      (1−p)x
    px      (1−p)x      px      (1−p)x
  q(1−x)  (1−q)(1−x)  q(1−x)  (1−q)(1−x)
  q(1−x)  (1−q)(1−x)  q(1−x)  (1−q)(1−x) ]

with eigenvalues 1, 0, 0, 0; principal eigenvector (x/(1−x), x/(1−x), 1, 1) and other eigenvectors (0, −1, 0, 1), (−1, 0, 1, 0) and (0, 0, 0, 0).

On the diagonal of the reflexive matrices of X we find the probabilities Pr(AB). As the correct probabilities for a loop node are found on the diagonal of its reflexive matrices, these probabilities can be considered to be the exact probabilities for node X. The normalised vector of the component-wise multiplication of the principal eigenvectors of the two reflexive matrices of X equals the vector with the normalised probabilities Pr(A) · Pr(B). Likewise, these probabilities can be considered to be the approximate probabilities for node X.

A first observation is that λ_1 equals 1 and the other eigenvalues equal 0, but the exact and the approximate probabilities of node X may differ. This is not consistent with Equation 1. The explanation is that for node X the matrix P is singular and therefore the matrix P⁻¹, which is needed in the derivation of the relationship between the exact and approximate probabilities, does not exist. Equation 1 thus is not valid for the auxiliary node X. We note furthermore that the messages from node X back to itself may still change after the first cycle. We therefore find that, although in the Bayesian network there is no cycling of information, in the Markov network information may cycle for node X, resulting in errors in its computed probabilities.

The probabilities computed by the loopy-propagation algorithm for node C equal the normalised product M_XC · v, where v is the vector with the approximate probabilities found at node X. It can easily be seen that these approximate probabilities equal the approximate probabilities found in the equivalent Bayesian network. Furthermore we observe that if node X would send its exact probabilities, that is, Pr(AB), exact probabilities for node C would be computed. In the Markov network we thus may consider the convergence error to be founded in the cycling of information for the auxiliary node X.
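Completing the computation for node C with the same numbers (and r = 1, s = 0, t = 0, u = 1) reproduces the values from Section 3: the approximate vector at X yields P̃r(c) = 0.5, whereas the exact Pr(AB) would yield Pr(c) = 0.65.

```python
import numpy as np

p, q, x = 0.4, 0.1, 0.5
r, s, t, u = 1.0, 0.0, 0.0, 1.0

M_XC = np.array([[r, s, t, u],
                 [1 - r, 1 - s, 1 - t, 1 - u]])

pr_b = p * x + q * (1 - x)
approx_X = np.array([x * pr_b, x * (1 - pr_b),          # Pr(A) * Pr(B)
                     (1 - x) * pr_b, (1 - x) * (1 - pr_b)])
exact_X = np.array([p * x, (1 - p) * x,                 # Pr(AB)
                    q * (1 - x), (1 - q) * (1 - x)])

approx_C = M_XC @ approx_X   # loopy result, matching the Bayesian network
exact_C = M_XC @ exact_X     # result if X sent its exact probabilities
print(approx_C[0], exact_C[0])   # 0.5 and 0.65, up to float rounding
```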

In Section 3, a formula for the size of the prior convergence error in the network from Figure 1 is given. We there argued that this size is determined by the factors y and z, which capture the degree of dependency between the parents of the convergence node, and the factor x, which indicates to which extent the dependence between nodes A and B can affect the computed probabilities. In this formula, x is composed of the conditional probabilities of node C. In the analysis in the Markov network we have a similar finding. The effect of the degree of dependence between A and B is reflected in the difference between the exact and the approximate probabilities found for node X. The effect of the conditional probabilities at node C emerges in the transition of the message vector from X to C.


We just considered the small example network from Figure 1. Note, however, that for any prior binary Bayesian network with just simple loops, the situation for any loop can be 'summarised' to the situation in Figure 1 by marginalisation over the relevant variables. The results with respect to the manifestation of the convergence error by the cycling of information and the invalidity of Equation 1 for the auxiliary node, found for the network from Figure 1, therefore apply to any prior binary Bayesian network with just simple loops.³ ⁴

8 Discussion

Loopy propagation refers to the application of Pearl's propagation algorithm for exact reasoning with singly connected Bayesian networks to networks with loops. In previous research we identified two different types of error that may arise in the probabilities computed by the algorithm. Cycling errors result from the cycling of information and arise in loop nodes as soon as for each convergence node of the loop either the node itself or one of its descendents is observed. Convergence errors result from combining information from dependent nodes as if they were independent and may arise at convergence nodes. This second error type is found both in a network's prior and posterior state. Loopy propagation has also been studied by the analysis of the performance of an equivalent algorithm in pairwise Markov networks with just a simple loop. According to this analysis all errors result from the cycling of information and at first sight there is no equivalent for the convergence error. We investigated how the convergence error yet is embedded in the analysis of loopy propagation in Markov networks. We did so by constructing the simplest situation in which a convergence error may occur, and analysing this situation in the equivalent Markov network. We found that the convergence error in the Bayesian network

³Given a loop with multiple convergence nodes, in the prior state of the network, the parents of the convergence nodes are independent and effectively no loop is present.

⁴Two loops in sequence may result in incorrect probabilities entering the second loop. The reflexive matrices, however, will have a similar structure to the reflexive matrices derived in this section.

is converted to a cycling error in the equivalent Markov network. Furthermore, we found that the prior convergence error is characterised by the fact that the relationship between the exact probabilities and the approximate probabilities yielded by loopy propagation, as established by Weiss, cannot be derived for the loop node in which this error occurs. We then argued that these results are valid for binary Bayesian networks with just simple loops in general.

Acknowledgements

This research was partly supported by the Netherlands Organisation for Scientific Research (NWO).

References

J.H. Bolt, L.C. van der Gaag. 2004. The convergence error in loopy propagation. Paper presented at the International Conference on Advances in Intelligent Systems: Theory and Applications.

G.F. Cooper. 1990. The computational complexity of probabilistic inference using Bayesian belief networks. Artificial Intelligence, 42:393–405.

P. Dagum, M. Luby. 1993. Approximate inference in Bayesian networks is NP-hard. Artificial Intelligence, 60:141–153.

K. Murphy, Y. Weiss, M. Jordan. 1999. Loopy belief propagation for approximate inference: an empirical study. In 15th Conference on Uncertainty in Artificial Intelligence, pages 467–475.

J. Pearl. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers, Palo Alto.

Y. Weiss. 2000. Correctness of local probability propagation in graphical models with loops. Neural Computation, 12:1–41.

Y. Weiss, W.T. Freeman. 2001. Correctness of belief propagation in Gaussian graphical models of arbitrary topology. Neural Computation, 13:2173–2200.



Preprocessing the MAP problem

Janneke H. Bolt and Linda C. van der Gaag
Department of Information and Computing Sciences, Utrecht University
P.O. Box 80.089, 3508 TB Utrecht, The Netherlands
janneke,[email protected]

Abstract

The MAP problem for Bayesian networks is the problem of finding for a set of variables an instantiation of highest posterior probability given the available evidence. The problem is known to be computationally infeasible in general. In this paper, we present a method for preprocessing the MAP problem with the aim of reducing the runtime requirements for its solution. Our method exploits the concepts of Markov and MAP blanket for deriving partial information about a solution to the problem. We investigate the practicability of our preprocessing method in combination with an exact algorithm for solving the MAP problem for some real Bayesian networks.

1 Introduction

Upon reasoning with a Bayesian network, often a best explanation is sought for a given set of observations. Given the available evidence, such an explanation is an instantiation of highest probability for some subset of the network's variables. The problem of finding such an instantiation has an unfavourable computational complexity. If the subset of variables for which a most likely instantiation is to be found includes just a single variable, then the problem, which is known as the Pr problem, is NP-hard in general. A similar observation holds for the MPE problem, in which an instantiation of highest probability is sought for all unobserved variables. These two problems can be solved in polynomial time, however, for networks of bounded treewidth. If the subset of interest is a non-singleton proper subset of the set of unobserved variables, on the other hand, the problem, which is then known as the MAP problem, remains NP-hard even for networks for which the other two problems can be feasibly solved (Park and Darwiche, 2002).

By performing inference in a Bayesian network under study and establishing the most likely value for each variable of interest separately, an estimate for a solution to the MAP problem may be obtained. There is no guarantee in general, however, that the values in the resulting joint instantiation indeed correspond to the values of the variables in a solution to the MAP problem. In this paper, we now show that, for some of the variables of interest, the computation of marginal posterior probabilities may in fact provide exact information about their value in a solution to the MAP problem. We show more specifically that, by building upon the concept of Markov blanket, some of the variables may be fixed to a particular value; for some of the other variables of interest, moreover, values may be excluded from further consideration. We further introduce the concept of MAP blanket that serves to provide similar information.

Deriving partial information about a solution to the MAP problem by building upon the concepts of Markov and MAP blanket can be exploited as a preprocessing step before the problem is actually solved with any available algorithm. The derived information in essence serves to reduce the search space for the problem and thereby reduces the algorithm's runtime requirements. We performed an initial study of the practicability of our preprocessing method by solving MAP problems for real networks using an exact branch-and-bound algorithm, and found that preprocessing can be profitable.

The paper is organised as follows. In Section 2, we provide some preliminaries on the MAP problem. In Section 3, we present two propositions that constitute the basis of our preprocessing method. In Section 4, we provide some preliminary results about the practicability of our preprocessing method. The paper is ended in Section 5 with our concluding observations.

2 The MAP problem

Before reviewing the MAP problem, we introduce our notational conventions. A Bayesian network is a model of a joint probability distribution Pr over a set of stochastic variables, consisting of a directed acyclic graph and a set of conditional probability distributions. We denote variables by upper-case letters (A) and their values by (indexed) lower-case letters (a_i); sets of variables are indicated by bold-face upper-case letters (A) and their instantiations by bold-face lower-case letters (a). Each variable is represented by a node in the digraph; (conditional) independence between the variables is encoded by the digraph's set of arcs according to the d-separation criterion (Pearl, 1988). The Markov blanket B of a variable A consists of its neighbours in the digraph plus the parents of its children. Given its Markov blanket, the variable is independent of all other variables in the network. The strengths of the probabilistic relationships between the variables are captured by conditional probability tables that encode for each variable A the conditional distributions Pr(A | p(A)) given its parents p(A).
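Computing a Markov blanket from the digraph is straightforward; a minimal sketch (the digraph used here is a hypothetical toy example, not the network of Figure 1):

```python
def markov_blanket(parents, node):
    """Markov blanket of `node`: its parents, its children, and the
    other parents of its children. `parents` maps each node to the
    set of its parents in the digraph."""
    children = {c for c, ps in parents.items() if node in ps}
    blanket = set(parents[node]) | children
    for c in children:
        blanket |= set(parents[c])
    blanket.discard(node)
    return blanket

# A small hypothetical digraph: A -> H, H -> C, B -> C.
parents = {'A': set(), 'B': set(), 'H': {'A'}, 'C': {'H', 'B'}}
print(sorted(markov_blanket(parents, 'H')))   # ['A', 'B', 'C']
```

Here B enters the blanket of H only because it is a co-parent of the child C.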

Upon reasoning with a Bayesian network, often a best explanation is sought for a given set of observations. Given evidence o for a subset of variables O, such an explanation is an instantiation of highest probability for some subset M of the network's variables. The set M is called the MAP set for the problem; its elements are called the MAP variables. An instantiation m of highest probability to the set M is termed a MAP solution; the value that is assigned to a MAP variable in a solution m is called its MAP value. Dependent upon the size of the MAP set, we distinguish between three different types of problem. If the MAP set includes just a single variable, the problem of finding the best explanation for a set of observations reduces to establishing the most likely value for this variable from its marginal posterior probability distribution. This problem is called the Pr problem as it essentially amounts to performing standard inference (Park and Darwiche, 2001). In the second type of problem, the MAP set includes all non-observed variables. This problem is known as the most probable explanation or MPE problem. In this paper, we are interested in the third type of problem, called the MAP problem, in which the MAP set is a non-singleton proper subset of the set of non-observed variables of the network under study. This problem amounts to finding an instantiation of highest probability for a designated set of variables of interest.

We would like to note that the MAP problem is more complex in essence than the other two problems. The Pr problem and the MPE problem both are NP-hard in general and are solvable in polynomial time for Bayesian networks of bounded treewidth. The MAP problem is NP^PP-hard in general and remains NP-hard for these restricted networks (Park, 2002).

3 Fixing MAP values

By performing inference in a Bayesian network and solving the Pr problem for each MAP variable separately, an estimate for a MAP solution may be obtained. There is no guarantee in general, however, that the value with highest marginal probability for a variable corresponds with its value in a MAP solution. We now show that, for some variables, the computation of marginal probabilities may in fact provide exact information about their MAP values.

The first property that we will exploit in the sequel builds upon the concept of Markov blanket. We consider a MAP variable H and its associated Markov blanket. If a specific value h_i of H has highest probability in the marginal distribution over H for all possible instantiations of the blanket, then h_i will be the value of H in a MAP solution for any MAP problem including H. Alternatively, if some value h_j never has highest marginal probability, then this value cannot be included in any solution.

52 J. H. Bolt and L. C. van der Gaag

Proposition 1. Let H be a MAP variable in a Bayesian network and let B be its Markov blanket. Let h_i be a specific value of H.

1. If Pr(h_i | b) ≥ Pr(h_k | b) for all values h_k of H and all instantiations b of B, then h_i is the value of H in a MAP solution for any MAP problem that includes H.

2. If there exist values h_k of H with Pr(h_i | b) < Pr(h_k | b) for all instantiations b of B, then h_i is not the MAP value of H in any solution to a MAP problem that includes H.

Proof. We prove the first property stated in the proposition; the proof of the second property builds upon similar arguments.

We consider an arbitrary MAP problem with the MAP set {H} ∪ M and the evidence o for the observed variables O. Finding a solution to the problem amounts to finding an instantiation to the MAP set that maximises the posterior probability Pr(H, M | o). We have that

Pr(h_i, M | o) = Σ_b Pr(h_i | b, o) · Pr(M | b, o) · Pr(b | o)

For the posterior probability Pr(h_k, M | o) an analogous expression is found. Now suppose that for the value h_i of H we have that Pr(h_i | b) ≥ Pr(h_k | b) for all values h_k of H and all instantiations b of B. Since B is the Markov blanket of H, we have that Pr(h_i | b) ≥ Pr(h_k | b) implies Pr(h_i | b, o) ≥ Pr(h_k | b, o) for all values h_k of H and all instantiations b of B. We conclude that Pr(h_i, M | o) ≥ Pr(h_k, M | o) for all h_k. The value h_i of H thus is included in a solution to the MAP problem under study. Since the above considerations are algebraically independent of the MAP variables M and of the evidence o, this property holds for any MAP problem that includes the variable H.


Figure 1: An example directed acyclic graph.

The above proposition provides for preprocessing a MAP problem. Prior to actually solving the problem, some of the MAP variables may be fixed to a particular value using the first property. With the second property, moreover, various values of the MAP variables may be excluded from further consideration. By building upon the proposition, therefore, the search space for the MAP problem is effectively reduced. We illustrate this with an example.

Example 1. We consider a Bayesian network with the graphical structure from Figure 1. Suppose that H is a ternary MAP variable with the values h_1, h_2 and h_3. The Markov blanket B of H consists of the three variables A, C and D. Now, if for any instantiation b of these variables we have that Pr(h_1 | b) ≥ Pr(h_2 | b) and Pr(h_1 | b) ≥ Pr(h_3 | b), then h_1 occurs in a MAP solution for any MAP problem that includes H. By fixing the variable H to the value h_1, the search space of any such problem is reduced by a factor 3. We consider, as an example, the MAP set {H, E, G, I} of ternary variables. Without preprocessing, the search space includes 3⁴ = 81 possible instantiations. By fixing the variable H to h_1, the search space of the problem reduces to 3³ = 27 instantiations.
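Proposition 1 suggests a simple preprocessing routine: enumerate the instantiations of the Markov blanket of a MAP variable and test whether one value dominates throughout and whether some values are never maximal. A sketch of this check (the input distributions Pr(H | b) are hypothetical; in practice they would be obtained by restricted inward propagations):

```python
def fix_or_prune(dists):
    """Given, for each instantiation b of the Markov blanket, the
    distribution Pr(H | b) as a tuple, return (fixed, prunable):
    `fixed` is a value index that is maximal for every b (or None),
    `prunable` is the set of value indices that are never maximal."""
    n = len(next(iter(dists)))
    always_max = set(range(n))
    ever_max = set()
    for d in dists:
        m = max(d)
        maxima = {i for i, v in enumerate(d) if v == m}
        always_max &= maxima     # candidates for fixing (property 1)
        ever_max |= maxima
    fixed = min(always_max) if always_max else None
    return fixed, set(range(n)) - ever_max   # pruning (property 2)

# Hypothetical distributions Pr(H | b) over a ternary H, one tuple per
# instantiation b of the Markov blanket:
dists = [(0.6, 0.3, 0.1), (0.5, 0.4, 0.1), (0.7, 0.1, 0.2)]
print(fix_or_prune(dists))   # (0, {1, 2})
```

Here the first value dominates for every blanket instantiation, so H can be fixed and, as in the example above, the search space shrinks by a factor equal to the number of values of H.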

Establishing whether or not the properties from Proposition 1 can be used for a specific MAP variable requires a number of computations that is exponential in the size of the variable's Markov blanket. The computations required, however, are highly local. A single restricted inward propagation for each instantiation of the Markov blanket to the variable of interest suffices. Since the proposition moreover holds for

Preprocessing the MAP Problem 53


any MAP problem that includes the variable, the computational burden involved is amortised over all future MAP computations.

The second proposition that we will exploit for preprocessing a MAP problem builds upon the new concept of MAP blanket. We consider a problem with the MAP variables {H} ∪ M. A MAP blanket K of H now is a minimal subset K ⊆ M such that K d-separates H from M \ K given the available evidence. Now, if a specific value hi of H has highest probability in the marginal distribution over H given the evidence for all possible instantiations of K, then hi will be the MAP value of H in a solution to the MAP problem under study. Alternatively, if some value hj never has highest marginal probability, then this value cannot be a MAP value in any of the problem's solutions.

Proposition 2. Let {H} ∪ M be the MAP set of a given MAP problem for a Bayesian network and let o be the evidence that is available for the observed variables O. Let K be the MAP blanket for the variable H given o, and let hi be a specific value of H.

1. If Pr(hi | k,o) ≥ Pr(hk | k,o) for all values hk of H and all instantiations k to K, then hi is the MAP value of H in a solution to the given MAP problem.

2. If there exist values hk of H with Pr(hi | k,o) < Pr(hk | k,o) for all instantiations k to K, then hi is not the MAP value of H in any solution to the given MAP problem.

The proof of the proposition is relatively straightforward, building upon similar arguments as the proof of Proposition 1.

The above proposition again provides for preprocessing a MAP problem. Prior to actually solving the problem, the values of some variables may be fixed and other values may be excluded from further consideration. The proposition therefore again serves to effectively reduce the search space for the problem under study. While the information derived from Proposition 1 holds for any MAP problem including H,

however, Proposition 2 provides information for any problem in which H has a subset of K for its MAP blanket and with matching evidence for the observed variables that are not d-separated from H by K. The information derived from Proposition 2, therefore, is more restricted in scope than that from Proposition 1.

We illustrate the application of Proposition 2 for our example network.

Example 2. We consider again the MAP set {H, E, G, I} for the Bayesian network from Example 1. In the absence of any evidence, the MAP blanket of the variable H includes just the variable I. Now, if for each value i of I we have that Pr(h1 | i) ≥ Pr(h2 | i) and Pr(h1 | i) ≥ Pr(h3 | i), then the value h1 occurs in a solution to the given MAP problem. The search space for actually solving the problem thus again is reduced from 81 to 27.

Establishing whether or not the properties from Proposition 2 can be used for a specific MAP variable requires a number of computations that is exponential in the size of the variable's MAP blanket. The size of this blanket is strongly dependent on the network's connectivity and on the location of the various MAP variables and observed variables in the network. The MAP blanket can in fact be larger in size than the Markov blanket of the variable. The computations required, moreover, are less local than those required for Proposition 1 and can involve full inward propagations to the variable of interest. Since the proposition in addition applies to just a restricted class of MAP problems, the computational burden involved in its verification can be amortised over other MAP computations to a lesser extent than that involved in the verification of Proposition 1.

The general idea underlying the two propositions stated above is the same. The idea is to verify whether or not a particular value of H can be fixed or excluded as a MAP value by investigating H's marginal probability distributions given all possible instantiations of a collection of variables surrounding H. Proposition 1

54 J. H. Bolt and L. C. van der Gaag


uses for this purpose the Markov blanket of H. By building upon the Markov blanket, which in essence is independent of the MAP set and of the entered evidence, generally applicable statements about the values of H are found. A disadvantage of building upon the Markov blanket, however, is that maximally different distributions for H are examined, which decreases the chances of fixing or excluding values.

By taking a blanket-like collection of variables at a larger distance from the MAP variable H, the marginal distributions examined for H are likely to be less divergent, which serves to increase the chances of fixing or excluding values of H as MAP values. A major disadvantage of such a blanket, however, is that its size tends to grow with the distance from H, which will result in an infeasibly large number of instantiations to be studied. For any blanket-like collection, we observe that the MAP variables of the problem have to either be in the blanket or be d-separated from H by the blanket. Proposition 2 builds upon this observation explicitly and considers only the instantiations of the MAP blanket of the variable. The proposition thereby reduces the computations involved in its application, yet retains and even further exploits the advantage of examining less divergent marginal distributions over H. Note that the values that can be fixed or excluded based on the first proposition will also be fixed or excluded based on the second proposition. It may nevertheless still be worthwhile to exploit the first proposition because, as stated before, with this proposition values can be fixed or excluded in general and more restricted and possibly fewer computations are required.

So far we have argued that application of the two propositions serves to reduce the search space for a MAP problem by fixing variables to particular values and by excluding other values from further consideration. We would like to mention that by fixing variables the graphical structure of the Bayesian network under study may fall apart into unconnected components, for which the MAP problem can be solved separately. We illustrate the basic idea with our running example.

Example 3. We consider again the MAP set {H, E, G, I} for the Bayesian network from Figure 1. Now suppose that the variable H can be fixed to a particular value. Then, by performing evidence absorption of this value, the graphical structure of the network falls apart into the two components {A, I, H} and {B, C, D, E, F, G}, respectively. The MAP problem then decomposes into the problem with the MAP set {I} for the first component and the problem with the MAP set {E, G} for the second component; both these problems now include the value of H as further evidence. The search space thus is further reduced from 27 to 3 + 9 = 12 instantiations to be studied.
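The decomposition step can be sketched as follows: after a variable has been fixed and absorbed as evidence, the connected components of the remainder of the network's undirected skeleton each yield an independent subproblem. The edge list in the test usage is a hypothetical toy graph, not the exact network of Figure 1.

```python
def components_after_fixing(edges, fixed):
    """Connected components of an undirected graph skeleton after the
    nodes in `fixed` have been absorbed as evidence (i.e. removed).

    edges -- list of node pairs; fixed -- iterable of removed nodes."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    fixed = set(fixed)
    comps, seen = [], set(fixed)
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - fixed - comp)   # never cross fixed nodes
        comps.append(comp)
        seen |= comp
    return comps
```

Each component then defines its own, smaller MAP problem, with the fixed value entered as additional evidence.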

4 Experiments

In the previous section, we have introduced a method for preprocessing the MAP problem for Bayesian networks. In this section, we perform a preliminary study of the practicability of our method by solving MAP problems for real networks using an exact algorithm. In Section 4.1 we describe the set-up of the experiments; we review the results in Section 4.2.

4.1 The Experimental Set-up

In our experiments, we study the effects of our preprocessing method on three real Bayesian networks. We first report the percentages of values that are fixed or excluded by exploiting Proposition 1. We then compare the numbers of fixed variables as well as the numbers of network propagations with and without preprocessing, upon solving various MAP problems with a state-of-the-art exact algorithm.

In our experiments, we use three real Bayesian networks with a relatively high connectivity; Table 1 reports the numbers of variables and values for these networks. The Wilson's disease network (WD) is a small network in medicine, developed for the diagnosis of Wilson's liver disease (Korver and Lucas, 1993). The classical swine fever network (CSF) is a network in veterinary science, currently under development, for the early detection of outbreaks of classical swine fever in pig herds



(Geenen and Van der Gaag, 2005). The extended oesophageal cancer network (OESO+) is a moderately-sized network in medicine, which has been developed for the prediction of response to treatment of oesophageal cancer (Aleman et al., 2000). For each network, we compute MAP solutions for randomly generated MAP sets with 25% and 50% of the network's variables, respectively; for each size, five sets are generated. We did not set any evidence.

For solving the various MAP problems in our experiments, we use a basic implementation of the exact branch-and-bound algorithm available from Park and Darwiche (2003). This algorithm solves the MAP problem exactly for most networks for which the Pr and MPE problems are feasible. The algorithm constructs a depth-first search tree by choosing values for subsequent MAP variables, cutting off branches using an upper bound. Since our preprocessing method reduces the search space by fixing variables and excluding values, it essentially serves to decrease the depth of the tree and to diminish its branching factor.
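The search scheme of such a branch-and-bound algorithm can be sketched generically. In the sketch below, score and upper_bound are assumed to be given as callables; in the actual algorithm of Park and Darwiche both are obtained by propagation in the network, which this sketch abstracts away — it illustrates only the pruned depth-first search over MAP-variable instantiations.

```python
def map_branch_and_bound(variables, domains, score, upper_bound):
    """Schematic depth-first branch-and-bound for a MAP problem.

    score(assign)       -- probability of a complete instantiation
    upper_bound(assign) -- upper bound on score over all completions
                           of the partial instantiation `assign`
    Returns the best complete instantiation and its score."""
    best = {"p": 0.0, "assign": None}

    def dfs(i, assign):
        if i == len(variables):                  # complete instantiation
            p = score(assign)
            if p > best["p"]:
                best["p"], best["assign"] = p, dict(assign)
            return
        var = variables[i]
        for val in domains[var]:
            assign[var] = val
            if upper_bound(assign) > best["p"]:  # otherwise prune branch
                dfs(i + 1, assign)
            del assign[var]

    dfs(0, {})
    return best["assign"], best["p"]
```

Fixing a variable beforehand simply removes it from `variables`, shrinking the depth of this search tree; excluding a value shrinks the corresponding domain.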

4.2 Experimental Results

In the first experiment, we established for each network the number of values that could be fixed or excluded by applying Proposition 1, that is, by studying the marginal distributions per variable given its Markov blanket. For computational reasons, we decided not to investigate variables for which the associated blanket had more than 45,000 different instantiations; this number is arbitrarily chosen. In the WD, CSF and OESO+ networks, there were 0, 8 and 15 such variables, respectively.

The results of the first experiment are presented in Table 1. The table reports, for each network, the total number of variables, the total number of values, the number of variables for which a value could be fixed, and the number of values that could be fixed or excluded; note that if, for example, for a ternary variable a value can be fixed, then also two values can be excluded. We observe that 17.1% to 19.0% of the variables could be fixed to a particular value. The number of values that could be fixed

Table 1: The numbers of fixed and excluded values.

network  #vars.  #vals.  #vars. f.    #vals. f.+e.
WD        21      56      4 (19.0%)   13 (23.2%)
CSF       41      98      7 (17.1%)   15 (15.3%)
OESO+     67     175     12 (17.9%)   27 (15.4%)

or excluded ranges between 15.3% and 23.2%. We would like to stress that whenever a variable can be fixed to a particular value, this result is valid for any MAP problem that includes this variable. The computations involved, therefore, have to be performed only once.

In the second experiment, we compared for each network the numbers of variables that could be fixed by the two different preprocessing steps; for Proposition 2, we restricted the number of network propagations per MAP variable to four because of the limited applicability of the resulting information. We further established the numbers of network propagations of the exact branch-and-bound algorithm that were forestalled by the preprocessing.

The results of the second experiment are presented in Table 2. The table reports in the two leftmost columns, for each network, the sizes of the MAP sets used and the average number of network propagations performed without any preprocessing. In the subsequent two columns, it reports the average number of variables that could be fixed to a particular value by using Proposition 1 and the average number of network propagations performed by the branch-and-bound algorithm after this preprocessing step. In the fifth and sixth columns, the table reports the numbers obtained with Proposition 2; the sixth column in addition shows, in parentheses, the average number of propagations that are required for the application of Proposition 2. In the final two columns of the table, results for the two preprocessing steps combined are reported: the penultimate column again mentions the number of variables that could be fixed to a particular value by Proposition 1; it moreover mentions the average number of variables that could be fixed to a particular value by Proposition 2 after Proposition 1



had been used. The rightmost column reports the average number of network propagations required by the branch-and-bound algorithm. We would like to note that the additional computations required for Proposition 2 are restricted network propagations; although the worst-case complexity of these propagations is the same as that of the propagations performed by the branch-and-bound algorithm, their runtime requirements may be considerably less.

From Table 2, we observe that the number of network propagations performed by the branch-and-bound algorithm grows with the number of MAP variables, as expected. For the two smaller networks, we observe in fact that the number of propagations without preprocessing equals the number of MAP variables plus one. For the larger OESO+ network, the number of network propagations performed by the algorithm is much larger than the number of MAP variables. This finding is not unexpected since the MAP problem has a high computational complexity. For the smaller networks, we further observe that each variable that is fixed by one of the preprocessing steps translates directly into a reduction of the number of propagations by one. For the OESO+ network, we find a larger reduction in the number of network propagations per fixed variable. Our experimental results thus indicate that the number of propagations required by the branch-and-bound algorithm indeed is decreased by fixing variables to their MAP value.

With respect to using Proposition 1, we observe that in all networks under study a reasonable number of variables could be fixed to a MAP value. We did not take the number of local propagations required for this proposition into consideration because the computational burden involved is amortised over future MAP computations. With respect to using Proposition 2, we observe that for all networks and all MAP sets the additional computations involved outweigh the number of network computations that are forestalled for the algorithm. This observation applies to using just Proposition 2 as well as to using the proposition after Proposition 1 has been applied. We also observe that

fewer variables are fixed in the step based on Proposition 2 than in the step based on Proposition 1. This can be attributed to the limited number of propagations used in the step based on Proposition 2. We conclude that, for the networks under study, it has been quite worthwhile to use Proposition 1 as a preprocessing step before actually solving the various MAP problems. Because of the higher computational burden involved and its relative lack of additional value, the use of Proposition 2 has not been worthwhile for our networks and associated problems.

To conclude, in Section 3 we observed that fixing variables to their MAP value could serve to partition a MAP problem into smaller problems. Such a partition did not occur in our experiments. We would like to note, however, that we studied MAP problems without evidence only. We expect that in the presence of evidence MAP problems will more readily be partitioned into smaller problems.

5 Conclusions and discussion

The MAP problem for Bayesian networks is the problem of finding for a set of variables an instantiation of highest posterior probability given the available evidence. The problem has an unfavourable computational complexity, being NP^PP-hard in general. In this paper, we showed that computation of the marginal posterior probabilities of a variable H given its Markov blanket may provide exact information about its value in a MAP solution. This information is valid for any MAP problem that includes the variable H. We further showed that computation of the marginal probabilities of H

given its MAP blanket may also provide exact information about its value. This information is valid, however, for a more restricted class of MAP problems. We argued that these results can be exploited for preprocessing MAP problems before they are actually solved using any state-of-the-art algorithm for this purpose.

Table 2: The number of network propagations without and with preprocessing using Propositions 1 and 2.

                         Prop. 1             Prop. 2                   Prop. 1+2
network  #MAP  #props.   #vars. f.  #props.  #vars. f.  #props. (add.)  #vars. f.   #props. (add.)
WD        5      6.0       1.2       4.8       0.0       6.0 (0.6)      1.2 + 0.0    4.8 (0.6)
         10     11.0       2.4       8.6       1.0      10.0 (6.4)      2.4 + 0.2    8.4 (4.0)
CSF      10     11.0       2.0       9.0       1.0      10.0 (1.6)      2.0 + 0.4    8.6 (0.4)
         20     21.0       3.4      17.6       1.6      19.4 (8.2)      3.4 + 0.6   17.0 (5.0)
OESO+    17     24.8       3.4      18.4       0.8      22.0 (4.2)      3.4 + 0.0   18.4 (2.2)
         34     49.8       5.8      40.8       1.8      47.4 (7.8)      5.8 + 0.4   40.0 (4.6)

We performed a preliminary experimental study of the practicability of the preprocessing steps by solving MAP problems for three different Bayesian networks using an exact branch-and-bound algorithm. As expected, the number of network propagations required by the algorithm is effectively decreased by fixing MAP variables to their appropriate values. We found that by building upon the concept of Markov blanket a MAP value could be fixed for 17.1% to 19.0% of the variables. Since the results of this preprocessing step are applicable to all future MAP computations and the computational burden involved thus is amortised, we considered it worthwhile to perform this step for the investigated networks. We would like to add that, because the computations involved are highly local, the step may also be feasibly applied to networks that are too large for the exact MAP algorithm used in the experiments. With respect to building upon the concept of MAP blanket, we found that for the investigated networks and associated MAP problems the computations involved outweighed the reduction in network propagations that was achieved. Since the networks in our study were comparable with respect to size and connectivity, further experiments are necessary before any definitive conclusion can be drawn with respect to this preprocessing step.

In our further research, we will expand the experiments to networks with different numbers of variables, different cardinality and diverging connectivity. More specifically, we will investigate for which types of network preprocessing is most profitable. In our future experiments we will also take the effect of evidence into account. We will further investigate if the class of MAP problems that can be feasibly solved can be extended by our preprocessing method. We hope to report further results in the near future.

Acknowledgements

This research was partly supported by the Netherlands Organisation for Scientific Research (NWO).

References

B.M.P. Aleman, L.C. van der Gaag and B.G. Taal. 2000. A Decision Model for Oesophageal Cancer – Definition Document (internal publication in Dutch).

P.L. Geenen and L.C. van der Gaag. 2005. Developing a Bayesian network for clinical diagnosis in veterinary medicine: from the individual to the herd. In 3rd Bayesian Modelling Applications Workshop, held in conjunction with the 21st Conference on Uncertainty in Artificial Intelligence.

M. Korver and P.J.F. Lucas. 1993. Converting a rule-based expert system into a belief network. Medical Informatics, 18:219–241.

J. Park and A. Darwiche. 2001. Approximating MAP using local search. In 17th Conference on Uncertainty in Artificial Intelligence, pages 403–410.

J. Park. 2002. MAP complexity results and approximation methods. In 18th Conference on Uncertainty in Artificial Intelligence, pages 388–396.

J. Park and A. Darwiche. 2003. Solving MAP exactly using systematic search. In 19th Conference on Uncertainty in Artificial Intelligence, pages 459–468.

J. Pearl. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers, Palo Alto.



Quartet-Based Learning of Shallow Latent Variables

Tao Chen and Nevin L. Zhang

Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong, China

csct,[email protected]

Abstract

Hierarchical latent class (HLC) models are tree-structured Bayesian networks where leaf nodes are observed while internal nodes are hidden. We explore the following two-stage approach for learning HLC models: one first identifies the shallow latent variables – latent variables adjacent to observed variables – and then determines the structure among the shallow and possibly some other "deep" latent variables. This paper is concerned with the first stage. In earlier work, we have shown how shallow latent variables can be correctly identified from quartet submodels if one could learn them without errors. In reality, one does make errors when learning quartet submodels. In this paper, we study the probability of such errors and propose a method that can reliably identify shallow latent variables despite the errors.

1 Introduction

Hierarchical latent class (HLC) models (Zhang, 2004) are tree-structured Bayesian networks where variables at leaf nodes are observed and hence are called manifest variables (nodes), while variables at internal nodes are hidden and hence are called latent variables (nodes). HLC models were first identified by Pearl (1988) as a potentially useful class of models and were first systematically studied by Zhang (2004) as a framework to alleviate the disadvantages of LC models for clustering. As a tool for cluster analysis, HLC models produce more meaningful clusters than latent class models and they allow multi-way clustering at the same time. As a tool for probabilistic modeling, they can model high-order interactions among observed variables and help one to reveal interesting latent structures behind data. They also facilitate unsupervised profiling.

Several algorithms for learning HLC models have been proposed. Among them, the heuristic single hill-climbing (HSHC) algorithm developed by Zhang and Kocka (2004) is currently the most efficient. HSHC has been used to successfully analyze, among others, the CoIL

Challenge 2000 data set (van der Putten and van Someren, 2004), which consists of 42 manifest variables and 5,822 records, and a data set about traditional Chinese medicine (TCM), which consists of 35 manifest variables and 2,600 records.

In terms of running time, HSHC took 98 hours to analyze the aforementioned TCM data set on a top-end PC, and 121 hours to analyze the CoIL Challenge 2000 data set. It is clear that HSHC will not be able to analyze data sets with hundreds of manifest variables.

Aimed at developing algorithms more efficient than those currently available, we explore a two-stage approach where one (1) identifies the shallow latent variables, i.e. latent variables adjacent to observed variables, and (2) determines the structure among those shallow, and possibly some other "deep", latent variables. This paper is concerned with the first stage.

In earlier work (Chen and Zhang, 2005), we have shown how shallow latent variables can be correctly identified from quartet submodels if one could learn them without errors. In reality, one does make errors when learning quartet submodels. In this paper, we study the probability of such errors and propose a method that


[Figure: two panels (a) and (b), each showing a tree over the latent nodes X1, X2, X3 and the manifest nodes Y1–Y7.]

Figure 1: An example HLC model and the corresponding unrooted HLC model. The Xi's are latent nodes and the Yj's are manifest nodes.

can reliably identify shallow latent variables despite the errors.

2 HLC Models and Shallow Latent Variables

Figure 1 (a) shows an example HLC model. Zhang (2004) has proved that it is impossible to determine, from data, the orientation of edges in an HLC model. One can learn only unrooted

HLC models, i.e. HLC models with all directions on the edges dropped. Figure 1 (b) shows an example unrooted HLC model. An unrooted HLC model represents a class of HLC models. Members of the class are obtained by rooting the model at various nodes. Semantically it is a Markov random field on an undirected tree. In the rest of this paper, we are concerned only with unrooted HLC models.

In this paper, we will use the term HLC structure to refer to the set of nodes in an HLC model and the connections among them. An HLC structure is regular if it does not contain latent nodes of degree 2. Starting from an irregular HLC structure, we can obtain a regular structure by connecting the two neighbors of each latent node of degree 2 and then removing that node. This process is known as regularization. In this paper, we only consider regular HLC structures.

In an HLC model, a shallow latent variable (SLV) is one that is adjacent to at least one manifest variable. Two manifest variables are siblings if they are adjacent to the same (shallow) latent variable. For a given shallow latent variable X, all manifest variables adjacent to X constitute a sibling cluster. In the HLC structure shown in Figure 1 (b), there are 3 sibling clusters, namely {Y1}, {Y2, Y3, Y4}, and {Y5, Y6, Y7}. They correspond to the three latent variables in the model respectively.

[Figure: four panels (a)–(d), each showing a quartet structure over U, V, T and W.]

Figure 2: Four possible quartet substructures for a quartet Q = {U, V, T, W}. The fork in (a) is denoted by [UVTW], the dogbones in (b), (c), and (d) respectively by [UV|TW], [UT|VW], and [UW|VT].

3 Quartet-Based SLV Discovery: The Principle

A shallow latent node is defined by its relationship with its manifest neighbors. Hence to identify the shallow latent nodes means to identify sibling clusters. To identify sibling clusters, we need to determine, for each pair of manifest variables (U, V), whether U and V are siblings. In this section, we explain how to answer this question by inspecting quartet submodels.

A quartet is a set of four manifest variables, e.g., Q = {U, V, T, W}. The restriction of an HLC model structure S onto Q is obtained from S by deleting all the nodes and edges not in the paths between any pair of variables in Q. Applying regularization to the resulting HLC structure, we obtain the quartet substructure for Q, which we denote by S|Q. As shown in Figure 2, S|Q is either the fork [UVTW], or one of the dogbones [UV|TW], [UT|VW], and [UW|VT].

Consider the HLC structure in Figure 1 (b). The quartet substructure for {Y1, Y2, Y3, Y4} is the fork [Y1Y2Y3Y4], while that for {Y1, Y2, Y4, Y5} is the dogbone [Y1Y5|Y2Y4], and that for {Y1, Y2, Y5, Y6} is the dogbone [Y1Y2|Y5Y6].
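The restriction-and-regularization operation can be sketched as follows, assuming the structure is given as an undirected adjacency map. Running it on the structure of Figure 1 (b) for the quartet {Y1, Y2, Y5, Y6} indeed yields the dogbone [Y1Y2|Y5Y6] described above.

```python
from itertools import combinations

def quartet_substructure(adj, latent, quartet):
    """Restrict a tree structure to a quartet: keep only nodes on paths
    between quartet members, then regularize away latent degree-2 nodes.

    adj -- dict mapping each node to the set of its neighbours (a tree)."""
    def path(a, b):
        # depth-first path search; unique path since the structure is a tree
        stack = [(a, [a])]
        while stack:
            n, p = stack.pop()
            if n == b:
                return p
            stack.extend((m, p + [m]) for m in adj[n] if m not in p)
        return []

    keep = set()
    for a, b in combinations(quartet, 2):
        keep.update(path(a, b))
    sub = {n: adj[n] & keep for n in keep}
    # regularization: splice out latent nodes of degree 2
    changed = True
    while changed:
        changed = False
        for n in list(sub):
            if n in sub and n in latent and len(sub[n]) == 2:
                u, v = sub[n]
                sub[u].discard(n); sub[v].discard(n)
                sub[u].add(v); sub[v].add(u)
                del sub[n]
                changed = True
    return sub
```

The result is either a fork (one remaining latent node) or a dogbone (two remaining latent nodes), matching Figure 2.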

It is obvious that if two manifest variables U and V are siblings in the structure S, then they must be siblings in any quartet substructure that involves both of them. Chen and Zhang



(2005) have proved the converse. So, we have

Theorem 1 Suppose S is a regular HLC structure. Let (U, V) be a pair of manifest variables. Then U and V are not siblings in S iff there exist two other manifest variables T and W such that S|{U,V,T,W} is a dogbone where U and V are not siblings.

Theorem 1 indicates that we can determine whether U and V are siblings by examining all possible quartets involving both U and V. There are (n−2)(n−3)/2 such quartets, where n is the number of manifest variables. We next present a result that allows us to do the same by examining only n − 3 quartets.

Let (U, V) be a pair of manifest variables and T be a third one. We use QUV|T to denote the following collection of quartets:

QUV|T := { {U, V, T, W} | W ∈ Y \ {U, V, T} },

where Y is the set of all manifest variables. T appears in every quartet in QUV|T and is thus called a standing member of QUV|T. It is clear that QUV|T consists of n − 3 quartets. Chen and Zhang (2005) have also proved the following:
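Enumerating the collection QUV|T is then straightforward; a minimal sketch:

```python
def quartets_with_standing_member(Y, U, V, T):
    """Enumerate Q_{UV|T}: the quartets {U, V, T, W} obtained by letting
    W range over the remaining manifest variables in Y.  Returns a list
    of frozensets, of length n - 3."""
    return [frozenset((U, V, T, W)) for W in Y if W not in (U, V, T)]
```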

Theorem 2 Suppose S is a regular HLC structure. Let (U, V) be a pair of manifest variables and T be a third one. U and V are not siblings in S iff there exists a quartet Q ∈ QUV|T such that S|Q is a dogbone where U and V are not siblings.

In learning tasks, we do not know the structure S of the generative model. Where do we obtain the quartet substructures? The answer is to learn them from data. Let M be an HLC model with a regular structure S. Suppose that D is a collection of i.i.d. samples drawn from M. Each record in D contains values for the manifest variables, but not for the latent variables. Let QSL(D, Q) be a routine that takes data D and a quartet Q as inputs, and returns an HLC structure on the quartet Q. One can use the HSHC algorithm to implement QSL, and one can first project the data D onto the quartet Q when learning the substructure for Q.

Suppose QSL is error-free, i.e. QSL(D, Q) = S|Q for any quartet Q. By Theorem 2, we

can determine whether two manifest variables U and V are siblings (in the generative model) as follows:

• Pick a third manifest variable T.

• For each Q ∈ QUV|T, call QSL(D, Q).

• If U and V are not siblings in one of the resulting substructures, then conclude that they are not siblings (in the generative model).

• If U and V are siblings in all the resulting substructures, then conclude that they are siblings (in the generative model).
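The procedure above can be sketched as follows. Here qsl is a stand-in for the learned routine QSL(D, ·), assumed for the sketch to return either ("fork",) or ("dogbone", pair1, pair2) with the dogbone's two sibling pairs; in practice it would be implemented with HSHC on the data projected onto the quartet.

```python
from itertools import combinations

def are_siblings(U, V, T, Y, qsl):
    """Test whether U and V are siblings by examining the n - 3 quartets
    in Q_{UV|T} (Theorem 2).  `qsl` plays the role of QSL(D, .)."""
    for W in Y:
        if W in (U, V, T):
            continue
        result = qsl(frozenset((U, V, T, W)))
        if result[0] == "dogbone" and {U, V} not in [set(p) for p in result[1:]]:
            return False          # a dogbone separates U from V
    return True

def sibling_graph(Y, qsl):
    """Edges of the sibling graph over the manifest variables Y."""
    edges = set()
    for U, V in combinations(Y, 2):
        T = next(w for w in Y if w not in (U, V))   # any standing member
        if are_siblings(U, V, T, Y, qsl):
            edges.add(frozenset((U, V)))
    return edges
```

With an error-free qsl, the connected components of the returned graph are the sibling clusters, each corresponding to one shallow latent variable.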

We can run the above procedure on each pair of manifest variables to determine whether they are siblings. Afterwards, we can summarize all the results using a sibling graph. The sibling graph is an undirected graph over the manifest variables where two variables U and V are adjacent iff they are determined as siblings.

If QSL is error-free, then each connected component of the sibling graph should be completely connected and correspond to one latent variable. For example, if the structure of the generative model is as in Figure 3 (a), then the sibling graph that we obtain will be as in Figure 3 (b). There are four completely connected components over the variables Y1 through Y12, which respectively correspond to the four latent variables in the generative structure.

4 Probability of Learning Quartet Submodels Correctly

In the previous section, we assumed that QSL is error-free. In reality, one does make mistakes when learning quartet submodels. We have empirically studied the probability of such errors.

For our experiments, QSL was implemented using the HSHC algorithm. For model selection, we tried each of the scoring functions, namely BIC (Schwarz, 1978), BICe (Kocka and Zhang, 2002), AIC (Akaike, 1974), and the Cheeseman-Stutz (CS) score (Cheeseman and Stutz, 1995).

We randomly generated around 20,000 quartet models. About half of them are fork-structured, while the rest are dogbone-structured. The cardinalities of the variables



range from 2 to 5. From each of the models, we sampled data sets of size 500, 1,000, 2,500, and 5,000. The QSL was then used to analyze the data sets. In all the experiments, QSL produced either forks or dogbones. Consequently, there are only three classes of errors:

F2D: The generative model was a fork, and QSL

produced a dogbone.

D2F: The generative model was a dogbone, andQSL produced a fork.

D2D: The generative model was a dogbone, and QSL produced a different dogbone.

The statistics are shown in Table 1. To understand the meaning of the numbers, consider the number 0.83% at the upper-left corner of the top table. It means that, when the sample size was 500, QSL returned dogbones in 0.83%, or 83, of the 10,011 cases where the generative models were forks. In all the other cases, QSL returned the correct fork structure.

It is clear from the tables that the probability of D2F errors is large, the probability of F2D errors is small, and the probability of D2D errors is very small, especially when BIC or BICe is used for model selection. Also note that the probability of D2F errors decreases with sample size, but that of F2D errors does not. In the next section, we will use those observations when designing an algorithm for identifying shallow latent variables.

In terms of comparison among the scoring functions, BIC and BICe are clearly preferred over the other two as far as F2D and D2D errors are concerned. It is interesting to observe that BICe, although proposed as an improvement to BIC, is not as good as BIC when it comes to learning quartet models. For the rest of this paper, we use BIC.

5 Quartet-Based SLV Discovery: An Algorithm

The quartet-based approach to SLV discovery consists of three steps: (1) learn quartet submodels, (2) determine sibling relationships among manifest variables and hence obtain a sibling graph, and (3) introduce SLVs based on the sibling graph. Three questions ensue:

Table 1: Percentage of times that QSL produced the wrong quartet structure. The table on the top is for the fork-structured generative models, while the table at the bottom is for the dogbone-structured generative models.

F2D errors (fork-structured generative models, total = 10,011 cases):

       500     1000    2500    5000
BIC    0.83%   1.87%   3.70%   3.93%
BICe   1.42%   3.16%   6.58%   7.86%
AIC    12.8%   10.2%   6.81%   4.85%
CS     6.20%   5.05%   6.19%   6.41%

D2F and D2D errors (dogbone-structured generative models, total = 10,023 cases):

       500            1000           2500           5000
       D2F    D2D     D2F    D2D     D2F    D2D     D2F    D2D
BIC    95.3%  0.00%   87.0%  0.00%   68.2%  0.00%   51.5%  0.00%
BICe   95.0%  0.05%   86.8%  0.00%   67.8%  0.04%   50.6%  0.04%
AIC    71.2%  2.71%   60.5%  1.58%   46.7%  0.54%   39.0%  0.38%
CS     88.5%  1.93%   83.4%  0.74%   67.0%  0.30%   51.5%  0.13%

i) Which quartets should we use in Step 1?
ii) How do we determine sibling relationships in Step 2 based on results from Step 1?
iii) How do we introduce SLVs in Step 3 based on the sibling graph constructed in Step 2?

Our answer to the second question is simple: two manifest variables are regarded as non-siblings if they are not siblings in one of the quartet submodels. In the next two subsections, we discuss the other two questions.

5.1 SLV Introduction

As seen in Section 3, when QSL is error-free, the sibling graph one obtains has a nice property: every connected component is a fully connected subgraph. In this case, the rule for SLV introduction is obvious:

Introduce one latent variable for each connected component.

When QSL is error-prone, the sibling graph one obtains no longer has the aforementioned property. Suppose data are sampled from a model with the structure shown in Figure 3 (a). Then what one obtains might be the graphs (c), (d), or (e) instead of (b).

Nonetheless, we still use the above SLV introduction rule in the general case. We choose it for its simplicity, and for lack of better alternatives. This choice also endows the SLV discovery algorithm being developed with error tolerance abilities.

62 T. Chen and N. L. Zhang


Figure 3: Generative model (a), sibling graphs (b, c, d, e), and shallow latent variables (f).

There are three types of mistakes that one can make when introducing SLVs, namely latent omission, latent commission, and misclustering. In the example shown in Figure 3, if we introduce SLVs based on the sibling graph (c), then we will introduce three latent variables. Two of them correspond respectively to the latent variables X1 and X2 in the generative model, while the third corresponds to a "merge" of X3 and X4. So, one latent variable is omitted.

If we introduce SLVs based on the sibling graph (d), then we will introduce five latent variables. Three of them correspond respectively to X1, X3, and X4, while the other two are both related to X2. So, one latent variable is commissioned.

If we introduce SLVs based on the sibling graph (e), then we will introduce four latent variables. Two of them correspond respectively to X1 and X4. The other two are related to X2 and X3, but there is no clear correspondence. This is a case of misclustering.

We next turn to the first question. There, the most important concern is how to minimize errors.

5.2 Quartet Selection

To determine whether two manifest variables U and V are siblings, we can consider all quartets in QUV|T, i.e. all the quartets with a third variable T as a standing member. This selection of quartets will be referred to as the parsimonious selection. There are only n − 3 quartets in the selection.

If QSL were error-free, one could use QUV|T to correctly determine whether U and V are siblings. As an example, suppose data are sampled from a model with the structure shown in Figure 3 (a), and we want to determine whether Y9 and Y11 are siblings based on data. Further suppose Y4 is picked as the standing member. If QSL is error-free, we get 4 dogbones where Y9 and Y11 are not siblings, namely [Y9Y7|Y11Y4], [Y9Y8|Y11Y4], [Y9Y4|Y11Y10], [Y9Y4|Y11Y12], and hence conclude that Y9 and Y11 are not siblings.

In reality, QSL does make mistakes. According to Section 4, the probability of D2F errors is quite high. There is therefore a good chance that, instead of the aforementioned 4 dogbones, we get 4 forks. In that case, Y9 and Y11 will be regarded as siblings, resulting in a fake edge in the sibling graph.

QUV|T represents one extreme when it comes to quartet selection. The other extreme is to use all the quartets that involve both U and V. This selection of quartets will be referred to as the generous selection. Suppose U and V are non-siblings in the generative model. This generous choice will reduce the effects of D2F errors. As a matter of fact, there are now many more quartets, when compared with the case of parsimonious quartet selection, for which the true structures are dogbones with U and V being non-siblings. If we learn one of those structures correctly, we will be able to correctly identify U and V as non-siblings.

The generous selection also comes with a drawback. In our running example, consider the task of determining whether Y1 and Y2 are siblings based on data. There are 36 quartets



that involve Y1 and Y2 and for which the true structures are forks. Those include [Y1Y2Y4Y7], [Y1Y2Y4Y10], and so on. According to Section 4, the probability for QSL to make an F2D mistake on any one of those quartets is small. But there are 36 of them, so there is a good chance for QSL to make an F2D mistake on one of them. If the structure QSL learned for [Y1Y2Y4Y7], for instance, turns out to be the dogbone [Y1Y4|Y2Y7], then Y1 and Y2 will be regarded as non-siblings, and hence the edge between Y1 and Y2 will be missing from the sibling graph.

Those discussions point to a middle ground between parsimonious and generous quartet selection. One natural way to explore this middle ground is to use several standing members instead of one.
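To make the quartet counts of the two extremes concrete, here is a small sketch for the running example with n = 12 manifest variables. It assumes, consistently with the quartets listed in the text, that a quartet's true structure is a dogbone separating Y1 from Y2 only when the other two variables are siblings in another cluster; all names and the integer indexing are our own.

```python
from itertools import combinations
from math import comb

n = 12                          # manifest variables Y1..Y12, as in Figure 3 (a)
clusters = {0: range(0, 3), 1: range(3, 6), 2: range(6, 9), 3: range(9, 12)}
cluster_of = {y: c for c, ys in clusters.items() for y in ys}

# Parsimonious selection for a pair (U, V): one standing member T, so
# only the n - 3 quartets {U, V, T, X} are examined.
parsimonious = n - 3                          # 9

# Generous selection: every quartet containing both U and V.
generous = comb(n - 2, 2)                     # 45

# For the sibling pair (Y1, Y2), count the generous quartets whose true
# structure is a fork: a dogbone [Y1Y2|AB] arises only when the other
# two variables A, B are siblings in another cluster.
U, V = 0, 1
forks = sum(1 for a, b in combinations(set(range(n)) - {U, V}, 2)
            if not (cluster_of[a] == cluster_of[b] != cluster_of[U]))
print(parsimonious, generous, forks)          # 9 45 36
```

This reproduces the counts used in the discussion: 9 quartets under parsimonious selection, 45 under generous selection, of which 36 are true forks and thus exposed to F2D mistakes.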

5.3 The Algorithm

Figure 4 shows an algorithm for discovering shallow latent variables, namely DiscoverSLVs. The algorithm first calls a subroutine ConstructSGraph to construct a sibling graph, and then finds all the connected components of the graph. It is understood that one latent variable is introduced for each of the connected components.

The subroutine ConstructSGraph starts from the complete graph. For each pair of manifest variables U and V, it considers all the quartets that involve U, V, and one of the m standing members Ti (i = 1, 2, . . . , m). QSL is called to learn a submodel structure for each of the quartets. If U and V are not siblings in one of the quartet substructures, the edge between U and V is deleted.

DiscoverSLVs has an error tolerance mechanism naturally built in. This is mainly because it regards connected components in the sibling graph as sibling clusters. Let C be a sibling cluster in the generative model. The vertices in C will be placed in one cluster by DiscoverSLVs if they are in the same connected component in the sibling graph G produced by ConstructSGraph. It is not required for variables in C to be pairwise connected in G. Therefore, a few mistakes by ConstructSGraph when determining sibling relationships are tolerated. When the set C is not very small, it takes a few or more mistakes in the right combination to break up C.

Algorithm DiscoverSLVs(D, m):
1. G ← ConstructSGraph(D, m).
2. return the list of the connected components of G.

Algorithm ConstructSGraph(D, m):
1. G ← complete graph over manifest nodes Y.
2. for each edge (U, V) of G,
3.   pick {T1, ..., Tm} ⊆ Y \ {U, V}
4.   for each Q ∈ ∪_{i=1}^{m} QUV|Ti,
5.     if QSL(D, Q) = [U*|V*]
6.       delete edge (U, V) from G, break.
7.   endFor.
8. endFor.
9. return G.

Figure 4: An algorithm for learning SLVs.
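As a runnable illustration of the pipeline, here is a compact Python sketch of DiscoverSLVs with QSL replaced by an error-free oracle. The oracle encodes one plausible reading of Figure 3 (a), consistent with the dogbones listed in Section 5.2: X1 at the root with manifest children Y1-Y3, and X2-X4 hanging off X1 with manifest children Y4-Y6, Y7-Y9, Y10-Y12. That reading, and all names below, are our own assumptions, not code from the paper.

```python
from itertools import combinations

ROOT = 0
cluster_of = {f"Y{i}": (i - 1) // 3 for i in range(1, 13)}

def oracle_not_siblings(u, v, a, b):
    """Error-free QSL stand-in: True iff the assumed generative tree admits
    a quartet split [u*|v*].  An edge X1-Xk cuts off exactly the manifest
    children of Xk (k != root)."""
    for x, y in ((a, b), (b, a)):
        if (cluster_of[u] == cluster_of[x] != ROOT
                and cluster_of[v] != cluster_of[u] != cluster_of[y]):
            return True
        if (cluster_of[v] == cluster_of[y] != ROOT
                and cluster_of[u] != cluster_of[v] != cluster_of[x]):
            return True
    return False

def discover_slvs(variables, standing, not_siblings):
    """DiscoverSLVs sketch: drop edge (U, V) as soon as one quartet built
    from a standing member separates them; the connected components of
    the surviving sibling graph are the sibling clusters (one SLV each)."""
    parent = {v: v for v in variables}          # union-find forest
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in combinations(variables, 2):
        quartets = {frozenset((u, v, t, w))
                    for t in standing if t not in (u, v)
                    for w in variables if w not in (u, v, t)}
        if not any(not_siblings(u, v, *(q - {u, v})) for q in quartets):
            parent[find(u)] = find(v)           # keep the sibling edge
    clusters = {}
    for v in variables:
        clusters.setdefault(find(v), set()).add(v)
    return sorted(clusters.values(), key=lambda c: min(int(y[1:]) for y in c))

ys = [f"Y{i}" for i in range(1, 13)]
print(discover_slvs(ys, standing=ys[:3], not_siblings=oracle_not_siblings))
```

With an error-free oracle and three standing members, the four true sibling clusters come back exactly; plugging in a noisy predicate instead shows the error tolerance discussed above.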

6 Empirical Evaluation

We have carried out simulation experiments to evaluate the ability of DiscoverSLVs to discover latent variables. This section describes the setup of the experiments and reports our findings.

The generative models in the experiments share the same structure. The structure consists of 39 manifest variables and 13 latent variables. The latent variables form a complete 3-ary tree of height two. Each latent variable in the structure is connected to 3 manifest variables. Hence all latent variables are shallow. The cardinalities of all variables were set at 3.

We created 10 generative models from the structure by randomly assigning parameter values. From each of the 10 generative models, we sampled 5 data sets of 500, 1,000, 2,500, 5,000, and 10,000 records. DiscoverSLVs was run on each of the data sets three times, with the number of standing members m set at 1, 3, and 5 respectively. The algorithms were implemented in Java and all experiments were run on a Pentium 4 PC with a clock rate of 3.2 GHz.

The performance statistics are summarized in Table 2. They consist of errors at three different



Table 2: Performance statistics of our SLV discovery algorithm. D2F, D2D, and F2D are QSL-level errors; ME and FE are edge-level; LO, LCO, and MC are SLV-level.

m    D2F           D2D            F2D            ME        FE          LO         LCO       MC

Sample size = 500
1    77.0%(4.5%)   0.06%(0.05%)   0.40%(0.13%)   1.0(1.1)  88.2(9.2)   11(3.0)    0(0)      0(0)
3    71.8%(4.0%)   0.05%(0.03%)   0.36%(0.14%)   2.9(1.3)  28.0(8.3)   9.3(3.1)   0(0)      0.1(0.3)
5    68.8%(3.0%)   0.05%(0.02%)   0.30%(0.14%)   3.3(1.6)  14.7(6.0)   5.1(1.9)   0.2(0.4)  0(0)

Sample size = 1000
1    65.3%(6.2%)   0.01%(0.03%)   0.17%(0.14%)   0.3(0.6)  55.3(15.7)  11.2(0.7)  0(0)      0(0)
3    58.3%(2.7%)   0.02%(0.02%)   0.10%(0.07%)   0.7(0.9)  10.1(6.5)   4.8(2.9)   0(0)      0(0)
5    56.6%(3.2%)   0.01%(0.01%)   0.10%(0.08%)   1.1(0.7)  6.2(3.8)    2.3(1.3)   0.1(0.3)  0(0)

Sample size = 2500
1    50.1%(2.3%)   0.00%(0.00%)   0.04%(0.07%)   0.2(0.4)  20.4(8.9)   8.6(2.3)   0(0)      0(0)
3    38.0%(4.6%)   0.00%(0.00%)   0.02%(0.04%)   0.2(0.4)  3.1(3.7)    1.4(1.4)   0(0)      0(0)
5    37.3%(3.8%)   0.00%(0.01%)   0.07%(0.08%)   1.1(1.4)  1.3(2.5)    1.5(0.9)   0.1(0.3)  0(0)

Sample size = 5000
1    32.6%(5.5%)   0.00%(0.00%)   0.03%(0.09%)   0(0)      7.7(6.0)    3.9(2.4)   0(0)      0(0)
3    25.2%(4.4%)   0.01%(0.01%)   0.04%(0.06%)   0.3(0.5)  1.5(2.7)    0.5(0.7)   0(0)      0(0)
5    25.7%(3.9%)   0.00%(0.01%)   0.10%(0.11%)   0.9(0.9)  0.9(2.7)    0.1(0.3)   0(0)      0(0)

Sample size = 10000
1    21.8%(5.9%)   0.00%(0.00%)   0.02%(0.07%)   0(0)      2.0(3.3)    1.2(1.8)   0(0)      0(0)
3    17.5%(3.9%)   0.00%(0.00%)   0.05%(0.06%)   0.2(0.4)  0.4(1.2)    0.1(0.3)   0(0)      0(0)
5    17.4%(4.1%)   0.00%(0.01%)   0.03%(0.04%)   0.6(0.8)  0.1(0.3)    0.1(0.3)   0(0)      0(0)

levels of the algorithm: the errors made when learning quartet substructures (QSL-level), the errors made when determining sibling relationships between manifest variables (edge-level), and the errors made when introducing SLVs (SLV-level). Each number in the table is an average over the 10 generative models. The corresponding standard deviations are given in parentheses.

QSL-level errors: We see that the probabilities of the QSL-level errors are significantly smaller than those reported in Section 4. This is because we deal with strong dependency models here, while the numbers in Section 4 are about general models. This indicates that the strong dependency assumption does make learning easier. On the other hand, the trends remain the same: the probability of D2F errors is large, that of F2D errors is small, and that of D2D errors is very small. Moreover, the probability of D2F errors decreases with sample size.

Edge-level errors: For the edge level, the number of missing edges (ME) and the number of fake edges (FE) are reported. We see that the number of missing edges is always small, and in general it increases with the number of standing members m. This is expected since the larger m is, the more quartets one examines, and hence the more likely one is to make F2D errors.

The number of fake edges is large when the sample size is small and m is small. In general, it decreases with sample size and m. It dropped to 1.3 for the case of sample size 2,500 and m = 5. This is also expected. As m increases, the number of quartets examined also increases. For two manifest variables U and V that are not siblings in the generative model, the probability of obtaining a dogbone (despite D2F errors) where U and V are not siblings also increases. The number of fake edges decreases with sample size because the probability of D2F errors decreases with sample size.

SLV-level errors: We now turn to SLV-level errors. Because there were not many missing edges, true sibling clusters of the generative models were almost never broken up. There are only five exceptions. The first exception happened for one generative model in the case of sample size 500 and m = 3. In that case, a manifest variable from one true cluster was placed into another, resulting in one misclustering error (MC).

The other four exceptions happened for the following combinations of sample size and m: (500, 5), (1000, 5), (2500, 5). In those cases, one true sibling cluster was broken into two clusters, resulting in four latent commission errors (LCO).

Fake edges cause clusters to merge and hence lead to latent omission errors. In our experiments, the true clusters were almost never broken up. Hence a good way to measure latent omission errors is to use the total number of shallow latent variables, i.e. 13, minus the number of clusters returned by DiscoverSLVs. We call this the number of LO errors. In Table 2, we



see that the number of LO errors decreases with sample size and the number of standing members m. It dropped to 1.4 for the case of sample size 2,500 and m = 3. When the sample size was increased to 10,000 and m set to 3 or 5, LO errors occurred only once for one of the 10 generative models.
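The LO metric is simple arithmetic over the returned clustering; a small sketch (the cluster contents below are illustrative, not data from the experiments):

```python
TRUE_SLVS = 13   # shallow latent variables in the generative structure

def lo_errors(clusters_returned):
    """Latent omission errors: since true clusters are almost never broken
    up, each missing cluster corresponds to one omitted latent variable."""
    return TRUE_SLVS - len(clusters_returned)

# A fake edge merges two true clusters, so only 12 clusters come back:
merged = [{"A1", "A2", "A3", "B1", "B2", "B3"}] + \
         [{f"C{i}1", f"C{i}2", f"C{i}3"} for i in range(11)]
print(lo_errors(merged))   # 1 omitted latent variable
```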

Running time: The running times of DiscoverSLVs are summarized in the following table (in hours). For the sake of comparison, we also include the times HSHC took attempting to reconstruct the generative models based on the same data as used by DiscoverSLVs. We see that DiscoverSLVs took only a small fraction of the time that HSHC took, especially in the cases with large samples. This indicates that the two-stage approach that we are exploring can result in algorithms significantly more efficient than HSHC.

Running time (hrs)   500    1000   2500   5000   10000
m=1                  1.05   1.06   0.92   0.89   0.84
m=3                  1.57   1.52   1.49   1.79   1.91
m=5                  1.85   2.07   2.17   2.72   3.02
HSHC                 4.65   8.69   23.7   43.0   118.6

The numbers of quartets examined by DiscoverSLVs are summarized in the following table. We see that DiscoverSLVs examined only a small fraction of all the C(39,4) = 82,251 possible quartets. We also see that doubling m does not imply doubling the number of quartets examined, nor doubling the running time. Moreover, the number of quartets examined decreases with sample size. This is because the probability of D2F errors decreases with sample size.

        500    1000   2500   5000   10000
m=1     5552   4191   2861   2131   1816
m=3     8326   5939   4659   4292   4112
m=5     9801   8178   6776   6536   6496

Total: 82251
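The total is just the number of 4-element subsets of the 39 manifest variables; a quick check, using the m = 1, sample-size-500 entry from the table as an example:

```python
from math import comb

total = comb(39, 4)     # all quartets over the 39 manifest variables
print(total)            # 82251

examined = 5552         # m = 1, sample size 500 (from the table above)
print(f"{100 * examined / total:.1f}% of all quartets examined")   # 6.8%
```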

7 Related Work

Linear latent variable graphs (LLVGs) are a special class of structural equation models. Variables in such models are continuous. Some are observed, while others are latent. Silva et al. (2003) have studied the problem of identifying SLVs in LLVGs. Their approach is based on the Tetrad constraints (Spirtes et al., 2000), and its complexity is O(n^6).

Phylogenetic trees (PTs) (St. John et al., 2003) can be viewed as a special class of HLC models. The quartet-based method for PT reconstruction first learns submodels for all C(n,4) possible quartets and then uses them to build the overall tree (St. John et al., 2003).

8 Conclusions

We have empirically studied the probability of making errors when learning quartet HLC models and have observed some interesting regularities. In particular, we observed that the probability of D2F errors is high and decreases with sample size, while the probability of F2D errors is low. Based on those observations, we have developed an algorithm for discovering shallow latent variables in HLC models reliably and efficiently.

Acknowledgements

Research on this work was supported by Hong Kong Grants Council Grant #622105.

References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control.

Cheeseman, P. and Stutz, J. (1995). Bayesian classification (AutoClass): Theory and results. In Advances in Knowledge Discovery and Data Mining.

Chen, T. and Zhang, N.L. (2006). Quartet-based learning of HLC models: Discovery of shallow latent variables. In AIMath-06.

Kocka, T. and Zhang, N.L. (2002). Dimension correction for hierarchical latent class models. In UAI-02.

Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference.

Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics.

Silva, R., Scheines, R., Glymour, C. and Spirtes, P. (2003). Learning measurement models for unobserved variables. In UAI-03.

Spirtes, P., Glymour, C. and Scheines, R. (2000). Causation, Prediction and Search.

St. John, K., Warnow, T., Moret, B.M.E. and Vawter, L. (2003). Performance study of phylogenetic methods: (unweighted) quartet methods and neighbor-joining. Journal of Algorithms.

van der Putten, P. and van Someren, M. (2004). A bias-variance analysis of a real world learning problem: The CoIL Challenge 2000. Machine Learning.

Zhang, N.L. (2004). Hierarchical latent class models for cluster analysis. JMLR.

Zhang, N.L. and Kocka, T. (2004). Efficient learning of hierarchical latent class models. In ICTAI-04.


[The next paper in the proceedings, "Continuous Decision MTE Influence Diagrams" by B. R. Cobb (pp. 68-69 shown here), was extracted with a damaged font encoding; its text is not recoverable.]

o ADC F c¤ H I H FI Fprm q b I sS i>oi cfH I HIB HI p F i I HIgI

stOBR)dfe g5 86O2,9U£l)XÎPd.798,W8w^`o`^jR0P67WRrO?U>R^`,Wo7>U UÔÞ P(0 Q0 S,9Uc«ojOBR dfe gæ5 Þ×xb86O,¤cdOBRrOB.<S^gUd^`arR^`ZqP67WRrO?U>R^`,WoE7>U UÔÞ P 0 Q0 S¤vdOB.O%QÂÞ$Q 0 Q 3nb]O2Z\7>S8^gUd,/R^j7>Uª7WVu)ndfe g,9Udc dfe gq^`a),_P67WRrO?U>R^`,WoTVI7W. U×cdO XUdOQcq,Wa

ÐY^jk>f].OÌÛ p¦U]6f]O?UdZ\Oc^`,/kW.,9S"V¢7W.X!ud,9S_PojOÌ3

Kdfe g5]YÞÜ)ndfe g dfe g d?e g 5dÞÜ x)ndfe gæ5GBL dfe gæ5 Þx W

|

VI7W.,Wo`o 5HYRQ3qxZBZ\7W.c^gU]kRr7nR]O2stx! "Pd.79P],/k9,/R^j7>UaZ<dO?STO_ln,WcwarO?Ue,9UdceÙ9O?UdarO?UvæÃQÄWÄWÄ -R]OP67WRrO?U>R^`,Wo`a,/.OU]7WRZ\7>S8^gU]OQcv!8wf]R4.,/R]OB.TS_,W^gUR,W^gU]OQc2,WaE,cOQZ\7>S_P679arOQcaOBR[7WVæP67WRrO?U>R^`,Wo`a&cwf].^gU]kZ\7>S8^gUd,/R^j7>U3o Al@ m w#"WH I Jg H 5 d H FJIo Al@BA C S HIQ>oi i %$ I i'&w HIhgji p F c

m q b I s S i i]oc H I HIQ HIp F i I HIgId^`a)79P6OB.,/R^j7>U^`a^`o`ogfarRr.,/RrOQc­8>0qO\ud,9S_PojOKarOBO

Æ5-798w8vwy/W9Í [VI7W.b,TVI7W.<S_,WocO XUd^jR^j7>U63b V dcfehgji o A 5-7>Uda^`cdOB.RdO¥^gU]6f]O?UdZ\Ocw^`,/kW.,9S^gUäÐY^jk>f].OÎÌv§]OB.OÇâ Þ Þ _. |¿,9Udcâ ÞäÃbÞÜ_. |3_]O,/.^`,W8wojO( ^`aZ\7>Uc^jR^j7>U,Wo`oj0®cOBRrOB.<S_^gUd^`aR^`Zek9^jWO?U7A!9v,È.OQo`,/R^j7>Uda<d^`P.OQP.OQarO?U>RrOQc 8 0 R]OßcdOBRrOB.<S^gUd^`arR^`ZÒP67WRrO?U>R^`,WoVI.,/k>STO?U>Ra *)R D Þ*TÞ ) y D äÞÕx,9Udc Q*) D Þ Ã¥Þ ) Ç_. yW| D ¿ÃÀÞ xÆ3bdO .O?S_7?/,Wo¯7WV VI.7>S R]O Z\7>S8w^gUd,/R^j7>U7WV V¢7W.7#(? Â,9Udc"RdOÏ[lÐ VI7W. .O\afojRa­^gUR]O£VI79o`oj7?b^gUdkÀcdOBRrOB.<S_^gU^`arR^`ZP67WRrO?U>R^`,WoKÛ?_. | ) ªy D ÃÞx _. |! ) È_. yW| D ¨ÃÞàx >3]O)/,Wogf]OQa VI7W.-R]OcdOBRrOB.S_^gUd^`arR^`ZP67WRrO?U9R^`,Wo6,/.O

-OQ^jk>9RrOQcb^jR£RdOePd.798,W8^`o`^jR°0«,WogfdOQafdP67>U¥.O\S_7?/,Wo7WVRdOc^`aZ\.OBRrO/,/.^`,W8ojOW3o Al@BA @ FJI H I n F nt +$ I i'&w/HIwhgji

p F c m q b I s S i i]oc H I HIQ HIp F i I HIgIl,/.k9^gUd,Wo`^j©Q,/R^j7>UÀ7WV4,£Z\7>U9R^gUf]7>fda2/,/.^`,W8ojO `

VI.7>S¯R]OZ\7>S8w^gUd,/R^j7>Um7WV,9UlnbX«P7WRrO?U>R^`,Wo,9Udc,cdOBRrOB.<S_^gU^`arR^`ZmP67WRrO?U>R^`,Wo[^`aaf8arR^jRf]R^j7>Uª7WV&R]O^gU>WOB.arOq7WViR]OqOQÊ]fd,/R^j7>UæKa\^gU£R]OecdOBRrOB.<S^gUd^`arR^`Z

70 B. R. Cobb

Page 81: Proceedings of the Third European Workshop on Probabilistic

P67WRrO?U9R^`,Wo ^gU9Rr7qR]Ol)XÀP67WRrO?U>R^`,WoK3Tbd^`ai79P6OB.,R^j7>U2^`a[cOB.^jWOQc2V¢.7>S¯R]OSTOBR]7]cT7WVtZ\7>U>W79ogf]R^j7>Uda^gUPd.798,W8w^`o`^jR04RdOB7W.0Wv>,WaO\u]Pwo`,W^gU]OQcT^gUÆ5-7988,9Udcd]O?Ud7?0Wvy/W9| 3stOBR)dfe g5 86O_,9Ul)XÇP67WRrO?U9R^`,Wo 7>U U Þ

P,0 Q 0 S ,9UdcojOBR dfe gæ5 iÞ"x-86O,ncOBRrOB.rS_^gU^`arR^`ZP7WRrO?U>R^`,Wo)7>U U Þ0P 0 Q 0 S¤3¨xa¦afS_O Q"ÞQR0 Q v U Þ U !0;U v&,9Udc®Rd,/ROQ,WZ< dfe gfe 5 [ÞÇxÆv­ÞÃ/ . . . âTv^`ai^gU9WOB.R^`8ojO^gU4 ` Hª Q Q 3bdO4S_,/.k9^gUd,Wo7WV?x)dfe gW dfe gJVI7W.-,arOBR 7WV/,/.^`,W8ojOQa UsdÞ P#0m Qv ` c0 S U^`a)Z\7>SPwf]RrOQce,Wa

)ndfe g dfe g6 Zd?e g 5 s Þ 9 : )ndfe gW dfe gfe 5 s 5 s

Í

VI7W.b,Wo`oBV s HGY Z ]OB.O5ÞÜ5 s D ` v5G-ÞÎ5 s D ` v5 ÞÎ5 s D ` vw,9UdcWndfe gfe 5 s YÞÎ , D , e `_ D `_ ;, e ` / D ` / , D > f/, ` .b]O­Z\7>UdarR,9U>R ¥^`a Ã

, ` ^jVk)dfe g®^`a,9U¡lnbXPd.798w,W8^`o`^jR0¥P7WRrO?U>R^`,Wo&,9UdcÃ2^jVN)dfe gª^`aT,9U£lnbXf]R^`o`^jR°0P67WRrO?U9R^`,WoKvt]OB.O#,A ` .OQPd.OQarO?U>RaR]OZ\7 OBV¢XwZB^jO?U>R)7>U/,/.^`,W8ojO$ ` ^gU dfe gfe 5 3oBA @BAo S HIQ>oi>oi S iHIoH FJI &w/HIw gji l,/.k9^gUd,Wo`^j©Q,/R^j7>Umb^jR.OQaP6OQZ\R!Rr74,c^`aZ\.OBRrObcdO\

ZB^`a^j7>U¥/,/.^`,W8ojOq^`a7>Udoj0ªcdO X6U]OQcªVI7W._lnbXÜf]R^`o`^jR0P67WRrO?U9R^`,Wo`aQ3 stOBR 86O¥,9Ul)X*f]R^`o`^jR°0P67WRrO?UR^`,WoiVI7W. U"Þ P10GQ0MS§v)dOB.O H S¤3Îb]OS_,/.k9^gUd,Woæ7WV VI7W.,marOBR7WV!/,/.^`,W8ojOQa UAT^`a,9Uel)XfdR^`o`^jR0eP67WRrO?U9R^`,WoZ\7>S_PwfdRrOQcq,Wa

@ Z B _^5= s s YÞ S_,u z _^5= s Æè>

VI7W.,Wo`o_^T5a s s tH Y Z ]OB.O s Þ s s " 3l,/.k9^gUd,Wo`^j©Q,/R^j7>U£7WV&cdOQZB^`a^j7>U/,/.^`,W8ojOQa.OQafdojRa^gU,9U£lnbXÜP67WRrO?U9R^`,WoK3¥Ðd7W._cdOBR,W^`o`aBv-arOBOn5-7988£,9Udcd]O?Ud7?0§y/W/z]3oBA @BA FI /H I n F nt S iHIoH FJI &w/HIw gji X[o`^gS_^gUd,/R^gU]k¿,ÜZ\7>U9R^gUf]7>fda¡cdOQZB^`a^j7>U",/.<^`,W8ojO

VI.7>S ,m5ENl)Xp¦NÂ^`ab,R].OBO\arRrOQPnPd.7Z\OQaaQÛ

]RrOQPeÃ/Û!5E.OQ,/RrO,cw^`aZ\.OBRrOb,WPP.7Qud^gS_,/R^j7>URr7R]OZ\7>U>R^gUf]7>fdabcOQZB^`a^j7>Uq/,/.^`,W8ojO4,9Uce,WPPoj02R]OP.7Z\OQcf].O^gUnOQZ\R^j7>UÌ3 y3 Ì3

]RrOQPªyÛ~a^gU]kojOQ,WarRaÊfd,/.OQa.OBkW.OQaa^j7>UvZ\.OQ,/RrO,qcOQZB^`a^j7>U§.<fojOVI7W.R]O_Z\7>U>R^gUf]7>fdacdOQZB^`a^j7>U/,/.^`,W8ojO),Wa[,iVKfwUdZ\R^j7>U7WV6^jRa[Z\7>U9R^gUf]7>fda P,/.rO?U>R?Ka\3

]RrOQPÜÌÛÜ5E7>UdarRr.<fdZ\R§,àcdOBRrOB.S_^gUd^`arR^`Z£P67WRrO?U>R^`,WoVI.7>S RdOªcdOQZB^`a^j7>UÀ.<fdojOÈcdOBWOQoj79P6OQc^gURrOQPyv¤Z\7>U>WOB.RRdOZ\7>U>R^gU f]7>facdOQZB^`a^j7>U /,/.^,W8wojO-Rr7,icdOBRrOB.<S^gUd^`arR^`Z-Z<d,9UZ\O&,/.^`,W8wojOWv>,9UdcS,/.k9^gUd,Wo`^j©BO§R]O§/,/.^`,W8ojOªfa^gU]k£R]OÈPd.7]Z\O\cf].O^gUnOQZ\R^j7>UÌ3 y3 y3

b^`ab79P6OB.,/R^j7>Unb^`o`o8OcdOQaZ\.^`86OQcnV¢7W.R]OZB,WarO]OB.Oª,®cdOQZB^`a^j7>UÀ/,/.^`,W8ojO«d,Wae7>UdOZ\7>U>R^gU f]7>faP,/.O?U>RBvw8wfdR&SfdojR^j/,/.^`,/RrO.OBkW.OQaa^j7>UnZB,9Uq86O4fdarOQcRr7«O\u]RrO?UdcR]O¤79P6OB.,/R^j7>U¡V¢7W.qRdO¤ZB,WaO§]OB.O§,cdOQZB^`a^j7>U®/,/.^`,W8ojO­,WamSfdojR^`PojOnZ\7>U>R^gUf]7>fda2P,/.rO?U>RaB3«fdPP679arO2[OqE,9U9RTRr7§OQo`^gS^gUd,/RrOe,§Z\7>U>R^gUf]7>fdacdOQZB^`a^j7>U¥/,/.^`,W8ojO b^jR¥Z\7>U>R^gU fd7>fdaP,/.rO?U>RrÇVI.7>S"R]O^gUdf]O?UdZ\Oc^`,/kW.,9S¬fda^gU]kmR]O4Vfa^j7>U«,WojkW7W.<^jRSq3®stOBR 86Oe,9U¨l)Xf]R^`o`^jR0¨P67/RrO?U>R^`,WoVI7W. x G93bÐY^j.arRBv,9P679^gU9Rc^`aZ\.OBRrO,WP]Pd.7?u]^gS,/R^j7>U­^`aZ\.OQ,/RrOQc¤V¢7W.3X ,WZ¤Êfd,Wo`^jR,/R^jWOarR,/RrO7WVR]Oic^`aZ\.OBRrO,WPPd.7?u]^gS,/R^j7>UZ\7W..OQaP67>Udcab^jR«,È.OQ,WobUfS8OB.OQc/,Wogf]On^gU¨R]OeZ\7>U>R^gU f]7>faarR,/RrOaP,WZ\Om7WV&R]O/,/.^`,W8ojOWvY Þ$ " Û " 7 ` " " 7

93ªb]O2.OQ,Wo&U fwS86OB.OQc/,Wogf]OQaT,Waar7]ZB^,/RrOQce^jRqR]O4Ê]fd,Wo`^jR,/R^jWOarR,/RrOQa,/.O4cdOBRrOB.<S^gU]OQc,Wa " Þ " 7 ` j* ®_. | ? " 7

" 7 ` G!§VI7W.*¡Þ Ã/ . . . w3ÔÐd7W.a^gSPo`^`ZB^jR0WvTRdO¨Êfd,Wo`^jR,/R^jWOarR,/RrOQab^`o`o,Wo`ar728Oi.OBVIOB..OQceRr7,Wa " + . . .o "" 3l,/.k9^gUd,Wo`^j©Q,/R^j7>UÈ7WV ÕVI.7>SäR]O_f]R^`o`^jR0¤P67WRrO?U

R^`,Wo ^gU>W79ojWOQa XUdc^gU]kRdO),WogfdOb7WV " R,/R-S_,ud^S_^j©BOQa V¢7W.!OQ,WZ_P679^gU>RY^gURdO)Z\7>U9R^gUf]7>fdac7>S_,W^gU7WV43®b]OeVI79o`oj7?^gU]kY P^jOQZ\O­Z\7>S_P67>U]O?U>RlnbXP67WRrO?U9R^`,Wo-VI7W. #^`acdOB.^jWOQc¥V¢.7>S*R]Oc^`aZ\.OBR^j©BOQcZ\7>U>R^gU f]7>fa&cdOQZB^`a^j7>U2/,/.^`,W8ojOr fda^gUdk4R]OPd.7]Z\O\cwf].Oi^gUOQZ\R^j7>UÌ3 y3 ÌÛ

7# D !Þ

$%%%%& %%%%' ? D ^jV ( D ( D ^jV ( D ( 333 O] D ^jV (O D (MO

]OB.O ( = ,9Udc ( = ,/.O¡R]Ooj7@[OB.«,9Udc#fdPP6OB.867>fUdca7WViR]Ocd7>S_,W^gU£7WVRdOe,/.<^`,W8ojOZä^gULW

Continuous Decision MTE Influence Diagrams 71

Page 82: Proceedings of the Third European Workshop on Probabilistic

R§Pw^jOQZ\O_7WVER]OlnbXf]R^`o`^jR°0ÈP7WRrO?U>R^`,Wo 7 vt,Wa

cdOBRrOB.S_^gU]OQc£8 0R]OPd.7]Z\OQcwf].Oq^gU®OQZ\R^j7>UÌ3 y3 Ì3bdOQarO4867>fUdca&,/.O4cdO XU]OQcnafdZ<eRd,/R ( `_ O( ` ( = O( = iÞ12eV¢7W.,Wo`o I- L¤Þ×Ã/ . . . XY6vmI \ÞwL­,9Udc O= : ( = O( = ÞÖ D 7 ` D 7#

vYdOB.O_R]O2arR,/RrOaPw,WZ\Om7WVr ^`ak9^jWO?U£,Wa)YRàÞ D Û D 7 ` D D 7

93 stOBRT D ¥Þ " @ B . . . " @ O B ¨86O,¡WOQZRr7W.m7WV.OQ,Wo°U fS86OB.OQc«/,Wogf]OQa " 7WViR]OncOQZB^`a^j7>U/,/.^`,W8ojO Rd,/R2Z\7W..OQaP67>Udc¨b^jR + . . .o O¤^gU 7

v^K3 OW3¯^`abRdOcdOQZB^`a^j7>Un.fdojO4798dR,W^gUdOQc­fdP67>UR]OT.O?ST7?/,Wo!7WVn3_]O_cdOQZB^`a^j7>U¤.fdojO_^`a4,VfUdZR^j7>U­dOB.OT D bÞ " @ =KB ^jV ( = D 9( = VI7W.,Wo`oLÞÇÃ/ . . .oXY63]OmcdOQZB^`a^j7>U¥.<fdojO"^`aRr.,9UarV¢7W.STOQcÈRr7§,­cdO\

ZB^`a^j7>UÇ.fdojO Rd,/R§^`a¤arR,/RrOQcÎ,Wa§,VKfUZ\R^j7>U7WV43#~a^gUdk¨ojOQ,WarRnaÊ]fd,/.OQa.OBkW.OQaa^j7>Uvi-O§798dR,W^gU,o`^gU]OQ,/.OQÊ]fd,/R^j7>U7WVER]OVI7W.<S % D iÞ > P > D 3bdOq,Wogf]OQa Þ > > ~,/.OZB,Wo`ZQfdo`,/RrOQc,Wa Þ ]OB.OÂÞ " @ B " @ B . . . " @ O B ,9Udc

Þ

à / à / 333 333à /

! "

.

bdOicOQZB^`a^j7>Uq.<fojO4^`a&R]O?UqaR,/RrOQcn,Wa

T D YÞ $%%%& %%%'" 7 ` ^jV > 6 > D " 7 `> 6 > D^jV " 7 ` > 6 > D " 7 " 7#

^jV > > D# " 7

dOB.ORdO X.arRq,9Udc¡Rd^j.c.OBk9^j7>Udaq7WV$_ D ,/.O,WccOQc§Rr7O?Uaf].O R,/áWOQa7>UÈ,n,WogfdOb^jRd^gUª^jRaarR,/RrOTaPw,WZ\OW3bdOcdOQZB^`a^j7>U­.fdojO,W867?WOT^`afdarOQcRr7Z\7>UdaRr.<fdZ\R!,cdOBRrOB.<S^gUd^`arR^`ZEP67WRrO?U9R^`,Wo D " ÞxÆ3h,/.^`,W8wojO d,Wa86OBO?UªZ\7>U>WOB.RrOQc£Rr7¤,cOBRrOB.<S_^gU^`arR^`ZZd,9UdZ\O/,/.^`,W8ojO,9Udcq^`aEOQo`^gS^gUd,/RrOQcqVI.7>S¿R]OS_7cdOQo8 0efda^gU]kR]O479P6OB.,/R^j7>UcdOQaZ\.^`86OQce^gU­OQZR^j7>UeÌ3 y3 y3]OTZBo`,Waa7WV&lnbXP67WRrO?U9R^`,Wo`a^`aZBoj79arOQc¥fUdcdOB.

OQ,WZ7WVdR]O[79P6OB.,/R^j7>Uda.OQÊf^j.OQcRr7ar79ojWOEl)X¤^gUfdO?UdZ\Oc^`,/kW.,9Sa4Æ5-7988,9Udcd]O?Ud7?0Wvy/W/z]&,9UdcR]O[Z\7>U9W79ogf]R^j7>U79P6OB.,/R^j7>UfdarOQcRr7OQo`^gS_^gUd,/RrO-Z\7>UR^gUf]7>fdaecOQZB^`a^j7>UÀ/,/.^`,W8ojOQa¥Æ5-7988,9Udcd]O?Ud7?0Wvy/W9| 3

0 0.2 0.4 0.6 0.8 10

0.25

0.5

0.75

1

1.25

1.5

1.75

ÐY^jk>f].OzdÛ)b]OlnbXàP7WRrO?U>R^`,WoV¢.<,/k>STO?U>Rab^`Z<Z\7>UdaR^jRf]RrOR]Ol)XP7WRrO?U>R^`,Wo ] V¢7W. ?å#Ý993

% '& 6)(+* :b^`amarOQZ\R^j7>UP.OQarO?U>RamRdO§5ENl)Xp¦NÖa79ogf]R^j7>URr7¡R]O«ST7>U]79P679o`^`arRBÑa­cdOQZB^`a^j7>UÇPd.798wojO?Sq3Öxcc^R^j7>Ud,WoYUfSTOB.^`ZB,WocdOBR,W^`o`a7WV[l)XP7WRrO?U>R^`,Wo`aiZB,9U86OiV¢7>fwUdcq^gUÈÆ5-7988ævy/W9Í 3RADC , i_e i i I ?d H FJIbdOlnbX¡P67WRrO?U>R^`,WoV¢7W.iâ¯k9^jWO?U @Ý7<ãT^`a,y@P^jOQZ\O2ln)XÇ,WPPd.7Qud^gS_,/R^j7>U§Rr7R]OmU]7W.S_,Wo!Ï NiÐÆ5-798w8e,9UdcddO?U]7?0qy/W9Í/,&b^jR.-­Þà|/WWè,9Udc0/W4Þ#ÃBWk9^jWO?UÈÝ ÞÂ]væ,9Udc1-¥ÞÃBWWW|/ª,9Udc+/ ÞÃBWW k9^jWO?UÝ ÞÔÃ/3Îb]OP67/RrO?U>R^`,WoR^­VI7W.!?â Ý7<ãTd,WabP67WRrO?U>R^`,WotVI.,/k>STO?U>Ra^YRÝßÞà[,9Udc%^YRÝßÞÎÃ3]Ol)X,WPPd.7?u]^gS,/R^j7>UeRr72R]OfdR^`o`^jR0eVfUdZ

R^j7>U V¢7W.[?â <ã^`a[cdO?U]7WRrOQc_8 0 ,9Udc_^`a cdOBRrOB.rS^gU]OQc§fda^gU]kqR]OmPd.7]Z\OQcwf].OTcdOQaZ\.<^`8OQc§^gU®Æ5-7988,9UdcÂd]O?U]7@0Wvy/W/z]3 ln)X×,WPP.7Qud^gS_,/R^j7>UdaRr7R]O86OBR,Ï NiÐ-ÑaV¢7W.iåÜk9^jWO?U¤Ý ,/.OZ\7>UdarRr.fdZ\RrOQcfda^gU]kÂR]OÀPd.7]Z\OQcwf].O^gU*Æ5-798w8 JMȽ `H v­y/W9Í 3bdOQarOl)XªP67WRrO?U>R^`,WowVI.,/k>STO?U>Ra d^`Z<TZ\7>UarR^Rf]RrOiR]O4l)X¨P67WRrO?U>R^`,Wo ] VI7W.?å#Ý9\¡,/.O4c^`a¦Po`,?0WOQcekW.,WPd^`ZB,Wo`oj0e^gUqÐY^jk>f].Oz7?WOB.o`,?0WOQc¤7>UR]OZ\7W..OQaP67>Udc^gU]k86OBR,2Ï NiÐ-ÑaB3]OÏ-lnЮV¢7W.Ý ^`a.OQP.OQarO?U>RrOQc¡8 0®RdO¤P67WRrO?U>R^`,Wo!\ÎdOB.O;\[KÞâÝßÞYÞà_. zm,9UdcG\-¦ÃYÞâmÝßÞÎÃYÞ_. Í3xojR]7>f]k>2ã^`a[,Z\7>U9R^gUf]7>fda-cdOQZB^`a^j7>Um/,/.^`,W8ojOWv

-Om^`o`o ^gUd^jR^`,Wo`oj0£fdarO,nR].OBO\P79^gU>Rq ªÞ"Ì c^`a¦Z\.OBRrO,WPPd.7?u]^gS,/R^j7>Ue^jRGxEÞ¯ÃWÃ/. ÍWÍ9è v ÞÌW|v,9Udc k Þ |WÅ. ÌWÌWÌ¥Rr7,WPPoj0«RdOar79ogf]R^j7>UPd.7]Z\O\cwfd.OW3RAl@ 2WF gDnæ/H FIbdO5ENl)Xp¦N¿a79ogf]R^j7>U­Pd.7]Z\OBOQcai8>0eOQo`^gS^gUd,/R¦^gU]kqRdO_,/.<^`,W8ojOQa^gU¤R]OmarOQÊfdO?UdZ\OâTvݯvYãTvå3

72 B. R. Cobb

Page 83: Proceedings of the Third European Workshop on Probabilistic

0.2 0.4 0.6 0.8 1

-100000

-50000

50000

100000

150000

200000

ÐY^jk>f].O|Ûb]Oif]R^`o`^jR0P67WRrO?U>R^`,Wo6V¢.,/k>S_O?U9Ra[d^`ZZ\7>UdarR^jRfdRrO4R]O4f]R^`o`^jR°0eP67WRrO?U9R^`,Wo dk 3

7.O?ST7@WO)âTv9-O)ZB,Wo`ZQfdo`,/RrO ÞÜ ^t e 3b]O/,Wogf]OQabV¢7W. ,/.OZB,Wo`ZQfdo`,/RrOQc¤,Wa

Ý Þ!Þ y z @R>R ^RÝ Þ "

Ý ÞÂÃ!Þ y z @R>R ^RÝ ÞÂÃ " .

7 .O?ST7?WO ݯv [O ZB,Wo`ZQfdo`,/RrO k Þ \ ] me 3 b]OçfdR^`o`^jR0 P67WRrO?U>R^`,WoVI.,/k>STO?U>Ra k <ã Þ x\v dk <ã Þ v,9Udcdk <ã Þ k ,/.Oa]7@UÂkW.,WPd^`ZB,Wo`oj0Î^gUÜÐY^jk/f].O |3 1bO?ST7@^gU]k ã ^gU9W79ojWOQa X.aR X6Udc^gU]kl,uJ k <ã Þ L k <ã Þ L k <ã Þ k -,/ROQ,WZ<ÈP79^gU>R^gU­R]O_cd7>S,W^gUn7WV-å34prURd^`aZB,WarOWv-O XUdc dk <ãÒÞ k <ã Þ k ,/R«å Þ _. y/9ÄW|3 O\u]RBvqR]O¡cOQZB^`a^j7>U.<fdojO^`acdOBWOQoj79P6OQcn8 0qfda^gU]kTRd^`a),Wogf]OW3 A @BADC oi d/H I "È S iHIoH FJI , nhgji p F

FI /H I n F nt S iHIoH FJI &w/HIw gji~a^gU]kR]O_Pd.7Z\OQcwfd.O^gUÈ]OQZ\R^j7>UÈÌ3 y3 Ìvt-O_cdO\

RrOB.<S^gU]OR]O79PdR^gS,WoarRr.,/RrOBkW0q7WV!Pd.7]cwfdZB^gUdk_ãÞ ÞàÌW|^jV ®_. y/9ÄW|v,9UdcãÇÞ k Þà|WÅ. ÌWÌWÌ^jV)_. y/9ÄW| "Ã/3n~a^gU]k­ojOQ,WarRaÊf,/.OQa.OBkW.OQa¦a^j7>UàKarOBO]OQZ\R^j7>U¨Ì3 y3 z]vER]OqV¢79o`oj7@b^gU]kÈcdOQZB^`a^j7>U.<fdojO^`a)798dR,W^gU]OQcÛ

YÞ z>Í. ÍWÍ9è! P Ì/_.gÃWÃQy ^jV «_. ÅW|z èè ^jV_. ÅW|z è ÀÃ .

7iZ\7>U>WOB.R[ã¡Rr7,cdOBRrOB.S_^gUd^`arR^`Z-Z<,9UdZ\O&,/.<^`,W8ojOWv-OmfdarO bRr7­cdO XU]OTRdO_cdOBRrOB.<S_^gU^`arR^`Z_P67WRrO?UR^`,Wo VI7W. ã#å4,Wa&^gUX!ud,9S_PwojOiy7WVY]OQZ\R^j7>Uey3 zIR]O§cd7>S,W^gU®7WVå P.OQZBogfdcdOQaR]O§P679aa^`8w^`o`^jR0®7WVPd.7]cwfdZB^gUdk©BOB.7mfUd^jRa\3

,W8ojO¥Ã/ÛNOQZB^`a^j7>UarRr.,/RrOBkW0®VI7W.,9U5&Nil)X p Nar79ogf]R^j7>Ueb^jROQ^jk>>RarR,/RrOQabV¢7W.iã3

OQarRb1)OQa<fdojRa4Kå fd,9U>R^jR0eÏ .7]cwfdZ\OQcÈÆã ¨_.gÃQÍW|W| 4ÞÌWÄ3 Ì9è/|

_.gÃQÍW|W| ®_. yWÅWÄW| 4Þz>Å3gÃQyW|_. yWÅWÄW| ®_. ÌWÄ9è/| 4Þ|WÍ3 Å9è/|_. ÌWÄ9è/| ÀÃ 4ÞÍW|3 ÍWyW|

x 5ENl)Xp¦N ar79ogf]R^j7>U"b^jR OQ^jk>>Rc^`aZ\.OBRrOarR,/RrOQaV¢7W.bR]OcdOQZB^`a^j7>Ue,/.^`,W8wojOãÎ0]^jOQo`ca&RdO4cdO\ZB^`a^j7>UÇ.<fojO^gUÇ,W8wojOÃKar7>S_O¥arR,/RrOQa¤7WVqãÉ,/.OU]7WR79PdR^gS_,Wo V¢7W.,9U90§/,Wogf]OQa7WV)å3eNO XU]O.äÞ ÌWÄ. Ì9è/|rz>Å.gÃQyW||9Í. Å>è/|=ÍW|_. Í9yW|x ,9Udc

Þ

à / à / à / k à k /

! "Þ

à _.9ÅWy9è/|à _. yWy9è/|à _. Ìz>ÌW|à _. ÍWÄWÅ9è/|

! "

.

b]O.OQa<fdojR^gU]kqcdOQZB^`a^j7>U­.<fojO^`a8w,WarOQc¤7>UR]O_o`^gUOQ,/.TOQÊ]fd,/R^j7>U % Þ zÃ/. z99Ì ÌWÅ. |/dÃe,9Udc¥R]OcdOBRrOB.<S^gUd^`arR^`Z_P67WRrO?U>R^`,Wo >4Þ x[VI7W.)ãåd,WaTVI.,/k>S_O?U9Ra 9_Þ*x)7W. ®zÃ/. z99Ì ÌWÅ. |/dÃÞx!^jV ¡_.è/Í/9Å,9Uc @ >-Þx7W. È èÞx^jVY_.è/Í/9Å ÀÃ/3 A @BAl@ FI i? H I "È $ i S iHIoH FJI &w/HIw gji

F S i> i] cfH I HIB HI %$ I i&w/HIwhgji

7OQo`^gS_^gUd,/RrOiR]OicdOQZB^`a^j7>U2/,/.^`,W8ojOãTv -OZ\7>Ua^`cdOB.iR]O.OB]^`arOQc§^gU]6f]O?UdZ\Oc^`,/kW.,9S×^gU­ÐY^jk>f].OyRd,/R.OQPo`,WZ\OQaYR]OEcdOQZB^`a^j7>UU]7]cdO[b^jR,cdOBRrOB.S_^gU^`arR^`Z-Pd.798,W8w^`o`^jR04U]7]cdOW3]O P67WRrO?U9R^`,Wo`a.O?S,W^gUd^gU]k,/VIRrOB.mã¬^`aTZ\7>U9WOB.RrOQc¨Rr7¤,¤cdOBRrOB.S_^gUd^`arR^`Z2Z<,9UdZ\O/,/.^`,W8ojO,/.O dk V¢7W. ãå,9Uc VI7W.ã#å937S_,/.k9^gUd,Wo`^j©BOq,Z\7>U>R^gU fd7>fdaTZ<d,9UZ\O2/,/.^`,W8ojO2VI.7>SR]OZ\7>S8^gUd,/R^j7>U7WV,_f]R^`o`^jR0qP67WRrO?U>R^`,Wo,9Udcq,TcdO\RrOB.<S^gUd^`arR^`ZP67WRrO?U9R^`,WoKv[[Oeaf8arR^jRf]RrO,9U¥O\u]P.OQa¦a^j7>U¥798dR,W^gUdOQc£V¢.7>S RdOcOBRrOB.<S_^gUd^`aR^`ZmP67WRrO?U>R^`,WoVI7W.R]O2/,/.^`,W8ojORr7­86Om.O?ST7@WOQc¥^gU>Rr7¤R]Oqf]R^`o`^jR0P67WRrO?U9R^`,WoKv-,WaTcdOBR,W^`ojOQc«^gU«OQZ\R^j7>U«Ì3 y3 y3ÈprURd^`aZB,WarOWv6,_U]OBl)Xf]R^`o`^jR°0eP67WRrO?U9R^`,Wo^`aBÛ

Þ$%& %'dk Iz>Í. ÍWÍ9è! R Ì/_.gÃWÃQy

^jVY ¨_. ÅW|z èdk Æè_ ^jVY_. ÅW|z è ÀÃ

.

Continuous Decision MTE Influence Diagrams 73

Page 84: Proceedings of the Third European Workshop on Probabilistic

prU9RrOBkW.<,/R^gU]k [email protected]]O2c7>S_,W^gU§7WVå"k9^jWOQaR]OXU,Wo]O\u]P6OQZ\RrOQcf]R^`o`^jR0T7WVtÅWÌ9è/yWÄ3-5E.OQ,/R^gU]k4RdObcdO\ZB^`a^j7>Uq.<fdojOaP6OQZB^2XOQcq8 0R]OVfUdZ\R^j7>U T !0]^jOQo`ca,9Uª^gUdZ\.OQ,WarO2^gU§O\udPOQZ\RrOQc/,Wogf]O7WV)ÄWy]ÃB­7?WOB.R]OcdOQZB^`a^j7>U®.fdojO­Z\.OQ,/RrOQcÀfda^gU]k£,9Uln)X ^gUdf]O?UdZ\Oc^`,/kW.<,9S*b^jRª,nc^`aZ\.OBRrOmcdOQZB^`a^j7>UÈ/,/.^`,W8ojOW3e~a¦^gU]kiR]O&a,9S_O-P.7Z\OQcf].OWv/R]O)S_,ud^gSfSÀO\udP6OQZ\RrOQcf]R^`o`^jR°0®VI7W.R]OÈST7>U]79P679o`^`arRfda^gU]k£,£5&Nil)X p Nb^jRqOQ^jk>9Ric^`aZ\.OBRrOarR,/RrOQaV¢7W.ã¯^`aÅWÄWÍWÅWÍv^`Z<^`aiÌ/9|WÌm^jk>]OB.R,9Un,9U­lnbX¡^gUdf]O?UdZ\O4cw^`,/kW.,9Sb^jRTOQ^jk>9REarR,/RrOQa[V¢7W.-ãT3 5-7988ny/W9Í !k9^jWOQa-,cdO\R,W^`ojOQcZ\7>S_Pw,/.^`ar7>UT7WVRdOi5ENl)Xp¦Nàar79ogf]R^j7>UmRr7a^gS^`o`,/.blnbX,9Udcqcw^`aZ\.OBRrO4^gU]6f]O?UdZ\Oc^`,/kW.,9SaB3 m¶t³[¹ ¸>/º¶t³>5-7>U>R^gUf]7>fdacdOQZB^`a^j7>Ul)Xª^gU]6f]O?UdZ\O)c^`,/kW.,9Sa ^`Z<cdOBRrOB.<S_^gUdO-,cdOQZB^`a^j7>U.<fojO[VI7W.,iZ\7>U>R^gU fd7>fdacdOQZB^`a^j7>U2,/.^`,W8wojO,Wa),VKfUZ\R^j7>U27WV^jRa)Z\7>U>R^gU fd7>fdaP,/.O?U9Ra a]7?ÉP67WRrO?U9R^`,WoRr7^gS_P.7?WO®cOQZB^`a^j7>US,/á^gU]k,/R&,oj7?-OB.bZ\7>S_PwfdR,/R^j7>Ud,WoZ\79arRERd,9Uc^`a¦Z\.OBRrOq^gU]6f]O?UdZ\Omcw^`,/kW.,9S_a7W._lnbXÎ^gU]f]O?UdZ\Oc^,/kW.,9SaB3"Ï.OB^j7>fda­STOBR]7]caVI7W.d,9Udcwo`^gU]k¨Z\7>UR^gUf]7>fdaEcdOQZB^`a^j7>Um/,/.^`,W8ojOQa&^gU2^gU]fdO?UdZ\Oc^`,/kW.<,9S_a.OQÊ]fd^j.OnOQ^jR]OB.Ud7W.<S_,Wo)cw^`arRr.^`8wf]R^j7>Ua_VI7W.mZ\7>U>R^gUf]7>faZ<d,9UdZ\O,/.^`,W8wojOQa4,[email protected]^`aZ\.OBRrO,WPwPd.7Qud^S,/R^j7>Uda4Rr7§Z\7>U>R^gU fd7>fdacdOQZB^`a^j7>U,/.<^`,W8ojOQaB3§b]OZ\7>U>R^gU fd7>fdabcdOQZB^`a^j7>Unl)X^gUdf]O?UdZ\Oc^`,/kW.<,9S",Wooj7@ba ,4S_7W.O&Pd.OQZB^`arOcdOQZB^`a^j7>UT.fdojO),9UdcTR]O,W8^`o`^jR0Rr7_O\udPoj79^jR)RdOZ\7>U9R^gUf]7>fdaUd,/Rf].O7WVRrOQarRb.OQafdojRaB32 IRF gjis"Jc;i I ?bdOn,9f]R]7W.^`akW.<,/RrOBVKfdobV¢7W.2R]O­^gUda^jk>>RrVKfdoZ\7>SS_O?U9Ra&7WV!Rd.OBO4,9U]7>U>0]S_7>fda).OB^jOB-OB.ad^`Z<na^jk/Ud^2XZB,9U9Roj0q^gS_P.7?WOQcR]OP,WP6OB.Q3

: :µ:³ ¹_:d> "!$#&% '(% )+*,!.-/'102'(%4365 7(!809'(%

'(%,:&5&!$%&-$!;*,'1<=)36<>?08 @+AB36C'D% )FEG<=H!$3.JIK'(3 )'D% '1<L 'D7('NM<=36OFPQ%R0SM6'NM65,M !UTWVX<='(7(< 7(!ZY[3\*,X]"% 7(<*^<XM._```"acbXdfegaShjikRl.dmhji&e=n,lh&o.pk l oqXpp&lr,s=tuRvxwyaz,ij|

<%&*UEg Egm~&!$% XOJy KO 3 '1*U'(%,:&5 !8%&-/!*,'(<)3<=>05&0 'D%&)>?'NM65 3 !.0WYM63 5&%&-$<=M !8*!$,HR%&!$%jM '1<=710$ @'(% L &'(-C!$36'D% )<%&* jK<7DH!$36%U[!.* 0$$/=R/ y/=R $N¡j/m R ¢j£.¤¥¦

§ <%&*¨Eg Eg~ !8% XO©=£j^ KO 3 '1*<xO!80 '(<%% !$MS]3 C,0]"'DM 7('D% !.<=3K* !/M !83 >?'(% '10SM6'(-VX<3 'D< 7(!808 @'(%ªyy<-$-5&0«<%&*­¬,<<=CC7(<?!8* 08® f&¯$/=R?/ $X/Rc/NN¡$ R,°°8¦X¤°8&

<%&*±Eg Eg~ !$%&XO²=<j PQ%,Y[!$36!$%&-$!³'(%O 36'(*<xO!80 '(<%%&!/MS]y36C0«5&02'(% )>?'DM 5 36!80=YM63 5 % -8<XM6!8*!/,HR% !$%jM6'(<7(08 @ /Rc/ m=´XmX&µ ´=¶, X´S·y¸R¯¸ ´$¹º=c»Kx¼$´XRj¡ &°j¦j &£j½8¤,¢&

(,~, !$% XOEg Egj<%&*"5 >¾¿ f TWH H&3 x,'(>?<=M '(% )H 36&< '(7D'DMSOB* !$%&0 'NMSOBY[5 %R- M '(%R0³'(%²Oj&3 '1*9<xO!80 '1<=%À% !/MS]y36C,0]"'NM6Z>?'NM 5&3 !.0U=Y?M63 5 % -8<XM6!8*;!/,HR% !$%jM6'(<7(08 @FÁ 1¼/ ¼XÂÄÃG´=ºK¸¶ ¡ °.&¦j ¥¦x¤,¦¢

KX]y<36* T³j<%&*& ÅK L <=M !.02%ÆS°.¥¢f2PQ%,:R5 !$%&-$!* '(<)3<=>0$ @²'(%Ç TWX]<=3*È<=%&*É& ÅW L <XM &!80 %!8* 08 »K6ÂX¡X¼±´=ÀÊËy m/ ¸D ¼JXmÂB¸¸RN6=¯´X&¼´2·³Ì³$1¼ ´XÍmXx¼ 1¼ ½°.¥x¤&½X ~jM 3<XM6!$)'1-Î !8-$'(0 '(%&0WÏ365 HÐ L !$% 7(EG<=36Cm yT³

L <* 0 !$%T³ ÑÒ<%&*ªyj!$%&0 !$%U=j£ 6~7DV'(% )7('D% !.<=3 Ó 5&<*,3<XM '1-Ô-/%&*,'DM '(%&<7ÇÏ<=5&0602'1<=%Õ'(%,:&5 !8%&-/!Ö* '(<=)36<>0$ @ $ ´==µ,´=¶, =±´S·Ç¸¸R6´$¹º?»Kx¼$´XRj¡ ¦¢&¦j &=¦X¤,¢,

L <* 0 !$%,T Ñ«R<=%&*תy IR=!8%&0 !$%±2°8¥¥¥j g ÑÐ<Ø$O!8Vx<7D5 <=M '(%UYy0 O>>?!$M 36'(-<xO!80 '1<=%U*,!.-/'102'(%±H 3 7(!$>0$ @'(%ÆÙ? Ñf<0 C!$O×<=%&*ÆE«3<*,!Ú[!8*&0$ fm//cX/ y/= $N¡j/m °x£, ¦¢.¤,¦¥

Û7(>0SM6!8*~m L RS°.¥¢¦jfÛ%?36!$H 36!80 !$%jM '(% )<%&*027DV'(% )* !8-/'10 'D%H 36 7(!$>08 @E«Ð Î &¬ !.02'10$ Î !$HR<=3 M >?!$%jM"=YÅÒ% )'(% !8!$36'D% )=¤Å-/%&>?'(-g~O,02M !$>08~jM6<%,Y[3*ÜK% 'DV!$3 0 'DMSO&~jM<=%,Y[36*RyT³

Eg7(<%&*,A9 <=%&* Î ~&<-jM !$3S°8¥¥¦« L 'NM65 3 !.0YÏ<=5R0 0 '(<%&0y<=%&*>'(% '(>³5 >Ý3 !87(<=M '(V!K!8%jM 36HOM !8- %&' Ó 5 !.0yY[3W>?,*,!$7('(% )­-/%M6'D%5 5&05 %&-$!$3 M6<'D%jM '(!808 @³'(%Î mW!8-C!$36><=%U<=%R*UÅW L <=>* <% '[!8*&0$ fm/ ¯cX?/ y/=R $N¡j/m6 R¥&°8¢¦x¤°.¥&

"5&>¾¿ ,&<%&*×TR~,<7D>?!83R¾%±j£g6Eg!$% % '(7(!8060H&3 H,<)<=M '(%]"'DM >'DM 5 36!80Y M 365 %&-8<XM !.*!$H%&!$%jM '1<=710$ @'(%ÆÑÒmÏ,*,B[!8*® gÁ Xº­Þ ´=­=ÂB߶&X[c=àx¸R¯¸ ´./Ê/¼c´W»Wx¼$´=&j¡¶,mÂ$³$/=R xÑÐ!8-/M 5 36!á =M6!80'(%TK32M6'N#-/'1<=7PQ%jM !87D7('D)!$%&-$! ¦j£½°,¦¥x¤ £=&

~R<-jM !838K Î 2°8¥¢j Å«VX<7D5&<=M '(% )'(%,:&5 !8%&-/!â* '(<=)36<>0$ @Uã ¸ / ´= ¼»W/¼$X /Ê &¦Rj ¢½°/¤¢¢j,

~R<-jM !838m Î Ð<%&*B Ù!8% 7D!8O²S°.¥¢¥6Ï<5&0602'1<=%'(%,:&5&!$%&-$!*,'1<=)36<>0$ @?ä Xm$¡j/º/ Á /$ ,¦£ £ £½.¤,££

~&!$% XOmEg EgGS°.¥¥¦ T;% !8]§>?!/M6 ,*UY[336!$H&3 !.02!8%M '(% )B<=%&*å027DV'(% )B<xO!80 '(<%²*,!8-$'(0 'D%²H 36 7(!$>08 @B'(%Î ® gK<%&*É[!.* / $XG $N¡j/m6Úæ ´=R/¼ Á c=1¼ /¼ _ W³X Á c=1¼//¼" 2 &<=H&>?<%×<%&*<=7(7 ÑÐ%&* %°°8¥x¤m°8¦¢

74 B. R. Cobb

Page 85: Proceedings of the Third European Workshop on Probabilistic

Discriminative Scoring of Bayesian Network Classifiers: a Comparative Study

Ad Feelders and Jevgenijs Ivanovs
Department of Information and Computing Science
Universiteit Utrecht, The Netherlands

Abstract

We consider the problem of scoring Bayesian Network Classifiers (BNCs) on the basis of the conditional loglikelihood (CLL). Currently, optimization is usually performed in BN parameter space, but for perfect graphs (such as Naive Bayes, TANs and FANs) a mapping to an equivalent Logistic Regression (LR) model is possible, and optimization can be performed in LR parameter space. We perform an empirical comparison of the efficiency of scoring in BN parameter space, and in LR parameter space using two different mappings. For each parameterization, we study two popular optimization methods: conjugate gradient, and BFGS. Efficiency of scoring is compared on simulated data and data sets from the UCI Machine Learning repository.

1 Introduction

Discriminative learning of Bayesian Network Classifiers (BNCs) has received considerable attention recently (Greiner et al., 2005; Pernkopf and Bilmes, 2005; Roos et al., 2005; Santafe et al., 2005). In discriminative learning, one chooses the parameter values that maximize the conditional likelihood of the class label given the attributes, rather than the joint likelihood of the class label and the attributes. It is well known that conditional loglikelihood (CLL) optimization, although arguably more appropriate in a classification setting, is computationally more expensive because there is no closed-form solution for the ML estimates and therefore numerical optimization techniques have to be applied. Since in structure learning of BNCs many models have to be scored, the efficiency of scoring a single model is of considerable interest.

For BNCs with perfect independence graphs (such as Naive Bayes, TANs, FANs) a mapping to an equivalent Logistic Regression (LR) model is possible, and optimization can be performed in LR parameter space. We consider two such mappings: one proposed in (Roos et al., 2005), and a different mapping that, although relatively straightforward, has to our knowledge not been proposed before in discriminative learning of BNCs. We conjecture that scoring models in LR space using our proposed mapping is more efficient than scoring in BN space, because the logistic regression model has fewer parameters than its BNC counterpart and because the LR model is known to have a strictly concave loglikelihood function. To test this hypothesis we perform experiments to compare the efficiency of model fitting with both LR parameterizations, and the more commonly used BN parameterization.

This paper is structured as follows. In section 2 we introduce the required notation and basic concepts. Next, in section 3 we describe two mappings from BNCs with perfect graphs to equivalent LR models. In section 4 we give a short description of the optimization methods used in the experiments, and motivate their choice. Subsequently, we compare the efficiency of discriminative learning in LR parameter space and BN parameter space, using the optimization methods discussed. Finally, we present the conclusions in section 7.

2 Preliminaries

2.1 Bayesian Networks

We use uppercase letters for random variables and lowercase for their values. Vectors are written in boldface. A Bayesian network (BN) (X, G = (V, E), θ) consists of a discrete random vector X = (X0, . . . , Xn), a directed acyclic graph (DAG) G representing the directed independence graph of X,


and a set of conditional probabilities (parameters) θ. V = {0, 1, . . . , n} is the set of nodes of G, and E the set of directed edges. Node i in G corresponds to random variable Xi. With pa(i) (ch(i)) we denote the set of parents (children) of node i in G. We write XS, S ⊆ {0, . . . , n}, to denote the projection of random vector X on components with index in S. The parameter set θ consists of the conditional probabilities

θ_{xi|xpa(i)} = P(Xi = xi | Xpa(i) = xpa(i)), 0 ≤ i ≤ n

We use 𝒳i = {0, . . . , di − 1} to denote the set of possible values of Xi, 0 ≤ i ≤ n. The set of possible values of random vector XS is denoted 𝒳S = ×_{i∈S} 𝒳i. We also use 𝒳i− = 𝒳i \ {0}, and likewise 𝒳S− = ×_{i∈S} 𝒳i−.

In a BN classifier there is one distinguished variable called the class variable; the remaining variables are called attributes. We use X0 to denote the class variable; X1, . . . , Xn are the attributes. To denote the attributes, we also write XA, where A = {1, . . . , n}. We define π(i) = pa(i) \ {0}, the non-class parents of node i, and φ(i) = {i} ∪ π(i).

Finally, we recall the definition of a perfect graph: a directed graph in which all nodes that have a common child are connected is called perfect.
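This definition translates directly into a check on the parent sets. A minimal sketch, using a dictionary-of-parent-sets DAG encoding of our own choosing (not from the paper):

```python
def is_perfect(parents):
    """Return True if the DAG is perfect: any two nodes sharing a
    common child must be connected by an edge (in either direction)."""
    for child, pa in parents.items():
        pa = sorted(pa)
        for a in range(len(pa)):
            for b in range(a + 1, len(pa)):
                u, v = pa[a], pa[b]
                # an edge u -> v puts u among v's parents, and vice versa
                if u not in parents.get(v, set()) and v not in parents.get(u, set()):
                    return False
    return True

# Naive Bayes: the class node 0 is the only parent of each attribute.
naive_bayes = {1: {0}, 2: {0}, 3: {0}}

# A v-structure 1 -> 3 <- 2 with 1 and 2 unconnected is not perfect.
v_structure = {3: {1, 2}}

print(is_perfect(naive_bayes), is_perfect(v_structure))  # -> True False
```

Marrying the unconnected parents (adding an edge between 1 and 2) would make the second graph perfect again, which is exactly the canonicalization step used in section 3.1.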

2.2 Logistic Regression

The basic assumption of logistic regression (Anderson, 1982) for binary class variable X0 ∈ {0, 1} is

ln [P(X0 = 1 | Z) / P(X0 = 0 | Z)] = w0 + Σ_{i=1}^{k} wi Zi,   (1)

where the predictors Zi (i = 1, . . . , k) can be single attributes from XA, but also functions of one or more attributes from XA. In words: the log posterior odds are linear in the parameters, not necessarily in the basic attributes.

Generalization to a non-binary class variable X0 ∈ 𝒳0 gives

ln [P(X0 = x0 | Z) / P(X0 = 0 | Z)] = w0^(x0) + Σ_{i=1}^{k} wi^(x0) Zi,   (2)

for all x0 ∈ 𝒳0−. This model is often referred to as the multinomial logit model or polychotomous logistic regression model.

It is well known that the loglikelihood function of the logistic regression model is concave and has a unique maximum (provided the data matrix Z is of full column rank) attained for finite w, except in two special circumstances described in (Anderson, 1982).
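Under equation (2) with reference class 0, the class posterior is a softmax of the per-class linear scores. A minimal numpy sketch (the weight and predictor values are illustrative, not from the paper):

```python
import numpy as np

def posterior(w, z):
    """Class posterior under the multinomial logit model of equation (2).
    Row x0 of w holds the coefficient vector w^(x0); row 0 is fixed at
    zero, making class 0 the reference category."""
    scores = w @ z                 # linear scores w^(x0)^T z, one per class
    scores -= scores.max()         # subtract the max for numerical stability
    p = np.exp(scores)
    return p / p.sum()

# Three classes, intercept z0 = 1 plus two predictors.
w = np.array([[0.0, 0.0, 0.0],    # w^(0) = 0
              [0.5, 1.0, -1.0],
              [-0.2, 0.3, 0.8]])
z = np.array([1.0, 0.4, 1.2])
p = posterior(w, z)
```

By construction ln(p[x0] / p[0]) recovers the linear form w^(x0)^T z of equation (2).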

2.3 Log-linear models

Let G = (V, E) be the (undirected) independence graph of random vector X, that is, E is the set of edges (i, j) such that whenever (i, j) is not in E, the variables Xi and Xj are independent conditionally on the rest. The log-linear expansion of a graphical log-linear model is

ln P(x) = Σ_{C⊆V} uC(xC)

where the sum is taken over all complete subgraphs C of G, and all xC ∈ 𝒳C−; that is, uC(xC) = 0 whenever xi = 0 for some i ∈ C (to avoid overparameterization). The u-term u∅(x) is just a constant. It is well known that for BNs with perfect directed independence graph, an equivalent graphical log-linear model is obtained by simply dropping the direction of the edges.

3 Mapping to Logistic Regression

In this section we discuss two different mappings from BNCs with a perfect independence graph to equivalent logistic regression models. Equivalent here means that, assuming P(X) > 0, the BNC and corresponding LR model represent the same set of conditional distributions of the class variable.

3.1 Mapping of Roos et al.

Roos et al. (Roos et al., 2005) define a mapping from BNCs whose canonical form is a perfect graph, to equivalent LR models. The canonical form is obtained by (1) taking the Markov blanket of X0, and (2) marrying any unmarried parents of X0. This operation clearly does not change the conditional distribution of X0. They show that if this canonical form is a perfect graph, then the BNC can be mapped to an equivalent LR model. Their mapping creates an LR model with predictors (and corresponding parameters) as follows


1. Z_{xpa(0)} = I(Xpa(0) = xpa(0)), with parameter w^(x0)_{xpa(0)}, for xpa(0) ∈ 𝒳pa(0).

2. Z_{xφ(i)} = I(Xφ(i) = xφ(i)), with parameter w^(x0)_{xφ(i)}, for i ∈ ch(0) and xφ(i) ∈ 𝒳φ(i).

For a given BNC with parameter value θ an equivalent LR model is obtained by putting

w^(x0)_{xpa(0)} = ln θ_{x0|xpa(0)},   w^(x0)_{xφ(i)} = ln θ_{xi|xpa(i)}
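For the special case of Naive Bayes (pa(0) = ∅ and φ(i) = {i} for every attribute), the mapping simply takes logs of the BN parameters. A small numerical sanity check, with illustrative probability tables that are not from the paper:

```python
import numpy as np

prior = np.array([0.6, 0.4])            # theta_{x0}
cpt1 = np.array([[0.7, 0.3],            # theta_{x1|x0}, rows indexed by x0
                 [0.2, 0.8]])
cpt2 = np.array([[0.9, 0.1],            # theta_{x2|x0}
                 [0.5, 0.5]])

def bnc_posterior(x1, x2):
    """P(X0 | x1, x2) computed from the Naive Bayes joint distribution."""
    joint = prior * cpt1[:, x1] * cpt2[:, x2]
    return joint / joint.sum()

def lr_posterior(x1, x2):
    """Same posterior from the LR weights w = ln(theta) of the mapping:
    the score of class x0 sums the log-parameters selected by the
    indicator predictors."""
    scores = np.log(prior) + np.log(cpt1[:, x1]) + np.log(cpt2[:, x2])
    p = np.exp(scores - scores.max())
    return p / p.sum()
```

The two functions agree on every attribute configuration, which is the equivalence the mapping guarantees.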

3.2 Proposed mapping

Like in the previous section, we start from the canonical graph, which is assumed to be perfect. Hence, we obtain an equivalent graphical log-linear model by simply dropping the direction of the edges. We then have

ln [P(X0 = x0 | XA) / P(X0 = 0 | XA)] = ln [(P(X0 = x0, XA)/P(XA)) / (P(X0 = 0, XA)/P(XA))] = ln P(X0 = x0, XA) − ln P(X0 = 0, XA),

for x0 ∈ 𝒳0−. Filling in the log-linear expansion for ln P(X0 = x0, XA) and ln P(X0 = 0, XA), we see immediately that u-terms that do not contain X0 cancel, and furthermore that u-terms with X0 = 0 are constrained to be zero by our identification restrictions. Hence we get

ln P(X0 = x0, xA) − ln P(X0 = 0, xA) = u0(X0 = x0) + Σ_C uC(X0 = x0, xC) = w^(x0)_∅ + Σ_C w^(x0)_{xC}

where C is any complete subgraph of G not containing X0. Hence, to map to an LR model, we create variables

I(XC = xC), xC ∈ 𝒳C−

to obtain the LR specification

ln [P(X0 = x0 | Z) / P(X0 = 0 | Z)] = w^(x0)_∅ + Σ_C w^(x0)_{xC} I(XC = xC)

This LR specification models the same set of conditional distributions of X0 as the corresponding undirected graphical model, see for example (Sutton and McCallum, 2006).
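The indicator variables of this specification can be generated mechanically. A sketch (the dictionary encoding of configurations and arities is an assumption for illustration):

```python
from itertools import product

def indicator_features(x, cliques, arity):
    """Predictors I(X_C = x_C) for the proposed mapping: one indicator
    per complete subgraph C (class node excluded) and per configuration
    x_C with no component equal to 0 (the identification restriction)."""
    feats = []
    for C in cliques:
        for xc in product(*(range(1, arity[i]) for i in C)):
            feats.append(int(tuple(x[i] for i in C) == xc))
    return feats

# Two binary attributes joined by an edge: complete subgraphs {1}, {2}, {1,2}.
arity = {1: 2, 2: 2}
cliques = [(1,), (2,), (1, 2)]
print(indicator_features({1: 1, 2: 0}, cliques, arity))  # -> [1, 0, 0]
```

Because zero states are excluded, each clique of binary variables contributes exactly one indicator, which is where the parameter savings over the mapping of Roos et al. come from.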

Figure 1: Example BNC (left); undirected graph with same conditional distribution of class (right).

3.3 Example

Consider the BNC depicted in figure 1. Assuming all variables are binary, and using this fact to simplify notation, this maps to the equivalent LR model (with 9 parameters)

ln [P(X0 = 1 | Z) / P(X0 = 0 | Z)] = w∅ + w1 X1 + w2 X2 + w3 X3 + w4 X4 + w5 X5 + w_{1,2} X1 X2 + w_{1,3} X1 X3 + w_{3,4} X3 X4

In the parameterization of (Roos et al., 2005), we map to the equivalent LR_Roos model (with 14 parameters)

ln [P(X0 = 1 | Z) / P(X0 = 0 | Z)] = w_{1,2=(0,0)} Z_{1,2=(0,0)} + w_{1,2=(0,1)} Z_{1,2=(0,1)} + w_{1,2=(1,0)} Z_{1,2=(1,0)} + w_{1,2=(1,1)} Z_{1,2=(1,1)} + w_{1,3=(0,0)} Z_{1,3=(0,0)} + . . . + w_{3,4=(1,1)} Z_{3,4=(1,1)} + w_{5=(0)} Z_{5=(0)} + w_{5=(1)} Z_{5=(1)}

In both cases we set w^(0) = 0.
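The two parameter counts (per non-reference class value) can be reproduced by enumeration. The clique and block lists below are transcribed from the two models above, with all five attributes binary:

```python
from math import prod

arity = {i: 2 for i in range(1, 6)}          # attributes X1..X5, all binary

# Proposed mapping: intercept plus one weight per complete-subgraph
# configuration with no zero state.
cliques = [(1,), (2,), (3,), (4,), (5,), (1, 2), (1, 3), (3, 4)]
n_proposed = 1 + sum(prod(arity[i] - 1 for i in C) for C in cliques)

# Mapping of Roos et al.: one indicator per full joint configuration
# of each block X_{1,2}, X_{1,3}, X_{3,4}, X_5.
blocks = [(1, 2), (1, 3), (3, 4), (5,)]
n_roos = sum(prod(arity[i] for i in B) for B in blocks)

print(n_proposed, n_roos)  # -> 9 14
```

The gap widens with higher-arity attributes, since each block contributes a product of arities while each clique contributes a product of (arity − 1) terms.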

4 Optimization methods

In the experiments, we use two optimization methods: conjugate gradient (CG) and variable metric (BFGS) algorithms (Nash, 1990). Conjugate Gradient is commonly used for discriminative learning of BN parameters, see for example (Greiner et al., 2005; Pernkopf and Bilmes, 2005). In a study of Minka (Minka, 2001), it was shown that CG and BFGS are efficient optimization methods for the logistic regression task.

BFGS is a Hessian-based algorithm, which updates an approximate inverse Hessian matrix of size


r^2 at each step, where r is the number of parameters. CG on the other hand works with a vector of size r. This obviously makes each iteration less costly; however, in general CG exhibits slower convergence in terms of iterations.

Both algorithms at step k compute an update direction u^(k), followed by a line search. This is a one-dimensional search, which looks for a step size α maximizing f(α) = CLL(w^(k) + α u^(k)), where w^(k) is a vector of parameter values at step k. It is argued in (Nash, 1990) that neither algorithm benefits from too large a step being taken. Even more, for BFGS it is not desirable that the increase in the function value is different in magnitude from the one determined by the gradient value, (w^(k+1) − w^(k))^T g^(k). Therefore, simple acceptable point search is suggested for both methods. In case of CG an additional step is made. Once an acceptable point has been found, we have sufficient information to fit a parabola to the projection of the function on the search direction. The parabola requires three pieces of information: the function value at the end of the last iteration (or the initial point), the projection of the gradient at this point onto the search direction, and the new function value at the acceptable point. If the CLL value at the maximum of the parabola is larger than the CLL value at the acceptable point, then the former becomes the starting point for the next iteration. Another approach to CG line search is Brent's line search method (Press et al., 1992). It iteratively fits a parabola to 3 points. The main difference with the previous approach is that we do find an optimum in the given update direction. This, however, requires more function evaluations at each iteration.

In our experiments we use the implementation of the CG and BFGS methods in the optim function of R (Venables and Ripley, 2002), which is based on the source code from (Nash, 1990). In addition we implemented CG with Brent's line search (with the relative precision for Brent's method set to 0.0002); we refer to this algorithm as CGB. The conjugate gradient method may use different heuristic formulas for computing the update direction. Our preliminary study showed that the difference in performance between these heuristics is small. We use the Polak-Ribière formula, suggested in (Greiner et al., 2005) for optimization in the BN parameter space.

In our study we compare the rates of convergence for 9 optimization techniques: 3 optimization algorithms (CG, CGB, BFGS) used in 3 parameter spaces (LR, LR_Roos, BN). We compare convergence in terms of (1) iterations of the update direction calculation, and (2) floating point operations (flops). The number of flops provides a fair comparison of the performance of the different methods, but the number of iterations gives us additional insight into the behavior of the methods.

Let $\mathrm{cost}_{CLL}$ and $\mathrm{cost}_{grad}$ denote the costs in flops of CLL and gradient evaluations respectively, and let $\mathrm{count}_{CLL}$ be the number of times the CLL is evaluated in a particular iteration. We estimate the cost of one BFGS iteration as $12r^2 + 8r + (4r + \mathrm{cost}_{CLL})\,\mathrm{count}_{CLL} + \mathrm{cost}_{grad}$; the cost of one CG or CGB iteration is $10r + (4r + \mathrm{cost}_{CLL})\,\mathrm{count}_{CLL} + \mathrm{cost}_{grad}$. These estimates are obtained by inspecting the source code in (Nash, 1990).
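
The per-iteration estimates can be written down directly (a sketch; the function and argument names are ours, not from the paper):

```python
def iteration_cost(method, r, count_cll, cost_cll, cost_grad):
    """Per-iteration flop estimate for the optimizers, following the text:
    r is the number of parameters, count_cll the number of CLL evaluations
    made by the line search in this iteration."""
    line_search = (4 * r + cost_cll) * count_cll
    if method == "BFGS":
        return 12 * r ** 2 + 8 * r + line_search + cost_grad
    return 10 * r + line_search + cost_grad   # CG and CGB
```

The quadratic $12r^2$ term is what makes BFGS iterations increasingly expensive in the overparameterized spaces discussed later.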

5 Parameter learning

5.1 Logistic regression

Equation 2 can be rewritten as follows

$$P(X_0 = x_0 \mid Z) = \frac{e^{w(x_0)^T Z}}{\sum_{x_0'=0}^{d_0-1} e^{w(x_0')^T Z}},$$

where we put $Z_0 = 1$, which corresponds to the intercept, and fix $w(0) = 0$; $w^T$ denotes the transpose of $w$. The conditional loglikelihood of the parameters given the data is

$$\mathrm{CLL}(w) = \log \prod_{i=1}^{N} P(x_0^{(i)} \mid z^{(i)}) = \sum_{i=1}^{N} \Big( w(x_0^{(i)})^T z^{(i)} - \log \sum_{x_0'=0}^{d_0-1} e^{w(x_0')^T z^{(i)}} \Big),$$

where $N$ is the number of observations in the data set. The gradient of the CLL is given by

$$\frac{\partial\,\mathrm{CLL}(w)}{\partial w(x_0)_k} = \sum_{i=1}^{N} \left( 1_{x_0^{(i)} = x_0}\, z_k^{(i)} - \frac{e^{w(x_0)^T z^{(i)}}\, z_k^{(i)}}{\sum_{x_0'=0}^{d_0-1} e^{w(x_0')^T z^{(i)}}} \right),$$

where $x_0 \in X_0^-$. We note here that the data matrix $Z$ is a sparse matrix of indicators with 1s in the non-zero positions; thus the CLL and gradient functions can be implemented very efficiently. We estimate the costs (in flops) according to the above formulas: $\mathrm{cost}_{CLL} = |Z|(d_0 - 1) + N(d_0 + 2)$ and $\mathrm{cost}_{grad} = |Z| d_0 + 2 N d_0$, where $|Z|$ denotes the number of 1s in the data matrix. Here we used the fact that the multiplication of two vectors $w(x_0)^T z^{(i)}$ requires $|z^{(i)}| - 1$ flops. In the case of LR_Roos the matrix $Z$ always contains exactly $1 + n - |pa(0)|$ 1s, irrespective of graph complexity and the dimension of the attributes. For our mapping $|Z|$ depends on the sample. In the experiments we used the following heuristic to reduce $|Z|$: for each attribute $X_i$ we code the most frequent value as 0. We note that $|Z|$ is smaller for our mapping than for the LR_Roos mapping when the structure is not very complex (with regard to the number of parents and the domain size of the attributes), and becomes bigger for more complex structures.

78 A. Feelders and J. Ivanovs
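
The two formulas above can be sketched in vectorized form as follows (our own illustration, not the paper's implementation; `W` holds one weight vector per class with row 0 fixed to zero, and `Z` already contains the intercept column $Z_0 = 1$):

```python
import numpy as np

def cll_and_grad(W, Z, y):
    """CLL and its gradient for the multinomial-logistic form above.
    W: (classes, features), Z: (N, features), y: class labels x0^(i)."""
    S = Z @ W.T                                  # scores w(x0)^T z
    S -= S.max(axis=1, keepdims=True)            # stabilize the log-sum-exp
    P = np.exp(S)
    P /= P.sum(axis=1, keepdims=True)            # P(X0 = x0 | z)
    n = Z.shape[0]
    cll = np.log(P[np.arange(n), y]).sum()
    Y = np.zeros_like(P)
    Y[np.arange(n), y] = 1.0                     # indicator 1_{x0^(i) = x0}
    grad = (Y - P).T @ Z                         # one gradient row per class
    return cll, grad                             # row 0 is ignored: w(0) is fixed
```

Sparsity of $Z$ is not exploited in this dense sketch; a sparse implementation touches only the 1-entries, which is where the $|Z|$ terms in the cost estimates come from.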

5.2 Bayesian Network Classifiers

Here we follow the approach taken in (Greiner et al., 2005; Pernkopf and Bilmes, 2005). We write

$$\theta^j_{i|k} = P(x_j = i \mid x_{pa(j)} = k).$$

We have the constraints $\theta^j_{i|k} \geq 0$ and $\sum_{i=0}^{d_j-1} \theta^j_{i|k} = 1$. We reparameterize to incorporate these constraints on $\theta^j_{i|k}$ and use different parameters $\beta^j_{i|k}$ as follows:

$$\theta^j_{i|k} = \frac{\exp \beta^j_{i|k}}{\sum_{l=0}^{d_j-1} \exp \beta^j_{l|k}}.$$
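
This reparameterization is a softmax over the child values, applied separately per parent configuration. A minimal sketch (the array layout is our assumption: rows index child values $i$, columns index parent configurations $k$):

```python
import numpy as np

def theta_from_beta(beta):
    """Map unconstrained beta^j_{i|k} to probabilities theta^j_{i|k}
    via the softmax over the child-value axis."""
    b = beta - beta.max(axis=0, keepdims=True)   # numerical safety only
    e = np.exp(b)
    return e / e.sum(axis=0, keepdims=True)      # columns sum to 1
```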

The CLL is given by

$$\mathrm{CLL}(\beta) = \sum_{t=1}^{N} \Big( \log P(x^{(t)}) - \log \sum_{x_0^{(t)}=0}^{d_0-1} P(x^{(t)}) \Big).$$

Further expansion may be obtained using the factorization of $P(x)$ and plugging in the expressions for $\theta^j_{i|k}$. It is easy to see that

$$\frac{\partial \theta^j_{i'|k}}{\partial \beta^j_{i|k}} = \theta^j_{i'|k}\,(1_{i=i'} - \theta^j_{i|k}),$$

thus

$$\frac{\partial P(x)}{\partial \beta^j_{i|k}} = 1_{x_{pa(j)}=k}\, P(x)\,(1_{x_j=i} - \theta^j_{i|k}).$$

Simple calculations result in

$$\frac{\partial\,\mathrm{CLL}(\beta)}{\partial \beta^j_{i|k}} = \sum_{t=1}^{N} \left( 1_{x_j^{(t)}=i,\, x_{pa(j)}^{(t)}=k} - 1_{x_{pa(j)}^{(t)}=k}\, \theta^j_{i|k} - \frac{\sum_{x_0^{(t)}=0}^{d_0-1} P(x^{(t)}) \big( 1_{x_j^{(t)}=i,\, x_{pa(j)}^{(t)}=k} - 1_{x_{pa(j)}^{(t)}=k}\, \theta^j_{i|k} \big)}{\sum_{x_0^{(t)}=0}^{d_0-1} P(x^{(t)})} \right).$$

Note that for each $t$ and $j > 1$ only $d_0 d_j$ gradient values are to be considered. In order to obtain $\theta$ from $\beta$ we need $3|\beta|$ flops, where $|\beta|$ denotes the number of components of $\beta$. From the formulas above we estimate $\mathrm{cost}_{CLL} = 3|\beta| + N(n d_0 + 2 d_0 + 2)$ and $\mathrm{cost}_{grad} = 3|\beta| + N(2 n d_0 + n + (2 d_0 + 1)(3 + d_1 + \ldots + d_n))$.
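
The derivative identity $\partial \theta_{i'} / \partial \beta_i = \theta_{i'}(1_{i=i'} - \theta_i)$ behind this gradient can be checked numerically. The sketch below (our own, for a single parent configuration) compares it against central differences:

```python
import numpy as np

def softmax(beta):
    """Softmax over a single parent configuration (1-D beta)."""
    e = np.exp(beta - beta.max())
    return e / e.sum()

def dtheta_dbeta(beta, i, i_prime):
    """Analytic derivative d theta_{i'} / d beta_i from the identity above."""
    theta = softmax(beta)
    return theta[i_prime] * ((1.0 if i == i_prime else 0.0) - theta[i])
```

Agreement with a finite-difference quotient confirms the sign and the $1_{i=i'}$ term of the formula.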

Finally, we point out that the cost of the gradient is very close to the cost of the CLL in the case of LR, and is larger by a factor $2 + 2d$ in the case of BN. This suggests that CGB might be better than CG for BN, but it is very unlikely that it will also be better for the LR and LR_Roos parameter spaces. Note that $\mathrm{cost}_{CLL}(\mathrm{BN})$ is very close to $\mathrm{cost}_{CLL}(\mathrm{LR\_Roos})$, which strongly supports the fairness of our cost estimates.

5.3 Convergence issues

It is well known that LR has a concave loglikelihood function. Since BNCs whose canonical form is a perfect graph are equivalent to the LR models obtained by either mapping, it follows from the continuity of the mapping from $\theta$ to $w$ that the CLL function in the standard BN parameterization also has only global optima (Roos et al., 2005). This implies that all our algorithms should converge to the maximal CLL value.

It is important to pick good initial values for the parameters. The common approach to initializing BN parameters is to use the (generative) ML estimates. For all three parameter spaces it is crucial to avoid zero probabilities; therefore we use Laplace's rule and add one to each frequency. After the initial values of $\theta$ are obtained, we can derive starting values for $w$ as well: we simply apply the mappings to obtain the initial values for the LR_Roos and LR parameters. This procedure guarantees that all algorithms have the same initial CLL value.
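
Laplace's rule as used for initialization amounts to the following (a sketch; `counts` is the frequency table of a child variable, rows indexing its values and columns its parent configurations):

```python
import numpy as np

def laplace_counts_to_theta(counts):
    """Generative ML initialization with Laplace's rule: add one to each
    frequency before normalizing, so no theta is exactly zero."""
    c = np.asarray(counts, dtype=float) + 1.0
    return c / c.sum(axis=0, keepdims=True)
```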


6 Experiments

For the experiments we used artificially generated data sets and data sets from the UCI Machine Learning Repository (Blake and Merz, 1998). Artificial data were generated from Bayesian networks, where the parameter values were obtained by sampling each parameter from a Dirichlet distribution with $\alpha = 1$. We generated data from both simple and complex structures in order to evaluate performance under different scenarios. We generated perfect graphs in the following way: for each attribute we select $i$ parents, namely the class node and the $i - 1$ previous nodes (whenever possible) according to the indexing, where $i \in \{1, 2, 3\}$.
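
The parent-selection rule for the artificial structures can be sketched like this (our own reading of the construction; all names are illustrative):

```python
def perfect_graph_parents(n_attrs, i):
    """Parent sets for the artificial perfect graphs: attribute X_j gets the
    class node C plus the (up to) i-1 immediately preceding attributes."""
    parents = {}
    for j in range(1, n_attrs + 1):
        prev = list(range(max(1, j - (i - 1)), j))   # i-1 preceding attributes
        parents[j] = ["C"] + [f"X{k}" for k in prev]
    return parents
```

Early attributes simply receive fewer parents, which is the "whenever possible" clause above.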

Table 1 lists the UCI data sets used in the experiments and their properties. To discretize numeric variables, we used the algorithm described in (Fayyad and Irani, 1993). The '?' values in the votes data set were treated as a separate value, not as a missing value.

Table 1: Data sets and their properties

Data set     #Samples  #Attr.  #Class  Max attr. dim.
Glass             214       9       6               4
Pima              768       8       2               4
Satimage         4435      36       7               2
Tic-tac-toe       958       9       2               3
Voting            435      16       2               3

The fitted BNC structures for the UCI data were obtained by (1) connecting the class to all attributes, and (2) using a greedy generative structure learning algorithm to add successive arcs. Step (2) was not applied to the Satimage data set, for which we fitted the naive Bayes model. All structures happened to be perfect graphs, so there was no need for adjustment. In Figure 2 we show the structure fitted to the Pima Indians data set. The other structures were of similar complexity.

Figure 3 depicts convergence curves for the 9 optimization algorithms on the Pima Indians data.

Table 2 depicts statistics for the different UCI and artificial data sets. For each data set $i$ we have the starting CLL value $\mathrm{CLL}^i_{ML}$ and the maximal (over all algorithms) CLL value $\mathrm{CLL}^i_{max}$. We define a threshold $t_i = \mathrm{CLL}^i_{max} - 5\% \cdot (\mathrm{CLL}^i_{max} - \mathrm{CLL}^i_{ML})$. For every algorithm we computed the number of iterations

[Figure 2 graphic omitted in this extraction: a directed graph over the class node and attribute nodes 0-8.]

Figure 2: Structure fitted to Pima Indians data. The structure was constructed using a hill climber on the (penalized) generative likelihood.

[Figure 3 graphic omitted in this extraction: CLL plotted against flops, with one curve per combination of parameter space (LR, LR_Roos, BN) and method (CG, CG_full.search, BFGS).]

Figure 3: Optimization curves on the Pima Indians data.

(flops) needed to reach the threshold. For each data set these numbers are scaled by dividing by the smallest value. The bound of 5% was selected heuristically. The bound is big enough that all algorithms were able to reach it within 200 iterations and before satisfying the stopping criterion. The bound is small enough that the algorithms in general make quite a few iterations before achieving it. There are some data sets, however, on which all algorithms converge fast to a very high CLL value, and a clear distinction in performance is visible only using a very small bound. Pima Indians is an example (the 5% threshold is set to -337.037). We still consider Table 2 a quite fair comparison, except that the BFGS methods are underestimated for smaller bounds.
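
The threshold-and-scaling computation behind Table 2 can be sketched as follows (our own reconstruction; `curves` maps an algorithm name to its best-so-far CLL value after each iteration):

```python
def scaled_counts_to_threshold(cll_ml, curves):
    """For each algorithm, count iterations until the 5% threshold is
    reached, then scale by the smallest count over all algorithms."""
    cll_max = max(max(c) for c in curves.values())
    t = cll_max - 0.05 * (cll_max - cll_ml)          # the threshold t_i
    counts = {name: next(i for i, v in enumerate(c, 1) if v >= t)
              for name, c in curves.items()}
    best = min(counts.values())
    return {name: n / best for name, n in counts.items()}
```

The same scaling applied to cumulative flop counts instead of iteration indices yields the bottom half of Table 2.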

It is very interesting to notice that convergence


                         LR                LR_Roos                 BN
Data set          CG   CGB  BFGS     CG   CGB  BFGS     CG   CGB  BFGS
Glass            2.1   1.9   1.0    3.0   2.0   1.1    2.4   1.9   1.0
Pima             1.3   1.5   1.8    1.5   1.2   1.8    1.0   1.0   1.8
Satimage         8.2   2.3   1.0    3.3   2.0   1.1    3.0   1.0   1.3
Tic-tac-toe      3.9   2.2   1.5    2.1   1.1   1.2    2.2   1.0   1.3
Voting           3.2   1.8   1.2    1.9   1.3   1.0    1.5   1.0   1.1
1.5.2-3.100      2.8   1.4   1.2    2.3   1.3   1.1    2.0   1.0   1.2
2.5.2-3.100      2.6   2.1   1.4    1.8   1.6   1.3    1.4   1.0   1.0
3.5.2-3.100      2.7   1.8   1.3    2.2   1.0   1.2    1.3   1.2   1.2
1.5.2-3.1000     2.2   2.2   2.2    1.5   1.5   2.5    1.0   1.0   1.5
2.5.2-3.1000     2.2   2.0   1.8    1.2   1.2   1.8    1.0   1.0   2.5
3.5.2-3.1000     3.6   2.2   1.0    1.8   1.6   1.0    1.3   1.3   1.0
1.30.2-3.1000    2.0   1.2   1.0    1.5   1.2   1.5    1.2   1.0   1.2
2.30.2-3.1000    2.0   1.8   2.2    1.6   1.2   1.4    1.4   1.0   1.2
3.30.2-3.1000    2.2   1.2   2.0    1.6   1.0   1.6    1.4   1.0   1.2
1.5.5.1000       3.1   3.2   1.8    2.3   2.3   1.5    1.1   1.1   1.0
2.5.5.1000       6.7   6.9   2.3    3.1   3.0   1.4    1.5   1.5   1.0
3.5.5.1000       4.8   4.2   1.8    2.3   2.2   1.1    1.7   1.7   1.0
mean             3.3   2.4   1.6    2.1   1.6   1.4    1.6   1.2   1.3

                         LR                LR_Roos                 BN
Data set          CG   CGB  BFGS     CG   CGB  BFGS     CG   CGB  BFGS
Glass            1.0   2.4   3.3    2.6   4.1   8.3    5.9   8.1  12.6
Pima             1.0   2.4   1.6    1.8   2.5   3.1    4.0   6.1  11.0
Satimage         4.6   3.0   1.0    4.6   7.1   3.3   11.5   6.1   7.6
Tic-tac-toe      2.0   3.2   1.0    1.3   1.7   1.3    6.1   4.0   6.0
Voting           1.2   1.9   1.9    1.0   1.8   3.5    3.6   4.2  15.8
1.5.2-3.100      1.0   1.7   1.0    1.2   2.1   1.8    2.9   2.9   5.1
2.5.2-3.100      1.2   2.8   3.5    1.0   2.6   7.4    2.1   3.2  13.2
3.5.2-3.100      1.0   2.4   2.3    1.1   1.4   5.3    2.1   4.4  21.2
1.5.2-3.1000     1.0   2.0   1.7    1.0   1.9   3.9    1.5   2.4   5.5
2.5.2-3.1000     1.5   3.2   2.2    1.0   2.0   4.6    2.3   3.6  25.0
3.5.2-3.1000     1.7   2.6   1.4    1.0   2.2   3.5    2.2   4.3  13.5
1.30.2-3.1000    1.9   3.7   1.0    3.4   8.4   4.3   12.1  17.4  16.6
2.30.2-3.1000    1.0   3.1   2.2    1.2   3.2   3.3    4.9   7.4  11.8
3.30.2-3.1000    1.2   2.3   5.8    1.0   2.4  12.5    4.1   7.6  36.6
1.5.5.1000       1.1   2.8   1.5    1.0   2.4   1.9    1.6   2.4   2.8
2.5.5.1000       2.3   5.6  11.1    1.0   2.2  10.1    1.7   2.4  11.6
3.5.5.1000       2.6   5.4 101.5    1.0   2.3  98.1    2.4   3.8 137.5
mean             1.6   3.0   8.5    1.5   3.0  10.4    4.2   5.3  20.8

Table 2: Convergence speed in terms of iterations (top) and flops (bottom). Artificial data sets are named in the following way: [number of parents of each attribute].[number of attributes].[domain size of each attribute].[sample size]. A domain size denoted 2-3 is randomly chosen between 2 and 3 with equal probabilities.


with regard to iterations is faster in the BN and LR_Roos parameter spaces. It seems that overparameterization is advantageous in this respect. This might be explained by the fact that there is only a single global optimum in the LR case, whereas there are many global optima in the case of BN and LR_Roos. Thus, in the latter case one can 'quickly' converge to the closest optimum. Overparameterization, however, is also an additional burden, as witnessed by the flops count; this is especially true for the BFGS method.

We observe that, in terms of iterations, BFGS is generally winning, with CGB being close. Considering flops, CGB definitely loses to CG in all parameterizations. CG seems to be the best technique, though it might be very advantageous to use BFGS for simple structures (with respect to the number of parents and the domain size of the attributes), in combination with our LR mapping.

We have both theoretical and practical evidence that our LR parameterization is the best for relatively simple structures. So the general conclusion is to use LR + BFGS, LR + CG and LR_Roos + CG in order of growing structure complexity. The size of the domain and the number of parents mainly influence this choice. On the basis of flop counts, the BN parameterization is never preferred in our experiments. Finally, we note that our LR parameterization was the best on 4 out of 5 UCI data sets.

7 Conclusion

We have studied the efficiency of discriminative scoring of BNCs using alternative parameterizations of the conditional distribution of the class variable. In case the canonical form of the BNC is a perfect graph, there is a choice between at least three parameterizations. We found that it is wise to exploit perfectness by optimizing in the LR or LR_Roos spaces. Based on the experiments we have performed, we would suggest using LR + BFGS, LR + CG and LR_Roos + CG in order of growing structure complexity. If only one method is to be selected, we suggest LR_Roos + CG. It works well on any type of data set, and the mapping is very straightforward, which makes the initialization step easy to implement.

References

J. A. Anderson. 1982. Logistic discrimination. In P. R. Krishnaiah and L. N. Kanal, editors, Classification, Pattern Recognition and Reduction of Dimensionality, volume 2 of Handbook of Statistics, pages 169-191. North-Holland.

C.L. Blake and C.J. Merz. 1998. UCI repository of machine learning databases [http://www.ics.uci.edu/∼mlearn/mlrepository.html].

U. Fayyad and K. Irani. 1993. Multi-interval discretization of continuous valued attributes for classification learning. In Proceedings of IJCAI-93 (volume 2), pages 1022-1027. Morgan Kaufmann.

R. Greiner, S. Xiaoyuan, B. Shen, and W. Zhou. 2005. Structural extension to logistic regression: Discriminative parameter learning of belief net classifiers. Machine Learning, 59:297-322.

T. Minka. 2001. Algorithms for maximum-likelihood logistic regression. Technical Report Statistics 758, Carnegie Mellon University.

J. C. Nash. 1990. Compact Numerical Methods for Computers: Linear Algebra and Function Minimisation (2nd ed.). Hilger.

F. Pernkopf and J. Bilmes. 2005. Discriminative versus generative parameter and structure learning of Bayesian network classifiers. In ICML '05: Proceedings of the 22nd International Conference on Machine Learning, pages 657-664, New York. ACM Press.

W.H. Press, B.P. Flannery, S.A. Teukolsky, and W.T. Vetterling. 1992. Numerical Recipes in C: The Art of Scientific Computing. Cambridge.

T. Roos, H. Wettig, P. Grünwald, P. Myllymäki, and H. Tirri. 2005. On discriminative Bayesian network classifiers and logistic regression. Machine Learning, 59:267-296.

G. Santafé, J. A. Lozano, and P. Larrañaga. 2005. Discriminative learning of Bayesian network classifiers via the TM algorithm. In L. Godo, editor, ECSQARU 2005, volume 3571 of Lecture Notes in Computer Science, pages 148-160. Springer.

C. Sutton and A. McCallum. 2006. An introduction to conditional random fields for relational learning. In L. Getoor and B. Taskar, editors, Introduction to Statistical Relational Learning. MIT Press. To appear.

W.N. Venables and B.D. Ripley. 2002. Modern Applied Statistics with S (fourth edition). Springer, New York.


The Independency tree model: a new approach for clustering and factorisation

M. Julia Flores and José A. Gámez
Computing Systems Department / SIMD (i3A)
University of Castilla-La Mancha
Albacete, 02071, Spain

Serafín Moral
Departamento de Ciencias de la Computación e I. A.
Universidad de Granada
Granada, 18071, Spain

Abstract

Taking as inspiration the so-called Explanation Tree for abductive inference in Bayesian networks, we have developed a new clustering approach. It is based on exploiting the variable independencies with the aim of building a tree structure such that in each leaf all the variables are independent. In this work we produce a structure called the Independency tree. This structure can be seen as an extended probability tree, introducing a new and very important element: a list of probabilistic single potentials associated with every node. In the paper we will show that the model can be used to approximate a joint probability distribution and, at the same time, as a hierarchical clustering procedure. The Independency tree can be learned from data and it allows a fast computation of conditional probabilities.

1 Introduction

In recent years the relevance of unsupervised classification within data mining has been remarkable. When dealing with a large number of cases in real applications, the identification of common features that allows the formation of groups/clusters of cases is a powerful capability that both simplifies the data processing and allows the user to better understand the trend(s) followed in the registered cases.

In the data mining community this descriptive task is known as cluster analysis (Anderberg, 1973; Duda et al., 2001; Kaufman and Rousseeuw, 1990; Jain et al., 1999), that is, getting a decomposition or partition of a data set into groups in such a way that the objects in one group are similar to each other but as different as possible from the objects in other groups. In fact, as pointed out in (Hand et al., 2001, pg. 293), we could distinguish two different objectives in this descriptive task: (a) segmentation, in which the aim is simply to partition the data in a convenient way, probably using only a small number of the available variables; and (b) decomposition, in which the aim is to see whether the data is composed (or not) of natural subclasses, i.e., to discover whether the overall population is heterogeneous. Strictly speaking, cluster analysis is devoted to the second goal, although in general the term is widely used to describe both segmentation and cluster analysis problems.

In this work we only deal with categorical or discrete variables, and we are closer to segmentation than (strictly speaking) to cluster analysis. Our proposal is a method that in our opinion has several good properties: (1) it produces a tree-like graphical structure that allows us to visually describe each cluster by means of a configuration of the relevant variables for that segment of the population; (2) it takes advantage of the identification of contextual independencies in order to build a decomposition of the joint probability distribution; (3) it stores a joint probability distribution over all the variables, in a simple way, allowing an efficient computation of conditional probabilities. An example of an independency tree is given in Figure 1, which will be thoroughly described in the following sections.

The paper is structured as follows: first, in Section 2 we give some preliminaries in relation to our proposed method. Section 3 describes the kind of model we aim to look for, while in Section 4 we propose the algorithm we have designed to discover it from data. Section 5 describes the experiments carried out. Finally, Section 6 is devoted to the conclusions and to describing some possible lines along which to continue our research.

2 Preliminaries

Different types of clustering algorithms can be found in the literature, differing in the type of approach they follow. Probably the three main approaches are: partition-based clustering, hierarchical clustering, and probabilistic model-based clustering. The first two approaches yield a hard clustering, in the sense that clusters are exclusive, while the third one yields a soft clustering, that is, an object can belong to more than one cluster following a probability distribution. Because our approach is somewhat related to hierarchical and probabilistic model-based clustering, we briefly comment on these types of clustering.

Hierarchical clustering (Sneath and Sokal, 1973) returns a tree-like structure called a dendrogram. This structure reflects the way in which the objects in the data set have been merged from single points to the whole set, or split from the whole set to single points. Thus, there are two distinct types of hierarchical methods: agglomerative (merging two clusters) and divisive (splitting a cluster). Although our method does not exactly belong to hierarchical clustering, it is closer to hierarchical divisive methods than to any other clustering approach (known to us). In concrete terms, it works like monothetic divisive clustering algorithms (Kaufman and Rousseeuw, 1990), which split clusters using one variable at a time, but it differs from classical divisive approaches in the use of a Bayesian score to decide which variable is chosen at each step, and because branches are not completely developed.

With respect to probabilistic clustering, although our method produces a hard clustering, it is somewhat related to it because probability distributions are used to complete the information about the discovered model. Probabilistic model-based clustering is usually modelled as a mixture of models (see e.g. (Duda et al., 2001)). Thus, a hidden random variable is added to the original observed variables, and its states correspond to the components of the mixture (the number of clusters). In this way we move to a problem of learning from unlabelled data; usually the EM algorithm (Dempster et al., 1977) is used to carry out the learning task when the graphical structure is fixed, and structural EM (Friedman, 1998) when the graphical structure also has to be discovered (Pena et al., 2000). Iterative approaches have been described in the literature (Cheeseman and Stutz, 1996) in order to also discover the number of clusters (components of the mixture). Notice that although this is not the goal of the learning task, the structures discovered can be used for approximate inference (Lowd and Domingos, 2005), having the advantage over general Bayesian networks (Jensen, 2001) that the learned graphical structure is, in general, simple (i.e. naive Bayes) and so inference is extremely fast.

3 Independency tree model

If we have a set of variables X = {X1, . . . , Xm} regarding a certain domain, an independent probability tree for this set of variables is a tree such that (e.g. figure 1):

• Each inner node N is labelled by a variable Var(N), and this node has a child for each one of the possible values (states) of Var(N).

84 M. J. Flores, J. A. Gámez, and S. Moral


When a variable is represented by a node in the tree, this variable partitions the space from that point onwards (from this point to the farthest branches in the tree) depending on its (nominal) value. So, a variable cannot appear again (deeper) in the tree.

• Each node N has an associated list of potentials, List(N), with each of the potentials in the list storing a marginal probability distribution for one of the variables in X, always including a potential for the variable Var(N).

This list appears in Figure 1 framed in a dashed box.

• For any path from the root to a leaf, each one of the variables in X appears once and exactly once in the lists associated with the nodes along the path. This potential determines the conditional probability of the variable given the values of the variables on the path from the root to the list containing the potential.

A configuration is a subset of variables Y ⊆ {X1, . . . , Xm} together with a concrete value Yj = yj for each one of the variables Yj ∈ Y. Each node N has an associated configuration determined by the variables in the path from the root to node N (excluding Var(N)), with the values corresponding to the children we have to follow to reach N. This configuration will be denoted Conf(N).

An independent probability tree represents a joint probability distribution $p$ over the variables in X. If $x = (x_1, \ldots, x_m)$, then

$$p(x) = \prod_{i=1}^{m} P_{x(i)}(x_i),$$

where $P_{x(i)}$ is the potential for variable $X_i$ which is in the path from the root to a leaf determined by configuration $x$ (following, in each inner node $N$ with variable $\mathrm{Var}(N) = X_j$, the child corresponding to the value of $X_j \in x$).

This decomposition is based on a set of independencies among the variables X1, . . . , Xm. Assume that VL(N) is the set of variables of the potentials in List(N), and that DVL(N)¹ is the union of the sets VL(N′), where N′ is a node on a path from N to a leaf; then the independencies are generated from the following statement: each variable X in VL(N) − {Var(N)} is independent of the variables (VL(N) − {X}) ∪ DVL(N) given the configuration Conf(N); i.e. each variable in the list of a node is independent of the other variables in that list and of the variables in its descendants, given the configuration associated with the node.

An independent probability tree also defines a partition of the set of possible values of the variables in X. The number of clusters is the number of leaves. If N is a leaf with associated configuration Conf(N), then this group is given by the set of all values x that are compatible with Conf(N) (they have the same values for all the variables in Conf(N)). For example, in figure 1 the configuration X = (0, 1, 0, 1, 0) would fall in the second leaf, since X1 = 0 and X2 = 1. It is assumed that the probability distributions of the other variables in the cluster are given by the potentials on the path defined by the configuration; for example, P(X3 = 0) = 1.

In this way, with an independence tree we are able to accomplish two goals in one: (1) the variables are partitioned in a hierarchical way that gives us at the same time a clustering result; (2) we obtain the probability value of every configuration of the variables.

[Figure 1 graphic omitted in this extraction: a tree with root node X1 and inner node X2, where each node carries a dashed box listing single-variable potentials, e.g. X4: "0" [0.5] "1" [0.5] and X5: "0" [0.5] "1" [0.5] at the root.]

Figure 1: Illustrative independence tree structure learned from the exclusive dataset.

¹ D stands for "Descendants".


Our structure seeks to keep in the leaves those variables that remain independent. At the same time, if the distribution of one variable is shared by several leaves, we try to store it in their common ascendant, to avoid repeating it in all the leaves. For that reason, when a variable appears in the list of a node Nj, its distribution is common to all levels from that node down to a leaf. For example, in figure 1, the binary variable X4 has a uniform (1/2, 1/2) distribution for all leaf cases (also for intermediate ones), since it is associated with the root node. On the other hand, we can see how the distribution of X2 varies depending on the branch (left or right) we take from the root; that is, when X1 = 0 the behaviour of variable X2 is uniform, whereas when the value of X1 is 1, X2 is determined to be 0.

The intuition underlying this model is based on the idea that inside each cluster the variables are independent. When we have a set of data, groups are defined by common values in certain variables, with the other variables showing random variations. Imagine that we have a database with characteristics of different animals, including mammals and birds. The presence of these two groups is based on the existence of dependencies between the variables (having two legs is related to having wings and feathers). Once these variables are fixed, there can be other variables (size, colour) that can have random variations inside each group, but that do not define new subcategories.

Of course, there are other possible alternatives to define clustering. Ours is based on the idea that when all the variables are independent, subdividing the population is not useful, as we already have a simple method to describe the joint behaviour of the variables. However, if some of the variables are dependent, then the values of some basic variables could help to determine the values of the other variables, and then it can be useful to divide the population into groups according to the values of these basic variables.

Another way of seeing it is that having independent variables is the simplest model we can have. So, we determine a set of categories such that each one is described in a very simple way (independent variables). This is also exploited by other clustering algorithms, such as the EM-based AutoClass clustering algorithm (Cheeseman and Stutz, 1996).

An important fact about this model is that it is a generalisation of some usual models in classification: the Naive Bayes model (a tree with only one inner node, associated with the class, such that in its children all the rest of the variables are independent), and the classification tree (a tree in which the list of each inner node contains only one potential, and the potential associated with the class variable always appears in the leaves). This generalisation means that our model is able to represent the same conditional probability distribution of the class variable with respect to the rest of the variables, taking as a basis the same set of associated independencies.

From this model one may want to apply twokinds of operations:

• The production of clusters and their characterisation. A simple in-depth traversal of the tree from the root to every leaf node will give us the corresponding clusters, and the potential lists associated with the nodes on the path will indicate the behaviour of the other variables not in the path.

• The computation of the probability for a certain complete configuration with a value for each variable: x = (x1, x2, . . . , xm). This algorithm is recursive and works as follows:

getProb(Configuration x = (x1, x2, . . . , xm), Node N)
  Prob ← 1
  1. For each potential Pj ∈ List(N):
     1.1. Xj ← Var(Pj)
     1.2. Prob ← Prob · Pj(Xj = xj)
  2. If N is a leaf, then return Prob
  3. Else:
     3.1. XN ← Var(N)
     3.2. N′ ← child(XN = xN)
     3.3. return Prob · getProb(x, N′)

Given a configuration, we traverse the tree from the root towards the leaves, taking the corresponding branches. Every time we reach a node, we use the single potentials in its associated list to multiply the probability value by the values of the potentials corresponding to this configuration.
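
A runnable version of getProb, under an assumed minimal node representation (the fields `potentials`, `var` and `children` are our naming, not the paper's):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    potentials: dict                 # List(N): variable -> {value: probability}
    var: int = -1                    # Var(N), the splitting variable (inner nodes)
    children: dict = field(default_factory=dict)  # value of Var(N) -> child Node

def get_prob(x, node):
    """Recursive evaluation of p(x) on an independency tree, as in getProb."""
    prob = 1.0
    for var, table in node.potentials.items():
        prob *= table[x[var]]        # multiply in each single potential P_j
    if not node.children:            # step 2: N is a leaf
        return prob
    return prob * get_prob(x, node.children[x[node.var]])  # step 3
```

Each variable's potential is met exactly once on the root-to-leaf path, so the product is exactly the factorisation $p(x) = \prod_i P_{x(i)}(x_i)$ given earlier.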

It is also possible to compute the probability for a variable of interest Z conditioned on a generic configuration (a set of observations) Y = y. This can be done in two steps: 1) transform the independent probability tree into a probability tree (Salmeron et al., 2000); 2) marginalise the probability tree by summing over all the variables except Z, as described in (Salmeron et al., 2000).

In the following we describe the first step.It is a recursive procedure that visits all nodesfrom root to leaves. Each node can pass to itschildren a float, Prob (1 in the root) and a po-tential P which depends of variable Z (emptyin the root). Each time a node is visited, thefollowing operations are carried out:• All the potentials in List(N) are examined andremoved. For any potential, depending on itsvariable Xj we proceed as follows:

- If Xj 6= Z, then if Xj = Yk is in the obser-vations configuration, Prob is multiplied by thevalue of this potential for Yk = yk.

- If Xj 6= Z and Xj does not appear in theobservations configuration, then the potential isignored.

- If Xj = Z and Xj is the variable associated with node N, then we transform each of the children of Z, multiplying Prob by the value of the potential at Z = z before transforming the child corresponding to this value.

- If Xj = Z and Xj is not the variable associated with node N, then the potential of Xj is stored in P.
• After examining the list of potentials, we proceed as follows:

- If N is not a leaf node, then we transform its children.

- If N is a leaf node and P = ∅, we assign the value Prob to this node.

- If N is a leaf node and P ≠ ∅, we make N an inner node with variable Z: for each value Z = z, build a node N′z which is a leaf with a value equal to Prob · P(z), and make N′z a child of N.

This procedure is fast: it is linear in the size of the independent probability tree (considering the number of values of Z constant). The marginalisation in the second step has the same time complexity. In fact, this marginalisation can be done by adding, for each value Z = z, the values of the leaves that are compatible with this value (compatibility meaning that to follow this path we do not have to assume Z = z′ with z ≠ z′), which is linear too.

4 Our clustering algorithm

In this section we describe how an independent probability tree can be learned from a database D with values for all the variables in X = X1, . . . , Xm.

The basics of the algorithm are simple. It tries to determine, for each node, the variable with the strongest degree of dependence with the remaining variables. This variable is assigned to the node, and the process is repeated with its children until all the variables are independent.

For this, we need a measure of the degree of dependence of two variables Xi and Xj in a database D. The measure should be centred around 0, in such a way that the variables are considered dependent if and only if the measure is greater than 0. In this paper, we consider the K2 score (Cooper and Herskovits, 1992), measuring the degree of dependence as the difference between the logarithm of the K2 score of Xj conditioned on Xi and the logarithm of the K2 score of the marginal of Xj, i.e. the difference between the logarithms of the K2 scores of two networks with two variables: one in which Xi is a parent of Xj, and another in which the two variables are not connected. Let us call this degree of dependence Dep(Xi, Xj |D). This is a non-symmetrical measure and should be read as the influence of Xi on Xj; however, in practice the differences between Dep(Xi, Xj |D) and Dep(Xj, Xi|D) are not important.
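A minimal sketch of this dependence measure, assuming the standard closed form of the K2 score (the data layout as a list of dicts is ours, not the authors'):

```python
from math import lgamma
from collections import Counter, defaultdict

def log_k2(child, parent, data, r):
    """log K2 score of `child` with an optional `parent` (None = no parent).
    r is the number of values of `child`; lgamma(k + 1) == log(k!)."""
    groups = defaultdict(list)
    for case in data:
        key = case[parent] if parent is not None else 0
        groups[key].append(case[child])
    score = 0.0
    for values in groups.values():
        counts = Counter(values)
        n_j = len(values)
        # log[(r - 1)! / (n_j + r - 1)!] + sum_k log(n_jk!)
        score += lgamma(r) - lgamma(n_j + r) + sum(lgamma(c + 1) for c in counts.values())
    return score

def dep(xi, xj, data, r):
    """Dep(Xi, Xj | D): positive values suggest dependence."""
    return log_k2(xj, xi, data, r) - log_k2(xj, None, data, r)
```

On deterministic data (Xj = Xi) the measure is clearly positive, while on perfectly balanced independent data the extra-parameter penalty makes it negative, matching the centred-around-0 behaviour described above.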

At any moment, given a variable Xi and a database D, we can estimate a potential Pi(D) for this variable from the database. This potential is the estimation of the marginal probability of Xi. Here we assume that this is done by counting the absolute frequencies of each of the values in the database.

The algorithm starts with a list of variables L, which is initially equal to X1, . . . , Xm, and a database equal to the original one, D; then it

The Independency tree model: a new approach for clustering and factorisation 87


determines the root node N and its children in a recursive way. For that, for any variable Xi in the list, it computes

Dep(Xi|D) = Σ_{Xj∈L} Dep(Xi, Xj |D)

Then, the variable Xk with maximum value of Dep(Xi|D) is considered.

If Dep(Xk|D) > 0, then we assign variable Xk to node N and add potential Pk(D) to the list List(N), removing Xk from L. For each remaining variable Xi in L, we compute Dep(Xi, Xj |D) for Xj ∈ L (j ≠ i) and Xj = Xk. If all these values are less than or equal to 0, then we add potential Pi(D) to List(N) and remove Xi from L; i.e. we keep in this node the variables which are independent of the rest of the variables, including the variable in the node. Finally, we build a child of N for each of the values Xk = xk. This is done by calling the same procedure recursively, with the new list of nodes and changing database D to D[Xk = xk], where D[Xk = xk] is the subset of D given by those cases in which variable Xk takes the value xk.

If Dep(Xk|D) ≤ 0, then the process is stopped and this is a leaf node. We build List(N) by adding potential Pi(D) for every variable Xi in L.

In this algorithm, the complexity of the computation at each node is bounded by O(m²·n), where m is the number of variables and n is the database size. The number of leaves is bounded by the database size n multiplied by the maximum number of values of a variable, as a leaf with only one case of the database is never branched. But usually the number of nodes is much lower.

In the algorithm, we make some approximations with respect to the independencies represented by the model. First, we only look at one-to-one dependencies, and not at joint dependencies. It can be the case that Xi is independent of Xj and of Xk, but not independent of (Xj, Xk). However, testing these independencies is more costly, and we do not have the possibility of a direct representation of this in the model. This assumption is made by other Bayesian network learning algorithms such as PC (Spirtes et al., 1993), where a link between two nodes is deleted if these nodes are marginally independent.

Another approximation is the assumption that all the variables in a leaf are independent when Dep(Xk|D) ≤ 0, even if some of the terms we are adding are positive. We have found that this is a good compromise criterion to limit the complexity of the learned models.

5 Experiments

To make an initial evaluation of the Independence Tree (IndepT) model, we decided to compare it with other well-known unsupervised classification techniques that are also based on Probabilistic Graphical Models: learning a Bayesian network (with a standard algorithm such as PC), and Expectation-Maximisation with a Naive Bayes structure, which uses cross-validation to decide the number of clusters. Because we produce a probabilistic description of the dataset, we use the log-likelihood (logL) of the data given the model to score a given clustering. By using logL as the goodness measure we can compare our approach with other algorithms for probabilistic model-based unsupervised learning: probabilistic model-based clustering and Bayesian networks. This is a direct evaluation of the procedures as methods to encode a complex joint probability distribution. At the same time, it is also an evaluation of our method from the clustering point of view, showing whether the proposed segmentation is useful as a simple description of the population.

The basic steps we have followed for the three procedures [IndepT, PC-Learn, EM-Naive] are:

1. Divide the data cases into a training set (SD) and a test set (ST), using a (2/3, 1/3) split.

2. Build the corresponding model from the cases in SD.

3. Compute the log-likelihood of the obtained model over the data in ST.
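The three steps above can be sketched as follows. For brevity, the fitted model here is a plain product-of-marginals estimate with Laplace smoothing; it stands in for any of IndepT, PC-Learn or EM-Naive, since the split and the held-out logL scoring are the same for all three. All function names and the data layout are ours.

```python
import math
import random
from collections import Counter

def split_2_1(cases, seed=0):
    """Step 1: a (2/3, 1/3) split into training set S_D and test set S_T."""
    rnd = random.Random(seed)
    shuffled = cases[:]
    rnd.shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]

def fit_marginals(train, domains):
    """Step 2 (simplified model): smoothed marginal tables per variable."""
    model = {}
    for var, vals in domains.items():
        counts = Counter(case[var] for case in train)
        total = len(train) + len(vals)            # Laplace smoothing
        model[var] = {v: (counts[v] + 1) / total for v in vals}
    return model

def log_likelihood(model, test):
    """Step 3: logL of the held-out cases under the fitted model."""
    return sum(math.log(model[var][case[var]])
               for case in test for var in model)
```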

Among the tested cases, apart from easy synthetic databases created to check the expected and right clusters, we have looked for real sets of cases. Some of them have been taken from the UCI dataset repository (Newman et al., 1998) and others from real applications related to our current research environment.

The main remarks about all the evaluated cases are:

• Case 1: exclusive: This is a very simple dataset with five binary variables, from X1 to X5. The first three variables have an exclusive behaviour; that is, if one of them is 1, the other two will be 0. On the other hand, X4 and X5 are independent of the first three and also of each other. This example is interesting not especially for the logL comparison, but mainly to see the interpretation of the tree given in figure 1.

• Case 2: tic-tac-toe: Taken from the UCI repository, it encodes the complete set of possible board configurations at the end of tic-tac-toe games. It presents 958 cases and 9 attributes (one per game square).

• Cases 3: greenhouses: Data cases taken from real greenhouses located in Almería (Spain). There are four distinct sets of cases; we indicate them with the notation (case) = num_cases, num_attributes: (3A) = 1240, 8; (3B) = 1465, 17; (3C) = 1318, 33; (3D) = 1465, 6.

• Cases 4: sheep: Datasets taken from the work developed in (Flores and Gamez, 2005), where the historical data of the different animals were registered and used to analyse their genetic merit for milk production directed to Manchego cheese. There are two datasets with 3087 cases: 4A with 24 variables/attributes, and 4B, where the attribute breeding value (which could be interpreted as the class attribute) has been removed.

• Case 5: connect4: Also downloaded from the UCI repository, it contains all legal 8-ply positions in the game of connect-4 in which neither player has won yet, and in which the next move is not forced. There are 67557 cases and 42 attributes, each corresponding to one connect-4 square².

The results of the experiments can be found in figure 1 and in table 1. Figure 1 presents

2 Actually, only half of the cases have been used.

case    IndepT        PC-Learn      EM-Naive
1       -61.33        -62.88        -119.67
2       -1073.94      -2947.85      -3535.11
3A      -3016.92      -2644.38      -4297.21
3B      -2100.40      -3584.06      -6266.71
3C      -4956.04      -7630.98      -15145.15
3D      -1340.04      -2770.15      -4223.45
4A      -6853.12      -18074.69     -32213.29
4B      -6802.94      -17357.80     -31244.28
5       -465790.73    -349631.35    -794415.07

Table 1: Comparison in terms of the log-likelihood value for all cases (datasets).

the tree learned in the exclusive case. As can be observed, the tree really captures what is happening in the problem: X4 and X5 are independent and placed in the list of the root node; then one of the other variables is looked up. If its value is 1, the values of the other two variables are completely determined and we stop. If its value is 0, another variable has to be examined.

With respect to the capacity of approximating the joint probability distribution, we can see in table 1 that our method provides greater values of logL than EM-Naive on all the datasets. With respect to the PC algorithm, the independent tree wins in all situations except two (cases 3A and 5). This is a remarkable result, as our model has some limitations in representing sets of independencies that can easily be represented by a Bayesian network (for example, a Markov chain); nevertheless its behaviour is usually better. We think that this is due to two main aspects: it can represent asymmetrical independencies, and it is a simple model (which is always a virtue).

6 Concluding remarks and future work

In this paper we have proposed a new model for representing a joint probability distribution in a compact way, which can be used for fast computation of conditional probability distributions. This model is based on a partition of the state space, in such a way that in each group all the



variables are independent. In this sense, it is at the same time a clustering algorithm.

In the experiments with real and synthetic data we have shown its good behaviour as a method of approximating a joint probability, providing results that are better (except in two cases for the PC algorithm) than the factorisations provided by a standard Bayesian network learning algorithm (PC) and by the EM clustering algorithm. In any case, more extensive experiments are necessary in order to compare it with other clustering and factorisation procedures.

We think that this model can be improved and exploited in several ways: (1) Some of the clusters can have very few cases or can be very similar in the distributions of the variables; we could devise a procedure to join such clusters. (2) We feel that the different numbers of values of the variables can have some undesirable effects; this problem could be solved by considering binary trees in which the branching is determined by a partition of the set of possible values of a variable into two parts, which could also allow the model to be extended to continuous variables. (3) We could determine different stopping/branching rules depending on the final objective: to approximate a joint probability, or to provide a simple (though not necessarily exhaustive) partition of the state space that could help us get an idea of how the variables take their values. (4) We could use this model in classification problems.

Acknowledgments

This work has been supported by the Spanish MEC under project TIN2004-06204-C03-02,03.

References

M.R. Anderberg. 1973. Cluster Analysis for Applications. Academic Press.

P. Cheeseman and J. Stutz. 1996. Bayesian classification (AUTOCLASS): Theory and results. In U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining, pages 153–180. AAAI Press/MIT Press.

G.F. Cooper and E.A. Herskovits. 1992. A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9:309–347.

A.P. Dempster, N.M. Laird, and D.B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 1:1–38.

R.O. Duda, P.E. Hart, and D.J. Stork. 2001. Pattern Recognition. Wiley.

M.J. Flores and J.A. Gamez. 2005. Breeding value classification in Manchego sheep: A study of attribute selection and construction. In Knowledge-Based Intelligent Information and Engineering Systems, LNAI volume 3682, pages 1338–1346. Springer Verlag.

N. Friedman. 1998. The Bayesian structural EM algorithm. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 129–138. Morgan Kaufmann.

D. Hand, H. Mannila, and P. Smyth. 2001. Principles of Data Mining. MIT Press.

A.K. Jain, M.M. Murty, and P.J. Flynn. 1999. Data clustering: a review. ACM Computing Surveys, 31:264–323.

F.V. Jensen. 2001. Bayesian Networks and Decision Graphs. Springer Verlag.

L. Kaufman and P. Rousseeuw. 1990. Finding Groups in Data. John Wiley and Sons.

D. Lowd and P. Domingos. 2005. Naive Bayes models for probability estimation. In Proceedings of the 22nd ICML conference, pages 529–536. ACM Press.

D.J. Newman, S. Hettich, C.L. Blake, and C.J. Merz. 1998. UCI repository of machine learning databases.

J.M. Pena, J.A. Lozano, and P. Larranaga. 2000. An improved Bayesian structural EM algorithm for learning Bayesian networks for clustering. Pattern Recognition Letters, 21:779–786.

A. Salmeron, A. Cano, and S. Moral. 2000. Importance sampling in Bayesian networks using probability trees. Computational Statistics and Data Analysis, 34:387–413.

P.H. Sneath and R.R. Sokal. 1973. Numerical Taxonomy. Freeman.

P. Spirtes, C. Glymour, and R. Scheines. 1993. Causation, Prediction and Search. Springer Verlag.



Learning the Tree Augmented Naive Bayes Classifier from incomplete datasets

Olivier C.H. Francois and Philippe Leray
LITIS Lab., INSA de Rouen, BP 08, av. de l'Universite
76801 Saint-Etienne-Du-Rouvray, France.

Abstract

The Bayesian network formalism is becoming increasingly popular in many areas such as decision aid or diagnosis, in particular thanks to its inference capabilities, even when data are incomplete. For classification tasks, Naive Bayes and Augmented Naive Bayes classifiers have shown excellent performance. Learning a Naive Bayes classifier from incomplete datasets is not difficult, as only parameter learning has to be performed. But there are not many methods to efficiently learn Tree Augmented Naive Bayes classifiers from incomplete datasets. In this paper, we take up the structural em algorithm principle introduced by (Friedman, 1997) to propose an algorithm to answer this question.

1 Introduction

Bayesian networks are a formalism for probabilistic reasoning increasingly used in decision aid, diagnosis and complex systems control. Let X = {X1, . . . , Xn} be a set of discrete random variables. A Bayesian network B = <G, Θ> is defined by a directed acyclic graph G = <N, U>, where N represents the set of nodes (one node for each variable) and U the set of edges, and parameters Θ = {θijk}_{1≤i≤n, 1≤j≤qi, 1≤k≤ri}, the set of conditional probability tables of each node Xi knowing its parents' state Pi (with ri and qi the respective cardinalities of Xi and Pi).

If G and Θ are known, many inference algorithms can be used to compute the probability of any variable that has not been measured, conditionally on the values of the measured variables. Bayesian networks are therefore a tool of choice for reasoning under uncertainty, based on incomplete data, which is often the case in real applications.

It is possible to use this formalism for classification tasks. For instance, the Naive Bayes classifier has shown excellent performance. This model is simple and only needs parameter learning, which can be performed with incomplete datasets. Augmented Naive Bayes classifiers (with trees, forests or Bayesian networks), which often give better performance than the Naive Bayes classifier, require structure learning. Only a few structural learning methods deal with incomplete data.

We introduce in this paper a method to learn Tree Augmented Naive Bayes (tan) classifiers based on the expectation-maximization (em) principle. Some previous work by (Cohen et al., 2004) also deals with tan classifiers and the em principle for partially unlabeled data. In their work, only the variable corresponding to the class can be partially missing, whereas any variable can be partially missing in the approach we propose here.

We will therefore first recall the issues relating to structural learning, and review the various ways of dealing with incomplete data, primarily for parameter estimation but also for structure determination. We will then examine the structural em algorithm principle, before proposing and testing a few ideas for improvement based on the extension of the Maximum Weight Spanning Tree algorithm to deal with incomplete data. Then, we will show how to use the introduced method to learn the well-known Tree Augmented Naive Bayes classifier from incomplete datasets, and we will give some experiments on real data.


2 Preliminary remarks

2.1 Structural learning

Because of the super-exponential size of the search space, an exhaustive search for the best structure is impossible. Many heuristic methods have been proposed to determine the structure of a Bayesian network. Some of them rely on human expert knowledge; others use real data, which most of the time need to be completely observed.

Here, we are more specifically interested in score-based methods: primarily, the greedy search algorithm adapted by (Chickering et al., 1995) and the maximum weight spanning tree (mwst) algorithm proposed by (Chow and Liu, 1968) and applied to Bayesian networks in (Heckerman et al., 1995). The greedy search is carried out in directed acyclic graph (dag) space, where the interest of each structure located near the current structure is assessed by means of a bic/mdl type measurement (Eqn.1)¹ or a Bayesian score like bde (Heckerman et al., 1995).

BIC(G, Θ) = log P(D|G, Θ) − (log N / 2) · Dim(G)   (1)

where Dim(G) is the number of parameters used for the Bayesian network representation and N is the size of the dataset D.

The bic score is decomposable: it can be written as the sum of local scores computed for each node,

BIC(G, Θ) = Σi bic(Xi, Pi, ΘXi|Pi)

where

bic(Xi, Pi, ΘXi|Pi) = Σ_{Xi=xk} Σ_{Pi=paj} Nijk log θijk − (log N / 2) · Dim(ΘXi|Pi)   (2)

with Nijk the number of occurrences of Xi = xk and Pi = paj in D.

The principle of the mwst algorithm is rather different. This algorithm determines the best tree that links all the variables, using a mutual information measurement as in (Chow and Liu, 1968), or the bic score variation when two variables become linked, as proposed by (Heckerman et al., 1995). The aim is to find an optimal solution, but in a space limited to trees.

¹Like (Friedman, 1997), we consider that the bic/mdl score is a function of the graph G and the parameters Θ, generalizing the classical definition of the bic score, which is defined in our notation by BIC(G, Θ*), where Θ* is obtained by maximizing the likelihood, or the BIC(G, Θ) score, for a given G.

2.2 Bayesian classifiers

Bayesian classifiers such as Naive Bayes have shown excellent performance on many datasets. Even if the Naive Bayes classifier has heavy underlying independence assumptions, (Domingos and Pazzani, 1997) have shown that it is optimal for conjunctive and disjunctive concepts. They have also shown that the Naive Bayes classifier does not require attribute independence to be optimal under Zero-One loss.

The Augmented Naive Bayes classifier appears as a natural extension of the Naive Bayes classifier. It relaxes the assumption of independence of the attributes given the class variable. Many ways of finding the best tree to augment the Naive Bayes classifier have been studied. These Tree Augmented Naive Bayes classifiers (Geiger, 1992; Friedman et al., 1997) are a restricted family of Bayesian networks in which the class variable has no parent, and each other attribute has as parents the class variable and at most one other attribute. The bic score of such a Bayesian network is given by Eqn.3.

BIC(TAN, Θ) = bic(C, ∅, ΘC|∅) + Σi bic(Xi, {C} ∪ Pi, ΘXi|C,Pi)   (3)

where C stands for the class node, and Pi can only be the empty set ∅ or a singleton {Xj}, Xj ∉ {C, Xi}.

The Forest Augmented Naive Bayes classifier (fan) is very close to the tan one. In this model, the augmented structure is not a tree but a set of disconnected trees in the attribute space (Sacha, 1999).

2.3 Dealing with incomplete data

2.3.1 Practical issue

Nowadays, more and more datasets are available, and most of them are incomplete. When we want to build a model from an incomplete dataset, it is often possible to consider only the complete samples in the dataset. But, in this case, we do not have a lot of data to learn the model. For instance, if we have a dataset with 2000 samples on 20 attributes, with a probability of 20% that an entry is missing, then only

92 O. C.H. François and P. Leray


23 samples (on average) are complete. Generalizing from the example, we see that we cannot ignore the problem of incomplete datasets.
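The back-of-the-envelope computation behind this example: if every entry is missing independently with probability p, a sample with k attributes is fully observed with probability (1 − p)^k.

```python
def expected_complete(n_samples, n_attributes, p_missing):
    """Expected number of fully observed samples under independent missingness."""
    return n_samples * (1 - p_missing) ** n_attributes

# 2000 samples, 20 attributes, 20% missingness: about 23 complete cases remain
```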

2.3.2 Nature of missing data

Let D = {X_i^l}_{1≤i≤n, 1≤l≤N} be our dataset, with Do the observed part of D, Dm the missing part, and Dco the set of completely observed cases in Do. Let also M = {Mil}, with Mil = 1 if X_i^l is missing and 0 otherwise. We then have the following relations:

Dm = {X_i^l / Mil = 1}_{1≤i≤n, 1≤l≤N}
Do = {X_i^l / Mil = 0}_{1≤i≤n, 1≤l≤N}
Dco = {[X_1^l . . . X_n^l] / [M_1l . . . M_nl] = [0 . . . 0]}_{1≤l≤N}

Dealing with missing data depends on their nature. (Rubin, 1976) identified several types of missing data:

• mcar (Missing Completely At Random): P(M|D) = P(M); the probability for data to be missing does not depend on D,

• mar (Missing At Random): P(M|D) = P(M|Do); the probability for data to be missing depends on the observed data,

• nmar (Not Missing At Random): the probability for data to be missing depends on both observed and missing data.

mcar and mar situations are the easiest to solve, as the observed data include all the information necessary to estimate the missing data distribution. The nmar case is trickier, as outside information has to be used to model the missing data distribution.

2.3.3 Learning Θ with incomplete data

With mcar data, the first and simplest possible approach is complete case analysis. This is a parameter estimation based on Dco, the set of completely observed cases in Do. When D is mcar, the estimator based on Dco is unbiased. However, with a high number of variables, the probability that a case [X_1^l . . . X_n^l] is completely measured is low, and Dco may be empty.

One advantage of Bayesian networks is that, if only Xi and Pi = Pa(Xi) are measured, then the corresponding conditional probability table can be estimated. Another possible method with mcar cases is available case analysis, i.e. using for the estimation of each conditional probability P(Xi|Pa(Xi)) the cases in Do where Xi and Pa(Xi) are measured, not only those in Dco (where all the Xi's are measured) as in the previous approach.
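The contrast between the two selection rules can be sketched as follows, with None marking a missing entry (the row layout is ours). Available case analysis typically retains strictly more rows per conditional table than complete case analysis:

```python
def complete_cases(rows):
    """Complete case analysis: keep only rows where every variable is observed."""
    return [r for r in rows if all(v is not None for v in r.values())]

def available_cases(rows, needed):
    """Available case analysis: keep every row where Xi and Pa(Xi) are observed,
    regardless of the other variables."""
    return [r for r in rows if all(r[v] is not None for v in needed)]
```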

Many methods try to rely more on all the observed data. Among them are sequential updating (Spiegelhalter and Lauritzen, 1990), Gibbs sampling (Geman and Geman, 1984), and expectation maximisation (EM) (Dempster et al., 1977). Those algorithms use the mar properties of the missing data. More recently, the bound and collapse algorithm (Ramoni and Sebastiani, 1998) and the robust Bayesian estimator (Ramoni and Sebastiani, 2000) try to solve this task whatever the nature of the missing data.

EM has been adapted by (Lauritzen, 1995) to Bayesian network parameter learning when the structure is known. Let log P(D|Θ) = log P(Do, Dm|Θ) be the data log-likelihood. Dm being an unmeasured random variable, this log-likelihood is also a random variable function of Dm. By establishing a reference model Θ*, it is possible to estimate the probability density of the missing data P(Dm|Θ*) and therefore to calculate Q(Θ : Θ*), the expectation of the previous log-likelihood:

Q(Θ : Θ*) = E_Θ*[log P(Do, Dm|Θ)]   (4)

So Q(Θ : Θ*) is the expectation of the likelihood of any set of parameters Θ, calculated using a distribution of the missing data P(Dm|Θ*). This equation can be re-written as follows:

Q(Θ : Θ*) = Σ_{i=1..n} Σ_{Xi=xk} Σ_{Pi=paj} N*ijk log θijk   (5)

where N*ijk = E_Θ*[Nijk] = N × P(Xi = xk, Pi = paj |Θ*) is obtained by inference in the network <G, Θ*> if {Xi, Pi} are not completely measured, or else by mere counting.

(Dempster et al., 1977) proved the convergence of the em algorithm, as well as the fact that it is not necessary to find the global optimum Θ^{i+1} of the function Q(Θ : Θ^i), but simply a value that increases Q (Generalized em).
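A minimal illustration of the E- and M-steps of Eqn.5 for the degenerate case of a single multinomial variable with missing entries (marked None). In this one-node case the expected counts N*_k need no network inference, only the current parameters; the full Bayesian-network version would compute N*_ijk by inference in <G, Θ*>. The function and layout are ours.

```python
def em_multinomial(values, domain, n_iter=50):
    """Parametric em for one multinomial variable with missing entries."""
    observed = [v for v in values if v is not None]
    n_missing = len(values) - len(observed)
    theta = {k: 1.0 / len(domain) for k in domain}   # uniform initialisation
    for _ in range(n_iter):
        # E-step: expected counts N*_k = observed count + expected missing mass
        expected = {k: sum(1 for v in observed if v == k) + n_missing * theta[k]
                    for k in domain}
        # M-step: maximise Q by normalising the expected counts
        total = sum(expected.values())
        theta = {k: expected[k] / total for k in domain}
    return theta
```

Under mcar missingness the fixed point is the maximum-likelihood estimate on the observed entries, which the iteration reaches geometrically.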

2.3.4 Learning G with incomplete data

The main methods for structural learning with incomplete data use the em principle: Alternative Model Selection em (ams-em) proposed by (Friedman, 1997), and Bayesian Structural em (bs-em) (Friedman, 1998). We can also cite the Hybrid Independence Test proposed in (Dash and Druzdzel, 2003), which can use em to estimate the essential sufficient statistics that are then used for an independence test in a constraint-based method. (Myers et al., 1999) also proposes a structural learning method based on genetic algorithms and mcmc. We will now explain the structural em algorithm principle in detail and see how we can adapt it to learn a tan model.

3 Structural em algorithm

3.1 General principle

The EM principle, which we have described above for parameter learning, applies more generally to structural learning (Algorithm 1, as proposed by (Friedman, 1997; Friedman, 1998)).

Algorithm 1: Generic em for structural learning
1: Init: i = 0
   Random or heuristic choice of the initial Bayesian network (G0, Θ0)
2: repeat
3:   i = i + 1
4:   (Gi, Θi) = argmax_{G,Θ} Q(G, Θ : Gi−1, Θi−1)
5: until |Q(Gi, Θi : Gi−1, Θi−1) − Q(Gi−1, Θi−1 : Gi−1, Θi−1)| ≤ ε

The maximization step in this algorithm (step 4) has to be performed in the joint space {G, Θ}, which amounts to searching for the best structure and the best parameters corresponding to this structure. In practice, these two steps are clearly distinct²:

Gi = argmax_G Q(G, • : Gi−1, Θi−1)   (6)

Θi = argmax_Θ Q(Gi, Θ : Gi−1, Θi−1)   (7)

where Q(G, Θ : G*, Θ*) is the expectation of the likelihood of any Bayesian network <G, Θ> computed using a distribution of the missing data P(Dm|G*, Θ*).

Note that the first search (Eqn.6), in the space of possible graphs, takes us back to the initial problem, i.e. the search for the best structure in

²The notation Q(G, • : . . . ) used in Eqn.6 stands for E_Θ[Q(G, Θ : . . . )] for Bayesian scores, or Q(G, Θo : . . . ) where Θo is obtained by likelihood maximisation.

Algorithm 2: Detailed em for structural learning
1: Init: finished = false, i = 0
   Random or heuristic choice of the initial Bayesian network (G0, Θ0,0)
2: repeat
3:   j = 0
4:   repeat
5:     Θi,j+1 = argmax_Θ Q(Gi, Θ : Gi, Θi,j)
6:     j = j + 1
7:   until convergence (Θi,j → Θi,jo)
8:   if i = 0 or |Q(Gi, Θi,jo : Gi−1, Θi−1,jo) − Q(Gi−1, Θi−1,jo : Gi−1, Θi−1,jo)| > ε then
9:     Gi+1 = argmax_{G∈VGi} Q(G, • : Gi, Θi,jo)
10:    Θi+1,0 = argmax_Θ Q(Gi+1, Θ : Gi, Θi,jo)
11:    i = i + 1
12:  else
13:    finished = true
14:  end if
15: until finished

a super-exponential space. However, with Generalised em it is sufficient to look for a better solution rather than the best possible one, without affecting the convergence properties of the algorithm. This search for a better solution can then be done in a limited space, like for example VG, the set of the neighbours of graph G that are generated by removal, addition or inversion of an arc.

Concerning the search in the space of parameters (Eqn.7), (Friedman, 1997) proposes repeating the operation several times, using a clever initialisation. This step then amounts to running the parametric em algorithm for each structure Gi, starting with structure G0 (steps 4 to 7 of Algorithm 2). The two structural em algorithms proposed by Friedman can therefore be considered as greedy search algorithms, with em parameter learning at each iteration.

3.2 Choice of function Q

We now have to choose the function Q that will be used for structural learning. The likelihood used for parameter learning is not a good indicator for determining the best graph, since it gives more importance to strongly connected structures. Moreover, it is impossible to compute the marginal likelihood when data are incomplete, so it is necessary to rely on an efficient approximation like those reviewed by (Chickering and Heckerman, 1996). In complete data cases, the most frequently used measurements are the bic/mdl score and the Bayesian bde score (see paragraph 2.1). When proposing the ms-em and mwst-em algorithms, (Friedman, 1997) shows how to use the bic/mdl score with incomplete data, by applying the principle of Eqn.4 to the bic score (Eqn.1) instead of the likelihood. Function QBIC is defined as the bic score expectation under a certain probability density on the missing data P(Dm|G*, Θ*):

QBIC(G, Θ : G*, Θ*) = E_{G*,Θ*}[log P(Do, Dm|G, Θ)] − (log N / 2) · Dim(G)   (8)

As the bic score is decomposable, so is QBIC:

QBIC(G, Θ : G*, Θ*) = Σi Qbic(Xi, Pi, ΘXi|Pi : G*, Θ*)   (9)

where

Qbic(Xi, Pi, ΘXi|Pi : G*, Θ*) = Σ_{Xi=xk} Σ_{Pi=paj} N*ijk log θijk − (log N / 2) · Dim(ΘXi|Pi)   (10)

with N*ijk = E_{G*,Θ*}[Nijk] = N × P(Xi = xk, Pi = paj |G*, Θ*), obtained by inference in the network <G*, Θ*> if {Xi, Pi} are not completely measured, or else by mere counting. With the same reasoning, (Friedman, 1998) proposes the adaptation of the bde score to incomplete data.

4 TAN-EM, a structural EM for classification

(Leray and Francois, 2005) have introduced mwst-em, an adaptation of mwst that deals with incomplete datasets. The approach we propose here uses the same principles in order to efficiently learn tan classifiers from incomplete datasets.

4.1 MWST-EM, a structural EM in the space of trees

Step 1 of Algorithm 2, like in all the previous algorithms, deals with the choice of the initial structure. The choice of an oriented chain graph linking all the variables, proposed by (Friedman, 1997), seems even more judicious here, since this chain graph also belongs to the tree space. Steps 4 to 7 do not change: they deal with running the parametric em algorithm for each structure Bi, starting with structure B0.

There is a change from the regular structural em algorithm in step 9, i.e. the search for a better structure for the next iteration. With the previous structural em algorithms, we were looking for the best dag among the neighbours of the current graph. With mwst-em, we can directly get the best tree that maximises the function Q.

In paragraph 2.1, we briefly recalled that the mwst algorithm uses a similarity function between two nodes, based on the variation of the bic score depending on whether X_j is linked to X_i or not. This function can be summed up in the following (symmetrical) matrix:

$$\left[M_{ij}\right]_{1 \le i,j \le n} = \left[bic(X_i, X_j, \Theta_{X_i|X_j}) - bic(X_i, \emptyset, \Theta_{X_i})\right] \qquad (11)$$

where the local bic score is defined in Eqn. 2. Running a maximum (weight) spanning tree algorithm, like Kruskal's, on matrix M enables us to obtain the best tree T that maximises the sum of the local scores over all the nodes, i.e. the function BIC of Eqn. 2.
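The spanning-tree step can be sketched with a short Kruskal-style routine (our own illustration, not the paper's code): it greedily adds the highest-scoring edges that do not create a cycle, using a union-find structure.

```python
def maximum_spanning_tree(M):
    """Kruskal's algorithm on a symmetric score matrix M (list of lists).

    Returns the edges (i, j) of the spanning tree that maximises the sum
    of M[i][j], i.e. the tree maximising the decomposed score."""
    n = len(M)
    parent = list(range(n))

    def find(x):                       # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # sort candidate edges by decreasing score, greedily keep acyclic ones
    edges = sorted(((M[i][j], i, j) for i in range(n) for j in range(i + 1, n)),
                   reverse=True)
    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                   # edge joins two components: no cycle
            parent[ri] = rj
            tree.append((i, j))
            if len(tree) == n - 1:
                break
    return tree
```

The same routine applies unchanged to the expected-score matrix M^Q of Eqn. 12, since only the edge weights differ between the complete-data and incomplete-data cases.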

By applying the principle described in section 3.2, we can then adapt mwst to incomplete data by replacing the local bic score of Eqn. 11 with its expectation; to do so, we use a certain probability density over the missing data P(D_m | T*, Θ*):

$$\left[M^Q_{ij}\right]_{i,j} = \left[Q^{bic}(X_i, P_i = X_j, \Theta_{X_i|X_j} : T^*, \Theta^*) - Q^{bic}(X_i, P_i = \emptyset, \Theta_{X_i} : T^*, \Theta^*)\right] \qquad (12)$$

With the same reasoning, running a maximum (weight) spanning tree algorithm on matrix M^Q enables us to get the best tree T that maximises the sum of the local scores over all the nodes, i.e. the function Q^BIC of Eqn. 9.

4.2 TAN-EM, a structural EM for classification

The score used to find the best tan structure is very similar to the one used in mwst, so we can adapt it to incomplete datasets by defining the following score matrix:

$$\left[M^Q_{ij}\right]_{i,j} = \left[Q^{bic}(X_i, P_i = \{C, X_j\}, \Theta_{X_i|X_j C} : T^*, \Theta^*) - Q^{bic}(X_i, P_i = C, \Theta_{X_i|C} : T^*, \Theta^*)\right] \qquad (13)$$

Using this new score matrix, we can use theapproach previously proposed for mwst-em to

Learning the Tree Augmented Naive Bayes Classifier from incomplete datasets 95


get the best augmented tree, and connect the class node to all the other nodes to obtain the tan structure. We are currently using the same reasoning to find the best "forest" extension.
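As an illustration of this construction (our own sketch, with hypothetical names), the undirected spanning tree over the attributes is oriented away from an arbitrary root, and the class node is then made a parent of every attribute:

```python
from collections import defaultdict, deque

def tan_structure(n_attrs, tree_edges, class_node="C", root=0):
    """Return the directed arcs (parent, child) of a TAN classifier.

    `tree_edges` are the undirected edges of the maximum spanning tree over
    attributes 0..n_attrs-1; the tree is oriented away from `root`, and the
    class node is connected to every attribute."""
    adj = defaultdict(list)
    for i, j in tree_edges:
        adj[i].append(j)
        adj[j].append(i)

    arcs = [(class_node, x) for x in range(n_attrs)]   # C -> every attribute
    seen, queue = {root}, deque([root])
    while queue:                                       # BFS orients the tree
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                arcs.append((u, v))                    # attribute u -> v
                queue.append(v)
    return arcs
```

Every attribute thus ends up with the class node as a parent, plus at most one other attribute as a parent, which is exactly the tan constraint.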

4.3 Related works

(Meila-Predoviciu, 1999) applies the mwst algorithm and the em principle, but in another framework: learning mixtures of trees. In that work, the data is complete, but a new variable is introduced in order to take into account the weight of each tree in the mixture. This variable is not measured, so em is used to determine the corresponding parameters.

(Pena et al., 2002) propose a change within the framework of the sem algorithm, resulting in an alternative approach for learning Bayesian networks for clustering more efficiently.

(Greiner and Zhou, 2002) propose maximizing the conditional likelihood for BN parameter learning. They apply their method to mcar incomplete data by using available-case analysis in order to find the best tan classifier.

(Cohen et al., 2004) deal with tan classifiers and the em principle for partially unlabelled data. In their work, only the variable corresponding to the class can be partially missing, whereas any variable can be partially missing in our tan-em extension.

5 Experiments

5.1 Protocol

The experimental stage aims at evaluating the Tree Augmented Naive Bayes classifier on incomplete datasets from the UCI repository3: Hepatitis, Horse, House, Mushrooms and Thyroid.

The tan-em method we propose here is compared to the Naive Bayes classifier with em parameter learning. We also report the classification rates obtained by three other methods: mwst-em, sem initialised with a random chain, and sem initialised with the tree given by mwst-em (sem+t). The first two methods are

3http://www.ics.uci.edu/∼mlearn/MLRepository.html

dedicated to classification tasks, while the others do not consider the class node as a specific variable.

We also give an α confidence interval for each classification rate, based on Eqn. 14, proposed by (Bennani and Bossaert, 1996):

$$I(\alpha, N) = \frac{T + \frac{Z_\alpha^2}{2N} \pm Z_\alpha \sqrt{\frac{T(1-T)}{N} + \frac{Z_\alpha^2}{4N^2}}}{1 + \frac{Z_\alpha^2}{N}} \qquad (14)$$

where N is the number of samples in the dataset, T is the classification rate, and Z_α = 1.96 for α = 95%.
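Eqn. 14 can be transcribed directly into a few lines of Python (our own illustration; it coincides with the well-known Wilson score interval for a binomial proportion):

```python
import math

def classification_rate_interval(T, N, z=1.96):
    """Confidence interval of Eqn. 14 for a classification rate T
    measured on N test samples; z = 1.96 corresponds to alpha = 95%."""
    centre = T + z * z / (2 * N)
    half = z * math.sqrt(T * (1 - T) / N + z * z / (4 * N * N))
    denom = 1 + z * z / N
    return (centre - half) / denom, (centre + half) / denom
```

For instance, with T = 0.754 and N = 65 (tan-em on Hepatitis), it yields approximately [63.7; 84.3] %, matching the interval reported in Table 1 up to the rounding of T.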

5.2 Results

The results are summed up in Table 1. First, we can see that even if the Naive Bayes classifier often gives good results, the other tested methods make it possible to obtain better classification rates. However, whereas all runs of nb-em give the same results, since em parameter learning only needs an initialisation, the other methods do not always give the same results, and therefore not the same classification rates. We have also noticed (not reported here) that, except for nb-em, tan-em seems to be the most stable method with respect to the evaluated classification rate, while mwst-em seems to be the least stable.

The sem method can obtain very good structures given a good initialisation; initialising it with the tree obtained by mwst-em (sem+t) thus gives more stable results (see (Leray and Francois, 2005) for a more specific study of this point).

In our tests, except for the House dataset, tan-em always obtains a structure that leads to better classification rates in comparison with the other structure learning methods.

Surprisingly, we also remark that mwst-em can give good classification rates even though the class node is connected to a maximum of two other attributes.

Regarding the log-likelihoods reported in Table 1, we see that the tan-em algorithm finds structures that can also lead to a good approximation of the underlying probability distribution of the data, even with a strong constraint on the graph structure.

Finally, Table 1 illustrates that tan-em and mwst-em have about the same complexity (regarding the computational time) and

96 O. C.H. François and P. Leray


Datasets   |  N | learn | test | #C |  %I  | NB-EM            | MWST-EM          | TAN-EM           | SEM              | SEM+T
-----------|----|-------|------|----|------|------------------|------------------|------------------|------------------|-----------------
Hepatitis  | 20 |    90 |   65 |  2 |  8.4 | 70.8 [58.8;80.5] | 73.8 [62.0;83.0] | 75.4 [63.6;84.2] | 66.1 [54.0;76.5] | 66.1 [54.0;76.5]
           |    |       |      |    |      | -1224.2 ; 29.5   | -1147.6 ; 90.4   | -1148.7 ; 88.5   | -1211.5 ; 1213.1 | -1207.9 ; 1478.5
Horse      | 28 |   300 |  300 |  2 | 88.0 | 75.0 [63.5;83.8] | 77.9 [66.7;86.2] | 80.9 [69.9;88.5] | 66.2 [54.3;76.3] | 66.2 [54.3;76.3]
           |    |       |      |    |      | -5589.1 ; 227.7  | -5199.6 ; 656.1  | -5354.4 ; 582.2  | -5348.3 ; 31807  | -5318.2 ; 10054
House      | 17 |   290 |  145 |  2 | 46.7 | 89.7 [83.6;93.7] | 93.8 [88.6;96.7] | 92.4 [86.9;95.8] | 92.4 [86.9;95.8] | 93.8 [88.6;96.7]
           |    |       |      |    |      | -2203.4 ; 110.3  | -2518.0 ; 157.0  | -2022.2 ; 180.7  | -2524.4 ; 1732.4 | -2195.8 ; 3327.2
Mushrooms  | 23 |  5416 | 2708 |  2 | 30.5 | 92.8 [91.7;93.8] | 74.7 [73.0;73.4] | 91.3 [90.2;92.4] | 74.9 [73.2;76.5] | 74.9 [73.2;76.5]
           |    |       |      |    |      | -97854 ; 2028.9  | -108011 ; 6228.2 | -87556 ; 5987.4  | -111484 ; 70494  | -110828 ; 59795
Thyroid    | 22 |  2800 |  972 |  2 | 29.9 | 95.3 [93.7;96.5] | 93.8 [92.1;95.2] | 96.2 [94.7;97.3] | 93.8 [92.1;95.2] | 93.8 [92.1;95.2]
           |    |       |      |    |      | -39348 ; 1305.6  | -38881 ; 3173.0  | -38350 ; 3471.4  | -38303 ; 17197   | -39749 ; 14482

Table 1: First line of each dataset: best classification rate (over 10 runs, except Mushrooms over 5, in %) on the test dataset and its confidence interval, for the learning algorithms nb-em, mwst-em, tan-em, sem and sem+t. Second line: log-likelihood estimated on the test data and calculation time (sec) for the network with the best classification rate. The first six columns give the name of the dataset and some of its properties: number of attributes, learning sample size, test sample size, number of classes and percentage of incomplete samples.

are a good compromise between nb-em (classical Naive Bayes with em parameter learning) and sem (greedy search with incomplete data).

6 Conclusions and prospects

Bayesian networks are a tool of choice for reasoning under uncertainty with incomplete data. However, most of the time, Bayesian network structural learning deals only with complete data. We have proposed here an adaptation of the learning process of the Tree Augmented Naive Bayes classifier to incomplete datasets (and not only partially labelled data). This method has been successfully tested on several datasets.

We have seen that tan-em is a good classification tool compared to the other Bayesian networks we could obtain with structural-em-like learning methods.

Our method can easily be extended to unsupervised classification tasks by adding a new step in order to determine the best cardinality for the class variable.

Related future works are the adaptation of some other Augmented Naive Bayes classifiers to incomplete datasets (fan, for instance), but also the study of these methods on mar datasets.

mwst-em, tan-em and sem are respective adaptations of mwst, tan and greedy search to incomplete data. These algorithms work in (subspaces of) the dag space. (Chickering and Meek, 2002) proposed an optimal search algorithm (ges) which deals with the Markov-equivalent space. Logically enough, the next step in our research is to adapt ges to incomplete datasets. We could then test the results of this method on classification tasks.

7 Acknowledgement

This work was supported in part by the IST Programme of the European Community, under the PASCAL Network of Excellence, IST-2002-506778. This publication only reflects the authors' views.

References

Y. Bennani and F. Bossaert. 1996. Predictive neural networks for traffic disturbance detection in the telephone network. In Proceedings of IMACS-CESA'96, page xx, Lille, France.

D. Chickering and D. Heckerman. 1996. Efficient approximation for the marginal likelihood of incomplete data given a Bayesian network. In UAI'96, pages 158–168. Morgan Kaufmann.

D. Chickering and C. Meek. 2002. Finding optimal Bayesian networks. In Adnan Darwiche and Nir Friedman, editors, Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence (UAI-02), pages 94–102, S.F., Cal. Morgan Kaufmann Publishers.

D. Chickering, D. Geiger, and D. Heckerman. 1995. Learning Bayesian networks: Search methods and experimental results. In Proceedings of the Fifth Conference on Artificial Intelligence and Statistics, pages 112–128.

C.K. Chow and C.N. Liu. 1968. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14(3):462–467.

I. Cohen, F. G. Cozman, N. Sebe, M. C. Cirelo, and T. S. Huang. 2004. Semisupervised learning of classifiers: Theory, algorithms, and their application to human-computer interaction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(12):1553–1568.

D. Dash and M.J. Druzdzel. 2003. Robust independence testing for constraint-based learning of causal structure. In Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI03), pages 167–174.

A. Dempster, N. Laird, and D. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, B 39:1–38.

P. Domingos and M. Pazzani. 1997. On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning, 29:103–130.

N. Friedman, D. Geiger, and M. Goldszmidt. 1997. Bayesian network classifiers. Machine Learning, 29(2-3):131–163.

N. Friedman. 1997. Learning belief networks in the presence of missing values and hidden variables. In Proceedings of the 14th International Conference on Machine Learning, pages 125–133. Morgan Kaufmann.

N. Friedman. 1998. The Bayesian structural EM algorithm. In Gregory F. Cooper and Serafín Moral, editors, Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 129–138, San Francisco, July. Morgan Kaufmann.

D. Geiger. 1992. An entropy-based learning algorithm of Bayesian conditional trees. In Uncertainty in Artificial Intelligence: Proceedings of the Eighth Conference (UAI-1992), pages 92–97, San Mateo, CA. Morgan Kaufmann Publishers.

S. Geman and D. Geman. 1984. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(6):721–741, November.

R. Greiner and W. Zhou. 2002. Structural extension to logistic regression. In Proceedings of the Eighteenth Annual National Conference on Artificial Intelligence (AAAI-02), pages 167–173, Edmonton, Canada.

D. Heckerman, D. Geiger, and M. Chickering. 1995. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20:197–243.

S. Lauritzen. 1995. The EM algorithm for graphical association models with missing data. Computational Statistics and Data Analysis, 19:191–201.

P. Leray and O. Francois. 2005. Bayesian network structural learning and incomplete data. In Proceedings of the International and Interdisciplinary Conference on Adaptive Knowledge Representation and Reasoning (AKRR 2005), Espoo, Finland, pages 33–40.

M. Meila-Predoviciu. 1999. Learning with Mixtures of Trees. Ph.D. thesis, MIT.

J.W. Myers, K.B. Laskey, and T.S. Lewitt. 1999. Learning Bayesian networks from incomplete data with stochastic search algorithms. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI99).

J.M. Pena, J. Lozano, and P. Larranaga. 2002. Learning recursive Bayesian multinets for data clustering by means of constructive induction. Machine Learning, 47(1):63–90.

M. Ramoni and P. Sebastiani. 1998. Parameter estimation in Bayesian networks from incomplete databases. Intelligent Data Analysis, 2:139–160.

M. Ramoni and P. Sebastiani. 2000. Robust learning with missing data. Machine Learning, 45:147–170.

D.B. Rubin. 1976. Inference and missing data. Biometrika, 63:581–592.

J.P. Sacha. 1999. New Synthesis of Bayesian Network Classifiers and Cardiac SPECT Image Interpretation. Ph.D. thesis, The University of Toledo.

D. J. Spiegelhalter and S. L. Lauritzen. 1990. Sequential updating of conditional probabilities on directed graphical structures. Networks, 20:579–605.


Page 109: Proceedings of the Third European Workshop on Probabilistic

!"#$% &'(*)+# %,.-/0'1234

57698:<;=?><@A;B8:DCFE!G?;B;IHDJKL69M N%;OC8DP'PQ6 NRJ<;B8:TSUCFV&E;W5X>YG!CFC8DC8Z!CF[;IE%V%\]C8^V#PB_a`b8D_cPBE\];IV%6dPQ8T;B8:e=PQ\$[f<V%698DH]KLgF6dC8<gRChFJYiV&E%Cgkj^V!i#86d@BCFEh%6dVmlBJ

SU>onp>qPsrTtIuL>vuQtBw'JxByIuQtz,q2iV&E%Cgkj^VFJYV%jDC4|#CFV%jDCFEM9;B8:<hF>CR~\];B6MAL<L^^L0mQ'D^DLU'"

as ¡B`¢8*\W;B8^l£E%C;BM [E%PB¤MdC\¥:DPQ\W;B698<hFJ¦V%j<CW\];B698§@I;IE69;I¤YMdCPB_68^V&CFE%Ch%V¨¤Cj<;@BChp\$PQ8DPBV&PQ8<6gF;BM9Mdle68V&CFEk\]hUPB_V%jDCPB¤h&CFE@A;I¤M9C©@I;IE69;I¤M9ChFJB6984V%j<Ch%C8<h&CV%j<;IVj<6dHQjDCFE0@I;BM9fDChU_ªPBE0V%jDC@I;IE69;I¤M9CXPB_Y68^V&CFE%Ch%V¤CgRPQ\$C"\$PBECM6d«BCMdl¨¬69V%jj<69HQjDCFE&PBE:DCFEC:PB¤Yh&CFE%@I;IV%6dPQ8<hF>0i#8<_ªPBE%V%f8<;IV&CMdlBJBCh&V%;I¤M69h%j<698<H¬jDCFV%j<CFEPBE8DPBVW;*q;slBCh%69;B8­8DCFV~¬PBE%«®CRrLj6d¤6dV%hV%jDCh%Ce\$PQ8DPBV&PQ869gF6dV~l¯[<E%PB[CFE%V%69Ch]69h$j<6dHQjMdl°698V&E;BgRV%;I¤M9C68°HBC8<CFE;BM>¯`¢8¯V%j<69h[Y;I[±CFEsJ ¬©C²[E%Ch&C8V;§\]CFV%jDP':®V%j;IVFJX¤'l°¤f69M9:<698<HTfD[PQ8°V%j<CTgRPQ8<gRCF[<V$PB_;Bhh%6dHQ8<\$C8V$M9;IV&V%69gRCBJ[<E%P@'69:<Ch_ªPBE]69:<C8^V%6d_clL698DH³;B8^l¯@'69PQM9;IV%6dPQ8<hPB_!V%jDC´[<E%PB[CFE%V%6dChPB_$µc[;IE%V%6;BMª¶\]PQ8DPBV&PQ8<69gF6dVmlPB_YV%jDC"PQfDV&[f<V0;B8:_ªPBEXgRPQ8h&V&Ef<gRV%698<H?\W698<69\];BMPI·C8<:<698DHgRPQ8V&CRr'V%h>¸§C69M9M9fh&V&E;IV&CV%j<C¨;I[<[M969gF;IV%69PQ8´PB_UPQf<E,\$CFV%jDPL:²¬6dV%j²;$EC;BM¦q;lBCh%69;B8£8DCFVm¬©PBE%«²698´@BCFV&CFEk698<;IE%l´h%gF6dC8<gRCB>

¹ º»a¼B½Q¾¿XÀXÁ'¼QÂ~¾7»

`b8\];B8l´[<E%PB¤MdC\Ã:DPQ\];B698<hJ<V%jDC¨@I;IE69;I¤M9Ch"PB_ 69\[PBE%V%;B8<gRC$j<;s@BC:6Ä·±CFEC8^V!E%PQMdChF>4n!_cV&C8J;´8f\4¤CFEPB_"PB¤h&CFE@A;I¤M9C]698D[fDV@I;IE69;I¤YMdCh¨;B8<:°;Th%698DHQM9C$PQfDVb[fDV#@A;IEk69;I¤MdC4;IEC:<69h%V%698DHQf<69hjDC:>`¢8e;W¤69PQ\$C:<69gF;BM:<69;IHQ8<PQh&V%69gT;I[<[YM969gF;IV%6dPQ8J_cPBEWCRrD;B\$[MdCBJV%jDC£698D[fDV@I;IE69;I¤MdChgF;I[<V%f<E%CV%jDC?Å8<:<68DHQhX_cE%PQ\/:<6d·±CFE%C8V":<6Ä;IHQ8DPQh&V%6g²V&Ch&V%hW;B8:ÆV%jDC²PQfDV&[f<V$@A;IE6;I¤MdC²\$PL:DCM9hV%jDC[±PQhh%6d¤MdC]:69h&C;Bh&ChF>Çef<MdV%69[MdC$698D[f<V@A;IE6;I¤MdCh;B8<:$;h%68DHQMdCPQfDV&[fDV0@I;IE69;I¤MdC698p_È;BgRV ;IE%C"V~l'[69gF;BM9M9l_cPQf<8<:²698²;B8l´V~l'[±CPB_:<6;IHQ8DPQh&V%69g?[<EPB¤MdC\²>ÉDPBE#\];B8^lW[<E%PB¤MdC\WhFJV%j<C!ECM9;IV%6dPQ8¤CFV~¬CFC8²V%jDCPQfDV&[f<V¨@A;IEk69;I¤MdCW;B8:*V%jDCPB¤h&CFE%@I;I¤MdCW68D[fDV?@I;IE6Ä;I¤MdCh®69hÆ\]PQ8DPBV&PQ8DCÊ698V%jDCÊh&C8<h&C2V%j<;IVËj6dHQjDCFE&PBE:DCFEC:§@A;BMfDCh_cPBE¨V%jDCW698<[fDV?@A;IEk69;I¤MdCh?HQ6d@BCWE69h&CV&P®;Ìj<6dHQj<CFE&PBE:DCFE%C:ËPQfDV&[fDV_ªPBEV%jDCe\W;B698Æ@I;IE6Ä;I¤MdCPB_ 698V&CFE%Ch&VF>`b8£;]¤6dPQ\]C:<69gF;BM7:69;IHQ8DPQh&V%69g;I[D[M96gF;IV%6dPQ8JI_ªPBE0CRrD;B\$[MdCBJBPB¤Yh&CFE%@L698DH#h%l'\$[V&PQ\]h;B8<:h%6dHQ8h"V%j<;IV;IE%C\$PBE%C4h&CF@BCFEC4¬69MMYECh%f<MdV"698T;W\$PBE%Ch&CF@BCFE%C?:<69h%C;Bh&C¤C698DH4V%jDC#\$PQh%V©M96d«BCM9lp@I;BM9fDCPB_¦V%jDC:<69;IHQ8<PQh&V%69g?@I;IE69;I¤MdCB> zjDC¨gRPQ8<gRCF[<V,PB_\$PQ8<PBV&PQ8<69g6dVmle698³:<69h%V&E6d¤fDV%69PQ8j<;Bh?¤CFC8§698V&E%PL:<f<gRC:eV&PTgF;I[DV%fDE%CeV%j<69hWV~l'[±CePB_«'8<Ps¬MdC:<HBC_cPBEq;lBCh69;B828DCFVb¬PBE%«'h?µcÍ ;B8:DCFEG?;B;IHTÎFÏ ÐBÑo>dJDÒIuBuIÓ¶k>ÇPBE%C#h%[±CgF69_69gF;BM9M9lBJ;48<CFV~¬PBE%«W69h©h%;B69:]V&Pp¤C69h&PBV&PQ8<C#6d_V%jDC!gRPQ8L:<6dV%69PQ8<;BM[E%PB¤;I¤69M6dV~lÆ:<69h%V&E6d¤fDV%69PQ8ËgRPQ\$[fDV&C:­_cPBEV%jDCPQfDV&[fDV@A;IEk69;I¤MdCHQ6d@BC8®h&[CgF6dÅg$PB¤h%CFE%@A;IV%69PQ8<h69hh&V&PLgj;Bh&V%69gF;BM9Mdl³:DPQ\]68<;IV&C:³¤l§;B8l£hf<gj*:69h&V&E6Ä

¤fDV%69PQ8´HQ6d@BC8Tj<6dHQj<CFE&PBE:DCFE%C:´PB¤Yh&CFE%@I;IV%6dPQ8<hF>i!8D_cPBE%V%f<8<;IV&CMdlBJÔV%j<CÕ[<E%PB¤YMdC\ PB_¥@BCFE69_ªlL698DH¬jDCFV%j<CFE©PBE"8DPBV;q;slBCh%69;B8²8DCFVm¬©PBE«WCRrDj<6d¤6dV%hXV%jDC[<E%PB[CFE%V%69ChPB_?\$PQ8DPBV&PQ869gF6dV~l°_ªEPQ\Ô6dV%h$:DPQ\];B698®PB_;I[<[M69gF;IV%6dPQ86h#j<69HQj<Mdl²698V&E;BgRV%;I¤MdC698eHBC8DCFE;BM0;B8<:E%C\];B68<h?V&P¤C]h&P²CF@BC8°_ªPBE4[±PQM9lV&E%CFCh>WÖ#M9V%jDPQfDHQj;B8*;I[<[E%PrD69\];IV&CW;B8l'V%69\$CW;BMdHBPBEk6dV%j<\¥69h¨;s@A;B6M9;I¤MdC_cPBE*:DCgF69:<698<H×6d_];2HQ6d@BC88DCFVm¬©PBE«Ø69h³\$PQ8<PBV&PQ8DCBJ¬C_ªPQf<8:V%j;IV69V%hEf<8V%69\$C!E%CÙf6dE%C\$C8V%hV&C8<:´V&P_cPBE%Ch&V%;BM9M¦fh&C¨698²;$[<Ek;BgRV%69gF;BM¦h&CFV&V%698<HD>`b8§V%j<69h?[Y;I[±CFEsJ¦¬©C[<E%Ch&C8V¨;T8DCF¬4Ja\$PBE%CW[<E;BgV%69gF;I¤M9C­\$CFV%j<P':_cPBEËh&V%f<:DlL698DHØ\$PQ8DPBV&PQ8<69gF69V~lÚPB_q;lBCh%6;B8Û8DCFV~¬PBE%«LhF>Üz,jDC³\$CFV%jDPL:­¤f<69M:<hWfD[PQ8;®M9;IV&V%69gRC£PB_;BM9M N&PQ698^V@I;BM9fDC³;Bh%h%6dHQ8<\]C8^V%hV&P®V%jDCPB¤h&CFE@A;I¤M9CÌ@I;IE69;I¤MdCh£f<8<:<CFEeh&V%f<:<lYÝV%j<69h£M9;IV&V%69gRC\$PBE%CFP@BCFEJX69hC8<j<;B8gRC:̬69V%jÌ698D_cPBE\];IV%69PQ8Ì;I¤PQfDVV%jDC°CR·±CgRV%h£PB_$V%jDC°@A;IE69PQf<h;Bh%h6dHQ8<\$C8V%hTPQ8ÞV%jDC[<E%PB¤Y;I¤69M96dVml:<69h%V&E6d¤fDV%69PQ8P@BCFE4V%jDCW\W;B698³@A;IEk69;I¤MdCPB_698V&CFE%Ch&VF>§z,j<C;Bh%h6dHQ8<\$C8VM9;IV&V%6gRC´6hf<h%C:Ì_cPBE69:DC8V%6d_cl'68DHÌ;B8lß@L6dPQM9;IV%6dPQ8<hPB_¨V%jDC³[<EPB[±CFEV%6dChWPB_\$PQ8DPBV&PQ869gF6dV~l698àV%jDCÊ8DCFV~¬PBE%«±> KLfD¤h%CÙfDC8V%MdlBJ\]69869\];BMBPI·C8<:<698<H#gRPQ8V&CRr'V%h;IE%CgRPQ8<h&V&Ekf<gRV&C:pV%j<;IVgj;IE;BgRV&CFE69h&CÌV%jDC*69:DC8V%6dÅ<C:­@L6dPQM9;IV%6dPQ8<h²;B8:2[<E%PI@L69:DC?_ªPBE#_cf<E%V%jDCFE,698@BCh&V%6dHQ;IV%6dPQ8¦>z,j<C#Ef8^V%69\]C#gRPQ\$[M9CRrL6dVmlWPB_aPQfDE"\$CFV%jDPL:6hXCRr'[PQ8DC8V%69;BMY68V%j<C!8'f<\¤±CFEPB_¦PB¤Yh&CFE%@I;I¤MdC!@A;IE6;I¤MdChf<8<:<CFEh&V%f<:<lB>+ÉDPBE²M9;IE%HBCFE´8DCFV~¬PBE%«Lh²698<gFM9f:<698DH§;M9;IE%HBC]8'f<\¤CFE!PB_PB¤h&CFE@A;I¤M9Cp@I;IE69;I¤YMdCh!V%jDCFE%CF_cPBE%CBJPQfDEX\]CFV%jDP':[E%Ps@L69:DCh_cPBE @BCFE69_ªlL698DH![;IE%V%69;BM\$PQ8DPI

Page 110: Proceedings of the Third European Workshop on Probabilistic

V&PQ8<6gF6dV~lÞ_ªPBE*M96\]6dV&C:+hfD¤h&CFV%h³PB_@A;IE6;I¤MdCh£PQ8<MdlB>z,j<Cf<8D_È;@BPQf<E;I¤MdCgRPQ\$[fDV%;IV%69PQ8<;BMagRPQ\][MdCRrD6dV~lTPB_V%jDC[<E%PB¤M9C\²JajDP¬©CF@BCFEsJX69h4M6d«BCMdl£V&Pe_cPBE%Ch&V%;BM9MXV%jDC:DCh6dHQ8´PB_aChh&C8^V%6;BM9Mdl´\$PBE%CCWgF6dC8V\$CFV%j<P':<h>¸§C";I[<[YM96dC:¨PQf<EU@BCFEk6dÅgF;IV%6dPQ8p\$CFV%jDPL:p_ªPBEh&V%f<:<l^698<H\]PQ8DPBV&PQ8<69gF6dVml´V&P;]q;lBCh69;B88DCFVm¬©PBE«T698´@BCFVbCFE68<;IE%l$h%gF6dC8<gRCB>`b8]ECgRC8^VlBC;IEhFJL¬C#:DCF@BCM9PB[±C:;8DCFVm¬©PBE«¨_ªPBE0V%jDC:DCFV&CgRV%6dPQ8PB_YgFM9;Bh%h%69gF;BM'h&¬68DCX_cCF@BCFE698®[Y6dHQhF>ßqPBV%jËV%j<C8DCFV~¬PBE%«vhh%V&Ef<gRV%fDE%CT;B8:Ë6dV%h;Bh%h%P'gF69;IV&C:®[E%PB¤;I¤69M6dV%6dCh¬CFE%C´CM969gF6dV&C:¯_cE%PQ\ V~¬PCRrL[CFE%V%hF>Z?fDE68DHTV%jDCCM969gF6dV%;IV%69PQ8Ì698V&CFE%@L6dCF¬hFJUV%jDCCRrL[CFE%V%hWj<;B:×[<E%PL:<f<gRC:×h&CF@BCFE;BM!h&V%;IV&C\$C8V%h´V%j<;IVh%f<HBHBCh&V&C:²[<E%PB[CFE%V%69Ch©PB_U\$PQ8DPBV&PQ8<6gF6dV~lB> ¸§Ch&V%f<:L6dC:×V%jDCh&C³[<EPB[±CFEV%6dChW698×V%jDC³8DCFV~¬PBE%«­f<h%698DH°PQfDE@BCFE69ÅgF;IV%6dPQ82\$CFV%jDPL:>ܸ§C§_cPQf<8<:2;Æh%\];BMM#8'f<\¤CFE?PB_©@L6dPQM9;IV%69PQ8<h!PB_©V%jDC$h%f<HBHBCh&V&C:*[<EPB[±CFEV%6dChPB_\$PQ8<PBV&PQ8<69gF6dVmlBJ7¬j<6gj£[<E%P@BC:*V&P²¤C]698<:<6gF;IV%6d@BCPB_\$PL:DCM9M698DH68<;B:DCÙ'f<;BgF6dChF>zjDC?[;I[CFE,69h"PBE%HQ;B8<6h&C:²;Bh,_cPQM9MdP¬hF>0`¢8£K'CgRV%6dPQ8Ò'J¬CE%CF@L6dCF¬ V%jDCegRPQ8<gRCF[<VWPB_¨\]PQ8DPBV&PQ8<69gF6dVmlB>×`b8K'CgRV%69PQ8*x'J7¬C$[<E%Ch&C8VPQfDE¨\$CFV%jDPL:§_cPBE¨h&V%f<:DlL698DH\$PQ8<PBV&PQ8<69gF6dVmlB>$¸§CWE%CF[PBE%V?PQ8§V%jDC;I[<[M969gF;IV%69PQ8ePB_PQfDE?\$CFV%jDPL:68³K'CgRV%69PQ8ÓD>!z,jDC4[Y;I[±CFE#C8<:<h#¬6dV%jPQfDE,gRPQ8gFM9f<:<698<HpPB¤Yh&CFE%@I;IV%6dPQ8<h,698KLCgRV%6dPQ8y'>

¾7»XÁ X¼e¾/¾7»X¾¼B¾7»XÂmÁD¼

i#[PQ8¨E%CF@L6dCF¬698<HV%jDCgRPQ8<gRCF[<V0PB_\$PQ8DPBV&PQ869gF6dV~lBJB¬C;Bh%hf<\$C³V%j<;IVT;®q;lBCh%69;B8Þ8DCFV~¬PBE%«2f<8<:DCFE´h&V%f<:<l698gFM9f<:DCh;£h698DHQMdCPQfDV&[fDV@I;IE69;I¤M9C ;B8<:­PB¤Dh&CFE@A;I¤M9CË@A;IEk69;I¤MdCh¢J "!###$!%J&(' IÝ698¯;B:<:<69V%6dPQ8J0V%jDC´8DCFV~¬PBE%«®\];l¯698<gFM9f<:<CW;B8Æ;IE%¤6ÄV&E;IEl­8'f<\¤±CFE²PB_$698V&CFE\$C:69;IV&C§@A;IEk69;I¤MdCh´¬j69gj;IE%C×8DPBV*PB¤h&CFE@BC:Ú698 [E;BgRV%69gRCB>*)X;Bgjà@I;IE69;I¤MdC+,]698+V%jDC°8<CFV~¬PBE%«Û;B:DPB[<V%hePQ8DC¯PB_];ËÅ8<69V&CÌh&CFV- µ.+,¶/ 021 !###3!%054 QJ6 '7 IJ©PB_#@I;BM9fDCh>*¸§C;Bh%hf<\$CV%j<;IVV%jDCFE%CCRrD69h&V%h;´V&PBV%;BMPBEk:DCFE698DH98PQ8V%j<6h"h&CFV,PB_0@A;BMfDChFÝD¬6dV%j<PQfDV"MdPQh%h"PB_0HBC8DCFE;BM6dV~lBJ<¬C;Bh%hf<\$CpV%j<;IV:0<; 80>= ¬jDC8DCF@BCFE@?A8CB±>pz,jDCpPBE&:DCFEk698DHQh[CFE¦@I;IE69;I¤MdC;IE%C©V%;I«BC84V&P#698<:<f<gRC ;,[;IE%V%69;BMPBE:<CFE698DH,DËPQ8WV%jDC#h%CFV PB_NbPQ698V @I;BM9fDC;Bhh%6dHQ8<\$C8V%hV&PW;B8l´h%fD¤h&CFV"PB_0V%jDC¨8<CFV~¬PBE%«>vh#@I;IE69;I¤MdCh>zjDC²gRPQ8<gRCF[<V$PB_E,FHG>FBÏIF"GJLKJÈÏNMOJ.GQP"JSRÏNTUJIVXWDÏNJNFHG8DP¬Ì¤f69M9:<hfD[PQ8V%jDCX[±PQh%V&CFE6dPBE7[E%PB¤;I¤69M6dV~l:69h&V&E6Ĥf<V%6dPQ8<h©Ps@BCFE"V%jDC?PQfDV&[f<V©@I;IE69;I¤M9C#HQ6d@BC8V%jDC?@I;IE6ÄPQf<hYN&PQ698V@A;BMfDC";Bh%h%6dHQ8<\]C8^V%h0V&P¨V%j<C,8DCFV~¬PBE%«vhPB¤Dh&CFE@A;I¤M9Cp@I;IE69;I¤YMdChWµcÍ ;B8Ì:<CFEpG?;B;IH®ÎÏ#ÐIÑ >dJÒIuBuIÓ¶k>z,j<C×gRPQ8<gRCF[<VFJ²\$PBEC­h&[CgF6dÅYgF;BM9MdlBJW69h¯:DCFÅ8<C:Ú698

V&CFE\Wh PB_h&V&PLgj<;Bh&V%6g#:DPQ\W698<;B8<gRC,;B\$PQ8<H4V%jDCh&C!:<69hbV&E69¤fDV%6dPQ8<h>7ÉDPBE;[<E%PB¤;I¤Y69M96dVml:<69h%V&E6d¤fDV%69PQ8SXEsµI4¶P@BCFEeV%jDCÌPQfDV&[fDVT@I;IE69;I¤YMdCBJ?V%jDC°gFf<\pf<M9;IV%69@BCÌ:<69hbV&E69¤fDV%6dPQ8¯_Èf<8<gRV%6dPQ8ZY\[^]69h$:DCFÅY8DC:Ë;Bh/Y\[^]RµN_ ¶`SXEsµI 8a_ ¶²_cPBE§;BM9M4@I;BM9fDCh9_ PB_b> ÉDPBE§V~¬P:<6h&V&E6d¤f<V%6dPQ8<h"S EAµI4¶#;B8<:³S E%cµI4¶,Ps@BCFEdJ¦;Bh%h&PLgF6Ä;IV&C:§¬#6dV%jeY2[<]µI¶!;B8<:eY [^].f µI¶#E%Ch&[CgRV%6d@BCMdlBJ¦¬Ch%;sl²V%j<;IV#SXEgcµI4¶"6h:RÏIF3KXh'Ð$RRÏNJNKkÐIÑcÑiMAPjF"EdJSGYÐHGÏ,P@BCFESXEsµI4¶kJX:DC8<PBV&C:ÆS EsµI4¶`8/SXE%cȵI¶kJX6d_@Y [<]Sf µN_ ¶`8Y [^] µN_ ¶_ªPBE;BM9M_ k - µI4¶TµÈq©CFEHBCFEJl wBtIu^¶k>ʸ§C8DP¬­h;l]V%j<;IVV%jDC?8DCFV~¬PBE%«W6h@JSRXFBÏIFHG±Î6986dV%hh&CFV©PB_PB¤h%CFE%@A;I¤YMdC?@A;IE6;I¤MdChm6d_

n D n co S EsµIqp n ¶r8ßS EsµIsp n c ¶

_cPBE#;BM9MLNbPQ698V,@I;BM9fDC¨;Bh%h6dHQ8<\$C8V%h n ! n c V&P`]>`¢8D_cPBE&\];BMMdlh&[C;I«L698DHDJ¬C]j<;@BC]V%j;IV¨;²8DCFV~¬PBE%«³69h69h&PIV&PQ8DC?698:<69h&V&E69¤fDV%6dPQ8]69_±C8V&CFE698<Hp;pj<6dHQjDCFE%PBE:DCFE%C:@I;BM9fDC­;Bh%h%6dHQ8<\]C8^V®V&PV%jDC­PB¤h&CFE%@I;I¤MdC­@A;IEk69;I¤MdChgF;B8<8<PBV\];I«BC´j<6dHQjDCFE%PBE:DCFE%C:Ì@A;BM9f<Ch¨PB_,V%jDCPQfDVb[f<V@I;IE69;I¤M9C£MdChhM6d«BCMdlB>Úz,j<C³gRPQ8gRCF[<V²PB_;B8V%6ÄV&PQ8<6gF6dV~lWj;BhV%jDC?E%CF@BCFEh&C¨698V&CFE%[<ECFV%;IV%6dPQ80V%jDC8DCFVb¬PBE%«´69h,h%;B69:´V&PW¤±CWÐHGÏNJcÏIFHG±Îp698b6d_

n D n c o S EsµIqp n ¶r'ßS EsµIsp n c ¶

_cPBE;BMM<@I;BM9fDC;Bh%h6dHQ8<\$C8V%h n ! n c V&Pt]> |#PBV&C?V%j<;IV6d_;²8DCFVm¬©PBE«£69h6h&PBV&PQ8DC]698§6dV%h?PB¤Yh&CFE%@I;I¤MdC@A;IEk69;I¤MdChHQ6d@BC8×V%jDCePBE:DCFE698<HQhu8 PQ8ßV%jDC69Eh&CFV%hPB_@I;BM9fDChFJV%jDC8TV%jDC¨8DCFV~¬PBE%«²69h,;B8V%6dV&PQ8DC4HQ6d@BC8²V%jDCE%CF@BCFEkh&C:PBE:<CFE698DHQhF> Ö#M9V%jDPQfDHQjØ;B8^V%6dV&PQ869gF6dV~lÊV%j'f<h³69h®µcE%CR@BCFEh%CMdlD¶¨CÙ'f<6d@I;BMdC8^VV&P§69h&PBV&PQ8<69gF69V~lBJ¬CCRrL[M96gF6dV%Mdl:<6h&V%698DHQf<6h%j¤CFVm¬©CFC8V%jDCXVm¬©P!V~l'[Ch7PB_D\$PQ8<PBV&PQ8<69g6dVml¯h698<gRCT;*:DPQ\W;B698ÆPB_?;I[<[M969gF;IV%69PQ8Æ\];l®CRrDj<6d¤6dV;B8§698V&E69gF;IV&C$gRPQ\¤698<;IV%69PQ8³PB_©69h&PBV&PQ869gF6dV~l£;B8<:³;B8LV%6dV&PQ869gF6dV~l_cPBE698V&CFE%E%CM9;IV&C:²PB¤Yh&CFE%@I;I¤MdC?@A;IEk69;I¤MdChF>qf<6M9:<698DHfD[PQ8V%j<C?;I¤P@BC4:DCFÅ8<69V%6dPQ8<hFJ¬©C¨j<;s@BCV%j<;IV´:DCgF69:698DH§¬jDCFV%j<CFEPBE²8DPBV´;Ìq;lBCh69;B8Ê8DCFVb¬PBE%«$69hX6h&PBV&PQ8DC#;B\]PQf<8^V%h©V&P4@BCFEk6d_ªlL698DHV%j;IV©C8V&CFE&698<HT;B8^l§j<6dHQj<CFE&PBE:DCFE%C:*@A;BM9f<C];Bh%h%6dHQ8<\]C8^V¨V&P£6dV%hPB¤h%CFE%@A;I¤YMdC´@A;IE6;I¤MdCh$E%Chf<MdV%h$698Æ;*h%V&P'gkj<;Bh&V%69gF;BM9M9l:DPQ\W698<;B8V©[<EPB¤;I¤69M969V~l]:<69h%V&E6d¤fDV%69PQ8]Ps@BCFEV%jDCPQfDVb[f<V @A;IEk69;I¤MdCB>v)Xh%V%;I¤M969h%j698DH¨;B8V%6dV&PQ8<69gF6dVml];B\$PQf<8V%hV&P²@BCFE69_ªlL698DH´V%j<;IVC8V&CFE698DHT;B8l³h%f<gjÌ;Bh%h%6dHQ8<\]C8^VE%Chf<MdV%h"698²;]:<PQ\]698<;IV&C:²:<6h&V&E6d¤f<V%6dPQ8>zjDC2[<E%PB¤MdC\ PB_³:DCgF69:<68DHØ\]PQ8DPBV&PQ8<69gF6dVmlà_cPBE;Ëq;slBCh%69;B88DCFV~¬PBE%«Û69h«'8<Ps¬8ÊV&P­¤C*gRPQ|#S [<[ gRPQ\$[YMdCFV&C¨698´HBC8DCFE;BMJ;B8<:²698´_È;BgRV,E%C\];B68<h"gRPQ|#S gRPQ\$[YMdCFV&C°_cPBE£[±PQM9lV&E%CFChƵcÍ ;B8:DCFE*G?;B;IHàÎFÏ´ÐIÑ >dJ

100 L. C. van der Gaag, S. Renooij, and P. L. Geenen

Page 111: Proceedings of the Third European Workshop on Probabilistic

ÒIuBuIÓ¶k>®`¢8®@L6dCF¬ PB_!V%jDCh&C²gRPQ\$[MdCRrD6dVml°gRPQ8<h%6:DCFE;AV%6dPQ8<hJYÍ ;B8£:DCFE¨G?;B;IHÌÎFÏ,ÐIÑ >±:<Ch%6dHQ8DC:e;B8£;I[[<E%Pr'69\];IV&C;B8^l'V%69\$CW;BM9HBPBE6dV%j<\¥_cPBE4@BCFEk6d_ªlL698DH²¬#jDCFV%jDCFEPBE]8DPBV];£HQ6d@BC8Æ8DCFVm¬©PBE«¯69h$\]PQ8DPBV&PQ8DCB>®z,j<69hp;BMÄHBPBE6dV%j\ h&V%f<:6dCh$V%jDCeE%CM9;IV%6dPQ8ߤ±CFVm¬©CFC8­V%jDCPQfDVb[fDV@A;IEk69;I¤MdC;B8<:ÌC;Bgkj®PB¤h&CFE%@I;I¤MdC@A;IEk69;I¤MdCh&CF[D;IE;IV&CMdlBJ©698ÌV&CFEk\]hpPB_,V%jDC´h%6dHQ8°PB_,V%jDC´Ùf<;BM6dV%;IV%6d@BC698YfDC8<gRC?¤CFV~¬CFC8TV%jDC\ µc¸§CM9M9\];B8J wBwIu^¶kÝfD[PQ8698<gRPQ8gFM9f<h%6d@BCÆECh%f<MdV%hFJ6dV&CFE;IV%6d@BCM9l V%6dHQj^V&C8<C:à8'fL\$CFE6gF;BM0¤±PQf8<:<h?;IE%C$Ch&V%;I¤M69h%jDC:£PQ8§V%jDC$ECMdCF@A;B8V[<E%PB¤Y;I¤69M96dV%69Ch¦f<h%698<H!;B8;B8lV%69\]C\]CFV%jDP':$;s@A;B6M9;I¤MdC_cE%PQ\ 5a69f*;B8<:°¸§CM9M9\W;B8×µ wBwBtQ¶k>§i#8<_ªPBE%V%f8<;IV&CMdlBJV%jDC*P@BCFE;BM9M4Ef<8V%69\$C§E%CÙ'f<6dE%C\]C8^V%h²PB_V%jDC°;BMdHBPIE6dV%j\ÞV&C8:V&P_ªPBE%Ch%V%;BM9MLf<h&C"698;?[<E;BgRV%69gF;BMDh%CFV&V%698DHD> a¼QÀX¿ Âm» /¾7»X¾¼B¾7»XÂmÁD¼n?fDE¥\$CFV%j<P': _cPBE h&V%f<:DlL698DH \$PQ8DPBV&PQ869gF6dV~l 698q;lBCh%6;B8 8DCFV~¬PBE%«Lh£8<Ps¬ ¤f<69M9:h²fD[PQ8ÛV%jDC¯gRPQ8Lh&V&EfgRV!PB_;B8³;Bh%h%6dHQ8<\]C8^VM9;IV&V%69gRCB>z,jDCM9;IV&V%69gRC$698LgFM9f<:<Ch;BM9M±N&PQ698^V@I;BM9fDC];Bhh%6dHQ8<\$C8V%h¨V&PV%jDCWh&CFV¨PB_PB¤h&CFE@A;I¤M9C@I;IE69;I¤YMdCh;B8<:]69h0C8j<;B8<gRC:$¬69V%j[<E%PB¤D;I¤69M69h&V%69g°698D_cPBE\];IV%6dPQ8gRPQ\$[f<V&C:_ªE%PQ\ V%jDCÆ8DCFVb¬PBE%«Ìf<8<:<CFEph&V%f:DlB>£É<E%PQ\ V%jDC²M9;IV&V%69gRCBJ ;BM9M©@L6dPQM9;AV%6dPQ8<h0PB_±V%jDC"[<EPB[±CFEV%6dCh0PB_±\]PQ8DPBV&PQ8<69gF6dVmlp;IEC,69:DC8LV%6dÅ<C:¦J,_ªEPQ\ ¬#j<69gj­\]69869\];BM,PI·C8<:<68DH°gRPQ8V&CRr'V%h;IE%C4gRPQ8h&V&Ef<gRV&C:²_cPBE_ÈfDE%V%j<CFE,698^@BCh%V%6dHQ;IV%6dPQ8> ±c Dc¡ z,jDCÐ$R RUJ GEWÎXGÏ"ÑdÐBÏÏNJLKkÎp_cPBE!V%jDCh&CFV 
PB_PB¤h&CFE%@;I¤MdC@A;IE6;I¤MdCh$PB_;Ìq;slBCh%69;B828DCFV~¬PBE%«ËgF;I[<V%f<E%Ch;BM9MQNbPQ68^V@A;BMfDC!;Bhh%6dHQ8<\$C8V%hV&PWJ<;BMdPQ8DHp¬6dV%jV%jDC[;IE%V%6;BMaPBEk:DCFE698DH¤CFV~¬CFC8§V%jDC\²>ÉDPBE¨C;Bgkj§@I;BM9fDC;Bh%h%69HQ8<\$C8V n V&P]J;B8CMdC\$C8V#µ n ¶"69h,68<gFM9f<:DC:698×V%jDC§M9;IV&V%69gRCB>ÚzjDCe¤PBV&V&PQ\ PB_V%jDC§M9;IV&V%69gRC³C8LgRPL:DCh!V%jDC$;Bh%h%6dHQ8<\]C8^V n _ªPBE?¬j69gje¬©C$j<;s@BCV%j<;IVn D n c _ªPBE$;BM9M n c k - µNp¶kÝV%jDC¤PBV&V&PQ\ V%j'f<hpC8LgRPL:DCh0V%jDCMdPs¬Ch&VbPBE:DCFEC:$@A;BMfDC;Bhh%6dHQ8<\$C8VUV&P W>z,jDCV&PB[PB_V%jDCM9;IV&V%69gRCC8<gRPL:DChUV%jDC;Bh%h%6dHQ8\$C8^V n c c_cPBE ¬j<69gkj¬©C,j;@BC,V%j<;IV n c D n c c _cPBE ;BM9M n c k - µN¶kÝV%jDCpV&PB[§V%jfh#C8gRP':DCh!V%jDCj<6dHQj<Ch&VbPBE:DCFE%C:e@I;BM9fDC;Bh%h%69HQ8<\$C8VV&Pt]>0`b8$V%jDC,M9;IV&V%6gRCBJ¬C,_ÈfDE%V%jDCFE j<;s@BCV%j<;IVX;B8WCMdC\$C8V#µ n ¶U[<E%CgRC:<ChX;B8]CM9C\$C8^V #µ n c c ¶6d_ n D n c c ݬ©C\$PBE%CFP@BCFE]h%;l§V%j<;IV!#µ n ¶[<ECgRC:DCh#µ n c c ¶:<6dECgRV%MdlT6d_V%jDCFE%C$69h!8DP´;Bh%h%6dHQ8\$C8^V n c ¬6dV%jn " n c " n c c >z,jDC[;IE%V%6;BMPBEk:DCFE698DH:DCFÅY8DC:p¤lpV%jDCM9;IV&V%69gRC$V%j'f<h¨gRPQ68<gF69:DCh?¬69V%jeV%j<C$[;IE%V%69;BM0PBE:<CFE698DHDØPQ8³V%jDC#NbPQ68^V?@I;BM9fDC;Bh%h%6dHQ8<\]C8^V%h?V&Pb]>ÉU69HQfDE%C :<CF[69gRV%hFJ';BhX;B8CRrL;B\$[YMdCBJ^V%jDC#;Bh%h%6dHQ8<\]C8^VM9;IV&V%69gRC

01$#>1

0%&#>1 01$#'%

0%&#'%0(&#>1 01$#'(

0(&#'% 0%&#'(

0(&#'(

É06dHQfDE%C IaÖ!84CRrD;B\$[MdC;Bh%h%6dHQ8\$C8^VaM9;IV&V%69gRC_ªPBEUV~¬PV&CFE8<;IElPB¤h%CFE%@A;I¤YMdC@A;IEk69;I¤MdChF>

_cPBE©V%jDC#V~¬PV&CFE8<;IE%l$@I;IE69;I¤M9Ch + ;B8<:*)W> ÉDPBE_ÈfDE&V%jDCFE698D_cPBE\];IV%6dPQ8Ë;I¤PQfDVM9;IV&V%69gRChW68ÆHBC8DCFE;BMJ¬CE%CF_cCFE,V&PeµG!E,+;IV.-FCFEJ\ w0/< s¶k>zaPÆ:DCh%gRE6d¤CTV%j<C£CR·CgRV%h´PB_4V%j<Ce@I;IE6dPQf<h@I;BM9fDC;Bh%h%69HQ8<\$C8V%h n PQ8ÞV%jDC¯[<E%PB¤;I¤6M96dV~l2:<69h%V&E6d¤fDV%69PQ8Ps@BCFE£V%jDC°PQfDV&[fDV²@I;IE69;I¤M9CBJ!V%j<C°;Bh%h%69HQ8<\$C8VM9;IVbV%69gRC¨69hC8j<;B8<gRC:´¬6dV%j´[E%PB¤;I¤69M69h&V%69g,68D_ªPBEk\];IV%6dPQ8>¸26dV%jTC;BgkjCMdC\]C8^V1#µ n ¶"PB_V%jDCM;IV&V%69gRCBJV%jDCgRPQ8L:<6dV%69PQ8<;BM7[E%PB¤;I¤69M6dV~lT:<6h&V&E6d¤f<V%6dPQ8SXEsµI p n ¶#P@BCFEV%jDCTPQf<V&[fDV@A;IEk69;I¤MdCA 6h];Bh%h&PLgF69;IV&C:>˸§Ce8DPBV&CV%j<;IV²V%jDCh%CÌ:<69h&V&Ek6d¤fDV%6dPQ8hW;IE%C§E%C;B:69MdlßgRPQ\$[YfDV&C:_cE%PQ\ V%jDC*q;lBCh%6;B8+8DCFVm¬©PBE%«2f<8<:<CFETh&V%f<:DlB>¸§C¬69MM E%CFV%fDEk8®V&P£V%j<CTgRPQ\$[MdCRrD6dV~l°PB_V%jDC²gRPQ\$[YfDV%;AV%6dPQ8<h698^@BPQM9@BC:T698K'CgRV%6dPQ8ex'> ÓD>32 46587 9ª:* ±c Dc¡ ql*¤f<6M9:<698DH´fD[PQ8*V%j<C;Bh%h6dHQ8<\$C8VM9;IV&V%6gRCBJ@BCFE6Ä_cl'698<H³\]PQ8DPBV&PQ8<69gF6dVml¯68Æ:<69h&V&E69¤fDV%6dPQ8°;B\$PQf<8V%h$V&Pgj<Cg%«L698DH$¬jDCFV%j<CFE[Y;IE%V%69gFf<M9;IE":DPQ\W698<;B8<gRC![<E%PB[CFE&V%6dChUjDPQM9:¨;B\]PQ8DHV%jDC©[<E%PB¤;I¤Y69M96dVml#:<69h%V&E6d¤fDV%69PQ8<h;Bhbh&PLgF69;IV&C:T¬#6dV%j´V%jDC¨M9;IV&V%69gRCHvhCMdC\$C8V%hF>¸§CE%CgF;BM9MaV%j<;IV?;q;slBCh%69;B8§8DCFVm¬©PBE%«e69h#6h&PBV&PQ8DC698®6dV%hh&CFV$PB_#PB¤h&CFE%@I;I¤MdC´@I;IE69;I¤MdChd 69_,C8^V&CFEk698DH;´j<6dHQjDCFE%PBE:DCFE%C:@A;BM9f<Cp;Bh%h6dHQ8<\$C8V#V&Pb E%Chf<MdV%h698³;²h&V&PLgj<;Bh%V%69gF;BM9Mdl£:DPQ\]698<;B8V![<E%PB¤;I¤6M96dV~lT:69h&V&E6ĤfDV%69PQ8P@BCFE¨V%jDCpPQfDV&[fDV!@A;IEk69;I¤MdC,>¸§C_cf<E%V%jDCFEE%CgF;BM9M©V%j<;IVV%jDC[;IE%V%69;BM©PBE:DCFEk698DHDàPQ8¯V%jDC4NbPQ698V@I;BM9fDCT;Bh%h%69HQ8<\$C8V%h]V&PO 69h$C8<gRPL:DC:×:<6dE%CgRV%Mdl¯698V%jDC;Bh%h%6dHQ8\$C8^V?M9;IV&V%69gRC_cPBE:]>?¸§C]8DPs¬ gRPQ8<h69:DCFE;BM9MY[;B6dEhXPB_7gRPQ8<:<69V%6dPQ8<;BM<[E%PB¤;I¤69M6dV~l:<69h&V&Ek6d¤fDV%6dPQ8hS EsµIap n ¶];B8<:ËSXEsµIap n c ¶$;Bh%h&PLgF69;IV&C:ˬ69V%jÆCMdCR\$C8V%h*!µ n 
¶];B8<:;#µ n c ¶$698®V%jDC£M9;IV&V%69gRChf<gjßV%j<;IV#µ n ¶[<ECgRC:DCh*!µ n c ¶k>2`m_?_cPBEWC;Bgkj­hf<gjß[;B6dE]¬Cj<;s@BCpV%j;IV!SXEsµI p n ¶ 82S EµI p n c ¶kJV%jDC8eV%jDCp8DCFVb

Lattices for Studying Monotonicity of Bayesian Networks 101

Page 112: Proceedings of the Third European Workshop on Probabilistic

¬PBE%«6h69h&PBV&PQ8DC,698dW>0zjDC"8DCFV~¬PBE%«$69h;B8V%6dV&PQ8DC,6986d_U_ªPBE,C;Bgkjehf<gj[;B6dE,PB_:<69h%V&E6d¤fDV%69PQ8<hX¬C4j<;s@BCV%j<;IV©S EsµIsp n c ¶r8ËS EsµIqp n ¶k>¸§C#8DPBV&C,V%j<;IVFJ'h%698<gRCV%jDCp[<E%PB[CFE%VmlTPB_©h%V&P'gkj<;Bh&V%69g:DPQ\]68<;B8<gRC69hV&Ek;B8<h%6ÄV%6d@BCBJ©¬©CTj;@BCTV&P*h&V%f:Dl*V%jDCT:DPQ\W698<;B8<gRC´[<E%PB[CFE&V%6dCh!PB_ V%j<C:69h&V&E6d¤YfDV%6dPQ8<h"PB_©:<6dE%CgRV%MdlTM698D«BC:e[;B6dEhPB_#CMdC\$C8V%h]698¯V%jDC²M9;IV&V%69gRC²PQ8<M9l*V&P*:DCgF69:DC²fD[PQ869h%PBV&PQ8<69gF6dVmlPBE;B8V%6dV&PQ8<6gF6dV~lB>Ö!h4\]C8^V%6dPQ8<C:°698¯K'CgRV%6dPQ8®Ò'J0;£:DPQ\];B68§PB_,;I[D[M69gF;IV%6dPQ8³\];l³CRrDj<6d¤69V!;B8*68^V&E6gF;IV&C]gRPQ\4¤68<;IV%6dPQ8PB_[<E%PB[CFE%V%6dCh!PB_69h&PBV&PQ8<6gF6dV~l³;B8<:§;B8V%6dV&PQ8<69gF6dVmle_cPBE698V&CFE%E%CM;IV&C:³PB¤h&CFE%@I;I¤MdC@I;IE69;I¤MdCh>pÖ!h¨;B8§CRrD;B\[M9CBJA¬C"gRPQ8<h%69:<CFEUV%j<CVm¬©P@I;IE69;I¤YMdCh +Ã;B8<: )+h%fgjV%j<;IV©V%jDCPQf<V&[fDVX@I;IE69;I¤YMdC,PB_¦698V&CFE%Ch&V©¤±Cj;@BCh69h&PIV&PQ8<6gF;BM9Mdl$698/+ ;B8:;B8V%6dV&PQ8<69gF;BMMdl$698*)>0¸§C?_ªPLgFf<hPQ8²V%j<C@A;BM9f<C¨;Bh%h%6dHQ8<\]C8^V%h

n ; 0 # ; 1n c ; 0 1.# ;n c c ; 0 1.# ; 1

V&PeV%jDCh&CW@I;IE69;I¤YMdChF>´|PBV&CV%j<;IV4_cPBE4V%j<CWV%jDE%CFC;Bhbh%69HQ8<\$C8V%h,¬Cpj<;s@BCV%j<;IV n ; D n c c ; ;B8<: n c ; D n c c ; ÝV%jDC£;Bh%h%6dHQ8\$C8^V%h n ; ;B8<: n c ; j;@BC§8DPÌPBEk:DCFE698DHD>ÉDPBE¯h&V%f<:DlL698DH­V%jDCß\]PQ8DPBV&PQ8<69gF6dVmlÞ[E%PB[CFE%V%6dCh§698L@BPQMd@BC:¦J^¬C8DPs¬­j<;@BC#V&P4gRPQ\$[;IEC,V%jDC,[<EPB¤;I¤69M969V~l:<6h&V&E6d¤f<V%6dPQ8<h$P@BCFEV%jDC§PQfDV&[YfDV´@A;IE6;I¤MdC V%j<;IV;IE%C!gRPQ\$[fDV&C:W_ªEPQ\ÚV%jDC!q;lBCh%69;B8´8DCFVm¬©PBE«]f<8<:DCFEh&V%f:DlBJHQ6d@BC8§V%jDC@I;IE6dPQf<h?;Bh%h%69HQ8<\$C8V%h?V&Pb+ ;B8<:)>Ì`¢__cPBE$V%jDC´[;B6dEPB_#:69h&V&E6d¤YfDV%6dPQ8<hS EsµI p n ; ¶;B8<:2SXEsµI p n c c ; ¶kJ#¬C§j<;@BC§V%j;IV´SXEµI p n c c ; ¶69hh&V&PLgj;Bh&V%69gF;BM9Mdl°:DPQ\]698<;B8V4P@BCFE]S EµI p n ; ¶kJ V%jDC8V%jDC¨8<CFV~¬PBE%«CRrDj<6d¤Y6dV%hV%jDC¨;Bh%h&PLgF69;IV&C:²[<EPB[±CFEV~lWPB_69h%PBV&PQ8<69gF6dVml³68e+Ì>²`¢__ªPBEV%jDC:<6h&V&E6d¤f<V%6dPQ8<h!HQ6d@BC8n c ; ;B8: n c c ; ¬C$j<;@BCV%j<;IVSXEµI p n c ; ¶,6h#h&V&PLgj;BhbV%69gF;BMMdl:DPQ\W698<;B8V"Ps@BCFES EsµI p n c c ; ¶kJ<V%jDC8TV%j<C48DCFVb¬PBE%«h%jDP¬h¦V%jDC;Bh%h&PLgF69;IV&C:¨[E%PB[CFE%V~l?PB_D;B8^V%69V&PQ8<69g6dVmle698 )>4z,jDC$8DCFVm¬©PBE%«eV%jfh?E%CF@BC;BM9h?¤PBV%j£[<E%PB[DCFE%V%69Ch,6d_S EµIqp n ; ¶m8ËS EsµI p n c c ; ¶m8ÆS EµIqp n c ; ¶

Verifying whether or not the network is isotone in A and antitone in B now amounts to studying the above inequalities for all values a and b of the two variables.

Violations of monotonicity

Upon using the assignment lattice as described above, some properties of monotonicity may

not show in the Bayesian network under study. We then say that these properties are violated. More formally, we say that a pair of joint value assignments a, a′ with a ≼ a′ violates the property of isotonicity if Pr(O | a′) fails to be stochastically dominant over Pr(O | a) for their associated probability distributions; violation of the property of antitonicity is defined similarly. We observe that for any violating pair a, a′ whose elements are directly linked in the assignment lattice, there is a unique variable X in the set of observable variables to which a and a′ assign a different value. Let x be the value of this variable within the assignment a and let x′, x ≠ x′, be its value in a′. The violation now is denoted by (a,
a′ | x→x′). We say that the violation has a change of the value of X from x to x′ for its origin, written x→x′. The value assignment c to the remaining observable variables that is shared by a and a′ is called the context of the violation. In general, using the assignment lattice for a Bayesian network may result in a set V of violations of the properties of monotonicity. Some of the identified violations may show considerable regularity, in the sense that if a violation originates from a change of value in a particular context, then this same change causes a violation in all higher-ordered contexts as well. Such violations will be termed structural. Other violations from the set do not have such a regular structure and may be considered incidental. We distinguish between the two types of violation to allow a compact representation of the entire set of identified violations. We consider the subset V(x→x′) ⊆ V of all violations of the properties of monotonicity that originate from a change of the value of the variable X from x to x′. The context c of such a violation now is called a structural context of offence if we have that there is a violation (a,
a′ | x→x′) ∈ V(x→x′) for all value assignments a that include this context c or a higher-ordered one. If there is no lower-ordered structural context of offence for the same change of value x→x′, we say that the context of offence is structurally minimal. A structurally minimal context of offence thus characterises a set of re-

102 L. C. van der Gaag, S. Renooij, and P. L. Geenen

Page 113: Proceedings of the Third European Workshop on Probabilistic

lated violations. A structurally minimal context c for the violations originating from x→x′ more specifically defines a sublattice of the assignment lattice in which each element includes x and in which for each element a there exists a violation (a, a′ | x→x′) in the set V(x→x′); the context c is the bottom of this sublattice. In addition to the violations that are covered by structurally minimal contexts of offence, the set of all identified violations may include elements that are not structurally related. A context c is said to be an incidental context of offence if it is not a structural context of offence. We note that any set of violations identified from a Bayesian network is now characterised by a set of structurally minimal contexts of offence and a set of incidental contexts. As an example, we consider the lattice from Figure 1 for the two ternary observable variables A and B. Suppose that upon verifying isotonicity of the Bayesian network under study, the three violations (a₁b₂, a₂b₂ | a₁→a₂), (a₁b₃, a₂b₃ | a₁→a₂) and (a₂b₂, a₃b₂ | a₂→a₃) are identified. The first two of these violations have the same origin: the two violations arise if the value of the variable A is changed from a₁ to a₂. These violations are characterised by the structurally minimal context of offence b₂: in both the context
b₂ and the higher-ordered context b₃, changing the value of A from a₁ to a₂ gives rise to a violation. The third violation mentioned above is not yet covered by the identified minimal context since it has a different origin. This violation originates from a change of the value of the variable A from a₂ to a₃ and again has b₂ for its offending context. The context b₂ now is merely incidental since no violation has been identified for the pair of assignments a₂b₃ and a₃b₃, which include the higher-ordered context b₃. To conclude, we would like to note that for a given set of violations, the structurally minimal contexts of offence are readily established by traversing the assignment lattice under study. Starting at the bottom of the lattice, the procedure finds an element a such that a is the lowest-ordered joint value assignment occurring

in a violation (a, a′ | x→x′) from the set of identified violations. The procedure subsequently isolates from the assignment lattice the sublattice that has this element for its bottom; the sublattice further includes all higher-ordered elements. Now, if each element of the sublattice occurs in a violation with x→x′ for its origin, then the context c is a structurally minimal context of offence; otherwise it is incidental. The procedure is iteratively repeated for all yet uncovered violations.

Computational considerations

The runtime complexity of our method for studying monotonicity of a Bayesian network is determined by the size of the assignment lattice used. We observe that this lattice encodes an exponential number of value assignments to the set of observable variables. Constructing the lattice and computing the probability distributions to be associated with its elements, therefore, takes exponential time. Moreover, the dominance properties of an exponential number of pairs of probability distributions have to be compared. For binary observable variables, for example, already

n · 2^(n−1) comparisons are required. From these considerations we have that our method has a very high runtime complexity. Although its requirements can be reduced to at least some extent by exploiting the independences modelled in a Bayesian network, for larger networks including a large number of observable variables studying monotonicity inevitably becomes infeasible. The unfavourable computational complexity of the problem, however, is likely to forestall the design of essentially more efficient methods. In view of the high runtime complexity involved, we propose to use our method for studying properties of partial monotonicity only. The concept of partial monotonicity applies to a subset of the observable variables of a network. An assignment lattice is constructed for these variables as described above. The probability distributions associated with the elements

Lattices for Studying Monotonicity of Bayesian Networks 103


of the lattice are conditioned on a fixed joint value assignment a to the observable variables that are not included in the study. With each element x of the lattice thus is associated the conditional probability distribution Pr(O | x a). The lattice now provides for studying the monotonicity properties of the network for the selected variables given a. The assignment a then is termed the background assignment for studying the partial monotonicity. We would like to note that partial monotonicity of a network given a particular background assignment a does not guarantee partial monotonicity given any other background assignment. Also, violations of the properties of monotonicity that are identified given a particular background assignment may not occur given another such assignment. The background assignment against which partial monotonicity is to be studied should therefore be chosen with care and be based upon considerations from the domain of application.

The application

We applied our method for studying monotonicity to a real Bayesian network in veterinary science. We briefly introduce the network and describe the violations that we identified.

The network
In close collaboration with two experts from the Central Institute of Animal Disease Control in the Netherlands, we are developing a Bayesian network for the detection of classical swine fever. Classical swine fever is an infectious disease of pigs, which has serious socio-economical consequences upon an outbreak. As the disease has a potential for rapid spread, it is imperative that its occurrence is detected in the early stages. The network under construction is aimed at supporting veterinary practitioners in the diagnosis of the disease when visiting pig farms with disease problems of unknown cause. Our network currently includes 42 variables for which over 2400 parameter probabilities have been assessed. The variables model the pathogenesis of the disease as well as the clinical signs observed in individual pigs. 24 of the total of

42 variables are observable. Figure 2 depicts the graphical structure of the current network.

The identified violations

During the elicitation interviews, our veterinary experts produced various statements that suggested monotonicity. They indicated, for example, that the output variable Viraemia should behave isotonically in terms of the five variables Diarrhoea, Ataxia, Fever, Malaise and Skin haemorrhages. We focus on these observable variables to illustrate the application of our method for studying partial monotonicity. From the five observable variables under study, we constructed an assignment lattice as described in the previous section. Since all variables involved were binary, adopting one of the values true and false, we used a slightly more concise encoding of their joint value assignments. For each assignment a to the set of observable variables under study, we let S(a) be the subset of these variables that adopt the value true in a. The elements of the resulting lattice thus are subsets of the set of variables under study; the bottom of the lattice is the empty set and the top equals the entire set. The resulting lattice for the five variables under study is shown in Figure 3. It includes 2^5 = 32 elements to capture all possible joint value assignments to the variables and further includes 80 direct set-inclusion statements. Before the lattice could
be enhanced with conditional probabilities of the presence of a viraemia of classical swine fever, we had to decide upon a background assignment for the other 19 observable variables against which the properties of partial monotonicity would be verified. We decided to take for this purpose the value assignment in which all other observable variables of the network had adopted the value false. We chose this particular assignment since the various clinical signs have a rather small probability of occurrence and it is highly unlikely to find a large number of these signs in a single live pig. Given this background assignment, we computed the various conditional probabilities to be associated with the elements of the assignment lattice. For each pair of directly linked elements from



Figure 2: The graphical structure of our Bayesian network for classical swine fever in pigs.

the assignment lattice, we compared the conditional probabilities of a viraemia of classical swine fever. We found four violations of the properties of partial isotonicity given the selected background assignment; these violations are shown by dashed lines in Figure 3. The four violations all originated from adding the clinical sign of diarrhoea to the combination of findings of ataxia and malaise. The four violations thus were all covered by a single structurally minimal context of offence. We presented the pairs of violating assignments to two veterinarians. Being confronted with the four violations, they independently and with conviction indicated that the probability of a viraemia of classical swine fever should increase upon finding the additional sign of diarrhoea. Both veterinarians mentioned that the combination of ataxia and diarrhoea especially pointed to classical swine fever; within the scope of the Bayesian network, they could not think of another disease that would be more likely

to give rise to this combination of signs. They thus indicated that the network should indeed have been isotone in the five variables under study given the absence of any other signs. The four identified violations thus were indicative of modelling inadequacies in our current network.
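For binary variables, an assignment lattice of this kind is just the subset lattice: 2^n elements and n · 2^(n−1) directly linked pairs, which for five variables gives 32 elements and 80 direct set-inclusion statements. A small enumeration confirms the counts; the sign names below are merely illustrative stand-ins for the five clinical signs.

```python
from itertools import combinations

def assignment_lattice(variables):
    """Enumerate the lattice of joint value assignments to binary variables,
    encoding each assignment as the subset of variables that take the value
    true. Returns the elements and the directly linked pairs."""
    elements = []
    for r in range(len(variables) + 1):
        elements.extend(frozenset(c) for c in combinations(variables, r))
    # directly linked pairs: one element is a subset of the other and
    # they differ in the value of exactly one variable
    links = [(s, t) for s in elements for t in elements
             if s < t and len(t - s) == 1]
    return elements, links

# Hypothetical names standing in for the five clinical signs under study.
signs = ["ataxia", "malaise", "diarrhoea", "fever", "haemorrhages"]
elements, links = assignment_lattice(signs)
print(len(elements))  # 32 elements, one per joint value assignment
print(len(links))     # 80 direct set-inclusion statements: n * 2**(n - 1)
```

Each link corresponds to one dominance comparison, which makes the exponential growth of the method's runtime with the number of observable variables concrete.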

Conclusions

In this paper, we have presented a method for studying monotonicity of Bayesian networks. In view of the unfavourable complexity of the problem in general, the method focuses on just a subset of the observable variables of a network and builds upon a lattice of all joint value assignments to these variables. The lattice is enhanced with information about the effects of these assignments on the probability distribution over the network's main output variable. The enhanced lattice then is used for identifying all violations of the properties of monotonicity and for constructing minimal offending contexts for further consideration.



"! #$%&

' "( ) "( ** () ( "!" #$%&( ' +*(, ' , '+( "( "!" #%&$ "!" # &( ** ( "!" #%*&

' "( "( ** *(, +( '

"() "(, "( **( '

"( "!" #%&( ' "!" #%*&,( "( **

"( "!" #%*&,( "( *( "!" #%&$

"!" #%&( "( ' !" #$%&(, +( +*

**( "() "(, "( "!" #%*&,( ' "( **

( ' ( +( "! #$%&$- ( ( "! #$%&(

**( "!" #$%&(, '+(

"!" #$%&( ' ( "( **(

Figure 3: An assignment lattice for our Bayesian network for classical swine fever; the violations of the properties of partial monotonicity are indicated by dashed lines.

We would like to note that, as a supplement to our method for studying monotonicity, we designed a special-purpose elicitation technique that allows for discussing with domain experts whether or not the identified violations indeed can be construed as violations of commonly acknowledged patterns of monotonicity (van der Gaag et al., 2006). This technique has been designed specifically so as to ask little time and little cognitive effort from the experts in the verification of the identified violations. The results that we obtained from applying our method for studying monotonicity to a real network in veterinary science indicate that it presents a useful method for studying reasoning patterns in Bayesian networks. The next step now is to extend our method by techniques that exploit the constructed minimal offending contexts for identifying the modelling inadequacies in a network that cause the various violations of monotonicity.

References

J.O. Berger. 1985. Statistical Decision Theory and Bayesian Analysis. 2nd edition, Springer-Verlag.

G. Grätzer. 1971. Lattice Theory: First Concepts and Distributive Lattices. W.H. Freeman.

C.-L. Liu and M.P. Wellman. 1998. Incremental tradeoff resolution in qualitative probabilistic networks. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann, pages 338–345.

L.C. van der Gaag, H.L. Bodlaender and A. Feelders. 2004. Monotonicity in Bayesian networks. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, AUAI Press, pages 569–576.

L.C. van der Gaag, P.L. Geenen and H.J.M. Tabachneck-Schijf. 2006. Verifying monotonicity of Bayesian networks with domain experts. In Proceedings of the 4th Bayesian Modelling Applications Workshop, pages 9–15.

M.P. Wellman. 1990. Fundamental concepts of qualitative probabilistic networks. Artificial Intelligence, 44:257–303.



"!#$&%')(+*",- . /(+

Linda C. van der Gaag and Peter R. de Waal
Department of Information and Computing Sciences, Utrecht University

P.O. Box 80.089, 3508 TB Utrecht, The Netherlands

Abstract. We introduce the family of multi-dimensional Bayesian network classifiers. These classifiers include one or more class variables and multiple feature variables, which need not be modelled as being dependent on every class variable. Our family of multi-dimensional classifiers includes as special cases the well-known naive Bayesian and tree-augmented classifiers, yet offers better modelling capabilities than families of models with a single class variable. We describe the learning problem for a subfamily of multi-dimensional classifiers and show that the complexity of the solution algorithm is polynomial in the number of variables involved. We further present some preliminary experimental results to illustrate the benefits of the multi-dimensionality of our classifiers.

1 Introduction

Bayesian network classifiers have gained considerable popularity for solving classification problems where an instance described by a number of features has to be classified in one of several distinct classes. The success of especially naive Bayesian classifiers and the more expressive tree-augmented network classifiers is readily explained from their ease of construction and their generally good classification performance.

Many application domains, however, include classification problems where an instance has to be assigned to a most likely combination of classes. Since the number of class variables in a Bayesian network classifier is restricted to one, such problems cannot be modelled straightforwardly. One approach is to construct a compound class variable that models all possible combinations of classes. This class variable then easily ends up with a prohibitively large number of values. Also, the structure of the problem is not properly reflected in the resulting model. Another approach is to develop multiple classifiers, one for each original class. Multiple classifiers, however, cannot capture interactions among the various classes and may thus also not properly reflect the problem. Moreover, if the

various classifiers indicate multiple classes, then the implied combination may not be the most likely explanation of the observed features.

In this paper we introduce the concept of multi-dimensionality in Bayesian network classifiers to provide for accurately modelling problems where instances are assigned to multiple classes. A multi-dimensional Bayesian network classifier includes one or more class variables and one or more feature variables. It models the relationships between the variables by acyclic directed graphs over the class variables and over the feature variables separately, and further connects the two sets of variables by a bipartite directed graph; an example multi-dimensional classifier is depicted in Figure 1. As for one-dimensional Bayesian network classifiers, we distinguish between different types of multi-dimensional classifier by imposing restrictions on their graphical structure. Fully tree-augmented multi-dimensional classifiers, for example, have directed trees over their class variables as well as over their feature variables.

For the family of fully tree-augmented multi-dimensional classifiers, we study the learning problem, that is, the problem of finding a classifier that best fits a set of available data. We show that, given a fixed selection of feature vari-


C1   C2   C3
F1   F2   F3   F4

Figure 1: An example multi-dimensional Bayesian network classifier with class variables Ci and feature variables Fj.

ables per class variable, the learning problem can be decomposed into optimisation problems for the set of class variables and for the set of feature variables separately, which can both be solved in polynomial time. We further argue that, although our learning algorithm assumes a fixed bipartite graph between the class and feature variables, it is easily combined with existing approaches to feature subset selection.

The numerical results that we obtained from preliminary experiments with our learning algorithm clearly illustrate the benefits of multi-dimensionality of Bayesian network classifiers. Especially on smaller data sets, the constructed multi-dimensional classifiers provided higher accuracy than their one-dimensional counterparts. In combination with feature selection, moreover, our algorithm resulted in sparser classifiers with considerably fewer parameters and, hence, with smaller variance.

The paper is organised as follows. In Section 2, we review Bayesian network classifiers in general. In Section 3, we define our family of multi-dimensional classifiers. In Section 4, we address the learning problem for fully tree-augmented multi-dimensional classifiers and present a polynomial-time algorithm for solving it. In Section 5, we briefly address feature selection for our multi-dimensional classifiers. We report some preliminary results from an application in the biomedical domain in Section 6. The paper is rounded off with our concluding observations in Section 7.

2 Preliminaries

Before reviewing naive Bayesian and tree-augmented network classifiers, we introduce our notational conventions. We consider Bayesian networks over a finite set V = {X1, ..., Xn}, n ≥ 1, of discrete random variables, where each Xi takes a value in a finite set Val(Xi). For a subset of variables Y ⊆ V we use Val(Y) = ×_{Xi ∈ Y} Val(Xi) to denote the set of joint value assignments to Y. A Bayesian network now is a pair B = (G, Θ), where G is an acyclic directed graph whose vertices correspond to the random variables V, and Θ is a set of parameter probabilities; the set Θ includes a parameter θ_{xi | pa(Xi)} for each value xi ∈ Val(Xi) and each value assignment pa(Xi) ∈ Val(pa(Xi)) to the set pa(Xi) of parents of Xi in G. The network B now defines a joint probability distribution Pr_B over V that is factorised according to

P_B(X_1, ..., X_n) = ∏_{i=1}^{n} θ_{X_i | Π(X_i)}.
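As an illustration, the factorisation above is straightforward to evaluate in code; the following sketch (hypothetical helper names, not part of the paper) computes P_B for a complete value assignment from a table of parameter probabilities:

```python
def joint_probability(assignment, parents, theta):
    """Evaluate P_B(x_1, ..., x_n) = prod_i theta_{x_i | pi(X_i)} for a
    complete value assignment, given each variable's parent set and CPT."""
    p = 1.0
    for var, value in assignment.items():
        parent_values = tuple(assignment[q] for q in parents[var])
        p *= theta[var][(value, parent_values)]
    return p

# Tiny network C -> F with binary variables (illustrative numbers).
parents = {"C": (), "F": ("C",)}
theta = {
    "C": {(0, ()): 0.6, (1, ()): 0.4},
    "F": {(0, (0,)): 0.9, (1, (0,)): 0.1, (0, (1,)): 0.2, (1, (1,)): 0.8},
}
print(round(joint_probability({"C": 1, "F": 1}, parents, theta), 10))  # 0.32
```

The dictionary-of-CPTs representation mirrors the parameter set Θ directly: one entry θ_{x | π(X)} per child value and parent assignment.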

Bayesian network classifiers are Bayesian networks of restricted topology that are tailored to solving classification problems, where instances described by a number of features have to be classified in one of several distinct predefined classes (Friedman et al., 1997). The set of variables V of a Bayesian network classifier is partitioned into a set V_F = {F_1, ..., F_m}, m ≥ 1, of feature variables and a singleton set V_C = {C} with the class variable C. A naive Bayesian classifier has a directed tree for its graph G, in which the class variable C is the unique root and each feature variable F_i has C for its only parent. A tree-augmented network (TAN) classifier has for its graph G a directed acyclic graph in which the class variable C is the unique root and each feature variable F_i has C and at most one other feature variable for its parents; the subgraph induced by the set V_F, moreover, is a directed tree, termed the feature tree of the classifier.

The general problem of learning a Bayesian network classifier from a given set of data samples D = {u_1, ..., u_N}, N ≥ 1, is to find from among the family of network classifiers one that best matches the available data.

108 L. C. van der Gaag and P. R. de Waal


As a measure of how well a model describes the data, often the model's log-likelihood given the data is used: for a Bayesian network B and a data set D, the log-likelihood of B given D is defined as

LL(B | D) = Σ_{i=1}^{N} log P_B(u_i).

For the family of Bayesian network classifiers in general, the learning problem is intractable. For various subfamilies of classifiers, however, the problem is solvable in polynomial time. Examples include the naive Bayesian and TAN classifiers reviewed above, and the subfamily of Bayesian network classifiers of which the subgraph induced by the feature variables constitutes a forest (Lucas, 2004). For other subfamilies of Bayesian network classifiers, researchers have presented heuristic algorithms which provide good, though non-optimal, solutions (Sahami, 1996; Keogh and Pazzani, 1999). We would like to note that these results all relate to learning a classifier over a fixed set of relevant feature variables. To the best of our knowledge, the selection of an optimal set of feature variables has not been solved as yet.

Since its graph has a fixed topology, the problem of learning a naive Bayesian classifier amounts to just establishing maximum-likelihood estimates from the available data for its parameters. For a fixed graphical structure in general, the maximum-likelihood estimators of the parameter probabilities are given by

θ̂_{x_i | π(X_i)} = P̂(x_i | π(X_i)),

where P̂ denotes the empirical distribution defined by the frequencies of occurrence in the data. The problem of learning a TAN classifier amounts to first determining a graphical structure of maximum likelihood given the available data and then establishing estimates for its parameters. For constructing a maximum-likelihood feature tree, a polynomial-time algorithm is available from Friedman et al. (1997).
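The maximum-likelihood estimates are plain relative frequencies, which a short sketch makes concrete (illustrative helper names, not the authors' code):

```python
from collections import Counter

def ml_estimates(samples, child, parents):
    """theta-hat_{x | pi} = N(x, pi) / N(pi): the empirical conditional
    frequency of the child value given its parents' value assignment."""
    pair_counts = Counter(
        (s[child], tuple(s[q] for q in parents)) for s in samples)
    parent_counts = Counter(tuple(s[q] for q in parents) for s in samples)
    return {(x, pa): c / parent_counts[pa]
            for (x, pa), c in pair_counts.items()}

samples = [{"C": 0, "F": 0}, {"C": 0, "F": 1},
           {"C": 1, "F": 1}, {"C": 1, "F": 1}]
print(ml_estimates(samples, "F", ("C",)))
```

On this toy sample, F always equals 1 when C = 1, so θ̂_{F=1 | C=1} comes out as 1.0.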

To conclude, we would like to note that a Bayesian network classifier computes for each instance a conditional probability distribution over the class variable. For classification purposes, it further is associated with a function Ω(V_F) → Ω(C) which typically builds upon the winner-takes-all rule. Using this rule, a classifier B outputs for each value assignment f ∈ Ω(V_F) a class value c ∈ Ω(C) such that P_B(c | f) ≥ P_B(c' | f) for all c' ∈ Ω(C), breaking ties at random.
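The winner-takes-all rule is an arg-max over class values; a minimal sketch, assuming access to unnormalised joint probabilities (P_B(c | f) is proportional to P_B(c, f) for fixed f), and with deterministic rather than random tie-breaking:

```python
def winner_takes_all(class_values, f, joint):
    """Return the class value c maximising P_B(c | f); since f is fixed,
    maximising the joint P_B(c, f) suffices (ties: first maximum found)."""
    return max(class_values, key=lambda c: joint(c, f))

# Hypothetical joint over a binary class and one binary feature.
table = {(0, 0): 0.54, (1, 0): 0.08, (0, 1): 0.06, (1, 1): 0.32}
print(winner_takes_all([0, 1], 1, lambda c, f: table[(c, f)]))  # 1
```

Normalising by P_B(f) is unnecessary here, which is why the joint can stand in for the posterior.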

3 Multi-dimensional classifiers

Bayesian network classifiers as reviewed above include a single class variable and as such are one-dimensional. We now introduce the concept of multi-dimensionality in Bayesian network classifiers by defining a family of models that may include multiple class variables.

A multi-dimensional Bayesian network classifier is a Bayesian network of which the graph G = ⟨V, A⟩ has a restricted topology. The set V of random variables is partitioned into the set V_C = {C_1, ..., C_n}, n ≥ 1, of class variables and the set V_F = {F_1, ..., F_m}, m ≥ 1, of feature variables. The set of arcs A of the graph is partitioned into the three sets A_C, A_F and A_CF having the following properties:

- for each F_i ∈ V_F there is a C_j ∈ V_C with (C_j, F_i) ∈ A_CF, and for each C_j ∈ V_C there is an F_i ∈ V_F with (C_j, F_i) ∈ A_CF;

- the subgraph of G that is induced by V_C equals G_C = ⟨V_C, A_C⟩;

- the subgraph of G that is induced by V_F equals G_F = ⟨V_F, A_F⟩.

The subgraph G_C is a graphical structure over the class variables and is called the classifier's class subgraph; the subgraph G_F is called its feature subgraph. The subgraph G_CF = ⟨V, A_CF⟩ is a bipartite graph that relates the various feature variables to the class variables; this subgraph is called the feature selection subgraph of the classifier, and its set of arcs is termed the classifier's feature selection arc set. For any variable X_i in a multi-dimensional classifier, we now use Π_C(X_i) to denote the class parents of X_i in G, that is, Π_C(X_i) = Π(X_i) ∩ V_C. We further use Π_F(X_i) to denote the feature parents of X_i in G. Note that for any class variable C_j we thus have that Π_F(C_j) = ∅ and Π(C_j) = Π_C(C_j).

Multi-dimensional Bayesian Network Classifiers 109


Within the family of multi-dimensional Bayesian network classifiers, various different types of classifier are distinguished based upon their graphical structures. An example is the fully naive multi-dimensional classifier, in which both the class subgraph and the feature subgraph have empty arc sets. Note that this subfamily of bipartite classifiers includes the one-dimensional naive Bayesian classifier as a special case. Reversely, any such bipartite classifier has an equivalent naive Bayesian classifier with a single compound class variable. Another type of multi-dimensional classifier is the subfamily of classifiers in which both the class subgraph and the feature subgraph are directed trees. In the remainder of the paper, we will focus on this subfamily of fully tree-augmented multi-dimensional classifiers.

A multi-dimensional classifier in essence is used to find a joint value assignment of highest posterior probability to its set of class variables. Finding such an assignment, given values for all feature variables involved, is equivalent to solving the MPE problem. This problem is known to be NP-hard in general, yet can be solved in polynomial time for networks of bounded treewidth (Bodlaender et al., 2002). In the presence of unobserved feature variables, the problem of finding assignments of highest posterior probability remains intractable even for these restricted networks (Park, 2002). In view of the unfavourable computational complexity involved, we note that the practicability of multi-dimensional classifiers is limited to models with restricted class subgraphs.

4 Learning fully tree-augmented multi-dimensional classifiers

In this section we define the problem of learning a fully tree-augmented multi-dimensional classifier and show that this problem can be decomposed into two separate optimisation problems which can both be solved in polynomial time.

4.1 The learning problem

Before defining the problem of learning a fully tree-augmented multi-dimensional classifier from data, we recall that the related problem for Bayesian network classifiers has been studied for a fixed set of relevant feature variables. Following a similar approach, we now formulate our learning problem to pertain to a subfamily of classifiers in which the feature selection subgraph is fixed. We will return to the issue of feature subset selection in Section 5.

We begin by defining the subfamily of fully tree-augmented classifiers with a fixed selection of feature variables per class variable. These classifiers are considered admissible for the learning problem. We let the set of random variables V be partitioned into V_C and V_F; we further take the set of arcs A to be partitioned into A_C, A_F and A_CF as before. We now consider a subset A*_CF of V_C × V_F such that ⟨V, A*_CF⟩ is a feature selection subgraph. A fully tree-augmented multi-dimensional classifier now is admissible for A*_CF if we have for its set of arcs A_CF that A_CF = A*_CF. The set of all admissible classifiers for A*_CF is denoted as C(A*_CF).

The learning problem now is to find from among the set of admissible classifiers one that best fits the available data. As a measure of how well a model describes the data, we use its log-likelihood given the data. More formally, the learning problem for fully tree-augmented multi-dimensional classifiers with a fixed feature selection arc set A*_CF then is to find a classifier B in C(A*_CF) that maximises LL(B | D).

4.2 Solving the learning problem

In this section we show that the learning problem for fully tree-augmented multi-dimensional classifiers can be solved in polynomial time.

We consider a fully tree-augmented classifier B with the class variables V_C and the feature variables V_F that is admissible for the feature selection arc set A*_CF. Building upon a result from Friedman et al. (1997), we have that the log-likelihood of B given a data set D can be written as

LL(B | D) = −N · Σ_{i=1}^{n} H_P̂(C_i | Π(C_i)) − N · Σ_{j=1}^{m} H_P̂(F_j | Π(F_j))

          = N · Σ_{i=1}^{n} I_P̂(C_i; Π(C_i)) − N · Σ_{i=1}^{n} H_P̂(C_i) + N · Σ_{j=1}^{m} I_P̂(F_j; Π(F_j)) − N · Σ_{j=1}^{m} H_P̂(F_j),

where P̂ is the empirical distribution from D,

H_P̂(X) = −Σ_x P̂(x) · log P̂(x)

is the entropy of a random variable X with distribution P̂,

H_P̂(X | Y) = −Σ_{x,y} P̂(x, y) · log P̂(x | y)

denotes the conditional entropy of X given Y, and

I_P̂(X; Y) = Σ_{x,y} P̂(x, y) · log [ P̂(x, y) / (P̂(x) · P̂(y)) ]

denotes the mutual information of X and Y. The two entropy terms H_P̂(C_i) and H_P̂(F_j) in the above expression for LL(B | D) concern marginal distributions established from the available data. These terms therefore depend only on the empirical distribution and not on the graphical structure of the classifier. This observation implies that an admissible classifier that maximises the log-likelihood given the data is a classifier that maximises the sum of its two mutual-information terms.
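All of these quantities reduce to simple counts over the data; a self-contained sketch of the empirical mutual information (illustrative code, not the authors' implementation):

```python
from collections import Counter
from math import log

def empirical_mi(pairs):
    """I(X; Y) = sum_{x,y} P(x,y) * log( P(x,y) / (P(x) * P(y)) ),
    with P the empirical distribution of the observed (x, y) pairs."""
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum((c / n) * log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

print(empirical_mi([(0, 0), (1, 1)] * 50))             # full dependence: log 2
print(empirical_mi([(0, 0), (0, 1), (1, 0), (1, 1)]))  # independence: 0.0
```

Natural logarithms are used here; any fixed base only rescales the weights and so leaves the maximising structure unchanged.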

We consider the mutual-information term I_P̂(F_j; Π(F_j)) in some more detail. We note that the set of parents Π(F_j) of any feature variable F_j is partitioned into the set Π_C(F_j) of class parents and the set Π_F(F_j) of feature parents. Using the chain rule for mutual information (Cover and Thomas, 1991), the term can now be written as

Σ_{j=1}^{m} [ I_P̂(F_j; Π_C(F_j)) + I_P̂(F_j; Π_F(F_j) | Π_C(F_j)) ],

where I_P̂(X; Y | Z) is the conditional mutual information of X and Y given Z, with

I_P̂(X; Y | Z) = Σ_{x,y,z} P̂(x, y, z) · log [ P̂(x, y | z) / (P̂(x | z) · P̂(y | z)) ].

Since the feature selection arc set A*_CF is fixed, the set Π_C(F_j) of class parents of the feature variable F_j is the same for every admissible classifier. We conclude that the mutual-information term I_P̂(F_j; Π_C(F_j)) is the same for all models in the set of admissible classifiers C(A*_CF).
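The conditional mutual information can likewise be computed from counts by stratifying the data on the conditioning value, since I_P̂(X; Y | Z) = Σ_z P̂(z) · I_P̂(X; Y | Z = z); a sketch (hypothetical helper, for illustration only):

```python
from collections import Counter, defaultdict
from math import log

def empirical_cmi(triples):
    """I(X; Y | Z) as the P(z)-weighted average of the mutual information
    of X and Y within each stratum Z = z of the empirical distribution."""
    n = len(triples)
    strata = defaultdict(list)
    for x, y, z in triples:
        strata[z].append((x, y))
    total = 0.0
    for pairs in strata.values():
        m = len(pairs)
        pxy = Counter(pairs)
        px = Counter(x for x, _ in pairs)
        py = Counter(y for _, y in pairs)
        mi = sum((c / m) * log((c / m) / ((px[x] / m) * (py[y] / m)))
                 for (x, y), c in pxy.items())
        total += (m / n) * mi
    return total

# X copies Y in both strata of Z, so I(X; Y | Z) = log 2.
print(empirical_cmi([(0, 0, 0), (1, 1, 0), (0, 0, 1), (1, 1, 1)] * 10))
```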

To summarise the above considerations, we have that a classifier that solves the learning problem for fully tree-augmented multi-dimensional classifiers with the fixed feature selection arc set A*_CF is a classifier from C(A*_CF) that maximises

Σ_{i=1}^{n} I_P̂(C_i; Π(C_i)) + Σ_{j=1}^{m} I_P̂(F_j; Π_F(F_j) | Π_C(F_j)).

We now proceed by showing that the learning problem can be decomposed into two separate optimisation problems which can both be solved in polynomial time. To this end, we first consider the mutual-information term pertaining to the class variables. We observe that, since class variables only have class parents, this term depends on the set of arcs A_C of the class subgraph only. With respect to the conditional mutual-information term above, we observe that this term depends on the feature selection arc set A*_CF, which is fixed, and on A_F, but not on the set A_C. These considerations imply that the two terms are independent and can be maximised separately.

The mutual-information term pertaining to the class variables can be maximised by using the procedure from Chow and Liu (1968) for constructing maximum-likelihood trees:

1. Construct a complete undirected graph over the set of class variables V_C.

2. Assign a weight I_P̂(C_i; C_j) to each edge between C_i and C_j, i ≠ j.

3. Build a maximum weighted spanning tree, for example using Kruskal's algorithm (1956).

4. Transform the undirected tree into a directed one, by choosing an arbitrary variable for its root and setting all arc directions from the root outward.
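The four steps above can be sketched as follows — Kruskal's algorithm with a union-find, followed by a traversal that directs the edges away from the chosen root (illustrative code with hypothetical weights, not the authors' implementation):

```python
def maximum_spanning_tree(nodes, weight):
    """Kruskal on a complete undirected graph; weight maps frozenset({u, v})
    to the edge weight. Greedily add heaviest edges that join components."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    tree = []
    for edge in sorted(weight, key=weight.get, reverse=True):
        u, v = tuple(edge)
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append(edge)
    return tree

def orient_away_from(root, tree):
    """Step 4: pick a root and direct every tree edge from the root outward."""
    adjacent = {}
    for edge in tree:
        u, v = tuple(edge)
        adjacent.setdefault(u, []).append(v)
        adjacent.setdefault(v, []).append(u)
    arcs, seen, stack = [], {root}, [root]
    while stack:
        u = stack.pop()
        for v in adjacent.get(u, []):
            if v not in seen:
                seen.add(v)
                arcs.append((u, v))
                stack.append(v)
    return arcs

weights = {frozenset({"C1", "C2"}): 0.9,
           frozenset({"C2", "C3"}): 0.5,
           frozenset({"C1", "C3"}): 0.1}
tree = maximum_spanning_tree(["C1", "C2", "C3"], weights)
print(orient_away_from("C1", tree))  # [('C1', 'C2'), ('C2', 'C3')]
```

Because the weights are symmetric, the orientation of the tree does not affect its total weight, which is exactly why an arbitrary root suffices.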

For the conditional mutual-information term pertaining to the feature variables, we observe that it is maximised by finding a maximum-likelihood directed spanning tree over the feature variables. Such a tree is constructed by the following procedure:



1. Construct a complete directed graph over the set of feature variables V_F.

2. Assign a weight I_P̂(F_j; F_i | Π_C(F_j)) to each arc from F_i to F_j, i ≠ j.

3. Build a maximum weighted directed spanning tree, for example using the algorithm of Chu and Liu (1965) or Edmonds' algorithm (1967).
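For intuition, the directed counterpart of the spanning-tree step can be solved by brute force over all rooted directed spanning trees — feasible only for a handful of variables, whereas Chu–Liu/Edmonds achieves it in polynomial time (illustrative code with hypothetical weights):

```python
from itertools import product

def maximum_arborescence(nodes, weight):
    """Exhaustively try every choice of root and of one parent per non-root
    node; keep the acyclic choice (a directed spanning tree) of max weight."""
    best_total, best_arcs = float("-inf"), None
    for root in nodes:
        others = [v for v in nodes if v != root]
        for parents in product(nodes, repeat=len(others)):
            arcs = [(u, v) for u, v in zip(parents, others) if u != v]
            if len(arcs) != len(others):
                continue  # some node chose itself as parent
            parent_of = {v: u for u, v in arcs}

            def reaches_root(v):
                # acyclic iff following parent pointers always terminates
                seen = set()
                while v in parent_of:
                    if v in seen:
                        return False
                    seen.add(v)
                    v = parent_of[v]
                return True

            if all(reaches_root(v) for v in others):
                total = sum(weight[a] for a in arcs)
                if total > best_total:
                    best_total, best_arcs = total, arcs
    return best_arcs

weight = {("F1", "F2"): 0.9, ("F2", "F1"): 0.1,
          ("F1", "F3"): 0.2, ("F3", "F1"): 0.3,
          ("F2", "F3"): 0.8, ("F3", "F2"): 0.4}
print(maximum_arborescence(["F1", "F2", "F3"], weight))
```

The asymmetric weights are the point: unlike the class-variable step, reversing an arc generally changes the total weight, so the search is over directed trees.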

We would like to note that for maximising the conditional mutual-information term for the feature variables, we have to construct a directed spanning tree, while for maximising the mutual-information term for the class variables we compute an undirected one. The need for this difference arises from the observation that I_P̂(C_i; C_j) = I_P̂(C_j; C_i) for the class variables, while in general I_P̂(F_j; F_i | Π_C(F_j)) ≠ I_P̂(F_i; F_j | Π_C(F_i)) for the feature variables.

Our algorithm for solving the learning problem for fully tree-augmented multi-dimensional classifiers with a fixed feature selection subgraph is composed of the two procedures described above. Solving the problem thus amounts to computing an undirected maximum-weighted spanning tree over the class variables and a directed maximum-weighted spanning tree over the feature variables. The computation of the weights for the undirected tree has a computational complexity of O(n²·N), while the construction of the tree itself requires O(n²·log n) time (Kruskal, 1956). The computation of the weights for the directed tree has a complexity of O(m²·N), while the construction of the tree itself takes O(m²) time. Since a typical data set satisfies N ≥ log n and N ≥ m, we have that the overall complexity of our algorithm is polynomial in the number of variables involved.

We conclude this section by observing that the learning problem can also be formulated for classifiers in which the set A_C or the set A_F is empty. Our algorithm is readily adapted to these problems. With A_C = ∅, only the conditional mutual-information term for the feature variables has to be maximised, using the second procedure above. With A_F = ∅, only the mutual-information term for the class variables has to be maximised, using the first procedure.

5 Feature subset selection

In the previous section, we have addressed the learning problem for fully tree-augmented multi-dimensional classifiers with a fixed feature selection subgraph. We now briefly discuss feature subset selection for our classifiers.

It is well known that, if more or less redundant features are included in a data set, these features may bias the classifier that is learned from the data, which in turn may result in a relatively poor classification accuracy. By constructing the classifier over just a subset of the feature variables, a less complex classifier is yielded that tends to have a better performance (Langley et al., 1992). Finding a minimum subset of features such that the selective classifier constructed over this subset has highest accuracy is known as the feature subset selection problem. The feature selection problem unfortunately is known to be NP-hard in general (Tsamardinos and Aliferis, 2003).

For Bayesian network classifiers, different heuristic approaches to feature subset selection have been proposed. One of these is the wrapper approach (Kohavi and John, 1997), in which the selection of feature variables is merged with the learning algorithm. We now argue that the same approach can be used for our multi-dimensional Bayesian network classifiers. The resulting procedure is as follows:

1. Choose the empty feature selection subgraph for the initial current subgraph.

2. From the current subgraph, generate all possible feature selection subgraphs that are obtained by adding an arc from a class variable to a feature variable.

3. For each generated feature selection subgraph, compute the accuracy of the best classifier given this subgraph that is learned using the algorithm from the previous section.

4. Select the best generated subgraph, that is, the feature selection subgraph of the classifier of highest accuracy.



5. If the accuracy of the classifier with the selected subgraph is higher than that of the classifier with the current subgraph, then denote the selected subgraph as the current subgraph and go to Step 2. If not, then stop and propose the best classifier for the current subgraph as the overall best.
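The wrapper loop is a plain greedy hill-climb over arc sets; a schematic sketch in which the `accuracy` callback stands in for learning the best admissible classifier and cross-validating it (hypothetical callback and toy accuracy surface, for illustration only):

```python
def forward_selection(class_vars, feature_vars, accuracy):
    """Grow the feature selection arc set one (class, feature) arc at a time,
    keeping the best extension, until the accuracy stops improving."""
    current = frozenset()
    current_acc = accuracy(current)
    while True:
        candidates = [current | {(c, f)}
                      for c in class_vars for f in feature_vars
                      if (c, f) not in current]
        if not candidates:
            return current, current_acc
        best = max(candidates, key=accuracy)
        if accuracy(best) > current_acc:
            current, current_acc = best, accuracy(best)
        else:
            return current, current_acc

# Toy accuracy surface: two genuinely informative arcs, the rest add noise.
useful = {("C1", "F1"), ("C1", "F2")}
acc = lambda arcs: 0.5 + 0.2 * len(arcs & useful) - 0.05 * len(arcs - useful)
arcs, score = forward_selection(["C1"], ["F1", "F2", "F3"], acc)
print(sorted(arcs), round(score, 2))  # [('C1', 'F1'), ('C1', 'F2')] 0.9
```

In practice each call to `accuracy` is expensive (it hides a full learn-and-evaluate cycle), which is the usual cost of wrapper methods.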

Starting with an empty graphical structure without any arcs, as in the above procedure, is known as forward selection. Alternatively, backward elimination can be used, which starts with a full graphical structure from which single arcs are removed in an iterative fashion.

6 Experimental results

In this section we present some preliminary numerical results from our experiments to illustrate the benefits of multi-dimensionality of Bayesian network classifiers.

Since the UCI repository of benchmark data does not include any data sets with multiple class variables, we decided to generate some artificial data sets to test our learning algorithm. These data sets were generated from the oesophageal cancer network (van der Gaag et al., 2002). This network for the staging of cancer of the oesophagus includes 42 random variables, of which 25 are observable feature variables. Three of the network's variables in essence are class variables, which in the current network are summarised in a single output variable. We generated three data sets of increasing size, the smallest comprising 100 samples, using logic sampling. From the generated samples we removed the values of all non-observable variables, except for those of the three class variables.

From the three data sets we constructed fully naive and fully tree-augmented multi-dimensional classifiers. For this purpose, we used the learning algorithm described in the previous sections. For comparison purposes, we further learned naive and tree-augmented Bayesian network classifiers with a compound class variable from the data. For all classifiers, we used a forward-selection wrapper approach with the learning algorithm. The accuracies of the various classifiers were calculated using ten-fold cross-validation. For the multi-dimensional classifiers, we defined their accuracy as the proportion of samples that were classified correctly for all class variables involved.

[Table 1 comprises three sub-tables, one per data-set size; each lists, for the compound naive, multi-dimensional naive, compound TAN and multi-dimensional fully tree-augmented (FTAN) classifiers, the accuracy of the best learned classifier and the number of estimated parameters.]

Table 1: Experimental results for different types of classifier on data sets of different size generated from the oesophageal cancer network.
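The accuracy measure used here — a sample counts as correct only when every class variable is predicted correctly — can be sketched as (illustrative):

```python
def joint_accuracy(predicted, actual):
    """Fraction of samples whose entire vector of class values is correct."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)

predicted = [(0, 1), (1, 1), (0, 0)]
actual    = [(0, 1), (1, 0), (0, 0)]
print(joint_accuracy(predicted, actual))  # two of three samples fully correct
```

Note that this joint measure is stricter than averaging per-class accuracies, since a single wrongly predicted class variable already disqualifies the sample.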

The results from our experiments are summarised in Table 1. The accuracy of the best learned classifier is given in the second column; the third column gives the number of parameter probabilities that were estimated for this classifier. From the table we may conclude that the multi-dimensional classifiers, without exception, outperform their compound counterparts in terms of accuracy. Also the numbers of estimated parameters are considerably smaller for the multi-dimensional classifiers. On average, the learned multi-dimensional classifiers require one-third of the number of parameters of their compound counterparts. The difference is particularly striking for the naive classifiers learned from the 100-sample data set, where the multi-dimensional classifier needs only one-fifth of the number of parameters of the compound one. The smaller numbers of parameters required constitute a considerable advantage of the multi-dimensional classifiers over their compound counterparts, since these parameters typically need to be estimated from relatively small data sets.

Although our preliminary experimental results look promising, we are aware that further experimentation is necessary to substantiate any claims about better performance of our multi-dimensional Bayesian network classifiers.

7 Conclusions and future research

In this paper we introduced a new family of Bayesian network classifiers that include one or more class variables and multiple feature variables that need not be modelled as being dependent upon every class variable. We formulated the learning problem for this family and presented a solution algorithm that is polynomial in the number of variables involved. Our preliminary experimental results served to illustrate the benefits of the multi-dimensionality of our Bayesian network classifiers.

In the near future, we intend to perform a more extensive experimentation study of our learning algorithm, using other data sets and other approaches to feature subset selection. We further wish to investigate other types of model from our family of multi-dimensional classifiers. Possible alternatives include classifiers with k-dependence polytrees over their feature variables. Since the number of class variables usually is rather small, we also would like to investigate the feasibility of classifiers with slightly more complex class subgraphs.


114 L. C. van der Gaag and P. R. de Waal

Page 125: Proceedings of the Third European Workshop on Probabilistic

Dependency networks based classifiers: learning models by using independence tests.

Jose A. Gamez and Juan L. Mateo and Jose M. Puerta
Computing Systems Department
Intelligent Systems and Data Mining Group – i3A
University of Castilla-La Mancha

Albacete, 02071, Spain

Abstract

In this paper we propose using dependency networks (Heckerman et al., 2000), a probabilistic graphical model similar to Bayesian networks, to model classifiers. The main difference between these two models is that in dependency networks cycles are allowed, with the consequence that the automatic learning process is much easier and can be parallelized. These properties make dependency networks a valuable model, especially when large databases have to be handled. Because of these promising characteristics we analyse the usefulness of dependency-network-based Bayesian classifiers. We present an approach that uses independence tests based on the chi-square distribution in order to find relationships between predictive variables. We show that this algorithm is as good as some state-of-the-art Bayesian classifiers, like TAN and an implementation of the BAN model, and has, in addition, other interesting properties like scalability or good quality for visualizing relationships.

1 Introduction

Classification is basically the task of assigning a label to an instance formed by a set of attributes. The automatic approach to this task is one of the main parts of machine learning. Thus, the problem consists of learning classifiers from datasets in which each instance has been previously labelled. Many methods based on various representations, such as decision trees, neural networks, rule-based systems and support vector machines, among others, have been used for solving this problem. A probabilistic approach can also be used for this purpose: Bayesian classifiers (Duda and Hart, 1973; Friedman et al., 1997). Bayesian classifiers make use of a probabilistic graphical model such as Bayesian networks, which have been a very popular way of representing probabilistic relationships between variables in a domain. Thus a Bayesian classifier takes advantage of probability theory, especially the Bayes rule, in conjunction with a graphical representation for qualitative knowledge. The classification task can be expressed as determining the probability of the class variable given the values of all other variables:

P (C|X1 = x1, X2 = x2, · · · , Xn = xn).

After this computation, the class variable's value with the highest probability is returned as the result. Obviously, computing this probability distribution for the class variable can be very difficult and inefficient if it is carried out directly, but based on the independencies stated by the selected model this problem can be simplified enormously.

In this paper we present a new algorithm for the automatic learning of Bayesian classifiers from data, but using dependency network classifiers instead of Bayesian networks. Dependency networks (DNs) were proposed by Heckerman et al. (2000) as a new probabilistic graphical model similar to BNs, but with a key difference: the graph that encodes the model structure does not have to be acyclic. As in BNs, each node in a DN has a conditional probability distribution given its parents in the graph. Although the presence of cycles can be seen as the capability to represent richer models, the price we have to pay is that the usual BN inference algorithms cannot be applied, and Gibbs sampling has to be used in order to recover the joint probability distribution (see (Heckerman et al., 2000)). An initial approach for introducing dependency networks in automatic classification can be found in (Mateo, 2006).

Learning a DN from data is easier than learning a BN, especially because restrictions about introducing cycles do not have to be taken into account. In (Heckerman et al., 2000) a method was presented for learning DNs based on learning the conditional probability distribution of each node independently, by using any classification or regression algorithm to discover the parents of each node. A feature selection scheme can also be used in conjunction. It is clear that this learning approach produces scalable algorithms, because any learning algorithm that follows it can easily be parallelized, just by distributing groups of variables among all the nodes available in a multi-computer system.

This paper is organized as follows. In section 2 dependency networks are briefly described. In section 3 we review how the classification task is performed with probabilistic graphical models. In section 4 we describe the algorithm to learn dependency network classifiers from data by using independence tests. In section 5 we show the experimental results for this algorithm and compare them with state-of-the-art BN classifiers, like the TAN algorithm and an implementation of the BAN model. In section 6 we present our conclusions and some open research lines for the future.

2 Preliminaries

A Bayesian network (Jensen, 2001) B over the domain X = {X1, X2, · · · , Xn} can be defined as a tuple (G, P), where G is a directed acyclic graph and P is a joint probability distribution. Due to the conditional independence constraints encoded in the model, each variable is independent of its non-descendants given its parents. Thus, P can be factorized as follows:

P(X) = ∏_{i=1}^{n} P(Xi|Pai)    (1)

A dependency network is similar to a BN, but the former can have directed cycles in its associated graph. In this model, based on its conditional independence constraints, each variable is independent of all other variables given its parents. A DN D can be represented by (G, P), where G is a directed graph, not necessarily acyclic, and, as in a BN, P = {P(Xi|Pai) : ∀i} is the set of local probability distributions, one for each variable. In (Heckerman et al., 2000) it is shown that inference over DNs, i.e. recovering the joint probability distribution of X from the local probability distributions, is done by means of Gibbs sampling instead of the traditional inference algorithms used for BNs, due to the potential existence of cycles.

A dependency network is said to be consistent if P(X) can be obtained from P via its factorization from the graph (equation 1). By this definition, it can be shown that there exists an equivalence between a consistent DN and a Markov network. This fact suggests an approach for learning DNs from a Markov network learned from data. The problem with this approach is that it can be computationally inefficient in many cases.

Due to this inconvenience, consistent DNs become very restrictive, so in (Heckerman et al., 2000) the authors defined general DNs, in which consistency of the joint probability distribution is not required. General DNs are interesting for automatic learning due to the fact that each local probability distribution can be learned independently, although this local way of processing can lead to inconsistencies in the joint probability distribution. Furthermore, structural inconsistencies (Xi ∈ pa(Xj) but not Xj ∈ pa(Xi)) can be found in general DNs. Nonetheless, in (Heckerman et al., 2000) the authors argued that these inconsistencies can disappear, or at least be reduced to a minimum, when a reasonably large dataset is used to learn from. In this work we only consider general DNs.
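A general DN can be checked for such structural inconsistencies directly from its parent sets. The small helper below is our own illustration (function name and toy graphs are not from the paper); it returns the offending ordered pairs:

```python
def structural_inconsistencies(parents):
    """Ordered pairs (Xi, Xj) with Xi in pa(Xj) but Xj not in pa(Xi).
    Such pairs may appear in a general DN learned locally,
    never in a consistent one."""
    return {(i, j) for j, ps in parents.items()
            for i in ps if j not in parents.get(i, ())}

# Symmetric structure: no structural inconsistency.
print(structural_inconsistencies({"A": {"B"}, "B": {"A"}}))   # set()
# B was selected as a parent of A, but not vice versa.
print(structural_inconsistencies({"A": {"B"}, "B": set()}))
```

In a parallelized learner, a post-pass like this could either symmetrize the graph or simply report the pairs, as suggested for expert inspection.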

116 J. A. Gámez, J. L. Mateo, and J. M. Puerta


An important concept in probabilistic graphical models is the Markov blanket. For any variable, it is defined as the set of variables that makes it independent from the rest of the variables in the graph. In a BN the Markov blanket of a variable is formed by its parents, its children and the parents of its children. However, in a DN, the Markov blanket of a variable is exactly its parent set. That is the main reason why DNs are useful for visualizing relationships among variables: these relationships are shown explicitly.
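The two blanket definitions differ only in how they are read off the graph. As a small illustration (the toy graphs and function names are ours, not from the paper):

```python
def bn_markov_blanket(parents, x):
    """Markov blanket in a BN: parents, children and the children's
    other parents (spouses) of x."""
    mb = set(parents[x])
    children = [v for v, ps in parents.items() if x in ps]
    mb.update(children)
    for c in children:
        mb.update(p for p in parents[c] if p != x)
    return mb

def dn_markov_blanket(parents, x):
    """Markov blanket in a DN: exactly the parent set of x."""
    return set(parents[x])

# Toy DAG: C -> X1, C -> X2, X1 -> X2.
bn = {"C": [], "X1": ["C"], "X2": ["C", "X1"]}
print(bn_markov_blanket(bn, "X1"))   # parents + children + spouses = {C, X2}

# A DN encoding the same blankets makes them explicit as direct parents.
dn = {"C": ["X1", "X2"], "X1": ["C", "X2"], "X2": ["C", "X1"]}
print(dn_markov_blanket(dn, "X1"))   # same set, read directly off the graph
```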

3 Probabilistic graphical models in classification

As has been said before, we can use Bayesian models in order to perform a classification task. Within this area, the most classical Bayesian classifier is the one called naive (Duda and Hart, 1973; Langley et al., 1992). This model considers all predictive attributes independent given the class variable. It has been shown that this classifier is one of the most effective and competitive ones, in spite of its very restrictive model, since it does not allow dependencies between predictive variables. This model is used in some applications and is employed as a starting point for other Bayesian classifiers. In the augmented naive Bayes family of models, the learning algorithm begins with the structure proposed by the naive Bayesian classifier and then tries to find dependencies among variables using a variety of strategies. One of these strategies is to look for relationships between predictive variables represented by a simple structure like a tree, as is done in the TAN (Tree Augmented Naive Bayes) algorithm (Friedman et al., 1997). Another is to adapt a learning algorithm employed in machine learning with general Bayesian networks. In this case the learning method is modified in order to take into account the relationship between every variable and the class.

Apart from this approach to learning Bayesian classifiers, another one must be mentioned, based on the Markov blanket concept. These methods focus their efforts on searching for the set of variables that will form the class variable's Markov blanket. Nonetheless, it has been shown that methods based on extending the naive Bayes classifier, in general, outperform the ones based on the Markov blanket.

Once the classifier model has been learned, it is time to use inference in order to find which class value has to be assigned to the values of a given instance. To accomplish this task the MAP (maximum a posteriori) hypothesis is used, which returns as result the value of the class variable that maximizes the probability given the values of the predictive variables. This is represented by the following expression:

cMAP = arg max_{c∈ΩC} p(c|x1, . . . , xn)
     = arg max_{c∈ΩC} p(x1, . . . , xn|c) · p(c) / p(x1, . . . , xn)
     = arg max_{c∈ΩC} p(c, x1, . . . , xn)

Thus, the classification problem is reduced to computing the joint probability for every value of the class variable together with the values of the predictive variables given by the instance to be classified.
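In code, this reduction is just an argmax over unnormalised joint values. The toy joint below is a made-up example, not taken from the paper:

```python
# Hypothetical joint p(c, x) over a binary class and one binary feature.
joint = {(0, 0): 0.30, (0, 1): 0.20,
         (1, 0): 0.10, (1, 1): 0.40}

def c_map(joint, class_values, x):
    """argmax_c p(c, x): the evidence p(x) is constant in c, so the
    posterior and the joint give the same winner."""
    return max(class_values, key=lambda c: joint[(c, x)])

print(c_map(joint, [0, 1], 1))   # -> 1 (0.40 beats 0.20)
print(c_map(joint, [0, 1], 0))   # -> 0 (0.30 beats 0.10)
```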

When we deal with a classifier based on a Bayesian network, we can use the joint probability factorization shown in equation 1 in order to recover these probabilities. In principle, inference with dependency networks must be done by means of Gibbs sampling, but because we focus on classification, and under the assumption of complete data (i.e. no missing data in either the training set or the test set), we can avoid Gibbs sampling. This result comes from the fact that we can use modified ordered Gibbs sampling as designed in (Heckerman et al., 2000). This way of doing inference over dependency networks is based on decomposing the inference task into a set of inference tasks on single variables. For example, consider a simple classifier based on a dependency network with two predictive variables that show a dependency between them, that is, with the following graphical structure: X1 ←→ X2; C −→ X1; C −→ X2. Its probability model will be:

P (C, X1, X2) = P (C)P (X1|C, X2)P (X2|C, X1).

Dependency networks based classifiers: learning models by using independence tests 117


With this method the joint probability can be obtained by computing each term separately. The determination of the first term does not require Gibbs sampling. The second and third terms would in general be determined by using Gibbs sampling, but in this case we know the values of all their conditioning variables, so they too can be determined directly, by looking at their conditional probability tables. If we assume that the class variable is determined before the other variables (as is the case here), by this procedure we can compute any joint probability while avoiding the Gibbs sampling process.
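Under complete data, the three factors of P(C, X1, X2) above are plain table lookups. The CPT numbers below are invented for illustration; note that a general DN's product need not normalise to one, which is harmless for the argmax:

```python
# P(C) and the local tables P(X1 = 1 | C, X2), P(X2 = 1 | C, X1);
# all probability values here are made up for the example.
p_c    = {0: 0.6, 1: 0.4}
p_x1_1 = {(0, 0): 0.2, (0, 1): 0.7, (1, 0): 0.5, (1, 1): 0.9}
p_x2_1 = {(0, 0): 0.3, (0, 1): 0.8, (1, 0): 0.4, (1, 1): 0.6}

def joint(c, x1, x2):
    """P(C) P(X1|C,X2) P(X2|C,X1): every conditioning value is
    observed, so each factor is one lookup and no Gibbs sampling
    is needed."""
    f1 = p_x1_1[(c, x2)] if x1 else 1 - p_x1_1[(c, x2)]
    f2 = p_x2_1[(c, x1)] if x2 else 1 - p_x2_1[(c, x1)]
    return p_c[c] * f1 * f2

def classify(x1, x2):
    return max(p_c, key=lambda c: joint(c, x1, x2))

print(classify(1, 1))   # 0.6*0.7*0.8 = 0.336 beats 0.4*0.9*0.6 = 0.216 -> 0
```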

4 Finding dependencies

In this section we present our algorithm to build a classifier model from data based on dependency networks. Our idea is to use independence tests in order to find dependencies between predictive variables Xi and Xj, but taking into account the class variable, and in this way extend the naive Bayes structure. It is known that the statistic

2 ·Nc · I(Xi; Xj |C = c)

follows a χ2 distribution with (|Xi|−1) · (|Xj|−1) degrees of freedom under the null hypothesis of independence, where Nc is the number of input instances in which C = c and I(Xi; Xj |C = c) is the mutual information between variables Xi and Xj when the class variable C is equal to c. Using this expression, for any variable all other variables that are (in)dependent given the class can be found.

But this set is not what we are looking for; we need a set of variables that makes the variable under study independent from the others, that is, its Markov blanket. So we try to discover this set by carrying out an iterative process for every predictive variable independently, in which at each step the not-yet-chosen variable that shows the strongest dependency with the variable under study (Xi) is selected as a new parent. This selection is done considering all the parents previously chosen. Thus the statistic actually used is

2 ·Nc · I(Xi; Xj |C = c,Pai),

where Pai is the current set of Xi's parents minus C. We assess the goodness of every candidate parent by the difference between the percentile of the χ2 distribution (with α = 0.025) and the statistic value. Here the degrees of freedom must be

df = (|Xi| − 1) · (|Xj| − 1) · ∏_{Pa∈Pai} |Pa|.

Obviously this process finishes when all candidate parents not yet selected show independence with Xi according to this test. In order to make the search more efficient, every candidate parent that appears independent at any step of the algorithm is rejected and is not taken into account in subsequent iterations. The reason for rejecting these variables is just efficiency, although we know that any of these rejected variables could become dependent in later iterations.
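A minimal version of the test statistic can be sketched as follows. The function names are ours and the data are synthetic; in the algorithm the statistic is computed per class value c (and per parent configuration) on the Nc matching cases, and the constant 5.024 below is the χ2 percentile for α = 0.025 and one degree of freedom:

```python
import math
from collections import Counter

def mutual_info(pairs):
    """I(X;Y) in nats from a list of (x, y) samples."""
    n = len(pairs)
    nxy = Counter(pairs)
    nx = Counter(x for x, _ in pairs)
    ny = Counter(y for _, y in pairs)
    return sum(c / n * math.log(c * n / (nx[x] * ny[y]))
               for (x, y), c in nxy.items())

def g_statistic(pairs):
    """2 * N * I(X;Y): chi-square with (|X|-1)(|Y|-1) df under
    independence; conditioning on parents multiplies the df by
    the product of their cardinalities."""
    return 2 * len(pairs) * mutual_info(pairs)

CRIT = 5.024  # chi-square upper 0.025 percentile, 1 degree of freedom

dep = [(0, 0)] * 40 + [(1, 1)] * 40 + [(0, 1)] * 10 + [(1, 0)] * 10
ind = [(0, 0)] * 25 + [(0, 1)] * 25 + [(1, 0)] * 25 + [(1, 1)] * 25
print(g_statistic(dep) - CRIT > 0)   # True: dependence detected
print(g_statistic(ind) - CRIT > 0)   # False: independence
```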

Figure 1 shows the pseudo-code of this algorithm, called ChiSqDN.

Initialize structure to Naive Bayes
For each variable Xi:
    Pai = ∅
    Cand = X \ {Xi}                  // candidate parents
    While Cand ≠ ∅:
        For each variable Xj ∈ Cand:
            Let val be the assessment for Xj by the χ2 test
            If val ≤ 0 Then
                Remove Xj from Cand
        If Cand = ∅ Then break       // guard: every candidate proved independent
        Let Xmax be the best variable found
        Pai = Pai ∪ {Xmax}
        Remove Xmax from Cand
    For each variable Pai ∈ Pai:
        Make new link Pai → Xi

Figure 1: Pseudo-code for ChiSqDN algorithm.
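The loop of Figure 1 can be turned into a short runnable sketch. Here `assess` stands in for the χ2 comparison (statistic minus percentile, positive meaning dependence); its toy implementation below just encodes one known dependency, so only the control flow is illustrated:

```python
def chisqdn_parents(variables, assess):
    """Greedy per-variable parent selection as in Figure 1.
    assess(xi, xj, pa) > 0 is read as 'xj dependent on xi given pa'."""
    parents = {}
    for xi in variables:
        pa, cand = set(), set(variables) - {xi}
        while cand:
            scores = {xj: assess(xi, xj, pa) for xj in cand}
            cand -= {xj for xj, v in scores.items() if v <= 0}
            if not cand:                 # every candidate looked independent
                break
            best = max(cand, key=scores.get)
            pa.add(best)
            cand.discard(best)
        parents[xi] = pa
    return parents

# Toy assessment: A and B depend on each other, C on nothing.
def assess(xi, xj, pa):
    return 1.0 if frozenset((xi, xj)) == frozenset(("A", "B")) and not pa else -1.0

print(chisqdn_parents(["A", "B", "C"], assess))
# {'A': {'B'}, 'B': {'A'}, 'C': set()}
```

Because each variable's loop touches only its own parent set, the outer `for` is the natural unit to distribute across processors, which is the scalability property the paper emphasises.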

It is clear that determining the degrees of freedom correctly is critical in order to perform this test properly. The scheme outlined before is the general way of assessing this parameter; nonetheless, in some cases, especially for deterministic or almost deterministic relationships, tests carried out using degrees of freedom obtained in this way can fail. Deterministic relationships are pretty common in automatic learning when input cases are relatively few. Examining the datasets used in the experiments (table 1), it can be seen that some of them have a low number of instances. So, to make the tests in our algorithm behave properly, we use a method for computing degrees of freedom and handling this kind of relationships proposed in (Yilmaz et al., 2002). Basically, considering discrete variables and their conditional probability tables (CPTs), this proposal reduces the degrees of freedom based on the level of determinism, and this level is determined by the number of columns or rows which are full of zeros.

For example, if we wanted to compute the degrees of freedom for variables Xi and Xj, whose probability distribution is shown in figure 2, we should use (3−1) · (3−1) = 4 as the value for this parameter, as both variables have 3 states. Nonetheless, it is clear that the relation between these two variables is not probabilistic but deterministic, because it is similar to considering that variable Xi has only one state. Thus, if we use this new way of computing degrees of freedom, we have to take into account that this table has two rows and one column full of zeros. Then the degrees of freedom will be (3−2−1) · (3−1−1) = 0.

              Xj          MT
        13    16     0    29
Xi       0     0     0     0
         0     0     0     0
MT      13    16     0

Figure 2: Probability table for variables Xi and Xj.

In these cases, i.e. when the degrees of freedom are zero, it does not make sense to perform the test, because it then always holds. In (Yilmaz et al., 2002) the authors propose skipping the test and marking the relation so that it can be handled by human experts. In our case, when the algorithm yields degrees of freedom equal to zero, we prefer an automatic approach, which is to accept dependence between the tested variables if the statistic value is greater than zero.
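The determinism-adjusted degrees of freedom, together with the zero-df fallback just described, can be sketched like this (function names are ours; the table is the one from Figure 2):

```python
def adjusted_dof(table):
    """(r - zero_rows - 1) * (c - zero_cols - 1), never negative,
    following the reduction of Yilmaz et al. (2002)."""
    zero_rows = sum(all(v == 0 for v in row) for row in table)
    zero_cols = sum(all(v == 0 for v in col) for col in zip(*table))
    r, c = len(table), len(table[0])
    return max((r - zero_rows - 1) * (c - zero_cols - 1), 0)

def dependent(statistic, table):
    """Zero df: accept dependence whenever the statistic is positive;
    otherwise the usual chi-square comparison would be applied."""
    if adjusted_dof(table) == 0:
        return statistic > 0
    raise NotImplementedError("regular chi-square test goes here")

fig2 = [[13, 16, 0], [0, 0, 0], [0, 0, 0]]
print(adjusted_dof(fig2))        # (3-2-1)*(3-1-1) = 0
print(dependent(4.2, fig2))      # True: positive statistic, zero df
```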

5 Experimental results

In order to evaluate our proposed algorithm and to measure its quality, we have selected a set of datasets from the UCI repository (Newman et al., 1998). These datasets are described in table 1. We have preprocessed them in order to remove missing values and to discretize continuous variables. Missing values for each variable have been replaced by the mean or the mode, depending on whether the variable is continuous or discrete respectively. For discretization we have used the algorithm proposed in (Fayyad and Irani, 1993), which is especially suggested for classification.

Datasets inst. attrib. |C| cont.? missing?

australian 690 15 2 yes no

heart 270 14 2 yes no

hepatitis 155 20 2 yes yes

iris 150 5 3 yes no

lung 32 57 3 no yes

pima 768 9 2 yes no

post-op 90 9 3 yes yes

segment 2310 20 7 yes no

soybean 683 36 19 no yes

vehicle 846 19 4 yes no

vote 435 17 2 no yes

Table 1: Description of the datasets used in the experiments.

We have chosen a pair of state-of-the-art algorithms in Bayesian classification which allow relationships between predictive variables. One of these algorithms is the one known as TAN (Tree Augmented Naive Bayes). The other employs a general Bayesian network in order to represent relationships between variables and is based on the BAN model (Bayesian network Augmented Naive Bayes (Friedman et al., 1997)). In our experiments we use the B algorithm (Buntine, 1994) as its learning algorithm, with the BIC metric (Schwarz, 1978) to guide the search. Both algorithms use mutual information as the base statistic to determine relationships: directly in the TAN algorithm, and in a penalized version in the BIC metric.

The experimentation process consists of running each algorithm on each database in a 5x2 cross validation. This kind of validation is an extension of the well-known k-fold cross validation in which a 2-fold cv is performed five times and in each repetition a different partition is used. With this process we have the validation power of the 10-fold cv but with less correlation between the individual results (Dietterich, 1998).
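A 5x2 CV driver is only a few lines; `fit` and `score` are placeholders for any learner and accuracy function (this generic harness is our sketch, not code from the paper):

```python
import random

def five_by_two_cv(data, fit, score, seed=0):
    """Five repetitions of 2-fold CV, each on a fresh random partition
    (Dietterich, 1998); returns the ten fold scores."""
    rng = random.Random(seed)
    scores = []
    for _ in range(5):
        d = list(data)
        rng.shuffle(d)
        half = len(d) // 2
        a, b = d[:half], d[half:]
        for train, test in ((a, b), (b, a)):
            scores.append(score(fit(train), test))
    return scores

# Dummy learner: predict the majority label of the training fold.
def fit_majority(train):
    ones = sum(y for _, y in train)
    return 1 if 2 * ones >= len(train) else 0

def accuracy(model, test):
    return sum(y == model for _, y in test) / len(test)

data = [(i, i % 2) for i in range(100)]
accs = five_by_two_cv(data, fit_majority, accuracy)
print(len(accs))   # 10
```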

Having performed our experiments in that way, the results for each classifier are given in table 2.

TAN BAN+BIC ChiSqDN

australian 0.838841 0.866957 0.843478

heart 0.819259 0.826667 0.823704

hepatitis 0.852930 0.872311 0.865801

iris 0.948000 0.962667 0.962667

lung 0.525000 0.525000 0.525000

pima 0.772135 0.773177 0.744531

post op 0.653333 0.673333 0.673333

segment 0.936970 0.909437 0.931169

soybean 0.898661 0.906893 0.903950

vehicle 0.719385 0.697163 0.698345

vote 0.945299 0.944840 0.931045

Table 2: Classification accuracy estimated for the algorithms tested.

5.1 Analysis of the results

The analysis of the previous results consists of carrying out a statistical test. The test selected is the paired Wilcoxon signed-ranks test, a non-parametric test which examines two samples in order to decide whether the differences shown by them can be assumed to be not significant. In our case, if this test holds, then we can conclude that the classifiers which yield the analysed results have similar classification power.

This test is done with a level of significance of 0.05 (α = 0.05), and it shows that both reference classifiers do not differ significantly from our classifier learned with the ChiSqDN algorithm. When we compare our algorithm with TAN we obtain a p-value of 0.9188, and for the comparison with the BAN model the p-value is 0.1415. Therefore the ChiSqDN algorithm is as good as the Bayesian classifiers considered, in spite of the approximation introduced in the factorization of the joint probability distribution due to the use of general dependency networks. Nonetheless, also as a consequence of the use of dependency networks, we can take advantage of the ease of learning and the possibility of carrying out this process in parallel without introducing an extra workload.
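The signed-rank statistic itself is simple to compute (p-values then come from its null distribution, e.g. via `scipy.stats.wilcoxon`); the following is our sketch of the standard procedure, not code from the paper:

```python
def wilcoxon_w(xs, ys):
    """Paired signed-rank statistic: drop zero differences, rank the
    absolute differences (ties share the average rank) and return the
    smaller of the positive- and negative-rank sums."""
    d = [x - y for x, y in zip(xs, ys) if x != y]
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):        # average rank over the tie block
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_pos = sum((r for di, r in zip(d, ranks) if di > 0), 0.0)
    w_neg = sum((r for di, r in zip(d, ranks) if di < 0), 0.0)
    return min(w_pos, w_neg)

print(wilcoxon_w([1, 2, 3], [1, 1, 1]))   # 0.0: all differences positive
```

Applied to the paired per-dataset accuracies of table 2, a small W indicates a systematic difference between two classifiers, while a W near the maximum indicates none.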

Apart from these characteristics that can be exploited from a dependency network, we can focus on its descriptive capability, i.e. how good the graph of each model is at showing relations over the domain. Taking the pima dataset as an example, figures 3 and 4 show the graphs yielded by the algorithms BAN+BIC and ChiSqDN respectively.

Obviously, for every variable in these graphs, those that form its Markov blanket are the most informative or most related. The difference is that for a Bayesian network graph the Markov blanket set is formed by the parents, children and parents of children of every variable, but for a dependency network graph the Markov blanket set is all variables directly connected. Thus in tables 3 and 4 we show the Markov blanket extracted from the corresponding graph for every variable. Evidently the sets do not match, due to the automatic learning process, but they are quite similar. Apart from that, it is clear that discovering these relationships from the dependency network graph is easier, especially for people not used to dealing with Bayesian networks, than doing it from a Bayesian network. We think that this point is an important feature of dependency networks because it can lead to a tuning process helped by human experts after the automatic learning.

Variable Markov blanket

mass skin, class

skin age, mass, insu, class

insu skin, plas, class

plas insu, class

age pres, preg, skin, class

pres age, class

preg age, class

pedi class

Table 3: Markov blanket for each predictive variable from the BAN+BIC classifier.


[Figure: network over the nodes preg, plas, pres, skin, insu, mass, pedi, age and class.]

Figure 3: Network yielded by BAN algorithm with BIC metric for the pima dataset.

[Figure: network over the nodes preg, plas, pres, skin, insu, mass, pedi, age and class.]

Figure 4: Network yielded by ChiSqDN algorithm for the pima dataset.

Variable Markov blanket

mass skin, pres, class

skin insu, mass, age, class

insu plas, skin, class

plas insu, class

age pres, preg, skin, class

pres age, mass, preg, class

preg age, pres, class

pedi class

Table 4: Markov blanket for each predictive variable from the ChiSqDN classifier.

6 Conclusions and Future Work

We have shown how dependency networks can be used as a model for Bayesian classifiers beyond their initial purpose. We have presented a new learning algorithm which uses independence tests in order to find the structural model for a classifier based on dependency networks. By means of the analysis developed in this paper, we have shown that the proposed classifier model based on dependency networks can be as good as some classifiers based on Bayesian networks. Besides this important feature, classifiers obtained by our algorithm are visually easier to understand: relationships automatically learned by the algorithm are easily understood if the classifier is modelled with a dependency network. This characteristic is especially valuable because it is often difficult for people to discover all the relationships inside a Bayesian network if they do not know the underlying theory, whereas these relationships are clearly shown in a dependency network. Besides, we cannot forget the main contributions of dependency networks: ease of learning and the possibility of performing this learning in parallel.


The ChiSqDN algorithm shown here learns structural information for every variable independently, so we plan to study a parallelized version in the future. We will also try to improve this algorithm, in terms of complexity, by using simpler statistics. Our idea is to use an approximation of the mutual information over multiple variables using a decomposition into simpler terms (Roure Alcobe, 2004).

Acknowledgments

This work has been partially supported by the Spanish Ministerio de Ciencia y Tecnología, the Junta de Comunidades de Castilla-La Mancha and the European Social Fund under projects TIN2004-06204-C03-03 and PBI-05-022.

References

Wray L. Buntine. 1994. Operations for Learning with Graphical Models. Journal of Artificial Intelligence Research, 2:159–225.

Thomas G. Dietterich. 1998. Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation, 10(7):1895–1923.

R. O. Duda and P. E. Hart. 1973. Pattern classification and scene analysis. John Wiley and Sons.

U. Fayyad and K. Irani. 1993. Multi-interval discretization of continuous-valued attributes for classification learning. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 1022–1029. Morgan Kaufmann.

Nir Friedman, Dan Geiger, and Moises Goldszmidt. 1997. Bayesian Network Classifiers. Machine Learning, 29(2-3):131–163.

David Heckerman, David Maxwell Chickering, Christopher Meek, Robert Rounthwaite, and Carl Kadie. 2000. Dependency Networks for Inference, Collaborative Filtering, and Data Visualization. Journal of Machine Learning Research, 1:49–75.

Finn V. Jensen. 2001. Bayesian networks and decision graphs. Springer.

P. Langley, W. Iba, and K. Thompson. 1992. An Analysis of Bayesian Classifiers. In Proceedings of the 10th National Conference on Artificial Intelligence, pages 223–228.

Juan L. Mateo. 2006. Dependency networks: Use in automatic classification and Estimation of Distribution Algorithms (in Spanish). University of Castilla-La Mancha.

D.J. Newman, S. Hettich, C.L. Blake, and C.J. Merz. 1998. UCI Repository of machine learning databases. http://www.ics.uci.edu/~mlearn/MLRepository.html.

Josep Roure Alcobe. 2004. An approximated MDL score for learning Bayesian networks. In Proceedings of the Second European Workshop on Probabilistic Graphical Models 2004 (PGM'04).

Gideon Schwarz. 1978. Estimating the dimension of a model. Annals of Statistics, 6(2):461–464.

Yusuf Kenan Yilmaz, Ethem Alpaydin, Levent Akin, and Taner Bilgi. 2002. Handling of Deterministic Relationships in Constraint-based Causal Discovery. In First European Workshop on Probabilistic Graphical Models (PGM'02), pages 204–211.


Unsupervised naive Bayes for data clustering with mixtures of truncated exponentials

Jose A. Gamez
Computing Systems Department
Intelligent Systems and Data Mining Group – i3A
University of Castilla-La Mancha
Albacete, 02071, Spain

Rafael Rumí and Antonio Salmerón
Department of Statistics and Applied Mathematics
Data Analysis Group
University of Almería
Almería, 04120, Spain

Abstract

In this paper we propose a naive Bayes model for unsupervised data clustering, wherethe class variable is hidden. The feature variables can be discrete or continuous, as theconditional distributions are represented as mixtures of truncated exponentials (MTEs).The number of classes is determined using the data augmentation algorithm. The proposedmodel is compared with the conditional Gaussian model for some real world and syntheticdatabases.

1 Introduction

Unsupervised classification, frequently known as clustering, is a descriptive task with many applications (pattern recognition, . . . ) which can also be used as a preprocessing task in the so-called knowledge discovery from databases (KDD) process (population segmentation and outlier identification).

Cluster analysis or clustering (Anderberg,1973; Jain et al., 1999) is understood as adecomposition or partition of a data set intogroups in such a way that the objects in onegroup are similar to each other but as differentas possible from the objects in other groups.Thus, the main goal of cluster analysis is todetect whether or not the general populationis heterogeneous, that is, whether the data fallinto distinct groups.

Different types of clustering algorithms can be found in the literature, depending on the type of approach they follow. Probably the three main approaches are: partition-based clustering, hierarchical clustering, and probabilistic model-based clustering. Of these, the first two approaches yield a hard clustering, in the sense that clusters are exclusive, while the third one yields a soft clustering, that is, an object can belong to more than one cluster following a probability distribution.

In this paper we focus on probabilistic clustering, and from this point of view the data clustering problem can be defined as the inference of a probability distribution for the given database. In this work we allow nominal and numeric variables in the data set, and the novelty of the approach lies in the use of MTE (Mixture of Truncated Exponentials) (Moral et al., 2001) densities to model the numeric variables. This model has been shown to be a clear alternative to the use of Gaussian models in different tasks (inference, classification and learning), but to our knowledge its applicability to clustering remains to be studied. This is precisely the goal of this paper: to carry out an initial study of applying the MTE model to the data clustering problem.


The paper is structured as follows: first, Sections 2 and 3 give, respectively, some preliminaries about probabilistic clustering and mixtures of truncated exponentials. Section 4 describes the method we have developed in order to obtain a clustering from data by using mixtures of truncated exponentials to model numerical variables. Experiments over several datasets are described in Section 5. Finally, and bearing in mind that this is a first approach, in Section 6 we discuss some improvements and applications of our method, and our conclusions are presented.

2 Probabilistic model-based clustering

The usual way of modeling data clustering in a probabilistic approach is to add a hidden random variable to the data set, i.e., a variable whose value is missing in all the records. This hidden variable, normally referred to as the class variable, will reflect the cluster membership for every case in the data set.

From the previous setting, probabilistic model-based clustering is modeled as a mixture of models (see e.g. (Duda et al., 2001)), where the states of the hidden class variable correspond to the components of the mixture (the number of clusters), and the multinomial distribution is used to model discrete variables while the Gaussian distribution is used to model numeric variables. In this way we move to a problem of learning from unlabeled data: usually the EM algorithm (Dempster et al., 1977) is used to carry out the learning task when the graphical structure is fixed, and structural EM (Friedman, 1998) when the graphical structure also has to be discovered (Peña et al., 2000). In this paper we focus on the simplest model with fixed structure, the so-called naive Bayes structure (Fig. 1), where the class is the only root variable and all the attributes are conditionally independent given the class.

Figure 1: Graphical structure of the model. Y1, . . . , Yn represent discrete/nominal variables while Z1, . . . , Zm represent numeric variables.

Once we decide that our graphical model is fixed, the clustering problem reduces to taking a dataset of instances and a previously specified number of clusters (k), and working out each cluster's distribution (multinomial or Gaussian) and the population distribution between the clusters. To obtain these parameters the EM (expectation-maximization) algorithm is used. The EM algorithm works iteratively by carrying out the following two steps at each iteration:

• Expectation: estimate the distribution of the hidden variable C, conditioned on the current setting of the parameter vector θk (i.e. the parameters needed to specify the distributions of the non-hidden variables).

• Maximization: taking into account the new distribution of C, use maximum likelihood to obtain a new set of parameters θk+1 from the observed data.

The algorithm starts from a given initial starting point (e.g. random parameters for C or θ) and under fairly general conditions it is guaranteed to converge to (at least) a local maximum of the log-likelihood function.
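As a concrete illustration of the two EM steps (not the paper's MTE procedure; the function name, the deterministic initialisation at the sample extremes, and the Gaussian choice are our own simplifications), a minimal EM loop for a one-dimensional two-component Gaussian mixture might look as follows:

```python
import math

def em_gaussian_mixture(xs, iters=50):
    """Minimal EM for a 1-D mixture of two Gaussians (illustrative sketch)."""
    k = 2
    mu = [min(xs), max(xs)]          # crude deterministic initialisation
    var = [1.0] * k
    w = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibilities r[i][c] proportional to w_c * N(x_i | mu_c, var_c)
        r = []
        for x in xs:
            dens = [w[c] / math.sqrt(2 * math.pi * var[c])
                    * math.exp(-(x - mu[c]) ** 2 / (2 * var[c])) for c in range(k)]
            s = sum(dens)
            r.append([d / s for d in dens])
        # M-step: maximum-likelihood re-estimates from the soft counts
        for c in range(k):
            n_c = sum(ri[c] for ri in r)
            w[c] = n_c / len(xs)
            mu[c] = sum(ri[c] * x for ri, x in zip(r, xs)) / n_c
            var[c] = max(1e-6, sum(ri[c] * (x - mu[c]) ** 2
                                   for ri, x in zip(r, xs)) / n_c)
    return w, mu, var
```

On well-separated data the estimated means converge to the per-group sample means, which is exactly the local maximum of the log-likelihood the text refers to.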

Iterative approaches have been described in the literature (Cheeseman and Stutz, 1996) in order to also discover the number of clusters (components of the mixture). The idea is to start with k = 2 and use EM to fit a model; then the algorithm tries k = 3, 4, . . . while the log-likelihood improves by adding a new component (cluster) to the mixture. Of course some criterion has to be used in order to prevent overfitting.

To finish this section, we discuss how to evaluate the obtained clustering. Obviously, a clustering is useful if it produces some interesting insight into the problem we are analyzing. However, in practice, and especially in order

124 J. A. Gámez, R. Rumí, and A. Salmerón


to test new algorithms, we have to use a different way of measuring the quality of a clustering. In this paper, because we produce a probabilistic description of the dataset, we use the log-likelihood (logL) of the dataset (D) given the model to score a given clustering:

logL = ∑_{x∈D} log p(x | θ, C).
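Computed in code, this score is straightforward; the `density` argument below is a hypothetical stand-in for whatever model has been fitted:

```python
import math

def log_likelihood(data, density):
    """logL = sum over x in D of log p(x | model); `density` is any fitted p(x)."""
    return sum(math.log(density(x)) for x in data)
```

For example, a uniform density on an interval of length 2 assigns each point density 0.5, so four points score 4·log(0.5).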

3 Mixtures of Truncated Exponentials

The MTE model is an alternative to Gaussian models, and can be seen as a generalization of discretization models. It is defined by its corresponding potential and density as follows (Moral et al., 2001):

Definition 1. (MTE potential) Let X be a mixed n-dimensional random vector. Let Y = (Y1, . . . , Yd) and Z = (Z1, . . . , Zc) be the discrete and continuous parts of X, respectively, with c + d = n. A function f : Ω_X → R_0^+ is a Mixture of Truncated Exponentials potential (MTE potential) if one of the next conditions holds:

i. Y = ∅ and f can be written as

f(x) = f(z) = a_0 + ∑_{i=1}^{m} a_i exp( ∑_{j=1}^{c} b_i^(j) z_j )   (1)

for all z ∈ Ω_Z, where a_i, i = 0, . . . , m and b_i^(j), i = 1, . . . , m, j = 1, . . . , c are real numbers.

ii. Y = ∅ and there is a partition D_1, . . . , D_k of Ω_Z into hypercubes such that f is defined as

f(x) = f(z) = f_i(z) if z ∈ D_i,

where each f_i, i = 1, . . . , k can be written in the form of equation (1).

iii. Y ≠ ∅ and for each fixed value y ∈ Ω_Y, f_y(z) = f(y, z) can be defined as in ii.

Definition 2. (MTE density) An MTE potential f is an MTE density if

∑_{y∈Ω_Y} ∫_{Ω_Z} f(y, z) dz = 1.

A univariate MTE density f(x), for a continuous variable X, can be learnt from a sample as follows (Romero et al., 2006):

1. Select the maximum number, s, of intervals to split the domain into.

2. Select the maximum number, t, of exponential terms on each interval.

3. Fit a Gaussian kernel density to the data.

4. Split the domain of X into k ≤ s pieces, according to changes in concavity/convexity or increase/decrease in the kernel density.

5. In each piece, fit an MTE potential with m ≤ t exponential terms to the kernel density by iterative least squares estimation.

6. Update the MTE coefficients so that the density integrates to one.
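The least-squares fit in step 5 can be approximated very simply for a single exponential term f(x) = a0 + a1·exp(a2·x): grid-search the exponent a2 and solve the remaining linear coefficients in closed form. The function below is our own illustrative sketch, not the iterative procedure of Romero et al.:

```python
import math

def fit_mte_term(xs, ys, a2_grid):
    """Fit f(x) = a0 + a1*exp(a2*x) to sample points (xs, ys):
    grid-search a2, then solve a0 and a1 via the linear normal equations."""
    n = len(xs)
    best = None
    for a2 in a2_grid:
        e = [math.exp(a2 * x) for x in xs]
        se, sy = sum(e), sum(ys)
        see = sum(v * v for v in e)
        sey = sum(v * y for v, y in zip(e, ys))
        det = n * see - se * se          # normal-equation determinant
        if abs(det) < 1e-12:
            continue
        a1 = (n * sey - se * sy) / det
        a0 = (sy - a1 * se) / n
        sse = sum((a0 + a1 * ev - y) ** 2 for ev, y in zip(e, ys))
        if best is None or sse < best[0]:
            best = (sse, (a0, a1, a2))
    return best[1], best[0]
```

When the grid contains the true exponent, the linear solve recovers the remaining coefficients essentially exactly.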

When learning an MTE density f(y) from a sample, if Y is discrete, the procedure is:

1. For each state y_i of Y, compute the maximum likelihood estimator of p(y_i), which is #y_i / (sample size).

A conditional MTE density can be specified by dividing the domain of the conditioning variables and specifying an MTE density for the conditioned variable for each configuration of splits of the conditioning variables (Romero et al., 2006).

In this paper, the conditional MTE densities needed are of the form f(x|y), where Y is the class variable, which is discrete, and X is either a discrete or continuous variable, so the potential is composed of an MTE potential defined for X for each state of the conditioning variable Y.


Example 1. The function f defined as

f(x|y) =
  5 + e^{2x} + e^{x},   y = 0, 0 < x < 2,
  1 + e^{−x},           y = 0, 2 ≤ x < 3,
  1/5 + e^{−2x},        y = 1, 0 < x < 2,
  4e^{3x},              y = 1, 2 ≤ x < 3

is an MTE potential for a conditional x|y relation, with X continuous, x ∈ (0, 3]. Observe that this potential is not actually a conditional density: it must be normalised to integrate to one.

In general, an MTE conditional density f(x|y) with one conditioning discrete variable, with states y_1, . . . , y_s, is learnt from a database as follows:

1. Divide the database into s sets, according to the states of variable Y.

2. For each set, learn a marginal density for X, as shown above, with k and m constant.

4 Probabilistic clustering by using MTEs

In the clustering algorithm we propose here, the conditional distribution for each variable given the class is modeled by an MTE density. In the MTE framework, the domain of the variables is split into pieces and in each resulting interval an MTE potential is fitted to the data. In this work we will use the so-called five-parameter MTE, which means that in each split there are at most five parameters to be estimated from data:

f(x) = a_0 + a_1 e^{a_2 x} + a_3 e^{a_4 x}.   (2)

The choice of the five-parameter MTE is motivated by its low complexity and high fitting power (Cobb et al., 2006). The maximum number of splits of the domain of each variable has been set to four, again according to the results in the cited reference.

We start the algorithm with two clusters with the same probability, i.e., the hidden class variable has two states and the marginal probability of each state is 1/2. The conditional distribution for each feature variable given each of the two possible values of the class is initially the same, and is computed as the marginal MTE density estimated from the train database according to the estimation algorithm described in Section 3.

The initial model is refined using the data augmentation method (Tanner and Wong, 1987):

1. First of all, we divide the train database into two parts, one of them for training proper and the other one for testing the intermediate models.

2. For each record in the test and train databases, the distribution of the class variable is computed.

3. According to the obtained distribution, a value for the class variable is simulated and inserted in the corresponding cell of the database.

4. When a value for the class variable has been simulated for all the records, the conditional distributions of the feature variables are re-estimated using the method described in Section 3, using the train database as sample.

5. The log-likelihood of the new model is computed from the test database. If it is higher than the log-likelihood of the initial model, the process is repeated. Otherwise, the process is stopped, returning the best model found for two clusters.

In order to avoid long cycles, we have limited the number of iterations in the procedure above to 100. After obtaining the best model for two clusters, the algorithm continues by increasing the number of clusters and repeating the procedure above for three clusters, and so on. The


algorithm continues to increase the number of clusters while the log-likelihood of the model, computed from the test database, increases.

A model for n clusters is converted into a model for n + 1 clusters as follows:

1. Let c_1, . . . , c_n be the states of the class variable.

2. Add a new state, c_{n+1}, to the class variable.

3. Update the probability distribution of the class variable by re-computing the probabilities of c_n and c_{n+1}, splitting the old value of p(c_n) in half (the probability of the other values remains unchanged):

• p(c_{n+1}) := p(c_n)/2.

• p(c_n) := p(c_n)/2.

4. For each feature variable X,

• f(x|c_{n+1}) := f(x|c_n).
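These update rules can be sketched directly; the representation of the conditional densities is left abstract, and the function and argument names are our own:

```python
def add_cluster(pc, conditionals):
    """Turn an n-cluster model into an (n+1)-cluster model by splitting the
    last cluster: halve its prior mass and duplicate its conditionals.
    pc: list of cluster priors; conditionals: dict feature -> per-cluster densities."""
    half = pc[-1] / 2
    pc = pc[:-1] + [half, half]                       # p(c_n), p(c_{n+1}) := old p(c_n)/2
    for feat in conditionals:
        conditionals[feat] = conditionals[feat] + [conditionals[feat][-1]]  # f(x|c_{n+1}) := f(x|c_n)
    return pc, conditionals
```

Note that the total prior mass is preserved, so the result is still a probability distribution over clusters.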

At this point, we can formulate the clustering algorithm as follows. The algorithm receives as argument a database with variables X_1, . . . , X_n and returns a naive Bayes network with variables X_1, . . . , X_n, C, which can be used to predict the class of any object characterised by features X_1, . . . , X_n.

Algorithm CLUSTER(D)

1. Divide the train database D into two parts: one with 80% of the records in D, selected at random, for estimating the parameters (Dp), and another one with the remaining 20% (Dt) for testing the current model.

2. Let net be the initial model obtained from Dp as described above.

3. bestNet := net.

4. L0 := log-likelihood(net, Dt).

5. Refine net by data augmentation from database Dp.

6. L1 := log-likelihood(net, Dt).

7. WHILE (L1 > L0)

(a) bestNet := net.

(b) L0 := L1.

(c) Add a cluster to net.

(d) Refine net by data augmentation.

(e) L1 := log-likelihood(net, Dt).

8. RETURN(bestNet)
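To make the control flow of CLUSTER concrete, here is a self-contained toy implementation of the same scheme for discrete features only, with multinomial (rather than MTE) conditionals. Every function name, the Laplace smoothing, and the iteration counts are our own simplifications, not the authors' code:

```python
import math
import random

def learn(train, assign, k, n_vals, alpha=1.0):
    """Laplace-smoothed estimates of p(c) and p(x_j | c) from completed data."""
    n_feats = len(train[0])
    pc = [alpha] * k
    pxc = [[[alpha] * n_vals for _ in range(n_feats)] for _ in range(k)]
    for x, c in zip(train, assign):
        pc[c] += 1
        for j, v in enumerate(x):
            pxc[c][j][v] += 1
    total = sum(pc)
    pc = [p / total for p in pc]
    for c in range(k):
        for j in range(n_feats):
            s = sum(pxc[c][j])
            pxc[c][j] = [p / s for p in pxc[c][j]]
    return pc, pxc

def posterior(x, pc, pxc):
    """Distribution of the hidden class given one record (naive Bayes)."""
    logs = [math.log(pc[c]) + sum(math.log(pxc[c][j][v]) for j, v in enumerate(x))
            for c in range(len(pc))]
    m = max(logs)
    ws = [math.exp(l - m) for l in logs]
    s = sum(ws)
    return [w / s for w in ws]

def loglik(data, pc, pxc):
    """Held-out log-likelihood of the naive Bayes mixture."""
    return sum(math.log(sum(pc[c] * math.exp(sum(math.log(pxc[c][j][v])
               for j, v in enumerate(x))) for c in range(len(pc)))) for x in data)

def cluster(data, n_vals, max_k=5, iters=20, seed=1):
    """CLUSTER(D): grow k while the held-out log-likelihood improves."""
    rng = random.Random(seed)
    split = int(0.8 * len(data))                 # 80/20 train-test split
    train, test = data[:split], data[split:]
    best_model, best_ll = None, -math.inf
    for k in range(2, max_k + 1):
        assign = [rng.randrange(k) for _ in train]
        pc, pxc = learn(train, assign, k, n_vals)
        for _ in range(iters):                   # data augmentation loop
            assign = [rng.choices(range(k), weights=posterior(x, pc, pxc))[0]
                      for x in train]
            pc, pxc = learn(train, assign, k, n_vals)
        ll = loglik(test, pc, pxc)
        if ll > best_ll:
            best_ll, best_model = ll, (k, pc, pxc)
        else:
            break                                # stop when adding a cluster stops helping
    return best_model, best_ll
```

The inner loop is step 3-5 of the data augmentation procedure; the outer loop is the WHILE of CLUSTER.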

5 Experimental evaluation

In order to analyse the performance of the proposed algorithm in the data clustering task, we have selected a set of databases¹ (see Table 1) and the following experimental procedure has been carried out:

• For comparison we have used the EM algorithm implemented in the WEKA data mining suite (Witten and Frank, 2005). This is a classical implementation of the EM algorithm in which discrete and numerical variables are allowed as input attributes. Numerical attributes are modeled by means of Gaussian distributions. In this implementation the user can specify the number of clusters beforehand, or the EM can decide how many clusters to create by cross-validation. In the latter case, the number of clusters is initially set to 1, the EM algorithm is performed using 10-fold cross-validation and the log-likelihood (averaged over the 10 folds) is computed. Then the number of clusters is increased by one and the same process is carried out. If the log-likelihood with the new number of clusters has increased with respect to the previous one, the execution continues; otherwise the clustering obtained in the previous iteration is returned.

• We have divided all the data sets into train (80% of the instances) and test (20% of the instances) parts. The training set has been used to learn the model, while the test set is used to assess the quality of the final model, i.e., to compute the log-likelihood.

¹In those datasets usually used for classification, the class variable has been removed.


DataSet      #instances  #attributes  discrete?  classes  source
returns      113         5            no         –        (Shenoy and Shenoy, 1999)
pima         768         8            no         2        UCI (Newman et al., 1998)
bupa         345         6            no         2        UCI (Newman et al., 1998)
abalone      4177        9            2          –        UCI (Newman et al., 1998)
waveform     5000        21           no         3        UCI (Newman et al., 1998)
waveform-n   5000        40           no         3        UCI (Newman et al., 1998)
net15        10000       15           2          –        (Romero et al., 2006)
sheeps       3087        23           5          –        real farming data (Gámez, 2005)

Table 1: Brief description of the data sets used in the experiments. The column discrete indicates whether the dataset contains discrete/nominal variables (if so, the number of discrete variables is given). The column classes shows the number of class labels when the dataset comes from a classification task.

• Two independent runs have been carried out for each algorithm and data set:

– The algorithm proposed in this paper is executed over the given training set. Let #mteclusters be the number of clusters it has decided. Then, the EM algorithm is executed receiving as inputs the same training set and a number of clusters equal to #mteclusters.

– The WEKA EM algorithm is executed over the given training set. Let #emclusters be the number of clusters it has decided. Then, our algorithm is executed receiving as inputs the same training set and a number of clusters equal to #emclusters.

In this way we try to compare the performance of both algorithms over the same number of clusters.

Table 2 shows the results obtained. Out of the 8 databases used in the experiments, 7 refer to real world problems, while the other one (Net15) is a randomly generated network with a joint MTE distribution (Romero et al., 2006). A rough analysis of the results suggests a better performance of the EM clustering algorithm: in 5 out of the 8 problems, it obtains a model with higher log-likelihood than the MTE-based algorithm. However, it must be pointed out that the number of clusters contained in the best models generated by the MTE-based algorithm is significantly lower than the number of clusters produced by the EM. In some cases, a model with many clusters, all of them with low probability, is not as useful as a model with fewer clusters but with higher probability, even if the latter has lower likelihood.

Another fact that must be pointed out is that, in the four problems where the exact number of classes is known (pima, bupa, waveform and waveform-n), the number of clusters returned by the MTE-based algorithm is more accurate. Besides, when forcing the number of clusters in the EM algorithm to be equal to the number of clusters generated by the MTE method, the differences turn in favor of the MTE-based algorithm.

After this analysis, we consider the MTE-based clustering algorithm a promising method. It must be taken into account that the EM algorithm implemented in WEKA is very optimised and sophisticated, while the MTE-based method presented here is a first version in which many aspects are still to be improved. Even in these conditions, our method is competitive in some problems. Models which are far from normality (as Net15) are more appropriate for applying the MTE-based algorithm rather than EM.

6 Discussion and concluding remarks

In this paper we have introduced a novel unsupervised clustering algorithm which is based on the MTE model. The model has been tested


DataSet     #mteclusters  MTE     EM      #emclusters  MTE     EM
returns     2             212     129     –            –       –
Pima        3             -4298   -4431   7            -4299   -3628
bupa        5             -1488   -1501   3            -1491   -1489
abalone     11            6662    7832    15           6817    8090
waveform    3             -38118  -33084  11           -38153  -31892
Waveform-n  3             -65200  -59990  11           -65227  -58829
Net15       4             -3418   -6014   23           -3723   -5087
Sheeps      3             -34214  -35102  7            -34215  -34145

Table 2: Results obtained. Columns 2-4 show the number of clusters discovered by the proposed MTE-based method, its result, and the result obtained by EM when using that number of clusters. Columns 5-7 show the number of clusters discovered by the EM algorithm, its result, and the result obtained by the MTE-based algorithm when using that number of clusters.

over 8 databases, 7 of them corresponding to real-world problems and some of them commonly used as benchmarks. We consider that the results obtained, compared with the EM algorithm, are at least promising. The algorithm we have used in this paper can still be improved. For instance, during the model construction we have used a train-test approach for estimating the parameters and validating the intermediate models obtained. However, the EM algorithm uses cross-validation. We plan to include 10-fold cross-validation in a forthcoming version of the algorithm.

Another aspect that can be improved is the way in which the model is updated when the number of clusters is increased. We have followed a very simplistic approach: for the feature variables, we duplicate the density function conditional on the previous last cluster, while for the class variable, we divide the probability of the former last cluster between it and the new one. We think that a more sophisticated approach, based on component splitting and parameter reusing, similar to the techniques proposed in (Karciauskas, 2005), could improve the performance of our algorithm.

The algorithm proposed in this paper can be applied not only to clustering problems, but also to construct an algorithm for approximate probability propagation in hybrid Bayesian networks. This idea was first proposed in (Lowd and Domingos, 2005), and is based on the simplicity of probability propagation in naive Bayes structures. A naive Bayes model with a discrete class variable actually represents the conditional distribution of each variable given the rest of the variables as a mixture of marginal distributions, with the number of components in the mixture equal to the number of states of the class variable. So, instead of constructing a general Bayesian network from a database, if the goal is probability propagation, this approach can be competitive. The idea is specially interesting for the MTE model, since probability propagation has a high complexity for this model.

Acknowledgments

This work has been supported by the Spanish MEC under project TIN2004-06204-C03-01,03.

References

M.R. Anderberg. 1973. Cluster Analysis for Applications. Academic Press.

P. Cheeseman and J. Stutz. 1996. Bayesian classification (AUTOCLASS): Theory and results. In U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining, pages 153–180. AAAI Press/MIT Press.

B.R. Cobb, P.P. Shenoy, and R. Rumí. 2006. Approximating probability density functions with mixtures of truncated exponentials. Statistics and Computing, in press.


A.P. Dempster, N.M. Laird, and D.B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1–38.

R.O. Duda, P.E. Hart, and D.G. Stork. 2001. Pattern Classification. Wiley.

N. Friedman. 1998. The Bayesian structural EM algorithm. In Gregory F. Cooper and Serafín Moral, editors, Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 129–138. Morgan Kaufmann.

J.A. Gámez. 2005. Mining the ESROM: A study of breeding value prediction in Manchego sheep by means of classification techniques plus attribute selection and construction. Technical Report DIAB-05-01-3, Computer Systems Department, University of Castilla-La Mancha.

A.K. Jain, M.M. Murty, and P.J. Flynn. 1999. Data clustering: a review. ACM Computing Surveys, 31(3):264–323.

G. Karciauskas. 2005. Learning with hidden variables: A parameter reusing approach for tree-structured Bayesian networks. Ph.D. thesis, Department of Computer Science, Aalborg University.

D. Lowd and P. Domingos. 2005. Naive Bayes models for probability estimation. In ICML '05: Proceedings of the 22nd International Conference on Machine Learning, pages 529–536. ACM Press.

S. Moral, R. Rumí, and A. Salmerón. 2001. Mixtures of truncated exponentials in hybrid Bayesian networks. In Lecture Notes in Artificial Intelligence, volume 2143, pages 135–143.

D.J. Newman, S. Hettich, C.L. Blake, and C.J. Merz. 1998. UCI Repository of machine learning databases. www.ics.uci.edu/∼mlearn/MLRepository.html.

J.M. Peña, J.A. Lozano, and P. Larrañaga. 2000. An improved Bayesian structural EM algorithm for learning Bayesian networks for clustering. Pattern Recognition Letters, 21:779–786.

V. Romero, R. Rumí, and A. Salmerón. 2006. Learning hybrid Bayesian networks using mixtures of truncated exponentials. International Journal of Approximate Reasoning, 42:54–68.

C. Shenoy and P.P. Shenoy. 1999. Bayesian network models of portfolio risk and return. In Y.S. Abu-Mostafa, B. LeBaron, A.W. Lo, and A.S. Weigend, editors, Computational Finance, pages 85–104. MIT Press, Cambridge, MA.

M.A. Tanner and W.H. Wong. 1987. The calculation of posterior distributions by data augmentation (with discussion). Journal of the American Statistical Association, 82:528–550.

I.H. Witten and E. Frank. 2005. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann.


Selecting Strategies for Infinite-Horizon Dynamic LIMIDs

Marcel A.J. van Gerven
Institute for Computing and Information Sciences
Radboud University Nijmegen
Toernooiveld 1
6525 ED Nijmegen, The Netherlands

Francisco J. Díez
Department of Artificial Intelligence, UNED
Juan del Rosal 16
28040 Madrid, Spain

Abstract

In previous work we have introduced dynamic limited-memory influence diagrams (DLIMIDs) as an extension of LIMIDs aimed at representing infinite-horizon decision processes. If a DLIMID respects the first-order Markov assumption then it can be represented by a 2TLIMID. Given that the treatment selection algorithm for LIMIDs, called single policy updating (SPU), can be infeasible even for small finite-horizon models, we propose two alternative algorithms for treatment selection with 2TLIMIDs. First, single rule updating (SRU) is a hill-climbing method inspired by SPU that need not iterate exhaustively over all possible policies at each decision node. Second, a simulated annealing algorithm can be used to avoid the local-maximum policies found by SPU and SRU.

1 Introduction

An important goal in artificial intelligence is to create systems that make optimal decisions in situations characterized by uncertainty. One can think for instance of a robot that navigates based on its sensor readings in order to achieve goal states, or of a medical decision-support system that chooses treatments based on patient status in order to maximize life-expectancy.

Limited-memory influence diagrams (LIMIDs) are a formalism for decision-making under uncertainty (Lauritzen and Nilsson, 2001). They generalize standard influence diagrams (Howard and Matheson, 1984) by relaxing the assumption that the whole observed history is taken into account when making a decision, and by dropping the requirement that a complete order is defined over decisions. This increases the size and variety of decision problems that can be handled, although possibly at the expense of finding only approximations to the optimal strategy. Often, however, there is no predefined time at which the process stops (i.e. we have an infinite-horizon decision process) and in that case the LIMID would also become infinite in size. In previous work, we have introduced dynamic LIMIDs (DLIMIDs) and two-stage temporal LIMIDs (2TLIMIDs) as an extension of standard LIMIDs that allow for the representation of infinite-horizon decision processes (van Gerven et al., 2006). However, the problem of finding acceptable strategies for DLIMIDs has not yet been addressed. In this paper we discuss a number of techniques to approximate the optimal strategy for infinite-horizon dynamic LIMIDs. We demonstrate the performance of these algorithms on a non-trivial decision problem.

2 Preliminaries

2.1 Bayesian Networks

Bayesian networks (Pearl, 1988) provide for a compact factorization of a joint probability distribution over a set of random variables by exploiting the notion of conditional independence. One way to represent conditional independence is by means of an acyclic directed graph (ADG) G where vertices V(G) correspond to random variables X and the absence of arcs from the set of arcs A(G) represents conditional independence. Due to this one-to-one correspondence we will use vertices v ∈ V(G) and random variables X ∈ X interchangeably. A Bayesian network (BN) is defined as B = (X, G, P), such that the joint probability distribution P over X factorizes according to G:

P(X) = ∏_{X∈X} P(X | π_G(X))

where π_G(X) = {X′ | (X′, X) ∈ A(G)} denotes the parents of X. We drop the subscript G when clear from context. We assume that (random) variables X can take values x from a set Ω_X and use x to denote an element in Ω_X = ×_{X∈X} Ω_X for a set X of (random) variables.

2.2 LIMIDs

Although Bayesian networks are a natural framework for probabilistic knowledge representation and reasoning under uncertainty, they are not suited for decision-making under uncertainty. Influence diagrams (Howard and Matheson, 1984) are graphical models that extend Bayesian networks to incorporate decision-making but are restricted to the representation of small decision problems. Limited-memory influence diagrams (LIMIDs) (Lauritzen and Nilsson, 2001) generalize standard influence diagrams by relaxing the no-forgetting assumption, which states that, given a total ordering of decisions, information present when making decision D is also taken into account when making decision D′, if D precedes D′ in the ordering. A LIMID is defined as a tuple L = (C, D, U, G, P) consisting of the following components. C represents a set of random variables, called chance variables; D represents a set of decisions available to the decision maker, where a decision D ∈ D is defined as a variable that can take on a value from a set of choices Ω_D; and U is a set of utility functions, which represent the utility of being in a certain state as defined by configurations of chance and decision variables. G is an acyclic directed graph (ADG) with vertices V(G) corresponding to C ∪ D ∪ U, where we use V to denote C ∪ D. Again, due to this correspondence, we will use nodes in V(G) and corresponding elements in C ∪ D ∪ U interchangeably. The meaning of an arc (X, Y) ∈ A(G) is determined by the type of Y. If Y ∈ C then the conditional probability distribution associated with Y is conditioned on X. If Y ∈ D then X represents information that is present to the decision maker prior to deciding upon Y. We call π(Y) the informational predecessors of Y. The order in which decisions are made in a LIMID should be compatible with the partial order induced by the ADG and is based only on the parents π(D) of a decision D. If Y ∈ U then X takes part in the specification of the utility function Y such that Y : Ω_{π(Y)} → R. In this paper, it is assumed that utility nodes cannot have children and that the joint utility function U is additively decomposable, such that U = ∑_{U∈U} U.

P specifies for each d ∈ Ω_D a distribution

P(C : d) = ∏_{C∈C} P(C | π(C))

that represents the distribution over C when we externally set D = d (Cowell et al., 1999). Hence, C is not conditioned on D, but rather parameterized by D, and if D is unbound then we write P(C : D).

A stochastic policy for a decision D ∈ D is defined as a distribution P_D(D | π(D)) that maps configurations of π(D) to a distribution over alternatives for D. If P_D is degenerate then we say that the policy is deterministic. A strategy is a set of policies ∆ = {P_D : D ∈ D} which induces the following joint distribution over the variables in V:

P_∆(V) = P(C : D) ∏_{D∈D} P_D(D | π(D)).

Using this distribution we can compute the expected utility of a strategy ∆ as E_∆(U) = ∑_v P_∆(v) U(v). The aim of any rational decision maker is then to maximize the expected utility by finding the optimal strategy ∆* ≡ arg max_∆ E_∆(U).
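For a model small enough to enumerate, the optimal strategy can be found by brute force. The toy example below (one binary chance variable X observed by one decision D, with a made-up probability table and utility table) is purely illustrative and not from the paper:

```python
from itertools import product

# Hypothetical toy model: P(X), decision D with informational predecessor X,
# and utility U(X, D).
P_X = {0: 0.4, 1: 0.6}
U = {(0, 0): 10, (0, 1): 0, (1, 0): 2, (1, 1): 8}

def expected_utility(policy):
    """E_Delta(U) = sum_x P(x) * U(x, delta(x)) for a deterministic policy delta."""
    return sum(P_X[x] * U[(x, policy[x])] for x in P_X)

def optimal_strategy():
    """Enumerate all deterministic policies delta : Omega_X -> Omega_D
    and return the maximizer of the expected utility."""
    policies = [dict(zip((0, 1), ds)) for ds in product((0, 1), repeat=2)]
    return max(policies, key=expected_utility)
```

Here the exhaustive search considers 2^2 = 4 policies; it is precisely this combinatorial growth over all decision nodes that makes exhaustive policy iteration, and hence SPU, infeasible for larger models.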



Figure 1: Chance nodes are shown as circles, decision nodes as squares and utility nodes as diamonds. The 2TLIMID (left) can be unrolled into a DLIMID (right), where large brackets denote the boundary between subsequent times. Note that due to the definition of a 2TLIMID, the informational predecessors of a decision can only occur in the same, or the preceding, time-slice.

2.3 Dynamic LIMIDs and 2TLIMIDs

Although LIMIDs can often represent finite-horizon decision processes more compactly than standard influence diagrams, they cannot represent infinite-horizon decision processes. Recently, we introduced dynamic LIMIDs (DLIMIDs) as an explicit representation of decision processes (van Gerven et al., 2006). To represent time, we use T ⊆ N to denote a set of time points, which we normally assume to be an interval {u | t ≤ u ≤ t′} ⊂ N, also written as t : t′. We assume that chance variables, decision variables and utility functions are indexed by a superscript t ∈ T, and use C^T, D^T and U^T to denote all chance variables, decision variables and utility functions at times t ∈ T, where we abbreviate C^T ∪ D^T with V^T.

A DLIMID is simply defined as a LIMID (C^T, D^T, U^T, G, P) such that for all pairs of variables X^t, Y^u ∈ V^T ∪ U^T it holds that if t < u then Y^u cannot precede X^t in the ordering induced by G. If T = 0 : N, where N ∈ N is the (possibly infinite) horizon, then we suppress T altogether, and we suppress indices for individual chance variables, decision variables and utility functions when clear from context. If a DLIMID respects the Markov assumption that the future is independent of the past, given the present, then it can be compactly represented by a 2TLIMID (see Fig. 1), which is a pair T = (L^0, L^t) with prior model L^0 = (C^0, D^0, U^0, G^0, P^0) and transition model L^t = (C^{t−1:t}, D^{t−1:t}, U^t, G, P) such that for all V^{t−1} ∈ V^{t−1:t} in the transition model it holds that π_G(V^{t−1}) = ∅. The transition model is not yet bound to any specific t, but if bound to some t ∈ 1 : N, then it is used to represent P(C^t : D^{t−1:t}) and utility functions U ∈ U^t, where both G and P do not depend on t. The prior model is used to represent the initial distribution P^0(C^0 : D^0) and utility functions U ∈ U^0. The interface of the transition model is the set I^t ⊆ V^{t−1} such that (V_i^{t−1}, V_j^t) ∈ A(G) ⇔ V_i^{t−1} ∈ I^t. Given a horizon N, we may unroll a 2TLIMID for N time-slices in order to obtain a DLIMID with the following joint distribution:

P(C, D) = P^0(C^0 : D^0) ∏_{t=1}^{N} P(C^t : D^{t−1:t}).

Let ∆^t = {P^t_D(D | π_G(D)) | D ∈ D^t} denote the strategy for a time-slice t and let the whole strategy be ∆ = ∆^0 ∪ · · · ∪ ∆^N. Given ∆^0, L^0 defines a distribution over the variables in V^0:

P_{∆^0}(V^0) = P^0(C^0 | D^0) ∏_{D ∈ D^0} P_D(D | π_{G^0}(D))

and given a strategy ∆^t with t > 0, L^t defines the following distribution over variables in V^t:

P_{∆^t}(V^t | I^t) = P(C^t | D^u) ∏_{D ∈ D^t} P_D(D | π_G(D))

with u = t − 1 : t. Combining both equations, given a horizon N and strategy ∆, a 2TLIMID induces a distribution over variables in V:

P_∆(V) = P_{∆^0}(V^0) ∏_{t=1}^{N} P_{∆^t}(V^t | I^t).   (1)

Selecting Strategies for Infinite-Horizon Dynamic LIMIDs 133


Let U^0(V^0) = ∑_{U ∈ U^0} U(π_{G^0}(U)) denote the joint utility for time-slice 0 and let U^t(V^{t-1:t}) = ∑_{U ∈ U^t} U(π_G(U)) denote the joint utility for time-slice t > 0. We redefine the joint utility function for a dynamic LIMID as

U(V) = U^0(V^0) + ∑_{t=1}^{N} γ^t U^t(V^{t-1:t})

where γ with 0 ≤ γ < 1 is a discount factor, representing the notion that early rewards are worth more than rewards earned at a later time.

In this way, we can use DLIMIDs to represent infinite-horizon Markov decision processes.

2.4 Memory variables

Figure 1 makes clear that the informational predecessors of a decision variable D^t can only occur in time-slices t or t − 1 (viz. Eq. 1). Observations made earlier in time will not be taken into account and as a result, states that are qualitatively different can appear the same to the decision maker, which leads to suboptimal policies. This phenomenon is known as perceptual aliasing (Whitehead and Ballard, 1991). We try to avoid this problem by introducing memory variables that represent a summary of the observed history. With each observable variable V ∈ V, we associate a memory variable M ∈ C, such that the parents of a memory variable are given by π(M^0) = {V^0} and π(M^t) = {V^t, M^{t-1}} for t ∈ {1, . . . , N}, and all children of M^t, with t ∈ {0, . . . , N}, are decision variables D ∈ D^t.

Figure 2 visualizes the concept of a memory variable, and is used as the running example for this paper. It depicts a 2TLIMID for the treatment of patients that may or may not have a disease D. The disease can be identified by a finding F, which is the result of a laboratory test L, having an associated cost that is captured by the utility function U_2. The memory concerning findings is represented by the memory variable M, and based on this memory, we decide whether or not to perform treatment T, which has an associated cost, captured by the utility function U_3. Memory concerning past findings will be used to decide whether or not to perform the laboratory test. If the patient has the disease then this decreases the chances of patient


Figure 2: A DLIMID for patient treatment as specified by a 2TLIMID.

survival S. Patient survival has an associated utility U_1. An initial strategy, as for instance suggested by a physician, might be to always treat and never test.

There are various ways to define Ω_M and the distributions P(M^0 | V^0) and P(M^t | V^t, M^{t-1}) for a memory variable M. The optimal way to define our memory variables is problem dependent, and we assume that this definition is based on the available domain knowledge. For our running example, we choose Ω_M = {a, n, p} × {a, n, p} × {a, n, p}, where a stands for the absence of a finding, n for a negative finding, and p for a positive finding, which is evidence for the presence of the disease. M^t = (z, y, x) then denotes the current finding x, the finding in the previous time-slice y, and the finding two time-slices ago z. If the new finding is F^{t+1} = f then M^{t+1} = (y, x, f), and since we have not yet observed any findings at t = 0, the initial memory is (a, a, a) if we did not test and (a, a, n) or (a, a, p) if the test was performed.
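The shift-register behavior of this update rule can be sketched in a few lines (an illustration of the rule above, not code from the paper):

```python
def update_memory(memory, finding):
    """Shift-register update for the three-slot memory variable M:
    if M^t = (z, y, x) and the new finding is f, then M^{t+1} = (y, x, f)."""
    z, y, x = memory
    return (y, x, finding)

# At t = 0 no finding has been observed yet:
m = ('a', 'a', 'a')
m = update_memory(m, 'p')   # positive finding observed
m = update_memory(m, 'n')   # negative finding observed
print(m)                    # -> ('a', 'p', 'n')
```

The oldest finding z is simply forgotten, which is exactly why the memory only approximates the full observational history.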

3 Improving Strategies in Infinite-Horizon DLIMIDs

Although DLIMIDs constructed from 2TLIMIDs can represent infinite-horizon Markov decision processes, to date, the problem of strategy improvement using 2TLIMIDs has not been addressed. In this section, we explore techniques for finding strategies with high expected utility.

134 M. A. J. van Gerven and F. J. Díez


3.1 Single Policy Updating

One way to improve strategies in standard LIMIDs is to use an iterative procedure called single policy updating, or SPU for short (Lauritzen and Nilsson, 2001). Let ∆_0 = {p_1, . . . , p_n} be an ordered set representing the initial strategy, where p_j, 1 ≤ j ≤ n, stands for a (randomly initialized) policy P_{D_j} for a decision D_j. We say p_j is the local maximum policy for a strategy ∆ at decision D_j if E_∆(U) cannot be improved by changing p_j. Single policy updating proceeds by iterating over all decision variables (called a cycle) to find local maximum policies, and reiterates until no further improvement in expected utility can be achieved. SPU converges in a finite number of cycles to a local maximum strategy ∆ where each p_j ∈ ∆ is a local maximum policy. Note that this local maximum strategy is not necessarily the global maximum strategy ∆*.

To find local maximum policies in standard LIMIDs, Lauritzen and Nilsson (2001) use a message passing algorithm, optimized for single policy updating. In this paper, we resort to standard inference algorithms for finding strategies for (infinite-horizon) DLIMIDs. We make use of the fact that given ∆, a LIMID L = (C, D, U, G, P) may be converted into a Bayesian network B = (X, G′, P′). Since ∆ induces a distribution over variables in V (viz. Eq. 1), we can use ∆ to convert decision variables D ∈ D to random variables X ∈ X with parents π_G(D) such that P(X | π_{G′}(X)) = P_D(D | π_G(D)). Additionally, utility functions U ∈ U may be converted into random variables X by means of Cooper's transformation (Cooper, 1988), which allows us to compute E_∆(U). We use B(L, ∆) to denote this conversion of a LIMID into a Bayesian network.

Single policy updating cannot be applied directly to an infinite-horizon DLIMID since computing E_∆(U) would need an infinite number of steps. In order to approximate the expected utility given ∆, we assume that the DLIMID can be represented as a 2TLIMID T = (L^0, L^t) and ∆ can be expressed as a pair (∆^0, ∆^t), such that ∆^0 is the strategy at t = 0 and ∆^t is a stationary strategy at t ∈ 1 : ∞. Note that the

SPU(T, ∆_0, ε):
  ∆ = ∆_0, euMax = E^ε_{∆_0}(U)
  repeat
    euMaxOld = euMax
    for j = 1 to n do
      for all policies p′_j for ∆ at D_j do
        ∆′ = p′_j ∗ ∆
        if E^ε_{∆′}(U) > euMax + ε then
          ∆ = ∆′ and euMax = E^ε_{∆′}(U)
        end if
      end for
    end for
  until euMax = euMaxOld
  return ∆

Figure 3: Single policy updating for 2TLIMIDs.

optimal strategy is deterministic and stationary for infinite-horizon and fully observable Markov decision processes (Ross, 1983). In the partially observable case, we can only expect to find approximations to the optimal strategy by using memory variables that represent part of the observational history (Meuleau et al., 1999).

We proceed by converting (L^0, L^t) into (B^0, B^t) with B^0 = B(L^0, ∆^0) and B^t = B(L^t, ∆^t), where (B^0, B^t) is known as a two-stage temporal Bayes net (Dean and Kanazawa, 1989). We use inference algorithms that operate on (B^0, B^t) in order to compute an approximation of the expected utility. In our work, we have used the interface algorithm (Murphy, 2002), for which it holds that the space and time taken to compute each P(X^t | X^{t-1}) does not depend on the number of time-slices. The approximation E^ε_∆(U) is made using a finite number of time-slices k, where k is such that γ^k < ε with ε > 0. The discount factor γ ensures that lim_{t→∞} E_∆(U) = 0. Let ∆_0 = ∆^0 ∪ ∆^t be the initial strategy with ∆^0 = {p_1, . . . , p_m} and ∆^t = {p_{m+1}, . . . , p_n}, where m is the number of decision variables in L^0 and n − m is the number of decision variables in L^t. Following (Lauritzen and Nilsson, 2001), we define p′_j ∗ ∆ as the strategy obtained by replacing p_j with p′_j in ∆. SPU based on a 2TLIMID T with initial strategy ∆_0 is then defined by the algorithm in Fig. 3.



3.2 Single Rule Updating

An obstacle to the use of SPU for strategy improvement is the fact that if the state-space Ω_{π(D)} of the informational predecessors π(D) of a decision variable D becomes large, then it becomes impossible in practice to iterate over all possible policies for D. The number of policies that needs to be evaluated at each decision variable D grows as k^{m^r}, with k = |Ω_D|, assuming that |Ω_{V_j}| = m for all V_j ∈ π(D), and r the number of informational predecessors of D. For example, looking back for two time-slices within our model leads to 2 policies for L^0 and 2^{27} policies for T^0, L^1 and T^1, which is computationally infeasible, even for this small example.

For this reason, we introduce a hill-climbing search called single rule updating (SRU) that is inspired by single policy updating. A deterministic policy can be viewed as a mapping p_j : Ω_{π(D^t_j)} → Ω_{D^t_j}, describing for each configuration x ∈ Ω_{π(D^t_j)} an action a ∈ Ω_{D^t_j}. We call (x, a) ∈ p_j a decision rule. Instead of exhaustively searching over all possible policies for each decision variable, we try to increase the expected utility by local changes to the decision rules within the policy. That is, at each step we change one decision rule within the policy, accepting the change when the expected utility increases. We use (x, a′) ∗ p_j to denote the replacement of (x, a) by (x, a′) in p_j. Similar to SPU, we keep iterating until there is no further increase in the expected utility (Fig. 4).

SRU decreases the number of policies that need to be evaluated in each local cycle for a decision variable to k · m^r, where notation is as before. For our example, we only need to evaluate 2 policies for L^0 and 54 policies for T^0, L^1 and T^1 in each local cycle, albeit at the expense of replacing the exhaustive search by a hill-climbing strategy, increasing the risk of ending up in local maxima, and having to run local cycles until convergence.
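The two counts can be reproduced with a couple of lines (function names are ours). For the treatment decision, the only informational predecessor is the memory variable with 27 = 3³ states:

```python
# Policy counts for a decision D with r informational predecessors,
# each with m states, and k = |Omega_D| possible actions.
def spu_policies(k, m, r):
    return k ** (m ** r)     # exhaustive SPU: one action per configuration

def sru_evaluations(k, m, r):
    return k * m ** r        # SRU local cycle: one (configuration, action) pair at a time

print(spu_policies(2, 27, 1))     # -> 134217728 (2^27, infeasible)
print(sru_evaluations(2, 27, 1))  # -> 54
print(spu_policies(2, 27, 0))     # -> 2 (L^0 has no informational predecessors)
```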

3.3 Simulated Annealing

SPU and SRU both find local maximum strategies, which may not be the optimal strategy ∆*. To see this, consider the proposed strategy for

SRU(T, ∆_0, ε):
  ∆ = ∆_0, euMax = E^ε_{∆_0}(U)
  repeat
    euMaxOld = euMax
    for j = 1 to n do
      repeat
        euMaxLocal = euMax
        for all configurations x ∈ Ω_{π(D_j)} do
          for all actions a′ ∈ Ω_{D_j} do
            p′_j = (x, a′) ∗ p_j
            ∆′ = p′_j ∗ ∆
            if E^ε_{∆′}(U) > euMax + ε then
              ∆ = ∆′ and euMax = E^ε_{∆′}(U)
            end if
          end for
        end for
      until euMax = euMaxLocal
    end for
  until euMax = euMaxOld
  return ∆

Figure 4: Single rule updating for 2TLIMIDs.

our running example (Fig. 2) to never test and always treat. Suppose this is our initial strategy ∆_0 for either the SPU or SRU algorithm. Trying to improve the policy for the laboratory test L, we find that performing the test will only decrease the expected utility, since the test has no informational value (we always treat) but does have an associated cost. Conversely, trying to improve the policy for treatment, we find that the test is never performed and therefore it is safer to always treat. Hence, SPU and SRU will stop after one cycle, returning the proposed strategy as the local optimal strategy.

In order to improve upon the strategies found by SRU, we resort to simulated annealing (SA), which is a heuristic search method that tries to avoid getting trapped in the local maximum solutions found by hill-climbing techniques such as SRU (Kirkpatrick et al., 1983). SA chooses candidate solutions by looking at neighbors of the current solution as defined by a neighborhood function. Local maxima are avoided by sometimes accepting worse solutions according to an acceptance function.



SA(T, ∆_0, ε, τ_0, T):
  ∆ = ∆_0, t = 0, eu = E^ε_∆(U)
  repeat
    select a random decision variable D_j
    select a random decision rule (x, a) ∈ p_j
    select a random action a′ ∈ Ω_{D_j}, a′ ≠ a
    p′_j = (x, a′) ∗ p_j
    ∆′ = p′_j ∗ ∆
    eu′ = E^ε_{∆′}(U)
    if θ ≤ P(a(∆′) = yes | eu + ε, eu′, t) then
      ∆ = ∆′ and eu = eu′
    end if
    t = t + 1
  until T(t) < τ_0
  return SRU(T, ∆, ε)

Figure 5: Simulated annealing for 2TLIMIDs.

In this paper, we have chosen the acceptance function P(a(∆′) = yes | eu, eu′, t) equal to 1 if eu′ > eu, and equal to exp((eu′ − eu)/T(t)) otherwise, where a(∆′) stands for the acceptance of the proposed strategy ∆′, eu′ = E^ε_{∆′}(U), eu = E^ε_∆(U) for the current strategy ∆, and T is an annealing schedule defined as T(t + 1) = α · T(t), where T(0) = β with α < 1 and β > 0.

With respect to strategy finding in DLIMIDs, we propose the simulated annealing scheme shown in Fig. 5, where θ is repeatedly chosen uniformly at random between 0 and 1. Note that after the annealing phase we apply SRU in order to greedily find a local maximum solution.
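The acceptance function and annealing schedule defined above can be sketched as follows (function names are ours; the values of α and β are those chosen in the experiments):

```python
import math

def accept_probability(eu_new, eu_old, temp):
    """Acceptance function: improvements are always accepted; a worse
    candidate is accepted with probability exp((eu_new - eu_old) / T(t))."""
    if eu_new > eu_old:
        return 1.0
    return math.exp((eu_new - eu_old) / temp)

def temperature(t, alpha=0.995, beta=0.5):
    """Annealing schedule T(t+1) = alpha * T(t) with T(0) = beta,
    i.e. T(t) = beta * alpha**t."""
    return beta * alpha ** t

# The same worsening step is accepted far more easily early than late:
early = accept_probability(11.0, 11.2, temperature(0))
late = accept_probability(11.0, 11.2, temperature(1000))
print(early > late)   # -> True
```

As the temperature decays geometrically, the algorithm gradually turns into pure hill climbing, which is then finished off by the final SRU pass.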

4 Experimental Results

We have compared the solutions found by SRU and SA on our running example based on twenty randomly chosen initial strategies. Note that SPU was not feasible due to the large number of policies per decision variable. We have chosen a discount factor γ = 0.95 and a stopping criterion ε = 0.01. After some initial experiments we have chosen α = 0.995, β = 0.5 and τ_0 = 3.33 · 10^{-3} for the SA parameters. In order to reduce computational load, we assume that the parameters for P(T^0 | M^0) and P(T^t | M^t) are tied such that we need only estimate three different policies for P(L^0),


Figure 6: Change in E^ε_∆(U) against the cycle number for one run of SA. The sudden increase at the end of the run is caused by the subsequent application of SRU.

P(L^t | M^{t-1}) and P(T^0 | M^0) = P(T^t | M^t).

For the twenty experiments, SRU found strategies with an average expected utility of E^ε_∆(U) = 11.23 with σ = 0.43. This required the evaluation of 720 different strategies on average. SA found strategies with an average expected utility of E^ε_∆(U) = 11.51 with σ = 0.14. This required the evaluation of 1546 different strategies on average. In 16 out of 20 experiments, SA found strategies with higher expected utility than SRU. Furthermore, due to the random behavior of the algorithm, it avoids the local maximum policies found by SRU. For instance, SRU finds a strategy with expected utility 10.29 three out of twenty times, which is equal to the expected utility of the proposed strategy to always treat and never test. The best strategy was found by simulated annealing and has an expected utility of 11.67. The subsequent values of E^ε_∆(U) found during that experiment are shown in Fig. 6.

The found strategy can be represented by a policy graph (Meuleau et al., 1999); a finite state controller that depicts state transitions, where states represent actions and transitions are induced by observations. Figure 7 depicts the policy graph for the best found strategy. Starting at the left arrow, each time-slice constitutes a test decision (circle) and a treatment decision (double circle). Shaded circles stand for positive decisions (i.e., L^t = yes and T^t = yes) and clear circles stand for negative decisions (i.e., L^t = no and T^t = no). If a test has two outgo-



Figure 7: Policy graph for the best strategy found by simulated annealing.

ing arcs, then these stand for a negative finding (dashed arc) and a positive finding (solid arc), respectively. Most of the time, the policy graph associates a negative test result with no treatment and a positive test result with treatment.

5 Conclusion

In this paper we have demonstrated that reasonable strategies can be found for infinite-horizon DLIMIDs by means of SRU. Although computationally more expensive, SA considerably improves the found strategies by avoiding local maxima. Both SRU and SA do not suffer from the intractability of SPU when the number of informational predecessors increases. The approach does require that good strategies can be found using a limited amount of memory, since otherwise, found strategies will fail to approximate the optimal strategy. This requirement should hold especially between time-slices, since the state-space of memory variables can become prohibitively large when a large part of the observed history is required for optimal decision-making. Although this restricts the types of decision problems that can be managed, DLIMIDs, constructed from a 2TLIMID, allow the representation of large, or even infinite-horizon decision problems, something which standard influence diagrams cannot manage in principle. Hence, 2TLIMIDs are particularly useful in the case of problems that cannot be properly approximated by a small number of time-slices.

References

G.F. Cooper. 1988. A method for using belief networks as influence diagrams. In Proceedings of the 4th Workshop on Uncertainty in AI, pages 55–63, University of Minnesota, Minneapolis.

R. Cowell, A.P. Dawid, S.L. Lauritzen, and D. Spiegelhalter. 1999. Probabilistic Networks and Expert Systems. Springer.

T. Dean and K. Kanazawa. 1989. A model for reasoning about persistence and causation. Computational Intelligence, 5(3):142–150.

R.A. Howard and J.E. Matheson. 1984. Influence diagrams. In R.A. Howard and J.E. Matheson, editors, Readings in the Principles and Applications of Decision Analysis. Strategic Decisions Group, Menlo Park, CA.

S. Kirkpatrick, C.D. Gelatt Jr., and M.P. Vecchi. 1983. Optimization by simulated annealing. Science, 220:671–680.

S.L. Lauritzen and D. Nilsson. 2001. Representing and solving decision problems with limited information. Management Science, 47(9):1235–1251.

N. Meuleau, K.-E. Kim, L.P. Kaelbling, and A.R. Cassandra. 1999. Solving POMDPs by searching the space of finite policies. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence, pages 417–426, Stockholm, Sweden.

K.P. Murphy. 2002. Dynamic Bayesian Networks. Ph.D. thesis, UC Berkeley.

J. Pearl. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers, 2nd edition.

S. Ross. 1983. Introduction to Stochastic Dynamic Programming. Academic Press, New York.

M.A.J. van Gerven, F.J. Díez, B.G. Taal, and P.J.F. Lucas. 2006. Prognosis of high-grade carcinoid tumor patients using dynamic limited-memory influence diagrams. In Intelligent Data Analysis in bioMedicine and Pharmacology (IDAMAP) 2006. Accepted for publication.

S.D. Whitehead and D.H. Ballard. 1991. Learning to perceive and act by trial and error. Machine Learning, 7:45–83.



Sensitivity analysis of extreme inaccuracies in Gaussian Bayesian Networks

Miguel A. Gómez-Villegas and Paloma Maín
[email protected], [email protected]
Departamento de Estadística e Investigación Operativa
Universidad Complutense de Madrid
Plaza de Ciencias 3, 28040 Madrid, Spain

Rosario Susi
[email protected]
Departamento de Estadística e Investigación Operativa III
Universidad Complutense de Madrid
Avda. Puerta de Hierro s/n, 28040 Madrid, Spain

Abstract

We present the behavior of a sensitivity measure defined to evaluate the impact of model inaccuracies on the posterior marginal density of the variable of interest, after the evidence propagation is executed, for extreme perturbations of parameters in Gaussian Bayesian networks. This sensitivity measure is based on the Kullback-Leibler divergence and yields different expressions depending on the type of parameter (mean, variance or covariance) to be perturbed. This analysis is useful to know the extreme effect of uncertainty about some of the initial parameters of the model in a Gaussian Bayesian network. These concepts and methods are illustrated with some examples.

Keywords: Gaussian Bayesian Network, Sensitivity analysis, Kullback-Leibler divergence.

1 Introduction

A Bayesian network is a graphical probabilistic model that provides a graphical framework for complex domains with many inter-related variables. Among other authors, Bayesian networks have been studied by Pearl (1988), Lauritzen (1996), Heckerman (1998) and Jensen (2001). A sensitivity analysis in a Bayesian network is necessary to study how sensitive the network's output is to inaccuracies or imprecisions in the parameters of the initial network, and therefore to evaluate the network's robustness.
In recent years, some sensitivity analysis techniques for Bayesian networks have been developed. In discrete Bayesian networks, Laskey (1995) presents a sensitivity analysis based on computing the partial derivative of a posterior marginal probability with respect to a given parameter, Coupé et al. (2002) develop an efficient sensitivity analysis based on inference algorithms, and Chan et al. (2005) introduce a sensitivity analysis based on a distance measure. In Gaussian Bayesian networks, Castillo et al. (2003) present a sensitivity analysis based on symbolic propagation, and Gómez-Villegas et al. (2006) develop a sensitivity measure, based on the Kullback-Leibler divergence, to perform the sensitivity analysis.
In this paper, we study the behavior of the sensitivity measure presented by Gómez-Villegas et al. (2006) for extreme inaccuracies (perturbations) of the parameters that describe the Gaussian Bayesian network. To prove that this is a well-


defined measure, we are interested in studying the sensitivity measure when one of the parameters is different from its original value. Moreover, we want to prove that if the value of one parameter is similar to the real value then the sensitivity measure is close to zero.
The paper is organized as follows. In Section 2 we briefly introduce definitions of Bayesian networks and Gaussian Bayesian networks, review how propagation in Gaussian Bayesian networks can be performed, and present our working example. In Section 3, we present the sensitivity measure and develop the proposed sensitivity analysis. In Section 4, we obtain the limits of the sensitivity measure for extreme perturbations of the parameters, giving the behavior of the measure in the limit as well as its interpretation. In Section 5, we perform the sensitivity analysis with the working example for some extreme imprecisions. Finally, the paper ends with some conclusions.

2 Gaussian Bayesian Networks and Evidence propagation

Definition 1 (Bayesian network). A Bayesian network is a pair (G, P), where G is a directed acyclic graph (DAG) whose nodes correspond to random variables X = {X_1, ..., X_n} and whose edges represent probabilistic dependencies, and P = {p(x_1|pa(x_1)), ..., p(x_n|pa(x_n))} is a set of conditional probability densities (one for each variable), where pa(x_i) is the set of parents of node X_i in G. The set P defines the associated joint probability density as

p(x) = ∏_{i=1}^{n} p(x_i|pa(x_i)).   (1)

As usual we work with a variable of interest, so the network's output is the information about this variable of interest after the evidence propagation.

Definition 2 (Gaussian Bayesian network). A Gaussian Bayesian network is a Bayesian network over X = {X_1, ..., X_n} with a multivariate normal distribution N(µ, Σ); then the joint density is given by

f(x) = (2π)^{-n/2} |Σ|^{-1/2} exp{ -(1/2)(x − µ)′ Σ^{-1} (x − µ) }

where µ is the n-dimensional mean vector, Σ the n × n covariance matrix, and |Σ| the determinant of Σ.

The conditional density associated with X_i for i = 1, ..., n in equation (1) is the univariate normal distribution, with density

f(x_i|pa(x_i)) ∼ N( µ_i + ∑_{j=1}^{i-1} β_{ij}(x_j − µ_j), ν_i )

where β_{ij} is the regression coefficient of X_j in the regression of X_i on the parents of X_i, and ν_i = Σ_{ii} − Σ_{i Pa(x_i)} Σ^{-1}_{Pa(x_i)} Σ′_{i Pa(x_i)} is the conditional variance of X_i given its parents.

Different algorithms have been proposed to propagate the evidence of some nodes in Gaussian Bayesian networks. We present an incremental method, updating one evidential variable at a time (see Castillo et al., 1997), based on computing the conditional probability density of a multivariate normal distribution given the evidential variable X_e.
For the partition X = (Y, E), with Y the set of non-evidential variables, where X_i ∈ Y is the variable of interest and E is the evidence variable, the conditional probability distribution of Y given the evidence E = {X_e = e} is multivariate normal with parameters

µ^{Y|E=e} = µ_Y + Σ_{YE} Σ^{-1}_{EE} (e − µ_E)

Σ^{Y|E=e} = Σ_{YY} − Σ_{YE} Σ^{-1}_{EE} Σ_{EY}

Therefore, the variable of interest X_i ∈ Y after the evidence propagation is

X_i|E = e ∼ N( µ_i^{Y|E=e}, σ_{ii}^{Y|E=e} ) ≡ N( µ_i + (σ_{ie}/σ_{ee})(e − µ_e), σ_{ii} − σ_{ie}²/σ_{ee} )   (2)

where µ_i and µ_e are the means of X_i and X_e respectively before the propagation, σ_{ii} and σ_{ee} the variances of X_i and X_e respectively before propagating the evidence, and σ_{ie} the covariance between X_i and X_e before the evidence propagation.

140 M. A. Gómez-Villegas, P. Maín, and R. Susi


Figure 1: DAG of the Gaussian Bayesian network

We illustrate the concept of a Gaussian Bayesian network and the evidence propagation method with the following example.

Example 1. Assume that we are interested in how a machine works. This machine is made up of 5 elements, the variables of the problem, connected as in the network in Figure 1. Let us consider that the time each element is working has a normal distribution, and we are interested in the last element of the machine (X_5).
With X = {X_1, X_2, X_3, X_4, X_5} normally distributed, X ∼ N(µ, Σ), the parameters are

µ = (2, 3, 3, 4, 5)′ ;   Σ =
[ 3  0  6  0  6
  0  2  2  0  2
  6  2 15  0 15
  0  0  0  2  4
  6  2 15  4 26 ]

Considering the evidence E = {X_2 = 4}, after evidence propagation we obtain that X_5|X_2 = 4 ∼ N(6, 24) and the joint distribution is normal with parameters

µ^{Y|X_2=4} = (2, 4, 4, 6)′ ;   Σ^{Y|X_2=4} =
[ 3  6  0  6
  6 13  0 13
  0  0  2  4
  6 13  4 24 ]

3 Sensitivity Analysis and non-influential parameters

The proposed sensitivity analysis is as follows. Let us suppose the model before propagating the evidence is N(µ, Σ) with one evidential variable X_e, whose value is known. After propagating this evidence we obtain the marginal density of interest, f(x_i|e). Next, we add a perturbation δ to one of the parameters of the model before propagating the evidence (this parameter is supposed to be inaccurate, thus δ reflects this inaccuracy) and perform the evidence propagation, to get f(x_i|e, δ). In some cases, the perturbation δ has some restrictions so as to get admissible parameters.
The effect of adding the perturbation is computed by comparing those density functions by means of the sensitivity measure. That measure is based on the Kullback-Leibler divergence between the target marginal densities obtained after evidence propagation, considering the model with and without the perturbation.

Definition 3 (Sensitivity measure). Let (G, P) be a Gaussian Bayesian network N(µ, Σ). Let f(x_i|e) be the marginal density of interest after evidence propagation and f(x_i|e, δ) the same density when the perturbation δ is added to one parameter of the initial model. Then, the sensitivity measure is defined by

S_{p_j}(f(x_i|e), f(x_i|e, δ)) = ∫_{−∞}^{∞} f(x_i|e) ln [ f(x_i|e) / f(x_i|e, δ) ] dx_i   (3)

where the subscript p_j is the inaccurate parameter and δ the proposed perturbation, the new value of the parameter being p_j^δ = p_j + δ.
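Because both f(x_i|e) and f(x_i|e, δ) are univariate normal, integral (3) reduces to the well-known closed form of the Kullback-Leibler divergence between two normal densities. A minimal sketch (the function name is ours):

```python
import math

def kl_normal(m1, v1, m2, v2):
    """Kullback-Leibler divergence KL(N(m1, v1) || N(m2, v2)) for
    univariate normals with means m_i and variances v_i."""
    return 0.5 * (math.log(v2 / v1) + v1 / v2 + (m1 - m2) ** 2 / v2 - 1.0)

print(kl_normal(0.0, 1.0, 0.0, 1.0))    # -> 0.0 (identical densities)
print(kl_normal(6.0, 24.0, 8.0, 24.0))  # mean shift only: delta^2/(2*24) = 1/12
```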

For small values of the sensitivity measurewe can conclude our Bayesian network is robustfor that perturbation.

3.1 Mean vector inaccuracy

Three different situations are possible depending on the element of µ to be perturbed, i.e., the perturbation can affect the mean of the variable of interest X_i ∈ Y, the mean of the

Sensitivity analysis of extreme inaccuracies in Gaussian Bayesian Networks 141

Page 152: Proceedings of the Third European Workshop on Probabilistic

evidence variable X_e ∈ E, or the mean of any other variable X_j ∈ Y with j ≠ i. Developing the sensitivity measure (3), we obtain the following propositions.

Proposition 1. For a perturbation δ ∈ ℝ in the mean vector µ, the sensitivity measure is as follows.
• When the perturbation is added to the mean of the variable of interest, µ_i^δ = µ_i + δ, and X_i|E = e, δ ∼ N(µ_i^{Y|E=e} + δ, σ_{ii}^{Y|E=e}), then

S_{µ_i}(f(x_i|e), f(x_i|e, δ)) = δ² / (2 σ_{ii}^{Y|E=e}).

• If the perturbation is added to the mean of the evidential variable, µ_e^δ = µ_e + δ, the posterior density of the variable of interest, with the perturbation, is X_i|E = e, δ ∼ N(µ_i^{Y|E=e} − (σ_{ie}/σ_{ee})δ, σ_{ii}^{Y|E=e}), then

S_{µ_e}(f(x_i|e), f(x_i|e, δ)) = [δ² / (2 σ_{ii}^{Y|E=e})] (σ_{ie}/σ_{ee})².

• The perturbation δ added to the mean of any other non-evidential variable, different from the variable of interest, has no influence on X_i; then f(x_i|e, δ) = f(x_i|e), and the sensitivity measure is zero.

3.2 Covariance matrix inaccuracy

There are six possible different situations, depending on the parameter of the covariance matrix Σ to be changed: three if the perturbation is added to the variances (elements on the diagonal of Σ) and another three if the perturbation is added to the covariances of Σ. When the covariance matrix is perturbed, the structure of the network can change. Those changes are reflected in the precision matrix of the perturbed network, that is, the inverse of the covariance matrix with perturbation δ. To guarantee the normality of the network, it is necessary for Σ^{Y|E=e,δ} to be a positive definite matrix in all cases presented in the next proposition.

Proposition 2. Adding the perturbation δ ∈ ℝ to the covariance matrix Σ, the sensitivity measure obtained is as follows.
• If the perturbation is added to the variance of the variable of interest, with σ_{ii}^δ = σ_{ii} + δ and δ > −σ_{ii} + σ_{ie}²/σ_{ee}, then X_i|E = e, δ ∼ N(µ_i^{Y|E=e}, σ_{ii}^{Y|E=e,δ}), where σ_{ii}^{Y|E=e,δ} = σ_{ii} + δ − σ_{ie}²/σ_{ee}, and

S_{σ_{ii}}(f(x_i|e), f(x_i|e, δ)) = (1/2) [ ln( 1 + δ/σ_{ii}^{Y|E=e} ) − δ/σ_{ii}^{Y|E=e,δ} ].

• When the perturbation is added to the variance of the evidential variable, with σ_{ee}^δ = σ_{ee} + δ and δ > −σ_{ee}(1 − max_{X_j∈Y} ρ_{je}²), where ρ_{je} is the corresponding correlation coefficient, the posterior density of interest is X_i|E = e, δ ∼ N(µ_i^{Y|E=e,δ}, σ_{ii}^{Y|E=e,δ}) with µ_i^{Y|E=e,δ} = µ_i + [σ_{ie}/(σ_{ee} + δ)](e − µ_e) and σ_{ii}^{Y|E=e,δ} = σ_{ii} − σ_{ie}²/(σ_{ee} + δ); therefore

S_{σ_{ee}}(f(x_i|e), f(x_i|e, δ)) = (1/2) [ ln( σ_{ii}^{Y|E=e,δ} / σ_{ii}^{Y|E=e} ) + (σ_{ie}²/σ_{ee}) (−δ/(σ_{ee}+δ)) ( 1 + (e − µ_e)² (−δ/((σ_{ee}+δ)σ_{ee})) ) / σ_{ii}^{Y|E=e,δ} ].

• The perturbation δ added to the variance of any other non-evidential variable X_j ∈ Y with j ≠ i, σ_{jj}^δ = σ_{jj} + δ, has no influence on X_i; therefore f(x_i|e, δ) = f(x_i|e) and the sensitivity measure is zero.

• When the covariance between the variable of interest X_i and the evidential variable X_e is perturbed, σ_{ie}^δ = σ_{ie} + δ = σ_{ei}^δ with −σ_{ie} − √(σ_{ii}σ_{ee}) < δ < −σ_{ie} + √(σ_{ii}σ_{ee}), then X_i|E = e, δ ∼ N(µ_i^{Y|E=e,δ}, σ_{ii}^{Y|E=e,δ}) with µ_i^{Y|E=e,δ} = µ_i + [(σ_{ie} + δ)/σ_{ee}](e − µ_e) and σ_{ii}^{Y|E=e,δ} = σ_{ii} − (σ_{ie} + δ)²/σ_{ee}; the sensitivity measure obtained is

S_{σ_{ie}}(f(x_i|e), f(x_i|e, δ)) = (1/2) [ ln( 1 − (δ² + 2σ_{ie}δ)/(σ_{ee} σ_{ii}^{Y|E=e}) ) + ( σ_{ii}^{Y|E=e} + ((δ/σ_{ee})(e − µ_e))² ) / σ_{ii}^{Y|E=e,δ} − 1 ].

• If we add the perturbation to any other covariance, i.e., between the variable of interest X_i and any other non-evidential variable X_j, or between the evidence variable X_e and X_j ∈ Y with j ≠ i, the posterior probability density of the variable of interest X_i is the same as without the perturbation, and therefore the sensitivity measure is zero.
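The first expression of the proposition can be verified against the Gaussian Kullback-Leibler closed form, since only the posterior variance changes (σ_{ii}^{Y|E=e} = 24 in the working example). A sketch with our own helper name:

```python
import math

def kl_normal(m1, v1, m2, v2):
    """KL(N(m1, v1) || N(m2, v2)) for univariate normals."""
    return 0.5 * (math.log(v2 / v1) + v1 / v2 + (m1 - m2) ** 2 / v2 - 1.0)

v1 = 24.0      # sigma_ii^{Y|E=e} for X5 | X2 = 4 in the working example
delta = 5.0    # perturbation of sigma_ii (admissible, since delta > -24)

# Closed form from Proposition 2 versus direct KL computation:
S_closed = 0.5 * (math.log(1 + delta / v1) - delta / (v1 + delta))
S_kl = kl_normal(6.0, v1, 6.0, v1 + delta)
assert math.isclose(S_closed, S_kl)
```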

4 Extreme behavior of the sensitivity measure

The sensitivity measure describes how sensitive the variable of interest is when a perturbation is added to an inaccurate parameter. It is therefore interesting to know how the measure behaves when the perturbation δ ∈ ℝ is extreme. We thus study the behavior of the measure for extreme perturbations, i.e., the limits of the sensitivity measure. The next propositions present the results about these limits in all the cases given in Propositions 1 and 2.

Proposition 3. When the perturbation added to the mean vector is extreme, the sensitivity measure behaves as follows.

1. lim_{δ→±∞} S_{μ_i}(f(x_i|e), f(x_i|e,δ)) = ∞ and lim_{δ→0} S_{μ_i}(f(x_i|e), f(x_i|e,δ)) = 0.

2. lim_{δ→±∞} S_{μ_e}(f(x_i|e), f(x_i|e,δ)) = ∞ and lim_{δ→0} S_{μ_e}(f(x_i|e), f(x_i|e,δ)) = 0.

Proof. It follows directly.

Proposition 4. When the extreme perturbation is added to the elements of the covariance matrix and the correlation coefficient of X_i and X_e satisfies 0 < ρ_{ie}² < 1, the results are as follows.

1. • lim_{δ→∞} S_{σ_{ii}}(f(x_i|e), f(x_i|e,δ)) = ∞, but S_{σ_{ii}}(f(x_i|e), f(x_i|e,δ)) = o(δ).
   • lim_{δ→M_{ii}} S_{σ_{ii}}(f(x_i|e), f(x_i|e,δ)) = ∞, with M_{ii} = −σ_{ii} + σ_{ie}²/σ_{ee} = −σ_{ii}(1 − ρ_{ie}²).
   • lim_{δ→0} S_{σ_{ii}}(f(x_i|e), f(x_i|e,δ)) = 0.

2. • lim_{δ→∞} S_{σ_{ee}}(f(x_i|e), f(x_i|e,δ)) = (1/2) [ −ln(1 − ρ_{ie}²) − ρ_{ie}²(1 − (e − μ_e)²/σ_{ee}) ].
   • lim_{δ→M_{ee}} S_{σ_{ee}}(f(x_i|e), f(x_i|e,δ)) = (1/2) [ ln( (M*_{ee} − ρ_{ie}²)/(M*_{ee}(1 − ρ_{ie}²)) ) + (ρ_{ie}²(1 − M*_{ee})/(M*_{ee} − ρ_{ie}²)) (1 + ((e − μ_e)²/σ_{ee})((1 − M*_{ee})/M*_{ee})) ], where M_{ee} = −σ_{ee}(1 − M*_{ee}) and M*_{ee} = max_{X_j} ρ_{je}².
   • lim_{δ→0} S_{σ_{ee}}(f(x_i|e), f(x_i|e,δ)) = 0.

3. • lim_{δ→M¹_{ie}} S_{σ_{ie}}(f(x_i|e), f(x_i|e,δ)) = ∞ and lim_{δ→M²_{ie}} S_{σ_{ie}}(f(x_i|e), f(x_i|e,δ)) = ∞, with M¹_{ie} = −σ_{ie} − √(σ_{ii}σ_{ee}) and M²_{ie} = −σ_{ie} + √(σ_{ii}σ_{ee}).
   • lim_{δ→0} S_{σ_{ie}}(f(x_i|e), f(x_i|e,δ)) = 0.

Proof. 1. • The limit as δ → ∞ follows directly.
• When σ_{ii}^δ = σ_{ii} + δ, the new variance of X_i is σ_{ii}^{Y|E=e,δ} = σ_{ii}^{Y|E=e} + δ. Since σ_{ii}^{Y|E=e,δ} > 0, we need δ > −σ_{ii}^{Y|E=e}. Writing M_{ii} = −σ_{ii}^{Y|E=e} and x = σ_{ii}^{Y|E=e} + δ, we have

Sensitivity analysis of extreme inaccuracies in Gaussian Bayesian Networks 143


lim_{δ→M_{ii}} S_{σ_{ii}}(f(x_i|e), f(x_i|e,δ)) = lim_{x→0} (1/(2x)) [ x ln x − x ln σ_{ii}^{Y|E=e} − x + σ_{ii}^{Y|E=e} ] = ∞.

• It follows directly.

2. • lim_{δ→∞} S_{σ_{ee}}(f(x_i|e), f(x_i|e,δ)) = (1/2) [ ln(σ_{ii}/σ_{ii}^{Y|E=e}) + (−σ_{ie}²/σ_{ee})(1 − (e − μ_e)²/σ_{ee})/σ_{ii} ]; with σ_{ii}^{Y|E=e} = σ_{ii}(1 − ρ_{ie}²) and ρ_{ie}² = σ_{ie}²/(σ_{ii}σ_{ee}), the limit is

(1/2) [ −ln(1 − ρ_{ie}²) − ρ_{ie}²(1 − (e − μ_e)²/σ_{ee}) ].

• When σ_{ee}^δ = σ_{ee} + δ, the new conditional variance of every non-evidential variable is σ_{jj}^{Y|E=e,δ} = σ_{jj} − σ_{je}²/(σ_{ee} + δ) for all X_j ∈ Y. Imposing σ_{jj}^{Y|E=e,δ} > 0 for all X_j ∈ Y, δ must satisfy δ > −σ_{ee}(1 − max_{X_j∈Y} ρ_{je}²). Writing M*_{ee} = max_{X_j} ρ_{je}² and M_{ee} = −σ_{ee}(1 − M*_{ee}), we have

lim_{δ→M_{ee}} S_{σ_{ee}}(f(x_i|e), f(x_i|e,δ)) = lim_{δ→M_{ee}} (1/2) [ ln( (σ_{ii} − σ_{ie}²/(σ_{ee}+δ))/(σ_{ii} − σ_{ie}²/σ_{ee}) ) + (σ_{ie}²/σ_{ee})(−δ/(σ_{ee}+δ))(1 + (e − μ_e)²(−δ/((σ_{ee}+δ)σ_{ee})))/(σ_{ii} − σ_{ie}²/(σ_{ee}+δ)) ],

and, evaluating at δ = M_{ee} (so that σ_{ee} + δ = σ_{ee}M*_{ee}) and using ρ_{ie}² = σ_{ie}²/(σ_{ii}σ_{ee}), this equals

(1/2) [ ln( (M*_{ee} − ρ_{ie}²)/(M*_{ee}(1 − ρ_{ie}²)) ) + (ρ_{ie}²(1 − M*_{ee})/(M*_{ee} − ρ_{ie}²))(1 + ((e − μ_e)²/σ_{ee})((1 − M*_{ee})/M*_{ee})) ].

• It follows directly.

3. • If we set σ_{ie}^δ = σ_{ie} + δ, the new conditional variance is

σ_{ii}^{Y|E=e,δ} = σ_{ii}^{Y|E=e} − (δ² + 2δσ_{ie})/σ_{ee}.

Then, imposing σ_{ii}^{Y|E=e,δ} > 0, δ must satisfy −σ_{ie} − √(σ_{ii}σ_{ee}) < δ < −σ_{ie} + √(σ_{ii}σ_{ee}).

First, writing M²_{ie} = −σ_{ie} + √(σ_{ii}σ_{ee}), we compute lim_{δ→M²_{ie}} S_{σ_{ie}}(f(x_i|e), f(x_i|e,δ)). But δ → M²_{ie} is equivalent to (δ² + 2δσ_{ie}) → σ_{ee}σ_{ii}^{Y|E=e}, and given that

S_{σ_{ie}}(f(x_i|e), f(x_i|e,δ)) = (1/2) [ ln( (σ_{ee}σ_{ii}^{Y|E=e} − (δ² + 2δσ_{ie}))/(σ_{ee}σ_{ii}^{Y|E=e}) ) + (σ_{ii}^{Y|E=e} + ((δ/σ_{ee})(e − μ_e))²)/σ_{ii}^{Y|E=e,δ} − 1 ],

and since lim_{x→0+} [ ln x + k/x ] = ∞ for every k > 0, we get lim_{δ→M²_{ie}} S_{σ_{ie}}(f(x_i|e), f(x_i|e,δ)) = ∞.
• Analogously, the other limit is lim_{δ→M¹_{ie}} S_{σ_{ie}}(f(x_i|e), f(x_i|e,δ)) = ∞.
• The limit as δ → 0 follows directly.

The behavior of the sensitivity measure is the expected one, except when the extreme perturbation is added to the evidential variance. With known evidence, the posterior density of interest under the perturbed model, f(x_i|e,δ), is not very different from the posterior density without the perturbation, f(x_i|e); therefore, even if an extreme perturbation of the evidential variance exists, the sensitivity measure tends to a finite value.
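This contrast can be checked numerically. The parameter values below are hypothetical, not those of the paper's example; the measure for the evidential variance flattens toward the finite limit of Proposition 4, while the one for the variance of the variable of interest keeps growing:

```python
import math

# Hypothetical parameters (not from the paper's example).
s_ii, s_ee, s_ie, c = 2.0, 1.0, 0.8, 1.5   # c = e - mu_e
rho2 = s_ie ** 2 / (s_ii * s_ee)           # squared correlation of X_i and X_e
v0 = s_ii - s_ie ** 2 / s_ee               # unperturbed posterior variance

def kl_normal(mu1, var1, mu2, var2):
    return 0.5 * (math.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

def s_sigma_ee(delta):
    """Sensitivity measure for a perturbation of sigma_ee, via a direct KL computation."""
    mu0 = s_ie / s_ee * c                  # posterior means (up to the constant mu_i)
    mud = s_ie / (s_ee + delta) * c
    vd = s_ii - s_ie ** 2 / (s_ee + delta)
    return kl_normal(mu0, v0, mud, vd)

def s_sigma_ii(delta):
    """Sensitivity measure for a perturbation of sigma_ii (Proposition 2, first case)."""
    return 0.5 * (math.log(1 + delta / v0) - delta / (v0 + delta))

# Finite limit for sigma_ee (Proposition 4, case 2) vs. unbounded growth for sigma_ii.
limit_ee = 0.5 * (-math.log(1 - rho2) - rho2 * (1 - c ** 2 / s_ee))
print(abs(s_sigma_ee(1e9) - limit_ee))     # nearly zero: the measure stays bounded
print(s_sigma_ii(1e6) < s_sigma_ii(1e9))   # True: this one diverges, slowly (o(delta))
```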


5 Experimental results

Example 2. Consider the Gaussian Bayesian network given in Example 1. Experts disagree with the definition of the variable of interest X5: the mean could be μ5^{δ1} = 2 = μ5 + δ1 (with δ1 = −3), the variance could be σ55^{δ2} = 24 with δ2 = −2, and the covariance between X5 and the evidential variable X2 could be σ52^{δ3} = 3 with δ3 = 1 (and likewise σ25); the variance of the evidential variable could be σ22^{δ4} = 4 with δ4 = 2, and the covariance between X5 and the other non-evidential variable X3 could be σ53^{δ5} = 13 with δ5 = −2 (and likewise σ35). Moreover, there are different opinions about X3: experts suppose that μ3 could be μ3^{δ6} = 7 with δ6 = 4, that σ33 could be σ33^{δ7} = 17 with δ7 = 2, and that σ32 could be σ32^{δ8} = 1 with δ8 = −1 (and likewise σ23).

The sensitivity measures for these inaccurate parameters are

S_{μ5}(f(x5|X2 = 4), f(x5|X2 = 4, δ1)) = 0.1875
S_{σ55}(f(x5|X2 = 4), f(x5|X2 = 4, δ2)) = 0.00195
S_{σ52}(f(x5|X2 = 4), f(x5|X2 = 4, δ3)) = 0.00895
S_{σ22}(f(x5|X2 = 4), f(x5|X2 = 4, δ4)) = 0.00541
S_{σ53}(f(x5|X2 = 4), f(x5|X2 = 4, δ5)) = 0
S_{μ3}(f(x5|X2 = 4), f(x5|X2 = 4, δ6)) = 0
S_{σ33}(f(x5|X2 = 4), f(x5|X2 = 4, δ7)) = 0
S_{σ32}(f(x5|X2 = 4), f(x5|X2 = 4, δ8)) = 0

As these values are small, we can conclude that the perturbations considered do not affect the variable of interest, and therefore the network can be considered robust. Also, the inaccuracies about the non-evidential variable X3 do not disturb the posterior marginal density of interest, the sensitivity measure being zero in all those cases. If the experts consider the sensitivity measure obtained for the mean of the variable of interest to be large, they should review the information about this variable. Moreover, we have implemented an algorithm (see Appendix 1) to compute all the sensitivity measures that can influence the variable of interest X_i; this algorithm compares the computed sensitivity measures with a threshold s fixed by experts. If a sensitivity measure is larger than the threshold, the corresponding parameter should be reviewed.
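The screening step just described (the full algorithm is in Appendix 1) reduces to comparing each measure with the expert threshold s; a schematic version of ours, over the measures listed above:

```python
# Sensitivity measures from the example (variable of interest X5, evidence X2 = 4).
measures = {
    "mu_5": 0.1875, "sigma_55": 0.00195, "sigma_52": 0.00895, "sigma_22": 0.00541,
    "sigma_53": 0.0, "mu_3": 0.0, "sigma_33": 0.0, "sigma_32": 0.0,
}

def parameters_to_review(measures, s):
    """Flag every parameter whose sensitivity measure exceeds the threshold s
    fixed by the experts (schematic version of the Appendix 1 algorithm)."""
    return sorted(name for name, value in measures.items() if value > s)

print(parameters_to_review(measures, 0.1))    # ['mu_5']
print(parameters_to_review(measures, 0.001))  # ['mu_5', 'sigma_22', 'sigma_52', 'sigma_55']
```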

Figure 2: Sensitivity measures obtained in the example for any perturbation value

The extreme behavior of the sensitivity measure in some particular cases is as follows. When δ1 = 20, the sensitivity measure is S_{μ5}(f(x5|X2 = 4), f(x5|X2 = 4, δ1)) = 8.33, and with the perturbation δ1 = −25 it is S_{μ5}(f(x5|X2 = 4), f(x5|X2 = 4, δ1)) = 13.02. If the perturbation is δ2 = 1000, then S_{σ55}(f(x5|X2 = 4), f(x5|X2 = 4, δ2)) = 1.39; therefore, as the perturbation increases to infinity, the sensitivity measure grows very slowly; in fact S_{σ55}(f(x5|X2 = 4), f(x5|X2 = 4, δ2)) = o(δ), as stated before. However, if δ2 = −23, the sensitivity measure is S_{σ55}(f(x5|X2 = 4), f(x5|X2 = 4, δ2)) = 9.91. We do not present the sensitivity measure when δ3 is extreme, because δ3 must lie in (−√2, √2) to keep the covariance matrix of the perturbed network positive definite. Finally, when δ4 = 100, the sensitivity measure is S_{σ22}(f(x5|X2 = 4), f(x5|X2 = 4, δ4)) = 0.02 (the limit of S_{σ22} as δ tends to infinity being 0.0208), and with δ4 = −1.73, S_{σ22}(f(x5|X2 = 4), f(x5|X2 = 4, δ4)) = 2.026 (the limit as δ tends to M_ee being 2.1212). Figure 2 shows the sensitivity measures considered as functions of δ; the graph shows the behavior of each measure for δ ∈ ℝ.


6 Conclusions

In this paper we study the behavior of the sensitivity measure, which compares the marginal density of interest when the model of the Gaussian Bayesian network is described with and without a perturbation δ ∈ ℝ, when the perturbation is extreme. For a large perturbation the sensitivity measure is large too, except when the extreme perturbation is added to the evidential variance. Even if the evidential variance were large and very different from the variance in the original model, the sensitivity measure would remain bounded by a finite value; this is because the evidence about this variable explains the behavior of the variable of interest regardless of its inaccurate variance. Moreover, in all the cases considered, if the perturbation added to a parameter tends to zero, the sensitivity measure tends to zero too. The study of the behavior of the sensitivity measure shows that it is a well-defined measure for developing a sensitivity analysis in Gaussian Bayesian networks, even if the proposed perturbation is extreme. Future research will focus on perturbing more than one parameter simultaneously, as well as on considering more than one variable of interest.

References

Castillo, E., Gutiérrez, J.M., Hadi, A.S. 1997. Expert Systems and Probabilistic Network Models. New York: Springer-Verlag.

Castillo, E., Kjærulff, U. 2003. Sensitivity analysis in Gaussian Bayesian networks using a symbolic-numerical technique. Reliability Engineering and System Safety, 79:139-148.

Chan, H., Darwiche, A. 2005. A Distance Measure for Bounding Probabilistic Belief Change. International Journal of Approximate Reasoning, 38(2):149-174.

Coupé, V.M.H., van der Gaag, L.C. 2002. Properties of sensitivity analysis of Bayesian belief networks. Annals of Mathematics and Artificial Intelligence, 36:323-356.

Gómez-Villegas, M.A., Maín, P., Susi, R. 2006. Sensitivity analysis in Gaussian Bayesian networks using a divergence measure. Communications in Statistics - Theory and Methods, accepted for publication.

Heckerman, D. 1998. A tutorial on learning with Bayesian networks. In: Jordan, M.I., ed., Learning in Graphical Models. Cambridge, Massachusetts: MIT Press.

Jensen, F.V. 2001. Bayesian Networks and Decision Graphs. New York: Springer.

Kullback, S., Leibler, R.A. 1951. On Information and Sufficiency. Annals of Mathematical Statistics, 22:79-86.

Laskey, K.B. 1995. Sensitivity Analysis for Probability Assessments in Bayesian Networks. IEEE Transactions on Systems, Man, and Cybernetics, 25(6):901-909.

Lauritzen, S.L. 1996. Graphical Models. Oxford: Clarendon Press.

Pearl, J. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Palo Alto: Morgan Kaufmann.

Appendix 1

The algorithm that computes the sensitivity measures and determines the parameters to be reviewed in the network is available at the URL:
http://www.ucm.es/info/eue/pagina//APOYO/RosarioSusiGarcia/S algorithm.pdf

Acknowledgments

This research was supported by MEC (Spain) Grant MTM2005-05462 and Universidad Complutense de Madrid-Comunidad de Madrid Grant UCM2005-910395.


Learning Bayesian Networks Structure using Markov Networks

Christophe Gonzales and Nicolas Jouve
Laboratoire d'Informatique de Paris VI

8 rue du capitaine Scott 75015 Paris, France

Abstract

This paper addresses the problem of learning a Bayes net (BN) structure from a database. We advocate first searching the space of Markov networks (MNs) to obtain an initial BN and, then, refining it into an optimal BN. More precisely, it can be shown that, under classical assumptions, our algorithm obtains the optimal BN's moral graph in polynomial time. This MN is thus optimal w.r.t. inference. The approach is attractive in that, in addition to providing optimality guarantees, the MN space is substantially smaller than the traditional search spaces (those of BNs and equivalence classes (ECs)). In practice, we face the classical shortcoming of constraint-based methods, namely the unreliability of high-order conditional independence tests, and handle it using efficient conditioning-set computations based on graph triangulations. Our preliminary experiments are promising, both in terms of the quality of the produced solutions and in terms of running times.

1 Introduction

Effective modeling of uncertainty is essential to most AI applications. For fifteen years, probabilistic graphical models like Bayesian nets (BNs) and Markov nets (MNs), a.k.a. Markov random fields (Cowell et al., 1999; Jensen, 1996; Pearl, 1988), have proved to be well suited and computationally efficient for dealing with uncertainty in many practical applications. Their key feature consists in exploiting probabilistic conditional independences (CIs) to decompose a joint probability distribution over a set of random variables as a product of functions over smaller sets, thus allowing a compact representation of the joint probability as well as efficient inference algorithms (Allen and Darwiche, 2003; Madsen and Jensen, 1999). These independences are encoded in the graphical structure of the model, which is directed for BNs and undirected for MNs. In this paper, we propose an algorithm to learn a BN structure from a data sample of the distribution of interest.

There exist three main classes of algorithms for learning BNs from data. The first one tries to determine the set of all probabilistic CIs using statistical independence tests (Verma and Pearl, 1990; Spirtes et al., 2000). Such tests have been criticized in the literature as they can only be applied with small conditioning sets, thus ruling out complex BNs. A more popular approach consists in searching the space of BN structures, optimizing some score (BDeu, MDL, etc.) measuring how well the structure fits the data (Heckerman et al., 1995; Lam and Bacchus, 1993). Unfortunately, the number of BNs is super-exponential in the number of random variables (Robinson, 1973) and an exhaustive comparison of all the structures is impossible. One way out is to use local search algorithms parsing the BN structure space efficiently, moving from one BN to the next by performing simple graphical modifications (Heckerman et al., 1995). However, these suffer from the existence of multiple BNs representing the same set of CIs. This not only lengthens the search but may also trap it in local optima. To avoid these problems, the third class of approaches (Munteanu and Bendou, 2001; Chickering, 2002) searches the space of BN equivalence classes (ECs). In this space, equivalent BNs are represented by a unique partially directed graph. In addition to speeding up the search by avoiding useless moves from one BN to an equivalent one, this class of algorithms possesses nice theoretical properties. For instance, under DAG-isomorphism, a classical hypothesis on the data-generative distribution, and in the limit of a large database, the GES algorithm is theoretically able to recover the optimal BN structure (Chickering, 2002). This property


is remarkable as very few search algorithms are ableto guarantee the quality of the returned structure.

Although, in theory, the EC space seems more attractive than the BN space, it suffers in practice from two problems: i) its neighborhood is exponential in the number of nodes (Chickering, 2002) and ii) the EC space size is roughly equal to that of the BN space (Gillispie and Perlman, 2001). It would thus be interesting to find a space preserving the optimality feature of the EC space exploited by GES while avoiding the above problems. For this purpose, the MN space seems a good candidate as, like the EC space, its equivalence classes are singletons. Moreover, it is exponentially smaller than the BN space and, by its undirected nature, its neighborhood is also exponentially smaller than that of ECs. This suggests that searching the MN space instead of the EC space can lead to significant improvements.

To support this claim, we propose a learning algorithm that mainly performs its search in the MN space. More precisely, it is divided into three distinct phases. In the first one, we use a local search algorithm that finds an optimal MN by searching the MN space. It is well known that only chordal MNs are precisely representable by BNs (Pearl, 1988). As it is unlikely that the MN we find at the end of the first phase is actually chordal, its transformation into a BN must come along with the loss of some CIs. Finding a set of CIs that can be dispensed with to map the MN into a BN while fitting the data as well as possible is not a trivial task. Phase 2 exploits the relationship between BNs and their moral graphs to transform the MN into a BN whose moral graph is not far from the MN obtained at the end of phase 1. Then, in a third phase, this BN is refined into one that better fits the data. This last phase uses GES's second step. The whole learning algorithm preserves GES's theoretical optimality guarantee. Furthermore, the BN at the end of phase 2, which possesses interesting theoretical properties, is obtained in polynomial time. Phase 3 is exponential but has an anytime property. Preliminary experimental results suggest that our learning algorithm is faster than GES and produces better-quality solutions.

The paper is organized as follows. Section 2 provides some background on MNs and BNs. Then, Section 3 describes how we obtain the optimal MN. Section 4 shows how it can be converted into a BN. Finally, Section 5 mentions some related work and presents some experimental results.

2 Background

Let V be a set of random variables with joint probability distribution P(V). Variables are denoted by capitalized letters (other than P) and sets of variables (except V) by bold letters.

Markov and Bayes nets are both composed of: i) a graphical structure whose nodes are the variables in V (hereafter, we refer indifferently to nodes and their corresponding variables) and ii) a set of numerical parameters. The structure encodes probabilistic CIs among variables and thus defines a family of probability distributions. The set of parameters, whose form depends on the structure, defines a unique distribution within this family and assesses it numerically. Once a structure is learned from data, its parameters are assessed, generally by maximizing the data likelihood given the model.

A MN structure is an undirected graph while a BN structure is a directed acyclic graph (DAG). MNs graphically encode CIs by separation. In an undirected graph G, two nodes X and Y are said to be separated by a disjoint set Z, denoted X ⊥G Y | Z, if each chain between X and Y has a node in Z. The BN criterion, called d-separation, is defined similarly except that it distinguishes colliders on a chain, i.e., nodes with their two adjacent arcs on the chain directed toward them. In a DAG, X and Y are said to be d-separated by Z, denoted X ⊥B Y | Z, if for each chain between X and Y there exists a node S such that, if S is a collider on the chain, then neither S nor any of its descendants is in Z, and otherwise S is in Z. Extending Pearl's terminology (Pearl, 1988), we will say that a graph G, directed or not, is an I-map (I standing for independency), implicitly relative to P, if X ⊥G Y | Z =⇒ X ⊥P Y | Z. We will also say that a graph G is an I-map of another graph G′ when X ⊥G Y | Z =⇒ X ⊥G′ Y | Z. When two structures are I-maps of one another, they represent the same family and are thus said to be equivalent.
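The d-separation queries used throughout can be decided with the classical ancestral-moral-graph criterion (a standard result, not an algorithm of this paper): X ⊥B Y | Z iff Z separates X and Y in the moralized subgraph induced by the ancestors of {X, Y} ∪ Z. A minimal sketch, with DAGs encoded as parent dictionaries (our own convention):

```python
def d_separated(parents, x, y, z):
    """Test X ⊥B Y | Z in the DAG given as `parents[node] -> set of parents`,
    via the ancestral moral graph criterion."""
    z = set(z)
    # 1. Ancestors of {x, y} union z (including the nodes themselves).
    anc, stack = set(), [x, y, *z]
    while stack:
        v = stack.pop()
        if v not in anc:
            anc.add(v)
            stack.extend(parents.get(v, ()))
    # 2. Moralize the induced subgraph: undirect arcs, marry co-parents.
    adj = {v: set() for v in anc}
    for child in anc:
        ps = sorted(p for p in parents.get(child, ()) if p in anc)
        for p in ps:
            adj[p].add(child); adj[child].add(p)
        for i in range(len(ps)):
            for j in range(i + 1, len(ps)):
                adj[ps[i]].add(ps[j]); adj[ps[j]].add(ps[i])
    # 3. Plain separation: no chain from x to y avoiding z.
    seen, stack = {x}, [x]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w == y:
                return False
            if w not in seen and w not in z:
                seen.add(w); stack.append(w)
    return True

chain = {"B": {"A"}, "C": {"B"}}          # A -> B -> C
collider = {"C": {"A", "B"}}              # A -> C <- B
print(d_separated(chain, "A", "C", {"B"}))     # True
print(d_separated(collider, "A", "B", set()))  # True
print(d_separated(collider, "A", "B", {"C"}))  # False
```

The collider case illustrates the asymmetry between separation and d-separation: conditioning on the common child C activates the chain.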

The complete graph, which exhibits no separation assertion, is a structure containing all the possible distributions and is, as such, always an I-map. Of course, in a learning prospect, our aim is to recover from data an I-map as sparse as possible. More precisely, in both the MN and BN frameworks, an I-map G is said to be optimal (w.r.t. inclusion) relative to P if there exists no other I-map G′ such that i) G is an I-map of G′ and ii) G is not equivalent to G′.

Though they are strongly related, BNs and MNs are not able to represent precisely the same independence shapes. MNs are mostly used in the image processing and computer vision community (Pérez, 1998), while BNs are particularly used in the AI community working on modeling and prediction tools like expert systems (Cowell et al., 1999). In the latter prospect, BNs are mostly preferred for two reasons. First, the kind of independences they can express using arc orientations is considered more interesting than the independence shapes only representable by MNs (Pearl, 1988). Secondly, as BN parameters are probabilities (while those of MNs are non-negative potential functions without any real meaning), they are considered easier to assess, manipulate and interpret. In BNs, direction is exploited only through V-structures, that is, three-node chains whose central node is a collider whose neighbors are not connected by an arc. This idea is expressed in a theorem (Pearl, 1988) stating that two BNs are equivalent if and only if they have the same V-structures and the same skeleton, where the skeleton of a DAG is the undirected graph resulting from the removal of its arc directions.
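That characterization is straightforward to operationalize: two DAGs are equivalent iff they share the same skeleton and the same V-structures. A small sketch, again with DAGs as hypothetical parent dictionaries:

```python
def signature(parents):
    """Skeleton and V-structures of a DAG given as `parents[node] -> parents`."""
    skeleton = {frozenset((p, c)) for c, ps in parents.items() for p in ps}
    vstructs = set()
    for c, ps in parents.items():
        ps = sorted(ps)
        for i in range(len(ps)):
            for j in range(i + 1, len(ps)):
                a, b = ps[i], ps[j]
                if frozenset((a, b)) not in skeleton:   # non-adjacent parents
                    vstructs.add((a, c, b))
    return skeleton, vstructs

def equivalent(b1, b2):
    """Two BN structures are equivalent iff they have the same skeleton and
    the same V-structures (Pearl, 1988)."""
    return signature(b1) == signature(b2)

chain = {"B": {"A"}, "C": {"B"}}       # A -> B -> C
fork = {"A": {"B"}, "C": {"B"}}        # A <- B -> C
collider = {"C": {"A", "B"}}           # A -> C <- B (a V-structure)
print(equivalent(chain, fork))         # True
print(equivalent(chain, collider))     # False
```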

MNs are unable to encode V-structure information since it is a directed notion. As a DAG without V-structures is chordal (i.e., triangulated), it is not surprising that a result (Pearl et al., 1989) states that the independences of a family of distributions can be represented by both the MN and BN frameworks if and only if the associated structure is chordal. In the general case where the graph is not chordal, it is possible to get an optimal MN from a BN by moralization. The moral graph of a DAG is the undirected graph obtained by first adding edges between non-adjacent nodes with a common child (i.e., the extremities of V-structures) and then removing the arc orientations. It is easily seen that the moral graph of a BN is an optimal undirected I-map of this graph, even if some independences of the BN have necessarily been lost in the transformation. The converse, i.e., getting a directed I-map from a MN, is less easy and will be addressed in Section 4.

Like most works aiming to recover an optimal structure from data, we will assume that the underlying distribution P is DAG-isomorph, i.e., that there exists a DAG B* encoding exactly the independences of P (s.t. X ⊥B* Y | Z ⇐⇒ X ⊥P Y | Z). Under this assumption, B* and equivalent BNs are obviously optimal. Using the axioms in (Pearl, 1988), it can be shown that, in the MN framework, the DAG-isomorphism hypothesis entails the so-called intersection property, leading to the existence of a unique optimal MN, say G* (Pearl, 1988). Moreover, G* is the moral graph of B*.

3 Markov network search

Searching the BN or the EC space is often performed as an optimization process, that is, the algorithm looks for a structure optimizing some goodness-of-fit measure, the latter being a decomposable scoring function that involves maximum likelihood estimates. For MNs, the computation of these estimates is hard and requires time-expensive methods (Murray and Ghahramani, 2004) (unless the MN is triangulated, which is not frequent). Hence, score-based exploration strategies seem inappropriate for MN searches. Using statistical CI tests and combining them to reveal the MN's edges seems a more promising approach. This is the one we follow here.

Algorithm LEARNMN
Input: database
Output: moral graph G = (V, E2)
1. E1 ← ∅
2. foreach edge (X, Y) ∉ E1 do
     search, if necessary, a new S_XY s.t. X ⊥(V,E1) Y | S_XY
3. if ∃ (X, Y) ∉ E1 s.t. X ⊥̸ Y | S_XY then
4.   add edge (X, Y) to E1 and go to line 2
5. E2 ← E1
6. foreach edge (X, Y) ∈ E2 do
     search, if necessary, a new S_XY s.t. X ⊥(V,E2\(X,Y)) Y | S_XY
7. if ∃ (X, Y) ∈ E2 s.t. X ⊥ Y | S_XY then
8.   remove edge (X, Y) from E2 and go to line 6
9. return G = (V, E2)

End of algorithm

Unlike most works where learning a MN amounts to independently learning the Markov blanket of every variable, we construct a MN in a local search manner. Algorithm LEARNMN consists of two consecutive phases: the first one adds edges to the empty graph until it converges toward a graph G1 = (V, E1). Then it removes from G1 as many edges as possible, resulting in a graph G2 = (V, E2).

At each step of phase 1, we compute the dependence of each pair of non-adjacent nodes (X, Y) conditionally on a set S_XY separating them in the current graph. We determine the pair with the strongest dependence (using a normalized difference to the χ² critical value), add the corresponding edge to the graph and update the separators.

Lemma 1. Assuming DAG-isomorphism and sound statistical tests, the graph G1 resulting from the first phase of LEARNMN is an I-map.

Sketch of proof. At the end of phase 1, for all (X, Y) ∉ E1 there exists S_XY s.t. X ⊥G1 Y | S_XY and X ⊥P Y | S_XY. By DAG-isomorphism, it can be shown that G1 contains the skeleton of B* and, then, that it also contains its moralization edges.

In the second phase, which takes the I-map G1 as input, LEARNMN detects the superfluous edges of G1 and iteratively removes them.

Proposition 1. Assuming DAG-isomorphism and sound statistical tests, the graph G2 returned by LEARNMN is the optimal MN G*.

Sketch of proof. At the end of phase 2, for all (X, Y) ∈ E2 there exists S_XY s.t. X ⊥(V,E2\(X,Y)) Y | S_XY and X ⊥̸P Y | S_XY. We show using DAG-isomorphism that this phase cannot delete any edge of G* and, then, that it discards all other edges.
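Replacing the statistical tests with an exact independence oracle makes the two phases easy to simulate. The sketch below is ours, not the paper's implementation: the oracle answers "S separates X and Y in a hypothetical ground-truth chain MN", and S_XY is taken as the current neighborhood of X, which always separates X from a non-neighbor in the current graph.

```python
from itertools import combinations

def separates(graph, x, y, s):
    """True iff every chain between x and y in `graph` meets the set s."""
    seen, stack = {x}, [x]
    while stack:
        u = stack.pop()
        for v in graph[u]:
            if v == y:
                return False
            if v not in seen and v not in s:
                seen.add(v)
                stack.append(v)
    return True

def learn_mn(nodes, indep):
    """Oracle version of LEARNMN; `indep(x, y, s)` answers CI queries."""
    adj = {v: set() for v in nodes}
    changed = True                  # phase 1: add edges between dependent pairs
    while changed:
        changed = False
        for x, y in combinations(nodes, 2):
            if y not in adj[x] and not indep(x, y, adj[x] - {y}):
                adj[x].add(y); adj[y].add(x)
                changed = True
    changed = True                  # phase 2: drop superfluous edges
    while changed:
        changed = False
        for x, y in combinations(nodes, 2):
            if y not in adj[x]:
                continue
            adj[x].discard(y); adj[y].discard(x)   # try the graph without (x, y)
            if indep(x, y, adj[x] - {y}):
                changed = True                     # the edge stays removed
            else:
                adj[x].add(y); adj[y].add(x)
    return adj

# Hypothetical ground-truth MN: the chain A - B - C - D.
true_adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
oracle = lambda x, y, s: separates(true_adj, x, y, s)
learned = learn_mn(["A", "B", "C", "D"], oracle)
assert learned == true_adj          # both phases recover the true structure
```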

It is well known that the main shortcoming of independence-test-based methods is the unreliability of high-order tests, so we must keep the sets S_XY as small as possible. Finding small S_XY requires a more sophisticated approach than simply using the set of neighbors of X or Y. Actually, the best set is the smallest one that cuts all the paths between X and Y. This can be computed using a min-cut in a max-flow problem (Tian et al., 1998). Although the set computed is then optimal w.r.t. the quality of the independence test, it has a major drawback: computing a min-cut for all possible pairs of nodes (X, Y) is too prohibitive when the graph involves numerous variables. Moreover, pairs of close nodes require redundant computations. An attractive alternative results from the similarities between min-cuts and junction trees: a min-cut separates the MN into several connected components, just as a separator cuts a junction tree into distinct connected components. Hence, we propose a method, based on join trees (Jensen, 1996), which computes the separators for all pairs at the same time in O(|V|⁴), as compared to O(|V|⁵) in (Tian et al., 1998). Although the separators we compute are not guaranteed to be optimal, in practice they are most often small. Here is the key idea: assume G0 is triangulated and a corresponding join tree J0 is computed. Then two cases can obtain: i) X and Y belong to the same clique, or ii) they belong to different cliques of J0. In the second case, let C0, ..., Ck be the smallest connected set of cliques such that X ∈ C0 and Y ∈ Ck, i.e., C0 and Ck are the nearest cliques containing X and Y. Then any separator on the path C0, ..., Ck cuts all the paths between X and Y in G0 and can thus act as an admissible set S_XY. Assuming the triangulation algorithm produces separators as small as possible, selecting the smallest one should keep the sets S_XY sufficiently small to allow independence tests. As for the first case, there is no separator between X and Y, so the junction tree is helpless. However, we do not need to assign to S_XY all the neighbors of X or Y, but only those that belong to the same biconnected component as X and Y (as only those can be on the chains between X and Y). Such components can be computed quickly by depth-first search algorithms. However, as we shall see, the triangulation algorithm we use determines them implicitly.

In order to produce triangulations in polynomial time (determining an optimal one is NP-hard), we used the simplicial and almost-simplicial rules advocated by (Eijkhof et al., 2002) as well as some heuristics. These rules give high-quality triangulations but are time-consuming. So, to speed up the process, we used incremental triangulations as suggested by (Flores et al., 2003). The idea is to update the triangulation only in the maximal prime subgraphs of G0 involved in the modifications resulting from LEARNMN's lines 4 and 8. In addition to the join tree, a join tree of maximal prime subgraphs is thus maintained by the algorithm. It turns out that merging in this tree the adjacent cliques that are linked by a separator containing more than one node of V precisely produces G0's biconnected components.

Once the join tree is constructed, extracting for each pair of nodes (X, Y) not belonging to the same clique its minimal separator is achieved by the collect/distribute algorithms below. In the latter, the messages Mij transmitted from a clique Cj to Ci are vectors of pairs (X, S) such that the best conditioning set between the nodes Y in Ci\(Ci ∩ Cj) and X is S. An illustrative example is given in Figure 1: the messages near solid arrows result from COLLECT(BCD, BCD) and those near dashed arrows from DISTRIBUTE(BCD, BCD, ∅).

Algorithm COLLECT
Input: pair (Ci, Cj)
Output: message Mij
1.  Mij ← an empty vector message
2.  foreach neighbor Ck of Ci except Cj do
3.    Mik ← Collect(Ck, Ci)
4.    foreach pair (X, S) in Mik do
5.      foreach node Y in Ci\(Ci ∩ Ck) do
6.        if S_XY ≠ S then
7.          S_XY ← S
8.      done
9.      add (X, min(Ci ∩ Ck, S)) to Mij
10.   done
11. done
12. foreach node X in Ci\(Ci ∩ Cj) do
13.   add (X, Ci ∩ Cj) to Mij
14. return Mij

End of algorithm

Algorithm DISTRIBUTE
Input: pair (Ci, Cj), message Nij
Output: none
1.  foreach pair (X, S) in Nij do
2.    foreach node Y in Ci\(Ci ∩ Cj) do
3.      if S_XY ≠ S then
4.        S_XY ← S
5.    done
6.  done
7.  foreach neighbor Ck of Ci except Cj do
8.    N ← an empty vector message
9.    foreach pair (X, S) in Nij do
10.     add (X, min(Ci ∩ Ck, S)) to N
11.   call Distribute(Ck, Ci, N)
12.   N′ ← the message Mik sent during the collect
13.   foreach pair (X, S) in N′ do
14.     add (X, min(Ci ∩ Ck, S)) to Nij
15.   done

End of algorithm

[Figure: a join tree with central clique BCD linked through the separators B, BD, C and B to the cliques AB, BDF, CE and BG; the arrows carry messages such as (A, B), (E, C), (G, B) and (F, B, D).]

Figure 1: Messages sent during collect/distribute.

Because we use a polynomial algorithm to triangulate the maximal prime subgraphs, the complexity of the whole process recovering the optimal MN is also polynomial. Hence, under DAG-isomorphism, we obtain the moral graph of the optimal BN in polynomial time. As the first step of inference algorithms consists in moralizing the BN and triangulating this moral graph to produce a secondary structure well suited for efficient computations, the MN we obtain is optimal w.r.t. inference.

4 From the Markov Net to the Bayes Net

Once the MN is obtained, we transform it into a BN B0 using algorithm MN2BN.

Algorithm MN2BN
Input: UG G = (V, E)
Output: DAG B = (V′, A)
1.  A ← ∅ ; V′ ← V
2.  While |V| > 1 Do
3.    Choose X ∈ V lying in a single clique of G
4.    E ← {Y ∈ V : (Y, X) ∈ E}
5.    For all Y ∈ E Do
6.      E ← E\(Y, X) and A ← A ∪ (Y → X)
7.    V ← V\{X}
8.    For all (Y, Z) ∈ E × E Do
9.      G ← DEMORALIZE(G, (Y, Z))
10. Done
11. Return B = (V′, A)

End of algorithm

Algorithm DEMORALIZE
Input: UG G = (V, E); edge (X, Y) ∈ E
Output: UG G′ = (V, E′)
1. E′ ← E\(X, Y)
2. Search Z ⊂ V s.t. X ⊥(V,E′) Y | Z
3. If X ⊥ Y | Z Then Return G′ = (V, E′)
4. Else Return G′ = (V, E)

End of algorithm
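A minimal runnable sketch of MN2BN and DEMORALIZE, under two simplifying assumptions of ours: a vertex "lying in a single clique" is found as a vertex whose neighborhood forms a clique, and the statistical test is replaced by an exact d-separation oracle, here hard-coded for the collider B → A ← C:

```python
def is_clique(adj, nodes):
    nodes = sorted(nodes)
    return all(nodes[j] in adj[nodes[i]]
               for i in range(len(nodes)) for j in range(i + 1, len(nodes)))

def demoralize(adj, y, z, indep):
    """Edge (y, z) is removable iff dropping it introduces no spurious
    independence, checked with the oracle on a separator."""
    adj[y].discard(z); adj[z].discard(y)
    s = adj[y] - {z}              # separates y and z once the edge is gone
    adj[y].add(z); adj[z].add(y)
    return indep(y, z, s)

def mn2bn(adj, indep):
    """Sketch of MN2BN: repeatedly pick a vertex whose neighborhood is a
    clique, direct its edges toward it, remove it, and try to demoralize
    the edges between its former neighbors."""
    adj = {v: set(n) for v, n in adj.items()}
    arcs = set()
    while len(adj) > 1:
        x = next(v for v in sorted(adj) if is_clique(adj, adj[v]))
        nbrs = sorted(adj[x])
        for y in nbrs:
            adj[y].discard(x)
            arcs.add((y, x))      # arc y -> x
        del adj[x]
        for i in range(len(nbrs)):
            for j in range(i + 1, len(nbrs)):
                y, z = nbrs[i], nbrs[j]
                if z in adj[y] and demoralize(adj, y, z, indep):
                    adj[y].discard(z); adj[z].discard(y)
    return arcs

# Moral graph of the collider B -> A <- C is the full triangle; the oracle
# answers d-separation queries in that DAG exactly (B and C are independent
# unless A is given).
triangle = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}}
oracle = lambda u, v, s: {u, v} == {"B", "C"} and "A" not in s
print(mn2bn(triangle, oracle))    # the two arcs B -> A and C -> A
```

On this toy input the moralization edge B - C is successfully demoralized, and the returned arcs reproduce the generating collider exactly.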

Before stating the properties of MN2BN, let us illustrate it with an example. Consider the graphs in Figure 2. Assuming a DAG-isomorph distribution, B* represents the optimal BN. Suppose we obtained the optimal MN G*; we now apply MN2BN to it. In G = G*, F and R belong to a single clique. Assume MN2BN chooses F. After saving its neighbor set in G (l. 4), F is removed from G (l. 7) as well as its adjacent edges (l. 6), which results in a new current graph G = G1. The empty DAG B is altered by adding arcs from these neighbors toward F, resulting in the next current DAG B = B1. By directing F's adjacent arcs toward F, we may create

Learning Bayesian Networks Structure using Markov Networks 151

Page 162: Proceedings of the Third European Workshop on Probabilistic

Figure 2: A MN2BN run example. (Top row: the successive undirected graphs G∗, G1, . . . , G6 over nodes C, H, K, R, L, J, F; bottom row: the corresponding DAGs B∗, B1, . . . , B6 = B0.)

V-structures, hence some of the edges remaining in G between F's neighbors may correspond to moralization edges and may thus be safely discarded. To this end, for each candidate edge, DEMORALIZE tests whether its deletion would introduce spurious independences in the DAG by searching a separator in the current remaining undirected graph G. If not, we discard it from G. In our example, there is only one candidate edge, (C, J). We therefore compute a set separating C from J in G1 without (C, J), say K, and test if C ⊥⊥ J | K. As this independence does not hold in a distribution exactly encoded by B∗, the test fails and the edge is kept. For the second iteration, only node R belongs to a single clique. The process is similar but, this time, moralization edge (K, L) can be discarded. The algorithm then goes on similarly and finally returns B6 = B0. It is easily seen that B0 is a DAG which is an I-map of B∗ and that its moral graph is G∗. Note that by mapping G∗ into B6, not all moralization edges have been identified. For instance, (C, F) was not.

Now, let us explain why MN2BN produces DAGs closely related to the optimal BN we look for.

Proposition 2. Assuming a DAG-isomorph distribution and sound statistical tests, if MN2BN is applied to G∗, the returned graph B0 is a DAG that is an I-map of B∗, and its moral graph is G∗. Moreover, the algorithm runs in polynomial time.

Sketch of proof. By induction on V, we prove that, at each iteration, G is the moral graph of a subgraph of B∗. This entails the existence of a node in a single clique at each iteration. Then we prove that, given a DAG B, its moral graph G and a node X belonging to a single clique of G, there exists a DAG B′ s.t.: i) B′ is an I-map of B, ii) G is the moral graph of B′, and iii) X has no child in B′. Finally, we prove the proposition by induction on V.

B0 encodes at least as many independences as G∗, and possibly more if some moralization edges have been discarded. In the case where MN2BN selects nodes in the inverse topological order of B∗, B0 is actually B∗. However, this should not happen very often, and recovering B∗ in general requires refining B0 by identifying all remaining hidden V-structures. Under a DAG-isomorph distribution and sound statistical tests, the second phase of GES (Chickering, 2002) is known to transform an I-map of the generative distribution into B∗. Applied to B0, it can therefore be used as the refinement phase of our algorithm. This phase has an exponential complexity but benefits from an anytime property, in the sense that it proceeds by constructing a sequence of BNs sharing the same moral graph and whose quality converges monotonically from B0 to B∗. This last phase preserves the GES score-based optimality.

With real-world databases, LEARNMN can fail to recover precisely G∗. In this case, departing from our theoretical framework, we lose our guarantee of accuracy. We also lose the guarantee to always find a node belonging to a single clique. However, MN2BN can easily be modified to handle this situation: if no node is found on line 3, just add edges to G so that a given node forms a clique with its neighbors. Whatever the node chosen, B0 is guaranteed

152 C. Gonzales and N. Jouve


to be an I-map of the distribution represented by the input MN. However, the node should be carefully chosen to avoid deviating too much from the MN passed in argument to MN2BN, as each edge addition hides conditional independences.

5 Related works and experiments

In this paper, we have presented an algorithm that exploits attractive features of the MN space to quickly compute an undirected I-map close to the optimal one. This graph is then directed and fed to GES second phase to obtain an optimal BN. The key idea is to learn as much as possible of the polynomially accessible information of the generative distribution, namely its undirected part, and to postpone to the end the task concentrating the learning computational complexity, namely the V-structure recovery. Most algorithms of the constraint-based approach, like IC (Verma and Pearl, 1990) or PC (Spirtes et al., 2000), deal at some stage with undirected learning but they do not explicitly separate this stage from the directed one. That is, they generally search a power set to extract V-structure information before achieving a whole undirected search. In (Dijk et al., 2003), the two phases are disjoint but the authors are more concerned with finding the skeleton (which is not an I-map) rather than the moral graph. As they only allow CI tests conditioned by a set of size no greater than 1, the search is mostly delegated to a directed score-based search. (Cheng et al., 2002) makes much stronger distribution assumptions.

Like GES, under DAG-isomorphism and sound CI test hypotheses, our method is able to find the optimal BN it looks for. However, these hypotheses probably do not hold in practice. To assess the discrepancy between theory and the real world, we performed a series of experiments on classical benchmarks of the Bayes Net Repository1. For each BN, we generated by logic sampling 10 databases of sizes 500, 2000 and 20000. For each database, we ran LEARNMN, GES (the WinMine toolkit software) and a K2 implemented with a BIC score and fed with the original BN topological ordering. K2 is thus already aware of some crucial learning information that LMN and GES must discover. For GES, we recorded as time response only the time required

1 http://compbio.cs.huji.ac.il/Repository

for GES phase 1 but, as the toolkit does not provide the network found at the end of phase 1, we used the better one resulting from phase 2. For both GES and K2, we did not consider directly the output BN but its moral graph, to compare it with the MN output by LMN. Note that the GES toolkit was unable to handle some of the graphs. Both time and quality results reported are means and standard deviations over the 10 databases. The first table shows running times in seconds for LMN and GES (on a 3.06 GHz PC). The second one shows, for each algorithm, the number of edges spuriously added (+) and missed (−) w.r.t. the original BN moral graph (whose number of edges appears in the first row).

According to these preliminary tests, we consider our approach promising. W.r.t. GES, we find that LMN is not time-consuming, as GES running time grows much faster than that of LMN when database size increases. As for network quality, it is noticeable that, for all algorithms, the smaller the database, the weaker the dependences represented in the database and hence the fewer the edges recovered. LMN clearly finds more edges than GES and K2, despite the latter's ordering knowledge which gives it a strong advantage. LMN adds fewer spurious edges than GES but more than K2 (which exploits its ordering knowledge). We think that LMN did not detect some of the unnecessary edges because the nodes involved had so many neighbors that χ2 tests were meaningless, despite our efforts. This suggests alternating multiple phases of edge additions and deletions, so as to avoid situations deteriorating due to big clusters of neighbors. We should also try to tune the χ2 confidence probability.

References

D. Allen and A. Darwiche. 2003. Optimal time-space tradeoff in probabilistic inference. In Proceedings of IJCAI, pages 969–975.

J. Cheng, R. Greiner, J. Kelly, D.A. Bell, and W. Liu. 2002. Learning Bayesian networks from data: An information-theory based approach. Artificial Intelligence, 137(1–2):43–90.

D.M. Chickering. 2002. Optimal structure identification with greedy search. Journal of Machine Learning Research, 3:507–554.

R.G. Cowell, A.P. Dawid, S.L. Lauritzen, and D.J.



Running times in seconds, Avg. (Stdev) over the 10 databases.
Networks, in order: alarm, barley, carpo, hailfinder, insurance, mildew, water.

size 20000:
  LMN: 3.65 (0.82), 9.38 (0.73), 11.96 (2.67), 15.56 (1.31), 3.59 (0.1), 3.29 (0.37), 7.48 (0.78)
  GES: 22.36 (1.72), 75.91 (4.19), 16.74 (4.22), 18.36 (3.91)  (only the four networks the toolkit handled)
size 2000:
  LMN: 0.98 (0.28), 1.46 (0.46), 1.79 (0.68), 2.36 (0.65), 1.05 (0.12), 0.56 (0.08), 1.37 (0.21)
  GES: 1.36 (0), 7.27 (0.52), 1.5 (0), 1.55 (0)  (four networks)
size 500:
  LMN: 0.47 (0.15), 0.31 (0.01), 0.71 (0.27), 1.24 (0.39), 0.63 (0.2), 0.28 (0.02), 0.44 (0.12)
  GES: 0.64 (0), 0.64 (0), 0.5 (0), 0.45 (0)  (four networks)

Edges spuriously added (+) and missed (−) w.r.t. the original BN moral graph, Avg. (Stdev).
Networks (number of moral-graph edges), in order: alarm (65), barley (126), carpo (102), hailfinder (99), insurance (70), mildew (80), water (123).

size 20000:
  GES +: 13.6 (2.69), 26.6 (3.85), 17.6 (5.14), 17.9 (2.34)  (four networks)
  GES −: 39.7 (1.1), 38.7 (4.22), 44.4 (3.17), 94.6 (2.15)
  LMN +: 2.1 (1.3), 15.7 (1.1), 9.4 (1.28), 9.7 (3), 0.4 (0.49), 3.2 (0.75), 0 (0)
  LMN −: 9.6 (1.11), 60.1 (2.02), 23.2 (1.89), 10.1 (0.94), 15.5 (1.63), 22.3 (1.55), 52.3 (1.85)
  K2 +:  3.2 (0.75), 3 (0), 3.1 (1.58), 0 (0), 0 (0), 1 (0), 0 (0)
  K2 −:  19.7 (1.79), 87.8 (0.6), 31.1 (2.43), 26 (0), 19.3 (1.1), 59 (0), 71.6 (2.58)
size 2000:
  GES +: 14 (0.63), 11.8 (3.6), 11.8 (2.32), 3.9 (1.14)  (four networks)
  GES −: 62 (0.77), 66.4 (2.2), 59.2 (1.47), 110.3 (1.73)
  LMN +: 3.4 (0.8), 13.3 (2.19), 7.5 (3.35), 2.2 (1.17), 0.3 (0.46), 4.5 (1.2), 0.2 (0.4)
  LMN −: 21.3 (2.33), 79 (1.73), 33.3 (2.61), 15.4 (1.28), 24.9 (2.12), 40.4 (1.2), 75 (3.87)
  K2 +:  3.2 (0.6), 3.1 (0.3), 2.3 (0.78), 0 (0), 0 (0), 0 (0), 0 (0)
  K2 −:  34.9 (1.37), 98.3 (0.78), 45.7 (3.72), 50.2 (1.83), 38.2 (1.47), 64.4 (1.28), 98.2 (2.48)
size 500:
  GES +: 3.9 (1.64), 10.3 (2.28), 5.3 (1), 0.5 (0.5)  (four networks)
  GES −: 65 (0), 89.1 (2.07), 65.1 (1.14), 118.2 (1.25)
  LMN +: 7.1 (2.26), 8.2 (1.25), 5.9 (2.12), 2 (1.26), 1.1 (0.94), 4.4 (1.5), 0.4 (0.92)
  LMN −: 31.5 (2.01), 90.2 (1.08), 54.2 (2.52), 33.6 (3.07), 34.7 (1.68), 56.8 (0.98), 88.3 (2.24)
  K2 +:  3.7 (0.78), 3.4 (0.49), 4.5 (2.29), 0.1 (0.3), 0.3 (0.46), 0 (0), 0 (0)
  K2 −:  42.7 (1), 105.2 (0.98), 60.8 (2.44), 65.5 (1.2), 47.2 (0.98), 74.8 (0.75), 101.6 (1.2)

Spiegelhalter. 1999. Probabilistic Networks and Expert Systems. Springer-Verlag.

S. van Dijk, L. van der Gaag, and D. Thierens. 2003. A skeleton-based approach to learning Bayesian networks from data. In Proceedings of PKDD.

F. van den Eijkhof, H.L. Bodlaender, and A.M.C.A. Koster. 2002. Safe reduction rules for weighted treewidth. Technical Report UU-CS-2002-051, Utrecht University.

J. Flores, J. Gamez, and K. Olesen. 2003. Incremental compilation of Bayesian networks. In Proceedings of UAI, pages 233–240.

S.B. Gillispie and M.D. Perlman. 2001. Enumerating Markov equivalence classes of acyclic digraph models. In Proceedings of UAI, pages 171–177.

D. Heckerman, D. Geiger, and D.M. Chickering. 1995. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20(3):197–243.

F.V. Jensen. 1996. An Introduction to Bayesian Networks. Springer-Verlag, North America.

W. Lam and F. Bacchus. 1993. Using causal information and local measures to learn Bayesian networks. In Proceedings of UAI, pages 243–250.

A.L. Madsen and F.V. Jensen. 1999. Lazy propagation: A junction tree inference algorithm based on lazy evaluation. Artificial Intelligence, 113:203–245.

P. Munteanu and M. Bendou. 2001. The EQ framework for learning equivalence classes of Bayesian networks. In Proceedings of the IEEE International Conference on Data Mining, pages 417–424.

I. Murray and Z. Ghahramani. 2004. Bayesian learning in undirected graphical models: Approximate MCMC algorithms. In Proceedings of UAI.

J. Pearl, D. Geiger, and T.S. Verma. 1989. The logic of influence diagrams. In R.M. Oliver and J.Q. Smith, editors, Influence Diagrams, Belief Networks and Decision Analysis. John Wiley and Sons.

J. Pearl. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.

P. Pérez. 1998. Markov random fields and images. CWI Quarterly, 11(4):413–437.

R.W. Robinson. 1973. Counting labeled acyclic digraphs. In F. Harary, editor, New Directions in Graph Theory, pages 239–273. Academic Press.

P. Spirtes, C. Glymour, and R. Scheines. 2000. Causation, Prediction, and Search. Springer Verlag.

J. Tian, A. Paz, and J. Pearl. 1998. Finding minimal d-separators. Technical Report TR-254, Cognitive Systems Laboratory.

T.S. Verma and J. Pearl. 1990. Causal networks: Semantics and expressiveness. In Proceedings of UAI, pages 69–76.


Estimation of linear, non-gaussian causal models in the presence of confounding latent variables

P. O. Hoyer, S. Shimizu, and A. J. Kerminen

[Pages 155–162 of the proceedings. The body of this paper did not survive extraction: apart from the running title and author footers, its text, equations, algorithms and references are unrecoverably garbled. The legible fragments indicate figures of example causal networks over observed variables x1, . . . , x8 and disturbance variables e1, . . . , e8 with numeric connection weights, including panels labelled "connections between observed variables" and "connections to/from hidden variables".]


Symmetric Causal Independence Models for Classification

Rasa Jurgelenaite and Tom Heskes
Institute for Computing and Information Sciences
Radboud University Nijmegen
Toernooiveld 1, 6525 ED Nijmegen, The Netherlands

Abstract

Causal independence modelling is a well-known method both for reducing the size of probability tables and for explaining the underlying mechanisms in Bayesian networks. In this paper, we propose an application of an extended class of causal independence models, causal independence models based on the symmetric Boolean function, for classification. We present an EM algorithm to learn the parameters of these models, and study convergence of the algorithm. Experimental results on the Reuters data collection show the competitive classification performance of causal independence models based on the symmetric Boolean function in comparison to the noisy OR model and, consequently, with other state-of-the-art classifiers.

1 Introduction

Bayesian networks (Pearl, 1988) are well-established as a sound formalism for representing and reasoning with probabilistic knowledge. However, because the number of conditional probabilities for a node grows exponentially with the number of its parents, it is usually unreliable if not infeasible to specify the conditional probabilities for a node that has a large number of parents. The task of assessing conditional probability distributions becomes even more complex if the model has to integrate expert knowledge. While learning algorithms can be forced to take into account an expert's view, for the best possible results the experts must be willing to reconsider their ideas in light of the model's 'discovered' structure. This requires a clear understanding of the model by the domain expert. Causal independence models (Díez, 1993; Heckerman and Breese, 1994; Srinivas, 1993; Zhang and Poole, 1996) can both limit the number of conditional probabilities to be assessed and provide the ability for models to be understood by domain experts in the field. The main idea of causal independence models is that causes influence a given common effect through intermediate variables and an interaction function.

Causal independence assumptions are often used in practical Bayesian network models (Kappen and Neijt, 2002; Shwe et al., 1991). However, most researchers restrict themselves to using only the logical OR and logical AND operators to define the interaction among causes. The resulting probabilistic submodels are called noisy OR and noisy AND; their underlying assumption is that the presence of either at least one cause or all causes at the same time gives rise to the effect. Several authors have proposed to expand the space of interaction functions with other symmetric Boolean functions: the idea was already mentioned but not developed further in (Meek and Heckerman, 1997), an analysis of the qualitative patterns was presented in (Lucas, 2005), and assessment of conditional probabilities was studied in (Jurgelenaite et al., 2006).

Even though for some real-world problems the intermediate variables are observable (see Visscher et al. (2005)), in many problems these variables are latent. Therefore, conditional probability distributions depend on unknown parameters which must be estimated from data,


using maximum likelihood (ML) or maximum a posteriori (MAP) estimation. One of the most widespread techniques for finding ML or MAP estimates is the expectation-maximization (EM) algorithm. Meek and Heckerman (1997) provided a general scheme for using the EM algorithm to compute the maximum likelihood estimates of the parameters in causal independence models, assuming that each local distribution function is a collection of multinomial distributions. Vomlel (2006) described the application of the EM algorithm to learn the parameters of the noisy OR model for classification.

The application of an extended class of causal independence models, namely causal independence models with a symmetric Boolean function as an interaction function, to classification is the main topic of this paper. These models will further be referred to as the symmetric causal independence models. We present an EM algorithm to learn the parameters in symmetric causal independence models, and study convergence of the algorithm. Experimental results show the competitive classification performance of the symmetric causal independence models in comparison with the noisy OR classifier and, consequently, with other widely-used classifiers.

The remainder of this paper is organised as follows. In the following section, we review Bayesian networks and discuss the semantics of symmetric causal independence models. In Section 3, we state the EM algorithm for finding the parameters in symmetric causal independence models. The maxima of the log-likelihood function for the symmetric causal independence models are examined in Section 4. Finally, Section 5 presents the experimental results, and conclusions are drawn in Section 6.

2 Symmetric Boolean Functions for Modelling Causal Independence

2.1 Bayesian Networks

A Bayesian network B = (G, Pr) represents a factorised joint probability distribution on a set of random variables V. It consists of two parts: (1) a qualitative part, represented as an acyclic directed graph (ADG) G = (V(G), A(G)),

Figure 1: Causal independence model (causes C1, C2, . . . , Cn; hidden variables H1, H2, . . . , Hn; common effect E with interaction function f).

where there is a 1–1 correspondence between the vertices V(G) and the random variables in V, and arcs A(G) represent the conditional (in)dependencies between the variables; (2) a quantitative part Pr consisting of local probability distributions Pr(V | π(V)), for each variable V ∈ V given the parents π(V) of the corresponding vertex (interpreted as variables). The joint probability distribution Pr is factorised according to the structure of the graph, as follows:

    Pr(V) = ∏_{V ∈ V} Pr(V | π(V)).

Each variable V ∈ V has a finite set of mutually exclusive states. In this paper, we assume all variables to be binary; as an abbreviation, we will often use v+ to denote V = ⊤ (true) and v− to denote V = ⊥ (false). We interpret ⊤ as 1 and ⊥ as 0 in an arithmetic context. An expression such as

    ∑_{ψ(H1,...,Hn)=⊤} g(H1, . . . , Hn)

stands for summing g(H1, . . . , Hn) over all possible values of the variables Hk for which the constraint ψ(H1, . . . , Hn) = ⊤ holds.

2.2 Semantics of Symmetric Causal Independence Models

Causal independence is a popular way to specify interactions among cause variables. The global structure of a causal independence model is shown in Figure 1; it expresses the idea that causes C1, . . . , Cn influence a given common effect E through hidden variables H1, . . . , Hn and a deterministic function f, called the interaction function. The impact of each cause Ci on the common effect E is independent of each

164 R. Jurgelenaite and T. Heskes


other cause Cj, j ≠ i. The hidden variable Hi is considered to be a contribution of the cause variable Ci to the common effect E. The function f represents in which way the hidden effects Hi, and indirectly also the causes Ci, interact to yield the final effect E. Hence, the function f is defined in such a way that when a relationship, as modelled by the function f, between Hi, i = 1, . . . , n, and E = ⊤ is satisfied, then it holds that f(H1, . . . , Hn) = ⊤. It is assumed that Pr(e+ | H1, . . . , Hn) = 1 if f(H1, . . . , Hn) = ⊤, and Pr(e+ | H1, . . . , Hn) = 0 if f(H1, . . . , Hn) = ⊥.

A causal independence model is defined in terms of the causal parameters Pr(Hi | Ci), for i = 1, . . . , n, and the function f(H1, . . . , Hn). Most papers on causal independence models assume that absent causes do not contribute to the effect (Heckerman and Breese, 1994; Pearl, 1988). In terms of probability theory this implies that it holds that Pr(h+i | c−i) = 0; as a consequence, it holds that Pr(h−i | c−i) = 1. In this paper we make the same assumption.

In situations in which the model does not capture all possible causes, it is useful to introduce a leaky cause which summarizes the unidentified causes contributing to the effect and is assumed to be always present (Henrion, 1989). We model this leak term by adding an additional input Cn+1 = 1 to the data; in an arithmetic context the leaky cause is treated in the same way as identified causes.

The conditional probability of the occurrence of the effect E given the causes C1, . . . , Cn, i.e., Pr(e+ | C1, . . . , Cn), can be obtained from the causal parameters Pr(Hl | Cl) as follows (Zhang and Poole, 1996):

    Pr(e+ | C1, . . . , Cn) = ∑_{f(H1,...,Hn)=⊤} ∏_{i=1}^{n} Pr(Hi | Ci).    (1)
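Equation (1) can be evaluated directly by enumerating the hidden-variable assignments; a small Python sketch for the noisy OR case (function and parameter names are our own, and the numbers are illustrative):

```python
from itertools import product

def pr_effect(causal_params, f):
    """Pr(e+ | C1..Cn) per Equation (1): sum, over hidden assignments H
    with f(H) true, of the product of causal parameters Pr(Hi | Ci).
    causal_params[i] is Pr(h_i+ | C_i) for the observed cause values."""
    n = len(causal_params)
    total = 0.0
    for h in product([0, 1], repeat=n):
        if f(h):
            p = 1.0
            for hi, pi in zip(h, causal_params):
                p *= pi if hi else (1.0 - pi)
            total += p
    return total

# Noisy OR: the effect occurs if at least one hidden variable is true.
noisy_or = lambda h: any(h)
# Two present causes with Pr(h_i+ | c_i+) = 0.8 and 0.5:
p = pr_effect([0.8, 0.5], noisy_or)   # equals 1 - 0.2 * 0.5 = 0.9
```

This brute-force sum is exponential in n; Section 2.3 shows how symmetric interaction functions avoid the enumeration.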

In this paper, we assume that the function f in Equation (1) is a Boolean function. However, there are 2^(2^n) different n-ary Boolean functions (Enderton, 1972; Wegener, 1987); thus, the potential number of causal interaction models is huge. However, if we assume that the order of the cause variables does not matter, the Boolean functions become symmetric (Wegener, 1987) and their number reduces to 2^(n+1).

An important symmetric Boolean function is the exact Boolean function εl, which has function value true, i.e. εl(H1, . . . , Hn) = ⊤, if ∑_{i=1}^{n} ν(Hi) = l, with ν(Hi) equal to 1 if Hi is equal to true and 0 otherwise. A symmetric Boolean function can be decomposed in terms of the exact functions εl as (Wegener, 1987):

    f(H1, . . . , Hn) = ⋁_{i=0}^{n} εi(H1, . . . , Hn) ∧ γi    (2)

where γi are Boolean constants depending only on the function f. For example, for the Boolean function defined in terms of the OR operator we have γ0 = ⊥ and γ1 = . . . = γn = ⊤.

Another useful symmetric Boolean function is the threshold function τk, which simply checks whether there are at least k trues among the arguments, i.e. τk(I1, . . . , In) = ⊤, if ∑_{j=1}^{n} ν(Ij) ≥ k, with ν(Ij) equal to 1 if Ij is equal to true and 0 otherwise. To express it in the Boolean constants we have: γ0 = · · · = γk−1 = ⊥ and γk = · · · = γn = ⊤. Causal independence models based on the Boolean threshold function will further be referred to as the noisy threshold models.

2.3 The Poisson Binomial Distribution

Using the property of Equation (2) of the symmetric Boolean functions, the conditional probability of the occurrence of the effect E given the causes C1, . . . , Cn can be decomposed in terms of probabilities that exactly l hidden variables H1, . . . , Hn are true as follows:

    Pr(e+ | C1, . . . , Cn) = ∑_{0 ≤ l ≤ n} γl ∑_{εl(H1,...,Hn)} ∏_{i=1}^{n} Pr(Hi | Ci).

Let l denote the number of successes in n independent trials, where pi is the probability of success in the ith trial, i = 1, . . . , n; let p = (p1, . . . , pn); then B(l; p) denotes the Poisson binomial distribution (Le Cam, 1960; Dar-

Symmetric Causal Independence Models for Classification 165


roch, 1964):

    B(l; p) = ∏_{i=1}^{n} (1 − pi) · ∑_{1 ≤ j1 < ... < jl ≤ n} ∏_{z=1}^{l} pjz / (1 − pjz).

Let us define a vector of probabilistic parameters p(C1, . . . , Cn) = (p1, . . . , pn) with pi = Pr(h+i | Ci). Then the connection between the Poisson binomial distribution and the class of symmetric causal independence models is as follows.

Proposition 1. It holds that:

    Pr(e+ | C1, . . . , Cn) = ∑_{i=0}^{n} B(i; p(C1, . . . , Cn)) γi.
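The product formula for B(l; p) sums over all size-l index sets, which is exponential; a standard dynamic-programming recurrence computes all B(l; p) in O(n²) time. The sketch below is ours (illustrative names) and checks Proposition 1 for the noisy OR case, where Equation (1) reduces to 1 − ∏(1 − pi):

```python
def poisson_binomial(p):
    """All B(l; p), l = 0..n, via the standard recurrence:
    fold in one trial at a time, updating the success-count distribution."""
    dist = [1.0]                            # zero trials: zero successes w.p. 1
    for pi in p:
        new = [0.0] * (len(dist) + 1)
        for l, mass in enumerate(dist):
            new[l] += mass * (1 - pi)       # this trial fails
            new[l + 1] += mass * pi         # this trial succeeds
        dist = new
    return dist

def pr_effect_sym(p, gamma):
    """Proposition 1: Pr(e+ | C1..Cn) = sum_i B(i; p) * gamma_i."""
    return sum(b * g for b, g in zip(poisson_binomial(p), gamma))

# Noisy OR over three present causes (illustrative parameters):
# gamma = (bot, top, top, top), i.e. (0, 1, 1, 1)
p = [0.8, 0.5, 0.3]
pr = pr_effect_sym(p, [0, 1, 1, 1])   # equals 1 - 0.2*0.5*0.7 = 0.93, up to rounding
```

Swapping in the threshold constants, e.g. gamma = (0, 0, 1, 1) for τ2, gives the noisy threshold model with no change to the code.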

3 EM Algorithm

Let D = {x1, . . . , xN} be a data set of independent and identically distributed settings of the observed variables in a symmetric causal independence model, where

    xj = (cj, ej) = (cj1, . . . , cjn, ej).

We assume that no additional information about the model is available. Therefore, to learn the parameters of the model we maximize the conditional log-likelihood

    CLL(θ) = ∑_{j=1}^{N} ln Pr(ej | cj, θ).

The expectation-maximization (EM) algorithm (Dempster et al., 1977) is a general method to find the maximum likelihood estimate of the parameters in probabilistic models, where the data is incomplete or the model has hidden variables.

Let θ = (θ1, . . . , θn) be the parameters of the symmetric causal independence model, where θ_i = Pr(h^+_i | c^+_i). Then, after some calculations, the (z + 1)-th iteration of the EM algorithm for symmetric causal independence models is given by:

Expectation step: For every data sample x^j = (c^j, e^j) with j = 1, . . . , N, we form p^{(z,j)} = (p^{(z,j)}_1, . . . , p^{(z,j)}_n), where p^{(z,j)}_i = θ^{(z)}_i c^j_i.

Let us define

\[
\mathbf{p}^{(z,j)}_{\setminus k} = \bigl(p^{(z,j)}_1, \ldots, p^{(z,j)}_{k-1}, p^{(z,j)}_{k+1}, \ldots, p^{(z,j)}_n\bigr).
\]

Subsequently, for all hidden variables Hk with k = 1, . . . , n we compute the probability Pr(h^+_k | e^j, c^j, θ^{(z)}), where

\[
\Pr(h^+_k \mid e^j, \mathbf{c}^j, \boldsymbol{\theta}^{(z)}) =
\frac{p^{(z,j)}_k \sum_{i=0}^{n-1} B\bigl(i;\mathbf{p}^{(z,j)}_{\setminus k}\bigr)\,\gamma_{i+1}}
{\sum_{i=0}^{n} B\bigl(i;\mathbf{p}^{(z,j)}\bigr)\,\gamma_i}
\quad \text{if } e^j = 1,
\]

and

\[
\Pr(h^+_k \mid e^j, \mathbf{c}^j, \boldsymbol{\theta}^{(z)}) =
\frac{p^{(z,j)}_k \Bigl(1 - \sum_{i=0}^{n-1} B\bigl(i;\mathbf{p}^{(z,j)}_{\setminus k}\bigr)\,\gamma_{i+1}\Bigr)}
{1 - \sum_{i=0}^{n} B\bigl(i;\mathbf{p}^{(z,j)}\bigr)\,\gamma_i}
\quad \text{if } e^j = 0.
\]

Maximization step: Update the parameter estimates for all k = 1, . . . , n:

\[
\theta_k = \frac{\sum_{1\le j\le N} c^j_k \Pr(h^+_k \mid e^j, \mathbf{c}^j, \boldsymbol{\theta}^{(z)})}{\sum_{1\le j\le N} c^j_k}.
\]

4 Analysis of the Maxima of the Log-likelihood Function

Generally, there is no guarantee that the EM algorithm will converge to a global maximum of the log-likelihood. In this section, we investigate the maxima of the conditional log-likelihood function for symmetric causal independence models.

4.1 Noisy OR and Noisy AND Models

In this section we will show that the conditional log-likelihood for the noisy OR and the noisy AND models has only one maximum. Since the conditional log-likelihood for these models is not necessarily concave, we will use a monotonic transformation to prove the absence of stationary points other than global maxima.

First, we establish a connection between the maxima of the log-likelihood function and the maxima of the corresponding composite function.

166 R. Jurgelenaite and T. Heskes

Page 177: Proceedings of the Third European Workshop on Probabilistic

Proposition 2. (Global optimality condition for concave functions (Boyd and Vandenberghe, 2004)) Suppose h(q) : Q → ℝ is concave and differentiable on Q. Then q∗ ∈ Q is a global maximum if and only if

\[
\nabla h(\mathbf{q}^*) = \Bigl(\frac{\partial h(\mathbf{q}^*)}{\partial q_1}, \ldots, \frac{\partial h(\mathbf{q}^*)}{\partial q_n}\Bigr)^{T} = \mathbf{0}.
\]

Further, we consider the function CLL(θ) = h(q(θ)). Let CLL(θ) and h(q(θ)) be twice differentiable functions, and let q(θ) be a differentiable, injective function with inverse θ(q). Then the following relationship between the stationary points of the functions CLL and h holds.

Lemma 1. Suppose θ∗ is a stationary point of CLL(θ). Then the corresponding point q(θ∗) is a stationary point of h(q(θ)).

Proof. Since the function q(θ) is differentiable and injective, its Jacobian matrix ∂(q1, . . . , qn)/∂(θ1, . . . , θn) is positive definite. Therefore, from the chain rule it follows that if ∇CLL(θ∗) = 0, then ∇h(q(θ∗)) = 0.

Proposition 3. If h(q(θ)) is concave and θ∗ is a stationary point of CLL(θ), then θ∗ is a global maximum.

Proof. If θ∗ is a stationary point, then from Lemma 1 it follows that q(θ∗) is also stationary. From the global optimality condition for concave functions, the stationary point q(θ∗) is a maximum of h(q(θ)); thus from the definition of a global maximum we get that for all θ

CLL(θ) = h(q(θ)) ≤ h(q(θ∗)) = CLL(θ∗).

Given Proposition 3, the absence of local optima can be proven by introducing a monotonic transformation q(θ) such that the composite function h(q(θ)) is concave. As it is a known result that the Hessian matrix of the log-likelihood function for logistic regression is negative semidefinite, and hence the problem has no local optima, we will use transformations that allow us to write the log-likelihood for the noisy OR and noisy AND models in a form similar to that of the logistic regression model.

The conditional probability of the effect in a noisy OR model can be written:

\[
\Pr(e^+ \mid \mathbf{c}, \boldsymbol{\theta})
= 1 - \prod_{i=1}^{n} \Pr(h^-_i \mid c_i)
= 1 - \prod_{i=1}^{n} (1-\theta_i)^{c_i}
= 1 - \exp\Bigl(\sum_{i=1}^{n} \ln(1-\theta_i)\, c_i\Bigr).
\]

Let us choose the monotonic transformation q_i = − ln(1 − θ_i), i = 1, . . . , n. Then the conditional probability of the effect in a noisy OR model equals

\[
\Pr(e^+ \mid \mathbf{c}, \mathbf{q}) = 1 - e^{-\mathbf{q}^T\mathbf{c}}.
\]
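This reparameterisation is easy to verify numerically; a small sketch (function names ours), assuming binary cause values:

```python
import math
from itertools import product

def noisy_or_theta(theta, c):
    # Pr(e+ | c, theta) = 1 - prod_i (1 - theta_i)^{c_i}
    pr = 1.0
    for t, ci in zip(theta, c):
        pr *= (1.0 - t) ** ci
    return 1.0 - pr

def noisy_or_q(q, c):
    # Same model after the transformation q_i = -ln(1 - theta_i):
    # Pr(e+ | c, q) = 1 - exp(-q^T c).
    return 1.0 - math.exp(-sum(qi * ci for qi, ci in zip(q, c)))

theta = [0.3, 0.6, 0.1]   # arbitrary illustrative parameters
q = [-math.log(1.0 - t) for t in theta]
```

Both parameterisations agree on every binary cause vector c.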

Let us define z_j = q^T c^j and f(z_j) = Pr(e+ | c^j, q); then the function h reads

\[
h(\mathbf{q}) = \sum_{j=1}^{N} \bigl[\, e^j \ln f(z_j) + (1-e^j)\ln(1-f(z_j)) \,\bigr]. \tag{3}
\]

Since f′(z_j) = 1 − f(z_j), the first derivative of h is

\[
\frac{\partial h(\mathbf{q})}{\partial \mathbf{q}}
= \sum_{j=1}^{N} \frac{f'(z_j)\,(e^j - f(z_j))}{f(z_j)(1-f(z_j))}\,\mathbf{c}^j
= \sum_{j=1}^{N} \frac{e^j - f(z_j)}{f(z_j)}\,\mathbf{c}^j.
\]

To prove that the function h is concave we need to prove that its Hessian matrix is negative semidefinite. The Hessian matrix of h reads

\[
\frac{\partial^2 h(\mathbf{q})}{\partial \mathbf{q}\,\partial \mathbf{q}^T}
= -\sum_{j=1}^{N} \frac{1-f(z_j)}{f(z_j)^2}\, e^j\, \mathbf{c}^j (\mathbf{c}^j)^T.
\]

As the Hessian matrix of h is negative semidefinite, the function h is concave. Therefore, from Proposition 3 it follows that every stationary point of the log-likelihood function for the noisy OR model is a global maximum.

Symmetric Causal Independence Models for Classification 167


The conditional probability of the effect in a noisy AND model can be written:

\[
\Pr(e^+ \mid \mathbf{c}, \boldsymbol{\theta})
= \prod_{i=1}^{n} \Pr(h^+_i \mid c_i)
= \prod_{i=1}^{n} \theta_i^{c_i}
= \exp\Bigl(\sum_{i=1}^{n} \ln \theta_i\, c_i\Bigr).
\]

Let us choose the monotonic transformation q_i = ln θ_i, i = 1, . . . , n. Then the conditional probability of the effect in a noisy AND model equals

\[
\Pr(e^+ \mid \mathbf{c}, \mathbf{q}) = e^{\mathbf{q}^T\mathbf{c}}.
\]

Let us define z_j = q^T c^j and f(z_j) = Pr(e+ | c^j, q). The function h is the same as for the noisy OR model in Equation (3). Combined with f′(z_j) = f(z_j), this yields the first derivative of h,

\[
\frac{\partial h(\mathbf{q})}{\partial \mathbf{q}}
= \sum_{j=1}^{N} \frac{f'(z_j)\,(e^j - f(z_j))}{f(z_j)(1-f(z_j))}\,\mathbf{c}^j
= \sum_{j=1}^{N} \frac{e^j - f(z_j)}{1-f(z_j)}\,\mathbf{c}^j,
\]

and the Hessian matrix

\[
\frac{\partial^2 h(\mathbf{q})}{\partial \mathbf{q}\,\partial \mathbf{q}^T}
= -\sum_{j=1}^{N} \frac{f(z_j)}{(1-f(z_j))^2}\,(1-e^j)\, \mathbf{c}^j (\mathbf{c}^j)^T.
\]

Hence, the function h is concave, and the log-likelihood for the noisy AND model has no stationary points other than the global maxima.

4.2 General Case

The EM algorithm is only guaranteed to converge to a local maximum or saddle point. Thus, we can only be sure that the global maximum, i.e. a point θ∗ such that CLL(θ∗) ≥ CLL(θ) for all θ, will be found if the log-likelihood has neither saddle points nor local maxima. However, the log-likelihood function for a causal independence model with an arbitrary symmetric Boolean function does not always fulfill this requirement, as shown in the following counterexample.

Example 1. Let us assume a data set D = {(1, 1, 1, 1), (1, 0, 1, 0)} and the interaction function ε1, i.e. γ1 = 1 and γ0 = γ2 = γ3 = 0. To learn the hidden parameters of the model describing this interaction we have to maximize the conditional log-likelihood function

\[
\begin{aligned}
CLL(\boldsymbol{\theta}) = {}& \ln\bigl[\theta_1(1-\theta_2)(1-\theta_3) + (1-\theta_1)\theta_2(1-\theta_3) \\
& \qquad + (1-\theta_1)(1-\theta_2)\theta_3\bigr] \\
& + \ln\bigl[1 - \theta_1(1-\theta_3) - (1-\theta_1)\theta_3\bigr].
\end{aligned}
\]

Depending on the choice of initial parameter settings θ^{(0)}, the EM algorithm for symmetric causal independence models converges to one of the maxima:

\[
CLL(\boldsymbol{\theta})_{\max} =
\begin{cases}
0 & \text{at } \boldsymbol{\theta} = (0, 1, 0),\\
-1.386 & \text{at } \boldsymbol{\theta} \in \bigl\{(\theta_1, 0, \tfrac{1}{2}),\ (\tfrac{1}{2}, 0, \theta_3)\bigr\}.
\end{cases}
\]

Obviously, only the point θ = (0, 1, 0) is a global maximum of the log-likelihood function, while the other obtained points are local maxima.
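The two maximum values of Example 1 are easy to confirm numerically (note that −1.386 ≈ 2 ln(1/2)); a quick sketch (function name ours):

```python
import math

def cll_example(t1, t2, t3):
    # Conditional log-likelihood of Example 1 (interaction eps_1):
    # sample (1,1,1) with e=1 and sample (1,0,1) with e=0.
    s1 = (t1 * (1 - t2) * (1 - t3) + (1 - t1) * t2 * (1 - t3)
          + (1 - t1) * (1 - t2) * t3)
    s2 = 1 - t1 * (1 - t3) - (1 - t1) * t3
    return math.log(s1) + math.log(s2)
```

At (0, 1, 0) the value is 0, while any point of the form (θ1, 0, 1/2) or (1/2, 0, θ3) gives 2 ln(1/2) ≈ −1.386.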

The discussed counterexample proves that in the general case the EM algorithm for symmetric causal independence models does not necessarily converge to the global maximum.

5 Experimental Results

For our experiments we use the Reuters data collection, which allows us to evaluate the classification performance of large symmetric causal independence models, where the number of cause variables for some document classes is in the hundreds.

5.1 Evaluation Scheme

Since we do not have an efficient algorithm to perform a search in the space of symmetric Boolean functions, we chose to model the interaction among the cause and effect variables by means of Boolean threshold functions, which seem to be the most probable interaction functions for the given domains.

Given the model parameters θ, the testing data Dtest and the classification threshold 1/2, the classifications and misclassifications for both classes are computed. Let tp (true positives) stand for the number of data samples (c^j, e^{j+}) ∈ Dtest for which Pr(e+ | c^j, θ) ≥ 1/2, and fn (false negatives) for the number of data samples (c^j, e^{j+}) ∈ Dtest for which Pr(e+ | c^j, θ) < 1/2. Likewise, tn (true negatives) is the number of data samples (c^j, e^{j−}) ∈ Dtest for which Pr(e+ | c^j, θ) < 1/2, and fp (false positives) is the number of data samples (c^j, e^{j−}) ∈ Dtest for which Pr(e+ | c^j, θ) ≥ 1/2. To evaluate the classification performance we use accuracy, which is the proportion of correctly classified cases,

\[
\eta = \frac{tp + tn}{tp + tn + fn + fp},
\]

and the F-measure, which combines precision π = tp/(tp + fp) and recall ρ = tp/(tp + fn):

\[
F = \frac{2\pi\rho}{\pi + \rho}.
\]
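For concreteness, both measures can be computed from the four counts as follows (function name ours):

```python
def accuracy_and_f(tp, tn, fp, fn):
    # Accuracy: proportion of correctly classified samples.
    eta = (tp + tn) / (tp + tn + fn + fp)
    # F-measure: harmonic mean of precision and recall.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f = 2 * precision * recall / (precision + recall)
    return eta, f
```

For example, with tp = 80, tn = 880, fp = 20 and fn = 20, the accuracy is 0.96 and the F-measure is 0.8.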

5.2 Reuters Data Set

We used the Reuters-21578 text categorization collection containing the Reuters news stories preprocessed by Karciauskas (2002). The training set contained 7769 documents and the testing set contained 3018 documents. For each of the ten document classes the most informative features were selected using the expected information gain as a feature selection criterion, and each document class was classified separately against all other classes. We chose to use the same threshold for the expected information gain as in (Vomlel, 2006); the number of selected features varied from 23 for the corn document class to 307 for the earn document class. While learning the values of the hidden parameters, the EM algorithm was stopped after 50 iterations. The accuracy and F-measure for causal independence models with the threshold interaction function τk, k = 1, . . . , 4 are given in tables 1 and 2. Even though the threshold to select the relevant features was tuned for the noisy OR classifier, for 5 document classes the causal independence models with an interaction function other than logical OR provided better results.

The accuracy and F-measure of the noisy OR model and a few other classifiers on the Reuters data collection reported in (Vomlel, 2006) show the competitive performance of the noisy OR model.

Table 1: Classification accuracy for symmetric causal independence models with the interaction function τk, k = 1, . . . , 4 for the Reuters data set; NClass is the number of documents in the corresponding class.

Class      NClass    τ1     τ2     τ3     τ4
earn       1087      96.3   97.2   97.2   96.8
acq         719      93.1   93.2   93.2   93.0
crude       189      98.1   98.1   97.6   97.7
money-fx    179      95.8   95.8   95.9   96.0
grain       149      99.2   99.0   98.2   97.9
interest    131      96.5   96.8   96.7   96.7
trade       117      96.6   97.0   97.3   97.3
ship         89      98.9   98.8   98.7   98.6
wheat        71      99.5   99.2   98.8   98.5
corn         56      99.7   99.4   99.1   98.8

6 Discussion

In this paper, we discussed the application of symmetric causal independence models to classification. We developed an EM algorithm to learn the parameters of symmetric causal independence models and studied its convergence. The reported experimental results indicate that it is unnecessary to restrict causal independence models to only two interaction functions, logical OR and logical AND. The competitive classification performance of symmetric causal independence models presents them as a potentially useful addition to the set of classifiers.

The current study has only examined the problem of learning the conditional probabilities of the hidden variables. The problem of learning an optimal interaction function has not been addressed. Efficient search in the space of symmetric Boolean functions is a possible direction for future research.

Acknowledgments

This research, carried out in the TimeBayes project, was supported by the Netherlands Organization for Scientific Research (NWO) under project number FN4556. The authors are grateful to Gytis Karciauskas for the preprocessed Reuters data. We would also like to thank Jiří Vomlel for sharing his code and insights.

Table 2: Classification F-measure for symmetric causal independence models with the interaction function τk, k = 1, . . . , 4 for the Reuters data set; NClass is the number of documents in the corresponding class.

Class      NClass    τ1     τ2     τ3     τ4
earn       1087      95.0   96.1   96.1   95.6
acq         719      85.3   84.3   84.5   83.8
crude       189      84.5   85.7   80.7   81.0
money-fx    179      60.9   62.1   62.6   62.7
grain       149      92.7   89.9   80.7   77.2
interest    131      40.2   55.0   53.3   54.0
trade       117      51.0   61.2   63.7   63.7
ship         89      79.5   77.7   74.5   71.5
wheat        71      90.3   81.8   71.4   66.2
corn         56      91.8   83.6   72.5   61.5

References

S. Boyd and L. Vandenberghe. 2004. Convex Optimization. Cambridge University Press.

J. Darroch. 1964. On the distribution of the number of successes in independent trials. The Annals of Mathematical Statistics, 35:1317–1321.

A.P. Dempster, N.M. Laird and D.B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39:1–38.

F.J. Díez. 1993. Parameter adjustment in Bayes networks. The generalized noisy OR-gate. In Proceedings of the Ninth Conference on Uncertainty in Artificial Intelligence, pages 99–105.

H.B. Enderton. 1972. A Mathematical Introduction to Logic. Academic Press, San Diego.

D. Heckerman and J.S. Breese. 1994. A new look at causal independence. In Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, pages 286–292.

M. Henrion. 1989. Some practical issues in constructing belief networks. Uncertainty in Artificial Intelligence, 3:161–173.

R. Jurgelenaite, T. Heskes and P.J.F. Lucas. 2006. Noisy threshold functions for modelling causal independence in Bayesian networks. Technical report ICIS–R06014, Radboud University Nijmegen.

H.J. Kappen and J.P. Neijt. 2002. Promedas, a probabilistic decision support system for medical diagnosis. Technical report, SNN – UMCU.

G. Karciauskas. 2002. Text Categorization using Hierarchical Bayesian Network Classifiers. Master's thesis, Aalborg University.

L. Le Cam. 1960. An approximation theorem for the Poisson binomial distribution. Pacific Journal of Mathematics, 10:1181–1197.

P.J.F. Lucas. 2005. Bayesian network modelling through qualitative patterns. Artificial Intelligence, 163:233–263.

C. Meek and D. Heckerman. 1997. Structure and parameter learning for causal independence and causal interaction models. In Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence, pages 366–375.

J. Pearl. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers.

M.A. Shwe, B. Middleton, D.E. Heckerman, M. Henrion, E.J. Horvitz, H.P. Lehmann and G.F. Cooper. 1991. Probabilistic diagnosis using a reformulation of the INTERNIST-1/QMR knowledge base, I – the probabilistic model and inference algorithms. Methods of Information in Medicine, 30:241–255.

S. Srinivas. 1993. A generalization of the noisy-or model. In Proceedings of the Ninth Conference on Uncertainty in Artificial Intelligence, pages 208–215.

S. Visscher, P.J.F. Lucas, M. Bonten and K. Schurink. 2005. Improving the therapeutic performance of a medical Bayesian network using noisy threshold models. In Proceedings of the Sixth International Symposium on Biological and Medical Data Analysis, pages 161–172.

J. Vomlel. 2006. Noisy-or classifier. International Journal of Intelligent Systems, 21:381–398.

I. Wegener. 1987. The Complexity of Boolean Functions. John Wiley & Sons, New York.

N.L. Zhang and D. Poole. 1996. Exploiting causal independence in Bayesian network inference. Journal of Artificial Intelligence Research, 5:301–328.


Complexity Results for Enhanced Qualitative Probabilistic Networks

Johan Kwisthout and Gerard Tel
Department of Information and Computer Sciences
University of Utrecht
Utrecht, The Netherlands

Abstract

While qualitative probabilistic networks (QPNs) allow the expert to state influences between nodes in the network as influence signs, rather than conditional probabilities, inference in these networks often leads to ambiguous results due to unresolved trade-offs in the network. Various enhancements have been proposed that incorporate a notion of strength of the influence, such as enhanced and rich enhanced operators. Although inference in standard (i.e., not enhanced) QPNs can be done in time polynomial in the length of the input, the computational complexity of inference in such enhanced networks has not yet been determined. In this paper, we introduce relaxation schemes to relate these enhancements to the more general case where continuous influence intervals are used. We show that inference in networks with continuous influence intervals is NP-hard, and remains NP-hard when the intervals are discretised and the interval [−1, 1] is divided into blocks of length 1/4. We discuss membership of NP, and show how these general complexity results may be used to determine the complexity of specific enhancements to QPNs. Furthermore, this might give more insight into the particular properties of feasible and infeasible approaches to enhancing QPNs.

1 Introduction

While probabilistic networks (Pearl, 1988) are based on an intuitive notion of causality and uncertainty of knowledge, eliciting the required probabilistic information from the experts can be a difficult task. Qualitative probabilistic networks (Wellman, 1990), or QPNs, have been proposed as a qualitative abstraction of probabilistic networks to overcome this problem. These QPNs summarise the conditional probabilities between the variables in the network by a sign, which denotes the direction of the effect. In contrast to quantitative networks, where inference has been shown to be NP-hard (Cooper, 1990), these networks have efficient (i.e., polynomial-time) inference algorithms. QPNs are often used as an intermediate step in the construction of a probabilistic network (Renooij and van der Gaag, 2002), as a tool for verifying properties of such networks

(van der Gaag et al., 2006), or in applications where the exact probability distribution is unknown or irrelevant (Wellman, 1990).

Nevertheless, this qualitative abstraction leads to ambiguity when influences with contrasting signs are combined. Enhanced QPNs have been proposed (Renooij and van der Gaag, 1999) in order to allow for more flexibility in determining the influences (e.g., weakly or strongly positive) and to partially resolve conflicts when combining influences. Also, mixed networks (Renooij and van der Gaag, 2002) have been proposed, to facilitate stepwise quantification and to allow both qualitative and quantitative influences to be modelled in the network.

Although inference in quantitative networks is NP-hard, and polynomial-time algorithms are known for inference in standard qualitative networks, the computational complexity of inference in enhanced networks has not yet been determined. In this paper we recall the definition of QPNs in section 2, and we introduce a framework to relate various enhancements, such as enhanced, rich enhanced, and interval-based operators, in section 3. In section 4 we show that inference in the general, interval-based case is NP-hard. In section 5 we show that it remains NP-hard if we use discrete, rather than continuous, intervals. Furthermore, we argue that, although hardness proofs might be nontrivial to obtain, it is unlikely that there exist polynomial algorithms for less general variants of enhanced networks, such as the enhanced and rich enhanced operators suggested by Renooij and van der Gaag (1999). Finally, we conclude our paper in section 6.

2 Qualitative Probabilistic Networks

In qualitative probabilistic networks, a directed acyclic graph G = (V, A) is associated with a set ∆ of qualitative influences and synergies (Wellman, 1990), where the influence of one node on another is summarised by a sign¹. For example, a positive influence of a node A on its successor B, denoted by S⁺(A, B), expresses that higher values for A make higher values for B more likely than lower values, regardless of the influences of other nodes on B. In binary cases, with a > ā and b > b̄, this can be summarised as Pr(b | ax) − Pr(b | āx) ≥ 0 for any value x of the other predecessors of B. Negative influences, denoted by S⁻, and zero influences, denoted by S⁰, are defined analogously. If an influence is not positive, negative, or zero, it is ambiguous, denoted by S?. Influences can be direct (causal influence) or induced (inter-causal influence or product synergy). In the latter case, the value of one node influences the probabilities of values of another node, given a third node (Druzdzel and Henrion, 1993b).

Various properties hold for these qualitative influences, namely symmetry, transitivity, composition, associativity and distribution (Wellman, 1990; Renooij and van der Gaag, 1999). If we define Sδ(A, B, ti) as the influence Sδ, with δ ∈ {+, −, 0, ?}, of a node A on a node B along trail ti, we can formalise these properties as shown in table 1. The ⊗- and ⊕-operators that follow from the transitivity and composition properties are defined in table 2.

¹Note that the network Q = (G, ∆) represents infinitely many quantitative probabilistic networks that respect the restrictions on the conditional probabilities denoted by the signs of all arcs.

symmetry       Sδ(A, B, ti) ∈ ∆ ⇔ Sδ(B, A, ti⁻¹) ∈ ∆
transitivity   Sδ(A, B, ti) ∧ Sδ′(B, C, tj) ⇒ Sδ⊗δ′(A, C, ti tj)
composition    Sδ(A, B, ti) ∧ Sδ′(A, B, tj) ⇒ Sδ⊕δ′(A, B, ti tj)
associativity  S(δ⊕δ′)⊕δ″ = Sδ⊕(δ′⊕δ″)
distribution   S(δ⊕δ′)⊗δ″ = S(δ⊗δ″)⊕(δ′⊗δ″)

Table 1: Properties of qualitative influences

⊗ | +  −  0  ?        ⊕ | +  −  0  ?
+ | +  −  0  ?        + | +  ?  +  ?
− | −  +  0  ?        − | ?  −  −  ?
0 | 0  0  0  0        0 | +  −  0  ?
? | ?  ?  0  ?        ? | ?  ?  ?  ?

Table 2: The ⊗- and ⊕-operator for combining signs
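Table 2 can be transcribed directly into lookup tables, which also lets one machine-check the associativity and distribution properties of table 1; a small Python sketch (the dictionaries mirror the tables above; function names are ours):

```python
SIGNS = '+-0?'

OTIMES = {  # transitivity (chaining two influences), table 2 left
    ('+', '+'): '+', ('+', '-'): '-', ('+', '0'): '0', ('+', '?'): '?',
    ('-', '+'): '-', ('-', '-'): '+', ('-', '0'): '0', ('-', '?'): '?',
    ('0', '+'): '0', ('0', '-'): '0', ('0', '0'): '0', ('0', '?'): '0',
    ('?', '+'): '?', ('?', '-'): '?', ('?', '0'): '0', ('?', '?'): '?',
}

OPLUS = {  # composition (combining parallel trails), table 2 right
    ('+', '+'): '+', ('+', '-'): '?', ('+', '0'): '+', ('+', '?'): '?',
    ('-', '+'): '?', ('-', '-'): '-', ('-', '0'): '-', ('-', '?'): '?',
    ('0', '+'): '+', ('0', '-'): '-', ('0', '0'): '0', ('0', '?'): '?',
    ('?', '+'): '?', ('?', '-'): '?', ('?', '0'): '?', ('?', '?'): '?',
}

def properties_hold():
    # Check associativity of composition and distribution of
    # transitivity over composition, for all sign triples.
    for a in SIGNS:
        for b in SIGNS:
            for c in SIGNS:
                if OPLUS[OPLUS[a, b], c] != OPLUS[a, OPLUS[b, c]]:
                    return False
                if OTIMES[OPLUS[a, b], c] != OPLUS[OTIMES[a, c], OTIMES[b, c]]:
                    return False
    return True
```

For instance, chaining two negative influences gives a positive one, while composing a positive and a negative trail is ambiguous.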

Using these properties, an efficient (polynomial-time) inference algorithm can be constructed (Druzdzel and Henrion, 1993a) that propagates observed node values to neighbouring nodes. The basic idea of the algorithm, given in pseudo-code in figure 1, is as follows. When entering the procedure, a node I is instantiated with a '+' or a '−' (i.e., trail = ∅, from = to = I and msign = '+' or '−'). Then, this node sign is propagated through the network, following active trails and updating nodes when needed. Observe from table 2 that a node sign can change at most twice: from '0' to '+', '−', or '?', and then only to '?'. The algorithm therefore visits each node at most twice, and halts after a polynomial amount of time.


procedure PropagateSign(trail, from, to, msign):
   sign[to] ← sign[to] ⊕ msign;
   trail ← trail ∪ {to};
   for each active neighbour Vi of to
   do lsign ← sign of influence between to and Vi;
      msign ← sign[to] ⊗ lsign;
      if Vi ∉ trail and sign[Vi] ≠ sign[Vi] ⊕ msign
      then PropagateSign(trail, to, Vi, msign).

Figure 1: The sign-propagation algorithm
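A direct Python transcription of the procedure in figure 1 is sketched below; the adjacency-list encoding of the network and all names are our own assumptions:

```python
def table(rows):
    # Build an operator dict from the rows of table 2.
    return {(r, c): v for r, line in zip('+-0?', rows)
            for c, v in zip('+-0?', line)}

OTIMES = table(['+-0?', '-+0?', '0000', '??0?'])
OPLUS = table(['+?+?', '?--?', '+-0?', '????'])

def propagate_sign(net, signs, trail, to, msign):
    # net: {node: [(neighbour, influence sign), ...]} along active trails;
    # signs: current node signs, updated in place (initially all '0').
    signs[to] = OPLUS[signs[to], msign]
    trail = trail | {to}
    for v, lsign in net.get(to, []):
        out = OTIMES[signs[to], lsign]
        if v not in trail and signs[v] != OPLUS[signs[v], out]:
            propagate_sign(net, signs, trail, v, out)
```

On a chain A → B → C with a positive and a negative arc, instantiating A with '+' marks B positive and C negative.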

3 Enhanced QPNs

These qualitative influences and synergies can of course be extended to preserve a larger amount of information in the abstraction (Renooij and van der Gaag, 1999). For example, given a certain cut-off value α, an influence can be strongly positive (Pr(b | ax) − Pr(b | āx) ≥ α) or weakly negative (−α ≤ Pr(b | ax) − Pr(b | āx) ≤ 0). The basic '+' and '−' signs are enhanced with signs for strong influences ('++' and '−−') and augmented with multiplication indices to handle complex dependencies on α as a result of transitive and compositional combinations. In addition, signs such as '+?' and '−?' are used to denote positive or negative influences of unknown strength. Using this notion of strength, trade-offs in the network can be modelled by compositions of weak and strong opposite signs.

Furthermore, an interval network can be constructed (Renooij and van der Gaag, 2002), where each arc has an associated influence interval rather than a sign. Such an influence is denoted as F[p,q](A, B), meaning that Pr(b | ax) − Pr(b | āx) ∈ [p, q]. Note that, given this definition, S⁺(A, B) ⇐⇒ F[0,1](A, B), and similar observations hold for S⁻, S⁰ and S?. We will denote the intervals [−1, 0], [0, 1], [0, 0] and [−1, 1] as unit intervals, being special cases that correspond to the traditional qualitative networks. The ⊗- and ⊕-operators, denoting transitivity and composition in interval networks, are defined in table 3. Note that the result of combining two trails may be an empty set, for example when combining [1/2, 1] with [3/4, 1], which would denote that the total influence of a node on another node, along multiple trails, is greater than one, which is impossible. Since the individual intervals might be estimated by experts, this situation is not unthinkable, especially in large networks. This property can be used to detect design errors in the network.

Note that the symmetry, associativity, and distribution properties of qualitative networks no longer apply in these enhancements. For example, although a positive influence from a node A to B along the direction of the arc also implies a positive influence in the opposite direction, the strength of this influence is unknown. Also, the outcome of the combination of a strongly positive, weakly positive and weakly negative sign depends on the evaluation order.

[p, q] ⊗i [r, s] = [min X, max X],  where X = {p·r, p·s, q·r, q·s}

[p, q] ⊕i [r, s] = [p + r, q + s] ∩ [−1, 1]

Table 3: The ⊗i- and ⊕i-operators for interval multiplication and addition
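The two interval operators of table 3 can be sketched as follows; representing the empty intersection by None is our own convention:

```python
def otimes(x, y):
    # Interval transitivity: all products of the endpoints.
    (p, q), (r, s) = x, y
    ends = [p * r, p * s, q * r, q * s]
    return (min(ends), max(ends))

def oplus(x, y):
    # Interval composition: endpoint sums, intersected with [-1, 1];
    # an empty intersection signals inconsistent influence estimates.
    (p, q), (r, s) = x, y
    lo, hi = max(p + r, -1.0), min(q + s, 1.0)
    return None if lo > hi else (lo, hi)
```

Combining [1/2, 1] with [3/4, 1], as in the text, indeed yields the empty set, which flags a design error.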

3.1 Relaxation schemes

If we take a closer look at the ⊕e, ⊕r, and ⊗e operators defined in (Renooij and van der Gaag, 1999) and compare them with the interval operators ⊕i and ⊗i, we can see that the interval results are sometimes 'relaxed'. Symbols representing influences correspond to intervals, but after the application of any operation on these intervals, the result is extended to an interval that can be represented by one of the available symbols. For example, in the interval model we have [α, 1] ⊕i [−1, 1] = [α − 1, 1]; but, while [α, 1] corresponds to ++ in the enhanced model and [−1, 1] corresponds to ?, ++ ⊕e ? = ? ≡ [−1, 1]. The lower limit α − 1 is relaxed to −1, because the actually resulting interval [α − 1, 1] does not correspond to any symbol. Therefore, to connect the (enhanced) qualitative and interval models, we will introduce relaxation schemes that map the result of each operation to the minimal interval that can be represented by one of the available symbols:

Definition 1. (Relaxation scheme) Rx is called a relaxation scheme, denoted Rx([a, b]) = [c, d], if Rx maps the outcome [a, b] of an ⊕- or ⊗-operation to an interval [c, d] with [a, b] ⊆ [c, d].

In standard QPNs, the relaxation scheme (which we will denote RI, or the unit scheme) is defined as in figure 2.

RI(a, b) =
   [0, 1]    if a ≥ 0 ∧ b > 0
   [−1, 0]   if a < 0 ∧ b ≤ 0
   [0, 0]    if a = b = 0
   [−1, 1]   otherwise

Figure 2: Relaxation scheme RI(a, b)
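The unit scheme maps any operation result back onto one of the four unit intervals, i.e. onto a traditional sign; a minimal sketch (function name ours):

```python
def relax_unit(interval):
    # R_I: map an interval back to one of the four unit intervals,
    # which correspond to the signs '+', '-', '0' and '?'.
    a, b = interval
    if a >= 0 and b > 0:
        return (0.0, 1.0)    # '+'
    if a < 0 and b <= 0:
        return (-1.0, 0.0)   # '-'
    if a == b == 0:
        return (0.0, 0.0)    # '0'
    return (-1.0, 1.0)       # '?'
```

For example, the exact composition of a positive and a negative influence straddles zero and is relaxed to [−1, 1], i.e. the ambiguous sign.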

Similarly, the ⊕e, ⊕r, and ⊗e operators can be denoted by the relaxation schemes in figure 3, in which m equals min(i, j) and α is an arbitrary cut-off value. To improve readability, in the remainder of this paper the ⊕- and ⊗-operators, when used without index, denote operators on intervals as defined in table 3.

R⊗e(a, b) =
   [−1, 1]   if a < 0 ∧ b > 0
   [a, b]    otherwise

R⊕e(a, b) =
   [α^m, 1]      if a = α^i + α^j ≤ b
   [−1, −α^m]    if b = −(α^i + α^j) ≥ a
   [0, 1]        if a ≤ b = α^i + α^j
   [−1, 0]       if a = −(α^i + α^j) ≤ b
   [0, 1]        if a = α^i − α^j and b ≥ 0 and i < j
   [−1, 0]       if a = −(α^i − α^j) and b ≤ 0 and i < j
   [−1, 1]       if a ≤ 0 and b ≥ 0
   [a, b]        otherwise

R⊕r(a, b) =
   [−1, 1]   if a < 0 ∧ b > 0
   [a, b]    otherwise

Figure 3: Relaxation schemes R⊗e, R⊕e, and R⊕r

This notion of a relaxation scheme allows us to relate the various operators in a uniform way. A common property of most of these schemes is that the ⊕-operator is no longer associative. The result of inference then depends on the order in which the various influences are propagated through the network.

3.2 Problem definition

To decide on the complexity of inference in this general, interval-based enhancement of QPNs, a decision problem needs to be determined. We state this problem, denoted iPieqnetd², as follows.

iPieqnetd
Instance: A qualitative probabilistic network Q = (G, ∆) with an instantiation for A ∈ V(G) and a node B ∈ V \ {A}.
Question: Is there an ordering on the combination of influences such that the influence of A on B ⊂ [−1, 1]?

3.3 Probability representation

In this paper, we assume that the probabilities in the network are represented by fractions, denoted by integer pairs, rather than by reals. This has the advantage that the length of the result of addition and multiplication of fractions is polynomial in the length of the original numbers. We can efficiently code the fractions in the network by rewriting them using their least common denominator. Adding or multiplying these fractions will not affect their denominators, whose length will not change during the inference process.

4 Complexity of the problems

We will prove the hardness of the inference problem iPieqnetd by a transformation from 3sat. We construct a network Q using the clauses C and boolean variables U, and prove that, upon instantiation of a node I to [1, 1], there is an ordering on the combination of influences such that the influence of I on a given node Y ∈ Q \ {I} is a true subset of [−1, 1], if and only if the corresponding 3sat instance is satisfiable. In the network, the influence of a node A on a node B along the arc (A, B) is given as an interval; when the interval equals [1, 1] (i.e., Pr(b | a) = 1 and Pr(b | ā) = 0), the interval is omitted for readability. Note that the influence of B on A, against the direction of the arc (A, B), equals the unit interval of the influence associated with (A, B).

²An acronym for Interval-based Probabilistic Inference in Enhanced Qualitative Networks.

As a running example, we will construct a network for the following 3sat instance, introduced in (Cooper, 1990):

Example 1. (3satex)
U = {u1, u2, u3, u4}, and C = {(u1 ∨ u2 ∨ u3), (¬u1 ∨ ¬u2 ∨ u3), (u2 ∨ ¬u3 ∨ u4)}

This instance is satisfiable, for example with the truth assignment u1 = T, u2 = F, u3 = F, and u4 = T.
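The instance and the quoted assignment are easy to check mechanically; a small sketch (the integer encoding of literals is ours):

```python
# Clauses as literal tuples: positive k means u_k, negative k means ¬u_k.
CLAUSES = [(1, 2, 3), (-1, -2, 3), (2, -3, 4)]

def satisfies(assignment, clauses):
    # assignment: {variable index: bool}; a clause is satisfied when
    # at least one of its literals evaluates to true.
    return all(
        any(assignment[abs(l)] == (l > 0) for l in clause)
        for clause in clauses
    )
```

The assignment u1 = T, u2 = F, u3 = F, u4 = T satisfies all three clauses, while, for instance, setting u1 = u2 = T and u3 = u4 = F falsifies the second clause.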

4.1 Construct for our proofs

For each variable in the 3sat instance, our net-work contains a ”variable gadget” as shownin figure 4. After the instantiation of nodeI with [1, 1], the influence at node D equals[12 , 1]⊕[−1

2 , 12 ]⊕[−1,−1

2 ], which is either [−1, 12 ],

[−12 , 1] or [−1, 1], depending on the order of

evaluation. We will use the non-associativityof the ⊕-operator in this network as a non-deterministic choice of assignment of truth val-ues to variables. As we will see later, an eval-uation order that leads to [−1, 1] can be safelydismissed (it will act as a ’falsum’ in the clauses,making both x and ¬x false), so we will concen-trate on [−1

2 , 1] (which will be our T assign-ment) and [−1, 1

2 ] (F assignment) as the twopossible choices.

We construct literals ui from our 3sat in-stance, each with a variable gadget Vg asinput. Therefore, each variable can have avalue of [−1, 1

2 ] or [−12 , 1] as influence, non-

deterministicly. Furthermore, we add clause-networks Clj for each clause in the instanceand connect each variable ui to a clause-networkClj if the variable occurs in Cj . The influenceassociated with this arc (ui, Clj) is defined asF [p,q](ui, Clj), where [p, q] equals [−1, 0] if ¬ui

is in Cj , and [0, 1] if ui is in Cj (figure 5). Note

I

A

[ 12, 1] [−1,− 1

2]

B

D

C[− 12, 1

2]

Vg

Figure 4: ”Variable gadget” Vg

that an ⊗-operation with [−1, 0] will transforma value of [−1, 1

2 ] in [−12 , 1] and vice versa, and

[0, 1] will not change them. We can thereforeregard an influence F [−1,0] as a negation of thetruth assignment for that influence. Note, that[−1, 1] will stay the same in both cases.

If we zoom in on the clause-network in figure 6, we see that the three 'incoming' variables in a clause, which have a value of either [−1, 1/2], [−1/2, 1], or [−1, 1], are multiplied with the arc influence Fi,j to form intermediate variables, and then combined with the instantiation node (with a value of [1, 1]), forming nodes wi. Note that [−1, 1/2] ⊕ [1, 1] = [−1, 1] ⊕ [1, 1] = [0, 1] and that [−1/2, 1] ⊕ [1, 1] = [1/2, 1]. Since a [−1, 1] result in the variable gadget does not change when multiplied with the Fi,j influence, and leads to the same value as an F variable, such an assignment will never satisfy the 3sat instance.

The influences associated with these nodes wi are multiplied by [1/2, 1] and added together in the clause result Oj. At this point, Oj has a value of [k/4, 1], where k equals the number of literals which are true in this clause. The consecutive adding to [−1/4, 1], multiplication with [0, 1] and adding to [1, 1] has the function of a logical or-operator, giving a value for Cj of [3/4, 1] if no literal in the clause was true, and [1, 1] if one or more were true.
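The whole clause pipeline (the [1/2, 1] weights, the [−1/4, 1] shift, the [0, 1] multiplication and the final [1, 1] addition) can be checked mechanically. The operator definitions are again our reconstruction of ⊕ and ⊗, and clause_value is a hypothetical helper, not code from the paper:

```python
# Sketch of the clause construction of figure 6, with the assumed operator
# definitions (oplus = truncated interval addition, otimes = interval
# multiplication); these are reconstructions, not the paper's verbatim code.

def oplus(i, j):
    return (max(-1.0, i[0] + j[0]), min(1.0, i[1] + j[1]))

def otimes(i, j):
    p = [a * b for a in i for b in j]
    return (min(p), max(p))

def clause_value(w):
    """Combine the three nodes w_i of a clause into the clause value C_j."""
    # Each w_i is weighted by [1/2, 1] and the results are added into O_j.
    o_j = (0.0, 0.0)
    for w_i in w:
        o_j = oplus(o_j, otimes(w_i, (0.5, 1.0)))
    # Adding [-1/4, 1], multiplying by [0, 1] and adding [1, 1] acts as an OR.
    o_prime = otimes(oplus(o_j, (-0.25, 1.0)), (0.0, 1.0))
    return oplus(o_prime, (1.0, 1.0))

T, F = (0.5, 1.0), (0.0, 1.0)   # w_i for a true / false literal
print(clause_value([F, F, F]))  # (0.75, 1.0): no literal true
print(clause_value([T, F, F]))  # (1.0, 1.0): clause satisfied
```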

We then combine the separate clauses Cj into a variable Y, by adding edges from each clause to Y using intermediate variables D1 to Dn−1.

Complexity Results for Enhanced Qualitative Probabilistic Networks 175


Figure 5: The literal-clause construction (node I feeds the variable gadgets Vg1–Vg4; the literals u1–u4 connect to the clause-networks Cl1, Cl2 and Cl3 via the arc influences F1,1, . . . , F4,3)

Figure 6: The clause construction (literals u1, u2, u3 enter Clj via F1,j, F2,j, F3,j and are combined with the instantiation node I into w1, w2, w3, which are joined, with weights [1/2, 1] and constants [−1/4, 1] and [0, 1], through Oj and O′j into Cj)

The use of these intermediate variables allows us to generalise these results to more restricted cases. The interval of these edges is [1/2, 1], leading to a value of [1, 1] in Y if and only if all clauses Cj have a value of [1, 1] (see figure 7). The influence interval in Y has a value between [3/4, 1] and [(2^(k+1) − 1)/2^(k+1), 1], where k is the number of clauses, if one or more clauses had a value of [0, 1]. Note that we can easily transform the interval in Y to a subset of [−1, 1] in Y′ by consecutively adding it to [−1, 1] and [−1 + 1/2^(k+1), 1]. This would result in a true subset of [−1, 1] if and only if Y was equal to [1, 1].

Figure 7: Connecting the clauses (clauses C1, . . . , Cn are chained into Y via intermediate variables D1, . . . , Dn−1, each edge carrying influence [1/2, 1])

4.2 NP-hardness proof

Using the construct presented in the previous section, the computational complexity of the iPieqnetd problem can be established as follows.

Theorem 1. The iPieqnetd problem is NP-hard.

Proof. To prove NP-hardness, we construct a transformation from the 3sat problem. Let (U,C) be an instance of this problem, and let Q(U,C) be the interval-based qualitative probabilistic network constructed from this instance, as described in the previous section. When the node I ∈ Q is instantiated with [1, 1], then I has an influence of [1, 1] on Y (and therefore an influence on Y′ which is a true subset of [−1, 1]) if and only if all nodes Cj have a value of [1, 1], i.e. there exists an ordering of the operators in the 'variable gadgets' such that at least one literal in each clause of C is true. We conclude that (U,C) has a solution with at least one true literal in each clause, if and only if the iPieqnetd problem has a solution for network Q(U,C), instantiation I = [1, 1] and node Y′. Since Q(U,C) can be computed from (U,C) in polynomial time, we have a polynomial-time transformation from 3sat to iPieqnetd, which proves NP-hardness of iPieqnetd.

176 J. Kwisthout and G. Tel

4.3 On the possible membership of NP

Although iPieqnetd has been shown to be NP-hard, membership of the class NP (and, as a consequence, NP-completeness) is not trivial to prove. To prove membership of NP, one has to prove that if the instance is solvable, then there exists a certificate, polynomial in length with respect to the input of the problem, that can be used to verify this claim. A trivial certificate could be a formula, using the ⊕- and ⊗-operators, influences, and parentheses, describing how the influence of a certain node can be calculated from the instantiated node and the characteristics of the network. Unfortunately, such a certificate can grow exponentially large.

In special cases, this certificate might also be described by an ordering of the nodes in the network and, for each node, an ordering of its inputs, which would be a polynomial certificate. For example, the variable gadget from figure 4 can be described with the ordering I, A, B, C, D plus an ordering on the incoming trails in D. All possible outcomes of the propagation algorithm can be calculated using this description. Note, however, that there are other propagation sequences possible that cannot be described in this way. For example, the algorithm might first explore the trail A → B → D → C, and after that the trail A → C → D → B. Then, the influence in D is dependent on the information in C, but C is visited after D following the first trail, and D is visited after C if the second trail is explored. Nevertheless, this cannot lead to other outcomes than [−1, 1/2], [−1/2, 1], or [−1, 1].

However, it is doubtful that such a description exists for all possible networks. For example, even if we can make a topological sort of a certain network, starting with a certain instantiation node X and ending with another node Y, it is still not the case that a propagation sequence that follows only the direction of the arcs until all incoming trails of Y have been calculated always yields better results than a propagation sequence that doesn't have this property. This makes it plausible that there exist networks where an optimal solution (i.e., a propagation sequence that leads to the smallest possible subset of [−1, 1] at the node we are interested in) cannot be described using such a polynomial certificate.

5 Operator variants

In order to be able to represent every possible 3sat instance, a relaxation scheme must be able to generate a variable gadget, and retain enough information to discriminate between the cases where zero, one, two or three literals in each clause are true. Furthermore, the relaxation scheme must be able to represent the instantiations [1, 1] (or ⊤) and [−1, −1] (or ⊥), and the uninstantiated case [0, 0]. With a relaxation scheme that effectively divides the interval [−1, 1] into discrete blocks whose size is a multiple of 1/4, such as R_1/4(a, b) = [⌊4a⌋/4, ⌈4b⌉/4], the proof construction is essentially the same as in the general case discussed in section 3. This relaxation scheme does not have any effect on the intervals we used in the variable gadget and the clause construction of Q(U,C): the network constructed in the NP-hardness proof of the general case used only intervals (a, b) for which R_1/4(a, b) = (a, b). Furthermore, when connecting the clauses, the possible influences in Y are relaxed to [0, 1], [1/4, 1], [1/2, 1], [3/4, 1], and [1, 1], so we can construct Y′ by consecutively adding the interval in Y to [−1, 1] and [−3/4, 1]. Thus, the problem, which we will denote as relaxed-Pieqnetd, remains NP-hard for relaxation scheme R_1/4.
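A sketch of such a scheme: relax_quarter (our name for R_1/4) rounds the interval bounds outward to multiples of 1/4, and leaves every interval used in the hardness construction unchanged:

```python
import math

# Sketch of the relaxation scheme R_1/4(a, b) = [floor(4a)/4, ceil(4b)/4],
# which rounds interval bounds outward to the nearest multiple of 1/4.
# The helper name relax_quarter is ours, not the paper's.

def relax_quarter(interval):
    a, b = interval
    return (math.floor(4 * a) / 4, math.ceil(4 * b) / 4)

# The intervals used in the variable gadget and the clause construction are
# fixed points of the scheme, so the hardness construction survives relaxation:
for iv in [(0.5, 1.0), (-0.5, 0.5), (-1.0, -0.5), (-0.25, 1.0), (0.75, 1.0)]:
    assert relax_quarter(iv) == iv

print(relax_quarter((0.3, 0.6)))  # (0.25, 0.75): a non-aligned interval is widened
```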

The non-associativity of the ⊕e- and ⊕r-operators defined in (Renooij and van der Gaag, 1999) suggests hardness of the inference problem as well. Although ⊕e and ⊕r are not associative, they cannot produce results that can be regarded as opposites. For example, the expression (++ ⊕e + ⊕e −) can lead to a positive influence of unknown strength ('+?') when evaluated as ((++ ⊕e +) ⊕e −) or an unknown influence ('?') when evaluated as (++ ⊕e (+ ⊕e −)), but never to a negative influence. A transformation from a 3sat variant might not succeed for this reason. However, it might be possible to construct a transformation from relaxed-Pieqnetd, which is the subject of ongoing research.

6 Conclusion

In this paper, we addressed the computational complexity of inference in enhanced Qualitative Probabilistic Networks. As a first step, we have "embedded" both standard and enhanced QPNs in the interval model using relaxation schemes, and we showed that inference in this general interval model is NP-hard and remains NP-hard for relaxation scheme R_1/4(a, b). We believe that the hardness of inference is due to the fact that reasoning in QPNs is underdefined: the outcome of the inference process depends on choices made during evaluation. Nevertheless, further research needs to be conducted in order to determine where exactly the NP/P border lies; in other words, which enhancements to the standard qualitative model allow for polynomial-time inference, and which enhancements lead to intractable results. Furthermore, a definition of transitive and compositional combinations of qualitative influences in which the outcome is independent of the order of the influence propagation might reduce the computational complexity of inference and facilitate the use of qualitative models to design, validate, analyse, and simulate probabilistic networks.

Acknowledgements

This research has been (partly) supported by the Netherlands Organisation for Scientific Research (NWO). The authors wish to thank Hans Bodlaender and Silja Renooij for their insightful comments on this subject.

References

G.F. Cooper. 1990. The Computational Complexity of Probabilistic Inference using Bayesian Belief Networks. Artificial Intelligence, 42(2):393–405.

M.J. Druzdzel and M. Henrion. 1993a. Belief Propagation in Qualitative Probabilistic Networks. In N. Piera Carrete and M.G. Singh, editors, Qualitative Reasoning and Decision Technologies, CIMNE: Barcelona.

M.J. Druzdzel and M. Henrion. 1993b. Intercausal reasoning with uninstantiated ancestor nodes. In Proceedings of the Ninth Conference on Uncertainty in Artificial Intelligence, pages 317–325.

J. Pearl. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, Palo Alto.

S. Renooij and L.C. van der Gaag. 1999. Enhancing QPNs for trade-off resolution. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, pages 559–566.

S. Renooij and L.C. van der Gaag. 2002. From Qualitative to Quantitative Probabilistic Networks. In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence, pages 422–429.

L.C. van der Gaag, S. Renooij, and P.L. Geenen. 2006. Lattices for verifying monotonicity of Bayesian networks. In Proceedings of the Third European Workshop on Probabilistic Graphical Models (to appear).

M.P. Wellman. 1990. Fundamental Concepts of Qualitative Probabilistic Networks. Artificial Intelligence, 44(3):257–303.



Decision analysis with influence diagrams using Elvira's explanation facilities

Manuel Luque and Francisco J. Díez
Dept. Inteligencia Artificial, UNED
Juan del Rosal, 16, 28040 Madrid, Spain

Abstract

Explanation of reasoning in expert systems is necessary for debugging the knowledge base, for facilitating their acceptance by human users, and for using them as tutoring systems. Influence diagrams have proved to be effective tools for building decision-support systems, but explanation of their reasoning is difficult, because inference in probabilistic graphical models seems to have little relation with human thinking. The current paper describes some explanation capabilities for influence diagrams and how they have been implemented in Elvira, a public software tool.

1 Introduction

Influence diagrams (IDs) are a probabilistic graphical model for modelling decision problems. They constitute a simple graphical formalism that makes it possible to represent the three main components of decision problems: the uncertainty, due to the existence of variables not controlled by the decision maker; the decisions or actions under their direct control; and their preferences about the possible outcomes.

In the context of expert systems, either probabilistic or heuristic, the development of explanation facilities is important for three main reasons (Lacave and Díez, 2002). First, because the construction of those systems with the help of human experts is a difficult and time-consuming task, prone to errors and omissions. An explanation tool can help the experts and the knowledge engineers to debug the system when it does not yield the expected results and even before a malfunction occurs. Second, because human beings are reluctant to accept the advice offered by a machine if they are not able to understand how the system arrived at those recommendations; this reluctance is especially clear in medicine (Wallis and Shortliffe, 1984). And third, because an expert system used as an intelligent tutor must be able to communicate to the apprentice the knowledge it contains, the way in which the knowledge has been applied for arriving at a conclusion, and what would have happened if the user had introduced different pieces of evidence (what-if reasoning).

These reasons are especially relevant in the case of probabilistic expert systems, because the elicitation of probabilities is a difficult task that usually requires debugging and refinement, and because the algorithms for the computation of probabilities and utilities are, at least apparently, very different from human reasoning.

Unfortunately, most expert systems and commercial tools available today, either heuristic or probabilistic, have no explanation capabilities. In this paper we describe some explanation methods developed as a response to the needs that we have detected when building and debugging medical expert systems (Díez et al., 1997; Luque et al., 2005) and when teaching probabilistic graphical models to pre- and postgraduate students of computer science and medicine (Díez, 2004). These new methods have been implemented in Elvira, a public software tool.

1.1 Elvira

Elvira¹ is a tool for building and evaluating graphical probabilistic models (Elvira Consortium, 2002) developed as a joint project of several Spanish universities. It contains a graphical interface for editing networks, with specific options for canonical models, exact and approximate algorithms for both discrete and continuous variables, explanation facilities, learning methods for building networks from databases, etc. Although some of the algorithms can work with both discrete and continuous variables, most of the explanation capabilities assume that all the variables are discrete.

¹At http://www.ia.uned.es/∼elvira it is possible to obtain the source code and several technical documents about Elvira.

2 Influence diagrams

2.1 Definition of an ID

An influence diagram (ID) consists of a directed acyclic graph that contains three kinds of nodes: chance nodes VC, decision nodes VD and utility nodes VU—see Figure 1. Chance nodes represent random variables not controlled by the decision maker. Decision nodes correspond to actions under the direct control of the decision maker. Utility nodes represent the decision maker's preferences. Utility nodes cannot be parents of chance or decision nodes. Given that each node represents a variable, we will use the terms variable and node interchangeably.

In the extended framework proposed by Tatman and Shachter (1990) there are two kinds of utility nodes: ordinary utility nodes, whose parents are decision and/or chance nodes, and super-value nodes, whose parents are utility nodes, and can be in turn of two types, sum and product. We assume that there is a utility node U0, which is either the only utility node or a descendant of all the other utility nodes, and therefore has no children.²

There are three kinds of arcs in an ID, depending on the type of node they go into. Arcs into chance nodes represent probabilistic dependency. Arcs into decision nodes represent availability of information, i.e., an arc X → D means that the state of X is known when making decision D. Arcs into utility nodes represent functional dependence: for ordinary utility nodes, they represent the domain of the associated utility function; for a super-value node they indicate that the associated utility is a function (sum or product) of the utility functions of its parents.

²An ID that does not fulfill this condition can be transformed by adding a super-value node U0 of type sum whose parents are the utility nodes that did not have descendants. The expected utility and the optimal strategy (both defined below) of the transformed diagram are the same as those of the original one.

Figure 1: ID with two decisions (squares), two chance nodes (ovals) and three utility nodes (diamonds). There is a directed path including all the decisions and node U0.

Standard IDs require that there is a directed path that includes all the decision nodes and indicates the order in which the decisions are made. This in turn induces a partition of VC such that, for an ID having n decisions D0, . . . , Dn−1, the partition contains n + 1 subsets C0, C1, ..., Cn, where Ci is the set of chance variables C such that there is a link C → Di and no link C → Dj with j < i; i.e., Ci represents the set of random variables known for Di and unknown for previous decisions. Cn is the set of variables having no link to any decision, i.e., the variables whose true value is never known directly.

Given a chance or decision variable V, two decisions Di and Dj such that i < j, and two links V → Di and V → Dj, the latter link is said to be a no-forgetting link. In the above example, T → D would be a no-forgetting link.

180 M. Luque and F. J. Díez

The variables known to the decision maker when deciding on Di are called informational predecessors of Di and denoted by IPred(Di). Assuming the no-forgetting hypothesis, we have that IPred(Di) = IPred(Di−1) ∪ {Di−1} ∪ Ci = C0 ∪ {D0} ∪ C1 ∪ . . . ∪ {Di−1} ∪ Ci.

The quantitative information that defines an ID is given by assigning to each random node C a probability distribution P(c|pa(C)) for each configuration of its parents pa(C), assigning to each ordinary utility node U a function ψU(pa(U)) that maps each configuration of its parents onto a real number, and assigning a utility-combination function to each super-value node. The domain of each function U is given by its functional predecessors, FPred(U). For an ordinary utility node, FPred(U) = Pa(U), and for a super-value node FPred(U) = ⋃_{U′∈Pa(U)} FPred(U′).

For instance, in the ID of Figure 1, we have that FPred(U1) = {X, D}, FPred(U2) = {T} and FPred(U0) = {X, D, T}.

In order to simplify our notation, we will sometimes assume without loss of generality that for any utility node U we have that FPred(U) = VC ∪ VD.

2.2 Policies and expected utilities

For each configuration vD of the decision variables VD we have a joint distribution over the set of random variables VC:

P(vC : vD) = ∏_{C∈VC} P(c|pa(C))    (1)

which represents the probability of configuration vC when the decision variables are externally set to the values given by vD (Cowell et al., 1999).

A stochastic policy for a decision D is a probability distribution defined over D and conditioned on the set of its informational predecessors, PD(d|IPred(D)). If PD is degenerate (consisting of ones and zeros only) then we say the policy is deterministic.

A strategy ∆ for an ID is a set of policies, one for each decision: {PD | D ∈ VD}. A strategy ∆ induces a joint distribution over VC ∪ VD defined by

P∆(vC, vD) = P(vC : vD) ∏_{D∈VD} PD(d|IPred(D)) = ∏_{C∈VC} P(c|pa(C)) ∏_{D∈VD} PD(d|pa(D))    (2)

Let I be an ID, ∆ a strategy for I and r a configuration defined over a set of variables R ⊆ VC ∪ VD such that P∆(r) ≠ 0. The probability distribution induced by strategy ∆ given the configuration r, defined over R′ = (VC ∪ VD) \ R, is given by:

P∆(r′|r) = P∆(r, r′) / P∆(r) .    (3)

Using this distribution we can compute the expected utility of U under strategy ∆ given the configuration r as:

EUU(∆, r) = ∑_{r′} P∆(r′|r) ψU(r, r′) .    (4)

For the terminal utility node U0, EUU0(∆, r) is said to be the expected utility of strategy ∆ given the configuration r, and denoted by EU(∆, r).

We define the expected utility of U under strategy ∆ as EUU(∆) = EUU(∆, ε), where ε is the empty configuration. We also define the expected utility of strategy ∆ as EU(∆) = EUU0(∆). We have that

EUU(∆) = ∑_r P∆(r) EUU(∆, r) .    (5)

An optimal strategy is a strategy ∆opt that maximizes the expected utility:

∆opt = arg max_{∆∈∆∗} EU(∆) ,    (6)

where ∆∗ is the set of all strategies for I. Each policy in an optimal strategy is said to be an optimal policy. The maximum expected utility (MEU) is

MEU = EU(∆opt) = max_{∆∈∆∗} EU(∆) .    (7)

Decision analysis with influence diagrams using Elvira's explanation facilities 181


The evaluation of an ID consists in finding the MEU and an optimal strategy. It can be proved (Cowell et al., 1999; Jensen, 2001) that

MEU = ∑_{c0} max_{d0} . . . ∑_{cn−1} max_{dn−1} ∑_{cn} P(vC : vD) ψ(vC, vD) .    (8)

For instance, the MEU for the ID in Figure 1, assuming that U0 is of type sum, is

MEU = max_t ∑_y max_d ∑_x P(x) · P(y|t, x) · (U1(x, d) + U2(t)) ,    (9)

where the term in parentheses equals U0(x, d, t).
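Equation 9 can be evaluated by plain enumeration over the scenarios. In the sketch below all probabilities and utilities are invented for illustration (they are not taken from the paper); with these hypothetical numbers, testing turns out to be worthwhile:

```python
# Numeric sketch of Equation 9. All probabilities and utilities below are
# invented for illustration; they are not taken from the paper.

P_x = {"+x": 0.07, "-x": 0.93}  # hypothetical prevalence of the disease X

# P(y | t, x); if the test is not done ("-t"), Y carries no information.
P_y = {
    ("+t", "+x"): {"+y": 0.91, "-y": 0.09},
    ("+t", "-x"): {"+y": 0.03, "-y": 0.97},
    ("-t", "+x"): {"+y": 0.50, "-y": 0.50},
    ("-t", "-x"): {"+y": 0.50, "-y": 0.50},
}
U1 = {("+x", "+d"): 78, ("+x", "-d"): 28,   # hypothetical quality of life
      ("-x", "+d"): 88, ("-x", "-d"): 98}
U2 = {"+t": -2, "-t": 0}                    # hypothetical cost of the test

def meu():
    """MEU = max_t sum_y max_d sum_x P(x) P(y|t,x) (U1(x,d) + U2(t))."""
    return max(
        sum(
            max(
                sum(P_x[x] * P_y[(t, x)][y] * (U1[(x, d)] + U2[t])
                    for x in P_x)
                for d in ["+d", "-d"])
            for y in ["+y", "-y"])
        for t in ["+t", "-t"])

print(round(meu(), 2))  # with these numbers, testing beats not testing
```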

2.3 Cooper policy networks

When a strategy ∆ = {PD | D ∈ VD} is defined for an ID, we can convert it into a Bayesian network, called a Cooper policy network (CPN), as follows: each decision D is replaced by a chance node with probability potential PD and parents IPred(D), and each utility node U is converted into a chance node whose parents are its functional predecessors (cf. Sec. 2.1); the values of the new chance variables are {+u, ¬u} and their probability is PCPN(+u|FPred(U)) = normU(U(FPred(U))), where normU is a bijective linear transformation that maps the utilities onto the interval [0, 1] (Cooper, 1988).

For instance, the CPN for the ID in Figure 1 is displayed in Figure 2. Please note the addition of the no-forgetting link T → D and that the parents of node U0 are no longer U1 and U2 but T, X, and D, which were chance or decision nodes in the ID.

The joint distribution of the CPN is:

PCPN(vC, vD, vU) = P∆(vC, vD) ∏_{U∈VU} PU(u|pa(U)) .    (10)

For a configuration r defined over a set of variables R ⊆ VC ∪ VD and U a utility node, it is possible to prove that

PCPN(r) = P∆(r)    (11)

Figure 2: Cooper policy network (CPN) for the ID in Figure 1 (nodes X, Y, D, T, U0, U1 and U2)

PCPN(+u|r) = normU(EUU(∆, r)) .    (12)

This equation allows us to compute EUU(∆, r) as normU⁻¹(PCPN(+u|r)); i.e., the expected utility for a node U in the ID can be computed from the marginal probability of the corresponding node in the CPN.
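Since normU is only specified as a bijective linear map onto [0, 1], a sketch with an assumed utility range illustrates how Equation 12 lets expected utilities be read off CPN marginals:

```python
# Sketch of Cooper's transformation. The utility range below is an
# illustrative assumption; the paper only requires norm_U to be a
# bijective linear map from the utilities onto [0, 1].

U_MIN, U_MAX = 0.0, 100.0   # assumed range of the utility function U

def norm_u(utility):
    """Linear map from [U_MIN, U_MAX] onto [0, 1]."""
    return (utility - U_MIN) / (U_MAX - U_MIN)

def norm_u_inv(p):
    """Inverse map: turn a CPN marginal P_CPN(+u | r) back into a utility."""
    return U_MIN + p * (U_MAX - U_MIN)

# Equation 12: P_CPN(+u | r) = norm_u(EU_U(strategy, r)), so the expected
# utility is recovered from the marginal by the inverse map.
eu = 81.05                   # e.g. the global utility reported later for e = +y
p = norm_u(eu)               # the marginal the corresponding CPN node would show
assert abs(norm_u_inv(p) - eu) < 1e-9
print(round(p, 4))  # 0.8105
```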

3 Explanation of influence diagrams in Elvira

3.1 Explanation of the model

The explanation of IDs in Elvira is based, to a great extent, on the methods developed for explanation of Bayesian networks (Lacave et al., 2000; Lacave et al., 2006b). One of the methods that have proven to be most useful is the automatic coloring of links. The definitions in (Lacave et al., 2006b) for the sign of influence and magnitude of influence, inspired on (Wellman, 1990), have been adapted to incoming arcs of ordinary utility nodes.

Specifically, the magnitude of the influence gives a relative measure of how a variable is influencing an ordinary utility node (see (Lacave et al., 2006a) for further details). Then, the influence of a link A → U pointing to a utility node is positive when higher values of A lead to higher utilities. Cooper's transformation normU guarantees that the magnitude of the influence is normalized.

For instance, in Figure 1 the link X → Y is colored in red because it represents a positive influence: the presence of the disease increases the probability of a positive result of the test. The link X → U1 is colored in blue because it represents a negative influence: the disease decreases the expected quality of life. The link D → U1 is colored in purple because the influence that it represents is undefined: the treatment is beneficial for patients suffering from X but detrimental for healthy patients.

Additionally, when a decision node has been assigned a policy, either by optimization or imposed by the user (see Sec. 3.3), the corresponding probability distribution PD can be used to color the links pointing to that node, as shown in Figure 1.

The coloring of links has been very useful in Elvira for debugging Bayesian networks (Lacave et al., 2006b) and IDs, in particular for detecting some cases in which the numerical probabilities did not correspond to the qualitative relations determined by human experts, and also for checking the effect that the informational predecessors of a decision node have on its policy.

3.2 Explanation of reasoning: cases of evidence

In Section 2.3 we have seen that, given a strategy, an ID can be converted into a CPN, which is a true Bayesian network. Consequently, all the explanation capabilities for BNs, such as Elvira's ability to manage evidence cases, are also available for IDs.

A finding states with certainty the value taken by a chance or decision variable. A set of findings is called evidence and corresponds to a certain configuration e of a set of observed variables E. An evidence case is determined by an evidence e and the posterior probabilities and the expected utilities induced by it.

A distinguishing feature of Elvira is its ability to manage several evidence cases simultaneously. A special evidence case is the prior case, which is the first case created and corresponds to the absence of evidence.

One of the evidence cases is marked as the current case. Its probabilities and utilities are displayed in the nodes. For the rest of the cases, the probabilities and utilities are displayed only by bars, in order to visually compare how they vary when new evidence is introduced.

The information displayed for nodes depends on the kind of node. Chance and decision nodes present bars and numbers corresponding to the posterior probabilities of their states, P∆(v|e), as given by Equation 3. This is the probability that a chance variable takes a certain value or the probability that the decision maker chooses a certain option (Nilsson and Jensen, 1998)—please note that in Equation 3 there is no distinction between chance and decision nodes. Utility nodes show the expected utilities, EUU(∆, e), given by Equation 4. The guide bar indicates the range of the utilities.

Figure 3: ID in Elvira with two evidence cases: (a) the prior case (no evidence); (b) a case with the evidence e = +y.

Figure 3 shows the result of evaluating the ID in Figure 1. The link T → D is drawn as a discontinuous arrow to indicate that it has been added by the evaluation of the diagram. In this example, as there was no policy imposed by the user (see below), Elvira computed the optimal strategy. In this figure two evidence cases are displayed. The first one is the prior case, i.e., the case in which there is no evidence. The second evidence case is given by e = +y; i.e., it displays the probabilities and utilities of the subpopulation of patients in which the test gives a positive result. Node Y is colored in gray to highlight the fact that there is evidence about it. The probability of +x, represented by a red bar, is 0.70; the green bar close to it represents the probability of +x for the prior case, i.e., the prevalence of the disease; the red bar is longer than the green one because P(+x|+y) > P(+x). The global utility for the second case is 81.05, which is smaller than the green bar close to it (the expected utility for the general population) because the presence of the symptom worsens the prognosis. The red bar for Treatment=yes is the probability that a patient having a positive result in the test receives the treatment; this probability is 100% because the optimal strategy determines that all symptomatic patients must be treated.

The possibility of introducing evidence in Elvira has been useful for building IDs in medicine: when we were interested in computing the posterior probability of diagnoses given several sets of findings, we needed to manually convert the ID into a Bayesian network by removing decision and utility nodes, and each time the ID was modified we had to convert it into a Bayesian network again to compute the probabilities. Now the probabilities can be computed directly on the ID.

3.2.1 Clarifying the concept of evidence in influence diagrams

In order to avoid confusion, we must mention that Ezawa's method (1998) for introducing evidence in IDs is very different from the way it is introduced in Elvira. For Ezawa, the introduction of evidence e leads to a different decision problem in which the values of the variables in E would be known with certainty before making any decision. For instance, introducing evidence +x in the ID in Figure 1 would imply that X would be known when making decisions T and D. Therefore, the expected utility of the new decision problem would be

max_t ∑_y max_d P(y|+x : t, d) · (U1(+x, d) + U2(t))    (the sum in parentheses being U0(+x, d, t)),

where P(y|+x : t, d) = P(+x, y : t, d)/P(+x) = P(y|+x : t). In spite of the apparent similarity of this expression with Equation 9, the optimal strategy changes significantly to "always treat, without testing", because if we know with certainty that the disease X is present then the result of the test is irrelevant. The MEU for this decision problem is U1(+x, +d).

In contrast, the introduction of evidence in Elvira (which may include "findings" for decision variables as well) does not lead to a new decision scenario nor to a different strategy, since the strategy is determined before introducing the "evidence". Put another way, in Elvira we adopt the point of view of an external observer of a system that includes the decision maker as one of its components. The probabilities and expected utilities given by Equations 2 and 4 are those corresponding to the subpopulation indicated by e when treated with strategy ∆. For instance, given the evidence +x, the probability P∆(+t|+x) shown by Elvira is the probability that a patient suffering from X receives the test, which is 100% (it was 0% in Ezawa's scenario), and P∆(+d|+x) is the probability that he receives the treatment; contrary to Ezawa's scenario, this probability may differ from 100% because of false negatives. The expected utility for a patient suffering from X is

EU(∆, +x) = ∑_{t,y,d} P∆(t, y, d|+x) · (U1(+x, d) + U2(t))    (the sum in parentheses being U0(+x, d, t)),

where P∆(t, y, d|+x) = P∆(t) · P(y|t, +x) · P∆(d|t, y).

Finally, we must underline that both approaches are not rivals. They correspond to different points of view when considering evidence in IDs and can complement each other in order to perform a better decision analysis and to explain the reasoning.³

3.3 What-if reasoning: analysis of non-optimal strategies

In Elvira it is possible to have a strategy in which some of the policies are imposed by the user and the others are computed by maximization. The way of imposing a policy consists in setting a probability distribution PD for the corresponding decision D by means of Elvira's GUI; the process is identical to editing the conditional probability table of a chance node. In fact, such a decision will be treated by Elvira as if it were a chance node, and the maximization is performed only on the rest of the decision nodes.

³In the future, we will implement in Elvira Ezawa's method and the possibility of computing the expected value of perfect information (EVPI).

This way, in addition to computing the optimal strategy (when the user has imposed no policy), as any other software tool for influence diagrams does, Elvira also makes it possible to analyze how the expected utilities and the rest of the policies would vary if the decision maker chose a non-optimal policy for some of the decisions (what-if reasoning).
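A minimal sketch of this what-if mechanism, with a single decision and hypothetical numbers: an imposed policy is just a probability table for the decision, and its expected utility can be compared against that of the optimal policy.

```python
# One decision D in {+d, -d}, one chance variable X in {+x, -x}.
# Hypothetical model, not the one from the paper.
P_x = {"+x": 0.2, "-x": 0.8}
U = {("+d", "+x"): 80, ("+d", "-x"): -10,   # treating helps if diseased, hurts if not
     ("-d", "+x"): -40, ("-d", "-x"): 0}

def expected_utility(policy):
    """policy: a distribution P(d); an imposed policy is just such a table."""
    return sum(policy[d] * P_x[x] * U[(d, x)]
               for d in ("+d", "-d") for x in ("+x", "-x"))

# Optimal policy: maximise over the deterministic policies.
optimal = max(({"+d": 1.0, "-d": 0.0}, {"+d": 0.0, "-d": 1.0}), key=expected_utility)
imposed = {"+d": 0.5, "-d": 0.5}            # a non-optimal, user-imposed policy
print(expected_utility(optimal), expected_utility(imposed))
```

Here the optimal policy (always treat) yields expected utility 8, while the imposed 50/50 policy yields 0; the same comparison, on real models, is what the what-if facility exposes.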

The reason for implementing this explanation facility is that when we were building a certain medical influence diagram (Luque et al., 2005) our expert wondered why the model recommended not to perform a certain test. We wished to compute the posterior probability of the disease given a positive result of the test, but we could not introduce this "evidence", because it was incompatible with the optimal policy (not to test). After we implemented the possibility of imposing non-optimal policies (in this case, performing the test), we could see that the posterior probability of the disease remained below the treatment threshold even after a positive result; given that the result of the test would be irrelevant, it is not worth performing it.

3.4 Decision trees

Elvira can expand the decision tree corresponding to an ID, with the possibility of expanding and contracting its branches to the desired level of detail. This is especially useful when building IDs in medicine, because physicians are used to thinking in terms of scenarios and most of them are familiar with decision trees, while very few have heard about influence diagrams. One difference of Elvira with respect to other software tools is the possibility of expanding the nodes of the decision tree in order to explore the decomposition of its utility given by the structure of super-value nodes of the ID.

3.5 Sensitivity analysis

Recently Elvira has been endowed with some well-known sensitivity analysis tools, such as one-way sensitivity analysis, tornado diagrams, and spider diagrams, which can be combined with the above-mentioned methods for the explanation of reasoning. For instance, given the ID in Figure 1, one-way sensitivity analysis on the prevalence of the disease X can be used to determine the treatment threshold, and we can later see whether the result of test Y makes the probability of X cross that threshold. In the construction of more complex IDs this has been useful for understanding why some tests are necessary and others are not, and why sometimes the result of a test is irrelevant.

4 Conclusions

The explanation of reasoning in expert systems is necessary for debugging the knowledge base, for facilitating their acceptance by human users, and for using them as tutoring systems. This is especially important in the case of influence diagrams, because inference in probabilistic graphical models seems to bear little relation to human thinking. Nevertheless, current software tools for building and evaluating decision trees and IDs generally offer very little support for analyzing the "reasoning": usually they only permit computing the value of information, performing some kind of sensitivity analysis, or expanding a decision tree, which can hardly be considered explanation capabilities.

In this paper we have described some facilities implemented in Elvira that have been useful for understanding the knowledge contained in a certain ID, and why its evaluation has led to certain results, i.e., the optimal strategy and the expected utility. They can also analyze how the posterior probabilities, policies, and expected utilities would vary if the decision maker applied a different (non-optimal) strategy. Most of these explanation facilities are based on the construction of a so-called Cooper policy network, which is a true Bayesian network, and consequently all the explanation options that were implemented for Bayesian networks in Elvira are also available for IDs, such as the possibility of handling several evidence cases simultaneously. Another novelty of Elvira with respect to most software tools for IDs is the ability to include super-value nodes in the IDs, even when expanding the decision tree equivalent to an ID.

Decision analysis with influence diagrams using Elvira's explanation facilities 185

We have also mentioned in this paper how these explanation facilities, which we developed in response to the needs encountered when building medical models, have helped us to explain the IDs to the physicians collaborating with us and to debug those models (see (Lacave et al., 2006a) for further details).

Acknowledgments

Part of this work has been financed by the Spanish CICYT under project TIC2001-2973-C05. Manuel Luque has also been partially supported by the Department of Education of the Comunidad de Madrid and the European Social Fund (ESF).

References

G. F. Cooper. 1988. A method for using belief networks as influence diagrams. In Proceedings of the 4th Workshop on Uncertainty in AI, pages 55–63, University of Minnesota, Minneapolis.

R. G. Cowell, A. P. Dawid, S. L. Lauritzen, and D. J. Spiegelhalter. 1999. Probabilistic Networks and Expert Systems. Springer-Verlag, New York.

F. J. Díez, J. Mira, E. Iturralde, and S. Zubillaga. 1997. DIAVAL, a Bayesian expert system for echocardiography. Artificial Intelligence in Medicine, 10:59–73.

F. J. Díez. 2004. Teaching probabilistic medical reasoning with the Elvira software. In R. Haux and C. Kulikowski, editors, IMIA Yearbook of Medical Informatics, pages 175–180. Schattauer, Stuttgart.

The Elvira Consortium. 2002. Elvira: An environment for creating and using probabilistic graphical models. In Proceedings of the First European Workshop on Probabilistic Graphical Models (PGM'02), pages 1–11, Cuenca, Spain.

K. Ezawa. 1998. Evidence propagation and value of evidence on influence diagrams. Operations Research, 46:73–83.

F. V. Jensen. 2001. Bayesian Networks and Decision Graphs. Springer-Verlag, New York.

C. Lacave and F. J. Díez. 2002. A review of explanation methods for Bayesian networks. Knowledge Engineering Review, 17:107–127.

C. Lacave, R. Atienza, and F. J. Díez. 2000. Graphical explanation in Bayesian networks. In Proceedings of the International Symposium on Medical Data Analysis (ISMDA-2000), pages 122–129, Frankfurt, Germany. Springer-Verlag, Heidelberg.

C. Lacave, M. Luque, and F. J. Díez. 2006a. Explanation of Bayesian networks and influence diagrams in Elvira. Technical Report, CISIAD, UNED, Madrid.

C. Lacave, A. Onisko, and F. J. Díez. 2006b. Use of Elvira's explanation facility for debugging probabilistic expert systems. Knowledge-Based Systems. In press.

M. Luque, F. J. Díez, and C. Disdier. 2005. Influence diagrams for medical decision problems: Some limitations and proposed solutions. In J. H. Holmes and N. Peek, editors, Proceedings of the Intelligent Data Analysis in Medicine and Pharmacology, pages 85–86.

D. Nilsson and F. V. Jensen. 1998. Probabilities of future decisions. In Proceedings of the International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, pages 1454–1461.

J. A. Tatman and R. D. Shachter. 1990. Dynamic programming and influence diagrams. IEEE Transactions on Systems, Man, and Cybernetics, 20:365–379.

J. W. Wallis and E. H. Shortliffe. 1984. Customized explanations using causal knowledge. In B. G. Buchanan and E. H. Shortliffe, editors, Rule-Based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming Project, chapter 20, pages 371–388. Addison-Wesley, Reading, MA.

M. P. Wellman. 1990. Fundamental concepts of qualitative probabilistic networks. Artificial Intelligence, 44:257–303.



Dynamic importance sampling in Bayesian networks using factorisation of probability trees

Irene Martínez
Department of Languages and Computation, University of Almería, Almería, 04120, Spain

Carmelo Rodríguez and Antonio Salmerón
Department of Statistics and Applied Mathematics, University of Almería, Almería, 04120, Spain

Abstract

Factorisation of probability trees is a useful tool for inference in Bayesian networks. Probabilistic potentials some of whose parts are proportional can be decomposed as a product of smaller trees. Some algorithms, like lazy propagation, can take advantage of this fact. The factorisation can also be used as a tool for approximate inference, if the decomposition is carried out even when proportionality is not completely reached. In this paper we propose the use of approximate factorisation as a means of controlling the approximation level in a dynamic importance sampling algorithm.

1 Introduction

In this paper we propose an algorithm for updating probabilities in Bayesian networks. This problem is known to be NP-hard even in the approximate case (Dagum and Luby, 1993). There exist several deterministic approximate algorithms (Cano et al., 2000; Cano et al., 2002; Cano et al., 2003; Kjærulff, 1994) as well as algorithms based on Monte Carlo simulation. The two main approaches are Gibbs sampling (Jensen et al., 1995; Pearl, 1987) and importance sampling (Cheng and Druzdzel, 2000; Hernández et al., 1998; Moral and Salmerón, 2005; Salmerón et al., 2000; Shachter and Peot, 1990).

One class of these simulation procedures comprises the importance sampling algorithms based on approximate pre-computation (Hernández et al., 1998; Moral and Salmerón, 2005; Salmerón et al., 2000). These methods first perform a fast but non-exact propagation, consisting of a node removal process (Zhang and Poole, 1996). In this way, an approximate 'a posteriori' distribution is obtained. In the second stage a sample is drawn using the approximate distribution and the probabilities are estimated according to the importance sampling methodology (Rubinstein, 1981). In the most recent algorithm (Moral and Salmerón, 2005) a dynamic approach is adopted, in which the sampling distributions are updated according to the information collected during the simulation.

Recently, a new approach to constructing approximate deterministic algorithms was proposed in (Martínez et al., 2005), based on the concept of approximate factorisation of the probability trees used to represent the probability functions.

The goal of this paper is to incorporate the ideas contained in (Martínez et al., 2005) into the dynamic algorithm introduced in (Moral and Salmerón, 2005), with the aim of showing that the approximation level can be controlled by the factorisation as an alternative to pruning the trees.

The rest of the paper is organised as follows: in section 2 we analyse the main features of the dynamic importance sampling algorithm in (Moral and Salmerón, 2005). Section 3 explains the concept of approximate factorisation of probability trees. Then, in section 4 we explain how both ideas can be combined in a new algorithm. The performance of the new algorithm is tested on three real-world Bayesian networks in section 5, and the paper ends with the conclusions in section 6.

2 Dynamic importance sampling in Bayesian networks

Throughout this paper we consider a Bayesian network with variables X = {X1, . . . , Xn}, where each Xi is a discrete variable taking values in a finite set ΩXi. By ΩX we denote the state space of the n-dimensional random variable X.

Probabilistic reasoning in Bayesian networks requires the computation of the posterior probabilities p(xk|e), xk ∈ ΩXk, for each variable of interest Xk, given that some other variables E have been observed to take value e.

The posterior probability mentioned above can be expressed as

p(xk|e) = p(xk, e) / p(e),   ∀xk ∈ ΩXk,

and, since p(e) is a constant value, the problem of calculating the posterior probability of interest is equivalent to obtaining p(xk, e) and normalising afterwards. If we denote by p(x) the joint distribution for the variables X, then it holds that

p(xk, e) = ∑_{ΩX \ (Xk ∪ E)} p(x),

where we assume that the k-th coordinate of x is equal to xk and the coordinates of x corresponding to observed variables are equal to e. Therefore, the problem of probabilistic reasoning can be reduced to computing a sum, and here is where the importance sampling technique comes into play.
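For a toy joint distribution given as an explicit table (hypothetical numbers; in a real network the joint would be a product of conditional tables), the computation of p(xk, e) and the subsequent normalisation look as follows:

```python
import itertools

# Tiny joint distribution p(x1, x2, x3) over three binary variables,
# given as a full table (hypothetical numbers).
vals = (0, 1)
weights = [0.06, 0.14, 0.12, 0.08, 0.09, 0.21, 0.18, 0.12]
p = dict(zip(itertools.product(vals, repeat=3), weights))
assert abs(sum(p.values()) - 1.0) < 1e-9

# Evidence: X3 = 1.  Compute p(x1, e) by summing out X2, then normalise.
p_x1_e = {x1: sum(p[(x1, x2, 1)] for x2 in vals) for x1 in vals}
p_e = sum(p_x1_e.values())
posterior = {x1: p_x1_e[x1] / p_e for x1 in vals}
print(posterior)  # p(x1=0 | e) = 0.22 / 0.55 = 0.4
```

Importance sampling replaces the exhaustive sum over ΩX (exponential in general) with a weighted sample.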

Importance sampling is a well-known method for computing integrals (Rubinstein, 1981) or sums over multivariate functions. A straightforward way to do that could be to draw a sample from p(x) and then estimate p(xk, e) from it. However, p(x) is often unmanageable, due to the large size of the state space of X, ΩX.

Importance sampling tries to overcome this problem by using a simplified probability function p∗(x) to obtain a sample of ΩX. The estimation is carried out according to the following procedure.

Importance Sampling

1. FOR j := 1 to m (sample size)

   (a) Generate a configuration x(j) ∈ ΩX using p∗.

   (b) Compute a weight for the generated configuration as:

       wj := p(x(j)) / p∗(x(j)).   (1)

2. For each xk ∈ ΩXk, estimate p(xk, e) as p̂(xk, e), obtained as the sum of the weights in formula (1) corresponding to configurations containing xk, divided by m.

3. Normalise the values p̂(xk, e) in order to obtain p̂(xk|e), i.e., an estimation of p(xk|e).
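The procedure can be sketched on a two-variable toy model where the exact answer is known. The numbers, and the deliberately crude sampling distribution p∗, are illustrative assumptions, not taken from the paper:

```python
import random

random.seed(0)

# Binary variable X with prior p(x), evidence node E with p(e=1 | x).
# Hypothetical numbers; p_star is a crude approximation of the optimal p(x | e).
p_x = [0.3, 0.7]
p_e_given_x = [0.8, 0.2]
p_star = [0.5, 0.5]               # simplified sampling distribution

m = 50000
acc = [0.0, 0.0]
for _ in range(m):
    x = 0 if random.random() < p_star[0] else 1          # step 1(a): sample from p*
    w = p_x[x] * p_e_given_x[x] / p_star[x]              # step 1(b): w = p(x, e) / p*(x)
    acc[x] += w

est_joint = [a / m for a in acc]                          # step 2: estimate p(x, e)
z = sum(est_joint)
est_posterior = [v / z for v in est_joint]                # step 3: normalise
# Exact values: p(0, e) = 0.24, p(1, e) = 0.14, so p(0 | e) = 0.24 / 0.38 ≈ 0.632
print(est_posterior)
```

The closer p∗ is to p(x|e), the smaller the variance of the weights; with the crude uniform p∗ above the estimate still converges, just more slowly.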

In (Salmerón et al., 2000; Moral and Salmerón, 2005), a sampling distribution is computed for each variable, so that p∗ is equal to the product of the sampling distributions of all the variables. The sampling distribution for each variable can be obtained through a process of variable elimination. Assume that the variables are eliminated following the order X1, . . . , Xn, and that, before eliminating the first variable, H is the set of conditional distributions in the network. Then, the next algorithm obtains a sampling distribution for the i-th variable.

Get Sampling Distribution(Xi, H)

1. Hi := {f ∈ H | f is defined for Xi}.

2. fi := ∏_{f∈Hi} f.

3. f′i := ∑_{x∈ΩXi} fi.

4. H := (H \ Hi) ∪ {f′i}.

5. RETURN (fi).

Simulation is carried out in the order opposite to that in which the variables were deleted. Each variable Xi is simulated from its sampling distribution fi. This function is defined for variable Xi and other variables already sampled. The potential fi is restricted to the already obtained values of the variables for which it is defined, except Xi, giving rise to a function which depends only on Xi. Finally, a value for this variable is obtained with probability proportional to the values of this potential. If all the computations are exact, it was proved in (Hernández et al., 1998) that, following this procedure, we are really sampling from the optimal probability p∗(x) = p(x|e). However, the result of the combinations in the process of obtaining the sampling distributions may require a large amount of space to be stored, and therefore approximations are usually employed, using either probability tables (Hernández et al., 1998) or probability trees (Salmerón et al., 2000) to represent the distributions.

In (Moral and Salmerón, 2005) an alternative procedure to simulate each variable was used. Instead of restricting fi to the values of the variables already sampled, all the functions in Hi are restricted, resulting in a set of functions depending only on Xi. The sampling distribution is then computed by multiplying all these functions. If the computations are exact, then both distributions are the same, as restriction and combination commute. This is the basis for the dynamic updating procedure proposed in (Moral and Salmerón, 2005). Probabilistic potentials are represented by probability trees, which are approximated by pruning some of their branches when computing fi during the calculation of the sampling distributions. This approximation is corrected, to some extent, according to the information collected during the simulation: the probability value of the configuration already simulated provided by the product of the potentials in Hi must be the same as the probability value for the same configuration provided by f′i. Otherwise, potential fi is re-computed in order to correct the detected discrepancy.

3 Factorisation of probability trees

As mentioned in section 2, the approximation of the probability trees used to represent the probabilistic potentials consists of pruning some of their branches, namely those that lead to similar leaves (similar probability values), which can be replaced by their average, so that the error of the approximation depends on the differences between the values of the leaves corresponding to the pruned branches. Another way of taking advantage of the use of probability trees is given by the possible presence of proportionality between different subtrees (Martínez et al., 2002). This fact is illustrated in figures 1 and 2. In this case, the tree in figure 1 can be represented, without loss of precision, by the product of two smaller trees, shown in figure 2.

Note that the second tree in figure 2 does not contain variable X. This means that, when computing the sampling distribution for variable X, the products necessary to obtain function fi are carried out over smaller trees.
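The decomposition in figures 1 and 2 can be reproduced mechanically from the leaf values: for the context (W = 0), the sibling subtrees below Y are proportional, so one factor stores the proportionality constants and the other stores a single reference subtree. A sketch (the helper factorise is ours, not part of Elvira):

```python
# Leaf values below Y for the context (W = 0) in figure 1, one row per value of X.
branches_w0 = [[0.1, 0.2, 0.2, 0.5],      # X = 0
               [0.2, 0.4, 0.4, 1.0],      # X = 1
               [0.4, 0.8, 0.8, 2.0]]      # X = 2

def factorise(subtrees):
    """Split proportional sibling subtrees into (factors over X, reference over Y)."""
    ref = subtrees[0]
    alphas = []
    for st in subtrees:
        ratios = {round(a / b, 12) for a, b in zip(st, ref)}
        if len(ratios) != 1:              # not exactly proportional: no factorisation
            return None
        alphas.append(st[0] / ref[0])
    return alphas, ref

alphas, ref = factorise(branches_w0)
print(alphas, ref)                         # the two factors shown in figure 2
# Multiplying the factors back recovers the original tree exactly.
reconstructed = [[a * r for r in ref] for a in alphas]
```

Here alphas is (1, 2, 4) and ref is (0.1, 0.2, 0.2, 0.5), matching figure 2, and their product reproduces figure 1 without loss of precision.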

3.1 Approximate Factorisation of Probability Trees

Factorisation of probability trees can be used as a tool for developing approximate inference algorithms, when the proportionality condition is relaxed. In this case, probability trees can be decomposed even if we find only almost proportional, rather than proportional, subtrees. Approximate factorisation and its application to lazy propagation was proposed in (Martínez et al., 2005), and is stated as follows. Let T1 and T2 be two sub-trees which are siblings for a given context (i.e. both sub-trees are children of the same node), such that both have the same size and their leaves contain only strictly positive numbers. The goal of the approximate factorisation is to find a tree T∗2 with the same structure as T2, such that T∗2 and T1 become proportional, under the restriction that the potential represented by T∗2 must be as close as possible to the one represented by T2. Then, T2 can be replaced by T∗2 and the resulting tree that



Figure 1: A probability tree proportional below X for context (W = 0). [Figure: for W = 0, the branches X = 0, 1, 2 lead to subtrees over Y with leaves (0.1, 0.2, 0.2, 0.5), (0.2, 0.4, 0.4, 1.0) and (0.4, 0.8, 0.8, 2.0); for W = 1, the branches X = 0, 1, 2 lead to leaves 0.4, 0.1 and 0.5.]

Figure 2: Decomposition of the tree in figure 1 with respect to variable X. [Figure: the first factor has leaves (1, 2, 4) below X for W = 0 and leaves (0.4, 0.1, 0.5) below X for W = 1; the second factor has leaves (0.1, 0.2, 0.2, 0.5) below Y for W = 0 and the constant leaf 1 for W = 1.]

contains T1 and T∗2 can be decomposed by factorisation. Formally, approximate factorisation is defined through the concept of δ-factorisability.

Definition 1. A probability tree T is δ-factorisable within context (XC = xC), with proportionality factors α, with respect to a divergence measure D, if there is an xi ∈ ΩX and a set L ⊆ ΩX \ {xi} such that for every xj ∈ L there exists αj > 0 such that

D(T^R(XC=xC, X=xj), αj · T^R(XC=xC, X=xi)) ≤ δ,

where T^R(XC=xC, X=xi) denotes the subtree of T reached by the branch where variables XC take values xC and X takes value xi. The parameter δ > 0 is called the tolerance of the approximation.

Observe that if δ = 0, we have exact factorisation. In the definition above, D and α = (α1, . . . , αk) are related in such a way that α can be computed so as to minimise the value of the divergence measure D.

In (Martínez et al., 2005) several divergence measures are considered, and the optimal α is given for each case. In this work we consider only the measure that showed the best performance in (Martínez et al., 2005), which we describe now. Consider a probability tree T. Let T1 and T2 be sub-trees of T below a variable X, for a given context (XC = xC), with leaves P = {pi : i = 1, . . . , n; pi ≠ 0} and Q = {qi : i = 1, . . . , n} respectively. As described before, approximate factorisation is achieved by replacing T2 by another tree T∗2 such that T∗2 is proportional to T1. This means that the leaves of T∗2 will be Q∗ = {αpi : i = 1, . . . , n}, where α is the proportionality factor between T1 and T2. Let us denote by πi = qi/pi, i = 1, . . . , n, the ratios between the leaves of T2 and T1.

The χ² divergence, defined as

Dχ(T2, T∗2) = ∑_{i=1}^{n} (qi − αpi)² / qi,



is minimised for α equal to

αχ = (∑_{i=1}^{n} pi) / (∑_{i=1}^{n} pi/πi).

In this work we will use its normalised version

Dχ∗(T2, T∗2) = Dχ / (Dχ + n),   (2)

which takes values between 0 and 1 and is minimised for the same α.
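A quick numerical check of the closed-form minimiser, with hypothetical leaf values. Note that pi/πi = pi²/qi, so the two forms of αχ below coincide; the normalisation Dχ/(Dχ + n) follows equation (2) as printed here:

```python
# Leaves of two sibling subtrees (hypothetical, only roughly proportional).
p = [0.1, 0.2, 0.2, 0.5]
q = [0.22, 0.41, 0.38, 1.02]               # close to 2 * p but not exactly

def d_chi(alpha):
    """Chi-square divergence between q and alpha * p."""
    return sum((qi - alpha * pi) ** 2 / qi for pi, qi in zip(p, q))

# Closed-form minimiser: alpha_chi = (sum p_i) / (sum p_i / pi_i), pi_i = q_i / p_i,
# i.e. alpha_chi = (sum p_i) / (sum p_i**2 / q_i).
alpha_chi = sum(p) / sum(pi ** 2 / qi for pi, qi in zip(p, q))

# Sanity check: no nearby alpha does better (D_chi is convex in alpha).
assert d_chi(alpha_chi) <= min(d_chi(alpha_chi - 1e-3), d_chi(alpha_chi + 1e-3))

# Normalised version of equation (2), in [0, 1).
n = len(p)
d_norm = d_chi(alpha_chi) / (d_chi(alpha_chi) + n)
print(alpha_chi, d_norm)
```

For these values αχ is close to 2, the "true" proportionality constant hidden in the noisy q.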

4 Dynamic importance sampling combined with factorisation

As we mentioned in section 2, the complexity of the dynamic importance sampling algorithm lies in the computation of the sampling distribution. In the algorithm proposed in (Moral and Salmerón, 2005), this complexity is controlled by pruning the trees resulting from multiplications of other trees.

Here we propose to use approximate factorisation instead of tree pruning to control the complexity of the computation of the sampling distributions. Note that these two alternatives (approximate factorisation and tree pruning) are not exclusive; they can be used in combination. However, in order to have a clear idea of how approximate factorisation affects the accuracy of the results, we do not mix it with tree pruning in the algorithm proposed here.

The difference between the dynamic algorithm described in (Moral and Salmerón, 2005) and the method we propose in this paper lies in the computation of the sampling distributions. The simulation phase and the update of the sampling distributions are carried out in exactly the same way, except that we have established a limit on the number of updates during the simulation, equal to 1000 iterations. This means that, after iteration 1000 of the simulation, the sampling distributions are no longer updated. We have decided this with the aim of avoiding a large increase in space requirements during the simulation. Therefore, we only describe the part of the algorithm devoted to obtaining the sampling distribution for a given variable Xi.

Get Sampling Distribution(Xi, H, D, δ)

1. Hi := {f ∈ H | f is defined for Xi}.

2. FOR each f ∈ Hi,

   (a) IF there is a context XC for which the tree corresponding to f is δ-factorisable with respect to D,

       • Decompose f as f1 × f2, where f1 is defined for Xi and f2 is not.
       • Hi := (Hi \ {f}) ∪ {f1}.
       • H := (H \ {f}) ∪ {f2}.

   (b) ELSE

       • H := H \ {f}.

3. fi := ∏_{f∈Hi} f.

4. f′i := ∑_{x∈ΩXi} fi.

5. H := H ∪ {f′i}.

6. RETURN (fi).

According to the algorithm above, the sampling distribution for a variable Xi is computed by taking all the functions which are defined for it but, unlike the method in (Moral and Salmerón, 2005), the product of all those functions is postponed until the possible factorisations of all the intervening functions have been tested. Therefore, the product in step 3 is carried out over functions with smaller domains than the product in step 2 of the algorithm described in section 2.

The accuracy of the sampling distribution is controlled by the parameter δ: higher values result in worse approximations.
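The δ-test in step 2(a) can be sketched for a pair of sibling leaf vectors; try_factorise is a hypothetical helper combining the χ²-optimal α with the tolerance test of definition 1:

```python
def try_factorise(p, q, delta):
    """Decide whether q may be replaced by alpha * p, i.e. whether the
    normalised chi-square divergence is within the tolerance delta.
    Returns (alpha, factorised?).  A sketch, not Elvira's implementation."""
    alpha = sum(p) / sum(pi ** 2 / qi for pi, qi in zip(p, q))   # chi-square optimum
    d = sum((qi - alpha * pi) ** 2 / qi for pi, qi in zip(p, q))
    d_norm = d / (d + len(p))              # normalised divergence, range [0, 1)
    return alpha, d_norm <= delta

p = [0.1, 0.2, 0.2, 0.5]
exact = [0.2, 0.4, 0.4, 1.0]               # exactly 2 * p: always factorised
noisy = [0.9, 0.1, 0.6, 0.2]               # far from proportional to p: kept as is

assert try_factorise(p, exact, delta=0.001)[1]
assert not try_factorise(p, noisy, delta=0.001)[1]
```

With a small δ only (almost) exactly proportional siblings are decomposed; raising δ makes the algorithm factorise more aggressively, trading accuracy for smaller trees.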

5 Experimental evaluation

In this section we describe a set of experiments carried out to show how approximate factorisation can be used to control the level of approximation in dynamic importance sampling. We have implemented the algorithm in Java, as a part of the Elvira system (Elvira Consortium, 2002). We have selected three real-world Bayesian networks from the Machine Intelligence group at Aalborg University (www.cs.aau.dk/research/MI/). The



three networks are called Link (Jensen et al., 1995), Munin1 and Munin2 (Andreassen et al., 1989). Table 1 displays, for each network, the number of variables, the number of observed variables (selected at random) and its size. By the size we mean the sum of the clique sizes resulting when HUGIN (Jensen et al., 1990) was used to compute the exact posterior distributions given the observed variables.

Table 1: Statistics about the networks used in the experiments.

Network   Vars.   Obs. vars.   Size
Link      441     166          23,983,962
Munin1    189     8            83,735,758
Munin2    1003    15           2,049,942

We have run the importance sampling algorithm with tolerance values δ = 0.1, 0.05, 0.01, 0.005 and 0.001 and with different numbers of simulation iterations: 2000, 3000, 4000 and 5000. Given the random nature of this algorithm, we have run each experiment 20 times and the errors have been averaged. For one variable Xl, the error in the estimation of its posterior distribution is measured as (Fertig and Mann, 1980):

G(Xl) = (1 / |ΩXl|) ∑_{al∈ΩXl} (p′(al|e) − p(al|e))² / (p(al|e) (1 − p(al|e)))   (3)

where p(al|e) is the true posterior probability, p′(al|e) is the estimated value, and |ΩXl| is the number of cases of variable Xl. For a set of variables X1, . . . , Xn, the error is:

G(X1, . . . , Xn) = ∑_{i=1}^{n} G(Xi)²   (4)

This error measure emphasises the differences in small probability values, which makes it more discriminating than other measures such as the mean squared error.
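A direct transcription of equations (3) and (4) as printed above, with hypothetical exact and estimated distributions, shows why small probabilities dominate: the weighting term p(1 − p) in the denominator shrinks for rare states.

```python
def g_single(p_true, p_est):
    """Equation (3): averaged squared error, weighted by p(1 - p)."""
    return sum((pe - pt) ** 2 / (pt * (1 - pt))
               for pt, pe in zip(p_true, p_est)) / len(p_true)

def g_total(pairs):
    """Equation (4): sum of squared per-variable errors."""
    return sum(g_single(pt, pe) ** 2 for pt, pe in pairs)

# Hypothetical exact posterior and two estimates of it.
exact = [0.2, 0.3, 0.5]
perfect = g_single(exact, exact)                       # exactly 0.0
slightly_off = g_single(exact, [0.21, 0.29, 0.50])     # small positive error
print(perfect, slightly_off)
```

The same absolute deviation on a rarer state yields a larger contribution, which is the discriminating behaviour described above.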

Figure 3: Results for Link with δ = 0.1, 0.05, 0.01, 0.005 and 0.001.

Figure 4: Results for Munin1 with δ = 0.1, 0.05, 0.01, 0.005 and 0.001.

5.1 Results discussion

The results summarised in figures 3, 4 and 5 indicate that the error can, to some extent, be controlled by means of the δ parameter. There are, however, irregularities in the graphs, probably due to the variance of the estimations caused by the stochastic nature of the importance sampling method.

The best estimations are achieved for the Munin1 network. This is due to the fact that few factorisations are actually carried out, and therefore the number of approximations is lower, which also decreases the variance of the estimations.

Factorisation, or even approximate factorisation, is difficult to carry out over very extreme distributions. In the approximate case, a high value of δ would be necessary. This difficulty



Figure 5: Results for Munin2 with δ = 0.1, 0.05, 0.01, 0.005 and 0.001.

also arises when using tree pruning as the approximation method, since that method is good at capturing uniform regions.

Regarding the high variance of the estimations suggested by the graphs in figures 3 and 5, it can be caused by multiple factorisations of the same tree: a tree can be factorised for some variable and afterwards one of the resulting factors can be factorised again when deleting another variable. In this case, even if the local error is controlled by δ, there is a global error due to the concatenation of factorisations that is difficult to handle. Perhaps the solution is to forbid trees that result from a factorisation from being factorised again.

6 Conclusions

In this paper we have introduced a new version of the dynamic importance sampling algorithm proposed in (Moral and Salmerón, 2005). The novelty consists of using the approximate factorisation of probability trees to control the approximation of the sampling distributions and, in consequence, the final approximations.

The experimental results suggest that the method is valid, but perhaps more difficult to control than tree pruning.

Even though the goal of this paper was to analyse the performance of tree factorisation as a stand-alone approximation method, one immediate conclusion that can be drawn from the work presented here is that the joint use of approximate factorisation and tree pruning should be studied. We have carried out some experiments in this direction and the first results are very promising: the variance of the estimations and the computing time seem to be significantly reduced. The main problem of combining both methods is the increased difficulty of controlling the final estimation error through the parameters of the algorithm. We leave a detailed analysis of this issue for an extended version of this paper. The experiments we are currently carrying out suggest that the most sensible way of combining both approximation procedures is to use slight tree pruning together with low values of the factorisation tolerance; otherwise, the error introduced by one approximation method can affect the other.

It seems difficult to establish a general rule for determining which approximation method is preferable. Depending on the network and the propagation algorithm, the performance of both techniques may vary. For instance, approximate factorisation is appropriate for algorithms that handle factorised potentials, such as lazy propagation or the dynamic importance sampling method used in this paper.

Finally, an efficiency comparison between the resulting algorithm and the dynamic importance sampling algorithm in (Moral and Salmerón, 2005) is one of the issues that we plan to address.

Acknowledgments

This work has been supported by the Spanish Ministry of Education and Science through project TIN2004-06204-C03-01.

References

S. Andreassen, F. V. Jensen, S. K. Andersen, B. Falck, U. Kjærulff, M. Woldbye, A. R. Sorensen, A. Rosenfalck, and F. Jensen. 1989. Computer-Aided Electromyography and Expert Systems, chapter MUNIN - an expert EMG assistant, pages 255–277. Elsevier Science Publishers B.V., Amsterdam.

A. Cano, S. Moral, and A. Salmerón. 2000. Penniless propagation in join trees. International Journal of Intelligent Systems, 15:1027–1059.

A. Cano, S. Moral, and A. Salmerón. 2002. Lazy evaluation in Penniless propagation over join trees. Networks, 39:175–185.

A. Cano, S. Moral, and A. Salmerón. 2003. Novel strategies to approximate probability trees in Penniless propagation. International Journal of Intelligent Systems, 18:193–203.

J. Cheng and M. J. Druzdzel. 2000. AIS-BN: An adaptive importance sampling algorithm for evidential reasoning in large Bayesian networks. Journal of Artificial Intelligence Research, 13:155–188.

P. Dagum and M. Luby. 1993. Approximating probabilistic inference in Bayesian belief networks is NP-hard. Artificial Intelligence, 60:141–153.

Elvira Consortium. 2002. Elvira: An environment for creating and using probabilistic graphical models. In J. A. Gámez and A. Salmerón, editors, Proceedings of the First European Workshop on Probabilistic Graphical Models, pages 222–230.

K. W. Fertig and N. R. Mann. 1980. An accurate approximation to the sampling distribution of the studentized extreme-valued statistic. Technometrics, 22:83–90.

L. D. Hernández, S. Moral, and A. Salmerón. 1998. A Monte Carlo algorithm for probabilistic propagation in belief networks based on importance sampling and stratified simulation techniques. International Journal of Approximate Reasoning, 18:53–91.

F. V. Jensen, S. L. Lauritzen, and K. G. Olesen. 1990. Bayesian updating in causal probabilistic networks by local computation. Computational Statistics Quarterly, 4:269–282.

C. S. Jensen, A. Kong, and U. Kjærulff. 1995. Blocking Gibbs sampling in very large probabilistic expert systems. International Journal of Human-Computer Studies, 42:647–666.

U. Kjærulff. 1994. Reduction of computational complexity in Bayesian networks through removal of weak dependencies. In Proceedings of the 10th Conference on Uncertainty in Artificial Intelligence, pages 374–382. Morgan Kaufmann, San Francisco.

I. Martínez, S. Moral, C. Rodríguez, and A. Salmerón. 2002. Factorisation of probability trees and its application to inference in Bayesian networks. In J. A. Gámez and A. Salmerón, editors, Proceedings of the First European Workshop on Probabilistic Graphical Models, pages 127–134.

I. Martínez, S. Moral, C. Rodríguez, and A. Salmerón. 2005. Approximate factorisation of probability trees. ECSQARU'05. Lecture Notes in Artificial Intelligence, 3571:51–62.

S. Moral and A. Salmerón. 2005. Dynamic importance sampling in Bayesian networks based on probability trees. International Journal of Approximate Reasoning, 38(3):245–261.

J. Pearl. 1987. Evidential reasoning using stochastic simulation of causal models. Artificial Intelligence, 32:247–257.

R. Y. Rubinstein. 1981. Simulation and the Monte Carlo Method. Wiley, New York.

A. Salmerón, A. Cano, and S. Moral. 2000. Importance sampling in Bayesian networks using probability trees. Computational Statistics and Data Analysis, 34:387–413.

R. D. Shachter and M. A. Peot. 1990. Simulation approaches to general probabilistic inference on belief networks. In M. Henrion, R. D. Shachter, L. N. Kanal, and J. F. Lemmer, editors, Uncertainty in Artificial Intelligence, volume 5, pages 221–231. North Holland, Amsterdam.

N. L. Zhang and D. Poole. 1996. Exploiting causal independence in Bayesian network inference. Journal of Artificial Intelligence Research, 5:301–328.

194 I. Martínez, C. Rodríguez, and A. Salmerón

Page 205: Proceedings of the Third European Workshop on Probabilistic

Learning Semi-Markovian Causal Models using Experiments

Stijn Meganck1, Sam Maes2, Philippe Leray2 and Bernard Manderick1

1 CoMo, Vrije Universiteit Brussel, Brussels, Belgium

2 LITIS, INSA Rouen, St. Etienne du Rouvray, France

Abstract

Semi-Markovian causal models (SMCMs) are an extension of causal Bayesian networks for modeling problems with latent variables. However, there is a big gap between the SMCMs used in theoretical studies and the models that can be learned from observational data alone. The result of standard algorithms for learning from observations is a complete partially ancestral graph (CPAG), representing the Markov equivalence class of maximal ancestral graphs (MAGs). In MAGs not all edges can be interpreted as immediate causal relationships. In order to apply state-of-the-art causal inference techniques we need to completely orient the learned CPAG and to transform the result into a SMCM by removing non-causal edges. In this paper we combine recent work on MAG structure learning from observational data with causal learning from experiments in order to achieve that goal. More specifically, we provide a set of rules that indicate which experiments are needed in order to transform a CPAG to a completely oriented SMCM and how the results of these experiments have to be processed. We will propose an alternative representation for SMCMs that can easily be parametrised and where the parameters can be learned with classical methods. Finally, we show how this parametrisation can be used to develop methods to efficiently perform both probabilistic and causal inference.

1 Introduction

This paper discusses graphical models that can handle latent variables without explicitly modeling them quantitatively. For such problem domains, several paradigms exist, such as semi-Markovian causal models or maximal ancestral graphs. Applying these techniques to a problem domain consists of several steps, typically: structure learning from observational and experimental data, parameter learning, probabilistic inference, and quantitative causal inference.

A problem is that each existing approach only focuses on one or a few of all the steps involved in the process of modeling a problem including latent variables. The goal of this paper is to investigate the integral process from learning from observational and experimental data to different types of efficient inference.

Semi-Markovian causal models (SMCMs) (Pearl, 2000; Tian and Pearl, 2002) are specifically suited for performing quantitative causal inference in the presence of latent variables. However, at this time no efficient parametrisation of such models is provided and there are no techniques for performing efficient probabilistic inference. Furthermore, there are no techniques for learning these models from data issued from observations, experiments or both.

Maximal ancestral graphs (MAGs), developed in (Richardson and Spirtes, 2002), are specifically suited for structure learning from observational data. In MAGs every edge depicts an ancestral relationship. However, the techniques only learn up to Markov equivalence and provide no clues on which additional experiments to perform in order to obtain the fully oriented causal graph. See (Eberhardt et al., 2005; Meganck et al., 2006) for that type of results on Bayesian networks without latent variables. Furthermore, no parametrisation for discrete variables is provided for MAGs (only one in terms of Gaussian distributions) and no techniques for probabilistic and causal inference have been developed.

We have chosen to use SMCMs in this paper, because they are the only formalism that allows performing causal inference while taking into account the influence of latent variables. However, we will combine existing techniques to learn MAGs with newly developed methods to provide an integral approach that uses both observational data and experiments in order to learn fully oriented semi-Markovian causal models.

In this paper we also introduce an alternative representation for SMCMs together with a parametrisation for this representation, where the parameters can be learned from data with classical techniques. Finally, we discuss how probabilistic and quantitative causal inference can be performed in these models.

The next section introduces the necessary notations and definitions. It also discusses the semantical and other differences between SMCMs and MAGs. In section 3, we discuss structure learning for SMCMs. Then we introduce a new representation for SMCMs that can easily be parametrised. We also show how both probabilistic and causal inference can be performed with the help of this new representation.

2 Notation and Definitions

We start this section by introducing notations and defining concepts necessary in the rest of this paper. We will also clarify the differences and similarities between the semantics of SMCMs and MAGs.

2.1 Notation

In this work uppercase letters are used to represent variables or sets of variables, i.e., V = {V1, . . . , Vn}, while corresponding lowercase letters are used to represent their instantiations, i.e., v1, v2, and v is an instantiation of all vi. P(Vi) is used to denote the probability distribution over all possible values of variable Vi, while P(Vi = vi) is used to denote the probability of the instantiation of variable Vi to value vi. Usually, P(vi) is used as an abbreviation of P(Vi = vi).

The operators Pa(Vi), Anc(Vi), Ne(Vi) denote the observable parents, ancestors and neighbors, respectively, of variable Vi in a graph, and Pa(vi) represents the values of the parents of Vi. Likewise, the operator LPa(Vi) represents the latent parents of variable Vi. If Vi ↔ Vj appears in a graph then we say that they are spouses, i.e., Vi ∈ Sp(Vj) and vice versa.

When two variables Vi, Vj are independent we denote it by (Vi ⊥⊥ Vj); when they are dependent, by (Vi ⊥̸⊥ Vj).

2.2 Semi-Markovian Causal Models

Consider the model in Figure 1(a): it is a problem with observable variables V1, . . . , V6 and latent variables L1, L2, and it is represented by a directed acyclic graph (DAG). As this DAG represents the actual problem, henceforth we will refer to it as the underlying DAG.

The central graphical modeling representation that we use is the semi-Markovian causal model (Tian and Pearl, 2002).

Definition 1. A semi-Markovian causal model (SMCM) is a representation of a causal Bayesian network (CBN) with observable variables V = {V1, . . . , Vn} and latent variables L = {L1, . . . , Ln′}. Furthermore, every latent variable has no parents (i.e., is a root node) and has exactly two children that are both observed.

See Figure 1(b) for an example SMCM representing the underlying DAG in (a).
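For concreteness, an SMCM can be held in a tiny data structure with one set per edge type. This is only a sketch: the class and method names are ours, and the edge lists are our own reading of Figure 1(b), assumed for illustration.

```python
class SMCM:
    """Minimal semi-Markovian causal model: directed edges for immediate
    causal relations, bi-directed edges for latent common causes."""

    def __init__(self, directed, bidirected):
        self.directed = set(directed)                  # (cause, effect) pairs
        self.bidirected = {frozenset(e) for e in bidirected}  # unordered pairs

    def parents(self, v):
        """Pa(v): observable parents of v."""
        return {a for (a, b) in self.directed if b == v}

    def spouses(self, v):
        """Sp(v): variables sharing a bi-directed edge with v."""
        return {w for e in self.bidirected for w in e if v in e and w != v}


# Assumed reading of Figure 1(b) (not spelled out edge-by-edge in the text):
g = SMCM(
    directed=[("V1", "V2"), ("V2", "V3"), ("V2", "V4"),
              ("V3", "V4"), ("V4", "V5"), ("V4", "V6")],
    bidirected=[("V2", "V6"), ("V5", "V6")],
)
```

Bi-directed edges are stored as unordered pairs, matching the symmetry of the spouse relation Sp(·).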

In a SMCM each directed edge represents an immediate autonomous causal relation between the corresponding variables. Our operational definition of causality is as follows: a relation from variable C to variable E is causal in a certain context, when a manipulation in the form of a randomised controlled experiment on variable C induces a change in the probability distribution of variable E, in that specific context (Neapolitan, 2003).

Figure 1: (a) A problem domain represented by a causal DAG model with observable and latent variables. (b) A semi-Markovian causal model representation of (a). (c) A maximal ancestral graph representation of (a).

In a SMCM a bi-directed edge between two variables represents a latent variable that is a common cause of these two variables. Although this seems very restrictive, it has been shown that models with arbitrary latent variables can be converted into SMCMs while preserving the same independence relations between the observable variables (Tian and Pearl, 2002).

The semantics of both directed and bi-directed edges imply that SMCMs are not maximal. This means that they do not represent all dependencies between variables with an edge. This is because in a SMCM an edge either represents an immediate causal relation or a latent common cause, and therefore dependencies due to a so-called inducing path will not be represented by an edge.

A node is a collider on a path if both its immediate neighbors on the path point into it.

Definition 2. An inducing path is a path in a graph such that each observable non-endpoint node of the path is a collider, and an ancestor of at least one of the endpoints.

Inducing paths have the property that their endpoints cannot be separated by conditioning on any subset of the observable variables. For instance, in Figure 1(a), the path V1 → V2 ← L1 → V6 is inducing.
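Definition 2 can be checked mechanically on the underlying DAG. The sketch below is a hedged illustration: the function names are ours, and the edge list for the underlying DAG of Figure 1(a) is our own reading of the figure, assumed for the example.

```python
def ancestors(dag, v):
    """All ancestors of v in a DAG given as a set of (parent, child) edges."""
    anc, frontier = set(), {v}
    while frontier:
        ps = {a for (a, b) in dag if b in frontier} - anc
        anc |= ps
        frontier = ps
    return anc

def is_inducing(dag, latents, path):
    """Definition 2: every observable non-endpoint node on the path must be
    a collider on the path and an ancestor of at least one endpoint."""
    ends = {path[0], path[-1]}
    for i in range(1, len(path) - 1):
        n = path[i]
        if n in latents:                      # only observable nodes are tested
            continue
        collider = (path[i - 1], n) in dag and (path[i + 1], n) in dag
        anc_of_end = any(n in ancestors(dag, e) for e in ends)
        if not (collider and anc_of_end):
            return False
    return True

# Assumed reading of the underlying DAG of Figure 1(a):
dag = {("V1", "V2"), ("V2", "V3"), ("V2", "V4"), ("V3", "V4"),
       ("V4", "V5"), ("V4", "V6"), ("L1", "V2"), ("L1", "V6"),
       ("L2", "V5"), ("L2", "V6")}
```

With this reading, the path V1, V2, L1, V6 passes the test (V2 is a collider and an ancestor of V6), in line with the example in the text.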

SMCMs are specifically suited for another type of inference, i.e., causal inference.

Definition 3. Causal inference is the process of calculating the effect of manipulating some variables X on the probability distribution of some other variables Y; this is denoted as P(Y = y|do(X = x)).

An example causal inference query in the SMCM of Figure 1(b) is P(V6 = v6|do(V2 = v2)).

Tian and Pearl have introduced theoretical causal inference algorithms to perform causal inference in SMCMs (Pearl, 2000; Tian and Pearl, 2002). However, these algorithms assume that any distribution that can be obtained from the JPD over the observable variables is available. We will show that their algorithm performs efficiently on our parametrisation.

2.3 Maximal Ancestral Graphs

Maximal ancestral graphs are another approach to modeling with latent variables (Richardson and Spirtes, 2002). The main research focus in that area lies on learning the structure of these models.

Ancestral graphs (AGs) are complete under marginalisation and conditioning. We will only discuss AGs without conditioning, as is commonly done in recent work.

Definition 4. An ancestral graph without conditioning is a graph containing directed (→) and bi-directed (↔) edges, such that there is no bi-directed edge between two variables that are connected by a directed path.

Pearl’s d-separation criterion can be extended to apply to ancestral graphs, where it is called m-separation. M-separation characterizes the independence relations represented by an ancestral graph.

Definition 5. An ancestral graph is said to be maximal if, for every pair of non-adjacent nodes (Vi, Vj), there exists a set Z such that Vi and Vj are independent conditional on Z.

A non-maximal AG can be transformed into a MAG by adding some bi-directed edges (indicating confounding) to the model. See Figure 1(c) for an example MAG representing the same model as the underlying DAG in (a).


In this setting a directed edge represents an ancestral relation in the underlying DAG with latent variables. I.e., an edge from variable A to B represents that in the underlying causal DAG with latent variables, there is a directed path between A and B.

Bi-directed edges represent a latent common cause between the variables. However, if there is a latent common cause between two variables A and B, and there is also a directed path between A and B in the underlying DAG, then in the MAG the ancestral relation takes precedence and a directed edge will be found between the variables. V2 → V6 in Figure 1(c) is an example of such an edge.

Furthermore, as MAGs are maximal, there will also be edges between variables that have no immediate connection in the underlying DAG, but that are connected via an inducing path. The edge V1 → V6 in Figure 1(c) is an example of such an edge.

These semantics of edges make causal inference in MAGs virtually impossible. As stated by the Manipulation Theorem (Spirtes et al., 2000), in order to calculate the causal effect of a variable A on another variable B, the immediate parents (i.e., the pre-intervention causes) of A have to be removed from the model. However, as opposed to SMCMs, in MAGs an edge does not necessarily represent an immediate causal relationship, but rather an ancestral relationship, and hence in general the modeler does not know which are the real immediate causes of a manipulated variable.

An additional problem for finding the pre-intervention causes of a variable in MAGs is that when there is both an ancestral relation and a latent common cause between variables, the ancestral relation takes precedence and the confounding is absorbed in the ancestral relation.

Complete partial ancestral graphs (CPAGs) are defined in (Zhang and Spirtes, 2005) in the following way.

Definition 6. Let [G] be the Markov equivalence class for an arbitrary MAG G. The complete partial ancestral graph (CPAG) for [G], PG, is a graph with possibly the following edges: →, ↔, o−o, o→, such that

1. PG has the same adjacencies as G (and hence any member of [G]) does;

2. a mark of arrowhead (>) is in PG if and only if it is invariant in [G];

3. a mark of tail (−) is in PG if and only if it is invariant in [G];

4. a mark of o is in PG if not all members in [G] have the same mark.

2.4 Assumptions

As is customary in the graphical modeling research area, the SMCMs we take into account in this paper are subject to some simplifying assumptions:

1. Stability, i.e., the independencies in the CBN with observed and latent variables that generates the data are structural and not due to several influences exactly canceling each other out (Pearl, 2000).

2. Only a single immediate connection per two variables in the underlying DAG. I.e., we do not take into account problems where two variables that are connected by an immediate causal edge are also confounded by a latent variable causing both variables. Constraint-based learning techniques such as IC* (Pearl, 2000) and FCI (Spirtes et al., 2000) also do not explicitly recognise multiple edges between variables. However, in (Tian and Pearl, 2002) Tian presents an algorithm for performing causal inference where such relations between variables are taken into account.

3. No selection bias. Mimicking recent work, we do not take into account latent variables that are conditioned upon, as can be the consequence of selection effects.

4. Discrete variables. All the variables in our models are discrete.


3 Structure learning

Just as learning a graphical model in general, learning a SMCM consists of two parts: structure learning and parameter learning. Both can be done using data, expert knowledge and/or experiments. In this section we discuss only structure learning.

3.1 Without latent variables

Learning the structure of Bayesian networks without latent variables has been studied by a number of researchers (Pearl, 2000; Spirtes et al., 2000). The result of applying one of those algorithms is a representative of the Markov equivalence class.

In order to perform probabilistic or causal inference, we need a fully oriented structure. For probabilistic inference this can be any representative of the Markov equivalence class, but for causal inference we need the correct causal graph that models the underlying system. In order to achieve this, additional experiments have to be performed.

In previous work (Meganck et al., 2006), we studied learning the completely oriented structure for causal Bayesian networks without latent variables. We proposed a solution to minimise the total cost of the experiments needed by using elements from decision theory. The techniques used there could be extended to the results of this paper.

3.2 With latent variables

In order to learn graphical models with latent variables from observational data, the Fast Causal Inference (FCI) algorithm (Spirtes et al., 2000) has been proposed. Recently this result has been extended with the complete tail augmentation rules introduced in (Zhang and Spirtes, 2005). The result of this algorithm is a CPAG, representing the Markov equivalence class of MAGs consistent with the data.

As mentioned above for MAGs, in a CPAG the directed edges have to be interpreted as being ancestral instead of causal. This means that there is a directed edge from Vi to Vj if Vi is an ancestor of Vj in the underlying DAG and there is no subset of observable variables D such that (Vi ⊥⊥ Vj |D). This does not necessarily mean that Vi has an immediate causal influence on Vj: it may also be a result of an inducing path between Vi and Vj. For instance in Figure 1(c), the link between V1 and V6 is present due to the inducing path V1, V2, L1, V6 shown in Figure 1(a).

Figure 2: (a) A SMCM. (b) Result of FCI, with an i-false edge V3 o−o V4.

Inducing paths may also introduce an edge of type o→ or o−o between two variables, indicating either a directed or bi-directed edge, although there is no immediate influence in the form of an immediate causal influence or latent common cause between the two variables. An example of such a link is V3 o−o V4 in Figure 2.

A consequence of these properties of MAGs and CPAGs is that they are not suited for causal inference, since the immediate causal parents of each observable variable are not available, as is necessary according to the Manipulation Theorem. As we want to learn models that can perform causal inference, we will discuss how to transform a CPAG into a SMCM in the next sections. Before we start, we have to mention that we assume that the CPAG is correctly learned from data with the FCI algorithm and the extended tail augmentation rules, i.e., each result that is found is not due to a sampling error.

3.3 Transforming the CPAG

Our goal is to transform a given CPAG in order to obtain the SMCM that corresponds to the underlying DAG. Remember that in general there are three types of edges in a CPAG: →, o→, o−o, in which o means either a tail mark − or a directed mark >. So one of the tasks needed to obtain a valid SMCM is to disambiguate those edges with at least one o as an endpoint. A second task will be to identify and remove the edges that are created due to an inducing path.

In the next section we will first discuss exactly which information we obtain from performing an experiment. Then, we will discuss the two possibilities o→ and o−o. Finally, we will discuss how we can find edges that are created due to inducing paths and how to remove these to obtain the correct SMCM.

3.3.1 Performing experiments

The experiments discussed here play the role of the manipulations that define a causal relation (cf. Section 2.2). An experiment on a variable Vi, i.e., a randomised controlled experiment, removes the influence of other variables in the system on Vi. The experiment forces a distribution on Vi, and thereby changes the joint distribution of all variables in the system that depend directly or indirectly on Vi, but does not change the conditional distribution of other variables given values of Vi. After the randomisation, the associations of the remaining variables with Vi provide information about which variables are influenced by Vi (Neapolitan, 2003). When we perform the experiment we cut all influence of other variables on Vi. Graphically this corresponds to removing all incoming edges into Vi from the underlying DAG.
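Graphically, this "cutting" of influences is a one-line graph surgery; a minimal sketch (the function name is ours):

```python
def experiment_on(dag, v):
    """A randomised experiment on v cuts all influence of other variables
    on v: graphically, remove every edge coming into v."""
    return {(a, b) for (a, b) in dag if b != v}
```

All other edges, and hence the mechanisms P(Vj|Pa(Vj)) for Vj ≠ v, are left untouched, mirroring the statement that only the parameters of the experimented variable change.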

All parameters besides those for the variable experimented on (i.e., P(Vi|Pa(Vi))) remain the same. We then measure the influence of the manipulation on variables of interest to get the post-interventional distribution on these variables.

To analyse the results of the experiment, we compare for each variable of interest Vj the original distribution P and the post-interventional distribution PE, thus comparing P(Vj) and PE(Vj) = P(Vj|do(Vi = vi)).

We denote performing an experiment at variable Vi or a set of variables W by exp(Vi) or exp(W) respectively, and if we have to condition on some other set of variables D while performing the experiment, we denote it as exp(Vi)|D and exp(W)|D.

In general, if a variable Vi is experimented on and another variable Vj is affected by this experiment, we say that Vj varies with exp(Vi), denoted by exp(Vi) ⇝ Vj. If there is no variation in Vj we write exp(Vi) ̸⇝ Vj.
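When P and PE are available as explicit distributions, one simple way to decide exp(Vi) ⇝ Vj is to compare the two marginals of Vj, e.g. by total variation distance. This is only a sketch under idealised assumptions: the paper does not fix a comparison procedure, and on finite samples a proper statistical test would be needed instead of a tolerance.

```python
def varies_with(p, p_do, tol=1e-6):
    """Decide exp(Vi) ~> Vj by comparing the marginal P(Vj) with the
    post-interventional P(Vj | do(Vi = vi)).  Both arguments map values
    of Vj to probabilities.  (Hypothetical helper, names are ours.)"""
    vals = set(p) | set(p_do)
    tv = 0.5 * sum(abs(p.get(v, 0.0) - p_do.get(v, 0.0)) for v in vals)
    return tv > tol
```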

Although conditioning on a set of variables D might cause some variables to become probabilistically dependent, conditioning will not influence whether two variables vary with each other when performing an experiment. I.e., suppose the structure Vi → D ← Vj is given; then conditioning on D will make Vi dependent on Vj, but when we perform an experiment on Vi and check whether Vj varies with Vi, conditioning on D will make no difference.

First we have to introduce the following definition:

Definition 7. A potentially directed path (p.d. path) in a CPAG is a path made only of edges of types o→ and →, with all arrowheads in the same direction. A p.d. path from Vi to Vj is denoted as Vi 99K Vj.
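Checking whether a p.d. path of length ≥ 2 exists between two variables is a plain reachability search over the edges of type o→ and →, skipping the direct edge. The sketch below uses our own CPAG encoding (a dict from ordered pairs to edge-type strings); both the encoding and the function name are assumptions for illustration.

```python
def pd_path_exists(cpag, vi, vj):
    """Is there a potentially directed path (Definition 7) of length >= 2
    from vi to vj, i.e. one not using the direct vi-vj edge?  `cpag` maps
    (a, b) to the edge type read from a towards b ('->', 'o->', 'o-o')."""
    step = {(a, b) for (a, b), t in cpag.items() if t in ("o->", "->")}
    step.discard((vi, vj))            # exclude the direct edge: length >= 2
    frontier, seen = {vi}, {vi}
    while frontier:
        nxt = {b for (a, b) in step if a in frontier} - seen
        if vj in nxt:
            return True
        seen |= nxt
        frontier = nxt
    return False
```

This matches the situation in the worked example later on, where V2 and V4 are linked both directly and via V2 o→ V3 o→ V4.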

3.3.2 Solving o→

An overview of the different rules for solving o→ is given in Table 1.

Type 1(a)
  Given: A o→ B and exp(A) ̸⇝ B.
  Action: A ↔ B.

Type 1(b)
  Given: A o→ B, exp(A) ⇝ B, and ̸∃ p.d. path (length ≥ 2) A 99K B.
  Action: A → B.

Type 1(c)
  Given: A o→ B, exp(A) ⇝ B, and ∃ p.d. path (length ≥ 2) A 99K B.
  Action: Block all p.d. paths by conditioning on blocking set D:
    (a) exp(A)|D ⇝ B: A → B;
    (b) exp(A)|D ̸⇝ B: A ↔ B.

Table 1: An overview of the different actions needed to complete edges of type o→.

200 S. Meganck, S. Maes, P. Leray, and B. Manderick

Page 211: Proceedings of the Third European Workshop on Probabilistic

For any edge Vi o→ Vj, there is no need to perform an experiment on Vj because we know that there can be no immediate influence of Vj on Vi, so we will only perform an experiment on Vi.

If exp(Vi) ̸⇝ Vj, then there is no influence of Vi on Vj, so we know that there can be no directed edge between Vi and Vj, and thus the only remaining possibility is Vi ↔ Vj (Type 1(a)).

If exp(Vi) ⇝ Vj, then we know for sure that there is an influence of Vi on Vj; we now need to discover whether this influence is immediate or via some intermediate variables. Therefore we distinguish whether there is a potentially directed (p.d.) path between Vi and Vj of length ≥ 2 or not. If no such path exists, then the influence has to be immediate and the edge is found: Vi → Vj (Type 1(b)).

If at least one p.d. path Vi 99K Vj exists, we need to block the influence of those paths on Vj while performing the experiment, so we try to find a blocking set D for all these paths. If exp(Vi)|D ⇝ Vj, then the influence has to be immediate, because all paths of length ≥ 2 are blocked, so Vi → Vj. On the other hand, if exp(Vi)|D ̸⇝ Vj, there is no immediate influence and the edge is Vi ↔ Vj (Type 1(c)).
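The three cases above fold into one decision function; a minimal sketch, assuming the experiment outcomes have already been determined as booleans (the function name and return strings are ours):

```python
def solve_o_arrow(varies, pd_path, varies_blocked=None):
    """Resolve an edge A o-> B from experiment outcomes (Table 1).
    varies:         does B vary with exp(A)?
    pd_path:        does a p.d. path A --> B of length >= 2 exist?
    varies_blocked: does B vary with exp(A)|D, with D blocking those paths?
    """
    if not varies:
        return "A <-> B"                                   # Type 1(a)
    if not pd_path:
        return "A -> B"                                    # Type 1(b)
    return "A -> B" if varies_blocked else "A <-> B"       # Type 1(c)
```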

3.3.3 Solving o−o

An overview of the different rules for solving o−o is given in Table 2.

For any edge Vi o−o Vj, we have no information at all, so we might need to perform experiments on both variables.

If exp(Vi) ̸⇝ Vj, then there is no influence of Vi on Vj, so we know that there can be no directed edge between Vi and Vj, and thus the edge is of the following form: Vi ←o Vj, which then becomes a problem of Type 1.

If exp(Vi) ⇝ Vj, then we know for sure that there is an influence of Vi on Vj, and, as in the case of Type 1(b), we distinguish whether there is a potentially directed path between Vi and Vj of length ≥ 2 or not. If no such path exists, then the influence has to be immediate and the edge must be turned into Vi → Vj.

If at least one p.d. path Vi 99K Vj exists, we need to block the influence of those paths on Vj while performing the experiment, so we find a blocking set D as with Type 1(c). If exp(Vi)|D ⇝ Vj, then the influence has to be immediate, because all paths of length ≥ 2 are blocked, so Vi → Vj. On the other hand, if exp(Vi)|D ̸⇝ Vj, there is no immediate influence and the edge is of type Vi ←o Vj, which again becomes a problem of Type 1.

Type 2(a)
  Given: A o−o B and exp(A) ̸⇝ B.
  Action: A ←o B (⇒ Type 1).

Type 2(b)
  Given: A o−o B, exp(A) ⇝ B, and ̸∃ p.d. path (length ≥ 2) A 99K B.
  Action: A → B.

Type 2(c)
  Given: A o−o B, exp(A) ⇝ B, and ∃ p.d. path (length ≥ 2) A 99K B.
  Action: Block all p.d. paths by conditioning on blocking set D:
    (a) exp(A)|D ⇝ B: A → B;
    (b) exp(A)|D ̸⇝ B: A ←o B (⇒ Type 1).

Table 2: An overview of the different actions needed to complete edges of type o−o.

3.3.4 Removing inducing path edges

An inducing path between two variables Vi and Vj might create an edge between these two variables during learning, because the two are dependent conditional on any subset of observable variables. As mentioned before, this type of edge is not present in SMCMs, as it does not represent an immediate causal influence or a latent variable in the underlying DAG. We will call such an edge an i-false edge.

For instance, in Figure 1(a) the path V1, V2, L1, V6 is an inducing path, which causes the FCI algorithm to find an i-false edge between V1 and V6; see Figure 1(c). Another example is given in Figure 2, where the SMCM is given in (a) and the result of FCI in (b). The edge between V3 and V4 in (b) is a consequence of the inducing path via the observable variables V3, V1, V2, V4.

In order to be able to apply a causal inference algorithm, we need to remove all i-false edges from the learned structure. We need to identify the substructures that can indicate this type of edge. This is easily done by looking at any two variables that are connected by an immediate connection and that, when this edge is removed, have at least one inducing path between them. To check whether the immediate connection needs to be present, we have to block all inducing paths by performing one or more experiments on an inducing path blocking set (i-blocking set) Dip and block all other paths by conditioning on a blocking set D. If Vi and Vj are dependent, i.e., Vi ⊥̸⊥ Vj, under these circumstances the edge is correct; otherwise it can be removed.

In the example of Figure 1(c), we can block the inducing path by performing an experiment on V2, and hence can check that V1 and V6 do not covary with each other in these circumstances, so the edge can be removed.

In Table 3 an overview of the actions to resolve i-false edges is given.

Given: a pair of connected variables Vi, Vj, and a set of inducing paths Vi, . . . , Vj.
Action: Block all inducing paths Vi, . . . , Vj by conditioning on i-blocking set Dip. Block all other paths between Vi and Vj by conditioning on blocking set D. When performing all exp(Dip)|D: if Vi ⊥̸⊥ Vj, the confounding is real; else remove the edge between Vi, Vj.

Table 3: Removing inducing path edges.

3.4 Example

We will demonstrate a number of steps to discover the completely oriented SMCM (Figure 1(b)) based on the result of the FCI algorithm applied to observational data generated from the underlying DAG in Figure 1(a). The result of the FCI algorithm can be seen in Figure 3(a). We will first resolve problems of Type 1 and 2, and then remove i-false edges. The result of each step is indicated in Figure 3.

Figure 3: (a) The result of FCI on data of the underlying DAG of Figure 1(a). (b) Result of an experiment on V5. (c) After experiment on V4. (d) After experiment on V3. (e) After experiment on V2 while conditioning on V3. (f) After resolving all problems of Type 1 and 2.

• exp(V5)

  – V5 o−o V4: exp(V5) ̸⇝ V4 ⇒ V4 o→ V5 (Type 2(a))
  – V5 o→ V6: exp(V5) ̸⇝ V6 ⇒ V5 ↔ V6 (Type 1(a))

• exp(V4)

  – V4 o−o V2: exp(V4) ̸⇝ V2 ⇒ V2 o→ V4 (Type 2(a))
  – V4 o−o V3: exp(V4) ̸⇝ V3 ⇒ V3 o→ V4 (Type 2(a))
  – V4 o→ V5: exp(V4) ⇝ V5 ⇒ V4 → V5 (Type 1(b))
  – V4 o→ V6: exp(V4) ⇝ V6 ⇒ V4 → V6 (Type 1(b))

• exp(V3)

  – V3 o−o V2: exp(V3) ̸⇝ V2 ⇒ V2 o→ V3 (Type 2(a))
  – V3 o→ V4: exp(V3) ⇝ V4 ⇒ V3 → V4 (Type 1(b))

• exp(V2) and exp(V2)|V3, because there are two p.d. paths between V2 and V4

  – V2 o−o V1: exp(V2) ̸⇝ V1 ⇒ V1 o→ V2 (Type 2(a))
  – V2 o→ V3: exp(V2) ⇝ V3 ⇒ V2 → V3 (Type 1(b))
  – V2 o→ V4: exp(V2)|V3 ⇝ V4 ⇒ V2 → V4 (Type 1(c))

After resolving all problems of Type 1 and 2, we end up with the SMCM structure shown in Figure 3(f). This representation is no longer consistent with the MAG representation, since there are bi-directed edges between two variables on a directed path, i.e., V2, V6. There is a potentially i-false edge V1 ↔ V6 in the structure, with inducing path V1, V2, V6, so we need to perform an experiment on V2, blocking all other paths between V1 and V6 (this is also done by exp(V2) in this case). Given that the original structure is as in Figure 1(a), performing exp(V2) shows that V1 and V6 are independent, i.e., exp(V2) : (V1 ⊥⊥ V6). Thus the bi-directed edge between V1 and V6 is removed, giving us the SMCM of Figure 1(b).

4 Parametrisation of SMCMs

As mentioned before, in their work on causal inference, Tian and Pearl provide an algorithm for performing causal inference given knowledge of the structure of an SMCM and the joint probability distribution (JPD) over the observable variables. However, they do not provide a parametrisation to efficiently store the JPD over the observables.

We start this section by discussing the factorisation for SMCMs introduced in (Tian and Pearl, 2002). From that result we derive an additional representation for SMCMs and a parametrisation that facilitates probabilistic and causal inference. We will also discuss how these parameters can be learned from data.

4.1 Factorising with Latent Variables

Consider an underlying DAG with observable variables V = {V1, . . . , Vn} and latent variables L = {L1, . . . , Ln′}. Then the joint probability distribution can be written as the following mixture of products:

P(v) = Σ_{lk : Lk ∈ L} [ Π_{Vi ∈ V} P(vi | Pa(vi), LPa(vi)) · Π_{Lj ∈ L} P(lj) ].   (1)

Taking into account that in a SMCM the la-tent variables are implicitly represented by bi-directed edges, we introduce the following defi-nition.

Definition 8. In a SMCM, the set of observable variables can be partitioned by assigning two variables to the same group iff they are connected by a bi-directed path. We call such a group a c-component (from “confounded component”) (Tian and Pearl, 2002).

For instance, in Figure 1(b) variables V2, V5, V6 belong to the same c-component. Then it can be readily seen that c-components and their associated latent variables form respective partitions of the observable and latent variables. Let Q[Si] denote the contribution of a c-component with observable variables Si ⊂ V to the mixture of products in Equation 1. Then we can rewrite the JPD as follows:

P(v) = Π_{i ∈ {1,...,k}} Q[Si].

Finally, (Tian and Pearl, 2002) proved that each Q[S] can be calculated as follows. Let Vo1 < . . . < Von be a topological order over V, let V(i) = {Vo1, . . . , Voi} for i = 1, . . . , n, and let V(0) = ∅. Then

Q[S] = Π_{Vi ∈ S} P(vi | (Ti ∪ Pa(Ti)) \ {Vi})   (2)

where Ti is the c-component of the SMCM G reduced to variables V(i) that contains Vi. The SMCM G reduced to a set of variables V′ ⊂ V is the graph obtained by removing from the graph all variables V \ V′ and the edges that are connected to them.

In the rest of this section we will develop a method for deriving a DAG from a SMCM.

Learning Semi-Markovian Causal Models using Experiments 203


We will show that the classical factorisation Π P(vi|Pa(vi)) associated with this DAG is the same as the one associated with the SMCM, as above.

4.2 Parametrised representation

Here we first introduce an additional representation for SMCMs, then we show how it can be parametrised and, finally, we discuss how this new representation could be optimised.

4.2.1 PR-representation

Consider Vo1 < . . . < Von to be a topological order O over the observable variables V, only considering the directed edges of the SMCM, and let V(i) = {Vo1, . . . , Voi}, i = 1, . . . , n, and V(0) = ∅. Then Table 4 shows how the parametrised (PR-) representation can be obtained from the original SMCM structure.

Given a SMCM G and a topological ordering O, the PR-representation has these properties:

1. The nodes are V, i.e., the observable variables of the SMCM.

2. The directed edges are the same as in the SMCM.

3. The confounded edges are replaced by a number of directed edges as follows. Add an edge from node Vi to node Vj iff:

a) Vi ∈ (Tj ∪ Pa(Tj)), where Tj is the c-component of G reduced to variables V(j) that contains Vj,

b) and there was not already an edge between nodes Vi and Vj.

Table 4: Obtaining the PR-representation.

This way each variable becomes a child of the variables it would condition on in the calculation of the contribution of its c-component as in Equation (2).
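The construction of Table 4 can be sketched in Python. This is our reading of the table, not code from the paper, and the three-variable SMCM in the usage line is a hypothetical example.

```python
def pr_representation(order, directed, bidirected):
    """Build the directed edges of the PR-representation (sketch of Table 4).

    order: topological order of the observables (directed edges only)
    directed: set of (parent, child) edges; bidirected: set of confounded pairs
    """
    edges = set(directed)
    for j, vj in enumerate(order):
        visible = set(order[:j + 1])  # the variables V(j)
        # c-component T_j of the SMCM reduced to V(j) that contains V_j
        tj, frontier = {vj}, [vj]
        while frontier:
            v = frontier.pop()
            for a, b in bidirected:
                if a in visible and b in visible:
                    for u, w in ((a, b), (b, a)):
                        if u == v and w not in tj:
                            tj.add(w)
                            frontier.append(w)
        # parents (in the SMCM) of the members of T_j
        pa_tj = {p for (p, c) in directed if c in tj}
        for vi in (tj | pa_tj) - {vj}:
            if (vi, vj) not in edges and (vj, vi) not in edges:
                edges.add((vi, vj))
    return edges

# Hypothetical SMCM: V1 -> V2 with V2 <-> V3, order V1 < V2 < V3
print(sorted(pr_representation(["V1", "V2", "V3"],
                               {("V1", "V2")}, {("V2", "V3")})))
# [('V1', 'V2'), ('V1', 'V3'), ('V2', 'V3')]
```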

Figure 4(a) shows the PR-representation of the SMCM in Figure 1(a). The topological order that was used here is V1 < V2 < V3 < V4 < V5 < V6, and the directed edges that have been added are V1 → V5, V2 → V5, V1 → V6, V2 → V6, and V5 → V6.

Figure 4: (a) The PR-representation applied to the SMCM of Figure 1(b). (b) Junction tree representation of the DAG in (a). (The junction tree in (b) has cliques {V1, V2, V4, V5, V6} and {V2, V3, V4}, with separator {V2, V4}.)

The resulting DAG is an I-map (Pearl, 1988), over the observable variables, of the independence model represented by the SMCM. This means that all the independencies that can be derived from the new graph must also be present in the JPD over the observable variables. This property can be more formally stated as the following theorem.

Theorem 9. The PR-representation PR derived from a SMCM S is an I-map of that SMCM.

Proof. Consider two variables A and B in PR that are not connected by an edge. This means that they are conditionally independent, since in a stable DAG two variables that are not connected by an edge are conditionally independent given some set of variables. PR is stable, as we consider only stable problems (Assumption 1 in Section 2.4).

Then, from the method for constructing PR, we can conclude that S (i) contains no directed edge between A and B, (ii) contains no bi-directed edge between A and B, and (iii) contains no inducing path between A and B. Property (iii) holds because the only inducing paths that are possible in a SMCM are those between a member of a c-component and the immediate parent of another variable of the c-component, and in these cases the method for constructing PR adds an edge between those variables. Because of (i), (ii), and (iii) we can conclude that A and B are independent in S.

204 S. Meganck, S. Maes, P. Leray, and B. Manderick


4.2.2 Parametrisation

For this DAG we can use the same parametrisation as for classical BNs, i.e., learning P(vi|Pa(vi)) for each variable, where Pa(vi) denotes the parents in the new DAG. In this way the JPD over the observable variables factorises as in a classical BN, i.e., P(v) = Π P(vi|Pa(vi)). This follows immediately from the definition of a c-component and from Equation (2).

4.2.3 Optimising the Parametrisation

We have mentioned that the number of edges added during the creation of the PR-representation depends on the topological order of the SMCM.

As this order is not unique, choosing an order in which variables with fewer parents have precedence will cause fewer edges to be added to the DAG. This is because most of the added edges go from parents of c-component members to c-component members that are topological descendants.

By choosing an optimal topological order, we can conserve more conditional independence relations of the SMCM and thus make the graph sparser, leading to a more efficient parametrisation.

4.2.4 Learning parameters

As the PR-representation of SMCMs is a DAG, as in the classical Bayesian network formalism, the parameters that have to be learned are P(vi|Pa(vi)). Therefore, techniques such as ML and MAP estimation (Heckerman, 1995) can be applied to perform this task.
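As a minimal illustration (ours, with hypothetical binary data), ML estimation of such a conditional probability table on complete data reduces to frequency counting:

```python
from collections import Counter

def ml_cpt(data, child, parents):
    """Maximum-likelihood estimate of P(child | parents) from complete data.
    data: list of dicts mapping variable name -> observed value."""
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    marg = Counter(tuple(row[p] for p in parents) for row in data)
    # Each entry is count(parents=pa, child=v) / count(parents=pa)
    return {(pa, v): n / marg[pa] for (pa, v), n in joint.items()}

# Hypothetical data for a variable V5 whose PR-parents are V1 and V2:
data = [{"V1": 0, "V2": 0, "V5": 0}, {"V1": 0, "V2": 0, "V5": 1},
        {"V1": 1, "V2": 0, "V5": 1}, {"V1": 1, "V2": 0, "V5": 1}]
cpt = ml_cpt(data, "V5", ["V1", "V2"])
print(cpt[((0, 0), 1)])  # 0.5
```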

4.3 Probabilistic inference

One of the best-known probabilistic inference algorithms for models without latent variables is the junction tree algorithm (JT) (Lauritzen and Spiegelhalter, 1988).

These techniques cannot immediately be applied to SMCMs for two reasons. First of all, until now no efficient parametrisation for this type of model was available; secondly, it is not clear how to handle the bi-directed edges that are present in SMCMs.

We have solved this problem by first transforming the SMCM into its PR-representation, which allows us to apply the junction tree inference algorithm. This is a consequence of the fact that, as previously mentioned, the PR-representation is an I-map over the observable variables. As the JT algorithm is based only on independencies in the DAG, applying it to an I-map of the problem gives correct results. See Figure 4(b) for the junction tree obtained from the parametrised representation in Figure 4(a).

Note that any other classical probabilistic inference technique that only uses conditional independencies between variables could also be applied to the PR-representation.

4.4 Causal inference

Tian and Pearl (2002) developed an algorithm for performing causal inference; however, as mentioned before, they did not provide an efficient parametrisation.

Richardson and Spirtes (2003) show causal inference in AGs on an example, but a detailed approach is not provided, and the problem of what to do when some of the parents of a variable are latent is not solved.

By definition, in the PR-representation the parents of each variable are exactly those variables that have to be conditioned on in order to obtain the factor of that variable in the calculation of the c-component; see Table 4 and (Tian and Pearl, 2002). Thus, the PR-representation provides all the necessary quantitative information, while the original structure of the SMCM provides the necessary structural information, for the algorithm by Tian and Pearl to be applied.

5 Conclusions and Perspectives

In this paper we have proposed a number of solutions to problems that arise when using SMCMs in practice.

More precisely, we showed that there is a big gap between the models that can be learned from data alone and the models that are used in theory. We showed that it is important to retrieve the fully oriented structure of a SMCM,



and discussed how to obtain this from a given CPAG by performing experiments.

For future work we would like to relax the assumptions made in this paper. First of all, we want to study the implications of allowing two types of edges between two variables, i.e., confounding as well as an immediate causal relationship. Another direction for possible future work would be to study the effect of allowing multiple joint experiments in other cases than when removing inducing path edges.

Furthermore, we believe that applying the orientation and tail augmentation rules of (Zhang and Spirtes, 2005) after each experiment might help to reduce the number of experiments needed to fully orient the structure. In this way we could extend to SMCMs our previous results (Meganck et al., 2006) on minimising the total number of experiments in causal models without latent variables. This would allow us to compare empirical results with the theoretical bounds developed in (Eberhardt et al., 2005).

Up until now, SMCMs have not been parametrised other than by the entire joint probability distribution. Another contribution of our paper is that we showed that, using an alternative representation, we can parametrise SMCMs in order to perform probabilistic as well as causal inference. Furthermore, this new representation allows the parameters to be learned using classical methods.

We have informally pointed out that the choice of a topological order when creating the PR-representation influences its size and thus its efficiency. We would like to investigate this property in a more formal manner. Finally, we have started implementing the techniques introduced in this paper into the structure learning package (SLP)1 of the Bayesian networks toolbox (BNT)2 for MATLAB.

Acknowledgments

This work was partially funded by an IWT-scholarship and by the IST Programme of the European Community, under the PASCAL Network of Excellence, IST-2002-506778.

1 http://banquiseasi.insa-rouen.fr/projects/bnt-slp/
2 http://bnt.sourceforge.net/

References

F. Eberhardt, C. Glymour, and R. Scheines. 2005. On the number of experiments sufficient and in the worst case necessary to identify all causal relations among n variables. In Proc. of the 21st Conference on Uncertainty in Artificial Intelligence (UAI), pages 178–183. Edinburgh.

D. Heckerman. 1995. A tutorial on learning with Bayesian networks. Technical report, Microsoft Research.

S. L. Lauritzen and D. J. Spiegelhalter. 1988. Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society, Series B, 50:157–224.

S. Meganck, P. Leray, and B. Manderick. 2006. Learning causal Bayesian networks from observations and experiments: A decision theoretic approach. In Modeling Decisions in Artificial Intelligence, LNAI 3885, pages 58–69.

R. E. Neapolitan. 2003. Learning Bayesian Networks. Prentice Hall.

J. Pearl. 1988. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann.

J. Pearl. 2000. Causality: Models, Reasoning and Inference. MIT Press.

T. Richardson and P. Spirtes. 2002. Ancestral graph Markov models. Technical Report 375, Dept. of Statistics, University of Washington.

T. Richardson and P. Spirtes. 2003. Causal inference via ancestral graph models, chapter 3. Oxford Statistical Science Series: Highly Structured Stochastic Systems. Oxford University Press.

P. Spirtes, C. Glymour, and R. Scheines. 2000. Causation, Prediction and Search. MIT Press.

J. Tian and J. Pearl. 2002. On the identification of causal effects. Technical Report R-290-L, UCLA C.S. Lab.

J. Zhang and P. Spirtes. 2005. A characterization of Markov equivalence classes for ancestral graphical models. Technical Report 168, Dept. of Philosophy, Carnegie Mellon University.



Geometry of Rank Tests

Jason Morton, Lior Pachter, Anne Shiu, Bernd Sturmfels, and Oliver Wienand
Department of Mathematics, UC Berkeley

Abstract

We study partitions of the symmetric group which have desirable geometric properties. The statistical tests defined by such partitions involve counting all permutations in the equivalence classes. These permutations are the linear extensions of partially ordered sets specified by the data. Our methods refine rank tests of non-parametric statistics, such as the sign test and the runs test, and are useful for the exploratory analysis of ordinal data. Convex rank tests correspond to probabilistic conditional independence structures known as semi-graphoids. Submodular rank tests are classified by the faces of the cone of submodular functions, or by Minkowski summands of the permutohedron. We enumerate all small instances of such rank tests. Graphical tests correspond to both graphical models and to graph associahedra, and they have excellent statistical and algorithmic properties.

1 Introduction

The non-parametric approach to statistics was introduced by (Pitman, 1937). The emergence of microarray data in molecular biology has led to a number of new tests for identifying significant patterns in gene expression time series; see e.g. (Willbrand, 2005). This application motivated us to develop a mathematical theory of rank tests. We propose that a rank test is a partition of Sn induced by a map τ : Sn → T from the symmetric group of all permutations of [n] = {1, . . . , n} onto a set T of statistics. The statistic τ(π) is the signature of the permutation π ∈ Sn. Each rank test defines a partition of Sn into classes, where π and π′ are in the same class if and only if τ(π) = τ(π′). We identify T = image(τ) with the set of all classes in this partition of Sn. Assuming the uniform distribution on Sn, the probability of seeing a particular signature t ∈ T is 1/n! times |τ−1(t)|. The computation of a p-value for a given permutation π ∈ Sn typically amounts to summing

Pr(π′) = (1/n!) · |τ−1(τ(π′))|   (1)

over all permutations π′ with Pr(π′) < Pr(π). In Section 2 we explain how existing rank tests can be understood from our point of view.

In Section 3 we describe the class of convex rank tests, which captures properties of tests used in practice. We work in the language of algebraic combinatorics (Stanley, 1997). Convex rank tests are in bijection with polyhedral fans that coarsen the hyperplane arrangement of Sn, and with conditional independence structures known as semi-graphoids (Studeny, 2005).

Section 4 is devoted to convex rank tests that are induced by submodular functions. These submodular rank tests are in bijection with Minkowski summands of the (n−1)-dimensional permutohedron and with structural imset models. Furthermore, these tests are at a suitable level of generality for the biological applications that motivated us. We make the connections to polytopes and independence models concrete by classifying all convex rank tests for n ≤ 5.

In Section 5 we discuss the class of graphical tests. In mathematics, these correspond to graph associahedra, and in statistics to graphical models. The equivalence of these two structures is shown in Theorem 18. The implementation of convex rank tests requires the efficient enumeration of linear extensions of partially ordered sets (posets). A key ingredient is a highly optimized method for computing distributive lattices. Our software is discussed in Section 6.


2 Rank tests and posets

A permutation π in Sn is a total order on [n] = {1, . . . , n}. This means that π is a set of (n choose 2) ordered pairs of elements in [n]. If π and π′ are permutations then π ∩ π′ is a partial order.

In the applications we have in mind, the data are vectors u ∈ Rn with distinct coordinates. The permutation associated with u is the total order π = {(i, j) ∈ [n] × [n] : ui < uj}. We shall employ two other ways of writing this permutation. The first is the rank vector ρ = (ρ1, . . . , ρn), whose defining properties are {ρ1, . . . , ρn} = [n] and ρi < ρj if and only if ui < uj. That is, the coordinate of the rank vector with value i is at the same position as the ith smallest coordinate of u. The second is the descent vector δ = (δ1, . . . , δn), defined by uδi > uδi+1. The ith coordinate of the descent vector is the position of the ith largest value of u. For example, if u = (11, 7, 13) then its permutation is represented by π = {(2, 1), (1, 3), (2, 3)}, by ρ = (2, 1, 3), or by δ = (3, 1, 2).
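Both encodings can be computed by sorting; a short sketch (ours) reproducing the example u = (11, 7, 13):

```python
def rank_vector(u):
    """rho_i is the rank of u_i: rho_i < rho_j iff u_i < u_j."""
    order = sorted(range(len(u)), key=lambda i: u[i])
    rho = [0] * len(u)
    for rank, i in enumerate(order, start=1):
        rho[i] = rank
    return rho

def descent_vector(u):
    """delta_i is the 1-based position of the ith largest value of u."""
    return [i + 1 for i in sorted(range(len(u)), key=lambda i: -u[i])]

u = (11, 7, 13)
print(rank_vector(u))     # [2, 1, 3]
print(descent_vector(u))  # [3, 1, 2]
```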

A permutation π is a linear extension of a partial order P on [n] if P ⊆ π. We write L(P) ⊆ Sn for the set of linear extensions of P. A partition τ of the symmetric group Sn is a pre-convex rank test if the following axiom holds:

(PC) If τ(π) = τ(π′) and π′′ ∈ L(π ∩ π′), then τ(π) = τ(π′) = τ(π′′).

Note that π′′ ∈ L(π ∩ π′) means π ∩ π′ ⊆ π′′. For n = 3 the number of all rank tests is the Bell number B6 = 203. Of these 203 set partitions of S3, only 40 satisfy the axiom (PC).

Each class C of a pre-convex rank test τ corresponds to a poset P on [n]; namely, P is the intersection of all total orders in that class: P = ∩_{π∈C} π. The axiom (PC) ensures that C coincides with the set L(P) of all linear extensions of P. The inclusion C ⊆ L(P) is clear. For the reverse inclusion, note that from any permutation π in L(P), we can obtain any other π′ in L(P) by a sequence of reversals (a, b) 7→ (b, a), where each intermediate π is also in L(P). Assume π0 ∈ L(P) and π1 ∈ C differ by one reversal: (a, b) ∈ π0, (b, a) ∈ π1. Then (b, a) ∉ P, so there is some π2 ∈ C such that (a, b) ∈ π2; thus, π0 ∈ L(π1 ∩ π2) by (PC). This shows π0 ∈ C.
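For small n, linear extensions can be enumerated by brute force, which also illustrates the intersection construction above. A sketch (ours):

```python
from itertools import permutations

def as_total_order(seq):
    """The set of ordered pairs (i, j) with i appearing before j in seq."""
    return {(seq[i], seq[j])
            for i in range(len(seq)) for j in range(i + 1, len(seq))}

def linear_extensions(n, poset):
    """All permutations of 1..n (as tuples) whose total order contains poset."""
    return [p for p in permutations(range(1, n + 1))
            if poset <= as_total_order(p)]

# Intersecting two total orders yields a partial order whose linear
# extensions include both, as in the setting of axiom (PC):
pi, pi2 = (1, 2, 3), (2, 1, 3)
P = as_total_order(pi) & as_total_order(pi2)  # {(1, 3), (2, 3)}
print(linear_extensions(3, P))                # [(1, 2, 3), (2, 1, 3)]
```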

A pre-convex rank test is thus an unordered collection of posets P1, . . . , Pk on [n] with the property that Sn is the disjoint union of the subsets L(P1), . . . , L(Pk). The posets Pi that represent the classes in a pre-convex rank test capture the shapes of data vectors.

Example 1 (The sign test for paired data). The sign test is performed on data that are paired as two vectors u = (u1, u2, . . . , um) and v = (v1, v2, . . . , vm). The null hypothesis is that the median of the differences ui − vi is 0. The test statistic is the number of differences that are positive. This test is a rank test, because u and v can be transformed into the overall ranks of the n = 2m values, and the rank vector entries can then be compared. This test coarsens the convex rank test which is the MSS of Section 4 with K = {{1, m+1}, {2, m+2}, . . .}.

Example 2 (Runs tests). A runs test can be used when there is a natural ordering on the data points, such as in a time series. The data are transformed into a sequence of 'pluses' and 'minuses', and the null hypothesis is that the number of observed runs is no more than that expected by chance. A runs test is a coarsening of the convex rank test τ described in (Willbrand, 2005, Section 6.1.1) and in Example 4.

These two examples suggest that many tests from classical statistics have a natural refinement by a pre-convex rank test. The term "pre-convex" refers to the following interpretation of the axiom (PC). Consider any two vectors u and u′ in Rn, and a convex combination u′′ = λu + (1 − λ)u′, with 0 < λ < 1. If π, π′, π′′ are the permutations of u, u′, u′′, then π′′ ∈ L(π ∩ π′). Thus the regions in Rn specified by a pre-convex rank test are convex cones.

3 Convex rank tests

A fan in Rn is a finite collection F of polyhedral cones which satisfies the following properties: (i) if C ∈ F and C′ is a face of C, then C′ ∈ F; (ii) if C, C′ ∈ F, then C ∩ C′ is a face of C. Two vectors u and v in Rn are permutation equivalent when ui < uj if and only if

208 J. Morton, L. Pachter, A. Shiu, B. Sturmfels, and O. Wienand


vi < vj, and ui = uj if and only if vi = vj, for all i, j ∈ [n]. The permutation equivalence classes (of which there are 13 for n = 3) induce a fan which we call the Sn-fan. The maximal cones in the Sn-fan, which are the closures of the permutation equivalence classes corresponding to total orders, are indexed by permutations δ in Sn. A coarsening of the Sn-fan is a fan F such that every permutation equivalence class of Rn is fully contained in a cone C of F; F defines a partition of Sn because each maximal cone of the Sn-fan is contained in some cone C ∈ F. We define a convex rank test to be a partition of Sn defined by a coarsening of the Sn-fan. We identify the fan with that test.

Two maximal cones of the Sn-fan share a wall if there exists an index k such that δk = δ′k+1, δk+1 = δ′k, and δi = δ′i for i ≠ k, k+1. That is, the corresponding permutations δ and δ′ differ by an adjacent transposition. To such an unordered pair {δ, δ′}, we associate the following conditional independence (CI) statement:

δk ⊥⊥ δk+1 | {δ1, . . . , δk−1}.   (2)

This formula defines a map from the set of walls of the Sn-fan onto the set of all CI statements

Tn = { i ⊥⊥ j | K : K ⊆ [n]\{i, j} }.

The map from walls to CI statements is not injective; there are (n−k−1)!(k−1)! walls which are labelled by the statement (2).
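The labelling (2) of a wall is straightforward to compute; the following sketch (ours) encodes a CI statement i ⊥⊥ j | K as a triple (i, j, K), and the pair of descent vectors in the usage line is an arbitrary choice:

```python
def wall_ci_statement(delta, k):
    """CI statement for the wall between the descent vectors delta and
    delta' that differ by swapping positions k and k+1 (k is 1-based)."""
    i, j = sorted((delta[k - 1], delta[k]))  # the two swapped values
    K = frozenset(delta[:k - 1])             # the values preceding them
    return (i, j, K)  # read as: i independent of j given K

# The wall between descent vectors 1324 and 1234 (transposition at k = 2):
print(wall_ci_statement((1, 3, 2, 4), 2))  # (2, 3, frozenset({1}))
```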

Any convex rank test F is characterized by the collection of walls {δ, δ′} that are removed when passing from the Sn-fan to F. So, from (2), any convex rank test F maps to a set MF of CI statements corresponding to missing walls. Recall from (Matus, 2004) and (Studeny, 2005) that a subset M of Tn is a semi-graphoid if the following axiom holds:

i ⊥⊥ j | K∪{ℓ} ∈ M and i ⊥⊥ ℓ | K ∈ M implies i ⊥⊥ j | K ∈ M and i ⊥⊥ ℓ | K∪{j} ∈ M.

Theorem 3. The map F 7→ MF is a bijection between convex rank tests and semi-graphoids.

Example 4 (Up-down analysis for n = 3). The test in (Willbrand, 2005) is a convex rank test

Figure 1: The permutohedron P3 and the S3-fan projected to the plane. Each permutation is represented by its descent vector δ = δ1δ2δ3. (The six vertices are 123, 132, 312, 321, 231, 213, and the missing walls carry the labels 1 ⊥⊥ 3 | ∅ and 1 ⊥⊥ 3 | 2.)

and is visualized in Figure 1. Permutations are in the same class if they are connected by a solid edge; there are four classes. In the S3-fan, the two missing walls are labeled by conditional independence statements as defined in (2).

Example 5 (Up-down analysis for n = 4). The test F in (Willbrand, 2005) is shown in Figure 2. The double edges correspond to the 12 CI statements in MF. There are 8 classes; e.g., the class {3412, 3142, 1342, 1324, 3124} consists of the 5 permutations with up-down pattern (−, +, −).

Our proof of Theorem 3 rests on translating the semi-graphoid axiom for a set of CI statements into geometric statements about the corresponding set of edges of the permutohedron.

The Sn-fan is the normal fan (Ziegler, 1995) of the permutohedron Pn, which is the convex hull of the vectors (ρ1, . . . , ρn) ∈ Rn, where ρ runs over all rank vectors of permutations in Sn. The edges of Pn correspond to walls and are thus labeled with CI statements. A collection of parallel edges of Pn perpendicular to a hyperplane {xi = xj} corresponds to the set of CI statements i ⊥⊥ j | K, where K ranges over all subsets of [n]\{i, j}. The two-dimensional faces of Pn are squares and regular hexagons, and two edges of Pn have the same label in Tn

Geometry of Rank Tests 209


Figure 2: The permutohedron P4 with vertices marked by descent vectors δ. The test "up-down analysis" is indicated by the double edges.

if, but not only if, they are opposite edges of a square. A semi-graphoid M can be identified with the set M of edges with labels from M. The semi-graphoid axiom translates into a geometric condition on the hexagonal faces of Pn.

Observation 6. A set M of edges of the permutohedron Pn is a semi-graphoid if and only if M satisfies the following two axioms:

Square axiom: Whenever an edge of a square is in M, then the opposite edge is also in M.

Hexagon axiom: Whenever two adjacent edges of a hexagon are in M, then the two opposite edges of that hexagon are also in M.

Let M be the subgraph of the edge graph of Pn defined by the statements in M. Then the classes of the rank test defined by M are given by the permutations in the path-connected components of M. We regard a path from δ to δ′ on Pn as a word σ(1) · · · σ(l) in the free associative algebra A generated by the adjacent transpositions of [n]. For example, the word σ23 := (23) gives the path from δ to δ′ = σ23δ = δ1δ3δ2δ4 . . . δn. The following relations in A define a presentation of the group algebra of Sn:

(BS) σi,i+1 σi+k+1,i+k+2 − σi+k+1,i+k+2 σi,i+1,

(BH) σi,i+1 σi+1,i+2 σi,i+1 − σi+1,i+2 σi,i+1 σi+1,i+2,

(BN) σ²i,i+1 − 1,

where suitable i and k vary over [n]. The first two are the braid relations, and the last represents the idempotency of each transposition.

Now, we regard these relations as properties of a set of edges of Pn, by identifying a word and a permutation δ with the set of edges that comprise the corresponding path in Pn. For example, a set satisfying (BS) is one such that, starting from any δ, the edges of the path σi,i+1 σi+k+1,i+k+2 are in the set if and only if the edges of the path σi+k+1,i+k+2 σi,i+1 are in the set. Note then that (BS) is the square axiom, and (BH) is a weakening of the hexagon axiom of semi-graphoids. That is, implications in either direction hold in a semi-graphoid. However, (BN) holds only directionally in a semi-graphoid: if an edge lies in the semi-graphoid, then its two vertices are in the same class; but the empty path at some vertex δ certainly does not imply the presence of all incident edges in the semi-graphoid. Thus, for a semi-graphoid, we have (BS) and (BH), but must replace (BN) with the directional version

(BN′) σ²i,i+1 → 1.

Consider a path p from δ to δ′ in a semi-graphoid. A result of (Tits, 1968) gives the following lemma; see also (Brown, 1989, pp. 49–51).

Lemma 7. If M is a semi-graphoid and δ and δ′ lie in the same class of M, then so do all shortest paths on Pn between them.

We are now equipped to prove Theorem 3. Note that we have demonstrated that semi-graphoids and convex rank tests can be regarded as sets of edges of Pn, so we will show that their axiom systems are equivalent. We first show that a semi-graphoid satisfies (PC). Consider δ, δ′ in the same class C of a semi-graphoid, and let δ′′ ∈ L(δ ∩ δ′). Further, let p be a shortest path from δ to δ′′ (so, pδ = δ′′), and let q be a shortest path from δ′′ to δ′. We claim



that qp is a shortest path from δ to δ′, and thus δ′′ ∈ C by Lemma 7. Suppose qp is not a shortest path. Then we can obtain a shorter path in the semi-graphoid by some sequence of substitutions according to (BS), (BH), and (BN′). Only (BN′) decreases the length of a path, so the sequence must involve (BN′). Therefore, there are some i, j in [n] such that their positions relative to each other are reversed twice in qp. But p and q are shortest paths, hence one reversal occurs in each of p and q. Then δ and δ′ agree on whether i > j or j > i, but the reverse holds in δ′′, contradicting δ′′ ∈ L(δ ∩ δ′). Thus every semi-graphoid is a pre-convex rank test.

Now, we show that a semi-graphoid corresponds to a fan. Consider the cone corresponding to a class C. We need only show that it meets any other cone in a shared face. Since C is a cone of a coarsening of the Sn-fan, each nonmaximal face of C lies in a hyperplane H = {xi = xj}. Suppose a face of C coincides with the hyperplane H and that i > j in C. A vertex δ borders H if i and j are adjacent in δ. We will show that if δ, δ′ ∈ C border H, then their reflections δ̄ = δ1 . . . ji . . . δn and δ̄′ = δ′1 . . . ji . . . δ′n both lie in some class C′. Consider a 'great circle' path between δ and δ′ which stays closest to H: all vertices in the path have i and j separated by at most one position, and no two consecutive vertices have i and j nonadjacent. This is a shortest path, so it lies in C, by Lemma 7. Using the square and hexagon axioms (Observation 6), we see that the reflection of the path across H is a path in the semi-graphoid that connects δ̄ to δ̄′ (Figure 3). Thus a semi-graphoid is a convex rank test.

Figure 3: Reflecting a path across a hyperplane (here xi = xj).

Finally, if M is a set of edges of Pn representing a convex rank test, then it is easy to show that M satisfies the square and hexagon axioms. This completes the proof of Theorem 3.

Remark 8. For n = 3 there are 40 pre-convex rank tests, but only 22 of them are convex rank tests. The corresponding CI models are shown in Figure 5.6 on page 108 of (Studeny, 2005).

4 The submodular cone

In this section we examine a subclass of the convex rank tests. Let 2^[n] denote the collection of all subsets of [n] = {1, 2, . . . , n}. Any real-valued function w : 2^[n] → R defines a convex polytope Qw of dimension ≤ n − 1 as follows:

Qw := { x ∈ Rn : x1 + x2 + · · · + xn = w([n]) and Σ_{i∈I} xi ≤ w(I) for all ∅ ≠ I ⊆ [n] }.

A function w : 2^[n] → R is called submodular if w(I) + w(J) ≥ w(I ∩ J) + w(I ∪ J) for all I, J ⊆ [n].
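For small ground sets, submodularity can be checked by brute force over all pairs of subsets. The sketch and examples below are ours; w(I) = min(|I|, 2) is the rank function of a uniform matroid and hence submodular, while |I|² is not.

```python
from itertools import combinations

def subsets(ground):
    """All subsets of the ground tuple, as frozensets."""
    return [frozenset(c) for r in range(len(ground) + 1)
            for c in combinations(ground, r)]

def is_submodular(w, ground):
    """Check w(I) + w(J) >= w(I & J) + w(I | J) for all subsets I, J."""
    return all(w[I] + w[J] >= w[I & J] + w[I | J]
               for I in subsets(ground) for J in subsets(ground))

ground = (1, 2, 3)
w = {I: min(len(I), 2) for I in subsets(ground)}  # matroid rank function
print(is_submodular(w, ground))  # True
```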

Proposition 9. A function w : 2^[n] → R is submodular if and only if the normal fan of the polyhedron Qw is a coarsening of the Sn-fan.

This follows from greedy maximization as in (Lovasz, 1983). Note that the function w is submodular if and only if the optimal solution of

maximize u · x subject to x ∈ Qw

depends only on the permutation equivalence class of u. Thus, solving this linear programming problem constitutes a convex rank test. Any such test is called a submodular rank test.
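For submodular w, this linear program is solved by the classical greedy rule: visit the coordinates in the descent order of u and assign the marginal values of w. A sketch (ours), using the submodular w(I) = min(|I|, 2) as a hypothetical example:

```python
from itertools import combinations

def greedy_optimum(u, w):
    """Greedy optimal vertex of Q_w for the objective u . x:
    visit coordinates in decreasing order of u, take marginal values of w.
    w maps frozensets of 1..n to reals; valid when w is submodular."""
    n = len(u)
    delta = sorted(range(1, n + 1), key=lambda i: -u[i - 1])  # descent order
    x, prefix = [0.0] * n, frozenset()
    for i in delta:
        x[i - 1] = w[prefix | {i}] - w[prefix]  # marginal value of adding i
        prefix = prefix | {i}
    return x

w = {frozenset(c): min(len(c), 2)
     for r in range(4) for c in combinations((1, 2, 3), r)}
# The optimum depends on u only through its descent vector (3, 1, 2):
print(greedy_optimum((11, 7, 13), w))  # [1.0, 0.0, 1.0]
```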

A convex polytope is a (Minkowski) summand of another polytope if the normal fan of the latter refines the normal fan of the former. The polytope Qw that represents a submodular rank test is a summand of the permutohedron Pn.

Theorem 10. The following combinatorial objects are equivalent for any positive integer n:
1. submodular rank tests,
2. summands of the permutohedron Pn,
3. structural conditional independence models,
4. faces of the submodular cone Cn in R^(2^n).

We have 1 ⇐⇒ 2 from Proposition 9, and 1 ⇐⇒ 3 follows from (Studeny, 2005). Further, 3 ⇐⇒ 4 holds by definition.



The submodular cone is the cone Cn of all submodular functions w : 2^[n] → R. Working modulo its lineality space Cn ∩ (−Cn), we regard Cn as a pointed cone of dimension 2^n − n − 1.

Remark 11. All 22 convex rank tests for n = 3 are submodular. The submodular cone C3 is a 4-dimensional cone whose base is a bipyramid. The polytopes Qw, as w ranges over the faces of C3, are all the Minkowski summands of P3.

Proposition 12. For n ≥ 4, there exist convex rank tests that are not submodular rank tests. Equivalently, there are fans that coarsen the Sn-fan but are not the normal fan of any polytope.

This result is stated in Section 2.2.4 of (Studeny, 2005) in the following form: "There exist semi-graphoids that are not structural."

We answered Question 4.5 posed in (Postnikov, 2006) by finding a non-submodular convex rank test in which all the posets Pi are trees:

M = { 2 ⊥⊥ 3 | {1, 4},  1 ⊥⊥ 4 | {2, 3},  1 ⊥⊥ 2 | ∅,  3 ⊥⊥ 4 | ∅ }.

Remark 13. For n = 4 there are 22108 submodular rank tests, one for each face of the 11-dimensional cone C4. The base of this submodular cone is a polytope with f-vector (1, 37, 356, 1596, 3985, 5980, 5560, 3212, 1128, 228, 24, 1).

Remark 14. For n = 5 there are 117978 coarsest submodular rank tests, in 1319 symmetry classes. We confirmed this result of (Studeny, 2000) with POLYMAKE (Gawrilow, 2000).

We now define a class of submodular rank tests, which we call Minkowski sum of simplices (MSS) tests. Note that each subset K of [n] defines a submodular function wK by setting wK(I) = 1 if K ∩ I is non-empty and wK(I) = 0 if K ∩ I is empty. The corresponding polytope QwK is the simplex ∆K = conv{ek : k ∈ K}.

Now consider an arbitrary subset K = {K1, K2, . . . , Kr} of 2^[n]. It defines the submodular function wK = wK1 + wK2 + · · · + wKr. The corresponding polytope is the Minkowski sum

∆K = ∆K1 + ∆K2 + · · · + ∆Kr.

The associated MSS test τK is defined as follows. Given ρ ∈ Sn, we compute, for each i ∈ [n], the number of indices j ∈ [r] such that max{ρk : k ∈ Kj} = ρi. The signature τK(ρ) is the vector in N^n whose ith coordinate is that number. Few submodular rank tests are MSS tests:

Remark 15. For n = 3, among the 22 submodular rank tests, only 15 are MSS tests. For n = 4, among the 22108, only 1218 are MSS.
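To make the signature computation concrete, here is a minimal sketch in Python; the function name mss_signature and the encoding of ρ as a dict are our own illustrative choices, not part of the paper.

```python
def mss_signature(rho, K, n):
    """Signature tau_K(rho) of the MSS test for a family K of subsets
    of [n] = {1, ..., n}. rho maps each i in [n] to its rank rho(i);
    coordinate i counts the blocks K_j whose maximum rho-value is
    attained at position i."""
    sig = [0] * n
    for Kj in K:
        i = max(Kj, key=lambda k: rho[k])  # argmax of rho over K_j
        sig[i - 1] += 1
    return tuple(sig)

# Two permutations lie in the same class of the test tau_K
# exactly when they receive the same signature.
```

For example, with K consisting of all singletons, every permutation receives the signature (1, . . . , 1), so the test has a single class.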

5 Graphical tests

Graphical models are fundamental in statistics, and they also lead to a useful class of rank tests. First we show how to associate a semi-graphoid to a family K. Let FwK be the normal fan of QwK. We write MK for the CI model derived from FwK using the bijection in Theorem 3.

Proposition 16. The semi-graphoid MK is the set of CI statements i ⊥⊥ j | K which satisfy the following property: all sets containing {i, j} and contained in {i, j} ∪ ([n]\K) are not in K.

Let G be a graph with vertex set [n]. We define K(G) to be the collection of all subsets K of [n] such that the induced subgraph G|K is connected. Recall that the undirected graphical model (or Markov random field) derived from the graph G is the set MG of CI statements:

MG = { i ⊥⊥ j | C : the restriction of G to [n]\C contains no path from i to j }.

The polytope ∆G = ∆K(G) is the graph associahedron, which is a well-studied object in combinatorics (Carr and Devadoss, 2004; Postnikov, 2005). The next theorem is derived from Proposition 16.

Theorem 17. The CI model induced by the graph associahedron coincides with the graphical model MG, i.e., MK(G) = MG.

There is a natural involution ∗ on the set of all CI statements which is defined as follows:

(i ⊥⊥ j | C)∗ := i ⊥⊥ j | [n]\(C ∪ {i, j}).

If M is any CI model, then the CI model M∗ is obtained by applying the involution ∗ to all the CI statements in the model M. Note that this involution was called duality in (Matus, 1992).
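As a small illustration, the involution can be written out directly; the encoding of a CI statement as a triple (i, j, C) is our own convention for this sketch.

```python
def dual(statement, n):
    """Apply the duality involution *, sending (i || j | C) to
    i || j conditioned on the complement of C union {i, j} in [n].
    A CI statement is encoded as (i, j, C) with C a frozenset."""
    i, j, C = statement
    ground = set(range(1, n + 1))
    return (i, j, frozenset(ground - (set(C) | {i, j})))
```

Applying dual twice returns the original statement, which is what makes ∗ an involution.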

212 J. Morton, L. Pachter, A. Shiu, B. Sturmfels, and O. Wienand


Figure 4: The permutohedron P4. Double edges indicate the test τK(G) when G is the path. Edges with large dots indicate the test τ∗K(G).

The graphical tubing rank test τ∗K(G) is the test associated with M∗K(G). It can be obtained by a construction similar to the MSS test τK, with the function wK defined differently and supermodular. The graphical model rank test τK(G) is the MSS test of the set family K(G).

We next relate τK(G) and τ∗K(G) to a known combinatorial characterization of the graph associahedron ∆G. Two subsets A, B ⊂ [n] are compatible for the graph G if one of the following conditions holds: A ⊂ B, B ⊂ A, or A ∩ B = ∅ and there is no edge between any node in A and any node in B. A tubing of the graph G is a subset T of 2^[n] such that any two elements of T are compatible. Carr and Devadoss (2004) showed that ∆G is a simple polytope whose faces are in bijection with the tubings.
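The compatibility condition can be checked directly. The following sketch (function names ours; the graph is given as a set of frozenset edges) tests whether two subsets are compatible and whether a family is a tubing.

```python
from itertools import combinations

def compatible(A, B, edges):
    """A, B are subsets of the vertex set; edges is a set of
    frozenset pairs. Compatible iff nested, or disjoint with
    no edge running between them."""
    A, B = set(A), set(B)
    if A <= B or B <= A:
        return True
    if A & B:
        return False
    return not any(frozenset((a, b)) in edges for a in A for b in B)

def is_tubing(T, edges):
    """A family T is a tubing iff its members are pairwise compatible."""
    return all(compatible(A, B, edges) for A, B in combinations(T, 2))
```

For the path 1–2–3, for instance, {1} and {3} are compatible (disjoint, no connecting edge) while {1} and {2} are not.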

Theorem 18. The following four combinatorial objects are isomorphic for any graph G on [n]:
• the graphical model rank test τK(G),
• the graphical tubing rank test τ∗K(G),
• the fan of the graph associahedron ∆G,
• the simplicial complex of all tubings on G.

The maximal tubings of G correspond to vertices of the graph associahedron ∆G. When G is the path of length n, then ∆G is the associahedron, and when it is a cycle, ∆G is the cyclohedron. The number of classes in the tubing test τ∗K(G) is the G-Catalan number of (Postnikov, 2005). This number is (1/(n+1)) · C(2n, n) for the associahedron test and C(2n−2, n−1) for the cyclohedron test.

6 Enumerating linear extensions

In this paper we introduced a hierarchy of rank tests, ranging from pre-convex to graphical. Rank tests are applied to data vectors u ∈ R^n, or permutations π ∈ Sn, and locate their cones. In order to determine the significance of a data vector, one needs to compute the quantity |τ^(−1)(τ(π))|, and possibly the probabilities of other maximal cones. These cones are indexed by posets P1, P2, . . . , Pk on [n], and the probability computations are equivalent to finding the cardinality of some of the sets L(Pi).

We now present our methods for computing linear extensions. If the rank test is a tubing test then this computation is done as follows. From the given permutation, we identify its signature (image under τ), which we may assume is its G-tree T (Postnikov, 2005). Suppose the root of the tree T has k children, each of which is a root of a subtree Ti for i = 1, . . . , k. Writing |Ti| for the number of nodes in Ti, we have

|τ^(−1)(T)| = ( (|T1| + · · · + |Tk|)! / (|T1|! · · · |Tk|!) ) · |τ^(−1)(T1)| · · · |τ^(−1)(Tk)|.

This recursive formula can be translated into an efficient iterative algorithm. In (Willbrand et al., 2005) the analogous problem is raised for the test in Example 3. A determinantal formula for (1) appears in (Stanley, 1997, page 69).
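A direct recursive implementation of the formula above might look as follows; this is a sketch, and the tree encoding as (label, children) pairs is our own choice rather than the paper's data structure.

```python
from math import comb

def tree_size(tree):
    """Number of nodes of a tree given as (label, list_of_subtrees)."""
    _, children = tree
    return 1 + sum(tree_size(c) for c in children)

def count_preimage(tree):
    """|tau^{-1}(T)|: the number of permutations whose signature is
    the G-tree T. The multinomial coefficient is accumulated
    incrementally as comb(seen, s) factors while recursing."""
    _, children = tree
    seen, result = 0, 1
    for child in children:
        s = tree_size(child)
        seen += s
        result *= comb(seen, s) * count_preimage(child)
    return result
```

For a chain (each node with one child) the count is 1, and for a root with two leaf children it is 2, matching the two interleavings of the subtrees.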

For an arbitrary convex rank test we proceed as follows. The test is specified (implicitly or explicitly) by a collection of posets P1, . . . , Pk on [n]. From the given permutation, we first identify the unique poset Pi of which that permutation is a linear extension. We next construct the distributive lattice L(Pi) of all order ideals of Pi. Recall that an order ideal is a subset O of [n] such that if l ∈ O and (k, l) ∈ Pi then k ∈ O. The set of all order ideals is a lattice with meet and join operations given by set intersection O ∩ O′ and set union O ∪ O′. Knowledge of this distributive lattice L(Pi) solves our problem because the linear extensions of Pi are precisely the maximal chains of L(Pi). Computing the number of linear extensions is #P-complete (Brightwell and Winkler, 1991). Therefore we developed efficient heuristics to build L(Pi).
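The maximal-chain characterization suggests a simple dynamic program over order ideals. The sketch below (our own naming and poset encoding) counts linear extensions this way; it of course remains exponential in the worst case, consistent with the #P-completeness result.

```python
def count_linear_extensions(n, relations):
    """Count linear extensions of a poset on {0, ..., n-1}.
    `relations` contains pairs (k, l) meaning k < l in the poset.
    counts[I] accumulates the number of maximal chains of L(P)
    from the empty order ideal up to the ideal I."""
    preds = {i: set() for i in range(n)}
    for k, l in relations:
        preds[l].add(k)
    counts = {frozenset(): 1}
    for _ in range(n):  # grow ideals one element at a time
        new_counts = {}
        for ideal, c in counts.items():
            for x in range(n):
                # x extends the ideal if all its predecessors lie inside
                if x not in ideal and preds[x] <= ideal:
                    bigger = ideal | {x}
                    new_counts[bigger] = new_counts.get(bigger, 0) + c
        counts = new_counts
    return counts[frozenset(range(n))]
```

An antichain on 3 elements has 3! = 6 linear extensions, a 3-chain has exactly one, and the poset 0 < 1, 0 < 2 has two.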

The key algorithmic task is the following: given a poset Pi on [n], compute an efficient representation of the distributive lattice L(Pi). Our program for performing rank tests works as follows. The input is a permutation π and a rank test τ. The test τ can be specified either

• by a list of posets P1, . . . , Pk (pre-convex),

• or by a semigraphoid M (convex rank test),

• or by a submodular function w : 2[n] → R,

• or by a collection K of subsets of [n] (MSS),

• or by a graph G on [n] (graphical test).

The output of our program has two parts. First, it gives the number |L(Pi)| of linear extensions, where the poset Pi represents the equivalence class of Sn specified by the data π. It also gives a representation of the distributive lattice L(Pi), in a format that can be read by the maple package posets (Stembridge, 2004). Our software for the above rank tests is available at www.bio.math.berkeley.edu/ranktests/.

Acknowledgments

This paper originated in discussions with Olivier Pourquie and Mary-Lee Dequeant in the DARPA Fundamental Laws of Biology Program, which supported Jason Morton, Lior Pachter, and Bernd Sturmfels. Anne Shiu was supported by a Lucent Technologies Bell Labs Graduate Research Fellowship. Oliver Wienand was supported by the Wipprecht foundation.

References

G Brightwell and P Winkler. Counting linear extensions. Order, 8(3):225-242, 1991.

K Brown. Buildings. Springer, New York, 1989.

M Carr and S Devadoss. Coxeter complexes and graph associahedra. 2004. Available from http://arxiv.org/abs/math.QA/0407229.

E Gawrilow and M Joswig. Polymake: a framework for analyzing convex polytopes, in Polytopes – Combinatorics and Computation, eds. G Kalai and G M Ziegler, Birkhauser, 2000, 43-74.

L Lovasz. Submodular functions and convexity, in Math Programming: The State of the Art, eds. A Bachem, M Groetschel, and B Korte, Springer, 1983, 235-257.

F Matus. Ascending and descending conditional independence relations, in Proceedings of the Eleventh Prague Conference on Inform. Theory, Stat. Dec. Functions and Random Proc., Academia, B, 1992, 189-200.

F Matus. Towards classification of semigraphoids. Discrete Mathematics, 277, 115-145, 2004.

EJG Pitman. Significance tests which may be applied to samples from any populations. Supplement to the Journal of the Royal Statistical Society, 4(1):119–130, 1937.

A Postnikov. Permutohedra, associahedra, and beyond. 2005. Available from http://arxiv.org/abs/math/0507163.

A Postnikov, V Reiner, and L Williams. Faces of Simple Generalized Permutohedra. Preprint, 2006.

RP Stanley. Enumerative Combinatorics Volume I, Cambridge University Press, Cambridge, 1997.

J Stembridge. Maple packages for symmetric functions, posets, root systems, and finite Coxeter groups. Available from www.math.lsa.umich.edu/~jrs/maple.html.

M Studeny. Probabilistic conditional independence structures. Springer Series in Information Science and Statistics, Springer-Verlag, London, 2005.

M Studeny, RR Bouckaert, and T Kocka. Extreme supermodular set functions over five variables. Institute of Information Theory and Automation, Research report n. 1977, Prague, 2000.

J Tits. Le probleme des mots dans les groupes de Coxeter. Symposia Math., 1:175-185, 1968.

K Willbrand, F Radvanyi, JP Nadal, JP Thiery, and T Fink. Identifying genes from up-down properties of microarray expression series. Bioinformatics, 21(20):3859–3864, 2005.

G Ziegler. Lectures on polytopes. Vol. 152 of Graduate Texts in Mathematics. Springer-Verlag, 1995.


An Empirical Study of Efficiency and Accuracy of Probabilistic Graphical Models

Jens Dalgaard Nielsen and Manfred Jaeger
Institut for Datalogi, Aalborg Universitet

Frederik Bajers Vej 7, 9220 Aalborg Ø, Denmark
dalgaard,[email protected]

Abstract

In this paper we compare Naïve Bayes (NB) models, general Bayes Net (BN) models and Probabilistic Decision Graph (PDG) models w.r.t. accuracy and efficiency. As the basis for our analysis we use graphs of size vs. likelihood that show the theoretical capabilities of the models. We also measure accuracy and efficiency empirically by running exact inference algorithms on randomly generated queries. Our analysis supports previous results by showing good accuracy for NB models compared to both BN and PDG models. However, our results also show that the advantage of the low complexity inference provided by NB models is not as significant as assessed in a previous study.

1 Introduction

Probabilistic graphical models (PGMs) have been applied extensively in machine learning and data mining research, and many studies have been dedicated to the development of algorithms for learning PGMs from data. Automatically learned PGMs are typically used for inference, and therefore efficiency and accuracy of the PGM w.r.t. inference are of interest when evaluating a learned model.

Among the most commonly used PGMs are the general Bayesian Network model (BN) and the Naïve Bayes model (NB). The BN model efficiently represents a joint probability distribution over a domain of discrete random variables by a factorization into independent local distributions. The NB model contains a number of components defined by an unobserved latent variable and models each discrete random variable as independent of all other variables within each component. Exact inference has linear time complexity in the size of the model when using NB models.

For the general BN model both exact and approximate inference are NP-hard (Cooper, 1987; Dagum and Luby, 1993).

Model-selection algorithms for learning PGMs typically use some conventional score-metric, searching for a model that optimises the metric. Penalised likelihood metrics like BIC, AIC and MDL are weighted sums of model accuracy and size. When the learned model is to be used for general inference, including a measure of inference complexity in the metric is relevant. Neither BIC, AIC nor MDL explicitly takes inference complexity into account when assessing a given model. Recently, several authors have independently emphasised the importance of considering inference complexity when applying learning in a real domain.

Beygelzimer and Rish (2003) investigate the tradeoff between model accuracy and efficiency. They only consider BN models for a given target distribution (in a learning setting, the target distribution is the empirical distribution defined by the data; more generally, the target distribution could be any distribution one wants to represent). For BNs treewidth is an adequate efficiency measure (defined as k − 1, where k is the size of the largest clique in an optimal junction tree). Tradeoff curves that plot treewidth against the best possible accuracy achievable with a given treewidth are introduced. These tradeoff curves can be used to investigate the approximability of a target distribution.

As an example of a distribution with poor approximability in this sense, Beygelzimer and Rish (2003) mention the parity distribution, which represents the parity function on n binary inputs. An accurate representation of this distribution requires a BN of treewidth n − 1, and any BN with a smaller treewidth can approximate the parity distribution only as well as the empty network.

The non-approximability of the parity distribution (and hence the impossibility of accurate models supporting efficient inference) only holds under the restriction to BN models with nodes corresponding exactly to the n input bits. The use of other PGMs, or the use of latent variables in a BN representation, can still lead to accurate and computationally efficient representations of the parity distribution.

Motivated by the fact that some distributions cannot be efficiently approximated by BN models, the PGM language of probabilistic decision graph (PDG) models was developed (Jaeger, 2004). In particular, the parity distribution is representable by a PDG with inference complexity linear in n. In a recent study an empirical analysis of the approximations offered by BN and PDG models learned from real-world data was conducted (Jaeger et al., 2006). Similar to the tradeoff curves of (Beygelzimer and Rish, 2003), Jaeger et al. (2006) used graphs showing likelihood of data vs. size of the model for the analysis of accuracy vs. complexity. The comparison of PDGs vs. BNs did not produce a clear winner, and the main lesson was that the models offer surprisingly similar tradeoffs when learned from real data.

Also motivated by considerations of model accuracy and efficiency, Lowd and Domingos (2005) in a recent study compared NB and BN models. NB models can potentially offer accuracy-efficiency tradeoff behaviors that for some distributions differ from those provided by standard BN representations (although NBs do not include the latent class models that allow an efficient representation of the parity distribution). Lowd and Domingos (2005) determine inference complexity empirically by measuring inference times on randomly generated queries. The inferences are computed exactly for NB models, but for BN models approximate methods were used (Gibbs sampling and loopy belief propagation). Lowd and Domingos (2005) conclude that NB models offer approximations that are as accurate as those offered by BN models, but in terms of inference complexity the NB models are reported to be orders of magnitude faster than BN models.

Our present paper extends these previous works in two ways. First, we conduct a comparative analysis of accuracy vs. efficiency tradeoffs for three types of PGMs: BN, NB and PDG models. Our results show that in spite of theoretical differences BN, NB and PDG models perform surprisingly similarly when learned from real data, and no single model is consistently superior. Second, we investigate the theoretical and empirical efficiency of exact inference for all models. This analysis somewhat differs from the analysis in (Lowd and Domingos, 2005), where only approximate inference was considered for BNs. The latter approach can lead to somewhat unfavorable results for BNs, because approximate inference can be much slower than exact inference for models still amenable to exact inference. Our results show that while NB models are still very competitive w.r.t. accuracy, exact inference in BN models is often tractable and differences in empirically measured run-times are typically not significant.

2 Probabilistic Graphical Models

In this section we introduce the three types of models that we will use in our experiments: the general Bayesian Network (BN), the Naïve Bayesian Network (NB) and the Probabilistic Decision Graph (PDG).

2.1 Bayesian Network Models

BNs (Jensen, 2001; Pearl, 1988) are a class of probabilistic graphical models that represent a joint probability distribution over a domain X of discrete random variables through a factorization of independent local distributions or factors. The structure of a BN is a directed acyclic graph (DAG) G = (V, E) of nodes V and directed edges E. Each random variable Xi ∈ X is represented by a node Vi ∈ V, and the factorization ∏_{Xi∈X} P(Xi | paG(Xi)) (where paG(Xi) is the set of random variables represented by the parents of node Vi in DAG G) defines the full joint probability distribution P(X) represented by the BN model. By the size of a BN we understand the size of the representation, i.e. the number of independent parameters. Exact inference is usually performed by first constructing a junction tree from the BN. Inference is then solvable in time linear in the size of the junction tree (Lauritzen and Spiegelhalter, 1988), which may be exponential in the size of the BN from which it was constructed.

2.2 Naïve Bayes Models

The NB model represents a joint probability distribution over a domain X of discrete random variables by introducing an unobserved, latent variable C. Each state of C is referred to as a component, and conditioned on C, each variable Xi ∈ X is assumed to be independent of all other variables in X. This yields the simple factorization: P(X, C) = P(C) ∏_{Xi∈X} P(Xi | C). Exact inference is computable in time linear in the representation size of the NB model.
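This linear-time computation amounts to a single pass over the components; a minimal sketch (the function name and parameterization are ours, not from the paper) is:

```python
def nb_query(prior, cpts, query, evidence):
    """Exact inference P(query | evidence) in a Naive Bayes model.
    prior[c] = P(C = c); cpts[i][c][x] = P(X_i = x | C = c);
    query and evidence map a variable index to an observed value.
    Runs in time linear in the number of parameters touched."""
    num = den = 0.0
    for c, pc in enumerate(prior):
        w = pc
        for i, x in evidence.items():
            w *= cpts[i][c][x]
        den += w                      # contribution to P(evidence)
        for i, x in query.items():
            w *= cpts[i][c][x]
        num += w                      # contribution to P(query, evidence)
    return num / den
```

With two equally likely components and a single binary variable with P(X0 = 0 | C) equal to 0.9 and 0.2 in the two components, the marginal P(X0 = 0) comes out as 0.55.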

2.3 Probabilistic Decision Graph Models

PDGs are a fairly new language for probabilistic graphical modeling (Jaeger, 2004; Bozga and Maler, 1999). Like BNs and NBs, PDGs represent a joint probability distribution over a domain of discrete random variables X through a factorization of local distributions. However, the structure of the factorization defined by a PDG is not based on a variable-level independence model but on a certain kind of context-specific independencies among the variables. A PDG can be seen as a two-layer structure: 1) a forest of tree-structures over all members of X, and 2) a set of rooted DAG structures over parameter nodes, each holding a local distribution over one random variable. Figure 1(a) shows a forest F of tree-structures over the binary variables X = {X0, X1, . . . , X5}, and figure 1(b) shows an example of a PDG structure based on F. For a complete semantics of the PDG model and algorithms for exact inference with linear complexity in the size of the model, the reader is referred to (Jaeger, 2004).


Figure 1: Example PDG. Subfigure (a) shows a forest structure F over the binary variables X0, . . . , X5, and (b) shows a full PDG structure based on F.

3 Elements of the Analysis

The goal of our analysis is to investigate the quality of PGMs learned from real data w.r.t. accuracy and inference efficiency. The appropriate notion of accuracy depends on the intended tasks for the model. Following (Jaeger et al., 2006; Lowd and Domingos, 2005; Beygelzimer and Rish, 2003) we use the log-likelihood of the model given the data (L(M, D)) as a “global” measure of accuracy. The log-likelihood score is essentially equivalent to the cross-entropy (CE) between the empirical distribution P^D and the distribution P^M represented by model M:

CE(P^D, P^M) = −H(P^D) − (1/|D|) · L(M, D),   (1)

where H(·) is the entropy function. Observe that when CE(P^D, P^M) = 0 (when P^D and P^M are equal), then L(M, D) = −|D| · H(P^D). Thus, data entropy is an upper bound on the log-likelihood.
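For intuition, equation (1) can be checked numerically on a toy dataset. The sketch below (helper names ours) computes the right-hand side from counts; it returns zero exactly when the model matches the empirical distribution, as the text observes.

```python
from math import log
from collections import Counter

def cross_entropy(data, model_prob):
    """Right-hand side of equation (1):
    CE(P^D, P^M) = -H(P^D) - L(M, D) / |D|,
    where L(M, D) is the log-likelihood of the data under the model
    and H is the entropy of the empirical distribution P^D."""
    n = len(data)
    empirical = {x: c / n for x, c in Counter(data).items()}
    H = -sum(p * log(p) for p in empirical.values())   # data entropy
    L = sum(log(model_prob(x)) for x in data)          # log-likelihood
    return -H - L / n
```

For data [0, 0, 1, 1] and the matching model P(x) = 1/2 the result is 0; for data skewed away from the model it is strictly positive.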

3.1 Theoretical Complexity vs. Accuracy

We use SL-curves (Jaeger et al., 2006) in our analysis of the theoretical performance of each PGM language. SL-curves are plots of size vs. likelihood. The size of a model here is the effective size, i.e. a model complexity parameter such that inference has linear time complexity in this parameter. For NB and PDG models this is the size of the model itself. For BN models it is the size of the junction tree constructed for inference.

3.2 Empirical Complexity and Accuracy

The size measure used in the SL-curves described in section 3.1 measures inference complexity only up to a linear factor. Following Lowd and Domingos (2005), we also estimate the complexity of exact inference empirically by measuring execution times for random queries. A query for model M is solved by computing a conditional probability P^M(Q = q | E = e), where Q, E are disjoint subsets of X, and q, e are instantiations of Q, respectively E. Queries are randomly generated as follows: first a random pair 〈Qi, Ei〉 of subsets of variables is drawn from X. Then, an instance di is randomly drawn from the test data. The random query then is P^M(Q = di[Qi] | E = di[Ei]), where di[Qi], di[Ei] are the instantiations of Q, respectively E, in di. The empirical complexity is simply the average execution time for random queries. The empirical accuracy is measured by averaging log(P^M(Q = di[Qi] | E = di[Ei])) over the random queries. Compared to the global accuracy measure L(M, D), this can be understood as a measure of “local” accuracy, i.e. restricted to specific conditional and marginal distributions of P^M.
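The query generation scheme can be sketched as follows; the variable and function names are ours, and a test instance is represented as a dict from variable to value.

```python
import random

def random_query(variables, test_data, max_q=5, max_e=5):
    """Draw disjoint query and evidence sets Q_i, E_i from the
    variables, then instantiate both from a random test instance d_i,
    yielding the query P(Q = d_i[Q_i] | E = d_i[E_i])."""
    k_q = random.randint(1, max_q)
    k_e = random.randint(0, max_e)
    chosen = random.sample(variables, k_q + k_e)  # disjoint by construction
    Q, E = chosen[:k_q], chosen[k_q:]
    d = random.choice(test_data)
    return {v: d[v] for v in Q}, {v: d[v] for v in E}
```

Sampling query and evidence variables in one draw guarantees that Q and E are disjoint, as the definition requires.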

4 Learning

In this section we briefly describe the algorithms we use for learning each type of PGM from data. For our analysis we need to learn a range of models with different efficiency vs. accuracy tradeoffs. For score-based learning a general λ-score will be used (Jaeger et al., 2006):

Sλ(M, D) = λ · L(M, D) − (1 − λ) · |M|,   (2)

where 0 < λ < 1, and |M| is the size of model M. Equation (2) is a general score metric, as it becomes equivalent to common metrics such as BIC, AIC and MDL for specific settings of λ¹. By optimizing scores with different settings of λ we get a range of models offering different tradeoffs between size and accuracy. Suitable ranges of λ-values were determined experimentally for each type of model using score-based learning (BNs and PDGs).

4.1 Learning Bayesian Networks

We use the KES algorithm for learning BN models (Nielsen et al., 2003). KES performs model-selection in the space of equivalence classes of BN structures using a semi-greedy heuristic. A parameter k ∈ [0, 1] controls the level of greediness, where a setting of 0 is maximally stochastic and 1 is maximally greedy.

The λ-score used in the KES algorithm uses the size of the BN as the size parameter |M|, not the size of its junction tree. Clearly, it would be desirable to score BNs directly by the size of their junction trees, but this appears computationally infeasible. Thus, our SL-curves for BNs do not show for a given accuracy level the smallest possible size of a junction tree achieving that accuracy, but the size of a junction tree we were able to find using existing state-of-the-art learning and triangulation methods.

¹E.g. (2) with λ = 2/(2 + log |D|) corresponds to BIC.

4.2 Learning Naïve Bayes Net Models

For learning NB models we have implemented a version of the NBE algorithm (Lowd and Domingos, 2005). As the structure of NB models is fixed, the task reduces to learning the number of states of the latent variable C and the parameters of the model. Learning in the presence of the latent components is done by the standard Expectation Maximization (EM) approach, following (Lowd and Domingos, 2005; Karciauskas et al., 2004). Learning a range of models is done by incrementally increasing the number of states of C and outputting the model learned for each cardinality. In this way we obtain a range of models that offer different complexity vs. accuracy tradeoffs. Note that no structure-score like (2) is required as the structure is fixed.

4.3 Learning Probabilistic Decision Graphs

Learning of PDGs is done using the model-selection algorithm presented in (Jaeger et al., 2006). Using local transformations as search operators, the algorithm performs a search for a structure that optimises the λ-score.

5 Experiments

We have produced SL-curves and empirically measured inference times and accuracy on 5 different datasets (see table 1) from the UCI repository². These five datasets are a representative sample of the 50 datasets used in the extensive study by Lowd and Domingos (2005).

We used the same versions of the datasets as used by Lowd and Domingos (2005). Specifically, the partitioning into training (90%) and test (10%) sets was the same, continuous variables were discretized into five equal-frequency bins, and missing values were interpreted as special states of the variables.

For measuring the empirical efficiency and accuracy, we generated random queries as described in section 3.2, consisting of 1 to 5 query variables Q and 0 to 5 evidence variables E.

²http://www.ics.uci.edu/~mlearn


Table 1: Datasets used for experiments.

Dataset               #Vars  training  test
Poisonous Mushroom       23      7337   787
King, Rook vs. King       7     25188  2868
Pageblocks               11      4482   574
Abalone                   9      3758   419
Image Segmentation       17      2047   263

For inference in BN and NB models we used the junction-tree algorithm implemented in the inference engine in Hugin³ through the Hugin Java API. For inference in PDGs, the method described in (Jaeger, 2004) was implemented in Java. BN and NB experiments were performed on a standard laptop with a 1.6 GHz Pentium CPU and 512 MB RAM running Linux. PDG experiments were performed on a Sun Fire 280R with a 900 MHz SPARC CPU and 4 GB RAM running Solaris 9.

6 Results

Table 2 shows SL-curves for each dataset and PGM, both for training sets (left column) and test sets (right column).

Lowd and Domingos (2005) based their comparison on single BN and NB models. The NB models were selected by maximizing likelihood on a hold-out set. Crosses (×) in the right column of table 2 indicate the size-likelihood values obtained by the NB models reported in (Lowd and Domingos, 2005).

The first observation we make from table 2 is that no single model language consistently dominates the others. The plots in the left column show that BN models have the lowest log-likelihood measured on training data consistently for models larger than some small threshold. For Abalone and Image Segmentation, this characteristic is mitigated in the plots for the test data, where especially PDGs seem to overfit the training data and accordingly receive a low log-likelihood score on the test data.

The overall picture in table 2 is that in terms of accuracy, BNs and NBs are often quite similar (King, Rook vs. King is the only exception). This is consistent with what Lowd and Domingos (2005) have found. However, Lowd and Domingos (2005) reported big differences in inference complexity when comparing exact inference in NB models to approximate methods in BNs. We do not observe this tendency when considering exact inference for both BNs and NBs. Our results show that we can learn BNs that are within reach of exact inference methods, and that the theoretical inference complexity as measured by effective model size is mostly similar for a given accuracy level for all three PGM languages.

³http://www.hugin.com

Effective model size measures actual inference time only up to a linear factor. In order to determine whether there possibly are huge (orders of magnitude) differences in these linear factors, we measured the actual inference time on our random queries. The left column of table 3 shows the average inference time for 1000 random queries with 4 query and 3 evidence variables (results for other numbers of query and evidence variables were very similar). We observe that this empirical complexity behaves almost indistinguishably for BN and NB models. This is not surprising, since both models use the Hugin inference engine⁴. The results do show, however, that the different structures of the junction trees for BN and NB models do not have a significant impact on runtime. The linear factor for PDG inference in these experiments is about 4 times larger than that for BN/NB inference⁵. Seeing that we use a proof-of-concept prototype Java implementation for PDGs, and the commercial Hugin inference engine for BNs and NBs, this indicates that PDGs are competitive in practice, not only according to theoretical complexity analyses.

The right column in table 3 shows the empirical (local) accuracy obtained for 1000 random queries with 4 query and 3 evidence variables. Overall, the results are consistent with the global accuracy on the test data (table 2, right column). The differences observed for the different PGMs in table 2 can also be seen in table 3, though the discrepancies tend

⁴Zhang (1998) shows that variable elimination can be more efficient than junction tree-based inference. However, his results do not indicate that we would obtain substantially different results if we used variable elimination in our experiments.

⁵This factor has to be viewed with caution, since PDG inference was run on a machine with a slower CPU but more main memory. When running PDG inference on the same machine as NB/BN inference, we observed overall a similar performance, but more deviations from a strictly linear behavior (in table 3 still visible to some degree for the Mushroom and Abalone data). These deviations seem mostly attributable to the memory management in the Java runtime environment.


Table 2: SL-curves for train-sets (left column) and test-sets (right column). The crosses × mark the NB models reported by Lowd and Domingos (2005). −H(D) (minus data-entropy) is plotted as a horizontal line.

-30

-28

-26

-24

-22

-20

-18

-16

-14

-12

0 5000 10000 15000 20000 25000 30000-30

-28

-26

-24

-22

-20

-18

-16

-14

-12

Log

-Lik

elih

ood

Effective model size

Poisonous Mushroom - Training

-30

-28

-26

-24

-22

-20

-18

-16

-14

-12

0 5000 10000 15000 20000 25000 30000-30

-28

-26

-24

-22

-20

-18

-16

-14

-12

Log

-Lik

elih

ood

Effective model size

Poisonous Mushroom - Test

-19

-18.5

-18

-17.5

-17

-16.5

-16

-15.5

-15

-14.5

0 5000 10000 15000 20000 25000 30000-19

-18.5

-18

-17.5

-17

-16.5

-16

-15.5

-15

-14.5

Log

-Lik

elih

ood

Effective model size

King, Rook vs. King - Training

-19.5

-19

-18.5

-18

-17.5

-17

-16.5

-16

0 5000 10000 15000 20000 25000 30000-19.5

-19

-18.5

-18

-17.5

-17

-16.5

-16

Log

-Lik

elih

ood

Effective model size

King, Rook vs. King - Test

-19

-18

-17

-16

-15

-14

-13

-12

-11

-10

0 5000 10000 15000 20000-19

-18

-17

-16

-15

-14

-13

-12

-11

-10

Log

-Lik

elih

ood

Effective model size

Pageblocks - Training

-19

-18

-17

-16

-15

-14

-13

-12

-11

0 5000 10000 15000 20000-19

-18

-17

-16

-15

-14

-13

-12

-11

Log

-Lik

elih

ood

Effective model size

Pageblocks - Test

-12

-11.5

-11

-10.5

-10

-9.5

-9

0 2000 4000 6000 8000 10000-12

-11.5

-11

-10.5

-10

-9.5

-9

Log

-Lik

elih

ood

Effective model size

Abalone - Training

-11.6

-11.4

-11.2

-11

-10.8

-10.6

-10.4

0 2000 4000 6000 8000 10000-11.6

-11.4

-11.2

-11

-10.8

-10.6

-10.4

Log

-Lik

elih

ood

Effective model size

Abalone - Test

-26

-24

-22

-20

-18

-16

-14

-12

-10

0 5000 10000 15000 20000 25000 30000-26

-24

-22

-20

-18

-16

-14

-12

-10

Log

-Lik

elih

ood

Effective model size

Image Segmentation - Training

BN NB PDG -H(D)

-26

-25

-24

-23

-22

-21

-20

-19

-18

-17

0 5000 10000 15000 20000 25000 30000-26

-25

-24

-23

-22

-21

-20

-19

-18

-17

Log

-Lik

elih

ood

Effective model size

Image Segmentation - Test

BN NB PDG

220 J. D. Nielsen and M. Jaeger


Table 3: Empirical efficiency (left column) and accuracy (right column) for 4 query and 3 evidence variables.

[Figure: panels plotting empirical complexity (ms) and empirical accuracy (log-likelihood) against effective model size (legend: BN, NB, PDG) for the Poisonous Mushroom, King, Rook vs. King, Pageblocks, Abalone, and Image Segmentation datasets.]



to become less pronounced on the random queries than on global likelihood (particularly for PDGs in the image segmentation data). One possible explanation for this is that low global likelihood scores are mostly due to a few test cases whose joint instantiation of the variables is given low probability by a model, and that these isolated low-probability configurations are seldom met in the random queries.

7 Conclusion

Motivated by several previous, independent studies on the tradeoff between model accuracy and efficiency in different PGM languages, we have investigated the performance of BN, NB, and PDG models. Our main findings are: 1) In contrast to potentially widely different performance on artificial examples (e.g. the parity distribution), we observe a relatively uniform behavior of all three languages on real-life data. 2) Our results confirm the conclusions of Lowd and Domingos (2005) that the NB model is a viable alternative to the BN model for general purpose probabilistic modeling and inference. However, the order-of-magnitude advantages in inference complexity could not be confirmed when comparing exact inference methods for both types of models. 3) Previous theoretical complexity analyses for inference in PDG models now have been complemented with empirical results showing also the practical competitiveness of PDGs.

Acknowledgment

We thank Daniel Lowd for providing the pre-processed datasets we used in our experiments. We thank the AutonLab at CMU for making their efficient software libraries available to us.

References

Beygelzimer, A. and Rish, I.: 2003, Approximability of probability distributions, Advances in Neural Information Processing Systems 16, The MIT Press.

Bozga, M. and Maler, O.: 1999, On the representation of probabilities over structured domains, Proceedings of the 11th International Conference on Computer Aided Verification, Springer, pp. 261–273.

Cooper, G. F.: 1987, Probabilistic inference using belief networks is NP-hard, Technical report, Knowledge Systems Laboratory, Stanford University.

Dagum, P. and Luby, M.: 1993, Approximating probabilistic inference in Bayesian belief networks is NP-hard, Artificial Intelligence 60, 141–153.

Jaeger, M.: 2004, Probabilistic decision graphs – combining verification and AI techniques for probabilistic inference, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 12, 19–42.

Jaeger, M., Nielsen, J. D. and Silander, T.: 2006, Learning probabilistic decision graphs, International Journal of Approximate Reasoning 42(1-2), 84–100.

Jensen, F. V.: 2001, Bayesian Networks and Decision Graphs, Springer.

Karciauskas, G., Kocka, T., Jensen, F. V., Larranaga, P. and Lozano, J. A.: 2004, Learning of latent class models by splitting and merging components, Proceedings of the Second European Workshop on Probabilistic Graphical Models, pp. 137–144.

Lauritzen, S. L. and Spiegelhalter, D. J.: 1988, Local computations with probabilities on graphical structures and their application to expert systems, Journal of the Royal Statistical Society 50(2), 157–224.

Lowd, D. and Domingos, P.: 2005, Naive Bayes models for probability estimation, Proceedings of the Twenty-second International Conference on Machine Learning, pp. 529–536.

Nielsen, J. D., Kocka, T. and Pena, J. M.: 2003, On local optima in learning Bayesian networks, Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers, pp. 435–442.

Pearl, J.: 1988, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann Publishers.

Zhang, N. L.: 1998, Computational properties of two exact algorithms for Bayesian networks, Applied Intelligence 9, 173–183.



Adapting Bayes Network Structures to Non-stationary Domains

Søren Holbech Nielsen and Thomas D. Nielsen
Department of Computer Science
Aalborg University
Fredrik Bajers Vej 7E
9220 Aalborg Ø, Denmark

Abstract

When an incremental structural learning method gradually modifies a Bayesian network (BN) structure to fit observations, as they are read from a database, we call the process structural adaptation. Structural adaptation is useful when the learner is set to work in an unknown environment, where a BN is to be gradually constructed as observations of the environment are made. Existing algorithms for incremental learning assume that the samples in the database have been drawn from a single underlying distribution. In this paper we relax this assumption, so that the underlying distribution can change during the sampling of the database. The method that we present can thus be used in unknown environments, where it is not even known whether the dynamics of the environment are stable. We briefly state formal correctness results for our method, and demonstrate its feasibility experimentally.

1 Introduction

Ever since Pearl (1988) published his seminal book on Bayesian networks (BNs), the formalism has become a widespread tool for representing, eliciting, and discovering probabilistic relationships for problem domains defined over discrete variables. One area of research that has seen much activity is the area of learning the structure of BNs, where probabilistic relationships for variables are discovered, or inferred, from a database of observations of these variables (main papers in learning include (Heckerman, 1998)). One part of this research area focuses on incremental structural learning, where observations are received sequentially, and a BN structure is gradually constructed along the way without keeping all observations in memory. A special case of incremental structural learning is structural adaptation, where the incremental algorithm maintains one or more candidate structures and applies changes to these structures as observations are received. This particular area of research has received very little attention, with the only results that we are aware of being (Buntine, 1991; Lam and Bacchus, 1994; Lam, 1998; Friedman and Goldszmidt, 1997; Roure, 2004).

These papers all assume that the database of observations has been produced by a stationary stochastic process. That is, the ordering of the observations in the database is inconsequential. However, many real life observable processes cannot really be said to be invariant with respect to time: Mechanical mechanisms may suddenly fail, for instance, and non-observable effects may change abruptly. When human decision makers are somehow involved in the data generating process, these are almost surely not fully describable by the observables and may change their behaviour instantaneously. A simple example of a situation where it is unrealistic to expect a stationary generating process is an industrial system, where some component is exchanged for one of another make. Similarly, if the coach of a soccer team changes the strategy of the team during a match, data on the play from after the change would be distributed differently from that representing the time before.

In this work we relax the assumption on stationary data, opting instead for learning from data which is only "approximately" stationary. More concretely, we assume that the data generating process is piecewise stationary, as in the examples given above, and thus do not try to deal with data


where the data generating process changes gradually, as can happen when machinery is slowly being worn down.1 Furthermore, we focus on domains where the shifts in distribution from one stationary period to the next are of a local nature (i.e. only a subset of the probabilistic relationships among variables change as the shifts take place).

2 Preliminaries

As a general notational rule we use bold font to denote sets and vectors (V, c, etc.) and calligraphic font to denote mathematical structures and compositions (B, G, etc.). Moreover, we shall use upper case letters to denote random variables or sets of random variables (X, Y, V, etc.), and lower case letters to denote specific states of these variables (x4, y′, c, etc.). ≡ is used to denote "defined as".

A BN B ≡ (G, Φ) over a set of discrete variables V consists of an acyclic directed graph (traditionally abbreviated DAG) G, whose nodes are the variables in V, and a set of conditional probability distributions Φ (which we abbreviate CPTs, for "conditional probability tables"). A correspondence between G and Φ is enforced by requiring that Φ consists of one CPT P(X | PA_G(X)) for each variable X, specifying a conditional probability distribution for X given each possible instantiation of the parents PA_G(X) of X in G. A unique joint distribution P_B over V is obtained by taking the product of all the CPTs in Φ. When it introduces no ambiguity, we shall sometimes treat B as synonymous with its graph G.

Due to the construction of P_B we are guaranteed that all dependencies inherent in P_B can be read directly from G by use of the d-separation criterion (Pearl, 1988). The d-separation criterion states that, if X and Y are d-separated by Z, then X is conditionally independent of Y given Z in P_B; or, equivalently, if X is conditionally dependent on Y given Z in P_B, then X and Y are not d-separated by Z in G. In the remainder of the text, we use X ⊥⊥_G Y | Z to denote that X is d-separated from Y by Z in the DAG G, and X ⊥⊥_P Y | Z to denote that X is conditionally independent of Y given Z

1The changes in distribution of such data are of a continuous nature, and adaptation of networks would probably be better accomplished by adjusting parameters in the net, rather than the structure itself.

in the distribution P. The d-separation criterion is thus

X ⊥⊥_G Y | Z ⇒ X ⊥⊥_{P_B} Y | Z,

for any BN B ≡ (G, Φ). The set of all conditional independence statements that may be read from a graph in this manner is referred to as that graph's d-separation properties.
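As a concrete illustration of the criterion, the following is a minimal Python sketch (not from the paper) that tests d-separation via the standard ancestral-moral-graph reduction: restrict to the ancestral graph of the query nodes, marry co-parents, drop directions, and test plain graph separation. The `parents` representation and function names are illustrative.

```python
def d_separated(parents, xs, ys, zs):
    """Check whether node sets xs and ys are d-separated given zs.
    `parents` maps each node to its set of parents in the DAG."""
    # 1. Restrict attention to the ancestral graph of xs, ys, and zs.
    relevant, stack = set(), list(xs | ys | zs)
    while stack:
        n = stack.pop()
        if n not in relevant:
            relevant.add(n)
            stack.extend(parents[n])
    # 2. Moralize: connect co-parents, then drop arc directions.
    adj = {n: set() for n in relevant}
    for n in relevant:
        ps = parents[n] & relevant
        for p in ps:
            adj[n].add(p)
            adj[p].add(n)
        for p in ps:                 # "marry" the parents of n
            for q in ps:
                if p != q:
                    adj[p].add(q)
    # 3. d-separated iff zs blocks every path in the moral graph.
    seen, stack = set(xs), list(xs)
    while stack:
        n = stack.pop()
        for m in adj[n] - zs:        # cannot pass through zs
            if m in ys:
                return False
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return True

# Collider X -> Y <- Z: X and Z are d-separated by {} but not by {Y}.
pa = {"X": set(), "Z": set(), "Y": {"X", "Z"}}
print(d_separated(pa, {"X"}, {"Z"}, set()))   # True
print(d_separated(pa, {"X"}, {"Z"}, {"Y"}))   # False
```

The collider example shows the characteristic asymmetry of d-separation: conditioning on the common child Y opens the path between its parents.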

We refer to any two graphs over the same variables as being equivalent if they have the same d-separation properties. Equivalence is obviously an equivalence relation. Verma and Pearl (1990) proved that equivalent graphs necessarily have the same skeleton and the same v-structures.2 The equivalence class of graphs containing a specific graph G can then be uniquely represented by the partially directed graph G* obtained from the skeleton of G by directing links that participate in a v-structure in G in the direction dictated by G. G* is called the pattern of G. Any graph G′, which is obtained from G* by directing the remaining undirected links, without creating a directed cycle or a new v-structure, is then equivalent to G. We say that G′ is a consistent extension of G*. The partially directed graph G** obtained from G*, by directing undirected links as they appear in G, whenever all consistent extensions of G* agree on this direction, is called the completed pattern of G. G** is obviously a unique representation of G's equivalence class as well.

Given any joint distribution P over V it is possible to construct a BN B such that P = P_B (Pearl, 1988). A distribution P for which there is a BN B_P ≡ (G_P, Φ_P) such that P_{B_P} = P and also

X ⊥⊥_P Y | Z ⇒ X ⊥⊥_{G_P} Y | Z

holds, is called DAG faithful, and B_P (and sometimes just G_P) is called a perfect map. DAG faithful distributions are important since, if a data generating process is known to be DAG faithful, then a perfect map can, in principle, be inferred from the data under the assumption that the data is representative of the distribution.

For any probability distribution P over variables V and variable X ∈ V, we define a Markov boundary (Pearl, 1988) of X to be a set S ⊆ V \ {X}

2A triple of nodes (X, Y, Z) constitutes a v-structure iff X and Z are non-adjacent and both are parents of Y.

224 S. H. Nielsen and T. D. Nielsen


such that X ⊥⊥_P V \ (S ∪ {X}) | S, and this holds for no proper subset of S. It is easy to see that if P is DAG faithful, the Markov boundary of X is uniquely defined, and consists of X's parents, children, and children's parents in a perfect map of P. In the case of P being DAG faithful, we denote the Markov boundary by MB_P(X).
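For a DAG faithful P the boundary can thus be read directly off a perfect map, as just described. A small illustrative sketch (the `parents` dictionary representation is hypothetical, not the authors' code):

```python
def markov_boundary(parents, x):
    """Markov boundary of x in a DAG: parents, children, and
    children's parents (spouses). `parents` maps node -> parent set."""
    children = {n for n, ps in parents.items() if x in ps}
    spouses = set().union(*(parents[c] for c in children)) if children else set()
    return (parents[x] | children | spouses) - {x}

# In A -> X -> C <- B, the boundary of X is its parent A, its child C,
# and C's other parent B.
pa = {"A": set(), "B": set(), "X": {"A"}, "C": {"X", "B"}}
print(sorted(markov_boundary(pa, "X")))   # ['A', 'B', 'C']
```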

3 The Adaptation Problem

Before presenting our method for structural adaptation, we describe the problem more precisely:

We say that a sequence is samples from a piecewise DAG faithful distribution if the sequence can be partitioned into sets such that each set is a database sampled from a single DAG faithful distribution; the rank of the sequence is the size of the smallest such partition. Formally, let D = (d_1, . . . , d_l) be a sequence of observations over variables V. We say that D is sampled from a piecewise DAG faithful distribution (or simply that it is a piecewise DAG faithful sequence), if there are indices 1 = i_1 < · · · < i_m = l + 1, such that each of D_j = (d_{i_j}, . . . , d_{i_{j+1}−1}), for 1 ≤ j ≤ m − 1, is a sequence of samples from a DAG faithful distribution. The rank of the sequence is defined as min_j (i_{j+1} − i_j), and we say that m − 1 is its size and l its length. A pair of consecutive samples, d_i and d_{i+1}, constitute a shift in D if there is a j such that d_i is in D_j and d_{i+1} is in D_{j+1}. Obviously, any sequence of observations is indistinguishable from a piecewise DAG faithful sequence if the partitions are chosen small enough, so we restrict our attention to sequences that are piecewise DAG faithful of at least rank r. However, we do not assume that either the actual rank or the size of the sequences is known, and specifically we do not assume that the indices i_1, . . . , i_m are known.

The learning task that we address in this paper consists of incrementally learning a BN, while receiving a piecewise DAG faithful sequence of samples, and making sure that after each sample point the BN structure is as close as possible to the distribution that generated this point. Throughout the paper we assume that each sample is complete, so that no observations in the sequence have missing values. Formally, let D be a complete piecewise DAG faithful sample sequence of length l, and let P_t be the distribution generating sample point t. Furthermore, let B_1, . . . , B_l be the BNs found by a structural adaptation method M when receiving D. Given a distance measure dist on BNs, we define the deviance of M on D wrt. dist as

dev(M, D) ≡ (1/l) · Σ_{i=1}^{l} dist(B_{P_i}, B_i).

For a method M to adapt to a DAG faithful sample sequence D wrt. dist then means that M seeks to minimize its deviance on D wrt. dist.
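The deviance is simply an average of per-step distances, which the following sketch makes concrete. The distance measure is caller-supplied; here a toy edge-set distance stands in for a real measure such as the KL-distance used later in the paper, and all names are illustrative.

```python
def deviance(dist, true_nets, learned_nets):
    """Average distance between the generating network B_{P_i} and the
    adapted network B_i over a sample sequence of length l."""
    assert len(true_nets) == len(learned_nets)
    return sum(map(dist, true_nets, learned_nets)) / len(true_nets)

# Toy distance: size of the symmetric difference of the edge sets.
edge_diff = lambda g, h: len(g ^ h)

true_nets = [{("A", "B")}, {("A", "B")}, {("B", "C")}]
learned   = [{("A", "B")}, {("B", "C")}, {("B", "C")}]
print(deviance(edge_diff, true_nets, learned))   # (0 + 2 + 0) / 3 = 2/3
```

The middle step contributes a distance of 2 (one missing edge, one spurious edge), illustrating how a late-adapting method is penalized for every step it lags behind the generating distribution.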

4 A Structural Adaptation Method

The method proposed here continuously monitors the data stream D and evaluates whether the last, say k, observations fit the current model. When this turns out not to be the case, we conclude that a shift in D took place k observations ago. To adapt to the change, an immediate approach could be to learn a new network from the last k cases. By following this approach, however, we would unfortunately lose all the knowledge gained from cases before the last k observations. This is a problem if some parts of the perfect maps of the two distributions on each side of the shift are the same, since in such situations we re-learn those parts from the new data, even though they have not changed. Not only is this a waste of computational effort, but it can also be the case that the last k observations, while not directly contradicting these parts, do not enforce them either, and consequently they are altered erroneously. Instead, we try to detect where in the perfect maps of the two distributions changes have taken place, and only learn these parts of the new perfect map. This presents challenges, not only in detection, but also in learning the changed parts and having them fit the non-changed parts seamlessly. Hence, the method consists of two main mechanisms: one, monitoring the current BN while receiving observations and detecting when and where the model should be changed; and two, re-learning the parts of the model that conflict with the observations, and integrating the re-learned parts with the remaining parts of the model. These two mechanisms are described below in Sections 4.1 and 4.2, respectively.

Adapting Bayes Network Structures to Non-stationary Domains 225


4.1 Detecting Changes

The detection part of our method, shown in Algorithm 1, continuously processes the cases it receives. For each observation d and node X, the method measures (using CONFLICTMEASURE(B, X, d)) how well d fits with the local structure of B around X. Based on the history of measurements for node X, c_X, the method tests (using SHIFTINSTREAM(c_X, k)) whether a shift occurred k observations ago. k thus acts as the number of observations that are allowed to "pass" before the method should realize that a shift has taken place. We therefore call the parameter k the allowed delay of the method. When the actual detection has taken place, as a last step, the detection algorithm invokes the updating algorithm (UPDATENET(·)) with the set of nodes for which SHIFTINSTREAM(·) detected a change, together with the last k observations.

Algorithm 1 Algorithm for BN adaptation. Takes as input an initial network B, defined over variables V, a series of cases D, and an allowed delay k for detecting shifts in D.

 1: procedure ADAPT(B, V, D, k)
 2:   D′ ← []
 3:   c_X ← [] (∀X ∈ V)
 4:   loop
 5:     d ← NEXTCASE(D)
 6:     APPEND(D′, d)
 7:     S ← ∅
 8:     for X ∈ V do
 9:       c ← CONFLICTMEASURE(B, X, d)
10:       APPEND(c_X, c)
11:       if SHIFTINSTREAM(c_X, k) then
12:         S ← S ∪ {X}
13:     D′ ← LASTKENTRIES(D′, k)
14:     if S ≠ ∅ then
15:       UPDATENET(B, S, D′)

To monitor how well each observation d ≡ (d_1, . . . , d_m) "fits" the current model B, and especially the connections between a node X_i and the remaining nodes in B, we have followed the approach of Jensen et al. (1991): If the current model is correct, then we would in general expect that the probability for observing d dictated by B is higher than or equal to that yielded by most other models. This should especially be the case for the empty model E, where all nodes are unconnected. That is, in B, we expect the individual attributes of d to be positively correlated (unless d is a rare case, in which case all bets are off):

log [ P_E(X_i = d_i) / P_B(X_i = d_i | X_j = d_j (∀j ≠ i)) ] > 0.   (1)

Therefore, we let CONFLICTMEASURE(B, X_i, d) return the value given on the left-hand side of (1). We note that this is where the assumption of complete data comes into play: if d is not completely observed, then (1) cannot be evaluated for all nodes X_i.
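The measure in (1) can be sketched as a one-line log-ratio. The two probability functions below are caller-supplied placeholders standing in for inference in the empty model E and in B respectively; the constant values are purely illustrative.

```python
import math

def conflict_measure(p_empty, p_model, i, d):
    """Node-wise conflict score in the style of Jensen et al. (1991):
    log P_E(X_i = d_i) / P_B(X_i = d_i | rest of d).
    `p_empty(i, d)` and `p_model(i, d)` are placeholder probability
    functions supplied by the caller."""
    return math.log(p_empty(i, d) / p_model(i, d))

# If the model explains d_i better than the empty model, the score is
# negative; a run of positive scores accumulates evidence of conflict.
marginal    = lambda i, d: 0.5   # stand-in for P_E(X_i = d_i)
conditional = lambda i, d: 0.8   # stand-in for P_B(X_i = d_i | rest)
print(conflict_measure(marginal, conditional, 0, ("a",)))  # log(0.625) < 0
```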

Since a high value returned by CONFLICTMEASURE(·) for a node X could be caused by a rare case, we cannot use that value directly for determining whether a shift has occurred. Rather, we look at the block of values for the last k cases, and compare these with those from before that. If there is a tendency towards higher values in the former, then we conclude that this cannot be caused only by rare cases, and that a shift must have occurred. Specifically, for each variable X, SHIFTINSTREAM(c_X, k) checks whether there is a significant increase in the values of the last k entries in c_X relative to those before that. In our implementation SHIFTINSTREAM(c_X, k) calculates the negative of the second discrete cosine transform component (see e.g. (Press et al., 2002)) of the last 2k measures in c_X, and returns true if this statistic exceeds a pre-specified threshold value. We are unaware of any previous work using this technique for change point detection, but we chose to use it as it outperformed the more traditional methods of log-odds ratios and t-tests in our setting.
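A sketch of one way SHIFTINSTREAM(·) could be realized along these lines. The second DCT-II component weights the first half of the window positively and the second half negatively, so a rise in the last k conflict measures makes the component strongly negative and its negation large; the threshold value here is illustrative, not the one used in the paper.

```python
import math

def shift_in_stream(c_x, k, threshold):
    """Return True if the negative of the second DCT-II component of the
    last 2k conflict measures exceeds `threshold` (illustrative sketch)."""
    if len(c_x) < 2 * k:
        return False
    window = c_x[-2 * k:]
    n = len(window)
    # Component at frequency index 1 of the (unnormalized) type-II DCT.
    c1 = sum(v * math.cos(math.pi * (2 * t + 1) / (2 * n))
             for t, v in enumerate(window))
    return -c1 > threshold

# A flat history triggers nothing; a jump in the last k values does.
print(shift_in_stream([0.1] * 20, 10, 1.0))               # False
print(shift_in_stream([0.1] * 10 + [2.0] * 10, 10, 1.0))  # True
```

Unlike a plain mean comparison, this statistic responds to the shape of the whole 2k-window, which may explain why it behaved more robustly than log-odds ratios and t-tests in the authors' setting.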

4.2 Learning and Incorporating Changes

When a shift involving nodes S has been detected, UPDATENET(B, S, D′) in Algorithm 2 adapts the BN B around the nodes in S to fit the empirical distribution defined by the last k cases D′ read from D. Throughout the text, both the cases and the empirical distribution will be denoted D′. Since we want to reuse the knowledge encoded in B that has not been deemed outdated by the detection part of the method, we will update B to fit D′ based on the assumption that only nodes in S need updating of their probabilistic bindings (i.e. the structure associated with their Markov boundaries in B_{D′}). Ignoring most details for the moment, the updating method in Algorithm 2 first runs through the nodes

226 S. H. Nielsen and T. D. Nielsen


Algorithm 2 Update Algorithm for BN. Takes as input the network to be updated B, a set of variables whose structural bindings may be wrong S, and data to learn from D′.

 1: procedure UPDATENET(B, S, D′)
 2:   for X ∈ S do
 3:     M̂B_{D′}(X) ← MARKOVBOUNDARY(X, D′)
 4:     G_X ← ADJACENCIES(X, M̂B_{D′}(X), D′)
 5:     PARTIALLYDIRECT(X, G_X, M̂B_{D′}(X), D′)
 6:   G′ ← MERGEFRAGMENTS({G_X}_{X∈S})
 7:   G′′ ← MERGECONNECTIONS(B, G′, S)
 8:   (G′′, C) ← DIRECT(G′′, B, S)
 9:   Φ′′ ← ∅
10:   for X ∈ V do
11:     if X ∈ C then
12:       Φ′′ ← Φ′′ ∪ {P_{D′}(X | PA_{G′′}(X))}
13:     else
14:       Φ′′ ← Φ′′ ∪ {P_B(X | PA_{G′′}(X))}
15:   B ← (G′′, Φ′′)

in S and learns a partially directed graph fragment G_X for each node X (G_X can roughly be thought of as a "local completed pattern" for X). When network fragments have been constructed for all nodes in S, these fragments are merged into a single graph G′, which is again merged with fragments from the original graph of B. The merged graph is then directed using four direction rules, which try to preserve as much of B's structure as possible, without violating the newly uncovered knowledge represented by the learned graph fragments. Finally, new CPTs are constructed for those nodes C that have a new parent set in B_{D′} (nodes which, ideally, should be a subset of S).

The actual construction of G_X is divided into three steps: First, an estimate M̂B_{D′}(X) of MB_{D′}(X) is computed, using MARKOVBOUNDARY(X, D′); second, nodes in M̂B_{D′}(X) that are adjacent to X in B_{D′} are uncovered, using ADJACENCIES(X, M̂B_{D′}(X), D′), and G_X is initialized as a graph over X and these nodes, where X is connected with links to these adjacent nodes; and third, some links in G_X that are arcs in B**_{D′} are directed as they would be in B**_{D′} using PARTIALLYDIRECT(X, G_X, M̂B_{D′}(X), D′). See (Nielsen and Nielsen, 2006) for more elaboration on this.

In our experimental implementation, we used the decision tree learning method of Frey et al. (2003) to find M̂B_{D′}(X), and the ALGORITHMPCD(·) method of (Pena et al., 2005) (restricted to the variables in M̂B_{D′}(X)) to find variables adjacent to X. The latter method uses a greedy search for iteratively growing and shrinking the estimated set of adjacent variables until no further change takes place. Both of these methods need an "independence oracle" I_{D′}. For this we have used a χ2 test on D′.
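The oracle I_{D′} can be sketched as a Pearson χ2 test on a contingency table. The sketch below covers only the unconditional (pairwise) case; the conditional test I_{D′}(X, Y | T) used by the method would repeat this within each configuration of T. To stay dependency-free, the critical value for the appropriate degrees of freedom is supplied by the caller (for df = 1 at α = 0.05 it is about 3.84); all names are illustrative.

```python
from collections import Counter
from itertools import product

def chi2_independent(data, i, j, critical_value):
    """Pearson chi-square test of independence between columns i and j
    of `data` (a list of tuples). Returns True when independence is
    not rejected, i.e. when the statistic stays below `critical_value`."""
    n = len(data)
    joint = Counter((row[i], row[j]) for row in data)
    mi = Counter(row[i] for row in data)   # marginal counts of column i
    mj = Counter(row[j] for row in data)   # marginal counts of column j
    stat = 0.0
    for a, b in product(mi, mj):
        expected = mi[a] * mj[b] / n       # always > 0: only seen values
        stat += (joint[(a, b)] - expected) ** 2 / expected
    return stat < critical_value

# Perfectly correlated binary columns: independence is rejected.
data = [(0, 0)] * 50 + [(1, 1)] * 50
print(chi2_independent(data, 0, 1, 3.84))   # False
```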

Algorithm 3 Uncovers the direction of some arcs adjacent to a variable X as they would be in B**_{D′}. NE_{G_X}(X) consists of the nodes connected to X by a link in G_X.

1: procedure PARTIALLYDIRECT(X, G_X, M̂B_{D′}(X), D′)
2:   DIRECTASINOTHERFRAGMENTS(G_X)
3:   for Y ∈ M̂B_{D′}(X) \ (NE_{G_X}(X) ∪ PA_{G_X}(X)) do
4:     for T ⊆ M̂B_{D′}(X) \ {X, Y} do
5:       if I_{D′}(X, Y | T) then
6:         for Z ∈ NE_{G_X}(X) \ T do
7:           if ¬I_{D′}(X, Y | T ∪ {Z}) then
8:             LINKTOARC(G_X, X, Z)

The method PARTIALLYDIRECT(X, G_X, M̂B_{D′}(X), D′) directs a number of links in the graph fragment G_X in accordance with the direction of these in (the unknown) B**_{D′}. With DIRECTASINOTHERFRAGMENTS(G_X), the procedure first exchanges links for arcs when previously directed graph fragments unanimously dictate this. The procedure then runs through each variable Y in M̂B_{D′}(X) not adjacent to X, finds a set of nodes in M̂B_{D′}(X) that separates X from Y, and then tries to re-establish connection to Y by repeatedly expanding the set of separating nodes by a single node adjacent to X. If such a node can be found, it has to be a child of X in the completed pattern B**_{D′}, and no arc in the pattern B*_{D′} originating from X is left as a link by the procedure (see (Nielsen and Nielsen, 2006) for proofs). As before, the independence oracle I_{D′} was implemented as a χ2

test in our experiments.

In most constraint based learning methods, only the direction of arcs participating in v-structures is uncovered using independence tests, and structural rules are relied on for directing the remaining arcs afterwards. For the proposed method, however, more arcs from the completed pattern than just those of v-structures are directed through independence tests. The reason is that traditional uncovering of the direction of arcs in a v-structure X → Y ← Z relies not only on knowledge that X



and Y are adjacent, and that X and Z are not, but also on the knowledge that Y and Z are adjacent. At the point where G_X is learned, however, the connections among nodes adjacent to X are not known (they may be dictated by D′ or by B), so this traditional approach is not possible. Of course these unknown connections could be uncovered from D′ using a constraint based algorithm, but the entire point of the method is to avoid learning the complete new network.

When all graph fragments for nodes in S have been constructed, they are merged through a simple graph union in MERGEFRAGMENTS(·); no conflicts among orientations can happen due to the construction of PARTIALLYDIRECT(·). In MERGECONNECTIONS(B, G′, S) connections among nodes in V \ S are added according to the following rule: If X, Y ∈ V \ S are adjacent in B, then add the link X − Y to G′. The reason for this rule is that inseparable nodes, for which no change has been detected, are assumed to be inseparable still. However, the direction of some of the arcs may have changed in G′, wherefore we cannot directly transfer the directions in B to G′.

Following the merge, DIRECT(G′′, B, S) directs the remaining links in G′′ according to the following five rules:

1. If X ∈ V \ S, X − Y is a link, X → Y is an arc in B, and Y is a descendant of some node Z in MB_B(X) \ AD_B(X), where AD_B(X) are the nodes adjacent to X in B, through a path involving only children of X, then direct the link X − Y as X → Y.3

2. If Rule 1 cannot be applied, and if X − Y is a link, Z → X is an arc, and Z and Y are non-adjacent, then direct the link X − Y as X → Y.

3. If Rule 1 cannot be applied, and if X − Y is a link and there is a directed path from X to Y, then direct the link X − Y as X → Y.

4. If Rules 1 to 3 cannot be applied, choose a link X − Y at random, such that X, Y ∈ V \ S, and direct it as in B.

5. If Rules 1 to 4 cannot be applied, choose a link at random, and direct it randomly.

3That Rule 1 is sensible is proved in (Nielsen and Nielsen, 2006). Intuitively, we try to identify a graph fragment for X in B that can be merged with the graph fragments learned from D′. It turns out that the arcs directed by Rule 1 are exactly those that would have been learned by PARTIALLYDIRECT(X, G_X, MB_B(X), P_B).

Due to potentially flawed statistical tests, the resultant graph may contain cycles, each involving at least one node in S. These are eliminated by reversing only arcs connecting to at least one node in S. The reversal process resembles the one used in (Margaritis and Thrun, 2000): We remove all arcs connecting to nodes in S that appear in at least one cycle. We order the removed arcs according to how many cycles they appear in, and then insert them back into the graph, starting with the arcs that appear in the smallest number of cycles, breaking ties arbitrarily. When at some point the insertion of an arc gives rise to a cycle, we insert the arc as its reverse.
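The reinsertion step can be sketched as follows. Counting the cycles each removed arc participates in is assumed done by the caller (the arcs arrive pre-sorted); the adjacency-dictionary representation and names are illustrative, not the authors' code.

```python
def creates_cycle(adj, u, v):
    """Would adding arc u -> v close a directed cycle, i.e. is u
    already reachable from v in the current graph?"""
    stack, seen = [v], set()
    while stack:
        n = stack.pop()
        if n == u:
            return True
        if n not in seen:
            seen.add(n)
            stack.extend(adj.get(n, ()))
    return False

def reinsert(adj, removed_arcs):
    """Reinsert arcs (pre-sorted by ascending cycle count); an arc that
    would close a cycle goes back reversed instead."""
    for u, v in removed_arcs:
        if creates_cycle(adj, u, v):
            adj.setdefault(v, set()).add(u)   # insert as its reverse
        else:
            adj.setdefault(u, set()).add(v)
    return adj

# A -> B -> C; reinserting C -> A would close a cycle, so it comes
# back reversed as A -> C.
adj = reinsert({"A": {"B"}, "B": {"C"}}, [("C", "A")])
print(sorted(adj["A"]))   # ['B', 'C']
```

Because every reversed arc touches a node in S, the repair stays local to the re-learned region, in keeping with the method's goal of preserving the unchanged parts of B.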

We have obtained a proof of the "correctness" of the proposed method, but space restrictions prevent us from presenting it here. Basically, we have shown that, given the set-up from Section 3, if the method is started with a network equivalent to B_{P_1}, then B_i will be equivalent to B_{P_{i−k}} for all i > k. This is what we refer to as "correct" behaviour, and it means that once on the right track, the method will continue to adapt to the underlying distribution, with the delay k. The assumptions behind the result, besides that each P_i is DAG faithful, are i) the samples in D are representative of the distributions they are drawn from, ii) the rank of D is bigger than 2k, and iii) SHIFTINSTREAM(·) returns true for variable X and sample j iff P_{j−k} is not similar to P_{j−k−1} around X (see (Nielsen and Nielsen, 2006) for formal definitions and detailed proofs).

5 Experiments and Results

To investigate how our method behaves in practice, we ran a series of experiments. We constructed 100 experiments, where each consisted of five randomly generated BNs B_1, . . . , B_5 over ten variables, each having between two and five states. We made sure that B_i was structurally identical to B_{i−1} except for the connection between two randomly chosen nodes. All CPTs in B_i were kept the same as in B_{i−1}, except for the nodes with a new parent

228 S. H. Nielsen and T. D. Nielsen


set. For these we employed four different methods for generating new distributions: A estimated the probabilities from the previous network, with some added noise to ensure that no two distributions were the same. B, C, and D generated entirely new CPTs, with B drawing distributions from a uniform distribution over distributions. C drew distributions from the same distribution, but rejected those CPTs where there were no two parent configurations for which the listed distributions had a KL-distance of more than 1. D was identical to C, except for having a threshold of 5. The purpose of the latter two methods is to ensure strong probabilistic dependencies for at least one parent configuration. For generation of the initial BNs we used the method of Ide et al. (2004). For each series of five BNs, we sampled r cases from each network and concatenated them into a piecewise DAG faithful sample sequence of rank r and length 5r, for r being 500, 1000, 5000, and 10000.
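Methods C and D amount to rejection sampling of CPTs. A minimal sketch, assuming uniform draws from the probability simplex via normalized exponential variates; the helper names are ours, not the paper's:

```python
import math
import random

def kl(p, q):
    """Kullback-Leibler divergence between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def random_dist(n_states, rng):
    """Draw a distribution uniformly from the simplex (Dirichlet(1,...,1))."""
    xs = [rng.expovariate(1.0) for _ in range(n_states)]
    s = sum(xs)
    return [x / s for x in xs]

def generate_cpt(n_parent_configs, n_states, threshold, rng=None):
    """Rejection-sample a CPT: redraw until some pair of parent
    configurations has a KL-distance above `threshold` (methods C and D
    in the text use thresholds 1 and 5, respectively)."""
    rng = rng or random.Random(0)
    while True:
        cpt = [random_dist(n_states, rng) for _ in range(n_parent_configs)]
        if any(kl(p, q) > threshold
               for i, p in enumerate(cpt) for q in cpt[i + 1:]):
            return cpt
```

The rejection loop guarantees strong probabilistic dependence for at least one pair of parent configurations, which is exactly what methods C and D are designed to enforce.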

We fed our method (NN) with the generated sequences, using different delays k (100, 500, and 1000), and measured the deviance w.r.t. the KL-distance on each. As mentioned, we are unaware of other work geared towards non-stationary distributions, but for base-line comparison purposes, we implemented the structural adaptation methods of Friedman and Goldszmidt (1997) (FG) and Lam and Bacchus (1994) (LB). For the method of Friedman and Goldszmidt (1997) we tried both simulated annealing (FG-SA) and a more time consuming hill-climbing (FG-HC) for the unspecified search step of the algorithm. As these methods have not been developed to deal with non-stationary distributions, they have to be told the delay between learning. For this we used the same value k that we use as delay for our own method, as this ensures that all methods store only a maximum of k full cases at any one time. The chosen k values also correspond to those found for the experimental results reported in Friedman and Goldszmidt (1997) and Lam (1998). The only other pre-specified parameter required by our method, viz. the threshold for the χ²-tests, we set at a conventional 0.05. Each method was given the correct initial network B_1 to start its exploration.

Space does not permit us to present the results in full, but the deviance of NN, FG-SA, and

Figure 1: Deviance measures FG-SA (X-axis) vs. NN (Y-axis); one series per CPT-generation method A, B, C, D.

Figure 2: Deviance measures FG-HC (X-axis) vs. NN (Y-axis); one series per CPT-generation method A, B, C, D.

FG-HC are presented in Figures 1 and 2. NN outperformed FG-SA in 81 of the experiments, and FG-HC in 65 of the experiments. The deviance of the LB method was much worse than for any of these three. The one experiment where both FG methods outperformed the NN method substantially had r equal to 10000 and k equal to 1000, and was thus the experiment closest to the stationarity assumption of the FG and LB learners.

Studying the individual experiments more closely, it became apparent that NN is more "stable" than FG: It does not alter the network as often as FG, and when doing so, NN does not alter it as much as FG. This is positive, as besides preventing unnecessary computations, it frees the user of

Adapting Bayes Network Structures to Non-stationary Domains 229


the learned nets from a range of false positives. Furthermore, we observed that the distance between the BN maintained by NN and the generating one seems to stabilize after some time. This was not always the case for FG.

6 Discussion

The plotted experiments seem to indicate that our method is superior to existing techniques for domains where the underlying distribution is not stationary. This is especially underscored by our experiments actually favouring the existing methods through using only sequences whose rank is a multiple of the learners' k value, which means that both FG and LB always learn from data from only one partition of the sequence, unlike NN, which rarely identifies the point of change completely accurately. Moreover, score based approaches are geared towards getting small KL-scores, and thus the metric we have reported should favour FG and LB too.

Of immediate interest to us is an investigation of how our method fares when the BN given to it at the beginning is not representative of the distribution generating the first partition of the sample sequence. Also, it would be interesting to investigate the extreme cases of sequences of size 1 and those with very low ranks (r ≪ 500). Obviously, the task of testing other parameter choices and other implementation options for the helper functions needs to be carried out too.

Currently, we have some ideas for optimizing the precision of our method, including performing parameter adaptation of the CPTs associated with the maintained structure, and letting changes "cascade" by marking nodes adjacent to changed nodes as changed themselves.

In the future it would be interesting to see how a score based approach to the local learning part of our method would perform. The problem with taking this road is that it does not seem to have any formal underpinnings, as the measures score based approaches optimize are all defined in terms of a single underlying distribution: a difficulty which Friedman and Goldszmidt (1997) also allude to in their efforts to justify learning from data collections of varying size for local parts of the network.

References

W. Buntine. 1991. Theory refinement on Bayesian networks. In UAI 91, pages 52-60. Morgan Kaufmann.

L. Frey, D. Fisher, I. Tsamardinos, C. F. Aliferis, and A. Statnikov. 2003. Identifying Markov blankets with decision tree induction. In ICDM 03, pages 59-66. IEEE Computer Society Press.

N. Friedman and M. Goldszmidt. 1997. Sequential update of Bayesian network structure. In UAI 97, pages 165-174. Morgan Kaufmann.

D. Heckerman. 1998. A tutorial on learning with Bayesian networks. In M. Jordan, editor, Learning in Graphical Models, pages 301-354. Kluwer.

J. S. Ide, F. G. Cozman, and F. T. Ramos. 2004. Generating random Bayesian networks with constraints on induced width. In ECAI 04, pages 323-327. IOS Press.

F. V. Jensen, B. Chamberlain, T. Nordahl, and F. Jensen. 1991. Analysis in HUGIN of data conflict. In UAI 91. Elsevier.

W. Lam and F. Bacchus. 1994. Using new data to refine a Bayesian network. In UAI 94, pages 383-390. Morgan Kaufmann.

W. Lam. 1998. Bayesian network refinement via machine learning approach. IEEE Trans. Pattern Anal. Mach. Intell., 20(3):240-251.

D. Margaritis and S. Thrun. 2000. Bayesian network induction via local neighborhoods. In Advances in Neural Information Processing Systems 12, pages 505-511. MIT Press.

S. H. Nielsen and T. D. Nielsen. 2006. Adapting Bayes nets to non-stationary probability distributions. Technical report, Aalborg University. www.cs.aau.dk/~holbech/nielsennielsennote.ps.

J. Pearl. 1988. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann.

J. M. Pena, J. Bjorkegren, and J. Tegner. 2005. Scalable, efficient and correct learning of Markov boundaries under the faithfulness assumption. In ECSQARU 05, volume 3571 of LNCS, pages 136-147. Springer.

W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, editors. 2002. Numerical Recipes in C++: The Art of Scientific Computing. Cambridge University Press, 2nd edition.

J. Roure. 2004. Incremental hill-climbing search applied to Bayesian network structure learning. In Proceedings of the 15th European Conference on Machine Learning. Springer.

T. Verma and J. Pearl. 1990. Equivalence and synthesis of causal models. In UAI 91, pages 220-227. Elsevier.



Diagnosing Lyme disease - Tailoring patient specific Bayesian networks for temporal reasoning

Kristian G. Olesen ([email protected])
Department of Computer Science, Aalborg University
Fredrik Bajersvej 7 E, DK-9220 Aalborg East, Denmark

Ole K. Hejlesen ([email protected])
Department of Medical Informatics, Aalborg University
Fredrik Bajersvej 7 D, DK-9220 Aalborg East, Denmark

Ram Dessau ([email protected])
Clinical Microbiology, Næstved Hospital, Denmark

Ivan Beltoft ([email protected]) and Michael Trangeled ([email protected])
Department of Medical Informatics, Aalborg University

Abstract

Lyme disease is an infection evolving in three stages. It is characterised by a number of symptoms whose manifestations evolve over time. In order to correctly classify the disease it is important to include the clinical history of the patient. Consultations are typically scattered at non-equidistant points in time, and the probability of observing symptoms depends on the time since the disease was inflicted on the patient.

A simple model of the evolution of symptoms over time forms the basis of a dynamically tailored model that describes a specific patient. The time of infliction of the disease is estimated by a model search that identifies the most probable model for the patient, given the pattern of symptom manifestations over time.

1 Introduction

Lyme disease, or Lyme Borreliosis, is the most commonly reported tick-borne infection in Europe and North America. Lyme disease was named in 1977 when a cluster of children in and around Lyme, Connecticut, displayed a similar disease pattern. Subsequent studies revealed that the children were infected by a bacterium, which was named Borrelia burgdorferi. These bacteria are transmitted to humans by the bite of infected ticks (Ixodes species). The clinical manifestations of Lyme borreliosis are described in three stages and may affect different organ systems, most frequently the skin and nervous system. The diagnosis is based on the clinical findings and results of laboratory testing for antibodies (immunoglobulin M and G) in the blood. Methods for direct detection of the bacteria in the tissue or the blood are not available for routine diagnosis. Thus it may be difficult to diagnose Lyme disease, as the symptoms may not be exclusive to the disease and testing for antibodies may yield false positive or negative results.

In the next section we give an overview of Lyme disease, and in Section 3 a brief summary of two earlier systems is given. In Section 4 we suggest an approach that overcomes the difficulties by modelling multiple consultations spread out at non-equidistant points in time. This approach assumes that the time of infection is known, but this is rarely the case. We aim for an estimation of this quantity; a number of hypothesised models are generated and the best one is


Organ system             Stage 1                 Stage 2                  Stage 3

Incubation period        one week (few days      one week to a           months to years
                         to one month)           few months

Skin                     Erythema migrans        Borrelial lymphocytoma   Acrodermatitis chronica
                                                                          atrophicans

Central nervous system                           Early neuroborreliosis   Chronic neuroborreliosis

Joints                                                                    Lyme arthritis

Heart                                            Lyme carditis

Average sensitivity of   50%                     80%                      100%
antibody detection
(IgG or IgM)

Table 1: Clinical manifestations of Lyme disease.

identified based on the available evidence. In Section 5 a preliminary evaluation of the approach is described, and finally we conclude in Section 6.

2 Lyme Disease

According to the European case definitions (http://www.oeghmp.at/eucalb/diagnosis case-definition-outline.html), Lyme disease and its manifestations are described in three stages, as shown in Table 1.

Erythema migrans is the most frequent manifestation. The rash, which migrates from the site of the tick bite, continuously grows in size. Erythema migrans is fairly characteristic and the diagnosis is clinical. Laboratory testing is useless, as only half of the patients are positive when the disease is localized to the skin only. Neuroborreliosis is less frequent; in Denmark the incidence is around 3/100,000 per year. The IgM or IgG may be positive in 80% of patients with neuroborreliosis, but if the duration of the clinical disease is more than two months, then 100% of patients are positive. Figure 1 shows that not only the probability of being positive, but also the level of antibodies, varies as a function of time. Thus the laboratory measurement of IgM and IgG antibodies depends both on the duration of the clinical disease and on the dissemination of the infection to organ systems other than the skin. The incidence of the

other manifestations of Lyme disease is rarer. The diagnosis of Lyme arthritis is especially difficult, as the symptoms are similar to arthritis due to other causes and because the disease is rare.

Figure 1: IgM and IgG development after infection of Lyme disease (antibody level as a function of time since the bite, from 3 weeks to 2 years).

The epidemiology is complex. For example, different age groups are at different risks, there is a large seasonal variation due to variations in tick activity (Figure 2), the incubation period may vary from a few days to several years, and some clinical manifestations are very rare. There are large variations in incubation time and clinical progression. Some patients may have the disease starting with a rash (erythema migrans) and then progressing to neuroborreliosis, but most patients with neuroborreliosis are not aware of a preceding rash. The description of the disease and its progression is primarily based on (Smith et al., 2003) and (Gray et al., 2002).

There are also large individual variations in



Figure 2: Comparison of seasonal occurrence of erythema migrans (EM, n = 566) and neuroborreliosis (NB, n = 148) in northeastern Austria, 1994-1995 (Gray et al., 2002).

the behaviour of the patient and the physician, e.g. about when to visit the doctor, when to take the laboratory test, etc. Possible repeated visits to the doctor and repeated laboratory testing are performed at variable intervals. Joint pain and arthritis are a very common problem in the general population, but are only very rarely due to Lyme borreliosis. However, many patients with arthritis are tested for Lyme borreliosis (about 30% of all patient samples), and as the prior probability of Lyme disease is low, the posterior probability of Lyme arthritis is also low in spite of a positive test result. Laboratories in some countries attempt to improve the diagnosis of Lyme disease by using a two-step strategy in antibody detection, setting a low cut-off on the screening test and then performing a more expensive and complex test to confirm or discard the result of the primary test. As the second test uses the same principle of indirect testing for antibody reactions to different Borrelial antigens, the two tests are highly correlated, and the information gain is limited. Thus, the basic problem of false positive or false negative results is not solved. It was therefore found important to develop a decision support system to assist the clinician by calculating the posterior probability of Lyme disease to guide the choice of treatment. An evidence based clinical diagnosis is supported by incorporating clinical and laboratory data into the model.

To capture the complex patterns of the disease, a model must incorporate the relevant clinical evidence, including estimation of the temporal aspects: the time since the tick bite, the duration of clinical disease, and the development of the antibody response.

3 Existing Models for Diagnosis of Lyme Disease

We have knowledge of two models that have been developed to assist the medical practitioner in the diagnosis of Lyme disease (Dessau and Andersen, 2001; Fisker et al., 2002). Both models are based on Bayesian networks (Jensen, 2001), and as the latter is a further development of the former, they share most of the variables.

The models include a group of variables describing general information and knowledge, including the age and gender of the patient, the patient's exposure to ticks (is the patient a forest worker or orienteer), the month of the year, and whether the patient recalls a tick bite. The information is used to establish whether or not the conditions for a Lyme disease infection have been present. This part of the model influences the hypothesis variable, Borrelia.

Another section of the models describes clinical manifestations. The findings are influenced by the hypothesis variable, but may be caused by other reasons. Similarly, a section of variables describing laboratory findings may be caused by either Lyme disease or other disorders.

The structure of the model by Dessau and Andersen (2001) is shown in Figure 3. In this model the three stages of Lyme disease are explicitly represented as three copies of the hypothesis and findings sections. The conditional probabilities reflect the temporal evolution of symptoms, such that e.g. neuroborreliosis is more probable in stage two than in stages one and three. A problem with this approach is that the progression of the disease is uncertain, and consequently it is unclear which copy of e.g. the finding



Figure 3: Structure of the Bayesian network by Dessau and Andersen. [Nodes: Background; Borrelia1-3; Pathology1-3; Immunology1-3; Clinical findings; Laboratory findings; Borrelia.]

variables to use. This was resolved by combining variables from each stage into single variables that describe the observable clinical manifestations and laboratory findings.

The progress of Lyme disease was modelled by a variable "duration" (not shown in the figure), indicating the time a patient has experienced the symptoms. It does not distinguish between the durations of the different symptoms, but models the total duration of illness. If the duration is short, it indicates a stage one Lyme disease, and so forth.

Dessau and Andersen's model is a snapshot of a patient at a specific point in time, and it does not involve temporal aspects such as means for entering multiple pieces of evidence on the same variables as a result of repeated consultations.

The model by Fisker et al. (2002) aims to include these issues. The idea is to reflect the clinical practice of examining a patient more than once if the diagnosis is uncertain. This gives the medical practitioner the opportunity to observe whether the development of the disease corresponds to the typical pattern. The model mainly includes the same variables, but the structure of the network is different (see Figure 4).

Instead of triplicating the pathology for the different stages of the disease, the temporal aspect was incorporated in the model by defining the states of manifestation variables as time intervals. This approach was inspired by (Arroyo-Figueroa and Sucar, 1999), which introduced Temporal Nodes Bayesian Networks as an alternative to dynamic Bayesian networks.

Figure 4: Structure of the Bayesian network by Fisker et al. [Nodes: Background; Borrelia; Clinical findings; Old laboratory findings; Laboratory findings.]

The time intervals represent the time since the symptom was first observed, and both the length and the number of time intervals were tailored individually for each node. With this approach it became possible to enter "old" evidence into the model. For example, if erythema migrans was observed on, or recalled by, the patient ten weeks ago and lymphocytoma is a current observation, the EM variable is instantiated to the state 8 weeks - 4 months and the lymphocytoma variable is instantiated to < 6 weeks.

The model also incorporated the ability to enter duplicate results of laboratory tests. This option was included by repeating the laboratory findings and including a variable specifying the time between the result of the old test and the current test (not shown in the figure).

The two models basically use the same features, but differ in the modelling of the progression of the disease. Further, the model by Fisker et al. takes the use of data from previous consultations into consideration.

Figure 5: The structure of the tailored patient specific model is composed of background knowledge and consultation modules. [Background nodes: Age, Gender, Exposure, Tick Activity, Month of Bite, Recall Bite, Tick Bite, Lyme Disease; each of the N consultation modules contains a Lyme disease node, IgM/IgG nodes with test and false-positive nodes, and manifestation nodes (EM, Arthritis, Early Neuro, Carditis, Lymphocytoma, Chronic Neuro, ACA) with associated "Other" cause nodes.]

Both models implement the possibility of reading the probability of having Lyme disease in one of the three stages, but with two different approaches. The model by Fisker et al. models the stages as states in a single variable, whereas the model by Dessau and Andersen models the stages explicitly as causally dependent variables.

The main problem in both models is the temporal progression of Lyme disease. Dessau and Andersen's model gives a static picture of the current situation that only takes the total duration of the period of illness into consideration. The model by Fisker et al. is more explicit in the modelling of the progress of the disease, by letting states denote time intervals since a symptom was first observed and by duplicating variables for laboratory findings. This enables the inclusion of findings from the past, but is still limited to single observations for clinical evidence and two entries for laboratory tests.

Besides the difficulties in determining the structure of the models, a considerable effort was required to specify the conditional probabilities.

In the following section we propose a further elaboration of the modelling by introducing continuous time and an arbitrary number of consultations.

4 Tailoring patient specific models

We aim for a Bayesian network that incorporates the full clinical history of individual patients. The model is composed of modules, where each module describes a consultation. The problem is that consultations may appear at random points in time and that the conditional probabilities for the symptoms at each consultation vary, depending on the time that has passed since the tick bite. Therefore frameworks such as dynamic Bayesian networks and Markov chains do not apply. We introduce continuous time, inspired by (Nodelman et al., 2002). The proposed approach involves a model for the conditional probabilities, but before reaching that point we determine the structure of the model.

4.1 Structure

The structure of the patient specific model is illustrated in Figure 5. At the first consultation, the background knowledge module is linked to a generic consultation module consisting of the disease node, clinical findings and laboratory findings. In order to keep focus on the overall modelling technique, we deliberately kept the model simple. The consultation module is a naive Bayes model, and subsequent consultations are included in the model by extending it with a new consultation module. This process can be repeated for an arbitrary number of consultations. The consultation modules are connected only through the disease nodes. This is a quite strong assumption that may be debated, but it simplifies things and keeps the complexity of the resulting model linear in the number of consultations. Less critical is that the disease nodes do not take the stage of the disease into account; we consider that the classification into stages is mostly for descriptive purposes, but it could be modeled as stages in



the disease node, although this would complicate the quantitative specification slightly.
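The structure just described (a background module feeding the first disease node, and naive Bayes consultation modules connected only through their disease nodes) can be sketched as an arc list. All node names are illustrative, and the chaining of consecutive disease nodes is our reading of Figure 5, not spelled out in the text:

```python
def build_structure(n_consultations):
    """Return the arcs of a tailored patient specific model: background
    variables point at the first disease node, each consultation is a
    naive Bayes module (disease node -> findings), and consultation
    modules connect only through the chain of disease nodes."""
    background = ["Age", "Gender", "Exposure", "TickActivity",
                  "MonthOfBite", "RecallBite"]
    findings = ["EM", "Lymphocytoma", "EarlyNeuro", "Carditis",
                "Arthritis", "ACA", "IgMTest", "IgGTest"]

    arcs = [(b, "LymeDisease_1") for b in background]
    for i in range(1, n_consultations + 1):
        disease = f"LymeDisease_{i}"
        arcs += [(disease, f"{f}_{i}") for f in findings]
        if i > 1:  # inter-consultation link through disease nodes only
            arcs.append((f"LymeDisease_{i - 1}", disease))
    return arcs
```

Because each consultation only adds one module and one connecting arc, the size of the network grows linearly in the number of consultations, matching the complexity argument in the text.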

4.2 Conditional probability tables

As the symptoms of Lyme disease vary over time, we assume a simple model for the conditional probabilities in the model. We approximate the conditional probabilities for the symptoms by a continuous function, composed as a mixture of two sigmoid functions. This is a somewhat arbitrary choice; other models may be investigated, but for the present purpose this simple model suffices. The functions describing the conditional probabilities are based on empirical knowledge. From existing databases we extract the probabilities for the various symptoms at different times since the infection was inflicted on the patient. The time of infliction is usually not known, but around one third of the patients recall a tick bite. In other cases there is indirect evidence for the time of the bite, such as limited periods of exposure.

An example is shown in Figure 6, where the resulting functions for the probability of showing the symptom EM up to 100 days after the infection are drawn. As can be seen, the age of the patient has been incorporated as a parameter to the function. Thus, we can compute the conditional probabilities for a consultation module directly, provided that the time since the infection is known. This is rarely the case.
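A rough sketch of such a two-sigmoid mixture follows; all midpoints, steepness and peak values below are invented for illustration, and in the paper the corresponding parameters would be estimated per age group from data:

```python
import math

def sigmoid(t, midpoint, steepness):
    """Logistic function centred at `midpoint`."""
    return 1.0 / (1.0 + math.exp(-steepness * (t - midpoint)))

def p_symptom(t_days, rise_mid, fall_mid, peak=0.9, steepness=0.15):
    """P(symptom | Lyme disease, t) as a product of a rising sigmoid
    (symptom develops) and a falling sigmoid (symptom resolves).
    Parameters are illustrative, not fitted values."""
    return (peak
            * sigmoid(t_days, rise_mid, steepness)
            * (1.0 - sigmoid(t_days, fall_mid, steepness)))
```

With a rise midpoint of 10 days and a fall midpoint of 60 days, the probability starts low, peaks between the two midpoints, and declines again, which is the qualitative shape of the EM curves in Figure 6.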

4.3 Determining the time of infection

The purpose of the Bayesian model is to calculate the probability of Lyme disease based on the clinical findings and possible laboratory evidence. As described in the previous section, the conditional probabilities of the symptoms and the serology are determined by the time since the tick bite. Thus, in order to construct the complete model it is necessary to know the time of infliction.

The clinical history for a given patient is illustrated in Figure 7. Different time intervals are shown in the figure. The interval t1 is the time from the infection to the first consultation, t2 is the interval between the first and the second consultation, and so on.

Figure 6: The probability of experiencing erythema migrans (EM) over time, modelled as continuous functions. The development of EM depends on the age of the patient (age groups 0-20, 21-70, >70).

Figure 7: Time line representing the consultations that form the clinical history.

When the patient consults the medical practitioner, the time of the consultation is registered. Therefore, the intervals between the consultations are known, whereas the interval t1, from the infection to the first consultation, is unknown and must be estimated. It is often difficult to determine t1, because a patient may not recollect a tick bite, and because symptoms may not appear until late in the progress of the disease.

As the conditional probabilities for the consultation modules vary depending on t1, different values of t1 will result in Bayesian networks with different quantitative specifications. Hence, estimation of t1 is a crucial part of the construction of the patient specific model. We tackle this problem by hypothesising a number of different models, and we identify those that best match the evidence. Each model includes all consultations for the patient. Thus, we have a fixed structure with different instantiations of



the conditional probabilities for different values of t1.

The probability of a model given the evidence can be calculated by using Bayes' rule:

P(M | e) = P(e | M) · P(M) / P(e)    (1)

P(e | M) can be extracted from the hypothesised Bayesian network as the normalization constant. The probability of the evidence, P(e), is not known, but is a model independent constant. The prior probability of the model, P(M), is assumed to be equal for all models, and is therefore inversely proportional to the number of models. Thus, P(M | e) can be expressed by

P(M | e) ∝ P(e | M)    (2)

The probability of Lyme disease, P(Ld), can be read from each model and will, of course, vary depending on the model M. By using the fundamental rule, the joint probability P(Ld, M | e) can be obtained as

P(Ld, M | e) = P(Ld | M, e) · P(M | e)    (3)

where P(M | e) can be substituted by using equation 2:

P(Ld, M | e) ∝ P(Ld | M, e) · P(e | M)    (4)

The probability of Lyme disease given the evidence can now be found by marginalizing over M:

P(Ld | e) = Σ_{Tn = Tstart}^{Tend} P(Ld | M_{Tn}, e) · P(e | M_{Tn})    (5)

Tstart and Tend represent an interval that surrounds the most probable value of the time since the bite, t1, as illustrated in Figure 8.

The interval is chosen due to computational considerations, and because the assumption that all models are equally probable is obviously not valid. Alternatively, we could simply choose the model with the highest probability.
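Equation 5 gives P(Ld | e) only up to proportionality. The windowed model averaging can be sketched as below, with an explicit normalization by the summed evidence likelihoods (our addition, since the proportionality constant is left implicit in the text); the dictionary interface and names are hypothetical:

```python
def p_disease_given_evidence(candidate_models, window=25):
    """Approximate P(Ld | e) by model averaging over hypothesised
    infection times t1 (equation 5).  `candidate_models` maps each t1 to
    a pair (p_e_given_m, p_ld_given_m_e): the evidence likelihood read
    off the network's normalization constant, and the posterior of Lyme
    disease in that network.  Only a window around the most probable t1
    is summed, as in the text."""
    best_t1 = max(candidate_models, key=lambda t: candidate_models[t][0])
    ts = [t for t in candidate_models
          if best_t1 - window / 2 <= t <= best_t1 + window / 2]
    weight = sum(candidate_models[t][0] for t in ts)
    return sum(candidate_models[t][0] * candidate_models[t][1]
               for t in ts) / weight
```

In the paper's setting, the candidate models would be the networks instantiated for each t1 in 0-800 days, with a 25-day window around the most probable one.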

Figure 8: The probability of the different models, P(M | e), as a function of days since infection; Tstart and Tend delimit the window around the most probable model.

The estimation of the most probable model has been implemented by evaluating all models for t1 in the interval 0-800 days. The upper limit of 800 days is based on the assumption that the general development pattern of Lyme disease has stabilized by this time and the symptoms are chronic. The size of the interval Tstart - Tend has, somewhat arbitrarily, been set to 25 days.

5 Preliminary evaluation

In a preliminary test of the proposed approach, the results were compared to the outcome of the model by Dessau and Andersen. The evaluation was focused on the structural problems in the models. The cases used for the evaluation are based on a survey performed by Ram Dessau, in which 2414 questionnaires were collected from the Danish hospitals in Aalborg and Herlev. The overall picture of the comparison is that the proposed model in general estimates a higher probability of Lyme disease than the model by Dessau and Andersen.

In order to evaluate the effect of the incorporation of the clinical history in the time sliced model, a number of typical courses of Lyme disease have been evaluated both as single consultations, without utilizing the clinical history, and as consultations utilizing the previous history. At the same time, each of the consultations has been evaluated in the model by Dessau and Andersen in order to evaluate how it handles the later stage symptoms when the clinical history is not incorporated.

This informal study indicates that the time sliced model behaves as intended when the clinical history is incorporated. The estimated probability of Lyme disease is gradually increased for each added consultation, as the general development pattern of the disease is confirmed in typical courses.

In the estimates from the model by Dessau and Andersen, the probability of Lyme disease decreases as the later stage symptoms are observed. This reasoning does not seem appropriate when it is known from earlier consultations that the clinical history points toward a borrelial infection. The model by Dessau and Andersen is not designed to incorporate the clinical history, but from the results it can be seen that this parameter is important in order to provide reasonable estimates of the probability of Lyme disease.

6 Conclusion

Temporal reasoning is an integral part of medical expertise. Many decisions are based on prognoses or diagnoses where symptoms evolve over time and the effects of treatments typically become apparent only with some delay. A common scenario in general practice is that a patient attends the clinic at different times on a non-regular basis.

Lyme disease is an infection characterised by a number of symptoms whose manifestations evolve over time. In order to correctly classify the disease it is important to include the clinical history of the patient. We have proposed a method to include consultations scattered over time at non-equidistant points. A description of the evolution of symptoms over time forms the basis of a dynamically tailored model that describes a specific patient. The time of infliction of the disease is estimated by a model search that identifies the most probable model for the patient given the pattern of symptoms over time.

Based on a preliminary evaluation it can be concluded that the proposed method for handling non-equidistant time intervals in order to incorporate the clinical history works satisfactorily. In cases where the clinical history confirmed the general development pattern of Lyme disease, the estimated probability of the disease was gradually increased from consultation to consultation, whereas it was reduced when the history did not confirm the pattern.

We conclude that the proposed method seems viable for temporal domains where changing conditional probabilities can be modeled by continuous functions.

References

[Arroyo-Figueroa and Sucar 1999] Gustavo Arroyo-Figueroa and Luis Enrique Sucar. 1999. A Temporal Bayesian Network for Diagnosis and Prediction. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence, pages 13-20. ISBN: 1-55860-614-9.

[Dessau and Andersen 2001] Ram Dessau and Maj-Britt Nordahl Andersen. 2001. Beslutningsstøtte med Bayesiansk netværk - En model til tolkning af laboratoriesvar [Decision support with a Bayesian network - a model for interpreting laboratory results]. Technical report, Aalborg Universitet. In Danish.

[Fisker et al. 2002] Brian Fisker, Kristian Kirk, Jimmy Klitgaard, and Lars Reng. 2002. Decision Support System for Lyme Disease. Technical report, Aalborg Universitet.

[Gray et al. 2002] J.S. Gray, O. Kahl, R.S. Lane, and G. Stanek. 2002. Lyme Borreliosis: Biology, Epidemiology and Control. Cabi Publishing. ISBN: 0-85199-632-9.

[Jensen 2001] Finn V. Jensen. 2001. Bayesian Networks and Decision Graphs. Springer Verlag. ISBN: 0-387-95259-4.

[Nodelman et al. 2002] Uri Nodelman, Christian R. Shelton, and Daphne Koller. 2002. Continuous Time Bayesian Networks. In Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence, pages 378-387. ISBN: 1-55860-897-4.

[Smith et al. 2003] Martin Smith, Jeremy Gray, Marta Granstrom, Crawford Revie, and George Gettinby. 2003. European Concerted Action on Lyme Borreliosis. http://vie.dis.strath.ac.uk/vie/LymeEU/. Read 04-12-2003.

238 K. G. Olesen, O. K. Hejlesen, R. Dessau, I. Beltoft, and M. Trangeled


Predictive Maintenance using Dynamic Probabilistic Networks

Demet Özgür-Ünlüakın    Taner Bilgiç

Department of Industrial Engineering
Bogazici University
Istanbul, 34342, Turkey

Abstract

We study dynamic reliability of systems where system components age with a constant failure rate and there is a budget constraint. We develop a methodology to effectively prepare a predictive maintenance plan for such a system using dynamic Bayesian networks (DBNs). The DBN representation allows monitoring the system reliability in a given planning horizon and predicting the system state under different replacement plans. When the system reliability falls below a predetermined threshold value, component replacements are planned such that the maintenance budget is not exceeded. The decision of which component(s) to replace is an important issue since it affects future system reliability and consequently the next time to do a replacement. Component marginal probabilities given the system state are used to determine which component(s) to replace. Two approaches are proposed for calculating the marginal probabilities of components. The first is a myopic approach where the only evidence belongs to the current planning time. The second is a look-ahead approach where all the subsequent time intervals are included as evidence.

1 Introduction

Maintenance can be performed either after a breakdown takes place or before the problem arises. The former is reactive whereas the latter is proactive. Reactive maintenance is appropriate for systems where a failure does not result in serious consequences. Decision-theoretic troubleshooting belongs to this category. Proactive or planned maintenance can be further classified as preventive and predictive (Kothamasu et al., 2006). These differ in their scheduling behaviour: preventive maintenance performs maintenance on a fixed schedule whereas in predictive maintenance the schedule is adaptively determined. Reliability-centered maintenance is predictive maintenance where reliability estimates of the system are used to develop a cost-effective schedule.

Decision-theoretic troubleshooting, which balances costs and likelihoods to find the best action, was first studied by Kalagnanam and Henrion (1988). Heckerman et al. (1995) extend it to the context of Bayesian networks (Pearl, 1988). A similar troubleshooting problem, where multiple but independent faults are allowed, is addressed in (Srinivas, 1995). More recent studies are mostly due to researchers from the SACSO (Systems for Automated Customer Support Operations) project. By assuming a single fault, independent actions and constant costs, and making use of a simple model representation technique (Skaanning et al., 2000), they show that the simple efficiency ordering yields an optimal sequence of actions (Jensen et al., 2001). Langseth and Jensen (2001) present two heuristics for handling dependent actions and conditional costs. Langseth and Jensen (2003) provide a formalism that combines the methodologies used in reliability analysis and decision-theoretic troubleshooting. Koca and Bilgic (2004) present a generic decision-theoretic troubleshooter to handle troubleshooting tasks incorporating questions, dependent actions, conditional costs, and any combinations of these. Decision-theoretic troubleshooting has always


been studied as a static problem, with the objective of reaching a minimum-cost action plan.

On the reliability analysis side, fault diagnosis finds its roots in (Vesely, 1970). Kothamasu et al. (2006) review the philosophies and techniques that focus on improving reliability and reducing unscheduled downtime by monitoring and predicting machine health. Torres-Toledano and Sucar (1998) develop a general methodology for reliability modelling of complex systems based on Bayesian networks. Welch and Thelen (2000) apply DBNs to an example from the dynamic reliability community. Weber and Jouffe (2003, 2006) present a methodology that helps developing DBNs and Dynamic Object Oriented Bayesian Networks (DOOBNs) from available data to formalize the reliability of complex dynamic models. Muller et al. (2004) propose a methodology to design a prognosis process taking into account the behavior of environmental events. They develop a probabilistic model based on the translation of a preliminary process model into a DBN. Bouillaut et al. (2004) use causal probabilistic networks for the improvement of the maintenance of railway infrastructure. Weber et al. (2004) use DBNs to model the dependability of systems taking into account degradations and failure modes governed by exogenous constraints.

All of the studies related to reliability analysis using DBNs are descriptive. The dynamic problem is represented with DBNs and the outcome of the analysis is how the system reliability behaves in time. The impact of maintaining an element at a specific time on this behaviour is also reported in some of them. However, optimization of maintenance activities (i.e., finding a minimum-cost plan) is not considered, and this is the main motivation of our paper. Maintenance is expensive and critical in most systems. Unexpected breakdowns are not tolerable. That is why planning maintenance activities intelligently is an important issue: it saves money, service time and also lost production time. In this study, we try to optimize maintenance activities of a system where components age with a constant failure rate and there is a budget constraint. We develop

a methodology to effectively prepare a predictive maintenance plan using DBNs. One can argue that the failure of the system and its associated costs can be modelled using influence diagrams or limited memory influence diagrams (LIMIDs). But we propose a way of representing the problem as an optimization problem first and then use only DBNs for fast inference under some simplifying assumptions.

The rest of the paper is organized as follows: In Section 2, the problem is defined and in Section 3, dynamic Bayesian network based models are proposed as a solution to the problem defined. Two approaches are presented for scheduling maintenance plans. Numerical results are given in Section 4. Finally, Section 5 gives conclusions and points to future work.

2 Problem Definition

The problem we take up can be described as follows: There is a system which consists of several components. We observe the system in discrete epochs and assume that system reliability is observable. System reliability is a function of the interactions of the system components, which are not directly observable. We presume that the reliability of the system should be kept over a predetermined threshold value in all periods. This is reasonable in mission-critical systems where the failure of the system is a very low probability event due to built-in redundancy and other structural properties. Therefore, we do not explicitly model the case where the system actually fails. Components age with a constant rate and it is possible to replace components in any period. Once replaced, the components work at their full capacity. There is a given maintenance budget for each period, which the total replacement cost in that period cannot exceed. Our aim is to minimize the total maintenance cost in a planning horizon such that the reliability of the system never falls below the threshold and the maintenance budget is not exceeded. Furthermore, we make the following assumptions:

(i) The lifetime of any component in the system is exponentially distributed. That is, the failure rate

240 D. Özgür-Ünlüakin and T. Bilgiç


of any component is constant for all periods.

(ii) All other conditional probability distributions used in the representation are discrete.

(iii) All components and the system have two states ("1" is the working state, "0" is the failure state).

(iv) Components can be replaced at the beginning of each period. Once they are replaced, their working-state probability (i.e., their reliability) becomes 1 in that period.

The problem can be expressed as a mathematical optimization problem.

The model parameters are:

i : index of components
t : index of time periods
λi : failure rate of component i
Ri1 : initial reliability of component i
cit : cost of replacing component i in period t
Bt : available maintenance budget in period t
L : threshold value of system reliability
f(·) : function mapping component reliabilities Rit to system reliability R0t

The decision variables are:

Xit = 1 if component i is replaced in period t, 0 otherwise
Rit : reliability of component i in period t
R0t : reliability of the system in period t

The Predictive Maintenance (PM) model can be formulated mathematically as follows:

Z(PM) = min Σ_{t=1..T} Σ_{i=1..n} cit Xit    (1)

subject to

Σ_{i=1..n} cit Xit ≤ Bt,   ∀t    (2)

R0t ≥ L,   ∀t    (3)

Rit = (1 − Xit) e^(−λi) Ri,t−1 + Xit,   ∀i, t    (4)

R0t = f(R1t, R2t, ..., Rnt),   ∀t    (5)

Xit ∈ {0, 1},   ∀i, t    (6)

0 ≤ Rit ≤ 1,   ∀i, t    (7)

The objective function (1) aims to minimize the total component replacement cost. Constraint set (2) represents the budget constraints. Constraint set (3) guarantees that the system reliability in each period is greater than the given threshold value. Constraints in (4) ensure that if a component is replaced its reliability becomes 1; otherwise it decreases with the corresponding failure rate. The system reliability in each period is calculated by constraint (5). Finally, constraints (6) and (7) define the bounds on the decision variables. In general, solving this problem may be quite difficult. The difficulty lies in constraint sets (4) and (5). Constraint set (4) defines a non-linear relation of the decision variables whereas constraint set (5) is much more generic. In fact, the system reliability at time t can be a function of the whole history of the system. It is this set of constraints and the implied relationships of constraint set (4) that we represent using a dynamic Bayesian network.
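For intuition, the aging recursion in constraint (4) can be traced directly: with Xit = 0 the reliability decays by the factor e^(−λi), and with Xit = 1 it resets to 1. The following is a minimal sketch (plain Python; the failure rate and the replacement period chosen here are arbitrary illustration values, not from the paper):

```python
import math

def next_reliability(r_prev, lam, replaced):
    """Constraint (4): R_it = (1 - X_it) * exp(-lam_i) * R_{i,t-1} + X_it."""
    x = 1 if replaced else 0
    return (1 - x) * math.exp(-lam) * r_prev + x

lam = 1.0 / 40.0   # illustrative failure rate (MTTF of 40 periods)
r = 1.0            # component starts as new
traj = []
for t in range(1, 6):
    r = next_reliability(r, lam, replaced=(t == 3))  # replace in period 3
    traj.append(round(r, 4))

assert traj[2] == 1.0                        # replacement resets reliability to 1
assert traj[3] == round(math.exp(-lam), 4)   # afterwards it decays again
```

The trajectory decays geometrically, jumps back to 1 in the replacement period, and then resumes the same decay, which is exactly the peak-and-decay shape seen in the reliability figures later in the paper.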

Further assumptions are imposed in order to simplify the above problem:

(v) Replacement costs of all components in any period are the same and they are all normalized at one.

(vi) The available budget in any period is normalized at one.

These assumptions indicate that in any period only one replacement can be planned, and the objective function we are trying to minimize becomes the total number of replacements in a planning horizon.

3 Proposed Solution

The mathematical problem may be solved analytically or numerically once constraint set (5) is made explicit. However, as the causal relations between the components and the system (represented with constraint set (5) in the problem formulation) become more complex, it gets difficult to represent and solve the problem mathematically. We represent constraint set (5) using dynamic Bayesian networks (DBNs) (Murphy, 2002). A DBN is an extended Bayesian network (BN) which includes a temporal dimension. BNs are a widely used formalism for representing uncertain knowledge. The main features of the formalism are a graphical encoding of a set of conditional independence relations and a compact way of representing a joint probability distribution between random variables (Pearl, 1988). BNs have the power to represent causal relations between components and the system using conditional probability distributions. With DBNs it is possible to analyse the process over a large planning horizon.

Predictive Maintenance using Dynamic Probabilistic Networks 241

Figure 1: DBN representation of a system with 2 components (three time slices, each with component nodes At, Bt and system node St)

Figure 1 illustrates a DBN representation of a system with 2 components, A and B. Solid arcs represent the causal relations between the components and the system node whereas dashed arcs represent the temporal relation of the components between two consecutive time periods. Note that this representation is Markovian whereas DBNs can represent more general transition structures. Temporal relations are the transition probabilities of components due to aging. Since the lifetime of any component in the system is exponentially distributed, the transition probabilities are constant because of the memoryless property of the exponential distribution, given that the time intervals are equal. The transition probability table for a component with two states ("1" is the working state, "0" is the failure state) is given in Table 1.

Table 1: Transition probability for component i

                  Comp(t) = 1    Comp(t) = 0
Comp(t + 1) = 1   e^(−λi)        0
Comp(t + 1) = 0   1 − e^(−λi)    1
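The memoryless property mentioned above can be checked directly: chaining the Table 1 transition over t unit periods reproduces the exponential survival curve e^(−λi t). A small sketch (plain Python; the MTTF of 40 periods matches scenario 1 later in the paper, the horizon of 12 periods is arbitrary):

```python
import math

def survive_one_period(p_work, lam):
    """Apply the Table 1 transition once: a working component stays working
    with probability exp(-lam); a failed component stays failed."""
    return p_work * math.exp(-lam)

lam, t = 1.0 / 40.0, 12   # MTTF of 40 periods, 12 unit-period steps
p = 1.0                   # component starts as new
for _ in range(t):
    p = survive_one_period(p, lam)

# Chained unit-period transitions equal the closed-form survival probability.
assert abs(p - math.exp(-lam * t)) < 1e-12
```

This is why equal time intervals make the DBN transition tables time-invariant: each slice reuses the same matrix.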

The DBN representation allows monitoring the system reliability in a given planning horizon and predicting the system state under different replacement schedules. When the system reliability falls below a predetermined threshold value, a component replacement is planned. The decision of which component to replace is an important issue since it affects future system reliability and consequently the next time to do a replacement. As in decision-theoretic troubleshooting (Heckerman et al., 1995), marginal probabilities of components given the system state are used as efficiency measures of components in each period when a replacement is planned. Let Sk denote the system state in period k and Cik denote the state of component i in period k. The following algorithm summarizes our DBN approach:

(i) Initialize t = 1
(ii) Infer the system reliability P(Sk = 1) for t ≤ k ≤ T
(iii) Check if P(Sk = 1) ≤ L for some k
(iv) If P(Sk = 1) ≥ L for all k, then stop.
(v) Else prepare a replacement plan for period k:
    (a) Calculate Pik = P(Cik = 0 | Sk = 0) for all i
    (b) i* = arg max_i Pik
    (c) Update the reliability of i* in period k to 1: P(Ci*k = 1) ← 1
    (d) Update t = k + 1
(vi) If t > T, then stop.
(vii) Else continue with step (ii)

Note that P(St = 1) = R0t and P(Cit = 1) = Rit. This is a myopic approach, since the only evidence used in calculating the marginal probabilities in step (v.a) belongs to the system state at the current planning time. An alternative approach is to take into account future information, which can be transmitted by the transition probabilities of the components. This is done by entering evidence on the system node from k + 1 to T as Sk+1:T = 0. We call this the look-ahead approach. The algorithm is the same as above except for step (v.a), which is replaced as follows in the look-ahead approach:

(v.a) Calculate Pik = P(Cik = 0 | Sk+1:T = 0) for all i, where Sk+1:T denotes Sk+1, ..., ST.
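To make the planning loop concrete, the myopic variant can be sketched without a DBN toolbox: for this two-component network, exact inference reduces to summing the system CPT over the independent component marginals. This is a hedged Python sketch, not the paper's Matlab/BNT implementation; the CPT values for P(S = 1 | A, B) below are hypothetical (the paper does not list its CPT), and the look-ahead variant would instead condition step (v.a) on Sk+1:T = 0.

```python
import math

def system_reliability(r, cpt):
    # P(S_t = 1): with no evidence entered, the two components evolve
    # independently, so we sum the assumed CPT over the joint component states.
    return sum((r[0] if a else 1 - r[0]) * (r[1] if b else 1 - r[1]) * cpt[(a, b)]
               for a in (0, 1) for b in (0, 1))

def posterior_fail(i, r, cpt):
    # Step (v.a), myopic variant: P(C_it = 0 | S_t = 0).
    joint = sum((r[0] if a else 1 - r[0]) * (r[1] if b else 1 - r[1]) * (1 - cpt[(a, b)])
                for a in (0, 1) for b in (0, 1) if (a, b)[i] == 0)
    return joint / (1 - system_reliability(r, cpt))

def myopic_plan(lams, cpt, L, T):
    """Return [(period, component)] replacements chosen by the myopic approach."""
    r = [1.0, 1.0]                       # both components start as new
    plan = []
    for t in range(1, T + 1):
        r = [ri * math.exp(-lam) for ri, lam in zip(r, lams)]   # aging step
        if system_reliability(r, cpt) < L:
            # replace the component most likely to have failed given S_t = 0
            i_star = max((0, 1), key=lambda i: posterior_fail(i, r, cpt))
            r[i_star] = 1.0              # assumption (iv): reliability resets to 1
            plan.append((t, i_star))
    return plan

# Hypothetical system CPT P(S=1 | A, B) and scenario-1-like parameters.
cpt = {(1, 1): 0.95, (1, 0): 0.70, (0, 1): 0.60, (0, 0): 0.05}
plan = myopic_plan((1 / 40, 1 / 40), cpt, 0.50, 100)
```

With equal failure rates the two components are interchangeable up to the asymmetry of the assumed CPT, mirroring the observation in Section 4 that equal MTTFs make both approaches produce the same plan.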



4 Numerical Results

The DBN algorithm is coded in Matlab and uses the Bayes Net Toolbox (BNT) (Murphy, 2001) to represent the causal and temporal relations, and to infer the reliability of the system. The two approaches, myopic and look-ahead, are compared on the small example with two components given in Figure 1. The planning horizon is taken as 100 periods and the threshold value is given as 0.50.

The first scenario is created by taking the mean time to failure (MTTF) (1/λi) of each component i equal, set at 40 periods. The same replacement plan, given in Figure 2, is generated by both approaches. This is because the components have equal MTTFs and hence equal transition probabilities. The system reliability of scenario 1 is illustrated in Figure 2, where the peak points are the periods in which a component is replaced. At each peak point, the component planned for replacement is indicated in the figure. Four replacements are planned in both approaches. When a replacement occurs, the system reliability jumps to a higher value and then gradually decreases as time evolves until the next replacement.

Figure 2: Scenario 1 - system reliability and replacement plan (threshold value = 0.50)

The second scenario is created by differentiating the MTTFs of the components. We decrease the MTTF of component B to 10 periods. Replacement

plans in Figure 3 and Figure 4 are generated by the myopic and look-ahead approaches, respectively. The replacement plans of the two approaches differ since the components have different transition probabilities. Both approaches begin their replacement plan in period 12, selecting the same component to replace. In the next replacement period (k = 21), different components are selected. The myopic approach selects component B, because this replacement makes the system reliability higher in the short term. The look-ahead approach selects component A, because this replacement makes the system reliability higher in the long run. This is further illustrated in Table 2. Although the system reliability in the myopic approach is higher at t = 21, it decreases faster than the system reliability under the look-ahead approach. Hence, the look-ahead approach plans its next replacement at t = 29 while the myopic approach plans its next replacement at t = 28. By selecting A instead of B, the look-ahead approach defers its next replacement time. In total, in scenario 2, the look-ahead approach generates 10 replacements in 100 periods while the myopic approach generates 11.

Table 2: Scenario 2 - system reliability for 21 ≤ t ≤ 28

Period       21      22      23      24      25      26      27      28
Myopic      .7712   .7169   .6673   .6220   .5805   .5425   .5077   .4758
Look-ahead  .7036   .6718   .6421   .6144   .5884   .5642   .5414   .5200

The system reliability of scenario 2 is illustrated in Figures 3 and 4 for the myopic and look-ahead approaches, respectively. In Figure 3 there are 11 peak points, which means 11 replacements are planned. In Figure 4 there are 10 peak points, which means 10 replacements are planned.

A third scenario is also carried out by further decreasing the MTTF of component B to 5 periods. The discrepancy between the plans generated by the two approaches becomes more apparent. The myopic approach plans a total of 19 replacements while the look-ahead approach plans a total of 16 replacements.

Figure 3: Scenario 2 - system reliability and replacement plan with the myopic approach (threshold value = 0.50)

When the threshold value increases, an interesting question arises: is it still worth accounting for future information when choosing which component to replace? Table 3 shows the number of replacements found by the myopic and the look-ahead approaches at various threshold (L) values for scenario 2. When L = 0.80, the myopic approach finds fewer replacements than the look-ahead approach. This is because, as the threshold increases, more frequent replacements are planned; hence focusing on short-term reliability instead of future reliability may result in a smaller number of replacements.

Table 3: Number of replacements for the myopic and look-ahead approaches at various threshold values for T = 100

Threshold   Myopic   Look-ahead
0.50        11       10
0.60        15       15
0.70        23       22
0.80        33       35
0.90        67       67
0.95        100      100

In order to understand how good our methodology is, we enumerate all possible solutions of

k replacements in a reasonable time horizon. Here, k refers to the upper bound on the minimum number of replacements given by our DBN algorithm. The total number of solutions in a horizon of T periods with k replacements is

C(T, k) · 2^k    (8)

The first term is the number of all possible size-k subsets of a size-T set. This corresponds to the number of all possible time alternatives for k replacements in horizon T. The second term is the number of all possible component choices for two components. Since this solution space becomes intractable with increasing T and k, a smaller part of the planning horizon is taken for the enumeration of both scenarios. We started with T = 50 periods, where 2 repairs are proposed by our algorithm for scenario 1, and T = 30 periods, where 3 repairs are proposed for scenario 2. The next replacements correspond to t = 62 (Figure 2) and t = 40 (Figure 4). Hence, we increased the horizons to T = 61 and T = 39, the periods just before the 3rd and 4th replacements given by our algorithm for scenarios 1 and 2. The numbers of feasible solutions found by enumeration are reported in Table 4.

Figure 4: Scenario 2 - system reliability and replacement plan with the look-ahead approach
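The size of this enumeration space is easy to evaluate directly. A small sketch (plain Python; `num_solutions` is a helper name introduced here, not from the paper):

```python
from math import comb

def num_solutions(T, k, n_components=2):
    """Equation (8): C(T, k) * n^k candidate plans with k replacements
    in a horizon of T periods, choosing among n_components each time."""
    return comb(T, k) * n_components ** k

# The truncated horizons used for enumeration stay small:
print(num_solutions(50, 2))   # 4900
print(num_solutions(30, 3))   # 32480
# ...but the space explodes for the full horizon with 11 replacements:
print(num_solutions(100, 11))
```

This illustrates why exhaustive enumeration is only used on truncated horizons as a check, while the DBN-based heuristic handles the full 100-period horizon.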

Table 4: Enumeration results

Scenario   Horizon   Number of Replacements   Feasible Solutions
1          50        2                        324
1          50        1                        0
1          61        2                        2
1          61        1                        0
2          30        3                        1543
2          30        2                        0
2          39        3                        1
2          39        2                        0

When we decrease k, the number of replacements, by 1 (k = 1 and k = 2 for scenarios 1 and 2, respectively), no feasible solution is found. So it is not possible to find a plan with fewer replacements than given by our algorithm for these cases. Note also that, in scenario 1 with T = 61 and k = 2, enumeration finds two feasible solutions, which are in fact the symmetric solutions also found by our algorithm. Similarly, in scenario 2 with T = 39 and k = 3, enumeration finds one feasible solution, which is the one found by our algorithm with the look-ahead approach. There are 1543 feasible solutions for scenario 2 with T = 30 and k = 3. Our look-ahead approach finds one of these solutions, and the solution it finds has the maximum system reliability at T = 30 among all solutions. The same observation is also valid for scenario 1 with T = 50 and k = 2.

5 Conclusion

We study the dynamic reliability of a system where components age with a constant failure rate and there is a budget constraint. We develop a methodology to effectively prepare a good predictive maintenance plan using DBNs to represent the causal and temporal relations and to infer reliability values. We try to minimize the number of component replacements in a planning horizon by first deciding the time and then the component to replace. Two approaches are presented to choose the component; they are compared on three scenarios and various threshold values. When the failure rates of the components are equal, the two approaches find the same replacement plan. However, as the failure rates differ, they may end up with a different number of replacements. This is because the look-ahead approach takes future system reliability into consideration while the myopic approach focuses on the current planning time.

In this kind of predictive maintenance problem, there are two important decisions. One is the time of replacement: replacement should be done such that the system reliability is always guaranteed to be over a threshold. The other is which component(s) to replace in that period such that the budget is not exceeded and the total replacement cost is minimized. Our methodology is based on separating these decisions under assumptions (v) and (vi). We make the former decision by monitoring the first period in which the system reliability falls below the given threshold. In other words, we defer a replacement decision as far as the threshold permits. For the latter decision, we propose two approaches, myopic and look-ahead, to choose the component to replace. By enumerating feasible solutions over a reasonable horizon, we show that our method is effective for our simplified problem, where the objective has become minimizing the total number of replacements in a given planning horizon.

In this paper, we outline a method that can be used for finding a minimum-cost predictive maintenance schedule such that the system reliability is always above a certain threshold. The approach is normative in nature, as opposed to descriptive, which is the case in most of the literature that uses DBNs in reliability analysis.

The problem becomes more complex by (i) differentiating component costs (in time), (ii) differentiating the available budget in time, and (iii) defining a fixed maintenance cost for each period, which may or may not differ in time. In these cases, separating the two decisions may not give a good solution. Studying such cases is left for future work.

Acknowledgments

This work is supported by Bogazici University Research Fund under grant BAP06A301D. Demet Özgür-Ünlüakın is supported by the Bogazici University Foundation to attend the PGM'06 workshop.

References

Laurent Bouillaut, Philippe Weber, A. Ben Salem and Patrice Aknin. 2004. Use of causal probabilistic networks for the improvement of the maintenance of railway infrastructure. In IEEE International Conference on Systems, Man and Cybernetics, pages 6243-6249.

David Heckerman, John S. Breese and Koos Rommelse. 1995. Decision-theoretic troubleshooting. Communications of the ACM, 38(3):49-57.

Finn V. Jensen, Uffe Kjærulff, Brian Kristiansen, Helge Langseth, Claus Skaanning, Jiri Vomlel and Marta Vomlelova. 2001. The SACSO methodology for troubleshooting complex systems. Artificial Intelligence for Engineering, Design, Analysis and Manufacturing, 15(4):321-333.

Jayant Kalagnanam and Max Henrion. 1988. A comparison of decision analysis and expert rules for sequential analysis. In 4th Conference on Uncertainty in Artificial Intelligence, pages 271-281.

Eylem Koca and Taner Bilgic. 2004. Troubleshooting with dependent actions, conditional costs, and questions. Technical Report FBE-IE-13/2004-18, Bogazici University.

Ranganath Kothamasu, Samuel H. Huang and William H. VerDuin. 2006. System health monitoring and prognostics - a review of current paradigms and practices. International Journal of Advanced Manufacturing Technology, 28:1012-1024.

Helge Langseth and Finn V. Jensen. 2003. Decision theoretic troubleshooting of coherent systems. Reliability Engineering and System Safety, 80(1):49-62.

Helge Langseth and Finn V. Jensen. 2001. Heuristics for two extensions of basic troubleshooting. In 7th Scandinavian Conference on Artificial Intelligence, Frontiers in Artificial Intelligence and Applications, pages 80-89.

Alexandre Muller, Philippe Weber and A. Ben Salem. 2004. Process model-based dynamic Bayesian networks for prognostic. In IEEE 4th International Conference on Intelligent Systems Design and Applications.

Kevin Patrick Murphy. 2002. Dynamic Bayesian networks: representation, inference and learning. Ph.D. Dissertation, University of California, Berkeley.

Kevin Patrick Murphy. 2001. The Bayes Net Toolbox for Matlab. Computing Science and Statistics: Proceedings of the Interface.

Judea Pearl. 1988. Probabilistic reasoning in intelligent systems: Networks of plausible inference. Morgan Kaufmann Publishers.

Sampath Srinivas. 1995. A polynomial algorithm for computing the optimal repair strategy in a system with independent component failures. In 11th Annual Conference on Uncertainty in Artificial Intelligence, pages 515-522.

Claus Skaanning, Finn V. Jensen and Uffe Kjærulff. 2000. Printer troubleshooting using Bayesian networks. In 13th International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, pages 367-379.

Jose Gerardo Torres-Toledano and Luis Enrique Sucar. 1998. Bayesian networks for reliability analysis of complex systems. In 6th Ibero-American Conference on AI: Progress in Artificial Intelligence, pages 195-206.

W. E. Vesely. 1970. A time-dependent methodology for fault tree evaluation. Nuclear Engineering and Design, 13:337-360.

Philippe Weber and Lionel Jouffe. 2006. Complex system reliability modeling with Dynamic Object Oriented Bayesian Networks (DOOBN). Reliability Engineering and System Safety, 91:149-162.

Philippe Weber, Paul Munteanu and Lionel Jouffe. 2004. Dynamic Bayesian networks modeling the dependability of systems with degradations and exogenous constraints. In 11th IFAC Symposium on Information Control Problems in Manufacturing (INCOM'04).

Philippe Weber and Lionel Jouffe. 2003. Reliability modeling with dynamic Bayesian networks. In 5th IFAC Symposium SAFEPROCESS'03, pages 57-62.

Robert L. Welch and Travis V. Thelen. 2000. Dynamic reliability analysis in an operational context: the Bayesian network perspective. In Dynamic Reliability: Future Directions, pages 277-307.



Reading Dependencies from the Minimal Undirected Independence Map of a Graphoid that Satisfies Weak Transitivity

Jose M. Pena
Linkoping University
Linkoping, Sweden

Roland Nilsson
Linkoping University
Linkoping, Sweden

Johan Bjorkegren
Karolinska Institutet
Stockholm, Sweden

Jesper Tegner
Linkoping University
Linkoping, Sweden

Abstract

We present a sound and complete graphical criterion for reading dependencies from the minimal undirected independence map of a graphoid that satisfies weak transitivity. We argue that assuming weak transitivity is not too restrictive.

1 Introduction

A minimal undirected independence map G of an independence model p is used to read independencies that hold in p. Sometimes, however, G can also be used to read dependencies holding in p. For instance, if p is a graphoid that is faithful to G then, by definition, vertex separation is a sound and complete graphical criterion for reading dependencies from G. If p is simply a graphoid, then there also exists a sound and complete graphical criterion for reading dependencies from G (Bouckaert, 1995).

In this paper, we introduce a sound and complete graphical criterion for reading dependencies from G under the assumption that p is a graphoid that satisfies weak transitivity. Our criterion allows reading more dependencies than the criterion in (Bouckaert, 1995) at the cost of assuming weak transitivity. We argue that this assumption is not too restrictive. Specifically, we show that there exist important families of probability distributions that are graphoids and satisfy weak transitivity.

The rest of the paper is organized as follows. In Section 5, we present our criterion for reading dependencies from G. As will become clear later, it is important to first prove that vertex separation is sound and complete for reading independencies from G. We do so in Section 4. Equally important is to show that assuming that p satisfies weak transitivity is not too restrictive. We do so in Section 3. We start by reviewing some key concepts in Section 2 and close with some discussion in Section 6.

2 Preliminaries

The following definitions and results can be found in most books on probabilistic graphical models, e.g. (Pearl, 1988; Studený, 2005). Let U denote a set of random variables. Unless otherwise stated, all the independence models and graphs in this paper are defined over U. Let X, Y, Z and W denote four mutually disjoint subsets of U. An independence model p is a set of independencies of the form X is independent of Y given Z. We represent that an independency is in p by X ⊥⊥ Y|Z and that an independency is not in p by X ⊥̸⊥ Y|Z. An independence model is a graphoid when it satisfies the following five properties: symmetry X ⊥⊥ Y|Z ⇒ Y ⊥⊥ X|Z, decomposition X ⊥⊥ YW|Z ⇒ X ⊥⊥ Y|Z, weak union X ⊥⊥ YW|Z ⇒ X ⊥⊥ Y|ZW, contraction X ⊥⊥ Y|ZW ∧ X ⊥⊥ W|Z ⇒ X ⊥⊥ YW|Z, and intersection X ⊥⊥ Y|ZW ∧ X ⊥⊥ W|ZY ⇒ X ⊥⊥ YW|Z. Any strictly positive probability distribution is a graphoid.

Let sep(X,Y|Z) denote that X is separated from Y given Z in a graph G. Specifically, sep(X,Y|Z) holds when every path in G between X and Y is blocked by Z. If G is an undirected graph (UG), then a path in G between X and Y is blocked by Z when there exists some Z ∈ Z in the path. If G is a directed and acyclic graph (DAG), then a path in G between X and Y is blocked by Z when there exists a node Z in the path such that either (i) Z does not have two parents in the path and Z ∈ Z, or (ii) Z has two parents in the path and neither Z nor any of its descendants in G is in Z. An independence model p is faithful to an UG or DAG G when X ⊥⊥ Y|Z iff sep(X,Y|Z). Any independence model that is faithful to some UG or DAG is a graphoid. An UG G is an undirected independence map of an independence model p when X ⊥⊥ Y|Z if sep(X,Y|Z). Moreover, G is a minimal undirected independence (MUI) map of p when removing any edge from G makes it cease to be an independence map of p. A Markov boundary of X ∈ U in an independence model p is any subset MB(X) of U \ X such that (i) X ⊥⊥ U \ X \ MB(X)|MB(X), and (ii) no proper subset of MB(X) satisfies (i). If p is a graphoid, then (i) MB(X) is unique for all X, (ii) the MUI map G of p is unique, and (iii) two nodes X and Y are adjacent in G iff X ∈ MB(Y) iff Y ∈ MB(X) iff X ⊥̸⊥ Y|U \ (XY).
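For undirected graphs, the separation test above amounts to a reachability check once the nodes in Z are deleted. A minimal sketch of this idea (the helper `sep` and the adjacency-dict encoding are our own illustrative choices, not notation from the paper):

```python
from collections import deque

def sep(graph, X, Y, Z):
    """Undirected separation: True iff every path between X and Y is
    blocked by Z, i.e. X and Y are disconnected after deleting Z.
    graph is a dict mapping each node to the set of its neighbours;
    X, Y, Z are assumed mutually disjoint node sets."""
    X, Y, Z = set(X), set(Y), set(Z)
    seen = set(X)
    queue = deque(X)
    while queue:
        u = queue.popleft()
        if u in Y:
            return False  # reached Y without passing through Z
        for v in graph.get(u, ()):
            if v not in seen and v not in Z:
                seen.add(v)
                queue.append(v)
    return True

# Chain A - B - C: A and C are separated by {B} but not by the empty set.
G = {'A': {'B'}, 'B': {'A', 'C'}, 'C': {'B'}}
```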

A Bayesian network (BN) is a pair (G, θ) where G is a DAG and θ are parameters specifying a probability distribution for each X ∈ U given its parents in G, p(X|Pa(X)). The BN represents the probability distribution p = ∏_{X∈U} p(X|Pa(X)). Then, G is an independence map of a probability distribution p iff p can be represented by a BN with DAG G.
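As a small illustration of the factorization, consider a two-node BN X → Y; the parameter values below are made up for the example, and the joint defined by the product of the CPTs sums to one:

```python
# Marginal p(X) and conditional p(Y|X) for a tiny BN X -> Y.
# Values are illustrative only.
pX = {0: 0.6, 1: 0.4}
pY_given_X = {(0, 0): 0.9, (1, 0): 0.1,   # p(Y=y | X=0)
              (0, 1): 0.2, (1, 1): 0.8}   # p(Y=y | X=1)

def joint(x, y):
    """p(X=x, Y=y) as the product of the BN's local distributions."""
    return pX[x] * pY_given_X[(y, x)]

# The factorized joint is a proper distribution.
total = sum(joint(x, y) for x in (0, 1) for y in (0, 1))
```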

3 WT Graphoids

Let X, Y and Z denote three mutually disjoint subsets of U. We call a WT graphoid any graphoid that satisfies weak transitivity: X ⊥⊥ Y|Z ∧ X ⊥⊥ Y|ZV ⇒ X ⊥⊥ V|Z ∨ V ⊥⊥ Y|Z with V ∈ U \ (XYZ). We now argue that there exist important families of probability distributions that are WT graphoids and, thus, that WT graphoids are worth studying. For instance, any probability distribution that is Gaussian or faithful to some UG or DAG is a WT graphoid (Pearl, 1988; Studený, 2005). There also exist probability distributions that are WT graphoids although they are neither Gaussian nor faithful to any UG or DAG. For instance, it follows from the theorem below that the probability distribution that results from marginalizing some nodes out and instantiating some others in a probability distribution that is faithful to some DAG is a WT graphoid, although it may be neither Gaussian nor faithful to any UG or DAG.
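For a non-degenerate trivariate Gaussian, weak transitivity can be checked numerically from the pairwise correlations, since X ⊥⊥ Y | V corresponds to a vanishing partial correlation. The sketch below (the helper names are ours) verifies the weak-transitivity implication over a grid of correlation values:

```python
import math

def partial_corr(r_xy, r_xv, r_vy):
    """Partial correlation of X and Y given V, from pairwise
    correlations (assumes |r_xv| < 1 and |r_vy| < 1)."""
    return (r_xy - r_xv * r_vy) / math.sqrt((1 - r_xv**2) * (1 - r_vy**2))

def weakly_transitive(r_xy, r_xv, r_vy, tol=1e-12):
    """Check one instance of weak transitivity for a trivariate Gaussian:
    if X indep Y and X indep Y given V, then X indep V or V indep Y."""
    premise = abs(r_xy) <= tol and abs(partial_corr(r_xy, r_xv, r_vy)) <= tol
    if not premise:
        return True  # implication holds vacuously
    return abs(r_xv) <= tol or abs(r_vy) <= tol
```

The premise forces r_xy = 0 and r_xv * r_vy = 0, so one of the two marginal correlations must vanish, matching the claim that Gaussians are WT graphoids.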

Theorem 1. Let p be a probability distribution that is a WT graphoid and let W ⊆ U. Then, p(U \ W) is a WT graphoid. If p(U \ W|W = w) has the same independencies for all w, then p(U \ W|W = w) for any w is a WT graphoid.

Proof. Let X, Y and Z denote three mutually disjoint subsets of U \ W. Then, X ⊥⊥ Y|Z in p(U \ W) iff X ⊥⊥ Y|Z in p and, thus, p(U \ W) satisfies the WT graphoid properties because p satisfies them. If p(U \ W|W = w) has the same independencies for all w then, for any w, X ⊥⊥ Y|Z in p(U \ W|W = w) iff X ⊥⊥ Y|ZW in p. Then, p(U \ W|W = w) for any w satisfies the WT graphoid properties because p satisfies them.

We now show that it is not too restrictive to assume in the theorem above that p(U \ W|W = w) has the same independencies for all w, because there exist important families of probability distributions all or almost all of whose members satisfy such an assumption. For instance, if p is a Gaussian probability distribution, then p(U \ W|W = w) has the same independencies for all w, because the independencies in p(U \ W|W = w) only depend on the variance-covariance matrix of p (Anderson, 1984). Let us now consider all the multinomial probability distributions for which a DAG G is an independence map and denote them by M(G). The following theorem, which is inspired by (Meek, 1995), proves that the probability of randomly drawing from M(G) a probability distribution p such that p(U \ W|W = w) does not have the same independencies for all w is zero.

Theorem 2. The probability distributions p in M(G) for which there exists some W ⊆ U such that p(U \ W|W = w) does not have the same independencies for all w have Lebesgue measure zero wrt M(G).

Proof. The proof basically proceeds in the same way as that of Theorem 7 in (Meek, 1995), so we refer the reader to that paper for more details.


Let W ⊆ U and let X, Y and Z denote three disjoint subsets of U \ W. For a constraint such as X ⊥⊥ Y|Z to be true in p(U \ W|W = w) but false in p(U \ W|W = w′), two conditions must be met. First, sep(X,Y|ZW) must not hold in G and, second, the following equations must be satisfied: p(X = x, Y = y, Z = z, W = w)p(Z = z, W = w) − p(X = x, Z = z, W = w)p(Y = y, Z = z, W = w) = 0 for all x, y and z. Each equation is a polynomial in the BN parameters corresponding to G, because each term p(V = v) in the equations is a summation of products of BN parameters (Meek, 1995). Furthermore, each polynomial is non-trivial, i.e. not all the values of the BN parameters corresponding to G are solutions to the polynomial. To see it, note that there exists a probability distribution q in M(G) that is faithful to G (Meek, 1995) and, thus, X ⊥̸⊥ Y|ZW holds in q because sep(X,Y|ZW) does not hold in G. Then, by permuting the states of the random variables, we can transform the BN parameter values corresponding to q into BN parameter values for p so that the polynomial does not hold. Let sol(x, y, z, w) denote the set of solutions to the polynomial for x, y and z. Then, sol(x, y, z, w) has Lebesgue measure zero wrt R^n, where n is the number of linearly independent BN parameters corresponding to G, because it consists of the solutions to a non-trivial polynomial (Okamoto, 1973). Let

sol = ⋃_{X,Y,Z,W} ⋃_{w} ⋂_{x,y,z} sol(x, y, z, w)

and recall from above that the outermost union only involves those cases for which sep(X,Y|ZW) does not hold in G. Then, sol has Lebesgue measure zero wrt R^n, because the finite union and intersection of sets of Lebesgue measure zero has Lebesgue measure zero too. Consequently, the probability distributions p in M(G) such that p(U \ W|W = w) does not have the same independencies for all w have Lebesgue measure zero wrt R^n because they are contained in sol. These probability distributions also have Lebesgue measure zero wrt M(G), because M(G) has positive Lebesgue measure wrt R^n (Meek, 1995).

Finally, we argue in Section 6 that it is not unrealistic to assume that the probability distribution underlying the learning data in most projects on gene expression data analysis, one of the hottest areas of research nowadays, is a WT graphoid.

4 Reading Independencies

By definition, sep is sound for reading independencies from the MUI map G of a WT graphoid p, i.e. it only identifies independencies in p. Now, we prove that sep in G is also complete in the sense that it identifies all the independencies in p that can be identified by studying G alone. Specifically, we prove that there exist multinomial and Gaussian probability distributions that are faithful to G. Such probability distributions have all and only the independencies that sep identifies from G. Moreover, such probability distributions must be WT graphoids because sep satisfies the WT graphoid properties (Pearl, 1988). The fact that sep in G is complete, in addition to being an important result in itself, is important for reading as many dependencies as possible from G (see Section 5).

Theorem 3. Let G be an UG. There exist multinomial and Gaussian probability distributions that are faithful to G.

Proof. We first prove the theorem for multinomial probability distributions. Create a copy H of G and, then, replace every edge X − Y in H by X → W_XY ← Y, where W_XY ∉ U is an auxiliary node. Let W denote all the auxiliary nodes created. Then, H is a DAG over UW. Moreover, for any three mutually disjoint subsets X, Y and Z of U, sep(X,Y|ZW) in H iff sep(X,Y|Z) in G.
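The edge-to-collider construction used in the proof can be sketched as follows; the helper name `collider_expand` and the string encoding of W_XY are our own illustrative choices:

```python
def collider_expand(ug_edges):
    """Replace every undirected edge X - Y by the collider X -> W_XY <- Y.
    ug_edges is a list of (X, Y) pairs; returns the directed edges of the
    resulting DAG H and the list of auxiliary nodes W."""
    directed = []
    aux = []
    for x, y in ug_edges:
        w = f"W_{x}{y}"          # one fresh auxiliary node per edge
        aux.append(w)
        directed.append((x, w))  # X -> W_XY
        directed.append((y, w))  # Y -> W_XY
    return directed, aux
```

Conditioning on all the auxiliary nodes opens every collider, which is why separation given ZW in H mirrors separation given Z in G.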

The probability distributions p(U,W) in M(H) that are faithful to H and satisfy that p(U|W = w) has the same independencies for all w have positive Lebesgue measure wrt M(H) because (i) M(H) has positive Lebesgue measure wrt R^n (Meek, 1995), (ii) the probability distributions in M(H) that are not faithful to H have Lebesgue measure zero wrt M(H) (Meek, 1995), (iii) the probability distributions p(U,W) in M(H) such that p(U|W = w) does not have the same independencies for all w have Lebesgue measure zero wrt M(H) by Theorem 2, and (iv) the union of the probability distributions in (ii) and (iii) has Lebesgue measure zero wrt M(H) because the finite union of sets of Lebesgue measure zero has Lebesgue measure zero.

Let p(U,W) denote any probability distribution in M(H) that is faithful to H and satisfies that p(U|W = w) has the same independencies for all w. As proven in the paragraph above, such a probability distribution exists. Fix any w and let X, Y and Z denote three mutually disjoint subsets of U. Then, X ⊥⊥ Y|Z in p(U|W = w) iff X ⊥⊥ Y|ZW in p(U,W) iff sep(X,Y|ZW) in H iff sep(X,Y|Z) in G. Then, p(U|W = w) is faithful to G.

The proof for Gaussian probability distributions is analogous. In this case, p(U,W) is a Gaussian probability distribution and thus, for any w, p(U|W = w) is Gaussian too (Anderson, 1984). Theorem 2 is not needed in the proof because, as discussed in Section 3, any Gaussian probability distribution p(U,W) satisfies that p(U|W = w) has the same independencies for all w.

The theorem above has previously been proven for multinomial probability distributions in (Geiger and Pearl, 1993), but that proof constrains the cardinality of U. Our proof does not constrain the cardinality of U and applies not only to multinomial but also to Gaussian probability distributions. It has been proven in (Frydenberg, 1990) that sep in an UG G is complete in the sense that it identifies all the independencies holding in every Gaussian probability distribution for which G is an independence map. Our result is stronger because it proves the existence of a Gaussian probability distribution with exactly these independencies. We learned from one of the reviewers, whom we thank for it, that a rather different proof of the theorem above for Gaussian probability distributions is reported in (Lnenicka, 2005).

The theorem above proves that sep in the MUI map G of a WT graphoid p is complete in the sense that it identifies all the independencies in p that can be identified by studying G alone. However, sep in G is not complete if being complete is understood as being able to identify all the independencies in p. Actually, no sound criterion for reading independencies from G alone is complete in the latter sense. An example follows.

Example 1. Let p be a multinomial (Gaussian) probability distribution that is faithful to the DAG X → Z ← Y. Such a probability distribution exists (Meek, 1995). Let G denote the MUI map of p, namely the complete UG. Note that p is not faithful to G. However, by Theorem 3, there exists a multinomial (Gaussian) probability distribution q that is faithful to G. As discussed in Section 3, p and q are WT graphoids. Let us assume that we are dealing with p. Then, no sound criterion can conclude X ⊥⊥ Y|∅ by just studying G because this independency does not hold in q, and it is impossible to know whether we are dealing with p or q on the sole basis of G.

5 Reading Dependencies

In this section, we propose a sound and complete criterion for reading dependencies from the MUI map of a WT graphoid. We define the dependence base of an independence model p as the set of all the dependencies X ⊥̸⊥ Y|U \ (XY) with X, Y ∈ U. If p is a WT graphoid, then additional dependencies in p can be derived from its dependence base via the WT graphoid properties. For this purpose, we rephrase the WT graphoid properties as follows. Symmetry: Y ⊥̸⊥ X|Z ⇒ X ⊥̸⊥ Y|Z. Decomposition: X ⊥̸⊥ Y|Z ⇒ X ⊥̸⊥ YW|Z. Weak union: X ⊥̸⊥ Y|ZW ⇒ X ⊥̸⊥ YW|Z. Contraction, X ⊥̸⊥ YW|Z ⇒ X ⊥̸⊥ Y|ZW ∨ X ⊥̸⊥ W|Z, is problematic for deriving new dependencies because it contains a disjunction in the right-hand side and, thus, it should be split into two properties: contraction1 X ⊥̸⊥ YW|Z ∧ X ⊥⊥ Y|ZW ⇒ X ⊥̸⊥ W|Z, and contraction2 X ⊥̸⊥ YW|Z ∧ X ⊥⊥ W|Z ⇒ X ⊥̸⊥ Y|ZW. Likewise, intersection X ⊥̸⊥ YW|Z ⇒ X ⊥̸⊥ Y|ZW ∨ X ⊥̸⊥ W|ZY gives rise to intersection1 X ⊥̸⊥ YW|Z ∧ X ⊥⊥ Y|ZW ⇒ X ⊥̸⊥ W|ZY, and intersection2


X ⊥̸⊥ YW|Z ∧ X ⊥⊥ W|ZY ⇒ X ⊥̸⊥ Y|ZW. Note that intersection1 and intersection2 are equivalent and, thus, we refer to them simply as intersection. Finally, weak transitivity X ⊥̸⊥ V|Z ∧ V ⊥̸⊥ Y|Z ⇒ X ⊥̸⊥ Y|Z ∨ X ⊥̸⊥ Y|ZV with V ∈ U \ (XYZ) gives rise to weak transitivity1 X ⊥̸⊥ V|Z ∧ V ⊥̸⊥ Y|Z ∧ X ⊥⊥ Y|Z ⇒ X ⊥̸⊥ Y|ZV, and weak transitivity2 X ⊥̸⊥ V|Z ∧ V ⊥̸⊥ Y|Z ∧ X ⊥⊥ Y|ZV ⇒ X ⊥̸⊥ Y|Z. The independency in the left-hand side of any of the properties above holds if the corresponding sep statement holds in the MUI map G of p. This is the best solution we can hope for because, as discussed in Section 4, sep in G is sound and complete. Moreover, this solution does not require more information about p than what is available, because G can be constructed from the dependence base of p. We call the WT graphoid closure of the dependence base of p the set of all the dependencies that are in the dependence base of p plus those that can be derived from it by applying the WT graphoid properties.

We now introduce our criterion for reading dependencies from the MUI map of a WT graphoid. Let X, Y and Z denote three mutually disjoint subsets of U. Then, con(X,Y|Z) denotes that X is connected to Y given Z in an UG G. Specifically, con(X,Y|Z) holds when there exist some X1 ∈ X and Xn ∈ Y such that there exists exactly one path in G between X1 and Xn that is not blocked by (X \ X1)(Y \ Xn)Z. Note that there may exist several unblocked paths in G between X and Y but only one between X1 and Xn. We now prove that con is sound for reading dependencies from the MUI map of a WT graphoid, i.e. it only identifies dependencies in the WT graphoid. Actually, it only identifies dependencies in the WT graphoid closure of the dependence base of p. Hereinafter, X1:n denotes a path X1, . . . , Xn in an UG.
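The con criterion hinges on counting the paths that are not blocked by a given node set. A minimal sketch for singleton X = {a} and Y = {b}, with our own (hypothetical) helper names and graphs encoded as adjacency dicts:

```python
def unblocked_paths(graph, a, b, Z):
    """Enumerate the simple paths between a and b whose intermediate
    nodes avoid Z (a path is blocked when some node of it lies in Z;
    the endpoints are assumed disjoint from Z)."""
    paths = []
    def walk(u, path):
        if u == b:
            paths.append(tuple(path))
            return
        for v in sorted(graph.get(u, ())):
            if v not in path and (v == b or v not in Z):
                walk(v, path + [v])
    walk(a, [a])
    return paths

def con_single(graph, a, b, Z):
    """con({a},{b}|Z): exactly one path between a and b is unblocked by Z."""
    return len(unblocked_paths(graph, a, b, Z)) == 1

# Triangle A - B - C - A: given {B} only the direct path A - C survives,
# whereas given the empty set both A - C and A - B - C are unblocked.
T = {'A': {'B', 'C'}, 'B': {'A', 'C'}, 'C': {'A', 'B'}}
```

Enumerating simple paths is exponential in the worst case; the sketch is meant to make the "exactly one unblocked path" condition concrete, not to be an efficient decision procedure.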

Theorem 4. Let p be a WT graphoid and G its MUI map. Then, con in G only identifies dependencies in the WT graphoid closure of the dependence base of p.

Proof. We first prove that if X1:n is the only path in G between X1 and Xn that is not blocked by Y ⊆ U \ X1:n, then X1 ⊥̸⊥ Xn|Y. We prove it by induction over n. We first prove it for n = 2. Let W denote all the nodes in U \ X1:2 \ Y that are not separated from X1 given X2Y in G. Since X1 and X2 are adjacent in G, X1 ⊥̸⊥ X2|U \ X1:2 and, thus, X1W ⊥̸⊥ X2(U \ X1:2 \ Y \ W)|Y due to weak union. This together with sep(X1W, U \ X1:2 \ Y \ W|X2Y), which follows from the definition of W, implies X1W ⊥̸⊥ X2|Y due to contraction1. Note that if U \ X1:2 \ Y \ W = ∅, then X1W ⊥̸⊥ X2(U \ X1:2 \ Y \ W)|Y directly implies X1W ⊥̸⊥ X2|Y. In any case, this dependency together with sep(W, X2|X1Y), which holds because otherwise there would exist several unblocked paths in G between X1 and X2, contradicting the definition of Y, implies X1 ⊥̸⊥ X2|Y due to contraction1. Note that if W = ∅, then X1W ⊥̸⊥ X2|Y directly implies X1 ⊥̸⊥ X2|Y. Let us assume as induction hypothesis that the statement that we are proving holds for all n < m. We now prove it for n = m. Since the paths X1:2 and X2:m contain fewer than m nodes and Y blocks all the other paths in G between X1 and X2 and between X2 and Xm, because otherwise there would exist several unblocked paths in G between X1 and Xm, contradicting the definition of Y, then X1 ⊥̸⊥ X2|Y and X2 ⊥̸⊥ Xm|Y due to the induction hypothesis. This together with sep(X1, Xm|YX2), which follows from the definition of X1:m and Y, implies X1 ⊥̸⊥ Xm|Y due to weak transitivity2.

Let X, Y and Z denote three mutually disjoint subsets of U. If con(X,Y|Z) holds in G, then there exist some X1 ∈ X and Xn ∈ Y such that X1 ⊥̸⊥ Xn|(X \ X1)(Y \ Xn)Z due to the paragraph above and, thus, X ⊥̸⊥ Y|Z due to weak union. Then, every con statement in G corresponds to a dependency in p. Moreover, this dependency must be in the WT graphoid closure of the dependence base of p, because we have only used in the proof the dependence base of p and the WT graphoid properties.

We now prove that con is complete for reading dependencies from the MUI map of a WT graphoid p, in the sense that it identifies all the dependencies in p that follow from the information about p that is available, namely the dependence base of p and the fact that p is a WT graphoid.

Theorem 5. Let p be a WT graphoid and G its MUI map. Then, con in G identifies all the dependencies in the WT graphoid closure of the dependence base of p.

Proof. It suffices to prove (i) that all the dependencies in the dependence base of p are identified by con in G, and (ii) that con satisfies the WT graphoid properties. Since the first point is trivial, we only prove the second point. Let X, Y, Z and W denote four mutually disjoint subsets of U.

• Symmetry: con(Y,X|Z) ⇒ con(X,Y|Z). Trivial.

• Decomposition: con(X,Y|Z) ⇒ con(X,YW|Z). Trivial if W contains no node in the path X1:n in the left-hand side. If W contains some node in X1:n, then let Xm denote the closest node to X1 such that Xm ∈ X1:n ∩ W. Then, the path X1:m satisfies the right-hand side because (X \ X1)(YW \ Xm)Z blocks all the other paths in G between X1 and Xm, since (X \ X1)(Y \ Xn)Z blocks all the paths in G between X1 and Xm except X1:m, because otherwise there would exist several unblocked paths in G between X1 and Xn, which contradicts the left-hand side.

• Weak union: con(X,Y|ZW) ⇒ con(X,YW|Z). Trivial because W contains no node in the path X1:n in the left-hand side.

• Contraction1: con(X,YW|Z) ∧ sep(X,Y|ZW) ⇒ con(X,W|Z). Since ZW blocks all the paths in G between X and Y, then (i) the path X1:n in the left-hand side must be between X and W, and (ii) all the paths in G between X1 and Xn that are blocked by Y are also blocked by (W \ Xn)Z and, thus, Y is not needed to block all the paths in G between X1 and Xn except X1:n. Then, X1:n satisfies the right-hand side.

• Contraction2: con(X,YW|Z) ∧ sep(X,W|Z) ⇒ con(X,Y|ZW). Since Z blocks all the paths in G between X and W, the path X1:n in the left-hand side must be between X and Y and, thus, it satisfies the right-hand side.

• Intersection: con(X,YW|Z) ∧ sep(X,Y|ZW) ⇒ con(X,W|ZY). Since ZW blocks all the paths in G between X and Y, the path X1:n in the left-hand side must be between X and W and, thus, it satisfies the right-hand side.

• Weak transitivity2: con(X, Xm|Z) ∧ con(Xm,Y|Z) ∧ sep(X,Y|ZXm) ⇒ con(X,Y|Z) with Xm ∈ U \ (XYZ). Let X1:m and Xm:n denote the paths in the first and second, respectively, con statements in the left-hand side. Let X1:m:n denote the path X1, . . . , Xm, . . . , Xn. Then, X1:m:n satisfies the right-hand side because (i) Z does not block X1:m:n, and (ii) Z blocks all the other paths in G between X1 and Xn, because otherwise there would exist several unblocked paths in G between X1 and Xm or between Xm and Xn, which contradicts the left-hand side.

• Weak transitivity1: con(X, Xm|Z) ∧ con(Xm,Y|Z) ∧ sep(X,Y|Z) ⇒ con(X,Y|ZXm) with Xm ∈ U \ (XYZ). This property never applies because, as seen in weak transitivity2, sep(X,Y|Z) never holds since Z does not block X1:m:n.

Note that the meaning of completeness in the theorem above differs from that in Theorem 3. It remains an open question whether con in G identifies all the dependencies in p that can be identified by studying G alone. Note also that con in G is not complete if being complete is understood as being able to identify all the dependencies in p. Actually, no sound criterion for reading dependencies from G alone is complete in this sense. Example 1 illustrates this point. Let us now assume that we are dealing with q instead of with p. Then, no sound criterion can conclude X ⊥̸⊥ Y|∅ by just studying G because this dependency does not hold in p, and it is impossible to know whether we are dealing with p or q on the sole basis of G.

We have defined the dependence base of a WT graphoid p as the set of all the dependencies X ⊥̸⊥ Y|U \ (XY) with X, Y ∈ U. However, Theorems 4 and 5 remain valid if we redefine the dependence base of p as the set of all the dependencies X ⊥̸⊥ Y|MB(X) \ Y with X, Y ∈ U. It suffices to prove that the WT graphoid closure is the same for both dependence bases of p. Specifically, we prove that the first dependence base is in the WT graphoid closure of the second dependence base and vice versa. If X ⊥̸⊥ Y|U \ (XY), then X ⊥̸⊥ Y(U \ (XY) \ (MB(X) \ Y))|MB(X) \ Y due to weak union. This together with sep(X, U \ (XY) \ (MB(X) \ Y)|Y(MB(X) \ Y)) implies X ⊥̸⊥ Y|MB(X) \ Y due to contraction1. On the other hand, if X ⊥̸⊥ Y|MB(X) \ Y, then X ⊥̸⊥ Y(U \ (XY) \ (MB(X) \ Y))|MB(X) \ Y due to decomposition. This together with sep(X, U \ (XY) \ (MB(X) \ Y)|Y(MB(X) \ Y)) implies X ⊥̸⊥ Y|U \ (XY) due to intersection.

In (Bouckaert, 1995), the following sound and complete (in the same sense as con) criterion for reading dependencies from the MUI map of a graphoid is introduced: Let X, Y and Z denote three mutually disjoint subsets of U; then X ⊥̸⊥ Y|Z when there exist some X1 ∈ X and X2 ∈ Y such that X1 ∈ MB(X2) and either MB(X1) \ X2 ⊆ (X \ X1)(Y \ X2)Z or MB(X2) \ X1 ⊆ (X \ X1)(Y \ X2)Z. Note that con(X,Y|Z) coincides with this criterion when n = 2 and either MB(X1) \ X2 ⊆ (X \ X1)(Y \ X2)Z or MB(X2) \ X1 ⊆ (X \ X1)(Y \ X2)Z. Therefore, con allows reading more dependencies than the criterion in (Bouckaert, 1995) at the cost of assuming weak transitivity which, as discussed in Section 3, is not a too restrictive assumption.
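For comparison, the criterion of (Bouckaert, 1995) stated above can be evaluated directly from the Markov boundaries. The helper name `bouckaert_dependent` is ours; MB is a dict mapping each node to its Markov boundary, and X, Y, Z are disjoint node sets:

```python
def bouckaert_dependent(MB, X, Y, Z):
    """Bouckaert's criterion: X and Y are dependent given Z when some
    adjacent pair X1 in X, X2 in Y (i.e. X1 in MB(X2)) has the rest of
    one of their Markov boundaries covered by (X\{X1}) u (Y\{X2}) u Z."""
    for x1 in X:
        for x2 in Y:
            if x1 not in MB[x2]:
                continue
            rest = (X - {x1}) | (Y - {x2}) | Z
            if MB[x1] - {x2} <= rest or MB[x2] - {x1} <= rest:
                return True
    return False

# Markov boundaries of the chain A - B - C.
MB = {'A': {'B'}, 'B': {'A', 'C'}, 'C': {'B'}}
```

On the chain, the criterion reads off A ⊥̸⊥ B|∅ (A and B are adjacent and MB(A)\{B} is empty) but stays silent on A and C, which con can still relate through the single path A-B-C when B is not in the conditioning set.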

Finally, the soundness of con allows us to give an alternative proof of the following theorem, which was originally proven in (Becker et al., 2000).

Theorem 6. Let p be a WT graphoid and G its MUI map. If G is a forest, then p is faithful to it.

Proof. Any independency in p for which the corresponding separation statement does not hold in G contradicts Theorem 4.

6 Discussion

In this paper, we have introduced a sound and complete criterion for reading dependencies from the MUI map of a WT graphoid. In (Peña et al., 2006), we show how this helps to identify all the nodes that are relevant to compute all the conditional probability distributions for a given set of nodes without having to learn a BN first. We are currently working on a sound and complete criterion for reading dependencies from a minimal directed independence map of a WT graphoid.

Due to lack of time, we have not been able to address some of the questions posed by the reviewers. We plan to do so in an extended version of this paper. These questions were studying the relation between the new criterion and lack of vertex separation, studying the complexity of the new criterion, and studying the uniqueness and consistency of the WT graphoid closure.

Our end-goal is to apply the results in this paper to our project on atherosclerosis gene expression data analysis in order to learn dependencies between genes. We believe that it is not unrealistic to assume that the probability distribution underlying our data satisfies strict positivity and weak transitivity and, thus, is a WT graphoid. The cell is the functional unit of all organisms and includes all the information necessary to regulate its function. This information is encoded in the DNA of the cell, which is divided into a set of genes, each coding for one or more proteins. Proteins are required for practically all the functions in the cell. The amount of protein produced depends on the expression level of the coding gene which, in turn, depends on the amount of proteins produced by other genes. Therefore, a dynamic Bayesian network is a rather accurate model of the cell (Murphy and Mian, 1999): the nodes represent the genes and proteins, and the edges and parameters represent the causal relations between the gene expression levels and the protein amounts. It is important that the Bayesian network is dynamic because a gene can regulate some of its regulators and even itself with some time delay. Since the technology for measuring the state of the protein nodes is not widely available yet, the data in most projects on gene expression data analysis are a sample of the probability distribution represented by the dynamic Bayesian network after marginalizing the protein nodes out. The probability distribution with no node marginalized out is almost surely faithful to the dynamic Bayesian network (Meek, 1995) and, thus, it satisfies weak transitivity (see Section 3) and, thus, so does the probability distribution after marginalizing the protein nodes out (see Theorem 1). The assumption that the probability distribution sampled is strictly positive is justified because measuring the state of the gene nodes involves a series of complex wet-lab and computer-assisted steps that introduce noise in the measurements (Sebastiani et al., 2003).

Additional evidence supporting the claim that the results in this paper can be helpful for learning gene dependencies comes from the increasing attention that graphical Gaussian models of gene networks have been receiving from the bioinformatics community (Schäfer and Strimmer, 2005). A graphical Gaussian model of a gene network is nothing more than the MUI map of the probability distribution underlying the gene network, which is assumed to be Gaussian, hence the name of the model. Then, this underlying probability distribution is a WT graphoid and, thus, the results in this paper apply.

Acknowledgments

We thank the anonymous referees for their comments. This work is funded by the Swedish Research Council (VR-621-2005-4202), the Ph.D. Programme in Medical Bioinformatics, the Swedish Foundation for Strategic Research, Clinical Gene Networks AB, and Linköping University.

References

Theodore W. Anderson. 1984. An Introduction to Multivariate Statistical Analysis. John Wiley & Sons.

Ann Becker, Dan Geiger and Christopher Meek. 2000. Perfect Tree-Like Markovian Distributions. In 16th Conference on UAI, pages 19–23.

Remco R. Bouckaert. 1995. Bayesian Belief Networks: From Construction to Inference. PhD Thesis, University of Utrecht.

Morten Frydenberg. 1990. Marginalization and Collapsibility in Graphical Interaction Models. Annals of Statistics, 18:790–805.

Dan Geiger and Judea Pearl. 1993. Logical and Algorithmic Properties of Conditional Independence and Graphical Models. The Annals of Statistics, 21:2001–2021.

Radim Lnenicka. 2005. On Gaussian Conditional Independence Structures. Technical Report 2005/14, Academy of Sciences of the Czech Republic.

Christopher Meek. 1995. Strong Completeness and Faithfulness in Bayesian Networks. In 11th Conference on UAI, pages 411–418.

Kevin Murphy and Saira Mian. 1999. Modelling Gene Expression Data Using Dynamic Bayesian Networks. Technical Report, University of California.

Masashi Okamoto. 1973. Distinctness of the Eigenvalues of a Quadratic Form in a Multivariate Sample. Annals of Statistics, 1:763–765.

Judea Pearl. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.

Jose M. Peña, Roland Nilsson, Johan Björkegren and Jesper Tegnér. 2006. Identifying the Relevant Nodes Without Learning the Model. In 22nd Conference on UAI, pages 367–374.

Juliane Schäfer and Korbinian Strimmer. 2005. Learning Large-Scale Graphical Gaussian Models from Genomic Data. In Science of Complex Networks: From Biology to the Internet and WWW.

Paola Sebastiani, Emanuela Gussoni, Isaac S. Kohane and Marco F. Ramoni. 2003. Statistical Challenges in Functional Genomics (with Discussion). Statistical Science, 18:33–60.

Milan Studený. 2005. Probabilistic Conditional Independence Structures. Springer.

254 J. M. Peña, R. Nilsson, J. Björkegren, and J. Tegnér

Page 265: Proceedings of the Third European Workshop on Probabilistic

!#"$%&(')*+,.-//01

24365 7%89:<;>=?=@3 7A8B;>C)DE36;>C>81F&GIHJ8B;KCI:ML&NO8B8JPQ :MRS8JLTU:<;?TV=BWYX;IWZ=BL!U18JT3[=@;K8B;>C\F]=@URS^IT36;>P12I_M3[:<;>_`:<aMb>c+T%L:<_ed?T+c&;>3[HB:ML!a3[Tgf

h GjiGSkl=<monJp4Gqp@nBrsbtBuJp@n1vkwc+T%L:<_ed?TMb/vVdI:yxV:MTd>:MLe568B;>C>az@?|~~4eI|>4I>ss

E<SB URS3[L!36_M8B5S:MH436CI:<;S_`:aedI=VaTd>8JTV;>8B3[HB:Ok]8<fB:<ae368B;o_M58Baa3[>:ML!aR:MLW=BLeUs^>3[T%:Ol:<565_`=@URS8JL:<CAT%=U1=BL:a%=BRSd>3a%T36_M8JT%:<C1;I:MT=BLe_M58Baa3[>:ML!aMbB:MHB:<; 36;1Hs3[:M¡=BW36;S8B_M_M^ILe8B_M3[:<a3;TdI:<3[L RS8JLe8BU1:MT%:MLeaMG X;TdS36aR8JR/:MLbsl:Oa%T^>CIf TdI:+:`¢:<_`Ta=BWYa^>_ed£R8JLe8BU:MT%:ML36;>8B_M_M^>Le8B_M3[:<al¤?f¥36;~HB:<aT3[P@8JT36;IPTdI:Oa%:<;>ae3§¦T36Hs3[TfAW^S;>_`T3[=@;>a=BW8 ;>8B3[HB:k]8<fB:<a38B;¨_M58Baa3[>:MLG©ª:«CI:<U1=@;>a%T%Le8JT%:«Td>8JTMb8Ba+8_`=@;>a%:<s^I:<;>_`:«=BWTd>:+_M568Baae3[>:ML<¬qa36;>CI:MR:<;>CI:<;S_`:R>L=BR:MLT3[:<aMb?TdI:<a%:&a%:<;>a3[T36Hs3[TfW^>;S_`T3[=@;>a8JLe:&d>36P@d>5[f_`=@;>a%T%Le8B36;>:<CG©ª:&36;?HB:<a%T3[P@8JT%:V+dI:MTdI:MLTdI:VH­8JLe36=@^>a RS8JT%T%:MLe;Sa=BWa%:<;>a36T3[Hs36TgfTdS8JTW=@565[=®WZL=@U TdI:<a%:VW¯^>;>_`T36=@;>aae^IR>R=BLTTdI:=B¤Sa:MLHB:<CL=B¤S^>a%T;>:<aa=BWY;>8B3[HB:k]8<fB:<a368B;A_M58Baa3[>:ML!aMGX°;8BC>C>3[T36=@; T%=TdI:a%T8B;>C>8JL!Ca:<;>a3[T3[H43[TfoP@3[HB:<;TdI:8<HJ8B36568J¤S5[: :MHs36C>:<;>_`:Bb:8B56a%=oaT^>CIf±TdI: :`¢:<_`T«=BWRS8JLe8BU:MT%:ML36;>8B_M_M^IL!8­¦_M36:<a36;H43[:M²=BWVae_`:<;>8JLe3[=@a=BWV8BC>C>36T3[=@;>8B5:MH436CI:<;S_`:BG\©ª:AadI=³Td>8JT«TdI:£a%T8B;>C>8JL!Ca%:<;Sa3[T3[H43[TgfW¯^>;>_`T36=@;>aa^I´ _`:OT%=¥CI:<a_`Le3[¤:Oa^>_!dKa_`:<;>8JL!3[=1a%:<;>a36T3[Hs36T3[:<aMG

1 Introduction

Bayesian networks are often employed for classification tasks, where an input instance is to be assigned to one of a number of output classes. The actual classifier then is a function which assigns a single class to each input, based on the posterior probability distribution computed from the Bayesian network for the output variable. Such classifiers are often built upon a naive Bayesian network, consisting of a single class variable and a number of observable feature variables, each of which is modelled as being independent of every other feature variable given the class variable. The numerical parameters for such a naive network are generally estimated from data and inevitably are inaccurate. Experiments have shown time and again that classifiers built on naive Bayesian networks are quite stable: they are competitive with other learners, regardless of the size and quality of the data set from which they are learned (Domingos & Pazzani, 1997; Rish, 2001; Liu et al., 2005). Apparently, inaccuracies in the parameters of the underlying network do not significantly affect the performance of such a naive classifier.

The observed stability of naive Bayesian classifiers may be attributed to (a combination of) properties of the data and of the classifier function. The commonly used winner-takes-all rule, which assigns an instance to a class which is most probable according to the underlying Bayesian network, for example, seems to contribute to the naive classifier's success (Domingos & Pazzani, 1997). The observed stability may also be attributed to the naive independence properties of the classifier. It has been shown, for example, that naive Bayesian classifiers perform well for both completely independent and functionally dependent feature variables (Domingos & Pazzani, 1997; Rish, 2001).

In this paper, we employ techniques from sensitivity analysis to study the effects of parameter inaccuracies on a naive Bayesian network's posterior probability distributions and thereby contribute yet another explanation of the observed stability of naive Bayesian classifiers. In general, the sensitivity of a posterior probability of interest to parameter variation complies with a simple mathematical function (Castillo et al., 1997; Coupé & van der Gaag, 2002), called a sensitivity function. We will demonstrate that the independence assumptions underlying a naive Bayesian network constrain these sensitivity functions to such a large extent that they can in fact be established exactly from very limited information. In addition, we study the various sensitivity properties that follow from the constrained functions. We would like to note that sensitivity analysis has been applied before in the context of naive Bayesian classifiers, as a means of providing bounds on the amount of parameter variation that is allowed without changing, for any of the possible instances, the class an instance is assigned to (Chan & Darwiche, 2003). We extend on this result with further insights.

For classification problems, it is often assumed that evidence is available for every single feature variable. In numerous application domains, however, this assumption may not be realistic. The question then arises how much impact additional evidence could have on the output of interest and how sensitive this impact is to inaccuracies in the network's parameters. We introduce the novel notion of scenario sensitivity to capture the latter type of sensitivity and show that the effects of parameter variation in view of scenarios of additional evidence can be established efficiently.

The paper is organised as follows. In Section 2, we present some preliminaries on sensitivity functions and their associated sensitivity properties. In Section 3, we establish the functional forms of the sensitivity functions for naive Bayesian networks and address the ensuing sensitivity properties. In Section 4 we introduce the notion of scenario sensitivity and show that it can be established from standard sensitivity functions. The paper ends with our conclusions and directions for future research in Section 5.

2 Preliminaries

[Figure 1: The possible hyperbolas and their constants, where the constraints on r, s and t are specific for sensitivity functions. The four branches lie in the quadrants I–IV defined by the asymptotes, under the constraints r < 0, s > 1, t ≤ 1; r < 0, s < 0, t ≥ 0; r > 0, s < 0, t ≤ 1; and r > 0, s > 1, t ≥ 0; each branch has its center at (s, t) and its vertex at distance √|2r| from the center.]

To investigate the effects of inaccuracies in its parameters, a Bayesian network can be subjected to a sensitivity analysis. In such an analysis, the effects of varying a single parameter on an output probability of interest are studied. These effects are described by a simple mathematical function, called a sensitivity function. If, upon varying a single parameter, the parameters pertaining to the same conditional distribution are co-varied proportionally, then the sensitivity function is a quotient of two linear functions in the parameter x under study (Castillo et al., 1997; Coupé & van der Gaag, 2002); more formally, the function takes the form

  f(x) = (a · x + b) / (c · x + d),

where the constants a, b, c, d are built from the assessments for the non-varied parameters. The constants of the function can be feasibly computed from the network (Coupé & van der Gaag, 2002; Kjærulff & van der Gaag, 2000).

A sensitivity function is either a linear function or a fragment of a rectangular hyperbola. A rectangular hyperbola takes the general form

  f(x) = r / (x - s) + t,

where, for a function with a, b, c, d as before, we have that s = -d/c, t = a/c and r = b/c + s · t. The hyperbola has two branches and the two asymptotes x = s and f(x) = t; Figure 1 illustrates the locations of the possible branches relative to the asymptotes. Since both x and f(x) are probabilities, the two-dimensional space of their feasible values is defined by 0 ≤ x, f(x) ≤ 1; this space is termed the unit window. Since any sensitivity function should be well-behaved in the unit window, a hyperbolic sensitivity function is a fragment of just one of the four possible branches in Figure 1.

From a sensitivity function, various properties can be computed.

256 S. Renooij and L. van der Gaag

Here we briefly review the properties of sensitivity value (Laskey, 1995), vertex proximity and admissible deviation (Van der Gaag & Renooij, 2001). The sensitivity value for a parameter is the absolute value |f'(x0)| of the first derivative of the sensitivity function at the assessment x0 for the parameter; it captures the effect of an infinitesimally small shift in the parameter on the output probability. The impact of a larger shift is strongly dependent upon the location of the vertex of the sensitivity function, that is, the point where |f'(x)| = 1. A vertex that lies within the unit window basically marks the transition from parameter values with a high sensitivity value to parameter values with a low sensitivity value, or vice versa. If an output probability is used for establishing the most likely value of the output variable, the effect of parameter variation on the most likely output value is of interest. The admissible deviation for a parameter now is a pair whose first element gives the amount of variation allowed to values smaller than its assessment without changing the most likely output value, and whose second element gives the amount of variation allowed to larger values; the symbols ← and → are used to indicate that variation is allowed up to the boundaries of the unit window.
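The equivalence of the two forms above is easy to verify numerically. The following short Python sketch is our own illustration (the constants a, b, c, d are arbitrary choices, not values from the paper): it converts a quotient of two linear functions into the hyperbola form with s = -d/c, t = a/c and r = b/c + s·t, and checks that both expressions agree on the unit window.

```python
# Sensitivity function as a quotient of two linear functions,
# f(x) = (a*x + b) / (c*x + d), with arbitrary example constants.
a, b, c, d = 0.3, 0.05, 1.0, 0.4

def f_quotient(x):
    return (a * x + b) / (c * x + d)

# Equivalent rectangular-hyperbola form f(x) = r / (x - s) + t.
s = -d / c
t = a / c
r = b / c + s * t

def f_hyperbola(x):
    return r / (x - s) + t

# The two forms agree everywhere on the unit window.
for x in [0.0, 0.25, 0.5, 0.75, 1.0]:
    assert abs(f_quotient(x) - f_hyperbola(x)) < 1e-12
```

Note that with c = 1 the asymptotes s = -0.4 and t = 0.3 lie outside respectively inside the unit window, so the fragment visible in the window belongs to a single branch, as the text requires.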

3 Sensitivities in Naive Bayesian Networks

Upon being subjected to a sensitivity analysis, the independence assumptions of a naive Bayesian network strongly constrain the general form of the resulting sensitivity functions. In fact, given just limited information, the exact functions can be readily determined for each class value and each parameter. In this section we derive these functions and discuss their ensuing sensitivity properties.

3.1 Sensitivity functions

Before establishing the sensitivity functions for naive Bayesian networks, we introduce some notational conventions. We consider a naive Bayesian network with nodes V(G) = {C} ∪ E, E = {E_1, ..., E_n}, n ≥ 2, and arcs A(G) = {(C, E_i) | i = 1, ..., n}; C is called the class variable of the network and the E_i are termed its feature variables. We now study the sensitivity of the posterior probability Pr(c | e) of a class value c given an input instance e, in terms of a parameter x = p(e'_k | c'), where e'_k denotes some value of the feature variable E_k and c' is a value of the class variable. The original assessment specified in the network for the parameter x is denoted by x0. The notation Pr0 is used for the posterior probability of interest prior to parameter variation. We use f_{Pr(c|e)}(x) to denote the sensitivity function that expresses the probability of interest in terms of the parameter x.

The following proposition now captures the sensitivity function that describes the output probability of interest of a naive Bayesian network in terms of a single parameter associated with one of the feature variables. The proposition shows that this function is highly constrained as a result of the independences in the network and the parameter being conditioned on a value of the class variable. In fact, for any class value and any parameter, only one of four different functions can result. We would like to note that the proposition holds only for parameters and output probabilities that allow for a hyperbolic sensitivity function.

Proposition 1. Let E_k be a feature variable with the value e_k in e and let x = p(e'_k | c') be a parameter associated with E_k. Then, the sensitivity function f_{Pr(c|e)}(x) has one of the following four forms:

  f_{Pr(c|e)}(x) = x / (x - s1)        if c = c' and e_k = e'_k;
  f_{Pr(c|e)}(x) = r_c / (x - s1)      if c ≠ c' and e_k = e'_k;
  f_{Pr(c|e)}(x) = (x - 1) / (x - s2)  if c = c' and e_k ≠ e'_k;
  f_{Pr(c|e)}(x) = r_c / (s2 - x)      if c ≠ c' and e_k ≠ e'_k;

where s1 = x0 - x0/Pr0 and s2 = x0 + (1 - x0)/Pr0 are the vertical asymptotes of the functions and Pr0 is the original value of Pr(c' | e).

Proof. We prove the property stated in the proposition for e_k = e'_k; the proof for e_k ≠ e'_k is analogous. We begin by writing the probability Pr(c, e) in terms of the network's parameters:

  Pr(c, e) = Π_{i≠k} p(e_i | c) · p(c) · p(e_k | c),

Evidence and Scenario Sensitivities in Naive Bayesian Classifiers 257


where e_i is the value of the variable E_i in the input instance e. This probability relates to the parameter x = p(e'_k | c') as

  Pr(c', e)(x) = Π_{i≠k} p(e_i | c') · p(c') · x

for c = c', and as

  Pr(c, e)(x) = Π_i p(e_i | c) · p(c)

for c ≠ c'. Similarly, Pr(e) can be written as

  Pr(e)(x) = Pr(c', e)(x) + Σ_{c≠c'} Pr(c, e),

where Pr(c', e)(x) again relates to the parameter x as indicated above and the other term is constant with respect to the parameter. For c = c' we now find for the sensitivity function that

  f_{Pr(c'|e)}(x) = Pr(c', e)(x) / Pr(e)(x) = (a · x) / (a · x + b) = x / (x - s1)

and for c ≠ c' that

  f_{Pr(c|e)}(x) = Pr(c, e)(x) / Pr(e)(x) = b_c / (a · x + b) = r_c / (x - s1),

with the constants

  a = Π_{i≠k} p(e_i | c') · p(c'),
  b_c = Π_i p(e_i | c) · p(c),
  b = Σ_{c≠c'} Pr(c, e).

The constant s1 = -b/a is the vertical asymptote that is shared by all functions; its value is determined from

  s1 = -(Pr(e) - Pr(c', e)) / (Pr(c', e) / p(e_k | c')) = x0 - x0/Pr0.

In addition, the function f_{Pr(c'|e)}(x) has a horizontal asymptote at t = a/a = 1; the functions f_{Pr(c|e)}(x) with c ≠ c' all have t = 0/a = 0. The constants r_c = b_c/a of the latter functions are determined from Pr0(c | e) = r_c / (x0 - s1). □
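The derivation above can be checked numerically. The sketch below is our own illustration with arbitrary made-up assessments (not the paper's example): it builds a two-feature naive Bayes model, varies the single parameter x = p(E1 = 1 | C = 0) while co-varying its complement, and confirms that the posterior indeed follows x/(x - s1) with s1 = x0 - x0/Pr0.

```python
# Tiny naive Bayes model: class C in {0, 1}, two binary features.
# All assessments are arbitrary example numbers.
p_c = {0: 0.6, 1: 0.4}
p_e1_given_c = {0: 0.7, 1: 0.3}   # p(E1 = 1 | c)
p_e2_given_c = {0: 0.8, 1: 0.55}  # p(E2 = 1 | c)

def posterior_c0(x):
    """Pr(C = 0 | E1 = 1, E2 = 1) when the parameter x = p(E1 = 1 | C = 0)
    is varied and its complement p(E1 = 0 | C = 0) is co-varied as 1 - x."""
    joint = {}
    for c in (0, 1):
        pe1 = x if c == 0 else p_e1_given_c[1]
        joint[c] = p_c[c] * pe1 * p_e2_given_c[c]
    return joint[0] / (joint[0] + joint[1])

x0 = p_e1_given_c[0]          # original assessment
pr0 = posterior_c0(x0)        # original posterior Pr0
s1 = x0 - x0 / pr0            # shared vertical asymptote, negative here
for x in [0.1, 0.3, 0.5, 0.9]:
    assert abs(posterior_c0(x) - x / (x - s1)) < 1e-12
```

With these numbers s1 works out to -0.1375, so the posterior for C = 0 is x/(x + 0.1375) for every x in the unit window, exactly as the proposition predicts.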

[Figure 2: Example sensitivity functions for the parameter x = p(e'_k | c') with x0 = 0.2 and Pr0 = 0.4, where e_k = e'_k (left) or e_k ≠ e'_k (right); each panel plots f_{c'}(x) and f_c(x) over the unit window.]

From the values of the asymptotes found in the proof above, we observe that for e_k = e'_k the sensitivity function f_{Pr(c'|e)}(x) is a fragment of a IVth-quadrant hyperbola branch; all other functions f_{Pr(c|e)}(x) with c ≠ c' are fragments of Ist-quadrant branches. For e_k ≠ e'_k we find IIIrd- and IInd-quadrant branches, respectively. These properties are illustrated in Figure 2.

To intuitively explain why the function f_{Pr(c'|e)}(x) has a different shape than the other functions, we observe that varying a parameter p(e'_k | c') has a direct effect on the probability Pr(c' | e), whereas the probabilities Pr(c | e), c ≠ c', are only indirectly affected to ensure that the distribution sums to one. Due to the constrained form of these functions, moreover, all f_{Pr(c|e)}(x), c ≠ c', have the same shape. The function f_{Pr(c'|e)}(x) therefore must be deviant.

We illustrate the functional forms derived above with an example. The example shows more specifically that as a result of their constrained form, any sensitivity function can be established from very limited information.

Example 1. We consider a naive Bayesian network with a class variable modelling the stages I, IIA, IIB, III, IVA and IVB of cancer of the oesophagus; the feature variables of the network capture the results from diagnostic tests. For a given patient, the available findings are summarised in the input instance e, given which the following posterior distribution is computed over the class variable C:

[posterior distribution over the six stages; the values referred to in the sequel include Pr(IVA | e) = 0.61, Pr(IIA | e) = 0.19 and Pr(IVB | e) = 0.11]



We further consider the feature variable modelling the observed presence or absence of loco-regional lymphatic metastases. The network includes the following assessments for the parameters for this variable:

[table of the assessments p(yes | c) and p(no | c) for the six stages]

Now suppose that we are interested in the effects of inaccuracies in the parameter x = p(no | C = IVA) on the posterior probabilities Pr(· | e) for our patient, who has no evidence of loco-regional metastases. These effects are captured by six functions with the vertical asymptote s1 = 0.52 - 0.52/0.61 = -0.33. Without having to perform any further computations, we now find that

  f_{Pr(IVA|e)}(x) = x / (x + 0.33)

and

  f_{Pr(c|e)}(x) = Pr(c | e) · 0.85 / (x + 0.33)

for any c ≠ IVA. We further find that for the complement of the parameter x all functions have their vertical asymptote at s = 1.33. □

3.2 Sensitivity properties

For a naive Bayesian network, any sensitivity function is exactly determined by the assessment for the parameter being varied and the original value of the probability of interest. From the function now any sensitivity property of interest can be computed. We study the properties of sensitivity value, vertex proximity and admissible deviation.

Sensitivity value and vertex proximity. For the various possible sensitivity functions for a naive Bayesian network, the sensitivity values are readily established: for e_k = e'_k we find that

  |∂f_{c'}/∂x (x0)| = (1 - Pr0) · Pr0 / x0  ≥  Pr(c | e) · Pr0 / x0 = |∂f_c/∂x (x0)|.

For e_k ≠ e'_k, we find a similar result by replacing x0 by 1 - x0. Note that for a binary class variable, the sensitivity values for its two class values are the same. This property holds for any binary output variable in any Bayesian network.
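The closed form of the sensitivity value can be checked against a finite-difference approximation. The sketch below is our own illustration; the values x0 = 0.2 and Pr0 = 0.4 simply match the setting of Figure 2 and are not derived from the running example.

```python
# Finite-difference check of the sensitivity value: for
# f(x) = x / (x - s1) with s1 = x0 - x0/Pr0, the derivative at x0
# equals (1 - Pr0) * Pr0 / x0.  Values as in Figure 2: x0 = 0.2, Pr0 = 0.4.
x0, pr0 = 0.2, 0.4
s1 = x0 - x0 / pr0            # = -0.3

def f(x):
    return x / (x - s1)

h = 1e-6
deriv = (f(x0 + h) - f(x0 - h)) / (2 * h)   # central difference
assert abs(deriv - (1 - pr0) * pr0 / x0) < 1e-6   # both equal 1.2
```

Since the sensitivity value 1.2 exceeds 1, this parameter sits in the high-sensitivity region described below; indeed x0 = 0.2 < Pr0 - Pr0² = 0.24.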

[Figure 3: The sensitivity value for f_{Pr(c'|e)}(x), plotted as |df/dx (x0)|, as a function of x0 and Pr0, for e_k = e'_k.]

Based upon the general form of the sensitivity value derived above, Figure 3 depicts the sensitivity value of f_{Pr(c'|e)}(x) given e_k = e'_k for different combinations of values for x0 and Pr0. We observe that high sensitivity values can only be found if the assessment for the parameter under study is small and the original posterior probability Pr0 is non-extreme. More formally, we have that |f'(x0)| > 1 if and only if x0 < Pr0 - Pr0² ≤ 0.25. For e_k ≠ e'_k, high sensitivity values can only be found if the assessment for the parameter is larger than 0.75 and the original posterior probability Pr0 is non-extreme. Note that high sensitivity values can thus only be found for the parameters that describe the least likely value of a feature variable given a particular class value. This property was noticed before for general Bayesian networks (Van der Gaag & Renooij, 2006).

In view of naive Bayesian classifiers, we observe that input instances which occur the more frequently given a particular class are also likely to be the more probable given that class in the network underlying the classifier. Since high sensitivity values can only be found for the parameters that describe the least likely value of a feature variable, the posterior probability of the corresponding class value for such an input instance will not be very sensitive to inaccuracies in the various parameters. In fact, the vertical asymptote of an associated sensitivity function, and hence the x-value of the function's vertex, will be quite distant from the parameter's assessment, resulting in a rather flat function in



the broader vicinity of x0. The most likely class value will then hardly ever change upon small shifts in a parameter. The above argument thus corroborates the empirically observed robustness of classifiers to parameter inaccuracies.

[Figure 4: Examples of sensitivity functions for the parameter x = p(e'_k | c'), where e_k = e'_k and c' is not the most likely value. Variation either can make c' the most likely value (left) or will not change the most likely value (right).]

Admissible deviation. In view of classification, the property of admissible deviation is especially of interest. The admissible deviation for a parameter gives the amount of variation that is allowed before an instance is classified as belonging to a different class. The following proposition now gives a general form for all possible deviations in a naive Bayesian network. The proposition more specifically shows that in such a network, the most likely class value can change at most once upon varying a parameter.

Proposition 2. Let p* = max { Pr(c | e) | c ≠ c' }, let c* be a class value for which Pr(c* | e) = p*, and let f(x) = f_{Pr(c*|e)}(x). Then we have f_{Pr(c|e)}(x) ≤ f(x) for all c ≠ c'. For e_k = e'_k, the admissible deviation for the parameter x is

  (x0 - x*, →)  if p* < Pr0;
  (0, 0)        if p* = Pr0;
  (←, x* - x0)  if p* > Pr0 and x* ≤ 1;
  (←, →)        otherwise,

where x* = p* · x0 / Pr0. For e_k ≠ e'_k, the above admissible deviation holds with x0 and x* replaced by 1 - x0 and 1 - x*, respectively.

Proof. The first property follows from the observation that all functions f_{Pr(c|e)} with c ≠ c' have the same horizontal and vertical asymptotes and therefore do not intersect. As a consequence, either c' or c* is the most likely value of C, regardless of the value of x. If the two functions f_{Pr(c'|e)}(x) and f(x) intersect at x*, then we have that x* > 0 for e_k = e'_k and x* < 1 for e_k ≠ e'_k. The second and third properties stated above now follow immediately from the general forms of the functions. □

We observe from the above proposition that, for any parameter, an admissible amount of variation different from ← or → is possible only to the left or to the right of the parameter's assessment, but never in both directions. From this observation we have that upon varying the parameter, the most likely class value can change only once. Figures 2 and 4 support this observation.

Example 2. We consider again the naive Bayesian classifier and the patient information from Example 1. Suppose that we are once more interested in the effects of inaccuracies in the parameter x = p(no | C = IVA) on the posterior probabilities Pr(· | e) for the patient. We recall that the value IVA corresponds to the most likely stage for this patient, with a probability of 0.61. The second most likely stage for the patient is stage IIA, with probability 0.19. We now find that the most likely stage changes from IVA to IIA at x* = 0.19 · 0.52/0.61 = 0.16. The admissible deviation for the parameter thus is (0.36, →). Note that the most likely stage cannot change into any other value upon varying the parameter under study. □

# $# % ? ,wp A Ì >sGlvdI:8BC>U 36aa3[¤5[:1CI:MH4368JT3[=@;WZ=BLTdI:RS8JLe8BU:MT%:MLTds^>a36aʯp A t,> : yÐ!Gx+=BT%:Td>8JTTdI:U=@a%TO563[B:<5[foa%T8JPB:1_M8B;4¦;I=BT_!d>8B;IPB:K36;?T%=ª8B;~fÂ=BTdI:MLHJ8B56^I:A^IR=@;ÀHJ8JLf436;IPTdI:OR8JLe8BU:MT%:ML+^S;>CI:MLa%T^>C>fBG -X;«Hs3[:MÂ=BW;S8B3[HB:kl8fB:<a368B;1;>:MTgl=BL_M568Baea3[>:MLea<b­l:d>8HB:]WZL=@UÅTdI:]8J¤=HB:]R>L=BR=@a36T3[=@;yTd>8JTl:_M8B;:`ms¦R:<_`TT%=S;>C¥568JLPB:+8BCSU136aa36¤S5[:CI:MH4368JT3[=@;>aWZ=BLlU136a%T36;Sa%T8B;>_`:<aMG vE=«a^>R>R=BLTTd>36a:`m4R/:<_`T8JT36=@;b4:_`=@;4¦a3CI:ML81RS8JL!8BU:MT%:MLC)¨V3[TdN R , N QR@Î/a36U13568JL8JLP@^4¦U:<;?Ta dI=@56C«W=BL<N RH,N Q R GXW O ; O`Qb@l:]dS8<HB:WL=@UTdI:OW¯^>;>_`T36=@;>8B5/WZ=BLeU aTd>8JTTdI:8Baea%:<aaU:<;?T\)W=BLTdI:lRS8JLe8BU1:MT%:ML3aE^>;S563[B:<5[f&T%=+¤:lHB:MLfaU 8B565bJ8BaEW=BL



functions with an asymptote to the left of the unit window the intersection of the sensitivity functions lies to the left of x0. If p* > Pr0, on the other hand, we expect an admissible deviation within the unit window only for very small values of x0. If the instances that are modelled as the more probable given a particular class value occur the more often in practice, then for these instances we expect that x0 is not a small value and that Pr0 > p*. In practice we would therefore expect most instances to result in admissible deviations of (←, →).

4 Scenario Sensitivity

For classification problems, it is generally assumed that evidence is available for every single feature variable. In practical applications, however, this assumption may not be realistic. In the medical domain, for example, a patient is to be classified into one of a number of diseases without being subjected to every possible diagnostic test. The question then arises how much impact additional evidence could have on the probability distribution over the class variable and how sensitive this impact is to inaccuracies in the network's parameters. The former issue is closely related to the notion of value of information. The latter issue involves a notion of sensitivity that differs from the standard notion used in the previous sections, in the sense that it pertains not to available evidence but to scenarios of possibly additional evidence. We refer to this notion of sensitivity as scenario sensitivity and use the term evidence sensitivity to refer to the more standard notion. Although it is applicable to Bayesian networks in general, in this section we discuss scenario sensitivity in the context of naive Bayesian classifiers.

To study the effect of additional evidence on an output probability for the class variable, we consider the ratio of Pr(c | e+) to Pr(c | e), where e and e+ denote the evidence prior to and after receiving the new evidence, respectively.

Proposition 3. Let E and E+ be sets of feature variables with E ⊆ E+ and E+ - E = {E_1, ..., E_m}, 1 ≤ m ≤ n. Let e and e+ be joint value assignments to E and E+, respectively. Then, for each c, we have that

  Pr(c | e+) / Pr(c | e) = Π_{i=1..m} p(e_i | c) / ( Σ_c Π_{i=1..m} p(e_i | c) · Pr(c | e) ),

where e_i is the value of E_i in e+.

Proof. The property follows immediately by applying Bayes' rule and exploiting the independences that hold in the naive network. □
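The proposition translates directly into a small update routine. The sketch below is our own illustration with arbitrary numbers (three classes and one newly observed feature; these are not the values of the running example): the new posterior is obtained from the old posterior and the new feature's parameters alone.

```python
# Update a posterior over the class variable with one additional
# observed feature value, using only the old posterior Pr(c | e)
# and the parameters p(e_new | c) -- no re-propagation needed.
def update_posterior(posterior, likelihood):
    norm = sum(likelihood[c] * posterior[c] for c in posterior)
    return {c: likelihood[c] * posterior[c] / norm for c in posterior}

# Arbitrary example values (hypothetical, not from the paper):
posterior = {"c1": 0.61, "c2": 0.28, "c3": 0.11}   # Pr(c | e)
likelihood = {"c1": 0.10, "c2": 0.20, "c3": 0.69}  # p(e_new | c)

new_posterior = update_posterior(posterior, likelihood)
# The updated distribution is properly normalised, and the class that
# makes the new observation likely gains probability mass.
assert abs(sum(new_posterior.values()) - 1.0) < 1e-12
assert new_posterior["c3"] > posterior["c3"]
```

The normalising constant computed inside the routine is exactly the denominator of the proposition, so the update costs only one pass over the class values.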

vdI:K8J¤=HB:\R>L=BR=@a36T3[=@;¡;I=(8B565[=+a ^>aT%=_`=@U¦RS^IT%:TdI:;I:MßR/=@aT%:MLe3[=BLVRSL=B¤S8J¤S3653[TgfAC>36aT%Le3[¤S^IT36=@;=HB:ML&TdI:y_M568BaeaH­8JL!368J¤S5[:OWL=@UTdI:yRSL:MHs36=@^>a]=@;I:BG `5 _-1 %'& ©ª:OL:<_`=@;Sa36CI:MLlTdI:O;>8B3[HB:Ok]8<fB:<a38B;;I:MT=BL×8B;>CÂTdI:£RS8JT3[:<;?T36;IW=BLeU18JT3[=@;WL=@UÄTdI:R>L:MH43[=@^>ay:`m48BUR5[:<aMG X;À8BCSC>3[T3[=@;±T%=oTdI:1T%:<a%TaT%=Vd>3_edTdI:£RS8JT3[:<;?T«d>8Ba8B5[Le:<8BCIfǤ:M:<;×a^I¤47%:<_`T%:<Cb8Ka_M8B;À=BW]TdI: 563[HB:MLy_M8B;À¤:R/:MLeWZ=BLeU1:<CG©ª:¥;I=8JL:3;~T%:ML:<aT%:<CK36;ATdI:OR=@a%T%:ML!3[=BLVC>36aT%Le3[¤S^IT36=@; =HB:MLTdI:yHJ8JLe3[=@^SaVa%T8JPB:<a&3[WYTdI:L:<ae^>5[T=BWTd>36aT%:<aTVl:ML:R=@a3[T3[HB:BG =BLTdI:«WZ:<8JT^ILe:HJ8JLe368J¤S56: g+°Û JØ > b/TdI:W=@565[=+36;IPRS8JLe8BU:MT%:ML&8Baa%:<aaeU:<;~Ta+8JL:ya%R:<_M3[>:<CÈ

$0 !#" ! !%" ! !%" ! !%" ! !#" ! ,." /

* !#" & !%" & !%" & !%" & !#" & !%" ! L=@U TdI:¡T8J¤S5[:¡8B;>CßTdI:¡RSL=B¤S8J¤S3653[Tgf)C>3a%T%Le3[¤S^I¦T3[=@; h LÊ J7N­ÐWL=@U mI8BURS56:ÌJbVl:±S;>C®Td>8JT + h LÊ g+Û JØ > ,.fB:<a@J1Ð 0 h LÊ J NÐ ,)p A Ì<ÏJpIGvdI:;I:MÆR=@a%T%:MLe36=BLR>L=B¤S8J¤36563[Tf =BW (, X Þ k);I=W=@565[=+aC>3[L:<_`T5[fWL=@Uh LÊX Þ k JNÐp A ÌBÌ

, p A >Brp A Ì<ÏJp,wp A >Bt

V3[Td>=@^IT+Le:<?^>36Le36;IP 8B;?fo8BCSC>3[T3[=@;>8B5E_`=@URS^IT8JT36=@;>aWL=@U TdI:;I:MT=BLeGx+=BT%:Td>8JTTd>:;I:MÒT%:<a%TLe:<a^>5[Tl=@^>56CA_edS8B;IPB:TdI:yU1=@a%TV563[B:<56f¥a%T8JPB:BG -vdI:Ç8J¤=HB:ÀR>L=BR=@a36T3[=@;.8B565[=VaAW=BLK:<a%T8J¤S563ad>36;IPTdI:y36U1RS8B_`T]=BWE8BC>C>36T3[=@;>8B5/:MHs3CI:<;>_`:=@;£TdI:OR=@a%T%:`¦Le3[=BLRSL=B¤S8J¤S3653[Tgf CS36a%T%Le3[¤^IT3[=@;1=HB:MLVTdI:_M568Baa]HJ8JLe3§¦8J¤S5[:BGvE=\_M8JR>T^ILe:TdI:1a%:<;>a36T3[Hs36Tgf\=BWlTd>36aO3URS8B_`TT%=&RS8JLe8BU:MT%:MLYH­8JL!368JT3[=@;b­:]_`=@;>a3CI:ML¿TdI:W^S;>_`T3[=@; Ê)ÐVTd>8JTyCI:<a_`Le36¤/:<aVTdI:8J¤=HB:R>L=B¤8J¤S36563[Tf£Le8JT3[=8BaV8W¯^>;>_`T36=@;£=BW81RS8JLe8BU:MT%:ML )Q, OYÊNQ R J 6 Q6Ð!È Ê)Ð,"! h LÊ 6 JNYÐh LÊ 6 JN Ð$# Ê)Ð, h LÊ 6 JNYÐ`Ê)Ðh LÊ 6 JN Ð`Ê)ÐX°WETdI:OR8JLe8BU:MT%:MLC)oR:MLT8B36;>alT%= 8HJ8JLe368J¤5[:O36;£TdI:a%:MT = UG = blTdI:<;¡TdI:KCI:<;>=@U136;>8JT%=BL 3a8ª_`=@;4¦a%T8B;?T+3[TdL:<a%R:<_`TT%=V)¿G vd>:W^S;>_`T3[=@; Ê)Ð TdI:<;

Evidence and Scenario Sensitivities in Naive Bayesian Classifiers 261

Page 272: Proceedings of the Third European Workshop on Probabilistic


262 S. Renooij and L. van der Gaag


Abstraction and Refinement for Solving Continuous Markov Decision Processes

Alberto Reyes1 and Pablo Ibargüengoytia
Inst. de Inv. Electricas

Av. Reforma 113, Palmira, Cuernavaca, Mor., Mexico
areyes,[email protected],mx

L. Enrique Sucar and Eduardo Morales
INAOE

Luis Enrique Erro 1, Sta. Ma. Tonantzintla, Pue., Mexico
esucar,[email protected]

Abstract

We propose a novel approach for solving continuous and hybrid Markov Decision Processes (MDPs) based on two phases. In the first phase, an initial approximate solution is obtained by partitioning the state space based on the reward function and solving the resulting discrete MDP. In the second phase, the initial abstraction is refined and improved: states with high variance in their value with respect to neighboring states are partitioned, and the MDP is solved locally to improve the policy. In our approach, the reward function and transition model are learned from a random exploration of the environment, and the method can work with both pure continuous spaces and hybrid spaces with continuous and discrete variables. We demonstrate the method empirically in several simulated robot navigation problems of different sizes and complexities. Our results show an approximately optimal solution with an important reduction in state size and solution time compared to a fine discretization of the space.

1 Introduction

Markov Decision Processes (MDPs) have developed as a standard method for decision-theoretic planning. Traditional MDP solution techniques have the drawback that they require an explicit state representation, limiting their applicability to real-world problems. Factored representations (Boutilier et al., 1999) address this drawback by compactly specifying the state space in factored form using dynamic Bayesian networks or decision diagrams. Such factored MDPs can represent exponentially large state spaces in a more compact way. The algorithms for planning with MDPs, however, still run in time polynomial in the size of the state space or exponential in the

1 Ph.D. Student at ITESM Campus Cuernavaca, Av. Reforma 182-A, Lomas, Cuernavaca, Mor., Mexico

number of state variables, and do not apply to continuous domains.

Many stochastic processes are more naturally defined using continuous state variables, which are solved as Continuous Markov Decision Processes (CMDPs) or general state-space MDPs, by analogy with general state-space Markov chains. In this approach the optimal value function satisfies the Bellman fixed-point equation:

V(s) = max_a [ R(s, a) + γ ∫_{s′} p(s′|s, a) V(s′) ds′ ],

where s represents the state, a an action from a finite set, R the immediate reward, V the expected value, p the transition function, and γ a discount factor.

The problem with CMDPs is that if the continuous space is discretized to find a solution, the discretization causes yet another level of exponential blow-up. This “curse of dimensionality” has limited the use of the MDP framework, and overcoming it has become a relevant topic of research. Existing solutions attempt to replace the value function or the optimal policy with a finite approximation. Two recent methods to solve CMDPs are known as grid-based MDP discretizations and parametric approximations. The idea behind grid-based MDPs is to discretize the state space into a set of grid points and approximate value functions over such points. Unfortunately, classic grid algorithms scale up exponentially with the number of state variables (Bonet and Pearl, 2002). An alternative way is to approximate the optimal value function V(x) with an appropriate parametric function model (Bertsekas and Tsitsiklis, 1996). The parameters of the model are fitted iteratively by applying one-step Bellman backups to a finite set of state points arranged on a fixed grid or obtained through Monte Carlo sampling. A least-squares criterion is used to fit the parameters of the model. In addition to parallel updates and optimizations, on-line update schemes based on gradient descent, e.g., (Bertsekas and Tsitsiklis, 1996), can be used to optimize the parameters. The disadvantages of these methods are their instability and possible divergence (Bertsekas, 1995).

Several authors, e.g., (Dean and Givan, 1997), use the notions of abstraction and aggregation to group states that are similar with respect to certain problem characteristics, to further reduce the complexity of the representation or the solution. (Feng et al., 2004) proposes a state aggregation approach for exploiting structure in MDPs with continuous variables, where the state space is dynamically partitioned into regions in which the value function is the same throughout each region. The technique comes from POMDPs to represent and reason about linear surfaces effectively. (Hauskrecht and Kveton, 2003) show that approximate linear programming is able to solve not only MDPs with finite states, but can also be successfully applied to factored continuous MDPs. Similarly, (Guestrin et al., 2004) presents a framework that also exploits structure to model and solve

factored MDPs, although he extends the technique to be applied to both discrete and continuous problems in a collaborative setting.

Our approach is related to these works; however, it differs in several aspects. First, it is based on qualitative models (Kuipers, 1986), which are particularly useful for domains with continuous state variables. It also differs in the way the abstraction is built. We use training data to learn a decision tree for the reward function, from which we deduce an abstraction called qualitative states. This initial abstraction is refined and improved via a local iterative process. States with high variance in their value with respect to neighboring states are partitioned, and the MDP is solved locally to improve the policy. At each stage in the refinement process, only one state is partitioned, and the process finishes when no potential partition changes the policy. In our approach, the reward function and transition model are learned from a random exploration of the environment, and the method can work with both pure continuous spaces and hybrid spaces with continuous and discrete variables.

We have tested our method in simulated robot navigation problems, in which the state space is continuous, with several scenarios of different sizes and complexities. We compare the solution of the models obtained with the initial abstraction and the final refinement against the solution of a discrete MDP obtained with a fine discretization of the state space, in terms of differences in policy, value and complexity. Our results show an approximately optimal solution with an important reduction in state size and solution time compared to a fine discretization of the space.

The rest of the paper is organized as follows. The next section gives a brief introduction to MDPs. Section 3 develops the abstraction process and Section 4 the refinement stage. In Section 5 the empirical evaluation is described. We conclude with a summary and directions for future work.

264 A. Reyes, P. Ibargüengoytia, L. E. Sucar, and E. Morales


2 Markov Decision Processes

A Markov Decision Process (MDP) (Puterman, 1994) models a sequential decision problem, in which a system evolves in time and is controlled by an agent. The system dynamics are governed by a probabilistic transition function that maps states and actions to states. At each time, the agent receives a reward that depends on the current state and the applied action. Thus, the main problem is to find a control strategy or policy that maximizes the expected reward over time.

Formally, an MDP is a tuple M = <S, A, Φ, R>, where S is a finite set of states s1, . . . , sn; A is a finite set of actions for all states; Φ : A × S × S → [0, 1] is the state transition function, specified as a probability distribution: the probability of reaching state s′ by performing action a in state s is written as Φ(a, s, s′); and R : S × A → ℝ is the reward function: R(s, a) is the reward that the agent receives if it takes action a in state s. A policy for an MDP is a mapping π : S → A that selects an action for each state. A solution to an MDP is a policy that maximizes its expected value. For the discounted infinite-horizon case with any given discount factor γ ∈ [0, 1), there is an optimal value function V∗, independent of the starting state, that satisfies the Bellman equation (Bellman, 1957):

V∗(s) = max_a [ R(s, a) + γ Σ_{s′∈S} Φ(a, s, s′) V∗(s′) ]

Two popular methods for solving this equation and finding an optimal policy for an MDP are: (a) value iteration and (b) policy iteration (Puterman, 1994).
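As an illustration, value iteration for a small discrete MDP can be sketched as follows. This is not code from the paper; the dictionary-based encoding of Φ and R is our own assumption:

```python
def value_iteration(S, A, Phi, R, gamma=0.9, eps=1e-6):
    """Compute V* for a discrete MDP by repeated Bellman backups.

    Phi[(a, s, s2)]: transition probability (0.0 if the key is absent);
    R[(s, a)]: immediate reward.
    """
    V = {s: 0.0 for s in S}
    while True:
        # One backup per state: V(s) <- max_a [R(s,a) + gamma * sum_s' ...]
        V_new = {
            s: max(R[(s, a)] + gamma * sum(Phi.get((a, s, s2), 0.0) * V[s2]
                                           for s2 in S)
                   for a in A)
            for s in S
        }
        if max(abs(V_new[s] - V[s]) for s in S) < eps:
            return V_new
        V = V_new
```

For example, a two-state MDP in which action `stay` in state `s1` yields reward 1 converges to V(s1) ≈ 1/(1 − γ) = 10.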

2.1 Factored MDPs

A common problem with the MDP formalism is that the state space grows exponentially with the number of domain variables, and its inference methods grow in the number of actions. Thus, in large problems, MDPs become impractical and inefficient. Factored representations avoid enumerating the problem state space, producing a more concise representation that makes solving more complex problems tractable.

In a factored MDP, the set of states is described by a set of random variables X = {X1, . . . , Xn}, where each Xi takes on values in some finite domain Dom(Xi). A state x defines a value xi ∈ Dom(Xi) for each variable Xi. Thus, the set of states S = Dom(X1) × · · · × Dom(Xn) is exponentially large, making it impractical to represent the transition model explicitly as matrices. Fortunately, the framework of dynamic Bayesian networks (DBNs) (Dean and Kanazawa, 1989) gives us the tools to describe the transition function concisely. In these representations, the post-action nodes (at time t + 1) contain matrices with the probabilities of their values given their parents' values under the effects of an action.

3 Qualitative MDPs

3.1 Qualitative states

We define a qualitative state space Q as a set of states q1, q2, . . . , qn that have different utility properties. These properties map the state space into a set of partitions, such that each partition corresponds to a group of continuous states with a similar reward value. In a qualitative MDP, a state partition qi is a region bounded by a set of constraints over the continuous dimensions in the state space. The relational operators used in this approach are < and ≥. For example, assuming that the immediate reward is a function of the linear position in a robot navigation domain, a qualitative state could be a region in an x0 − x1 coordinate system bounded by the constraints x0 ≥ val(x0) and x1 ≥ val(x1), expressing that the current x0 coordinate is limited to the interval [val(x0), ∞] and the x1 coordinate to the interval [val(x1), ∞]. It is evident that a qualitative state can cover a large number of states (if we consider a fine discretization) with similar properties.
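Such a constraint-bounded region can be represented directly as a conjunction of interval constraints. The sketch below is ours, not the paper's code; half-open intervals encode the ≥ and < operators:

```python
class QState:
    """A qualitative state: one (lower, upper) interval constraint
    per continuous dimension."""

    def __init__(self, bounds):
        self.bounds = bounds  # e.g. {'x0': (2.0, float('inf'))}

    def contains(self, point):
        # lower <= value < upper encodes the paper's >= / < operators
        return all(lo <= point[d] < hi for d, (lo, hi) in self.bounds.items())

# The example region x0 >= val(x0) and x1 >= val(x1),
# with illustrative bounds val(x0) = 2 and val(x1) = 3:
q = QState({'x0': (2.0, float('inf')), 'x1': (3.0, float('inf'))})
```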

Similarly to the reward function in a factored MDP, the state space Q is represented by a decision tree (Q-tree). In our approach, the decision tree is automatically induced from data. Each leaf in the induced decision tree is labeled with a new qualitative state. Even for leaves with the

Abstraction and Refinement for Solving Continuous Markov Decision Processes 265


same reward value, we assign a different qualitative state value. This generates more states but at the same time creates more guidance that helps produce more adequate policies. States with similar reward are partitioned so that each q-state is a continuous region. Figure 1 shows this tree transformation in a two-dimensional domain.

Figure 1: Transformation of the reward decision tree into a Q-tree. Nodes in the tree represent continuous variables and edges evaluate whether a variable is less or greater than a particular bound.

Each branch in the Q-tree denotes a set of constraints for each partition qi. Figure 2 illustrates the constraints associated with the example presented above, and its representation in a 2-dimensional space.

Figure 2: In a Q-tree, branches are constraints and leaves are qualitative states. A graphical representation of the tree is also shown. Note that when an upper or lower variable bound is infinite, it must be understood as the upper or lower variable bound in the domain.

3.2 Qualitative MDP Model Specification

We can define a qualitative MDP as a factored MDP with a set of hybrid qualitative-discrete factors. The qualitative state space Q is an additional factor that concentrates all the continuous variables. The idea is to substitute all these variables by this abstraction to reduce the dimensionality of the state space. Initially, only the continuous variables involved in the reward function are considered but, as described in Section 4, other continuous variables can be incorporated in the refinement stage. Thus, a qualitative MDP state is described in a factored form as X = {X1, . . . , Xn, Q}, where X1, . . . , Xn are the discrete factors and Q is a factor that represents the relevant continuous dimensions.

3.3 Learning Qualitative MDPs

The qualitative MDP model is learned from data based on a random exploration of the environment, with 10% Gaussian noise introduced on the action outcomes. We assume that the agent can explore the state space and, for each state-action pair, can receive some immediate reward. Based on this random exploration, an initial partition, Q0, of the continuous dimensions is obtained, and the reward and transition functions are induced.

Given a set of state transitions represented as a set of random variables, Oj = {Xt, A, Xt+1}, for j = 1, 2, . . . , M, for each state and action A executed by an agent, and a reward (or cost) Rj associated to each transition, we learn a qualitative factored MDP model:

1. From the set of examples {O, R}, obtain a decision tree, RDT, that predicts the reward function R in terms of the continuous and discrete state variables, X1, . . . , Xk. We used J48, a Java re-implementation of C4.5 (Quinlan, 1993) in Weka, to induce a pruned decision tree.

2. Obtain from the decision tree, RDT, the set of constraints on the continuous variables relevant to determine the qualitative states (q-states), in the form of a Q-tree. In terms of the domain variables, we obtain a new variable Q representing the reward-based qualitative state space, whose values are the q-states.



3. Qualify the data from the original sample in such a way that the new set of attributes is the Q variable, the remaining discrete state variables not included in the decision tree, and the action A. This transformed data set can be called the qualified data set.

4. Format the qualified data set in such a way that the attributes follow a temporal causal ordering. For example, variable Qt must be set before Qt+1, X1t before X1t+1, and so on. The whole set of attributes should be the variable Q at time t, the remaining system variables at time t, the variable Q at time t+1, the remaining system variables at time t+1, and the action A.

5. Prepare the data for the induction of a 2-stage dynamic Bayesian network. According to the dimension of the action space, split the qualified data set into |A| sets of samples, one for each action.

6. Induce the transition model for each action, Aj, using the K2 algorithm (Cooper and Herskovits, 1992).
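The qualification step (mapping each sampled continuous point to its q-state before learning the transition model, steps 2-3 above) can be sketched as follows. The data layout and helper names are our assumptions, not the paper's code:

```python
def q_of(point, qstates):
    """Return the index of the qualitative state whose interval
    constraints (dim -> (low, high)) contain the continuous point."""
    for i, bounds in enumerate(qstates):
        if all(lo <= point[d] < hi for d, (lo, hi) in bounds.items()):
            return i
    raise ValueError('point outside every qualitative state')

def qualify(samples, qstates):
    """samples: list of (x_t, action, x_t1) transitions over continuous
    points; returns (Q_t, A, Q_t+1) triples in temporal causal order."""
    return [(q_of(x_t, qstates), a, q_of(x_t1, qstates))
            for x_t, a, x_t1 in samples]
```

The resulting triples, split by action, are what a DBN structure learner such as K2 would be run on.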

This initial model represents a high-level abstraction of the continuous state space and can be solved using a standard solution technique, such as value iteration, to obtain the optimal policy. This approach has been successfully applied in some domains. However, in some cases, our abstraction can miss some relevant details of the domain and consequently produce sub-optimal policies. We improve this initial partition through a refinement stage described in the next section.

4 Qualitative State Refinement

We have designed a value-based algorithm that recursively selects and partitions abstract states with high utility variance. Given an initial partition and a solution for the qualitative MDP, the algorithm proceeds as follows.

While there is an unmarked partition greater than the minimum size:

1. Select an unmarked partition (state) with the highest variance in its utility value with respect to its neighbours.

2. Select the dimension (state variable) with the highest difference in utility value with respect to its contiguous states.

3. Bisect the dimension (divide the state in two equal-size parts).

4. Solve the new MDP

5. If the new MDP has the same policy as before, mark the original state before the partition and return to the previous MDP; otherwise, accept the refinement and continue.
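For a single continuous dimension, the bisect-and-test loop above can be sketched as follows. Here `solve` stands in for re-solving the MDP, and the highest-variance selection is simplified to a first-unmarked scan, so this illustrates only the control flow, not the paper's implementation:

```python
def refine(states, solve, min_size):
    """states: list of 1-D intervals (low, high); solve: states -> list of
    actions (one per state). Bisects states until no allowed split
    changes the policy."""
    policy = solve(states)
    marked = set()
    while True:
        candidates = [i for i, (lo, hi) in enumerate(states)
                      if i not in marked and hi - lo > min_size]
        if not candidates:
            return states, policy
        i = candidates[0]                  # stand-in for max-variance choice
        lo, hi = states[i]
        mid = (lo + hi) / 2.0              # bisect the dimension
        split = states[:i] + [(lo, mid), (mid, hi)] + states[i + 1:]
        new_policy = solve(split)
        if new_policy[i] == new_policy[i + 1] == policy[i]:
            marked.add(i)                  # split changed nothing: revert
        else:
            states, policy = split, new_policy
            marked = set()                 # indices shifted; rescan from scratch
```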

The minimum size of a state is defined by the user and is domain dependent. For example, in the robot navigation domain, the minimum size depends on the smallest goal (area with a certain reward) or on the robot step size. A graphical representation of this process is shown in Figure 3. Figure 3 (a) shows an initial partition for two qualitative variables, where each qualitative state is a set of ground states with similar reward. Figure 3 (b) shows the refined two-dimensional state space after applying the state-splitting algorithm.

Figure 3: Qualitative refinement for a two-dimensional state space. a) Initial partition by reward. b) Refined state space.

5 Experimental Results

We tested our approach in a robot navigation domain using a simulated environment. In this setting, goals are represented as light-colored square regions with positive immediate reward, while non-desirable regions are represented by



Figure 4: Abstraction and refinement process. Upper left: reward regions. Upper right: exploration process. Lower left: initial qualitative states and their corresponding policies, where u=up, d=down, r=right, and l=left. Lower right: refined partition.

dark-colored squares with negative reward. The remaining regions in the navigation area receive 0 reward (white). Experimentally, we express the size of a rewarded region (non-zero reward) as a function of the navigation area. Rewarded regions are multivalued and can be distributed randomly over the navigation area. The number of rewarded squares is also variable. Since obstacles are not considered robot states, they are not included.

The robot sensor system included the x-y position, angular orientation, and navigation-bounds detection. In a set of experiments the possible actions are discrete orthogonal movements to the right, left, up, and down. Figure 4 (upper left) shows an example of a navigation problem with 26 rewarded regions. The reward function can have six possible values. In this example, goals are represented as different-scale light colors. Similarly, negative rewards are represented with different-scale dark colors. The planning problem is to automatically obtain an optimal policy for the robot to achieve its goals while avoiding negatively rewarded regions.

The qualitative representation and refinement were tested with several problems of different sizes and complexities, and compared to a fine discretization of the environment in terms of precision and complexity. The precision is evaluated by comparing the policies and values per state. The policy precision is obtained by comparing the policies generated with respect to the policy obtained from a fine discretization. That is, we count the number of fine cells in which the policies are the same:

PP = (NEC/NTC)× 100,

where PP is the policy precision in percentage, NEC is the number of fine cells with the same policy, and NTC is the total number of fine cells. This measure is pessimistic because in some states it is possible for more than one action to have the same or similar value, and in this measure only one is considered correct.
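The PP measure is straightforward to compute; the mapping-based encoding below (fine cells keyed to actions) is our own choice:

```python
def policy_precision(fine_policy, abstract_policy):
    """PP = (NEC / NTC) * 100, where both arguments map each fine
    cell to the action chosen there."""
    nec = sum(1 for cell, action in fine_policy.items()
              if abstract_policy[cell] == action)
    return 100.0 * nec / len(fine_policy)
```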

The utility error is calculated as follows. The utility values of all the states in each representation are first normalized. The sum of the absolute differences of the utility values of the corresponding states is then evaluated and averaged over all the differences.

Table 1: Description of problems and comparison between a “normal” discretization and our qualitative discretization. For each problem: number of reward cells, reward size (% of dimension), number of reward values, and number of samples; then, for the discrete and the qualitative models: number of states, learning time (ms), number of iterations, and inference time (ms).

id  cells  size(%)  values  samples  |  D-states  D-learn  D-iter  D-infer  |  Q-states  Q-learn  Q-iter  Q-infer
 1      2     20        3    40,000  |        25    7,671     120       20  |         8    2,634     120       20
 2      4     20        5    40,000  |        25    1,763     123       20  |        13    2,423     122       20
 3     10     10        3    40,000  |       100    4,026     120       80  |        26    2,503     120       20
 4      6      5        3    40,000  |       400    5,418     120    1,602  |        24    4,527     120       40
 5     10      5        5    28,868  |       400    3,595     128    2,774  |        29    2,203     127       60
 6     12      5        4    29,250  |       400    7,351     124    7,921  |        46    2,163     124       30
 7     14    3.3        9    50,000  |       900    9,223     117   16,784  |        60    4,296     117      241
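The utility-error computation can be sketched as follows; this is our interpretation of the text (min-max normalization per representation, then the mean absolute difference over corresponding states), and the data layout is ours:

```python
def utility_error(u_a, u_b):
    """Mean absolute difference between normalized utility values of
    corresponding states; u_a and u_b map state -> utility (assumed
    non-constant, so the min-max range is nonzero)."""
    def normalize(u):
        lo, hi = min(u.values()), max(u.values())
        return {s: (v - lo) / (hi - lo) for s, v in u.items()}
    na, nb = normalize(u_a), normalize(u_b)
    return sum(abs(na[s] - nb[s]) for s in na) / len(na)
```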

Figure 4 shows the abstraction and refinement process for the motion planning problem presented above. A color inversion and grayscale format is used for clarity. The upper left figure shows the rewarded regions. The upper right figure illustrates the exploration process. The lower left figure shows the initial qualitative states and their corresponding policies. The lower right figure shows the refined partition.

Table 1 presents a comparison between the behavior of seven problems solved with a simple discretization approach and with our qualitative approach. Problems are identified with a number, as shown in the first column. The first five columns describe the characteristics of each problem. For example, problem 1 (first row) has 2 reward cells with values different from zero that occupy 20% of the number of cells, the number of different reward values is 3 (e.g., -10, 0 and 10), and we generated 40,000 samples to build the MDP model.

Table 2 presents a comparison between the qualitative and the refined representations. The first three columns describe the characteristics of the qualitative model and the following ones describe the characteristics with our refinement process. They are compared in terms of utility error in %, policy precision also in %, and the time spent in the refinement in minutes.

As can be seen from Table 1, there is a significant reduction in the complexity of the problems using our abstraction approach. This can be clearly appreciated from the number of states and processing time required to solve the problems. This is important since in complex domains, where it can be difficult to define an adequate abstraction or to solve the resulting MDP problem, one option is to create abstractions and hope for suboptimal policies. To evaluate the quality of the results, Table 2 shows that the proposed abstraction produces on average only 9.17% error in the utility value when compared against the values obtained from the discretized problem. Finally, since an initial abstraction can miss some relevant aspects of the domain, a refinement process may be necessary to obtain more adequate policies. Our refinement process is able to maintain or improve the utility values in all cases. The percentage of policy precision with respect to the initial qualitative model can sometimes decrease with the refinement process. This is due to our pessimistic measure.

Table 2: Comparative results between the initial abstraction and the proposed refinement process (Q = initial qualitative abstraction; R = after refinement).

id   Q-util. err (%)   Q-policy prec. (%)   R-util. err (%)   R-policy prec. (%)   refin. time (min)
 1        7.38               80                  7.38               80                  0.33
 2        9.03               64                  9.03               64                  0.247
 3       10.68               64                  9.45               74                  6.3
 4       12.65               52                  8.82               54.5                6.13
 5        7.13               35                  5.79               36                  1.23
 6       11.56               47.2               11.32               46.72              10.31
 7        5.78               44.78               5.45               43.89              23.34



6 Conclusions and Future Work

In this paper, a new approach for solving continuous and hybrid infinite-horizon MDPs is described. In the first phase we use an exploration strategy of the environment and a machine learning approach to induce an initial state abstraction. We then follow a refinement process to improve on the utility value. Our approach creates significant reductions in space and time, allowing relatively large problems to be solved quickly. The utility values on our abstracted representation are reasonably close (less than 10% error) to those obtained using a fine discretization of the domain. A new refinement process to improve the results of the initially proposed abstraction is also presented. It always improves or at least maintains the utility values obtained from the qualitative abstraction. Although tested on small solvable problems for comparison purposes, the approach can be applied to more complex domains where a simple discretization approach is not feasible.

As future research work, we would like to improve our refinement strategy to select a better segmentation of the abstract states and to use an alternative search strategy. We also plan to test our approach in other domains.

Acknowledgments

This work was supported in part by the Instituto de Investigaciones Eléctricas, Mexico, and CONACYT Project No. 47968.

References

R. E. Bellman. 1957. Dynamic Programming. Princeton University Press, Princeton, N.J.

D. P. Bertsekas and J. N. Tsitsiklis. 1996. Neuro-Dynamic Programming. Athena Scientific.

D. P. Bertsekas. 1995. A counter-example to temporal difference learning. Neural Computation, 7:270–279.

B. Bonet and J. Pearl. 2002. Qualitative MDPs and POMDPs: An order-of-magnitude approach. In Proceedings of the 18th Conf. on Uncertainty in AI (UAI-02), pages 61–68, Edmonton, Canada.

C. Boutilier, T. Dean, and S. Hanks. 1999. Decision-theoretic planning: structural assumptions and computational leverage. Journal of AI Research, 11:1–94.

G. F. Cooper and E. Herskovits. 1992. A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9:309–347.

T. Dean and R. Givan. 1997. Model minimization in Markov Decision Processes. In Proc. of the 14th National Conf. on AI, pages 106–111. AAAI.

T. Dean and K. Kanazawa. 1989. A model for reasoning about persistence and causation. Computational Intelligence, 5:142–150.

Z. Feng, R. Dearden, N. Meuleau, and R. Washington. 2004. Dynamic programming for structured continuous Markov decision problems. In Proc. of the 20th Conf. on Uncertainty in AI (UAI-2004), Banff, Canada.

C. Guestrin, M. Hauskrecht, and B. Kveton. 2004. Solving factored MDPs with continuous and discrete variables. In Twentieth Conference on Uncertainty in Artificial Intelligence (UAI 2004), Banff, Canada.

M. Hauskrecht and B. Kveton. 2003. Linear program approximation for factored continuous-state Markov decision processes. In Advances in Neural Information Processing Systems (NIPS 03), pages 895–902.

B. Kuipers. 1986. Qualitative simulation. Artificial Intelligence, 29:289–338.

M. L. Puterman. 1994. Markov Decision Processes. Wiley, New York.

J. R. Quinlan. 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco, Calif., USA.

270 A. Reyes, P. Ibargüengoytia, L. E. Sucar, and E. Morales


Bayesian Model Averaging of TAN Models for Clustering

Guzmán Santafé, Jose A. Lozano, and Pedro Larrañaga
Intelligent Systems Group, Computer Science and A. I. Dept.

University of the Basque Country, Spain

Abstract

Selecting a single model for clustering ignores the uncertainty left by finite data as to which is the correct model to describe the dataset. In fact, the fewer samples the dataset has, the higher the uncertainty in model selection. In these cases, a Bayesian approach may be beneficial, but unfortunately it is usually computationally intractable and only approximations are feasible. For supervised classification problems, it has been demonstrated that model averaging calculations are, under some restrictions, feasible and efficient. In this paper, we extend the expectation model averaging (EMA) algorithm, originally proposed in Santafé et al. (2006) for model averaging of naive Bayes models in clustering, to a broader model class. The extended algorithm, EMA-TAN, performs an efficient approximation of model averaging over the class of tree augmented naive Bayes (TAN) models for clustering. We also present empirical results that show how the EMA algorithm based on TAN outperforms other clustering methods.

1 Introduction

Unsupervised classification, or clustering, is the process of grouping similar objects or data samples together into natural groups called clusters. This process generates a partition of the objects to be classified. Bayesian networks (Jensen, 2001) are powerful probabilistic graphical models that can be used for clustering purposes. Naive Bayes is the simplest Bayesian network model (Duda and Hart, 1973). It assumes that the predictive variables in the model are independent given the value of the cluster variable. Despite being a very simple model, it has been successfully used in clustering problems (Cheeseman and Stutz, 1996). Other models have been proposed in the literature in order to relax the heavy independence assumptions that the naive Bayes model makes. For example, tree augmented naive Bayes (TAN) models (Friedman et al., 1997) allow the predictive variables to form a tree, while the class variable remains a parent of each predictive variable. On the other hand, more sophisticated methods have been proposed to encode context-specific (in)dependencies in clustering problems, for instance naive Bayes models (Barash and Friedman, 2002) or recursive Bayesian multinets (Pena et al., 2002), which are trees that incorporate Bayesian multinets in the leaves.

The process of selecting a single model for clustering ignores model uncertainty and can lead to the selection of a model that does not properly describe the data. In fact, the fewer samples the dataset has, the higher the uncertainty in model selection. In these cases, a Bayesian approach may be beneficial. The Bayesian approach proposes averaging over all models weighted by their posterior probability given the data (Madigan and Raftery, 1994; Hoeting et al., 1999). Bayesian model averaging has been seen by some authors as a model combination technique (Domingos, 2000; Clarke, 2003), and it does not always perform as successfully as expected. However, other authors state that Bayesian model averaging is not exactly a model ensemble technique but a method for 'soft model selection' (Minka, 2002).

Although the model averaging approach is normally preferred when there are few data samples, because it deals with uncertainty in model selection, it is usually intractable and only approximations are feasible. Typically, Bayesian model averaging for clustering is approximated by averaging over some of the models with the highest posterior probabilities (Friedman, 1998). However, efficient calculation of model averaging for supervised classification models under some constraints has been proposed in the literature (Dash and Cooper, 2004; Cerquides and Lopez de Mantaras, 2005). Some of these proposals have also been extended to clustering problems. For example, Santafé et al. (2006) extend the calculations for naive Bayes models to approximate a Bayesian model averaging for clustering. In that paper, the authors introduce the expectation model averaging (EMA) algorithm, a variant of the well-known EM algorithm (Dempster et al., 1977), which deals with the latent cluster variable and thereby approximates a Bayesian model averaging of naive Bayes models for clustering.

In this paper we use the structural feature estimation proposed by Friedman and Koller (2003) and the model averaging calculations for supervised classifiers presented in Dash and Cooper (2004) in order to extend the EMA algorithm to TAN models (the EMA-TAN algorithm). In other words, we propose a method to obtain a single Bayesian network model for clustering which approximates a Bayesian model averaging over all possible TAN models. This is possible by setting an ancestral order among the predictive variables. The result of the Bayesian model averaging over TAN models is a single Bayesian network model for clustering (Dash and Cooper, 2004).

The rest of the paper is organized as follows. Section 2 introduces the notation that is used throughout the paper as well as the assumptions that we make. Section 3 describes the EMA-TAN algorithm to approximate model averaging over the class of TAN models. Section 4 shows some experimental results with synthetic data that illustrate the behavior of the algorithm. Finally, Section 5 presents the conclusions of the paper and future work.

2 Notation and Assumptions

In an unsupervised learning problem there is a set of predictive variables, $X_1, \ldots, X_n$, and the latent cluster variable, $C$. The dataset $D = \{x^{(1)}, \ldots, x^{(N)}\}$ contains data samples $x^{(l)} = (x_1^{(l)}, \ldots, x_n^{(l)})$, with $l = 1, \ldots, N$.

We define a Bayesian network model as $B = \langle S, \theta \rangle$, where $S$ describes the structure of the model and $\theta$ its parameter set. Using the classical notation in Bayesian networks, $\theta_{ijk}$ (with $k = 1, \ldots, r_i$ and $r_i$ being the number of states of variable $X_i$) represents the conditional probability of variable $X_i$ taking its $k$-th value given that its parent set, $Pa_i$, takes its $j$-th configuration. The conditional probability mass function for $X_i$ given the $j$-th configuration of its parents is designated $\theta_{ij}$, with $j = 1, \ldots, q_i$, where $q_i$ is the number of different states of $Pa_i$. Finally, $\theta_i = (\theta_{i1}, \ldots, \theta_{iq_i})$ denotes the set of parameters for variable $X_i$, and therefore $\theta = (\theta_C, \theta_1, \ldots, \theta_n)$, where $\theta_C = (\theta_{C1}, \ldots, \theta_{Cr_C})$ is the set of parameters for the cluster variable, with $r_C$ the number of clusters, fixed in advance.

In order to use the decomposition proposed in Friedman and Koller (2003) for an efficient model averaging calculation, we need to consider an ancestral order $\pi$ over the predictive variables.

Definition 1. Class of TAN models ($\mathcal{L}^{\pi}_{TAN}$): given an ancestral order $\pi$, a model $B$ belongs to $\mathcal{L}^{\pi}_{TAN}$ if each predictive variable has up to two parents (another predictive variable and the cluster variable) and the arcs between variables that are defined in the structure $S$ are directed down levels: $X_j \to X_i \in S \Rightarrow level_{\pi}(X_i) < level_{\pi}(X_j)$.

Note that the classical conception of a TAN model allows the predictive variables to form a tree, with the cluster variable set as a parent of each predictive variable. However, we do not restrict $\mathcal{L}^{\pi}_{TAN}$ to only these tree models: we also allow the predictive variables to form a forest, and the class variable may or may not be a parent of each predictive variable. Therefore $\mathcal{L}^{\pi}_{TAN}$ also includes, among others, naive Bayes and selective naive Bayes models.

For a given ordering $\pi$ and a particular variable $X_i$, we can enumerate all the possible parent sets for $X_i$ in the class of TAN models $\mathcal{L}^{\pi}_{TAN}$. In order to clarify calculations, we superscript with $v$ any quantity related to a predictive variable $X_i$, and thus we are able to identify the parent set of variable $X_i$ that we are taking into consideration. For example, for a given order $\pi = \langle X_1, X_2, X_3 \rangle$, the possible sets of parents for $X_3$ in $\mathcal{L}^{\pi}_{TAN}$ are: $Pa_3^1 = \emptyset$, $Pa_3^2 = \{X_1\}$, $Pa_3^3 = \{X_2\}$, $Pa_3^4 = \{C, X_1\}$, $Pa_3^5 = \{C, X_2\}$, $Pa_3^6 = \{C\}$, with, in this case, $v = 1, \ldots, 6$. In general, we consider, without loss of generality, $\pi = \langle X_1, \ldots, X_n \rangle$ and therefore, for a variable $X_i$, $v = 1, \ldots, 2i$. Moreover, we use $i$ to index any quantity related to the $i$-th predictive variable, with $i = 1, \ldots, n$.
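The enumeration of the $2i$ candidate parent sets can be sketched in a few lines of Python (a hypothetical helper, not from the paper; names are ours):

```python
def tan_parent_sets(i, order):
    """Candidate parent sets Pa_i^v for the i-th predictive variable
    (1-indexed) under ancestral order `order`: at most one earlier
    predictive variable, optionally together with the cluster variable C."""
    earlier = order[:i - 1]                         # predictive variables before X_i
    bases = [frozenset()] + [frozenset({x}) for x in earlier]
    return bases + [b | {"C"} for b in bases]       # 2 * i sets in total

# For the order <X1, X2, X3> of the running example, X3 has 6 candidates:
sets = tan_parent_sets(3, ["X1", "X2", "X3"])
print(len(sets))  # 6
```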

Additionally, the following five assumptions are needed to perform an efficient approximation of model averaging over $\mathcal{L}^{\pi}_{TAN}$:

Multinomial variables: Each variable $X_i$ is discrete and can take $r_i$ states. The cluster variable is also discrete and can take $r_C$ possible states, $r_C$ being the number of clusters fixed in advance.

Complete dataset: We assume that there are no missing values for the predictive variables in the dataset. However, the cluster variable is latent; therefore, its values are always missing.

Dirichlet priors: The parameters of every model are assumed to follow a Dirichlet distribution. Thus, $\alpha_{ijk}$ is the Dirichlet hyperparameter for parameter $\theta_{ijk}$ of the network, and $\alpha_{Cj}$ is the hyperparameter for $\theta_{Cj}$. In fact, as we have to take into consideration each possible model in $\mathcal{L}^{\pi}_{TAN}$, the parameters of the models can be denoted $\theta^v_{ijk}$. Hence, we assume the existence of hyperparameters $\alpha^v_{ijk}$.

Parameter independence: The probability of having the set of parameters $\theta$ for a given structure $S$ can be factorized as follows:

$$p(\theta|S) = p(\theta_C) \prod_{i=1}^{n} \prod_{j=1}^{q_i} p(\theta_{ij}|S) \qquad (1)$$

Structure modularity: The prior probability $p(S)$ can be decomposed in terms of each variable and its parents:

$$p(S) \propto p_S(C) \prod_{i=1}^{n} p_S(X_i, Pa_i) \qquad (2)$$

where $p_S(X_i, Pa_i)$ is the information contributed by variable $X_i$ to $p(S)$, and $p_S(C)$ is the information contributed by the cluster variable.

Parameter independence assumes that the prior on the parameters $\theta_{ijk}$ of a variable $X_i$ depends only on local structures. This is known as parameter modularity (Heckerman et al., 1995). Therefore, we can state that for any two network structures $S_1$ and $S_2$, if $X_i$ has the same parent set in both structures then $p(\theta_{ijk}|S_1) = p(\theta_{ijk}|S_2)$. As a consequence, the parameter calculations for a variable $X_i$ are the same in every model whose structure assigns $X_i$ the same parent set.

3 The EMA-TAN Algorithm

The EMA algorithm was originally introduced in Santafé et al. (2006) for dealing with model averaging calculations of naive Bayes models. In this section we present the extension of this algorithm (EMA-TAN) in order to average over TAN models.

Theorem 1 (Dash and Cooper, 2004). For supervised classification problems, there exists a single model $\bar{B} = \langle \bar{S}, \bar{\theta} \rangle$ which defines a joint probability distribution $p(c, x|\bar{B})$ equivalent to the joint probability distribution produced by model averaging over all TAN models. This model $\bar{B}$ is a complete Bayesian network where the structure $\bar{S}$ defines the relationships between variables in such a way that, for a variable $X_i$, the parent set of $X_i$ in the model $\bar{B}$ is $\bar{Pa}_i = \bigcup_{v=1}^{2i} Pa_i^v$.

This result can be extended to clustering problems by means of the EMA-TAN algorithm. However, the latent cluster variable prevents the exact calculation of the model averaging, and therefore the model $\bar{B}$ obtained by the EMA-TAN algorithm is not an exact model averaging over $\mathcal{L}^{\pi}_{TAN}$ but an approximation. The EMA-TAN algorithm thus provides a powerful tool for learning a single unsupervised Bayesian network model which approximates Bayesian model averaging over $\mathcal{L}^{\pi}_{TAN}$.

The unsupervised classifier is obtained by learning the predictive probability, $p(c, x|D)$, averaged over the maximum a posteriori (MAP) parameter configurations of all the models in $\mathcal{L}^{\pi}_{TAN}$. Then, we can obtain the unsupervised classifier by using the conditional probability of the cluster variable given by Bayes' rule.

The EMA-TAN algorithm is an adaptation of the well-known EM algorithm. It uses the E step of the EM algorithm to deal with the missing values of the cluster variable. Then, it performs a model averaging step (MA) to obtain $p(c, x|D)$ and thus the unsupervised Bayesian network model for clustering.

The EMA-TAN, like the EM algorithm, is an iterative process in which the two steps of the algorithm are repeated successively until a stopping criterion is met. At the $t$-th iteration of the algorithm, a set of parameters, $\theta^{(t)}$, for the Bayesian network model $B^{(t)}$ is calculated. Note that, although we differentiate between the Bayesian network models across the iterations of the EMA-TAN algorithm, the structure of the model, $S$, is constant and only the estimated parameter set changes. The algorithm stops when the difference between the sets of parameters learned in two consecutive iterations, $\theta^{(t)}$ and $\theta^{(t+1)}$, is less than a threshold $\varepsilon$, which is fixed in advance.

In order to use the EMA-TAN algorithm, we need to set an initial parameter configuration, $\theta^{(0)}$, and the value of $\varepsilon$. The values for $\theta^{(0)}$ are usually taken at random and $\varepsilon$ is set to a small value. Note that, even though the obtained model is a single unsupervised Bayesian network, its parameters are learned taking into account the MAP parameter configuration of every model in $\mathcal{L}^{\pi}_{TAN}$. Thus, the resulting unsupervised Bayesian network incorporates into its parameters information about the (in)dependencies between variables described by the different TAN models.
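The stopping rule can be sketched as follows (a sketch only; the paper does not specify the norm used to compare consecutive parameter sets, so the max-norm here is our assumption):

```python
def converged(theta_old, theta_new, eps=1e-4):
    """True when two consecutive parameter vectors differ by less than
    the threshold eps (max-norm comparison is our assumption)."""
    return max(abs(a - b) for a, b in zip(theta_old, theta_new)) < eps

print(converged([0.5, 0.5], [0.50005, 0.49995]))  # True
```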

3.1 E Step (Expectation)

Intuitively, we can see this step as a completion of the values of the cluster variable, which are missing. Actually, this step computes the expected sufficient statistics in the dataset for variable $X_i$ in every model in $\mathcal{L}^{\pi}_{TAN}$ given the current model, $B^{(t)}$. These expected sufficient statistics are used in the next step of the algorithm, MA, as if they were actual sufficient statistics from a complete dataset. From now on, $D^{(t)}$ denotes the dataset after the E step at the $t$-th iteration of the algorithm.

Note that, due to parameter modularity, we do not actually need to calculate the expected sufficient statistics for all the models in $\mathcal{L}^{\pi}_{TAN}$, because some of these models share the same values of the expected sufficient statistics. Hence, it is only necessary to calculate the expected sufficient statistics for the distinct parent sets. They can be obtained as follows:

$$E(N^v_{ijk}|B^{(t)}) = \sum_{l=1}^{N} p(x_i^k, Pa_i^v = j \mid x^{(l)}, B^{(t)}) \qquad (3)$$

where $x_i^k$ represents the $k$-th value of the $i$-th variable. The expected sufficient statistic $E(N^v_{ijk}|B^{(t)})$ denotes, at iteration $t$, the expected number of cases in the dataset $D$ where variable $X_i$ takes its $k$-th value and the $v$-th parent set of $X_i$ takes its $j$-th configuration.

Similarly, we can obtain the expected sufficient statistics for the cluster variable. This is a special case since, for any model in $\mathcal{L}^{\pi}_{TAN}$, the parent set of $C$ is the same (the cluster variable does not have any parent). Therefore, we omit the superscript $v$ in those quantities related only to $C$:

$$E(N_{Cj}|B^{(t)}) = \sum_{l=1}^{N} p(C = j \mid x^{(l)}, B^{(t)}) \qquad (4)$$

Note that some of the expected sufficient statistics $E(N^v_{ijk}|B^{(t)})$ do not depend on the value of $C$. Therefore, these values are constant throughout the iterations of the algorithm, and it is necessary to calculate them only once.
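As an illustration, the expected counts of Equation 4 are simply posterior responsibilities summed over the sample. A sketch (our naming; `posterior` is a hypothetical callable returning the vector $p(C \mid x^{(l)}, B^{(t)})$ under the current model):

```python
def expected_cluster_counts(posterior, data, r_C):
    """E(N_Cj | B^(t)) of Equation 4: for each cluster j, sum over the
    samples of the posterior p(C = j | x^(l)) under the current model."""
    counts = [0.0] * r_C
    for x in data:
        p = posterior(x)              # length-r_C probability vector
        for j in range(r_C):
            counts[j] += p[j]
    return counts

# Toy check: with a uniform posterior over 2 clusters and 4 samples,
# each cluster receives an expected count of 2.
print(expected_cluster_counts(lambda x: [0.5, 0.5], range(4), 2))  # [2.0, 2.0]
```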

3.2 MA Step (Model Averaging)

In this second step, the EMA algorithm performs the model averaging calculations which yield a single Bayesian network model with parameters $\theta^{(t+1)}$. These parameters are obtained by calculating $p(c, x|D^{(t)})$ as an average over the MAP configurations of the models in $\mathcal{L}^{\pi}_{TAN}$.

In order to make the calculations clearer, we first show how we can obtain $p(c, x|S, D^{(t)})$ for a fixed structure $S$:

$$p(c, x|S, D^{(t)}) = \int p(c, x|S, \theta)\, p(\theta|S, D^{(t)})\, d\theta \qquad (5)$$

The exact computation of the integral in Equation 5 is intractable for clustering problems; therefore, an approximation is needed (Heckerman et al., 1995). However, assuming parameter independence and Dirichlet priors, and given that the expected sufficient statistics calculated in the previous E step can be used as an approximation to the actual sufficient statistics of the complete dataset, we can approximate $p(c, x|S, D^{(t)})$ by the MAP parameter configuration. This is the parameter configuration that maximizes $p(\theta|S, D^{(t)})$, and it can be described in terms of the expected sufficient statistics and the Dirichlet hyperparameters (Heckerman et al., 1995; Cooper and Herskovits, 1992). Therefore, Equation 5 becomes:

$$p(c, x|S, D^{(t)}) \approx \frac{\alpha_{Cj} + E(N_{Cj}|B^{(t)})}{\alpha_C + E(N_C|B^{(t)})} \cdot \prod_{i=1}^{n} \frac{\alpha^{\mu_i}_{ijk} + E(N^{\mu_i}_{ijk}|B^{(t)})}{\alpha^{\mu_i}_{ij} + E(N^{\mu_i}_{ij}|B^{(t)})} = \hat{\theta}_{Cj} \prod_{i=1}^{n} \hat{\theta}^{\mu_i}_{ijk} \qquad (6)$$

where $\hat{\theta}^{\mu_i}_{ijk}$ is the MAP parameter configuration for $\theta^{\mu_i}_{ijk}$ ($\mu_i$ denotes the parent index that corresponds to the parent set of $X_i$ described in $S$), $\alpha^{\mu_i}_{ij} = \sum_{k=1}^{r_i} \alpha^{\mu_i}_{ijk}$, $E(N^{\mu_i}_{ij}|B^{(t)}) = \sum_{k=1}^{r_i} E(N^{\mu_i}_{ijk}|B^{(t)})$, and similarly for the values related to $C$.
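The MAP-style estimates of Equation 6 combine the Dirichlet hyperparameters with the expected counts; a minimal sketch (function and variable names are ours):

```python
def map_conditional(alphas, exp_counts):
    """One conditional distribution theta_ij of Equation 6: the k-th
    entry is (alpha_ijk + E[N_ijk]) / (alpha_ij + E[N_ij]),
    where alpha_ij and E[N_ij] are the sums over k."""
    denom = sum(alphas) + sum(exp_counts)
    return [(a + n) / denom for a, n in zip(alphas, exp_counts)]

theta = map_conditional([1.0, 1.0, 1.0], [4.2, 2.3, 1.5])
print(round(sum(theta), 10))  # 1.0 -- a proper conditional distribution
```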

Considering that the structure is not fixed a priori, we should average over all the model structures in $\mathcal{L}^{\pi}_{TAN}$ in the following way:

$$p(c, x|D^{(t)}) = \sum_{S} \int p(c, x|S, \theta)\, p(\theta|S, D^{(t)})\, d\theta \; p(S|D^{(t)}) \qquad (7)$$

Therefore, the model averaging calculations require a summation over $2^n n!$ terms, which are the models in $\mathcal{L}^{\pi}_{TAN}$. Using the previous calculations for a fixed structure, Equation 7 can be written as:

$$p(c, x|D^{(t)}) \approx \sum_{S} \hat{\theta}_{Cj} \prod_{i=1}^{n} \hat{\theta}^{\mu_i}_{ijk}\; p(S|D^{(t)}) \propto \sum_{S} \hat{\theta}_{Cj} \prod_{i=1}^{n} \hat{\theta}^{\mu_i}_{ijk}\; p(D^{(t)}|S)\, p(S) \qquad (8)$$

Given the assumption of Dirichlet priors and parameter independence, we can approximate $p(D^{(t)}|S)$ efficiently. In order to do so, we adapt the formula for the marginal likelihood with complete data (Cooper and Herskovits, 1992) to our problem with missing values. Thus, we have the following approximation to $p(D^{(t)}|S)$:

$$p(D^{(t)}|S) \approx \frac{\Gamma(\alpha_C)}{\Gamma(\alpha_C + E(N_C|B^{(t)}))} \prod_{j=1}^{r_C} \frac{\Gamma(\alpha_{Cj} + E(N_{Cj}|B^{(t)}))}{\Gamma(\alpha_{Cj})} \cdot \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{\Gamma(\alpha^{\mu_i}_{ij})}{\Gamma(\alpha^{\mu_i}_{ij} + E(N^{\mu_i}_{ij}|B^{(t)}))} \prod_{k=1}^{r_i} \frac{\Gamma(\alpha^{\mu_i}_{ijk} + E(N^{\mu_i}_{ijk}|B^{(t)}))}{\Gamma(\alpha^{\mu_i}_{ijk})}$$
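In practice, Gamma-ratio factors like the ones above are computed in log space with `lgamma` to avoid overflow. A sketch of one per-family factor (our naming, not the paper's code):

```python
from math import lgamma

def log_family_factor(alphas, exp_counts):
    """log of Gamma(a_ij)/Gamma(a_ij + E[N_ij]) *
    prod_k Gamma(a_ijk + E[N_ijk])/Gamma(a_ijk),
    i.e. one factor of the marginal-likelihood approximation."""
    a_ij, n_ij = sum(alphas), sum(exp_counts)
    score = lgamma(a_ij) - lgamma(a_ij + n_ij)
    for a, n in zip(alphas, exp_counts):
        score += lgamma(a + n) - lgamma(a)
    return score

print(log_family_factor([1.0, 1.0], [0.0, 0.0]))  # 0.0 -- no expected counts
```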

At this point, given the structure modularity assumption, we are able to approximate $p(c, x|D^{(t)})$ with the following expression:

$$p(c, x|D^{(t)}) \approx \kappa \sum_{S} \rho_{Cj} \prod_{i=1}^{n} \rho^{\mu_i}_{ijk} \qquad (9)$$

where $\kappa$ is a constant and $\rho_{Cj}$ and $\rho^{\mu_i}_{ijk}$ are defined in Equation 10.

$$\rho_{Cj} = \hat{\theta}_{Cj}\; p_S(C)\, \frac{\Gamma(\alpha_C)}{\Gamma(\alpha_C + E(N_C|B^{(t)}))} \prod_{j=1}^{r_C} \frac{\Gamma(\alpha_{Cj} + E(N_{Cj}|B^{(t)}))}{\Gamma(\alpha_{Cj})} \qquad (10)$$

$$\rho^{\mu_i}_{ijk} = \hat{\theta}^{\mu_i}_{ijk}\; p_S(X_i, Pa_i^{\mu_i}) \prod_{j=1}^{q_i} \frac{\Gamma(\alpha^{\mu_i}_{ij})}{\Gamma(\alpha^{\mu_i}_{ij} + E(N^{\mu_i}_{ij}|B^{(t)}))} \prod_{k=1}^{r_i} \frac{\Gamma(\alpha^{\mu_i}_{ijk} + E(N^{\mu_i}_{ijk}|B^{(t)}))}{\Gamma(\alpha^{\mu_i}_{ijk})}$$

Since we are assuming parameter independence, structure modularity and parameter modularity, we can apply the dynamic programming solution described in Friedman and Koller (2003) and Dash and Cooper (2004). Thus, Equation 9 can be written as follows:

$$p(c, x|D^{(t)}) \approx \lambda\; \rho_{Cj} \prod_{i=1}^{n} \sum_{v=1}^{2i} \rho^v_{ijk} \qquad (11)$$

with $\lambda$ being a constant. Note that the time complexity needed to calculate the averaging over $\mathcal{L}^{\pi}_{TAN}$ is the same as that needed to learn the MAP parameter configuration for $\bar{B}$.
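The reduction from $2^n n!$ summands to a product of per-variable sums in Equation 11 is an instance of the distributive law: summing the product of per-variable terms over all structures equals the product of per-variable sums. A small numerical check with toy random values standing in for the $\rho_i^v$ (not the paper's actual quantities):

```python
from itertools import product
from math import prod, isclose
import random

random.seed(0)
# rho[i] holds 2*(i+1) toy values standing in for rho_i^v
rho = [[random.random() for _ in range(2 * (i + 1))] for i in range(3)]

# Brute force: one term per structure, i.e. per choice of parent set per variable
brute = sum(prod(choice) for choice in product(*rho))

# Equation 11 form: product over variables of the sum over parent sets
factored = prod(sum(r) for r in rho)

assert isclose(brute, factored)
```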

We can see the similarity of Equation 11 to the factorization of a Bayesian network model. Indeed, the joint probability distribution of the approximated Bayesian model averaging over $\mathcal{L}^{\pi}_{TAN}$ for clustering is equivalent to that of a single Bayesian network model. Therefore, the parameters of the model for the next iteration of the algorithm can be calculated as follows:

$$\theta^{(t+1)}_{Cj} \propto \rho_{Cj}, \qquad \theta^{(t+1)}_{ijk} \propto \sum_{v=1}^{2i} \rho^v_{ijk} \qquad (12)$$

3.3 Multi-start EMA-TAN

The EMA-TAN is a greedy algorithm that is susceptible to being trapped in local optima. The results obtained by the algorithm depend on the random initialization of the parameters. Therefore, we propose the use of a multi-start scheme in which $m$ different runs of the algorithm with different random initializations are performed. In Santafé et al. (2006), different criteria for obtaining the final model from the multi-start process are proposed. In our case, we use the same criterion that the multi-start EM uses: the best model in terms of likelihood among the $m$ calculated models is selected as the final model. This is not a purely Bayesian approach to the model averaging process but, in practice, it works as well as other more complicated techniques.
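The multi-start selection rule can be sketched as follows (`run_ema_tan` is a hypothetical function performing one randomly initialized EMA-TAN run and returning a model together with its log-likelihood):

```python
def multi_start(run_ema_tan, data, m=100):
    """Run the algorithm m times from random initializations and keep
    the model with the highest likelihood, as described above."""
    best_model, best_ll = None, float("-inf")
    for _ in range(m):
        model, log_lik = run_ema_tan(data)   # one randomly initialized run
        if log_lik > best_ll:
            best_model, best_ll = model, log_lik
    return best_model

# Toy check with a stub whose log-likelihood peaks on the fourth run:
import itertools
runs = itertools.count()

def stub(data):
    i = next(runs)
    return f"model-{i}", -abs(i - 3)

print(multi_start(stub, None, m=6))  # model-3
```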

4 Evaluation in Clustering Problems

It is not easy to validate clustering algorithms, since clustering problems do not normally provide information about the true grouping of the data samples. In general, it is quite common to use synthetic data, because the true model that generated the dataset as well as the underlying clustering structure of the data are known. In order to illustrate the behavior of the EMA-TAN algorithm, we compare it with the classical EM algorithm and with the EMA algorithm (Santafé et al., 2006). For the EMA-TAN evaluation, we obtain random TAN models where the number of predictive variables varies in {2, 4, 8, 10, 12, 14}, each predictive variable can take up to three states, and the number of clusters is set to two. For each TAN configuration we generate 100 random models, and each of these models is sampled to obtain datasets of sizes 40, 80, 160, 320 and 640. In the experiments, we compare the multi-start EMA-TAN (also called EMA-TAN for convenience) with three different algorithms:

EM-TAN: a multi-start EM that learns a TAN model by using, at each step of the EM algorithm, the classical method proposed by Friedman et al. (1997), adapted to be used within the EM algorithm.

EM-BNET: a multi-start EM algorithm used to learn the MAP parameters of a complete Bayesian network for a given order $\pi$.

EMA: the multi-start model averaging of naive Bayes models for clustering.

Note that, for a given order $\pi$, the EMA-TAN and EM-BNET models share the same network structure, but their parameter sets are calculated in different ways. The number of multi-start iterations for both multi-start EM and multi-start EMA is $m = 100$.

Since the datasets are synthetically generated, we are able to know the real ordering among the variables. Nevertheless, we prefer to use a more general approach and assume that this order is unknown. Therefore, we use a random ordering among the predictive variables for each EMA-TAN model that we learn. As the EM-BNET algorithm also needs an ancestral order among the predictive variables, in the experiments we use the same random ordering for any pair of models (EMA-TAN vs. EM-BNET) that we compare.

Every model is used to cluster the dataset from which it has been learned. In the experiments, we compare EMA-TAN vs. EM-TAN, EMA-TAN vs. EM-BNET and EMA-TAN vs. EMA. For each test, the winning model is determined by comparing the data partitions obtained by both models with the true partition of the dataset: the model with the best estimated accuracy (percentage of correctly classified instances) is the winner. In Table 1, the results of the experiments with random TAN models are shown. For each model configuration, the table gives the number of wins/draws/losses of the EMA-TAN models with respect to the EM-TAN, EM-BNET or EMA models on the basis of the estimated accuracy of each model. We also provide information about a Wilcoxon signed-rank test¹ used to evaluate whether the accuracies estimated by two different models differ at the 1% and 10% levels. The results in Table 1 are written in bold if the test is passed at the 10% level and inside a gray box if the test is passed at the 1% level. It can be seen that, in general, the EMA-TAN models behave better than the others, and the differences between estimated accuracies are, in most cases, statistically significant.

We can see that the compared models obtain very similar results when they have few predictive variables. This is because the set of models that we average over to obtain the EMA-TAN model is very small. Therefore, it is quite possible that other algorithms, such as EM-TAN, select the correct model. Hence, in some experiments with the simplest models (those with 2 predictive variables), the EMA-TAN algorithm significantly lost against the other algorithms. However, when the number of predictive variables in the model increases and the dataset size is relatively big (the smallest datasets are not big enough for a reliable estimation of the parameters), the EMA-TAN considerably outperforms every other model in the test.

The experimental results of this section reinforce the idea that Bayesian model averaging outperforms other methods when the model that generated the data is included in the set of models that we average over. Since we average over a restricted class of models, this condition may not be fulfilled when applying the EMA-TAN to real problems. Due to lack of space, we do not include further experimentation in progress, but we are aware that it would be very interesting to check the performance of the EMA algorithm on other synthetic datasets generated by sampling naive Bayes and general Bayesian network models, and also on datasets from real problems.

5 Conclusions

We have shown that it is possible to obtain a single unsupervised Bayesian network that approximates model averaging over the class of TAN models. Furthermore, this approximation can be performed efficiently. This is possible by using the EMA-TAN algorithm. The EMA is an algorithm originally proposed in Santafé et al. (2006) for averaging over naive Bayes models in clustering problems. In this paper we extend the algorithm to deal with more complicated models such as TAN (the EMA-TAN algorithm). We also present an empirical evaluation, comparing the model averaging over TAN models with the model averaging over naive Bayes models and with the EM algorithm used to learn a single TAN model or a Bayesian network model. These experiments conclude that, at least when the model that generated the dataset is included in the set of models that we average over, the averaging over TAN models outperforms the other methods.

¹ Since we checked by means of a Kolmogorov-Smirnov test that not all the outcomes of the experiment can be considered to follow a normal distribution, we decided to use a Wilcoxon signed-rank test to compare the results.

Probably one of the limitations of the proposed algorithm is that, because the learned model which approximates model averaging is a complete Bayesian network, it is computationally hard to learn the model for problems with many predictive variables. In order to overcome this, we can restrict the final model to a maximum of $k$ possible parents for each predictive variable. Future work might include a more exhaustive empirical evaluation of the EMA-TAN algorithm and the application of the algorithm to real problems. Moreover, the algorithm can be extended to other Bayesian classifiers such as $k$-dependent naive Bayes (kDB), selective naive Bayes, etc. Another interesting line of future work may be the relaxation of the complete dataset assumption.

Acknowledgments

This work was supported by the SAIOTEK-Autoinmune (II) 2006 and Etortek research projects from the Basque Government, by the Spanish Ministerio de Educación y Ciencia under grant TIN 2005-03824, and by the Government of Navarre under a PhD grant.

References

Y. Barash and N. Friedman. 2002. Context-specific Bayesian clustering for gene expression data. Journal of Computational Biology, 9:169–191.


EMA-TAN vs. EM-TAN
#Var       40         80         160        320        640
  2    11/71/18    9/71/20   15/68/17   18/70/12   15/63/22
  4    37/24/39   33/17/50    50/8/42    48/5/47    46/3/51
  8     51/6/43    57/4/39    65/6/29    74/4/22    71/1/28
 10     48/5/47    59/4/37   55/10/35    64/6/30    67/4/29
 12     54/4/42    65/3/32    65/4/31    69/4/27    83/2/15
 14     59/8/33    68/6/26    75/2/23    76/4/20    76/6/18

EMA-TAN vs. EM-BNET
#Var       40         80         160        320        640
  2    24/51/25   29/35/36   33/34/33   28/35/37   41/29/30
  4    54/11/35    51/8/41    59/8/33    49/6/45    57/1/42
  8     59/8/33    64/5/31    81/1/18    83/0/17     91/0/9
 10     67/7/26    76/0/24    82/2/16    84/0/16     94/0/6
 12     66/5/29    86/1/13    80/0/20     91/0/9    89/0/11
 14     74/2/24    83/1/16     92/1/7     92/0/8     96/0/4

EMA-TAN vs. EMA
#Var       40         80         160        320        640
  2    21/53/26   24/41/35   24/48/28   23/45/32   25/45/30
  4    47/14/39    49/6/45   49/10/41    50/5/45    56/3/41
  8    52/10/38    58/4/38    80/4/16    90/0/10    89/0/11
 10    48/13/39    55/8/37    69/6/25    81/1/18    88/1/11
 12     55/8/37    78/6/16    79/1/20    88/2/10    88/0/12
 14    55/11/34    68/7/25    81/3/16    84/3/13     92/0/8

Table 1: Comparison (wins/draws/losses) between EMA-TAN models and the EM-TAN, EM-BNET and EMA models learned from datasets sampled from random TAN models.

J. Cerquides and R. Lopez de Mantaras. 2005. TAN classifiers based on decomposable distributions. Machine Learning, 59(3):323–354.

P. Cheeseman and J. Stutz. 1996. Bayesian classification (AutoClass): Theory and results. In Advances in Knowledge Discovery and Data Mining, pages 153–180. AAAI Press.

B. Clarke. 2003. Comparing Bayes model averaging and stacking when model approximation error cannot be ignored. Journal of Machine Learning Research, 4:683–712.

G. F. Cooper and E. Herskovits. 1992. A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9:309–347.

D. Dash and G. F. Cooper. 2004. Model averaging for prediction with discrete Bayesian networks. Journal of Machine Learning Research, 5:1177–1203.

A. P. Dempster, N. M. Laird, and D. B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society B, 39:1–38.

P. Domingos. 2000. Bayesian averaging of classifiers and the overfitting problem. In Proceedings of the Seventeenth International Conference on Machine Learning, pages 223–230.

R. Duda and P. Hart. 1973. Pattern Classification and Scene Analysis. John Wiley and Sons.

N. Friedman and D. Koller. 2003. Being Bayesian about network structure: A Bayesian approach to structure discovery in Bayesian networks. Machine Learning, 50:95–126.

N. Friedman, D. Geiger, and M. Goldszmidt. 1997. Bayesian network classifiers. Machine Learning, 29:131–163.

N. Friedman. 1998. The Bayesian structural EM algorithm. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, pages 129–138.

D. Heckerman, D. Geiger, and D. M. Chickering. 1995. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20:197–243.

J. Hoeting, D. Madigan, A. E. Raftery, and C. Volinsky. 1999. Bayesian model averaging. Statistical Science, 14:382–401.

F. V. Jensen. 2001. Bayesian Networks and Decision Graphs. Springer Verlag.

D. Madigan and A. E. Raftery. 1994. Model selection and accounting for model uncertainty in graphical models using Occam's window. Journal of the American Statistical Association, 89:1535–1546.

T. Minka. 2002. Bayesian model averaging is not model combination. MIT Media Lab note.

J. M. Peña, J. A. Lozano, and P. Larrañaga. 2002. Learning recursive Bayesian multinets for clustering by means of constructive induction. Machine Learning, 47(1):63–90.

G. Santafé, J. A. Lozano, and P. Larrañaga. 2006. Bayesian model averaging of naive Bayes for clustering. IEEE Transactions on Systems, Man, and Cybernetics–Part B. Accepted for publication.

278 G. Santafé, J. A. Lozano, and P. Larrañaga


Dynamic Weighting A* Search-based MAP Algorithm for Bayesian Networks

Xiaoxun Sun
Computer Science Department
University of Southern California
Los Angeles, CA
[email protected]

Marek J. Druzdzel
Decision Systems Laboratory
School of Information Sciences & Intelligent Systems Program
University of Pittsburgh
Pittsburgh, PA
[email protected]

Changhe Yuan
Department of Computer Science and Engineering
Mississippi State University
Mississippi State, MS
[email protected]

Abstract

In this paper we introduce the Dynamic Weighting A* (DWA*) search algorithm for solving MAP. By exploiting asymmetries in the distribution of the MAP variables, the algorithm can greatly reduce the search space and yield high-quality MAP solutions.

1 Introduction

The Maximum A-Posteriori assignment (MAP) is the problem of finding the most probable instantiation of a set of variables given partial evidence on the remaining variables in a Bayesian network. A special case of MAP that has received much attention is the Most Probable Explanation (MPE) problem: finding the most probable state of all unobserved variables given the evidence. MAP turns out to be a much harder problem than MPE. In particular, MPE is NP-complete, while the corresponding MAP problem is NPPP-complete (Park, 2002). MAP is more useful than MPE for providing explanations. For instance, in diagnosis we are generally interested only in the configuration of fault variables given some observations; there may be many other variables that have not been observed and are outside the scope of our interest.

In this paper, we introduce the Dynamic Weighting A* (DWA*) search algorithm for solving MAP, which is generally more efficient than existing algorithms. The algorithm exploits the asymmetries among all possible assignments in the joint probability distribution of the MAP variables. Typically, a small fraction of all possible assignments can be expected to cover a large portion of the total probability space, with the remaining assignments having practically negligible probability (Druzdzel, 1994). DWA* also uses dynamic weighting based on a greedy guess (Park and Darwiche, 2001; Yuan et al., 2004) as the heuristic function. While this heuristic is theoretically not admissible (an admissible heuristic should offer an upper bound on the MAP probability), it offers ε-admissibility and excellent performance in practice (Pearl, 1988).

2 MAP

The MAP problem is defined as follows. Let M be the set of MAP variables (we are interested in the most probable configuration of these variables); let E be the set of evidence variables, namely the variables whose states we have observed; the remaining variables, denoted S, are neither observed nor of interest to us. Given an assignment e of the variables E, the MAP problem is that of finding the assignment m of the variables M that maximizes the probability P(m | e); MPE is the special case of MAP in which S is empty.

map = max_M Σ_S P(M, S | E) .    (1)
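By definition (Equation 1), MAP maximizes over M after summing out S. A brute-force sketch on a tiny, made-up three-variable chain A → B → C makes the max-of-sums structure concrete; all CPT numbers and names below are invented for illustration, not taken from the paper.

```python
# Toy chain A -> B -> C with binary variables; numbers are illustrative only.
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}  # key: (b, a)
p_c_given_b = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.4, (1, 1): 0.6}  # key: (c, b)

def joint(a, b, c):
    # Joint probability from the chain-rule factorization of the toy network.
    return p_a[a] * p_b_given_a[(b, a)] * p_c_given_b[(c, b)]

def brute_force_map(evidence_c):
    """MAP over M = {A}, summing out S = {B}, given evidence C = evidence_c."""
    best_a, best_score = None, -1.0
    for a in (0, 1):                                   # max over M
        score = sum(joint(a, b, evidence_c) for b in (0, 1))  # sum over S
        if score > best_score:
            best_a, best_score = a, score
    return best_a, best_score

m, score = brute_force_map(evidence_c=1)
```

This enumeration is exponential in |M| and |S|; the point of DWA* is to avoid exactly this blow-up.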

In Bayesian networks, we use the Conditional Probability Table (CPT) φ as the potential over a variable and its parents. A potential over all the states of one variable after updating beliefs is called a marginal. The notation φe stands for the potential in which we have fixed the value of e ∈ E. The probability of MAP with Φ as its set of CPTs is then a real number:

map = max_M Σ_S Π_{φ∈Φ} φe .    (2)

In Equation 2, summation does not commute with maximization; therefore, summation must be performed before maximization. The resulting order is called the elimination order. The size of the largest clique minus 1 in a jointree constructed from an elimination order is called the induced width, and the induced width of the best elimination order is called the treewidth. However, for MAP problems in which neither S nor M is empty, the order is constrained, and the induced width of the best constrained elimination order is known as the constrained treewidth. Generally, the constrained treewidth is much larger than the treewidth, pushing the problem beyond the limits of feasibility.

Several researchers have proposed algorithms for solving MAP. A very efficient approximate algorithm based on local search, proposed by Park (2002), is capable of solving MAP efficiently. An exact method based on branch-and-bound depth-first search, proposed by Park and Darwiche (2003), performs quite well when the search space is not too large. Another approximate scheme, proposed by Yuan et al. (2004), is a reheated annealing MAP algorithm; it is somewhat slower on simple networks but capable of handling difficult cases that exact methods cannot tackle.

3 Solving MAP using Dynamic Weighting A* Search

We propose in this section an algorithm for solving MAP using Dynamic Weighting A* search, which incorporates dynamic weighting (Pearl, 1988) in the heuristic function, relevance reasoning (Druzdzel and Suermondt, 1994), and dynamic ordering in the search tree.

3.1 A* search

MAP can be solved by A* search in the probability tree composed of all the variables in the MAP set. The nodes of the search tree represent partial assignments of the MAP variables M; the root node represents an empty assignment. Each MAP variable is instantiated in a certain order: if a variable x in the MAP set M is instantiated at the ith place using its jth state, it is denoted Mij. Leaves of the search tree correspond to the last MAP variable instantiated. The vector of instantiated states of the MAP variables is called an assignment or a scenario. We compute the probability of assignments while searching the probability tree using the chain rule: at each inner node, the newly instantiated variable is added to the evidence set, i.e., the evidence set is extended to Mij ∪ E. The probability of an assignment of n MAP variables can then be computed as follows:

P(M | E) = P(Mni | M1j, M2k, . . . , M(n−1)t, E) · · · P(M2k | M1j, E) P(M1j | E) .

Suppose we are in the xth layer of the search tree, preparing to instantiate the xth MAP variable. Then the function above can be rewritten as follows:

P(M | E) = [ P(Mni | M1j, . . . , M(n−1)t, E) · · · P(M(x+1)z | Mxy, . . . , E) ]  (b)
           · [ P(Mxy | M1j, M2k, . . . , M(x−1)q, E) · · · P(M1j | E) ]          (a)    (3)

The general idea of DWA* is that in each inner node of the probability tree we can compute the value of term (a) exactly, and we can estimate a heuristic value for term (b), i.e., for the MAP variables that have not yet been instantiated, given the initial evidence set and the already instantiated MAP variables as additional evidence. To fit the typical format of the A* cost function, we take the logarithm of the equation above, which does not change its monotonicity. We then get f(n) = g(n) + h(n), where g(n) and h(n) are obtained from the logarithmic transformation of terms (a) and (b) respectively. g(n) gives the exact cost from the start node to a node in the nth layer of the search tree, and h(n) is the estimated cost of the best search path from the nth layer to the leaf nodes. To guarantee optimality of the solution, h(n) should be admissible, which in this case means that it should be an upper bound on the value of any assignment that extends the currently instantiated MAP variables.
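The search over the probability tree with f(n) = g(n) + h(n) can be sketched as a best-first expansion in log space. Everything below — the function names, the static variable ordering, and the toy probabilities — is an illustrative assumption, not the paper's implementation; the only structural claim is that with an admissible heuristic the first complete assignment popped is optimal.

```python
import heapq
import itertools
import math

def astar_map(map_vars, states, log_cond_prob, heuristic_log):
    """A*-style search over assignments of the MAP variables (a sketch).

    log_cond_prob(var, value, partial): log P(var=value | partial, E)
    heuristic_log(partial): optimistic log-estimate of the remaining factors (b)
    Returns (assignment, log_probability) of the best complete assignment.
    """
    counter = itertools.count()  # tie-breaker so dicts are never compared
    frontier = [(-heuristic_log({}), 0.0, next(counter), {})]
    while frontier:
        _, g, _, partial = heapq.heappop(frontier)
        if len(partial) == len(map_vars):
            return partial, g            # first complete node popped is best
        var = map_vars[len(partial)]     # static ordering in this sketch
        for value in states[var]:
            child = dict(partial, **{var: value})
            g2 = g + log_cond_prob(var, value, partial)  # exact part (a)
            f2 = g2 + heuristic_log(child)               # plus estimate (b)
            heapq.heappush(frontier, (-f2, g2, next(counter), child))

# Two independent binary MAP variables; h = 0 (a multiplier of 1) is an
# admissible upper bound in log space, so the search is exact here.
probs = {"x": [0.3, 0.7], "y": [0.4, 0.6]}
best, log_p = astar_map(["x", "y"], {v: [0, 1] for v in probs},
                        lambda v, s, _: math.log(probs[v][s]),
                        lambda _: 0.0)
```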

3.2 Heuristic Function with Dynamic Weighting

Definition 1. A heuristic function h2 is said to be more informed than h1 if both are admissible and h2 is closer to the optimal cost. For the MAP problem this means that the probability of the optimal assignment satisfies Popt < h2 < h1.

Theorem 1. If h2 is more informed than h1, then A*2 dominates A*1 (Pearl, 1988).

The power of the heuristic function is measured by the amount of pruning induced by h(n) and depends on the accuracy of this estimate. If h(n) estimates the completion cost precisely (h(n) = Popt), then A* will expand only the nodes on the optimal path. On the other hand, if no heuristic at all is used (for the MAP problem this amounts to h(n) = 1), a uniform-cost search ensues, which is far less efficient. It is therefore critical to find an admissible and tight h(n) to obtain both accurate and efficient solutions.

3.2.1 Greedy Guess

If each variable in the MAP set M is conditionally independent of all remaining MAP variables (this is called exhaustive independence), then the MAP problem amounts to a simple computation based on the greedy chain rule: we instantiate the MAP variable in the current search layer to the state with the largest probability, and repeat this for each of the remaining MAP variables one by one. The probability of MAP is then

P(M | E) = Π_{i=1}^{n} max_j P(Mij | M(i−1)k, . . . , M1m, E) .    (4)
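The greedy chain rule of Equation (4) can be sketched as follows. The function and variable names, and the toy probabilities, are assumptions for illustration; `log_cond_prob` stands in for whatever inference routine supplies P(Mij | ·) in a real implementation.

```python
import math

def greedy_guess_log(remaining_vars, states, log_cond_prob, partial=None):
    """Greedy estimate of Eq. (4) in log space (a sketch): instantiate each
    remaining MAP variable to its locally most probable state, one at a time,
    conditioning later choices on earlier ones."""
    partial = dict(partial or {})
    total = 0.0
    for var in remaining_vars:
        best_val = max(states[var], key=lambda v: log_cond_prob(var, v, partial))
        total += log_cond_prob(var, best_val, partial)
        partial[var] = best_val  # later variables are conditioned on this choice
    return total

# Toy example with two independent binary variables (numbers invented):
probs = {"x": [0.3, 0.7], "y": [0.4, 0.6]}
lb = greedy_guess_log(["x", "y"], {v: [0, 1] for v in probs},
                      lambda v, s, _: math.log(probs[v][s]))
```

Under exhaustive independence, as here, the greedy product is exactly the MAP probability; otherwise it is only close to it, which is what motivates the weighting below.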

The requirement of exhaustive independence is too strict for most MAP problems. However, simulation results show that in practice, even when this requirement is violated, the product is still extremely close to the MAP probability (Yuan et al., 2004). This suggests using it as an ε-admissible heuristic function (Pearl, 1988).

The curve Greedy Guess Estimate in Figure 1 shows that, as the number of MAP variables increases, the ratio between the greedy guess and the accurate estimate of the optimal probability diverges from the ideal ratio of one, although not always monotonically.

3.2.2 Dynamic Weighting

Since the greedy guess is a tight lower bound on the optimal probability of MAP, it is possible to compensate for the error between the greedy guess and the optimal probability. We can do this by adding a weight to the greedy guess such that their product is equal to or larger than the optimal probability at each inner node of the search tree. This, it turns out, yields an excellent ε-admissible heuristic function. The assumption can be represented as follows:

∃ε : PGreedyGuess · (1 + ε) ≥ Popt  ∧  ∀ε′ : (PGreedyGuess · (1 + ε′) ≥ Popt) ⇒ ε ≤ ε′ ,

where ε is the minimum weight that can guarantee the heuristic function to be admissible. Figure 1 shows that if we keep ε constant, neglecting the change in estimate accuracy as the number of MAP variables grows, the estimate function and the optimal probability can be represented by the curve Constant Weighting Heuristic. Obviously, the problem with this idea is that the heuristic becomes less informed as the search progresses, since there are fewer MAP variables left to estimate.

Dynamic weighting (Pohl, 1973) is an efficient tool for improving the efficiency of A* search. If applied properly, it keeps the heuristic function admissible while remaining tight on the optimal probability. For MAP, in the shallow layers of the search tree we estimate over more MAP variables than in the deeper layers, so the greedy estimate is more likely to diverge from the optimal probability. We propose the following dynamic weighting heuristic function for the xth layer of the search tree of n MAP variables:

h(x) = GreedyGuess · (1 + α (n − (x + 1)) / n) ,    (α ≥ ε) .
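The weighting formula above can be written down directly; working in log space keeps it consistent with the f = g + h cost function. The function name and sample numbers below are illustrative assumptions.

```python
import math

def dynamic_weight_log(greedy_log, x, n, alpha=1.0):
    """h(x) = GreedyGuess * (1 + alpha * (n - (x + 1)) / n), in log space.

    alpha is the safety parameter (the paper uses 1.0). At the last step
    (x = n - 1) the weight vanishes and h(x) reduces to the exact greedy
    value, so deeper layers receive a progressively lighter correction.
    """
    weight = 1.0 + alpha * (n - (x + 1)) / n
    return greedy_log + math.log(weight)

shallow = dynamic_weight_log(-2.0, x=0, n=10)  # heavy correction early on
deep = dynamic_weight_log(-2.0, x=9, n=10)     # no correction at the last step
```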

Rather than keeping the weight constant throughout the search, we dynamically decrease it as the search goes deeper. In the last step of the search (x = n − 1) the weight is zero, since the greedy guess for a single MAP variable is exact, and the cost function f(n − 1) is then equal to the probability of the assignment. Figure 1 shows an empirical comparison of the greedy guess and of the constant and dynamic weighting heuristics against an accurate estimate of the probability. We see that the dynamic weighting heuristic is more informed than constant weighting.

[Figure 1 omitted: plot of h(x)/optimal against the number of MAP variables (0–40).]

Figure 1: An empirical comparison of the constant weighting and the dynamic weighting heuristics based on greedy guess.

3.3 Searching with Inadmissible Heuristics for the MAP Problem

Since the minimum weight ε that guarantees the heuristic function to be admissible is unknown before the MAP problem is solved, and may vary between cases, we normally set α to a safe value that is supposed to be larger than ε (in our experiments we set α to 1.0). However, if α is accidentally smaller than ε, the weighted heuristic becomes inadmissible. Let us take a closer look and analyze the conditions under which the algorithm fails to achieve optimality. Suppose there are two candidate assignments, s1 and s2, with probabilities p1 and p2 respectively, where s2 is the optimal assignment that the algorithm fails to find, and s1 is now in the last step of the search, leading to a suboptimal solution. We skip the logarithm in the cost function for the sake of clarity (the cost function f is then a product of the transformed g and h instead of their sum):

f1 = g1 · h1 and f2 = g2 · h2

The error introduced by an inadmissible h2 is f1 > f2. The algorithm will then find s1 instead of s2, i.e.,

f1 > f2 ⇒ g1 · h1 > g2 · h2 .

Since s1 is in the last step of the search, f1 = p1 (Section 3.2.2). Now suppose that we have an ideal heuristic function h′2, which leads to p2 = g2 · h′2. Then we have:

(g1 · h1) / p2 > (g2 · h2) / (g2 · h′2)  ⇒  p1 / p2 > (g2 · h2) / (g2 · h′2)  ⇒  p1 / p2 > h2 / h′2 .

It is clear that the algorithm will find a suboptimal solution only when the ratio between the probability of the suboptimal assignment and that of the optimal one is larger than the ratio between the inadmissible heuristic function and the ideal one. Because of the large asymmetries among probabilities, which are further amplified by their multiplicative combination (Druzdzel, 1994), we can expect that in most cases the ratio p1/p2 is far less than 1. Thus, even though the heuristic function sometimes violates admissibility, as long as the greedy guess does not diverge too far from the ideal estimate, the algorithm will still achieve optimality. Our simulation results confirm the robustness of the algorithm.

3.4 Improvements to the Algorithm

There are two main techniques that we used to improve the efficiency of the basic A* algorithm.

282 X. Sun, M. J. Druzdzel, and C. Yuan

Page 293: Proceedings of the Third European Workshop on Probabilistic

3.4.1 Relevance Reasoning

The main problem faced by the decision-theoretic approach is the complexity of probabilistic reasoning. The critical factor in exact inference schemes for Bayesian networks is the topology of the underlying graph and, more specifically, its connectivity. The framework of relevance reasoning (Druzdzel and Suermondt (1994) provide an accessible summary of the relevant techniques) is based on d-separation and other simple, computationally efficient techniques for pruning irrelevant parts of a Bayesian network, and can yield subnetworks that are smaller and less densely connected than the original network. Relevance reasoning is an integral part of the SMILE library (Druzdzel, 2005), on which the implementation of our algorithm is based.

For MAP, our focus is the set of variables M and the evidence set E. Parts of the model that are probabilistically independent of the nodes in M given the observed evidence E are computationally irrelevant to the MAP problem; removing them leads to substantial savings in computation.
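One deliberately simplified instance of such pruning is barren-node removal: only the ancestors of M ∪ E can influence P(M | E). Full relevance reasoning in SMILE also exploits d-separation, which this sketch omits; the function name and the toy DAG are assumptions for illustration.

```python
def relevant_subnetwork(parents, targets):
    """Keep only the ancestors of the target set (MAP plus evidence nodes).
    Barren descendants cannot influence P(M | E), so dropping them shrinks
    the network before inference. `parents` maps node -> list of parents."""
    keep, stack = set(), list(targets)
    while stack:
        node = stack.pop()
        if node in keep:
            continue
        keep.add(node)
        stack.extend(parents.get(node, []))  # walk upward through the DAG
    return keep

# Toy DAG: a -> b -> c and a -> d. With M ∪ E = {b}, nodes c and d are barren.
parents = {"a": [], "b": ["a"], "c": ["b"], "d": ["a"]}
kept = relevant_subnetwork(parents, {"b"})
```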

3.4.2 Dynamic Ordering

Because the search tree is constructed dynamically, we have the freedom to order the variables in a way that improves the efficiency of DWA*. Expanding the nodes with the largest asymmetries in their marginal probability distributions leads to early cut-off of less promising branches of the search tree. We use the entropy of the marginal probability distributions as a measure of asymmetry.
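A minimal sketch of this entropy-based ordering follows; the names and the sample marginals are invented for illustration. Lower entropy means a more asymmetric (more decisive) marginal, so those variables are expanded first.

```python
import math

def entropy(dist):
    """Shannon entropy of a discrete distribution (natural log)."""
    return -sum(p * math.log(p) for p in dist if p > 0)

def order_by_asymmetry(marginals):
    """Order MAP variables by increasing entropy of their marginals, so the
    most asymmetric variables are instantiated first (a sketch of the idea)."""
    return sorted(marginals, key=lambda var: entropy(marginals[var]))

# Assumed marginals: y is highly skewed, x is uniform, so y is expanded first.
order = order_by_asymmetry({"x": [0.5, 0.5], "y": [0.9, 0.1]})
```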

4 Experimental Results

To test DWA*, we compared its performance on real Bayesian networks with that of the current state-of-the-art MAP algorithms: the P-Loc and P-Sys algorithms (Park and Darwiche, 2001; Park and Darwiche, 2003) implemented in SamIam, and AnnealedMAP (Yuan et al., 2004) in SMILE. We implemented DWA* in C++ and performed our tests on a 3.0 GHz Pentium D Windows XP computer with 2 GB RAM. We used the default parameters and settings for all three algorithms during the comparison, unless otherwise stated.

4.1 Experimental Design

The Bayesian networks that we used in our experiments included Alarm (Beinlich et al., 1989), Barley (Kristensen and Rasmussen, 2002), CPCS179 and CPCS360 (Pradhan et al., 1994), Diabetes (Andreassen et al., 1991), Hailfinder (Abramson et al., 1996), Munin (Andreassen et al., 1989), Pathfinder (Heckerman, 1990), Andes, and Win95pts (Heckerman et al., 1995). We also tested the algorithms on two large proprietary diagnostic networks built at the HRL Laboratories (HRL1 and HRL2). We divided the networks into three groups: (1) small and middle-sized, (2) large but tractable, and (3) hard networks.

For each network, we randomly generated 20 cases. For each case, we randomly chose 20 MAP variables from among the root nodes and the same number of evidence nodes from among the leaf nodes. Following tests of MAP algorithms published earlier in the literature, we set the search time limit to 3,000 seconds.

4.2 Results of First & Second Group

We first ran P-Loc, P-Sys, AnnealedMAP, and DWA* on all networks in the first and second groups. P-Sys is an exact algorithm, so Table 1 reports only the number of MAP problems that were solved optimally by the other three algorithms. DWA* found all optimal solutions; P-Loc missed only one case on Andes, and AnnealedMAP missed one case on Hailfinder and two cases on Andes.

Since both AnnealedMAP and P-Loc failed to find all optimal solutions on Andes, we studied the performance of the four algorithms on it as a function of the number of MAP variables (randomly generating 20 cases for each number of MAP variables).

Because P-Sys failed to generate any result once the number of MAP variables reached 40, while DWA* found all of the largest probabilities, we subsequently compared the other three algorithms against DWA*. As the number of MAP variables increased, both P-Loc and



Group  Network     P-Loc  A-MAP  A*
1      Alarm       20     20     20
       CPCS179     20     20     20
       CPCS360     20     20     20
       Hailfinder  20     19     20
       Pathfinder  20     20     20
       Andes       19     18     20
       Win95pts    20     20     20
2      Munin       20     20     20
       HRL1        20     20     20
       HRL2        20     20     20

Table 1: Number of cases solved optimally out of 20 cases for the first and second group.

#MAP  P-Sys    P-Loc  A-MAP
10    0        0      0
20    0        1      2
30    0        1      0
40    TimeOut  4      4
50    TimeOut  6      2
60    TimeOut  5      2
70    TimeOut  6      5
80    TimeOut  6      1

Table 2: Number of cases in which DWA* found a more probable instantiation than the other three algorithms (network Andes).

AnnealedMAP turned out to be less accurate than DWA* on Andes. When the number of MAP variables was above 40, P-Loc found smaller probabilities than DWA* in about 25% of the cases, and AnnealedMAP in about 15%. We notice from Table 4 that P-Loc spent less time than DWA* when using its default settings on Andes, so we increased the number of search steps of P-Loc so that it spent the same amount of time as DWA*, in order to make a fair comparison. In practice the search time is not continuous in the number of search steps, so we chose parameters for P-Loc such that it spent only a little more time than DWA*. Table 3 shows the comparison results: even after increasing the search steps of P-Loc, DWA* maintains better accuracy.

In addition to the precision of the results, we

#MAP  P-Loc < DWA*  P-Loc > DWA*
10    0             0
20    0             0
30    0             0
40    1             0
50    2             0
60    2             1
70    3             2
80    5             0

Table 3: The number of cases in which the P-Loc algorithm found larger/smaller probabilities than DWA* on the Andes network when spending a little more time than DWA*.

also compared the efficiency of the algorithms. Table 4 reports the average running time of the four algorithms on the first and second groups of networks. For the first group,

Network     P-Sys  P-Loc  A-MAP  A*
Alarm       0.017  0.020  0.042  0.005
CPCS179     0.031  0.117  0.257  0.024
CPCS360     0.045  75.20  0.427  0.072
Hailfinder  2.281  0.109  0.219  0.266
Pathfinder  0.052  0.056  0.098  0.005
Andes       14.49  1.250  4.283  2.406
Win95pts    0.035  0.041  0.328  0.032
Munin       3.064  4.101  19.24  1.763
HRL1        0.493  51.18  2.831  0.193
HRL2        0.092  3.011  2.041  0.169

Table 4: Running time (in seconds) of the four algorithms on the first and second group.

the AnnealedMAP, P-Loc, and P-Sys algorithms showed similar efficiency on all networks except CPCS360 and Andes. DWA* generated solutions in the shortest time on average, and its smaller variance in search time indicates that DWA* is more stable across different networks.

For the second group, which consists of large Bayesian networks, P-Sys, AnnealedMAP, and DWA* are all efficient. DWA* still spends the shortest time on average, while P-Loc is much slower on the HRL1 network.



4.3 Results of Third Group

The third group consists of two complex Bayesian networks, Barley and Diabetes, many of whose nodes have more than 10 states. Because the P-Sys algorithm did not produce results within the time limit, the only available measure of accuracy was a relative one: which of the algorithms found an assignment with the higher probability. Table 5 lists the number of cases that were solved differently by the P-Loc, AnnealedMAP, and DWA* algorithms; PL, PA, and P* stand for the probabilities of the MAP solutions found by P-Loc, AnnealedMAP, and DWA* respectively.

Network   P* > PL / P* < PL   P* > PA / P* < PA
Barley    3/2                 5/3
Diabetes  5/0                 4/0

Table 5: The number of cases that were solved differently by P-Loc, AnnealedMAP, and DWA*.

For Barley, the accuracy of the three algorithms was quite similar. For Diabetes, however, DWA* was more accurate: it found the solutions with the largest probabilities in all 20 cases, while P-Loc failed to find 5 of them and AnnealedMAP failed to find 4.

Network   P-Sys    P-Loc  A-MAP  A*
Barley    TimeOut  68.63  31.95  122.1
Diabetes  TimeOut  338.4  163.4  81.8

Table 6: Running time (in seconds) of the four algorithms on the third group.

DWA* turns out to be slower than P-Loc and AnnealedMAP on Barley but more efficient on Diabetes (see Table 6).

4.4 Results of Incremental MAP Test

Our last experiment focused on the robustness of the four algorithms to the number of MAP variables. In this experiment we set the number of evidence nodes to 100 and generated MAP problems with an increasing number of MAP nodes. We chose the Munin network because it appears to be the hardest network in groups 1 and 2 and has sufficiently large sets of root and leaf nodes. The running times are shown in Figure 2. Typically, P-Sys and P-Loc need more running time as problems grow more complex, while AnnealedMAP and DWA* are comparatively more robust.

[Figure 2 omitted: running time (in minutes) of P-Sys, P-Loc, AnnealedMAP, and DWA* against the number of MAP variables, with 100 evidence nodes.]

Figure 2: Running time of the four algorithms when increasing the number of MAP nodes on the Munin network.

5 Discussion

Solving MAP is hard. By exploiting asymmetries among the probabilities of possible elements of the joint probability distribution of the MAP variables, DWA* is able to greatly reduce the search space and produce efficient and accurate solutions to MAP problems. Our experimental results show that DWA* is generally more efficient than existing algorithms; in particular, for large and complex Bayesian networks, where exact algorithms fail to generate any result within a reasonable time, DWA* can still provide accurate solutions efficiently. A further extension of this research is to apply DWA* to the K-MAP problem, i.e., finding the k most probable assignments of the MAP variables. This is very convenient for DWA* to achieve, since in the process of finding the most probable assignment the algorithm keeps all candidate assignments in the search frontier; we can expect the additional search time to be linear in k.

In sum, DWA* enriches the set of approaches for solving the MAP problem and extends the scope of MAP problems that can be solved.



Acknowledgements

This research was supported by the Air Force Office of Scientific Research grants F49620–03–1–0187 and FA9550–06–1–0243 and by Intel Research. We thank the anonymous reviewers for several insightful comments that led to improvements in the presentation of the paper. All experimental data were obtained using SMILE, a Bayesian inference engine developed at the Decision Systems Laboratory and available at http://genie.sis.pitt.edu/.

References

B. Abramson, J. Brown, W. Edwards, A. Murphy, and R. Winkler. 1996. Hailfinder: A Bayesian system for forecasting severe weather. International Journal of Forecasting, 12(1):57–72.

S. Andreassen, F. V. Jensen, S. K. Andersen, B. Falck, U. Kjærulff, M. Woldbye, A. R. Sørensen, A. Rosenfalck, and F. Jensen. 1989. MUNIN — an expert EMG assistant. In John E. Desmedt, editor, Computer-Aided Electromyography and Expert Systems, chapter 21. Elsevier Science Publishers, Amsterdam.

S. Andreassen, R. Hovorka, J. Benn, K. G. Olesen, and E. R. Carson. 1991. A model-based approach to insulin adjustment. In M. Stefanelli, A. Hasman, M. Fieschi, and J. Talmon, editors, Proceedings of the Third Conference on Artificial Intelligence in Medicine, pages 239–248. Springer-Verlag.

I. Beinlich, G. Suermondt, R. Chavez, and G. Cooper. 1989. The ALARM monitoring system: A case study with two probabilistic inference techniques for belief networks. In Proceedings of the 2nd European Conference on AI and Medicine, pages 247–256. Springer-Verlag, Berlin.

M. J. Druzdzel and H. J. Suermondt. 1994. Relevance in probabilistic models: "Backyards" in a "small world". In Working Notes of the AAAI 1994 Fall Symposium Series: Relevance, pages 60–63, New Orleans, LA, 4–6 November.

M. J. Druzdzel. 1994. Some properties of joint probability distributions. In Proceedings of the 10th Conference on Uncertainty in Artificial Intelligence (UAI–94), pages 187–194. Morgan Kaufmann, San Francisco, CA.

M. J. Druzdzel. 2005. Intelligent decision support systems based on SMILE. Software 2.0, 2:12–33.

D. Heckerman, J. Breese, and K. Rommelse. 1995. Decision-theoretic troubleshooting. Communications of the ACM, 38:49–57.

D. Heckerman. 1990. Probabilistic similarity networks. Networks, 20(5):607–636, August.

K. Kristensen and I. A. Rasmussen. 2002. The use of a Bayesian network in the design of a decision support system for growing malting barley without use of pesticides. Computers and Electronics in Agriculture, 33:197–217.

J. D. Park and A. Darwiche. 2001. Approximating MAP using local search. In Proceedings of the 17th Conference on Uncertainty in Artificial Intelligence (UAI–01), pages 403–410. Morgan Kaufmann, San Francisco, CA.

J. D. Park and A. Darwiche. 2003. Solving MAP exactly using systematic search. In Proceedings of the 19th Conference on Uncertainty in Artificial Intelligence (UAI–03), pages 459–468. Morgan Kaufmann, San Francisco, CA.

J. D. Park. 2002. MAP complexity results and approximation methods. In Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence (UAI–02), pages 388–396. Morgan Kaufmann, San Francisco, CA.

J. Pearl. 1988. Heuristics: Intelligent Search Strategies for Computer Problem Solving. Addison-Wesley.

I. Pohl. 1973. The avoidance of (relative) catastrophe, heuristic competence, genuine dynamic weighting and computational issues in heuristic problem solving. In Proceedings of the 3rd IJCAI, pages 12–17, Stanford, CA.

M. Pradhan, G. Provan, B. Middleton, and M. Henrion. 1994. Knowledge engineering for large belief networks. In Proceedings of the Tenth Annual Conference on Uncertainty in Artificial Intelligence (UAI–94), pages 484–490, San Mateo, CA. Morgan Kaufmann.

S. E. Shimony. 1994. Finding MAPs for belief networks is NP-hard. Artificial Intelligence, 68:399–410.

C. Yuan, T. Lu, and M. J. Druzdzel. 2004. Annealed MAP. In Proceedings of the 20th Annual Conference on Uncertainty in Artificial Intelligence (UAI–04), pages 628–635. AUAI Press, Arlington, VA.



A Short Note on Discrete Representability of Independence Models

Petr Šimeček
Institute of Information Theory and Automation
Academy of Sciences of the Czech Republic
Pod Vodárenskou věží 4
182 08 Prague, Czech Republic

Abstract

The paper discusses the problem of characterizing collections of conditional independence triples (independence models) that are representable by a discrete distribution. The known results are summarized, and the number of representable models over a four-element set, often mistakenly claimed to be 18300, is corrected. In the second part, bounds on the number of positively representable models over a four-element set are derived.

1 Introduction

Conditional independence (CI) relationships occur naturally among components of highly structured stochastic systems. In the field of graphical Markov models, a graph (nodes connected by edges) is used to represent the CI structure of a set of probability distributions.

Given the joint probability distribution of a collection of random variables, it is easy to construct the list of all CIs among them. On the other hand, given such a list of CIs (here called an independence model), an interesting question arises: does there exist a collection of discrete random variables meeting these and only these CIs, i.e., representing that model?

The problem of probabilistic representability comes originally from J. Pearl, cf. (Pearl, 1998). It was proved by M. Studeny in (Studeny, 1992) that there is no finite characterization (≡ finite set of inference rules) of the set of representable independence models.

The interesting point is that for a fixed number of variables (or vertices) the number of representable independence models is much higher than the number of graphs. Therefore, even a partial characterization of representable independence models may help to improve and understand the limits of learning Bayesian networks and Markov models.

2 Independence models

For the reader's convenience, auxiliary results related to independence models are recalled in this section.

Throughout the paper, the singleton {a} will be shortened to a and the union of sets A ∪ B will be written simply as the juxtaposition AB. A random vector ξ = (ξa)a∈N is a collection of random variables indexed by a finite set N. For A ⊆ N, a subvector (ξa)a∈A is denoted by ξA; ξ∅ is presumed to be a constant. Analogously, if x = (xa)a∈N is a constant vector then xA is the appropriate coordinate projection.

Provided A, B, C are pairwise disjoint subsets of N, "ξA ⊥⊥ ξB | ξC" stands for the statement that ξA and ξB are conditionally independent given ξC. In particular, unconditional independence (C = ∅) is abbreviated as ξA ⊥⊥ ξB.

A random vector ξ = (ξa)a∈N is called discrete if each ξa takes values in a state space Xa such that 1 < |Xa| < ∞. A discrete random vector ξ is called positive if for any appropriate constant vector x

0 < P(ξ = x) < 1.

In the case of a discretely distributed random vector, variables ξa and ξb are independent¹ given ξC iff for any appropriate constant vector xabC

P(ξabC = xabC) P(ξC = xC) = P(ξaC = xaC) P(ξbC = xbC).

¹The independence relation between random vectors

Let N be a finite set and let TN denote the set of all pairs 〈ab|C〉 such that ab is an (unordered) couple of distinct elements of N and C ⊆ N \ ab.

Subsets of TN will be referred to as formal independence models over N. The independence models ∅ and TN are called trivial.

The independence model I(ξ) induced by a random vector ξ indexed by N is the independence model over N defined as follows:

I(ξ) = {〈ab|C〉; ξa ⊥⊥ ξb | ξC}.

Let us emphasize that an independence model I(ξ) uniquely determines also all other conditional independences among subvectors of ξ, cf. (Matus, 1992).

Diagrams proposed by R. Lnenicka will be used for the visualisation of an independence model I over N such that |N| ≤ 4. Each element of N is plotted as a dot. If 〈ab|∅〉 ∈ I then the dots corresponding to a and b are joined by a line. If 〈ab|c〉 ∈ I then we put a line between the dots corresponding to a and b and add a small line in the middle pointing in the c-direction. If both 〈ab|c〉 and 〈ab|d〉 are elements of I, then only one line with two small lines in the middle is plotted. Finally, 〈ab|cd〉 ∈ I is visualised by a brace between a and b. See the example in Figure 1.

Figure 1: Diagram of the independence model I = {〈12|∅〉, 〈23|1〉, 〈23|4〉, 〈34|12〉, 〈14|∅〉, 〈14|2〉}. [The four dots are labelled 1, 2, 3, 4.]

Independence models I and I∗ over N will be called isomorphic if there exists a permutation

ξA, ξB given ξC might be defined analogously. However, we will see that such relationships are uniquely determined by the elementary ones (|A| = |B| = 1).

π on N such that

〈ab|C〉 ∈ I ⇐⇒ 〈π(a)π(b)|π(C)〉 ∈ I∗,

where π(C) stands for {π(c); c ∈ C}. See Figure 2 for an example of three isomorphic models.

An equivalence class of independence models with respect to the isomorphism relation will be referred to as a type.

Figure 2: Example of three isomorphic models.

An independence model I is said to be representable² if there exists a discretely distributed random vector ξ such that I = I(ξ). In addition, special attention will be devoted to positive representations, i.e. representations by a positive discrete distribution.

Let us note that isomorphic models are either all representable or all non-representable. Consequently, we can classify types as representable and non-representable.

Lemma 1. If I = I(ξ) and I∗ = I(ξ∗) are representable independence models, then the independence model I ∩ I∗ is also representable. In particular, if both have positive representations then there exists a positive representation of I ∩ I∗, too.

Proof. Let X = ∏_{a∈N} Xa and X′ = ∏_{a∈N} X′a be the state spaces of ξ and ξ∗, respectively. The required representation ζ takes values in

∏_{a∈N} (Xa × X′a)

and is distributed as follows:

P(ζ = (xa, x′a)_{a∈N}) = P(ξ = (xa)_{a∈N}) · P(ξ∗ = (x′a)_{a∈N}).

See (Studeny and Vejnarova, 1998), p. 5, for more details.
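Lemma 1's product construction can be checked numerically. The sketch below is illustrative only (the array representation and all function names are mine, not the paper's): a joint pmf is stored as a NumPy array with one axis per variable, `independence_model` extracts the induced elementary CI triples, and the final assertion confirms that the product vector represents exactly the intersection of the two models.

```python
import itertools
import numpy as np

def marginal(p, keep):
    """Marginal pmf over the variables in `keep` (axes stay in sorted order)."""
    drop = tuple(i for i in range(p.ndim) if i not in keep)
    return p.sum(axis=drop)

def ci_holds(p, a, b, C, tol=1e-12):
    """Does xi_a _||_ xi_b | xi_C hold in the joint pmf p?"""
    abC = sorted({a, b} | set(C))
    pos = {v: i for i, v in enumerate(abC)}
    P_abC = marginal(p, abC)
    P_C = marginal(p, set(C))
    P_aC = marginal(p, {a} | set(C))
    P_bC = marginal(p, {b} | set(C))
    for x in itertools.product(*(range(s) for s in P_abC.shape)):
        xC = tuple(x[pos[v]] for v in sorted(C))
        xaC = tuple(x[pos[v]] for v in sorted({a} | set(C)))
        xbC = tuple(x[pos[v]] for v in sorted({b} | set(C)))
        # the factorization P(abC) P(C) = P(aC) P(bC) must hold cell by cell
        if abs(P_abC[x] * P_C[xC] - P_aC[xaC] * P_bC[xbC]) > tol:
            return False
    return True

def independence_model(p):
    """All elementary CI triples <ab|C> induced by the joint pmf p."""
    n = p.ndim
    model = set()
    for a, b in itertools.combinations(range(n), 2):
        rest = [v for v in range(n) if v not in (a, b)]
        for k in range(len(rest) + 1):
            for C in itertools.combinations(rest, k):
                if ci_holds(p, a, b, C):
                    model.add((a, b, C))
    return model

def product_representation(p, q):
    """Lemma 1: the vector with coordinates (xi_a, xi'_a), state spaces Xa x X'a."""
    n = p.ndim
    out = np.tensordot(p, q, axes=0)                 # P(xi = x) * P(xi' = x')
    order = [ax for pair in zip(range(n), range(n, 2 * n)) for ax in pair]
    out = out.transpose(order)                       # interleave axes a and a'
    return out.reshape([p.shape[i] * q.shape[i] for i in range(n)])

p = np.full((2, 2), 0.25)                  # xi_0 independent of xi_1
q = np.array([[0.4, 0.1], [0.1, 0.4]])     # dependent pair
r = product_representation(p, q)
assert independence_model(r) == independence_model(p) & independence_model(q)
```

Here the product destroys the independence present in p, exactly as the intersection I ∩ I∗ predicts.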

²Of course, it is also possible to consider representability in other distribution frameworks than the discrete one, cf. (Lnenicka, 2005).


Figure 3: Irreducible models over N = {1, 2, 3, 4}.


Lemma 2. Let a, b, c be distinct elements of N and D ⊆ N \ abc. If an independence model I over N is representable, then

({〈ab|cD〉, 〈ac|D〉} ⊆ I) ⇐⇒ ({〈ac|bD〉, 〈ab|D〉} ⊆ I).

Moreover, if I is positively representable, then

({〈ab|cD〉, 〈ac|bD〉} ⊆ I) =⇒ ({〈ab|D〉, 〈ac|D〉} ⊆ I).

Proof. These are the so-called "semigraphoid" and "graphoid" properties; cf. (Lauritzen, 1996) for the proof.
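The necessary condition of Lemma 2 is mechanical to verify for a formal independence model given as a set of triples. The following sketch (function and helper names are mine, not from the paper) tests the "semigraphoid" equivalence for every choice of a, b, c and D:

```python
import itertools

def satisfies_lemma2(model, N):
    """Check the equivalence of Lemma 2 for a formal independence model.

    `model` is a set of triples (frozenset({a, b}), frozenset(C))."""
    def has(x, y, C):
        return (frozenset((x, y)), frozenset(C)) in model
    for a, b, c in itertools.permutations(N, 3):
        rest = [v for v in N if v not in (a, b, c)]
        for k in range(len(rest) + 1):
            for D in map(set, itertools.combinations(rest, k)):
                left = has(a, b, D | {c}) and has(a, c, D)
                right = has(a, c, D | {b}) and has(a, b, D)
                if left != right:
                    return False
    return True

def triple(a, b, C=()):
    return (frozenset((a, b)), frozenset(C))

N = (1, 2, 3)
# The trivial model T_N (every elementary CI holds) passes, as does the empty model.
T_N = {triple(a, b, C)
       for a, b in itertools.combinations(N, 2)
       for C in (set(), set(N) - {a, b})}
assert satisfies_lemma2(T_N, N) and satisfies_lemma2(set(), N)
# <12|empty> together with <13|2>, but without <13|empty> and <12|3>, violates the rule.
assert not satisfies_lemma2({triple(1, 2), triple(1, 3, {2})}, N)
```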

3 Representability of Independence Models over N = {1, 2, 3, 4}

For N consisting of three or fewer elements, all independence models not contradicting the properties from Lemma 2 are representable, resp. positively representable, cf. (Studeny, 2005). That is why we focus on N = {1, 2, 3, 4} from now to the end of the paper.

The first subsection summarizes known results related to (general) representability of independence models. The second subsection is devoted to positive representability.

3.1 General Representability

The problem was solved in a brilliant series of papers (Matus and Studeny, 1995), (Matus, 1995) and (Matus, 1999) by F. Matus and M. Studeny. The final conclusions are clearly and comprehensibly presented in (Studeny and Bocek, 1994) and (Matus, 1997)³.

In brief, due to Lemma 1 an intersection of two representable models is also representable. Therefore, the class of all representable models over N can be described by a set of so-called irreducible models C, i.e. nontrivial representable models that cannot be written as an intersection of two other representable models. It is not difficult to see that a nontrivial independence model I is representable if and only if there exists A ⊆ C such that

I = ⋂_{C∈A} C.

³To avoid confusion, note that (Matus, 1997) contains a minor typo in Figure 14 on p. 21. Over the upper line in the first two diagrams there should be ∅ instead of ∗.

There are only 13 types of irreducible models, see Figure 3 or (Studeny and Bocek, 1994), pp. 277–278. The problematic point is the total number of representable independence models over N. It has been believed that this number is 18300, cf. (Studeny, 2002), (Lauritzen and Richardson, 2002), (Robins et al., 2003), (Simecek, 2006)... However, while working on this paper I discovered that there actually exist 18478 different representable independence models over N, of 1098 types. The list of models has been checked by several programs including SG POKUS written by M. Studeny and P. Bocek. The list can be downloaded from the web page

http://5r.matfyz.cz/skola/models

3.2 Positive Representability

Only a little is known about positive representability. This paper would like to be a first step towards a complete characterisation of positively representable models over N = {1, 2, 3, 4}.

Obviously, the set of positively representable models is a subset of the set of (generally) representable models. In addition, a positively representable model must fulfill the properties following from the second part of Lemma 2 and from Lemma 3 below.

Lemma 3. Let a, b, c, d be distinct elements of N. If I is a positively representable independence model over N such that

{〈ab|cd〉, 〈cd|ab〉, 〈cd|a〉} ⊆ I,

then

〈cd|b〉 ∈ I ⇐⇒ 〈cd|∅〉 ∈ I.

Proof. See (Spohn, 1994), p. 15.

There are 5547 (generally) representable models (356 types) meeting the requirements for positively representable models from Lemma 2 and Lemma 3. This gives an upper bound on the set of all positively representable models.


Figure 4: Undecided types.


Again, this class of models A is generated by its subset C under the operation of intersection. The elements of C can be found by starting with an empty set C and in each step adding the largest element of A not yet generated by C. Actually, C contains (nontrivial) models of 23 types (these may be downloaded from the mentioned web page).
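The greedy search described above can be sketched in a few lines. In this toy version (my own naming; in the paper each model would be a set of CI triples rather than a plain set of labels), `intersection_closure` generates everything obtainable by intersecting the chosen generators:

```python
def intersection_closure(models):
    """All intersections of nonempty subsets of `models` (computed to a fixpoint)."""
    closed = set(models)
    changed = True
    while changed:
        changed = False
        for a in list(closed):
            for b in list(closed):
                if a & b not in closed:
                    closed.add(a & b)
                    changed = True
    return closed

def greedy_generators(A):
    """Start with empty C; repeatedly add the largest element of A
    not yet generated by intersections of C."""
    A = {frozenset(m) for m in A}
    C = set()
    while True:
        generated = intersection_closure(C)
        missing = [m for m in A if m not in generated]
        if not missing:
            return C
        C.add(max(missing, key=len))

A = [{1, 2, 3}, {1, 2}, {2, 3}, {2}]      # {2} = {1,2} intersected with {2,3}
assert greedy_generators(A) == {frozenset({1, 2, 3}),
                                frozenset({1, 2}),
                                frozenset({2, 3})}
```

The element {2} is never added as a generator because it is recovered as an intersection of the two larger models.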

To find a lower bound on the set of positively representable models, a large number of positive binary (≡ sample space {0,1}^N) random distributions have been randomly generated. The description of the generating process is omitted here⁴; see the above-mentioned web page for the list of corresponding independence models and their binary representations. Using Lemma 1 we obtained 4555 (299 types) different positive representations of independence models. The remaining problematic 57 types are plotted in Figure 4.
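The random-search idea can be illustrated as follows; this is a simplified sketch under my own conventions (uniform cell weights rather than the authors' moment parametrization): draw strictly positive weights for the 2⁴ cells of {0,1}^N, normalize, and read off numerically which elementary CIs hold.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

def random_positive_binary(n):
    """Strictly positive joint pmf on {0,1}^n."""
    w = rng.uniform(0.1, 1.0, size=(2,) * n)
    return w / w.sum()

def ci_holds(p, a, b, C, tol=1e-9):
    """Numerical check of xi_a _||_ xi_b | xi_C via P(abC)P(C) = P(aC)P(bC)."""
    n = p.ndim
    for x in itertools.product((0, 1), repeat=n):
        def m(keep):  # marginal probability of the cells agreeing with x on `keep`
            idx = tuple(x[i] if i in keep else slice(None) for i in range(n))
            return p[idx].sum()
        K = set(C)
        if abs(m({a, b} | K) * m(K) - m({a} | K) * m({b} | K)) > tol:
            return False
    return True

p = random_positive_binary(4)
model = {(a, b, C)
         for a, b in itertools.combinations(range(4), 2)
         for k in range(3)
         for C in itertools.combinations([v for v in range(4) if v not in (a, b)], k)
         if ci_holds(p, a, b, C)}
# A generic random table almost surely satisfies no CI at all, so many draws
# are needed before the rarer positively representable models show up.
```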

3.3 Conclusion

Let us summarize the results in a concluding theorem.

Theorem 1. There are 18478 different (generally) representable independence models (1098 types) over the set N = {1, 2, 3, 4}. There are between 4555 and 5547 different positively representable independence models (299–356 types) over the set N = {1, 2, 3, 4}.

Acknowledgments

This work was supported by the grants GA CR n. 201/04/0393 and GA CR n. 201/05/H007.

References

S. L. Lauritzen and T. S. Richardson. 2002. Chain graph models and their causal interpretations. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64:321–361.

S. L. Lauritzen. 1996. Graphical Models. Oxford University Press.

⁴Briefly, a parametrization of the joint distribution by moments has been used. Further, both systematic and random search through the parameter space was performed.

R. Lnenicka. 2005. On Gaussian conditional independence structures. Technical report, UTIA AV CR.

F. Matus and M. Studeny. 1995. Conditional independences among four random variables I. Combinatorics, Probability & Computing, 4:269–278.

F. Matus. 1992. On equivalence of Markov properties over undirected graphs. Journal of Applied Probability, 29:745–749.

F. Matus. 1995. Conditional independences among four random variables II. Combinatorics, Probability & Computing, 4:407–417.

F. Matus. 1997. Conditional independence structures examined via minors. The Annals of Mathematics and Artificial Intelligence, 21:99–128.

F. Matus. 1999. Conditional independences among four random variables III. Combinatorics, Probability & Computing, 8:269–276.

J. Pearl. 1998. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann.

J. M. Robins, R. Scheines, P. Spirtes, and L. Wasserman. 2003. Uniform consistency in causal inference. Biometrika, 90:491–515.

P. Simecek. 2006. Gaussian representation of independence models over four random variables. Accepted to the COMPSTAT conference.

W. Spohn. 1994. On the properties of conditional independence. In P. Humphreys, editor, Patrick Suppes: Scientific Philosopher, volume 1, pages 173–194. Kluwer.

M. Studeny and P. Bocek. 1994. CI-models arising among 4 random variables. In Proceedings of the 3rd workshop WUPES, pages 268–282.

M. Studeny and J. Vejnarova. 1998. The multiinformation function as a tool for measuring stochastic dependence. In M. I. Jordan, editor, Learning in Graphical Models, pages 261–298. Kluwer.

M. Studeny. 1992. Conditional independence relations have no finite complete characterization. In S. Kubík and J. A. Víšek, editors, Information Theory, Statistical Decision Functions and Random Processes. Transactions of the 11th Prague Conference, volume B, pages 377–396. Kluwer.

M. Studeny. 2002. On stochastic conditional independence: the problems of characterization and description. Annals of Mathematics and Artificial Intelligence, 35:241–323.

M. Studeny. 2005. Probabilistic Conditional Independence Structures. Springer.


Evaluating Causal effects using Chain Event Graphs

Peter Thwaites and Jim Smith
University of Warwick Statistics Department
Coventry, United Kingdom

Abstract

The Chain Event Graph (CEG) is a coloured mixed graph used for the representation of finite discrete distributions. It can be derived from an Event Tree (ET) together with a set of equivalence statements relating to the probabilistic structure of the ET. CEGs are especially useful for representing and analysing asymmetric processes, and collections of implied conditional independence statements over a variety of functions can be read from their topology. The CEG is also a valuable framework for expressing causal hypotheses, and manipulated-probability expressions analogous to that given by Pearl in his Back Door Theorem can be derived. The expression we derive here is valid for a far larger set of interventions than can be analysed using Bayesian Networks (BNs), and also for models which have insufficient symmetry to be described adequately by a Bayesian Network.

1 Introduction

Bayesian Networks are good graphical representations for many discrete joint probability distributions. However, many asymmetric models (by which we mean models with non-symmetric sample space structures) cannot be fully described by a BN. Such processes arise frequently in, for example, biological regulation, risk analysis and Bayesian policy analysis.

In eliciting these models it is usually sensible to start with an Event Tree (Shafer, 1996), which is essentially a description of how the process unfolds rather than how the system might appear to an observer. Working with an ET can be quite cumbersome, but they do reflect any model asymmetry, both in model development and in model sample space structure.

The Chain Event Graph (Riccomagno & Smith, 2005; Smith & Anderson, 2006; Thwaites & Smith, 2006a) is a graphical structure designed for the analysis of asymmetric systems. It retains the advantages of the ET, whilst typically having far fewer edges and vertices. Moreover, the CEG can be read for a rich collection of conditional independence properties of the model. Unlike Jaeger's very useful Probabilistic Decision Graph (2002), this includes all the properties that can be read from the equivalent BN if the CEG represents a symmetric model, and far more if the model is asymmetric.

In the next section we show how CEGs can be constructed. We then describe how we can use CEGs to analyse the effects of causal manipulation.

Bayesian Networks are often extended to apply also to a control space. When it is valid to make this extension the BN is called causal. Although there is debate (Pearl, 2000; Lauritzen, 2001; Dawid, 2002) about terminology, it is certainly the case that BNs are useful for analysing (in Pearl's notation) the effects of manipulations of the form Do X = x0 in symmetric models, where X is a variable to be manipulated and x0 the setting that this variable is to be manipulated to. This type of intervention, which might be termed atomic, is actually a rather coarse manipulation, since we would need to extend the space to make predictions of effects when X is manipulated to any value. Although there is a case for only considering such manipulations when a model is very symmetric, it is too coarse to capture many of the manipulations we might want to consider in asymmetric environments. We use this paper to show how CEGs can be used to analyse a far


more refined singular manipulation in models which may have insufficient symmetry to be described adequately by a Bayesian Network.

2 CEG construction

We can produce a CEG from an Event Tree which we believe represents the model (see for example Figure 1). This ET is just a graphical description of how the process unfolds, and the set of atoms of the Event Space (or path sigma algebra) of the tree is simply the set of root-to-leaf paths within the tree. Any random variables defined on the tree are measurable with respect to this path sigma algebra.

Figure 1. ET for machine example. [The tree has root v0, internal vertices v1–v9, leaves including v18 and v23, and edges carrying the probabilities π1–π14.]

Example 1. A machine in a production line utilises two replaceable components A and B. Faults in these components do not automatically cause the machine to fail, but do affect the quality of the product, so the machine incorporates an automated monitoring system, which is completely reliable for finding faults in A, but which can detect a fault in B when it is functioning correctly.

In any monitoring cycle, component A is checked first, and there are three initial possibilities: A, B checked and no faults found

(π1 on the ET in Figure 1); A checked, fault found, machine switched off (π2); A checked, no fault found, B checked, fault found, machine switched off (π3).

If A is found faulty it is replaced and the machine switched back on (vertex v1), and B is then checked. B is then either found not faulty (π4), or faulty and the machine switched off (π5).

If B is found faulty by the monitoring system, then it is removed and checked (vertices v2 and v3). There are then three possibilities, whose probabilities are independent of whether or not component A has been replaced: B is not in fact faulty, the machine is reset and restarted (π6); B is faulty, is successfully replaced and the machine restarted (π7); B is faulty, is replaced unsuccessfully and the machine is left off until the engineer can see it (π8).

At the time of any monitoring cycle, the quality of the product produced (π10) is unaffected by the replacement of A unless B is also replaced. It is however dependent on the effectiveness of B, which depends on its age, but also, if it is a new component, on the age of A; so:

π(good product | A and B replaced) = π12
> π(good product | only B replaced) = π14
> π(good product | B not replaced) = π10

An ET for this set-up is given in Figure 1 and a derived CEG in Figure 2. Note that:

- The subtrees rooted in the vertices v4, v5, v6 and v8 of the ET are identical (both in physical structure and in probability distribution), so these vertices have been conjoined into the vertex (or position) w4 in the CEG.
- The subtrees rooted in v2 and v3 are not identical (as π11 ≠ π13, π12 ≠ π14), but the edges leaving v2 and v3 carry identical probabilities. The equivalent positions in the CEG, w2 and w3, have been joined by an undirected edge.
- All leaf-vertices of the ET have been conjoined into one sink-vertex in the CEG, labelled w∞.

To complete the transformation, note that vi → wi for 0 ≤ i ≤ 3, v7 → w5 and v9 → w6.


Figure 2. CEG for machine example. [The CEG has positions w0, w1, w2, w3, w4, w5, w6 and sink w∞; edges carry the probabilities π1–π14.]

A formal description of the process is as follows. Consider the ET T = (V(T), E(T)) where each element of E(T) has an associated edge probability. Let S(T) ⊂ V(T) be the set of non-leaf vertices of the ET.

Let vi ≺ vj indicate that there is a path joining vertices vi and vj in the ET, and that vi precedes vj on this path.

Let X(v) be the sample space of X(v), the random variable associated with the vertex v (X(v) can be thought of as the set of edges leaving v, so in our example, X(v1) = {B found not faulty, B found faulty}).

For any v ∈ S(T) and vl ∈ V(T)\S(T) such that v ≺ vl:

- Label v = v^0
- For i ≥ 0, let v^{i+1} be the vertex such that v^i ≺ v^{i+1} ≼ vl and there is no vertex v′ with v^i ≺ v′ ≺ v^{i+1}
- Label vl = v^m, where the path consists of m edges of the form e(v^i, v^{i+1})

Definition 1. For any v1, v2 ∈ S(T), v1 and v2 are termed equivalent iff there is a bijection φ which maps the set of paths (and component edges)

Λ1 = {λ1(v1, vl1) | vl1 ∈ V(T)\S(T)} onto Λ2 = {λ2(v2, vl2) | vl2 ∈ V(T)\S(T)}

in such a way that:

(a) φ(e(v1^i, v1^{i+1})) = e(φ(v1^i), φ(v1^{i+1})) = e(v2^i, v2^{i+1}) for 0 ≤ i ≤ m(λ)

(b) π(v1^{i+1} | v1^i) = π(v2^{i+1} | v2^i)

where v1^{i+1} and v2^{i+1} label the same value on the sample spaces X(v1^i) and X(v2^i) for i ≥ 0.

The set of equivalence classes induced by the bijection is denoted K(T), and the elements of K(T) are called positions.

Definition 2. For any v1, v2 ∈ S(T), v1 and v2 are termed stage-equivalent iff there is a bijection φ which maps the set of edges E1 = {e1(v1, v1′) | v1′ ∈ X(v1)} onto E2 = {e2(v2, v2′) | v2′ ∈ X(v2)} in such a way that:

π(v1′ | v1) = π(φ(v1′) | φ(v1)) = π(v2′ | v2)

where v1′ and v2′ label the same value on the sample spaces X(v1) and X(v2).

The set of equivalence classes induced by the bijection is denoted L(T), and the elements of L(T) are called stages.

A CEG C(T) of our model is constructed as follows:

(1) V(C(T)) = K(T) ∪ {w∞}

(2) Each w, w′ ∈ K(T) will correspond to a set of v, v′ ∈ S(T). If, for such v, v′, there exists a directed edge e(v, v′) ∈ E(T), then there exists a directed edge e(w, w′) ∈ E(C(T))

(3) If there exists an edge e(v, vl) ∈ E(T) such that v ∈ S(T)


and vl ∈ V(T)\S(T), then there exists a directed edge e(w, w∞) ∈ E(C(T))

(4) If two vertices v1, v2 ∈ S(T) are stage-equivalent, then there exists an undirected edge e(w1, w2) ∈ E(C(T))

(5) If w1 and w2 are in the same stage (i.e. if v1, v2 are stage-equivalent in E(T)), and if π(v1′ | v1) = π(v2′ | v2), then the edges e(w1, w1′) and e(w2, w2′) have the same label or colour in E(C(T)).

Note that in our example, w2 and w3 are in the same stage and that π(v6 | v2) = π(v8 | v3), π(v7 | v2) = π(v9 | v3), π(v18 | v2) = π(v23 | v3), so the edges e(w2, w4) and e(w3, w4) are coloured the same, as are the edges e(w2, w5) and e(w3, w6), and as are e(w2, w∞) and e(w3, w∞).

More detail on CEG construction can be found in Smith & Anderson (2006), as can a detailed description of how CEGs are read. We conclude section 2 of this paper by looking at two ideas that will be used extensively in the next section.

Firstly, when we say that a position w in our CEG or the set of edges leaving w have an associated variable, we are not referring to the measurement-variables of a BN-representation of the problem, each of which must take a value for any atomic event, but to a more flexible construct defined through stage-equivalence in the underlying tree. The exit-edges of a position are simply the collection of possible immediate outcomes in the next step of the process given the history up to that position. A setting (or value or level) is then simply a possible realisation of a variable in this collection.

Secondly, we use these edge probabilities to define the probabilities of composite events in the path sigma field of our CEG:

Definition 3. For two positions w, w′ with w ≺ w′, let π_λ(w′ | w) be the probability associated with the path λ(w, w′). Note that this will be a product of edge probabilities. Define

π(w′ | w) ≜ Σ_{λ∈Λ} π_λ(w′ | w)

where Λ is the set of all paths from w to w′.
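The rule of Definition 3 (multiply edge probabilities along a path, then sum over all w to w′ paths) is a standard sum-over-paths recursion on a directed acyclic graph. A minimal sketch follows; the edge-list representation, function name and numerical probabilities are mine, not from the paper:

```python
from functools import lru_cache

def position_probability(edges, w, w_target):
    """pi(w_target | w): sum over all w -> w_target paths of the product
    of edge probabilities, as in Definition 3.

    `edges` maps a position to a list of (child, probability) pairs."""
    @lru_cache(maxsize=None)
    def pi(u):
        if u == w_target:
            return 1.0
        # sum over children of the edge probability times the remaining path mass
        return sum(prob * pi(child) for child, prob in edges.get(u, []))
    return pi(w)

# Toy graph with made-up probabilities: two intermediate routes from w0 into w4.
edges = {
    "w0": [("w2", 0.3), ("w3", 0.2), ("w4", 0.5)],
    "w2": [("w4", 0.6), ("winf", 0.4)],
    "w3": [("w4", 0.6), ("winf", 0.4)],
}
assert abs(position_probability(edges, "w0", "w4")
           - (0.5 + 0.3 * 0.6 + 0.2 * 0.6)) < 1e-12
```

Memoising on positions rather than enumerating paths is what makes the CEG representation cheap even when the underlying tree has exponentially many root-to-leaf paths.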

Note that the combination rules for path probabilities on CEGs (directly analogous to those for trees) give us that for any 3 positions w1, w2, w3 with w1 ≺ w2 ≺ w3, we have that π(w3 | w1, w2) = π(w3 | w2); that is, the probability that we pass through position w3 given that we have passed through positions w1 and w2 is simply the probability that we pass through position w3 given that we have passed through position w2.

3 Manipulations of CEGs

The simplest types of intervention are of the form Do X = x0 for some variable X and setting x0, and these are really the only interventions that can be satisfactorily analysed using BNs. In this paper we consider a much more general intervention where not only the setting of the manipulated variable, but the variable itself, may be different depending on the settings of other variables within the system.

We can model such interventions by the production of a manipulated CEG Ĉ in parallel with our idle CEG C. In the intervention considered here every path in our CEG is manipulated by having one component edge given a probability of 1 or 0. All edges with zero probabilities and branches stemming from such edges are removed (or pruned) from Ĉ (note that in this paper all edges on any CEG will have non-zero probabilities). We will call such an intervention a singular manipulation, and denote it Do Int.

Definition 4. A subset WX of positions of C qualifies as a singular manipulation set if:

(1) all root-to-sink paths in C pass through exactly one position in pa(WX), where w ∈ pa(WX) if w ≺ w′ for some w′ ∈ WX and there exists an edge e(w, w′)

(2) each position in pa(WX) has exactly one child in WX, by which we mean that for w ∈ pa(WX), there exists exactly one w′ ∈ WX such that there exists an edge e(w, w′)

A singular manipulation is then an intervention such that:

(a) for each w ∈ pa(WX) and corresponding w′ ∈ WX, π̂(w′ | w) = 1


(b) for any w ∈ pa(WX) and w′ ∉ WX such that w ≺ w′ and there exists an edge e(w, w′), then π̂(w′ | w) = 0, and this edge is removed (or pruned) in Ĉ

(c) for any w ∉ pa(WX) and w′ such that w ≺ w′ and there exists an edge e(w, w′), then π̂(w′ | w) = π(w′ | w)

where π̂ is a probability in our manipulated CEG Ĉ.

Let WX = {wj}, pa(WX) = {wij}. Each position in pa(WX) has exactly one child in WX, so elements of pa(WX) can be initially identified by their child in WX (i.e. by the index j). But a position in WX could have more than one parent in pa(WX), so we distinguish these parents by a second index i. For each pair (wij, wj) let Xij be the variable associated with the edge e(wij, wj) and xij be the setting of this variable on this edge.

If we also consider a response variable Y downstream of the set of positions WX, then we can show (using for example Pearl's definition of Do) that:

π̂(y | Do Int) = Σ_{i,j} [π(wij | w0) π(y | wj)]   (3.1)

Pearl's own Back Door expression (below) is a simplification of the general manipulated-probability expression used with BNs:

π̂(y | Do x0) = Σ_z π(y | z, x0) π(z)   (3.2)

Z here is a subset of the measurement-variables of the BN which obey certain conditions. If Z is chosen carefully then we can calculate π̂(y | Do x0) without conditioning on the full set of measurement-variables.

In this paper we use the topology of the CEG to produce an expression analogous to (3.2) for our more general singular manipulation, by using a set of positions WZ downstream of the intervention which can stand in for the set of positions WX used in expression (3.1). As with Pearl's expression, the use of such a set WZ will reduce the complexity of the general expression (3.1) as well as

possibly allowing us to sidestep identifiability problems associated with it.

Following Pearl, we have two conditions which, if satisfied, are sufficient for WZ to be considered a Back Door blocking set. We give the first here, and the second following a few further definitions.

(A) For all wj ∈ WX, every wj → w∞ path in C must pass through exactly one position wk ∈ WZ

The obvious notation for use with CEGs is a path-based one. However, most practitioners will be more familiar with expressions such as (3.2), so we here develop a few ideas to allow us to express our causal expression in a similar fashion. The first step in this process is to note that any position w in a CEG has a unique set q(w) associated with it, where:

- Q(w) is the minimum set of variables, by specifying the settings (or values or levels) of which, we can describe the union of all w0 → w paths
- q(w) are the settings of Q(w) which fully describe the union of all w0 → w paths

Formally this means that:

q(w) = ∪_{λ∈Λw} q(λ)

where q(λ) are the settings on the w0 → w path λ, and Λw is the set of all w0 → w paths.

Letting Z(w) be the set of variables encountered on edges upstream of w, X(w) be the set of variables encountered on edges downstream of w, and R(w) = Z(w)\Q(w), we note that the conditional independence statement encoded by the position w is of the form:

X(w) ⊥⊥ R(w) | q(w)

In the CEG in Figure 2 for example, X(w4) ⊥⊥ R(w4) | q(w4) tells us that product quality is independent of the monitoring system responses, given that B is not replaced.

Note that the definition of q(w) means that the variable-settings within it might not always correspond to simple values of the variables within Q(w). None-the-less, we have


found that q(w) is typically simpler than each individual q(λ).

We now use the ideas outlined above to expand the expression (3.1). As the position wk is uniquely defined by q(wk), we can write, without ambiguity, π(y | wk) = π(y | q(wk)). Using condition (A) we get:

π̂(y | Do Int)   (3.3)
= Σ_{i,j} [π(wij | w0) Σ_k π(wk, y | wj)]
= Σ_{i,j,k} π(wij | w0) π(y | wj, wk) π(wk | wj)
= Σ_{i,j,k} π(wij | w0) π(y | wk) π(wk | wj)
= Σ_k [Σ_{i,j} π(wij | w0) π(wk | wj)] π(y | q(wk))

The equivalence of π(y | wj, wk) and π(y | wk) is a consequence of the equivalence of π(w3 | w1, w2) and π(w3 | w2) noted at the end of section 2, and proved in Thwaites & Smith (2006b).

We now need a number of technical definitions before we can introduce our second condition and proceed to our causal expression. Examples illustrating these definitions can be found in Thwaites & Smith (2006b).

Now, the position wk can also be fully described by the union of disjoint events, each of which is (by conditions (1) and (A)) a w0 → wij → wk path for some wij ∈ pa(WX). These events divide into 2 distinct sets:

(1) w0 → wij → wj → wk paths

(2) paths that do not utilise the xij edge when leaving wij (formally w0 → wij → w′ → wk paths where there exists an edge e(wij, w′), but w′ ∉ WX).

We can combine the events in set (1) into composite or C-paths, so that each C-path passes through exactly one wij and can be uniquely characterised by a pair qij(wk) = (xij, zij(wk)), where zij is defined as follows:

- Zij(wk) is the minimum set of variables, by specifying the settings of which, we can describe the union of all w0 → wij → wj → wk paths (with Xij excluded from this set)

- zij(wk) are the settings of Zij(wk) which (with the addition of Xij = xij) fully describe the union of all w0 → wij → wj → wk paths

Definition 5. We express this formally as: Let qij(wk) = ∪_{λ∈Λ} q(λ), where q(λ) are the settings on the w0 → wij → wj → wk path λ, and Λ is the set of all w0 → wij → wj → wk paths in C. Let Qij(wk) be the set of variables present in qij(wk). Define Zij(wk) as Qij(wk)\Xij. Let zij(wk) be the settings of Zij(wk) compatible with qij(wk).

Our Xij, xij, Zij(wk), zij(wk) are directly analogous to Pearl's X, x, Z and z in expression (3.2), and fulfil similar roles in our final causal expression.

The following rather technical definitions are only required for an understanding of the proof. Definition 7 deals with the idea of a descendant, which is very similar to the analogous idea in BNs, and is needed for condition (B).

Definition 6. Define q(wij) analogously with the definition of q(w). Let Q(wij) be the set of variables present in q(wij). Define Qj(wk) as Zij(wk)\Q(wij). Note that Q(wij) ⊆ Zij(wk). Let qj(wk) be the settings of Qj(wk) compatible with zij(wk) (or qij(wk)). We can therefore write:

zij(wk) = (q(wij), qj(wk))
qij(wk) = (xij, zij(wk)) = (q(wij), xij, qj(wk))

We can also combine the events in set (2) into C-paths, each of which can be uniquely characterised by rij(wk) = ∪_{μ∈M} q(μ), where q(μ) are the settings on the w0 → wij → w′ → wk path μ, and M is the set of all w0 → wij → w′ → wk paths in C. We can therefore write:

q(wk) = [∪_{i,j} qij(wk)] ∪ [∪_{i,j} rij(wk)]

Definition 7. Consider variables A, D, {Bm} defined on our CEG C. Then D is a descendant of A in C if there exists a sequence


of (not necessarily adjacent) edges e1, …, en forming part of a w0 → w∞ path in C where the edges e1, …, en are labelled respectively b1 | (a, …), b2 | (b1, …), …, b_{n−1} | (b_{n−2}, …), d | (b_{n−1}, …), or if there exists an edge forming part of a w0 → w∞ path in C labelled d | (a, …); where a, b1, b2, …, b_{n−1}, d are settings of A, B1, B2, …, B_{n−1}, D.

We are now in a position to state our second condition.

(B) In the sub-CEG Cij with wij as root-node, Qj(wk) must contain no descendants of Xij for all i, j, for each position wk

Checking that condition (B) is fulfilled is actually straightforward on a CEG, especially since we will know which manipulations we intend to investigate, and can usually construct our CEG so as to make Qj(wk) as small as possible for all values of j, k.

We can now replace expression (3.3) by a Back Door expression for singular manipulations:

Proposition 1.

π̂(y | Do Int) = Σ_k [Σ_{i,j} π(zij(wk))] π(y | q(wk))   (3.4)

Proof. Consider

π(wk | wj) = π(wk | wij, wj)
= π(q(wk) | q(wij), xij)
= π([∪_{m,n} qmn(wk)] ∪ [∪_{m,n} rmn(wk)] | q(wij), xij)
= Σ_{m,n} π(qmn(wk) | q(wij), xij) + Σ_{m,n} π(rmn(wk) | q(wij), xij)   (since disjoint)
= π(qij(wk) | q(wij), xij) + π(rij(wk) | q(wij), xij)
= π(qij(wk) | q(wij), xij)
= π(q(wij), xij, qj(wk) | q(wij), xij)
= π(qj(wk) | q(wij), xij)

But this is simply the probability that Qj(wk) = qj(wk) given that Xij = xij in the sub-CEG Cij. Condition (B) implies that Xij ⫫ Qj(wk) in Cij, since Xij has no parents in this CEG. So we get:

π(wk | wj) = π(qj(wk) | q(wij))

Substituting this into expression (3.3), we get:

π(y | Do Int)
= Σ_k [Σ_{i,j} π(wij | w0) π(qj(wk) | q(wij))] π(y | q(wk))
= Σ_k [Σ_{i,j} π(q(wij)) π(qj(wk) | q(wij))] π(y | q(wk))
= Σ_k [Σ_{i,j} π(zij(wk))] π(y | q(wk)). ∎

It is possible to show that the expression π(y | q(wk)) can be replaced by a probability conditioned on a single w0 → wk path, and moreover that even on that path Y may well be independent of some of the variables encountered given the path-settings of the others; for details see Thwaites & Smith (2006b).

We also noted earlier that q(w) = ∪_λ q(λ) is typically simpler than each individual q(λ). In most instances Σ_{i,j} π(zij(wk)) will be the probability of a union of disjoint events, which will also typically be simpler than an individual zij(wk). We can deduce that calculating Σ_{i,j} π(zij(wk)) is unlikely to be a complex task.

We conclude this section by demonstrating how expression (3.4) is related to Pearl's Back Door expression (3.2). Consider the intervention Do X = x0, and let Xij = X and xij = x0 for all i, j. Combine all our w0 → wij → wj → wk C-paths, and write ∪_{i,j} qij(wk) = (x0, z(wk)). Rephrase conditions (2) and (B) as:

(2) each position in pa(WX) has exactly one of its outward edges in C labelled x0, and this edge joins the position in pa(WX) to a position in WX


(B) Z(wk) must contain no descendants of X (where Z(wk) is defined from z(wk) in the obvious manner).

Then with a little work, we can replace expression (3.4) by:

π(y | Do Int) = Σ_k π(z(wk)) π(y | x0, z(wk))

If Z(wk) contains the same variables for all k, and z(wk) runs through all settings of Z(wk) as we run through all wk, then this expression reduces to Pearl's expression (3.2).

4 Causal Analysis on CEGs and BNs

The principal advantage that CEGs have over BNs when it comes to causal analysis is their flexibility. In a BN the kinds of manipulation we can consider are severely restricted (see for example section 2.6 of Lauritzen (2001), where he comments on Shafer (1996)), whereas using a CEG we can tackle not only the conventional Do X = x0 manipulations of symmetric models, but also the analysis of interventions on asymmetric models, and manipulations where both the manipulated variable and the manipulated-variable value can differ for different settings of other variables. It is also the case that our blocking sets are sets of values or settings, and do not need to correspond to any fixed subset of the original problem random variables.

For simplicity of exposition in this paper we have not fully exploited the potential flexibility of the CEG, considering only formulae associated with a blocking-set WZ of positions downstream of WX. We can also consider sets of stages upstream of WX, and combinations of the two. Also, we have only discussed one particular fairly coarse example of an intervention. There are often circumstances where some paths in our CEG are not manipulated at all, for example in a treatment regime where only patients with certain combinations of symptoms (i.e. at certain positions or stages) are treated. There are also non-singular interventions where (for instance) a manipulation, rather than forcing a path to follow one specific edge at some

vertex, instead provides a probability distribution for the outgoing edges of that vertex.

So not only are CEGs ideal representations of asymmetric discrete models, retaining the convenience of a tree-form for describing how a process unfolds but also expressing a rich collection of the model's conditional independence properties, but their event-based causal analysis has distinct advantages over the variable-based analysis one performs on Bayesian Networks.

References

A.P. Dawid. 2002. Influence Diagrams for Causal Modelling and Inference. International Statistical Review 70.

M. Jaeger. 2002. Probabilistic Decision Graphs – Combining Verification and AI Techniques for Probabilistic Inference. In PGM'02 Proceedings of the 1st European Workshop on Probabilistic Graphical Models, pages 81-88.

S.L. Lauritzen. 2001. Causal Inference from Graphical Models. In O.E. Barndorff-Nielsen et al. (Eds) Complex Stochastic Systems. Chapman and Hall / CRC.

J. Pearl. 2000. Causality: models, reasoning and inference. Cambridge University Press.

E. Riccomagno and J.Q. Smith. 2005. The Causal Manipulation and Bayesian Estimation of Chain Event Graphs. CRiSM Research Report no. 05-16, University of Warwick.

G. Shafer. 1996. The Art of Causal Conjecture. MIT Press.

J.Q. Smith and P.E. Anderson. 2006. Conditional independence and Chain Event Graphs. Accepted, subject to revision, by Journal of Artificial Intelligence.

P.A. Thwaites and J.Q. Smith. 2006a. Non-symmetric models, Chain Event Graphs and propagation. In IPMU'06 Proceedings of the 11th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems.

P.A. Thwaites and J.Q. Smith. 2006b. Singular manipulations on Chain Event Graphs. Warwick Research Report, University of Warwick.


Severity of Local Maxima for the EM Algorithm:Experiences with Hierarchical Latent Class Models

Yi Wang and Nevin L. Zhang
Department of Computer Science and Engineering
Hong Kong University of Science and Technology
Hong Kong, China

Abstract

It is common knowledge that the EM algorithm can be trapped at local maxima and consequently fails to reach global maxima. We empirically investigate the severity of this problem in the context of hierarchical latent class (HLC) models. Our experiments were run on HLC models where dependency between neighboring variables is strong. (The reason for focusing on this class of models will be made clear in the main text.) We first ran EM from randomly generated single starting points, and observed that (1) the probability of hitting global maxima is generally high, (2) it increases with the strength of dependency and sample sizes, and (3) it decreases with the amount of extreme probability values. We also observed that, at high dependence strength levels, local maxima are far apart from global ones in terms of likelihoods. These observations imply that local maxima can be reliably avoided by running EM from a few starting points and hence are not a serious issue. This is confirmed by our second set of experiments.

1 Introduction

The EM algorithm (Dempster et al., 1977) is a popular method for approximating the maximum likelihood estimate in the case of incomplete data. It is widely used for parameter learning in models, such as mixture models (Everitt and Hand, 1981) and latent class models (Lazarsfeld and Henry, 1968), that contain latent variables.

A well known problem associated with EM is that it can be trapped at local maxima and consequently fails to reach global maxima (Wu, 1983). One simple way to alleviate the problem is to run EM many times from randomly generated starting points, and take the highest likelihood obtained as the global maximum. One problem with this method is that it is computationally expensive when the number of starting points is large, because EM converges slowly. For this reason, researchers usually adopt the multiple restart strategy (Chickering and Heckerman, 1997; van de Pol et al., 1998; Uebersax, 2000; Vermunt and Magidson, 2000): First run EM from multiple random starting points for a number of steps, then pick the one with the highest likelihood, and continue EM from the picked point until convergence. In addition to multiple restart EM, several more sophisticated strategies for avoiding local maxima have also been proposed (Fayyad et al., 1998; Ueda and Nakano, 1998; Ueda et al., 2000; Elidan et al., 2002; Karciauskas et al., 2004).

While there is abundant work on avoiding local maxima, we are aware of little work on the severity of the local maxima issue. In this paper, we empirically investigate the severity of local maxima for hierarchical latent class (HLC) models. Our experiments were run on HLC models where dependency between neighboring variables is strong. This class of models was chosen because we use HLC models to discover latent structures. It is our philosophical view that one cannot expect to discover latent structures reliably unless observed variables strongly depend on latent variables (Zhang, 2004; Zhang and Kocka, 2004).


In the first set of experiments, we ran EM from randomly generated single starting points, and observed that (1) the probability of hitting global maxima is generally high, (2) it increases with the strength of dependency and sample sizes, and (3) it decreases with the amount of extreme probability values. We also observed that, at high dependence strength levels, local maxima are far apart from global ones in terms of likelihoods.

Those observations have immediate practical implications. Earlier in this section, we mentioned a simple local-maxima avoidance method. We pointed out one of its problems, i.e. its high computational complexity, and said that multiple restart can alleviate this problem. There is another problem with the method: there is no guidance on how many starting points to use in order to avoid local maxima reliably. Multiple restart provides no solution for this problem.

Observations from our experiments suggest a guideline for strong dependence HLC models. As a matter of fact, they imply that local maxima can be reliably avoided by using multiple restart and a few starting points. This is confirmed by our second set of experiments.

The remainder of this paper is organized as follows. In Section 2, we review some basic concepts about HLC models and the EM algorithm. In Sections 3 and 4, we report our first and second sets of experiments respectively. We conclude this paper and point out some potential future work in Section 5.

2 Background

2.1 Hierarchical Latent Class Models

Hierarchical latent class (HLC) models (Zhang, 2004) are tree-structured Bayesian networks where the leaf nodes are observed while the internal nodes are hidden. An example HLC model is shown in Figure 1. Following the conventions in the latent variable model literature, we call the leaf nodes manifest variables and the internal nodes latent variables.¹

¹In this paper, we do not distinguish between nodes and variables.

Figure 1: An example HLC model. X1, X2, X3 are latent variables, and Y1, Y2, ..., Y7 are manifest variables.

We usually write an HLC model as a pair M=(m,θ). The first component m consists of the model structure and the cardinalities of the variables. The second component θ is the collection of parameters. Two HLC models M=(m,θ) and M′=(m′,θ′) are marginally equivalent if they share the same set of manifest variables Y and P(Y|m,θ)=P(Y|m′,θ′).

We denote the cardinality of a variable X by |X|. For a latent variable X in an HLC model, denote the set of its neighbors by nb(X). An HLC model is regular if, for any latent variable X,

|X| ≤ (∏_{Z∈nb(X)} |Z|) / max_{Z∈nb(X)} |Z|,

where the inequality holds strictly when X has only two neighbors, one of which is a latent variable. Zhang (2004) has shown that an irregular HLC model can always be reduced to a marginally equivalent HLC model that is regular and contains fewer independent parameters. Henceforth, our discussions are restricted to regular HLC models.
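To make the regularity bound concrete, it can be checked mechanically from the cardinalities of a latent variable and its neighbors. The sketch below is our own illustration of the condition above; the function name and encoding are hypothetical, not from the paper.

```python
from math import prod

def is_regular(card_x, neighbor_cards, has_latent_neighbor=False):
    """Check the regularity bound for one latent variable X.

    card_x: cardinality |X|; neighbor_cards: cardinalities of nb(X).
    The bound is |X| <= prod(|Z|) / max(|Z|) over Z in nb(X), holding
    strictly when X has exactly two neighbors, one of which is latent.
    """
    bound = prod(neighbor_cards) / max(neighbor_cards)
    if len(neighbor_cards) == 2 and has_latent_neighbor:
        return card_x < bound
    return card_x <= bound

# X1 in Figure 1 has neighbors X2, Y1, X3, all ternary:
# bound = 3 * 3 * 3 / 3 = 9, so |X1| = 3 satisfies regularity.
print(is_regular(3, [3, 3, 3]))  # prints True
```

With only two neighbors, one latent, a ternary X would need |X| strictly below the bound, which is why `is_regular(3, [3, 3], has_latent_neighbor=True)` fails.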

2.2 Strong Dependence HLC Models

The strength of dependency between two variables is usually measured by mutual information or correlation (Cover and Thomas, 1991). However, there is no general definition of strong dependency for HLC models yet. In this study, we use the operational definition described in the following paragraph.

Consider a probability distribution. We call the component with the largest probability mass the major component. We say that an HLC model is a strong dependence model if

• The cardinality of each node is no smaller than that of its parent,

• The major components in all conditional distributions are larger than 0.5, and

• In each conditional probability table, the major components of different rows are located in different columns.

In general, the larger the major components, the higher the dependence strength (DS) level. In the extreme case, when all major components are equal to 1, the HLC model becomes deterministic. Strong dependence HLC models defined in this way have been examined in our previous work on discovering latent structures (Zhang, 2004; Zhang and Kocka, 2004). The results show that such models can be reliably recovered from data.
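As an illustration of how parameters meeting these three conditions might be generated, the sketch below builds one conditional probability table whose major components are drawn from a chosen DS-level interval. This is our own reconstruction of the setup, not the authors' code; all names are hypothetical.

```python
import random

def strong_cpt(n_rows, n_cols, ds_interval=(0.7, 0.8), rng=random):
    """Random CPT (rows = parent states, columns = child states) satisfying
    the strong-dependence conditions: each row's major component lies in
    ds_interval (above 0.5), and the majors of different rows sit in
    different columns. Requires n_cols >= n_rows, matching the condition
    that a child's cardinality is no smaller than its parent's."""
    assert n_cols >= n_rows and ds_interval[0] > 0.5
    major_cols = rng.sample(range(n_cols), n_rows)  # distinct major columns
    cpt = []
    for r in range(n_rows):
        major = rng.uniform(*ds_interval)
        rest = [rng.random() for _ in range(n_cols - 1)]
        scale = (1.0 - major) / sum(rest)           # remaining mass
        row = [v * scale for v in rest]
        row.insert(major_cols[r], major)            # place the major component
        cpt.append(row)
    return cpt

cpt = strong_cpt(3, 3, ds_interval=(0.8, 0.9))      # one ternary CPT, DS level 4
```

Each row sums to one, its maximum exceeds 0.5, and the maxima of the three rows occupy three different columns.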

2.3 EM Algorithm

Latent variables can never be observed and their values are always missing in data. This fact complicates the maximum likelihood estimation problem, since we cannot compute sufficient statistics from incomplete data records. A common method to deal with such situations is to use the expectation-maximization (EM) algorithm (Dempster et al., 1977). The EM algorithm starts with a randomly chosen estimate of the parameters, and iteratively improves this estimate by increasing its loglikelihood. In each EM step, the task of increasing the loglikelihood is delegated to the maximization of the expected loglikelihood function, which is a lower bound of the loglikelihood function. It is defined as

Q(θ|D,θ^t) = Σ_{l=1}^{m} Σ_{X_l} P(X_l|D_l,θ^t) log P(X_l,D_l|θ),

where D = D_1, D_2, ..., D_m denotes the collection of data, X_l denotes the set of variables whose values are missing in D_l, and θ^t denotes the estimate at the t-th step. The EM algorithm terminates when the increase in loglikelihood between two successive steps is smaller than a predefined stopping threshold.
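As a concrete illustration of the E-step/M-step iteration and the stopping rule, here is EM for a much simpler model, a two-component Bernoulli mixture. The toy model and all names are our own, not the HLC setting used in the paper.

```python
import math

def em_bernoulli_mixture(data, theta, threshold=1e-4, max_steps=1000):
    """EM for a mixture of two biased coins: theta = (w, p1, p2), with w the
    weight of component 1 and p1, p2 the heads probabilities. Iterates until
    the loglikelihood gain between successive steps falls below threshold."""
    w, p1, p2 = theta

    def mix(x):  # mixture probability of a single observation x in {0, 1}
        return w * p1**x * (1 - p1)**(1 - x) + (1 - w) * p2**x * (1 - p2)**(1 - x)

    prev = sum(math.log(mix(x)) for x in data)
    for _ in range(max_steps):
        # E-step: posterior responsibility of component 1 for each record
        resp = [w * p1**x * (1 - p1)**(1 - x) / mix(x) for x in data]
        # M-step: maximize the expected loglikelihood Q(theta | D, theta_t)
        w = sum(resp) / len(data)
        p1 = sum(r * x for r, x in zip(resp, data)) / sum(resp)
        p2 = sum((1 - r) * x for r, x in zip(resp, data)) / (len(data) - sum(resp))
        cur = sum(math.log(mix(x)) for x in data)
        if cur - prev < threshold:  # the stopping rule described above
            break
        prev = cur
    return (w, p1, p2), cur

data = [1] * 70 + [0] * 30
(w, p1, p2), ll = em_bernoulli_mixture(data, (0.4, 0.6, 0.3))
```

After any M-step the fitted mixture mean w·p1 + (1−w)·p2 equals the empirical mean of the data, which is one easy sanity check on the updates.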

It is common knowledge that the EM algorithm can be trapped at local maxima and consequently fails to reach global maxima (Wu, 1983). The specific result depends on the choice of the starting point. A common method to avoid local maxima is to run EM many times with randomly generated starting points, and pick the instance with the highest likelihood as the final result. The more starting points it uses, the higher the chance it can reach the global maximum. However, due to the slow convergence of EM, this method is computationally expensive. A more feasible method, called multiple restart EM, is to run EM with multiple random starting points, and retain the one with the highest likelihood after a specified number of initial steps. This method and its variants are commonly used to learn latent variable models in practice (Chickering and Heckerman, 1997; van de Pol et al., 1998; Uebersax, 2000; Vermunt and Magidson, 2000). Other work on escaping from poor local maxima includes (Fayyad et al., 1998; Ueda and Nakano, 1998; Ueda et al., 2000; Elidan et al., 2002; Karciauskas et al., 2004).

3 Severity of Local Maxima

Here is the strategy that we adopt to empirically investigate the severity of local maxima in strong dependence HLC models: (1) create a set of strong dependence models, (2) sample some data from each of the models, (3) learn model parameters from the data by running EM to convergence from a number of starting points, and (4) graph the final loglikelihoods obtained.

The final loglikelihoods for different starting points could be different due to local maxima. Hence, an inspection of their distribution would give us a good idea about the severity of local maxima.

3.1 Experiment Setup

The structure of all models used in our experiments was the ternary tree with height equal to 3, as shown in Figure 2. The cardinalities of all variables were set at 3. Parameters were randomly generated subject to the strong dependency condition. We examined 5 DS levels, labeled from 1 to 5. They correspond to restricting the major components within the following 5 intervals: [0.5, 0.6), [0.6, 0.7), [0.7, 0.8), [0.8, 0.9), and [0.9, 1.0]. For each DS level, 5 different parameterizations were generated. Consequently,


Figure 2: The structure of generative models.

we examined 25 models in total.

From each of the 25 models, four training sets of 100, 500, 1000, and 5000 records were sampled. On each training set, we ran EM independently 100 times, each time starting from a randomly selected initial point. The stopping threshold of EM was set at 0.0001. The highest loglikelihood obtained in the 100 runs is regarded as the global maximum.

3.2 Results

The results are summarized in Figures 3 and 4. In Figure 3, there is a plot for each combination of DS level (row) and sample size (column). As mentioned earlier, 5 models were created for each DS level. The plot is for one of those 5 models. In the plot, there are three curves: solid, dashed, and dotted.² The dashed and dotted curves are for Section 4. The solid curve is for this section. The curve depicts a distribution function F(x), where x is loglikelihood and F(x) is the percent of EM runs that achieved a loglikelihood no larger than x.

While a plot in Figure 3 is about one model at a DS level, a plot in Figure 4 represents an aggregation of results about all 5 models at a DS level. Loglikelihoods for different models can be in very different ranges. To aggregate them in a meaningful way, we introduce the concept of the relative likelihood shortfall of an EM run.

Consider a particular model. We have run EM 100 times and hence obtained 100 loglikelihoods. The maximum is regarded as the optimum and is denoted by l∗. Suppose a particular EM run resulted in loglikelihood l. Then the relative likelihood shortfall of that EM run is defined as (l−l∗)/l∗. This value is nonnegative.

²Note that in some plots (e.g., that for DS level 4 and sample size 5000) the curves overlap and are indistinguishable.

The smaller it is, the higher the quality of the parameters produced by the EM run. In particular, the relative likelihood shortfall of the run that produced the global maximum l∗ is 0.

For a given DS level, there are 5 models and hence 500 EM runs. We put the relative likelihood shortfalls of all those EM runs into one set and let, for any nonnegative real number x, F(x) be the percent of the elements in the set that are no larger than x. We call F(x) the distribution function of (aggregated) relative likelihood shortfalls of EM runs, or simply the distribution of EM relative likelihood shortfall, for the DS level.
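Computing the shortfalls and the distribution function from a list of final loglikelihoods is straightforward; the helper names in this sketch are our own.

```python
def relative_shortfalls(logliks):
    """Relative likelihood shortfall (l - l*) / l* for each EM run, where
    l* is the best (largest) loglikelihood obtained. Loglikelihoods are
    negative, so the result is nonnegative."""
    l_star = max(logliks)
    return [(l - l_star) / l_star for l in logliks]

def distribution_function(shortfalls, x):
    """F(x): fraction of runs whose shortfall is no larger than x."""
    return sum(s <= x for s in shortfalls) / len(shortfalls)

runs = [-925.0, -915.2, -915.2, -918.7, -915.2]  # toy final loglikelihoods
s = relative_shortfalls(runs)
print(distribution_function(s, 0.0))  # fraction of runs hitting the maximum: 0.6
```

The value of F at x = 0 is exactly the frequency of hitting the global maximum, which is how the curves in Figure 4 are read in the next subsection.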

The first 5 plots in Figure 4 depict the distributions of EM relative likelihood shortfall for the 5 DS levels. There are four curves in each plot, each corresponding to a sample size.³

3.2.1 Probability of Hitting Global Maxima

The most interesting question is how often EM hits global maxima. To answer this question, we first look at the solid curves in Figure 3. Most of them are stair-shaped. In each curve, the x-position of the right-most stair is the global maximum, and the height of that stair is the frequency of hitting the global maximum.

We see that the frequency of hitting global maxima was generally high for high DS levels. In particular, for DS level 3 or above, EM reached global maxima more than half of the time. For sample size 500 or above, the frequency was even greater than 0.7. On the other hand, the frequency was low for DS level 1, especially when the sample size was small.

The first 5 plots in Figure 4 tell the same story. In those plots, the global maxima are represented by x=0. The heights of the curves at x=0 are the frequencies of hitting global maxima. We see that for DS level 3 or above, the frequency of hitting global maxima is larger than 0.5, except for DS level 3 and sample size 100. And once again, the frequency was low for DS level 1.

³Note that in Figure 4 (b) and (c) some curves are close to the y-axes and are hardly distinguishable.


Figure 3: Distributions of loglikelihoods obtained by different EM runs, for each combination of DS level (rows 1-5) and sample size (columns: 100, 500, 1000, and 5000 records). Solid curves are for EM runs with single starting points. Dashed and dotted curves are for runs of multiple restart EM with settings 4×10 and 16×50 respectively (see Section 4). Each curve depicts a distribution function F(x), where x is loglikelihood and F(x) is the percent of runs that achieved a loglikelihood no larger than x.


Figure 4: Distributions of EM relative likelihood shortfall for different DS levels: (a) DS level 1, (b) DS level 2, (c) DS level 3, (d) DS level 4, (e) DS level 5, (f) DS level 3 with 15% zero components. Each plot shows curves for sample sizes of 100, 500, 1000, and 5000 records.

3.2.2 DS Level and Local Maxima

To see how the DS level influences the severity of local maxima, read each column of Figure 3 (for a given sample size) from top to bottom. Notice that, in general, the height of the right-most stair increases with the DS level. This means that the frequency of hitting global maxima increases with the DS level. The same conclusion can be read off from Figure 4. Compare the curves for a given sample size across the first 5 plots. In general, their values at x=0 rise as we move from DS level 1 to DS level 5.

3.2.3 Extreme Parameter Values and Local Maxima

There are some exceptions to the regularities mentioned in the previous paragraph. We see in both figures that the frequency of hitting global maxima decreases as the DS level changes from 4 to 5. By comparing Figure 4 (c) and (d), we also find that the frequency slightly drops when the DS level changes from 3 to 4, given that the sample size is large.

We conjecture that this is because, in the generative models for DS levels 4 and 5, there are many parameter values close to 0. It has been reported in the latent variable model literature that extreme parameter values cause local maxima (Uebersax, 2000; McCutcheon, 2002).

To confirm our conjecture, we created another set of five models for DS level 3. The parameters were generated in the same way as before, except that we randomly set 15% of the non-major components in the probability distributions to 0. We then repeated the experiments for this set of models. The EM relative likelihood shortfall distributions are given in Figure 4 (f). We see that, as expected, the frequency of hitting global maxima did decrease considerably compared with Figure 4 (c).

3.2.4 Sample Size and Local Maxima

We also noticed the power of the sample size. By going through each row of Figure 3, we found that the frequency of hitting global maxima increases with the sample size. The phenomenon is more apparent if we look at Figure 4, where curves for different sample sizes are plotted in the same picture. As the sample size goes up, the curve becomes steeper and its value at x=0 increases significantly.

3.2.5 Quality of Local Maxima

In addition to the frequency of hitting global maxima, another interesting question is how bad local maxima can be. Let us first examine the local maxima in Figure 3. They are marked by the x-positions of the inflection points on the curves. For DS level 1, the local maxima are not far from the global ones. The discrepancies are less than 15. Moreover, there are a lot of inflection points on the curves, namely, distinct local maximal solutions. They are distributed evenly within the interval between the worst local maxima and the global ones.

For the other DS levels, things are different. We first observed that the quality of local maxima can be extremely poor in some situations. The worst case is that of DS level 5 with sample size 5000. The loglikelihood of a local maximum can be lower than that of the global one by more than 4000. The second thing we observed is that the curves contain far fewer inflection points. In other words, there are only a few distinct local maximal solutions in such cases. Moreover, those local maxima stay far apart from global ones, since the steps of the staircases are fairly large. These observations can be confirmed by studying the first 5 plots in Figure 4, where the curves and the inflection points can be interpreted similarly.

4 Performance of Multiple Restart EM

We emphasize two observations mentioned in the previous section: (1) the probability of hitting global maxima is generally high, and (2) likelihoods of local maxima are far apart from those of global maxima at high DS levels. We will see that these observations have immediate implications for the performance of multiple restart EM.

We say that a starting point is optimal if it converges to the global maximum. The first observation can be restated as follows: the probability for a randomly generated starting point to be optimal is generally high. Consequently, it is almost certain that there is an optimal starting point within a few randomly generated ones.

As is well known, the EM algorithm increases the likelihood quickly in its early stage and slows down when it is converging. In other words, the likelihood should become relatively stable after a few steps. Therefore, the second observation implies that we can easily separate the optimal starting point from the others after running a few EM steps on them.

A straightforward consequence of the above inference is that multiple restart EM with a few starting points and initial steps should reliably avoid local maxima for strong dependence HLC models. To confirm this conjecture, we ran multiple restart EM independently 100 times on each training set that was presented in Figure 3. We tested two settings for multiple restart EM: (1) 4 random starting points with 10 initial steps (in short, 4×10), and (2) 16 random starting points with 50 initial steps (in short, 16×50). As before, we plotted the distributions of loglikelihoods in Figure 3. The dashed and the dotted curves denote the results for the settings 4×10 and 16×50, respectively.
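The procedure being tested (n random starting points, a fixed number of initial steps, then continue only the best run to convergence) can be sketched generically as follows. The callback interface is our own invention, and the toy step function merely stands in for a real model-specific EM update.

```python
def multiple_restart_em(em_step, init, n_starts=4, n_initial_steps=10,
                        threshold=1e-4, max_steps=10000):
    """Multiple restart EM: run n_initial_steps EM steps from each of
    n_starts random starting points, keep the point with the highest
    loglikelihood, then continue it until the loglikelihood gain drops
    below threshold.  em_step(theta) -> (new_theta, loglikelihood)."""
    candidates = []
    for _ in range(n_starts):
        theta, ll = init(), float("-inf")
        for _ in range(n_initial_steps):
            theta, ll = em_step(theta)
        candidates.append((ll, theta))
    ll, theta = max(candidates, key=lambda c: c[0])
    for _ in range(max_steps):  # continue the picked run to convergence
        theta, new_ll = em_step(theta)
        if new_ll - ll < threshold:
            break
        ll = new_ll
    return theta, ll

# Toy stand-in: each "EM step" halves the distance to the optimum t = 2,
# with surrogate loglikelihood -(t - 2)^2 (not a real EM update).
def toy_step(t):
    t = t + 0.5 * (2.0 - t)
    return t, -(t - 2.0) ** 2

starts = iter([-8.0, 5.0, 1.0, 9.0])
theta, ll = multiple_restart_em(toy_step, lambda: next(starts))
```

The default arguments correspond to the 4×10 setting; passing `n_starts=16, n_initial_steps=50` gives 16×50.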

From Figure 3, we see that multiple restart EM with setting 16×50 can reliably avoid local maxima for DS level 2 or above. Indeed, the dotted curves are parallel to the y-axes except for DS level 2 and sample size 100. This means that global maxima can always be reached in such cases. For setting 4×10, similar behavior is observed for DS level 4 or 5 and sample size 500 or above. Note that the dashed and dotted curves overlap in those plots.

Nonetheless, we also notice that, for DS level 1, multiple restart EM with both settings still cannot find global maxima reliably. This is consistent with our reasoning. As we have mentioned in Section 3.2.1, when we ran EM with randomly generated single starting points, the frequency of hitting global maxima was low for DS level 1. In other words, it is hard to hit an optimal starting point by chance. Moreover, due to the small discrepancy among local maxima (see Section 3.2.5), it demands a large number of initial steps to distinguish an optimal starting point from the others. Therefore, the effectiveness of multiple restart EM degenerates. In such situations, one can either increase the number of starting points and initial steps, or appeal to more sophisticated methods for avoiding local maxima.

5 Conclusions

We have empirically investigated the severity of local maxima for EM in the context of strong dependence HLC models. We have observed that (1) the probability of hitting global maxima is generally high, (2) it increases with the strength of dependency and sample sizes, (3) it decreases with the amount of extreme probability values, and (4) likelihoods of local maxima are far apart from those of global maxima at high dependence strength levels. We have also empirically shown that local maxima can be reliably avoided by using multiple restart EM with a few starting points and hence are not a serious issue.

Our discussion has been restricted to a specific class of HLC models. In particular, we have defined strong dependency in an operational way. One can devise more formal definitions and carry out similar studies for generalized strong dependence models. Another direction for future work is a theoretical exploration to support our empirical findings.

Based on our observations, we have analyzed the performance of multiple restart EM. One can exploit those observations to analyze more sophisticated strategies for avoiding local maxima. We believe that these observations can also inspire the development of new methods in this direction.

Acknowledgments

Research on this work was supported by Hong Kong Grants Council Grant #622105.


308 Y. Wang and N. L. Zhang


Optimal Design with Design Networks

Y. Xiang
University of Guelph, Canada

Abstract

As part of decision-theoretic collaborative design, this work addresses representation and computation issues for a set-based design agent. We refine the definition of design networks to be more expressive for design knowledge and more effective for guiding model construction and verification. We propose a compiled structure of design networks and a suite of algorithms that computes optimal designs efficiently. The result provides a mechanism that combines probabilistic, constraint-based, and decision-theoretic reasoning for single-agent optimal design, and fills in a gap in optimal collaborative design.

1 Introduction

This work concerns optimal decision in centralized and collaborative design, and contributes an algorithm suite that designs optimally, decision-theoretically, and efficiently.

2 Design Networks

Since the focus here is the decision process at a single agent, multiagent issues can be set aside without affecting the integrity of the result presented. In this section, we define a design network (DN) as the key graphical model associated with such an agent. The definition extends that in (Xiang et al., 2004) so that it is not only more expressive in representing design knowledge, but also more restrictive, to better guide model construction and verification. Aspects in (Xiang et al., 2004) are outlined here briefly for completeness; readers are referred to the reference for more details.

A design network is a triple S = (V, E, P) whose structure is a connected DAG G = (V, E). The set of nodes, each of which corresponds to a variable, is V = D ∪ T ∪ M ∪ U. D is a non-empty set of design parameters. T is a set of environmental factors representing the uncertain manufacturing and operating environment for the product under design. M is a non-empty set of objective performance measures of the product. U is a non-empty set of subjective utility functions of the designer.

E is a non-empty set of legal arcs. We refer to distinct endpoints from D as d and d', from T as t and t', from M as m and m', and from U as u and u', respectively. There are six types of legal arcs:

1. Arc (d, d') signifies that the two parameters are involved in a design constraint.

2. Arc (d, m) represents that performance m depends on design parameter d.

3. Arc (t, t') represents dependency between environmental factors. This type of arc is added to the legal arcs in (Xiang et al., 2004) to encode complex dependence relations among environmental factors.

4. Arc (t, m) signifies that performance m depends on environmental factor t.

5. Arc (m, m') defines m' as a composite performance measure.

6. Arc (m, u) signifies that utility u depends on performance m.

P is a set of potentials, one associated with each node x, in the form of a conditional probability distribution P(x | pa(x)), where pa(x) is the set of parent nodes of x. According to the legal arcs, a design parameter can have only design parameters as parents. If d is a root, P(d) is a constant distribution. Otherwise, P(d | pa(d)) must contain the value 0 (i.e., it is not strictly positive). The non-strictly-positive requirement can be understood as follows: when pa(d) is non-empty, P(d | pa(d)) represents design constraints. It specifies under what configurations of pa(d) certain values of d are illegal, and the value 0 signifies violation of design constraints. Potential P(t | pa(t)) is a typical probability distribution; according to the legal arcs, pa(t) consists of environmental factors only. P(m | pa(m)) is also a typical probability distribution, representing the uncertain dependence of the performance on design, environment, as well as other performance measures.

All utility variables are binary with domain {y, n}. The potential P(u = y | pa(u)) is assigned a utility function u(pa(u)) whose values range in [0, 1] and include the two bounds. According to the legal arcs, pa(u) consists of performance measures only. The potential P(u = n | pa(u)) is assigned 1 - P(u = y | pa(u)). Each node u is also assigned a weight w in [0, 1] such that the weights of all utility nodes sum to one.

With P thus defined, S is syntactically a Bayesian network. Assuming additive independence (Keeney and Raiffa, 1976) among utility variables, the expected utility of a design d,

EU(d) = Σ_i w_i Σ u_i(x_i) P(x_i | d),   (1)

can be computed by standard probabilistic inference in S (Xiang et al., 2004), where d is a configuration of D, i indexes the utility nodes in U, x_i is a configuration of the parents of u_i, and w_i is the weight associated with u_i. Figure 1 shows a trivial example DN.
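The expected-utility computation for a single design, together with the skipping of constraint-violating configurations, can be sketched on a toy network. All tables, numbers, and names below are invented for illustration and are not from the paper.

```python
import itertools

# Toy network: binary design parameters d1, d2; environmental factor t;
# performance m; one utility function with weight 1 (illustrative numbers).
P_t = {0: 0.7, 1: 0.3}
# Constraint potential: P(d2 | d1) contains a 0, so (d1=1, d2=1) is illegal.
P_d2_given_d1 = {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 1.0, (1, 1): 0.0}
# P(m=1 | d1, d2, t): probability that the product performs well.
P_m1 = {(0, 0, 0): 0.4, (0, 0, 1): 0.3, (0, 1, 0): 0.8, (0, 1, 1): 0.6,
        (1, 0, 0): 0.7, (1, 0, 1): 0.5, (1, 1, 0): 0.9, (1, 1, 1): 0.9}
u = {0: 0.0, 1: 1.0}  # utility of performance level m

def expected_utility(d1, d2):
    # EU(d) = sum_m u(m) P(m | d), with P(m | d) = sum_t P(t) P(m | d, t).
    p_m1 = sum(P_t[t] * P_m1[(d1, d2, t)] for t in (0, 1))
    return u[1] * p_m1 + u[0] * (1.0 - p_m1)

def optimal_design():
    # Exhaustive evaluation over all configurations of the design parameters;
    # this is the O(kappa^|D|) enumeration that the paper later improves on.
    best = None
    for d1, d2 in itertools.product((0, 1), repeat=2):
        if P_d2_given_d1[(d1, d2)] == 0.0:
            continue  # violates a design constraint: illegal design
        eu = expected_utility(d1, d2)
        if best is None or eu > best[0]:
            best = (eu, (d1, d2))
    return best
```

Here `optimal_design()` returns the highest-EU legal design; the configuration (1, 1) is skipped as illegal because its constraint potential is 0.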

Figure 1: A trivial DN for PC motherboard, with design parameters d_c_chipset, d_b_chipset, d_memory, d_io_controler, d_12v, d_c_voltage, environmental factors t_temperature, t_humidity, performance measures m_perf, m_io_perf, m_cost, m_stability, and utility nodes u_perf, u_io_perf, u_cost, u_stability.

Restriction of legal arcs does not constrain S sufficiently so that every component is relevant to the design task. We impose the following essentiality requirement to provide further guidance to model construction and to facilitate model verification.

For any design parameter d, if there exists a directed path in S from d to a utility node u, then d is essential; otherwise, d is nonessential. If d is nonessential, then the expected utility of any given design is independent of the value that d takes. In other words, the optimal design over the remaining design parameters is independent of d, and the optimal value for d is undefined. Hence, d can be removed from S without affecting the optimal design over the remaining design parameters and the maximum expected utility.

For any performance measure m, if there exists a directed path in S from a design parameter to m, as well as a directed path from m to a utility node u, then m is essential; otherwise, m is nonessential. Without the path from a design parameter to m, m does not depend on any design; without the path from m to u, m does not influence the optimal design. In either case, m can be removed without affecting the optimal design.

The case for environmental factors differs from the above. Consider a path t -> t', where t is essential and t' is a leaf, and suppose that the value of t' is known at the time of design (an extension to the context for Eqn. (1)). Depending on the known value of t', the impact of t may differ; if a directed path from each environmental factor to a utility node were required, the node t' would be disallowed. Therefore, an environmental factor t is deemed essential if there exists an undirected path in S from t to a performance measure; otherwise, t is nonessential.

We require all design parameters, performance measures, and environmental factors in a design network to be essential.

310 Y. Xiang

An agent equipped with a DN can compute the expected utility EU(d) (Eqn. (1)) for each alternative design using standard probabilistic reasoning. However, finding the optimal design by exhaustively evaluating each design has complexity O(κ^|D|), where κ is the maximum number of values that a design parameter can take. We present a much more efficient method below.

An illegal design violates at least one design constraint. The sooner it can be detected, the sooner the evaluation computation can be directed to alternative designs, and the more efficient the overall design computation. A DN is syntactically a Bayesian network and hence can be compiled into a junction tree (JT). Due to the moralization step in compilation, for each design constraint there exists a cluster in the JT that contains the constrained design parameters, and once a constraint-violating configuration of those parameters is entered into the cluster, every potential value in the cluster becomes 0. This suggests a method to detect illegal designs: when a particular (possibly illegal) design is evaluated, for each cluster that has been assigned P(d | pa(d)) relative to a design parameter d, enter the value of d and the value of each design parameter in pa(d) into the cluster, and then check whether the sum of the cluster's potential values is 0.

Optimal Design with Design Networks 311
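The early-detection idea (enter a design's values into a cluster that encodes a constraint, then check whether the cluster potential sums to zero) can be sketched with one hypothetical cluster; the tables below are invented for illustration.

```python
# Hypothetical cluster potential B(d1, d2, d3) from a compiled junction tree,
# where constraint node d3 has parents d1, d2 and P(d3=0 | d1=0, d2=0) = 0
# encodes a design constraint.
B = {(d1, d2, d3): 1.0 for d1 in (0, 1) for d2 in (0, 1) for d3 in (0, 1)}
B[(0, 0, 0)] = 0.0  # the constraint-violating configuration

def enter_design(B, design):
    # Zero out every entry inconsistent with the entered design values
    # (design maps variable positions in the cluster to entered values).
    return {cfg: (v if all(cfg[i] == x for i, x in design.items()) else 0.0)
            for cfg, v in B.items()}

def is_illegal(B, design):
    # The entered configuration is illegal iff all remaining mass is zero.
    return sum(enter_design(B, design).values()) == 0.0

# is_illegal(B, {0: 0, 1: 0, 2: 0}) -> True  (violates the constraint)
# is_illegal(B, {0: 0, 1: 0, 2: 1}) -> False
```

A positive test rules out, in one check, every design consistent with the entered configuration.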

As soon as such a test is positive, we conclude that all designs consistent with this configuration are illegal, and there is no need to evaluate any one of them.

In addition to early detection of illegal designs, we seek to evaluate legal designs more efficiently, as follows. We assume that some separators in the compiled JT consist of design parameters only. The structure of an example DN is shown in Figure 2; it is converted into the JT in Figure 3, in which three separators consist of design parameters only.

Figure 2: The structure of a design network, with design parameters d_a, d_b, d_c, d_d, environmental factors t_p, t_q, performance measures m_g, m_h, m_i, m_j, and utility nodes u_w, u_v.

Using these separators as boundaries, we compile the JT into a representation, called a division tree, that is more effective for design computation. A division tree for a design network S over a non-empty set V of essential variables consists of: the generating set V; a set of divisions, subsets of V whose union is V; a set of division separators, each an unordered pair of distinct intersecting divisions labeled by their intersection, composed so that the divisions and separators form a junction tree; and a set of division JTs in one-to-one correspondence with the divisions, each having the corresponding division as its generating set.

Figure 3: A junction tree representation of the design network in Figure 2.

As an example, consider a division tree for the DN in Figure 2. The set V of variables in the DN is the generating set. The set of divisions is {V0, V1, V2, V3}, as shown in Figure 4 together with the set of division separators; it can be easily verified that the graph depicted is a junction tree. The set of division JTs is {ST0, ST1, ST2, ST3}, as shown in Figure 5, where the generating set of division JT STi is division Vi.

Figure 4: The division tree for the design network in Figure 2 (divisions V0, V1, V2, V3). Each double oval represents a division; each box represents a separator.

Once the DN has been compiled into a JT, construction of a division tree is straightforward: the division JTs are obtained from the JT by removing the separators that are composed of design parameters only. The generating sets of the resulting division JTs become the divisions, and the removed separators become the division separators. It is a simple matter to show that the divisions and division separators thus constructed form a junction tree.

Figure 5: Division JTs for the division tree in Figure 4. Division JT STi corresponds to division Vi.

Each division contains a non-empty subset of design parameters, since it contains at least the design parameters that form the separator between itself and an adjacent division. First, for each configuration of the division's design parameters, whether it is a legal partial design needs to be determined, to the extent allowed by information available in the division; this can be achieved as outlined above. A division may or may not contain utility variables. If it does not, the evaluation within the division is complete. If it does, then for each legal configuration, further evaluation is performed by belief propagation within the division JT: the expected utility for each utility function is retrieved from the corresponding utility variable, and the expected utility contribution of the partial design is obtained by combining the results from the individual utility variables, weighted as in Eqn. (1). We extend the definition to divisions without utility variables and to illegal partial designs as follows: the expected utility EU of a partial design is set to 0 if the division contains no utility variables, and to -1 if the partial design is illegal. After this evaluation on each configuration of the division's design parameters, the design evaluation within the division is complete.

To combine design evaluations at individual divisions and determine the optimal design, messages are passed between divisions along the separators of the division tree (e.g., Figure 4). The message passing is organized into three rounds along the division tree, and the syntax and semantics of messages in each round differ. The algorithms below are mostly executed by individual divisions, except one executed by the agent.

In the first round of message passing, a division receives a vector message from each adjacent division. Elements of the vector are indexed by the partial designs over the separator between the two divisions: the element indexed by a separator configuration is the maximum expected utility, denoted MEU, over all partial designs in the sender's subtree that are consistent with that configuration. The following algorithm, executed by each division, propagates utility contributions of partial designs inwards along the division tree.

Algorithm 1 (CollectDivisionUtility). When a division is called to CollectDivisionUtility, it does the following:

1. For each adjacent division other than the caller, call CollectDivisionUtility in it and receive its MEU vector.

2. For each partial design sd over the division's own design parameters, update EU(sd) by adding the received MEU elements whose separator configurations are consistent with sd.

3. If the caller is an adjacent division, then for each partial design over the separator with the caller, compute the maximum EU value over all partial designs consistent with it (labeling one such partial design that reaches the maximum, breaking ties arbitrarily), and send the resulting MEU vector to the caller.

Note that EU(sd) may be equal to -1. When -1 participates in an addition, we require that the sum is -1. When -1 participates in a max operation, we require that the result is -1 if and only if all operands are -1.

In the second round of message passing, a division receives from its caller a partial design that is consistent with the optimal design. Combining it with the design evaluation performed earlier within the division, the division identifies the optimal partial design within itself, and then sends the optimal partial design relative to the separator of each adjacent division onwards. As message passing progresses, each division identifies the optimal partial design within the division. This is achieved by the following algorithm.

Algorithm 2 (DistributeOptimalDivisionDesign). When a division is called to DistributeOptimalDivisionDesign, it does the following:

1. If the caller is an adjacent division, receive a partial design over the separator from it; then, among the partial designs over the division's own design parameters that are consistent with the received one, identify the one with the highest EU value (breaking ties arbitrarily).

2. Otherwise, among all partial designs over the division's own design parameters, identify the one with the highest EU value (breaking ties arbitrarily).

3. For each adjacent division other than the caller, call DistributeOptimalDivisionDesign in it and send it the partial design over the corresponding separator that is consistent with the design identified above.

>J.9OIK0^;>/10:G$Q.9H=/84`2 >J.0^>E/1=E_eA`2.=/1=k>k9[2`>Eh|/8>J= >kA è @ +" cÛKb3qÝ 3 ÚdÛKÜ^cÝV

âk2g>J$9tG10:=k>~;A`FP2PQgA:CÕm~9H==0^4:9tIP0:==/2.4.LiQP/87$/1=/8A`2è x;9H_e9H/87:9H=~>J$9TA3I.>/mn0:GOIP0^;>/10:GzQP9H=/84`2rA[7:9<;xí

@Cp;A`m 9H0:_oJ×0:Q:Ô0:_e9[2`>dQP/87$/1=/8A`2 è

@+ âÊ>u_eA`mfP/2$9H=

>J$9[m¯h|/8>Jg/8>=A[h2gA3I.>/mx0:GIP0^;>/10:GOQ.9H=/84`2dAY7:9<;í 032.Qu=k9[2.QK=>J.9;9H=F.G8>Õ>kABQP/87$/1=/8A`2 è + ] >Õ>J$99[2.QtA:C>J./1=|;A`FP2.Q©Lb>J.9)A3I.>/mn0:GVQ.9H=/84`2TAY7:9<;í /1=A3f.>0:/2.9HQ+ · Í3¶3ºo¼p» à §µE¶^·p·1¿ÉH»#!Á»½¼pÃ~¸^· ,¿ȩ¼1Í:Ås¡J$9[2QP/áa7$/1=/8A`2 è n/1=_<0:G1G89HQgf? è >kAuMzA3G1G89H_e> 6I.>/mx0:G1Ð9ea=/84`2LP/8>|Q.Ab9H=y>J$9)CpA3G1G8A[h|/2.4.°

314 Y. Xiang

Page 325: Proceedings of the Third European Workshop on Probabilistic

^+JPA:;9H0:_oJ0:Q^Ôk0:_e9[23>|QK/87b/1=/8A`2 è @ L è 6_<0:G1G1=6MzA3GáaG89H_e> ÕI.>/mn0:G1Ð9H=/84`2/2 è

@032PQ;9H_e9H/87:9H= =

Cp;A`m è@+

«b+OMOA`mfK/2$9= h|/8>J0:G1G/= ;9H_e9H/87:9HQu032.Q=k9[2.Q;9H=F.G8>y>kA è +

|J.9CpA3G1G8A[h|/2$40:G84:A:;o/8>JPm0:_e>/87À0^>k9H=t>J$9¢>J$;9<9;A`FP2PQP=ªA:C6m~9H==0^4:9TIK0:==/2$4u/2r>F$;o2r032.Q¦/1=,9e$9ea_HF$>k9HQtf?T>J$9)0^4:9[2`> + · Í3¶:º§¼p» Ã Ò !Á»½¼Ã~¸^· ,¿§Ìe¼8Í3Å ¼p¾[¼ ̧¼£¶:ÅKº¿o¿

"!# $%" &'

()*+, -*.,/ $"01 #2" 34&5 6

7*+, 801 9" !:"%;=<#" >? /0=/) #01 @5 A#B *+, -*.,/ $;=<#" >?, 01 @,A/

C)DFEA HG # EI)"JJ/1KL">M3#J/@,ANO%PJQR

bA`FK2.QK2$9H==tA:CP 6I.>/mn0:G1Ð9H=/84`2 Æ ?$Ð6/87b/1=/8A`2.V;9<9/1=y9H=k>0:fKG1/1=J$9HQf9HG8AYh)+¨;A3IA3=/8>/8A`2«~0:==k9<;>=>J.0^>=I.;AbQKFP_e9HQl/2g>J$9T0:G84:A:;/8>JPm /1=,0G89<430:GyQP9H=/84`2+ Õ2.G8?0~I.;AbA:C=kj:9<>_§Jt/1=y43/87:9[2QKF$96>kAx=IK0:_e9)G1/mn/8><+Syº¶oÁ$¶^̧¼p»½¼p¶:Å¡«|J$9vQ.9H=/84`2 =¡I.;A$QKF._e9HQdf? 6I$a>/mn0:G1Ð9H=/84`2 Æ ?$Ð6/87b/1=/8A`2.V;9<9/1=G89<430:G£+¨;AA:CN=kj:9<>_oJ°=T6/89<hr>J.9OQK/87b/1=/8A`2>k;9<9O0:=#;AbA:>k9HQ

0^> è-U 032.Q¢F.=k9,/2.QKF._e>/8A`2A`2/8>=ÕQ.9HI.>JWVXV+Õ|J$9fP0:=k9,_<0:=k9~/1=WVXæ¬$+|J.9<;9/1=0T=/2$43G89,QP/87$/1=/8A`2032.Qr>J$9tI.;A3INA3=/8>/8A`2g_<032¦f9T9H0:=/1G8?¦=J$AYh2+ ] =a=FPmn96>J$9ªI.;A3IA3=/8>/8A`2BCA:;8VJXZY 032.Qt_eA`2.=/1Q.9<;WVXBæ0u^+#Ó©9<>>J$9|;AA:>iQP/87b/1=/8A`2nf9 è 032PQn/8>=0:Q^Ôk0:_e9[23>6QP/87$/1=/8A`2.=fN9 è

@ GEæ é1é [ ¤+|J$9=F.f$a

>k;9<9Õ;AbA:>k9HQB0^>y9H0:_oJ è@J.0:=y0,Q.9HI.>JZY ¢+ Æ ?0:=a

=FPmxI.>/8A`2LV/8CMzA3G1G89H_e>Ð6/87$/1=/8A`2.5>/1G1/8>q?¥/1=ª_<0:G1G89HQgA`29H0:_oJ è

@f?t>J$90^4:9[2`><LCpA3G1G8A[hz9HQuf`?0_<0:G1GA:CÐ6/1=a

>k;/1fKF.>k9 6I.>/mn0:G1Ð6/87$/1=/8A`2.Ð9H=/84`2A`2 è@LnCA3G1G8AYhi9HQ

f?g0_<0:G1GOA:C6MzA3G1G89H_e> 6IP>/mn0:G1Ð9H=/84`2¥A`2 è @ L>J$9[2>J$9 IP0^;>/10:Gp¤zQ.9H=/84`2;9<>F$;§2$9HQvf? è

@/1=yG89<430:G©;9HG10Àa

>/87:9ª>kAx>J$9)=F.f.>k;9<9:+|J.90:_e>F.0:G39e$9H_HF$>/8A`2QP/á9<;=©Cp;A`m>J$90:fA[7:9O0:=

CpA3G1G8A[h|=<°yMzA3G1G89H_e>Ð6/87b/1=/8A`2.5|>/1G1/8>@?/1=|_<0:G1G89HQtA`2T9H0:_§Jè@f`? è ihJ./1_§J);9H_e9H/87:9H=iñ ê,è

@C;A`m è

@+J.A:;E9H0:_§J

9HG89[m~9[2`>ñ êè @ /2ñ ê,è @ LK/8>/1= , /8C>J$9<;9Õ9eba/1=k>=62$AxG89<430:G IP0^;>/10:Gp¤yQ.9H=/84`2_eA`2.=/1=k>k9[2`>h/8>J /2l>J$9=oF.f.>k;9<9;AbA:>k9HQr0^> è

@+ Æ ?g2bF.G1G0:QPQK/8>/8A`2L

032`?¢IK0^;>/10:GQ.9H=/84`2 =t/2 è ~_eA`2.=/1=k>k9[2`>ªh/8>J h|/1G1Gf99<7À0:GF.0^>k9HQ>kA ê ò = b¤#æ , Ê+ |9[2._e9:L =_<032P2$A:>Of9|=k9HG89H_e>k9HQB0:= = f`? è |QNF$;/2$4ªÐ6/1=k>k;o/1fKFba>k9 6I.>/mx0:G1Ð6/87b/1=/8A`2PÐ9H=/84`2v032PQx_<032P2$A:>zf9|IP0^;>A:C

>J$9|QP9H=/84`2,;9<>F$;§2$9HQ~>J$;A`F.4`JxMzA3G1G89H_e> 6I.>/mx0:G1Ð9ea=/84`2+1 9[2.QBA:C#=kj:9<>_oJ 2|J.9<A:;9[m =J$AYh|=#>J.0^>=/1=032A3I.>/mn0:G.QP9H=/84`2+

$¿¶:º¿Hà v|J$9ÈQ.9H=/84`2 =I.;A$QKF._e9HQf`? 6I$a>/mn0:G1Ð9H=/84`2 Æ ?$Ð6/87b/1=/8A`2.V;9<9/1=|A3I.>/mn0:G£+¨;AA:C=kj:9<>_§J°#â@>=F.x_e9H=E>kAª=J.A[h¥>J.0^> ê ò = ¤

A3f.>0:/2$9HQ¡f?>J$9¢;AbA:>TQP/87b/1=/8A`2 è /2È=k>k9HI¡«lA:CÐ6/1=k>k;/1fKF.>k9 6I.>/mn0:G1Ð6/87$/1=/8A`2.Ð9H=/84`2/1=>J$9gmn0À./áam,FPm 9e.IN9H_e>k9HQ F$>/1G1/8>q?"AY7:9<;&0:G1GG89<430:GQ.9H=/84`2.=<+ Õ2._e9>J./1=T/1=x9H=>0:fPG1/1=J$9HQL/8>vCpA3G1G8A[h|=T>JP0^>0rQ.9ea=/84`2L>J.0^>60^>k>0:/2.=6>J./1=Õmx0À$/m,FPmô9e.I9H_e>k9HQuF$>/1Gáa/8>q?ô032.Qô/1=¦;9H=k>k;o/1_e>k9HQ">kA9H0:_oJ"QP/87$/1=/8A`2Lt/1=!= G10:f9HG89HQf?>J$9Ç_eA:;;9H=INA`2PQP/2$4QP/87$/1=/8A`2QNF$;/2$4Ð6/1=k>k;/1fKF.>k9 6I.>/mn0:G1Ð6/87$/1=/8A`2.Ð9H=/84`2+ MzA3G1G89H_e> ÕI$a>/mn0:G1Ð9H=/84`2=/mnIKG8? 0:==k9[mfPG89H=l>J$9[m >kA:4:9<>J$9<;[+|J$9ufA$Q.?¥A:C>J.9¢IP;AA:C,F.=k9H=B/2.QNF._e>/8A`2&h|/8>JÈ0=k>k;oFP_e>F$;9¢=/mn/1G10^;T>kA¦>J$9uI.;AbA:C)0:fA[7:9:+E1 9[2PQ®A:C=kj:9<>_§J 2]\ qÛ-^½!

Ð9[2$A:>k9>J$9n>kA:>0:Gy2FKmf9<;ÕA:CyQ.9H=/84`2gIP0^;03mn9<>k9<;=f? í 032.Q®>J$9umn0À./mFKm 2FKmf9<;vA:C)INA3==/1fPG897^0:GF$9H=A:C0Q.9H=/84`2lIP0^;o03m~9<>k9<;f? + ] _e9[23>k;o0:Gáa/8<9HQÈA3I.>/mx0:GQ.9H=/84`2Ç>JP0^>9<7^0:GF.0^>k9H=t0:G1G6Q.9H=/84`2.=9ePJ.03F.=k>/87:9HG8?J.0:=>J$9)_eA`mnIPG89e./8>q?B ¤+J.A:; 6I.>/mx0:G1Ð9H=/84`2 Æ ?bÐ6/87$/1=/8A`2.;9<9:LÇG89<>>J$92bFPmfN9<;A:C©QP/87b/1=/8A`2.=if9O ! L>J$9Õmn0À./m,FPm×2bFPm,af9<;yA:CQP9H=/84`2BIP0^;03mn9<>k9<;=I9<;QP/87$/1=/8A`2Bf9_3L032.Q>J$9)mn0À./m,FPm_<0^;QP/2.0:G1/8>@?A:CQP/87$/1=/8A`2T=k9HIP0^;0^>kA:;o=f9a`+vÐÕF$;/2$4¢MzA3G1G89H_e>Ð6/87b/1=/8A`2P5|>/1G1/8>@?:Li9H0:_§JlQP/87$/áa=/8A`2,9<7^0:GF.0^>k9H= ]b ¤VIP0^;>/10:GKQ.9H=/84`2.=E032.Q~=9[2.QP=E0m~9H==0^4:9ªA:C=/8<9 ]c ¤z>kA>J$9_<0:G1G89<;[+9[2._e9:L>J$9_eA`mnIKG89e$/8>@?A:C 6I.>/mn0:G1Ð9H=/84`2 Æ ?$Ð6/87b/1=/8A`2.V;9<9x/1= ! ]b ! 5ÈÀ¤ ]c ¤ ÑA:;omx0:G1G8?:L1`T/1=mFP_oJ=mx0:G1G89<;>J.032d_032.Q">J$9_eA`mxIPG89e./8>q?"f9H_eA`m~9H= ! eb ¤+ sÈJ$9[2f_¡/1=¦F.IPI9<;kaqfA`FP2.QP9HQL 6I$a>/mn0:G1Ð9H=/84`2 Æ ?$Ð6/87b/1=/8A`2.V;9<9/1=|9<x_</89[23><+

© BÛPhgÛ :ÛKji ©3àâk2¦0:QPQP/8>/8A`2¦>kAdI.;9<7b/8A`F.=nhiA:;jr;9<7$/89<hi9HQÇ/2 b9H_§a>/8A`2g^LNhz9QK/1=_HF.==|;9HG10^>/8A`2.=A:CE>J./1=|_eA`2`>k;/1fKF.>/8A`2>kAnA:>J$9<;;9HG10^>k9HQthzA:;jTf9HG8A[h)°âk2¥>J$9G1/8>k9<;0^>F$;9A`2¥4:;0:INJ./1_<0:GmnAbQP9HG1=<Lz0l;9ea

G10^>k9HQ hiA:;j/1==k>k;A`2$4dÔkFP2P_e>/8A`2¡>k;9<9 ) 9[2.=k9[29<>0:G£+1LyH:ÀÒ$¤hJ./1_oJu0:QKQ.;9H==k9H=ª>J$9x/1==oF$9nA:Cy=k9$#$F$9[2ba>/10:GyQ.9H_</1=/8A`2¥mx0^jb/2.4.+ ] =~/2.QK/1_<0^>k9HQ¥f`?r¨0^;9HQP/1=

Optimal Design with Design Networks 315

Page 326: Proceedings of the Third European Workshop on Probabilistic

9<>t0:G£+ ¨#0^;9HQP/1=B9<>t0:G£+1L)«^¬:¬3­¤L)IA3/23>aqfK0:=k9HQ¡Q.9ea=/84`2T_eA:;;9H=IA`2.QP=i>kAn=k9$#$F$9[23>/10:GQP9H_</1=/8A`2tmn0^j$/2$4.+âk2.=k>k9H0:QL>J$9dA3IP>/mn0:GÕQ.9H=/84`20:QPQ.;9H==9HQ¡/2È>J./1=_eA`2`>k;/1fKF$>/8A`2>0^j:9H=E>J.9z0:IPIP;A30:_oJA:CN=9<>aqfP0:=k9HQ,Q.9ea=/84`2LKhJ./1_§Jt/2`7:A3G87:9H=Õ=/m,F.G8>032$9<A`F.=|9<7À0:GFP0^>/8A`2A:C0ªm,F._§J~G10^;4:962FPmf9<;EA:CQ.9H=/84`2xIP0^;03m~9<>k9<;o=E>J.032hJP0^>v/1=T_eA`2.=/1Q.9<;9HQ®/2®0l>@?bIK/1_<0:G=9$#bF$9[2`>/10:G6Q.9ea_</1=/8A`2gI.;A3fKG89[mB+vJ$9x/1==F.9nA:CQ.9H=/84`2l_eA`2.=k>k;0:/2`>7$/8A3G10^>/8A`2t/1=0:G1=AnQ.9H0:G8>h|/8>JT/2>J./1=|_eA`23>k;/1fNF$>/8A`2+] 2$A:>J$9<;z;9HG10^>k9HQThzA:;jn/1=y2$9H=k>k9HQÔFP2._e>/8A`2x>k;9<9H=

£³ Ôk0^9<;oF.Gá#LPH:<`¤L`hJ$9<;9O0 ) ¦/1=E2$9H=k>k9HQ~/206_<GF.=a>k9<;A:CK06J./84`J$9<;#G89<7:9HG ) d>kA;9HQKF._e9O=IP0:_e9O_eA`mnIKG89ea/8>@?r032.Qr>kAd0:G1G8AYh m~A:;9v9<x_</89[2`>xf9HG1/89<CIP;A3IP0^430Àa>/8A`2+J$9ÕQP/87b/1=/8A`2>k;9<9:LPI.;A3IA3=k9HQT/2v>J./1=y_eA`2`>k;/áafKF.>/8A`2Lb/1=O0:G1=kA~0,2$9H=k>k9HQ ) &=k>k;oFP_e>F$;9:+E|J$96_eA`m,aIKF.>0^>/8A`2_eA`2.QNF._e>k9HQh/8>J./2t0vQP/87b/1=/8A`2LJ.A[hz9<7:9<;HL4:Ab9H=f9<?:A`2.QI.;A3fP0:fK/1G1/1=k>/1_ª;9H0:=kA`2./2$4.+yâÊ>|;9H0:=A`2.=0:fA`F$>#Q.9H=/84`2,Q.9H_</1=/8A`2.=<L3_eA`2.=k>k;o0:/23>=032PQ,F.>/1G1/8>/89H=<L032.QB/1=Q.9H_</1=/8A`2bac>J.9<A:;9<>/1_/22.0^>F$;9:+âk2>J$9&G1/8>k9<;o0^>F$;9ÈA`2_eA`2.=k>k;0:/2`>¥=0^>/1=kC£0:_e>/8A`2

I.;A3fPG89[m M| $¨y¤L^_eA`2.=k>k;0:/2`>#4:;0:IKJ.=0^;9O_eA`237:9<;>k9HQ>kA ) =>kAª=kA3G87:9M $¨i= Ð9H_oJ`>k9<;i032.Q~¨9H0^;G£LH::¤+|J.9_HF.;;9[2`>È_eA`2`>k;/1fKF.>/8A`2 9H==k9[2`>/10:G1G8? 9e$>k9[2.QP=>J.0^>t0:IKI.;A30:_oJ¡>kA_eA`2.=k>k;0:/2`>tA3I.>/mx/8H0^>/8A`2¡032.Qh|/8>Jt0~Q.9H_</1=/8A`2bac>J$9<A:;9<>/1_ªA3f$Ô9H_e>/87:9C£FP2P_e>/8A`2+

KÙ Vi"qiÜÀqV

su9y;9<w2$9HQª>J.9OQP9<wN2./8>/8A`2CpA:;#QP9H=/84`2~2$9<>qhzA:;j$=#;9ea430^;QK/2$4G89<430:G.0^;_<=#032.Q9H==k9[2`>/10:G1/8>q?:+i|J$9z2.9<h¦QP9<C a/2./8>/8A`2x/mnI.;A[7:9H=9e.I.;9H==/87:9[2$9H==i032.Q~4`F./1QP032P_e9|>kAmnAbQ.9HG_eA`2.=k>k;oF._e>/8A`2¦032PQl7:9<;/8wN_<0^>/8A`2+dsu9B_eA`m,aIP/1G89HQ¢0TQ.9H=/84`2u2.9<>qhzA:;j/2`>kAt0TQP/87$/1=/8A`2>k;9<9~032.QI.;9H=k9[23>k9HQ®0:G84:A:;/8>JPmx=n>J.0^>T_eA`mfP/2$9I.;A3fP0:fP/1G1/1=a>/1_^L$_eA`2.=k>k;0:/2`>aqfP0:=k9HQT032.QvQ.9H_</1=/8A`2bac>J.9<A:;9<>/1_6;9H0Àa=kA`2P/2$4ÕCA:;iA3I.>/mn0:G.QP9H=/84`2n/2,>J.9|_eA`mnIP/1G89HQx=k>k;oF._§a>F$;9:+&|JP/1=~;9H=F.G8>2$A:>xA`2.G8?¥I.;A[7$/1Q.9H=x0g_eA`mnIKFba>0^>/8A`2.0:Gm~9H_§J.032./1=mCA:;=/2$43G89eaq0^4:9[23>OA3I.>/mx0:G.Q.9ea=/84`2LVfKF$>Õ0:G1=kAtwNG1G1=)/2u0T430:Id/2¢A3IP>/mn0:GE_eA3G1G10:fA^a;0^>/87:9Q.9H=/84`2+J$9"I.;9H=k9[2`>k9HQ 0:G84:A:;/8>JPm =F./8>k9ö=kA3G87:9H=>J$9

I.;A3fPG89[m"A:CQ.9H_</1=/8A`2bac>J$9<A:;9<>/1_A3I.>/mx0:GVQ.9H=/84`2/2>J$96_eA`2`>k9eb>OA:CV_<0^>0:G8A:4xQP9H=/84`2vhJ$9<;9QP/1=_e;9<>k96Q.9ea=/84`2)A3I.>/8A`2.=0^;9i_eA`2.wP4`F$;9HQ+|J$9z0:G84:A:;/8>JPm =F./8>k9Q.9<;o/87:9H=z/8>=z9<x_</89[2P_e?vf?nQ.9H_eA`mnIA3=/2$4>J$96Q.9H=/84`2Q.A`mx0:/2¢0:_<_eA:;QP/2$4>kAQ.9H=/84`2l=k9HIP0^;0^>kA:;o=ª/2uQP/87$/áa=/8A`2ª>k;9<9H=<+s¡J$9[2m,F.G8>/1IKG89A3I.>/mx0:G`Q.9H=/84`2P=V9e./1=k><L>J$90:G84:A:;o/8>JPm=oF./8>k9;9<>F$;§2.=A`2$9EA:C$>J$9[m¡0^;fK/8>k;0^;ka

/1G8?:LKfNF$>_<032Bf9Õ9eb>k9[2PQ.9HQt>kAn;9<>F$;o2T0:G1G£+ `à#Kß+cÛKÝ&ÛK:ÜJ#/2.032._</10:G=F.IPIA:;>~>kAg>J./1=n;9H=k9H0^;_oJ®/1=xIP;A[7$/1Q.9HQf?ÑÕ $´iäyMÈA:CEMO032P0:QP0$+

g¢ÛÛP3ÛN$ÛKÜ "!# !#$% &('*)),+-./&013254&7689:9:;, 25<:=&>?25<:@A<:BC25<:DE6E4<GF2 &C6H25 &I1J,K9L 6E47=&84 <:689G&M&C.5<GN, OQPSR5TCU*VWXJY[Z\5ZZS]_^a`bZc]dZe`Z]b\E` `fhgi`kjmln Tpoqsr l Ttf rbu X qwvmqLo8U8q n uxqstzymR jZ tkyu j oqLo|ymtzv gnzX q l qG~*y X qwTmt ?N&*.S+m% 'C

&C6E425&C[&C59Q'C),)0a5&C&6H9: ./25&8 <:N1J,6H, ./25E<:,2[&H2 b.8 Z R X q UHqwyu \ t X usuxq tkU , E ,pb++

&C .5&8 &8 .5&8e z _<G2/2 @A&CC '*))¡ 5,@¢<:b£ &C 6H&¤b<:NE@¥.¦25¨§ 62 <G,©25 &8&*.8ªOPSR5TCU*V_«,¬ XwY¨­ TmtC®HVSTt ^ tkU R X yqst Xj qst Z R X q UHqwyu \ t ¯X usuxq tkU ? N&C.,+°*%°

±²&C&8&C³¨ ¤´#±<µ¶·'*)°+¸ U8qLoqwTmto$¹²q XwYf rbu X q n u ºg»J¼8 U X qs½ o¾@;5<LbN,&

-k§/&8 9µ'*))%°%¿&C./25&Cº§ 6H25<:À2 5&C&C.COPSR5TCU*V«Á XwY­ TtC®HV¦Tmt ^ tkU R X yqst Xj qst Z R X q ÃU8qwymu \ t X usuxqs¯ tkU b?N&*.Ä)¡m%,',,± m=%<:&8 68&b4bb&.59: ¶

Å ¶b <a!#ƾÃ4 bEm. Æ'C),)) !©1JE@Ç&8F ¥1J,6H9:9L;kEm25<:=& $b<:./25 <:;b25&*$Ã&C;bF; .5&C&C.5<GN,aOÈPSR5TCU*VÁ WmXwY ¸ oqt Z r X T l y X qwTmt ­ TmtC® R¯ tkU b? N,&C.,)pb),

¾É² &Cb<L.8ÊÉ!N,4&8%; N4Ê©À9LzË ¶&C% 6KÄ+ %&H25F; ,./&*·b&*./<:N&C6H<L.5<G,bF2 4&8,5&825<L6Ì?k&8E./?k&C6H25<:=& OQPR5T*UCV c R5Tmt X q REo3qst¸ oEqtiÍ ` q l rbu:y X qwTmtÎ o ymR5U YÐÏ ¬,¬ WÌÑ TREÒCo Y T n ?N&*.e'bÄ,b

Çb%,;&Ck%!#x¾ÓÔ ¶%Õ Ç,Æ<G,&8*Ã'*)),)aam³F2 Ö .?5<: 68<G?9:&C.e1_./&82/F; .5&C×68 68 5&C,2&8N,<G&C&85F<:N ` uGT*ymt f ytky8 8lÇ t X Î ½Cq ¹² ¡ Ä,E +°*%¡

!¾zÓE¶Ç'C),) Z0ØY% TmR j T®¦ÙSr ymt X q X y X qs½ #\ t*® R¯ tkU ºZSnn uq v X T ZÚf U Y yt qwUEymuS¸ oqt ­ T ln qsu R_4 8254&*./<L.8$p&8? /2 @A&C2Æ1 $&*6E4 <L689Û±FN,<G &8&8 <:N

ÜAÃÝ<:N ¾Ã4&8Æ !#S&*./4 @%4ÌÄ,¡ i!&C6H<L.5<G,bF2 4&8,5&825<L6¥NE? 4<:6C9Ã@Abb&C91JÇ6H9:9L;kEmF2 <G=,&b&C.5<GN,È,¦.5?? 9G³Õ6E4<: .8O$!# Ü pSÞ Õ ¶ Å %bb<GÆ&Cb<G25, .C Z vm½mymtkU o#qst Z R X q ÃU8qwymu \ t ¯X usuxq tkU HßàÆdZ\ Á¬ W ¬m ?N&*.,,*%,+) %? <:N&CC

ÜACÝ<:N m¾Ã4&CÓ© ¶p´p=&C .8Äbaá?b2 <G@¥9&C.5<GN,[<:¨6H,9G9L;kEm2 <G=,&Ab&*./<:N&82Ã,5zºO¤PSR5TCU*Vâ XJY\ t X R8V ãTqst X²­ Tmt*®HV¶Tmt Z r X TmtzT l Tmr%o Z t X oymtzvf rbu X qwy8 t Xz`kj o X8l oeä ZZefÈZe`Så ¬æHç*? N,&C.aÄ¡ 'HbÄm¡

316 Y. Xiang

Page 327: Proceedings of the Third European Workshop on Probabilistic

Hybrid Loopy Belief Propagation

Changhe Yuan

Department of Computer Science and Engineering
Mississippi State University
Mississippi State, MS
[email protected]

Marek J. Druzdzel

Decision Systems Laboratory
School of Information Sciences
University of Pittsburgh
Pittsburgh, PA
[email protected]

Abstract

We propose an algorithm called Hybrid Loopy Belief Propagation (HLBP), which extends the Loopy Belief Propagation (LBP) (Murphy et al., 1999) and Nonparametric Belief Propagation (NBP) (Sudderth et al., 2003) algorithms to deal with general hybrid Bayesian networks. The main idea is to represent the LBP messages with mixtures of Gaussians and formulate their calculation as Monte Carlo integration problems. The new algorithm is general enough to deal with hybrid models that may represent linear or nonlinear equations and arbitrary probability distributions.

1 Introduction

Some real problems are more naturally modelled by hybrid Bayesian networks that contain mixtures of discrete and continuous variables. However, several factors make inference in hybrid models extremely hard. First, linear or nonlinear deterministic relations may exist in the models. Second, the models may contain arbitrary probability distributions. Third, the orderings among the discrete and continuous variables may be arbitrary. Since the general case is difficult, existing research often focuses on special instances of hybrid models, such as Conditional Linear Gaussians (CLG) (Lauritzen, 1992). However, one major assumption behind CLG is that discrete variables cannot have continuous parents. This limitation was later addressed by extending CLG with logistic and softmax functions (Lerner et al., 2001; Murphy, 1999). More recent research began to develop methodologies for more general non-Gaussian models, such as Mixtures of Truncated Exponentials (MTE) (Moral et al., 2001; Cobb and Shenoy, 2005) and a junction tree algorithm with approximate clique potentials (Koller et al., 1999). However, most of these approaches rely on the junction tree algorithm (Lauritzen and Spiegelhalter, 1988). As Lerner et al. (Lerner et al., 2001) pointed out, it is important to have alternative solutions in case junction tree algorithm-based methods are not feasible.

In this paper, we propose the Hybrid LoopyBelief Propagation algorithm, which extends theLoopy Belief Propagation (LBP) (Murphy etal., 1999) and Nonparametric Belief Propaga-tion (NBP) (Sudderth et al., 2003) algorithmsto deal with general hybrid Bayesian networks.The main idea is to represent LBP messagesas Mixtures of Gaussians (MG) and formulatetheir calculation as Monte Carlo integrationproblems. The extension is far from trivial dueto the enormous complexity brought by deter-ministic equations and mixtures of discrete andcontinuous variables. Another advantage of thealgorithm is that it approximates the true pos-terior probability distributions, unlike most ex-isting approaches which only produce their firsttwo moments for CLG models.

2 Hybrid Loopy Belief Propagation

To extend LBP to hybrid Bayesian networks, we need to know how to calculate the LBP messages in hybrid models. There are two kinds of messages defined in LBP. Let X be a node in a hybrid model. Let Yj be X's children and Ui be X's parents. The λX^(t+1)(ui) message that X sends to its parent Ui is given by:

λX^(t+1)(ui) = α ∫_x λX(x) ∏_j λYj^(t)(x) ∫_{uk: k≠i} P(x|u) ∏_{k≠i} πX^(t)(uk) ,  (1)

and the message πYj^(t+1)(x) that X sends to its child Yj is defined as:

πYj^(t+1)(x) = α λX(x) ∏_{k≠j} λYk^(t)(x) ∫_u P(x|u) ∏_k πX^(t)(uk) ,  (2)

where λX(x) is a message that X sends to itself representing evidence. For simplicity, we use only integrations in the message definitions. It is evident that no closed-form solutions exist for the messages in general hybrid models. However, we observe that their calculations can be formulated as Monte Carlo integration problems.

First, let us look at the πYj^(t+1)(x) message defined in Equation 2. We can rearrange the equation in the following way:

πYj^(t+1)(x) = α ∫_u [ λX(x) ∏_{k≠j} λYk^(t)(x) P(x|u) ∏_k πX^(t)(uk) ] ,

where the bracketed integrand is the joint distribution P(x, u).

Essentially, we put all the integrations outside of the other operations. Given the new formula, we realize that we have a joint probability distribution over x and the ui's, and the task is to integrate all the ui's out and get the marginal probability distribution over x. Since P(x, u) can be naturally decomposed into P(x|u)P(u), the calculation of the message can be solved using a Monte Carlo sampling technique called the composition method (Tanner, 1993). The idea is to first draw samples for each ui from πX^(t)(ui), and then sample from the product λX(x) ∏_{k≠j} λYk^(t)(x) P(x|u). We will discuss how to take the product in the next subsection. For now, let us assume that the computation is possible. To make life even easier, we make further modifications and get

πYj^(t+1)(x) = α ∫_u [ λX(x) ∏_k λYk^(t)(x) P(x|u) ∏_k πX^(t)(uk) ] / λYj^(t)(x) .

Now, for the messages sent from X to its different children, we can share most of the calculation. We first get samples for the ui's, and then sample from the product λX(x) ∏_k λYk^(t)(x) P(x|u). For each different message πYj^(t+1)(x), we use the same sample x but assign it a different weight 1/λYj^(t)(x).
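As a concrete illustration, the composition scheme above can be sketched in a few lines of Python. The sampler and message arguments here are hypothetical placeholders of our choosing, not code from the paper:

```python
def sample_pi_message(parent_samplers, sample_x, lambda_child_msgs, n=1000):
    """Monte Carlo composition for the pi messages: draw each u_k from
    pi_X^(t)(u_k), draw x from the product lambda_X(x) * prod_k
    lambda_Yk(x) * P(x|u), then reuse the same x for every child Y_j
    with weight 1/lambda_Yj(x)."""
    samples = []
    for _ in range(n):
        u = [draw() for draw in parent_samplers]   # u_k ~ pi_X^(t)(u_k)
        x = sample_x(u)                            # x ~ product involving P(x|u)
        # one sample serves all children: re-weight by 1/lambda_Yj(x)
        weights = {j: 1.0 / lam(x) for j, lam in lambda_child_msgs.items()}
        samples.append((x, weights))
    return samples
```

The shared-sample trick is visible in the weights dictionary: the same x is reused for every child, only the weight changes.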

Let us now consider how to calculate the λX^(t+1)(ui) message defined in Equation 1. First, we rearrange the equation analogously:

λX^(t+1)(ui) = α ∫_{x, uk: k≠i} [ λX(x) ∏_j λYj^(t)(x) P(x|u) ∏_{k≠i} πX^(t)(uk) ] ,  (3)

where the bracketed integrand is P(x, uk : k≠i | ui).

It turns out that here we are facing a quite different problem from the calculation of πYj^(t+1)(x). Note that now we have P(x, uk : k≠i | ui), a joint distribution over x and uk (k≠i) conditional on ui, so the whole expression is only a likelihood function of ui, which is not guaranteed to be integrable. As in (Sudderth et al., 2003; Koller et al., 1999), we choose to restrict our attention to densities and assume that the ranges of all continuous variables are bounded (maybe large). The assumption only solves part of the problem. Another difficulty is that the composition method is no longer applicable here, because we have to draw samples for all parents ui before we can decide P(x|u). We note that for any fixed ui, Equation 3 is an integration over x and uk, k≠i, and can be estimated using Monte Carlo methods. That means that we can estimate λX^(t+1)(ui) up to a constant for any value ui, although we do not know how to sample from it. This is exactly the time when importance sampling becomes handy. We can estimate the message by drawing a set of importance samples as follows: sample ui from a chosen importance function I(ui), estimate λX^(t+1)(ui) using Monte Carlo integration, and assign the ratio between the estimated value and I(ui) as the weight for the sample. A simple choice for the importance function is the uniform distribution over the range of ui, but we can improve the accuracy of Monte Carlo integration by choosing a more informed importance function: the corresponding message λX^(t)(ui) from the last iteration. Because of the iterative nature of LBP, messages usually keep improving over each iteration, so they are clearly better importance functions.
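This importance-sampling loop can be sketched as follows, assuming a bounded range for ui and a helper that Monte Carlo-estimates the integral in Equation 3 up to a constant (both names are ours, not the paper's):

```python
import random

def estimate_lambda_message(ui_range, integral_estimate, n=500):
    """Importance sampling for lambda_X^(t+1)(u_i): with a uniform
    importance function I(u_i) over a bounded range, each drawn u_i
    gets weight integral_estimate(u_i) / I(u_i)."""
    lo, hi = ui_range
    density = 1.0 / (hi - lo)          # I(u_i) is constant for the uniform proposal
    return [(ui, integral_estimate(ui) / density)
            for ui in (random.uniform(lo, hi) for _ in range(n))]
```

Replacing the uniform draw and the constant density with the previous iteration's message λX^(t)(ui) gives the more informed proposal described above.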

Note that the preceding discussion is quite general. The variables under consideration can be either discrete or continuous. We only need to know how to sample from or evaluate the conditional relations. Therefore, we can use any representation to model the case of discrete variables with continuous parents. For instance, we can use general softmax functions (Lerner et al., 2001) for the representation.

We now know how to use Monte Carlo integration methods to estimate the LBP messages, represented as sets of weighted samples. To complete the algorithm, we need to figure out how to propagate these messages. Belief propagation involves operations such as sampling, multiplication, and marginalization. Sampling from a message represented as a set of weighted samples may be easy to do; we can use a resampling technique. However, multiplying two such messages is not straightforward. Therefore, we choose to use density estimation techniques to approximate each continuous message using a mixture of Gaussians (MG) (Sudderth et al., 2003). A K-component mixture of Gaussians has the following form:

M(x) = ∑_{i=1}^{K} wi N(x; µi, σi) ,  (4)

where ∑_{i=1}^{K} wi = 1. MGs have several nice properties. First, we can approximate any continuous distribution reasonably well using an MG. Second, the family is closed under multiplication. Last, we can estimate an MG from a set of weighted samples using a regularized version of the expectation maximization (EM) algorithm (Dempster et al., 1977; Koller et al., 1999).

Algorithm: HLBP

1. Initialize the messages that evidence nodes send to themselves and their children as indicating messages with fixed values, and initialize all other messages to be uniform.

2. while (stopping criterion not satisfied)

   Recompute all the messages using Monte Carlo integration methods.

   Normalize discrete messages.

   Approximate all continuous messages using MGs.

   end while

3. Calculate λ(x) and π(x) messages for each variable.

4. Calculate the posterior probability distributions for all the variables by sampling from the product of λ(x) and π(x) messages.

Figure 1: The Hybrid Loopy Belief Propagation algorithm.

Given the above discussion, we outline the final HLBP algorithm in Figure 1. We first initialize all the messages. Then, we iteratively recompute all the messages using the methods described above. In the end, we calculate the messages λ(x) and π(x) for each node X and use them to estimate the posterior probability distribution over X.
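The mixture-of-Gaussians representation of Equation 4 is cheap to evaluate and sample. A minimal sketch, treating each σi as a variance and with names of our choosing:

```python
import math
import random

def mg_density(x, components):
    """Evaluate M(x) = sum_i w_i N(x; mu_i, sigma_i) from Equation 4,
    where each component is a (weight, mean, variance) triple."""
    return sum(w * math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
               for w, mu, var in components)

def mg_sample(components):
    """Draw one sample: pick a component by weight, then sample it."""
    ws = [w for w, _, _ in components]
    _, mu, var = random.choices(components, weights=ws)[0]
    return random.gauss(mu, math.sqrt(var))
```

Evaluation and component-wise sampling like this are the two operations the message-propagation steps above rely on.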

3 Product of Mixtures of Gaussians

One question that remains to be answered in the last section is how to sample from the product λX(x) ∏_{k≠j} λYk^(t)(x) P(x|u). We address the problem in this section. If P(x|u) is a continuous probability distribution, we can approximate P(x|u) with an MG. Then, the problem becomes how to compute the product of several MGs. The approximation is often a reasonable thing to do because we can approximate any continuous probability distribution reasonably well using an MG. Even when such an approximation is poor, we can approximate the product of P(x|u) with an MG using another MG. One example is that the product of a Gaussian distribution and a logistic function can be approximated well with another Gaussian distribution (Murphy, 1999).

Suppose we have messages M1, M2, ..., MK, each represented as an MG. Sudderth et al. use a Gibbs sampling algorithm to address the problem (Sudderth et al., 2003). The shortcoming of the Gibbs sampler is its efficiency: we usually have to carry out several iterations of Gibbs sampling in order to get one sample.

Here we propose a more efficient method based on the chain rule. If we treat the selection of one component from each MG message as a random variable, which we call a label, our goal is to draw a sample from the joint probability distribution of all the labels, P(L1, L2, ..., LK). We note that the joint probability distribution of the labels of all the messages can be factorized using the chain rule as follows:

P(L1, L2, ..., LK) = P(L1) ∏_{i=2}^{K} P(Li | L1, ..., Li−1) .  (5)

Therefore, the idea is to sample the labels sequentially based on the prior or conditional probability distributions. Let wij be the weight of the jth component of the ith message, and let µij and σij be that component's parameters. We can sample from the product of the messages M1, ..., MK using the algorithm presented in Figure 2. The main idea is to calculate the conditional probability distributions cumulatively. Due to the Gaussian densities, the method has to correct the bias introduced during the sampling by assigning the samples weights. The method only needs to go over the messages once to obtain one importance sample and is more efficient than the Gibbs sampler in (Sudderth et al., 2003). Empirical results show that the precision obtained by the importance sampler is comparable to the Gibbs sampler given a reasonable number of samples.

4 Belief Propagation with Evidence

Special care is needed for belief propagation with evidence and deterministic relations.

Algorithm: Sample from a product of MGs M1 × M2 × ... × MK.

1. Randomly pick a component, say j1, from the first message M1 according to its weights w11, ..., w1J1.

2. Initialize cumulative parameters as follows: µ*_1 ← µ_{1j1}; σ*_1 ← σ_{1j1}.

3. i ← 2; wImportance ← w_{1j1}.

4. while (i ≤ K)

   Compute new parameters for each component k of the ith message as follows:

   σ*_{ik} ← ((σ*_{i−1})^{−1} + (σ_{ik})^{−1})^{−1},
   µ*_{ik} ← (µ_{ik}/σ_{ik} + µ*_{i−1}/σ*_{i−1}) σ*_{ik}.

   Compute new weights for the ith message with any x:

   w*_{ik} = w_{ik} w*_{i−1,j_{i−1}} N(x; µ*_{i−1}, σ*_{i−1}) N(x; µ_{ik}, σ_{ik}) / N(x; µ*_{ik}, σ*_{ik}).

   Calculate the normalized weights w̄*_{ik}.

   Randomly pick a component, say ji, from the ith message using the normalized weights.

   µ*_i ← µ*_{i,ji}; σ*_i ← σ*_{i,ji}.

   i ← i + 1;

   wImportance ← wImportance × w̄*_{i,ji}.

   end while

5. Sample from the Gaussian with mean µ*_K and variance σ*_K.

6. Assign the sample the weight w*_{K,jK}/wImportance.

Figure 2: Sample from a product of MGs.
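A direct transcription of Figure 2 may clarify the bookkeeping. This is our sketch, not the paper's code: components are (weight, mean, variance) triples, σ is read as a variance throughout, and we read wImportance as accumulating the normalized selection probabilities:

```python
import math
import random

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def sample_mg_product(messages, x=0.0):
    """Draw one importance sample from the product M1 x ... x MK,
    following Figure 2; x is the arbitrary point used to re-weight
    components.  Returns (sample, importance weight)."""
    comps = messages[0]
    j = random.choices(range(len(comps)), weights=[c[0] for c in comps])[0]
    w_cum, mu_cum, var_cum = comps[j]              # steps 1-2
    w_importance = comps[j][0]                     # step 3 (weights of M1 sum to 1)
    for msg in messages[1:]:                       # step 4
        merged = []
        for w, mu, var in msg:
            # product of two Gaussians: precision-weighted parameters
            var_new = 1.0 / (1.0 / var_cum + 1.0 / var)
            mu_new = (mu / var + mu_cum / var_cum) * var_new
            w_new = (w * w_cum * normal_pdf(x, mu_cum, var_cum)
                     * normal_pdf(x, mu, var) / normal_pdf(x, mu_new, var_new))
            merged.append((w_new, mu_new, var_new))
        total = sum(c[0] for c in merged)
        probs = [c[0] / total for c in merged]
        j = random.choices(range(len(merged)), weights=probs)[0]
        w_cum, mu_cum, var_cum = merged[j]
        w_importance *= probs[j]
    sample = random.gauss(mu_cum, math.sqrt(var_cum))
    return sample, w_cum / w_importance            # steps 5-6
```

For two single-component messages the selection is deterministic and the returned weight equals the normalizing constant of the product, e.g. 1/(2√π) for the product of two standard normals.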

In our preceding discussion, we approximate P(x|u) using an MG if P(x|u) is a continuous probability distribution. It is not the case if P(x|u) is deterministic or if X is observed. We discuss the following several scenarios separately.

Deterministic relation without evidence: We simply evaluate P(x|u) to get the value x as a sample. Because we did not take into account the λ messages, we need to correct our bias using weights. For the message πYi(x) sent from X to its child Yi, we take x as the sample and assign it weight λX(x) ∏_{k≠i} λYk^(t)(x). For the message λX(ui) sent from X to its parent Ui, we take the value ui as a sample for Ui and assign it weight λX(x) ∏_k λYk^(t)(x) / λX^(t)(ui).

Stochastic relation with evidence: The messages πYj(x) sent from evidence node X to its children are always indicating messages with fixed values. The messages λYj(x) sent from the children to X have no influence on X, so we need not calculate them. We only need to update the messages λX(ui), for which we take the value ui as the sample and assign it weight λX(e) ∏_k λYk^(t)(e) / λX^(t)(ui), where e is the observed value of X.

Deterministic relation with evidence: This case is the most difficult. To illustrate this case more clearly, we use a simple hybrid Bayesian network with one discrete node A and two continuous nodes B and C (see Figure 3).

P(A):  a: 0.7,  ¬a: 0.3
P(B):  N(B; 1, 1)
P(C|A, B):  C = B if A = a;  C = 2∗B if A = ¬a
Structure:  A → C ← B

Figure 3: A simple hybrid Bayesian network.

Example 1. Let C be observed at state 2.0. Given the evidence, there are only two possible values for B: 2.0 when A = a and 1.0 when A = ¬a. We need to calculate the messages λC(a) and λC(b). If we follow the routine and sample from A and B first, it is extremely unlikely for us to hit a feasible sample; almost all the samples that we get would have weight 0.0. Clearly we need a better way to do that.

First, let us consider how to calculate the message λC(a) sent from C to A. Suppose we choose the uniform distribution as the importance function; we first randomly pick a state for A. After the state of A is fixed, we note that we can solve P(C|A, B) to get the state of B. Therefore, we need not sample B from πC(b), in this case N(B; 1, 1). We then assign N(B; 1, 1), evaluated at the solved value, as the weight for the sample. Since A's two states are equally likely, the message λC(A) would be proportional to {N(2; 1, 1), N(1; 1, 1)}.

For the λC(b) message sent from C to B, since we know that B can only take two values, we choose a distribution that puts equal probabilities on these two values as the importance function, from which we sample B. The state of B will also determine A as follows: when B = 2, we have A = a; when B = 1, we have A = ¬a. We then assign weight 0.7 to the sample if B = 2 and 0.3 if B = 1. However, the magic of knowing the feasible values of B in advance is impossible in practice. Instead, we first sample A from πC(A), in this case {0.7, 0.3}, given which we can solve P(C|A, B) for B and assign each sample weight 1.0. So λC(b) has probabilities proportional to {0.7, 0.3} on the two values 2.0 and 1.0.
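The λC(b) computation in the example can be mimicked numerically. A throwaway sketch hard-coded to the toy network of Figure 3 (the function name and sampling loop are ours):

```python
import random
from collections import Counter

def lambda_c_b(n=10000, seed=1):
    """Mimic Example 1: sample A from pi_C(A) = {a: 0.7, not-a: 0.3},
    then invert the deterministic relation (C = B under a, C = 2B
    under not-a) at the evidence C = 2.0; each sample has weight 1.0."""
    random.seed(seed)
    counts = Counter()
    for _ in range(n):
        a = random.random() < 0.7          # A ~ pi_C(A)
        counts[2.0 if a else 1.0] += 1     # solve for B given C = 2.0
    return {b: c / n for b, c in counts.items()}
```

Running this gives a distribution close to {2.0: 0.7, 1.0: 0.3}, matching the λC(b) stated in the example.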

In general, in order to calculate λ messages sent out from a deterministic node with evidence, we need to sample from all parents except one, and then solve P(x|u) for that parent. There are several issues here. First, since we want to use the values of the other parents to solve for the chosen parent, we need an equation solver. We used an implementation of Newton's method for solving nonlinear sets of equations (Kelley, 2003). However, not all equations are solvable by this equation solver, or any equation solver for that matter. We may want to choose the parent that is easiest to solve for. This can be tested by means of a preprocessing step. In more difficult cases, we have to resort to users' help and ask at the model-building stage to specify which parent to solve for, or even to specify the inverse functions manually. When there are multiple choices, one heuristic that we find helpful is to choose the continuous parent with the largest variance.

5 Lazy LBP

We can see that HLBP involves repeated density estimation and Monte Carlo integration, which are both computationally intense. Efficiency naturally becomes a concern for the algorithm. To improve its efficiency, we propose a technique called Lazy LBP, which is also applicable to other extensions of LBP. The technique serves as a summarization that contains both our original findings and some commonly used optimization methods.

After evidence is introduced in the network, we can pre-propagate the evidence to reduce computation in HLBP. First, we can plug any evidence into the conditional relations of its children, so we need not calculate the π messages from the evidence to its children. We need not calculate the λ messages from its children to the evidence either. Secondly, evidence may determine the value of its neighbors because of deterministic relations, in which case we can evaluate the deterministic relations in advance so that we need not calculate messages between them.

Furthermore, from the definitions of the LBP messages, we can easily see that we do not have to recompute the messages all the time. For example, the λ(x) messages from the children of a node with no evidence as a descendant are always uniform. Also, a message need not be updated if the sender of the message has received no new messages from neighbors other than the recipient.

Based on Equation 1, λ messages should be updated if incoming π messages change. However, we have the following result.

Theorem 1. The λ messages sent from a non-evidence node to its parents remain uniform before it receives any non-uniform messages from its children, even though there are new π messages coming from the parents.

Proof. Since there are no non-uniform λ messages coming in, Equation 1 simplifies to

λX^(t+1)(ui) = α ∫_{x, uk: k≠i} P(x|u) ∏_{k≠i} πX^(t)(uk)
             = α ∫_{x, uk: k≠i} P(x, uk : k≠i | ui)
             = α .

Finally, for HLBP, we may be able to calculate some messages exactly. For example, suppose a discrete node has only discrete parents. We can always calculate the messages sent from this node to its neighbors exactly. In this case, we should avoid using Monte Carlo sampling.

6 Experimental Results

We tested the HLBP algorithm on two benchmark hybrid Bayesian networks: the emission network (Lauritzen, 1992) and its extension, the augmented emission network (Lerner et al., 2001), shown in Figure 4(a). Note that HLBP is applicable to more general hybrid models; we choose these networks only for comparison purposes. To evaluate how well HLBP performs, we discretized the ranges of the continuous variables into 50 intervals and then calculated the Hellinger's distance (Kokolakis and Nanopoulos, 2001) between the results of HLBP and the exact solutions obtained by a massive amount of computation (likelihood weighting with 100M samples) as the error for HLBP. All results reported are the average of 50 runs of the experiments.
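The error metric is easy to state in code. This is the standard discrete Hellinger distance, assuming that is the form applied to the 50-interval discretization (the Kokolakis and Nanopoulos reference defines the metric):

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete distributions given as
    equal-length probability vectors; ranges from 0 (identical) to 1
    (disjoint support)."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2
                         for a, b in zip(p, q))) / math.sqrt(2.0)
```

Its bounded [0, 1] range makes errors comparable across the different experimental settings reported below.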

6.1 Parameter Selection

HLBP has several tunable parameters: the number of samples for estimating messages (number of message samples), and the number of samples for the integration in Equation 3 (number of integration samples). The most dramatic influence on precision comes from the number of message samples, as shown in Figure 4(b). Counterintuitively, the number of integration samples does not have as big an impact as we might think (see Figure 4(c), with 1K message samples). The reason, we believe, is that when we draw a lot of samples for messages, the precision of each sample becomes less critical. In our experiments, we set the number of message samples to 1,000 and the number of integration samples to 12. For the EM algorithm for estimating MGs, we typically set the regularization constant for preventing overfitting to 0.8, the stopping likelihood threshold to 0.001, and the number of components in the mixtures of Gaussians to 2.

We also compared the performance of two samplers for the product of MGs: the Gibbs sampler (Sudderth et al., 2003) and the importance sampler in Section 3. As we can see from Figure 4(b,c), when the number of message samples is small, the Gibbs sampler has a slight advantage over the importance sampler. As the num-

322 C. Yuan and M. J. Druzdzel


[Figure 4 graphics omitted. Panel (a) shows the emission network structure, with nodes including Metal Waste, Efficiency, Waste Type, Filter State, Burning Regime, CO2 Emission, Metal Emission, Dust Emission, Light Penetrability, and the sensor nodes Metal Sensor, Dust Sensor, and CO2 Sensor. Panels (b,c) plot Hellinger's distance against the number of message samples and the number of message integration samples for the importance and Gibbs samplers. Panel (d) plots the estimated posterior, the exact posterior, and the normal approximation for DustEmission. Panels (e-h) plot Hellinger's distance and running time against the propagation length for HLBP and Lazy HLBP.]

Figure 4: (a) Emission network (without dashed nodes) and augmented emission network (with dashed nodes). (b,c) The influence of the number of message samples and the number of message integration samples on the precision of HLBP on the augmented emission network, with CO2Sensor and DustSensor both observed to be true and Penetrability to be 0.5. (d) Posterior probability distribution of DustEmission when observing CO2Emission to be −1.3, Penetrability to be 0.5, and WasteType to be 1 in the emission network. (e,f,g,h) Results of HLBP and Lazy HLBP: (e) error on emission, (f) running time on emission, (g) error on augmented emission, (h) running time on augmented emission.

As the number of message samples increases, the difference becomes negligible. Since the importance sampler is much more efficient, we use it in all our other experiments.

6.2 Results on Emission Networks

We note that mean and variance alone provide only limited information about the actual posterior probability distributions. Figure 4(d) shows the posterior probability distribution of the node DustEmission when observing CO2Emission at −1.3, Penetrability at 0.5, and WasteType at 1. We also plot in the same figure the corresponding normal approximation, with mean 3.77 and variance 1.74. We see that the normal approximation does not reflect the true posterior: while the actual posterior distribution has a multimodal shape, the normal approximation does not tell us where the mass really is. We also report the estimated posterior probability distribution of DustEmission by HLBP. HLBP seemed able to estimate the shape of the actual distribution very accurately.

In Figures 4(e,f), we plot the error curves of HLBP and Lazy HLBP (HLBP enhanced by Lazy LBP) as a function of the propagation length. We can see that HLBP needs only several steps to converge. Furthermore, HLBP achieves better precision than its lazy version, but Lazy HLBP is much more efficient than HLBP. Theoretically, Lazy LBP should not affect the results of HLBP but only improve its efficiency, provided the messages are calculated exactly. However, we use importance sampling to estimate the messages, and since we use the messages from the last iteration as the importance functions, additional iterations help improve those functions.

We also tested the HLBP algorithm on the augmented emission network (Lerner et al., 2001), with CO2Sensor and DustSensor both observed to be true and Penetrability to be 0.5, and report the results in Figures 4(g,h). We again observed that not many iterations are needed for HLBP to converge. In this case, Lazy HLBP provides comparable results while improving the efficiency of HLBP.

Hybrid Loopy Belief Propagation 323


7 Conclusion

The contribution of this paper is two-fold. First, we propose the Hybrid Loopy Belief Propagation (HLBP) algorithm. The algorithm is general enough to deal with general hybrid Bayesian networks that contain mixtures of discrete and continuous variables, may represent linear or nonlinear equations and arbitrary probability distributions, and naturally accommodates the scenario where discrete variables have continuous parents. Another advantage is that it approximates the true posterior distributions. Second, we propose an importance sampler to sample from the product of MGs. Its accuracy is comparable to the Gibbs sampler in (Sudderth et al., 2003) but it is much more efficient given the same number of samples. We anticipate that, just like LBP, HLBP will work well for many practical models and can serve as a promising approximate method for hybrid Bayesian networks.

Acknowledgements

This research was supported by the Air Force Office of Scientific Research grants F49620–03–1–0187 and FA9550–06–1–0243 and by Intel Research. We thank anonymous reviewers for several insightful comments that led to improvements in the presentation of the paper. All experimental data have been obtained using SMILE, a Bayesian inference engine developed at the Decision Systems Laboratory and available at http://genie.sis.pitt.edu/.

References

B. R. Cobb and P. P. Shenoy. 2005. Hybrid Bayesian networks with linear deterministic variables. In Proceedings of the 21st Annual Conference on Uncertainty in Artificial Intelligence (UAI-05), pages 136–144. AUAI Press, Corvallis, Oregon.

A. P. Dempster, N. M. Laird, and D. B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, B39:1–38.

C. T. Kelley. 2003. Solving Nonlinear Equations with Newton's Method.

G. Kokolakis and P. H. Nanopoulos. 2001. Bayesian multivariate micro-aggregation under the Hellinger's distance criterion. Research in Official Statistics, 4(1):117–126.

D. Koller, U. Lerner, and D. Angelov. 1999. A general algorithm for approximate inference and its application to hybrid Bayes nets. In Proceedings of the Fifteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-99), pages 324–333. Morgan Kaufmann Publishers Inc.

S. L. Lauritzen and D. J. Spiegelhalter. 1988. Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society, Series B (Methodological), 50(2):157–224.

S. L. Lauritzen. 1992. Propagation of probabilities, means and variances in mixed graphical association models. Journal of the American Statistical Association, 87:1098–1108.

U. Lerner, E. Segal, and D. Koller. 2001. Exact inference in networks with discrete children of continuous parents. In Proceedings of the Seventeenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-01), pages 319–328. Morgan Kaufmann Publishers Inc.

S. Moral, R. Rumi, and A. Salmeron. 2001. Mixtures of truncated exponentials in hybrid Bayesian networks. In Proceedings of the Fifth European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU-01), pages 156–167.

K. Murphy, Y. Weiss, and M. Jordan. 1999. Loopy belief propagation for approximate inference: An empirical study. In Proceedings of the Fifteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-99), pages 467–475, San Francisco, CA. Morgan Kaufmann Publishers.

K. P. Murphy. 1999. A variational approximation for Bayesian networks with discrete and continuous latent variables. In Proceedings of the Fifteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-99), pages 457–466. Morgan Kaufmann Publishers Inc.

E. Sudderth, A. Ihler, W. Freeman, and A. Willsky. 2003. Nonparametric belief propagation. In Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR-03).

M. Tanner. 1993. Tools for Statistical Inference: Methods for Exploration of Posterior Distributions and Likelihood Functions. Springer, New York.



Probabilistic Independence of Causal Influences

Adam Zagorecki
Engineering Systems Department
Defence Academy of the UK
Cranfield University
Shrivenham, UK

Marek Druzdzel
Decision Systems Laboratory
School of Information Sciences
University of Pittsburgh
Pittsburgh, PA 15260, USA

Abstract

One practical problem with building large scale Bayesian network models is the exponential growth of the number of numerical parameters in conditional probability tables. Obtaining a large number of probabilities from domain experts is too expensive and too time-demanding in practice. A widely accepted solution to this problem is the assumption of independence of causal influences (ICI), which allows for parametric models that define conditional probability distributions using only a number of parameters that is linear in the number of causes. ICI models, such as the noisy-OR and the noisy-AND gates, have been widely used by practitioners. In this paper we propose PICI, probabilistic ICI, an extension of the ICI assumption that leads to more expressive parametric models. We provide examples of three PICI models and demonstrate how they can cope with a combination of positive and negative influences, something that is hard for the noisy-OR and noisy-AND gates.

1 INTRODUCTION

Bayesian networks (Pearl, 1988) have proved their value as a modeling tool in many distinct domains. One of the most serious bottlenecks in applying this modeling tool is the costly process of creating the model. Although practically applicable algorithms for learning BNs from data are available (Heckerman, 1999), there are still many applications that lack quality data and require using an expert's knowledge to build models.

One of the most serious problems related to building practical models is the large number of conditional probability distributions that need to be specified when a node in the graph has a large number of parent nodes. In the case of discrete variables, which we assume in this paper, conditional probabilities are encoded in the form of conditional probability tables (CPTs) that are indexed by all possible combinations of parent states. For example, 15 binary parent variables result in over 32,000 parameters, a number that is impossible to specify in a direct

manner. There are two qualitatively different approaches to avoiding the problem of specifying large CPTs. The first is to exploit internal structure within a CPT, which basically means encoding symmetries in CPTs efficiently (Boutilier et al., 1996; Boutilier et al., 1995). The other is to assume some model of interaction among causes (parent influences) that defines the effect's (child node's) CPT. The most popular class of models in this category is based on the concept known as causal independence or independence of causal influences (ICI) (Heckerman and Breese, 1994), which we describe in Section 2. These two approaches should be viewed as complementary, because they are able to capture two distinct types of interactions between causes.

In practical applications, the noisy-OR (Good, 1961; Pearl, 1988) model, together with its extension capable of handling multi-valued variables, the noisy-MAX (Henrion, 1989), and the complementary models, the noisy-AND/MIN (Díez and Druzdzel, 2002), are the most often applied ICI models.


One of the obvious limitations of these models is that they capture only a small set of patterns of interactions among causes; in particular, they do not allow for combining both positive and negative influences. In this paper we introduce an extension to independence of causal influences which allows us to define models that capture both positive and negative influences. We believe that the new models are of practical importance, as practitioners with whom we have had contact often express a need for conditional distribution models that allow for a combination of promoting and inhibiting causes.

The problem of insufficient expressive power of the ICI models has been recognized by practitioners, and we are aware of at least two attempts to propose models that are based on the ICI idea but are not strictly ICI models. The first of them is the recursive noisy-OR (Lemmer and Gossink, 2004), which allows the specification of interactions among parent causes but is still limited to only positive influences. The other interesting proposal is the CAST logic (Chang et al., 1994; Rosen and Smith, 1996), which allows for combining both positive and negative influences in a single model but detaches from a probabilistic interpretation of the parameters; consequently, it leads to difficulties with their interpretation and cannot be exploited to speed up inference.

The remainder of this paper is organized as follows. In Section 2, we briefly describe independence of causal influences. In Section 3 we discuss the amechanistic property of the ICI models, while in Section 4 we introduce an extension of ICI, a new family of models called probabilistic independence of causal influences. In the following two Sections 5 and 6, we introduce two examples of models for local probability distributions that belong to the new family. Finally, we conclude our paper with a discussion of our proposal in Section 7.

2 INDEPENDENCE OF CAUSAL INFLUENCES (ICI)

In this section we briefly introduce the concept of independence of causal influences (ICI).

Figure 1: General form of independence of causal interactions.

First, however, we introduce the notation used throughout the paper. We denote an effect (child) variable as Y and its n parent variables as X = {X_1, ..., X_n}. We denote the states of a variable with lower case, e.g., X = x. When it is unambiguous, we use x instead of X = x.

In the ICI model, the interaction between variables X_i and Y is defined by means of (1) the mechanism variables M_i, introduced to quantify the influence of each cause on the effect separately, and (2) the deterministic function f that maps the outputs of the M_i into Y. Formally, the causal independence model is a model for which two independence assertions hold: (1) for any two mechanism variables M_i and M_j (i ≠ j), M_i is independent of M_j given X_1, ..., X_n, and (2) M_i and any other variable in the network that does not belong to the causal mechanism are independent given X_1, ..., X_n and Y. An ICI model is shown in Figure 1.

The most popular example of an ICI model is the noisy-OR model. The noisy-OR model assumes that all variables involved in the interaction are binary. The mechanism variables in the context of the noisy-OR are often referred to as inhibitors. The inhibitors have the same range as Y and their CPTs are defined as follows:

$$P(M_i = y \mid X_i = x_i) = p_i$$
$$P(M_i = y \mid X_i = \bar{x}_i) = 0 . \quad (1)$$

The function f that combines the individual influences is the deterministic OR. It is important



to note that the domain of the function defining the individual influences is the outcomes (states) of Y (each mechanism variable maps Range(X_i) to Range(Y)). This means that f is of the form Y = f(M_1, ..., M_n), where typically all variables M_i and Y take values from the same set. In the case of the noisy-OR model it is {y, ȳ}. The noisy-MAX model is an extension of the noisy-OR model to multi-valued variables where the combination function is the deterministic MAX defined over Y's outcomes.
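As a concrete illustration, the CPT implied by Equation 1 together with the deterministic OR can be tabulated directly: Y is false only if every active cause is inhibited. The sketch below is our own (function name and data layout are hypothetical, not from the paper):

```python
from itertools import product

def noisy_or_cpt(p):
    """Return P(Y = y | x1..xn) for every combination of binary causes.

    Following Eq. 1, an active cause Xi = xi turns Y on with probability
    pi (its inhibitor fires with probability 1 - pi); an inactive cause
    Xi = x_bar_i has no effect. The combination function f is a
    deterministic OR over the mechanism outputs."""
    n = len(p)
    cpt = {}
    for x in product([False, True], repeat=n):
        # Y stays off only if every active cause is inhibited.
        p_off = 1.0
        for xi, pi in zip(x, p):
            if xi:
                p_off *= 1.0 - pi
        cpt[x] = 1.0 - p_off
    return cpt

cpt = noisy_or_cpt([0.8, 0.6])
assert cpt[(False, False)] == 0.0             # all causes distinguished
assert abs(cpt[(True, True)] - 0.92) < 1e-9   # 1 - 0.2 * 0.4
```

Note how the two mechanism parameters fully determine all 2^n CPT entries, which is exactly the parameter saving that makes ICI models attractive.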

3 AMECHANISTIC PROPERTY

The amechanistic property of the causal interaction models was explicated by Breese and Heckerman, originally under the name atemporal (Heckerman, 1993), although later the authors changed the name to amechanistic (Heckerman and Breese, 1996). The amechanistic property of ICI models relates to a major problem of this proposal, namely the problem of determining the semantic meaning of causal mechanisms. In practice, it is often impossible to say anything about the nature of causal mechanisms (as they can often be simply artificial constructs for the sake of modeling) and therefore they, or their parameters, cannot be specified explicitly. Even though Heckerman and Breese proposed a strict definition of the amechanistic property, in this paper we broaden this definition and assume that an ICI model is an amechanistic model when its parameterization can be defined exclusively by means of (a subset of) the conditional probabilities P(Y|X) without mentioning the M_i explicitly. This removes the burden of defining mechanisms directly.

To achieve this goal it is assumed that one of the states of each cause X_i is a special state (also referred to as the distinguished state). Usually such a state is the normal state, like ok in hardware diagnostic systems or absent for a disease in a medical system, but such an association depends on the modeled domain. We will use the * symbol to denote the distinguished state. Given that all causes X_i are in their distinguished states, the effect variable Y is guaranteed to be in its distinguished state. The idea

is to allow for easy elicitation of the parameters of the intermediate nodes M_i, even though these cannot be observed directly. This is achieved through a particular way of setting (controlling) the causes X_i. Assuming that all causes except for a single cause X_i are in their distinguished states and X_i is in some state (not distinguished), it is easy to determine the probability distribution of the hidden variable M_i.

Not surprisingly, the noisy-OR model is an example of an amechanistic model. In this case, the distinguished states are usually the false or absent states, because the effect variable is guaranteed to be in the distinguished state given that all the causes are in their distinguished states X_i = x̄_i. Equation 1 reflects the amechanistic assumption and results in the fact that the parameters of the mechanisms (inhibitors) can be obtained directly as the conditional probabilities P(Y = y | x̄_1, ..., x̄_{i-1}, x_i, x̄_{i+1}, ..., x̄_n). Similarly, the noisy-MAX is an amechanistic model: it assumes that each parent variable has a distinguished state (arbitrarily selected) and the effect variable has a distinguished state. In the case of the effect variable, the distinguished state is assumed to be the lowest state (with respect to the ordering relation imposed on the effect variable's states). We strongly believe that the amechanistic property is highly significant from the point of view of knowledge acquisition. Even though we introduce a parametric model instead of a traditional CPT, the amechanistic property causes the parametric model to be defined in terms of a conditional probability distribution P(Y|X) and, therefore, it is conceptually consistent with the BN framework. We believe that the specification of a parametric model in terms of probabilities has contributed to the great popularity of the noisy-OR model.

4 PROBABILISTIC ICI

The combination function in the ICI models is defined as a mapping of the mechanisms' states into the states of the effect variable Y. Therefore, it can be written as Y = f(M), where M is a vector of mechanism variables. Let Q_i be the set of parameters of the CPT of node M_i, and



Q = {Q_1, ..., Q_n} be the set of all parameters of all mechanism variables. Now we define the new family, probabilistic independence of causal interactions (PICI), for local probability distributions. A PICI model for the variable Y consists of (1) a set of n mechanism variables M_i, where each variable M_i corresponds to exactly one parent X_i and has the same range as Y, and (2) a combination function f that transforms a set of probability distributions Q_i into a single probability distribution over Y. The mechanisms M_i in the PICI obey the same independence assumptions as in the ICI. The PICI family is defined in a way similar to the ICI family, with the exception of the combination function, which is defined in the form P(Y) = f(Q, M). The PICI family includes the ICI models, which can easily be seen from its definition, as f(M) is a special case of f(Q, M) with Q the empty set.

In other words, in the case of ICI, for a given instantiation of the states of the mechanism variables, the state of Y is a function of the states of the mechanism variables, while for PICI the distribution over Y is a function of the states of the mechanism variables and some parameters Q.

Heckerman and Breese (Heckerman, 1993) identified other forms (or rather properties) of the ICI models that are interesting from the practical point of view. We would like to note that those forms (decomposable, multiply decomposable, and temporal ICI) are related to properties of the function f, and can be applied to the PICI models in the same way as they are applied to the ICI models.

5 NOISY-AVERAGE MODEL

In this section, we propose a new local distribution model that is a PICI model. Our goal is to propose a model that (1) is convenient for knowledge elicitation from human experts by providing a clear parameterization, and (2) is able to express interactions that are impossible to capture by other widely used models (like the noisy-MAX model). With this model we are interested in modeling positive and

Figure 2: BN model for probabilistic independence of causal interactions, where P(Y|M) = f(Q, M).

negative influences on an effect variable that has a distinguished state in the middle of the scale.

We assume that the parent nodes X_i are discrete (they need not be binary, nor is an ordering relation between states required), and each of them has one distinguished state, which we denote as x_i^*. The distinguished state is not a property of a parent variable, but rather a part of the definition of a causal interaction model: a variable that is a parent in two causal independence models may have different distinguished states in each of these models. The effect variable Y also has its distinguished state, and by analogy we denote it by y^*. The range of the mechanism variables M_i is the same as the range of Y. Unlike in the noisy-MAX model, the distinguished state may be in the middle of the scale.

In terms of the parameterization of the mechanisms, the only constraint on the distribution of M_i conditional on X_i = x_i^* is:

$$P(M_i = m_i^* \mid X_i = x_i^*) = 1$$
$$P(M_i \neq m_i^* \mid X_i = x_i^*) = 0 , \quad (2)$$

while the other parameters in the CPT of M_i can take arbitrary values.

The definition of the CPT for Y is a key element of our proposal. In the ICI models, the CPT for Y was by definition constrained to be a deterministic function, mapping states of the M_i to states of Y. In our proposal, we define the CPT of Y to be a function of the probabilities



of the M_i:

$$P(y \mid \mathbf{x}) = \begin{cases} \prod_{i=1}^{n} P(M_i = y^* \mid x_i) & \text{for } y = y^* \\ \frac{\alpha}{n} \sum_{i=1}^{n} P(M_i = y \mid x_i) & \text{for } y \neq y^* \end{cases}$$

where α is a normalizing constant discussed below. For simplicity of notation, assume that q_i^j = P(M_i = y_j | x_i) and q_i^* = P(M_i = y^* | x_i) (so that q_i^{j*} = q_i^*), and let D = ∏_{i=1}^{n} P(M_i = y^* | x_i). Then we can write:

$$\sum_{j=1}^{m_y} P(y_j \mid \mathbf{x}) = D + \sum_{j=1,\, j \neq j^*}^{m_y} \frac{\alpha}{n} \sum_{i=1}^{n} q_i^j = D + \frac{\alpha}{n} \sum_{i=1}^{n} (1 - q_i^*) ,$$

where m_y is the number of states of Y. Since the sum on the left must equal 1, as it defines the probability distribution P(Y | x), we can calculate α as:

$$\alpha = \frac{n(1 - D)}{\sum_{i=1}^{n} (1 - q_i^*)} .$$

The definition above does not define P(Y|M) but rather P(Y|X). It is possible to calculate P(Y|M) from P(Y|X), though it is not needed to use the model. Now we discuss how to obtain the probabilities P(M_i|X_i). Using the definition of P(y|x) and the amechanistic property, this task amounts to obtaining the probabilities of Y given that X_i is in its non-distinguished state and all other causes are in their distinguished states (in a very similar way to how the noisy-OR parameters are obtained). P(y|x) in this case takes the form:

$$P(Y = y \mid x_1^*, \ldots, x_{i-1}^*, x_i, x_{i+1}^*, \ldots, x_n^*) = P(M_i = y \mid x_i) , \quad (3)$$

and, therefore, it defines an easy and intuitive way of parameterizing the model by just asking for conditional probabilities, in a very similar way to the noisy-OR model. It is easy to notice that P(Y = y^* | x_1^*, ..., x_i^*, ..., x_n^*) = 1, which poses a constraint that may be unacceptable from a modeling point of view. We can address this limitation in a very similar way to the noisy-OR model, by assuming a dummy variable X_0 (often referred to as the leak), which stands for all unmodeled causes and is assumed to always be in some state x_0. The leak probabilities are obtained using:

$$P(Y = y \mid x_1^*, \ldots, x_n^*) = P(M_0 = y) .$$

However, this slightly complicates the schema for obtaining the parameters P(M_i = y|x_i). In the case of the leaky model, the equality in Equation 3 does not hold, since X_0 acts as a regular parent variable that is in a non-distinguished state. Therefore, the parameters for the other mechanism variables should be obtained using the conditional probabilities P(Y = y | x_1^*, ..., x_i, ..., x_n^*), P(M_0 = y|x_0), and the combination function. This implies that the acquired probabilities should fulfil some non-trivial constraints. Because of space limitations, we skip the discussion of these constraints. In a nutshell, they are similar in nature to the constraints for the leaky noisy-MAX model. These constraints should not be a problem in practice when P(M_0 = y^*) is large (which implies that the leak cause has only marginal influence on the non-distinguished states).

Now we introduce an example of the application of the new model. Imagine a simple diagnostic model for an engine cooling system. The pressure sensor reading (S) can be in one of three states, {hi, nr, lo} (high, normal, or low), that correspond to the pressure in a hose. The two possible faults included in our model are pump failure (P) and crack (C). The pump can malfunction in two distinct ways: work non-stop instead of adjusting its speed, or simply fail and not work at all. The states for pump failure are {ns, fa, ok}. For simplicity we assume that the crack on the hose can be present or absent, {pr, ab}. The BN for this problem is presented in Figure 3. The noisy-MAX model is not appropriate here, because the distinguished state of the effect variable (S) does not correspond to the lowest value in the ordering relation. In other words, the neutral value is not one of the extremes but lies in the middle, which makes the use of the MAX function over the states inappropriate. To apply the noisy-average model, first we should identify the distinguished states of the variables.



Figure 3: BN model for the pump example.

In our example, they will be: normal for the sensor reading, ok for pump failure, and absent for crack. The next step is to decide whether we should add an influence of non-modeled causes on the sensor (a leak probability). If such an influence is not included, this implies that P(S = nr* | P = ok*, C = ab*) = 1; otherwise this probability can take arbitrary values from the range (0, 1], though in practice it should always be close to 1.

Assuming that the influence of non-modeled causes is not included, the acquisition of the mechanism parameters is performed directly by asking for conditional probabilities of the form P(Y | x_1^*, ..., x_i, ..., x_n^*). In that case, a typical question asked of an expert would be: What is the probability of the sensor being in the low state, given that a crack was observed but the pump is in state ok? However, if the unmodeled influences were significant, an adjustment for the leak probability would be needed. Having obtained all the mechanism parameters, the noisy-average model specifies a conditional probability in a CPT by means of the combination function.

Intuitively, the noisy-average combines the various influences by averaging probabilities. In the case where all active influences (the parents in non-distinguished states) imply a high probability of one value, this value will have a high posterior probability, and a synergetic effect will take place, similarly to the noisy-OR/MAX models. If the active parents 'vote' for different effect states, the combined effect will be an average of the individual influences. Moreover, the noisy-average model is a decomposable model: the CPT of Y can be decomposed into pairwise relations (Figure 4), and such a decomposition can be exploited for the sake of improved inference speed in the same way as for decomposable ICI models.

Figure 4: Decomposition of a combination function.

It is important to note that the noisy-average model does not take into account the ordering of states (only the distinguished state is treated in a special manner). If two causes imply high probabilities of high and low pressure, one should not expect the combined effect to concentrate probability on the normal state (the value in-between).

6 AVERAGE MODEL

Another example of a PICI model we want to present is a model that averages the influences of the mechanisms. This model highlights another property of the PICI models that is important in practice. If we look at the representation of a PICI model, we see that the size of the CPT of node Y is exponential in the number of mechanisms (or causes). Hence, in the general case it does not guarantee a small number of distributions. One solution is to define a combination function that can be expressed explicitly in the form of a BN, but in such a way that it has significantly fewer parameters. In the case of the ICI models, the decomposability property (Heckerman and Breese, 1996) served this purpose, and it can do so for the PICI models too. This property allows for significant speed-ups in inference.

In the average model, the probability distribution over Y given the mechanisms is basically the ratio of the number of mechanisms that are in a given state to the total number of mechanisms (by definition Y and the M_i have the



same range):

$$P(Y = y \mid M_1, \ldots, M_n) = \frac{1}{n} \sum_{i=1}^{n} I(M_i = y) . \quad (4)$$

where I is the indicator function. Basically, this combination function says that the probability of the effect being in state y is the ratio of the mechanisms that result in state y to all mechanisms. Please note that the definition of how a cause X_i results in the effect is given by the probability distribution P(M_i|X_i). The pairwise decomposition can be done as follows:

$$P(Y_i = y \mid Y_{i-1} = a, M_{i+1} = b) = \frac{i}{i+1} I(y = a) + \frac{1}{i+1} I(y = b) ,$$

for Y_2, ..., Y_n. Y_1 is defined as:

$$P(Y_1 = y \mid M_1 = a, M_2 = b) = \frac{1}{2} I(y = a) + \frac{1}{2} I(y = b) .$$

The fact that the combination function is decomposable may easily be exploited by inference algorithms. Additionally, we have shown that this model presents benefits for learning from small data sets (Zagorecki et al., 2006).

Theoretically, this model is amechanistic, because it is possible to obtain the parameters of this model (the probability distributions over the mechanism variables) by asking an expert only for probabilities of the form P(Y|X). For example, assuming the variables in the model are binary, we have 2n parameters in the model. It would be enough to select 2n arbitrary probabilities P(Y|X) out of the 2^n entries and create a set of 2n linear equations by applying Equation 4. In practice, though, one needs to ensure that the set of equations has exactly one solution, which in the general case is not guaranteed.

As an example, let us assume that we want to model the classification of threats at a military checkpoint. A terrorist threat is expected at that location, and there are particular elements of behavior that can help spot a terrorist. We can expect that a terrorist may approach the checkpoint in a large vehicle, be the only person in the vehicle, attempt the attack at rush hour or at a time when security is less strict, etc. No single one of these behaviors is necessarily a strong indicator of terrorist activity, but several of them occurring at the same time may indicate a possible threat.

The average model can be used to model this situation as follows: separately for each suspicious activity (cause), a probability distribution of terrorist presence given that activity is obtained, which basically amounts to specifying the probability distributions of the mechanisms. The combination function of the average model then acts as "popular voting" to determine P(Y|X).
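A minimal sketch of this elicitation scheme follows; all activity names and probability values are invented for illustration, and all three causes are assumed instantiated to "present", so each mechanism contributes its elicited vote:

```python
# Hypothetical elicited values: P(threat | single activity observed), i.e.
# the mechanism distributions P(M_i | X_i) obtained one cause at a time.
p_threat_given = {
    'large_vehicle':   0.10,
    'single_occupant': 0.05,
    'rush_hour':       0.08,
}

# "Popular voting": Eq. (4), marginalized over the mechanisms, averages the
# individual votes P(M_i = threat | X_i) of the instantiated causes.
observed = ['large_vehicle', 'single_occupant', 'rush_hour']
p_threat = sum(p_threat_given[a] for a in observed) / len(observed)
print(round(p_threat, 4))   # 0.0767
```

Note that, unlike a noisy-OR, the average cannot exceed the strongest single vote; what it offers instead is a parameterization the expert can supply one cause at a time.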

The average model draws on ideas from the linear models but, unlike the linear models, the linear sum is taken over probabilities (as it is a PICI model), and it has explicit hidden mechanism variables that define the influence of each single cause on the effect.

7 CONCLUSIONS

In this paper, we formally introduced a new class of models for local probability distributions, called PICI. The new class is an extension of the widely accepted concept of independence of causal influences. The basic idea is to relax the assumption that the combination function should be deterministic. We claim that such an assumption is necessary neither for the clarity of the models and their parameters, nor for other aspects such as convenient decompositions of the combination function that can be exploited by inference algorithms.

To support our claim, we presented two conceptually distinct models for local probability distributions that address different limitations of existing models based on ICI. These models have clear parameterizations that facilitate their use by human experts. The proposed models can be directly exploited by inference algorithms, due to the fact that they can be explicitly represented by means of a BN and that their combination function can be decomposed into a chain of binary relationships; this property has been recognized to provide significant inference speed-ups for ICI models. Finally, because they can be represented in the form of hidden variables, their parameters can be learned using the EM algorithm.

Probabilistic Independence of Causal Influences 331

We believe that the concept of PICI may lead to new models not described here. One remark we shall make here: it is important that new models be explicitly expressible in terms of a BN. If a model does not allow for a compact representation and needs to be expanded into a CPT for inference purposes, it undermines a significant benefit of models for local probability distributions: a way to avoid using large conditional probability tables.

Acknowledgements

This research was supported by the Air Force Office of Scientific Research grants F49620–03–1–0187 and FA9550–06–1–0243 and by Intel Research. While we take full responsibility for any errors in the paper, we would like to thank the anonymous reviewers for several insightful comments that led to improvements in the presentation of the paper, Tsai-Ching Lu for inspiration, invaluable discussions and critical comments, and Ken McNaught for useful comments on the draft.

References

C. Boutilier, R. Dearden, and M. Goldszmidt. 1995. Exploiting structure in policy construction. In Chris Mellish, editor, Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, pages 1104–1111, San Francisco. Morgan Kaufmann.

C. Boutilier, N. Friedman, M. Goldszmidt, and D. Koller. 1996. Context-specific independence in Bayesian networks. In Proceedings of the 12th Annual Conference on Uncertainty in Artificial Intelligence (UAI–96), pages 115–123, San Francisco, CA. Morgan Kaufmann Publishers.

K. C. Chang, P. E. Lehner, A. H. Levis, A. K. Zaidi, and X. Zhao. 1994. On causal influence logic. Technical Report for Subcontract no. 26-940079-80.

F. J. Díez and M. J. Druzdzel. 2002. Canonical probabilistic models for knowledge engineering. Forthcoming.

I. Good. 1961. A causal calculus (I). British Journal of Philosophy of Science, 11:305–318.

D. Heckerman and J. Breese. 1994. A new look at causal independence. In Proceedings of the Tenth Annual Conference on Uncertainty in Artificial Intelligence (UAI–94), pages 286–292, San Francisco, CA. Morgan Kaufmann Publishers.

D. Heckerman and J. Breese. 1996. Causal independence for probability assessment and inference using Bayesian networks. IEEE Transactions on Systems, Man, and Cybernetics, 26:826–831.

D. Heckerman. 1993. Causal independence for knowledge acquisition and inference. In Proceedings of the Ninth Conference on Uncertainty in Artificial Intelligence (UAI–93), pages 122–127, Washington, DC. Morgan Kaufmann Publishers.

D. Heckerman. 1999. A tutorial on learning with Bayesian networks. In Learning in Graphical Models, pages 301–354, Cambridge, MA, USA. MIT Press.

M. Henrion. 1989. Some practical issues in constructing belief networks. In L. N. Kanal, T. S. Levitt, and J. F. Lemmer, editors, Uncertainty in Artificial Intelligence 3, pages 161–173. Elsevier Science Publishing Company, Inc., New York, NY.

J. F. Lemmer and D. E. Gossink. 2004. Recursive noisy-OR: A rule for estimating complex probabilistic causal interactions. IEEE Transactions on Systems, Man and Cybernetics, 34(6):2252–2261.

J. Pearl. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers, Inc., San Mateo, CA.

J. A. Rosen and W. L. Smith. 1996. Influence net modeling and causal strengths: An evolutionary approach. In Command and Control Research and Technology Symposium.

A. Zagorecki, M. Voortman, and M. J. Druzdzel. 2006. Decomposing local probability distributions in Bayesian networks for improved inference and parameter learning. In FLAIRS Conference.



Author Index

Abellán, Joaquín, 1
Antal, Péter, 9, 17
Antonucci, Alessandro, 25
Bangsø, Olav, 35
Beltoft, Ivan, 231
Bilgiç, Taner, 239
Björkegren, Johan, 247
Bolt, Janneke H., 43, 51
Chen, Tao, 59
Cobb, Barry R., 67
Dessau, Ram, 231
Díez, Francisco J., 131, 179
Druzdzel, Marek J., 279, 317, 325
Feelders, Ad, 75
Flores, M. Julia, 83
François, Olivier C. H., 91
van der Gaag, Linda C., 51, 99, 107, 255
Gámez, José A., 83, 115, 123
Geenen, Petra L., 99
van Gerven, Marcel A. J., 131
Gézsi, András, 9
Gómez-Olmedo, Manuel, 1
Gómez-Villegas, Miguel A., 139
Gonzales, Christophe, 147
Hejlesen, Ole K., 231
Heskes, Tom, 163
Hoyer, Patrik O., 155
Hullám, Gábor, 9
Ibargüengoytia, Pablo, 263
Ivanovs, Jevgenijs, 75
Jaeger, Manfred, 215
Jensen, Finn V., 35
Jouve, Nicolas, 147
Jurgelenaite, Rasa, 163
Kerminen, Antti J., 155
Kwisthout, Johan, 171
Larrañaga, Pedro, 271
Leray, Philippe, 91, 195
Lozano, José A., 271
Luque, Manuel, 179
Maes, Sam, 195
Maín, Paloma, 139
Manderick, Bernard, 195
Martínez, Irene, 187
Mateo, Juan L., 115
Meganck, Stijn, 195
Millinghoffer, András, 9, 17
Moral, Serafín, 1, 83
Morales, Eduardo, 263
Morton, Jason, 207
Nielsen, Jens D., 215
Nielsen, Søren H., 223
Nielsen, Thomas D., 223
Nilsson, Roland, 247
Olesen, Kristian G., 231
Özgür-Ünlüakın, Demet, 239
Pachter, Lior, 207
Peña, José M., 247
Puerta, José M., 115
Renooij, Silja, 99, 255
Reyes, Alberto, 263
Rodríguez, Carmelo, 187
Rumí, Rafael, 123
Salmerón, Antonio, 123, 187
Santafé, Guzmán, 271
Shimizu, Shohei, 155
Shiu, Anne, 207
Smith, Jim, 293
Søndberg-Madsen, Nicolaj, 35
Sturmfels, Bernd, 207
Sucar, L. Enrique, 263
Sun, Xiaoxun, 279
Susi, Rosario, 139
Šimeček, Petr, 287
Tegnér, Jesper, 247
Tel, Gerard, 171
Thwaites, Peter, 293
Trangeled, Michael, 231
de Waal, Peter R., 107
Wang, Yi, 301
Wienand, Oliver, 207
Xiang, Yang, 309
Yuan, Changhe, 279, 317
Zaffalon, Marco, 25
Zagorecki, Adam, 325
Zhang, Nevin L., 59, 301


Title: Proceedings of the Third European Workshop on Probabilistic Graphical Models

Editors: Milan Studený and Jiří Vomlel

Published by: Zeithamlová Milena, Ing. - Agentura Action M
Vršovická 68, 101 00 Praha 10, Czech Republic
http://www.action-m.com

Number of Pages: 344
Number of Copies: 80
Year of Publication: 2006
Place of Publication: Prague
First Edition

Printed by: Reprostředisko UK MFF
Charles University in Prague - Faculty of Mathematics and Physics
Sokolovská 83, 186 75 Praha 8 - Karlín, Czech Republic

ISBN 80-86742-14-8