

Information Sciences 225 (2013) 18–34

Contents lists available at SciVerse ScienceDirect

Information Sciences

journal homepage: www.elsevier.com/locate/ins

Signal propagation in Bayesian networks and its relationship with intrinsically multivariate predictive variables

David C. Martins Jr. a,⇑, Evaldo A. de Oliveira b, Ulisses M. Braga-Neto e, Ronaldo F. Hashimoto c, Roberto M. Cesar Jr. c,d

a Center for Mathematics, Computation and Cognition, Federal University of ABC, R. Santa Adélia 166, Santo André, SP 09210-170, Brazil
b Department of Earth and Exact Sciences, Federal University of São Paulo, R. Arthur Ridel 275, Diadema, SP 09972-270, Brazil
c Institute of Mathematics and Statistics, University of São Paulo, R. do Matão 1010, São Paulo, SP 05508-090, Brazil
d Brazilian Bioethanol Science and Technology Laboratory, Campinas, SP 13083-970, Brazil
e Genomic Signal Processing Lab, Texas A&M University, College Station, TX 77843-3128, USA

Article info

Article history:
Received 1 June 2011
Received in revised form 9 October 2012
Accepted 14 October 2012
Available online 23 November 2012

Keywords:
Bayesian network
Feature selection
Intrinsically multivariate prediction

0020-0255/$ - see front matter © 2012 Elsevier Inc. All rights reserved.
http://dx.doi.org/10.1016/j.ins.2012.10.027

⇑ Corresponding author. E-mail address: [email protected] (D.C. Martins Jr.)

Abstract

A set of predictor variables is said to be intrinsically multivariate predictive (IMP) for a target variable if all properly contained subsets of the predictor set are poor predictors of the target but the full set predicts the target with great accuracy. In a previous article, the main properties of IMP Boolean variables have been analytically described, including the introduction of the IMP score, a metric based on the coefficient of determination (CoD) as a measure of predictiveness with respect to the target variable. It was shown that the IMP score depends on four main properties: logic of connection, predictive power, covariance between predictors and marginal predictor probabilities (biases). This paper extends that work to a broader context, in an attempt to characterize properties of discrete Bayesian networks that contribute to the presence of variables (network nodes) with high IMP scores. We have found that there is a relationship between the IMP score of a node and its territory size, i.e., its position along a pathway with one source: nodes far from the source display larger IMP scores than those closer to the source, and longer pathways display larger maximum IMP scores. This appears to be a consequence of the fact that nodes with small territory have larger probability of having highly covariate predictors, which leads to smaller IMP scores. In addition, a larger number of XOR and NXOR predictive logic relationships has positive influence over the maximum IMP score found in the pathway. This work presents analytical results based on a simple structure network and an analysis involving random networks constructed by computational simulations. Finally, results from a real Bayesian network application are provided.

© 2012 Elsevier Inc. All rights reserved.

1. Introduction

Bayesian networks [2,8] are a useful approach to modeling systems composed of components that communicate by local interaction, i.e., each component depends directly on a small number of elements. Biological systems, for instance, exhibit this property [4]. Bayesian networks are mathematically defined in terms of probabilities and conditional independence properties and can be employed to infer direct "causal" influence (connections between variables) [13,18,5,8,14].


The concept of intrinsically multivariate predictive (IMP) variables was introduced in [9]: a target variable strongly depends on a set of other variables, but this dependence is weak or absent when one considers properly contained subsets of those variables. The IMP score was introduced as a metric based on the coefficient of determination (CoD) [12,3] as a measure of predictiveness with respect to the target variable. It was shown in [9] that the IMP score of a target variable is affected by four properties: logic of prediction, predictive power, covariance between the predictors, and the marginal probabilities of each individual predictor. It was demonstrated that IMP variables (i.e., variables with large IMP score) tend to occur for large predictive power, small correlation between predictors, and certain specific predictor logics: 2-minterm logics (XOR and NXOR) lead to larger IMP scores than 1- and 3-minterm logics (AND, OR, NOR, NAND, $x_1 \wedge \bar{x}_2$ and $x_1 \vee \bar{x}_2$). Based on these results, we hypothesized that large proportions of nodes with XOR logic of prediction in a network could improve the chance of nodes with large IMP score appearing. We show in this paper that this is indeed the case: the larger the number of XOR/NXOR logics in the network, the larger the maximum IMP score in the network.

The study of the IMP phenomenon can be useful in feature selection for pattern recognition, since it is one of the main reasons for the occurrence of the nesting effect. The nesting effect is a feature selection issue that occurs when some features included in the partial subset solution by an algorithm are not present in the optimal solution but are never discarded, leading to a suboptimal solution [15]. Another application of IMP is that it seems to be associated with variables that possess canalizing functions [7,6,10], an important concept in Systems Biology, since canalizing genes play key roles in gene regulatory networks [16]. Martins et al. showed that the DUSP1 gene, a canalizing gene that exhibits control over a central, process-integrating signaling pathway, displays the largest number of IMP predictors in melanoma expression data [9]. In addition, Bayesian networks are often applied to financial risk analysis in order to model conditional multivariate dependence among variables [19,11].

In this paper, we analyze the intrinsically multivariate prediction phenomenon in networks with three or more nodes, an extension of the study presented in [9], which considers only one target and its set of predictors (two or three). In particular, we analyze how the territory size of a target node (a graph-theoretical property defined in Section 4) impacts the probability of occurrence of IMP nodes in Bayesian networks with Boolean variables. We show that a target with large territory can achieve larger IMP scores with its predictors than a target with small territory. This finding is in agreement with the hypothesis, advanced in [9], that subsets with high IMP score are more likely to be responsible for the regulation of several metabolic pathways or subsystems, as observed in microarray data analysis of melanoma experiments. We also show that the absolute value of the covariance between predictors is negatively correlated with the territory size. It is worth mentioning that, although these results are given in the context of logical functions, they can be easily extended to other types of functions. In summary, this paper contributes to theoretical advances in the analysis of the intrinsically multivariate prediction phenomenon in the context of Bayesian networks.

This work is organized as follows. Section 2 reviews fundamental concepts. Section 3 describes the network model used to analyze the IMP score behavior as a function of the territory size of a given target. Section 4 presents analytical results based on a simple structure network. In order to generalize the analytical results, Section 5 presents an analysis of the IMP score in random networks constructed by computational simulations, as well as a real example from the Bayesian networks Repository (http://www.cs.huji.ac.il/site/labs/compbio/Repository). Finally, conclusions are given in Section 6.

2. Background

2.1. Bayesian networks

Here we review fundamental concepts of Bayesian networks to aid the comprehension of the paper; for more details, see [4]. Let $X = \{X_1, X_2, \ldots, X_n\} \in D^n$ be a set of random variables, each one defined in a given domain $D$. A Bayesian network is a representation of the joint probability distribution of $X$ consisting of a directed acyclic graph (DAG) $G$ whose vertices are the random variables $X_1, \ldots, X_n$ and a component that describes a conditional probability distribution for each variable, given its parents in the graph. These two components describe a unique joint probability distribution on $X_1, \ldots, X_n$.

Such a graph allows a decomposition of the joint distribution that leads to a reduced number of parameters. Each variable is independent of its non-descendants, given its parents. The joint distribution can be decomposed into the product form:

\[
P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Pa}_G(X_i)), \tag{1}
\]

where $\mathrm{Pa}_G(X_i)$ is the set of parents of $X_i$ in $G$.

Considering $X_1, \ldots, X_n$ as Boolean variables (i.e., $X_i \in \{0,1\}$), the conditional probability $P(X_i \mid \mathrm{Pa}_G(X_i))$ can be represented by a table that specifies the probability of values for $X_i$ considering all possible observations (or configurations) of the values of $\mathrm{Pa}_G(X_i)$. So, if the set $\mathrm{Pa}_G(X_i)$ has size $k$, the number of rows in the table is $2^k$, where each row corresponds to a distribution of $X_i$ given a specific configuration of $\mathrm{Pa}_G(X_i)$.
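The factorization in Eq. (1) can be illustrated with a minimal Python sketch. The three-node network, the node names, the value p = 0.9 and the dictionary layout below are assumptions made only for illustration; they are not taken from the paper.

```python
from itertools import product

# Hypothetical 3-node Boolean network: x1 -> x2, and (x1, x2) -> x3.
# Each CPD maps a tuple of parent values to P(node = 1 | parents),
# so a node with k parents has 2**k rows.
parents = {"x1": (), "x2": ("x1",), "x3": ("x1", "x2")}
p = 0.9  # illustrative value, not from the paper
cpd = {
    "x1": {(): 0.5},
    "x2": {(0,): 1 - p, (1,): p},                 # x2 copies x1 with probability p
    "x3": {(a, b): p if a ^ b else 1 - p          # noisy XOR of its two parents
           for a, b in product((0, 1), repeat=2)},
}
order = ["x1", "x2", "x3"]  # a topological order of the DAG


def joint_probability(assignment):
    """Evaluate Eq. (1): the product over nodes of P(X_i | Pa_G(X_i))."""
    prob = 1.0
    for node in order:
        parent_values = tuple(assignment[parent] for parent in parents[node])
        p_one = cpd[node][parent_values]
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob


# Full joint table with 2**n rows, keyed by (x1, x2, x3).
joint = {
    values: joint_probability(dict(zip(order, values)))
    for values in product((0, 1), repeat=len(order))
}
```

The resulting table has 8 rows that sum to 1, while the CPDs themselves store only 1 + 2 + 4 = 7 numbers.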

2.2. Intrinsically multivariate prediction

Here we recall from Ref. [9] the fundamentals behind the intrinsically multivariate prediction concept. Let $X = \{x_1, x_2, \ldots, x_n\} \in \{0,1\}^n$ be a set of predictor random variables and $Y = y \in \{0,1\}$ be the target random variable. The coefficient of determination (CoD) of $X$ with respect to $Y$ [3] is given by:


\[
\mathrm{CoD}_Y(X) = 1 - \frac{E_Y(X)}{E_Y}, \tag{2}
\]

where $E_Y$ is the error of the best predictor of $Y$ in the absence of other observations and $E_Y(X)$ is the error of the best predictor of $Y$ based on the observation of $X$.

The intrinsically multivariate predictiveness (IMP score) is given by:

\[
\mathrm{IMP}_Y(X) = \mathrm{CoD}_Y(X) - \max_{Z \subsetneq X} \mathrm{CoD}_Y(Z), \tag{3}
\]

which is the difference between the CoD of the whole set of predictors and the maximum of the CoDs of the properly contained subsets. When this expression evaluates close to 0, it means that at least one properly contained subset of predictors is almost as good as the whole set of predictors, which is then not IMP. Otherwise, if this expression evaluates close to 1, it means that all properly contained predictor subsets are bad while the whole predictor set is very good, which is then said to be IMP.
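Eqs. (2) and (3) can be sketched in Python for a joint distribution given as an explicit table. The function names, the table layout and the noisy-XOR example with p = 0.9 are illustrative assumptions, and the sketch assumes a target that is not deterministic on its own (so that $E_Y > 0$).

```python
from itertools import combinations, product


def best_error(joint, target, predictors):
    """Error of the best predictor of `target` given `predictors`.

    `joint` maps full 0/1 assignments (tuples) to probabilities; `target`
    and `predictors` are positions in those tuples.
    """
    mass = {}  # predictor values -> [P(target = 0, values), P(target = 1, values)]
    for values, prob in joint.items():
        key = tuple(values[i] for i in predictors)
        mass.setdefault(key, [0.0, 0.0])[values[target]] += prob
    # For each configuration, the optimal guess misses the less likely value.
    return sum(min(pair) for pair in mass.values())


def cod(joint, target, predictors):
    """Coefficient of determination, Eq. (2). Assumes E_Y > 0."""
    e_y = best_error(joint, target, ())
    return 1.0 - best_error(joint, target, predictors) / e_y


def imp_score(joint, target, predictors):
    """IMP score, Eq. (3): full-set CoD minus the best proper-subset CoD.

    The empty subset is included; its CoD is 0, so it never raises the maximum
    above that of the non-empty proper subsets.
    """
    proper_subsets = [
        subset
        for size in range(len(predictors))
        for subset in combinations(predictors, size)
    ]
    return cod(joint, target, predictors) - max(
        cod(joint, target, subset) for subset in proper_subsets
    )


# Illustrative check: noisy XOR target with two independent, unbiased predictors.
p = 0.9  # assumed predictive power
joint = {
    (x1, x2, y): 0.25 * (p if y == (x1 ^ x2) else 1 - p)
    for x1, x2, y in product((0, 1), repeat=3)
}
score = imp_score(joint, target=2, predictors=(0, 1))
```

In this example each predictor alone has CoD 0 while the pair has CoD 1 - 0.1/0.5 = 0.8, so the IMP score is 0.8.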

Considering $X$ and $Y$ Boolean variables, we can define the logic of prediction, which corresponds to the table of outputs $Y \in \{0,1\}$ for every instance $X \in \{0,1\}^n$. It can be represented by a binary string. For example, considering two variables, the AND logic is represented by the string (0001), which means that if we observe $X \in \{(0,0), (0,1), (1,0)\}$ then $Y = 0$, and if $X = (1,1)$, then $Y = 1$. Thus, there are $2^4 = 16$ possible logics for the two-predictor case. In general, there are $2^{2^n}$ logics for the $n$-predictor case.

We can define three classes of logics with respect to the IMP score:

1. IMP logic: a logic of prediction is IMP if $\max_{Z \subsetneq X} \mathrm{CoD}_Y(Z) = 0$;
2. anti-IMP logic: a logic of prediction is anti-IMP if $\max_{Z \subsetneq X} \mathrm{CoD}_Y(Z) = 1$;
3. partly-IMP logic: a logic of prediction is partly-IMP if $0 < \max_{Z \subsetneq X} \mathrm{CoD}_Y(Z) < 1$.

As seen in [9], for the two-predictor case, there are 10 IMP logics: AND (0001), $x_1 \wedge \bar{x}_2$ (0010), $\bar{x}_1 \wedge x_2$ (0100), NOR (1000), OR (0111), $x_1 \vee \bar{x}_2$ (1011), $\bar{x}_1 \vee x_2$ (1101), NAND (1110), XOR (0110) and NXOR (1001). The first four are called 1-minterm logics,¹ the next four are called 3-minterm logics and the last two are called 2-minterm logics (XOR/NXOR). The remaining 6 logics are anti-IMP. There are partly-IMP logics for three or more predictors, but not for two predictors.
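The ten binary strings listed above coincide with the two-input Boolean functions whose output depends on both inputs; the other six are the two constants and the four single-variable functions. The following Python sketch enumerates them by that structural criterion. It is an illustration only and does not implement the CoD-based definition from [9]; the helper name `depends_on` is hypothetical.

```python
from itertools import product

INPUTS = list(product((0, 1), repeat=2))  # (0,0), (0,1), (1,0), (1,1)


def depends_on(table, position):
    """True if flipping the input at `position` can change the output."""
    for x in INPUTS:
        flipped = list(x)
        flipped[position] ^= 1
        if table[x] != table[tuple(flipped)]:
            return True
    return False


by_minterms = {}
for outputs in product((0, 1), repeat=4):       # all 2**4 = 16 truth tables
    table = dict(zip(INPUTS, outputs))
    if depends_on(table, 0) and depends_on(table, 1):
        code = "".join(map(str, outputs))       # e.g. "0001" for AND
        by_minterms.setdefault(sum(outputs), []).append(code)

for minterms, codes in sorted(by_minterms.items()):
    print(minterms, codes)
# 1 ['0001', '0010', '0100', '1000']
# 2 ['0110', '1001']
# 3 ['0111', '1011', '1101', '1110']
```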

The predictive power is defined by $p = 1 - E_Y(X)$, measuring the accuracy of the set of predictors in predicting the target behavior. This factor makes the prediction stochastic, so that the logics correspond to stochastic truth tables, where all output values are of the form $p$ or $1 - p$ for $\frac{1}{2} \le p \le 1$. In addition, due to this stochasticity, the IMP logics will generate partly IMP scores ($0 < \mathrm{IMP}_Y(X) < 1$), while the case $p = 1$ produces standard deterministic logics. Therefore, the present framework is a stochastic generalization of the classic Boolean case. For example, the stochastic AND logic can now be expressed by the following conditional probabilities of $Y = 1$ given $X$: $P(Y = 1 \mid X \neq (1,1)) = 1 - p$ and $P(Y = 1 \mid X = (1,1)) = p$.
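A stochastic truth table of this kind can be generated from any deterministic two-predictor logic. The sketch below is a minimal illustration; the function name `stochastic_cpd` and the value p = 0.9 are assumptions, not part of the paper.

```python
from itertools import product


def stochastic_cpd(logic, p):
    """P(Y = 1 | X) for every two-predictor configuration X.

    `logic` is a deterministic Boolean function F(x1, x2) returning 0 or 1.
    Each entry is p where F is 1 and 1 - p where F is 0, written compactly
    as 1 - p + (2p - 1) * F(x1, x2).
    """
    return {x: 1 - p + (2 * p - 1) * logic(*x) for x in product((0, 1), repeat=2)}


# Stochastic AND with an assumed predictive power of 0.9.
and_cpd = stochastic_cpd(lambda a, b: a & b, p=0.9)
```

With p = 1 the table reduces to the deterministic logic itself.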

3. Network model

Initially, it is necessary to decide which types of connections and restrictions should be considered to define the model of the networks to be generated. The model admits only 1- and 2-degree connections, i.e., each node has 1 or 2 predictors (except for the source node, which has no predictors). There are two reasons for not considering connections of degree larger than 2. First, even for 3-degree predictions, the number of possible logics is high enough to introduce confounding effects that hide some interesting network properties. Second, if we want to apply the approach in a situation where there is a small number of samples, as is often the case in certain medical applications, the correct estimation of the probability distribution of a large number of variables becomes very difficult. A third reason, in the specific case of biological networks, is that such systems are thought to be at the frontier between non-chaotic and chaotic behavior, which occurs for an average degree of connectivity between 2 and 3 [7].

With regard to the 1-degree predictions, there are only two logics of interest (i.e., which yield positive CoD): activation and inhibition. Given the predictive power $p \ge 0.5$, the conditional probability distribution (CPD) tables of these two logics are shown in Table 1.

As seen in Section 2, there are ten 2-degree IMP logics. Two of them are redundant with respect to the order of variables ($\bar{x}_1 \wedge x_2$ is redundant to $x_1 \wedge \bar{x}_2$ and $\bar{x}_1 \vee x_2$ to $x_1 \vee \bar{x}_2$). There remain 8 IMP logics: AND, OR, NAND, NOR, $\bar{x}_1 \wedge x_2$, $\bar{x}_1 \vee x_2$, XOR and NXOR. In our model, all these eight logics are admitted.

The proposed network model is a Bayesian network, in the sense that it does not contain any cycles. By using the Bayesian network model, every gene has a CPD table related to its predictors, except for the source nodes (nodes with no predictors). Through these CPD tables, it is possible to construct a joint probability distribution (JPD) table with $2^n$ rows, where $n$ is the number of nodes. From this table and the logical structure of the network, it is possible to obtain the CoDs and consequently the IMP scores of all connections. As a source node does not have predictors, the probability of its value being 1 is either randomly chosen from a given interval of probabilities, or it is unbiased, i.e., the probability of value 1 is equal to the probability of value 0.

Table 1. CPD tables for 1-degree prediction logics (activation and inhibition) with predictive power p.

(a) Activation
X    P(Y = 0 | X)    P(Y = 1 | X)
0    p               1 − p
1    1 − p           p

(b) Inhibition
X    P(Y = 0 | X)    P(Y = 1 | X)
0    1 − p           p
1    p               1 − p

¹ A function is called m-minterm if it exhibits m true outputs.

4. Analytical study for a simple network structure

In this section, we define a network structure that is simple enough to allow an analytical characterization of the relationship between the IMP score and the territory size of a given node. First, let us define the concept of the territory of a node $x$, which is usually defined as the set $T$ of nodes such that there is at least one path from $x$ to every node $x' \in T$. Here, we redefine the territory considering the inverse paths, i.e., $T$ is the territory of a node $x$ if and only if for every $x' \in T$, there is at least one path from $x'$ to $x$ (the "from" and "to" are inverted). In particular, $x \in T$, so the minimum size of the territory of a node is 1. This is illustrated in Fig. 1.
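Under the inverted definition, the territory of a node is the node itself together with every node that has a directed path to it. A minimal Python sketch follows; the function name and the five-node DAG are hypothetical and do not reproduce the network of Fig. 1.

```python
def territory(parents, node):
    """Set of nodes with a directed path to `node`, plus `node` itself.

    `parents` maps each node to the list of its direct predictors.
    """
    seen = {node}
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Hypothetical 5-node DAG (not the network of Fig. 1).
parents = {1: [], 2: [1], 3: [1], 4: [2, 3], 5: [3]}
sizes = {node: len(territory(parents, node)) for node in parents}
print(sizes)  # {1: 1, 2: 2, 3: 2, 4: 4, 5: 3}
```

A source node has territory size 1, which is the minimum stated above.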

The network model $G$ adopted in this section is defined as follows: $G = (X, E)$ is a graph with a set of vertices $X = (x_1, x_2, \ldots, x_n)$ and a set of edges $E$, in which each edge is an ordered pair of vertices $(x_i, x_j)$ indicating that $x_j$ is logically dependent on $x_i$. The particular structure of $G$ adopted here contains edges $(x_i, x_{i+1})$ and $(x_i, x_{i+2})$ for every $i = 1, 2, \ldots, n-2$ and $(x_{n-1}, x_n)$. An example of a network with this structure is displayed in Fig. 2. Nodes $x_3, x_4, \ldots, x_n$ have two predictors, node $x_2$ has one predictor and node $x_1$ is the source of the network. All connections are assumed to have the same predictive power $p$.
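The edge set just described can be generated programmatically. The sketch below builds the predictor list of every node from that definition; the function name `chain_parents` and the choice n = 6 are illustrative assumptions.

```python
def chain_parents(n):
    """Predictor lists for the simple structure with n >= 3 nodes (labels 1..n)."""
    edges = []
    for i in range(1, n - 1):            # i = 1, ..., n-2
        edges.append((i, i + 1))
        edges.append((i, i + 2))
    edges.append((n - 1, n))
    parents = {j: [] for j in range(1, n + 1)}
    for source, destination in edges:
        parents[destination].append(source)
    return parents


parents = chain_parents(6)
print(parents)  # {1: [], 2: [1], 3: [1, 2], 4: [2, 3], 5: [3, 4], 6: [4, 5]}
```

Node 1 has no predictors, node 2 has one, and every later node k has the two predictors k - 2 and k - 1, so its territory is the set {1, ..., k}.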

The probability of activation of the source node x1 is set to P(x1 = 1) = 0.5 (unbiased). We remark that other values forP(x1 = 1) have been considered and the main conclusions obtained remained unchanged.

For a node with one predictor (e.g., $x_2$ with predictor $x_1$ in Fig. 2), the logical function has one of two forms: activation or inhibition. If $x_2$ is activated by $x_1$, the conditional probability of $x_2$ given $x_1$ is given by:

\[
P(x_2 = 1 \mid x_1) = 1 - p + (2p - 1)\, x_1. \tag{4}
\]

On the other hand, if x2 is inhibited by x1, the conditional probability of x2 given x1 is given by:

\[
P(x_2 = 1 \mid x_1) = p + (1 - 2p)\, x_1. \tag{5}
\]

For a node $x_{k+1}$ with two predictors $x_k, x_{k-1}$, the conditional probability of $x_{k+1}$ given $x_k, x_{k-1}$ depends on the logical function $F$ associated with the prediction:

\[
P(x_{k+1} = 1 \mid x_k, x_{k-1}) = 1 - p + (2p - 1)\, F(x_k, x_{k-1}). \tag{6}
\]

As seen in Sections 2 and 3, there are eight logics for the two-predictor case that can produce IMP. For the particular network structure adopted here, we consider two of them: the AND logic function, representing all 1-minterm and 3-minterm logics, and the XOR logic function, representing the 2-minterm IMP logics.

Fig. 1. Example of network, with numbered nodes. The territory of node 7 is indicated by the shaded area, and the territory size t of each node is indicated under the node.

Fig. 2. An example of a simple network considered for the analytical study.


4.1. AND network

Let us first consider the case where all two-predictor logics are AND. Thus, the likelihood of the activation of a given gene $x_{k+1}$ is

\[
P(x_{k+1} = 1 \mid x_k, x_{k-1}) = 1 - p + (2p - 1)\, x_k x_{k-1}. \tag{7}
\]

The computation of the CoD (2) requires the likelihoods and the first and second moments of each node. The likelihood of the activation of node $x_{k+1}$ given $x_k$ is:

\[
P(x_{k+1} = 1 \mid x_k) = \sum_{x_{k-1}} \left[ 1 - p + (2p - 1)\, x_k x_{k-1} \right] P(x_{k-1} \mid x_k). \tag{8}
\]

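The first moment (expectation of $x_{k+1}$) is $\langle x_{k+1} \rangle = \sum_{x_{k+1}} x_{k+1} P(x_{k+1}) = P(x_{k+1} = 1)$, which by Bayes' Theorem [17] leads to

\[
\langle x_{k+1} \rangle = \sum_{x_k, x_{k-1}} P(x_{k+1} = 1 \mid x_k, x_{k-1})\, P(x_k, x_{k-1}) = (1 - p) + (2p - 1) \langle x_k x_{k-1} \rangle. \tag{9}
\]

In the same way, the second moment (expectation of the product $x_{k+1} x_k$) is given by:

\[
\langle x_{k+1} x_k \rangle = (1 - p) \langle x_k \rangle + (2p - 1) \langle x_k x_{k-1} \rangle. \tag{10}
\]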

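The error of prediction of $x_{k+1}$ without taking into account any predictor is given by the minimum between $P(x_{k+1} = 1)$ and $P(x_{k+1} = 0)$, which can be written equivalently as:

\[
E_{x_{k+1}} = \frac{1}{2} - \frac{1}{2} \left| 1 - 2P(x_{k+1} = 1) \right|. \tag{11}
\]

The error of prediction of $x_{k+1}$ given $x_i$ is:

\[
E_{x_{k+1}}(x_i) = \frac{1}{2} - \frac{1}{2} \sum_{x_i} \left| 1 - 2P(x_{k+1} = 1 \mid x_i) \right| P(x_i). \tag{12}
\]

The error of prediction of $x_{k+1}$ given $x_i$ and $x_j$ is:

\[
E_{x_{k+1}}(x_i, x_j) = \frac{1}{2} - \frac{1}{2} \sum_{x_i, x_j} \left| 1 - 2P(x_{k+1} = 1 \mid x_i, x_j) \right| P(x_i, x_j). \tag{13}
\]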

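More concise expressions are obtained by using (7) and recalling that $p \ge \frac{1}{2}$:

\[
E_{x_{k+1}} = \frac{1}{2} - \frac{1}{2} \left| 1 - 2 \langle x_{k+1} \rangle \right|, \tag{14}
\]

\[
E_{x_{k+1}}(x_k) = \frac{1}{2} - \frac{1}{2} (2p - 1) \left( 1 - \langle x_k \rangle + \left| \langle x_k \rangle - 2 \langle x_k x_{k-1} \rangle \right| \right), \tag{15}
\]

\[
E_{x_{k+1}}(x_{k-1}) = \frac{1}{2} - \frac{1}{2} (2p - 1) \left( 1 - \langle x_{k-1} \rangle + \left| \langle x_{k-1} \rangle - 2 \langle x_k x_{k-1} \rangle \right| \right), \tag{16}
\]

\[
E_{x_{k+1}}(x_k, x_{k-1}) = 1 - p. \tag{17}
\]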

Finally, the IMP score can be computed using the following expression:

\[
\mathrm{IMP}_{x_{k+1}}(x_k, x_{k-1}) = \mathrm{CoD}_{x_{k+1}}(x_k, x_{k-1}) - \max_{Z \subsetneq \{x_k, x_{k-1}\}} \mathrm{CoD}_{x_{k+1}}(Z), \tag{18}
\]

where the CoDs are obtained by plugging Eqs. (14)–(17) into (2).

Fig. 3 shows two plots containing four IMP score curves as a function of the territory size $k$, for different values of the predictive power: $p = 0.95$ (very large), $p = 0.9$ (large), $p = 0.8$ (small) and $p = 0.6$ (very small). The curves in the left plot correspond to the case in which $x_2$ is activated by $x_1$ while the curves in the right plot correspond to the case in which $x_2$ is inhibited by $x_1$.

The first interesting observation is that, for every predictive power considered, the IMP score increases with the territory size $k$, converging to a limiting value as $k \to \infty$. This asymptotic IMP score depends on the predictive power in such a way that larger $p$ leads to a larger asymptotic IMP score. In addition, it can be observed that its convergence rate depends on the predictive power for the activation case, Fig. 3a, while being almost instantaneous for the inhibition case, Fig. 3b.
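The moment recursions (9) and (10), together with the error expressions (14)–(17), are enough to evaluate the IMP score node by node. The Python sketch below is one possible way to iterate them; the function name, the return format and the initial values (derived here from Eqs. (4) and (5) with an unbiased source) are assumptions of this illustration rather than code from the paper.

```python
def and_network_imp(p, n, activation=True):
    """IMP score of each target x_{k+1}, k = 2..n-1, in the pure AND network.

    Returns {node index: IMP score}; in this structure the index of a node
    equals its territory size.
    """
    m_prev = 0.5                                # <x_1>, unbiased source
    m_curr = 0.5                                # <x_2>, same for activation and inhibition
    c = 0.5 * (p if activation else 1.0 - p)    # <x_2 x_1> = P(x_2 = 1, x_1 = 1)
    scores = {}
    for k in range(2, n):                       # the target is x_{k+1}
        m_next = (1 - p) + (2 * p - 1) * c                                       # Eq. (9)
        e_none = 0.5 - 0.5 * abs(1 - 2 * m_next)                                 # Eq. (14)
        e_k = 0.5 - 0.5 * (2 * p - 1) * (1 - m_curr + abs(m_curr - 2 * c))       # Eq. (15)
        e_km1 = 0.5 - 0.5 * (2 * p - 1) * (1 - m_prev + abs(m_prev - 2 * c))     # Eq. (16)
        e_both = 1 - p                                                           # Eq. (17)
        cod_both = 1 - e_both / e_none
        cod_single = max(1 - e_k / e_none, 1 - e_km1 / e_none)
        scores[k + 1] = cod_both - cod_single                                    # Eq. (18)
        c_next = (1 - p) * m_curr + (2 * p - 1) * c                              # Eq. (10)
        m_prev, m_curr, c = m_curr, m_next, c_next
    return scores


scores = and_network_imp(p=0.9, n=200, activation=True)
```

For p = 0.9 with $x_2$ activated by $x_1$, the first target $x_3$ has errors 0.46 (no predictor), 0.14 (either single predictor) and 0.10 (both predictors), giving an IMP score of 0.04/0.46 ≈ 0.087.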

4.2. XOR network

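Now, let us consider the case in which all 2-degree prediction logics are XOR. The likelihood of $x_{k+1}$ being activated is given by:

\[
P(x_{k+1} = 1 \mid x_k, x_{k-1}) = 1 - p + (2p - 1)(x_k + x_{k-1} - 2 x_k x_{k-1}). \tag{19}
\]

The first and second moments are

\[
\langle x_{k+1} \rangle = (1 - p) + (2p - 1) \left( \langle x_k \rangle + \langle x_{k-1} \rangle - 2 \langle x_k x_{k-1} \rangle \right), \tag{20}
\]

\[
\langle x_{k+1} x_k \rangle = p \langle x_k \rangle - (2p - 1) \langle x_k x_{k-1} \rangle, \tag{21}
\]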

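and the error expressions are:

\[
E_{x_{k+1}} = \frac{1}{2} - \frac{1}{2} \left| 1 - 2 \langle x_{k+1} \rangle \right|, \tag{22}
\]

\[
E_{x_{k+1}}(x_k) = \frac{1}{2} - \frac{1}{2} (2p - 1) \left\{ \left| 1 - \langle x_k \rangle - 2 \langle x_{k-1} \rangle + 2 \langle x_k x_{k-1} \rangle \right| + \left| \langle x_k \rangle - 2 \langle x_k x_{k-1} \rangle \right| \right\}, \tag{23}
\]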

Fig. 3. The IMP score for the AND network as a function of the territory size $k$ for varying predictive power value $p$. The left plot (a) corresponds to $x_2$ being activated by $x_1$ and the right plot (b) corresponds to $x_2$ being inhibited by $x_1$.


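\[
E_{x_{k+1}}(x_{k-1}) = \frac{1}{2} - \frac{1}{2} (2p - 1) \left\{ \left| 1 - \langle x_{k-1} \rangle - 2 \langle x_k \rangle + 2 \langle x_k x_{k-1} \rangle \right| + \left| \langle x_{k-1} \rangle - 2 \langle x_k x_{k-1} \rangle \right| \right\}, \tag{24}
\]

\[
E_{x_{k+1}}(x_k, x_{k-1}) = 1 - p. \tag{25}
\]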

The IMP score curves for the XOR network as a function of the territory size $k$ are plotted in Fig. 4. Due to the XOR symmetry, considering $x_2$ being either activated or inhibited by $x_1$ leads to the same curves. Such curves present a strong oscillatory behavior for small $k$, although the overall trend of the curves is clearly increasing.
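The first moments of the XOR network can be obtained by iterating Eqs. (20) and (21). The following Python sketch does so; the function name and the initial values (unbiased source, $\langle x_2 x_1 \rangle$ taken from Eq. (4) or (5)) are assumptions of this illustration.

```python
def xor_network_moments(p, n, activation=True):
    """First moments <x_k>, k = 1..n, for the pure XOR network."""
    m = {1: 0.5, 2: 0.5}                        # unbiased source; <x_2> = 0.5 in both cases
    c = 0.5 * (p if activation else 1.0 - p)    # <x_2 x_1>
    for k in range(2, n):
        m[k + 1] = (1 - p) + (2 * p - 1) * (m[k] + m[k - 1] - 2 * c)   # Eq. (20)
        c = p * m[k] - (2 * p - 1) * c                                  # Eq. (21)
    return m


moments = xor_network_moments(p=0.9, n=9, activation=True)
```

With p = 0.9 and $x_2$ activated by $x_1$, this yields $\langle x_3 \rangle = 0.18$ and $\langle x_6 \rangle = 0.2952$, while $\langle x_4 \rangle$, $\langle x_5 \rangle$, $\langle x_7 \rangle$ and $\langle x_8 \rangle$ all equal 0.5, so only the nodes whose index is a multiple of 3 are biased.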

4.3. Mixed AND/XOR network

So far, we have shown that the IMP score has an increasing trend with the territory size in networks containing either AND or XOR two-predictor logics. In this section, we show that in a network containing a mixture of AND and XOR logics, larger proportions of XOR logics increase the asymptotic IMP score.

According to Martins et al. [9], XOR and NXOR logics can achieve higher IMP scores than 1-minterm and 3-minterm logics for the same predictive power $p$. From this, it should be expected that the larger the number of XOR/NXOR logics in the network, the larger the likelihood of achieving high IMP scores. In other words, one expects to observe a high positive correlation between the number of XOR/NXOR logics present in the network and its highest IMP score. We will see that this is indeed the case.

In order to represent the proportion of XOR logics in the network, let us define a random variable $f_k \in \{0,1\}$ associated with each target gene $x_k$, such that $f_k = 0$ indicates that $x_k$ is predicted by an AND logic and $f_k = 1$ indicates that $x_k$ is predicted by an XOR logic. In addition, we make the natural assumption of independence of $f_k$ with regard to the states of any gene: $P(f_k \mid x_j) = P(f_k)$, for all $j, k$. Given these assumptions, the probability of activation of the target is given by

\[
P(x_{k+1} = 1 \mid f_{k+1}, x_k, x_{k-1}) = 1 - p + (2p - 1) \left[ f_{k+1} (x_k + x_{k-1}) + (1 - 3 f_{k+1})\, x_k x_{k-1} \right]. \tag{26}
\]

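From Eq. (26), writing $q$ for the proportion of XOR logics in the network (the expected value of $f_{k+1}$), we obtain:

\[
\langle x_{k+1} \rangle = 1 - p + (2p - 1) \left[ q \langle x_k \rangle + q \langle x_{k-1} \rangle + (1 - 3q) \langle x_k x_{k-1} \rangle \right], \tag{27}
\]

\[
\langle x_k x_{k+1} \rangle = \left[ 1 - p + q (2p - 1) \right] \langle x_k \rangle + (2p - 1)(1 - 2q) \langle x_k x_{k-1} \rangle. \tag{28}
\]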
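A quick consistency check on Eqs. (27) and (28) is that they reduce to the AND recursions (9) and (10) for q = 0 and to the XOR recursions (20) and (21) for q = 1. The Python sketch below performs one step of the mixed recursion; the function name and the numeric starting values (p = 0.9, $x_2$ activated by $x_1$) are illustrative assumptions.

```python
def mixed_moments_step(p, q, m_curr, m_prev, c):
    """One step of the mixed AND/XOR moment recursion.

    m_curr = <x_k>, m_prev = <x_{k-1}>, c = <x_k x_{k-1}>;
    returns (<x_{k+1}>, <x_{k+1} x_k>).
    """
    a = 2 * p - 1
    m_next = (1 - p) + a * (q * m_curr + q * m_prev + (1 - 3 * q) * c)   # Eq. (27)
    c_next = ((1 - p) + q * a) * m_curr + a * (1 - 2 * q) * c            # Eq. (28)
    return m_next, c_next


# Starting values for p = 0.9 with x2 activated by x1: <x_2> = <x_1> = 0.5, <x_2 x_1> = 0.45.
and_step = mixed_moments_step(p=0.9, q=0.0, m_curr=0.5, m_prev=0.5, c=0.45)
xor_step = mixed_moments_step(p=0.9, q=1.0, m_curr=0.5, m_prev=0.5, c=0.45)
```

Here `and_step` evaluates to approximately (0.46, 0.41) and `xor_step` to approximately (0.18, 0.09), matching one step of the pure AND and pure XOR recursions from the same starting values.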

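Finally, after some straightforward calculation, we obtain the errors when only one of the predictors is taken into account:

\[
E_{x_{k+1}}(x_k) = \frac{1}{2} - \left( p - \frac{1}{2} \right) \left( \left| 1 - \langle x_k \rangle - 2q \left( \langle x_{k-1} \rangle - \langle x_k x_{k-1} \rangle \right) \right| + |1 - 2q| \left| \langle x_k \rangle - 2 \langle x_k x_{k-1} \rangle \right| \right), \tag{29}
\]

\[
E_{x_{k+1}}(x_{k-1}) = \frac{1}{2} - \left( p - \frac{1}{2} \right) \left( \left| 1 - \langle x_{k-1} \rangle - 2q \left( \langle x_k \rangle - \langle x_k x_{k-1} \rangle \right) \right| + |1 - 2q| \left| \langle x_{k-1} \rangle - 2 \langle x_k x_{k-1} \rangle \right| \right), \tag{30}
\]

which, together with $E_{x_{k+1}} = \frac{1}{2} - \frac{1}{2} \left| 1 - 2 \langle x_{k+1} \rangle \right|$ and $E_{x_{k+1}}(x_k, x_{k-1}) = 1 - p$, give the IMP score defined in Eq. (3).

Fig. 5 shows five curves of IMP score as a function of predictive power for different values of $q$, in order to see the effect of the proportion of XOR logics on the asymptotic IMP score. It can be observed that the asymptotic IMP score grows as a linear function of the predictive power and larger $q$ leads to higher curves, which means that larger proportions of XOR logics lead to larger IMP scores, as expected.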

Fig. 4. The IMP score for the XOR network as a function of the territory size $k$ for varying predictive power value $p$. The same plot applies to $x_2$ being activated by $x_1$ and $x_2$ being inhibited by $x_1$.

Fig. 5. The asymptotic IMP score ($k \to \infty$) for the mixed AND/XOR network as a function of the predictive power $p$ for varying $q$ (proportion of XOR logics in the network). From bottom to top: $q = 0$ (circles), $q = 0.25$ (down triangles), $q = 0.5$ (stars), $q = 0.75$ (up triangles) and $q = 1$ (squares).


4.4. Covariance vs. territory size

As seen in Martins et al. [9], the covariance between predictors is an important factor that influences the IMP score. Usually, covariances that are large in modulus tend to impact the IMP score negatively. In the network structure adopted, there is strong covariance between the source $x_1$ and $x_2$, since $x_2$ is uniquely predicted by $x_1$ (activation or inhibition). Thus, it is intuitively expected that targets relatively close to the source (small territory) tend to present predictors with larger absolute value of the covariance than targets far from the source. If this property is verified, it would be in agreement with the increasing tendency of the IMP score with the territory size of a given target.

The covariance of the predictors $x_{k-1}, x_{k-2}$ of a given target $x_k$, for $k \ge 3$, is given by $\langle x_{k-1} x_{k-2} \rangle - \langle x_{k-1} \rangle \langle x_{k-2} \rangle$, and can be obtained by applying the moments in Eqs. (9) and (10) for the pure AND network, and Eqs. (20) and (21) for the pure XOR network. Fig. 6 exhibits the curves of the covariance as a function of the territory size assuming AND logics for every 2-degree prediction and for both $x_1$ activating $x_2$ and $x_1$ inhibiting $x_2$. Fig. 7 presents the curves assuming XOR logics for every 2-degree prediction.
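For the pure AND network, the covariance of the two predictors of each target follows directly from the moments in Eqs. (9) and (10). The Python sketch below is one way to compute it; the function name and the initial values (unbiased source, $\langle x_2 x_1 \rangle$ from Eq. (4) or (5)) are assumptions of this illustration.

```python
def and_network_covariance(p, n, activation=True):
    """Covariance of the predictors (x_{k-1}, x_{k-2}) of each target x_k, k = 3..n."""
    m_prev, m_curr = 0.5, 0.5                   # <x_1>, <x_2>
    c = 0.5 * (p if activation else 1.0 - p)    # <x_2 x_1>
    cov = {}
    for k in range(3, n + 1):
        cov[k] = c - m_curr * m_prev            # <x_{k-1} x_{k-2}> - <x_{k-1}><x_{k-2}>
        m_next = (1 - p) + (2 * p - 1) * c      # Eq. (9), gives <x_k>
        c = (1 - p) * m_curr + (2 * p - 1) * c  # Eq. (10), gives <x_k x_{k-1}>
        m_prev, m_curr = m_curr, m_next
    return cov


cov_act = and_network_covariance(p=0.9, n=60, activation=True)
cov_inh = and_network_covariance(p=0.9, n=60, activation=False)
```

For p = 0.9 the predictors of $x_3$ have covariance 0.20 under activation and -0.20 under inhibition, and the predictors of $x_4$ under activation have covariance 0.18; the absolute value is smaller for targets far from the source.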

4.5. Discussion

The general conclusion to be drawn from the analytical results is that target nodes with large territory size are more likely to display large IMP scores. This fact is corroborated by the observation that predictors of targets close to the source node (small territory) tend to be highly covariate, whereas predictors far from the source tend to have small covariance.

Fig. 6. Absolute value of the covariance as a function of the territory size for the AND network and varying predictive power value $p$. (a) $x_2$ being activated by $x_1$ and (b) $x_2$ being inhibited by $x_1$.

Fig. 7. Absolute value of the covariance as a function of the territory size for the XOR network and varying predictive power value $p$. (left) $x_2$ being activated by $x_1$; (right) $x_2$ being inhibited by $x_1$. The same plot applies to $x_2$ being activated by $x_1$ and $x_2$ being inhibited by $x_1$.


Looking more carefully at Fig. 3a (AND network in which $x_2$ is activated by $x_1$), it is possible to notice that the curves for high predictive power ($p = 0.90$ and $p = 0.95$) tend to present smaller IMP scores for small $k$ (less than 10) than those presented by curves for small predictive power ($p = 0.60$ and $p = 0.80$). As stated in Martins et al. [9], positive covariance for the AND logic of prediction strongly inhibits the IMP score. Thus, high predictive power means that $x_1$ is highly covariate with $x_2$, inhibiting the IMP score of variables close to the source, even though high predictive power is ultimately important to produce high IMP scores (the covariance effect predominates for small $k$). As the territory size increases, the effect of covariance weakens and the predictive power becomes an important factor to get large IMP scores, so that the curves for high predictive power overtake the other curves. In Fig. 3b (AND network in which $x_2$ is inhibited by $x_1$), because negative covariance is not so detrimental to the IMP score (actually a small negative covariance around $-0.05$ is beneficial [9]), curves with high $p$ are always above the curves with small $p$.

For the XOR network case, the oscillatory behavior of the IMP score for small k can be explained by the variation of the bias on the marginal probabilities of $x_k$ and $x_{k-1}$. Small bias of a pair $(x_k, x_{k-1})$ means that the first moments $\langle x_k \rangle$ and $\langle x_{k-1} \rangle$ are close to $\frac{1}{2}$, i.e. the probability of each of these variables being activated is approximately equal to the probability of being inhibited. Particularly for the XOR logic, small bias leads to large IMP score [9]. Fig. 8 displays the first moment $\langle x_k \rangle$ as a function of k, for x1 activating x2 and x1 inhibiting x2. For k values multiples of 3 ($k = 3i$, $i \in \mathbb{N}^+$) there is a peak, while for the other k values the biases are minimum: $\langle x_k \rangle = \frac{1}{2}$ for all $k \neq 3i$, $i \in \mathbb{N}^+$, which is beneficial to the IMP score. Comparing these plots with the one presented in Fig. 4, it is possible to see the correspondence between the oscillation peaks, observing that the peaks occur for k multiple of 3 in both plots, which clearly indicates that the biases are the main effect on the IMP score oscillations. In addition, as k increases, the oscillations become smaller and the IMP score converges to the maximum value,

Fig. 8. First moment $\langle x_k \rangle$ as a function of k for the XOR network, indicating the occurrence of minimum bias on the predictors of the targets $x_k$, i.e. $\langle x_{k-2} \rangle = \langle x_{k-1} \rangle = 0.5$, for k multiple of 3. (a) x2 activated by x1 and (b) x2 inhibited by x1.


since the biases of predictors tend to be minimum. The covariance plot presented in Fig. 7 also has peaks for k multiple of 3. These peaks also shrink as k increases, contributing to the convergence of the IMP score to the maximum value, which depends on the predictive power p.
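The bias oscillation with period 3 can be reproduced numerically. The following is a minimal sketch rather than the derivation of Section 4. It assumes a chain in which x2 copies (or negates) x1 and every $x_k$ with $k \geq 3$ is the XOR of $x_{k-2}$ and $x_{k-1}$, that each node takes the value given by its logic with probability p and the complementary value otherwise, and that the source bias is 0.5. The function name and its parameters are hypothetical.

```python
def xor_chain_first_moments(p, n_nodes, inhibit=False, source_bias=0.5):
    """Return {k: P(x_k = 1)} for k = 1..n_nodes in a noisy XOR chain.

    Assumed model (not taken verbatim from the paper): a node follows its
    logic with probability p and is flipped with probability 1 - p.
    """
    q = 1.0 - p
    # Joint distribution of (x1, x2): x2 copies or negates x1 with probability p.
    joint = {}
    for a in (0, 1):
        prob_a = source_bias if a == 1 else 1.0 - source_bias
        intended = (1 - a) if inhibit else a
        for b in (0, 1):
            joint[(a, b)] = prob_a * (p if b == intended else q)
    moments = {1: source_bias, 2: joint[(0, 1)] + joint[(1, 1)]}
    # The pair (x_{k-1}, x_k) is a Markov chain, so it can be propagated exactly.
    for k in range(3, n_nodes + 1):
        nxt = {(a, b): 0.0 for a in (0, 1) for b in (0, 1)}
        for (a, b), prob in joint.items():
            out = a ^ b
            nxt[(b, out)] += prob * p        # node follows the XOR logic
            nxt[(b, 1 - out)] += prob * q    # node is flipped by noise
        joint = nxt
        moments[k] = joint[(0, 1)] + joint[(1, 1)]
    return moments


if __name__ == "__main__":
    for k, m in xor_chain_first_moments(0.9, 12).items():
        print(k, round(m, 4))
```

Under these assumptions and p = 0.9, the first moments for k = 3, 4, 5, 6 are 0.18, 0.5, 0.5 and about 0.295: only the nodes with k a multiple of 3 are biased, and the bias shrinks as k grows. With `inhibit=True` the peaks are mirrored above 0.5.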

5. Network experimental study

Here we examine networks of general structure, allowing all 2-degree IMP logics to appear: AND, NAND, OR, NOR, $x_i \wedge \bar{x}_j$, $x_i \vee \bar{x}_j$, XOR and NXOR. Such networks are not amenable to analytical study; therefore, we employ an experimental approach based on ensembles of randomly generated networks.

5.1. Network construction mechanism

Let $X = \{x_1, x_2, \ldots, x_n\}$ be the genes composing the network. The predictors of $x_i$ comprise the subset $Z_i$, such that if $x_j \in Z_i$, then $i > j$. The node x1 is the source of the network (no predictors) and its bias P(x1 = 1) is randomly chosen in the interval [0.1, 0.9] to avoid nearly-constant (housekeeping) source nodes. The node x2 is predicted only by the source x1, with either activation or inhibition logic chosen with probability 0.5 each. The node $x_i$, for $i \geq 3$, can be predicted by any two genes randomly chosen from the set $\{x_1, x_2, \ldots, x_{i-1}\}$. The logic of connection is randomly selected from the 8 IMP logics.

An example of a network generated by this mechanism is illustrated in Fig. 9 (logics not shown).
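The construction mechanism can be sketched in a few lines of Python. This is an illustrative sketch and not the code used in the experiments: the function names, the logic labels and the data layout are hypothetical, and the sampling step assumes that a node takes the value given by its logic with probability p and the complementary value otherwise.

```python
import random

# The eight 2-input IMP logics of Section 5 (labels are illustrative).
LOGICS = {
    "AND": lambda a, b: a & b,
    "NAND": lambda a, b: 1 - (a & b),
    "OR": lambda a, b: a | b,
    "NOR": lambda a, b: 1 - (a | b),
    "A_AND_NOT_B": lambda a, b: a & (1 - b),
    "A_OR_NOT_B": lambda a, b: a | (1 - b),
    "XOR": lambda a, b: a ^ b,
    "NXOR": lambda a, b: 1 - (a ^ b),
}


def build_network(n, rng):
    """Return (source_bias, nodes) with nodes[i] = (parents, logic_label)."""
    source_bias = rng.uniform(0.1, 0.9)  # avoids nearly-constant sources
    nodes = {
        1: ((), "SOURCE"),
        2: ((1,), rng.choice(["ACTIVATION", "INHIBITION"])),
    }
    for i in range(3, n + 1):
        parents = tuple(rng.sample(range(1, i), 2))  # two distinct earlier nodes
        nodes[i] = (parents, rng.choice(sorted(LOGICS)))
    return source_bias, nodes


def sample_state(source_bias, nodes, p, rng):
    """Draw one joint state; each node obeys its logic with probability p (assumed)."""
    state = {1: int(rng.random() < source_bias)}
    for i in sorted(nodes):
        if i == 1:
            continue
        parents, logic = nodes[i]
        if logic == "ACTIVATION":
            ideal = state[parents[0]]
        elif logic == "INHIBITION":
            ideal = 1 - state[parents[0]]
        else:
            ideal = LOGICS[logic](state[parents[0]], state[parents[1]])
        state[i] = ideal if rng.random() < p else 1 - ideal
    return state


if __name__ == "__main__":
    rng = random.Random(0)
    bias, nodes = build_network(8, rng)
    print(bias, nodes)
    print(sample_state(bias, nodes, 0.9, rng))
```

Because every predictor index is smaller than the index of its target, visiting the nodes in increasing order is already a topological order, which is what allows the single forward pass in `sample_state`.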

5.2. Results

We performed two experiments generating simulated networks via the construction mechanism described in Section 5.1, considering the same predictive power values used in Section 4: p = 0.95 (very large), p = 0.9 (large), p = 0.8 (small) and p = 0.6 (very small). The results corroborate the findings of Martins et al. [9] and the analytical study of Section 4, revealing a remarkable property in which IMP sets have a tendency to control a large number of nodes, especially when p is very large.

The first experiment consists in analyzing the correlation between the number of logics of a given category (e.g. number of XOR/NXOR or number of 1-minterm and 3-minterm logics) and the maximum IMP score obtained in the simulated networks. The second experiment aims at investigating the relation between the territory size of a node and its IMP score, according to its logic (XOR/NXOR or 1-minterm and 3-minterm).

5.3. Number of XOR/NXOR logics

In this experiment, we generated 10,000 networks with 14 nodes for each considered p. Each network has a node with maximum IMP score. The idea is to observe whether the number of XOR/NXOR logics in the network correlates with those maximum IMP scores for a given p.
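For reference, the IMP score of a single target with two predictors can be computed from their joint probability table as sketched below. The sketch assumes the definitions as we read them in [3] and [9]: the CoD is $(\varepsilon_0 - \varepsilon)/\varepsilon_0$, where $\varepsilon_0$ is the error of the best constant guess of the target and $\varepsilon$ is the error of the best predictor built on the chosen variables, and the IMP score is the CoD of the pair minus the largest CoD of a single predictor. The function names and the table layout are hypothetical.

```python
from collections import defaultdict


def cod(joint, predictor_idx):
    """CoD of the target (last coordinate) given the predictors at predictor_idx.

    joint maps tuples (x_a, x_b, y) of 0/1 values to probabilities.
    """
    target = defaultdict(float)  # P(y)
    cells = defaultdict(float)   # P(selected predictors, y)
    for state, prob in joint.items():
        y = state[-1]
        key = tuple(state[i] for i in predictor_idx)
        target[y] += prob
        cells[(key, y)] += prob
    eps0 = min(target[0], target[1])  # error without any predictor
    if eps0 == 0.0:
        raise ValueError("constant target: CoD is left undefined in this sketch")
    keys = {key for key, _ in cells}
    # Error of the optimal predictor: pick the most probable y in each cell.
    eps = sum(min(cells[(key, 0)], cells[(key, 1)]) for key in keys)
    return (eps0 - eps) / eps0


def imp_score(joint):
    """CoD of the pair minus the best CoD of a single predictor (assumed form)."""
    return cod(joint, (0, 1)) - max(cod(joint, (0,)), cod(joint, (1,)))


if __name__ == "__main__":
    p = 0.9
    # Noisy XOR with independent, unbiased predictors (toy example).
    joint = {
        (a, b, y): 0.25 * (p if y == (a ^ b) else 1 - p)
        for a in (0, 1) for b in (0, 1) for y in (0, 1)
    }
    print(imp_score(joint))
```

In this toy example neither predictor alone reduces the error, so the IMP score equals the CoD of the pair, about 0.8 for p = 0.9.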

According to [9], XOR and NXOR logics can achieve higher IMP scores than 1-minterm and 3-minterm logics for the same p. In Section 4.3, for a particular network structure and a mixture of AND and XOR, a large proportion of XORs leads to large IMP scores for the same p. From these observations, it should be expected that the larger the number of XOR/NXOR logics in the network, the larger the possibility of achieving a high IMP score. In other words, we expect to observe a large positive correlation between the number of XOR/NXOR logics present in the network and its highest IMP score. Fig. 10 displays four plots showing the evolution of the median of the maximum IMP scores vs. number of XOR/NXOR logics. The vertical bars indicate the interquartile range for the simulated networks containing the respective number of XOR/NXOR logics, while the points plotted above the vertical bars indicate the value of the maximum IMP score. The corresponding mean and standard deviation plots are presented in Fig. 11. These plots show that there is some positive correlation between the

Fig. 9. Example of a network constructed with n = 8 (logics not shown).

Fig. 10. Median value of maximum IMP scores as a function of number of XOR/NXOR logics for p = 0.95, p = 0.9, p = 0.8 and p = 0.6. The correlations between the number of XOR/NXOR logics and maximum IMP score are: (a) 0.4269, (b) 0.4806, (c) 0.5871 and (d) 0.5856.


number of XOR/NXOR logics and maximum IMP score, as hypothesized. Also, this correlation tends to increase with decreasing p (it levels off between p = 0.8 and p = 0.6), agreeing with [9], which states that high IMP scores will mostly occur for XOR/NXOR logics when p is not large enough.

5.4. Territory size vs. IMP score

As seen in Section 4, the IMP score increases with the territory size for a particular network structure whose 2-degree predictions are composed of AND or XOR logics. This section verifies whether this property is maintained for simulated random networks that mix all 2-degree IMP logics.

As the XOR/NXOR logics usually tend to give higher IMP scores, it is necessary to separate the analysis in two: the first analysis is done only for the set of networks which have XOR/NXOR logics as targets with a given territory size; and the second is analogous but for 1-minterm or 3-minterm logics. For each considered p, 120,000 networks were generated, 10,000 for each considered territory size k ($k \in \{3, 4, \ldots, 14\}$). By construction, node $x_k$ has the maximum territory size (whenever it is said that a network has a territory size k, it means that the node $x_k$ of this network has territory size k). The total of 120,000 network samples is sufficiently large, as can be observed from the fact that even at one quarter of this number (30,000 networks), the means and standard deviations of the plots of territory size vs. IMP score are almost the same across the three sample sizes (30,000, 60,000 and 120,000), for every combination of logics (XOR/NXOR or 1-minterm and 3-minterm) and predictive powers considered in the experiments. As an example, Table 2 shows the means and standard deviations of IMP scores for each set of samples grouped by territory size and for three sample sizes (30,000; 60,000; 120,000) considering

Fig. 11. Mean values and standard deviation bars of maximum IMP scores as a function of number of XOR/NXOR logics for p = 0.95, p = 0.9, p = 0.8 and p = 0.6.


XOR/NXOR logics as targets and predictive power p = 0.95. The numerical data presented in this table are summarized in Fig. 12. Observe that the variation in the means and standard deviations across the numbers of samples is very small for all territory sizes.

5.4.1. XOR/NXOR logics

Here we analyzed how many sample networks of each considered territory size k appear in the top 5% networks in IMP score (6000 out of 120,000 sample networks) assigning XOR/NXOR logic of prediction to the node $x_k$. Fig. 13 displays histograms showing the respective frequencies of each territory size among the top 6000 sample networks for four values of p. For high p (0.95 and 0.9), the bars increase with the territory size, showing that there is a relatively high correlation between the territory size and the number of networks present in top 5% IMP scores. On the other hand, for low p (0.8 and 0.6), such correlation significantly diminishes. A possible explanation for this fact is given later in Section 5.4.3.
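The histogram counting described here amounts to ranking all sample networks by IMP score, keeping the top 5%, and counting how often each territory size occurs among them. A minimal sketch follows; the function name and the `(territory_size, imp_score)` record layout are hypothetical, and the data in the usage example are made up for illustration only.

```python
from collections import Counter


def top_fraction_histogram(records, fraction=0.05):
    """Count territory sizes among the top `fraction` of records by IMP score.

    records: iterable of (territory_size, imp_score) pairs, one per network.
    """
    ranked = sorted(records, key=lambda r: r[1], reverse=True)
    n_top = round(len(ranked) * fraction)  # e.g. 6000 out of 120,000
    return Counter(k for k, _ in ranked[:n_top])


if __name__ == "__main__":
    # Made-up toy data: 20 low-scoring networks with k = 3, 20 high-scoring with k = 14.
    toy = [(3, 0.01 * i) for i in range(20)] + [(14, 0.5 + 0.01 * i) for i in range(20)]
    print(top_fraction_histogram(toy))
```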

5.4.2. 1-Minterm and 3-minterm logics

By taking into account the IMP scores obtained for targets with 1-minterm and 3-minterm logics of prediction and their territory sizes, for high p (0.95, 0.90), the histograms of sample networks with a given territory size and found in the top 5% IMP scores are increasing along the entire domain of the considered territory sizes. For low p (0.80, 0.60), differently from the XOR/NXOR case, the bars still have a remarkable increasing behavior until the territory size is equal to 12 (see Fig. 14).

Table 2. Means and standard deviations of IMP scores for each territory size and for three total numbers of networks (30,000; 60,000; 120,000) considering XOR/NXOR logics as targets and predictive power p = 0.95.

Territory size        3       4       5       6       7       8
30,000    Mean     0.4737  0.4952  0.4593  0.4480  0.4384  0.4318
          Std      0       0.1640  0.2105  0.2191  0.2216  0.2150
60,000    Mean     0.4737  0.4976  0.4600  0.4400  0.4429  0.4451
          Std      0       0.1610  0.2130  0.2174  0.2211  0.2206
120,000   Mean     0.4737  0.4960  0.4562  0.4438  0.4398  0.4368
          Std      0       0.1632  0.2135  0.2191  0.2206  0.2178

Territory size        9       10      11      12      13      14
30,000    Mean     0.4289  0.4378  0.4269  0.4359  0.4273  0.4324
          Std      0.2191  0.2162  0.2103  0.2175  0.2129  0.2131
60,000    Mean     0.4320  0.4255  0.4321  0.4344  0.4303  0.4314
          Std      0.2160  0.2134  0.2130  0.2134  0.2141  0.2121
120,000   Mean     0.4336  0.4341  0.4314  0.4302  0.4319  0.4353
          Std      0.2152  0.2147  0.2152  0.2125  0.2124  0.2123
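The stability of the means across sample sizes can be checked directly from the values of Table 2. The short sketch below, with hypothetical variable and function names, computes for each territory size the spread (maximum minus minimum) of the three means.

```python
TERRITORY_SIZES = list(range(3, 15))

# Mean IMP scores copied from Table 2 (XOR/NXOR targets, p = 0.95).
MEANS = {
    30000: [0.4737, 0.4952, 0.4593, 0.4480, 0.4384, 0.4318,
            0.4289, 0.4378, 0.4269, 0.4359, 0.4273, 0.4324],
    60000: [0.4737, 0.4976, 0.4600, 0.4400, 0.4429, 0.4451,
            0.4320, 0.4255, 0.4321, 0.4344, 0.4303, 0.4314],
    120000: [0.4737, 0.4960, 0.4562, 0.4438, 0.4398, 0.4368,
             0.4336, 0.4341, 0.4314, 0.4302, 0.4319, 0.4353],
}


def mean_spread_by_territory(means):
    """Return {territory_size: max - min of the means over the sample sizes}."""
    columns = zip(*means.values())
    return {k: max(col) - min(col) for k, col in zip(TERRITORY_SIZES, columns)}


if __name__ == "__main__":
    spread = mean_spread_by_territory(MEANS)
    worst = max(spread, key=spread.get)
    print(worst, round(spread[worst], 4))
```

The largest spread occurs at territory size 8 and is about 0.013, which is small compared with the standard deviations of roughly 0.21 reported in the table.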

Fig. 12. Summarization of Table 2 presenting the IMP score means and standard deviation (std) bars for each territory size and for three total numbers of networks: 30,000 ("X"), 60,000 (circles) and 120,000 (crosses). In black: mean; in green: +std; in blue: −std. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)


5.4.3. Discussion

The reason why the top-ranked networks in terms of IMP scores have large territory size, especially for high predictive power, can be explained by the relation between territory size and covariance between predictors, confirming the analytical study observations made in Section 4. The first node (the source) has large (absolute) covariance with the second node, since it is uniquely predicted by the source node. Thus, if the target node considered is close to the source node (small territory size), it is expected that its predictors will be highly covariate. On the other hand, if the target is far from the source node (large territory size), it is more likely that the predictors will be less covariate. In other words, the high covariance between the source node and its successor dissipates along the network. This result agrees with the analysis presented in [9], in which only small covariance values can benefit IMP score, whereas high covariance dramatically reduces it.
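The size of the covariance between the source and its successor can be made explicit with a short calculation. It is a sketch under the assumption that, for activation, x2 equals x1 with probability p and takes the complementary value otherwise; the symbol $b = P(x_1 = 1)$ for the source bias is introduced here for illustration.

$$
\begin{aligned}
P(x_2 = 1) &= bp + (1-b)(1-p),\\
\mathrm{Cov}(x_1, x_2) &= P(x_1 = 1, x_2 = 1) - P(x_1 = 1)\,P(x_2 = 1)\\
&= bp - b\,[\,bp + (1-b)(1-p)\,]\\
&= b(1-b)(2p-1).
\end{aligned}
$$

For inhibition the sign is reversed. With b = 0.5 and p = 0.95 the absolute covariance is 0.25 × 0.9 = 0.225, close to its upper bound of 0.25, whereas p = 0.6 gives only 0.05. Under this assumption, the covariance at the start of the network therefore grows linearly with p.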

For each predictive power considered ($p \in \{0.95, 0.90, 0.80, 0.60\}$), 240,000 networks were generated (120,000 with gene k predicted by XOR/NXOR logics, and 120,000 with gene k predicted by 1-minterm and 3-minterm logics), i.e. 20,000 networks for each territory size considered (3, 4, ..., 14). It is important to analyze the absolute covariance between the predictors of node k and its correlation with the territory size. Fig. 15 displays four curves, one for each p considered, indicating the evolution of the average of the covariances along the territory sizes (each mean is calculated from 20,000 samples). The correlations between the covariances and territory sizes are −0.51053, −0.59186, −0.65099 and −0.59599 for p = 0.95, 0.90, 0.80, 0.60, respectively.

In addition, the curves plotted in Fig. 15 can explain the fact that the increasing behavior of the histogram bars tends to disappear for lower predictive power (disappearing faster for XOR/NXOR logics). From the work of Martins et al. [9], a small absolute covariance ($|c| \leq 0.08$) is actually beneficial to increase the chance of IMP in the case of XOR/NXOR logics. For

Fig. 13. Histograms of sample networks with a given territory size and IMP score belonging to the top 5% IMP scores for XOR/NXOR logics and p = {0.95, 0.9, 0.8, 0.6}. (Horizontal axis: territory size; vertical axis: number of occurrences in top 5% IMP scores.)


the 1-minterm and 3-minterm logics, either positive or negative covariance is beneficial, but not both: if positive covariance is beneficial, negative covariance is detrimental, and vice versa.

Looking more carefully at Fig. 15, the curves for large p (0.95 and 0.90) start with large average covariance (larger than 0.15) and stay larger than 0.06 until territory size equal to 5, which is detrimental for all logics, inducing a notable increasing behavior on the histogram bars for both cases, as it is clearly easier to get large IMP scores when the average covariance is not so high (around 0.05), which happens only for large territory sizes. On the other hand, for low p (0.80 and 0.60), the curves do not start out as high, so it is perfectly possible to achieve high IMP scores for territory sizes around 6 or 7 in the case of XOR/NXOR logics (the increasing behavior of the histogram bars disappeared even for p = 0.8). For 1-minterm and 3-minterm logics, the increasing trend of the histogram bars still persists for low p, since the absolute value of the covariance can be either beneficial or detrimental to achieve high IMP scores.

5.4.4. Real network experiment

In order to show that the increasing tendency of the IMP score with respect to the territory size can be found in practical situations, we considered the Alarm dataset retrieved from the Bayesian Network Repository (http://www.cs.huji.ac.il/site/labs/compbio/Repository). It is a Bayesian network that codifies medical knowledge in terms of diagnosis variables, finding variables and intermediate variables for medical diagnosis purposes [1]. Each variable assumes a number of possible states (values) that ranges from 2 to 4. The variables (nodes) contain the conditional probability distribution given the values of their parent nodes. Given the conditional probability distribution of all variables, it is possible to calculate the joint probability distribution of the whole network and, consequently, the IMP scores of the variables. Fig. 16 illustrates the Alarm

Fig. 14. Histograms of sample networks with a given territory size and IMP score belonging to the top 5% IMP scores for 1-minterm and 3-minterm logics and p = {0.95, 0.9, 0.8, 0.6}. (Horizontal axis: territory size; vertical axis: number of occurrences in top 5% IMP scores.)

Fig. 15. Evolution of the average covariances (in modulus) between the predictors of the node k along the territory sizes for p = {0.95, 0.90, 0.80, 0.60}. The respective correlations of the covariances and territory sizes are −0.51053, −0.59186, −0.65099 and −0.59599.


Fig. 16. The Alarm Bayesian network including the IMP scores obtained for 2-degree variables.

Fig. 17. Plot of the territory sizes and IMP scores of the Alarm Bayesian network variables. The linear and Spearman correlations between the territory size and IMP score are 0.4759 and 0.5495, respectively.


network along with the IMP scores of the 2-degree variables, while their corresponding (territory size, IMP score) pairs can be seen in Fig. 17. For this network, there is a positive linear correlation of 0.4759 between the territory sizes and IMP scores. The corresponding Spearman rank correlation is 0.5495. The seven nodes with the largest IMP scores have territory sizes larger than or equal to 8, while the node with the largest IMP score has a territory size equal to 10. This result corroborates the conclusions provided by the histograms presented in Figs. 13 and 14: nodes with territory size larger than or equal to 8 have a better chance of presenting a large IMP score than those with very small territory size.
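Both correlation coefficients can be computed from the (territory size, IMP score) pairs as sketched below in plain Python. The function names are hypothetical, and the data in the usage example are made-up toy values, not the Alarm network measurements.

```python
from math import sqrt


def pearson(xs, ys):
    """Linear (Pearson) correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


if __name__ == "__main__":
    # Toy values for illustration only.
    territory = [1, 2, 3, 4, 5]
    score = [1, 4, 9, 16, 25]
    print(pearson(territory, score), spearman(territory, score))
```

The Spearman coefficient depends only on the ordering of the values, so it is less sensitive than the linear correlation to a few nodes with unusually large IMP scores.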


6. Conclusion

In this paper we have analyzed the intrinsically multivariate prediction phenomenon in networks with three or more nodes, an extension of the study presented in [9] which considers only one target and its set of predictors. In order to measure the intrinsically multivariate predictiveness, we employed the IMP score, a metric based on the coefficient of determination (CoD). We have derived analytical formulas of the IMP score in terms of the territory size of a given node in a particular network structure following the Bayesian network model. In order to study more general Bayesian networks, we have conducted an experimental study based on simulated networks, and the results corroborate the analytical results obtained for the particular network structure.

The main finding of this paper is that the IMP score between a target node and its predictors (directly connected with the target) tends to grow with the territory size. Such tendency was also found in a real Bayesian network application. This tendency is explained by the fact that the absolute value of the covariance between predictors of a given target decreases with the territory size of the considered target, which is consistent with the results found in [9] (large absolute covariance tends to reduce the IMP score). Moreover, such a trend is very relevant if one is interested in biological networks, which usually have a large number of nodes and, hence, a high probability of presenting nodes with large territories (e.g. gene networks with thousands of genes). Naturally, it is possible to find nodes with large IMP scores without large territories, but if there are more nodes with large territories, then large IMP scores become more likely, because a low covariance between the predictors (possibly caused by the large target territory as shown in Fig. 15) tends to impact positively their IMP score with respect to the target.

The asymptotic IMP score was found to depend on the number of XOR logics and the predictive power in the network. This suggests a kind of capacity, the maximum (asymptotic) IMP score, of the network from which classes of networks and their properties may be investigated. For example, from Fig. 5 different networks (with different predictive power and number of XOR gates) that converge to the same IMP score may be found.

This work presented theoretical advances in the analysis of the intrinsically multivariate prediction concept in Bayesian networks. Based on the results presented here, methods could be proposed to find intrinsically multivariate predicted variables in Bayesian networks. In order to accomplish this task, such methods could be guided to look for variables with large territories or XOR logics. The development of a technique to discover IMP variables based on the theoretical findings revealed in this paper can be considered for future work.

Acknowledgments

This work was supported by the Fundação de Amparo e Apoio à Pesquisa do Estado de São Paulo (FAPESP), the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Microsoft Research, and the U.S. National Science Foundation, through NSF award CCF-0845407 (Braga-Neto).

References

[1] I. Beinlich, G. Suermondt, R. Chavez, G. Cooper, The alarm monitoring system: a case study with two probabilistic inference techniques for belief networks, in: Proc 2nd European Conference on AI and Medicine, Springer, Berlin, 1989.

[2] C. Butz, S. Hua, J. Chen, H. Yao, A simple graphical approach for understanding probabilistic inference in Bayesian networks, Information Sciences 179 (2009) 699–716.

[3] E.R. Dougherty, S. Kim, Y. Chen, Coefficient of determination in nonlinear signal processing, EURASIP Journal on Signal Processing 80 (10) (2000) 2219–2235.

[4] N. Friedman, M. Linial, I. Nachman, D. Pe'er, Using Bayesian networks to analyze expression data, Journal of Computational Biology 7 (2000) 601–620.

[5] D. Heckerman, C. Meek, G. Cooper, A Bayesian Approach to Causal Discovery. Tech. Rep., Microsoft Research, 1997.

[6] A.S. Jarrah, B. Raposa, R. Laubenbacher, Nested canalyzing, unate cascade, and polynomial functions, Physica D 233 (2007) 167–174.

[7] S.A. Kauffman, The Origins of Order, Oxford University Press, 1993.

[8] T. Koski, J. Noble, Bayesian Networks: An Introduction, Wiley Series in Probability and Statistics, Wiley, 2009.

[9] D.C. Martins Jr., U. Braga-Neto, R.F. Hashimoto, E.R. Dougherty, M.L. Bittner, Intrinsically multivariate predictive genes, IEEE Journal of Selected Topics in Signal Processing, Special Issue on Genomics and Proteomics Signal Processing 2 (2008) 424–439.

[10] A.A. Moreira, L.A.N. Amaral, Canalizing Kauffman networks: nonergodicity and its effect on their critical behavior, Physical Review Letters 94 (2005) 218702.

[11] M. Neil, N. Fenton, M. Tailor, Using Bayesian networks to model expected and unexpected operational losses, Risk Analysis 25 (4) (2005) 963–972.

[12] D.J. Ozer, Correlation and the coefficient of determination, Psychological Bulletin 97 (2) (1985) 307–315.

[13] J. Pearl, T.S. Verma, A theory of inferred causation, in: Principles of Knowledge Representation and Reasoning: Proc. Second International Conference (KR '91), Madrid, Spain, 1991, pp. 441–552.

[14] O. Pourret, N. Patrick, B. Marcot, Bayesian Networks: A Practical Guide to Applications, John Wiley & Sons, 2008.

[15] P. Pudil, J. Novovicová, J. Kittler, Floating search methods in feature selection, Pattern Recognition Letters 15 (1994) 1119–1125.

[16] I. Shmulevich, S.A. Kauffman, Activities and sensitivities in boolean network models, Physical Review Letters 93 (10) (2004) 048701.

[17] A. Smith, J. Bernardo, Bayesian Theory, John Wiley & Sons, 1994.

[18] P. Spirtes, C. Glymour, R. Scheines, Causation, Prediction and Search, Springer Verlag, 1993.

[19] L.C. Thomas, A survey of credit and behavioural scoring: forecasting financial risk of lending to consumers, International Journal of Forecasting 16 (2) (2000) 149–172.