
BMC Bioinformatics

Research article
Granger causality vs. dynamic Bayesian network inference: a comparative study
Cunlu Zou1 and Jianfeng Feng*2,1

Address: 1Department of Computer Science, University of Warwick, Coventry, UK and 2Centre for Computational Systems Biology, Fudan University, Shanghai, PR China

E-mail: Cunlu Zou - [email protected]; Jianfeng Feng* - [email protected]
*Corresponding author

Published: 24 April 2009 Received: 17 December 2008

BMC Bioinformatics 2009, 10:122 doi: 10.1186/1471-2105-10-122 Accepted: 24 April 2009

This article is available from: http://www.biomedcentral.com/1471-2105/10/122

© 2009 Zou and Feng; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background: In computational biology, one often faces the problem of deriving the causal relationship among different elements such as genes, proteins, metabolites, neurons and so on, based upon multi-dimensional temporal data. Currently, there are two common approaches used to explore the network structure among elements. One is the Granger causality approach, and the other is the dynamic Bayesian network inference approach. Both have at least a few thousand publications reported in the literature. A key issue is which approach to use to tackle the data, in particular when the two give rise to contradictory results.

Results: In this paper, we provide an answer by focusing on a systematic and computationally intensive comparison between the two approaches on both synthesized and experimental data. For synthesized data, a critical point of the data length is found: the dynamic Bayesian network outperforms the Granger causality approach when the data length is short, and vice versa. We then test our results on experimental data of short length, which is a common scenario in current biological experiments: it is again confirmed that the dynamic Bayesian network works better.

Conclusion: When the data size is short, the dynamic Bayesian network inference performs better than the Granger causality approach; otherwise the Granger causality approach is better.

Background
Based upon high throughput data, reliably and accurately exploring the network structure of elements (genes, proteins, metabolites, neurons etc.) is one of the most important issues in computational biology [1-6]. Currently, there are two main approaches often used to infer causal relationships [7] or interactions among a set of elements [8,9]. One is the Granger causality approach [10,11], and the other is the Bayesian network inference approach [12,13]. The latter is often applied to static data; however, one can employ dynamic Bayesian networks to deal with time series data, for which the Granger causality was developed exclusively. The Granger causality has the advantage of having a corresponding frequency domain decomposition, so that one can clearly find at which frequencies two elements interact with each other.

Given a multi-variable time series dataset, the Granger causality and dynamic Bayesian networks [14] can both be applied. The Granger causality notion, first introduced by Wiener and Granger [15,16], proposes that we can determine a causal influence of one time series on another: the prediction of one time series can be improved by incorporating knowledge of the second one. On the other hand, the Bayesian


network [17] is a special case of a diagrammatic representation of probability distributions, called probabilistic graphical models [18-20]. A Bayesian network graph comprises nodes (also called vertices) connected by directed links (also called edges or arcs), and there is no cycle in the graph. To learn the structure and the parameters of a Bayesian network from a set of data, we should search the space of all possible graph representations and find out which structure is most likely to have produced our data. If we have a scoring function (or likelihood function) which can determine the structure and parameter likelihood from the data, then the problem is to find the highest-score (maximum likelihood) structure among all the possible representations.
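To make the score-based search concrete, the family-scoring step can be sketched as follows. This is a minimal illustration of our own (not the implementation used in this study), assuming the time series have been discretized to two states and that the dynamic Bayesian network is first-order, with edges running only from time t-1 to time t; under that restriction each node's parent set can be scored and chosen independently, and no acyclicity check is needed. All function names are ours.

```python
import numpy as np
from itertools import combinations

def bic_family_score(child_next, parents_prev, n_states=2):
    # Log-likelihood of child(t) given its parents' states at t-1,
    # penalized by the BIC term 0.5 * ln(T) * (number of free parameters).
    T = child_next.shape[0]
    if parents_prev:
        P = np.stack(parents_prev, axis=1)                  # (T, k) parent states
        configs = np.ravel_multi_index(P.T, (n_states,) * len(parents_prev))
        n_cfg = n_states ** len(parents_prev)
    else:
        configs = np.zeros(T, dtype=int)
        n_cfg = 1
    ll = 0.0
    for c in range(n_cfg):
        mask = configs == c
        n_c = int(np.sum(mask))
        for s in range(n_states):
            n_cs = int(np.sum(child_next[mask] == s))
            if n_cs > 0:
                ll += n_cs * np.log(n_cs / n_c)             # ML multinomial log-lik
    return ll - 0.5 * np.log(T) * n_cfg * (n_states - 1)

def learn_dbn(data_disc, max_parents=2):
    # data_disc: (n_vars, T) integer array of discretized time series.
    n_vars, _ = data_disc.shape
    parents = {}
    for child in range(n_vars):
        y = data_disc[child, 1:]                            # child at time t
        best, best_score = (), -np.inf
        for k in range(max_parents + 1):
            for cand in combinations(range(n_vars), k):
                sc = bic_family_score(y, [data_disc[p, :-1] for p in cand])
                if sc > best_score:
                    best, best_score = cand, sc
        parents[child] = best                               # highest-scoring family
    return parents
```

With data in which one series is a noisy copy of the other's lagged values, this search recovers the corresponding arc; richer scores (e.g. BDe) and hill-climbing replace the exhaustive enumeration when the number of variables grows.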

The causal relationships derived from these two approaches could be different, in particular when we face data obtained from experiments. Therefore it is of vital importance to compare these two causal inference approaches before we can confidently apply them to biological data. By doing the comparison, one expects to find the advantages, performance and stability of each technique.

Adopting the most common existing methods in the literature to find the coefficients of the time series in both approaches, we compare the dynamic Bayesian network with the Granger causality in both linear and nonlinear models. Interestingly, a critical point of the data length is found. When the data length is shorter than the critical point, the dynamic Bayesian network approach outperforms the Granger causality approach; but when the data length is longer, the Granger causality is more reliable. The conclusion is obtained via intensive computations (more than 100 computers over a few weeks). A biological data set of gene microarrays is analyzed using both approaches, which indicates that for a data set with a short sampling length the dynamic Bayesian network produces more reliable results. In summary, we would argue that the dynamic Bayesian network is more suitable for dealing with experimental data.

Results
To illustrate and compare the differences between the dynamic Bayesian network inference and the conditional Granger causality, a simple multivariate model with fixed coefficients, which has been discussed in many papers to test the Granger causality, is tested first. We then extend our comparisons to the more general case of the model with random coefficients, which requires considerable computational resources: more than 100 networked computers are used to perform the comparisons for more than a week. Both the Granger causality and the dynamic Bayesian network are also applied to nonlinear models. Finally, we test both approaches on a set of microarray data recently acquired from a comparison of mock and infected Arabidopsis leaves.

Synthesized data: linear case
Example 1 Suppose we have 5 simultaneously recorded time series generated according to the equations:

\[
\begin{cases}
X_1(n) = 0.95\sqrt{2}\,X_1(n-1) - 0.9025\,X_1(n-2) + \epsilon_1(n) \\
X_2(n) = 0.5\,X_1(n-2) + \epsilon_2(n) \\
X_3(n) = -0.4\,X_1(n-3) + \epsilon_3(n) \\
X_4(n) = -0.5\,X_1(n-1) + 0.25\sqrt{2}\,X_4(n-1) + 0.25\sqrt{2}\,X_5(n-1) + \epsilon_4(n) \\
X_5(n) = -0.25\sqrt{2}\,X_4(n-1) + 0.25\sqrt{2}\,X_5(n-1) + \epsilon_5(n)
\end{cases}
\tag{1}
\]

where n is the time, and [ε1, ε2, ε3, ε4, ε5] are independent Gaussian white noise processes with zero mean and unit variance. From the equations, we see that X1(n) is a cause of X2(n), X3(n) and X4(n), and that X4(n) and X5(n) share a feedback loop with each other, as depicted in Figure 1B. Figure 1A shows an example of the time traces of the 5 time series. For the Granger causality approach, we simulated the fitted vector autoregressive (VAR) model to generate a data set of 100 realizations of 1000 time points, and applied the bootstrap approach to construct the 95% confidence intervals (Figure 1C). For the Granger causality, we assume the causality value is Gaussian distributed; the confidence intervals can then be obtained by calculating the mean and standard deviation [21,22]. According to the confidence intervals, one can derive the network structure shown in Figure 1B, which correctly recovers the pattern of connectivity in our toy model. For the dynamic Bayesian network inference approach, we can infer a network structure (Figure 1Da) for each realization of 1000 time points. The final causal network model was inferred with high-confidence causal arcs (arcs that occur in more than 95% of the whole population) between the variables [13]. This complex network contains the information of different time-lags for each variable, and it fits exactly the pattern of connectivity in our VAR model. In order to compare it with the Granger causality approach, we can further simplify the network by hiding the time-lag information, and we then infer exactly the same structure as the Granger causality approach (Figure 1Dd). From this simple example, we find that both approaches can reveal the correct network structure for data with a large sample size (1000 here).
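The prediction-improvement idea behind the Granger causality can be made concrete with a small numerical sketch. The code below is our own illustration, not the authors' implementation: it simulates the five-variable VAR of equation (1) (the hard-coded coefficient values are our reading of that model), fits full and reduced autoregressions by ordinary least squares, and measures the causality as the log ratio of residual variances; the lag order p = 3 is our choice.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_var(n, burn=200):
    # Simulate the 5-node VAR of equation (1); epsilon ~ N(0, 1), independent.
    X = np.zeros((5, n + burn))
    e = rng.standard_normal((5, n + burn))
    r2 = np.sqrt(2.0)
    for t in range(3, n + burn):
        X[0, t] = 0.95 * r2 * X[0, t-1] - 0.9025 * X[0, t-2] + e[0, t]
        X[1, t] = 0.5 * X[0, t-2] + e[1, t]
        X[2, t] = -0.4 * X[0, t-3] + e[2, t]
        X[3, t] = (-0.5 * X[0, t-1] + 0.25 * r2 * X[3, t-1]
                   + 0.25 * r2 * X[4, t-1] + e[3, t])
        X[4, t] = -0.25 * r2 * X[3, t-1] + 0.25 * r2 * X[4, t-1] + e[4, t]
    return X[:, burn:]

def residual_variance(target, predictors, p=3):
    # OLS fit of target(t) on p lags of each predictor series.
    n = target.shape[0]
    A = np.array([np.concatenate([[1.0]] + [pr[t-p:t] for pr in predictors])
                  for t in range(p, n)])
    b = target[p:]
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.var(b - A @ coef)

def granger(cause, target, p=3):
    # ln( var(target | own past) / var(target | own past + cause's past) ):
    # positive when the cause's past improves the prediction of the target.
    full = residual_variance(target, [target, cause], p)
    reduced = residual_variance(target, [target], p)
    return np.log(reduced / full)

X = simulate_var(1000)
g_1_to_2 = granger(X[0], X[1])   # X1 -> X2: a true link in the model
g_2_to_1 = granger(X[1], X[0])   # X2 -> X1: absent from the model
```

On a long realization the true link X1 -> X2 yields a clearly positive value, while the absent reverse direction stays near zero; significance thresholds then come from the bootstrap, as in the text.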

Most, if not all, experimental data have a very limited number of time steps due to various experimental restrictions. Hence one


of the key quantities to test the reliability of an approachis the data length (sample size). In the next setup, we

reduce the sample size to a smaller value and check its impact. Figure 2A shows the case of a sample size of 80: we find that both approaches start failing to detect some interactions (false negatives). By reducing the sample size to 20, we can see that the Bayesian network inference can derive more true positive connections than the Granger causality. This is certainly an interesting phenomenon, and we intend to explore whether it holds in a more general case.

Example 2 We consider a more general toy model in which the coefficients in the equations (1) of Example 1 are randomly generated. This toy model aims to test the causality sensitivity of the two approaches. Suppose we have 5 time series simultaneously generated according to the equations:

\[
\begin{cases}
X_1(n) = w_1\,X_1(n-1) + w_2\,X_1(n-2) + \epsilon_1(n) \\
X_2(n) = w_3\,X_1(n-2) + \epsilon_2(n) \\
X_3(n) = w_4\,X_1(n-3) + \epsilon_3(n) \\
X_4(n) = w_5\,X_1(n-1) + w_6\,X_4(n-1) + w_7\,X_5(n-1) + \epsilon_4(n) \\
X_5(n) = w_8\,X_4(n-1) + w_9\,X_5(n-1) + \epsilon_5(n)
\end{cases}
\tag{2}
\]

where w1, w2, ..., w9 are uniformly distributed random variables in the interval [-1,1]. The randomly generated coefficients are also required to make the system stable. The stability can be tested by using the z-plane pole-zero method, which states that if the outermost poles of the z-transfer function describing the time series are inside the unit circle on the z-plane pole-zero plot, then the system is stable.
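This pole check can be sketched in a few lines. The snippet is our own illustration (not the code used in the study): for a VAR(p), the z-transfer-function poles are the eigenvalues of the companion matrix, so stability reduces to checking that all eigenvalue moduli are below one.

```python
import numpy as np

def is_stable(coef_lags):
    # coef_lags: [A1, ..., Ap], the (n x n) lag-coefficient matrices of a VAR(p).
    # The poles of the z-transfer function are the eigenvalues of the
    # companion matrix; the system is stable iff they all lie strictly
    # inside the unit circle.
    p = len(coef_lags)
    n = coef_lags[0].shape[0]
    top = np.hstack(coef_lags)              # (n, n*p) block row [A1 ... Ap]
    shift = np.eye(n * (p - 1), n * p)      # [I 0] rows shifting the state down
    companion = np.vstack([top, shift])
    return bool(np.all(np.abs(np.linalg.eigvals(companion)) < 1.0))
```

In this setting a drawn coefficient vector [w1, ..., w9] would simply be rejected and redrawn whenever such a check fails.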

The above toy model is then used to test the two different causality approaches, Bayesian network inference and Granger causality, applied with different sample sizes. For each sample size, we randomly generated 100 different coefficient vectors [w1, w2, ..., w9], which correspond to 100 different toy models of the form of Example 1. For each coefficient vector, we applied the same approach as in Example 1, using the Monte Carlo method to construct 95% confidence intervals for the Granger causality approach and choosing high-confidence arcs (appearing in at least 95% of all samplings) for the Bayesian network inference approach. The total number of arcs (or causalities) is 500 (5 interactions for each realization) for each sample size. However, we cannot expect to detect the maximum number of arcs in our system, since the coefficients are randomly generated and could be very small.

Figure 3Aa shows the comparison of the percentage of true positive connections derived from these two

Figure 1 Granger causality and Bayesian network inference approaches applied to a simple linear toy model. A. Five time series are simultaneously generated, and the length of each time series is 1000. X2, X3, X4 and X5 are shifted upward for visualization purposes. B. Granger causality results. (a) The network structure inferred from the Granger causality approach. (b) The 95% confidence interval graph for all the possible directed connections. (c) For visualization purposes, all directed edges (causalities) are sorted and enumerated in the table. The total number of edges is 20. C. Dynamic Bayesian network inference results. (a) The causal network structure learned from Bayesian network inference. (b) Each variable is represented by four nodes, representing different time-lags, so we have a total of 20 nodes. They are numbered and enumerated in the table. (c) The simplified network structure: since we only care about the causality to the current time status, we can remove all the other edges and nodes that have no connection to nodes 16 to 20 (the five variables at the current time status). (d) A further simplified network structure of causality.


methods. In general, the Granger causality approach can infer slightly more true positive causalities compared to

the Bayesian network inference approach when the data length is long. It is interesting to see that there is a critical point at around 30 in Figure 3Aa. If the sample size is larger than 30, the Bayesian network recovers fewer true positive connections; however, if the sample size is smaller than 30, the Bayesian network performs better. From Figure 3Ab, we see that the computing time for the Bayesian network inference is much larger than for the Granger causality.

Now we are in a position to find out why the dynamic Bayesian network is better than the Granger causality when the data length is short, and vice versa. In Figure 3B, we compare the performances for different coefficients (strengths of interaction) at a fixed sample size of 900 (the super-critical case). The x-axis shows the absolute value of the coefficients, and the y-axis shows the corresponding causality (1 indicates positive causality and 0 indicates no causality). For visualization purposes, the curve for the Granger causality is shifted upward. From the five graphs, we can see that there is no difference between the two approaches if the coefficients are significantly large (strong interactions with an absolute coefficient value greater than 0.15): both approaches can infer the correct connections. In most cases, the Granger causality approach performs with more stability when the coefficients are larger than 0.15, while the Bayesian network inference approach shows slightly more oscillations around this point. Hence we conclude that the Granger causality is less affected by small connection strengths when the data length is large (see also the nonlinear case below).

We now compare the fitting accuracy of the two approaches, as shown in Figure 3C. We use the average mean-square error as a measure of the fit. Not surprisingly, the dynamic Bayesian network approach considerably outperforms the simple fitting algorithm in the Granger approach [15,16], in particular when the data length is short.

In conclusion, when the data give a reasonable fit to the original model, the Granger causality works better. This is due to the fact that the Granger causality approach is more sensitive to small values of the interactions. When the data length is short, the Bayesian approach can fit the data much more reliably and it outperforms the Granger approach.

Synthesized data: non-linear case
In real situations, all data should be nonlinear and a linear relationship as described above is only an approximation. To address the nonlinear issue, we turn our attention to kernel models. As is well known, any

Figure 2 Granger causality and Bayesian network inference applied to data of various sample sizes. The grey edges in the inferred network structures indicate undetected causalities in the toy model. For each sample size n, we simulated a data set of 100 realizations of n time points. The Bayesian network structure represents a model average from these 100 realizations. High-confidence arcs, appearing in at least 95% of the networks, are shown. The Granger causality inferred the structure according to the 95% confidence interval constructed by using the bootstrap method. (A) The sample size is 80. (B) The sample size is 60. (C) The sample size is 20.


Figure 3 Granger causality and Bayesian network inference applied to a stochastic-coefficients toy model. The parameters in the polynomial equation are randomly generated in the interval [-1,1]. For each randomly generated coefficient vector, we applied the same approach as in Example 1: the bootstrapping method and 95% confidence intervals for the Granger causality; 95% high-confidence arcs are chosen from the Bayesian network inference. (A) We applied both approaches to different sample sizes (from 20 to 900). For each sample size, we generated 100 different coefficient vectors, so the total number of directed interactions for each sample size is 500. (a) The percentage of detected true positive causalities for both approaches. (b) Time cost for both approaches. (B) For sample size 900, the derived causality (1 represents positive causality and 0 represents no causality) is plotted against the absolute value of the corresponding coefficients. For visualization purposes, the curve for the Granger causality is shifted upward. (C) Linear model fitting comparison for both the Granger causality and the Bayesian networks. Using a number of training data points to fit both linear models, one can calculate a corresponding predicted mean-square error by applying a set of test data. We find that the Bayesian network inference approach works much better than the Granger causality approach when the sample size is significantly small (around 100). When the sample size is significantly large, both approaches converge to the standard error which exactly fits the noise term in our toy model.


nonlinear relationship can be approximated by a series of kernel functions.
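As a concrete (and entirely illustrative) sketch of this kernel-expansion idea, the snippet below approximates an arbitrary smooth nonlinearity by a least-squares fit over a small set of Gaussian kernels; the target function, centers and width are our own choices, not values from this study.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_design(x, centers, width):
    # Design matrix of Gaussian kernels phi_j(x) = exp(-(x - c_j)^2 / (2 w^2)).
    return np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2.0 * width ** 2))

def fit_kernel_weights(x, y, centers, width):
    # Least-squares weights for y ~ sum_j w_j * phi_j(x).
    Phi = gaussian_design(x, centers, width)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w

x = rng.uniform(-3.0, 3.0, 400)
y = np.tanh(x)                              # an arbitrary smooth nonlinearity
centers = np.linspace(-3.0, 3.0, 15)
w = fit_kernel_weights(x, y, centers, 0.6)
mse = np.mean((gaussian_design(x, centers, 0.6) @ w - y) ** 2)
```

Replacing the lagged linear terms of a VAR with such kernel features is what turns the linear comparison above into a nonlinear one.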

Example 3 We modify the model in Example 1 into a series of nonlinear equations as follows:

\[
\begin{cases}
X_1(n) = 0.125\,\exp\!\left(-\tfrac{(X_1(n-1)-1)^2}{2}\right) + \epsilon_1(n) \\
X_2(n) = 0.2\,\exp\!\left(-\tfrac{(X_1(n-1)-1)^2}{2}\right) + \epsilon_2(n) \\
X_3(n) = -1.05\,\exp\!\left(-\tfrac{(X_1(n-1)-1)^2}{2}\right) + \epsilon_3(n) \\
X_4(n) = -1.15\,\exp\!\left(-\tfrac{(X_1(n-1)-1)^2}{2}\right) - 0.2\,\exp\!\left(-\tfrac{(X_4(n-1)-2)^2}{2}\right) + 0.5\,\exp\!\left(-\tfrac{(X_5(n-2)-2)^2}{2}\right) + \epsilon_4(n) \\
X_5(n) = -1.35\,\exp\!\left(-\tfrac{(X_4(n-1)-2)^2}{2}\right) + 0.25\,\exp\!\left(-\tfrac{(X_5(n-1)-2)^2}{2}\right) + \epsilon_5(n)
\end{cases}
\tag{3}
\]

In this example, the center and variance of each time series are chosen as the center and variance in the kernel function. We use the fuzzy c-means method to find the center of each time series, and then applied the same approach as in Example 1. For the Granger causality approach, we simulated the fitted VAR model to generate a data set of 100 realizations of 1000 time points, and applied the bootstrap approach to construct the 95% confidence intervals (Figure 4Ca). According to the confidence intervals, one can derive the network structure (Figure 4Cb), which correctly recovers the pattern of connectivity in our non-linear model. For the Bayesian network inference approach, we can infer a network structure (Figure 4Da) for each realization of 1000 time points. We can then obtain a simplified network structure (Figure 4Dd).
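The fuzzy c-means step can be sketched as follows; this is a generic textbook implementation of our own (fuzzifier m = 2, one-dimensional data), not the code used for this study.

```python
import numpy as np

def fuzzy_cmeans_centers(x, k, m=2.0, iters=100, seed=0):
    # Classic fuzzy c-means on a 1-D series; returns the sorted cluster centers.
    rng = np.random.default_rng(seed)
    U = rng.random((x.shape[0], k))
    U /= U.sum(axis=1, keepdims=True)            # random fuzzy memberships
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ x) / Um.sum(axis=0)    # membership-weighted means
        d = np.abs(x[:, None] - centers[None, :]) + 1e-12
        U = d ** (-2.0 / (m - 1.0))              # standard membership update
        U /= U.sum(axis=1, keepdims=True)
    return np.sort(centers)
```

The resulting centers (and the per-series variances) then parameterize the Gaussian kernels used when fitting the nonlinear model.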

For small sample sizes (see Figure 5), worse results are obtained for both approaches compared to the previous linear model. Both approaches start to miss interactions when the sample size is smaller than 300. When the sample size is 150, the Bayesian network inference approach can detect one more true positive interaction than the Granger causality. However, when the sample size is 50, both approaches fail to detect all the interactions.

In the next step, we extend our non-linear model to a more general setting in which the coefficients in the equations are randomly generated. Figure 6Aa shows the comparison of the percentage of true positive connections derived from the two methods. It is very interesting to see that a critical point around 500 exists in the non-linear model, similar to the linear model before.

From Figure 6Ab, the computing time required for the Bayesian network inference is still much larger than for the Granger causality. In Figure 6B, we compare the performances for different coefficients (strengths of interaction) at a fixed sample size of 900. From the five graphs, we can see that in general the Granger approach is more sensitive to small values of the coefficients (see Figure 6B, X5 -> X4 and X4 -> X5).

Therefore, all conclusions in the linear case are confirmed in the nonlinear model. In the literature [23], results pointing in the same direction as ours have been obtained: the Granger causality performs better than the dynamic Bayesian network inference for a nonlinear kernel model of genetic regulatory pathways and a sufficiently large sample size (2000 data points).

Experimental data
Finally, we carry out a study on experimental microarray data. The gene data were collected from two cases of Arabidopsis leaf: the mock (normal) case and the case infected with the plant pathogen Botrytis cinerea. A total of 31,000 genes were measured at a time interval of two hours, with a total of 24 sampling points (two days) and four replicates. We test the Granger causality approach and the dynamic Bayesian network inference approach on a well-known circadian circuit. This circuit contains 7 genes: PRR7, GI, PRR9, ELF4, LHY, CCA1 and TOC1. Figure 7A shows the time traces of the 7 genes. From the time traces, it is clear that they exhibit a 24 hour rhythm. Note that the total number of time points is only 24. Compared to our previous toy model case, this sample size is quite small; we therefore expect the Bayesian network inference to be more reliable.

We first apply the dynamic Bayesian network inference approach to these two data sets. The network structures for the two cases are shown in Figure 7B. In the next step, the conditional Granger causality approach is applied. By using the bootstrapping method, we construct 95% confidence intervals as shown in Figure 7Cc. Finally, we also obtain two network structures for the two different cases, shown in Figure 7Ca and Figure 7Cb. It is clearly seen that the global patterns for the mock case and the infected case are different.

From the literature, there are three well known connections in the whole structure for the mock case. (1) It is known that GI alone is independent of the remaining six genes in the circuit, so there should be no connection to or from the GI node (node 2 in Figure 7) in our derived network. From Figure 7Ba and Figure 7Ca, we


Figure 4 Granger causality and Bayesian network inference approaches applied to a simple non-linear toy model. (A) Five time series are simultaneously generated, and the length of each time series is 1000. They are assumed to be stationary. (B) The five histograms show the probability distributions of these five time series. (C) Assuming no knowledge of the MVAR toy model we fitted, we calculated the Granger causality. The bootstrapping approach is used to construct the confidence intervals. The fitted MVAR model is simulated to generate a data set of 100 realizations of 1000 time points each. (a) For visualization purposes, all directed edges (causalities) are sorted and enumerated in the table. The total number of edges is 20. The 95% confidence interval is chosen. (b) The network structure inferred from the Granger causality method correctly recovers the pattern of connectivity in our MVAR toy model. (D) Assuming no knowledge of the MVAR toy model we fitted, we apply Bayesian network inference. (a) The causal network structure learned from Bayesian network inference for one realization of 1000 time points. (b) Each variable is represented by two nodes, each node representing a different time status, so we have 10 nodes in total. They are numbered and enumerated in the table. (c) The simplified network structure: since we only care about the causality to the current time status, we can remove all the other edges and nodes that have no connection to nodes 6 to 10 (the five variables at the current time status). (d) A further simplified network structure: in order to compare with the Granger causality approach, we hid the time status information, and we obtained the same structure as the Granger causality method.


Figure 5 Granger causality and Bayesian network inference applied to an insufficient number of data points for the non-linear model. The grey edges in the inferred network structures indicate undetected causalities in our defined toy model. For each sample size n, we simulated a data set of 100 realizations of n time points. The Bayesian network structure represents a model average from these 100 realizations. High-confidence arcs, appearing in at least 95% of the networks, are shown. The Granger causality inferred the structure according to the 95% confidence interval constructed by using the bootstrap method. (A) The sample size is 300. (B) The sample size is 150. (C) The sample size is 50.


Figure 6 Granger causality and Bayesian network inference applied to a stochastic-coefficients non-linear model. The parameters in the polynomial equation are randomly generated in the interval [-2,2]. (A) We applied both approaches to different sample sizes (from 300 to 900). For each sample size, we generated 100 different coefficient vectors, so the total number of directed interactions for each sample size is 500. (a) The percentage of detected true positive causalities for both approaches. (b) Time cost for both approaches. (B) For sample size 900, the derived causality (1 represents positive causality and 0 represents no causality) is plotted against the absolute value of the corresponding coefficients. For visualization purposes, the curve for the Granger causality is shifted upward.


Figure 7 Granger causality and Bayesian network inference approaches applied to experimental data (small sample size). The experiment measures the intensity of 7 genes in two cases of Arabidopsis leaf: mock (normal) and infected. (A) The time traces of the 7 genes are plotted. There are 4 realizations of 24 time points; the time interval is 2 hours. (B) The network structures derived by using dynamic Bayesian network inference. All the genes are numbered as shown. Interestingly, after infection, the total network structure is changed. (a) The network structure for the mock case. (b) The network structure for the infected case. (C) The network structures derived by using the Granger causality. (a) The network structure for the mock case. (b) The network structure for the infected case. (c) The 95% confidence intervals constructed using the bootstrapping method. For visualization purposes, all the directed edges are numbered and enumerated in the table.


find that the dynamic Bayesian network inference method clearly picks this up, but the conditional Granger causality approach fails to detect this property: the Granger causality approach derived two false positive arcs connected to the GI node, as shown in Figure 7Ca. (2) It is known that PRR7 and LHY share a feedback loop; in other words, there should be two directed arcs, one from node 1 (PRR7) to node 5 (LHY) and one from node 5 to node 1. The network structures derived from both approaches are in agreement with this known relationship. (3) It is known that ELF4 has interactions with both LHY and CCA1, so there should be connections between node 4 (ELF4) and node 5 (LHY), and between node 4 (ELF4) and node 6 (CCA1). From our derived structures, both approaches can detect these connections, in agreement with the known structure in the literature [24,25].

According to these three known relationships, the Bayesian network structure agrees with all three rules, whereas the network structure derived from the conditional Granger causality does not: it contains two additional false positive interactions. Again, for a small sample size, the Bayesian network inference approach can be more reliable than the conditional Granger causality approach.

Discussion

A fair comparison
In the results presented here, a key cause of the critical point of the sample size between the dynamic Bayesian approach and the Granger causality lies in the fact that a batch fitting approach is used in the Granger causality. One might argue that we could use a sequential fitting approach, as in the Bayesian network, to improve the performance of the Granger causality approach. This is certainly the case. However, given the thousands of publications on both topics [26], we simply adopted the most common implementations of the dynamic Bayesian network approach and the Granger causality. Developing one of the approaches, for example the Granger causality, so that it always outperforms the other is an interesting topic for future research.

How long is long enough?
Although we have found the critical point between the two approaches, in practical applications we have no certain idea where the critical point is. Hence, we still have to choose one of them to tackle the data. In molecular biology we have to deal with a very limited data size, but in physiology, for example neurophysiology, the recorded data are usually very long. Hence one could

argue that we should use the dynamic Bayesian network for gene, protein or metabolite data, and apply the Granger causality to physiological data. The dynamic Bayesian network is more often reported in molecular biology, but the Granger causality has been applied very successfully to neurophysiological data [27] and fMRI. The approach we chose to use was always selected through experimental validation, as we did here for the plant data.

Frequency decomposition
As we emphasized at the beginning, the advantage of the Granger causality over the dynamic Bayesian network is its frequency decomposition, which is usually informative when dealing with temporal data. For example, in neurophysiological data, we know that the brain employs different frequency bands to communicate between neurons and brain areas [28,29]. We would expect a similar situation to arise for genes, proteins and metabolites, although we lack a detailed analysis due to the limited data length. To this end, we have also presented frequency decomposition results for the dynamic Bayesian network in Appendix 1 (see Additional files 1 and 2).

False positives
In our synthesized data, we did not find any false positive links for either approach. However, a few false positive links were found when we applied the conditional Granger causality, and also the partial Granger causality (data not shown, [21]), to the gene data. One might ask why this is the case; there are several reasons. Firstly, the experimental data are not strictly stationary: they come from a natural process and evolve with time. As a first approximation, we treat them as stationary. Of course, we could use an ARIMA rather than an ARMA model to fit the data in the Granger causality. Secondly, the seven-gene network is only a small network embedded in a complete and larger network, so there are latent variables. Even using the partial Granger causality [21], which was originally developed to eliminate latent variables, GI still has links with the other six genes. Whether the dynamic Bayesian network could do a better job in the presence of latent variables is another research topic.

The meaning of the found motifs
Two circuits are found: one for the mock plant and one for the infected plant. The plant rewires its circadian circuit after infection. Setting aside the issue of identifying the molecular mechanisms that control circuit rewiring, which is itself an interesting and challenging problem, we intend to discuss the functional meaning of the two circuits. To this end, we could assign a


dynamics to the network and try to decipher the implications of the rewiring. Interestingly, we found that GI is recruited to save the network: if we leave GI as it is in the mock case, the whole network rapidly converges to a fixed-point state (a dead state). We will publish these results elsewhere.

Reasons for short data size
For our synthesized data, we tested both short and long data samples and concluded that there is a critical size at which the two approaches behave differently. For our experimental data, we only tested the short data set. Of course, as mentioned above, in neurophysiology we have recordings of long time traces and the Granger causality is widely used there. However, we have to realize that all in vivo recordings are very dynamic, and stationarity of the data becomes a key issue once we apply both approaches to a long dataset. Furthermore, when the dataset is long, both approaches can do well and it is more difficult to find a difference between the two. Hence we have only compared the results for short data lengths in the experimental setup.

Reasons for the small number of variables
In our synthesized data, we only used 5 variables to simulate a small interacting network; the number of variables could affect the results we derived. As expected (see also [23]), the estimation of the Granger causality becomes unfeasible when the number of variables is large and the amount of data is small. Hence, all results in the literature on estimating Granger causality are exclusively for small networks (on the order of 10), as considered here. This is more or less true for dynamic Bayesian network inference as well. Extending the Granger causality and the dynamic Bayesian network inference to large networks is a challenging problem, even before carrying out the same comparison study on these two approaches as we did here.

Conclusion
In this paper, we carried out a systematic and computationally intensive comparison between the network structures derived from two common approaches: dynamic Bayesian network inference and the Granger causality. The two approaches were applied to both synthesized and experimental data. For synthesized data (both linear and non-linear models), a critical point of the data length is found, and the result is further confirmed on experimental data. The dynamic Bayesian network inference performs better than the Granger causality approach when the data length is short, and vice versa.

Methods

Granger causality
The notion of a causal influence measurement for time series was first proposed by Wiener and Granger. We can determine a causal influence of one time series on another if the prediction of one time series can be improved by incorporating knowledge of the second. Granger formalized this notion in the context of linear vector autoregressive (VAR) models of stochastic processes [30-33]. In the AR model, the variance of the prediction error is used to test the prediction improvement. For instance, given two time series, if the variance of the autoregressive prediction error of the first time series at the present time is reduced by including past measurements from the second time series, then one can conclude that the second time series has a causal influence on the first. Geweke [15,16] decomposed the VAR process into the frequency domain, converting the causality measurement into a spectral representation and making the interpretation more appealing.

The pairwise analysis introduced above can only be applied to bivariate time series. With more than two time series, one time series can have a direct or an indirect causal influence on another. In this case, pairwise analysis is insufficient, or even misleading, for revealing whether the causal interaction between a pair is direct or indirect. To distinguish direct from indirect causal effects, one introduces the conditional causality, which takes account of the effects of the other time series in a multivariate set. In this paper, we used the conditional causality for comparison with the Bayesian network inference introduced before.

Linear conditional Granger causality
The conditional Granger causality was defined by Granger. It can be explained as follows. Given two time series Xt and Zt, the joint autoregressive representation for Xt and Zt, using the knowledge of their past measurements, can be expressed as

X_t = \sum_{i=1}^{\infty} a_{1i} X_{t-i} + \sum_{i=1}^{\infty} c_{1i} Z_{t-i} + \varepsilon_{1t}
Z_t = \sum_{i=1}^{\infty} b_{1i} Z_{t-i} + \sum_{i=1}^{\infty} d_{1i} X_{t-i} + \varepsilon_{2t}    (4)

and the noise covariance matrix for the system can be represented as

S = \begin{bmatrix} \operatorname{var}(\varepsilon_{1t}) & \operatorname{cov}(\varepsilon_{1t}, \varepsilon_{2t}) \\ \operatorname{cov}(\varepsilon_{2t}, \varepsilon_{1t}) & \operatorname{var}(\varepsilon_{2t}) \end{bmatrix} = \begin{bmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{bmatrix}    (5)


where var and cov denote variance and covariance, respectively. Incorporating the knowledge of a third time series, the vector autoregressive model involving the three time series Xt, Yt and Zt can be represented as

X_t = \sum_{i=1}^{\infty} a_{2i} X_{t-i} + \sum_{i=1}^{\infty} b_{2i} Y_{t-i} + \sum_{i=1}^{\infty} c_{2i} Z_{t-i} + \varepsilon_{3t}
Y_t = \sum_{i=1}^{\infty} d_{2i} X_{t-i} + \sum_{i=1}^{\infty} e_{2i} Y_{t-i} + \sum_{i=1}^{\infty} f_{2i} Z_{t-i} + \varepsilon_{4t}
Z_t = \sum_{i=1}^{\infty} g_{2i} X_{t-i} + \sum_{i=1}^{\infty} h_{2i} Y_{t-i} + \sum_{i=1}^{\infty} k_{2i} Z_{t-i} + \varepsilon_{5t}    (6)

And the noise covariance matrix for the above system is

S = \begin{bmatrix} \operatorname{var}(\varepsilon_{3t}) & \operatorname{cov}(\varepsilon_{3t}, \varepsilon_{4t}) & \operatorname{cov}(\varepsilon_{3t}, \varepsilon_{5t}) \\ \operatorname{cov}(\varepsilon_{4t}, \varepsilon_{3t}) & \operatorname{var}(\varepsilon_{4t}) & \operatorname{cov}(\varepsilon_{4t}, \varepsilon_{5t}) \\ \operatorname{cov}(\varepsilon_{5t}, \varepsilon_{3t}) & \operatorname{cov}(\varepsilon_{5t}, \varepsilon_{4t}) & \operatorname{var}(\varepsilon_{5t}) \end{bmatrix} = \begin{bmatrix} \Sigma_{xx} & \Sigma_{xy} & \Sigma_{xz} \\ \Sigma_{yx} & \Sigma_{yy} & \Sigma_{yz} \\ \Sigma_{zx} & \Sigma_{zy} & \Sigma_{zz} \end{bmatrix}    (7)

where \varepsilon_{it}, i = 1, 2, ..., 5 are the prediction errors, which are uncorrelated over time. From the above two sets of equations, the conditional Granger causality from Y to X conditional on Z can be defined as

F_{Y \to X \mid Z} = \ln\left( \frac{\operatorname{var}(\varepsilon_{1t})}{\operatorname{var}(\varepsilon_{3t})} \right)    (8)

When the causal influence from Y to X is entirely mediated by Z, the coefficients b_{2i} are uniformly zero, and the two autoregressive models, for two and for three time series, will be exactly the same; thus var(\varepsilon_{1t}) = var(\varepsilon_{3t}). We then deduce that F_{Y \to X \mid Z} = 0, which means that Y cannot further improve the prediction of X beyond what the past measurements of X and Z already provide. When var(\varepsilon_{1t}) > var(\varepsilon_{3t}), so that F_{Y \to X \mid Z} > 0, we can say that there is still a direct influence from Y to X conditional on the past measurements of Z.
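The definitions in equations (4)-(8) can be sketched numerically: fit a restricted model (X on past X and Z) and a full model (X on past X, Y and Z) by ordinary least squares with a finite lag order p (a practical truncation of the infinite-order VAR above), and compare the residual variances. This is a minimal illustration, not the paper's implementation; the function names and the toy data are our own.

```python
import numpy as np

def var_residual_var(target, predictors, p):
    """OLS-fit target[t] on p lags of each predictor series;
    return the residual variance of the fit."""
    n = len(target)
    rows = []
    for t in range(p, n):
        row = [1.0]  # intercept
        for s in predictors:
            row.extend(s[t - p:t][::-1])  # the p most recent lags
        rows.append(row)
    A = np.array(rows)
    y = target[p:]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.var(y - A @ coef)

def conditional_gc(x, y, z, p=2):
    """F_{Y->X|Z} = ln( var(restricted residual) / var(full residual) )."""
    restricted = var_residual_var(x, [x, z], p)     # cf. eq. (4)
    full = var_residual_var(x, [x, y, z], p)        # cf. eq. (6)
    return np.log(restricted / full)

# toy data: y drives x directly, z is independent
rng = np.random.default_rng(0)
n = 2000
y = rng.normal(size=n)
z = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + 0.8 * y[t - 1] + 0.1 * rng.normal()
print(conditional_gc(x, y, z))  # clearly positive: Y helps predict X given Z
```

Swapping the roles of y and z (F_{Z->X|Y}) yields a value near zero, since z carries no extra information about x once y is conditioned on.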

Non-linear conditional Granger causality
We can extend the Granger causality to a non-linear model by using a series of kernel functions [22,34]. Let X, Y and Z be three time series of n simultaneously measured quantities, which are assumed to be stationary. We wish to quantify how much Y causes X conditional on Z. The general expression for the nonlinear model is:

X_t = \sum_{j} w_{1j} \Phi_j(X_{t-j}) + \sum_{j} w_{2j} \Phi_j(Z_{t-j}) + \varepsilon_{6t}
Z_t = \sum_{j} w_{3j} \Phi_j(X_{t-j}) + \sum_{j} w_{4j} \Phi_j(Z_{t-j}) + \varepsilon_{7t}    (9)

The function \Phi can be chosen as a Gaussian kernel of X and Z, with the following expressions:

\Phi_j(X) = \exp\left( -\frac{(X - \bar{X})^2}{2\sigma_X^2} \right)    (10)

\Phi_j(Z) = \exp\left( -\frac{(Z - \bar{Z})^2}{2\sigma_Z^2} \right)    (11)

where \bar{X} and \bar{Z} are the centers of X and Z, and \sigma_X^2 and \sigma_Z^2 are the variances of X and Z. The covariance matrix of the prediction errors can be expressed as

S = \begin{bmatrix} \operatorname{var}(\varepsilon_{6t}) & \operatorname{cov}(\varepsilon_{6t}, \varepsilon_{7t}) \\ \operatorname{cov}(\varepsilon_{7t}, \varepsilon_{6t}) & \operatorname{var}(\varepsilon_{7t}) \end{bmatrix} = \begin{bmatrix} S_{11}^2 & S_{12}^2 \\ S_{21}^2 & S_{22}^2 \end{bmatrix}    (12)

A joint autoregressive representation including all three time series has the following expression:

X_t = \sum_{j} w_{5j} \Phi_j(X_{t-j}) + \sum_{j} w_{6j} \Phi_j(Y_{t-j}) + \sum_{j} w_{7j} \Phi_j(Z_{t-j}) + \varepsilon_{8t}
Y_t = \sum_{j} w_{8j} \Phi_j(X_{t-j}) + \sum_{j} w_{9j} \Phi_j(Y_{t-j}) + \sum_{j} w_{10j} \Phi_j(Z_{t-j}) + \varepsilon_{9t}
Z_t = \sum_{j} w_{11j} \Phi_j(X_{t-j}) + \sum_{j} w_{12j} \Phi_j(Y_{t-j}) + \sum_{j} w_{13j} \Phi_j(Z_{t-j}) + \varepsilon_{10t}    (13)

The covariance matrix of the prediction errors can be expressed as

S = \begin{bmatrix} \operatorname{var}(\varepsilon_{8t}) & \operatorname{cov}(\varepsilon_{8t}, \varepsilon_{9t}) & \operatorname{cov}(\varepsilon_{8t}, \varepsilon_{10t}) \\ \operatorname{cov}(\varepsilon_{9t}, \varepsilon_{8t}) & \operatorname{var}(\varepsilon_{9t}) & \operatorname{cov}(\varepsilon_{9t}, \varepsilon_{10t}) \\ \operatorname{cov}(\varepsilon_{10t}, \varepsilon_{8t}) & \operatorname{cov}(\varepsilon_{10t}, \varepsilon_{9t}) & \operatorname{var}(\varepsilon_{10t}) \end{bmatrix} = \begin{bmatrix} \Sigma_{xx}^2 & \Sigma_{xy}^2 & \Sigma_{xz}^2 \\ \Sigma_{yx}^2 & \Sigma_{yy}^2 & \Sigma_{yz}^2 \\ \Sigma_{zx}^2 & \Sigma_{zy}^2 & \Sigma_{zz}^2 \end{bmatrix}    (14)

Similarly, we can define the conditional causality as

F_{Y \to X \mid Z} = \ln\left( \frac{\operatorname{var}(\varepsilon_{6t})}{\operatorname{var}(\varepsilon_{8t})} \right)    (15)
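The nonlinear variant in equations (9)-(15) can be sketched by replacing the raw lags with kernel-transformed lags before the least-squares fit. As an illustrative simplification, this sketch uses a single Gaussian kernel per lag, centred at each series' sample mean with its sample variance; the method in [22,34] may use a richer bank of centres. All names and the toy data below are our own.

```python
import numpy as np

def rbf(u, centre, var):
    """Gaussian kernel feature, as in eqs. (10)-(11)."""
    return np.exp(-(u - centre) ** 2 / (2.0 * var))

def kernel_design(series_list, p):
    """Stack kernel-transformed lags of each series into a regression matrix."""
    n = len(series_list[0])
    cols = [np.ones(n - p)]  # intercept
    for s in series_list:
        c, v = s.mean(), s.var()
        for lag in range(1, p + 1):
            cols.append(rbf(s[p - lag:n - lag], c, v))
    return np.column_stack(cols)

def nonlinear_cgc(x, y, z, p=2):
    """F_{Y->X|Z} = ln(var(eps_6)/var(eps_8)), a sketch of eqs. (9)-(15)."""
    target = x[p:]
    def resid_var(predictors):
        A = kernel_design(predictors, p)
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        return np.var(target - A @ coef)
    return np.log(resid_var([x, z]) / resid_var([x, y, z]))

# toy data: x depends nonlinearly on y; z is irrelevant
rng = np.random.default_rng(1)
n = 3000
y = rng.normal(size=n)
z = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.4 * x[t - 1] + np.exp(-y[t - 1] ** 2) + 0.1 * rng.normal()
print(nonlinear_cgc(x, y, z))  # positive: the kernel features capture the nonlinear drive
```

A purely linear fit would underestimate this influence, since exp(-y^2) is uncorrelated with y itself; the Gaussian features make it visible.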

Bayesian network
Bayesian networks are probabilistic graphical models initially introduced by Kim and Pearl (1987). A Bayesian network is a specific type of graphical model that is a directed acyclic graph [35,36]: each arc is directed, and there is no way to start from any node, travel along a set of directed edges, and arrive back at the initial node. The set of nodes represents a set of random variables [X1, X2, ..., Xn], and the arcs express statistical dependence between downstream variables and upstream variables. The upstream variables are also called the parent variables of the downstream variables.


Bayesian network inference yields the most concise model, automatically excluding arcs based on dependencies already explained by the model, which means the arcs in the network can be interpreted as conditional causality. The edges in the Bayesian network encode a particular factorization of the joint distribution, which can be decomposed as follows:

P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P\big(X_i \mid \mathrm{parents}(X_i)\big)    (16)

That is, the joint probability distribution is the product of the local distributions of each node given its parents. If node Xi has no parents, its local probability distribution is said to be unconditional; otherwise it is conditional. This decomposition is useful for Bayesian network inference algorithms dealing with uncertainty and incomplete data.
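The factorization in equation (16) can be made concrete with a toy three-node chain X1 -> X2 -> X3; all probability tables here are made-up numbers, purely for illustration.

```python
# Joint P(x1, x2, x3) = P(x1) P(x2|x1) P(x3|x2) for the chain X1 -> X2 -> X3.
p_x1 = {0: 0.6, 1: 0.4}
p_x2_given_x1 = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
p_x3_given_x2 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}

def joint(x1, x2, x3):
    # eq. (16): product of each node's local distribution given its parents
    return p_x1[x1] * p_x2_given_x1[x1][x2] * p_x3_given_x2[x2][x3]

total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(total)  # sums to 1: the factorization defines a valid joint distribution
```

Note that only 1 + 2 + 2 = 5 free parameters are needed here, instead of the 7 a full joint table over three binary variables would require; this is the conciseness the factorization buys.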

Learning the parameters of a Bayesian network essentially amounts to estimating two kinds of probability distributions: the probability P(X) and the conditional probability P(X|Y). There are two kinds of approaches to density estimation: nonparametric methods and parametric methods. The simplest nonparametric estimate is the histogram approach; the distribution can then be a tabular conditional probability distribution, represented as a table. However, this approach requires a much larger sample size to give an accurate estimate, which is not suitable for typical experimental data. For a parametric method, one makes assumptions about the form of the distribution, such as the widely used Gaussian distribution. For a D-dimensional vector X, the multivariate Gaussian distribution has the form

N(X \mid \mu, \Sigma) = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2} (X - \mu)^T \Sigma^{-1} (X - \mu) \right\}    (17)

where \mu is a D-dimensional mean vector, \Sigma is a D x D covariance matrix, and |\Sigma| denotes the determinant of \Sigma. In this paper, we first take every node's conditional probability distribution to be a conditional Gaussian distribution for the following inferences. The distribution on a node X can be defined as follows:

- no parents: P(X) \sim N(X \mid \mu, \sigma)    (18)

- continuous parents: P(X \mid Y = y) \sim N(X \mid \mu + W^T y, \sigma)    (19)

where T denotes matrix transposition and W is the vector of connection weights between node X and its parents Y. It can be represented using the covariance matrix as follows:

W = \Sigma_{xy} \Sigma_{yy}^{-1}    (20)

The detailed derivations of the parameter estimates are given below.

To learn the structure of a Bayesian network, one needs to search the space of all possible structures and find the one that best describes the input data, i.e. the one that maximizes the conditional probability P(Data|\theta, M) of the data given the parameters (\theta) and the network structure (M). To balance the complexity and conciseness of the structure, we can use the BIC (Bayesian Information Criterion) as a scoring function, which includes an extra penalty term.

Parameter learning for the linear Gaussian model
Parameter learning can be approached by fitting a linear-Gaussian model. The goal of parameter learning in a Bayesian network is to estimate the mean and covariance of the conditional Gaussian distribution, from which we can deduce the parameters \mu, \sigma and W in equation (19).

Suppose X is a D-dimensional vector with Gaussian distribution N(X|\mu, \Sigma), and partition X into two disjoint subsets Xa and Xb. For ease of illustration, take the first M components of X to form Xa and the remaining components to form Xb, so that

X = \begin{pmatrix} X_a \\ X_b \end{pmatrix}    (21)

We correspondingly partition the mean vector \mu and the covariance matrix \Sigma as

\mu = \begin{pmatrix} \mu_a \\ \mu_b \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{pmatrix}    (22)

Considering the quadratic form in the exponent of the Gaussian distribution, we obtain the following equation by a transformation:

-\frac{1}{2}(X - \mu)^T \Sigma^{-1} (X - \mu) = -\frac{1}{2} X^T \Sigma^{-1} X + X^T \Sigma^{-1} \mu + \text{const}
= -\frac{1}{2}(X_a - \mu_a)^T (\Sigma^{-1})_{aa} (X_a - \mu_a) - \frac{1}{2}(X_a - \mu_a)^T (\Sigma^{-1})_{ab} (X_b - \mu_b) - \frac{1}{2}(X_b - \mu_b)^T (\Sigma^{-1})_{ba} (X_a - \mu_a) - \frac{1}{2}(X_b - \mu_b)^T (\Sigma^{-1})_{bb} (X_b - \mu_b)    (23)


From this, regarding Xb as a constant, we obtain the following expressions for the mean and covariance of the conditional probability distribution P(Xa|Xb):

\mu_{a|b} = \mu_a + \Sigma_{ab} \Sigma_{bb}^{-1} (X_b - \mu_b)    (24)

\Sigma_{a|b} = \Sigma_{aa} - \Sigma_{ab} \Sigma_{bb}^{-1} \Sigma_{ba}    (25)

Thus the parameters of the Bayesian network can be learned from the above two equations.
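Equations (24) and (25) are straightforward to evaluate numerically. The sketch below conditions the first component of a toy 3-dimensional Gaussian on the other two; the numbers are made-up for illustration.

```python
import numpy as np

# Toy 3-D Gaussian; condition block a (dim 0) on block b (dims 1, 2).
mu = np.array([1.0, 0.0, -1.0])
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
a, b = [0], [1, 2]
S_aa = Sigma[np.ix_(a, a)]
S_ab = Sigma[np.ix_(a, b)]
S_ba = Sigma[np.ix_(b, a)]
S_bb = Sigma[np.ix_(b, b)]

x_b = np.array([0.5, -0.5])  # observed value of X_b
# eq. (24): conditional mean; eq. (25): conditional covariance
mu_cond = mu[a] + S_ab @ np.linalg.inv(S_bb) @ (x_b - mu[b])
Sigma_cond = S_aa - S_ab @ np.linalg.inv(S_bb) @ S_ba
print(mu_cond, Sigma_cond)
```

The conditional variance is always no larger than the marginal variance \Sigma_{aa}, reflecting the information gained by observing X_b.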

Parameter learning for the non-linear Gaussian model
We can also extend the linear model to a non-linear model, as in the Granger causality case. Suppose we have two variables that can be expressed as in equation (9), with the kernel functions chosen as in equations (10) and (11).

In our non-linear model, the probability distribution for Xt is no longer Gaussian. From the expression in equation (9), the distribution of Xt combines the distribution of the kernel functions of the past measured values of X and Z with a Gaussian distribution for the noise term. The kernel distribution is very difficult to derive, so one can use a mixture of Gaussians to approximate the true distribution of the kernel functions. The Gaussian mixture model has the form:

P(X) = \sum_{k=1}^{K} \pi_k N(X \mid \mu_k, \Sigma_k)    (26)

Each Gaussian density N(X|\mu_k, \Sigma_k) is called a component of the mixture and has its own mean \mu_k and covariance \Sigma_k. The parameters \pi_k are called mixing coefficients and satisfy:

\sum_{k=1}^{K} \pi_k = 1    (27)

The conditional probability distribution for Xt, conditional on the past observations of X and Y in the nonlinear model, is still a Gaussian distribution, which can easily be obtained as follows:

P(X \mid Y = y) = N\Big(X \,\Big|\, \mu + \sum_i w_i \Phi_i(y), \; \sigma\Big)    (28)

where the w_i are the connection weights between node X and its parents. They can be estimated using simple linear regression.

Structure learning
There are two very different approaches to structure learning: one is constraint-based and the other is the search-and-score algorithm. In the constraint-based approach, we start with a fully connected network and remove arcs between nodes that are conditionally independent. This has the disadvantage that repeated independence tests lose statistical power. In the search-and-score approach, we search over all possible graphs and select the one that best describes the statistical dependence relationships in the observed data.

Unfortunately, the number of possible graphs increases super-exponentially with the number of nodes, so search algorithms are required to overcome this complexity rather than performing an exhaustive search of the space. Several search algorithms can be applied, such as simulated annealing, genetic algorithm search and so on. The problem becomes easier if we know a total order of the nodes. The K2 algorithm allows us to find the best structure by selecting the best set of parents for each node independently. In dynamic Bayesian networks, the order of nodes can be interpreted as the sequence of time lags represented by each node, so the K2 algorithm is applied for Bayesian network structure learning in this paper (see Appendix 2 in Additional file 1 for a more detailed description). The K2 algorithm tests parent insertion according to the order: the first node cannot have any parent, and for every other node we can only choose parent nodes that precede it in the order. The scoring function can then be applied to determine the best parent set, i.e. the one which gives the best score.

In addition to the search algorithm, a scoring function must be defined to decide which structure is best (a high-scoring network). There are two popular choices: one is the Bayesian score metric, which is the marginal likelihood of the model, and the other is the BIC (Bayesian Information Criterion), defined as follows:

\log P(Data \mid \theta) - \frac{d}{2} \log N    (29)

where Data is the observed data, \theta is the estimated value of the parameters, d is the number of parameters and N is the number of data cases. The term (d/2) \log N is regarded as a penalty term that balances the simplicity of the structure against the accuracy of its representation.

Suppose we observe a set of independent and identically distributed data Data = {Y^1, Y^2, ..., Y^N}, each of which can be a case of multi-dimensional data. Then the log likelihood of the data set can be defined as


\log P(Data \mid \theta) = \log \prod_{i=1}^{N} P(Y^i \mid \theta) = \sum_{i=1}^{N} \sum_{j} \log P\big(Y_j^i \mid Y_{pa(j)}^i, \theta_j\big)    (30)

where j indexes the nodes (variables) in the Bayesian network, pa(j) is the set of parents of node j, and \theta_j are the parameters that define the conditional probability of Y_j given its parents.

In this paper, Bayesian network inference then proceeds as follows: initially, the K2 algorithm is applied to search the space of possible graphs. For each candidate structure, we use the parameter learning algorithm to estimate the parameters of the network. The BIC scoring function then assigns a score from the estimated parameters and the observed data. The best network is the highest-scoring structure among all the candidate graphs [37].
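The K2-plus-BIC procedure described above can be sketched for linear-Gaussian nodes: for each node, in a fixed order, greedily add the preceding node that most improves the BIC of that node's local regression, stopping when no addition helps. This is a simplified illustration, not the BNT implementation; the function names, the Gaussian-BIC parameter count (regression weights plus one variance), and the toy data are our own assumptions.

```python
import numpy as np

def gaussian_bic(child, parent_data):
    """BIC for a linear-Gaussian node: log-likelihood - (d/2) log N, cf. eq. (29)."""
    N = len(child)
    A = np.column_stack([np.ones(N)] + parent_data)
    coef, *_ = np.linalg.lstsq(A, child, rcond=None)
    sigma2 = max(np.var(child - A @ coef), 1e-12)
    loglik = -0.5 * N * (np.log(2 * np.pi * sigma2) + 1)
    d = A.shape[1] + 1  # regression weights + noise variance
    return loglik - 0.5 * d * np.log(N)

def k2_parents(data, order, max_parents=2):
    """Greedy K2: for each node, add the predecessor that most improves BIC."""
    parents = {j: [] for j in order}
    for idx, j in enumerate(order):
        candidates = list(order[:idx])  # only nodes earlier in the order
        best = gaussian_bic(data[j], [data[p] for p in parents[j]])
        improved = True
        while improved and len(parents[j]) < max_parents:
            improved = False
            for c in [c for c in candidates if c not in parents[j]]:
                score = gaussian_bic(data[j], [data[p] for p in parents[j] + [c]])
                if score > best:
                    best, best_c, improved = score, c, True
            if improved:
                parents[j].append(best_c)
    return parents

# toy linear-Gaussian chain 0 -> 1 -> 2 (made-up weights)
rng = np.random.default_rng(2)
x0 = rng.normal(size=500)
x1 = 0.9 * x0 + 0.3 * rng.normal(size=500)
x2 = 0.8 * x1 + 0.3 * rng.normal(size=500)
print(k2_parents({0: x0, 1: x1, 2: x2}, order=[0, 1, 2]))
```

On this toy chain the search recovers 0 as the parent of 1 and 1 as a parent of 2; the (d/2) log N penalty is what discourages adding the indirect parent 0 to node 2.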

Authors' contributions
The whole work was carried out by CZ and supervised by JF.

Additional material

Additional file 1
A method for the Bayesian network inference approach in the frequency domain and a detailed description of Bayesian network structure learning.
[http://www.biomedcentral.com/content/supplementary/1471-2105-10-122-S1.doc]

Additional file 2
A Matlab software package for the Bayesian network inference approach used in this paper.
[http://www.biomedcentral.com/content/supplementary/1471-2105-10-122-S2.rar]

Acknowledgements
This work was supported by the CARMAN e-science project funded by the EPSRC (EP/E002331/1) and an EU grant (BION).

References
1. Klipp E, Herwig R, Kowald A, Wierling C and Lehrach H: Systems Biology in Practice: Concepts, Implementation and Application. Weinheim: Wiley-VCH Press; 2005.

2. Feng J, Jost J and Qian M: Networks: From Biology to Theory. London: Springer Press; 2007.

3. Alon U: Network motifs: theory and experimental approaches. Nat Rev Genet 2007, 8(6):450–461.

4. Tong AH, Lesage G, Bader GD, Ding H, Xu H, Xin X, Young J, Berriz GF, Brost RL, Chang M, Chen Y, Cheng X, Chua G, Friesen H, Goldberg DS, Haynes J, Humphries C, He G, Hussein S, Ke L, Krogan N, Li Z, Levinson JN, Lu H, Ménard P, Munyana C, Parsons AB, Ryan O, Tonikian R, Roberts T, Sdicu AM, Shapiro J, Sheikh B, Suter B, Wong SL, Zhang LV, Zhu H, Burd CG, Munro S, Sander C, Rine J, Greenblatt J, Peter M, Bretscher A, Bell G, Roth FP, Brown GW, Andrews B, Bussey H and Boone C: Global Mapping of the Yeast Genetic Interaction Network. Science 2004, 303:808.

5. Tsai TY, Choi YS, Ma W, Pomerening JR, Tang C and Ferrell JE Jr: Robust, Tunable Biological Oscillations from Interlinked Positive and Negative Feedback Loops. Science 2008, 321:126.

6. Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, Gerber GK, Hannett NM, Harbison CT, Thompson CM, Simon I, Zeitlinger J, Jennings EG, Murray HL, Gordon DB, Ren B, Wyrick JJ, Tagne JB, Volkert TL, Fraenkel E, Gifford DK and Young RA: Transcriptional Regulatory Networks in Saccharomyces cerevisiae. Science 2002, 298:799.

7. Pearl J: Causality: Models, Reasoning, and Inference. Cambridge: Cambridge Univ. Press; 2000.

8. Albo Z, Di Prisco GV, Chen Y, Rangarajan G, Truccolo W, Feng J, Vertes RP and Ding M: Is partial coherence a viable technique for identifying generators of neural oscillations? Biological Cybernetics 2004, 90:318.

9. Horton PM, Bonny L, Nicol AU, Kendrick KM and Feng JF: Applications of multi-variate analysis of variances (MANOVA) to multi-electrode array data. Journal of Neuroscience Methods 2005, 146:22.

10. Guo S, Wu J, Ding M and Feng J: Uncovering interactions in the frequency domain. PLoS Computational Biology 2008, 4(5):e1000087.

11. Wu J, Liu X and Feng J: Detecting causality between different frequencies. Journal of Neuroscience Methods 2008, 167:367.

12. Jansen R, Yu H, Greenbaum D, Kluger Y, Krogan NJ, Chung S, Emili A, Snyder M, Greenblatt JF and Gerstein M: A Bayesian Networks Approach for Predicting Protein-Protein Interactions from Genomic Data. Science 2003, 302:449.

13. Sachs K, Perez O, Pe'er D, Lauffenburger DA and Nolan GP: Causal Protein-Signaling Networks Derived from Multiparameter Single-Cell Data. Science 2005, 308:523.

14. Ghahramani Z: Learning Dynamic Bayesian Networks. Berlin: Springer Press; 2004.

15. Geweke J: Measurement of Conditional Linear Dependence and Feedback Between Time Series. Journal of the American Statistical Association 1984, 79(388):907.

16. Geweke J: Measurement of Linear Dependence and Feedback Between Multiple Time Series. Journal of the American Statistical Association 1982, 77(378):304.

17. Jensen FV: An Introduction to Bayesian Networks. London: UCL Press; 1996.

18. Bach FR and Jordan MI: Learning Graphical Models for Stationary Time Series. IEEE Transactions on Signal Processing 2004, 52(8):2189.

19. Buntine WL: Operations for Learning with Graphical Models. Journal of Artificial Intelligence Research 1994, 2:159.

20. Friedman N: Inferring Cellular Networks Using Probabilistic Graphical Models. Science 2004, 303:799.

21. Guo S, Seth AK, Kendrick KM, Zhou C and Feng J: Partial Granger causality: eliminating exogenous inputs and latent variables. Journal of Neuroscience Methods 2008, 172(1):79.

22. Chen Y, Rangarajan G, Feng J and Ding M: Analyzing multiple nonlinear time series with extended Granger causality. Physics Letters A 2004, 324:26.

23. Marinazzo D, Pellicoro M and Stramaglia S: Kernel-Granger causality and the analysis of dynamic networks. Physical Review E 2008, 77:056215.

24. Locke JC, Kozma-Bognár L, Gould PD, Fehér B, Kevei E, Nagy F, Turner MS, Hall A and Millar AJ: Experimental validation of a predicted feedback loop in the multi-oscillator clock of Arabidopsis thaliana. Molecular Systems Biology 2006, 2:59.

25. Ueda HR: Systems biology flowering in the plant clock field. Molecular Systems Biology 2006, 2:60.

26. ISI Web of Knowledge: On 24th July 2008, a search on ISI showed 3531 papers on Bayesian networks in the areas of Comp. Sci., Math., Math + Comp. Biol. and Business + Economics, and 1125 papers on Granger causality. http://www.isiknowledge.com.

27. Wang S, Chen Y, Ding M, Feng J, Stein JF, Aziz TZ and Liu X: Revealing the dynamic causal interdependence between neural and muscular signals in Parkinsonian tremor. Journal of


Franklin Institute - Engineering and Applied Mathematics 2007, 344(3-4):180.

28. Wu J, Kendrick K and Feng J: Detecting Hot-Spots in Multivariate Biological Data. BMC Bioinformatics 2007, 8:331.

29. Zhan Y, Halliday D, Jiang P, Liu X and Feng J: Detecting the time-dependent coherence between non-stationary electrophysiological signals: a combined statistical and time-frequency approach. Journal of Neuroscience Methods 2006, 156:322.

30. Akaike H: Fitting Autoregressive Models for Prediction. Annals of the Institute of Statistical Mathematics 1969, 21:243.

31. Beamish N and Priestley MB: A Study of Autoregressive and Window Spectral Estimation. Applied Statistics 1981, 30(1):41.

32. Morettin PA: Levinson Algorithm and Its Applications in Time Series Analysis. International Statistical Review 1984, 52(1):83.

33. Morf M, Vieira A, Lee DTL and Kailath T: Recursive Multichannel Maximum Entropy Spectral Estimation. IEEE Transactions on Geoscience Electronics 1978, GE-16(2):85.

34. Ancona N, Marinazzo D and Stramaglia S: Radial Basis Function Approach to Nonlinear Granger Causality of Time Series. Physical Review E 2004, 70:056221.

35. Bishop CM: Neural Networks for Pattern Recognition. Oxford: Oxford Univ. Press; 1995.

36. Bishop CM: Pattern Recognition and Machine Learning. New York: Springer Press; 2006.

37. Murphy K: Bayes Net Toolbox for Matlab. http://www.cs.ubc.ca/~murphyk/Software/BNT/bnt.html.
