probabilistic approximations of bio-pathways dynamics p.s. thiagarajan school of computing, national...

Probabilistic Approximations of Bio-Pathways Dynamics

P.S. ThiagarajanSchool of Computing, National University of Singapore

Joint Work with: Liu Bing, David Hsu

Bio-pathways

Intracellular: Gene regulatory networks Metabolic networks signalling pathways.

A Common Modeling Approach

View a bio-pathway as a network of bio-chemical reactions Model the network as a system of ODEs

One for each molecular species Mass action, Michelis-Menten, Hill, etc.

Study the ODE system to understand the dynamics of the bio-chemical network

Major Hurdles Many rate constants not known

must be estimated noisy data; limited precision; population-based

High dimensional system closed-form solutions are impossible Must resort to numerical simulations Point values of initial states/data will not be available. a large number of numerical simulations needed for

answering each question

Our work:

Decompose Analyze Compose (ISMB’06, WABI’07)

Probabilistic models and methods to: approximate the ODE dynamics(CMSB’09,

TCS’11). update models (RECOMB’10)

The “Opinion Poll” Idea

Start with an ODEs system. Discretize the time and value domains. Assume a (uniform) distribution of initial states Generate a “sufficiently” large number of trajectories

by sampling the initial states and numerical simulations.

Encode this collection of trajectories as a dynamic Bayesian network.

ODEs DBN

The “opinion poll” Idea

Pay the one-time cost of constructing the Bayesian network.

Amortize this cost: perform multiple analysis tasks using

inferencing algorithms for DBNs.

Time Discretization

Observe the system only at discrete time points; in fact, only at finitely many time points

x(t)

t0 t1 t2tmax... ... ... ...

x(t) = t3 + 4t + x(0)

Value Discretization

Observe only with bounded precision

x(t)

t0 t1 t2 tmax... ... ... ...

A

B

C

D

E

Discrete Trajectories

A trajectory now is sequence of discrete values

C D D E D C B

x(t)

t0 t1 t2 tmax... ... ... ...

A

B

C

D

E

Discretized Behavior

Assume a prior distribution of the initial states. Behavior = (Uncountably) many sequences of discrete

values.

t

x(t)

t0 t1 t2 tmax... ... ... ...

A

B

C

D

E

C D D E D C B

D ... ...

Representation of Behavior

In fact, a probabilistic transition system. Pr( (D, 2) (E, 3) ) is

the “fraction” of the

trajectories residing in D

at t = 2 that land in E at

t = 3.

C D D E D C B

D

0.8

0.2

tt0 t1 t2 tmax... ... ... ...

A

B

C

D

E

Mathematical Justification The value space of the variables is assumed to be a

compact subset C of In Z’ = F(Z), F is assumed to be differentiable

everywhere in C. Mass-law, Michaelis-Menton,… for every rate constant k.

Then the solution (flow) t : C C (for each t) exists, is unique, a bijection, continuous and hence measurable.

But the transition probabilities can not be computed.

C D D E D C B

D

0.8

0.2

(s, i) – States; (s, i) (s’, i+1) -- Transitions

Sample, say, 1000 times the initial states.

Through numerical simulation, generate 1000 trajectories.

Pr((s, i) (s’ i+1)) is the fraction of the trajectories that are in s at ti which land in s’ at t i+1. tt0 t1 t2 tmax... ... ... ...

A

B

C

D

E

Computational Approximation

Infeasible Size! But the transition system will be huge.

O(T . kn) k 2 and n ( 50-100) can be large.

Compact Representation Exploit the network structure (additional

independence assumptions) to construct a dynamic Bayesian network instead.

The DBN is a factored form of a probabilistic transition system (Markov chain).

The DBN Representation

S0

E0

ES0

P0

S1

E1

ES1

P1

S2

E2

ES2

P2

S3

E3

ES3

P3

... ...

... ...

... ...

... ...

The DBN Representation

S0

E0

ES0

P0

S1

E1

ES1

P1

S2

E2

ES2

P2

S3

E3

ES3

P3

... ...

... ...

... ...

... ...

ABC

Each node has a CPT associated with it.

This specifies the local (probabilistic) dynamics.

S0

E0

ES0

P0 P1

E2

ES2

P2

S3

E3

ES3

P3

... ...

... ...

... ...

... ...

ABC

A

BCC

B

BCB

C

BBB

C

CBC

C

CAC

... ...

S1

E1

ES1

S2

.

P(S2=C|S1=B,E1=C,ES1=B)= 0.2

P(S2=C|S1=B,E1=C,ES1=C)= 0.1

P(S2=A|S1=A,E1=A,ES1=C)= 0.05

. . .

Fill up the entries in the CPTs by sampling, simulations and counting

S0

E0

ES0

P0 P1

E2

ES2

P2

S3

E3

ES3

P3

... ...

... ...

... ...

... ...

ABC S1

E1

ES1

S2

.

P(S2=C|S1=B,E1=C,ES1=B)= 0.2

P(S2=C|S1=B,E1=C,ES1=C)= 0.1

P(S2=A|S1=A,E1=A,ES1=C)= 0.05

. . .

Computational Approximation


S0

E0

ES0

P0 P1

E2

ES2

P2

S3

E3

ES3

P3

... ...

... ...

... ...

... ...

ABC S1

E1

ES1

S2

500 100

1000

The Technique


S0

E0

ES0

P0 P1

E2

ES2

P2

S3

E3

ES3

P3

... ...

... ...

... ...

... ...

ABC S1

E1

ES1

S2

500 100

P(S2=C|S1=B,E1=C,ES1=B)= 100/500= 0.2

The Technique

S0

E0

ES0

P0 P1

E2

ES2

P2

S3

E3

ES3

P3

... ...

... ...

... ...

... ...

ABC S1

E1

ES1

S2

500 100

P(S2=C|S1=B,E1=C,ES1=B)= 100/500= 0.2

The size of the DBN is:

O(T . n . kd)

d will be usually much smaller than n.

Unknown rate constants

S0

E0

ES0

P0

S1

E1

ES1

P1

S2

E2

ES2

P2

S3

E3

ES3

P3

... ...

... ...

... ...

... ...

k2


During the numerical

generation of a

trajectory, the value

of k2 does not change

after sampling.

S0

E0

ES0

P0

S1

E1

ES1

P1

S2

E2

ES2

P2

S3

E3

ES3

P3

... ...

... ...

... ...

... ...

k2 k2 k2k2


Sample uniformly

across all the

Intervals.

S0

E0

ES0

P0

S1

E1

ES1

P1

S2

E2

ES2

P2

S3

E3

ES3

P3

... ...

... ...

... ...

... ...

k2 k2 k2k2

Bayesian inference

S0

E0

ES0

P0

S1

E1

ES1

P1

S2

E2

ES2

P2

S3

E3

ES3

P3

... ...

... ...

... ...

... ...

k2 k2 k2k2

This is the Factored Frontier algorithm (Murphy & Weiss)

Approximate but fast:

Parameter Estimation

S0

E0

ES0

P0

S1

E1

ES1

P1

S2

E2

ES2

P2

S3

E3

ES3

P3

... ...

... ...

... ...

... ...

k2 k2 k2k2

For each choice of (jnterval) values for unknown parameters, present experimental data as evidence and assign a score using FF.

Return parameter estimates as maximal likelihoods.

Similar application of FF to do sensitivity analysis.

DBN based Analysis

Our experiments with signalling pathways models (taken from the biomodels data base) suggest:The one-time cost of constructing the DBN

can be easily recovered by using it to do parameter estimation and sensitivity analysis.

Segmentation Clock Network Governs the periodic formation

process of somites during embryogenesis

The underlying signaling network couples three oscillating pathways: FGF, Wnt and Notch.

22 ODEs; 75 unknown parameters

5 equal intervals; t = 5 min; 100 time points; 2.5 million trajectories; 3.1 hours on a 10 PC cluster.

Results

Global sensitivity analysis ODE based: 81 hours DBN based: 3 hours

LM: Levenberg–Marquardt

GA: Genetic Algorithm

SRES: Evolutionary Strategy

PSO: Particle Swarm Optimization

BDM: our method

40 unknown parameters;Synthetic data: 8 proteins, 10 time points (noise with 5% variance added)

ODE model (Brown et al. 2004) 32 species 48 parameters Features:

Good size Feedback loops

DBN construction Settings

5 intervals 1 min time-step, 100 minutes 3 x 106 samples

Runtime 4 hours on a cluster of 10 PCs

EGF-NGF Pathway

LM: Levenberg–Marquardt

GA: Genetic Algorithm

SRES: Evolutionary Strategy

PSO: Particle Swarm Optimization

BDM: our method

Parameter Estimation20 (out of 48) unknown parameters Synthetic data: 9 proteins, 10 time points

Sensitivity Analysis

Running time ODE based: 22 hours DBN based: 34 minutes

35

Complement System

Complement system is a critical part of the immune system

Ricklin et al. 2007

Classical pathwayLectin pathwayAmplified responseInhibition

37

Goals

Quantitatively understand the regulatory mechanisms of complement system How is the complement activity enhanced under inflammation

condition? How is the excessive response of the complement avoided?

The model: Classical pathway + the lectin pathway

enhancement

Inhibitory mechanisms C4BP

Complement System

ODE Model 42 Species 45 Reactions

Mass law Michaelis-Menten kinetics

92 Parameters (71 unknown) DBN Construction

Settings 6 intervals 100s time-step, 12600s 1.2 x 106 samples

Runtime 12 hours on a cluster of 20 PCss

39

Parameter estimation Training data

4 proteins, 7 time points, 4 conditions

* inflammation condition: pH 6.5 calcium 2 nM

40

Model Calibration (parameter estimation)

41

Model validation

Validated the model using previous published data (Zhang et al 2009)

42

Enhancement mechanism

The antimicrobial response is sensitive to the pH and calcium level

43

Model prediction: The regulatory effect of C4BP

C4BP maintains classical complement activation but delays the maximal response time.

But attenuates the lectin pathway activation.

Classical pathway Lectin pathway

44

The regulatory mechanism of C4BP The major inhibitory role of C4BP is to facilitate the decay of

C3 convertase

A

BD

C

A B

C D

Results

Both predictions concerning C4BP were experimentally verified.

“A Computational and Experimental Study of the Regulatory Mechanisms of the Complement System” Liu et.al: PLoS Comp.Bio7(2011)

BioModels database (303.Liu).

Current Collaborators

Immune signalling duringMultiple infections

DNA damage/response pathway Chromosome co-localizations and co-regulations

Ding Jeak Ling Marie-Veronique Clement G V Shivashankar

Current work

Parametrized version of FF Reduce errors by investing more

computational time. Probabilistic verification methods.

Monte Carlo, FF based, … Implementation on a GPU platform.

Conclusion

The DBN approximation method is useful and efficient.

This is all very well in practice but will it work in theory?

Our group:

David Hsu SucheePalaniappan

Liu Bing Abhinav Dubey Blaise GenestS. Akshay Benjamin Gyori

GireedharVenkatachalam

Wang JunjieP.S. Thiagarajan

probabilistic approximations of bio-pathways dynamics p.s. thiagarajan school of computing, national...

Documents

b c d e slide

c d ded c b d

b c d e c d ded c b

t x0 slide

t xt t0t0 t1t1 t2t2

biochemical network

solution flow t

odes dbn slide