Probabilistic Approximations of Bio-Pathways Dynamics
P.S. Thiagarajan, School of Computing, National University of Singapore
Joint Work with: Liu Bing, David Hsu
Bio-pathways
Intracellular: gene regulatory networks, metabolic networks, signalling pathways.
A Common Modeling Approach
View a bio-pathway as a network of bio-chemical reactions.
Model the network as a system of ODEs:
one for each molecular species; mass action, Michaelis-Menten, Hill, etc.
Study the ODE system to understand the dynamics of the bio-chemical network.
Major Hurdles
Many rate constants are not known and must be estimated from data that is noisy, of limited precision, and population-based.
High-dimensional system: closed-form solutions are impossible, so one must resort to numerical simulations.
Point values of initial states/data will not be available.
A large number of numerical simulations is needed for answering each question.
Our work:
Decompose, Analyze, Compose (ISMB’06, WABI’07)
Probabilistic models and methods to:
approximate the ODE dynamics (CMSB’09, TCS’11);
update models (RECOMB’10).
The “Opinion Poll” Idea
Start with a system of ODEs.
Discretize the time and value domains.
Assume a (uniform) distribution of initial states.
Generate a “sufficiently” large number of trajectories by sampling the initial states and running numerical simulations.
Encode this collection of trajectories as a dynamic Bayesian network (DBN).
ODEs → DBN
The “opinion poll” Idea
Pay the one-time cost of constructing the Bayesian network.
Amortize this cost: perform multiple analysis tasks using inference algorithms for DBNs.
Time Discretization
Observe the system only at discrete time points; in fact, only at finitely many time points
[Figure: x(t) observed at the time points t0, t1, t2, ..., tmax; example curve $x(t) = t^3 + 4t + x(0)$.]
Value Discretization
Observe only with bounded precision
[Figure: x(t) over t0, t1, t2, ..., tmax; the value axis is partitioned into the intervals A, B, C, D, E.]
Discrete Trajectories
A trajectory is now a sequence of discrete values, e.g. C D D E D C B.
[Figure: x(t) over t0, t1, t2, ..., tmax with value intervals A to E.]
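To make the two discretization steps concrete, here is a minimal sketch. It uses the example curve from the time-discretization slide, read here as x(t) = t^3 + 4t + x(0). The observation times, the value range [0, 16] and the use of five equal-width intervals labelled A to E are assumptions chosen for illustration.

```python
def discretize_value(x, lo, hi, labels="ABCDE"):
    """Map a real value to the label of the equal-width interval containing it."""
    k = len(labels)
    idx = int((x - lo) / (hi - lo) * k)
    # Clamp so that values at or beyond the range ends fall in the end intervals.
    return labels[min(max(idx, 0), k - 1)]


def example_curve(x0, times):
    """Example curve, read as x(t) = t^3 + 4t + x(0)."""
    return [t ** 3 + 4 * t + x0 for t in times]


# Time discretization: observe only at finitely many time points (assumed grid).
times = [0.0, 0.5, 1.0, 1.5, 2.0]
values = example_curve(x0=0.0, times=times)

# Value discretization: observe only which interval the value lies in.
symbols = [discretize_value(v, lo=0.0, hi=16.0) for v in values]
print(symbols)  # ['A', 'A', 'B', 'C', 'E']
```

The continuous trajectory is thus reduced to a short word over the alphabet {A, ..., E}, which is the kind of object the later counting steps operate on.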
Discretized Behavior
Assume a prior distribution of the initial states.
Behavior = (uncountably) many sequences of discrete values.
[Figure: several discretized trajectories of x(t) over t0, t1, t2, ..., tmax with value intervals A to E, e.g. C D D E D C B.]
Representation of Behavior
In fact, a probabilistic transition system.
Pr((D, 2) → (E, 3)) is the “fraction” of the trajectories residing in D at t = 2 that land in E at t = 3.
[Figure: discretized trajectories over t0, t1, t2, ..., tmax with value intervals A to E; transitions are labelled with probabilities 0.8 and 0.2.]
Mathematical Justification
The value space of the variables is assumed to be a compact subset $C$ of $\mathbb{R}^n$.
In $Z' = F(Z)$, $F$ is assumed to be differentiable everywhere in $C$ (mass law, Michaelis-Menten, …, for every rate constant $k$).
Then the solution (flow) $\Phi_t : C \to C$ (for each $t$) exists, is unique, and is a continuous bijection, hence measurable.
But the transition probabilities cannot be computed.
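One way to write down the “fraction of trajectories” precisely is sketched below. The notation is an assumption made for this note: $\Phi_t$ denotes the flow of the ODE system, $P$ is the prior probability measure on initial states in $C$, and a discrete state $s$ is identified with the set of points of $C$ whose coordinates lie in the corresponding intervals.

```latex
\Pr\bigl((s,i) \to (s',i+1)\bigr)
  \;=\;
  \frac{P\bigl(\{\, z_0 \in C : \Phi_{t_i}(z_0) \in s,\ \Phi_{t_{i+1}}(z_0) \in s' \,\}\bigr)}
       {P\bigl(\{\, z_0 \in C : \Phi_{t_i}(z_0) \in s \,\}\bigr)},
  \qquad \text{whenever the denominator is positive.}
```

Measurability of the flow is what makes both sets measurable, so the ratio is well defined; evaluating it would require the flow in closed form, which is why the probabilities are estimated by sampling instead.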
(s, i): states; (s, i) → (s’, i+1): transitions.
Sample the initial states, say, 1000 times.
Through numerical simulation, generate 1000 trajectories.
Pr((s, i) → (s’, i+1)) is the fraction of the trajectories in s at $t_i$ that land in s’ at $t_{i+1}$.
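The sampling-and-counting estimate can be sketched as follows. This is an illustration, not the authors' implementation: the toy ODE dx/dt = x(1 - x), the Euler integrator, the five equal-width intervals on [0, 1] and all function names are assumptions.

```python
import random
from collections import Counter, defaultdict


def simulate(x0, n_points, dt=0.5, substeps=50):
    """Euler integration of the toy ODE dx/dt = x(1 - x), recorded every dt."""
    xs, x = [x0], x0
    h = dt / substeps
    for _ in range(n_points - 1):
        for _ in range(substeps):
            x += h * x * (1.0 - x)
        xs.append(x)
    return xs


def level(x, k=5):
    """Index of the equal-width interval of [0, 1] that contains x."""
    return min(max(int(x * k), 0), k - 1)


def estimate_transitions(n_samples=1000, n_points=6, seed=0):
    """Estimate Pr((s, i) -> (s', i+1)) as a fraction of sampled trajectories."""
    rng = random.Random(seed)
    counts = defaultdict(Counter)  # (s, i) -> Counter over successor states s'
    for _ in range(n_samples):
        x0 = rng.uniform(0.0, 1.0)  # uniform prior over initial states
        traj = [level(x) for x in simulate(x0, n_points)]
        for i in range(n_points - 1):
            counts[(traj[i], i)][traj[i + 1]] += 1
    # Normalise the counts of each source state into fractions.
    return {
        src: {dst: c / sum(ctr.values()) for dst, c in ctr.items()}
        for src, ctr in counts.items()
    }


probs = estimate_transitions()
```

Each entry `probs[(s, i)][s2]` is the fraction of sampled trajectories in interval `s` at time point `i` that are in interval `s2` at time point `i + 1`.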
Computational Approximation
Infeasible Size!
But the transition system will be huge: $O(T \cdot k^n)$, where $k \ge 2$ and $n$ ($\approx$ 50-100) can be large.
Compact Representation
Exploit the network structure (additional independence assumptions) to construct a dynamic Bayesian network instead.
The DBN is a factored form of a probabilistic transition system (Markov chain).
The DBN Representation
[Figure: the DBN unrolled over time slices 0, 1, 2, 3, ...; each slice contains the nodes S, E, ES and P.]
Each node has a CPT associated with it.
This specifies the local (probabilistic) dynamics.
[Figure: the same DBN with value intervals A, B, C; the CPT of node S2 is shown, with parents S1, E1 and ES1.]
P(S2=C|S1=B,E1=C,ES1=B)= 0.2
P(S2=C|S1=B,E1=C,ES1=C)= 0.1
P(S2=A|S1=A,E1=A,ES1=C)= 0.05
. . .
Fill up the entries in the CPTs by sampling, simulations and counting
Computational Approximation
[Figure: counting for the CPT of S2 in the same DBN: of the 1000 sampled trajectories, 500 have S1=B, E1=C, ES1=B, and 100 of these have S2=C.]
The Technique
Fill up the entries in the CPTs by sampling, simulations and counting:
P(S2=C | S1=B, E1=C, ES1=B) = 100/500 = 0.2
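The counting step for a single CPT can be sketched as below for the enzyme kinetics example S + E <-> ES -> E + P. This is an illustration under stated assumptions: the rate constants, the value range [0, 10], the three intervals A, B, C, the Euler integrator and all names are hypothetical. Only the CPT of S is learnt, with parents S, E and ES from the previous time slice, as in the slide.

```python
import random
from collections import Counter, defaultdict

K1, K2, K3 = 0.1, 0.2, 0.2  # assumed rate constants for S + E <-> ES -> E + P


def step(state, dt, substeps=100):
    """Advance (S, E, ES, P) by dt using Euler steps of the mass-action ODEs."""
    s, e, es, p = state
    h = dt / substeps
    for _ in range(substeps):
        bind, unbind, cat = K1 * s * e, K2 * es, K3 * es
        s, e, es, p = (
            s + h * (unbind - bind),
            e + h * (unbind + cat - bind),
            es + h * (bind - unbind - cat),
            p + h * cat,
        )
    return (s, e, es, p)


def level(x, hi=10.0, labels="ABC"):
    """Label of the equal-width interval of [0, hi] that contains x."""
    k = len(labels)
    return labels[min(max(int(x / hi * k), 0), k - 1)]


def learn_cpt_for_S(n_samples=1000, n_slices=4, dt=1.0, seed=1):
    """Fill the CPTs of S by sampling, simulating and counting."""
    rng = random.Random(seed)
    counts = defaultdict(Counter)  # (slice, S, E, ES) -> Counter over next S
    for _ in range(n_samples):
        state = (rng.uniform(0, 10), rng.uniform(0, 10), 0.0, 0.0)
        for t in range(n_slices - 1):
            nxt = step(state, dt)
            parents = (t,) + tuple(level(v) for v in state[:3])
            counts[parents][level(nxt[0])] += 1
            state = nxt
    # Each CPT row is a count ratio, e.g. 100 / 500 = 0.2.
    return {
        par: {v: c / sum(ctr.values()) for v, c in ctr.items()}
        for par, ctr in counts.items()
    }


cpt = learn_cpt_for_S()
```

An entry such as `cpt[(1, "B", "C", "B")]["C"]` plays the role of P(S2=C | S1=B, E1=C, ES1=B); parent combinations that no sampled trajectory visits simply have no row in this sketch.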
The size of the DBN is $O(T \cdot n \cdot k^d)$, where $d$ is the maximum number of parents of a node.
$d$ will usually be much smaller than $n$.
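A quick calculation shows how large the gap between the two bounds is. The values T = 100, k = 5 and n = 32 are of the order reported later for the case studies; d = 4 is an assumed value for illustration only.

```python
def transition_system_size(T, k, n):
    """Bound for the flat probabilistic transition system: T * k^n."""
    return T * k ** n


def dbn_size(T, k, n, d):
    """Bound for the factored DBN representation: T * n * k^d."""
    return T * n * k ** d


T, k, n, d = 100, 5, 32, 4  # d = 4 is an assumption
flat = transition_system_size(T, k, n)
factored = dbn_size(T, k, n, d)
print(factored)              # 2000000
print(f"{flat / factored:.1e}")
```

With these numbers the factored bound is two million, while the flat bound has more than twenty digits.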
Unknown rate constants
[Figure: the DBN unrolled over time slices 0, 1, 2, 3, ... with nodes S, E, ES and P; the unknown rate constant k2 is added as an extra node.]
During the numerical generation of a trajectory, the value of k2 does not change after sampling.
[Figure: the same DBN with a k2 node in every time slice.]
Sample uniformly across all the intervals.
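One plausible reading of this sampling scheme is sketched below: pick one of the intervals of the unknown rate constant uniformly, draw a value uniformly inside it, and keep that value fixed for the whole trajectory. The range [0, 1], the five intervals and the function name are assumptions.

```python
import random


def sample_rate_constant(lo, hi, n_intervals, rng):
    """Pick an interval uniformly, then a value uniformly inside that interval."""
    width = (hi - lo) / n_intervals
    j = rng.randrange(n_intervals)
    value = rng.uniform(lo + j * width, lo + (j + 1) * width)
    return j, value


rng = random.Random(0)
j, k2 = sample_rate_constant(0.0, 1.0, 5, rng)  # assumed range and interval count

# k2 is sampled once and then held constant, so the k2 node carries the same
# interval index in every time slice of the trajectory.
n_slices = 4
k2_nodes = [j] * n_slices
```

Because the interval index never changes along a trajectory, the k2 node in one slice is simply a copy of the k2 node in the previous slice.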
Bayesian inference
Approximate but fast: the Factored Frontier (FF) algorithm (Murphy & Weiss).
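The core update of Factored Frontier can be sketched as follows: the marginal of each node in the next slice is computed from the marginals of its parents, which are treated as independent. The data layout, the names and the two-node example are assumptions for illustration; evidence handling is omitted and, for brevity, the example reuses one CPT for every slice.

```python
from itertools import product


def ff_step(marginals, parents, cpts, values):
    """One Factored Frontier update: next-slice marginals from current ones.

    marginals: {node: {value: prob}} for the current slice
    parents:   {node: [parent nodes in the current slice]}
    cpts:      {node: {tuple of parent values: {value: prob}}}
    """
    nxt = {}
    for node, pars in parents.items():
        dist = {v: 0.0 for v in values}
        for combo in product(values, repeat=len(pars)):
            # Independence approximation: multiply the parents' marginals.
            weight = 1.0
            for p, u in zip(pars, combo):
                weight *= marginals[p][u]
            for v, pr in cpts[node][combo].items():
                dist[v] += weight * pr
        nxt[node] = dist
    return nxt


values = ("L", "H")
parents = {"X": ["X"], "Y": ["X", "Y"]}
cpts = {
    "X": {("L",): {"L": 0.9, "H": 0.1}, ("H",): {"L": 0.2, "H": 0.8}},
    "Y": {
        ("L", "L"): {"L": 1.0, "H": 0.0},
        ("L", "H"): {"L": 0.5, "H": 0.5},
        ("H", "L"): {"L": 0.5, "H": 0.5},
        ("H", "H"): {"L": 0.0, "H": 1.0},
    },
}
m0 = {"X": {"L": 0.5, "H": 0.5}, "Y": {"L": 0.5, "H": 0.5}}
m1 = ff_step(m0, parents, cpts, values)
```

The cost of one update is proportional to the total number of CPT entries, which is why the method is fast; the error comes from ignoring correlations between the parents.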
Parameter Estimation
For each choice of (interval) values for the unknown parameters, present the experimental data as evidence and assign a score using FF.
Return the choice with maximal likelihood as the parameter estimate.
A similar application of FF is used for sensitivity analysis.
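The selection step can be sketched as below. This is a hypothetical scoring scheme, not necessarily the one used in the work: it assumes that, for each candidate choice of parameter intervals, the marginal distributions of the observed species at the observed time points have already been computed, and it scores a candidate by the log-probability assigned to the discretized data. All names and numbers are invented for illustration.

```python
import math


def log_score(predicted, observed, eps=1e-9):
    """Sum of log-probabilities the model assigns to the observed levels.

    predicted: {(species, time): {level: prob}}
    observed:  {(species, time): level}
    """
    return sum(
        math.log(predicted[key].get(lvl, 0.0) + eps)
        for key, lvl in observed.items()
    )


def best_candidate(candidates, observed):
    """Return the name of the candidate with the highest score."""
    return max(candidates, key=lambda name: log_score(candidates[name], observed))


# Hypothetical predicted marginals for two interval choices of k2.
candidates = {
    "k2 in interval 0": {("P", 1): {"A": 0.7, "B": 0.3}, ("P", 2): {"A": 0.4, "B": 0.6}},
    "k2 in interval 1": {("P", 1): {"A": 0.2, "B": 0.8}, ("P", 2): {"A": 0.1, "B": 0.9}},
}
observed = {("P", 1): "B", ("P", 2): "B"}
best = best_candidate(candidates, observed)
print(best)  # k2 in interval 1
```

The small constant `eps` only guards against taking the logarithm of zero when a candidate assigns no probability to an observed level.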
DBN based Analysis
Our experiments with signalling pathway models (taken from the BioModels database) suggest:
the one-time cost of constructing the DBN can be easily recovered by using it to do parameter estimation and sensitivity analysis.
Segmentation Clock Network
Governs the periodic formation of somites during embryogenesis.
The underlying signalling network couples three oscillating pathways: FGF, Wnt and Notch.
22 ODEs; 75 unknown parameters.
5 equal intervals; Δt = 5 min; 100 time points; 2.5 million trajectories; 3.1 hours on a 10-PC cluster.
Results
Global sensitivity analysis: ODE based, 81 hours; DBN based, 3 hours.
LM: Levenberg–Marquardt
GA: Genetic Algorithm
SRES: Evolutionary Strategy
PSO: Particle Swarm Optimization
BDM: our method
40 unknown parameters. Synthetic data: 8 proteins, 10 time points (noise with 5% variance added).
EGF-NGF Pathway
ODE model (Brown et al. 2004): 32 species, 48 parameters.
Features: good size; feedback loops.
DBN construction
Settings: 5 intervals; 1 min time-step, 100 minutes; $3 \times 10^6$ samples.
Runtime: 4 hours on a cluster of 10 PCs.
Parameter estimation: 20 (out of 48) unknown parameters. Synthetic data: 9 proteins, 10 time points.
[Figure: comparison of LM, GA, SRES, PSO and BDM (our method).]
Sensitivity Analysis
Running time: ODE based, 22 hours; DBN based, 34 minutes.
Complement System
The complement system is a critical part of the immune system.
[Figure (Ricklin et al. 2007): classical pathway, lectin pathway, amplified response, inhibition.]
Goals
Quantitatively understand the regulatory mechanisms of the complement system:
How is the complement activity enhanced under inflammation conditions?
How is an excessive response of the complement avoided?
The model: classical pathway + the lectin pathway; enhancement; inhibitory mechanisms (C4BP).
Complement System
ODE Model
42 species, 45 reactions.
Mass law, Michaelis-Menten kinetics.
92 parameters (71 unknown).
DBN Construction
Settings: 6 intervals; 100 s time-step, 12600 s; $1.2 \times 10^6$ samples.
Runtime: 12 hours on a cluster of 20 PCs.
Parameter estimation
Training data: 4 proteins, 7 time points, 4 conditions.
* Inflammation condition: pH 6.5, calcium 2 nM.
Model Calibration (parameter estimation)
Model validation
Validated the model using previously published data (Zhang et al. 2009).
Enhancement mechanism
The antimicrobial response is sensitive to the pH and calcium level
Model prediction: The regulatory effect of C4BP
C4BP maintains classical complement activation but delays the maximal response time, whereas it attenuates the lectin pathway activation.
[Figure panels: classical pathway; lectin pathway.]
The regulatory mechanism of C4BP
The major inhibitory role of C4BP is to facilitate the decay of C3 convertase.
[Figure: panels A, B, C, D.]
Results
Both predictions concerning C4BP were experimentally verified.
“A Computational and Experimental Study of the Regulatory Mechanisms of the Complement System”, Liu et al., PLoS Computational Biology 7 (2011).
BioModels database (303.Liu).
Current Collaborators
Immune signalling during multiple infections
DNA damage/response pathway
Chromosome co-localizations and co-regulations
Ding Jeak Ling, Marie-Veronique Clement, G V Shivashankar
Current work
Parametrized version of FF: reduce errors by investing more computational time.
Probabilistic verification methods: Monte Carlo, FF based, …
Implementation on a GPU platform.
Conclusion
The DBN approximation method is useful and efficient.
This is all very well in practice but will it work in theory?
Our group:
David Hsu, Suchee Palaniappan, Liu Bing, Abhinav Dubey, Blaise Genest, S. Akshay, Benjamin Gyori, Gireedhar Venkatachalam, Wang Junjie, P.S. Thiagarajan