Analysis and Modeling of AfCS Data
What do we need and how do we get it?
What do we need?
Statistical Data Analysis
Mechanistic Modeling
Thinkin’ about it.
Paul
Rama Madhu
Michal Gil Lily Madhu
The fabled sweet spot.
Some basic vocabulary
• Input: an experimental condition and/or treatment.
– Ligands
– siRNA
– Toxins
• Observables
– Calcium concentration
– RNA expression
– Chemotactic index
• Features
– Peak calcium concentration
– Basal calcium concentration
Statistical Data Analysis
• Did an observable change significantly during a treatment?
• What features of an observable change significantly?
• What groups of observables change in a correlated fashion across a number of treatments/measurements?
• Which inputs and observables are statistical prerequisites for other observables?
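The first question above (did a feature change significantly during a treatment?) can be sketched as a permutation test. This is only a minimal illustration under stated assumptions, not the Alliance's analysis pipeline: the function name `permutation_p_value` and the peak-calcium numbers are made up for the example.

```python
import random
from statistics import mean

def permutation_p_value(control, treated, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference of group means."""
    rng = random.Random(seed)
    observed = abs(mean(treated) - mean(control))
    pooled = list(control) + list(treated)
    n_control = len(control)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling of control vs. treated
        diff = abs(mean(pooled[n_control:]) - mean(pooled[:n_control]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_perm + 1)

# Hypothetical peak calcium values (arbitrary units), not AfCS data.
control_peaks = [100, 105, 98, 102, 101, 99]
treated_peaks = [150, 160, 155, 148, 152, 158]
p_value = permutation_p_value(control_peaks, treated_peaks)
```

A small `p_value` suggests the feature changed under treatment; the same function could be applied feature by feature, with a multiple-testing correction if many features are screened.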
Mechanistic Models
• “Mechanistic” Models are a series of assertions about the causal structure and dynamics of a system.
• Examples
– Logical: A is necessary for B to occur
– Statistical: P(A,B) != P(A)*P(B)
– Deterministic: dB/dt = k1*A – k2*B
– Stochastic: dP(X,t)/dt = Σ_X' [ Wx’x(X’,t) P(X’,t) – Wxx’(X,t) P(X,t) ]
(I won’t talk about spatial today.)
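The deterministic and stochastic pictures can be compared on the same scheme, dB/dt = k1*A – k2*B. The sketch below assumes A is held constant and treats the stochastic version as a birth-death process (production rate k1*A, removal rate k2*B) simulated with the Gillespie algorithm. The parameter values and function names are illustrative assumptions, not values from the slides.

```python
import math
import random

def deterministic_B(t, A=10.0, k1=1.0, k2=0.5, B0=0.0):
    """Closed-form solution of dB/dt = k1*A - k2*B for constant A."""
    B_ss = k1 * A / k2
    return B_ss + (B0 - B_ss) * math.exp(-k2 * t)

def gillespie_B(t_end, A=10.0, k1=1.0, k2=0.5, B0=0, seed=0):
    """One stochastic trajectory of the same scheme; returns B at t_end."""
    rng = random.Random(seed)
    t, B = 0.0, B0
    while True:
        birth, death = k1 * A, k2 * B
        total = birth + death
        t += rng.expovariate(total)      # waiting time to the next event
        if t > t_end:
            return B
        if rng.random() * total < birth:
            B += 1                        # one molecule of B produced
        else:
            B -= 1                        # one molecule of B removed

# Average over many stochastic runs and compare with the deterministic value.
mean_B = sum(gillespie_B(20.0, seed=s) for s in range(500)) / 500
det_B = deterministic_B(20.0)
```

For this linear scheme the ensemble mean tracks the deterministic solution; the stochastic description adds the spread around it, which is what matters when single cells differ.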
What do we need?
• Most Alliance data is geared to statistical data analysis
• For modeling we get:
– What is there? (e.g. Ryanodine receptors and IP3 receptors?) (Transcription)
– Input/Output relations between ligands/siRNA and “outputs” (protein phosphorylation, Ca2+ traces as a function of ligand and knock down)
– Measures of uncertainty in these relations.
We have an unprecedented set of quality-controlled data to get started.
Capturing Mechanistic Knowledge
• The FXM is using PathwayBuilder to capture “Mechanistic” knowledge
– But there are still necessary and useful abstractions that are used
• We NEED highly-curated, biochemically reasonable pathways
– Molecule pages are making great strides
• We MUST annotate model uncertainties
– Confidences in the existence of an interaction
– Confidences in the type of mechanism
• We MUST annotate (relative) parameter value ranges
Pathway Builder Representation
• Initially very abstract
• But every “process” may be assigned a model
• Right now
– different pathways must be made for each mechanistic hypothesis.
• But shortly we will be able to encode parameter uncertainty.
Abstract concepts can be modeled
Levels of Abstraction
• In the current AfCS release of PathwayBuilder the “Futile Cycle” is just graphical.
• But it is possible to assign an abstract model to that box to encode a phenomenological model of all the interactions within.
FXM Map
[Figure: FXM map; labeled elements: cytosolic calcium, C5a, UDP]
Paring down models
• Current calcium models do NOT capture the path between receptors and calcium.
• But they do set the fundamental “response circuit” for calcium dynamics: given an initial change in IP3 and DAG, what calcium dynamics do you expect for a given expression of different channels and of calcium and IP3 receptors?
• So what ARE all the paths from receptors to effects on the calcium transients?
• How do they regulate each other and interact?
Current calcium measurements
• Current calcium models don’t explain different ligand response or variabilities.
[Figure: calcium traces for the C5a response and the UDP response, annotated with candidate features: peak height, peak width, final calcium, upslope, downslope, downslope variability?]
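The features named on this slide can be computed from a sampled trace along the following lines. This is a sketch under assumptions: the definitions (half-height width, steepest finite-difference slopes, averaging windows `n_base` and `n_final`) are one reasonable choice rather than the Alliance's definitions, and the trace is synthetic.

```python
def trace_features(t, ca, n_base=5, n_final=5):
    """Extract simple features from one calcium trace (lists of equal length)."""
    basal = sum(ca[:n_base]) / n_base
    final = sum(ca[-n_final:]) / n_final
    i_peak = max(range(len(ca)), key=lambda i: ca[i])
    peak_height = ca[i_peak] - basal

    # Peak width: contiguous stretch around the peak at or above half height.
    half = basal + 0.5 * peak_height
    left = right = i_peak
    while left > 0 and ca[left - 1] >= half:
        left -= 1
    while right < len(ca) - 1 and ca[right + 1] >= half:
        right += 1

    slopes = [(ca[i + 1] - ca[i]) / (t[i + 1] - t[i]) for i in range(len(ca) - 1)]
    return {
        "basal": basal,
        "final": final,
        "peak_height": peak_height,
        "peak_width": t[right] - t[left],
        "upslope": max(slopes[:i_peak], default=0.0),    # steepest rise before the peak
        "downslope": min(slopes[i_peak:], default=0.0),  # steepest fall after the peak
    }

def synthetic_ca(ti):
    """Piecewise-linear toy trace: flat, linear rise, linear decay, new plateau."""
    if ti < 5:
        return 50.0
    if ti <= 10:
        return 50.0 + 20.0 * (ti - 5)
    if ti <= 18:
        return 150.0 - 10.0 * (ti - 10)
    return 70.0

t = list(range(25))
ca = [synthetic_ca(ti) for ti in t]
features = trace_features(t, ca)
```

Running the same extraction over repeated traces would give a mean and variance for each feature, which is the form the later consistency analysis needs.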
Single Cell Data
Q: oscillation: C5a (high dose); UDP (low dose)
Stolen from Lily Jiang
New Feature
Fraction of cells with each CLASS of response
Particularities of each response
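The "fraction of cells with each class of response" feature could be computed as sketched below. The classification rule (counting upward threshold crossings) and the class labels are hypothetical placeholders, since a consistent way of classifying single-cell responses is still an open item.

```python
from collections import Counter

def count_peaks(trace, threshold):
    """Count upward crossings of the threshold in one single-cell trace."""
    return sum(1 for a, b in zip(trace, trace[1:]) if a < threshold <= b)

def classify(trace, threshold):
    """Hypothetical three-class rule based on the number of peaks."""
    n = count_peaks(trace, threshold)
    if n == 0:
        return "no response"
    if n == 1:
        return "single transient"
    return "oscillatory"

def class_fractions(traces, threshold):
    """Fraction of cells falling in each response class."""
    counts = Counter(classify(tr, threshold) for tr in traces)
    return {label: n / len(traces) for label, n in counts.items()}

# Four made-up single-cell traces (arbitrary units).
cells = [
    [0, 0, 0, 0],
    [0, 5, 0, 0],
    [0, 5, 0, 5, 0],
    [0, 5, 0, 5, 0, 5],
]
fractions = class_fractions(cells, threshold=3)
```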
Modeling Issues
How do we explain data variability with models?
Mathematical representation of models
Model explanations
dX/dt = f(X, p, w)
The time dependent behavior of X depends on:
– Initial conditions of X
– Exact values of p
– The nature of the uncertainty, w
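E.g. [Equation: two coupled rate equations, dx/dt built from the terms k0, k1*y, and k2*x, and dy/dt built from the terms k1*y, k2*x, and k3*y; the rate constants k0 to k3 make up p]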
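The three dependencies listed for dX/dt = f(X, p, w) can be made concrete with a small Euler-Maruyama integrator. This is a sketch under assumptions: w is taken to be additive white noise of strength `sigma`, and the right-hand side `linear_decay` with parameters `k_in` and `k_out` is a hypothetical stand-in, not the example system from the slide.

```python
import math
import random

def simulate(f, x0, p, sigma, t_end=10.0, dt=0.01, seed=0):
    """Integrate dX/dt = f(X, p) + noise with the Euler-Maruyama scheme."""
    rng = random.Random(seed)
    x = x0
    for _ in range(int(round(t_end / dt))):
        x += f(x, p) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x

def linear_decay(x, p):
    """Hypothetical right-hand side: constant production, first-order loss."""
    return p["k_in"] - p["k_out"] * x

p_nominal = {"k_in": 2.0, "k_out": 1.0}

# Same model, different sources of variability in the outcome:
x_nominal = simulate(linear_decay, x0=0.0, p=p_nominal, sigma=0.0)
x_other_p = simulate(linear_decay, x0=0.0, p={"k_in": 3.0, "k_out": 1.0}, sigma=0.0)
x_noisy_a = simulate(linear_decay, x0=0.0, p=p_nominal, sigma=0.5, seed=1)
x_noisy_b = simulate(linear_decay, x0=0.0, p=p_nominal, sigma=0.5, seed=2)
```

With `sigma=0.0` the result is fully determined by the initial condition and p; with noise, two runs of the identical model and parameters end in different places.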
Bistability: Parameter dependence
A simple model of the positive feedback
[Figure: stationary state [FAK-I] versus kc, showing monostable, weakly bistable, and irreversibly bistable regimes; kc = 1.6 is marked. Reaction scheme labels: A, A-p, B-p.]
kc – catalytic constant for the trans-autophosphorylation.
Exogenous noise
kc=1.6
Endogenous Noise
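[Figure: stationary state X_ss versus E on log-log axes (E from 0.3 to 2, X_ss from 0.005 to 1), with curves labeled det, p = 0, p = ½, and p = 1, and with E_0, E_½, and E_1 marked. The slide also carries the noise-driven rate equation for X* and its stationary-state condition in terms of f(E), k, and K.]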
Dynamical Noise Effects
[Figure: X* and E trajectories (axes labeled N, E, X*), shown with tiny noise on E+ and without noise on E+]
We are NOT talking about space
• Though we could…
Model Sensitivity and Features
Can be used to place bounds on parameter values
Model Building and (In)Validation with Data Collaboration
Matt Onsum, Ryan Feeley, Michael Frenklach & Andrew Packard
Given a set of mechanistic models, we can determine which model is the most consistent with data.
Set of Models
Parameter Uncertainty
Data
Check for Consistency
1. Consistent Models
2. Invalidated models
3. Information on constrained data/parameters
Model Invalidation
An experiment consists of:
– Measured observable, D
– Features of the data
– Experimental tolerance in measuring the observable, e
– Mathematical model, M(ρ), showing dependency on active variables ρ ∈ R^n
– A set of acceptable values for ρ. Since each parameter of the model has uncertainty, there exists a hypercube, H, of possible values for ρ
The experiment actually asserts an inequality constraint among the active variables:
|M(ρ) - D| < e, ρ ∈ H
Therefore we set up the following constrained optimization:
Subject to
Model/Data Consistency
Much can be accomplished in this optimization framework
• Check the consistency of the assertions. Does there exist a ρ satisfying all of the assertions?
– Invalidate proposed mechanisms
– Quick tests to indicate likely sources of inconsistency
– Subsets of the assertions may be readily considered
• The (deterministic) experimental uncertainties are directly transferred into prediction uncertainties.
• Generate a “best fit” parameter (more on this in the next slide)
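The consistency question (is there a parameter vector in the hypercube that satisfies every assertion |M - D| < e at once?) can be illustrated with a brute-force grid search. This is a crude stand-in for the constrained optimization used in the actual method, and the two toy models `model_sum` and `model_diff` and all numbers are assumptions made up for the example.

```python
from itertools import product

def find_consistent_point(assertions, n_dim, n_pts=21):
    """Search the hypercube [-1, 1]^n_dim on a grid.

    assertions is a list of (M, D, e) triples; returns the first grid point
    rho with |M(rho) - D| < e for every triple, or None if none is found.
    """
    axis = [2 * i / (n_pts - 1) - 1 for i in range(n_pts)]
    for rho in product(axis, repeat=n_dim):
        if all(abs(M(rho) - D) < e for M, D, e in assertions):
            return rho
    return None

def model_sum(rho):
    return rho[0] + rho[1]

def model_diff(rho):
    return rho[0] - rho[1]

# Two experiments that can both be satisfied, and a pair that cannot.
consistent = [(model_sum, 0.5, 0.1), (model_diff, 0.5, 0.1)]
inconsistent = [(model_sum, 0.5, 0.1), (model_sum, -1.5, 0.1)]

witness = find_consistent_point(consistent, n_dim=2)
no_witness = find_consistent_point(inconsistent, n_dim=2)
```

A returned point is a witness that the model and data are consistent. A `None` from a grid search only suggests inconsistency; certifying invalidation requires bounds from a proper optimization, which is what the framework on these slides provides.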
Typical Data Processing
Given:
– A priori knowledge: -1 ≤ ρ_k ≤ 1 for k = 1, ..., n.
– An experiment: (M(ρ), D, e) with ρ ∈ R^n
From this, all that can be concluded is |M(ρ) - D| < e.
But, typically the procedure is:
– Freeze all parameters except one, at the nominal: ρ_k = 0 for k ≠ k0
– Find range of the investigated (unfrozen) parameter:
max/min ρ_k0
subject to: ρ_k = 0 for k ≠ k0
-1 ≤ ρ_k0 ≤ 1
|M(ρ) - D| < e
The reported range is a subset of what can actually be inferred from (M(ρ), D, e), but the implied higher dimensional cube (the new, in-literature feasible set) neither contains, nor is a subset of the feasible parameter set.
[Figure: the feasible set |M(ρ) - D| < e inside the square {ρ : -1 ≤ ρ1 ≤ 1, -1 ≤ ρ2 ≤ 1}]
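The closing claim of this slide can be checked on a toy case. The sketch below assumes a hypothetical two-parameter model M = rho1 + rho2 with D = 0 and e = 0.5, finds each parameter's range with the other frozen at its nominal value of 0, and then compares the implied cube with the true feasible set on a grid.

```python
from itertools import product

D, e = 0.0, 0.5
axis = [i / 20 - 1 for i in range(41)]   # grid on [-1, 1] in steps of 0.05

def M(rho):
    """Hypothetical toy model: the observable is the sum of two parameters."""
    return rho[0] + rho[1]

def feasible(rho):
    return abs(M(rho) - D) < e

def one_at_a_time_range(k, n_dim=2):
    """Range of parameter k with every other parameter frozen at nominal 0."""
    values = []
    for v in axis:
        rho = [0.0] * n_dim
        rho[k] = v
        if feasible(rho):
            values.append(v)
    return min(values), max(values)

ranges = [one_at_a_time_range(k) for k in range(2)]

def in_cube(rho):
    """Membership in the cube implied by the separately reported ranges."""
    return all(lo <= r <= hi for r, (lo, hi) in zip(rho, ranges))

grid = list(product(axis, repeat=2))
cube_is_subset = all(feasible(rho) for rho in grid if in_cube(rho))
cube_contains_all = all(in_cube(rho) for rho in grid if feasible(rho))
```

Both flags come out false here: the cube's corner has rho1 + rho2 near 0.9, which violates the experiment, while the feasible point (1, -1) lies outside the cube. The true feasible set is a diagonal band that one-at-a-time ranges cannot represent.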
Mistakes in Isolation
[Figure: two panels, E66 and E67, each plotted over parameters A and C on axes from -1 to 1]
Related work: Consistency of Methane Combustion Database
• GRI-Mech has 300+ elementary reactions, 53 species, and 102 “active” parameters
• The community needed a database containing all relevant experimental info for methane combustion to determine the “right” kinetic parameters
• It was realized that the best-fit parameter values did not give the combustion model good predictive power
• Needed a way to better incorporate uncertainty about the parameter values
Pathway diagram for methane combustion [Turns]
Michael Frenklach, Andrew Packard, Pete Seiler and Ryan Feeley, “Collaborative data processing in developing predictive models of complex reaction systems,” International Journal of Chemical Kinetics, vol. 36, issue 1, pp. 57-66, 2004.
Michael Frenklach, Andy Packard and Pete Seiler, “Prediction uncertainty from models and data,” 2002 American Control Conference, pp. 4135-4140, Anchorage, Alaska, May 8-10, 2002.
Sensitivity of Data Set Consistency to Assertions
[Figure: four panels. Top: sensitivity of consistency to the lower and upper parameter bounds versus parameter number (about 100 parameters; vertical scale 0 to 3 x 10^-3). Bottom: sensitivity to the lower (l) and upper (u) experimental bounds versus dataset unit number, i.e. feature number (about 70 units; vertical scale 0 to 0.4).]
Modeling goal for the AfCS data
• Distinguish and rank competing mechanistic models
• Propose experiments that will further distinguish competing models
• Identify structural problems with current models
• In the examples that follow pathway 1 will be the “true model”
• Data was generated by simulating the true model with random initial conditions. Each initial condition was assumed to be Gaussian with the nominal mean and a variance of 0.01. This was run 1000 times, and the resulting trajectories were averaged to give the trace data.
• We then try to find the maximum parameter uncertainty that still allows us to identify the true pathway.
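The data-generation step described above can be sketched as follows. The "true model" of the toy examples is not specified on the slide, so the two-species chain A -> B -> (degraded), its rate constants, and the nominal initial conditions are hypothetical. Only the sampling scheme (Gaussian initial conditions with variance 0.01, 1000 runs, averaged) comes from the text.

```python
import math
import random

def simulate_pathway(a0, b0, k1=1.0, k2=0.5, t_end=10.0, dt=0.05):
    """Forward-Euler trace of B for a hypothetical chain A -> B -> (degraded)."""
    a, b = a0, b0
    trace = [b]
    for _ in range(int(round(t_end / dt))):
        a, b = a + dt * (-k1 * a), b + dt * (k1 * a - k2 * b)
        trace.append(b)
    return trace

def averaged_trace(n_runs=1000, mean_ic=(1.0, 0.0), var=0.01, seed=0):
    """Average the B trace over runs with Gaussian initial conditions."""
    rng = random.Random(seed)
    sd = math.sqrt(var)
    total = None
    for _ in range(n_runs):
        a0 = rng.gauss(mean_ic[0], sd)
        b0 = rng.gauss(mean_ic[1], sd)   # note: may be slightly negative
        trace = simulate_pathway(a0, b0)
        total = trace if total is None else [s + x for s, x in zip(total, trace)]
    return [s / n_runs for s in total]

trace_data = averaged_trace()
nominal_trace = simulate_pathway(1.0, 0.0)
peak_value = max(trace_data)   # one of the features used in the examples
```

Because this toy chain is linear, the averaged trace stays close to the nominal one. For a nonlinear pathway the two can differ, which is one reason to keep the spread of the individual runs as well as their mean.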
Distinguishing similar pathways
Example 1: Test for degenerate solutions
Features:
4- Peak value
Example 2: Test for complex formation
Feature:
Example 3: Test for reversibility
Features:
1- Rise Time (i.c. 1,1)
2- Peak value
3- Rise Time (i.c. 2,2)
5- Rise Time after pretreatment
Example 4: Test for missing intermediate
Summary of Toy Examples
1. For large parameter and experimental uncertainty, multiple models can fit the same data.
2. Some experiments provide tighter constraints than others.
3. Repeats of these experiments (reduction of uncertainty) improve our ability to distinguish similar pathways.
Initial data is dose response
[Figure: dose-response traces for C5a at 1 uM, 0.5, 0.25, 0.1, and 0.05; vertical axis 0 to 60, horizontal axis 1 to 155]
Can we use legacy models to explain AfCS data?
We began with the model by Goldbeter
[Figure: sensitivity of steady-state Ca2+ and peak output to the upper and lower parameter bounds; labeled parameters: formation of active G-protein, inactivation of G-protein]
Weisner model fits a single response
Simulations of base model show two sensitive parameters.
However, we could not fit the model to the dose response data.
The model was not able to reproduce the change in steady state values
[Figure: sensitivity of the steady-state calcium level to the lower and upper bounds of: passive ER Ca2+ leak, Km ion pump, Vmax ion exchanger, Vmax Ca2+ ATPase pump]
Conclusions
• Method for model validation
• Showed that it can distinguish between canonical pathways even with high uncertainty
• We have begun invalidating literature models
So what do we need? (Experiment)
• A number of well-chosen knock-downs upstream of calcium and in “independent” parts of the different receptor pathways. (Accurate assessment of loss of function, induction)
• Ways of separating exogenous variability from endogenous variability from measurement noise.
• Measurement of the dose-response of intermediates (not just calcium) for the single FXM ligands and
• Determination of a set of physiologically relevant and significantly affected features.
• Similarly for double-ligand responses.
• Single cell assays should be expanded!
So what do we need? (Analysis)
• Identification of important features in the data that we wish to explain.
• Determination of value and variance of significantly changing features.
• Figure out a consistent way to classify single cell responses.
So what do we need? (Modeling)
• Biochemists and geneticists editing the maps and making hypotheses
– Perhaps we should have a model hypothesis page as an addendum to Henry’s?
• An initial “frozen” data set to be the test bed for all initial modeling discussions.
• An initial “frozen” analysis thereof
• A series of “minimal” pathways derived from the FXM maps that are believed to be the significant determinants of our output signals.
• Choice of mathematical picture and inference about the significance of the single cell responses.
• A direct way of driving experiments from models.
Acknowledgements
• Matt Onsum
• Ryan Feeley
• Andrew Packard
• Michael Frenklach
• Michael Samoilov
• Alex Gilman
Matt Andy
Mike
The Alliance and especially the FXM
Lily Jiang
Madhu Natarajan
Gil Sambrano