François Fages FJCP 2005

Temporal Logic Constraints in the Biochemical Abstract Machine BIOCHAM

François Fages, Project-team: Contraintes,

INRIA Rocquencourt, France

http://contraintes.inria.fr/

Joint work with: Nathalie Chabrier-Rivier, Sylvain Soliman, Laurence Calzone

2002-2004: ARC CPBIO “Process Calculi and Biology of Molecular Networks”

A. Bockmayr, LORIA, V. Danos, CNRS PPS, V. Schächter, Genoscope Evry

Systems Biology?

• Multidisciplinary field aiming to overcome the complexity walls in order to reason about biological processes at the system level.

• Virtual cell: emulate high-level biological processes in terms of their biochemical basis at the molecular level (in silico experiments)

• Bioinformatics: end of the 90's, from genomic sequences to post-genomic data (RNA expression, protein synthesis, protein-protein interactions, …)

• Need for a strong effort on:

- the formal representation of biological processes,

- formal tools for modeling and reasoning about their global behavior.

Language Approach to Cell Systems Biology

Qualitative models: from diagrammatic notation to

• Boolean networks [Thomas 73]

• Petri Nets [Reddy 93]

• Milner’s π-calculus [Regev-Silverman-Shapiro 99-01, Nagasaki et al. 00]

• Bio-ambients [Regev-Panina-Silverman-Cardelli-Shapiro 03]

• Pathway logic [Eker-Knapp-Laderoute-Lincoln-Meseguer-Sonmez 02]

• Transition systems [Chabrier-Chiaverini-Danos-Fages-Schachter 04]

Biochemical abstract machine BIOCHAM-1 [Chabrier-Fages 03]

Quantitative models: from differential equation systems to

• Hybrid Petri nets [Hofestadt-Thelen 98, Matsuno et al. 00]

• Hybrid automata [Alur et al. 01, Ghosh-Tomlin 01]

• Hybrid concurrent constraint languages [Bockmayr-Courtois 01]

• Rules with continuous dynamics BIOCHAM-2 [Chabrier-Fages-Soliman 04]

Outline of the Presentation

1. Introduction

2. Biocham Rule Language for Modeling Biochemical Systems
   1. Syntax of objects and reactions
   2. Semantics at 3 abstraction levels: Boolean, Concentrations, Populations

3. Biocham Temporal Logic for Formalizing Biological Properties
   1. CTL for Boolean semantics
   2. Constraint LTL for Concentration semantics

4. Learning Rules and Parameters from Temporal Properties
   1. Learning reaction rules from CTL specification
   2. Learning kinetic parameter values from Constraint-LTL specification

5. Conclusion and collaborations

2. Modeling Biochemical Systems

Small molecules: covalent bonds (outer electrons shared) 50-200 kcal/mol

• 70% water

• 1% ions

• 6% amino acids (20), nucleotides (5), fats, sugars, ATP, ADP, …

Macromolecules: hydrogen bonds, ionic, hydrophobic, van der Waals 1-5 kcal/mol

Stability and bindings determined by the number of weak bonds: 3D shape

• 20% proteins (50-10^4 amino acids)

• RNA (10^2-10^4 nucleotides AGCU)

• DNA (10^2-10^6 nucleotides AGCT)

Formal Proteins

Cyclin dependent kinase 1: Cdk1 (free, inactive)

Complex Cdk1-Cyclin B: Cdk1-CycB (low activity)

Phosphorylated form at site threonine 161: Cdk1~{thr161}-CycB (high activity), also called Mitosis Promotion Factor MPF

BIOCHAM Syntax of Objects

E == compound | E-E | E~{p1,…,pn}

Compound: molecule, #gene binding site, abstract @process…

- : binding operator for protein complexes, gene binding sites, …

Associative and commutative.

~{…}: modification operator for phosphorylated sites, …

Set of modified sites (Associative, Commutative, Idempotent).

O == E | E::location

Location: symbolic compartment (nucleus, cytoplasm, membrane, …)

S == _ | O+S

+ : solution operator (Associative, Commutative, Neutral _)

Seven Main Rule Schemas

Complexation: A + B => A-B

Decomplexation: A-B => A + B

    cdk1+cycB => cdk1-cycB

Phosphorylation: A =[C]=> A~{p}

Dephosphorylation: A~{p} =[C]=> A

    Cdk1-CycB =[Myt1]=> Cdk1~{thr161}-CycB

    Cdk1~{thr14,tyr15}-CycB =[Cdc25~{Nterm}]=> Cdk1-CycB

Synthesis: _ =[C]=> A

Degradation: A =[C]=> _

    _ =[#Ge2-E2f13-Dp12]=> cycA

    cycE =[@UbiPro]=> _ (not for cycE-cdk2 which is stable)

Transport: A::L1 => A::L2

    Cdk1~{p}-CycB::cytoplasm => Cdk1~{p}-CycB::nucleus

BIOCHAM Syntax of Reaction Rules

R ::= S=>S | S=[O]=>S | S<=>S | S<=[O]=>S

where A=[C]=>B stands for A+C=>B+C

A<=>B stands for A=>B and B=>A, etc.

N ::= expr for R (import/export SBML format)

Three abstraction levels:

1. Boolean Semantics: presence-absence of molecules
   • Concurrent Transition System (asynchronous, non-deterministic)

2. Concentration Semantics: number / volume of diffusion
   • Ordinary Differential Equations (deterministic)

3. Population of molecules: number of molecules
   • Stochastic Multiset Rewriting
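
The abbreviations on this slide (A=[C]=>B for A+C=>B+C, and A<=>B for the two one-way rules) can be illustrated with a small Python sketch. This is not BIOCHAM code: the names RULE, parse_solution, show and expand_rule are hypothetical, and the handling of S<=[O]=>S (catalyst kept in both directions) is an assumption, since the slide only says "etc.".

```python
import re

# Hypothetical pattern for the four rule shapes: S=>S, S=[O]=>S, S<=>S, S<=[O]=>S
RULE = re.compile(r"(.+?)(<?)=(?:\[(.+?)\]=)?>(.+)")

def parse_solution(text):
    """Split a solution on '+', dropping the empty solution '_'."""
    return [mol for mol in text.split("+") if mol != "_"]

def show(molecules):
    """Print a solution, using '_' for the empty one."""
    return "+".join(molecules) if molecules else "_"

def expand_rule(rule):
    """Expand catalyst and reversibility abbreviations into plain S=>S rules."""
    match = RULE.fullmatch(rule.replace(" ", ""))
    if match is None:
        raise ValueError(f"not a reaction rule: {rule!r}")
    left, reversible, catalyst, right = match.groups()
    lhs, rhs = parse_solution(left), parse_solution(right)
    if catalyst:
        # A=[C]=>B stands for A+C=>B+C
        lhs, rhs = lhs + [catalyst], rhs + [catalyst]
    rules = [f"{show(lhs)}=>{show(rhs)}"]
    if reversible:
        # A<=>B stands for A=>B and B=>A (assumed to keep the catalyst, if any)
        rules.append(f"{show(rhs)}=>{show(lhs)}")
    return rules

print(expand_rule("Cdk1-CycB =[Myt1]=> Cdk1~{thr161}-CycB"))
print(expand_rule("CKI+MPF<=>CKI-MPF"))
```

The sketch works on strings only; it does not normalize the associative and commutative operators of the object syntax.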

Cell Cycle: G1 → DNA Synthesis → G2 → Mitosis

G1: Cdk4-CycD, Cdk6-CycD, Cdk2-CycE

S: Cdk2-CycA

G2, M: Cdk1-CycA, Cdk1-CycB (MPF)

Mammalian Cell Cycle Model [Kohn 99]

Boolean Semantics

Associate:

• Boolean state variables to molecules, denoting the presence/absence of molecules in the cell or compartment

• A finite concurrent transition system [Shankar 93] to rules (asynchronous), over-approximating the set of all possible behaviors

A reaction A+B=>C+D is translated into 4 transition rules for the possibly complete consumption of reactants:

A+B → A+B+C+D

A+B → ¬A+B+C+D

A+B → A+¬B+C+D

A+B → ¬A+¬B+C+D
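
The translation of a reaction into Boolean transitions can be sketched in Python. This is a minimal illustration, not BIOCHAM's implementation: the function name boolean_successors and the encoding of a state as the frozenset of present molecules are assumptions. It also assumes that a molecule appearing on both sides of a reaction (such as a catalyst) stays present.

```python
from itertools import product

def boolean_successors(state, reactants, products):
    """Return the Boolean states reachable by firing one reaction.

    state     -- frozenset of molecules currently present
    reactants -- molecules on the left-hand side
    products  -- molecules on the right-hand side
    """
    if not set(reactants) <= state:
        return set()  # the reaction is not enabled
    # Each reactant that is not also a product may or may not be fully consumed.
    consumable = [mol for mol in reactants if mol not in products]
    successors = set()
    for kept in product([True, False], repeat=len(consumable)):
        nxt = set(state) | set(products)
        for mol, keep in zip(consumable, kept):
            if not keep:
                nxt.discard(mol)
        successors.add(frozenset(nxt))
    return successors

# A+B=>C+D from a state where only A and B are present: 4 successor states
for nxt in sorted(boolean_successors(frozenset("AB"), ["A", "B"], ["C", "D"]), key=sorted):
    print(sorted(nxt))
```

A reaction with n consumable reactants yields 2^n transitions, which is why the Boolean semantics over-approximates the possible behaviors.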

Concentration Semantics

k1cc for _=>preMPF.

k3cc*[C25~{s1,s2}]*[preMPF] for preMPF=[C25~{s1,s2}]=>MPF.

(k14cc*[CKI]*[MPF],k15cc*[CKI-MPF]) for CKI+MPF<=>CKI-MPF.

k2cc*[preMPF] for preMPF=>_.

k2cc*[MPF] for MPF=>_.

k2u*[APC]*[MPF] for MPF=[APC]=>_.

k4cc*[Wee1]*[MPF] for MPF=[Wee1]=>preMPF.

parameter(k1cc,0.25).

present({preMPF, Wee1m}).

Compiles into an ODE system (or a Stochastic Process under the Population semantics)
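
How kinetic rules of the form "expression for reaction" give rise to an ODE system can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not BIOCHAM's compiler: the rule encoding (rate function, reactants, products), the function names, the value of K2CC and the initial concentration are hypothetical. Only k1cc = 0.25 comes from the slide. A fixed-step Euler scheme is used for brevity instead of an adaptive integrator.

```python
K1CC = 0.25  # from the slide: parameter(k1cc,0.25)
K2CC = 0.5   # hypothetical value, not given on the slide

# Each rule: (kinetic expression, reactants, products)
rules = [
    (lambda c: K1CC, [], ["preMPF"]),                # k1cc for _=>preMPF.
    (lambda c: K2CC * c["preMPF"], ["preMPF"], []),  # k2cc*[preMPF] for preMPF=>_.
]

def derivatives(rules, conc):
    """d[X]/dt = sum over rules of (+rate if X is a product, -rate if a reactant)."""
    d = {mol: 0.0 for mol in conc}
    for rate, reactants, products in rules:
        v = rate(conc)
        for mol in reactants:
            d[mol] -= v
        for mol in products:
            d[mol] += v
    return d

def euler(rules, conc, dt, steps):
    """Integrate with fixed-step explicit Euler; returns a list of (t, concentrations)."""
    trace = [(0.0, dict(conc))]
    for i in range(1, steps + 1):
        d = derivatives(rules, conc)
        conc = {mol: conc[mol] + dt * d[mol] for mol in conc}
        trace.append((i * dt, dict(conc)))
    return trace

trace = euler(rules, {"preMPF": 0.0}, dt=0.1, steps=400)
print(trace[-1])  # preMPF approaches K1CC / K2CC
```

A catalyst written with =[C]=> appears in both the reactant and product lists, so its net contribution to its own derivative is zero.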

3. Formalizing Biological Properties in Temporal Logics

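Boolean Semantics: Computation Tree Logic CTL

Time operators: F, G, U

Non-determinism (choice) quantifiers: E, A

[Figure: computation tree with axes Choice and Time, illustrating EF, EU and AG]

Operator          E (exists)            A (always)
X (next time)     EX(f) = ¬AX(¬f)       AX(f)
F (finally)       EF(f) = ¬AG(¬f)       AF(f)
G (globally)      EG(f) = ¬AF(¬f)       AG(f)
U (until)         E(f1 U f2)            A(f1 U f2)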

Biological Properties formalized in CTL [Chabrier Fages 03]

About reachability:

• Can the cell produce some protein P? reachable(P)==EF(P)

About pathways:

• Is it possible to produce P without having Q? E(¬Q U P)

• Is state s2 a necessary checkpoint for reaching state s? checkpoint(s2,s) == ¬E(¬s2 U s)

About stationarity:

• Is a (partially described) state s a stable state? stable(s) == AG(s)

• Is s a steady state (with possibility of escaping)? steady(s) == EG(s)

• Can the cell reach a stable state? EF(stable(s))

About oscillations (approximation without strong fairness):

• Can the system exhibit a cyclic behavior w.r.t. the presence of P? oscillation(P) == EG((P ⇒ EF ¬P) ∧ (¬P ⇒ EF P))
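
These CTL patterns can be evaluated on a small explicit transition system with the standard fixpoint characterizations of EX, EU and EG. The Python sketch below is an illustration only: BIOCHAM delegates CTL checking to the NuSMV symbolic model checker, and the toy transition system, the state numbering and the function names are assumptions. Each function returns the set of states where the formula holds.

```python
def EX(succ, target):
    """States with at least one successor in target."""
    return {s for s, nxt in succ.items() if nxt & target}

def EU(succ, hold, target):
    """E(hold U target): least fixpoint."""
    result = set(target)
    while True:
        new = result | (hold & EX(succ, result))
        if new == result:
            return result
        result = new

def EG(succ, hold):
    """EG(hold): greatest fixpoint (assumes every state has a successor)."""
    result = set(hold)
    while True:
        new = result & EX(succ, result)
        if new == result:
            return result
        result = new

def EF(succ, target):
    return EU(succ, set(succ), target)

def AG(succ, hold):
    return set(succ) - EF(succ, set(succ) - hold)

def checkpoint(succ, s2, s):
    """checkpoint(s2,s) == not E(not s2 U s)"""
    return set(succ) - EU(succ, set(succ) - s2, s)

def oscillation(succ, p):
    """oscillation(P) == EG((P => EF not P) and (not P => EF P))"""
    not_p = set(succ) - p
    body = (not_p | EF(succ, not_p)) & (p | EF(succ, p))
    return EG(succ, body)

# Hypothetical 4-state system: 0 -> 1 <-> 2 -> 3, with a self-loop on 3
succ = {0: {1}, 1: {2}, 2: {1, 3}, 3: {3}}
P, Q = {2}, {1}  # states where P, respectively Q, is present
print(0 in EF(succ, P), 0 in checkpoint(succ, Q, P), 0 in oscillation(succ, P))
```

In this toy system every path from state 0 to P goes through Q, so Q is a checkpoint for P, and the loop between states 1 and 2 witnesses the oscillation pattern.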

Cell Cycle Model-Checking

biocham: check_reachable(cdk46~{p1,p2}-cycD~{p1}).
Ei(EF(cdk46~{p1,p2}-cycD~{p1})) is true

biocham: check_checkpoint(cdc25C~{p1,p2}, cdk1~{p1,p3}-cycB).
Ai(!(E(!(cdc25C~{p1,p2}) U cdk1~{p1,p3}-cycB))) is true

biocham: nusmv(Ai(AG(!(cdk1~{p1,p2,p3}-cycB) -> checkpoint(Wee1, cdk1~{p1,p2,p3}-cycB)))).
Ai(AG(!(cdk1~{p1,p2,p3}-cycB)->!(E(!(Wee1) U cdk1~{p1,p2,p3}-cycB)))) is false

biocham: why.
-- Loop starts here
cycB-cdk1~{p1,p2,p3} is present
cdk7 is present
cycH is present
cdk1 is present
Myt1 is present
cdc25C~{p1} is present
rule_114 cycB-cdk1~{p1,p2,p3}=[cdc25C~{p1}]=>cycB-cdk1~{p2,p3}.
cycB-cdk1~{p2,p3} is present
cycB-cdk1~{p1,p2,p3} is absent
rule_74 cycB-cdk1~{p2,p3}=[Myt1]=>cycB-cdk1~{p1,p2,p3}.
cycB-cdk1~{p2,p3} is absent
cycB-cdk1~{p1,p2,p3} is present

Cell Cycle Model-Checking

800 rules, 165 proteins and genes, 500 variables.

BIOCHAM-NuSMV symbolic model-checker time in seconds:

Initial state G2                  Query                                        Time
Compiling                                                                      29s
Reachability G1                   EF CycE                                      2s
Reachability G1                   EF CycD                                      1.9s
Reachability G1                   EF PCNA-CycD                                 1.7s
Checkpoint for mitosis complex    ¬E(¬Cdc25~{Nterm} U Cdk1~{Thr161}-CycB)      2.2s
Cycle                             EG((CycA ⇒ EF ¬CycA) ∧ (¬CycA ⇒ EF CycA))    31.8s

Concentration Semantics: Constraint LTL

• Constraints over concentrations and derivatives as FOL formulae over the reals:

• [M] > 0.2

• [M]+[P] > [Q]

• d([M])/dt < 0

• Constraint LTL operators for time F, U, G (no non-determinism).

• F([M]>0.2)

• FG([M]>0.2)

• F ([M]>2 & F (d([M])/dt<0 & F ([M]<2 & d([M])/dt>0 & F(d([M])/dt<0))))

• oscil(M,n)= F (d([M])/dt>0 & F(d([M])/dt<0 & … ))

• Language to formalize the relevant properties observed in experiments

Traces from Numerical Simulation

• From a system of Ordinary Differential Equations

dX/dt = f(X)

• Numerical integration produces a discretization of time (adaptive step size Runge-Kutta and Rosenbrock method for stiff systems)

• The trace is a linear Kripke structure:

(t_0,X_0), (t_1,X_1), …, (t_n,X_n), …

the derivatives can be added to the trace

(t_0,X_0,dX_0/dt), (t_1,X_1,dX_1/dt), …, (t_n,X_n,dX_n/dt), …

• Equality x=v is true if x_i ≤ v & x_{i+1} ≥ v, or if x_i ≥ v & x_{i+1} ≤ v
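
Checking constraint-LTL formulas on such a finite trace can be sketched in Python. This is a minimal illustration, not the Prolog implementation mentioned in the conclusion: the combinator names, the trace encoding as a list of dictionaries and the synthetic sine-wave trace are assumptions. In particular, the reading of oscil(M,n) as n nested phases of alternating derivative sign, starting with a rising phase, is one interpretation of the elided formula on the earlier slide. Each formula maps a trace to the list of its truth values at every position, computed by a backward scan.

```python
import math

def atom(pred):
    """Constraint evaluated pointwise on the trace."""
    return lambda trace: [pred(point) for point in trace]

def And(f, g):
    return lambda trace: [a and b for a, b in zip(f(trace), g(trace))]

def F(f):
    """F(f) holds at position i if f holds at some position j >= i."""
    def evaluate(trace):
        out, seen = [], False
        for value in reversed(f(trace)):
            seen = seen or value
            out.append(seen)
        return out[::-1]
    return evaluate

def G(f):
    """G(f) holds at position i if f holds at every position j >= i."""
    def evaluate(trace):
        out, ok = [], True
        for value in reversed(f(trace)):
            ok = ok and value
            out.append(ok)
        return out[::-1]
    return evaluate

def oscil(n):
    """Assumed reading: F(dM>0 & F(dM<0 & F(dM>0 & ...))) with n >= 1 phases."""
    rising = atom(lambda p: p["dM"] > 0)
    falling = atom(lambda p: p["dM"] < 0)
    formula = None
    for k in reversed(range(n)):
        phase = rising if k % 2 == 0 else falling
        formula = F(phase if formula is None else And(phase, formula))
    return formula

def holds(formula, trace):
    """Truth value of the formula at the start of the trace."""
    return formula(trace)[0]

# Synthetic trace: M(t) = 1 + sin(t) sampled on [0, 10], with its derivative
trace = [{"t": 0.1 * i, "M": 1 + math.sin(0.1 * i), "dM": math.cos(0.1 * i)}
         for i in range(101)]
print(holds(F(atom(lambda p: p["M"] > 1.9)), trace))
print(holds(oscil(4), trace))
```

Evaluating bottom-up keeps the cost linear in the trace length for each operator, which matters for the deeply nested F formulas used to describe oscillations.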

4. Learning Kinetic Parameters with Constraint-LTL

parameter(k3cc,0.1).

k3cc*[MPF~{p}]*[cdc25C~{p1,p2}] for MPF~{p}=[cdc25C~{p1,p2}]=>MPF.

biocham: trace_get([k3cc],[(0,5)],20,oscil(MPF,4)&F([MPF]>1),100).

Found parameters that make oscil(MPF,4) & F([MPF]>1) true:

parameter(k3cc,2.5).
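
The idea of searching a parameter interval for a value that makes a temporal specification true can be sketched as a simple scan. This Python sketch is not the BIOCHAM algorithm: it assumes that the arguments of the command above are a parameter list, search intervals, a number of values to try, the specification and a time horizon. The one-variable model dx/dt = k - x, the Euler integrator and all function names are hypothetical stand-ins.

```python
def simulate(k, horizon, dt=0.01):
    """Euler trace of the toy model dx/dt = k - x with x(0) = 0."""
    x, trace = 0.0, [0.0]
    for _ in range(int(round(horizon / dt))):
        x += dt * (k - x)
        trace.append(x)
    return trace

def eventually(pred):
    """F(pred) on a finite trace of concentrations."""
    return lambda trace: any(pred(x) for x in trace)

def search_parameter(interval, n_values, spec, horizon):
    """Try n_values (>= 2) evenly spaced values; return the first satisfying spec."""
    low, high = interval
    for i in range(n_values):
        k = low + (high - low) * i / (n_values - 1)
        if spec(simulate(k, horizon)):
            return k
    return None

spec = eventually(lambda x: x > 1)  # plays the role of F([X]>1)
print(search_parameter((0.0, 5.0), 20, spec, horizon=10.0))
```

Since the toy model converges to x = k, the scan returns the first grid value above 1. A richer specification only changes the spec function, not the search loop.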

Learning Reaction Rules from CTL Specification

The biological properties of the system are added as CTL formulas

biocham: add_spec({reachable(MPF),checkpoint(cdc25C~{p1,p2},MPF),...}).

Suppose that the MPF activation rule is missing in the model

biocham: delete_rule(MPF~{p}=[cdc25C~{p1,p2}]=>MPF).

biocham: check_all.

The specification is not satisfied.

This formula is the first not verified: Ei(EF(MPF))

Rules can be searched to correct the model w.r.t. specification:

biocham: learn_one_rule(all_elementary_interaction_rules).

Possible rules to be added: 3

_=[cdc25C~{p1,p2}]=>MPF

MPF~{p}=[cdc25C~{p1,p2}]=>MPF

CKI+MPF~{p}=[cdc25C~{p1,p2}]=>CKI-MPF
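
The principle behind this search (enumerate candidate rules, add each one to the model, and keep those that make the specification true) can be sketched in Python under the Boolean semantics. This is only an illustration with hypothetical names and a hypothetical toy model and initial state. It checks a single reachability property, whereas the command above checks the whole CTL specification over a generated set of candidate rules.

```python
from itertools import product

def rule(reactants, products):
    return (frozenset(reactants), frozenset(products))

def successors(state, rules):
    """Boolean semantics: each consumable reactant may or may not disappear."""
    out = set()
    for reactants, products in rules:
        if reactants <= state:
            consumable = sorted(reactants - products)
            for kept in product([True, False], repeat=len(consumable)):
                gone = {m for m, keep in zip(consumable, kept) if not keep}
                out.add(frozenset((state | products) - gone))
    return out

def reachable(initial, rules, target):
    """EF(target) from the initial state, by exhaustive search."""
    seen, todo = {initial}, [initial]
    while todo:
        state = todo.pop()
        if target in state:
            return True
        for nxt in successors(state, rules):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return False

def learn_one_rule(initial, rules, candidates, target):
    """Candidates whose addition makes the target reachable."""
    return [c for c in candidates if reachable(initial, rules + [c], target)]

C25 = "cdc25C~{p1,p2}"
model = [rule(["CKI-MPF"], ["CKI", "MPF"])]        # hypothetical remaining rule
initial = frozenset(["MPF~{p}", C25, "CKI"])       # hypothetical initial state
candidates = [
    rule([C25], ["MPF", C25]),                        # _=[cdc25C~{p1,p2}]=>MPF
    rule(["MPF~{p}", C25], ["MPF", C25]),             # MPF~{p}=[cdc25C~{p1,p2}]=>MPF
    rule(["CKI", "MPF~{p}", C25], ["CKI-MPF", C25]),  # CKI+MPF~{p}=[cdc25C~{p1,p2}]=>CKI-MPF
    rule(["MPF~{p}"], []),                            # hypothetical distractor: MPF~{p}=>_
]
print(len(learn_one_rule(initial, model, candidates, "MPF")))
```

In this toy setting the three rules listed on the slide restore reachability of MPF, while the distractor does not.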

Learning Reaction Rules from CTL Specification

Example: finding an intermediary step between MPF and APC activation

biocham: absent(X). add_rule(_=>X). add_rule(X=>_).

biocham: add_specs({ Ei(reachable(X)), Ai(oscil(X)),

Ai(AG(!APC->checkpoint(X,APC))),

Ai(AG(!X->checkpoint(MPF,X))) }).

biocham: check_all.

The specification is not satisfied.

This formula is the first not verified: Ai(AG(!APC->!(E(!X U APC))))

Biocham searches for revisions of the model satisfying the specification

biocham: revise_model.

Deletion(s): _=[MPF]=>APC. _=>X.

Addition(s): _=[X]=>APC. _=[MPF]=>X.

Conclusion

The biochemical abstract machine BIOCHAM implements:

• A simple rule-based language for modeling biochemical processes with three abstraction levels:
  • Boolean semantics: presence/absence of molecules
  • Molecule Concentration semantics (ODE)
  • Molecule Population semantics (stochastic)

• A powerful temporal logic language for formalizing biological properties
  • CTL (implemented with NuSMV model checker)
  • Constraint LTL (implemented in Prolog)

• Machine learning techniques
  • Reaction rule discovery from CTL specification
  • Parameter estimation from constraint LTL specification

Issue of compositionality: model reuse in different contexts

Issue of abstraction/refinement: model simplification/decomposition

Collaborations

STREP APRIL 2: Applications of probabilistic inductive logic programming

Luc de Raedt, Freiburg, Stephen Muggleton, Imperial College London,…

• Learning in a probabilistic logic setting

NoE REWERSE: Reasoning on the web with rules and semantics

François Bry, Munich, Rolf Backofen, Jena, Mike Schroeder, Dresden,…

• Connecting Biocham to the semantic web: gene and protein ontologies

INRIA Bang, Jean Clairambault, Benoît Perthame

INSERM, Villejuif, Francis Lévi “Cancer chronotherapies”

ULB, Albert Goldbeter, Bruxelles

• Coupled models of cell cycle, circadian cycle, cytotoxic drugs.