modelling gene expression dynamics with gaussian processes · di erential equations: pol-ii to mrna...

45
Modelling gene expression dynamics with Gaussian processes Regulatory Genomics and Epigenomics March 10 th 2016 Magnus Rattray Faculty of Life Sciences University of Manchester

Upload: others

Post on 19-Aug-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Modelling gene expression dynamicswith Gaussian processes

Regulatory Genomics and EpigenomicsMarch 10th 2016

Magnus RattrayFaculty of Life Sciences

University of Manchester

Page 2: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Talk Outline

Introduction to Gaussian process regression

Example 1. Hierarchical models: batches and clusters

Example 2. Branching models: perturbations and bifurcations

Example 3. Differential equations: Pol-II to mRNA dynamics

Page 3: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Gaussian processes: flexible non-parametric models

Probability distributions over functions

f (t) ∼ GP(mean(t), cov(t, t ′)

)Covariance function k = cov(t, t ′) defines typical properties,

I Static . . . Dynamic

I Smooth . . . Rough

I Stationary. . . non-Stationary

I Periodic. . . Chaotic

The covariance function has parameters tuning these properties

Bayesian Machine Learning perspective: Rasmussen & Williams“Gaussian Processes for Machine Learning” (MIT Press, 2006)

Page 4: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Gaussian processes

Samples from a 25-dimensional multivariate Gaussian distribution:

n

fn

5 10 15 20 25

−1

−0.5

0.5

1

1.5

2

n

m

5 10 15 20 25

5

10

15

20

25

0.2

0.4

0.6

0.8

[f1, f2, . . . , f25] ∼ N (0,C )

Learning and Inference in Computational Systems Biology, MIT Press

Page 5: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Gaussian processes

Take dimension →∞

−3 −2 −1 1 2 3

−2.5

−2

−1.5

−1

−0.5

0.5

1

1.5

2

2.5

−3 −2 −1 1 2 3

−3

−2

−1

1

2

3

f ∼ GP(0, k) k(t, t ′)

= exp

(−(t − t ′)2

l2

)

Learning and Inference in Computational Systems Biology, MIT Press

Page 6: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Gaussian processes for inference: Bayesian Regression

2.0 1.5 1.0 0.5 0.0 0.5 1.0 1.5 2.0

2

1

0

1

2

Figures from James Hensman

Page 7: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Regression example

2.0 1.5 1.0 0.5 0.0 0.5 1.0 1.5 2.0

2

1

0

1

2

Figures from James Hensman

Page 8: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Regression example

2.0 1.5 1.0 0.5 0.0 0.5 1.0 1.5 2.0

2

1

0

1

2

Figures from James Hensman

Page 9: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Regression example

2.0 1.5 1.0 0.5 0.0 0.5 1.0 1.5 2.0

2

1

0

1

2

Figures from James Hensman

Page 10: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Ex1. Hierarchical models: batches and clusters

0 5 10 15 2021012

0 5 10 15 20 0 5 10 15 20 0 5 10 15 20

0 5 10 15 2021012

0 5 10 15 20 0 5 10 15 20 0 5 10 15 20

Data from Kalinka et al. ”Gene expression divergence recapitulatesthe developmental hourglass model” Nature 2010

Joint work with James Hensman and Neil Lawrence

Page 11: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Usual processing options for time course batches

Lumped

0 5 10 15 20

2

1

0

1

2

Averaged

0 5 10 15 20

2

1

0

1

2

Page 12: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Hierarchical Gaussian process

gene:

g(t) ∼ GP(0, kg (t, t ′)

)replicate:

fi (t) ∼ GP(g(t), kf (t, t ′)

)

Page 13: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Hierarchical Gaussian process

3

2

1

0

1

2

J. Hensman, N.D. Lawrence, M.Rattray ”Hierarchical Bayesianmodelling of gene expression time series across irregularly sampledreplicates and clusters” BMC Bioinformatics 2013

Page 14: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Hierarchical Gaussian process for clustering

An extended hierarchy

h(t) ∼ GP(

0, kh(t, t ′))

cluster

gi (t) ∼ GP(h(t), kg (t, t ′)

)gene

fir (t) ∼ GP(gi (t), kf (t, t ′)

)replicate

Page 15: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Hierarchical Gaussian process for clustering

2

1

0

1

2

CG11786

2

1

0

1

2

CG12723

2

1

0

1

2

CG13196

2

1

0

1

2

CG13627

2

1

0

1

2

CG4386

2

1

0

1

2

Osi15

0 5 10 15 202

1

0

1

2

0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20

Time (hours)

Norm

alis

ed

gen

e e

xp

ress

ion

Page 16: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Hierarchical Gaussian process for clustering

Modifying an existing algorithm to include this model of replicateand cluster structure leads to more meaningful clustering

MF BP CC L N. clust.

agglomerative HGP 0.46 0.16 0.50 7360.8 50agglomerative GP 0.39 0.13 0.36 6203.7 128Mclust (concat.) 0.39 0.07 0.25 1324.0 26

Mclust (averaged) 0.40 0.08 0.24 -736.2 20

Variational Bayes algorithm is more efficient, allowing Bayesianclustering of >10K profiles with a Dirichlet Process prior

J. Hensman, M.Rattray, N.D. Lawrence “Fast non-parametricclustering of time-series data” IEEE TPAMI 2015

Page 17: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Clustering with a periodic covariance

Page 18: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Ex2. Branching models: perturbations and bifurcations

0 5 10 15Time (hrs)

−4

−3

−2

−1

0

1

2

3

4

5

Sim

ulated gene expression

Simulated control data

Simulated perturbed data

Joint work with Jing Yang, Chris Penfold and Murray Grant

Page 19: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Samples from a branching model

Sample 1

Page 20: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Samples from a branching model

Sample 2

Page 21: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Samples from a branching model

Sample 3

Page 22: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Samples from a branching model

Sample 4

Page 23: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Samples from a branching model

Sample 5

Page 24: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Joint distribution to two functions crossing at tp

f ∼ GP(0,K ) , g ∼ GP(0,K ) , g(tp) = f (tp)

Σ =

(Kff Kfg

Kgf Kgg

)=

(K (T,T)

K(T,tp)K(tp ,T)k(tp ,tp)

K(T,tp)K(tp ,T)k(tp ,tp)

K (T,T)

)(1)

Page 25: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Joint distribution of two datasets diverging at tp

y c(tn) ∼ N (f (tn), σ2)

yp(tn) ∼ N (f (tn), σ2) for tn ≤ tp

yp(tn) ∼ N (g(tn), σ2) for tn > tp

Page 26: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Posterior probability of the perturbation time tp

p(tp|y c(T), yp(T)) ' p(y c(T), yp(T)|tp)∑t=tmaxt=tmin

p(y c (T), yp(T)|t)

0 5 10 15Time (hrs)

−4

−3

−2

−1

0

1

2

3

4

5

Sim

ulated gene expression

Simulated control data

Simulated perturbed data

Page 27: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Comparison to DE thresholding approach

Alternative: use first point where some DE threshold is passedWe tested several DE metrics in the nsgp package

0 5 10 15 20

0

5

10

15

20

Esti

mate

d p

ert

urb

ati

on

tim

es

DEtime

Mean

Median

MAP

0 5 10 15 20

nsgp

EMLL0.5

EMLL1

Testing perturbation times

Page 28: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Investigating a plant’s response to bacterial challenge

Infection with virulent Pseudomonas syringage pv. tomato DC3000vs. disarmed strain DC3000hrpA

0 10 17

Times (hrs)

8

9

10

11

12

13

Gen

e e

xp

ressio

n

CATMA1c72124

Condition 1

Condition 2

0 10 17

Times (hrs)

CATMA1a09335

Condition 1

Condition 2

Yang et al. “Inferring the perturbation time from biological timecourse data” Bioinformatics (accepted) preprint: arxiv 1602.01743

Page 29: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Current work: modelling branching in single-cell data

12

48

1632

64start

5

10

15

20

25

30

35

40

1

2

3

4

5

6

7

with A. Boukouvalas, M. Zweissele, J. Hensman and N. Lawrence

Page 30: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Ex 3. Linking Pol-II activity to mRNA profiles

040

80160

320640

1280

t (min)

mRNA

a

b

c

d

e

f

040

80160

320640

1280

t (min)

Pol−II

0

2

4

6

Pol II

ENSG00000163659 (TIPARP)

0 40 80 160 320 64012800

50

100

mR

NA

(F

PK

M)

t (min)0 50 100

0

0.2

0.4

0.6

0.8

1

Delay (min)

Joint work with Antti Honkela,Jaakko Peltonen, Neil Lawrence

Page 31: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Linking Pol-II activity to mRNA profiles

dm(t)

dt= βp(t −∆)− αm(t)

I m(t) is mRNA concentration (RNA-Seq data)

I p(t) is mRNA production rate (3’ pol-II ChIP-Seq data)

I α is degradation rate (mRNA half-life t1/2 = 2/α)

I ∆ is processing delay

We model p(t) ∼ GP(0, kp) as a Gaussian process (GP)

Likelihood can be worked out exactly

Bayesian MCMC used to estimate parameters α, β, ∆ and GPcovariance (2) and noise variance (1) parameters

Page 32: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Linking Pol-II activity to mRNA profiles

dm(t)

dt= βp(t −∆)− αm(t)

I m(t) is mRNA concentration (RNA-Seq data)

I p(t) is mRNA production rate (3’ pol-II ChIP-Seq data)

I α is degradation rate (mRNA half-life t1/2 = 2/α)

I ∆ is processing delay

We model p(t) ∼ GP(0, kp) as a Gaussian process (GP)

Likelihood can be worked out exactly

Bayesian MCMC used to estimate parameters α, β, ∆ and GPcovariance (2) and noise variance (1) parameters

Page 33: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Example fits

040

80160

320640

1280

t (min)

mRNA

a

b

c

d

e

f

040

80160

320640

1280

t (min)

Pol−II a: Early pol-II, no production delay

0

2

4

Pol II

ENSG00000166483 (WEE1)

0 40 80 160 320 64012800

20

40

mR

NA

(F

PK

M)

t (min)0 50 100

0

0.2

0.4

0.6

0.8

1

Delay (min)

Page 34: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Example fits

040

80160

320640

1280

t (min)

mRNA

a

b

c

d

e

f

040

80160

320640

1280

t (min)

Pol−II b: Early pol-II, delayed production

0

1

2

3

Pol II

ENSG00000019549 (SNAI2)

0 40 80 160 320 64012800

2

4

6

8

mR

NA

(F

PK

M)

t (min)0 50 100

0

0.2

0.4

0.6

0.8

1

Delay (min)

Page 35: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Example fits

040

80160

320640

1280

t (min)

mRNA

a

b

c

d

e

f

040

80160

320640

1280

t (min)

Pol−II c: Later pol-II, delayed production

0

2

4

6

Pol II

ENSG00000163659 (TIPARP)

0 40 80 160 320 64012800

50

100

mR

NA

(F

PK

M)

t (min)0 50 100

0

0.2

0.4

0.6

0.8

1

Delay (min)

Page 36: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Example fits

040

80160

320640

1280

t (min)

mRNA

a

b

c

d

e

f

040

80160

320640

1280

t (min)

Pol−II d: Late pol-II, no delay

0

0.5

1

Pol II

ENSG00000162949 (CAPN13)

0 40 80 160 320 64012800

5

10

15

mR

NA

(F

PK

M)

t (min)0 50 100

0

0.2

0.4

0.6

0.8

1

Delay (min)

Page 37: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Example fits

040

80160

320640

1280

t (min)

mRNA

a

b

c

d

e

f

040

80160

320640

1280

t (min)

Pol−II e: Late pol-II, no delay

0

0.5

1

Pol II

ENSG00000117298 (ECE1)

0 40 80 160 320 64012800

20

40

mR

NA

(F

PK

M)

t (min)0 50 100

0

0.2

0.4

0.6

0.8

1

Delay (min)

Page 38: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Example fits

040

80160

320640

1280

t (min)

mRNA

a

b

c

d

e

f

040

80160

320640

1280

t (min)

Pol−II f: Late pol-II, delayed production

0

1

2

3

Pol II

ENSG00000181026 (AEN)

0 40 80 160 320 64012800

10

20

30

mR

NA

(F

PK

M)

t (min)0 50 100

0

0.2

0.4

0.6

0.8

1

Delay (min)

Page 39: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Large processing delays observed in 11% of genes

Delay (min)

# of

gen

es

0 40 80 >120

050

100

1500

1550

Page 40: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Delay linked with gene length and intron structure

0 20 40 60 80t (min)

0.00

0.05

0.10

0.15

0.20

0.25

0.30

Fra

ctio

n of

gen

es w

ith ∆

>t

02

46

810

12−

log 1

0(p−

valu

e)

m<10000(N=282)m>10000(N=1504)

p−value

0 20 40 60 80t (min)

0.00

0.05

0.10

0.15

0.20

0.25

Fra

ctio

n of

gen

es w

ith ∆

>t

02

46

810

−lo

g 10(p

−va

lue)

f<0.20(N=1443)f>0.20(N=343)

p−value

∆: delay m: gene length f: final intron length / gene length

Page 41: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Delayed genes show evidence of late-intron retention

DLX

3

0246

10 m

in

0

5

10

20 m

in

0102030

40 m

in

05

101520

80 m

in

02468

10

160

min

02468

320

min

10 20 30 40 50 60t (min)

−0.

010.

000.

010.

020.

030.

04M

ean

pre−

mR

NA

end

acc

umul

atio

n in

dex

01

23

45

67

−lo

g 10(p

−va

lue)

∆ > t∆ < tp−value

Left: density of RNA-Seq reads uniquely mapping to the introns inthe DLX3 gene

Right: Differences in the mean pre-mRNA accumulation index inlong delay genes (blue) and short delay genes (red)

Page 42: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Comparison of processing and transcription times

RNA processing delay ∆ (min)

Tran

scrip

tiona

l del

ay (

min

)

1 3 10 30 100

13

1030

100

Transcription time = length/velocityestimate from Danko Mol Cell 2013

Transcription time > processing delayin 87% of genes

Honkela et al. “Genome-wide modeling of transcription kineticsreveals patterns of RNA production delays” PNAS 2015

Page 43: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Extension - inferring time-varying degradation rates

dm(t)

dt= βp(t)− α(t)m(t)

0 2 4 6 8 10Times

0.5

0.0

0.5

1.0

1.5

2.0

2.5

Degra

dati

on r

ate

D

0 2 4 6 8 10Times

5

0

5

10

15

20

25

30

35

Gene e

xpre

ssio

n

mRNA measurementspre-mRNA measurements

Page 44: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Conclusion

I Gaussian process models provide a good mix of flexibility andtractability in temporal and spatial modelling:

I Hierarchical modelling, e.g. replicates, clusters, speciesI Periodic models without strong sinuisoidal assumptionsI Tractable under branching and bifurcationsI Tractable under linear operations on functions

I Easy addition and multiplication of kernels

I Good approximate inference algorithms for non-Gaussian data

I Codebase is growing making methods increasingly flexible andcomputationally efficient (e.g. GPy, GPflow and some R)

Funding: BBSRC (EraSysBio+ SYNERGY), EU FP7 (RADIANT)

Collaborators: Neil Lawrence, James Hensman, Antti Honkela,Jaakko Peltonen, Jing Yang, Alexis Boukouvalas, Max Zweissele

Page 45: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Our existential crisis