computing bayesian posterior with empirical likelihood … of contents 1 models and aims 2...

40
Computing Bayesian posterior with empirical likelihood in population genetics Pierre Pudlo INRA & U. Montpellier 2 MCEB, June 2012 Pierre Pudlo (INRA & U. Montpellier 2) ABC el GenPop MCEB June 2012 1 / 25

Upload: vuongtram

Post on 06-May-2018

217 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Computing Bayesian posterior with empirical likelihood … of contents 1 Models and aims 2 Likelihood free methods 3 ABCel 4 Numerical experiments Pierre Pudlo (INRA & U. Montpellier

Computing Bayesian posterior with empiricallikelihood in population genetics

Pierre Pudlo

INRA & U. Montpellier 2

MCEB, June 2012

Pierre Pudlo (INRA & U. Montpellier 2) ABCel GenPop MCEB June 2012 1 / 25

Page 2: Computing Bayesian posterior with empirical likelihood … of contents 1 Models and aims 2 Likelihood free methods 3 ABCel 4 Numerical experiments Pierre Pudlo (INRA & U. Montpellier

Table of contents

1 Models and aims

2 Likelihood free methods

3 ABCel

4 Numerical experiments

Pierre Pudlo (INRA & U. Montpellier 2) ABCel GenPop MCEB June 2012 2 / 25

Page 3: Computing Bayesian posterior with empirical likelihood … of contents 1 Models and aims 2 Likelihood free methods 3 ABCel 4 Numerical experiments Pierre Pudlo (INRA & U. Montpellier

Table of contents

1 Models and aims

2 Likelihood free methods

3 ABCel

4 Numerical experiments

Pierre Pudlo (INRA & U. Montpellier 2) ABCel GenPop MCEB June 2012 3 / 25

Page 4: Computing Bayesian posterior with empirical likelihood … of contents 1 Models and aims 2 Likelihood free methods 3 ABCel 4 Numerical experiments Pierre Pudlo (INRA & U. Montpellier

Neutral model at a given microsatellite locus, in aclosed panmictic population at equilibrium

Sample of 8 genes

Mutations according tothe Simple stepwiseMutation Model (SMM)• date of the mutations ∼

Poisson process withintensity θ/2 over thebranches• MRCA = 100• independent mutations:±1 with pr. 1/2

Pierre Pudlo (INRA & U. Montpellier 2) ABCel GenPop MCEB June 2012 4 / 25

Page 5: Computing Bayesian posterior with empirical likelihood … of contents 1 Models and aims 2 Likelihood free methods 3 ABCel 4 Numerical experiments Pierre Pudlo (INRA & U. Montpellier

Neutral model at a given microsatellite locus, in aclosed panmictic population at equilibrium

Kingman’s genealogyWhen time axis isnormalized,T(k) ∼ Exp(k(k− 1)/2)

Mutations according tothe Simple stepwiseMutation Model (SMM)• date of the mutations ∼

Poisson process withintensity θ/2 over thebranches• MRCA = 100• independent mutations:±1 with pr. 1/2

Pierre Pudlo (INRA & U. Montpellier 2) ABCel GenPop MCEB June 2012 4 / 25

Page 6: Computing Bayesian posterior with empirical likelihood … of contents 1 Models and aims 2 Likelihood free methods 3 ABCel 4 Numerical experiments Pierre Pudlo (INRA & U. Montpellier

Neutral model at a given microsatellite locus, in aclosed panmictic population at equilibrium

Kingman’s genealogyWhen time axis isnormalized,T(k) ∼ Exp(k(k− 1)/2)

Mutations according tothe Simple stepwiseMutation Model (SMM)• date of the mutations ∼

Poisson process withintensity θ/2 over thebranches

• MRCA = 100• independent mutations:±1 with pr. 1/2

Pierre Pudlo (INRA & U. Montpellier 2) ABCel GenPop MCEB June 2012 4 / 25

Page 7: Computing Bayesian posterior with empirical likelihood … of contents 1 Models and aims 2 Likelihood free methods 3 ABCel 4 Numerical experiments Pierre Pudlo (INRA & U. Montpellier

Neutral model at a given microsatellite locus, in aclosed panmictic population at equilibrium

Observations: leafs of the treeθ = ?

Kingman’s genealogyWhen time axis isnormalized,T(k) ∼ Exp(k(k− 1)/2)

Mutations according tothe Simple stepwiseMutation Model (SMM)• date of the mutations ∼

Poisson process withintensity θ/2 over thebranches• MRCA = 100• independent mutations:±1 with pr. 1/2

Pierre Pudlo (INRA & U. Montpellier 2) ABCel GenPop MCEB June 2012 4 / 25

Page 8: Computing Bayesian posterior with empirical likelihood … of contents 1 Models and aims 2 Likelihood free methods 3 ABCel 4 Numerical experiments Pierre Pudlo (INRA & U. Montpellier

Much more interesting models. . .

I several independent lociIndependent gene genealogies and mutations

I different populationslinked by an evolutionary scenario made of divergences,admixtures, migrations between populations, etc.

I larger sample sizeusually between 50 and 100 genes

A typical evolutionary scenario:

MRCA

POP 0 POP 1 POP 2

τ1

τ2

Pierre Pudlo (INRA & U. Montpellier 2) ABCel GenPop MCEB June 2012 5 / 25

Page 9: Computing Bayesian posterior with empirical likelihood … of contents 1 Models and aims 2 Likelihood free methods 3 ABCel 4 Numerical experiments Pierre Pudlo (INRA & U. Montpellier

Table of contents

1 Models and aims

2 Likelihood free methods

3 ABCel

4 Numerical experiments

Pierre Pudlo (INRA & U. Montpellier 2) ABCel GenPop MCEB June 2012 6 / 25

Page 10: Computing Bayesian posterior with empirical likelihood … of contents 1 Models and aims 2 Likelihood free methods 3 ABCel 4 Numerical experiments Pierre Pudlo (INRA & U. Montpellier

When the likelihood is not completely known

I Hidden Markov and other dynamic models: latent processwhich is not observed

↪→ Classical answer: Markov chain Monte Carlo,. . .

I Population genetics: the whole gene genealogy is unobservedLikelihood is an integral over

I all possible gene genealogiesI all possible mutations along the genealogies

↪→ Classical answer: Approximate Bayesian computation (ABC)

Pierre Pudlo (INRA & U. Montpellier 2) ABCel GenPop MCEB June 2012 7 / 25

Page 11: Computing Bayesian posterior with empirical likelihood … of contents 1 Models and aims 2 Likelihood free methods 3 ABCel 4 Numerical experiments Pierre Pudlo (INRA & U. Montpellier

ABC in a nutshell

Posterior distribution is the conditional distribution ofπ(φ)`(x|φ) (∗)

knowing that x = xobs

MethodologyDraw a (large) set of particles (φi, xi) from (∗) and use anonparametric estimate of the conditional density

π(φ|xobs) ∝ π(φ)`(xobs|φ)

Seminal papersI Tavare, Balding, Griffith and Donnelly (1997, Genetics)I Pritchard, Seielstad, Perez-Lezuan, Feldman (1999, Molecular

Biology and Evolution)

Pierre Pudlo (INRA & U. Montpellier 2) ABCel GenPop MCEB June 2012 8 / 25

Page 12: Computing Bayesian posterior with empirical likelihood … of contents 1 Models and aims 2 Likelihood free methods 3 ABCel 4 Numerical experiments Pierre Pudlo (INRA & U. Montpellier

ABC in a nutshellPosterior distribution is the conditional distribution of

π(φ)`(x|φ) (∗)knowing that x = xobs

MethodologyDraw a (large) set of particles (φi, xi) from (∗) and use anonparametric estimate of the conditional density

π(φ|xobs) ∝ π(φ)`(xobs|φ)

Shortcomings.I time consuming – If simulation of the latent process is not

straightforward

I curse of dimensionality vs. loss of information –I If x lies in a high dimensional space X (often), we are unable to

estimate of the conditional density

I Hence, we project the (observed and simulated) datasets on aspace with smaller dimension (trough summary statistics)η : X → Rd (summary statistics)

Pierre Pudlo (INRA & U. Montpellier 2) ABCel GenPop MCEB June 2012 8 / 25

Page 13: Computing Bayesian posterior with empirical likelihood … of contents 1 Models and aims 2 Likelihood free methods 3 ABCel 4 Numerical experiments Pierre Pudlo (INRA & U. Montpellier

ABC in a nutshellPosterior distribution is the conditional distribution of

π(φ)`(x|φ) (∗)knowing that x = xobs

MethodologyDraw a (large) set of particles (φi, xi) from (∗) and use anonparametric estimate of the conditional density

π(φ|xobs) ∝ π(φ)`(xobs|φ)

Shortcomings.I time consuming – If simulation of the latent process is not

straightforwardI curse of dimensionality vs. loss of information –

I If x lies in a high dimensional space X (often), we are unable toestimate of the conditional density

I Hence, we project the (observed and simulated) datasets on aspace with smaller dimension (trough summary statistics)η : X → Rd (summary statistics)

Pierre Pudlo (INRA & U. Montpellier 2) ABCel GenPop MCEB June 2012 8 / 25

Page 14: Computing Bayesian posterior with empirical likelihood … of contents 1 Models and aims 2 Likelihood free methods 3 ABCel 4 Numerical experiments Pierre Pudlo (INRA & U. Montpellier

ABC in a nutshellPosterior distribution is the conditional distribution of

π(φ)`(x|φ) (∗)knowing that x = xobs

MethodologyDraw a (large) set of particles (φi, xi) from (∗) and use anonparametric estimate of the conditional density

π(φ|η(xobs)) ∝ π(φ)∫x : η(x)=η(xobs)

`(x|φ) dx

Shortcomings.I time consuming – If simulation of the latent process is not

straightforwardI curse of dimensionality vs. loss of information –

I If x lies in a high dimensional space X (often), we are unable toestimate of the conditional density

I Hence, we project the (observed and simulated) datasets on aspace with smaller dimension (trough summary statistics)η : X → Rd (summary statistics)

Pierre Pudlo (INRA & U. Montpellier 2) ABCel GenPop MCEB June 2012 8 / 25

Page 15: Computing Bayesian posterior with empirical likelihood … of contents 1 Models and aims 2 Likelihood free methods 3 ABCel 4 Numerical experiments Pierre Pudlo (INRA & U. Montpellier

Curse of dimensionality

Assume thatI the simulated summary statistics η(x1), . . . ,η(xN)I the observed summary statistics η(xobs)

are iid, with uniform law on [0, 1]d

Let d∞(d,N) = E

[mini=1,...,N

∥∥η(xobs) − η(xi)∥∥∞]

N = 100 N = 1, 000 N = 10, 000 N = 100, 000δ∞(1,N) 0.0025 0.00025 0.000025 0.0000025δ∞(2,N) > 0.033 > 0.01 > 0.0033 > 0.001δ∞(10,N) > 0.28 > 0.22 > 0.18 > 0.14δ∞(200,N) > 0.48 > 0.48 > 0.47 > 0.46

Pierre Pudlo (INRA & U. Montpellier 2) ABCel GenPop MCEB June 2012 9 / 25

Page 16: Computing Bayesian posterior with empirical likelihood … of contents 1 Models and aims 2 Likelihood free methods 3 ABCel 4 Numerical experiments Pierre Pudlo (INRA & U. Montpellier

Table of contents

1 Models and aims

2 Likelihood free methods

3 ABCel

4 Numerical experiments

Pierre Pudlo (INRA & U. Montpellier 2) ABCel GenPop MCEB June 2012 10 / 25

Page 17: Computing Bayesian posterior with empirical likelihood … of contents 1 Models and aims 2 Likelihood free methods 3 ABCel 4 Numerical experiments Pierre Pudlo (INRA & U. Montpellier

Pierre Pudlo (INRA & U. Montpellier 2) ABCel GenPop MCEB June 2012 11 / 25

Page 18: Computing Bayesian posterior with empirical likelihood … of contents 1 Models and aims 2 Likelihood free methods 3 ABCel 4 Numerical experiments Pierre Pudlo (INRA & U. Montpellier

Empirical likelihood (EL)

Owen (1988, Biometrika), Owen (2001, Chapman & Hall)

Assume that the dataset x is composed of n independent replicatesx = (x1, . . . , xn) of some X ∼ F

Generalized moment condition modelThe law F of X satisfy

EF[h(X,φ)

]= 0,

where h is a known function, and φ an unknown parameter

Empirical likelihood

Lel(φ|x) = maxp

n∏i=1

pi

for all p such that 0 6 pi 6 1,∑pi = 1,

∑i pih(xi,φ) = 0.

Pierre Pudlo (INRA & U. Montpellier 2) ABCel GenPop MCEB June 2012 12 / 25

Page 19: Computing Bayesian posterior with empirical likelihood … of contents 1 Models and aims 2 Likelihood free methods 3 ABCel 4 Numerical experiments Pierre Pudlo (INRA & U. Montpellier

Raw ABCelsamplerWe act as if EL was an exact likelihood

for i = 1→ N do

generate φi from the prior distribution π(·)set the weight ωi = Lel(φi|xobs)

end for

return (φi,ωi), i = 1, . . . ,N

I The output is sample of parameters of size N with associatedweights

I Performance of the output evaluated through effective sample size

ESS = 1/ N∑i=1

ωi/N∑j=1

ωj

2

I Other classical sampling algorithms might be adapted to use EL.We resorted the adaptive multiple importance sampling (AMIS) ofCornuet et al. (Scandinavian J. of Statis.) to speed upcomputations

Pierre Pudlo (INRA & U. Montpellier 2) ABCel GenPop MCEB June 2012 13 / 25

Page 20: Computing Bayesian posterior with empirical likelihood … of contents 1 Models and aims 2 Likelihood free methods 3 ABCel 4 Numerical experiments Pierre Pudlo (INRA & U. Montpellier

Moment condition in population genetics?EL does not require a fully defined and often complex (hencedebatable) parametric model.

Main difficultyDerive a constraint

EF[h(X,φ)

]= 0,

on the parameters of interest φ when X is the allelic states of oursample of individuals at a given locus

E.g., in phylogeography, φ is composed ofI dates of splits of populations,I ratio of population sizes,I mutation rates, etc.

None of them are moments of the distribution of the allelic states of thesample

↪→ h = pairwise composite scores whose zero is the pairwisemaximum likelihood estimator

Pierre Pudlo (INRA & U. Montpellier 2) ABCel GenPop MCEB June 2012 14 / 25

Page 21: Computing Bayesian posterior with empirical likelihood … of contents 1 Models and aims 2 Likelihood free methods 3 ABCel 4 Numerical experiments Pierre Pudlo (INRA & U. Montpellier

Moment condition in population genetics?EL does not require a fully defined and often complex (hencedebatable) parametric model.

Main difficultyDerive a constraint

EF[h(X,φ)

]= 0,

on the parameters of interest φ when X is the allelic states of oursample of individuals at a given locus

E.g., in phylogeography, φ is composed ofI dates of splits of populations,I ratio of population sizes,I mutation rates, etc.

None of them are moments of the distribution of the allelic states of thesample

↪→ h = pairwise composite scores whose zero is the pairwisemaximum likelihood estimator

Pierre Pudlo (INRA & U. Montpellier 2) ABCel GenPop MCEB June 2012 14 / 25

Page 22: Computing Bayesian posterior with empirical likelihood … of contents 1 Models and aims 2 Likelihood free methods 3 ABCel 4 Numerical experiments Pierre Pudlo (INRA & U. Montpellier

Pairwise composite likelihood?

The intra-locus pairwise likelihood

`2(xk|φ) =∏i<j

`2(xik, xjk|φ)

with x1k, . . . , xnk : allelic states of the gene sample at the k-th locus

The pairwise score function

∇φ log `2(xk|φ) =∑i<j

∇φ log `2(xik, xjk|φ)

� Composite likelihoods are often much more narrow than thedistribution of the model

Safe with EL because we only use position of its mode

Pierre Pudlo (INRA & U. Montpellier 2) ABCel GenPop MCEB June 2012 15 / 25

Page 23: Computing Bayesian posterior with empirical likelihood … of contents 1 Models and aims 2 Likelihood free methods 3 ABCel 4 Numerical experiments Pierre Pudlo (INRA & U. Montpellier

Pairwise likelihood: a simple case

Assumptions

I sample ⊂ closed, panmicticpopulation at equilibrium

I marker: microsatelliteI mutation rate: θ/2

if xik et xjk are two genes of thesample,

`2(xik, xjk|θ) depends only on

δ = xik − xjk

`2(δ|θ) =1√1+ 2θ

ρ (θ)|δ|

withρ(θ) =

θ

1+ θ+√1+ 2θ

Pairwise score function∂θ log `2(δ|θ) =

−1

1+ 2θ+

|δ|

θ√1+ 2θ

Pierre Pudlo (INRA & U. Montpellier 2) ABCel GenPop MCEB June 2012 16 / 25

Page 24: Computing Bayesian posterior with empirical likelihood … of contents 1 Models and aims 2 Likelihood free methods 3 ABCel 4 Numerical experiments Pierre Pudlo (INRA & U. Montpellier

Pairwise likelihood: a simple case

Assumptions

I sample ⊂ closed, panmicticpopulation at equilibrium

I marker: microsatelliteI mutation rate: θ/2

if xik et xjk are two genes of thesample,

`2(xik, xjk|θ) depends only on

δ = xik − xjk

`2(δ|θ) =1√1+ 2θ

ρ (θ)|δ|

withρ(θ) =

θ

1+ θ+√1+ 2θ

Pairwise score function∂θ log `2(δ|θ) =

−1

1+ 2θ+

|δ|

θ√1+ 2θ

Pierre Pudlo (INRA & U. Montpellier 2) ABCel GenPop MCEB June 2012 16 / 25

Page 25: Computing Bayesian posterior with empirical likelihood … of contents 1 Models and aims 2 Likelihood free methods 3 ABCel 4 Numerical experiments Pierre Pudlo (INRA & U. Montpellier

Pairwise likelihood: a simple case

Assumptions

I sample ⊂ closed, panmicticpopulation at equilibrium

I marker: microsatelliteI mutation rate: θ/2

if xik et xjk are two genes of thesample,

`2(xik, xjk|θ) depends only on

δ = xik − xjk

`2(δ|θ) =1√1+ 2θ

ρ (θ)|δ|

withρ(θ) =

θ

1+ θ+√1+ 2θ

Pairwise score function∂θ log `2(δ|θ) =

−1

1+ 2θ+

|δ|

θ√1+ 2θ

Pierre Pudlo (INRA & U. Montpellier 2) ABCel GenPop MCEB June 2012 16 / 25

Page 26: Computing Bayesian posterior with empirical likelihood … of contents 1 Models and aims 2 Likelihood free methods 3 ABCel 4 Numerical experiments Pierre Pudlo (INRA & U. Montpellier

Pairwise likelihood: 2 diverging populations

MRCA

POP a POP b

τ

AssumptionsI τ: divergence date of pop.a and b

I θ/2: mutation rateLet xik and xjk be two genescoming resp. from pop. a and bSet δ = xik − x

jk.

Then `2(δ|θ, τ) =e−τθ√1+ 2θ

+∞∑k=−∞ ρ(θ)

|k|Iδ−k(τθ).

whereIn(z) nth-order modified Besselfunction of the first kind

Pierre Pudlo (INRA & U. Montpellier 2) ABCel GenPop MCEB June 2012 17 / 25

Page 27: Computing Bayesian posterior with empirical likelihood … of contents 1 Models and aims 2 Likelihood free methods 3 ABCel 4 Numerical experiments Pierre Pudlo (INRA & U. Montpellier

Pairwise likelihood: 2 diverging populations

MRCA

POP a POP b

τ

AssumptionsI τ: divergence date of pop.a and b

I θ/2: mutation rateLet xik and xjk be two genescoming resp. from pop. a and bSet δ = xik − x

jk.

A 2-dim score function∂τ log `2(δ|θ, τ) =

−θ+θ

2

`2(δ− 1|θ, τ) + `2(δ+ 1|θ, τ)`2(δ|θ, τ)

∂θ log `2(δ|θ, τ) =

−τ−1

1+ 2θ+

τ

2

`2(δ− 1|θ, τ) + `2(δ+ 1|θ, τ)`2(δ|θ, τ)

+

q(δ|θ, τ)`2(δ|θ, τ)

whereq(δ|θ, τ) :=

e−τθ√1+ 2θ

ρ ′(θ)

ρ(θ)

∞∑k=−∞ |k|ρ(θ)|k|Iδ−k(τθ)

Pierre Pudlo (INRA & U. Montpellier 2) ABCel GenPop MCEB June 2012 17 / 25

Page 28: Computing Bayesian posterior with empirical likelihood … of contents 1 Models and aims 2 Likelihood free methods 3 ABCel 4 Numerical experiments Pierre Pudlo (INRA & U. Montpellier

Recap

Three kinds of likelihood:

I True likelihood: given by the model (evolutionary scenario &Kingman’s coalecent)↪→ cannot compute

I Pairwise composite likelihood: act as if each pair of genes wasindependent of the other ones↪→ its maximum provides as “good” approximation of the MLE

I Empirical likelihood: a way to profile the likelihood from thedata, using generalized moment conditions↪→ generalized moment condition in population genetics =pairwise composite scores (whose zero is the pairwise compositemaximum likelihood)

Pierre Pudlo (INRA & U. Montpellier 2) ABCel GenPop MCEB June 2012 18 / 25

Page 29: Computing Bayesian posterior with empirical likelihood … of contents 1 Models and aims 2 Likelihood free methods 3 ABCel 4 Numerical experiments Pierre Pudlo (INRA & U. Montpellier

Table of contents

1 Models and aims

2 Likelihood free methods

3 ABCel

4 Numerical experiments

Pierre Pudlo (INRA & U. Montpellier 2) ABCel GenPop MCEB June 2012 19 / 25

Page 30: Computing Bayesian posterior with empirical likelihood … of contents 1 Models and aims 2 Likelihood free methods 3 ABCel 4 Numerical experiments Pierre Pudlo (INRA & U. Montpellier

A first experimentEvolutionary scenario:

MRCA

POP 0 POP 1

τ

Dataset:I 50 genes per populations,I 100 microsat. loci

Assumptions:I Ne identical over all

populationsI φ = (log10 θ, log10 τ)I uniform prior over(−1., 1.5)× (−1., 1.)

Comparison of the originalABC with ABCel

ESS=7034

log(theta)

Den

sity

0.00 0.05 0.10 0.15 0.20 0.25

05

1015

log(tau1)

Den

sity

−0.3 −0.2 −0.1 0.0 0.1 0.2 0.3

01

23

45

67

histogram = ABCelcurve = original ABCvertical line = “true” parameter

Pierre Pudlo (INRA & U. Montpellier 2) ABCel GenPop MCEB June 2012 20 / 25

Page 31: Computing Bayesian posterior with empirical likelihood … of contents 1 Models and aims 2 Likelihood free methods 3 ABCel 4 Numerical experiments Pierre Pudlo (INRA & U. Montpellier

A first experimentEvolutionary scenario:

MRCA

POP 0 POP 1

τ

Dataset:I 50 genes per populations,I 100 microsat. loci

Assumptions:I Ne identical over all

populationsI φ = (log10 θ, log10 τ)I uniform prior over(−1., 1.5)× (−1., 1.)

Comparison of the originalABC with ABCel

ESS=7034

log(theta)

Den

sity

0.00 0.05 0.10 0.15 0.20 0.25

05

1015

log(tau1)

Den

sity

−0.3 −0.2 −0.1 0.0 0.1 0.2 0.3

01

23

45

67

histogram = ABCelcurve = original ABCvertical line = “true” parameter

Pierre Pudlo (INRA & U. Montpellier 2) ABCel GenPop MCEB June 2012 20 / 25

Page 32: Computing Bayesian posterior with empirical likelihood … of contents 1 Models and aims 2 Likelihood free methods 3 ABCel 4 Numerical experiments Pierre Pudlo (INRA & U. Montpellier

ABC vs. ABCel on 100 replicates of the 1st experiment

Accuracy:log10 θ log10 τ

ABC ABCel ABC ABCel(1) 0.097 0.094 0.315 0.117(2) 0.68 0.81 1.0 0.80

(1) Root Mean Square Error of the posterior mean(2) Coverage of the credibility interval of probability 0.8

Computation time: on a recent 6-core computer (C++/OpenMP)I ABC ≈ 4 hoursI ABCel≈ 2 minutes

Pierre Pudlo (INRA & U. Montpellier 2) ABCel GenPop MCEB June 2012 21 / 25

Page 33: Computing Bayesian posterior with empirical likelihood … of contents 1 Models and aims 2 Likelihood free methods 3 ABCel 4 Numerical experiments Pierre Pudlo (INRA & U. Montpellier

Second experimentEvolutionary scenario:

MRCA

POP 0 POP 1 POP 2

τ1

τ2

Dataset:I 50 genes per populations,I 100 microsat. loci

Assumptions:I Ne identical over all

populationsI φ =(log10 θ, log10 τ1, log10 τ2)

I non-informative prior

Comparison of the original ABCwith ABCelhistogram = ABCelcurve = original ABCvertical line = “true” parameter

Pierre Pudlo (INRA & U. Montpellier 2) ABCel GenPop MCEB June 2012 22 / 25

Page 34: Computing Bayesian posterior with empirical likelihood … of contents 1 Models and aims 2 Likelihood free methods 3 ABCel 4 Numerical experiments Pierre Pudlo (INRA & U. Montpellier

Second experimentEvolutionary scenario:

MRCA

POP 0 POP 1 POP 2

τ1

τ2

Dataset:I 50 genes per populations,I 100 microsat. loci

Assumptions:I Ne identical over all

populationsI φ =(log10 θ, log10 τ1, log10 τ2)

I non-informative prior

Comparison of the original ABCwith ABCel

histogram = ABCelcurve = original ABCvertical line = “true” parameter

Pierre Pudlo (INRA & U. Montpellier 2) ABCel GenPop MCEB June 2012 22 / 25

Page 35: Computing Bayesian posterior with empirical likelihood … of contents 1 Models and aims 2 Likelihood free methods 3 ABCel 4 Numerical experiments Pierre Pudlo (INRA & U. Montpellier

Second experimentEvolutionary scenario:

MRCA

POP 0 POP 1 POP 2

τ1

τ2

Dataset:I 50 genes per populations,I 100 microsat. loci

Assumptions:I Ne identical over all

populationsI φ =(log10 θ, log10 τ1, log10 τ2)

I non-informative prior

Comparison of the original ABCwith ABCel

histogram = ABCelcurve = original ABCvertical line = “true” parameter

Pierre Pudlo (INRA & U. Montpellier 2) ABCel GenPop MCEB June 2012 22 / 25

Page 36: Computing Bayesian posterior with empirical likelihood … of contents 1 Models and aims 2 Likelihood free methods 3 ABCel 4 Numerical experiments Pierre Pudlo (INRA & U. Montpellier

Second experimentEvolutionary scenario:

MRCA

POP 0 POP 1 POP 2

τ1

τ2

Dataset:I 50 genes per populations,I 100 microsat. loci

Assumptions:I Ne identical over all

populationsI φ =(log10 θ, log10 τ1, log10 τ2)

I non-informative prior

Comparison of the original ABCwith ABCel

histogram = ABCelcurve = original ABCvertical line = “true” parameter

Pierre Pudlo (INRA & U. Montpellier 2) ABCel GenPop MCEB June 2012 22 / 25

Page 37: Computing Bayesian posterior with empirical likelihood … of contents 1 Models and aims 2 Likelihood free methods 3 ABCel 4 Numerical experiments Pierre Pudlo (INRA & U. Montpellier

ABC vs. ABCel on 100 replicates of the 2ndexperiment

Accuracy:log10 θ log10 τ1 log10 τ2

ABC ABCel ABC ABCel ABC ABCel(1) 0.0059 0.0794 0.472 0.483 29.3 4.76(3) 0.79 0.76 0.88 0.76 0.89 0.79

(1) Root Mean Square Error of the posterior mean(2) Coverage of the credibility interval of probability 0.8

Computation time: on a recent 6-core computer (C++/OpenMP)I ABC ≈ 6 hoursI ABCel≈ 8 minutes

Pierre Pudlo (INRA & U. Montpellier 2) ABCel GenPop MCEB June 2012 23 / 25

Page 38: Computing Bayesian posterior with empirical likelihood … of contents 1 Models and aims 2 Likelihood free methods 3 ABCel 4 Numerical experiments Pierre Pudlo (INRA & U. Montpellier

Why?

On large datasets, ABCel gives more accurate results than ABC

ABC simplifies the dataset through summary statisticsDue to the large dimension of x, the original ABC algorithm estimates

π(θ∣∣∣η(xobs)

),

where η(xobs) is some (non-linear) projection of the observed dataseton a space with smaller dimension↪→ Some information is lost

ABCel simplifies the model through a generalized moment conditionmodel.↪→ Provides more accurate approximation if the constraint is wellchoosen.

Pierre Pudlo (INRA & U. Montpellier 2) ABCel GenPop MCEB June 2012 24 / 25

Page 39: Computing Bayesian posterior with empirical likelihood … of contents 1 Models and aims 2 Likelihood free methods 3 ABCel 4 Numerical experiments Pierre Pudlo (INRA & U. Montpellier

Joint work withI Christian P. Robert (U. Dauphine & IUF)I Kerrie Mengersen (QUT, Australia)I Raphael Leblois (INRA CBGP, Montpellier)

Grant from ANR throughProject “Emile”

First preprint on arXivApproximate Bayesian computation viaempirical likelihood

Coming soon: population genetic modelswhich are too slow to simulate

Pierre Pudlo (INRA & U. Montpellier 2) ABCel GenPop MCEB June 2012 25 / 25

Page 40: Computing Bayesian posterior with empirical likelihood … of contents 1 Models and aims 2 Likelihood free methods 3 ABCel 4 Numerical experiments Pierre Pudlo (INRA & U. Montpellier

Joint work withI Christian P. Robert (U. Dauphine & IUF)I Kerrie Mengersen (QUT, Australia)I Raphael Leblois (INRA CBGP, Montpellier)

Grant from ANR throughProject “Emile”

First preprint on arXivApproximate Bayesian computation viaempirical likelihood

Coming soon: population genetic modelswhich are too slow to simulate

Pierre Pudlo (INRA & U. Montpellier 2) ABCel GenPop MCEB June 2012 25 / 25