
Parameter inference Model selection 2-3 examples

Approximate Bayesian Computation: algorithms, theory and applications

Michael G.B. Blum

Laboratoire TIMC-IMAG, UJF Grenoble, CNRS

MCEB, June 2012

“ABC is a ‘democratizing’ method in that it will attract, for example, biologists, who enjoy computer simulation but have little background in probability, into converting their favorite simulation into a tool for inference”

Beaumont and Rannala, Nat. Rev. Genet. 2004

[Diagram: different values of the parameter (random design) feed the simulations, which produce simulated DNA sequences; ABC combines these with the observed DNA sequences to give the most probable values for the parameter.]

Parameter inference for ABC

A coalescent example in population genetics
Estimating the mutation rate θ

[Figure: coalescent tree for four sequences A, B, C, D, with times T2, T3, T4 up to the TMRCA and six mutations numbered 1-6. Segregating sites (binary coding): 000100011000101000101011]

Number of segregating sites: S=6

Heterozygosity: H=3.16
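The two summary statistics on this slide can be recomputed from the binary data. The sketch below assumes that the 24 digits are read row by row as 4 sequences by 6 sites, and that H is the mean number of pairwise differences; both are assumptions, although they reproduce the slide's values (H = 19/6, which the slide shows as 3.16).

```python
from itertools import combinations

# Assumption: the digits are 4 sequences (A-D) x 6 sites, read row by row.
bits = "000100011000101000101011"
n_seq, n_sites = 4, 6
seqs = [bits[i * n_sites:(i + 1) * n_sites] for i in range(n_seq)]

def segregating_sites(seqs):
    """Count alignment columns in which the sequences do not all agree."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))

def heterozygosity(seqs):
    """Mean number of differences over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    diffs = [sum(a != b for a, b in zip(x, y)) for x, y in pairs]
    return sum(diffs) / len(pairs)

S = segregating_sites(seqs)
H = heterozygosity(seqs)
print(S, round(H, 3))  # 6 3.167
```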

Two approximations in ABC

Replace the full posterior $p(\theta \mid D)$ with the partial posterior $p(\theta \mid s_{\mathrm{obs}})$

Nonparametric estimation of $p(\theta \mid s_{\mathrm{obs}})$

Rejection algorithm
Pritchard et al., MBE 1999

Simulate $n$ values $\theta_i$, $i = 1, \dots, n$, from the prior $\pi$.

Simulate $n$ (possibly multivariate) summary statistics $s_i$ according to $p(s_i \mid \theta_i)$.

Consider the weighted sample $(\theta_i, W_i)$, $i = 1, \dots, n$, with

$$W_i = \begin{cases} 1 & \text{if } \|s_i - s_{\mathrm{obs}}\| \le b, \\ 0 & \text{otherwise.} \end{cases}$$

The parameter $b$ is an acceptance threshold.
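The steps above can be sketched on a toy problem. Everything specific here is an assumption made for illustration, not part of the slides: a Gaussian model with known unit variance, the sample mean as the single summary statistic, a uniform prior on [-5, 5], and the function names.

```python
import random

def simulate_summary(theta, n_obs, rng):
    """Toy simulator (assumed): sample mean of n_obs draws from N(theta, 1)."""
    return sum(rng.gauss(theta, 1.0) for _ in range(n_obs)) / n_obs

def abc_rejection(s_obs, n_sim, b, n_obs, rng):
    """Return the theta_i whose weight W_i equals 1."""
    accepted = []
    for _ in range(n_sim):
        theta = rng.uniform(-5.0, 5.0)           # theta_i drawn from the (assumed) prior
        s = simulate_summary(theta, n_obs, rng)  # s_i drawn from p(s | theta_i)
        if abs(s - s_obs) <= b:                  # W_i = 1 if |s_i - s_obs| <= b
            accepted.append(theta)
    return accepted

rng = random.Random(1)
posterior_sample = abc_rejection(s_obs=2.0, n_sim=20000, b=0.1, n_obs=20, rng=rng)
posterior_mean = sum(posterior_sample) / len(posterior_sample)
print(len(posterior_sample), posterior_mean)
```

A smaller b gives a sample closer to the partial posterior but fewer accepted simulations, which is the trade-off that the threshold controls.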

Rejection algorithm

[Figure (plot title: Regression correction): model parameter $\theta_i$ against summary statistic, with the acceptance window from $-b$ to $+b$ around $S(y_0)$ and the resulting posterior distribution.]

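Regression adjustment
Beaumont et al., Genetics 2002

[Figure (plot title: Regression correction): model parameter against summary statistic, with the acceptance window from $-b$ to $+b$ around $s_{\mathrm{obs}}$, a simulated value $\theta_i$ and its adjusted value $\theta_i^*$, and the resulting posterior distribution.]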

Linear regression adjustment

A model of local regression

$$\theta_i \mid s_i = m(s_i) + \varepsilon_i$$

Local linear approximation

$$m(s_i) = \alpha + s_i^t \beta$$

Adjustment

$$\theta_i^* = \hat{m}(s_{\mathrm{obs}}) + \tilde{\varepsilon}_i$$
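A minimal sketch of the adjustment for a single summary statistic follows. It takes the residuals to be the empirical residuals θ_i − m̂(s_i) of the fit, and it uses an unweighted least-squares fit on the accepted simulations, which is a simplification (kernel weights are omitted). The toy Gaussian model, the uniform prior and all names are assumptions for illustration.

```python
import random

def linear_adjust(thetas, stats, s_obs):
    """Fit theta = alpha + beta * s by least squares, then move each point to s_obs."""
    n = len(thetas)
    s_bar = sum(stats) / n
    t_bar = sum(thetas) / n
    sxx = sum((s - s_bar) ** 2 for s in stats)
    sxy = sum((s - s_bar) * (t - t_bar) for s, t in zip(stats, thetas))
    beta = sxy / sxx
    alpha = t_bar - beta * s_bar
    m_obs = alpha + beta * s_obs
    # theta*_i = m_hat(s_obs) + residual_i, with residual_i = theta_i - m_hat(s_i)
    return [m_obs + (t - (alpha + beta * s)) for t, s in zip(thetas, stats)]

def variance(xs):
    """Unbiased sample variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

rng = random.Random(2)
s_obs, b, n_obs = 2.0, 0.5, 20          # deliberately wide threshold b
kept_theta, kept_s = [], []
for _ in range(20000):
    theta = rng.uniform(-5.0, 5.0)      # assumed uniform prior
    s = sum(rng.gauss(theta, 1.0) for _ in range(n_obs)) / n_obs
    if abs(s - s_obs) <= b:
        kept_theta.append(theta)
        kept_s.append(s)

adjusted = linear_adjust(kept_theta, kept_s, s_obs)
print(variance(kept_theta), variance(adjusted))
```

In this toy setting the exact posterior variance is 1/20 = 0.05. The unadjusted sample is inflated by the wide acceptance window, while the adjusted sample should come out close to 0.05.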

Main theorem
Blum, JASA 2010

Asymptotic bias of the estimated posterior mean ($j = 0$: rejection, $j = 1$: linear adjustment)

$$C_{1,j}\, b^2$$

Asymptotic variance

$$\frac{C_3}{n\, b^d}$$

$d$ is the number of summary statistics and $n$ is the number of simulations.

This result overemphasizes the curse of dimensionality: empirical evidence is much more optimistic.
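The dependence on the dimension can be made explicit with the usual bias-variance trade-off. This is a sketch that keeps only the two leading terms stated above, treats the constants as fixed and assumes $C_{1,j} \neq 0$; it is not part of the theorem's statement on the slide.

```latex
\mathrm{MSE}(b) \approx C_{1,j}^2\, b^4 + \frac{C_3}{n\, b^d},
\qquad
\frac{d\,\mathrm{MSE}}{db} = 0
\;\Longrightarrow\;
b^\ast = \left( \frac{d\, C_3}{4\, C_{1,j}^2\, n} \right)^{1/(d+4)},
\qquad
\mathrm{MSE}(b^\ast) \propto n^{-4/(d+4)}.
```

Under these assumptions the error decays more slowly as the number of summary statistics d grows, which is the curse of dimensionality that the slide refers to.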

Comparison between the two estimators with adjustment

When the model

$$\theta_i = m(s_i) + \varepsilon_i$$

is homoscedastic in the vicinity of $s_{\mathrm{obs}}$, the bias of the estimator with quadratic adjustment is

$$o(b^2).$$

Transformations of the summary statistics to make the model as homoscedastic as possible.

Non-linear adjustment.

Non-homoscedastic adjustment.

Regression adjustment for the mean and the variance
Blum and François, Stat and Comput 2010

[Figure (plot title: Regression correction): model parameter against summary statistic, with the acceptance window from $-b$ to $+b$ around $S(y_0)$, a simulated value $\theta_i$ and its adjusted value $\theta_i^*$, and the resulting posterior distribution.]

ABC with and without adjustment
Estimating the mean in a Gaussian sample

[Figure: two panels, "No regression adjustment" and "With regression adjustment", showing density against μ, each compared with the true posterior.]

How to check that ABC works when you do not know the posterior
Cook et al. 2006 J. Comp. Graph. Stat.

Take a pair $(\theta_i, s_i)$ drawn from $\pi(\theta_i)\, p(s_i \mid \theta_i)$.

Perform ABC with $s_{\mathrm{obs}} = s_i$.

Compute the proportion $p_i$ of posterior samples smaller than $\theta_i$.

If the algorithm provides samples from $p(\theta_i \mid s_i)$, the $p_i$ should be uniformly distributed.
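The check can be sketched as follows. To keep it cheap, one reference table of simulations is reused for every replicate, and the summary statistic is drawn directly from its sampling distribution; the toy Gaussian model, the uniform prior, the threshold and all names are assumptions for illustration.

```python
import bisect
import random

rng = random.Random(3)
SD_S = 1.0 / 20 ** 0.5    # sd of the mean of 20 draws from N(theta, 1) (toy model)

def draw_pair():
    """Draw (s, theta) with theta from the prior and s from p(s | theta)."""
    theta = rng.uniform(-5.0, 5.0)        # assumed uniform prior
    return rng.gauss(theta, SD_S), theta

# Reference table of simulations, sorted by summary statistic.
table = sorted(draw_pair() for _ in range(100000))
table_s = [s for s, _ in table]

def coverage_p(s_i, theta_i, b):
    """Proportion of ABC posterior samples (rejection, threshold b) below theta_i."""
    lo = bisect.bisect_left(table_s, s_i - b)
    hi = bisect.bisect_right(table_s, s_i + b)
    accepted = [theta for _, theta in table[lo:hi]]
    if not accepted:
        return None
    return sum(t < theta_i for t in accepted) / len(accepted)

p_values = []
for _ in range(500):
    s_i, theta_i = draw_pair()            # pretend s_i is the observed statistic
    p = coverage_p(s_i, theta_i, b=0.05)
    if p is not None:
        p_values.append(p)

# Ten-bin histogram of the p_i; roughly equal counts suggest uniformity.
counts = [0] * 10
for p in p_values:
    counts[min(int(p * 10), 9)] += 1
print(counts)
```

With a much larger threshold the accepted samples become too dispersed and the histogram is expected to pile up around 0.5 instead of being flat.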

How to check that ABC works when you do not know the posterior
Cook et al. 2006 J. Comp. Graph. Stat.

[Figure: histograms of $p_i$ (frequency against $p_i$) in two panels, "No regression adjustment" and "With regression adjustment".]

Adaptive ABC
Sisson et al., PNAS 2007; Beaumont et al., Biometrika 2009; Del Moral et al., Stat and Comput 2011

Multi-step algorithms that sample θ from updated distributions that get closer and closer to the posterior distribution.
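One way to picture such a multi-step scheme is a population Monte Carlo sketch with decreasing thresholds, written here in the spirit of Beaumont et al. (2009) rather than as a faithful reproduction of any of the cited algorithms. The toy Gaussian model, the uniform prior, the threshold schedule and the kernel scale (twice the weighted variance of the previous population) are assumptions of this sketch.

```python
import math
import random

rng = random.Random(4)
SD_S = 1.0 / math.sqrt(20)   # toy model (assumed): s | theta ~ N(theta, 1/20)
LO, HI = -5.0, 5.0           # assumed uniform prior on theta

def prior_pdf(theta):
    return 1.0 / (HI - LO) if LO <= theta <= HI else 0.0

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def abc_pmc(s_obs, thresholds, n_particles):
    """Return weighted particles after the last (smallest) threshold."""
    particles, weights, tau = [], [], None
    for step, b in enumerate(thresholds):
        if step > 0:
            mean = sum(w * p for w, p in zip(weights, particles))
            var = sum(w * (p - mean) ** 2 for w, p in zip(weights, particles))
            tau = math.sqrt(2.0 * var)           # kernel scale (assumed choice)
        new_particles = []
        while len(new_particles) < n_particles:
            if step == 0:
                theta = rng.uniform(LO, HI)      # first step: sample the prior
            else:
                centre = rng.choices(particles, weights=weights)[0]
                theta = rng.gauss(centre, tau)   # perturb a previous particle
                if prior_pdf(theta) == 0.0:
                    continue
            if abs(rng.gauss(theta, SD_S) - s_obs) <= b:
                new_particles.append(theta)
        if step == 0:
            new_weights = [1.0] * n_particles
        else:
            # Importance weight: prior density over the proposal mixture density.
            new_weights = [
                prior_pdf(t)
                / sum(w * normal_pdf(t, p, tau) for w, p in zip(weights, particles))
                for t in new_particles
            ]
        total = sum(new_weights)
        particles, weights = new_particles, [w / total for w in new_weights]
    return particles, weights

particles, weights = abc_pmc(s_obs=2.0, thresholds=[1.0, 0.3, 0.1], n_particles=500)
pmc_mean = sum(w * p for w, p in zip(weights, particles))
print(pmc_mean)
```

Each step proposes from a perturbed version of the previous population, so fewer simulations are wasted far from the region where the summary statistic can match the observed one.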

Model selection and related criticisms

Distinguishing between models
An example in human evolution
Fagundes et al., PNAS 2007

Rejection algorithm
Pritchard et al., MBE 1999

Run the same number of simulations under each model $M_k$, $k = 1, \dots, K$.

Accept the simulations for which $\|s_i - s_{\mathrm{obs}}\| \le b$.

The proportion of accepted simulations under each model $k = 1, \dots, K$ is an estimate of the posterior distribution $p(M_k \mid s_{\mathrm{obs}})$.
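A minimal sketch of this procedure, using the two coin-tossing models that appear later in these slides (a fair coin against a coin whose bias is uniform on (0, 1), with 9 heads observed in 20 tosses). The summary statistic is the number of heads; the threshold b = 0 and the function names are assumptions of the sketch.

```python
import random

rng = random.Random(5)
N_TOSSES, HEADS_OBS = 20, 9

def simulate_heads(model):
    """Model 0: fair coin. Model 1: bias q drawn uniformly on (0, 1)."""
    q = 0.5 if model == 0 else rng.random()
    return sum(rng.random() < q for _ in range(N_TOSSES))

def abc_model_choice(n_per_model, b):
    """Share of accepted simulations that came from each model."""
    accepted = {0: 0, 1: 0}
    for model in (0, 1):
        for _ in range(n_per_model):      # same number of simulations per model
            if abs(simulate_heads(model) - HEADS_OBS) <= b:
                accepted[model] += 1
    total = accepted[0] + accepted[1]
    return {m: accepted[m] / total for m in accepted}

post = abc_model_choice(n_per_model=50000, b=0)
print(post)
```

Running the same number of simulations under each model amounts to giving the models equal prior probabilities. With b = 0 and a discrete statistic, the estimate targets the exact p(M_k | s_obs), which is about 0.77 for the fair coin here.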

Model selection with logistic regression
Beaumont 2008

[Figure: posterior probability of model 1 against summary statistic, with simulations from Model 0 and Model 1 and the acceptance window from $-b$ to $+b$ around $S(y_0)$.]

Criticisms about model selection with ABC
Templeton, PNAS 2010

‘The probability of the nested special case must be less than or equal to the probability of the general model within which the special case is nested’.

In a proper Bayesian framework, two hypotheses represent non-nested events.

Coin tossing example: $M_0: q = 0.5$, $M_1: q \sim U(0,1)$, 9 heads out of 20 tosses.

$$p(M_0 \mid 9/20) \approx 3.36\, p(M_1 \mid 9/20)$$
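The factor 3.36 can be checked directly. The sketch assumes equal prior probabilities for the two models, so that the ratio of posterior probabilities equals the ratio of marginal likelihoods; under the uniform prior on q, the marginal probability of any number of heads in n tosses is 1/(n + 1).

```python
from math import comb

n, k = 20, 9
lik_m0 = comb(n, k) * 0.5 ** n   # p(k heads | M0), fair coin
lik_m1 = 1.0 / (n + 1)           # p(k heads | M1), binomial likelihood averaged over q ~ U(0, 1)
ratio = lik_m0 / lik_m1          # posterior ratio under equal prior model probabilities
print(round(ratio, 2))  # 3.36
```

The 'nested' model M0 is therefore more probable than the general model M1, which is the point of the counterexample.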

Criticisms about model selection with ABC
Robert et al., PNAS 2011

‘The algorithm involves an unknown loss of information induced by the use of insufficient summary statistics’.

Assume that $s$ is sufficient for parameter inference in model 0 and model 1.

$$\frac{p(M_0 \mid s_{\mathrm{obs}})}{p(M_1 \mid s_{\mathrm{obs}})} = g(D)\, \frac{p(M_0 \mid D)}{p(M_1 \mid D)},$$

with $g(D)$ possibly different from 1.

Some elements of an answer if your NSF reviewer keeps bothering you with this

The criticism pertains to situations where $s$ is sufficient for parameter inference in model 0 and model 1. Exercise: for coalescent models, try to find such a statistic for a constant-size population model and a bottleneck model.

In approximate Bayesian computation, we target $p(M \mid s)$ instead of $p(M \mid D)$.

An important question to address might be ‘Does $s$ contain enough information to distinguish between models?’

Model selection with ABC: the right answer?

Fagundes et al., PNAS 2007

A deviance criterion for model selection with ABC
François and Laval, SAGMB 2011

Estimators of $p(M \mid s)$ ignore regression adjustments on parameter samples.

$$\mathrm{DIC} = E_{\mathrm{post}}[\text{deviance}] + \text{effective number of parameters}$$

2-3 examples

Example 1: Models of origins for modern humans
Blum and Jakobsson, MBE 2011

[Figure: TMRCA distribution (density against millions of years, 0.0 to 3.0) for A) autosomal genes and B) X chromosome; curves: data, single origin model, low admixture with archaic humans.]

Models of origins for modern humans

[Figure: C) mtDNA and Y chromosome, density against millions of years (0.0 to 1.0); curves: single origin model, low admixture with archaic humans, bottleneck.]

Example 1: Testing the human ‘speciation’ bottleneck
Sjödin et al., MBE 2012

Lahr and Foley 1998

Estimating the strength of the bottleneck

[Figure: density against magnitude of reduction b (−1.5 to 0.0), with a secondary axis for the ratio of population sizes NA/NB (30, 10, 3, 1); curves: San, Biaka, Mandenka, prior.]

Support for a ‘no-bottleneck’ model against 2 bottleneck models
$\Pr(\text{no bottleneck} \mid s_{\mathrm{obs}}) \ge 79\%$

[Figure: demographic models "No bottleneck" and "Bottleneck" (founder hypothesis, fragmentation hypothesis), with population sizes N0, NA, NB and times Tgr, Tdur = 20-60 kya and 130 kya; map labels: max. expansion of Sahara, SW African desert.]

Is it possible to distinguish between models?

[Figure: proportion of assignment (0.0 to 1.0) against magnitude of reduction b (−1.5 to 0.0), with a secondary axis for the ratio of population sizes NA/NB (30, 10, 3, 1); panels: A. No bottleneck, B. Founder, C. Fragmentation; legend: fragmentation, founder, no bottleneck.]

Goodness of fit
Posterior predictive checks + PCA

[Figure: PC2 against PC1 (−6 to 6) in three panels, San, Biaka and Mandenka; legend: high mutation, low mutation; no bottleneck, bottleneck founder, bottleneck fragmentation.]

Example 2: Species delimitation with ABC
Camargo et al., Evolution 2012

[Figure: recoverable labels A, B, C, (A,B), (C,]

Gene trees of loci sampled for species delimitation analyses

Prior predictive checks

Example 3: Fitting models of continuous trait evolution
Slater et al., Evolution 2012

Time-calibrated phylogeny of Carnivora used to estimate rates of trait evolution

Comparing a two-rate model $M_2$ to a one-rate model $M_1$

If $p(M_1 \mid s) = x\%$, there is a probability of $x\%$ that $s$ was generated from $M_1$ (and $(100 - x)\%$ that $s$ was generated from $M_2$).

Cook et al. 2006 in a model selection framework.
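This calibration property can be checked by simulation. The sketch below does not use the trait-evolution models of the example; as stand-ins it takes two coin-tossing models (M1: fair coin, M2: bias uniform on (0, 1)) for which the posterior model probability is available in closed form, with equal prior model probabilities. All names are assumptions of the sketch.

```python
import random
from math import comb

rng = random.Random(6)
N_TOSSES = 20

def posterior_m1(heads):
    """Exact p(M1 | heads) for the stand-in models, equal prior probabilities."""
    lik_m1 = comb(N_TOSSES, heads) * 0.5 ** N_TOSSES   # M1: fair coin
    lik_m2 = 1.0 / (N_TOSSES + 1)                      # M2: q ~ U(0, 1)
    return lik_m1 / (lik_m1 + lik_m2)

# For each value of the statistic, count how often it was generated from M1.
counts = {h: [0, 0] for h in range(N_TOSSES + 1)}      # heads -> [from M1, total]
for _ in range(100000):
    from_m1 = rng.random() < 0.5                       # model drawn from its prior
    q = 0.5 if from_m1 else rng.random()
    heads = sum(rng.random() < q for _ in range(N_TOSSES))
    counts[heads][0] += from_m1
    counts[heads][1] += 1

worst = 0.0
for h, (k, n) in counts.items():
    if n == 0:
        continue
    worst = max(worst, abs(k / n - posterior_m1(h)))
    print(h, round(posterior_m1(h), 3), round(k / n, 3))
```

When the posterior model probabilities are computed by ABC instead of exactly, a large gap between the two printed columns indicates that the estimated probabilities are miscalibrated.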

Checking the ‘consistency’ of the Bayes factor

Comparing pinniped and terrestrial carnivore body size evolutionary rates

Conclusion

ABC incorporates all aspects of Bayesian data analysis: formulation, fitting and model selection, and improvement of a model through model checking (Csilléry et al., TREE 2010).

To address issues related to model selection:

1 Ability to distinguish between models
2 ‘Consistency’ of the Bayes factor
3 Buy a bottle of wine for the reviewer

The R package abc implements several ABC algorithms.

Colleagues