Accommodating clustered divergences in phylogenetic inference

Jamie R. Oaks (1, 2)

1 Department of Biology, University of Washington

2 Department of Biological Sciences, Auburn University

October 21, 2015

© 2007 Boris Kulikov boris-kulikov.blogspot.com

Clustered diversification Jamie Oaks – phyletica.org 1/27

- Phylogenetics is rapidly progressing as an endeavor of statistical inference

- “Big data” present exciting possibilities and computational challenges

- Exciting opportunities to develop new ways to study biology in the light of phylogeny

Current state of phylogenetics

- Assumption: Divergences are independent across the tree

- We know this assumption is frequently violated

- Why account for this non-independence?

  1. Improve inference

  2. Provide a framework for studying processes of co-diversification

- This is a model-choice problem

Divergence model choice

[Figure: a sequence of panels, each labeled T1, T2, T3, showing divergence times τ1 (one panel); τ1 and τ2 (three panels); and τ1, τ2, and τ3 (one panel).]

Inferring co-diversification

[Figure: five divergence models, m1 to m5, each shown with T1, T2, T3 and its divergence times: τ1 (m1); τ1 and τ2 (m2, m3, m4); τ1, τ2, and τ3 (m5). Each model mi is labeled with its posterior probability, p(mi | X).]

We want to infer m and T given DNA sequence alignments X

$p(m_i \mid X) \propto p(X \mid m_i)\, p(m_i)$

$p(X \mid m_i) = \int_{\theta} p(X \mid \theta, m_i)\, p(\theta \mid m_i)\, d\theta$

- Divergence times

- Gene trees

- Substitution parameters

- Demographic parameters
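The model-choice relation above can be illustrated numerically. The following is a minimal Python sketch, not the method used in the talk: it assumes the log marginal likelihoods log p(X | m_i) are already available (the values below are invented for illustration) and simply normalizes p(X | m_i) p(m_i) across the candidate models. The function name `posterior_model_probs` is hypothetical.

```python
import math


def posterior_model_probs(log_marginal_likelihoods, prior_probs):
    """Normalize p(X | m_i) * p(m_i) over models to get p(m_i | X).

    Works in log space (log-sum-exp) to avoid numerical underflow.
    """
    log_unnorm = [
        ll + math.log(p)
        for ll, p in zip(log_marginal_likelihoods, prior_probs)
    ]
    max_log = max(log_unnorm)
    log_total = max_log + math.log(
        sum(math.exp(v - max_log) for v in log_unnorm)
    )
    return [math.exp(v - log_total) for v in log_unnorm]


# Hypothetical log marginal likelihoods for the five models m1..m5,
# with a uniform prior over models (illustrative values only).
log_ml = [-1050.0, -1048.5, -1049.2, -1051.0, -1047.9]
prior = [0.2] * 5
posterior = posterior_model_probs(log_ml, prior)
```

In practice the hard part is obtaining p(X | m_i) itself, since it requires integrating over all of the parameters in θ.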

Challenges:

1. Cannot solve all the integrals analytically

   - Numerical approximation via approximate-likelihood Bayesian computation (ABC)

2. Sampling over all possible models

   - 5 taxa = 52 models
   - 10 taxa = 115,975 models
   - 20 taxa = 51,724,158,235,372 models!!
   - A “diffuse” Dirichlet process prior (DPP)

J. R. Oaks et al. (2013). Evolution 67: 991–1010, J. R. Oaks (2014). BMC Evolutionary Biology 14: 150
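The model counts above match the Bell numbers, which count the ways of partitioning n items (here, taxa) into non-empty groups (here, divergence events). A short Python sketch using the Bell triangle recurrence reproduces them; the function name `bell_number` is chosen for illustration.

```python
def bell_number(n):
    """Return the n-th Bell number using the Bell triangle."""
    row = [1]  # first row of the Bell triangle; row[0] is B(0)
    for _ in range(n):
        # Each new row starts with the last entry of the previous row;
        # each later entry adds the previous row's entry to the one before it.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]


for n_taxa in (5, 10, 20):
    print(n_taxa, bell_number(n_taxa))
```

The rapid growth is why every model cannot be evaluated one at a time, and why a prior that can be sampled over the whole space of partitions is attractive.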

“Easy” as ABC

[Figure: DNA sequence alignments, with labels S1, S2, and S3.]

[Figure: three-dimensional plot with axes S1, S2, and S3, each ranging from 0.0 to 1.0.]
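The general idea of ABC is to replace likelihood evaluation with simulation: draw parameters from the prior, simulate data, reduce the data to summary statistics, and keep the draws whose summaries land close to the observed ones. The following is a toy Python sketch of a rejection sampler under assumed conditions (a single parameter, the mean of a normal distribution with known variance, a uniform prior, and the sample mean as the only summary statistic). It is not the dpp-msbayes implementation, and all names and settings are hypothetical.

```python
import random
import statistics


def abc_rejection(observed_summary, n_draws, tolerance, n_obs, rng):
    """Toy ABC rejection sampler for the mean of a Normal(mu, 1)."""
    accepted = []
    for _ in range(n_draws):
        mu = rng.uniform(-5.0, 5.0)  # draw a parameter value from the prior
        simulated = [rng.gauss(mu, 1.0) for _ in range(n_obs)]  # simulate data
        summary = statistics.fmean(simulated)  # reduce to a summary statistic
        if abs(summary - observed_summary) <= tolerance:
            accepted.append(mu)  # keep draws whose summary is near the observed
    return accepted


rng = random.Random(1)
observed = [rng.gauss(2.0, 1.0) for _ in range(50)]  # stand-in "observed" data
posterior_sample = abc_rejection(
    observed_summary=statistics.fmean(observed),
    n_draws=5000,
    tolerance=0.1,
    n_obs=50,
    rng=rng,
)
```

The accepted values approximate a sample from the posterior; a smaller tolerance gives a better approximation at the cost of rejecting more simulations.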


Sampling divergence models: a novel approach

- The divergence models are ways of assigning our taxa to events

- A Dirichlet process prior (DPP) model is a convenient and flexible solution

- Common Bayesian approach to assigning variables to an unknown number of categories

- Controlled by “concentration” parameter: α

[Image: Peter Dirichlet]

[Figure: tree of Dirichlet-process assignments, with branch probabilities $\frac{\alpha}{\alpha+1}$, $\frac{1}{\alpha+1}$, $\frac{\alpha}{\alpha+2}$, $\frac{1}{\alpha+2}$, and $\frac{2}{\alpha+2}$. The resulting probabilities of the five outcomes are shown for α = 0.5 and α = 10.0.]

$\left(\frac{\alpha}{\alpha+1}\right)\left(\frac{\alpha}{\alpha+2}\right)$ = 0.067 (α = 0.5), 0.758 (α = 10.0)

$\left(\frac{\alpha}{\alpha+1}\right)\left(\frac{1}{\alpha+2}\right)$ = 0.133 (α = 0.5), 0.076 (α = 10.0)

$\left(\frac{\alpha}{\alpha+1}\right)\left(\frac{1}{\alpha+2}\right)$ = 0.133 (α = 0.5), 0.076 (α = 10.0)

$\left(\frac{1}{\alpha+1}\right)\left(\frac{\alpha}{\alpha+2}\right)$ = 0.133 (α = 0.5), 0.076 (α = 10.0)

$\left(\frac{1}{\alpha+1}\right)\left(\frac{2}{\alpha+2}\right)$ = 0.533 (α = 0.5), 0.015 (α = 10.0)
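The probabilities above follow from assigning taxa to events one at a time: taxon i (counting from zero) starts a new event with probability α / (α + i), or joins an existing event with probability proportional to the number of taxa already in it. A minimal Python sketch of that calculation is given below; the function name and the label encoding of models are assumptions made for illustration.

```python
def dpp_partition_prob(assignment, alpha):
    """Prior probability of one assignment of taxa to events under a DPP.

    `assignment` lists an event label for each taxon, in order, e.g.
    [0, 0, 1] means taxa 1 and 2 share an event and taxon 3 has its own.
    """
    prob = 1.0
    counts = {}  # event label -> number of taxa already assigned to it
    for i, label in enumerate(assignment):
        if label in counts:
            # Join an existing event with probability (size) / (alpha + i)
            prob *= counts[label] / (alpha + i)
            counts[label] += 1
        else:
            # Start a new event with probability alpha / (alpha + i)
            prob *= alpha / (alpha + i)
            counts[label] = 1
    return prob


# The five ways of assigning three taxa to divergence events
models = [[0, 1, 2], [0, 1, 0], [0, 1, 1], [0, 0, 1], [0, 0, 0]]
for alpha in (0.5, 10.0):
    probs = [dpp_partition_prob(m, alpha) for m in models]
    print(alpha, [round(p, 3) for p in probs])
```

Rounded to three decimals, the output should agree with the values listed above: a small α favors shared divergence events, while a large α favors independent ones.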

New method: dpp-msbayes

- Flexible Dirichlet-process prior (DPP) over all possible divergence models

- Flexible priors on parameters to avoid strongly weighted posteriors

- Multi-processing to accommodate genomic datasets

J. R. Oaks (2014). BMC Evolutionary Biology 14: 150

dpp-msbayes: Simulation-based assessment

Validation:

- Simulate 50,000 datasets and analyze each under the same model

Robustness:

- Simulate datasets that violate model assumptions and analyze each of them

dpp-msbayes: Validation results

[Figure: true probability of one divergence (y-axis, 0.0 to 1.0) plotted against posterior probability of one divergence (x-axis, 0.0 to 1.0).]

J. R. Oaks (2014). BMC Evolutionary Biology 14: 150

dpp-msbayes: Robustness results

[Figure: two panels of true probability of one divergence (y-axis, 0.0 to 1.0) plotted against posterior probability of one divergence (x-axis, 0.0 to 1.0).]

J. R. Oaks (2014). BMC Evolutionary Biology 14: 150
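Plots of this kind compare, within bins of estimated posterior probability, the fraction of simulated datasets for which the hypothesis was actually true; a well-behaved method falls near the identity line. The Python sketch below shows one way such a check could be computed. The input here is synthetic and calibrated by construction, so it illustrates the bookkeeping only and does not reproduce any result from the paper; the function name and the choice of ten bins are assumptions.

```python
import random


def calibration_bins(posterior_probs, is_true, n_bins=10):
    """Bin estimated posterior probabilities and report, per bin, the mean
    estimate, the fraction of replicates where the hypothesis was true,
    and the number of replicates."""
    sums = [0.0] * n_bins
    hits = [0] * n_bins
    counts = [0] * n_bins
    for p, truth in zip(posterior_probs, is_true):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        sums[b] += p
        hits[b] += 1 if truth else 0
        counts[b] += 1
    return [
        (sums[b] / counts[b], hits[b] / counts[b], counts[b])
        for b in range(n_bins)
        if counts[b] > 0
    ]


# Synthetic input: the hypothesis is true with probability equal to
# its reported posterior probability (perfect calibration).
rng = random.Random(7)
probs = [rng.random() for _ in range(50000)]
truths = [rng.random() < p for p in probs]
bins = calibration_bins(probs, truths)
```

Each tuple in `bins` gives one point for such a plot: the mean estimated probability on the x-axis and the observed frequency on the y-axis.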

dpp-msbayes: Performance

- New method for estimating shared evolutionary history shows:

  1. Model-choice accuracy
  2. Robustness to model violations
  3. Power to detect variation in divergence times
  4. It’s fast!

- A new tool for biologists to leverage comparative genomic data to explore processes of co-diversification

J. R. Oaks (2014). BMC Evolutionary Biology 14: 150

Empirical applications

Did repeated fragmentation of islands during inter-glacial rises in sea level promote diversification?

Climate-driven diversification

Results

[Figure: posterior probability (y-axis, 0.00 to 0.10) of the number of divergence events (x-axis, 1 to 21).]

[Figure: sea level in meters (y-axis, 0 to -100) against time in kya (x-axis, 0 to 500).]

J. R. Oaks (2014). BMC Evolutionary Biology 14: 150

More data!

- Collecting genomic data from taxa co-distributed across Southeast Asian Islands and Mainland

- Preliminary results for 1000 loci from 5 pairs of Gekko mindorensis populations

[Figure: 2ln(Bayes factor) (y-axis, -5.0 to 3.0) for the number of divergence events, |τ| (x-axis, 1 to 5).]

Diversification across African rainforests

- Did climate cycles drive diversification and community assembly across rainforest taxa?

- Preliminary results with 300 loci from 3 taxa

[Figure: 2ln(Bayes factor) (y-axis, -1.5 to 1.5) for the number of divergence events, |τ| (x-axis, 1 to 3).]
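The quantity 2ln(Bayes factor) on the y-axis of plots like these can be computed from the prior and posterior probability of a hypothesis, for example a given number of divergence events, as the ratio of posterior odds to prior odds. The Python sketch below shows that arithmetic under this assumption; the numbers are illustrative and are not taken from the talk, and the function name is hypothetical.

```python
import math


def two_ln_bayes_factor(prior_prob, posterior_prob):
    """Return 2 * ln(BF), where BF = posterior odds / prior odds."""
    prior_odds = prior_prob / (1.0 - prior_prob)
    posterior_odds = posterior_prob / (1.0 - posterior_prob)
    return 2.0 * math.log(posterior_odds / prior_odds)


# Illustrative values only: a hypothesis with prior probability 0.2
# whose posterior probability rises to 0.5.
support = two_ln_bayes_factor(prior_prob=0.2, posterior_prob=0.5)
```

Positive values indicate that the data increased support for the hypothesis relative to the prior, and negative values indicate that the data decreased it.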

Conclusions

- New method for estimating shared evolutionary history
  - Shows good “frequentist” behavior
  - Relatively robust to model violations

- Finding support for temporally clustered divergences in multiple systems

- However, there is a lot of uncertainty!

Page 81: Accommodating clustered divergences in phylogenetic inference

Current work: More power

I Full-likelihood Bayesian implementation
  I Uses all the information in the data
  I Applicable to deeper timescales

I Analytically integrate over gene trees¹
  I Very efficient numerical approximation of posterior
  I Applicable to NGS datasets

¹ D. Bryant et al. (2012). Molecular Biology and Evolution 29: 1917–1932

Page 85: Accommodating clustered divergences in phylogenetic inference

Next step: A general framework

I Develop a framework for inferring shared divergences across phylogenies

I Generalize Bayesian phylogenetics to incorporate shared divergences

I Sample models numerically via reversible-jump Markov chain Monte Carlo

Benefits:

I Improve phylogenetic inference

I Framework for studying processes of co-diversification

[Figure: trees T1, T2, and T3 with divergence times τ1 and τ2]
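One reason to sample models numerically is that the number of possible models grows quickly. The following is a minimal sketch, assuming for illustration that a model assigns each of n divergences to one of k shared events, so that models correspond to set partitions. This is a counting exercise, not an implementation of reversible-jump MCMC, and the function names are hypothetical.

```python
def stirling2_table(n):
    """Build a table S where S[i][k] is the number of ways to
    partition i items into k non-empty groups (Stirling numbers
    of the second kind)."""
    S = [[0] * (n + 1) for _ in range(n + 1)]
    S[0][0] = 1
    for i in range(1, n + 1):
        for k in range(1, i + 1):
            # Item i either joins one of k existing groups
            # or starts a new group of its own.
            S[i][k] = k * S[i - 1][k] + S[i - 1][k - 1]
    return S


def count_models(n):
    """Return {k: number of models with k divergence events}
    for n divergences, under the set-partition assumption."""
    S = stirling2_table(n)
    return {k: S[n][k] for k in range(1, n + 1)}


for n in (3, 5, 10, 20):
    counts = count_models(n)
    print(n, "divergences:", sum(counts.values()), "possible models")
```

Under this assumption, the total for each n is a Bell number, and even a few dozen divergences yield far more models than could be evaluated one by one.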

Page 90: Accommodating clustered divergences in phylogenetic inference

Everything is on GitHub...

Software:

I dpp-msbayes: https://github.com/joaks1/dpp-msbayes

I PyMsBayes: https://joaks1.github.io/PyMsBayes

I ABACUS: Approximate BAyesian C UtilitieS. https://github.com/joaks1/abacus

Open-Science Notebook:

I msbayes-experiments: https://github.com/joaks1/msbayes-experiments

Page 91: Accommodating clustered divergences in phylogenetic inference

Acknowledgments

Ideas and feedback:

I Leaché Lab

I Minin Lab

I Holder Lab

I Brown Lab/KU Herpetology

Computation:

Funding:

Photo credits:

I Rafe Brown, Cam Siler, Jesse Grismer, & Jake Esselstyn

I FMNH Philippine Mammal Website:
  I D.S. Balete, M.R.M. Duya, & J. Holden

I PhyloPic!

Page 92: Accommodating clustered divergences in phylogenetic inference

Questions?

[email protected]

© 2007 Boris Kulikov boris-kulikov.blogspot.com
