a relaxed drift di usion model for phylogenetic trait ... · fusion model for evolution of...

35
A Relaxed Drift Diffusion Model for Phylogenetic Trait Evolution Mandev S. Gill 1 , Lam Si Tung Ho 1 , Guy Baele 2 , Philippe Lemey 2 , and Marc A. Suchard 1,3,4 1 Department of Biostatistics, Jonathan and Karin Fielding School of Public Health, University of California, Los Angeles, United States 2 Department of Microbiology and Immunology, Rega Institute, KU Leuven, Leuven, Belgium 3 Department of Biomathematics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, United States 4 Department of Human Genetics, David Geffen School of Medicine at UCLA, Universtiy of California, Los Angeles, United States Abstract Understanding the processes that give rise to quantitative measurements associated with molecular sequence data remains an important issue in statistical phylogenetics. Examples of such measurements include geographic coordinates in the context of phy- logeography and phenotypic traits in the context of comparative studies. A popular approach is to model the evolution of continuously varying traits as a Brownian dif- fusion process acting on a phylogenetic tree. However, standard Brownian diffusion is quite restrictive and may not accurately characterize certain trait evolutionary pro- cesses. Here, we relax one of the major restrictions of standard Brownian diffusion by incorporating a nontrivial estimable drift into the process. We introduce a re- laxed drift diffusion model for the evolution of multivariate continuously varying traits along a phylogenetic tree via Brownian diffusion with drift. Notably, the relaxed drift model accommodates branch-specific variation of drift rates while preserving model identifiability. Furthermore, our development of a computationally efficient dynamic programming approach to compute the data likelihood enables scaling of our method to large data sets frequently encountered in viral evolution. We implement the relaxed drift model in a Bayesian inference framework to simultaneously reconstruct the evo- lutionary histories of molecular sequence data and associated multivariate continuous trait data, and provide tools to visualize evolutionary reconstructions. We illustrate the utility of our approach in three viral examples. In the first two, we examine the spatiotemporal spread of HIV-1 in central Africa and West Nile virus in North Amer- ica and show that a relaxed drift approach uncovers a clearer, more detailed picture of the dynamics of viral dispersal than standard Brownian diffusion. Finally, we study antigenic evolution in the context of HIV-1 resistance to three broadly neutralizing antibodies. Our analysis reveals evidence of a continuous drift at the HIV-1 popu- lation level towards enhanced resistance to neutralization by the VRC01 monoclonal antibody over the course of the epidemic. arXiv:1512.07948v2 [q-bio.PE] 29 Dec 2015

Upload: others

Post on 08-Oct-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A Relaxed Drift Di usion Model for Phylogenetic Trait ... · fusion model for evolution of multivariate traits in a Bayesian framework that integrates it with models for phylogenetic

A Relaxed Drift Diffusion Model for PhylogeneticTrait Evolution

Mandev S. Gill1, Lam Si Tung Ho1, Guy Baele2, Philippe Lemey2,and Marc A. Suchard1,3,4

1Department of Biostatistics, Jonathan and Karin Fielding School of Public Health, University ofCalifornia, Los Angeles, United States

2Department of Microbiology and Immunology, Rega Institute, KU Leuven, Leuven, Belgium3Department of Biomathematics, David Geffen School of Medicine at UCLA, University of

California, Los Angeles, United States4Department of Human Genetics, David Geffen School of Medicine at UCLA, Universtiy of

California, Los Angeles, United States

Abstract

Understanding the processes that give rise to quantitative measurements associatedwith molecular sequence data remains an important issue in statistical phylogenetics.Examples of such measurements include geographic coordinates in the context of phy-logeography and phenotypic traits in the context of comparative studies. A popularapproach is to model the evolution of continuously varying traits as a Brownian dif-fusion process acting on a phylogenetic tree. However, standard Brownian diffusionis quite restrictive and may not accurately characterize certain trait evolutionary pro-cesses. Here, we relax one of the major restrictions of standard Brownian diffusionby incorporating a nontrivial estimable drift into the process. We introduce a re-laxed drift diffusion model for the evolution of multivariate continuously varying traitsalong a phylogenetic tree via Brownian diffusion with drift. Notably, the relaxed driftmodel accommodates branch-specific variation of drift rates while preserving modelidentifiability. Furthermore, our development of a computationally efficient dynamicprogramming approach to compute the data likelihood enables scaling of our methodto large data sets frequently encountered in viral evolution. We implement the relaxeddrift model in a Bayesian inference framework to simultaneously reconstruct the evo-lutionary histories of molecular sequence data and associated multivariate continuoustrait data, and provide tools to visualize evolutionary reconstructions. We illustratethe utility of our approach in three viral examples. In the first two, we examine thespatiotemporal spread of HIV-1 in central Africa and West Nile virus in North Amer-ica and show that a relaxed drift approach uncovers a clearer, more detailed pictureof the dynamics of viral dispersal than standard Brownian diffusion. Finally, we studyantigenic evolution in the context of HIV-1 resistance to three broadly neutralizingantibodies. Our analysis reveals evidence of a continuous drift at the HIV-1 popu-lation level towards enhanced resistance to neutralization by the VRC01 monoclonalantibody over the course of the epidemic.

arX

iv:1

512.

0794

8v2

[q-

bio.

PE]

29

Dec

201

5

Page 2: A Relaxed Drift Di usion Model for Phylogenetic Trait ... · fusion model for evolution of multivariate traits in a Bayesian framework that integrates it with models for phylogenetic

1 Introduction

Phylogenetic inference has emerged as an important tool for understanding patternsof molecular sequence variation over time. Along with the increasing availability of molecularsequence data, there has been a growth of associated sources of information, such as spatialand phenotypic trait data, underscoring the need for integrated models of sequence andtrait evolution on phylogenies, which promise to deliver more precise insights and increaseopportunities for statistical hypothesis testing.

Much of the development of trait evolution models has been motivated by phylogeneticcomparative approaches focusing on phenotypic and ecological traits. Traditional compara-tive methods assess the correlation between traits through standard regression models thatassume taxa traits are generated independently by the same distribution. This assumptionis obviously violated by taxa traits due to their shared ancestry. A proper understanding ofpatterns of correlation between traits can be achieved only by accounting for their sharedevolutionary history (Felsenstein 1985; Harvey and Pagel 1991), and comparative methodsfocus on relating observed phenotype information to an evolutionary history.

Trait evolution has been tackled from another angle in phylogeographic approachesfocusing on geographic locations rather than phenotypic traits. Evolutionary change is bet-ter understood when accounting for its geographic context, and phylogeographic inferencemethods aim to connect the evolutionary and spatial history of a population (Bloomquistet al. 2010). Phylogeographic techniques have allowed researchers to better understand theorigin, spread, and dynamics of emerging infectious diseases. Examples include the humaninfluenza A virus (Rambaut et al. 2008; Smith et al. 2009; Lemey et al. 2009b), rabies viruses(Biek et al. 2007; Seetahal et al. 2013), dengue virus (Bennett et al. 2010; Allicock et al.2012) and hepatitis B virus (e.g. Mello et al. (2013)).

While methods for phenotypic and phylogeographic analyses are developed with dif-ferent data in mind, they address similar situations and it is appropriate to speak moregenerally of trait evolution. Two key components required for modeling phylogenetic traitevolution are a method for incorporating phylogenetic information and a model of an evolu-tionary process on a phylogeny giving rise to the observed traits. Many popular approachesfirst reconstruct a phylogenetic tree and condition inferences pertaining to the trait evolutionprocess on this fixed tree. However, computational advances, particularly in Markov chainMonte Carlo (MCMC) sampling techniques, have made it possible to control for phylogeneticuncertainty (as well as uncertainty in other important model parameters) through integratedmodels that jointly estimate parameters of interest (Huelsenbeck and Rannala 2003; Lemeyet al. 2010).

The evolution of discrete traits has typically been modeled using continuous-timeMarkov chains (Felsenstein 1981; Pagel 1999; Lemey et al. 2009a), analogous to substitutionmodels for molecular sequence characters. However, phenotypic and geographic traits are of-ten continuously distributed, and while meaningful inferences may still be made partitioningthe state space into finite parts, stochastic processes with continuous state spaces representa more natural approach. A popular choice to model continous trait evolution along thelineages of a phylogenetic tree is Brownian diffusion (Felsenstein 1985). Lemey et al. (2010)

Page 3: A Relaxed Drift Di usion Model for Phylogenetic Trait ... · fusion model for evolution of multivariate traits in a Bayesian framework that integrates it with models for phylogenetic

and Pybus et al. (2012) have recently developed a computationally efficient Brownian dif-fusion model for evolution of multivariate traits in a Bayesian framework that integratesit with models for phylogenetic reconstruction and molecular evolution. Notably, their fullprobabilistic approach accounts for uncertainty in the phylogeny, demographic history andevolutionary parameters. Trait evolution is modeled as a multivariate time-scaled mixtureof Brownian diffusion processes with a zero-mean displacement (or, in other words, neutraldrift) along each branch of the possibly unknown phylogeny.

While adopting a mixture of Brownian diffusion processes is a popular and useful ap-proach, it may not appropriately describe the evolutionary process in certain situations. Suchscenarios are more realistically modeled by more sophisticated diffusion processes. Theremay, for example, be selection toward an optimal trait value. To address this phenomenon,there has been considerable development of mean-reverting Ornstein-Uhlenbeck process mod-els for trait evolution, featuring a stochastic Brownian component along with a deterministiccomponent (Hansen 1997; Butler and King 2004; Bartoszek et al. 2012).

Another trait evolutionary process inadequately modeled by standard Brownian dif-fusion is one characterized by directional trends. The need for relaxing the assumption ofneutral drift is highlighted by a number of evolutionary scenarios in which there are ap-parent trends in the direction of variations, including antigenic drift in influenza (Bedfordet al. 2014), the evolution of body mass in carnivores (Lartillot and Poujol 2011), and dis-persal patterns of viral outbreaks (Pybus et al. 2012). To this end, we extend the Bayesianmultivariate Brownian diffusion framework of Lemey et al. (2010) to allow for an unknownestimable nonzero drift vector for the mean displacement in a computationally efficient man-ner. While inclusion of a nontrivial drift represents a promising first step, a constant driftrate may not hold over an entire evolutionary history. We address this issue by presentinga flexible relaxed drift model that permits multiple drift rates on a phylogenetic tree. Im-portantly, we equip the model with machinery to infer the number of different drift ratessupported by the data as well as the locations of rate changes.

We apply our relaxed drift diffusion methodology to three viral examples of clinicalimportance. In the first two examples, we illustrate our approach in a phylogeographic set-ting by investigating the spatial diffusion of HIV-1 in central Africa and the West Nile virusin North America. For the third example, we explore antigenic evolution in the context ofenhanced resistance of HIV-1 to broadly neutralizing antibodes over the course of the epi-demic. We employ model selection techniques to compare the nested drift-neutral, constantdrift and relaxed drift Brownian diffusion models. We demonstrate a better fit by relaxingthe restrictive drift-neutral assumption, and an improved ability to uncover and quantifykey aspects of trait evolution dynamics.

2 Methods

We start by assuming we have a dataset of N aligned molecular sequences X =(X1, . . . ,XN) along with N associated M -dimensional, continuously varying traits Y =(Y1, . . . ,YN). The sequence and trait data correspond to the tips of an unknown yet es-

Page 4: A Relaxed Drift Di usion Model for Phylogenetic Trait ... · fusion model for evolution of multivariate traits in a Bayesian framework that integrates it with models for phylogenetic

timable phylogenetic tree τ . Later we will discuss accounting for phylogenetic uncertainty,modeling the molecular evolution process giving rise to X and integrating it with a modelfor trait evolution. But first, we explore trait evolution on a fixed phylogeny via a diffusionprocess acting conditionally independently along its branches.

The N -tipped bifurcating phylogenetic tree τ is a graph with a set of vertices V =(V1, . . . ,V2N−1) and edge weights T = (t1, . . . , t2N−2). The vertices correspond to nodes ofthe tree and, as the length of the tree τ is measured in units of time, T consists of timescorresponding to branch lengths. Each external node Vi for i = 1, . . . , N is of degree 1, withone parent node Vpa(i) from within the internal or root nodes. Each internal node Vi fori = N + 1, . . . , 2N − 2 is of degree 3 and the root node V2N−1 is of degree 2. An edge withweight ti connects Vi to Vpa(i), and we refer to this edge as branch i. In addition to theobserved traits Y1, . . . ,YN at the external nodes, we posit for mathematical convenienceunobserved traits YN+1, . . . ,Y2N−1 at the internal nodes and root.

Brownian diffusion (also known as a Wiener process) is a continuous-time stochasticprocess originally developed to model the random motion of a physical particle (Brown1828; Wiener 1958). Formally, a standard multivariate Brownian diffusion process W(t) ischaracterized by the following properties: W(0) = 0, the map t 7→ W(t) is almost surelycontinuous, W(t) has independent increments and, for 0 ≤ s ≤ t, W(t) −W(s) has amultivariate normal distribution with mean 0 and variance matrix (t− s)I.

Recent phylogenetic comparative methods (Vrancken et al. 2015; Cybis et al. 2015)aim to model the correlated evolution between multiple traits and, to this end, employ acorrelated multivariate Brownian diffusion with displacement variance (t − s)P−1. Here, Pis an M ×M infinitesimal precision matrix. The mean of 0 posits a neutral drift so thatthe traits do not evolve according to any systematic directional trend. Matrix P determinesthe intensity and correlation of the trait diffusion after controlling for shared evolutionaryhistory.

The Brownian diffusion process along a phylogeny produces the observed traits bystarting at the root node and proceeding down the branches of τ . The displacement Yi −Ypa(i) along a branch is multivariate normally distributed, centered at 0 with variance tiP

−1

proportional to the length of the branch. Therefore, conditioning on the trait value Ypa(i)

at the parent node, we have

Yi|Ypa(i) ∼ N(Ypa(i), tiP

−1) . (1)

An extension that introduces branch-specific mixing parameters φi into the process thatrescale ti 7→ φiti yields a mixture of Brownian processes and remains popular in phylogeog-raphy (Lemey et al. 2010).

2.1 Drift

To incorporate a directional trend, we adopt a multivariate correlated Brownian diffu-sion process with a non-neutral drift. In our drift diffusion process we replace the zero meanof the increment W(t) −W(s) with the time-scaled mean vector (t − s)µ. The expected

Page 5: A Relaxed Drift Di usion Model for Phylogenetic Trait ... · fusion model for evolution of multivariate traits in a Bayesian framework that integrates it with models for phylogenetic

difference between the trait values of a descendant and its ancestor is determined by theoverall drift vector µ and the time elapsed between descendant and ancestor. This yieldswhat we will call the constant drift model:

Yi|Ypa(i) ∼ N(Ypa(i) + tiµ, tiP

−1) . (2)

While this approach is useful for modeling general directional trends, it is quite re-strictive in that the drift µ is fixed over the entire phylogeny. We can relax this assumptionby introducing branch-specific drift vectors µi:

Yi|Ypa(i) ∼ N(Ypa(i) + tiµi, tiP

−1) (3)

for i = 1, . . . , 2N − 2. We assign the root a conjugate prior

Y2N−1 ∼ N(µ∗, (φP)−1

), (4)

that is relatively uninformative for small values of φ.Conditioning on the trait value Y2N−1 at the root of τ , the joint distribution of

observed traits Y1, . . . ,YN can be expressed as

vec [Y] |(Y2N−1,P,Vτ ,µτ ) ∼ N(Yroot + (T⊗ IM)µτ ,P

−1 ⊗Vτ

), (5)

building on a similar construction for drift-neutral Brownian diffusion (Felsenstein 1973;Freckleton et al. 2002). Here, vec[Y] is the vectorization of the column vectors Y1, . . . ,YN ,while IM is an M×M identity matrix, and ⊗ is the Kronecker product. Yroot is the NM×1vector (Yt

2N−1, . . . ,Yt2N−1)

t, and µτ is the (2N − 2)M × 1 drift rate vector (µt1, . . . ,µ

t2N−2)

t.The N×N variance matrix Vτ is a deterministic function of τ and represents the contributionof the phylogenetic tree to the covariance structure. Its diagonal entries Vii are equal to thedistance in time between the tip Vi and the root node V2N−1, and off-diagonal entries Vijcorrespond to the distance in time between the root node V2N−1 and the most recent commonancestor of tips Vi and Vj. Finally, the N × (2N −2) matrix T is defined as follows: Tij = tj,the length of branch j, if branch j is part of the path from the external node i to the root,and Tij = 0 otherwise.

Our development thus far clarifies some important issues. First, while it is tempting tomodel a unique drift rate on each branch, not all µi are uniquely identifiable in the likelihood(5). Care must be taken to impose necessary restrictions to ensure identifiability while stillpermitting sufficient drift rate variation, and we discuss an approach to achieve this in section2.3. Second, the variance matrix P−1⊗Vτ in (5) suggests a computational order ofO(N3M3)to evaluate the density. Repeated evaluation of (5) is necessary for numerical integration inBayesian modeling, and viral data sets may encompass thousands of sequences. Fortunately,Pybus et al. (2012) demonstrate that phylogenetic Brownian diffusion likelihoods can beevaluated in computational order O(NM2) by modeling in terms of the precision matrix P(as opposed to the variance) and adopting a dynamic programming approach. In section2.2, we present an adaptation of their algorithm for our drift diffusion likelihood.

Page 6: A Relaxed Drift Di usion Model for Phylogenetic Trait ... · fusion model for evolution of multivariate traits in a Bayesian framework that integrates it with models for phylogenetic

2.2 Multivariate Trait Peeling

Under our Brownian drift diffusion process, the joint distribution of all traits isstraightforwardly expressed as the product

P (Y1, . . . ,Y2N−1|τ,P,µ, φ) =

(2N−2∏i=1

P (Yi|Ypa(i),P, ti,µi)

)P (Y2N−1|P,µ∗, φ), (6)

where µ = (µ1, . . . ,µ2N−2,µ∗). The density of the observed traits can then be obtained

by integrating over all possible realizations of the unobserved traits at the root and internalnodes. We adopt a dynamic programming approach that is analogous to Felsenstein’s prun-ing method (Felsenstein 1981) and has been employed for drift-neutral Brownian diffusionlikelihoods (Pybus et al. 2012; Vrancken et al. 2015; Cybis et al. 2015) .

We wish to compute the density

P (Y1, . . . ,YN) =

∫· · ·∫P (Y1, . . . ,Y2N−1)dYN+1 . . . dY2N−1 (7)

=

∫· · ·∫ (2N−2∏

i=1

P (Yi|Ypa(i))

)P (Y2N−1)dYN+1 . . . dY2N−1. (8)

We have omitted dependence on the parameters τ,P,µ1, . . . ,µ2N−2,µ∗ and φ from thenotation for the sake of clarity. The integration proceeds in a postorder traversal integratingout one internal node trait at a time. Let {Yi} denote the set of observed trait valuesdescendant from and including the node Vi, and suppose pa(i) = pa(j) = k. Our traversalrequires computing integrals of the form

P ({Yk}|Ypa(k)) =

∫P ({Yi}|Yk)P ({Yj}|Yk)P (Yk|Ypa(k))dYk. (9)

Because the integrand is proportional to a multivariate normal density, it suffices to keeptrack of partial mean vectors mk, partial precision scalars pk and normalizing constants ρk.

Let MVN(.;κ,Λ) denote a multivariate normal probability density function with meanκ and precision Λ. We can rewrite conditional densities to facilitate integration with respectto the trait at the parent node:

P (Yi|Yk) = MVN

(Yi; tiµi + Yk,

1

tiP

)(10)

= MVN

(Yi − tiµi; Yk,

1

tiP

). (11)

For i = 1, . . . , N , set normalizing constant ρi = 1, partial mean

mi = Yi − tiµi (12)

and partial precision

pi =1

ti. (13)

Page 7: A Relaxed Drift Di usion Model for Phylogenetic Trait ... · fusion model for evolution of multivariate traits in a Bayesian framework that integrates it with models for phylogenetic

Then

P ({Yi}|Yk)P ({Yj}|Yk) = ρiρj ×MVN(mi; Yk, piP)× (14)

MVN(mj; Yk, pjP)

= ρk ×MVN(Yk; m∗k, (pi + pj)P) (15)

where partial unshifted mean

m∗k =pimi + pjmj

pi + pj, (16)

and normalizing constant

ρk = ρiρj

(pipj

2π(pi + pj)

)d/2|P|1/2

exp[−pi

2m′iPmi − pj

2m′jPmj

]exp

[−pi+pj

2m∗

′k Pm∗k

] . (17)

Multiplying by P (Yk|Ypa(k)) and integrating with respect to Yk, we get

P ({Yk}|Ypa(k)) =

∫P ({Yi}|Yk)P ({Yj}|Yk)P (Yk|Ypa(k))dYk (18)

= ρk ×MVN(Ypa(k); mk, pkP), (19)

wheremk = m∗k − tkµk, (20)

and

pk =1

tk + 1pi+pj

. (21)

Integrating out all internal node traits yields

P (Y1, . . . ,YN |Y2N−1) = ρ2N−1 ×MVN(Y2N−1; m∗2N−1, (p2N−2 + p2N−3)P). (22)

For the final step, we multiply by the conjugate root prior and integrate:

P (Y1, . . . ,YN) =

∫P (Y1, . . . ,YN |Y2N−1)P (Y2N−1)dY2N−1 (23)

= ρ2N−1MVN(m∗2N−1;µ∗, p2N−1P), (24)

where

p2N−1 =(p2N−2 + p2N−3)φ

p2N−2 + p2N−3 + φ. (25)

In practice, the algorithm visits each node in the phylogeny once and computes partialunshifted means m∗k, partial means mk, partial precisions pk, and normalizing constants ρk.

Page 8: A Relaxed Drift Di usion Model for Phylogenetic Trait ... · fusion model for evolution of multivariate traits in a Bayesian framework that integrates it with models for phylogenetic

2.3 Identifiability and Relaxed Drift

Ideally, we would like to model a unique drift rate µi on each branch i of the phyloge-netic tree. However, such lax assumptions open the door to misleading inferences. Adoptingthe notation of section 2.1, there can exist distinct drift rate vectors µτ 6= µ∗τ such that

(T⊗ IM)µτ = (T⊗ IM)µ∗τ , (26)

yielding identical trait likelihoods (5). The lack of model identifiability presents an obstacleto uncovering the “true” values of the drift rates that characterize the trait evolution process.

We propose a relaxed drift model that allows for drift rate variation along a phy-logenetic tree while maintaining model identifiability. This is achieved by having branchesinherit drift rates from ancestral branches by default, but allowing a random number ofspecific types of rate changes to occur along the tree. We describe the model here and referreaders to the Appendix for a detailed argument establishing identifiability.

We begin at the unobserved branch leading to the root, or most recent commonancestor (MRCA), of the phylogenetic tree τ and associate with it the drift µMRCA. Thenthe two branches emanating from the root node either both inherit the drift rate µMRCA, ora rate change occurs and one branch receives a new rate while the other branch assumes therate µMRCA. Similarly, whenever a branch splits into two anywhere in τ , either both childbranches assume the same drift rate as the parent branch, or one child branch takes on anew value while the other inherits its drift from the parent branch. Both child branchestaking on different drift rates than the parent branch is not permitted.

Importantly, rather than fix the type of drift rate transfer that occurs at a givennode, we estimate it from the data. The benefits of this choice are twofold. First, a ratechange is not forced when the data do not suggest a need for one. Unnecessarily imposing alarge number of unique drift rates to be inferred from limited data can lead to high varianceestimates. Second, in the event of a rate change occurring at a node, only one of the twochild branches can assume a new drift rate. We let the data determine which of the childbranches assumes the new rate. The data may support new rates on both child branches.While our model may seem too restrictive to accommodate such a scenario at first glance,we are able to infer the relative support for each child, and it is reflected in the posteriordistribution in terms of the probabilities of the two types of changes. Thus summaries ofthe posterior distribution can capture the true nature of drift rate variation in spite of theidentifiability restrictions.

It is important to handle the initial drift rate µMRCA with care. One option is toestimate µMRCA from the data just as with all other drift rates. However, such a choice maynot be ideal for data sets that exhibit relatively long periods of divergence from the MRCAto the sampling times. There is generally more information about diffusion dynamics duringtime periods overlapping with or close to sampling times. Likewise, the further removedthe MRCA is from sampling times, the less information there is about µMRCA and otherdrift rates near the MRCA. Because of the interconnectedness of branch drift rates in therelaxed model, estimates of µMRCA and neighboring drift rates under such circumstances mayprimarily reflect information about drift rates on branches near sampling times. To mitigate

Page 9: A Relaxed Drift Di usion Model for Phylogenetic Trait ... · fusion model for evolution of multivariate traits in a Bayesian framework that integrates it with models for phylogenetic

misleading inferences of drift near the MRCA, we can adopt an initial drift µMRCA = 0 andstill interprete changes in drift rates across the tree.

To parameterize the model, we associate a ternary variable δk with each internal nodek specifying how it passes on its drift rate to its child nodes. Suppose node k has left childnode i and right child node j. If δk = −1, then µi = µk and node j assumes a new rateµj = µk + αj. If δk = 1, then node i assumes a new rate µi = µk + αi while µj = µk. Ifδk = 0, then no drift rate changes occur and µk = µi = µj. To map the ternary δk variablesto binary indicators γi of rate changes for child branches, we define

γi =1 + δk

2|δk|, (27)

and

γj =1− δk

2|δk|. (28)

Thusµi = µpa(i) + γiαi, (29)

for i = 1, . . . , 2N − 2. Working with the binary γi eases our understanding of the MCMCprocedure to infer drift rate changes, discussed below. However, we parameterize the modelin terms of the ternary δk to facilitate enforcement of the model restrictions.

Of particular interest is the random number K ∈ 0, . . . , N − 1 of rate changes thatoccur in τ . We can write K in terms of the δi,

K =2N−1∑i=N+1

|δi|, (30)

and it provides us with a natural way to think of the vector δ = (δN+1, . . . , δ2N−1). Forexample, we can express our prior beliefs about δ in terms of K. A popular prior for countdata is the Poisson distribution

K ∼ Poisson(λ). (31)

Here, λ is the expected number of rate changes in τ . In our analyses, we set λ = log(2),which places 50% prior probability on the hypothesis of no rate changes.

In order to infer the nature of the drift rate transitions that occur at the nodesof the phylogenetic tree, we borrow ideas from Bayesian stochastic search variable selec-tion (BSSVS) (George and McCulloch 1993; Kuo and Mallick 1998; Chipman et al. 2001).BSSVS is typically applied to model selection problems in a linear regression setting. In thisframework, we begin with a large number P of potential predictors X1, . . . ,XP and seekto determine which of them associate linearly with an N -dimensional outcome Y. The fullmodel with all predictors is

Y = X1β1 + · · ·+ XPβP + ε, (32)

where the βi are regression coefficients and ε is a vector of normally distributed error termswith mean 0. When a particular βi is determined to differ significantly from 0, the cor-responding Xi helps predict Y. If not, Xi contributes little additional information and is

Page 10: A Relaxed Drift Di usion Model for Phylogenetic Trait ... · fusion model for evolution of multivariate traits in a Bayesian framework that integrates it with models for phylogenetic

fit to be removed from the model by forcing βi = 0. Predictors may be highly correlated,and deterministic model selection strategies tend not to find the optimal set of predictorswithout exploring all possible subsets. There exist 2P such subsets, so exploring all of themis computationally unfeasible in general and fails completely for P > N .

BSSVS efficiently explores the possible subsets of model predictors by augmenting themodel state space with a vector δ = (δ1, . . . , δP ) of binary indicator variables that dictatewhich predictors to include. The indicators δi impose a prior on the regression coefficientsβ = (β1, . . . ,βP ) with mean 0 and variance proportional to a P × P diagonal matrix withits diagonal equal to δ. If δi = 0, then the prior variance on βi shrinks to 0 and forces βi = 0in the posterior. The joint space (β, δ) is explored simultaneously through MCMC.

We apply BSSVS in our relaxed drift setting to determine the types of drift ratetransfers that occur. We achieve this by exploring the joint space (α, δ) of rate differencesbetween parent and child branches, and ternary rate change indicators. The δk map to binaryindicators γi, as shown in (27) and (28). We assume that drift rate differences αi = µi−µpa(i)

are a priori independent and normally distributed,

αi ∼ N(0,γiσ2I). (33)

If γi = 0, then the prior variance σ2 on the components of αi shrinks to 0. This forcesαi = 0, and hence µi = µpa(i), in the posterior.

We complete our drift diffusion model specification by assigning the precision matrixP a Wishart prior with, say, degrees of freedom v and scale matrix V. Importantly, theWishart distribution is conjugate to the observed trait likelihood. Indeed, invoking thenotation for partial means and precisions from section 2.2, the posterior

P (P|Y1, . . . ,YN) ∝ P (Y1, . . . ,YN |P)P (P) (34)

has a Wishart distribution with N + v degrees of freedom and scale matrix(V−1 + p2N−1(m

∗2N−1 − µ∗)(m∗2N−1 − µ∗)′

+2N−1∑k=N+1

[pimim′i + pjmjm

′j − (pi + pj)m

∗km∗′k ]

)−1. (35)

Lemey et al. (2010) exploit a similar conjugacy to construct an efficient Gibbs sampler forP, and our adoption of the Wishart prior conveniently allows us to extend use of the samplerto our model that now includes a relaxed drift process.

2.4 Joint Modeling and Inference

A major strength of our Bayesian framework is that it jointly models sequence andtrait evolution. Adopting a standard phylogenetic approach, we assume the sequence dataX arise from a continuous-time Markov chain (CTMC) model for character evolution acting

Page 11: A Relaxed Drift Di usion Model for Phylogenetic Trait ... · fusion model for evolution of multivariate traits in a Bayesian framework that integrates it with models for phylogenetic

along the unobserved phylogenetic tree τ . The CTMC is characterized by a vector Q of mu-tation parameters that may include, for instance, relative exchange rates among characters,an overall rate multiplier and across-site variation specifications. The traits Y arise from aBrownian drift diffusion process acting on τ , governed by parameters Λ. A crucial assump-tion is that the processes giving rise to the observed sequences and traits are conditionallyindependent given the phylogenetic tree τ :

P (X,Y|τ,Q,Λ) = P (X|τ,Q)P (Y|τ,Λ), (36)

enabling us to write the joint model posterior distribution as

P (τ,Q,Λ|X,Y) ∝ P (X|τ,Q)P (Y|τ,Λ)P (τ)P (Q)P (Λ). (37)

We implement the joint model by integrating our drift diffusion framework for traitevolution into the Bayesian Evolutionary Analysis Sampling Trees (BEAST) software pack-age (Drummond et al. 2012). BEAST provides an array of efficient methods for Bayesianphylogenetic inference, particularly to estimate phylogenies and model molecular sequenceevolution. For the phylogeny τ , we choose from flexible coalescent-based priors that do notmake strong a priori assumptions about the population history (Minin et al. 2008; Gillet al. 2013). For sequence evolution, we have access to a range of classic substitution models(Kimura 1980; Felsenstein 1981; Hasegawa et al. 1985), gamma-distributed rate heterogene-ity among sites (Yang 1994), and strict and relaxed molecular clock models for branch rates(Drummond et al. 2006).

Estimation of the full joint posterior (37) is achieved through MCMC sampling(Metropolis et al. 1953; Hastings 1970). We employ standard Metropolis-Hastings tran-sition kernels available in BEAST to integrate over the parameter spaces of Q and τ . Tosample realizations of the drift diffusion precision matrix P, we adapt a Gibbs sampler devel-oped for drift-neutral Brownian diffusion (Lemey et al. 2010). For the relaxed drift model,we need transition kernels to explore the space (α, δ) of branch rate differences and ternaryrate change indicators. We propose new rate differences α∗i component-wise through a ran-dom walk transition kernel that adds random values within a specified window size to thecurrent αi.

For δ, we implement a trit-flip transition kernel that chooses one of the N−1 ternaryindicators δk uniformly at random and proposes a new state δ∗k assuming one of the twopossible values not equal to δk with equal probability. For example, if δk = 0, then

δ∗k =

{−1 with probability 1

2

1 with probability 12.

(38)

We have parameterized our prior on δ in terms of the number K of rate changes, and thisparameterization should be retained for the transition kernel in order to ensure the correctMetropolis-Hastings proposal ratio (Drummond and Suchard 2010). A proposed increase inrate changes occurs when we choose a δk with value 0, so

q(K∗ = K + 1|K) =N − 1−KN − 1

. (39)

Page 12: A Relaxed Drift Di usion Model for Phylogenetic Trait ... · fusion model for evolution of multivariate traits in a Bayesian framework that integrates it with models for phylogenetic

If we choose a δk with a nonzero value, we propose the other nonzero value with probability0.5 (corresponding to K∗ = K), and we propose 0 with probability 0.5 (which means adecrease in rate changes, K∗ = K − 1). Therefore

q(K∗ = K|K) = q(K∗ = K − 1|K) =1

2

K

N − 1. (40)

These calculations yield the following proposal ratio for K:

q(K|K∗)q(K∗|K)

=

12

K+1N−1−K if K∗ = K + 1

1 if K∗ = K2(N−K)

Kif K∗ = K − 1.

(41)

In addition to the parameters characterizing the trait and sequence evolution pro-cesses, we may wish to make inferences about the posterior distribution of traits at the rootand internal nodes, or at any arbitrary time point in the past. We equip BEAST with theability to generate posterior trait realizations at these nodes by implementing a preorder,tree-traversal algorithm.

A natural choice to summarize the results of a Bayesian phylogenetic analysis is amaximum clade credibility (MCC) tree. To form an MCC tree, the posterior sample oftrees is examined to determine posterior clade probabilities, and the tree with the maximumproduct of posterior clade probabilities is the MCC tree. The branches and nodes of MCCtrees can be annotated with inferred drift rates and trait values, along with other quantitiesof interest. Annotated MCC trees can be summarised using TreeAnnotator, available aspart of the BEAST distribution, and visualized using FigTree (http://tree.bio.ed.ac.uk/software/figtree/).

2.5 Model Selection

We can formally compare the constant and relaxed drift diffusion models throughBayes factors (BFs). A Bayes factor (Jeffreys 1935, 1961) compares the fit of two models,M1 and M0, to observed data (X,Y) by taking the ratio of marginal likelihoods:

BF10 =P (X,Y|M1)

P (X,Y|M0)=P (M1|X,Y)

P (M0|X,Y)

/P (M1)

P (M0). (42)

BF10 quantifies the evidence in favor of model M1 over M0. Kass and Raftery (1995) provideguidelines for assessing the strength of the evidence against M0: BFs between 1 and 3 arenot worth more than a bare mention, while values between 3 and 20 are considered positiveevidence against M0. BFs in the ranges 20-150 and >150 are considered to be strong andvery strong evidence against M0, respectively.

Evaluation of Bayes factors has become a popular approach to model selection inBayesian phylogenetics (Sinsheimer et al. 1996; Suchard et al. 2001, 2005). Marginal likeli-hood estimation can be quite difficult in a phylogenetic context, and stepping-stone sampling

Page 13: A Relaxed Drift Di usion Model for Phylogenetic Trait ... · fusion model for evolution of multivariate traits in a Bayesian framework that integrates it with models for phylogenetic

estimators have been implemented to address this (Baele et al. 2012, 2013; Baele and Lemey2013). Following the approach of Drummond and Suchard (2010), however, we are able tostraightforwardly compute the Bayes factor BFC supporting the constant drift model MC

over the relaxed drift model MR. The model MC is nested within the more general modelMR and occurs when K = 0. This enables us to write

BFC =P (X,Y|MC)

P (X,Y|MR)=P (MC |X,Y)

P (MR|X,Y)

/P (MC)

P (MR)(43)

=P (K = 0|X,Y,MR)

1− P (K = 0|X,Y,MR)

/P (K = 0|MR)

1− P (K = 0|MR), (44)

requiring only our prior probability of no rate changes under the relaxed drift model, andthe posterior probability of zero rate changes.

3 The Spread of HIV-1 in Central Africa

Faria et al. (2014) explore the early spatial expansion and epidemic dynamics ofHIV-1 in central Africa by analyzing sequence data sampled from countries in the CongoRiver basin. The authors employ a discrete phylogeographic inference framework (Lemeyet al. 2009a) and show that the pandemic likely originated in Kinshasa (in what is now theDemocratic Republic of Congo) in the 1920s. Furthermore, viral spread to other populationcenters in sub-Saharan Africa was aided by a combination of factors, including strong railwaynetworks, urban growth, and changes in sexual behavior.

We follow up the analysis of Faria et al. (2014) by applying our continuous driftdiffusion approach to one of the data sets analysed in this study. The data set consistsof HIV-1 sequences sampled between 1985-2004 from the Democratic Republic of Congoand the Republic of Congo and includes 96 sequences from Kinshasa (Kalish et al. 2004;Vidal et al. 2000, 2005; Yang et al. 2005), 96 sequences from Mbuji-Mayi (Vidal et al. 2000,2005), 96 from Brazzaville (Bikandou et al. 2004; Niama et al. 2006), 76 from Lubumbashi(Vidal et al. 2005), 33 from Bwamanda (Vidal et al. 2000), 24 from Likasi (Kita et al.2004), 23 from Kisangani (Vidal et al. 2005), and 22 sequences from Pointe-Noire (Bikandouet al. 2000). We reconstruct the spatial dynamics under drift-neutral, constant drift, andrelaxed drift Brownian diffusion on a maximum clade credibility tree estimated from thesequences and their locations of sampling. The traits in this instance are bivariate longitudeand latitude coordinates, with observed traits corresponding to sampling locations. Table 1reports posterior estimates of drift.

Under the constant drift model, we infer a significant longitudinal drift with posteriormean 0.30 degrees per year and a 95% Bayesian credibility interval (BCI) of (0.26, 0.33), aswell as a significant latitudinal drift with posterior mean of -0.09 degrees per year and BCI(-0.11, -0.06). Furthermore, for each coordinate the Bayes factor in favor of a constant driftmodel over a drift-neutral model is greater than 1,000, indicating a substantially better fit forthe constant drift model. These results imply general eastward and southward trends in the

Page 14: A Relaxed Drift Di usion Model for Phylogenetic Trait ... · fusion model for evolution of multivariate traits in a Bayesian framework that integrates it with models for phylogenetic

No Drift Constant Drift Relaxed DriftDrift (Lat.) - - -0.09 (-0.11, -0.06) -0.03 (-0.05, -0.01)Rate Changes (Lat.) - - - - 28.13 (27.0, 29.0)Variance (Lat.) 0.25 (0.23, 0.27) 0.23 (0.21, 0.25) 0.13 (0.12, 0.14)Drift (Long.) - - 0.30 (0.26, 0.33) 0.12 (0.08, 0.16)Rate Changes (Long.) - - - - 1.48 (1.00, 3.00)Variance (Long.) 0.59 (0.55, 0.64) 0.37 (0.34, 0.40) 0.43 (0.39, 0.45)Correlation -0.47 (-0.54, -0.40) -0.40 (-0.47, -0.32) -0.84 (-0.87, -0.82)

Table 1: Spatiotemporal dynamics of HIV-1 in central Africa. Model comparison of Brow-nian diffusion with no drift, constant drift Brownian diffusion, and relaxed drift Browniandiffusion. We report posterior mean estimates along with 95% Bayesian credibility intervals(BCIs). Drift rates for latitude and longitude coordinates are reported in units of degreesper year.

spread of HIV from the Kinshasa-Brazzaville-Pointe-Noire area to other population centers.They also reflect the composition of sampling locations: in terms of longitude, a majorityare far to the east of the believed origin while the rest are relatively close to it. Similarly,nearly 90% of the sequences come from locations south of Kinshasa or from neighboringlocations of similar latitude. On the other hand, the existence of samples from cities northof Kinshasa, Bwamanda and Kisangani, suggests that diffusion with a northward trend maymore accurately characterize part of the spatial history. We explore this possibility with therelaxed drift model.

Under the relaxed drift model, there is significant evidence of multiple longitudinaldrift rates. We estimate a posterior mean of 1.48 drift rate changes with BCI (1, 3), and theBayes factor in favor of relaxed drift over constant drift is greater than 1,000. The posteriormean longitudinal drift rate across all branches is 0.12 with BCI (0.08, 0.16). Figure 1 showsthe maximum clade credibility tree colored according to drift rates. The tree is essentiallydivided up into two clades: one clade with green-colored branches and another with brown-colored branches. We infer eastward drift rates of 0.18 degrees per year on green branches,and drift rates close to or equal to 0 on brown branches. Tree nodes in Figure 1 are depictedas circles of different sizes, with the size of each circle being determined by the longitude ofthe observed or inferred location corresponding to the node. Larger circles represent moreeastward locations. The observed and inferred longitudes provide better understanding ofthe difference in drift rates between the two clades. During the period 1960-2004, thereis generally greater eastward movement along the lineages of the green clade. Althoughthe lineages of the brown clade show greater eastward spread during the first half of theevolutionary history, the drift rates in the two clades are driven by the trends of the secondhalf of the evolutionary history. The second half accounts for a much greater proportion oftree branches and, because it overlaps with all sampling times, contains more informationabout the spatial diffusion process.

Page 15: A Relaxed Drift Di usion Model for Phylogenetic Trait ... · fusion model for evolution of multivariate traits in a Bayesian framework that integrates it with models for phylogenetic

drift(longitude)0.18

-0 .03

0.0 10.0 20.0 30.0 40.0 50.0 60.0 70.0 80.0 90.0 100.0 110.0 120.0 130.0 140.0 150.0 160.0 170.0 180.0 190.0 200.0 210.0 220.0 230.0 240.0 250.0 260.0 270.0 280.0 290.0 300.0 310.0 320.0 330.0 340.0 350.0 360.0 370.0 380.0 390.0 400.0 410.0 420.0 430.0 440.0 450.0 460.0 470.0 480.0 490.0 500.0 510.0 520.0 530.0 540.0 550.0 560.0 570.0 580.0 590.0 600.0 610.0 620.0 630.0 640.0 650.0 660.0 670.0 680.0 690.0 700.0 710.0 720.0 730.0 740.0 750.0 760.0 770.0 780.0 790.0 800.0 810.0 820.0 830.0 840.0 850.0 860.0 870.0 880.0 890.0 900.0 910.0 920.0 930.0 940.0 950.0 960.0 970.0 980.0 990.0 1000.0 1010.0 1020.0 1030.0 1040.0 1050.0 1060.0 1070.0 1080.0 1090.0 1100.0 1110.0 1120.0 1130.0 1140.0 1150.0 1160.0 1170.0 1180.0 1190.0 1200.0 1210.0 1220.0 1230.0 1240.0 1250.0 1260.0 1270.0 1280.0 1290.0 1300.0 1310.0 1320.0 1330.0 1340.0 1350.0 1360.0 1370.0 1380.0 1390.0 1400.0 1410.0 1420.0 1430.0 1440.0 1450.0 1460.0 1470.0 1480.0 1490.0 1500.0 1510.0 1520.0 1530.0 1540.0 1550.0 1560.0 1570.0 1580.0 1590.0 1600.0 1610.0 1620.0 1630.0 1640.0 1650.0 1660.0 1670.0 1680.0 1690.0 1700.0 1710.0 1720.0 1730.0 1740.0 1750.0 1760.0 1770.0 1780.0 1790.0 1800.0 1810.0 1820.0 1830.0 1840.0 1850.0 1860.0 1870.0 1880.0 1890.0 1900.0 1910.0 1920.0 1930.0 1940.0 1950.0 1960.0 1970.0 1980.0 1990.0 2000.0 2010.0 2020.0 2030.0 2040.0 2050.0 2060.0 2070.0 2080.0 2090.0 2100.0 2110.0 2120.0 2130.0 2140.0 2150.0 2160.0 2170.0 2180.0 2190.0 2200.0 2210.0 2220.0 2230.0 2240.0 2250.0 2260.0 2270.0 2280.0 2290.0 2300.0 2310.0 2320.0 2330.0 2340.0 2350.0 2360.0 2370.0 2380.0 2390.0 2400.0 2410.0 2420.0 2430.0 2440.0 2450.0 2460.0 2470.0 2480.0 2490.0 2500.0 2510.0 2520.0 2530.0 2540.0 2550.0 2560.0 2570.0 2580.0 2590.0 2600.0 2610.0 2620.0 2630.0 2640.0 2650.0 2660.0 2670.0 2680.0 2690.0 2700.0 2710.0 2720.0 2730.0 2740.0 2750.0 2760.0 2770.0 2780.0 2790.0 2800.0 2810.0 2820.0 2830.0 2840.0 2850.0 2860.0 2870.0 2880.0 2890.0 2900.0 2910.0 2920.0 2930.0 2940.0 2950.0 2960.0 2970.0 2980.0 2990.0 3000.0 3010.0 3020.0 3030.0 3040.0 3050.0 3060.0 3070.0 3080.0 3090.0 3100.0 3110.0 3120.0 3130.0 3140.0 3150.0 3160.0 3170.0 3180.0 3190.0 3200.0 3210.0 3220.0 3230.0 3240.0 3250.0 3260.0 3270.0 3280.0 3290.0 3300.0 3310.0 3320.0 3330.0 3340.0 3350.0 3360.0 3370.0 3380.0 3390.0 3400.0 3410.0 3420.0 3430.0 3440.0 3450.0 3460.0 3470.0 3480.0 3490.0 3500.0 3510.0 3520.0 3530.0 3540.0 3550.0 3560.0 3570.0 3580.0 3590.0 3600.0 3610.0 3620.0 3630.0 3640.0 3650.0 3660.0 3670.0 3680.0 3690.0 3700.0 3710.0 3720.0 3730.0 3740.0 3750.0 3760.0 3770.0 3780.0 3790.0 3800.0 3810.0 3820.0 3830.0 3840.0 3850.0 3860.0 3870.0 3880.0 3890.0 3900.0 3910.0 3920.0 3930.0 3940.0 3950.0 3960.0 3970.0 3980.0 3990.0 4000.0 4010.0 4020.0 4030.0 4040.0 4050.0 4060.0 4070.0 4080.0 4090.0 4100.0 4110.0 4120.0 4130.0 4140.0 4150.0 4160.0 4170.0 4180.0 4190.0 4200.0 4210.0 4220.0 4230.0 4240.0 4250.0 4260.0 4270.0 4280.0 4290.0 4300.0 4310.0 4320.0 4330.0 4340.0 4350.0 4360.0 4370.0 4380.0 4390.0 4400.0 4410.0 4420.0 4430.0 4440.0 4450.0 4460.0 4470.0 4480.0 4490.0 4500.0 4510.0 4520.0 4530.0 4540.0 4550.0 4560.0 4570.0 4580.0 4590.0 4600.0 4610.0 4620.0 4630.0 4640.0 4650.0 4660.0 4670.0 4680.0 4690.0 4700.0 4710.0 4720.0 4730.0 4740.0 4750.0 4760.0 4770.0 4780.0 4790.0 4800.0 4810.0 4820.0 4830.0 4840.0 4850.0 4860.0 4870.0 4880.0 4890.0 4900.0 4910.0 4920.0 4930.0 4940.0 4950.0 4960.0 4970.0 4980.0 4990.0 5000.0 5010.0 5020.0 5030.0 5040.0 5050.0 5060.0 5070.0 5080.0 5090.0 5100.0 5110.0 5120.0 5130.0 5140.0 5150.0 5160.0 5170.0 5180.0 5190.0 5200.0 5210.0 5220.0 5230.0 5240.0 5250.0 5260.0 5270.0 5280.0 5290.0 5300.0 5310.0 5320.0 5330.0 5340.0 5350.0 5360.0 5370.0 5380.0 5390.0 5400.0 5410.0 5420.0 5430.0 5440.0 5450.0 5460.0 5470.0 5480.0 5490.0 5500.0 5510.0 5520.0 5530.0 5540.0 5550.0 5560.0 5570.0 5580.0 5590.0 5600.0 5610.0 5620.0 5630.0 5640.0 5650.0 5660.0 5670.0 5680.0 5690.0 5700.0 5710.0 5720.0 5730.0 5740.0 5750.0 5760.0 5770.0 5780.0 5790.0 5800.0 5810.0 5820.0 5830.0 5840.0 5850.0 5860.0 5870.0 5880.0 5890.0 5900.0 5910.0 5920.0 5930.0 5940.0 5950.0 5960.0 5970.0 5980.0 5990.0 6000.0 6010.0 6020.0 6030.0 6040.0 6050.0 6060.0 6070.0 6080.0 6090.0 6100.0 6110.0 6120.0 6130.0 6140.0 6150.0 6160.0 6170.0 6180.0 6190.0 6200.0 6210.0 6220.0 6230.0 6240.0 6250.0 6260.0 6270.0 6280.0 6290.0 6300.0 6310.0 6320.0 6330.0 6340.0 6350.0 6360.0 6370.0 6380.0 6390.0 6400.0 6410.0 6420.0 6430.0 6440.0 6450.0 6460.0 6470.0 6480.0 6490.0 6500.0 6510.0 6520.0 6530.0 6540.0 6550.0 6560.0 6570.0 6580.0 6590.0 6600.0 6610.0 6620.0 6630.0 6640.0 6650.0 6660.0 6670.0 6680.0 6690.0 6700.0 6710.0 6720.0 6730.0 6740.0 6750.0 6760.0 6770.0 6780.0 6790.0 6800.0 6810.0 6820.0 6830.0 6840.0 6850.0 6860.0 6870.0 6880.0 6890.0 6900.0 6910.0 6920.0 6930.0 6940.0 6950.0 6960.0 6970.0 6980.0 6990.0 7000.0 7010.0 7020.0 7030.0 7040.0 7050.0 7060.0 7070.0 7080.0 7090.0 7100.0 7110.0 7120.0 7130.0 7140.0 7150.0 7160.0 7170.0 7180.0 7190.0 7200.0 7210.0 7220.0 7230.0 7240.0 7250.0 7260.0 7270.0 7280.0 7290.0 7300.0 7310.0 7320.0 7330.0 7340.0 7350.0 7360.0 7370.0 7380.0 7390.0 7400.0 7410.0 7420.0 7430.0 7440.0 7450.0 7460.0 7470.0 7480.0 7490.0 7500.0 7510.0 7520.0 7530.0 7540.0 7550.0 7560.0 7570.0 7580.0 7590.0 7600.0 7610.0 7620.0 7630.0 7640.0 7650.0 7660.0 7670.0 7680.0 7690.0 7700.0 7710.0 7720.0 7730.0 7740.0 7750.0 7760.0 7770.0 7780.0 7790.0 7800.0 7810.0 7820.0 7830.0 7840.0 7850.0 7860.0 7870.0 7880.0 7890.0 7900.0 7910.0 7920.0 7930.0 7940.0 7950.0 7960.0 7970.0 7980.0 7990.0 8000.0 8010.0 8020.0 8030.0 8040.0 8050.0 8060.0 8070.0 8080.0 8090.0 8100.0 8110.0 8120.0 8130.0 8140.0 8150.0 8160.0 8170.0 8180.0 8190.0 8200.0 8210.0 8220.0 8230.0 8240.0 8250.0 8260.0 8270.0 8280.0 8290.0 8300.0 8310.0 8320.0 8330.0 8340.0 8350.0 8360.0 8370.0 8380.0 8390.0 8400.0 8410.0 8420.0 8430.0 8440.0 8450.0 8460.0 8470.0 8480.0 8490.0 8500.0 8510.0 8520.0 8530.0 8540.0 8550.0 8560.0 8570.0 8580.0 8590.0 8600.0 8610.0 8620.0 8630.0 8640.0 8650.0 8660.0 8670.0 8680.0 8690.0 8700.0 8710.0 8720.0 8730.0 8740.0 8750.0 8760.0 8770.0 8780.0 8790.0 8800.0 8810.0 8820.0 8830.0 8840.0 8850.0 8860.0 8870.0 8880.0 8890.0 8900.0 8910.0 8920.0 8930.0 8940.0 8950.0 8960.0 8970.0 8980.0 8990.0 9000.0 9010.0 9020.0 9030.0 9040.0 9050.0 9060.0 9070.0 9080.0 9090.0 9100.0 9110.0 9120.0 9130.0 9140.0 9150.0 9160.0 9170.0 9180.0 9190.0 9200.0 9210.0 9220.0 9230.0 9240.0 9250.0 9260.0 9270.0 9280.0 9290.0 9300.0 9310.0 9320.0 9330.0 9340.0 9350.0 9360.0 9370.0 9380.0 9390.0 9400.0 9410.0 9420.0 9430.0 9440.0 9450.0 9460.0 9470.0 9480.0 9490.0 9500.0 9510.0 9520.0 9530.0 9540.0 9550.0 9560.0 9570.0 9580.0 9590.0 9600.0 9610.0 9620.0 9630.0 9640.0 9650.0 9660.0 9670.0 9680.0 9690.0 9700.0 9710.0 9720.0 9730.0 9740.0 9750.0 9760.0 9770.0 9780.0 9790.0 9800.0 9810.0 9820.0 9830.0 9840.0 9850.0 9860.0 9870.0 9880.0 9890.0 9900.0 9910.0 9920.0 9930.0 9940.0 9950.0 9960.0 9970.0 9980.0 9990.0 10000.0 10010.0 10020.0 10030.0 10040.0 10050.0 10060.0 10070.0 10080.0 10090.0 10100.0 10110.0 10120.0 10130.0 10140.0 10150.0 10160.0 10170.0 10180.0 10190.0 10200.0 10210.0 10220.0 10230.0 10240.0 10250.0 10260.0 10270.0 10280.0 10290.0 10300.0 10310.0 10320.0 10330.0 10340.0 10350.0 10360.0 10370.0 10380.0 10390.0 10400.0 10410.0 10420.0 10430.0 10440.0 10450.0 10460.0 10470.0 10480.0 10490.0 10500.0 10510.0 10520.0 10530.0 10540.0 10550.0 10560.0 10570.0 10580.0 10590.0 10600.0 10610.0 10620.0 10630.0 10640.0 10650.0 10660.0 10670.0 10680.0 10690.0 10700.0 10710.0 10720.0 10730.0 10740.0 10750.0 10760.0 10770.0 10780.0 10790.0 10800.0 10810.0 10820.0 10830.0 10840.0 10850.0 10860.0 10870.0 10880.0 10890.0 10900.0 10910.0 10920.0 10930.0 10940.0 10950.0 10960.0 10970.0 10980.0 10990.0 11000.0 11010.0 11020.0 11030.0 11040.0 11050.0 11060.0 11070.0 11080.0 11090.0 11100.0 11110.0 11120.0 11130.0 11140.0 11150.0 11160.0 11170.0 11180.0 11190.0 11200.0 11210.0 11220.0 11230.0 11240.0 11250.0 11260.0 11270.0 11280.0 11290.0 11300.0 11310.0 11320.0 11330.0 11340.0 11350.0 11360.0 11370.0 11380.0 11390.0 11400.0 11410.0 11420.0 11430.0 11440.0 11450.0 11460.0 11470.0 11480.0 11490.0 11500.0 11510.0 11520.0 11530.0 11540.0 11550.0 11560.0 11570.0 11580.0 11590.0 11600.0 11610.0 11620.0 11630.0 11640.0 11650.0 11660.0 11670.0 11680.0 11690.0 11700.0 11710.0 11720.0 11730.0 11740.0 11750.0 11760.0 11770.0 11780.0 11790.0 11800.0 11810.0 11820.0 11830.0 11840.0 11850.0 11860.0 11870.0 11880.0 11890.0 11900.0 11910.0 11920.0 11930.0 11940.0 11950.0 11960.0 11970.0 11980.0 11990.0 12000.0 12010.0 12020.0 12030.0 12040.0 12050.0 12060.0 12070.0 12080.0 12090.0 12100.0 12110.0 12120.0 12130.0 12140.0 12150.0 12160.0 12170.0 12180.0 12190.0 12200.0 12210.0 12220.0 12230.0 12240.0 12250.0 12260.0 12270.0 12280.0 12290.0 12300.0 12310.0 12320.0 12330.0 12340.0 12350.0 12360.0 12370.0 12380.0 12390.0 12400.0 12410.0 12420.0 12430.0 12440.0 12450.0 12460.0 12470.0 12480.0 12490.0 12500.0 12510.0 12520.0 12530.0 12540.0 12550.0 12560.0 12570.0 12580.0 12590.0 12600.0 12610.0 12620.0 12630.0 12640.0 12650.0 12660.0 12670.0 12680.0 12690.0 12700.0 12710.0 12720.0 12730.0 12740.0 12750.0 12760.0 12770.0 12780.0 12790.0 12800.0 12810.0 12820.0 12830.0 12840.0 12850.0 12860.0 12870.0 12880.0 12890.0 12900.0 12910.0 12920.0 12930.0 12940.0 12950.0 12960.0 12970.0 12980.0 12990.0 13000.0 13010.0 13020.0 13030.0 13040.0 13050.0 13060.0 13070.0 13080.0 13090.0 13100.0 13110.0 13120.0 13130.0 13140.0 13150.0 13160.0 13170.0 13180.0 13190.0 13200.0 13210.0 13220.0 13230.0 13240.0 13250.0 13260.0 13270.0 13280.0 13290.0 13300.0 13310.0 13320.0 13330.0 13340.0 13350.0 13360.0 13370.0 13380.0 13390.0 13400.0 13410.0 13420.0 13430.0 13440.0 13450.0 13460.0 13470.0 13480.0 13490.0 13500.0 13510.0 13520.0 13530.0 13540.0 13550.0 13560.0 13570.0 13580.0 13590.0 13600.0 13610.0 13620.0 13630.0 13640.0 13650.0 13660.0 13670.0 13680.0 13690.0 13700.0 13710.0 13720.0 13730.0 13740.0 13750.0 13760.0 13770.0 13780.0 13790.0 13800.0 13810.0 13820.0 13830.0 13840.0 13850.0 13860.0 13870.0 13880.0 13890.0 13900.0 13910.0 13920.0 13930.0 13940.0 13950.0 13960.0 13970.0 13980.0 13990.0 14000.0 14010.0 14020.0 14030.0 14040.0 14050.0 14060.0 14070.0 14080.0 14090.0 14100.0 14110.0 14120.0 14130.0 14140.0 14150.0 14160.0 14170.0 14180.0 14190.0 14200.0 14210.0 14220.0 14230.0 14240.0 14250.0 14260.0 14270.0 14280.0 14290.0 14300.0 14310.0 14320.0 14330.0 14340.0 14350.0 14360.0 14370.0 14380.0 14390.0 14400.0 14410.0 14420.0 14430.0 14440.0 14450.0 14460.0 14470.0 14480.0 14490.0 14500.0 14510.0 14520.0 14530.0 14540.0 14550.0 14560.0 14570.0 14580.0 14590.0 14600.0 14610.0 14620.0 14630.0 14640.0 14650.0 14660.0 14670.0 14680.0 14690.0 14700.0 14710.0 14720.0 14730.0 14740.0 14750.0 14760.0 14770.0 14780.0 14790.0 14800.0 14810.0 14820.0 14830.0 14840.0 14850.0 14860.0 14870.0 14880.0 14890.0 14900.0 14910.0 14920.0 14930.0 14940.0 14950.0 14960.0 14970.0 14980.0 14990.0 15000.0 15010.0 15020.0 15030.0 15040.0 15050.0 15060.0 15070.0 15080.0 15090.0 15100.0 15110.0 15120.0 15130.0 15140.0 15150.0 15160.0 15170.0 15180.0 15190.0 15200.0 15210.0 15220.0 15230.0 15240.0 15250.0 15260.0 15270.0 15280.0 15290.0 15300.0 15310.0 15320.0 15330.0 15340.0 15350.0 15360.0 15370.0 15380.0 15390.0 15400.0 15410.0 15420.0 15430.0 15440.0 15450.0 15460.0 15470.0 15480.0 15490.0 15500.0 15510.0 15520.0 15530.0 15540.0 15550.0 15560.0 15570.0 15580.0 15590.0 15600.0 15610.0 15620.0 15630.0 15640.0 15650.0 15660.0 15670.0 15680.0 15690.0 15700.0 15710.0 15720.0 15730.0 15740.0 15750.0 15760.0 15770.0 15780.0 15790.0 15800.0 15810.0 15820.0 15830.0 15840.0 15850.0 15860.0 15870.0 15880.0 15890.0 15900.0 15910.0 15920.0 15930.0 15940.0 15950.0 15960.0 15970.0 15980.0 15990.0 16000.0 16010.0 16020.0 16030.0 16040.0 16050.0 16060.0 16070.0 16080.0 16090.0 16100.0 16110.0 16120.0 16130.0 16140.0 16150.0 16160.0 16170.0 16180.0 16190.0 16200.0 16210.0 16220.0 16230.0 16240.0 16250.0 16260.0 16270.0 16280.0 16290.0 16300.0 16310.0 16320.0 16330.0 16340.0 16350.0 16360.0 16370.0 16380.0 16390.0 16400.0 16410.0 16420.0 16430.0 16440.0 16450.0 16460.0 16470.0 16480.0 16490.0 16500.0 16510.0 16520.0 16530.0 16540.0 16550.0 16560.0 16570.0 16580.0 16590.0 16600.0 16610.0 16620.0 16630.0 16640.0 16650.0 16660.0 16670.0 16680.0 16690.0 16700.0 16710.0 16720.0 16730.0 16740.0 16750.0 16760.0 16770.0 16780.0 16790.0 16800.0 16810.0 16820.0 16830.0 16840.0 16850.0 16860.0 16870.0 16880.0 16890.0 16900.0 16910.0 16920.0 16930.0 16940.0 16950.0 16960.0 16970.0 16980.0 16990.0 17000.0 17010.0 17020.0 17030.0 17040.0 17050.0 17060.0 17070.0 17080.0 17090.0 17100.0 17110.0 17120.0 17130.0 17140.0 17150.0 17160.0 17170.0 17180.0 17190.0 17200.0 17210.0 17220.0 17230.0 17240.0 17250.0 17260.0 17270.0 17280.0 17290.0 17300.0 17310.0 17320.0 17330.0 17340.0 17350.0 17360.0 17370.0 17380.0 17390.0 17400.0 17410.0 17420.0 17430.0 17440.0 17450.0 17460.0 17470.0 17480.0 17490.0 17500.0 17510.0 17520.0 17530.0 17540.0 17550.0 17560.0 17570.0 17580.0 17590.0 17600.0 17610.0 17620.0 17630.0 17640.0 17650.0 17660.0 17670.0 17680.0 17690.0 17700.0 17710.0 17720.0 17730.0 17740.0 17750.0 17760.0 17770.0 17780.0 17790.0 17800.0 17810.0 17820.0 17830.0 17840.0 17850.0 17860.0 17870.0 17880.0 17890.0 17900.0 17910.0 17920.0 17930.0 17940.0 17950.0 17960.0 17970.0 17980.0 17990.0 18000.0 18010.0 18020.0 18030.0 18040.0 18050.0 18060.0 18070.0 18080.0 18090.0 18100.0 18110.0 18120.0 18130.0 18140.0 18150.0 18160.0 18170.0 18180.0 18190.0 18200.0 18210.0 18220.0 18230.0 18240.0 18250.0 18260.0 18270.0 18280.0 18290.0 18300.0 18310.0 18320.0 18330.0 18340.0 18350.0 18360.0 18370.0 18380.0 18390.0 18400.0 18410.0 18420.0 18430.0 18440.0 18450.0 18460.0 18470.0 18480.0 18490.0 18500.0 18510.0 18520.0 18530.0 18540.0 18550.0 18560.0 18570.0 18580.0 18590.0 18600.0 18610.0 18620.0 18630.0 18640.0 18650.0 18660.0 18670.0 18680.0 18690.0 18700.0 18710.0 18720.0 18730.0 18740.0 18750.0 18760.0 18770.0 18780.0 18790.0 18800.0 18810.0 18820.0 18830.0 18840.0 18850.0 18860.0 18870.0 18880.0 18890.0 18900.0 18910.0 18920.0 18930.0 18940.0 18950.0 18960.0 18970.0 18980.0 18990.0 19000.0 19010.0 19020.0 19030.0 19040.0 19050.0 19060.0 19070.0 19080.0 19090.0 19100.0 19110.0 19120.0 19130.0 19140.0 19150.0 19160.0 19170.0 19180.0 19190.0 19200.0 19210.0 19220.0 19230.0 19240.0 19250.0 19260.0 19270.0 19280.0 19290.0 19300.0 19310.0 19320.0 19330.0 19340.0 19350.0 19360.0 19370.0 19380.0 19390.0 19400.0 19410.0 19420.0 19430.0 19440.0 19450.0 19460.0 19470.0 19480.0 19490.0 19500.0 19510.0 19520.0 19530.0 19540.0 19550.0 19560.0 19570.0 19580.0 19590.0 19600.0 19610.0 19620.0 19630.0 19640.0 19650.0 19660.0 19670.0 19680.0 19690.0 19700.0 19710.0 19720.0 19730.0 19740.0 19750.0 19760.0 19770.0 19780.0 19790.0 19800.0 19810.0 19820.0 19830.0 19840.0 19850.0 19860.0 19870.0 19880.0 19890.0 19900.0 19910.0 19920.0 19930.0 19940.0 19950.0 19960.0 19970.0 19980.0 19990.0 20000.0 20010.0 20020.0 20030.0 20040.0 20050.0 20060.0 20070.0 20080.0 20090.0

Figure 1: Maximum clade credibility tree for spread of HIV-1 in central Africa. The posteriormean longitudinal drift is depicted using a color gradient along the branches. Green indicatesan eastward drift while brown signifies drift rates close to or equal to zero. Tree nodes aredepicted as circles of different sizes. The size of each circle is determined by the longitude ofthe observed or inferred location corresponding to the node. Larger circles represent moreeastward locations.

Page 16: A Relaxed Drift Di usion Model for Phylogenetic Trait ... · fusion model for evolution of multivariate traits in a Bayesian framework that integrates it with models for phylogenetic

drift(latitude)0.29

-0 .14

0.0 10.0 20.0 30.0 40.0 50.0 60.0 70.0 80.0 90.0 100.0 110.0 120.0 130.0 140.0 150.0 160.0 170.0 180.0 190.0 200.0 210.0 220.0 230.0 240.0 250.0 260.0 270.0 280.0 290.0 300.0 310.0 320.0 330.0 340.0 350.0 360.0 370.0 380.0 390.0 400.0 410.0 420.0 430.0 440.0 450.0 460.0 470.0 480.0 490.0 500.0 510.0 520.0 530.0 540.0 550.0 560.0 570.0 580.0 590.0 600.0 610.0 620.0 630.0 640.0 650.0 660.0 670.0 680.0 690.0 700.0 710.0 720.0 730.0 740.0 750.0 760.0 770.0 780.0 790.0 800.0 810.0 820.0 830.0 840.0 850.0 860.0 870.0 880.0 890.0 900.0 910.0 920.0 930.0 940.0 950.0 960.0 970.0 980.0 990.0 1000.0 1010.0 1020.0 1030.0 1040.0 1050.0 1060.0 1070.0 1080.0 1090.0 1100.0 1110.0 1120.0 1130.0 1140.0 1150.0 1160.0 1170.0 1180.0 1190.0 1200.0 1210.0 1220.0 1230.0 1240.0 1250.0 1260.0 1270.0 1280.0 1290.0 1300.0 1310.0 1320.0 1330.0 1340.0 1350.0 1360.0 1370.0 1380.0 1390.0 1400.0 1410.0 1420.0 1430.0 1440.0 1450.0 1460.0 1470.0 1480.0 1490.0 1500.0 1510.0 1520.0 1530.0 1540.0 1550.0 1560.0 1570.0 1580.0 1590.0 1600.0 1610.0 1620.0 1630.0 1640.0 1650.0 1660.0 1670.0 1680.0 1690.0 1700.0 1710.0 1720.0 1730.0 1740.0 1750.0 1760.0 1770.0 1780.0 1790.0 1800.0 1810.0 1820.0 1830.0 1840.0 1850.0 1860.0 1870.0 1880.0 1890.0 1900.0 1910.0 1920.0 1930.0 1940.0 1950.0 1960.0 1970.0 1980.0 1990.0 2000.0 2010.0 2020.0 2030.0 2040.0 2050.0 2060.0 2070.0 2080.0 2090.0 2100.0 2110.0 2120.0 2130.0 2140.0 2150.0 2160.0 2170.0 2180.0 2190.0 2200.0 2210.0 2220.0 2230.0 2240.0 2250.0 2260.0 2270.0 2280.0 2290.0 2300.0 2310.0 2320.0 2330.0 2340.0 2350.0 2360.0 2370.0 2380.0 2390.0 2400.0 2410.0 2420.0 2430.0 2440.0 2450.0 2460.0 2470.0 2480.0 2490.0 2500.0 2510.0 2520.0 2530.0 2540.0 2550.0 2560.0 2570.0 2580.0 2590.0 2600.0 2610.0 2620.0 2630.0 2640.0 2650.0 2660.0 2670.0 2680.0 2690.0 2700.0 2710.0 2720.0 2730.0 2740.0 2750.0 2760.0 2770.0 2780.0 2790.0 2800.0 2810.0 2820.0 2830.0 2840.0 2850.0 2860.0 2870.0 2880.0 2890.0 2900.0 2910.0 2920.0 2930.0 2940.0 2950.0 2960.0 2970.0 2980.0 2990.0 3000.0 3010.0 3020.0 3030.0 3040.0 3050.0 3060.0 3070.0 3080.0 3090.0 3100.0 3110.0 3120.0 3130.0 3140.0 3150.0 3160.0 3170.0 3180.0 3190.0 3200.0 3210.0 3220.0 3230.0 3240.0 3250.0 3260.0 3270.0 3280.0 3290.0 3300.0 3310.0 3320.0 3330.0 3340.0 3350.0 3360.0 3370.0 3380.0 3390.0 3400.0 3410.0 3420.0 3430.0 3440.0 3450.0 3460.0 3470.0 3480.0 3490.0 3500.0 3510.0 3520.0 3530.0 3540.0 3550.0 3560.0 3570.0 3580.0 3590.0 3600.0 3610.0 3620.0 3630.0 3640.0 3650.0 3660.0 3670.0 3680.0 3690.0 3700.0 3710.0 3720.0 3730.0 3740.0 3750.0 3760.0 3770.0 3780.0 3790.0 3800.0 3810.0 3820.0 3830.0 3840.0 3850.0 3860.0 3870.0 3880.0 3890.0 3900.0 3910.0 3920.0 3930.0 3940.0 3950.0 3960.0 3970.0 3980.0 3990.0 4000.0 4010.0 4020.0 4030.0 4040.0 4050.0 4060.0 4070.0 4080.0 4090.0 4100.0 4110.0 4120.0 4130.0 4140.0 4150.0 4160.0 4170.0 4180.0 4190.0 4200.0 4210.0 4220.0 4230.0 4240.0 4250.0 4260.0 4270.0 4280.0 4290.0 4300.0 4310.0 4320.0 4330.0 4340.0 4350.0 4360.0 4370.0 4380.0 4390.0 4400.0 4410.0 4420.0 4430.0 4440.0 4450.0 4460.0 4470.0 4480.0 4490.0 4500.0 4510.0 4520.0 4530.0 4540.0 4550.0 4560.0 4570.0 4580.0 4590.0 4600.0 4610.0 4620.0 4630.0 4640.0 4650.0 4660.0 4670.0 4680.0 4690.0 4700.0 4710.0 4720.0 4730.0 4740.0 4750.0 4760.0 4770.0 4780.0 4790.0 4800.0 4810.0 4820.0 4830.0 4840.0 4850.0 4860.0 4870.0 4880.0 4890.0 4900.0 4910.0 4920.0 4930.0 4940.0 4950.0 4960.0 4970.0 4980.0 4990.0 5000.0 5010.0 5020.0 5030.0 5040.0 5050.0 5060.0 5070.0 5080.0 5090.0 5100.0 5110.0 5120.0 5130.0 5140.0 5150.0 5160.0 5170.0 5180.0 5190.0 5200.0 5210.0 5220.0 5230.0 5240.0 5250.0 5260.0 5270.0 5280.0 5290.0 5300.0 5310.0 5320.0 5330.0 5340.0 5350.0 5360.0 5370.0 5380.0 5390.0 5400.0 5410.0 5420.0 5430.0 5440.0 5450.0 5460.0 5470.0 5480.0 5490.0 5500.0 5510.0 5520.0 5530.0 5540.0 5550.0 5560.0 5570.0 5580.0 5590.0 5600.0 5610.0 5620.0 5630.0 5640.0 5650.0 5660.0 5670.0 5680.0 5690.0 5700.0 5710.0 5720.0 5730.0 5740.0 5750.0 5760.0 5770.0 5780.0 5790.0 5800.0 5810.0 5820.0 5830.0 5840.0 5850.0 5860.0 5870.0 5880.0 5890.0 5900.0 5910.0 5920.0 5930.0 5940.0 5950.0 5960.0 5970.0 5980.0 5990.0 6000.0 6010.0 6020.0 6030.0 6040.0 6050.0 6060.0 6070.0 6080.0 6090.0 6100.0 6110.0 6120.0 6130.0 6140.0 6150.0 6160.0 6170.0 6180.0 6190.0 6200.0 6210.0 6220.0 6230.0 6240.0 6250.0 6260.0 6270.0 6280.0 6290.0 6300.0 6310.0 6320.0 6330.0 6340.0 6350.0 6360.0 6370.0 6380.0 6390.0 6400.0 6410.0 6420.0 6430.0 6440.0 6450.0 6460.0 6470.0 6480.0 6490.0 6500.0 6510.0 6520.0 6530.0 6540.0 6550.0 6560.0 6570.0 6580.0 6590.0 6600.0 6610.0 6620.0 6630.0 6640.0 6650.0 6660.0 6670.0 6680.0 6690.0 6700.0 6710.0 6720.0 6730.0 6740.0 6750.0 6760.0 6770.0 6780.0 6790.0 6800.0 6810.0 6820.0 6830.0 6840.0 6850.0 6860.0 6870.0 6880.0 6890.0 6900.0 6910.0 6920.0 6930.0 6940.0 6950.0 6960.0 6970.0 6980.0 6990.0 7000.0 7010.0 7020.0 7030.0 7040.0 7050.0 7060.0 7070.0 7080.0 7090.0 7100.0 7110.0 7120.0 7130.0 7140.0 7150.0 7160.0 7170.0 7180.0 7190.0 7200.0 7210.0 7220.0 7230.0 7240.0 7250.0 7260.0 7270.0 7280.0 7290.0 7300.0 7310.0 7320.0 7330.0 7340.0 7350.0 7360.0 7370.0 7380.0 7390.0 7400.0 7410.0 7420.0 7430.0 7440.0 7450.0 7460.0 7470.0 7480.0 7490.0 7500.0 7510.0 7520.0 7530.0 7540.0 7550.0 7560.0 7570.0 7580.0 7590.0 7600.0 7610.0 7620.0 7630.0 7640.0 7650.0 7660.0 7670.0 7680.0 7690.0 7700.0 7710.0 7720.0 7730.0 7740.0 7750.0 7760.0 7770.0 7780.0 7790.0 7800.0 7810.0 7820.0 7830.0 7840.0 7850.0 7860.0 7870.0 7880.0 7890.0 7900.0 7910.0 7920.0 7930.0 7940.0 7950.0 7960.0 7970.0 7980.0 7990.0 8000.0 8010.0 8020.0 8030.0 8040.0 8050.0 8060.0 8070.0 8080.0 8090.0 8100.0 8110.0 8120.0 8130.0 8140.0 8150.0 8160.0 8170.0 8180.0 8190.0 8200.0 8210.0 8220.0 8230.0 8240.0 8250.0 8260.0 8270.0 8280.0 8290.0 8300.0 8310.0 8320.0 8330.0 8340.0 8350.0 8360.0 8370.0 8380.0 8390.0 8400.0 8410.0 8420.0 8430.0 8440.0 8450.0 8460.0 8470.0 8480.0 8490.0 8500.0 8510.0 8520.0 8530.0 8540.0 8550.0 8560.0 8570.0 8580.0 8590.0 8600.0 8610.0 8620.0 8630.0 8640.0 8650.0 8660.0 8670.0 8680.0 8690.0 8700.0 8710.0 8720.0 8730.0 8740.0 8750.0 8760.0 8770.0 8780.0 8790.0 8800.0 8810.0 8820.0 8830.0 8840.0 8850.0 8860.0 8870.0 8880.0 8890.0 8900.0 8910.0 8920.0 8930.0 8940.0 8950.0 8960.0 8970.0 8980.0 8990.0 9000.0 9010.0 9020.0 9030.0 9040.0 9050.0 9060.0 9070.0 9080.0 9090.0 9100.0 9110.0 9120.0 9130.0 9140.0 9150.0 9160.0 9170.0 9180.0 9190.0 9200.0 9210.0 9220.0 9230.0 9240.0 9250.0 9260.0 9270.0 9280.0 9290.0 9300.0 9310.0 9320.0 9330.0 9340.0 9350.0 9360.0 9370.0 9380.0 9390.0 9400.0 9410.0 9420.0 9430.0 9440.0 9450.0 9460.0 9470.0 9480.0 9490.0 9500.0 9510.0 9520.0 9530.0 9540.0 9550.0 9560.0 9570.0 9580.0 9590.0 9600.0 9610.0 9620.0 9630.0 9640.0 9650.0 9660.0 9670.0 9680.0 9690.0 9700.0 9710.0 9720.0 9730.0 9740.0 9750.0 9760.0 9770.0 9780.0 9790.0 9800.0 9810.0 9820.0 9830.0 9840.0 9850.0 9860.0 9870.0 9880.0 9890.0 9900.0 9910.0 9920.0 9930.0 9940.0 9950.0 9960.0 9970.0 9980.0 9990.0 10000.0 10010.0 10020.0 10030.0 10040.0 10050.0 10060.0 10070.0 10080.0 10090.0 10100.0 10110.0 10120.0 10130.0 10140.0 10150.0 10160.0 10170.0 10180.0 10190.0 10200.0 10210.0 10220.0 10230.0 10240.0 10250.0 10260.0 10270.0 10280.0 10290.0 10300.0 10310.0 10320.0 10330.0 10340.0 10350.0 10360.0 10370.0 10380.0 10390.0 10400.0 10410.0 10420.0 10430.0 10440.0 10450.0 10460.0 10470.0 10480.0 10490.0 10500.0 10510.0 10520.0 10530.0 10540.0 10550.0 10560.0 10570.0 10580.0 10590.0 10600.0 10610.0 10620.0 10630.0 10640.0 10650.0 10660.0 10670.0 10680.0 10690.0 10700.0 10710.0 10720.0 10730.0 10740.0 10750.0 10760.0 10770.0 10780.0 10790.0 10800.0 10810.0 10820.0 10830.0 10840.0 10850.0 10860.0 10870.0 10880.0 10890.0 10900.0 10910.0 10920.0 10930.0 10940.0 10950.0 10960.0 10970.0 10980.0 10990.0 11000.0 11010.0 11020.0 11030.0 11040.0 11050.0 11060.0 11070.0 11080.0 11090.0 11100.0 11110.0 11120.0 11130.0 11140.0 11150.0 11160.0 11170.0 11180.0 11190.0 11200.0 11210.0 11220.0 11230.0 11240.0 11250.0 11260.0 11270.0 11280.0 11290.0 11300.0 11310.0 11320.0 11330.0 11340.0 11350.0 11360.0 11370.0 11380.0 11390.0 11400.0 11410.0 11420.0 11430.0 11440.0 11450.0 11460.0 11470.0 11480.0 11490.0 11500.0 11510.0 11520.0 11530.0 11540.0 11550.0 11560.0 11570.0 11580.0 11590.0 11600.0 11610.0 11620.0 11630.0 11640.0 11650.0 11660.0 11670.0 11680.0 11690.0 11700.0 11710.0 11720.0 11730.0 11740.0 11750.0 11760.0 11770.0 11780.0 11790.0 11800.0 11810.0 11820.0 11830.0 11840.0 11850.0 11860.0 11870.0 11880.0 11890.0 11900.0 11910.0 11920.0 11930.0 11940.0 11950.0 11960.0 11970.0 11980.0 11990.0 12000.0 12010.0 12020.0 12030.0 12040.0 12050.0 12060.0 12070.0 12080.0 12090.0 12100.0 12110.0 12120.0 12130.0 12140.0 12150.0 12160.0 12170.0 12180.0 12190.0 12200.0 12210.0 12220.0 12230.0 12240.0 12250.0 12260.0 12270.0 12280.0 12290.0 12300.0 12310.0 12320.0 12330.0 12340.0 12350.0 12360.0 12370.0 12380.0 12390.0 12400.0 12410.0 12420.0 12430.0 12440.0 12450.0 12460.0 12470.0 12480.0 12490.0 12500.0 12510.0 12520.0 12530.0 12540.0 12550.0 12560.0 12570.0 12580.0 12590.0 12600.0 12610.0 12620.0 12630.0 12640.0 12650.0 12660.0 12670.0 12680.0 12690.0 12700.0 12710.0 12720.0 12730.0 12740.0 12750.0 12760.0 12770.0 12780.0 12790.0 12800.0 12810.0 12820.0 12830.0 12840.0 12850.0 12860.0 12870.0 12880.0 12890.0 12900.0 12910.0 12920.0 12930.0 12940.0 12950.0 12960.0 12970.0 12980.0 12990.0 13000.0 13010.0 13020.0 13030.0 13040.0 13050.0 13060.0 13070.0 13080.0 13090.0 13100.0 13110.0 13120.0 13130.0 13140.0 13150.0 13160.0 13170.0 13180.0 13190.0 13200.0 13210.0 13220.0 13230.0 13240.0 13250.0 13260.0 13270.0 13280.0 13290.0 13300.0 13310.0 13320.0 13330.0 13340.0 13350.0 13360.0 13370.0 13380.0 13390.0 13400.0 13410.0 13420.0 13430.0 13440.0 13450.0 13460.0 13470.0 13480.0 13490.0 13500.0 13510.0 13520.0 13530.0 13540.0 13550.0 13560.0 13570.0 13580.0 13590.0 13600.0 13610.0 13620.0 13630.0 13640.0 13650.0 13660.0 13670.0 13680.0 13690.0 13700.0 13710.0 13720.0 13730.0 13740.0 13750.0 13760.0 13770.0 13780.0 13790.0 13800.0 13810.0 13820.0 13830.0 13840.0 13850.0 13860.0 13870.0 13880.0 13890.0 13900.0 13910.0 13920.0 13930.0 13940.0 13950.0 13960.0 13970.0 13980.0 13990.0 14000.0 14010.0 14020.0 14030.0 14040.0 14050.0 14060.0 14070.0 14080.0 14090.0 14100.0 14110.0 14120.0 14130.0 14140.0 14150.0 14160.0 14170.0 14180.0 14190.0 14200.0 14210.0 14220.0 14230.0 14240.0 14250.0 14260.0 14270.0 14280.0 14290.0 14300.0 14310.0 14320.0 14330.0 14340.0 14350.0 14360.0 14370.0 14380.0 14390.0 14400.0 14410.0 14420.0 14430.0 14440.0 14450.0 14460.0 14470.0 14480.0 14490.0 14500.0 14510.0 14520.0 14530.0 14540.0 14550.0 14560.0 14570.0 14580.0 14590.0 14600.0 14610.0 14620.0 14630.0 14640.0 14650.0 14660.0 14670.0 14680.0 14690.0 14700.0 14710.0 14720.0 14730.0 14740.0 14750.0 14760.0 14770.0 14780.0 14790.0 14800.0 14810.0 14820.0 14830.0 14840.0 14850.0 14860.0 14870.0 14880.0 14890.0 14900.0 14910.0 14920.0 14930.0 14940.0 14950.0 14960.0 14970.0 14980.0 14990.0 15000.0 15010.0 15020.0 15030.0 15040.0 15050.0 15060.0 15070.0 15080.0 15090.0 15100.0 15110.0 15120.0 15130.0 15140.0 15150.0 15160.0 15170.0 15180.0 15190.0 15200.0 15210.0 15220.0 15230.0 15240.0 15250.0 15260.0 15270.0 15280.0 15290.0 15300.0 15310.0 15320.0 15330.0 15340.0 15350.0 15360.0 15370.0 15380.0 15390.0 15400.0 15410.0 15420.0 15430.0 15440.0 15450.0 15460.0 15470.0 15480.0 15490.0 15500.0 15510.0 15520.0 15530.0 15540.0 15550.0 15560.0 15570.0 15580.0 15590.0 15600.0 15610.0 15620.0 15630.0 15640.0 15650.0 15660.0 15670.0 15680.0 15690.0 15700.0 15710.0 15720.0 15730.0 15740.0 15750.0 15760.0 15770.0 15780.0 15790.0 15800.0 15810.0 15820.0 15830.0 15840.0 15850.0 15860.0 15870.0 15880.0 15890.0 15900.0 15910.0 15920.0 15930.0 15940.0 15950.0 15960.0 15970.0 15980.0 15990.0 16000.0 16010.0 16020.0 16030.0 16040.0 16050.0 16060.0 16070.0 16080.0 16090.0 16100.0 16110.0 16120.0 16130.0 16140.0 16150.0 16160.0 16170.0 16180.0 16190.0 16200.0 16210.0 16220.0 16230.0 16240.0 16250.0 16260.0 16270.0 16280.0 16290.0 16300.0 16310.0 16320.0 16330.0 16340.0 16350.0 16360.0 16370.0 16380.0 16390.0 16400.0 16410.0 16420.0 16430.0 16440.0 16450.0 16460.0 16470.0 16480.0 16490.0 16500.0 16510.0 16520.0 16530.0 16540.0 16550.0 16560.0 16570.0 16580.0 16590.0 16600.0 16610.0 16620.0 16630.0 16640.0 16650.0 16660.0 16670.0 16680.0 16690.0 16700.0 16710.0 16720.0 16730.0 16740.0 16750.0 16760.0 16770.0 16780.0 16790.0 16800.0 16810.0 16820.0 16830.0 16840.0 16850.0 16860.0 16870.0 16880.0 16890.0 16900.0 16910.0 16920.0 16930.0 16940.0 16950.0 16960.0 16970.0 16980.0 16990.0 17000.0 17010.0 17020.0 17030.0 17040.0 17050.0 17060.0 17070.0 17080.0 17090.0 17100.0 17110.0 17120.0 17130.0 17140.0 17150.0 17160.0 17170.0 17180.0 17190.0 17200.0 17210.0 17220.0 17230.0 17240.0 17250.0 17260.0 17270.0 17280.0 17290.0 17300.0 17310.0 17320.0 17330.0 17340.0 17350.0 17360.0 17370.0 17380.0 17390.0 17400.0 17410.0 17420.0 17430.0 17440.0 17450.0 17460.0 17470.0 17480.0 17490.0 17500.0 17510.0 17520.0 17530.0 17540.0 17550.0 17560.0 17570.0 17580.0 17590.0 17600.0 17610.0 17620.0 17630.0 17640.0 17650.0 17660.0 17670.0 17680.0 17690.0 17700.0 17710.0 17720.0 17730.0 17740.0 17750.0 17760.0 17770.0 17780.0 17790.0 17800.0 17810.0 17820.0 17830.0 17840.0 17850.0 17860.0 17870.0 17880.0 17890.0 17900.0 17910.0 17920.0 17930.0 17940.0 17950.0 17960.0 17970.0 17980.0 17990.0 18000.0 18010.0 18020.0 18030.0 18040.0 18050.0 18060.0 18070.0 18080.0 18090.0 18100.0 18110.0 18120.0 18130.0 18140.0 18150.0 18160.0 18170.0 18180.0 18190.0 18200.0 18210.0 18220.0 18230.0 18240.0 18250.0 18260.0 18270.0 18280.0 18290.0 18300.0 18310.0 18320.0 18330.0 18340.0 18350.0 18360.0 18370.0 18380.0 18390.0 18400.0 18410.0 18420.0 18430.0 18440.0 18450.0 18460.0 18470.0 18480.0 18490.0 18500.0 18510.0 18520.0 18530.0 18540.0 18550.0 18560.0 18570.0 18580.0 18590.0 18600.0 18610.0 18620.0 18630.0 18640.0 18650.0 18660.0 18670.0 18680.0 18690.0 18700.0 18710.0 18720.0 18730.0 18740.0 18750.0 18760.0 18770.0 18780.0 18790.0 18800.0 18810.0 18820.0 18830.0 18840.0 18850.0 18860.0 18870.0 18880.0 18890.0 18900.0 18910.0 18920.0 18930.0 18940.0 18950.0 18960.0 18970.0 18980.0 18990.0 19000.0 19010.0 19020.0 19030.0 19040.0 19050.0 19060.0 19070.0 19080.0 19090.0 19100.0 19110.0 19120.0 19130.0 19140.0 19150.0 19160.0 19170.0 19180.0 19190.0 19200.0 19210.0 19220.0 19230.0 19240.0 19250.0 19260.0 19270.0 19280.0 19290.0 19300.0 19310.0 19320.0 19330.0 19340.0 19350.0 19360.0 19370.0 19380.0 19390.0 19400.0 19410.0 19420.0 19430.0 19440.0 19450.0 19460.0 19470.0 19480.0 19490.0 19500.0 19510.0 19520.0 19530.0 19540.0 19550.0 19560.0 19570.0 19580.0 19590.0 19600.0 19610.0 19620.0 19630.0 19640.0 19650.0 19660.0 19670.0 19680.0 19690.0 19700.0 19710.0 19720.0 19730.0 19740.0 19750.0 19760.0 19770.0 19780.0 19790.0 19800.0 19810.0 19820.0 19830.0 19840.0 19850.0 19860.0 19870.0 19880.0 19890.0 19900.0 19910.0 19920.0 19930.0 19940.0 19950.0 19960.0 19970.0 19980.0 19990.0 20000.0 20010.0 20020.0 20030.0 20040.0 20050.0 20060.0 20070.0 20080.0 20090.0

Figure 2: Maximum clade credibility tree for spread of HIV-1 in central Africa. The posteriormean latitudinal drift is depicted using a color gradient along the branches. The colorsrange between red and blue, with the former indicating a southward drift and the latter anorthward drift. Tree nodes are depicted as circles of different sizes. The size of each circle isdetermined by the longitude of the observed or inferred location corresponding to the node.Larger circles represent more northward locations.

Page 17: A Relaxed Drift Di usion Model for Phylogenetic Trait ... · fusion model for evolution of multivariate traits in a Bayesian framework that integrates it with models for phylogenetic

For latitudinal drift, we estimate 28.13 rate changes with BCI (27, 29). Furthermore,relaxed drift is supported over constant drift with a Bayes factor greater than 1,000. Theoverall posterior mean drift rate across all branches is -0.03 with BCI (-0.05, -0.01). Figure2 depicts an annotated maximum clade credibility tree colored according to latitudinal driftrates. The color gradient ranges from red to black to blue, with red indicating a southwarddrift, blue a northward drift, and black a neutral drift. As in Figure 1, tree nodes aredepicted as circles of varying sizes, with larger circles representing greater observed or inferredlatitude coordinates. Notably, external branches with positive drift rates lead to samplesfrom locations north of the origin (Bwamanda and Kisangani), while external branches withnonpositive drift rates lead to samples from locations with latitudes south of or similar tothe origin.

Importantly, adopting a zero-mean displacement distribution when the diffusion pro-cess is more accurately described with a nontrivial mean can result in inflated displacementvariance rate estimates. (Recall that the displacement variance along a branch of length ti istiP−1, so by “displacement variance rates,” we mean the diagonal elements of the variance

rate matrix P−1.) By incorporating drift into the model, we are able to disentangle thedrift from the variance and uncover a clearer picture of the movement. The variance rateestimates in Table 1 illustrate this point. Including a constant drift reduces the variancerate of the displacement of the longitude coordinate from 0.59 to 0.37. For the latitude,on the other hand, the variance rate decreases modestly from 0.25 to 0.23 and the BCIs of(0.23, 0.27) and (0.21, 0.25) overlap. The lack of an appreciable reduction in variance ratemay be explained by the fact that the apparent northward drift to Bwamanda and Kisan-gani remains unaccounted for by inclusion of a constant southward drift rate. Indeed, byaccommodating drift rate changes under the relaxed drift model, the variance rate of thelatitudinal displacement drops to 0.13 with BCI (0.12, 0.14).

4 West Nile Virus

West Nile virus (WNV) is the most important mosquito-transmitted arbovirus nownative to the U.S. Birds are the most common host, although WNV has been known toinfect other animals including humans. About 70-80% of infected humans do not developany symptoms, and most of the remaining infections result in fever and other symptomssuch as headaches, body aches, joint pains, vomiting, diarrhea or rash. Fewer than 1% ofinfected humans will develop a serious neurologic illness such as encephalitis or meningitis.However, recovery from neurologic infection can take several months, some neurologic effectscan be permanent, and about 10% of such infections lead to death. Since 1999, WNV hasbeen responsible for more than 1,500 deaths in the United States (CDC 2013).

WNV was first detected in the United States in New York City in August 1999, and ismost closely related to a highly pathogenic WNV lineage isolated in Israel in 1998 (Lanciottiet al. 1999). Surveillance records of WNV incidence show a wave of infection that spreadwestward and reached the west coast by 2004 (CDC 2013). Thus the spread of WNV inNorth America can be naturally modeled as a spatial diffusion process with a directional

Page 18: A Relaxed Drift Di usion Model for Phylogenetic Trait ... · fusion model for evolution of multivariate traits in a Bayesian framework that integrates it with models for phylogenetic

trend.We analyze a data set of 104 WNV complete genome sequences sampled between

1999 and 2008 and isolated from a number of different host and vector species (Pybus et al.2012). The sequences were sampled from a wide variety of locations in the United Statesand Mexico. We represent the sampling locations as bivariate traits consisting of latitudeand longitude coordinates and conduct a phylogeographic analysis using our Brownian driftdiffusion model. Results are presented in Table 2.

No Drift Constant DriftDrift (Lat.) - - -0.25 (-1.05, 0.08)Variance (Lat.) 6.74 (4.83, 15.81) 6.38 (4.60, 14.61)Drift (Long.) - - -1.77 (-3.43, -0.27)Variance (Long.) 23.37 (16.95, 54.04) 20.06 (14.67, 45.52)Correlation 0.22 (0.03, 0.43) 0.18 (-0.03, 0.36)

Table 2: Spatiotemporal dynamics of West Nile virus in North America. Model comparison ofBrownian diffusion with no drift and constant drift Brownian diffusion. We report posteriormean estimates along with 95% BCIs. Drift rates for latitude and longitude coordinates arereported in units of degrees per year.

We begin by analyzing the data with the constant drift model. For the longitude, theposterior mean drift rate is -1.77 degrees per year with a 95% BCI of (-3.43, -0.27). Thisindicates a significant westward trend in the spread of WNV, consistent with WNV incidencedata. Furthermore, a Bayes factor of 18.23 lends considerable support to the constant driftmodel over the drift-neutral model. For the latitude, we have an estimated posterior meandrift of -0.25 degrees per year, but the 95% BCI of (-1.05, 0.08) contains zero. Moreover,the Bayes factor in favor of constant drift over no drift is 0.71. Therefore there is not muchevidence of a significant North-South drift. We check for the possibility of multiple driftrates under the relaxed drift model. However, the inferred number of rate changes in thelatitudinal drift is 0.63 with BCI (0, 2), and the Bayes factor in favor of relaxed drift overconstant drift is 0.87. Similarly, for the longitude coordinate, we estimate 0.71 rate changeswith BCI (0, 2), and Bayes factor of 1.04. Thus the data do not support relaxed drift overconstant drift.

While inclusion of drift leads to smaller displacement variance rates in our analysis ofHIV-1 dispersal dynamics, Table 2 shows that not to be the case for the WNV data. Evenfor the longitude coordinate where there is significant evidence of a nontrivial drift, there isa great deal of BCI overlap in the estimated displacement variance rate for drift-neutral andconstant drift diffusion. This difference can be explained by comparing the magnitude ofthe displacement drift relative to the displacement standard deviation in the WNV and HIVexamples. Here, we work with displacement standard deviation rather than displacementvariance in order to make comparisons on the same scale. Recall that the displacement meanand variance along a branch are both proportional to the branch length. To get an estimate

Page 19: A Relaxed Drift Di usion Model for Phylogenetic Trait ... · fusion model for evolution of multivariate traits in a Bayesian framework that integrates it with models for phylogenetic

of the average displacement mean and standard deviation for each example, we use the meanbranch length. The mean branch length is 16.9 years for the HIV data set, and 2.4 yearsfor the WNV data set. Under constant drift, we get an average longitudinal displacementmean of 5.07 for the HIV data, and -4.25 for the WNV data. The average longitudinaldisplacement standard deviation without drift is 3.25 for the HIV data, and 7.49 for theWNV data. In the HIV analysis, the average displacement mean with drift is 1.61 times asmuch as the average displacement standard deviation without drift. In the case of the WNV,the absolute value of the average displacement mean with drift is 0.57 times as much as theaverage displacement standard deviation without drift. So while the displacement standarddeviation in the drift-neutral HIV analysis is inflated by the hidden drift, the latent driftrepresents a much smaller contribution to the drift-neutral displacement standard deviationin the WNV analysis.

5 HIV-1 Resistance to Broadly Neutralizing Antibod-

ies

It is widely believed that a successful HIV-1 vaccine will require the elicitation ofneutralizing antibodies (Johnston and Fauci 2007; Barouch 2008; Walker and Burton 2008).Most neutralizing antibodies are strain-specific and therefore not so attractive for vaccinedesign (Weiss et al. 1985; Mascola and Montefiori 2010). It is important to identify andcharacterize antibody specificities that are effective against a large number of currentlycirculating HIV-1 variants (Burton 2002, 2004). Several broadly neutralizing monoclonalantibodies have been recently isolated, including PG9 and PG16 (Walker et al. 2009), andVRC01 (Zhou et al. 2010).

Studies comparing viruses isolated from individuals who seroconverted early in theHIV-1 epidemic to viruses from individuals who seroconverted in recent years have shownthat HIV-1 has become increasingly resistant to antibody neutralization over the courseof the epidemic (Bunnik et al. 2010; Euler et al. 2011; Bouvin-Pley et al. 2013). Bunniket al. (2010) demonstrate a decreased sensitivity to polyclonal antibodies and to monoclonalantibody b12. Euler et al. (2011) extend those findings by investigating whether HIV-1 hasadapted to the neutralization activity of PG9, PG16, and VRC01. Their results show thatHIV-1 has become significantly more resistant to neutralization by VRC01 and also providesome support for increased resistance to neutralization by PG16.

These studies typically do not account for phylogenetic dependence among the sam-pled viruses. Vrancken et al. (2015) examine the data set of Euler et al. (2011) with aBrownian diffusion trait evolution model that simultaneously infers phylogenetic signal, thedegree to which resemblance in traits reflects phylogenetic relatedness. They find moder-ate phylogenetic signal and, through ancestral trait value reconstruction, more evidence ofdecreased sensitivity of HIV-1 to VRC01 and PG16 neutralization at the population level.

We follow up on the analysis of Vrancken et al. (2015) by incorporating drift into theBrownian trait evolution. The data set is comprised of clonal HIV-1 variants from “historic”

Page 20: A Relaxed Drift Di usion Model for Phylogenetic Trait ... · fusion model for evolution of multivariate traits in a Bayesian framework that integrates it with models for phylogenetic

and “contemporary” seroconverters with an acute or early subtype B HIV-1 infection. The14 historic seroconverters have a known seroconversion date between 1985 and 1989, andthe 21 contemporary seroconverters have a seroconversion date between 2003 and 2006. Thepercent neutralization is determined by calculating the reduction in p24 production in thepresence of the neutralizing agent compared to the p24 levels in the cultures with virusonly. The trait values of interest are 50% inhibitory concentration (IC50) assay values thatsummarize the percent neutralization by antibodies PG9, PG16 and VRC01, measured inunits of µg/ml. We take the log-transform of IC50 values in order to ensure that concentrationvalues are strictly positive under the diffusion process. Higher log(IC50) values correspondto greater resistance to antibody neutralization. For viruses with log(IC50) values that falloutside the tested antibody concentration range, we integrate out the concentration over aplausible IC50 interval.

First, we analyze the data with the constant drift model (see Table 3). The resultsare essentially consistent with the findings of Euler et al. (2011). For VRC01, we estimate aposterior mean drift of 0.15 with 95% BCI (0.06, 0.24), signaling a significant drift towardhigher resistance to VRC01 neutralization. Furthermore, a Bayes factor of 32.33 lends strongsupport to a constant drift over no drift. There is not as much evidence of a trend for PG16.On one hand, the posterior probability that the drift rate is positive is 0.953, providing somecorroboration for a decreased sensitivity to PG16 neutralization. However, we infer a meandrift rate of 0.10 with a 95% BCI of (-0.02, 0.20) that contains zero. Furthermore, the Bayesfactor in favor of constant drift over no drift is just 1.32, showing little support for inclusionof a drift term. We do not detect a significant drift in the case of PG9: the posterior meanis 0.08 with 95% BCI (-0.05, 0.19) and the Bayes factor is 1.17.

To take a closer look, we fit the data to the relaxed drift model. Along with branchspecific drift rates, we examine the posterior mean rates over the entire evolutionary history(see Table 3). First, we consider the results for PG9 and PG16. The posterior mean driftrates are similar to the inferred drift rates under the constant drift model, and their 95%BCIs contain 0. There is little support for any drift rate changes occurring along the phy-logeny. The mean estimated number of rate changes are 0.17 and 0.2 for PG9 and PG16,respectively, and the Bayes factors in favor of relaxed drift over constant drift are 0.19 and0.20. Hence there is not much evidence of localized directional trends that differ from theoverall directional trends.

We illustrate the evolutionary pattern for resistance against VRC01 under the relaxeddrift model in Figure 3. The branches of the maximum clade credibility tree are coloredaccording to the inferred branch-specific mean drift rates, and the tree tips are labeled withsubject identifications. We obtain a posterior mean estimate of 1.13 rate changes, with aposterior probability of 0.88 for exactly one rate change and probability greater than 0.99for at least one rate change. Furthermore, the Bayes factor in favor of relaxed drift overconstant drift is 359.04, providing very strong support for relaxed drift. As shown in Table4, the displacement variance rate decreases as we move from no drift to a constant drift,and then to relaxed drift. Figure 3 shows a rate change occurring at the common ancestralnode of samples from subjects P001 and P002. Apart from the two branches leading to tips

Page 21: A Relaxed Drift Di usion Model for Phylogenetic Trait ... · fusion model for evolution of multivariate traits in a Bayesian framework that integrates it with models for phylogenetic

P001 and P002 (which we refer to as “branch P001” and “branch P002,” respectively), thebranches have essentially identical mean drift rates of about 0.15 with 95% BCI (0.09,0.21).For the blue-colored branch P001, we have an estimated drift of 2.13 with 95% BCI (0.06,4.89), and for the red-colored branch P002 the estimated drift is -1.18 with 95% BCI (-4.46,0.22). Both estimated drift rates are drastically different from the parent branch rate, andtheir BCIs are also much wider. If either subject P001 or P002 is deleted from the dataset, we infer a constant, significant drift rate of 0.15 over the entire evolutionary history.Notably, we do not infer any drift rate changes after deleting either P001 or P002.

Antibody Constant Drift Relaxed DriftPG9 0.08 (-0.05, 0.19) 0.07 (-0.04, 0.18)PG16 0.10 (-0.02, 0.20) 0.11 (-0.01, 0.24)VRC01 0.15 (0.06, 0.24) 0.15 (0.09, 0.21)

Table 3: HIV-1 resistance to broadly neutralizing antibodies. Mean drift rates under constantand relaxed drift models for log(IC50) measurements corresponding to monoclonal neutraliz-ing antibodies PG9, PG16, and VRC01. Higher log(IC50) values represent lower sensitivityto antibody neutralization, and positive drift rates indicate a trend over time toward greaterresistance. We report posterior means along with 95% BCIs.

Displacement VarianceAntibody No Drift Constant Drift Relaxed DriftPG9 0.28 (0.18, 0.57) 0.26 (0.16, 0.53) 0.26 (0.17, 0.55)PG16 0.26 (0.17, 0.51) 0.23 (0.15, 0.49) 0.23 (0.15, 0.51)VRC01 0.19 (0.11, 0.43) 0.13 (0.08, 0.27) 0.06 (0.04, 0.14)

Table 4: HIV-1 resistance to broadly neutralizing antibodies. Displacement variance rateunder drift-neutral, constant drift and relaxed drift models for log(IC50) measurements cor-responding to monoclonal neutralizing antibodies PG9, PG16, and VRC01. We report pos-terior means along with 95% BCIs.

Under the relaxed drift model, situations in which both child branches have drift ratesthat differ from the drift rate of the parent branch must be handled with care. In any giventree in the posterior sample, one of the two branches must inherit its drift rate from theparent branch. An examination of the posterior distribution of the rate change indicator atthe parent node of tips P001 and P002 reveals that branch P001 inherits the parent branchdrift rate with posterior probability of about 40% and branch P002 inherits the parent ratewith the other 60% probability mass. Thus the posterior drift rate estimate for each ofbranches P001 and P002 averages over the cases where it inherits the parent drift rate andthe cases where it differs from the parent rate. While the potentially large departures fromthe parent drift rate still come through in the “averaged” posterior estimates, it is of interest

Page 22: A Relaxed Drift Di usion Model for Phylogenetic Trait ... · fusion model for evolution of multivariate traits in a Bayesian framework that integrates it with models for phylogenetic

Figure 3: Maximum clade credibility tree depicting evolutionary pattern of HIV-1 resistanceto neutralization by antibody VRC01. Subject identifiers corresponding to tree tips arelisted to the right of the tree. The posterior mean drift is depicted using a color gradientalong the branches. Positive drift rates correspond to a trend toward greater resistance toneutralization. The estimated posterior mean drift rate along the purple colored branchesthat make up most of the tree is 0.15. A drift rate change is inferred at the common ancestorof tips P001 and P002. The red colored branch leading to the tip P002 has an estimatedposterior mean drift of -1.18, and the blue colored branch leading tip P001 has an estimatedposterior mean drift of 2.13. Tree tips and internal nodes are annotated with observed andinferred log(IC50) values.

Page 23: A Relaxed Drift Di usion Model for Phylogenetic Trait ... · fusion model for evolution of multivariate traits in a Bayesian framework that integrates it with models for phylogenetic

to find out how representative they are of the true drift rates. It is conceivable that the“inherited” portion of the posterior may bias the estimate, shifting the mean and wideningthe BCI. In the case of branch P002, for example, the posterior mass near the parent driftrate extends the BCI into the positive axis so that it includes zero. It is also conceivablethat the “inherited” part of the posterior represents a part of the distribution that wouldshow up even without the restrictions of the relaxed drift model.

To elucidate the true nature of the drift rate change that occurs at the parent oftips P001 and P002, we conduct a follow-up analysis. We introduce a new parameterizationof our drift diffusion model that posits three unique drift rates: one each corresponding tobranches P001 and P002, and another for all remaining branches in the phylogenetic tree.The results, presented in Table 5, are similar to the findings from the relaxed drift model.Notably, the 95% BCIs for the drift on branches P001 and P002 still contain the range ofcredible values for the parent drift rate. There are also some key differences between the twoanalyses. The distributions for the drift on branches P001 and P002 are bell-shaped ratherthan bimodal, and the 95% BCIs are wider than under the relaxed drift model. Unlike therelaxed drift model estimate, the 95% BCI for the drift on branch P001 contains 0. However,it has a 0.95 posterior probability of being positive, so there is still some support that thedrift on branch P001 is statistically significant.

An examination of the maximum clade credibility tree in Figure 3, annotated withobserved log(IC50) values and inferred ancestral trait realizations, clarifies why we infer a driftrate change. Consider node triples consisting of two nodes and their common parent node.The triple of tips P001, P002 and their common ancestor features a relatively large differencein trait values at the child nodes as well as relatively short branch lengths connecting nodesP001 and P002 to their parent. While there are other triples with child nodes possessing acomparable difference in log(IC50) values, they have much longer branches leading from theparent node to the children. Similarly, while other triples feature relatively short branchesconnecting the parent to the children, the trait values at the child nodes do not differ asmuch. The unique combination of short branches coupled with a large difference betweenlog(IC50) values at the child nodes explains why the drift rate present on most of the treemay be incompatible with the triple of P001, P002 and their parent.

Relaxed Drift Fixed-ChangesBranch Mean 95% BCI P(Drift > 0) Mean 95% BCI P(Drift > 0)P001 2.13 (0.06, 4.89) > 0.99 2.27 (-0.36, 5.10) 0.95P002 -1.18 (-4.46, 0.22) 0.60 -1.81 (-4.61, 1.03) 0.10Other 0.15 (0.09, 0.21) > 0.99 0.14 (0.08, 0.20) > 0.99

Table 5: Rate changes in HIV-1 drift toward resistance to antibody VRC01. We reportposterior means, 95% BCIs, and posterior probabilities that the drift > 0.

Although there is strong evidence of a drift rate change at the ancestral node of sam-ples P001 and P002, the wide BCIs for the branch P001 and branch P002 drift rate estimates

Page 24: A Relaxed Drift Di usion Model for Phylogenetic Trait ... · fusion model for evolution of multivariate traits in a Bayesian framework that integrates it with models for phylogenetic

suggest that they are poorly informed by the data. The type of drift rate change that occursat the ancestral node of tips P001 and P002 also remains unclear. Subjects P001 and P002are a transmission couple and it appears that one person mounted a very different antibodyresponse than the other to a highly similar virus. Further research may clarify the situation.Nevertheless, we infer a clear, significant drift towards increased resistance to neutralizationby VRC01, and it is robust to deletion of either subject P001 or P002. At the populationlevel, the phylogenetic structure of HIV is “starlike,” featuring multiple co-circulating lin-eages, the dynamics of which generally reflects neutral epidemiological processes (Grenfellet al. 2004). It is therefore notable to find evidence of population level evolution towardsincreased resistance.

6 Discussion

Standard Brownian diffusion is a popular and, in many ways, natural starting pointfor modeling continuous trait evolution in a phylogenetic context. On the other hand, it isvery restrictive and may not adequately describe the dynamics of the underlying evolutionaryprocess. Development of non-Brownian models, such as mean-reverting Ornstein-Uhlenbeckprocesses, represents a promising avenue. However, substantial gains can also be madethrough building upon standard Brownian diffusion approaches. For example, the displace-ment along a branch is typically assumed to have variance equal to the product of the branchlength and a diffusion variance rate matrix P−1, where P−1 does not vary along the phylo-genetic tree. Lemey et al. (2010) demonstrate improvements by relaxing this homogeneityassumption via branch-specific diffusion rate scalars that yield a mixture of Brownian pro-cesses. Here, we show that progress towards a more realistic trait diffusion can be madeby relaxing the assumption of a zero-mean displacement. Furthermore, the drift diffusionapproach we consider is very general. Notably, the Ornstein-Uhlenbeck process is nestedwithin the drift diffusion process defined by

Yi|Ypa(i) ∼ N(β1(ti)Ypa(i) + β2(ti)µi,Σ(ti)

). (45)

Consider the special case where µi = µ for every branch, and

β1(ti) = e−αti , β2(ti) = 1− e−αti , and Σ(ti) =σ2

[1− e−2αti

]. (46)

This is equivalent to an Ornstein-Uhlenbeck process on the phylogenetic tree defined by thestochastic differential equation

dYt = α(µ−Yt)dt+ σdWt, (47)

where Wt is a standard Brownian diffusion process. Here, µ can be thought of as an optimaltrait value, α represents the strength of selection towards µ, and σ2 is the variance of theBrownian diffusion component. Such generality enables formal testing between a wide classof different Gaussian process models.

Page 25: A Relaxed Drift Di usion Model for Phylogenetic Trait ... · fusion model for evolution of multivariate traits in a Bayesian framework that integrates it with models for phylogenetic

We introduce a flexible new Bayesian framework for phylogenetic trait evolution,modeling the evolutionary process as Brownian diffusion with a nontrivial drift. By allowingan estimable mean vector in the displacement distribution, we can account for and quantifya directional trend. However, imposing a constant drift rate can make for an unrealisticapproximation of the underlying process. We overcome this limitation through the relaxeddrift model. The relaxed drift model permits drift rate variation along a phylogenetic treewhile maintaining model identifiability. Drift rates are generally passed on from parentbranches to child branches, and variation is achieved by allowing at most one branch ofany given pair of child branches to assume a different drift rate from their common parentbranch.

The utility of incorporating drift into the diffusion is corroborated by our analyses ofthree viral examples. We apply our methodology to both geographic traits in a phylogeo-graphic setting as well as phenotypic traits. Our phylogeographic analysis of the spread ofHIV-1 in central Africa confirms the findings obtained by discrete phylogeographic inference(Faria et al. 2014). Drift diffusion models fit the data better than drift-neutral Browniandiffusion, and we uncover directional trends in the dispersal of the virus from its origin tosampling locations. We also see that drift rate variation characterizes real spatiotemporaldiffusion processes. The absence of drift in the diffusion model can lead to conflation ofthe latent drift with other parameters, particularly the displacement variance. Our analy-sis of the spread of HIV-1 illustrates how inferred displacement variance rates can decreasewith appropriate drift rate modeling, revealing a clearer, more detailed picture of dispersaldynamics.

While it is tempting to assume drift rate variation and seek out the additional insightit may provide, the data may not support multiple drift rates. This may be the case evenwhen there is a significant constant drift, as we see in our analysis of the West Nile virus.Parameterizing the model to allow the maximal number of unique drift rates can resultin numerous small rate changes that are a consequence of the modeling choice and are notnecessarily driven by the data. Our relaxed drift framework overcomes this issue by inferringthe locations and types of rate changes directly from the data as opposed to making a prioriassumptions about the number of unique drift rates and their appropriate assignments.Bayesian stochastic search variable selection enables efficient exploration of all possible driftrate configurations.

Although our focus has been on drift, a major strength of our approach is its im-plementation in the larger Bayesian phylogenetic framework of BEAST. Through BEAST,we have access to a plethora of different models for molecular character substitution, demo-graphic history and molecular clocks. Bayesian inference provides a natural framework forcontrolling for different sources of uncertainty in evolutionary models, including the phyloge-netic tree and trait and sequence evolution parameters, and testing evolutionary hypotheses.

The gains from introducing drift into our real data examples are encouraging, andthere is a need for continued development of more realistic trait evolutionary models. Weanticipate that our drift diffusion approach will be useful in other scenarios not examinedhere, including antigenic drift in influenza. Antigenic drift is the process by which influenza

Page 26: A Relaxed Drift Di usion Model for Phylogenetic Trait ... · fusion model for evolution of multivariate traits in a Bayesian framework that integrates it with models for phylogenetic

viruses evolve to evade the immune system, and an understanding of its dynamics is essentialto public health efforts. Bedford et al. (2014) have recently developed an integrated approachto mapping antigenic phenotypes that combines it with genetic information. It may befruitful to model the diffusion of the antigenic phenotype in their framework with a relaxeddrift.

While the relaxed drift model has proven to be flexible and useful, its identifiabilityrestrictions may render it inappropriate for some evolutionary scenarios. For example, once adrift rate appears anywhere in the phylogenetic tree, the restrictions mandate that it must bepassed on and “survive” until it reaches an external branch. Lack of support for the survivalof a specific drift rate may not preclude its inclusion under relaxed drift. In the analysisof HIV-1 resistance to neutralization, for example, the parent branch of branches P001 andP002 must pass on its drift rate to one of the two child branches in each tree in the posteriorsample. Yet, the branch which is forced to inherit the parent drift rate alternates betweenP001 and P002 in the posterior sample, resulting in posterior drift distributions for bothbranches that differ from that of their parent drift. However, this unnatural mechanismof deflecting an unsupported drift rate may contribute to misleading drift rate estimates.It would be preferable to sidestep such problems by developing alternative models thataccommodate drift rate variation while retaining identifiability.

Acknowledgements

The research leading to these results has received funding from the European ResearchCouncil under the European Community’s Seventh Framework Programme (FP7/2007-2013)under Grant Agreement no. 278433-PREDEMICS and ERC Grant agreement no. 260864 andthe National Institutes of Health (R01 AI107034, R01 HG006139, R01 LM011827 and T32AI007370) and the National Science Foundation (DMS 1264153).

Appendix

A Identifiability

To ensure that our results are meaningful, it is important to understand the conditionsunder which our model is identifiable. For convenience, and without loss of generality, weassume here that traits are one-dimensional. Following the development in section 2.1, theobserved traits Y = (Y1, . . . ,YN)t at the tips of the phylogeny τ are multivariate-normaldistributed:

P (Y|Y2N−1,P,Vτ ,µτ ) = MVN(Y; Yroot + Tµτ ,P

−1 ⊗Vτ

). (48)

Here, Yroot is an N × 1 vector with the root trait Y2N−1 repeated N times. The vectorµτ = (µ1, . . . ,µ2N−2)

t consists of branch-specific drift rates µi. The N ×N variance matrixVτ is a deterministic function of τ and represents the contribution of the phylogenetic tree

Page 27: A Relaxed Drift Di usion Model for Phylogenetic Trait ... · fusion model for evolution of multivariate traits in a Bayesian framework that integrates it with models for phylogenetic

to the covariance structure. Its diagonal entries Vii are equal to the distance in time betweenthe tip Vi and the root node V2N−1, and off-diagonal entries Vij correspond to the distancein time between the root node V2N−1 and the most recent common ancestor of tips Vi andVj. Finally, the N × (2N − 2) matrix T is defined as follows: Tij = tj, the length of branchj, if branch j is part of the path from the external node i to the root, and Tij = 0 otherwise.In other words, the ith row of T specifies the path of branches in τ connecting external nodei to the root.

Let V(µτ ) denote the vector space of permissible values of µτ for our model. Withrespect to the drift, the model is identifiable if the equality

P (Y|Y2N−1,P,Vτ ,µτ ) = P (Y|Y2N−1,P,Vτ ,µ∗τ ) (49)

implies that µτ = µ∗τ . Because the drift appears only in the mean of the distribution, wehave an identifiability problem if the same mean E(Y) can be realized from different valuesof µτ . In other words, if there exist µτ 6= µ∗τ such that Tµτ = Tµ∗τ . This can happen if andonly if the linear transformation T has a nontrivial kernel. We know that

dimV(µτ ) = dim ker(T) + dim range(T), (50)

and we also know that for any phylogeny τ , T is of full rank because its rows are linearlyindependent. It follows that for the kernel of T to be trivial, we must have

dimV(µτ ) ≤ N. (51)

If we allow a unique drift rate on each branch of τ , we have V(µτ ) = R2N−2. Therefore wemust take a different approach.

It is illuminating to look at identifiability from the perspective of linear equations.For a given µτ , T maps µτ to an N × 1 vector γ:

Tµτ = γ. (52)

Identifiability is then equivalent to the system (52) of N linear equations having a uniquesolution.

To achieve identifiability, we introduce the relaxed drift model. Starting with a driftrate on the unobserved branch leading to the root node and moving down the tree towardthe external nodes, every time a branch splits into two branches, one of two things happens.Either both of the child branches inherit the drift rate of the parent branch, or exactly one ofthe child branches inherits the drift rate from the parent branch while the other gets a newdrift rate. Both child branches taking on different drift rates than the parent branch is notpermitted. To avoid confusion, we continue to denote the 2N − 2 branch-specific drift ratesas µ1, . . . ,µ2N−2, with the understanding that they are not all unique. We let µ∗1, . . . ,µ

∗K

denote the unique drift rates, where K ≤ N .

Definition: We say that a row Ti = (Ti1, Ti2, . . . , Ti,2N−2) in T is µ∗k − dominated if itsassociated path from the root of τ to a tip ends with a branch with drift rate µ∗k. We also

Page 28: A Relaxed Drift Di usion Model for Phylogenetic Trait ... · fusion model for evolution of multivariate traits in a Bayesian framework that integrates it with models for phylogenetic

refer to the sum Tiµτ =∑2N−2

j=1 Tijµj and the path associated with Ti as µ∗k − dominated.

Note that each unique drift rate dominates at least one path.

Definition: An initial branch of the rate µ∗k is a branch whose parent branch has a differentdrift rate. The unobserved branch leading to the root node is also defined to be an initialbranch.

Observe that every branch with drift µ∗k is an initial branch of µ∗k or a descendant of aninitial branch of µ∗k. A drift rate may have more than one initial branch. In order to quantifyhow deep into the tree τ a drift rate extends (starting from the tips and going toward theroot), we make the following definition.

Definition: By a descendant path of a branch b, we mean a series of connected branches,starting with a child branch of b and ending with a branch leading to a tip. The depth of abranch b is equal to the maximal number of branches in descendant paths of b. The depthof a drift rate µ∗k is equal to the maximal depth of its initial branches.

For example, if µ∗k has one initial branch leading to a tip, then µ∗k has depth 0. If thenumber of unknowns K is less than the number of equations N in the system Tµτ = γ, aunique solution can be established by working with a reduced system. We form the reducedsystem by choosing K of the N rows in T, say Ti1 , . . . ,TiK , such that each is dominatedby a different drift rate. If a drift rate dominates more than one path, we choose a pathcontaining a maximal depth initial branch of the rate for the reduced system.

Claim: The relaxed drift model is identifiable.

Proof: The reduced linear system

2N−2∑j=1

Ti1jµj = γi1 (53)

2N−2∑j=1

Ti2jµj = γi2 (54)

. . . (55)2N−2∑j=1

TiKjµj = γiK , (56)

consists of K equations and K variables. Therefore to show that the solution is unique, itsuffices to show that the linear system is independent. To establish independence, it sufficesto show that if

a1

2N−2∑j=1

Ti1jµj + a2

2N−2∑j=1

Ti2jµj · · ·+ aN

2N−2∑j=1

TiKjµj = 0, (57)

for some constants a1, . . . , aK , then we must have

a1 = a2 = · · · = aK = 0. (58)

Page 29: A Relaxed Drift Di usion Model for Phylogenetic Trait ... · fusion model for evolution of multivariate traits in a Bayesian framework that integrates it with models for phylogenetic

Suppose (57) holds. The idea behind the proof is as follows: we consider all drift rates ofdepth 0, conclude that each sum in (57) dominated by a drift rate of depth 0 must have itscorresponding coefficient ai = 0, then consider all drift rates of depth 1, conclude that eachsum in (57) dominated by a drift rate of depth 1 must have corresponding coefficient ai = 0,and so on until we have gone through all possible depth values of drift rates in τ .

Suppose µ∗k has depth 0. Then µ∗k only appears in the single µ∗k−dominated sumand cannot be canceled out by a linear combination of the other sums. This forces thecoefficient ai of the µ∗k−dominated sum in (57) to be equal to zero. Having shown that anysum dominated by a drift rate of depth 0 must have a zero coefficient in (57), we can moveon to the case of depth 1. Rather than handle the case of depth 1 separately, we present ageneral argument.

Suppose the coefficients of all sums in (57) that are dominated by drift rates of depthless than m have been shown to be zero. Consider drift rates of depth m. If µ∗k has depthm, then it appears in the µ∗k−dominated path, and it may appear in paths dominated byother rates. Suppose µ∗k appears in a path Pi dominated by a different rate, say µ∗i . Byconstruction of the reduced system, Pi contains an initial branch bi of µ∗i of maximal depth.This means the depth of bi is equal to the depth of µ∗i . Because bi is a descendant of abranch with rate µ∗k, the depth of µ∗i must be less than the depth of µ∗k. But sums dominatedby rates of depth less than m have already been shown to have zero coefficients in (57). Thusthe sum associated with Pi has coefficient zero. Because µ∗k appears in only one sum whichis not already known to have a zero coefficient, the µ∗k−dominated sum, it follows that inorder for (57) to hold, the µ∗k−dominated sum must also have a zero coefficient. Thereforesums dominated by drift rates of depth m must have zero coefficients in (57).

Invoking this argument until we have gone through all possible values of drift ratedepth, it follows that a1 = a2 = · · · = aK = 0. �

Page 30: A Relaxed Drift Di usion Model for Phylogenetic Trait ... · fusion model for evolution of multivariate traits in a Bayesian framework that integrates it with models for phylogenetic

References

Allicock, O. M., P. Lemey, A. J. Tatem, O. G. Pybus, S. N. Bennett, B. A. Mueller, M. A.Suchard, J. E. Foster, A. Rambaut, and C. V. Carrington. 2012. Phylogeography andpopulation dynamics of dengue viruses in the americas. Molecular Biology and Evolution29:1533–1543.

Baele, G. and P. Lemey. 2013. Bayesian evolutionary model testing in the phylogenomics era:matching model complexity with computational efficiency. Bioinformatics 29:1970–1979.

Baele, G., P. Lemey, T. Bedford, A. Rambaut, M. A. Suchard, and A. V. Alekseyenko.2012. Improving the accuracy of demographic and molecular clock model comparison whileaccommodating phylogenetic uncertainty. Molecular Biology and Evolution 29:2157–2167.

Baele, G., W. L. S. Li, A. J. Drummond, M. A. Suchard, and P. Lemey. 2013. Accuratemodel selection of relaxed molecular clocks in Bayesian phylogenetics. Molecular Biologyand Evolution 30:239–243.

Barouch, D. 2008. Challenges in the development of an HIV-1 vaccine. Nature 455:613–619.

Bartoszek, K., J. Pienaar, P. Mostad, S. Andersson, and T. Hansen. 2012. A phylogeneticcomparative method for studying multivariate adaptation. Journal of Theoretical Biology314:204–215.

Bedford, T., M. Suchard, P. Lemey, G. Dudas, V. Gregory, A. Hay, J. McCauley, C. Russell,D. Smith, and A. Rambaut. 2014. Integrating influenza antigenic dynamics with molecularevolution. Elife 3:e01914.

Bennett, S., A. Drummond, D. Kapan, M. Suchard, J. Munoz-Jordan, O. Pybus, E. Holmes,and D. Gubler. 2010. Epidemic dynamics revealed in dengue evolution. Molecular Biologyand Evolution 27:811–818.

Biek, R., J. Henderson, L. Waller, C. Rupprecht, and L. Real. 2007. A high-resolution geneticsignature of demographic and spatial expansion in epizootic rabies virus. PNAS 104:7993–7998.

Bikandou, B., M. Ndoundou-Nkodia, F. Niama, M. Ekwalanga, O. Obengui, R. Taty-Taty,H. Parra, and S. Saragosti. 2004. Genetic subtyping of gag and env regions of HIV type 1isolates in the Republic of Congo. AIDS Research and Human Retroviruses 20.

Bikandou, B., J. Takehisa, I. Mboudjeka, E. Ido, T. Kuwata, Y. Miyazaki, H. Moriyama,Y. Harada, Y. Taniguchi, H. Ichimura, M. Ikeda, P. Ndolo, M. Nzoukoudi, R. M’Vouenze,M. M’Pandi, H. Parra, P. M’Pele, and M. Hayami. 2000. Genetic subtypes of HIV type 1in Republic of Congo. AIDS Research and Human Retroviruses 16:613–619.

Bloomquist, E. W., P. Lemey, and M. A. Suchard. 2010. Three roads diverged? routes tophylogeographic inference. Trends in Ecology & Evolution 25:626–632.

Page 31: A Relaxed Drift Di usion Model for Phylogenetic Trait ... · fusion model for evolution of multivariate traits in a Bayesian framework that integrates it with models for phylogenetic

Bouvin-Pley, M., M. Morgand, A. Moreau, P. Jestin, C. Simonnet, L. Tran, C. Goujard,L. Meyer, F. Barin, and M. Braiband. 2013. Evidence for a continuous drift of the HIV-1species towards higher resistance to neutralizing antibodies over the course of the epidemic.PLoS Pathogens 9:e1003477.

Brown, R. 1828. A brief account of microscopial observations made in the months of June,July and August, 1827, on the particles contained in the pollen of plants; and on the generalexistence of active molecules in organic and inorganic bodies. Philosophical Magazine4:161–173.

Bunnik, E., Z. Euler, M. Welkers, B. Boeser-Nunnink, M. Grijsen, J. Prins, andH. Schuitemaker. 2010. Adaptation of HIV-1 envelope gp120 to humoral immunity ata population level. Nature Medicine 16:995–997.

Burton, D. 2002. Antibodies, viruses and vaccines. Nature Reviews Immunology 2:706–713.

Burton, D. 2004. HIV vaccine design and the neutralizing antibody problem. Nature Im-munology 5:233–236.

Butler, M. and A. King. 2004. Phylogenetic comparative analysis: a modeling approach foradaptive evolution. American Naturalist 164.

CDC. 2013. Centers for disease control and prevention, West Nile Virus. http://www.cdc.gov/westnile.

Chipman, H., E. George, and R. McCulloch. 2001. The practical implementation of Bayesianmodel selection. IMS Lecture Notes - Monograph Series 38:67–134.

Cybis, G. B., J. S. Sinsheimer, T. Bedford, A. E. Mather, P. Lemey, and M. A. Suchard. 2015.Assessing phenotypic correlations through the multivariate phylogenetic latent liabilitymodel. Annals of Applied Statistics 9:969–991.

Drummond, A., S. Ho, M. Phillips, and A. Rambaut. 2006. Relaxed phylogenetics and datingwith confidence. PLoS Biology 4:e88.

Drummond, A. and M. Suchard. 2010. Bayesian random local clocks, or one rate to rulethem all. BMC Biology 8.

Drummond, A. J., M. A. Suchard, D. Xie, and A. Rambaut. 2012. Bayesian phylogeneticswith BEAUti and the BEAST 1.7. Molecular Biology and Evolution 29:1969–1973.

Euler, Z., E. Bunnik, J. Burger, B. Boeser-Nunnink, M. Grijsen, J. Prins, andH. Schuitemaker. 2011. Activity of broadly neutralizing antibodies, including PG9, PG16,and VRC01, against recently transmitted subtype B HIV-1 variants from early and latein the epidemic. Journal of Virology 85:7236–7245.

Page 32: A Relaxed Drift Di usion Model for Phylogenetic Trait ... · fusion model for evolution of multivariate traits in a Bayesian framework that integrates it with models for phylogenetic

Faria, N., A. Rambaut, M. Suchard, G. Baele, T. Bedford, M. Ward, A. Tatem, J. Sousa,N. Arinaminpathy, J. Pepin, D. Posada, M. Peeters, O. Pybus, and P. Lemey. 2014. Theearly spead and epidemic ignition of HIV-1 in human populations. Science 346:56–61.

Felsenstein, J. 1973. Maximum-likelihood estimation of evolutionary trees from continuouscharacters. American Journal of Human Genetics 25:471–492.

Felsenstein, J. 1981. Evolutionary trees from DNA sequences: a maximum likelihood ap-proach. Journal of Molecular Evolution 13:93–104.

Felsenstein, J. 1985. Phylogenies and the comparative method. The American Naturalist125:1–15.

Freckleton, R., P. Harvey, and M. Pagel. 2002. Phylogenetic analysis and comparative data:A test and review of evidence. The American Naturalist 160:712–726.

George, E. and R. McCulloch. 1993. Variable selection via Gibbs sampling. Journal of Amer-ican Statistical Association 88:881–889.

Gill, M., P. Lemey, N. Faria, A. Rambaut, B. Shapiro, and M. Suchard. 2013. Improv-ing Bayesian population dynamics inference: a coalescent-based model for multiple loci.Molecular Biology and Evolution 30:713–724.

Grenfell, B., O. Pybus, J. Gog, J. Wood, J. Daly, J. Mumford, and E. Holmes. 2004. Unifyingthe epidemiological and evolutionary dynamics of pathogens. Science 303:327–332.

Hansen, T. 1997. Stabilizing selection and the comparative analysis of adaptation. Evolution52:1341–1351.

Harvey, P. and M. Pagel. 1991. The comparative method in evolutionary biology. OxfordUniversity Press.

Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating the human-ape splitting by a molecularclock of mitochondrial DNA. Journal of Molecular Evolution 22:160–174.

Hastings, W. 1970. Monte Carlo sampling methods using Markov chains and their applica-tions. Biometrika 57:97–109.

Huelsenbeck, J. and B. Rannala. 2003. Detecting correlation between characters in a com-parative analysis with uncertain phylogeny. Evolution 57:1237–1247.

Jeffreys, H. 1935. Some tests of significance, treated by the theory of probability. Mathemat-ical Proceedings of the Cambridge Philosophical Society 31:203–222.

Jeffreys, H. 1961. Theory of Probability. Oxford University Press.

Johnston, M. and A. Fauci. 2007. An HIV vaccine: evolving concepts. New England Journalof Medicine 356:2073–2080.

Page 33: A Relaxed Drift Di usion Model for Phylogenetic Trait ... · fusion model for evolution of multivariate traits in a Bayesian framework that integrates it with models for phylogenetic

Kalish, M., K. Robbins, D. Pieniazek, A. Schaefer, N. Nzilambi, T. Quinn, M. S. Louis,A. Youngpairoj, J.Phillips, H. Jaffe, and T. Folks. 2004. Recombinant viruses and earlyglobal HIV-1 epidemic. Emerging Infectious Diseases 10:1227–1234.

Kass, R. and A. Raftery. 1995. Bayes factors. Journal of the American Statistical Association90:773–795.

Kimura, M. 1980. A simple method for estimating evolutionary rates of base substitu-tions through comparative studies of nucleotide sequences. Journal of Molecular Evolution16:111–120.

Kita, K., N. Ndembi, M. Ekwalanga, E. Ido, R. Kazadi, B. Bikandou, J. Takehisa, T. Take-mura, S. Kageyama, J. Tanaka, H. Parra, M. Hayami, and H. Ichimura. 2004. Geneticdiversity of HIV type 1 in Likasi, southeast of the Democratic Republic of Congo. AIDSResearch and Human Retroviruses 20:1352–1357.

Kuo, L. and B. Mallick. 1998. Variable selection for regression models. Sankhya B 60:65–81.

Lanciotti, R., J. Roehrig, V. Deubel, J. Smith, M. Parker, K. Steele, B. Crise, K. Volpe,M. Crabtree, J. Scherret, R. Hall, J. MacKenzie, C. Cropp, B. Panigrahy, E. Ostlund,B. Schmitt, M. Malkinson, C. Banet, J. Weissman, N. Komar, H. Savage, W. Stone,T. McNamara, and D. Gubler. 1999. Origin of the West Nile virus responsible for anoutbreak of encephalitis in the northeastern United States. Science 286:2333–2337.

Lartillot, N. and R. Poujol. 2011. A phylogenetic model for investigating correlated evo-lution of subsitution rates and continuous phenotypic characters. Molecular Biology andEvolution 28:729–744.

Lemey, P., A. Rambaut, A. Drummond, and M. Suchard. 2009a. Bayesian phylogeographyfinds its roots. PLoS Computational Biology 5:e1000520.

Lemey, P., A. Rambaut, J. Welch, and M. Suchard. 2010. Phylogeography takes a relaxedrandom walk in continuous space and time. Molecular Biology and Evolution 27:1877–1885.

Lemey, P., M. Suchard, and A. Rambaut. 2009b. Reconstructing the initial global spread ofa human influenza pandemic: a Bayesian spatial-temporal model for the global spread ofH1N1. PLoS Currents RRN1031.

Mascola, J. and D. Montefiori. 2010. The role of antibodies in HIV vaccines. Annual Reviewof Immunology 28:413–444.

Mello, F., O. Araujo, B. Lago, A. Motta-Castro, M. Moraes, S. Gomes, G. Bello, andN. Araujo. 2013. Phylogeography and evolutionary history of hepatitis B virus genotypeF in Brazil. Virology Journal 10:236.

Page 34: A Relaxed Drift Di usion Model for Phylogenetic Trait ... · fusion model for evolution of multivariate traits in a Bayesian framework that integrates it with models for phylogenetic

Metropolis, N., A. Rosenbluth, M. Rosenbluth, A. Teller, and E. Teller. 1953. Equation ofstate calculation by fast computing machines. Journal of Chemical Physics 21:1087–1092.

Minin, V., E. Bloomquist, and M. Suchard. 2008. Smooth skyride through a rough skyline:Bayesian coalescent based inference of population dynamics. Mol Biol Evol 25:1459–1471.

Niama, F., C. Toure-Kane, N. Vidal, P. Obengui, B. Bikandou, M. Ndoundou-Nkodia,C. Montavon, H. Diop-Ndiaye, J. Mombouli, E. Mokondzimobe, A. Diallo, E. Delaporte,H. Parra, M. Peeters, and S. Mboup. 2006. HIV-1 subtypes and recombinants in theRepublic of Congo. Infection, Genetics and Evolution 6:337–343.

Pagel, M. 1999. The maximum likelihood approach to reconstructing ancestral characterstates of discrete characters on phylogenies. Systematic Biology 48:612–622.

Pybus, O., M. Suchard, P. Lemey, F. Bernardin, A. Rambaut, F. Crawford, R. Gray, N. Ari-naminpathy, S. Stramer, M. Busch, and E. Delwart. 2012. Unifying the spatial epidemiol-ogy and molecular evolution of emerging epidemics. PNAS 109:15066–71.

Rambaut, A., O. Pybus, M. Nelson, C. Viboud, J. Taubenberger, and E. Holmes. 2008. Thegenomic and epidemiological dynamics of human influenza A virus. Nature 453:615–619.

Seetahal, J., A. Velasco-Billa, O. Allicock, A. Adesiyun, J. Bissessar, K. Amour, A. Phillip-Hosein, D. Marston, L. McElhinney, M. Shi, C. Wharwood, A. Fooks, and C. Carrington.2013. Evolutionary history and phylogeography of rabies viruses associated with outbreaksin Trinidad. PLoS Neglected Tropical Diseases 7:e2365.

Sinsheimer, J., J. Lake, and R. Little. 1996. Bayesian hypothesis testing of four-taxon topolo-gies using molecular sequence data. Biometrics 52:193–210.

Smith, G., D. Vijaykrishna, J. Bahl, S. Lycett, M. Worobey, O. Pybus, S. Ma, C. Cheung,J. Raghwani, S. Bhatt, J. Peiris, Y. Guan, and A. Rambaut. 2009. Origins and evolutionarygenomics of the 2009 swine-origin H1N1 influenza A epidemic. Nature 459:1122–1125.

Suchard, M., R. Weiss, and J. Sinsheimer. 2001. Bayesian selection of continuous-timeMarkov chain evolutionary models. Molecular Biology and Evolution 18:1001–1013.

Suchard, M., R. Weiss, and J. Sinsheimer. 2005. Models for estimating Bayes factors withapplications to phylogeny and tests of monophyly. Biometrics 61:665–673.

Vidal, N., C. Mulanga, S. Bazepeo, J. Mwamba, J. Tshimpaka, M. Kashi, N. Mama, C. Lau-rent, F. Lepira, E. Delaporte, and M. Peeters. 2005. Distribution of HIV-1 variants in theDemocratic Republic of Congo suggests increase of subtype C in Kinshasa between 1997and 2002. Journal of Acquired Immune Deficiency Syndromes 40:456–462.

Vidal, N., M. Peeters, C. Mulanga-Kabeya, N. Nzilambi, D. Robertson, W. Ilunga, H. Sema,K. Tshimanga, B. Bongo, and E. Delaporte. 2000. Unprecedented degree of human immun-odeficiency virus type 1 (HIV-1) group M genetic diversity in the Democratic Republic of

Page 35: A Relaxed Drift Di usion Model for Phylogenetic Trait ... · fusion model for evolution of multivariate traits in a Bayesian framework that integrates it with models for phylogenetic

Congo suggests that the HIV-1 pandemic originated in Central Africa. Journal of Virology74:10498–10507.

Vrancken, B., P. Lemey, A. Rambaut, T. Bedford, B. Longdon, H. F. Gunthard, and M. A.Suchard. 2015. Simultaneously estimating evolutionary history and repeated traits phylo-genetic signal: applications to viral and host phenotypic evolution. Methods in Ecologyand Evolution 6:67–82.

Walker, B. and D. Burton. 2008. Toward an AIDS vaccine. Science 320:760–764.

Walker, L., S. Phogat, P. Chan-Hui, D. Wagner, P. Phung, J. Goss, T. Wrin, M. Simek,S. Fling, J. Mitcham, J. Lehrman, F. Priddy, O. Olsen, S. Frey, P. Hammond, P. G. P.Investigators, S. Kaminsky, T. Zamb, M. Moyle, W. Koff, P. Poignard, and D. Burton.2009. Broad and potent neutralizing antibodies from an African donor reveal a new HIV-1vaccine target. Science 326:285–289.

Weiss, R., P. Clapham, R. Cheingsong-Popov, A. Dalgleish, C. Carne, I. Weller, and R. Ted-der. 1985. Neutralization of human T-lymphocyte virus type III by sera of AIDS andAIDS-risk patients. Nature 316:69–72.

Wiener, N. 1958. Nonlinear problems in random theory. MIT Press, Cambridge (MA).

Yang, C., M. Li, J. Mokili, J. Winter, N. Lubaki, K. Mwandagalirwa, M. Kasali, A. Losoma,T. Quinn, R. Bollinger, and R. Lal. 2005. Genetic diversification and recombination ofHIV type 1 group M in Kinshasa, Democratic Republic of Congo. AIDS Research andHuman Retroviruses 21:661–666.

Yang, Z. 1994. Maximum likelihood phylogenetic estimation from DNA sequences with vari-able rates over sites: approximate methods. Journal of Molecular Evolution 39:306–314.

Zhou, T., I. Georgiev, X. Wu, Z. Yang, K. Dai, A. Finzi, Y. Kwon, J. Scheid, W. Shi,L. Xu, Y. Yang, J. Zhu, M. Nussenzweig, J. Sodroski, L. Shapiro, G. Nabel, J. Mascola,and P. Kwong. 2010. Structural basis for broad and potent neutralization of HIV-1 byantibody VRC01. Science 329:811–817.