intraspecific gene genealogies: trees grafting into networks

9
During the past decade, the explosion of molecular techniques has led to the accumulation of a considerable amount of comparative genetic information at the population level. At the same time, recent advances in population genetics theory, especially coalescent theory, have generated powerful tools for the analysis of intraspecific data. These two developments have converted intraspecific phylogenies into useful tools for testing a variety of evolutionary and population genetic hypotheses. Several phylogenetic methods, especially NETWORK (see Glossary) approaches, have been developed to take advantage of the unique characteristics of intraspecific data. In this article, we summarize some population genetics principles, explain why networks are appropriate representations of intraspecific genetic variation, describe and compare available methods and software for network estimation, and give examples of their application. Gene genealogies Given a sample of GENES, the relationships among them can be traced back in time to a common ancestral gene. The genealogical pathways interconnecting the current sample to the common ancestor constitute a GENE TREE or gene genealogy. A gene tree is the pedigree of a set of genes and exists independently of potential mutations. The only portion of a gene tree that can generally be estimated with genetic data is that portion marked by the (potential) mutational events that define the different ALLELES ( Box 1). This lower resolution tree is the allele Intraspecific gene evolution cannot always be represented by a bifurcating tree. Rather , population genealogies are often multifurcated, descendant genes coexist with persistent ancestors and recombination events produce reticulate relationships. Whereas traditional phylogenetic methods assume bifurcating trees, several networking approaches have recently been developed to estimate intraspecific genealogies that take into account these population- level phenomena. Intr aspecific gene genealogies: tr ees gr af ting int o networ ks David Posadaand Keith A. Crandall TRENDS in Ecology & Evolution Vol.16 No.1 January 2001 http://tree.trends.com 01695347/01/$ see front matter © 2001 Elsevier Science Ltd. All rights reserved. PII: S0169-5347(00)02026-7 37 Review Review 12 Swofford, D.L. et al. (1996) Phylogenetic inference. In Molecular Systematics (Hillis, D.M. et al., eds), pp. 407–514, Sinauer Associates 13 Huelsenbeck, J.P. and Crandall, K.A. (1997) Phylogeny estimation and hypothesis testing using maximum likelihood. Annu. Rev. Ecol. Syst. 28, 437–466 14 Lewis, P.O. (1998) Maximum likelihood as an alternative to parsimony for inferring phylogeny using nucleotide sequence data. In Molecular Systematics of Plants II (Soltis, D.E. et al., eds), pp. 132–163, Kluwer 15 Swofford, D.L. (2000) PAUP*, Phylogenetic Analysis Using Parsimony (Sinauer, Sunderland, MA), Version 4.0b3, 16 Yang, Z. (1997) PAML: a program package for phylogenetic analysis by maximum likelihood. CABIOS 13, 555–556 17 Muse, S.V. and Gaut, B.S. (1994) A likelihood approach for comparing synonymous and nonsynonymous substitution rates, with application to the chloroplast genome. Mol. Biol. Evol. 11, 715–724 18 Goldman, N. and Yang, Z. (1994) A codon-based model of nucleotide substitution for protein- coding DNA sequences. Mol. Biol. Evol. 11, 725–736 19 Muse, S.V. and Kosakovsky Pond, S.L. (2000) HYPHY, Hypothesis Testing Using Phylogenies, (North Carolina State University, Raleigh) Version 1 20 Tillier, E.R.M. and Collins, R.A. (1995) Neighbor joining and maximum likelihood with RNA sequences: addressing the interdependence of sites. Mol. Biol. Evol. 12, 7–15 21 Muse, S.V. (1995) Evolutionary analyses of DNA sequences subject to constraints on secondary structure. Genetics 139, 1429–1439 22 Schluter, D. et al. (1997) Likelihood of ancestor states in adaptive radiation. Evolution 51, 1699–1711 23 Mooers, A.Ø. and Schluter, D. (1999) Reconstructing ancestor states with maximum likelihood: support for one- and two-rate models. Syst. Biol. 48, 623–633 24 Pagel, M. (1999) The maximum likelihood approach to reconstructing ancestral character states of discrete characters on phylogenies. Syst. Biol. 48, 612–622 25 Cunningham, C.W. et al. (1998) Reconstructing ancestral character states: a critical reappraisal. Trends Ecol. Evol. 13, 361–366 26 Pagel, M. (1994) Detecting correlated evolution on phylogenies: a general method for the comparative analysis of discrete characters. Proc. R. Soc. London B Biol. Sci. 255, 37–45 27 Qiang, J. et al. (1998) Two feathered dinosaurs from northeastern China. Nature 393, 753–761 28 Lutzoni, F. and Pagel, M. (1997) Accelerated evolution as a consequence of transitions to mutualism. Proc. Natl. Acad. Sci. U. S. A. 94, 11422–11427 29 Larget, B. and Simon, D.L. (1999) Markov chain Monte Carlo algorithms for the Bayesian analysis of phylogenetic trees. Mol. Biol. Evol. 16, 750–759 30 Mau, B. and Newton, M.A. (1997) Phylogenetic inference for binary data on dendrograms using Markov chain Monte Carlo. J. Comput. Graph. Stat. 6, 122–131 31 Mau, B. et al. (1999) Bayesian phylogenetic inference via Markov chain Monte Carlo methods. Biometrics 55, 1–12 32 Newton, M. et al. (1999) Markov chain Monte Carlo for the Bayesian analysis of evolutionary trees from aligned molecular sequences. In Statistics in Molecular Biology (Seillier- Moseiwitch, F. et al., eds), Institute of Mathematical Statistics 33 Rannala, B. and Yang, Z. (1996) Probability distribution of molecular evolutionary trees: a new method of phylogenetic inference. J. Mol. Evol. 43, 304–311 34 Yang, Z.H. and Rannala, B. (1997) Bayesian phylogenetic inference using DNA sequences: a Markov Chain Monte Carlo method. Mol. Biol. Evol. 14, 717–724 35 Metropolis, N. et al. (1953) Equation of state calculations by fast computing machines. J. Chem. Phys. 21, 1087–1092 36 Hastings, W.K. (1970) Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97–109 37 Huelsenbeck, J.P. et al. (2000) A compound Poisson process for relaxing the molecular clock. Genetics 154, 1879–1892 38 Thorne, J.L. et al. (1998) Estimating the rate of evolution of the rate of molecular evolution. Mol. Biol. Evol. 15, 1647–1657 39 Felsenstein, J. (1985) Confidence intervals on phylogenies: an approach using the bootstrap. Evolution 39, 783–791

Upload: dangthu

Post on 10-Feb-2017

224 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Intraspecific gene genealogies: trees grafting into networks

During the past decade, the explosion of moleculartechniques has led to the accumulation of aconsiderable amount of comparative geneticinformation at the population level. At the same time,recent advances in population genetics theory,especially coalescent theory, have generated powerfultools for the analysis of intraspecific data. These twodevelopments have converted intraspecificphylogenies into useful tools for testing a variety ofevolutionary and population genetic hypotheses.Several phylogenetic methods, especially NETWORK

(see Glossary) approaches, have been developed to

take advantage of the unique characteristics ofintraspecific data. In this article, we summarize somepopulation genetics principles, explain why networksare appropriate representations of intraspecificgenetic variation, describe and compare availablemethods and software for network estimation, andgive examples of their application.

Gene genealogiesGiven a sample of GENES, the relationships amongthem can be traced back in time to a commonancestral gene. The genealogical pathwaysinterconnecting the current sample to the commonancestor constitute a GENE TREE or gene genealogy. Agene tree is the pedigree of a set of genes and existsindependently of potential mutations. The onlyportion of a gene tree that can generally be estimatedwith genetic data is that portion marked by the(potential) mutational events that define the differentALLELES (Box 1). This lower resolution tree is the allele

Intraspecific gene evolution cannot always be represented by a bifurcating tree.Rather, population genealogies are often multifurcated, descendant genescoexist with persistent ancestors and recombination events produce reticulaterelationships.Whereas traditional phylogenetic methods assume bifurcatingtrees, several networking approaches have recently been developed toestimate intraspecific genealogies that take into account these population-level phenomena.

Intraspecific gene genealogies:trees grafting into networksDavid Posada and Keith A. Crandall

TRENDS in Ecology & Evolution Vol.16 No.1 January 2001

http://tree.trends.com 0169–5347/01/$ – see front matter © 2001 Elsevier Science Ltd. All rights reserved. PII: S0169-5347(00)02026-7

37ReviewReview

12 Swofford, D.L. et al. (1996) Phylogeneticinference. In Molecular Systematics (Hillis,D.M. et al., eds), pp. 407–514, SinauerAssociates

13 Huelsenbeck, J.P. and Crandall, K.A. (1997)Phylogeny estimation and hypothesis testingusing maximum likelihood. Annu. Rev. Ecol.Syst. 28, 437–466

14 Lewis, P.O. (1998) Maximum likelihood as analternative to parsimony for inferringphylogeny using nucleotide sequence data. InMolecular Systematics of Plants II (Soltis, D.E.et al., eds), pp. 132–163, Kluwer

15 Swofford, D.L. (2000) PAUP*, PhylogeneticAnalysis Using Parsimony (Sinauer,Sunderland, MA), Version 4.0b3,

16 Yang, Z. (1997) PAML: a program package forphylogenetic analysis by maximum likelihood.CABIOS 13, 555–556

17 Muse, S.V. and Gaut, B.S. (1994) A likelihoodapproach for comparing synonymous andnonsynonymous substitution rates, withapplication to the chloroplast genome. Mol. Biol.Evol. 11, 715–724

18 Goldman, N. and Yang, Z. (1994) A codon-basedmodel of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11,725–736

19 Muse, S.V. and Kosakovsky Pond, S.L. (2000)HYPHY, Hypothesis Testing Using Phylogenies,(North Carolina State University, Raleigh)Version 1

20 Tillier, E.R.M. and Collins, R.A. (1995)Neighbor joining and maximum likelihood with

RNA sequences: addressing the interdependenceof sites. Mol. Biol. Evol. 12, 7–15

21 Muse, S.V. (1995) Evolutionary analyses of DNAsequences subject to constraints on secondarystructure. Genetics 139, 1429–1439

22 Schluter, D. et al. (1997) Likelihood of ancestorstates in adaptive radiation. Evolution 51,1699–1711

23 Mooers, A.Ø. and Schluter, D. (1999)Reconstructing ancestor states with maximumlikelihood: support for one- and two-rate models.Syst. Biol. 48, 623–633

24 Pagel, M. (1999) The maximum likelihoodapproach to reconstructing ancestral characterstates of discrete characters on phylogenies. Syst.Biol. 48, 612–622

25 Cunningham, C.W. et al. (1998) Reconstructingancestral character states: a critical reappraisal.Trends Ecol. Evol. 13, 361–366

26 Pagel, M. (1994) Detecting correlated evolution onphylogenies: a general method for thecomparative analysis of discrete characters. Proc.R. Soc. London B Biol. Sci. 255, 37–45

27 Qiang, J. et al. (1998) Two feathered dinosaursfrom northeastern China. Nature 393, 753–761

28 Lutzoni, F. and Pagel, M. (1997) Acceleratedevolution as a consequence of transitions tomutualism. Proc. Natl. Acad. Sci. U. S.A. 94,11422–11427

29 Larget, B. and Simon, D.L. (1999) Markov chainMonte Carlo algorithms for the Bayesian analysisof phylogenetic trees. Mol. Biol. Evol. 16, 750–759

30 Mau, B. and Newton, M.A. (1997) Phylogeneticinference for binary data on dendrograms using

Markov chain Monte Carlo. J. Comput. Graph.Stat. 6, 122–131

31 Mau, B. et al. (1999) Bayesian phylogeneticinference via Markov chain Monte Carlo methods.Biometrics 55, 1–12

32 Newton, M. et al. (1999) Markov chain MonteCarlo for the Bayesian analysis of evolutionarytrees from aligned molecular sequences. InStatistics in Molecular Biology (Seillier-Moseiwitch, F. et al., eds), Institute ofMathematical Statistics

33 Rannala, B. and Yang, Z. (1996) Probabilitydistribution of molecular evolutionary trees: anew method of phylogenetic inference. J. Mol.Evol. 43, 304–311

34 Yang, Z.H. and Rannala, B. (1997) Bayesianphylogenetic inference using DNA sequences: aMarkov Chain Monte Carlo method. Mol. Biol.Evol. 14, 717–724

35 Metropolis, N. et al. (1953) Equation of statecalculations by fast computing machines. J. Chem. Phys. 21, 1087–1092

36 Hastings, W.K. (1970) Monte Carlo samplingmethods using Markov chains and theirapplications. Biometrika 57, 97–109

37 Huelsenbeck, J.P. et al. (2000) A compoundPoisson process for relaxing the molecular clock.Genetics 154, 1879–1892

38 Thorne, J.L. et al. (1998) Estimating the rate ofevolution of the rate of molecular evolution. Mol. Biol. Evol. 15, 1647–1657

39 Felsenstein, J. (1985) Confidence intervals onphylogenies: an approach using the bootstrap.Evolution 39, 783–791

Page 2: Intraspecific gene genealogies: trees grafting into networks

or HAPLOTYPE tree. Gene genealogies are approximatedby the estimation of haplotype or allele trees. Giventhe abundance of methods available for phylogeneticestimation1, which ones are most appropriate forestimating haplotype trees?

Problems with interspecific methods at the intraspecificlevelEvolutionary relationships above and below the specieslevel are different in nature. Relationships betweengenes sampled from individuals belonging to differentspecies (phylogeny sensu stricto) are hierarchical. Thisis because they are the product of reproductive isolationand population fission over longer timescales, duringwhich mutation combined with population divergenceled to the fixation of different alleles and, ultimately, tononoverlapping gene pools.

By contrast, relationships between genes sampledfrom individuals within a species (sometimes calledTOKOGENY2) are not hierarchical, because they are theresult of sexual reproduction, of smaller numbers ofrelatively recent mutations and, frequently, ofrecombination (Fig. 1). More traditional methodsdeveloped to estimate interspecific relationships,such as maximum likelihood, maximum parsimonyand minimum evolution, cannot properly takeaccount of the fact that, at the population level,several phenomena violate some of theirassumptions. This leads to poor resolution orinadequately portrays genealogical relationships.

Low divergenceBy necessity, conspecific individuals diverge laterthan individuals from different species.Consequently, within-species data sets have fewercharacters for phylogenetic analysis, diminishing thestatistical power of traditional phylogenetic methods.

Extant ancestral nodesIn natural populations, most haplotypes in the genepool exist as sets of multiple, identical copies thatoriginated by DNA replication. When one of thesecopies mutates to a new haplotype, it is extremelyunlikely that other copies of the ancestral haplotypealso mutate or that all copies of the ancestralhaplotype rapidly become extinct. Thus, the ancestralhaplotypes are expected to persist in the populationand to be sampled together with their descendants.Traditional phylogenetic methods, based on abifurcating TREE, can detect and artificially representpersistent ancestral haplotypes as occupying abranch of zero length at the basal node of a cluster.However, this approach relies on modifying (e.g. byestimation of branch lengths) an inappropriate model– a bifurcating tree with all haplotypes occupying tipsor terminal branches.

MultifurcationsA fact related to the persistence of ancestral haplotypesin the population is that a single ancestral haplotypewill often give rise to multiple descendant haplotypes,

TRENDS in Ecology & Evolution Vol.16 No.1 January 2001

http://tree.trends.com

38 ReviewReview

David Posada*Keith Crandall Dept of Zoology, BrighamYoung University, 574WIDB, Provo, UT 84602,USA.*e-mail: david.posada @ byu.edu

Additive tree: a tree on which the pairwise distances betweenhaplotypes are equal to the sum of the lengths of the branches on thepath between the members of each pair.Coalescent event: the time inverse of a DNA replication event; thatis, the event leading to the common ancestor of two sequenceslooking back in time.Gene: a segment of DNA.Gene tree: the representation of the evolutionary history of a groupof genes.Haplotype or allele: a unique combination of genetic markerspresent in a sample.Haplotype space: the collection of points representing the possibledifferent haplotypes. The dimension of this space is the number ofcharacters (L). The number of points (haplotypes) included in thehaplotype space is the number of states raised to the number ofcharacters (i.e. for DNA sequences 4L).Homoplasy: a similarity that is not a result of common history. It iscaused by parallel, convergent or reverse mutations.Interior haplotypes: those haplotypes that have more than onemutational connection.Minimum-spanning tree: graph theory construct that connects the nhaplotypes, thus a complete network of n-1 branches is built. The treeis ‘minimal’ when the total length of the branches is the minimumnecessary to connect all the haplotypes.Missing intermediate: an extant haplotype that was not sampled oran extinct ancestral haplotype.Network: a connected graph with cycles (Fig. Ia).Patristic or phyletic distances: the distance between haplotypes asinferred from the network. It does not have to be equal to the actualdistance (number of differences) between haplotypes.Phylogeny: hierarchical genetic relationships among species. Arisingby speciation.Robinsonian distance matrix: a matrix with the property that, for any

ordered triplet of sequences i, j and k, any distance dik≥ max (dij ,djk).Singletons: haplotypes represented by a single sequence in thesample.Species tree: the representation of the evolutionary history of agroup of species.Split: the division of the haplotypes into two exclusive sets. For nsequences, there are 2n-1 possible splits.Tip haplotypes: those haplotypes that have only a single mutationalconnection to the other haplotypes within the network.Tokogeny: nonhierarchical genetic relationships among individuals.Arising by sexual reproduction.Tree: a connected graph without cycles (cycles are commonly calledreticulations or loops by evolutionary biologists) (Fig. Ib,c).

Glossary

TRENDS in Ecology & Evolution

Loop or cycle

Bifurcate tree

Multifurcate treeNetwork

I (a) (b)

(c)

Page 3: Intraspecific gene genealogies: trees grafting into networks

TRENDS in Ecology & Evolution Vol.16 No.1 January 2001

http://tree.trends.com

39Review

yielding a haplotype tree with true multifurcations.Indeed, population genetics theory predicts that theolder the haplotype, the more descendant haplotypeswill be associated with it (Box 2).

ReticulationEvolutionary processes commonly acting at thepopulation level, such as recombination betweengenes and hybridization between lineages, andHOMOPLASY (Box 1), generate reticulate relationshipswithin the population. Traditional methods, based onbifurcating trees, make no explicit allowance for suchreticulations. Instead, for instance, maximumparsimony deals with ambiguities arising fromhomoplasy by simply selecting a tree that minimizesthe number of assumptions of parallel, convergent orreversing mutations without showing where these

might have occurred. Recombinants are also typicallyforced into a nonreticulating tree topology, in which, insome fortunate instances, they might occupy positionsintermediate between two clusters. In other cases, therecombinant will be placed in a basal lineage to theclade that includes its most derived parent3,4.

Large sample sizesAppropriate sampling can be crucial to phylogeneticstudies5. Typically, intraspecific studies involve manyindividuals for comparison, whereas manyinterspecific phylogenetic studies tend to be based onone representative individual per species. Because ofthe density of sampling, especially when coupled withlow divergence, intraspecific data sets reachconsiderable sample sizes (>100). These large samplesizes would require excessive computational time for

Review

Gene trees versus haplotype treesGene trees are independent of the (neutral)mutation process. They depend ondemographic factors such as population sizeand geographical structure. Because, bydefinition, neutral mutations do not affectthe number of offspring or migrationpatterns, they do not affect the genegenealogy. Gene trees precisely representthe gene genealogy of a given sample.However, such precise information is usuallyunknown or even impossible to extract froma sample of extant genes. For example, inFig. I, there is no way to know that gene A1 ismore closely related to A4 than either is toA2. Unless detailed pedigree information isavailable, which usually is not the case, theonly branches of the gene tree that we canestimate are those marked by a mutationand that therefore define haplotypes.

We must be able to see geneticdifferences (mutations) to determinerelationships; therefore, we use haplotypetrees most often. With haplotype trees, wecannot see all the coalescent events and canonly group genes by their similarities andhaplotype classes. Whereas traditionalmethods often lack the power to solveintraspecific relationships, networkapproaches offer an appropriaterepresentation of the haplotyperelationships, including extinct (Fig. I,haplotype D) or unsampled (Fig. I, haplotypeB) haplotype variants.

Haplotype trees versus population treesIt is often assumed that the haplotypetrees represent exactly the history of thepopulations sampled. However,

haplotypes do not, in general, have thesame evolutionary history as populationlineagesa. Disagreement amonghaplotype trees and population trees canarise from recombination (Fig. IIa) and

deep coalescence or lineage sorting (Fig. IIb).

Referencea Hudson, R.R. (1990) Gene genealogies and thecoalescent process. Oxf. Surv. Evol. Biol. 7, 1–44

Box 1. Gene trees, haplotype trees and population trees

TRENDS in Ecology & Evolution

Haplotype tree

A GD

IH BA1 H1 B1 A5 A2I1 C1 I2 G1E1 A3A4

Gene tree

Coalescentevents Mutations

1A T

Present

Past

Time

EC27A G

FC IF1

Haplotype FrequencyA 4C 2E 1F 1G 1H 1I 2

A AAAAAAAC . . . C . . .E . GT . . . .F . . . CG . .G . . . . . T .H T . . . . . GI . G . C . . .

DNA alignment

Individual genes Haplotypes

Sample:12 individuals, 8 haplotypes

3A T

2A G

6A T2A G5A G

4A C

4A CD1

Common ancestorof all the

genessampled

(haplotype A)

I

TRENDS in Ecology & Evolution

Pop. A(hap. Aand B)

Pop. B(hap.C and D)

Pop. C(hap. E)

Pop. D(hap. E)

Pop. A(hap. Aand B)

Pop. B(hap.C and D)

Pop. C(hap. E)

Pop. D(hap. E)

x

xRPop.Hap.

Mutation

Population treeHaplotype tree

Extinct or not sampledRecombinationPopulationHaplotype

R

II (a) (b)

Page 4: Intraspecific gene genealogies: trees grafting into networks

TRENDS in Ecology & Evolution Vol.16 No.1 January 2001

http://tree.trends.com

40 Review

most methods that have been developed forinterspecific comparisons.

Solution: network methodsPhylogenetic methods that allow for persistentancestral nodes, multifurcations and reticulations areneeded to take these population phenomena intoaccount. The advantage of networks over strictlybifurcating trees for estimating within-speciesrelationships now becomes obvious. Networks canaccount effectively for processes acting at the specieslevel and they might be able to incorporatepredictions from population genetics theory (Box 2).In addition, networks provide a way of representingmore of the phylogenetic information present in adata set (Fig. 2). For example, the presence of loops ina network might indicate recombination. In othercases, loops are the product of homoplasies andprecisely indicate the occurrence of reverse or parallelmutations. Most network methods are distancemethods, with the common idea of minimizing (with

some specific restrictions) the distances (number ofmutations) among haplotypes. In other cases, thelikelihood function is maximized.

PyramidsThe pyramids technique6 is an extension of thehierarchical clustering framework. Whereastraditional hierarchical methods such as UnweightedPair Group with Arithmetic Means (UPGAM)represent a nested set of nonoverlapping clades,pyramids represent a set of clades that can overlapwithout necessarily being nested. The input data is a(Robinsonian) distance matrix (Box 1). The pyramidis obtained by using agglomerative algorithms. Byallowing overlapping clusters, pyramids can be usedto represent reticulate events, although these eventsare only allowed to be placed among terminal nodesthat are sister taxa.

Statistical geometryThis was one of the first network approaches forintraspecific phylogenetics to be developed7. In thismethod, haplotypes are considered as geometricconfigurations in the HAPLOTYPE SPACE. Numericalinvariants (some function of the data whose valueremains constant) related to the length of theconnections (pairwise differences) among haplotypesare assigned to the haplotypes and statisticalaverages of these invariants are then calculated.Given the value for the invariants, the optimalnetwork connecting the four haplotypes in eachquartet is derived. Finally, an average quartetgeometry representative of the whole data set isconstructed by integrating all the quartet networks.This geometry and the associated statistics can beused, for example, to deduce the degree of tree-likeness of the data, to detect varying positionalsubstitution rates in sequences or to estimate therelative temporal order of haplotype divergence.However, they do not offer an estimate of thesequence genealogy. The statistical geometryincorporates a model of nucleotide substitutionthrough the estimation of haplotype distances and itsstatistical nature allows a reliable assessment of thederived conclusions.

Split decompositionAny data set can be partitioned into sets (notnecessarily of equal size) of sequences or ‘splits’. Anetwork can be built by taking in turn those splitsdefined by the characters and combining themsuccessively8. Each split will define a branchconnecting the two partitions delimited by the split.When splits are incompatible (i.e. they definecontradictory groupings) a loop is introduced toindicate that there are alternative splits. The splitdecomposition method is fast, which means that areasonable number of haplotypes (>50) can beanalyzed; that it can be applied to nucleotide orprotein data; and that it allows for the inclusion of

Review

TRENDS in Ecology & Evolution

Species 3

Species 1

Phylogeneticrelationships

(c)

Species 2 (or 1) Species 3

Species 1Hierarchical relationships

Parent 1 Parent 2

Descendant

Nonhierarchical relationships

(a) (b)

Individual

Species 2 (or 1)

Tokogenicrelationships

Fig. 1. Tokogeny versus phylogeny. (a) Processes occurring among sexual species (phylogeneticprocesses) are hierarchical. That is, an ancestral species gives rise to two descendant species. (b) Processes occurring within sexual species (tokogenetic processes) are nonhierarchical. That is, two parentals combine their genes to give rise to the offspring. (c) The split of two species defines aphylogenetic relationship among species (thick lines) but, at the same time, relationships amongindividuals within the ancestral species (species 1) and within the descendant species (species 2 and3) are tokogenetic (arrows).

Page 5: Intraspecific gene genealogies: trees grafting into networks

TRENDS in Ecology & Evolution Vol.16 No.1 January 2001

http://tree.trends.com

41Review

models of nucleotide substitution or amino acidreplacement. The method is suitable also to bootstrapevaluation.

Median networksIn the median-network approach9,10, sequences arefirst converted to binary data and constant sites areeliminated. Each split is encoded as a binarycharacter with states 0 and 1. Sites that support thesame split are grouped in one character, which isweighted by the number of sites grouped. This leadsto the representation of haplotypes as 0–1 vectors.Median or consensus vectors are calculated for eachtriplet of vectors until the median network isfinished. For >30 haplotypes, the resulting mediannetworks are impractical to display, owing to thepresence of high-dimensional hypercubes.Fortunately, the network can be reduced (i.e. someloops can be solved) using predictions from

coalescent theory (Box 2).All the most parsimonioustrees are guaranteed to be represented in a mediannetwork. Although mainly aimed at mtDNA data,median networks can be estimated from other kindsof data, as long as the data are binary or can bereduced to binary data.

Median-joining networksThe median-joining network method11,12 begins bycombining the MINIMUM-SPANNING TREES (MSTs) within asingle network. With a parsimony criterion, medianvectors (which represent MISSING INTERMEDIATES) areadded to the network. Median-joining networks canhandle large data sets and multistate characters. It is anexceptionally fast method that can analyze thousands ofhaplotypes in a reasonable amount of time and can alsobe applied to amino acid sequences. However, it requiresthe absence of recombination, which restricts theapplication of this method at the population level.

Review

The ancestry of a random sample of ngenes is often modeled by a stochasticprocess known as the coalescent.Coalescent theory describes thegenealogical process of a sample ofselectively neutral genes from apopulation, looking backwards in timea.Several results from coalescent theoryrelated to the frequency and geographicaldistribution of the haplotypes are relevantto intraspecific phylogenetics.

There is a direct relationship betweenhaplotype frequencies and the ages of thehaplotypesb,c. Specifically, the probabilitythat an allele represented ni times in asample of size n is the oldest allele in thesample is ni /n, and the expected rank ofthe alleles by age is the same as the rank ofalleles by frequency. Therefore, high-frequency haplotypes have probably beenpresent in the population for a long time.Consequently, most of the new mutantsare derived from common haplotypes,implying that rarer variants representmore recent mutations and are more likelyto be related to common haplotypes thanto other rare variantsd. The expectation forthe number of alleles in common betweenthe samplese implies that the immediatedescendents of a new mutation are morelikely to remain in the original populationthan to move to a distant population,unless high levels of gene flow occur.These results can be summarized in fiveexplicit predictions.• Older alleles (those of higher frequency

in the population) have a greater

probability of becoming interiorhaplotypes (those haplotypes that havemore than one mutational connection)than younger haplotypes.

• On average, older alleles will be morebroadly distributed geographically.

• Haplotypes with greater frequency willtend to have more mutationalconnections.

• Singletons are more likely to beconnected to nonsingletons than toother singletons.

• Singletons are more likely to beconnected to haplotypes from thesame population than to haplotypesfrom different populations.These theoretical predictions make

intuitive sense and have been shown to bevalid in empirical data setsf. However, theyassume neutral evolution (and lack ofpopulation subdivision) and might not be

accurate in the presence of selection.These predictions can also be used to rootthe network. Some loops or reticulationsin the networks can be the result ofhomoplasies, thus representing theconsequence of a lack of power to decideamong alternative connections. Theseloops could be broken at any place,resulting in different networks.

The above predictions can be used toestablish which one of the alternativenetworks is more plausiblef,g. Forexample, in Fig. I, if the frequency ofhaplotype 2 is lower than the frequency ofthe other haplotypes, resolutions a and bare favored because they result inhaplotype 2 as a tip. (Dotted linesrepresent ambiguous connections.)

Referencesa Hudson, R.R. (1990) Gene genealogies and thecoalescent process. Oxf. Surv. Evol. Biol. 7, 1–44

b Watterson, G.A. and Guess, H.A. (1977) Is themost frequent allele the oldest? Theor. Popul.Biol. 11, 141–160

c Donnelly, P. and Tavaré, S. (1986) The ages ofalleles and a coalescent. Adv. Appl. Prob. 18,1–19

d Excoffier, L. and Langaney, A. (1989) Originand differentiation of human mitochondrialDNA. Am. J. Hum. Genet. 44, 73–85

e Watterson, G.A. (1985) The genetic divergenceof two populations. Theor. Popul. Biol. 27,298–317

f Crandall, K.A. and Templeton, A.R. (1993)Empirical tests of some predictions fromcoalescent theory with applications tointraspecific phylogeny reconstruction. Genetics134, 959–969

g Smouse, P. (1998) To tree or not to tree. Mol.Ecol. 7, 399–412

Box 2. Predictions from coalescent theory and application to intraspecific phylogenetics

TRENDS in Ecology & Evolution

1 2

3 4ba

1 2

3 4

c1 2

3 4d

1 2

3 4

1 2

3 4

I

Page 6: Intraspecific gene genealogies: trees grafting into networks

TRENDS in Ecology & Evolution Vol.16 No.1 January 2001

http://tree.trends.com

42 Review

Statistical parsimonyThe statistical parsimony algorithm13 begins byestimating the maximum number of differencesamong haplotypes as a result of single substitutions(i.e. those that are not the result of multiplesubstitutions at a single site) with a 95% statisticalconfidence. This number is called the parsimony limit(or parsimony connection limit). After this,haplotypes differing by one change are connected,then those differing by two, by three and so on, untilall the haplotypes are included in a single network orthe parsimony connection limit is reached. Thestatistical parsimony method emphasizes what isshared among haplotypes that differ minimallyrather than the differences among the haplotypes andprovides an empirical assessment of deviations fromparsimony. This method allows the identification ofputative recombinants by looking at the spatial

distribution in the sequence of the homoplasiesdefined by the network14.

Molecular-variance parsimonyThe overall strategy in the molecular-varianceparsimony technique15 is to use some populationstatistics as criteria for the choice of the optimalnetwork. Each competing MST is translated into amatrix of PATRISTIC DISTANCES among haplotypes.These matrices are used to compute a set of relevantpopulation statistics: functions of haplotypefrequencies, squared patristic distances amonghaplotypes and geographic partitioning ofpopulations. The optimal MSTs are those from whichoptimum estimates of population statistics areobtained (e.g. minimizing the molecular variance orthe sum of square deviations). This method makesexplicit use of sampled haplotype frequencies andgeographic subdivisions, and presents the solution inthe form of a set of (near) optimal networks.

NettingThis is a distance method that represents all theequally most parsimonious trees for a given data set ina single network16. The underlying idea is to join theclosest pair of sequences (the pair with the fewestdifferences). The next sequence that is closest to thefirst two is joined so that the three pairwise differencesare satisfied. Thus, patristic distances necessarilyequal the number of differences. If a homoplasy isencountered, a new spatial dimension is added to thegraph. Gaps and invariant positions are excluded fromthe analysis. Because the method tries to satisfy allthe distances among haplotypes, the number ofdimensions might be high and the display of thenetwork thus becomes complex and difficult. Thisproblem becomes worse as data sets become larger.

Likelihood networkThe likelihood-network procedure17 is based on adirected graphical model for the evolution ofsequences along a network. Graphical models aregraphs in which nodes are stochastic variables,whereas branches indicate correlations between these

Review

TRENDS in Ecology & Evolution

A C

E

FG

H

0

0 I

B not sampled

D extinct

Missingintermediate

A

G

IHC

FE

AC

E

F

G

HI

EFGH IA C

A C

E

FG

H

I

A C

E

FG

H

I

E FGH I AC A C F G H IE

(a) (b) (c)

(d) (e) (f)

(g) (h) (i)

Fig. 2. Phylogeneticestimation. Differentphylogeneticreconstructiontechniques, includingnetwork techniques,were applied to the dataset described in Box 1,Fig. I. (a) UPGMA. (b) Maximum parsimony.(c) Pyramid. (d) Statisticalgeometry (provides anestimate of averagequartet topology ratherthan of the actualgenealogy). (e) Splitdecomposition. (f) Minimum spanningnetwork. (g) Median-joining network. (h) Statistical parsimony. (i) Reticulogram.

Table 1. Properties of networking algorithms

Methods Categorya Software Speed Input data Model of Reticulations Statistical evolution assessment

Pyramids Distance Pyramids Fast Distances Yes Yes NoStatistical geometry Distance invariants Geometry, Statgeom Fast Multistate Yes Yes YesSplit decomposition Distance parsimony SplitsTree Fast Multistate Yes Yes YesMedian networks Distance No Slow Binary No Yes NoMedian-joining networks Distance Network Very fast Multistate No No NoStatistical parsimony Distance TCS Fast Multistate No Yes YesMolecular-variance parsimony Distance Arlequin Fast Multistate Yes Yes YesNetting Distance No Slow Multistate No Yes NoLikelihood network Likelihood PAL Slow Multistate Yes Yes YesReticulogram Least squares T-rex Fast Distances No Yes YesReticulate phylogeny Least squares No Slow Distancesb Yes Yes YesaDetails of software programs given in Box 3; bDistances estimated from gene frequency data.

Page 7: Intraspecific gene genealogies: trees grafting into networks

TRENDS in Ecology & Evolution Vol.16 No.1 January 2001

http://tree.trends.com

43Review

variables. A graph might be directed by rooting thenetwork at a specific node and directing all branchesaway from this node. A given network can be turnedinto directed acyclic graphs, which allows thecomputation of its likelihood. Arbitrary phylogeneticnetworks are evaluated by likelihood and the networkwith the best likelihood is chosen as the final solution.

Within this framework, the use of ancestralrecombination graphs18 when recombination ispresent has recently been suggested19. The likelihood-network algorithm also allows the simultaneousestimation of parameters (e.g. the recombinationrate). The use of a likelihood framework offers thepossibility of hypothesis testing, and the statisticalcomparison of both trees and networks. However, adifference compared with the other methods is thatthe amount of computing time can be excessivelylarge. Effective search strategies need to be devisedbefore large data sets can be analyzed.

ReticulogramThis procedure20 is based on the addition ofreticulations to a bifurcating tree. The method uses asinput a distance matrix and an ADDITIVE TREE inferredfrom the same distance matrix using one of the classicalreconstruction algorithms. An optimality criterion isused to estimate the minimum number of reticulationsrequired to maximize the fit of the network to the data –

a least squares loss function computed as the sum of thesquared differences between the original and thepatristic distances. The minimum of this criterionprovides a stopping rule for the addition of reticulation.The computations are repeated recursively for all pairsof nodes (except those already connected) in thenetwork to obtain a globally optimal solution.

Reticulate phylogenies from gene frequenciesThis method infers reticulate phylogenies usinggenetics distances estimated from gene frequencydata21. The mean squared error (MSE) of a leastsquares function is used as the optimality criterion toselect among possible reticulate phylogenies.Theoretically, all possible phylogenies must beevaluated, and the inferred phylogeny is the one thathas the minimum MSE. The use of a least-squaresfunction allows the estimation of the branch lengths inthe phylogeny. Two models are defined: a drift modeland an extended model with mutation. The drift modelis best used in short-term evolution, in which mutationhas not played an important role, whereas themutation model is applicable to long-term evolution.

Strengths and weaknesses of network methodsThe comparison of different networking strategies is not straightforward. Comprehensivesimulation studies are needed to evaluate the

Review

Several software packages implement someof the network methods described in thisarticle.• PYRAMIDS estimates pyramids from

distance matrices. Executables forWindows and Unix are available(http://genome.genetique.uvsq.fr/Pyramids/).There is also a convenient onlineimplementation (http://www.bioweb.pasteur.fr/seqanal/interfaces/pyramids.html). PYRAMIDS was written byJ.C. Aude et al. (INRA, France).

• STATGEOM calculates the statisticalgeometry in distance and in sequencespace of a set of aligned DNA, RNA,amino acid or binary sequences. Sourcecode with documentation and a SunSPARC executable are available (http://gwdu17.gwdg.de/~kniesel/statgeom.html).STATGEOM was written by K. Nieselt-Struwe (Dept of Physics, University ofAuckland, New Zealand).

• GEOMETRY is a package for nucleotidesequence analysis using the method ofstatistical geometry in sequence spacea.The program is available as a DOSexecutable (http://molevol.bionet.nsc.ru/soft.htm; ftp://ftp.bionet.nsk.su/incoming/

molevol/; ftp://ftp.ebi.ac.uk/pub/software/dos/).

• SPLITSTREE implements the splitdecomposition methodb. This program iscurrently available as a Mac executableor as a Unix version (ftp://ftp.uni-bielefeld.de/pub/math/splits/).

• NETWORK is a program for estimatingmedian-joining networks. The program isavailable as a DOS executable(http://www.fluxusengineering.com/sharenet.htm) and was written by A. Röhl(Mathematisches Seminar, UniversitätHamburg, Germany).

• ARLEQUIN is a Java program thatimplements the method of molecularvariance parsimony (http://lgb.unige.ch/arlequin/index.php3). ARLEQUIN waswritten by S. Scheneider et al. (Dept ofAnthropology, University of Geneva,Switzerland).

• TCS is a Java program for estimatingstatistical parsimony networksc (http://bioag.byu.edu/zoology/crandall_lab/tcs.htm). It allows the estimation of rootprobabilities.

• PAL is an open-source Java library formolecular evolution and phylogenetics

(http://www.pal-project.org/). Its >120modules allow the fast prototyping ofspecial-purpose analysis programs. PALhas been used to compute likelihoods fornetworks. The PAL project is led byK. Strimmer (Dept of Zoology, Universityof Oxford, UK) and A. Drummond (Schoolof Biological Science, University ofAuckland, New Zealand).

• T-REX is a C++ program implementingthe reticulogram estimationd. Windows,Macintosh and DOS executables areavailable (http://www.fas.umontreal.ca/biol/casgrain/en/labo/t-rex/index.html).

Referencesa Kuznetsov, I. and Morozov, P. (1996)

GEOMETRY: a software package for nucleotidesequence analysis using statistical geometry insequence space. Comput. Appl. Biosci. 12, 297–301

b Huson, D.H. (1998) SplitsTree: analyzing andvisualizing evolutionary data. Bioinformatics14,68–73

c Clement, M. et al. (2000) TCS: a computerprogram to estimate gene genealogies. Mol. Ecol.9, 1657–1660

d Makarenkov, V. and Legendre, P. (2000)Improving the additive tree representation of adissimilarity matrix using reticulations. In DataAnalysis, Classification and Related Methods(Kiers, H.A.L. et al., eds), pp. 35–46, Springer

Box 3. Software availability

Page 8: Intraspecific gene genealogies: trees grafting into networks

TRENDS in Ecology & Evolution Vol.16 No.1 January 2001

http://tree.trends.com

44 Review

References1 Swofford, D.L. et al. (1996) Phylogenetic

inference. In Molecular Systematics (Hillis,D.M. et al., eds), pp. 407–514, SinauerAssociates

2 Hennig, W. (1966) Phylogenetic Systematics,University of Illinois Press

3 McDade, L. (1990) Hybrids and phylogeneticsystematics I. Patterns of character expressionin hybrids and their implications for cladisticanalysis. Evolution 44, 1685–1700

4 McDade, L.A. (1992) Hybrids and phylogeneticsystematics II. The impact of hybrids oncladistic analysis. Evolution 46, 1329–1346

5 Lecointre, G. et al. (1993) Species sampling hasa major impact on phylogenetic inference. Mol.Phylogenet. Evol. 3, 205–224

6 Diday, E. and Bertrand, P. (1986) An extensionof hierarchical clustering: the pyramidalrepresentation. In Pattern Recognition inPractice (Gelsema, E.S. and Kanal, L.N., eds),pp. 411–424, North–Holland

7 Eigen, M. et al. (1988) Statistical geometry insequence space: a method of quantitativesequence analysis. Proc. Natl. Acad. Sci. U. S. A.85, 5917

8 Bandelt, H-J. and Dress, A.W.M. (1992) Splitdecomposition: a new and useful approach to

phylogenetic analysis of distance data. Mol.Phylogenet. Evol. 1, 242–252

9 Bandelt, H-J. et al. (1995) Mitochondrialportraits of human populations using mediannetworks. Genetics 141, 743–753

10 Bandelt, H-J. et al. (2000) Median networks:speedy construction and greedy reduction, onesimulation, and two case studies from humanmtDNA. Mol. Phylogenet. Evol. 16, 8–28

11 Bandelt, H-J. et al. (1999) Median-joiningnetworks for inferring intraspecific phylogenies.Mol. Biol. Evol. 16, 37

12 Foulds, L.R. et al. (1979) A graph theoreticapproach to the development of minimalphylogenetic trees. J. Mol. Evol. 13, 127–149

13 Templeton, A.R. et al. (1992) A cladistic analysisof phenotypic associations with haplotypesinferred from restriction endonuclease mappingand DNA sequence data III. Cladogramestimation. Genetics 132, 619–633

14 Crandall, K.A. and Templeton, A.R. (1999)Statistical methods for detecting recombination.In The Evolution of HIV (Crandall, K.A., ed.),pp. 153–176, The Johns Hopkins UniversityPress

15 Excoffier, L. and Smouse, P.E. (1994) Usingallele frequencies and geographic subdivision toreconstruct gene trees within a species:

molecular variance parsimony. Genetics 136,343–359

16 Fitch, W.M. (1997) Networks and viralevolution. J. Mol. Evol. 44, S65–S75

17 Strimmer, K. and Moulton, V. (2000) Likelihoodanalysis of phylogenetic networks using directedgraphical models. Mol. Biol. Evol. 17, 875–881

18 Griffiths, R.C. and Marjoram, P. (1997) Anancestral recombination graph. In Progress inPopulation Genetics and Human Evolution.(Donelly, P. and Tavare, S., eds), pp. 257–270,Springer-Verlag

19 Strimmer, K. et al. Recombination analysisusing directed graphical models. Mol. Biol. Evol.(in press)

20 Makarenkov, V. and Legendre, P. (2000)Improving the additive tree representation of adissimilarity matrix using reticulations. InData Analysis, Classification and RelatedMethods (Kiers, H.A.L. et al., eds), pp. 35–46,Springer

21 Xu, S. (2000) Phylogenetic analysis underreticulate evolution. Mol. Biol. Evol. 17,897–907

22 Castelloe, J. and Templeton, A.R. (1994) Rootprobabilities for intraspecific gene trees underneutral coalescent theory. Mol. Phylogenet.Evol. 3, 102–113

accuracy and robustness of different methods. Ingeneral, distance methods might imply a loss ofinformation by summarizing the difference betweenhaplotypes with one value. Optimality criterionmethods (likelihood and least squares) offer a reliablestatistical assessment, hypotheses testing and, insome cases, parameter estimation. Most of thesemethods are conveniently implemented in computerprograms, which are in some cases capable ofhandling many sequences. Several properties ofdifferent methods are summarized in Table 1.

Rooting intraspecific phylogeniesIn many cases, the problem of biological interestrequires a root, or at least some knowledge of therelative ages of haplotypes22. Rooting networks isespecially difficult because outgroups are oftenseparated from the ingroup by many mutational stepsand because individuals within a species are similar toeach other. This leads to a lack of power to decide wherethe rooting should take place. However, predictionsfrom coalescent theory can be applied in intraspecificphylogeny reconstruction to root the network. Bydefinition, the oldest ancestral haplotype is the root ofthe phylogeny. Given coalescent criteria, this ancestralhaplotype is identified as the most frequent. Thenumber of connections and the position of a haplotypein the network can also be used to assign rootprobabilities. The likelihood-network method providesa way to root a phylogenetic network by choosing anode that produces the most likely network.

Conclusions and future directionsTraditional methods for estimating phylogenieswere not designed and might not be adequate for

within-species phylogeny. Network approaches canincorporate population processes in the constructionor refinement of haplotype relationships. Moreover,networks allow a more detailed display ofpopulational information than strictly bifurcatingtrees. Although we have focused on the applicationof networks to sequence data, most methodsdescribed here can be applied to proteins orrestriction-fragment length polymorphism data. Ingeneral, the main interest of intraspecificphylogenies is not in themselves but rather in theirapplications. They have been used for detectingrecombination23,24, delimiting species25, inferringmodes of speciation26, partitioning populationhistory and structure27, and studying genotype andphenotype associations28.

The development of networks in a likelihoodframework, which allows hypothesis testing with asound statistical basis, is particularly interesting.However, the computational tractability of likelihoodnetworks remains a drawback. Coalescent theory isan active area of research that is likely to yield newpredictions that could be used in the development ofmore refined network approaches, which currentlyare not based on the coalescent. Most of the methodsdescribed here are implemented in computerprograms (BOX 3), and other network methods havebeen reviewed recently29. Given the recent interest inintraspecific phylogenies and their applications, weexpect to see increased interest in the developmentand application of intraspecific phylogenetics usingnetwork approaches to depict genealogicalrelationships. Future work will involve comparingmethods and testing the robustness of theirassumptions.

Review

AcknowledgementsComments by ChrisSimon and François-Joseph Lapointeimproved this review. Wethank M. Coulthart,A. Whiting, M. Porter andJ.W. Sites Jr for helpfulsuggestions. Our workwas supported by NSF-DEB 0073154, the Alfred P.Sloan Foundation andNIH R01-HD34350.

Page 9: Intraspecific gene genealogies: trees grafting into networks

In a seminal paper in 1977, Peter Grubb1 introducedthe concept of the ‘regeneration niche’ to help explainthe coexistence of similar species. Grubb noted thatspecies that share much the same life form,phenology and habitat range might neverthelesshave different seedling requirements. For example,‘when a whole large tree is blown over’ the large gapthus formed would favour light-demandingseedlings, whereas smaller gaps would favour shade-tolerant recruits. The literature on the ecology ofseeds and seedlings has expanded enormously sincethis time. Seed ecology is widely studied because ofits assumed importance in the population growth ofplants. Seed traits are included in general systemsfor classifying plant functional types2,3. The samebroad recognition has not been given to the mode ofpersistence of established plants. When a tree isblown over, gaps might not be filled by seedlings butby shoots sprouting from the fallen tree. Sproutsgrow much faster than seedlings and can quicklyreoccupy their own gaps. Sprouting ability can havemajor impacts on plant populations: turnover ofpopulations is reduced; the effects of disturbance areminimized; and dependence on seeds for populationmaintenance might become negligible. Species differin their sprouting ability, and both strong andweakly sprouting species occur in diverseecosystems4,5. Here, we review studies of thephenomenon emphasizing emerging generalizationsand implications for the population biology of woodyplants.

The ecology of sprouting has analogies toclonality6. However, although clonal plants generallysprout, only a small fraction of woody sprouters areclonal and capable of vegetative spread. Sproutingresponse is difficult to quantify partly because thereis a continuum of responses to disturbances of varyingseverity7. This could partly account for its absencefrom general plant strategy schemes (Box 1). Afterthe least severe disturbance, such as damage causedby caterpillar feeding, plants replace lost tissues bysprouting from buds. As the severity of disturbanceincreases, plants diverge in their ability to recover bysprouting (Fig. 1). Most interest has been indisturbances that can potentially kill plants such asfire, hurricane damage, drought, flooding, herbivoryand landslides, as well as anthropogenic disturbancesuch as forest clearing. In Mediterranean typeshrublands, where crown fires are relatively frequent,shrub species are often either killed outright byburning (nonsprouters or ‘seeders’) or recovervegetatively from roots or stems (sprouters)4,8.Classifying sprouting behaviour is harder in forestsand woodlands because species often diverge in theirsprouting response at different life history stages5,7,9.Some species never sprout; in some species, sproutingability increases with size to reach a maximum inadult stages, although in other species sprouting iscommon in juveniles but adults are unable to sprout(Fig. 2). Some forest trees retain a bud bank,sprouting continuously with or without disturbanceand thus come close to immortality. Examples includeTilia cordata (small-leaved lime) in Britain, whosenorthern populations have survived by sproutingsince climates cooled and reproduction ceased 5000years ago10. Ginkgo biloba (maidenhair tree) mightowe its survival in China to the extraordinarypersistence of trees that resprout from specializedbasal swellings on the trunk11.

Sprouting is common, and might be the ancestralstate, in woody angiosperms12. Most conifers do notsprout although there are exceptions in severalunrelated lineages, including species of Pinus in the

Many woody plants can resprout and many ecosystems are dominated byresprouters.They persist in situ through disturbance events such as fire, floodingor wind storms. However, the importance of ‘persistence’ in plant demographyhas been neglected in favour of ‘recruitment’.Thus much research on plantregeneration, conservation and evolution has focused on the importance of safesites, seed and seedling banks, dispersal and germination with the impliedimportance of de novo replacement rather than persistence. Recent researchshows a growing appreciation for the role of sprouting as a form of persistence ina diversity of ecosystems and tradeoffs between the two regeneration modes.

Ecology of sprouting in woody plants:the persistence nicheWilliam J. Bond and Jeremy J. Midgley

23 Holmes, E.C. et al. (1999) The influence ofrecombination on the population structure andevolution of the human pathogen Neisseriameningitidis. Mol. Biol. Evol. 16, 741–749

24 Templeton, A.R. et al. (2000) Recombinational andmutational hotspots within the human lipoproteinlipase gene. Am. J. Hum. Genet. 66, 69–83

25 Shaw, K.L. (1999) A nested analysis of song groups

and species boundaries in the Hawaiian cricketgenus Laupala. Mol. Phylogenet. Evol. 11, 332–341

26 Barraclough, T.G. and Vogler, A.P. (2000)Detecting the geographical pattern of speciationfrom species-level phylogenies. Am. Nat. 155,419–434

27 Gómez-Zurita, J. et al. (2000) Nested cladisticanalysis, phylogeography and speciation in the

Timarcha goettingensis complex (Coleoptera,Chrysomelidae). Mol. Ecol. 9, 557–560

28 Sing, C.F. et al. (1992) Application of cladisticsto the analysis of genotype–phenotyperelationships. Eur. J. Epidemiol. 8, 3–9

29 Lapointe, F-J. How to account for reticulationevent in phylogenetic analysis: a review ofdistance-based methods. J. Classif. (in press)

TRENDS in Ecology & Evolution Vol.16 No.1 January 2001

http://tree.trends.com 0169–5347/01/$ – see front matter © 2001 Elsevier Science Ltd. All rights reserved. PII: S0169-5347(00)02033-4

45ReviewReview

William J. Bond*Jeremy J. MidgleyDept Botany, University ofCape Town, Private Bag,Rondebosch 7701, South Africa. *e-mail:bond @ botzoo.uct.ac.za