

Available online at www.sciencedirect.com

www.elsevier.com/locate/ympev

Molecular Phylogenetics and Evolution 45 (2007) 547–563

Phylogenetic and evolutionary implications of complete chloroplast genome sequences of four early-diverging angiosperms: Buxus (Buxaceae), Chloranthus (Chloranthaceae), Dioscorea (Dioscoreaceae), and Illicium (Schisandraceae)

Debra R. Hansen a, Sayantani G. Dastidar a, Zhengqiu Cai a, Cynthia Penaflor b, Jennifer V. Kuehl c, Jeffrey L. Boore c,1, Robert K. Jansen a,*

a Section of Integrative Biology and Institute of Cellular and Molecular Biology, Biological Laboratories 404, University of Texas, Austin, TX 78712, USA

b Biology Department, 373 WIDB, Brigham Young University, Provo, UT 84602, USA

c DOE Joint Genome Institute and Lawrence Berkeley National Laboratory, Walnut Creek, CA 94598, USA

Received 24 January 2007; revised 5 June 2007; accepted 11 June 2007

Available online 16 June 2007

Abstract

We have determined the complete chloroplast genome sequences of four early-diverging lineages of angiosperms, Buxus (Buxaceae), Chloranthus (Chloranthaceae), Dioscorea (Dioscoreaceae), and Illicium (Schisandraceae), to examine the organization and evolution of plastid genomes and to estimate phylogenetic relationships among angiosperms. For the most part, the organization of these plastid genomes is quite similar to the ancestral angiosperm plastid genome, with a few notable exceptions. Dioscorea has lost one protein-coding gene, rps16; this gene loss has also happened independently in four other land plant lineages: liverworts, conifers, Populus, and legumes. There has also been a small expansion of the inverted repeat (IR) in Dioscorea that has duplicated trnH-GUG. This event has also occurred multiple times in angiosperms, including in monocots, and in the two basal angiosperms Nuphar and Drimys. The Illicium chloroplast genome is unusual in having a 10 kb contraction of the IR. The four taxa sequenced represent key groups in resolving phylogenetic relationships among angiosperms. Illicium is one of the basal angiosperms in the Austrobaileyales, Chloranthus (Chloranthales) remains unplaced in angiosperm classifications, and Buxus and Dioscorea are early-diverging eudicots and monocots, respectively. We have used sequences for 61 shared protein-coding genes from these four genomes and combined them with sequences from 35 other genomes to estimate phylogenetic relationships using parsimony, likelihood, and Bayesian methods. There is strong congruence among the trees generated by the three methods, and most nodes have high levels of support. The results indicate that Amborella alone is sister to the remaining angiosperms; the Nymphaeales represent the next-diverging clade followed by Illicium; Chloranthus is sister to the magnoliids and together this group is sister to a large clade that includes eudicots and monocots; and Dioscorea represents an early-diverging lineage of monocots just internal to Acorus.

Published by Elsevier Inc.

Keywords: Chloroplast genome evolution; Basal angiosperm phylogeny; Chloranthales; Austrobaileyales; Inverted repeat

1055-7903/$ - see front matter Published by Elsevier Inc.

doi:10.1016/j.ympev.2007.06.004

* Corresponding author. Fax: +1 512 232 9529. E-mail address: [email protected] (R.K. Jansen).

1 Present address: SymBio, 1455 Adams Drive, Menlo Park, CA 94025, USA.

1. Introduction

The history of molecular phylogenetic studies in angiosperms has been one of a common theme: as techniques have improved, both for generating sequence data and for performing phylogenetic analyses, so has our confidence in the results. Since the 1992 single-gene, 60-taxa study of Hamby and Zimmer, researchers have continued to add both taxa and characters, often pushing the limits of the analytical tools used to interpret the data. Thanks to many recent studies, there has been a general convergence with regard to the placement of the major lineages within flowering plants (summarized in Soltis et al., 2005 and Qiu et al., 2005). Some questions remain, and support is still low for some nodes. Specifically, there are still uncertainties regarding the relationships among the basal angiosperms and early-diverging lineages of eudicots. Each new study attempts to put the issue to bed, but even recent studies still show ambiguous or conflicting data. For example, most studies (e.g. Barkman et al., 2000; Graham and Olmstead, 2000; Qiu et al., 2005) suggest the placement of Amborella as sister to the remaining angiosperms, but support is sometimes low, and is dependent on methods of phylogenetic reconstruction. A recent study using eight markers from three genomes (Qiu et al., 2006) suggests that chloroplast data lend support for Amborella alone, whereas mitochondrial genes support Amborella + Nymphaeales as sister to other angiosperms. Most studies (Mathews and Donoghue, 2000; Qiu et al., 2005, among others) show Austrobaileyales (Austrobaileyaceae, Trimeniaceae, and Schisandraceae, including Illiciaceae; Bremer et al., 2003) to be the next-diverging lineage, followed by the magnoliids. However, relationships among the major clades and the placement of many early-diverging lineages of angiosperms and eudicots remain unresolved or weakly supported.

Since the publication of the first angiosperm chloroplast genome sequence of Nicotiana in 1986 (Shinozaki et al., 1986), researchers have been keen to use complete chloroplast genome sequences to infer a wide range of information about plants, including the examination of phylogenetic relationships. Phylogenetic analysis, however, requires substantial taxon sampling, and the use of whole genomes to infer phylogeny has been constrained by the paucity of complete genomes sequenced (Soltis et al., 2004; Stefanovic et al., 2004; Leebens-Mack et al., 2005; Jansen et al., 2006). But with the advent of relatively rapid and less expensive cloning and sequencing techniques (e.g. Jansen et al., 2005; McNeal et al., 2006; Moore et al., 2006), we have seen a recent flurry of plastid genomes sequenced. Since 2003 more than 40 new angiosperm plastid genomes have been published and are publicly available on GenBank. This rapid growth in the availability of complete chloroplast genome sequences has provided a wealth of new data for phylogenetic analyses of angiosperm relationships. A number of recent phylogenetic studies have examined relationships among angiosperms using 61 protein-coding genes and 13–35 taxa (e.g. Goremykin et al., 2003a,b, 2004, 2005; Leebens-Mack et al., 2005; Chang et al., 2006; Lee et al., 2006; Jansen et al., 2006; Bausher et al., 2006; Cai et al., 2006; Ruhlman et al., 2006). Overall, these studies have provided additional support and resolution of relationships among some of the major clades of angiosperms, including support for the monophyly of magnoliids, a sister relationship between magnoliids and a clade that includes both monocots and eudicots, the placement of Vitis as an early-diverging lineage of rosids, and a sister relationship between Caryophyllales and asterids.

The rapid increase in the availability of complete plastid genome sequences has also provided much new information about the organization and evolution of angiosperm genomes. In general, angiosperm chloroplast genomes are highly conserved in their organization, gene content, and gene order (reviewed in Sugiura, 1992; Palmer, 1991; Raubeson and Jansen, 2005), and this conservation is also apparent in recently sequenced gymnosperm genomes (Wu et al., 2007). In most cases, the genomes have a quadripartite organization with a large inverted repeat (IR) separating the two single-copy regions (small and large single-copy regions, SSC and LSC). The IR ranges in size from 20 to 27 kb in most angiosperms, and variation in the extent of the IR is due largely to small expansions and contractions at the SSC and LSC boundaries (Goulding et al., 1996). In several angiosperm lineages, including some Fabaceae (Palmer and Thompson, 1982; Palmer et al., 1987b; Wolfe, 1988; Saski et al., 2005) and Geraniaceae (Price et al., 1990), one copy of the IR has been lost or greatly reduced in size, whereas in other cases the IR has expanded. The most extreme case of IR expansion is in the Pelargonium chloroplast genome where the IR is over 75 kb (Palmer et al., 1987a; Chumley et al., 2006). Gene content and order are also highly conserved with most angiosperm genomes having 120–130 genes in the same order and orientation as tobacco. The tobacco chloroplast genome is considered to have the ancestral angiosperm chloroplast genome organization based on its nearly identical organization to the genomes from several basal angiosperm lineages including Amborella (Goremykin et al., 2003a), Nuphar (Raubeson et al., 2007), Nymphaea (Goremykin et al., 2004), and magnoliids (Goremykin et al., 2003a; Cai et al., 2006). Unfortunately, our understanding of the organization of angiosperm chloroplast genomes is limited because most available complete genome sequences are from more derived eudicot and monocot lineages.

In this paper, we report complete chloroplast genome sequences of four new angiosperms representing two early-diverging angiosperms (Chloranthus and Illicium), one basal eudicot (Buxus) and one basal monocot (Dioscorea). The chloroplast genome sequences are utilized to address two primary issues: (1) genome organization and evolution of angiosperms, with an emphasis on gene content and evolution of the IR; and (2) phylogenetic relationships among angiosperms, with an emphasis on early-diverging lineages. Chloranthus and Illicium are key taxa to include, for they both sit at crossroads between important lineages. Illicium, representing Austrobaileyales, has long been thought to have a place between Amborella/Nymphaeales and magnoliids. Chloranthus from the Chloranthales has been placed in several different phylogenetic positions in angiosperms and its placement continues to be problematic despite numerous studies based on single and multiple gene sequences (see Soltis et al., 2005 for a review). In fact, the placement of the Chloranthales remains one of the most difficult remaining issues for resolving relationships among the deep nodes of the angiosperm tree of life. Buxus is thought to have diverged after Ranunculales, and to be sister to the core eudicots (Soltis et al., 2005). Buxus shares some characters with basal angiosperms (e.g. the lack of ellagic acid and the presence of undifferentiated tepals instead of sepals and petals) and some with the core eudicots, such as tri-aperturate pollen. This particular clade of early-diverging eudicots is interesting because its diversification coincides with a significant evolution of floral development that separates basal angiosperms and core eudicots (Soltis et al., 2005). Finally, the phylogenetic placement of Dioscoreaceae traditionally has been problematic because of its highly reduced floral morphology. However, recent molecular phylogenetic studies have positioned this family near the base of monocots (reviewed in Soltis et al., 2005). The addition of the Dioscorea chloroplast genome adds to the sampling of basal lineages of monocots, which currently are represented by only Acorus (Goremykin et al., 2005; Leebens-Mack et al., 2005).

2. Materials and methods

2.1. Chloroplast isolation, amplification, and sequencing

We obtained leaf material from the University of Connecticut Greenhouses for Buxus microphylla (M.R. Opel 262, CONN), Chloranthus spicatus (M.R. Opel 263, CONN), Dioscorea elephantipes (M.R. Opel 264, CONN), and Illicium oligandrum (M.R. Opel 265, CONN). Ten to 20 g of fresh leaf material was used to isolate chloroplasts using the sucrose-gradient method (Palmer, 1986). The chloroplasts were lysed and the entire chloroplast genomes were amplified using rolling circle amplification (RCA, using the REPLI-g kit; Qiagen, Valencia, CA, USA) following the methods outlined in Jansen et al. (2005). The RCA product was digested with the restriction enzymes EcoRI and BstBI and the resulting fragments were separated by agarose gel electrophoresis to determine the presence/quality of chloroplast DNA. The RCA product was sheared by serial passage through a narrow aperture using a GeneMachines HydroShear device (Genomic Solutions, Ann Arbor, MI, USA), and the resulting fragments were enzymatically repaired to blunt ends and gel purified, and then ligated into pUC18 plasmids. The clones were introduced into Escherichia coli by electroporation, then plated onto nutrient agar with antibiotic selection and grown overnight. Colonies were randomly selected and robotically processed through RCA of plasmid clones, sequencing reactions were carried out using BigDye chemistry (Applied Biosystems, Foster City, CA, USA), reaction cleanup was done with solid-phase reversible immobilization, and sequencing was done on an ABI 3730 XL automated DNA sequencer. Detailed protocols are available at http://www.jgi.doe.gov/sequencing/protocols/protsproduction.html.

2.2. Genome assembly and annotation

Sequences from randomly chosen clones were processed using Phred and assembled based on overlapping sequence into a draft genome sequence using Phrap (Ewing and Green, 1998). Quality of the sequence and assembly was verified using Consed (Gordon et al., 1998). In most regions of the genomes we had 6- to 12-fold coverage, but there were a few areas with gaps or low depth of coverage. PCR and sequencing at The University of Texas at Austin were used to bridge gaps and fill in areas of low coverage in the genome. Additional sequences were added until a completely contiguous consensus was created representing the entire chloroplast genome with a minimum of 2× coverage and a consensus quality score of Q40 or greater.
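The Q40 threshold can be read through the standard Phred definition, Q = −10 log10(P), where P is the estimated probability that a base call is wrong. The sketch below only illustrates that conversion; the function names are ours and are not part of Phred, Phrap, or Consed.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred-scaled quality score Q into an error probability P."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Convert an error probability P into a Phred-scaled quality score Q."""
    return -10 * math.log10(p)


def expected_errors(q: float, length_bp: int) -> float:
    """Expected number of miscalled bases if every position had quality Q."""
    return phred_to_error_probability(q) * length_bp


if __name__ == "__main__":
    # Q40 corresponds to one expected error per 10,000 bases.
    print(phred_to_error_probability(40))
    # Upper bound on expected errors for a genome the size of Buxus (159,010 bp)
    # if every consensus base sat exactly at the Q40 floor.
    print(expected_errors(40, 159_010))
```

Because Q40 is a minimum, the expected error count computed this way is a ceiling for a consensus that meets the stated criterion.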

The first bp after IRa on the psbA side was defined as bp one for gene annotation. The genomes of Buxus, Chloranthus, Dioscorea, and Illicium were annotated using the program DOGMA (Dual Organellar GenoMe Annotator, Wyman et al., 2004). All protein-coding genes were identified using the plastid/bacterial genetic code.

2.3. Sequence alignment

Sixty-one protein-coding genes (the same ones included in the analyses of Goremykin et al., 2003a,b, 2004, 2005; Leebens-Mack et al., 2005; Cai et al., 2006) were extracted from Buxus, Chloranthus, Dioscorea, and Illicium using DOGMA (Wyman et al., 2004). The same set of 61 genes was extracted from chloroplast genome sequences of 35 other sequenced chloroplast genomes (see Table 2 for complete list of genomes examined). All 61 protein-coding genes of the 39 taxa were aligned using the Multiple Sequence Analysis Tool for Phylogeny program (MSAT, Z. Cai and R. Jansen unpublished). This program translated the sequence into amino acids, which were then aligned using MUSCLE (Edgar, 2004), manually adjusted, and then nucleotide sequences of these genes were aligned by constraining them to the aligned amino acid sequences. A Nexus file with character sets for phylogenetic analyses was generated after nucleotide sequence alignment was completed. The complete nucleotide alignment is available online at the ChloroplastDB (Cui et al., 2006, http://chloroplast.cbio.psu.edu/).
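Constraining a nucleotide alignment to an amino acid alignment amounts to replacing each aligned residue with its codon and each gap with three gap characters. MSAT is unpublished, so the following is only a sketch of that general idea under our own assumptions; `thread_codons` is a hypothetical helper, and it assumes the coding sequence is in frame and was the source of the translated protein.

```python
def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Impose the gaps of an aligned amino acid sequence on its coding sequence.

    Each residue consumes the next codon of `cds`; each gap becomes three gap
    characters. A trailing stop codon in `cds`, if present, is simply not used.
    """
    residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(cds) < 3 * residues:
        raise ValueError("coding sequence is shorter than the protein requires")

    pieces = []
    position = 0
    for aa in aligned_protein:
        if aa == gap:
            pieces.append(gap * 3)
        else:
            pieces.append(cds[position:position + 3])
            position += 3
    return "".join(pieces)


if __name__ == "__main__":
    # Toy input, invented for illustration: a two-residue protein aligned
    # with one gap between the residues.
    print(thread_codons("M-K", "ATGAAA"))
```

Aligning at the amino acid level and threading codons back keeps every gap a multiple of three, so the reading frame is preserved across taxa.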

2.4. Phylogenetic analysis

We used 61 chloroplast genes, totalling 46,266 characters, to infer phylogenetic trees on the 39-taxa dataset using three different methods: maximum parsimony (MP) was performed using PAUP* version 4.10 (Swofford, 2003), maximum likelihood (ML) using GARLI version 0.942 (Zwickl, 2006), and Bayesian inference using MrBayes 3.1.1 (Huelsenbeck and Ronquist, 2001). All analyses excluded gap regions to avoid ambiguities in alignment caused by insertions and deletions. The MP searches included 100 random addition replicates and TBR branch swapping with the Multrees option. Akaike information criterion via Modeltest 3.7 (Posada and Crandall, 1998) was used to determine the most appropriate model of DNA sequence evolution for each of the 61 genes. For ML analysis we used GARLI, which performs heuristic searches under the General Time Reversible (GTR) model of nucleotide substitution, with gamma distributed rate heterogeneity and a proportion of invariant sites. We conducted two independent runs of the program, using the automated stopping criterion, which specifies that the search is finished when the same −ln score is found for 20,000 consecutive generations. Likelihood scores were obtained using PAUP*, as PAUP* is better at optimizing branch lengths on the final topology (Zwickl, 2006). For Bayesian analysis, we performed two analyses, one with the GTR + G + I model on a concatenated data matrix of 61 genes and a second with genes partitioned into five different models (Supplemental Table 1). Each run started with a random starting tree, default priors, and four Markov chains with heating values of 0.03, sampled every 100 generations. Convergence was confirmed by using AWTY graphical analysis (Wilgenbusch et al., 2004). The analyses with a single model and with genes partitioned into five different models (Supplemental Table 1) ran for 2.0 × 10^7 generations. Burn-in trees were discarded, and the remaining trees and their parameters saved. We used the frequency of inferred relationships to represent estimated posterior probabilities (PP). Clades with PP ≥ 0.95 are considered strongly supported. Non-parametric bootstrap analyses (Felsenstein, 1985) were performed for MP analysis in PAUP with 1000 replicates with TBR branch swapping, one random addition replicate, and the Multrees option and for ML analyses in GARLI with 100 replicates using the automated stopping criterion set at 10,000 generations for each replicate.

Table 1
Comparison of major features of Buxus, Chloranthus, Dioscorea, and Illicium plastid genomes

| Feature | Buxus | Chloranthus | Dioscorea | Illicium |
| --- | --- | --- | --- | --- |
| Entire plastid size (bp) | 159,010 | 157,772 | 152,609 | 148,552 |
| Large single copy (LSC) (bp) | 88,143 | 88,108 | 82,777 | 98,057 |
| Small single copy (SSC) (bp) | 17,747 | 18,398 | 18,806 | 20,267 |
| Inverted repeat (IR) (bp) | 26,560 | 26,133 | 25,513 | 15,114 |
| Percent coding/non-coding | 57.3/42.7 | 58.5/41.5 | 59.6/40.4 | 55.8/44.2 |
| A–T (%), total genome | 61.9 | 62.9 | 61.1 | 61.0 |
| A–T (%), IR | 57 | 58.6 | 57 | 53.8 |
| A–T (%), LSC | 65 | 63.7 | 66.1 | 62.9 |
| A–T (%), SSC | 67.8 | 65.1 | 68.9 | 66.1 |
| Number of genes (different/total) | 113/129 | 113/129 | 112/129 | 113/124 |
| Number of genes duplicated in IR | 16 | 16 | 17 | 11 |
| Number of genes with introns (with 2 introns) | 18 (2) | 18 (2) | 17 (2) | 18 (2) |
| Number of protein-coding genes (total/in IR) | 79/5 | 79/5 | 78/5 | 79/3 |
| Number of rRNA genes | 4 | 4 | 4 | 4 |
| Number of tRNA genes (total/in IR) | 37/7 | 37/7 | 38/8 | 35/5 |
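Because the plastid genome is quadripartite, the size rows of Table 1 should satisfy total = LSC + SSC + 2 × IR. The short check below applies that identity to the four genomes; the dictionary layout and function names are ours, and the numbers are copied from Table 1 as it is transcribed here.

```python
# Sizes in bp, copied from Table 1 as transcribed in this document.
TABLE_1_SIZES = {
    "Buxus":       {"total": 159_010, "lsc": 88_143, "ssc": 17_747, "ir": 26_560},
    "Chloranthus": {"total": 157_772, "lsc": 88_108, "ssc": 18_398, "ir": 26_133},
    "Dioscorea":   {"total": 152_609, "lsc": 82_777, "ssc": 18_806, "ir": 25_513},
    "Illicium":    {"total": 148_552, "lsc": 98_057, "ssc": 20_267, "ir": 15_114},
}


def size_discrepancy(row: dict) -> int:
    """Return (LSC + SSC + 2 * IR) - total for one genome, in bp."""
    return row["lsc"] + row["ssc"] + 2 * row["ir"] - row["total"]


def check_table(sizes: dict) -> dict:
    """Map each taxon to the discrepancy between its parts and its total."""
    return {taxon: size_discrepancy(row) for taxon, row in sizes.items()}


if __name__ == "__main__":
    for taxon, diff in check_table(TABLE_1_SIZES).items():
        print(f"{taxon}: {diff:+d} bp")
```

With the values as transcribed, the Buxus, Dioscorea, and Illicium columns balance exactly, while the Chloranthus parts sum to 158,772 bp against a listed total of 157,772 bp. A difference of exactly 1,000 bp suggests a single mis-set digit in one of the Chloranthus entries, although the check alone cannot say which.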

2.5. Tests of alternate topologies

Shimodaira–Hasegawa (SH) tests (Shimodaira and Hasegawa, 1999) were performed to determine if alternative topologies regarding basal angiosperms, placement of Chloranthus, and relationship of monocots to eudicots were significantly different from the best ML trees. Constraint topologies with the alternative tree topologies were used and the SH test was conducted using RELL optimization (Goldman et al., 2000) as implemented in PAUP* version 4.10 (Swofford, 2003).
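For the Bayesian analyses of Section 2.4, posterior probabilities were taken as the frequency of inferred relationships among the trees retained after burn-in. A minimal sketch of that bookkeeping follows, assuming each sampled tree has already been reduced to a collection of clades, with each clade held as a frozenset of taxon names. The function names and the input format are illustrative only and are not part of MrBayes.

```python
from collections import Counter


def clade_posteriors(sampled_trees, burn_in):
    """Estimate clade posterior probabilities as frequencies among retained trees.

    `sampled_trees` is a list in sampling order; each tree is an iterable of
    clades, and each clade is a frozenset of taxon names. The first `burn_in`
    trees are discarded.
    """
    retained = sampled_trees[burn_in:]
    if not retained:
        raise ValueError("no trees left after discarding burn-in")
    # set(tree) guards against a clade being listed twice in one tree.
    counts = Counter(clade for tree in retained for clade in set(tree))
    return {clade: n / len(retained) for clade, n in counts.items()}


def strongly_supported(posteriors, threshold=0.95):
    """Return the clades whose posterior probability meets the threshold."""
    return {clade for clade, pp in posteriors.items() if pp >= threshold}


if __name__ == "__main__":
    # Toy sample, invented for illustration; not results from this study.
    a = frozenset({"Chloranthus", "magnoliids"})
    b = frozenset({"monocots", "eudicots"})
    c = frozenset({"Chloranthus", "monocots"})
    trees = [[c], [a, b], [a, b], [a, b], [b]]
    pp = clade_posteriors(trees, burn_in=1)
    print(pp[a], pp[b])
    print(strongly_supported(pp))
```

In the toy sample, clade `b` appears in all four retained trees and clade `a` in three of them, so only `b` passes the 0.95 threshold used in the text.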

3. Results

3.1. Size, gene content, and organization

Table 2
Taxa included in phylogenetic analyses with GenBank accession numbers and references

| Taxon | GenBank Accession Nos. | Reference |
| --- | --- | --- |
| Gymnosperms (outgroups) | | |
| Pinus thunbergii | NC_001631 | Wakasugi et al. (1994) |
| Ginkgo biloba | DQ069337–DQ069702 | Leebens-Mack et al. (2005) |
| Basal angiosperms | | |
| Amborella trichopoda | NC_005086 | Goremykin et al. (2003b) |
| Nuphar advena | NC_008788 | Raubeson et al. (2007) |
| Nymphaea alba | NC_006050 | Goremykin et al. (2004) |
| Illicium oligandrum | NC_009600 | Current study |
| Chloranthus spicatus | NC_009598 | Current study |
| Magnoliids | | |
| Calycanthus floridus | NC_004993 | Goremykin et al. (2003a) |
| Drimys granadensis | NC_008456 | Cai et al. (2006) |
| Liriodendron tulipifera | NC_008326 | Cai et al. (2006) |
| Piper cenocladum | NC_008457 | Cai et al. (2006) |
| Monocots | | |
| Acorus americanus | DQ069337–DQ069702 | Leebens-Mack et al. (2005) |
| Dioscorea elephantipes | NC_009601 | Current study |
| Oryza sativa | NC_001320 | Hiratsuka et al. (1989) |
| Saccharum officinarum | NC_006084 | Asano et al. (2004) |
| Phalaenopsis aphrodite | NC_007499 | Chang et al. (2006) |
| Triticum aestivum | NC_002762 | Ogihara et al. (2000) |
| Typha latifolia | DQ069337–DQ069702 | Leebens-Mack et al. (2005) |
| Yucca schidigera | DQ069337–DQ069702 | Leebens-Mack et al. (2005) |
| Zea mays | NC_001666 | Maier et al. (1995) |
| Eudicots | | |
| Arabidopsis thaliana | NC_000932 | Sato et al. (1999) |
| Atropa belladonna | NC_004561 | Schmitz-Linneweber et al. (2002) |
| Buxus microphylla | NC_009599 | Current study |
| Citrus sinensis | NC_008334 | Bausher et al. (2006) |
| Cucumis sativus | NC_007144 | Plader et al. (unpublished) |
| Eucalyptus globulus | NC_008115 | Steane (2005) |
| Glycine max | NC_007942 | Saski et al. (2005) |
| Gossypium hirsutum | NC_007944 | Lee et al. (2006) |
| Lotus corniculatus | NC_002694 | Kato et al. (2000) |
| Medicago truncatula | NC_003119 | Lin et al. (unpublished) |
| Nicotiana tabacum | NC_001879 | Shinozaki et al. (1986) |
| Oenothera elata | NC_002693 | Hupfer et al. (2000) |
| Panax schinseng | NC_006290 | Kim and Lee (2004) |
| Populus trichocarpa | NC_008235 | Tuskan et al. (2006) |
| Ranunculus macranthus | NC_008796 | Raubeson et al. (2007) |
| Solanum lycopersicum | DQ347959 | Daniell et al. (2006) |
| Solanum bulbocastanum | NC_007943 | Daniell et al. (2006) |
| Spinacia oleracea | NC_002202 | Schmitz-Linneweber et al. (2001) |
| Vitis vinifera | NC_007957 | Jansen et al. (2006) |

The plastid genomes of B. microphylla (GenBank Accession No. NC_009599), C. spicatus (GenBank Accession No. NC_009598), D. elephantipes (GenBank Accession No. NC_009601), and I. oligandrum (GenBank Accession No. NC_009600) are very similar in size, gene content, and gene order to the ancestral angiosperm genome represented by Amborella, Nuphar, and Nymphaea (see Table 1, Figs. 1–3 and Supplemental Fig. 1 for details). Genome sizes range from 148,552 bp in Illicium to 159,010 bp in Buxus, and most of this variation is due to expansion and contraction of the IR. All plastid genomes are A + T rich with an overall A + T content ranging from 61.0% (Illicium) to 62.9% (Chloranthus). The A + T percentages are higher in the SSC (65.1–68.9%) and LSC (62.9–66.1%) regions than the IR (53.8–58.6%). The genomes consist of 55.8% (Illicium) to 59.6% (Dioscorea) coding sequences (protein-coding genes, rRNA genes and tRNAs) and 40.4% (Dioscorea) to 44.2% (Illicium) non-coding regions (intergenic spacers and introns). The number of different genes in each of the four plastid genomes is 112–113 with 11–17 of these duplicated in the IR to give a total of 124 (Illicium) to 129 (Buxus, Chloranthus, and Dioscorea) genes (Table 1). The difference in the number of distinct and total genes in different genomes is the result of two processes. First, the loss of rps16 in Dioscorea results in a genome with only 112 different genes. Second, Illicium has five fewer total genes due to a large contraction of the IR (see details below). All genomes have 78 (Dioscorea) or 79 protein-coding genes, 3 (Illicium) to 5 (Buxus, Chloranthus, and Dioscorea) of which are duplicated in the IR. Three of the four genomes (Buxus, Chloranthus, and Illicium) have 18 genes with introns, 15 of which contain a single intron and two of which contain two introns (clpP, ycf3). Dioscorea has only 17 genes with introns due to the loss of the intron-containing gene rps16. The third gene with three exons, rps12, is trans-spliced; the 5′ exon is in the LSC and is separated from the two remaining exons in the IR by more than 30 kb.
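The gene totals quoted above follow from simple bookkeeping: each gene duplicated in the IR is counted twice, so total = different + duplicated in IR. The sketch below applies this to the counts reported in Table 1; the dictionary layout and function name are ours.

```python
# Gene counts copied from Table 1 (different genes, genes duplicated in the IR,
# and reported total number of genes).
GENE_COUNTS = {
    "Buxus":       {"different": 113, "duplicated_in_ir": 16, "total": 129},
    "Chloranthus": {"different": 113, "duplicated_in_ir": 16, "total": 129},
    "Dioscorea":   {"different": 112, "duplicated_in_ir": 17, "total": 129},
    "Illicium":    {"different": 113, "duplicated_in_ir": 11, "total": 124},
}


def total_genes(different: int, duplicated_in_ir: int) -> int:
    """Total gene count when every IR-duplicated gene is counted twice."""
    return different + duplicated_in_ir


if __name__ == "__main__":
    for taxon, row in GENE_COUNTS.items():
        computed = total_genes(row["different"], row["duplicated_in_ir"])
        status = "matches" if computed == row["total"] else "differs from"
        print(f"{taxon}: {computed} {status} the reported total of {row['total']}")
```

The same arithmetic shows why Dioscorea still reaches 129 despite lacking rps16: the loss of one distinct gene is offset by the extra IR copy of trnH-GUG, whereas Illicium drops to 124 because five genes left the IR.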

Fig. 1. Gene map of Buxus microphylla chloroplast genome. Thick lines represent the extent of the inverted repeats, which separate the large single-copy region (LSC) from the small single-copy region (SSC). Genes shown outside the circle are transcribed in the clockwise direction, and those on the inside of the circle are transcribed in the counter-clockwise direction. Genes containing introns have asterisks and are linked with black bars spanning the intron.

For the most part, gene order of the four plastid genomes reported here is identical to the ancestral angiosperm genome, represented by Amborella, Nuphar, and Nymphaea. In the case of Chloranthus (Supplemental Fig. 1) the gene order is identical, but the remaining three genomes have minor changes caused by expansion and contraction of the IR and a small inversion. In Dioscorea (Fig. 2), the IR has expanded at the IRa/LSC boundary to duplicate trnH-GUG, which results in trnH-GUG also occurring between rpl2 and rps19 on the IRb/LSC boundary. In Illicium (Fig. 3; see details below) the IR has contracted by approximately 10 kb on the IRb/LSC boundary. This has resulted in the loss of the duplicated copies of five genes (trnL-CAA, ycf2, trnI-CAU, rpl23, and rpl2) from the IR so that these genes now are single copy in the IRb/LSC boundary region. The Buxus genome (Fig. 1) has the same gene order as Chloranthus (Supplemental Fig. 1) except for an inversion of trnG-UCC, which is typically located adjacent to and on the same strand as trnR-UCU. Since inversions are known to be mediated by repeats (Palmer, 1991; Raubeson and Jansen, 2005), the intergenic spacer (IGS) sequences that flank the inverted tRNA were examined for repeated sequences six bp or larger using both REPuter (Kurtz et al., 2001) and pairwise BLAST analyses. No repeats were located in the IGS regions near this small inversion.
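The search for repeats of six bp or larger in the spacers flanking the inverted trnG-UCC was carried out with REPuter and pairwise BLAST. As a rough illustration of what such a search looks for, the sketch below scans two flanking sequences for shared 6-mers in direct and in inverted (reverse-complement) orientation. It is a naive k-mer comparison written for this example, not the algorithm used by REPuter or BLAST, and it applies no significance filter, so in long spacers it will report chance matches that those tools would discount.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set:
    """Return the set of all length-k substrings of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_repeats(left_flank: str, right_flank: str, k: int = 6) -> dict:
    """Find k-mers of the left flank that recur in the right flank.

    'direct' holds k-mers present in both flanks on the same strand;
    'inverted' holds k-mers of the left flank whose reverse complement
    occurs in the right flank.
    """
    left = kmers(left_flank, k)
    right = kmers(right_flank, k)
    direct = left & right
    inverted = left & {reverse_complement(kmer) for kmer in right}
    return {"direct": direct, "inverted": inverted}


if __name__ == "__main__":
    # Toy flanks, invented for illustration; not Buxus sequence.
    hits = shared_repeats("AAGATTACAGG", "CCTGTAATCCC")
    print(sorted(hits["direct"]))
    print(sorted(hits["inverted"]))
```

Inverted matches are the ones of interest here, since recombination between inverted repeats is the mechanism usually invoked for small inversions.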

3.2. Examination of inverted repeat boundary sequences in Illicium

Fig. 2. Gene map of Dioscorea elephantipes chloroplast genome. Thick lines represent the extent of the inverted repeats, which separate the large single-copy region (LSC) from the small single-copy region (SSC). Genes shown outside the circle are transcribed in the clockwise direction, and those on the inside of the circle are transcribed in the counter-clockwise direction. Genes containing introns have asterisks and are linked with black bars spanning the intron.

One way to determine the extent of the IR is to align the two sides and examine where they do and do not match. A perfect match defines the IR, and the point at which the two sides differ defines single-copy sequence. In most angiosperm plastid genomes, this point usually occurs before or within rps19 on the IRb/LSC side, and between rpl2 and trnH-GUG on the IRa/LSC side, with between 0 and 5 non-coding nucleotides between these two genes (Fig. 4). In Illicium, due to the 10 kb contraction of the IR, the boundary is between trnL-CAA and ndhB on the IRb/LSC side, and between ndhB and trnH-GUG on the IRa/LSC side. There are 500 nt of intergenic spacer between these two genes, and this non-coding DNA has a very high sequence similarity to the spacer region between trnL-CAA and ndhB. If the two sequences were identical, the IR would be 500 bp longer, but they differ just after ndhB, and thus are single copy. To judge the similarity between these two sequences against what we would expect from comparing random sequences, we sampled random 550-nt non-coding regions of single-copy chloroplast DNA of Illicium and five other angiosperms (Arabidopsis, Buxus, Calycanthus, Nicotiana, and Triticum). For each genome all the single-copy intergenic spacers were extracted and divided into 550-bp sequences (1–550, 551–1100, etc.). These sequences were renamed using random numbers, and 20 sequence pairs were matched according to their numerical order (e.g. 35 with 41, 594 with 621, etc.). After the sequence pairs were aligned using MacClade (version 4.08, Maddison and Maddison, 2005) they were trimmed to 500 bp to keep their length comparable to that of the Illicium sequences. Uncorrected pair-wise distances of each random pair were calculated using PAUP. The pair-wise distances between random spacer regions within a genome were then compared to the pair-wise distance between the two Illicium regions in question. There is an average pairwise distance of 0.510 between random pairs of single-copy intergenic spacer sequence, compared to just 0.09 between the Illicium sequences (Fig. 5). The most obvious explanation for this similarity is that the spacer region just outside the IRa/LSC boundary used to be in the IR and was identical to its mirror sequence between trnL-CAA and ndhB. At some point in the past, possibly due to errors during replication, this sequence was excluded from the IR and has now accumulated sequence variation since its release from the copy correction associated with IR sequence. Compared to the trnL-CAA-ndhB spacer sequence, there is an 8-nt deletion in the IRa/LSC spacer sequence at the point where IR stops and LSC begins.
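The random-pair comparison described above rests on two small operations: cutting spacers into 550-bp windows that are then paired at random, and computing an uncorrected pairwise distance (the proportion of differing sites) for each aligned pair. The sketch below illustrates both under our own assumptions. The pairs are taken to be already aligned, sites with a gap in either sequence are skipped (PAUP's handling of gaps may differ), and shuffling followed by pairing neighbours stands in for the renaming by random numbers used in the study. All function names are hypothetical.

```python
import random


def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Uncorrected pairwise distance between two aligned sequences.

    Returns the proportion of differing sites among aligned positions
    where neither sequence has a gap.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def chunk(seq: str, size: int = 550) -> list:
    """Split a spacer into consecutive full-length windows (1-550, 551-1100, ...)."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def random_pairs(chunks: list, n_pairs: int = 20, seed=None) -> list:
    """Shuffle the windows and pair neighbours, giving up to n_pairs pairs."""
    rng = random.Random(seed)
    shuffled = list(chunks)
    rng.shuffle(shuffled)
    available = min(n_pairs, len(shuffled) // 2)
    return [(shuffled[2 * i], shuffled[2 * i + 1]) for i in range(available)]


if __name__ == "__main__":
    # Toy sequences, invented for illustration.
    print(p_distance("ACGTACGT", "ACGTACGA"))
    print(len(chunk("A" * 1200)))
```

For scale, a distance of 0.09 over 500 aligned sites corresponds to about 45 differences, against roughly 255 for the random-pair average of 0.510.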

3.3. Phylogenetic analyses

We conducted three methods of phylogenetic analysis on the same 39-taxa, 61-gene dataset. The resulting trees were strikingly similar, but the maximum parsimony (MP) analysis differed from maximum likelihood analysis (ML) and Bayesian inference (BI) in its placement of Chloranthus and some relationships within rosids, albeit with low support.

In general, the topologies agreed, and there was higher bootstrap support in the ML trees than in the MP trees, and higher posterior probability scores for the BI trees than bootstrap support for the ML trees (Fig. 6). The trees provide support for the following relationships: Amborella alone is sister to the remaining angiosperms; Nymphaeales are the next most basal lineage; Illicium is sister to a clade that includes Chloranthus, magnoliids, monocots, and eudicots; the Chloranthaceae and magnoliids are sister to a large clade that includes both monocots and eudicots, which are both monophyletic; Ranunculus and Buxus are early-diverging lineages of eudicots; rosids and asterids are each monophyletic; Caryophyllales are sister to asterids; Vitis is the earliest diverging lineage of rosids; and the eurosid II clade is monophyletic.

Maximum parsimony analysis resulted in one tree of 67,299 steps, a consistency index of 0.390 (excluding uninformative characters), and a retention index of 0.562. The tree (not shown) was fully resolved. Both ML analyses resulted in one tree with −lnL of 370627.97769. Bootstrap analysis showed strong support (>95%) for 30/35 nodes (Fig. 6). The Bayesian MCMC runs resulted in four sets of 20,000 trees each. We used AWTY (Wilgenbusch et al., 2004) to analyze burn-in and convergence of the runs, and found that stability was reached by 400,000 generations (20% of the run). To be conservative, we

Fig. 3. Gene map of Illicium oligandrum chloroplast genome. Thick lines represent the extent of the inverted repeats, which separate the large single-copy region (LSC) from the small single-copy region (SSC). Genes shown outside the circle are transcribed in the clockwise direction, and those on the inside of the circle are transcribed in the counter-clockwise direction. Genes containing introns have asterisks and are linked with black bars spanning the intron.

Fig. 4. Comparison of IRa/LSC boundaries in angiosperms. The monocots sampled have unique non-coding sequence between the last gene of IRa and the first gene of the LSC, but most angiosperms have very little sequence in this area. In contrast, Illicium has 500 bases of non-coding sequence between ndhB (IRa) and trnH (LSC). In some genomes rps19 is single copy, and in some the IRb/LSC junction occurs within rps19, leaving a partial rps19 sequence at IRa.


discarded the first 30% of the runs (6000 saved trees from each run) as burn-in, leaving a total of 56,000 trees from which to estimate posterior credibility values. Trees from both Bayesian analyses using a single model for all 61 genes and five different models (Supplemental Table 1) were identical (compare Fig. 6 and Supplemental Fig. 2). The Bayesian consensus tree was congruent with the ML tree; however, both trees differed from the MP tree in two

Fig. 5. Comparison of single-copy sequence adjacent to IR in Illicium to random intergenic spacers (500 nt pairs) in Illicium and other genomes. A likely explanation for this unusual similarity in Illicium is that this 500-bp region represents sequence that used to be part of the IR, and now shows mutations accumulated since it was released from the constraints of copy correction that keep the IR intact.


ways. First, the ML and BI trees placed Chloranthus sister to the magnoliids with moderate or strong support (Fig. 6), whereas the MP tree placed Chloranthus sister to a clade that includes magnoliids, eudicots, and monocots. Second, ML and BI trees positioned the Cucumis/Eucalyptus/Oenothera clade sister to eurosids II with weak or strong support, whereas MP analyses placed this clade sister to eurosids I with weak support.
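The burn-in bookkeeping reported for the Bayesian runs in this section (four runs of 20,000 saved trees, with the first 30% of each discarded) can be checked with a short calculation. The function name `retained_trees` is a hypothetical helper introduced only for this sketch; the numbers passed to it are the ones stated in the text.

```python
def retained_trees(n_runs, trees_per_run, burnin_percent):
    """Return (trees discarded per run, total trees kept across all runs)."""
    discarded_per_run = trees_per_run * burnin_percent // 100
    kept_per_run = trees_per_run - discarded_per_run
    return discarded_per_run, kept_per_run * n_runs


# Values taken from the text: 4 runs, 20,000 saved trees each, 30% burn-in.
discarded, retained = retained_trees(n_runs=4, trees_per_run=20_000, burnin_percent=30)
print(discarded, retained)  # 6000 56000
```

The result matches the 6000 trees discarded per run and the 56,000 trees used to estimate posterior values.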

3.4. Tests of alternative topologies

We performed two sets of SH tests to test alternative tree topologies. First, although the MP, ML, and BI trees place Amborella sister to the remaining angiosperms with moderate to strong support (Fig. 6), previous phylogenetic studies suggested two alternative topologies, Nymphaeales (Nuphar and Nymphaea) or Amborella + Nymphaeales basal. The Nymphaeales basal hypothesis had a difference in −lnL of 15.09303 (370643.07071 versus 370627.97769 for the best estimate with Amborella basal) and was rejected with a p value of 0.019. The Amborella + Nymphaeales basal hypothesis could not be rejected because the difference in −lnL was only 6.732 (370634.70210 versus 370627.97769 for the best estimate, p = 0.241). Second, 10 alternative placements of Chloranthus were tested (Fig. 7a–j). Nine of these alternatives were rejected with p < 0.05 (Table 3 and Fig. 7a–i). Although the best estimates for both ML and BI analyses placed Chloranthus sister to the magnoliids with moderate to strong support (Figs. 6 and 7k), MP trees place it sister to a clade that included magnoliids, monocots and eudicots (Fig. 7j). This is also the only alternative topology that could not be rejected in the SH tests (Fig. 7 and Table 3).

4. Discussion

4.1. Chloroplast genome organization and evolution of angiosperms

The organization of the four new plastid genomes reported here is very similar to the ancestral angiosperm genome represented by Amborella, Nuphar, and Nymphaea. Thus, the genome organization of these earliest diverging angiosperms represents the ancestral condition, which has been widely conserved across most of the genomes sampled in the more derived lineages: magnoliids, monocots, and eudicots. Gene content and order are very similar with only a few relatively minor differences. In terms of gene content, Dioscorea has lost rps16 from the LSC (Fig. 2), a feature that is shared with several unrelated taxa, including the liverwort Marchantia (Ohyama et al., 1986), the gymnosperm Pinus (Tsudzuki et al., 1992), the two legumes Medicago (Saski et al., 2005) and Pisum (Nagano et al., 1991), and Populus (Tuskan et al., 2006). Among legumes, multiple independent losses of rps16 have been detected using a filter hybridization approach (Doyle et al., 1995). Thus, the loss of this gene has clearly occurred multiple times throughout the evolutionary history of land plants. With one exception, differences in gene order and orientation in the four genomes reported here are primarily due to expansion and contraction of the IR. The one exception is Buxus, which has a small inversion of trnG-UCC in the LSC region (Fig. 1). Single gene inversions of tRNAs in angiosperm chloroplast genomes are rare. The only other example we know about among angiosperms is the inversion of trnT-UGU in two species of Jasminum (Oleaceae, J. nudiflorum and J. mesnyi, Lee et al., 2007). In the Oleaceae, the change in strand of trnT-UGU is due to a small 219 bp inversion that is flanked by conserved 11 bp repeats that likely played a role in facilitating this rearrangement. No repeats were found flanking the trnG-UCC inversion in Buxus.

Chloroplast genome length is highly conserved across angiosperms (Palmer, 1991; Palmer and Stein, 1986; Raubeson and Jansen, 2005). This is especially true for the IR regions, which are thought to undergo more strict copy correction due to the base pairing of repeat sequences during replication. The absence of the IR in some legumes (Palmer et al., 1987b; Lavin et al., 1990) and its reduction to 495 bp in Pinus thunbergii (Wakasugi et al., 1994) indicate that it is not required for chloroplast function, at least in some lineages. However, it remains a virtually constant,

Fig. 6. Phylogenetic tree of 39-taxa dataset based on 61 protein-coding chloroplast genes. Tree shown is the topology from Bayesian analysis of 2 × 2 runs of 2.0 × 10^7 generations using a single model (GTR + G + I) for a concatenated data matrix of 61 gene sequences. The ML tree has a −lnL of 370627.97769. Numbers above (or below) branches indicate support from Bayesian, ML, and MP analysis. Where bootstrap support values are the same for ML and MP analyses, only one number is shown. Asterisk indicates <50% support in that analysis. L, Laurales; M, Magnoliales; C, Canellales; and P, Piperales.


Fig. 7. Eleven tree topologies for the placement of Chloranthales. Topologies (a–i) are from previous phylogenetic studies of angiosperms (see Table 3 for references). Topologies (j) and (k) are the maximum parsimony and maximum likelihood trees generated in this study. Values below the trees represent p values for SH tests comparing the best ML tree (k) with the other 10 alternative tree topologies (a–j).


stable feature of chloroplast genomes. Small expansions and contractions of <100 bp are not uncommon at IR/LSC and IR/SSC boundaries (Aii et al., 1997; Goulding et al., 1996), and expansions of >1000 bp have been reported in Nicotiana (Shen et al., 1982; Goulding et al., 1996), Pelargonium (Palmer et al., 1987a; Chumley et al., 2006), Mahonia (Kim and Jansen, 1994), and Fagopyrum (Aii et al., 1997) among others. In Dioscorea, expansion of the IR has occurred on the IRa/LSC boundary resulting in a duplicate copy of the trnH-GUG gene next to rps19 at the IRb/LSC boundary (Fig. 2). This expansion has also occurred in several other monocots, including Phalaenopsis (Chang et al., 2006), Acorus (Goremykin et al., 2005), and seven different genera of grasses (Saski et al., 2007), and two early-diverging angiosperm lineages, Drimys (Cai et al., 2006) and Nuphar (Raubeson et al., 2007). The phylogenetic distribution of this IR expansion suggests that it has occurred multiple times in angiosperms.

In contrast to IR expansion, evidence of significant IR contractions is rare. The reduction of IRa in Illicium represents a roughly 10,000 bp contraction. Some Apiaceae species have significantly reduced IR regions (Plunkett and Downie, 2000), such as the 4 kb contraction in coriander (R. Peery and L. Raubeson, personal communication), and several Cuscuta species have IR reductions of ~700 bp (loss of the rpl2 intron) to ~8 kb (loss of the duplicated copies of rpl2, rpl23 and trnI-CAU genes, and significant reduction in length of both ycf2 and ndhB genes; Bommer et al., 1992; Downie et al., 1991; Stefanovic and Olmstead, 2005). Along with Illicium, these are the only known significant (>100 bp) IR reductions sequenced to date. It is noteworthy that the IR reduction in Cuscuta reflexa at the LSC/IRa junction is similar to the reduction in Illicium. In C. reflexa, ycf2 is significantly reduced, and ndhB is reduced to a pseudogene, yet there is no indication that trnL-CAA (which is located between ycf2 and ndhB in most IR sequences) is absent. In Illicium, ycf2 and trnL-CAA are missing, and ndhB, although duplicated in both IRa and IRb, is reduced in size and may not be functional. Expansions in the IR have been correlated to the presence of multiple repeat sequences, and mechanisms for expansion have been suggested (Aii et al., 1997; Goulding et al., 1996; Palmer et al., 1987a). Examples of IR contractions are few, and to date we have none of the tell-tale repeats or other signs of the genome having changed, other than the obvious absence of a segment of DNA. Plunkett and Downie (2000) point out that junction shifts often occur adjacent to tRNA genes, and that recombination between the short repeats or poly(A) sections associated with tRNAs may set the stage for shifts in IR length. In Illicium the contraction is between trnH-GUG and ndhB. There are small poly(A) sequences remaining in the intergenic spacer between ndhB and trnL-CAA on the IRb side, and ndhB and trnH-GUG on the IRa side. It is possible that an indel caused a mismatch that resulted in the upstream sequence becoming single copy. It is also possible that the mechanism that resulted in the 10 kb contraction and this “left behind” sequence are related, but the mechanisms of IR contraction are not known. To our knowledge, this is the only known example of sequence that has been removed from the IR without being removed from the genome entirely.

Table 3
Summary of the results of the SH tests of alternative topologies for the placement of Chloranthaceae (see Fig. 7)

Tree   −lnL            Diff. from best −lnL   p Value   Reference(s) for tree
a      370679.43721     51.45953              0.000*    Qiu et al. (1999, 2005), Zanis et al. (2002)
b      370664.23964     46.26195              0.000*    Qiu et al. (2006)
c      370673.95279     45.97510              0.001*    Hilu et al. (2003)
d      370659.27756     31.29987              0.002*    NT^a
e      370672.38447     44.40678              0.002*    Qiu et al. (1999, 2005)
f      370673.94771     45.97002              0.002*    NT^a
g      370877.10979    249.13210              0.000*    Doyle and Endress (2000)
h      370679.12298     51.14529              0.000*    Qiu et al. (2000, 2005), Soltis et al. (1999)
i      370674.34241     46.36472              0.001*    Soltis et al. (2000)
j      370635.33541      7.35772              0.142     MP tree from this dataset
k      370627.97769                                     Best ML tree from this dataset

^a NT indicates that no tree with this topology has been published but that this alternative represents a minor modification of one of the other previously published placements of Chloranthaceae.
* Significant at the p < 0.05 level.
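The "Diff. from best −lnL" column of Table 3 is simply each topology's −lnL minus that of the best ML tree (k), so it can be recomputed directly. The sketch below does this with exact decimal arithmetic and lists any row whose printed difference departs from the recomputed one by more than a small tolerance. The names `TABLE3`, `BEST` and `inconsistent_rows`, and the tolerance of 0.0001, are assumptions made for illustration; the value for tree d is entered as 370659.27756 on the assumption that the doubled decimal point in the printed value is a typographical slip.

```python
from decimal import Decimal

# -lnL of the best ML tree (tree k in Table 3).
BEST = Decimal("370627.97769")

# (printed -lnL, printed difference from best), copied from Table 3.
TABLE3 = {
    "a": ("370679.43721", "51.45953"),
    "b": ("370664.23964", "46.26195"),
    "c": ("370673.95279", "45.97510"),
    "d": ("370659.27756", "31.29987"),  # assumed reading of the printed value
    "e": ("370672.38447", "44.40678"),
    "f": ("370673.94771", "45.97002"),
    "g": ("370877.10979", "249.13210"),
    "h": ("370679.12298", "51.14529"),
    "i": ("370674.34241", "46.36472"),
    "j": ("370635.33541", "7.35772"),
}


def inconsistent_rows(rows, best=BEST, tol=Decimal("0.0001")):
    """Return (tree, recomputed diff, printed diff) for rows that disagree."""
    flagged = []
    for tree, (neg_lnl, printed) in rows.items():
        recomputed = Decimal(neg_lnl) - best
        if abs(recomputed - Decimal(printed)) > tol:
            flagged.append((tree, recomputed, Decimal(printed)))
    return flagged


flagged = inconsistent_rows(TABLE3)
for tree, recomputed, printed in flagged:
    print(tree, recomputed, printed)
```

With the values as transcribed here, every row agrees to within rounding except row b, where the recomputed difference is 36.26195 against a printed 46.26195. Which of the two printed numbers in that row is off cannot be determined from the table alone.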

4.2. Phylogenetic relationships of angiosperms

Chloroplast sequences have been used alone and in conjunction with mitochondrial and nuclear gene sequences to provide valuable information regarding phylogenetic relationships among angiosperms (Chase et al., 1993; Soltis et al., 1999, 2000, 2007; Graham and Olmstead, 2000; Zanis et al., 2002, 2003; Hilu et al., 2003; Qiu et al., 1999, 2005, 2006). Whole chloroplast genome sequences are providing an unprecedented number of characters with which to infer phylogeny (Goremykin et al., 2003a,b, 2004, 2005; Leebens-Mack et al., 2005; Chang et al., 2006; Jansen et al., 2006; Lee et al., 2006; Cai et al., 2006), but incomplete taxon sampling remains an issue in resolving conflicting or weakly supported results (Soltis and Soltis, 2004a; Stefanovic et al., 2004; Jansen et al., 2006). A recent focused effort has increased the number of chloroplast genomes available for analysis, and as each new genome is sequenced, the inferences become more robust. A great deal of energy has been focused toward discerning the relationships among basal angiosperms (Barkman et al., 2000; Graham and Olmstead, 2000; Borsch et al., 2003; Aoki et al., 2004; Goremykin et al., 2003a; Leebens-Mack et al., 2005; Qiu et al., 2005, 2006; Soltis et al., 2007 to cite just a few). Following Soltis and Soltis (2004b), we use the term “basal angiosperm” to comprise magnoliids, Chloranthales, Austrobaileyales, Nymphaeales, and Amborella. Limited taxon sampling and use of inappropriate models have been cited for the conflicting results between and/or low support for relationships implied by the whole genome approach (Soltis and Soltis, 2004a,b; Stefanovic et al., 2004; Leebens-Mack et al., 2005; Lockhart and Penny, 2005; Chang et al., 2006; Goremykin et al., 2005; Jansen et al., 2006; Goremykin and Hellwig, 2006). Accordingly, workers continue to add taxa and to refine evolutionary models in an effort to converge on a unified and well-supported view of these relationships. Analyses using complete genomes are not short on characters, but the paucity of genomes available has made analyses susceptible to long-branch attraction (Soltis and Soltis, 2004a; Soltis et al., 2004; Stefanovic et al., 2004; Leebens-Mack et al., 2005). The addition of four new chloroplast genome sequences reported here enables us to infer relationships based on more taxa than has been previously possible. Illicium in particular, representing Austrobaileyales, allows us to


add a basal angiosperm to the mix. Chloranthus, representing the Chloranthales, enables an assessment of the placement of this problematic clade among basal angiosperms. The addition of Buxus and Dioscorea provides denser taxon sampling of early-diverging eudicots and monocots, respectively.

All three phylogenetic methods applied to the 39-genome dataset provide congruent results concerning relationships among the major angiosperm clades (Fig. 6). There is moderate to strong support for the position of Amborella as the sole sister group to the rest of angiosperms. Most recent studies agree that Amborella and the Nymphaeales represent the earliest diverging angiosperm lineages (Mathews and Donoghue, 1999; Barkman et al., 2000; Graham and Olmstead, 2000; Zanis et al., 2002, 2003; Borsch et al., 2003; Qiu et al., 1993, 1999, 2000, 2001; Savolainen et al., 2000; Leebens-Mack et al., 2005; Chang et al., 2006; Stefanovic et al., 2004; Soltis et al., 1999, 2000, 2007; Hilu et al., 2003). The most recent analyses based on nine genes from the plastid, mitochondrial, and nuclear genomes (Qiu et al., 2006) generated trees supporting both of these hypotheses depending on the phylogenetic method and the genes included. Plastid gene trees positioned Amborella sister to the remaining angiosperms, whereas mitochondrial gene trees supported the Amborella + Nymphaeales hypothesis. Moreover, parsimony trees supported the Amborella basal hypothesis and likelihood and Bayesian trees supported Amborella + Nymphaeales as the sister group to the rest of angiosperms. Several phylogenetic studies based on 61 protein-coding plastid genes (Leebens-Mack et al., 2005; Chang et al., 2006; Jansen et al., 2006; Ruhlman et al., 2006; Bausher et al., 2006) also showed support for these two different hypotheses depending on the method of phylogenetic analysis, with MP trees placing Amborella alone as the basalmost angiosperm with strong support and ML trees placing Amborella + Nymphaeales at the base with moderate support. The recent addition of three magnoliid genomes (Cai et al., 2006) resolved this incongruence between MP and ML trees and resulted in topologies with moderate or strong support for Amborella alone as the earliest diverging lineage of angiosperms. Our analyses that include four new plastid genomes from early-diverging angiosperm, eudicot and monocot lineages also generate trees with Amborella sister to all other angiosperms regardless of method of analysis. These data can reject one of the alternative hypotheses that positions the Nymphaeales as the basalmost angiosperm, but the Amborella + Nymphaeales basal hypothesis still cannot be rejected with SH tests even with these additional taxa (Fig. 7).

For many years Illicium was considered a member of the Illiciales and this order was included in the basal ANITA grade (Amborella, Nymphaeaceae, Illiciales, Trimeniaceae, Austrobaileyaceae, Qiu et al., 1999). The genus is currently considered to be a member of the Schisandraceae in the order Austrobaileyales by the Angiosperm Phylogeny Group (Bremer et al., 2003). Recent single and multiple gene trees position this order just internal to Amborella and Nymphaeales (reviewed in Soltis et al., 2005, see also Qiu et al., 2006; Soltis et al., 2007). Our phylogenetic analyses of the 61 chloroplast protein-coding genes concur with this placement with strong support (Fig. 6), thereby confirming the Austrobaileyales as one of the earliest diverging angiosperm lineages.

One of the most controversial remaining issues regarding relationships among angiosperms concerns the placement of the Chloranthales, which includes the single family Chloranthaceae. This family with four genera and approximately 75 extant species (Eklund et al., 2004) is regarded as one of several early-diverging lineages of angiosperms (reviewed in Soltis et al., 2005). There is an extensive fossil record for Chloranthaceae dating back to the early Cretaceous (Doyle et al., 2003; Eklund et al., 2004). From the fossil record it is clear that the Chloranthaceae was much more widespread and species-rich in the past, and that the family has experienced widespread extinction since its origin (Zhang and Renner, 2003). This may have contributed to the incredible morphological diversity among the four extant genera, which has made it difficult to resolve the phylogenetic placement of Chloranthaceae. The family has been placed in many different positions in phylogenetic trees based on morphology and single or multiple gene sequences (Chase et al., 1993; Mathews and Donoghue, 1999; Graham et al., 2000; Qiu et al., 2000, summarized in Fig. 7). This has resulted in the lack of an assignment of the order Chloranthales in the most recent Angiosperm Phylogeny Group classification (Bremer et al., 2003). Our phylogenetic estimates using 61 chloroplast genes and ML or BI methods place Chloranthaceae sister to the magnoliids with moderate or strong support (Figs. 6 and 7k). However, MP analyses provide weak support (55% bootstrap value) for the placement of Chloranthus sister to a clade that includes magnoliids, monocots and eudicots (Fig. 7j). Our SH tests of 10 alternative placements of Chloranthus (Fig. 7a–j and Table 3) rejected all alternatives except for the one topology obtained in the MP analyses. Thus, although we have not definitively resolved the placement of the Chloranthales, we have eliminated all but two of the previous placements of the Chloranthales: sister to magnoliids, or just internal to the Austrobaileyales (although support for this latter placement in the MP tree is very weak, with a bootstrap value of 55%).

The eudicots represent the largest clade of angiosperms with over 75% of the extant species (Soltis et al., 2005). Buxus (Buxaceae) is one of several families of basal eudicots that form a basal grade sister to core eudicots, which include the rosids and asterids. Among the basal eudicots, molecular phylogenies position the Ranunculales sister to all other eudicots but the relationships of the other basal eudicots remain controversial. Our phylogenetic analyses provide strong support of the position of Buxus as an early-diverging lineage of eudicots just internal to Ranunculus. This result is congruent with recent phylogenetic trees


based on multiple genes from the chloroplast, mitochondrial, and nuclear genomes (Soltis et al., 2000; Qiu et al., 2006). Complete chloroplast genome sequences of additional basal eudicot lineages are needed to resolve relationships among Buxaceae and the other five members of this basal grade.

Finally, the placement of Dioscorea has not been particularly controversial and our phylogenetic analyses (Fig. 6) agree with the relationship suggested in many recent molecular phylogenetic studies (Chase et al., 2005; Soltis et al., 2000). Dioscorea represents an early-diverging lineage of monocots, just inside of Acorus, which forms the sister lineage to all remaining monocots.

Acknowledgments

We thank Clinton Morse from the University of Connecticut greenhouses for providing leaf material of Buxus, Chloranthus, Dioscorea and Illicium, and Ruth Timme and two anonymous reviewers for critically reading an earlier version of the manuscript. This research was supported in part by a grant from the National Science Foundation (DEB 0120709) and the Sidney F. and Doris Blake Centennial Professorship in Systematic Botany to R.K.J. D.R.H. was supported by a graduate fellowship from the College of Natural Sciences and Z.C. was supported by a Pre-emptive graduate fellowship for the Graduate School. Part of this work was performed under the auspices of the US Department of Energy, Office of Biological and Environmental Research, by the University of California, Lawrence Berkeley National Laboratory, under Contract No. DE-AC02-05CH11231.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.ympev.2007.06.004.

References

Aii, J., Kishima, Y., Mikami, T., Adachi, T., 1997. Expansion of the IR in the chloroplast genomes of buckwheat species is due to incorporation of an SSC sequence that could be mediated by an inversion. Curr. Genet. 31, 276–279.

Aoki, S., Uehara, K., Imafuku, M., Hasebe, M., Ito, M., 2004. Phylogeny and divergence of basal angiosperms inferred from APETALA3- and PISTILLATA-like MADS-box genes. J. Plant Res. 117, 229–244.

Asano, T., Tsudzuki, T., Takahashi, S., Shimada, H., Kadowaki, K., 2004. Complete nucleotide sequence of the sugarcane (Saccharum officinarum) chloroplast genome: a comparative analysis of four monocot chloroplast genomes. DNA Res. 11, 93–99.

Barkman, T.J., Chenery, G., McNeal, J.R., Lyons-Weiler, J., Ellisens, W.J., Moore, G., Wolfe, A.D., dePamphilis, C.W., 2000. Independent and combined analyses of sequences from all three genomic compartments converge on the root of flowering plant phylogeny. Proc. Natl. Acad. Sci. USA 97, 13166–13171.

Bausher, M.G., Singh, N.D., Lee, S.-B., Jansen, R.K., Daniell, H., 2006. The complete chloroplast genome sequence of Citrus sinensis (L.) Osbeck var. ‘Ridge Pineapple’: organization and phylogenetic relationships to other angiosperms. BMC Plant Biol. 6, 21.

Bommer, D., Haberhausen, G., Zetsche, K., 1992. A large deletion in the plastid DNA of the holoparasitic flowering plant Cuscuta reflexa concerning two ribosomal proteins (rpl2, rpl23), one transfer RNA (trnI) and an ORF2280 homologue. Curr. Genet. 24, 171–176.

Borsch, T., Hilu, K.W., Quandt, D., Wilde, V., Neinhuis, C., Barthlott, W., 2003. Noncoding plastid trnT-trnF sequences reveal a well resolved phylogeny of basal angiosperms. J. Evol. Biol. 16, 558–576.

Bremer, B., Bremer, K., Chase, M.W., Reveal, J.L., Soltis, D.E., Soltis, P.S., Stevens, P.F., Anderberg, A.A., Fay, M.F., Goldblatt, P., Judd, W.S., Kallersjo, M., Karehed, J., Kron, K.A., Lundberg, J., Nickrent, D.L., Olmstead, R.G., Oxelman, B., Pires, J.C., Rodman, J.E., Rudall, P.J., Savolainen, V., Sytsma, K.J., van der Bank, M., Wurdack, K., Xiang, J.Q.Y., Zmarzty, S., 2003. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG II. Bot. J. Linn. Soc. 141, 399–436.

Cai, Z., Penaflor, C., Kuehl, J.V., Leebens-Mack, J., Carlson, J., dePamphilis, C.W., Jansen, R.K., 2006. Complete plastid genome sequences of Drimys, Liriodendron, and Piper: implications for the phylogeny of magnoliids. BMC Evol. Biol. 6, 77.

Chang, C.-C., Lin, H.-C., Lin, I.-P., Chow, T.-Y., Chen, H.-H., Chen, W.-H., Cheng, C.-H., Lin, C.-Y., Liu, S.-M., Chang, C.-C., Chaw, S.-M., 2006. The chloroplast genome of Phalaenopsis aphrodite (Orchidaceae): comparative analysis of evolutionary rate with that of grasses and its phylogenetic implications. Mol. Biol. Evol. 23, 279–291.

Chase, M.W., Soltis, D.E., Olmstead, R., Morgan, D., Les, D., Mishler, B., Duvall, M., Price, R., Hills, H., Qui, Y.-L., Kron, K., Rettig, J., Conti, E., Palmer, J., Manhart, J., Sytsma, K., Michaels, H., Kress, J., Karol, K., Clark, D., Hedren, M., Gaut, B., Jansen, R.K., Kim, K.-J., Wimpee, C., Smith, J., Furnier, G., Straus, S., Xiang, Q.-Y., Plunkett, G., Soltis, P.S., Swensen, S., Williams, S., Gadek, P., Quinn, C., Equiarte, L., Golenberg, E., Learn Jr., G., Graham, S., Barrett, S., Dayanandan, S., Albert, V., 1993. Phylogenetics of seed plants: an analysis of nucleotide sequences from the plastid gene rbcL. Ann. Mo. Bot. Gard. 80, 528–580.

Chase, M.W., Fay, M.F., Devey, D., Maurin, O., Rønsted, N., Davies, J., Pillon, Y., Petersen, G., Seberg, O., Tamura, M.N., Asmussen, C.B., Hilu, K., Borsch, T., Davis, J.I., Stevenson, D.W., Pires, J.C., Givnish, T.J., Sytsma, K.J., McPherson, M.A., Graham, S.W., Rai, H.S., 2005. Multigene analyses of monocot relationships: a summary. Aliso 22, 63–75.

Chumley, T.W., Palmer, J.D., Mower, J.P., Fourcade, H.M., Calie, P.J., Boore, J.L., Jansen, R.K., 2006. The complete chloroplast genome sequence of Pelargonium × hortorum: organization and evolution of the largest and most highly rearranged chloroplast genome of land plants. Mol. Biol. Evol. 23, 2175–2190.

Cui, L., Veeraraghavan, N., Richer, A., Wall, K., Jansen, R.K., Leebens-Mack, J., Makalowska, I., dePamphilis, C.W., 2006. ChloroplastDB: the chloroplast genome database. Nucleic Acids Res. 34, D692–D696.

Daniell, H., Lee, S.-B., Grevich, J., Saksi, C., Quesada-Vargas, T., Guda, C., Tomkins, J., Jansen, R.K., 2006. Complete chloroplast genome sequences of Solanum bulbocastanum, Solanum lycopersicum and comparative analyses with other Solanaceae genomes. Theor. Appl. Genet. 112, 1503–1518.

Downie, S.R., Olmstead, R.G., Zurawski, G., Soltis, D.E., Soltis, P.S., Watson, J.C., Palmer, J.D., 1991. Six independent losses of the chloroplast DNA rpl2 intron in dicotyledons: molecular and phylogenetic implications. Evolution 45, 1245–1259.

Doyle, J.A., Eklund, H., Herendeen, P.S., 2003. Floral evolution in Chloranthaceae: implications of a morphological phylogenetic analysis. Int. J. Plant Sci. 164, S365–S382.

Doyle, J.A., Endress, P.K., 2000. Morphological phylogenetic analysis of basal angiosperms: comparison and combination with molecular data. Int. J. Plant Sci. 161, S121–S153.

Doyle, J.J., Doyle, J.L., Palmer, J.D., 1995. Multiple independent losses of two genes and one intron from legume chloroplast genomes. Syst. Bot. 20, 272–294.


Eklund, H., Doyle, J.A., Herendeen, P.S., 2004. Morphological phylogenetic analysis of living and fossil Chloranthaceae. Int. J. Plant Sci. 165, 107–151.

Edgar, R.C., 2004. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinform. 5, 1–19.

Ewing, B., Green, P., 1998. Base-calling of automated sequencer traces using phred. II. Error probabilities. Gen. Res. 8, 186–194.

Felsenstein, J., 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39, 783–791.

Goldman, N., Anderson, J.P., Rodrigo, A.G., 2000. Likelihood-based tests of topologies in phylogenetics. Syst. Biol. 49, 652–670.

Gordon, D., Abajian, C., Green, P., 1998. Consed: a graphical tool for sequence finishing. Gen. Res. 8, 195–202.

Goremykin, V., Hirsch-Ernst, K.I., Wolfl, S., Hellwig, F.H., 2003a. The chloroplast genome of the “basal” angiosperm Calycanthus fertilis – structural and phylogenetic analyses. Plant Syst. Evol. 242, 119–135.

Goremykin, V.V., Hirsch-Ernst, K.I., Wolfl, S., Hellwig, F.H., 2003b. Analysis of the Amborella trichopoda chloroplast genome sequence suggests that Amborella is not a basal angiosperm. Mol. Biol. Evol. 20, 1499–1505.

Goremykin, V.V., Hirsch-Ernst, K.I., Wolfl, S., Hellwig, F.H., 2004. The chloroplast genome of Nymphaea alba: whole-genome analyses and the problem of identifying the most basal angiosperm. Mol. Biol. Evol. 21, 1445–1454.

Goremykin, V.V., Holland, B., Hirsch-Ernst, K.I., Hellwig, F.H., 2005. Analysis of Acorus calamus chloroplast genome and its phylogenetic implications. Mol. Biol. Evol. 22, 1813–1822.

Goremykin, V.V., Hellwig, F.H., 2006. A new test of phylogenetic model fitness addresses the issue of the basal angiosperm phylogeny. Gene 381, 81–91.

Goulding, S.E., Olmstead, R.G., Morden, C.W., Wolfe, K.H., 1996. Ebb and flow of the chloroplast inverted repeat. Mol. Gen. Genet. 252, 195–206.

Graham, S.W., Olmstead, R.G., 2000. Utility of 17 chloroplast genes forinferring the phylogeny of the basal angiosperms. Am. J. Bot. 87,1712–1730.

Graham, S.W., Reeves, P.A., Burns, A.C.E., Olmstead, R.G., 2000.Microstructural changes in noncoding chloroplast DNA: interpreta-tion, evolution, and utility of indels and inversions in basal angiospermphylogenetic inference. Int. J. Plant Sci. 161, S83–S96.

Hamby, R.K., Zimmer, E.A., 1992. Ribosomal RNA as a phylogenetictool in plant systematics. In: Soltis, P.S., Soltis, D.E., Doyle, J.J.(Eds.), Molecular Systematics of Plants. Chapman and Hall, NewYork, pp. 50–91.

Hilu, K.W., Borsch, T., Muller, K., Soltis, D.E., Soltis, P.S., Savolainen, V., Chase, M., Powell, M., Alice, L., Evans, R., Sauquet, H., Neinhuis, C., Slotta, T., Rohwer, J., Chatrou, L., 2003. Inference of angiosperm phylogeny based on matK sequence information. Am. J. Bot. 90, 1758–1776.

Hiratsuka, J., Shimada, H., Whittier, R., Ishibashi, T., Sakamoto, M., Mori, M., Kondo, C., Honji, Y., Sun, C.R., Meng, B.Y., Li, Y.Q., Kanno, A., Nishizawa, Y., Hirai, A., Shinozaki, K., Sugiura, M., 1989. The complete sequence of the rice (Oryza sativa) chloroplast genome: intermolecular recombination between distinct tRNA genes accounts for a major plastid DNA inversion during the evolution of the cereals. Mol. Gen. Genet. 217, 185–194.

Huelsenbeck, J.P., Ronquist, F., 2001. MRBAYES: Bayesian inference of phylogeny. Bioinformatics 17, 754–755.

Hupfer, H., Swaitek, M., Hornung, S., Herrmann, R.G., Maier, R.M., Chiu, W.L., Sears, B., 2000. Complete nucleotide sequence of the Oenothera elata plastid chromosome, representing plastome 1 of the five distinguishable Euoenothera plastomes. Mol. Gen. Genet. 263, 581–585.

Jansen, R.K., Kaittanis, C., Saski, C., Lee, S.B., Tomkins, J., Alverson, A.J., Daniell, H., 2006. Phylogenetic analyses of Vitis (Vitaceae) based on complete chloroplast genome sequences: effects of taxon sampling and phylogenetic methods on resolving relationships among rosids. BMC Evol. Biol. 6, 32.

Jansen, R.K., Raubeson, L.A., Boore, J.L., dePamphilis, C.W., Chumley, T.W., Haberle, R.C., Wyman, S.K., Alverson, A.J., Peery, R., Herman, S.J., Fourcade, H.M., Kuehl, J.V., McNeal, J.R., Leebens-Mack, J., Cui, L., 2005. Methods for obtaining and analyzing chloroplast genome sequences. Methods Enzymol. 395, 348–384.

Kato, T., Kaneko, T., Sato, S., Nakamura, Y., Tabata, S., 2000. Complete structure of the chloroplast genome of a legume, Lotus japonicus. DNA Res. 7, 323–330.

Kim, K.-J., Lee, H.-L., 2004. Complete chloroplast genome sequence from Korean Ginseng (Panax schiseng Nees) and comparative analysis of sequence evolution among 17 vascular plants. DNA Res. 11, 247–261.

Kim, Y.D., Jansen, R.K., 1994. Characterization and phylogenetic distribution of chloroplast DNA rearrangement in the Berberidaceae. Plant Syst. Evol. 190, 157–185.

Kurtz, S., Choudhuri, J.V., Ohlebusch, E., Schleiermacher, C., Stoye, J., Giegerich, R., 2001. REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res. 29, 4633–4642.

Lavin, M., Doyle, J.J., Palmer, J.D., 1990. Systematic and evolutionary significance of the loss of the large chloroplast DNA inverted repeat in the family Leguminosae. Evolution 44, 390–402.

Lee, H.-L., Jansen, R.K., Chumley, T.W., Kim, K.-J., 2007. Apparent gene translocations within chloroplast genomes of Jasminum and Menodora (Oleaceae) are due to multiple, overlapping inversions. Mol. Biol. Evol. 24, 1161–1180.

Lee, S.B., Kaittanis, C., Jansen, R.K., Hostetler, J.B., Tallon, L.J., Town, C.D., Daniell, H., 2006. The complete chloroplast genome sequence of Gossypium hirsutum: organization and phylogenetic relationships to other angiosperms. BMC Genomics 7, 61.

Leebens-Mack, J., Raubeson, L.A., Cui, L.Y., Kuehl, J.V., Fourcade, M.H., Chumley, T.W., Boore, J.L., Jansen, R.K., dePamphilis, C.W., 2005. Identifying the basal angiosperm node in chloroplast genome phylogenies: sampling one’s way out of the Felsenstein zone. Mol. Biol. Evol. 22, 1948–1963.

Lockhart, P.J., Penny, D., 2005. The place of Amborella within the radiation of angiosperms. Trends Plant Sci. 10, 201–202.

Maddison, D.R., Maddison, W.P., 2005. MacClade. Version 4.08. Sinauer Associates, Sunderland, Massachusetts.

Maier, R.M., Neckermann, K., Igloi, G.L., Kossel, H., 1995. Complete sequence of the maize chloroplast genome: gene content, hotspots of divergence and fine tuning of genetic information by transcript editing. J. Mol. Biol. 251, 614–628.

Mathews, S., Donoghue, M.J., 1999. The root of angiosperm phylogeny inferred from duplicate phytochrome genes. Science 286, 947–950.

Mathews, S., Donoghue, M.J., 2000. Basal angiosperm phylogeny inferred from duplicate phytochromes A and C. Int. J. Plant Sci. 161, S41–S55.

McNeal, J.R., Leebens-Mack, J.H., dePamphilis, C.W., 2006. Utilization of fosmid partial genomic libraries for sequencing complete organellar genomes. Biotechniques 41, 69–73.

Moore, M.J., Dhingra, A., Soltis, P., Shaw, R., Farmerie, W.G., Folta, K.M., Soltis, D.E., 2006. Rapid and accurate pyrosequencing of angiosperm plastid genomes. BMC Plant Biol. 6, 17.

Nagano, Y., Matsuno, R., Sasaki, Y., 1991. Sequence and transcriptional analysis of the gene cluster trnQ-zfpA-psaI-ORF231-petA in pea chloroplasts. Curr. Genet. 20, 431–436.

Ogihara, Y., Isono, K., Kojima, T., Endo, A., Hanaoka, M., Shiina, T., Terachi, T., Utsugi, S., Murata, M., Mori, N., Takumi, S., Ikeo, K., Gojobori, T., Murai, R., Murai, K., Matsuoka, Y., Ohnishi, Y., Tajiri, H., Tsuenewaki, K., 2000. Chinese spring wheat (Triticum aestivum L.) chloroplast genome: complete sequence and contig clones. Plant Mol. Biol. Rep. 18, 243–253.

Ohyama, K., Fukuzawa, H., Kohchi, T., Shirai, H., Sano, T., Sano, S., Umesono, K., Shiki, Y., Takeuchi, M., Chang, Z., Aota, S., Inokuchi, H., Ozeki, H., 1986. Chloroplast gene organization deduced from complete sequence of liverwort Marchantia polymorpha chloroplast DNA. Nature 322, 572–574.

Palmer, J.D., 1986. Isolation and structural analysis of chloroplast DNA. Methods Enzymol. 118, 167–186.

562 D.R. Hansen et al. / Molecular Phylogenetics and Evolution 45 (2007) 547–563

Palmer, J.D., 1991. Plastid chromosomes: structure and evolution. In: Bogorad, L., Vasil, I. (Eds.), Cell Culture and Somatic Cell Genetics of Plants. Academic Press, San Diego, California, USA, pp. 5–53.

Palmer, J.D., Thompson, W.F., 1982. Chloroplast DNA rearrangements are more frequent when a large inverted repeat sequence is lost. Cell 29, 537–550.

Palmer, J.D., Nugent, J.M., Herbon, L.A., 1987a. Unusual structure of Geranium chloroplast DNA – a triple-sized inverted repeat, extensive gene duplications, multiple inversions, and two repeat families. Proc. Natl. Acad. Sci. USA 84, 769–773.

Palmer, J.D., Osorio, B., Aldrich, J., Thompson, W.F., 1987b. Chloroplast DNA evolution among legumes – loss of a large inverted repeat occurred prior to other sequence rearrangements. Curr. Genet. 11, 275–286.

Palmer, J.D., Stein, D.B., 1986. Conservation of chloroplast genome structure among vascular plants. Curr. Genet. 10, 823–833.

Plunkett, G.M., Downie, S.R., 2000. Expansion and contraction of the chloroplast inverted repeat in Apiaceae subfamily Apioideae. Syst. Bot. 25, 648–667.

Posada, D., Crandall, K.A., 1998. Modeltest: testing the model of DNA substitution. Bioinformatics 14, 817–818.

Price, R.A., Calie, P.J., Downie, S.R., Logsdon, J.M., Palmer, J.D., 1990. Chloroplast DNA variation in the Geraniaceae – a preliminary report. In: Vorster, P. (Ed.), The International Geraniaceae Symposium. The University of Stellenbosch, Republic of South Africa.

Qiu, Y.-L., Chase, M.W., Les, D.H., Parks, C.R., 1993. Molecular phylogenetics of the Magnoliidae: cladistic analyses of nucleotide sequences of the plastid gene rbcL. Ann. Mo. Bot. Gard. 80, 587–606.

Qiu, Y.-L., Lee, J., Bernasconi-Quadroni, F., Soltis, D.E., Soltis, P.S., Zanis, M., Zimmer, E.A., Chen, Z., Savolainen, V., Chase, M.W., 2000. Phylogeny of basal angiosperms: analyses of five genes from three genomes. Int. J. Plant Sci. 161, S3–S27.

Qiu, Y.L., Lee, J., Bernasconi-Quadroni, F., Soltis, D.E., Soltis, P.S., Zanis, M., Zimmer, E.A., Chen, Z.D., Savolainen, V., Chase, M.W., 1999. The earliest angiosperms: evidence from mitochondrial, plastid and nuclear genomes. Nature 402, 404–407.

Qiu, Y.-L., Lee, J., Whitlock, B.A., Bernasconi-Quadroni, F., Dombrovska, O., 2001. Was the ANITA rooting of the angiosperm phylogeny affected by long branch attraction? Mol. Biol. Evol. 18, 1745–1753.

Qiu, Y.L., Dombrovska, O., Lee, J., Li, L.B., Whitlock, B.A., Bernasconi-Quadroni, F., Rest, J.S., Davis, C.C., Borsch, T., Hilu, K.W., Renner, S.S., Soltis, D.E., Soltis, P.S., Zanis, M.J., Cannone, J.J., Gutell, R.R., Powell, M., Savolainen, V., Chatrou, L.W., Chase, M.W., 2005. Phylogenetic analyses of basal angiosperms based on nine plastid, mitochondrial, and nuclear genes. Int. J. Plant Sci. 166, 815–842.

Qiu, Y.-L., Li, L., Hendry, T., Li, R., Taylor, D.W., Issa, M.J., Ronen, A.J., Vekaria, M.L., White, A.M., 2006. Reconstructing the basal angiosperm phylogeny: evaluating information content of the mitochondrial genes. Taxon 55, 837–856.

Raubeson, L.A., Jansen, R.K., 2005. Chloroplast genomes of plants. In: Henry, R. (Ed.), Diversity and Evolution of Plants – Genotypic and Phenotypic Variation in Higher Plants. CABI Publishing, UK, pp. 45–68.

Raubeson, L.A., Peery, R., Chumley, T.W., Dziubek, C., Fourcade, H.M., Boore, J.L., Jansen, R.K., 2007. Comparative chloroplast genomics: analyses including new sequences from the angiosperms Nuphar advena and Ranunculus macranthus. BMC Genomics 8, 174.

Ruhlman, T., Lee, S.-B., Jansen, R.K., Hostetler, J.B., Tallon, L.J., Town, C.D., Daniell, H., 2006. Complete chloroplast genome sequence of Daucus carota: implications for biotechnology and phylogeny of angiosperms. BMC Genomics 7, 222.

Saski, C., Lee, S.-B., Daniell, H., Wood, T.C., Tomkins, J., Kim, H.-G., Jansen, R.K., 2005. Complete chloroplast genome sequence of Glycine max and comparative analyses with other legume genomes. Plant Mol. Biol. 59, 309–322.

Saski, C., Lee, S.-B., Fjellheim, S., Guda, C., Jansen, R.K., Luo, H., Tomkins, J., Rognli, O.A., Daniell, H., Clarke, J.L., 2007. Complete chloroplast genome sequences of Hordeum vulgare, Sorghum bicolor and Agrostis stolonifera and comparative analyses with other grass genomes. Theor. Appl. Genet. doi:10.1007/s00122-007-0567-4.

Sato, S., Nakamura, Y., Kaneko, T., Asamizu, E., Tabata, S., 1999. Complete structure of the chloroplast genome of Arabidopsis thaliana. DNA Res. 6, 283–290.

Savolainen, V., Chase, M.W., Morton, C.M., Soltis, D.E., Bayer, C., Fay, M.F., De Bruijn, A., Sullivan, S., Qiu, Y.-L., 2000. Phylogenetics of flowering plants based upon a combined analysis of plastid atpB and rbcL gene sequences. Syst. Biol. 49, 306–362.

Schmitz-Linneweber, C., Maier, R.M., Alcaraz, J.P., Cottet, A., Herrmann, R.G., Mache, R., 2001. The plastid chromosome of spinach (Spinacia oleracea): complete nucleotide sequence and gene organization. Plant Mol. Biol. 45, 307–315.

Schmitz-Linneweber, C., Regel, R., Du, T.G., Hupfer, H., Herrmann, R.G., Maier, R.M., 2002. The plastid chromosome of Atropa belladonna and its comparison with that of Nicotiana tabacum: the role of RNA editing in generating divergence in the process of plant speciation. Mol. Biol. Evol. 19, 1602–1612.

Shen, G.F., Chen, K., Wu, M., Kung, S.D., 1982. Nicotiana chloroplast genome IV. N. accuminata has larger inverted repeats and genome size. Mol. Gen. Genet. 187, 12–18.

Shimodaira, H., Hasegawa, M., 1999. Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Mol. Biol. Evol. 16, 1114–1116.

Shinozaki, K., Ohme, M., Tanaka, M., Wakasugi, T., Hayashida, N., Matsubayashi, T., Zaita, N., Chunwongse, J., Obokata, J., Yamaguchi-Shinozaki, K., Ohto, C., Torazawa, K., Meng, B.Y., Sugita, M., Deno, H., Kamogashira, T., Yamada, K., Kusuda, J., Takaiwa, F., Kato, A., Tohdoh, N., Shimada, H., Sugiura, M., 1986. The complete nucleotide sequence of the tobacco chloroplast genome: its gene organization and expression. EMBO J. 5, 2043–2049.

Soltis, D.E., Soltis, P.S., 2004a. Amborella not a “basal angiosperm”? Not so fast. Am. J. Bot. 91, 997–1001.

Soltis, P.S., Soltis, D.E., 2004b. The origin and diversification of angiosperms. Am. J. Bot. 91, 1614–1626.

Soltis, P.S., Soltis, D.E., Chase, M.W., 1999. Angiosperm phylogeny inferred from multiple genes as a tool for comparative biology. Nature 402, 402–404.

Soltis, D.E., Soltis, P.S., Chase, M.W., Mort, M.W., Albach, D.C., Zanis, M., Savolainen, V., Hahn, W.H., Hoot, S.B., Fay, M.F., Axtell, M., Swensen, S.M., Prince, L.M., Kress, W.J., Nixon, K.J., Farris, J.S., 2000. Angiosperm phylogeny inferred from 18S rDNA, rbcL, and atpB sequences. Bot. J. Linn. Soc. 133, 381–461.

Soltis, D.E., Albert, V.A., Savolainen, V., Hilu, K., Qiu, Y.L., Chase, M.W., Farris, J.S., Stefanovic, S., Rice, D.W., Palmer, J.D., Soltis, P.S., 2004. Genome-scale data, angiosperm relationships, and ‘ending incongruence’: a cautionary tale in phylogenetics. Trends Plant Sci. 9, 477–483.

Soltis, P.S., Endress, P.K., Chase, M.W., Soltis, D.E., 2005. Phylogeny and Evolution of Angiosperms. Sinauer Associates, Inc., Sunderland, MA, USA.

Soltis, D.E., Gitzendanner, M.A., Soltis, P.S., 2007. A 567-taxon data set for angiosperms: the challenge posed by Bayesian analyses of large data sets. Int. J. Plant Sci. 168, 137–157.

Steane, D.A., 2005. Complete nucleotide sequence of the chloroplast genome from the Tasmanian blue gum, Eucalyptus globulus (Myrtaceae). DNA Res. 12, 215–220.

Stefanovic, S., Olmstead, R.G., 2005. Down the slippery slope: plastid genome evolution in Convolvulaceae. J. Mol. Evol. 61, 292–305.

Stefanovic, S., Rice, D.W., Palmer, J.D., 2004. Long branch attraction, taxon sampling, and the earliest angiosperms: Amborella or monocots? BMC Evol. Biol. 4, 35.

Sugiura, M., 1992. The Chloroplast Genome. Plant Mol. Biol. 19, 149–168.

Swofford, D.L., 2003. PAUP*. Phylogenetic analysis using parsimony (*and other methods). Version 4.10. Sinauer Associates, Sunderland, Massachusetts.

Tsudzuki, J., Nakashima, K., Tsudzuki, T., Hiratsuka, J., Shibata, M., Wakasugi, T., Sugiura, M., 1992. Chloroplast DNA of black pine retains a residual inverted repeat lacking rRNA genes: nucleotide sequences of trnQ, trnK, psbA, trnI and trnH and the absence of rps16. Mol. Gen. Genet. 232, 206–214.

Tuskan, G.A., Difazio, S., Jansson, S., Bohlmann, J., Grigoriev, I., Hellsten, U., Putnam, N., Ralph, S., Rombauts, S., Salamov, A., Schein, J., Sterck, L., Aerts, A., Bhalerao, R.R., Bhalerao, R.P., Blaudez, D., Boerjan, W., Brun, A., Brunner, A., Busov, V., Campbell, M., Carlson, J., Chalot, M., Chapman, J., Chen, G.L., Cooper, D., Coutinho, P.M., Couturier, J., Covert, S., Cronk, Q., Cunningham, R., Davis, J., Degroeve, S., Dejardin, A., de Pamphilis, C., Detter, J., Dirks, B., Dubchak, I., Duplessis, S., Ehlting, J., Ellis, B., Gendler, K., Goodstein, D., Gribskov, M., Grimwood, J., Groover, A., Gunter, L., Hamberger, B., Heinze, B., Helariutta, Y., Henrissat, B., Holligan, D., Holt, R., Huang, W., Islam-Faridi, N., Jones, S., Jones-Rhoades, M., Jorgensen, R., Joshi, C., Kangasjarvi, J., Karlsson, J., Kelleher, C., Kirkpatrick, R., Kirst, M., Kohler, A., Kalluri, U., Larimer, F., Leebens-Mack, J., Leple, J.C., Locascio, P., Lou, Y., Lucas, S., Martin, F., Montanini, B., Napoli, C., Nelson, D.R., Nelson, C., Nieminen, K., Nilsson, O., Pereda, V., Peter, G., Philippe, R., Pilate, G., Poliakov, A., Razumovskaya, J., Richardson, P., Rinaldi, C., Ritland, K., Rouze, P., Ryaboy, D., Schmutz, J., Schrader, J., Segerman, B., Shin, H., Siddiqui, A., Sterky, F., Terry, A., Tsai, C.J., Uberbacher, E., Unneberg, P., Vahala, J., Wall, K., Wessler, S., Yang, G., Yin, T., Douglas, C., Marra, M., Sandberg, G., Van de Peer, Y., Rokhsar, D., 2006. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 313, 1596–1604.

Wakasugi, T., Tsudzuki, J., Ito, S., Nakashima, K., Tsudzuki, T., Sugiura, M., 1994. Loss of all ndh genes as determined by sequencing the entire chloroplast genome of the black pine Pinus thunbergii. Proc. Natl. Acad. Sci. USA 91, 9794–9798.

Wilgenbusch, J.C., Warren, D.L., Swofford, D.L., 2004. AWTY: a system for graphical exploration of MCMC convergence in Bayesian phylogenetic inference. <http://ceb.csit.fsu.edu/awty>.

Wolfe, K.H., 1988. The site of deletion of the inverted repeat in pea chloroplast DNA contains duplicated gene fragments. Curr. Genet. 13, 97–99.

Wu, C.-S., Wang, Y.-N., Liu, S.-M., Chaw, S.-M., 2007. Chloroplast genome (cpDNA) of Cycas taitungensis and 56 cp protein-coding genes of Gnetum parviflorum: insights into cpDNA evolution and phylogeny of extant seed plants. Mol. Biol. Evol. 24, 1366–1379.

Wyman, S.K., Jansen, R.K., Boore, J.L., 2004. Automatic annotation of organellar genomes with DOGMA. Bioinformatics 20, 3252–3255.

Zanis, M.J., Soltis, D.E., Soltis, P.S., Mathews, S., Donoghue, M.J., 2002. The root of the angiosperms revisited. Proc. Natl. Acad. Sci. USA 99, 6848–6853.

Zanis, M.J., Soltis, P.S., Qiu, Y.-L., Zimmer, E.A., Soltis, D.E., 2003. Phylogenetic analyses and perianth evolution in basal angiosperms. Ann. Mo. Bot. Gard. 90, 129–150.

Zhang, L.B., Renner, S., 2003. The deepest splits in Chloranthaceae as resolved by chloroplast sequences. Int. J. Plant Sci. 164, S383–S392.

Zwickl, D.J., 2006. GARLI (Genetic Algorithm for Rapid Likelihood Inference). Version 0.942 (www.bio.utexas.edu/faculty/antisense/garli/Garli.html).