
Upload: john-dunbar

Post on 10-Mar-2016


Reconstructing the tree of life using molecular sequence data

John Dunbar SN: 69789839

Abstract:

The Tree of Life (ToL) and the evolutionary histories of species have proved to be an extremely complex field of study. There has been much debate on how the universal ToL stands. Traditional taxonomic methods of identifying shared morphological characteristics have been beneficial in constructing phylogenetic trees and inferring the ToL. However, the arrival of molecular sequence data in the 1960s revolutionised phylogenetics and has enabled us to resolve many relationships between species. Although horizontal gene transfer (HGT) has made interpreting molecular sequence data very difficult, good computational analysis has made inferring the deepest roots of the ToL more feasible.


Table of Contents

Introduction
Traditional & Phylogenetic classification
Molecular sequencing and the impact of Horizontal Gene Transfer (HGT) & Genome Fusion
DNA Barcoding
Conclusion
References


Introduction

In “On the Origin of Species”, Darwin (1859) depicted a tree-like diagram in an attempt to illustrate the relatedness and histories of all living organisms and to demonstrate that all life descended from a common ancestor. This is known as the Tree of Life (ToL). In doing so, Darwin opened up a new world for exploration, and a century and a half later, with the aid of traditional taxonomic methods and molecular sequence data, many phylogenetic trees have been established. However, the ToL and the evolutionary histories of species have proved extremely complex, and there has been much debate on how the universal ToL stands.

Traditional & Phylogenetic classification

Traditional taxonomy is founded on the classification, identification, and naming of organisms based on shared morphological characteristics. The system of naming and grouping organisms was established by Carolus Linnaeus in 1735 when he published his book “Systema Naturae” (Futuyma, 2009). The rationale at the time was to develop an understanding of the scale of life in the hope of revealing God’s plan of creation (Understanding Evolution, 2011). This system consists of first assigning a species a two-part name, known as binomial nomenclature, i.e. a genus name followed by a species name. Secondly, species are grouped into larger groups known as taxonomic levels, which are categorised, starting with the highest level, as Kingdom, Phylum, Class, Order, Family, Genus, and Species. This is known as hierarchical classification (Futuyma, 2009). However, the Linnaean system does not identify an organism’s position on the ToL because it ranks organisms into equivalent taxonomic levels regardless of their evolutionary history. For example, the cats (Felidae) and the orchids (Orchidaceae) are both grouped at the family level, even though Orchidaceae has an evolutionary history extending over 70 million years longer than that of Felidae (Understanding Evolution, 2011).

We can demonstrate where organisms are positioned within the ToL by using phylogenetic classification. Phylogenetics is concerned with classifying species in a way that reflects their evolutionary relatedness, placing them into groups called clades (Fig. 1), each of which includes an ancestral species and all its descendants. These clades are then depicted on a tree-like diagram called a phylogenetic tree, based on phenotypic characters such as internal and external morphology and behaviour, and on molecular sequence data, which will be discussed in the next section. A tree can be established by identifying a particular trait in a clade and determining whether the trait is ancestral (plesiomorphic) or newly derived (apomorphic) (Fig. 2). Lineages possessing synapomorphic traits (derived traits that are shared by two or more lineages within a clade) are branched together high in the tree, while those believed to be more basal, which lack the trait but possess other shared plesiomorphic traits, are placed lower on the tree, indicating that they all share a common ancestor.

Figure 1. A clade on a phylogenetic tree. Taken from Understanding Evolution (2011).

Figure 2. Apomorphic and plesiomorphic characters. Taken from Understanding Evolution (2011).
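The definition of a clade given above (an ancestor together with all of its descendants) can be made concrete with a small sketch. The tree, the taxon labels A to D, and the helper names `leaves_of` and `is_clade` are illustrative assumptions rather than anything taken from the cited sources.

```python
# Toy rooted tree stored as a parent -> children mapping.
# The taxa and topology are made up for illustration only.
TREE = {
    "root": ["A", "n1"],
    "n1": ["B", "n2"],
    "n2": ["C", "D"],
}

def leaves_of(node, tree=TREE):
    """Return the set of tip taxa descended from `node`."""
    children = tree.get(node)
    if not children:  # a tip has no entry in the mapping
        return {node}
    tips = set()
    for child in children:
        tips |= leaves_of(child, tree)
    return tips

def is_clade(taxa, tree=TREE):
    """True if some internal node's complete set of tips equals `taxa`."""
    target = set(taxa)
    return any(leaves_of(node, tree) == target for node in tree)

print(sorted(leaves_of("n1")))  # ['B', 'C', 'D']
print(is_clade({"C", "D"}))     # True
print(is_clade({"B", "C"}))     # False: D descends from the same ancestor but is left out
```

A group such as {B, C} fails the check because the smallest clade containing both taxa also contains D, which is why a group must include every descendant of its ancestor to count as a clade.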

Often, new data will conflict with existing trees. Take, for example, the evolution of snakes, which are known to have arisen during the Cretaceous period approx. 98 million years ago (Wilson et al., 2010). Like the mammals, they exploded in diversity during the Tertiary, after the K-T extinction (an extinction event 65 million years ago) (Benton, 2000). The macrostomatan condition, the wide gape that gives snakes the ability to swallow prey many times larger than their skull, is believed to have evolved later, within the clade Alethinophidia (Lee et al., 2007). The most basal macrostomatans are the xenopeltids, while the core Macrostomata comprise the more recently evolved booids, which include the giant constricting snakes, and the more advanced snakes, such as those that possess a venom apparatus (Lee et al., 2007). This hypothesis suggests that a restricted gape in extant basal snakes is a plesiomorphic character. However, the recent discovery in India of a 3.5-metre-long snake within a sauropod dinosaur nest site, recovered from Upper Cretaceous rocks by Wilson et al. (2010), demonstrates that early snakes could attain a large body size and that their jaw mobility did in fact allow them to subdue large prey. The finding also suggests that the restricted gape in extant basal snakes is a result of miniaturisation of habitat and prey rather than a plesiomorphic character.

Molecular sequencing and the impact of Horizontal Gene Transfer (HGT) & Genome Fusion

A new era of inferring the ToL arrived in the 1960s, when molecular biology techniques allowed us to read the genealogical history of an organism from its molecular sequences (Zuckerkandl & Pauling, 1965). In contrast to the traditional taxonomic methods, sequencing of an organism’s DNA revealed its evolutionary history and seemed promising for uncovering the ToL (Woese, 2000). For example, traditional morphological classification grouped Tardigrada with Onychophora and Arthropoda in a monophyletic group; large-scale molecular data support this grouping and suggest that Onychophora (the velvet worms), rather than Tardigrada, is the sister group of Arthropoda (Campbell et al., 2011). Ribosomal RNA (rRNA) is among the ideal molecules for sequencing, due to its universal function (O’Malley & Koonin, 2011). Sequencing rRNA enabled biologists to build up a comprehensive overview of the historical events recorded in molecules and could therefore help us infer the universal ToL (McInerney et al., 2007). However, during the 1990s it became apparent that constructing the ToL using genomics would not be an easy task. New data acquired from the sequencing of genomes conflicted with other molecular data, and this initially seemed to cast doubt on molecular sequencing as a reliable tool for inferring the ToL. The conflicts arose through a process known as horizontal gene transfer (HGT), which occurs when genes from one organism are incorporated into another organism’s genome (Woese, 2000). The errors in interpretation arose because the extent of HGT had not yet been recognised.

HGT might initially appear to be an undesirable event, particularly for rooting trees, but with regard to cellular defence, cells appear more than capable of defending against foreign DNA (Woese, 2000), suggesting that perhaps there are advantages to acquiring new genes. There are extant multicellular organisms in which HGT is believed to have had a significant impact on their anatomy and physiology. Take, for instance, the sea slug Elysia chlorotica, an animal which acquires food via photosynthesis for part of its life (Rumpho et al., 2008). Initially, E. chlorotica’s nutritional source is the filamentous alga Vaucheria litorea. After feeding, E. chlorotica retains chloroplasts from V. litorea in its digestive apparatus, where they carry out photosynthesis (Rumpho et al., 2008). However, this process is time-limited, as active photosynthesis can be carried out for approx. 10 months only. While this process sounds ideal, especially during times of food scarcity, short-lived proteins such as the product of psbO, which is encoded in the nucleus of V. litorea, must be regularly imported by the chloroplasts to ensure continuous function (Skulachev, 2010). Acquiring the chloroplasts from the alga is therefore not enough on its own to carry out photosynthesis. The presence of the psbO gene is critical, and E. chlorotica contains a psbO gene that is identical to that of V. litorea, as revealed through sequencing (Rumpho et al., 2011). However, Wagele et al. (2011), working on the photosynthetic slugs Elysia timida and Plakobranchus ocellatus, argue that HGT is not responsible for sustaining photosynthesis, and instead suggest that what is needed is already possessed by the plastids, which are aided only by the intracellular environment of the slug, as it refrains from degrading them.

HGT has had a comparatively minor impact on multicellular organisms, with some estimates of 2% of an entire genome arising from HGT (McInerney et al., 2007). However, the impact on single-celled organisms is far greater, and HGT has the potential to erase much of the history recorded in prokaryotes prior to the evolution of eukaryotes (Simonson et al., 2005). This matters because it makes the true evolutionary history of prokaryotes difficult to interpret from molecular sequence data. The original tree-like root of the ToL has become compromised, and it is now difficult to determine whether the ToL is tree-like at the root or whether it is net or web-like as a consequence of HGT. A net or web-like structure implies that HGT has occurred at too great a rate to determine which lineage is an organism's true ancestor and which is merely the source of transferred DNA (Williams et al., 2011). However, if many molecules are sequenced and good computational analysis is applied, it may still be possible to retrieve the true tree of cellular division (Puigbo et al., 2009; Williams et al., 2011).
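One way to picture how sequencing many molecules might recover a shared signal is to count how often each clade appears across a set of gene trees. The sketch below is a toy illustration under stated assumptions: the gene names, taxa, and the 50% majority threshold are hypothetical, and this is not the method used by Puigbo et al. (2009) or Williams et al. (2011).

```python
from collections import Counter

# Each hypothetical gene tree is summarised by the clades (sets of taxa) it supports.
GENE_TREES = {
    "gene1": [{"A", "B"}, {"A", "B", "C"}],
    "gene2": [{"A", "B"}, {"A", "B", "C"}],
    "gene3": [{"A", "B"}, {"A", "B", "C"}],
    "gene4": [{"B", "D"}, {"A", "B", "D"}],  # disagrees with the other three genes
}

def clade_support(gene_trees):
    """Fraction of gene trees in which each clade appears."""
    counts = Counter()
    for clades in gene_trees.values():
        for clade in clades:
            counts[frozenset(clade)] += 1
    total = len(gene_trees)
    return {clade: n / total for clade, n in counts.items()}

def majority_clades(gene_trees, threshold=0.5):
    """Clades found in more than `threshold` of the gene trees."""
    support = clade_support(gene_trees)
    return {clade for clade, s in support.items() if s > threshold}

def conflicting_genes(gene_trees, threshold=0.5):
    """Genes whose tree contains at least one clade outside the majority set."""
    keep = majority_clades(gene_trees, threshold)
    return [gene for gene, clades in gene_trees.items()
            if any(frozenset(c) not in keep for c in clades)]

print(sorted(sorted(c) for c in majority_clades(GENE_TREES)))
print(conflicting_genes(GENE_TREES))  # ['gene4']
```

A gene flagged in this way is only a candidate for HGT, since conflict between gene trees can also arise from other causes such as errors in tree reconstruction.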


The evolution of eukaryotes (Fig. 3) is proposed to have occurred through a phenomenon known as genome fusion. This refers to a process whereby an organism incorporates its genome into another organism’s genome (Cotton & McInerney, 2010). Eukaryotes arose as a result of genome fusion between two diverse prokaryotic genomes, one eubacterial and one archaebacterial, with the partners providing each other with biochemical services (Pisani et al., 2007). The result was a more complex cell, the potential of which is evident in the diversity of complex life on earth (Rivera & Lake, 2004). However, the eubacteria and archaebacteria are distinct domains at the base of the ToL, and this genome fusion joins the two domains, so that the tree of life now resembles a ring of life (Fig. 4) (Rivera & Lake, 2004).

Figure 3. The three domains of life, with eubacteria and archaebacteria giving rise to the eukaryota. Taken from Understanding Evolution (2011).

Figure 4. A schematic diagram of the ring of life. Taken from Rivera & Lake (2004).

DNA Barcoding

A molecular-based tool for identifying and categorising specimens has emerged over the last decade (Hebert et al., 2003), and this may also aid in reconstructing the universal ToL (Casiraghi et al., 2010). This simple concept, known as DNA barcoding, assigns “barcodes” to specimens using a short, standardised region of DNA, which in animals is a 648-bp region of the mitochondrial cytochrome c oxidase 1 (CO1) gene (Kress et al., 2005). CO1 is a more suitable marker than nuclear DNA because evolution occurs faster in mitochondrial DNA (mtDNA), resulting in a larger accumulation of changes between closely related species (Hebert et al., 2004b). Although several studies using DNA barcoding have been successful (Hebert et al., 2003; Hebert et al., 2004a; Hebert et al., 2004b), DNA barcoding has been highly controversial (Meyer & Paulay, 2005). Criticisms have been made, for example, of using single genes as universal markers. This is particularly true for CO1: while it is successful in animals, it is not appropriate for barcoding plants (Kress et al., 2005), because the gene evolves too slowly in plants and fungi to discriminate between species (Pires & Marinoni, 2010), so other genes are being explored for these groups (Kress et al., 2005).
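The idea of identifying a specimen from accumulated sequence differences can be sketched as a nearest-match search using the uncorrected proportion of differing sites (the p-distance). Everything specific here is an assumption for illustration: the 14-base sequences are invented stand-ins for real 648-bp barcodes, the species labels are placeholders, and the 0.1 cut-off is arbitrary rather than a published threshold.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return diffs / len(seq_a)

# Invented reference barcodes (real CO1 barcodes are about 648 bp long).
REFERENCES = {
    "species_1": "ATGCGTACGTTAGC",
    "species_2": "ATGCGTACGATAGG",
    "species_3": "TTGAGTCCGTTAAC",
}

def identify(query, references=REFERENCES, max_distance=0.1):
    """Return (best matching label or None, distance to the closest reference)."""
    best = min(references, key=lambda name: p_distance(query, references[name]))
    distance = p_distance(query, references[best])
    return (best if distance <= max_distance else None), distance

label, distance = identify("AAGCGTACGTTAGC")
print(label, round(distance, 3))  # species_1 0.071
print(identify("G" * 14)[0])      # None: no reference is close enough
```

A marker that evolves too slowly would give distances near zero between different species, which is the practical reason a single gene such as CO1 cannot serve every group.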

In recent years, DNA barcoding has become more accepted by the scientific community (Moritz & Cicero, 2004), bringing us closer to inferring the universal ToL. Although taxonomists were largely sceptical and had reservations regarding DNA barcoding methods, fearing that they might replace traditional taxonomic methods, it has now become apparent that both methods are crucial and complement each other (Moritz & Cicero, 2004). However, DNA barcoding is still in its infancy and many issues need to be addressed. For instance, Casiraghi et al. (2010) caution that evolution is continuous and species are not frozen in time. We therefore need to consider how a barcode, once assigned to a given species, will read as that species evolves in the future. Dedicated resources have been established which reflect the growing support for DNA barcoding. For example, the Consortium for the Barcode of Life (CBOL) (http://barcoding.si.edu/) and the Barcode of Life Data Systems (BOLD) (http://www.boldsystems.org/views/login.php) are online resources that provide a global bank of DNA barcode records (Edwards, 2005; Ratnasingham & Hebert, 2007). As data accumulate, detailed phylogenies will become evident and DNA barcoding will become a formidable tool for inferring the universal ToL.


Conclusion

Although binomial nomenclature does little to explain the evolutionary history of organisms, the traditional methods of identifying shared morphological characteristics have been beneficial in constructing phylogenetic trees and inferring the ToL. Molecular sequence data revolutionised phylogenetics from the 1960s onwards and have enabled us to resolve many relationships between species. However, DNA sequencing often conflicts with traditional taxonomy and has revealed the damaging effect that HGT has had on interpreting the evolutionary history that is preserved in DNA. It first seemed that the legitimate root of the ToL was not retrievable, but we now know that large-scale molecular and morphological work can resolve deep branches. With the advancement of DNA barcoding, long-term data compilation is likely to open up new avenues in molecular sequencing and resolve detailed phylogenetic trees. As it stands, the base of the ToL resembles a ring of life as a result of genome fusion between eubacteria and archaebacteria giving rise to the eukaryotes.


References

Benton, M. J., (2000) Vertebrate Palaeontology, Blackwell Science Ltd, Oxford.

Campbell, L. I., O. Rota-Stabelli, G. D. Edgecombe, T. Marchioro, S. J. Longhorn, M. J. Telford, H. Philippe, L. Rebecchi, K. J. Peterson & D. Pisani, (2011) MicroRNAs and phylogenomics resolve the relationships of Tardigrada and suggest that velvet worms are the sister group of Arthropoda. Proceedings of the National Academy of Sciences of the United States of America 108: 15920-15924.

Casiraghi, M., M. Labra, E. Ferri, A. Galimberti & F. De Mattia, (2010) DNA barcoding: a six-question tour to improve users' awareness about the method. Briefings in Bioinformatics 11: 440-453.

Cotton, J. A. & J. O. McInerney, (2010) Eukaryotic genes of archaebacterial origin are more important than the more numerous eubacterial genes, irrespective of function. Proceedings of the National Academy of Sciences of the United States of America 107: 17252-17255.

Darwin, C., (1859) On the Origin of Species By Means of Natural Selection, John Murray, London.

Edwards, J. L., (2005) Consortium for the Barcode of Life: An introduction. Genes & Genetic Systems 80: 443.

Futuyma, D. J., (2009) Evolution, Stony Brook, New York.

Hebert, P. D. N., A. Cywinska, S. L. Ball & J. R. DeWaard, (2003) Biological identifications through DNA barcodes. Proceedings of the Royal Society of London Series B-Biological Sciences 270: 313-321.

Hebert, P. D. N., E. H. Penton, J. M. Burns, D. H. Janzen & W. Hallwachs, (2004a) Ten species in one: DNA barcoding reveals cryptic species in the neotropical skipper butterfly Astraptes fulgerator. Proceedings of the National Academy of Sciences of the United States of America 101: 14812-14817.

Hebert, P. D. N., M. Y. Stoeckle, T. S. Zemlak & C. M. Francis, (2004b) Identification of birds through DNA barcodes. PLoS Biology 2: 1657-1663.

Kress, W. J., K. J. Wurdack, E. A. Zimmer, L. A. Weigt & D. H. Janzen, (2005) Use of DNA barcodes to identify flowering plants. Proceedings of the National Academy of Sciences of the United States of America 102: 8369-8374.

Lee, M. S. Y., A. F. Hugall, R. Lawson & J. D. Scanlon, (2007) Phylogeny of snakes (Serpentes): combining morphological and molecular data in likelihood, Bayesian and parsimony analyses. Systematics and Biodiversity 5: 371-389.

McInerney, J. O., D. E. Pisani, M. J. O'Connell, D. A. Fitzpatrick & C. J. Creevey, (2007) Evolutionary history of prokaryotes: Tree or no tree?, p. 49-59.

Meyer, C. P. & G. Paulay, (2005) DNA barcoding: Error rates based on comprehensive sampling. PLoS Biology 3: 2229-2238.

Moritz, C. & C. Cicero, (2004) DNA barcoding: Promise and pitfalls. PLoS Biology 2: 1529-1531.

O’Malley, M. & E. Koonin, (2011) How stands the Tree of Life a century and a half after The Origin? Biology Direct 6: 32.

Pires, A. C. & L. Marinoni, (2010) DNA barcoding and traditional taxonomy unified through Integrative Taxonomy: a view that challenges the debate questioning both methodologies. Biota Neotropica 10: 339-346.

Pisani, D., J. A. Cotton & J. O. McInerney, (2007) Supertrees disentangle the chimerical origin of eukaryotic genomes. Molecular Biology and Evolution 24: 1752-1760.

Puigbo, P., Y. I. Wolf & E. V. Koonin, (2009) Search for a 'Tree of Life' in the thicket of the phylogenetic forest. Journal of Biology (London) 8: 59.

Ratnasingham, S. & P. D. N. Hebert, (2007) BOLD: The Barcode of Life Data System (www.barcodinglife.org). Molecular Ecology Notes 7: 355-364.

Rivera, M. C. & J. A. Lake, (2004) The ring of life provides evidence for a genome fusion origin of eukaryotes. Nature 431: 152-155.


Rumpho, M. E., K. N. Pelletreau, A. Moustafa & D. Bhattacharya, (2011) The making of a photosynthetic animal. Journal of Experimental Biology 214: 303-311.

Rumpho, M. E., J. M. Worful, J. Lee, K. Kannan, M. S. Tyler, D. Bhattacharya, A. Moustafa & J. R. Manhart, (2008) Horizontal gene transfer of the algal nuclear gene psbO to the photosynthetic sea slug Elysia chlorotica. Proceedings of the National Academy of Sciences of the United States of America 105: 17867-17871.

Simonson, A. B., J. A. Servin, R. G. Skophammer, C. W. Herbold, M. C. Rivera & J. A. Lake, (2005) Decoding the genomic tree of life. Proceedings of the National Academy of Sciences of the United States of America 102: 6608-6613.

Skulachev, V. P., (2010) Discovery of a Photosynthesizing Animal that Can Survive for Months in a Light-Dependent Manner. Biochemistry-Moscow 75: 1498-1499.

Understanding Evolution. 2011. University of California Museum of Paleontology. Available from <http://evolution.berkeley.edu/>. [Accessed on 02 November 2011]

Wagele, H., O. Deusch, K. Handeler, R. Martin, V. Schmitt, G. Christa, B. Pinzger, S. B. Gould, T. Dagan, A. Klussmann-Kolb & W. Martin, (2011) Transcriptomic Evidence That Longevity of Acquired Plastids in the Photosynthetic Slugs Elysia timida and Plakobranchus ocellatus Does Not Entail Lateral Transfer of Algal Nuclear Genes. Molecular Biology and Evolution 28(1): 699-706.

Williams, D., G. P. Fournier, P. Lapierre, K. S. Swithers, A. G. Green, C. P. Andam & J. P. Gogarten, (2011) A Rooted Net of Life. Biology Direct 6.

Wilson, J. A., D. M. Mohabey, S. E. Peters & J. J. Head, (2010) Predation upon Hatchling Dinosaurs by a New Snake from the Late Cretaceous of India. Plos Biology 8.

Woese, C. R., (2000) Interpreting the universal phylogenetic tree. Proceedings of the National Academy of Sciences of the United States of America 97: 8392-8396.

Zuckerkandl, E. & L. Pauling, (1965) Molecules as documents of evolutionary history. Journal of Theoretical Biology 8(2): 357-366.