Reconstructing the tree of life using molecular sequence data
John Dunbar SN: 69789839
Abstract:
The Tree of Life (ToL) and the evolutionary histories of species have proved to be an extremely
complex field of study, and there has been much debate on how the universal ToL stands. Traditional
taxonomic methods of identifying shared morphological characteristics have been beneficial in
constructing phylogenetic trees and inferring the ToL. However, the arrival of molecular sequence data in the
1960s revolutionised phylogenetics and has enabled us to resolve many relationships between
species. Although horizontal gene transfer (HGT) has made interpreting molecular sequence data very
difficult, sound computational analysis has made inferring the deepest roots of the ToL more feasible.
Table of Contents
Introduction
Traditional & Phylogenetic classification
Molecular sequencing and the impact of Horizontal Gene Transfer (HGT) & Genome Fusion
DNA Barcoding
Conclusion
References
Introduction
In “On the Origin of Species”, Darwin (1859) depicted a tree-like
diagram in an attempt to illustrate the relatedness and histories of all living organisms and to
demonstrate that all life descended from a common ancestor. This is known as the Tree of
Life (ToL). In doing so, Darwin opened up a new world for exploration, and a century and a
half later, with the aid of traditional taxonomic methods and molecular sequence data, many
phylogenetic trees have been established. However, the ToL and the evolutionary histories of
species have proved extremely complex and there has been much debate on how the universal
ToL stands.
Traditional & Phylogenetic classification
Traditional taxonomy is founded on the classification, identification, and naming of
organisms based on shared morphological characteristics. The
system of naming and grouping organisms was established by Carolus Linnaeus in 1735
when he published his book “Systema Naturae” (Futuyma, 2009). The rationale at the time
was to develop an understanding of the scale of life in the hope of revealing God’s plan of
creation (Understanding Evolution, 2011). This system consists of first assigning a species
a two-part name, known as binomial nomenclature, i.e. a genus name followed by a
species name. Secondly, species are grouped into larger groups known as taxonomic levels,
which are categorised, starting with the highest level, as: Kingdom, Phylum, Class, Order,
Family, Genus, and Species. This is known as hierarchical classification (Futuyma, 2009).
However, the Linnaean system does not identify the organism’s position on the ToL or its
evolutionary history because it ranks organisms into equivalent taxonomic levels regardless
of their evolutionary history. For example, the cats (Felidae) and the orchids (Orchidaceae)
are both grouped at the family level, even though Orchidaceae has an evolutionary history
expanding over 70 million years longer than Felidae (Understanding Evolution, 2011).
We can demonstrate where organisms are positioned within the ToL by using
methods known as phylogenetic classification. Phylogenetics is concerned with the
classification of species in a way that reflects their evolutionary relatedness and classifies
them into groups called clades (Fig.1), each of which includes a species and all its descendants.
These clades are then depicted on a tree-like diagram called a phylogenetic tree, based on
phenotypic characters, such as internal and external morphology and behaviour, and on
molecular sequence data, which will be discussed in the next section. A tree can be
established by identifying a particular trait in a clade and determining whether the trait is
ancestral (plesiomorphic) or newly derived (apomorphic) (Fig.2). Lineages possessing
synapomorphic traits (derived traits that are shared by two or more lineages within a
clade) are branched together high in the tree, while those believed to be more basal
(primitive), which lack the trait but possess other shared plesiomorphic traits, are placed
lower on the tree, indicating that they all share a common ancestor.
Figure 1. Demonstrates a clade on a phylogenetic tree. Taken from (Understanding Evolution, 2011)
Figure 2. Demonstrates apomorphic and plesiomorphic characters. Taken from (Understanding Evolution, 2011)
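The definition of a clade given above (a species and all its descendants) can be illustrated with a short Python sketch. It assumes a hypothetical toy tree stored as a dictionary of parent-to-children links; the taxon names and the helper functions `clade` and `is_monophyletic` are invented for illustration and do not come from any of the cited sources.

```python
# Toy rooted tree: each internal node maps to its children.
# Node and taxon names are hypothetical, chosen only for illustration.
TREE = {
    "root": ["A", "n1"],
    "n1": ["B", "n2"],
    "n2": ["C", "D"],
}


def clade(tree, node):
    """Return the set of tips (terminal taxa) descended from `node`."""
    children = tree.get(node)
    if not children:  # a tip has no children, so it is its own clade
        return {node}
    tips = set()
    for child in children:
        tips |= clade(tree, child)
    return tips


def is_monophyletic(tree, group):
    """True if some node's descendant tips are exactly `group`."""
    group = set(group)
    nodes = set(tree) | {c for kids in tree.values() for c in kids}
    return any(clade(tree, n) == group for n in nodes)


print(sorted(clade(TREE, "n1")))
print(is_monophyletic(TREE, {"A", "B"}))
```

In this toy tree, B, C and D form a clade because they are exactly the tips below node `n1`, whereas A and B do not, since their most recent common ancestor (the root) also gave rise to C and D.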
Often, new data will conflict with existing trees. Take, for example, the evolution of
snakes, which are known to have arisen during the Cretaceous period approx. 98 million
years ago (Wilson et al., 2010). Like the mammals, they exploded in diversity during the
Tertiary, after the K-T extinction (an extinction event 65 million years ago) (Benton, 2000).
The macrostomatan condition, the wide gape that gives snakes the ability to swallow prey many times larger
than their skull, is believed to have evolved later, within the clade Alethinophidia (Lee et al.,
2007). The most basal macrostomatans are the xenopeltids, and the core Macrostomata
comprise the more recently evolved booids, which include the giant constricting snakes, and
the more advanced snakes, such as those that possess a venom apparatus (Lee et al., 2007).
This hypothesis suggests that a restricted gape in extant basal snakes is a plesiomorphic
character. However, the recent discovery in India of a 3.5-metre-long snake within a
sauropod dinosaur nest site, recovered from Upper Cretaceous rocks by Wilson et al. (2010),
demonstrates that early snakes were able to attain a large body size and that their jaw
mobility did in fact allow them to subdue large prey. The finding also suggests that the
restricted gape in extant basal snakes is a result of miniaturisation of habitat and prey rather
than a plesiomorphic character.
Molecular sequencing and the impact of Horizontal Gene Transfer (HGT) & Genome
Fusion
A new era of inferring the ToL arrived in the 1960s, when molecular biology
techniques allowed us to read the genealogical history of an organism from its DNA sequence
(Zuckerkandl & Pauling, 1965). In contrast to the traditional taxonomic methods, sequencing
of an organism’s DNA revealed its evolutionary history and seemed promising in uncovering
the ToL (Woese, 2000). For example, traditional morphological identification grouped Tardigrada into a
monophyletic group with Onychophora and Arthropoda, and recent large-scale molecular data
have clarified the relationships within that group, suggesting that Onychophora (velvet
worms) are the sister group of Arthropoda (Campbell et al., 2011). Ribosomal RNA
(rRNA) is among the ideal molecules for sequencing because of its universal
function (O’Malley & Koonin, 2011). Sequencing rRNA enabled biologists to build up a
comprehensive overview of the historical events recorded in molecules and could
therefore help us infer the universal ToL (McInerney et al., 2007). However, during the
1990s, it became apparent that it would not be an easy task to construct the ToL using
genomics. New data acquired from the sequencing of genomes conflicted with other
molecular data, and this initially seemed to cast doubt on molecular sequencing as a reliable
tool for inferring the ToL. The conflicts in the molecular data arose through a process known
as horizontal gene transfer (HGT), which occurs when genes from one organism are incorporated into
another organism’s genome (Woese, 2000). The errors in interpretation arose because
HGT had not been taken into account at the time.
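The idea that evolutionary history is recorded in molecular sequences can be made concrete with a simple distance measure. The Python sketch below computes the proportion of differing sites (often called the p-distance) between pre-aligned sequences. The sequences and taxon names are hypothetical toy data, and the function names `p_distance` and `distance_table` are illustrative assumptions; real analyses use much longer alignments and explicit models of sequence evolution.

```python
from itertools import combinations

# Hypothetical, pre-aligned toy sequences of equal length (not real data).
SEQS = {
    "taxon_A": "ACGTACGTAC",
    "taxon_B": "ACGTACGTAT",
    "taxon_C": "ACGAACTTAT",
}


def p_distance(s1, s2):
    """Proportion of aligned sites at which two sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(a != b for a, b in zip(s1, s2))
    return diffs / len(s1)


def distance_table(seqs):
    """Pairwise p-distances for every pair of taxa."""
    return {
        (x, y): p_distance(seqs[x], seqs[y])
        for x, y in combinations(sorted(seqs), 2)
    }


for pair, dist in distance_table(SEQS).items():
    print(pair, round(dist, 2))
```

With these toy sequences, taxon_A and taxon_B differ at fewer sites than either does from taxon_C, which is the kind of signal that tree-building methods use to group the first two together.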
HGT might initially appear to be an undesirable event, particularly for rooting trees.
However, cells appear more than capable of defending against foreign
DNA (Woese, 2000), so the persistence of HGT suggests that there may be advantages to acquiring new genes.
There are extant multicellular organisms in which HGT is believed to have had a significant
impact on their anatomy and physiology. Take, for instance, the sea slug Elysia chlorotica, an
animal which acquires energy via photosynthesis for part of its life (Rumpho et al., 2008).
Initially, E. chlorotica’s nutritional source is the filamentous alga Vaucheria litorea. After
feeding, E. chlorotica retains chloroplasts from V. litorea in its digestive apparatus, where
they carry out photosynthesis (Rumpho et al., 2008). However, this process is time
limited, as active photosynthesis can be carried out for approx. 10 months only. While this
process sounds ideal, especially during times of food scarcity, short-lived proteins such as
psbO, encoded in the nucleus of V. litorea, must be regularly imported by the chloroplasts in
order to ensure continuous function (Skulachev, 2010). Acquiring the chloroplasts from the alga
is therefore not enough on its own to sustain photosynthesis: the presence of the psbO gene is critical, and E.
chlorotica contains a psbO gene that is identical to that of V. litorea, as revealed through
sequencing (Rumpho et al., 2011). However, Wagele et al. (2011), working on the
photosynthetic slugs Elysia timida and Plakobranchus ocellatus, argue that HGT is not
required to sustain photosynthesis, and instead suggest that the necessary gene is already
possessed by the plastids, which are only aided by the intracellular environment of the slug as it refrains
from degrading them.
HGT has had a comparatively minor impact on the genomes of multicellular organisms, with some estimates of 2%
of an entire genome arising from HGT (McInerney et al., 2007). However, the impact on
single-celled organisms is far greater, and HGT has the potential to erase much of the history
recorded in prokaryotes prior to the evolution of eukaryotes (Simonson et al., 2005). This
poses a difficulty in interpreting the true evolutionary history of
prokaryotes from molecular sequence data. It means that the original tree-like root of the
ToL has become compromised, and it is now difficult to determine whether the ToL is
tree-like at the root or whether it is now net- or web-like as a consequence of HGT. A net- or
web-like tree implies that the rate at which HGT has occurred is too great to determine which
lineage is an organism's true ancestor and which is the original source of the transferred DNA (Williams et al.,
2011). However, if many molecules are sequenced and sound computational analysis is
applied, it may still be possible to retrieve the true tree of cell divisions (Puigbo et al.,
2009; Williams et al., 2011).
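One simple way to see how sequencing many molecules might help is to look for groupings that recur across most gene trees, so that a gene with a conflicting history (as HGT might produce) is outvoted. The Python sketch below applies a basic majority-rule summary of clades. It is only an illustration under stated assumptions: each hypothetical gene tree is reduced to the set of clades it contains, the taxa A to D are invented, and this is not the specific method used by Puigbo et al. (2009) or Williams et al. (2011).

```python
from collections import Counter

# Each hypothetical gene tree is summarised by the clades (sets of taxa)
# that it contains. Taxa A-D and all three trees are invented toy data.
GENE_TREES = [
    [frozenset("AB"), frozenset("ABC")],  # gene 1
    [frozenset("AB"), frozenset("ABC")],  # gene 2
    [frozenset("BC"), frozenset("ABC")],  # gene 3: a conflicting grouping
]


def majority_clades(gene_trees):
    """Return the clades found in more than half of the gene trees."""
    counts = Counter(c for tree in gene_trees for c in set(tree))
    threshold = len(gene_trees) / 2
    return {c for c, n in counts.items() if n > threshold}


for c in sorted(majority_clades(GENE_TREES), key=len):
    print(sorted(c))
```

Here the grouping of A with B is supported by two of the three genes and is retained, while the conflicting grouping of B with C, found in a single gene, is discarded.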
The evolution of eukaryotes (Fig.3) occurred through a phenomenon known as
genome fusion. This refers to a process whereby an organism incorporates its genome into
another organism’s genome (Cotton & McInerney, 2010). Eukaryotes arose as a result of
genome fusion between two diverse prokaryotic genomes, one eubacterial and one
archaebacterial, with the partners providing each other with biochemical services (Pisani et al.,
2007). The result was a more complex cell, the potential of which is evident in the diversity of
complex life on earth (Rivera & Lake, 2004). However, the eubacteria and
archaebacteria are distinct, separate domains at the base of the ToL, and this
genome fusion means that lineages from both domains merged to form the eukaryotes, so
that the tree of life now resembles a ring of life (Fig.4) (Rivera & Lake, 2004).
Figure 3. The three domains of life, eubacteria and archaebacteria giving rise to the eukaryota. Taken from (Understanding Evolution, 2011)
Figure 4. A schematic diagram of the ring of life. Taken from (Rivera & Lake, 2004)
DNA Barcoding
A molecular-based tool for identifying and categorising specimens has emerged over
the last decade (Hebert et al., 2003), and this may also aid in reconstructing the universal ToL
(Casiraghi et al., 2010). This simple concept, known as DNA barcoding, assigns “barcodes”
to specimens using a standard region of a DNA sequence, part of
the cytochrome c oxidase 1 (CO1) gene (Kress et al., 2005). The barcode is a 648-bp region of this
mitochondrial gene, and it is a more suitable marker than nuclear DNA because
evolution occurs faster in mitochondrial DNA (mtDNA), resulting in a larger accumulation of
changes between closely related species (Hebert et al., 2004b). Although several
studies using DNA barcoding have been successful (Hebert et al., 2003; Hebert et al., 2004a;
Hebert et al., 2004b), DNA barcoding has been highly controversial (Meyer & Paulay, 2005).
Criticisms have been made, for example, of the use of single genes as universal markers. This
is particularly true of CO1: while it is successful in animals, it is not appropriate for
barcoding plants (Kress et al., 2005), because the gene evolves too slowly in plants and fungi
to discriminate at the species level (Pires & Marinoni, 2010). Other
genes are therefore being explored for these groups (Kress et al., 2005).
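The identification step behind DNA barcoding can be sketched as a nearest-match lookup: a query barcode is compared with a reference library and assigned to the closest species, provided the match is close enough. The Python example below is a minimal sketch under stated assumptions. The reference sequences are short invented fragments rather than real CO1 barcodes, the species names are placeholders, and the `max_distance` cut-off of 0.1 is an arbitrary illustrative value, not a threshold taken from the cited studies.

```python
# Hypothetical reference library of aligned barcode fragments.
# These are toy sequences, far shorter than a real 648-bp barcode.
REFERENCE = {
    "species_X": "ATGGCATTCCTAGG",
    "species_Y": "ATGGCGTTTCTTGG",
}


def p_distance(s1, s2):
    """Proportion of aligned sites at which two sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def identify(query, reference, max_distance=0.1):
    """Return (species, distance) for the nearest reference barcode.

    If even the nearest barcode is further away than `max_distance`
    (an assumed, illustrative cut-off), the species is reported as None.
    """
    best = min(reference, key=lambda name: p_distance(query, reference[name]))
    dist = p_distance(query, reference[best])
    return (best, dist) if dist <= max_distance else (None, dist)


print(identify("ATGGCATTCCTAGA", REFERENCE))  # close to species_X
print(identify("TTGACATACCTCGA", REFERENCE))  # no sufficiently close match
```

A marker that evolves too slowly would give near-zero distances between different species, so a lookup of this kind could not tell them apart, which is the difficulty described above for CO1 in plants.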
In recent years, DNA barcoding has become more accepted by the scientific
community (Moritz & Cicero, 2004), bringing us closer to inferring the universal ToL.
Although taxonomists were largely sceptical and had reservations about DNA
barcoding methods, fearing that they might replace traditional taxonomic methods, it has now
become apparent that both approaches are crucial and complement each other (Moritz & Cicero,
2004). However, DNA barcoding is still in its infancy and many issues need to be addressed.
For instance, Casiraghi et al. (2010) caution that evolution is continuous and species are not
frozen in time. We therefore need to consider how the barcode assigned to a given species
will read as that species evolves in the future. Online resources have been
established that reflect the growing support for DNA barcoding. For example, the
Consortium for the Barcode of Life (CBOL) (http://barcoding.si.edu/) and the Barcode of Life
Data Systems (BOLD) (http://www.boldsystems.org/views/login.php) are online resources
that provide a global bank of DNA barcode records (Edwards, 2005;
Ratnasingham & Hebert, 2007). As data accumulate, detailed phylogenies will become
evident and DNA barcoding will become a formidable tool for inferring the universal ToL.
Conclusion
Although binomial nomenclature does little to explain the evolutionary history of organisms,
the traditional methods of identifying shared morphological characteristics have been
beneficial in constructing phylogenetic trees and inferring the ToL. The arrival of molecular sequence data
in the 1960s revolutionised phylogenetics and has enabled us to resolve many relationships
between species. However, DNA sequencing often conflicts with traditional taxonomy and
has revealed the damaging effect that HGT has had on interpreting the evolutionary history
that is preserved in DNA. It first seemed that the true root of the ToL was not
retrievable, but we now know that large-scale molecular and morphological work can resolve
deep branches. With the advancement of DNA barcoding, long-term data compilation is
likely to open up new avenues in molecular sequencing and resolve detailed phylogenetic
trees. As it stands, the ToL resembles a ring of life as a result of endosymbiosis between
eubacteria and archaebacteria giving rise to the eukaryotes.
References
Benton, M.J., (2000) Vertebrate Palaeontology, Blackwell Science Ltd, Oxford.
Campbell, L. I., O. Rota-Stabelli, G. D. Edgecombe, T. Marchioro, S. J. Longhorn, M. J. Telford, H. Philippe, L. Rebecchi, K. J. Peterson & D. Pisani, (2011) MicroRNAs and phylogenomics resolve the relationships of Tardigrada and suggest that velvet worms are the sister group of Arthropoda. Proceedings of the National Academy of Sciences of the United States of America 108: 15920-15924.
Casiraghi, M., M. Labra, E. Ferri, A. Galimberti & F. De Mattia, (2010) DNA barcoding: a six-question tour to improve users' awareness about the method. Briefings in Bioinformatics 11: 440-453.
Cotton, J. A. & J. O. McInerney, (2010) Eukaryotic genes of archaebacterial origin are more important than the more numerous eubacterial genes, irrespective of function. Proceedings of the National Academy of Sciences of the United States of America 107: 17252-17255.
Darwin, C., (1859) On the Origin of Species By Means of Natural Selection, John Murray, London.
Edwards, J. L., (2005) Consortium for the Barcode of Life: An introduction. Genes & Genetic Systems 80: 443.
Futuyma, D. J., (2009) Evolution, Stony Brook, New York.
Hebert, P. D. N., A. Cywinska, S. L. Ball & J. R. DeWaard, (2003) Biological identifications through DNA barcodes. Proceedings of the Royal Society of London Series B-Biological Sciences 270: 313-321.
Hebert, P. D. N., E. H. Penton, J. M. Burns, D. H. Janzen & W. Hallwachs, (2004a) Ten species in one: DNA barcoding reveals cryptic species in the neotropical skipper butterfly Astraptes fulgerator. Proceedings of the National Academy of Sciences of the United States of America 101: 14812-14817.
Hebert, P. D. N., M. Y. Stoeckle, T. S. Zemlak & C. M. Francis, (2004b) Identification of birds through DNA barcodes. Plos Biology 2: 1657-1663.
Kress, W. J., K. J. Wurdack, E. A. Zimmer, L. A. Weigt & D. H. Janzen, (2005) Use of DNA barcodes to identify flowering plants. Proceedings of the National Academy of Sciences of the United States of America 102: 8369-8374.
Lee, M. S. Y., A. F. Hugall, R. Lawson & J. D. Scanlon, (2007) Phylogeny of snakes (Serpentes): combining morphological and molecular data in likelihood, Bayesian and parsimony analyses. Systematics and Biodiversity 5: 371-389.
McInerney, J. O., D. E. Pisani, M. J. O'Connell, D. A. Fitzpatrick & C. J. Creevey, (2007) Evolutionary history of prokaryotes: Tree or no tree?, p. 49-59.
Meyer, C. P. & G. Paulay, (2005) DNA barcoding: Error rates based on comprehensive sampling. Plos Biology 3: 2229-2238.
Moritz, C. & C. Cicero, (2004) DNA barcoding: Promise and pitfalls. Plos Biology 2: 1529-1531.
O’Malley, M. & E. Koonin, (2011) How stands the Tree of Life a century and a half after The Origin? Biology Direct 6:32.
Pires, A. C. & L. Marinoni, (2010) DNA barcoding and traditional taxonomy unified through Integrative Taxonomy: a view that challenges the debate questioning both methodologies. Biota Neotropica 10: 339-346.
Pisani, D., J. A. Cotton & J. O. McInerney, (2007) Supertrees disentangle the chimerical origin of eukaryotic Genomes. Molecular Biology and Evolution 24: 1752-1760.
Puigbo, P., Y. I. Wolf & E. V. Koonin, (2009) Search for a 'Tree of Life' in the thicket of the phylogenetic forest. Journal of Biology (London) 8: 59.
Ratnasingham, S. & P. D. N. Hebert, (2007) BOLD: The Barcode of Life Data System (www.barcodinglife.org). Molecular Ecology Notes 7: 355-364.
Rivera, M. C. & J. A. Lake, (2004) The ring of life provides evidence for a genome fusion origin of eukaryotes. Nature 431: 152-155.
Rumpho, M. E., K. N. Pelletreau, A. Moustafa & D. Bhattacharya, (2011) The making of a photosynthetic animal. Journal of Experimental Biology 214: 303-311.
Rumpho, M. E., J. M. Worful, J. Lee, K. Kannan, M. S. Tyler, D. Bhattacharya, A. Moustafa & J. R. Manhart, (2008) Horizontal gene transfer of the algal nuclear gene psbO to the photosynthetic sea slug Elysia chlorotica. Proceedings of the National Academy of Sciences of the United States of America 105: 17867-17871.
Simonson, A. B., J. A. Servin, R. G. Skophammer, C. W. Herbold, M. C. Rivera & J. A. Lake, (2005) Decoding the genomic tree of life. Proceedings of the National Academy of Sciences of the United States of America 102: 6608-6613.
Skulachev, V. P., (2010) Discovery of a Photosynthesizing Animal that Can Survive for Months in a Light-Dependent Manner. Biochemistry-Moscow 75: 1498-1499.
Understanding Evolution. 2011. University of California Museum of Paleontology. Available from <http://evolution.berkeley.edu/>. [Accessed on 02 November 2011]
Wagele, H., O. Deusch, K. Handeler, R. Martin, V. Schmitt, G. Christa, B. Pinzger, S. B. Gould, T. Dagan, A. Klussmann-Kolb & W. Martin, (2011) Transcriptomic Evidence That Longevity of Acquired Plastids in the Photosynthetic Slugs Elysia timida and Plakobranchus ocellatus Does Not Entail Lateral Transfer of Algal Nuclear Genes. Mol. Biol. Evol. 28(1):699–706.
Williams, D., G. P. Fournier, P. Lapierre, K. S. Swithers, A. G. Green, C. P. Andam & J. P. Gogarten, (2011) A Rooted Net of Life. Biology Direct 6.
Wilson, J. A., D. M. Mohabey, S. E. Peters & J. J. Head, (2010) Predation upon Hatchling Dinosaurs by a New Snake from the Late Cretaceous of India. Plos Biology 8.
Woese, C. R., (2000) Interpreting the universal phylogenetic tree. Proceedings of the National Academy of Sciences of the United States of America 97: 8392-8396.
Zuckerkandl, E. & L. Pauling, (1965) Molecules as documents of evolutionary history. Journal of Theoretical Biology, 8(2): 357-366.