Phylogenetic Analysis 1 Phylogeny (phylo = tribe + genesis)

Upload: bridget-mason

Post on 17-Dec-2015

Page 1: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

Phylogenetic Analysis 1

Phylogeny (phylo =tribe + genesis)

Page 2: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

What can be inferred from phylogenetic trees built from sequence data?

• Which species are the closest living relatives of modern humans?

• Did the infamous Florida Dentist infect his patients with HIV?

• What were the origins of specific transposable elements?

• Plus countless others…

Page 3: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

Which species are the closest living relatives of modern humans?

Mitochondrial DNA, most nuclear DNA-encoded genes, and DNA/DNA hybridization all show that bonobos and chimpanzees are related more closely to humans than either is to gorillas.

The pre-molecular view was that the great apes (chimpanzees, gorillas and orangutans) formed a clade separate from humans, and that humans diverged from the apes at least 15-30 MYA.

[Figure: two trees of humans, chimpanzees, bonobos, gorillas and orangutans drawn against MYA time axes (axis labels 0, 15-30 and 0, 14), contrasting the pre-molecular and molecular views.]

Page 4: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

Did the Florida Dentist infect his patients with HIV?

Phylogenetic tree of HIV sequences from the dentist, his patients, and local HIV-infected people:

[Figure: tree with tips for the dentist (two sequences), Patients A to G, and local controls 2, 3, 9 and 35. One group of patients is labeled "Yes"; two other labels read "No".]

Yes: The HIV sequences from these patients fall within the clade of HIV sequences found in the dentist.

From Ou et al. (1992) and Page & Holmes (1998)

Page 5: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

What can be learned from character analysis using phylogenies?

• When did specific episodes of positive Darwinian selection occur during evolutionary history?

• Which genetic changes are unique to the human lineage?

• What was the most likely geographical location of the common ancestor of the African apes and humans?

• Plus countless others…

Page 6: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

What was the most likely geographical location of the common ancestor of the African apes and humans?

Scenario A: Africa as species fountain

Scenario B: Eurasia as ancestral homeland

Scenario B requires four fewer dispersal events.

[Figure: each scenario is drawn twice, once for living species and once for living + fossil species. Living taxa: OW monkeys, gibbons, orangutans, gorillas, chimpanzees, humans. Fossil taxa: Proconsul, Kenyapithecus, Oreopithecus, Lufengpithecus, Dryopithecus, Ouranopithecus. Legend: Eurasia = black, Africa = red, plus a symbol marking dispersal.]

Modified from: Stewart, C.-B. & Disotell, T.R. (1998) Current Biology 8: R582-588.

Page 7: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

How can we choose between competing hypotheses on the phylogeny of whales?

Page 8: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

Phylogenetic Reconstruction of Whales

• Whales belong to Artiodactyla (even-toed ungulates), which includes camels, pigs, hippos, cows and deer

• Outgroup is rhinos/horses

• Difficult to place them because they lack many characters present in terrestrial mammals (e.g. hind limbs)

• Are whales sister to the entire group or to hippos?

Page 9: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

DNA Sequence Data and Whale Evolution

• Data collected from beta-casein gene for all taxa and sequences aligned.

• Nucleotide changes between outgroup and ingroup species indicate shared derived homologies.

• Most nucleotides are identical in all taxa; these are uninformative for phylogeny.

• Some nucleotides indicate that whales belong with cows, deer, and hippos (162).

• Others indicate that whales and hippos are sister groups (166).

• Others contradict sister group status of whale/hippo and cow/deer (177) and may indicate a reversal.

Page 10: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

Problems in Reconstructing Phylogeny

• Characters sometimes conflict

• It is sometimes difficult to tell homology from homoplasy
– Analogy: characters similar because of convergent evolution
– Reversal: character reverts to ancestral form

• With morphological characters, careful examination may distinguish homoplasy (analogy) from homology

• With molecular characters (DNA/protein sequences), homoplasy is sometimes impossible to distinguish from homology, and orthologs from paralogs.

Page 11: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

A Phylogenetic Tree

• Taxon -- Any named group of organisms – evolutionary theory not necessarily involved.

• Clade -- A monophyletic taxon (evolutionary theory utilized)

Page 12: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

A phylogenetic tree with branch lengths

• Branch length can be significant…

• In this case it is: mouse is slightly more similar to fly than human is to fly (the sum of branches 1+2+3 is less than the sum of 1+2+4)

Page 13: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

Common Phylogenetic Tree Terminology

[Figure: rooted tree with terminal nodes A, B, C, D and E, labeled as follows.]

• Ancestral node or ROOT of the tree

• Internal nodes or divergence points (represent hypothetical ancestors of the taxa)

• Branches or lineages

• Terminal nodes: represent the TAXA (genes, populations, species, etc.) used to infer the phylogeny

Page 14: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

Phylogenetic trees diagram the evolutionary relationships between the taxa

((A,(B,C)),(D,E)) = The above phylogeny as nested parentheses

[Figure: rooted tree with tips Taxon A, Taxon B, Taxon C, Taxon E and Taxon D, listed from top to bottom.]

No meaning to the spacing between the taxa, or to the order in which they appear from top to bottom.

This dimension (along the branches) either can have no scale (for ‘cladograms’), can be proportional to genetic distance or amount of change (for ‘phylograms’ or ‘additive trees’), or can be proportional to time (for ‘ultrametric trees’ or true evolutionary trees).

These say that B and C are more closely related to each other than either is to A, and that A, B, and C form a clade that is a sister group to the clade composed of D and E. If the tree has a time scale, then D and E are the most closely related.
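The nested-parentheses notation can be read mechanically. The following is a minimal sketch, not a full Newick parser: it assumes bare taxon names with no branch lengths or node labels, and the function names `parse_newick` and `clades` are hypothetical. It lists the clades (sets of tips below each internal node) and shows that reordering the tips leaves them unchanged.

```python
def parse_newick(text):
    """Parse a nested-parentheses tree such as '((A,(B,C)),(D,E))'.

    Returns nested tuples with taxon names as strings. Branch lengths
    and node labels are not supported in this minimal sketch.
    """
    text = text.strip().rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # skip '('
            children = [node()]
            while text[pos] == ",":
                pos += 1  # skip ','
                children.append(node())
            pos += 1  # skip ')'
            return tuple(children)
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos]

    return node()


def clades(tree):
    """Return the set of tip sets (frozensets) under every internal node."""
    found = set()

    def tips(n):
        if isinstance(n, str):
            return frozenset([n])
        members = frozenset().union(*(tips(child) for child in n))
        found.add(members)
        return members

    tips(tree)
    return found


tree = parse_newick("((A,(B,C)),(D,E))")
rotated = parse_newick("((E,D),((C,B),A))")  # same tree, tips reordered

print(sorted("".join(sorted(c)) for c in clades(tree)))
print(clades(tree) == clades(rotated))
```

Comparing clade sets rather than strings is one simple way to test whether two drawings describe the same rooted branching order.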

Page 15: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

Three types of trees

[Figure: the same tree of Taxon A, B, C and D drawn three ways.]

• Cladogram: branch axis has no meaning

• Phylogram: branch axis = genetic change (branch lengths 1, 1, 1, 6, 3 and 5)

• Ultrametric tree: branch axis = time

All show the same evolutionary relationships, or branching orders, between the taxa.

Page 16: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

Types of trees: cladogram

[Figure: cladogram (no time scale) with internal nodes marked t1, t2 and t3. Tips: Pagurus bernhardus, Pagurus acadianus, Elassochirus tenuimanus, Labidochirus splendescens, Lithodes aequispina, Paralithodes camtschatica, Pagurus pollicaris (NE), Pagurus pollicaris (GU), Pagurus longicarpus (NE), Pagurus longicarpus (GU), Clibanarius vittatus, Coenobita sp., Artemia salina.]

• Cladogram: relative recency of common descent.

• Does not imply that ancestors on the same line necessarily speciated at the same time.

• t1 can be before or after t2, but not before t3.

Page 17: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

Types of trees: phylogram

[Figure: phylogram of the same 13 crustacean taxa shown in the cladogram, with a scale bar of 0.05.]

Phylogram (additive tree: branch lengths can be summed): relative recency of common descent, and branch lengths = amount of change.

Page 18: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

Types of trees: ultrametric

[Figure: ultrametric tree of the same 13 crustacean taxa shown in the cladogram, with a divergence axis (scale = time) marked 0.00, 0.05, 0.10 and 0.15.]

Ultrametric tree (linearized tree): amount of change can be scaled to time.

All tree tips are equidistant from the root.

Page 19: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

The goal of phylogeny inference is to resolve the branching orders of lineages in evolutionary trees

[Figure: three trees of taxa A to E. Left: completely unresolved or "star" phylogeny, with a polytomy (multifurcation) marked. Middle: partially resolved phylogeny. Right: fully resolved, bifurcating phylogeny, with a bifurcation marked.]

Page 20: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

There are three possible unrooted trees for four taxa (A, B, C, D)

Tree 1: A and B on one side of the internal branch, C and D on the other

Tree 2: A and C on one side of the internal branch, B and D on the other

Tree 3: A and D on one side of the internal branch, B and C on the other

Phylogenetic tree building (or inference) methods are aimed at discovering which of the possible unrooted trees is "correct". We would like this to be the "true" biological tree, that is, one that accurately represents the evolutionary history of the taxa. However, we must settle for discovering the computationally correct or optimal tree for the phylogenetic method of choice.

Page 21: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

The number of unrooted trees increases in a greater than exponential manner with number of taxa

(2N - 5)!! = # unrooted trees for N taxa

[Figure: example unrooted trees for three, four, five and six taxa (A to F).]
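To give a feel for how quickly (2N - 5)!! grows, here is a small sketch that evaluates it for a few values of N. The helper names `double_factorial` and `num_unrooted_trees` are hypothetical, and the count applies to fully resolved (bifurcating) unrooted trees.

```python
def double_factorial(n):
    """Product n * (n - 2) * (n - 4) * ... down to 1 or 2; returns 1 for n <= 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def num_unrooted_trees(n_taxa):
    """Number of fully resolved unrooted trees, (2N - 5)!!, for N >= 3 taxa."""
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    return double_factorial(2 * n_taxa - 5)


for n in (3, 4, 5, 6, 10, 20):
    print(n, num_unrooted_trees(n))
```

For three, four, five and six taxa this gives 1, 3, 15 and 105 trees; by ten taxa the count already exceeds two million.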

Page 22: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

Inferring evolutionary relationships between the taxa requires rooting the tree:

To root a tree mentally, imagine that the tree is made of string. Grab the string at the root and tug on it until the ends of the string (the taxa) fall opposite the root:

[Figure: an unrooted tree of taxa A, B, C and D with a root position marked, next to the rooted tree that results, with tips A, B, C and D.]

Note that in this rooted tree, taxon A is no more closely related to taxon B than it is to C or D.

Page 23: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

Try it again with the root at another position

[Figure: the unrooted tree of taxa A, B, C and D with the root marked at a different position, next to the rooted tree that results.]

Note that in this rooted tree, taxon A is most closely related to taxon B, and together they are equally distantly related to taxa C and D.

Page 24: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

An unrooted, four-taxon tree theoretically can be rooted in five different places to produce five different rooted trees

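The unrooted tree 1 (A and B on one side of the internal branch, C and D on the other) has five branches, numbered 1 to 5. Rooting on each branch in turn gives rooted trees 1a (branch 1), 1b (branch 2), 1c (branch 3), 1d (branch 4) and 1e (branch 5).

[Figure: unrooted tree 1 and rooted trees 1a to 1e.]

These trees show five different evolutionary relationships among the taxa!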
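The five rootings can be written out in nested-parentheses form. This is an illustrative sketch only: the function name `rootings_of_quartet` is hypothetical, and the order of the returned trees is not claimed to match the slide's labels 1a to 1e.

```python
def rootings_of_quartet(left, right):
    """List the five rooted trees obtained from the unrooted tree left|right.

    `left` and `right` are the two pairs of tips joined by the internal
    branch, e.g. ("A", "B") and ("C", "D"). There is one root position per
    branch: four terminal branches plus the internal branch.
    """
    (a, b), (c, d) = left, right
    return [
        f"({a},({b},({c},{d})))",  # root on the branch leading to a
        f"({b},({a},({c},{d})))",  # root on the branch leading to b
        f"(({a},{b}),({c},{d}))",  # root on the internal branch
        f"({c},({d},({a},{b})))",  # root on the branch leading to c
        f"({d},({c},({a},{b})))",  # root on the branch leading to d
    ]


for rooted in rootings_of_quartet(("A", "B"), ("C", "D")):
    print(rooted)
```

Only the rooting on the internal branch makes A and B sister taxa and C and D sister taxa at the same time; the other four place a single taxon as sister to the remaining three.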

Page 25: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

• Sometimes two trees may look very different but, in fact, differ only in the position of the root

Page 26: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

All of these rearrangements show the same evolutionary relationships between the taxa

[Figure: rooted tree 1a redrawn several times with its branches rotated around the internal nodes, so that the tips A, B, C and D appear in different orders.]

Page 27: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

There are two major ways to root trees

By outgroup: Uses taxa (the "outgroup") that are known to fall outside of the group of interest (the "ingroup"). Requires some prior knowledge about the relationships among the taxa. The outgroup can either be species (e.g., birds to root a mammalian tree) or previous gene duplicates (e.g., α-globins to root β-globins).

By midpoint or distance: Roots the tree at the midway point between the two most distant taxa in the tree, as determined by branch lengths. Assumes that the taxa are evolving in a clock-like manner. This assumption is built into some of the distance-based tree building methods.

[Figure: tree of taxa A, B, C and D with the outgroup marked and branch lengths 10, 2, 3, 5 and 2.]

d (A,D) = 10 + 3 + 5 = 18
Midpoint = 18 / 2 = 9
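The midpoint calculation can be sketched in code. The slide only spells out the A-to-D path (10 + 3 + 5), so giving the two branches of length 2 to B and C is an assumption, as are the internal node names `X` and `Y` and all function names.

```python
from itertools import combinations

# Branch lengths from the example; giving 2 to B and 2 to C is an assumption,
# since only the A-to-D path (10 + 3 + 5) is spelled out.
EDGES = [("A", "X", 10), ("B", "X", 2), ("X", "Y", 3), ("Y", "C", 2), ("Y", "D", 5)]
TIPS = ["A", "B", "C", "D"]


def build_adjacency(edges):
    """Turn an edge list into a symmetric dict-of-dicts of branch lengths."""
    adj = {}
    for u, v, length in edges:
        adj.setdefault(u, {})[v] = length
        adj.setdefault(v, {})[u] = length
    return adj


def path(adj, start, goal, seen=None):
    """Return the unique path from start to goal in a tree as a list of nodes."""
    seen = seen or {start}
    if start == goal:
        return [start]
    for nxt in adj[start]:
        if nxt not in seen:
            rest = path(adj, nxt, goal, seen | {nxt})
            if rest:
                return [start] + rest
    return None


def midpoint_root(edges, tips):
    """Locate the midpoint between the two most distant tips.

    Returns (tip1, tip2, distance, branch, offset), where the root lies on
    `branch` at `offset` units from the branch end nearer tip1.
    """
    adj = build_adjacency(edges)

    def length(p):
        return sum(adj[u][v] for u, v in zip(p, p[1:]))

    longest = max((path(adj, s, t) for s, t in combinations(tips, 2)), key=length)
    half = length(longest) / 2
    walked = 0
    for u, v in zip(longest, longest[1:]):
        if walked + adj[u][v] >= half:
            return longest[0], longest[-1], length(longest), (u, v), half - walked
        walked += adj[u][v]


print(midpoint_root(EDGES, TIPS))
```

Under these assumed branch lengths, A and D are the most distant pair (18), and the midpoint falls on the branch leading to A, 9 units from A.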

Page 28: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

Rooting Using an Outgroup

• The outgroup should be a sequence (or set of sequences) known to be less closely related to the rest of the sequences than they are to each other.

• It should ideally be as closely related as possible to the rest of the sequences while still satisfying the first condition.

• The root must be somewhere between the outgroup and the rest (either on the node or in a branch).

Page 29: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

Automatic rooting

• Many software packages will root trees automatically (e.g. mid-point rooting in NJPlot)

• This normally involves assumptions… BEWARE!

Page 30: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

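[Figure: example unrooted trees for four, five and six taxa (A to F).]

(# unrooted trees) x (# branches per unrooted tree) = # rooted trees

(2N - 5)!! x (2N - 3) = (2N - 3)!! = # rooted trees for N taxa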

Each unrooted tree theoretically can be rooted anywhere along any of its branches

Page 31: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

Molecular phylogenetic tree building methods

Are mathematical and/or statistical methods for inferring the divergence order of taxa, as well as the lengths of the branches that connect them. There are many phylogenetic methods available today, each having strengths and weaknesses. Most can be classified as follows:

DATA TYPE     COMPUTATIONAL METHOD
              Optimality criterion               Clustering algorithm
Characters    PARSIMONY, MAXIMUM LIKELIHOOD      (none listed)
Distances     MINIMUM EVOLUTION, LEAST SQUARES   UPGMA, NEIGHBOR-JOINING

Page 32: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

Types of data used in phylogenetic inference

Character-based methods: Use the aligned characters, such as DNA or protein sequences, directly during tree inference.

Taxa        Characters
Species A   ATGGCTATTCTTATAGTACG
Species B   ATCGCTAGTCTTATATTACA
Species C   TTCACTAGACCTGTGGTCCA
Species D   TTGACCAGACCTGTGGTCCG
Species E   TTGACCAGTTCTCTAGTTCG

Distance-based methods: Transform the sequence data into pairwise distances (dissimilarities), and then use the matrix during tree building.

            A      B      C      D      E
Species A   ----   0.20   0.50   0.45   0.40
Species B   0.23   ----   0.40   0.55   0.50
Species C   0.87   0.59   ----   0.15   0.40
Species D   0.73   1.12   0.17   ----   0.25
Species E   0.59   0.89   0.61   0.31   ----

Example 1 (above the diagonal): Uncorrected "p" distance (= observed percent sequence difference)

Example 2 (below the diagonal): Kimura 2-parameter distance (estimate of the true number of substitutions between taxa)
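As a hedged sketch of how such distances might be computed from the alignment above, the code below counts transitions and transversions and applies the standard Kimura 2-parameter formula, d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions. The names `ALIGNMENT`, `count_changes`, `p_distance` and `k2p_distance` are hypothetical, and gaps or ambiguous bases are not handled.

```python
import math

ALIGNMENT = {
    "A": "ATGGCTATTCTTATAGTACG",
    "B": "ATCGCTAGTCTTATATTACA",
    "C": "TTCACTAGACCTGTGGTCCA",
    "D": "TTGACCAGACCTGTGGTCCG",
    "E": "TTGACCAGTTCTCTAGTTCG",
}
PURINES = {"A", "G"}


def count_changes(seq1, seq2):
    """Return (transitions, transversions) between two aligned sequences."""
    transitions = transversions = 0
    for x, y in zip(seq1, seq2):
        if x == y:
            continue
        if (x in PURINES) == (y in PURINES):
            transitions += 1    # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    return transitions, transversions


def p_distance(seq1, seq2):
    """Uncorrected proportion of differing sites."""
    ts, tv = count_changes(seq1, seq2)
    return (ts + tv) / len(seq1)


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance from transition/transversion proportions."""
    ts, tv = count_changes(seq1, seq2)
    p, q = ts / len(seq1), tv / len(seq1)
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


for s, t in [("A", "B"), ("A", "C"), ("B", "D")]:
    a, b = ALIGNMENT[s], ALIGNMENT[t]
    print(s, t, round(p_distance(a, b), 2), round(k2p_distance(a, b), 2))
```

For the three pairs printed, the rounded values agree with the corresponding matrix entries (0.20 and 0.23, 0.50 and 0.87, 0.55 and 1.12). The corrected distance is always at least as large as the uncorrected one because it tries to account for multiple substitutions at the same site.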

Page 33: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

Similarity vs. Evolutionary Relationship

Similarity and relationship are not the same thing, even though evolutionary relationship is inferred from certain types of similarity.

Similar: having likeness or resemblance (an observation)

Related: genetically connected (a historical fact)

Two taxa can be most similar without being most closely related:

[Figure: tree of Taxon A, B, C and D with branch lengths 1, 1, 1, 6, 3 and 5.]

C is more similar in sequence to A (d = 3) than to B (d = 7), but C and B are most closely related (that is, C and B shared a common ancestor more recently than either did with A).
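The two distances quoted here are sums of branch lengths along the path between the tips. One assignment of the figure's branch lengths that reproduces them (an assumption, since the figure itself is not recoverable) gives B a terminal branch of 6, C a terminal branch of 1, A a terminal branch of 1, and the internal branch joining the B + C ancestor to A's lineage a length of 1:

```latex
\begin{aligned}
d(A, C) &= \underbrace{1}_{A} + \underbrace{1}_{\text{internal}} + \underbrace{1}_{C} = 3 \\
d(B, C) &= \underbrace{6}_{B} + \underbrace{1}_{C} = 7
\end{aligned}
```

Under that reading, the long branch leading to B makes B look dissimilar to its own sister taxon C, even though the path from B to C crosses fewer nodes than the path from A to C.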

Page 34: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

Types of Similarity

Observed similarity between two entities can be due to:

Evolutionary relationship:
• Shared ancestral characters (‘plesiomorphies’)
• Shared derived characters (‘synapomorphies’)

Homoplasy (independent evolution of the same character):
• Convergent events (in either related or unrelated entities)
• Parallel events (in related entities)
• Reversals (in related entities)

[Figure: small trees with character states C, G and T at the tips and nodes, illustrating the types of similarity.]

Character-based methods can tease apart types of similarity and theoretically find the true evolutionary tree. Similarity = relationship only if certain conditions are met (if the distances are ‘ultrametric’).

Page 35: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

METRIC DISTANCES between any two or three taxa (a, b, and c) have the following properties:

Property 1: d (a, b) ≥ 0 Non-negativity

Property 2: d (a, b) = d (b, a) Symmetry

Property 3: d (a, b) = 0 if and only if a = b Distinctness

Property 4: d (a, c) ≤ d (a, b) + d (b, c) Triangle inequality

[Figure: triangle with vertices a, b and c and side lengths 6, 9 and 5.]
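The four metric properties can be checked directly on a small distance table. This is a minimal sketch; the helper names `symmetric` and `is_metric` are hypothetical, and assigning the triangle's sides as d(a, b) = 6, d(b, c) = 5 and d(a, c) = 9 is an assumption about the figure.

```python
from itertools import permutations


def symmetric(pairs):
    """Build a symmetric dict-of-dicts from {(x, y): distance}, with zero self-distances."""
    d = {}
    for (x, y), value in pairs.items():
        d.setdefault(x, {x: 0})[y] = value
        d.setdefault(y, {y: 0})[x] = value
    return d


def is_metric(d, tol=1e-9):
    """Check properties 1 to 4 for a dict-of-dicts distance table."""
    taxa = list(d)
    for a in taxa:
        for b in taxa:
            if d[a][b] < 0:                    # Property 1: non-negativity
                return False
            if abs(d[a][b] - d[b][a]) > tol:   # Property 2: symmetry
                return False
            if (d[a][b] == 0) != (a == b):     # Property 3: distinctness
                return False
    # Property 4: triangle inequality for every ordered triple
    return all(
        d[a][c] <= d[a][b] + d[b][c] + tol
        for a, b, c in permutations(taxa, 3)
    )


triangle = symmetric({("a", "b"): 6, ("b", "c"): 5, ("a", "c"): 9})
broken = symmetric({("a", "b"): 2, ("b", "c"): 3, ("a", "c"): 9})

print(is_metric(triangle))  # 9 <= 6 + 5, so the triangle closes
print(is_metric(broken))    # 9 > 2 + 3, so it does not
```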

Page 36: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

ULTRAMETRIC DISTANCES must satisfy the previous four conditions, plus:

Property 5: d (a, b) ≤ maximum [d (a, c), d (b, c)]

This implies that the two largest distances are equal, so that they define an isosceles triangle.

[Figure: isosceles triangle with d (a, b) = 4 and d (a, c) = d (b, c) = 6, next to the corresponding tree of a, b and c (tree labels as extracted: 2, 22, 4).]

If distances are ultrametric, then the sequences are evolving in a perfectly clock-like manner, and thus can be used in UPGMA trees and for the most precise calculations of divergence dates.

Similarity = Relationship if the distances are ultrametric!
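Property 5 is often called the three-point condition, and it can be tested by checking that the two largest distances in every triple are equal. The sketch below assumes distances are stored in a dict keyed by frozenset pairs; `is_ultrametric` and the example names are hypothetical.

```python
from itertools import combinations


def is_ultrametric(dist, taxa, tol=1e-9):
    """Three-point condition: in every triple, the two largest distances are equal."""
    for a, b, c in combinations(taxa, 3):
        _, middle, largest = sorted([
            dist[frozenset((a, b))],
            dist[frozenset((a, c))],
            dist[frozenset((b, c))],
        ])
        if largest - middle > tol:
            return False
    return True


# The isosceles example: d(a, b) = 4, d(a, c) = d(b, c) = 6.
clocklike = {frozenset("ab"): 4, frozenset("ac"): 6, frozenset("bc"): 6}
# A metric but non-ultrametric triangle with sides 6, 5 and 9.
not_clocklike = {frozenset("ab"): 6, frozenset("bc"): 5, frozenset("ac"): 9}

print(is_ultrametric(clocklike, "abc"))
print(is_ultrametric(not_clocklike, "abc"))
```

The first table passes and the second fails, which illustrates that every ultrametric distance is a metric but not the other way round.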

Page 37: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

ADDITIVE DISTANCES:

Property 6:

d (a, b) + d (c, d) ≤ maximum [d (a, c) + d (b, d), d (a, d) + d (b, c)]

For distances to fit into an evolutionary tree, they must be either metric or ultrametric, and they must be additive. Estimated distances often fall short of these criteria, and thus can fail to produce correct evolutionary trees.
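Property 6 (the four-point condition) says that, of the three ways to pair up four taxa, the two largest sums of distances must be equal. The following sketch checks it for two small tables: path lengths read off a tree with assumed branch lengths (A = 10, B = 2, internal = 3, C = 2, D = 5), and the uncorrected "p" distances for Species A to D from the example matrix. The function and variable names are hypothetical.

```python
from itertools import combinations


def is_additive(dist, taxa, tol=1e-9):
    """Four-point condition: for every quartet, the two largest of the three
    pairwise sums must be equal."""
    def d(x, y):
        return dist[frozenset((x, y))]

    for a, b, c, e in combinations(taxa, 4):
        sums = sorted([
            d(a, b) + d(c, e),
            d(a, c) + d(b, e),
            d(a, e) + d(b, c),
        ])
        if sums[2] - sums[1] > tol:
            return False
    return True


# Path lengths on a tree ((A:10, B:2):3, C:2, D:5); the branch lengths are assumed.
tree_like = {
    frozenset("AB"): 12, frozenset("AC"): 15, frozenset("AD"): 18,
    frozenset("BC"): 7, frozenset("BD"): 10, frozenset("CD"): 7,
}
# Uncorrected p distances for Species A to D from the example matrix.
p_dist = {
    frozenset("AB"): 0.20, frozenset("AC"): 0.50, frozenset("AD"): 0.45,
    frozenset("BC"): 0.40, frozenset("BD"): 0.55, frozenset("CD"): 0.15,
}

print(is_additive(tree_like, "ABCD"))  # sums 19, 25, 25
print(is_additive(p_dist, "ABCD"))     # sums 0.35, 0.85, 1.05
```

Distances measured along a tree always pass the test, whereas the estimated p distances here do not, which is the kind of shortfall described above.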

Page 38: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

Types of computational methods

Page 39: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

Clustering algorithms:

• Use pairwise distances.

• Are purely algorithmic methods, in which the algorithm itself defines the tree selection criterion.

• Tend to be very fast programs that produce single trees rooted by distance.

• No objective function to compare to other trees, even if numerous other trees could explain the data equally well.

• Warning: Finding a single tree is not necessarily the same as finding the "true" evolutionary tree.

Page 40: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

Optimality approaches:

• Use either character or distance data.

• First define an optimality criterion (minimum branch lengths, fewest number of events, highest likelihood), and then use a specific algorithm for finding trees with the best value for the objective function.

• Can identify many equally optimal trees, if such exist.

• Warning: Finding an optimal tree is not necessarily the same as finding the "true" tree.

Page 41: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

Computational methods for finding optimal trees:

Exact algorithms: "Guarantee" to find the optimal or "best" tree for the method of choice. Two types used in tree building:

Exhaustive search: Evaluates all possible unrooted trees, choosing the one with the best score for the method.

Branch-and-bound search: Eliminates the parts of the search tree that only contain suboptimal solutions.

Heuristic algorithms: Approximate or "quick-and-dirty" methods that attempt to find the optimal tree for the method of choice, but cannot guarantee to do so. Heuristic searches often operate by "hill-climbing" methods.

Page 42: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

Exact searches become increasingly difficult, and eventually impossible, as the number of taxa increases:

(2N - 5)!! = # unrooted trees for N taxa

[Figure: example unrooted trees for three, four, five and six taxa (A to F).]

Page 43: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

Heuristic search algorithms are input order dependent and can get stuck in local minima or maxima

Rerunning heuristic searches using different input orders of taxa can help find global minima or maxima.

[Figure: two score landscapes, one for a search for the global minimum and one for a search for the global maximum. Each marks the GLOBAL MINIMUM and GLOBAL MAXIMUM, together with a local minimum or local maximum where a search can get stuck.]

Page 44: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

Assumptions made by phylogenetic methods:

• The sequences are correct

• The sequences are homologous

• Each position is homologous

• The sampling of taxa or genes is sufficient to resolve the problem of interest

• Sequence variation is representative of the broader group of interest

• Sequence variation contains sufficient phylogenetic signal (as opposed to noise) to resolve the problem of interest

• Each position in the sequence evolved independently

Page 45: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

Problems with Phylogenetic Inference

1. How do we know what the potential candidate trees are?

2. How do we choose which tree is (most likely) the true tree?

Page 46: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

Recipe for reconstructing a phylogeny

1. Select an optimality criterion

2. Select a search strategy

3. Use the selected search strategy to generate a series of trees, and apply the selected optimality criterion to each tree, always keeping track of the "best" tree examined thus far.

How do you know the "best" tree? Which is the "true" tree?
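The recipe can be illustrated on the smallest non-trivial case. The sketch below is one possible instance, not the method prescribed by the slides: it picks parsimony as the optimality criterion (scored with the Fitch algorithm), exhaustive search as the strategy, and the Species A to D sequences from the character-data example as input. All function and variable names are hypothetical.

```python
SEQS = {
    "A": "ATGGCTATTCTTATAGTACG",
    "B": "ATCGCTAGTCTTATATTACA",
    "C": "TTCACTAGACCTGTGGTCCA",
    "D": "TTGACCAGACCTGTGGTCCG",
}

# The three unrooted four-taxon trees, each written as two pairs of tips
# separated by the internal branch (the parsimony score is root-independent).
TREES = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]


def fitch(node, column):
    """Fitch pass for one site: return (possible states, minimum changes)."""
    if isinstance(node, str):
        return {column[node]}, 0
    left_states, left_cost = fitch(node[0], column)
    right_states, right_cost = fitch(node[1], column)
    shared = left_states & right_states
    if shared:
        return shared, left_cost + right_cost
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_score(tree, seqs):
    """Step 1, the optimality criterion: total changes summed over all sites."""
    n_sites = len(next(iter(seqs.values())))
    total = 0
    for i in range(n_sites):
        column = {name: seq[i] for name, seq in seqs.items()}
        total += fitch(tree, column)[1]
    return total


# Steps 2 and 3, exhaustive search: score every tree and keep the best one.
scores = {tree: parsimony_score(tree, SEQS) for tree in TREES}
best = min(scores, key=scores.get)

for tree, score in scores.items():
    print(tree, score)
print("best:", best)
```

With these sequences the tree grouping A with B and C with D needs 14 changes, against 21 and 19 for the other two, so it is the most parsimonious. Whether it is also the "true" tree is a separate question that the score alone cannot answer.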

Page 47: Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)

Search strategy: Which is the right tree?

• When m is the number of taxa, the number of possible rooted trees is:
– [(2m-3)!] / [2^(m-2) (m-2)!]
– For 10 taxa, the number of trees is 34,459,425

• Many trees can be discarded because they are obviously wrong

• Sometimes, there is a general or even specific grouping that can serve as a start for the tree search

• There are a number of approaches to tree searches that can be used
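The figure quoted for 10 taxa can be reproduced from the factorial formula. In this short sketch the function name `num_rooted_trees` is hypothetical; the formula counts fully resolved rooted trees and is equivalent to (2m - 3)!!.

```python
from math import factorial


def num_rooted_trees(m):
    """(2m - 3)! / (2**(m - 2) * (m - 2)!) for m >= 2 taxa."""
    if m < 2:
        raise ValueError("need at least two taxa")
    return factorial(2 * m - 3) // (2 ** (m - 2) * factorial(m - 2))


for m in (3, 4, 5, 10):
    print(m, num_rooted_trees(m))
```

For m = 10 this evaluates to 34,459,425, which is why exhaustive search stops being practical well before typical data set sizes.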