![Page 1: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/1.jpg)
Intro. To Phylogenetic Analysis
Slides modified by David Ardell
From Caro-Beth Stewart, Paul Higgs,
Joe Felsenstein and Mikael Thollesson
![Page 2: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/2.jpg)
C-B Stewart, NHGRI lecture, 12/5/00
What is phylogenetic analysis and why should we perform it?
Phylogenetic analysis has two major components:
1. Phylogeny inference or “tree building”: evolutionary relationships between genes or species
2. Character and rate analysis: mapping information onto trees
![Page 3: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/3.jpg)
Common Phylogenetic Tree Terminology
• Ancestral node, or ROOT, of the tree
• Internal nodes (represent hypothetical ancestors of the taxa)
• Branches or lineages
• Terminal nodes (A, B, C, D, E): represent the TAXA (genes, populations, species, etc.) used to infer the phylogeny
• CLADE
![Page 4: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/4.jpg)
X and Y are defined to be more closely related to each other than to Z if, and only if, they share a more recent common ancestor than they do with Z.
[Figure: trees of taxa A, B, C and D drawn with different tip orders]
![Page 5: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/5.jpg)
All of these rearrangements show the same evolutionary relationships between the taxa
[Figure: rooted tree 1a on taxa A, B, C and D, redrawn several times with its branches rotated into different tip orders]
![Page 6: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/6.jpg)
![Page 9: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/9.jpg)
Three types of trees
All show the same branching orders between taxa.
• Cladogram: groupings only; branch lengths have no meaning
• Phylogram: groupings + distance; branch lengths represent evolutionary distance (1, 1, 1, 6, 3 and 5 in the example)
• Ultrametric tree: groupings + time; branch lengths represent time
[Figure: the same tree of Taxa A, B, C and D drawn in each of the three styles]
![Page 10: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/10.jpg)
Similarity vs. Evolutionary Relationship:
Since taxa evolve at different rates, your closest relative could be very different from you.
[Figure: phylogram of Taxon A, Taxon B, Taxon C (think lamprey) and Taxon D, with branch lengths 1, 1, 1, 6, 3 and 5]
C is closer (more similar) to A, but more closely related to B.
This is why the closest BLAST hit is not necessarily the closest relative, and why you need to make trees.
![Page 11: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/11.jpg)
Types of Similarity
Observed similarity between two entities can be due to:
• Evolutionary relationship: shared ancestral characters (“plesiomorphies”) or shared derived characters (“synapomorphies”)
• Homoplasy (independent evolution of the same character): convergent events, parallel events, reversals
[Figure: small trees with the character states C, G and T mapped onto tips and nodes, one per case]
![Page 12: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/12.jpg)
A few examples of what can be inferred from phylogenetic trees built from DNA or protein sequence data:
• Which species are the closest living relatives of modern humans?
• Did the infamous Florida Dentist infect his patients with HIV?
• What were the origins of specific transposable elements?
![Page 14: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/14.jpg)
Which species are the closest living relatives of modern humans?
[Figure: two trees of humans, bonobos, chimpanzees, gorillas and orangutans, each on a time axis in millions of years ago (MYA). Classical view: axis from 15-30 MYA to 0. Molecular view: axis from 14 MYA to 0.]
![Page 15: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/15.jpg)
Did the Florida Dentist infect his patients with HIV?
Phylogenetic tree of HIV sequences from the DENTIST, his patients, and local HIV-infected people:
[Figure: tree whose tips are two DENTIST sequences, Patients A to G, and local controls 2, 3, 9 and 35]
• Yes: the HIV sequences from these patients fall within the clade of HIV sequences found in the dentist.
• No: marked at two other positions in the tree.
From Ou et al. (1992) and Page & Holmes (1998)
![Page 16: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/16.jpg)
Uses of character mapping:
• Dating adaptive evolutionary events
• Ancestral reconstruction
• Testing biological hypotheses of correlated function or change
![Page 17: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/17.jpg)
Ex: Where geographically was the common ancestor of African apes and humans?
Scenario A: Africa as species fountain
Scenario B: Eurasia as ancestral homeland
[Figure: for each scenario, one tree of living species (OW monkeys, gibbons, orangutans, gorillas, chimpanzees, humans) and one tree of living + fossil species (adding Proconsul, Kenyapithecus, Oreopithecus, Lufengpithecus, Dryopithecus and Ouranopithecus). Eurasia = black, Africa = red; dispersal events are marked on the branches.]
Scenario B requires four fewer dispersal events.
Modified from: Stewart, C.-B. & Disotell, T.R. (1998) Current Biology 8: R582-588.
![Page 18: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/18.jpg)
Building Trees
Methods arranged by DATA TYPE (rows) and COMPUTATIONAL METHOD (columns):

| DATA TYPE | Optimality criterion | Clustering algorithm |
| --- | --- | --- |
| Characters | PARSIMONY, MAXIMUM LIKELIHOOD | |
| Distances | MINIMUM EVOLUTION, LEAST SQUARES | UPGMA, NEIGHBOR-JOINING |
![Page 21: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/21.jpg)
Types of data:
Character data (rows are taxa, columns are characters):
Species A ATGGCTATTCTTATAGTACG
Species B ATCGCTAGTCTTATATTACA
Species C TTCACTAGACCTGTGGTCCA
Species D TTGACCAGACCTGTGGTCCG
Species E TTGACCAGTTCTCTAGTTCG
Distance-based data: pairwise distances (dissimilarities)
Upper triangle: uncorrected “p” distance. Lower triangle (Example 2): Kimura 2-parameter distance.

| | A | B | C | D | E |
| --- | --- | --- | --- | --- | --- |
| Species A | ---- | 0.20 | 0.50 | 0.45 | 0.40 |
| Species B | 0.23 | ---- | 0.40 | 0.55 | 0.50 |
| Species C | 0.87 | 0.59 | ---- | 0.15 | 0.40 |
| Species D | 0.73 | 1.12 | 0.17 | ---- | 0.25 |
| Species E | 0.59 | 0.89 | 0.61 | 0.31 | ---- |
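The two kinds of distance in the matrix above can be recomputed from the character data. The sketch below is a minimal illustration, not the software used for the slide; the function names are made up for this example. It counts transitions and transversions per pair and applies the standard Kimura 2-parameter formula d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions.

```python
import math

# Alignment copied from the slide (five taxa, 20 sites each).
ALIGNMENT = {
    "A": "ATGGCTATTCTTATAGTACG",
    "B": "ATCGCTAGTCTTATATTACA",
    "C": "TTCACTAGACCTGTGGTCCA",
    "D": "TTGACCAGACCTGTGGTCCG",
    "E": "TTGACCAGTTCTCTAGTTCG",
}

PURINES = {"A", "G"}


def count_changes(seq1, seq2):
    """Return (transitions, transversions) between two aligned sequences."""
    transitions = transversions = 0
    for x, y in zip(seq1, seq2):
        if x == y:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if (x in PURINES) == (y in PURINES):
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


def p_distance(seq1, seq2):
    """Uncorrected proportion of differing sites."""
    ts, tv = count_changes(seq1, seq2)
    return (ts + tv) / len(seq1)


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance from transition/transversion proportions."""
    ts, tv = count_changes(seq1, seq2)
    P = ts / len(seq1)
    Q = tv / len(seq1)
    return -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))


for a, b in [("A", "B"), ("A", "C"), ("B", "D")]:
    print(a, b,
          round(p_distance(ALIGNMENT[a], ALIGNMENT[b]), 2),
          round(k2p_distance(ALIGNMENT[a], ALIGNMENT[b]), 2))
```

The corrected distance is always at least as large as the uncorrected one, and the gap widens as sequences diverge, which is the pattern visible when comparing the two triangles of the matrix.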
![Page 22: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/22.jpg)
![Page 23: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/23.jpg)
![Page 25: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/25.jpg)
Parsimony
Given two trees, the one requiring the fewest character changes to explain the observations is the better tree
– Parsimony score for a tree is the minimum number of required changes
– This score is frequently referred to as number of steps or tree length
![Page 26: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/26.jpg)
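Parsimony – an example
Four aligned sequences:
1. acgtatgga
2. acgggtgca
3. aacggtgga
4. aactgtgca
[Figure: the three possible unrooted trees for these sequences, with the states of one site (c, c, a, a) mapped onto the tips of each tree]
Total tree length: 7, 8 and 8 for the three trees, respectively.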
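The tree lengths quoted above can be checked with Fitch's small-parsimony algorithm, which counts the minimum number of changes per site on a fixed tree. This is a minimal sketch; the numbering of the sequences (1 to 4, in the order listed on the slide) and the nested-tuple tree encoding are assumptions made for illustration.

```python
# The four sequences from the slide, numbered in the order they appear.
SEQS = {
    1: "acgtatgga",
    2: "acgggtgca",
    3: "aacggtgga",
    4: "aactgtgca",
}

# The three unrooted four-taxon trees, rooted arbitrarily on the internal
# branch (the root position does not change the parsimony score).
TREES = [
    ((1, 2), (3, 4)),
    ((1, 3), (2, 4)),
    ((1, 4), (2, 3)),
]


def fitch(node, site):
    """Return (possible_states, steps) for one site on the subtree at node."""
    if not isinstance(node, tuple):
        return {SEQS[node][site]}, 0
    left_states, left_steps = fitch(node[0], site)
    right_states, right_steps = fitch(node[1], site)
    shared = left_states & right_states
    if shared:
        # The children agree on at least one state: no extra change needed.
        return shared, left_steps + right_steps
    # No shared state: one change is required on this part of the tree.
    return left_states | right_states, left_steps + right_steps + 1


def tree_length(tree):
    """Sum the per-site minimum number of changes over all sites."""
    n_sites = len(SEQS[1])
    return sum(fitch(tree, site)[1] for site in range(n_sites))


lengths = [tree_length(tree) for tree in TREES]
print(lengths)
```

Under this numbering, the tree grouping sequences 1 and 2 is the shortest, so it is the one parsimony prefers.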
![Page 28: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/28.jpg)
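Using models
Observed differences vs. actual changes
Example: Jukes-Cantor
[Figure: substitution diagram among A, C, G and T, with every change occurring at rate α]

$$
Q = \begin{bmatrix}
-3\alpha & \alpha & \alpha & \alpha \\
\alpha & -3\alpha & \alpha & \alpha \\
\alpha & \alpha & -3\alpha & \alpha \\
\alpha & \alpha & \alpha & -3\alpha
\end{bmatrix}
\qquad \text{(rows and columns ordered A, C, G, T)}
$$

$$
p_{ij}(t) =
\begin{cases}
\tfrac{1}{4} + \tfrac{3}{4} e^{-4\alpha t}, & \text{if } i = j \\
\tfrac{1}{4} - \tfrac{1}{4} e^{-4\alpha t}, & \text{if } i \neq j
\end{cases}
$$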
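The gap between observed differences and actual changes under Jukes-Cantor can be made concrete with a few lines of code. This is a sketch under the model on this slide only; the function names are invented for the example and α = 1 is an arbitrary choice. The expected observed difference is three times the off-diagonal probability, while the expected number of actual changes per site is 3αt.

```python
import math


def jc_prob(alpha, t, same):
    """Jukes-Cantor probability of ending in the same base (same=True)
    or in one specific different base (same=False) after time t."""
    decay = math.exp(-4 * alpha * t)
    return 0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay


def expected_observed_difference(alpha, t):
    # Three possible different bases, each reached with probability p_ij.
    return 3 * jc_prob(alpha, t, same=False)


def expected_actual_changes(alpha, t):
    # Each base is replaced at total rate 3 * alpha.
    return 3 * alpha * t


def jc_correction(p_observed):
    """Invert the model: estimated changes per site from an observed difference."""
    return -0.75 * math.log(1 - 4 * p_observed / 3)


for t in (0.01, 0.1, 0.5, 2.0):
    observed = expected_observed_difference(1.0, t)
    print(t,
          round(observed, 4),
          round(expected_actual_changes(1.0, t), 4),
          round(jc_correction(observed), 4))
```

The observed difference levels off near 0.75 as t grows, while the actual number of changes keeps increasing; the correction recovers the latter from the former as long as the observed difference stays below 0.75.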
![Page 29: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/29.jpg)
![Page 30: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/30.jpg)
![Page 31: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/31.jpg)
![Page 32: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/32.jpg)
![Page 33: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/33.jpg)
![Page 34: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/34.jpg)
Likelihood of a one-branch tree…
30 nucleotides from -globin genes of two primates on a one-edge tree (the two differing sites are marked * on the slide):
Gorilla GAAGTCCTTGAGAAATAAACTGCACACTGG
Orangutan GGACTCCTTGAGAAATAAACTGCACACTGG
There are two differences and 28 similarities

$$
L = \left[ \tfrac{1}{16} \left( 1 - e^{-4\alpha t} \right) \right]^{2}
    \left[ \tfrac{1}{16} \left( 1 + 3 e^{-4\alpha t} \right) \right]^{28}
$$

[Figure: ln L plotted against t for t from 0 to 0.1; the ln L axis runs from -55.0 to -50.5]
The curve peaks at t = 0.02327, where ln L = -51.133956.
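The likelihood above is easy to evaluate directly. The sketch below assumes α = 1 (the slide does not state a value, but this choice reproduces its numbers) and uses the closed-form maximum for two sequences under Jukes-Cantor, which follows from setting the derivative of ln L to zero: e^{-4αt} = 1 - 4k/(3n) for k differences in n sites. Function names are illustrative.

```python
import math

GORILLA = "GAAGTCCTTGAGAAATAAACTGCACACTGG"
ORANGUTAN = "GGACTCCTTGAGAAATAAACTGCACACTGG"


def log_likelihood(t, seq1=GORILLA, seq2=ORANGUTAN, alpha=1.0):
    """ln L of a one-branch Jukes-Cantor tree of length t (alpha = 1 assumed)."""
    decay = math.exp(-4 * alpha * t)
    n_diff = sum(a != b for a, b in zip(seq1, seq2))
    n_same = len(seq1) - n_diff
    # Each site contributes (1/4) * p_ij(t): base frequency times transition probability.
    return (n_diff * math.log((1 - decay) / 16)
            + n_same * math.log((1 + 3 * decay) / 16))


def ml_branch_length(seq1=GORILLA, seq2=ORANGUTAN, alpha=1.0):
    """Closed-form maximum-likelihood branch length for two sequences."""
    p = sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)
    return -math.log(1 - 4 * p / 3) / (4 * alpha)


t_hat = ml_branch_length()
print(round(t_hat, 5), log_likelihood(t_hat))
```

With these assumptions the maximum agrees with the values on the slide to within rounding in the last digit.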
![Page 35: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/35.jpg)
A recipe for phylogenetic inference
• Collect your data
• Select an optimality criterion (“which tree is better?”, tree score)
• Optional: do data transformation (“corrections”)
• Select a search strategy to find the best tree
• Find the best hypothesis according to that criterion
• Assess the variation in your data in some way
![Page 36: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/36.jpg)
Finding the best tree
Number of (rooted) trees
– 3 taxa -> 3 trees
– 4 taxa -> 15 trees
– 10 taxa -> 34 459 425 trees
– 25 taxa -> 1.19·10^30 trees
– 52 taxa -> 2.75·10^80 trees
Finding the optimal tree is an NP-complete problem
Search strategies
– Exact: exhaustive; branch and bound
– Algorithmic: greedy algorithms, a.k.a. hill-climbing (including Neighbor-joining)
– Heuristic, systematic: branch-swapping (NNI, SPR, TBR)
– Heuristic, stochastic: Markov Chain Monte Carlo (MCMC); genetic algorithms
![Page 37: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/37.jpg)
“Star-Decomposition”
• Completely unresolved or "star" phylogeny (a polytomy or multifurcation)
• Partially resolved phylogeny
• Fully resolved, bifurcating phylogeny (every internal node is a bifurcation)
[Figure: the three stages drawn for taxa A, B, C, D and E]
![Page 38: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/38.jpg)
There are three possible unrooted trees on four taxa (A, B, C, D)
• Tree 1: A and B on one side of the internal branch, C and D on the other
• Tree 2: A and C on one side, B and D on the other
• Tree 3: A and D on one side, B and C on the other
![Page 39: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/39.jpg)
The number of unrooted trees increases in a greater than exponential manner with number of taxa
(2N - 5)!! = # unrooted trees for N taxa
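[Figure: unrooted trees for 3 taxa (A, B, C), 4 taxa (A to D), 5 taxa (A to E) and 6 taxa (A to F), showing how each added taxon can attach to any existing branch]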
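The double-factorial formula above is simple to evaluate, and doing so reproduces the tree counts quoted earlier in the deck. The sketch below also uses the standard companion formula (2N - 3)!! for rooted trees; the function names are chosen for this example.

```python
def double_factorial(k):
    """Product k * (k - 2) * (k - 4) * ... down to 1 (1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def n_unrooted_trees(n_taxa):
    """(2N - 5)!! bifurcating unrooted trees, for N >= 3."""
    return double_factorial(2 * n_taxa - 5)


def n_rooted_trees(n_taxa):
    """(2N - 3)!! bifurcating rooted trees, for N >= 2."""
    return double_factorial(2 * n_taxa - 3)


for n in (3, 4, 5, 6, 10, 25, 52):
    print(n, n_unrooted_trees(n), n_rooted_trees(n))
```

An unrooted tree on N taxa has 2N - 3 branches, and a root can be placed on any of them, which is why the rooted count is the unrooted count multiplied by 2N - 3.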
![Page 40: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/40.jpg)
![Page 41: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/41.jpg)
What is a “good” method?
• Efficiency: time to find a/the solution
• Power: rate of convergence/how much data are needed
• Consistency: convergence to “correct” solution as data are added
• Robustness: performance when assumptions are violated
• Falsifiability: rejection of the model when inadequate
![Page 42: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/42.jpg)
![Page 43: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/43.jpg)
![Page 44: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/44.jpg)
![Page 45: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/45.jpg)
Performance on simulated data
[Figure: two panels plotting frequency of correct inference (0 to 1) against sequence length (10 to 100000, log scale). Methods compared: Lake's invariants; parsimony, uniform; parsimony, weighted; UPGMA, Kimura; NJ, Kimura; NJ, percentage; ML, Kimura. Panel labels: “All 0.50” and “0.30 and 0.05 respectively”.]
![Page 46: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/46.jpg)
+ and – of the methods
Pair-wise, NJ, distance approach
+ Fast (efficiency)
+ Models can be used to make distances (can be consistent)
– Pairwise distances throw out information (loss of power)
– One will get a tree, but no score to compare with other trees or hypotheses
Parsimony and tree-search
+ Philosophically appealing: Occam’s razor
– Can be inconsistent
– Can be computationally slow due to a huge number of possible trees
Maximum likelihood and tree-search
+ Model-based, can be consistent, powerful, gain biological info
– Model-based, bad when you have the wrong model
– Computationally very slow due to heavy calculations in determining the tree score and a huge number of possible trees
![Page 47: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/47.jpg)
The quick and dirty, pretty good tree
• Calculate model-based pairwise distances
• Make a Neighbor-Joining tree
• Do a bootstrap
![Page 48: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/48.jpg)
A recipe for phylogenetic inference
• Collect your data
• Select an optimality criterion (“which tree is better?”)
• Optional: do data transformation (“corrections”)
• Select a search strategy to find the best tree
• Find the best hypothesis according to that criterion
• Assess the variation in your data in some way
![Page 53: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/53.jpg)
Assessing the variation
• Jackknife – resampling without replacement
• Bootstrap – resampling with replacement
1. Resample columns from an alignment with replacement to make a simulated sample of the same size
2. Analyze this resampled dataset in the same way as you did the original sample
3. Repeat this 100+ times, making 100+ bootstrap trees
4. Summarize, for example, as a majority-rule consensus tree
5. Clades found in more than 50% of the trees will be shown; a clade needs 70% to be called “weakly supported”
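The summary step above amounts to counting how often each clade appears across the bootstrap trees. The sketch below assumes each tree has already been reduced to a set of clades, with each clade a frozenset of taxon names; that representation, the function names and the four example trees are all hypothetical and chosen only to illustrate the counting.

```python
from collections import Counter


def clade_support(bootstrap_trees):
    """Fraction of bootstrap trees in which each clade appears.

    bootstrap_trees: list of trees, each a set of clades (frozensets of taxon names).
    """
    counts = Counter(clade for tree in bootstrap_trees for clade in tree)
    n_trees = len(bootstrap_trees)
    return {clade: count / n_trees for clade, count in counts.items()}


def majority_rule(bootstrap_trees, threshold=0.5):
    """Keep only clades found in more than `threshold` of the trees."""
    support = clade_support(bootstrap_trees)
    return {clade: s for clade, s in support.items() if s > threshold}


# Hypothetical bootstrap result: three trees group Aus with Beus, one groups Aus with Ceus.
AUS_BEUS = frozenset({"Aus", "Beus"})
AUS_CEUS = frozenset({"Aus", "Ceus"})
example_trees = [{AUS_BEUS}, {AUS_BEUS}, {AUS_BEUS}, {AUS_CEUS}]

consensus = majority_rule(example_trees)
print(consensus)
```

In this made-up example the Aus + Beus clade appears in 3 of 4 trees, so it is kept with 75% support, while the conflicting Aus + Ceus clade falls below the 50% cutoff and is dropped.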
![Page 54: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/54.jpg)
Bootstrap
Original data set with n characters (here columns 1 to 20):
Aus CGACGGTGGTCTATACACGA
Beus CGGCGGTGATCTATGCACGG
Ceus TGGCGGCGTCTCATACAATA
Deus TAACGATGACCCGACTATTG
Draw n characters randomly with replacement. Repeat m times. This gives m pseudo-replicates, each with n characters. One example draw uses columns 2 3 13 8 3 19 14 6 20 20 7 1 9 11 17 10 6 14 8 16:
Aus GAAGAGTGAATCGCATGTGC
Beus GGAGGGTGGGTCACATGTGC
Ceus GGAGGTTGAACTTTACGTGC
Deus AAGGATAAGGTTACACAAGT
Original analysis, e.g. MP, ML, NJ.
Repeat original analysis on each of the pseudo-replicate data sets.
Evaluate the results from the m analyses.
[Figure: the original tree of Aus, Beus, Ceus and Deus, the trees obtained from the pseudo-replicates, and a summary tree on which one clade is labeled 75%]
NB! The consensus tree is not a phylogenetic hypothesis, but a way to summarize other trees – in this case bootstrapped trees
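The resampling step of the bootstrap can be sketched in a few lines. The code below rebuilds the pseudo-replicate shown on the slide from its listed column numbers, then draws further replicates at random. The function names and the fixed random seed are assumptions for illustration; the tree-building step that would follow each draw is not shown.

```python
import random

# Original data set from the slide: four taxa, 20 characters.
ALIGNMENT = {
    "Aus": "CGACGGTGGTCTATACACGA",
    "Beus": "CGGCGGTGATCTATGCACGG",
    "Ceus": "TGGCGGCGTCTCATACAATA",
    "Deus": "TAACGATGACCCGACTATTG",
}

# Column numbers (1-based) drawn in the slide's example pseudo-replicate.
SLIDE_COLUMNS = [2, 3, 13, 8, 3, 19, 14, 6, 20, 20, 7, 1, 9, 11, 17, 10, 6, 14, 8, 16]


def resample(alignment, columns):
    """Build a pseudo-replicate by copying the given 1-based columns in order."""
    return {name: "".join(seq[c - 1] for c in columns)
            for name, seq in alignment.items()}


def bootstrap_columns(n_sites, rng):
    """Draw n_sites column numbers with replacement."""
    return [rng.randint(1, n_sites) for _ in range(n_sites)]


slide_replicate = resample(ALIGNMENT, SLIDE_COLUMNS)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
replicates = [resample(ALIGNMENT, bootstrap_columns(20, rng)) for _ in range(100)]

for name, seq in slide_replicate.items():
    print(name, seq)
```

Because whole columns are copied, every taxon in a pseudo-replicate gets the same sites in the same order; some original columns appear several times and others not at all.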
![Page 55: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/55.jpg)
Rooting
To root a tree mentally, imagine that the tree is made of string. Grab the string at the root and tug on it until the ends of the string (the taxa) fall opposite the root.
[Figure: an unrooted tree of A, B, C and D with the root position marked, and the resulting rooted tree with tips A, B, C and D]
Note that in this rooted tree, taxon A is no more closely related to taxon B than it is to C or D.
![Page 56: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/56.jpg)
Now, try it again with the root at another position:
[Figure: the same unrooted tree of A, B, C and D with the root marked at a different position, and the resulting rooted tree]
Note that in this rooted tree, taxon A is most closely related to taxon B, and together they are equally distantly related to taxa C and D.
![Page 57: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/57.jpg)
An unrooted, four-taxon tree can be rooted in five different places
The unrooted tree 1 has A and B on one side of the internal branch and C and D on the other; its five branches are numbered 1 to 5.
[Figure: rooting on branches 1, 2, 3, 4 and 5 gives rooted trees 1a, 1b, 1c, 1d and 1e, respectively]
![Page 58: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/58.jpg)
There are two major ways to root trees:
• Outgroup rooting: Uses taxa or sequences (the “outgroup”) known to fall outside all the others (the “ingroup”). Requires prior knowledge.
• Midpoint rooting: Roots the tree at the midway point between the two most distant taxa in the tree, as determined by branch lengths. Assumes clock-like evolution.
[Figure: tree of A, B, C and D with branch lengths 10, 2, 3, 5 and 2; one taxon is labeled as the outgroup]
d(A,D) = 10 + 3 + 5 = 18
Midpoint = 18 / 2 = 9
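The midpoint calculation above can be sketched as a search for the longest tip-to-tip path followed by a walk to its halfway point. The topology and the assignment of branch lengths below are assumptions reconstructed from the arithmetic on the slide (A = 10, internal branch = 3, D = 5, and 2 each for B and C); the internal node names "x" and "y" are invented for the example.

```python
from itertools import combinations

# Assumed tree: A and B attach to internal node "x", C and D to internal node "y".
EDGES = {
    ("A", "x"): 10,
    ("B", "x"): 2,
    ("x", "y"): 3,
    ("C", "y"): 2,
    ("D", "y"): 5,
}


def build_adjacency(edges):
    """Turn the edge list into a symmetric neighbor -> length lookup."""
    adjacency = {}
    for (u, v), length in edges.items():
        adjacency.setdefault(u, {})[v] = length
        adjacency.setdefault(v, {})[u] = length
    return adjacency


def find_path(adjacency, start, goal, visited=None):
    """Return the unique list of nodes from start to goal in a tree."""
    visited = visited or {start}
    if start == goal:
        return [start]
    for neighbor in adjacency[start]:
        if neighbor not in visited:
            rest = find_path(adjacency, neighbor, goal, visited | {neighbor})
            if rest:
                return [start] + rest
    return None


def midpoint_root(edges, tips):
    """Return (tip1, tip2, distance, (node_a, node_b, offset_from_node_a))."""
    adjacency = build_adjacency(edges)

    def path_length(path):
        return sum(adjacency[a][b] for a, b in zip(path, path[1:]))

    paths = [find_path(adjacency, a, b) for a, b in combinations(tips, 2)]
    longest = max(paths, key=path_length)
    total = path_length(longest)
    remaining = total / 2
    # Walk along the longest path until half of its length is used up.
    for a, b in zip(longest, longest[1:]):
        if remaining <= adjacency[a][b]:
            return longest[0], longest[-1], total, (a, b, remaining)
        remaining -= adjacency[a][b]


result = midpoint_root(EDGES, ["A", "B", "C", "D"])
print(result)
```

Under these assumed branch lengths the root lands on the branch leading to A, 9 units from the tip, because A's branch alone (10) is longer than half the A-to-D distance.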
![Page 59: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/59.jpg)
Each unrooted tree theoretically can be rooted anywhere along any of its branches
[Figure: unrooted trees for 4, 5 and 6 taxa; the number of unrooted trees multiplied by the number of branches per tree gives the number of rooted trees]
(2N - 3)!! = # rooted trees for N taxa
![Page 60: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/60.jpg)
We have arrived at a tree – can we trust it as a good hypothesis of the phylogeny?
What can go wrong?
• Sampling error
– Assessed by, for example, the bootstrap
• Too superficial tree search
– Remember: finding the best tree is really hard
• Systematic error (inconsistent method)
– Tests of the adequacy of models used
– Premeditated use of different methods
• Reality
– A tree may be a poor model of the real history
– Information has been lost by subsequent evolutionary changes
• “Species” vs. “gene” trees
![Page 61: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/61.jpg)
What is wrong with this tree?
[Figure: tree of Canis, Mus and Gadus with support values of 100 on its branches]
• Negligible (within sequence) sampling error
• Tree estimated by a consistent method
![Page 62: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/62.jpg)
The expected tree…
[Figure: a “species” tree with “gene” trees drawn inside it and a gene duplication marked]
![Page 63: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/63.jpg)
Two copies (paralogs) present in the genomes
[Figure: gene tree with tips Canis, Mus, Gadus, Gadus, Mus, Canis; the genes within each copy are labeled orthologous and the two copies are labeled paralogous]
![Page 65: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/65.jpg)
What we have studied…
[Figure: tree with tips Canis, Gadus and Mus]
Message: specific loss patterns of paralogs can disrupt species trees if we don’t know what is a paralog and what is an ortholog.
![Page 66: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/66.jpg)
To conclude
• Phylogenetic inference deals with historical events and information transfer through time
• Results from phylogenetic analyses are hypotheses for further testing; the true history will remain unknown
• Inference is mathematically intricate and computationally heavy, and as a result methods for phylogenetic inference are legion
• There are several pitfalls to avoid when doing the analyses and when interpreting them
• But… ignoring the shared histories can sometimes give completely bogus results in comparative studies
![Page 67: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson](https://reader030.vdocuments.mx/reader030/viewer/2022032805/56649eea5503460f94bfb75a/html5/thumbnails/67.jpg)
Phylogenetic trees diagram the evolutionary relationships between the taxa
((A,(B,C)),(D,E))
[Figure: the tree drawn with tips in the order Taxon A, Taxon B, Taxon C, Taxon E, Taxon D]
No meaning to the spacing between the taxa, or to the order in which they appear from top to bottom.
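One way to see that tip order carries no meaning is to reduce a parenthesized tree to its set of clades and compare. The sketch below is a minimal parser for strings like the one above (names, commas and parentheses only, with no branch lengths, whitespace or trailing semicolon); the function names and the two comparison trees are made up for illustration.

```python
def parse_newick(text):
    """Parse a simple parenthesized tree into nested tuples of tip names."""
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # skip "("
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(parse_node())
            pos += 1  # skip ")"
            return tuple(children)
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos]

    return parse_node()


def clades(node, found=None):
    """Return (tips under node, set of all clades found so far)."""
    if found is None:
        found = set()
    if isinstance(node, str):
        return frozenset([node]), found
    tips = frozenset().union(*(clades(child, found)[0] for child in node))
    found.add(tips)
    return tips, found


original = clades(parse_newick("((A,(B,C)),(D,E))"))[1]
rotated = clades(parse_newick("((E,D),((C,B),A))"))[1]      # same tree, branches rotated
different = clades(parse_newick("((B,(A,C)),(D,E))"))[1]    # a different grouping

print(original == rotated, original == different)
```

Rotating branches changes the written order of the tips but not the clade set, whereas moving a taxon to a different group does change it.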