An Introduction to Phylogenetics
> Sequence 1
GAGGTAGTAATTAGATCCGAAA…
> Sequence 2
GAGGTAGTAATTAGATCTGAAA…
> Sequence 3
GAGGTAGTAATTAGATCTGTCA…
Anton E. Weisstein
Indiana State University
March 11-14, 2004
Outline
I. Overview
II. Building and Interpreting Phylogenies
III. Evolutionary Inference
IV. Specific Applications
What is phylogenetics?
Phylogenetics is the study of evolutionary relationships.
Relationships among species:
[Figure: tree relating crocodiles, birds, lizards, snakes, rodents, primates, and marsupials.]
This is an example of a phylogenetic tree.
What is phylogenetics?
Relationships within species: HIV subtypes
[Figure: tree of HIV isolates labeled by country of origin (Rwanda, Ivory Coast, Uganda, U.S., Italy, U.K., India, Ethiopia, S. Africa, Tanzania, Romania, Brazil, Cameroon, Netherlands, Taiwan, Russia), grouped into subtypes A, B, C, D, F, and G.]
So what is phylogenetics good for?
Phylogenetics has direct applications to:
• Conservation: test wood, ivory, meat products for poaching
• Agriculture: analyze specific differences between cultivars
• Forensics: DNA fingerprinting
• Medicine: determine specific biochemical function of cancer-causing genes
HIV Example 1: Florida dentist case
1990 case: Did a patient’s HIV infection result from an invasive dental procedure performed by an HIV+ dentist?
Phylogenetic concepts: Interpreting a Phylogeny
[Figure: tree of Sequences A, B, C, D, and E drawn against a time axis.]
Which sequence is most closely related to B?
A, because B diverged from A more recently than from any other sequence.
Physical position in tree is not meaningful! Only tree structure matters.
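The point that only tree structure matters can be checked mechanically. The sketch below is illustrative only: the slide's figure did not survive extraction, so the topology in `TREE` is a hypothetical one chosen so that A and B are sister sequences, and the function names are assumptions.

```python
# Hypothetical rooted tree written as nested tuples. The topology is an
# assumption for illustration; it is not recovered from the original figure.
TREE = ((("A", "B"), "C"), ("D", "E"))

# The same tree with branches rotated: the drawing changes, the structure does not.
ROTATED = (("E", "D"), ("C", ("B", "A")))


def tips(node):
    """Return the set of tip names below a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(tips(child) for child in node))


def closest_relatives(node, target):
    """Return the tips that share the most recent common ancestor with target."""
    if isinstance(node, str):
        return None
    for child in node:
        if target in tips(child):
            deeper = closest_relatives(child, target)
            if deeper is not None:
                return deeper
            # child is the target tip itself, so this node is its most
            # recent ancestor; everything else below it is a closest relative.
            return tips(node) - {target}
    return None  # target is not in this subtree


print(closest_relatives(TREE, "B"))
print(closest_relatives(ROTATED, "B"))
```

Both calls give the same answer for B because rotating branches around a node changes where a tip is drawn but not which ancestor it shares with its neighbors.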
Phylogenetic concepts: Rooted and Unrooted Trees
[Figure: a rooted tree of A, B, C, and D, drawn against a time axis, is redrawn as an unrooted tree with the root position marked. A second unrooted tree of A, B, C, and D with a marked position X is paired with a rooted tree whose tips are shown as question marks.]
How Many Trees?

| # sequences | # pairwise distances | Unrooted: # trees | Unrooted: # branches/tree | Rooted: # trees | Rooted: # branches/tree |
|---|---|---|---|---|---|
| 3 | 3 | 1 | 3 | 3 | 4 |
| 4 | 6 | 3 | 5 | 15 | 6 |
| 5 | 10 | 15 | 7 | 105 | 8 |
| 6 | 15 | 105 | 9 | 945 | 10 |
| 10 | 45 | 2,027,025 | 17 | 34,459,425 | 18 |
| 30 | 435 | $8.69 \times 10^{36}$ | 57 | $4.95 \times 10^{38}$ | 58 |
| $N$ | $\frac{N(N-1)}{2}$ | $\frac{(2N-5)!}{2^{N-3}\,(N-3)!}$ | $2N-3$ | $\frac{(2N-3)!}{2^{N-2}\,(N-2)!}$ | $2N-2$ |
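The general formulas for N sequences can be checked numerically against the tabulated values. A minimal Python sketch follows; the function names are illustrative and not from the slides.

```python
from math import factorial


def n_pairwise(n):
    """Number of pairwise distances among n sequences: N(N - 1) / 2."""
    return n * (n - 1) // 2


def n_unrooted(n):
    """Number of unrooted bifurcating trees: (2N - 5)! / (2^(N-3) (N - 3)!)."""
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


def n_rooted(n):
    """Number of rooted bifurcating trees: (2N - 3)! / (2^(N-2) (N - 2)!)."""
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


for n in (3, 4, 5, 6, 10, 30):
    # Columns: sequences, pairwise distances, unrooted trees and branches,
    # rooted trees and branches.
    print(n, n_pairwise(n), n_unrooted(n), 2 * n - 3, n_rooted(n), 2 * n - 2)
```

Integer division is exact here because both formulas reduce to products of odd numbers, so Python's arbitrary-precision integers give the counts without rounding.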
Tree Types
[Figure: rooted tree of sharks, seahorses, frogs, owls, crocodiles, armadillos, and bats; scale bar: 50 million years.]
Evolutionary trees measure time.
[Figure: rooted tree of sharks, seahorses, frogs, owls, crocodiles, armadillos, and bats; scale bar: 5% change.]
Phylograms measure change.
Tree Properties
Ultrametricity: All tips are an equal distance from the root.
[Figure: rooted tree with tips X and Y and branches a, b, c, d, and e.]
a = b + c + d + e
Additivity: Distance between any two tips equals the total branch length between them.
[Figure: rooted tree with tips X and Y joined by a path through branches a, b, c, d, and e.]
XY = a + b + c + d + e
In simple scenarios, evolutionary trees are ultrametric and phylograms are additive.
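Both properties can be computed directly from branch lengths. The sketch below assumes a hypothetical four-tip tree stored as a child-to-parent map; the tip names W and Z, the internal node names, and every branch length are made up for illustration.

```python
# Hypothetical rooted tree: each node maps to (parent, branch length).
# All names and lengths here are assumptions, not values from the slides.
TREE = {
    "n1": ("root", 1.0),
    "X": ("root", 3.0),
    "n2": ("n1", 1.0),
    "W": ("n1", 2.0),
    "Y": ("n2", 1.0),
    "Z": ("n2", 1.0),
}


def tip_names(tree):
    """Tips are the nodes that never appear as a parent."""
    parents = {parent for parent, _ in tree.values()}
    return sorted(node for node in tree if node not in parents)


def root_distance(tree, node):
    """Sum the branch lengths from a node up to the root."""
    total = 0.0
    while node in tree:
        node, length = tree[node]
        total += length
    return total


def is_ultrametric(tree, tol=1e-9):
    """True if every tip is the same distance from the root."""
    distances = [root_distance(tree, tip) for tip in tip_names(tree)]
    return max(distances) - min(distances) <= tol


def tip_distance(tree, a, b):
    """Total branch length on the path between tips a and b (additivity)."""
    above_a = {a: 0.0}           # distance from a to each of its ancestors
    node, total = a, 0.0
    while node in tree:
        node, length = tree[node]
        total += length
        above_a[node] = total
    node, total = b, 0.0
    while node not in above_a:   # climb from b to the first shared ancestor
        node, length = tree[node]
        total += length
    return total + above_a[node]


print(is_ultrametric(TREE), tip_distance(TREE, "X", "Y"))
```

Lengthening a single branch, for example setting X's branch to 5.0, makes `is_ultrametric` return `False`, while `tip_distance` still returns the summed path length. That matches the contrast between a clock-like evolutionary tree and a phylogram.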
Tree Building Exercise
Ultrametricity: All tips are an equal distance from the root.
[Figure: rooted tree with tips X and Y and branches a, b, c, d, and e.]
a = b + c + d + e
Using the distance matrix given, construct an ultrametric tree.
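One standard way to build an ultrametric tree from a distance matrix is UPGMA (average-linkage clustering). The slides do not name a method for this exercise, and the exercise's matrix is not reproduced here, so the sketch below uses UPGMA as an assumed technique on a hypothetical four-taxon matrix.

```python
def upgma(dist):
    """Cluster tips by average linkage (UPGMA).

    dist maps each unordered pair of tip names, given as a 2-tuple, to a
    distance. Returns (tree, height): tree is a nested tuple, and height maps
    every cluster (tips included) to its distance above the tips.
    """
    d = {frozenset(pair): value for pair, value in dist.items()}
    size = {name: 1 for pair in d for name in pair}
    height = {name: 0.0 for name in size}
    while len(size) > 1:
        pair = min(d, key=d.get)           # the two closest clusters
        a, b = sorted(pair, key=repr)      # fixed order for reproducible output
        merged = (a, b)
        height[merged] = d[pair] / 2       # new node sits halfway between them
        merged_size = size[a] + size[b]
        for other in size:
            if other not in pair:
                # size-weighted average distance to the merged cluster
                d[frozenset((merged, other))] = (
                    size[a] * d[frozenset((a, other))]
                    + size[b] * d[frozenset((b, other))]
                ) / merged_size
        # drop every distance that still refers to the two merged clusters
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        del size[a], size[b]
        size[merged] = merged_size
    (tree,) = size
    return tree, height


# Hypothetical distance matrix (an assumption, not the one from the exercise).
DIST = {
    ("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
    ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10,
}

tree, height = upgma(DIST)
print(tree, height[tree])
```

Because each internal node is placed at half the distance between the clusters it joins, every tip ends up at the same distance from the root, which is the ultrametric property the exercise asks for. UPGMA recovers the true tree only when the data are close to ultrametric.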
Phylogenetic Methods
Many different procedures exist. Three of the most popular:
Neighbor-joining
• Minimizes distance between nearest neighbors
Maximum parsimony
• Minimizes total evolutionary change
Maximum likelihood
• Maximizes likelihood of observed data
Comparison of Methods

| Neighbor-joining | Maximum parsimony | Maximum likelihood |
|---|---|---|
| Uses only pairwise distances | Uses only shared derived characters | Uses all data |
| Minimizes distance between nearest neighbors | Minimizes total distance | Maximizes tree likelihood given specific parameter values |
| Very fast | Slow | Very slow |
| Easily trapped in local optima | Assumptions fail when evolution is rapid | Highly dependent on assumed evolution model |
| Good for generating tentative tree, or choosing among multiple trees | Best option when tractable (<30 taxa, homoplasy rare) | Good for very small data sets and for testing trees built using other methods |
Which procedure should we use?
Neighbor-joining, maximum parsimony, or maximum likelihood?
All that we can!
• Each method has its own strengths
• Use multiple methods for cross-validation
• In some cases, none of the three gives the correct phylogeny!
Phylogenetic concepts: Homology and Homoplasy
Homology: identical character due to shared ancestry (evolutionary signal)
Homoplasy: identical character due to evolutionary convergence or reversal (evolutionary noise)
[Figure, Homology: tree of lizards, snakes, rodents, and primates with one gain of hair (+hair).]
[Figure, Homoplasy (Convergence): tree of birds, snakes, rodents, and bats with two separate gains of flight (+flight, +flight).]
[Figure, Homoplasy (Reversal): tree of worms, lizards, and snakes with a gain of legs (+legs) and a later loss (–legs).]
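Why homoplasy counts as noise can be seen by counting changes on competing trees. The sketch below uses Fitch's algorithm, a standard way to count the minimum number of state changes for one character on a fixed tree. The slides do not specify this algorithm, and both topologies below are assumptions made for illustration.

```python
def fitch(node, states):
    """Return (candidate ancestral states, minimum changes) for one character.

    node is a tip name or a 2-tuple of subtrees; states maps tip name to state.
    """
    if isinstance(node, str):
        return {states[node]}, 0
    left, right = node
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # the two subtrees can agree on a state, so no change is needed here
        return shared, left_cost + right_cost
    # no shared state: at least one change must occur on one of the two branches
    return left_set | right_set, left_cost + right_cost + 1


# Character: flight (1 = present, 0 = absent), as in the convergence example.
FLIGHT = {"birds": 1, "snakes": 0, "rodents": 0, "bats": 1}

# Two hypothetical topologies (assumptions): one groups birds with snakes and
# rodents with bats, the other groups the two flying taxa together.
TREE_1 = (("birds", "snakes"), ("rodents", "bats"))
TREE_2 = (("birds", "bats"), ("snakes", "rodents"))

print(fitch(TREE_1, FLIGHT)[1], fitch(TREE_2, FLIGHT)[1])
```

On this single character, the tree that groups birds with bats needs fewer changes, so a method that minimizes total change would be pulled toward it. Adding more characters that carry homologous signal is what lets parsimony outvote a convergent one.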
Watching the Molecular Clock
Mutation occurs as a random (Poisson) process. If mutations accumulate at a constant rate over time and across all branches, the phylogeny is said to obey a molecular clock.
[Figure: tree of samples dated 2000, 2001, and 2002; axis: % genetic difference.]
BUT:
• Natural selection favors some mutations and eliminates others
• Selection varies over time and across lineages
[Figure: tree of samples dated 2000, 2001, and 2002; axis: % genetic difference.]
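The clock assumption can be illustrated with a small simulation. The sketch below draws Poisson-distributed mutation counts for lineages evolving at one constant rate; the rate, time spans, lineage count, and function names are all hypothetical, and selection is ignored entirely.

```python
import math
import random


def poisson(mean, rng):
    """Draw one Poisson variate (Knuth's multiplication method, small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_clock(rate_per_year, years, n_lineages, seed=0):
    """Mutation counts for lineages that all evolve at the same constant rate."""
    rng = random.Random(seed)
    return [poisson(rate_per_year * years, rng) for _ in range(n_lineages)]


# Hypothetical rate of 2 mutations per lineage per year.
for years in (1, 2):
    counts = simulate_clock(2.0, years, n_lineages=2000)
    print(years, sum(counts) / len(counts))
```

Under these assumptions the average count grows in proportion to elapsed time, while individual lineages scatter around that average. Rates that vary over time or across lineages, as the caveats above describe, would break this proportionality.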
Trees are hypotheses about evolutionary history
So far, we’ve looked at understanding and formulating these hypotheses. Now, let’s turn our attention to testing them.
Tree Testing: Split Decomposition
Split decomposition is one method for testing a tree.
Under this procedure, we choose exactly four taxa (A, B, C, D) and examine the topologies of all possible unrooted trees. How many such trees are there?
[Figure: the three possible unrooted topologies for four taxa: A and B versus C and D; A and D versus B and C; A and C versus B and D.]
Only one of these topologies is right. How can we quantitatively assess the support for each tree?
Tree Testing: Split Decomposition
The correct tree should be approximately additive; the others usually will not. For each tree, we calculate split indices that estimate the length of the internal branch:
[Figure: a split index written as sums and differences of path lengths among A, B, C, and D, divided by 2; the result equals the internal branch length if the topology being tested is the right phylogeny.]
Large split indices → long internal branch → topology strongly supported
Small split indices → short internal branch → topology weakly supported
Negative split indices → biologically impossible → topology probably wrong
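The idea can be made concrete for four taxa. The sketch below uses one common formulation based on additivity: for a topology that pairs (A, B) against (C, D), each index is half the difference between a sum of cross-pair distances and the sum of within-pair distances. The exact formula on the original slide is not recoverable, so treat this as an assumed version. The distances are hypothetical, generated from a tree with tip branches 1, 2, 3, 4 and an internal branch of 5.

```python
def split_indices(d, pair1, pair2):
    """Two estimates of the internal branch length for the split pair1 | pair2.

    d maps frozenset({x, y}) to the distance between taxa x and y.
    """
    (a, b), (c, e) = pair1, pair2
    within = d[frozenset((a, b))] + d[frozenset((c, e))]
    cross_1 = d[frozenset((a, c))] + d[frozenset((b, e))]
    cross_2 = d[frozenset((a, e))] + d[frozenset((b, c))]
    return (cross_1 - within) / 2, (cross_2 - within) / 2


# Hypothetical additive distances for a true tree grouping A with B and C with D.
D = {
    frozenset("AB"): 3, frozenset("CD"): 7,
    frozenset("AC"): 9, frozenset("BD"): 11,
    frozenset("AD"): 10, frozenset("BC"): 10,
}

for pair1, pair2 in ((("A", "B"), ("C", "D")),
                     (("A", "C"), ("B", "D")),
                     (("A", "D"), ("B", "C"))):
    print(pair1, pair2, split_indices(D, pair1, pair2))
```

With these distances, only the true grouping yields two positive indices, both equal to the internal branch length. Each incorrect grouping yields a negative index, which corresponds to the biologically impossible case.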
Tree Testing: Bootstrapping
Used to assess the support for individual branches
Randomly resample characters, with replacement, and rebuild the tree
Repeat many times (1000 or more)
How often does a specific branch appear?
[Figure: tree of rat, human, turtle, fruit fly, oak, and duckweed with bootstrap values of 100, 98, and 73 on its branches.]
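The resampling loop can be sketched in a few lines. In the sketch below the alignment is hypothetical, and the "tree building" step is a deliberately simple stand-in: for four taxa it picks the pairing with the smallest summed within-pair Hamming distance. A real analysis would rebuild a full tree with a method such as neighbor-joining for each replicate.

```python
import random

# Hypothetical four-taxon alignment (an assumption for illustration only).
ALIGNMENT = {
    "A": "AAGGCCTTAA",
    "B": "AAGGCCTTAC",
    "C": "TTGGAATTGA",
    "D": "TTGGAATTGC",
}

PAIRINGS = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]


def hamming(s, t):
    """Number of positions at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(s, t))


def best_split(alignment):
    """Pairing with the smallest within-pair distance, or None if tied."""
    scores = [
        hamming(alignment[a], alignment[b]) + hamming(alignment[c], alignment[d])
        for (a, b), (c, d) in PAIRINGS
    ]
    best = min(scores)
    if scores.count(best) > 1:
        return None  # unresolved replicate: counts as support for no split
    return PAIRINGS[scores.index(best)]


def resample_columns(alignment, rng):
    """Draw alignment columns (characters) at random, with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


def bootstrap_support(alignment, split, replicates=1000, seed=0):
    """Percentage of replicates in which the given split is recovered."""
    rng = random.Random(seed)
    hits = sum(best_split(resample_columns(alignment, rng)) == split
               for _ in range(replicates))
    return 100 * hits / replicates


print(bootstrap_support(ALIGNMENT, PAIRINGS[0]))
```

The printed percentage plays the same role as the values on a bootstrapped tree: it reports how consistently the characters support one grouping, not the probability that the grouping is true.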
Tree Testing: Bootstrapping
MacClade Example: Vertebrate evolution
HIV Example 1: Florida dentist case
• 1990 case: Did a patient’s HIV infection result from an invasive dental procedure performed by an HIV+ dentist?
• HIV evolves so fast that transmission patterns can be reconstructed from viral sequence (molecular forensics).
• Compared viral sequence from the dentist, three of his HIV+ patients, and two HIV+ local controls.
Florida dentist case
So what do the results mean?
• 2 of 3 patients are closer to the dentist than to the local controls. Is this statistically significant? Would more powerful analyses help?
• Do we have enough data to be confident in our conclusions? What additional data would help?
• If we determine that the dentist’s virus is linked to those of patients E and G, what are possible interpretations of this pattern? How could we test between them?