molecular phylogeny and evolution
![Page 1: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/1.jpg)
Molecular Phylogeny and Evolution
![Page 2: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/2.jpg)
Introduction to evolution and phylogeny
Nomenclature of trees
Five stages of molecular phylogeny:
[1] selecting sequences
[2] multiple sequence alignment
[3] models of substitution
[4] tree-building
[5] tree evaluation
Outline
![Page 3: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/3.jpg)
Studies of molecular evolution began with the first sequencing of proteins in the 1950s.
In 1953 Frederick Sanger and colleagues determined the primary amino acid sequence of insulin.
(The accession number of human insulin is NP_000198)
Historical background
Page 219
![Page 4: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/4.jpg)
Note the sequence divergence in the disulfide loop region of the A chain (Fig. 7.3)
Page 220
![Page 5: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/5.jpg)
By the 1950s, it became clear that amino acid substitutions occur nonrandomly. For example, Sanger and colleagues noted that most amino acid changes in the insulin A chain are restricted to a disulfide loop region. Such differences are called “neutral” changes (Kimura, 1968; Jukes and Cantor, 1969).
Subsequent studies at the DNA level showed that the rate of nucleotide (and of amino acid) substitution is about six- to ten-fold higher in the C peptide, relative to the A and B chains.
Historical background: insulin
Page 219
![Page 6: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/6.jpg)
[Figure: number of nucleotide substitutions/site/year along the insulin precursor: about 0.1 x 10^-9 for the A and B chains and about 1 x 10^-9 for the C peptide]
Fig. 7.3, Page 220
![Page 7: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/7.jpg)
Surprisingly, insulin from the guinea pig (and from the related coypu) evolves seven times faster than insulin from other species. Why?
The answer is that guinea pig and coypu insulin do not bind two zinc ions, while insulin molecules from most other species do. The structural constraints on these molecules were relaxed, and so the genes diverged rapidly.
Historical background: insulin
Page 219
![Page 8: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/8.jpg)
Fig. 7.3, Page 220
Guinea pig and coypu insulin have undergone an extremely rapid rate of evolutionary change
Arrows indicate positions at which guinea pig insulin (A chain and B chain) differs from both human and mouse
![Page 9: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/9.jpg)
In the 1960s, sequence data were accumulated for small, abundant proteins such as globins, cytochromes c, and fibrinopeptides. Some proteins appeared to evolve slowly, while others evolved rapidly.
Linus Pauling, Emanuel Margoliash and others proposed the hypothesis of a molecular clock:
For every given protein, the rate of molecular evolution is approximately constant in all evolutionary lineages
Molecular clock hypothesis
Page 221
![Page 10: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/10.jpg)
As an example, Richard Dickerson (1971) plotted data from three protein families: cytochrome c, hemoglobin, and fibrinopeptides.
The x-axis shows the divergence times of the species, estimated from paleontological data. The y-axis shows m, the corrected number of amino acid changes per 100 residues.
n is the observed number of amino acid changes per 100 residues, and it is corrected to m to account for changes that occur but are not observed.
Molecular clock hypothesis
Page 222
$$\frac{n}{100} = 1 - e^{-(m/100)}$$
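The correction can be inverted to recover m from an observed n. The following is a minimal sketch of that rearrangement, m = -100 ln(1 - n/100); the function name is hypothetical and is not part of the original slides.

```python
import math

def corrected_changes(n: float) -> float:
    """Return m, the corrected number of changes per 100 residues,
    from n, the observed number of changes per 100 residues,
    by inverting n/100 = 1 - exp(-m/100)."""
    if not 0 <= n < 100:
        raise ValueError("n must satisfy 0 <= n < 100")
    return -100.0 * math.log(1.0 - n / 100.0)

# The correction grows with divergence: m is always at least as large as n.
m_for_half_changed = corrected_changes(50)
```

For small n the correction is negligible; as n approaches 100 the corrected value m grows without bound, which reflects multiple unobserved changes at the same site.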
![Page 11: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/11.jpg)
Dickerson drew the following conclusions:
• For each protein, the data lie on a straight line. Thus, the rate of amino acid substitution has remained constant for each protein.
• The average rate of change differs for each protein. The time for a 1% change to occur between two lines of evolution is 20 MY (cytochrome c), 5.8 MY (hemoglobin), and 1.1 MY (fibrinopeptides).
• The observed variations in rate of change reflect functional constraints imposed by natural selection.
Molecular clock hypothesis: conclusions
Page 223
![Page 12: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/12.jpg)
Molecular clock for proteins: rate of substitutions per amino acid site per 10^9 years

| Protein | Rate |
|---|---|
| Fibrinopeptides | 9.0 |
| Kappa casein | 3.3 |
| Lactalbumin | 2.7 |
| Serum albumin | 1.9 |
| Lysozyme | 0.98 |
| Trypsin | 0.59 |
| Insulin | 0.44 |
| Cytochrome c | 0.22 |
| Histone H2B | 0.09 |
| Ubiquitin | 0.010 |
| Histone H4 | 0.010 |

Table 7-1, Page 223
![Page 13: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/13.jpg)
If protein sequences evolve at constant rates, they can be used to estimate the times that sequences diverged. This is analogous to dating geological specimens by radioactive decay.
Molecular clock hypothesis: implications
Page 225
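As a rough illustration of dating with a clock, the sketch below divides the number of substitutions per site separating two sequences by twice the rate. It assumes the rate is expressed per site per year along a single lineage, so both lineages contribute changes after the split; that assumption, the function name, and the input values are illustrative and are not taken from the slides.

```python
def divergence_time_years(k: float, rate_per_site_per_year: float) -> float:
    """Estimate the time since two sequences diverged.

    k: substitutions per site separating the two sequences (corrected distance).
    rate_per_site_per_year: substitution rate along one lineage (assumed).
    The factor of 2 accounts for change accumulating on both lineages.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return k / (2.0 * rate_per_site_per_year)

# Hypothetical input: a 1% difference at a cytochrome c-like rate of
# 0.22 substitutions per site per 10^9 years.
t = divergence_time_years(0.01, 0.22e-9)
```

With these inputs the estimate is on the order of 20 million years, the same order as the 20 MY that Dickerson reported for a 1% change in cytochrome c.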
![Page 14: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/14.jpg)
There are two main kinds of information inherent to any tree: topology and branch lengths.
We will now describe the parts of a tree.
Molecular phylogeny: nomenclature of trees
Page 231
![Page 15: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/15.jpg)
[Figure: a tree with nodes A through I shown two ways: against a time axis, and with branch lengths drawn to scale (scale bar: one unit)]
Molecular phylogeny uses trees to depict evolutionary relationships among organisms. These trees are based upon DNA and protein sequence data.
Fig. 7.8, Page 232
![Page 16: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/16.jpg)
[Figure: a tree with nodes A through I shown two ways: against a time axis, and with branch lengths drawn to scale (scale bar: one unit)]
Tree nomenclature
taxon
operational taxonomic unit (OTU) such as a protein sequence
Fig. 7.8, Page 232
![Page 17: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/17.jpg)
[Figure: a tree with nodes A through I shown two ways: against a time axis, and with branch lengths drawn to scale (scale bar: one unit)]
Tree nomenclature
branch (edge)
node (intersection or terminating point of two or more branches)
Fig. 7.8, Page 232
![Page 18: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/18.jpg)
[Figure: a tree with nodes A through I shown two ways: against a time axis, and with branch lengths drawn to scale (scale bar: one unit)]
Tree nomenclature
Branches are unscaled: OTUs are neatly aligned, and nodes reflect time.
Branches are scaled: branch lengths are proportional to the number of amino acid changes.
Fig. 7.8, Page 232
![Page 19: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/19.jpg)
[Figure: a tree with nodes A through I shown two ways: against a time axis, and with branch lengths drawn to scale (scale bar: one unit)]
Tree nomenclature
bifurcating internal node
multifurcating internal node
Fig. 7.9, Page 233
![Page 20: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/20.jpg)
Examples of multifurcation: failure to resolve the branching order of some metazoans and protostomes
Rokas A. et al., Animal Evolution and the Molecular Signature of Radiations Compressed in Time, Science 310:1933 (2005), Fig. 1.
![Page 21: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/21.jpg)
[Figure: a tree with nodes A through I against a time axis]
Tree nomenclature: clades
Clade ABF (monophyletic group)
Fig. 7.8, Page 232
![Page 22: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/22.jpg)
[Figure: a tree with nodes A through I against a time axis]
Tree nomenclature
Clade CDH
Fig. 7.8, Page 232
![Page 23: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/23.jpg)
[Figure: a tree with nodes A through I against a time axis]
Tree nomenclature
Clade ABF/CDH/G
Fig. 7.8, Page 232
![Page 24: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/24.jpg)
Examples of clades
Lindblad-Toh et al., Nature 438: 803 (2005), fig. 10
![Page 25: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/25.jpg)
The root of a phylogenetic tree represents the common ancestor of the sequences. Some trees are unrooted, and thus do not specify the common ancestor.
A tree can be rooted using an outgroup (that is, a taxon known to be distantly related to all other OTUs).
Tree roots
Page 233
![Page 26: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/26.jpg)
Tree nomenclature: roots
[Figure: a rooted tree (nodes 1 to 9, with a time axis running from past to present) beside an unrooted tree (nodes 1 to 8)]
Rooted tree (specifies evolutionary path)
Unrooted tree
Fig. 7.10, Page 234
![Page 27: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/27.jpg)
Tree nomenclature: outgroup rooting
[Figure: a rooted tree (nodes 1 to 9, with a time axis running from past to present) beside a tree in which an added outgroup is used to place the root]
Rooted tree
Outgroup (used to place the root)
Fig. 7.10, Page 234
![Page 28: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/28.jpg)
Cavalli-Sforza and Edwards (1967) derived the number of possible unrooted trees (NU) for n OTUs (n ≥ 3):
$$N_U = \frac{(2n-5)!}{2^{n-3}\,(n-3)!}$$
The number of bifurcating rooted trees (NR) is:
$$N_R = \frac{(2n-3)!}{2^{n-2}\,(n-2)!}$$
For 10 OTUs (e.g. 10 DNA or protein sequences), the number of possible rooted trees is about 34 million, and the number of unrooted trees is about 2 million. Many tree-making algorithms can exhaustively examine every possible tree for up to ten to twelve sequences.
Enumerating trees
Page 235
![Page 29: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/29.jpg)
Numbers of possible trees extremely large for >10 sequences
| Number of OTUs | Number of rooted trees | Number of unrooted trees |
|---|---|---|
| 2 | 1 | 1 |
| 3 | 3 | 1 |
| 4 | 15 | 3 |
| 5 | 105 | 15 |
| 10 | 34,459,425 | 2,027,025 |
| 20 | 8 x 10^21 | 2 x 10^20 |
Page 235
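The counts of possible trees follow directly from the two factorial formulas for bifurcating trees. This is a small sketch for checking them; the function names are hypothetical.

```python
from math import factorial

def num_unrooted_trees(n: int) -> int:
    """Number of bifurcating unrooted trees for n OTUs (n >= 3)."""
    if n < 3:
        raise ValueError("formula applies for n >= 3")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))

def num_rooted_trees(n: int) -> int:
    """Number of bifurcating rooted trees for n OTUs (n >= 2)."""
    if n < 2:
        raise ValueError("formula applies for n >= 2")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))

# Tabulate a few values to see how quickly the search space grows.
for n in (3, 4, 5, 10, 20):
    print(n, num_rooted_trees(n), num_unrooted_trees(n))
```

Integer arithmetic keeps the results exact, which matters because the counts exceed floating-point precision well before 20 OTUs.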
![Page 30: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/30.jpg)
[1] Selection of sequences for analysis
[2] Multiple sequence alignment
[3] Selection of a substitution model
[4] Tree building
[5] Tree evaluation
Five stages of phylogenetic analysis
Page 243
![Page 31: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/31.jpg)
For some phylogenetic studies, it may be preferable to use protein instead of DNA sequences. We saw that in pairwise alignment and in BLAST searching, protein is often more informative than DNA (Chapter 3).
Stage 1: Use of DNA, RNA, or protein
Page 240
![Page 32: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/32.jpg)
For phylogeny, DNA can be more informative.
--The protein-coding portion of DNA has synonymous and nonsynonymous substitutions. Thus, some DNA changes do not have corresponding protein changes.
Stage 1: Use of DNA, RNA, or protein
Page 240
![Page 33: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/33.jpg)
For phylogeny, DNA can be more informative.
--The protein-coding portion of DNA has synonymous and nonsynonymous substitutions. Thus, some DNA changes do not have corresponding protein changes.
If the synonymous substitution rate (dS) is greater than the nonsynonymous substitution rate (dN), the DNA sequence is under negative (purifying) selection. This limits change in the sequence (e.g. insulin A chain).
If dS < dN, positive selection occurs. For example, a duplicated gene may evolve rapidly to assume new functions.
Stage 1: Use of DNA, RNA, or protein
Page 230
![Page 34: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/34.jpg)
For phylogeny, DNA can be more informative.
--Some substitutions in a DNA sequence alignment can be directly observed: single nucleotide substitutions, sequential substitutions, coincidental substitutions.
Stage 1: Use of DNA, RNA, or protein
Page 241
![Page 35: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/35.jpg)
For phylogeny, DNA can be more informative.
--Noncoding regions (such as 5’ and 3’ untranslated regions) may be analyzed using molecular phylogeny.
--Pseudogenes (nonfunctional genes) are studied by molecular phylogeny.
--Rates of transitions and transversions can be measured.
Transitions: purine (A ↔ G) or pyrimidine (C ↔ T) substitutions
Transversions: purine ↔ pyrimidine
Stage 1: Use of DNA, RNA, or protein
Page 242
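Counting transitions and transversions between two aligned DNA sequences can be sketched as follows. The function name and the choice to skip gaps and ambiguous bases are assumptions made for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def count_ts_tv(seq1: str, seq2: str) -> tuple[int, int]:
    """Return (transitions, transversions) between two aligned DNA sequences.
    Positions with a gap or a non-ACGT symbol in either sequence are skipped."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1    # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            transversions += 1  # purine <-> pyrimidine
    return transitions, transversions

ts, tv = count_ts_tv("AACT", "GTTT")
```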
![Page 36: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/36.jpg)
For phylogeny, protein sequences are also often used.
--Proteins have 20 states (amino acids) instead of only four for DNA, so there is a stronger phylogenetic signal.
Nucleotides are unordered characters: any one nucleotide can change to any other in one step.
An ordered character must pass through one or more intermediate states before reaching the final state.
Amino acid sequences are partially ordered character states: there is a variable number of states between the starting value and the final value.
Stage 1: Use of DNA, RNA, or protein
Page 243
![Page 37: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/37.jpg)
[1] Selection of sequences for analysis
[2] Multiple sequence alignment
[3] Selection of a substitution model
[4] Tree building
[5] Tree evaluation
Five stages of phylogenetic analysis
Page 244
![Page 38: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/38.jpg)
The fundamental basis of a phylogenetic tree is a multiple sequence alignment.
(If there is a misalignment, or if a nonhomologous sequence is included in the alignment, it will still be possible to generate a tree.)
Consider the following alignment of orthologous globins (see Fig. 3.2)
Stage 2: Multiple sequence alignment
Page 244
![Page 39: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/39.jpg)
[1] Confirm that all sequences are homologous
[2] Adjust gap creation and extension penalties as needed to optimize the alignment
[3] Restrict phylogenetic analysis to regions of the multiple sequence alignment for which data are available for all taxa (delete columns having incomplete data or gaps).
Stage 2: Multiple sequence alignment
Page 244
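Step [3] above, restricting the analysis to columns with data for all taxa, might look like the following sketch. The alignment is represented as a plain dict of name to aligned string, and the set of characters treated as gaps or missing data is an assumption.

```python
def drop_incomplete_columns(alignment: dict[str, str], missing: str = "-.?") -> dict[str, str]:
    """Keep only the alignment columns in which every taxon has a residue.

    alignment: maps a taxon name to its aligned sequence (all equal length).
    missing: characters treated as gaps or missing data (assumed set).
    """
    seqs = list(alignment.values())
    if not seqs:
        return {}
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    keep = [i for i in range(len(seqs[0])) if all(s[i] not in missing for s in seqs)]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}

trimmed = drop_incomplete_columns({"a": "MK-LV", "b": "MKTLV", "c": "M-TLV"})
```

Removing columns this way discards some information, so it is worth checking how much of the alignment survives before building a tree.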
![Page 40: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/40.jpg)
[1] Selection of sequences for analysis
[2] Multiple sequence alignment
[3] Selection of a substitution model
[4] Tree building
[5] Tree evaluation
Five stages of phylogenetic analysis
Page 246
![Page 41: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/41.jpg)
Stage 4: Tree-building methods: distance
Page 247
The simplest approach to measuring distances between sequences is to align pairs of sequences, and then to count the number of differences. The number of differences is the Hamming distance. For an alignment of length N with n sites at which there are differences, the degree of divergence D (the proportion of differing sites) is:
D = n / N
But observed differences do not equal genetic distance! Genetic distance involves mutations that are not observed directly.
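The sketch below computes D = n / N and then applies one common correction for unobserved changes, the Jukes-Cantor distance for DNA, d = -(3/4) ln(1 - 4D/3). The slides do not say which correction to use; Jukes-Cantor is chosen here only as a simple illustration, and the function names are hypothetical.

```python
import math

def p_distance(seq1: str, seq2: str) -> float:
    """Observed proportion of differing sites, D = n / N."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned and non-empty")
    n = sum(1 for a, b in zip(seq1, seq2) if a != b)
    return n / len(seq1)

def jukes_cantor_distance(p: float) -> float:
    """Estimated substitutions per site under the Jukes-Cantor model.
    Defined only for p < 0.75, where the observed differences saturate."""
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

d_observed = p_distance("AACGT", "AACGA")
d_corrected = jukes_cantor_distance(d_observed)
```

The corrected distance is always at least as large as the observed proportion, and the gap widens as the sequences diverge.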
![Page 42: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/42.jpg)
Distance-based methods involve a distance metric, such as the number of amino acid changes between the sequences, or a distance score. Examples of distance-based algorithms are UPGMA and neighbor-joining.
Character-based methods include maximum parsimony and maximum likelihood. Parsimony analysis involves the search for the tree with the fewest amino acid (or nucleotide) changes that account for the observed differences between taxa.
Stage 4: Tree-building methods
Page 254
![Page 43: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/43.jpg)
We can introduce distance-based and character-based tree-building methods by referring to a group of orthologous globin proteins.
Stage 4: Tree-building methods
Page 255
![Page 44: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/44.jpg)
[1] distance-based
[2] character-based: maximum parsimony
Stage 4: Tree-building methods
![Page 45: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/45.jpg)
Tree-building methods: UPGMA
UPGMA is unweighted pair group method using arithmetic mean
[Figure: five proteins, labeled 1 to 5]
Fig. 7.26, Page 257
![Page 46: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/46.jpg)
Tree-building methods: UPGMA
Step 1: compute the pairwise distances of all the proteins. Get ready to put the numbers 1-5 at the bottom of your new tree.
[Figure: five proteins, labeled 1 to 5]
Fig. 7.26, Page 257
![Page 47: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/47.jpg)
Tree-building methods: UPGMA
Step 2: Find the two proteins with the smallest pairwise distance. Cluster them.
[Figure: proteins 1 to 5; proteins 1 and 2 are joined at node 6]
Fig. 7.26, Page 257
![Page 48: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/48.jpg)
Tree-building methods: UPGMA
Step 3: Do it again. Find the next two proteins with the smallest pairwise distance. Cluster them.
[Figure: proteins 1 to 5; proteins 1 and 2 are joined at node 6, and proteins 4 and 5 are joined at node 7]
Fig. 7.26, Page 257
![Page 49: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/49.jpg)
Tree-building methods: UPGMA
Step 4: Keep going. Cluster.
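[Figure: proteins 1 to 5; nodes 6 (1 and 2) and 7 (4 and 5) as before, with protein 3 joined to an existing cluster at node 8]
Fig. 7.26, Page 257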
![Page 50: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/50.jpg)
Tree-building methods: UPGMA
Step 5: Last cluster! This is your tree.
[Figure: proteins 1 to 5; the completed tree with internal nodes 6, 7, and 8 joined at the root, node 9]
Fig. 7.26, Page 257
![Page 51: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/51.jpg)
UPGMA is a simple approach for making trees.
• A UPGMA tree is always rooted.
• An assumption of the algorithm is that the molecular clock is constant for sequences in the tree. If there are unequal substitution rates, the tree may be wrong.
• While UPGMA is simple, it is less accurate than the neighbor-joining approach (described next).
Distance-based methods: UPGMA trees
Page 256
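The clustering steps of UPGMA can be sketched in a few lines. This is a minimal illustration and not a production implementation: the data layout (a dict keyed by frozenset pairs), the nested-tuple tree, and the four-OTU distances are all assumptions made for the example.

```python
from itertools import combinations

def upgma(names, dist):
    """Cluster OTUs by UPGMA.

    names: list of OTU labels (strings).
    dist:  dict mapping frozenset({a, b}) to the pairwise distance.
    Returns a nested tuple (left, right, height); leaves are plain labels.
    """
    # Each active cluster key maps to (subtree, number of OTUs it contains).
    clusters = {name: (name, 1) for name in names}
    d = dict(dist)  # working copy; merged clusters get new entries
    while len(clusters) > 1:
        # Find the two closest clusters and join them.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((a, b))] / 2  # new node sits halfway between them
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        merged = (a, b)
        # Distance to every remaining cluster is the size-weighted mean,
        # i.e. the arithmetic mean over all member OTUs.
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = ((tree_a, tree_b, height), size_a + size_b)
    tree, _size = next(iter(clusters.values()))
    return tree

# Hypothetical clock-like distances for four OTUs.
pairs = {("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
         ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10}
tree = upgma(["A", "B", "C", "D"], {frozenset(p): v for p, v in pairs.items()})
```

Because every new node is placed at half the distance between the clusters it joins, the result is always rooted and ultrametric, which is the molecular clock assumption noted above.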
![Page 52: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/52.jpg)
The neighbor-joining method of Saitou and Nei (1987) is especially useful for making a tree having a large number of taxa.
Begin by placing all the taxa in a star-like structure.
Making trees using neighbor-joining
Page 259
![Page 53: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/53.jpg)
Page 259
Tree-building methods: Neighbor joining
Next, identify neighbors (e.g. 1 and 2) that are most closely related. Connect these neighbors to other OTUs via an internal branch, XY. At each successive stage, minimize the sum of the branch lengths.
![Page 54: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/54.jpg)
Tree-building methods: Neighbor joining
Define the distance from X to Y by
$$d_{XY} = \frac{1}{2}\left(d_{1Y} + d_{2Y} - d_{12}\right)$$
Page 259
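The distance formula is easy to check on a small additive example. In the sketch below the branch lengths are hypothetical: OTU 1 is 1 unit from node X, OTU 2 is 3 units from X, and X is 2 units from Y, so the pairwise distances are d12 = 4, d1Y = 3, and d2Y = 5.

```python
def distance_from_joined_node(d_1y: float, d_2y: float, d_12: float) -> float:
    """Distance from X, the node joining neighbors 1 and 2, to Y:
    d_XY = (d_1Y + d_2Y - d_12) / 2."""
    return 0.5 * (d_1y + d_2y - d_12)

# Pairwise distances implied by the hypothetical branch lengths above.
d_xy = distance_from_joined_node(d_1y=3, d_2y=5, d_12=4)
```

The formula recovers the internal branch length of 2 because the two paths from 1 and 2 to Y together cross the branch XY twice and the path between 1 and 2 once each way.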
![Page 55: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/55.jpg)
Example of a neighbor-joining tree: phylogenetic analysis of 13 RBPs
![Page 56: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/56.jpg)
We will discuss two tree-building methods:
[1] distance-based
[2] character-based: maximum parsimony
Stage 4: Tree-building methods
![Page 57: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/57.jpg)
Tree-building methods: character based
Rather than pairwise distances between proteins, evaluate the aligned columns of amino acid residues (characters).
Page 260
![Page 58: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/58.jpg)
The main idea of character-based methods is to find the tree with the shortest branch lengths possible. Thus we seek the most parsimonious (“simple”) tree.
• Identify informative sites. For example, constant characters are not parsimony-informative.
• Construct trees, counting the number of changes required to create each tree. For about 12 taxa or fewer, evaluate all possible trees exhaustively; for >12 taxa perform a heuristic search.
• Select the shortest tree (or trees).
Making trees using character-based methods
Page 260
![Page 59: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/59.jpg)
As an example of tree-building using maximum parsimony, consider these four taxa:
AAG, AAA, GGA, AGA
How might they have evolved from a common ancestor such as AAA?
Page 261
![Page 60: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/60.jpg)
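Tree-building methods: Maximum parsimony
[Figure: three candidate trees for the taxa AAG, AAA, GGA, and AGA, each descending from the ancestor AAA, with the number of changes marked on the branches]
Tree 1: (AAG, AAA) and (GGA, AGA), cost = 3
Tree 2: (AAG, AGA) and (AAA, GGA), cost = 4
Tree 3: (AAG, GGA) and (AAA, AGA), cost = 4
In maximum parsimony, choose the tree(s) with the lowest cost (shortest branch lengths).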
Page 261
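The costs of the three candidate trees can be reproduced with Fitch's algorithm, a standard way to count the minimum number of changes on a fixed tree. The slides do not name an algorithm, so this is a swapped-in technique. Unlike the slide, the sketch leaves the ancestral sequence unconstrained; for these four taxa the costs come out the same.

```python
def fitch_site(tree, i):
    """Return (possible ancestral states, number of changes) for column i.
    A tree is either a leaf (the sequence string itself) or a (left, right) tuple."""
    if isinstance(tree, str):
        return {tree[i]}, 0
    left_states, left_cost = fitch_site(tree[0], i)
    right_states, right_cost = fitch_site(tree[1], i)
    shared = left_states & right_states
    if shared:
        # The children agree on at least one state: no change needed here.
        return shared, left_cost + right_cost
    # The children disagree: one change is required at this node.
    return left_states | right_states, left_cost + right_cost + 1

def first_leaf(tree):
    """Return the leftmost leaf, used only to read the alignment length."""
    return tree if isinstance(tree, str) else first_leaf(tree[0])

def parsimony_cost(tree):
    """Total number of changes over all alignment columns."""
    return sum(fitch_site(tree, i)[1] for i in range(len(first_leaf(tree))))

trees = [
    (("AAG", "AAA"), ("GGA", "AGA")),
    (("AAG", "AGA"), ("AAA", "GGA")),
    (("AAG", "GGA"), ("AAA", "AGA")),
]
costs = [parsimony_cost(t) for t in trees]  # [3, 4, 4]
```

The first tree has the lowest cost and is therefore the most parsimonious of the three.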
![Page 61: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/61.jpg)
We will discuss two tree-building methods:
[1] distance-based
[2] character-based: maximum parsimony
Stage 4: Tree-building methods
![Page 62: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/62.jpg)
[1] Selection of sequences for analysis
[2] Multiple sequence alignment
[3] Selection of a substitution model
[4] Tree building
[5] Tree evaluation
Five stages of phylogenetic analysis
Page 266
![Page 63: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/63.jpg)
Bootstrapping is a commonly used approach to measuring the robustness of a tree topology. Given a branching order, how consistently does an algorithm find that branching order in a randomly resampled version of the original data set?
To bootstrap, make an artificial dataset by randomly sampling columns, with replacement, from your multiple sequence alignment. Make the dataset the same size as the original. Do 100 (to 1,000) bootstrap replicates. Observe the percent of cases in which the assignment of clades in the original tree is supported by the bootstrap replicates. Support above 70% is commonly considered significant.
Stage 5: Evaluating trees: bootstrapping
Page 266
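The resampling step can be sketched as below. The alignment is a plain dict of name to aligned string, and `build_clades` is a hypothetical callable, supplied by the caller, that builds a tree from an alignment and returns its clades as a set of frozensets of taxon names; no particular tree-building library is assumed.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the original width."""
    names = list(alignment)
    width = len(alignment[names[0]])
    columns = [rng.randrange(width) for _ in range(width)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}

def clade_support(alignment, build_clades, clade, replicates=100, seed=0):
    """Percent of bootstrap replicates whose tree contains the given clade.

    build_clades: hypothetical function mapping an alignment to a set of
    frozensets of taxon names (one frozenset per clade in the inferred tree).
    """
    rng = random.Random(seed)
    hits = sum(
        clade in build_clades(bootstrap_alignment(alignment, rng))
        for _ in range(replicates)
    )
    return 100.0 * hits / replicates
```

Fixing the seed makes the replicates reproducible, which helps when comparing support values between runs.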
![Page 64: Molecular Phylogeny and Evolution](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56812bd6550346895d904021/html5/thumbnails/64.jpg)
In 61% of the bootstrap resamplings, ssrbp and btrbp (pig and cow RBP) formed a distinct clade. In 39% of the cases, another protein joined the clade (e.g. ecrbp), or one of these two sequences joined another clade.