plants
![Page 1: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/1.jpg)
![Page 2: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/2.jpg)
Five kingdom system (Haeckel, 1879)
[Figure: tree of life showing the kingdoms plants, animals, fungi, protists, and monera; other labels: protozoa, invertebrates, vertebrates, mammals]
![Page 3: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/3.jpg)
We will use MEGA to make phylogenetic trees
![Page 4: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/4.jpg)
Open the alignment editor…
Choose DNA or protein…
Paste in sequences in the fasta format or as a multiple sequence alignment…
![Page 5: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/5.jpg)
You can use a set of protein or DNA sequences in the fasta format obtained from HomoloGene
![Page 6: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/6.jpg)
Use MEGA to make phylogenetic trees
Trees show the evolutionary relationships among proteins, or DNA sequences, or species…
![Page 7: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/7.jpg)
Charles Darwin’s 1859 book (On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life) introduced the theory of evolution.
To Darwin, the struggle for existence induces a natural selection. Offspring are dissimilar from their parents (that is, variability exists), and individuals that are more fit for a given environment are selected for. In this way, over long periods of time, species evolve. Groups of organisms change over time so that descendants differ structurally and functionally from their ancestors.
Introduction
![Page 8: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/8.jpg)
At the molecular level, evolution is a process of mutation with selection.
Molecular evolution is the study of changes in genes and proteins throughout different branches of the tree of life.
Phylogeny is the inference of evolutionary relationships. Traditionally, phylogeny relied on the comparison of morphological features between organisms. Today, molecular sequence data are also used for phylogenetic analyses.
Introduction
![Page 9: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/9.jpg)
Goals of molecular phylogeny
Phylogeny can answer questions such as:
• How many genes are related to my favorite gene?
• How related are whales, dolphins & porpoises to cows?
• Where and when did HIV or other viruses originate?
• What is the history of life on earth?
• Was the extinct quagga more like a zebra or a horse?
![Page 10: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/10.jpg)
In the 1960s, sequence data were accumulated for small, abundant proteins such as globins, cytochromes c, and fibrinopeptides. Some proteins appeared to evolve slowly, while others evolved rapidly.
Linus Pauling, Emanuel Margoliash and others proposed the hypothesis of a molecular clock:
For every given protein, the rate of molecular evolution is approximately constant in all evolutionary lineages.
Implications: If protein sequences evolve at constant rates, they can be used to estimate the times that sequences diverged. This is analogous to dating geological specimens by radioactive decay.
Molecular clock hypothesis
![Page 11: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/11.jpg)
[Figure: two example trees. Left: taxa A to I drawn against a time axis. Right: taxa A to E with a scale bar of one unit. Numbers on the branches are branch lengths.]
Molecular phylogeny uses trees to depict evolutionary relationships among organisms. These trees are based upon DNA and protein sequence data.
![Page 12: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/12.jpg)
Tree nomenclature
[Figure: the two example trees; label: taxon]
![Page 13: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/13.jpg)
Tree nomenclature
[Figure: the two example trees; labels: taxon; operational taxonomic unit (OTU) such as a protein sequence]
![Page 14: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/14.jpg)
Tree nomenclature
[Figure: the two example trees; labels: branch (edge); node (intersection or terminating point of two or more branches)]
![Page 15: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/15.jpg)
Tree nomenclature
[Figure: the two example trees]
Branches are unscaled: OTUs are neatly aligned, and nodes reflect time.
Branches are scaled: branch lengths are proportional to the number of amino acid changes.
![Page 16: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/16.jpg)
Tree nomenclature
[Figure: the two example trees; labels: bifurcating internal node; multifurcating internal node]
![Page 17: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/17.jpg)
Examples of multifurcation: failure to resolve the branching order of some metazoans and protostomes
Rokas A. et al., Animal Evolution and the Molecular Signature of Radiations Compressed in Time, Science 310:1933 (2005), Fig. 1.
![Page 18: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/18.jpg)
Tree nomenclature: clades
[Figure: the example tree with taxa A to I; highlighted: clade ABF (monophyletic group)]
![Page 19: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/19.jpg)
Tree nomenclature
[Figure: the example tree with taxa A to I; highlighted: clade CDH]
![Page 20: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/20.jpg)
Tree nomenclature
[Figure: the example tree with taxa A to I; highlighted: clade ABF/CDH/G]
![Page 21: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/21.jpg)
Examples of clades
Lindblad-Toh et al., Nature 438: 803 (2005), fig. 10
![Page 22: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/22.jpg)
The root of a phylogenetic tree represents the common ancestor of the sequences. Some trees are unrooted, and thus do not specify the common ancestor.
A tree can be rooted using an outgroup (that is, a taxon known to be distantly related to all other OTUs).
Tree roots
![Page 23: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/23.jpg)
Tree nomenclature: roots
[Figure: a rooted tree (specifies evolutionary path), drawn from past to present with numbered nodes, beside an unrooted tree]
![Page 24: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/24.jpg)
Tree nomenclature: outgroup rooting
[Figure: a rooted tree drawn from past to present, beside a tree in which an outgroup is used to place the root]
![Page 25: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/25.jpg)
Numbers of possible trees are extremely large for >10 sequences

| Number of OTUs | Number of rooted trees | Number of unrooted trees |
|---|---|---|
| 2 | 1 | 1 |
| 3 | 3 | 1 |
| 4 | 15 | 3 |
| 5 | 105 | 15 |
| 10 | 34,459,425 | 2,027,025 |
| 20 | 8 x 10^21 | 2 x 10^20 |
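The counts of bifurcating trees follow from closed-form products of odd numbers: (2n - 3)!! rooted trees and (2n - 5)!! unrooted trees for n OTUs. The sketch below is only an illustration of that arithmetic; the function names are made up for this example and are not part of MEGA.

```python
def odd_double_factorial(k: int) -> int:
    """Return k * (k - 2) * ... * 3 * 1 for odd k, and 1 when k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def num_rooted_trees(n_otus: int) -> int:
    """Number of rooted bifurcating trees for n_otus >= 2: (2n - 3)!!"""
    if n_otus < 2:
        raise ValueError("need at least two OTUs")
    return odd_double_factorial(2 * n_otus - 3)


def num_unrooted_trees(n_otus: int) -> int:
    """Number of unrooted bifurcating trees for n_otus >= 2: (2n - 5)!!"""
    if n_otus < 2:
        raise ValueError("need at least two OTUs")
    return odd_double_factorial(2 * n_otus - 5)


if __name__ == "__main__":
    for n in (2, 3, 4, 5, 10, 20):
        print(n, num_rooted_trees(n), num_unrooted_trees(n))
```

Because the counts grow this quickly, exhaustive evaluation of every tree is only practical for small numbers of OTUs.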
![Page 26: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/26.jpg)
[1] Selection of sequences for analysis
[2] Multiple sequence alignment
[3] Selection of a substitution model
[4] Tree building
[5] Tree evaluation
Five stages of phylogenetic analysis
![Page 27: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/27.jpg)
For phylogeny, DNA can be more informative.
--The protein-coding portion of DNA has synonymous and nonsynonymous substitutions. Thus, some DNA changes do not have corresponding protein changes.
Stage 1: Use of DNA, RNA, or protein
![Page 28: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/28.jpg)
![Page 29: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/29.jpg)
![Page 30: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/30.jpg)
For phylogeny, DNA can be more informative.
--Some substitutions in a DNA sequence alignment can be directly observed: single nucleotide substitutions, sequential substitutions, coincidental substitutions.
Stage 1: Use of DNA, RNA, or protein
![Page 31: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/31.jpg)
Substitutions in a DNA sequence alignment can be directly observed, or inferred
![Page 32: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/32.jpg)
![Page 33: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/33.jpg)
For phylogeny, DNA can be more informative.
--Noncoding regions (such as 5’ and 3’ untranslated regions) may be analyzed using molecular phylogeny.
--Pseudogenes (nonfunctional genes) are studied by molecular phylogeny.
--Rates of transitions and transversions can be measured.
Transitions: purine (A ↔ G) or pyrimidine (C ↔ T) substitutions
Transversions: purine ↔ pyrimidine
Stage 1: Use of DNA, RNA, or protein
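As a small illustration of measuring transitions and transversions, the sketch below counts both between two aligned DNA sequences. It assumes the sequences are already aligned and of equal length; the function names and the example sequences are invented for this illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(a: str, b: str) -> str:
    """Classify one aligned pair of nucleotides."""
    a, b = a.upper(), b.upper()
    valid = PURINES | PYRIMIDINES
    if a not in valid or b not in valid:
        return "other"          # gap or ambiguous base
    if a == b:
        return "identical"
    # Same chemical class (purine/purine or pyrimidine/pyrimidine) is a transition.
    same_class = (a in PURINES) == (b in PURINES)
    return "transition" if same_class else "transversion"


def count_ts_tv(seq1: str, seq2: str) -> tuple[int, int]:
    """Return (transitions, transversions) observed between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    kinds = [classify_substitution(a, b) for a, b in zip(seq1, seq2)]
    return kinds.count("transition"), kinds.count("transversion")


if __name__ == "__main__":
    # A->G and C->T are transitions; T->A is a transversion.
    print(count_ts_tv("AACGT", "GATGA"))
```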
![Page 34: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/34.jpg)
For phylogeny, protein sequences are also often used.
--Proteins have 20 states (amino acids) instead of only four for DNA, so there is a stronger phylogenetic signal.
Nucleotides are unordered characters: any one nucleotide can change to any other in one step.
An ordered character must pass through one or more intermediate states before reaching the final state.
Amino acid sequences are partially ordered character states: there is a variable number of states between the starting value and the final value.
Stage 1: Use of DNA, RNA, or protein
![Page 35: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/35.jpg)
[1] Selection of sequences for analysis
[2] Multiple sequence alignment
[3] Selection of a substitution model
[4] Tree building
[5] Tree evaluation
Five stages of phylogenetic analysis
![Page 36: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/36.jpg)
The fundamental basis of a phylogenetic tree is a multiple sequence alignment.
(If there is a misalignment, or if a nonhomologous sequence is included in the alignment, it will still be possible to generate a tree.)
Consider the following alignment of orthologous globins (see Fig. 3.2)
Stage 2: Multiple sequence alignment
![Page 37: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/37.jpg)
[Figure: multiple sequence alignment of globins. Labels: gaps; 100% conserved positions; open circles mark positions that distinguish myoglobins, alpha globins, and beta globins]
![Page 38: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/38.jpg)
[1] Confirm that all sequences are homologous
[2] Adjust gap creation and extension penalties as needed to optimize the alignment
[3] Restrict phylogenetic analysis to regions of the multiple sequence alignment for which data are available for all taxa (delete columns having incomplete data or gaps).
Stage 2: Multiple sequence alignment
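Step [3] above (deleting columns with gaps or incomplete data) can be sketched as follows. This is only an illustration, assuming the alignment is held as a dictionary of equal-length strings and that "-" and "." mark gaps; MEGA offers its own options for handling gaps, which this code does not reproduce.

```python
def remove_gapped_columns(alignment: dict[str, str], gap_chars: str = "-.") -> dict[str, str]:
    """Keep only alignment columns in which every sequence has a residue."""
    sequences = list(alignment.values())
    if not sequences:
        return {}
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all aligned sequences must have the same length")
    # Indices of columns with no gap character in any sequence.
    keep = [i for i in range(length)
            if all(seq[i] not in gap_chars for seq in sequences)]
    return {name: "".join(seq[i] for i in keep)
            for name, seq in alignment.items()}


if __name__ == "__main__":
    toy = {"seq_a": "MK-LV", "seq_b": "MKTLV", "seq_c": "M-TLV"}  # hypothetical data
    print(remove_gapped_columns(toy))
```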
![Page 39: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/39.jpg)
[1] Selection of sequences for analysis
[2] Multiple sequence alignment
[3] Selection of a substitution model
[4] Tree building
[5] Tree evaluation
Five stages of phylogenetic analysis
![Page 40: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/40.jpg)
Stage 4: Tree-building methods: distance
The simplest approach to measuring distances between sequences is to align pairs of sequences, and then to count the number of differences. The number of differences is called the Hamming distance. For an alignment of length N with n sites at which there are differences, the degree of divergence D (the normalized Hamming distance) is:
D = n / N
But observed differences do not equal genetic distance! Genetic distance involves mutations that are not observed directly.
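The formula D = n / N can be computed directly from two aligned sequences. The sketch below is a minimal illustration; the function name and the example sequences are invented here.

```python
def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned sites that differ: D = n / N."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    if not seq1:
        raise ValueError("sequences must not be empty")
    n_differences = sum(a != b for a, b in zip(seq1, seq2))
    return n_differences / len(seq1)


if __name__ == "__main__":
    # Two of ten sites differ, so D = 0.2.
    print(p_distance("ACGTACGTAC", "ACGTTCGTAA"))
```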
![Page 41: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/41.jpg)
Stage 4: Tree-building methods: distance
Jukes and Cantor (1969) proposed a corrective formula:
$$D = -\frac{3}{4} \ln\left(1 - \frac{4}{3}\,p\right)$$
where p is the observed proportion of sites that differ.
This model describes the probability that one nucleotide will change into another. It assumes that each residue is equally likely to change into any other (i.e. the rate of transversions equals the rate of transitions). In practice, the transition rate is typically greater than the transversion rate.
![Page 42: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/42.jpg)
Models of nucleotide substitution
[Figure: the four nucleotides A, G, T, C. A ↔ G and T ↔ C are transitions; purine ↔ pyrimidine changes are transversions.]
![Page 43: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/43.jpg)
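Jukes and Cantor one-parameter model of nucleotide substitution (α = β)
[Figure: the four nucleotides A, G, T, C, with transitions and transversions occurring at the same rate]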
![Page 44: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/44.jpg)
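Kimura model of nucleotide substitution (assumes α ≠ β)
[Figure: the four nucleotides A, G, T, C, with transitions and transversions occurring at different rates]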
![Page 45: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/45.jpg)
![Page 46: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/46.jpg)
Stage 4: Tree-building methods: distance
Jukes and Cantor (1969) proposed a corrective formula:
$$D = -\frac{3}{4} \ln\left(1 - \frac{4}{3}\,p\right)$$
Consider an alignment where 3/60 aligned residues differ. The normalized Hamming distance is 3/60 = 0.05. The Jukes-Cantor correction is
$$D = -\frac{3}{4} \ln\left(1 - \frac{4}{3}(0.05)\right) = 0.052$$
When 30/60 aligned residues differ, the Jukes-Cantor correction is more substantial:
$$D = -\frac{3}{4} \ln\left(1 - \frac{4}{3}(0.5)\right) = 0.82$$
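The two worked examples can be reproduced with a few lines of code. This is a minimal sketch of the Jukes-Cantor correction for nucleotide data; the function name is invented here, and the guard at p >= 0.75 reflects the point where the logarithm is no longer defined.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """Jukes-Cantor corrected distance from the observed proportion of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75 for the correction to be defined")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


if __name__ == "__main__":
    for n_diff in (3, 30):
        p = n_diff / 60
        print(f"p = {p:.2f}  corrected D = {jukes_cantor_distance(p):.3f}")
```

The correction is small when few sites differ and grows rapidly as the observed proportion approaches 0.75.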
![Page 47: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/47.jpg)
Each model can affect the topology and branch lengths of the tree
p-distance correction
![Page 48: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/48.jpg)
Poisson correction
![Page 49: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/49.jpg)
Gamma models account for unequal substitution rates across variable sites
Changing this parameter does alter the topology and branch lengths of the tree…(on next slide, kangaroo globin switches clades)
[Figure: frequency distribution (y-axis) plotted against substitution rate (x-axis)]
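To get a feel for how the gamma parameter shapes the distribution of rates across sites, the sketch below draws per-site rates from a gamma distribution with mean 1. This is an illustration only: the choice to fix the mean at 1, the threshold of 0.1 for a "slow" site, and the function names are assumptions made for this example, not settings taken from MEGA.

```python
import random


def sample_site_rates(shape: float, n_sites: int, seed: int = 0) -> list[float]:
    """Draw relative substitution rates for n_sites from a gamma distribution with mean 1."""
    rng = random.Random(seed)
    # random.gammavariate(alpha, beta) has mean alpha * beta, so beta = 1 / alpha gives mean 1.
    return [rng.gammavariate(shape, 1.0 / shape) for _ in range(n_sites)]


def fraction_below(rates: list[float], threshold: float) -> float:
    """Fraction of sites whose rate is below the threshold."""
    return sum(r < threshold for r in rates) / len(rates)


if __name__ == "__main__":
    for shape in (0.25, 1.0, 5.0):
        rates = sample_site_rates(shape, 20000)
        mean = sum(rates) / len(rates)
        print(f"shape = {shape}: mean rate {mean:.2f}, "
              f"fraction of nearly invariant sites {fraction_below(rates, 0.1):.2f}")
```

A small shape value concentrates most sites near a rate of zero while a few sites change very quickly; a large value makes the rates nearly uniform across sites.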
![Page 50: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/50.jpg)
[Figure: trees built with the gamma parameter set to 0.25, 1, and 5]
![Page 51: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/51.jpg)
[1] Selection of sequences for analysis
[2] Multiple sequence alignment
[3] Selection of a substitution model
[4] Tree building
[5] Tree evaluation
Five stages of phylogenetic analysis
![Page 52: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/52.jpg)
Distance-based methods involve a distance metric, such as the number of amino acid changes between the sequences, or a distance score. Examples of distance-based algorithms are UPGMA and neighbor-joining.
Character-based methods include maximum parsimony and maximum likelihood. Parsimony analysis involves the search for the tree with the fewest amino acid (or nucleotide) changes that account for the observed differences between taxa.
Stage 4: Tree-building methods
![Page 53: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/53.jpg)
We can introduce distance-based and character-based tree-building methods by referring to a group of orthologous globin proteins.
Stage 4: Tree-building methods
![Page 54: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/54.jpg)
![Page 55: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/55.jpg)
Distance-based tree: calculate the pairwise alignments; if two sequences are related, put them next to each other on the tree
![Page 56: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/56.jpg)
Character-based tree: identify positions that best describe how characters (amino acids) are derived from common ancestors
![Page 57: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/57.jpg)
[1] distance-based
[2] character-based: maximum parsimony
[3] character- and model-based: maximum likelihood
[4] character- and model-based: Bayesian
Stage 4: Tree-building methods
![Page 58: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/58.jpg)
How to use MEGA to make a tree
[1] Enter a multiple sequence alignment (.meg) file
[2] Under the phylogeny menu, select one of these four methods…
Neighbor-Joining (NJ)
Minimum Evolution (ME)
Maximum Parsimony (MP)
UPGMA
![Page 59: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/59.jpg)
Use of MEGA for a distance-based tree: UPGMA
Click compute to obtain tree
Click green boxes to obtain options
![Page 60: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/60.jpg)
Use of MEGA for a distance-based tree: UPGMA
![Page 61: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/61.jpg)
Use of MEGA for a distance-based tree: UPGMA
Flipping branches around a node creates an equivalent topology
![Page 62: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/62.jpg)
Tree-building methods: UPGMA
UPGMA is the unweighted pair group method using arithmetic mean
[Figure: five sequences, labeled 1 to 5]
![Page 63: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/63.jpg)
Tree-building methods: UPGMA
Step 1: Compute the pairwise distances of all the proteins.
[Figure: five sequences, labeled 1 to 5]
![Page 64: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/64.jpg)
Tree-building methods: UPGMA
Step 2: Find the two proteins with the smallest pairwise distance. Cluster them.
[Figure: sequences 1 and 2 joined at node 6]
![Page 65: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/65.jpg)
Tree-building methods: UPGMA
Step 3: Do it again. Find the next two proteins with the smallest pairwise distance. Cluster them.
[Figure: sequences 4 and 5 joined at node 7]
![Page 66: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/66.jpg)
Tree-building methods: UPGMA
Step 4: Keep going. Cluster.
[Figure: sequence 3 joined to an existing cluster at node 8]
![Page 67: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/67.jpg)
Tree-building methods: UPGMA
Step 5: Last cluster! This is your tree.
[Figure: the remaining clusters joined at node 9, the root]
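The clustering steps above can be sketched in code. This is a minimal illustration of UPGMA, not MEGA's implementation: the distance matrix is hypothetical, distances between clusters are averaged with weights equal to cluster size, and each merge is placed at a height of half the distance between the merged clusters.

```python
def upgma(names, matrix):
    """Cluster sequences by UPGMA.

    names:  list of labels
    matrix: symmetric list of lists of pairwise distances
    Returns (tree, merges): tree is a nested tuple, merges lists
    (subtree, node_height) in the order the clusters were formed.
    """
    n = len(names)
    clusters = {i: (names[i], 1) for i in range(n)}  # id -> (subtree, number of leaves)
    dist = {(i, j): matrix[i][j] for i in range(n) for j in range(i + 1, n)}
    merges = []
    next_id = n
    while len(clusters) > 1:
        # Find the closest pair of clusters.
        i, j = min(dist, key=dist.get)
        height = dist[(i, j)] / 2
        tree_i, size_i = clusters.pop(i)
        tree_j, size_j = clusters.pop(j)
        # Distance from the new cluster to each remaining cluster:
        # average of the two old distances, weighted by cluster size.
        new_dist = {}
        for k in clusters:
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            new_dist[(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        # Drop distances that involve the two merged clusters.
        dist = {pair: d for pair, d in dist.items() if i not in pair and j not in pair}
        dist.update(new_dist)
        subtree = (tree_i, tree_j)
        clusters[next_id] = (subtree, size_i + size_j)
        merges.append((subtree, height))
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree, merges


# Hypothetical distances among five sequences, chosen so that 1 and 2
# cluster first, then 4 and 5, then 3, as in the steps above.
NAMES = ["1", "2", "3", "4", "5"]
MATRIX = [
    [0, 2, 12, 12, 12],
    [2, 0, 12, 12, 12],
    [12, 12, 0, 8, 8],
    [12, 12, 8, 0, 4],
    [12, 12, 8, 4, 0],
]

if __name__ == "__main__":
    tree, merges = upgma(NAMES, MATRIX)
    for subtree, height in merges:
        print(f"height {height}: {subtree}")
```

With these example distances the final tree groups 1 with 2, and 3 with the pair 4 and 5. Because every merge is placed at half the cluster distance, all tips end at the same level, which is how the molecular clock assumption enters the method.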
![Page 68: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/68.jpg)
UPGMA is a simple approach for making trees.
• A UPGMA tree is always rooted.
• An assumption of the algorithm is that the molecular clock is constant for sequences in the tree. If there are unequal substitution rates, the tree may be wrong.
• While UPGMA is simple, it is less accurate than the neighbor-joining approach (described next).
Distance-based methods: UPGMA trees
![Page 69: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/69.jpg)
The neighbor-joining method of Saitou and Nei (1987) is especially useful for making a tree having a large number of taxa.
Begin by placing all the taxa in a star-like structure.
Making trees using neighbor-joining
![Page 70: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/70.jpg)
Tree-building methods: Neighbor joining
Next, identify neighbors (e.g. 1 and 2) that are most closely related. Connect these neighbors to other OTUs via an internal branch, XY. At each successive stage, minimize the sum of the branch lengths.
![Page 71: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/71.jpg)
Tree-building methods: Neighbor joining
Define the distance from X to Y by
$$d_{XY} = \frac{1}{2}\left(d_{1Y} + d_{2Y} - d_{12}\right)$$
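A quick numerical check of this formula, under the assumption that the distances are additive along a tree. Suppose (hypothetically) that OTU 1 sits 2 units from node X, OTU 2 sits 3 units from X, and X sits 4 units from Y. The pairwise distances are then d12 = 5, d1Y = 6, and d2Y = 7, and the formula should recover the X to Y distance of 4.

```python
def distance_to_new_node(d_1y: float, d_2y: float, d_12: float) -> float:
    """Distance from node X (which joins neighbors 1 and 2) to another node Y."""
    return 0.5 * (d_1y + d_2y - d_12)


if __name__ == "__main__":
    # Hypothetical branch lengths: 1-X = 2, 2-X = 3, X-Y = 4.
    branch_1x, branch_2x, branch_xy = 2.0, 3.0, 4.0
    d_12 = branch_1x + branch_2x
    d_1y = branch_1x + branch_xy
    d_2y = branch_2x + branch_xy
    print(distance_to_new_node(d_1y, d_2y, d_12))
```

The branches leading from 1 and 2 to X are each counted once in d1Y + d2Y and removed by subtracting d12, leaving twice the X to Y distance.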
![Page 72: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/72.jpg)
Use of MEGA for a distance-based tree: NJ
Neighbor joining produces a tree reasonably similar to the UPGMA tree
![Page 73: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/73.jpg)
Example of a neighbor-joining tree: phylogenetic analysis of 13 RBPs
![Page 74: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/74.jpg)
We will discuss four tree-building methods:
[1] distance-based
[2] character-based: maximum parsimony
[3] character- and model-based: maximum likelihood
[4] character- and model-based: Bayesian
Stage 4: Tree-building methods
![Page 75: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/75.jpg)
Tree-building methods: character based
Rather than pairwise distances between proteins, evaluate the aligned columns of amino acid residues (characters).
![Page 76: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/76.jpg)
The main idea of character-based methods is to find the tree with the shortest branch lengths possible. Thus we seek the most parsimonious (“simple”) tree.
• Identify informative sites. For example, constant characters are not parsimony-informative.
• Construct trees, counting the number of changes required to create each tree. For about 12 taxa or fewer, evaluate all possible trees exhaustively; for >12 taxa perform a heuristic search.
• Select the shortest tree (or trees).
Making trees using character-based methods
![Page 77: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/77.jpg)
As an example of tree-building using maximum parsimony, consider these four taxa:
AAG, AAA, GGA, AGA
How might they have evolved from a common ancestor such as AAA?
![Page 78: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/78.jpg)
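Tree-building methods: Maximum parsimony
[Figure: three candidate trees for the four taxa, each with ancestor AAA; numbers on the branches count substitutions]
Tree 1 groups (AAG, AAA) and (GGA, AGA): cost = 3
Tree 2 groups (AAG, AGA) and (AAA, GGA): cost = 4
Tree 3 groups (AAG, GGA) and (AAA, AGA): cost = 4
In maximum parsimony, choose the tree(s) with the lowest cost (shortest branch lengths).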
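The costs of the three candidate trees can be checked by counting the minimum number of changes each tree requires. The sketch below uses Fitch's algorithm, a standard counting method that the slides do not name; the pairings of taxa are read from the order of the leaves in the figure, and trees are written as nested tuples whose leaves are the sequences themselves.

```python
def fitch_site(tree, site):
    """Return (possible_states, changes) for one alignment column on a nested-tuple tree."""
    if isinstance(tree, str):                 # leaf: the sequence itself
        return {tree[site]}, 0
    left_states, left_cost = fitch_site(tree[0], site)
    right_states, right_cost = fitch_site(tree[1], site)
    shared = left_states & right_states
    if shared:                                # children agree: no change needed here
        return shared, left_cost + right_cost
    # Children disagree: one change is required on a branch below this node.
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_cost(tree, n_sites):
    """Total minimum number of changes over all columns."""
    return sum(fitch_site(tree, site)[1] for site in range(n_sites))


CANDIDATE_TREES = [
    (("AAG", "AAA"), ("GGA", "AGA")),
    (("AAG", "AGA"), ("AAA", "GGA")),
    (("AAG", "GGA"), ("AAA", "AGA")),
]

if __name__ == "__main__":
    for tree in CANDIDATE_TREES:
        print(tree, "cost =", parsimony_cost(tree, 3))
```

The first tree needs three changes and the other two need four, so maximum parsimony selects the first tree.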
![Page 79: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/79.jpg)
MEGA for maximum parsimony (MP) trees
Options include heuristic approaches, and bootstrapping
![Page 80: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/80.jpg)
MEGA for maximum parsimony (MP) trees
In maximum parsimony, there may be more than one tree having the lowest total branch length. You may compute the consensus best tree.
![Page 81: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/81.jpg)
MEGA for maximum parsimony (MP) trees
Bootstrap values show the percent of times each clade is supported after a large number (n=500) of replicate samplings of the data.
![Page 82: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/82.jpg)
We will discuss four tree-building methods:
[1] distance-based
[2] character-based: maximum parsimony
[3] character- and model-based: maximum likelihood
[4] character- and model-based: Bayesian
Stage 4: Tree-building methods
![Page 83: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/83.jpg)
Maximum likelihood is an alternative to maximum parsimony. It is computationally intensive. A likelihood is calculated for the probability of each residue in an alignment, based upon some model of the substitution process.
What are the tree topology and branch lengths that have the greatest likelihood of producing the observed data set?
ML is implemented in the TREE-PUZZLE program, as well as PAUP and PHYLIP.
Making trees using maximum likelihood
![Page 84: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/84.jpg)
(1) Reconstruct all possible quartets A, B, C, D. For 12 myoglobins there are 495 possible quartets.
(2) Puzzling step: begin with one quartet tree. N-4 sequences remain. Add them to the branches systematically, estimating the support for each internal branch. Report a consensus tree.
Maximum likelihood: Tree-Puzzle
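The count of 495 quartets for 12 sequences is the number of ways to choose 4 items from 12. The sketch below checks that count and lists the three possible unrooted topologies for a single quartet. The sequence labels are placeholders, and this is not the TREE-PUZZLE code itself.

```python
from itertools import combinations
from math import comb


def count_quartets(n_sequences: int) -> int:
    """Number of distinct groups of four sequences."""
    return comb(n_sequences, 4)


def quartet_topologies(a, b, c, d):
    """The three unrooted tree topologies for four sequences, written as pairs of neighbors."""
    return [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]


if __name__ == "__main__":
    names = [f"myoglobin_{i}" for i in range(1, 13)]  # placeholder labels
    quartets = list(combinations(names, 4))
    print(len(quartets), count_quartets(len(names)))
    for topology in quartet_topologies("A", "B", "C", "D"):
        print(topology)
```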
![Page 85: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/85.jpg)
Maximum likelihood tree
![Page 86: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/86.jpg)
Quartet puzzling
![Page 87: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/87.jpg)
We will discuss four tree-building methods:
[1] distance-based
[2] character-based: maximum parsimony
[3] character- and model-based: maximum likelihood
[4] character- and model-based: Bayesian
Stage 4: Tree-building methods
![Page 88: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/88.jpg)
Bayesian inference of phylogeny with MrBayes
Calculate:
$$\Pr[\text{Tree} \mid \text{Data}] = \frac{\Pr[\text{Data} \mid \text{Tree}] \times \Pr[\text{Tree}]}{\Pr[\text{Data}]}$$
Pr [ Tree | Data ] is the posterior probability distribution of trees. Ideally this involves a summation over all possible trees. In practice, Markov chain Monte Carlo (MCMC) is run to estimate the posterior probability distribution.
Notably, Bayesian approaches require you to specify prior assumptions about the model of evolution.
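For a toy case with only a handful of candidate trees, the posterior can be computed by direct enumeration. In the sketch below the tree labels, likelihood values, and the uniform prior are all invented for illustration; real analyses have far too many trees to enumerate, which is why MCMC sampling is used instead.

```python
def posterior_over_trees(likelihoods: dict[str, float], priors: dict[str, float]) -> dict[str, float]:
    """Apply Bayes' rule over an explicitly enumerated set of trees."""
    # Numerator for each tree: Pr[Data | Tree] x Pr[Tree].
    joint = {tree: likelihoods[tree] * priors[tree] for tree in likelihoods}
    # Denominator Pr[Data]: the sum of the numerators over all trees.
    evidence = sum(joint.values())
    return {tree: value / evidence for tree, value in joint.items()}


if __name__ == "__main__":
    likelihoods = {"tree_1": 0.006, "tree_2": 0.003, "tree_3": 0.001}  # hypothetical
    priors = {tree: 1 / 3 for tree in likelihoods}                      # uniform prior
    for tree, prob in posterior_over_trees(likelihoods, priors).items():
        print(f"{tree}: posterior probability {prob:.2f}")
```

With a uniform prior the posterior is just the likelihoods rescaled to sum to 1; a non-uniform prior would shift the result toward the trees it favors.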
![Page 89: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/89.jpg)
[1] Selection of sequences for analysis
[2] Multiple sequence alignment
[3] Selection of a substitution model
[4] Tree building
[5] Tree evaluation
Five stages of phylogenetic analysis
![Page 90: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/90.jpg)
Bootstrapping is a commonly used approach to measuring the robustness of a tree topology. Given a branching order, how consistently does an algorithm find that branching order in a randomly resampled version of the original data set?
To bootstrap, make an artificial dataset by randomly sampling columns, with replacement, from your multiple sequence alignment. Make the dataset the same size as the original. Do 100 (to 1,000) bootstrap replicates. Observe the percent of cases in which the assignment of clades in the original tree is supported by the bootstrap replicates. Support >70% is considered significant.
Stage 5: Evaluating trees: bootstrapping
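The resampling procedure can be sketched as follows. This is an illustration under stated assumptions: the alignment is a dictionary of equal-length strings, and `clade_found` is a placeholder for whatever tree-building and clade-checking step you use (here, a simple check of which pair of sequences is closest). It does not reproduce MEGA's bootstrap test.

```python
import random


def bootstrap_alignment(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Resample alignment columns with replacement, keeping the original length."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def bootstrap_support(alignment, clade_found, n_replicates: int = 100, seed: int = 0) -> float:
    """Percent of bootstrap replicates in which clade_found(replicate) is True."""
    rng = random.Random(seed)
    hits = sum(bool(clade_found(bootstrap_alignment(alignment, rng)))
               for _ in range(n_replicates))
    return 100.0 * hits / n_replicates


def a_and_b_are_closest(aln: dict[str, str]) -> bool:
    """Placeholder clade test: are 'a' and 'b' closer to each other than either is to 'c'?"""
    def differences(x, y):
        return sum(p != q for p, q in zip(aln[x], aln[y]))
    return differences("a", "b") < min(differences("a", "c"), differences("b", "c"))


if __name__ == "__main__":
    toy = {"a": "ACGTACGTAC", "b": "ACGTACGTAA", "c": "TTGTCCGAAC"}  # hypothetical data
    print(bootstrap_support(toy, a_and_b_are_closest, n_replicates=500))
```

Sampling with replacement means some columns appear more than once in a replicate and others not at all, which is what lets the replicates differ from the original alignment.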
![Page 91: plants](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56813ff7550346895dab1f5b/html5/thumbnails/91.jpg)
In 61% of the bootstrap resamplings, ssrbp and btrbp (pig and cow RBP) formed a distinct clade. In 39% of the cases, another protein joined the clade (e.g. ecrbp), or one of these two sequences joined another clade.