Lecture 3
Molecular Evolution and Phylogeny
Facts on the molecular basis of life
• Every life form is genome based
• Genomes evolve
• There are large numbers of apparently homologous intra-genomic (paralog) and inter-genomic (ortholog) genes
• Some genes, especially those related to the function of transcription and translation, are common to ALL life forms
• The closer two organisms seem to be phylogenetically, the more similar their genomes and corresponding genes are
Central dogma of molecular biology
DNA → RNA → Protein
• Closer related organisms have more similar genomes
• Highly similar genes are homologs (have the same ancestor)
• A universal ancestor exists for all life forms
• Molecular differences in homologous genes (or protein sequences) are positively correlated with evolution time
• Phylogenetic relation can be expressed by a dendrogram (a “tree”)
Basic assumptions of molecular evolution
The five steps in phylogenetics dancing
Modified from Hillis et al., (1993). Methods in Enzymology 224, 456-487
[Flowchart with five numbered steps]
• Sequence data
• Align sequences
• Phylogenetic signal? Patterns → evolutionary processes?
• Choose a method
  • Distance methods: distance calculation (which model?); LS, ME, NJ
  • Character-based methods: MP (weighting of sites, changes?); ML, MB (model?)
• Calculate or estimate the best-fit tree (optimality criterion, or single tree)
• Test phylogenetic reliability
Why protein phylogenies?
• For historical reasons - first sequences...
• Most genes encode proteins...
• To study protein structure, function and evolution
• Comparing DNA and protein based phylogenies can be useful
  • Different genes - e.g. 18S rRNA versus EF-2 protein
  • Protein encoding gene - codons versus amino acids
Proteins were the first molecular sequences to be used for phylogenetic inference
Fitch and Margoliash (1967)
Construction of phylogenetic trees.
Science 155, 279-284.
Most of what follows is taken from:
Statistical Physics and Biological Information
Institute of Theoretical Physics, University of California at Santa Barbara
2001 May 7
Understanding trees
[Figure: rooted tree drawn along a time axis; labels: Root, 30 Mya, 22 Mya, 7 Mya. Two drawings of the tree are marked "same as".]
Understanding trees #2
Understanding trees #3
Difference in homologous sequences is a measure of evolution time
Part of multiple sequence alignment of Mitochondrial Small Sub-Unit rRNA
Full length is ~ 950
11 primate species (order Primates), with mouse as outgroup
Change similarity matrix to distance matrix: d = 1 - S
From the alignment, construct pairwise distances
**Note: Alignment is not the only way to compute distance
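As a rough illustration of turning an alignment into pairwise distances with d = 1 - S, the sketch below takes S to be the fraction of identical columns between two aligned sequences. The function names, the toy sequences, and the choice to skip columns containing a gap are assumptions made for this example, not details from the lecture.

```python
def similarity(seq_a: str, seq_b: str) -> float:
    """Fraction of ungapped aligned columns where both sequences agree (S)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # Assumption: columns where either sequence has a gap ('-') are ignored.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a == b for a, b in pairs) / len(pairs)


def distance_matrix(alignment: dict) -> dict:
    """Pairwise distances d = 1 - S for every ordered pair of sequence names."""
    names = list(alignment)
    return {
        (x, y): 1.0 - similarity(alignment[x], alignment[y])
        for x in names
        for y in names
    }


# Hypothetical toy alignment (not the primate rRNA data from the slides).
toy_alignment = {
    "seq1": "ACGTACGT",
    "seq2": "ACGTACGA",
    "seq3": "AC-TTCGA",
}
toy_distances = distance_matrix(toy_alignment)
```

Here `toy_distances[("seq1", "seq2")]` is 1/8, because one of eight columns differs; the gapped column is excluded from any comparison that involves `seq3`.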
Models of sequence evolution
Jukes-Cantor (minimal) Model
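All substitution rates = $\alpha$; all base frequencies = 1/4
Probability that two sequences separated by a total time $2t$ differ at a site = $3 P_{ij}(2t)$, $i \neq j$
• Let the probability of a site being a given base at time $t$ be $P(t)$
• After an elapsed time $\delta t$, the change from mutation to the other three bases is $-3\alpha\,\delta t\,P(t)$ and the gain from the other bases is $\alpha\,\delta t\,(1 - P(t))$
• Hence $P(t + \delta t) = P(t) - 3\alpha\,\delta t\,P(t) + \alpha\,\delta t\,(1 - P(t))$, so $dP(t)/dt = \alpha - 4\alpha P(t)$
• Write $P(t) = a \exp(-bt) + c$; the solution is $b = 4\alpha$, $c = 1/4$, so $P(t) = a \exp(-4\alpha t) + 1/4$
• If $P(0) = 1$, then $a = 3/4$. If $P(0) = 0$, then $a = -1/4$
• Finally $P_{\text{same}}(t) = 1/4 + (3/4) \exp(-4\alpha t)$
$P_{\text{change}}(t) = 1/4 - (1/4) \exp(-4\alpha t)$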
Derivation of Jukes-Cantor formula
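The Jukes-Cantor result can be checked numerically with a short sketch. It writes the per-base substitution rate as `alpha` (a symbol assumed here) and adds the standard Jukes-Cantor distance correction d = -(3/4) ln(1 - 4D/3). This follows from setting the observed fraction of differing sites D equal to 3 Pchange(2t) and measuring d as the expected number of substitutions per site, 6 alpha t. The function names are illustrative only.

```python
import math


def p_same(t: float, alpha: float) -> float:
    """Probability that a site holds the same base after time t."""
    return 0.25 + 0.75 * math.exp(-4.0 * alpha * t)


def p_change(t: float, alpha: float) -> float:
    """Probability that a site has changed to one specific other base."""
    return 0.25 - 0.25 * math.exp(-4.0 * alpha * t)


def jc_distance(frac_diff: float) -> float:
    """Jukes-Cantor distance (substitutions per site) from the fraction D
    of differing sites: d = -(3/4) * ln(1 - 4D/3)."""
    if not 0.0 <= frac_diff < 0.75:
        raise ValueError("fraction of differing sites must be in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * frac_diff / 3.0)


# Round trip with assumed values: two sequences that diverged t = 5 time
# units ago are separated by 2t, so they differ at a fraction
# D = 3 * p_change(2t) of sites; the correction recovers 6 * alpha * t.
alpha, t = 0.01, 5.0
observed_fraction = 3.0 * p_change(2.0 * t, alpha)
recovered_distance = jc_distance(observed_fraction)
```

The four outcomes at a site (same base, or one of three others) sum to one, so `p_same(t, alpha) + 3 * p_change(t, alpha)` equals 1 for any `t`.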
Transition (purine ↔ purine or pyrimidine ↔ pyrimidine): A ↔ G or C ↔ T
Transversion (purine ↔ pyrimidine): e.g. A ↔ T or C ↔ G
Hasegawa-Kishino-Yano model
Has a more general substitution rate
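To make the transition/transversion distinction concrete, here is a small sketch that classifies a substitution by whether it stays within the purines (A, G) or pyrimidines (C, T) and counts both kinds between two aligned sequences. The function names and the example sequences are hypothetical; columns with gaps or ambiguous characters are skipped as a simplifying assumption.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
BASES = PURINES | PYRIMIDINES


def classify(a: str, b: str) -> str:
    """Return 'same', 'transition' or 'transversion' for two bases."""
    a, b = a.upper(), b.upper()
    if a not in BASES or b not in BASES:
        raise ValueError("expected one of A, C, G, T")
    if a == b:
        return "same"
    # Both purines or both pyrimidines: a transition.
    if (a in PURINES) == (b in PURINES):
        return "transition"
    return "transversion"


def count_ts_tv(seq_a: str, seq_b: str) -> tuple:
    """Count (transitions, transversions) between two aligned sequences."""
    ts = tv = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in BASES or b not in BASES:
            continue  # assumption: skip gaps and ambiguous characters
        kind = classify(a, b)
        if kind == "transition":
            ts += 1
        elif kind == "transversion":
            tv += 1
    return ts, tv


# Hypothetical example: A/G and C/T are transitions, T/A is a transversion.
example_counts = count_ts_tv("AACGT", "GATGA")
```

Models such as Hasegawa-Kishino-Yano allow these two classes of change to occur at different rates, which is why counting them separately is useful.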
Part of Jukes-Cantor distance matrix for primate examples
(distances are much larger for the outgroup)
Matrix will be used for clustering methods
Clustering
UPGMA
Neighbor-Joining Method
N-J Method produces an Unrooted, Additive tree
PAM        Spinach    Rice  Mosquito  Monkey  Human
Spinach        0.0    84.9     105.6    90.8   86.3
Rice          84.9     0.0     117.8   122.4  122.6
Mosquito     105.6   117.8       0.0    84.7   80.8
Monkey        90.8   122.4      84.7     0.0    3.3
Human         86.3   122.6      80.8     3.3    0.0
What is required for the Neighbour joining method?
Distance matrix
0. Distance Matrix
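The slides do not spell out how the pair to join is chosen. For reference, the sketch below applies the standard neighbor-joining selection criterion (Saitou and Nei) to the PAM matrix. It picks the pair minimizing Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k). The helper names are illustrative assumptions.

```python
NAMES = ["Spinach", "Rice", "Mosquito", "Monkey", "Human"]
PAM_MATRIX = [
    [0.0, 84.9, 105.6, 90.8, 86.3],
    [84.9, 0.0, 117.8, 122.4, 122.6],
    [105.6, 117.8, 0.0, 84.7, 80.8],
    [90.8, 122.4, 84.7, 0.0, 3.3],
    [86.3, 122.6, 80.8, 3.3, 0.0],
]


def q_matrix(d):
    """Q(i, j) = (n - 2) * d[i][j] - rowsum(i) - rowsum(j) for all i < j."""
    n = len(d)
    totals = [sum(row) for row in d]
    return {
        (i, j): (n - 2) * d[i][j] - totals[i] - totals[j]
        for i in range(n)
        for j in range(i + 1, n)
    }


def best_pair(d):
    """Indices (i, j) of the pair with the smallest Q value."""
    q = q_matrix(d)
    return min(q, key=q.get)


q_values = q_matrix(PAM_MATRIX)
i, j = best_pair(PAM_MATRIX)
first_join = (NAMES[i], NAMES[j])
```

For this matrix the criterion selects Monkey and Human, the same pair as the smallest raw distance (3.3), so both rules agree on the first join.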
Neighbor-Joining Method: An Example
PAM distance 3.3 (Human - Monkey) is the minimum. So we'll join Human and Monkey to MonHum and we'll calculate the new distances.
[Figure: Monkey and Human joined into the subtree Mon-Hum; Spinach, Mosquito and Rice not yet joined.]
1. First Step
After we have joined two species in a subtree, we have to compute the distances from every other node to the new subtree. We do this with a simple average of distances:
Dist[Spinach, MonHum] = (Dist[Spinach, Monkey] + Dist[Spinach, Human])/2 = (90.8 + 86.3)/2 = 88.55
[Figure: Spinach and the Mon-Hum subtree (Monkey, Human).]
2. Calculation of New Distances
PAM        Spinach    Rice  Mosquito  MonHum
Spinach        0.0    84.9     105.6    88.6
Rice          84.9     0.0     117.8   122.5
Mosquito     105.6   117.8       0.0    82.8
MonHum        88.6   122.5      82.8     0.0
[Figure: Mosquito joined to Mon-Hum, giving Mos-(Mon-Hum); Spinach and Rice not yet joined.]
3. Next Cycle
PAM         Spinach    Rice  MosMonHum
Spinach         0.0    84.9       97.1
Rice           84.9     0.0      120.2
MosMonHum      97.1   120.2        0.0
[Figure: Spinach and Rice joined into Spin-Rice, alongside Mos-(Mon-Hum).]
4. Penultimate Cycle
PAM         SpinRice  MosMonHum
SpinRice         0.0      108.7
MosMonHum      108.7        0.0
[Figure: Spin-Rice joined to Mos-(Mon-Hum), giving (Spin-Rice)-(Mos-(Mon-Hum)).]
5. Last Joining
[Figure: final tree with leaves Human, Monkey, Mosquito, Rice and Spinach.]
The result: Unrooted Neighbor-Joining Tree
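The worked example can be replayed with a short sketch. It follows the procedure exactly as the slides describe it: join the pair with the smallest distance, then set the distance from every other node to the new subtree to the simple average of the two old distances. This is a simplified averaging scheme; standard neighbor-joining additionally corrects each distance by the taxa's total distances when choosing a pair. The function and variable names are assumptions for illustration.

```python
from itertools import combinations

PAM = {
    ("Spinach", "Rice"): 84.9, ("Spinach", "Mosquito"): 105.6,
    ("Spinach", "Monkey"): 90.8, ("Spinach", "Human"): 86.3,
    ("Rice", "Mosquito"): 117.8, ("Rice", "Monkey"): 122.4,
    ("Rice", "Human"): 122.6, ("Mosquito", "Monkey"): 84.7,
    ("Mosquito", "Human"): 80.8, ("Monkey", "Human"): 3.3,
}


def cluster_by_averaging(dist):
    """Repeatedly join the closest pair and average its distances to the rest."""
    d = {frozenset(pair): value for pair, value in dist.items()}
    nodes = sorted({name for pair in dist for name in pair})
    joins = []
    while len(nodes) > 1:
        # Pick the pair of current nodes with the smallest distance.
        a, b = min(combinations(nodes, 2), key=lambda p: d[frozenset(p)])
        joins.append((a, b, d[frozenset((a, b))]))
        merged = f"({a},{b})"
        rest = [n for n in nodes if n not in (a, b)]
        for other in rest:
            # Simple average of the two old distances, as in the slides.
            d[frozenset((merged, other))] = (
                d[frozenset((a, other))] + d[frozenset((b, other))]
            ) / 2
        nodes = rest + [merged]
    return nodes[0], joins


tree, joins = cluster_by_averaging(PAM)
# The slides round to one decimal at every step, so their later values
# (for example 108.7) can differ slightly from the unrounded ones here.
```

The join order matches the slides: Human with Monkey, then Mosquito, then Spinach with Rice, and finally the two subtrees.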
Bootstrapping
Why are trees not exact?
Pairwise distances usually not tree-like
Searching tree space
Maximum likelihood criterion
Parsimony criterion
Parsimony with molecular data
Parsimony criterion
Paul Higgs:
Is the best tree much better than others?
L: likelihood at nodes
Use Maximum Likelihood to rank alternate trees
yes
yes
same topology
NJ tree is 2nd best
Use Parsimony to rank alternate trees
different topology; parsimony differentiates weakly
Quartet puzzling
MCMC: Markov chain Monte Carlo
Topology probabilities according to MCMC
Clade probability compared from tree methods
NJ method is very fast and close to being the best
Lecture and Book
• Lecture by Paul Higgs
  • online.itp.ucsb.edu/online/infobio01/higgs/
  • see online.itp.ucsb.edu/online/infobio01/ for many lectures
• Book by Wen-Hsiong Li 李文雄
  • "Molecular Evolution" (Sinauer Associates, 1997)
• CMS Molecular Biology Resource
  • www.unl.edu/stc-95/ResTools/cmshp.html
• Phylogeny - Molecular Evolution
  • www.unl.edu/stc-95/ResTools/biotools/biotools2.html
• The Tree of Life Web Project
  • tolweb.org/tree/phylogeny.html
• Web Resources in Molecular Evolution and Systematics
  • darwin.eeb.uconn.edu/molecular-evolution.html
Some web sites on Molecular Evolution
• On-line service
  • www.ebi.ac.uk/clustalw/
  • clustalw.genome.ad.jp/
• Software
  • ftp-igbmc.u-strasbg.fr/pub/ClustalX/
  • ftp-igbmc.u-strasbg.fr/pub/ClustalW/
Some web sites on ClustalW