Lecture 3 Molecular Evolution and Phylogeny

Upload: rodney-simpson

Post on 15-Jan-2016

TRANSCRIPT

Page 1: Lecture 3 Molecular Evolution and Phylogeny. Facts on the molecular basis of life Every life forms is genome based Genomes evolves There are large numbers

Lecture 3

Molecular Evolution and Phylogeny

Page 2

Facts on the molecular basis of life

• Every life form is genome based
• Genomes evolve
• There are large numbers of apparently homologous intra-genomic (paralog) and inter-genomic (ortholog) genes

• Some genes, especially those related to the function of transcription and translation, are common to ALL life forms

• The closer two organisms seem to be phylogenetically, the more similar their genomes and corresponding genes are

Page 3

Central dogma of molecular biology

DNA -> RNA -> Protein

Page 4

Basic assumptions of molecular evolution

• More closely related organisms have more similar genomes

• Highly similar genes are homologs (have the same ancestor)

• A universal ancestor exists for all life forms

• Molecular differences in homologous genes (or protein sequences) are positively correlated with evolutionary time

• Phylogenetic relations can be expressed by a dendrogram (a “tree”)

Page 5

The five steps in phylogenetics dancing

Modified from Hillis et al., (1993). Methods in Enzymology 224, 456-487

[Flowchart with steps numbered 1 to 5; recoverable box labels, in reading order:]

• Sequence data
• Align sequences
• Phylogenetic signal? Patterns -> evolutionary processes?
• Choose a method
    • Distance methods: distance calculation (which model?); LS, ME, NJ
    • Character-based methods: MB, ML (model?); MP (weighting? sites, changes?)
• Single tree or optimality criterion
• Calculate or estimate best fit tree
• Test phylogenetic reliability

Page 6

Why protein phylogenies?

• For historical reasons - first sequences...
• Most genes encode proteins...
• To study protein structure, function and evolution
• Comparing DNA and protein based phylogenies can be useful
    • Different genes - e.g. 18S rRNA versus EF-2 protein
    • Protein encoding gene - codons versus amino acids

Page 7

Proteins were the first molecular sequences to be used for phylogenetic inference

Fitch and Margoliash (1967). Construction of phylogenetic trees. Science 155, 279-284.

Page 8

Most of what follows is taken from:

Statistical Physics and Biological Information
Institute of Theoretical Physics, University of California at Santa Barbara
2001 May 7

Page 9

Understanding trees

[Figure: rooted tree drawn against a time axis; labels: Root, 30 Mya, 22 Mya, 7 Mya; two tree drawings marked "same as"]

Page 10

Understanding trees #2

Page 11

Understanding trees #3

Page 12

Differences between homologous sequences are a measure of evolutionary time

Part of multiple sequence alignment of Mitochondrial Small Sub-Unit rRNA

Full length is ~ 950

11 primate species (order Primates) with mouse as outgroup

Change similarity matrix to distance matrix: d = 1 - S
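
The conversion d = 1 - S can be sketched in a few lines of Python. This is only an illustration under stated assumptions: S is taken to be the fraction of identical residues over gap-free alignment columns, the function names `similarity` and `distance_matrix` are hypothetical, and the toy sequences are made up rather than taken from the rRNA alignment on the slide.

```python
from itertools import combinations


def similarity(seq_a, seq_b):
    """Fraction of gap-free aligned columns in which the two sequences agree."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(a == b for a, b in pairs) / len(pairs)


def distance_matrix(alignment):
    """Return {(name_i, name_j): d} with d = 1 - S for every pair of sequences."""
    dist = {}
    for (name_i, seq_i), (name_j, seq_j) in combinations(alignment.items(), 2):
        d = 1.0 - similarity(seq_i, seq_j)
        dist[(name_i, name_j)] = dist[(name_j, name_i)] = d
    return dist


# Toy alignment (made-up sequences, for illustration only).
toy = {
    "seqA": "ACGTACGTAC",
    "seqB": "ACGTACGTTC",
    "seqC": "ACGAAC-TTC",
}
D = distance_matrix(toy)
for (i, j), d in sorted(D.items()):
    if i < j:
        print(f"{i} {j} {d:.3f}")
```

Skipping columns with gaps is one common choice; other treatments of gaps would give slightly different values of S.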

Page 13
Page 14

From the alignment, construct pairwise distances

*Note: Alignment is not the only way to compute distance

Page 15

Models of sequence evolution

Page 16

Jukes-Cantor (minimal) Model

All substitution rates = $\alpha$; all base frequencies = 1/4

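Fraction of differing sites between two sequences that diverged a time $t$ ago (total separation $2t$): $3\,P_{ij}(2t)$, $i \neq j$ (for example A to C)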

Page 17

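Derivation of Jukes-Cantor formula

• Let the probability of a site being a given base at time $t$ be $P(t)$, with mutation to each of the other three bases occurring at rate $\alpha$
• After an elapsed time $\delta t$:
    • Loss by mutation to the other three bases is $3\alpha\,\delta t\,P(t)$
    • Gain from the other bases is $\alpha\,\delta t\,(1 - P(t))$
• Hence $P(t + \delta t) = P(t) - 3\alpha\,\delta t\,P(t) + \alpha\,\delta t\,(1 - P(t))$, so $dP(t)/dt = \alpha - 4\alpha P(t)$
• Write $P(t) = a \exp(-bt) + c$; the solution is $b = 4\alpha$, $c = 1/4$, so $P(t) = a \exp(-4\alpha t) + 1/4$
• If $P(0) = 1$, then $a = 3/4$. If $P(0) = 0$, then $a = -1/4$
• Finally $P_{same}(t) = 1/4 + (3/4) \exp(-4\alpha t)$ and $P_{change}(t) = 1/4 - (1/4) \exp(-4\alpha t)$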
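
As a numerical check of the derivation, the sketch below evaluates the two probabilities and applies the standard Jukes-Cantor distance correction, d = -(3/4) ln(1 - 4p/3), where p is the observed fraction of differing sites. The correction formula is not written out on the slide; it is the usual textbook result that follows from P_change, and the names `p_same`, `p_change`, `jc_distance` and the values of `alpha` and `t` are illustrative assumptions.

```python
import math


def p_same(t, alpha):
    """Probability that a site holds the same base after time t."""
    return 0.25 + 0.75 * math.exp(-4.0 * alpha * t)


def p_change(t, alpha):
    """Probability that a site holds one specific different base after time t."""
    return 0.25 - 0.25 * math.exp(-4.0 * alpha * t)


def jc_distance(p):
    """Expected substitutions per site, given the observed fraction p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Illustrative values only: rate per base and time since divergence.
alpha, t = 0.01, 10.0

# Two sequences that diverged a time t ago are separated by 2t in total,
# and a site can differ by landing on any of the three other bases.
observed = 3.0 * p_change(2.0 * t, alpha)
corrected = jc_distance(observed)

print(f"observed fraction differing: {observed:.4f}")
print(f"corrected distance:          {corrected:.4f}")
print(f"6 * alpha * t:               {6 * alpha * t:.4f}")
```

The corrected distance recovers 6 * alpha * t, the expected number of substitutions per site along both branches, whereas the observed fraction saturates toward 3/4 as t grows.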

Page 18

Hasegawa-Kishino-Yano model

Has a more general substitution rate

Transition: A <-> G or C <-> T
Transversion: A <-> T or C <-> G (also A <-> C and G <-> T)
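
The transition/transversion distinction that this model relies on can be sketched as a small classifier. This is only an illustration: the function names `classify` and `count_changes` are hypothetical, and the code counts observed differences between two aligned sequences rather than fitting the model itself.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(a, b):
    """Label a pair of bases as 'same', 'transition' or 'transversion'."""
    if a == b:
        return "same"
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return "transition"  # purine <-> purine or pyrimidine <-> pyrimidine
    return "transversion"  # purine <-> pyrimidine


def count_changes(seq_a, seq_b):
    """Count each kind of column in two aligned DNA sequences, skipping non-ACGT symbols."""
    counts = {"same": 0, "transition": 0, "transversion": 0}
    for a, b in zip(seq_a, seq_b):
        if a in "ACGT" and b in "ACGT":
            counts[classify(a, b)] += 1
    return counts


# Made-up sequences for illustration.
print(count_changes("AACG", "GTCG"))
```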

Page 19
Page 20

Part of the Jukes-Cantor distance matrix for the primate example

(the distance is much larger for the outgroup)

The matrix will be used for clustering methods

Page 21

Clustering

Page 22

UPGMA

Page 23

Neighbor-Joining Method

Page 24

N-J Method produces an Unrooted, Additive tree

Page 25

Neighbor-Joining Method: An Example

What is required for the Neighbour joining method? A distance matrix

0. Distance Matrix

PAM        Spinach   Rice    Mosquito   Monkey   Human
Spinach      0.0     84.9     105.6      90.8     86.3
Rice        84.9      0.0     117.8     122.4    122.6
Mosquito   105.6    117.8       0.0      84.7     80.8
Monkey      90.8    122.4      84.7       0.0      3.3
Human       86.3    122.6      80.8       3.3      0.0

Page 26

1. First Step

PAM distance 3.3 (Human - Monkey) is the minimum, so we join Human and Monkey into MonHum and calculate the new distances.

[Figure labels: Monkey, Human, Spinach, Mosquito, Rice; internal node Mon-Hum]

Page 27

2. Calculation of New Distances

After we have joined two species in a subtree, we have to compute the distances from every other node to the new subtree. In this example we do this with a simple average of distances (a simplification: the standard neighbor-joining algorithm uses rate-corrected distances at this step):

Dist[Spinach, MonHum] = (Dist[Spinach, Monkey] + Dist[Spinach, Human])/2 = (90.8 + 86.3)/2 = 88.55

[Figure labels: Monkey, Human, Spinach; internal node Mon-Hum]

Page 28

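3. Next Cycle

PAM        Spinach   Rice    Mosquito   MonHum
Spinach      0.0     84.9     105.6      88.6
Rice        84.9      0.0     117.8     122.5
Mosquito   105.6    117.8       0.0      82.8
MonHum      88.6    122.5      82.8       0.0

The smallest distance is now 82.8 (Mosquito - MonHum), so Mosquito is joined to MonHum.

[Figure labels: Human, Mosquito, Monkey, Spinach, Rice; internal nodes Mon-Hum and Mos-(Mon-Hum)]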

Page 29

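4. Penultimate Cycle

PAM         Spinach   Rice    MosMonHum
Spinach       0.0     84.9      97.1
Rice         84.9      0.0     120.2
MosMonHum    97.1    120.2       0.0

The smallest distance is now 84.9 (Spinach - Rice), so Spinach and Rice are joined.

[Figure labels: Human, Mosquito, Monkey, Spinach, Rice; internal nodes Mon-Hum, Mos-(Mon-Hum) and Spin-Rice]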

Page 30

5. Last Joining

PAM         SpinRice   MosMonHum
SpinRice       0.0       108.7
MosMonHum    108.7         0.0

[Figure labels: Human, Mosquito, Monkey, Spinach, Rice; internal nodes Mon-Hum, Mos-(Mon-Hum), Spin-Rice and (Spin-Rice)-(Mos-(Mon-Hum))]

Page 31

The result: Unrooted Neighbor-Joining Tree

[Figure: unrooted tree with leaves Human, Monkey, Mosquito, Rice and Spinach]
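
The worked example can be replayed with a short script. This is a sketch of the simplified procedure shown on the slides (join the closest pair, then take the simple average of the two old distances), which behaves like average-linkage clustering; it does not implement the full neighbor-joining criterion of Saitou and Nei, which selects pairs using rate-corrected distances. The function name `cluster` and the bracketed node labels are hypothetical. Intermediate values are not rounded here, so the last distance comes out near 108.6 rather than the 108.7 obtained on the slides by rounding at each step.

```python
def cluster(names, dist):
    """Repeatedly join the closest pair; dist maps frozenset({a, b}) to a distance."""
    nodes = list(names)
    dist = dict(dist)  # work on a copy
    joins = []
    while len(nodes) > 1:
        # Find the pair of current nodes with the smallest distance.
        a, b = min(
            ((x, y) for i, x in enumerate(nodes) for y in nodes[i + 1:]),
            key=lambda pair: dist[frozenset(pair)],
        )
        new = f"({a},{b})"
        joins.append((a, b, dist[frozenset((a, b))]))
        # Distance from every other node to the new subtree: simple average.
        for other in nodes:
            if other not in (a, b):
                dist[frozenset((new, other))] = (
                    dist[frozenset((a, other))] + dist[frozenset((b, other))]
                ) / 2
        nodes = [n for n in nodes if n not in (a, b)] + [new]
    return nodes[0], joins


# PAM distance matrix from the example above.
names = ["Spinach", "Rice", "Mosquito", "Monkey", "Human"]
rows = [
    [0.0, 84.9, 105.6, 90.8, 86.3],
    [84.9, 0.0, 117.8, 122.4, 122.6],
    [105.6, 117.8, 0.0, 84.7, 80.8],
    [90.8, 122.4, 84.7, 0.0, 3.3],
    [86.3, 122.6, 80.8, 3.3, 0.0],
]
dist = {
    frozenset((names[i], names[j])): rows[i][j]
    for i in range(len(names))
    for j in range(i + 1, len(names))
}

tree, joins = cluster(names, dist)
for a, b, d in joins:
    print(f"join {a} + {b} at distance {d:.2f}")
print(tree)
```

The join order matches the five steps above: Monkey with Human, then Mosquito, then Spinach with Rice, then the final join of the two subtrees.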

Page 32
Page 33

Bootstrapping

Page 34

Why are trees not exact?

Page 35

Pairwise distances usually not tree-like

Page 36
Page 37

Searching tree space

Page 38

Maximum likelihood criterion

Page 39

Parsimony criterion

Page 40

Parsimony with molecular data

Page 41

Parsimony criterion

Paul Higgs:

Page 42
Page 43

Is the best tree much better than others?

L: likelihood at nodes

Page 44

Use Maximum Likelihood to rank alternate trees

[Figure annotations: "yes" (twice); same topology; NJ tree is 2nd best]

Page 45

Use Parsimony to rank alternate trees

[Figure annotations: different topology; parsimony differentiates weakly]

Page 46

Quartet puzzling

Page 47
Page 48

MCMC: Markov chain Monte Carlo

Page 49

Topology probabilities according to MCMC

Page 50

Clade probabilities compared across tree methods

NJ method is very fast and close to being the best

Page 51

Lecture and Book

• Lecture by Paul Higgs
    • online.itp.ucsb.edu/online/infobio01/higgs/
    • see online.itp.ucsb.edu/online/infobio01/ for many lectures

• Book by Wen-Hsiong Li 李文雄
    • “Molecular Evolution” (Sinauer Associates, 1997)

Page 52

Some web sites on Molecular Evolution

• CMS Molecular Biology Resource
    • www.unl.edu/stc-95/ResTools/cmshp.html
• Phylogeny - Molecular Evolution
    • www.unl.edu/stc-95/ResTools/biotools/biotools2.html
• The Tree of Life Web Project
    • tolweb.org/tree/phylogeny.html
• Web Resources in Molecular Evolution and Systematics
    • darwin.eeb.uconn.edu/molecular-evolution.html

Page 53

Some web sites on ClustalW

• On-line service
    • www.ebi.ac.uk/clustalw/
    • clustalw.genome.ad.jp/
• Software
    • ftp-igbmc.u-strasbg.fr/pub/ClustalX/
    • ftp-igbmc.u-strasbg.fr/pub/ClustalW/

Page 54