From basic Concepts to Advanced applications Molecular Evolution & Phylogeny By Ofir Cohen The Bioinformatics Unit G.S. Wise Faculty of Life Science Tel Aviv University, Israel May 2012 http://ibis.tau.ac.il/intro_bioinfo/phylogenyWorkshop/

Page 1:

From basic Concepts to Advanced applications

Molecular Evolution & Phylogeny

By Ofir Cohen

The Bioinformatics Unit, G.S. Wise Faculty of Life Science

Tel Aviv University, Israel

May 2012

http://ibis.tau.ac.il/intro_bioinfo/phylogenyWorkshop/

Page 2:

The Human Genome Project ("behind the scenes")

Venter et al., Science 291:1304-1351 (2001)

International Human Genome Sequencing Consortium, Nature, 409: 860-921 (2001)

The club resident J.D. Watson: Back2back with DJ. Venter -

Page 3:

Genome Sequencing – Ongoing Revolution

The race is (still) on… The promise is huge…

Page 4:

SRA (Sequence Read Archive) = raw sequences from next-generation machines

Trace = raw sequences from 1990s machines

Page 5:

Darwin’s teachings – Tree-like evolution

Introduction – The tree concept

Page 6:

Darwin’s teachings – Common descent

Page 7:

Common Descent – Modern evidence

"The unity of life is no less remarkable than its diversity" (Theodosius Dobzhansky)

Page 8:

Mathematicians developed tools to analyze Trees

Adapted from Huson et al. 2008

A connected graph without cycles is a tree.

[Figure panels: Tree; Not a tree! (cycle); Rooted binary tree]

Part of the wider field of graph theory

Bridges of Königsberg
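As a rough illustration of the definition on this slide (a connected graph without cycles is a tree), the Python sketch below tests an undirected graph for that property. The function name `is_tree` and the use of integer node labels 0..n-1 are assumptions made for this example and do not come from the slides.

```python
from collections import deque


def is_tree(num_nodes, edges):
    """Return True if the undirected graph is connected and has no cycles."""
    if num_nodes == 0:
        return False
    # A connected acyclic graph on n nodes has exactly n - 1 edges.
    if len(edges) != num_nodes - 1:
        return False
    adjacency = {node: [] for node in range(num_nodes)}
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)
    # Breadth-first search from node 0; with n - 1 edges,
    # reaching every node implies there is no cycle.
    seen = {0}
    queue = deque([0])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return len(seen) == num_nodes
```

For example, `is_tree(4, [(0, 1), (1, 2), (1, 3)])` describes a small tree, while adding the edge `(2, 0)` in place of `(1, 3)` creates a cycle and leaves node 3 disconnected.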

Page 9:

What is a Phylogenetic Tree?

Phylogenetic tree (Greek: phylon = race and genetic = birth): a (hypothetical) historical pattern of evolutionary relationships among organisms

[Figure: tree of Homo, Bos, Mus, Rattus and Gallus, annotated with Root, Node, Leaf and Branch; branch lengths shown in the figure: 0.011, 0.025, 0.012, 0.011, 0.038, 0.066, 0.01; scale unit: sps]

Horizontal branch length is proportional to evolutionary distance (unit = substitutions per site)
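To make the idea of branch lengths as evolutionary distances concrete, here is a minimal Python sketch that sums branch lengths along the path between two leaves (often called the patristic distance). The topology and all branch lengths in `PARENT` are hypothetical values chosen for illustration; they are not read off the slide's figure.

```python
# Hypothetical toy tree, NOT taken from the slide's figure:
# each entry maps a node to (parent, branch length in substitutions per site).
PARENT = {
    "Mus": ("rodents", 0.01),
    "Rattus": ("rodents", 0.01),
    "rodents": ("euarch", 0.04),
    "Homo": ("euarch", 0.02),
    "euarch": ("mammals", 0.01),
    "Bos": ("mammals", 0.03),
    "mammals": ("root", 0.02),
    "Gallus": ("root", 0.07),
}


def path_to_root(node):
    """List (ancestor, distance from the start node) pairs, start node first."""
    path = [(node, 0.0)]
    distance = 0.0
    while node in PARENT:
        node, length = PARENT[node]
        distance += length
        path.append((node, distance))
    return path


def patristic_distance(leaf_a, leaf_b):
    """Sum of branch lengths on the path joining two leaves."""
    distance_from_a = dict(path_to_root(leaf_a))
    # The first ancestor of leaf_b that is also an ancestor of leaf_a
    # is their most recent common ancestor.
    for ancestor, distance_from_b in path_to_root(leaf_b):
        if ancestor in distance_from_a:
            return distance_from_a[ancestor] + distance_from_b
    raise ValueError("the two leaves share no ancestor")
```

With these made-up lengths, the two rodents are separated by a short path, while any mammal and Gallus are separated by a path that passes through the root.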

Page 10:

Molecular evidence of HIV transmission in a criminal case

Introduction - Anecdotes

Metzker, Michael L. et al. (2002) Proc. Natl. Acad. Sci. USA 99, 14292-14297

Page 11:

Criminal investigation

In August 1994, a nurse tests negative for HIV. She breaks off a messy 10-year affair with a doctor. Three weeks later, the doctor gives his ex-mistress a vitamin B-12 shot.

In January 1995, the nurse tests positive for both HIV and hepatitis C.

The doctor’s office records from that day are missing (but are eventually found). The doctor had drawn blood samples from a known HIV patient and a known hepatitis C patient on the same day as the vitamin B-12 shot. The nurse had never had contact with either patient.

Circumstantial evidence suggests that the doctor injected blood from one of his patients into his ex-girlfriend…

How can this be proved using a phylogenetic approach?

Page 12:

HIV – short background

Extreme heterogeneity: within each patient there are many different viral strains ("quasi-species")

Page 13:

History of the virus: gp120 (gene tree)

[Figure: gp120 gene tree with sequences labeled PATIENT, VICTIM and CONTROLS. ©2002 National Academy of Sciences, U.S.A.]

Page 14:

History of the virus: RT (gene tree)

[Figure: RT gene tree with sequences labeled VICTIM and PATIENT]

Source sequences that are paraphyletic with respect to the recipient sequences (that is, the recipient sequences are nested within them) provide evidence for the direction of transmission.
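The nesting argument above can be sketched in Python. The example below represents a rooted tree as nested tuples and tests whether one group of sequences is paraphyletic with respect to another. The tree, the labels `P1`..`P3`, `V1`, `V2`, `C1` and the function names are hypothetical; the criterion coded here (the recipient forms a clade, and the smallest clade holding all source sequences is exactly source plus recipient) is one simple reading of the slide's statement, not the procedure used in the published study.

```python
def leaves(tree):
    """Set of leaf labels below a node; a leaf is a plain string."""
    if isinstance(tree, str):
        return {tree}
    found = set()
    for child in tree:
        found |= leaves(child)
    return found


def clades(tree):
    """Yield the leaf set of every subtree, including single leaves."""
    yield frozenset(leaves(tree))
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def source_is_paraphyletic(tree, source, recipient):
    """True if `recipient` is a clade nested inside the `source` sequences."""
    source, recipient = frozenset(source), frozenset(recipient)
    all_clades = set(clades(tree))
    if source in all_clades:          # source alone is a clade: monophyletic
        return False
    if recipient not in all_clades:   # recipient sequences must form a clade
        return False
    smallest = min((c for c in all_clades if source <= c), key=len)
    return smallest == source | recipient


# Hypothetical toy tree: victim sequences nested within patient sequences.
toy_tree = ("C1", ("P1", ("P2", ("P3", ("V1", "V2")))))
patient = {"P1", "P2", "P3"}
victim = {"V1", "V2"}
```

On this toy tree the patient sequences are paraphyletic with respect to the victim sequences, but not the other way round, which is the asymmetry the slide uses to infer direction.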

Page 15:

Ernst Haeckel's Monophyletic tree of organisms, 1866

Reconstructing the tree of life

Page 16:

Organisms classified into 2 domains:

Eukaryotes, including {plants, animals, protists, fungi}

Prokaryotes = Bacteria

Whittaker, 1969

Page 17:

Reconstructing the Tree of Life

Carl Woese, 1977: phylogenetic taxonomy of 16S ribosomal RNA

Critiques: Woese un-balanced the tree of life… (too much representation for microbial species)

Page 18:

Phylogenetic analysis: Not only among organisms - Cancer phylogeny

A phylogeny of acute myeloid leukemia (AML) subtypes

Riester et al. 2010; Liu et al. 2009

Page 19:

Phylogenetic analysis: Not only in biology – Language evolution

Gray and Atkinson, 2003

Researchers study the evolution of languages by treating them like genomes.

Instead of COGs (gene families), they analyze COGNATES (word families)

Page 20:

Reading Trees: Which tree is more accurate?

Reading Trees

Haeckel’s pedigree of man

Human "on top" – wrong!

Page 21:

Rooted vs. Un-rooted trees

[Figure: two trees of human, mouse, fugu and Drosophila, annotated with root, edge, internal node (ancestor) and leaf; one tree carries a time axis]

Page 22:

How do we root a tree?

Gorilla gorilla (gorilla)

Homo sapiens (human)

Pan troglodytes (chimpanzee)

Gallus gallus (chicken)

Page 23:

Rooting based on a priori knowledge: using an outgroup

[Figure: two trees of Human, Chimp, Gorilla and Chicken]
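Outgroup rooting can be sketched in a few lines of Python: the root is placed on the branch that leads to the outgroup. The adjacency-list representation, the internal node names `x` and `y`, and the function name `root_with_outgroup` are assumptions made for this example; the grouping of Human with Chimp reflects the accepted primate relationships rather than anything read from the slide's figure.

```python
def root_with_outgroup(adjacency, outgroup):
    """Root an unrooted tree on the branch leading to `outgroup`.

    `adjacency` maps every node to the list of its neighbours.
    Returns the rooted tree as a Newick string (topology only).
    """
    (neighbour,) = adjacency[outgroup]  # a leaf has exactly one neighbour

    def subtree(node, came_from):
        children = [n for n in adjacency[node] if n != came_from]
        if not children:                # leaf: just its label
            return node
        return "(" + ",".join(subtree(c, node) for c in children) + ")"

    return "(" + subtree(neighbour, outgroup) + "," + outgroup + ");"


# Unrooted four-taxon tree with two (hypothetically named) internal nodes.
unrooted = {
    "Human": ["x"],
    "Chimp": ["x"],
    "Gorilla": ["y"],
    "Chicken": ["y"],
    "x": ["Human", "Chimp", "y"],
    "y": ["x", "Gorilla", "Chicken"],
}
```

Calling `root_with_outgroup(unrooted, "Chicken")` places the root between Chicken and the three apes, which matches the a priori knowledge that the chicken diverged first.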

Page 24:

Comparative Genomics – "All life is one"

Compare homologous sequences – Multiple Sequence Alignments

Page 25:

Orthologs

[Figure: a speciation event splits an ancestor into descendant 1 (e.g., human) and descendant 2 (e.g., dog)]

Orthologs typically retain the same or a similar function in the course of evolution.

Page 26:

Paralogs

Duplication

Evolutionary innovation: without the original selective pressure on one copy, that copy is free to mutate and acquire new functions.

Page 27:

Alignment and phylogeny are mutually dependent

[Figure: diagram with the labels Unaligned sequences, Sequence alignment, MSA, Phylogeny reconstruction and Inaccurate tree building; tree scale bar: 0.4]

Page 28:

Part II: Tools

Page 29:

Multiple sequence alignment (MSA)

Several advanced MSA programs are available. Today we will use two:

MAFFT – fast and relatively accurate
PRANK – distinct from all other MSA programs because of its correct treatment of insertions/deletions

Tools - Alignments

Page 30:

MAFFT

Web server (& download option): http://mafft.cbrc.jp/alignment/server/index.html

Efficiency-tuned variants: quick & dirty or slow but accurate

Kazutaka Katoh, Kazuharu Misawa, Kei-ichi Kuma and Takashi Miyata. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research, 2002, Vol. 30, No. 14, 3059-3066. © 2002 Oxford University Press

Page 31:

Choosing a MAFFT strategy

[Figure: MAFFT strategy options, ordered from quick & dirty to slow but accurate]
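The workshop uses the MAFFT web server, but the same speed/accuracy choice exists on the command line. The Python sketch below assumes a locally installed `mafft` executable, which the slides do not mention. The flag combinations are the ones the MAFFT manual gives for automatic selection (`--auto`), FFT-NS-1 and L-INS-i; the strategy keys `"auto"`, `"quick"` and `"accurate"` and the function names are labels invented for this example.

```python
import shutil
import subprocess

# Keys are labels invented for this sketch; values are MAFFT command-line flags.
MAFFT_STRATEGIES = {
    "auto": ["--auto"],                                   # let MAFFT choose
    "quick": ["--retree", "1", "--maxiterate", "0"],      # FFT-NS-1, quick & dirty
    "accurate": ["--localpair", "--maxiterate", "1000"],  # L-INS-i, slow but accurate
}


def mafft_command(input_fasta, strategy="auto"):
    """Build the argument list for one MAFFT run."""
    return ["mafft", *MAFFT_STRATEGIES[strategy], input_fasta]


def run_mafft(input_fasta, output_fasta, strategy="auto"):
    """Run MAFFT and write the alignment (FASTA) to `output_fasta`."""
    if shutil.which("mafft") is None:
        raise RuntimeError("mafft executable not found on PATH")
    with open(output_fasta, "w") as handle:
        subprocess.run(mafft_command(input_fasta, strategy),
                       stdout=handle, check=True)
```

A call such as `run_mafft("fahA.fas", "fahA.mafft.fas", "accurate")` would trade running time for accuracy; the output file name here is only an example.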

Page 32:

MAFFT output

Saving the output:
Choose a format (Clustal or Fasta), or click "Reformat" to convert to a selection of other formats
Save the page as a text file

A colored view of the alignment

Page 33:

PRANK

Page 34:

Classical alignment errors for HIV env

[Figure: CLUSTALW alignment vs. PRANK alignment]

Page 35:

PRANK Web server: http://www.ebi.ac.uk/goldman-srv/webPRANK/

Page 36:

PRANK output

If you need a different format – copy the results to the READSEQ sequence converter: http://www-bimas.cit.nih.gov/molbio/readseq/

Page 37:

1. Download the sequence files from the web site: http://ibis.tau.ac.il/intro_bioinfo/phylogenyWorkshop/
   Open "fahA.fas" in Notepad/Browser – these are 65 protein sequences in FASTA format.

2. Run the PRANK web server: http://www.ebi.ac.uk/goldman-srv/webPRANK/

(1)
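Before submitting the file, it can be useful to confirm that it really is valid FASTA and contains the expected number of sequences. The following is a minimal Python sketch of a FASTA reader; the function name `parse_fasta` is an assumption, and the parser keeps only the first word of each header line as the sequence name.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {name: sequence} dictionary."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            if not header:
                raise ValueError("empty FASTA header")
            name = header[0]            # first word of the header line
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[name].append(line)  # sequences may span several lines
    return {key: "".join(parts) for key, parts in records.items()}


example = ">seqA first protein\nMKL\nVV\n>seqB\nMK\n"
```

For the exercise file, `len(parse_fasta(open("fahA.fas").read()))` would be expected to give 65 if the file matches the description above. Note that records with identical names overwrite each other in this sketch.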

Page 38:

Tree Reconstruction Methods

Page 39:

Phylogeny reconstruction

Different approaches (algorithms / programs):

Distance-based methods (e.g. neighbor-joining, as in ClustalW): fast but inaccurate
Maximum parsimony (e.g. MEGA)
Maximum likelihood methods (e.g. PhyML, RAxML): accurate but slower
Bayesian methods (e.g. MrBayes): most accurate but very slow

[Figure: distance-based workflow for sequences A, B, C, D, E with the labels Pairwise distance table, Guide tree and MSA]

Tools - Trees
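Distance-based methods start from a pairwise distance table. As a minimal sketch of that first step, the Python code below computes the simplest such distance, the p-distance (the proportion of differing sites), for every pair of aligned sequences. The choice of p-distance, the skipping of gapped positions and the toy alignment are assumptions made for this example; real programs usually apply a substitution-model correction on top of this.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, ignoring positions with a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return differing / compared


def distance_table(alignment):
    """Map each pair of sequence names to its p-distance."""
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }


# Hypothetical toy alignment.
toy_alignment = {"A": "ATCTG", "B": "ATCTA", "C": "AC-TA"}
```

A neighbor-joining implementation would take a table like `distance_table(toy_alignment)` as its input and repeatedly join the closest pair of nodes.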

Page 40:

PhyML

The most widely used maximum likelihood (ML) program

Web server (& download): http://www.atgc-montpellier.fr/phyml/

Page 41:

RAxML

Web server: http://phylobench.vital-it.ch/raxml-bb/

Maximum likelihood (ML) methodology similar to PhyML, but much faster

Faster results with bootstrap

Page 42:

Bootstrapping

Now we have a tree, but how reliable is it?

Page 43:

Bootstrap

A. Generate pseudo-data sets by sampling N alignment positions (columns) with replacement. Do not change the number of sequences. Resample 100-1000 times.

Original alignment, positions 1 2 3 4 5 … 100:
1 : ATCTG…A
2 : ATCTG…C
3 : ACTTA…C
4 : ACCTA…T

Pseudo-data set, sampled positions 1 1 2 4 4 … x:
1 : AATTT…T
2 : AATTT…G
3 : AACTT…T
4 : AACTT…T

Pseudo-data set, sampled positions 4 7 7 8 9 … x:
1 : TTTAT…T
2 : TAACC…G
3 : TAACC…T
4 : TGGGA…T

Pseudo-data set, sampled positions 1 5 5 7 8 … x:
1 : AGGTA…T
2 : AGGAC…G
3 : AAAAC…A
4 : AAAGG…C
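The resampling in step A can be sketched in Python as follows. Columns of the alignment are drawn with replacement, so some positions appear several times and others not at all, while the number of sequences stays the same. The function name `bootstrap_alignment` and the dictionary representation of the alignment are assumptions made for this example.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement.

    Returns the pseudo-data set and the list of sampled column indices.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all sequences must have the same aligned length")
    # Draw as many columns as the alignment has, with replacement.
    columns = [rng.randrange(length) for _ in range(length)]
    resampled = {
        name: "".join(alignment[name][c] for c in columns) for name in names
    }
    return resampled, columns


# First five columns of the alignment shown above.
alignment = {"1": "ATCTG", "2": "ATCTG", "3": "ACTTA", "4": "ACCTA"}
pseudo, sampled_columns = bootstrap_alignment(alignment, random.Random(0))
```

Repeating the call 100-1000 times with the same `rng` object yields the collection of pseudo-data sets from which the bootstrap trees are built.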

Page 44:

Bootstrap

B. Reconstruct a tree from each data set.

[Figure: the three pseudo-data sets from step A, each shown with the tree of Sp1, Sp2, Sp3 and Sp4 reconstructed from it]

Page 45:

Bootstrap

C. Compute the majority-rule consensus.

[Figure: three bootstrap trees of Sp1, Sp2, Sp3 and Sp4 and their consensus tree, with support values 67% and 100%]

In 67% of the data sets, the split between Sp1+Sp2 and the rest of the tree was found.
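The support value in step C is simply the fraction of bootstrap trees that contain a given grouping. The Python sketch below counts this for rooted trees written as nested tuples. The three toy trees are hypothetical and chosen so that two of the three contain the Sp1+Sp2 group; treating the trees as rooted is a simplifying assumption, since the slide speaks of splits in unrooted trees.

```python
def clade_sets(tree):
    """Return (leaf set of this subtree, set of all clades inside it)."""
    if isinstance(tree, str):           # a leaf is a plain string
        return frozenset([tree]), set()
    subtree_leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clade_sets(child)
        subtree_leaves |= child_leaves
        found |= child_clades
    found.add(subtree_leaves)
    return subtree_leaves, found


def bootstrap_support(clade, trees):
    """Fraction of trees in which `clade` appears as a group."""
    clade = frozenset(clade)
    hits = sum(1 for tree in trees if clade in clade_sets(tree)[1])
    return hits / len(trees)


# Hypothetical bootstrap trees: two of three group Sp1 with Sp2.
bootstrap_trees = [
    (("Sp1", "Sp2"), ("Sp3", "Sp4")),
    (("Sp1", "Sp2"), ("Sp3", "Sp4")),
    (("Sp1", "Sp3"), ("Sp2", "Sp4")),
]
```

Here `bootstrap_support({"Sp1", "Sp2"}, bootstrap_trees)` is two thirds, which rounds to the 67% quoted on the slide; a majority-rule consensus keeps every group whose support exceeds 50%.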

Page 46:

1. Give "fahA.prank.phylip" or "fahA.mafft.phylip" as input to the RAxML webserver (don't forget to tick "Protein sequences" and “Maximum likelihood search” and enter your email)

(3)
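The RAxML input files in this exercise are in PHYLIP format. If only a FASTA alignment is at hand, a conversion along the following lines may help. This Python sketch writes the "relaxed" sequential PHYLIP variant (a header line with the number of sequences and sites, then one `name sequence` line per sequence); the function name `to_relaxed_phylip` is an assumption, and whether a particular server accepts the relaxed variant rather than strict 10-character names should be checked against its documentation.

```python
def to_relaxed_phylip(alignment):
    """Format a {name: aligned sequence} dictionary as relaxed PHYLIP text."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    (n_sites,) = lengths
    lines = [f"{len(alignment)} {n_sites}"]   # header: sequences, sites
    for name, seq in alignment.items():
        if any(ch.isspace() for ch in name):
            raise ValueError(f"name contains whitespace: {name!r}")
        lines.append(f"{name} {seq}")
    return "\n".join(lines) + "\n"


# Hypothetical two-sequence protein alignment.
phylip_text = to_relaxed_phylip({"seq1": "MK-L", "seq2": "MKAL"})
```

The resulting text can be saved to a file and uploaded in place of the prepared `.phylip` files.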

Page 47:

FigTree: tree visualization and figure creation

http://tree.bio.ed.ac.uk/software/figtree/

Manipulate a node

Manipulate a clade

Manipulate a taxon

Page 48:

1. In case the trees are not ready yet, download the tree from the website

2. Open "fahA.prank.phylip_phyml_tree.txt" in FigTree http://tree.bio.ed.ac.uk/software/figtree/

3. Play around with the different options and make a pretty figure!
   a. Find out how to color specific clades, as below
   b. Try each of the three options under "Layout"

4. Export a figure in PDF format (File > Export Graphic…)

(4)

Page 49:

Final Questions…