phylogenetics - university of notre damecse/2013fa/40532/lectures/lecture18.pdf · • based on...

Phylogenetics

The basics

•  In general, the closer two species are evolutionarily, the more similar their genomes will be (Anthrax homework example)

•  We will assume all life comes from a common ancestor

•  Relationships can be illustrated using trees
–  Phylogenetics’ task is to infer these trees

Tree of life (mid 19th century)

http://www.ucmp.berkeley.edu/education/events/eukevol.html

Interesting facts

•  Darwin's finches comprise 14 species.

•  They descend from one common ancestor that arrived in the Galápagos archipelago within the past 2-3 million years.

•  “Incipient” species: nothing biological keeps them from forming hybrids, but hybridization happens very rarely in the wild because of differences in mating calls.

Try it out!

Source: Warren Ewens U. of Penn

Background

•  Phylogenetics comes from phylogeny (evolutionary history)

•  Phylogenetics can be morphological or molecular
–  The paper clips example is morphological
–  Sequence alignments are molecular

•  Two areas of research:
–  Molecular systematics: infer the “tree of life”
–  Molecular evolution: understand molecules in the context of related species (e.g., Ka/Ks)

Building trees

•  Fitch and Margoliash (1967) Construction of phylogenetic trees. Science 155, 279-284.

• Based on protein sequences

Tree of life (mid 19th century)

http://www.ucmp.berkeley.edu/education/events/eukevol.html

Carl Woese

Early 16S rRNA tree

Phylogenetic tree basics

•  Nodes: points that connect branches

•  Branches: lines that connect nodes

•  Taxa: things being compared

•  Rooted tree: one node is the base

•  Unrooted tree: no explicit starting point
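
To make the vocabulary concrete, here is a minimal sketch that stores a rooted tree as nested Python tuples. The encoding and the taxon names are assumptions chosen for illustration; they do not come from the lecture.

```python
# A rooted tree as nested tuples: a string is a taxon (a leaf),
# a tuple is an internal node whose items are its children.
# The taxon names are placeholders, not data from the lecture.
tree = (("A", "B"), ("C", ("D", "E")))


def taxa(node):
    """Return the leaf labels (taxa) below a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [name for child in node for name in taxa(child)]


def count_internal(node):
    """Count internal nodes at or below a node (the root included)."""
    if isinstance(node, str):
        return 0
    return 1 + sum(count_internal(child) for child in node)


def count_branches(node):
    """Count branches: one for every parent-child connection."""
    if isinstance(node, str):
        return 0
    return len(node) + sum(count_branches(child) for child in node)


print(taxa(tree))
print(count_internal(tree))
print(count_branches(tree))
```

The outermost tuple plays the role of the root. An unrooted tree would need a different encoding (for example an adjacency list), because no node is singled out as the base.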

Rooted vs unrooted

Rooted Unrooted

Tree styles

•  Trees can be thought of as a mobile

•  Internal nodes represent common ancestors

Getting it right

•  Some but not all distance-based methods:
–  UPGMA
–  Neighbor joining
–  Invariant
–  Dollo
–  Wagner
–  Sokal
–  …

Why phylogenetics is hard

•  Number of unrooted trees for n > 2 taxa:

$$N_U(n) = \frac{(2n-5)!}{2^{n-3}\,(n-3)!}$$

•  Number of rooted trees for n > 1 taxa:

$$N_R(n) = \frac{(2n-3)!}{2^{n-2}\,(n-2)!}$$

•  Example: 34,459,425 rooted trees (and 2,027,025 unrooted trees) for only ten taxa
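
The two counting formulas can be checked numerically. The sketch below is a direct transcription into Python; the function names are illustrative. For ten taxa it evaluates to 2,027,025 unrooted and 34,459,425 rooted trees.

```python
from math import factorial


def unrooted_trees(n):
    """Number of unrooted binary trees on n > 2 taxa: (2n-5)! / (2^(n-3) (n-3)!)."""
    if n < 3:
        raise ValueError("the unrooted formula needs at least 3 taxa")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


def rooted_trees(n):
    """Number of rooted binary trees on n > 1 taxa: (2n-3)! / (2^(n-2) (n-2)!)."""
    if n < 2:
        raise ValueError("the rooted formula needs at least 2 taxa")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


for n in (3, 4, 5, 10):
    print(n, unrooted_trees(n), rooted_trees(n))
```

Note that the rooted count for n taxa equals the unrooted count for n + 1 taxa, since a root can be placed on any branch of an unrooted tree.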

Interesting study

•  http://www.pnas.org/content/99/22/14292

•  First time phylogenetics was used in a criminal court case in the U.S.

Case study: SARS

•  The text outlines the story of the origin of SARS and how phylogenetics played a role

•  In the beginning, there was an outbreak in Vietnam; within only a few weeks, a WHO official and five hospital workers were dead

Back story

•  SARS actually first appeared in Guangdong province, China, in late 2002

•  Most people got sick in a hospital; one doctor from that hospital visited Hong Kong

•  Travelers staying on the same hotel floor as the doctor, who later died, then got sick

•  These people brought it to other places in the world

Genomics

•  This is a good example of how epidemiology and virology benefit from the tools and algorithms in this course.

•  Soon we will discuss:
–  What kind of virus caused the epidemic?
–  What species is it from?
–  Where did it start?

But first….

•  Let's review the basics of inferring phylogenies

•  All molecular reconstruction methods assume you start with a set of aligned sequences

•  The alignment provides the homology information we need, so getting it right is critical.
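
As a hedged illustration of why the alignment matters, the sketch below turns aligned sequences into pairwise distances using the p-distance (the fraction of aligned sites that differ). The toy sequences, the taxon names, and the choice to skip gapped sites are all assumptions made for this example.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned sites that differ, skipping sites where either row has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


# Toy alignment (hypothetical data): every row has the same length,
# so column i is assumed to hold homologous sites across all taxa.
alignment = {
    "taxon1": "ACGTACGTAC",
    "taxon2": "ACGTACGTTC",
    "taxon3": "ACGAACG-TC",
}

distances = {
    (s, t): p_distance(alignment[s], alignment[t])
    for s, t in combinations(alignment, 2)
}
for pair, d in distances.items():
    print(pair, round(d, 3))
```

If the columns were misaligned, the same code would happily compare non-homologous sites and report meaningless distances, which is why the alignment step comes first.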

Approach

•  Main ways to build a tree:
–  Discrete (per site)
–  Distance (convert into pairwise distance)
–  Combination (make a tree from a bunch)
–  Optimal (looks at all possible trees)
–  Statistical (e.g., maximum likelihood)

Other species trees

Darwin’s Finches

Primates

http://members.aol.com/darwinpage/trees.htm

Example gene tree

Lodish et al. (2000)

Basic construction approaches

•  Distance
–  Tree accounts for evolutionary distances estimated from data

•  Parsimony
–  Tree that requires the minimum amount of change to explain the data

•  Maximum likelihood
–  Tree that maximizes the likelihood of the data
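
To show what "minimum amount of change" means in practice, here is a small sketch that scores a fixed tree under parsimony. It uses Fitch's counting procedure, a standard choice that these slides do not name, so treat that as an assumption. The nested-tuple tree encoding and the four toy sequences are also hypothetical.

```python
def fitch(node, states):
    """Return (possible ancestral states, number of changes) for one site.

    node is a taxon name (leaf) or a pair (left, right) of subtrees;
    states maps each taxon name to its character at this site.
    """
    if isinstance(node, str):
        return {states[node]}, 0
    left, right = node
    lset, lcost = fitch(left, states)
    rset, rcost = fitch(right, states)
    common = lset & rset
    if common:
        # The children can agree on a state: no extra change is needed here.
        return common, lcost + rcost
    # The children disagree: one change must occur on a branch below this node.
    return lset | rset, lcost + rcost + 1


def parsimony_score(tree, alignment):
    """Sum the per-site change counts over all columns of the alignment."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(length)
    )


# Hypothetical aligned sequences for four taxa.
alignment = {"A": "AAG", "B": "AAG", "C": "CAT", "D": "CGT"}

tree1 = (("A", "B"), ("C", "D"))
tree2 = (("A", "C"), ("B", "D"))
print(parsimony_score(tree1, alignment))
print(parsimony_score(tree2, alignment))
```

On this toy data the first tree needs fewer changes than the second, so parsimony prefers it. A full parsimony search would repeat this scoring over many candidate trees.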

Try it out!

Source: Warren Ewens U. of Penn

Inference

•  We infer trees because we don’t really know all the species, especially the ancestors represented by internal nodes.

•  Today, we’ll discuss simple approaches for phylogenetic tree inference based on distance.
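
As one example of a simple distance-based approach, the sketch below implements UPGMA, which appears in the list of methods earlier in the lecture. Whether the lecture develops exactly this variant is an assumption. The function name, the nested-tuple output, and the toy distance matrix are hypothetical, and branch lengths are omitted to keep the example short.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a rooted tree (nested tuples) by repeatedly merging the closest clusters.

    dist maps (name_i, name_j) pairs, in either order, to a distance.
    """
    d = dict(dist)

    def lookup(x, y):
        return d[(x, y)] if (x, y) in d else d[(y, x)]

    size = {name: 1 for name in names}  # cluster -> number of taxa it contains
    while len(size) > 1:
        nodes = list(size)
        # Pick the pair of current clusters with the smallest distance.
        a, b = min(combinations(nodes, 2), key=lambda pair: lookup(*pair))
        merged = (a, b)
        for c in nodes:
            if c != a and c != b:
                # Size-weighted average, so every original taxon counts equally.
                d[(merged, c)] = (
                    size[a] * lookup(a, c) + size[b] * lookup(b, c)
                ) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size))


# Hypothetical pairwise distances for four taxa.
toy = {
    ("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
    ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10,
}
print(upgma(["A", "B", "C", "D"], toy))
```

UPGMA assumes a constant rate of change along all lineages; neighbor joining relaxes that assumption at the cost of a slightly more involved merge criterion.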

A sample tree

Reed et al. (2004)

Lice support the hypothesis that Homo erectus and Homo sapiens were separate for a period of time

Preparing for Next Week

•  Input:
–  Data from a set of genes/species

•  Output:
–  A phylogenetic tree that accurately characterizes the respective lineages

SARS

•  The genome of SARS was sequenced by a Canadian group in April 2003

•  29,751 bp, single-stranded RNA genome

•  Has 5-6 genes in the typical structure of a coronavirus
–  Coronaviruses are one of the causes of the common cold

Where did SARS come from?

Himalayan palm civet

Neighbor joining can also be used to study epidemiology

Date of origin

•  Genetic distance scales almost linearly with time

•  Authors of the text estimate SARS jumped to humans around September 16, 2002
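
If genetic distance grows roughly linearly with time, a date of origin can be estimated by fitting a straight line to distance versus sampling date and extrapolating back to distance zero. The sketch below shows that idea with an ordinary least-squares fit. The sample dates and distances are invented for illustration; this is not the data or necessarily the method used by the authors of the text.

```python
from datetime import date


def estimate_origin(samples):
    """Fit distance = rate * (day - origin) and return (origin date, rate per day).

    samples is a list of (sampling date, genetic distance from the ancestor).
    """
    xs = [day.toordinal() for day, _ in samples]
    ys = [dist for _, dist in samples]
    n = len(samples)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    rate = sxy / sxx                       # slope: substitutions per site per day
    origin_day = mean_x - mean_y / rate    # where the fitted line crosses distance 0
    return date.fromordinal(round(origin_day)), rate


# Invented data: three isolates with made-up sampling dates and distances.
samples = [
    (date(2003, 1, 9), 0.0010),
    (date(2003, 2, 28), 0.0015),
    (date(2003, 4, 19), 0.0020),
]
origin, rate = estimate_origin(samples)
print(origin, rate)
```

With real data the points scatter around the line, so the estimate carries uncertainty that a single extrapolated date does not show.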