why do trees?


Upload: albany

Post on 05-Feb-2016


TRANSCRIPT

Page 1: Why do trees?

Why do trees?

 

Page 2: Why do trees?

Phylogeny 101

• OTUs: operational taxonomic units (species, populations, individuals)

• Nodes: internal (often ancestors)

• Nodes: external (terminal; often living species or individuals)

• Branches: length scaled

• Branches: length unscaled, nominal, arbitrary

• Outgroup: an OTU that is most distantly related to all the other OTUs in the study.

Page 3: Why do trees?

Phylogeny 102

• Trees rooted: $N = \frac{(2n-3)!}{2^{n-2}\,(n-2)!}$

• Trees unrooted: $N = \frac{(2n-5)!}{2^{n-3}\,(n-3)!}$

OTUs   #rooted trees   #unrooted trees
2      1               1
3      3               1
4      15              3
5      105             15
6      945             105
7      10395           945
8      135135          10395
9      2027025         135135
10     34459425        2027025
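The two counting formulas can be checked with a few lines of Python. This is a minimal sketch: the function names are made up for illustration, and the n = 2 case for unrooted trees is set to 1 by convention, because the formula is only defined from n = 3.

```python
from math import factorial


def rooted_trees(n):
    """Number of rooted bifurcating trees for n OTUs: (2n-3)! / (2^(n-2) (n-2)!)."""
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def unrooted_trees(n):
    """Number of unrooted bifurcating trees for n OTUs: (2n-5)! / (2^(n-3) (n-3)!)."""
    if n < 3:
        return 1  # assumption: two OTUs can only be joined one way
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


if __name__ == "__main__":
    for n in range(2, 11):
        print(n, rooted_trees(n), unrooted_trees(n))
```

Note that the rooted count for n OTUs equals the unrooted count for n + 1 OTUs, since a root behaves like one extra terminal branch.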

Page 4: Why do trees?

Trees NJ

• Distance matrix

• UPGMA assumes constant rate of evolution – molecular clock: don’t publish UPGMA trees

• Neighbor joining is very fast

• Often a “good enough” tree

• Embedded in ClustalW

• Use in publications only if too many taxa to compute with MP or ML
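As a rough illustration of why neighbor joining is fast, here is a minimal sketch of the topology-only algorithm. It is not the ClustalW implementation: branch lengths are left out, ties are broken arbitrarily, and the function name and the example matrix are assumptions for illustration.

```python
def neighbor_joining(names, dist):
    """Return an unrooted Newick topology built by neighbor joining.

    names: list of OTU labels.
    dist:  dict mapping frozenset({a, b}) -> distance between a and b.
    Branch lengths are not computed in this sketch.
    """
    nodes = list(names)
    d = dict(dist)
    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence of each node from all the others.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Join the pair that minimizes Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        _, i, j = min(
            ((n - 2) * d[frozenset((i, j))] - r[i] - r[j], i, j)
            for a, i in enumerate(nodes)
            for j in nodes[a + 1:]
        )
        u = f"({i},{j})"
        # Distance from the new internal node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = 0.5 * (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - d[frozenset((i, j))]
                )
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    return "(" + ",".join(nodes) + ");"


if __name__ == "__main__":
    # Made-up five-taxon distance matrix, for illustration only.
    raw = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
           ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
           ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
    matrix = {frozenset(pair): value for pair, value in raw.items()}
    print(neighbor_joining(list("abcde"), matrix))
```

Each round joins one pair and shrinks the matrix by one row, so the whole tree needs only n - 3 rounds, in contrast to MP or ML, which search among many candidate topologies.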

Page 5: Why do trees?

Distances from sequence

• Protdist/DNAdist

• Non-identical residues/total sequence length

• Correction for multiple hits necessary because two identical residues may hide changes such as C -> T -> C

• Jukes-Cantor assumes all subs equally likely

• Kimura: transition rate ≠ transversion rate

• Ts usually > Tv
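The distance corrections listed above can be sketched in a few lines. This is not the DNAdist code: the function names are hypothetical, sites with a gap in either sequence are skipped (an assumption, since the slide divides by total sequence length), and the standard Jukes-Cantor and Kimura two-parameter formulas are used.

```python
import math

PURINES = {"A", "G"}


def site_pairs(a, b):
    """Aligned residue pairs, skipping sites with a gap (assumption)."""
    return [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]


def p_distance(a, b):
    """Uncorrected distance: non-identical residues / compared sites."""
    pairs = site_pairs(a, b)
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(a, b):
    """JC distance: d = -3/4 ln(1 - 4p/3). Undefined once p reaches 0.75."""
    p = p_distance(a, b)
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura_2p(a, b):
    """K2P distance from transition (P) and transversion (Q) proportions."""
    pairs = site_pairs(a, b)
    ts = sum(x != y and (x in PURINES) == (y in PURINES) for x, y in pairs)
    tv = sum(x != y and (x in PURINES) != (y in PURINES) for x, y in pairs)
    P, Q = ts / len(pairs), tv / len(pairs)
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)


if __name__ == "__main__":
    s1, s2 = "AGATATCCA", "AGAGATCCG"
    print(p_distance(s1, s2), jukes_cantor(s1, s2), kimura_2p(s1, s2))
```

Both corrected distances come out larger than the raw proportion of differences, which is the point of correcting for multiple hits.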

Page 6: Why do trees?

Trees MP

• Maximum parsimony

• Minimum # mutations to construct tree

• Better than NJ – information lost in distance matrix – but much slower

• Sensitive to long-branch attraction

• No explicit evolutionary model

• Protpars refuses to estimate branch lengths

• Informative sites

Page 7: Why do trees?

Trees ML

• Very CPU intensive

• Requires explicit model of evolution – rate and pattern of nucleotide substitution
– JC: Jukes/Cantor
– K2P: Kimura 2 parameter, transition/transversion
– F81: Felsenstein, base composition bias
– HKY85: merges K2P and F81

• Explicit model -> preferred statistically

• Assumes change more likely on long branch

• Much less sensitive to long-branch attraction than MP

• Wrong model -> wrong tree

Page 8: Why do trees?

Models of sequence evolution

HKY85

[Figure: HKY85 substitution matrix. Each base can change to the other three: A -> C, G, T; C -> A, G, T; G -> A, C, T; T -> A, C, G]
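To make the HKY85 model concrete, the sketch below builds its instantaneous rate matrix in the usual textbook form: the rate from base x to base y is proportional to the frequency of y, multiplied by kappa when the change is a transition. The function name, the unnormalized scaling, and the example frequencies are assumptions for illustration.

```python
BASES = "ACGT"
PURINES = {"A", "G"}


def is_transition(x, y):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return x != y and (x in PURINES) == (y in PURINES)


def hky85_rate_matrix(pi, kappa):
    """Unnormalized HKY85 rate matrix as a dict of dicts.

    pi:    dict of base frequencies (the F81 part: base composition bias).
    kappa: transition/transversion rate ratio (the K2P part).
    """
    q = {}
    for x in BASES:
        row = {
            y: (kappa if is_transition(x, y) else 1.0) * pi[y]
            for y in BASES
            if y != x
        }
        row[x] = -sum(row.values())  # each row sums to zero
        q[x] = row
    return q


if __name__ == "__main__":
    freqs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}  # made-up frequencies
    for base, row in hky85_rate_matrix(freqs, kappa=2.0).items():
        print(base, [round(row[y], 3) for y in BASES])
```

With kappa = 1 and equal base frequencies every off-diagonal rate is the same, which is the Jukes-Cantor model. With equal frequencies but kappa ≠ 1 the matrix reduces to K2P, and with kappa = 1 but unequal frequencies it reduces to F81.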

 

Page 9: Why do trees?

Here we have a representative alignment. We want to determine the phylogenetic relationships among the OTUs:

Site: 1 2 3 4 5 6 7 8 9
OTU1  A A G A G T G C A
OTU2  A G C C G T G C G
OTU3  A G A T A T C C A
OTU4  A G A G A T C C G
              *   *   *

It is a good alignment: homologous sites are clearly aligned, without gaps.

Page 10: Why do trees?

There are 3 possible trees for 4 taxa (OTUs):

1     3        1     2        1     2
 \___/          \___/          \___/
 /   \          /   \          /   \
2     4        3     4        4     3

Or: (1,2)(3,4), (1,3)(2,4) and (1,4)(2,3)

Aim to identify (phylogenetically) informative sites and use these to determine which tree is most parsimonious.
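One common rule of thumb is that a site is parsimony-informative when at least two different states each occur in at least two OTUs. A minimal sketch of that test on the example alignment (the function name is an assumption):

```python
from collections import Counter

ALIGNMENT = {
    "OTU1": "AAGAGTGCA",
    "OTU2": "AGCCGTGCG",
    "OTU3": "AGATATCCA",
    "OTU4": "AGAGATCCG",
}


def informative_sites(seqs):
    """Return 1-based positions where >= 2 states each occur in >= 2 sequences."""
    sites = []
    for position, column in enumerate(zip(*seqs), start=1):
        counts = Counter(column)
        shared_states = sum(1 for c in counts.values() if c >= 2)
        if shared_states >= 2:
            sites.append(position)
    return sites


if __name__ == "__main__":
    print(informative_sites(ALIGNMENT.values()))
```

This rule treats all substitutions equally, so it does not cover weighting schemes in which transitions and transversions count differently.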

Page 11: Why do trees?

The identical sites 1, 6, 8 are useless for phylogenetic purposes.

 

Site: 1 2 3 4 5 6 7 8 9

OTU1 A A G A G T G C A

OTU2 A G C C G T G C G

OTU3 A G A T A T C C A

OTU4 A G A G A T C C G

* * *

Page 12: Why do trees?

Site 2 also useless: OTU1’s A could be grouped with any of the Gs.

Site: 1 2 3 4 5 6 7 8 9
OTU1  A A G A G T G C A
OTU2  A G C C G T G C G
OTU3  A G A T A T C C A
OTU4  A G A G A T C C G
              *   *   *

Page 13: Why do trees?

Site 4 is uninformative as each base is different, UNLESS transitions are weighted, in which case it favors (1,4)(2,3).

Site: 1 2 3 4 5 6 7 8 9

OTU1 A A G A G T G C A

OTU2 A G C C G T G C G

OTU3 A G A T A T C C A

OTU4 A G A G A T C C G

* * *

Page 14: Why do trees?

For site 3 each tree can be made with (minimum) 2 mutations:

Site: 1 2 3 4 5 6 7 8 9

OTU1 A A G A G T G C A

OTU2 A G C C G T G C G

OTU3 A G A T A T C C A

OTU4 A G A G A T C C G

* * *

Page 15: Why do trees?

(1,2)(3,4)

G       A        G       A        G       A
 \     /          \     /          \     /
  G---A            C---A            A---A
 /     \          /     \          /     \
C       A        C       A        C       A

Page 16: Why do trees?

(1,3)(2,4)

                 can do worse:
G       C        G       C
 \     /          \     /
  A---A            G---A
 /     \          /     \
A       A        A       A

Page 17: Why do trees?

(1,4)(2,3)

G       C
 \     /
  A---A
 /     \
A       A

So site 3 is (counterintuitively) NOT informative.

Page 18: Why do trees?

Site 5, however, is informative because one tree is shortest.

Site: 1 2 3 4 5 6 7 8 9

OTU1 A A G A G T G C A

OTU2 A G C C G T G C G

OTU3 A G A T A T C C A

OTU4 A G A G A T C C G

* * *

Page 19: Why do trees?

(1,2)(3,4)       (1,3)(2,4)       (1,4)(2,3)

G       A        G       G        G       G
 \     /          \     /          \     /
  G---A            A---A            G---G
 /     \          /     \          /     \
G       A        A       A        A       A

Page 20: Why do trees?

Likewise sites 7 and 9. By majority rule, the most parsimonious tree is (1,2)(3,4), supported by 2 of the 3 informative sites (sites 5 and 7; site 9 supports (1,3)(2,4)).

Site: 1 2 3 4 5 6 7 8 9

OTU1 A A G A G T G C A

OTU2 A G C C G T G C G

OTU3 A G A T A T C C A

OTU4 A G A G A T C C G

* * *
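The site-by-site reasoning can be reproduced by counting, for each of the three four-taxon trees, the minimum number of changes per site and summing over the alignment. The sketch below uses Fitch-style set counting on a quartet; the function names are assumptions, and all changes are weighted equally.

```python
ALIGNMENT = {
    1: "AAGAGTGCA",
    2: "AGCCGTGCG",
    3: "AGATATCCA",
    4: "AGAGATCCG",
}

TOPOLOGIES = {
    "(1,2)(3,4)": ((1, 2), (3, 4)),
    "(1,3)(2,4)": ((1, 3), (2, 4)),
    "(1,4)(2,3)": ((1, 4), (2, 3)),
}


def quartet_cost(a, b, c, d):
    """Minimum number of changes at one site on the unrooted tree (a,b)(c,d)."""
    cost = 0
    left = {a} & {b}
    if not left:           # a and b differ: one change on the left side
        left = {a, b}
        cost += 1
    right = {c} & {d}
    if not right:          # c and d differ: one change on the right side
        right = {c, d}
        cost += 1
    if not (left & right):  # no shared state: one change on the internal branch
        cost += 1
    return cost


def tree_lengths(alignment):
    """Total parsimony length of each quartet topology over all sites."""
    n_sites = len(next(iter(alignment.values())))
    lengths = {}
    for name, ((a, b), (c, d)) in TOPOLOGIES.items():
        lengths[name] = sum(
            quartet_cost(alignment[a][i], alignment[b][i],
                         alignment[c][i], alignment[d][i])
            for i in range(n_sites)
        )
    return lengths


if __name__ == "__main__":
    for name, length in tree_lengths(ALIGNMENT).items():
        print(name, length)
```

Uninformative sites add the same amount to every tree, so only sites 5, 7 and 9 change the ranking; under equal weights (1,2)(3,4) needs 10 changes, (1,3)(2,4) needs 11 and (1,4)(2,3) needs 12.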

Page 21: Why do trees?

Protpars

• infile:

    8   370
BRU       MSQNSLRLVE DNSV-DKTKA LDAALSQIER
RLR       ---------- ---V-DKSKA LEAALSQIER
NGR       ---------- -MSD-DKSKA LAAALAQIEK
ECO       ---------- AIDE-NKQKA LAAALGQIEK
YPR       ---------M AIDE-NKQKA LAAALGQIEK
PSE       ---------- -MDD-NKKRA LAAALGQIER
TTH       ---------- -MEE-NKRKS LENALKTIEK
ACD       ---------- -MDEPGGKIE FSPAFMQIEG

Page 22: Why do trees?

Protpars

• treefile:

(((((ACD,TTH),(PSE,(YPR,ECO))),NGR),RLR),BRU);

Page 23: Why do trees?

• outfile:

One most parsimonious tree found:

                  +-ACD
          +-------7
          !       +-TTH
        +-6
        ! !    +----PSE
        ! +----5
      +-3      ! +-YPR
      ! !      +-4
      ! !        +-ECO
    +-2 !
    ! ! +-------------NGR
  --1 !
    ! +----------------RLR
    !
    +-------------------BRU

remember: this is an unrooted tree!

requires a total of 853.000

Page 24: Why do trees?

Clustalw

****** PHYLOGENETIC TREE MENU ******

    1. Input an alignment
    2. Exclude positions with gaps?        = ON
    3. Correct for multiple substitutions? = ON
    4. Draw tree now
    5. Bootstrap tree
    6. Output format options

    S. Execute a system command
    H. HELP
    or press [RETURN] to go back to main menu

Page 25: Why do trees?

ClustalW NJ

• (((ACD:0.28958,TTH:0.32705):0.03395,((BRU:0.07321,RLR:0.07032):0.11692,NGR:0.21168):0.02493):0.02092,(ECO:0.05022,YPR:0.05736):0.11997,PSE:0.15632);

• topologically the same as
(((ACD,TTH),((BRU,RLR),NGR)),(ECO,YPR),PSE);

and cf. Protpars:
(((((ACD,TTH),(PSE,(YPR,ECO))),NGR),RLR),BRU);

Page 26: Why do trees?

NJ vs ProtPars
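One way to compare the two trees without drawing them is to list the splits (bipartitions of the OTUs) that each internal branch defines and count the splits found in only one tree, as in the Robinson-Foulds distance. The sketch below is a minimal illustration, not a general Newick parser: it drops branch lengths, assumes there are no internal node labels, and uses made-up function names.

```python
import re


def parse_newick(text):
    """Parse a Newick string into nested tuples of leaf names (lengths dropped)."""
    text = re.sub(r":[0-9.eE+-]+", "", text.strip().rstrip(";"))
    text = re.sub(r"\s+", "", text)
    pos = 0

    def node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1                      # consume "("
            children = [node()]
            while text[pos] == ",":
                pos += 1                  # consume ","
                children.append(node())
            pos += 1                      # consume ")"
            return tuple(children)
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos]

    return node()


def splits(tree):
    """Non-trivial splits of an unrooted tree, each stored as the side
    that does not contain a fixed reference leaf."""
    leaves, clades = set(), []

    def walk(n):
        if isinstance(n, str):
            leaves.add(n)
            return frozenset([n])
        clade = frozenset().union(*(walk(child) for child in n))
        clades.append(clade)
        return clade

    walk(tree)
    reference = min(leaves)
    result = set()
    for clade in clades:
        side = clade if reference not in clade else frozenset(leaves - clade)
        if 2 <= len(side) <= len(leaves) - 2:
            result.add(side)
    return result


NJ = ("(((ACD:0.28958,TTH:0.32705):0.03395,((BRU:0.07321,RLR:0.07032):0.11692,"
      "NGR:0.21168):0.02493):0.02092,(ECO:0.05022,YPR:0.05736):0.11997,PSE:0.15632);")
PROTPARS = "(((((ACD,TTH),(PSE,(YPR,ECO))),NGR),RLR),BRU);"

if __name__ == "__main__":
    nj, mp = splits(parse_newick(NJ)), splits(parse_newick(PROTPARS))
    print("shared splits:", len(nj & mp))
    print("splits in only one tree:", len(nj ^ mp))
```

Because both trees are unrooted, a difference in where the Newick string happens to start does not by itself mean a difference in topology; only the splits matter.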

Page 27: Why do trees?

Dealing with CDSs

• More info in DNA than proteins

• Systematic 3rd position changes can confuse

• Use DNA directly only if evolutionary distance is short

• For distant relationships: blank 3rd positions

• Translate into protein to align
– then copy gaps back to DNA

• Use dnadist with weights to investigate rates
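The "translate, align, copy gaps back" step and the blanking of third positions can be sketched as below. The helper names are hypothetical, the coding sequence is assumed to be in frame with one codon per aligned residue, and third positions are blanked with "N" as an assumed placeholder.

```python
def copy_gaps(aligned_protein, cds):
    """Thread an unaligned CDS onto its aligned protein: each residue takes
    the next codon, each protein gap becomes a three-base gap."""
    codons = iter(cds[i:i + 3] for i in range(0, len(cds), 3))
    return "".join("---" if aa == "-" else next(codons) for aa in aligned_protein)


def blank_third_positions(aligned_cds, blank="N"):
    """Replace every third codon position with a placeholder, keeping gaps."""
    return "".join(
        blank if i % 3 == 2 and base != "-" else base
        for i, base in enumerate(aligned_cds)
    )


if __name__ == "__main__":
    aligned = copy_gaps("M-K", "ATGAAA")
    print(aligned)
    print(blank_third_positions(aligned))
```

Aligning at the protein level keeps gaps in multiples of three, so codon positions stay in register when the DNA is analysed.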

Page 28: Why do trees?

Trees

General guidelines – NOT rules

• More data is better

• Excellent alignment = few informative sites

• Exclude unreliable data – toss all gaps?

• Use seqs/sites evolving at appropriate rate
– Phylip DISTANCE
– 3rd positions saturated
– 2nd positions invariant
– Fast evolving seqs for closely related taxa
– Eliminate transitions – homoplasy

Page 29: Why do trees?

Trees

• Beware base composition bias in unrelated taxa

• Are sites (hairpins?) independent?

• Are substitution rates equal across dataset?

• Long branches prone to error – remove them?