Example of bipartition analysis for five genomes of photosynthetic bacteria (188 gene families)

total 10 bipartitions

R: Rhodobacter capsulatus, H: Heliobacillus mobilis, S: Synechocystis sp., Ct: Chlorobium tepidum, Ca: Chloroflexus aurantiacus
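The count of 10 follows from combinatorics: n taxa admit 2^(n-1) - n - 1 bipartitions with at least two taxa on each side, which gives 10 for n = 5. The sketch below enumerates them for the five genome labels used here; the function name `nontrivial_bipartitions` is made up for illustration and is not taken from the original analysis.

```python
from itertools import combinations

def nontrivial_bipartitions(taxa):
    """List every split of `taxa` into two groups of at least two members each."""
    taxa = sorted(taxa)
    anchor, rest = taxa[0], taxa[1:]
    splits = []
    # Keep the first taxon on the left side so that each split is listed only once.
    # The left side holds the anchor plus `extra` others; the right side keeps >= 2 taxa.
    for extra in range(1, len(taxa) - 2):
        for others in combinations(rest, extra):
            left = frozenset((anchor,) + others)
            right = frozenset(taxa) - left
            splits.append((left, right))
    return splits

genomes = ["R", "H", "S", "Ct", "Ca"]
splits = nontrivial_bipartitions(genomes)
print(len(splits), "bipartitions")
for left, right in splits:
    print(",".join(sorted(left)), "|", ",".join(sorted(right)))
```

Any single fully resolved unrooted tree of five taxa contains only two of these ten bipartitions, so each gene family can support at most two of them.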

Bipartitions supported by genes from chlorophyll biosynthesis pathway

Zhaxybayeva, Hamel, Raymond, and Gogarten, Genome Biology 2004, 5: R20

Figure: two tree topologies for R, Ca, H, S, and Ct, labeled "Plurality" and "Chl. Biosynth."

Phylogenetic Analyses of Genes from chlorophyll biosynthesis pathway (extended datasets)

Xiong et al. Science, 2000 289:1724-30

PROBLEMS WITH BIPARTITIONS

• No easy way to incorporate gene families that are not represented in all genomes.

• The more sequences are added, the shorter the internal branches become, and the lower the bootstrap support for the individual bipartitions.

• A single misplaced sequence can destroy all bipartitions.

Bootstrap support values for embedded quartets

Each tree is calculated from one pseudo-sample generated by bootstrapping from an alignment of one gene family present in 11 genomes.

Quartet spectral analysis of genomes iterates over three loops:

• Repeat for all bootstrap samples.
• Repeat for all possible embedded quartets.
• Repeat for all gene families.

Example: the embedded quartet for genomes 1, 4, 9, and 10. This bootstrap sample supports the topology ((1,4),9,10).

Zhaxybayeva et al. 2006, Genome Research, in press

For the quartet of species A, B, C, D, this gene family supports the topology ((A, D), B, C) with 70% bootstrap support.

Iterating over Bootstrap Samples
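A minimal sketch of the three loops, assuming each bootstrap tree is available as nested tuples of genome labels (an illustrative representation, not the format used in the original analysis). The function names and the toy trees are invented; the toy data are chosen so that the quartet A, B, C, D comes out at 70% support for ((A, D), B, C).

```python
from collections import Counter
from itertools import combinations

def leaf_sets(tree, out):
    """Return the leaves below `tree`; append the leaf set of every clade to `out`."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, out) for child in tree))
    out.append(leaves)
    return leaves

def quartet_topology(tree, quartet):
    """Return the embedded quartet as a pair of pairs, or None if it is unresolved."""
    clades = []
    leaf_sets(tree, clades)
    q = frozenset(quartet)
    for clade in clades:
        inside = q & clade
        if len(inside) == 2:            # this clade separates two members from the other two
            return frozenset([inside, q - inside])
    return None

def quartet_support(bootstrap_trees, quartet):
    """Fraction of bootstrap trees supporting each resolved topology of one quartet."""
    counts = Counter(quartet_topology(t, quartet) for t in bootstrap_trees)
    counts.pop(None, None)
    return {topo: n / len(bootstrap_trees) for topo, n in counts.items()}

def quartet_spectrum(families):
    """families: {family name: list of bootstrap trees}. Runs all three loops."""
    spectrum = {}
    for name, trees in families.items():                    # all gene families
        present = sorted(leaf_sets(trees[0], []))
        for quartet in combinations(present, 4):            # all embedded quartets
            spectrum[(name, quartet)] = quartet_support(trees, quartet)  # all bootstrap samples
    return spectrum

# Toy data: 10 bootstrap trees for one hypothetical gene family.
toy_trees = [(("A", "D"), ("B", ("C", "E")))] * 7 + [(("A", "B"), ("D", ("C", "E")))] * 3
spectrum = quartet_spectrum({"family1": toy_trees})
ad_bc = frozenset([frozenset("AD"), frozenset("BC")])
print(spectrum[("family1", ("A", "B", "C", "D"))][ad_bc])
```

Reading quartets off the clades of each tree avoids comparing whole trees, which is why a single misplaced sequence only affects the quartets that contain it.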

Illustration of one component of a quartet spectral analysis: summary of phylogenetic information for one genome quartet across all gene families

• Total number of gene families containing the species quartet
• Number of gene families supporting the same topology as the plurality (colored according to bootstrap support level)
• Number of gene families supporting one of the two alternative quartet topologies
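The three quantities in the legend above can be tallied for one quartet as in the sketch below. It assumes that the plurality topology is the one supported by the largest number of gene families, and it uses invented support thresholds (70%, 80%, 90%) in place of the color scale, whose actual cut-offs are not given here. The input values are toy data.

```python
from collections import Counter

def summarize_quartet(family_results, thresholds=(0.7, 0.8, 0.9)):
    """family_results: one (best topology, bootstrap support) pair per gene family."""
    votes = Counter(topology for topology, _ in family_results)
    plurality = votes.most_common(1)[0][0]   # ties go to the topology seen first
    agreeing = [support for topology, support in family_results if topology == plurality]
    return {
        "total": len(family_results),
        "plurality": plurality,
        "agree": len(agreeing),
        # Families agreeing with the plurality at or above each (hypothetical) support level.
        "agree_at_least": {th: sum(s >= th for s in agreeing) for th in thresholds},
        "conflict": len(family_results) - len(agreeing),
    }

# Toy results for one quartet A, B, C, D across five hypothetical gene families.
results = [("AB|CD", 0.95), ("AB|CD", 0.75), ("AC|BD", 0.90), ("AB|CD", 0.60), ("AD|BC", 0.85)]
summary = summarize_quartet(results)
print(summary)
```

Repeating this for every quartet and plotting the agreeing and conflicting counts side by side gives one bar pair per quartet in the spectrum.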

Quartet Spectrum of 11 cyanobacterial genomes

330 possible quartets

1128 datasets from relaxed core (core datasets + datasets with one or two taxa missing)

685 datasets show conflicts with plurality

Plot axes: quartets; number of datasets

PLURALITY SIGNAL

N: Nostoc, A: Anabaena, Tr: Trichodesmium, C: Crocosphaera, S: Synechocystis, Th: Thermosynechococcus, G: Gloeobacter, 1P, 2P, 3P: the three Prochlorococcus genomes (labeled 1, 2, 3), mS: marine Synechococcus

Conflicts with the plurality signal are observed in sets of orthologs across all functional categories, including genes involved in translation and transcription

624/1128 ≈ 55%

Distribution of 1128 datasets in the relaxed core

Values (in the order extracted): 212, 192, 441, 160, 123

Categories (in the order extracted): INFORMATION STORAGE AND PROCESSING; CELLULAR PROCESSES AND SIGNALING; METABOLISM; POORLY CHARACTERIZED; NOT PRESENT IN COGS

Distribution of 624 datasets conflicting with the plurality signal

Values (in the order extracted): 127, 96, 249, 82, 70

Categories (in the order extracted): INFORMATION STORAGE AND PROCESSING; CELLULAR PROCESSES AND SIGNALING; METABOLISM; POORLY CHARACTERIZED; NOT PRESENT IN COGS

Genes with orthologs outside the cyanobacterial phylum: Distribution among Functional Categories (using COG db, release of March 2003)

• 700 phylogenetically useful extended datasets
• Cyanobacteria do not form a coherent group (160)
• Cyanobacteria do form a coherent group, but conflict with plurality (294)

Example of inter-phylum transfer: threonyl tRNA synthetase

In the case of the marine Synechococcus and Prochlorococcus spp., the plurality consensus is unlikely to reflect organismal history.

This is probably due to frequent gene transfer mediated by phages, e.g.:

These conflicting observations are not limited to prokaryotes. In incipient species of Darwin's finches, frequent introgression can make some individuals genetically more similar to a sister species than to the species they are assigned to by morphology and mating behavior (Grant et al. 2004 "Convergent evolution of Darwin's finches caused by introgressive hybridization and selection" Evolution Int J Org Evolution 58, 1588-1599).

Species evolution versus plurality consensus

The Coral of Life (Darwin)

Coalescence: the process of tracing lineages backwards in time to their common ancestors.

Every two extant lineages coalesce to their most recent common ancestor.

Eventually, all lineages coalesce to the cenancestor.

t/2 (Kingman, 1982)

Illustration is from J. Felsenstein, “Inferring Phylogenies”, Sinauer, 2003

Coalescence of ORGANISMAL and MOLECULAR Lineages

• 20 lineages
• One extinction and one speciation event per generation
• One horizontal transfer event once in 5 generations (i.e., speciation events)

RED: organismal lineages (no HGT)
BLUE: molecular lineages (with HGT)
GRAY: extinct lineages
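The simulation set-up described above can be approximated with the following sketch. It is an assumed model, not the code behind the slides: in every time step one randomly chosen lineage goes extinct and is replaced by a daughter of another lineage, and every fifth step the gene in a random recipient is replaced by a copy from a random donor. The names `simulate` and `time_to_mrca` and the default of 5000 steps are hypothetical.

```python
import random

def simulate(n_lineages=20, steps=5000, hgt_every=5, seed=1):
    """Forward simulation; returns per-step parent pointers for organisms and for one gene."""
    rng = random.Random(seed)
    org_parents, gene_parents = [], []
    for step in range(1, steps + 1):
        org = list(range(n_lineages))            # by default every slot descends from itself
        dead, mother = rng.sample(range(n_lineages), 2)
        org[dead] = mother                       # one extinction, one speciation
        gene = list(org)                         # genes are inherited with their organism ...
        if step % hgt_every == 0:                # ... except when a transfer overwrites one
            recipient, donor = rng.sample(range(n_lineages), 2)
            gene[recipient] = org[donor]
        org_parents.append(org)
        gene_parents.append(gene)
    return org_parents, gene_parents

def time_to_mrca(parents):
    """Trace all extant lineages backwards; return the number of steps until they coalesce."""
    lineages = set(range(len(parents[-1])))
    for steps_back, parent in enumerate(reversed(parents), start=1):
        lineages = {parent[i] for i in lineages}
        if len(lineages) == 1:
            return steps_back
    return None                                  # not coalesced within the simulated time

org, gene = simulate()
print("organismal MRCA, steps back:", time_to_mrca(org))
print("molecular MRCA, steps back:", time_to_mrca(gene))
```

Because the gene lineage follows the donor whenever a transfer occurred, the two tracebacks generally end at different ancestors and at different depths.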

RESULTS:

• Most recent common ancestors are different for organismal and molecular phylogenies
• Different coalescence times
• Long coalescence time for the last two lineages
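The long wait for the last two lineages can be made plausible with a short calculation, assuming a model in which each time step replaces one uniformly chosen lineage with a daughter of another uniformly chosen lineage (an assumption about the simulation, not a statement from the slides). Going backwards with k of n lineages still distinct, a step merges two of them with probability k(k-1)/(n(n-1)), so the expected number of steps spent with k lineages is n(n-1)/(k(k-1)).

```python
from fractions import Fraction

def expected_steps_with_k(k, n):
    """Expected number of time steps during which exactly k ancestral lineages remain."""
    return Fraction(n * (n - 1), k * (k - 1))

def expected_tmrca(n):
    """Expected number of steps back to the common ancestor of all n lineages."""
    return sum(expected_steps_with_k(k, n) for k in range(2, n + 1))

n = 20
total = expected_tmrca(n)                 # telescopes to (n - 1)**2
last_pair = expected_steps_with_k(2, n)   # n * (n - 1) / 2
print(total, last_pair, float(last_pair / total))
```

For 20 lineages this gives 361 expected steps in total, of which 190 (about 53%) are spent waiting for the final two lineages to coalesce.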

Adam and Eve never met

Albrecht Dürer, The Fall of Man, 1504

Y chromosome Adam: lived approximately 50,000 years ago

Mitochondrial Eve: lived 166,000-249,000 years ago

Thomson, R. et al. (2000) Proc Natl Acad Sci U S A 97, 7360-5

Underhill, P.A. et al. (2000) Nat Genet 26, 358-61

Cann, R.L. et al. (1987) Nature 325, 31-6

Vigilant, L. et al. (1991) Science 253, 1503-7

The same is true for ancestral rRNAs, EF, ATPases!

EXTANT LINEAGES FOR THE SIMULATIONS OF 50 LINEAGES

green: organismal lineages; red: molecular lineages (with gene transfer)

Lineages Through Time Plot

10 simulations of organismal evolution assuming a constant number of species (200) throughout the simulation; 1 speciation and 1 extinction per time step. (green O)

25 gene histories simulated for each organismal history assuming 1 HGT per 10 speciation events (red x)

Axis label: log (number of surviving lineages)
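A lineages-through-time curve of this kind can be sketched by walking backwards from the present and tracking only how many ancestral lineages remain. The model is an assumption: each time step is one extinction plus one speciation among 200 species, and every tenth step adds one transfer with a random donor and recipient. Unlike the simulations described above, this sketch draws organismal and gene histories independently and does not condition gene histories on a particular organismal history; the function name is hypothetical.

```python
import math
import random

def lineages_through_time(n_species=200, hgt_every=None, seed=0):
    """Return [(steps before present, surviving lineages), ...] down to a single ancestor."""
    rng = random.Random(seed)
    pairs = n_species * (n_species - 1)
    k, step = n_species, 0
    history = [(0, k)]
    while k > 1:
        step += 1
        # Undo a transfer: two lineages merge if recipient and donor are both ancestral.
        if hgt_every and step % hgt_every == 0:
            if rng.random() < k * (k - 1) / pairs:
                k -= 1
        # Undo the speciation: two lineages merge if daughter and mother are both ancestral.
        if rng.random() < k * (k - 1) / pairs:
            k -= 1
        if k != history[-1][1]:
            history.append((step, k))
    return history

organismal = lineages_through_time(seed=1)
molecular = lineages_through_time(hgt_every=10, seed=2)
for label, history in (("organismal", organismal), ("molecular", molecular)):
    curve = [(t, math.log10(k)) for t, k in history]   # points for a log-scale plot
    print(label, "single ancestor reached after", curve[-1][0], "steps")
```

Most coalescences happen within the first few hundred steps, while the last handful of lineages persist for tens of thousands of steps, which produces the long basal branches.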

Bacterial 16S rRNA-based phylogeny (from P. D. Schloss and J. Handelsman, Microbiology and Molecular Biology Reviews, December 2004)

The deviation from the “long branches at the base” pattern could be due to

• undersampling
• an actual radiation
  • due to an invention that was not transferred
  • following a mass extinction