5. Assess Reliability


Upload: chun

Post on 15-Jan-2016


TRANSCRIPT

Page 1: 5. Assess Reliability

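5. Assess Reliability

I. Bootstrap
– Sampling with replacement
– Resampling to produce pseudo-dataset (random weighting)

II. Jackknife
– Random deletion of sub-dataset

III. Permutation test
– Randomize dataset to build null likelihood distribution

Example alignment (sites 123456789):

taxa1 CGATCGTTA
taxa2 CAATGATAG
taxa3 CGCTGATAA
taxa4 CGCTGATCG

Bootstrap pseudo-datasets (site indices drawn with replacement):
Dataset1: 729338554
Dataset2: 631981282
…

Jackknife pseudo-datasets (deleted sites shown as "-"):
Dataset1: 1-3-56789
Dataset2: 12-45678-
…

[Figure: tree with support values 100 and 73]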
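The three resampling schemes on this slide can be sketched in a few lines of Python. This is a minimal illustration on the slide's toy alignment, not the procedure of any particular phylogenetics package: the function names (`take_sites`, `bootstrap`, `jackknife`, `permute`) and the choice of deleting two sites per jackknife replicate are assumptions made for the example, and the permutation shown (shuffling each site's characters among taxa) is only one of several possible randomization schemes.

```python
import random

# Toy alignment from the slide: four taxa, nine sites.
alignment = {
    "taxa1": "CGATCGTTA",
    "taxa2": "CAATGATAG",
    "taxa3": "CGCTGATAA",
    "taxa4": "CGCTGATCG",
}

def take_sites(aln, sites):
    """Build a pseudo-dataset from a list of 0-based site indices."""
    return {taxon: "".join(seq[i] for i in sites) for taxon, seq in aln.items()}

def bootstrap(aln, rng):
    """Draw sites with replacement; the replicate has the original length."""
    n = len(next(iter(aln.values())))
    return take_sites(aln, [rng.randrange(n) for _ in range(n)])

def jackknife(aln, rng, n_delete=2):
    """Delete n_delete randomly chosen sites; kept sites stay in order."""
    n = len(next(iter(aln.values())))
    kept = sorted(rng.sample(range(n), n - n_delete))
    return take_sites(aln, kept)

def permute(aln, rng):
    """Shuffle the characters of each site among taxa (assumed scheme)."""
    taxa = list(aln)
    n = len(aln[taxa[0]])
    columns = []
    for i in range(n):
        column = [aln[t][i] for t in taxa]
        rng.shuffle(column)
        columns.append(column)
    return {t: "".join(columns[i][k] for i in range(n)) for k, t in enumerate(taxa)}

# "Dataset1: 729338554" from the slide, read as 1-based site indices.
slide_sites = [int(c) - 1 for c in "729338554"]
slide_replicate = take_sites(alignment, slide_sites)

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
boot = bootstrap(alignment, rng)
jack = jackknife(alignment, rng)
perm = permute(alignment, rng)
```

In practice a tree is inferred from each bootstrap or jackknife replicate, and the fraction of replicates recovering a given clade is reported as its support value (such as the 100 and 73 on the slide's tree). Permuted datasets would instead be scored to build the null distribution.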

Page 2: 5. Assess Reliability

Reconstructed ancestral sequences to infer environment

• Was the ancestor of bacteria a thermophile?
– Reconstructed EF-Tu at key nodes (Gaucher et al., 2003)
• All ancestral types have high Topt

Page 3: 5. Assess Reliability

Genetic exchange in bacteria/archaea

• Transformation
– Used by Griffith, and later Avery, to show DNA is genetic material
– Some bacteria naturally competent
• Transduction
– Generalized & specialized
• Conjugation
– High-frequency recombinants (Hfr)

• All are partial and unidirectional

Page 4: 5. Assess Reliability

Detecting HGT from genomes: atypical nt composition

• Recent transfers often have unique signature
• “Molecular archaeology” of E. coli (Lawrence & Ochman, 1998)
– 17.6% HGT, but amelioration since
– Calculated age distribution based on average rate of divergence (Hacker & Carniel, 2001)
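One simple way to look for an atypical nucleotide signature is a sliding-window scan of GC content. The sketch below is a crude stand-in and not the method used in the cited studies. The function names, the window and step sizes, the z-score cutoff, and the synthetic "genome" are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev

def gc_fraction(seq):
    """Fraction of G and C in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def atypical_windows(genome, window=1000, step=500, z_cutoff=2.0):
    """Return (start, gc) for windows whose GC deviates from the genome-wide
    window mean by more than z_cutoff standard deviations."""
    starts = range(0, len(genome) - window + 1, step)
    scored = [(s, gc_fraction(genome[s:s + window])) for s in starts]
    values = [gc for _, gc in scored]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # perfectly uniform composition: nothing stands out
    return [(s, gc) for s, gc in scored if abs(gc - mu) / sigma > z_cutoff]

# Synthetic example: a 50% GC "host" with a 1 kb AT-only "island" at 5000.
host = "ACGT" * 2500
island = "AATT" * 250
genome = host[:5000] + island + host[5000:]

flagged = atypical_windows(genome)
```

Because transferred DNA ameliorates toward the host composition over time, a scan like this is expected to detect recent transfers more readily than old ones.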

Page 5: 5. Assess Reliability

HGT genes often clustered

• HGT genes often clustered in large islands encoding related function
• Could HGT itself drive operon formation?
– Selfish operon theory (Lawrence & Roth, 1996)
– Linkage increases chance of co-transfer of cluster
– Recent analysis (Price et al., 2005)
  • Identified HGT events and new operons in E. coli
  • No particular enrichment of HGT in new operons
  • Role of co-regulation instead?

Page 6: 5. Assess Reliability

Detecting HGT: incongruent phylogeny/synteny

• Any incongruent phylogeny could be explained by HGT or independent gene loss

[Figure panels: HGT only; Loss only (Koonin, 2003)]

Ex: glycerol-3-P DH
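Incongruence between a gene tree and a reference species tree can be checked by comparing the clades each tree contains. The following is a minimal sketch under stated assumptions: trees are written as nested tuples of hypothetical taxon labels ("A" to "D"), both trees are rooted, and the helper names `clades` and `conflicting_clades` are invented for this example.

```python
def clades(tree):
    """Return (leaf set, set of clades) for a rooted nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)  # the clade defined by this internal node
    return leaves, found

def conflicting_clades(gene_tree, species_tree):
    """Clades present in the gene tree but absent from the species tree."""
    _, gene = clades(gene_tree)
    _, species = clades(species_tree)
    return gene - species

# Hypothetical four-taxon example.
species_tree = ((("A", "B"), "C"), "D")
gene_tree = ((("A", "C"), "B"), "D")

conflicts = conflicting_clades(gene_tree, species_tree)
```

A non-empty result only flags the incongruence. As the slide notes, deciding whether HGT or independent gene loss produced it requires further evidence.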

Page 7: 5. Assess Reliability

Detecting HGT: H4MPT pathway

• Tree based upon concatenated sequences (bootstrap = distance/parsimony; Kalyuzhnaya et al., 2005)

Page 8: 5. Assess Reliability

Detecting HGT: H4MPT pathway

[Figure labels: different; similar] (Kalyuzhnaya et al., 2005)

Page 9: 5. Assess Reliability

Detecting HGT: H4MPT pathway

[Figure labels: different; similar]

New C1 genes found due to clustering

(Kalyuzhnaya et al., 2005)

Page 10: 5. Assess Reliability

Detecting HGT: H4MPT pathway

[Figure: scenarios A–E for pathway evolution, each drawn as a tree relating LUCA, Euryarchaeota, Crenarchaeota, Proteobacteria, Planctomycetes, and Other Bacteria]

Scenarios for pathway evolution (Chistoserdova et al., 2004)

(Kalyuzhnaya et al., 2005)

Page 11: 5. Assess Reliability

Detecting HGT: plants!?! (Davis & Wurdack, 2004)

Rafflesia (Malpighiales)

Page 12: 5. Assess Reliability

Detecting HGT: differential gene content

• Analysis of three E. coli (Welch et al., 2002)
– Shared genes >95% similarity
– Each genome only ~½ core genes
– Combination of HGT & differential loss
• Identity of non-core genes?
– Unusually AT-rich & short (Daubin & Ochman, 2004)
– Tend to be more environment-specific (Pál et al., 2005)

Page 13: 5. Assess Reliability

Detecting HGT: differential gene content

• Analysis of three E. coli (Welch et al., 2002)
– Shared genes >95% similarity
– Each genome only ~½ core genes
– Combination of HGT & differential loss
• Identity of non-core genes?
– Unusually AT-rich & short (Daubin & Ochman, 2004)
– Tend to be more environment-specific and “attach” to network periphery (Pál et al., 2005)
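Once orthologous genes have been identified, splitting gene content into core and non-core sets is a matter of set operations. The sketch below assumes ortholog assignment has already been done. The strain names and gene labels are hypothetical placeholders, not data from the cited studies, and `partition_gene_content` is a name invented for this example.

```python
def partition_gene_content(genomes):
    """Split gene content into core (shared by all), pan (union),
    and the non-core genes carried by each genome."""
    gene_sets = [set(genes) for genes in genomes.values()]
    core = set.intersection(*gene_sets)
    pan = set.union(*gene_sets)
    non_core = {name: set(genes) - core for name, genes in genomes.items()}
    return core, pan, non_core

# Hypothetical ortholog labels for three strains.
genomes = {
    "strainA": {"g1", "g2", "g3", "g4"},
    "strainB": {"g1", "g2", "g5"},
    "strainC": {"g1", "g2", "g3", "g6"},
}

core, pan, non_core = partition_gene_content(genomes)
core_share = {name: len(core) / len(genes) for name, genes in genomes.items()}
```

Here `core_share` gives, for each genome, the fraction of its genes that belong to the core, which is the kind of quantity summarized on the slide as "~½ core genes".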

Page 14: 5. Assess Reliability

Measurements of natural horizontal gene transfer (HGT)

• Method for in situ plasmid transfer (Sørensen et al., 2005)

• 20–100× higher rate than culture-dependent method

Page 15: 5. Assess Reliability

Measurements of natural horizontal gene transfer (HGT)

• Sort via FACS, amplify 16S rRNA & sequence

• Visualize in situ on leaf

Page 16: 5. Assess Reliability

Limitations to HGT

• Environmental parameters: conjugation in ocean?
– Phage appear to be major vectors for exchange amongst Prochlorococcus (Lindell et al., 2004)
– Some encode unstable photosynthesis proteins that are expressed during infection (Lindell et al., 2005)

Page 17: 5. Assess Reliability

Limitations to HGT

• Environmental parameters: conjugation in ocean?
– Phage appear to be major vectors for exchange amongst Prochlorococcus (Lindell et al., 2004)
– Some encode unstable photosynthesis proteins that are expressed during infection (Lindell et al., 2005)
– Fate and process of incorporating HGT genes into network?

Page 18: 5. Assess Reliability

Can gene histories be retraced?

• trp operon (Xie et al., 2004)

Page 19: 5. Assess Reliability

Can gene histories be retraced?

• “Highways of gene sharing” (Beiko et al., 2005)

– >220000 proteins from 144 genomes

Page 20: 5. Assess Reliability

Can gene histories be retraced?

• “Net of life” (Kunin et al., 2005)

Page 21: 5. Assess Reliability

Is there still a tree? (Daubin et al., 2003)

Page 22: 5. Assess Reliability

Is there still a tree? (Daubin et al., 2003)

Page 23: 5. Assess Reliability

Is there still a tree? (Doolittle & Bapteste, 2007)

Page 24: 5. Assess Reliability

“Automated TOL” (Ciccarelli et al., 2006)

Page 25: 5. Assess Reliability