technology developments for characterizing the complex

1
Technology Developments for Characterizing the Complex Metaproteomes of Microbial Communities R.L. Hettich 1,2 ; N.C. VerBerkmoes 1,2 ; M. Thompson 1,2 ; C. Shook 3 ; A.P. Borole 4 ; M. Shah 4 ; F.W. Larimer 4 ; B.H. Davison 4 1 Chemical Sciences Division, Oak Ridge National Laboratory, 2 ORNL-UTK Genome Sciences and Technology Graduate School, 3 University of Tennessee, 4 Life Sciences Division, Oak Ridge National Laboratory RESULTS AND DISCUSSIONS INTRODUCTION OVERVIEW l Understanding how individual cells function and interact within complex microbial communities is one of the greatest challenges in biology today. l Recent advances in metagenomics have demonstrated that simple microbial communities (Tyson, Nature 2004) and possibly more complex microbial communities (Venter, Science 2004) can be sequenced and annotated for at least the major components of that community. l With representative DNA sequences from microbial communities, the next challenge in systems biology approaches to communities is proteome profiling and the analysis of protein complexes derived from communities. l “Shotgun” proteomics is a rapidly evolving technology that may have the potential to analyze the very complex protein mixtures that will be derived from whole community proteomes. l The purpose of this study is to evaluate and optimize MS-based “shotgun” proteomics techniques to analyze simple known microbial mixtures. The goal is to improve on current technologies and develop analytical methodologies for analyzing microbial communities whose genomes have been sequenced. Recently, there have been strong efforts to develop techniques for genomic sequencing and annotation of microbial communities (metagenomics). With the potential of partial or near complete microbial genomes obtained from environmental samples, along with the rapid proliferation of isolated microbial genomes, systems biology in microbial communities by combining genomic, transcriptomic, proteomic and metabolic studies may be possible in the near future. Our current studies seek to develop and demonstrate metaproteome analysis techniques for mixed cultures, establishing the basis for whole community proteomics”. Currently our major focus is to use controlled simple microbial mixtures of four species (E. coli, S. cerevisiae, R. palustris, and S. oneidensis) to develop MS-based proteomics methods, as well as the proteome bioinformatics tools for detailed analysis of sequenced microbial communities. Our goal is to provide “deep” and “wide” proteome measurements of complex microbial mixtures. Some of the considerations for any microbial community proteome project are highlighted below. Considerations for Proteome Analysis of Microbial Communities l Level of DNA sequence availability and quality of annotation for any species in that community l Diversity and dynamic range of species in the community l Quantity of community available (i.e. how much total protein can be obtained) l Inter- and intra- species relationship at the amino acid level. CONCLUSIONS ACKNOWLEDGEMENTS l Dr. John Yates (Scripps) and Dr. David Tabb (ORNL) for DTASelect/Contrast l Exploratory funding from the ORNL Director’s Research and Development Program l ORNL-UTK Genome Science and Technology Graduate School (for M. Thompson) l Oak Ridge National Laboratory is managed and operated by the University of Tennessee-Battelle, L.L.C., under contract DE-AC05-00OR22725 with the U.S. Department of Energy. l Initial studies have focused on evaluating the standard 24-hour 2D-LC-MS/MS experiment for the analysis of 4-component artificial microbial mixtures. l R. palustris was designated as the target species, and its concentration in the samples was set to 20%, 5%, 1%, 0.2% and 0%, while all other species were kept constant. l Significant R. palustris proteome determinations were possible at 25% and 5%, with 1% borderline. A few unique peptides could be detected confidently at the 0.2% level. l The effects of database size on unique peptide identification were examined. Larger databases diminish the identification of unique peptides for any given species, but species such as E. coli with many closely related species in the database seem to be affected more dramatically. Clearly, databases should be limited to the community genome plus a small number of distracters. l The major focus of future experiments will be on instrumental and bioinformatic developments to increase the confidence and extent of protein identification. In particular, more advanced multistage chromatographic separations combined with high throughput LTQ-MS/MS measurements will be evaluated. Figure 8 Diagnostic Peptide at 1% R. palustris MS/MS at 25% and 1% R. palustris TVPSTSNDRPNVALTPAAPWPGAPFVPTGNPFADGVGPGSYAQR 400 600 800 100 0 120 0 140 0 160 0 180 0 200 0 m/z 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 Rel ativ e A bu nd anc e 144 2.2 179 2.8 141 2.7 155 3.6 132 2.5 118 1.0 179 0.7 118 0.1 991. 6 835. 4 172 2.9 155 5.7 122 2.6 179 4.6 106 7.8 896. 2 159 3.8 187 7.8 956. 4 569. 1 800. 2 162 2.5 713. 2 199 1.8 523. 0 682. 7 482. 7 25% 400 600 800 100 0 120 0 140 0 160 0 180 0 200 0 m/z 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 Rel ativ e A bu nd anc e 144 1.9 179 0.8 132 2.4 179 2.0 155 3.8 132 3.2 118 0.1 141 2.5 118 1.5 179 4.7 172 2.9 991. 6 106 7.7 835. 4 896. 3 159 2.6 115 1.5 181 8.1 131 4.2 165 1.6 946. 0 569. 2 778. 4 189 7.7 509. 2 747. 8 1% Figure 9 Diagnostic Peptide at 0.2% R. palustris MS/MS at 25% and 0.2% R. palustris ADVPELGLDNLPIIVPLR 25% 0.2% EXPERIMENTAL Cell Growth and Production of Protein Fractions: l The four microbes (Figure 1) used in this study (E. coli, R. palustris, S. cerevisiae, and S. oneidensis) were all grown individually and mixed after cell harvesting at appropriate concentrations based on wet cell weight. Figure 2 illustrates the different mixtures analyzed in this study. l Mixed cell pellets were washed twice with Tris buffer, and then disrupted with sonication. Two crude protein fractions were created by ultracentrifugation 100,000g for 1 hour (membrane fraction and crude fraction). Protein fractions were quantified with BCA. Protein fractions were denatured, reduced, and digested with sequencing grade trypsin. The digested mixed proteomes were desalted via solid phase extraction and concentrated to ~10μg/μL and frozen at -80 o C until analysis. LC-MS/MS Analysis and Database Searching: l The tryptic digestions of the crude and membrane fractions from all mixtures were analyzed via 24-hour two-dimensional LC-MS/MS with a split-phase MudPIT column as described in McDonald, W.H., IJMS, 2002. The LC MS system was composed of an Ultimate HPLC (LC Packings, a division of Dionex, San Francisco, CA) coupled to an LCQ-DECA XP ion trap mass spectrometer (Thermo Finnigan, San Jose, CA). Each analysis required ~500μg of starting protein material from either the crude or membrane fraction. l The resultant MS/MS spectra from the mixed proteomes were searched with SEQUEST (Thermo Finnigan) against three databases highlighted below. The result files were filtered and sorted with DTASelect (Tabb, Journal of Proteome Research 2004) and unique and non-unique peptide identifications were extracted with in-house perl scripts. Databases Used to Search MS/MS Spectra l DB4 - the four species in the mixtures: E. coli R. palustris S. cerevisiae S. oneidensis l DB13 - the four species plus 1 plant and 8 other microbes DB large - 261 species with 1,011,612 total proteins Figure 1 Test species for Artificial Mixture Figure 2 Concentrations in Artificial Mixtures R. palustris E. coli S. oneidensis S. cerevisiae 0 500 1000 1500 2000 2500 3000 3500 4000 # Unique Peptides E. coli R. palustris S. cerevisiae S. oneidensis others Figure 5 Database comparison of the 25% mixture Initial Mixture Studies: l The initial experiments for this study have focused on testing current 2D-LC- MS/MS methodologies to analyze an artificial 4 microbe mixture of E. coli, R. palustris, S. cerevisiae and S. oneidensis. l Two aspects of community proteomics were analyzed in this initial study: The first was to determine at what level functionally meaningful results could be obtained from a target species (R. palustris) whose concentration was decreased over a range of concentrations from 25% to 0.2%. All experiments were conducted with current 2D-LC-MS/MS technologies. The second was to test the effects of database size on the identification of unique peptides from the four microbial species in the sample. Unique peptides are defined as those peptides from a given protein that are unique to a specific species in a given database. l Figure 3 highlights the results from the 4 species database with decreasing concentrations of R. palustris. l Figure 4 highlights the results from the 13 species database with decreasing concentrations of R. palustris. l Figure 5 highlights results from the 261 species database at the 25% of all species in the mixture. Figure 3 Unique Peptides Identified with DB4 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 5500 Percentage of R. palustris # Unique Peptides E. coli (25-33%) R. palustris S. cerevisiae (25-33%) S. oneidensis (25-33%) 25% 5% 1% 0.2% 0% DB4 DB13 DB261 Database 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 Percentage of R. palustris # Unique Peptides Figure 4 Unique Peptides Identified with DB13 25% 5% 1% 0.2% 0% E. coli (25-33%) R. palustris S. cerevisiae (25-33%) S. oneidensis (2-33%) others (9) Target R. Palustris Proteome Results from Initial Mixture Studies l The functional categories of the identified proteome from R. palustris analyzed from the different mixtures was compared with our R. palustris baseline proteome (VerBerkmoes et al. in preparation). Figure 6 illustrates this comparison with top left being the entire R. palustris baseline proteome and the other three panels being the identified functional categories at the respective mixture concentrations. l Figure 7 highlights the results from a single protein identified in the mixture proteomes. This protein puhA is critical subunit of the photoreaction center and is indicative of metabolic state i.e. photosynthesis. l Figures 8 and 9 illustrate diagnostic tandem mass spectra of peptides from puhA at the different concentrations of R. palustris in the sample, ranging down to 0.2%. Figure 7 Protein PuhA, a Component of the Photosynthetic Reaction Center from R. palustris R. palustis proteome 1695 Total Proteins Unknowns and Unclassif ied Replication and Repair Energy Metabolism Carbon and Carbohy drate Metabolism Lipid Metabolism Transcription Translation Cellular Processes Amino Acid Metabolism General f unction prediction Metabolism of Cofactors and Vitamins Transport Signal Transduction Purine and Pyrimidine Metabolism Mix 1 R. palustris 25% 610 total proteins Mix 2 R. palustris 5% 227 Total Proteins Mix 3 R. palustris 1% 70 total Proteins Figure 6 Functional Categories

Upload: others

Post on 04-Dec-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

Technology Developments for Characterizing the Complex Metaproteomes of Microbial CommunitiesR.L. Hettich1,2; N.C. VerBerkmoes1,2; M. Thompson1,2; C. Shook3; A.P. Borole4; M. Shah4; F.W. Larimer4; B.H. Davison4

1Chemical Sciences Division, Oak Ridge National Laboratory, 2ORNL-UTK Genome Sciences and Technology Graduate School, 3University of Tennessee, 4Life Sciences Division, Oak Ridge National Laboratory

RESULTS AND DISCUSSIONS

INTRODUCTION

OVERVIEW

l Understanding how individual cells function and interact within complex microbial communities is one of the greatest challenges in biology today.

l Recent advances in metagenomics have demonstrated that simple microbial communities (Tyson, Nature 2004) and possibly more complex microbial communities (Venter, Science 2004) can be sequenced and annotated for at least the major components of that community.

l With representative DNA sequences from microbial communities, the next challenge in systems biology approaches to communities is proteome profiling and the analysis of protein complexes derived from communities.

l “Shotgun” proteomics is a rapidly evolving technology that may have the potential to analyze the very complex protein mixtures that will be derived from whole community proteomes.

l The purpose of this study is to evaluate and optimize MS-based “shotgun” proteomics techniques to analyze simple known microbial mixtures. The goal is to improve on current technologies and develop analytical methodologies for analyzing microbial communities whose genomes have been sequenced.

Recently, there have been strong efforts to develop techniques for genomic sequencing and annotation of microbial communities (metagenomics). With the potential of partial or near complete microbial genomes obtained from environmental samples, along with the rapid proliferation of isolated microbial genomes, systems biology in microbial communities by combining genomic, transcriptomic, proteomic and metabolic studies may be possible in the near future. Our current studiesseek to develop and demonstrate metaproteome analysis techniques for mixed cultures, establishing the basis for “whole community proteomics”. Currently our major focus is to use controlled simple microbial mixtures of four species (E. coli, S. cerevisiae, R. palustris, and S. oneidensis) to develop MS-based proteomics methods, as well as the proteome bioinformatics tools for detailed analysis of sequenced microbial communities. Our goal is to provide “deep” and “wide” proteome measurements of complex microbial mixtures. Some of the considerations for any microbial community proteome project are highlighted below.

Considerations for Proteome Analysis of Microbial Communities

l Level of DNA sequence availability and quality of annotation for any species in that community

l Diversity and dynamic range of species in the communityl Quantity of community available (i.e. how much total protein

can be obtained)l Inter- and intra- species relationship at the amino acid level.

CONCLUSIONS

ACKNOWLEDGEMENTS

l Dr. John Yates (Scripps) and Dr. David Tabb (ORNL) for DTASelect/Contrastl Exploratory funding from the ORNL Director’s Research and Development Programl ORNL-UTK Genome Science and Technology Graduate School (for M. Thompson)l Oak Ridge National Laboratory is managed and operated by the University of Tennessee-Battelle,

L.L.C., under contract DE-AC05-00OR22725 with the U.S. Department of Energy.

l Initial studies have focused on evaluating the standard 24-hour 2D-LC-MS/MS experiment for the analysis of 4-component artificial microbial mixtures.

l R. palustris was designated as the target species, and its concentration in the samples was set to 20%, 5%, 1%, 0.2% and 0%, while all other species were kept constant.

l Significant R. palustris proteome determinations were possible at 25% and 5%, with 1% borderline. A few unique peptides could be detected confidently at the 0.2% level.

l The effects of database size on unique peptide identification were examined. Larger databases diminish the identification of unique peptides for any given species, but species such as E. coli with many closely related species in the database seem to be affected more dramatically. Clearly, databases should be limited to the community genome plus a small number of distracters.

l The major focus of future experiments will be on instrumental and bioinformaticdevelopments to increase the confidence and extent of protein identification. In particular, more advanced multistage chromatographic separations combined with high throughput LTQ-MS/MS measurements will be evaluated.

Figure 8 Diagnostic Peptide at 1% R. palustrisMS/MS at 25% and 1% R. palustris

TVPSTSNDRPNVALTPAAPWPGAPFVPTGNPFADGVGPGSYAQR

400 600 800 1000 1200 1400 1600 1800 2000m/z

0

5

10

15

20

25

30

35

40

45

50

55

60

65

70

75

80

85

90

95

100

Rel

ativ

e A

bund

anc

e

1442.2

1792.8

1412.71553.6

1322.51181.0

1790.7

1180.1

991. 6835. 4

1722.91555.7

1222.6 1794.61067.8896. 2

1593.81877.8956. 4569. 1 800. 2

1622.5713. 2 1991.8523. 0682. 7482. 7

25%

400 600 800 1000 1200 1400 1600 1800 2000m/z

0

5

10

15

20

25

30

35

40

45

50

55

60

65

70

75

80

85

90

95

100

Rel

ativ

e A

bund

anc

e

1441.9

1790.8

1322.41792.0

1553.8

1323.2

1180.11412.5

1181.5

1794.71722.9991. 6

1067.7835. 4

896. 3 1592.61151.5 1818.11314.21651.6

946. 0569. 2 778. 4 1897.7509. 2 747. 8

1%

Figure 9 Diagnostic Peptide at 0.2% R. palustrisMS/MS at 25% and 0.2% R. palustris

ADVPELGLDNLPIIVPLR25%

400 600 800 100 0 120 0 140 0 160 0 180 0 200 0m/z

0

5

10

15

20

25

30

35

40

45

50

55

60

65

70

75

80

85

90

95

100

Rel

ativ

e A

bun

dan

ce

830. 2

807. 5

155 9.6

113 7.4156 0.6

113 8.4 146 0.6385. 1

484. 3

131 9.7

879. 8 114 9.6132 0.6111 9.8

795. 1625. 3103 4.6

143 2.9597. 5 177 0.1127 4.6651. 3 166 0.6512. 3369. 0 183 9.6 195 1.7

0.2%

400 600 800 100 0 120 0 140 0 160 0 180 0 200 0m/z

0

5

10

15

20

25

30

35

40

45

50

55

60

65

70

75

80

85

90

95

100

Rel

ativ

e A

bun

dan

ce

830. 3

155 9.7

113 7.3807. 6

385. 2 484. 2

113 8.4 146 1.6

131 9.7

114 9.5 132 0.8156 2.6134 7.5597. 2

795. 3 920. 6 112 0.2 127 5.1177 0.8682. 3507. 9 166 0.6339. 9

EXPERIMENTAL

Cell Growth and Production of Protein Fractions:l The four microbes (Figure 1) used in this study (E. coli, R. palustris, S.

cerevisiae, and S. oneidensis) were all grown individually and mixed after cell harvesting at appropriate concentrations based on wet cell weight. Figure 2 illustrates the different mixtures analyzed in this study.

l Mixed cell pellets were washed twice with Tris buffer, and then disrupted with sonication. Two crude protein fractions were created by ultracentrifugation 100,000g for 1 hour (membrane fraction and crude fraction). Protein fractions were quantified with BCA. Protein fractions were denatured, reduced, and digested with sequencing grade trypsin. The digested mixed proteomes were desalted via solid phase extraction and concentrated to ~10µg/µL and frozen at -80oC until analysis.

LC-MS/MS Analysis and Database Searching:l The tryptic digestions of the crude and membrane fractions from all

mixtures were analyzed via 24-hour two-dimensional LC-MS/MS with a split-phase MudPIT column as described in McDonald, W.H., IJMS, 2002. The LC MS system was composed of an Ultimate HPLC (LC Packings, a division of Dionex, San Francisco, CA) coupled to an LCQ-DECA XP ion trap mass spectrometer (Thermo Finnigan, San Jose, CA). Each analysis required ~500µg of starting protein material from either the crude or membranefraction.

l The resultant MS/MS spectra from the mixed proteomes were searched with SEQUEST (Thermo Finnigan) against three databases highlighted below. The result files were filtered and sorted with DTASelect (Tabb, Journal of Proteome Research 2004) and unique and non-unique peptide identifications were extracted with in-house perl scripts.

Databases Used to Search MS/MS Spectral DB4 - the four species in the mixtures:

• E. coli• R. palustris• S. cerevisiae• S. oneidensis

l DB13 - the four species plus 1 plant and 8 other microbesDB large - 261 species with 1,011,612 total proteins

Figure 1 Test species for Artificial Mixture

Figure 2 Concentrations in Artificial Mixtures

R. palustris E. coli

S. oneidensis S. cerevisiae

0

500

1000

1500

2000

2500

3000

3500

4000

# U

niqu

e Pe

ptid

es

E. coli

R. palustris

S. cerevisiae

S. oneidensis

others

Figure 5 Database comparison of the 25% mixture

Initial Mixture Studies:l The initial experiments for this study have focused on testing current 2D-LC-

MS/MS methodologies to analyze an artificial 4 microbe mixture of E. coli, R. palustris, S. cerevisiae and S. oneidensis.

l Two aspects of community proteomics were analyzed in this initial study:Ø The first was to determine at what level functionally meaningful results

could be obtained from a target species (R. palustris) whose concentration was decreased over a range of concentrations from 25% to 0.2%. All experiments were conducted with current 2D-LC-MS/MS technologies.

Ø The second was to test the effects of database size on the identification of unique peptides from the four microbial species in the sample. Unique peptides are defined as those peptides from a given protein that are unique to a specific species in a given database.

l Figure 3 highlights the results from the 4 species database with decreasing concentrations of R. palustris.

l Figure 4 highlights the results from the 13 species database with decreasing concentrations of R. palustris.

l Figure 5 highlights results from the 261 species database at the 25% of all species in the mixture.

Figure 3 Unique Peptides Identified with DB4

0500

1000150020002500300035004000450050005500

Percentage of R. palustris

# U

niqu

e Pe

ptid

es

E. coli (25-33%)

R. palustris

S. cerevisiae (25-33%)

S. oneidensis (25-33%)

25% 5% 1% 0.2% 0%

DB4 DB13 DB261Database

0

5001000

1500

20002500

3000

3500

40004500

5000

Percentage of R. palustris

# U

niqu

e Pe

ptid

esFigure 4 Unique Peptides Identified with DB13

25% 5% 1% 0.2% 0%

E. coli (25-33%)

R. palustris

S. cerevisiae (25-33%)

S. oneidensis (2-33%)

others (9)

Target R. Palustris Proteome Results from Initial Mixture Studies

l The functional categories of the identified proteome from R. palustris analyzed from the different mixtures was compared with our R. palustris baseline proteome (VerBerkmoes et al. in preparation). Figure 6 illustrates this comparison with top left being the entire R. palustris baseline proteome and the other three panels being the identified functional categories at the respective mixture concentrations.

l Figure 7 highlights the results from a single protein identified in the mixture proteomes. This protein puhA is critical subunit of thephotoreaction center and is indicative of metabolic state i.e. photosynthesis.

l Figures 8 and 9 illustrate diagnostic tandem mass spectra of peptides from puhA at the different concentrations of R. palustris in the sample, ranging down to 0.2%.

Figure 7 Protein PuhA, a Component of the Photosynthetic Reaction Center from R. palustris

R. palustis proteome1695 Total Proteins

Unknowns and Unclassif ied

Replication and Repair

Energy Metabolism

Carbon and Carbohy drate Metabolism

Lipid Metabolism

Transcription

Translation

Cellular Processes

Amino Acid Metabolism

General f unction prediction

Metabolism of Cofactors and Vitamins

Transport

Signal Transduction

Purine and Pyrimidine Metabolism

Mix 1 R. palustris 25% 610 total proteins

Mix 2 R. palustris 5%227 Total Proteins

Mix 3 R. palustris 1% 70 total Proteins

Figure 6 Functional Categories