Enhancing Sequence Coverage in Proteomics Studies by Using a Combination of Proteolytic Enzymes

Dominic Baeumlisberger², Christopher Kurz³, Tabiwang N. Arrey¹, Marion Rohmer², Carola Schiller³, Thomas Moehring¹, Walter A. Möller³, and Michael Karas²

¹Thermo Fisher Scientific, Bremen, Germany; ²Institute for Pharmaceutical Chemistry, Goethe-University, Frankfurt am Main, Germany; ³Department of Pharmacology, Goethe-University, Frankfurt am Main, Germany


Conclusion

The use of three different enzymes in proteomics studies enabled an average increase of approximately 227.5% in the total number of peptides identified and of about 68.8% in the number of protein groups identified.

The use of three different enzymes led to an average increase in protein sequence coverage of about 31%.

The use of three different enzymes improved overall confidence in protein identification.

The use of three different enzymes aided the study of changes in protein sequences and post-translational modifications.

The high mass accuracy in both MS and MS/MS minimized the false discovery rate (FDR).

In spite of the increase in sequence coverage with multiple enzyme digests, the highest number of protein and peptide identifications for a single proteolytic digest was obtained with trypsin.


Overview

Purpose: Increase sequence coverage and overall confidence of protein identification using a combination of datasets from three enzyme digests.

Methods: Peptides generated by proteolytic digestion of mitochondrial membrane proteins were analyzed using a hybrid quadrupole-Orbitrap mass spectrometer.

Results: Combining datasets from multiple enzyme digests improved the sequence coverage of proteins, increased the total number of unique peptides and protein groups identified, and minimized false-positive discovery rates.

Introduction

Besides being the main site of adenosine triphosphate (ATP) production, mitochondria are associated with a range of other processes and diseases such as cell growth, cellular differentiation, mitochondrial disorders, aging processes and cardiac dysfunctions. To obtain a better understanding of these mitochondrial processes and diseases, we need to identify the proteins and protein modifications involved.

The ability to identify and characterize large numbers of proteins from medium- to high-complexity samples has made mass spectrometry (MS) coupled to reversed-phase high-performance liquid chromatography (HPLC) a common analytical technique in proteomics. Usually, the extracted proteins are digested with a suitable protease and the resulting peptide mixture is separated and analyzed. Trypsin is the most common enzyme of choice for proteomics experiments. Digestion with trypsin (or any single enzyme in general) often results in the identification of large numbers of proteins, but sequence coverage is frequently incomplete. If maximum sequence coverage is desired (e.g. when studying changes in protein modification or different isoforms), then signals covering all or most of the protein sequence are needed. Different approaches have been used to improve protein sequence coverage in proteomics. In this study, data obtained from individual trypsin, chymotrypsin and elastase digests were combined to significantly improve sequence coverage of proteins.

Methods

Sample Preparation

Purified mitochondrial membrane proteins from mouse brain were dissolved in 25 mM triethylammonium bicarbonate buffer. Disulfide bridges were reduced with dithiothreitol, and the proteins were alkylated with iodoacetamide and digested overnight with trypsin, chymotrypsin or elastase in separate digests. Digestion was stopped by freezing at −20 °C. Just before separation, each digest was labeled with the Thermo Scientific Amine-Reactive Tandem Mass Tag (TMT0) Reagent to improve fragmentation, especially of the elastase- and chymotrypsin-generated peptides.

Liquid Chromatography

Samples were loaded onto a Thermo Scientific Acclaim PepMap100 C18 pre-column (100 μm × 2 cm, C18 5 μm, 100 Å), and separated on a reversed-phase Acclaim PepMap100 C18 column (75 μm × 15 cm, C18 3 μm, 120 Å) using the Thermo Scientific EASY-nLC 1000 nanoflow HPLC. A 90 min gradient at a flow rate of 300 nL/min was used for the separation. Triplicate runs of individual enzyme digests were performed.

Mass Spectrometry

All MS and MS/MS spectra were acquired in positive ion mode using a Thermo Scientific Q Exactive hybrid quadrupole-Orbitrap mass spectrometer. Full-scan data were obtained at a resolution of 70,000 (at m/z 200) with a target of 1e6 ions over the range m/z 350–1800. For tandem MS, a target of 1e5 charges was used and the fragment ions were measured at a resolution of 17,500 (at m/z 200). The 10 most intense ions in each spectrum were selected for fragmentation, with a maximum injection time of 200 ms.
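The top-10 selection described above can be illustrated with a minimal Python sketch. The function name `select_top_n`, the peak tuples and the toy survey scan are hypothetical, and the sketch ignores instrument features such as dynamic exclusion and charge-state filtering; it only shows the idea of ranking survey-scan peaks by intensity within the scanned m/z window.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def select_top_n(peaks: List[Peak], n: int = 10,
                 mz_min: float = 350.0, mz_max: float = 1800.0) -> List[Peak]:
    """Return the n most intense peaks inside the survey-scan m/z window."""
    in_range = [p for p in peaks if mz_min <= p[0] <= mz_max]
    # Sort by intensity, highest first, and keep the first n entries.
    return sorted(in_range, key=lambda p: p[1], reverse=True)[:n]


# Hypothetical survey scan: 15 peaks with made-up, distinct intensities (1..15).
survey = [(400.0 + 50.0 * i, float((i * 7) % 15 + 1)) for i in range(15)]
precursors = select_top_n(survey)

for mz, intensity in precursors:
    print(f"m/z {mz:7.1f}  intensity {intensity:4.1f}")
```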

Data Analysis

The raw data files were searched using Thermo Scientific Proteome Discoverer software v. 1.3 with the Mascot v. 2.2.1 search engine (Matrix Science Ltd, London, UK). The peptide mass tolerance was set at 15 ppm for MS and the fragment mass tolerance at 20 mmu for MS/MS. A high-confidence peptide filter with an FDR of 1% was used.
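To make the two tolerance units concrete, the following Python sketch shows how a relative (ppm) and an absolute (mmu) mass tolerance could be checked. The function names and the example m/z values are hypothetical and are not taken from the search software; only the 15 ppm and 20 mmu thresholds come from the text.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_precursor_tolerance(observed_mz: float, theoretical_mz: float,
                               tol_ppm: float = 15.0) -> bool:
    """Relative tolerance, as used here for MS (precursor) masses."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def within_fragment_tolerance(observed_mz: float, theoretical_mz: float,
                              tol_mmu: float = 20.0) -> bool:
    """Absolute tolerance, as used here for MS/MS fragments (1 mmu = 0.001 m/z units)."""
    return abs(observed_mz - theoretical_mz) * 1000.0 <= tol_mmu


# Hypothetical values: a 0.010 deviation at m/z 1000 is a 10 ppm error.
print(round(ppm_error(1000.010, 1000.0), 3))
print(within_precursor_tolerance(1000.010, 1000.0))
print(within_fragment_tolerance(500.015, 500.0))
```

Because a ppm tolerance scales with m/z, 15 ppm corresponds to 0.015 m/z units at m/z 1000 but only 0.0075 at m/z 500, whereas the 20 mmu fragment window is the same width everywhere.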

FIGURE 1. Proteins identified in triplicate experiments of each enzyme digest.

FIGURE 4. Annotated tandem MS spectrum of the peptide AIQGGVLAGDVTDVLLLDVTPL generated from the elastase digest. b- and a-type ions are shown in red and y-type ions in blue. The mass deviation of this peptide was 0.01 ppm (IonScore: 136) in MS and below 10 ppm for fragment ions in MS/MS.

Results

The Q Exactive mass spectrometer provides not only rich fragmentation but also immonium ions, which are important for peptide correlation. Coupled with the high resolution and high mass accuracy in both MS and MS/MS, this makes reliable identification possible, which is especially important for peptides generated using less-specific enzymes. Figure 1 shows triplicate runs of individual enzyme digests. At the protein level, reproducibility rates of 69.9%, 62.3% and 58.25% were obtained for trypsin, elastase and chymotrypsin, respectively. At the peptide level, however, reproducibility decreased to 57%, 46.92% and 42.97%, respectively (see Figure 2).
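The poster does not state how the reproducibility rate was calculated. As an illustration only, the Python sketch below assumes one plausible definition: the share of identifications found in all three replicate runs relative to everything identified in at least one run. The function name and the toy accession sets are hypothetical.

```python
from typing import Iterable, Set


def reproducibility(replicates: Iterable[Set[str]]) -> float:
    """Percent of identifications seen in every replicate, relative to the union.

    Assumed definition; the source does not specify the formula it used.
    """
    runs = list(replicates)
    if not runs:
        raise ValueError("need at least one replicate")
    union = set().union(*runs)
    if not union:
        return 0.0
    shared = set.intersection(*runs)
    return 100.0 * len(shared) / len(union)


# Toy identifiers for three technical replicates of one digest.
run1 = {"P1", "P2", "P3", "P4"}
run2 = {"P1", "P2", "P3", "P5"}
run3 = {"P1", "P2", "P4", "P5"}

rate = reproducibility([run1, run2, run3])
print(f"{rate:.1f}% of identifications were found in all three runs")
```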

In total, 12,007 peptides were identified from the combined triplicate datasets of the three enzyme digests. As expected, no peptide common to all three enzyme digests was identified, and less than 1% of the identified peptides were found in two enzyme digests. As shown in Figure 3, most identified peptides were unique to one digest, and the shared peptide sequences in most cases cover regions that could not be identified by one enzyme digest alone. While the peptides shared between trypsin/chymotrypsin and trypsin/elastase mostly had R or K at their C-termini, 54.05% of those shared between chymotrypsin and elastase were outside the defined cleavage sites of chymotrypsin (Y, W, F, M, L). Most of these peptides have A, V, L or S at their C-termini, which are typical cleavage sites for elastase.
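The C-terminal comparison above can be sketched in Python as a simple lookup. The residue sets are the ones named in the text; the function names and the toy peptide list are hypothetical, and the check looks only at the C-terminal residue (it does not verify the residue preceding the peptide in the protein).

```python
# Cleavage residues as listed in the text (C-terminal side of the cut).
CLEAVAGE_SITES = {
    "trypsin": set("KR"),
    "chymotrypsin": set("YWFML"),
    "elastase": set("AVLS"),
}


def enzymes_matching_c_terminus(peptide: str) -> list:
    """Return the enzymes whose listed cleavage residues match the last residue."""
    if not peptide:
        raise ValueError("empty peptide")
    last = peptide[-1].upper()
    return sorted(e for e, sites in CLEAVAGE_SITES.items() if last in sites)


def percent_outside(peptides: list, enzyme: str) -> float:
    """Percent of peptides whose C-terminus is not a listed site of the enzyme."""
    outside = [p for p in peptides if enzyme not in enzymes_matching_c_terminus(p)]
    return 100.0 * len(outside) / len(peptides)


# The Figure 4 peptide ends in L, a listed site for both enzymes.
print(enzymes_matching_c_terminus("AIQGGVLAGDVTDVLLLDVTPL"))

# Made-up peptides standing in for a chymotrypsin/elastase overlap.
shared = ["DVTDVA", "GGVLS", "AIQGGVL", "TDVLF"]
print(percent_outside(shared, "chymotrypsin"))
```

Leucine appears in both residue lists, so a peptide ending in L cannot be assigned to one enzyme from its C-terminus alone.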

Mascot is a registered product of Matrix Science Ltd. All other trademarks are the property of Thermo Fisher Scientific and its subsidiaries.

This information is not intended to encourage use of these products in any manner that might infringe the intellectual property rights of others.

FIGURE 5. Total number of protein groups identified from triplicate runs of all enzymes. The highest number of proteins was identified with trypsin.

FIGURE 3. Venn diagram showing unique peptides identified from triplicate experiments of all three enzyme digests. As expected, no identified peptide was common to all three enzyme preparations.

FIGURE 6. A) Sequence coverage achieved using different enzymes for the 453 amino acid protein cytochrome b-c1 complex subunit 2. Green represents sections of the protein that were identified; white represents sections that were not covered by any of the identified peptides. Relative to the single-enzyme digests, the combined dataset increased sequence coverage by 7.3%, 45.2% and 56.4% for trypsin, elastase and chymotrypsin, respectively. Combining all datasets, a net increase of 32.8% is obtained. B) Comparison of sequence coverage from a single enzyme digest (trypsin) with that of the combined dataset for identified membrane proteins. Dark blue bars represent coverage obtained with trypsin alone and red bars the coverage from the sum of all enzymes used.

FIGURE 7. Amino acid sequence of ATP synthase subunit beta showing the sections of the protein that were identified, annotated with known modifications (from UniProt). Acetylation is represented by A and phosphorylation by P.

In total, 992 protein groups were identified across all enzyme digests, of which 18.25% were mitochondrial membrane proteins. Approximately 33% of the total number of identified proteins were present in the combined dataset (Figure 5). Combining the digests not only led to a significant increase in the number of protein groups identified but also enhanced the overall sequence coverage. However, the sequence coverage varied from protein to protein. For example, 100% or close to 100% sequence coverage was achieved for small proteins (<100 amino acids) such as NADH dehydrogenase [ubiquinone] 1 alpha subcomplex subunit or cytochrome b-c1 complex subunit, while for larger proteins (>400 amino acids) such as cytochrome b-c1 complex subunit 2, shown in Figure 6, sequence coverage above 90% was obtained.

[Figure 6A panels: sequence coverage maps for cytochrome b-c1 complex subunit 2 (residue axis 1–453). All 3 enzymes: 94.26%; Chymotrypsin: 60.26%; Elastase: 64.90%; Trypsin: 87.86%.]
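The percentage increases quoted in the Figure 6 caption can be checked against the panel coverage values with a short Python calculation. The per-enzyme figures follow directly from a relative increase over each single digest. For the net figure, the baseline is an assumption: taking the mean of the three single-enzyme coverages gives roughly 32.7%, close to the 32.8% in the caption.

```python
# Coverage values (%) from the Figure 6A panels.
single_coverage = {"trypsin": 87.86, "elastase": 64.90, "chymotrypsin": 60.26}
combined_coverage = 94.26


def relative_increase(baseline: float, combined: float) -> float:
    """Percent increase of the combined coverage over a baseline coverage."""
    return (combined - baseline) / baseline * 100.0


per_enzyme = {
    enzyme: round(relative_increase(cov, combined_coverage), 1)
    for enzyme, cov in single_coverage.items()
}

# Assumed baseline for the net figure: mean of the three single-digest coverages.
mean_single = sum(single_coverage.values()) / len(single_coverage)
net_increase = relative_increase(mean_single, combined_coverage)

print(per_enzyme)
print(f"net increase over mean single-enzyme coverage: {net_increase:.2f}%")
```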

[Figure 4 spectrum. Scan header: FTMS + p NSI d Full ms2, m/z range 120.00–2480.00. Main panel: m/z (200–1600) versus relative abundance (0–100). Zoomed panels: m/z 300–600 and m/z 700–1700 versus intensity (×10^6, 0.0–2.0). Labeled peaks (m/z): 229.1541, 301.2065, 434.7753, 470.2938, 542.3498, 656.3893, 755.4594, 868.5435, 939.5805, 996.6036, 1111.6288, 1210.6965, 1311.7445, 1426.7715, 1525.8417. Annotated ions: y2+, y3+; b1+ to b14+; a6+ to a8+; doubly charged b6, b7, b8 and b10.]

FIGURE 2. Peptides identified in triplicate experiments of each enzyme digest.

A common phenomenon observed with peptides generated by less-specific enzymes such as elastase is the absence of charge localization at either the N- or the C-terminus. Fragmentation of these peptides results in a lack of extended b- or y-ion series and an increase in internal fragment ions. With the basic moiety introduced by the TMT0 label, extended b-ion series were generated. Figure 4 shows an example tandem mass spectrum of such a peptide, AIQGGVLAGDVTDVLLLDVTPL, with a monoisotopic mass of 2408.38506.

The use of multiple enzyme digests in proteomic studies might enable proteolytic cleavage at sites further away from modified residues, thereby overcoming incomplete digestion caused by these protein modifications. For example, with a combination of datasets, peptides covering almost all known modifications (present in UniProt) of ATP synthase subunit beta were identified (Figure 7). This was not true for all the identified proteins; nevertheless, a reasonable number of modified peptides were identified. This shows that some portions of the proteome are simply inaccessible following digestion with a single protease. Therefore, in combination with technical replicates, multiple proteases can be used to significantly improve the sequence coverage of proteins from a proteome and to increase the degree of confidence in protein identification. In addition, proteins that were identified by only one of the enzymes would have been missed if another enzyme had been used alone in this experiment.
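How combining digests raises sequence coverage can be illustrated with a small Python sketch that maps identified peptides back onto a protein sequence and counts covered residues. The protein sequence, the peptide lists and the function names are all hypothetical toy values chosen for illustration; exact string matching stands in for the search engine's peptide-to-protein assignment.

```python
from typing import Iterable, Set


def covered_positions(protein: str, peptides: Iterable[str]) -> Set[int]:
    """Zero-based residue positions covered by at least one peptide."""
    covered: Set[int] = set()
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:  # mark every occurrence of the peptide
            covered.update(range(start, start + len(pep)))
            start = protein.find(pep, start + 1)
    return covered


def coverage_percent(protein: str, peptides: Iterable[str]) -> float:
    """Sequence coverage as the percent of residues covered."""
    return 100.0 * len(covered_positions(protein, peptides)) / len(protein)


# Hypothetical 20-residue protein and made-up identifications per digest.
protein = "MKAIQGGVLAGDVTDVLLLR"
hits = {
    "trypsin": ["MK"],
    "elastase": ["AIQGGVLA", "GDVTDV"],
    "chymotrypsin": ["LLR"],
}

for enzyme, peptides in hits.items():
    print(f"{enzyme:13s} {coverage_percent(protein, peptides):5.1f}%")

# The combined dataset is the union of the peptide lists from all digests.
all_peptides = [p for peptides in hits.values() for p in peptides]
combined = coverage_percent(protein, all_peptides)
print(f"{'combined':13s} {combined:5.1f}%")
```

Counting covered positions as a set means overlapping peptides from different digests are not double-counted, so the combined coverage can never exceed 100%.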

[Figure 6 plot residue. Panel labels: A, B. Panel B axes: sequence coverage (0.00–100.00) versus total number of identified membrane proteins (1–201). Series: ΣCoverage; Coverage (Trypsin).]


Thermo Scientific Poster Note • PN63603_E 06/12S

Enhancing Sequence Coverage in Proteomics Studies by Using a Combination of Proteolytic EnzymesDominic Baeumlisberger2, Christopher Kurz3 Tabiwang N. Arrey1, Marion Rohmer2, Carola Schiller3, Thomas Moehring1, Walter A. Möller3 and Michael Karas21Thermo Fisher Scientific, Bremen, Germany, 2Institute for Pharmaceutical Chemistry, Goethe-University, Frankfurt am Main, Germany, 3Department of Pharmacology, Goethe-University, Frankfurt am Main, Germany

Conclusion The use of three different enzymes in proteomics studies enabled an average

increase in total number of peptides of approximately 227.5 % and protein groups of about 68.8 % identified.

The use of three different enzymes led to an average increase in protein sequence coverage of about 31 %.

The use of three different enzymes improved overall confidence in protein identification

The use of three different enzymes aided the study of changes in protein sequences and post-translational modifications.

The high mass accuracy in both MS and MS/MS minimized false discovery rate (FDR).

In spite of the increase in sequence coverage with multiple enzyme digests, the highest number of protein and peptide identification for single proteolytic digest was obtained with trypsin.

References1. G. Choudhary et al., JPR, 2003, 2 (1), 59–67.

2. A. Gardner and G. R. Boles, Curr. Psychiatry Rev., 2005, 1 (3): 255–271.

3. A. E. Speers and C. C. Wu, Chem Rev., 2007, 107(8):3687–3714.

4. B. Rietschel et al. MCP, 2009, 8(5):1029-43.

5. D. Baeumlisberger et al. Proteomics, 2010, 10(21):3905-9.

OverviewPurpose: Increase sequence coverage and overall confidence of protein identification using a combination of datasets from three enzyme digests.

Methods: Peptides generated by proteolytic digestion of mitochondrial membrane were analyzed using a hybrid quadrupole-OrbitrapTM mass spectrometer.

Results: Combination of datasets from multiple enzyme digests enabled improved sequence coverage of proteins, increased the total number of unique peptide and protein groups identified, and minimized false-positive discovery rates.

IntroductionBesides being the main site of adenosine triphosphate (ATP), mitochondria are associated with a range of other processes and diseases such as cell growth, cellular differentiation, mitochondrial disorder, aging processes and cardiac dysfunctions. To obtain a better understanding of these mitochondrial processes and diseases, we need to identify the proteins and proteins modifications involved.

The ability to identify and characterize large numbers of proteins from medium- to high- complexity samples has made mass spectrometry (MS) coupled to reversed-phase high-performance liquid chromatography (HPLC) a common analytical technique in proteomics. Usually, the extracted proteins are digested with a suitable protease and the resulting peptide mixture is separated and analyzed. Trypsin is the common enzyme of choice for proteomics experiments. Digestion with trypsin (or any single enzyme in general) often results in the identification of large numbers of proteins, but sequence coverage is frequently incomplete. If maximum sequence coverage is desired (e.g. when studying changes in protein modification or different isoforms), then signals covering all or most of the protein sequence are needed. Different approaches have been used to improve protein sequence coverage in proteomics. In this study, data obtained from individual trypsin, chymotrypsin and elastase digests were combined to significantly improve sequence coverage of proteins.

MethodsSample Preparation

Purified mitochondrial membrane proteins from mouse brain were dissolved in 25 mM triethylammonium bicarbonate buffer. Disulfide bridges were reduced in dithiothreitol, alkylated with iodoacetamide and digested over night with trypsin, chymotrypsin and elastase. Digestion was stopped by freezing at −20°C. Just before separation, each digest was labeled with the Thermo Scientific Amine-Reactive Tandem Mass Tag (TMT0) Reagent, to improve fragmentation, especially of the elastase and chymotrypsin generated peptides.

Liquid Chromatography

Samples were loaded onto a Thermo Scientific Acclaim PepMap100 C18 pre-column (100 μm × 2 cm, C18 5 μm, 100 Å), and separated on a reversed-phase Acclaim®

PepMapTM100 C18 column (75 μm × 15 cm, C18 3 μm, 120 Å) using the Thermo Scientific EASY-nLC 1000 nanoflow HPLC. A 90 min gradient at a flow rate of 300 nL/min was used for the separation. Triplicate runs of individual enzyme digests were performed.

Mass Spectrometry

All MS and MS/MS spectra were acquired in positive ion mode using a Thermo Scientific Q Exactive hybrid quadrupole-Orbitrap mass spectrometer. Full-scan data was obtained at a resolution of 70,000 (at m/z 200), demanding 1e6 ions in the mass range 350–1800 Da. For the tandem MS, 1e5 charges were required and the fragment ions were measured at a resolution of 17,500 (at m/z 200). The 10 most intensive ions in a spectrum were selected for fragmentation with a maximum injection time of 200ms.

Data Analysis

The raw data files were searched using Thermo Scientific Proteome Discoverer software v. 1.3 with MascotTM v. 2.2.1 search engine (Matrix Science Ltd, London UK). The peptide tolerance for MS was set at 15 ppm and for MS/MS 20 mmu. A high-confidence peptide filter with FDR of 1% was used.

FIGURE 1. Proteins identified in triplicate experiments of each enzyme digest.

FIGURE 4. Tandem MS and annotated spectrum of the peptide AIQGGVLAGDVTDVLLLDVTPL generated from elastase digest. b-/a-type ions are shown in red while y-type ions in blue colour. The mass deviation of this peptide was 0.01 ppm (IonScore: 136) in MS and below 10 ppm for fragment ions in MS/MS.

ResultsThe Q ExactiveTM mass spectrometer provides not only rich fragmentation but also immonium ions, which are important for peptide correlation. Coupled with the high resolution and high mass accuracy in both MS and MS/MS, reliable identification is possible. This is especially very important for peptides generated using less-specific enzymes. Figure 1 shows triplicate runs of individual enzyme digests. Reproducibility rates of 69.9%, 62.3 % and 58. 25 % were obtained for trypsin, elastase and chymotrypsin, respectively. However, at the peptide level, it decreased to 57%, 46.92 % and 42. 97 % (see Figure 2) respectively.

In total 12,007 peptides from a combination of triplicate dataset of 3 enzyme digests were identified. As expected, no peptide common to all three enzyme digests was identified. Less than 1% of the total number of identified peptides were identified in two enzyme digests. As shown in Figure 3, mostly unique peptides were identified and common peptide sequences in most cases cover regions that could not be identified by one enzyme digest. While the shared peptides between trypsin /chymotrypsin and trypsin/elastase contained basically R and K amino acids at their C termini, 54.05 % of those shared between chymotrypsin and elastase were outside the define cleavage sites (Y, W, F, M, L) of chymotrypsin. Most of these peptides have A, V, L and S at their C-termini, typical cleavage sites for elastase.

Mascot is a registered product of Matrix Science Ltd. All other trademarks are the property of Thermo Fisher Scientific an its subsidiaries.

This information is not intended to encourage use of these products in any manners that might infringe the intellectual property rights of others.

FIGURE 5. Total number of protein groups identified from triplicate runs of all enzymes. The highest number of proteins were identified with trypsin.FIGURE 3. Venn diagram showing unique peptides identified from triplicates

experiments in all 3 enzyme digest. As expected, no peptide identified was common to all three enzyme preparations.

FIGURE 6. A) Sequence coverage achieved using different enzymes for a 453 amino acid protein Cytochrome b-c1 complex subunit 2. Green represents sections of the protein that were identified and white, the sections that were not covered by any of the identified peptides. The sequence coverage increased by 7.3 %, 45.2 %, and 56.4% for trypsin, elastase and chymotrypsin respectively. Combining all datasets, a net increase of 32.8 % is obtained. B) Comparison of sequence coverage from a single enzyme digest (trypsin) to that of the combined dataset for identified membrane proteins. Dark blue bars represent coverage obtain with trypsin alone and red bars from the sum of all enzymes used.

FIGURE 7. Amino acid sequence of ATP synthase subunit beta showing sections of the protein that was identified with annotated known modification (from UniProt). Acetylation is represented by A and phosphorylation by P.

In general, 992 protein groups were identified in all enzyme digests, of which 18.25% were mitochondrial membrane proteins. Approximately 33% of the total number of identified proteins were present in the combined dataset (Figure 5). This not only lead to a significant increase in the number of protein groups identified but also enhanced the overall sequence coverage. However, the sequence coverage varied from protein to protein. For example, 100% or close to 100% sequence coverage was achieved for the small proteins (>100 amino acid) NADH dehydrogenase [ubiquinone] 1 alpha subcomplex subunit or cytochrome b-c1 complex subunit, while for larger proteins such as cytochrome b-c1 complex subunit 2 (> 400 amino acid) as shown in Figure 6, sequence coverage above 90% was obtained.

453401351301251201151101511

All 3 enzymes 94.26%

453401351301251201151101511

Chymotrypsin 60.26%

453401351301251201151101511

Elastase 64.90%

453401351301251201151101511

Trypsin 87.86%

FTMS + p NSI d Full ms2 [email protected] [120.00-2480.00]

200 400 600 800 1000 1200 1400 1600m/z

0

10

20

30

40

50

60

70

80

90

100

Rel

ativ

e Ab

unda

nce

755.4594

868.5435

1111.6288

656.3893 939.5805229.1541

434.7753

470.29381426.7715301.2065

996.6036 1210.6965

542.3498

1525.84171311.7445

300 400 500 600m/z

0.0

0.5

1.0

1.5

2.0

y2+

y3+

b5+

b1+b6

2+

b2+b7

2+

b82+

b3+b4+b10

2+

700 800 900 1000 1100 1200 1300 1400 1500 1600 1700m/z

b6+

a6+

b7+

b8+

b9+

b10+

b11+ b14+b13+

a7+ a8+

Inte

nsity

10^

6

b12+

FIGURE 2. Peptides identified in triplicate experiments of each enzyme digest.

A common phenomenon which is observed with peptides generated by less-specific enzymes such as elastase, is the absence of charge localization at either the N- or C-terminus. Fragmentation of these peptides results in lack of extended b- or y-ion series and an increase in internal fragment ions. Due to the basic moiety (TMT0), extended b-ions were generated. Figure 4 shows an example of a tandem MS of this peptide, IQGGVLAGDVTDVLLLDVTPL with monoisotopic mass of 2408.38506.

The use of multiple enzyme digests in proteomic studies might enable proteolytic cleavages at sites further away from modified peptides, thereby overcoming incomplete digestion caused by these protein modifications. For example, with a combination of datasets, peptides covering almost all known modifications (present in UniProt) from ATP synthase subunit beta were identified (figure 7). This was not true for all the identified proteins; nevertheless, a reasonable number of modified peptides were identified. This shows that to some extent, some portions of the proteome are simply inaccessible following digestion with a single protease. Therefore, in combination with technical replicate, multiple proteases can be used to significantly improve sequence coverage of proteins from a proteome and increase the confidence degree in protein identification. In addition, proteins that were identified by individual enzymes would have been missed, if only this enzyme was used in this experiment.

A

B

0.00

20.00

40.00

60.00

80.00

100.00

1 21 41 61 81 101 121 141 161 181 201

Sequ

ence

Cov

erag

e

Total number of identifed membrane proteins

ΣCoverageCoverage (Trypsin)

Page 4: Enhancing Sequence Coverage in Proteomics …apps.thermoscientific.com/media/SID/LSMS/.../ASMS12_T135_ATabiwang...Enhancing Sequence Coverage in Proteomics Studies by Using a Combination

4 Enhancing Sequence Coverage in Proteomics Studies by Using a Combination of Proteolytic Enzymes

Enhancing Sequence Coverage in Proteomics Studies by Using a Combination of Proteolytic EnzymesDominic Baeumlisberger2, Christopher Kurz3 Tabiwang N. Arrey1, Marion Rohmer2, Carola Schiller3, Thomas Moehring1, Walter A. Möller3 and Michael Karas21Thermo Fisher Scientific, Bremen, Germany, 2Institute for Pharmaceutical Chemistry, Goethe-University, Frankfurt am Main, Germany, 3Department of Pharmacology, Goethe-University, Frankfurt am Main, Germany

Conclusion The use of three different enzymes in proteomics studies enabled an average

increase in total number of peptides of approximately 227.5 % and protein groups of about 68.8 % identified.

The use of three different enzymes led to an average increase in protein sequence coverage of about 31 %.

The use of three different enzymes improved overall confidence in protein identification

The use of three different enzymes aided the study of changes in protein sequences and post-translational modifications.

The high mass accuracy in both MS and MS/MS minimized false discovery rate (FDR).

In spite of the increase in sequence coverage with multiple enzyme digests, the highest number of protein and peptide identification for single proteolytic digest was obtained with trypsin.

References1. G. Choudhary et al., JPR, 2003, 2 (1), 59–67.

2. A. Gardner and G. R. Boles, Curr. Psychiatry Rev., 2005, 1 (3): 255–271.

3. A. E. Speers and C. C. Wu, Chem Rev., 2007, 107(8):3687–3714.

4. B. Rietschel et al. MCP, 2009, 8(5):1029-43.

5. D. Baeumlisberger et al. Proteomics, 2010, 10(21):3905-9.

OverviewPurpose: Increase sequence coverage and overall confidence of protein identification using a combination of datasets from three enzyme digests.

Methods: Peptides generated by proteolytic digestion of mitochondrial membrane were analyzed using a hybrid quadrupole-OrbitrapTM mass spectrometer.

Results: Combination of datasets from multiple enzyme digests enabled improved sequence coverage of proteins, increased the total number of unique peptide and protein groups identified, and minimized false-positive discovery rates.

IntroductionBesides being the main site of adenosine triphosphate (ATP), mitochondria are associated with a range of other processes and diseases such as cell growth, cellular differentiation, mitochondrial disorder, aging processes and cardiac dysfunctions. To obtain a better understanding of these mitochondrial processes and diseases, we need to identify the proteins and proteins modifications involved.

The ability to identify and characterize large numbers of proteins from medium- to high- complexity samples has made mass spectrometry (MS) coupled to reversed-phase high-performance liquid chromatography (HPLC) a common analytical technique in proteomics. Usually, the extracted proteins are digested with a suitable protease and the resulting peptide mixture is separated and analyzed. Trypsin is the common enzyme of choice for proteomics experiments. Digestion with trypsin (or any single enzyme in general) often results in the identification of large numbers of proteins, but sequence coverage is frequently incomplete. If maximum sequence coverage is desired (e.g. when studying changes in protein modification or different isoforms), then signals covering all or most of the protein sequence are needed. Different approaches have been used to improve protein sequence coverage in proteomics. In this study, data obtained from individual trypsin, chymotrypsin and elastase digests were combined to significantly improve sequence coverage of proteins.

MethodsSample Preparation

Purified mitochondrial membrane proteins from mouse brain were dissolved in 25 mM triethylammonium bicarbonate buffer. Disulfide bridges were reduced in dithiothreitol, alkylated with iodoacetamide and digested over night with trypsin, chymotrypsin and elastase. Digestion was stopped by freezing at −20°C. Just before separation, each digest was labeled with the Thermo Scientific Amine-Reactive Tandem Mass Tag (TMT0) Reagent, to improve fragmentation, especially of the elastase and chymotrypsin generated peptides.

Liquid Chromatography

Samples were loaded onto a Thermo Scientific Acclaim PepMap100 C18 pre-column (100 μm × 2 cm, C18 5 μm, 100 Å), and separated on a reversed-phase Acclaim®

PepMapTM100 C18 column (75 μm × 15 cm, C18 3 μm, 120 Å) using the Thermo Scientific EASY-nLC 1000 nanoflow HPLC. A 90 min gradient at a flow rate of 300 nL/min was used for the separation. Triplicate runs of individual enzyme digests were performed.

Mass Spectrometry

All MS and MS/MS spectra were acquired in positive ion mode using a Thermo Scientific Q Exactive hybrid quadrupole-Orbitrap mass spectrometer. Full-scan data was obtained at a resolution of 70,000 (at m/z 200), demanding 1e6 ions in the mass range 350–1800 Da. For the tandem MS, 1e5 charges were required and the fragment ions were measured at a resolution of 17,500 (at m/z 200). The 10 most intensive ions in a spectrum were selected for fragmentation with a maximum injection time of 200ms.

Data Analysis

The raw data files were searched using Thermo Scientific Proteome Discoverer software v. 1.3 with MascotTM v. 2.2.1 search engine (Matrix Science Ltd, London UK). The peptide tolerance for MS was set at 15 ppm and for MS/MS 20 mmu. A high-confidence peptide filter with FDR of 1% was used.

FIGURE 1. Proteins identified in triplicate experiments of each enzyme digest.

FIGURE 4. Tandem MS and annotated spectrum of the peptide AIQGGVLAGDVTDVLLLDVTPL generated from elastase digest. b-/a-type ions are shown in red while y-type ions in blue colour. The mass deviation of this peptide was 0.01 ppm (IonScore: 136) in MS and below 10 ppm for fragment ions in MS/MS.

ResultsThe Q ExactiveTM mass spectrometer provides not only rich fragmentation but also immonium ions, which are important for peptide correlation. Coupled with the high resolution and high mass accuracy in both MS and MS/MS, reliable identification is possible. This is especially very important for peptides generated using less-specific enzymes. Figure 1 shows triplicate runs of individual enzyme digests. Reproducibility rates of 69.9%, 62.3 % and 58. 25 % were obtained for trypsin, elastase and chymotrypsin, respectively. However, at the peptide level, it decreased to 57%, 46.92 % and 42. 97 % (see Figure 2) respectively.

In total, 12,007 peptides were identified from the combined triplicate datasets of the three enzyme digests. As expected, no peptide common to all three enzyme digests was identified, and less than 1% of the identified peptides were found in two enzyme digests. As shown in Figure 3, mostly unique peptides were identified, and the shared peptide sequences in most cases cover regions that could not be identified by one enzyme digest. While the peptides shared between trypsin/chymotrypsin and trypsin/elastase mostly had R or K at their C-termini, 54.05% of those shared between chymotrypsin and elastase lay outside the defined cleavage sites of chymotrypsin (Y, W, F, M, L). Most of these peptides have A, V, L or S at their C-termini, which are typical cleavage sites for elastase.
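The C-terminal comparison described above can be sketched as a simple lookup. The cleavage residues are the ones listed in the text; the function names and the example sequences are invented for illustration, and the sketch ignores peptides that end at the protein C-terminus.

```python
# Cleavage residues as listed in the text (cleavage on the C-terminal side).
CLEAVAGE_SITES = {
    "trypsin": set("KR"),
    "chymotrypsin": set("YWFML"),
    "elastase": set("AVLS"),
}


def consistent_enzymes(peptide):
    """Enzymes whose listed cleavage residues include the peptide's C-terminal residue."""
    c_term = peptide[-1]
    return sorted(name for name, sites in CLEAVAGE_SITES.items() if c_term in sites)


def percent_outside(peptides, enzyme):
    """Percentage of peptides whose C-terminus is not a listed cleavage residue of `enzyme`."""
    outside = [p for p in peptides if p[-1] not in CLEAVAGE_SITES[enzyme]]
    return 100.0 * len(outside) / len(peptides)


# Invented sequences standing in for peptides shared by two digests.
shared = ["GDVTA", "LLDVS", "TPEVL", "AGQKF"]
outside_chymotrypsin = percent_outside(shared, "chymotrypsin")
```

Leucine appears in both the chymotrypsin and the elastase list, so a peptide ending in L is consistent with either enzyme.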

Mascot is a registered product of Matrix Science Ltd. All other trademarks are the property of Thermo Fisher Scientific and its subsidiaries.

This information is not intended to encourage use of these products in any manner that might infringe the intellectual property rights of others.

FIGURE 5. Total number of protein groups identified from triplicate runs of all enzymes. The highest number of proteins was identified with trypsin.

FIGURE 3. Venn diagram showing unique peptides identified from triplicate experiments in all three enzyme digests. As expected, no identified peptide was common to all three enzyme preparations.

FIGURE 6. A) Sequence coverage achieved using different enzymes for the 453-amino-acid protein cytochrome b-c1 complex subunit 2. Green represents sections of the protein that were identified, and white the sections that were not covered by any of the identified peptides. Relative to each single-enzyme digest, the combined dataset increased the sequence coverage by 7.3%, 45.2% and 56.4% for trypsin, elastase and chymotrypsin, respectively. Combining all datasets gives a net increase of 32.8%. B) Comparison of sequence coverage from a single enzyme digest (trypsin) with that of the combined dataset for identified membrane proteins. Dark blue bars represent coverage obtained with trypsin alone and red bars the coverage from the sum of all enzymes used.
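The percentages in the Figure 6 caption are consistent with relative gains computed from the coverage values shown in panel A (trypsin 87.86%, elastase 64.90%, chymotrypsin 60.26%, all three enzymes 94.26%). The short check below reproduces them. The baseline for the net increase is not stated in the poster; taking the mean of the three single-enzyme coverages is an assumption that gives about 32.7%, close to the quoted 32.8%.

```python
# Coverage values (%) for cytochrome b-c1 complex subunit 2, from Figure 6A.
single = {"trypsin": 87.86, "elastase": 64.90, "chymotrypsin": 60.26}
combined = 94.26


def relative_increase(before, after):
    """Relative gain of `after` over `before`, in percent."""
    return 100.0 * (after - before) / before


gains = {enzyme: round(relative_increase(cov, combined), 1) for enzyme, cov in single.items()}

# Assumed baseline for the net increase: mean of the three single-enzyme coverages.
mean_single = sum(single.values()) / len(single)
net_gain = relative_increase(mean_single, combined)
```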

FIGURE 7. Amino acid sequence of ATP synthase subunit beta showing the sections of the protein that were identified, with known modifications (from UniProt) annotated. Acetylation is represented by A and phosphorylation by P.

In total, 992 protein groups were identified in all enzyme digests, of which 18.25% were mitochondrial membrane proteins. Approximately 33% of the total number of identified proteins were present in the combined dataset (Figure 5). This not only led to a significant increase in the number of protein groups identified but also enhanced the overall sequence coverage. However, the sequence coverage varied from protein to protein. For example, 100% or close to 100% sequence coverage was achieved for small proteins (<100 amino acids) such as NADH dehydrogenase [ubiquinone] 1 alpha subcomplex subunit or cytochrome b-c1 complex subunit, while for larger proteins (>400 amino acids) such as cytochrome b-c1 complex subunit 2, shown in Figure 6, sequence coverage above 90% was obtained.
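Sequence coverage, and the gain from pooling digests, can be illustrated with a small sketch that maps peptides back onto a protein sequence and counts the covered residues. The protein sequence, the peptides and the function names are invented for illustration; this is not the calculation performed by the search software.

```python
def covered_positions(protein, peptides):
    """Set of 0-based residue indices covered by any occurrence of any peptide."""
    covered = set()
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:
            covered.update(range(start, start + len(pep)))
            start = protein.find(pep, start + 1)
    return covered


def sequence_coverage(protein, peptides):
    """Percentage of residues covered by at least one peptide."""
    return 100.0 * len(covered_positions(protein, peptides)) / len(protein)


# Toy sequence and peptides, invented for illustration.
protein = "MKTAYIAKQRQISFVKSHFSRQ"
by_enzyme = {
    "trypsin": ["TAYIAK", "QISFVK"],
    "chymotrypsin": ["IAKQRQISF"],
    "elastase": ["KSHFS"],
}
per_enzyme = {e: sequence_coverage(protein, p) for e, p in by_enzyme.items()}
pooled = [p for peps in by_enzyme.values() for p in peps]
combined = sequence_coverage(protein, pooled)
```

In this toy case, peptides from each enzyme overlap only partly, so pooling them covers 18 of 22 residues, whereas the best single digest covers 12.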

[Figure 6A: sequence coverage maps over residues 1–453. Trypsin 87.86%; elastase 64.90%; chymotrypsin 60.26%; all 3 enzymes 94.26%.]

[Figure 4 spectrum. Scan header: FTMS + p NSI d Full ms2, scan range m/z 120.00–2480.00. Axes: m/z versus relative abundance (0–100), with zoomed panels for m/z 300–600 and m/z 700–1700 (intensity ×10^6). Labeled peaks (m/z): 229.1541, 301.2065, 434.7753, 470.2938, 542.3498, 656.3893, 755.4594, 868.5435, 939.5805, 996.6036, 1111.6288, 1210.6965, 1311.7445, 1426.7715, 1525.8417. Annotated fragments: singly charged b1 to b14, a6 to a8, y2 and y3; doubly charged b6, b7, b8 and b10.]

FIGURE 2. Peptides identified in triplicate experiments of each enzyme digest.

A common phenomenon observed with peptides generated by less-specific enzymes such as elastase is the absence of charge localization at either the N- or C-terminus. Fragmentation of these peptides results in a lack of extended b- or y-ion series and an increase in internal fragment ions. Because the TMT0 label adds a basic moiety, extended b-ion series were generated. Figure 4 shows an example, the tandem mass spectrum of the peptide AIQGGVLAGDVTDVLLLDVTPL, with a monoisotopic mass of 2408.38506.
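For readers less familiar with the ion nomenclature, the sketch below computes theoretical b- and y-ion m/z values for a peptide: b ions retain the N-terminus and y ions the C-terminus. The function name and the optional `nterm_mod` parameter (the mass added by an N-terminal label) are illustrative assumptions; no label mass is taken from the poster, and lysine labeling is not modeled.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_ions(peptide, nterm_mod=0.0, charge=1):
    """Return (b_ions, y_ions) as m/z lists, ordered b1..b(n-1) and y1..y(n-1)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_neutral = nterm_mod + sum(masses[:i])   # N-terminal fragment
        y_neutral = sum(masses[i:]) + WATER       # C-terminal fragment
        b_ions.append((b_neutral + charge * PROTON) / charge)
        y_ions.append((y_neutral + charge * PROTON) / charge)
    return b_ions, y_ions[::-1]


# Unlabeled example using the peptide sequence from Figure 4.
b_series, y_series = fragment_ions("AIQGGVLAGDVTDVLLLDVTPL")
```

Passing a label mass as `nterm_mod` shifts every b ion by that amount and leaves the y ions unchanged, which is why an N-terminal tag is visible throughout the b-ion series.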

The use of multiple enzyme digests in proteomic studies might enable proteolytic cleavage at sites further away from modified residues, thereby overcoming incomplete digestion caused by these protein modifications. For example, with a combination of datasets, peptides covering almost all known modifications (present in UniProt) of ATP synthase subunit beta were identified (Figure 7). This was not true for all the identified proteins; nevertheless, a reasonable number of modified peptides were identified. This shows that, to some extent, portions of the proteome are simply inaccessible following digestion with a single protease. Therefore, in combination with technical replicates, multiple proteases can be used to significantly improve the sequence coverage of proteins from a proteome and to increase the confidence in protein identification. In addition, proteins that were identified with only one of the enzymes would have been missed if that enzyme had not been used in this experiment.

[Figure 6B: bar chart. x-axis: total number of identified membrane proteins (1–201); y-axis: sequence coverage (0–100%); series: ΣCoverage and Coverage (Trypsin).]


Thermo Scientific Poster Note • PN63603_E 06/12S


Conclusion

The use of three different enzymes in proteomics studies enabled an average increase of approximately 227.5% in the total number of peptides identified and of about 68.8% in the number of protein groups identified.

The use of three different enzymes led to an average increase in protein sequence coverage of about 31%.

The use of three different enzymes improved overall confidence in protein identification.

The use of three different enzymes aided the study of changes in protein sequences and post-translational modifications.

The high mass accuracy in both MS and MS/MS minimized the false discovery rate (FDR).

In spite of the increase in sequence coverage with multiple enzyme digests, the highest numbers of protein and peptide identifications for a single proteolytic digest were obtained with trypsin.


Overview

Purpose: Increase sequence coverage and overall confidence of protein identification using a combination of datasets from three enzyme digests.

Methods: Peptides generated by proteolytic digestion of mitochondrial membrane proteins were analyzed using a hybrid quadrupole-Orbitrap™ mass spectrometer.

Results: Combination of datasets from multiple enzyme digests enabled improved sequence coverage of proteins, increased the total number of unique peptides and protein groups identified, and minimized false-positive discovery rates.

Introduction

Besides being the main site of adenosine triphosphate (ATP) production, mitochondria are associated with a range of other processes and diseases such as cell growth, cellular differentiation, mitochondrial disorders, aging processes and cardiac dysfunctions. To obtain a better understanding of these mitochondrial processes and diseases, we need to identify the proteins and protein modifications involved.

The ability to identify and characterize large numbers of proteins from medium- to high-complexity samples has made mass spectrometry (MS) coupled to reversed-phase high-performance liquid chromatography (HPLC) a common analytical technique in proteomics. Usually, the extracted proteins are digested with a suitable protease and the resulting peptide mixture is separated and analyzed. Trypsin is the most common enzyme of choice for proteomics experiments. Digestion with trypsin (or any single enzyme in general) often results in the identification of large numbers of proteins, but sequence coverage is frequently incomplete. If maximum sequence coverage is desired (e.g. when studying changes in protein modification or different isoforms), then signals covering all or most of the protein sequence are needed. Different approaches have been used to improve protein sequence coverage in proteomics. In this study, data obtained from individual trypsin, chymotrypsin and elastase digests were combined to significantly improve sequence coverage of proteins.

Methods

Sample Preparation

Purified mitochondrial membrane proteins from mouse brain were dissolved in 25 mM triethylammonium bicarbonate buffer. Disulfide bridges were reduced with dithiothreitol, alkylated with iodoacetamide and digested overnight in separate digests with trypsin, chymotrypsin or elastase. Digestion was stopped by freezing at −20 °C. Just before separation, each digest was labeled with the Thermo Scientific Amine-Reactive Tandem Mass Tag (TMT0) Reagent to improve fragmentation, especially of the elastase- and chymotrypsin-generated peptides.
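A simplified in-silico view of the three digests is sketched below, using the cleavage residues reported in the Results (R and K for trypsin; Y, W, F, M, L for chymotrypsin; A, V, L, S for elastase). It is an idealized model under stated assumptions: every listed site is cut, missed cleavages and sequence-context effects are ignored, and the protein sequence and function name are invented.

```python
def digest(protein, cleave_after, min_length=1):
    """Cut C-terminal to every residue in `cleave_after`; no missed cleavages."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        if aa in cleave_after:
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal remainder
    return [p for p in peptides if len(p) >= min_length]


# Toy sequence, invented for illustration.
protein = "MKTAYIAKQRQISFVKSHFSRQ"
tryptic = digest(protein, "KR")
chymotryptic = digest(protein, "YWFML")
elastase_like = digest(protein, "AVLS")
```

Because the three enzymes cut at different residues, each produces a different set of peptides from the same sequence, which is the basis for combining their datasets.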

Liquid Chromatography

Samples were loaded onto a Thermo Scientific Acclaim PepMap100 C18 pre-column (100 μm × 2 cm, C18 5 μm, 100 Å) and separated on a reversed-phase Acclaim® PepMap™100 C18 column (75 μm × 15 cm, C18 3 μm, 120 Å) using the Thermo Scientific EASY-nLC 1000 nanoflow HPLC. A 90 min gradient at a flow rate of 300 nL/min was used for the separation. Triplicate runs of individual enzyme digests were performed.




Thermo Fisher Scientific, San Jose, CA USA is ISO Certified.


www.thermoscientific.com

©2012 Thermo Fisher Scientific Inc. All rights reserved. ISO is a trademark of the International Standards Organization. All other trademarks are the property of Thermo Fisher Scientific Inc. and its subsidiaries. This information is presented as an example of the capabilities of Thermo Fisher Scientific Inc. products. It is not intended to encourage use of these products in any manner that might infringe the intellectual property rights of others. Specifications, terms and pricing are subject to change. Not all products are available in all countries. Please consult your local sales representative for details.