a comprehensive comparison of the de novo sequencing accuracies of peaks, bioanalyst and plgs bin ma...

1
A Comprehensive Comparison of the A Comprehensive Comparison of the de novo de novo Sequencing Accuracies of Sequencing Accuracies of PEAKS, BioAnalyst and PLGS PEAKS, BioAnalyst and PLGS Bin Ma 1 ; Amanda Doherty-Kirby 1 ; Aaron Booy 2 ; Bob Olafson 2 ; Gilles Lajoie 1 1 University of Western Ontario, London, ON, Canada 2 UVic Genome BC Proteomics Centre, Victoria, BC, Canada Overview We compared three commonly used de novo sequencing programs, PEAKS, BioAnalyst and PLGS. The result showed that PEAKS has the best accuracy. Experimental Result Q-TOF GLOBAL was used to measure the MS/MS spectra for BSA_BOVIN and ADH_YEAST for the comparison of PEAKS and PLGS. . A low filter (i.e. 10 cts/sec above background for the precursor ions) was used in the data collection and therefore a large number of spectra (265) were collected as the raw MS/MS data set. We then manually extracted all the spectra that have at least three strong y-ion matches with some peptides of the two proteins. The other spectra were discarded because they generally were of poor quality and we were not able to determine their peptides even knowing the protein sequences. Sixty-one spectra remained after this selection, and there are in total 764 amino acids in their sequences. Then both PLGS 2.0 and PEAKS were employed to compute the sequences de novo. Table 1 compares the performance of PEAKS and PLGS. It is worth noting that because of our selection criteria, many of the 61 spectra are of lower quality than needed by de novo sequencing. The numbers shown in Table 1 are valid for the comparison of the two programs. But the low success rate cannot be interpreted as the low quality of either software. It is also interesting that PEAKS and PLGS are complementary to each other, reflecting different methods employed in the two programs. Table 2 shows the de novo sequencing results of both programs. The sequences are in general sorted by the spectrum quality. We regard an amino acid computed by the software is correct if the mass is approximately equal to the mass of the amino acid at the corresponding position of the correct sequence. For example, a letter Q is regarded as correct if it corresponds to a letter K in the correct sequence. PEAKS P LG S C orrectam ino acids 456 232 C orrectsequences 13 7 S equences w ith length>4 tags 45 28 Table 1. PEAKS found more correct amino acids and sequences. m/z z correct PEAKS PLGS 464.3 2 YLYEIAR Y LY E LA R Y LY E LVK 507.8 2 Q TA LV E LLK Q TA LV E LLK TK A LV E LLK 540.2 2 STLPEIYEK S TLP E LY E K S TLP E EF EK 582.3 2 LVNELTEFAK LV N E LTE FA K LV N E LTVFTK 653.4 2 HLVDEPQNLIK H LV D E P K N LLK H LV Pm P K N LLK 740.4 2 LGEYGFQNALIVR LG E Y G FQ N A LLV R LSV YGFKNALLVR 756.5 2 VPQVSTPTLVEVSR V P Q V S TP TLV E V S R VPKVSTLR A A K VSR 418.7 2 IGDYAGIK LG D Y A G LK 567.2 2 VSEAAIEASTR VSEAALEASTR V S E A A LE GSD R 567.3 2 VSEAAIEASTR VSEAALEASTR VSEA PS E A S TR 618.7 2 DGGEGKEELFR DGGEGKEELFR D G G E G K E E Lm R 809.9 2 VLGIDGGEGKEELFR V LG LD G G E G Q E E LFR V LG LD G G E G Q E E Lm R 484.7 2 EALDFFAR EALDFFAR EALDFmAR 703.8 2 GIDGGEGKEELFR G LD G G E G Q E GAN FR G LD G G E G Q E E LFR 496.7 2 TLPE IYE K DV PELYEK TLP E LY E K 526.2 2 SIVGSYVGNR LS VGSY NR R S LV G S Y V G N R 602.3 1 P E TQ K EP TK K P E TQ K 547.3 3 KVPQVSTPTLVEVSR N M P Q V LG P TLV E V S R VK P K V S TP TLKKA SR 626.3 2 SISIVGSYVGNR LS SLVGSYVGNR S LS LV G S FD GNR 501.3 2 ALKAW SVAR SP KAW SVAR 693.8 2 YICDNQDTISSK Y ES D N Q D TLS S K YLmAPYP TLS S K 461.8 2 AEFVEVTK AEFVE AE K TV F KAK TK 675.8 3 K V P Q V S TP TLV E V S R S LG K Q V P Q V S TP TLV E P G G LA F GK 656.8 2 SIG G EVFIDFTK LS GGEVF DYP TK 681.8 2 G A A G G LG S LA V Q Y A K AG A G G LG S LA V YAG AK TP D LG S SP V YAG AK 693.8 2 A N G TTV LV G M P A G A K ELTTV LV G M P A G A K 693.9 2 A N G TTV LV G M P A G A K SQQ TV LV G M P A G A K 706.3 2 ADTREALDFFAR W TVG EALDFFAR 760.3 2 LGIDGGEGKEELFR LG LD VGS GQEELFR LG LD G EG G AG E YPE R 771.3 3 ATDGGAHGVINVSVSEAAIEASTR KNGDNPVHVM SVSEAA GQG A S TR LE D Y LS D E D V V PCS A LE A S EK 507.2 2 ANELLINVK A GG E LLLN V K 784.4 2 DAFLGSFLYEYSR W FLG SFLDGAGGAN R SV FLG S G S LP FLS TR 820.5 2 KVPQVSTPTLVEVSR Q V P Q V S TP NKA E W R RA PKVSTM LR LLV R 824.8 3 Q N C D Q FE K LG E Y G FQ N A LIV R E TN G FG M K QLSV YGFKNALLVR 841.2 3 LSQKFPKAEFVEVTKLVTDLTK LSQKFP LS K FVEVTKLTV D LTK 515.8 4 Y TR K V P Q V S TP TLV E V S R Y V V P G TA LV TS A A K LV E V S R 571.9 2 K Q TA LV E LLK AGAG TA LV E LLK QKTVGKK LLK 450.5 3 IDGGEGKEELFR SPVSTGKEELFR 582.8 2 ISIVGSYVGNR LS LV G S Y GAAA R SLLV G AANYTR 631.1 4 LSQKFPKAEFVEVTKLVTDLTK LDKALGPVSLTVVGAAAPKG V TD LTK 522.3 2 TV LV G M P A G A K AE LV G M AP GAK 489.9 3 STLPEIYEKM EK RAGN E LY E AG MEK 681.9 2 S LH TLFG D E LC K EA H TLF DG E SE K S LH TLSHAPGKS K 625.0 3 SPIKVVGLSTLPEIYEK TG V LTA G P P D S V V A G M G S EK S LS H N S A TA P E P E LY E K 447.2 2 DIPVPKPK NGG PVP AG PK MP PVP AG PK 596.8 2 LSTLPEIYEK LS TLNPAG YEK 536.3 2 EKDIVGAVLK VGTDLR A V LK 417.2 3 FKDLGEEHFK HGAAGAGAPVNAE K 438.5 4 LSQKFPKAEFVEVTK FSVPGGPAGAGGVVPPG V TK 450.8 2 P TLV E V S R VV LDLVSR 465.8 2 LKAW SVAR LK A D A TG V R 584.4 3 LSQKFPKAEFVEVTK LPPAPKSKATSVLGG V TK 642.4 2 HPEYAVSVLLR LADVHSEVSAQK 700.4 2 TVM ENFVAFVDK E A LA G TR G S TH G DK 747.0 2 FV E V TK LV TD LTK FV TQ G N G LAFPTLK 767.7 3 NYQEAKDAFLGSFLYEYSR SSVDPGPNLAGNA GS G G S G LG S G M V R 434.2 3 TKEKDIVGAVLK E Q LD D G G V TA A P K 483.3 3 FTKEKDIVGAVLK M FNLQ GGGG V AR LK 518.2 2 SDVFNQVVK ASFVAAGS VVK 550.0 4 AM GYRVLGIDGGEGKEELFR TLR H A G G D TD A G G G G S G S G G R P TR NAEGKDKYVQQGW E GAAFAK 582.8 2 ISIVGSYVGNR LAE V TYGANVK Table 2. PEAKS and PLGS results on 61 Micromass QTOF spectra. Red fonts indicate the amino acids are correct. Orange area means the software found length 5 sequence tags and performed better than or equally to the other. The empty lines in PLGS column means PLGS did not output any sequences for those spectra. m/z z correct PEAKS B ioA nalyst 482.7 2 E D LIAY LK E D LLA Y LK E D LLA Y LK 464.2 2 YLYEIAR Y LY E LA R Y LY E LA R 582.3 2 LVNELTEFAK LV N E LTE FA K LV GG ELTEFAK 450.2 2 LcV LH E K LcV LH E K LPESVVGA K 570.7 2 ccTE S LV N R ccTE S LV N R ccTE S LV GG R 512.2 3 LK E ccD K P LLE K LK E ccD K P LLE K LAG E ccD AG P LLE K 722.8 2 YIcDNQ DTISSK Y LcD N Q D TLS S K Y LcD GGGA D TLS S K 740.4 2 LGEYGFQNALIVR LG E Y G FQ N A LLV R LG E Y G F GAGGPS LV R 728.8 2 TGQAPGFSYTDANK TG Q A P G AGASFGPP NK TG GA APGF HLTD A GG K 545.2 3 IFVQKCAQCHTVEK C A Q E LA CAKCHTVEK TS S V TTG G V A G V G G A G VEK 528.9 3 KTGQAPGFSYTDANK TG A G AG APGFSYTDANK G TG A A G APG AYAGPGP A GG K 1005.5 2 G ITW G E E TLM EY LE N P K A V TW G E E TM FLTG G G D N P K LG V S TG E E TM M E TE G TLP K 478.9 3 G E R ED LIA Y LK K KMYVLNHAAFLK Q M A G D P D LLA Y LK Table 4. PEAKS and BioAnalyst results on 13 MDS Sciex/ABI QSTAR spectra. Red fonts indicate the amino acids are correct. Orange area means the software found length 5 sequence tags and performed better than or equally to the other. PEAKS BioAnalyst C orrectam ino acids 117 88 C orrectsequences 8 2 S equences w ith length>4 tags 12 7 Table 3. PEAKS found more correct amino acids and sequences. Experimental Result For the comparison of PEAKS and BioAnalyst, a SCIEX API QSTAR Pulsar was used to measure the MS/MS spectra for BSA_BOVIN and CYC_HORSE. Only the 6 most intense peaks of BSA_BOVIN and 7 most intense peaks of CYC_HORSE were selected for fragmentation. Therefore, only 13 spectra of good quality were collected. There are 150 amino acids in these sequences. Table 3 compares the performance of PEAKS and BioAnalyst. Table 4 lists the results of the two programs on the 13 spectra, where lower case “c” indicates a carboxyamidomethylcysteine. Introduction To identify proteins, a de novo sequencing algorithm computes the peptide sequences from MS/MS data without the need of a protein database. When proteins are heavily modified or from an organism whose genome is not sequenced, de novo sequencing is the only reliable approach to identify the proteins in a sample. De novo sequencing typically requires higher quality data than those required by a database search method. Therefore, a hybrid quadrupole time-of- flight (Q-TOF) instrument is most often used for measuring the MS/MS data. There are three commercial de novo sequencing software packages commonly used for the analysis of Q-TOF MS/MS data: BioAnalyst for the MDS Sciex/ABI QSTAR, PLGS for Micromass/Waters Q-TOFs, and PEAKS 1 for both. In this poster we compare the accuracies of the three packages. Methods MS/MS spectra measured with a Micromass Q-TOF GLOBAL were analyzed by PLGS 2.0. Similarly, MS/MS spectra measured with a SCIEX API QSTAR Pulsar were analyzed by BioAnalyst (Analyst QS 1.11.). PEAKS 2.0 was used to analyse both datasets and the de novo sequencing results of PEAKS were compared with PLGS and BioAnalyst, respectively. In the analyses, each software outputs more than one sequence for each spectrum, but only the sequence with the highest score is used in this comparison. Three criteria were considered to evaluate the accuracy of each software: number of correct amino acids, number of completely correct sequences, number of partially correct sequences with five or more contiguous correct amino acids. Reference: 1. B. Ma, K. Zhang, A. Doherty-Kirby, C. Hendrie, C. Liang, M. Li and G. Lajoie, Rapid Communications in Mass Spectrometry 17(20): 2337-2342. 2003.

Upload: prudence-greer

Post on 02-Jan-2016

212 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A Comprehensive Comparison of the de novo Sequencing Accuracies of PEAKS, BioAnalyst and PLGS Bin Ma 1 ; Amanda Doherty-Kirby 1 ; Aaron Booy 2 ; Bob Olafson

A Comprehensive Comparison of the A Comprehensive Comparison of the de novode novo Sequencing Accuracies of Sequencing Accuracies of PEAKS, BioAnalyst and PLGS PEAKS, BioAnalyst and PLGS Bin Ma1; Amanda Doherty-Kirby1; Aaron Booy2; Bob Olafson2; Gilles Lajoie1

1University of Western Ontario, London, ON, Canada2UVic Genome BC Proteomics Centre, Victoria, BC, Canada

A Comprehensive Comparison of the A Comprehensive Comparison of the de novode novo Sequencing Accuracies of Sequencing Accuracies of PEAKS, BioAnalyst and PLGS PEAKS, BioAnalyst and PLGS Bin Ma1; Amanda Doherty-Kirby1; Aaron Booy2; Bob Olafson2; Gilles Lajoie1

1University of Western Ontario, London, ON, Canada2UVic Genome BC Proteomics Centre, Victoria, BC, Canada

Overview

We compared three commonly used de novo sequencing programs, PEAKS, BioAnalyst and PLGS. The result showed that PEAKS has the best accuracy.

Overview

We compared three commonly used de novo sequencing programs, PEAKS, BioAnalyst and PLGS. The result showed that PEAKS has the best accuracy.

Experimental Result

Q-TOF GLOBAL was used to measure the MS/MS spectra for BSA_BOVIN and ADH_YEAST for the comparison of PEAKS and PLGS. . A low filter (i.e. 10 cts/sec above background for the precursor ions) was used in the data collection and therefore a large number of spectra (265) were collected as the raw MS/MS data set. We then manually extracted all the spectra that have at least three strong y-ion matches with some peptides of the two proteins. The other spectra were discarded because they generally were of poor quality and we were not able to determine their peptides even knowing the protein sequences. Sixty-one spectra remained after this selection, and there are in total 764 amino acids in their sequences. Then both PLGS 2.0 and PEAKS were employed to compute the sequences de novo. Table 1 compares the performance of PEAKS and PLGS.

 

It is worth noting that because of our selection criteria, many of the 61 spectra are of lower quality than needed by de novo sequencing. The numbers shown in Table 1 are valid for the comparison of the two programs. But the low success rate cannot be interpreted as the low quality of either software. It is also interesting that PEAKS and PLGS are complementary to each other, reflecting different methods employed in the two programs. Table 2 shows the de novo sequencing results of both programs. The sequences are in general sorted by the spectrum quality. We regard an amino acid computed by the software is correct if the mass is approximately equal to the mass of the amino acid at the corresponding position of the correct sequence. For example, a letter Q is regarded as correct if it corresponds to a letter K in the correct sequence.

Experimental Result

Q-TOF GLOBAL was used to measure the MS/MS spectra for BSA_BOVIN and ADH_YEAST for the comparison of PEAKS and PLGS. . A low filter (i.e. 10 cts/sec above background for the precursor ions) was used in the data collection and therefore a large number of spectra (265) were collected as the raw MS/MS data set. We then manually extracted all the spectra that have at least three strong y-ion matches with some peptides of the two proteins. The other spectra were discarded because they generally were of poor quality and we were not able to determine their peptides even knowing the protein sequences. Sixty-one spectra remained after this selection, and there are in total 764 amino acids in their sequences. Then both PLGS 2.0 and PEAKS were employed to compute the sequences de novo. Table 1 compares the performance of PEAKS and PLGS.

 

It is worth noting that because of our selection criteria, many of the 61 spectra are of lower quality than needed by de novo sequencing. The numbers shown in Table 1 are valid for the comparison of the two programs. But the low success rate cannot be interpreted as the low quality of either software. It is also interesting that PEAKS and PLGS are complementary to each other, reflecting different methods employed in the two programs. Table 2 shows the de novo sequencing results of both programs. The sequences are in general sorted by the spectrum quality. We regard an amino acid computed by the software is correct if the mass is approximately equal to the mass of the amino acid at the corresponding position of the correct sequence. For example, a letter Q is regarded as correct if it corresponds to a letter K in the correct sequence.

PEAKS PLGSCorrect amino acids 456 232Correct sequences 13 7Sequences with length>4 tags 45 28

Table 1. PEAKS found more correct amino acids and sequences.

m/z z correct PEAKS PLGS464.3 2 YLYEIAR YLYELAR YLYELVK507.8 2 QTALVELLK QTALVELLK TKALVELLK540.2 2 STLPEIYEK STLPELYEK STLPEEFEK582.3 2 LVNELTEFAK LVNELTEFAK LVNELTVFTK653.4 2 HLVDEPQNLIK HLVDEPKNLLK HLVPmPKNLLK740.4 2 LGEYGFQNALIVR LGEYGFQNALLVR LSVYGFKNALLVR756.5 2 VPQVSTPTLVEVSR VPQVSTPTLVEVSR VPKVSTLRAAKVSR418.7 2 IGDYAGIK LGDYAGLK567.2 2 VSEAAIEASTR VSEAALEASTR VSEAALEGSDR567.3 2 VSEAAIEASTR VSEAALEASTR VSEAPSEASTR618.7 2 DGGEGKEELFR DGGEGKEELFR DGGEGKEELmR809.9 2 VLGIDGGEGKEELFR VLGLDGGEGQEELFR VLGLDGGEGQEELmR484.7 2 EALDFFAR EALDFFAR EALDFmAR703.8 2 GIDGGEGKEELFR GLDGGEGQEGANFR GLDGGEGQEELFR496.7 2 TLPEIYEK DVPELYEK TLPELYEK526.2 2 SIVGSYVGNR LSVGSYNRR SLVGSYVGNR602.3 1 PETQK EPTKK PETQK547.3 3 KVPQVSTPTLVEVSR NMPQVLGPTLVEVSR VKPKVSTPTLKKASR626.3 2 SISIVGSYVGNR LSSLVGSYVGNR SLSLVGSFDGNR501.3 2 ALKAWSVAR SPKAWSVAR693.8 2 YICDNQDTISSK YESDNQDTLSSK YLmAPYPTLSSK461.8 2 AEFVEVTK AEFVEAEK TVFKAKTK675.8 3 KVPQVSTPTLVEVSRSLGK QVPQVSTPTLVEPGGLAFGK656.8 2 SIGGEVFIDFTK LSGGEVFDYPTK681.8 2 GAAGGLGSLAVQYAK AGAGGLGSLAVYAGAK TPDLGSSPVYAGAK693.8 2 ANGTTVLVGMPAGAK ELTTVLVGMPAGAK693.9 2 ANGTTVLVGMPAGAK SQQTVLVGMPAGAK706.3 2 ADTREALDFFAR WTVGEALDFFAR760.3 2 LGIDGGEGKEELFR LGLDVGSGQEELFR LGLDGEGGAGEYPER771.3 3 ATDGGAHGVINVSVSEAAIEASTR KNGDNPVHVMSVSEAAGQGASTR LEDYLSDEDVVPCSALEASEK507.2 2 ANELLINVK AGGELLLNVK784.4 2 DAFLGSFLYEYSR WFLGSFLDGAGGANR SVFLGSGSLPFLSTR820.5 2 KVPQVSTPTLVEVSR QVPQVSTPNKAEWR RAPKVSTMLRLLVR824.8 3 QNCDQFEKLGEYGFQNALIVR ETNGFGMKQLSVYGFKNALLVR841.2 3 LSQKFPKAEFVEVTKLVTDLTK LSQKFPLSKFVEVTKLTVDLTK515.8 4 YTRKVPQVSTPTLVEVSR YVVPGTALVTSAAKLVEVSR571.9 2 KQTALVELLK AGAGTALVELLK QKTVGKKLLK450.5 3 IDGGEGKEELFR SPVSTGKEELFR582.8 2 ISIVGSYVGNR LSLVGSYGAAAR SLLVGAANYTR631.1 4 LSQKFPKAEFVEVTKLVTDLTK LDKALGPVSLTVVGAAAPKGVTDLTK522.3 2 TVLVGMPAGAK AELVGMAPGAK489.9 3 STLPEIYEKMEK RAGNELYEAGMEK681.9 2 SLHTLFGDELCK EAHTLFDGESEK SLHTLSHAPGKSK625.0 3 SPIKVVGLSTLPEIYEK TGVLTAGPPDSVVAGMGSEK SLSHNSATAPEPELYEK447.2 2 DIPVPKPK NGGPVPAGPK MPPVPAGPK596.8 2 LSTLPEIYEK LSTLNPAGYEK536.3 2 EKDIVGAVLK VGTDLRAVLK417.2 3 FKDLGEEHFK HGAAGAGAPVNAEK438.5 4 LSQKFPKAEFVEVTK FSVPGGPAGAGGVVPPGVTK450.8 2 PTLVEVSR VVLDLVSR465.8 2 LKAWSVAR LKADATGVR584.4 3 LSQKFPKAEFVEVTK LPPAPKSKATSVLGGVTK642.4 2 HPEYAVSVLLR LADVHSEVSAQK700.4 2 TVMENFVAFVDK EALAGTRGSTHGDK747.0 2 FVEVTKLVTDLTK FVTQGNGLAFPTLK767.7 3 NYQEAKDAFLGSFLYEYSR SSVDPGPNLAGNAGSGGSGLGSGMVR434.2 3 TKEKDIVGAVLK EQLDDGGVTAAPK483.3 3 FTKEKDIVGAVLK MFNLQGGGGVARLK518.2 2 SDVFNQVVK ASFVAAGSVVK550.0 4 AMGYRVLGIDGGEGKEELFR TLRHAGGDTDAGGGGSGSGGRPTR NAEGKDKYVQQGW EGAAFAK582.8 2 ISIVGSYVGNR LAEVTYGANVK

Table 2. PEAKS and PLGS results on 61 Micromass QTOF spectra. Red fonts indicate the amino acids are correct. Orange area means the software found length 5 sequence tags and performed better than or equally to the other. The empty lines in PLGS column means PLGS did not output any sequences for those spectra.

m/z z correct PEAKS BioAnalyst482.7 2 EDLIAYLK EDLLAYLK EDLLAYLK464.2 2 YLYEIAR YLYELAR YLYELAR582.3 2 LVNELTEFAK LVNELTEFAK LVGGELTEFAK450.2 2 LcVLHEK LcVLHEK LPESVVGAK570.7 2 ccTESLVNR ccTESLVNR ccTESLVGGR512.2 3 LKEccDKPLLEK LKEccDKPLLEK LAGEccDAGPLLEK722.8 2 YIcDNQDTISSK YLcDNQDTLSSK YLcDGGGADTLSSK740.4 2 LGEYGFQNALIVR LGEYGFQNALLVR LGEYGFGAGGPSLVR728.8 2 TGQAPGFSYTDANK TGQAPGAGASFGPPNK TGGAAPGFHLTDAGGK545.2 3 IFVQKCAQCHTVEK CAQELACAKCHTVEK TSSVTTGGVAGVGGAGVEK528.9 3 KTGQAPGFSYTDANK TGAGAGAPGFSYTDANK GTGAAGAPGAYAGPGPAGGK

1005.5 2 GITWGEETLMEYLENPK AVTWGEETMFLTGGGDNPK LGVSTGEETMMETEGTLPK478.9 3 GEREDLIAYLKK KMYVLNHAAFLK QMAGDPDLLAYLK

Table 4. PEAKS and BioAnalyst results on 13 MDS Sciex/ABI QSTAR spectra. Red fonts indicate the amino acids are correct. Orange area means the software found length 5 sequence tags and performed better than or equally to the other.

PEAKS BioAnalystCorrect amino acids 117 88Correct sequences 8 2Sequences with length>4 tags 12 7

Table 3. PEAKS found more correct amino acids and sequences.

Experimental Result

For the comparison of PEAKS and BioAnalyst, a SCIEX API QSTAR Pulsar was used to measure the MS/MS spectra for BSA_BOVIN and CYC_HORSE. Only the 6 most intense peaks of BSA_BOVIN and 7 most intense peaks of CYC_HORSE were selected for fragmentation. Therefore, only 13 spectra of good quality were collected. There are 150 amino acids in these sequences. Table 3 compares the performance of PEAKS and BioAnalyst. Table 4 lists the results of the two programs on the 13 spectra, where lower case “c” indicates a carboxyamidomethylcysteine.

Experimental Result

For the comparison of PEAKS and BioAnalyst, a SCIEX API QSTAR Pulsar was used to measure the MS/MS spectra for BSA_BOVIN and CYC_HORSE. Only the 6 most intense peaks of BSA_BOVIN and 7 most intense peaks of CYC_HORSE were selected for fragmentation. Therefore, only 13 spectra of good quality were collected. There are 150 amino acids in these sequences. Table 3 compares the performance of PEAKS and BioAnalyst. Table 4 lists the results of the two programs on the 13 spectra, where lower case “c” indicates a carboxyamidomethylcysteine.

Introduction

To identify proteins, a de novo sequencing algorithm computes the peptide sequences from MS/MS data without the need of a protein database. When proteins are heavily modified or from an organism whose genome is not sequenced, de novo sequencing is the only reliable approach to identify the proteins in a sample. De novo sequencing typically requires higher quality data than those required by a database search method. Therefore, a hybrid quadrupole time-of-flight (Q-TOF) instrument is most often used for measuring the MS/MS data. There are three commercial de novo sequencing software packages commonly used for the analysis of Q-TOF MS/MS data: BioAnalyst for the MDS Sciex/ABI QSTAR, PLGS for Micromass/Waters Q-TOFs, and PEAKS 1 for both. In this poster we compare the accuracies of the three packages.

Introduction

To identify proteins, a de novo sequencing algorithm computes the peptide sequences from MS/MS data without the need of a protein database. When proteins are heavily modified or from an organism whose genome is not sequenced, de novo sequencing is the only reliable approach to identify the proteins in a sample. De novo sequencing typically requires higher quality data than those required by a database search method. Therefore, a hybrid quadrupole time-of-flight (Q-TOF) instrument is most often used for measuring the MS/MS data. There are three commercial de novo sequencing software packages commonly used for the analysis of Q-TOF MS/MS data: BioAnalyst for the MDS Sciex/ABI QSTAR, PLGS for Micromass/Waters Q-TOFs, and PEAKS 1 for both. In this poster we compare the accuracies of the three packages.

Methods

MS/MS spectra measured with a Micromass Q-TOF GLOBAL were analyzed by PLGS 2.0. Similarly, MS/MS spectra measured with a SCIEX API QSTAR Pulsar were analyzed by BioAnalyst (Analyst QS 1.11.). PEAKS 2.0 was used to analyse both datasets and the de novo sequencing results of PEAKS were compared with PLGS and BioAnalyst, respectively. In the analyses, each software outputs more than one sequence for each spectrum, but only the sequence with the highest score is used in this comparison. Three criteria were considered to evaluate the accuracy of each software:

• number of correct amino acids,• number of completely correct sequences, • number of partially correct sequences with five or more contiguous correct amino acids.

Methods

MS/MS spectra measured with a Micromass Q-TOF GLOBAL were analyzed by PLGS 2.0. Similarly, MS/MS spectra measured with a SCIEX API QSTAR Pulsar were analyzed by BioAnalyst (Analyst QS 1.11.). PEAKS 2.0 was used to analyse both datasets and the de novo sequencing results of PEAKS were compared with PLGS and BioAnalyst, respectively. In the analyses, each software outputs more than one sequence for each spectrum, but only the sequence with the highest score is used in this comparison. Three criteria were considered to evaluate the accuracy of each software:

• number of correct amino acids,• number of completely correct sequences, • number of partially correct sequences with five or more contiguous correct amino acids.

Reference:1. B. Ma, K. Zhang, A. Doherty-Kirby, C. Hendrie, C. Liang, M. Li and G. Lajoie, Rapid Communications in Mass Spectrometry 17(20): 2337-2342. 2003.