Shotgun sequencing: 1st through 10th MS2 scans, relative intensity, fill times, scan times

Upload: timothy-hayden

Post on 15-Dec-2015

TRANSCRIPT

  • Slide 1
  • Shotgun sequencing. Figure: relative intensity, with MS2 scans labeled 1st through 10th and annotated with fill times and scan times.
  • Slide 2
  • Spectral matching: MS/MS spectrum vs. protein database
  • Slide 3
  • Shotgun sequencing (figure; axis: time)
  • Slide 4
  • Shotgun sequencing (figure; MS1 and MS2 scans along the time axis)
  • Slide 5
  • Distributed spectral matching. LTQ Orbitrap base peak chromatogram: 37 min LC-MS/MS run time; 6186 MS/MS spectra; 2308 peptide IDs (false-positive rate 1%); 287 protein IDs. Search cost: 6000 spectra x 10 s/spectrum = ~16 CPU hours. Server, single CPU: search time 16 hours. Server, 20 nodes (parallel CPUs): 0.8 hours.
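The search-time figures on this slide follow from simple arithmetic. A minimal sketch, assuming the spectra divide evenly across nodes with no scheduling or I/O overhead (the function name is illustrative, not from the slides):

```python
# Back-of-the-envelope search time, using the rounded figures from the slide.
SPECTRA = 6000             # MS/MS spectra (slide rounds 6186 down to 6000)
SECONDS_PER_SPECTRUM = 10  # search cost per spectrum
NODES = 20                 # parallel CPUs

def search_hours(spectra, seconds_per_spectrum, nodes=1):
    """Wall-clock hours, assuming a perfectly even split across nodes."""
    return spectra * seconds_per_spectrum / 3600 / nodes

single = search_hours(SPECTRA, SECONDS_PER_SPECTRUM)
cluster = search_hours(SPECTRA, SECONDS_PER_SPECTRUM, NODES)
print(f"single CPU: {single:.1f} h; {NODES} nodes: {cluster:.2f} h")
```

The slide's 16 hours and 0.8 hours are these values rounded down.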
  • Slide 6
  • SEQUEST scores. XCorr: goodness of fit between the experimental spectrum and the theoretical b and y ions of peptides in the database. dCn: fractional difference between the highest XCorr and the next-highest XCorr. Yates J.R. 3rd et al., J Am Soc Mass Spectrom 5:976-89 (1994)
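The dCn definition above can be written out directly. A minimal sketch, assuming a list of XCorr values for all candidates matched to one spectrum (the function name and the example scores are hypothetical):

```python
def delta_cn(xcorrs):
    """Fractional XCorr difference between the best and second-best candidate."""
    if len(xcorrs) < 2:
        raise ValueError("need at least two candidate scores")
    ranked = sorted(xcorrs, reverse=True)
    top, second = ranked[0], ranked[1]
    if top <= 0:
        raise ValueError("top XCorr must be positive")
    return (top - second) / top

# Hypothetical scores for three candidate peptides of one spectrum.
print(delta_cn([3.0, 4.0, 2.0]))
```

A dCn near 0 means the runner-up fits almost as well as the top hit; a dCn near 1 means the top hit stands alone.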
  • Slide 7
  • SEQUEST input: all MS2 in an LC run, 5000 - 25000 MS2 spectra (figure: MS1 and MS2 scans along the time axis)
  • Slide 8
  • SEQUEST file formats for all MS2 in an LC run. raw: all MS2 = 1 file. dta: 1 MS2 = 1 file (all MS2 = ~10000 files). 1.dta: 501.000 (precursor m/z), +2 (charge state), MS2 array. 2.dta: 1001.500 (precursor m/z), +3 (charge state), MS2 array.
  • Slide 9
  • SEQUEST candidate search against the human IPI database (61236 proteins). For one dta (e.g., precursor 1000.000 +/- 1 Da or 3000.000 +/- 1 Da): take a peptide (MSQVQVQVQNPSAALSGSQILNK), calculate the peptide mass (2426.258812), compare with the precursor; if not a candidate, digest to the next peptide; if a candidate, calculate the theoretical spectrum, correlate, score & return. This is repeated 3,250,000 times per dta: 2 x 3,250,000 times for dta 1, 2; 3 x 3,250,000 times for 3 dta files; 10000 x 3,250,000 times for all MS2 in the LC run (10000 dta files).
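The digest, mass and compare loop on this slide can be sketched as follows. This is an illustration under stated assumptions, not SEQUEST's implementation: cleavage is after every K or R with no missed cleavages or proline rule, masses are standard monoisotopic residue masses, the precursor value is treated as a neutral peptide mass, and all function names are hypothetical.

```python
# Standard monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565

def peptide_mass(seq):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER

def tryptic_peptides(protein):
    """Cleave after every K or R (simplified trypsin rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        if aa in "KR":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides

def candidates(proteins, precursor_mass, tol=1.0):
    """Peptides whose mass lies within +/- tol Da of the precursor mass."""
    return [pep
            for protein in proteins
            for pep in tryptic_peptides(protein)
            if abs(peptide_mass(pep) - precursor_mass) <= tol]

protein = "MSQVQVQVQNPSAALSGSQILNKNQSLLSQ"   # start of the first IPI entry
print(round(peptide_mass("MSQVQVQVQNPSAALSGSQILNK"), 4))
print(candidates([protein], 1000.0))   # slide example: not a candidate
```

Only peptides that pass the mass window go on to the expensive steps (theoretical spectrum, correlation, scoring), which is why the loop count per spectrum matters so much.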
  • Slide 10
  • Theoretical candidate spectrum vs. experimental peptide spectrum; correlation spectrum (figure). Yates J.R. 3rd et al., J Am Soc Mass Spectrom 5:976-89 (1994)
  • Slide 11
  • Correlation spectrum (figure). Yates J.R. 3rd et al., J Am Soc Mass Spectrom 5:976-89 (1994)
  • Slide 12
  • Correlation spectrum (figure). Yates J.R. 3rd et al., J Am Soc Mass Spectrom 5:976-89 (1994)
  • Slide 13
  • Similarity scoring: correlation spectrum and XCorr score (figure). Yates J.R. 3rd et al., J Am Soc Mass Spectrom 5:976-89 (1994)
  • Slide 14
  • Similarity scoring: cross-correlation vs. dot product. Figure panels: XCorr (cross-correlation); dot product.
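To make the contrast concrete, here is a simplified sketch of both scores on binned intensity vectors. It assumes equal-length vectors, no spectrum preprocessing, and a background averaged over offsets -75 to +75 excluding 0. Real implementations differ in these details, and the function names are illustrative.

```python
def correlation(x, y, tau):
    """R(tau) = sum_i x[i] * y[i + tau]; bins outside the vector count as zero."""
    n = len(y)
    return sum(xi * y[i + tau] for i, xi in enumerate(x) if 0 <= i + tau < n)

def dot_product(x, y):
    """Plain similarity: the correlation at zero offset."""
    return correlation(x, y, 0)

def xcorr(x, y, max_offset=75):
    """Zero-offset correlation minus the mean correlation at nonzero offsets."""
    offsets = [t for t in range(-max_offset, max_offset + 1) if t != 0]
    background = sum(correlation(x, y, t) for t in offsets) / len(offsets)
    return correlation(x, y, 0) - background

sparse = [0, 1, 0, 0, 2, 0]   # a few distinct peaks
dense = [1] * 10              # signal in every bin (noise-like)
print(dot_product(sparse, sparse), xcorr(sparse, sparse, max_offset=2))
print(dot_product(dense, dense), xcorr(dense, dense, max_offset=2))
```

The dense pair gets the higher dot product simply because every bin overlaps, while the cross-correlation score subtracts that offset-independent background and favors the sparse pair, whose agreement exists only at zero offset.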
  • Slide 15
  • Non-indexed searching. Human IPI database, 61236 proteins, read from the 1st entry (>ipi00000001.2 MSQVQVQVQNPSAALSGSQILNKNQSLLSQ PLMSIPSTTSSLPSENAGRPIQNSALPSASITST SAAAESITPTVELNAL.) to the 61236th (>ipi00853644.1 .AKPNINLITGHLEEPMPNPIDEMTEEQKEY EAMKLVNMLDKLSREELLKPMGLKPDGTIT) for a precursor of 1200 +/- 1 Da.
  • Slide 16
  • Indexed searching. Human IPI database, 61236 proteins, indexed from >ipi00001234.11 G (75 Da) to >ipi00853644.1 AKPNINLITGHLEEPMPNPIDEMTEEQEYEA MLVNMLDLSEELLKPMGLKPDGTITAKPNINL ITGHLEEPMPNPIDEMTEEQEYEAMLVNML DLSEELLKPMGLKPDGTIT (20245 Da); a precursor of 1200 +/- 1 Da points to >ipi00344567.1 WEFGGHTVLR.
  • Slide 17
  • Scoring & analysis. Figure: frequency vs. score/criterion, with a cutoff/threshold and regions labeled TP, TN, FN, FP.
      Peptide      Score/Metric 1   Score/Metric 2   Score/Metric 3
      Peptide A    7.65             0.99             97
      Peptide B    6.99             0.87             97
      Peptide C    6.21             0.65             97
      Peptide D    5.57             0.71             96
      Peptide E    3.31             0.44             50
      Peptide F    1.85             0.41             41
      sensitivity = TP / (TP + FN)
      precision = TP / (TP + FP)
      specificity = TN / (TN + FP)
      accuracy = (TP + TN) / (TP + TN + FN + FP)
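The four definitions on this slide translate directly into code. A minimal sketch; the counts in the example are made up for illustration and do not come from the slides.

```python
def classification_metrics(tp, tn, fp, fn):
    """Sensitivity, precision, specificity and accuracy from confusion counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fn + fp),
    }

# Hypothetical counts at one cutoff/threshold.
for name, value in classification_metrics(tp=90, tn=80, fp=10, fn=20).items():
    print(f"{name}: {value:.3f}")
```

Raising the cutoff typically trades sensitivity (more FN) for precision (fewer FP), which is the trade-off the figure illustrates.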
  • Slide 18
  • The Results: Distinguishing Right from Wrong. In large proteomics data sets (for which manual data inspection is impossible), how can we distinguish between correct and incorrect peptide assignments? (1) Use decoy sequences to distract non-peptidic, non-uniquely matchable, or otherwise unmatchable spectra into a search space that is known a priori to be incorrect. (2) Use the frequency of decoy sequences among total sequences to estimate the overall frequency of wrong answers (False Positive Rate). (3) Adjust filtering criteria to achieve a ~1% False Positive Rate.
  • Slide 19
  • Decoy Sequences? A Reversed Database! We generate decoy sequences by reversing each protein sequence in a given database, such that the resultant in silico digest contains nonsense peptides, then append the reversed database to the end of the forward database. Decoy references are labeled with #. Database searching with SEQUEST occurs from top to bottom. A wrong match is equally likely to land on a decoy or a non-decoy sequence, so for every decoy match found we expect one wrong forward match as well. So our FPR is (# of decoys) x 2 / total matches.
  • Slide 20
  • Target/Decoy Database Searching. Forward database: 1. MAGFA SHTRP. Reversed database: 1. PRTHS AFGAM. Composite database (F + R) -> SEQUEST -> right matches (F, 100%) and wrong (random) matches (F and R, 50% each) -> filter (scoring, mass accuracy, etc.) -> generate final list -> estimate FP rate from 2 x Rev (i.e., 4%). Reversed hits are known FP; wrong forward hits are unknown FP.
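The reversed-database procedure and the 2 x Rev estimate from these two slides can be sketched as follows. The dictionary layout, function names and entry names are assumptions for illustration; only the "#" decoy label and the formula come from the slides.

```python
def build_composite(forward):
    """Append a reversed, '#'-labeled copy of every protein after the forward set."""
    composite = dict(forward)                 # forward entries first
    for name, seq in forward.items():
        composite["#" + name] = seq[::-1]     # decoy = reversed sequence
    return composite

def estimate_fpr(matched_refs):
    """FPR = 2 x (number of decoy matches) / (total matches)."""
    decoys = sum(1 for ref in matched_refs if ref.startswith("#"))
    return 2 * decoys / len(matched_refs)

db = build_composite({"prot1": "MAGFASHTRP"})
print(db)

# Hypothetical filtered list: 98 forward matches and 2 decoy matches.
matches = ["prot1"] * 98 + ["#prot1"] * 2
print(f"estimated FPR: {estimate_fpr(matches):.0%}")
```

With 2 decoys in 100 matches, the estimate is 4%: the 2 known decoy hits plus an expected 2 unknown wrong forward hits.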
  • Slide 21
  • SEQUEST scores: finding true positives. Figure panels: dCn vs. XCorr for forward sequences; dCn vs. XCorr for forward + reverse; TP and FP PSM number vs. XCorr.
  • Slide 22
  • High Mass Accuracy. Mass accuracy in proteomics: performance is related to the width of the distribution, not the average error. Precision of mass errors between observed and actual m/z, LTQ Orbitrap & LTQ FT: 0.1 ± 0.4 ppm; LTQ FT (SIM), AGC target 50,000 to avoid space-charge effects; Olsen et al. (2004) Mol. Cell. Proteomics 3, 608. -0.2 ± 1.0 ppm; Haas et al. (2006) Mol. Cell. Proteomics 5, 1326.
  • Slide 23
  • MMA: True Positives and False Positives. Figure: TP and FP PSM number vs. MMA (mass measurement accuracy), centered at 0. False positives are distributed evenly across MMA space.
  • Slide 24
  • MS/MS vs MMA: Precision vs Sensitivity. MS/MS criteria are strong precision filters; they require TP / FP separation for sensitivity. MMA criteria are weak precision filters; they assist MS/MS criteria in improving sensitivity. (Figure axes: MMA, centered at 0.)
  • Slide 25
  • Distracting Wrong from Right: MMA. Figure: true positives and false positives vs. MMA (centered at 0), first over the extended search space, then with the search space filtered.
  • Slide 26
  • Mass Accuracy: Another dimension of selectivity. Figure panels (dCn vs. XCorr): forward sequences; forward + reverse; tryptic search +/- 2 Da; tryptic search +/- 2 Da with 5 ppm filter.
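The idea of searching with a wide +/- 2 Da window and then filtering at 5 ppm can be sketched as a post-search filter. The function names and example m/z values are hypothetical.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

def passes_mma_filter(observed_mz, theoretical_mz, max_ppm=5.0):
    """Keep a match only if its absolute mass error is within max_ppm."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= max_ppm

# Both hypothetical matches fall inside a +/- 2 Da search window,
# but only the first survives the 5 ppm filter.
for observed in (500.001, 500.5):
    err = ppm_error(observed, 500.0)
    print(f"{observed}: {err:.1f} ppm, keep={passes_mma_filter(observed, 500.0)}")
```

Because false positives are spread evenly across the wide window while true positives cluster near 0 ppm, the narrow filter removes mostly wrong matches.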
  • Slide 27
  • Distracting Wrong from Right: Trypticity. Partial enzyme search: candidates need K/R at only one terminus (K/R-Peptide or PeptideK/R-), with any residue (A-G-C-S-T-I-L-F-P-M-V-H-D-E-Y-W-Q-N-) allowed at the other. Filtered tryptic search: K/R-PeptideK/R-. Figure: true positives and false positives under each.
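A trypticity filter of this kind can be sketched by counting how many peptide termini are consistent with trypsin cleavage. This is an illustration under assumptions: "-" marks a protein terminus, the proline rule is ignored, and the function names are hypothetical.

```python
def tryptic_termini(prev_residue, peptide, next_residue):
    """Count termini (0, 1 or 2) consistent with cleavage after K or R.

    prev_residue / next_residue are the flanking residues in the protein;
    '-' marks a protein N- or C-terminus (an assumption of this sketch).
    """
    n_term_ok = prev_residue in ("K", "R", "-")
    c_term_ok = peptide[-1] in ("K", "R") or next_residue == "-"
    return int(n_term_ok) + int(c_term_ok)

def is_fully_tryptic(prev_residue, peptide, next_residue):
    return tryptic_termini(prev_residue, peptide, next_residue) == 2

print(tryptic_termini("K", "WEFGGHTVLR", "A"))  # both termini tryptic
print(tryptic_termini("A", "WEFGGHTVLR", "A"))  # only the C-terminus
print(tryptic_termini("A", "WEFGGHTVL", "R"))   # neither terminus
```

Searching with the partial enzyme rule and then keeping only fully tryptic matches treats the non-tryptic candidates as an extra place for wrong matches to land.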
  • Slide 28
  • What do we have here, hm? Figure: dCn (axis 0 to 1) vs. XCorr (axis 0 to 8) for phosphorylated peptides, unphosphorylated peptides and reversed hits; n = 286.
  • Slide 29
  • Phosphopeptides: Chemically disadvantaged. Dataset of phosphorylated and unphosphorylated peptide MS/MS pairs (example: MSFEILR with a phosphate, P); n = 286: singly phosphorylated (n=207), doubly phosphorylated (n=79). Figure panels: dCn (phosphorylated) vs. dCn (unphosphorylated); XCorr (phosphorylated) vs. XCorr (unphosphorylated), axes 0 to 8.
  • Slide 30
  • Phosphopeptides: Less power in XCorr & dCn. Figure: XCorr (Ph/UnPh) and dCn (Ph/UnPh) ratios, axis 0 to 2, for singly and doubly phosphorylated peptides relative to unphosphorylated; labeled values: 86%, 93%.
  • Slide 31
  • Mass Accuracy: Can it help for phosphorylation? Workflow: yeast whole-cell lysate -> reduction, alkylation -> SDS-PAGE (60-80 kDa) -> trypsin -> IMAC purification.
  • Slide 32
  • Mass Accuracy: Rescuing phosphopeptides. SEQUEST partial enzyme search, fully tryptic peptide spectral matches. Figure panels: XCorr for LTQ TOP10 and XCorr vs. MMA (ppm, axis -50 to 50) for Orbitrap TOP10; n=1390 and n=1311; XCorr thresholds shown: +2: 1.3, +3: 2.3 and +2: 2.7, +3: 3.5.
  • Slide 33
  • Mission: Phosphopeptide rescue accomplished! Figure: # of phosphopeptides, LTQ vs. Orbitrap, No MMA vs. MMA. Values shown: 600 (1.0% FP), 715 (1.0% FP), 1046 (0.4% FP); 74% increase (600 to 1046).
  • Slide 34
  • Search algorithms & phosphorylation: SEQUEST vs. OMSSA (figure values: 936, 928, 98). Bakalarski et al., Anal. Bioanal. Chem., 2007
  • Slide 35
  • Phosphorylation site localization: GFDSNQpTWR or GFDpSNQTWR? Beausoleil et al., Nat. Biotechnol., 2006
  • Slide 36
  • Phosphorylation site localization (figure). Beausoleil et al., Nat. Biotechnol., 2006
  • Slide 37
  • Phosphorylation site localization (figure). Taus et al., JPR, 2011
  • Slide 38
  • Phosphorylation false localization rate (FLR). Chalkley & Clauser, MCP, 2012; Baker et al., MCP, 2011. Use non-native phosphoacceptors as decoys: Ser + Thr (human proteome): 14.1%; Pro + Glu (human proteome): 14.5%. Allow the search engine / localization assessment tools to consider pP and pE as true negative decoys. Calculate the dataset FLR based on the frequency of pP + pE decoys.