Statistical Significance for Peptide Identification by Tandem Mass Spectrometry
Nathan Edwards
Center for Bioinformatics and Computational Biology
University of Maryland, College Park
Mass Spectrometry for Proteomics
• Measure mass of many (bio)molecules simultaneously
  • High bandwidth
• Mass is an intrinsic property of all (bio)molecules
  • No prior knowledge required
Mass Spectrometry for Proteomics
• Measure mass of many molecules simultaneously
  • ...but not too many, abundance bias
• Mass is an intrinsic property of all (bio)molecules
  • ...but need a reference to compare to
High Bandwidth
[Figure: mass spectrum, % Intensity vs. m/z (250 to 1000)]
Mass is fundamental!
Mass Spectrometry for Proteomics
• Mass spectrometry has been around since the turn of the 20th century...
  • ...why is MS-based proteomics so new?
• Ionization methods
  • MALDI, Electrospray
• Protein chemistry & automation
  • Chromatography, Gels, Computers
• Protein sequence databases
  • A reference for comparison
Sample Preparation for Peptide Identification
Enzymatic Digest and Fractionation
Single Stage MS
[Figure: single-stage MS, spectrum over m/z]
Tandem Mass Spectrometry (MS/MS)
Precursor selection
[Figure: two spectra over m/z]
Tandem Mass Spectrometry (MS/MS)
Precursor selection + collision-induced dissociation (CID)
[Figure: MS/MS, two spectra over m/z]
Peptide Fragmentation
H…-HN-CH(R_{i-1})-CO-NH-CH(R_i)-CO-NH-CH(R_{i+1})-CO-…OH
(N-terminus on the left, C-terminus on the right; R_{i-1}, R_i, R_{i+1} are the side chains of AA residues i-1, i, i+1)
Peptides consist of amino-acids arranged in a linear backbone.
Peptide Fragmentation
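[Figure: backbone -HN-CH-CO-NH-CH-CO-NH- with side chains at residues i and i+1; cleavage at the peptide bonds yields N-terminal fragments b_i and b_{i+1} and the complementary C-terminal fragments y_{n-i} and y_{n-i-1}]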
Peptide Fragmentation
Peptide: S-G-F-L-E-E-D-E-L-K
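| MW | ion | N-terminal fragment | C-terminal fragment | ion | MW |
|---|---|---|---|---|---|
| 88 | b1 | S | GFLEEDELK | y9 | 1080 |
| 145 | b2 | SG | FLEEDELK | y8 | 1022 |
| 292 | b3 | SGF | LEEDELK | y7 | 875 |
| 405 | b4 | SGFL | EEDELK | y6 | 762 |
| 534 | b5 | SGFLE | EDELK | y5 | 633 |
| 663 | b6 | SGFLEE | DELK | y4 | 504 |
| 778 | b7 | SGFLEED | ELK | y3 | 389 |
| 907 | b8 | SGFLEEDE | LK | y2 | 260 |
| 1020 | b9 | SGFLEEDEL | K | y1 | 147 |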
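The ladder above can be recomputed from residue masses. The sketch below is an illustration, not the deck's own code: it assumes singly charged ions, monoisotopic residue masses copied from the Amino-Acid Molecular Weights table in this deck, and standard values for the proton and water masses. The slide appears to list nominal integer masses, so its numbers can differ from these by up to about 1 Da.

```python
# Monoisotopic residue masses (Da) for the residues in SGFLEEDELK,
# copied from the Amino-Acid Molecular Weights table in this deck.
RESIDUE_MASS = {
    "S": 87.03203, "G": 57.02147, "F": 147.06842, "L": 113.08407,
    "E": 129.04260, "D": 115.02695, "K": 128.09497,
}
PROTON = 1.00728   # assumed proton mass, Da
WATER = 18.01056   # assumed H2O mass, Da


def fragment_ladder(peptide):
    """Return (b, y) lists of singly charged fragment m/z values.

    b[i-1] is b_i (first i residues); y[i-1] is y_i (last i residues).
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    b = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]
    return b, y


b_ions, y_ions = fragment_ladder("SGFLEEDELK")
# Print complementary pairs: b1 with y9, b2 with y8, and so on.
for i, (b, y) in enumerate(zip(b_ions, reversed(y_ions)), start=1):
    print(f"b{i} {b:9.3f}   y{len(b_ions) + 1 - i} {y:9.3f}")
```

Each complementary pair b_i, y_{n-i} sums to the same value (peptide mass plus water plus two protons), which is a quick consistency check on any such table.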
Peptide Fragmentation
[Figure: MS/MS spectrum, % Intensity vs. m/z (250 to 1000)]
b ions: S 88, G 145, F 292, L 405, E 534, E 663, D 778, E 907, L 1020, K 1166
y ions: 147, 260, 389, 504, 633, 762, 875, 1022, 1080, 1166
Peptide Fragmentation
[Figure: the same spectrum and b/y ion ladders for S-G-F-L-E-E-D-E-L-K, with peaks annotated y2 to y9 and b3 to b9]
Peptide Identification
• For each (likely) peptide sequence
  1. Compute fragment masses
  2. Compare with spectrum
  3. Retain those that match well
• Peptide sequences from protein sequence databases
  • Swiss-Prot, IPI, NCBI’s nr, ...
• Automated, high-throughput peptide identification in complex mixtures
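The three steps above can be sketched as a tiny search loop. This is a minimal illustration, not how any particular search engine scores: the score is a plain count of matched fragments, and the peak list, candidate peptides, tolerance, and function names are all hypothetical.

```python
# Residue masses (Da) for the residues used below, taken from the
# Amino-Acid Molecular Weights table in this deck.
RESIDUE_MASS = {
    "S": 87.03203, "G": 57.02147, "F": 147.06842, "L": 113.08407,
    "E": 129.04260, "D": 115.02695, "K": 128.09497,
}
PROTON, WATER = 1.00728, 18.01056  # assumed constants, Da


def fragment_masses(peptide):
    """Step 1: singly charged b- and y-ion m/z values for a peptide."""
    m = [RESIDUE_MASS[aa] for aa in peptide]
    b = [sum(m[:i]) + PROTON for i in range(1, len(m))]
    y = [sum(m[i:]) + WATER + PROTON for i in range(1, len(m))]
    return b + y


def match_count(fragments, peaks, tol):
    """Step 2: number of fragments within tol of some observed peak."""
    return sum(any(abs(f - p) <= tol for p in peaks) for f in fragments)


def rank_candidates(candidates, peaks, tol=0.05):
    """Step 3: candidates sorted from best to worst match count."""
    scored = [(match_count(fragment_masses(c), peaks, tol), c) for c in candidates]
    return sorted(scored, reverse=True)


# Hypothetical peak list (m/z) and candidate peptides, for illustration only.
peaks = [147.11, 260.20, 389.24, 504.27, 633.31, 762.35, 875.44,
         292.13, 405.21, 534.26]
ranked = rank_candidates(["SGFLEEDELK", "SGFLEDEELK", "KLEDEELFGS"], peaks)
for score, peptide in ranked:
    print(score, peptide)
```

Note that the candidate with two swapped residues still matches most peaks, which is why a top rank alone does not certify a correct identification.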
High Quality Peptide Identification: E-value < 10^-8
Moderate quality peptide identification: E-value < 10^-3
Amino-Acid Molecular Weights
| Code | Amino-Acid | Residue MW (Da) | Code | Amino-Acid | Residue MW (Da) |
|---|---|---|---|---|---|
| A | Alanine | 71.03712 | M | Methionine | 131.04049 |
| C | Cysteine | 103.00919 | N | Asparagine | 114.04293 |
| D | Aspartic acid | 115.02695 | P | Proline | 97.05277 |
| E | Glutamic acid | 129.04260 | Q | Glutamine | 128.05858 |
| F | Phenylalanine | 147.06842 | R | Arginine | 156.10112 |
| G | Glycine | 57.02147 | S | Serine | 87.03203 |
| H | Histidine | 137.05891 | T | Threonine | 101.04768 |
| I | Isoleucine | 113.08407 | V | Valine | 99.06842 |
| K | Lysine | 128.09497 | W | Tryptophan | 186.07932 |
| L | Leucine | 113.08407 | Y | Tyrosine | 163.06333 |
Peptide Identification
• Peptide fragmentation by CID is poorly understood
• MS/MS spectra represent incomplete information about amino-acid sequence
  • I/L, K/Q, GG/N, …
• Correct identifications don’t come with a certificate!
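The I/L, K/Q and GG/N ambiguities follow directly from the residue masses in the table earlier in this deck. A small check, with a hypothetical helper name and illustrative tolerances:

```python
# Residue masses (Da) from the Amino-Acid Molecular Weights table.
MASS = {
    "I": 113.08407, "L": 113.08407,
    "K": 128.09497, "Q": 128.05858,
    "G": 57.02147, "N": 114.04293,
}


def indistinguishable(a, b, tol):
    """True if residue strings a and b differ in total mass by at most tol Da."""
    diff = abs(sum(MASS[x] for x in a) - sum(MASS[x] for x in b))
    return diff <= tol


for a, b in [("I", "L"), ("K", "Q"), ("GG", "N")]:
    delta = abs(sum(MASS[x] for x in a) - sum(MASS[x] for x in b))
    print(f"{a} vs {b}: mass difference {delta:.5f} Da")
```

I and L have identical mass, GG and N differ only in the fifth decimal place, and K and Q differ by about 0.036 Da, so whether K/Q can be told apart depends on the instrument's mass accuracy.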
Peptide Identification
• High-throughput workflows demand we analyze all spectra, all the time.
• Spectra may not contain enough information to be interpreted correctly
  • …bad static on a cell phone
• Peptides may not match our assumptions
  • …it’s all Greek to me
• “Don’t know” is an acceptable answer!
Peptide Identification
• Rank the best peptide identifications
• Is the top ranked peptide correct?
Peptide Identification
• Incorrect peptide has best score
  • Correct peptide is missing?
  • Potential for incorrect conclusion
  • What score ensures no incorrect peptides?
• Correct peptide has weak score
  • Insufficient fragmentation, poor score
  • Potential for weakened conclusion
  • What score ensures we find all correct peptides?
Statistical Significance
• Can’t prove particular identifications are right or wrong...
  • ...need to know fragmentation in advance!
• A minimal standard for identification scores...
  • ...better than guessing.
  • p-value, E-value, statistical significance
Pin the tail on the donkey…
Probability Concepts
Throwing darts
• One at a time
• Blindfolded
Uniform distribution? Independent? Identically distributed?
Pr [ Dart hits 20 ] = 0.05
Probability Concepts
Throwing darts
• One at a time
• Blindfolded
• Three darts
Pr [Hitting 20 3 times] = 0.05 * 0.05 * 0.05
Pr [Hit 20 at least twice] = 0.007125 + 0.000125
0 times 0.857375
1 times 0.135375
2 times 0.007125
3 times 0.000125
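These numbers are binomial probabilities with p = 0.05 per dart. A short sketch that reproduces them, assuming independent, identically distributed throws (the function names are illustrative):

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k hits in n independent throws, each hitting with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_tail(s, n, p):
    """Probability of at least s hits in n throws."""
    return sum(binom_pmf(k, n, p) for k in range(s, n + 1))


# Three darts, Pr[dart hits 20] = 0.05
for k in range(4):
    print(f"{k} times {binom_pmf(k, 3, 0.05):.6f}")
print(f"at least twice {binom_tail(2, 3, 0.05):.6f}")
```

The same two functions apply unchanged to larger numbers of darts, for example `binom_tail(2, 100, 0.05)`.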
Probability Concepts
[Figure: bar chart of the probability of hitting 20 zero, one, two, or three times with three darts: 0.857375, 0.135375, 0.007125, 0.000125; y-axis from 0 to 1]
Probability Concepts
Throwing darts
• One at a time
• Blindfolded
• 100 darts
Pr [Hitting 20 3 times] = 0.139575
Pr [Hit 20 at least twice] = 0.9629188
0 times 0.005920
1 times 0.031160
2 times 0.081181
3 times 0.139575
Probability Concepts
[Figure: Histogram of rbinom(10000, 100, 0.05); x-axis rbinom(10000, 100, 0.05), 0 to 15; y-axis Frequency, 0 to 1500]
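The histogram comes from R's binomial sampler, `rbinom(10000, 100, 0.05)`: 10000 players each throw 100 darts. A rough Python equivalent using only the standard library is sketched below; the function merely mimics the R call's argument order, and the seed is arbitrary.

```python
import random
from collections import Counter


def rbinom(n_draws, size, prob, rng):
    """Draw n_draws binomial counts: hits out of `size` throws, each hitting with probability `prob`."""
    return [sum(rng.random() < prob for _ in range(size)) for _ in range(n_draws)]


rng = random.Random(0)            # arbitrary seed, for repeatability
draws = rbinom(10000, 100, 0.05, rng)
histogram = Counter(draws)
for hits in sorted(histogram):
    print(f"{hits:2d} {histogram[hits]}")
```

The counts cluster around 100 × 0.05 = 5 hits, with a thin tail toward higher values.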
Match Score
• Dartboard represents the mass range of the spectrum
• Peaks of a spectrum are “slices”
  • Width of slice corresponds to mass tolerance
• Darts represent
  • random masses
  • masses of fragments of a random peptide
  • masses of peptides of a random protein
  • masses of biomarkers from a random class
• How many darts do we get to throw?
Match Score
[Figure: spectrum, % Intensity vs. m/z (250 to 1000), with peaks labeled 270, 330, 550, 580, 755, 870]
• What is the probability that we match at least 5 peaks?
Match Score
• Pr [ Match ≥ s peaks ] = Pr [ Binomial( n , p ) ≥ s ] ≈ Pr [ Poisson( p n ) ≥ s ], for small p and large n
p is the probability of a random mass / peak match, n is the number of darts (fragments in our answer)
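A sketch of this calculation follows. The estimate of p is an assumption in the spirit of the dartboard picture (fraction of the mass range covered by peak "slices" of width twice the tolerance, ignoring overlaps); the spectrum parameters and the fragment count n = 18 are hypothetical.

```python
from math import comb, exp, factorial


def random_match_prob(n_peaks, tol, mass_range):
    """Assumed dartboard model: fraction of the mass range covered by
    peak slices of width 2 * tol (overlapping slices are ignored)."""
    return min(1.0, n_peaks * 2 * tol / mass_range)


def binomial_tail(s, n, p):
    """Pr[at least s of n random fragment masses match a peak]."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(s, n + 1))


def poisson_tail(s, lam):
    """Poisson approximation to the same tail, with mean lam = p * n."""
    return 1.0 - sum(exp(-lam) * lam**k / factorial(k) for k in range(s))


# Hypothetical spectrum: 50 peaks, +/- 0.5 Da tolerance, 1000 Da mass range.
p = random_match_prob(n_peaks=50, tol=0.5, mass_range=1000.0)
n = 18  # e.g. 9 b ions + 9 y ions of a 10-residue peptide
print("p =", p)
print("binomial Pr[match >= 5] =", binomial_tail(5, n, p))
print("poisson  Pr[match >= 5] =", poisson_tail(5, n * p))
```

With n this small the two tails agree only roughly; the Poisson form is a convenience for small p and large n, as the slide notes.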
Match Score
Theoretical distribution
• Used by OMSSA
• Proposed, in various forms, by many.
• Probability of random mass / peak match
  • IID (independent, identically distributed)
  • Based on match tolerance
Match Score
Theoretical distribution assumptions
• Each dart is independent
  • Peaks are not “related”
• Each dart is identically distributed
  • Chance of random mass / peak match is the same for all peaks
Tournament Size
[Figure: four histograms of the number of 20's hit with 100 darts, for tournaments of 100 people, 1000 people, 10000 people, and 100000 people; y-axis 0.00 to 0.15]
Tournament Size
[Figure: the same four histograms (100 darts, # 20's; 100 people, 1000 people, 10000 people, 100000 people), zoomed in on the upper tail: x-axis 10 to 18, y-axis 0 to 50]
Number of Trials
• Tournament size == number of trials
  • Number of peptides tried
  • Related to sequence database size
• Probability that at least one random match score is ≥ s
  • 1 - Pr [ all match scores < s ]
  • 1 - Pr [ match score < s ]^Trials (*)
  • Assumes IID!
• Expect value (E-value)
  • E = Trials * Pr [ match ≥ s ]
  • Corresponds to Bonferroni bound on (*)
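The relationship between (*) and the E-value can be checked numerically. In this sketch the per-peptide probability and the trial counts are hypothetical, and the trials are assumed IID as on the slide.

```python
from math import expm1, log1p


def prob_any_match(p_single, trials):
    """(*): 1 - (1 - p)^Trials, the chance that at least one of `trials`
    IID random peptides scores >= s. Computed stably for tiny p."""
    return -expm1(trials * log1p(-p_single))


def expect_value(p_single, trials):
    """E = Trials * Pr[match >= s]; an upper (Bonferroni) bound on (*)."""
    return trials * p_single


p_single = 1e-6  # hypothetical Pr[match >= s] for one random peptide
for trials in (100, 10_000, 1_000_000):
    print(trials, prob_any_match(p_single, trials), expect_value(p_single, trials))
```

For small E the two quantities nearly coincide; as the number of trials grows, E keeps rising while (*) saturates at 1.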
Better Dart Throwers
Better Random Models
• Comparison with completely random model isn’t really fair
• Match scores for real spectra with real peptides obey rules
• Even incorrect peptides match with non-random structure!
Better Random Models
• Want to generate random fragment masses (darts) that behave more like the real thing:
  • Some fragments are more likely than others
  • Some fragments depend on others
• Theoretical models can only incorporate this structure to a limited extent.
Better Random Models
• Generate random peptides
  • Real looking fragment masses
  • No theoretical model!
  • Must use empirical distribution
  • Usually require they have the correct precursor mass
• Score function can model anything we like!
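One simple way to get random peptides with the correct precursor mass is to shuffle the residues of the candidate, since a permutation keeps the composition and therefore the total mass. The sketch below uses that idea to build an empirical score distribution; shuffling, the matched-fragment score, the peak list, and all names are illustrative assumptions rather than the method of any specific tool.

```python
import random

RESIDUE_MASS = {  # Da, from the Amino-Acid Molecular Weights table
    "S": 87.03203, "G": 57.02147, "F": 147.06842, "L": 113.08407,
    "E": 129.04260, "D": 115.02695, "K": 128.09497,
}
PROTON, WATER = 1.00728, 18.01056  # assumed constants, Da


def score(peptide, peaks, tol):
    """Any score function could go here; this one counts b/y fragments near a peak."""
    m = [RESIDUE_MASS[aa] for aa in peptide]
    frags = [sum(m[:i]) + PROTON for i in range(1, len(m))]
    frags += [sum(m[i:]) + WATER + PROTON for i in range(1, len(m))]
    return sum(any(abs(f - p) <= tol for p in peaks) for f in frags)


def empirical_p_value(peptide, peaks, n_random, tol, rng):
    """Fraction of shuffled peptides scoring at least as well as `peptide`
    (with the usual +1 correction so the estimate is never zero)."""
    observed = score(peptide, peaks, tol)
    residues = list(peptide)
    hits = 0
    for _ in range(n_random):
        rng.shuffle(residues)  # same composition, same precursor mass
        if score(residues, peaks, tol) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_random + 1)


# Hypothetical peak list (m/z), for illustration only.
peaks = [147.11, 260.20, 389.24, 504.27, 633.31, 762.35, 875.44,
         292.13, 405.21, 534.26]
observed, p_value = empirical_p_value("SGFLEEDELK", peaks, 2000, 0.05, random.Random(1))
print("score", observed, "empirical p-value", p_value)
```

The empirical p-value can never be smaller than 1 / (n_random + 1), which is one reason to extrapolate from the empirical distribution when very small p-values are needed.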
Better Random Models
Fenyo & Beavis, Anal. Chem., 2003
Better Random Models
• Truly random peptides don’t look much like real peptides
• Just use peptides from the sequence database!
• Caveats:
  • Correct peptide (non-random) may be included
  • Peptides are not independent
• Reverse sequence avoids only the first problem
Extrapolating from the Empirical Distribution
• Often, the empirical shape is consistent with a theoretical model
Geer et al., J. Proteome Research, 2004
Fenyo & Beavis, Anal. Chem., 2003
False Positive Rate Estimation
• Each spectrum is a chance to be right, wrong, or inconclusive.
  • How many decisions are wrong?
• Given identification criteria:
  • SEQUEST Xcorr, E-value, Score, etc., plus...
  • ...threshold
• Use “decoy” sequences
  • random, reverse, cross-species
  • Identifications must be incorrect!
False Positive Rate Estimation
• # FP in real search = # hits in decoy search
  • Need same size database, or rate conversion
• FP Rate: # decoy hits / # real hits
• FP Rate: 2 x # decoy hits / (# real hits + # decoy hits)
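Both estimates are simple ratios of hit counts above the chosen threshold. The sketch below also shows reversed-sequence decoys; the `REV_` naming and the function names are hypothetical, and reading the second formula as the one for a combined real + decoy search is an assumption the slide does not state.

```python
def reverse_decoys(proteins):
    """Build a same-size decoy database by reversing each protein sequence."""
    return {"REV_" + name: seq[::-1] for name, seq in proteins.items()}


def fp_rate_separate(n_decoy_hits, n_real_hits):
    """# decoy hits / # real hits (decoy database searched separately)."""
    return n_decoy_hits / n_real_hits


def fp_rate_combined(n_decoy_hits, n_real_hits):
    """2 x # decoy hits / (# real hits + # decoy hits)."""
    return 2 * n_decoy_hits / (n_real_hits + n_decoy_hits)


decoys = reverse_decoys({"P1": "SGFLEEDELK"})
print(decoys)
print(fp_rate_separate(5, 100))   # 5 decoy hits vs. 100 real hits
print(fp_rate_combined(5, 95))    # 5 decoy hits among 100 total hits
```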
False Positive Rate Estimation
• A form of statistical significance
  • In “theory”, E-value and a FP rate are the same.
• Search engine independent
• Easy to implement
• Assumes a single threshold for all spectra
  • Spectrum/Peptide Identification scores are not iid!...
  • ...but E-values, in principle, are.
Peptide Prophet
• From the Institute for Systems Biology
  • Keller et al., Anal. Chem. 2002
• Re-analysis of SEQUEST results
• Spectra are trials
• Assumes that many of the spectra are not correctly identified
Peptide Prophet
Distribution of spectral scores in the results
Keller et al., Anal. Chem. 2002
Peptide Prophet
• Assumes a bimodal distribution of scores, with a particular shape
• Ignores database size
  • …but it is included implicitly
• Like empirical distribution for peptide sampling, can be applied to any score function
  • Can be applied to any search engine’s results
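Given a fitted bimodal mixture, the probability that an identification is correct follows from Bayes' rule. The sketch below only illustrates that step: it uses two Gaussian components with made-up parameters purely for convenience, which is not a claim about the distributions or the fitting procedure Peptide Prophet itself uses.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return exp(-0.5 * z * z) / (sd * sqrt(2 * pi))


def prob_correct(score, frac_correct, correct, incorrect):
    """Posterior Pr[correct | score] for a two-component mixture.

    `correct` and `incorrect` are (mean, sd) pairs; `frac_correct` is the
    mixing proportion of correct identifications.
    """
    f1 = frac_correct * normal_pdf(score, *correct)
    f0 = (1 - frac_correct) * normal_pdf(score, *incorrect)
    return f1 / (f1 + f0)


# Hypothetical mixture: 30% correct, incorrect scores near 0, correct near 5.
params = dict(frac_correct=0.3, correct=(5.0, 1.0), incorrect=(0.0, 1.0))
for s in (0.0, 2.5, 5.0):
    print(s, prob_correct(s, **params))
```

Because the mixing proportion is estimated from the spectra themselves, the effect of database size enters only implicitly, through how many incorrect top-ranked matches land in the lower mode.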
Peptide Prophet
• Caveats
  • Are spectra scores sampled from the same distribution?
  • Are there enough correct identifications for second peak?
  • Are spectra independent observations?
  • Are distributions appropriately shaped?
• Huge improvement over raw SEQUEST results
Peptides to Proteins
Nesvizhskii et al., Anal. Chem. 2003
Peptides to Proteins
• A peptide sequence may occur in many different protein sequences
  • Variants, paralogues, protein families
• Separation, digestion and ionization are not well understood
• Proteins in sequence database are extremely non-random, and very dependent
Publication Guidelines
1. Computational parameters
  • Spectral processing
  • Sequence database
  • Search program
  • Statistical analysis
2. Number of peptides per protein
  • Each peptide sequence counts once!
  • Multiple forms of the same peptide count once!
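The counting rule in guideline 2 can be sketched as follows. The input format is an assumption made for illustration: identifications arrive as (protein, peptide form) pairs, and modifications are written as lowercase annotations such as `M(ox)`, so stripping everything except uppercase letters recovers the bare sequence.

```python
import re
from collections import defaultdict


def bare_sequence(peptide_form):
    """Drop modification annotations, keeping only uppercase residue letters.

    Assumes annotations never contain uppercase letters (hypothetical format).
    """
    return re.sub(r"[^A-Z]", "", peptide_form)


def peptides_per_protein(identifications):
    """Count distinct peptide sequences per protein; repeated spectra and
    modified forms of the same sequence count once."""
    seen = defaultdict(set)
    for protein, peptide_form in identifications:
        seen[protein].add(bare_sequence(peptide_form))
    return {protein: len(seqs) for protein, seqs in seen.items()}


# Hypothetical identifications, for illustration only.
ids = [
    ("P1", "SGFLEEDELK"),
    ("P1", "SGFLEEDELK"),      # same peptide from a second spectrum
    ("P1", "M(ox)PEPTIDEK"),   # oxidized form
    ("P1", "MPEPTIDEK"),       # unmodified form of the same sequence
    ("P2", "ELK"),
]
counts = peptides_per_protein(ids)
print(counts)
```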
Publication Guidelines
3. Single-peptide proteins must be explicitly justified by
  • Peptide sequence
  • N and C terminal amino-acids
  • Precursor mass and charge
  • Peptide Scores
  • Multiple forms of the peptide counted once!
4. Biological conclusions based on single-peptide proteins must show the spectrum
Publication Guidelines
5. More stringent requirements for PMF data analysis
  • Similar to that for tandem mass spectra
6. Management of protein redundancy
  • Peptides identified from a different species?
7. Spectra submission encouraged
Summary
• Could guessing be as effective as a search?
• More guesses improve the best guess
• Better guessers help us be more discriminating
• Peptides to proteins is not as simple as it seems
• Publication guidelines reflect sound statistical principles.