Peptide Identification via Tandem Mass Spectrometry
Sorin Istrail
Sample Preparation for MS
Enzymatic Digestion (Trypsin)
+ Fractionation
Single Stage MS
Mass Spectrometry
LC-MS: 1 MS spectrum / second
Tandem MS
Secondary Fragmentation
LC-MS/MS: 2-3 spectra / second
Tandem MS for Peptide ID
N-terminus   H...-HN-CH(R_{i-1})-CO-NH-CH(R_i)-CO-NH-CH(R_{i+1})-CO-…OH   C-terminus
                 AA residue i-1     AA residue i   AA residue i+1
The peptide backbone breaks to form fragments with characteristic masses.
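To make "characteristic masses" concrete, here is a minimal sketch that computes singly charged b-ion (N-terminal) and y-ion (C-terminal) m/z values for the example peptide SGFLEEDELK, which the slides print C-terminus first as KLEDEELFGS. The function names are illustrative, not from the lecture; the residue masses are standard monoisotopic values. The results should agree with the numbers printed on the slides to within about 1 Da, since the slides show rounded values.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O, added once for the intact C-terminus
PROTON = 1.00728   # charge carrier for a 1+ ion


def b_ions(peptide):
    """m/z of b1..b(n-1): N-terminal prefixes plus one proton."""
    total, out = 0.0, []
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        out.append(total + PROTON)
    return out


def y_ions(peptide):
    """m/z of y1..y(n-1): C-terminal suffixes plus water and one proton."""
    total, out = 0.0, []
    for aa in reversed(peptide[1:]):
        total += RESIDUE_MASS[aa]
        out.append(total + WATER + PROTON)
    return out


def precursor_mh(peptide):
    """m/z of the singly protonated intact peptide, [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


if __name__ == "__main__":
    pep = "SGFLEEDELK"
    print("b:", [round(m, 2) for m in b_ions(pep)])
    print("y:", [round(m, 2) for m in y_ions(pep)])
    print("M+H:", round(precursor_mh(pep), 2))
```

Each b ion differs from the next by exactly one residue mass, and the same holds for the y ions; that ladder structure is what the interpretation methods rely on.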
Tandem MS for Peptide ID
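[Figure: tandem MS spectrum. x-axis: m/z (0 to 1000); y-axis: % relative abundance (0 to 100). The peptide sequence and its fragment masses are printed above the spectrum.]
Residue    K     L     E     D     E     E     L     F     G     S
Upper row  147   260   389   504   633   762   875   1022  1080  1166
Lower row  1166  1020  907   778   663   534   405   292   145   88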
Tandem MS for Peptide ID
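[Figure: Peptide fragmentation possibilities. Backbone segment -HN-CH(R_i)-CO-NH-CH(CHR'R'')-CO-NH- with the cleavage sites labeled.]
N-terminal fragments: a_i, b_i, c_i, b_{i+1}, d_{i+1}
C-terminal fragments: x_{n-i}, y_{n-i}, z_{n-i}, y_{n-i-1}, v_{n-i}, w_{n-i}
The figure distinguishes low energy fragments from high energy fragments.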
Tandem MS Spectrum Interpretation
Input: Mass of parent peptide, tandem MS spectrum
Output: Peptide sequence
• De novo
• Putative fragment comparison
- Combinatorial enumeration
- Sequence database
De novo Spectrum Interpretation
[Figure: tandem MS spectrum (x-axis: m/z, 0 to 1000; y-axis: % relative abundance, 0 to 100) annotated with residue labels between peaks: E L F; K L; S G F; G; E D E; L E; E D E L.]
De novo Spectrum Interpretation
• Works best for spectra with simple, well formed fragment ladders.
• Missing fragments create ambiguity.
• Noise or unexpected fragments create ambiguity.
• Many fragment types create ambiguity.
• “Best” de novo interpretation may have no biological relevance.
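The ladder-reading idea behind de novo interpretation can be sketched as follows: take consecutive peaks of one ion series, and match each mass difference to a residue mass. This is a simplified illustration, not the lecture's algorithm; the function names and the 0.02 Da tolerance are assumptions, and the peak list is the theoretical y-ion ladder of the example peptide rather than measured data.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residues_for_gap(delta, tol=0.02):
    """All residues whose mass matches a peak-to-peak gap within tol."""
    return sorted(aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol)


def read_ladder(peaks, tol=0.02):
    """Candidate residues for each gap between consecutive sorted peaks.

    An empty list means the gap matches no single residue, for example
    because a fragment is missing from the ladder.
    """
    peaks = sorted(peaks)
    return [residues_for_gap(hi - lo, tol) for lo, hi in zip(peaks, peaks[1:])]


# Theoretical y1..y9 and [M+H]+ for SGFLEEDELK (assumed, not measured).
Y_LADDER = [147.113, 260.197, 389.239, 504.266, 633.309,
            762.352, 875.436, 1022.504, 1079.526, 1166.558]

if __name__ == "__main__":
    for gap in read_ladder(Y_LADDER):
        print("/".join(gap) or "?")
```

Even on this clean ladder, leucine and isoleucine cannot be told apart because they have the same mass. Dropping one peak leaves a gap that matches no single residue, which is the ambiguity the bullets above describe.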
Putative Fragment Comparison
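[Figure: tandem MS spectrum with peaks matched to predicted fragments. x-axis: m/z (0 to 1000); y-axis: % relative abundance (0 to 100). Labeled peaks: y2, y3, y4, y5, y6, y7, y8, y9; b3, b4, b5, b6, b7, b8, b9; [M+2H]2+.]
Residue  K     L     E     D     E     E     L     F     G     S
y ions   y1    y2    y3    y4    y5    y6    y7    y8    y9    M+H
m/z      147   260   389   504   633   762   875   1022  1080  1166
b ions   M+H   b9    b8    b7    b6    b5    b4    b3    b2    b1
m/z      1166  1020  907   778   663   534   405   292   145   88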
Putative Fragment Comparison
Input: Peptide mass, tryptic digestion properties, compositional information…
Output: Candidate peptide sequence
Generating candidate peptide sequences
• Combinatorial enumeration
• Sequence database
Putative Fragment Comparison
Combinatorial enumeration
• All possible sequences can be checked
• Too many candidates
• Many candidates are equally plausible
• “Best” candidate may have no biological relevance
Sequence database
• Sequences with no biological relevance are eliminated
• Few candidates to evaluate
• Sequence permutations eliminated
• Correct candidate might be missing from database
• All candidates have some biological relevance
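The sequence database route can be sketched as an in-silico tryptic digest followed by a parent-mass filter. This is a minimal illustration under stated assumptions: trypsin is modeled as cleaving after K or R unless the next residue is P, no missed cleavages are considered, and the two protein sequences in `TOY_DB` are made up for the example. The last line contrasts this with combinatorial enumeration, where a 10-residue peptide alone has 20^10 possible sequences.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728


def tryptic_peptides(protein):
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):          # C-terminal remainder, if any
        peptides.append(protein[start:])
    return peptides


def mh(peptide):
    """m/z of the singly protonated peptide, [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def candidates(database, parent_mh, tol=0.5):
    """(protein name, peptide) pairs whose [M+H]+ is within tol of parent_mh."""
    return [(name, pep)
            for name, protein in database.items()
            for pep in tryptic_peptides(protein)
            if abs(mh(pep) - parent_mh) <= tol]


# Hypothetical two-entry database, invented for illustration only.
TOY_DB = {
    "toy_protein_1": "MKSGFLEEDELKAPR",
    "toy_protein_2": "GASPVTCLKRPNDQK",
}

# Size of the search space if every 10-residue sequence were enumerated.
N_ENUMERATED = 20 ** 10

if __name__ == "__main__":
    print(candidates(TOY_DB, 1166.56))
    print(N_ENUMERATED)
```

The database yields a handful of peptides to score, while enumeration yields about 10^13 for a single length, many of them permutations with identical mass.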
Candidate Peptide Evaluation
Score functions for candidate peptide evaluation
• Shared peak count
• Correlation
• Pr [ spectrum | peptide ]
By itself, the score of a peptide candidate is meaningless!
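Of the score functions listed, the shared peak count is the simplest to write down: count how many predicted fragment masses have an observed peak nearby. The sketch below is one plausible definition, not necessarily the one used in the lecture; the function name, the 0.5 Da tolerance, and the observed peak list are assumptions made for illustration. The predicted masses are the b and y values printed on the slides.

```python
def shared_peak_count(observed, predicted, tol=0.5):
    """Number of predicted fragment m/z values matched by an observed peak.

    A predicted fragment counts once, no matter how many observed
    peaks fall within the tolerance window around it.
    """
    return sum(
        1 for p in predicted
        if any(abs(o - p) <= tol for o in observed)
    )


# Predicted fragments for the example peptide, as printed on the slides.
PREDICTED_Y = [147, 260, 389, 504, 633, 762, 875, 1022, 1080]
PREDICTED_B = [88, 145, 292, 405, 534, 663, 778, 907, 1020]

# Hypothetical observed peak list (not real data).
OBSERVED = [147.1, 260.2, 292.1, 405.2, 500.0]

if __name__ == "__main__":
    print(shared_peak_count(OBSERVED, PREDICTED_Y + PREDICTED_B))
```

The raw count depends on peptide length, spectrum density, and the tolerance chosen, which is one reason a single score says little until it is compared with the scores of other candidates.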
Candidate Peptide Evaluation
Rank  Score  Peptide          Protein(s)
1     83.5   TCVADESAENCDK    ALBU_HUMAN, ALBU_MACMU, ALBU_PIG
2     109.4  KCAADESAENCDK    ALBU_HORSE
3     115.3  FKKCDGDTVWDK     SRB9_YEAST
4     121.7  SGKAPILIATDVASR  DD17_HUMAN
5     124.1  MGFINLSLFDVDK    RRPO_RCNMV
6     126.4  QSDEDCVEIYIK     LEM2_BOVIN
7     127.8  MLDQSTDFEERK     SMOO_HUMAN
8     128.1  NFEMDTLTLLSSK    DHAS_BACSU
9     129.3  DNIAKEYENKFK     HPAA_HELNE
10    129.6  VEHVAFGLVLGDDK   SYR_CAEEL
11    129.6  LVEVSHDAEDEQK    DYHC_NEUCR
12    129.9  KTGYAHFFSRER     HIS2_THEMA
13    130.2  DYTLFALQEGDVK    RK27_PLECA, RK27_PLEHA
14    130.3  FNVTISLTDFITK    SYK_CAEEL
15    130.4  ENCQTLDNYVSR     GS27_CAEEL
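One simple way to put the top score in context is to compare it with the other candidates' scores. The sketch below computes the gap to the runner-up and a z-score against the remaining candidates, using the 15 scores from the list above. This is an illustrative assumption about how such a comparison could be done, not a method stated in the lecture, and it takes the listed order at face value (rank 1 has the lowest score).

```python
import statistics

# Scores of the 15 ranked candidates, copied from the list above.
SCORES = [83.5, 109.4, 115.3, 121.7, 124.1, 126.4, 127.8, 128.1,
          129.3, 129.6, 129.6, 129.9, 130.2, 130.3, 130.4]


def gap_to_runner_up(scores):
    """Difference between the second-ranked and top-ranked score."""
    return scores[1] - scores[0]


def top_hit_z_score(scores):
    """How many standard deviations the top score lies from the rest.

    The mean and sample standard deviation are taken over all
    candidates except the top one.
    """
    rest = scores[1:]
    return (scores[0] - statistics.mean(rest)) / statistics.stdev(rest)


if __name__ == "__main__":
    print("gap:", round(gap_to_runner_up(SCORES), 1))
    print("z:", round(top_hit_z_score(SCORES), 2))
```

Here the top candidate sits far from the cluster of scores between roughly 109 and 130, which is the kind of separation that makes a score informative.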
Candidate Peptide Evaluation