Mass Spectrometry: Protein Identification Strategies
DESCRIPTION
A talk on the basics of protein identification by mass spectrometry.
TRANSCRIPT
04/11/2023 OISB: The ABC of Mass Spectrometry for Biology Workshop 1
Mass Spectrometry: Protein Identification Strategies
Michel Dumontier, Carleton University
Typical MS experiment
Protein Identification
Protein identification strategies
• Mass Spectrometry
  – Peptide Mass Fingerprinting
• Tandem Mass Spectrometry
  – Spectral alignment
  – de novo sequencing
Peptide Mass Fingerprinting (PMF)
Matrix-Assisted Laser Desorption/Ionization (MALDI)
Electrospray Ionization (ESI)
Peptide Mass Fingerprinting
• Identify a protein from peptide signature
  – MALDI-TOF, ESI-TOF
• Approach
  – Compare observed with theoretical masses
• Requirements
  – Protease & cleavage pattern
  – Database of known sequences
Principles of Fingerprinting
Sequence:
>Protein A
acedfhsakdfqeasdfpkivtmeeewendadnfekqwfe
>Protein B
acekdfhsadfqeasdfpkivtmeeewenkdadnfeqwfe
>Protein C
acedfhsadfqekasdfpkivtmeeewendakdnfeqwfe
Mass (M+H): 4842.05 for each of Proteins A, B and C
Tryptic Fragments:
Protein A: acedfhsak dfgeasdfpk ivtmeeewendadnfek gwfe
Protein B: acek dfhsadfgeasdfpk ivtmeeewenk dadnfeqwfe
Protein C: acedfhsadfgek asdfpk ivtmeeewendak dnfegwfe
Principles of Fingerprinting
Sequence:
>Protein A
acedfhsakdfqeasdfpkivtmeeewendadnfekqwfe
>Protein B
acekdfhsadfqeasdfpkivtmeeewenkdadnfeqwfe
>Protein C
acedfhsadfqekasdfpkivtmeeewendakdnfeqwfe
Mass (M+H): 4842.05 for each of Proteins A, B and C
Mass Spectrum: [Figure: one mass spectrum per protein]
Mass Calculation (Glycine)
NH2-CH2-COOH (free amino acid)
R1-NH-CH2-CO-R3 (amino acid residue)
Monoisotopic Mass: 1H = 1.007825, 12C = 12.00000, 14N = 14.00307, 16O = 15.99491
Glycine Free Amino Acid Mass: 5xH + 2xC + 2xO + 1xN = 75.032015 amu
Glycine Residue Mass: 3xH + 2xC + 1xO + 1xN = 57.021455 amu
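The glycine arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the isotope masses are the ones on the slide, while the function name `composition_mass` and the dictionary layout are illustrative choices, not part of the talk.

```python
# Monoisotopic masses as given on the slide (amu).
MONOISOTOPIC = {"H": 1.007825, "C": 12.00000, "N": 14.00307, "O": 15.99491}


def composition_mass(formula):
    """Sum the isotope masses for a composition given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


glycine_free = {"H": 5, "C": 2, "O": 2, "N": 1}     # NH2-CH2-COOH
glycine_residue = {"H": 3, "C": 2, "O": 1, "N": 1}  # free amino acid minus H2O

free_mass = composition_mass(glycine_free)
residue_mass = composition_mass(glycine_residue)
print(f"free amino acid: {free_mass:.6f} amu")
print(f"residue:         {residue_mass:.6f} amu")
print(f"difference:      {free_mass - residue_mass:.5f} amu (one water)")
```

The residue is lighter than the free amino acid by exactly one water (2 x H + 1 x O), which is what is lost when a peptide bond forms.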
Monoisotopic vs average mass
Monoisotopic mass is the mass calculated from the most abundant isotope of each element.
Average mass is the abundance-weighted mass over all isotopes of each element.
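The distinction can be illustrated for a single element. The sketch below uses carbon; the isotope masses and natural abundances are standard reference figures assumed here, not values from the slides, and the function names are illustrative.

```python
# Carbon isotopes as (mass in amu, natural abundance).
# Assumed reference values, not taken from the slides.
CARBON_ISOTOPES = [(12.000000, 0.9893), (13.003355, 0.0107)]


def monoisotopic_mass(isotopes):
    """Mass of the single most abundant isotope."""
    return max(isotopes, key=lambda pair: pair[1])[0]


def average_mass(isotopes):
    """Abundance-weighted mean over all isotopes."""
    return sum(mass * abundance for mass, abundance in isotopes)


print("monoisotopic:", monoisotopic_mass(CARBON_ISOTOPES))
print("average:     ", round(average_mass(CARBON_ISOTOPES), 4))
```

For carbon the two differ by about 0.011 amu per atom, so the gap grows with the number of atoms in a peptide.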
Amino Acid Residues: Monoisotopic Masses
Glycine 57.02147
Alanine 71.03712
Serine 87.03203
Proline 97.05277
Valine 99.06842
Threonine 101.04768
Cysteine 103.00919
Isoleucine 113.08407
Leucine 113.08407
Asparagine 114.04293
Aspartic acid 115.02695
Glutamine 128.05858
Lysine 128.09497
Glutamic acid 129.0426
Methionine 131.04049
Histidine 137.05891
Phenylalanine 147.06842
Arginine 156.10112
Tyrosine 163.06333
Tryptophan 186.07932
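With the residue table, a peptide (M+H) mass is the sum of its residue masses plus one water plus one proton. A minimal sketch follows; the residue masses are the ones listed above, while the water value (derived from the slide's H and O masses), the proton mass of 1.00728 and the name `peptide_mh` are assumptions made for illustration.

```python
# Residue masses from the table above, keyed by one-letter code.
RESIDUE_MASS = {
    "G": 57.02147, "A": 71.03712, "S": 87.03203, "P": 97.05277, "V": 99.06842,
    "T": 101.04768, "C": 103.00919, "I": 113.08407, "L": 113.08407, "N": 114.04293,
    "D": 115.02695, "Q": 128.05858, "K": 128.09497, "E": 129.0426, "M": 131.04049,
    "H": 137.05891, "F": 147.06842, "R": 156.10112, "Y": 163.06333, "W": 186.07932,
}
WATER = 18.01056   # 2 x 1.007825 + 15.99491
PROTON = 1.00728   # assumed proton mass


def peptide_mh(sequence):
    """Singly protonated monoisotopic mass (M+H) of a peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER + PROTON


for peptide in ("acedfhsak", "acek"):
    print(peptide, f"{peptide_mh(peptide):.4f}")
```

Under these assumptions the two example peptides come out within 0.001 Da of the values 1007.4251 and 450.2017 used elsewhere in the deck.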
Building a PMF Database
• Download protein sequence database
  – SwissProt or GenBank’s NR (non-redundant)
• Pick a protease, determine cleavage sites and identify resulting peptides for each protein entry
• Calculate the mass (M+H) for each peptide
• Sort the mass list
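The steps above can be sketched in Python for a single entry. This is an illustration under stated assumptions, not the method of any particular tool: trypsin is assumed to cleave after K or R unless the next residue is P, the proton mass is assumed to be 1.00728, and the names `tryptic_peptides` and `build_index` are hypothetical. Protein A is spelled as in the deck's tryptic-fragment column.

```python
RESIDUE_MASS = {
    "G": 57.02147, "A": 71.03712, "S": 87.03203, "P": 97.05277, "V": 99.06842,
    "T": 101.04768, "C": 103.00919, "I": 113.08407, "L": 113.08407, "N": 114.04293,
    "D": 115.02695, "Q": 128.05858, "K": 128.09497, "E": 129.0426, "M": 131.04049,
    "H": 137.05891, "F": 147.06842, "R": 156.10112, "Y": 163.06333, "W": 186.07932,
}
WATER_PLUS_PROTON = 18.01056 + 1.00728  # assumed constants


def tryptic_peptides(sequence):
    """Cleave after K or R, except when the next residue is P (assumed rule)."""
    seq = sequence.upper()
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])  # C-terminal peptide
    return peptides


def build_index(proteins):
    """Return a mass-sorted list of (M+H, protein name, fragment number, peptide)."""
    entries = []
    for name, sequence in proteins.items():
        for number, peptide in enumerate(tryptic_peptides(sequence), start=1):
            mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER_PLUS_PROTON
            entries.append((mass, name, number, peptide))
    return sorted(entries)


PROTEINS = {"A": "acedfhsakdfgeasdfpkivtmeeewendadnfekgwfe"}
index = build_index(PROTEINS)
for mass, name, number, peptide in index:
    print(f"{mass:10.4f} ({name}-{number}) {peptide}")
```

Sorting by mass is what makes the later lookup cheap: a query peak only has to be compared against the neighbouring entries in the list.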
Building a PMF Database
Sequence DB:
>Protein A
acedfhsakdfqeasdfpkivtmeeewendadnfekqwfe
>Protein B
acekdfhsadfqeasdfpkivtmeeewenkdadnfeqwfe
>Protein C
acedfhsadfqekasdfpkivtmeeewendakdnfeqwfe
Calc. Tryptic Frags:
Protein A: acedfhsak dfgeasdfpk ivtmeeewendadnfek gwfe
Protein B: acek dfhsadfgeasdfpk ivtmeeewenk dadnfeqwfe
Protein C: acedfhsadfgek asdfpk ivtmeeewendak dnfegwfe
Mass List:
450.2017 (B-1)
538.2296 (A-4)
664.3300 (C-2)
1007.4251 (A-1)
1112.4894 (A-2)
1114.4416 (C-4)
1300.5116 (B-4)
1407.6462 (B-3)
1526.6211 (C-1)
1593.7101 (C-3)
1740.7500 (B-2)
2098.8909 (A-3)
The Fingerprint (PMF) Approach
• Take a mass spectrum of a protease-cleaved protein (from gel or HPLC peak)
• Identify as many peaks as possible in the spectrum
• Compare query peaks with database peaks and calculate the number of matches or a matching score (based on length and mass difference)
• Rank hits and return the top-scoring entry (the one with the most matching peptides), which is taken to be the protein of interest
Query (MALDI) Spectrum
[Figure: MALDI spectrum; x-axis from 500 to 2500; labeled peaks at 450, 538, 698, 1007, 1199 and 2098; peaks at 1940 and 2211 are labeled as trypsin]
Query vs. Database
Query Masses: 450.2201, 538.2296, 698.3100, 1007.5391, 1199.4916, 2098.9909
Database Mass List: 450.2017 (B), 538.2296 (A), 664.3300 (C), 1007.4251 (A), 1112.4894 (A), 1114.4416 (C), 1300.5116 (B), 1407.6462 (B), 1526.6211 (C), 1593.7101 (C), 1740.7501 (B), 2098.8909 (A)
Results: 2 unknown masses, 1 hit on B, 3 hits on A
Conclude the query protein is A
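The comparison on this slide can be reproduced with a short matching loop. The sketch below uses the query and database masses exactly as listed; the absolute tolerance of 0.2 Da is an assumption chosen for illustration (the slide does not state one), and `match_peaks` is a hypothetical helper name.

```python
from collections import Counter

# Database mass list and query masses as listed on the slide.
DATABASE = [
    (450.2017, "B"), (538.2296, "A"), (664.3300, "C"), (1007.4251, "A"),
    (1112.4894, "A"), (1114.4416, "C"), (1300.5116, "B"), (1407.6462, "B"),
    (1526.6211, "C"), (1593.7101, "C"), (1740.7501, "B"), (2098.8909, "A"),
]
QUERY = [450.2201, 538.2296, 698.3100, 1007.5391, 1199.4916, 2098.9909]


def match_peaks(query, database, tolerance):
    """Count, per protein, how many query peaks fall within tolerance (Da)."""
    hits = Counter()
    unknown = []
    for peak in query:
        matched = {protein for mass, protein in database
                   if abs(mass - peak) <= tolerance}
        if matched:
            hits.update(matched)
        else:
            unknown.append(peak)
    return hits, unknown


hits, unknown = match_peaks(QUERY, DATABASE, tolerance=0.2)  # assumed tolerance
best = hits.most_common(1)[0][0]
print(dict(hits), unknown, "-> best match:", best)
```

With this tolerance the counts agree with the slide: three hits on A, one on B and two unmatched masses. Ranking by raw hit count is the simplest possible score; real tools weight matches further.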
What You Need To Do PMF
• A list of query masses (as many as possible)
• Protease(s) used or cleavage reagents
• Databases to search (SP, NR)
• Estimated mass and pI of protein spot (optional)
• Cysteine (or other) modifications
• Minimum number of hits for significance
• Mass tolerance (100 ppm = 1000.0 ± 0.1 Da)
• A PMF website (Prowl, ProFound, Mascot, PepIdent)
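Because a ppm tolerance scales with mass, the acceptance window in Da has to be computed per peak. A minimal sketch, with hypothetical helper names `ppm_window` and `within_ppm`:

```python
def ppm_window(mass, ppm):
    """Return the (low, high) mass window for a tolerance given in ppm."""
    delta = mass * ppm / 1e6
    return mass - delta, mass + delta


def within_ppm(observed, theoretical, ppm):
    """True when the observed mass lies inside the ppm window of the theoretical mass."""
    return abs(observed - theoretical) <= theoretical * ppm / 1e6


low, high = ppm_window(1000.0, 100)
print(f"100 ppm at 1000.0 Da: {low:.1f} to {high:.1f}")
print(within_ppm(1000.05, 1000.0, 100), within_ppm(1000.2, 1000.0, 100))
```

The same 100 ppm corresponds to ± 0.1 Da at 1000 Da but ± 0.2 Da at 2000 Da, which is why the tolerance is usually quoted in ppm rather than Da.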
Challenge 1: Overlap in combined masses
Gly + Gly = 114.043 -> Asn = 114.043
Ala + Gly = 128.059 -> Gln = 128.059
Ala + Gly = 128.059 -> Lys = 128.095
Gly + Val = 156.090 -> Arg = 156.101
Ala + Asp = Glu + Gly = 186.064 -> Trp = 186.079
Ser + Val = 186.100 -> Trp = 186.079
Leu = Ile = 113.084
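These collisions can be enumerated directly from the residue mass table. The sketch below searches every pair of residues whose summed mass falls within a tolerance of a single residue; the 0.05 Da tolerance and the name `isobaric_pairs` are assumptions made for illustration.

```python
from itertools import combinations_with_replacement

# Residue masses from the deck's monoisotopic mass table.
RESIDUE_MASS = {
    "G": 57.02147, "A": 71.03712, "S": 87.03203, "P": 97.05277, "V": 99.06842,
    "T": 101.04768, "C": 103.00919, "I": 113.08407, "L": 113.08407, "N": 114.04293,
    "D": 115.02695, "Q": 128.05858, "K": 128.09497, "E": 129.0426, "M": 131.04049,
    "H": 137.05891, "F": 147.06842, "R": 156.10112, "Y": 163.06333, "W": 186.07932,
}


def isobaric_pairs(tolerance):
    """List (pair, single residue, mass difference) where a pair mimics one residue."""
    found = []
    for a, b in combinations_with_replacement(sorted(RESIDUE_MASS), 2):
        pair_mass = RESIDUE_MASS[a] + RESIDUE_MASS[b]
        for single, mass in RESIDUE_MASS.items():
            if abs(pair_mass - mass) <= tolerance:
                found.append((a + b, single, round(pair_mass - mass, 5)))
    return found


for pair, single, diff in isobaric_pairs(0.05):
    print(f"{pair[0]} + {pair[1]} ~ {single} (difference {diff:+.5f} Da)")
```

Tightening the tolerance to 0.001 Da leaves only Gly + Gly vs Asn and Ala + Gly vs Gln, which no realistic mass accuracy can separate; Leu and Ile are identical in mass and are never distinguished by this kind of comparison.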
Challenge 2: Missed Cleavage
Sequence:
>Protein A
acedfhsakdfqeasdfpkivtmeeewendadnfekqwfe
Tryptic Fragments (no missed cleavage):
acedfhsak (1007.4251)
dfgeasdfpk (1183.5266)
ivtmeeewendadnfek (2098.8909)
gwfe (538.2296)
Tryptic Fragments (1 missed cleavage):
acedfhsak (1007.4251)
dfgeasdfpk (1183.5266)
ivtmeeewendadnfek (2098.8909)
gwfe (609.2667)
acedfhsakdfgeasdfpk (2171.9338)
ivtmeeewendadnfekgwfe (2689.1398)
dfgeasdfpkivtmeeewendadnfek (3263.2997)
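Allowing missed cleavages amounts to joining runs of adjacent fragments. A minimal sketch, taking the fully cleaved fragments as input; `with_missed_cleavages` is a hypothetical helper name, and the fragment spellings are the ones on the slide.

```python
def with_missed_cleavages(fragments, max_missed):
    """Join up to max_missed + 1 adjacent fragments, keeping sequence order."""
    peptides = []
    for missed in range(max_missed + 1):
        for start in range(len(fragments) - missed):
            peptides.append("".join(fragments[start:start + missed + 1]))
    return peptides


fragments = ["acedfhsak", "dfgeasdfpk", "ivtmeeewendadnfek", "gwfe"]
peptides = with_missed_cleavages(fragments, max_missed=1)
for peptide in peptides:
    print(peptide)
```

For n fragments, allowing one missed cleavage adds n - 1 peptides, so the four fragments here become seven candidates. The database grows accordingly, which increases the chance of random matches.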
Advantages of PMF
• Uses a “robust” & inexpensive form of MS (MALDI)
• Doesn’t require too much sample optimization
• Can be done by a moderately skilled operator (don’t need to be an MS expert)
• Widely supported by web servers
• Improves as databases get larger & instrumentation gets better
• Very amenable to high throughput robotics (up to 500 samples a day)
Limitations With PMF
• Requires that the protein of interest already be in a sequence database
• Not good for mixtures of 3 or more proteins
• Spurious or missing critical mass peaks always lead to problems
• Mass resolution/accuracy is critical; best to have <20 ppm mass accuracy
• Generally found to be only about 40% effective in positively identifying gel spots
Protein identification strategies
• Mass Spectrometry
  – Peptide Mass Fingerprinting
• Tandem Mass Spectrometry
  – Spectral alignment
  – de novo sequencing
Tandem Mass Spectrometry
MS-MS Peptide Fragmentation
b-ions (prefix or N-terminal ions)
[Figure: b-ion peaks for the peptide S E Q U E N C E; x-axis: mass/charge (m/z)]
a-ions = b-ions - CO = b-ions - 28
[Figure: a-ion peaks for the peptide S E Q U E N C E; x-axis: mass/charge (m/z)]
y-ions (suffix or C-terminal ions)
[Figure: y-ion peaks read from the C-terminus, E C N E U Q E S; x-axis: mass/charge (m/z)]
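The three ion series can be computed from residue masses. The sketch below is an illustration for singly charged ions only, using the peptide AVGELTK that appears later in the deck. The residue masses come from the deck's table; the proton mass (1.00728), the water mass and the exact CO mass (12.00000 + 15.99491) are assumed constants, and `fragment_ions` is a hypothetical name.

```python
# Residue masses for the residues of AVGELTK, from the deck's table.
RESIDUE_MASS = {"A": 71.03712, "V": 99.06842, "G": 57.02147, "E": 129.0426,
                "L": 113.08407, "T": 101.04768, "K": 128.09497}
PROTON = 1.00728   # assumed proton mass
WATER = 18.01056   # 2 x 1.007825 + 15.99491
CO = 27.99491      # 12.00000 + 15.99491, the "28" on the slide


def fragment_ions(peptide):
    """Singly charged b-, a- and y-ion m/z values for each backbone cleavage."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    b = [sum(masses[:i]) + PROTON for i in range(1, n)]               # prefixes
    a = [mz - CO for mz in b]                                         # b minus CO
    y = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]   # suffixes
    return {"b": b, "a": a, "y": y}


ions = fragment_ions("AVGELTK")
for series in ("b", "a", "y"):
    print(series, " ".join(f"{mz:.3f}" for mz in ions[series]))
```

A useful consistency check is that each b-ion and its complementary y-ion add up to the peptide (M+H) plus one proton.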
[Figure: fragment ion spectrum; x-axis: mass/charge (m/z); y-axis: intensity]
[Figure: spectrum with noise peaks; x-axis: mass/charge (m/z)]
MS/MS Spectrum
[Figure: MS/MS spectrum; x-axis: mass/charge (m/z); y-axis: intensity]
Some Mass Differences between Peaks Correspond to Amino Acids
[Figure: MS/MS spectrum in which gaps between peaks are labeled with the residues s, e, q, u, e, n, c, e]
Database search vs de novo
[Figure: MS/MS spectrum (S#: 1708, RT: 54.47, AV: 1, NL: 5.27E6, T: + c d Full ms2 638.00, range 165.00 - 1925.00); x-axis: m/z, 200 to 2000; y-axis: relative abundance, 0 to 100; labeled peaks at 226.9, 326.0, 397.1, 425.0, 489.1, 524.9, 588.1, 589.2, 629.0, 687.3, 850.3, 851.4, 949.4, 1048.6 and 1049.6; candidate residue letters are drawn between the peaks]
de novo: AVGELTK
Database Search, against a database of known peptides:
MDERHILNM, KLQWVCSDL, PTYWASDL, ENQIKRSACVM, TLACHGGEM, NGALPQWRT, HLLERTKMNVV, GGPASSDA, GGLITGMQSD, MQPLMNWE, ALKIIMNVRT, AVGELTK, HEWAILF, GHNLWAMNAC, GVFGSVLRA, EKLNKAATYIN..
SEQUEST Algorithm
• SEQUEST correlates uninterpreted tandem mass (MS-MS) spectra of peptides with amino acid sequences from protein and nucleotide databases
SEQUEST Algorithm
Sequence DB:
>A
acedfhsakdfqeasdfpkivtmeeewendadnfekgpfna
>B
acekdfhsadfqeasdfpkivtmeeewenkdadnfeqwfe
>C
acedfhsadfqekasdfpkivtmeeewendakdnfeqwfe
Calc. Tryptic Frags:
A: acedfhsak dfgeasdfpk ivtmeeewendadnfek gpfna
B: acek dfhsadfgeasdfpk ivtmeeewenk dadnfeqwfe
C: acedfhsadfgek asdfpk ivtmeeewendak dnfegwfe
Calc. MS-MS Spec.: [Figure: calculated MS-MS spectra for the tryptic fragments]
Creating a Synthetic MS-MS Spectrum for GPFNA
b ions: G (57), P (97), F (147), N (114), A (71); cumulative masses 57, 154, 301, 415, 486
y ions: A (71), N (114), F (147), P (97), G (57); cumulative masses 71, 185, 332, 429, 486
combine
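The ladder on this slide is a pair of running sums over nominal residue masses, read from each end of the peptide. A minimal sketch that follows the slide's simplification (no proton or water offsets are added); `synthetic_spectrum` is a hypothetical name.

```python
from itertools import accumulate

# Nominal (integer) residue masses as shown on the slide.
NOMINAL_MASS = {"G": 57, "P": 97, "F": 147, "N": 114, "A": 71}


def synthetic_spectrum(peptide):
    """Return the b ladder, the y ladder and their combined peak list."""
    residues = [NOMINAL_MASS[aa] for aa in peptide]
    b_ions = list(accumulate(residues))             # prefix sums from the N-terminus
    y_ions = list(accumulate(reversed(residues)))   # suffix sums from the C-terminus
    combined = sorted(set(b_ions) | set(y_ions))
    return b_ions, y_ions, combined


b_ions, y_ions, combined = synthetic_spectrum("GPFNA")
print("b ions:  ", b_ions)
print("y ions:  ", y_ions)
print("combined:", combined)
```

The last value of both ladders is the full residue sum, so it appears only once in the combined peak list.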
SEQUEST Algorithm
Query Spectrum: [Figure: query MS-MS spectrum]
Spectral Database: acedfhsak, mtlsyk, nmqtydr, giqwemncyk
Result: giqwemncyk
Score = 128
Accession P12345
Protein = p53
Org. Homo sapiens
SEQUEST XCorr (higher is better)
XCorr = CrossCorr / avg(AutoCorr, offset = -75 to 75)
CrossCorr: Cross Correlation (direct comparison)
AutoCorr: Auto Correlation (background)
Gentzel M. et al., Proteomics 3 (2003) 1597-1610
[Figure: correlation score plotted against offset (AMU)]
Method                                    Accuracy Score    Relative Score
SEQUEST                                   Strong (XCorr)    Weak (DeltaCn)
Alternate Method (Mascot and X! Tandem)   Weak              Strong
Mascot
Mascot Score: 120 = 1x10^-12
– Scoring is based on the peptide frequency distribution from a non-redundant database (MOWSE: Molecular Weight SEarch)
– The significance of a result depends on the size of the database being searched. Mascot shades the insignificant hits in green using a P=0.05 cutoff.
In this example, scores less than 74 are insignificant.
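The pairing of a score of 120 with 1x10^-12 is consistent with a score defined as -10 x log10(P), where P is the probability that the match is a random event. The sketch below assumes that relation; the function names are illustrative.

```python
import math


def score_to_probability(score):
    """Probability implied by a score, assuming score = -10 * log10(P)."""
    return 10 ** (-score / 10)


def probability_to_score(probability):
    """Score implied by a probability, assuming score = -10 * log10(P)."""
    return -10 * math.log10(probability)


print(score_to_probability(120))
print(probability_to_score(1e-12))
```

On this scale every 10 points is one order of magnitude in probability, so scores can be compared by subtraction.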
[Figure: Venn diagram of spectra identified by Mascot, SEQUEST and X!Tandem; regions labeled 9%, 19%, 7%, 34%, 5%, 4% and 22%]
Each search engine identifies about the same number of spectra, but the overlap is surprisingly small.
Different search engines match different spectra.
Each search engine scores differently.
Courtesy: Proteome Software Inc.
Database search vs de novo
[Figure: MS/MS spectrum (S#: 1708, RT: 54.47, AV: 1, NL: 5.27E6, T: + c d Full ms2 638.00, range 165.00 - 1925.00); x-axis: m/z, 200 to 2000; y-axis: relative abundance, 0 to 100; labeled peaks at 226.9, 326.0, 397.1, 425.0, 489.1, 524.9, 588.1, 589.2, 629.0, 687.3, 850.3, 851.4, 949.4, 1048.6 and 1049.6; candidate residue letters are drawn between the peaks]
de novo: AVGELTK
Database Search, against a database of known peptides:
MDERHILNM, KLQWVCSDL, PTYWASDL, ENQIKRSACVM, TLACHGGEM, NGALPQWRT, HLLERTKMNVV, GGPASSDA, GGLITGMQSD, MQPLMNWE, ALKIIMNVRT, AVGELTK, HEWAILF, GHNLWAMNAC, GVFGSVLRA, EKLNKAATYIN..
de novo vs Database Search: A Paradox
• A database search scans all peptides to find the best one.
• de novo eliminates the need to scan all peptides by modeling the problem as a graph search.
• de novo algorithms are much faster, even though their search space is much larger!
• Done when there is no PMF or MS/MS spectral match
• Advantage:
  – Gets the sequences that are not necessarily in the database.
• Disadvantage:
  – Requires higher quality spectra to be accurate.
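The graph view can be sketched in a few lines: peaks are nodes, an edge joins two peaks whose mass difference equals a residue mass, and a sequence is any path from mass 0 to the total mass. This is a toy illustration under strong assumptions, not any published algorithm: it uses nominal integer masses, treats all peaks as prefix (b-type) sums without proton offsets, reports Leu/Ile as L and Gln/Lys as K because their nominal masses coincide, and `denovo_paths` is a hypothetical name.

```python
# Nominal residue masses; L stands for Leu/Ile and K for Lys/Gln (same nominal mass).
NOMINAL = {57: "G", 71: "A", 87: "S", 97: "P", 99: "V", 101: "T", 103: "C",
           113: "L", 114: "N", 115: "D", 128: "K", 129: "E", 131: "M",
           137: "H", 147: "F", 156: "R", 163: "Y", 186: "W"}


def denovo_paths(peaks, total):
    """Return every residue string spelled by a path from 0 to total."""
    nodes = sorted(set(peaks) | {0, total})

    def extend(node):
        if node == total:
            return [""]
        sequences = []
        for nxt in nodes:
            residue = NOMINAL.get(nxt - node)  # edge only if the gap is a residue mass
            if residue is not None:
                sequences.extend(residue + tail for tail in extend(nxt))
        return sequences

    return extend(0)


# Prefix masses of GPFNA plus two noise peaks (200 and 350).
print(denovo_paths([57, 154, 200, 301, 350, 415], total=486))
```

The noise peaks are ignored because no chain of residue-sized steps connects them from 0 to the total mass. With real spectra, missing peaks break such chains, which is one reason de novo sequencing needs higher quality spectra.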
de novo sequencing is not very accurate:
• Less than 30% of the peptides sequenced were completely correct!
Algorithm         Amino Acid Accuracy    Whole Peptide Accuracy
Lutefisk, 1997    0.566                  0.189
SHERENGA, 1999    0.690                  0.289
Peaks, 2003       0.673                  0.246
PepNovo, 2005     0.727                  0.296
Protein identification strategies
• Mass Spectrometry
  – Peptide Mass Fingerprinting
• Tandem Mass Spectrometry
  – Spectral alignment
  – de novo sequencing
References
• SLIDES
  – Proteomics. 2005 Canadian Bioinformatics Workshops. David Wishart, Gary Van Domselaar. http://bioinformatics.ca/workshop_pages/proteomics2005/index.html
  – Protein Sequencing and Identification by Mass Spectrometry. http://bioalgorithms.info
  – Interpreting MS/MS Proteomics Results. Brian C. Searle. Proteome Software Inc.
• Aebersold R, Mann M. Mass spectrometry-based proteomics. Nature. 2003 Mar 13;422(6928):198-207. Review.
• Mueller LN, Brusniak MY, Mani DR, Aebersold R. An assessment of software solutions for the analysis of mass spectrometry based quantitative proteomics data. J Proteome Res. 2008 Jan;7(1):51-61.
• MOWSE: Pappin DJC, Hojrup P, and Bleasby AJ (1993) Rapid identification of proteins by peptide-mass fingerprinting. Curr. Biol. 3:327-332
• MASCOT: Perkins DN, Pappin DJC, Creasy DM, and Cottrell JS (1999) Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20:3551-3567.