mass spectrometry: protein identification strategies

Mass Spectrometry Protein Identification Strategies Michel Dumontier Carleton University 06/12/2022 1 OISB: The ABC of Mass Spectrometry for Biology Workshop

Upload: michel-dumontier

Post on 10-May-2015


Category:

Education



DESCRIPTION

A talk on the basics of protein identification for mass spectrometry.

TRANSCRIPT

Page 1: Mass Spectrometry: Protein Identification Strategies

04/11/2023 OISB: The ABC of Mass Spectrometry for Biology Workshop 1

Mass Spectrometry: Protein Identification Strategies

Michel Dumontier, Carleton University

Page 2: Mass Spectrometry: Protein Identification Strategies


Typical MS experiment

Protein Identification

Page 3: Mass Spectrometry: Protein Identification Strategies


Protein identification strategies

• Mass Spectrometry
  – Peptide Mass Fingerprinting

• Tandem Mass Spectrometry
  – Spectral alignment
  – de novo sequencing

Page 4: Mass Spectrometry: Protein Identification Strategies


Peptide Mass Fingerprinting (PMF)

Page 5: Mass Spectrometry: Protein Identification Strategies


Matrix-Assisted Laser Desorption/Ionization (MALDI)

Page 6: Mass Spectrometry: Protein Identification Strategies


Electrospray Ionization (ESI)


Page 8: Mass Spectrometry: Protein Identification Strategies


Peptide Mass Fingerprinting

• Identify a protein from peptide signature
  – MALDI-TOF, ESI-TOF

• Approach
  – Compare observed with theoretical masses

• Requirements
  – Protease & cleavage pattern
  – Database of known sequences

Page 9: Mass Spectrometry: Protein Identification Strategies


Principles of Fingerprinting

>Protein A
Sequence: acedfhsakdfqeasdfpkivtmeeewendadnfekqwfe
Mass (M+H): 4842.05
Tryptic fragments: acedfhsak dfgeasdfpk ivtmeeewendadnfek gwfe

>Protein B
Sequence: acekdfhsadfqeasdfpkivtmeeewenkdadnfeqwfe
Mass (M+H): 4842.05
Tryptic fragments: acek dfhsadfgeasdfpk ivtmeeewenk dadnfeqwfe

>Protein C
Sequence: acedfhsadfqekasdfpkivtmeeewendakdnfeqwfe
Mass (M+H): 4842.05
Tryptic fragments: acedfhsadfgek asdfpk ivtmeeewendak dnfegwfe

Page 10: Mass Spectrometry: Protein Identification Strategies


Principles of Fingerprinting

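>Protein A
Sequence: acedfhsakdfqeasdfpkivtmeeewendadnfekqwfe
Mass (M+H): 4842.05

>Protein B
Sequence: acekdfhsadfqeasdfpkivtmeeewenkdadnfeqwfe
Mass (M+H): 4842.05

>Protein C
Sequence: acedfhsadfqekasdfpkivtmeeewendakdnfeqwfe
Mass (M+H): 4842.05

[Figure: mass spectrum of the tryptic digest of each protein, not reproduced]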

Page 11: Mass Spectrometry: Protein Identification Strategies


Mass Calculation (Glycine)

Free amino acid: NH2-CH2-COOH

Amino acid residue: R1-NH-CH2-CO-R3

Monoisotopic Mass
1H = 1.007825
12C = 12.00000
14N = 14.00307
16O = 15.99491

Glycine Free Amino Acid Mass
5xH + 2xC + 2xO + 1xN = 75.032015 amu

Glycine Residue Mass
3xH + 2xC + 1xO + 1xN = 57.021455 amu
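The glycine arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch that uses only the four monoisotopic atomic masses listed above; the helper name `formula_mass` is made up for illustration.

```python
# Monoisotopic atomic masses as listed on the slide (amu).
MONO = {"H": 1.007825, "C": 12.00000, "N": 14.00307, "O": 15.99491}

def formula_mass(counts):
    """Sum monoisotopic masses for an elemental composition, e.g. {"C": 2, "H": 5}."""
    return sum(MONO[element] * n for element, n in counts.items())

glycine_free = formula_mass({"H": 5, "C": 2, "O": 2, "N": 1})     # NH2-CH2-COOH
glycine_residue = formula_mass({"H": 3, "C": 2, "O": 1, "N": 1})  # -NH-CH2-CO-
water = formula_mass({"H": 2, "O": 1})

print(f"free amino acid: {glycine_free:.6f} amu")
print(f"residue:         {glycine_residue:.6f} amu")
print(f"difference:      {glycine_free - glycine_residue:.6f} amu (one water)")
```

The free amino acid and the residue differ by exactly one water, which is lost when the peptide bond forms.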

Page 12: Mass Spectrometry: Protein Identification Strategies

Monoisotopic vs average mass

Monoisotopic mass is the mass calculated using the mass of the most abundant isotope of each element.

Average mass is the abundance-weighted mass over all isotopic components.
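As a small illustration of the two definitions, the sketch below computes both masses for carbon. The isotope masses and natural abundances are rounded textbook values brought in for this example; they are not taken from the slides.

```python
# Carbon isotopes as (mass in amu, natural abundance).
# Assumed, rounded reference values used for illustration only.
CARBON_ISOTOPES = [(12.000000, 0.9893), (13.003355, 0.0107)]

def monoisotopic_mass(isotopes):
    """Mass of the single most abundant isotope."""
    return max(isotopes, key=lambda iso: iso[1])[0]

def average_mass(isotopes):
    """Abundance-weighted mean over all isotopes."""
    return sum(mass * abundance for mass, abundance in isotopes)

print(f"monoisotopic C: {monoisotopic_mass(CARBON_ISOTOPES):.4f}")
print(f"average C:      {average_mass(CARBON_ISOTOPES):.4f}")
```

For a peptide with dozens of carbons the gap between the two adds up, which is why a search must state which mass type it uses.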

Page 13: Mass Spectrometry: Protein Identification Strategies


Amino Acid Residues: Monoisotopic Masses

Glycine         57.02147
Alanine         71.03712
Serine          87.03203
Proline         97.05277
Valine          99.06842
Threonine      101.04768
Cysteine       103.00919
Isoleucine     113.08407
Leucine        113.08407
Asparagine     114.04293
Aspartic acid  115.02695
Glutamine      128.05858
Lysine         128.09497
Glutamic acid  129.0426
Methionine     131.04049
Histidine      137.05891
Phenylalanine  147.06842
Arginine       156.10112
Tyrosine       163.06333
Tryptophan     186.07932

Page 14: Mass Spectrometry: Protein Identification Strategies


Building a PMF Database

• Download protein sequence database
  – SwissProt or GenBank’s NR (non-redundant)

• Pick a protease, determine cleavage sites and identify resulting peptides for each protein entry

• Calculate the mass (M+H) for each peptide

• Sort the mass list

Page 15: Mass Spectrometry: Protein Identification Strategies


Building A PMF Database

Sequence DB:
>Protein A
acedfhsakdfqeasdfpkivtmeeewendadnfekqwfe
>Protein B
acekdfhsadfqeasdfpkivtmeeewenkdadnfeqwfe
>Protein C
acedfhsadfqekasdfpkivtmeeewendakdnfeqwfe

Calc. Tryptic Frags:
A: acedfhsak dfgeasdfpk ivtmeeewendadnfek gwfe
B: acek dfhsadfgeasdfpk ivtmeeewenk dadnfeqwfe
C: acedfhsadfgek asdfpk ivtmeeewendak dnfegwfe

Mass List:
 450.2017 (B-1)
 538.2296 (A-4)
 664.3300 (C-2)
1007.4251 (A-1)
1112.4894 (A-2)
1114.4416 (C-4)
1300.5116 (B-4)
1407.6462 (B-3)
1526.6211 (C-1)
1593.7101 (C-3)
1740.7500 (B-2)
2098.8909 (A-3)
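The database-building steps (digest, compute M+H, sort) can be sketched in Python as follows. This is a minimal sketch under several assumptions: the sequences are typed as they appear in the tryptic-fragment column, the cleavage rule is "after K or R unless followed by P" (the slides do not state a rule), and the water and proton masses are standard values not given in the deck. All function names are hypothetical.

```python
# Residue masses from the "Amino Acid Residues" slide.
RESIDUE_MASS = {
    "G": 57.02147, "A": 71.03712, "S": 87.03203, "P": 97.05277, "V": 99.06842,
    "T": 101.04768, "C": 103.00919, "I": 113.08407, "L": 113.08407, "N": 114.04293,
    "D": 115.02695, "Q": 128.05858, "K": 128.09497, "E": 129.0426, "M": 131.04049,
    "H": 137.05891, "F": 147.06842, "R": 156.10112, "Y": 163.06333, "W": 186.07932,
}
WATER = 18.010565   # assumed standard monoisotopic H2O mass
PROTON = 1.007276   # assumed standard proton mass

def tryptic_digest(sequence):
    """Cleave after K or R, except when the next residue is P (assumed rule)."""
    seq = sequence.upper()
    fragments, start = [], 0
    for i, residue in enumerate(seq):
        next_residue = seq[i + 1] if i + 1 < len(seq) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])
    return fragments

def mh_mass(peptide):
    """Singly protonated monoisotopic mass (M+H) of a peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER + PROTON

def build_pmf_database(proteins):
    """Return a mass-sorted list of (M+H, label, peptide) entries."""
    entries = []
    for name, sequence in proteins.items():
        for index, peptide in enumerate(tryptic_digest(sequence), start=1):
            entries.append((round(mh_mass(peptide), 4), f"{name}-{index}", peptide))
    return sorted(entries)

PROTEINS = {
    "A": "acedfhsakdfgeasdfpkivtmeeewendadnfekgwfe",
    "B": "acekdfhsadfgeasdfpkivtmeeewenkdadnfeqwfe",
    "C": "acedfhsadfgekasdfpkivtmeeewendakdnfegwfe",
}
database = build_pmf_database(PROTEINS)
for mass, label, peptide in database:
    print(f"{mass:10.4f} ({label}) {peptide.lower()}")
```

Sorting by mass means a query peak can later be located with a binary search instead of a scan of every entry.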

Page 16: Mass Spectrometry: Protein Identification Strategies


The Fingerprint (PMF) Approach

• Take a mass spectrum of a protease-cleaved protein (from gel or HPLC peak)

• Identify as many peaks as possible in spectrum

• Compare query peaks with database peaks and calculate # of matches or matching score (based on length and mass difference)

• Rank hits and return top scoring entry (having the most matching peptides) – the protein of interest

Page 17: Mass Spectrometry: Protein Identification Strategies


Query (MALDI) Spectrum

[Figure: MALDI spectrum, mass axis from 500 to 2500, with labeled peaks at 450, 538, 698, 1007, 1199 and 2098, plus two trypsin peaks at 1940 and 2211]

Page 18: Mass Spectrometry: Protein Identification Strategies


Query vs. Database

Query Masses:
 450.2201
 538.2296
 698.3100
1007.5391
1199.4916
2098.9909

Database Mass List:
 450.2017 (B)
 538.2296 (A)
 664.3300 (C)
1007.4251 (A)
1112.4894 (A)
1114.4416 (C)
1300.5116 (B)
1407.6462 (B)
1526.6211 (C)
1593.7101 (C)
1740.7501 (B)
2098.8909 (A)

Results:
2 unknown masses
1 hit on B
3 hits on A

Conclude the query protein is A
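The comparison on this slide can be reproduced with a short matching routine. This is a sketch, not the scoring used by any particular PMF server: it counts hits per protein using a fixed window of 0.2 Da, an assumed value chosen because the slide states no tolerance for this example. The 1007.5391 query lies roughly 113 ppm from the database entry, so a 100 ppm window would miss it.

```python
from collections import Counter

# Database mass list and query masses copied from the slide.
DATABASE = [
    (450.2017, "B"), (538.2296, "A"), (664.3300, "C"), (1007.4251, "A"),
    (1112.4894, "A"), (1114.4416, "C"), (1300.5116, "B"), (1407.6462, "B"),
    (1526.6211, "C"), (1593.7101, "C"), (1740.7501, "B"), (2098.8909, "A"),
]
QUERY = [450.2201, 538.2296, 698.3100, 1007.5391, 1199.4916, 2098.9909]

def match_peaks(query, database, tolerance_da):
    """Count, per protein, how many query masses fall within tolerance of a database mass."""
    hits = Counter()
    unknown = []
    for q in query:
        matched = {protein for mass, protein in database if abs(mass - q) <= tolerance_da}
        if matched:
            hits.update(matched)
        else:
            unknown.append(q)
    return hits, unknown

hits, unknown = match_peaks(QUERY, DATABASE, tolerance_da=0.2)  # assumed tolerance
best = hits.most_common(1)[0][0]
print(dict(hits), "unknown:", unknown, "best:", best)
```

Real search tools go beyond a raw hit count, for example by weighting matches by how common a peptide mass is in the database.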

Page 19: Mass Spectrometry: Protein Identification Strategies


What You Need To Do PMF

• A list of query masses (as many as possible)
• Protease(s) used or cleavage reagents
• Databases to search (SP, NR)
• Estimated mass and pI of protein spot (optional)
• Cysteine (or other) modifications
• Minimum number of hits for significance
• Mass tolerance (100 ppm = 1000.0 ± 0.1 Da)
• A PMF website (Prowl, ProFound, Mascot, PepIdent)
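The mass tolerance item relates parts per million to an absolute window in daltons. A minimal sketch of that conversion, with hypothetical helper names:

```python
def ppm_to_da(mass, ppm):
    """Absolute tolerance in Da for a relative tolerance in ppm at a given mass."""
    return mass * ppm / 1e6

def within_tolerance(observed, theoretical, ppm):
    """True if the observed mass lies inside the ppm window around the theoretical mass."""
    return abs(observed - theoretical) <= ppm_to_da(theoretical, ppm)

print(ppm_to_da(1000.0, 100))                  # window half-width at 1000 Da
print(within_tolerance(1000.05, 1000.0, 100))  # inside the window
print(within_tolerance(1000.20, 1000.0, 100))  # outside the window
```

Because the window scales with mass, 100 ppm is 0.1 Da at 1000 Da but about 0.2 Da at 2000 Da.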

Page 20: Mass Spectrometry: Protein Identification Strategies


Challenge 1: Overlap in combined masses

Gly + Gly = 114.043 -> Asn = 114.043
Ala + Gly = 128.059 -> Gln = 128.059
                    -> Lys = 128.095
Gly + Val = 156.090 -> Arg = 156.101
Ala + Asp = Glu + Gly = 186.064 -> Trp = 186.079
Ser + Val = 186.100 -> Trp = 186.079
Leu = Ile = 113.084
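These overlaps can be found systematically by comparing every residue pair against every single residue. The sketch below uses the residue masses from the deck's table; the 0.05 Da window is an arbitrary assumption picked so that the near-matches listed on the slide (such as Ala + Gly against Lys) are reported.

```python
from itertools import combinations_with_replacement

# Residue masses from the "Amino Acid Residues" slide.
RESIDUE_MASS = {
    "G": 57.02147, "A": 71.03712, "S": 87.03203, "P": 97.05277, "V": 99.06842,
    "T": 101.04768, "C": 103.00919, "I": 113.08407, "L": 113.08407, "N": 114.04293,
    "D": 115.02695, "Q": 128.05858, "K": 128.09497, "E": 129.0426, "M": 131.04049,
    "H": 137.05891, "F": 147.06842, "R": 156.10112, "Y": 163.06333, "W": 186.07932,
}

def isobaric_pairs(tolerance_da=0.05):
    """Map each residue pair to the single residues whose mass is within tolerance of the pair's sum."""
    overlaps = {}
    for a, b in combinations_with_replacement(sorted(RESIDUE_MASS), 2):
        pair_mass = RESIDUE_MASS[a] + RESIDUE_MASS[b]
        singles = sorted(r for r, m in RESIDUE_MASS.items() if abs(m - pair_mass) <= tolerance_da)
        if singles:
            overlaps[a + b] = (round(pair_mass, 3), singles)
    return overlaps

overlaps = isobaric_pairs()
for pair, (mass, singles) in overlaps.items():
    print(f"{pair[0]} + {pair[1]} = {mass:.3f} -> {', '.join(singles)}")
```

Leucine and isoleucine are a separate case: they have identical elemental composition, so no mass accuracy can tell them apart.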

Page 21: Mass Spectrometry: Protein Identification Strategies


Challenge 2: Missed Cleavage

Sequence:
>Protein A
acedfhsakdfqeasdfpkivtmeeewendadnfekqwfe

Tryptic Fragments (no missed cleavage):
acedfhsak (1007.4251)
dfqeasdfpk (1183.5266)
ivtmeeewendadnfek (2098.8909)
qwfe (609.2667)

Tryptic Fragments (1 missed cleavage), in addition to the four above:
acedfhsakdfqeasdfpk (2171.9338)
ivtmeeewendadnfekqwfe (2689.1398)
dfqeasdfpkivtmeeewendadnfek (3263.3997)
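Allowing missed cleavages means joining neighbouring fragments of the complete digest. The sketch below does this for the Protein A sequence as written in this slide's header line. The cleavage rule (after K or R, not before P) and the water and proton masses are assumptions not stated in the deck, and the function names are hypothetical.

```python
RESIDUE_MASS = {
    "G": 57.02147, "A": 71.03712, "S": 87.03203, "P": 97.05277, "V": 99.06842,
    "T": 101.04768, "C": 103.00919, "I": 113.08407, "L": 113.08407, "N": 114.04293,
    "D": 115.02695, "Q": 128.05858, "K": 128.09497, "E": 129.0426, "M": 131.04049,
    "H": 137.05891, "F": 147.06842, "R": 156.10112, "Y": 163.06333, "W": 186.07932,
}
WATER, PROTON = 18.010565, 1.007276  # assumed standard values

def tryptic_digest(sequence):
    """Complete digest: cleave after K or R unless the next residue is P (assumed rule)."""
    seq = sequence.upper()
    fragments, start = [], 0
    for i, residue in enumerate(seq):
        if residue in "KR" and seq[i + 1:i + 2] != "P":
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])
    return fragments

def mh_mass(peptide):
    """Singly protonated monoisotopic mass (M+H)."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER + PROTON

def digest_with_missed_cleavages(sequence, max_missed=1):
    """Join up to max_missed + 1 adjacent fragments of the complete digest."""
    fragments = tryptic_digest(sequence)
    peptides = []
    for missed in range(max_missed + 1):
        for i in range(len(fragments) - missed):
            peptides.append("".join(fragments[i:i + missed + 1]))
    return peptides

peptides = digest_with_missed_cleavages("acedfhsakdfqeasdfpkivtmeeewendadnfekqwfe")
for peptide in peptides:
    print(f"{peptide.lower()} ({mh_mass(peptide):.4f})")
```

With four complete fragments, one allowed missed cleavage adds three longer peptides, so the theoretical mass list nearly doubles.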

Page 22: Mass Spectrometry: Protein Identification Strategies


Advantages of PMF

• Uses a “robust” & inexpensive form of MS (MALDI)
• Doesn’t require too much sample optimization
• Can be done by a moderately skilled operator (don’t need to be an MS expert)
• Widely supported by web servers
• Improves as DB’s get larger & instrumentation gets better
• Very amenable to high throughput robotics (up to 500 samples a day)

Page 23: Mass Spectrometry: Protein Identification Strategies


Limitations With PMF

• Requires that the protein of interest already be in a sequence database
• Not good for 3+ protein mixtures
• Spurious or missing critical mass peaks always lead to problems
• Mass resolution/accuracy is critical, best to have <20 ppm mass resolution
• Generally found to only be about 40% effective in positively identifying gel spots

Page 24: Mass Spectrometry: Protein Identification Strategies


Protein identification strategies

• Mass Spectrometry
  – Peptide Mass Fingerprinting

• Tandem Mass Spectrometry
  – Spectral alignment
  – de novo sequencing

Page 25: Mass Spectrometry: Protein Identification Strategies


Tandem Mass Spectrometry

Page 26: Mass Spectrometry: Protein Identification Strategies


MS-MS Peptide Fragmentation

Page 27: Mass Spectrometry: Protein Identification Strategies


S E Q U E N C E

b-ions (prefix or N-terminal ions)

Mass/Charge (M/Z)

Page 28: Mass Spectrometry: Protein Identification Strategies


a-ions = b-ions - CO = b-ions - 28

Mass/Charge (M/Z)

S E Q U E N C E

Page 29: Mass Spectrometry: Protein Identification Strategies


y-ions (suffix or C-terminal ions)

Mass/Charge (M/Z)

E C N E U Q E S
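The three ion series can be written as formulas for a peptide with residues $r_1 \dots r_n$ and residue masses $m(r_j)$. This is a sketch for singly charged ions using standard monoisotopic values for the proton, water and CO, which are assumptions not given on the slides:

```latex
\begin{align*}
  b_i &= \sum_{j=1}^{i} m(r_j) + m_{\mathrm{H^+}}
      && \text{(prefix, N-terminal)} \\
  a_i &= b_i - m_{\mathrm{CO}} \approx b_i - 27.995
      && \text{(the slide rounds this to 28)} \\
  y_i &= \sum_{j=n-i+1}^{n} m(r_j) + m_{\mathrm{H_2O}} + m_{\mathrm{H^+}}
      && \text{(suffix, C-terminal)}
\end{align*}
% Assumed constants: m_{H^+} = 1.00728, m_{H_2O} = 18.01056, m_{CO} = 27.99491
```

For a singly charged precursor, complementary ions satisfy $b_i + y_{n-i} = (M + H) + m_{\mathrm{H^+}}$, which is one way to pair peaks in a spectrum.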

Page 30: Mass Spectrometry: Protein Identification Strategies


[Figure: spectrum; x-axis: Mass/Charge (M/Z), y-axis: Intensity]

Page 31: Mass Spectrometry: Protein Identification Strategies


[Figure: spectrum with noise peaks added; x-axis: Mass/Charge (M/Z)]

Page 32: Mass Spectrometry: Protein Identification Strategies


MS/MS Spectrum

[Figure: MS/MS spectrum; x-axis: Mass/Charge (M/z), y-axis: Intensity]

Page 33: Mass Spectrometry: Protein Identification Strategies


Some Mass Differences between Peaks Correspond to Amino Acids

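[Figure: MS/MS spectrum in which the gaps between peaks are labeled with the residue letters of the example peptide SEQUENCE (s, e, q, u, e, n, c, e)]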

Page 34: Mass Spectrometry: Protein Identification Strategies


Database search vs de novo

[Figure: MS/MS spectrum (S#: 1708, RT: 54.47, AV: 1, NL: 5.27E6; T: + c d Full ms2 638.00 [165.00 - 1925.00]). x-axis: m/z, 200 to 2000; y-axis: Relative Abundance, 0 to 100. Labeled peaks: 226.9, 326.0, 397.1, 425.0, 489.1, 524.9, 588.1, 589.2, 629.0, 687.3, 850.3, 851.4, 949.4, 1048.6, 1049.6. Candidate residue letters are drawn between the peaks.]

de novo: AVGELTK

Database Search: database of known peptides

MDERHILNM, KLQWVCSDL, PTYWASDL, ENQIKRSACVM, TLACHGGEM, NGALPQWRT, HLLERTKMNVV, GGPASSDA, GGLITGMQSD, MQPLMNWE, ALKIIMNVRT, AVGELTK, HEWAILF, GHNLWAMNAC, GVFGSVLRA, EKLNKAATYIN..

Page 35: Mass Spectrometry: Protein Identification Strategies


SEQUEST Algorithm

• SEQUEST correlates uninterpreted tandem mass (MS-MS) spectra of peptides with amino acid sequences from protein and nucleotide databases

Page 36: Mass Spectrometry: Protein Identification Strategies


SEQUEST Algorithm

Sequence DB:
>A
acedfhsakdfqeasdfpkivtmeeewendadnfekgpfna
>B
acekdfhsadfqeasdfpkivtmeeewenkdadnfeqwfe
>C
acedfhsadfqekasdfpkivtmeeewendakdnfeqwfe

Calc. Tryptic Frags:
A: acedfhsak dfgeasdfpk ivtmeeewendadnfek gpfna
B: acek dfhsadfgeasdfpk ivtmeeewenk dadnfeqwfe
C: acedfhsadfgek asdfpk ivtmeeewendak dnfegwfe

Calc. MS-MS Spec.:
[Figure: calculated MS-MS spectrum for each tryptic fragment, not reproduced]

Page 37: Mass Spectrometry: Protein Identification Strategies


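Creating a Synthetic MS-MS Spectrum for GPFNA

b ions:
Residues: G (57), P (97), F (147), N (114), A (71)
Peaks:    57, 154, 301, 415, 486

y ions:
Residues: A (71), N (114), F (147), P (97), G (57)
Peaks:    71, 185, 332, 429, 486

Combine the b and y ion peaks into one synthetic spectrum.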
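The two ladders for GPFNA are running sums of the nominal residue masses, read from either end of the peptide. A minimal sketch that reproduces the slide's numbers follows. As on the slide, it leaves out the proton and water offsets that real b and y ions carry, so it illustrates the construction rather than predicting m/z values.

```python
from itertools import accumulate

# Nominal residue masses as used on the slide.
NOMINAL = {"G": 57, "P": 97, "F": 147, "N": 114, "A": 71}

def ladders(peptide):
    """Return (b, y) ladders as cumulative sums from the N- and C-terminus."""
    masses = [NOMINAL[r] for r in peptide]
    b = list(accumulate(masses))
    y = list(accumulate(reversed(masses)))
    return b, y

b_ions, y_ions = ladders("GPFNA")
synthetic = sorted(set(b_ions) | set(y_ions))  # combined synthetic spectrum

print("b:", b_ions)
print("y:", y_ions)
print("combined:", synthetic)
```

Both ladders end at 486, the summed residue mass of the whole peptide, so that peak appears only once in the combined spectrum.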

Page 38: Mass Spectrometry: Protein Identification Strategies


SEQUEST Algorithm

Query Spectrum:
[Figure: query MS-MS spectrum, not reproduced]

Spectral Database:
acedfhsak
mtlsyk
nmqtydr
giqwemncyk

Result:
giqwemncyk
Score = 128
Accession P12345
Protein = p53
Org. Homo sapiens

Page 39: Mass Spectrometry: Protein Identification Strategies


SEQUEST XCorr (higher is better)

XCorr = CrossCorr / avg(AutoCorr, offset = -75 to 75)

CrossCorr: Cross Correlation (direct comparison)
AutoCorr: Auto Correlation (background)

Gentzel M. et al Proteomics 3 (2003) 1597-1610

[Figure: Correlation Score plotted against Offset (AMU)]
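The idea behind this score can be sketched in plain Python: compare the query and a theoretical spectrum directly at zero offset, then normalise by the average correlation when one spectrum is slid across a window of -75 to 75 mass units. This is an illustrative reading of the slide's ratio, not SEQUEST's actual implementation. The integer binning, the unit intensities and all function names are assumptions, and published descriptions of SEQUEST define the background term in more detail than the slide does.

```python
def bin_spectrum(peaks, size=600):
    """Place (m/z, intensity) peaks into a vector indexed by integer m/z."""
    vector = [0.0] * size
    for mz, intensity in peaks:
        vector[int(round(mz))] += intensity
    return vector

def correlation(x, y, offset):
    """Dot product of x with y shifted by `offset` bins."""
    total = 0.0
    for i, xi in enumerate(x):
        j = i + offset
        if 0 <= j < len(y):
            total += xi * y[j]
    return total

def xcorr_score(observed, theoretical, max_offset=75):
    """Direct comparison divided by the mean correlation over the offset window."""
    direct = correlation(observed, theoretical, 0)
    offsets = range(-max_offset, max_offset + 1)
    background = sum(correlation(observed, theoretical, t) for t in offsets) / len(offsets)
    return direct / background if background > 0 else 0.0

# Toy data: the nominal GPFNA ladder peaks versus an unrelated peak list.
gpfna = bin_spectrum([(m, 1.0) for m in (57, 71, 154, 185, 301, 332, 415, 429, 486)])
other = bin_spectrum([(m, 1.0) for m in (60, 100, 200, 250, 350, 400)])

print("match:   ", round(xcorr_score(gpfna, gpfna), 2))
print("mismatch:", round(xcorr_score(gpfna, other), 2))
```

Dividing by the background penalises spectra that correlate equally well at any shift, which is what a dense, noisy spectrum tends to do.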

Page 40: Mass Spectrometry: Protein Identification Strategies


Method                                    Accuracy Score    Relative Score
SEQUEST                                   Strong (XCorr)    Weak (DeltaCn)
Alternate Method (Mascot and X! Tandem)   Weak              Strong

Page 41: Mass Spectrometry: Protein Identification Strategies


Mascot

Mascot Score: 120 = 1x10^-12

– Scoring based on peptide frequency distribution from a non-redundant database (MOWSE – Molecular Weight SEarch)

– The significance of that result depends on the size of the database being searched. Mascot shades in green the insignificant hits using a P=0.05 cutoff.

In this example, scores less than 74 are insignificant
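The score of 120 and the probability of 1x10^-12 are linked by the relation score = -10 * log10(P), where P is the probability that the match is a random event. A minimal sketch of the conversion in both directions, with hypothetical function names:

```python
import math

def score_to_probability(score):
    """Probability that a match with this score is a random event."""
    return 10 ** (-score / 10)

def probability_to_score(p):
    """Score corresponding to a random-match probability p."""
    return -10 * math.log10(p)

print(score_to_probability(120))    # the example on the slide
print(score_to_probability(74))     # the significance threshold in this example
print(probability_to_score(1e-12))
```

Every 10 points of score corresponds to one order of magnitude in probability.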


Page 46: Mass Spectrometry: Protein Identification Strategies


[Figure: Venn diagram of spectra identified by Mascot, SEQUEST and X!tandem; region labels: 9%, 19%, 7%, 34%, 5%, 4%, 22%]

Each search engine identifies about the same number of spectra, but the overlap is surprisingly small.

Different search engines match different spectra.

Each search engine scores differently.

Courtesy: Proteome Software Inc.

Page 47: Mass Spectrometry: Protein Identification Strategies


Database search vs de novo

[Figure: the same MS/MS spectrum (S#: 1708, RT: 54.47) and database of known peptides as on Page 34; de novo result: AVGELTK]

Page 48: Mass Spectrometry: Protein Identification Strategies


de novo vs Database Search: A Paradox

• A database search scans all peptides to find the best one.
• de novo eliminates the need to scan all peptides by modeling the problem as a graph search.
• de novo algorithms are much faster, even though their search space is much larger!
• Done when no PMF or MS/MS spectral match is found

• Advantage:
  – Gets the sequences that are not necessarily in the database.

• Disadvantage:
  – Requires higher quality spectra to be accurate.
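The graph-search idea can be sketched as follows: treat each peak as a node, draw an edge between two peaks when their mass difference equals a residue mass, and read sequences off the paths from 0 to the total mass. This toy version assumes a clean prefix (b-type) ladder with nominal integer masses and no offsets. Real de novo tools score noisy peaks and both ion types, and the function name `de_novo` is hypothetical.

```python
# Nominal residue masses. I shares L's mass and Q shares K's nominal mass,
# so each of those pairs is represented by a single letter here.
NOMINAL = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103, "L": 113,
    "N": 114, "D": 115, "K": 128, "E": 129, "M": 131, "H": 137, "F": 147,
    "R": 156, "Y": 163, "W": 186,
}

def de_novo(prefix_masses, total_mass):
    """Return every residue sequence spelled by a path from 0 to total_mass."""
    nodes = sorted(set(prefix_masses) | {0, total_mass})
    mass_to_residue = {mass: residue for residue, mass in NOMINAL.items()}
    # Edge u -> v exists when the gap between two peaks equals a residue mass.
    edges = {
        u: [(v, mass_to_residue[v - u]) for v in nodes if v - u in mass_to_residue]
        for u in nodes
    }
    sequences = []

    def walk(node, path):
        if node == total_mass:
            sequences.append("".join(path))
            return
        for next_node, residue in edges[node]:
            walk(next_node, path + [residue])

    walk(0, [])
    return sequences

print(de_novo([57, 154, 301, 415, 486], 486))  # complete ladder
print(de_novo([57, 154, 415, 486], 486))       # one peak missing
```

Dropping a single peak breaks the only path, which illustrates why de novo sequencing needs higher quality spectra than a database search.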

Page 49: Mass Spectrometry: Protein Identification Strategies


de novo sequencing is not very accurate:

• Less than 30% of the peptides sequenced were completely correct!

Algorithm         Amino Acid Accuracy    Whole Peptide Accuracy
Lutefisk, 1997    0.566                  0.189
SHERENGA, 1999    0.690                  0.289
Peaks, 2003       0.673                  0.246
PepNovo, 2005     0.727                  0.296

Page 50: Mass Spectrometry: Protein Identification Strategies


Protein identification strategies

• Mass Spectrometry
  – Peptide Mass Fingerprinting

• Tandem Mass Spectrometry
  – Spectral alignment
  – de novo sequencing

Page 51: Mass Spectrometry: Protein Identification Strategies


References

• SLIDES
  – Proteomics. 2005 Canadian Bioinformatics Workshops. David Wishart, Gary Van Domselaar. http://bioinformatics.ca/workshop_pages/proteomics2005/index.html
  – Protein Sequencing and Identification by Mass Spectrometry. http://bioalgorithms.info
  – Interpreting MS/MS Proteomics Results. Brian C. Searle. Proteome Software Inc.

• Aebersold R, Mann M. Mass spectrometry-based proteomics. Nature. 2003 Mar 13;422(6928):198-207. Review.

• Mueller LN, Brusniak MY, Mani DR, Aebersold R. An assessment of software solutions for the analysis of mass spectrometry based quantitative proteomics data. J Proteome Res. 2008 Jan;7(1):51-61.

• MOWSE: Pappin DJC, Hojrup P, and Bleasby AJ (1993) Rapid identification of proteins by peptide-mass fingerprinting. Curr. Biol. 3:327-332

• MASCOT: Perkins DN, Pappin DJC, Creasy DM, and Cottrell JS (1999) Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20:3551-3567.