
Post on 11-Oct-2020


Page 1: SIMULTANEOUSLY IDENTIFY AND QUANTITATE PROTEINS … · In fact, relative abundance between two peaks in the same spectrum is a particularly reliable quantitative measurement. Quantitation

Guide to Quantitative Proteomics using Tandem Mass Tags (TMT)

SIMULTANEOUSLY IDENTIFY AND QUANTITATE PROTEINS WITH TMT

This guide begins with an introduction to the basics of protein identification using nanoLC-tandem MS, with a description of how the TMT quantitative reporters are used to determine the relative quantitation of proteins.

Included in this introduction are screenshots with descriptions of what you are looking at and how to interpret search results using the Mascot (Matrix Science, Ltd.) analysis programs.

After this introduction, there are specific instructions on how to approach quantitative search results. TMT provides relative quantitation, comparing 2 or 3 conditions. There are step-by-step guidelines for setting statistical thresholds, then viewing the protein families and inspecting isoform-specific quantitation.


Typical proteomics data from LC-tandem MS is a series of thousands of scans that include:

1. MS scans: survey scans to observe the mass or “m/z” of intact peptide ions, and
2. tandem MS (MS/MS) scans that show the fragment masses generated from individual peptide ions (an example is shown below).

The data acquisition cycle includes one “survey” MS scan to observe peptide ions and measure their mass, followed by MS/MS scans on isolated peptide ions that were observed in the survey scan: about 5 to 20 of them, depending on the speed of the instrument. This cycle is repeated about every 5 seconds as peptides elute from the reversed-phase nanoLC column.

This generates many peptide MS/MS scans: the mass of a peptide is known from the survey MS scan, and the fragments generated from each peptide ion form a highly characteristic set of masses. In the example below, a peptide ion of m/z 738.43 (m/z = mass/charge) was isolated after it was observed in an MS survey scan moments before. It was a doubly charged ion, so the mass of the peptide is about twice that value (738 x 2 = 1476). This is why you can see (singly charged) fragments with m/z higher than 738 in the MS/MS spectrum.

Search programs that identify proteins from peptide MS/MS data compare the fragment masses in the spectrum to theoretical fragment masses calculated from peptides in a database. Only peptides in the database that have the same mass as the observed peptide are considered for matching to the fragment masses in the spectrum. In this case that is about 1476 Da, plus or minus a user-defined search tolerance of perhaps 0.5 Da or less.
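The mass arithmetic and the candidate filter described above can be sketched in a few lines of Python. This is only an illustration, not how Mascot is implemented: the function names and the three database entries are hypothetical. The guide's "738 x 2" is a quick approximation; the sketch also subtracts one proton mass per charge, which gives a slightly lower neutral mass.

```python
PROTON_MASS = 1.007276  # Da, mass of one proton


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral peptide mass from an observed m/z and charge state."""
    return (mz - PROTON_MASS) * charge


def candidates_within_tolerance(observed_mass, database_masses, tol_da=0.5):
    """Keep only database peptides whose mass is within +/- tol_da of the observed mass."""
    return {
        name: mass
        for name, mass in database_masses.items()
        if abs(mass - observed_mass) <= tol_da
    }


# Doubly charged precursor from the example spectrum.
mass = neutral_mass(738.43, 2)  # about 1474.85 Da

# Hypothetical peptide masses, invented for illustration only.
hypothetical_db = {
    "PEPTIDE_A": 1474.60,
    "PEPTIDE_B": 1476.90,
    "PEPTIDE_C": 1475.20,
}
matches = candidates_within_tolerance(mass, hypothetical_db, tol_da=0.5)
print(round(mass, 2), sorted(matches))
```

Only the candidates that survive this mass filter would go on to have their predicted fragments compared with the MS/MS spectrum.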

[Figure: example MS/MS spectrum, scan #1219 (RT 25.23, ITMS, Full ms2, scan range m/z 190.00-1490.00). x-axis: m/z; y-axis: relative abundance (0 to 100). Labeled fragment peaks span m/z 248.12 to 1347.68.]

Protein identification is based on at least one peptide spectral match or “PSM”. Usually, many different PSMs are associated with a protein, so protein identification based on this technology can be statistically VERY convincing.

It’s like looking up the answer in the back of the book. If the peptide is in the database, a match should occur. Of course, RANDOM matches also occur. They are usually of lower quality, so scoring algorithms help sort that out.

Mascot uses a statistical method to present matches with a score and other useful info that helps you determine if it is a true peptide match.

[Figure callouts: mass of the peptide ion (m/z) from the survey scan; peptide fragments]

Basics of Protein Identification by tandem MS:

m = mass, z = charge state


Peak height (or peak area) is directly proportional to abundance in most types of MS data. In fact, relative abundance between two peaks in the same spectrum is a particularly reliable quantitative measurement.

Quantitation based on tandem mass tags (TMT) is done by covalently attaching a tag molecule to the N-terminus of peptides. The reagent also labels any lysine side-chain amino group, which does not detract from the quantitative accuracy. In MS/MS, the reporter ion breaks off at the “cleavable linker” and is detected and measured in the MS/MS scan. It is a quantitative reporter: the intensity of the peak is proportional to abundance.

To multiplex, the same molecule is synthesized with isotopes (O18, N15, C13) distributed differently within it, such that each TMT reagent has the same total mass. But when the reporter ion breaks off in MS/MS, each reporter ion has a different mass. So you can label each sample (peptide mixture) with a different TMT reagent, then pool the samples back together, and the same peptide from each sample elutes from the reversed-phase LC at the same time. Peptides from different samples with identical sequence (the same peptide) are isolated together for MS/MS, because they all have the same mass after each is tagged with the “isobaric” TMT labels.

The unique reporter ions are generated in the MS/MS spectrum! The example data used in this training session uses SIX different TMT reagents (6-plex). The reporter ions are m/z 126, 127, 128, 129, 130, and 131.

PEPTIDE

Again, each TMT reagent has the same mass and the same structure, shown to the right, but stable isotopes are distributed differently across the labile bond (the “cleavable linker”). Reporter ions that break off in the MS/MS spectra are quantitative reporter ions with a unique mass for each TMT reagent. The example MS/MS data on the next few slides is zoomed in on this region of the scan, from about m/z 121 to 137, to illustrate the reporter ion concept.

Quantitative analysis using Tandem Mass Tags (TMT):


[Figure: reporter ion region (m/z 121 to 137) of MS/MS scan #4599 from RO-CS-DallasS_KA_Fraction-3 (RT 32.62, FTMS, Full ms2). x-axis: m/z; y-axis: relative abundance (0 to 100). Labeled peaks: 126.13, 127.13, 128.14 (“Control”); 129.13, 130.14, 131.14 (“Experimental”); 132.14. Caption: an example of the experimental sample showing an elevated expression level for this peptide/protein.]

Each sample was prepared in triplicate, with the 126, 127, and 128 TMT reagents used on the control sample and the 129, 130, and 131 TMT reagents used on the experimental sample.

The RELATIVE abundance is reported as the RATIO of experimental divided by control. If the ratio is above 1.0, the protein is more abundant in the experimental sample; if the ratio is below 1.0, the protein is less abundant in the experimental sample.
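As a rough illustration of that ratio, the sketch below averages the three control channels and the three experimental channels from one MS/MS scan and divides them. The channel assignment follows the guide; averaging the triplicates this way is an assumption made for illustration (Mascot's own calculation is configured in its quantitation method), and the intensities are invented.

```python
from statistics import mean

CONTROL_CHANNELS = (126, 127, 128)       # control triplicate, per the guide
EXPERIMENTAL_CHANNELS = (129, 130, 131)  # experimental triplicate, per the guide


def experimental_over_control(intensities):
    """Ratio of mean experimental to mean control reporter intensity.

    Returns None when the control signal is zero, since the ratio is undefined.
    """
    control = mean(intensities[ch] for ch in CONTROL_CHANNELS)
    experimental = mean(intensities[ch] for ch in EXPERIMENTAL_CHANNELS)
    if control == 0:
        return None
    return experimental / control


# Invented reporter intensities for a single scan (nominal m/z -> intensity).
hypothetical_scan = {
    126: 100.0, 127: 110.0, 128: 90.0,
    129: 210.0, 130: 190.0, 131: 200.0,
}
ratio = experimental_over_control(hypothetical_scan)
print(ratio)  # above 1.0: more abundant in the experimental sample
```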

This view is “zoomed in” on the region where the reporter ions appear in the MS/MS spectrum, about m/z 121 to 137 (see the x-axis).

MS/MS scan: the reporter ions break off.


[Figure: reporter ion region (m/z 121 to 137) of MS/MS scan #4408 from RO-CS-DallasS_KA_Fraction-3 (RT 31.44, FTMS, Full ms2). x-axis: m/z; y-axis: relative abundance (0 to 100). Labeled reporter peaks: 126.13, 127.13, 128.14 (“Control”); 129.13, 130.14, 131.14 (“Experimental”). Other labeled peaks: 128.28, 131.53, 132.14, 135.56, 136.08. Caption: an example of the experimental sample showing approximately no change for this peptide/protein.]


[Figure: reporter ion region (m/z 121 to 137) of MS/MS scan #4584 from RO-CS-DallasS_KA_Fraction-3 (RT 32.53, FTMS, Full ms2). x-axis: m/z; y-axis: relative abundance (0 to 100). Labeled reporter peaks: 126.13, 127.13, 128.14 (“Control”); 129.13, 130.14, 131.14 (“Experimental”). Other labeled peaks: 124.09, 125.10, 130.07, 132.14, 135.56, 136.08. Caption: an example of the experimental sample showing a reduced expression level for this peptide/protein.]


In a single MS/MS scan:

- the unique peptide fragmentation pattern is used to identify the peptide (a peptide spectral match, or PSM), and
- the reporter ions provide the relative quantitation information.

TMT labeling thus allows simultaneous protein identification and quantitation in MS/MS. It is all in the same MS/MS scan. Additional PSMs from the same protein provide independent quantitative measurements and also statistically strengthen the identification.

[Figure: the complete MS/MS spectrum, showing all fragments, with the reporter ions and the peptide fragment ions marked. The specific peptide fragments are indicated; note that b-ions include the N-terminus and y-ions include the C-terminus.]

Peptide Spectral Match (PSM)


This is the first page that opens from a Mascot Search Results link when the search includes TMT quantitation. The database searched here is a Spring 2015 download of all Mouse and Rat protein sequences in the UNIPROT database (named MusRat).

There are parameters for identification and separate parameters for quantitation; mass tolerances and other parameters differ between the two.

Search parameters summary: peptide matches in the identifications are based on these user-defined parameters (1). Here are a few of them:

* Peptide mass tolerance: The mass assigned by the mass spectrometer (peptide ions eluting from the nanoLC) must match the theoretical peptide mass within 10 ppm (user definable) to be considered for matching MS/MS fragment ions for the purpose of identification. FT-MS data from a calibrated Orbitrap is typically within 2 ppm of the accurate mass.

* Fragment mass tolerance: In this search, masses assigned by the mass spectrometer in the peptide MS/MS spectra must be within 0.05 Da of the theoretical mass to be included as a matching fragment. This is an “absolute” error (not a “relative” error, like the ppm error expression).

* Fixed modifications: Since the peptides have been modified (TMT), their mass has changed. The search program must know this to match the masses. The TMT reagent is covalently attached to the N-terminal amino group and to any lysine residue (K). Also, cysteine residues are alkylated with iodoacetamide during sample processing to block disulfide formation, so C residues carry a carbamidomethyl group.

If a peptide in the database has a calculated mass with these theoretical modifications, within 10ppm of the assigned mass in the MS data, it is considered for matching predicted fragment ions in the corresponding MS/MS spectrum, and those MS/MS fragments must match within 0.05 Da to the theoretical fragment masses to contribute to scoring the peptide spectral match (PSM). Mascot assigns an “Ions score” for each peptide match. Aside from the coincidence of fragment masses with predicted, theoretical masses, other features affect the Ions scores.

Converting absolute error to relative error:

$$\text{error (ppm)} = \frac{\text{absolute error (Da)}}{\text{peptide mass (Da)}} \times 10^{6}$$

Error expressed in ppm scales with the mass of the ion measured, so it is a “relative” error.

Note: you can also set a relative error in ppm for fragment ions, but these parameters set an absolute error of 0.05 Da for fragment masses.
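To make the difference between absolute and relative error concrete, here is a small Python sketch of the conversion. The function names and the measured/theoretical masses are illustrative only.

```python
def ppm_error(measured: float, theoretical: float) -> float:
    """Signed relative mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def da_tolerance_as_ppm(tol_da: float, mass: float) -> float:
    """Express a fixed Da tolerance as ppm at a given mass."""
    return tol_da / mass * 1e6


def within_ppm(measured: float, theoretical: float, tol_ppm: float) -> bool:
    """True when the measured mass is inside the ppm tolerance window."""
    return abs(ppm_error(measured, theoretical)) <= tol_ppm


# The same 0.05 Da fragment tolerance is a looser relative tolerance at low mass.
print(da_tolerance_as_ppm(0.05, 500.0))   # ppm at mass 500
print(da_tolerance_as_ppm(0.05, 1500.0))  # ppm at mass 1500

# Invented precursor measurement, checked against the 10 ppm peptide tolerance.
print(within_ppm(1476.010, 1476.000, tol_ppm=10.0))
```

A fixed 0.05 Da window corresponds to 100 ppm at mass 500 but only about 33 ppm at mass 1500, which is why the two kinds of tolerance behave differently across a spectrum.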

Database search


Parameters for quantitation are based on a parameter file, in this case “TMTe6 6plex 5ppm tol NO correction”. That parameter file is shown to the right of Quantitation (2) under Search Parameters. The user sets the parameters, names the parameter file, and selects it for Quantitation (the parameter method editor window is shown).

There are many choices to be made for quantitation parameters. Since there are different quantitation methods, Mascot needs to know which method is used and how to conduct the quantitation. For TMT reporter ion methods, relevant parameters include the exact mass of the reporter ions, whether and how to normalize the calculated ratios, and the mass tolerance allowed when selecting peaks for quantitation, among others.

In this view, note that the help screen at the bottom shows the choices for Protein Ratio Type. For this search, weighted was chosen. Certain search choices can expand or limit what you see in a search result, such as the Show subsets option (protein families). There can be many family members that appear in “subsets”.

The Report Ratio tab is where you select the quantitative ratio to be calculated. For the example data here, the ratio is based on three control reporter ions (126, 127, and 128) and three experimental (treated) reporter ions (129, 130, and 131). The ratio is expt/control, times the normalization ratio (which is set in the Normalisation tab). In this case, the normalization ratio is the average ratio of all, or a large random set, of the reporter ions in the data (a loading control).

That is a brief description of how you set up searches and quantitative analysis. There are search and quantitation help pages, with definitions and other valuable information, available from the Mascot home page (http://kc-bio-mascot/mascot/index.html). The help link is in the main top bar: Database search help.

This information is meant to enhance your recognition of some of the choices made in setting up searches. You will see the consequences of these settings within search results.


TMT quantitative data analysis


Every PSM gets an Ions score. The exact scoring algorithm is proprietary, but peptide Ions scores are based in part on how well the MS/MS masses assigned by the mass spectrometer coincide with the predicted fragment masses from peptides (in proteins) in the database. The Ions score for a longer peptide is generally higher than the Ions score for a shorter peptide. This is because there are more potentially matching fragments, and because a more complex PSM is less likely to be due to a random event (probability is a factor in the Ions score). Unassigned peaks lower the Ions score, since spectral noise can result in random matches to predicted fragment masses. Additional factors go into calculating the peptide Ions score. Ions scores are the sole contributors to the overall “Mascot score” for the protein.

So, what’s a good score for a peptide? Mascot provides threshold calculations based on probability. Recall that all peptides in the database that have the same mass as the assigned mass, within the search tolerance, are considered for matching MS/MS fragment ions. If the peptide mass tolerance is broad or the database is large, more peptides will be considered, and the potential for a random match increases.

The threshold, therefore, should be based on the desired significance threshold and the number of peptides in the database that fall within the mass range (mass +/- tolerance). How is this calculated?

Search programs like Mascot score peptide MS/MS matches. The score provides the numeric basis for sorting and ranking identified proteins automatically, and the matched peptides are presented in the context of the identified protein. Some, but not all, PSMs (only those that meet stringent scoring criteria) contribute to the quantitation.

Peptide Spectral Match (PSM) Scoring


The default significance threshold is 0.05 (also known as 95% confidence). A 95% confidence level simply means that there is a 1 in 20 chance that the match is a random event.

The identity threshold is calculated by the formula

$$\text{identity threshold} = -10 \log_{10}(P)$$

where P is the probability that the match is a random event. If there are 15,000 peptides in the database with a mass within the search tolerance, and the significance threshold is 0.05 (a 1 in 20 chance), the threshold calculation is:

$$P = \frac{1}{20 \times 15{,}000}, \qquad -10 \log_{10}(P) = 54.77$$

The identity threshold is provided for EVERY peptide MS/MS match. If the Ions score meets or exceeds the identity threshold, it is at least a 95% confident match.

As you would expect, if there are more peptides in the database of the same mass (150,000), the threshold for 95% confidence is higher, because there are more opportunities for a random match:

$$P = \frac{1}{20 \times 150{,}000}, \qquad -10 \log_{10}(P) = 64.77$$
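The two threshold values above can be reproduced with a short Python sketch. The function name is illustrative; the formula is the one given in this guide, with P written as the significance threshold divided by the number of candidate peptides.

```python
import math


def identity_threshold(n_candidates: int, significance: float = 0.05) -> float:
    """Identity threshold as -10 * log10(P), with P = significance / n_candidates.

    With significance = 0.05 this is the same as P = 1 / (20 * n_candidates).
    """
    p = significance / n_candidates
    return -10 * math.log10(p)


print(round(identity_threshold(15_000), 2))   # 54.77
print(round(identity_threshold(150_000), 2))  # 64.77
```

Each tenfold increase in the number of candidate peptides raises the threshold by exactly 10 score units, which is why a broader mass tolerance or a larger database demands a higher Ions score.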

Your statistical tools:
- Identity Threshold
- Ions Score
- Significance Threshold
- Expect value

Significance and Identity thresholds

The expect value is another very useful statistic. Mascot describes it as “The number of times we would expect to obtain an equal or higher score, purely by chance. The lower this value, the more significant the result”.


The expectation value

Mascot calculates another threshold based on the distribution of ALL random scores for a given MS/MS scan. It is called the “homology threshold”. View the homology threshold as a less stringent threshold than the identity threshold. If there are too few random PSMs, a homology score cannot be calculated. You will see this in search results.

The expect value combines the Ions score and identity threshold into one informative number. Here is the calculation.

$$\Delta = \text{identity threshold} - \text{Ions score}$$

$$\text{Expect} = \text{significance threshold} \times 10^{\Delta/10}$$

Expect values of practical interest fall between 0 and 1. If the identity threshold is 50 and the significance threshold is 0.05, then:

- Ions score of 40: expect value = 0.5
- Ions score of 50: expect value = 0.05
- Ions score of 60: expect value = 0.005

The lower the expect value, the better.
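The expect calculation can be checked with a few lines of Python. This sketch simply encodes the relationship given in this guide (significance threshold times 10 raised to the score difference divided by 10); the function name is illustrative.

```python
def expect_value(ions_score: float, identity_threshold: float,
                 significance: float = 0.05) -> float:
    """Expect value from the Ions score and the identity threshold."""
    delta = identity_threshold - ions_score
    return significance * 10 ** (delta / 10)


# Worked examples from the text: identity threshold 50, significance 0.05.
for score in (40, 50, 60):
    print(score, expect_value(score, identity_threshold=50))
```

Every 10 points of Ions score above the identity threshold lowers the expect value tenfold, and every 10 points below raises it tenfold.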


The scoring details described above are somewhat involved, but they are provided so that you can refer back to them at any time.

The expect value is generally the most useful threshold to consider since it combines identity and significance thresholds.

Now, we will set more stringent thresholds and limitations on the search results and quantitation so that you can concentrate on the identifications, isoform questions, and look at the quantitation results.

These are accomplished using the Filter and Quantitation sections, and also in the Decoy search summary settings, and then you’ll be ready to look at results.


FIRST STEP: Replace the 0 in the Ions Score or expect cutoff field with 0.05 and click the Filter button. Mascot treats any value between 0 and 1 in this field as an expect value. Wait until the calculations are complete and the page has stopped reloading. This removes lower-quality PSMs from the identifications and sets a fairly stringent threshold for identification scoring, which simplifies the results you will view.

SECOND STEP: Open the Decoy search summary (click the black triangle) and adjust the false discovery rate to 1% (select 1% and click the Adjust to button). This “retrocalculates” the significance threshold (which had a default value of 0.05) and the homology threshold downward so that matches to the decoy database amount to a 1% false discovery rate. The decoy database is the same as the MusRat database, but with all sequences reversed.

When it is done, you will see that the significance threshold is far more stringent (lower than 0.05). This conveniently sets the false discovery rate at 1% (for publication), limits the acceptable matches in the real database search to that adjusted significance threshold, and increases the stringency on the PSMs that are included for quantitation. It also simplifies the lists of PSMs for each protein/protein family.
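The idea behind the decoy adjustment can be sketched in Python. This is a conceptual illustration only, under stated assumptions: it estimates the false discovery rate as decoy matches divided by target matches above a score cutoff, and scans for the lowest cutoff that meets the requested rate. Mascot itself adjusts its significance threshold rather than a raw score cutoff, and all names, sequences, and scores here are hypothetical.

```python
def reverse_sequences(proteins):
    """Build a decoy database by reversing every protein sequence."""
    return {f"REV_{acc}": seq[::-1] for acc, seq in proteins.items()}


def estimated_fdr(target_scores, decoy_scores, cutoff):
    """Decoy matches divided by target matches at or above the cutoff."""
    targets = sum(score >= cutoff for score in target_scores)
    decoys = sum(score >= cutoff for score in decoy_scores)
    return decoys / targets if targets else 0.0


def lowest_cutoff_for_fdr(target_scores, decoy_scores, max_fdr=0.01):
    """Lowest observed score that keeps the estimated FDR at or below max_fdr."""
    for cutoff in sorted(set(target_scores) | set(decoy_scores)):
        if estimated_fdr(target_scores, decoy_scores, cutoff) <= max_fdr:
            return cutoff
    return None


decoys = reverse_sequences({"P1": "MKTAYIAK"})  # invented sequence

# Invented PSM scores from the real (target) and reversed (decoy) searches.
target_scores = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
decoy_scores = [18, 22, 28]
cutoff = lowest_cutoff_for_fdr(target_scores, decoy_scores, max_fdr=0.01)
print(decoys, cutoff)
```

Because reversed sequences should not exist in the sample, any match to them is presumed random, so the number of decoy matches gives an estimate of how many random matches are hiding among the real ones.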


Setting thresholds and quant output


THIRD STEP: Now we need to further limit the PSMs used in the quantitation to Unique peptides only. Check that box in the Quantitate parameters and click the Quantitate button.

Do you see that the peptide threshold field (Above homology 0.05) is greyed out? That happened when we forced the false discovery rate to 1%, because the thresholds are adjusted to accomplish this. The field is no longer adjustable after that process is done.

When the calculations are complete and the page has stopped reloading, the data is ready for review.

NOTE: You can expand the number of families shown per page after applying these filters.

After the page has completed the reload, scroll down and review the search results.

Proteins (protein families) are listed in descending order (Mascot score).


Scroll down and open a protein family by clicking the black triangle. Isoforms in protein families have common peptides, as well as the unique peptides needed for isoform-specific quantitation reporting. Here is an example: the Guanine Nuc. Binding protein family.

[Screenshot callouts:
- Mascot scores; isoform-specific ratios are shown here.
- Mouse over a ratio to see how many PSMs are included in the weighted average shown (N=10 on this one).
- These black squares indicate which of the 6 family members have the peptide.
- Ions scores.]

The Ions score, expect, and ratio shown on the line for a matched peptide are from the best PSM when there is more than one.

When there is more than one PSM for a peptide in the data, you can see the individual scores for each match by opening the Dupes (duplicates): click on the black triangle.

You may open the summary for one of these proteins (“protein view”) by clicking on the blue accession number.


Mouse over the black triangle in the rank column to see the identity and homology threshold values. If the red text is bold, at least one MS/MS scan matched to the peptide met or exceeded the homology threshold (95% confidence). Some peptides are NOT bold red because the Ions score does not meet the threshold. Note that their expect value DOES meet the criteria we set, so they appear in the list. A dashed line appears in the ratio column when the ratio was excluded for this reason.

Note: the homology threshold is based on the average score of the set of peptides NOT matched to this protein (typically with lower scores). If no homology threshold could be calculated (not enough other matches), the ratio is also excluded.

If a peptide appears more than once in the list, look at the observed mass (m/z), and you will see that the peptide was isolated in different charge states (z): the observed value is about 1/2, 1/3, or 1/4 ... of the Mr. These are considered independent measurements, and each PSM contributes to the ratio if it meets all criteria.

[Screenshot callouts: z = +2; z = +3]


Red and bold means the PSM is significant based on the new expect value set by forcing a 1% false discovery rate in the decoy database. A ratio is usually reported, and it contributes to the quantitation. Red but not bold means the PSM supports the identification but does not meet the stringent criteria. Even if a PSM meets the criteria for identification (bold red), it may not meet the criteria to contribute to quantitation. Dashes are displayed when a ratio was not determined or does not meet some criterion. This may be because one or more of the relevant peaks were missing, giving a ratio that was zero, infinity, or indeterminate. Alternatively, the peptide match may have been rejected on quality grounds.


The triple pound sign (###) appears when there is a “divide by zero” error. This happens if the reporter peaks are outside the mass tolerance range. When this occurs, the peptide may still be in bold red because the PSM met the significance threshold for identification. The identified peptide therefore contributes to the Mascot score, but not to the weighted-average quantitation for a family member, even if it is unique to that family member and shown in bold red.
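A minimal Python sketch shows how a reporter peak that falls outside the quantitation tolerance leaves the ratio undefined. Everything here is illustrative: the function names are hypothetical, the peak list is invented, and the reporter m/z values are approximate numbers used only for this example, not the values in the Mascot method file.

```python
def find_peak(peaks, target_mz, tol_ppm):
    """Intensity of the strongest peak within tol_ppm of target_mz, or None."""
    window = target_mz * tol_ppm / 1e6
    hits = [intensity for mz, intensity in peaks if abs(mz - target_mz) <= window]
    return max(hits) if hits else None


def reporter_ratio(peaks, numerator_mz, denominator_mz, tol_ppm=5.0):
    """Ratio of two reporter peaks; None when either peak is missing or zero."""
    numerator = find_peak(peaks, numerator_mz, tol_ppm)
    denominator = find_peak(peaks, denominator_mz, tol_ppm)
    if numerator is None or not denominator:
        return None  # the undefined case that a results page would flag
    return numerator / denominator


# Invented (m/z, intensity) peaks: the second peak is about 7 ppm off target.
hypothetical_peaks = [(126.1277, 1000.0), (129.1324, 2500.0)]

strict = reporter_ratio(hypothetical_peaks, 129.1315, 126.1277, tol_ppm=5.0)
loose = reporter_ratio(hypothetical_peaks, 129.1315, 126.1277, tol_ppm=10.0)
print(strict, loose)
```

With the 5 ppm window the shifted reporter peak is not picked up, so no ratio can be formed; widening the window to 10 ppm recovers it.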


Here is a screenshot from the ras family summary. Let’s look at some peptide MS/MS match summary pages in this set. First, I clicked on Query number 22621, which is the best of three scans matched to the peptide (there were two duplicate scans, or “Dupes”: see the black triangle with the 2). It gets a high score (61), well above the identity threshold, but was excluded from the quantitation. Why? Clicking on the query number opens the “peptide view”, which is a validation window for inspecting the PSM. There are a few critically useful things in peptide view if you need to scrutinize a certain peptide match.

Peptide View opens showing the reporter ions region of the matching MS/MS scan. Zoom out using the scan navigation tool to see the entire spectrum. We’ll continue on the next slide.

[Screenshot callout: mouse over this triangle to see the identity threshold.]


Now you can see the peptide fragment ions (a peptide generally breaks only once, at a peptide bond, in this type of MS/MS). There were maybe 10 nmoles of this peptide, which broke at different places to generate these ions. The matching b-ions (which include the N-terminus) and y-ions (which include the C-terminus) are indicated on the scan and shown in bold red in the summary table. These are the main features that made the identification.

NOW, look at the mass assignment error analysis under the table. Absolute error (in Da) is shown on the left. The relative error (in ppm) is shown on the right. Quantitation was set with an error tolerance of no more than 5 ppm for this search, and you can see the errors are more like 7 ppm. With the reporter peaks outside that tolerance, the ratio calculation hits a divide by zero error, which is what produces the ###. But it is still an excellent peptide match that contributes to the Mascot score, because the mass tolerance in the search parameters is broader (10 ppm precursor, 0.05 Da fragment).

Mass error is the difference between measured and theoretical mass, plotted as absolute (Da) or relative (ppm) error. It’s an excellent validation feature.

[Screenshot panels: absolute error (left); relative error (right)]


Most of the data from HRMS instrumentation (an Orbitrap in this case) has excellent mass accuracy. Here is another peptide from the ras family. You can see the fragments indicated in the panel to the right, with b- and y-ions labeled in the spectrum. For instance, if the peptide bond between the A and F amino acid residues breaks, you get two fragments: the b-ion (b8) includes the N-terminus (TASNVEEA-), and the y-ion (y6) includes the C-terminus (-FINTAK). Both were found in the spectrum and are shown in bold red in the table.

The math adds up. For example, the mass of the b8 ion (table) is 1031.52, and the mass of the y6 ion is 922.55:

1031.52 + 922.55 = 1954.07

This was a doubly charged peptide ion. An extra proton (H+) is present on each of these positively charged fragment ions, so subtract those two (about 1 Da each):

1954.07 - 2 = 1952.07

This should be the uncharged peptide mass. It is! So you see this is actually simple math, appearing complex until you dig into it!
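The same check can be written as a tiny Python function. It uses the proton mass (about 1.00728 Da) instead of the rounded value of 1 Da per proton, so the result differs from the rounded arithmetic in the second decimal place. The function name is illustrative.

```python
PROTON_MASS = 1.007276  # Da


def peptide_mass_from_complementary_ions(b_mz: float, y_mz: float) -> float:
    """Neutral peptide mass from a singly charged b-ion and its complementary y-ion.

    Each fragment carries one extra proton, so two proton masses are removed.
    """
    return b_mz + y_mz - 2 * PROTON_MASS


# b8 and y6 values quoted in the text.
mass = peptide_mass_from_complementary_ions(1031.52, 922.55)
print(round(mass, 2))  # 1952.06
```

Any complementary b/y pair from the same peptide should sum to the same neutral mass, which makes this a quick sanity check on a peptide spectral match.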

Now look at the error analysis. The absolute error 1 (in Daltons) shows that most of these matched peaks have extremely small errors. The cluster of peak with larger errors at about 1500 are probably spectral noise, coincident with predicted fragment masses (look in the spectrum in that area). The relative error 2 (in ppm) is generally less than 1ppm (!), and you can see those likely random noise peaks in the same mass range.ANY peaks that match within the mass tolerance (0.05 for fragment ions-search parameters) will get assigned, so the outlying noise peak with error of more than 0.02 Da (30ppm) is included-the display to autoscales to include all assigned peaks.

Error analysis: plot 1 shows absolute error (Da); plot 2 shows relative error (ppm).


Resolving Protein Isoform/Spliceform and Species Issues

-Protein families share conserved peptides, but there are also unique peptides.

-Only the unique peptides provide conclusive proof that an isoform is present.

-Only the reporter ion ratios from unique peptides quantitate specific isoforms.

The protein family is divided into four groups. 7.1 is Filamin alpha. Mouse over the ratio for each of these 4 isoforms and you’ll see that alpha has the most unique PSMs. The unique individual peptides that support this are shown in column 1 (black squares that appear only in column 1).

“Samesets” have the same set of PSMs, so they are indistinguishable.

Note that there is a “sameset” for alpha. Click the triangle to see the accession/description. In this case it is alpha filamin. Note the mass is slightly different (320792 Da, not 320072 Da). All of the same peptides are present in both, and no unique PSM distinguishes them. It may be a splice variant. Sometimes a sameset is the same protein from another species.
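The idea of samesets and unique peptides can be illustrated with a small Python sketch. It is not how Mascot implements protein grouping; the accession names and peptide strings are invented placeholders, and the helper functions are hypothetical.

```python
from collections import defaultdict


def find_samesets(protein_peptides):
    """Group accessions whose matched peptide sets are identical."""
    groups = defaultdict(list)
    for accession, peptides in protein_peptides.items():
        groups[frozenset(peptides)].append(accession)
    return [sorted(accessions) for accessions in groups.values() if len(accessions) > 1]


def unique_peptides(protein_peptides):
    """Map each accession to the peptides that match no other accession."""
    owners = defaultdict(set)
    for accession, peptides in protein_peptides.items():
        for peptide in peptides:
            owners[peptide].add(accession)
    return {
        accession: {p for p in peptides if owners[p] == {accession}}
        for accession, peptides in protein_peptides.items()
    }


# Invented example family: two entries share every peptide, a third has one of its own.
family = {
    "ALPHA": {"PEPA", "PEPB", "PEPC"},
    "ALPHA_VARIANT": {"PEPA", "PEPB", "PEPC"},
    "BETA": {"PEPA", "PEPD"},
}

print(find_samesets(family))
print(unique_peptides(family))
```

In this toy data ALPHA and ALPHA_VARIANT form a sameset because no peptide distinguishes them, while BETA is supported by one unique peptide (PEPD).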



Resolving Protein Isoform/Spliceform and Species Issues

Species issues can be difficult, since isoform A in one species might have been named isoform B in another species. But sometimes the nomenclature is not an issue.

In this case, the rat accession 7.3 is Filamin B (beta), and 7.2 is the mouse entry for Filamin B. Why are they both there? Look at the black squares under columns 2 and 3. Almost all peptides in 7.3 are also in 7.2. One is not: it is a long peptide at the bottom of the list. Mouse over the ratio for 7.3 (0.945). N = 1, so the ratio reported is for that one peptide.

7.3 is a rat accession. How do you deal with that?


Sometimes the accession shown as the primary identified protein for a family is from the wrong species (rat in this case). You can see this happened many times in this Ras-related protein Rab superfamily.


-First, check the samesets. You may find the same peptides found in a mouse entry there.

-If no mouse entry is listed, see how many PSMs were used in the ratio calculation (mouse over the ratio). If it is just one peptide, it may not be worth looking into, but if there are a few unique peptides, perhaps that mouse isoform is not in this database.

Example: rab13 (89.6). Follow the link to the protein view by clicking the blue hypertext RAT entry (P35286).

See next page:


The Protein View summarizes the PSMs and shows which of them were used for quantitation using the reporter ratios.

Three peptides were NOT unique, and one had a divide-by-zero error (###), presumably because the reporter ions were out of tolerance.

This identification and the quantitation are both based on the PSM for one peptide that happens to match a RAT sequence! It could be a random match, or…

was the corresponding mouse RAB 13 missing from the database? BLAST the sequence using the link. Inspect the best MOUSE protein returned in the search to find the peptides identified in this RAT accession. If they are there, you may have solved the problem.

Even if this putatively matched protein is not differentially expressed, you may want to attempt to validate its presence only for identification purposes (presence in the sample). ONLY unique peptides prove this.
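Once BLAST returns a candidate mouse protein, checking whether the identified peptides occur in its sequence is a simple substring search. The sketch below is one possible way to do that by hand in Python; the function name, peptides, and protein sequence are all invented placeholders, not data from this search.

```python
def peptides_found(peptides, protein_sequence):
    """Return {peptide: True/False} for exact matches within a protein sequence.

    Isoleucine (I) and leucine (L) are treated as the same residue because
    they have identical masses and a standard MS/MS search cannot tell them apart.
    """
    normalized = protein_sequence.upper().replace("I", "L")
    return {
        peptide: peptide.upper().replace("I", "L") in normalized
        for peptide in peptides
    }


# Invented placeholder data, for illustration only.
mouse_sequence = "MKAYDLLFKTGSR"
identified = ["AYDILFK", "QQQWK"]

print(peptides_found(identified, mouse_sequence))
```

A peptide that is reported as found still needs to be unique to that protein before it proves anything about the isoform.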



EXPORT an EXCEL compatible list of matched proteins with quantitative ratios

To begin this process, open the search results, then impose the threshold limits discussed earlier and the unique-peptides-only filter. When the page stops loading, click on the Report Builder tab. It builds the list you see there. Then open the Columns options (black triangle).


Use the pulldown to select <custom>, which enables the column selection editor.


Select the item you want to remove (red arrow), and use the right arrow (circled) to move it out of the list.

Similarly, add items by highlighting something in the right column, and then click the left arrow. Before you move it, highlight the position in the left column where you want the item placed, or it will be added at the bottom. The green arrow is aimed at the reporter ion ratio. You’ll want to add that.

The main reason to do this initially will be to export an Excel compatible list WITH quant ratios so you can sort in Excel by quant ratio OR by Score.

Eliminate Mass, emPAI, and “Num. of whatever” (you don’t need them, since you have access to the Mascot server).

Note: you can eliminate additional things after you import into Excel by highlighting the column and deleting it.

Move things up or down the list by highlighting them, then clicking the up or down arrow.

Click the apply button to make your selected changes. When the page is done reloading, you can export to Excel (Export as CSV).


The exported Excel sheet can be used to manually edit the results while you validate and check entries on the Mascot server. For instance, if you sort by ratio in Excel, you will instantly see which proteins show the largest differences. Then you can go back to Mascot, find the entry, and dig into it to sort out any issues like isoforms and check the ratios. This will speed up the process of finding interesting quantitation differences. You may need to BLAST a matched protein or align it with something we’ve previously identified to verify it is the same protein (sometimes the accessions change as the database is curated).
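If you prefer a script to sorting in Excel, the same ranking can be sketched in Python. The column names ("accession", "score", "ratio") and the sample rows are assumptions made for this example; the actual headers depend on the columns you selected in Report Builder. Ranking by the absolute log2 of the ratio is a design choice of this sketch, so that a ratio of 0.25 counts as just as large a change as a ratio of 4.

```python
import csv
import io
import math


def rank_by_fold_change(csv_text, ratio_column="ratio"):
    """Return rows sorted by |log2(ratio)|, largest change first.

    Rows with a missing, non-numeric (e.g. "###"), or non-positive ratio are skipped.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            ratio = float(row[ratio_column])
        except (KeyError, TypeError, ValueError):
            continue
        if ratio <= 0:
            continue
        row["abs_log2_ratio"] = abs(math.log2(ratio))
        rows.append(row)
    return sorted(rows, key=lambda r: r["abs_log2_ratio"], reverse=True)


# Invented sample export; real exports will have different headers and values.
example = """accession,score,ratio
PROT_A,250,0.945
PROT_B,120,2.8
PROT_C,90,###
PROT_D,310,0.25
"""

for row in rank_by_fold_change(example):
    print(row["accession"], row["ratio"], f"{row['abs_log2_ratio']:.2f}")
```

To run this on a real export, read the CSV file into a string first and pass the name of your ratio column as `ratio_column`.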