![Page 1: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/1.jpg)
Mass spectrometry in proteomics
Modified from: www.bioalgorithms.info
I519 Introduction to Bioinformatics, Fall, 2012
![Page 2: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/2.jpg)
Outline
• Proteomics & mass spectrometry
• Application of MS/MS in proteomics
– Protein sequencing and identification by mass spectrometry
• Protein identification via database search (SPC & spectral alignment)
• De novo peptide sequencing (spectrum graph)
• Hybrid
– Identifying post-translationally modified (PTM) peptides
– (Quantitative proteomics)
• Identifying proteins that are differentially abundant
![Page 3: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/3.jpg)
The Dynamic Nature of the Proteome
The proteome of the cell is changing.
Various extracellular and other signals activate pathways of proteins.
A key mechanism of protein activation is post-translational modification (PTM).
These pathways may lead to other genes being switched on or off.
Mass spectrometry is key to probing the proteome and detecting PTMs.
![Page 4: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/4.jpg)
Mass Spectrometry (MS)
An analytical technique for determining the elemental composition of a sample or molecule.
Ion source: ESI (electrospray ionization), MALDI (matrix-assisted laser desorption/ionization).
Mass analyzer: separates the ions according to their mass-to-charge ratio, e.g., TOF (time-of-flight).
![Page 5: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/5.jpg)
Proteomics Approaches
[Figure: proteomics workflow in three layers. Wet-lab operation (Gel/AC, Digest, LC): Proteins, Pure Protein, Peptides, Pure Peptide. MS operation (ESI, 1st MS, Dissociation, 2nd MS): Molecular Ions, Single m/z Ions, Fragments, MS Spectra. Computing operation (Search Engine, Other Software Tools): Protein ID, Protein Quantification, PTM sites. Approaches shown: Shotgun, Top-down, Bottom-up.]
Taken from: http://ms-facility.ucsf.edu/documents/PC235_2009_Lec1_MS_Intro.ppt
![Page 6: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/6.jpg)
Protein Identification by Tandem Mass Spectrometry
[Figure: an MS/MS instrument produces an MS/MS spectrum (scan 1708, RT 54.47, precursor m/z 638.00; relative abundance vs. m/z), which is turned into a sequence either by database search (e.g., Sequest) or by de novo interpretation (e.g., Sherenga).]
Ref: Mass spectrometry-based proteomics, Nature 2003, 422:198
![Page 7: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/7.jpg)
Breaking Protein into Peptides and Peptides into Fragment Ions
Proteases, e.g. trypsin, break protein into peptides.
A Tandem Mass Spectrometer further breaks the peptides down into fragment ions and measures the mass of each piece.
The mass spectrometer accelerates the fragment ions; for a given charge, heavier ions accelerate more slowly than lighter ones.
The mass spectrometer measures the mass/charge ratio of each ion.
![Page 8: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/8.jpg)
Breaking Proteins into Peptides
[Figure: a protein (MPSERGTDIMRPAKID......) is broken into peptides (MPSER……, GTDIMRPAKID, ……), which pass through HPLC to MS/MS.]
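The digestion step can be sketched in a few lines. This is a minimal illustration, assuming the commonly used simplification that trypsin cuts after every K or R; the function name `tryptic_digest` and the optional `skip_before_proline` flag are hypothetical and not part of the slides.

```python
def tryptic_digest(protein: str, skip_before_proline: bool = False) -> list[str]:
    """Cut a protein after every K or R (a simplified trypsin rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        is_last = i == len(protein) - 1
        if residue in "KR" and not is_last:
            # Optional refinement (assumption): no cut when the next residue is proline.
            if skip_before_proline and protein[i + 1] == "P":
                continue
            peptides.append(protein[start:i + 1])
            start = i + 1
    peptides.append(protein[start:])  # the C-terminal peptide
    return peptides


print(tryptic_digest("MPSERGTDIMRPAKID"))
# ['MPSER', 'GTDIMR', 'PAK', 'ID']
```

Real digests also contain missed cleavages, so a search engine would usually consider longer peptides as well.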
![Page 9: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/9.jpg)
Tandem Mass Spectrometry
![Page 10: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/10.jpg)
Tandem Mass Spectrometry
[Figure: three linked panels. LC: base-peak chromatogram (relative abundance vs. time, RT 0.01 to 80.02 min). MS: full scan 1707 (RT 54.44, m/z 300.00 to 2000.00), which includes the peak at m/z 638.0. MS/MS: scan 1708 (RT 54.47, precursor m/z 638.00, m/z 165.00 to 1925.00). Instrument layout: ion source, MS-1, collision cell, MS-2.]
![Page 11: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/11.jpg)
Tandem Mass Spectrum
Tandem mass spectrometry (MS/MS) mainly generates partial N- and C-terminal peptides.
Chemical noise often complicates the spectrum.
A spectrum is represented in 2-D: mass/charge axis vs. intensity axis.
![Page 12: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/12.jpg)
Protein Identification with MS/MS
[Figure: the peptide G V D L K is measured by MS/MS, giving a spectrum (intensity vs. mass) that is used for peptide identification.]
![Page 13: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/13.jpg)
N- and C-terminal Peptides
[Figure: a peptide broken into its N-terminal peptides and C-terminal peptides.]
![Page 14: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/14.jpg)
N- and C-terminal Peptides
[Figure: N-terminal peptides with masses 57, 154, 301, 415, 486 and C-terminal peptides with masses 71, 185, 332, 429.]
![Page 15: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/15.jpg)
Peptide Shotgun Sequencing
[Figure: fragment masses 57, 71, 154, 185, 301, 332, 415, 429, 486.]
Reconstruct the peptide from the set of masses of its fragment ions (the mass spectrum).
![Page 16: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/16.jpg)
(Ideal) Theoretical Spectrum
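An ideal spectrum of this kind can be sketched as the set of masses of all N-terminal (prefix) and C-terminal (suffix) peptides. The sketch below assumes rounded integer residue masses and ignores ion-type offsets; `RESIDUE_MASS` and `ideal_spectrum` are illustrative names. The example peptide GPFNA is an inference: its prefix and suffix masses happen to reproduce the masses 57, 71, 154, ... shown on the earlier slides, but the slides do not name the peptide.

```python
# Rounded integer residue masses in Da (an assumption made for readability).
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "K": 128, "Q": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def ideal_spectrum(peptide: str) -> list[int]:
    """Masses of all prefixes and suffixes of the peptide, sorted."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    spectrum = set()
    for i in range(1, len(masses) + 1):
        spectrum.add(sum(masses[:i]))   # N-terminal peptide of length i
        spectrum.add(sum(masses[-i:]))  # C-terminal peptide of length i
    return sorted(spectrum)


print(ideal_spectrum("GPFNA"))
# [57, 71, 154, 185, 301, 332, 415, 429, 486]
```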
![Page 17: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/17.jpg)
Issue 1: Spectrum Consists of Different Ion Types
because peptides can be broken in several places.
[Figure: backbone of a four-residue peptide (side chains R1 to R4) with the cleavage sites that give a2, a3, b2, b3 and y1, y2, y3 ions, together with the neutral-loss ions b2-H2O, y3-H2O, b3-NH3 and y2-NH3.]
![Page 18: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/18.jpg)
Issue 2: Noise and Missing Peaks
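The peaks in the mass spectrum:
– Prefix and suffix fragments
– Fragments with neutral losses (-H2O, -NH3)
– Noise and missing peaks
[Figure: spectrum of the peptide G V D L K on a mass axis starting at 0. Mass differences between peaks spell residues, e.g. 57 Da = ‘G’ and 99 Da = ‘V’; the suffix fragments read K L D V G; a loss of H2O is marked at D.]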
![Page 19: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/19.jpg)
De Novo vs. Database Search
[Figure: one MS/MS spectrum (scan 1708, RT 54.47, precursor m/z 638.00; relative abundance vs. m/z) with a graph of candidate residues (W, R, A, C, V, G, E, K, D, L, P, T) drawn over it, interpreted by both approaches.]
De novo: AVGELTK
Database of all peptides = 20^n:
AAAAAAAA, AAAAAAAC, AAAAAAAD, AAAAAAAE, AAAAAAAG, AAAAAAAF, AAAAAAAH, AAAAAAAI, ...,
AVGELTI, AVGELTK, AVGELTL, AVGELTM, ...,
YYYYYYYS, YYYYYYYT, YYYYYYYV, YYYYYYYY
Database search (candidates compared by mass and score):
Database of known peptides:
MDERHILNM, KLQWVCSDL, PTYWASDL, ENQIKRSACVM, TLACHGGEM, NGALPQWRT,
HLLERTKMNVV, GGPASSDA, GGLITGMQSD, MQPLMNWE,
ALKIIMNVRT, AVGELTK, HEWAILF, GHNLWAMNAC,
GVFGSVLRA, EKLNKAATYIN..
![Page 20: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/20.jpg)
Peptide Identification Problem (Database Search)
Goal: Find a peptide from the database with maximal match between an experimental and a theoretical spectrum.
Input:
– S: experimental spectrum
– database of peptides
– Δ: set of possible ion types
– m: parent mass
Output:
– A peptide of mass m from the database whose theoretical spectrum best matches the experimental spectrum S
![Page 21: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/21.jpg)
Peptide Identification by Database Search
Compare the experimental spectrum with the theoretical spectra of database peptides to find the best fit.
The match between two spectra is the number of masses (peaks) they share (Shared Peak Count, or SPC).
In practice, mass spectrometrists use a weighted SPC that reflects the intensities of the peaks.
The match between an experimental and a theoretical spectrum is defined similarly.
The aim is to find the peptide whose theoretical spectrum is most similar to the real spectrum.
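A database search scored by SPC can be sketched as follows. This is a simplified illustration, not any particular search engine: it assumes integer residue masses, a theoretical spectrum made only of prefix and suffix masses, an unweighted SPC, and a hypothetical mass tolerance `tol`. All function names and the toy database are made up for the example.

```python
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "K": 128, "Q": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def peptide_mass(peptide):
    return sum(RESIDUE_MASS[aa] for aa in peptide)


def theoretical_spectrum(peptide):
    """Prefix and suffix masses (ion-type offsets are ignored in this sketch)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    return sorted({sum(masses[:i]) for i in range(1, n + 1)}
                  | {sum(masses[-i:]) for i in range(1, n + 1)})


def shared_peak_count(experimental, theoretical, tol=0.5):
    """Number of theoretical peaks that have an experimental peak within tol."""
    return sum(any(abs(t - e) <= tol for e in experimental) for t in theoretical)


def database_search(spectrum, parent_mass, database, tol=0.5):
    """Best-scoring database peptide whose mass matches the parent mass."""
    candidates = [p for p in database if abs(peptide_mass(p) - parent_mass) <= tol]
    return max(
        candidates,
        key=lambda p: shared_peak_count(spectrum, theoretical_spectrum(p), tol),
        default=None,
    )


# Toy spectrum: two peaks missing (185, 429) and one noise peak (200).
spectrum = [57, 71, 154, 200, 301, 332, 415, 486]
print(database_search(spectrum, 486, ["GPFAN", "NAGPF", "AVGELTK", "GPFNA"]))
# GPFNA
```

Filtering by parent mass first keeps the number of spectrum comparisons small, which is why the parent mass m is part of the problem input.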
![Page 22: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/22.jpg)
Peptide Sequencing Problem(De Novo)
Goal: Find a peptide with maximal match between an experimental and a theoretical spectrum.
Input:
– S: experimental spectrum
– Δ: set of possible ion types
– m: parent mass
Output:
– A peptide with mass m whose theoretical spectrum best matches the experimental spectrum S
![Page 23: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/23.jpg)
De novo Peptide Sequencing
[Figure: the MS/MS spectrum (scan 1708, RT 54.47, precursor m/z 638.00; relative abundance vs. m/z) is interpreted directly as a sequence.]
![Page 24: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/24.jpg)
Building Spectrum Graph
How to create vertices (from masses)
How to create edges (from mass differences)
How to score paths
How to find best path
![Page 25: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/25.jpg)
[Figure: spectra plotted as intensity vs. mass/charge (m/z), with b and y ions marked; an a ion is an ion-type shift of a b ion.]
![Page 26: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/26.jpg)
MS/MS Spectrum (Ion Types Unknown & With Noise)
[Figure: spectrum plotted as intensity vs. mass/charge (m/z).]
![Page 27: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/27.jpg)
Some Mass Differences between Peaks Correspond to Amino Acids
[Figure: a spectrum in which mass differences between peaks are labeled with letters that spell out the word "sequence".]
![Page 28: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/28.jpg)
Knowing Ion Types
Some masses correspond to fragment ions, others are just random noise.
Knowing the ion types Δ={δ1, δ2,…, δk} lets us distinguish fragment ions from noise.
We can learn the ion types δi and their probabilities qi by analyzing a large test sample of annotated spectra.
![Page 29: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/29.jpg)
Vertices of Spectrum Graph
Vertices are the masses of potential N-terminal peptides.
Vertices are generated by reverse shifts corresponding to the ion types
Δ={δ1, δ2,…, δk}
Every N-terminal peptide of mass m can generate up to k ions
m-δ1, m-δ2, …, m-δk
Every mass s in an MS/MS spectrum generates k vertices
V(s) = {s+δ1, s+δ2, …, s+δk}
corresponding to potential N-terminal peptides.
Vertices of the spectrum graph: {initial vertex} ∪ V(s1) ∪ V(s2) ∪ ... ∪ V(sm) ∪ {terminal vertex}
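The vertex construction can be sketched directly from the definition of V(s). In this sketch the ion-type shifts are plain numbers, the initial vertex is mass 0 and the terminal vertex is the parent mass; the shift values `[0, 18]` (an unshifted ion and a water-loss ion) are illustrative assumptions, not values from the slides.

```python
def spectrum_graph_vertices(spectrum, deltas, parent_mass):
    """Initial vertex, V(s) for every peak s, and terminal vertex."""
    vertices = {0, parent_mass}  # initial and terminal vertices
    for s in spectrum:
        for delta in deltas:
            v = s + delta  # reverse shift: candidate N-terminal peptide mass
            if 0 < v < parent_mass:
                vertices.add(v)
    return sorted(vertices)


# The peak at 136 could be the prefix of mass 154 after losing water (154 - 18).
print(spectrum_graph_vertices([57, 136, 301], deltas=[0, 18], parent_mass=486))
# [0, 57, 75, 136, 154, 301, 319, 486]
```

Every peak produces k candidate vertices, most of which are wrong; the scoring step is what separates real prefix masses from these artifacts.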
![Page 30: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/30.jpg)
Reverse Shifts
Shift in H2O+NH3
Shift in H2O
![Page 31: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/31.jpg)
Edges of Spectrum Graph
Two vertices with mass difference corresponding
to an amino acid A:
– Connect with an edge labeled by A
Gap edges for di- and tri-peptides
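Edge construction can be sketched as a scan over pairs of vertices. The sketch assumes integer vertex masses and rounded integer residue masses, and it leaves out the gap edges for di- and tri-peptides; `RESIDUE_MASS` and `spectrum_graph_edges` are illustrative names.

```python
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "K": 128, "Q": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def spectrum_graph_edges(vertices):
    """Edges (u, v, amino_acid) for every pair whose mass difference is a residue mass."""
    mass_to_residues = {}
    for aa, mass in RESIDUE_MASS.items():
        mass_to_residues.setdefault(mass, []).append(aa)
    ordered = sorted(vertices)
    edges = []
    for i, u in enumerate(ordered):
        for v in ordered[i + 1:]:
            # Residues with equal mass (L/I, K/Q here) each get their own edge.
            for aa in mass_to_residues.get(v - u, []):
                edges.append((u, v, aa))
    return edges


print(spectrum_graph_edges([0, 57, 154, 301, 415, 486]))
# [(0, 57, 'G'), (57, 154, 'P'), (154, 301, 'F'), (301, 415, 'N'), (415, 486, 'A')]
```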
![Page 32: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/32.jpg)
Paths
Paths in the labeled graph spell out amino acid sequences.
There are many paths; how do we find the correct one?
We need scoring to evaluate paths
![Page 33: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/33.jpg)
Path Score
p(P,S) = probability that peptide P produces spectrum S= {s1,s2,…sq}
p(P, s) = the probability that peptide P generates a peak s
Scoring = computing probabilities
$p(P,S) = \prod_{s \in S} p(P,s)$
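The product over peaks is usually computed as a sum of logarithms to avoid numerical underflow. The sketch below assumes a deliberately crude model for p(P, s): a peak explained by the peptide's theoretical spectrum has probability `q_match`, and any other peak has probability `q_noise`. Both values and the function name are hypothetical; a real scoring scheme would learn per-ion-type probabilities from annotated spectra.

```python
import math


def log_score(spectrum, theoretical, q_match=0.7, q_noise=0.05, tol=0.5):
    """log p(P, S) as the sum of log p(P, s) over the peaks s of S."""
    total = 0.0
    for s in spectrum:
        explained = any(abs(s - t) <= tol for t in theoretical)
        total += math.log(q_match if explained else q_noise)
    return total


spectrum = [57, 154, 200]
good = log_score(spectrum, theoretical=[57, 154, 301])  # explains two peaks
poor = log_score(spectrum, theoretical=[71, 185, 332])  # explains none
print(good > poor)
# True
```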
![Page 34: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/34.jpg)
Finding Optimal Paths in the Spectrum Graph
For a given MS/MS spectrum S, find a peptide P’ maximizing p(P,S) over all possible peptides P:
Peptides = paths in the spectrum graph
P’ = the optimal path in the spectrum graph
$p(P',S) = \max_{P} p(P,S)$
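Because every edge goes from a smaller mass to a larger one, the spectrum graph is acyclic, and an optimal path can be found by dynamic programming over the vertices in increasing mass order. The sketch below assumes additive vertex scores (for example, log-probabilities) supplied in a dictionary; the function name, the toy graph and the scores are illustrative only.

```python
def best_path(vertices, edges, vertex_score):
    """Highest-scoring path from the smallest to the largest vertex.

    edges: iterable of (u, v, amino_acid) with u < v.
    Returns (score, peptide), or None if the terminal vertex is unreachable.
    """
    ordered = sorted(vertices)  # increasing mass is a topological order
    start, end = ordered[0], ordered[-1]
    incoming = {}
    for u, v, aa in edges:
        incoming.setdefault(v, []).append((u, aa))
    best = {start: (vertex_score.get(start, 0.0), "")}
    for v in ordered[1:]:
        for u, aa in incoming.get(v, []):
            if u not in best:
                continue  # u cannot be reached from the start vertex
            candidate = best[u][0] + vertex_score.get(v, 0.0)
            if v not in best or candidate > best[v][0]:
                best[v] = (candidate, best[u][1] + aa)
    return best.get(end)


vertices = [0, 57, 71, 154, 301, 415, 486]
edges = [(0, 57, "G"), (0, 71, "A"), (57, 154, "P"),
         (154, 301, "F"), (301, 415, "N"), (415, 486, "A")]
scores = {v: 1.0 for v in vertices if v != 0}
print(best_path(vertices, edges, scores))
# (5.0, 'GPFNA')
```

The running time is linear in the number of edges, which is why this search does not need to enumerate all peptides.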
![Page 35: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/35.jpg)
De Novo vs. Database Search: A Paradox
The database of all peptides is huge, ≈ O(20^n).
The database of all known peptides is much smaller, ≈ O(10^8).
However, de novo algorithms can be much faster, even though their search space is much larger!
A database search scans all peptides in the database of known peptides to find the best one.
De novo eliminates the need to scan the database of all peptides by modeling the problem as a graph search.
But de novo sequencing is still not very accurate!
![Page 36: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/36.jpg)
Sequencing of Modified Peptides
De novo peptide sequencing is invaluable for identification of unknown proteins.
However, de novo algorithms are designed for working with high quality spectra with good fragmentation and without modifications.
Another approach is to compare a spectrum against a set of known spectra in a database.
![Page 37: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/37.jpg)
Post-Translational Modifications
Proteins are involved in cellular signaling and metabolic regulation.
They are subject to a large number of biological modifications.
Almost all protein sequences are post-translationally modified and 200 types of modifications of amino acid residues are known.
![Page 38: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/38.jpg)
Examples of Post-Translational Modification
Post-translational modifications increase the number of “letters” in amino acid alphabet and lead to a combinatorial explosion in both database search and de novo approaches.
![Page 39: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/39.jpg)
Identification of Peptides with Mutations: Challenge
Very similar peptides may have very different spectra (so SPC won’t work)!
Goal: Define a notion of spectral similarity that correlates well with the sequence similarity.
If peptides are a few mutations/modifications apart, the spectral similarity between their spectra should be high.
![Page 40: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/40.jpg)
Similar Peptides with Different Spectra
S(PRTEIN) = {98, 133, 246, 254, 355, 375, 476, 484, 597, 632} (no mutations, SPC=10)
S(PRTEYN) = {98, 133, 254, 296, 355, 425, 484, 526, 647, 682} (1 mutation, SPC=5)
S(PGTEYN) = {98, 133, 155, 256, 296, 385, 425, 526, 548, 583} (2 mutations, SPC=2)
Problem: SPC diminishes very quickly as the number of mutations increases. (Only a small portion of the correlations between the spectra is captured by SPC.)
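The three spectra and their SPC values can be reproduced with a short script. The offsets used here (prefix mass + 1 and suffix mass + 19, with rounded integer residue masses) are inferred from the numbers on the slide rather than stated on it, so treat them as an assumption; the function names are illustrative.

```python
# Rounded integer residue masses for the residues used in this example.
RESIDUE_MASS = {"P": 97, "R": 156, "T": 101, "E": 129,
                "I": 113, "N": 114, "Y": 163, "G": 57}


def slide_spectrum(peptide):
    """Prefix masses + 1 and suffix masses + 19 (offsets inferred from the slide)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    peaks = set()
    for i in range(1, len(masses)):
        peaks.add(sum(masses[:i]) + 1)   # N-terminal fragment
        peaks.add(sum(masses[i:]) + 19)  # C-terminal fragment
    return peaks


def spc(a, b):
    """Shared peak count: number of masses common to both spectra."""
    return len(a & b)


reference = slide_spectrum("PRTEIN")
for peptide in ("PRTEIN", "PRTEYN", "PGTEYN"):
    print(peptide, spc(reference, slide_spectrum(peptide)))
# PRTEIN 10
# PRTEYN 5
# PGTEYN 2
```

A single substitution moves every fragment that contains the changed residue, which is why half of the shared peaks disappear after one mutation.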
![Page 41: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/41.jpg)
Search for Modified Peptides: Virtual Database Approach
Yates et al., 1995: an exhaustive search in a virtual database of all modified peptides.
Exhaustive search leads to a large combinatorial problem, even for a small set of modification types.
Problem (Yates et al., 1995): extend the virtual database approach to a large set of modifications.
![Page 42: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/42.jpg)
Spectral Convolution
$$S_2 \ominus S_1 = \{\, s_2 - s_1 : s_1 \in S_1,\ s_2 \in S_2 \,\}$$
Number of pairs $s_1 \in S_1$, $s_2 \in S_2$ with $s_2 - s_1 = x$: $(S_2 \ominus S_1)(x)$
The shared peaks count (SPC peak): $(S_2 \ominus S_1)(0)$
Convolution is a mathematical operation on two functions that produces a third function, typically viewed as a modified version of one of the original functions; compare cross-correlation.
![Page 43: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/43.jpg)
Elements of S2 ⊖ S1 represented as elements of a difference matrix. The elements with multiplicity >2 are colored; the elements with multiplicity =2 are circled. The SPC takes into account only the red entries.
![Page 44: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/44.jpg)
Spectral Comparison: Difficult Case
S = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100}
Which of the spectra
S’ = {10, 20, 30, 40, 50, 55, 65, 75,85, 95}
or
S” = {10, 15, 30, 35, 50, 55, 70, 75, 90, 95}
fits the spectrum S the best?
SPC: both S’ and S” have 5 peaks in common with S.
Spectral Convolution: reveals the peaks at 0 and 5.
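The SPC and the convolution peaks for this example can be checked with a few lines. The sketch treats the spectral convolution of two spectra as the multiset of all pairwise mass differences, so that the value at x is the number of pairs whose difference is x and the value at 0 is the SPC; the function name is illustrative.

```python
from collections import Counter


def spectral_convolution(s2, s1):
    """Multiset of differences b - a over all b in s2 and a in s1."""
    return Counter(b - a for b in s2 for a in s1)


S = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
S1 = [10, 20, 30, 40, 50, 55, 65, 75, 85, 95]   # S' on the slide
S2 = [10, 15, 30, 35, 50, 55, 70, 75, 90, 95]   # S" on the slide

for name, other in (("S'", S1), ('S"', S2)):
    conv = spectral_convolution(S, other)
    print(name, "x=0:", conv[0], "x=5:", conv[5])
# S' x=0: 5 x=5: 5
# S" x=0: 5 x=5: 5
```

Both candidate spectra give the same counts at 0 and at 5, so these two peaks alone cannot tell them apart.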
![Page 45: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/45.jpg)
Spectral Comparison: Difficult Case
[Figure: spectral convolution of S with S’ and spectral convolution of S with S’’.]
![Page 46: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/46.jpg)
Limitations of the Spectrum Convolutions
Spectral convolution does not reveal that spectra S and S’ are similar, while spectra S and S” are not.
Clumps of shared peaks: the matching positions in S’ come in clumps while the matching positions in S” don't.
This important property was not captured by spectral convolution.
![Page 47: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/47.jpg)
Shifts
A = {a1 < … < an} : an ordered set of natural numbers.
A shift (i, Δ) is characterized by two parameters,
the position (i) and the length (Δ).
The shift (i, Δ) transforms
{a1, …, an}
into
{a1, …, ai-1, ai+Δ, …, an+Δ}
![Page 48: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/48.jpg)
Shifts: An Example
The shift (i, Δ) transforms {a1, …, an}
into {a1, …, ai-1, ai+Δ, …, an+Δ}
e.g.
10 20 30 40 50 60 70 80 90
shift (4, -5)
10 20 30 35 45 55 65 75 85
shift (7, -3)
10 20 30 35 45 55 62 72 82
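The shift operation is small enough to write out and check against the example. The sketch uses 1-based positions to match the slide notation; the function name `apply_shift` is illustrative.

```python
def apply_shift(a, i, delta):
    """Add delta to every element from position i (1-based) to the end."""
    return a[:i - 1] + [x + delta for x in a[i - 1:]]


a = [10, 20, 30, 40, 50, 60, 70, 80, 90]
b = apply_shift(a, 4, -5)
c = apply_shift(b, 7, -3)
print(b)  # [10, 20, 30, 35, 45, 55, 65, 75, 85]
print(c)  # [10, 20, 30, 35, 45, 55, 62, 72, 82]
```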
![Page 49: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/49.jpg)
Spectral Alignment Problem
Find a series of k shifts that make the sets
A={a1, …., an} and B={b1,….,bn}
as similar as possible.
k-similarity between sets
D(k) - the maximum number of elements in common between sets after k shifts.
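D(k) can be computed by dynamic programming over pairs (a_i, b_j). The sketch below is a straightforward, unoptimized version under stated assumptions: all masses are positive, a virtual origin (0, 0) is added to both sets, a step between two pairs is free when it stays on the same diagonal (equal mass differences) and costs one shift otherwise. The function name and the choice of test data (the two sets from the shift example) are illustrative.

```python
def k_similarity(a, b, max_k):
    """Return [D(0), ..., D(max_k)] for two sets of positive masses."""
    neg_inf = float("-inf")
    origin = (0, 0)
    # Sorting by the first coordinate guarantees predecessors come earlier.
    points = [origin] + [(x, y) for x in sorted(a) for y in sorted(b)]
    best = {(p, k): (0 if p == origin else neg_inf)
            for p in points for k in range(max_k + 1)}
    for x, y in points[1:]:
        for k in range(max_k + 1):
            for px, py in points:
                if px >= x or py >= y:
                    continue  # a predecessor must be smaller in both spectra
                if x - px == y - py:
                    candidate = best[(px, py), k] + 1      # same diagonal
                elif k > 0:
                    candidate = best[(px, py), k - 1] + 1  # uses one shift
                else:
                    continue
                if candidate > best[(x, y), k]:
                    best[(x, y), k] = candidate
    return [max(best[p, k] for p in points) for k in range(max_k + 1)]


A = [10, 20, 30, 40, 50, 60, 70, 80, 90]
B = [10, 20, 30, 35, 45, 55, 62, 72, 82]  # A after shifts (4, -5) and (7, -3)
print(k_similarity(A, B, 2))
# [3, 6, 9]
```

This version takes time proportional to k times the square of the number of pairs, which is fine for a toy example; practical implementations track the best value per diagonal to avoid the inner scan.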
![Page 50: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/50.jpg)
Representing Spectra in 0-1 Alphabet
Convert spectrum to a 0-1 string with 1s corresponding to the positions of the peaks.
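The conversion can be sketched as follows, assuming integer masses and one string position per mass unit (position 1 is mass 1). The function name and the tiny example spectrum are illustrative.

```python
def to_binary_string(spectrum, length):
    """0-1 string of the given length with 1s at the peak positions (1-based)."""
    peaks = set(spectrum)
    return "".join("1" if pos in peaks else "0" for pos in range(1, length + 1))


original = to_binary_string([2, 5], 6)
shifted = to_binary_string([2, 7], 8)  # second peak moved by +2
print(original)  # 010010
print(shifted)   # 01000010
```

Moving the second peak by +2 is the same as inserting the block "00" in front of it, which is the view of modifications that the alignment formulation relies on.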
![Page 51: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/51.jpg)
Comparing Spectra = Comparing 0-1 Strings
A modification with positive offset corresponds to inserting a block of 0s.
A modification with negative offset corresponds to deleting a block of 0s.
Comparison of theoretical and experimental spectra (represented as 0-1 strings) corresponds to a (somewhat unusual) edit distance/alignment problem where the elementary edit operations are insertions/deletions of blocks of 0s.
Use sequence alignment algorithms!
![Page 52: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/52.jpg)
Spectral Alignment vs. Sequence Alignment
Manhattan-like graph with different alphabet and scoring.
Movement can be diagonal (matching masses) or horizontal/vertical (insertions/deletions corresponding to PTMs).
At most k horizontal/vertical moves.
![Page 53: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/53.jpg)
Use of k-Similarity
SPC reveals only D(0)=3 matching peaks.
Spectral alignment reveals more hidden similarities between spectra, D(1)=5 and D(2)=8, and detects the corresponding mutations.
![Page 54: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/54.jpg)
Protein Identification
We can detect peptides from mass spectra by database search or de novo approaches.
Homologous proteins
![Page 55: Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d885503460f94a6cd82/html5/thumbnails/55.jpg)
References
Mass spectrometry-based proteomics, Nature 422:198, 2003
Applying mass spectrometry-based proteomics to genetics, genomics and network biology, Nature Reviews Genetics 10, 617-627, 2009