Error tolerant search


Upload: eleanore-bond

Post on 17-Jan-2018


TRANSCRIPT

Page 1: Error tolerant search Large number of spectra remain without significant score. Reasonable number of fragment ion peaks might have not match. – Underestimated

Error tolerant search
• A large number of spectra remain without a significant score. A reasonable number of fragment ion peaks might not have matched. Possible reasons:
  – Underestimated mass measurement error (should be visible in the peptide view graphs)
  – Incorrect determination of the precursor charge state
  – Peptide sequence is not in the database
  – Missed cleavage or unexpected cleavage
  – Unexpected chemical or post-translational modification
• The biological structure, function and activity of a protein can be determined by the modifications of that protein.
• An increasing share of the proteins that have been mapped to, e.g., different diseases change not only in expression level but also, or exclusively, in the level of post-translational modifications.

Page 2

Post-Translational Modifications (PTMs)
• A PTM alters the mass of an amino acid and of the peptide that contains it, which results in peak shifts in the spectrum:

b1: H
b2: HQ
b3: HQS
b4: HQSV
b5: HQSVM
…
b10: HQSVMVGMVQ

y10: QSVMVGMVQK
y9: SVMVGMVQK
y8: VMVGMVQK
y7: MVGMVQK
y6: VGMVQK
…
y1: K

[Figure: spectrum over an m/z axis (ticks at 200, 400, 1000) with labelled b and y ion peaks, and the peptide H Q S V M V G M V Q K annotated with its fragment ions b1 to b10 and y10 to y1]
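The peak shifts can be made concrete with a small calculation. The following is a minimal sketch, not taken from the slides: it assumes singly charged b and y ions, standard monoisotopic residue masses, and, purely as an illustrative choice, an oxidation (+15.994915 Da) on the first methionine of HQSVMVGMVQK. The function name `fragment_ions` and the `mods` argument are hypothetical.

```python
# Monoisotopic residue masses (Da) for the residues of the example peptide.
MONO = {
    "H": 137.05891, "Q": 128.05858, "S": 87.03203, "V": 99.06841,
    "M": 131.04049, "G": 57.02146, "K": 128.09496,
}
PROTON = 1.007276   # mass of a proton
WATER = 18.010565   # mass of H2O, carried by every y ion


def fragment_ions(peptide, mods=None):
    """Return (b, y): m/z lists of singly charged b1..b(n-1) and y1..y(n-1).

    mods maps a 0-based residue index to a mass shift in Da (hypothetical
    helper argument used only for this illustration).
    """
    mods = mods or {}
    masses = [MONO[aa] + mods.get(i, 0.0) for i, aa in enumerate(peptide)]
    n = len(masses)
    b = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]
    return b, y


peptide = "HQSVMVGMVQK"
b_plain, y_plain = fragment_ions(peptide)
b_ox, y_ox = fragment_ions(peptide, {4: 15.994915})  # oxidised M at position 5

for i in range(len(b_plain)):
    print(f"b{i + 1}: {b_plain[i]:9.4f} -> {b_ox[i]:9.4f}   "
          f"y{i + 1}: {y_plain[i]:9.4f} -> {y_ox[i]:9.4f}")
```

Under these assumptions only the fragments that contain the modified residue move: b5 to b10 and y7 to y10 shift by the modification mass, while b1 to b4 and y1 to y6 stay where they were.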

Page 3

PTMs

• Complete modifications (chemical modifications)

• Variable modifications

Page 4

PTMs
• Obstacles
  – Complexity (means longer execution time)
    • Can increase the search space 1-, 10-, ... 10000-fold
  – Significance

Page 5

Obstacles - Complexity
• Let the theoretical peptide be:
  – HQSVMVGMVQK (11 amino acids)
  – Each amino acid can be modified by, let's say, 5 PTMs

# included PTMs | # modified theoretical spectra | time (at 1 second per spectrum)
0               | 1                              | 1 sec
1               | 11*5 = 55                      | 55 seconds (1 min)
2               | 55*25 = 1375                   | 23 mins
3               | 11*15*125 = 20625              | 5.7 hours
...
10              | 11*9765625 = 107421875         | 29839 hours (3.4 years)

In general, with peptide length = L, included PTMs = K and PTMs/aa = M, the number of modified theoretical spectra is

$$\binom{L}{K} M^{K}, \qquad \text{e.g. } \binom{11}{10} \cdot 5^{10} = 11 \cdot 9765625 = 107421875 .$$
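The growth can be checked numerically. The sketch below assumes the count is "choose K of the L positions, then one of M PTMs for each chosen position", i.e. C(L, K) * M^K, and that scoring one theoretical spectrum takes one second; the function name `modified_spectra` is made up for this example. For K = 2 this count is C(11, 2) * 5^2 = 55 * 25 = 1375.

```python
from math import comb


def modified_spectra(length, n_mods, mods_per_residue):
    """Number of theoretical spectra with exactly n_mods modified residues."""
    return comb(length, n_mods) * mods_per_residue ** n_mods


# Peptide HQSVMVGMVQK: L = 11 residues, M = 5 candidate PTMs per residue.
for k in (0, 1, 2, 3, 10):
    count = modified_spectra(11, k, 5)
    hours = count / 3600  # assumption: 1 second per theoretical spectrum
    print(f"K={k:2d}  spectra={count:>11d}  time={hours:10.2f} h")
```

The binomial factor peaks around K = L/2, but the factor M^K keeps growing, which is why even a short peptide becomes impractical once several PTMs are allowed at the same time.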

Page 6

– Inserting many PTMs makes the theoretical spectra too flexible, and in the end all theoretical spectra can be aligned to the experimental spectra.

Page 7

Significance
• Increases the random matches
  – Inserting many PTMs makes the theoretical spectra too flexible, and in the end all theoretical spectra can be aligned to the experimental spectra.

[Figure: frequency versus score, showing the probability distribution of random scores and the probability distribution of correct scores (curves labelled A and B), the markers T and h on the score axis, and the p-value of hit h]
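Why a larger search space costs significance can be illustrated with a back-of-the-envelope calculation. This is a simplified model, not something stated on the slide: it assumes every candidate theoretical spectrum is an independent random match with the same small probability `p_single` of scoring above the threshold. The function name is hypothetical.

```python
def prob_random_hit(p_single, n_candidates):
    """Probability that at least one of n_candidates random matches
    scores above the threshold, assuming independent candidates."""
    return 1.0 - (1.0 - p_single) ** n_candidates


# The same per-candidate p-value becomes much less convincing as the number
# of PTM variants tried against one spectrum grows.
for n in (1, 55, 1375, 20625):
    print(f"{n:6d} candidates -> P(random hit) = {prob_random_hit(1e-4, n):.4f}")
```

In this model a score that is convincing against a single candidate is close to worthless once tens of thousands of modified variants have been tried against the same spectrum.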

Page 8

Computational Identification of PTMs
• 3 approaches:
  – Targeted
  – Untargeted, also called restricted
  – Unrestricted, de novo, blind search

Page 9

Targeted approach
• Almost all search engines support it.
  – The experimenter needs to guess the PTMs in the sample.
• Two-pass strategy
  – Two rounds, refinement on a smaller
  – Sequest, Mascot

Page 10

Targeted approach – X!Tandem

Page 11

Targeted approach – InsPecT

Page 12

Untargeted approaches
• Uses a big list of modifications taken from databases
  – The search space is limited but can be very large.
  – If we allow 5 of the 10 most frequent modifications to occur in a peptide at the same time, the search space grows by 3 orders of magnitude.
  – The growth is more dramatic if, instead of 10 types of modifications, we wish to consider all of the roughly 500 known types.

Page 13

Database of PTMs
• Unimod
  – http://www.unimod.org
  – Contains 906 modifications
• Resid
  – http://www.ebi.ac.uk/RESID
  – 559 entries

Page 14

Untargeted
• PILOT_PTM
  – Uses a large dataset of modifications.
  – Binary linear programming.
    • The objective function is the number of matched peaks.
    • Linear constraint functions guarantee meaningful modifications of the peptide.

Page 15

Unrestricted
• No a priori information about PTMs.
• De novo identification of PTMs.
• The search space is infinite.
• In practice no more than one or two PTMs can be identified on the same peptide.

Page 16

TwinPeaks approach
• Based on the Sequest idea.
• Shifts the experimental spectra over a range, and plots the similarity score as a function of the mass shift.
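The shift-and-score idea can be sketched on toy data. This is only an illustration of scoring as a function of mass shift, not the published TwinPeaks implementation: peaks are plain numbers, the score is the summed intensity of experimental peaks that land on a theoretical peak, and the function name, tolerance and peak lists are invented for the example.

```python
def shift_scores(theoretical, experimental, shifts, tol=0.5):
    """Score each mass shift d by the summed intensity of experimental peaks
    that fall within tol of a theoretical peak after being moved by -d.

    theoretical: list of m/z values
    experimental: list of (m/z, intensity) pairs
    """
    scores = {}
    for d in shifts:
        total = 0.0
        for mz, intensity in experimental:
            if any(abs((mz - d) - t) <= tol for t in theoretical):
                total += intensity
        scores[d] = total
    return scores


# Toy spectrum: the first two peaks are unmodified, the last two carry +16.
theoretical = [100.0, 200.0, 300.0, 400.0]
experimental = [(100.0, 1.0), (200.0, 1.0), (316.0, 1.0), (416.0, 1.0)]

scores = shift_scores(theoretical, experimental, range(-20, 21))
for d, s in scores.items():
    if s > 0:
        print(f"shift {d:+d}: matched intensity {s}")
```

On this toy input the score curve has two peaks, one at shift 0 from the unmodified fragments and one at +16 from the fragments that contain the modification, and the second peak points at the mass of the unknown modification.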

Page 17

TwinPeaks approach

[Figure: plot with y-axis "Sum of matched intensity"]

Page 18

MS-Alignment

• Based on the alignment of the theoretical spectra to the experimental spectra
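Aligning a theoretical spectrum to an experimental one can be pictured as a path through a grid of peak pairs, where a change of mass offset along the path corresponds to a modification. The dynamic programme below is a deliberately simplified sketch of that idea and is not MS-Alignment's actual algorithm: it assumes both peak lists are sorted and start with a peak at mass 0, and it only counts matched peak pairs. The function name and the toy peak lists are hypothetical.

```python
def spectral_alignment(theo, expt, max_shifts, tol=0.01):
    """Largest number of peak pairs on a monotone path that starts at
    (theo[0], expt[0]) and changes its mass offset at most max_shifts times.

    Assumes theo and expt are sorted and both begin with a peak at mass 0.
    """
    n, m = len(theo), len(expt)
    NEG = float("-inf")
    # best[i][j][s]: most pairs on a path ending at (theo[i], expt[j])
    # that used exactly s offset changes.
    best = [[[NEG] * (max_shifts + 1) for _ in range(m)] for _ in range(n)]
    best[0][0][0] = 1
    answer = 1
    for i in range(1, n):
        for j in range(1, m):
            offset = expt[j] - theo[i]
            for ip in range(i):
                for jp in range(j):
                    same = abs(offset - (expt[jp] - theo[ip])) <= tol
                    for s in range(max_shifts + 1):
                        prev = best[ip][jp][s]
                        if prev == NEG:
                            continue
                        ns = s if same else s + 1
                        if ns <= max_shifts and prev + 1 > best[i][j][ns]:
                            best[i][j][ns] = prev + 1
            answer = max(answer, max(best[i][j]))
    return answer


theo = [0.0, 100.0, 200.0, 300.0, 400.0]
expt = [0.0, 100.0, 200.0, 316.0, 416.0]  # last two peaks shifted by +16
print(spectral_alignment(theo, expt, max_shifts=0))
print(spectral_alignment(theo, expt, max_shifts=1))
```

With no offset change allowed only the unshifted peaks can be paired, while allowing a single change lets the path pick up the shifted peaks as well. The quartic loop is fine for a toy but far too slow for real spectra.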

Page 19

[Figure: alignment plot with axes "Theoretical Spectrum" and "Experimental Spectrum"]

Page 20

MS-alignment

Page 21

Comparison of targeted and unrestricted results

X!Tandem targeted

Scan ID | log(-E) | Peptide
3.1.1   | -13.8   | fqyr295 ILTAAALCHF TSIEVVK 311kasg (130)
6.1.1   | -6.6    | rihr159 FVEKPQVFVS NK 170inag (471)
11.1.1  | -3.4    | rtcr30 SPEPGPSSSI GSPQASSPPR PN 51hyll (48)
12.1.1  | -4.0    | dvtr473 TMHFGTPTAY EK 484ecft (306)
13.1.1  | -10.0   | ietk133 FFDDDLLVST SR 144vrlf (176)
24.1.1  | -4.2    | pskr237 QTNGCLNGYT PSR 249krqa (112)
25.1.1  | -2.5    | ntpr149 KNGGLGHMNI ALLSDLTK 166qisr (1776)
27.1.1  | -7.4    | pqgr19 IHQIEYAMEA VK 30qgsa (10317)
31.1.1  | -2.0    | kefk80 DREDLVPYTG EK 91rgkv (137)
34.1.1  | -1.6    | dyhr131 YLAEFATGND R 141keaa (9406)
35.1.1  | -7.0    | grar16 QYTSPEEIDA QLQAEK 31qkar (2754)
36.1.1  | -2.0    | rlar172 QDPQLHPEDP ER 183raai (644)
37.1.1  | -8.1    | iflh92 ISDVEGEYVP VEGDEVTYK 110mcsi (73)
38.1.1  | -3.9    | mrsr328 TASGSSVTSL DGTR 341srsh (2698)
40.1.1  | -3.7    | lgnk29 YVQLNVGGSL YYTTVR 44altr (71)
42.1.1  | -1.9    | dlqk183 EGEFSTCFTE LQR 195dflk (239)
45.1.1  | -2.9    | pkek135 QPVAGSEGAQ YR 146kkql (694)
46.1.1  | -10.3   | lsar446 ASNAWILQQH IATVPSLTHL CR 467leir (107)
53.1.1  | -6.8    | evyr175 NSMPASSFQQ QK 186lrvc (7099)
57.1.1  | -4.7    | iygk81 QFEDELHPDL K 91ftga (491)

MS-Alignment, unrestricted (de novo)

Scan ID | P-value     | Peptide
3       | 1.00E-05    | R.ILTAAALCHFTSIEVVK.K
6       | 1.00E-05    | R.FVEKPQVFVSNK.I
13      | 1.00E-05    | K.FFDDDLLVSTSR.V
27      | 1.00E-05    | R.IHQIEYAMEAVK.Q
47      | 0.028806584 | A.V+172LTAFANGR.S
57      | 1.00E-05    | K.QFEDELHPDLK.F
58      | 0.004739336 | R.ETFY+18LAQDFFDR.F
59      | 1.00E-05    | R.TCLSQLLDIMK.S
71      | 1.00E-05    | K.EYFSTFGEVLM+16VQVK.K
75      | 1.00E-05    | K.QH-18LENDPGSNEDTDIPK.G
97      | 0.004672897 | Q.L+128GVSHVFEYIR.S
98      | 0.004830918 | C.T+160EDMTEDELR.E
99      | 1.00E-05    | R.EFFD-18SNGNFLYR.I
100     | 1.00E-05    | R.LVLESPAPVEVNLK.L
105     | 1.00E-05    | K.LQEFAYVTDGAC+14SEEDILR.M
108     | 1.00E-05    | K.SFDENGFDYLLTYSDNPQTVFP+156.R
115     | 1.00E-05    | R.GPATVEDLPSAFEEK.A
119     | 1.00E-05    | Y.ITD+163VLTEEDALEILQK.G
147     | 1.00E-05    | R.IYSYQMALTPVVVTLWYR.A
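When comparing the two result lists it helps to pull the mass shifts out of the MS-Alignment peptide strings. The parser below is a small sketch under two assumptions that the slide does not state explicitly: the strings have the form `prefix.PEPTIDE.suffix`, and a signed integer such as `+16` or `-18` belongs to the residue immediately before it. The function name `parse_annotation` is made up for this example.

```python
import re

# One residue letter, optionally followed by a signed integer mass shift.
RESIDUE = re.compile(r"([A-Z])([+-]\d+)?")


def parse_annotation(annotated):
    """Split e.g. 'K.EYFSTFGEVLM+16VQVK.K' into the bare sequence and a list
    of (1-based position, residue, mass shift) tuples."""
    _prefix, core, _suffix = annotated.split(".")
    sequence = []
    shifts = []
    for position, match in enumerate(RESIDUE.finditer(core), start=1):
        residue, delta = match.groups()
        sequence.append(residue)
        if delta is not None:
            shifts.append((position, residue, int(delta)))
    return "".join(sequence), shifts


for hit in ("K.EYFSTFGEVLM+16VQVK.K",
            "K.QH-18LENDPGSNEDTDIPK.G",
            "R.GPATVEDLPSAFEEK.A"):
    print(parse_annotation(hit))
```

Listing the shifts this way makes it easier to separate familiar values, such as +16 on M, from unusual ones that deserve a closer look before they are reported.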

Page 22

Validate your results

Page 23

Summary
• What you should remember:
  – PTM identification is computationally expensive
  – 3 approaches (targeted, untargeted, unrestricted)
  – Always examine the results, omit weird PTMs
  – Searching for PTMs decreases the statistical significance
  – The more you are looking for, the less you get (due to significance)