PepArML: A model-free, result-combining peptide identification arbiter via machine learning

Xue Wu, Chau-Wen Tseng, Nathan Edwards
University of Maryland, College Park, and Georgetown University Medical Center

Post on 12-Jan-2016

Page 1

PepArML: A model-free, result-combining peptide identification arbiter via machine learning

Xue Wu, Chau-Wen Tseng, Nathan Edwards

University of Maryland, College Park, and Georgetown University Medical Center

Page 2

Comparison of Search Engines

• No single score is comprehensive

• Search engines disagree

• Many spectra lack confident peptide assignment

• Many spectra lack any peptide assignment

Searle et al. JPR 7(1), 2008

[Figure: Venn diagram of peptide identifications by X! Tandem, SEQUEST, and Mascot. Region percentages: 38%, 14%, 28%, 14%, 3%, 2%, 1%]

Page 3

Black-box Techniques

• Significance re-estimation
  • Target-decoy search
  • Bimodal distribution fit

• Supervised machine learning
  • Train predictors on synthetic datasets
  • Select and/or create (many) good features

• Result combiners
  • Incorrect peptide IDs unlikely to match
  • Significance re-estimation
  • Independence and/or supervised model
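
As an illustration of the target-decoy idea listed above, here is a minimal Python sketch. It assumes a separate decoy search and the simple estimate FDR ≈ (decoy hits) / (target hits) at or above a score threshold. The function name `decoy_fdr` and the input layout are hypothetical and are not taken from PepArML.

```python
def decoy_fdr(hits, threshold):
    """Estimate the FDR among target hits scoring at or above `threshold`.

    hits: list of (score, is_decoy) pairs; a higher score means a better match.
    Assumption: targets and decoys were searched separately, so the number of
    decoy hits above the threshold approximates the number of false targets.
    """
    targets = sum(1 for score, is_decoy in hits if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in hits if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0  # nothing accepted, so there are no false discoveries to report
    return min(1.0, decoys / targets)
```

Lowering the threshold accepts more identifications, and the estimate generally rises as decoy hits start to appear among them.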

Page 4

PepArML

• Unified machine learning result combiner
  • Significance re-estimation too!

• Model-free feature use and result combination
  • Use agreement and features if useful

• Unsupervised training procedure
  • No loss of classification performance

Page 5

PepArML Overview

[Diagram: results from X!Tandem, Mascot, OMSSA, and other search engines feed into PepArML]

Page 6

PepArML Overview

[Diagram: results from X!Tandem, Mascot, OMSSA, and other search engines pass through feature extraction into PepArML]

Page 7

Dataset Construction

[Diagram: one row per spectrum-peptide pair, with feature columns from X!Tandem, Mascot, and OMSSA and a true/false (T/F) label]

(S_1, P_1)  T
(S_1, P_2)  F
(S_2, P_1)  T
…
(S_n, P_m)  T
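
The table above could be assembled roughly as follows. This is a sketch under assumptions: the nested-dictionary input, the function name `build_rows`, and the use of a known peptide set for labels are all hypothetical. It also assumes that the per-engine "sentinel" feature simply flags whether that engine reported the pair, which the slides do not spell out.

```python
ENGINES = ("tandem", "mascot", "omssa")

def build_rows(results, true_peptides):
    """Merge per-engine results into one row per (spectrum, peptide) pair.

    results: {engine: {(spectrum_id, peptide): {feature_name: value}}}
    true_peptides: set of peptides accepted as correct (e.g. from a known protein mix)
    """
    # Every pair reported by at least one engine becomes a row.
    pairs = sorted(set().union(*(results.get(e, {}).keys() for e in ENGINES)))
    rows = []
    for spectrum, peptide in pairs:
        row = {"spectrum": spectrum, "peptide": peptide}
        for engine in ENGINES:
            feats = results.get(engine, {}).get((spectrum, peptide))
            # Assumed meaning of "sentinel": 1 if this engine reported the pair.
            row[engine + "_sentinel"] = 1 if feats is not None else 0
            for name, value in (feats or {}).items():
                row[engine + "_" + name] = value
        row["label"] = peptide in true_peptides
        rows.append(row)
    return rows
```

Features from an engine that did not report a pair are simply absent from that row; a real pipeline would need a policy for filling them in.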

Page 8

Dataset Construction

• Calibrant 8 Protein Mix (C8)
  • 4594 MS/MS spectra (LTQ)
  • 618 (11.2%) true positives

• Sashimi 17mix_test2 (S17)
  • 1389 MS/MS spectra (Q-TOF)
  • 354 (25.4%) true positives

• AURUM 1.0 (364 Proteins)
  • 7508 MS/MS spectra (MALDI-TOF-TOF)
  • 3775 (50.3%) true positives

Page 9

PepArML Machine Learning

• Machine learning (generally) helps single search engines

• PepArML result-combiner (C-TMO) improves on single search engines

• Sometimes combining two search engines works as well as, or better than, combining three

Page 10

PepArML vs Search Engines (C8)

Page 11

True vs. Est. FDR (C-TMO, C8)

Page 12

PepArML vs Search Engines (C8)

Page 13

PepArML Pairs vs PepArML (C8)

Page 14

Sensitivity Comparison

[Figure: bar chart of sensitivity (y-axis, 0.0 to 1.0) at 10% FDR on the C8, S17, and AURUM datasets (x-axis: Classifier (FDR, dataset)). Series: C-TMO, C-TM, C-TO, C-MO, C-T, C-M, C-O, Tandem, Mascot, OMSSA]
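
For readers unfamiliar with the metric, sensitivity at a fixed FDR can be computed on a labeled dataset as sketched below. This is an illustrative assumption about the calculation, not the authors' code: identifications are ranked by score, the longest ranked prefix whose true FDR stays within the limit is accepted, and sensitivity is the fraction of all true identifications inside that prefix. The name `sensitivity_at_fdr` is hypothetical, and tied scores are not treated specially.

```python
def sensitivity_at_fdr(scored, max_fdr=0.10):
    """Fraction of true identifications recovered while true FDR <= max_fdr.

    scored: list of (score, is_true) pairs; a higher score means more confident.
    """
    total_true = sum(1 for _, is_true in scored if is_true)
    if total_true == 0:
        return 0.0
    tp = fp = best_tp = 0
    # Walk down the ranked list and remember the deepest acceptable cutoff.
    for _, is_true in sorted(scored, key=lambda item: item[0], reverse=True):
        if is_true:
            tp += 1
        else:
            fp += 1
        if fp / (tp + fp) <= max_fdr:
            best_tp = tp
    return best_tp / total_true
```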

Page 15

Feature Evaluation

[Figure: three bar-chart panels (C8, S17, AURUM) showing information gain (y-axis, 0 to 0.8) for features 1 to 20, grouped by search engine (Tandem, OMSSA, Mascot)]

Feature key:
1 Peptide length
2 hyperscore
3 precursor mass delta
4 # of matched y-ions
5 # of matched b-ions
6 # of missed cleavages
7 sum matched intensity
8 E-value
9 sentinel
10 score
11 precursor mass delta
12 # of matched ions
13 # of matched peaks
14 # of missed cleavages
15 E-value
16 sentinel
17 p-value
18 # of matched ions
19 E-value
20 sentinel
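
Information gain, the quantity plotted in this figure, measures how much knowing a feature reduces uncertainty about the true/false label. The sketch below computes it for a single threshold split of one numeric feature. How the original analysis discretized features is not stated in the slides, so the threshold split and the names `entropy` and `info_gain` are assumptions.

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of boolean labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)

def info_gain(values, labels, threshold):
    """Entropy of the labels minus their entropy after splitting at `threshold`."""
    left = [lab for val, lab in zip(values, labels) if val <= threshold]
    right = [lab for val, lab in zip(values, labels) if val > threshold]
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - remainder
```

A feature that separates true from false identifications perfectly on a balanced dataset scores 1 bit; a feature unrelated to the label scores close to 0.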

Page 16

Application to Real Data

• How well do these models generalize?

• Different instruments
  • Spectral characteristics change scores

• Search parameters
  • Different parameters change score values

• Supervised learning requires
  • (Synthetic) experimental data from every instrument
  • Search results from available search engines
  • Training/models for all parameters x search engine sets x instruments

Page 17

Model Generalization

Train C8 / Score S17

Train S17 / Score S17

Page 18

Rescuing Machine Learning

• Train a new machine learning model for every dataset!
  • Generalization not required
  • No predetermined search engines, parameters, instruments, features

• Perhaps we can “guess” the true proteins
  • Most proteins not in doubt
  • Machine learning can tolerate imperfect labels
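
One way to read the idea above is as an iterative loop: guess the true proteins, label each spectrum-peptide pair by whether its protein was guessed, train on those imperfect labels, rescore, and repeat until the labels stop changing. The sketch below is an interpretation under that assumption and not the authors' procedure. The function `iterate_labels`, its callback signatures, and the `"protein"` key are hypothetical.

```python
def iterate_labels(pairs, initial_scores, train_and_score, select_proteins,
                   cutoff=0.5, max_rounds=20):
    """Alternate between guessing true proteins and retraining until labels are stable.

    pairs: list of dicts, each with at least a "protein" key
    initial_scores: one starting confidence per pair (e.g. from search engine output)
    train_and_score(pairs, labels) -> list of new confidences, one per pair
    select_proteins(pairs, scores, cutoff) -> set of protein names guessed to be true
    """
    scores = list(initial_scores)
    labels = None
    for _ in range(max_rounds):
        proteins = select_proteins(pairs, scores, cutoff)
        new_labels = [pair["protein"] in proteins for pair in pairs]
        if new_labels == labels:
            break  # converged: the guessed labels no longer change
        labels = new_labels
        # Train on the guessed (imperfect) labels, then rescore every pair.
        scores = train_and_score(pairs, labels)
    return labels, scores
```

Passing the trainer and the protein selector as callbacks keeps the loop independent of any particular classifier or selection rule.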

Page 19

Unsupervised Learning

Page 20

Unsupervised Learning (S17)

Page 21

Unsupervised Learning (S17)

Page 22

Protein Selection Heuristic

• Modeled on typical protein identification criteria
  • High confidence peptide IDs
  • At least 2 non-overlapping peptides
  • At least 10% sequence coverage

• Robust, fast convergence

• Easily enforce additional constraints
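
The criteria above can be expressed as a small check. This is a sketch that assumes the high-confidence peptide IDs have already been mapped to half-open (start, end) positions on the protein sequence; the function name `passes_heuristic` and its parameters are hypothetical.

```python
def passes_heuristic(protein_length, peptide_spans, min_peptides=2, min_coverage=0.10):
    """Check the two structural criteria for accepting a protein.

    protein_length: number of residues in the protein
    peptide_spans: list of (start, end) half-open residue intervals covered by
                   high-confidence peptide identifications
    """
    if protein_length <= 0:
        return False
    # Count a maximal set of non-overlapping peptides (greedy by end position).
    count = 0
    last_end = 0
    for start, end in sorted(peptide_spans, key=lambda span: span[1]):
        if start >= last_end:
            count += 1
            last_end = end
    # Sequence coverage: fraction of residues covered by at least one peptide.
    covered = set()
    for start, end in peptide_spans:
        covered.update(range(start, end))
    coverage = len(covered) / protein_length
    return count >= min_peptides and coverage >= min_coverage
```

Additional constraints could be added as further conditions in the final return expression.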

Page 23

What about real data?

Dr. Rado Goldman (LCCC, GUMC)
• Proteolytic serum peptides from clinical hepatocellular carcinoma samples
• ~200 MALDI MS/MS spectra (TOF-TOF)

PepArML for non-specific search of IPI-Human
• Increase in confidence & sensitivity
• Observation of “ragged” proteolytic trimming

Page 24

Protein Identification Example

[Figure: peptide identifications for one protein, shown per search engine in columns M, T, O. Key (E-value): < 1e-5, < 0.05, Any ID, No ID]

Page 25

Future Directions

• Apply to more experimental datasets
• Integrate
  • novel features
  • new search engines, spectral matching
  • multiple searches with varied parameters, sequence databases
• Construct meta-search engine
• FDR by bimodal fit instead of decoys
• Release as open source
  • http://peparml.sourceforge.org

Page 26

http://PepArML.SourceForge.Net

Page 27

Acknowledgements

• Xue Wu* & Dr. Chau-Wen Tseng
  • Computer Science, University of Maryland, College Park
• Dr. Brian Balgley, Dr. Paul Rudnick
  • Calibrant Biosystems & NIST
• Dr. Rado Goldman, Dr. Yanming An
  • Department of Oncology, Georgetown University Medical Center
• Kam Ho To
  • Biochemistry Masters student, Georgetown University

• Funding: NIH/NCI CPTAC

Page 28

Page 29

PepArML vs Search Engines (S17)

Page 30

PepArML vs Search Engines (S17)

Page 31

PepArML Pairs vs PepArML (C8)

Page 32

PepArML Pairs vs PepArML (S17)

Page 33

PepArML Pairs vs PepArML (S17)

Page 34

Unsupervised Learning (C8)

Page 35

Unsupervised Learning (C8)