An overview of the PRIDE ecosystem of resources and computational tools for mass spectrometry proteomics data
Dr. Juan Antonio Vizcaíno
EMBL-European Bioinformatics Institute, Hinxton, Cambridge, UK

Juan A. Vizcaíno
Mass Spectrometry and Proteomics Congress 2016
London, 15 November 2016

Overview

PRIDE Archive and ProteomeXchange

PRIDE tools

Reuse of public proteomics data

PRIDE added-value resources: PRIDE Cluster and PRIDE Proteomes

What is a proteomics publication in 2016?

Proteomics studies generate potentially large amounts of data and results.

Ideally, a proteomics publication needs to:
Summarize the results of the study
Provide supporting information for reliability of any results reported

Information in a publication:
Manuscript
Supplementary material
Associated data submitted to a public repository

PRIDE (PRoteomics IDEntifications) Archive
http://www.ebi.ac.uk/pride/archive

PRIDE stores mass spectrometry (MS)-based proteomics data:
Peptide and protein expression data (identification and quantification)
Post-translational modifications
Mass spectra (raw data and peak lists)
Technical and biological metadata
Any other related information

Full support for tandem MS approaches

Martens et al., Proteomics, 2005
Vizcaíno et al., NAR, 2016

ProteomeXchange: A Global, distributed proteomics database

Member repositories:
PRIDE (MS/MS data)
MassIVE (MS/MS data)
PASSEL (SRM data)
jPOST (MS/MS data), new in 2016

Data types: raw data (Raw), identification and quantification results (ID/Q), metadata (Meta)

Mandatory raw data deposition since July 2015

Goal: Development of a framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories.

http://www.proteomexchange.org
Vizcaíno et al., Nat Biotechnol, 2014
Deutsch et al., NAR, 2017, in press

ProteomeXchange data workflow (figure)

Input: researchers' results (raw data, metadata)
Receiving repositories: PRIDE, MassIVE and jPOST (MS/MS data, as complete submissions); PASSEL (SRM data); MassIVE (any other workflow, mainly partial submissions)
ProteomeCentral: announces the datasets (flows shown: metadata / manuscript, raw data, results)
Dataset users: journals; PeptideAtlas; research groups (reanalysis of datasets, reprocessed results)

Vizcaíno et al., Nat Biotechnol, 2014
Deutsch et al., NAR, 2017, in press

ProteomeCentral: Centralised portal for all PX datasets
http://proteomecentral.proteomexchange.org/cgi/GetDataset
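As a rough illustration of how dataset accessions relate to the portal address above, the sketch below expands an abbreviated accession range (such as PXD000320-324) and builds one lookup URL per dataset. It rests on assumptions not stated on the slide: that accessions are "PXD" followed by six digits, that a range abbreviates only the trailing digits, and that the portal accepts an `ID` query parameter. The function names are hypothetical.

```python
import re
from urllib.parse import urlencode

# Base address taken from the slide; the "ID" query parameter is an assumption.
PROTEOMECENTRAL = "http://proteomecentral.proteomexchange.org/cgi/GetDataset"

# Assumed accession shape: "PXD" + six digits, optionally "-<trailing digits>".
ACCESSION_RANGE = re.compile(r"^PXD(\d{6})(?:-(\d{1,6}))?$")


def expand_accessions(text: str) -> list[str]:
    """Expand 'PXD000320-324' into the individual accessions it abbreviates."""
    match = ACCESSION_RANGE.match(text.strip())
    if match is None:
        raise ValueError(f"not a PXD accession or range: {text!r}")
    first, suffix = match.group(1), match.group(2)
    if suffix is None:
        return [f"PXD{first}"]
    # The suffix replaces the last len(suffix) digits of the first number.
    last = first[: 6 - len(suffix)] + suffix
    if int(last) < int(first):
        raise ValueError(f"range end precedes range start: {text!r}")
    return [f"PXD{number:06d}" for number in range(int(first), int(last) + 1)]


def dataset_url(accession: str) -> str:
    """Build a ProteomeCentral lookup URL for one accession (assumed 'ID' parameter)."""
    return f"{PROTEOMECENTRAL}?{urlencode({'ID': accession})}"


if __name__ == "__main__":
    for accession in expand_accessions("PXD002319-26"):
        print(dataset_url(accession))
```

Keeping the expansion separate from the URL construction means the accession handling can be checked without any network access.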

ProteomeXchange data workflow (figure, extended with downstream resources)

Input: researchers' results (raw data, metadata)
Receiving repositories: PRIDE, MassIVE and jPOST (MS/MS data, as complete submissions); PASSEL (SRM data); MassIVE (any other workflow, mainly partial submissions)
ProteomeCentral: announces the datasets (flows shown: metadata / manuscript, raw data, results)
Dataset users: journals; UniProt/neXtProt; PeptideAtlas; GPMDB; proteomicsDB; other DBs; research groups (reanalysis of datasets, reprocessed results)
OmicsDI: integration with other omics datasets

Vizcaíno et al., Nat Biotechnol, 2014
Deutsch et al., NAR, 2017, in press

PRIDE: Source of MS proteomics data

PRIDE Archive already provides or will soon provide MS proteomics data to other EMBL-EBI resources such as UniProt, Ensembl and the EBI Expression Atlas.

http://www.ebi.ac.uk/pride/archive

PRIDE Archive: over 4,500 datasets from over 51 countries and 1,700 groups
USA: 814 datasets
Germany: 528
UK: 338
China: 328
France: 222
Netherlands: 175
Canada: 137

Data volume:
Total: ~275 TB
Number of all files: ~560,000
PXD000320-324: ~4 TB
PXD002319-26: ~2.4 TB
PXD001471: ~1.6 TB
1,973 datasets, i.e. 52% of all, are publicly accessible
~90% of all ProteomeXchange datasets

PRIDE Archive growth (chart: submissions per year, all submissions vs. complete submissions)
In the last 12 months: ~165 submitted datasets per month

Top species studied by at least 100 datasets:
2,010 Homo sapiens
604 Mus musculus
191 Saccharomyces cerevisiae
140 Arabidopsis thaliana
127 Rattus norvegicus
>900 reported taxa in total

(> 922 processed by MaxQuant)

Overview

PRIDE Archive and ProteomeXchange

PRIDE tools

Reuse of public proteomics data

PRIDE added-value resources: PRIDE Cluster and PRIDE Proteomes

PRIDE Components: Data Submission Process

Tools: PRIDE Converter 2, PRIDE Inspector, PX Submission Tool
File formats: mzIdentML, PRIDE XML

In addition to PRIDE Archive, the PRIDE team develops and maintains different tools and software libraries to facilitate the handling and visualisation of MS proteomics data and the submission process.

PRIDE Inspector Toolsuite

Wang et al., Nat. Biotechnology, 2012
Perez-Riverol et al., Bioinformatics, 2015
Perez-Riverol et al., MCP, 2016

PRIDE Inspector: standalone tool to enable visualisation and validation of MS data.
Built on top of ms-data-core-api: open source algorithms and libraries for computational proteomics.
Supported file formats: mzIdentML, mzML, mzTab (PSI standards), and PRIDE XML.
Broad functionality.

https://github.com/PRIDE-Utilities/ms-data-core-api
https://github.com/PRIDE-Toolsuite/pride-inspector

Summary and QC charts

Peptide spectra annotation and visualization

PX Submission Tool

Desktop application for data submissions to ProteomeXchange via PRIDE

Implemented in Java 7
Streamlines the submission process
Captures mappings between files
Retains metadata
Fast file transfer with Aspera (FASP transfer technology); FTP also available
Command line option

Submission tool screenshot

Overview

PRIDE Archive and ProteomeXchange

PRIDE tools

Reuse of public proteomics data

PRIDE added-value resources: PRIDE Cluster and PRIDE Proteomes

Datasets are being reused more and more.

Vaudel et al., Proteomics, 2016
Data download volume for PRIDE Archive in 2015: 198 TB

Data sharing in Proteomics

Vaudel et al., Proteomics, 2016

Draft Human proteome papers published in 2014

Wilhelm et al., Nature, 2014
Kim et al., Nature, 2014

Two independent groups claimed to have produced the first complete draft of the human proteome by MS.

Some of their findings are controversial and need further validation but generated a lot of discussion and put proteomics in the spotlight.

They used many different tissues.
Nature cover 29 May 2014

Draft Human proteome papers published in 2014

Wilhelm et al., Nature, 2014
Around 60% of the data used for the analysis comes from previous experiments, most of them stored in proteomics repositories such as PRIDE/ProteomeXchange, PASSEL or MassIVE.

They complement that data with exotic tissues.

Data sharing in Proteomics

Vaudel et al., Proteomics, 2016

Examples of repurposing in proteogenomics

Public datasets from different omics: OmicsDI
http://www.ebi.ac.uk/Tools/omicsdi/
Aims to integrate omics datasets (proteomics, transcriptomics, metabolomics and genomics at present).

Proteomics: PRIDE, MassIVE, jPOST, PASSEL, GPMDB
Transcriptomics: ArrayExpress, Expression Atlas
Metabolomics: MetaboLights, Metabolomics Workbench, GNPS
Genomics: EGA

Perez-Riverol et al., Nat Biotechnol, in press

PRIDE Proteomes provide an across-dataset and quality-filtered view on PRIDE Archive data. Good PSMs are assessed using the PRIDE Cluster approach, based on spectral clustering.

OmicsDI: Portal for omics datasets

Overview

PRIDE Archive and ProteomeXchange

PRIDE tools

Reuse of public proteomics data

PRIDE added-value resources: PRIDE Cluster and PRIDE Proteomes

Added value resources: PRIDE Cluster and PRIDE Proteomes
Condensed, across-dataset, QC-filtered view on PRIDE data.
PRIDE Cluster: Peptide centric.
PRIDE Proteomes: Protein centric (identification data)

Data sharing in Proteomics

Vaudel et al., Proteomics, 2016

PRIDE Cluster

Provides an aggregated, peptide-centric view of PRIDE Archive.
Hypothesis: the same peptide will generate similar MS/MS spectra across experiments.
New version of the spectral clustering algorithm to reliably group spectra coming from the same peptide.
Enables QC of peptide-spectrum matches (PSMs).
Infers reliable identifications by comparing submitted identifications of spectra within a cluster.

After clustering, a representative spectrum is built for all peptides consistently identified across different datasets.
Used to build spectral libraries (for 16 species).
Griss et al., Nat. Methods, 2013
Griss et al., Nat. Methods, 2016
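The idea behind the hypothesis above can be illustrated with a toy example. The sketch below is not the PRIDE Cluster algorithm: it assumes a simple cosine similarity on binned peaks, a greedy single-pass grouping, and a majority vote over submitted peptide identifications. The bin width, the similarity threshold and all function names are illustrative assumptions.

```python
from collections import Counter
from math import sqrt

Peaks = list[tuple[float, float]]  # (m/z, intensity) pairs


def bin_peaks(peaks: Peaks, bin_width: float = 1.0) -> dict[int, float]:
    """Sum intensities into fixed-width m/z bins (bin width is an assumption)."""
    binned: dict[int, float] = {}
    for mz, intensity in peaks:
        key = int(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine(a: dict[int, float], b: dict[int, float]) -> float:
    """Normalised dot product between two binned spectra."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def greedy_cluster(spectra: list[dict[int, float]], threshold: float = 0.7) -> list[list[int]]:
    """Assign each spectrum to the first cluster whose seed spectrum is similar enough."""
    clusters: list[list[int]] = []
    for index, spectrum in enumerate(spectra):
        for cluster in clusters:
            if cosine(spectrum, spectra[cluster[0]]) >= threshold:
                cluster.append(index)
                break
        else:
            clusters.append([index])
    return clusters


def consensus(peptides: list[str]) -> tuple[str, float]:
    """Most frequent submitted identification in a cluster and its share of the PSMs."""
    top, count = Counter(peptides).most_common(1)[0]
    return top, count / len(peptides)


if __name__ == "__main__":
    raw = [
        [(100.2, 10.0), (200.4, 5.0)],
        [(100.3, 9.0), (200.1, 6.0)],
        [(300.5, 8.0), (400.7, 2.0)],
    ]
    print(greedy_cluster([bin_peaks(p) for p in raw]))
    print(consensus(["PEPTIDEK", "PEPTIDEK", "PEPTLDEK"]))
```

A cluster in which nearly all PSMs agree on one peptide supports that identification, while a low share for the top peptide flags the cluster for closer inspection.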

Example: one perfect cluster

880 PSMs give the same peptide ID
4 species
28 datasets
Same instruments

http://wwwdev.ebi.ac.uk/pride/cluster/

PRIDE Proteomes web interface: identification info

Unique/Shared Peptides
Mass spec-based sequence coverage
PTMs detected
Observed tissues
Biological vs Sample Prep PTMs

http://wwwdev.ebi.ac.uk/pride/proteomes/
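Two of the quantities listed above, sequence coverage and the unique/shared status of a peptide, can be computed from peptide identifications alone. The sketch below is a minimal illustration under the assumption of exact substring matching against protein sequences; it is not how PRIDE Proteomes is implemented, and the function names and the example sequences are made up.

```python
def sequence_coverage(protein: str, peptides: list[str]) -> float:
    """Fraction of protein residues covered by at least one identified peptide."""
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for peptide in peptides:
        if not peptide:
            continue
        start = protein.find(peptide)
        while start != -1:  # mark every occurrence, including overlapping ones
            for position in range(start, start + len(peptide)):
                covered[position] = True
            start = protein.find(peptide, start + 1)
    return sum(covered) / len(protein)


def classify_peptides(proteins: dict[str, str], peptides: list[str]) -> dict[str, str]:
    """Label each peptide 'unique', 'shared' or 'unmatched' by how many proteins contain it."""
    labels: dict[str, str] = {}
    for peptide in peptides:
        hits = [accession for accession, sequence in proteins.items() if peptide in sequence]
        if len(hits) == 1:
            labels[peptide] = "unique"
        elif hits:
            labels[peptide] = "shared"
        else:
            labels[peptide] = "unmatched"
    return labels


if __name__ == "__main__":
    proteins = {"P1": "MKTAYIAKQR", "P2": "GGAKQRLLMK"}  # hypothetical sequences
    print(sequence_coverage(proteins["P1"], ["MKTA", "AKQR"]))
    print(classify_peptides(proteins, ["MKTA", "AKQR", "WWWW"]))
```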

Conclusions
PRIDE Archive and ProteomeXchange have become the standard platform for public data deposition in proteomics.
PRIDE Inspector: support for data standards.
PX Submission Tool.
Reuse of public proteomics data is increasing: many opportunities for data miners.
OmicsDI: new platform to identify public datasets coming from different omics technologies (more possibilities for data reuse!).
PRIDE Cluster and PRIDE Proteomes.

Acknowledgements: People
Attila Csordas
Tobias Ternent
Gerhard Mayer (de.NBI)
Johannes Griss
Yasset Perez-Riverol
Manuel Bernal-Llinares
Andrew Jarnuczak
Enrique Perez

Former team members, especially Rui Wang, Florian Reisinger, Noemi del Toro, Jose A. Dianes & Henning Hermjakob

Acknowledgements: The PRIDE Team

All data submitters !!!

@pride_ebi
@proteomexchange

Questions?

http://www.slideshare.net/JuanAntonioVizcaino