EMBL-EBI Now and in the Future
An overview of the PRIDE ecosystem of resources and computational tools for mass spectrometry proteomics data
Dr. Juan Antonio Vizcaíno
EMBL-European Bioinformatics Institute
Hinxton, Cambridge, UK
Juan A. [email protected] Spectrometry and Proteomics Congress 2016
London, 15 November 2016
Overview
PRIDE Archive and ProteomeXchange
PRIDE tools
Reuse of public proteomics data
PRIDE added-value resources: PRIDE Cluster and PRIDE Proteomes
What is a proteomics publication in 2016?
Proteomics studies generate potentially large amounts of data and results.
Ideally, a proteomics publication needs to:
Summarize the results of the study
Provide supporting information for reliability of any results reported
Information in a publication:
Manuscript
Supplementary material
Associated data submitted to a public repository
PRIDE (PRoteomics IDEntifications) Archive
http://www.ebi.ac.uk/pride/archive
PRIDE stores mass spectrometry (MS)-based proteomics data:
Peptide and protein expression data (identification and quantification)
Post-translational modifications
Mass spectra (raw data and peak lists)
Technical and biological metadata
Any other related information
Full support for tandem MS approaches
Martens et al., Proteomics, 2005
Vizcaíno et al., NAR, 2016
ProteomeXchange: A Global, distributed proteomics database
PASSEL (SRM data)
PRIDE (MS/MS data)
MassIVE (MS/MS data)
Raw
ID/Q
Meta
jPOST (MS/MS data)
Mandatory raw data deposition since July 2015
Goal: Development of a framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories.
http://www.proteomexchange.org
New in 2016
Vizcaíno et al., Nat Biotechnol, 2014
Deutsch et al., NAR, 2017, in press
ProteomeXchange data workflow
Diagram labels: Researchers' results; Raw data; Metadata; Receiving repositories; PRIDE; MassIVE; jPOST; MS/MS data (as complete submissions); PASSEL; SRM data; Any other workflow (mainly partial submissions); ProteomeCentral; Metadata / Manuscript; Raw Data; Results; DATASETS; Journals; PeptideAtlas; Research groups; Reanalysis of datasets; Reprocessed results; MassIVE
Vizcaíno et al., Nat Biotechnol, 2014
Deutsch et al., NAR, 2017, in press
ProteomeCentral: Centralised portal for all PX datasets
http://proteomecentral.proteomexchange.org/cgi/GetDataset
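As a rough illustration of how a single PX dataset might be addressed through the portal, the sketch below builds a link from the base URL shown on the slide. The `ID` query parameter and the helper name `dataset_url` are assumptions made for this example and are not taken from the talk; the accession shape (PXD followed by six digits) is inferred from identifiers quoted elsewhere in the deck, such as PXD001471.

```python
import re
from urllib.parse import urlencode

# Base URL as shown on the slide. The "ID" query parameter is an assumption.
PROTEOMECENTRAL_BASE = "http://proteomecentral.proteomexchange.org/cgi/GetDataset"

# Accession shape inferred from identifiers quoted in this talk (e.g. PXD001471).
PXD_PATTERN = re.compile(r"PXD\d{6}")


def dataset_url(accession: str) -> str:
    """Build a ProteomeCentral link for one accession (hypothetical helper)."""
    if not PXD_PATTERN.fullmatch(accession):
        raise ValueError(f"not a PXD accession: {accession!r}")
    return f"{PROTEOMECENTRAL_BASE}?{urlencode({'ID': accession})}"


if __name__ == "__main__":
    print(dataset_url("PXD001471"))
```

The function only formats a string and performs no network request, so the link itself should be checked against the live portal before relying on it.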
ProteomeXchange data workflow
Diagram labels: Researchers' results; Raw data; Metadata; Receiving repositories; PRIDE; MassIVE; jPOST; MS/MS data (as complete submissions); PASSEL; SRM data; Any other workflow (mainly partial submissions); ProteomeCentral; Metadata / Manuscript; Raw Data; Results; DATASETS; Journals; UniProt/neXtProt; PeptideAtlas; Other DBs; GPMDB; proteomicsDB; OmicsDI; Integration with other omics datasets; Research groups; Reanalysis of datasets; Reprocessed results; MassIVE
Vizcaíno et al., Nat Biotechnol, 2014
Deutsch et al., NAR, 2017, in press
PRIDE: Source of MS proteomics data
PRIDE Archive already provides or will soon provide MS proteomics data to other EMBL-EBI resources such as UniProt, Ensembl and the EBI Expression Atlas.
http://www.ebi.ac.uk/pride/archive
PRIDE Archive: over 4,500 datasets from over 51 countries and 1,700 groups
Datasets by country:
USA: 814
Germany: 528
UK: 338
China: 328
France: 222
Netherlands: 175
Canada: 137
Data volume:
Total: ~275 TB
Number of all files: ~560,000
PXD000320-324: ~4 TB
PXD002319-26: ~2.4 TB
PXD001471: ~1.6 TB
1,973 datasets, i.e. 52% of all, are publicly accessible
~90% of all ProteomeXchange datasets
PRIDE Archive growth (chart: submissions per year, all submissions and complete submissions)
In the last 12 months: ~165 submitted datasets per month
Top species studied by at least 100 datasets:
2,010 Homo sapiens
604 Mus musculus
191 Saccharomyces cerevisiae
140 Arabidopsis thaliana
127 Rattus norvegicus
>900 reported taxa in total
(> 922 processed by MaxQuant)
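The data-volume figures quote accession shorthand such as PXD000320-324 and PXD002319-26. The sketch below expands that shorthand into individual accessions. It assumes that the part after the hyphen replaces the trailing digits of the first accession and that the range is inclusive; this is an interpretation of the notation, not something stated on the slide, and `expand_accession_range` is a hypothetical helper name.

```python
from typing import List


def expand_accession_range(shorthand: str) -> List[str]:
    """Expand e.g. 'PXD002319-26' into every accession in the range (hypothetical helper)."""
    start, end_suffix = shorthand.split("-")
    prefix, digits = start[:3], start[3:]
    # Assumption: the suffix overwrites the last len(end_suffix) digits of the start number.
    end_digits = digits[: len(digits) - len(end_suffix)] + end_suffix
    first, last = int(digits), int(end_digits)
    if last < first:
        raise ValueError(f"descending range: {shorthand!r}")
    # Zero-pad back to the original digit count so accessions keep their shape.
    return [f"{prefix}{n:0{len(digits)}d}" for n in range(first, last + 1)]


if __name__ == "__main__":
    for item in ("PXD000320-324", "PXD002319-26"):
        expanded = expand_accession_range(item)
        print(item, "->", len(expanded), "datasets:", ", ".join(expanded))
```

Under this reading the two ranges cover five and eight datasets respectively.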
Overview
PRIDE Archive and ProteomeXchange
PRIDE tools
Reuse of public proteomics data
PRIDE added-value resources: PRIDE Cluster and PRIDE Proteomes
PRIDE Components: Data Submission Process
PRIDE Converter 2
PRIDE Inspector
PX Submission Tool
mzIdentML
PRIDE XML
In addition to PRIDE Archive, the PRIDE team develops and maintains different tools and software libraries to facilitate the handling and visualisation of MS proteomics data and the submission process.
PRIDE Inspector Toolsuite
Wang et al., Nat. Biotechnology, 2012
Perez-Riverol et al., Bioinformatics, 2015
Perez-Riverol et al., MCP, 2016
PRIDE Inspector: standalone tool to enable visualisation and validation of MS data.
Built on top of ms-data-core-api: open source algorithms and libraries for computational proteomics.
Supported file formats: mzIdentML, mzML, mzTab (PSI standards), and PRIDE XML.
Broad functionality.
https://github.com/PRIDE-Utilities/ms-data-core-api
https://github.com/PRIDE-Toolsuite/pride-inspector
Summary and QC charts
Peptide spectra annotation and visualization
PX Submission Tool
Desktop application for data submissions to ProteomeXchange via PRIDE
Implemented in Java 7
Streamlines the submission process
Capture mappings between files
Retain metadata
Fast file transfer with Aspera (FASP transfer technology)
FTP also available
Command line option
Submission tool screenshot
Overview
PRIDE Archive and ProteomeXchange
PRIDE tools
Reuse of public proteomics data
PRIDE added-value resources: PRIDE Cluster and PRIDE Proteomes
Datasets are being reused more and more.
Vaudel et al., Proteomics, 2016
Data download volume for PRIDE Archive in 2015: 198 TB
Data sharing in Proteomics
Vaudel et al., Proteomics, 2016
Draft Human proteome papers published in 2014
Wilhelm et al., Nature, 2014
Kim et al., Nature, 2014
Two independent groups claimed to have produced the first complete draft of the human proteome by MS.
Some of their findings are controversial and need further validation but generated a lot of discussion and put proteomics in the spotlight.
They used many different tissues.
Nature cover 29 May 2014
Draft Human proteome papers published in 2014
Wilhelm et al., Nature, 2014
Around 60% of the data used for the analysis comes from previous experiments, most of them stored in proteomics repositories such as PRIDE/ProteomeXchange, PASSEL or MassIVE.
They complement that data with exotic tissues.
Data sharing in Proteomics
Vaudel et al., Proteomics, 2016
Examples of repurposing in proteogenomics
Public datasets from different omics: OmicsDI
http://www.ebi.ac.uk/Tools/omicsdi/
Aims to integrate omics datasets (proteomics, transcriptomics, metabolomics and genomics at present).
Proteomics: PRIDE, MassIVE, jPOST, PASSEL, GPMDB
Transcriptomics: ArrayExpress, Expression Atlas
Metabolomics: MetaboLights, Metabolomics Workbench, GNPS
Genomics: EGA
Perez-Riverol et al., Nat Biotechnol, in press
PRIDE Proteomes provide an across-dataset and quality-filtered view on PRIDE Archive data. Good PSMs are assessed using the PRIDE Cluster approach, based on spectral clustering.
OmicsDI: Portal for omics datasets
Overview
PRIDE Archive and ProteomeXchange
PRIDE tools
Reuse of public proteomics data
PRIDE added-value resources: PRIDE Cluster and PRIDE Proteomes
Added value resources: PRIDE Cluster and PRIDE Proteomes
Condensed and across-dataset, QC-filtered view on PRIDE data.
PRIDE Cluster: Peptide centric.
PRIDE Proteomes: Protein centric (identification data)
Data sharing in Proteomics
Vaudel et al., Proteomics, 2016
PRIDE Cluster
Provide an aggregated peptide-centric view of PRIDE Archive.
Hypothesis: the same peptide will generate similar MS/MS spectra across experiments.
New version of the spectral clustering algorithm to reliably group spectra coming from the same peptide.
Enables QC of peptide-spectrum matches (PSMs).
Infer reliable identifications by comparing submitted identifications of spectra within a cluster.
After clustering, a representative spectrum is built for all peptides consistently identified across different datasets.
Used to build spectral libraries (for 16 species).
Griss et al., Nat. Methods, 2013
Griss et al., Nat. Methods, 2016
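The two ideas on this slide, comparing spectra by similarity and checking whether the submitted identifications inside a cluster agree, can be sketched roughly as follows. This is not the PRIDE Cluster algorithm, whose details are not given in the talk. It swaps in a plain cosine similarity on binned peaks as a common stand-in, and the 1.0 m/z bin width, the function names and the toy peak lists are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def bin_spectrum(peaks: List[Peak], bin_width: float = 1.0) -> Dict[int, float]:
    """Sum peak intensities into fixed-width m/z bins (bin width is an assumption)."""
    binned: Dict[int, float] = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(a: List[Peak], b: List[Peak], bin_width: float = 1.0) -> float:
    """Normalised dot product of two binned spectra; 0.0 if either is empty."""
    va, vb = bin_spectrum(a, bin_width), bin_spectrum(b, bin_width)
    dot = sum(va[k] * vb[k] for k in va.keys() & vb.keys())
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def consensus_identification(psm_peptides: List[str]) -> Tuple[str, float]:
    """Most frequent submitted peptide in a cluster and the fraction of PSMs that agree."""
    if not psm_peptides:
        raise ValueError("cluster has no PSMs")
    peptide, count = Counter(psm_peptides).most_common(1)[0]
    return peptide, count / len(psm_peptides)


if __name__ == "__main__":
    # Toy spectra: the second is a slightly shifted, rescaled copy of the first.
    spec_a = [(175.1, 30.0), (304.2, 100.0), (417.3, 55.0)]
    spec_b = [(175.2, 28.0), (304.1, 90.0), (417.4, 60.0)]
    print(round(cosine_similarity(spec_a, spec_b), 3))
    # Toy cluster in which three of four submitted identifications agree.
    print(consensus_identification(["PEPTIDEK", "PEPTIDEK", "PEPTIDEK", "PEPTLDEK"]))
```

In this sketch a cluster whose PSMs all carry the same peptide gets an agreement fraction of 1.0, while a low fraction flags identifications that may deserve a second look.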
Example: one perfect cluster
880 PSMs give the same peptide ID
4 species
28 datasets
Same instruments
http://wwwdev.ebi.ac.uk/pride/cluster/
PRIDE Proteomes web interface: identification info
Unique/Shared Peptides
Mass spec-based sequence coverage
PTM detected
Observed tissues
Biological vs Sample Prep PTMs
http://wwwdev.ebi.ac.uk/pride/proteomes/
Conclusions
PRIDE Archive and ProteomeXchange have become the standard platform for public data deposition in proteomics.
PRIDE Inspector: support for data standards.
PX Submission Tool.
Reuse of public proteomics data is increasing: many opportunities for data miners.
OmicsDI: new platform to identify public datasets coming from different omics technologies (more possibilities for data reuse!).
PRIDE Cluster and PRIDE Proteomes.
Acknowledgements: The PRIDE Team
Acknowledgements: People
Attila Csordas
Tobias Ternent
Gerhard Mayer (de.NBI)
Johannes Griss
Yasset Perez-Riverol
Manuel Bernal-Llinares
Andrew Jarnuczak
Enrique Perez
Former team members, especially Rui Wang, Florian Reisinger, Noemi del Toro, Jose A. Dianes & Henning Hermjakob
All data submitters!
@pride_ebi
@proteomexchange
Questions?
http://www.slideshare.net/JuanAntonioVizcaino