the mztab data standard format for reporting ms-based peptide, protein and small molecule...


Upload: juan-antonio-vizcaino

Post on 23-Jun-2015


DESCRIPTION

This is the talk I gave in HUPO 2014 on behalf of Johannes Griss about the mzTab data standard format.

TRANSCRIPT

Page 1: The mzTab data standard format for reporting MS-based peptide, protein and small molecule identification and quantification results

mzTab - Reporting MS-based Proteomics and Metabolomics Results

Dr. Johannes Griss

Proteomics Services Team

EMBL-EBI

Hinxton, Cambridge, UK

Division of Immunology, Allergy and Infectious Diseases

Department of Dermatology

Medical University of Vienna, Austria

Dr. Juan A. Vizcaíno on behalf of

Page 2:

Johannes [email protected]

HUPO 2014

Overview

• Need for mzTab

• Details about the data format (mzTab 1.0)

• Existing software implementations

• Extension of mzTab 1.0 for metabolomics

Page 3:

HUPO Proteomics Standards Initiative

• Develops data format standards for proteomics.

• Both data representation and annotation standards.

• Involves data producers, database providers, software producers, publishers, …

• Active Workgroups: MI, MS, PI, Mod, (Protein Separation).

• Inter-group activities: MIAPE and Controlled Vocabularies.

• Started in 2002, so some experience already…

www.psidev.info

Page 4:

PSI-MS/PI Standard File Formats before mzTab

• TraML: SRM

• mzQuantML: Quantitation

• mzIdentML: Identification

• mzML: MS data

Page 5:

Reasons for an additional file format (mzTab)

• mzIdentML and mzQuantML (necessarily) focus on complete representation of proteomics results

• Complex XML-based file formats

• Specialised software required for visualisation

• In-depth bioinformatics understanding required to create and use files

• No simple method to communicate final results to non-proteomics experts

• No simple method to utilise files through scripting languages and standard statistical software

Page 6:

mzTab – Aims

• Store final results of an MS-based experiment in a single file

• Quantitation data

• Identification data

• Small Molecule data

• Reduce complexity to make data accessible to non-proteomics / bioinformatics experts

• Be easily accessible using “standard” software

Page 7:

mzTab – Aims

• What the format does NOT aim at:

• Replace mzIdentML or mzQuantML for proteomics approaches

• Contain the complete data of an MS-based experiment

• Provide fully detailed evidence for the data

• Allow a researcher to recreate the process which led to the results

Page 8:

Why a tab-delimited file?

• Using XML-based formats requires sophisticated bioinformatics expertise

• Many researchers still use MS Excel to “look” at or exchange their data.

• Standard tab-delimited file formats for transcriptomics (MAGE-TAB) and molecular interactions (MI-TAB) data were already successful

Page 9:

mzTab format

http://mztab.googlecode.com

Page 10:

mzTab - Sections

• Metadata: basic information about experiment and sample (key-value pairs)

• Protein: basic information about protein identifications (table-based)

• Peptide: information about quantified peptides (table-based)

• PSM: information about identified spectra (table-based)

• Small Molecule: basic information about identified small molecules (table-based)
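
The five sections listed on this slide could be separated with a few lines of scripting. What follows is a minimal Python sketch, not the jmzTab implementation. It assumes (from the mzTab 1.0 specification, not from these slides) that every line starts with a three-letter prefix: MTD for metadata, PRH/PRT for the protein header and rows, PEH/PEP for peptides, PSH/PSM for PSMs, SMH/SML for small molecules, and COM for comments. The function name `split_sections` and the example content are hypothetical, and the example table is heavily truncated compared with a real file.

```python
import csv
import io

# Assumed line prefixes (taken from the mzTab 1.0 specification, not from the slides).
HEADER_PREFIXES = {"PRH": "protein", "PEH": "peptide", "PSH": "psm", "SMH": "small_molecule"}
ROW_PREFIXES = {"PRT": "protein", "PEP": "peptide", "PSM": "psm", "SML": "small_molecule"}


def split_sections(text):
    """Split mzTab text into a metadata dict and one list of row dicts per table section."""
    metadata = {}
    headers = {}
    tables = {name: [] for name in ROW_PREFIXES.values()}
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not fields or fields[0] == "COM":
            continue  # skip blank lines and comment lines
        prefix = fields[0]
        if prefix == "MTD" and len(fields) >= 3:
            metadata[fields[1]] = fields[2]  # metadata is stored as key-value pairs
        elif prefix in HEADER_PREFIXES:
            headers[HEADER_PREFIXES[prefix]] = fields[1:]  # column names of a table
        elif prefix in ROW_PREFIXES:
            section = ROW_PREFIXES[prefix]
            tables[section].append(dict(zip(headers.get(section, []), fields[1:])))
    return metadata, tables


# Hypothetical, truncated example content (real files carry many more columns).
EXAMPLE = "\n".join([
    "MTD\tmzTab-version\t1.0.0",
    "MTD\tmzTab-mode\tSummary",
    "MTD\tmzTab-type\tIdentification",
    "PRH\taccession\tdescription",
    "PRT\tP12345\tExample protein",
    "PSH\tsequence\taccession",
    "PSM\tPEPTIDEK\tP12345",
])

metadata, tables = split_sections(EXAMPLE)
print(metadata["mzTab-mode"], len(tables["protein"]), len(tables["psm"]))
```

Only the standard library is used here, which matches the stated aim of keeping the format usable from scripting languages and standard software.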

Page 11:

Metadata section - Example

Page 12:

mzTab – Modes and Types

• Modes (depending on the level of detail):

• ‘Summary’: only the ‘final results’.

• ‘Complete’: detailed information for each individual assay or replicate is provided.

• Types:

• ‘Identification’: Only identification results.

• ‘Quantification’: Quantification results; these files can also contain identification results.

• Overall, four different file “flavors” are possible, which makes the design very flexible.
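
The combination of two modes and two types can be illustrated with a short Python sketch. It assumes (from the mzTab 1.0 specification, not from these slides) that a file declares its mode and type under the metadata keys `mzTab-mode` and `mzTab-type`; the function name `file_flavor` is hypothetical.

```python
from itertools import product

MODES = ("Summary", "Complete")
TYPES = ("Identification", "Quantification")


def file_flavor(metadata):
    """Return the (mode, type) pair declared in a metadata dict, validating both values."""
    # Assumed metadata keys: "mzTab-mode" and "mzTab-type".
    mode = metadata.get("mzTab-mode")
    file_type = metadata.get("mzTab-type")
    if mode not in MODES:
        raise ValueError(f"unknown or missing mzTab-mode: {mode!r}")
    if file_type not in TYPES:
        raise ValueError(f"unknown or missing mzTab-type: {file_type!r}")
    return mode, file_type


# Two modes times two types give the four possible file flavors.
ALL_FLAVORS = list(product(MODES, TYPES))
for mode, file_type in ALL_FLAVORS:
    print(mode, file_type)

print(file_flavor({"mzTab-mode": "Complete", "mzTab-type": "Quantification"}))
```

A consumer of a file could call such a check first and then decide which columns to expect, since a ‘Complete’ file carries more per-assay detail than a ‘Summary’ file.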

Page 13:

Protein Section (label-free)

Page 14:

Protein Section (label-free)

Page 15:

Peptide Section (label-free)

• Only used in “Quantification” files.
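
The rule on this slide lends itself to a simple consistency check. The Python sketch below is illustrative only: it assumes (from the mzTab 1.0 specification, not from these slides) that peptide header and row lines carry the prefixes PEH and PEP, and the function name `peptide_section_misused` is hypothetical.

```python
def peptide_section_misused(file_type, line_prefixes):
    """Return True if peptide lines appear in a file that is not a 'Quantification' file."""
    # Assumed prefixes for the peptide section: "PEH" (header) and "PEP" (rows).
    has_peptide_lines = any(prefix in ("PEH", "PEP") for prefix in line_prefixes)
    return has_peptide_lines and file_type != "Quantification"


# Peptide lines in an 'Identification' file are flagged; in a 'Quantification' file they are not.
print(peptide_section_misused("Identification", ["MTD", "PRH", "PRT", "PEH", "PEP"]))
print(peptide_section_misused("Quantification", ["MTD", "PEH", "PEP"]))
```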

Page 16:

PSM section (identification data)

Page 17:

mzTab – Current implementations

• jmzTab (Java API): version 3.0 is now stable. Manuscript published in the journal Proteomics.

• mzTab Validator, PRIDE XML to mzTab converter (PRIDE team).

• mzIdentML and mzQuantML to mzTab converters (Andy Jones group).

• MaxQuant: exporter in beta is available.

• OpenMS (version 1.10).

• R/Bioconductor package MSnbase (L. Gatto, Cambridge University).

• LipidDataAnalyzer (J. Hartler, University of Graz, see next talk).

• MetaboLights (EBI).

Page 18:

mzTab – ongoing development

• More detailed modelling of MS metabolomics data

• Led by S. Neumann (COSMOS EU FP7 project).

• Extension from one to three sections.

Example file exists at

https://github.com/sneumann/mtbls2/faahKO.mzTab

http://www.cosmos-fp7.eu/

Page 19:

mzTab format related publications

http://code.google.com/p/mztab/

J. Griss et al., MCP, 2014

Q.W. Xu et al., Proteomics, 2014

Page 20:

mzTab format

http://mztab.googlecode.com

Page 21:

Current PSI-MS/PI Standard File Formats

• mzTab: Final Results

• TraML: SRM

• mzQuantML: Quantitation

• mzIdentML: Identification

• mzML: MS data

Page 22:

Acknowledgements

Johannes Griss, Qing-Wei Xu, Henning Hermjakob

Timo Sachsenberg, Mathias Walzer, Oliver Kohlbacher

Andy Jones

S. Neumann and other COSMOS partners

PSI editor and reviewers

… and many others have also contributed

BBSRC PROCESS grant, BBSRC ProteoSuite grant

http://mztab.googlecode.com