euro lipids 2014_graz
Bioinformatics for lipidomics: Putting some building blocks together. 4th European Lipidomic Meeting. Graz, Austria. 22-24/09/2014.
Bioinformatics for lipidomics: putting some building blocks together
Dr. Juan Antonio Vizcaíno
EMBL-EBI
Hinxton, Cambridge, UK
Juan A. Vizcaíno
4th European Lipidomic Meeting, Graz, 24 September 2014
Overview
• A bit of general context…
• Data standards: mzTab (and mzML)
• Standard nomenclature
• Public repository: MetaboLights
• Specialist resource: LipidHome
Some of the main bioinformatics building blocks
• Data standards
• Databases, data repositories
• Stable identifiers for molecules
• Infrastructure to store and access the information
Nothing new… Lipidomics (metabolomics) is following in the footsteps of other disciplines.
Bioinformatics infrastructure
Usually we do not even notice it is there… unless something stops working.
Overview
• A bit of general context…
• Data standards: mzTab (and mzML)
• Standard nomenclature
• Public repository: MetaboLights
• Specialist resource: LipidHome
Standards are needed in life, and also in bioinformatics…
With a small number of standards, data converters become feasible.
Data standards are needed.
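A quick count shows why a small number of standards keeps converters manageable. The sketch below is illustrative arithmetic, not a figure from the talk, and assumes every format needs both an importer and an exporter when a common standard is used.

```python
def pairwise_converters(n_formats: int) -> int:
    """Direct converters needed if every format must be translated into every other one."""
    return n_formats * (n_formats - 1)


def hub_converters(n_formats: int) -> int:
    """Converters needed if each format is only translated to and from one common standard."""
    return 2 * n_formats


for n in (3, 5, 10):
    print(f"{n} formats: {pairwise_converters(n)} direct converters vs {hub_converters(n)} via a standard")
```

With 10 tool formats, that is 90 direct converters versus 20 via a shared standard; the gap grows quadratically with the number of formats.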
Metabolomics Standards Initiative 2007 publications
Roy Goodacre, Metabolomics (2014) 10:5-7
In practice, there was little adoption…
Situation in the field
LipidXplorer LDA ALEX Others
Lab 1 Lab 2 Lab 3 …
…
Different output files from different tools
How can results from different groups be compared easily? (The same question applies to visualization, storage, …)
Situation in the field
LipidXplorer LDA ALEX Others
Lab 1 Lab 2 Lab 3 …
…
Different output files from different tools
Converters → mzTab → common analysis/visualization tools
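A converter in this picture only has to map one tool's output onto mzTab. The sketch below assumes a hypothetical tool that reports a list of dicts with the keys `name`, `formula`, `mz`, `charge` and optionally `rt`; it writes a few mzTab 1.0 metadata lines and a small molecule table using only a subset of the columns. It is deliberately incomplete: a real exporter must emit every mandatory field and should be checked with the mzTab validator.

```python
def to_mztab_small_molecules(results, ms_run_location, description="lipid identifications"):
    """Turn one tool's results (hypothetical dict format) into minimal mzTab-style text.

    Missing values are written as mzTab's literal "null".
    """
    rows = [
        ["MTD", "mzTab-version", "1.0.0"],
        ["MTD", "mzTab-mode", "Summary"],
        ["MTD", "mzTab-type", "Identification"],
        ["MTD", "description", description],
        ["MTD", "ms_run[1]-location", ms_run_location],
        # Small molecule header (SMH) line: only a subset of the columns the specification requires.
        ["SMH", "identifier", "chemical_formula", "description",
         "exp_mass_to_charge", "charge", "retention_time"],
    ]
    for hit in results:
        rows.append(["SML", "null", hit.get("formula", "null"), hit["name"],
                     str(hit["mz"]), str(hit["charge"]), str(hit.get("rt", "null"))])
    return "\n".join("\t".join(row) for row in rows) + "\n"


# Hypothetical input: one PC 36:2 [M+H]+ hit; the retention time is a made-up value.
example = [{"name": "PC 36:2", "formula": "C44H84NO8P", "mz": 786.6007, "charge": 1, "rt": 12.4}]
print(to_mztab_small_molecules(example, "file:///data/run1.mzML"))
```

Because the output is plain tab-delimited text, it can be opened directly in a spreadsheet, which is part of the rationale for the format.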
The mzTab format
http://code.google.com/p/mztab/
mzTab – Aims and concept
• To provide a simple and efficient way of exchanging results from MS approaches.
• Simple summary report of the experimental results
• Peptides and proteins identified in a given experimental setting
• Small molecules identified
• Reported quantification values
• Technical and biological metadata
• Easier to update and maintain, and flexible enough.
• Easier for the research community, systems biologists and knowledge base providers to parse and use.
• It can be used by non-experts in bioinformatics.
Why a tab-delimited file?
• Using the XML-based formats of the proteomics field (mzIdentML, mzQuantML) effectively requires sophisticated bioinformatics expertise.
• No alternative was available for metabolomics results…
• Many researchers still use MS Excel to “look at” or exchange their data.
• The transcriptomics field has a widely used standard tab-delimited format for exchanging data (MAGE-TAB), and the MITAB format has also been a success in the molecular interaction field.
mzTab – Format specification (version 1.0.0)
• Five sections:
• (Optional) Metadata section
• (Optional) Protein section
• (Optional) Peptide section
• (Optional) PSM (Peptide Spectrum Match) section
• (Optional) Small Molecule section
• The experimental design can be reported in great detail.
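In the file itself, each line starts with a three-letter prefix that says which section it belongs to (`MTD` for metadata, `PRH`/`PRT`, `PEH`/`PEP`, `PSH`/`PSM`, `SMH`/`SML` for the header and data lines of each table, `COM` for comments). A minimal sketch of a reader that groups lines by section, assuming those mzTab 1.0 prefixes:

```python
from collections import defaultdict

# Line prefixes of mzTab 1.0; the "...H" prefixes mark the column-header line of a table.
SECTION_OF_PREFIX = {
    "MTD": "metadata",
    "PRH": "protein", "PRT": "protein",
    "PEH": "peptide", "PEP": "peptide",
    "PSH": "psm", "PSM": "psm",
    "SMH": "small_molecule", "SML": "small_molecule",
}


def split_sections(text):
    """Group the tab-separated lines of an mzTab file by section, keeping file order."""
    sections = defaultdict(list)
    for line in text.splitlines():
        if not line.strip() or line.startswith("COM"):
            continue  # blank lines and comment lines carry no data
        fields = line.split("\t")
        section = SECTION_OF_PREFIX.get(fields[0])
        if section is None:
            raise ValueError(f"unknown line prefix: {fields[0]!r}")
        sections[section].append(fields)
    return dict(sections)
```

Because the data sections are optional, code using the result should call `.get()` rather than assume every section is present.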
mzTab – Metadata Section
• It provides additional information about the dataset as key-value pairs.
• Extensive use of CVs/ontologies.
• Different requirements depending on the file mode (‘summary’ or ‘complete’) and type (‘identification’ or ‘quantification’).
• Support for experimental design (very similar to mzQuantML).
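Since metadata lines are simple `MTD<TAB>key<TAB>value` triples, reading them and finding out which mode and type apply takes only a few lines. A minimal sketch, assuming the mzTab 1.0 keys `mzTab-mode` and `mzTab-type`; values are compared case-insensitively so the check does not depend on how a writer capitalises them.

```python
def parse_metadata(text):
    """Collect MTD lines ("MTD<TAB>key<TAB>value") into a dict."""
    metadata = {}
    for line in text.splitlines():
        if line.startswith("MTD\t"):
            fields = line.split("\t", 2)
            if len(fields) != 3:
                raise ValueError(f"metadata line without a value: {line!r}")
            metadata[fields[1]] = fields[2]
    return metadata


def mode_and_type(metadata):
    """Return the (mode, type) pair that decides which fields the file must contain."""
    mode = metadata.get("mzTab-mode", "").lower()
    kind = metadata.get("mzTab-type", "").lower()
    if mode not in ("summary", "complete"):
        raise ValueError(f"unexpected mzTab-mode: {mode!r}")
    if kind not in ("identification", "quantification"):
        raise ValueError(f"unexpected mzTab-type: {kind!r}")
    return mode, kind
```

The list of mandatory keys for each mode/type combination is defined in the specification and is not reproduced here.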
mzTab – Metadata Section
mzTab – Small Molecule Table
• Main contents:
• Identifier
• Unit-ID
• Chemical formula
• SMILES identifier
• InChI identifier
• Descriptive name
• Mass to charge
• Charge and retention time
• Tax ID and species name
• Spectral library name + version
• Software name + version
• Relative or absolute quantification values
• Reference to the spectrum ID in an external file (e.g. mzML)
• …
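The table is read by column name: an `SMH` line lists the columns and each `SML` line holds one small molecule. A minimal sketch that maps rows onto the header, assuming mzTab's convention of writing missing values as `null`; the column names in the usage example (`identifier`, `chemical_formula`, `description`) are taken from mzTab 1.0 for illustration.

```python
def small_molecule_records(text):
    """Map each SML line onto the column names of the preceding SMH header line."""
    header = None
    records = []
    for line in text.splitlines():
        fields = line.split("\t")
        if fields[0] == "SMH":
            header = fields[1:]
        elif fields[0] == "SML":
            if header is None:
                raise ValueError("SML line found before the SMH header line")
            values = fields[1:]
            if len(values) != len(header):
                raise ValueError(f"expected {len(header)} values, got {len(values)}")
            # mzTab writes missing values as the literal string "null".
            records.append({col: (None if v == "null" else v) for col, v in zip(header, values)})
    return records


text = "SMH\tidentifier\tchemical_formula\tdescription\nSML\tnull\tC44H84NO8P\tPC 36:2\n"
print(small_molecule_records(text))
```

Reading by header name rather than by position is what lets optional columns be present in one file and absent in another.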
mzTab – Small Molecule Section
• It contains mandatory and optional fields.
• It is possible to link with the external mass spectra.
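The link to external spectra is carried by a `spectra_ref` value. The sketch below assumes the mzTab 1.0 form `ms_run[n]:<spectrum identifier>` (for example an mzML nativeID such as `index=5` or `scan=1296`), with several references separated by `|`; the run number can then be looked up in the metadata key `ms_run[n]-location` to find the file.

```python
import re

# One reference: "ms_run[<n>]:<spectrum identifier in that run's file>".
_SPECTRA_REF = re.compile(r"ms_run\[(\d+)\]:(.+)")


def parse_spectra_ref(value):
    """Turn a spectra_ref cell into (ms_run number, spectrum identifier) pairs."""
    if value in (None, "", "null"):
        return []
    refs = []
    for part in value.split("|"):  # several spectra are separated by "|"
        match = _SPECTRA_REF.fullmatch(part.strip())
        if match is None:
            raise ValueError(f"malformed spectra_ref entry: {part!r}")
        refs.append((int(match.group(1)), match.group(2)))
    return refs


print(parse_spectra_ref("ms_run[1]:index=5|ms_run[2]:scan=1296"))
```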
mzTab – Current implementations
• jmzTab (Java API): Version 3.0 is now a stable version. Manuscript published in the journal Proteomics.
• mzTab Validator, PRIDE XML to mzTab converter (PRIDE team).
• mzIdentML and mzQuantML to mzTab converters (Andy Jones' group).
• MaxQuant: an exporter is available in beta.
• OpenMS (version 1.10).
• R/Bioconductor package MSnbase (L. Gatto, Cambridge University).
• LipidDataAnalyzer (J. Hartler, University of Graz, see next talk).
• MetaboLights (EBI).
Implementation in Lipid Data Analyzer
• In collaboration with Graz University of Technology (TU Graz).
• mzTab export support is available from v1.6 (May 2012)
mzTab format publications
http://code.google.com/p/mztab/
J. Griss et al., MCP, 2014
Q.W. Xu et al., Proteomics, 2014
COSMOS: EU FP7 project
• COordination of Standards in MetabOlomicS
• Started October 2012
• 14 European partners
• Worldwide collaborators
• Standards!!
• Data exchange
• Open source
http://www.cosmos-fp7.eu/
mzTab in metabolomics (Mx): extension ongoing
• Meeting in Tuebingen to extend mzTab for metabolomics (March 2014).
• NEW! Three tables for small molecules (SM), analogous to proteins:
1) SmallMoleculeList
2) SmallMoleculeFeatures
3) SmallMoleculeEvidence
An example file is available at
https://github.com/sneumann/mtbls2/faahKO.mzTab
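The slide names the three tables but not how they reference each other, which was still being designed at the time. The sketch below assumes one plausible linkage: each reported small molecule points to the features (detected MS signals) that support it, and each feature points to its identification evidence. All class and field names are hypothetical and are not the column names of the format.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    """One identification hypothesis, e.g. a spectral library match (hypothetical model)."""
    evidence_id: int
    name: str
    spectra_ref: str


@dataclass
class Feature:
    """One detected MS signal, e.g. an m/z and retention time pair (hypothetical model)."""
    feature_id: int
    mz: float
    evidence_ids: list = field(default_factory=list)


@dataclass
class SmallMolecule:
    """The summarised, reported result (hypothetical model)."""
    sm_id: int
    name: str
    feature_ids: list = field(default_factory=list)


def evidence_for(molecule, features, evidence):
    """Follow list -> feature -> evidence references for one reported small molecule."""
    return [evidence[eid] for fid in molecule.feature_ids for eid in features[fid].evidence_ids]


# Hypothetical example: one reported lipid backed by one feature and one piece of evidence.
evidence = {1: Evidence(1, "PC 36:2", "ms_run[1]:index=5")}
features = {10: Feature(10, 786.6007, evidence_ids=[1])}
molecule = SmallMolecule(100, "PC 36:2", feature_ids=[10])
print(evidence_for(molecule, features, evidence))
```

Separating the summary from features and evidence mirrors the protein/peptide/PSM split: readers can stay at the summary level, while tools can trace every result back to spectra.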
mzML: Standard for MS data
• A data format for the storage and exchange of MS output files
• Originally designed for proteomics by merging the best aspects of both mzData and mzXML
• Developed with full participation of academic researchers, hardware and software vendors
• For both raw data and processed peaks.
• Version 1.1 released in June 2009
• Many implementations already exist in the proteomics world
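Because mzML is XML with controlled-vocabulary terms, generic XML tooling can already read it. A minimal sketch using Python's `xml.etree.ElementTree` that lists each spectrum's id and MS level via the PSI-MS term `MS:1000511` ("ms level"). The embedded document is a stripped-down fragment, not schema-valid mzML; for real files (large, and often wrapped in `indexedmzML`) an incremental parser such as `ElementTree.iterparse` or a dedicated mzML library would be the better choice.

```python
import xml.etree.ElementTree as ET

NS = {"mzml": "http://psi.hupo.org/ms/mzml"}
MS_LEVEL = "MS:1000511"  # PSI-MS controlled vocabulary accession for "ms level"

# Stripped-down example: real mzML also carries cvList, fileDescription, binary arrays, etc.
SAMPLE = """<mzML xmlns="http://psi.hupo.org/ms/mzml" version="1.1.0">
  <run id="run1">
    <spectrumList count="2">
      <spectrum index="0" id="scan=1" defaultArrayLength="0">
        <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="1"/>
      </spectrum>
      <spectrum index="1" id="scan=2" defaultArrayLength="0">
        <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2"/>
      </spectrum>
    </spectrumList>
  </run>
</mzML>"""


def spectrum_levels(mzml_text):
    """Map each spectrum's id (its nativeID) to its MS level."""
    root = ET.fromstring(mzml_text)
    levels = {}
    for spectrum in root.iterfind(".//mzml:spectrum", NS):
        for param in spectrum.findall("mzml:cvParam", NS):
            if param.get("accession") == MS_LEVEL:
                levels[spectrum.get("id")] = int(param.get("value"))
    return levels


print(spectrum_levels(SAMPLE))
```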
mzML for Metabolomics
• A no-brainer: no need to reinvent the wheel.
• No schema change required.
• But the next documentation update will:
1. Describe multidimensional retention time (GCxGC/MS, LCxLC/MS and LC-IMS/MS)
2. Describe conversion tools (especially for the GC world)
Data standards in MS for metabolomics
Steffen Neumann
Overview
• A bit of general context…
• Data standards: mzTab (and mzML)
• Standard nomenclature
• Public repository: MetaboLights and COSMOS
• Specialist resource: LipidHome
Situation in the field
• It is very challenging to share experimental results efficiently:
• There is no standard data format for experimental results (Excel spreadsheets are routinely used).
• Different groups name lipid species in slightly different ways, and the level of detail also varies.
• This may be good enough for human consumption, but not for computers. It hinders the development of:
• Analysis tools
• Data repositories
• LIMS systems
Standard LipidomicNet Nomenclature
• Addresses some limitations of the LIPID MAPS nomenclature (the de facto standard) for high-throughput lipid MS approaches
• Enables different levels of resolution for lipid species (needed to make the level of detail reported in the data explicit)
• Suitable for bioinformatics approaches (used in LipidHome)
• Currently covers the main lipid classes (from fatty acids (FA) to sterols).
G. Liebisch et al., JLR, 2013
Nomenclature Structural Hierarchy
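The levels of resolution can be handled programmatically. The sketch below parses shorthand names at three levels and collapses any of them to the species (sum composition) level, e.g. `PC 18:0_18:2` to `PC 36:2`. It assumes the convention of an underscore when the sn positions are unknown and a slash when they are known, plus an `O-` prefix for ether lipids; this is a reading of Liebisch et al. (2013), not a full implementation. Mixed separators, hydroxylations, sphingoid bases and double-bond positions are not handled, and the level labels are hypothetical.

```python
import re

# One chain or sum composition, e.g. "18:2" or, for ether lipids, "O-16:0".
_CHAIN = re.compile(r"(O-)?(\d+):(\d+)")


def parse_shorthand(name):
    """Split e.g. 'PC 18:0_18:2' into (class, is_ether, [(carbons, double bonds), ...], level)."""
    lipid_class, _, composition = name.partition(" ")
    if not composition:
        raise ValueError(f"no composition in {name!r}")
    separator = "/" if "/" in composition else "_"
    chains, ether = [], False
    for part in composition.split(separator):
        match = _CHAIN.fullmatch(part)
        if match is None:
            raise ValueError(f"cannot parse chain {part!r} in {name!r}")
        ether = ether or match.group(1) is not None
        chains.append((int(match.group(2)), int(match.group(3))))
    if len(chains) == 1:
        level = "sum composition"
    elif separator == "/":
        level = "chains, sn positions known"
    else:
        level = "chains, sn positions unknown"
    return lipid_class, ether, chains, level


def to_species(name):
    """Collapse any of the levels above to the sum-composition (species) name."""
    lipid_class, ether, chains, _ = parse_shorthand(name)
    carbons = sum(c for c, _ in chains)
    double_bonds = sum(d for _, d in chains)
    return f"{lipid_class} {'O-' if ether else ''}{carbons}:{double_bonds}"


print(to_species("PC 18:0_18:2"), to_species("PC O-16:0/20:4"))
```

With such a parser, a query like "all sub species containing 18:2" reduces to checking `(18, 2) in parse_shorthand(name)[2]`.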
Overview
• A bit of general context…
• Data standards: mzTab (and mzML)
• Standard nomenclature
• Public repository: MetaboLights
• Specialist resource: LipidHome
Data sharing in Biology
• In some ‘omics’ fields, a data sharing ‘culture’ is well established and is generally considered good scientific practice.
• In metabolomics (lipidomics), that ‘culture’ is not there yet.
• Public availability of data enables:
• Reinterpretation
• Validation of the reported experimental results
• Reuse of the data (e.g. for meta-analysis studies)
• Specific use cases for metabolomics (lipidomics), e.g. development of MRM assays, spectral libraries, fragmentation models, etc.
MetaboLights – metabolomics repository
www.ebi.ac.uk/metabolights (metabolights.org, metabolights.eu)
MetaboLights – Data types stored
• Primary research data
• Investigation, Study, Assay and Protocols (metadata)
• Instrument and analytical software output (raw / processed)
• Metabolite references, QC, Blanks, …
• Open source formats
• Imported reference data for each metabolite
• Reference data imported from external databases
• Chemistry, Biology, Reactions, Pathways, NMR/MS spectra, Literature
• Link to:
• ChEBI, Rhea and others
MetaboLights – Private Data – Share data
MetabolomeXchange.org
Overview
• A bit of general context…
• Data standards: mzTab (and mzML)
• Standard nomenclature
• Public repository: MetaboLights
• Specialist resource: LipidHome
LipidHome
www.ebi.ac.uk/apweiler-srv/lipidhome
J. Foster et al., PLOS One, 2013
LipidHome: executive summary
• Provides stable identifiers for all common lipid structures.
• Provides all theoretical lipid structures, while keeping them clearly separate from experimentally validated structures.
• Evidence-based system for annotating lipids with publications.
• A useful annotation-level hierarchy that allows the database to be queried from whatever results you have, e.g. mass, structural fragment or empirical formula.
• Programmatic access, so that lipid identification software, LIMS and analysis pipelines can be built on top of it.
LipidHome Structural Hierarchy
• Lipids are stored at the levels described in the proposed LipidomicNet nomenclature
• Lipid identifications can be mapped accurately to the appropriate records in the database
Use cases
• Which species/isomers are viable identifications for mass X with tolerance Y?
• For species PC 36:2, what are the experimentally validated isomers / fatty acid scan species?
• What are all the experimentally validated sub species containing the fatty acid species 18:2?
• What are all the identifications validated by “PMID:20564011”?
• For mass X, what is the most likely sub species based on previous identifications?
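The first use case, candidates for mass X within tolerance Y, amounts to comparing an observed m/z with theoretical adduct masses in parts per million. A minimal sketch, assuming [M+H]+ ions and a small hypothetical library of species names with their neutral formulae; this is not LipidHome's search code.

```python
import re

# Monoisotopic masses (u) of the elements needed for glycerophospholipids.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
                "O": 15.99491461956, "P": 30.97376163}
PROTON = 1.007276466812  # mass added for an [M+H]+ ion


def monoisotopic_mass(formula):
    """Neutral monoisotopic mass of a simple formula such as 'C44H84NO8P'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


def ppm_error(observed, theoretical):
    return (observed - theoretical) / theoretical * 1e6


def candidates_for(mz, tolerance_ppm, library, adduct_mass=PROTON):
    """Library entries whose adduct m/z lies within tolerance_ppm of the observed mz (best first)."""
    hits = []
    for name, formula in library.items():
        error = ppm_error(mz, monoisotopic_mass(formula) + adduct_mass)
        if abs(error) <= tolerance_ppm:
            hits.append((name, round(error, 2)))
    return sorted(hits, key=lambda hit: abs(hit[1]))


# Hypothetical three-entry library (species name -> neutral formula).
LIBRARY = {"PC 36:2": "C44H84NO8P", "PC 36:1": "C44H86NO8P", "PE 36:2": "C41H78NO8P"}
print(candidates_for(786.6007, 5.0, LIBRARY))
```

A mass match alone is not a unique answer: PE 39:2 has the same formula as PC 36:2 (C44H84NO8P), so a fuller library would return both, which is why the question asks for viable species/isomers rather than a single identification.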
The data in LipidHome
GL (glycerolipids)
• MG: MG, MG O-
• DG: DG, DG O-, DG dO-
• TG: TG, TG O-, TG dO-, TG tO-
GP (glycerophospholipids)
• PC: PC, PC O-, PC dO-, LPC, LPC O-
• PA: PA, PA O-, PA dO-, LPA, LPA O-
• PE: PE, PE O-, PE dO-, LPE, LPE O-
• PS: PS, PS O-, PS dO-, LPS, LPS O-
• PI: PI, PI O-, PI dO-, LPI, LPI O-
• PG: PG, PG O-, PG dO-, LPG, LPG O-
Database contents:
• Species: 17,497
• Fatty acid scan species: 1,821,760
• Sub species: 2,140,592
• Annotated isomers: 7,584
• Fatty acid species: 164
Theoretical lipid generation
• A set of rules was derived to describe common fatty acids:
• Minimum carbons = 2
• Maximum carbons = 30
• Minimum double bonds = 0
• Maximum double bonds = 10
• Minimum gap between double bonds
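These rules define a search space that can be enumerated directly. The sketch below uses the carbon and double-bond limits from the slide; the gap value is not given there, so a minimum of 3 positions (methylene-interrupted double bonds) is an assumption, as is the Δ numbering from the carboxyl carbon. It enumerates positional isomers only (no cis/trans) and is not LipidHome's actual generator.

```python
def double_bond_positions(n_carbons, n_double_bonds, min_gap=3, start=2):
    """Yield tuples of double-bond positions, Δ-numbered from the carboxyl carbon (C1).

    Position p means a bond between C(p) and C(p+1), so p runs from `start` to n_carbons - 1.
    Consecutive positions must differ by at least `min_gap` (assumed 3 = methylene-interrupted).
    """
    if n_double_bonds == 0:
        yield ()
        return
    # Prune: not enough chain left to place the remaining bonds with the required gap.
    if start + (n_double_bonds - 1) * min_gap > n_carbons - 1:
        return
    for position in range(start, n_carbons):
        for rest in double_bond_positions(n_carbons, n_double_bonds - 1, min_gap, position + min_gap):
            yield (position,) + rest


def feasible_compositions(min_c=2, max_c=30, min_db=0, max_db=10, min_gap=3):
    """(carbons, double bonds) pairs within the limits that admit at least one placement."""
    return [
        (carbons, db)
        for carbons in range(min_c, max_c + 1)
        for db in range(min_db, max_db + 1)
        if next(double_bond_positions(carbons, db, min_gap), None) is not None
    ]


compositions = feasible_compositions()
print(len(compositions), "fatty acid compositions")
print(list(double_bond_positions(18, 2))[:3])
```

Using a generator with early pruning keeps the feasibility check cheap: it stops at the first valid placement instead of listing every isomer.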
LipidHome – Species view
LipidHome – MS1 search output
The big picture…
• Different output files from different tools (LipidXplorer, LDA, ALEX, others)
• Data converters to mzTab
• mzTab, using the standard nomenclature
• Common analysis and visualization software
• Local LIMS systems and MetaboLights, with an mzTab importer into and an mzTab exporter from each LIMS/resource
Acknowledgements
Johannes Griss, Qing-Wei Xu, Joe Foster
R. Salek & C. Steinbeck, COSMOS partners
G. Liebisch, M. Troetzmueller, F. Spener, H. Koefeler & M. Wakelam
http://code.google.com/p/mztab/
Jurgen Hartler, Gerhard Thallinger
BBSRC PROCESS grant
Mathias Walzer, Timo Sachsenberg, Oliver Kohlbacher