OmicsDI - Discovering and Linking Public ‘Omics’ Datasets
Henning Hermjakob
European Bioinformatics Institute
UCLA HeartBD2K Center
OmicsDI Vision
A PubMed for (omics) datasets
http://omicsdi.org
Architecture overview (diagram)
• Input: omics databases (XML); Dataset XML Validator
• EBI Search Indexer: indexing engine on the EBI cluster
• Indexed data: Lucene indexes (520 GB, 1.1B entries)
• Search engine and cache servers, accessed through REST web services
• OmicsApp: RESTful web service (end points: Statistics, Datasets), database, and web app (search, statistics, tagging)
Mandatory Fields:
• Repository Id
• Dataset Title
• Publication date
• Submitter information (Name, Affiliation)
• Original URL
Desired Fields:
• Description/Abstract
• Sample and Data Protocols
• PubMed Id
• Organism, Tissue, Disease
Additional Fields:
• Protein Id (Ensembl or Uniprot)
• Metabolite Id (ChEMBL)
• More…
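The field tiers above lend themselves to a simple check: a missing mandatory field is an error, a missing desired field is only a warning. The following is a minimal sketch of that idea, not the actual Dataset XML Validator. The key names (`repository_id`, `submitter_affiliation`, and so on) are hypothetical, since the slide names the fields but not their XML tags, and the record is represented as a plain dict rather than parsed XML.

```python
from dataclasses import dataclass, field

# Hypothetical key names; the slide lists the fields but not their XML tags.
MANDATORY = ("repository_id", "title", "publication_date",
             "submitter_name", "submitter_affiliation", "original_url")
DESIRED = ("description", "sample_protocol", "data_protocol",
           "pubmed_id", "organism", "tissue", "disease")


@dataclass
class ValidationReport:
    errors: list = field(default_factory=list)    # missing mandatory fields
    warnings: list = field(default_factory=list)  # missing desired fields

    @property
    def is_valid(self):
        return not self.errors


def _is_missing(record, name):
    """A field counts as missing when absent, None, or blank."""
    return not str(record.get(name) or "").strip()


def validate_dataset(record):
    """Check a dataset record (a plain dict) against the two field tiers."""
    report = ValidationReport()
    report.errors = [n for n in MANDATORY if _is_missing(record, n)]
    report.warnings = [n for n in DESIRED if _is_missing(record, n)]
    return report


# Values taken from the schema.org example in this deck; the publication
# date and affiliation are not given there, so they are left out on purpose.
example = {
    "repository_id": "E-GEOD-30999",
    "title": "Expression data from skin biopsy samples from patients "
             "with moderate-to-severe psoriasis",
    "submitter_name": "Suárez-Fariñas Mayte",
    "original_url": "http://www.ebi.ac.uk/gxa/experiments/E-GEOD-30999",
}
report = validate_dataset(example)
print(report.is_valid, report.errors)
```

Keeping errors and warnings separate lets a repository submit incomplete but usable records while still seeing which desired fields would improve discoverability.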
Graphical Browsing
Perez-Riverol, Yasset, et al. "Omics Discovery Index - Discovering and Linking Public Omics Datasets." bioRxiv (2016): 049205.
Search results overview
Dataset view
Access Metrics
Multi-Omics linking
Data re-use
Schema.org integration
- Search engine exposure
- Citability
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Dataset",
  "name": "Expression data from skin biopsy samples from patients with moderate-to-severe psoriasis",
  "description": "A gene expression profiling sub-study was conducted in which skin biopsy samples were collected from 85 patients with moderate-to-severe psoriasis who were participating in ACCEPT, an IRB-approved Phase 3, multicenter, randomized trial. This analysis identified 4,175 probe-sets as being significantly modulated in psoriasis lesions (LS) compared with matched biopsies of non-lesional (NL) skin. Skin biopsy samples (n=170) were collected at baseline for RNA extraction and microarray analysis from 85 patients with moderate-to-severe psoriasis without receiving active psoriasis therapy.",
  "sameAs": "http://www.ebi.ac.uk/gxa/experiments/E-GEOD-30999",
  "creator": {
    "@type": "Person",
    "name": "Suárez-Fariñas Mayte"
  },
  "url": "http://www.omicsdi.org/dataset/ExpressionAtlas/E-GEOD-30999"
}
</script>
http://www.omicsdi.org/dataset/atlas-experiments/E-GEOD-30999
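A page can emit this kind of markup from its own dataset record. The sketch below shows one way to do that with the Python standard library; it is an illustration, not the OmicsDI implementation. The function names `dataset_to_jsonld` and `as_script_tag` are hypothetical, and only the schema.org properties shown on the slide are used.

```python
import json


def dataset_to_jsonld(name, description, same_as, creator, url):
    """Build a schema.org Dataset description as a JSON-LD string."""
    doc = {
        "@context": "http://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "sameAs": same_as,
        "creator": {"@type": "Person", "name": creator},
        "url": url,
    }
    # ensure_ascii=False keeps names such as "Suárez-Fariñas" readable.
    return json.dumps(doc, indent=2, ensure_ascii=False)


def as_script_tag(jsonld):
    """Wrap JSON-LD in the script element that search engines look for."""
    # Escape "</" so text inside the JSON cannot close the script element;
    # "\/" is a valid JSON escape for "/".
    safe = jsonld.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + safe + "\n</script>"


markup = as_script_tag(dataset_to_jsonld(
    name="Expression data from skin biopsy samples from patients "
         "with moderate-to-severe psoriasis",
    description="A gene expression profiling sub-study ...",  # shortened here
    same_as="http://www.ebi.ac.uk/gxa/experiments/E-GEOD-30999",
    creator="Suárez-Fariñas Mayte",
    url="http://www.omicsdi.org/dataset/ExpressionAtlas/E-GEOD-30999",
))
print(markup)
```

Serialising with `json.dumps` guarantees straight quotes and correct escaping, which hand-written JSON-LD can easily get wrong.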
Biological Similarity (diagram)
• Proteomic (P), genomic (G) and metabolomic (M) datasets, each with publication and metadata
• Cross-references: UniProt and Ensembl (proteomic), Ensembl (genomic), ChEMBL (metabolomic), PubMed
• Shared resources connecting P, G and M entries: Reactome pathways, IntAct
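One way to turn shared cross-references into a similarity score is set overlap. The slides do not say which measure OmicsDI uses, so the Jaccard index below is an assumption chosen for illustration, and the dataset names and identifiers in the example are made-up placeholders.

```python
def jaccard(a, b):
    """Jaccard index: shared identifiers divided by all identifiers."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_similar(target_id, xrefs):
    """Rank other datasets by cross-reference overlap with the target.

    xrefs maps a dataset id to the set of identifiers it references
    (for example protein, gene, or metabolite ids).
    """
    target = xrefs[target_id]
    scored = [(other, jaccard(target, ids))
              for other, ids in xrefs.items() if other != target_id]
    # Drop datasets with no overlap; sort by score, then id for stability.
    scored = [(other, score) for other, score in scored if score > 0.0]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Placeholder identifiers, not real accessions.
xrefs = {
    "proteomics-1": {"P1", "P2", "P3"},
    "genomics-1": {"P2", "P3", "P4"},
    "metabolomics-1": {"M1"},
}
print(rank_similar("proteomics-1", xrefs))
```

In this toy example the two datasets share two of four distinct identifiers, so they score 0.5, while the metabolomic dataset shares none and is dropped.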
OmicsDI and bioCADDIE
• Originally independently funded
• Administrative supplement 2016
PIs
• Peipei Ping, UCLA
• Lucila Ohno-Machado, UCSD
• Susanna-Assunta Sansone, U Oxford
• Eric Deutsch, ISB
• Henning Hermjakob, EBI
WPs
• Map OmicsDI, DATS data model
• Re-usable visualisation widgets
• Access metrics
Collaboration
• OmicsDI provides metadata from “its” repositories to DataMed
• OmicsDI goes more into the “depth” for omics
• DataMed focuses on breadth
Acknowledgements
Yasset Perez-Riverol
Mingze Bai
Gaurhari Dass
PRIDE Team: Rui Wang, Tobias Ternent, Noemi del Toro, Juan Antonio Vizcaino, Henning Hermjakob
Metabolights Team: Kenneth Haug, Pablo Conesa Mingo
EGA Team: Helen Parkinson, Justin Paschall, Dylan Spalding
NIH BD2K Center of Excellence @ UCLA, Grant number 1U54GM114833-01: Peipei Ping, Vincent Ky
EBI Search Team: Silvano Squizzato, Young Mi Park, Rodrigo Lopez
The Bigger Picture - Workflows
Data Discovery:
• OmicsDI
Tool Discovery:
• BD2K Coordination Center
Data Access: ProXI
• Web services based retrieval of data from OmicsDI repositories
Data Analysis:
• Reactome
Many options, data dependent
Finding a publication
• Straightforward through PubMed or (Europe) PubMed Central
Finding a Dataset
• Many disconnected search entry points
• Google does not work well, as it does not separate out datasets
• Vision: A PubMed for (‘omics) datasets
Indexing System
Lucene-based Indexer System:
Strengths:
• Already implemented
• Open source, if we need to migrate the infrastructure
• Indexed together with all other EBI information, which facilitates cross-references
• Indexes all of EBI (1.1B entries), known to scale well
Limitations:
• Only an indexing system, not a database -> no persistence
• Relies on EBI infrastructure
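The distinction between an index and a database can be seen in a toy inverted index. This sketch is not Lucene and not the EBI Search code; it leaves out scoring, analyzers, and on-disk segments, and the class name `InvertedIndex` is hypothetical. It only maps terms to dataset ids, so the records themselves have to be stored elsewhere, which is the limitation noted above.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Toy in-memory index: term -> set of dataset ids containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    @staticmethod
    def _tokens(text):
        # Lowercase and split on anything that is not a letter or digit.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id, text):
        for token in self._tokens(text):
            self._postings[token].add(doc_id)

    def search(self, query):
        """Return ids of datasets containing every term of the query."""
        tokens = self._tokens(query)
        if not tokens:
            return set()
        result = None
        for token in tokens:
            ids = self._postings.get(token, set())
            result = set(ids) if result is None else result & ids
        return result


index = InvertedIndex()
index.add("E-GEOD-30999",
          "Expression data from skin biopsy samples from patients "
          "with moderate-to-severe psoriasis")
index.add("demo-2", "Proteomics of heart tissue")  # made-up second record
print(index.search("skin psoriasis"))
```

The index answers "which datasets mention these terms" quickly, but it cannot return a title or a submitter, because it never stored the records.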
DDI application
• Database (Mongo): Access statistics
• Web Service: Search, Statistics
• Web Application: Statistics, Browsing, Knowledge Discovery
Ontology-aware Indexing
Ontology Highlighting