DisGeNET: Applying Semantic Web and Network Analysis Approaches for the Integration and Analysis of Gene-Disease Associations for Translational Research and Drug Discovery
Janet Piñero, Núria Queralt-Rosinach, Àlex Bravo, Ferran Sanz and Laura I. Furlong
Integrative Biomedical Informatics Group, Research Programme on Biomedical Informatics; Hospital del Mar Medical Research Institute; Pompeu Fabra University
Acknowledgements
The authors thank the Open PHACTS partners, Michel Dumontier and the OpenLink staff for their input, collaboration and help. Funding: We received support from ISCIII-FEDER (PI13/00082, CP10/00524), from the IMI-JU under grant agreements nº 115002 (eTOX), nº 115191 (Open PHACTS), nº 115372 (EMIF) and nº 115735 (iPiE), whose resources are composed of a financial contribution from the European Union's Seventh Framework Programme (FP7/2007-2013) and EFPIA companies' in-kind contribution, and from the EU H2020 Programme 2014-2020 under grant agreements no. 634143 (MedBioinformatics) and no. 676559 (Elixir-Excelerate). The Research Programme on Biomedical Informatics (GRIB) is a node of the Spanish National Institute of Bioinformatics (INB).
DISCOVERY PLATFORM [2]
[Diagram: the DisGeNET discovery platform (http://www.disgenet.org/) for researchers, clinicians, curators, and bioinformaticians & developers. Labels: knowledge base tools, evidence-based discovery, interoperability, metadata, databases & literature, standards, integration, open, discoverability, community use, large-scale extraction]
DIGITAL PUBLICATION, SHARING AND LINKING
Usage stats (Aug 2014 to Aug 2015):
• 12,040 users, 22,696 sessions
• 14,494 downloads
• DisGeNET used in 20+ publications and cited in 60+ articles
Registered in:
• BioSharing
• OMICtools
• NeuroLex
• Datahub
Integrated knowledge:
• Text-mining extraction
• Integration with well-curated data
• Analysis
• Discovery and decision-making
Present in the Semantic Web:
• URIs, RDF and nanopublications
• Machine-processable
• Semantic integration
• Links to the Linked Open Data cloud
• Data analysis across domains
[Diagram labels: implementation, standardization, translational research, interoperability, integration, evidence, Semantic Web, network analysis, drug discovery, web-based exploration, knowledge base tools for exploration and analysis, reproducibility, accessibility]
LHGDN
$S = W_{\text{curated}} + W_{\text{predicted}} + W_{\text{literature}}$
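The DisGeNET score S adds up evidence weights from curated, predicted and literature sources. The sketch below only illustrates that additive structure and how it could be used to rank gene-disease pairs. How each weight is actually computed is described in the DisGeNET publications (e.g. reference 2) and is not reproduced here, so the `ScoreComponents` class, the function names and any numeric weights you pass in are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ScoreComponents:
    """Hypothetical container for the three evidence weights of one gene-disease pair."""
    w_curated: float     # contribution from curated sources
    w_predicted: float   # contribution from predicted sources
    w_literature: float  # contribution from text-mined literature sources


def disgenet_score(c: ScoreComponents) -> float:
    """S = W_curated + W_predicted + W_literature."""
    return c.w_curated + c.w_predicted + c.w_literature


def rank_associations(
    assocs: Dict[Tuple[str, str], ScoreComponents],
) -> List[Tuple[Tuple[str, str], float]]:
    """Return (gene, disease) pairs with their score S, highest first."""
    scored = [(pair, disgenet_score(c)) for pair, c in assocs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Because S is a plain sum, a pair supported by several kinds of evidence ranks above a pair supported by only one, which is the intuition behind using S to prioritize associations.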
SYNTACTIC
NORMALIZATION
SEMANTIC
• Web interface: http://www.disgenet.org/
• Downloads (digital objects): tab-separated plain text, SQLite, RDF, Trusty nanopublications
• SPARQL endpoint: http://rdf.disgenet.org/sparql/
• Linked Data Browser: http://rdf.disgenet.org/fct/
• Programmatic access: automatic analysis, higher speed, fewer errors, shareable results, embedding in workflows
• Metadata: data item-level and dataset-level descriptions, for transparency and validation
• License: Open Database License (http://opendatacommons.org/licenses/odbl/1.0/)
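The SPARQL endpoint can be queried over plain HTTP following the W3C SPARQL 1.1 Protocol. The following standard-library Python sketch asks for the genes linked to obesity (UMLS CUI C0028754). The graph pattern is an assumption, not taken from the poster: it presumes that each gene-disease association node points to both its gene and its disease through the SIO predicate `SIO_000628` ("refers to"), that diseases use `linkedlifedata.com` UMLS URIs and that genes use `identifiers.org/ncbigene` URIs. Check these against the DisGeNET-RDF documentation, and note that the 2015 endpoint may no longer be online.

```python
import json
import urllib.parse
import urllib.request
from typing import List

# Endpoint URL as printed on the poster; availability is not guaranteed.
SPARQL_ENDPOINT = "http://rdf.disgenet.org/sparql/"

# ASSUMPTION: the schema (SIO_000628 "refers to", UMLS and NCBI Gene URI patterns) is illustrative.
# Double braces are literal braces once str.format() fills in {cui}.
GENES_FOR_DISEASE = """
PREFIX sio: <http://semanticscience.org/resource/>
SELECT DISTINCT ?gene WHERE {{
  ?gda sio:SIO_000628 <http://linkedlifedata.com/resource/umls/id/{cui}> ;
       sio:SIO_000628 ?gene .
  FILTER(STRSTARTS(STR(?gene), "http://identifiers.org/ncbigene/"))
}}
"""


def build_query_url(endpoint: str, query: str) -> str:
    """Encode the query as the 'query' parameter of an HTTP GET request."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def parse_bindings(payload: str, var: str) -> List[str]:
    """Extract the values of one variable from a SPARQL JSON results document."""
    data = json.loads(payload)
    return [row[var]["value"] for row in data["results"]["bindings"] if var in row]


def run_query(endpoint: str, query: str, var: str, timeout: float = 30.0) -> List[str]:
    """Send the query and return the values bound to `var` (performs a network call)."""
    request = urllib.request.Request(
        build_query_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_bindings(response.read().decode("utf-8"), var)
```

Usage, if the endpoint is reachable: `run_query(SPARQL_ENDPOINT, GENES_FOR_DISEASE.format(cui="C0028754"), "gene")`. Keeping URL building and result parsing separate from the network call makes both easy to test and to reuse inside larger workflows.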
DisGeNET association type ontology
Semanticscience Integrated Ontology (SIO) [4]
• 11 common ontologies in NCBO BioPortal
• RDF [1]
• Nanopublications [3]
• NCBI Gene ID
• UMLS Concept Unique Identifiers (CUIs)
• Normalized identification scheme: http://rdf.disgenet.org/resource/gda/ + ID
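The identifier normalization above can be sketched as a few helper functions: check that a value looks like a UMLS CUI ("C" followed by seven digits) or an NCBI Gene ID (a positive integer), and build a gene-disease association (GDA) URI by appending an ID to the `http://rdf.disgenet.org/resource/gda/` prefix. The helper names are hypothetical, and since the poster does not show what a GDA ID looks like, the only check on it here is that it is a non-empty URI segment.

```python
import re

# Prefix from the poster's normalized identification scheme.
GDA_PREFIX = "http://rdf.disgenet.org/resource/gda/"

CUI_PATTERN = re.compile(r"C[0-9]{7}")        # UMLS CUI, e.g. C0028754
GENE_ID_PATTERN = re.compile(r"[1-9][0-9]*")  # NCBI Gene ID, e.g. 4160


def is_umls_cui(value: str) -> bool:
    return CUI_PATTERN.fullmatch(value) is not None


def is_ncbi_gene_id(value: str) -> bool:
    return GENE_ID_PATTERN.fullmatch(value) is not None


def gda_uri(gda_id: str) -> str:
    """Build a GDA URI as prefix + ID (hypothetical helper; the ID format is not specified)."""
    if not gda_id or "/" in gda_id or any(ch.isspace() for ch in gda_id):
        raise ValueError(f"unexpected GDA identifier: {gda_id!r}")
    return GDA_PREFIX + gda_id
```

Rejecting slashes and whitespace keeps every generated URI a single path segment under the GDA namespace, so each association gets exactly one stable, resolvable name.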
What are the diseases associated with the melanocortin 4 receptor (MC4R)?
What are the genes associated with obesity?
429,111 gene-disease associations
What is the tissue expression pattern of the genes associated with obesity?
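The first two questions can also be answered offline from the tab-separated download. This sketch uses Python's `csv` module. The column names (`geneId`, `geneSymbol`, `diseaseId`, `diseaseName`, `score`) are an assumption, so check them against the header of the file you actually download. The sample rows are for illustration only: their scores are made up, and `GENE_A` and "Disease Y" are placeholders.

```python
import csv
import io
from typing import Iterable, List, TextIO

# Hypothetical sample in the assumed layout; scores and placeholder rows are made up.
SAMPLE_TSV = (
    "geneId\tgeneSymbol\tdiseaseId\tdiseaseName\tscore\n"
    "4160\tMC4R\tC0028754\tObesity\t0.90\n"
    "G1\tGENE_A\tC0028754\tObesity\t0.40\n"
    "4160\tMC4R\tD_Y\tDisease Y\t0.20\n"
)


def load_associations(handle: TextIO) -> List[dict]:
    """Read every row of a tab-separated gene-disease association file."""
    return list(csv.DictReader(handle, delimiter="\t"))


def _ranked(rows: Iterable[dict], key: str, value: str, out: str, min_score: float) -> List[str]:
    """Filter rows on a case-insensitive match and a score cutoff, highest score first."""
    hits = [
        r for r in rows
        if r[key].lower() == value.lower() and float(r["score"]) >= min_score
    ]
    hits.sort(key=lambda r: float(r["score"]), reverse=True)
    return [r[out] for r in hits]


def genes_for_disease(rows: List[dict], disease_name: str, min_score: float = 0.0) -> List[str]:
    """'What are the genes associated with <disease>?'"""
    return _ranked(rows, "diseaseName", disease_name, "geneSymbol", min_score)


def diseases_for_gene(rows: List[dict], gene_symbol: str, min_score: float = 0.0) -> List[str]:
    """'What are the diseases associated with <gene>?'"""
    return _ranked(rows, "geneSymbol", gene_symbol, "diseaseName", min_score)


# On the sample; for a real download use open(path, newline="", encoding="utf-8").
rows = load_associations(io.StringIO(SAMPLE_TSV))
```

The `min_score` cutoff lets the same query return either only the strongest-evidence associations or the full list.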
References
1. Queralt-Rosinach, N., Piñero, J., Bravo, À., Sanz, F. and Furlong, L. I. (2015). DisGeNET-RDF: harnessing the innovative power of the Semantic Web to explore the genetic basis of diseases. Submitted.
2. Piñero, J., Queralt-Rosinach, N., Bravo, A., Deu-Pons, J., Bauer-Mehren, A., Baron, M., … Furlong, L. I. (2015). DisGeNET: a discovery platform for the dynamical exploration of human diseases and their genes. Database, 2015, bav028.
3. Queralt-Rosinach, N., Kuhn, T., Chichester, C., Dumontier, M., Sanz, F. and Furlong, L. I. (2015). Publishing DisGeNET as nanopublications. Semantic Web Journal (to appear), 1-10.
4. Dumontier, M., Baker, C. J., Baran, J., Callahan, A., Chepelev, L., Cruz-Toledo, J., … Hoehndorf, R. (2014). The Semanticscience Integrated Ontology (SIO) for biomedical research and knowledge discovery. Journal of Biomedical Semantics, 5(1).
5. Gray, A. J. G., Groth, P., Loizou, A., Askjaer, S., Brenninkmeijer, C., Burger, K., … Williams, A. J. (2014). Applying linked data approaches to pharmacology: Architectural decisions and implementation. Semantic Web. IOS Press. doi:10.3233/SW-2012-0088
Several formats and models
Context metadata for each gene-disease (G-D) pair:
• Provenance (PubMed ID, source)
• DisGeNET score (evidence)
• Sentence description
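The per-pair context metadata (provenance, score, supporting sentence) maps naturally onto a small record type. In this sketch, which is not DisGeNET's actual data model, flat evidence records are grouped by gene-disease pair. Each pair then carries its score together with the set of supporting PubMed IDs and the set of sources. All class and function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Evidence:
    pmid: Optional[str]  # PubMed ID, or None if the record has no publication
    source: str          # source database or text-mining resource
    sentence: str        # sentence describing the association


@dataclass
class GdaSummary:
    gene: str
    disease: str
    score: float
    evidence: List[Evidence] = field(default_factory=list)

    @property
    def pmids(self) -> Set[str]:
        return {e.pmid for e in self.evidence if e.pmid}

    @property
    def sources(self) -> Set[str]:
        return {e.source for e in self.evidence}


def summarize(
    records: Iterable[Tuple[str, str, Evidence]],
    scores: Dict[Tuple[str, str], float],
) -> List[GdaSummary]:
    """Group (gene, disease, evidence) records per pair; missing scores default to 0.0."""
    grouped: Dict[Tuple[str, str], List[Evidence]] = defaultdict(list)
    for gene, disease, ev in records:
        grouped[(gene, disease)].append(ev)
    return [
        GdaSummary(gene, disease, scores.get((gene, disease), 0.0), evs)
        for (gene, disease), evs in grouped.items()
    ]
```

Keeping every evidence record, rather than only counts, preserves the provenance needed to trace a score back to its sources and sentences.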
• 17,181 genes, classified by PANTHER class
• 14,610 diseases, classified by MeSH class
• Semantic Web platform to answer complex questions in the pharmacological field [5]
What is the tissue protein expression pattern of the genes associated with obesity that are involved in the same pathway? Retrieve all bioactive small molecules hitting each target (filter minEx-pChembl=5).
• Large-scale integration across domains
• DisGeNET Cytoscape plugin
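Network analysis of gene-disease associations often starts from the bipartite gene-disease graph and projects it onto one node type. The following standard-library sketch builds a disease projection: two diseases are linked if they share at least one gene, and the edge weight is the number of shared genes. It illustrates the general technique and is not the code of the DisGeNET Cytoscape plugin. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def disease_projection(pairs: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], int]:
    """Link diseases that share genes; the edge weight is the number of shared genes."""
    genes_by_disease: Dict[str, Set[str]] = defaultdict(set)
    for gene, disease in pairs:
        genes_by_disease[disease].add(gene)
    edges: Dict[Tuple[str, str], int] = {}
    # sorted() gives each undirected edge one canonical (d1, d2) orientation
    for d1, d2 in combinations(sorted(genes_by_disease), 2):
        shared = len(genes_by_disease[d1] & genes_by_disease[d2])
        if shared:
            edges[(d1, d2)] = shared
    return edges


def node_degree(edges: Dict[Tuple[str, str], int]) -> Dict[str, int]:
    """Number of neighbours of each disease in the projected network."""
    degree: Dict[str, int] = defaultdict(int)
    for d1, d2 in edges:
        degree[d1] += 1
        degree[d2] += 1
    return dict(degree)
```

The all-pairs loop is quadratic in the number of diseases. At the scale of 14,610 diseases, an inverted index that only compares diseases sharing some gene would scale better. The resulting edge list can be exported as a weighted network for visualization in Cytoscape.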
60% complex, 36% rare/Mendelian, and 4% infectious diseases
DO: 19 · MSH: 58 · OMIM: 38 · NCI: 33 · ORDO: 13 · ICD9: 12
Recent findings