Possibilities for Integrating Model-related Data in Computational Biology
Databases in Life Sciences, Montreal, July 2013
Dagmar Waltemath, University of Rostock, Germany
Nicolas Le Novère, Babraham Institute, UK
Michel Dumontier, Carleton University, Canada
Introduction
13-07-12 Integrating model-related data 2
Fig.: DOI: 10.1038/35002125
Number and size of models over time
Model reuse, result reproducibility
1. How can we distribute models with all information necessary to reuse them (MIRIAM)?
2. How can we effectively manage different types of model-related data?
3. How can we link model-related data to the rest of the world?
1. Distributing models
Frank Bergmann, Nicolas Le Novère
The COMBINE archive v0.1
• single “.zip” file
• bundles models and model-related data
http://co.mbine.org/documents/archive
1. A manifest file, "manifest.xml",
2. all described files,
3. a metadata file, "metadata.*",
4. remaining files.
• All documents necessary for the description of a model and all associated data and procedures.
• In the future: also references to documents
<?xml version="1.0" encoding="utf-8"?>
<omexManifest xmlns="http://identifiers.org/combine.specifications/omex-manifest">
<content location="./manifest.xml" format="http://identifiers.org/combine.specifications/omex-manifest"/>
<content location="./model/model.xml" format="http://identifiers.org/combine.specifications/sbml"/>
<content location="./simulation.xml" format="http://identifiers.org/combine.specifications/sedml"/>
<content location="./article.pdf" format="application/pdf"/>
<content location="./metadata.rdf" format="http://identifiers.org/combine.specifications/omex-metadata"/>
</omexManifest>
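A minimal Python sketch of how a tool might read such a manifest out of an archive, using only the standard library. The helper names (`list_archive_contents`, `build_demo_archive`) and the placeholder file contents are assumptions made for illustration; only the namespace and the `content`/`location`/`format` names come from the manifest above.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace taken from the manifest shown above.
OMEX_NS = "http://identifiers.org/combine.specifications/omex-manifest"


def list_archive_contents(archive_bytes: bytes) -> list:
    """Return (location, format) pairs declared in the archive's manifest.xml."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        manifest = archive.read("manifest.xml")
    root = ET.fromstring(manifest)
    return [
        (item.get("location"), item.get("format"))
        for item in root.findall(f"{{{OMEX_NS}}}content")
    ]


def build_demo_archive() -> bytes:
    """Build a tiny in-memory archive (hypothetical placeholder contents)."""
    manifest = f"""<?xml version="1.0" encoding="utf-8"?>
<omexManifest xmlns="{OMEX_NS}">
  <content location="./manifest.xml" format="{OMEX_NS}"/>
  <content location="./model/model.xml" format="http://identifiers.org/combine.specifications/sbml"/>
</omexManifest>"""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("manifest.xml", manifest)
        archive.writestr("model/model.xml", "<sbml/>")  # placeholder, not a valid model
    return buffer.getvalue()


if __name__ == "__main__":
    for location, fmt in list_archive_contents(build_demo_archive()):
        print(location, "->", fmt)
```

Because the manifest lists every file together with its format, a consumer can pick out the model or the simulation description without guessing from file extensions.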
2. Managing models
Ron Henkel
• Neo4J database
• Model2graph mapping
• Rich relations (http://biomodels.net/qualifiers)
• Links to annotations
“Which models are annotated with ‘Adenosine triphosphate’?”
“Which models contain reactions with ATP as reactant and ADP as product?”
Fig.: Henkel et al. (2012) INFORMATIK 2012, Braunschweig
Graph representation of a model: Document and Model nodes; element nodes (P E CR S); annotation nodes (SBO:0000268, uniprot:P07101, uniprot:Q03393, GO:0005737, HGNC:8582); relations: is, isVersionOf, isEncodedBy, asProduct, asReactant, asModifier.
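The second question above can be sketched as a traversal over such a graph. This is a toy in-memory version in Python, not the Neo4J implementation: the model and node identifiers, the `hasReaction` relation, and the direction of the edges are assumptions made for illustration; only `is`, `asReactant` and `asProduct` are relation names taken from the figure.

```python
# Hypothetical toy graph as (source, relation, target) edges.
EDGES = [
    ("model_A", "hasReaction", "A_r1"),
    ("A_r1", "asReactant", "A_atp"),
    ("A_r1", "asProduct", "A_adp"),
    ("A_atp", "is", "ATP"),
    ("A_adp", "is", "ADP"),
    ("model_B", "hasReaction", "B_r1"),
    ("B_r1", "asReactant", "B_adp"),
    ("B_r1", "asProduct", "B_atp"),
    ("B_atp", "is", "ATP"),
    ("B_adp", "is", "ADP"),
]


def build_index(edges):
    """Map (source, relation) to the set of target nodes."""
    index = {}
    for source, relation, target in edges:
        index.setdefault((source, relation), set()).add(target)
    return index


def models_with_reaction(edges, reactant, product):
    """Models with a reaction that consumes `reactant` and produces `product`."""
    index = build_index(edges)

    def neighbours(node, relation):
        return index.get((node, relation), set())

    def any_annotated(species_nodes, term):
        # A species matches if it is linked to the annotation term via "is".
        return any(term in neighbours(node, "is") for node in species_nodes)

    hits = set()
    for (source, relation), reactions in index.items():
        if relation != "hasReaction":
            continue
        for reaction in reactions:
            if any_annotated(neighbours(reaction, "asReactant"), reactant) and \
               any_annotated(neighbours(reaction, "asProduct"), product):
                hits.add(source)
    return sorted(hits)


if __name__ == "__main__":
    print(models_with_reaction(EDGES, "ATP", "ADP"))
```

In this toy data only `model_A` qualifies, because `model_B` runs the reaction in the opposite direction; typed relations make that distinction cheap to query.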
• Lucene-based ranked retrieval
“Give me the best matching model published about the Cell Cycle and covering forms of cdc.”
Lucene query: cdc* AND "Cell Cycle"
http://www.ebi.ac.uk/biomodels-demo/
Henkel et al. (2010), Bioinformatics
Fig.: Henkel et al. (2012) INFORMATIK 2012, Braunschweig
• Representing simulation descriptions
• ... and other types of model-related data
“Give me all possible simulations that show the dependency of the Cell Cycle on the concentration of cdc25.”
Fig.: Henkel et al. (2012) INFORMATIK 2012, Braunschweig
3. Integrating model data
At the heart of Linked Data for the Life Sciences
• Free and open source
• Based on Semantic Web standards
• Billions of interlinked statements from dozens of conventional and high value datasets
• Partnerships with EBI, NCBI, DBCLS, NCBO, OpenPHACTS, and commercial tool providers
Coverage: chemicals/drugs/formulations; genomes/genes/proteins, domains; interactions, complexes & pathways; BioModels; animal models and phenotypes; disease, genetic markers, treatments; terminologies & publications
# Get all biochemical reactions in BioModels that are kinds of "protein catabolic process",
# as defined by the Gene Ontology (in the BioPortal endpoint)
SELECT ?go ?label count(distinct ?x)
WHERE {
  ?go rdfs:label ?label .
  ?go rdfs:subClassOf ?tgo OPTION (TRANSITIVE) .
  ?tgo rdfs:label ?tlabel .
  FILTER regex(?tlabel, "^protein catabolic process")
  service <http://biomodels.bio2rdf.org/sparql> {
    ?x <http://bio2rdf.org/biopax_vocabulary:identical-to> ?go .
    ?x a <http://www.biopax.org/release/biopax-level3.owl#BiochemicalReaction> .
  }
}
Gene Ontology Annotation | Number of Reactions
protein catabolic process [go:0030163] | 51
cellular protein catabolic process [go:0044257] | 26
modification-dependent protein catabolic process [go:0019941] | 1
beta-amyloid formation [go:0034205] | 1
“Give me all reactions in BioModels Database that represent protein catabolic processes.”
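The logic behind this query (follow subclass links transitively from a root term, then count the reactions annotated with each term found) can be sketched in Python on toy data. This is an illustration of the idea only, not how the SPARQL endpoints evaluate the query; all term and reaction identifiers below are hypothetical, and the root term is included because it also appears in the result table.

```python
def descendants_or_self(term, subclass_of):
    """Return `term` plus every term that reaches it via subclass links."""
    children = {}
    for child, parent in subclass_of:
        children.setdefault(parent, set()).add(child)
    seen, stack = {term}, [term]
    while stack:
        current = stack.pop()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def count_reactions(root, subclass_of, annotations):
    """Count distinct reactions per term, for the root and its subclasses."""
    terms = descendants_or_self(root, subclass_of)
    per_term = {}
    for reaction, term in annotations:
        if term in terms:
            per_term.setdefault(term, set()).add(reaction)
    return {term: len(reactions) for term, reactions in per_term.items()}


# Hypothetical ontology fragment: (child, parent) pairs.
SUBCLASS_OF = [
    ("term:child", "term:root"),
    ("term:grandchild", "term:child"),
    ("term:unrelated_child", "term:unrelated"),
]

# Hypothetical annotations: (reaction, term) pairs.
ANNOTATIONS = [
    ("r1", "term:root"),
    ("r2", "term:root"),
    ("r3", "term:child"),
    ("r4", "term:grandchild"),
    ("r5", "term:unrelated"),
]

if __name__ == "__main__":
    print(count_reactions("term:root", SUBCLASS_OF, ANNOTATIONS))
```

In the federated query the subclass links come from one endpoint and the reaction annotations from another; here both are plain lists so the two steps are easy to follow.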
Summary
Approach | Features | Purpose
COMBINE archive | File bundle; easy access to all model-related data through one single file | Shipping files
Graph-DB (MORRE) | Network of interrelated nodes; IR techniques easily applicable; no schema; link models and simulations | Managing existing model data
BIO2RDF | Semantic integration of knowledge; automated reasoning; no schema; linking into LOD | Full integration
Thank you.
http://co.mbine.org/events/COMBINE_2013