Possibilities for Integrating Model-related Data in Computational Biology (DILS 2013)

Databases in Life Sciences, Montreal, July 2013

Dagmar Waltemath, University of Rostock, Germany

Nicolas Le Novère, Babraham Institute, UK

Michel Dumontier, Carleton University, Canada

Page 2: Possibilities for integrating model-related data in computational biology (DILS 2013)

Introduction

13-07-12 Integrating model-related data 2

Fig.: DOI: 10.1038/35002125

Page 4

Introduction

Fig.: DOI: 10.1038/35002125 (overlaid labels: number and size of models; time)

Page 5

Introduction

Fig.: DOI: 10.1038/35002125

Model reuse, result reproducibility

Page 7

Introduction

1. How can we distribute models with all information necessary to reuse them (MIRIAM)?

2. How can we effectively manage different types of model-related data?

3. How can we link model-related data to the rest of the world?

Page 8

1. Distributing models

Archive

Frank Bergmann

Nicolas Le Novère

Page 9

1. Distributing models

The COMBINE archive v0.1

• single “.zip” file

• bundles models and model-related data

http://co.mbine.org/documents/archive

Page 10

1. Distributing models

• All documents necessary for the description of a model and all associated data and procedures.

• In the future: also references to documents

1. A manifest file, "manifest.xml",

2. all described files,

3. a metadata file, "metadata.*",

4. remaining files.

<?xml version="1.0" encoding="utf-8"?>

<omexManifest xmlns="http://identifiers.org/combine.specifications/omex-manifest">

<content location="./manifest.xml" format="http://identifiers.org/combine.specifications/omex-manifest"/>

<content location="./model/model.xml" format="http://identifiers.org/combine.specifications/sbml"/>

<content location="./simulation.xml" format="http://identifiers.org/combine.specifications/sedml"/>

<content location="./article.pdf" format="application/pdf"/>

<content location="./metadata.rdf" format="http://identifiers.org/combine.specifications/omex-metadata"/>

</omexManifest>
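A minimal sketch of how such an archive could be assembled and read back with Python's standard library. The helper names `build_manifest` and `build_omex` and the placeholder file contents are assumptions made for illustration; they are not part of the COMBINE archive specification. The manifest is written first and the described files follow, mirroring the order listed on the slide.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace and format URIs copied from the manifest shown on the slide.
OMEX_NS = "http://identifiers.org/combine.specifications/omex-manifest"
SPEC = "http://identifiers.org/combine.specifications/"


def build_manifest(entries):
    """Return manifest.xml as bytes for a list of (location, format) pairs."""
    ET.register_namespace("", OMEX_NS)  # serialize with a default namespace
    root = ET.Element(f"{{{OMEX_NS}}}omexManifest")
    for location, fmt in entries:
        ET.SubElement(root, f"{{{OMEX_NS}}}content",
                      {"location": location, "format": fmt})
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def build_omex(files):
    """Bundle files into one zip; `files` maps location -> (format, bytes).

    Hypothetical helper, not an official COMBINE API.
    """
    entries = [("./manifest.xml", OMEX_NS)]
    entries += [(loc, fmt) for loc, (fmt, _) in files.items()]
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("manifest.xml", build_manifest(entries))
        for loc, (_, data) in files.items():
            name = loc[2:] if loc.startswith("./") else loc
            archive.writestr(name, data)
    return buffer.getvalue()


# Placeholder payloads standing in for real model files.
example_files = {
    "./model/model.xml": (SPEC + "sbml", b"<sbml/>"),
    "./simulation.xml": (SPEC + "sedml", b"<sedML/>"),
    "./article.pdf": ("application/pdf", b"%PDF-"),
    "./metadata.rdf": (SPEC + "omex-metadata", b"<rdf/>"),
}
omex_bytes = build_omex(example_files)

# Read the archive back: list its members and the locations in the manifest.
with zipfile.ZipFile(io.BytesIO(omex_bytes)) as archive:
    archive_names = archive.namelist()
    manifest_root = ET.fromstring(archive.read("manifest.xml"))
    listed = [content.get("location") for content in manifest_root]
```

Reading the manifest first, as in the last lines, is how a tool could discover which entry is the model and which is the simulation description without guessing from file names.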

Page 11

2. Managing models

Ron Henkel

Page 12

2. Managing models

• Neo4j database

• Model2graph mapping

• Rich relations (http://biomodels.net/qualifiers)

• Links to annotations

“Which models are annotated with ‘adenosine triphosphate’?”

“Which models contain reactions with ATP as reactant and ADP as product?”

Figure labels (graph schema): nodes Document and Model; model-element nodes P, E, C, R, S; annotation nodes SBO:0000268, uniprot:P07101, uniprot:Q03393, GO:0005737, HGNC:8582; edge labels is, isVersionOf, isEncodedBy, asProduct, asReactant, asModifier.

Fig.: Henkel et al. (2012) INFORMATIK 2012, Braunschweig
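The two example questions can be read as graph traversals. The sketch below mimics them on a tiny in-memory graph in plain Python rather than in Neo4j. The model names, the `hasSpecies` and `hasReaction` edges, the edge directions, and the use of CHEBI:15422 (ATP) and CHEBI:16761 (ADP) as annotation targets are assumptions made for illustration, not the schema used by the authors.

```python
from collections import defaultdict

# Edges are (source, relation, target) triples; all node names are made up.
EDGES = [
    ("model_A", "hasSpecies", "A_atp"),
    ("model_A", "hasSpecies", "A_adp"),
    ("model_A", "hasReaction", "A_r1"),
    ("A_atp", "is", "CHEBI:15422"),       # ATP
    ("A_adp", "is", "CHEBI:16761"),       # ADP
    ("A_atp", "asReactant", "A_r1"),
    ("A_adp", "asProduct", "A_r1"),
    ("model_B", "hasSpecies", "B_atp"),
    ("model_B", "hasReaction", "B_r1"),
    ("B_atp", "is", "CHEBI:15422"),
    ("B_atp", "asProduct", "B_r1"),
]

out_edges = defaultdict(list)
for source, relation, target in EDGES:
    out_edges[source].append((relation, target))


def targets(node, relation):
    """Nodes reachable from `node` over one edge with the given relation."""
    return [t for r, t in out_edges[node] if r == relation]


def annotated_species(model, annotation):
    """Species of `model` carrying an `is` link to the annotation."""
    return [s for s in targets(model, "hasSpecies")
            if annotation in targets(s, "is")]


def models_annotated_with(annotation):
    """Question 1: which models are annotated with a given term?"""
    models = {s for s, r, _ in EDGES if r == "hasSpecies"}
    return sorted(m for m in models if annotated_species(m, annotation))


def models_converting(reactant, product):
    """Question 2: models with a reaction consuming one term, producing the other."""
    hits = set()
    for model in models_annotated_with(reactant):
        for reaction in targets(model, "hasReaction"):
            uses = any(reaction in targets(s, "asReactant")
                       for s in annotated_species(model, reactant))
            makes = any(reaction in targets(s, "asProduct")
                        for s in annotated_species(model, product))
            if uses and makes:
                hits.add(model)
    return sorted(hits)


atp_models = models_annotated_with("CHEBI:15422")
atp_to_adp_models = models_converting("CHEBI:15422", "CHEBI:16761")
```

In this toy data both models mention ATP, but only `model_A` has a reaction that consumes ATP and produces ADP, which is the distinction the second question draws.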

Page 13

2. Managing models

• Lucene-based ranked retrieval

“Give me the best matching model published about the Cell Cycle and covering forms of cdc.”

Lucene query: cdc* AND "Cell Cycle"

http://www.ebi.ac.uk/biomodels-demo/

Henkel et al. (2010), Bioinformatics

Fig.: graph schema as on page 12; Henkel et al. (2012) INFORMATIK 2012, Braunschweig
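To illustrate what a query combining a wildcard term and a phrase selects, and how matches might be ordered, here is a toy ranker in plain Python. It is not Lucene and does not reproduce Lucene's scoring; the three model descriptions and the simple hit-count score are invented for this sketch.

```python
import re

# Invented model descriptions; not taken from BioModels Database.
DOCS = {
    "model_1": "Cell cycle control by cdc2 and cdc25 in fission yeast",
    "model_2": "Cell cycle oscillator driven by cyclin and cdc2",
    "model_3": "Glycolysis in yeast",
}


def score(text, prefix, phrase):
    """Return 0 unless both clauses match (AND); otherwise count prefix hits."""
    lowered = text.lower()
    pattern = r"\b" + re.escape(prefix.lower()) + r"\w*"
    prefix_hits = re.findall(pattern, lowered)
    if not prefix_hits or phrase.lower() not in lowered:
        return 0
    return len(prefix_hits)


def search(prefix, phrase):
    """Names of matching documents, best score first, ties broken by name."""
    scored = [(score(text, prefix, phrase), name) for name, text in DOCS.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for points, name in ordered if points > 0]


ranking = search("cdc", "cell cycle")
```

A real index would weight terms by rarity and field rather than by raw counts; the point here is only that every hit satisfies both clauses and that the hits come back ordered rather than as an unordered set.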

Page 14

2. Managing models

• Representing simulation descriptions

• ... and other types of model-related data

“Give me all possible simulations that show the dependency of the Cell Cycle on the concentration of cdc25.”

Fig.: Henkel et al. (2012) INFORMATIK 2012, Braunschweig

Page 15

3. Integrating model data

Page 16

3. Integrating model data

Bio2RDF

At the heart of Linked Data for the Life Sciences

• Free and open source

• Based on Semantic Web standards

• Billions of interlinked statements from dozens of conventional and high value datasets

• Partnerships with EBI, NCBI, DBCLS, NCBO, OpenPHACTS, and commercial tool providers

Figure labels (covered data): chemicals/drugs/formulations; genomes/genes/proteins, domains; interactions, complexes & pathways; BioModels; animal models and phenotypes; disease, genetic markers, treatments; terminologies & publications

Page 17

3. Integrating model data

“Give me all reactions in BioModels Database that represent protein catabolic processes.”

# Get all biochemical reactions in BioModels that are kinds of
# "protein catabolic process", as defined by the Gene Ontology
# (in the BioPortal endpoint).
# OPTION (TRANSITIVE) is a Virtuoso extension, not standard SPARQL.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?go ?label (COUNT(DISTINCT ?x) AS ?reactions)
WHERE {
  ?go rdfs:label ?label .
  ?go rdfs:subClassOf ?tgo OPTION (TRANSITIVE) .
  ?tgo rdfs:label ?tlabel .
  FILTER regex(?tlabel, "^protein catabolic process")
  SERVICE <http://biomodels.bio2rdf.org/sparql> {
    ?x <http://bio2rdf.org/biopax_vocabulary:identical-to> ?go .
    ?x a <http://www.biopax.org/release/biopax-level3.owl#BiochemicalReaction> .
  }
}
GROUP BY ?go ?label

Gene Ontology annotation                                        Number of reactions
protein catabolic process [go:0030163]                          51
cellular protein catabolic process [go:0044257]                 26
modification-dependent protein catabolic process [go:0019941]   1
beta-amyloid formation [go:0034205]                             1
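What the query computes can be sketched offline in plain Python: collect the target term together with all of its transitive subclasses, then count the distinct reactions annotated with each of those terms. The subclass edges below are a simplified assumption about the GO hierarchy and the reaction identifiers are invented, so the counts do not reproduce the result table above.

```python
from collections import defaultdict

# child -> parents; a simplified, assumed excerpt of the GO is-a hierarchy.
SUBCLASS_OF = {
    "GO:0044257": ["GO:0030163"],
    "GO:0019941": ["GO:0044257"],
    "GO:0034205": ["GO:0030163"],
    "GO:0006096": ["GO:0008150"],   # unrelated branch
}

# (reaction, GO term it is annotated with); identifiers are invented.
ANNOTATIONS = [
    ("reaction_1", "GO:0030163"),
    ("reaction_2", "GO:0030163"),
    ("reaction_3", "GO:0044257"),
    ("reaction_4", "GO:0019941"),
    ("reaction_5", "GO:0006096"),
]


def descendants_or_self(root):
    """Return root plus every term that reaches it via subclass edges."""
    children = defaultdict(list)
    for child, parents in SUBCLASS_OF.items():
        for parent in parents:
            children[parent].append(child)
    seen, stack = {root}, [root]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def reactions_per_term(root):
    """Count distinct reactions per term within the subtree rooted at `root`."""
    terms = descendants_or_self(root)
    reactions = defaultdict(set)
    for reaction, term in ANNOTATIONS:
        if term in terms:
            reactions[term].add(reaction)
    return {term: len(found) for term, found in reactions.items()}


counts = reactions_per_term("GO:0030163")
```

The federated query splits the same work across two endpoints: the ontology endpoint resolves the subclass closure, and the BioModels endpoint supplies the reaction annotations.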

Page 18

Summary

Approach: COMBINE archive
Features: File bundle
• Easy access to all model-related data through one single file
Purpose: Shipping files

Approach: Graph-DB (MORRE)
Features: Network of interrelated nodes
• IR techniques easily applicable
• No schema
• Link models and simulations
Purpose: Managing existing model data

Approach: BIO2RDF
Features: Semantic integration of knowledge
• Automated reasoning
• No schema
• Linking into LOD
Purpose: Full integration

Page 19

Thank you.

http://co.mbine.org/events/COMBINE_2013