RDFizing the EBI Gene Expression Atlas


Post on 08-Jan-2016


Page 1: RDFizing the EBI Gene Expression Atlas


RDFizing the EBI Gene Expression Atlas

James Malone, Electra Tapanari

[email protected]

Page 2: RDFizing the EBI Gene Expression Atlas

Motivation

- Initial motivation is explorative
- Can we ask new questions?
- Do we get new answers?
- Can we integrate this data with other related data?
- Is there a sufficient user community to justify an RDF Atlas resource?

Page 3: RDFizing the EBI Gene Expression Atlas


SESL Project

- Semantic Enrichment of Scientific Literature Working Group

- Includes EBI (Dietrich Rebholz) and Pistoia Alliance

- Pilot project in 2010 on developing knowledge-brokering standards for the semantic integration of gene-to-Type II diabetes data, using Gene Expression Atlas, OMIM, UniProt and the literature

Page 4: RDFizing the EBI Gene Expression Atlas

Gene Expression: Archive to Atlas

[Diagram labels: AE/GEO acquire; >250,000 assays; >10,000 experiments; ArrayExpress; curation; re-annotate & summarize; Atlas]

Page 5: RDFizing the EBI Gene Expression Atlas

Experimental Factor Ontology

• We consume parts of reference ontologies from the domain

• Construct new classes and relations to answer our use cases

• Aim is reuse of existing resources, shared frameworks and mapping of equivalencies where they exist

[Diagram: EFO surrounded by its sources: Disease Ontology, Anatomy Reference Ontology, Ontology for Biomedical Investigations, Chemical Entities of Biological Interest (ChEBI), various species anatomy ontologies, Relation Ontology, text mining]

Page 6: RDFizing the EBI Gene Expression Atlas


Gene Expression Atlas @ www.ebi.ac.uk/gxa

Query for Cell adhesion genes in all ‘organism parts’

‘View on EFO’

Ontologically Modeling Sample Variables in Gene Expression Data [email protected]

Page 7: RDFizing the EBI Gene Expression Atlas


Input XML

Page 8: RDFizing the EBI Gene Expression Atlas


Mapping XML Results to RDF (1)

Id here is an ENSEMBL Gene ID, e.g. RUNX1 (ENSG00000159216)

• Gene to related transcripts, sequence and gene functions

• Also EFO ontology classes in RDF form (shown is label to IRI triple)
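The slide image with the actual triples is not part of this transcript. As a rough illustration only, the following Python sketch shows what a gene triple and an EFO label triple could look like when written as N-Triples. The gene IRI pattern (identifiers.org), the choice of rdfs:label as the predicate, and the helper names are all assumptions for the example, not the mapping used in the Atlas work.

```python
# Minimal N-Triples sketch; IRIs and helper names are illustrative assumptions.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Assumed IRI pattern for the Ensembl gene ID mentioned on the slide.
GENE = "http://identifiers.org/ensembl/ENSG00000159216"
# An EFO class IRI, used here only as an example ontology term.
EFO_CLASS = "http://www.ebi.ac.uk/efo/EFO_0000001"


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text):
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def triple(subject, predicate, obj):
    """Join three already-formatted terms into one N-Triples statement."""
    return f"{subject} {predicate} {obj} ."


lines = [
    triple(iri(GENE), iri(RDFS_LABEL), literal("RUNX1")),
    triple(iri(EFO_CLASS), iri(RDFS_LABEL), literal("experimental factor")),
]

for line in lines:
    print(line)
```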

Page 9: RDFizing the EBI Gene Expression Atlas


Mapping XML Results to RDF (2)

• Connecting gene and ontology id together with experimental metrics
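One way to picture this connection is a single node per result row that points at the gene, at the ontology class, and at the measured values. The Python sketch below assumes that shape. The `http://example.org/atlas#` namespace, the predicate names, and the choice of a p-value and an up/down direction as the "metrics" are hypothetical, since the slide's real predicates are not in the transcript.

```python
# Sketch: one blank node ties a gene and an EFO class to experimental metrics.
# The namespace and predicate names below are hypothetical placeholders.
EX = "http://example.org/atlas#"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


def expression_triples(node, gene_iri, efo_iri, p_value, direction):
    """Return N-Triples statements that share one blank-node subject."""
    subject = f"_:{node}"
    return [
        f"{subject} <{EX}gene> <{gene_iri}> .",
        f"{subject} <{EX}factorValue> <{efo_iri}> .",
        f'{subject} <{EX}pValue> "{p_value}"^^<{XSD_DOUBLE}> .',
        f'{subject} <{EX}direction> "{direction}" .',
    ]


triples = expression_triples(
    "n1_0",
    "http://identifiers.org/ensembl/ENSG00000159216",  # assumed gene IRI pattern
    "http://www.ebi.ac.uk/efo/EFO_0000001",            # example EFO class IRI
    0.001,                                             # made-up value for illustration
    "UP",
)

for statement in triples:
    print(statement)
```

Using a blank node keeps the gene and the ontology class as reusable resources, while the values that only make sense for one gene/condition pair stay attached to that pair.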

Page 10: RDFizing the EBI Gene Expression Atlas


Mapping XML Results to RDF (3)

• Connecting gene with experimental metadata

Page 11: RDFizing the EBI Gene Expression Atlas


Relationship Issues

• EFO attempts to follow OBO Foundry guidance and uses the OBO Relation Ontology

• OBI model is more complex, e.g. the relation between sample and measure is indirect*

• Relationships between some entities are still not well represented across the community, even protein product to gene (see my post to the OBO list)

• is_about relation is very generic and largely meaningless

• We will use RO where possible, subclass RO otherwise and continue to monitor OBO

*See Brinkman et al. (2010) Modeling biomedical experimental processes with OBI, Journal of Biomedical Semantics, 1(Suppl 1):S7

Page 12: RDFizing the EBI Gene Expression Atlas

Display of query results in Gene Expression Atlas DB

Already:

1) JSON format

2) XML format

Plus now:

3) RDF format

Page 13: RDFizing the EBI Gene Expression Atlas

RDF pipeline

INPUT: XML result doc from Atlas; XML doc with triple patterns

PROCESS: Java code

OUTPUT: RDF triples

• Pipeline for generating the RDF given the XML input

• Note: this works with any XML input
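The pipeline's general shape (an XML result document plus a separate XML document of triple patterns, turned into triples by code) can be sketched as follows. The original implementation was in Java; this is a Python illustration, and both XML layouts, the `http://example.org/atlas#` predicates, and the `xml_to_triples` function are invented for the example, because neither the Atlas XML nor the pattern format appears in the transcript.

```python
import xml.etree.ElementTree as ET

# Hypothetical result document; the real Atlas XML schema is not shown here.
RESULT_XML = """
<results>
  <result><gene>ENSG00000159216</gene><name>RUNX1</name></result>
  <result><gene>EXAMPLE_GENE_2</gene><name>placeholder</name></result>
</results>
"""

# Hypothetical pattern document: which element is a row, and which child
# element feeds which predicate.
PATTERN_XML = """
<patterns row="result">
  <pattern predicate="http://example.org/atlas#geneId" path="gene"/>
  <pattern predicate="http://example.org/atlas#geneName" path="name"/>
</patterns>
"""


def xml_to_triples(result_xml, pattern_xml):
    """Apply every pattern to every row, one blank node per row."""
    spec = ET.fromstring(pattern_xml)
    root = ET.fromstring(result_xml)
    triples = []
    for index, row in enumerate(root.iter(spec.get("row"))):
        subject = f"_:n1_{index}"
        for pattern in spec.findall("pattern"):
            value = row.findtext(pattern.get("path"))
            if value is None:
                continue  # this row has no such child element
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            triples.append(f'{subject} <{pattern.get("predicate")}> "{escaped}" .')
    return triples


triples = xml_to_triples(RESULT_XML, PATTERN_XML)
for statement in triples:
    print(statement)
```

Because the patterns live in data rather than in the code, pointing the same function at a different XML document only requires a different pattern document, which is one reading of the "works with any XML" remark.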

Page 14: RDFizing the EBI Gene Expression Atlas


Triple Pattern specification

Page 15: RDFizing the EBI Gene Expression Atlas


Example RDF

Page 16: RDFizing the EBI Gene Expression Atlas

Blank Node Connections

• First row (n1_0): 7 triples
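A count like this can be checked mechanically by grouping N-Triples statements by their blank-node subject. The Python sketch below does that on placeholder data. The seven predicates `p1` to `p7` and their values are stand-ins, not the seven triples from the slide, and `triples_per_blank_node` is a hypothetical helper.

```python
from collections import Counter


def triples_per_blank_node(ntriples_text):
    """Count statements per blank-node subject in N-Triples text."""
    counts = Counter()
    for line in ntriples_text.splitlines():
        line = line.strip()
        if line.startswith("_:"):
            # The subject is the first whitespace-separated token.
            counts[line.split(None, 1)[0]] += 1
    return counts


# Placeholder data: seven statements for row n1_0, two for row n1_1.
sample = "\n".join(
    [f'_:n1_0 <http://example.org/atlas#p{i}> "v{i}" .' for i in range(1, 8)]
    + [f'_:n1_1 <http://example.org/atlas#p{i}> "v{i}" .' for i in range(1, 3)]
)

counts = triples_per_blank_node(sample)
print(counts["_:n1_0"], counts["_:n1_1"])
```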

Page 17: RDFizing the EBI Gene Expression Atlas

Discussion

• Is there a community that warrants directing resources towards this?

• Can we answer new questions?

• Can we integrate with other data sources?

• Can we consolidate complex, non-interoperable ontologies?

• EFO represents a view on this but is a scoped, pragmatic choice – will this indeed always be the case?

Page 18: RDFizing the EBI Gene Expression Atlas


Acknowledgements

• Electra Tapanari (intern who did the bulk of the implementation)

• Dietrich Rebholz-Schuhmann (funded the internship)

• Christoph Grabmuller

• Misha Kapushesky

• Helen Parkinson

• Contact me

James Malone: [email protected]