semantic concept-based queries in onix cagrid · pdf fileoutline problem statement high-level...

28
Semantic concept-based queries in ONIX caGrid case Alejandra González Beltrán Joint work with Anthony Finkelstein (UCL), J Max Wilkinson (NCRI) and Jeff Kramer (ICL) caBIG Arch/VCDE F2F Meeting 11 th May 2009

Upload: trinhtu

Post on 22-Feb-2018

217 views

Category:

Documents


2 download

TRANSCRIPT

Semantic concept-based queries in ONIXcaGrid case

Alejandra González BeltránJoint work with Anthony Finkelstein (UCL), J Max Wi lkinson (NCRI) and Jeff Kramer (ICL)

caBIG Arch/VCDE F2FMeeting

11th May 2009

Outline

� Problem statement� High-level concept-based queries for caGrid resources

implemented in ONIX� Motivating Example

� Approach� Exploiting the combination of Semantic Web technologies and

caGrid� UML vs OWL

� System Architecture� OWL Generation� Query Rewriting and Translation

� Concluding remarks

Problem Statement

� Support for high-level descriptive queries based on domain concepts� High-level query: the user may be unaware of the structure of the

particular target resources� Descriptive query: the query is not given in terms of how to find

data, but gives the criteria for the desired data

� caGrid case: exploit the rich metadata infrastructure to support high-level queries based on NCIt concepts and its automatic translation to CQL

Motivating example

� A biomedical scientist is interested in the gene Transforming Growth Factor Beta 1 (TGFB1), which has been implicated in many diseases, including cancer� Interest in examining the effect of all or certain SNPs present in the

gene TGFB1.� Need to retrieve SNPs associated with the gene TGFB1 from a

data source

� caBIO is an appropriate target resource [Komatsoulis’08]

� caBIO model 4.2 is the latest version, but 4.0 is the latest version

Motivating Example over caBIO 4.0 service

Motivating Example over caBIO 4.0 service

Inherited UMLassociations

Motivating Example in CQL

<ns1:CQLQuery xmlns:ns1=" http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery"><ns1:Target name=" gov.nih.nci.cabio.domain.SNP"><ns1:Association

name=" gov.nih.nci.cabio.domain.GeneRelativeLocation" roleName=" relativeLocationCollection">

<ns1:Association name=" gov.nih.nci.cabio.domain.Gene" roleName=" gene">

<ns1:Attribute name=" symbol" predicate= " EQUAL_TO" value=" TGFB1"/></ns1:Association></ns1:Association>

</ns1:Target></ns1:CQLQuery>

Motivating example – Annotated UML model

Single_Nucleotide_Polymorphism

Gene_Relative_Location

Gene

Gene_Symbol

Relative_Value

Chromosome

Location

Description Logic query(DL-query) in Manchester OWL Syntax

Motivating example – Query in terms of NCItconcepts

� Find objects that have concept Single_Nucleotide_Polymorphism and have an association with objects whose concept is Gene, which in turn have an attribute with concept Gene_Symbol, whose value is “TGFB1”

� hasConcept some Single_Nucleotide_Polymorphism and hasAssociation some (hasConcept some Gene and hasAttribute some(hasConcept some Gene_Symbol and hasValue value “TGFB1”)

CQL query

Description Logic query(DL-query) in Manchester OWL Syntax

Motivating example – Query in terms of NCItconcepts

� Find objects that have concept Single_Nucleotide_Polymorphism and have an association with objects whose concept is Gene, which in turn have an attribute with concept Gene_Symbol, whose value is “TGFB1”

� hasConcept some Single_Nucleotide_Polymorphism and hasAssociation some (hasConcept some Gene and hasAttribute some(hasConcept some Gene_Symbol and hasValue value “TGFB1”)

CQL query

Description Logic query(DL-query) in Manchester OWL Syntax

Motivating example – Query in terms of NCItconcepts

� Find objects that have concept Single_Nucleotide_Polymorphism and have an association with objects whose concept is Gene, which in turn have an attribute with concept Gene_Symbol, whose value is “TGFB1”

� hasConcept some Single_Nucleotide_Polymorphism and hasAssociation some (hasConcept some Gene and hasAttribute some(hasConcept some Gene_Symbol and hasValue value “TGFB1”)

CQL query

Description Logic query(DL-query) in Manchester OWL Syntax

Motivating example – Query in terms of NCItconcepts

� Find objects that have concept Single_Nucleotide_Polymorphism and have an association with objects whose concept is Gene, which in turn have an attribute with concept Gene_Symbol, whose value is “TGFB1”

� hasConcept some Single_Nucleotide_Polymorphism and hasAssociation some (hasConcept some Gene and hasAttribute some(hasConcept some Gene_Symbol and hasValue value “TGFB1”)

CQL query

Outline

� Problem statement� High-level concept-based queries for caGrid resources

implemented in ONIX� Motivating Example

� Approach� Exploiting the combination of Semantic Web technologies and

caGrid� UML vs OWL

� System Architecture� OWL Generation� Query Rewriting and Translation

� Concluding remarks

ApproachR

ules

Syntax

Structure

Reference

Domain EVS

caDSR

Ontologicalrepresentations of

data services’ metadata

Exploit caGrid rich metadata

infrastructure

InformationModels/ GME

Instance Data

Metadata Hierarchy [Pollock’04]

caGridData Service

ApproachR

ules

Syntax

Structure

Reference

Domain EVS

caDSR

Ontologicalrepresentations of

data services’ metadata

Exploit caGrid rich metadata

infrastructure

InformationModels/ GME

Instance Data

Metadata Hierarchy [Pollock’04]

caGridData Service

Annotated UML Web Ontology Language (OWL)

Software design and analysis

Closed World Assumption

Main constructs:� UML class: a type for a set of objects with common characteristics (UML attributes)� UML Associations� Generalisations: classes and associations

No formal semantics

Visual modelling language

Open World Assumption

Reasoning (or inference) capabilities

Main constructs:� OWL class: set of individuals� Object properties: relationships between individuals� Datatype properties: relationships between individuals and data values

Formal semantics: DLs

Textual modelling language

Semantic web, knowledge representation

UML OWL

Outline

� Problem statement� High-level concept-based queries for caGrid resources

implemented in ONIX� Motivating Example

� Approach� Exploiting the combination of Semantic Web technologies and

caGrid� UML vs OWL

� System Architecture� OWL Generation� Query Rewriting and Translation

� Concluding remarks

Architecture

FQP servicecaDSR service

EVS service

SemanticFQP service

OWL generationservice

OWL repositoryservice

caGriddata service

CQL

DCQLCQL

ONIX Portal

DCQL

concept-basedquery

� Services for data integration developed within the Ontology-based Query WG can extend this architecture

OWL generation from caGrid information models

� Analysis of existing UML-to-OWL transformation

� Custom UML-to-OWL transformation considering semantic annotations

� Two generation methods� Semantic data integration:

e.g. multiplicities [McCusker’09]

� Semantic queries: e.g. associations are transitive, only existential restrictions

NCIt

Data Service MetadataOntology

Annotated UMLontology

NCIt Module forData Service

Metadata

NCIt Module forData Service

Metadata

<<imports>> <<imports>>

<<depends>>

UMLontology

<<imports>>

Annotated UML model

Single_Nucleotide_Polymorphism

Gene_Relative_Location

Gene

Gene_Symbol

Relative_Value

Chromosome

Location

Part of caBIO 4.0 ontology

cb:GeneRelativeLocationcb:SNP

nci:Single_Nucleotide_Polymorphism

cb:relativeLocation

nci:Gene

cb:Gene

nci:Gene_Relative_Location

sq:hasConcept sq:hasConcept sq:hasConcept

sq:hasAssociation

rdfs:subPropertyOf

cb:gene

rdfs:subPropertyOf

Q ≅ hasConcept some Single_Nucleotide_Polymorphismand hasAssociation some (hasConcept some GeneConcept and

hasAttribute some (hasConcept some Gene_Symbol andhasValue value “TGFB1”))

<ns1:CQLQuery xmlns:ns1=" http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery"><ns1:Target name=" gov.nih.nci.cabio.domain.SNP"><ns1:Association

name=" gov.nih.nci.cabio.domain.GeneRelativeLocation" roleName=" relativeLocationCollection">

<ns1:Association name=" gov.nih.nci.cabio.domain.Gene" roleName=" gene">

<ns1:Attribute name=" symbol" predicate=“ EQUAL_TO" value=" TGFB1"/></ns1:Association>

</ns1:Association></ns1:Target>

</ns1:CQLQuery>

Semantic Query ServiceQuery rewriting and translation

DL- Query Parsing

DL-query to OQOL

OQOL to CQL

DL- Query Validation

DL- Query RewritingData Service

Ontology

Syntactic analysis of query

Semantic analysis of query: it must entail a UML class or UML attributes to be valid

Reformulate query in terms of data service constructs using explanations [Kalyanpur’07]

Translate DL-query into the Object-Query Optimisation Language(monoid comprehensions) & Normalization [Fegaras’00].

Translate OQOL expression to CQL

DL-query

CQL

Conclusions

� Support for high-level description queries based on domain concepts for caGrid data services

� Approach generates ontologies from caGrid metadata and uses ontologies+reasoning to translate concept-based queries to CQL� No need for the user to be aware of the structure of the target

resource

� Approach is applicable to other resources exposing metadata and semantic annotations

Future Work� Extend approach to support DCQL queries� Incorporate caDSR semantics to OWL generator

References

[Komatsoulis’08] G Komatsoulis, D Warzel, F Hartel, K Shanbag, et al. caCORE version 3: Implementation of a model-driven, service-oriented architecture for semantic interoperability. Journal of Biomedical Informatics, 41(1):106-123.

[McCusker’09] J McCusker, J Phillips, A González Beltrán, A Finkelstein, Semantic web datawarehousing for caGrid. To appear in BMC Bioinformatics, 2009.

[Kalyanpur’07] A Kalyanpur, B Parsia, M Horridge, E Sirin. Finding all justifications of OWL DL entailments. ISWC/ASWC, 267-280.

[Fegaras’00] L Fegaras, D Maier. Optimizing Object Queries Using an Effective Calculus. ACM Trans. Database Syst. 25(4):457-516,2000

Acknowledgements

� NCRI team� Konrad Rokicki from caBIO team

Thank you!

Questions?