
Document Navigation: Ontologies or Knowledge Organisation Systems

Simon Jupp

Bio-Health Informatics Group

University of Manchester, UK

Bio-ontologies SIG, ISMB 2007 Vienna, Austria

A Semantic Grid Browser for the Life Sciences Applied to the Study of Infectious Diseases


Document navigation over a Semantic Web

Introduction

Ontologies are being developed as background knowledge to drive the Semantic Web.

Message: Formal ontologies are not the only knowledge artefact needed. Artefacts with weaker semantics have their role and are the best solution in some circumstances.


COHSE (Conceptual Open Hypermedia SErvice)

Navigation via hypertext is a mainstay of the WWW

Problem: Links are typically embedded in Web pages: hard-coding, format restrictions, ownership, legacy resources, maintenance, unary targets, etc.


COHSE Architecture

[Figure. Components shown: HTML document in; DLS Agent; Knowledge Service; Resource Service; Ontology; SKOS; Search Engine; Annotation DB; linked HTML document out]


Sealife project

www.biotec.tu-dresden.de/sealife

A realisation of a Semantic Grid Browser, which links the current Web to the emerging eScience infrastructure

Built on existing tools developed by partners:

COHSE, but also GoPubMed and others.

Applications: from cells via tissue to patients

Evidence-based medicine

Patent and literature mining

Molecular biology


What is a Sealife browser?

A Sealife browser is a browser that identifies concepts in texts and offers appropriate services based on these concepts.

Three main components:

1) an underlying “ontology” or set of “ontologies”

2) a text mining component

3) a service composition module
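The three components can be sketched in a few lines of Python. This is only an illustrative sketch, not the Sealife implementation: the label dictionary, the service names and the functions `find_concepts` and `offer_services` are all hypothetical, and simple label matching stands in for a real text mining component.

```python
import re

# 1) Hypothetical background knowledge: concept id -> labels that may appear in text.
CONCEPT_LABELS = {
    "tuberculosis": ["tuberculosis", "TB"],
    "poliovirus": ["poliovirus", "polio virus"],
}

# Hypothetical service registry: concept id -> services offered for that concept.
SERVICES = {
    "tuberculosis": ["Search PubMed", "Show treatment guidelines"],
    "poliovirus": ["Search PubMed"],
}


def find_concepts(text):
    """2) Text mining step: return (concept id, matched text) pairs found in text."""
    hits = []
    for concept, labels in CONCEPT_LABELS.items():
        for label in labels:
            pattern = r"\b" + re.escape(label) + r"\b"
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                hits.append((concept, match.group(0)))
    return hits


def offer_services(text):
    """3) Service composition step: map each concept found to its services."""
    found = {concept for concept, _ in find_concepts(text)}
    return {concept: SERVICES.get(concept, []) for concept in sorted(found)}
```

Calling `offer_services` on the text of a page returns, for every concept recognised in it, the list of services a browser could attach to that concept.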


Sealife use case - study of disease

NeLI: National electronic Library of Infection portal. Range of users but few links.

User Group | Question | Targets
Family Doctor (GP) | Tuberculosis drugs and side effects? | British National Formulary (BNF)
Clinicians | Tuberculosis treatment guidelines? | Public Health Observatories (PHO)
Molecular Biologists | Drug resistant tuberculosis species? | PubMed
General Public | What is tuberculosis? | Health Protection Agency (HPA) or the NHS Direct Online website

Given a document about Tuberculosis, where would users want to navigate to next?

http://www.neli.org.uk


Background Knowledge

What style of knowledge model is best suited for navigation?

Do we need strict semantics? Are they a help or a hindrance?

All I want to say is that this concept “has something to do with” that concept. This doesn’t fit well with formal ontology.

e.g. Polio has something to do with polio virus / polio disease / polio treatment: get me documents about all of them!


Sealife background knowledge

To cover molecular biology through to medicine we need a large knowledge artefact to serve as background knowledge for SeaLife. This artefact must support sensible navigation between documents on the web.

Luckily…


[Figure: existing bio-ontologies, grouped by area]

Sequence: Sequence types and features; Genetic Context

Gene products: Molecule role; Molecular Function; Biological process; Cellular component

Proteins: Protein covalent bond; Protein domain; UniProt taxonomy

Pathways: Pathway ontology; Event (INOH pathway ontology); Systems Biology; Protein-protein interaction

Development: Arabidopsis development; Cereal plant development; Plant growth and developmental stage; C. elegans development; Drosophila development (FBdv); Human developmental anatomy, abstract version; Human developmental anatomy, timed version

Anatomy: Mosquito gross anatomy; Mouse adult gross anatomy; Mouse gross anatomy and development; C. elegans gross anatomy; Arabidopsis gross anatomy; Cereal plant gross anatomy; Drosophila gross anatomy; Dictyostelium discoideum anatomy; Fungal gross anatomy (FAO); Plant structure; Maize gross anatomy; Medaka fish anatomy and development; Zebrafish anatomy and development

Phenotype: NCI Thesaurus; Mouse pathology; Human disease; Cereal plant trait; PATO (attribute and value); Mammalian phenotype; Habronattus courtship; Loggerhead nesting; Animal natural history and life history

Other areas shown: Transcript; Cell type; BRENDA tissue / enzyme source; Plasmodium life cycle; eVOC (Expressed Sequence Annotation for Humans)


History of Medical Vocabularies

[Figure: timeline with year marks 1603, 1700, 1785, 1855, 1900, 1975, 1985, 1995 and 2005. Vocabularies shown: Synopsis Nosologiae Methodicae, ICD, ICD9, OPCS, OPCS3, OPCS4, OPCS4.3, SNOP, CPT, MeSH, EmTree, READ, CTV3, ICPC, SNOMED-2, SNOMED International, SNOMED-RT, SNOMED-CT, GALEN, DM&D, UMLS, FMA]


What do we need for navigation?

The bio-medical domain is rich in vocabularies and ontologies.

These form a large lexical resource, including textual definitions and synonyms.

There is a varying degree of semantics, expressivity and formality in these vocabularies (e.g. MeSH) and ontologies (e.g. FMA).

Most include some form of hierarchy. Hierarchies are well suited for driving navigation.

Question: Do we want strict sub/super class relationships? Or do we want looser notions such as broader/narrower?


Ontology or Vocabulary?

The initial approach to COHSE and SeaLife was to represent everything in OWL.

The strict semantics of OWL do not always lend themselves to sensible navigation, and conversion from vocabularies to OWL is difficult.

It’s hard to model some things in OWL…

MeSH:

Head <-- Ear

Head <-- Nose

Accident <-- Traffic Accident

Accident <-- Accident Prevention

OBO/OWL:

Nucleus part_of Cell

Cell has_part Nucleus - not always true

PolioDisease causedby PolioVirus


SKOS (Simple Knowledge Organisation System)

Purpose: Subject Metadata and information retrieval

RDF/XML representation

Model ideally suited for our task

e.g. This document is about tuberculosis

e.g. prefLabel, altLabel, broader, narrower, related.

http://www.w3.org/2004/02/skos/

Model for representing concept schemes, thesauri, classification systems, taxonomies etc…

A SKOS concept is an individual of the class skos:Concept; relationships are just asserted facts between individuals (not restrictions!)
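To make the "asserted facts" point concrete, here is a minimal sketch that builds the triples for one SKOS concept using plain Python tuples. The `http://example.org/concept/` namespace and the `concept_triples` helper are assumptions made for illustration; the SKOS and RDF property URIs are the standard ones. The labels are borrowed from the Poliovirus example later in the talk.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/concept/"  # hypothetical namespace for our concepts


def concept_triples(name, pref_label, alt_labels=(), broader=(), related=()):
    """Return (subject, predicate, object) triples describing one SKOS concept.

    Every triple is a plain asserted fact about an individual; nothing here
    is a class restriction. Labels are literals, everything else is a URI.
    """
    subject = EX + name
    triples = [
        (subject, RDF_TYPE, SKOS + "Concept"),
        (subject, SKOS + "prefLabel", pref_label),
    ]
    triples += [(subject, SKOS + "altLabel", label) for label in alt_labels]
    triples += [(subject, SKOS + "broader", EX + b) for b in broader]
    triples += [(subject, SKOS + "related", EX + r) for r in related]
    return triples


poliovirus = concept_triples(
    "Poliovirus",
    "Poliovirus",
    alt_labels=["Brunhilde Virus"],
    broader=["Enterovirus"],
    related=["PolioDisease"],
)
```

Because the relationships are just triples between individuals, adding or removing one never makes the model inconsistent in the way a change to an OWL class restriction can.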


Conversion to SKOS

• relationship:part_of --> skos:broader (e.g. finger part_of hand)

• relationship:contains --> skos:narrower (e.g. skull contains brain)

• relationship:causes --> skos:related (e.g. PolioVirus causes PolioDisease)

Sub-properties:

obo:part_of is a sub-property of skos:broader

obo:has_part is a sub-property of skos:narrower

skos:broader and skos:narrower are inverses

Leaves us open to migration back to OWL
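The mapping on this slide can be written down as a small lookup table. The sketch below is an assumption about how such a conversion might look, not the converter used in the project: the `to_skos` function is hypothetical, statements are plain tuples, and relations without a mapping are simply skipped.

```python
# Mapping from the slide: source relation -> SKOS property it is converted to.
RELATION_TO_SKOS = {
    "relationship:part_of": "skos:broader",
    "relationship:contains": "skos:narrower",
    "relationship:causes": "skos:related",
}


def to_skos(statements):
    """Convert (subject, relation, object) statements to their SKOS equivalents.

    Assumption: relations with no entry in the mapping are dropped.
    """
    converted = []
    for subject, relation, obj in statements:
        skos_property = RELATION_TO_SKOS.get(relation)
        if skos_property is not None:
            converted.append((subject, skos_property, obj))
    return converted


statements = [
    ("finger", "relationship:part_of", "hand"),
    ("skull", "relationship:contains", "brain"),
    ("PolioVirus", "relationship:causes", "PolioDisease"),
]
skos_statements = to_skos(statements)
```

Here `("finger", "skos:broader", "hand")` reads as "hand is a broader concept than finger". Keeping the original relations alongside the SKOS ones, as sub-properties, is what leaves the way back to OWL open.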


Conversion to SKOS

[Figure: sources converted to SKOS: MeSH, OBO ontologies, OWL ontologies, SNOMED, NeLI, taxonomies, concept schemas, thesauri, other]


Advantage of this approach

For a given concept e.g. “Polio Virus”, we can query multiple resources and bring related concepts together.

Source | Terms found | SKOS relation to “Poliovirus”
MeSH | Brunhilde Virus | skos:altTerm
Disease Ontology | Spinal cord disease | skos:broaderThan
Disease Ontology | Postpoliomyelitis Syndrome | skos:narrowerThan
SNOMED | Microorganism | skos:broaderThan
SNOMED | Enterovirus | skos:broaderThan

• Rapid (and cheap!) generation of knowledge artefact

• Take advantage of efforts in multiple biomedical communities

• We don’t have to make any strong ontological distinctions
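Bringing related concepts together from several sources amounts to a simple merge. The sketch below uses the rows of the Poliovirus table on this slide as its data; the `SOURCES` layout and the `related_terms` function are hypothetical, and the relation names are copied from the slide as written.

```python
from collections import defaultdict

# What each source returned for "Poliovirus", copied from the table on this slide.
SOURCES = {
    "MeSH": [("Brunhilde Virus", "altTerm")],
    "Disease Ontology": [
        ("Spinal cord disease", "broaderThan"),
        ("Postpoliomyelitis Syndrome", "narrowerThan"),
    ],
    "SNOMED": [
        ("Microorganism", "broaderThan"),
        ("Enterovirus", "broaderThan"),
    ],
}


def related_terms(sources):
    """Group the terms from all sources by their relation to the query concept."""
    grouped = defaultdict(list)
    for source, rows in sources.items():
        for term, relation in rows:
            grouped[relation].append((term, source))
    return dict(grouped)


merged = related_terms(SOURCES)
```

The merge needs no agreement between sources about what kind of thing each term is, which is why no strong ontological distinctions have to be made.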


Disadvantage of this approach

Trade off: Lose the inferential power when querying a knowledge resource

Inability to do inconsistency checking

Potentially large redundancy in our knowledge base

Maintenance and scalability (>1000000 concepts) - especially for dynamic hyper-linking.

Unwanted concepts and relationships, especially from OWL conversion, e.g. ‘Physical Entity’, ‘Continuant’, etc.

Linking overload!


Plug for Manchester’s SKOS plug-in - Protégé 4

• Instance hierarchy viewer

• OBO or OWL --> SKOS wizards

• Various rendering options


Conclusion

We need a large knowledge artefact that supports navigation between related web resources

Rapid generation and reuse of existing terminologies (cheap)

Loosening the semantics of our model enables this with an acceptable trade-off

SKOS is a suitable model to represent our knowledge artefact

We have strong use cases from the life sciences - demos available

Still some open issues ….


Acknowledgments

Manchester

Robert Stevens

Sean Bechhofer

Yeliz Yesilada

NeLI

Patty Kostkova

Other

COHSE developers

Sealife project

Thank you.