
Shared Representation and Community Participation in Standards (including vocabularies) for Human and Machine Interoperability

A Cooperative Effort Between the NCBO, NCRI and NCI CBIIT

Stuart Turner, DVM, MS, Leafpath

Architecture/VCDE Joint Face-to-Face

Friday 4 June 2010 | St. Louis, Missouri


Participants

- Mark Musen
- Natasha Noy
- Trish Whetzel

National (U.S.) Center for Biomedical Ontology (NCBO)

bioontology.org

http://bioontology.org/


Participants

- Alan Hogg
- Stuart Bell

National (U.K.) Cancer Research Institute (NCRI)

Oncology Information Exchange (ONIX) & the Cancer InfoMatrix (CIM)

ncri-onix.org.uk


Participants

- Brian Davis
- Sherri De Coronado
- Robert Freimuth
- Richard Kiefer
- Hua Min
- Michael Riben
- Harold Solbrig
- Grace Stafford
- Larry Wright

National (U.S.) Cancer Institute (NCI)

Center for Biomedical Informatics and Information Technology (CBIIT)


Standards Representation Group

Objectives

1. Identify common features in a shared ontology profile that supports discovery, understanding, aggregation, annotation, rating and evaluation.

2. Avoid or limit the creation of something new.

3. Take a phased and pragmatic approach: vocabularies first, then standards, specifications, projects, people, artifacts, etc.

4. Identify methods for discourse, rating, clustering, certifying, etc.

5. Identify methods for federation and synchronization.

• Satisfy needs for certification, engaging a diverse community, broader entity representation and ultimately a “web of trust”.

1. White paper

2. An implementation


Standards Representation Group

Approach

1. Gather requirements from the three organizations.

2. Identify common features of interest, as well as features where there may be discordance but that are of high value to an individual organization.

3. Identify existing active metadata models, including those that overlap and are complementary.

• Identify methods for discourse, rating, clustering, etc.

• Identify methods

Where

https://wiki.nci.nih.gov/x/mkNyAQ


NCBO

About*

1. One of three National Centers for Biomedical Computing launched by NIH in 2005

2. Collaboration of Stanford, Mayo, Buffalo, Washington University, Johns Hopkins, and the Medical College of Wisconsin

3. Primary goal is to make ontologies accessible and usable

4. Research will develop technologies for ontology dissemination, use, indexing, alignment, and peer review

Key Activities*

1. Creates and maintains a library of biomedical ontologies.

2. Builds tools and Web services to enable the use of ontologies.

3. Collaborates with scientific communities that develop and use ontologies.

*Adapted from Mark Musen’s presentation to VCDE, 2009


NCBO

Biomedical Resource Ontology



NCBO

Notes in BioPortal



NCRI - ONIX

About

Partnership of more than 20 organizations

Goals: Promoting data sharing, describing relevant standards and forming alliances

Projects: Cancer InfoMatrix


NCRI - Cancer InfoMatrix

Illustration: A view of the Cancer InfoMatrix showing Ontologies matched to Clinical and highlighting CTCAE (Common Terminology Criteria for Adverse Events)


NCRI - Cancer InfoMatrix

Illustration: A view of the Cancer InfoMatrix showing Ontologies matched to Clinical and showing details (metadata) for CTCAE (Common Terminology Criteria for Adverse Events)


caBIG Vocabulary Reviews

Evolving class of certified vocabularies

Certification is via formal consensus review (Modified Delphi) using ~105 evaluation criteria grouped logically into four categories (structure, content, documentation and editorial/governance)

Evaluation criteria based on best-practices derived principally from healthcare community including Jim Cimino’s Desiderata

Two principal outcomes

1. Environment-agnostic benchmark of the merits of a vocabulary

2. The benchmark is in turn used as a certification vehicle, yielding a more specific measure of a vocabulary's fitness for purpose within the caBIG enterprise
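The environment-agnostic benchmark could be tabulated per category roughly as follows. This is a minimal sketch, not the caBIG review tooling: only the four category names come from the slide, while the criterion IDs and outcomes are invented placeholders, and reducing each criterion to a satisfied/unsatisfied flag is an assumption (the real review is a Modified Delphi consensus process).

```python
from collections import defaultdict

# Category names from the slide; criteria and outcomes below are illustrative only.
CATEGORIES = ("structure", "content", "documentation", "editorial/governance")


def benchmark(results):
    """Return the fraction of criteria satisfied in each category.

    `results` is an iterable of (criterion_id, category, satisfied) tuples.
    """
    met = defaultdict(int)
    total = defaultdict(int)
    for _criterion_id, category, satisfied in results:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        total[category] += 1
        met[category] += int(satisfied)
    # Only report categories that actually had criteria evaluated.
    return {c: met[c] / total[c] for c in CATEGORIES if total[c]}


# Hypothetical review results for one vocabulary.
results = [
    ("S1", "structure", True),
    ("S2", "structure", False),
    ("C1", "content", True),
    ("D1", "documentation", True),
    ("D2", "documentation", True),
    ("G1", "editorial/governance", False),
]
print(benchmark(results))
```

A per-category profile like this keeps the benchmark independent of any one environment; a separate step (such as the certification scheme discussed later) would interpret it for caBIG.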

Reviews performed since 2005: NCI Thesaurus, Gene Ontology, CTCAE v3.0, LOINC, SNOMED CT, RadLex, Nanoparticle Ontology, MedDRA, CTCAE v4.0, ICD-9-CM (pending), ICD-10 (pending)

Ref: Cimino, J.J., et al., The caBIG terminology review process. J Biomed Inform, 2008.


caBIG Vocabulary Reviews

The process continues to evolve and be refined

Example: Discrete literature (peer and grey) review

Augments the normative documentation and communication with a vocabulary representative

Attenuates inherent gaps in knowledge or understanding, as well as bias

Example criteria statements: “Is the terminology evolving to maintain domain coverage?”, “Is there a process for review by independent experts from the field in which the terminology will be used?” or “Is there nothing controversial about the terminology that should be considered?”

Illustration: A view of a vocabulary review subprocess showing discrete activities for literature, tooling, regulatory and use case reviews.


Vocabulary Reviews: Challenges

1. Resolving a certification classification scheme that fairly and consistently abstracts the recommended usage of a vocabulary in caBIG

- Common issues to date: narrative definitions (absent, incomplete or inadequate, e.g. SNOMED CT) and limited governance

- Expecting any vocabulary to survive all criteria unscathed is a tall order

- Pass/Fail doesn’t work (insufficient and often inappropriate)

- A fully certified, partially certified, uncertified scheme is more approachable

- Partially certified requires qualifying guidance statements (e.g. “for use in value domains only”)

2. Questionable utility of reviews

- Monolithic reports

- Too terminology-centric

- Insufficient perspectives for different users (“at-a-glance” vs. “in-depth”)

- Rapidly obsolete

- Time-consuming, costly, not updated

- Limited community and use-case information

- Unable to aggregate or cluster information (usage or concept domains)
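A three-tier scheme with qualifying guidance can be expressed as a small rule. The sketch below is hypothetical: the `classify` function, the criterion labels and the rule that any unmet "critical" criterion yields "uncertified" are assumptions made for illustration; only the tier names and the guidance wording come from the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class CriterionResult:
    criterion: str  # hypothetical criterion label
    satisfied: bool
    # Usage restriction to attach if this criterion is unmet,
    # e.g. "for use in value domains only" (wording taken from the slide).
    guidance_if_unmet: Optional[str] = None


@dataclass
class Certification:
    level: str
    guidance: List[str] = field(default_factory=list)


def classify(results: List[CriterionResult], critical: Set[str]) -> Certification:
    """Hypothetical three-tier rule (not the caBIG policy):
    - fully certified: every criterion is satisfied;
    - uncertified: at least one criterion named in `critical` is unmet;
    - partially certified: otherwise, with qualifying guidance statements.
    """
    unmet = [r for r in results if not r.satisfied]
    if not unmet:
        return Certification("fully certified")
    if any(r.criterion in critical for r in unmet):
        return Certification("uncertified")
    guidance = [r.guidance_if_unmet for r in unmet if r.guidance_if_unmet]
    return Certification("partially certified", guidance)


results = [
    CriterionResult("textual definitions present", False,
                    "for use in value domains only"),
    CriterionResult("governance process documented", True),
]
cert = classify(results, critical={"governance process documented"})
print(cert.level, cert.guidance)
```

Unlike a pass/fail flag, the partial tier carries its guidance statements with it, which is the property the slide asks for.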


Cochrane Library Style Summaries


Profiles: Candidate Metadata Models

1. Ontology Metadata Vocabulary (OMV | Consortium): Human-readable and comprehensive

2. Terminology Metadata Model (TMM | CBIIT): Includes certification-related attributes important to CBIIT

3. Common Terminology Services 2 (CTS2): Especially important for value domains/value sets, discovery, localizations, machine interoperability

4. Ontology Definition Metamodel (ODM | OMG): Broad coverage, use cases (clustering), lifecycle, engineering (tools), DL and CL, RDFS, Topic Maps

5. Metamodel for Ontology Registration (ISO 19763-3): Ontology registries and tracking evolution, machine interoperability

6. Open Provenance Model (OPM): Complements other models to describe entities (agents), processes and artifacts. Good fit for describing “there”.

7. Dublin Core Metadata: Document- and provenance-centric. Often included in other models

8. Friend of a Friend (FOAF): Important for social integration, including user ratings (also Expertise Ontology, KSAs)

9. Description of a Project (DOAP): Matching ontologies and standards to projects. NCBO has added project metadata to OMV


Example of one issue and resolution

Issue

Identify a metamodel that captures the salient and common (NCBO, NCRI, CBIIT) attributes to describe and share ontologies (now) as well as standards, projects, people, artifacts (future).

Resolution

1. Gather requirements and rank them

2. Review candidate metamodels

3. Match requirements to features of extant metamodels

4. Resolve to a single model (if possible)
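Steps 1 to 3 can be sketched as a weighted coverage score. In this hypothetical example the requirement names, weights and the feature sets of "Model A/B/C" are placeholders: they are not the group's decision matrix and do not describe OMV, TMM, CTS2 or any other real metamodel. Listing the uncovered requirements hints at where step 4 might need extensions.

```python
# Hypothetical ranked requirements (higher weight = higher rank).
requirements = {
    "versioning": 3,
    "certification attributes": 2,
    "value set support": 2,
    "people and projects": 1,
}

# Hypothetical feature coverage of anonymous candidate metamodels.
candidates = {
    "Model A": {"versioning", "people and projects"},
    "Model B": {"versioning", "certification attributes", "value set support"},
    "Model C": {"value set support"},
}


def score(features, reqs):
    """Sum the weights of the requirements that a metamodel's features cover."""
    return sum(weight for req, weight in reqs.items() if req in features)


def rank(cands, reqs):
    """Return (name, score, uncovered requirements), best score first."""
    rows = [
        (name, score(features, reqs), sorted(set(reqs) - features))
        for name, features in cands.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


for name, total, missing in rank(candidates, requirements):
    print(name, total, missing)
```

A simple additive score keeps the ranking transparent; ties or near-ties would still need the group's judgment in step 4.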

Progress

Current focus is NCBO’s use and extensions of the Ontology Metadata Vocabulary.



Illustration: Focused view of the Decision Matrix used by the group to rank requirements (on NCI Wiki)



Illustration: Focused view of annotating features of the Ontology Metadata Vocabulary (on NCI Wiki)



Courtesy: Natasha Noy (NCBO)

• The main class, OMV:Ontology, represents metadata about a version of an ontology



Courtesy: Natasha Noy (NCBO)

Some OMV properties describing an ontology (properties on OMV:Ontology):

• OMV:acronym
• OMV:name
• OMV:URI
• OMV:naturalLanguage
• OMV:creationDate
• OMV:modificationDate
• OMV:description
• OMV:designedForOntologyTask
• OMV:documentation
• OMV:endorsedBy
• OMV:hasContributor
• OMV:hasCreator
• OMV:hasDomain
• OMV:status
• OMV:containsABox, OMV:containsTBox
• OMV:expressiveness
• OMV:hasFormalityLevel
• OMV:hasLicense
• OMV:keywords
• OMV:keyClasses
• OMV:knownUsage
• OMV:isOfType
• OMV:usedOntologyEngineeringTool
• OMV:usedKnowledgeRepresentationParadigm
• OMV:numberOfClasses
• OMV:numberOfIndividuals
• OMV:numberOfAxioms
• OMV:numberOfProperties
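A metadata record using a handful of these properties might look like the following. The dataclass, its snake_case field names, the Python types and the `to_omv` helper are assumptions for illustration; only the `OMV:` property names come from the list above, and the example values are placeholders.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

# Python attribute name -> OMV property name (property names from the slide).
OMV_NAMES = {
    "acronym": "OMV:acronym",
    "name": "OMV:name",
    "uri": "OMV:URI",
    "natural_language": "OMV:naturalLanguage",
    "creation_date": "OMV:creationDate",
    "has_domain": "OMV:hasDomain",
    "status": "OMV:status",
    "keywords": "OMV:keywords",
    "number_of_classes": "OMV:numberOfClasses",
}


@dataclass
class OntologyMetadata:
    """Illustrative record for one version of an ontology (cf. OMV:Ontology)."""
    acronym: str
    name: str
    uri: str
    natural_language: Optional[str] = None
    creation_date: Optional[date] = None
    has_domain: List[str] = field(default_factory=list)
    status: Optional[str] = None
    keywords: List[str] = field(default_factory=list)
    number_of_classes: Optional[int] = None

    def to_omv(self) -> dict:
        """Return populated fields keyed by OMV property name; dates become ISO strings."""
        out = {}
        for attr, prop in OMV_NAMES.items():
            value = getattr(self, attr)
            if value is None or value == []:
                continue  # skip unset properties
            out[prop] = value.isoformat() if isinstance(value, date) else value
        return out


record = OntologyMetadata(
    acronym="EX",
    name="Example Ontology",
    uri="http://example.org/ontology/ex",
    creation_date=date(2010, 6, 4),
    keywords=["example"],
)
print(record.to_omv())
```

Keeping the mapping to OMV names in one table makes it straightforward to add further properties (or extension properties) without changing the serialization logic.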



Adapted from slides by Natasha Noy (NCBO)

Properties of OMV:Ontology that were added by NCBO:

• administeredBy
• hasContactEmail
• hasContactName
• uploadDate
• id
• internalVersionNumber
• preferredNameProperty
• synonymProperty
• documentationProperty
• authorProperty
• codingScheme
• fileNames
• filePath
• hasView
• isVersionOfVirtualOntology


Virtual Ontology


• Needed a container for all the versions of the same ontology (e.g., to be able to provide an id that resolves to the latest version)
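The virtual-ontology container can be sketched as follows. This hypothetical example assumes that "latest" means the highest internal version number and uses invented ids; the field names only echo the NCBO extension properties `internalVersionNumber` and `isVersionOfVirtualOntology`, and this is not BioPortal's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class OntologyVersion:
    version_id: int               # hypothetical id of one concrete version
    internal_version_number: int  # echoes NCBO's internalVersionNumber
    virtual_ontology_id: int      # echoes NCBO's isVersionOfVirtualOntology


@dataclass
class VirtualOntology:
    """Container for every version of one ontology, addressed by a stable id."""
    virtual_id: int
    versions: Dict[int, OntologyVersion] = field(default_factory=dict)

    def add(self, version: OntologyVersion) -> None:
        if version.virtual_ontology_id != self.virtual_id:
            raise ValueError("version belongs to a different virtual ontology")
        self.versions[version.version_id] = version

    def latest(self) -> OntologyVersion:
        """Resolve the stable id to the version with the highest internal number."""
        if not self.versions:
            raise LookupError("no versions registered")
        return max(self.versions.values(),
                   key=lambda v: v.internal_version_number)


vo = VirtualOntology(virtual_id=1000)
vo.add(OntologyVersion(version_id=40001, internal_version_number=1, virtual_ontology_id=1000))
vo.add(OntologyVersion(version_id=40002, internal_version_number=2, virtual_ontology_id=1000))
print(vo.latest().version_id)
```

A client that stores only the virtual id keeps resolving to the newest version as further versions are added, while each concrete version keeps its own id for exact citation.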


Other Classes


• Project – describing ontology-based projects

• View, VirtualView – handling ontology views and subsets

• BioPortalUser (subclass of OMV:Person)


Where are we?


• Moving towards OMV (core) and extensions (provenance and workflow), including those added by NCBO

• NCBO’s model may become the putative model

• Next: Evaluate ratings and rankings systems (e.g. Amazon-style)

• Next: Evaluate methods for federated exchange, synchronization of profiles and community participation

• Next: Proposed phased roll-out and implementation


SAIF and ECCF effects on this process

• Updated ontology profiles are a better fit for our emerging agile and iterative environment

• Aggregation or clustering of ontology information more useful for describing concept domains, regulatory domains, ontology tasks, etc.

• Inclusion of community and “grass roots” participation more useful to discovery of relevant use cases, education and adoption (“web of trust”, evaluation by peers)

• Vocabulary review criteria have potential to be used as a self-assessment tool to “pre-certify”. Criteria may be used as conformance statements.

• The review process and criteria have already proven to help guide ontology development (e.g. CTCAE version 4.0)

• Provide sufficient detail (granularity, context) to assist usage and adoption at various levels in the implementation stack

• Community participation (experiential) important to mitigate presumptions about interoperability (e.g. semantic drift or change)


Conclusions & Recommendations

• Ontology evaluation should include formal evaluations, self-evaluations, community reviews and case studies, as well as methods for aggregation and for viewing varying perspectives; it should maintain currency, context, granularity and a web of trust

• Identify relevant metadata for ontologies that is also reusable for other entities, processes and artifacts (e.g. other standards, non-standards, people, projects, etc.)

• Be pragmatic (implement now/soon) yet prescient (have sufficient foresight), or “don’t repeat yourself” (DRY)


Questions?
