1
Shared Representation and Community Participation in
Standards (including vocabularies) for Human
and Machine Interoperability
A Cooperative Effort Between the NCBO, NCRI and NCI CBIIT
Stuart Turner, DVM, MS
Leafpath [email protected]
Architecture/VCDE Joint Face-to-Face
Friday 4 June 2010 | St. Louis, Missouri
2
Participants
- Mark Musen
- Natasha Noy
- Trish Whetzel
National (U.S.) Center for Biomedical Ontology (NCBO)
bioontology.org
3
Participants
- Alan Hogg
- Stuart Bell
National (U.K.) Cancer Research Institute (NCRI)
Oncology Information Exchange (ONIX) & the Cancer InfoMatrix (CIM)
ncri-onix.org.uk
4
Participants
- Brian Davis
- Sherri De Coronado
- Robert Freimuth
- Richard Kiefer
- Hua Min
- Michael Riben
- Harold Solbrig
- Grace Stafford
- Larry Wright
National (U.S.) Cancer Institute (NCI)
Center for Biomedical Informatics and Information Technology (CBIIT)
5
Standards Representation Group
Objectives
1. Identify common features in a shared profile for ontologies for discovery, understanding, aggregation, annotation, rating and evaluation.
2. Avoid or limit the creation of something new.
3. A phased and pragmatic approach. Vocabularies first, then standards, specifications, projects, people, artifacts, etc.
4. Identify methods for discourse, rating, clustering, certifying, etc.
5. Identify methods for federation and synchronization.
• Satisfy needs for certification, engaging a diverse community, broader entity representation and ultimately a “web of trust”.
1. White paper
2. An implementation
6
Standards Representation Group
Approach
1. Gather requirements from the three organizations.
2. Identify common features of interest, as well as features on which the organizations disagree but which are of high value to an individual organization.
3. Identify existing active metadata models, including those that overlap and are complementary.
• Identify methods for discourse, rating, clustering, etc.
• Identify methods
Where
https://wiki.nci.nih.gov/x/mkNyAQ
7
NCBO
About*
1. One of three National Centers for Biomedical Computing launched by NIH in 2005
2. Collaboration of Stanford, Mayo, Buffalo, Washington University, Johns Hopkins, and the Medical College of Wisconsin
3. Primary goal is to make ontologies accessible and usable
4. Research will develop technologies for ontology dissemination, use, indexing, alignment, and peer review
Key Activities*
1. Creates and maintains a library of biomedical ontologies.
2. Builds tools and Web services to enable the use of ontologies.
3. Collaborates with scientific communities that develop and use ontologies.
*Adapted from Mark Musen’s presentation to VCDE, 2009
8
NCBO
Biomedical Resource Ontology
*Adapted from Mark Musen’s presentation to VCDE, 2009
9
NCBO
Notes in BioPortal
*Adapted from Mark Musen’s presentation to VCDE, 2009
10
NCRI - ONIX
About
Partnership of more than 20 organizations
Goals: Promote data sharing, describing relevant standards, forming alliances
Projects: Cancer InfoMatrix
11
NCRI - Cancer InfoMatrix
Illustration: A view of the Cancer InfoMatrix showing Ontologies matched to Clinical and highlighting CTCAE (Common Terminology Criteria for Adverse Events)
12
NCRI - Cancer InfoMatrix
Illustration: A view of the Cancer InfoMatrix showing Ontologies matched to Clinical and showing details (metadata) for CTCAE (Common Terminology Criteria for Adverse Events)
13
caBIG Vocabulary Reviews
Evolving class of certified vocabularies
Certification is via formal consensus review (Modified Delphi) using ~ 105 evaluation criteria grouped logically into four categories (structure, content, documentation and editorial/governance)
Evaluation criteria based on best-practices derived principally from healthcare community including Jim Cimino’s Desiderata
Two principal outcomes
1. Environment agnostic benchmark of the merits of a vocabulary
2. Benchmark in turn is used as a certification vehicle - yields a more specific measure of the fit-for-purpose of a vocabulary within the caBIG enterprise
Reviews performed since 2005: NCI Thesaurus, Gene Ontology, CTCAE v3.0, LOINC, SNOMED CT, RadLex, Nanoparticle Ontology, MedDRA, CTCAE v4.0, ICD-9-CM (pending), ICD-10 (pending)
Ref: Cimino, J.J., et al., The caBIG terminology review process. J Biomed Inform, 2008.
14
caBIG Vocabulary Reviews
Process continues to evolve, be refined
Example: Discrete literature (peer and grey) review
Augments the normative documentation and communication with a vocabulary representative
Attenuates any inherent gaps in knowledge or understanding and bias
Criteria statements: “is the terminology evolving to maintain domain coverage?”, “is there a process for review by independent experts from the field in which the terminology will be used?”, or “is there nothing controversial about the terminology that should be considered?”
Illustration: A view of a vocabulary review subprocess showing discrete activities for literature, tooling, regulatory and use case reviews.
15
Vocabulary Reviews: Challenges
1. Resolving a certification classification scheme that fairly and consistently abstracts the recommended usage of a vocabulary in caBIG
- Common issues to date: narrative (absent, incomplete, inadequate) definitions (e.g. SNOMED CT) and limited governance
- Expecting any vocabulary to survive all criteria unscathed is a tall order
- Pass/Fail doesn’t work (insufficient and often inappropriate)
- A fully certified, partially certified, uncertified scheme is more approachable
- Partially certified requires qualifying guidance statements (e.g. “for use in value domains only”)
2. Questionable utility of reviews
- Monolithic reports
- Too terminology-centric
- Insufficient perspectives for different users
- “At-a-glance” vs. “in-depth”
- Rapidly obsolete
- Time-consuming, costly, not updated
- Limited community, use-case information
- Unable to aggregate or cluster information (usage or concept domains)
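The three-level scheme named in item 1 could be sketched as a small data structure. This is only an illustration in Python: the class and function names are hypothetical, "ExampleVocab" is a placeholder rather than a reviewed vocabulary, and the rule that a partial certification must carry at least one guidance statement is an assumption drawn from the bullet above, not a documented caBIG rule.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class CertificationStatus(Enum):
    """The three outcomes proposed in place of pass/fail."""
    FULLY_CERTIFIED = "fully certified"
    PARTIALLY_CERTIFIED = "partially certified"
    UNCERTIFIED = "uncertified"


@dataclass
class ReviewOutcome:
    """Hypothetical record of one vocabulary review."""
    vocabulary: str
    status: CertificationStatus
    # Qualifying guidance, e.g. "for use in value domains only".
    guidance: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Assumption: a partial certification is only useful with guidance.
        partial = self.status is CertificationStatus.PARTIALLY_CERTIFIED
        if partial and not self.guidance:
            raise ValueError("partially certified outcomes need guidance")


def summarize(outcome: ReviewOutcome) -> str:
    """Build a one-line "at-a-glance" summary of a review outcome."""
    line = f"{outcome.vocabulary}: {outcome.status.value}"
    if outcome.guidance:
        line += " (" + "; ".join(outcome.guidance) + ")"
    return line


example = ReviewOutcome(
    "ExampleVocab",  # placeholder name, not a real review result
    CertificationStatus.PARTIALLY_CERTIFIED,
    ["for use in value domains only"],
)
print(summarize(example))
```

Keeping the guidance statements as structured data rather than report prose is one way the "at-a-glance" and "in-depth" views could share a single source.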
16
Cochrane Library Style Summaries
17
Vocabulary Reviews: Challenges
18
Vocabulary Reviews: Challenges
19
Profiles: Candidate Metadata Models
1. Ontology Metadata Vocabulary (OMV | Consortium): Human readable and comprehensive
2. Terminology Metadata Model (TMM | CBIIT): Includes certification-related attributes important to CBIIT
3. Common Terminology Services 2 (CTS2): Especially important for value domains/value sets, discovery, localizations, machine interoperability
4. Ontology Definition Metamodel (ODM | OMG): Broad coverage, use cases (clustering), lifecycle, engineering (tools), DL and CL, RDFS, Topic Maps
5. Metamodel for Ontology Registration (ISO 19763-3): Ontology registries and tracking evolution, machine interoperability
6. Open Provenance Model (OPM): Complements other models to describe entities (agents), processes and artifacts. Good fit for describing “there”.
7. Dublin Core Metadata: Document and provenance centric. Often included in other models
8. Friend of a Friend (FOAF): Important for social integration, including user ratings (also Expertise Ontology, KSAs)
9. Description of a Project (DOAP): Matching ontologies and standards to projects. NCBO has added project metadata to OMV
20
Example of one issue and resolution
Issue
Identify a metamodel that captures the salient and common (NCBO, NCRI, CBIIT) attributes to describe and share ontologies (now) as well as standards, projects, people, artifacts (future).
Resolution
1. Gather requirements and rank them
2. Review candidate metamodels
3. Match requirements to features of extant metamodels
4. Resolve to a single model (if possible)
Progress
Current focus is NCBO’s use and extensions of the Ontology Metadata Vocabulary.
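Steps 1 to 3 of the resolution (rank requirements, then match them to features of candidate metamodels) can be sketched as a weighted coverage score. The requirement names, weights and the models "ModelA" and "ModelB" are invented for illustration. They do not reproduce the group's actual decision matrix or the feature sets of any real metamodel.

```python
def coverage_scores(requirements, metamodels):
    """Return the weighted share of requirements each metamodel covers.

    requirements: dict mapping requirement name -> rank weight (higher = more important)
    metamodels:   dict mapping metamodel name -> set of requirement names it supports
    """
    total = sum(requirements.values())
    if total <= 0:
        raise ValueError("requirement weights must sum to a positive number")
    scores = {}
    for model, features in metamodels.items():
        covered = sum(w for req, w in requirements.items() if req in features)
        scores[model] = covered / total
    return scores


def rank_models(scores):
    """Order metamodel names by descending score, ties broken by name."""
    return sorted(scores, key=lambda name: (-scores[name], name))


# Hypothetical inputs, for illustration only.
requirements = {"versioning": 3, "licensing": 2, "certification": 1}
metamodels = {
    "ModelA": {"versioning", "licensing"},
    "ModelB": {"certification"},
}

scores = coverage_scores(requirements, metamodels)
print(rank_models(scores))
```

A score like this only shows which candidate covers the most highly ranked requirements. Resolving to a single model (step 4) would still be a judgment about how the uncovered requirements get handled, for example through extensions.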
21
Example of one issue and resolution
Illustration: Focused view of the Decision Matrix used by group to rank requirements (on NCI Wiki)
22
Example of one issue and resolution
Illustration: Focused view of annotating features of the Ontology Metadata Vocabulary (on NCI Wiki)
23
Example of one issue and resolution
Courtesy: Natasha Noy (NCBO)
• The main class: OMV:Ontology
  – Represents metadata about a version of an ontology
24
Example of one issue and resolution
Courtesy: Natasha Noy (NCBO)
Some OMV properties describing an ontology (properties on OMV:Ontology)
• OMV:acronym
• OMV:name
• OMV:URI
• OMV:naturalLanguage
• OMV:creationDate
• OMV:modificationDate
• OMV:description
• OMV:designedForOntologyTask
• OMV:documentation
• OMV:endorsedBy
• OMV:hasContributor
• OMV:hasCreator
• OMV:hasDomain
• OMV:status
• OMV:containsABox, OMV:containsTBox
• OMV:expressiveness
• OMV:hasFormalityLevel
• OMV:hasLicense
• OMV:keywords
• OMV:keyClasses
• OMV:knownUsage
• OMV:isOfType
• OMV:usedOntologyEngineeringTool
• OMV:usedKnowledgeRepresentationParadigm
• OMV:numberOfClasses
• OMV:numberOfIndividuals
• OMV:numberOfAxioms
• OMV:numberOfProperties
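To make the property list concrete, here is a minimal Python sketch of a metadata record that carries a handful of the properties above. The field names follow the slide. The Python types, the defaults, the `to_omv` helper and the example values (an "Example Ontology" at example.org) are assumptions for illustration and are not taken from the OMV specification.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class OntologyMetadata:
    """Metadata about one version of an ontology (subset of OMV:Ontology)."""
    acronym: str
    name: str
    URI: str
    description: str = ""
    naturalLanguage: str = ""
    status: str = ""
    hasLicense: str = ""
    keywords: List[str] = field(default_factory=list)
    numberOfClasses: Optional[int] = None

    def to_omv(self) -> dict:
        """Return only the populated properties, keyed as "OMV:<property>"."""
        out = {}
        for f in fields(self):
            value = getattr(self, f.name)
            # Skip unset values; a count of 0 is kept because it is informative.
            if value not in ("", None, []):
                out[f"OMV:{f.name}"] = value
        return out


# Hypothetical record, for illustration only.
record = OntologyMetadata(
    acronym="EXO",
    name="Example Ontology",
    URI="http://example.org/exo",
    keywords=["demo"],
)
print(record.to_omv())
```

Emitting only populated properties keeps sparse profiles compact, which matters when different organizations fill in different subsets of the model.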
25
Example of one issue and resolution
Adapted from slides by Natasha Noy (NCBO)
Properties of OMV:Ontology that were added by NCBO
• administeredBy
• hasContactEmail
• hasContactName
• uploadDate
• id
• internalVersionNumber
• preferredNameProperty
• synonymProperty
• documentationProperty
• authorProperty
• codingScheme
• fileNames
• filePath
• hasView
• isVersionOfVirtualOntology
27
Virtual Ontology
Adapted from slides by Natasha Noy (NCBO)
• Needed a container for all the versions of the same ontology (e.g., to be able to provide an id that resolves to the latest version)
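The container idea can be sketched in a few lines of Python. The class names, the in-memory registry and the rule that "latest" means the highest internal version number are assumptions made for this illustration. They do not describe how BioPortal implements virtual ontologies.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class OntologyVersion:
    """One concrete version of an ontology (hypothetical fields)."""
    internal_version_number: int
    file_path: str


@dataclass
class VirtualOntology:
    """Container for all versions of the same ontology."""
    virtual_id: str
    versions: List[OntologyVersion] = field(default_factory=list)

    def add_version(self, version: OntologyVersion) -> None:
        self.versions.append(version)

    def latest(self) -> OntologyVersion:
        # Assumption: the highest internal version number is the latest.
        if not self.versions:
            raise LookupError(f"no versions registered for {self.virtual_id}")
        return max(self.versions, key=lambda v: v.internal_version_number)


def resolve(registry: Dict[str, VirtualOntology], virtual_id: str) -> OntologyVersion:
    """Resolve a stable virtual id to the latest concrete version."""
    return registry[virtual_id].latest()


# Hypothetical data, for illustration only.
example = VirtualOntology("example-virtual-id")
example.add_version(OntologyVersion(1, "exo_v1.owl"))
example.add_version(OntologyVersion(3, "exo_v3.owl"))
example.add_version(OntologyVersion(2, "exo_v2.owl"))
registry = {example.virtual_id: example}
print(resolve(registry, "example-virtual-id").file_path)
```

The point of the indirection is that references to the virtual id stay valid as new versions are uploaded, while each version keeps its own metadata record.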
28
Other Classes
Adapted from slides by Natasha Noy (NCBO)
• Project – describing ontology-based projects
• View, VirtualView – handling ontology views and subsets
• BioPortalUser (subclass of OMV:Person)
29
Where are we?
Adapted from slides by Natasha Noy (NCBO)
• Moving towards OMV (core), and extensions (provenance and workflow), including those added by NCBO
• NCBO’s model may become the putative model
• Next: Evaluate ratings and rankings systems (e.g. Amazon style)
• Next: Evaluate methods for federated exchange, synchronization of profiles and community participation
• Next: Proposed phased roll-out, implementation
30
SAIF and ECCF effects on this process
• Updated ontology profiles are a better fit for our emerging agile and iterative environment
• Aggregation or clustering of ontology information more useful for describing concept domains, regulatory domains, ontology tasks, etc.
• Inclusion of community and “grass roots” participation more useful to discovery of relevant use cases, education and adoption (“web of trust”, evaluation by peers)
• Vocabulary review criteria have potential to be used as a self-assessment tool to “pre-certify”. Criteria may be used as conformance statements.
• The review process and criteria have already proven to help guide ontology development (i.e. CTCAE version 4.0)
• Provide sufficient detail (granularity, context) to assist usage and adoption at various levels in the implementation stack
• Community participation (experiential) important to mitigate presumptions about interoperability (e.g. semantic drift or change)
31
Conclusions & Recommendations
• Ontology evaluation should include formal evaluations, self-evaluations, community reviews and case studies, methods of aggregation, viewing varying perspectives and should maintain currency, context, granularity and a web-of-trust
• Identify relevant metadata for ontologies that is also reusable for other entities, processes and artifacts (e.g. other standards, non-standards, people, projects, etc.)
• Be pragmatic (implement now/soon) yet prescient (have sufficient foresight) or “don’t repeat yourself” (DRY)
32
Questions?