Post on 03-Jan-2016

The OBO Foundry

Barry Smith

1

History of Ontology as Computational Artifact

1970s: AI (based on FOL: McCarthy, Hayes)

1980s: KR, Knowledge Interchange Formats (Gruber, Hobbs ...)

1999: GO, OBO format (Ashburner, ...)

2000s: Semantic Web (based on OWL; Horrocks, Hendler, 1000 lite ontologies)

2009: Reconciliation of OBO with OWL; but still 2 methodologies: OBO Foundry; NCBO Bioportal

2

Ontology and the Semantic Web

• html demonstrated the power of the Web to allow sharing of information

• can we use semantic technology to create a Web 2.0 which would allow algorithmic reasoning with online information based on XML, RDF and above all OWL (Web Ontology Language)?

• can we use RDF and OWL to break down silos, and create useful integration of on-line data and information?

3

people tried, but the more they were successful, the more they failed

OWL breaks down data silos via controlled vocabularies for the description of data dictionaries

Unfortunately the very success of this approach led to the creation of multiple, new, semantic silos – because multiple ontologies are being created in ad hoc ways

4

reasons for this effect

• Semantic Web (original) idea: if a million ‘lite ontologies bloom’, then somehow intelligence will be created

• let’s all build new ones (shrink-wrapped software mentality – you will not get paid for reusing existing ontologies)

• requirements-driven software development promotes forking, reduces potential for secondary uses

5

Ontology success stories, and some reasons for failure

A fragment of the “Linked Open Data” in the biomedical domain

6

What you get with ‘mappings’

HPO: all phenotypes (excess hair loss, duck feet ...)

7

What you get with ‘mappings’

HPO: all phenotypes (excess hair loss, duck feet ...)

NCIT: all organisms

8

What you get with ‘mappings’

all phenotypes (excess hair loss, duck feet)

all organisms

allose (a form of sugar)

9

What you get with ‘mappings’

all phenotypes (excess hair loss, duck feet)

all organisms

allose (a form of sugar)

Acute Lymphoblastic Leukemia (A.L.L.)
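
The build-up across these slides can be read as what a purely lexical mapping returns for the string "all". A minimal Python sketch of that failure mode, assuming a hypothetical `naive_map` helper and a toy label list taken from the slides (not real HPO or NCIT content):

```python
def naive_map(query, terms):
    """Return every label that starts with the query, ignoring case and dots."""
    def normalize(label):
        return label.lower().replace(".", "")

    q = normalize(query)
    return [t for t in terms if normalize(t).startswith(q)]


# Toy labels from the slides plus one unrelated control label (hypothetical).
terms = [
    "all phenotypes",
    "all organisms",
    "allose",
    "A.L.L.",
    "zebrafish fin",
]

# String matching cannot tell a quantifier, a sugar and a leukemia apart.
matches = naive_map("all", terms)
print(matches)
```

Each hit is lexically valid and semantically unrelated to the others, which is why such mappings need curation by hand.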

10

Mappings are hard

They are fragile, and expensive to maintain

Need new authorities to maintain (one for each pair of mapped ontologies), yielding new risk of forking – who will police the mappings?

The goal should be to minimize the need for mappings, by avoiding redundancy in the first place

Invest resources in disjoint ontology modules which work well together – reduce need for mappings to minimum possible
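
To see why "one for each pair of mapped ontologies" becomes expensive, the sketch below counts unordered pairs. It assumes the worst case in which every ontology is mapped to every other; `pairwise_mappings` is a name introduced here for illustration.

```python
from math import comb


def pairwise_mappings(n_ontologies):
    """Number of unordered ontology pairs, i.e. mappings to maintain if all are mapped."""
    return comb(n_ontologies, 2)


for n in (2, 10, 100):
    print(n, "ontologies ->", pairwise_mappings(n), "mappings")
```

Under that assumption the maintenance burden grows quadratically, while a set of disjoint modules needs no mappings between them at all.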

11

Why should you care?

• you need to create systems for data mining and text processing which will yield useful digitally coded output

• if the codes you use are constantly in need of ad hoc repair, huge resources will be wasted

• serious investment in annotation will be defeated from the start

• relevant data will not be found, because it will be lost in multiple semantic cemeteries

12

How to do it right?

• how to create an incremental, evolutionary process, where what is good survives, and what is bad fails

• where the number of ontologies needing to be linked is small

• where links are stable

• create a scenario in which people will find it profitable to reuse ontologies, terminologies and coding systems which have been tried and tested

13

Reasons why GO has been successful

It is a system for prospective standardization built with coherent top level but with content contributed and monitored by domain specialists

Based on community consensus

Updated every night

Clear versioning principles ensure backwards compatibility; prior annotations do not lose their value

Initially low-tech to encourage users, with movement to more powerful formal approaches (including OWL-DL – though GO community still recommending caution)
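
One way such backwards compatibility might be realized is to keep retired terms in place and point them at a successor, so that old annotations can still be resolved. The sketch below is an assumption about the mechanism, not GO's documented procedure: the `EX:` identifiers and the `resolve` helper are hypothetical, and the `is_obsolete` / `replaced_by` keys simply mirror the tag names used in OBO format.

```python
def resolve(term_id, terms, max_hops=10):
    """Follow replaced_by links from an obsolete term to its current successor."""
    current = term_id
    for _ in range(max_hops):
        record = terms[current]
        if not record.get("is_obsolete"):
            return current
        replacement = record.get("replaced_by")
        if replacement is None:
            return None  # obsolete with no successor: the annotation needs review
        current = replacement
    raise ValueError("replacement chain too long or cyclic")


# Hypothetical toy vocabulary; identifiers are never deleted or reused.
terms = {
    "EX:0000001": {"name": "old label", "is_obsolete": True, "replaced_by": "EX:0000002"},
    "EX:0000002": {"name": "current label"},
    "EX:0000003": {"name": "retired", "is_obsolete": True},
}

print(resolve("EX:0000001", terms))
```

An annotation made against `EX:0000001` still resolves, which is the sense in which prior annotations keep their value.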

14

GO has learned the lessons of successful cooperation

• Clear documentation

• The terms chosen are already familiar

• Fully open source (allows thorough testing in manifold combinations with other ontologies)

• Subjected to considerable third-party critique

• Tracker for user input with rapid turnaround and help desk

15

GO has been amazingly successful in overcoming the data balkanization problem

but it covers only generic biological entities of three sorts:

– cellular components

– molecular functions

– biological processes

no diseases, symptoms, disease biomarkers, protein interactions, experimental processes …

16

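[Figure: grid of ontologies. Rows: GRANULARITY. Columns: RELATION TO TIME (CONTINUANT, split into INDEPENDENT and DEPENDENT; OCCURRENT)]

ORGAN AND ORGANISM

• Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)

• Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)

• Occurrent: Biological Process (GO)

CELL AND CELLULAR COMPONENT

• Independent continuant: Cell (CL); Cellular Component (FMA, GO)

• Dependent continuant: Cellular Function (GO)

MOLECULE

• Independent continuant: Molecule (ChEBI, SO, RnaO, PrO)

• Dependent continuant: Molecular Function (GO)

• Occurrent: Molecular Process (GO)

OBO (Open Biomedical Ontology) Foundry proposal (Gene Ontology in yellow)

17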

[Figure: the granularity-by-time grid of the preceding slide, with one added label, "Environment Ontology", annotated "environments are here"]

18

[Figure: the same grid extended with a further granularity row; the ORGAN AND ORGANISM, CELL AND CELLULAR COMPONENT and MOLECULE rows are unchanged]

COMPLEX OF ORGANISMS

• Independent continuant: Family, Community, Deme, Population

• Dependent continuant: Population Phenotype

• Occurrent: Population Process

Population-level ontologies

19

Ontology success stories, and some reasons for failure

20

[Figure: the grid including the COMPLEX OF ORGANISMS row, repeated from the "Population-level ontologies" slide]

http://obofoundry.org

21

Developers commit to working to ensure that, for each domain, there is community convergence on a single ontology

and agree in advance to collaborate with developers of ontologies in adjacent domains.

http://obofoundry.org

The OBO Foundry: a step-by-step, evidence-based approach to expand the GO

22

OBO Foundry Principles

Common governance (coordinating editors)

Common training

Common architecture to overcome Tim Berners-Lee-ism:

• simple shared top level ontology

• shared Relation Ontology: www.obofoundry.org/ro

23

Open Biomedical Ontologies Foundry

Seeks to create high quality, validated terminology modules across all of the life sciences which will be

• one ontology for each domain, so no need for mappings

• close to language use of experts

• evidence-based

• incorporate a strategy for motivating potential developers and users

• revisable as science advances

24

Principles

http://obofoundry.org/wiki/index.php/OBO_FoundryPrinciples

25

Pistoia Alliance

Open standards for data and technology interfaces in the life science research industry

consortium of major pharmaceutical and life science companies

can we address the data silo problems created by multiplicity of proprietary terminologies by declaring terminology ‘pre-competitive’?

require shared use of something like OBO Foundry ontologies in presentation of information?

26

27

Virtual Physiological Human

28

Only with a prospective standard like that of the OBO Foundry could something like the VPH work

designed to guarantee interoperability of ontologies from the very start (and to keep out weeds)

initial set of 10 criteria tested in the annotation of

scientific literature

model organism databases

life science experimental results

29

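[Figure: grid of ontologies. Rows: GRANULARITY. Columns: RELATION TO TIME (CONTINUANT, split into INDEPENDENT and DEPENDENT; OCCURRENT)]

ORGAN AND ORGANISM

• Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)

• Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)

• Occurrent: Organism-Level Process (GO)

CELL AND CELLULAR COMPONENT

• Independent continuant: Cell (CL); Cellular Component (FMA, GO)

• Dependent continuant: Cellular Function (GO)

• Occurrent: Cellular Process (GO)

MOLECULE

• Independent continuant: Molecule (ChEBI, SO, RnaO, PrO)

• Dependent continuant: Molecular Function (GO)

• Occurrent: Molecular Process (GO)

OBO Foundry coverage

30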

ORTHOGONALITY

modularity ensures

• annotations can be additive

• division of labor amongst domain experts

• high value of training in any given module

• lessons learned in one module can benefit work on other modules

• incentivization of those responsible for individual modules
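
A rough way to test for orthogonality is to check that no two modules define the same term. The sketch below assumes each module is reduced to a set of labels; the module contents and the `LOCAL` module are toy, hypothetical data, and `overlapping_terms` is a name introduced here.

```python
from itertools import combinations


def overlapping_terms(modules):
    """Map each pair of module names to the labels that both modules define."""
    overlaps = {}
    for name_a, name_b in combinations(sorted(modules), 2):
        shared = modules[name_a] & modules[name_b]
        if shared:
            overlaps[(name_a, name_b)] = shared
    return overlaps


# Toy label sets; LOCAL stands in for an ad hoc ontology that redefines a cell type.
modules = {
    "CL": {"neuron", "hepatocyte"},
    "GO": {"mitochondrion", "cell cycle"},
    "LOCAL": {"neuron", "sample tube"},
}

print(overlapping_terms(modules))
```

An empty result is what lets annotations from different modules be combined additively; every reported overlap is a place where a mapping would otherwise be needed.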

31

Benefits of coordination

• Can more easily reuse what is made by others

• Can more easily inspect and criticize what is made by others

• Leads to innovations (e.g. MIREOT strategy for importing terms into ontologies)
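
MIREOT (Minimum Information to Reference an External Ontology Term) records, as far as the guideline is understood here, three things per imported term: the source ontology, the source term, and the superclass it should sit under in the importing ontology. A minimal sketch, with hypothetical `example.org` URIs and an `import_term` helper invented for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalTermReference:
    source_ontology: str    # URI of the ontology the term comes from
    source_term: str        # URI of the term being imported
    target_superclass: str  # URI of its parent in the importing ontology


def import_term(ref, local_hierarchy):
    """Return a copy of a child -> parent map with the external term attached."""
    updated = dict(local_hierarchy)
    updated[ref.source_term] = ref.target_superclass
    return updated


# All URIs below are placeholders, not real ontology identifiers.
ref = ExternalTermReference(
    source_ontology="http://example.org/source.owl",
    source_term="http://example.org/source#Term_1",
    target_superclass="http://example.org/local#Parent",
)

hierarchy = import_term(ref, {})
print(hierarchy)
```

Only the referenced term is brought in, rather than the whole source ontology, which keeps the importing module small.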

32

8 Foundry members (2010)

CHEBI: Chemical Entities of Biological Interest

GO: Gene Ontology

PATO: Phenotypic Quality Ontology

PRO: Protein Ontology

XAO: Xenopus Anatomy Ontology

ZFA: Zebrafish Anatomy Ontology

33

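[Figure: grid of ontologies. Rows: GRANULARITY. Columns: RELATION TO TIME (CONTINUANT, split into INDEPENDENT and DEPENDENT; OCCURRENT)]

ORGAN AND ORGANISM

• Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)

• Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)

• Occurrent: Biological Process (GO)

• Also shown in this row: XAO, ZFA

CELL AND CELLULAR COMPONENT

• Independent continuant: Cell (CL); Cellular Component (FMA, GO)

• Dependent continuant: Cellular Function (GO)

MOLECULE

• Independent continuant: Molecule (SO, RnaO)

• Dependent continuant: Molecular Function (GO)

• Occurrent: Molecular Process (GO)

• Also shown in this row: ChEBI, PRO

Current Foundry members in yellow

34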

[Figure: the same grid, with prospective Foundry ontologies highlighted]

ORGAN AND ORGANISM

• Organism: NCBI Taxonomy

• CARO, FMA

• Organ Function (FMP, CPRO)

• Phenotypic Quality (PaTO)

• Biological Process (GO)

• XAO, ZFA

CELL AND CELLULAR COMPONENT

• Cell (CL)

• Cellular Component (FMA, GO)

• Cellular Function (GO)

MOLECULE

• SO, RnaO

• Molecular Function (GO)

• Molecular Process (GO)

• ChEBI, PRO

Prospective Foundry ontologies (in green):

• Foundational Model of Anatomy Ontology (FMA)

• Cell Ontology (CL)

• Sequence Ontology (SO)

• RNA Ontology (RnaO)

35

OBO Foundry Modular Organization

top level

• Basic Formal Ontology (BFO)

mid-level

• Information Artifact Ontology (IAO)

• Ontology for Biomedical Investigations (OBI)

• Ontology for General Medical Science (OGMS)

domain level

• Anatomy Ontology (FMA*, CARO)

• Environment Ontology (EnvO)

• Infectious Disease Ontology (IDO*)

• Biological Process Ontology (GO*)

• Cell Ontology (CL)

• Cellular Component Ontology (FMA*, GO*)

• Phenotypic Quality Ontology (PaTO)

• Subcellular Anatomy Ontology (SAO)

• Sequence Ontology (SO*)

• Molecular Function (GO*)

• Protein Ontology (PRO*)

36

Problem cases

Common Anatomy Reference Ontology

Disease Ontology

Function Ontologies

• Cellular Component Function

• Cellular Function

• Organ Function

• Artifact Function (pumping, transporting ...)

Environment Ontology

Species Ontology (NCBI Taxonomy)

37

IDO (Infectious Disease Ontology) Core

Follows GO strategy of providing a canonical ontology of what is involved in every infectious disease – host, pathogen, vector, virulence, vaccine, transmission – accompanied by IDO Extensions for specific diseases, pathogens and vectors

Provides common terminology resources and tested common guidelines for a vast array of different disease communities

38

IDO (Infectious Disease Ontology) Consortium

• MITRE, Mount Sinai, UTSouthwestern – Influenza

• IMBB/VectorBase – Vector borne diseases (A. gambiae, A. aegypti, I. scapularis, C. pipiens, P. humanus)

• Colorado State University – Dengue Fever

• Duke University – Tuberculosis, Staph. aureus

• Cleveland Clinic – Infective Endocarditis

• University of Michigan – Brucellosis

• Duke University, University at Buffalo – HIV

39

Ontology for General Medical Science

http://code.google.com/p/ogms/

(OBO) http://purl.obolibrary.org/obo/ogms.obo

(OWL) http://purl.obolibrary.org/obo/ogms.owl
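
The two OGMS addresses above share one pattern: a common base, the ontology's short name, and a format extension. A small sketch that builds such addresses, assuming the same pattern holds for other ontology names (the slide confirms it only for `ogms`); `ontology_purl` is a helper invented here.

```python
OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"


def ontology_purl(ontology_id, fmt="owl"):
    """Build a download address from an ontology short name and a format."""
    if fmt not in ("obo", "owl"):
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"{OBO_PURL_BASE}{ontology_id.lower()}.{fmt}"


print(ontology_purl("ogms", "obo"))
print(ontology_purl("ogms", "owl"))
```

Keeping the address stable and predictable means annotations can cite the ontology without depending on where the file is currently hosted.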

40

OGMS-based initiatives

Vital Signs Ontology (VSO) (Welch Allyn)

EHR / Demographics Ontology

Infectious Disease Ontology

Mental Health Ontology

Emotion Ontology

41

Ontology for General Medical Science

Jobst Landgrebe (then Co-Chair of the HL7 Vocabulary Group):

“the best ontology effort in the whole biomedical domain by far”

42

EXPERIMENTAL ARTIFACTS Ontology for Biomedical Investigations (OBI)

CLINICAL MEDICINE Ontology for General Medical Science (OGMS)

INFORMATION ARTIFACTS Information Artifact Ontology (IAO)

How to keep clear about the distinction

• processes of observation,

• results of such processes (measurement data)

• the entities observed

43
