Towards Common Upper Ontology
Barry Smith
http://ontology.buffalo.edu/smith
September 25, 2009
1
Overview
1. The Rise of Applied Ontology
2. The OBO Foundry
3. Basic Formal Ontology
4. How to Build an Ontology
5. What is a Disease?
2
Overview
1. The Rise of Applied Ontology
2. The OBO Foundry
3. Basic Formal Ontology
4. How to Build an Ontology
5. What is a Disease?
3
Uses of ‘ontology’ in PubMed abstracts
4
Year | Number of abstracts
2006 | 2260
2007 | 2968
2008 | 3236
5
By far the most successful: The Gene Ontology
MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPISKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRACALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESATSELMANHSVQTGRNIYGVDFSLTSVSGTTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDV
How to do biology across the genome?
7
MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPIPSKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRACALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESATSELMANHSVQTGRNIYGVDSFSLTSVSGTTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDVVAGEAASSNHHQKISRVTRKRPREPKSTNDILVAGQKLFGSSFEFRDLHQLRLCYEIYMADTPSVAVQAPPGYGKTELFHLPLIALASKGDVEYVSFLFVPYTVLLANCMIRLGRRGCLNVAPVRNFIEEGYDGVTDLYVGIYDDLASTNFTDRIAAWENIVECTFRTNNVKLGYLIVDEFHNFETEVYRQSQFGGITNLDFDAFEKAIFLSGTAPEAVADAALQRIGLTGLAKKSMDINELKRSEDLSRGLSSYPTRMFNLIKEKSEVPLGHVHKIRKKVESQPEEALKLLLALFESEPESKAIVVASTTNEVEELACSWRKYFRVVWIHGKLGAAEKVSRTKEFVTDGSMQVLIGTKLVTEGIDIKQLMMVIMLDNRLNIIELIQGVGRLRDGGLCYLLSRKNSWAARNRKGELPPKEGCITEQVREFYGLESKKGKKGQHVGCCGSRTDLSADTVELIERMDRLAEKQATASMSIVALPSSFQESNSSDRYRKYCSSDEDSNTCIHGSANASTNASTNAITTASTNVRTNATTNASTNATTNASTNASTNATTNASTNATTNSSTNATTTASTNVRTSATTTASINVRTSATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSATTTASINVRTSATTTESTNSSTSATTTASINVRTSATTTKSINSSTNATTTESTNSNTNATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSAATTESTNSNTSATTTESTNASAKEDANKDGNAEDNRFHPVTDINKESYKRKGSQMVLLERKKLKAQFPNTSENMNVLQFLGFRSDEIKHLFLYGIDIYFCPEGVFTQYGLCKGCQKMFELCVCWAGQKVSYRRIAWEALAVERMLRNDEEYKEYLEDIEPYHGDPVGYLKYFSVKRREIYSQIQRNYAWYLAITRRRETISVLDSTRGKQGSQVFRMSGRQIKELYFKVWSNLRESKTEVLQYFLNWDEKKCQEEWEAKDDTVVVEALEKGGVFQRLRSMTSAGLQGPQYVKLQFSRHHRQLRSRYELSLGMHLRDQIALGVTPSKVPHWTAFLSMLIGLFYNKTFRQKLEYLLEQISEVWLLPHWLDLANVEVLAADDTRVPLYMLMVAVHKELDSDDVPDGRFDILLCRDSSREVGE
8
what cellular component?
what molecular function?
what biological process?
9
GO used to tag database entries
MouseEcotope, GlyProt, DiabetInGene, GluChem
sphingolipid transporter activity
10
GO used to tag database entries
MouseEcotope, GlyProt, DiabetInGene, GluChem
Holliday junction helicase complex
11
GO used to tag database entries
MouseEcotope, GlyProt, DiabetInGene, GluChem
sphingolipid transporter activity
12
what cellular component?
what molecular function?
what biological process?
GO used in curation of literature
13
A new kind of scientific publishing
Biologist curators annotate experimental observations reported in the biomedical literature to link gene products (such as proteins) with GO terms
International Society of Biocurators http://www.biocurator.org/
14
15
16
17
Clark et al., 2005
part_of
converting journal articles into algorithmically processable artifacts
18
The logic of GO
OBO Format
http://oboedit.org/
OWL DL
http://www.co-ode.org/resources/papers/OBO2OWL.pdf
Common Logic http://www.berkeleybop.org/people/cjm/Mungall-bib.html#mungall_experiences_2009
19
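The slide lists OBO Format as one serialization of GO. As a rough illustration of that flat-file layout, here is a minimal Python reader for `[Term]` stanzas made of `tag: value` lines. It is a sketch under simplifying assumptions, not OBO-Edit's parser: it keeps only `[Term]` stanzas, ignores escaping and qualifiers, and the `EX:` identifiers and term names in the sample are hypothetical placeholders rather than real GO entries.

```python
from typing import Dict, List, Optional, Set

# A stanza maps each tag to every value it was given (tags such as is_a can repeat).
Stanza = Dict[str, List[str]]

# Hypothetical sample file; the EX: identifiers are placeholders, not GO IDs.
SAMPLE_OBO = """format-version: 1.2

[Term]
id: EX:0000001
name: cellular component

[Term]
id: EX:0000002
name: membrane
is_a: EX:0000001 ! cellular component

[Term]
id: EX:0000003
name: plasma membrane
is_a: EX:0000002 ! membrane
"""


def parse_obo_terms(text: str) -> Dict[str, Stanza]:
    """Collect [Term] stanzas keyed by id; header lines and other stanzas are skipped."""
    stanzas: List[Stanza] = []
    current: Optional[Stanza] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # blank line or whole-line comment
        if line.startswith("[") and line.endswith("]"):
            # A stanza header starts a new block; only [Term] blocks are kept.
            current = {} if line == "[Term]" else None
            if current is not None:
                stanzas.append(current)
            continue
        if current is None:
            continue  # file header tags or a non-Term stanza
        tag, _, value = line.partition(":")
        value = value.split("!", 1)[0].strip()  # drop a trailing "! comment"
        current.setdefault(tag.strip(), []).append(value)
    return {s["id"][0]: s for s in stanzas if "id" in s}


def is_a_ancestors(terms: Dict[str, Stanza], term_id: str) -> Set[str]:
    """Follow is_a links upward transitively."""
    seen: Set[str] = set()
    stack = list(terms[term_id].get("is_a", []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(terms.get(parent, {}).get("is_a", []))
    return seen


terms = parse_obo_terms(SAMPLE_OBO)
print(terms["EX:0000003"]["name"])                  # ['plasma membrane']
print(sorted(is_a_ancestors(terms, "EX:0000003")))  # ['EX:0000001', 'EX:0000002']
```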
$100 million invested in literature curation using GO
over 11 million annotations relating gene products described in the UniProt, Ensembl and other databases to terms in the GO
20
GO provides a controlled system of representations for use in annotating data and literature
• multi-species
• multi-disciplinary
• multi-granularity, from molecules to population
21
Example of use of the GO
A study of 11 breast and 11 colorectal cancers analyzed 13,023 genes
The GO tells you what is standard functioning for each of these genes
By searching for deviations from this standard in the sample, 189 genes were identified as being mutated at significant frequencies and thus as providing targets for diagnostic and therapeutic intervention.
Sjöblom T, et al. Science. 2006;314:268-74.
22
This kind of research only works if we have a common ontology
• Data is retrievable
• Data is comparable
• Data is integratable
only to the degree that it is annotated using a common controlled vocabulary (compare the role of seconds, meters, kilograms …)
23
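The claim that data is retrievable only to the degree that it shares a controlled vocabulary can be made concrete with an inverted index from terms to annotated entries. In this sketch the databases `DB_A` and `DB_B` and their entry IDs are hypothetical; the point is only that entry `B-11`, tagged with a local label instead of the shared term, is invisible to a query on the shared term.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical entries in two independent databases, each tagged with terms.
# "B-11" uses a local label instead of the shared vocabulary term.
DATABASES: Dict[str, Dict[str, Set[str]]] = {
    "DB_A": {
        "A-17": {"sphingolipid transporter activity"},
        "A-42": {"Holliday junction helicase complex"},
    },
    "DB_B": {
        "B-03": {"sphingolipid transporter activity"},
        "B-08": {"Holliday junction helicase complex",
                 "sphingolipid transporter activity"},
        "B-11": {"sphingolipid carrier"},
    },
}


def build_term_index(databases: Dict[str, Dict[str, Set[str]]]) -> Dict[str, List[Tuple[str, str]]]:
    """Invert the annotations: term -> sorted list of (database, entry) pairs."""
    index: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for db_name, entries in databases.items():
        for entry_id, entry_terms in entries.items():
            for term in entry_terms:
                index[term].append((db_name, entry_id))
    return {term: sorted(hits) for term, hits in index.items()}


index = build_term_index(DATABASES)
print(index["sphingolipid transporter activity"])
# [('DB_A', 'A-17'), ('DB_B', 'B-03'), ('DB_B', 'B-08')]
```

A single query on the shared term retrieves matching entries from both sources at once; the entry annotated with a private label is simply not found.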
Overview
1. The Rise of Applied Ontology
2. The OBO Foundry
3. Basic Formal Ontology
4. How to Build an Ontology
5. What is a Disease?
24
GO is amazingly successful in overcoming data silo problems
but it covers only
– cellular components
– molecular functions
– biological processes
25
The Open Biomedical Ontologies (OBO) Foundry
Rows: granularity; columns: relation to time (continuant: independent, dependent; occurrent)
Organ and Organism
– Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PATO)
– Occurrent: Biological Process (GO)
Cell and Cellular Component
– Independent continuant: Cell (CL); Cellular Component (FMA, GO)
– Dependent continuant: Cellular Function (GO)
Molecule
– Independent continuant: Molecule (ChEBI, SO, RNAO, PRO)
– Dependent continuant: Molecular Function (GO)
– Occurrent: Molecular Process (GO)
26
The OBO Foundry
– to extend the GO to enable intelligent integration of gigantic bodies of heterogeneous data across the entire domain of the life sciences, including clinical medicine
– to create an evolving, map-like, computable representation of the entire domain of biological and medical reality
Barry Smith, et al., “The OBO Foundry: Coordinated Evolution of Ontologies to Support Biomedical Data Integration”, Nature Biotechnology, 25 (11), 2007
27
Overview
1. The Rise of Applied Ontology
2. The OBO Foundry
3. Basic Formal Ontology
4. How to Build an Ontology
5. What is a Disease?
28
Rationale of OBO Foundry coverage
Rows: granularity; columns: relation to time (continuant: independent, dependent; occurrent)
Organ and Organism
– Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PATO)
– Occurrent: Organism-Level Process (GO)
Cell and Cellular Component
– Independent continuant: Cell (CL); Cellular Component (FMA, GO)
– Dependent continuant: Cellular Function (GO)
– Occurrent: Cellular Process (GO)
Molecule
– Independent continuant: Molecule (ChEBI, SO, RNAO, PRO)
– Dependent continuant: Molecular Function (GO)
– Occurrent: Molecular Process (GO)
29
Basic Formal Ontology (BFO)
Continuant
   Independent Continuant
   Dependent Continuant
Occurrent (Process, Event)
http://ontology.buffalo.edu/bfo/
30
BFO
A simple top-level ontology to support information integration in scientific research
No abstracta
Nothing propositional
No overlap with domain ontologies (for society, for information, …) – built by populating downwards
31
Three Fundamental Dichotomies
Continuant vs. occurrent
Dependent vs. independent
Type vs. instance
32
Continuant
thing, quality …
Occurrent
process, event
33
depends_on
Continuant
   Independent Continuant (thing)
   Dependent Continuant (quality)
Occurrent (process, event)
quality depends_on bearer
34
instance_of
Continuant
   Independent Continuant (thing)
   Dependent Continuant (quality)
Occurrent (process, event)
types (upper levels) vs. instances (lowest level)
35
3 kinds of (binary) relations
Between types
• human is_a mammal
• human heart part_of human
Between an instance and a type
• this human instance_of the type human
• this human allergic_to the type tamiflu
Between instances
• Mary’s heart part_of Mary
• Mary’s aorta connected_to Mary’s heart
36
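One way to see that the three kinds of relation differ in what they relate is to give types and instances distinct representations. This hypothetical Python sketch (the `Type` and `Instance` classes are our own, not part of BFO or any library) classifies each assertion from the slide by whether its subject and object are types or instances.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Type:
    name: str


@dataclass(frozen=True)
class Instance:
    name: str


Entity = Union[Type, Instance]


def relation_kind(subject: Entity, obj: Entity) -> str:
    """Classify a binary assertion by the kinds of its two relata."""
    if isinstance(subject, Type) and isinstance(obj, Type):
        return "between types"
    if isinstance(subject, Instance) and isinstance(obj, Type):
        return "between an instance and a type"
    if isinstance(subject, Instance) and isinstance(obj, Instance):
        return "between instances"
    # The slide lists only three kinds; this sketch rejects any other pattern.
    raise ValueError(f"unsupported pattern: {subject} -> {obj}")


human, mammal, human_heart, tamiflu = (
    Type(n) for n in ("human", "mammal", "human heart", "tamiflu"))
this_human = Instance("this human")
mary, marys_heart, marys_aorta = (
    Instance(n) for n in ("Mary", "Mary's heart", "Mary's aorta"))

assertions = [
    (human, "is_a", mammal),
    (human_heart, "part_of", human),
    (this_human, "instance_of", human),
    (this_human, "allergic_to", tamiflu),
    (marys_heart, "part_of", mary),
    (marys_aorta, "connected_to", marys_heart),
]
for s, rel, o in assertions:
    print(f"{s.name} {rel} {o.name}: {relation_kind(s, o)}")
```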
depends_on
Continuant
   Independent Continuant (thing)
   Dependent Continuant (quality)
Occurrent (process)
temperature depends_on bearer
37
depends_on
Continuant
   Independent Continuant (thing)
   Dependent Continuant (quality, …)
Occurrent (process, event)
event depends_on participant
38
3 kinds of (binary) relations
Between types
• human is_a mammal
• human heart part_of human
Between an instance and a type
• this human instance_of the type human
• this human allergic_to the type tamiflu
Between instances
• Mary’s heart part_of Mary
• Mary’s aorta connected_to Mary’s heart
39
Clark et al., 2005
part_of
is_a
Definitions of relations
40
Barry Smith, et al., “Relations in Biomedical Ontologies”, Genome Biology 2005, 6 (5), R46.
Type-level relations presuppose the underlying instance-level relations
A is_a B =def. A and B are types and all instances of A are instances of B
A part_of B =def. All instances of A are instance-level-parts-of some instance of B
41
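Because these definitions bottom out in instance-level facts, they can be checked mechanically over a finite model. The sketch below assumes a small, hypothetical model (`mary`, `john`, `rex`) and implements `is_a` as set inclusion and `part_of` as the all-some condition. Both functions return `True` vacuously when A has no instances, a case the definitions above do not address.

```python
from typing import Dict, Set, Tuple

# Hypothetical finite model: the instances of each type, plus the
# instance-level part_of pairs (part, whole) that hold in it.
INSTANCES: Dict[str, Set[str]] = {
    "human": {"mary", "john"},
    "mammal": {"mary", "john", "rex"},
    "human heart": {"marys_heart", "johns_heart"},
}
INSTANCE_PART_OF: Set[Tuple[str, str]] = {
    ("marys_heart", "mary"),
    ("johns_heart", "john"),
}


def is_a(a: str, b: str) -> bool:
    """A is_a B: all instances of A are instances of B."""
    return INSTANCES[a] <= INSTANCES[b]


def part_of(a: str, b: str) -> bool:
    """A part_of B (all-some): every instance of A is part of some instance of B."""
    return all(
        any((x, y) in INSTANCE_PART_OF for y in INSTANCES[b])
        for x in INSTANCES[a]
    )


print(is_a("human", "mammal"))          # True
print(is_a("mammal", "human"))          # False: rex is a mammal but not a human
print(part_of("human heart", "human"))  # True
```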
human testis part_of adult human being
but not
human being has_part human testis
and not even
male human being has_part human testis
42
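The same all-some reading explains why `part_of` does not license the converse `has_part`. In this hypothetical model, `eve` is an adult human without a testis and `sam` is a male human in whom no testis is recorded. `has_part` is sketched as the mirror-image condition (every instance of B has some instance of A as part), which is our assumption about how the converse type-level relation would be defined.

```python
from typing import Dict, Set, Tuple

# Hypothetical model: eve is an adult without a testis; sam is a male
# in whom no testis is recorded in this model.
INSTANCES: Dict[str, Set[str]] = {
    "human testis": {"t1"},
    "adult human being": {"adam", "eve"},
    "male human being": {"adam", "sam"},
}
INSTANCE_PART_OF: Set[Tuple[str, str]] = {("t1", "adam")}


def part_of(a: str, b: str) -> bool:
    """Every instance of A is part of some instance of B."""
    return all(any((x, y) in INSTANCE_PART_OF for y in INSTANCES[b])
               for x in INSTANCES[a])


def has_part(b: str, a: str) -> bool:
    """Every instance of B has some instance of A as part."""
    return all(any((x, y) in INSTANCE_PART_OF for x in INSTANCES[a])
               for y in INSTANCES[b])


print(part_of("human testis", "adult human being"))   # True
print(has_part("adult human being", "human testis"))  # False (eve)
print(has_part("male human being", "human testis"))   # False (sam)
```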
The assertions linking terms in ontologies must hold universally
Hence type-level relations such as
part_of are provided with
All-Some definitions
43
part_of for continuant types
A part_of B =def.
For all x, t: if x instance_of A at t, then there is some y such that y instance_of B at t and x instance_level_part_of y at t
cell membrane part_of cell
44
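With continuants, instantiation and parthood both hold at a time, so the check has to pair each `(x, A, t)` with some `(y, B, t)` at the same `t`. A minimal sketch over hypothetical `(instance, type, time)` and `(part, whole, time)` triples; the names `m1`, `m2`, `c1` are illustrative only.

```python
from typing import Set, Tuple

Inst = Set[Tuple[str, str, int]]  # (instance, type, time)
Part = Set[Tuple[str, str, int]]  # (part, whole, time)


def continuant_part_of(a: str, b: str, inst: Inst, part: Part) -> bool:
    """For all x, t: if x instance_of A at t, then some y instance_of B at t
    has x as an instance-level part at t."""
    return all(
        any(type_y == b and t_y == t and (x, y, t) in part
            for (y, type_y, t_y) in inst)
        for (x, type_x, t) in inst
        if type_x == a
    )


# Hypothetical history: membrane m1 is part of cell c1 at times 0 and 1.
inst: Inst = {("m1", "cell membrane", 0), ("c1", "cell", 0),
              ("m1", "cell membrane", 1), ("c1", "cell", 1)}
part: Part = {("m1", "c1", 0), ("m1", "c1", 1)}
print(continuant_part_of("cell membrane", "cell", inst, part))  # True

# A membrane m2 existing at time 2 without being part of any cell breaks the rule.
inst_with_fragment = inst | {("m2", "cell membrane", 2)}
print(continuant_part_of("cell membrane", "cell", inst_with_fragment, part))  # False
```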
part_of for occurrent types
A part_of B =def.
For all x: if x instance_of A, then there is some y such that y instance_of B and x instance_level_part_of y
EVERY A IS PART OF SOME B
45
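Side by side in first-order notation (our own rendering, writing inst for instance_of and part for instance_level_part_of), the two definitions differ only in the time index:

```latex
\begin{align*}
\text{continuants:}\quad & A \;\mathit{part\_of}\; B \;:\Longleftrightarrow\;
  \forall x\,\forall t\,\bigl(\mathit{inst}(x,A,t) \rightarrow
  \exists y\,(\mathit{inst}(y,B,t) \wedge \mathit{part}(x,y,t))\bigr)\\
\text{occurrents:}\quad & A \;\mathit{part\_of}\; B \;:\Longleftrightarrow\;
  \forall x\,\bigl(\mathit{inst}(x,A) \rightarrow
  \exists y\,(\mathit{inst}(y,B) \wedge \mathit{part}(x,y))\bigr)
\end{align*}
```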
Instances vs. types
Instance-level relations and type-level relations have logically distinct properties
What is symmetric on the level of instances need not be symmetric on the level of types
46
seminal vesicle adjacent_to urinary bladder
Not: urinary bladder adjacent_to seminal vesicle
nucleus adjacent_to cytoplasm
Not: cytoplasm adjacent_to nucleus
47
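The asymmetry follows from the all-some form of the type-level definition rather than from anything special about anatomy. In the hypothetical model below, instance-level adjacency is symmetric by construction, yet the type-level relation holds in one direction only, because the female bladder `ub2` is adjacent to no seminal vesicle.

```python
from typing import Dict, Set, Tuple

# Hypothetical model: one male (seminal vesicle sv1, bladder ub1) and one
# female (bladder ub2, no seminal vesicle).
INSTANCES: Dict[str, Set[str]] = {
    "seminal vesicle": {"sv1"},
    "urinary bladder": {"ub1", "ub2"},
}
# Instance-level adjacency is symmetric, so each pair is stored both ways.
ADJACENT: Set[Tuple[str, str]] = {("sv1", "ub1"), ("ub1", "sv1")}


def instance_adjacency_is_symmetric() -> bool:
    return all((y, x) in ADJACENT for (x, y) in ADJACENT)


def type_adjacent_to(a: str, b: str) -> bool:
    """All-some reading: every instance of A is adjacent to some instance of B."""
    return all(any((x, y) in ADJACENT for y in INSTANCES[b]) for x in INSTANCES[a])


print(instance_adjacency_is_symmetric())                       # True
print(type_adjacent_to("seminal vesicle", "urinary bladder"))  # True
print(type_adjacent_to("urinary bladder", "seminal vesicle"))  # False (ub2)
```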
Overview
1. The Rise of Applied Ontology
2. The OBO Foundry
3. Basic Formal Ontology
4. How to Build an Ontology
5. What is a Disease?
48
Blinding Flash of the Obvious
Continuant
   Independent Continuant
   Dependent Continuant
Occurrent (Process, Event)
How to create an ontology from the top down
The Open Biomedical Ontologies (OBO) Foundry
Rows: granularity; columns: relation to time (continuant: independent, dependent; occurrent)
Organ and Organism
– Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PATO)
– Occurrent: Biological Process (GO)
Cell and Cellular Component
– Independent continuant: Cell (CL); Cellular Component (FMA, GO)
– Dependent continuant: Cellular Function (GO)
Molecule
– Independent continuant: Molecule (ChEBI, SO, RNAO, PRO)
– Dependent continuant: Molecular Function (GO)
– Occurrent: Molecular Process (GO)
50
Example: The Cell Ontology
Benefits of coordination
No need to reinvent the wheel
Can profit from lessons learned through mistakes made by others
Can more easily reuse what is made by others
Can more easily inspect and criticize results of others’ work (PATO)
Leads to innovations (e.g. MIREOT) in strategies for combining ontologies
52
Users of BFO
PharmaOntology (W3C HCLS SIG)
MediCognos / Microsoft HealthVault
Cleveland Clinic Semantic Database in Cardiothoracic Surgery
Major Histocompatibility Complex (MHC) Ontology (NIAID)
Neuroscience Information Framework Standard (NIFSTD) and Constituent Ontologies
53
Users of BFO
Interdisciplinary Prostate Ontology (IPO)
Nanoparticle Ontology (NPO): Ontology for Cancer Nanotechnology Research
Neural Electromagnetic Ontologies (NEMO)
ChemAxiom – Ontology for Chemistry
Ontology for Risks Against Patient Safety (RAPS/REMINE) (EU FP7)
IDO Infectious Disease Ontology (NIAID)
54
Users of BFO
National Cancer Institute Biomedical Grid Terminology (BiomedGT)
US Army Universal Core Semantic Layer (UCore SL)
US Army Biometrics Ontology
US Army Command and Control Ontology
Ontology for General Medical Science (OGMS)
55
Infectious Disease Ontology Consortium
• MITRE, Mount Sinai, UT Southwestern – Influenza
• IMBB/VectorBase – Vector borne diseases (A. gambiae, A. aegypti, I. scapularis, C. pipiens, P. humanus)
• Colorado State University – Dengue Fever
• Duke University – Tuberculosis, Staph. aureus, HIV
• Case Western Reserve – Infective Endocarditis
• University of Michigan – Brucellosis
56
Initial Candidate Members
– GO Gene Ontology
– CL Cell Ontology
– SO Sequence Ontology
– ChEBI Chemical Ontology
– PATO Phenotype (Quality) Ontology
– FMA Foundational Model of Anatomy
– ChEBI Chemical Entities of Biological Interest
– CARO Common Anatomy Reference Ontology
– PRO Protein Ontology
The OBO Foundry
57
Under development
– Disease Ontology
– Infectious Disease Ontology
– Mammalian Phenotype Ontology
– Plant Trait Ontology
– Environment Ontology
– Ontology for Biomedical Investigations
– Behavior Ontology
– RNA Ontology
The OBO Foundry
58
Initial Candidate Members
– GO Gene Ontology
– CL Cell Ontology
– SO Sequence Ontology
– ChEBI Chemical Ontology
– PATO Phenotype (Quality) Ontology
– FMA Foundational Model of Anatomy
– ChEBI Chemical Entities of Biological Interest
– CARO Common Anatomy Reference Ontology
– PRO Protein Ontology
The OBO Foundry
59
Under development
– Disease Ontology
– Infectious Disease Ontology
– Mammalian Phenotype Ontology
– Plant Trait Ontology
– Environment Ontology
– Ontology for Biomedical Investigations
– Behavior Ontology
– RNA Ontology
The OBO Foundry
60
Blinding Flash of the Obvious
Continuant
   Independent Continuant
   Dependent Continuant
Occurrent (Process, Event)
How to create an ontology from the top down
61
Continuant
   Independent Continuant
   Dependent Continuant
      Non-realizable Dependent Continuant (quality)
      Realizable Dependent Continuant (function, role, disposition)
62
Realizable dependent continuants
plan
function
role
disposition
capability
tendency
continuants
63
Their realizations
execution
expression
exercise
realization
application
course
occurrents
64
Continuant
   Independent Continuant
   Dependent Continuant
      Non-realizable Dependent Continuant (quality)
      Realizable Dependent Continuant (function, role, disposition)
65
realization depends_on realizable
Continuant
   Independent Continuant (bearer)
   Dependent Continuant (disposition)
Occurrent (process of realization)
66
Specific Dependence
On the instance level:
a depends_on b =def. a is necessarily such that if b ceases to exist, then a ceases to exist
On the type level:
A specifically_depends_on B =def. for every instance a of A, there is some instance b of B such that a depends_on b.
67
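A rough sketch of both levels, under stated simplifications: necessity cannot be computed, so instance-level `depends_on` is simply recorded as a given fact, and each dependent entity is assumed to have exactly one bearer. The entity names are hypothetical. Removing a bearer cascades to whatever depends on it, and the type-level relation is the all-some check over the recorded facts.

```python
from typing import Dict

# Hypothetical world state.
# type_of: entity -> its type.
# bearer: specifically dependent entity -> the one entity it depends_on
# (simplification: exactly one bearer per dependent entity).
type_of: Dict[str, str] = {"cheese1": "cheese", "cheese2": "cheese",
                           "white1": "whiteness", "white2": "whiteness"}
bearer: Dict[str, str] = {"white1": "cheese1", "white2": "cheese2"}


def specifically_depends_on(a_type: str, b_type: str) -> bool:
    """For every instance a of A there is an instance b of B with a depends_on b."""
    return all(
        e in bearer and type_of.get(bearer[e]) == b_type
        for e, t in type_of.items()
        if t == a_type
    )


def cease_to_exist(entity: str) -> None:
    """Remove an entity and, recursively, everything that depends on it."""
    type_of.pop(entity, None)
    for dependent in [d for d, b in bearer.items() if b == entity]:
        del bearer[dependent]
        cease_to_exist(dependent)


print(specifically_depends_on("whiteness", "cheese"))  # True
cease_to_exist("cheese1")
print(sorted(type_of))  # ['cheese2', 'white2']
```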
depends_on
Continuant
   Independent Continuant (thing)
   Dependent Continuant (quality)
Occurrent (process, event)
temperature depends_on bearer
68
Specifically dependent continuants
• the quality of whiteness of this cheese
• your role as lecturer
• the disposition of this patient to experience diarrhea
69
the particular case of redness (of a particular fly eye) instantiates the universal red
an instance of an eye (in a particular fly) instantiates the universal eye
the particular case of redness depends on the instance of an eye
70
the particular case of redness (of a particular fly eye) instantiates red
an instance of an eye (in a particular fly) instantiates eye
the particular case of redness depends on the instance of an eye
red is_a color
eye is_a anatomical structure
71
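Since A is_a B means that every instance of A is an instance of B, instantiation propagates up the is_a links in this diagram: the particular redness is also an instance of color. A small sketch, with illustrative instance names and assuming each universal has a single is_a parent:

```python
from typing import Dict, List

# Relations read off the fly-eye diagram (instance names are illustrative).
INSTANTIATES: Dict[str, str] = {"this_redness": "red", "this_eye": "eye"}
IS_A: Dict[str, str] = {"red": "color", "eye": "anatomical structure"}
DEPENDS_ON: Dict[str, str] = {"this_redness": "this_eye"}


def types_of(instance: str) -> List[str]:
    """The universal an instance instantiates, followed by its is_a ancestors."""
    result = [INSTANTIATES[instance]]
    while result[-1] in IS_A:
        result.append(IS_A[result[-1]])
    return result


print(types_of("this_redness"))              # ['red', 'color']
print(types_of(DEPENDS_ON["this_redness"]))  # ['eye', 'anatomical structure']
```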
depends_on
Continuant
   Independent Continuant (thing)
   Dependent Continuant (quality)
Occurrent (process)
temperature depends_on bearer
72
Specifically Dependent Continuants
Specifically Dependent Continuant
   Quality, Pattern
   Realizable Dependent Continuant
If the bearer ceases to exist, then its qualities, functions and roles cease to exist:
the color of my skin
the function of my heart to pump blood
my weight
73
Generically Dependent Continuants
Generically Dependent Continuant
   Information Object
   Gene Sequence
If one bearer ceases to exist, then the entity can survive, because there are other bearers (copyability):
the pdf file on my laptop
the DNA (sequence) in this chromosome
74
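The contrast between the last two slides is about survival conditions. This sketch models them directly; the class names and the second bearer `backup drive` are hypothetical, and a generically dependent entity is assumed to exist exactly as long as at least one of its bearers does.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class SpecificallyDependent:
    name: str
    bearer: str

    def exists_given(self, existing: Set[str]) -> bool:
        # Exists only while its one bearer exists.
        return self.bearer in existing


@dataclass
class GenericallyDependent:
    name: str
    bearers: Set[str] = field(default_factory=set)

    def copy_to(self, new_bearer: str) -> None:
        # Copyability: the same entity acquires an additional bearer.
        self.bearers.add(new_bearer)

    def exists_given(self, existing: Set[str]) -> bool:
        # Survives as long as at least one of its bearers still exists.
        return bool(self.bearers & existing)


weight = SpecificallyDependent("my weight", bearer="me")
pdf = GenericallyDependent("the pdf file", bearers={"my laptop"})
pdf.copy_to("backup drive")  # hypothetical second bearer

existing = {"me", "backup drive"}  # the laptop has ceased to exist
print(pdf.exists_given(existing))              # True
print(weight.exists_given(existing - {"me"}))  # False
```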
Overview
1. The Rise of Applied Ontology
2. The OBO Foundry
3. Basic Formal Ontology
4. How to Build an Ontology
5. What is a Disease?
75
What is a Disease?
a state in which a function or part of the body is no longer in a healthy condition.
an illness
a process that is a hazard to health and/or longevity.
a pathological condition that is cross-culturally defined and recognized
76
Four distinct classificatory tasks
1. of people (patients, carriers, …)
2. of diseases (cases, instances, problems, …)
3. of courses of disease (symptoms, …)
4. of representations (data, diagnoses…)
77
Four distinct BFO categories
1. person (patient, carrier, …) – independent continuant
2. disease (case, instance, problem, …) – specifically dependent continuant
3. course of disease (symptom, treatment…)– occurrent
4. representation (record, datum, diagnosis…)– generically dependent continuant
78
Disposition
Internally-Grounded Realizable Entity
A disposition is a realizable entity which is such that
(1) if it ceases to exist, then its bearer is physically changed, and
(2) its realization occurs, in virtue of the bearer’s physical make-up, when this bearer is in some special physical circumstances
79
Disorder
A part of an (extended) organism which serves as the bearer of a disposition of a certain sort
80
Big Picture
81
A disease is a disposition rooted in a
physical disorder in the organism and
realized in pathological processes.
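etiological process produces disorder
disorder bears disposition
disposition realized_in pathological process
pathological process produces abnormal bodily features
abnormal bodily features recognized_as signs & symptoms
signs & symptoms used_in interpretive process
interpretive process produces diagnosis
82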
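The big-picture diagram is a chain of labeled edges. As a sketch, the triples below are read off the diagram; the traversal assumes each node has exactly one outgoing edge, which holds for this diagram but not for the worked examples, where a hypothesis leads to further laboratory tests.

```python
from typing import List, Tuple

# The "big picture" chain as (source, relation, target) triples.
CHAIN: List[Tuple[str, str, str]] = [
    ("etiological process", "produces", "disorder"),
    ("disorder", "bears", "disposition"),
    ("disposition", "realized_in", "pathological process"),
    ("pathological process", "produces", "abnormal bodily features"),
    ("abnormal bodily features", "recognized_as", "signs & symptoms"),
    ("signs & symptoms", "used_in", "interpretive process"),
    ("interpretive process", "produces", "diagnosis"),
]


def walk(start: str, edges: List[Tuple[str, str, str]]) -> List[str]:
    """Follow the single outgoing edge from each node until none remains."""
    successor = {src: (rel, dst) for src, rel, dst in edges}
    path = [start]
    while path[-1] in successor:
        rel, dst = successor[path[-1]]
        path += [rel, dst]
    return path


print(" -> ".join(walk("etiological process", CHAIN)))
```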
Elucidation of Primitive Terms
‘bodily feature’ - an abbreviation for a physical component, a bodily quality, or a bodily process.
disposition - an attribute describing the propensity to initiate certain specific sorts of processes when certain conditions are satisfied.
clinically abnormal - some bodily feature that (1) is not part of the life plan for an organism of the relevant type (unlike aging or pregnancy), (2) is causally linked to an elevated risk either of pain or other feelings of illness, or of death or dysfunction, and (3) is such that the elevated risk exceeds a certain threshold level.*
*Compare: baldness
83
Definitions - Foundational Terms
Disorder =def. – A physical component that is clinically abnormal.
Pathological Process =def. – A bodily process that is a realization of a disorder and is clinically abnormal.
Disease =def. – A disposition (i) to undergo pathological processes that (ii) exists in an organism because of one or more disorders in that organism.
84
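The three definitions can be written as predicates. This is only a sketch with our own class and field names: clinical abnormality, which the elucidation ties to a risk threshold, is represented as a precomputed boolean, and "a realization of a disorder" is read as a realization of a disposition that a disorder grounds. A disposition counts as a disease here when at least one disorder grounds it and every process it is a disposition to undergo is pathological.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BodilyFeature:
    name: str
    kind: str                  # "component", "quality" or "process"
    clinically_abnormal: bool  # assumed: the threshold judgment is already made


@dataclass
class Disposition:
    name: str
    grounded_in: List[BodilyFeature]   # features because of which it exists
    realizations: List[BodilyFeature]  # processes it is a disposition to undergo


def is_disorder(f: BodilyFeature) -> bool:
    """Disorder: a physical component that is clinically abnormal."""
    return f.kind == "component" and f.clinically_abnormal


def is_pathological_process(p: BodilyFeature, d: Disposition) -> bool:
    """A clinically abnormal bodily process realizing a disposition grounded in a disorder."""
    return (p.kind == "process" and p.clinically_abnormal and p in d.realizations
            and any(is_disorder(f) for f in d.grounded_in))


def is_disease(d: Disposition) -> bool:
    """A disposition to undergo pathological processes, existing because of a disorder."""
    return (any(is_disorder(f) for f in d.grounded_in)
            and bool(d.realizations)
            and all(is_pathological_process(p, d) for p in d.realizations))


necrotic_liver = BodilyFeature("necrotic liver", "component", True)
fibrosis = BodilyFeature("abnormal tissue repair with fibrosis", "process", True)
print(is_disease(Disposition("cirrhosis", [necrotic_liver], [fibrosis])))  # True

healthy_liver = BodilyFeature("liver", "component", False)
regeneration = BodilyFeature("liver regeneration", "process", False)
print(is_disease(Disposition("regenerative capacity", [healthy_liver], [regeneration])))  # False
```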
Dispositions and Predispositions
All diseases are dispositions; not all dispositions are diseases.
A predisposition is a disposition.
Predisposition to Disease of Type X =def. – A disposition in an organism that constitutes an increased risk of the organism’s subsequently developing the disease X.
85
Cirrhosis - environmental exposure
Etiological process - phenobarbital-induced hepatic cell death produces
Disorder - necrotic liver bears
Disposition (disease) - cirrhosis realized_in
Pathological process - abnormal tissue repair with cell proliferation and fibrosis that exceed a certain threshold; hypoxia-induced cell death produces
Abnormal bodily features recognized_as
Symptoms - fatigue, anorexia; Signs - jaundice, splenomegaly
Symptoms & Signs used_in
Interpretive process produces
Hypothesis - rule out cirrhosis suggests
Laboratory tests produces
Test results - elevated liver enzymes in serum used_in
Interpretive process produces
Result - diagnosis that patient X has a disorder that bears the disease cirrhosis
86
Influenza - infectious
Etiological process - infection of airway epithelial cells with influenza virus produces
Disorder - viable cells with influenza virus bears
Disposition (disease) - flu realized_in
Pathological process - acute inflammation produces
Abnormal bodily features recognized_as
Symptoms - weakness, dizziness; Signs - fever
Symptoms & Signs used_in
Interpretive process produces
Hypothesis - rule out influenza suggests
Laboratory tests produces
Test results - elevated serum antibody titers used_in
Interpretive process produces
Result - diagnosis that patient X has a disorder that bears the disease flu
But the disorder also induces normal physiological processes (immune response) that can result in the elimination of the disorder (transient disease course).
87
Huntington’s Disease - genetic
Etiological process - inheritance of >39 CAG repeats in the HTT gene produces
Disorder - chromosome 4 with abnormal mHTT bears
Disposition (disease) - Huntington’s disease realized_in
Pathological process - accumulation of mHTT protein fragments, abnormal transcription regulation, neuronal cell death in striatum produces
Abnormal bodily features recognized_as
Symptoms - anxiety, depression; Signs - difficulties in speaking and swallowing
Symptoms & Signs used_in
Interpretive process produces
Hypothesis - rule out Huntington’s suggests
Laboratory tests produces
Test results - molecular detection of the HTT gene with >39 CAG repeats used_in
Interpretive process produces
Result - diagnosis that patient X has a disorder that bears the disease Huntington’s disease
88
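The three worked examples fill the same slots of the chain. A sketch that records them as data (values copied from the slides; the `DiseaseCase` class and its field names are hypothetical) and renders the final diagnosis sentence for each:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DiseaseCase:
    """One worked example, slot by slot, following the chain on the slides."""
    disease: str
    etiological_process: str
    disorder: str
    pathological_process: str
    symptoms: Tuple[str, ...]
    signs: Tuple[str, ...]
    test_result: str

    def diagnosis(self, patient: str) -> str:
        return (f"diagnosis that patient {patient} has a disorder "
                f"that bears the disease {self.disease}")


CASES = [
    DiseaseCase("cirrhosis", "phenobarbital-induced hepatic cell death", "necrotic liver",
                "abnormal tissue repair with cell proliferation and fibrosis that exceed "
                "a certain threshold; hypoxia-induced cell death",
                ("fatigue", "anorexia"), ("jaundice", "splenomegaly"),
                "elevated liver enzymes in serum"),
    DiseaseCase("flu", "infection of airway epithelial cells with influenza virus",
                "viable cells with influenza virus", "acute inflammation",
                ("weakness", "dizziness"), ("fever",), "elevated serum antibody titers"),
    DiseaseCase("Huntington's disease", "inheritance of >39 CAG repeats in the HTT gene",
                "chromosome 4 with abnormal mHTT",
                "accumulation of mHTT protein fragments, abnormal transcription regulation, "
                "neuronal cell death in striatum",
                ("anxiety", "depression"), ("difficulties in speaking and swallowing",),
                "molecular detection of the HTT gene with >39 CAG repeats"),
]

for case in CASES:
    print(case.diagnosis("X"))
```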
Benefits of coordination
No need to reinvent the wheel
Can profit from lessons learned through mistakes made by others
Can more easily reuse data collected by others
Can more easily resolve the silo problems created by multiple independent discipline-specific ontologies
89