New Foundations for Social Ontology
Barry Smith
http://ontology.buffalo.edu/smith
1
We will be able to use ontologies to help us share data
are ontologically coherent (created under adult supervision)
and logically coherent
and computationally tractable
and work well together – evolve together
– created according to the tested rules
2
A new approach
prospective standardization based on objective measures of what works
bring together selected groups to agree on and commit to good terminology / annotation habits (traffic laws) preemptively
3
Compare science
1. scientific theories must be common resources (cannot be bought or sold)
2. they must use open publishing venues
3. they must constantly evolve to reflect results of scientific experiments (“evidence-based”)
4. must be synchronized
– use common SI system of units
– common mathematical theories (built by adults)
4
for science
create an evolutionary path towards improvement, of the sort we find in science
a collaborative, community effort to ensure buy-in
with rewards for participation
good versioning principles to ensure legacy annotation efforts not wasted
Requirements
5
for science
Create a consensus core of interoperable domain ontologies
starting with low hanging fruit and working outwards from there
built and validated by trained experts
backed by persons of influence in different communities
6
This solution is already being implemented in the domain of
biomedicine
7
Uses of ‘ontology’ in PubMed abstracts
8
By far the most successful: GO (Gene Ontology)
9
a family of interoperable gold standard biomedical reference ontologies, based on the Gene Ontology
http://obofoundry.org
The OBO Foundry
10
The Open Biomedical Ontologies (OBO) Foundry
Rows: GRANULARITY. Columns: RELATION TO TIME (CONTINUANT: INDEPENDENT, DEPENDENT; OCCURRENT).
ORGAN AND ORGANISM
– Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
– Occurrent: Biological Process (GO)
CELL AND CELLULAR COMPONENT
– Independent continuant: Cell (CL); Cellular Component (FMA, GO)
– Dependent continuant: Cellular Function (GO)
MOLECULE
– Independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
– Dependent continuant: Molecular Function (GO)
– Occurrent: Molecular Process (GO)
11
OBO Foundry ontology modules
Rows: GRANULARITY. Columns: RELATION TO TIME (CONTINUANT: INDEPENDENT, DEPENDENT; OCCURRENT).
ORGAN AND ORGANISM
– Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
– Occurrent: Organism-Level Process (GO)
CELL AND CELLULAR COMPONENT
– Independent continuant: Cell (CL); Cellular Component (FMA, GO)
– Dependent continuant: Cellular Function (GO)
– Occurrent: Cellular Process (GO)
MOLECULE
– Independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
– Dependent continuant: Molecular Function (GO)
– Occurrent: Molecular Process (GO)
12
Central principle of the OBO Foundry: ontological modularity
• division of labor
• division of expertise
• division of authority
• no mapping problem
• additivity of annotations
• no silo effect – always one ontology for every need
• creates tested guidelines (traffic laws) for those with new ontology needs
13
Obstacles to the realization of ontology modularity based on
coherent traffic laws of the sort we find in science
• Computer scientists are teaching people ontology tools
• Computer engineers have an interest in multiple new ontologies
• Every Tuesday a new ontology of wheat
14
The result:
Paris has_temperature 62°
Mohammed is_a string
Amount of money is_a integer
Currency has_unit $
Nuclear weapon is_a concept
15
ontologies for ‘agent’
16
17
SUMO
Contract = def. Attribute that applies to Propositions where something is promised in return, i.e. a reciprocal promise.
18
Cyc
Contract =def. a collection of agreements
[whereby] each sentence is a legal
agreement in which two or more agreeing
agents promise to do (or not to do)
something. There are legal consequences
to breaking the promises made in a
contract
19
Legal Ontology of Contract Formation
http://www.dit.unitn.it/~pavel/cando/Pictures/Posters/Mullen.pdf
20
21
What we need 1 (adults)
a thoroughly tested, mandated, common top-level ontology to enable minimal ontology interoperability
thoroughly tested, mandated domain ontologies built and maintained by recognized domain experts
25
What we need 2 (training)
Professional training for ontologists
to teach people to CREATE ONTOLOGY CONTENT
to teach people to USE ONTOLOGY CONTENT
26
What we need 3 (institutions)
institutions for ontology standardization
– counterparts of the authority structure maintained by Linux or by the SI System of Units or the IUPAC chemical nomenclature organization
(W3C does not see what is needed for advancement of ontology towards coherence and consistency)
27
What we need 4 (standards)
Standards governing
rules for ontology development, versioning, modularity
ensuring interoperability
Authorities able to apply these rules, and ensure
filling in of gaps by experts
sustainability
28
What we need 6 (Darwinian struggle for survival)
ontology evaluation with teeth
if ontology (science) is to be born, ontologies must die
29
Ontology needs to become more like a science
basis in evidence
established results – authoritative ontologies*
expert peer review
credit for good ontology work
30
Peer review evaluation process
Required where the quality of inputs cannot be evaluated mechanically
-- journal articles
-- research proposals
-- people (for career promotion in universities …)
31
Treat ontologies like publications
This is happening already with databases:
Nature Signaling
Nature Pathway Interactions
Nature Ontologies ?
Ontology peer review methodology being tested within the OBO Foundry
32
Peer review assessment tasks
Is the ontology consistent with the rules (on modularity, …) ?
Does the ontology provide adequate coverage of its defined domain?
To what level is inferencing supported in the ontology relations structure?
Does the ontology interoperate with other ontologies in the system?
33
Is the ontology being developed collaboratively through the engagement and participation of relevant domain stakeholders and developers of neighboring ontologies?
Does the ontology have a tracker for submissions of new terms and notification of errors?
Does the ontology have a help desk which has prompt response times?
34
Is the ontology syntactically correct?
Is a URI assigned to each term of the ontology?
Does the URI point to required metadata for this term (including natural language and formal definitions)?
Are all identifiers and preferred terms unique?
Are all asserted subclass relations correct in light of the intended interpretation?
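Some of the assessment tasks above (unique identifiers and preferred terms, a URI for every term, a definition for every term) can be screened mechanically before human reviewers take over. The following is only a minimal sketch under stated assumptions: the `Term` record, its field names, and the function `mechanical_checks` are hypothetical and are not part of any OBO Foundry tool.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Term:
    """Hypothetical term record; the field names are assumptions for illustration."""
    identifier: str
    label: str
    uri: Optional[str] = None
    definition: Optional[str] = None


def mechanical_checks(terms: List[Term]) -> List[str]:
    """Return a list of problems for the criteria that need no human judgment."""
    problems = []

    # Identifiers and preferred terms must each be unique across the ontology.
    id_counts = Counter(t.identifier for t in terms)
    label_counts = Counter(t.label.lower() for t in terms)
    for ident, count in id_counts.items():
        if count > 1:
            problems.append(f"duplicate identifier: {ident}")
    for label, count in label_counts.items():
        if count > 1:
            problems.append(f"duplicate preferred term: {label}")

    # Every term needs a URI and at least a natural-language definition.
    for t in terms:
        if not t.uri:
            problems.append(f"no URI assigned: {t.identifier}")
        if not t.definition:
            problems.append(f"no natural-language definition: {t.identifier}")
    return problems
```

Questions such as coverage of the domain or the correctness of asserted subclass relations cannot be settled this way and still call for expert peer review.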
35
Perhaps the ontology of law needs silos?
But if we are to use ontologies as a rigorous means of comparing and integrating legal systems and associated data, then we need a robust common ontology framework
-- a common top-level ontology
-- a common set of ontology relations
-- common (mid-level) domain ontologies
36
One domain ontology with which an ontology of legal entities must cohere:
Information Artifact Ontology
http://code.google.com/p/information-artifact-ontology/
37
Basic Formal Ontology (BFO)
Continuant
– IndependentContinuant: thing
– DependentContinuant: quality, role, function …
– …
Occurrent: process
38
Blinding Flash of the Obvious
Continuant
– IndependentContinuant: thing
– DependentContinuant: quality
– …
Occurrent: process
quality depends on bearer
39
Information entities are relative to provenance and to processors in a way in which types are not
40
What is a datum?
Continuant
– IndependentContinuant: laptop, book
– DependentContinuant: quality
– …
Occurrent: process
datum: a pattern in some medium with a certain kind of provenance
41
Continuant Occurrent
IndependentContinuant
DependentContinuant
.... ..... .......
InformationEntity
Action
creating a datum
42
Generically Dependent Continuants
GenericallyDependentContinuant
Information Entity
Sequence
if one bearer ceases to exist, then the entity can survive, because there are other bearers (copyability)
the pdf file on this laptop
the DNA (sequence) in that chromosome
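The copyability condition can be made concrete with a small model. This is a sketch only, assuming a hypothetical class `GenericallyDependentContinuant` whose bearers are recorded as plain strings; it is not taken from BFO or IAO.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class GenericallyDependentContinuant:
    """Hypothetical model: the entity exists as long as at least one bearer carries it."""
    name: str
    bearers: Set[str] = field(default_factory=set)

    def add_copy(self, bearer: str) -> None:
        """Record a further bearer that carries a copy."""
        self.bearers.add(bearer)

    def bearer_destroyed(self, bearer: str) -> None:
        """Remove a bearer that has ceased to exist."""
        self.bearers.discard(bearer)

    @property
    def exists(self) -> bool:
        return len(self.bearers) > 0


pdf = GenericallyDependentContinuant("the pdf file")
pdf.add_copy("this laptop")
pdf.add_copy("a second machine")   # illustrative second bearer
pdf.bearer_destroyed("this laptop")
survives = pdf.exists              # True: another bearer remains
```

Only when the last bearer is gone does the entity cease to exist.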
43
Generically Dependent Continuants
GenericallyDependentContinuant
Information Artifact
Gene Sequence
.pdf file .doc file
instances
44
IAO adopted, and being violently tested, inter alia, by:
Transcriptomics (MIAME Working Group)
Proteomics (Proteomics Standards Initiative)
Metabolomics (Metabolomics Standards Initiative)
Genomics and Metagenomics (Genomic Standards Consortium)
In Situ Hybridization and Immunohistochemistry (MISFISHIE Working Group)
Phylogenetics (Phylogenetics Community)
RNA Interference (RNAi Community)
Toxicogenomics (Toxicogenomics WG)
Environmental Genomics (Environmental Genomics WG)
Nutrigenomics (Nutrigenomics WG)
Flow Cytometry (Flow Cytometry Community)
45
Information Entity (science)
‘information’ – mass noun (Shannon and Weaver)
‘information entity’ – count noun (Information Ontology)
Information entities are, roughly: artifacts in the realm of qualities (patterns)
46
Information Entity (science)
protocol
database
theory
ontology
gene list
publication
result
...
47
Information Entity (labeling)
serial number
batch number
grant number
person number
name
address
email address
URL
...
48
Type or instance
Continuant
– IndependentContinuant: human being, protocol document
– DependentContinuant: pattern of ink marks
– …
Occurrent (Process): Applying the protocol, Side-Effect …
49
Continuant Occurrent
IndependentContinuant
DependentContinuant
.... ..... .......
InformationEntity
Action
creating a datum
50
Types and instances
type: human being
instance: Leon Tolstoy
type: novel
instance: War and Peace
type: book
instance: this copy of War and Peace
51
What is a trademark?
Is the Coca-Cola trademark a type or an instance?
If the Coca-Cola trademark were a type, and the copies on my laptop and on your laptop instances, then there would be many Coca-Cola trademarks.
Hence the Coca-Cola trademark is an instance.
53
What is a work of literature?
Is War and Peace a type or an instance?
If War and Peace were a type, and the copies of War and Peace in my library and in your library were instances, then
• there would be many War(s) and Peaces.
Hence War and Peace is an instance.
54
There can be two copies of the US Declaration of Independence
There cannot be two US Declarations of Independence
There cannot be subtypes of the US Declaration of Independence
There are not two Declarations of Independence
55
Rule for types
Their names are pluralizable
There can be three people
There cannot be three Condoleezza Rices
Information Entities = entities which can exist in many perfect copies
56
Specific dependence
Continuant
– IndependentContinuant: thing
– DependentContinuant: quality
– …
Occurrent: process
headache depends on human being
57
Generically Dependent Continuants
GenericallyDependentContinuant
Information Entity
Sequence
if one bearer ceases to exist, then the entity can survive, because there are other bearers (copyability)
the pdf file on my laptop
the DNA (sequence) in this chromosome
58
Generically dependent continuants
are realized through being concretized in specifically dependent continuants (the plan in your head, the protocol being realized by your research team)
59
Generically dependent continuants are distinct from types
they have a different kind of provenance
◦ Aspirin as product of Bayer GmbH
◦ aspirin as molecular structure
60
Generically Dependent Continuants
GenericallyDependentContinuant
Information Entity
Sequence
.pdf file .doc file
instances
61
Generically dependent continuants
are concretized in specifically dependent continuants
Beethoven’s 9th Symphony is concretized in the pattern of ink marks which make up this score in my hand
62
Generically dependent continuants
do not require specific media (paper, silicon, neuron …)
63
Realizable Dependent Continuants
SpecificallyDependentContinuant
Quality, Pattern
Realizable Dependent Continuant
inert ert
Occurrent
64
Examples
performance of a symphony
projection of a film
utterance of a sentence
application of a therapy
course of a disease
increase of temperature
Occurrent
Realizable Dependent Continuant
65
Continuant Occurrent
IndependentContinuant
Specifically DependentContinuant
Quality Disposition
Realization
Role
Realizable DependentContinuant
GenericallyDependentContinuant
66
A violinist reads the score of Beethoven’s 9th Symphony and a concretization of the Symphony is created in his mind (something like a plan)
In playing he realizes this plan, thereby generating a performance of the Symphony
Realizable Dependent Continuants are always specifically dependent
67
Nature Protocols
vs.
The protocol McDoe has been following in this project since March
Realizable Dependent Continuants are always specifically dependent
68
McDoe reads the protocol as published and a concretization of the protocol is created in his mind (something like a plan)
In his laboratory work he realizes this plan, thereby generating an experiment
Realizable Dependent Continuants are always specifically dependent
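The chain from published protocol to plan to experiment can be sketched as three linked records. This is an illustration under stated assumptions: the class names `Protocol`, `Plan`, `Experiment` and the helpers `concretize` and `realize` are hypothetical and only mirror the vocabulary of the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protocol:
    """Generically dependent continuant: the protocol as published."""
    title: str


@dataclass(frozen=True)
class Plan:
    """Specifically dependent continuant: depends on exactly one bearer."""
    concretizes: Protocol
    bearer: str


@dataclass(frozen=True)
class Experiment:
    """Occurrent: the process in which a plan is realized."""
    realizes: Plan


def concretize(protocol: Protocol, reader: str) -> Plan:
    """Reading the protocol creates a plan borne by this particular reader."""
    return Plan(concretizes=protocol, bearer=reader)


def realize(plan: Plan) -> Experiment:
    """Acting on the plan generates a process."""
    return Experiment(realizes=plan)


published = Protocol("protocol as published")
mcdoe_plan = concretize(published, "McDoe")
experiment = realize(mcdoe_plan)
```

Each reader who concretizes the same protocol gets a distinct plan, because the plan carries its bearer with it, while the protocol itself stays one and the same.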
69
Informational Entity (law)
license
permission
contract
regulation
...
70
Open Source Licenses
Open source licenses define the privileges and restrictions a licensor must follow in order to use, modify or redistribute the open source software.
Examples include Apache License, BSD license, GNU General Public License, ...
The proliferation of open source licenses is one of the few negative aspects of the open source movement because it is often difficult to understand the legal implications of the differences between licenses.
(Wikipedia)
71
How to create a common representation of the entities in the domain of contracts and licensing?
By following the strategy of the Gene Ontology
Examine the instances in reality – laptops, labels, actions of signing contracts – and their interrelations
Distinguish license template from license (correctly filled-in)
72
Basic rule of evidence-based ontology
All terms in an ontology must have instances in reality
Ontologies must be anchored to reality through these instances
We anchor the ontology of information entities through human acts of using language, through documents, through acts of entering data into a registry ...
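The rule that every term must have instances lends itself to a simple audit. The sketch below assumes a hypothetical mapping from terms to recorded instances; the function name `terms_without_instances` and the sample data are illustrative only.

```python
from typing import Dict, List


def terms_without_instances(instances_by_term: Dict[str, List[str]]) -> List[str]:
    """Return, in sorted order, the terms that no recorded instance anchors."""
    return sorted(term for term, found in instances_by_term.items() if not found)


# Illustrative data only; the instance descriptions are made up.
observed = {
    "document": ["this copy of a license on my desk"],
    "act of signing a contract": ["a signing by John on a given day"],
    "license template": [],
}
unanchored = terms_without_instances(observed)
```

A term that turns up in the result is a candidate for removal, or a prompt to go and find the instances that would anchor it.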
73
Open Source Licenses
Open source license as generically dependent continuant (compare: protocol in Nature Protocols)
The license signed by John and Jim, a specifically dependent continuant whose bearer is (say) a specific piece of paper
The latter is a concretization of the former
74