Copyright © 1997 Pangea Systems, Inc. All rights reserved. Building Ontologies
TRANSCRIPT
Building Ontologies
Copyright © 1998 Pangea Systems, Inc. All rights reserved.
Building Ontologies
No field of Ontological Engineering equivalent to Knowledge or Software Engineering;
No standard methodologies for building ontologies;
Such a methodology would include:
  a set of stages that occur when building ontologies;
  guidelines and principles to assist in the different stages;
  an ontology life-cycle which indicates the relationships among stages.
Gruber's guidelines for constructing ontologies are well known.
The Development Lifecycle
Two kinds of complementary methodologies emerged:
  Stage-based, e.g. TOVE [Uschold96]
  Iterative evolving prototypes, e.g. MethOntology [Gomez Perez94]
Most have TWO stages:
1. Informal stage: the ontology is sketched out using either natural language descriptions or some diagram technique
2. Formal stage: the ontology is encoded in a formal knowledge representation language that is machine computable
An ontology should ideally be communicated to people and unambiguously interpreted by software: the informal representation helps the former; the formal representation helps the latter.
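To make the two stages concrete, here is a minimal sketch in Python. It is an illustration only, not the notation of TOVE or MethOntology: the triple format, the `is_a` relation name and the `superclasses` helper are all assumptions. The same statement appears once as prose for people and once as data that software can interpret.

```python
# Informal stage: a natural-language sketch, meant for people.
INFORMAL_SKETCH = "An enzyme is a protein that catalyses a reaction."

# Formal stage: the same knowledge as (subject, relation, object) triples.
# The triple layout and relation names are illustrative assumptions.
FORMAL_ENCODING = [
    ("Enzyme", "is_a", "Protein"),
    ("Enzyme", "catalyses", "Reaction"),
    ("Protein", "is_a", "MacroMolecule"),
]


def superclasses(term, triples):
    """Return every class reachable from `term` by following is_a links."""
    parents = {}
    for subject, relation, obj in triples:
        if relation == "is_a":
            parents.setdefault(subject, set()).add(obj)
    result, stack = set(), [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result
```

Software cannot do anything with the informal sentence, but it can answer "what kinds of thing is an Enzyme?" from the formal encoding by calling `superclasses("Enzyme", FORMAL_ENCODING)`.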
A Provisional Methodology
A skeletal methodology and life-cycle for building ontologies;
Inspired by the software engineering V-process model;
The overall process moves through a life-cycle.
The left side charts the processes in building an ontology
The right side charts the guidelines, principles and evaluation used to ‘quality assure’ the ontology
The V-model Methodology
[Figure: V-model diagram. Processes on the left, quality assurance on the right, leading to the Ontology in Use.]
User Model: Identify purpose and scope; Knowledge acquisition. Evaluation: coverage, verification, granularity
Conceptualisation Model: Conceptualisation; Integrating existing ontologies. Conceptualisation principles: commitment, conciseness, clarity, extensibility, coherency
Implementation Model: Encoding; Representation. Encoding/Representation principles: encoding bias, consistency, house styles and standards, reasoning system exploitation
The ontology building life-cycle
[Figure: life-cycle diagram. Labels: Identify purpose and scope; Knowledge acquisition; Evaluation; Language and representation; Available development tools; Conceptualisation; Integrating existing ontologies; Encoding; Building]
User Model: Identify purpose and scope
Decide what applications the ontology will support
EcoCyc: Pathway engineering, qualitative simulation of metabolism, computer-aided instruction, reference source
TAMBIS: retrieval across a broad range of bioinformatics resources
The use to which an ontology is put affects its content and style
Impacts re-usability of the ontology
User Model: Knowledge Acquisition
Sources: specialist biologists; standard textbooks; research papers; other ontologies and database schemas.
Motivating scenarios and informal competency questions – informal questions the ontology must be able to answer
Evaluation: fitness for purpose; coverage and competency
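Competency questions can double as a coverage check. The sketch below assumes a toy ontology held as triples and a hand-made table linking each question to the relation it needs. The questions, relation names and `coverage_report` helper are hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical toy ontology as (subject, relation, object) triples.
TRIPLES = {
    ("Enzyme", "is_a", "Protein"),
    ("Enzyme", "catalyses", "Reaction"),
    ("Gene", "expresses", "Protein"),
}

# Each informal competency question is paired with the relation it relies on.
# Both the questions and the relation names are illustrative assumptions.
COMPETENCY_QUESTIONS = {
    "Which proteins catalyse a given reaction?": "catalyses",
    "Which genes express a given protein?": "expresses",
    "Which cofactors does a reaction require?": "requires_cofactor",
}


def can_answer(relation, triples):
    """A question is answerable only if the ontology models its relation."""
    return any(rel == relation for _, rel, _ in triples)


def coverage_report(questions, triples):
    """Map each competency question to True (covered) or False (gap)."""
    return {q: can_answer(rel, triples) for q, rel in questions.items()}
```

Any question reported as `False` points at a gap in coverage: here the cofactor question cannot be answered because no cofactor relation has been modelled yet.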
Conceptualisation Model: Conceptualisation
Identify the key concepts, their properties and the relationships that hold between them;
Which ones are essential? What information will be required by the applications?
Structure domain knowledge into explicit conceptual models.
Identify natural language terms to refer to such concepts, relations and attributes;
Determine naming conventions
  Consistent naming for classes and slots
  EcoCyc: Classes are capitalized, hyphenated, plural; slot names are uppercase
A quality ontology captures relevant biological distinctions with high fidelity
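Naming conventions are easy to check mechanically. The following sketch encodes the EcoCyc-style rules stated above as regular expressions. The exact patterns, the trailing-"s" plural heuristic and the function names are assumptions made for illustration, not EcoCyc's own tooling.

```python
import re

# Capitalized words joined by hyphens, e.g. "Protein-Complexes".
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*(-[A-Z][A-Za-z0-9]*)*$")
# All-uppercase words, optionally hyphenated, e.g. "CATALYZES".
SLOT_NAME = re.compile(r"^[A-Z][A-Z0-9]*(-[A-Z0-9]+)*$")


def is_valid_class_name(name: str) -> bool:
    """Capitalized, hyphenated, plural (plural is approximated by a final 's')."""
    return bool(CLASS_NAME.match(name)) and name.endswith("s")


def is_valid_slot_name(name: str) -> bool:
    """Slot names must be entirely uppercase."""
    return bool(SLOT_NAME.match(name))
```

The plural test is deliberately crude: it would reject legitimate names that do not end in "s", so a real checker would need an exception list.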
Conceptualisation Model: Pitfalls
Pitfall: Missing ontological elements
  Missing classes (Swiss-Prot: protein complexes)
  Missing attributes: Genetic code identifier
  Confuse 1:1 with 1:Many, or 1:Many with Many:Many
    Cofactor as an attribute of reaction
  Important data is stored within text/comment fields
Pitfall: Extra ontological elements
Pitfall: Stop over-elaborating – when do I stop?
Pitfall: Relevance – do I really need all this detail?
Integrating Existing Ontologies
Reuse or adapt existing ontologies when possible
  Save time
  Correctness
  Facilitate interoperation
Integration of ontologies
  Ontologies have to be aligned
  Hindered by poor documentation and argumentation
  Hindered by implicit assumptions
  Shared generic upper level ontologies should make integration easier
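Alignment can be pictured as a mapping between the terms of two ontologies, plus a list of terms left without a counterpart. The sketch below uses class names that appear elsewhere in this deck. The mapping itself is hand-written and hypothetical, and `align` is an illustrative helper rather than any tool's API.

```python
def align(source_terms, target_terms, mapping):
    """Split source terms into those mapped onto a target term and the rest."""
    aligned = {
        term: mapping[term]
        for term in source_terms
        if mapping.get(term) in target_terms
    }
    unmapped = sorted(set(source_terms) - set(aligned))
    return aligned, unmapped


# Terms taken from the EcoCyc and reference-ontology slides of this deck.
ecocyc_terms = ["Proteins", "Nucleic-Acids", "Lipids", "Protein-Complexes"]
reference_terms = {"Protein", "Nucleic Acid", "Lipid"}

# Hypothetical hand-made alignment; a real one needs documented rationale.
term_mapping = {
    "Proteins": "Protein",
    "Nucleic-Acids": "Nucleic Acid",
    "Lipids": "Lipid",
}

aligned, unmapped = align(ecocyc_terms, reference_terms, term_mapping)
```

The `unmapped` list is where implicit assumptions surface: each entry is a concept that one ontology models and the other either omits or models differently.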
Encoding: Implementation Toolkit
Construct ontology using an ontology-development system
Does the data model have the right expressivity?
  Is it just a taxonomy or are relationships needed?
  Is multiple parentage needed?
  Inverse relationships?
  What types of constraints are needed?
Are reasoning services needed?
What are authoring features of the development tool?
Can ontology be exported to a DBMS schema?
Can ontology be exported to an ontology exchange language?
Is simultaneous updating by multiple authors needed?
Size limitations of development tool?
Encoding: Ontology Implementation Pitfalls
Pitfall: Semantic ambiguity
  Multiple ways to encode the same information
  Meaning of class definitions unclear
Pitfall: Encoding Bias
  Encoding the ontology changes the ontology
Encoding: Ontology Implementation Pitfalls
Pitfall: Redundancy (lack of normalization)
  Exact same information repeated
  Presence of computationally derivable information
    Date of birth and age
    DNA sequence and reverse complement
  More effort required for entry and update
  Partial updates lead to inconsistency
  OK if redundant information is maintained automatically
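Both examples of derivable information can be handled by storing one value and computing the other on demand, so a partial update cannot leave them inconsistent. This is a minimal sketch: the class names `DnaRecord` and `Person` are hypothetical, and only the unambiguous bases A, C, G and T are handled.

```python
from dataclasses import dataclass
from datetime import date

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


@dataclass(frozen=True)
class DnaRecord:
    sequence: str  # stored once; the reverse complement is never stored

    @property
    def reverse_complement(self) -> str:
        """Derived on demand: complement each base, then reverse."""
        return self.sequence.translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class Person:
    date_of_birth: date  # stored; age is derived, never entered by hand

    def age_on(self, today: date) -> int:
        """Whole years between date_of_birth and `today`."""
        had_birthday = (today.month, today.day) >= (
            self.date_of_birth.month,
            self.date_of_birth.day,
        )
        return today.year - self.date_of_birth.year - (0 if had_birthday else 1)
```

If the derived value must be stored for performance, the same functions can populate it automatically, which is the case the slide marks as OK.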
Encoding: The Interaction Problem
Task influences what knowledge is represented and how it is represented
Molecular biology: chemical and physical properties of proteins
Bioinformatics: accession number, function, gene
Underlying perspectives mean they may not be reconcilable
If an ontology has too many conflicting tasks it can end up compromised – TaO experience
Evaluate it - A guide for reusability
Conciseness
  No redundancy
  Appropriateness – protein molecules at the atomic resolution when amino acid level would do
Clarity
Consistency
Satisfiability – it doesn’t contradict itself
  Enzyme is both a protein which catalyses a reaction and does not catalyse a reaction
Commitment
  Do I have to buy into a load of stuff I don’t really need or want just to get the bit I do?
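The Enzyme example is the simplest kind of unsatisfiability: the same statement asserted and denied. The toy check below finds only such direct clashes. It is not a substitute for a description-logic reasoner, and the `Assertion` structure and `contradictions` function are illustrative assumptions.

```python
from typing import NamedTuple


class Assertion(NamedTuple):
    concept: str
    relation: str
    target: str
    holds: bool  # False means the statement is explicitly negated


def contradictions(assertions):
    """Return statements that are both asserted and negated."""
    seen = {}
    clashes = []
    for a in assertions:
        key = (a.concept, a.relation, a.target)
        if key in seen and seen[key] != a.holds:
            clashes.append(key)
        seen.setdefault(key, a.holds)
    return clashes


enzyme_axioms = [
    Assertion("Enzyme", "is_a", "Protein", True),
    Assertion("Enzyme", "catalyses", "Reaction", True),
    Assertion("Enzyme", "catalyses", "Reaction", False),
]
```

Calling `contradictions(enzyme_axioms)` flags the `catalyses` statement. Contradictions that only follow from combining several axioms need real reasoning services.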
Documentation: Make Ontology Understandable!
Produce clear informal and formal documentation
An ontology that cannot be understood will not be reused
  GenBank feature table
  NCBI ASN.1 definitions
There exists a space of alternative ontology design decisions
Semantics / Granularity Terminology
Pitfall: Neglecting to record design rationale
Publish the Ontology
Formal and informal specifications
Intended domain of application
Design rationale
Limitations
See EcoCyc paper in ISMB-93/Bioinformatics 00
See TAMBIS paper in Bioinformatics 99
Macromolecule Reference Ontology
[Figure: taxonomy diagram. Node labels: MacroMolecule; Protein; Nucleic Acid; Lipid; Peptide; Enzyme; RNA; DNA; cDNA; gDNA; mDNA; mRNA; SequenceComponent; Gene; Motif; Restriction site; Phosphorylation site. Relationship label: componentOf]
Discussion
What is a macromolecule?
Where does macromolecule fit into an upper level ontology? Substance? Structure?
Is lipid a macromolecule?
If we replace macromolecule with biopolymer is the placement of lipid legit?
Is a peptide a protein and therefore a macromolecule? If not, where does it go?
Taxonomy and Roles
Do we want to assert everything in a taxonomy?
Or do we want to define things in terms of their properties?
  Enzyme = Protein catalyses Reaction
  gDNA = DNA hasLocation Chromosomal
  Sufficient as well as necessary conditions
What's the relationship between
  cDNA and EST
  cDNA and some child of RNA?
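Defining classes by their properties means membership can be inferred rather than asserted. The sketch below treats the two definitions above as necessary and sufficient conditions over a deliberately simplified model in which each property holds a single string value. `Individual`, `DEFINITIONS` and `inferred_classes` are hypothetical names, and a real system would leave this to a classifier.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Individual:
    name: str
    asserted_class: str
    properties: Dict[str, str] = field(default_factory=dict)


# Defined class -> (base class, property, required value), from the slide.
DEFINITIONS = {
    "Enzyme": ("Protein", "catalyses", "Reaction"),
    "gDNA": ("DNA", "hasLocation", "Chromosomal"),
}


def inferred_classes(individual):
    """Asserted class plus every defined class whose conditions are met."""
    result = {individual.asserted_class}
    for defined, (base, prop, value) in DEFINITIONS.items():
        if (
            individual.asserted_class == base
            and individual.properties.get(prop) == value
        ):
            result.add(defined)
    return result
```

Nothing is asserted to be an Enzyme here: a Protein that catalyses a Reaction is recognised as one, which is what sufficiency buys over a purely asserted taxonomy.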
Axioms and constraints
Not all RNA is translated to protein
  Do we want to say that DNA is translated to protein?
  Do we want to model catalytic RNAs?
Relationships – what other ones do we need?
  Genes express proteins
  Genes express rRNA, tRNA
  Genes are found on gDNA
  Genes are found on mDNA
  Genes have their own components – recursive relationships with partitive semantics
Reasoning? Instances? Reusable? Clear? Concise?
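A recursive relationship with partitive semantics means that parts of parts are also parts, so queries have to follow componentOf transitively. A minimal sketch follows. The `all_components` helper and the example pairs (including "Promoter" and "Exon") are hypothetical and not taken from the reference ontology.

```python
from collections import defaultdict


def all_components(whole, component_of):
    """Return every direct and indirect part of `whole`.

    `component_of` is an iterable of (part, whole) pairs.
    """
    parts_by_whole = defaultdict(set)
    for part, container in component_of:
        parts_by_whole[container].add(part)
    found, stack = set(), [whole]
    while stack:
        for part in parts_by_whole[stack.pop()]:
            if part not in found:  # also guards against cycles
                found.add(part)
                stack.append(part)
    return found


# Illustrative (part, whole) pairs only.
COMPONENT_OF = [
    ("Gene", "gDNA"),
    ("Promoter", "Gene"),
    ("Exon", "Gene"),
    ("Motif", "Promoter"),
]
```

Whether componentOf should be transitive at all is itself a modelling decision, and one worth recording in the design rationale.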
Ontological Pitfalls
Stop-over – when do I stop over-elaborating?
  Proteins, amino acid residues, side chains, physical chemical properties …
Relevance
  Do we need to mention all the types of nucleic acid?
EcoCyc
[Figure: EcoCyc class hierarchy diagram. Node labels: MacroMolecule; Proteins; Nucleic-Acids; PolyPeptides; Protein-Complexes; RNA; DNA; DNA-Segments; Misc-RNA; Chemicals; Compounds-And-Elements; Compounds; Lipids; Genes]
Macromolecule in other Ontologies
Gene Ontology
  Used to add attributes to gene instances in databases
  Doesn’t need to talk about molecules or components of molecules
TAMBIS Ontology
  Models it in a similar way to our reference macromolecule ontology
  Because it asks questions of bioinformatics sources