Copyright © 1997 Pangea Systems, Inc. All rights reserved. Building Ontologies



Building Ontologies


Building Ontologies

There is no field of Ontological Engineering equivalent to Knowledge Engineering or Software Engineering.

There are no standard methodologies for building ontologies.

Such a methodology would include:

1. a set of stages that occur when building ontologies;
2. guidelines and principles to assist in the different stages;
3. an ontology life-cycle that indicates the relationships among stages.

Gruber's guidelines for constructing ontologies are well known.


The Development Lifecycle

Two kinds of complementary methodologies have emerged:

Stage-based, e.g. TOVE [Uschold96];

Iterative evolving prototypes, e.g. MethOntology [Gomez Perez94].

Most have two stages:

1. Informal stage: the ontology is sketched out using either natural-language descriptions or some diagramming technique.

2. Formal stage: the ontology is encoded in a formal knowledge representation language that is machine-computable.

An ontology should ideally be communicated to people and unambiguously interpreted by software: the informal representation helps the former, and the formal representation helps the latter.


A Provisional Methodology

A skeletal methodology and life-cycle for building ontologies, inspired by the software engineering V-process model.

The overall process moves through a life-cycle.

The left side of the V charts the processes in building an ontology.

The right side charts the guidelines, principles and evaluation used to ‘quality assure’ the ontology.


The V-model Methodology

Left side (processes in building the ontology): Identify purpose and scope; Knowledge acquisition; Conceptualisation; Integrating existing ontologies; Encoding; Representation.

Right side (quality assurance):
Evaluation: coverage, verification, granularity.
Conceptualisation principles: commitment, conciseness, clarity, extensibility, coherency.
Encoding/Representation principles: encoding bias, consistency, house styles and standards, reasoning system exploitation.

Levels: User Model, Conceptualisation Model, Implementation Model.

Ontology in Use


The ontology building life-cycle

Identify purpose and scope

Knowledge acquisition

Evaluation

Language and representation

Available development tools

Conceptualisation

Integrating existing ontologies

Encoding

Building


User Model: Identify purpose and scope

Decide what applications the ontology will support

EcoCyc: Pathway engineering, qualitative simulation of metabolism, computer-aided instruction, reference source

TAMBIS: retrieval across a broad range of bioinformatics resources

The use to which an ontology is put affects its content and style, and this in turn affects the re-usability of the ontology.


User Model: Knowledge Acquisition

Sources: specialist biologists; standard textbooks; research papers; other ontologies and database schemas.

Motivating scenarios and informal competency questions: informal questions that the ontology must be able to answer.

Evaluation: fitness for purpose, in terms of coverage and competency.
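Informal competency questions can later be turned into executable checks that the encoded ontology must pass. A minimal Python sketch, assuming a toy single-parent is-a hierarchy; the class names echo the reference ontology in this deck, and the helpers `is_kind_of` and `unanswered` are hypothetical, not part of any tool named here:

```python
from typing import Optional

# Toy single-parent is-a hierarchy (hypothetical excerpt; child -> parent).
IS_A = {
    "Protein": "MacroMolecule",
    "Nucleic Acid": "MacroMolecule",
    "Enzyme": "Protein",
    "RNA": "Nucleic Acid",
    "DNA": "Nucleic Acid",
    "mRNA": "RNA",
}


def is_kind_of(cls: str, ancestor: str) -> bool:
    """Walk up the is-a chain from cls, looking for ancestor."""
    current: Optional[str] = cls
    while current is not None:
        if current == ancestor:
            return True
        current = IS_A.get(current)
    return False


# Each informal competency question is paired with an executable check.
COMPETENCY_QUESTIONS = {
    "Is an enzyme a kind of macromolecule?": lambda: is_kind_of("Enzyme", "MacroMolecule"),
    "Is mRNA a kind of nucleic acid?": lambda: is_kind_of("mRNA", "Nucleic Acid"),
    "Is DNA a kind of protein? (expected: no)": lambda: not is_kind_of("DNA", "Protein"),
}


def unanswered(questions: dict) -> list:
    """Return the questions whose checks fail against the current ontology."""
    return [question for question, check in questions.items() if not check()]


if __name__ == "__main__":
    failures = unanswered(COMPETENCY_QUESTIONS)
    print("all competency questions pass" if not failures else failures)
```

Keeping the question text next to its check means a failing check reports the question in the words the domain experts used.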


Conceptualisation Model: Conceptualisation

Identify the key concepts, their properties and the relationships that hold between them;

Which ones are essential? What information will be required by the applications?

Structure domain knowledge into explicit conceptual models.

Identify natural language terms to refer to such concepts, relations and attributes;

Determine naming conventions: consistent naming for classes and slots. For example, in EcoCyc class names are capitalized, hyphenated and plural, and slot names are uppercase.

A quality ontology captures relevant biological distinctions with high fidelity
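Naming conventions are easiest to keep when they are checked mechanically. The sketch below is a Python approximation of the EcoCyc conventions as summarised on this slide; the regular expressions, the "ends in s" plural test and the sample names are our assumptions, not EcoCyc's own rules or tooling:

```python
from __future__ import annotations

import re

# Assumed reading of the EcoCyc conventions: class names are hyphen-separated
# words that each start with a capital letter, and are plural; slot names are
# uppercase words joined by hyphens.
CLASS_WORDS = re.compile(r"^[A-Z][A-Za-z0-9]*(-[A-Z][A-Za-z0-9]*)*$")
SLOT_NAME = re.compile(r"^[A-Z0-9]+(-[A-Z0-9]+)*$")


def class_name_problems(name: str) -> list[str]:
    """List convention violations for a class name (empty list means OK)."""
    problems = []
    if not CLASS_WORDS.match(name):
        problems.append("words must be capitalized and hyphen-separated")
    if not name.endswith("s"):
        # Crude plural heuristic: real plurals are not always marked by "s".
        problems.append("should be plural (heuristic: ends in 's')")
    return problems


def slot_name_problems(name: str) -> list[str]:
    """List convention violations for a slot name (empty list means OK)."""
    return [] if SLOT_NAME.match(name) else ["slot names must be uppercase"]


if __name__ == "__main__":
    for name in ["Protein-Complexes", "Compounds-And-Elements", "protein complex"]:
        print(name, class_name_problems(name))
    print("COMMON-NAME", slot_name_problems("COMMON-NAME"))
```

The plural test is deliberately crude: it would flag `Misc-RNA` from the EcoCyc hierarchy shown later in this deck, so its output is better treated as a prompt for review than as an error.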


Conceptualisation Model: Pitfalls

Pitfall: Missing ontological elements

Missing classes, e.g. protein complexes in Swiss-Prot.

Missing attributes, e.g. a genetic code identifier.

Confusing 1:1 with 1:Many, or 1:Many with Many:Many, e.g. cofactor as an attribute of reaction.

Important data stored within text/comment fields.

Pitfall: Extra ontological elements

Pitfall: Stop-over: when do I stop over-elaborating?

Pitfall: Relevance: do I really need all this detail?


Integrating Existing Ontologies

Reuse or adapt existing ontologies when possible

This saves time, supports correctness and facilitates interoperation.

Integration of ontologies: the ontologies have to be aligned. Alignment is hindered by poor documentation and argumentation, and by implicit assumptions. Shared generic upper-level ontologies should make integration easier.


Encoding: Implementation Toolkit

Construct the ontology using an ontology-development system. Questions to ask:

Does the data model have the right expressivity? Is it just a taxonomy, or are relationships needed? Is multiple parentage needed? Inverse relationships? What types of constraints are needed?

Are reasoning services needed?

What are the authoring features of the development tool?

Can the ontology be exported to a DBMS schema? Can it be exported to an ontology exchange language?

Is simultaneous updating by multiple authors needed?

What are the size limitations of the development tool?
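To make the expressivity questions concrete, here is a minimal Python sketch of a frame store that supports multiple parentage and keeps declared inverse relationships in step. The class `Ontology`, the slot names `catalyses`/`catalysedBy` and the classes `Ribozyme` and `Catalyst` are hypothetical, chosen only for illustration; real ontology-development systems differ:

```python
from collections import defaultdict


class Ontology:
    """Minimal frame store: multiple parents per class, inverse slots kept in step."""

    def __init__(self):
        self.parents = defaultdict(set)  # class -> direct superclasses
        self.slots = defaultdict(lambda: defaultdict(set))  # frame -> slot -> values
        self.inverses = {}  # slot -> its declared inverse

    def add_class(self, name, *superclasses):
        # More than one superclass gives multiple parentage.
        self.parents[name].update(superclasses)

    def declare_inverse(self, slot, inverse):
        self.inverses[slot] = inverse
        self.inverses[inverse] = slot

    def assert_slot(self, frame, slot, value):
        self.slots[frame][slot].add(value)
        inverse = self.inverses.get(slot)
        if inverse is not None:
            # Record the reverse link too, so the two directions cannot drift apart.
            self.slots[value][inverse].add(frame)

    def ancestors(self, name):
        """All superclasses reachable through any parent."""
        seen, stack = set(), list(self.parents.get(name, ()))
        while stack:
            cls = stack.pop()
            if cls not in seen:
                seen.add(cls)
                stack.extend(self.parents.get(cls, ()))
        return seen


if __name__ == "__main__":
    onto = Ontology()
    onto.add_class("Nucleic Acid", "MacroMolecule")
    onto.add_class("RNA", "Nucleic Acid")
    onto.add_class("Ribozyme", "RNA", "Catalyst")  # hypothetical catalytic RNA class
    onto.declare_inverse("catalyses", "catalysedBy")
    onto.assert_slot("ribozyme-1", "catalyses", "reaction-1")
    print(sorted(onto.ancestors("Ribozyme")))
    print(onto.slots["reaction-1"]["catalysedBy"])
```

Maintaining the inverse on every assertion is one design choice; a tool could instead derive inverses at query time, trading storage for lookup cost.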


Encoding: Ontology Implementation Pitfalls

Pitfall: Semantic ambiguity. There are multiple ways to encode the same information, and the meaning of class definitions is unclear.

Pitfall: Encoding bias. Encoding the ontology changes the ontology.


Encoding: Ontology Implementation Pitfalls

Pitfall: Redundancy (lack of normalization)

Exactly the same information is repeated, or computationally derivable information is stored, e.g. date of birth and age, or a DNA sequence and its reverse complement.

Consequences: more effort is required for entry and update, and partial updates lead to inconsistency. Redundancy is acceptable if the redundant information is maintained automatically.
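One way to avoid the redundancy pitfall is to store only the base fact and compute derivable values on demand, so they cannot drift out of step. A minimal Python sketch using the slide's two examples; `DnaSegment` and `Person` are hypothetical names, and only the bases A, C, G and T (either case) are handled:

```python
from dataclasses import dataclass
from datetime import date

# Complement table for plain DNA bases only (no IUPAC ambiguity codes).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


@dataclass
class DnaSegment:
    sequence: str  # the only stored fact

    @property
    def reverse_complement(self) -> str:
        # Computed on demand, so it can never disagree with `sequence`.
        return self.sequence.translate(COMPLEMENT)[::-1]


@dataclass
class Person:
    date_of_birth: date  # stored; age is derived

    def age_on(self, today: date) -> int:
        # Subtract one year if this year's birthday has not yet been reached.
        had_birthday = (today.month, today.day) >= (self.date_of_birth.month, self.date_of_birth.day)
        return today.year - self.date_of_birth.year - (0 if had_birthday else 1)


if __name__ == "__main__":
    seg = DnaSegment("ATGCCA")
    print(seg.reverse_complement)
    seg.sequence = "AAAC"  # an update cannot leave a stale reverse complement behind
    print(seg.reverse_complement)
```

Computing on demand is the simplest form of "maintained automatically"; caching with invalidation would be an alternative when derivation is expensive.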


Encoding: The Interaction Problem

Task influences what knowledge is represented and how it is represented:

Molecular biology: chemical and physical properties of proteins.

Bioinformatics: accession number, function, gene.

The underlying perspectives mean that these views may not be reconcilable.

If an ontology has too many conflicting tasks, it can end up compromised (the TaO experience).


Evaluate It: A Guide for Reusability

Conciseness: no redundancy; appropriateness (e.g. not modelling protein molecules at atomic resolution when the amino-acid level would do).

Clarity

Consistency: satisfiability, i.e. the ontology doesn’t contradict itself. Counter-example: Enzyme is both a protein that catalyses a reaction and a protein that does not catalyse a reaction.

Commitment: do I have to buy into a load of stuff I don’t really need or want just to get the bit I do?
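The Enzyme contradiction can be caught mechanically. The toy Python check below treats each class as a set of (property, true/false) restrictions inherited from its superclasses and reports a class as unsatisfiable when it inherits both a restriction and its negation. It is a propositional illustration under those assumptions, not a description-logic reasoner; `BrokenEnzyme` and `catalysesReaction` are hypothetical names:

```python
# Each class maps to (direct superclasses, own restrictions), where a restriction
# is a (property, asserted truth value) pair. Toy data; names are hypothetical.
ONTOLOGY = {
    "Protein": ((), set()),
    "Enzyme": (("Protein",), {("catalysesReaction", True)}),
    # A faulty subclass reproducing the slide's contradiction.
    "BrokenEnzyme": (("Enzyme",), {("catalysesReaction", False)}),
}


def all_restrictions(cls):
    """Own restrictions plus everything inherited from superclasses."""
    supers, own = ONTOLOGY[cls]
    collected = set(own)
    for sup in supers:
        collected |= all_restrictions(sup)
    return collected


def is_satisfiable(cls):
    """False if the class inherits both a restriction and its negation."""
    restrictions = all_restrictions(cls)
    return not any((prop, not value) in restrictions for prop, value in restrictions)


if __name__ == "__main__":
    for name in ONTOLOGY:
        print(name, "satisfiable" if is_satisfiable(name) else "UNSATISFIABLE")
```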


Documentation: Make Ontology Understandable!

Produce clear informal and formal documentation

An ontology that cannot be understood will not be reused.

Examples: the GenBank feature table; the NCBI ASN.1 definitions.

There exists a space of alternative ontology design decisions: semantics, granularity, terminology.

Pitfall: neglecting to record design rationale.


Publish the Ontology

Formal and informal specifications; intended domain of application; design rationale; limitations.

See the EcoCyc paper in ISMB-93 / Bioinformatics 00.

See the TAMBIS paper in Bioinformatics 99.


Macromolecule Reference Ontology

MacroMolecule
  Protein
    Peptide
    Enzyme
  Nucleic Acid
    RNA
      mRNA
    DNA
      cDNA
      gDNA
      mDNA
  Lipid

SequenceComponent
  Gene
  Motif
    Restriction site
    Phosphorylation site

Relationship shown: componentOf
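A sketch of how this reference hierarchy might be encoded in Python, assuming single inheritance as read off the slide's layout. The is-a links and the separate `componentOf` relation are kept apart so that, for example, a Gene is never treated as a kind of DNA; the specific `componentOf` pairs and the helper names are hypothetical:

```python
from collections import defaultdict

# Is-a links read off the reference ontology slide (child -> parent).
IS_A = {
    "Protein": "MacroMolecule", "Nucleic Acid": "MacroMolecule", "Lipid": "MacroMolecule",
    "Peptide": "Protein", "Enzyme": "Protein",
    "RNA": "Nucleic Acid", "DNA": "Nucleic Acid", "mRNA": "RNA",
    "cDNA": "DNA", "gDNA": "DNA", "mDNA": "DNA",
    "Gene": "SequenceComponent", "Motif": "SequenceComponent",
    "Restriction site": "Motif", "Phosphorylation site": "Motif",
}

# componentOf is a separate relation, not part of the taxonomy.
# These (part, whole) pairs are illustrative assumptions only.
COMPONENT_OF = {("Gene", "DNA"), ("Phosphorylation site", "Protein")}

# Invert the is-a links once so subclasses can be listed.
CHILDREN = defaultdict(list)
for child, parent in IS_A.items():
    CHILDREN[parent].append(child)


def descendants(cls):
    """All subclasses of cls, depth-first, in declaration order."""
    result = []
    for child in CHILDREN.get(cls, []):
        result.append(child)
        result.extend(descendants(child))
    return result


def components_of(whole):
    """Classes recorded as componentOf the given whole."""
    return sorted(part for part, w in COMPONENT_OF if w == whole)


if __name__ == "__main__":
    print(descendants("Nucleic Acid"))
    print(components_of("DNA"))
```

Listing the descendants of a class is also a quick way to confront the relevance question raised later in the deck: whether all the kinds of nucleic acid need to be there at all.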


Discussion

What is a macromolecule?

Where does macromolecule fit into an upper-level ontology? Substance? Structure?

Is lipid a macromolecule?

If we replace macromolecule with biopolymer, is the placement of lipid legitimate?

Is a peptide a protein, and therefore a macromolecule? If not, where does it go?


Taxonomy and Roles

Do we want to assert everything in a taxonomy?

Or do we want to define things in terms of their properties?

Enzyme = Protein catalyses Reaction

gDNA = DNA hasLocation Chromosomal

Such definitions give sufficient as well as necessary conditions.

What’s the relationship between cDNA and EST? Between cDNA and some child of RNA?
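Defining Enzyme and gDNA by their properties means membership can be inferred rather than asserted. The Python sketch below applies the two definitions as necessary-and-sufficient tests over explicit data; `Individual`, `classify` and the sample names are hypothetical. Unlike a description-logic reasoner, it works closed-world over whatever facts happen to be recorded:

```python
from dataclasses import dataclass, field


@dataclass
class Individual:
    name: str
    kinds: set  # asserted primitive classes, e.g. {"Protein"}
    catalyses: set = field(default_factory=set)  # reactions this individual catalyses
    location: str = ""  # e.g. "Chromosomal"


# Defined classes: each test is both necessary and sufficient for membership.
DEFINITIONS = {
    "Enzyme": lambda x: "Protein" in x.kinds and len(x.catalyses) > 0,
    "gDNA": lambda x: "DNA" in x.kinds and x.location == "Chromosomal",
}


def classify(x: Individual) -> set:
    """Return every defined class whose conditions x satisfies."""
    return {name for name, test in DEFINITIONS.items() if test(x)}


if __name__ == "__main__":
    print(classify(Individual("protein-1", {"Protein"}, {"reaction-1"})))
    print(classify(Individual("dna-1", {"DNA"}, location="Chromosomal")))
```

Nothing here asserts that protein-1 is an Enzyme; it is classified as one because it meets the definition, which is the contrast the slide draws with asserting everything in a taxonomy.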


Axioms and constraints

Not all RNA is translated to protein. Do we want to say that DNA is translated to protein? Do we want to model catalytic RNAs?

Relationships: what other ones do we need?

Genes express proteins. Genes express rRNA and tRNA. Genes are found on gDNA. Genes are found on mDNA. Genes have their own components: recursive relationships with partitive semantics.

Reasoning? Instances? Reusable? Clear? Concise?
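Recursive component relationships are often queried transitively: if an exon is a component of a gene and the gene is a component of a stretch of gDNA, the exon is also a component of that gDNA. Treating `componentOf` as transitive is a modelling choice, assumed here rather than given by the slide. A minimal Python sketch with hypothetical component names:

```python
from collections import defaultdict

# Direct "has component" links (whole -> parts); names used below are hypothetical.
HAS_COMPONENT = defaultdict(set)


def add_component(whole, part):
    HAS_COMPONENT[whole].add(part)


def all_components(whole):
    """Transitive closure of the partitive relation: parts, parts of parts, and so on."""
    found, stack = set(), [whole]
    while stack:
        for part in HAS_COMPONENT.get(stack.pop(), ()):
            if part not in found:  # also guards against cycles in bad data
                found.add(part)
                stack.append(part)
    return found


if __name__ == "__main__":
    add_component("gDNA-1", "gene-1")
    add_component("gene-1", "exon-1")
    add_component("gene-1", "intron-1")
    print(sorted(all_components("gDNA-1")))
```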


Ontological Pitfalls

Stop-over: when do I stop over-elaborating? E.g. proteins, then amino acid residues, then side chains, then their physical and chemical properties, and so on.

Relevance: do we need to mention all the types of nucleic acid?


EcoCyc

MacroMolecule
  Proteins
    PolyPeptides
    Protein-Complexes
  Nucleic-Acids
    RNA
    DNA

Other classes shown: DNA-Segments, Misc-RNA, Chemicals, Compounds-And-Elements, Compounds, Lipids, Genes


Macromolecule in other Ontologies

Gene Ontology: used to add attributes to gene instances in databases. It doesn’t need to talk about molecules or components of molecules.

TAMBIS Ontology: models macromolecules in a similar way to our reference macromolecule ontology, because it asks questions of bioinformatics sources.