Ontologies
Presented by: Rokhlenko Oleg
Data Integration Seminar, Spring 2002
Supervisor: Ron Y. Pinter
References
1. Ontologies: Principles, Methods and Applications, by Mike Uschold (The University of Edinburgh, Scotland) and Michael Gruninger (University of Toronto, Canada)
2. Ontology-Driven Integration of Scientific Repositories, by Vassilis Christophides, Catherine Houstis, Spyros Lalis and Hariklia Tsalapata (Department of CS, University of Crete, Greece)
Agenda
- Why ontologies, and what are they?
- Uses of ontologies.
- A skeletal methodology for building ontologies.
- Ontologies in practice.
What are the problems?
The lack of shared understanding leads to:
- Poor communication within and between people and their organizations.
- Difficulties in identifying requirements, and thus in defining a specification of the system.
- Disparate modeling methods, paradigms, languages and software tools, which severely limit:
  - inter-operability;
  - the potential for re-use and sharing, so much effort is wasted re-inventing the wheel.
How can we solve them?
The way to address these problems is to reduce or eliminate conceptual and terminological confusion and come to a shared understanding. Such an understanding can function as a unifying framework for the different viewpoints and serve as the basis for:
- Communication between people.
- Inter-operability between systems.
- System engineering benefits such as:
  - re-usability;
  - reliability;
  - specification.
Examples (1): Unifying Research Fields
Situation/Problem: Researchers in the different but related fields of AI Planning, Decision Theory and Distributed Systems Theory cannot readily make use of each other's results. This is because they have different perspectives on, and use different terms to describe, the same underlying ideas.
Solution: Develop a unifying conceptual framework which enables research results in one field to be applied to the other fields.
Examples (2): Semiconductor Fabrication
Situation/Problem: Software bought in from the outside includes a WIP tracking system and a production line simulation package. The simulation package requires as input a very large description of a model of the product flow in the factory, which incorporates various details of the WIP tracking mechanism. When new versions of the simulation package are released, or if a new supplier is chosen, the model must be converted to a new format. This conversion is both time consuming and error prone.
Solution: Automate the process of converting the model when new external software is introduced. This both saves time and ensures model fidelity.
Examples (3): Spacecraft Mission Operations
Situation/Problem: Various knowledge-based systems were developed independently to assist in different aspects of spacecraft operations (e.g. in planning, anomaly detection, diagnosis). Each uses its own approach to structuring and representing the relevant concepts in a large knowledge base. It is desirable to integrate these systems, so that each can make use of the knowledge of the others.
Solution: Use a federated agent-based approach to knowledge sharing. The overall system is called ATOS: Advanced Technology Operations System.
What is an ontology?
From Greek: ontos = being, logos = science.
'Ontology' is the term used to refer to the shared understanding of some domain of interest, which may be used as a unifying framework to solve the problems described above.
An ontology necessarily entails or embodies some sort of world view with respect to a given domain. The world view is often conceived as a set of concepts (e.g. entities, attributes, processes), their definitions and their inter-relationships; this is referred to as a conceptualization.
What is an ontology? (cont.)
Such a conceptualization may be implicit, e.g. existing only in someone's head, or embodied in a piece of software. For example, an accounting package presumes some world view encompassing such concepts as an invoice and a department in an organization. The word 'ontology' is sometimes used to refer to this implicit conceptualization.
However, the more standard usage and that which we will adopt is that the ontology is an explicit account or representation of [some part of] a conceptualization.
What does an ontology look like?
An [explicit] ontology may take a variety of forms, but necessarily it will include a vocabulary of terms and some specification of their meaning (i.e. definitions). The degree of formality by which a vocabulary is created and meaning is specified varies considerably:
- highly informal: expressed loosely in natural language;
- semi-informal: expressed in a restricted and structured form of natural language;
- semi-formal: expressed in an artificial, formally defined language;
- rigorously formal: meticulously defined terms with formal semantics, theorems and proofs of such properties as soundness and completeness.
Uses of ontologies
- COMMUNICATION between people and organizations
- INTER-OPERABILITY between systems
- SYSTEM ENGINEERING: reusable components, reliability, specification
We identify three main categories of uses for ontologies. Within each, other distinctions may be important, such as the nature of the software, who the intended users are, and how general the domain is.
Communication
- Normative models: within any large-scale integrated software system, different people must have a shared understanding of the system and its objectives.
- Networks of relationships: we can also use ontologies to create a network of relationships, keep track of what is linked, and explore and navigate through this network.
- Consistency and lack of ambiguity: one of the most important roles an ontology plays in communication is that it provides unambiguous definitions for terms used in a software system.
- Integrating different user perspectives: if we have a system with multiple communicating agents, this integration through shared understanding becomes vital.
Inter-Operability
Many applications of ontologies address the issue of inter-operability, in which we have different users that need to exchange data or who are using different software tools. A major theme for the use of ontologies in domains such as enterprise modeling and multi-agent architectures is the creation of an integrating environment for different software tools.
Example
The term 'procedure' used by one tool is translated into the term 'method' used by the other via the ontology, whose term for the same underlying concept is 'process'.
[Figure: a viewer tool asks "give me the procedure for…"; its translator maps procedure = process, and the request passes through the ontology as "give me the process for…"; the library tool's translator maps process = METHOD and asks the library "give me the METHOD for…". The reply, "here is the METHOD for…", travels back along the same path and reaches the viewer as "here is the procedure for…".]
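The translation in this example can be sketched in a few lines of Python. This is only an illustration, assuming each tool keeps a simple lookup table from its own vocabulary to the ontology's terms; the names `VIEWER_TO_ONTOLOGY`, `LIBRARY_TO_ONTOLOGY`, `invert` and `translate` are hypothetical and do not come from the source.

```python
# Each tool maps its own vocabulary onto the shared ontology's terms.
# The tables below are hypothetical and hold only the slide's single example.
VIEWER_TO_ONTOLOGY = {"procedure": "process"}
LIBRARY_TO_ONTOLOGY = {"method": "process"}


def invert(mapping):
    """Build the reverse lookup (ontology term -> tool term)."""
    return {onto: local for local, onto in mapping.items()}


def translate(term, source_to_ontology, target_to_ontology):
    """Translate a term between two tools by way of the ontology."""
    concept = source_to_ontology[term]           # e.g. procedure -> process
    return invert(target_to_ontology)[concept]   # e.g. process -> method


print(translate("procedure", VIEWER_TO_ONTOLOGY, LIBRARY_TO_ONTOLOGY))
print(translate("method", LIBRARY_TO_ONTOLOGY, VIEWER_TO_ONTOLOGY))
```

Neither tool needs to know the other's vocabulary: each only maintains its own mapping to the ontology.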
Ontologies as Inter-Lingua
To assist inter-operability, ontologies can be used to support translation between different languages and representations.
One approach is to design a unique translator for every two-party exchange; however, this would require O(n^2) translators for n different ontologies.
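[Figure: four languages L1 to L4 translated directly between every pair, compared with the same four languages each connected by one translator (T1 to T4) to a central interlingua.]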
Using ontologies as an inter-lingua to support translation, we can reduce the number of translators to O(n) for n different ontologies, since we only need translators between each native ontology and the interchange ontology.
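The difference between the two growth rates can be made concrete with a small Python sketch. It assumes translators are one-directional, so a pair of languages needs two of them and each language needs one translator into and one out of the interlingua; the source only states the orders O(n^2) and O(n), so the exact constants here are an assumption.

```python
def pairwise_translators(n):
    """One translator per ordered pair of distinct languages (assumed directed)."""
    return n * (n - 1)


def interlingua_translators(n):
    """One translator into and one out of the interlingua per language."""
    return 2 * n


for n in (4, 10, 50):
    print(n, pairwise_translators(n), interlingua_translators(n))
```

For the four languages in the figure this gives 12 direct translators against 8, and the gap widens quickly as n grows.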
System Engineering
The applications of ontologies that we have considered to this point have focused on the role that ontologies play in the operation of software systems. In this section we consider applications of ontologies that support the design and development of the software systems themselves:
- Specification
- Reliability
- Reusability
1. Specification
A shared understanding of the problem and the task at hand can assist in the specification of software systems. The ontology's role in specification varies with the degree of formality within the system design methodology:
- In an informal approach, ontologies facilitate the process of identifying the requirements of the system and understanding the relationships among the components of the system. This is particularly important for systems involving distributed teams of designers working in different domains.
- In a formal approach, an ontology provides a declarative specification of a software system, which allows us to reason about what the system is designed for, rather than how the system supports this functionality.
2. Reliability
Informal ontologies can improve the reliability of software systems by serving as a basis for manual checking of the design against the specification.
Using formal ontologies enables [semi-]automated consistency checking of the software system with respect to the declarative specification. In addition, formal ontologies can be used to make explicit the various assumptions made by different components of a software system, facilitating their integration.
Declaratively specified assumptions may explicitly restrict the applicability of a particular ontology to a problem domain. By proving that the ontology is capable of supporting various reasoning problems, we can demonstrate the reliability of the software system within the domain.
3. Reusability
To be effective, ontologies must also support reusability, so that we can import and export modules among different software systems.
The problem is that when software tools are applied to new domains, they may not perform as expected, since they relied on assumptions that were satisfied in the original applications but not in the new ones.
By characterizing classes of domains and tasks within these domains, ontologies provide a framework for determining which aspects of an ontology are reusable between different domains and tasks.
A Skeletal Methodology for Building Ontologies
Although there is much collective experience in developing and using ontologies, there are no standard methodologies for building ontologies.
The proposed comprehensive methodology for developing ontologies includes the following:
- Identify Purpose and Scope;
- Building the Ontology;
- Evaluation;
- Documentation.
Purpose and Scope
It is important to be clear about why the ontology is being built and what its intended uses are. The previous section explores the space of possible uses; this can be a starting point in identifying the purpose of an ontology yet to be constructed.
It will also be useful to identify and characterize the range of intended users of the ontology.
Building the Ontology
The identification of the purpose and scope of the ontology, at least in general terms, serves to provide a reasonably well-defined target for building the ontology.
Three aspects to this are: capture, coding, and integration of existing ontologies.
Capture
By ontology capture, we mean:
1) identification of the key concepts and relationships in the domain of interest (scoping);
2) production of precise, unambiguous text definitions for such concepts and relationships;
3) agreeing on all of the above.
1) Scoping
- Brainstorming: have a brainstorming session to produce all potentially relevant terms and phrases.
- Grouping: structure the terms loosely into work areas corresponding to naturally arising sub-groups.
- Connecting: identify semantic cross-references between the areas, i.e. concepts that are likely to refer to or be referred to by concepts in other areas. This information can be used to help identify which work area to tackle first to minimize the likelihood of re-work.
2) Produce Definitions
- Determining the meta-ontology: let the careful consideration of the concepts and their inter-relationships determine the requirements for the meta-ontology. Keep in mind various possibilities, and use words and phrases in a consistent manner where appropriate (e.g. role, entity, relationship, type, instance).
- Work areas: address each work area in turn. Start with work areas that have the most semantic overlap with other work areas.
- Terms: proceed in a middle-out fashion rather than top-down or bottom-up. That is, define the most fundamental terms in each work area before moving on to more abstract and more specific terms within a work area.
The idea of what is fundamental, or basic, is a psychological phenomenon. For example, 'dog' is basic, 'mammal' is a generalization, and 'cocker spaniel' is a specialization.
Why Middle-Out Approach?
- Bottom-up approach: high level of detail; commonality is difficult to spot; inconsistency; re-work and more effort.
- Top-down approach: better control of the level of detail; arbitrary choice of high-level categories; less stability; re-work and more effort.
- Middle-out approach: starts with the most important concepts; captures commonality; balance in terms of the level of detail; consistency and accuracy; stability; less re-work and less effort.
3) Reaching Agreement
There is considerable variation in the degree of effort required to agree on definitions and terms for underlying concepts. For some terms, consensus on the definition of a single concept can be fairly easy. In other cases several terms seem to correspond to one concept definition.
In practice, there are only a few cases where commonly used terms have significantly different informal usages but no useful distinct definitions can be agreed. Such cases should be recorded in notes against the definition.
Finally, some highly ambiguous terms are identified as corresponding to several closely related, but different, concepts. In this situation, the term itself gets in the way of a shared understanding.
Coding
By coding, we mean explicit representation of the conceptualization captured in the previous stage in some formal language. This will involve:
- committing to the basic terms that will be used to specify the ontology (e.g. class, entity, relation); this is often called a 'meta-ontology' because it is, in essence, the [underlying] ontology of representational terms that will be used to express the main ontology;
- choosing a representation language (which is capable of supporting the meta-ontology);
- writing the code.
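As a rough illustration of these three steps, the sketch below commits to two representational terms (concept and relation), uses Python as a stand-in for the representation language, and codes the 'dog' example from the capture stage. The class names, the `is_a` relation and the text definitions are placeholders chosen for this sketch; the source does not prescribe any of them.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    """A term of the main ontology together with its text definition."""
    name: str
    definition: str


@dataclass
class Ontology:
    """Concepts plus named binary relations between them (the meta-ontology)."""
    concepts: dict = field(default_factory=dict)
    relations: set = field(default_factory=set)  # (relation, source, target)

    def add_concept(self, name, definition):
        self.concepts[name] = Concept(name, definition)

    def relate(self, relation, source, target):
        # Refuse relations over undefined terms so the vocabulary stays closed.
        for term in (source, target):
            if term not in self.concepts:
                raise KeyError(f"undefined concept: {term}")
        self.relations.add((relation, source, target))

    def related(self, relation, source):
        return {t for (r, s, t) in self.relations if r == relation and s == source}


onto = Ontology()
onto.add_concept("mammal", "placeholder definition of the generalization")
onto.add_concept("dog", "placeholder definition of the basic term")
onto.add_concept("cocker spaniel", "placeholder definition of the specialization")
onto.relate("is_a", "dog", "mammal")
onto.relate("is_a", "cocker spaniel", "dog")
print(onto.related("is_a", "dog"))
```

A real project would more likely pick a dedicated ontology language; the point here is only that the meta-ontology fixes which kinds of statements the main ontology can contain.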
Integrating Existing Ontologies
During either or both of the capture and coding processes, there is the question of how and whether to use [all or part of] ontologies that already exist. In general this is a very difficult problem. One way forward is to make explicit all assumptions underlying the ontology.
Overall, provision of guidance and tools in this area may be one of the biggest challenges in developing a comprehensive methodology for building ontologies. It is easy enough to identify synonyms, and to extend an ontology where no concepts readily exist. However, when there are obviously similar concepts defined in existing ontologies, it is rarely clear how and whether such concepts can be adapted and reused.
Evaluation
Gómez-Pérez provides a good definition of evaluation in the context of knowledge-sharing technology: “to make a technical judgment of the ontologies, their associated software environment, and documentation with respect to a frame of reference … The frame of reference may be requirements specifications, competency questions, and/or the real world.”
Some detailed work has been done on the evaluation of ontologies which could contribute to a comprehensive methodology for building ontologies. The approach taken in some of this work is to look first at what has been done in the field of KBS, and to adapt it for ontologies.
Documentation
It may be desirable to have established guidelines for documenting ontologies, possibly differing according to the type and purpose of the ontology.
As pointed out by Skuce, one of the main barriers to effective knowledge sharing is the inadequate documentation of existing knowledge bases and ontologies. To address these problems, all important assumptions should be documented, both about the main concepts defined in the ontology and about the primitives used to express the definitions in the ontology (i.e. the meta-ontology).
A scenario for Coastal Zone Management (CZM)
Environmental scientists and public institutions working on CZM often need to extract and combine data from different scientific disciplines, such as marine biology, physical and chemical oceanography, geology and engineering, stored in distributed repositories.
For example: the transport of waste in a particular coastal area given a pollution source. Local authorities could require this information to determine the best location for installing a waste pipeline.
This data is typically generated through a 2-step process, involving the execution of 2 different programs:
Example
Combination of data and programs for producing waste transport data:
[Figure: the Ocean Circulation Model takes Currents and Bathymetry as input and produces Sea Circulation; the Waste Transport Model takes Sea Circulation, Bathymetry and a Pollution Source as input and produces Waste.]
A scenario for CZM (cont.)
Provided that the user has no knowledge of this information, the following actions are necessary to discover which productions can be used to obtain Waste data for a particular coastal area using the available resources:
1. Locate Waste data stored in the distributed repositories.
2. Determine the usability of such search results.
3. If no Waste data that satisfies the user requirements is available, locate programs capable of producing this data.
4. Having identified an appropriate program, i.e. a Waste Transport model, determine the required input.
5. For each input, locate appropriate sources or determine ways to produce the corresponding data sets.
An Ontology for the Waste Transport Scenario
[Figure: ontology graph for CZM. Its nodes include Chemical Oceanography, Physical Oceanography, Waste Transport Model, Ocean Circulation Model, Global Currents, Sea Circulation, Bathymetry and Waste.]
The Knowledge Base
In order to allow reasoning on the combination alternatives between data and programs, we advocate a definition of the ontology notions in a KBS using Horn clauses.
An ontology notion N is defined as a clause N(A1, A2, …, An), where A1, A2, …, An are its attributes.
Relations between concepts are expressed as rules of the form:
N(A1, A2, …, An) :- N1(A1, …, An), … , Nn(A1, …, An), Expr(A1, …, An)
where “:-” denotes implication and “,” conjunction.
The Knowledge Base (cont.)
The rule body includes program and data concepts Ni as well as constraints Expr, e.g. parameter restrictions, for deducing the notion appearing as a consequent in the rule head.
Exactly one literal in the body describes the corresponding program notion. The rest of the literals stand for the description of input data required by that program.
The following clauses define the notions introduced in the above ontology:
Bathymetry(Location, GridRes)
ExtCurrents(Location, GridRes)
SeaCirc(Location, GridRes)
OceanCircModel(Location, GridRes)
WasteTransportModel(Location, GridRes)
The Knowledge Base (cont.)
The ontology relations are formalized using 2 rules:
1. SeaCirc(Location, GridRes) :- OceanCircModel(Location, GridRes), ExtCurrents(Location, GridRes’), Bathymetry(Location, GridRes’’), GridRes <= GridRes’, GridRes <= GridRes’’
“<=” denotes higher or equal grid resolution.
Rule 1 states that Sea Circulation data for a specific location and grid resolution can be derived from local Bathymetry and external Currents data using an Ocean Circulation program.
The Knowledge Base (cont.)
2. Waste(Location, GridRes) :- WasteTransportModel(Location, GridRes), SeaCirc(Location, GridRes’), Bathymetry(Location, GridRes’’), GridRes <= GridRes’, GridRes <= GridRes’’
Rule 2 states that Waste data for a specific location and grid resolution can be produced by combining Sea Circulation with local Bathymetry data via a Waste Transport program.
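The constraints that Rules 1 and 2 share (same location, and a grid-resolution condition on every input) can be checked with a short Python sketch. It rests on one loudly stated assumption: the "<=" ordering is modelled by comparing only the numeric part of a resolution such as 10m3, so an input qualifies when its number is no larger than the output's. The source does not define the ordering more precisely than "higher or equal grid resolution", and the helper names are hypothetical.

```python
import re


def cell_size(resolution):
    """Return the numeric part of a resolution string such as '10m3'."""
    match = re.fullmatch(r"(\d+)m([23])", resolution)
    if match is None:
        raise ValueError(f"unrecognised grid resolution: {resolution!r}")
    return int(match.group(1))


def at_least_as_fine(input_res, output_res):
    """Assumed reading of GridRes <= GridRes': the input number is no larger."""
    return cell_size(input_res) <= cell_size(output_res)


def body_satisfied(program, inputs):
    """Check the location and grid constraints shared by Rules 1 and 2.

    program and each input are (notion, location, resolution) triples; the
    program's location and resolution become those of the rule head.
    """
    _, location, out_res = program
    return all(loc == location and at_least_as_fine(res, out_res)
               for _, loc, res in inputs)


# Rule 2 with a 50m3 Waste Transport program and 25m3 Sea Circulation data
print(body_satisfied(("WasteTransportModel", "HER", "50m3"),
                     [("SeaCirc", "HER", "25m3"), ("Bathymetry", "HER", "10m2")]))
```

Under this reading, a 10m2 Waste Transport program cannot consume 25m3 Sea Circulation data directly, because the input is coarser than the requested output.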
The Knowledge Base (cont.)
Clauses without a body, called facts, are instances of abstract notions. For example:
SeaCirc(HER, 10m3) stands for 3-D Sea Circulation for the area of Heraklion with a grid resolution of ten cubic meters.
WasteTransportModel(HER, 1m2) stands for a Waste Transport program that computes 2-D Waste data for the area of Heraklion with a grid resolution of one square meter.
Facts are either extensional, indicating available data sets or programs, or intentional, denoting data sets that can be generated through programs.
There is no need to explicitly store facts in the KBS. Intentional facts are dynamically deduced through rules. Extensional facts can be constructed “on-the-fly” via a metadata search engine that locates the corresponding resources.
On-Demand Generation of Data Production Paths
Given this formal representation of the ontology, requests for data productions translate into queries to the knowledge base.
A query is a description of the desired resource in terms of an ontology concept. It must be satisfied through extensional or intentional facts, the latter being sub-queries that require further expansion. This iterative matching process takes into account all possible combinations of rules and extensional facts.
The result is a set of trees, whose nodes are intentional facts and whose leaves are extensional facts, embodying all valid production paths through which data for the queried concept can be generated.
Example
To illustrate the on-demand generation of data production paths, let us assume that the following resources are available in the system repositories, expressed as extensional facts:
Bathymetry(HER, 10m2)
ExtCurrents(HER, 10m3)
OceanCircModel(HER, 10m2)
SeaCirc(HER, 25m3)
WasteTransportModel(HER, 10m2)
WasteTransportModel(HER, 50m3)
The user can inquire about the concept of Waste without restricting any attributes by posing the query Waste(X, Y).
Example
Waste(HER, 10m2)
  WasteTransportModel(HER, 10m2)
  Bathymetry(HER, 10m2)
  SeaCirc(HER, 10m2)
    OceanCircModel(HER, 10m2)
    Bathymetry(HER, 10m2)
    ExtCurrents(HER, 10m3)
Waste(HER, 50m3)
  WasteTransportModel(HER, 50m3)
  Bathymetry(HER, 10m2)
  SeaCirc(HER, 25m3)
Productions for Waste data as presented to the user by the GUI. In the figure, node markers distinguish extensional facts, intentional facts and programs.
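The expansion of the query Waste(X, Y) into production trees can be sketched as a small recursive search in Python. This is a simplified illustration rather than the system's actual inference engine: the two rules are hard-coded as a table, facts carry the numeric part and the dimensionality of the grid resolution separately, and "<=" is assumed to compare the numeric part only. All names (`Fact`, `RULES`, `EXTENSIONAL`, `productions`, `show`) are hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Fact:
    notion: str
    location: str
    size: int   # numeric part of the grid resolution, e.g. 10 for "10m2"
    dims: int   # 2 for "m2", 3 for "m3"


# head notion -> (program notion, input data notions), following Rules 1 and 2
RULES = {
    "SeaCirc": ("OceanCircModel", ("ExtCurrents", "Bathymetry")),
    "Waste": ("WasteTransportModel", ("SeaCirc", "Bathymetry")),
}

# Extensional facts listed in the example
EXTENSIONAL = [
    Fact("Bathymetry", "HER", 10, 2),
    Fact("ExtCurrents", "HER", 10, 3),
    Fact("OceanCircModel", "HER", 10, 2),
    Fact("SeaCirc", "HER", 25, 3),
    Fact("WasteTransportModel", "HER", 10, 2),
    Fact("WasteTransportModel", "HER", 50, 3),
]


def productions(notion):
    """Return every production tree for a notion as (fact, children) pairs."""
    # Leaves: resources that already exist in the repositories.
    trees = [(f, ()) for f in EXTENSIONAL if f.notion == notion]
    if notion not in RULES:
        return trees
    program, inputs = RULES[notion]
    for prog in (f for f in EXTENSIONAL if f.notion == program):
        # The program fixes the location and resolution of the derived fact.
        head = Fact(notion, prog.location, prog.size, prog.dims)
        # Per input notion, keep sub-trees whose root meets the constraints
        # (same location, input number no larger than the output's).
        options = [
            [t for t in productions(i)
             if t[0].location == head.location and t[0].size <= head.size]
            for i in inputs
        ]
        for combo in product(*options):
            trees.append((head, ((prog, ()),) + combo))
    return trees


def show(tree, depth=0):
    """Print a production tree with one level of indentation per step."""
    fact, children = tree
    print("  " * depth + f"{fact.notion}({fact.location}, {fact.size}m{fact.dims})")
    for child in children:
        show(child, depth + 1)


for tree in productions("Waste"):  # the query Waste(X, Y)
    show(tree)
```

Because the sketch ignores dimensionality when comparing resolutions, it can enumerate paths that the figure does not show, for example Waste(HER, 50m3) built on the derived SeaCirc(HER, 10m2). A faithful implementation would need the exact ordering used by the system.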
Architecture (Overview)
[Figure: architecture diagram. Components: Graphical User Interface, Knowledge Base System, Metadata Search Engine, Workflow Runtime, middleware system, and wrappers around data sets and models. Labelled flows: query, productions, workflow specification, invoke/access resources, export resources.]
Architecture (cont.)
The Metadata Search Engine is responsible for locating data sets or programs. It accepts metadata queries on the properties of resources and returns a list of metadata descriptions and references. References point to repository wrappers, which provide an access and invocation interface to the underlying legacy systems where the data and programs reside.
The Knowledge Base System accepts queries regarding the availability of ontology concepts. It generates and returns the corresponding data productions based on the available resources and the constraints imposed by the ontology rules.
The Workflow Runtime System monitors and coordinates the execution of workflows. It executes each intermediate step of a workflow specification, accessing data and invoking programs through the repository wrappers.
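How a workflow runtime might walk a production tree can be sketched in Python as a post-order traversal: inputs are produced first, then the program is invoked on them. The functions `access` and `invoke` stand in for repository wrapper calls and are purely hypothetical, as is the tree encoding (a name plus a list of children, with the program as the first child); the source does not describe the wrapper interface at this level of detail.

```python
def access(resource):
    """Hypothetical wrapper call that fetches a stored data set."""
    return f"<{resource}>"


def invoke(program, inputs):
    """Hypothetical wrapper call that runs a program on its inputs."""
    return f"{program}({', '.join(inputs)})"


def run(tree):
    """Execute a production tree bottom-up and return the resulting data."""
    name, children = tree
    if not children:
        return access(name)  # extensional fact: use the stored data as is
    # By convention in this sketch, the first child names the program.
    (program, _), *input_trees = children
    return invoke(program, [run(t) for t in input_trees])


tree = ("Waste", [
    ("WasteTransportModel", []),
    ("SeaCirc", [
        ("OceanCircModel", []),
        ("ExtCurrents", []),
        ("Bathymetry", []),
    ]),
    ("Bathymetry", []),
])
print(run(tree))
```

The strings returned here only trace the order of operations; a real runtime would pass actual data sets between steps and monitor each invocation.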