ISO TC37/SC4 N433 Busan 2007 ONTOLOGIES & TAXONOMIES Bodil Nistrup Madsen Department of International Language Studies and Knowledge Technology www.cbs.isv.dk & DANTERMcentret www.danterm.dk Copenhagen Business School Copenhagen Business School

Page 1: ISO TC37/SC4 N433 Busan 2007 ONTOLOGIES & TAXONOMIES Bodil Nistrup Madsen Department of International Language Studies and Knowledge Technology

ISO TC37/SC4 N433

Busan 2007

ONTOLOGIES & TAXONOMIES

Bodil Nistrup Madsen

Department of International Language Studies and Knowledge Technology
www.cbs.isv.dk

&

DANTERMcentret
www.danterm.dk

Copenhagen Business School

Page 2

Overview

• Terminological ontologies

• Concept clarification: ontology, taxonomy, data model etc.

• An ontology of ontologies

• A taxonomy for lexical data

• Terminological concept modelling vs. conceptual data modelling

• Modelling partial equivalence between concepts

Page 4

Terminological Ontologies

Principles:
• feature specifications (modelling characteristics of concepts)
• dimensions / subdividing dimensions (modelling subdivision criteria)
• dimension specifications
• constraints

Tools:
• i-Term & i-Model, DANTERMcentret
• CAOS 2 prototype (Computer-Aided Ontology Structuring), Group of Computational Linguistics

Page 5

Terminological principles presented by means of examples from i-Term & i-Model

Terminology and Knowledge Management System

DANTERMcentret
www.i-term.dk

Page 6

[Figure: Extract of an ontology for prevention in i-Model. Annotations: feature specification (attribute-value pair), subdivision criteria, polyhierarchy, inheritance.]

Page 7

In terminological concept modelling only relevant subconcepts are registered. This means that not all possible ‘combinations’ of concepts from two or more groups (dimensions) will be registered, e.g. a concept universal secondary prevention is not relevant.

Page 8

Page 9

Concept-oriented!

Page 10

Associative concept relations

OntoQuery – Ontology-based Querying
www.OntoQuery.dk

Madsen, Bodil Nistrup, Bolette Sandford Pedersen & Hanne Erdman Thomsen:

The Role of Semantic Relations in a Content-based Querying System: a Research Presentation from the OntoQuery Project.

In: Simov, Kiril & Atanas Kiryakov (eds.): Proceedings from OntoLex 2000, Workshop on Ontologies and Lexical Knowledge Bases, Sept. 8-10 2000, Sozopol, Bulgaria.

Page 11

role relation

agent relation

patient relation

instrument relation

result relation

agent-patient relation

agent-instrument relation

agent-result relation

patient-instrument relation

patient-result relation

instrument-result relation

activity-agent relation

activity-patient relation

activity-result relation

activity-instrument relation

activity relation

semantic relation

source-target relation

activity-source relation

activity-target relation

source relation

target relation

location relation

static location relation

activity – static location relation

entity – static location relation

dynamic location relation

ship – quay

disembarkation – ship

landing – airport

swim – water

oasis – desert

heal – doctor

teach – student

paint – brush

coffee making – coffee

drawer – drawee

surgeon – scalpel

drawer – draft

wood – plane

student – graduate

coffee machine – coffee

Madsen, Pedersen & Thomsen, 2001: The relations location, activity and role

Page 12

location relation

dynamic location

source

directed to (activity-target)

originates in (activity-source)

transmits to (source-target)

takes place in (activity-static location)

situated in (entity-static location)

static location

role relation

performed by (activity-agent)

performed on (activity-patient)

performed with (activity-instrument)

results in (activity-result)

affects (agent-patient)

uses (agent-instrument)

brings about (agent-result)

processed with (patient-instrument)

transformed into (patient-result)

produces (instrument-result)

associative relation

target

activity

agent

patient

instrument

result

CAOS Version
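
The relation names on this slide can be read as a lookup from an ordered pair of roles to a relation label. The following is a minimal sketch of that reading, not CAOS code: the labels are copied from the slide, while the dictionary layout and the function name `relation_label` are assumptions made for illustration.

```python
# Labels copied from the slide; the data structure itself is an assumption.
ASSOCIATIVE_RELATIONS = {
    # role relations
    ("activity", "agent"): "performed by",
    ("activity", "patient"): "performed on",
    ("activity", "instrument"): "performed with",
    ("activity", "result"): "results in",
    ("agent", "patient"): "affects",
    ("agent", "instrument"): "uses",
    ("agent", "result"): "brings about",
    ("patient", "instrument"): "processed with",
    ("patient", "result"): "transformed into",
    ("instrument", "result"): "produces",
    # location relations
    ("activity", "target"): "directed to",
    ("activity", "source"): "originates in",
    ("source", "target"): "transmits to",
    ("activity", "static location"): "takes place in",
    ("entity", "static location"): "situated in",
}


def relation_label(first_role, second_role):
    """Return the relation name for an ordered pair of roles, or None."""
    return ASSOCIATIVE_RELATIONS.get((first_role, second_role))


print(relation_label("agent", "instrument"))
```

The pairs are ordered, so looking up ("instrument", "agent") finds nothing in this sketch.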

Page 13

CAOS
Computer-Aided Ontology Structuring

Bodil Nistrup Madsen
Hanne Erdman Thomsen
Carl Vikner
Bo Krantz Simonsen
Jacob M. Christensen

Group of Computational Linguistics
ISV

Page 14

CAOS implements more restrictive terminological principles than i-Model.

CAOS helps the user in setting up consistent ontologies adhering to the terminological principles.

CAOS is based on the UML notation, but extensions are needed.

Page 15

The backbone of this concept modelling is constituted by characteristics modelled by formal feature specifications, i.e. attribute-value pairs.
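
As a rough illustration of feature specifications as attribute-value pairs, the sketch below stores primary feature specifications on each concept and derives the inherited ones from its superordinate concepts. The class `Concept`, its method names and the attribute values ("avoid harm", "everyone") are hypothetical; only the idea of attribute-value pairs with inheritance comes from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    name: str
    primary: dict = field(default_factory=dict)   # primary feature specifications
    parents: list = field(default_factory=list)   # superordinate concepts

    def inherited(self):
        """Collect the feature specifications of all superordinate concepts."""
        features = {}
        for parent in self.parents:
            features.update(parent.all_features())
        return features

    def all_features(self):
        """Inherited feature specifications plus the concept's own."""
        return {**self.inherited(), **self.primary}


# Hypothetical values, chosen only to make the example readable.
prevention = Concept("prevention", {"PURPOSE": "avoid harm"})
universal = Concept("universal prevention", {"TARGET GROUP": "everyone"}, [prevention])

print(universal.inherited())
print(universal.all_features())
```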

Page 16

dimension specifications (specify the values associated with the corresponding attribute on the subconcepts)

subdividing dimension (concepts belonging to the same subdividing dimension are grouped together and the subdividing dimension is shown on the links to the concepts)

type relation

primary feature specification

inherited feature specifications

Page 17

Three subordinate concepts automatically generated on the basis of the dimension specification. No terms – yet!
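
A minimal sketch of what such generation could look like, assuming a dimension specification is an attribute with a list of values. The function name, the record layout and the three values are assumptions (the slides name the dimension TARGET GROUP but do not list its values); this is not the CAOS implementation.

```python
def generate_subconcepts(superordinate, attribute, values):
    """Create one placeholder subconcept per value of a dimension specification."""
    return [
        {
            "term": None,  # no term yet; it is added later by the terminologist
            "superordinate": superordinate,
            "feature_specification": (attribute, value),
        }
        for value in values
    ]


# The three values are assumed for illustration only.
subconcepts = generate_subconcepts(
    "prevention", "TARGET GROUP", ["universal", "selective", "indicated"]
)
for subconcept in subconcepts:
    print(subconcept)
```

Each generated record carries its feature specification but no term, which mirrors the "No terms – yet!" remark on the slide.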

Page 18

Attempt at creating an illegal polyhierarchy: a concept universal selective prevention with two superordinate concepts within the same group (dimension TARGET GROUP).
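
The constraint behind this slide can be sketched as a simple check: a concept may have several superordinate concepts only if they come from different subdividing dimensions. The function below is an illustrative assumption about how such a check might be written, not the rule engine of CAOS; the dimension name PHASE in the second example is hypothetical.

```python
from collections import Counter


def illegal_dimensions(superordinates):
    """Return the dimensions that contribute more than one superordinate concept.

    `superordinates` maps each superordinate concept to the subdividing
    dimension it belongs to.
    """
    counts = Counter(superordinates.values())
    return sorted(dimension for dimension, n in counts.items() if n > 1)


# The attempt from the slide: both superordinate concepts share TARGET GROUP.
attempt = {
    "universal prevention": "TARGET GROUP",
    "selective prevention": "TARGET GROUP",
}
print(illegal_dimensions(attempt))

# A polyhierarchy across two different dimensions passes the check.
across_dimensions = {
    "universal prevention": "TARGET GROUP",
    "secondary prevention": "PHASE",  # hypothetical dimension name
}
print(illegal_dimensions(across_dimensions))
```

Passing this check only means the combination is formally allowed; as noted earlier in the slides, a formally possible combination may still be irrelevant and therefore not registered.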

Page 19

The CAOS prototype is based on UML notation, but extensions are needed for terminological concept modelling

UML class diagrams:

• not possible to represent several dimensions, from which one may be chosen as the subdividing dimension

• no notation for the specification of dimension values, at least not in the way it is done in CAOS

• no notation for feature specifications (it is possible to use a facility of UML which comes close to feature specifications as used in CAOS: in specializations it is possible to introduce attributes with initial values).

Page 20

Overview

• Terminological ontologies

• Concept clarification: ontology, taxonomy, data model etc.

• An ontology of ontologies

• A taxonomy for lexical data

• Terminological concept modelling vs. conceptual data modelling

• Modelling partial equivalence between concepts

Page 21

classification

taxonomy

thesaurus

wordnet

ontology

concept system

Clarification needed!

Page 22

class

category

keyword

(syn)set

type

concept

term

Page 23

eCat Terminology task

CEN CWA 15045

Multilingual Catalogue Strategies for eCommerce and eBusiness

Bodil Nistrup Madsen

DANTERMcentret, the Danish Terminology Centre - www.danterm.dk

&

Håvard Hjulstad

Standards Norway

Page 24

data model

classification system

ontology

model

knowledge representation

subject classification system

taxonomy

Based on eCat Terminology task, CEN CWA 15045, Multilingual Catalogue Strategies for eCommerce and eBusiness

Page 25

Concept: model
Characteristic: PURPOSE: knowledge representation
Definition: simplified representation of knowledge about phenomena

Concept: ontology
Synonyms: concept model; concept system
Characteristic: DESCRIPTION: concepts
Definition: model for the description of knowledge about concepts

Concept: data model
Characteristic: DESCRIPTION: data
Definition: formal model for the description of data in an IT system

Concept: classification system
Synonyms: classification
Characteristic: PURPOSE: classification
Definition: system for the division of phenomena into classes

Concept: taxonomy
Characteristic: CONTENTS: categories
Definition: classification system for the division of categories of a domain

Concept: subject classification system
Synonyms: subject classification
Characteristic: CONTENTS: subject fields
Definition: classification system for the division of phenomena into subject fields

Translations of definitions from the OIO Concept database

Page 28

ontology

NOTE: An ontology may comprise all kinds of relations between concepts, e.g. generic, partitive and temporal relations.

Synonyms: concept model, concept system

A data model should always be based on an ontology, but sometimes a data model, represented by means of an ER or a UML diagram, is referred to as an ‘ontology’. Our recommendation is to use the term ‘ontology’ only as defined here. Please observe that the term ‘conceptual model’ refers to a kind of data model.

An ontology may typically be used for the precise description of concepts.

CEN CWA 15045

Page 29

taxonomy

NOTE: A taxonomy is a kind of classification system that comprises exclusively generic relations between the categories, in contrast to an ontology, which is a kind of model that may comprise all kinds of relations between concepts.

A taxonomy may typically be used for defining the types of data categories used within a specific field, e.g. within the field of product description.

CEN CWA 15045
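
The distinction drawn in the two notes can be sketched as a check on relation types: a set of relations qualifies as a taxonomy in this sense only if every relation is generic. The triple layout (first item, relation kind, second item) and the function name are assumptions made for this sketch; the example items are taken from elsewhere in the slides.

```python
def is_taxonomy(relations):
    """True if every relation is generic, following the NOTE on taxonomies."""
    return all(kind == "generic" for _, kind, _ in relations)


# Only generic relations between categories: a taxonomy.
lexical_categories = [
    ("part of speech", "generic", "grammatical information"),
    ("gender", "generic", "grammatical information"),
]

# Adding an associative relation means it is no longer a taxonomy.
mixed = lexical_categories + [("surgeon", "uses", "scalpel")]

print(is_taxonomy(lexical_categories))
print(is_taxonomy(mixed))
```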

Page 32

conceptual data model

data model that represents an abstract view of the real world

ISO/IEC 11179-3: 2003(E), 3.2.8

information model

data model that represents the organization of information in a manner that reflects the structure of an information system

Amended from ISO/IEC FCD 11179-3: 2003(E), 3.2.13

CEN CWA 15045

Page 33

concept in an ontology: concept definition

data category in a taxonomy: data category definition

class in a classification system: class description

class in a data model: class description

Concept definitions form the basis for the definitions / descriptions of data categories, classes in classification systems and classes in data models.

Page 34

Overview

• Terminological ontologies

• Concept clarification: ontology, taxonomy, data model etc.

• An ontology of ontologies

• A taxonomy for lexical data

• Terminological concept modelling vs. conceptual data modelling

• Modelling partial equivalence between concepts

Page 35

ontology

philosophical ontology

pragmatic ontology

top level ontology

specific ontology

universal ontology

domain specific ontology

general ontology

task specific ontology

task independent ontology

application specific ontology

language independent ontology

language independent ontology

formal ontology

not formal ontology

terminological ontology

POINT OF VIEW, LEVEL, SUBJECT, PURPOSE, LANGUAGE, FORMALIZATION, METHOD

With input from:

Guarino, Nicola (1998). Formal Ontology and Information Systems.

Bodil Nistrup Madsen, Alting på sin plads og plads til alting. Om at ordne og udnytte viden om verden. In: Anita Nuopponen, Bertha Toft, Johan Myking (eds.): I Terminologins tjänst. Festskrift för Heribert Picht på 60-årsdagen. Proceedings of the University of Vaasa, Reports, Vaasa 2000, pp. 71-91.

Page 36

Page 37

Page 38

Hanne Erdman Thomsen

On the basis of Gómez-Pérez et al (2004) Ontological Engineering

Page 39

TC 37 Terminology and other language and content resources

Ontology Task Force

Provo, August 2007

Page 40

Members of TC37 Ontology Task Force team:

SC 1: Donald Chapin, Hanne Erdman Thomsen, Hendrick Kockaert

SC 2: Gerhard Budin

SC 3: Bodil Nistrup Madsen (convenor), Klaus-Dirk Schmitz, Sue Ellen Wright

Page 41

SC 4: Koiti Hasida, Jae Sung Lee, Key-Sun Choi, Nicoletta Calzolari

ISO/IEC JTC 1 SC32: Bruce Bargmeyer

TC37 Secretariat: Christian Galinski

Page 42

Organization of work:

1) Concept clarification

• Different types of knowledge representation resources

• What is the difference between ontology, taxonomy, thesaurus etc.?

Result: systematic overview (concept system with definitions)

2) Overview of ontologies and projects 'outside' TC37: examples!

Page 43

Organization of work:

3) Overview of related ongoing projects, existing standards, proposals for future projects within TC 37

4) Proposal for a strategy for TC 37 including future co-ordination by the Ontology Task Force

Page 44

New title 2005-08-25:

Systems to manage terminology, knowledge and content

New scope 2005-08-25:

Standardization of specifications and modelling principles for systems to manage terminology, knowledge and content with respect to semantic interoperability

ISO/TC 37/SC 3 N542

Examples of future projects:

Page 45

Principles for building taxonomies for metadata

Examples:
• A taxonomy for lexical and terminological data.
• A taxonomy for any other kind of data collection.

TC 37 should be a pioneer within this field! Such taxonomies, which describe the contents of data collections, also comprise systematic definitions and examples which will make it easier to classify data elements.

Motivation: It is extremely important to be able to describe elements of data collections systematically in order to build databases / IT systems for storage, management and exchange of data. Many metadata vocabularies are not built on the basis of the principles of taxonomies, which means that they may be incomplete, inconsistent and difficult to use.

Page 46

Principles for the use of concept models (ontologies) in developing data models

Examples:
• Concept model for central concepts that form the basis of the development of a data model for a terminology database.
• Concept model for any other kind of database / IT system (e.g. Electronic Health Care Systems) and the corresponding data model.

This project is not the same as the NWI in TC 37 SC 1 on Guidelines for applying concept modelling in terminology work (N 273).

Motivation: Many data models are still developed without being based on a concept model. The two concepts ‘concept model’ (ontology) and ‘conceptual data model’ are very often mixed up. The term ‘ontology’ is often used for a conceptual data model.

Page 47

Principles for the development and use of meta models

This project should comprise guidelines based on the experience gained from the development of the meta model in ISO 16642.

It is related to the previous proposal.

Motivation: It is a non-trivial job to develop a meta model:
• How detailed should / could a meta model be?
• How to build specific data models on the basis of a meta model?

Page 48

Overview

• Terminological ontologies

• Concept clarification: ontology, taxonomy, data model etc.

• An ontology of ontologies

• A taxonomy for lexical data

• Terminological concept modelling vs. conceptual data modelling

• Modelling partial equivalence between concepts

Page 49

Proposal for a Taxonomy of Lexical Metadata Categories

for ISO TC 37

Terminology and Other Language Applications

Page 50

ISO TC 37 published a standard in 1999 specifying data categories used in terminological resources, ISO 12620:1999, Computer assisted terminology management ― Data Categories.

In 2003, TC 37/SC 3 initiated a revision of the existing document with the intention of creating a family of data category standards designed to meet the needs of terminologists and other language experts developing a variety of electronic linguistic resources.

The intention was to include data categories for a variety of applications, including for example terminological and lexicographical data collections as well as machine translation lexica, cf.:

• SC 3 Systems to manage terminology, knowledge and content
• SC 2 Terminographical and lexicographical working methods
• SC 4 Language resource management

Page 51

At the same time it was suggested to set up a Data Category Registry (DCR) for all the above mentioned kinds of lexical data, cf. also Ide & Romary (2004). The DCR is intended to be compliant with ISO 11179-3, Information technology — Metadata registries (MDR) — Part 3: Registry metamodel and basic attributes.

Page 52

The data categories of ISO 12620:1999 were classified in three major groups, and the groups were further subdivided into ten sub-groups:

Term and term-related data categories:
A.1 term
A.2 term-related information
A.3 equivalence

Descriptive data categories:
A.4 subject field
A.5 concept-related description
A.6 concept relation
A.7 conceptual structures
A.8 note

Administrative data categories:
A.9 documentary language
A.10 administrative information

Page 53

This structure is not homogeneous, i.e. it reflects various subdividing criteria (dimensions), and it does not give a very clear overview of the data categories. One dimension is for example term-related information vs. concept-related description. Here it is not clear why e.g. subject field and concept relation do not fall within the group concept-related description.

An example of term-related information is A.2.1.18.1 collocation, while an example of concept-related information is A.5.3 context (a text or part of a text in which a term occurs). Types of contexts can, among others, include: defining context (a context that contains substantial information about a concept, but that does not possess the formal rigor of a definition) and linguistic context (context that illustrates the function of a term in discourse, but that provides no conceptual information).

Page 54

It seems as if the structure of ISO 12620:1999 is to some extent based on the structure typically found in a terminological entry. Since the above mentioned DCR of TC 37 will also include data categories of dictionaries, this structure is not very appropriate.

Consequently it was decided to give up a classification of the categories.

It is however difficult to ensure completeness, consistency, user-friendliness and extensibility of the above mentioned DCR, if there is no structure of the data categories.

Page 55

The structure of the DCR

As already mentioned, the DCR will contain data categories that are relevant in various areas, such as terminology, lexicography and machine translation. These areas are referred to as thematic domains.

In August 2007 there was an introductory meeting for the TC 37 Data Category Registry, at which all Sub-Committees and Working Groups that have any activities involving data categories were requested to nominate experts to serve on the Thematic Domain Groups (TDGs).

The idea was that each TDG should be charged with the specification of domain-specific data categories for a specific data processing environment within TC 37.

Page 56

Figure 1 (from Wright 2004) illustrates clearly that the various subsets of the DCR, i.e. the thematic domains, will overlap. For example, the data categories part of speech and grammatical gender will be relevant in all three different thematic domains.

Page 57

Collections of Lexical data - description of data categories and data structure - Part 1: Taxonomy for the classification of information types

‘STANLEX’

Danish Standard: DS 2394-1 (1998)

Page 58

STANLEX taxonomy

Main categories based on linguistic disciplines:
• etymological information
• grammatical information
• graphical information
• phonetic information
• semantic information
• usage

In addition to these categories there are some categories for administrative information and structural information.

This taxonomy was developed by a group of terminologists, lexicographers and people working with machine translation and other kinds of natural language processing.

Page 59

Examples:

Main group: Grammatical information

Category: Part of speech
Category: Gender
Category: Information on inflection
    Subcategories: Stem; Paradigm information; Inflected form
Category: Word formation
Category: Syntax
    Subcategories: Syntactic frame (valency); Specification of syntactic frame; Specification of auxiliary verb; Syntactic function

Page 60

Main group: Usage

Category: Examples of usage
    Subcategories: Citation; Collocation
Category: Information on usage
    Subcategories: Temporal; Spatial; Communicative; Frequency
Category: Evaluative information

Page 61

All main groups, categories and subcategories are defined and exemplified. The structure of this taxonomy gives a much clearer overview of the data categories than the original structure of ISO 12620:1999, and it is clearly better than a plain alphabetical list.

The use of a taxonomy makes it much easier to check whether the DCR of ISO TC 37 comprises all relevant data categories within a certain group. In the case of proposals for new data categories it is also much easier to check whether the category is already in the DCR, maybe under another category name.
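
The check described here (is a proposed category already present, and where?) can be sketched as a lookup over a nested structure. The extract below covers only the two main groups shown as examples in these slides; the dictionary layout, the name `STANLEX_EXTRACT` and the function `find_category` are assumptions for illustration, not part of DS 2394-1 or of the DCR.

```python
# Main group -> category -> subcategories, as read from the example slides.
STANLEX_EXTRACT = {
    "Grammatical information": {
        "Part of speech": [],
        "Gender": [],
        "Information on inflection": ["Stem", "Paradigm information", "Inflected form"],
        "Word formation": [],
        "Syntax": [
            "Syntactic frame (valency)",
            "Specification of syntactic frame",
            "Specification of auxiliary verb",
            "Syntactic function",
        ],
    },
    "Usage": {
        "Examples of usage": ["Citation", "Collocation"],
        "Information on usage": ["Temporal", "Spatial", "Communicative", "Frequency"],
        "Evaluative information": [],
    },
}


def find_category(name, taxonomy=STANLEX_EXTRACT):
    """Return the path to a category (case-insensitive match), or None."""
    wanted = name.casefold()
    for group, categories in taxonomy.items():
        if group.casefold() == wanted:
            return (group,)
        for category, subcategories in categories.items():
            if category.casefold() == wanted:
                return (group, category)
            for subcategory in subcategories:
                if subcategory.casefold() == wanted:
                    return (group, category, subcategory)
    return None


print(find_category("collocation"))
print(find_category("etymology"))
```

An exact-name lookup like this does not catch the "same category under another name" case mentioned above; that still requires comparing the definitions attached to each category.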

Page 62

Proposal for a taxonomy for lexical metadata categories

Given the above-mentioned advantages of using a taxonomy for the classification of metadata categories, it is suggested that the principles of the taxonomy of DS 2394-1:1998 be used for structuring the data categories in the DCR for lexical data in ISO TC 37.

There will no doubt be a need for more categories and subcategories than those found in DS 2394-1:1998, but it will be easy to fit new categories into the structure, as long as they are mutually independent.

There may also be a need for adjustments of the structure, since there are different ways of classifying lexical data.

Page 63

However, DS 2394-1:1998 is a good starting point, and using the principles of this taxonomy will ensure

• completeness
• consistency
• user-friendliness
• extensibility

Page 64:

• Terminological ontologies

• Concept clarification: ontology, taxonomy, data model etc.

• An ontology of ontologies

• A taxonomy for lexical data

• Terminological concept modelling vs. conceptual data modelling

• Modelling partial equivalence between concepts

Overview

Page 65:

Ontologies (concept models) and conceptual data models have different aims:

• ontologies aim at concept clarification and mutual understanding of concepts and consistent use of terms

• conceptual data models aim at specifying the information types of an IT system and their mutual relationships

In order to produce a well-functioning database it is necessary to know the concept model for the domain underlying the database structure, i.e. you have to be familiar with the central concepts of the domain in which the database is going to function.

Page 66:

Ontologies (concept models): information about concepts in the form of feature specifications and concept relations (information about meaning)

Conceptual data models: information about the classes in the form of attributes and associations between the classes

NB! The attributes of the data model give no information about the meaning of the classes, but only a specification of what kind of information will be given about the entities represented by the classes in question.
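The contrast above can be sketched in a few lines of Python. This is only an illustration: the class names `Concept` and `DataClass`, and the feature specifications and concept relation given for the concept term, are hypothetical and are not taken from the slides. Only the attribute names C-ID, E-ID and STATUS come from the data model shown later in the deck.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Concept:
    """Ontology side: a concept is described by what it means."""
    name: str
    # Feature specifications: attribute-value pairs that characterise the concept.
    features: Dict[str, str] = field(default_factory=dict)
    # Concept relations: relation type -> names of related concepts.
    relations: Dict[str, List[str]] = field(default_factory=dict)


@dataclass
class DataClass:
    """Data-model side: a class lists what is recorded about its entities."""
    name: str
    attributes: List[str] = field(default_factory=list)
    associations: List[str] = field(default_factory=list)


# Hypothetical feature specifications and relation, for illustration only.
term_concept = Concept(
    name="term",
    features={"DESIGNATES": "concept", "FORM": "linguistic expression"},
    relations={"superordinate": ["designation"]},
)

# Attribute names as on the data-model slides; associated classes follow
# from the foreign keys C-ID and E-ID.
term_class = DataClass(
    name="term",
    attributes=["C-ID", "E-ID", "STATUS"],
    associations=["concept", "expression"],
)

print(term_concept.features)   # says something about what a term is
print(term_class.attributes)   # says only which fields a term record has
```

The two objects share a name, but nothing in `term_class.attributes` explains what a term is; it only fixes which fields will be stored.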

Page 67:

• Ontologies and data models do have something in common, but

• there is no one-to-one correspondence between an ontology and the data model of the database:

  • There is no one-to-one mapping between concepts and characteristics in the ontology and classes and attributes in the data model.

  • Some concepts correspond to attributes or values in the data model; some concepts may not correspond to classes, attributes or values.

Terminological concept modelling vs. conceptual data modelling

Page 68:

Draft concept system: NORDTERM ‘Terminology of terminology’ in i-Model (here translated into English)

Page 69:

More elaborate Danish version

Page 70:

Conceptual data modelling for DANTERM / CAOS databases, represented in UML

Page 71:

[UML class diagram of the data model. Classes and their attributes:

conceptSystem: S-ID, SYSTNAME, LANG

concSystPos: S-ID, C-ID1, C-ID2

concept: C-ID, LANG, CLASSA

term: C-ID, E-ID, STATUS, …

expression: E-ID, EXPRESS

concSystRel: S-ID, C-ID1, S-ID2, C-ID2, R-ID

Associations: "belongs to", "is related to", "is expressed by"

Legend: class, attributes, association, multiplicity (0..* = zero, one or more; 1..* = one or more)]

Page 72:

[UML class diagram of the data model, with key and type information. Classes and their attributes:

conceptSystem: S-ID pk, SYSTNAME, LANG fk

concSystPos: S-ID pk, C-ID1 pk, C-ID2 pk

concept: C-ID pk, LANG fk, CLASSA fk

term: C-ID pk fk, E-ID pk fk, STATUS String, …

expression: E-ID pk, EXPRESS

concSystRel: S-ID pk, C-ID1 pk, S-ID2 pk, C-ID2 pk, R-ID

Associations: "belongs to", "is related to", "is expressed by"; multiplicities 0..* and 1..*]

Information about primary keys (pk), foreign keys (fk) and data types (e.g. String) may be added to the attributes.
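As a minimal sketch of how such annotations could carry over to a physical schema, the classes concept, expression and term are written here as SQLite tables using Python's standard `sqlite3` module. The column names are lower-cased versions of the attribute names (C-ID becomes `c_id`, and so on), and the sample values, including the status value 'preferred', are assumptions made for the example. The targets of LANG fk and CLASSA fk are not visible on the slide, so these are left as plain columns.

```python
import sqlite3

SCHEMA = """
CREATE TABLE concept (
    c_id   INTEGER PRIMARY KEY,  -- C-ID pk
    lang   TEXT,                 -- LANG fk (target class not shown on the slide)
    classa TEXT                  -- CLASSA fk (target class not shown on the slide)
);
CREATE TABLE expression (
    e_id    INTEGER PRIMARY KEY, -- E-ID pk
    express TEXT                 -- EXPRESS
);
CREATE TABLE term (
    c_id   INTEGER NOT NULL REFERENCES concept(c_id),     -- C-ID pk fk
    e_id   INTEGER NOT NULL REFERENCES expression(e_id),  -- E-ID pk fk
    status TEXT,                                          -- STATUS String
    PRIMARY KEY (c_id, e_id)
);
"""

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)

# Hypothetical sample rows.
conn.execute("INSERT INTO concept (c_id, lang) VALUES (1, 'en')")
conn.execute("INSERT INTO expression (e_id, express) VALUES (10, 'concept')")
conn.execute("INSERT INTO term (c_id, e_id, status) VALUES (1, 10, 'preferred')")


def insert_is_rejected(sql: str) -> bool:
    """Return True if the database refuses the statement on key grounds."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# A term that points at a non-existent concept violates the foreign key.
print(insert_is_rejected("INSERT INTO term (c_id, e_id) VALUES (99, 10)"))
# A second term row for the same (concept, expression) pair violates the primary key.
print(insert_is_rejected("INSERT INTO term (c_id, e_id) VALUES (1, 10)"))
```

The composite primary key on term mirrors the slide, where both C-ID and E-ID are marked pk and fk.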

Page 73:

[UML class diagram of the DANTERM / CAOS data model, as on page 72]

extra class between classes in a many-to-many relationship
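The idea of an extra class can be sketched as follows, assuming that term plays this role between concept and expression (the slide does not say which class is highlighted). Each `Term` instance records one (concept, expression) pair; the identifiers are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """Link class: one instance per (concept, expression) pair."""
    c_id: int
    e_id: int
    status: str = ""


# Hypothetical data: concept 1 has two expressions (synonyms),
# and expression 10 is used for two concepts.
terms = [Term(1, 10), Term(1, 11), Term(2, 10)]

expressions_of = defaultdict(list)  # concept id -> expression ids
concepts_of = defaultdict(list)     # expression id -> concept ids
for t in terms:
    expressions_of[t.c_id].append(t.e_id)
    concepts_of[t.e_id].append(t.c_id)

print(expressions_of[1])  # expressions linked to concept 1
print(concepts_of[10])    # concepts linked to expression 10
```

The many-to-many relationship is thereby split into two one-to-many relationships, and the link class offers a natural place for attributes such as STATUS that belong to the pairing rather than to either side.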

Page 74:

[UML class diagram of the DANTERM / CAOS data model, as on page 72]

Reflexive association:

One concept in one position in a concept system is related to one or several concepts in the same concept system.

Page 75:

A concept system for concepts may comprise concepts such as superordinate concept and subordinate concept (both subordinate concepts to concept).

These concepts are not found in the data model.

Page 76:

Conceptual data model

The conceptual data model has no classes corresponding to superordinate concept and subordinate concept. Instead, they are represented by the attributes C-ID1 and C-ID2 of the class concSystRel: the corresponding table concSystRel relates two concepts to each other and specifies which relation type (attribute R-ID) holds between them.
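A small sketch may make this concrete. It assumes that in a row with a generic relation type C-ID1 identifies the superordinate concept and C-ID2 the subordinate one; the slides do not state the direction, and the relation code `"generic"` and the concept identifiers are hypothetical.

```python
from typing import List, NamedTuple


class ConcSystRel(NamedTuple):
    """One row of concSystRel: two concept positions plus a relation type."""
    s_id: int
    c_id1: int
    s_id2: int
    c_id2: int
    r_id: str


GENERIC = "generic"  # hypothetical relation-type code

# Hypothetical concept identifiers within concept system 1.
concepts = {1: "concept", 2: "superordinate concept", 3: "subordinate concept"}

rows = [
    ConcSystRel(1, 1, 1, 2, GENERIC),
    ConcSystRel(1, 1, 1, 3, GENERIC),
]


def subordinates(c_id: int, rels: List[ConcSystRel]) -> List[int]:
    """Concepts that stand below c_id (assumed direction: c_id1 above c_id2)."""
    return [r.c_id2 for r in rels if r.c_id1 == c_id and r.r_id == GENERIC]


def superordinates(c_id: int, rels: List[ConcSystRel]) -> List[int]:
    """Concepts that stand above c_id."""
    return [r.c_id1 for r in rels if r.c_id2 == c_id and r.r_id == GENERIC]


print([concepts[c] for c in subordinates(1, rows)])
print([concepts[c] for c in superordinates(3, rows)])
```

Superordinate and subordinate are thus roles that a concept takes in a particular row, which is why they need no classes of their own.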

[UML diagram excerpt: classes conceptSystem (S-ID pk, SYSTNAME, LANG fk), concSystPos (S-ID pk, C-ID1 pk, C-ID2 pk), concept (C-ID pk, LANG fk, CLASSA fk) and concSystRel (S-ID pk, C-ID1 pk, S-ID2 pk, C-ID2 pk, R-ID), with the associations "belongs to" and "is related to" and multiplicities 0..* and 1..*]

Page 77:

Draft concept system: NORDTERM ‘Terminology of terminology’ in i-Model (here translated into English)

Another example: concepts such as intension and extension, which are very important in a concept system for the understanding of central concepts like concept and characteristic, will not be found in a UML diagram for a terminology database.

Page 78:

Models as the basis for development of IT systems

Ontology → Conceptual data model → Logical data model → Physical data model

Recommendation:

Always develop an ontology before developing a conceptual data model!

Page 79:

• Terminological ontologies

• Concept clarification: ontology, taxonomy, data model etc.

• An ontology of ontologies

• A taxonomy for lexical data

• Terminological concept modelling vs. conceptual data modelling

• Modelling partial equivalence between concepts

Overview

Page 80:

In multilingual terminology work concept systems will be established for all languages in a project.

In some cases one concept in one language may be partially equivalent to several concepts in one or several other languages. This also means that the concept systems of the individual languages differ.

Therefore it should be possible in a terminology database to establish an equivalence relationship between one concept in one language and two or three concepts in another language.

Modelling partial equivalence between concepts

Page 81:

[Diagram: Concept 1 (English), Concept 2 (English), Concept 1 (German) and Concept 2 (German), connected by three "partial equivalence" links]

This could for example be the case in terminology within the field of education. One level in the English education system may correspond to two levels in the German education system and vice versa.

Page 82:

This means that each concept in a terminology database may be linked to several terminological entries.

[Diagram: Concept 1 (English), Concept 2 (English), Concept 1 (German) and Concept 2 (German), grouped into Entry 1, Entry 2 and Entry 3]

Page 83:

In many cases the Terminology Management System will build on a simpler model:

Concept 1 in English and Concept 1 in German are duplicated in the database, since one concept cannot be linked to several entries.

[Diagram: Entry 1, Entry 2 and Entry 3, containing Concept 1 English (Version 1), Concept 1 German (Version 1), Concept 2 German, Concept 2 English, Concept 1 German (Version 2) and Concept 1 English (Version 2), connected by three "partial equivalence" links]

Page 84:

[UML class diagram with the classes terminological entry, entry_concept, degree of equivalence, concept, subject field, definition, term, grammar, source identifier, originating person, origination date, context and note, connected by associations with the multiplicities 1, 0..1, 0..* and 1..*]

The model allows one concept to be linked to one or more concepts in other languages. This means that each concept may be linked to several terminological entries.

SC 3: Design, implementation and use of terminology management systems
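The linking described on this slide can be sketched as follows. The sketch assumes that the degree of equivalence is recorded on each entry_concept link and that the three entries pair the concepts as shown in the comments; the slides do not show which concepts belong to which entry, so the pairing is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Concept:
    concept_id: str
    lang: str


@dataclass(frozen=True)
class EntryConcept:
    """Link between a terminological entry and a concept."""
    entry_id: int
    concept: Concept
    degree_of_equivalence: str


c1_en = Concept("Concept 1", "en")
c2_en = Concept("Concept 2", "en")
c1_de = Concept("Concept 1", "de")
c2_de = Concept("Concept 2", "de")

# Assumed pairing: each entry brings together two partially equivalent concepts.
links = [
    EntryConcept(1, c1_en, "partial"), EntryConcept(1, c1_de, "partial"),
    EntryConcept(2, c1_en, "partial"), EntryConcept(2, c2_de, "partial"),
    EntryConcept(3, c1_de, "partial"), EntryConcept(3, c2_en, "partial"),
]


def entries_for(concept: Concept, all_links: List[EntryConcept]) -> List[int]:
    """Identifiers of all entries that the concept is linked to."""
    return sorted({l.entry_id for l in all_links if l.concept == concept})


def equivalents_of(concept: Concept, all_links: List[EntryConcept]) -> List[Concept]:
    """Concepts that share at least one entry with the given concept."""
    entry_ids = set(entries_for(concept, all_links))
    others = {
        l.concept
        for l in all_links
        if l.entry_id in entry_ids and l.concept != concept
    }
    return sorted(others, key=lambda c: (c.lang, c.concept_id))


print(entries_for(c1_en, links))
print(equivalents_of(c1_en, links))
```

Because the concept objects are shared between entries, no concept has to be stored twice, in contrast to the simpler model in which Concept 1 exists in two versions.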

Page 85:

The system of courts in Italy

22 concepts

Page 86:

The Danish system

13 concepts

Page 87:

Generic concepts within Health Care in Denmark (top & core ontology – translated into English)

Problems in translating e.g. activity, conduct, action.

Page 88:

• Domain specific ontologies should be based on the principles of terminological ontologies.

• Concept clarification is needed with regard to ontology, taxonomy, data model etc.

• An ontology of ontologies would help in this concept clarification

• A taxonomy for lexical data for TC 37 will ensure completeness, consistency, user-friendliness and extensibility

• Terminological ontologies and conceptual data models differ

• Introducing multilingualism in ontologies requires modelling of partial equivalence between concepts

Conclusions

Page 89:

Thank you for your attention!

Bodil Nistrup Madsen

Department for International Language Studies and Knowledge Technology

www.cbs.isv.dk

&

DANTERMcentret

www.danterm.dk

Copenhagen Business School

Handelshøjskolen i København