
Applications of XML --- Knowledge Management Tools

Topic Maps (ISO/13250)

飛資得資訊有限公司

溫達茂, August 22, 2002 (ROC year 91)

What is XML

• XML is a method for defining special markers or ‘tags’ that can be inserted into text to indicate its logical structure and to make explicit the meaning or rhetorical role of its component parts

Why XML?

• HTML: contains information only about a page's appearance.
– <H1>The future of the electronic scientific literature</H1>
– <H3>by John Smith</H3>

• XML: lets a document be tagged with machine-readable 'metadata'.
– <articletitle>The future of the electronic scientific literature</articletitle>
– <author><firstname>John</firstname><lastname>Smith</lastname></author>
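To make the contrast concrete, here is a minimal Python sketch that reads the title and author from the tagged fragment by element name alone. The wrapping <article> element and the function name `describe` are assumptions added for illustration; they do not appear on the slide.

```python
import xml.etree.ElementTree as ET

# The <article> wrapper is hypothetical; the inner elements come from the slide.
ARTICLE = """<article>
  <articletitle>The future of the electronic scientific literature</articletitle>
  <author><firstname>John</firstname><lastname>Smith</lastname></author>
</article>"""

def describe(xml_text):
    """Return (title, author name), located purely by element names."""
    root = ET.fromstring(xml_text)
    title = root.findtext("articletitle")
    first = root.findtext("author/firstname")
    last = root.findtext("author/lastname")
    return title, f"{first} {last}"

title, author = describe(ARTICLE)
```

The same lookup is not possible with the <H1>/<H3> version, because heading levels say nothing about which text is a title and which is an author.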

HTML VS. XML

• Latin phrase "Quid pro quo" in HTML
– <i>Quid pro quo</i>

• US Battle Ship in HTML
– <i>USS Constitution</i>

• Latin phrase "Quid pro quo" in SGML
– <foreign lang="latin">Quid pro quo</foreign>

• US Battle Ship Title
– <name type="ship">USS Constitution</name>

XML Intelligence

• George Washington -- President Washington <name type="person">George Washington</name>

• Washington, D.C. -- place name <name type="place">Washington, D.C.</name>

• Washington's Army -- Washington's army <name type="org">Washington's Army</name>

• USS Washington -- the battleship Washington <name type="ship">USS Washington</name>
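As a small illustration of how the type attribute lets software tell the four "Washingtons" apart, the sketch below filters names by type. The <names> wrapper element and the helper `names_of_type` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# The <names> wrapper is hypothetical; the <name> elements follow the slide.
TEXT = """<names>
  <name type="person">George Washington</name>
  <name type="place">Washington, D.C.</name>
  <name type="org">Washington's Army</name>
  <name type="ship">USS Washington</name>
</names>"""

def names_of_type(xml_text, wanted):
    """Return the text of every <name> whose type attribute equals `wanted`."""
    root = ET.fromstring(xml_text)
    return [n.text for n in root.iter("name") if n.get("type") == wanted]

ships = names_of_type(TEXT, "ship")
places = names_of_type(TEXT, "place")
```

A plain-text search for "Washington" would return all four entries; the typed markup returns only the kind of entity asked for.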

XML Structure for Journal Article

[Figure: XML-structured document for a journal article]

What does that mean? (I)

• Every part of the document is not just displayable, but also definable, including tables and/or charts.

• For example, in a scientific article, XML tags can be used to distinguish the title of the article from the names of its authors or the cells in a table

What does that mean? (II)

• Analytical

• Structural

XML -- Analytical

• Tag (Marker) -- standardize -- metadata:

– Data Interchange -- Dublin Core

– System Communication -- OpenURL

XML -- Analytical -- Application

• Data Interchange
– Union Catalog of the Digital Archives Program (OAI standard)
– Conversion of MARC to XML

• Machine Communication
– OpenURL Resolver

XML -- Structural

• Logical Structure

• Logical Relationship

XML -- Structural -- Application

• XML in Hierarchical and Structural Context

• Metadata Within XML – Knowledge Structure

– Knowledge Organization Tools

Three General Categories of Knowledge Organization

• Term List:
– Emphasizing lists of terms with definitions

• Classification and Categorization:
– Emphasizing the creation of subject sets

• Relationship List:
– Emphasizing the connections between terms and concepts

Term Lists

• Authority files

• Glossaries

• Dictionaries

• Gazetteers

Classification and Categorization

• Subject headings

• Classification schemes, taxonomies, and categorization schemes

Relationship Lists

• Thesauri

• Topic Maps

• Semantic network

• Ontologies

Principles of Knowledge Organization

1. Group By

2. Association

Key Issues in the Principle

• Terms -- Subjects -- Concepts

• Subject Relationship -- the relationships between one concept and another

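What is a Concept?

• Definition:
– A concept is the basic unit of knowledge and the smallest unit of thought.
– Concepts are an important component of human thinking; a concept is a form of thought that reflects the distinctive attributes of things.

• Attributes:
– Intension: the distinctive attributes of a thing as reflected in the concept, that is, the sum of the attributes that make up the concept.
• The intension of "automobile" is the sum of the following attributes: "vehicle", "driven by an engine", "has a driver".
– Extension: the range of things that the concept covers.
• The sum of the individuals that the concept includes is called the class extension.
– The extension of the concept "elephant" is the Indian elephant and the African elephant.
• The sum of the parts that make up a whole is called the component extension.
– The extension of the concept "system of chemical elements" is hydrogen, oxygen, sulfur, ...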

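Logical and Semantic Relations between Concepts (I)

• Identity relation
– The two concepts have the same extension. For example, "machine translation" and "automatic translation" both denote translation performed by machine.

• Genus-species relation
– The extension of one concept completely contains the extension of the other.
• Genus concept: for example, means of transport. The concept with the larger extension, which contains the entire extension of the other concept.
• Species concept: for example, automobile. The concept with the smaller extension, which is contained by the genus concept.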

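Logical and Semantic Relations between Concepts (II)

• Intersecting relation
– The extensions of the two concepts are partly the same and partly different. For example: writer and professor. Some writers are professors, and some writers are not.

• Disjoint relation
– Coordinate disjoint relation: for example, steelworker and textile worker, which share "worker" as their nearest genus concept.
– Non-coordinate disjoint relation: the concepts have no shared nearest genus concept. For example: teacup and fruit.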
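The identity, genus-species, intersecting, and disjoint relations can all be read as comparisons between extensions. The sketch below treats each extension as a Python set; the individual members (such as "alice") are made up for illustration, and the distinction between coordinate and non-coordinate disjointness is left out because it needs a genus hierarchy rather than the extensions alone.

```python
def relation(a, b):
    """Classify the relation between two concepts from their extensions (sets)."""
    if a == b:
        return "identity"
    if a < b or b < a:          # one extension strictly contains the other
        return "genus-species"
    if a & b:                   # some shared members, some not
        return "intersecting"
    return "disjoint"

# Hypothetical extensions, invented only to exercise each case.
machine_translation = {"system-1", "system-2"}
automatic_translation = {"system-1", "system-2"}
transport = {"car-1", "car-2", "ship-1"}
automobile = {"car-1", "car-2"}
writers = {"alice", "bob"}
professors = {"bob", "carol"}
teacups = {"cup-1"}
fruit = {"apple-1"}

results = [
    relation(machine_translation, automatic_translation),
    relation(transport, automobile),
    relation(writers, professors),
    relation(teacups, fruit),
]
```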

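Logical and Semantic Relations between Concepts (III)

• Negation relation
– The negation of one concept constitutes the attribute of the other concept. For example: pressurization and depressurization.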

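Logical and Semantic Relations between Concepts

How are the attributes, logical relations, and semantic relations of concepts defined and applied to K.O. (knowledge organization)?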

What is Topic Map?

Definition of Topic Map

• Definition: (T. A. O.)

– A set of Topics, Associations, Occurrence, Facet, and Added Theme Elements that are used to Manage a set of Terms relevant to a particular Knowledge Domain.

Topic Maps

• a Topic Map is a collection of topics and (semantically meaningful) relationships between these topics

• Topic Maps link these topics with external references, such as resources behind URLs

• XTM serves as XML-based interchange format for topic maps

Topic Maps (cont’d)

• TMs are a "superimposed semantic layer"
– connections between topics and resources are URLs

• TMs capture real-world subjects/objects but also concepts, like "TCP" or "love"
– these are defined not in absolute terms but relative to each other

Topic Maps (cont’d)

• can deal with incomplete knowledge:
– I know that Prince Charles was married but I do not know the name of his wife.

• can be merged:
– Maybe someone else knows that someone called 'Diana' was married to a British prince
– merging maps by identifying common topics
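Merging by identifying common topics can be sketched as a union keyed on a shared subject identifier. This is a simplified illustration, not the merging rules of the standard: the dictionary layout, the identifiers "prince-charles" and "diana", and the fact tuples are all assumptions, and the sketch presumes both maps already agree on the identifier of the common topic.

```python
def merge(map_a, map_b):
    """Merge two toy topic maps: topics with the same subject id become one topic."""
    merged = {}
    for topic_map in (map_a, map_b):
        for subject, topic in topic_map.items():
            target = merged.setdefault(subject, {"names": set(), "facts": set()})
            target["names"] |= topic["names"]
            target["facts"] |= topic["facts"]
    return merged

# Map A knows the prince was married, but not to whom (incomplete knowledge).
map_a = {
    "prince-charles": {"names": {"Prince Charles"},
                       "facts": {("marital-status", "married")}},
}
# Map B knows that Diana was the spouse.
map_b = {
    "prince-charles": {"names": set(), "facts": {("spouse", "diana")}},
    "diana": {"names": {"Diana"}, "facts": set()},
}

combined = merge(map_a, map_b)
```

After the merge, the single "prince-charles" topic carries both facts, so the gap in map A is filled without either author having had the full picture.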

Topic Maps (cont’d)

• are supposed to deal with many thousands of topics

• are built to denote information, not knowledge (no semantic network)

• are not built for a specific application but will be reused in many different contexts

What is Topic Map

• Connections between pieces of information are not just web hyperlinks but a structured network of semantic links over the resources, which allows easy and selective navigation to the requested information.

Elements of Topic Map

• Topics

• Association

• Occurrence and Resources

• Scope

Reification (Definition)

• creation and/or identification of a subject

• this topic ‘stands for the subject’ (proxy)

• in this process, a topic will be created and characteristics (name, ...) will be assigned

Topics

• A topic can be any thing, regardless of whether it exists or not, and whether it is of a physical nature or just an idea or expression
– Web resources (stock quotes, documents, ...)
– real world (someone, people, countries, ...)

• A topic can be any concept.
– Abstract idea (happiness, effectiveness)

• Each topic has an internal identifier (id) and an external representation (baseName), and can have any number of external references (occurrence) and any number of classifications (instanceOf)

• Topics are only representatives: a topic represents (is a proxy for) its subject; the subject itself exists outside the topic map. This is what "a subject is reified by a topic" means, and why the subjectIdentity element is proposed.

Topic (Example)

Topic Names

• every topic has a unique id within a map

• this id is for internal use only

• every topic can have (one or more) names:
– this name is visible to 'end users'

Topic Name -- BaseName

• The <baseName> element specifies a topic name

• A topic name is represented by one string: the content of the <baseNameString> child of <baseName>

• The context within which the assignment of a name to a topic is valid may be expressed using a <scope> child element.

• A topic may have multiple base names in the same and/or multiple scopes.

Topic Name -- Variant

• The <variant> element is an alternate form of a topic's base name appropriate for a processing context specified by the variant's <parameters> child element

• A variant name whose parameters include the "display" or "sort" published subjects is semantically equivalent to a display name or sort name (respectively) as defined in ISO 13250.

Variants

• variants are names for a specific purpose and/or in a specific format:

• name, as it should
– appear on a mobile display
– appear as a logo on a black & white screen
• high resolution
• low resolution
– be used for sorting

Variants (cont’d)

• external representations
– organized as a tree
– parameters control which variant will be used

Topic Types

• any topic can have any number of types

• every type is itself a topic
– either within the same map
• <topicRef xlink:href="#university"/>
• then university must be a defined topic
– or defined via some other document
• <subjectIndicatorRef xlink:href="http://www....../all/about/unis.html"/>

Topic Types (cont’d)

• topic types introduce a type hierarchy

• every topic map has its own type hierarchy

• there is NO global type system (ontology)

Topic -- instanceOf

• The <instanceOf> element specifies the class to which its parent belongs, via a <topicRef> or <subjectIndicatorRef> child element.

• The <instanceOf> element is a syntactic shortcut for an association of a special type defined by the class-instance published subject.

Topic Types (Example)

<topic id="bond-uni">
  <instanceOf>
    <topicRef xlink:href="#university"/>
  </instanceOf>
  <baseName>
    <baseNameString>Bond University</baseNameString>
  </baseName>
  <occurrence>
    <resourceRef xlink:href="http://www.bond.edu.au/"/>
  </occurrence>
</topic>
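The following Python sketch shows one way such a topic could be read programmatically. It wraps the slide's topic in a <topicMap> root with the XTM 1.0 and XLink namespace declarations, and adds a "university" topic so that the type reference resolves; that wrapper, the extra topic, and the function name `read_topics` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

XTM = "http://www.topicmaps.org/xtm/1.0/"
XLINK = "http://www.w3.org/1999/xlink"

# Root element and the "university" topic are added here; "bond-uni" is from the slide.
DOC = f"""<topicMap xmlns="{XTM}" xmlns:xlink="{XLINK}">
  <topic id="university">
    <baseName><baseNameString>University</baseNameString></baseName>
  </topic>
  <topic id="bond-uni">
    <instanceOf><topicRef xlink:href="#university"/></instanceOf>
    <baseName><baseNameString>Bond University</baseNameString></baseName>
    <occurrence><resourceRef xlink:href="http://www.bond.edu.au/"/></occurrence>
  </topic>
</topicMap>"""

def read_topics(xml_text):
    """Return {topic id: names, type references, occurrence URLs}."""
    ns = {"xtm": XTM}
    href = f"{{{XLINK}}}href"  # ElementTree spells namespaced attributes as {uri}name
    topics = {}
    for t in ET.fromstring(xml_text).findall("xtm:topic", ns):
        topics[t.get("id")] = {
            "names": [n.text for n in t.findall("xtm:baseName/xtm:baseNameString", ns)],
            "types": [r.get(href) for r in t.findall("xtm:instanceOf/xtm:topicRef", ns)],
            "occurrences": [r.get(href) for r in t.findall("xtm:occurrence/xtm:resourceRef", ns)],
        }
    return topics

topics = read_topics(DOC)
# Every "#id" type reference should point at a topic defined in the same map.
dangling = [ref for t in topics.values() for ref in t["types"] if ref[1:] not in topics]
```

The `dangling` check mirrors the rule that a type referenced with <topicRef> must itself be a defined topic.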

Topic -- SubjectIdentity

• The <subjectIdentity> element specifies the subject that is reified by a topic, via <resourceRef>, <subjectIndicatorRef>, and/or <topicRef> child elements.

• When a topic has an addressable subject, the subject can be addressed directly via a <resourceRef> element. In that case, it is the resource itself which is considered the subject of the topic, not what the resource means or indicates. There can be only one such resource per topic.

• Resources may also be subject indicators, as opposed to subjects in and of themselves. Resources are used to indicate subjects via <subjectIndicatorRef> elements, of which there may be more than one per topic.

• A topic may also indicate that it has the same subject as another topic by addressing that topic via a <topicRef> element.

Associations

• topics can participate in relationships, called associations, in which topics play roles as members

• the relationship that two or more topics have to each other in an association must be explicitly defined

• topics take part as
– Members: the topics involved in the association are called members
– Role: each member plays a role in the association

• typical associations
– is-located-in, lived-in, written-by
– is-facility-provided-by, requires-to-have

Association (Example)

Associations (cont’d)

• all newly introduced topics have to be defined:
– is-located-in, building, location

• also these topics can be linked with associations

• associations can have any number of members (1, 2, 3, ...)
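A minimal sketch of an association as a typed set of role-playing members, with a check that every topic it introduces is defined. The class names, the function `undefined_topics`, and the topic ids "library" and "campus" are hypothetical; only is-located-in, building, and location come from the slide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Member:
    role: str     # id of the topic naming the role, e.g. "building"
    player: str   # id of the topic playing that role

@dataclass(frozen=True)
class Association:
    type: str         # id of the topic naming the association type
    members: tuple    # any number of Member objects

def undefined_topics(associations, topic_ids):
    """Return ids used as association type, role, or player that are not defined topics."""
    used = set()
    for assoc in associations:
        used.add(assoc.type)
        for member in assoc.members:
            used.update((member.role, member.player))
    return sorted(used - set(topic_ids))

defined = {"is-located-in", "building", "location", "library"}
located = Association(
    type="is-located-in",
    members=(Member("building", "library"), Member("location", "campus")),
)
missing = undefined_topics([located], defined)
```

Here the association type and both roles are themselves topics, and the check reports "campus" because it was used as a player without being defined.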

Topic Occurrences

• reference external resources
– documents: via URLs
• http://www....../where/is/the/document.pdf
– defined by IANA/ICANN: via URNs
• urn:inet:bond.edu.au:tech_report01
– not defined, but globally unique:
• ???????
• urn:my-social-security-numbers:1234-5678-9

• a topic can have any number of resources

Topic Occurrences (cont’d)

<topic id="bond-uni">
  <baseName>
    <baseNameString>Bond University</baseNameString>
  </baseName>
  <occurrence>
    <resourceRef xlink:href="http://www.bond.edu.au/"/>
  </occurrence>
</topic>

Scopes

• not all topic characteristics are valid in all contexts

• scopes limit a characteristic

• scopes are topics themselves

Scopes (cont’d)

• occurrences:
– a web document could be written in German
– the document is not for a beginner, but an expert
– a visa to visit a country is not relevant for residents, only for non-residents

• names
– the document writes about trees in computer science but not about trees in agriculture

Scopes (cont'd)

• associations
– "Santa Claus brings the presents" is good enough for children, but not for adults

Scopes (cont’d)

• if no scope is defined, the characteristic is valid in ALL scopes (the unconstrained scope)
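Scope filtering can be sketched as follows. The rule used here, that a scoped characteristic applies when its scope shares at least one theme with the current context, is a simplifying assumption for the example; the names and theme ids are likewise invented. An empty scope stands for the unconstrained scope.

```python
def visible(characteristics, context):
    """Keep characteristics that are unscoped or whose scope overlaps the context."""
    return [value for value, scope in characteristics if not scope or scope & context]

# (value, scope) pairs; scopes are sets of theme ids, which are themselves topics.
NAMES = [
    ("tree", frozenset()),                                        # unconstrained
    ("tree (data structure)", frozenset({"computer-science"})),
    ("tree (plant)", frozenset({"agriculture"})),
]

in_cs = visible(NAMES, {"computer-science"})
no_context = visible(NAMES, set())
```

With the context {"computer-science"} the agricultural name is hidden, while the unscoped name remains visible in every context.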

Scopes (Example)

mergeMap

• A <mergeMap> element references an external <topicMap> element through an xlink:href attribute containing a URI.

• <!ELEMENT mergeMap ( topicRef | resourceRef | subjectIndicatorRef )* >

• <topicRef>

Topic Map-DTD

Topic Map-XML

Topic Map-XSL

Topic Maps Limitation

• XML-enabled Database and Search Engine

• Association is only Part of Relationship (Non-directional Relationship)

Database-supported Topic Maps

What is an ontology

• Philosophy: Theory of existence

• An ontology is an explicit specification of objects and relations in the target world intended to share with the community and to use for building a model of the target world

• It is a taxonomy of concepts

Ontology

• To support the sharing and reuse of formally represented knowledge, it is useful to define the common vocabulary in which shared knowledge is represented. A specification of a representational vocabulary for a shared domain of discourse -- definitions of classes, relations, functions, and other objects -- is called an ontology.

Ontology

• Ontology is a Specification of a Conceptualization

• Ontology: a formal explicit description of concepts and relationships in a domain of knowledge
– Class -- Concepts
– Slot (roles, properties) -- Features & attributes of Concepts
– Facet (role restriction)

• Subject Description & Analysis:
– Relationship:
• Vertical & Horizontal

• Hierarchical & Structured

• Semantic and Conceptual Relationship

Ontology

• Classes describe concepts in the domain

• A class can have subclasses that represent concepts that are more specific than the superclass

• An ontology together with a set of individual instances of classes constitutes a knowledge base

• There is a fine line where the ontology ends and the knowledge base begins

Components of an ontology

• Concepts

• Taxonomy of the concepts

• Relations among concepts

• Formal specification of the concepts and relations

Ontology has:

• A common vocabulary

• An explicit representation of things (the conceptualization) usually left implicit behind a system

• An explicit representation of a shared understanding of the target world

Ontology Engineering

Ontology Engineering: Defining terms in the domain and relations among them
– Defining concepts in the domain (classes)
– Arranging the concepts in a hierarchy (subclass-superclass hierarchy) -- (taxonomy)
– Defining which attributes and properties (slots) classes can have and constraints on their values
– Defining individuals and filling in slot values

Ontology

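• What is "Ontology"?
– Uses a well-defined vocabulary to describe entities that currently exist
– Outlines the relationships among those entities in a tree structure
– In this way, builds an interpretable and usable knowledge framework for a specialized domain
– Provides consistent interpretations and definitions of the relevant "terms"
– Uniform presentation of information
– Standardized classification and annotation of data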

ontology

[Figure: ontology diagram with the labels Concept, Relation, Instances, subConceptOf, domain, relation, instance]

Concept

• Research Project

• Full Professor(AcademicStaff)

• PhDStudent

ontology

Relation

• worksAtProject

• Supervises

• Supervisor

ontology

Instances

• Rudi
– Concept: FullProfessor
– Relation
1. Supervises: York
2. Name: Rudi Studer

• York
– Concept: PhDStudent
– Relation
1. worksAtProject: On-To-Knowledge
2. Name: York Sure
3. Supervisor: Rudi

• On-To-Knowledge
– Concept: ResearchProject
– Relation
1. name: On-To-Knowledge

ontology

[Figure: instance graph linking Rudi, York, and On-To-Knowledge through the relations supervises, supervisor, and worksAtProject]
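The concepts, relations, and instances of this example can be written down as plain Python data to show how the three layers fit together. Two things here are assumptions rather than statements from the slides: that "Full Professor(AcademicStaff)" means FullProfessor is a subconcept of AcademicStaff, and that supervisor is intended as the inverse of supervises.

```python
# subConceptOf links (assumed reading of "Full Professor(AcademicStaff)").
SUB_CONCEPT_OF = {"FullProfessor": "AcademicStaff"}

# Instances with their concept and slot values, as listed on the slide.
INSTANCES = {
    "Rudi": {"concept": "FullProfessor", "name": "Rudi Studer",
             "supervises": ["York"]},
    "York": {"concept": "PhDStudent", "name": "York Sure",
             "supervisor": ["Rudi"], "worksAtProject": ["On-To-Knowledge"]},
    "On-To-Knowledge": {"concept": "ResearchProject", "name": "On-To-Knowledge"},
}

def is_instance_of(name, concept):
    """True if the instance's concept is `concept` or one of its subconcepts."""
    current = INSTANCES[name]["concept"]
    while current is not None:
        if current == concept:
            return True
        current = SUB_CONCEPT_OF.get(current)
    return False

def inverse_holds(relation, inverse):
    """True if every 'a relation b' is matched by 'b inverse a'."""
    for subject, slots in INSTANCES.items():
        for target in slots.get(relation, []):
            if subject not in INSTANCES[target].get(inverse, []):
                return False
    return True

rudi_is_staff = is_instance_of("Rudi", "AcademicStaff")
consistent = inverse_holds("supervises", "supervisor")
```

The ontology proper is the concept hierarchy and the relation names; the `INSTANCES` table is the knowledge base built on top of it.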

ontology

Gene Ontology

• The formation of the Gene Ontology
– Life science is "knowledge based" rather than "axiom based"
• Knowledge of the biological role of proteins in one organism can often be transferred to other organisms.
– Once data has accumulated to a certain extent, it must be presented in the form of an "ontology"
– For example: when a new protein or DNA is discovered, known data are used to infer its function

Gene Ontology

• The three main structures of the Gene Ontology
– Biological process
– Molecular function
– Cellular component

Gene Ontology

[Figure: a gene or protein described by Cellular Component, Biological Process, and Molecular Function]

Upgrading the Application of XML in Knowledge Organization

• HTML: Web Content Display

• XML:
– Data Interchange
– Web Content Representation

• Elements for the Object Attributes

• Elements for the Object Attributes with Structure

• Elements for the Object Attributes with Structure and Semantic meaning

Some links

Cycorp Cyc Knowledge Server for artificial intelligence-based Common Sense http://www.cyc.com/

UMLS - Unified Medical Language System (UMLS) of the National Library of Medicine (NLM).

What is an Ontology? http://www-ksl.stanford.edu/kst/what-is-an-ontology.html

Ontology.org http://www.ontology.org/

Ontology internet links (complete) http://saussure.irmkant.rm.cnr.it/onto/link.html

GENE ONTOLOGY CONSORTIUM http://www.geneontology.org/

Descriptive and Formal Ontology http://www.formalontology.it/

Ontology projects http://www.kr.org/top/projects.html

The InterMed Project http://camis.stanford.edu/projects/intermed-web/

The Bio-ontologies Working Group http://smi-web.stanford.edu/projects/bio-ontology/

Java Ontology Browser http://igd.rz-berlin.mpg.de/~www/oe/mbo.html

The GO Browser http://www.informatics.jax.org/go/go_browser_help.shtml

Synchronous editing of an ontology tools http://www.swi.psy.uva.nl/wondertools/html/wondertools.html

XOL - Ontology Exchange Language http://www.ai.sri.com/~pkarp/xol/

The Ontology Inference Layer OIL http://www.ontoknowledge.org/oil/

Line's Ontology Resource http://www.cs.utk.edu/~pouchard/onto/

Formal ontology and conceptual analysis http://www.ladseb.pd.cnr.it/infor/ontology/Papers/Ontobiblio/TOC.htmla