istic thesaurus ws-keizer_2010-10-22

The role of Thesauri and Standard Vocabularies in linking data Dr. Johannes Keizer FAO of the United Nations Office of Knowledge Exchange, Research and Extension Knowledge and Capacity for Development

Upload: johannes-keizer

Post on 08-May-2015

DESCRIPTION

Presentation at the ISTIC workshop on thesauri. Enlarged and revised version of the presentation given to the UNKSIM.

TRANSCRIPT

Page 1: Istic thesaurus ws-keizer_2010-10-22

The role of Thesauri and Standard Vocabularies in linking data

Dr. Johannes Keizer, FAO of the United Nations, Office of Knowledge Exchange, Research and Extension, Knowledge and Capacity for Development

Page 2: Istic thesaurus ws-keizer_2010-10-22

dr johannes keizer - FAO of the United Nations - knowledge and capacity for development

Thesaurus Workshop – CAS Beijing, 2010-10-22

The Development of the Internet

Page 3: Istic thesaurus ws-keizer_2010-10-22

Open World Mindset

“Closed” (“normal”) IT environments
• Data sources carefully controlled.
• Data formats “custom-defined” for an application.

Linked data based on an “open world mindset”
• Integrating data from the open Web
• Systems designed to incorporate new information incrementally
• By design, tolerance of incomplete information

Page 4: Istic thesaurus ws-keizer_2010-10-22

The Linked Data Universe: http://www.linkeddata.org (July 2009)

Page 5: Istic thesaurus ws-keizer_2010-10-22

The Linked Data Universe: http://www.linkeddata.org (July 2010)

Page 6: Istic thesaurus ws-keizer_2010-10-22

Example: BBC Wildlife Finder

Page 7: Istic thesaurus ws-keizer_2010-10-22

Humboldt Squid page, pulled together from a diversity of Linked Data sources

• Animal Diversity Web: nocturnal way of life
• BBC TV Documentary
• BBC News item
• Wikipedia

Page 8: Istic thesaurus ws-keizer_2010-10-22

RDF – a grammar for the language of data

[Diagram: Resource A relatedTo Resource B; Resource A describedBy “Some text”]

1. Describe resources using interrelated “statements” (“triples”).
2. Use URIs – unique, globally managed identifiers – as the “words” of statements.
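The two statements on this slide can be sketched in a few lines of Python. This is only an illustration of the triple idea, not a real RDF library: the `http://example.org/` URIs and the `match` helper are hypothetical names introduced here.

```python
# Hypothetical namespace, used purely for illustration.
EX = "http://example.org/"

# Each statement is a (subject, predicate, object) triple.
# URIs are the "words"; the last object below is a plain text literal.
triples = {
    (EX + "ResourceA", EX + "relatedTo", EX + "ResourceB"),
    (EX + "ResourceA", EX + "describedBy", "Some text"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return all triples fitting the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# Everything that is stated about ResourceA.
about_a = match(triples, subject=EX + "ResourceA")
```

Because every statement has the same three-part shape, statements from different sources can be pooled into one set without agreeing on a schema first.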

Page 9: Istic thesaurus ws-keizer_2010-10-22

• http://www.w3.org/2007/Talks/0221-Bangalore-IH/

RDF as a common format for merging data

Page 10: Istic thesaurus ws-keizer_2010-10-22

Finding things related to “genes” across databases

Source: Joanne Luciano, Mitre, and the W3C HCLS IG

Page 11: Istic thesaurus ws-keizer_2010-10-22

Role of thesauri/concept schemes

• Born as tools to ensure consistency in the indexing of library collections
• Thesauri were based on “terms”, but terms already represented concepts in a non-explicit way
• Hierarchical and associative relationships represented generic ontological domain knowledge
• Candidate building blocks for the Semantic Web

Page 12: Istic thesaurus ws-keizer_2010-10-22

…from thesaurus to ontologies…

Page 13: Istic thesaurus ws-keizer_2010-10-22

AGROVOC today

• Around 30,000 concepts
• 600,000 labels in around 20 languages
• A one-stop shop for terminological knowledge related to agriculture in general
• A knowledge base of related concepts organized in ontological relationships (hierarchical, associative, equivalence)
• A concept/term/string-based system
• Concepts may be organized in multiple categories

Page 14: Istic thesaurus ws-keizer_2010-10-22

Semantic Relationships

Concept to Concept

isA (hierarchy), isPestOf, hasPest

Concept to Term

has_lexicalization (links concepts to their lexical realizations)

Term to Term

isSynonymOf, isTranslationOf, hasAcronym, hasAbbreviation

Term to String

hasSpellingVariant, hasSingular
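A minimal sketch of how these four levels could be kept apart in code. The relationship names are those on the slide; the identifiers in `LINKS` (such as `concept:maize`) and the helper functions are hypothetical and only serve the illustration.

```python
# Relationship names from the slide, grouped by the levels they connect.
RELATIONSHIPS = {
    "concept-concept": {"isA", "isPestOf", "hasPest"},
    "concept-term": {"has_lexicalization"},
    "term-term": {"isSynonymOf", "isTranslationOf", "hasAcronym", "hasAbbreviation"},
    "term-string": {"hasSpellingVariant", "hasSingular"},
}


def level_of(relationship):
    """Return the level a relationship belongs to, or raise ValueError."""
    for level, names in RELATIONSHIPS.items():
        if relationship in names:
            return level
    raise ValueError(f"unknown relationship: {relationship}")


def targets(links, source, relationship):
    """Return the sorted targets linked from `source` by `relationship`."""
    return sorted(t for s, r, t in links if s == source and r == relationship)


# Hypothetical sample data: (source, relationship, target).
LINKS = [
    ("concept:maize", "has_lexicalization", "term:maize@en"),
    ("term:maize@en", "isSynonymOf", "term:corn@en"),
    ("term:maize@en", "isTranslationOf", "term:maïs@fr"),
]
```

Keeping the level explicit makes it possible to reject a link such as a synonym relation between two concepts before it is stored.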

Page 15: Istic thesaurus ws-keizer_2010-10-22

The AGROVOC SKOS-XL Model

[Diagram: concepts 12332, 1474, 8171 and 6211 (rdf:type SKOS Concept) are connected by skos:broader and linked to AgrovocConceptScheme (rdf:type SKOS ConceptScheme) through skos:inScheme and skos:topConceptOf. Label resources :foo and :bar (rdf:type SKOS Label) are attached through skosxl:prefLabel and skosxl:altLabel and carry the skosxl:literalForm values “corn” and “maize”.]

Page 16: Istic thesaurus ws-keizer_2010-10-22

http://www.w3.org/2004/02/skos/

Page 17: Istic thesaurus ws-keizer_2010-10-22

SKOS-XL output

<rdf:Description rdf:about="http://aims.fao.org/aos/agrovoc/agrovocScheme">
  <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#ConceptScheme"/>
</rdf:Description>
<rdf:Description rdf:about="http://aims.fao.org/aos/agrovoc/c_330829">
  <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
  <skos:inScheme rdf:resource="http://aims.fao.org/aos/agrovoc/agrovocScheme"/>
  <skos:topConceptOf rdf:resource="http://aims.fao.org/aos/agrovoc/agrovocScheme"/>
</rdf:Description>
<rdf:Description rdf:about="http://aims.fao.org/aos/agrovoc/xl_en_1278479064610">
  <literalForm xmlns="http://www.w3.org/2008/05/skos-xl#" xml:lang="en">subjects</literalForm>
  <rdf:type rdf:resource="http://www.w3.org/2008/05/skos-xl#Label"/>
</rdf:Description>

URI of AGROVOC concept
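As a rough sketch, the excerpt above can be read with Python's standard `xml.etree.ElementTree`. The slide shows only a fragment, so the enclosing `rdf:RDF` element and its namespace declarations are added here as an assumption to make the fragment parseable; the function names are illustrative, and a dedicated RDF library would be the more robust choice for real data.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
SKOSXL = "http://www.w3.org/2008/05/skos-xl#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# The slide excerpt, wrapped in an rdf:RDF root (the wrapper is an assumption).
DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:skos="{SKOS}">
<rdf:Description rdf:about="http://aims.fao.org/aos/agrovoc/agrovocScheme">
  <rdf:type rdf:resource="{SKOS}ConceptScheme"/>
</rdf:Description>
<rdf:Description rdf:about="http://aims.fao.org/aos/agrovoc/c_330829">
  <rdf:type rdf:resource="{SKOS}Concept"/>
  <skos:inScheme rdf:resource="http://aims.fao.org/aos/agrovoc/agrovocScheme"/>
  <skos:topConceptOf rdf:resource="http://aims.fao.org/aos/agrovoc/agrovocScheme"/>
</rdf:Description>
<rdf:Description rdf:about="http://aims.fao.org/aos/agrovoc/xl_en_1278479064610">
  <literalForm xmlns="{SKOSXL}" xml:lang="en">subjects</literalForm>
  <rdf:type rdf:resource="{SKOSXL}Label"/>
</rdf:Description>
</rdf:RDF>"""


def types_by_subject(xml_text):
    """Map each described URI to the list of its rdf:type URIs."""
    root = ET.fromstring(xml_text)
    result = {}
    for desc in root.findall(f"{{{RDF}}}Description"):
        about = desc.get(f"{{{RDF}}}about")
        for type_el in desc.findall(f"{{{RDF}}}type"):
            result.setdefault(about, []).append(type_el.get(f"{{{RDF}}}resource"))
    return result


def literal_forms(xml_text):
    """Map each label URI to its (text, language) literal form."""
    root = ET.fromstring(xml_text)
    result = {}
    for desc in root.findall(f"{{{RDF}}}Description"):
        for lit in desc.findall(f"{{{SKOSXL}}}literalForm"):
            result[desc.get(f"{{{RDF}}}about")] = (lit.text, lit.get(XML_LANG))
    return result


types = types_by_subject(DOC)
labels = literal_forms(DOC)
```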

Page 18: Istic thesaurus ws-keizer_2010-10-22

Linking vocabularies

Columns: AGROVOC | EUROVOC | UNBIS | Relationship

http://aims.fao.org/aos/agrovoc/c_207 | http://eurovoc.europa.eu/219055 | agroforestry | skos:exactMatch / owl:sameAs
http://aims.fao.org/aos/agrovoc/c_4826 | http://eurovoc.europa.eu/220018 | MILK | skos:exactMatch / owl:sameAs
http://aims.fao.org/aos/agrovoc/c_12332 | http://eurovoc.europa.eu/219871 | MAIZE | skos:exactMatch / owl:sameAs
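The mapping table can be sketched as a small lookup structure. The URI pairs are the ones listed on the slide; the function names are illustrative. Since skos:exactMatch is a symmetric property, each pair is indexed in both directions.

```python
# AGROVOC / EUROVOC pairs as listed on the slide.
EXACT_MATCH = [
    ("http://aims.fao.org/aos/agrovoc/c_207", "http://eurovoc.europa.eu/219055"),
    ("http://aims.fao.org/aos/agrovoc/c_4826", "http://eurovoc.europa.eu/220018"),
    ("http://aims.fao.org/aos/agrovoc/c_12332", "http://eurovoc.europa.eu/219871"),
]


def build_index(pairs):
    """Index every pair in both directions (exactMatch is symmetric)."""
    index = {}
    for left, right in pairs:
        index.setdefault(left, set()).add(right)
        index.setdefault(right, set()).add(left)
    return index


def equivalents(index, uri):
    """Return the sorted URIs declared equivalent to `uri`."""
    return sorted(index.get(uri, set()))


INDEX = build_index(EXACT_MATCH)
maize_in_eurovoc = equivalents(INDEX, "http://aims.fao.org/aos/agrovoc/c_12332")
```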

Page 19: Istic thesaurus ws-keizer_2010-10-22

http://agris.fao.org/agris-search/search/display.do?f=2004/ZA/ZA04002.xml;ZA2004000049

Page 20: Istic thesaurus ws-keizer_2010-10-22

http://aims.fao.org/aos/agrovoc/c_7825

http://eurovoc.europa.eu/218754

Page 21: Istic thesaurus ws-keizer_2010-10-22

Linking data through common URIs

[Diagram: the concept “Maize” appears in AGROVOC, Eurovoc and UNBIS, each with the skosxl:literalForm “Maize”. The concept URIs are connected by owl:sameAs / exactMatch:]

http://aims.fao.org/aos/agrovoc/c_12332 owl:sameAs http://eurovoc.europa.eu/219871

Example documents linked through these concepts:

http://agris.fao.org/agris-search/search/display.do?f=1996/TR/TR96001.xml;TR9600026

http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2010:202:0011:0015:EN:PDF

http://unbisnet.un.org:8080/ipac20/ipac.jsp?session=128F308557F34.283092&profile=bib&uri=full=3100001~!685149~!1&ri=1&aspect=subtab124&menu=search&source=~!horizon
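A hedged sketch of what such links make possible: retrieving documents from different catalogues with a single concept URI. The owl:sameAs pair is the one on the slide. The `INDEXED_WITH` table is an assumption made for illustration (the slide does not state which record carries which concept URI), and the function names are hypothetical.

```python
AGROVOC_MAIZE = "http://aims.fao.org/aos/agrovoc/c_12332"
EUROVOC_MAIZE = "http://eurovoc.europa.eu/219871"

# owl:sameAs statement from the slide.
SAME_AS = [(AGROVOC_MAIZE, EUROVOC_MAIZE)]

# Assumption for illustration: which concept URI each document is indexed with.
INDEXED_WITH = {
    "http://agris.fao.org/agris-search/search/display.do?f=1996/TR/TR96001.xml;TR9600026": AGROVOC_MAIZE,
    "http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2010:202:0011:0015:EN:PDF": EUROVOC_MAIZE,
}


def equivalence_class(uri, same_as):
    """Collect every URI reachable from `uri` through sameAs links."""
    seen = {uri}
    frontier = [uri]
    while frontier:
        current = frontier.pop()
        for a, b in same_as:
            # sameAs is symmetric, so follow each pair in both directions.
            for x, y in ((a, b), (b, a)):
                if x == current and y not in seen:
                    seen.add(y)
                    frontier.append(y)
    return seen


def documents_about(uri, same_as, indexed_with):
    """Return the documents indexed with `uri` or any equivalent URI."""
    concepts = equivalence_class(uri, same_as)
    return sorted(doc for doc, concept in indexed_with.items() if concept in concepts)


docs = documents_about(EUROVOC_MAIZE, SAME_AS, INDEXED_WITH)
```

Starting from either URI yields both documents, which is the practical effect of declaring the two concepts equivalent.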

Page 22: Istic thesaurus ws-keizer_2010-10-22

What are we doing with unstructured data?

• We have enormous amounts of unstructured material
• Most of the documents that we produce are still semantically unstructured
• Human work to catalogue and index is becoming ever rarer
• We need machines to do automatic semantic markup of text
• If machines are trained and based on concept schemes, they are able to do so

Page 23: Istic thesaurus ws-keizer_2010-10-22

Page 24: Istic thesaurus ws-keizer_2010-10-22

AgroTagger

• Identifies concepts in unstructured texts
• Uses AGROVOC as a controlled vocabulary
• Prototype under testing with excellent results (entire repository of ICARDA indexed)
• In future, will produce structured RDF files that can be used to link data, like “OpenCalais”
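To make the idea of concept identification against a controlled vocabulary concrete, here is a deliberately naive sketch based on whole-word label matching. It is not AgroTagger's actual method, which the slides do not describe. The concept URIs are taken from the vocabulary-linking slide; the label table, including the assumption that “corn” is an alternative label of the maize concept, and the function name are illustrative.

```python
import re

# Illustrative label table: label -> AGROVOC concept URI.
# Treating "corn" as a label of the maize concept is an assumption.
LABELS = {
    "maize": "http://aims.fao.org/aos/agrovoc/c_12332",
    "corn": "http://aims.fao.org/aos/agrovoc/c_12332",
    "milk": "http://aims.fao.org/aos/agrovoc/c_4826",
    "agroforestry": "http://aims.fao.org/aos/agrovoc/c_207",
}


def tag_concepts(text, labels=LABELS):
    """Count whole-word, case-insensitive label hits per concept URI."""
    found = {}
    for label, uri in labels.items():
        pattern = r"\b" + re.escape(label) + r"\b"
        hits = len(re.findall(pattern, text, flags=re.IGNORECASE))
        if hits:
            found[uri] = found.get(uri, 0) + hits
    return found


tags = tag_concepts("Maize and milk production; corn prices rose.")
```

Because synonyms resolve to the same concept URI, “Maize” and “corn” count towards one concept rather than two strings.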

Page 25: Istic thesaurus ws-keizer_2010-10-22

Page 26: Istic thesaurus ws-keizer_2010-10-22

Page 27: Istic thesaurus ws-keizer_2010-10-22

Page 28: Istic thesaurus ws-keizer_2010-10-22

Live demo: semantic markup

http://viewer.opencalais.com/

http://agropedialabs.iitk.ac.in/Tagger/Agrotagger_text.php

Page 29: Istic thesaurus ws-keizer_2010-10-22

The concept scheme workbench

Page 30: Istic thesaurus ws-keizer_2010-10-22

The workbench

• A web-based working environment for managing the AGROVOC Concept Server
• Facilitates the collaborative editing of multilingual terminology and semantic concept information
• Includes administration and group management features
• Includes workflows for maintenance, validation and quality assurance of the data pool
• The CS is freely accessible to everybody, to facilitate collaborative editing

Page 31: Istic thesaurus ws-keizer_2010-10-22

Group/Action/Status

GROUP
• Non-registered users
• Term editors
• Ontology editors
• Validators
• Publishers
• Administrators

ACTION
• concept-create
• concept-delete
• concept-edit
• term-create
• term-edit
• term-delete
• ...

STATUS
• Proposed by guest
• Proposed
• Revised by guest
• Revised
• Validated
• Published
• Proposed deprecated
• Deprecated

Page 32: Istic thesaurus ws-keizer_2010-10-22

Concept Life Cycle

GUEST <concept-create> → Proposed by guest

VALIDATOR <validates> → Validated

PUBLISHER <publishes> → Published

TERM EDITOR <concept-edit> → Revised

ADMINISTRATOR <validates> → Published

ONTOLOGY EDITOR <concept-delete> → Proposed deprecated

PUBLISHER <validates> → Deprecated
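The life cycle above can be read as a lookup from (group, action) to the resulting status. The following is a minimal sketch under that reading: the table rows are copied from the slide, while the `resulting_status` function and its error handling are illustrative. The slide does not state which status each transition starts from, so the sketch does not model source states.

```python
# (group, action) -> resulting status, as listed on the slide.
TRANSITIONS = {
    ("GUEST", "concept-create"): "Proposed by guest",
    ("VALIDATOR", "validates"): "Validated",
    ("PUBLISHER", "publishes"): "Published",
    ("TERM EDITOR", "concept-edit"): "Revised",
    ("ADMINISTRATOR", "validates"): "Published",
    ("ONTOLOGY EDITOR", "concept-delete"): "Proposed deprecated",
    ("PUBLISHER", "validates"): "Deprecated",
}


def resulting_status(group, action):
    """Return the status a concept gets when `group` performs `action`.

    Raises PermissionError for combinations the table does not list.
    """
    try:
        return TRANSITIONS[(group, action)]
    except KeyError:
        raise PermissionError(f"{group} may not perform {action}") from None


status = resulting_status("GUEST", "concept-create")
```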

Page 33: Istic thesaurus ws-keizer_2010-10-22

Modules

• Home

• Search

• Concept/Term Management

• Relationship Management

• Classification Scheme Management

• Validation

• Consistency Check

• Import/Export

• User/Group Management

• Statistics/Preferences

Page 34: Istic thesaurus ws-keizer_2010-10-22

Search

• by string: the user can specify whether the system should search by exact match, beginning with, contains, or fuzzy

• by URI or term code, or by range of term codes (e.g. between 123 and 9876)

• by classification scheme

• by creation or modification date

• by specific relationship (e.g. search all concepts using “has_pest”)

• by status or language

• by notes/attributes
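The four string-search modes can be sketched as follows. This is an illustration only: the slides do not say how the workbench implements fuzzy matching, so `difflib.SequenceMatcher` with a similarity threshold is a stand-in, and the function name, the mode names, the case-insensitive comparison and the sample labels are all assumptions.

```python
from difflib import SequenceMatcher

MODES = ("exact", "begins", "contains", "fuzzy")


def search_labels(labels, query, mode="exact", fuzzy_threshold=0.8):
    """Return the labels matching `query` under the given mode.

    Comparison is case-insensitive; "fuzzy" keeps labels whose
    similarity ratio to the query reaches `fuzzy_threshold`.
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    q = query.casefold()

    def matches(label):
        text = label.casefold()
        if mode == "exact":
            return text == q
        if mode == "begins":
            return text.startswith(q)
        if mode == "contains":
            return q in text
        return SequenceMatcher(None, q, text).ratio() >= fuzzy_threshold

    return [label for label in labels if matches(label)]


# Hypothetical sample labels.
SAMPLE = ["maize", "corn", "maize flour", "sweet corn"]
begins_with_maize = search_labels(SAMPLE, "maize", mode="begins")
```

A misspelled query such as "maiz" finds nothing in exact mode but still reaches "maize" in fuzzy mode.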

Page 35: Istic thesaurus ws-keizer_2010-10-22

Graph Visualization

• Based on the TouchGraph Java applet

• Visualizes a concept and its relationships with other concepts in a graphical view

Page 36: Istic thesaurus ws-keizer_2010-10-22

Web services

[Diagram: AGROVOC CS Workbench, SKOS triple store, Web services and Other Applications, connected by arrows labelled “maintain”, “access”, “response” and “uses”.]

Page 37: Istic thesaurus ws-keizer_2010-10-22

AGROVOC Web Services

Page 38: Istic thesaurus ws-keizer_2010-10-22

Architecture of the System

Page 39: Istic thesaurus ws-keizer_2010-10-22

System Overview

Layers: Front end, Middleware, Back end

Components:
• Administrative Database (MySQL)
• Protégé Triple Store (MySQL)
• Hibernate Layer
• Protégé OWL API
• Gilead
• Intermediate Layer
• Google Web Toolkit (GWT)
• Graph Visualization
• GWT Incubator
• Web services

Page 40: Istic thesaurus ws-keizer_2010-10-22

Giving it a try…

A demo version of the AWB, with all functionalities, is available to users for testing purposes: http://202.73.13.50:55234/agrovocdevv10d/

Latest stable release, version 1.0 (read/write): http://202.73.13.50:55381/agrovocv10i/

Latest stable release, version 1.0 (read-only; visitors have view privileges only): http://202.73.13.50:55481/agrovocv10i/

Page 41: Istic thesaurus ws-keizer_2010-10-22

…and more: http://aims.fao.org

Page 42: Istic thesaurus ws-keizer_2010-10-22

Thank You!