Porting terminologies to the Semantic Web


DESCRIPTION

aka : the Semiotic Web. Presentation at ISKO UK Linked Data Event, London, 2010-09-14

TRANSCRIPT

Page 1: Porting terminologies to the Semantic Web

making sense of content TM

ISKO Linked Data Event - London - 2010-09-14 1

Porting terminologies to the Semantic Web (aka: the Semiotic Web)

bernard.vatant@mondeca.com

Page 2: Porting terminologies to the Semantic Web

Mondeca at a glance

Facts and figures
- Established : 1999
- Founder and CEO : Jean Delahousse
- Staff 2010 : 22
- Bernard Vatant has been Senior Consultant for Mondeca since 2000

Products
- Intelligent Topic Manager (vocabularies and knowledge base management)
- CA Manager (content integration through semantic annotation)

Services
- Consulting and training in Semantic Web technologies deployment
- Modeling, data and vocabulary migration and integration

References
- Publication, territorial management, tourism, public sector, health
  - Lexis Nexis, Wolters Kluwer, Thomson, BnF, Documentation Française, OPOCE …
- Participation in many national and European research projects
  - Including DataLift http://datalift.org/ (just about to kick off)
- Ongoing participation in Semantic Web standards and linked data community
  - From Topic Maps (2000-2001) to OWL, SKOS, …
  - In the Cloud : geonames.org, lingvoj.org ontologies

Page 3: Porting terminologies to the Semantic Web

Summary

A semiotic view of terminology
- « Every sign is a thing » : signs (terms) are resources (business objects)
- The semiotic triangle : terms, concepts and referents

Current approaches to term representations
- SKOS-XL, BS 8723, ISO 25964
- The Eurovoc model : a term is a denotation of a concept
- Lexvo.org : a term is a sign defined by string + language
- ISO TC-37 standards (LMF) : only XML schemas, no ontology

Moving forward
- Limits of current approaches
- A strawman « Simple Term System »
- Introducing explicit « meaning » objects (aka : references or significations)

Page 4: Porting terminologies to the Semantic Web

The pervasive Web – quick reminder

Internet (ca. 1970)
- Network of identified, connected and addressable computers
- Technical support : IP addresses

Web 1.0 (ca. 1990)
- Network of identified, connected and addressable resources
- Technical support : URLs, HTTP

Semantic Web (ca. 2010)
- Network of identified, connected and addressable representations
- Technical support : URIs, RDF, content negotiation
- Just about anything can be represented and connected
  - People (Social Web), Devices (Web of Things), Places (GeoSemantic Web), Concepts (Web of Vocabularies) …

« Everything is a Thing »

Everything? Even signs?

Page 5: Porting terminologies to the Semantic Web

Every sign is a thing (& vice versa)

http://fr.wikipedia.org/wiki/Fichier:Impasse_%C3%A0_sens_unique.jpg

Impasse Saint-Quentin

Page 6: Porting terminologies to the Semantic Web

The semiotic triangle : road signs

impasse, cul-de-sac, voie sans issue, no through road, dead end, 死路 …

You have to get out by the way you came in … sometimes there is no way out at all

[Diagram: semiotic triangle with vertices « signifiant », « signifié » and « référent »; edge labels : denotation, representation]

Page 7: Porting terminologies to the Semantic Web

The semiotic triangle : lexical signs (terms)

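L’Arctique est la région entourant le pôle Nord de la Terre à l’intérieur et aux abords du cercle polaire nord (Wikipédia)
[In English : the Arctic is the region surrounding the Earth's North Pole, within and around the Arctic Circle]

‘Arctique’@fr

[Diagram: semiotic triangle with vertices « signifiant », « signifié » and « référent »; edge labels : denotation, representation]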

Page 8: Porting terminologies to the Semantic Web

Sorting out Terms, Concepts and Things

Terms are lexical entities (signifiants)
- Generally used as denotations for concepts or things
- If possible qualified by terminologists
- Expressed in some identified natural language
  - Devil in the details : encoding system, writing system (script)

Concepts are specific representations of « things »
- In a certain view of the world
- For a specific functional purpose
  - Indexing, classification, search, inference

Things are ... just things
- What users care about at the end of the day (people, places, products, ideas …)

Terms, Concepts and Things should all be first-class citizens in the Semantic Web
- Switching from a term-centric to a concept-centric view …
  - Like in SKOS and ISO 25964
- … does not mean that terms and terminology are out of the picture!
  - They simply need to be defined and managed at a different level

Page 9: Porting terminologies to the Semantic Web

Translation into Semantic Web languages

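[Diagram: the semiotic triangle mapped to Semantic Web vocabularies]
- Something (« référent ») : owl:Thing, e.g. http://dbpedia.org/resource/Arctic
- Concept (« signifié ») : skos:Concept, e.g. http://stitch.cs.vu.nl/vocabularies/rameau/ark:/12148/cb11940481m
- Term (« signifiant ») : skosxl:Label, e.g. http://lexvo.org/id/term/fra/Arctique, with skosxl:literalForm ‘Arctique’@fr
- Edge labels : denotes, represents
- Properties shown on the edges : foaf:focus, lvont:means, skosxl:prefLabel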
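
The mapping on this slide can be read as a handful of RDF-style statements. The sketch below is a minimal illustration in plain Python (no RDF library), using the three URIs shown on the slide. The direction given to each property (term to concept for lvont:means, concept to label for skosxl:prefLabel, concept to thing for foaf:focus) is an assumed reading of the diagram, and the helper names are made up for the example.

```python
# Minimal sketch: the semiotic triangle as (subject, predicate, object) tuples.
# The edge directions below are an assumed reading of the slide's diagram.

THING = "http://dbpedia.org/resource/Arctic"  # referent (owl:Thing)
CONCEPT = "http://stitch.cs.vu.nl/vocabularies/rameau/ark:/12148/cb11940481m"  # signified
TERM = "http://lexvo.org/id/term/fra/Arctique"  # signifier (skosxl:Label)

TRIPLES = [
    (TERM, "skosxl:literalForm", ("Arctique", "fr")),  # literal form with language tag
    (TERM, "lvont:means", CONCEPT),                    # the term denotes the concept
    (CONCEPT, "skosxl:prefLabel", TERM),               # the concept's preferred label
    (CONCEPT, "foaf:focus", THING),                    # the concept represents the thing
]


def objects(subject, predicate):
    """Return every object of the triples matching (subject, predicate)."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def referents_of(term):
    """Walk term -> concept -> thing (hypothetical helper)."""
    return [
        thing
        for concept in objects(term, "lvont:means")
        for thing in objects(concept, "foaf:focus")
    ]


print(referents_of(TERM))
```

Keeping the three corners as separate resources is what lets the walk go from a term to a thing without conflating either with the concept in between.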

Page 10: Porting terminologies to the Semantic Web

Concept-centric approach to terms (SKOS)

The concept-centric approach puts concepts at the center of discourse
- Terms are denotations of concepts
- Standalone terms can be considered in theory, but not in practice

Minimal, shallow level of description of terms
- Basic properties : lexical form + language
- No support for proper lexical properties
  - Part of speech, lemma, tokenization, variant
- Basic expressivity for term-to-term relationships
  - skosxl:labelRelation is just an abstract superproperty

Good expressivity of the term-to-concept relationships
- But clearly asserted from a concept viewpoint

No support for context
- Implicit context : the term-concept relationship inside a given concept scheme

Similar approach used by BS 8723 and ISO 25964
- Also used in EUROVOC model with customized extensions

Page 11: Porting terminologies to the Semantic Web

Concept-centric approach to homographs

A term can denote more than one concept
- aka : the homography, ambiguity … issue

Q : Are homograph terms (denoting different concepts) the same resource, or not?
- In other words : should they be given the same URI?

The SKOS-XL approach
- SKOS-XL statement : if two instances of the class skosxl:Label have the same literal form, they are not necessarily the same resource.
- IOW : the existence of distinct terms (distinct URIs) bearing the same literal form in the same language is not forbidden.
  - « table@en » can be the literal form of different terms (different URIs), e.g., denoting different concepts such as « table (furniture) », « table (data base) » …
- SKOS-XL does not enforce this distinction, either
  - Using the same term (same URI) for different concepts is not forbidden
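
To make the homograph question concrete, here is a small sketch in plain Python that groups label resources by (literal form, language) and reports the forms carried by more than one URI. All URIs are hypothetical (example.org) and stand in for skosxl:Label instances; nothing here is prescribed by SKOS-XL itself.

```python
from collections import defaultdict

# Hypothetical label URIs: two distinct label resources share the literal
# form "table"@en, plus one unrelated label.
LABELS = {
    "http://example.org/label/table-furniture": ("table", "en"),
    "http://example.org/label/table-database": ("table", "en"),
    "http://example.org/label/chair": ("chair", "en"),
}


def homograph_groups(labels):
    """Group label URIs by (literal form, language); keep groups of two or more."""
    by_form = defaultdict(list)
    for uri, form in labels.items():
        by_form[form].append(uri)
    return {form: sorted(uris) for form, uris in by_form.items() if len(uris) > 1}


print(homograph_groups(LABELS))
```

Under the alternative modeling (one shared URI for « table@en »), the grouping would come back empty and the ambiguity would only show up in the links from that single label to several concepts.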

Page 12: Porting terminologies to the Semantic Web

Concept-centric model : EUROVOC

The EUROVOC model is built as an extension of SKOS

Subclasses of skosxl:Label
- eu:ThesaurusTerm, eu:PreferredTerm, eu:SimpleNonPreferredTerm …
- Type of term defined by the type of relationship to a concept
- No « standalone definition » of a term : a term is attached to a single concept

Specific relationships between terms
- Translation, Permuted lexical form
- Full name/short name, Acronym/expansion

No lexical (grammatical) level properties
- No POS, lemma, variants …

Homographs are distinct terms
- Hence homographs attached to different concepts
  - Have different URIs …
  - … are not linked whatsoever, except appearing as sibling results of a query …
  - … should not occur since EUROVOC should be a unique name space

Page 13: Porting terminologies to the Semantic Web

A concept representation in EUROVOC, as seen in Mondeca back-office (ITM)

[Screenshot of the ITM back-office; callouts :]
- pref label in current language
- concept attributes
- preferred term in current language
- preferred terms in other languages
- user language choice (25 languages available)
- concept schemes hierarchy (domains and microthesauri)
- related concepts

Page 14: Porting terminologies to the Semantic Web

A concept representation (continued)

[Screenshot, continued; callouts :]
- non-preferred terms in various languages
- broader-narrower hierarchy (display uses terms in current user language)

Page 15: Porting terminologies to the Semantic Web

Term representation level

[Screenshot of a term in the ITM back-office; callouts :]
- lexical form
- term type
- term attributes
- the term « meaning » concept (display uses the preferred term in current user language)
- relationships between terms
- user language choice (25 languages available)

Page 16: Porting terminologies to the Semantic Web

The term-centric (semiotic) approach

As used by Lexvo.org

A term is uniquely defined by a string and a language
- This definition is made functional in the URI structure
- Example : http://lexvo.org/id/term/fra/Arctique

A term can have zero or more declared « meanings »
- Values of the « lvont:means » property

The URI is functional whether there is zero, one or more declared « meanings »

Simple approach, but the number of meanings is anyone's guess
1. http://www.lexvo.org/id/term/eng/hubject
- No meaning found in the data base, but the world is open
2. http://www.lexvo.org/id/term/eng/photosphere
- Two meanings found, linked by a lvont:nearlySameAs relationship
3. http://www.lexvo.org/id/term/eng/table
- How many meanings?
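
Because the term URI is a function of (language, string), it can be computed without looking anything up. The sketch below follows the pattern visible in the example URIs on this slide (three-letter language code, then the string). The function name is made up, and percent-encoding the string is an assumption for forms containing spaces or non-ASCII characters.

```python
from urllib.parse import quote

LEXVO_TERM_BASE = "http://lexvo.org/id/term/"


def lexvo_term_uri(language_code, form):
    """Build a term URI from a three-letter language code and a string.

    Percent-encoding the string is an assumption made for this sketch.
    """
    return LEXVO_TERM_BASE + language_code + "/" + quote(form, safe="")


print(lexvo_term_uri("fra", "Arctique"))
# prints http://lexvo.org/id/term/fra/Arctique
```

The same call works whether or not any meaning has been declared for the term, which is the sense in which the URI is "functional" with zero, one or more meanings.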

Page 17: Porting terminologies to the Semantic Web

What « table@en » means

many more of the same…

Page 18: Porting terminologies to the Semantic Web

ISO TC-37 terminology standards

Built on top of various other (ISO) standards

Define a lot of data models or schemas
- Either UML or XML schemas

Delve into deep, complex lexical details
- Addressing fine-grained terminology management issues

But provide no interoperability with the Semantic Web universe
- Not even as informative annexes

Example : Lexical Markup Framework
- An attempt to produce an OWL representation of the LMF model
- Neither normative nor even OWL-conformant
- Has been sitting useless on the LMF website for two years
- Any feedback? Does anyone really care? http://www.lexicalmarkupframework.org/

Even if published in Semantic Web formats
- Chances of mainstream adoption are weak
- Due to their sheer complexity …

Page 19: Porting terminologies to the Semantic Web

Adding context to the semiotic triangle

http://sw.opencyc.org/2009/04/07/concept/en/Table_PieceOfFurniture

‘table’@en

[Diagram: semiotic triangle with vertices « signifiant », « signifié » and « référent »; edge labels : denotation, representation; extended with a « context » node : Furniture]

Page 20: Porting terminologies to the Semantic Web

Context of meaning in existing approaches

In SKOS and concept-centric models
- The context of the meaning is the Concept Scheme

    <http://id.loc.gov/authorities/sh85131792#concept> a skos:Concept ;
        skos:prefLabel "Table"@en ;
        skos:inScheme <http://id.loc.gov/authorities#topicalTerms> .

- Reads from the viewpoint of the term
  - ‘Table’ is the English preferred term for concept ‘#sh85131792’ in the context of LCSH topical terms

In the purely semiotic approach of Lexvo.org
- The only context is the declared language
- Ambiguity is assumed, but not resolved
- A term description is a bag of possible meanings and translations
- Useful, but not enough

In a nutshell, regarding context
- Concept-centric approach is too restrictive …
- Lexvo.org approach is too open …

Page 21: Porting terminologies to the Semantic Web

Trying to capture context

Context can be more than an implicit skos:ConceptScheme
- A language
- A country, a community
- A document or corpus lexical context
- Any combination of the above …

Actually a context might be any kind of relevant resource
- Including lists of resources

Neither term nor concept should be linked directly to a context
- Need to define « reference » or « meaning » resources
- Linking one term to one concept and one context
- Allowing attachment of metadata (e.g., Dublin Core)
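
One way to picture such a « meaning » resource is as a small record tying one term, one concept and one context together, so that the context can be used to pick among the meanings of an ambiguous term. The sketch below is only an illustration in plain Python: the class and function names are invented, and every example.org URI is hypothetical. The Lexvo and OpenCyc URIs are the ones that appear elsewhere in this talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Meaning:
    """One term linked to one concept in one context (hypothetical names)."""
    term: str
    concept: str
    context: str
    metadata: tuple = ()  # room for (property, value) pairs, e.g. Dublin Core


TABLE_EN = "http://www.lexvo.org/id/term/eng/table"

MEANINGS = [
    Meaning(
        term=TABLE_EN,
        concept="http://sw.opencyc.org/2009/04/07/concept/en/Table_PieceOfFurniture",
        context="http://example.org/context/furniture",       # hypothetical
    ),
    Meaning(
        term=TABLE_EN,
        concept="http://example.org/concept/table-database",  # hypothetical
        context="http://example.org/context/databases",       # hypothetical
    ),
]


def concepts_for(term, context):
    """Return the concepts a term denotes in a given context."""
    return [m.concept for m in MEANINGS if m.term == term and m.context == context]


print(concepts_for(TABLE_EN, "http://example.org/context/furniture"))
```

Neither the term nor the concept carries the context here; only the meaning record does, which is the point made in the last group of bullets.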

Page 22: Porting terminologies to the Semantic Web

Requirements for « STS »

STS = « Simple Terminology System »
- aka : « Simple Terminology Semiotics »

As simple as SKOS is for representation of concepts
- And as extensible

Based on core classes of LMF or any relevant ISO TC-37 model
- Simpler than LMF but extensible to capture all LMF subtleties

Interoperable with concept layer formats (SKOS and SKOS-XL)

As open and robust as the semiotic approach of Lexvo.org

Including representation of context/meanings/references

And of course recommended by a relevant standards body
- Food for another W3C recommendation track?

Page 23: Porting terminologies to the Semantic Web

STS draft model (built upon lexvo ontology)

[Diagram: STS draft model]
- Classes : lvont:Term (shown with skosxl:Label), sts:Meaning, sts:Context, skos:Concept
- sts:Meaning links a term (sts:signifier), a concept (sts:signified) and a context (sts:inContext)
- Context properties : sts:contextProperty (anything), sts:spatialContext (geo:SpatialThing), sts:timeContext (time:Period)
- Term properties : lvont:language (lvont:Language), skosxl:literalForm (rdf:Literal), sts:lexicalProperty (anything)
- Dublin Core metadata : dcterms:*
- Extensions to fit e.g., TC-37 LMF schemas or EUROVOC management specifics …
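
As a rough illustration of the draft, the statements attached to one sts:Meaning node could be generated as follows. This is a sketch in plain Python, not part of the draft: the sts: property names come from the diagram, but the direction of each edge (from the meaning node to the term, concept and context) is an assumed reading, the function name is invented, and the example.org URIs are hypothetical.

```python
def meaning_triples(meaning_uri, term_uri, concept_uri, context_uri, metadata=None):
    """Return (subject, predicate, object) tuples for one sts:Meaning node.

    Edge directions are an assumed reading of the draft diagram.
    """
    triples = [
        (meaning_uri, "rdf:type", "sts:Meaning"),
        (meaning_uri, "sts:signifier", term_uri),
        (meaning_uri, "sts:signified", concept_uri),
        (meaning_uri, "sts:inContext", context_uri),
    ]
    # Optional Dublin Core style metadata, emitted in a stable order.
    for prop, value in sorted((metadata or {}).items()):
        triples.append((meaning_uri, prop, value))
    return triples


STS_EXAMPLE = meaning_triples(
    meaning_uri="http://example.org/meaning/table-furniture",    # hypothetical
    term_uri="http://www.lexvo.org/id/term/eng/table",
    concept_uri="http://sw.opencyc.org/2009/04/07/concept/en/Table_PieceOfFurniture",
    context_uri="http://example.org/context/furniture",          # hypothetical
    metadata={"dcterms:source": "http://example.org/corpus/1"},  # hypothetical
)

for triple in STS_EXAMPLE:
    print(" ".join(triple))
```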

Page 24: Porting terminologies to the Semantic Web

Ready for a standardization track ?