Aligning Thesauri for an integrated Access to Cultural Heritage Collections
Antoine ISAAC (including slides by Frank van Harmelen)
STITCH Project
UDC Conference, June 5th, 2007
Aligning Thesauri for an integrated Access to CH Collections
Background
• CATCH
  • Continuous Access To Cultural Heritage
  • Funded by NWO
  • 10 computer science research projects applied to the Cultural Heritage field
• STITCH
  • SemanTic Interoperability To access Cultural Heritage
  • Exchanging and integrating metadata
Beware: this is research!
Agenda
• The Semantic Interoperability problem
• Demo
• Semantic Web solutions for interoperability
  • Conceptual vocabulary alignment
  • Conceptual vocabulary representation
KB Illustrated Manuscripts
BNF Mandragore
The Semantic Interoperability Problem
• Trend: simultaneous access to different collections
• Problem: conceptual heterogeneity
  • No standard vocabulary/thesaurus
    • “classical ruins” vs. “landscape with ruins”
    • “the Virgin Mary” vs. “Saint Mary”
  • We don’t really want one: different vocabularies for different domains, traditions, tasks
• Practical consequence:
  • Searching for “the Virgin Mary” misses “Saint Mary”
  • Unless we know both vocabularies
Old situation
Vocabulary alignment
• Find semantic correspondences between vocabulary elements
  • “classical ruins” ≈ “landscape with ruins”
  • “the Virgin Mary” = “Saint Mary”
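How such correspondences could be used at search time can be sketched in a few lines of Python. This is only an illustration: the two mini-collections, the object identifiers and the relation names ("exact", "close") are invented for the example and are not taken from the STITCH demonstrator.

```python
# Hypothetical mini-collections: each object is indexed with a term
# from its own vocabulary (identifiers and data are invented).
COLLECTION_A = {"ms-001": "the Virgin Mary", "ms-002": "classical ruins"}
COLLECTION_B = {"img-17": "Saint Mary", "img-42": "landscape with ruins"}

# Alignment: (term in vocabulary A, relation, term in vocabulary B).
ALIGNMENT = [
    ("the Virgin Mary", "exact", "Saint Mary"),
    ("classical ruins", "close", "landscape with ruins"),
]


def expand(term):
    """Return the query term plus every term aligned to it, in either direction."""
    terms = {term}
    for a, _relation, b in ALIGNMENT:
        if a == term:
            terms.add(b)
        elif b == term:
            terms.add(a)
    return terms


def search(term, use_alignment=True):
    """Return sorted ids of objects indexed with the term or an aligned term."""
    wanted = expand(term) if use_alignment else {term}
    hits = []
    for collection in (COLLECTION_A, COLLECTION_B):
        hits += [obj for obj, subject in collection.items() if subject in wanted]
    return sorted(hits)


print(search("the Virgin Mary", use_alignment=False))  # ['ms-001']
print(search("the Virgin Mary"))                       # ['img-17', 'ms-001']
```

Without the alignment, the query only reaches the collection whose vocabulary contains the query term. With it, objects from both collections are returned.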
New situation
Demo
• http://stitch.cs.vu.nl/rp33333/MANDRA-SV-ICE-mandraNewNONE , amphibians
• Wheat
[Screenshots at the end of these slides]
Vocabulary alignment
• Find correspondences between vocabulary elements
  • “klassieke ruïnes” (classical ruins) ≈ “landschap met ruïnes” (landscape with ruins)
  • “maagd Maria” (Virgin Mary) = “Heilige Moeder” (Holy Mother)
• STITCH aim: doing it (semi-)automatically
  • Vocabularies are big
  • They evolve over time
• Using techniques from the Semantic Web research domain
  • Problem comparable to ontology alignment
  • Techniques already investigated there
    • Linguistics, statistics
Automatic alignment techniques
• Lexical
• Structural
• Statistical
• Background knowledge
Lexical alignment
• Labels of entities, textual definitions
[Diagram: concepts labelled “tumor”, “brain”, “Long tumor” and “Long”, with a “More specific than” link]
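A label-based comparison of this kind might look like the sketch below. The word-containment heuristic is an assumption made for illustration, not necessarily the rule used in the project. Real lexical matchers usually add stemming, compound splitting and synonym handling.

```python
def tokens(label):
    """Lowercase a label and split it into a set of word tokens."""
    return set(label.lower().split())


def lexical_relation(label_a, label_b):
    """Guess a relation between two concepts from their labels alone.

    Heuristic (an assumption of this sketch): a label that contains all
    the words of another label plus extra ones names a narrower concept.
    """
    a, b = tokens(label_a), tokens(label_b)
    if a == b:
        return "equivalent"
    if a > b:  # proper superset of words
        return "more specific than"
    if a < b:  # proper subset of words
        return "more general than"
    return None  # labels alone give no evidence


print(lexical_relation("Long tumor", "tumor"))  # more specific than
print(lexical_relation("Tumor", "tumor"))       # equivalent
print(lexical_relation("brain", "tumor"))       # None
```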
Statistical alignment
• Object information (e.g. book indexing)
Statistical alignment: KB collections
(4951 1152 613) Nederlands - Nederlandse taalkunde
(280 714 243) Diabetes mellitus - suikerziekte
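One simple way to turn object information into alignment evidence is to measure how much the sets of objects indexed with two concepts overlap. The sketch below uses the Jaccard coefficient as one possible measure. The book identifiers are invented, and the sketch does not attempt to interpret the three counts listed above.

```python
def jaccard(objects_a, objects_b):
    """Share of objects indexed with both concepts among those indexed with either."""
    union = objects_a | objects_b
    if not union:
        return 0.0
    return len(objects_a & objects_b) / len(union)


# Hypothetical book ids indexed with a concept from each vocabulary.
index_voc1 = {"Diabetes mellitus": {"b1", "b2", "b3", "b4"}}
index_voc2 = {"suikerziekte": {"b2", "b3", "b4", "b5"}, "taalkunde": {"b9"}}

# Score every cross-vocabulary pair; high scores are candidate correspondences.
for concept_1, books_1 in index_voc1.items():
    for concept_2, books_2 in index_voc2.items():
        print(concept_1, "/", concept_2, jaccard(books_1, books_2))
```

Pairs whose score exceeds a chosen threshold would then be proposed as candidate correspondences, ideally for review by a vocabulary expert.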
Alignment using shared background knowledge
• Using a shared conceptual reference to find links
[Diagram: thesaurus 1 and thesaurus 2, each linked to the shared background knowledge]
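The idea can be sketched as follows, assuming each thesaurus has already been anchored to a shared reference vocabulary. The `ref:` identifiers and the anchor tables are hypothetical.

```python
# Hypothetical anchors: each thesaurus term is linked to a concept
# of a shared reference vocabulary (all identifiers are invented).
ANCHORS_1 = {"the Virgin Mary": "ref:mary", "classical ruins": "ref:ruins"}
ANCHORS_2 = {"Saint Mary": "ref:mary", "wheat": "ref:wheat"}


def align_via_reference(anchors_1, anchors_2):
    """Pair terms from two thesauri that are anchored to the same reference concept."""
    by_reference = {}
    for term, ref in anchors_2.items():
        by_reference.setdefault(ref, []).append(term)
    links = []
    for term_1, ref in anchors_1.items():
        for term_2 in by_reference.get(ref, []):
            links.append((term_1, term_2))
    return links


print(align_via_reference(ANCHORS_1, ANCHORS_2))
# [('the Virgin Mary', 'Saint Mary')]
```

A richer version could also follow the hierarchy of the reference vocabulary to derive broader or narrower links, not only exact ones.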
Alignment: no universal solution
• No single technique gives an ideal solution
• Different techniques have to be selected/combined, depending on the application case
  • Poor vs. rich semantic structure
  • Extensive vs. limited lexical coverage
  • Existence of collections described by several vocabularies
• Alignment is a difficult research problem
Representing Vocabularies
Many different models and formats to represent vocabularies
• Need for standard formats to develop standardized tools and methods
  • Alignment process
  • Browsing/information retrieval tools using vocabularies
• Need to represent features commonly used by these tools
  • Especially lexical information and semantic links
SKOS (Simple Knowledge Organisation System)
• World Wide Web Consortium (W3C)
• Model to represent simple conceptual vocabularies (thesauri, classification schemes) on the Semantic Web
  • Comparable to Dublin Core, for conceptual vocabularies
• SKOS offers building blocks to create XML/RDF data
  • Concepts and ConceptSchemes
  • Lexical properties (prefLabel, altLabel)
  • Semantic relations (broader, related)
  • Notes (scopeNote, definition)
SKOS: Small UDC Example
skos:Concept http://www.udcc.org/udc/class_512
    skos:prefLabel 512@zxx
    skos:prefLabel Algebra@en
    skos:broader http://www.udcc.org/udc/class_51
• Beware: this is a standard, not everything can be represented!
  • E.g. for UDC, difficult to represent all types of auxiliaries
  • Is -2 Evidence of religion a standard concept?
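For readers who want to see the UDC example as actual RDF, the sketch below writes it out in Turtle syntax using only the Python standard library. The helper name `concept_to_turtle` is invented for this illustration, and a real project would more likely rely on an RDF library. The URIs and labels are those of the example above.

```python
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."


def concept_to_turtle(uri, pref_labels, broader=None):
    """Serialize one concept as Turtle.

    pref_labels maps a language tag to a label; broader is an optional URI.
    Labels are assumed to need no escaping (no quotes or backslashes).
    """
    lines = [f"<{uri}> a skos:Concept"]
    for lang, label in pref_labels.items():
        lines.append(f'    skos:prefLabel "{label}"@{lang}')
    if broader:
        lines.append(f"    skos:broader <{broader}>")
    # Statements about the same subject are separated by ";" and closed by ".".
    return SKOS_PREFIX + "\n\n" + " ;\n".join(lines) + " .\n"


print(concept_to_turtle(
    "http://www.udcc.org/udc/class_512",
    {"zxx": "512", "en": "Algebra"},
    broader="http://www.udcc.org/udc/class_51",
))
```

The language tag `zxx` marks the notation "512" as having no linguistic content, as in the example above.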
Conclusion: New opportunities for making knowledge accessible
• Integration of collections at the semantic level
  • Semantic integration and vocabulary alignment
• Representation and publication of conceptual vocabularies
  • SKOS is an open, web-compatible standard
• Semantic Web research can help Cultural Heritage
  • Vision: a global network of interconnected collections and vocabularies that can be exploited by standard tools?
  • Or somewhere in-between present situation and the vision
Discussion: UDC and Semantic Interoperability?
• UDC as pivot language (spine) for multilingual access
  • Ideal for multilingual scenarios
  • Compatible with common information needs
• “Front-office” scenario
  • Aligning initial vocabularies to UDC
  • Using UDC in the access system
  • MSAC
    • Multilingual Subject Access to Catalogues of National Libraries
    • UDC as a searching/browsing means, with other vocabularies
Discussion: UDC and Semantic Interoperability?
• “Back-office” scenario?
  • UDC as a background resource for automatic pairwise alignment between the initial vocabularies
  • Multilingual information, rich semantic structure
• Both scenarios require more accessible UDC
  • And experimentation…
Thanks!
Links
• STITCH http://stitch.cs.vu.nl
• Demo collections
  • BNF Mandragore http://mandragore.bnf.fr
  • KB illuminated manuscripts http://www.kb.nl/manuscripts/
• Library-originated integration projects:
  • MSAC search interface http://sigma.nkp.cz
  • MACS project http://macs.cenl.org
• Semantic web links
  • Semantic Web at W3C http://www.w3.org/2001/sw/
  • SKOS http://www.w3.org/2004/02/skos/
• Semantic Web projects dealing with Cultural Heritage
  • MuseumFinland http://www.museosuomi.fi/
  • eCulture http://e-culture.multimedian.nl/
Demo (1)
Subject vocabulary, collection 1
Subjects
Demo (2)
Hierarchical path from root to selected subject
Possible specialization for selected subject
Demo (3)
Document from Collection 2
Semantic alignment of subjects activated
Demo (4)
Subject from voc2 aligned to voc1: “amphibians”