Accessing Cultural Heritage Collections using Semantic Web Techniques
Antoine ISAAC
STITCH Project
SIKS Semantic Web Seminar, Utrecht
April 11th, 2007
Background
• CATCH @ NWO
  • Continuous Access To Cultural Heritage
  • 10 computer science projects applied to the CH field
  • Personalization of access, image/text/audio analysis
  • Integration of projects in CH institutes (museums, archives)
• STITCH
  • SemanTic Interoperability To access Cultural Heritage
  • Exchanging and integrating metadata
  • Vrije Universiteit, Koninklijke Bibliotheek & Max Planck Institute
Agenda
• Cultural Heritage and Semantic Web
• Two important issues
  • Representing Cultural Heritage vocabularies on the Semantic Web
  • Vocabulary alignment
• Demo
Some Needs for CH Collections
• Representation of objects and knowledge about them
  • Pointing at collection artifacts: books…
  • Describing them: creating metadata
    • Specific metadata structures (metadata schemes)
    • Controlled expert vocabularies (e.g. thesauri)
• Accessing artifacts using metadata
  • E.g. search using information contained in thesauri
KB Illustrated Manuscripts – Iconclass vocabulary
KB Illustrated Manuscripts
Some Needs for CH Collections (2)
• Communicating data to the outside world
  • Web portals
• Integrating different collections
  • Virtual collections
  • The European Library, http://www.theeuropeanlibrary.org
  • Geheugen van Nederland, http://www.geheugenvannederland.nl
(Biased) Semantic Web
• Pointing at resources: documents, knowledge objects
• Enabling structured assertions
  • Metadata about entities present on the Web
• Using vocabularies with defined semantics
  Ontologies: formal definitions of shared conceptual vocabularies
  RDF Schema / OWL
<owl:Class rdf:about="#Bird">
  <owl:disjointWith>
    <owl:Class rdf:about="#Mammals"/>
  </owl:disjointWith>
  <rdfs:subClassOf>
    <owl:Class rdf:ID="Animals"/>
  </rdfs:subClassOf>
</owl:Class>
<Bird rdf:about="#tweety"/>
(Biased) Semantic Web
• Web-based resources allow division/sharing of
  • document
  • vocabulary
  • metadata
[Diagram: the statement (doc3, hasSubject, Amsterdam), whose parts have different owners & locations: http://www.kb.nl/eDepot, http://www.geo.org/voc/, http://www.ned.nl/doc3]
Cultural Heritage Collections and Semantic Web
• Categorizing/classifying things
• Structuring descriptions
• Web-based approach
Semantic Web techniques are good candidates for representing and exploiting Cultural Heritage metadata
Important line of research
• Long-term projects
  • MuseumFinland, http://www.museosuomi.fi/
  • eCulture, http://e-culture.multimedian.nl/
• Common portals to (many) collections
• Exploiting the data found in the original systems
  • Metadata content: place, date, creator…
  • Semantics of vocabularies used to create this information
    • E.g. hierarchical information
    • “A picture featuring a crow features a bird”
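The crow/bird example can be sketched in a few lines of Python. This is only an illustration under assumptions: the tiny `BROADER` table and the picture annotations are invented for the example, and a real portal would read the hierarchy from the vocabulary itself.

```python
# Hypothetical mini-vocabulary: each concept points to its broader concept.
BROADER = {"crow": "bird", "sparrow": "bird", "bird": "animal"}

# Hypothetical annotations: picture id -> concepts used to describe it.
ANNOTATIONS = {
    "pic1": {"crow"},
    "pic2": {"sparrow", "tree"},
    "pic3": {"cat"},
}


def ancestors(concept):
    """Return the concept followed by all of its broader concepts."""
    found = [concept]
    while found[-1] in BROADER:
        found.append(BROADER[found[-1]])
    return found


def search(query):
    """Return ids of pictures annotated with `query` or with a narrower concept."""
    return sorted(
        pic
        for pic, concepts in ANNOTATIONS.items()
        if any(query in ancestors(c) for c in concepts)
    )


print(search("bird"))
```

A search for "bird" then also returns the picture annotated only with "crow", which a plain keyword match would miss.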
Agenda
• Cultural Heritage and Semantic Web
• Two important issues
  • Representing Cultural Heritage vocabularies on the Semantic Web
  • Vocabulary alignment
• Demo
Representing CH vocabularies on the Semantic Web - Similarities
• Both ontologies and thesauri bring concept hierarchies
• giving the intended meaning of a vocabulary through links between its items
• “concept/term” ≈? owl:Class
• “broader” ≈? rdfs:subClassOf
• “scope notes” ≈? rdfs:comment
Representing CH vocabularies on the Semantic Web - Problems
• Thesauri designed for humans, no formal interpretation
• How to interpret a thesaurus in RDFS/OWL:
  • If “(Story of) Hercules” is a class, what are its instances?
  • Is “Hercules shooting Nessus” a subclass of “Love-affairs of Hercules”?
  Thesaurus hierarchy: subsumption, mereological relation, …
Representing CH vocabularies on the Semantic Web – Different approaches
• Ontologising
  • Cleaning thesaurus by distinguishing roles, kinds, etc.
  • Cleaning the hierarchical links
• Representing knowledge found in sources as such
  • Informal knowledge represented in RDF/OWL formal framework
SKOS
• Simple Knowledge Organization System
  • (Future) W3C standard
• Model to represent controlled and structured vocabularies on the Semantic Web
  • Compatible with community needs
  • Core model for representing thesauri, classification schemes, etc.
SKOS
• Building blocks (ontology) to create XML/RDF data about controlled vocabularies
  • Classes Concept and ConceptScheme
  • Lexical properties
    • prefLabel
    • altLabel
  • Semantic properties
    • broader, narrower
    • related
  • Properties for notes and comments
    • scopeNote
    • definition
SKOS: Brinkman Trefwoorden (KB)
075607204 geneeskunde
  RT geneesmiddelen
  NT kindergeneeskunde
075607220 geneesmiddelen
  UF medicijnen
075611791 kindergeneeskunde
  BT geneeskunde
  noot: kinderen ouder dan 12 vallen niet onder kindergeneeskunde
medicijnen
  USE geneesmiddelen
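As a rough sketch of how such records could be turned into SKOS statements, the Python below maps the thesaurus relation codes onto SKOS properties. The record layout, the `RELATION_TO_SKOS` table and the `to_triples` helper are assumptions made for this illustration, not the conversion actually used for the Brinkman thesaurus; the `bk:` prefix stands for http://www.kb.nl/brinkman/.

```python
# Assumed mapping from thesaurus relation codes to SKOS properties.
RELATION_TO_SKOS = {
    "BT": "skos:broader",
    "NT": "skos:narrower",
    "RT": "skos:related",
}

# The records above, typed in by hand. The "medicijnen USE geneesmiddelen"
# entry is left out because it mirrors the UF line of geneesmiddelen.
RECORDS = [
    {"id": "075607204", "term": "geneeskunde",
     "links": [("RT", "geneesmiddelen"), ("NT", "kindergeneeskunde")]},
    {"id": "075607220", "term": "geneesmiddelen",
     "links": [("UF", "medicijnen")]},
    {"id": "075611791", "term": "kindergeneeskunde",
     "links": [("BT", "geneeskunde")],
     "note": "kinderen ouder dan 12 vallen niet onder kindergeneeskunde"},
]


def to_triples(records):
    """Turn thesaurus records into (subject, property, object) triples."""
    ids = {r["term"]: "bk:" + r["id"] for r in records}
    triples = []
    for r in records:
        subject = ids[r["term"]]
        triples.append((subject, "skos:prefLabel", r["term"]))
        for code, target in r["links"]:
            if code == "UF":
                # A non-preferred term becomes an alternative label.
                triples.append((subject, "skos:altLabel", target))
            else:
                triples.append((subject, RELATION_TO_SKOS[code], ids[target]))
        if "note" in r:
            triples.append((subject, "skos:scopeNote", r["note"]))
    return triples


for triple in to_triples(RECORDS):
    print(triple)
```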
SKOS: Brinkman Trefwoorden (KB)
skos: = http://www.w3.org/2004/02/skos/core#
bk: = http://www.kb.nl/brinkman/
bk:075611791
  skos:prefLabel "kindergeneeskunde"
  skos:scopeNote "kinderen ouder dan 12 vallen niet onder kindergeneeskunde"
  skos:broader bk:075607204
bk:075607204
  skos:prefLabel "geneeskunde"
  skos:related bk:075607220
bk:075607220
  skos:prefLabel "geneesmiddelen"
  skos:altLabel "medicijnen"
SKOS: Brinkman Trefwoorden (KB)
<skos:Concept rdf:about="http://www.kb.nl/brinkman/bk075607204">
  <skos:prefLabel>geneeskunde</skos:prefLabel>
  <skos:related rdf:resource="http://www.kb.nl/brinkman/bk075607220"/>
</skos:Concept>
<skos:Concept rdf:about="http://www.kb.nl/brinkman/bk075607220">
  <rdf:type rdf:resource="&skos;Concept"/>
  <skos:prefLabel>geneesmiddelen</skos:prefLabel>
  <skos:altLabel>medicijnen</skos:altLabel>
</skos:Concept>
<skos:Concept rdf:about="http://www.kb.nl/brinkman/bk075611791">
  <rdf:type rdf:resource="&skos;Concept"/>
  <skos:prefLabel>kindergeneeskunde</skos:prefLabel>
  <skos:broader rdf:resource="http://www.kb.nl/brinkman/bk075607204"/>
  <skos:scopeNote>kinderen ouder dan 12 vallen niet onder kindergeneeskunde</skos:scopeNote>
</skos:Concept>
Agenda
• Cultural Heritage and Semantic Web
• Two important issues
  • Representing Cultural Heritage vocabularies on the Semantic Web
  • Vocabulary alignment
• Demo
Cultural Heritage Interoperability Problems
• Problem: integrating different databases/metadata schemes/vocabularies
• Syntactic interoperability can be solved
  • Common format: XML (RDF)
  • Common vocabulary model (SKOS)
• How about conceptual heterogeneity?
The semantic interoperability problem
• There is no standard thesaurus
  • We don’t really want one
  Different vocabularies for different expertise domains, traditions, tasks
• Consequence:
  • “klassieke ruïnes” (classical ruins) vs. “landschap met ruïnes” (landscape with ruins)
  • “maagd Maria” (Virgin Mary) vs. “Heilige Moeder” (Holy Mother)
• Practical problem:
  • Searching for “Heilige Moeder” misses “maagd Maria”
  • Unless we know both vocabularies
Old situation
Vocabulary alignment
• STITCH aim: find correspondences between vocabulary elements
  • “klassieke ruïnes” ≈ “landschap met ruïnes”
  • “maagd Maria” = “Heilige Moeder”
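Once such correspondences exist, a search can be expanded across vocabularies. The Python below is a minimal sketch under assumptions: the two small collections and the `expand` and `search` helpers are invented for the example, and both kinds of correspondence (≈ and =) are simply treated as "also search for the other term".

```python
# Correspondences from the slide: (term in vocabulary A, relation, term in vocabulary B).
ALIGNMENT = [
    ("klassieke ruïnes", "≈", "landschap met ruïnes"),
    ("maagd Maria", "=", "Heilige Moeder"),
]

# Hypothetical collections, each indexed with its own vocabulary.
COLLECTION_A = {"a1": {"maagd Maria"}, "a2": {"klassieke ruïnes"}}
COLLECTION_B = {"b1": {"Heilige Moeder"}, "b2": {"landschap met ruïnes"}}


def expand(term):
    """Return the term plus every term aligned with it."""
    terms = {term}
    for left, _relation, right in ALIGNMENT:
        if term == left:
            terms.add(right)
        elif term == right:
            terms.add(left)
    return terms


def search(term):
    """Search both collections for the term and its aligned terms."""
    wanted = expand(term)
    hits = []
    for collection in (COLLECTION_A, COLLECTION_B):
        for item, subjects in collection.items():
            if subjects & wanted:
                hits.append(item)
    return sorted(hits)


print(search("Heilige Moeder"))
```

With the alignment in place, the query for “Heilige Moeder” also reaches the item indexed with “maagd Maria”.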
New situation
Automatic alignment techniques
• Lexical
  Labels of entities and textual definitions
• Structural
  Structure of the formal definitions of entities, position in the hierarchy
• Statistical
  Object information (e.g. book indexing)
• Background knowledge
  Using a shared conceptual reference to find links
[Diagram labels: brain, Long tumor, tumor, Long]
Lexical alignment
• Use preferred labels, synonyms, notes
• Heuristic methods to discover equivalence and specialization relations
Example: “Funeral of Patroclus” is more specific than “Patroclus”
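One very simple heuristic of this kind compares the words of two labels. The Python below is a sketch of that idea only; the `tokens` and `compare` helpers are hypothetical, and this crude word-containment rule is not claimed to be the method used in STITCH (it would produce false matches on many real labels).

```python
import re


def tokens(label):
    """Lower-case a label and split it into a set of word tokens."""
    return set(re.findall(r"\w+", label.lower()))


def compare(label_a, label_b):
    """Guess the relation of label_a to label_b from their word tokens."""
    a, b = tokens(label_a), tokens(label_b)
    if not a or not b:
        return None
    if a == b:
        return "equivalent"
    if b < a:
        # label_a contains every word of label_b plus some extra words.
        return "more specific than"
    if a < b:
        return "more general than"
    return None


print(compare("Funeral of Patroclus", "Patroclus"))
```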
Automatic Alignment Techniques
• Lexical
  Labels of entities and textual definitions
• Structural
  Structure of the formal definitions of entities, position in the hierarchy
• Statistical
  Object information (e.g. book indexing)
• Shared background knowledge
  Using a conceptual reference to deduce correspondences
[Diagram labels: brain, Long tumor, tumor, Long]
Statistical alignment
Statistical approach: Koninklijke Bibliotheek case
• Situation: 2 overlapping collections indexed with different thesauri
• Comparison means: measuring overlap between concepts from the thesauri
  • Using the sets of books indexed by these concepts
• Results
  1: 9132.9 Schilderijen - schilderkunst
  2: 8088.5 Kwaliteitszorg - kwaliteitsmanagement
  3: 6232.7 Personeelsmanagement - personeelsbeleid
  ...
  17: 3421.8 Diabetes mellitus - suikerziekte
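The overlap idea can be illustrated with a small Python sketch. Note the assumptions: the book ids are invented, and the Jaccard coefficient is used here as a stand-in because it is easy to read; the scores listed in the results clearly come from a different measure, which the slides do not name.

```python
from itertools import product

# Hypothetical indexing data: concept -> set of ids of books indexed with it.
THESAURUS_A = {
    "Schilderijen": {"b1", "b2", "b3", "b4"},
    "Diabetes mellitus": {"b7", "b8"},
}
THESAURUS_B = {
    "schilderkunst": {"b1", "b2", "b3", "b5"},
    "suikerziekte": {"b7", "b8", "b9"},
}


def jaccard(books_a, books_b):
    """Share of books indexed by both concepts among books indexed by either."""
    union = books_a | books_b
    return len(books_a & books_b) / len(union) if union else 0.0


def rank_pairs(voc_a, voc_b):
    """Score every concept pair and return the overlapping ones, best first."""
    scored = [
        (jaccard(voc_a[a], voc_b[b]), a, b)
        for a, b in product(voc_a, voc_b)
    ]
    return sorted((s for s in scored if s[0] > 0), reverse=True)


for score, a, b in rank_pairs(THESAURUS_A, THESAURUS_B):
    print(f"{score:.2f} {a} - {b}")
```

Pairs of concepts that index largely the same books end up at the top of the list and become candidate correspondences.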
Agenda
• Cultural Heritage and Semantic Web
• Two important issues
  • Representing Cultural Heritage vocabularies on the Semantic Web
  • Vocabulary alignment
• Demo
Demo
• KB Illuminated Manuscripts
• French National Library Mandragore Manuscripts
Manuscripts, 2nd Collection: BNF Mandragore
Demo
• http://stitch.cs.vu.nl/rp33333/MANDRA-SV-ICE-mandraNewNONE, amphibians
• http://stitch.cs.vu.nl/rp33333/MANDRA-SV-MANDRA-mandraNewNONE, wheat
Conclusion: Semantic Web can help Cultural Heritage
• Representation of collections and associated expert vocabularies
• Semantic integration through correspondences between different vocabularies
New opportunities for exploiting cultural heritage information
Thanks!
Links
• Semantic Web at Vrije Universiteit
  • http://www.cs.vu.nl/ai/kr/
  • http://www.cs.vu.nl/bi/
• SKOS
  • http://www.w3.org/2004/02/skos/
• Other Cultural Heritage and Semantic Web projects
  • MuseumFinland, http://www.museosuomi.fi/
  • eCulture, http://e-culture.multimedian.nl/