using linked data to diversify search results: a case study in cultural heritage
DESCRIPTION
EKAW 2014 Short Paper presentation on how to diversify search results using Linked Data.TRANSCRIPT
Using Linked Data to diversify search results
a case study in cultural heritage Chris Dijkshoorn Lora Aroyo Guus Schreiber Jan Wielemaker Lizzy Jongma
Provide diverse search results to enable exploration
Provide clusters of results with different topics to address ambiguity
Uses of diverse search results
Provide diverse search results to enable exploration
Provide clusters of results with different topics to address ambiguity
Hypothesis External semantics can provide the means to diversify search results while providing context
Uses of diverse search results
Rijksmuseum Amsterdam
‣ ~1,000,000 objects ‣ Dutch Masters like Rembrandt
The collection online
‣ Published as Linked Data ‣ ~550,000 object records ‣ ~250,000 images
Cultural heritage collection
Subject matter
‣ Iconclass ‣ IOC World Bird List
Type and format of artwork
‣ Art & Architecture Thesaurus ‣ Wordnet
Creators
‣ Union List of Artist Names
Links to external datasets
Statistics of datasets linked to collection
Iconclass39,578
concepts
Art and Architecture Thesaurus
38,619 concepts
Union List of Artist Names
113,768 Persons
Wordnet115,424 concepts
IOC World Bird List34,197
concepts
RijksmuseumCollection
543,816 Artworks143
distinct
708,287total
306,872total11,945distinct
2,293alignments
7,977distinct 148
alignments
186,126total
Influence external semantics on search functionality
Linked Data and search result diversity
Investigate how many query terms match literals in triple store
‣ Collect query terms ‣ See if query matches a literal in dataset ‣ Record percentage of literals that match per dataset
Goal: investigate potential of external vocabularies
Experiment: match queries with datasets
Use one month of Rijksmuseum website query logs as input
Split logs into three parts
‣ High frequency 2,393 queries ‣ Medium Frequency 16,963 queries ‣ Low frequency 25,303 queries
Query logs
Query terms that match with a label in the vocabularies
Query Frequency Splits
Perc
enta
ge o
f Que
ry T
erm
s m
atch
ing
a Li
tera
l
0%20
%40
%60
%80
%10
0%
High Split Medium Split Low Split H − M − L
Loaded VocabulariesArt and Architecture ThesaurusWordnetUnion List of Artist NamesIconclassIOC World Bird ListCollection only
Use existing semantic search algorithm1
Investigate the effects of added external semantics on
‣ Overall number of results ‣ Number of found clusters ‣ Path Length of cluster definitions
Experiment: diversifying search results
1 Thesaurus-Based Search in Large Heterogeneous Collections - Jan Wielemaker, Michiel Hildebrand, Jacco van Ossenbruggen and Guus Schreiber - ISWC 2008
Semantic Search Algorithm
Steps algorithm
‣ Match user query with literals in index
‣ Find paths to artwork using graph structure
‣ Use path to create clusters
Semantic Search Algorithm
”Rubens, Peter Paul”dc:
subject”Rubens, Philip”
Steps algorithm
‣ Match user query with literals in index
‣ Find paths to artwork using graph structure
‣ Use path to create clusters
Paths, Clusters and Diversity
skos: altLabel
Peale, Rubens”Rubens Peale”
ulan: siblingOf
Peale, James ulan: teacherOf
Benjamin, West
dc: creator
dc: creator
skos: altLabel”Paolo Rubens” Rubens,
Peter Paul
Use path length and number of clusters as indicators for search result diversity
Number of clusters per query
rma high all high rma med all med rma low all low
24
68
1012
14
Loaded Vocabularies and Query Frequency Splits
Num
ber o
f Clu
ster
s
Path lengths of cluster definitions
rma edm aat wn ulan ic ioc all vocs
Vocabularies
Perc
enta
ge o
f Pat
hs
0%20
%40
%60
%80
%10
0%Path Lengths
6+54321
Linked Data can be used to diversify search results
But not every vocabulary is useful
Hypothesis Search result diversification influenced by
1. the number of distinct links to vocabularies
Conclusions
Linked Data can be used to diversify search results
But not every vocabulary is useful
Hypothesis Search result diversification influenced by
1. the number of distinct links to vocabularies
2. the richness of the internal links between vocabulary objects
Conclusions
Incorporate relevance assessments in evaluation
Create variants graph search algorithm
Run experiments on integrated collections
Future Work
Chris Dijkshoorn [email protected] http://chrisdijkshoorn.nl
Using Linked Data to diversify search results a case study in cultural heritagehttps://github.com/rasvaan/cluster_search_experimental_data/ http://sealinc.ops.few.vu.nl/clustersearch/