using linked data to diversify search results: a case study in cultural heritage

20
Using Linked Data to diversify search results a case study in cultural heritage Chris Dijkshoorn Lora Aroyo Guus Schreiber Jan Wielemaker Lizzy Jongma

Upload: chris-dijkshoorn

Post on 07-Jul-2015

548 views

Category:

Science


2 download

DESCRIPTION

EKAW 2014 Short Paper presentation on how to diversify search results using Linked Data.

TRANSCRIPT

Page 1: Using Linked Data to diversify search results: a case study in cultural heritage

Using Linked Data to diversify search results

a case study in cultural heritage Chris Dijkshoorn Lora Aroyo Guus Schreiber Jan Wielemaker Lizzy Jongma

Page 2: Using Linked Data to diversify search results: a case study in cultural heritage

Provide diverse search results to enable exploration

Provide clusters of results with different topics to address ambiguity

Uses of diverse search results

Page 3: Using Linked Data to diversify search results: a case study in cultural heritage

Provide diverse search results to enable exploration

Provide clusters of results with different topics to address ambiguity

Hypothesis External semantics can provide the means to diversify search results while providing context

Uses of diverse search results

Page 4: Using Linked Data to diversify search results: a case study in cultural heritage

Rijksmuseum Amsterdam

‣ ~1,000,000 objects ‣ Dutch Masters like Rembrandt

The collection online

‣ Published as Linked Data ‣ ~550,000 object records ‣ ~250,000 images

Cultural heritage collection

Page 5: Using Linked Data to diversify search results: a case study in cultural heritage

Subject matter

‣ Iconclass ‣ IOC World Bird List

Type and format of artwork

‣ Art & Architecture Thesaurus ‣ Wordnet

Creators

‣ Union List of Artist Names

Links to external datasets

Page 6: Using Linked Data to diversify search results: a case study in cultural heritage

Statistics of datasets linked to collection

Iconclass39,578

concepts

Art and Architecture Thesaurus

38,619 concepts

Union List of Artist Names

113,768 Persons

Wordnet115,424 concepts

IOC World Bird List34,197

concepts

RijksmuseumCollection

543,816 Artworks143

distinct

708,287total

306,872total11,945distinct

2,293alignments

7,977distinct 148

alignments

186,126total

Page 7: Using Linked Data to diversify search results: a case study in cultural heritage

Influence external semantics on search functionality

Linked Data and search result diversity

Page 8: Using Linked Data to diversify search results: a case study in cultural heritage

Investigate how many query terms match literals in triple store

‣ Collect query terms ‣ See if query matches a literal in dataset ‣ Record percentage of literals that match per dataset

Goal: investigate potential of external vocabularies

Experiment: match queries with datasets

Page 9: Using Linked Data to diversify search results: a case study in cultural heritage

Use one month of Rijksmuseum website query logs as input

Split logs into three parts

‣ High frequency 2,393 queries ‣ Medium Frequency 16,963 queries ‣ Low frequency 25,303 queries

Query logs

Page 10: Using Linked Data to diversify search results: a case study in cultural heritage

Query terms that match with a label in the vocabularies

Query Frequency Splits

Perc

enta

ge o

f Que

ry T

erm

s m

atch

ing

a Li

tera

l

0%20

%40

%60

%80

%10

0%

High Split Medium Split Low Split H − M − L

Loaded VocabulariesArt and Architecture ThesaurusWordnetUnion List of Artist NamesIconclassIOC World Bird ListCollection only

Page 11: Using Linked Data to diversify search results: a case study in cultural heritage

Use existing semantic search algorithm1

Investigate the effects of added external semantics on

‣ Overall number of results ‣ Number of found clusters ‣ Path Length of cluster definitions

Experiment: diversifying search results

1 Thesaurus-Based Search in Large Heterogeneous Collections - Jan Wielemaker, Michiel Hildebrand, Jacco van Ossenbruggen and Guus Schreiber - ISWC 2008

Page 12: Using Linked Data to diversify search results: a case study in cultural heritage

Semantic Search Algorithm

Steps algorithm

‣ Match user query with literals in index

‣ Find paths to artwork using graph structure

‣ Use path to create clusters

Page 13: Using Linked Data to diversify search results: a case study in cultural heritage

Semantic Search Algorithm

”Rubens, Peter Paul”dc:

subject”Rubens, Philip”

Steps algorithm

‣ Match user query with literals in index

‣ Find paths to artwork using graph structure

‣ Use path to create clusters

Page 14: Using Linked Data to diversify search results: a case study in cultural heritage

Paths, Clusters and Diversity

skos: altLabel

Peale, Rubens”Rubens Peale”

ulan: siblingOf

Peale, James ulan: teacherOf

Benjamin, West

dc: creator

dc: creator

skos: altLabel”Paolo Rubens” Rubens,

Peter Paul

Use path length and number of clusters as indicators for search result diversity

Page 15: Using Linked Data to diversify search results: a case study in cultural heritage

Number of clusters per query

rma high all high rma med all med rma low all low

24

68

1012

14

Loaded Vocabularies and Query Frequency Splits

Num

ber o

f Clu

ster

s

Page 16: Using Linked Data to diversify search results: a case study in cultural heritage

Path lengths of cluster definitions

rma edm aat wn ulan ic ioc all vocs

Vocabularies

Perc

enta

ge o

f Pat

hs

0%20

%40

%60

%80

%10

0%Path Lengths

6+54321

Page 17: Using Linked Data to diversify search results: a case study in cultural heritage

Linked Data can be used to diversify search results

But not every vocabulary is useful

Hypothesis Search result diversification influenced by

1. the number of distinct links to vocabularies

Conclusions

Page 18: Using Linked Data to diversify search results: a case study in cultural heritage

Linked Data can be used to diversify search results

But not every vocabulary is useful

Hypothesis Search result diversification influenced by

1. the number of distinct links to vocabularies

2. the richness of the internal links between vocabulary objects

Conclusions

Page 19: Using Linked Data to diversify search results: a case study in cultural heritage

Incorporate relevance assessments in evaluation

Create variants graph search algorithm

Run experiments on integrated collections

Future Work

Page 20: Using Linked Data to diversify search results: a case study in cultural heritage

Chris Dijkshoorn [email protected] http://chrisdijkshoorn.nl

Using Linked Data to diversify search results a case study in cultural heritagehttps://github.com/rasvaan/cluster_search_experimental_data/ http://sealinc.ops.few.vu.nl/clustersearch/