DBtrends: Exploring Query Logs for Ranking RDF Data. AKSW. Edgard Marx, Amrapali Zaveri, Diego Moussallem, Sandro Rautenberg. 12th International Conference on Semantic Systems.

Post on 16-Apr-2017


DBtrends

Exploring Query Logs for Ranking RDF Data

AKSW

Edgard Marx, Amrapali Zaveri,

Diego Moussallem, Sandro Rautenberg

12th International Conference on Semantic Systems

Outline

• Motivation

• Background

• Ranking using Query Logs

• Evaluation

• Results

• Discussion

• Conclusion

• Future Works

Motivation

• Personal Data

• Enterprise Data

• Open Data

We Have Data

• "The size of LOD by 2014 was 31 billion triples" (http://linkeddatacatalog.dws.informatik.uni-mannheim.de/state/)

• "Facebook users generate 2.7 billion Like actions per day and 300 million new photos are uploaded daily" (Josh Constine, 2012, techcrunch.com)

• "Google Processing 20,000 Terabytes A Day, And Growing" (Erick Schonfeld, 2008, techcrunch.com)

Not all of the data is relevant

Ranking


Scenarios

• Search

• Machine Learning

• Link Discovery

Background

Resource Description Framework (RDF)

Concrete

E=MC²

Abstract

Web of Data

Things


Web of Data

• Semantic Search

• Entity Search

• Question Answering

• Named Entity Recognition

• Link Discovery

• Machine Learning

Use RDF Data

E=MC²

Ranking Functions (Types)

"Give me all persons"

Processing & Ranking: Retrieve, Persons, Sort, Answer

• Query dependent

• Query independent

Ranking RDF Data

• 1999: Page et al.

• 2001: Lee et al.

• 2011: Cheng et al. (Property)

• 2014: Thalhammer et al.

Web of Data

Benchmarks

DBtrends Benchmark (Marx, 2016)

• 60 users from different countries (USA, India)

• 9 entity ranking functions applied to the DBpedia knowledge base

• Users sort relevant classes, properties and entities extracted from the top twenty entities belonging to the top four classes

• Tasks were executed using Amazon Mechanical Turk

Previous Benchmarks

• Not publicly available

• Evaluate performance of 30 profiles

Ranking using Query Logs

Why use query logs?

• Query logs provide relevant information about users' preferences

• They refer to real-world entities

Questions

• How to map real-world entities to the Web of Data?

• How to measure their relevance?

• Where to find a good and trustworthy query log?

How to map real-world resources?

• Rocha et al. (2004)

• Ding et al. (2005)

• Hogan et al. (2006)

• Alsarem et al. (2015)

How to measure the resource's relevance?

• Users search more often for things that are relevant

• Query logs register how often something is searched

• Query logs can be used to better estimate a resource's relevance by looking at how often it is searched
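
The idea in the bullets above, that search frequency can serve as a relevance estimate, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `rank_by_search_frequency`, the in-memory `log` list and the exact-match lookup of labels are all hypothetical and are not taken from the DBtrends library.

```python
from collections import Counter


def rank_by_search_frequency(query_log, labels):
    """Rank resources by how often their label occurs in a query log.

    query_log: iterable of raw query strings (toy stand-in for a real log).
    labels: mapping from resource identifier to its label.
    """
    # Normalise queries so "New York" and "new york " count as the same search.
    counts = Counter(q.strip().lower() for q in query_log)
    scores = {res: counts[label.strip().lower()] for res, label in labels.items()}
    # Most-searched first; ties are broken by identifier for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy data, invented for illustration only.
log = ["New York", "new york", "Leipzig", "New York ", "Berlin", "berlin"]
labels = {
    "dbr:New_York_City": "New York",
    "dbr:Berlin": "Berlin",
    "dbr:Leipzig": "Leipzig",
}
ranking = rank_by_search_frequency(log, labels)
```

Exact label matching is the simplest possible mapping from queries to resources; the mapping approaches cited on the earlier slide would replace that step in practice.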

Where to find a good and trustworthy query log?

• Public API

• Filters

• Geographic: Country, State, City

• Period: Day, Week, Month, Year

DBtrends Ranking Function

Figure: Trends, dbr:New_York_City, "New York", dbo:City, dbo:Place; values 36, 18 and 9; step markers 1 to 4.

• First, the labels of the entities are extracted and used to acquire the search history in query logs, e.g. Google Trends (1-2)

• Thereafter, the entity ranks are used as a base to propagate the rank to the classes (3-4)
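
The two steps described for the DBtrends ranking function might be sketched as follows. Everything here is an assumption made for illustration: the names `entity_scores` and `class_scores` are hypothetical, the `search_interest` dictionary stands in for a query-log source such as Google Trends (no real API is called), and propagating by summation is a guess, since the slides do not state how ranks are propagated to classes.

```python
from collections import defaultdict


def entity_scores(labels, search_interest):
    """Steps 1-2: score each entity by the search interest of its label."""
    # Labels that never appear in the query-log source get a score of 0.
    return {entity: search_interest.get(label, 0) for entity, label in labels.items()}


def class_scores(scores, types):
    """Steps 3-4: propagate entity scores to their classes.

    Summation is an ASSUMPTION; the slides only say ranks are propagated.
    """
    totals = defaultdict(int)
    for entity, score in scores.items():
        for cls in types.get(entity, ()):
            totals[cls] += score
    return dict(totals)


# Toy data, invented for illustration only.
labels = {
    "dbr:New_York_City": "New York",
    "dbr:Leipzig": "Leipzig",
    "dbr:Central_Park": "Central Park",
}
search_interest = {"New York": 36, "Leipzig": 4, "Central Park": 10}
types = {
    "dbr:New_York_City": ["dbo:City", "dbo:Place"],
    "dbr:Leipzig": ["dbo:City", "dbo:Place"],
    "dbr:Central_Park": ["dbo:Place"],
}

entities = entity_scores(labels, search_interest)
classes = class_scores(entities, types)
```

With this toy data, dbo:Place accumulates the scores of all three entities and dbo:City only those of the two cities. Other propagation schemes (averaging, taking the maximum) would slot into `class_scores` without changing the first step.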

Evaluation

Entity Ranking Functions

• DBtrends

• MIXED-RANK

• DB-IN

• DB-OUT

• DB-RANK

• PAGE-IN

• PAGE-OUT

• PAGE-RANK

• E-PAGE-IN

• SEO-PA

• SHARED-LINKS

Property/Class Ranking Functions

Property

• Relin

• RandomRank

• Instances

• Instances

Class

• Instances

• Instances

Results

Entity

Figure: entity ranking results for MIXED-RANK, PAGE-RANK, E-PAGE-IN, SHARED-LINKS, SEO-PA, DB-OUT, PAGE-IN, DBtrends, PAGE-OUT, DB-IN and DB-RANK.

Discussion

Entity

• Functions that take external information into account provide more insight into a resource's relevance

• RDF links reflect natural connections rather than a resource's relevance

Discussion

Entity

• There is no pattern in the impact distribution of query logs

• Queries can, but do not necessarily, help to improve a ranking function

• Internal agreement ~63%

Results

Property

• RandomRank

• Relin

• Instances

• Instances

Class

• Instances

• Instances

Discussion

Property

• Internal agreement ~37%

• Ranks are very sparse

• Not conclusive

Discussion

Class

• Internal agreement ~67%

Discussion

Class

A simple sort can be very effective

• dbo:PopulatedPlace

• dbo:Settlement

• dbo:Place

• owl:Thing
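
As the summary slide puts it, a simple sort by the number of instances can be effective for ranking classes. A minimal sketch of such a sort is given below, assuming the type assertions are available as (entity, class) pairs. The function name `rank_classes_by_instances` and the toy pairs are hypothetical; the counts are not actual DBpedia figures, and the sort direction (most instances first) is also an assumption.

```python
from collections import Counter


def rank_classes_by_instances(type_assertions):
    """Order classes by how many distinct entities are typed with them."""
    # De-duplicate pairs so a repeated assertion is not counted twice.
    counts = Counter(cls for _entity, cls in set(type_assertions))
    # Largest classes first; ties are broken by class name for determinism.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Toy (entity, class) pairs, invented for illustration only.
type_assertions = [
    ("dbr:Leipzig", "dbo:Settlement"),
    ("dbr:Leipzig", "dbo:PopulatedPlace"),
    ("dbr:Leipzig", "dbo:Place"),
    ("dbr:Leipzig", "owl:Thing"),
    ("dbr:Saxony", "dbo:PopulatedPlace"),
    ("dbr:Saxony", "dbo:Place"),
    ("dbr:Saxony", "owl:Thing"),
    ("dbr:Central_Park", "dbo:Place"),
    ("dbr:Central_Park", "owl:Thing"),
    ("dbr:Albert_Einstein", "owl:Thing"),
]
class_ranking = rank_classes_by_instances(type_assertions)
```

Against a live knowledge base the same counts would typically come from an aggregate query rather than an in-memory list; only the sort itself is shown here.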

Discussion

Caveats

• Confidence in executing the tasks: Indians 90%, Americans 60%

• Ranks produced by Indians were more sparse

• Abstract entities appear before entities

Summary

• Entity ranking functions produce better results when considering external information

• A simple sort by the number of instances can be very effective for ranking classes

• Query logs can, but do not necessarily, improve entity ranking functions

Benchmark

• Benchmark

• Ranking functions

• Library (Java)

dbtrends.aksw.org

Future Works

• Extend the evaluation to other countries and ranking functions

• Evaluate the impact of context-aware ranking functions

• Use other similarity ranking functions

Acknowledgements


Contact

http://emarx.org