iknow ranking sem_info_v9.0__2012.09.07_anjorin

32
© author(s) of these slides including research results from the KOM research network and TU Darmstadt; otherwise it is specified at the respective slide 7-Sep-12 Prof. Dr.-Ing. Ralf Steinmetz KOM - Multimedia Communications Lab Iknow_Ranking_Sem_Info_v9.0__2012.09.07_MA.pptx Ranking Resources in Folksonomies By Exploiting Semantic Information i-KNOW 2012, Saarbrücken Thomas Rodenhausen Mojisola Anjorin Renato Domínguez García Christoph Rensing Ralf Steinmetz Research Talk Ranking Algorithms Slideshare Tags Resources Users Type Event Person Location Other Topic Activity

Upload: mojisola-erdt-nee-anjorin

Post on 19-Jun-2015

209 views

Category:

Technology


2 download

DESCRIPTION

Presentation at the Iknow 2012 conference in Graz, Austria.

TRANSCRIPT

Page 1: Iknow ranking sem_info_v9.0__2012.09.07_anjorin

© author(s) of these slides including research results from the KOM research network and TU Darmstadt; otherwise it is specified at the respective slide 7-Sep-12

Prof. Dr.-Ing. Ralf Steinmetz KOM - Multimedia Communications Lab

Iknow_Ranking_Sem_Info_v9.0__2012.09.07_MA.pptx

Ranking Resources in Folksonomies By Exploiting Semantic Information

i-KNOW 2012, Saarbrücken

Thomas Rodenhausen Mojisola Anjorin Renato Domínguez García Christoph Rensing Ralf Steinmetz

Research Talk

Ranking

Algorithms

Slideshare

Tags ResourcesUsers

Type

Event

Person

Location

Other

Topic

Activity

Page 2: Iknow ranking sem_info_v9.0__2012.09.07_anjorin

KOM – Multimedia Communications Lab 2

Social Tagging Applications

www.wordle.net

Social tagging applications are used to organize, classify, manage and share knowledge resources ! Tags are freely chosen keywords attached to resources ! Tags often describe an aspect of the resource

Keynote

I-know 2012

Lora

Aroyo Graz

Semantic Web

Dial E

for

EventsPreparing for Conference

Page 3: Iknow ranking sem_info_v9.0__2012.09.07_anjorin

KOM – Multimedia Communications Lab 3

Recommendations in Social Tagging Applications

Page 4: Iknow ranking sem_info_v9.0__2012.09.07_anjorin

KOM – Multimedia Communications Lab 4

! Motivation ! Basics !  Folksonomy !  Folksonomy Extended by Tag Types ! Graph-based Resource Recommendation !  Challenge: Concept Drift

! AspectScore & InteliScore ! Evaluation Methodology and Metrics ! Results ! Conclusion & Future Work

Overview

Page 5: Iknow ranking sem_info_v9.0__2012.09.07_anjorin

KOM – Multimedia Communications Lab 5

A folksonomy is a quadruple F:= (U, T, R, Y), where U – Users T – Tags R – Resources Y ⊆ U ! T ! R - tag assignment

Folksonomy

Research Talk

Ranking

Algorithms

Slideshare

Tags ResourcesUsers

[Hotho et al. 2006]

Page 6: Iknow ranking sem_info_v9.0__2012.09.07_anjorin

KOM – Multimedia Communications Lab 6

An extended folksonomy FA:= (U, T, R, A, Y) where U – Users T – Tags R – Resources A – Tag Types Y ⊆ U ! T ! R x A A = {Topic, Resource Type, Location, Person, Event, Activity, Other}

Folksonomy Extended by Tag Types

[Böhnstedt et al. 2009]

Type

Event

Person

Location

Other

Topic

Activity

Research Talk

Ranking

Algorithms

Slideshare

Tags ResourcesUsers

Page 7: Iknow ranking sem_info_v9.0__2012.09.07_anjorin

KOM – Multimedia Communications Lab 7

Graph-based Resource Recommendation

Graph-based Ranking

Algorithm

Item Score r1 0.9 r2 0.7 r3 0.5 r4 0.2

Recommendation List

1 1

2 1

P1

P2

P4

P3

3

4

2

1 2

Folksonomy Graph Scorer e.g. PageRank, HITS Ranked Resources

Page 8: Iknow ranking sem_info_v9.0__2012.09.07_anjorin

KOM – Multimedia Communications Lab 8

Adapted PageRank

!

!

!"

"

"

"

"

# #

$%&'()*+,& Tango0

Buenos

Aires0

Buenos

Aires0

Dancing

Festival0

1

"-.

#-.

#-.

"-.

PageRank‘s intelligent surfer model The ranking of a node is determined by how often the surfer visits the node Adjoining edges are followed with a certain probability – determined by the edge weights The query node acts as the starting point and focus i.e. the surfer returns to this node with a certain probability – determined by the node weights

[Hotho et al. 2006]

Page 9: Iknow ranking sem_info_v9.0__2012.09.07_anjorin

KOM – Multimedia Communications Lab 9

Concept drift is a challenge for graph-based ranking algorithms !  e.g. Ambiguous tags can cause concept drift as a single tag might represent multiple semantic concepts

Challenge: Concept Drift

FC Barcelona Website

News about Messi

Dallas Cowboys‘ Website

football

?

?

Page 10: Iknow ranking sem_info_v9.0__2012.09.07_anjorin

KOM – Multimedia Communications Lab 10

! Motivation ! Basics !  Folksonomy !  Folksonomy Extended by Tag Types ! Graph-based Resource Recommendation !  Challenge: Concept Drift

! AspectScore & InteliScore ! Evaluation Methodology and Metrics ! Results ! Conclusion & Future Work

Overview

Page 11: Iknow ranking sem_info_v9.0__2012.09.07_anjorin

KOM – Multimedia Communications Lab 11

The semantic information gained from the semantic relatedness between tags is used to reduce concept drift

Source: wikipedia.org

InteliScore

0.005

Semantic Relatedness (XESA)

XESA calculates the semantic relatedness between pairs of tokens (tags) using the English Wikipedia as reference corpus

[Scholl et al. 2010]

Tango

Buenos

Aires

Dancing Festival

Page 12: Iknow ranking sem_info_v9.0__2012.09.07_anjorin

KOM – Multimedia Communications Lab 12

Tag types help to alleviate concept drift !  Tags are disambiguated with respect to different aspects of a resource that a user may describe while tagging

AspectScore

News about Tango

topic location

Tourism in Argentina

Buenos

Aires

Page 13: Iknow ranking sem_info_v9.0__2012.09.07_anjorin

KOM – Multimedia Communications Lab 13

News about Tango

topic location

Tourism in Argentina

Buenos

Aires

AspectScore

Google Map of Buenos Aires

topic

Assumption: The tags of type „Topic“ describe the content of the resources well, therefore „Topic“ Tags are given priority.

Tag types help to alleviate concept drift !  e.g. by focusing on the tags describing the content of resources

Page 14: Iknow ranking sem_info_v9.0__2012.09.07_anjorin

KOM – Multimedia Communications Lab 14

AspectScore: Step 1

1. Transform Query Node into Query Tags

User query node is transformed into tag nodes, weighted by the usage frequency of the user Assumption: Tags of a user describe the user‘s interests well

Tango3

Buenos

Aires1

Buenos

Aires1

[Abel 2011]

Page 15: Iknow ranking sem_info_v9.0__2012.09.07_anjorin

KOM – Multimedia Communications Lab 15

AspectScore: Step 1

1. Transform Query Node in Query Tags

Tango

Buenos

Aires

Buenos

Aires

Dancing

Festival

Query Node

Query Tags

Query Tag

Page 16: Iknow ranking sem_info_v9.0__2012.09.07_anjorin

KOM – Multimedia Communications Lab 16

AspectScore: Step 2

1. Transform Query Node into Query Tags

2. Create Folksonomy

Graph for each Query Tag

Page 17: Iknow ranking sem_info_v9.0__2012.09.07_anjorin

KOM – Multimedia Communications Lab 17

AspectScore: Step 2

1. Transform Query Node into Query Tags

2. Create Folksonomy

Graph for each Query Tag !

!

"

#!

!"

" "

"

"

"

# #

"

$%&'()*+, Tango3

Buenos

Aires0

Buenos

Aires0

"

Dancing

Festival0

Depending on Ranking Algorithm e.g. FolkRank

Page 18: Iknow ranking sem_info_v9.0__2012.09.07_anjorin

KOM – Multimedia Communications Lab 18

AspectScore: Step 3

1. Transform Query Node into Query Tags

2. Create Folksonomy Graph for each Query Tag

3. Adapt Edge Weights

Edge Weights are adapted (in several iteration steps) depending on Query Tag

!

!

"#"$

"%"$!

!"#"$

"#"$ "#"$

"#"$

"#"$

"#"$

"%"$ "%"$

"#"$&

'()*+",-. Tango3

Buenos

Aires0

Buenos

Aires0

"#"$

Dancing

Festival0

Page 19: Iknow ranking sem_info_v9.0__2012.09.07_anjorin

KOM – Multimedia Communications Lab 19

AspectScore: Step 4

1. Transform Query Node into Query Tags

2. Create Folksonomy Graph for each Query Tag

3. Adapt Edge Weights

4. Run Ranking Algorithm Run e.g. FolkRank on the adapted folksonomy graph

!

!

"#"$

"%"$!

!"#"$

"#"$ "#"$

"#"$

"#"$

"#"$

"%"$ "%"$

"#"$&

'()*+",-. Tango3

Buenos

Aires0

Buenos

Aires0

"#"$

Dancing

Festival0

Page 20: Iknow ranking sem_info_v9.0__2012.09.07_anjorin

KOM – Multimedia Communications Lab 20

AspectScore: Step 5

1. Transform Query Node into Query Tags

2. Create Folksonomy Graph for each Query Tag

3. Adapt Edge Weights

4. Run Ranking Algorithm

5. Accumulate Results for each Query Node

The resulting rankings are accumulated giving preference to certain tag types e.g. topic tags

Tango

3δTopicBuenos

Aires

1δ Location

Buenos

Aires1δ

Topic

Page 21: Iknow ranking sem_info_v9.0__2012.09.07_anjorin

KOM – Multimedia Communications Lab 21

! Motivation ! Basics !  Folksonomy !  Folksonomy Extended by Tag Types ! Graph-based Resource Recommendation !  Challenge Concept Drift

! AspectScore & InteliScore ! Evaluation Methodology and Metrics ! Results ! Conclusion & Future Work

Overview

Page 22: Iknow ranking sem_info_v9.0__2012.09.07_anjorin

KOM – Multimedia Communications Lab 22

Tango

Buenos

Aires

DancingFestival

Tango

Buenos

Aires

Dancing Festival

A post is a Pu,r= {(u,r,t)|(u,r,t) ! Y} For LeavePostOut, the recommendation task with user as input is harder as with tag as input

Evaluation Methodology: LeavePostOut

[Jäschke et al. 2007]

Page 23: Iknow ranking sem_info_v9.0__2012.09.07_anjorin

KOM – Multimedia Communications Lab 23

RTr,t= {(u,r,t)|(u,r,t) ! Y} For LeaveRTOut, the recommendation task with tag as input is harder as with user as input

Evaluation Methodology: LeaveRTOut

Tango

Buenos

Aires

DancingFestival

Tango

Buenos

Aires

Dancing Festival

Page 24: Iknow ranking sem_info_v9.0__2012.09.07_anjorin

KOM – Multimedia Communications Lab 24

Bibsonomy corpus with a p-core extraction at level 5 to reduce noise and to focus on the dense portion of the corpus

Evaluation Corpus

Knowledge and Data Engineering Group, University of Kassel: Benchmark Folksonomy Data from Bibsonomy, version of July 7th 2011

Before After Users 7243 69 Bookmark resources 281550 9 Bibtex resources 469654 134 Tags 216094 179 Tag assignments 2740834 3269 Bookmark posts 330192 51 Bibtex posts 526691 959

Tag Type Count Topic 2225 Other 486 Resource Type 198 Event 182 Person/Organisation 143 Activity 35

FReSET – Domínguez García et al 2012 http://www.kom.tu-darmstadt.de/research-results/downloads/software/freset/

Page 25: Iknow ranking sem_info_v9.0__2012.09.07_anjorin

KOM – Multimedia Communications Lab 25

Evaluation Metrics

Mean Normalized Precision:

The mean of the normalized Precision at k over several queries Q

The mean of the Average Precision over several queries Q

MNP(Q,k) =1

|Q|

|Q|�

j=1

Precisionj(k)

Precisionmax,j(k)

MAP(Q) =1

|Q|

|Q|�

j=1

1

mj

mj�

k=1

Precision(Rjk)

Mean Average Precision:

[Manning et al 2008]

Page 26: Iknow ranking sem_info_v9.0__2012.09.07_anjorin

KOM – Multimedia Communications Lab 26

A violin plot is a combination of a box plot and a density trace

Visualization of Results with Violin Plots

Median

3rd Quartile

1st Quartile [Hintze et al. 1998]

Page 27: Iknow ranking sem_info_v9.0__2012.09.07_anjorin

KOM – Multimedia Communications Lab 27

Evaluation Results LeavePostOut

Evaluation results for the recommendation task having tag as input

Page 28: Iknow ranking sem_info_v9.0__2012.09.07_anjorin

KOM – Multimedia Communications Lab 28

Evaluation Results for LeavePostOut

Approaches MAP AspectScore 0.2240 FolkRank 0.2136 InteliScore 0.1801 Popularity 0.0937

Evaluation results for the recommendation task having tag as input

Page 29: Iknow ranking sem_info_v9.0__2012.09.07_anjorin

KOM – Multimedia Communications Lab 29

Evaluation results for the recommendation task having tag as input

Evaluation Results for LeaveRTOut

Page 30: Iknow ranking sem_info_v9.0__2012.09.07_anjorin

KOM – Multimedia Communications Lab 30

Evaluation Results for LeaveRTOut

Approaches MAP Popularity 0.0834 AspectScore 0.0589 FolkRank 0.0529 InteliScore 0.0433

Evaluation results for the recommendation task having tag as input

Page 31: Iknow ranking sem_info_v9.0__2012.09.07_anjorin

KOM – Multimedia Communications Lab 31

Exploiting semantic information for resource ranking in folksonomies Limitations ! Manually labeled tag type dataset – error prone, subjective !  XESA based on English Wikipedia – No semantic relatedness measurable for

27% of tags in corpus

Future Work ! Evaluation using CROKODIL corpus – an e-learning application with tag types ! User Study

Conclusion and Future Work

AspectScore Tag disambiguation & importance of tags (based on type)

InteliScore Based on semantic relatedness between tags e.g. XESA Type

Event

Person

Location

Other

Topic

Activity

www.crokodil.de

Page 32: Iknow ranking sem_info_v9.0__2012.09.07_anjorin

KOM – Multimedia Communications Lab 32

Questions & Contact