An evaluation of SimRank and Personalized PageRank to build a recommender system for the Web of Data

WWW 2015 – 7th International Workshop on Web Intelligence & Communities, May 18, 2015 in Florence, Italy

Phuong T. Nguyen, Paolo Tomeo, Tommaso Di Noia, Eugenio Di Sciascio
{phuong.nguyen, paolo.tomeo, tommaso.dinoia, eugenio.disciascio}@poliba.it
Polytechnic University of Bari - Bari (ITALY)


Page 2: An evaluation of SimRank and Personalized PageRank to build a recommender system for the Web of Data

Outline

• Introduction
• Recommender systems
• Linked Open Data
• SimRank and Personalized PageRank
• Experimental Evaluation
• Conclusions

Page 3: An evaluation of SimRank and Personalized PageRank to build a recommender system for the Web of Data

Introduction

Content-based recommender systems rely on the notion of similarity between items, so they need content describing those items.

The Web of Data is an opportunity to foster data-intensive applications.

We have investigated how effectively SimRank and Personalized PageRank work with Linked Data in the recommendation task.

Page 4: An evaluation of SimRank and Personalized PageRank to build a recommender system for the Web of Data

Recommender Systems

Recommender Systems (RSs) are software tools and techniques providing suggestions for items to be of use to a user. [F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, editors. Recommender Systems Handbook. Springer, 2011.]

Page 5: An evaluation of SimRank and Personalized PageRank to build a recommender system for the Web of Data

• Content-based filtering
• Collaborative filtering

Page 6: An evaluation of SimRank and Personalized PageRank to build a recommender system for the Web of Data

Content-based RSs

Content-based RSs try to recommend items similar to those a given user has liked in the past. The recommendations are based upon a description of the items and a profile of the user’s interests.

[Diagram: User profile and Items description feed the Recommender System, which produces Personalized Recommendations]

Page 7: An evaluation of SimRank and Personalized PageRank to build a recommender system for the Web of Data

Main drawback: Limited Content Analysis

• We need domain knowledge and rich descriptions of the items
• No suitable suggestion without enough information
• The quality of CB recommendations depends on the quantity and quality of the features explicitly associated with the items

[P. Lops, M. de Gemmis, G. Semeraro. Content-based Recommender Systems: State of the Art and Trends. In Recommender Systems Handbook: A Complete Guide for Research Scientists & Practitioners, Chapter 3, Springer, 2010.]

Page 8: An evaluation of SimRank and Personalized PageRank to build a recommender system for the Web of Data

Linked Open Data

• Initiative for publishing and connecting data on the Web using Semantic Web technologies

• More than 30 billion RDF triples from hundreds of data sources

• Good resource to mine existing information and deduce new knowledge

Page 9: An evaluation of SimRank and Personalized PageRank to build a recommender system for the Web of Data

Florence in DBpedia

Example of facts in triple form (subject - predicate - object):

Florence - dcterms:subject - Former capitals of Italy
Florence - dbpedia-owl:region - Tuscany
Florence - dbpedia-owl:country - Italy
...
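The Florence facts can be held in memory as plain (subject, predicate, object) tuples. The snippet below is only a minimal sketch: resource names are kept as simple strings, and the helper `objects_of` is a hypothetical name introduced for illustration, not part of any RDF library.

```python
# Facts about Florence as (subject, predicate, object) tuples.
# Names are plain strings; no RDF library or full URIs are assumed here.
triples = [
    ("Florence", "dcterms:subject", "Former capitals of Italy"),
    ("Florence", "dbpedia-owl:region", "Tuscany"),
    ("Florence", "dbpedia-owl:country", "Italy"),
]


def objects_of(subject, predicate, facts):
    """Return every object linked to `subject` through `predicate`."""
    return [o for s, p, o in facts if s == subject and p == predicate]


print(objects_of("Florence", "dbpedia-owl:country", triples))
```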

Page 10: An evaluation of SimRank and Personalized PageRank to build a recommender system for the Web of Data

Enrich Data model

[Diagram labels: Catalog Items, Knowledge Graph]

1 – Mapping (Entity linking)
2 – Subgraph extraction

Page 11: An evaluation of SimRank and Personalized PageRank to build a recommender system for the Web of Data

Enrich Data model

[Diagram labels: Catalog Items, Knowledge Graph]

Each item is represented by a part of this subgraph. We need algorithms able to compute similarity between graphs.
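One way to picture the "part of the subgraph" that represents an item is to keep only the triples that touch the item through a chosen set of properties. This is a rough sketch under that assumption; the function name `item_subgraphs`, the toy resources (`ArtistA`, `AlbumX`, `CityY`) and the choice of properties are all hypothetical and not taken from the slides.

```python
from collections import defaultdict


def item_subgraphs(triples, items, outbound_props, inbound_props):
    """Group, per item, the triples that touch it through allowed properties."""
    items = set(items)
    subgraphs = defaultdict(list)
    for s, p, o in triples:
        if s in items and p in outbound_props:
            subgraphs[s].append((s, p, o))  # item -> resource
        if o in items and p in inbound_props:
            subgraphs[o].append((s, p, o))  # resource -> item
    return dict(subgraphs)


# Hypothetical toy data, for illustration only.
toy_triples = [
    ("ArtistA", "dbo:genre", "Rock"),
    ("ArtistB", "dbo:genre", "Rock"),
    ("AlbumX", "dbo:artist", "ArtistA"),
    ("ArtistA", "dbo:birthPlace", "CityY"),
]
graphs = item_subgraphs(
    toy_triples,
    items=["ArtistA", "ArtistB"],
    outbound_props={"dbo:genre"},
    inbound_props={"dbo:artist"},
)
print(graphs["ArtistA"])
```

Triples whose property is not in either set (here `dbo:birthPlace`) are simply left out of the item description.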

Page 12: An evaluation of SimRank and Personalized PageRank to build a recommender system for the Web of Data

SimRank

SimRank computes similarity between nodes in a graph using the structural context: two nodes are similar if they are referenced by similar nodes.

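G. Jeh and J. Widom. SimRank: A measure of structural-context similarity. In ACM KDD ’02, pages 538–543, 2002.

Given $k \ge 0$, $R_k(\alpha, \beta) = 1$ if $\alpha = \beta$, and $R_0(\alpha, \beta) = 0$ if $\alpha \ne \beta$. Otherwise, the general formula, as defined by Jeh and Widom, is

$$R_{k+1}(\alpha, \beta) = \frac{C}{|I(\alpha)|\,|I(\beta)|} \sum_{i=1}^{|I(\alpha)|} \sum_{j=1}^{|I(\beta)|} R_k\big(I_i(\alpha), I_j(\beta)\big)$$

where $I(\cdot)$ is the set of in-neighbors of a node and $C \in (0, 1)$ is a decay constant.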
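The SimRank recurrence of Jeh and Widom averages the scores of all pairs of in-neighbors of the two nodes and scales the result by a decay constant. The following is a naive sketch of that iteration, not the implementation used in the evaluation; the decay value 0.8, the number of iterations and the toy graph are assumptions chosen for illustration.

```python
from itertools import product


def simrank(in_neighbors, decay=0.8, iterations=5):
    """Naive iterative SimRank.

    `in_neighbors` maps every node to the list of nodes pointing to it
    (every node must appear as a key, possibly with an empty list).
    """
    nodes = list(in_neighbors)
    # R_0: 1 for identical nodes, 0 for every other pair.
    scores = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iterations):
        new_scores = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new_scores[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # By convention, nodes without in-neighbors score 0.
                new_scores[(a, b)] = 0.0
                continue
            total = sum(scores[(x, y)] for x in ia for y in ib)
            new_scores[(a, b)] = decay * total / (len(ia) * len(ib))
        scores = new_scores
    return scores


# Hypothetical graph: "Rock" points to artists A and B, nothing points to C.
toy_in_neighbors = {"Rock": [], "A": ["Rock"], "B": ["Rock"], "C": []}
sim_scores = simrank(toy_in_neighbors)
print(sim_scores[("A", "B")], sim_scores[("A", "C")])
```

The all-pairs loop is quadratic in the number of nodes per iteration, so it is only suitable for small graphs.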

Page 13: An evaluation of SimRank and Personalized PageRank to build a recommender system for the Web of Data

Personalized PageRank

A node receives an amount of rank from every node which points to it and in turn transfers an amount of its rank to the nodes it refers to.

T. H. Haveliwala. Topic-sensitive pagerank. In ACM WWW ’02, pages 517–526, 2002. ACM.

The similarity between two items α and β, represented by the vectors $\alpha = \{a_i\}_{i=1,\dots,n}$ and $\beta = \{b_i\}_{i=1,\dots,n}$, is computed as the inner product of the two vectors: $sim(\alpha, \beta) = \sum_{i=1}^{n} a_i b_i$
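A possible reading of this slide is that each item is described by the vector of Personalized PageRank scores obtained when the random walk restarts at that item, and that two items are compared through the inner product of their vectors. The sketch below follows that reading, which is an assumption; the damping factor 0.85, the iteration count, the handling of dangling nodes and the toy graph are likewise illustrative choices.

```python
def personalized_pagerank(out_links, source, damping=0.85, iterations=50):
    """Power iteration for PageRank with all teleportation going to `source`.

    `out_links` maps every node to the list of nodes it points to.
    Returns the scores as a list ordered like the keys of `out_links`.
    """
    nodes = list(out_links)
    rank = {n: 1.0 if n == source else 0.0 for n in nodes}
    for _ in range(iterations):
        new_rank = {n: 0.0 for n in nodes}
        for n in nodes:
            targets = out_links[n]
            if targets:
                share = damping * rank[n] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: hand its rank back to the source (assumption).
                new_rank[source] += damping * rank[n]
        new_rank[source] += 1.0 - damping  # teleport step
        rank = new_rank
    return [rank[n] for n in nodes]


def inner_product(a, b):
    """Inner product of two equally long score vectors."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical graph: A and B share the genre Rock, C is linked to Jazz only.
toy_links = {
    "A": ["Rock"], "B": ["Rock"], "C": ["Jazz"],
    "Rock": ["A", "B"], "Jazz": ["C"],
}
vec_a = personalized_pagerank(toy_links, "A")
vec_b = personalized_pagerank(toy_links, "B")
vec_c = personalized_pagerank(toy_links, "C")
print(inner_product(vec_a, vec_b), inner_product(vec_a, vec_c))
```

In this toy graph the walk from A never reaches C or Jazz, so the inner product between A and C is zero while A and B share rank on the Rock node.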

Page 14: An evaluation of SimRank and Personalized PageRank to build a recommender system for the Web of Data

K-Nearest Neighbors

The K-Nearest Neighbors (k-NN) algorithm finds the set neighbors(α) containing the k entities β most similar to a given item α, using a similarity function sim(α, β).

A k-NN recommendation algorithm predicts the rating of item α for a user u, taking into account the neighbors of α and the profile of u.
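The two steps above can be sketched as follows. The transcript does not preserve the prediction formula, so the similarity-weighted average used in `predict_rating` is an assumption (a common choice for item-based k-NN), and all names and toy values are hypothetical.

```python
def nearest_neighbors(item, candidates, sim, k):
    """Return the k candidates most similar to `item` (excluding the item itself)."""
    others = [c for c in candidates if c != item]
    return sorted(others, key=lambda c: sim(item, c), reverse=True)[:k]


def predict_rating(item, user_profile, candidates, sim, k):
    """Similarity-weighted average of the user's ratings on the neighbors of `item`.

    `user_profile` maps the items already rated by the user to their ratings.
    """
    rated = [n for n in nearest_neighbors(item, candidates, sim, k) if n in user_profile]
    weight = sum(sim(item, n) for n in rated)
    if weight == 0:
        return 0.0  # no rated neighbor carries any similarity
    return sum(sim(item, n) * user_profile[n] for n in rated) / weight


# Hypothetical symmetric similarity table and user profile.
toy_sim = {
    frozenset({"A", "B"}): 0.8,
    frozenset({"A", "C"}): 0.2,
    frozenset({"B", "C"}): 0.1,
}


def sim(x, y):
    return toy_sim[frozenset({x, y})]


profile = {"B": 5.0, "C": 1.0}
catalog = ["A", "B", "C"]
print(nearest_neighbors("A", catalog, sim, k=1))
print(predict_rating("A", profile, catalog, sim, k=2))
```

Any similarity function can be plugged in as `sim`, for example SimRank scores or inner products of Personalized PageRank vectors.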

Page 15: An evaluation of SimRank and Personalized PageRank to build a recommender system for the Web of Data

Evaluation Methodology

• Top-N item recommendation task, for different values of N (from top-1 to top-50)

• 60-40% holdout split for each user

• Baselines: Vector Space Model (VSM) and a simplified variation of the algorithm proposed in [T. Di Noia, R. Mirizzi, V. C. Ostuni, D. Romito, and M. Zanker. Linked open data to support content-based recommender systems. In ACM I-SEMANTICS ’12. ACM]

• The results for the four algorithms were computed with 10, 20, 40, 60, 80 neighbors
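A per-user 60-40 holdout split can be sketched as below. The slides do not say how ratings were assigned to each side, so the random shuffle, the fixed seed and the function name `holdout_split` are assumptions made for illustration.

```python
import random


def holdout_split(user_ratings, train_fraction=0.6, seed=42):
    """Split each user's ratings into a training part and a test part.

    `user_ratings` maps a user to a dict {item: rating}.
    """
    rng = random.Random(seed)
    train, test = {}, {}
    for user, ratings in user_ratings.items():
        items = list(ratings)
        rng.shuffle(items)  # random assignment is an assumption
        cut = round(train_fraction * len(items))
        train[user] = {i: ratings[i] for i in items[:cut]}
        test[user] = {i: ratings[i] for i in items[cut:]}
    return train, test


# Hypothetical user with ten rated items.
toy_ratings = {"u1": {f"item{i}": 1 for i in range(10)}}
train_set, test_set = holdout_split(toy_ratings)
print(len(train_set["u1"]), len(test_set["u1"]))
```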

Page 16: An evaluation of SimRank and Personalized PageRank to build a recommender system for the Web of Data

Evaluation Metrics

Accuracy
• Precision (P@N)
• Recall (R@N)

Sales diversity
• Catalog coverage
• Entropy
• Gini index

Novelty
• Long-tail percentage
• Expected Popularity Complement (EPC@N)
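Some of these metrics are simple enough to sketch directly. The snippet covers only P@N, R@N, catalog coverage and entropy, using common textbook formulations; the exact definitions used in the evaluation (for instance the logarithm base of the entropy, or how the Gini index and EPC@N are computed) are not given in the slides, so the choices below are assumptions.

```python
import math
from collections import Counter


def precision_at_n(recommended, relevant, n):
    """Fraction of the top-n recommended items that are relevant."""
    return sum(1 for i in recommended[:n] if i in relevant) / n


def recall_at_n(recommended, relevant, n):
    """Fraction of the relevant items that appear in the top-n list."""
    if not relevant:
        return 0.0
    return sum(1 for i in recommended[:n] if i in relevant) / len(relevant)


def catalog_coverage(rec_lists, catalog_size):
    """Share of the catalog that appears in at least one recommendation list."""
    return len({i for recs in rec_lists for i in recs}) / catalog_size


def entropy(rec_lists):
    """Shannon entropy (base 2, an assumption) of the recommendation counts."""
    counts = Counter(i for recs in rec_lists for i in recs)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical recommendation list and test set for one user.
recs = ["A", "B", "C", "D"]
relevant = {"A", "C", "E"}
print(precision_at_n(recs, relevant, 2), recall_at_n(recs, relevant, 2))
print(catalog_coverage([["A", "B"], ["A", "C"]], catalog_size=6))
print(entropy([["A"], ["B"]]))
```

Higher coverage and entropy indicate recommendations spread over more of the catalog, which is the aspect the sales diversity metrics are meant to capture.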

Page 17: An evaluation of SimRank and Personalized PageRank to build a recommender system for the Web of Data

Dataset

Subset of Last.fm mapped to DBpedia
• 1,867 users
• 700 most popular artists and bands
• 47,330 ratings
• 113,386 extracted triples

Mappings: http://sisinflab.poliba.it/semanticweb/lod/recsys/datasets/

Page 18: An evaluation of SimRank and Personalized PageRank to build a recommender system for the Web of Data

Properties extracted

Properties extracted from DBpedia (113,386 extracted triples)

Inbound: dbo:producer, dbo:artist, dbo:writer, dbo:associatedBand, dbo:associatedMusicalArtist, dbo:musicalArtist, dbo:musicComposer, dbo:bandMember, dbo:formerBandMember, dbo:starring, dbo:composer

Outbound: dcterms:subject, dbo:genre, dbo:associatedBand, dbo:associatedMusicalArtist, dbo:instrument, dbo:occupation, dbo:birthPlace, dbo:background

Page 19: An evaluation of SimRank and Personalized PageRank to build a recommender system for the Web of Data

Accuracy with 40 neighbors

Personalized PageRank and SimRank are more accurate than the two baselines

Page 20: An evaluation of SimRank and Personalized PageRank to build a recommender system for the Web of Data

Catalog coverage with 40 neighbors

Worst coverage

Page 21: An evaluation of SimRank and Personalized PageRank to build a recommender system for the Web of Data

Distribution with 40 neighbors

Recommendations concentrated on a few items

Page 22: An evaluation of SimRank and Personalized PageRank to build a recommender system for the Web of Data

Novelty Results

Page 23: An evaluation of SimRank and Personalized PageRank to build a recommender system for the Web of Data

Conclusions

SimRank and Personalized PageRank obtained promising results compared to the two baselines in terms of precision, recall and novelty, even though catalog coverage and the distribution of recommendations across items worsen.

Future work:
• Same experiments on two more datasets: MovieLens and LibraryThing
• Same experiments using other graph similarity metrics

Page 24: An evaluation of SimRank and Personalized PageRank to build a recommender system for the Web of Data

Thanks for your attention!

Q & A