WWW 2015 – 7th International Workshop on Web Intelligence & Communities, May 18, 2015, in Florence, Italy
An evaluation of SimRank and Personalized PageRank to build a recommender system for the Web of Data
Phuong T. Nguyen, Paolo Tomeo, Tommaso Di Noia, Eugenio Di Sciascio
{phuong.nguyen, paolo.tomeo, tommaso.dinoia, eugenio.disciascio}@poliba.it
Polytechnic University of Bari - Bari (ITALY)
Outline
• Introduction
• Recommender systems
• Linked Open Data
• SimRank and Personalized PageRank
• Experimental Evaluation
• Conclusions
Introduction
Content-based recommender systems rely on the notion of similarity between items, but they obviously need content.
The Web of Data is an opportunity to foster data-intensive applications.
We have investigated how effectively SimRank and Personalized PageRank work with Linked Data in the recommendation task.
Recommender Systems
Recommender Systems (RSs) are software tools and techniques providing suggestions for items to be of use to a user. [F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, editors. Recommender Systems Handbook. Springer, 2011.]
• Content-based filtering
• Collaborative filtering
Content-based RSs
Content-based RSs try to recommend items similar to those a given user has liked in the past. The recommendations are based upon a description of the items and a profile of the user's interests.
[Diagram: the user profile and the items description feed the content-based recommender system, which produces personalized recommendations]
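To make the idea concrete, here is a minimal Python sketch of content-based scoring, assuming each item is described by a set of features and the user profile is simply the list of items the user liked. The Jaccard overlap and the averaging rule are illustrative choices, not the method evaluated in this talk.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def content_based_scores(
    item_features: Dict[str, Set[str]],
    liked: List[str],
) -> List[Tuple[str, float]]:
    """Rank items the user has not liked by their average Jaccard
    similarity to the liked items (illustrative scoring rule)."""
    if not liked:
        return []
    scores = []
    for item, feats in item_features.items():
        if item in liked:
            continue
        sims = [jaccard(feats, item_features[liked_item]) for liked_item in liked]
        scores.append((item, sum(sims) / len(sims)))
    # Highest score first; ties broken by item name for determinism.
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))
```

The sketch shows the drawback discussed next: an item with no features shared with the profile always scores 0.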
Main drawback: Limited Content Analysis
• We need domain knowledge and rich descriptions of the items
• No suitable suggestion can be made without enough information
• The quality of CB recommendations depends on the quantity and quality of the features explicitly associated with the items
P. Lops, M. de Gemmis, G. Semeraro. Content-based Recommender Systems: State of the Art and Trends. In Recommender Systems Handbook: A Complete Guide for Research Scientists & Practitioners, Chapter 3, Springer, 2010.
Linked Open Data
• Initiative for publishing and connecting data on the Web using Semantic Web technologies
• >30 billion RDF triples from hundreds of data sources
• Good resource to mine existing information and deduce new knowledge
Florence in DBpedia
(Florence, dcterms:subject, Former capitals of Italy)
(Florence, dbpedia-owl:region, Tuscany)
(Florence, dbpedia-owl:country, Italy)
...
Example of facts in triple form: (subject, predicate, object)
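As a minimal sketch, the facts above can be held as plain (subject, predicate, object) tuples and indexed by subject. The labels are kept as simple strings rather than full DBpedia IRIs, and the `outgoing` helper is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Facts from the slide; labels are plain strings, not resolved IRIs.
TRIPLES: List[Triple] = [
    ("Florence", "dcterms:subject", "Former capitals of Italy"),
    ("Florence", "dbpedia-owl:region", "Tuscany"),
    ("Florence", "dbpedia-owl:country", "Italy"),
]


def outgoing(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Index triples by subject: subject -> [(predicate, object), ...]."""
    index: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for s, p, o in triples:
        index[s].append((p, o))
    return dict(index)
```

Each tuple is one edge of the knowledge graph, labeled by its predicate, which is the view the graph algorithms below work on.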
Enrich Data model
[Diagram: catalog items linked to a knowledge graph]
1 – Mapping (entity linking)
2 – Subgraph extraction
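The two steps can be sketched in Python as follows, assuming step 1 has already produced a hypothetical `mapping` from catalog item ids to knowledge-graph entities, and assuming the extracted subgraph is every triple within a fixed number of hops of a mapped entity, following edges in both directions. This is an illustration, not the authors' extraction procedure.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def extract_subgraph(
    mapping: Dict[str, str],  # step 1 output: catalog item id -> entity (hypothetical)
    triples: List[Triple],
    hops: int = 1,
) -> Set[Triple]:
    """Step 2: keep every triple within `hops` edges of a mapped entity,
    following edges in both directions (breadth-first expansion)."""
    frontier: Set[str] = set(mapping.values())
    visited: Set[str] = set(frontier)
    kept: Set[Triple] = set()
    for _ in range(hops):
        next_frontier: Set[str] = set()
        for s, p, o in triples:
            if s in frontier or o in frontier:
                kept.add((s, p, o))
                for node in (s, o):
                    if node not in visited:
                        visited.add(node)
                        next_frontier.add(node)
        frontier = next_frontier
    return kept
```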
Each item is represented by a part of this subgraph. We need algorithms able to compute similarity between graphs.
SimRank
SimRank computes similarity between nodes in a graph using the structural context: two nodes are similar if they are referenced by similar nodes.
G. Jeh and J. Widom. Simrank: A measure of structural-context similarity. In ACM KDD ’02, pages 538–543, 2002.
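Given $k \geq 0$, $R^{(k)}(\alpha, \beta) = 1$ if $\alpha = \beta$, and $R^{(0)}(\alpha, \beta) = 0$ if $\alpha \neq \beta$. Otherwise, the general formula is

$$R^{(k+1)}(\alpha, \beta) = \frac{C}{|I(\alpha)|\,|I(\beta)|} \sum_{i=1}^{|I(\alpha)|} \sum_{j=1}^{|I(\beta)|} R^{(k)}\big(I_i(\alpha), I_j(\beta)\big)$$

where $C \in (0, 1)$ is a decay factor, $I(\alpha)$ is the set of in-neighbors of $\alpha$ and $I_i(\alpha)$ is its $i$-th in-neighbor; $R^{(k+1)}(\alpha, \beta) = 0$ if $I(\alpha)$ or $I(\beta)$ is empty.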
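A naive Python sketch of the iterative SimRank computation of Jeh and Widom, assuming the RDF graph is given as directed (subject, object) edges and using an illustrative decay factor C = 0.8 and a fixed number of iterations. It recomputes every node pair at each step, so it is only suitable for small graphs.

```python
from itertools import product
from typing import Dict, List, Tuple


def simrank(
    edges: List[Tuple[str, str]],  # directed edges (source, target)
    c: float = 0.8,                # decay factor C (illustrative value)
    iterations: int = 10,
) -> Dict[Tuple[str, str], float]:
    nodes = sorted({n for edge in edges for n in edge})
    in_nb: Dict[str, List[str]] = {n: [] for n in nodes}
    for src, dst in edges:
        in_nb[dst].append(src)
    # R^(0): 1 on the diagonal, 0 elsewhere.
    r = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iterations):
        new_r: Dict[Tuple[str, str], float] = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new_r[(a, b)] = 1.0
            elif not in_nb[a] or not in_nb[b]:
                new_r[(a, b)] = 0.0  # no in-neighbors: no structural evidence
            else:
                total = sum(r[(i, j)] for i in in_nb[a] for j in in_nb[b])
                new_r[(a, b)] = c * total / (len(in_nb[a]) * len(in_nb[b]))
        r = new_r
    return r
```

For instance, two artists that are both the object of the same triple subject get similarity C, since their only in-neighbors are identical.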
Personalized PageRank
A node receives an amount of rank from every node which points to it and in turn transfers an amount of its rank to the nodes it refers to.
T. H. Haveliwala. Topic-sensitive pagerank. In ACM WWW ’02, pages 517–526, 2002. ACM.
The similarity between two items α and β, represented by vectors $\alpha = \{a_i\}_{i=1,\dots,n}$ and $\beta = \{b_i\}_{i=1,\dots,n}$, is computed from the inner product between the two vectors.
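Personalized PageRank differs from plain PageRank in that the random surfer restarts at a chosen source node instead of a uniformly random one. The sketch below assumes that the vector of an item is its Personalized PageRank vector restarted at that item, uses power iteration with an illustrative damping factor of 0.85, and (as an assumption) routes the mass of dangling nodes back to the source. Similarity is then the plain inner product of two such vectors.

```python
from typing import Dict, List, Tuple


def personalized_pagerank(
    edges: List[Tuple[str, str]],  # directed edges (source, target)
    source: str,                   # restart node; must appear in `edges`
    damping: float = 0.85,         # illustrative damping factor
    iterations: int = 50,
) -> Dict[str, float]:
    nodes = sorted({n for edge in edges for n in edge})
    out_nb: Dict[str, List[str]] = {n: [] for n in nodes}
    for s, d in edges:
        out_nb[s].append(d)
    rank = {n: (1.0 if n == source else 0.0) for n in nodes}
    for _ in range(iterations):
        new = {n: 0.0 for n in nodes}
        for n in nodes:
            if out_nb[n]:
                # Each node splits its damped rank among the nodes it points to.
                share = damping * rank[n] / len(out_nb[n])
                for m in out_nb[n]:
                    new[m] += share
            else:
                # Dangling node: send its damped rank back to the source (assumption).
                new[source] += damping * rank[n]
        new[source] += 1.0 - damping  # restart (teleport) to the source
        rank = new
    return rank


def inner_product(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Inner product of two sparse vectors stored as dicts."""
    return sum(v * b.get(k, 0.0) for k, v in a.items())
```

Under these assumptions, `inner_product(personalized_pagerank(edges, "a"), personalized_pagerank(edges, "b"))` gives the similarity of items `a` and `b`.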
K-Nearest Neighbors
The k-Nearest Neighbors (k-NN) algorithm finds the set neighbors(α) containing the k entities β most similar to a given item α, according to a similarity function sim(α, β).
A k-NN recommendation algorithm predicts the rating of item α for a user u, taking into account the neighbors of α and the profile of u.
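A minimal sketch of the neighbor search and of one possible prediction rule. The slide does not give the exact formula, so the rule here (the share of the neighbors' total similarity that comes from items already in the user's profile) is a hypothetical choice; `sim` can be any item similarity, such as SimRank or the PPR inner product.

```python
import heapq
from typing import Callable, Iterable, List, Set

Sim = Callable[[str, str], float]


def neighbors(item: str, candidates: Iterable[str], sim: Sim, k: int) -> List[str]:
    """The k candidates most similar to `item`, excluding the item itself."""
    others = (c for c in candidates if c != item)
    return heapq.nlargest(k, others, key=lambda c: sim(item, c))


def predict(item: str, profile: Set[str], candidates: Iterable[str], sim: Sim, k: int) -> float:
    """Hypothetical score: similarity mass of neighbors in the user's profile,
    normalized by the similarity mass of all k neighbors."""
    nb = neighbors(item, candidates, sim, k)
    denom = sum(sim(item, n) for n in nb)
    if denom == 0:
        return 0.0
    return sum(sim(item, n) for n in nb if n in profile) / denom
```

Items would then be ranked by this score to produce the top-N list used in the evaluation.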
Evaluation Methodology
• Top-N item recommendation task, for different values of N (from top-1 to top-50)
• 60-40% holdout split for each user
• Baselines: Vector Space Model (VSM) and a simplified variation of the algorithm proposed in [T. Di Noia, R. Mirizzi, V. C. Ostuni, D. Romito, and M. Zanker. Linked open data to support content-based recommender systems. In ACM I-SEMANTICS ’12. ACM]
• The results for the four algorithms were computed with 10, 20, 40, 60 and 80 neighbors
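A per-user holdout split can be sketched as below, assuming 60% of each user's items go to the training set and 40% to the test set, chosen at random with a fixed seed. The split direction, the shuffling and the seed are assumptions for illustration.

```python
import random
from typing import Dict, List, Tuple


def holdout_split(
    ratings: Dict[str, List[str]],  # user -> items the user interacted with
    train_ratio: float = 0.6,       # assumed: 60% training, 40% test
    seed: int = 42,                 # hypothetical seed for reproducibility
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    rng = random.Random(seed)
    train: Dict[str, List[str]] = {}
    test: Dict[str, List[str]] = {}
    for user, items in ratings.items():
        shuffled = items[:]  # copy so the input is not modified
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_ratio)
        train[user] = shuffled[:cut]
        test[user] = shuffled[cut:]
    return train, test
```

Splitting per user, rather than globally, guarantees that every user keeps part of their profile for training and part for testing.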
Evaluation Metrics
• Accuracy: Precision (P@N), Recall (R@N)
• Sales diversity: catalog coverage, entropy, Gini index
• Novelty: long-tail percentage, Expected Popularity Complement (EPC@N)
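The accuracy and sales-diversity metrics can be sketched as follows. Assumptions: entropy and the Gini index are computed on each catalog item's share of all recommendations, the long tail is supplied as a precomputed item set, and EPC@N is omitted because its exact discounting is not specified here.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple

Recs = Dict[str, List[str]]  # user -> recommended top-N list


def precision_recall_at_n(recommended: List[str], relevant: Set[str], n: int) -> Tuple[float, float]:
    hits = len(set(recommended[:n]) & relevant)
    return hits / n, (hits / len(relevant) if relevant else 0.0)


def catalog_coverage(recs: Recs, catalog: Set[str]) -> float:
    """Fraction of the catalog recommended to at least one user."""
    shown = {i for items in recs.values() for i in items}
    return len(shown & catalog) / len(catalog)


def _shares(recs: Recs, catalog: Set[str]) -> List[float]:
    """Each catalog item's share of all recommendations."""
    counts = Counter(i for items in recs.values() for i in items if i in catalog)
    total = sum(counts.values())
    return [counts[i] / total if total else 0.0 for i in catalog]


def entropy(recs: Recs, catalog: Set[str]) -> float:
    return -sum(p * math.log(p) for p in _shares(recs, catalog) if p > 0)


def gini(recs: Recs, catalog: Set[str]) -> float:
    """0 = recommendations spread evenly, 1 = all on a single item."""
    p = sorted(_shares(recs, catalog))
    n = len(p)
    if n < 2:
        return 0.0
    return sum((2 * k - n - 1) * pk for k, pk in enumerate(p, start=1)) / (n - 1)


def long_tail_percentage(recs: Recs, long_tail: Set[str]) -> float:
    all_items = [i for items in recs.values() for i in items]
    return sum(i in long_tail for i in all_items) / len(all_items) if all_items else 0.0
```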
Dataset
Subset of Last.fm mapped to DBpedia
• 1,867 users
• 700 most popular artists and bands
• 47,330 ratings
• 113,386 extracted triples
Mappings: http://sisinflab.poliba.it/semanticweb/lod/recsys/datasets/
Properties extracted
Properties extracted from DBpedia (113,386 extracted triples)
Inbound: dbo:producer, dbo:artist, dbo:writer, dbo:associatedBand, dbo:associatedMusicalArtist, dbo:musicalArtist, dbo:musicComposer, dbo:bandMember, dbo:formerBandMember, dbo:starring, dbo:composer
Outbound: dcterms:subject, dbo:genre, dbo:associatedBand, dbo:associatedMusicalArtist, dbo:instrument, dbo:occupation, dbo:birthPlace, dbo:background
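One way to read the table: inbound properties are kept when an item is the object of the triple, outbound properties when it is the subject. The sketch below filters a list of triples under that assumption; the function name `filter_triples` is hypothetical.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

INBOUND = {
    "dbo:producer", "dbo:artist", "dbo:writer", "dbo:associatedBand",
    "dbo:associatedMusicalArtist", "dbo:musicalArtist", "dbo:musicComposer",
    "dbo:bandMember", "dbo:formerBandMember", "dbo:starring", "dbo:composer",
}
OUTBOUND = {
    "dcterms:subject", "dbo:genre", "dbo:associatedBand",
    "dbo:associatedMusicalArtist", "dbo:instrument", "dbo:occupation",
    "dbo:birthPlace", "dbo:background",
}


def filter_triples(triples: List[Triple], items: Set[str]) -> List[Triple]:
    """Keep (item, p, x) if p is outbound and (x, p, item) if p is inbound."""
    kept = []
    for s, p, o in triples:
        if s in items and p in OUTBOUND:
            kept.append((s, p, o))  # item --p--> value
        elif o in items and p in INBOUND:
            kept.append((s, p, o))  # resource --p--> item
    return kept
```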
Accuracy with 40 neighbors
Personalized PageRank and SimRank are more accurate than the baselines
Catalog coverage with 40 neighbors
Worst coverage
Distribution with 40 neighbors
Recommendations concentrated on a few items
Novelty Results
Conclusions
SimRank and Personalized PageRank achieved promising results compared with the two baselines in terms of precision, recall and novelty, although catalog coverage and the distribution of recommended items worsen.
Future work:
• Same experiments on two more datasets: MovieLens and LibraryThing
• Same experiments using other graph similarity metrics
Thanks for your attention!
Q & A