comparing state-of-the-art collaborative filtering systems
TRANSCRIPT
Introduction
Collaborative approaches
Experiments
Conclusions
Comparing State-of-the-Art
Collaborative Filtering Systems
Laurent Candillier, Frank Meyer, Marc Boullé
France Telecom R&D, Lannion
MLDM 2007
1 Introduction
2 Collaborative approaches
3 Experiments
4 Conclusions
Recommender systems
Help users find items they should appreciate from huge catalogues [Adomavicius and Tuzhilin, 2005]
⇒ Collaborative filtering: based on the user-to-item rating matrix
i1 i2 i3 i4 i5
u1 4 4 1
u2 4 3
u3 5 2 1
u4 4 5
u5 5 4
u6 5 3
u7 4 ? 1
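For concreteness, such a sparse rating matrix is often stored as a dictionary of per-user ratings. A minimal Python sketch, with illustrative data (not the exact entries of the slide, whose column placement is ambiguous in this transcript):

```python
# Hypothetical sparse user-to-item rating matrix (illustrative data):
# ratings[user][item] = rating; absent keys are unrated items.
ratings = {
    "u1": {"i1": 4, "i2": 4, "i4": 1},
    "u2": {"i2": 4, "i5": 3},
    "u3": {"i1": 5, "i3": 2},
}

def rated_items(user):
    """S_u: the set of items rated by `user`."""
    return set(ratings.get(user, {}))

print(sorted(rated_items("u1")))  # → ['i1', 'i2', 'i4']
```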
User-based approaches
Recommend items appreciated by users whose tastes are similar to those of the given user [Resnick et al., 1994]
⇒ need a similarity measure between users, e.g. Pearson similarity: cosine of the deviations from the mean
w(a, u) = \frac{\sum_{i \in S_a \cap S_u} (v_{ai} - \bar{v}_a)(v_{ui} - \bar{v}_u)}{\sqrt{\sum_{i \in S_a \cap S_u} (v_{ai} - \bar{v}_a)^2 \; \sum_{i \in S_a \cap S_u} (v_{ui} - \bar{v}_u)^2}}

v_{ui}: rating of user u on item i
S_u: set of items rated by user u
\bar{v}_u: mean rating of user u

\bar{v}_u = \frac{\sum_{i \in S_u} v_{ui}}{|S_u|}
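The user mean and the Pearson similarity above can be sketched in Python; `ratings` follows a dict-of-dicts layout and all names and data are illustrative:

```python
from math import sqrt

def mean_rating(ratings, u):
    """v̄_u: mean of user u's ratings over S_u."""
    vals = ratings[u].values()
    return sum(vals) / len(vals)

def pearson(ratings, a, u):
    """w(a, u): Pearson similarity over the items both users rated."""
    common = set(ratings[a]) & set(ratings[u])  # S_a ∩ S_u
    if not common:
        return 0.0
    va, vu = mean_rating(ratings, a), mean_rating(ratings, u)
    num = sum((ratings[a][i] - va) * (ratings[u][i] - vu) for i in common)
    den = sqrt(sum((ratings[a][i] - va) ** 2 for i in common)
               * sum((ratings[u][i] - vu) ** 2 for i in common))
    return num / den if den else 0.0

# Tiny illustrative example:
ratings = {"a": {"i1": 5, "i2": 3, "i3": 4},
           "u": {"i1": 4, "i2": 2, "i3": 5}}
print(round(pearson(ratings, "a", "u"), 3))  # → 0.655
```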
User-based approaches
Which rating for user a (active) on item i ?
Prediction using a weighted sum

p_{ai} = \frac{\sum_{\{u \mid i \in S_u\}} w(a, u) \, v_{ui}}{\sum_{\{u \mid i \in S_u\}} |w(a, u)|}

Prediction using a weighted sum of deviations from the mean

p_{ai} = \bar{v}_a + \frac{\sum_{\{u \mid i \in S_u\}} w(a, u) \, (v_{ui} - \bar{v}_u)}{\sum_{\{u \mid i \in S_u\}} |w(a, u)|}
How many neighbors considered ?
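Both prediction schemes can be sketched as follows; the similarity weights w(a, u) are passed in precomputed (e.g. from the Pearson measure of the previous slide), and all names and data are illustrative:

```python
def predict_weighted_sum(ratings, weights, a, i):
    """p_ai = Σ_u w(a,u)·v_ui / Σ_u |w(a,u)| over users u with i in S_u."""
    neighbors = [u for u in ratings if u != a and i in ratings[u]]
    denom = sum(abs(weights[u]) for u in neighbors)
    if denom == 0:
        return None  # no neighbor rated item i
    return sum(weights[u] * ratings[u][i] for u in neighbors) / denom

def predict_deviation(ratings, weights, a, i):
    """p_ai = v̄_a + Σ_u w(a,u)·(v_ui − v̄_u) / Σ_u |w(a,u)|."""
    def mean(u):
        vals = ratings[u].values()
        return sum(vals) / len(vals)
    neighbors = [u for u in ratings if u != a and i in ratings[u]]
    denom = sum(abs(weights[u]) for u in neighbors)
    if denom == 0:
        return mean(a)  # fall back to the active user's mean
    num = sum(weights[u] * (ratings[u][i] - mean(u)) for u in neighbors)
    return mean(a) + num / denom

# Tiny illustrative example (hypothetical weights w(a, ·)):
ratings = {"a": {"i1": 4}, "u1": {"i1": 5, "i2": 3}, "u2": {"i1": 3, "i2": 5}}
weights = {"u1": 0.9, "u2": 0.1}
print(round(predict_weighted_sum(ratings, weights, "a", "i2"), 2))  # → 3.2
```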
Cluster-based approaches
Recommend items appreciated by users that belong to the same group as the given user [Breese et al., 1998]
⇒ need
a clustering method, e.g. K-means
a distance measure, e.g. Euclidean distance
Then the rating of a user on an item is the mean rating given by the users that belong to the same cluster
How many clusters considered ?
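The cluster-based prediction step can be sketched as follows, assuming the clustering (e.g. K-means over rating vectors) has already assigned each user to a cluster; names and data are illustrative:

```python
def cluster_predict(ratings, cluster_of, a, i):
    """Predict user a's rating on item i as the mean rating of i among
    users in the same cluster as a (None if no cluster member rated i)."""
    members = [u for u in ratings
               if cluster_of[u] == cluster_of[a] and i in ratings[u]]
    if not members:
        return None
    return sum(ratings[u][i] for u in members) / len(members)

# Tiny illustrative example (hypothetical cluster assignment):
ratings = {"a": {}, "u1": {"i1": 5}, "u2": {"i1": 3}, "u3": {"i1": 1}}
cluster_of = {"a": 0, "u1": 0, "u2": 0, "u3": 1}
print(cluster_predict(ratings, cluster_of, "a", "i1"))  # → 4.0
```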
Item-based approaches
Recommend items similar to those appreciated by the given user [Karypis, 2001]
⇒ the dual of the user-based approach
p_{ai} = \bar{v}_i + \frac{\sum_{\{j \in S_a \mid j \neq i\}} sim(i, j) \, (v_{aj} - \bar{v}_j)}{\sum_{\{j \in S_a \mid j \neq i\}} |sim(i, j)|}

sim(i, j): similarity measure between items i and j
S_a: set of items rated by user a
\bar{v}_i: mean rating on item i
How many neighbors considered ?
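The item-based prediction above can be sketched as follows; the item-item similarities and item means are passed in precomputed, and all names and data are illustrative:

```python
def item_based_predict(ratings, sim, item_mean, a, i):
    """p_ai = v̄_i + Σ_{j∈S_a, j≠i} sim(i,j)·(v_aj − v̄_j) / Σ |sim(i,j)|."""
    rated = [j for j in ratings[a] if j != i]  # S_a without i
    denom = sum(abs(sim[(i, j)]) for j in rated)
    if denom == 0:
        return item_mean[i]  # fall back to the item's mean rating
    num = sum(sim[(i, j)] * (ratings[a][j] - item_mean[j]) for j in rated)
    return item_mean[i] + num / denom

# Tiny illustrative example (hypothetical similarities and means):
ratings = {"a": {"i2": 4, "i3": 2}}
item_mean = {"i1": 3.0, "i2": 3.5, "i3": 2.5}
sim = {("i1", "i2"): 0.8, ("i1", "i3"): 0.2}
print(round(item_based_predict(ratings, sim, item_mean, "a", "i1"), 2))  # → 3.3
```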
Experiments
For user- and item-based approaches, choose
similarity measure
prediction scheme
neighborhood size K
For cluster-based approaches, choose
distance measure
prediction scheme
number of clusters
Evaluation protocol [Herlocker et al., 2004]
movie rating dataset: MovieLens (6040 users × 3706 items)
10-fold cross-validation (10 × 9/10th for learning)
Mean Absolute Error on the test set T = {(u, i, r)}

MAE = \frac{1}{|T|} \sum_{(u, i, r) \in T} |p_{ui} - r|
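The MAE above, as a sketch; `predict` stands for any of the predictors from the earlier slides, and the toy test set is illustrative:

```python
def mae(test_set, predict):
    """Mean Absolute Error over a test set of (user, item, rating) triples."""
    errors = [abs(predict(u, i) - r) for (u, i, r) in test_set]
    return sum(errors) / len(errors)

# Toy usage with a constant predictor:
test = [("u1", "i1", 4), ("u1", "i2", 2), ("u2", "i1", 5)]
print(mae(test, lambda u, i: 3.5))  # (0.5 + 1.5 + 1.5) / 3
```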
User-based approaches, similarity measures
[Figure: MAE (y-axis, 0.68–0.8) as a function of the neighborhood size K (x-axis, 0–2500); one curve per similarity measure: Pearson, Constraint, Adjusted Cosine, Proba]
User-based approaches, prediction schemes
[Figure: MAE (y-axis, 0.68–0.8) as a function of the neighborhood size K (x-axis, 0–2500); one curve per prediction scheme: PearsonWeighted, PearsonDeviation, ProbaWeighted, ProbaDeviation]
Item-based approaches, similarity measures
[Figure: MAE (y-axis, 0.64–0.76) as a function of the neighborhood size K (x-axis, 0–1400); one curve per similarity measure: Pearson, Constraint, Adjusted Cosine, Proba]
Summary of experiments
                                 BestDefault   BestUser   BestItem   BestCluster
model construction time (sec.)        1           730        170         254
prediction time (sec.)                1            31          3           1
MAE                                0.6829       0.6688     0.6382      0.6736
BestDefault: Bayes minimizing MAE
BestUser: Pearson similarity, 1500 neighbors, prediction using deviation from the mean
BestItem: probabilistic similarity, 400 neighbors, prediction using deviation from the mean
BestCluster: K-means, Euclidean distance, 4 clusters, prediction using Bayes minimizing MAE
Conclusions
All approaches, and all their possible options, are tested under exactly the same conditions
Bayes is a good compromise: low error rate, low execution time, incremental
Deviation from the mean: better results, new for item-based approaches
Similarity measures: Pearson for user-based, probabilistic for item-based
Conclusions
The item-based approach
gets the best performance in the experiments
seems to need fewer neighbors than the user-based approach
is also appropriate for navigating item catalogues, even with no user information
may naturally use content data about items to improve its results (likewise for the user-based approach with demographic data)
Do its results depend on the number of items compared to the number of users?
Next
Need to scale well even when faced with huge datasets, e.g. the Netflix Prize: 100,480,507 ratings from 480,189 users on 17,770 movies
select the most relevant users [Yu et al., 2002]
reduce dimensionality with PCA or SVD [Goldberg et al., 2001, Vozalis and Margaritis, 2005]
create a set of super-users [Rashid et al., 2006]
sampling? stochastic? bagging?
Combine approaches ⇒ ensemble methods [Polikar, 2006]
P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom and J. Riedl (1994). GroupLens: an open architecture for collaborative filtering of netnews. In Conference on Computer Supported Cooperative Work, pages 175–186. ACM.
J. Breese, D. Heckerman and C. Kadie (1998). Empirical analysis of predictive algorithms for collaborative filtering. In 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52. Morgan Kaufmann.
G. Karypis (2001). Evaluation of item-based top-N recommendation algorithms. In 10th International Conference on Information and Knowledge Management, pages 247–254.
K. Goldberg, T. Roeder, D. Gupta and C. Perkins (2001). Eigentaste: a constant time collaborative filtering algorithm. Information Retrieval, 4(2):133–151.
K. Yu, X. Xu, J. Tao, M. Ester and H. Kriegel (2002). Instance selection techniques for memory-based collaborative filtering. In SIAM Data Mining.
J. Herlocker, J. Konstan, L. Terveen and J. Riedl (2004). Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems, 22(1):5–53.
G. Adomavicius and A. Tuzhilin (2005). Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749.
M. Vozalis and K. Margaritis (2005). Applying SVD on item-based filtering. In 5th International Conference on Intelligent Systems Design and Applications, pages 464–469.
A.M. Rashid, S.K. Lam, G. Karypis and J. Riedl (2006). ClustKNN: a highly scalable hybrid model- & memory-based CF algorithm. In KDD Workshop on Web Mining and Web Usage Analysis.
R. Polikar (2006). Ensemble systems in decision making. IEEE Circuits & Systems Magazine, 6(3):21–45.