comparing state-of-the-art collaborative filtering systems
TRANSCRIPT
Introduction
Collaborative approaches
Experiments
Conclusions
Comparing State-of-the-Art
Collaborative Filtering Systems
Laurent Candillier, Frank Meyer, Marc Boullé
France Telecom R&D, Lannion
MLDM 2007
1 Introduction
2 Collaborative approaches
3 Experiments
4 Conclusions
Recommender systems
Help users find items they should appreciate from huge catalogues [Adomavicius and Tuzhilin, 2005]
⇒ Collaborative filtering: based on the user-to-item rating matrix
i1 i2 i3 i4 i5
u1 4 4 1
u2 4 3
u3 5 2 1
u4 4 5
u5 5 4
u6 5 3
u7 4 ? 1
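For concreteness, such a sparse rating matrix is often stored as a dictionary of per-user ratings. A minimal Python sketch, with illustrative data (not the exact entries of the slide, whose column placement is ambiguous in this transcript):

```python
# Hypothetical sparse user-to-item rating matrix (illustrative data):
# ratings[user][item] = rating; absent keys are unrated items.
ratings = {
    "u1": {"i1": 4, "i2": 4, "i4": 1},
    "u2": {"i2": 4, "i5": 3},
    "u3": {"i1": 5, "i3": 2},
}

def rated_items(user):
    """S_u: the set of items rated by `user`."""
    return set(ratings.get(user, {}))

print(sorted(rated_items("u1")))  # → ['i1', 'i2', 'i4']
```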
User-based approaches
Recommend items appreciated by users whose tastes are similar to those of the given user [Resnick et al., 1994]
⇒ need a similarity measure between users, e.g. Pearson similarity: cosine of the deviations from the mean
w(a, u) = \frac{\sum_{i \in S_a \cap S_u} (v_{ai} - \bar{v}_a)(v_{ui} - \bar{v}_u)}{\sqrt{\sum_{i \in S_a \cap S_u} (v_{ai} - \bar{v}_a)^2 \; \sum_{i \in S_a \cap S_u} (v_{ui} - \bar{v}_u)^2}}

v_{ui}: rating of user u on item i
S_u: set of items rated by user u
\bar{v}_u: mean rating of user u

\bar{v}_u = \frac{\sum_{i \in S_u} v_{ui}}{|S_u|}
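The user mean and the Pearson similarity above can be sketched in Python; `ratings` follows a dict-of-dicts layout and all names and data are illustrative:

```python
from math import sqrt

def mean_rating(ratings, u):
    """v̄_u: mean of user u's ratings over S_u."""
    vals = ratings[u].values()
    return sum(vals) / len(vals)

def pearson(ratings, a, u):
    """w(a, u): Pearson similarity over the items both users rated."""
    common = set(ratings[a]) & set(ratings[u])  # S_a ∩ S_u
    if not common:
        return 0.0
    va, vu = mean_rating(ratings, a), mean_rating(ratings, u)
    num = sum((ratings[a][i] - va) * (ratings[u][i] - vu) for i in common)
    den = sqrt(sum((ratings[a][i] - va) ** 2 for i in common)
               * sum((ratings[u][i] - vu) ** 2 for i in common))
    return num / den if den else 0.0

# Tiny illustrative example:
ratings = {"a": {"i1": 5, "i2": 3, "i3": 4},
           "u": {"i1": 4, "i2": 2, "i3": 5}}
print(round(pearson(ratings, "a", "u"), 3))  # → 0.655
```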
User-based approaches
Which rating for user a (active) on item i ?
Prediction using a weighted sum

p_{ai} = \frac{\sum_{\{u \mid i \in S_u\}} w(a, u) \, v_{ui}}{\sum_{\{u \mid i \in S_u\}} |w(a, u)|}

Prediction using a weighted sum of deviations from the mean

p_{ai} = \bar{v}_a + \frac{\sum_{\{u \mid i \in S_u\}} w(a, u) \, (v_{ui} - \bar{v}_u)}{\sum_{\{u \mid i \in S_u\}} |w(a, u)|}
How many neighbors considered ?
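Both prediction schemes can be sketched as follows; the similarity weights w(a, u) are passed in precomputed (e.g. from the Pearson measure of the previous slide), and all names and data are illustrative:

```python
def predict_weighted_sum(ratings, weights, a, i):
    """p_ai = Σ_u w(a,u)·v_ui / Σ_u |w(a,u)| over users u with i in S_u."""
    neighbors = [u for u in ratings if u != a and i in ratings[u]]
    denom = sum(abs(weights[u]) for u in neighbors)
    if denom == 0:
        return None  # no neighbor rated item i
    return sum(weights[u] * ratings[u][i] for u in neighbors) / denom

def predict_deviation(ratings, weights, a, i):
    """p_ai = v̄_a + Σ_u w(a,u)·(v_ui − v̄_u) / Σ_u |w(a,u)|."""
    def mean(u):
        vals = ratings[u].values()
        return sum(vals) / len(vals)
    neighbors = [u for u in ratings if u != a and i in ratings[u]]
    denom = sum(abs(weights[u]) for u in neighbors)
    if denom == 0:
        return mean(a)  # fall back to the active user's mean
    num = sum(weights[u] * (ratings[u][i] - mean(u)) for u in neighbors)
    return mean(a) + num / denom

# Tiny illustrative example (hypothetical weights w(a, ·)):
ratings = {"a": {"i1": 4}, "u1": {"i1": 5, "i2": 3}, "u2": {"i1": 3, "i2": 5}}
weights = {"u1": 0.9, "u2": 0.1}
print(round(predict_weighted_sum(ratings, weights, "a", "i2"), 2))  # → 3.2
```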
Cluster-based approaches
Recommend items appreciated by users that belong to the same group as the given user [Breese et al., 1998]
⇒ need
a clustering method, e.g. K-means
a distance measure, e.g. Euclidean distance
Then the rating of a user on an item is the mean rating given by the users that belong to the same cluster
How many clusters considered ?
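The cluster-based prediction step can be sketched as follows, assuming the clustering (e.g. K-means over rating vectors) has already assigned each user to a cluster; names and data are illustrative:

```python
def cluster_predict(ratings, cluster_of, a, i):
    """Predict user a's rating on item i as the mean rating of i among
    users in the same cluster as a (None if no cluster member rated i)."""
    members = [u for u in ratings
               if cluster_of[u] == cluster_of[a] and i in ratings[u]]
    if not members:
        return None
    return sum(ratings[u][i] for u in members) / len(members)

# Tiny illustrative example (hypothetical cluster assignment):
ratings = {"a": {}, "u1": {"i1": 5}, "u2": {"i1": 3}, "u3": {"i1": 1}}
cluster_of = {"a": 0, "u1": 0, "u2": 0, "u3": 1}
print(cluster_predict(ratings, cluster_of, "a", "i1"))  # → 4.0
```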
Item-based approaches
Recommend items similar to those appreciated by the given user [Karypis, 2001]
⇒ the dual of the user-based approach
p_{ai} = \bar{v}_i + \frac{\sum_{\{j \in S_a \mid j \neq i\}} sim(i, j) \, (v_{aj} - \bar{v}_j)}{\sum_{\{j \in S_a \mid j \neq i\}} |sim(i, j)|}

sim(i, j): similarity measure between items i and j
S_a: set of items rated by user a
\bar{v}_i: mean rating on item i
How many neighbors considered ?
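The item-based prediction above can be sketched as follows; the item-item similarities and item means are passed in precomputed, and all names and data are illustrative:

```python
def item_based_predict(ratings, sim, item_mean, a, i):
    """p_ai = v̄_i + Σ_{j∈S_a, j≠i} sim(i,j)·(v_aj − v̄_j) / Σ |sim(i,j)|."""
    rated = [j for j in ratings[a] if j != i]  # S_a without i
    denom = sum(abs(sim[(i, j)]) for j in rated)
    if denom == 0:
        return item_mean[i]  # fall back to the item's mean rating
    num = sum(sim[(i, j)] * (ratings[a][j] - item_mean[j]) for j in rated)
    return item_mean[i] + num / denom

# Tiny illustrative example (hypothetical similarities and means):
ratings = {"a": {"i2": 4, "i3": 2}}
item_mean = {"i1": 3.0, "i2": 3.5, "i3": 2.5}
sim = {("i1", "i2"): 0.8, ("i1", "i3"): 0.2}
print(round(item_based_predict(ratings, sim, item_mean, "a", "i1"), 2))  # → 3.3
```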
Experiments
For user- and item-based approaches, choose
similarity measure
prediction scheme
neighborhood size K
For cluster-based approaches, choose
distance measure
prediction scheme
number of clusters
Evaluation protocol [Herlocker et al., 2004]
movie rating dataset: MovieLens (6040 users × 3706 items)
10-fold cross-validation (10 × 9/10th for learning)
Mean Absolute Error on the test set T = {(u, i, r)}

MAE = \frac{1}{|T|} \sum_{(u, i, r) \in T} |p_{ui} - r|
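The MAE above, as a sketch; `predict` stands for any of the predictors from the earlier slides, and the toy test set is illustrative:

```python
def mae(test_set, predict):
    """Mean Absolute Error over a test set of (user, item, rating) triples."""
    errors = [abs(predict(u, i) - r) for (u, i, r) in test_set]
    return sum(errors) / len(errors)

# Toy usage with a constant predictor:
test = [("u1", "i1", 4), ("u1", "i2", 2), ("u2", "i1", 5)]
print(mae(test, lambda u, i: 3.5))  # (0.5 + 1.5 + 1.5) / 3
```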
User-based approaches, similarity measures
[Figure: MAE (y-axis, 0.68–0.8) as a function of the neighborhood size K (x-axis, 0–2500); one curve per similarity measure: Pearson, Constraint, Adjusted Cosine, Proba]
User-based approaches, prediction schemes
[Figure: MAE (y-axis, 0.68–0.8) as a function of the neighborhood size K (x-axis, 0–2500); one curve per prediction scheme: PearsonWeighted, PearsonDeviation, ProbaWeighted, ProbaDeviation]
Item-based approaches, similarity measures
[Figure: MAE (y-axis, 0.64–0.76) as a function of the neighborhood size K (x-axis, 0–1400); one curve per similarity measure: Pearson, Constraint, Adjusted Cosine, Proba]
Summary of experiments
                                 BestDefault   BestUser   BestItem   BestCluster
model construction time (sec.)        1           730        170         254
prediction time (sec.)                1            31          3           1
MAE                                0.6829       0.6688     0.6382      0.6736
BestDefault: Bayes minimizing MAE
BestUser: Pearson similarity, 1500 neighbors, prediction using deviation from the mean
BestItem: probabilistic similarity, 400 neighbors, prediction using deviation from the mean
BestCluster: K-means, Euclidean distance, 4 clusters, prediction using Bayes minimizing MAE
Conclusions
All approaches, and all their possible options, are tested under exactly the same conditions
Bayes is a good compromise: low error rate, low execution time, incremental
Deviation from the mean: better results, new for item-based approaches
Similarity measures: Pearson for user-based, probabilistic for item-based
Conclusions
The item-based approach
gets the best performance in the experiments
seems to need fewer neighbors than the user-based approach
is also appropriate for navigating item catalogues, even with no user information
may naturally use content data about items to improve its results (likewise for the user-based approach with demographic data)
Do its results depend on the number of items compared to the number of users?
Next
Need to scale well even when faced with huge datasets, e.g. the Netflix Prize: 100,480,507 ratings from 480,189 users on 17,770 movies
select the most relevant users [Yu et al., 2002]
reduce dimensionality with PCA or SVD [Goldberg et al., 2001, Vozalis and Margaritis, 2005]
create a set of super-users [Rashid et al., 2006]
sampling? stochastic? bagging?
Combine approaches ⇒ ensemble methods [Polikar, 2006]
P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom and J. Riedl (1994). GroupLens: an open architecture for collaborative filtering of netnews. In Conference on Computer Supported Cooperative Work, pages 175–186. ACM.
J. Breese, D. Heckerman and C. Kadie (1998). Empirical analysis of predictive algorithms for collaborative filtering. In 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52. Morgan Kaufmann.
G. Karypis (2001). Evaluation of item-based top-N recommendation algorithms. In 10th International Conference on Information and Knowledge Management, pages 247–254.
K. Goldberg, T. Roeder, D. Gupta and C. Perkins (2001). Eigentaste: a constant time collaborative filtering algorithm. Information Retrieval, 4(2):133–151.
K. Yu, X. Xu, J. Tao, M. Ester and H. Kriegel (2002). Instance selection techniques for memory-based collaborative filtering. In SIAM Data Mining.
J. Herlocker, J. Konstan, L. Terveen and J. Riedl (2004). Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems, 22(1):5–53.
G. Adomavicius and A. Tuzhilin (2005). Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749.
M. Vozalis and K. Margaritis (2005). Applying SVD on item-based filtering. In 5th International Conference on Intelligent Systems Design and Applications, pages 464–469.
A.M. Rashid, S.K. Lam, G. Karypis and J. Riedl (2006). ClustKNN: a highly scalable hybrid model- & memory-based CF algorithm. In KDD Workshop on Web Mining and Web Usage Analysis.
R. Polikar (2006). Ensemble systems in decision making. IEEE Circuits & Systems Magazine, 6(3):21–45.