Algorithms for Efficient Collaborative Filtering
Vreixo Formoso
Fidel Cacheda
Víctor Carneiro
University of A Coruña (Spain)
Glasgow - 30th March 2008, EIIR 2008
Outline
– Introduction
– Background in Collaborative Filtering
– Proposed algorithms
– Experiments
– Conclusions
Introduction
More and more information is available every day, so personalized retrieval systems are increasingly interesting
– Recommender systems: recommend the items that best match the user's needs or preferences
– Useful in e-commerce, but we think they could also be useful in Web IR
Recommender systems store some information about the user's preferences: the user profile
– Explicit or implicit
Introduction
Types of recommender systems:
– Content-based filtering: recommends items based on their content
  Depends on automatic analysis of the items
  Unable to determine the quality of an item
  Serendipitous finds are unlikely
– Collaborative filtering: based on other users' evaluations
  Recommends items rated highly by other users with similar interests
  Problems with computational performance and efficiency
Background
User profile: the evaluations made by the user
Evaluation: a numerical value (e.g. 1–5)
Evaluation matrix: contains the evaluations of all users
Types of collaborative filtering algorithms:
– Memory-based: use similarity measures to find related neighbours (users or items)
  The entire matrix is used in each prediction
– Model-based: build a model that represents user behaviour and use it to predict the user's evaluations
  The parameters of the model are estimated off-line from the evaluation matrix
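To make the memory-based idea concrete, the following is a minimal Python sketch of a user-based neighbourhood predictor. It is a generic textbook variant, not necessarily the one evaluated later: the Pearson similarity, the mean-centred weighted average, the function names and the toy `ratings` dictionary are all assumptions for illustration. Note that `predict_user_based` scans every user profile at query time, which is what "the entire matrix is used in each prediction" means in practice.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def pearson(ratings, a, b):
    """Pearson correlation between users a and b over the items both have evaluated."""
    common = ratings[a].keys() & ratings[b].keys()
    if len(common) < 2:
        return 0.0  # too little overlap to estimate a correlation
    ma = mean([ratings[a][i] for i in common])
    mb = mean([ratings[b][i] for i in common])
    num = sum((ratings[a][i] - ma) * (ratings[b][i] - mb) for i in common)
    den = sqrt(sum((ratings[a][i] - ma) ** 2 for i in common)) * sqrt(
        sum((ratings[b][i] - mb) ** 2 for i in common)
    )
    return num / den if den else 0.0


def predict_user_based(ratings, u, i):
    """Memory-based prediction of v_ui: every other user is scanned at query time."""
    mu = mean(list(ratings[u].values()))
    num = den = 0.0
    for other, profile in ratings.items():
        if other == u or i not in profile:
            continue
        w = pearson(ratings, u, other)
        # neighbour's deviation from its own mean, weighted by similarity
        num += w * (profile[i] - mean(list(profile.values())))
        den += abs(w)
    return mu + num / den if den else mu


# Hypothetical toy evaluation matrix (user -> {item: evaluation}); missing cells are unrated.
ratings = {
    "u1": {"i1": 5, "i2": 3, "i3": 4},
    "u2": {"i1": 4, "i2": 2, "i3": 5, "i4": 4},
    "u3": {"i1": 1, "i2": 5, "i4": 2},
}
p = predict_user_based(ratings, "u1", "i4")
```

With the toy data, `u1` correlates negatively with `u3`, who rated `i4` below their own mean, so that neighbour pushes the prediction up rather than down.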
Background
Memory-based
– Simple, and give reasonably precise results
– Low scalability
– More sensitive to common recommender system problems: sparsity, cold start and spam
Model-based
– Find underlying characteristics in the data
– Faster at prediction time
– Complexity of the models:
  Sensitive to changes in the data
  High construction times
  The model must be updated when new data are available
Background: Notation
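Evaluation matrix $V$, with one row per user ($u_1, \dots, u_m$) and one column per item ($i_1, \dots, i_n$):
$$
V = \begin{pmatrix}
v_{11} & v_{12} & \cdots & v_{1n} \\
v_{21} & v_{22} & \cdots & v_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
v_{m1} & v_{m2} & \cdots & v_{mn}
\end{pmatrix}
$$
– Items: $I = \{i_1, \dots, i_n\}$
– Users: $U = \{u_1, \dots, u_m\}$
– User profile: $I_u$, the items evaluated by user $u$ (e.g. $I_1$ for $u_1$)
– Users that have evaluated item $i$: $U_i$ (e.g. $U_1$ for $i_1$)
– Prediction of the evaluation of user $u$ for item $i$: $p_{ui}$ (e.g. $p_{mn}$ for user $m$ and item $n$)
– $v_{u\cdot}$: evaluations of user $u$
– $v_{\cdot i}$: evaluations for item $i$
– Mean values: $\bar{v}_{u\cdot}$ and $\bar{v}_{\cdot i}$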
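The notation maps naturally onto a sparse data structure. This sketch (an assumption about representation, not the authors' code) stores only the observed evaluations and derives $I_u$, $U_i$ and the user and item means from a list of (user, item, evaluation) triples; the function name `build_notation` and the toy data are hypothetical.

```python
from collections import defaultdict


def build_notation(triples):
    """triples: iterable of (user, item, evaluation).
    Returns (V, I_u, U_i, user_mean, item_mean)."""
    V = {}                  # sparse evaluation matrix: (u, i) -> v_ui
    I_u = defaultdict(set)  # user profile: items evaluated by each user
    U_i = defaultdict(set)  # users that have evaluated each item
    for u, i, v in triples:
        V[(u, i)] = v
        I_u[u].add(i)
        U_i[i].add(u)
    # mean of v_u. (row mean) and of v_.i (column mean), over observed cells only
    user_mean = {u: sum(V[(u, i)] for i in items) / len(items) for u, items in I_u.items()}
    item_mean = {i: sum(V[(u, i)] for u in users) / len(users) for i, users in U_i.items()}
    return V, I_u, U_i, user_mean, item_mean


triples = [("u1", "i1", 5), ("u1", "i2", 3), ("u2", "i1", 4), ("u2", "i3", 2)]
V, I_u, U_i, user_mean, item_mean = build_notation(triples)
```

Only observed cells are stored, so the cost is linear in the number of evaluations rather than in $m \times n$.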
Proposed algorithms
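Objectives:
– Good behaviour with low-density data
– Computational efficiency
– Constant updating
Item mean algorithm (our baseline):
– Uses the mean of an item as its prediction: $p_{ui} = \bar{v}_{\cdot i}$
Simple mean-based algorithm:
– The item mean is corrected with the user's evaluations, using the average difference between the user's evaluations and the means of the items evaluated:
$$
p_{ui} = \bar{v}_{\cdot i} + \frac{\sum_{j \in I_u} \left(v_{uj} - \bar{v}_{\cdot j}\right)}{|I_u|}
$$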
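Both mean-based predictors reduce to look-ups once a few averages are stored. The sketch below follows the simple mean-based formula (item mean plus the user's average deviation from the means of the items they evaluated); the dictionary layout, the toy data and the function names are hypothetical. Training is a linear pass over the evaluations and each prediction is O(1), in line with the complexities reported for these algorithms.

```python
from collections import defaultdict


def train_means(ratings):
    """ratings: user -> {item: evaluation}. Linear in the number of evaluations."""
    sums, counts = defaultdict(float), defaultdict(int)
    for profile in ratings.values():
        for i, v in profile.items():
            sums[i] += v
            counts[i] += 1
    item_mean = {i: sums[i] / counts[i] for i in sums}
    # correction[u] = sum over j in I_u of (v_uj - mean of item j), divided by |I_u|
    correction = {
        u: sum(v - item_mean[j] for j, v in profile.items()) / len(profile)
        for u, profile in ratings.items()
        if profile
    }
    return item_mean, correction


def predict_item_mean(item_mean, i):
    """Item mean algorithm: p_ui is the mean of item i (O(1))."""
    return item_mean[i]


def predict_simple_mean(item_mean, correction, u, i):
    """Simple mean-based algorithm: item mean plus the user's correction (O(1))."""
    return item_mean[i] + correction[u]


ratings = {"u1": {"i1": 5, "i2": 3}, "u2": {"i1": 3, "i2": 1, "i3": 2}}
item_mean, correction = train_means(ratings)
```

Users or items that never appear in `ratings` raise `KeyError`; a deployed system would need a fallback (for example the global mean), which the slides do not cover.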
Proposed algorithms
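Tendencies based algorithm
– Main idea: users tend to evaluate items either positively or negatively, so these tendencies are included in the formula
– Tendency ≠ mean: a tendency measures how a user's evaluations deviate from the item means (and how an item's evaluations deviate from the user means), not the average evaluation itself
– Tendency of a user ($ub_u$) and tendency of an item ($ib_i$):
$$
ub_u = \frac{\sum_{i \in I_u} \left(v_{ui} - \bar{v}_{\cdot i}\right)}{|I_u|}
\qquad
ib_i = \frac{\sum_{u \in U_i} \left(v_{ui} - \bar{v}_{u\cdot}\right)}{|U_i|}
$$
– The algorithm uses the means of the item and of the user, as well as their respective tendencies.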
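The two tendencies need only the user and item means plus one more pass over the evaluations. This Python sketch (hypothetical function name and toy data) computes $\bar{v}_{u\cdot}$, $\bar{v}_{\cdot i}$, $ub_u$ and $ib_i$ from a user-to-evaluations dictionary, and the toy data show why a tendency is not the same as a mean.

```python
def tendencies(ratings):
    """ratings: user -> {item: evaluation}. Returns (user_mean, item_mean, ub, ib)."""
    user_mean = {u: sum(p.values()) / len(p) for u, p in ratings.items() if p}
    by_item = {}  # item -> {user: evaluation}, i.e. the columns of V
    for u, p in ratings.items():
        for i, v in p.items():
            by_item.setdefault(i, {})[u] = v
    item_mean = {i: sum(c.values()) / len(c) for i, c in by_item.items()}
    # ub_u: average of (v_ui - item mean) over the items evaluated by u
    ub = {
        u: sum(v - item_mean[i] for i, v in p.items()) / len(p)
        for u, p in ratings.items()
        if p
    }
    # ib_i: average of (v_ui - user mean) over the users who evaluated i
    ib = {
        i: sum(v - user_mean[u] for u, v in c.items()) / len(c)
        for i, c in by_item.items()
    }
    return user_mean, item_mean, ub, ib


ratings = {"u1": {"i1": 3, "i2": 3}, "u2": {"i1": 1, "i2": 1, "i3": 5}}
user_mean, item_mean, ub, ib = tendencies(ratings)
```

Here `u1` has a middling mean of 3, yet `ub["u1"]` is +1, because the items it evaluated average only 2 across all users; `u2`, with a lower mean of about 2.33, has a negative tendency of -2/3.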
Proposed algorithms
Tendencies based algorithm
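The prediction depends on the signs of the two tendencies ($\beta$ is a weighting parameter):
$$
p_{ui} =
\begin{cases}
\max\left(\bar{v}_{u\cdot} + ib_i,\ \bar{v}_{\cdot i} + ub_u\right) & \text{if } ub_u > 0 \text{ and } ib_i > 0 \\
\min\left(\bar{v}_{u\cdot} + ib_i,\ \bar{v}_{\cdot i} + ub_u\right) & \text{if } ub_u < 0 \text{ and } ib_i < 0 \\
\min\left[\max\left(\bar{v}_{u\cdot},\ (\bar{v}_{\cdot i} + ub_u)\beta + (\bar{v}_{u\cdot} + ib_i)(1-\beta)\right),\ \bar{v}_{\cdot i}\right] & \text{if } ub_u < 0 \text{ and } ib_i > 0 \\
\bar{v}_{\cdot i}\beta + \bar{v}_{u\cdot}(1-\beta) & \text{if } ub_u > 0 \text{ and } ib_i < 0
\end{cases}
$$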
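The case analysis translates directly into a small function. In this sketch the four cases are selected by the signs of $ub_u$ and $ib_i$ (both positive: the max; both negative: the min; negative user and positive item tendency: a blend clipped between $\bar{v}_{u\cdot}$ and $\bar{v}_{\cdot i}$; positive user and negative item tendency: a weighted mean of the two means). The default `beta=0.5` is a hypothetical value, since the slides do not give one, and sending zero tendencies to the weighted-mean branch is likewise an assumption.

```python
def predict_tendencies(v_u, v_i, ub_u, ib_i, beta=0.5):
    """Tendencies-based prediction p_ui from precomputed statistics.

    v_u, v_i: mean evaluation of the user and of the item
    ub_u, ib_i: tendency of the user and of the item
    beta: weighting parameter (0.5 is a hypothetical default)
    """
    if ub_u > 0 and ib_i > 0:
        # both push upwards: take the larger of the two corrected means
        return max(v_u + ib_i, v_i + ub_u)
    if ub_u < 0 and ib_i < 0:
        # both push downwards: take the smaller one
        return min(v_u + ib_i, v_i + ub_u)
    if ub_u < 0 and ib_i > 0:
        # conflicting tendencies: blend, then clip to the range [v_u, v_i]
        blend = (v_i + ub_u) * beta + (v_u + ib_i) * (1 - beta)
        return min(max(v_u, blend), v_i)
    # ub_u > 0 and ib_i < 0 (zero tendencies also land here, an assumption)
    return v_i * beta + v_u * (1 - beta)
```

Because it only combines four precomputed numbers, each prediction is O(1), consistent with the complexity table.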
Experiments
Algorithms evaluated:
– Memory-based: user-based, item-based and similarity fusion
– Model-based: regression-based, slope one, latent semantic indexing and cluster-based smoothing
– Hybrid: personality diagnosis
Dataset: MovieLens
– Real ratings of films, from 1 (very bad) to 5 (excellent)
– 100,000 evaluations from 943 users on 1,682 movies (1.78 items per user, i.e. 1,682 / 943; about 106 evaluations per user on average). Density about 6% (100,000 / (943 × 1,682) ≈ 6.3%)
– Training set sizes: 10%, 50% and 90% of the evaluations
Each algorithm was evaluated 5 times, measuring:
– Training and prediction times
– Quality of the predictions
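The protocol above (several training-set sizes, five repetitions, timing plus prediction quality) can be sketched as a small harness. Everything here is an assumption for illustration: the random split, the use of mean absolute error as the quality measure (the slides do not name the metric), the synthetic data standing in for MovieLens, and the item-mean baseline with a global-mean fallback for unseen items.

```python
import random
import time
from statistics import mean


def split(triples, train_fraction, seed):
    """Randomly put train_fraction of the (user, item, rating) triples into training."""
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def evaluate(triples, fit, train_fraction, runs=5):
    """Mean error, training time and prediction time over `runs` random splits.
    fit(train) must return a function predict(user, item)."""
    errors, train_times, predict_times = [], [], []
    for seed in range(runs):
        train, test = split(triples, train_fraction, seed)
        t0 = time.perf_counter()
        predict = fit(train)
        t1 = time.perf_counter()
        # mean absolute error: an assumed metric, the slides do not name one
        errors.append(mean(abs(predict(u, i) - v) for u, i, v in test))
        t2 = time.perf_counter()
        train_times.append(t1 - t0)
        predict_times.append(t2 - t1)
    return mean(errors), mean(train_times), mean(predict_times)


def fit_item_mean(train):
    """Item-mean baseline; items unseen in training fall back to the global mean (an assumption)."""
    sums, counts = {}, {}
    for _, i, v in train:
        sums[i] = sums.get(i, 0.0) + v
        counts[i] = counts.get(i, 0) + 1
    global_mean = sum(sums.values()) / sum(counts.values())
    return lambda u, i: sums[i] / counts[i] if i in counts else global_mean


# Synthetic stand-in for MovieLens (hypothetical data, ratings 1-5).
triples = [(f"u{u}", f"i{i}", 1 + (u + i) % 5) for u in range(20) for i in range(10)]
results = {f: evaluate(triples, fit_item_mean, f) for f in (0.1, 0.5, 0.9)}
```

Any predictor can be plugged in by writing another `fit_*` function that returns a `predict(user, item)` closure.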
Proposed algorithms
Tendencies based algorithm
– Only 5% of the predictions fall into this case with the 10% training set, and 2% with the 90% training set
– This case represents some unusual elements
– Tendencies seem to be a good prediction mechanism
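The slide does not say which case the percentages refer to. A hypothetical helper for measuring how often each sign combination of $(ub_u, ib_i)$ occurs among the test pairs, given precomputed tendencies, could look like this; the string labels such as `"+-"` are an arbitrary choice.

```python
from collections import Counter


def case_shares(pairs, ub, ib):
    """Fraction of (user, item) pairs falling into each sign case of (ub_u, ib_i)."""

    def sign(x):
        return "+" if x > 0 else "-" if x < 0 else "0"

    counts = Counter(sign(ub[u]) + sign(ib[i]) for u, i in pairs)
    total = sum(counts.values())
    return {case: n / total for case, n in counts.items()}


# Hypothetical tendencies and test pairs
ub = {"a": 0.5, "b": -0.2}
ib = {"x": 0.1, "y": -0.3}
shares = case_shares([("a", "x"), ("a", "y"), ("b", "x"), ("a", "x")], ub, ib)
```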
Experiments: Computational complexity
(m: number of users, n: number of items)
| Algorithm | Training complexity | Prediction complexity |
|---|---|---|
| User Based | - | O(mn) |
| Item Based | O(mn²) | O(n) |
| Similarity Fusion | O(n²m + m²n) | O(mn) |
| Personality Diagnosis | O(m²n) | O(m) |
| Regression Based | O(mn²) | O(n) |
| Slope One | O(mn²) | O(n) |
| Latent Semantic Indexing | O((m+n)³) | O(1) |
| Cluster Based Smoothing | O(mnα + m²n) | O(mn) |
| Item Mean | O(mn) | O(1) |
| Simple Mean Based | O(mn) | O(1) |
| Tendencies Based | O(mn) | O(1) |
Experiments: Training time
| Algorithm | 10% training | 50% training | 90% training |
|---|---|---|---|
| User Based | 0 | 0 | 0 |
| Item Based | 415 | 1,060 | 1,986 |
| Similarity Fusion | 987 | 3,840 | 5,474 |
| Personality Diagnosis | 257 | 994 | 2,213 |
| Regression Based | 3,302 | 4,575 | 7,780 |
| Slope One | 1,246 | 2,175 | 2,541 |
| Latent Semantic Indexing | 117,758 | 115,218 | 102,855 |
| Cluster Based Smoothing | 60,247 | 71,529 | 44,635 |
| Item Mean | 2 | 3 | 3 |
| Simple Mean Based | 7 | 10 | 5 |
| Tendencies Based | 11 | 15 | 9 |
Experiments: Prediction time
| Algorithm | 10% training | 50% training | 90% training |
|---|---|---|---|
| User Based | 6,250 | 15,597 | 8,915 |
| Item Based | 221 | 1,864 | 909 |
| Similarity Fusion | 227,736 | 756,834 | 264,951 |
| Personality Diagnosis | 1,369 | 3,845 | 1,400 |
| Regression Based | 205 | 570 | 265 |
| Slope One | 319 | 501 | 116 |
| Latent Semantic Indexing | 162 | 158 | 20 |
| Cluster Based Smoothing | 70,515 | 251,595 | 118,552 |
| Item Mean | 24 | 12 | 2 |
| Simple Mean Based | 25 | 11 | 4 |
| Tendencies Based | 24 | 16 | 4 |
Experiments: Prediction quality (lower is better)
| Algorithm | 10% training | 50% training | 90% training |
|---|---|---|---|
| User Based | 0.99 | 0.71 | 0.68 |
| Item Based | 0.92 | 0.75 | 0.71 |
| Similarity Fusion | 0.84 | 0.73 | 0.71 |
| Personality Diagnosis | 0.82 | 0.78 | 0.78 |
| Regression Based | 1.03 | 0.76 | 0.74 |
| Slope One | 0.90 | 0.72 | 0.70 |
| Latent Semantic Indexing | 0.85 | 0.77 | 0.73 |
| Cluster Based Smoothing | 0.97 | 0.87 | 0.80 |
| Item Mean | 0.82 | 0.79 | 0.79 |
| Simple Mean Based | 0.79 | 0.72 | 0.72 |
| Tendencies Based | 0.79 | 0.72 | 0.71 |
Conclusions
We have presented new algorithms for collaborative filtering:
– Very simple, with good response times
– Tendencies based algorithm:
  Quality of the predictions equivalent to the best algorithms
  Even better with low-density training sets
Next steps: use these algorithms in Web IR
– Problem: which dataset to use?
Thank you!
Questions?