collaborative filtering intro - full
DESCRIPTION
A new version of the collaborative filtering talk:
- Presenting the Netflix Prize story
- Discussing User-based and Item-based collaborative filtering, and various similarity metrics
- Discussing how to Map-Reduce the calculation

TRANSCRIPT
![Page 1: Collaborative filtering intro - Full](https://reader038.vdocuments.mx/reader038/viewer/2022103110/54b72bed4a795916198b4808/html5/thumbnails/1.jpg)
WE KNOW YOU WILL LIKE THIS
Introduction to Recommendation Engines
Monday, January 14, 13
![Page 2: Collaborative filtering intro - Full](https://reader038.vdocuments.mx/reader038/viewer/2022103110/54b72bed4a795916198b4808/html5/thumbnails/2.jpg)
ML
Supervised (X + Y)
Regression: Y = Turnout (numeric): 30, 12, 25
Classification: Y = Class (categorical): Spam, Not Spam, Spam
Unsupervised (X)
Clustering
Hierarchical Clustering
![Page 3: Collaborative filtering intro - Full](https://reader038.vdocuments.mx/reader038/viewer/2022103110/54b72bed4a795916198b4808/html5/thumbnails/3.jpg)
Recommendation
Content/Model-Based
(predicting the rating)

| | Maraboo | Karnaf | Ima Adama | Liv |
|---|---|---|---|---|
| Idan | 5 | ? | 3 | ? |
| Shahar | 4 | 3 | ? | 2 |
| Gadi | ? | 1 | ? | 5 |

(Agnostic, Behavioural)
![Page 4: Collaborative filtering intro - Full](https://reader038.vdocuments.mx/reader038/viewer/2022103110/54b72bed4a795916198b4808/html5/thumbnails/4.jpg)
![Page 5: Collaborative filtering intro - Full](https://reader038.vdocuments.mx/reader038/viewer/2022103110/54b72bed4a795916198b4808/html5/thumbnails/5.jpg)
![Page 6: Collaborative filtering intro - Full](https://reader038.vdocuments.mx/reader038/viewer/2022103110/54b72bed4a795916198b4808/html5/thumbnails/6.jpg)
![Page 7: Collaborative filtering intro - Full](https://reader038.vdocuments.mx/reader038/viewer/2022103110/54b72bed4a795916198b4808/html5/thumbnails/7.jpg)
![Page 8: Collaborative filtering intro - Full](https://reader038.vdocuments.mx/reader038/viewer/2022103110/54b72bed4a795916198b4808/html5/thumbnails/8.jpg)
![Page 9: Collaborative filtering intro - Full](https://reader038.vdocuments.mx/reader038/viewer/2022103110/54b72bed4a795916198b4808/html5/thumbnails/9.jpg)
![Page 10: Collaborative filtering intro - Full](https://reader038.vdocuments.mx/reader038/viewer/2022103110/54b72bed4a795916198b4808/html5/thumbnails/10.jpg)
![Page 11: Collaborative filtering intro - Full](https://reader038.vdocuments.mx/reader038/viewer/2022103110/54b72bed4a795916198b4808/html5/thumbnails/11.jpg)
![Page 12: Collaborative filtering intro - Full](https://reader038.vdocuments.mx/reader038/viewer/2022103110/54b72bed4a795916198b4808/html5/thumbnails/12.jpg)
Rating Problem (Movies)
Preference Problem (Ads)
![Page 13: Collaborative filtering intro - Full](https://reader038.vdocuments.mx/reader038/viewer/2022103110/54b72bed4a795916198b4808/html5/thumbnails/13.jpg)
![Page 14: Collaborative filtering intro - Full](https://reader038.vdocuments.mx/reader038/viewer/2022103110/54b72bed4a795916198b4808/html5/thumbnails/14.jpg)
Related problem: Ranking
![Page 15: Collaborative filtering intro - Full](https://reader038.vdocuments.mx/reader038/viewer/2022103110/54b72bed4a795916198b4808/html5/thumbnails/15.jpg)
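
| | Maraboo | Karnaf | Ima Adama | Liv |
|---|---|---|---|---|
| Idan | 1 | ? | 1 | ? |
| Shahar | 1 | 1 | ? | 1 |
| Gadi | ? | 1 | ? | 1 |

| | Maraboo | Karnaf | Ima Adama | Liv |
|---|---|---|---|---|
| Idan | 5 | ? | 3 | ? |
| Shahar | 4 | 3 | ? | 2 |
| Gadi | ? | 1 | ? | 5 |
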
![Page 16: Collaborative filtering intro - Full](https://reader038.vdocuments.mx/reader038/viewer/2022103110/54b72bed4a795916198b4808/html5/thumbnails/16.jpg)

| | Maraboo | Karnaf | Ima Adama | Liv |
|---|---|---|---|---|
| Idan | 1 | ? | 1 | ? |
| Shahar | 1 | 1 | ? | 1 |
| Gadi | ? | 1 | ? | 1 |

![Page 17: Collaborative filtering intro - Full](https://reader038.vdocuments.mx/reader038/viewer/2022103110/54b72bed4a795916198b4808/html5/thumbnails/17.jpg)

| | Maraboo | Karnaf | Ima Adama | Liv |
|---|---|---|---|---|
| Idan | 5 | ? | 3 | ? |
| Shahar | 4 | 3 | ? | 2 |
| Gadi | ? | 1 | ? | 5 |

![Page 18: Collaborative filtering intro - Full](https://reader038.vdocuments.mx/reader038/viewer/2022103110/54b72bed4a795916198b4808/html5/thumbnails/18.jpg)
User-based Collaborative Filtering
![Page 19: Collaborative filtering intro - Full](https://reader038.vdocuments.mx/reader038/viewer/2022103110/54b72bed4a795916198b4808/html5/thumbnails/19.jpg)
![Page 20: Collaborative filtering intro - Full](https://reader038.vdocuments.mx/reader038/viewer/2022103110/54b72bed4a795916198b4808/html5/thumbnails/20.jpg)
Pearson’s Correlation 1-Distance
Cosine Similarity
Jaccard Distance “We share 5 preferences out of 7!”
“Our preferences go in the same direction!”
(but only 2 such preferences do...)
Euclidean Distance
Log-Likelihood Ratio
Measure of “Surprise” at correlation
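
To make some of the metrics listed on this slide concrete, here is a minimal sketch in plain Python, applied to the ratings table from the earlier slides. The function names are illustrative rather than taken from the talk, and two choices are assumptions: a "?" is treated as 0 for cosine similarity, and Pearson's correlation and Euclidean distance use only co-rated items.

```python
import math

# Known ratings only; a missing key stands for "?" in the table.
ratings = {
    "Idan": {"Maraboo": 5, "Ima Adama": 3},
    "Shahar": {"Maraboo": 4, "Karnaf": 3, "Liv": 2},
    "Gadi": {"Karnaf": 1, "Liv": 5},
}


def jaccard_similarity(a, b):
    """Shared rated items divided by all items rated by either user."""
    union = set(a) | set(b)
    return len(set(a) & set(b)) / len(union) if union else 0.0


def cosine_similarity(a, b):
    """Cosine of the angle between rating vectors ("?" counted as 0)."""
    dot = sum(a[i] * b[i] for i in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pearson_correlation(a, b):
    """Correlation over co-rated items; None when it is undefined."""
    shared = sorted(set(a) & set(b))
    if len(shared) < 2:
        return None
    mean_a = sum(a[i] for i in shared) / len(shared)
    mean_b = sum(b[i] for i in shared) / len(shared)
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in shared)
    var_a = sum((a[i] - mean_a) ** 2 for i in shared)
    var_b = sum((b[i] - mean_b) ** 2 for i in shared)
    if var_a == 0 or var_b == 0:
        return None
    return cov / math.sqrt(var_a * var_b)


def euclidean_distance(a, b):
    """Straight-line distance over co-rated items."""
    shared = set(a) & set(b)
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in shared))


shahar, gadi = ratings["Shahar"], ratings["Gadi"]
print("Jaccard:", jaccard_similarity(shahar, gadi))
print("Cosine:", cosine_similarity(shahar, gadi))
print("Pearson:", pearson_correlation(shahar, gadi))
print("Euclidean distance:", euclidean_distance(shahar, gadi))
```

On this data Shahar and Gadi share 2 of the 3 items either of them rated (Jaccard 2/3), yet their two co-rated items move in opposite directions (Pearson -1). That is the kind of disagreement between metrics the slide hints at.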
![Page 21: Collaborative filtering intro - Full](https://reader038.vdocuments.mx/reader038/viewer/2022103110/54b72bed4a795916198b4808/html5/thumbnails/21.jpg)
Item-Based Collaborative Filtering
Usually bounded
![Page 22: Collaborative filtering intro - Full](https://reader038.vdocuments.mx/reader038/viewer/2022103110/54b72bed4a795916198b4808/html5/thumbnails/22.jpg)
Case study: Amazon
100,000,000 users
2,000,000 items
Each user expresses preference for 10 items
Each item has 500 reviews
User-Based CF:
100,000,000 x 100,000,000 similarity matrix
2,000,000 x 500 sum terms
Item-Based CF:
2,000,000 x 2,000,000 similarity matrix
2,000,000 x 10 sum terms
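
The sizes on this slide can be checked with a few lines of arithmetic. The sketch below only multiplies out the figures quoted above; the variable names are illustrative.

```python
users = 100_000_000
items = 2_000_000
prefs_per_user = 10
reviews_per_item = 500

# Both ways of counting preferences agree on the total.
prefs_counted_by_user = users * prefs_per_user
prefs_counted_by_item = items * reviews_per_item

# Number of entries in each (dense) similarity matrix.
user_similarity_entries = users * users
item_similarity_entries = items * items

# Sum terms as quoted on the slide.
user_based_sum_terms = items * reviews_per_item
item_based_sum_terms = items * prefs_per_user

print(f"preferences: {prefs_counted_by_user:,}")
print(f"user-user entries: {user_similarity_entries:,}")
print(f"item-item entries: {item_similarity_entries:,}")
print(f"size ratio: {user_similarity_entries // item_similarity_entries:,}x")
print(f"sum terms, user-based vs item-based: "
      f"{user_based_sum_terms:,} vs {item_based_sum_terms:,}")
```

With these numbers the user-user matrix has 2,500 times as many entries as the item-item matrix, and the item-based sum has 50 times fewer terms.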
![Page 23: Collaborative filtering intro - Full](https://reader038.vdocuments.mx/reader038/viewer/2022103110/54b72bed4a795916198b4808/html5/thumbnails/23.jpg)
Interpretability
“People who go to La Colombe Torrefaction & FourSquare HQ tend to go here”
“Coffee Shop connoisseurs tend to come here”
![Page 24: Collaborative filtering intro - Full](https://reader038.vdocuments.mx/reader038/viewer/2022103110/54b72bed4a795916198b4808/html5/thumbnails/24.jpg)
Evaluation
Rating Problem: Predictive accuracy (regression) metrics
RMSE, MAE, etc.
Preference (Binary) Problem: Classification accuracy (IR) metrics
Accuracy, Precision, Recall, F-1, ROC, etc.
Benchmark vs. ‘random’ and ‘popular’
Ranking accuracy metrics: Similarity of permutations
Pearson’s correlation, Spearman’s rho, Kendall’s tau
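
As a rough illustration of the three metric families above, the following sketch implements one or two representatives of each in plain Python. The sample inputs are made up for the example and do not come from the talk. The Kendall's tau shown is the simple tau-a variant, which assumes no ties.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error for the rating problem."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mae(actual, predicted):
    """Mean absolute error for the rating problem."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def precision_recall_f1(recommended, relevant):
    """Classification (IR) metrics for the binary preference problem."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(rank_a)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical held-out ratings and predictions.
print(rmse([5, 3, 4], [4, 3, 2]), mae([5, 3, 4], [4, 3, 2]))

# Hypothetical recommended list versus items the user actually liked.
print(precision_recall_f1({"Karnaf", "Liv", "Maraboo"}, {"Karnaf", "Ima Adama"}))

# Hypothetical true ranking versus predicted ranking of four items.
print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))
```

The same functions can score the ‘random’ and ‘popular’ baselines mentioned above, which gives a reference point for the recommender's own numbers.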
![Page 25: Collaborative filtering intro - Full](https://reader038.vdocuments.mx/reader038/viewer/2022103110/54b72bed4a795916198b4808/html5/thumbnails/25.jpg)
![Page 26: Collaborative filtering intro - Full](https://reader038.vdocuments.mx/reader038/viewer/2022103110/54b72bed4a795916198b4808/html5/thumbnails/26.jpg)
Challenges
Cold-start problems (new item, new user)
“Black” and “Grey” sheep
Exploration-exploitation and reinforcement learning
Scale
![Page 27: Collaborative filtering intro - Full](https://reader038.vdocuments.mx/reader038/viewer/2022103110/54b72bed4a795916198b4808/html5/thumbnails/27.jpg)
Advanced Topics
Dimensionality Reduction
Map-Reducible calculations
Content-based (feature-based)
Multiple models
![Page 28: Collaborative filtering intro - Full](https://reader038.vdocuments.mx/reader038/viewer/2022103110/54b72bed4a795916198b4808/html5/thumbnails/28.jpg)
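MapReduce Similarity Calculation
“User-based”: A^T (A u_i)

A:

| | Maraboo | Karnaf | Ima Adama | Liv |
|---|---|---|---|---|
| Idan | 1 | ? | 1 | ? |
| Shahar | 1 | 1 | ? | 1 |
| Gadi | ? | 1 | ? | 1 |

u_i:

| | Gadi |
|---|---|
| Maraboo | ? |
| Karnaf | 1 |
| Ima Adama | ? |
| Liv | 1 |

A u_i (user similarity vector):

| | Gadi |
|---|---|
| Idan | 0 |
| Shahar | 2 |
| Gadi | 2 |

A^T:

| | Idan | Shahar | Gadi |
|---|---|---|---|
| Maraboo | 1 | 1 | ? |
| Karnaf | ? | 1 | 1 |
| Ima Adama | 1 | ? | ? |
| Liv | ? | 1 | 1 |

A^T (A u_i):

| | Gadi |
|---|---|
| Maraboo | 2 |
| Karnaf | 4 |
| Ima Adama | 0 |
| Liv | 4 |
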
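
The user-based product on this slide can be reproduced in a few lines. This is a minimal sketch in plain Python that treats every "?" as 0, which is what the slide's numbers imply. The helper names `mat_vec` and `transpose` are illustrative.

```python
items = ["Maraboo", "Karnaf", "Ima Adama", "Liv"]
users = ["Idan", "Shahar", "Gadi"]

# Binary preference matrix A (rows: users, columns: items), "?" stored as 0.
A = [
    [1, 0, 1, 0],  # Idan
    [1, 1, 0, 1],  # Shahar
    [0, 1, 0, 1],  # Gadi
]


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def transpose(matrix):
    """Swap rows and columns."""
    return [list(column) for column in zip(*matrix)]


u_gadi = A[users.index("Gadi")]

# Step 1: A u_i scores every user by overlap with Gadi.
user_similarity = mat_vec(A, u_gadi)

# Step 2: A^T (A u_i) sums each user's items, weighted by that similarity.
item_scores = mat_vec(transpose(A), user_similarity)

print(dict(zip(users, user_similarity)))
print(dict(zip(items, item_scores)))
```

The two printed dictionaries match the A u_i and A^T (A u_i) vectors on the slide: (0, 2, 2) and (2, 4, 0, 4).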
![Page 29: Collaborative filtering intro - Full](https://reader038.vdocuments.mx/reader038/viewer/2022103110/54b72bed4a795916198b4808/html5/thumbnails/29.jpg)
MapReduce Similarity Calculation
“Item-Based”: (A^T A) u_i

A:

| | Maraboo | Karnaf | Ima Adama | Liv |
|---|---|---|---|---|
| Idan | 1 | ? | 1 | ? |
| Shahar | 1 | 1 | ? | 1 |
| Gadi | ? | 1 | ? | 1 |

A^T:

| | Idan | Shahar | Gadi |
|---|---|---|---|
| Maraboo | 1 | 1 | ? |
| Karnaf | ? | 1 | 1 |
| Ima Adama | 1 | ? | ? |
| Liv | ? | 1 | 1 |

A^T A (item similarity matrix):

| | Maraboo | Karnaf | Ima Adama | Liv |
|---|---|---|---|---|
| Maraboo | 2 | 1 | 1 | 1 |
| Karnaf | 1 | 2 | 0 | 2 |
| Ima Adama | 1 | 0 | 1 | 0 |
| Liv | 1 | 2 | 0 | 2 |

u_i:

| | Gadi |
|---|---|
| Maraboo | ? |
| Karnaf | 1 |
| Ima Adama | ? |
| Liv | 1 |

(A^T A) u_i:

| | Gadi |
|---|---|
| Maraboo | 2 |
| Karnaf | 4 |
| Ima Adama | 0 |
| Liv | 4 |

Similarity of item x to item y is ⟨i_x, i_y⟩
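
For the item-based grouping, the item similarity matrix can be built directly from inner products of item columns and then applied to one user's vector. The sketch below assumes "?" counts as 0. The function name `item_similarity_matrix` is illustrative.

```python
items = ["Maraboo", "Karnaf", "Ima Adama", "Liv"]

# Binary preference matrix A (rows: Idan, Shahar, Gadi), "?" stored as 0.
A = [
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 1],
]


def item_similarity_matrix(matrix):
    """A^T A: entry (x, y) is the inner product of item columns x and y."""
    columns = list(zip(*matrix))
    return [
        [sum(a * b for a, b in zip(col_x, col_y)) for col_y in columns]
        for col_x in columns
    ]


S = item_similarity_matrix(A)

# Gadi's preference vector u_i is the third row of A.
u_gadi = A[2]

# (A^T A) u_i: each item's score is its similarity to the items Gadi prefers.
item_scores = [sum(s * u for s, u in zip(row, u_gadi)) for row in S]

for name, row in zip(items, S):
    print(name, row)
print(dict(zip(items, item_scores)))
```

Because matrix multiplication is associative, the scores equal those of the user-based grouping A^T (A u_i). The difference is that A^T A does not depend on any single user, so it can be computed once and reused.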
![Page 30: Collaborative filtering intro - Full](https://reader038.vdocuments.mx/reader038/viewer/2022103110/54b72bed4a795916198b4808/html5/thumbnails/30.jpg)
MapReduce Similarity Calculation
Recall row outer-product matrix multiplication:
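A^T A = u_Idan u_Idan^T + u_Shahar u_Shahar^T + u_Gadi u_Gadi^T

u_Idan u_Idan^T:

| | Maraboo | Karnaf | Ima Adama | Liv |
|---|---|---|---|---|
| Maraboo | 1 | 0 | 1 | 0 |
| Karnaf | 0 | 0 | 0 | 0 |
| Ima Adama | 1 | 0 | 1 | 0 |
| Liv | 0 | 0 | 0 | 0 |

u_Shahar u_Shahar^T:

| | Maraboo | Karnaf | Ima Adama | Liv |
|---|---|---|---|---|
| Maraboo | 1 | 1 | 0 | 1 |
| Karnaf | 1 | 1 | 0 | 1 |
| Ima Adama | 0 | 0 | 0 | 0 |
| Liv | 1 | 1 | 0 | 1 |

u_Gadi u_Gadi^T:

| | Maraboo | Karnaf | Ima Adama | Liv |
|---|---|---|---|---|
| Maraboo | 0 | 0 | 0 | 0 |
| Karnaf | 0 | 1 | 0 | 1 |
| Ima Adama | 0 | 0 | 0 | 0 |
| Liv | 0 | 1 | 0 | 1 |

A^T A (sum of the three):

| | Maraboo | Karnaf | Ima Adama | Liv |
|---|---|---|---|---|
| Maraboo | 2 | 1 | 1 | 1 |
| Karnaf | 1 | 2 | 0 | 2 |
| Ima Adama | 1 | 0 | 1 | 0 |
| Liv | 1 | 2 | 0 | 2 |
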
Only one user’s list of items is used every time!
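
Since each outer product needs only one user's item list, the sum maps naturally onto a map step per user and a reduce step per item pair. The following is a single-process sketch of that idea in plain Python, not code for an actual Hadoop or Mahout job. `map_user` and `reduce_counts` are hypothetical names.

```python
from collections import defaultdict
from itertools import product

# Each user's list of preferred items (the 1 entries in that user's row).
preferences = {
    "Idan": ["Maraboo", "Ima Adama"],
    "Shahar": ["Maraboo", "Karnaf", "Liv"],
    "Gadi": ["Karnaf", "Liv"],
}


def map_user(user_items):
    """Map: emit ((item_x, item_y), 1) for every pair in one user's list.

    This is the non-zero part of the outer product u u^T for that user.
    """
    for item_x, item_y in product(user_items, repeat=2):
        yield (item_x, item_y), 1


def reduce_counts(emitted_pairs):
    """Reduce: sum the emitted values per item pair, giving entries of A^T A."""
    totals = defaultdict(int)
    for key, value in emitted_pairs:
        totals[key] += value
    return dict(totals)


emitted = [pair for user_items in preferences.values() for pair in map_user(user_items)]
cooccurrence = reduce_counts(emitted)

print(cooccurrence[("Karnaf", "Liv")])
print(cooccurrence.get(("Karnaf", "Ima Adama"), 0))
```

Item pairs that never co-occur are simply never emitted, so the sparse result holds only the non-zero entries of A^T A.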
![Page 31: Collaborative filtering intro - Full](https://reader038.vdocuments.mx/reader038/viewer/2022103110/54b72bed4a795916198b4808/html5/thumbnails/31.jpg)
MapReduce Similarity Calculation
All of the classic similarity functions are made up of 3 stages:
Preprocess (uses only one ELEMENT)
Norm (Can be done in reduce on one VECTOR)
Similarity utilizes the A^T A matrix joined with norm entries
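
As one instance of the three stages, item-item cosine similarity could be sketched as below, using the ratings table from the earlier slides with "?" stored as 0. The stage functions `preprocess`, `norm` and `similarity` are illustrative names. For cosine, the preprocess stage happens to be the identity, whereas a metric such as Pearson's correlation would mean-centre there instead.

```python
import math

# Ratings matrix A (rows: Idan, Shahar, Gadi), "?" stored as 0.
A = [
    [5, 0, 3, 0],
    [4, 3, 0, 2],
    [0, 1, 0, 5],
]


def preprocess(value):
    """Stage 1: per-element transform (identity for cosine similarity)."""
    return value


def norm(item_vector):
    """Stage 2: one number per item vector (L2 norm for cosine)."""
    return math.sqrt(sum(v * v for v in item_vector))


def similarity(dot, norm_x, norm_y):
    """Stage 3: combine an A^T A entry with the two item norms."""
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0


columns = [[preprocess(v) for v in column] for column in zip(*A)]
norms = [norm(column) for column in columns]

# Entries of A^T A: inner products between item columns.
dots = [[sum(a * b for a, b in zip(cx, cy)) for cy in columns] for cx in columns]

cosine = [
    [similarity(dots[x][y], norms[x], norms[y]) for y in range(len(columns))]
    for x in range(len(columns))
]

for row in cosine:
    print([round(value, 3) for value in row])
```

Only the third stage looks at pairs of items. The first two touch a single element or a single vector, which is what makes them easy to distribute.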
![Page 32: Collaborative filtering intro - Full](https://reader038.vdocuments.mx/reader038/viewer/2022103110/54b72bed4a795916198b4808/html5/thumbnails/32.jpg)
Bibliography
Google News Personalization: Scalable Online Collaborative Filtering - Das, Datar, Garg, Rajaram, WWW2007
Logistic Regression and Collaborative Filtering for Sponsored Search Term Recommendation - Bartz, Murthi, Sebastian, EC2006
Evaluating Collaborative Filtering Recommender Systems - Herlocker, Konstan, Terveen, Riedl, ACM TIS2004
A Survey of Collaborative Filtering Techniques - Su, Khoshgoftaar, AAI2009
An Introduction to Information Retrieval - Manning, Raghavan, Schutze, Cambridge Press
Mahout in Action - Friedman, Dunning, Anil, Owen, Manning Publications
Lessons from the Netflix Prize Challenge - Bell, Koren, KDD2009
Factorization meets the Neighbourhood: a Multifaceted Collaborative Filtering Model - Koren, KDD2008
Accurate Methods for the Statistics of Surprise and Coincidence - Dunning, ACL1993
Item-Based Collaborative Filtering Recommendation Algorithms - Sarwar, Konstan, Karypis, Riedl, WWW2001
Matrix Factorization Techniques for Recommender Systems - Koren, Bell, Volinsky, IEEE2009
recommenderlab: A Framework for Developing and Testing Recommendation Algorithms - Hahsler, 2001
Scalable Similarity-Based Neighbourhood Methods with MapReduce - Schelter, Boden, Markl, RecSys2012
![Page 33: Collaborative filtering intro - Full](https://reader038.vdocuments.mx/reader038/viewer/2022103110/54b72bed4a795916198b4808/html5/thumbnails/33.jpg)
Thanks!
Nimrod Priell
[email protected]
@nimrodpriell
http://www.educated-guess.com