Advanced Recommendations with Collaborative Filtering
Data Preprocessing
Advanced Topics
Remember Recommendations?
Let’s review the basics.
Recommendations
Recommendations are Everywhere
The Netflix Prize (2006-2009)
What was the Netflix Prize?
• In October 2006, Netflix released a dataset containing 100 million anonymous movie ratings and challenged the data mining, machine learning, and computer science communities to develop systems that could beat the accuracy of its recommendation system, Cinematch.
• Thus began the Netflix Prize, an open competition for the best collaborative filtering algorithm to predict user ratings for films, solely based on previous ratings without any other information about the users or films.
The Netflix Prize Datasets
• Netflix provided a training dataset of 100,480,507 ratings that 480,189 users gave to 17,770 movies.
– Each training rating (or instance) is of the form (user, movie, date of rating, rating).
– The user and movie fields are integer IDs, while ratings are from 1 to 5 (integral) stars.
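A minimal sketch of holding such instances in memory. The comma-separated line layout, the sample rows, and the names `Rating` and `parse_rating` are hypothetical; they are not the actual Netflix file format.

```python
from datetime import date
from typing import NamedTuple


class Rating(NamedTuple):
    """One training instance: (user, movie, date of rating, rating)."""
    user: int
    movie: int
    day: date
    stars: int


def parse_rating(line: str) -> Rating:
    """Parse a hypothetical 'user,movie,YYYY-MM-DD,rating' line."""
    user, movie, day, stars = line.strip().split(",")
    stars_int = int(stars)
    if not 1 <= stars_int <= 5:
        raise ValueError(f"rating must be 1 to 5 stars, got {stars_int}")
    return Rating(int(user), int(movie), date.fromisoformat(day), stars_int)


# Made-up sample rows, for illustration only.
sample = ["7,42,2005-09-06,4", "7,99,2005-10-01,2", "8,42,2004-01-15,5"]
ratings = [parse_rating(row) for row in sample]

# Sparse lookup: ratings_by_user[user][movie] = stars
ratings_by_user = {}
for r in ratings:
    ratings_by_user.setdefault(r.user, {})[r.movie] = r.stars
```

A nested dictionary keeps only the ratings that exist, which matters when most user-movie pairs are unrated.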
• The qualifying dataset contained 2,817,131 instances of the form (user, movie, date of rating), with ratings known only to the jury.
• A participating team’s algorithm had to predict ratings on the entire qualifying set, consisting of a validation set and a test set.
– During the competition, teams were only informed of the score for a validation or quiz set of 1,408,342 ratings.
– The jury used a test set of 1,408,789 ratings to determine potential prize winners.
The Netflix Prize Data
[Figure: movie ratings matrix with users 1 to n as rows and movies 1 to m as columns; cells hold ratings from 1 to 5, some cells are empty, and one cell is marked "?".]
Each user (row) is an instance (sample, example, observation).
Each movie (column) is a feature (attribute, dimension).
The Netflix Prize Goal
Movie ratings (rows are users; columns are Star Wars, Hoop Dreams, Contact, Titanic):
Joe 5 2 5 4
John 2 5 3
Al 2 2 4 2
Everaldo 5 1 5 ?
Goal: Predict ? (a movie rating) for a user
The Netflix Prize Methods
Bennett, James, and Stan Lanning. "The Netflix Prize." Proceedings of KDD Cup and Workshop. Vol. 2007. 2007.
We discussed these methods. We will discuss these methods now.
All of these methods are based upon collaborative filtering.
What was that again?
Key to Collaborative Filtering
Common insight: personal tastes are correlated
If Alice and Bob both like X and Alice likes Y, then Bob is more likely to like Y, especially (perhaps) if Bob knows Alice.
Types of Collaborative Filtering
1. Neighborhood- or memory-based
2. Model-based
3. Hybrid
We’ll talk about the first type, neighborhood- or memory-based CF, now.
Neighborhood-based CF
A subset of users is chosen based on their similarity to the active user, and a weighted combination of their ratings is used to produce predictions for this user.
It has three steps:
1. Assign a weight to all users with respect to similarity with the active user.
2. Select the k users that have the highest similarity with the active user, commonly called the neighborhood.
3. Compute a prediction from a weighted combination of the selected neighbors’ ratings.
In step 1, the weight $w_{a,u}$ is a measure of similarity between the user $u$ and the active user $a$. The most commonly used measure of similarity is the Pearson correlation coefficient between the ratings of the two users:
$$w_{a,u} = \frac{\sum_{i \in I} (r_{a,i} - \bar{r}_a)(r_{u,i} - \bar{r}_u)}{\sqrt{\sum_{i \in I} (r_{a,i} - \bar{r}_a)^2 \sum_{i \in I} (r_{u,i} - \bar{r}_u)^2}}$$
where $I$ is the set of items rated by both users, $r_{u,i}$ is the rating given to item $i$ by user $u$, and $\bar{r}_u$ is the mean rating given by user $u$.
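The step 1 weight can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: the function name is hypothetical, each user's mean is taken over all of that user's ratings, and an undefined correlation is returned as 0. The example rows are Joe and Al from the Netflix Prize Goal table.

```python
import math


def pearson_user_similarity(ratings_a, ratings_u):
    """Pearson correlation between two users.

    ratings_a, ratings_u: dicts mapping item id -> rating.
    Sums run over the items both users rated; each user's mean is taken
    over all of that user's ratings.
    """
    common = set(ratings_a) & set(ratings_u)  # items rated by both users
    if not common:
        return 0.0
    mean_a = sum(ratings_a.values()) / len(ratings_a)
    mean_u = sum(ratings_u.values()) / len(ratings_u)
    num = sum((ratings_a[i] - mean_a) * (ratings_u[i] - mean_u) for i in common)
    den_a = sum((ratings_a[i] - mean_a) ** 2 for i in common)
    den_u = sum((ratings_u[i] - mean_u) ** 2 for i in common)
    if den_a == 0 or den_u == 0:
        return 0.0  # correlation undefined; returning 0 is an assumption
    return num / math.sqrt(den_a * den_u)


joe = {"Star Wars": 5, "Hoop Dreams": 2, "Contact": 5, "Titanic": 4}
al = {"Star Wars": 2, "Hoop Dreams": 2, "Contact": 4, "Titanic": 2}
w_joe_al = pearson_user_similarity(joe, al)
```

Some variants compute the means over the co-rated items only; the choice changes the weights slightly when users have rated very different sets of items.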
In step 2, a threshold on the similarity score, or a fixed neighborhood size k, determines the “neighborhood.”
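A minimal sketch of step 2, assuming the similarity weights have already been computed. The function name, the `min_similarity` cutoff, and the example weights are all hypothetical.

```python
import heapq


def select_neighborhood(weights, k, min_similarity=0.0):
    """Return the ids of the k most similar users, most similar first.

    weights: dict mapping user id -> similarity with the active user.
    min_similarity: hypothetical cutoff; users at or below it are dropped.
    """
    candidates = {u: w for u, w in weights.items() if w > min_similarity}
    return heapq.nlargest(k, candidates, key=candidates.get)


# Made-up similarity weights, for illustration only.
weights = {"John": -0.3, "Al": 0.47, "Everaldo": 0.9, "Ann": 0.1}
neighborhood = select_neighborhood(weights, k=2)
```

Combining a cutoff with a size limit keeps weakly or negatively correlated users out even when fewer than k good neighbors exist.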
In step 3, predictions are generally computed as the weighted average of deviations from the neighbor’s mean, as in:
$$p_{a,i} = \bar{r}_a + \frac{\sum_{u \in K} (r_{u,i} - \bar{r}_u) \times w_{a,u}}{\sum_{u \in K} w_{a,u}}$$
where $p_{a,i}$ is the prediction for the active user $a$ for item $i$, $w_{a,u}$ is the similarity between users $a$ and $u$, and $K$ is the neighborhood or set of most similar users.
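Step 3 can be sketched as follows. The function name is hypothetical, neighbors who have not rated the item are skipped (an assumption), and the two weights in the example are made up; the ratings are Joe's, Al's, and Everaldo's rows from the Netflix Prize Goal table.

```python
def predict_user_based(mean_a, neighbors, item):
    """Active user's mean plus the weighted average of neighbor deviations.

    mean_a: mean rating of the active user.
    neighbors: list of (weight, ratings_dict) pairs for the neighborhood.
    Neighbors who have not rated `item` are skipped (an assumption).
    """
    num = 0.0
    den = 0.0
    for w, ratings in neighbors:
        if item not in ratings:
            continue
        mean_u = sum(ratings.values()) / len(ratings)
        num += (ratings[item] - mean_u) * w
        den += w
    if den == 0:
        return mean_a  # no usable neighbor: fall back to the user's mean
    return mean_a + num / den


joe = {"Star Wars": 5, "Hoop Dreams": 2, "Contact": 5, "Titanic": 4}
al = {"Star Wars": 2, "Hoop Dreams": 2, "Contact": 4, "Titanic": 2}
everaldo_mean = (5 + 1 + 5) / 3
# The weights 0.9 and 0.3 are made up for illustration.
prediction = predict_user_based(everaldo_mean, [(0.9, joe), (0.3, al)], "Titanic")
```

Working with deviations from each neighbor's mean compensates for users who rate systematically high or low.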
But how do we compute the similarity $w_{a,u}$?
Item-to-Item Matching
• An extension to neighborhood-based CF.
• Addresses the problem of high computational complexity of searching for similar users.
• The idea: rather than matching similar users, match a user’s rated items to similar items.
In this approach, similarities between pairs of items $i$ and $j$ are computed off-line using Pearson correlation, given by:
$$w_{i,j} = \frac{\sum_{u \in U} (r_{u,i} - \bar{r}_i)(r_{u,j} - \bar{r}_j)}{\sqrt{\sum_{u \in U} (r_{u,i} - \bar{r}_i)^2 \sum_{u \in U} (r_{u,j} - \bar{r}_j)^2}}$$
where $U$ is the set of all users who have rated both items $i$ and $j$, $r_{u,i}$ is the rating of user $u$ on item $i$, and $\bar{r}_i$ is the average rating of the $i$th item across users.
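A sketch of the item-item weight, with a hypothetical function name. The example uses the three complete or nearly complete rows from the Netflix Prize Goal table (Joe, Al, Everaldo); John is left out because his row is incomplete in the transcript.

```python
import math


def pearson_item_similarity(ratings, i, j):
    """Pearson correlation between items i and j.

    ratings: dict mapping user id -> {item id: rating}.
    Sums run over users who rated both items; each item's mean is taken
    over every user who rated that item.
    """
    def item_mean(item):
        values = [r[item] for r in ratings.values() if item in r]
        return sum(values) / len(values)

    both = [u for u, r in ratings.items() if i in r and j in r]
    if not both:
        return 0.0
    mean_i, mean_j = item_mean(i), item_mean(j)
    num = sum((ratings[u][i] - mean_i) * (ratings[u][j] - mean_j) for u in both)
    den_i = sum((ratings[u][i] - mean_i) ** 2 for u in both)
    den_j = sum((ratings[u][j] - mean_j) ** 2 for u in both)
    if den_i == 0 or den_j == 0:
        return 0.0  # correlation undefined; returning 0 is an assumption
    return num / math.sqrt(den_i * den_j)


ratings = {
    "Joe": {"Star Wars": 5, "Hoop Dreams": 2, "Contact": 5, "Titanic": 4},
    "Al": {"Star Wars": 2, "Hoop Dreams": 2, "Contact": 4, "Titanic": 2},
    "Everaldo": {"Star Wars": 5, "Hoop Dreams": 1, "Contact": 5},
}
w_sw_contact = pearson_item_similarity(ratings, "Star Wars", "Contact")
w_sw_hoop = pearson_item_similarity(ratings, "Star Wars", "Hoop Dreams")
```

Because these weights depend only on the items, they can be computed once off-line and reused for every user.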
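Now, the rating for item $i$ for user $a$ can be predicted using a simple weighted average, as in:
$$p_{a,i} = \frac{\sum_{j \in K} r_{a,j} w_{i,j}}{\sum_{j \in K} w_{i,j}}$$
where $K$ is the neighborhood set of the $k$ items rated by $a$ that are most similar to $i$, and $r_{a,j}$ is the rating of user $a$ on item $j$.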
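A sketch of this weighted average, in which each neighbor item contributes the rating the active user gave to that item. The function name and the example values are hypothetical.

```python
def predict_item_based(user_ratings, similarities, k):
    """Weighted average over the k rated items most similar to the target.

    user_ratings: dict item -> rating the active user gave to that item.
    similarities: dict item -> similarity of that item to the target item.
    """
    rated = [j for j in user_ratings if j in similarities]
    neighborhood = sorted(rated, key=lambda j: similarities[j], reverse=True)[:k]
    den = sum(similarities[j] for j in neighborhood)
    if den == 0:
        return None  # no usable neighbors; the fallback is left open here
    num = sum(user_ratings[j] * similarities[j] for j in neighborhood)
    return num / den


# Made-up ratings and similarities, for illustration only.
user_ratings = {"A": 4, "B": 2, "C": 5}
similarities = {"A": 0.5, "B": 0.1, "C": 0.3}
prediction = predict_item_based(user_ratings, similarities, k=2)
```

With k = 2 the neighborhood is items A and C, so the prediction is (4 × 0.5 + 5 × 0.3) / (0.5 + 0.3).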
The Netflix Prize Methods
Item-oriented collaborative filtering using Pearson correlation gets us right about here.
So how do we get here?
Generalizing the Recommender System
• Use an ensemble of complementing predictors.
• Many seemingly different models expose similar characteristics of the data, and will not mix well.
• Concentrate efforts along three axes:
– Scale
– Quality
– Implicit/explicit
The First Axis: Scale
The first axis:
• Multi-scale modeling of the data
• Combine top-level, regional modeling of the data with a refined, local view:
– kNN: Extracts local patterns
– Factorization: Addresses regional effects
[Figure: scale axis spanning global effects, factorization, and k-NN.]
Multi-Scale Modeling: 1st Tier
Global effects:
• Mean movie rating: 3.7 stars
• The Sixth Sense is 0.5 stars above average
• Joe rates 0.2 stars below average
→ Baseline estimation: Joe will rate The Sixth Sense 4 stars (3.7 + 0.5 − 0.2 = 4.0)
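The baseline estimate is the global mean plus the movie's offset plus the user's offset. A tiny sketch with the slide's numbers; the function name is hypothetical.

```python
def baseline_estimate(global_mean, item_offset, user_offset):
    """Baseline estimate: global mean plus the movie and user offsets."""
    return global_mean + item_offset + user_offset


# Values from the slide: mean 3.7, The Sixth Sense +0.5, Joe -0.2.
estimate = baseline_estimate(3.7, 0.5, -0.2)
print(round(estimate, 1))
```

The later tiers then only have to model what remains after this baseline is subtracted.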
Multi-Scale Modeling: 2nd Tier
Factors model:
• Both The Sixth Sense and Joe are placed high on the “Supernatural Thrillers” scale
→ Adjusted estimate: Joe will rate The Sixth Sense 4.5 stars
Multi-Scale Modeling: 3rd Tier
Neighborhood model:
• Joe didn’t like related movie Signs
→ Final estimate: Joe will rate The Sixth Sense 4.2 stars
The Second Axis: Model Quality
The second axis:
• Quality of modeling
• Make the best out of a model
• Strive for:
– Fundamental derivation
– Simplicity
– Avoid overfitting
– Robustness against number of iterations, parameter settings, etc.
• Optimizing is good, but don’t overdo it!
[Figure: quality axis alongside the global-local axis.]
Local Modeling via kNN
• Earliest and most popular collaborative filtering method.
• Derive unknown ratings from those of “similar” items (movie-movie variant).
• A parallel user-user flavor:
– Rely on ratings of like-minded users
Collaborative Filtering with kNN
[Figure: ratings matrix with 6 movies as rows and 12 users as columns; each cell holds either an unknown rating or a rating from 1 to 5.]
Estimate the rating of movie 1 by user 5 (marked "?" in the ratings matrix).
Neighbor selection: Identify movies similar to movie 1 that were rated by user 5.
Compute similarity weights: $s_{13} = 0.2$, $s_{16} = 0.3$
Predict by taking the weighted average: (0.2 × 2 + 0.3 × 3)/(0.2 + 0.3) = 2.6
Properties of kNN
• Intuitive.
• No substantial preprocessing is required.
• Easy to explain reasoning behind a recommendation.
• Accurate?
kNN on the Error (RMSE) Scale (from erroneous to accurate)
1.1296: Global average
1.0651: User average
1.0533: Movie average
0.9514: Cinematch (baseline)
0.96 to 0.91: kNN
0.8693: Ensemble
0.8563: Grand Prize
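The scale above is measured in root mean squared error (RMSE), where lower is better. A minimal sketch of the metric; the function name and the sample predictions are made up for illustration.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up predictions and true ratings, for illustration only.
score = rmse([3.5, 4.0, 2.0, 5.0], [4, 4, 1, 5])
```

Squaring the errors before averaging penalizes a few large misses more heavily than many small ones.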
Item-Oriented kNN CF
• Problems:
– Suppose that a particular item is predicted perfectly by a subset of the neighbors. That predictive subset should receive all the weight, but Pearson correlation cannot do this.
– Suppose the neighbor set contains three movies that are highly correlated with each other. Basic neighborhood methods do not account for interactions among neighbors.
– Suppose that an item has no useful neighbors rated by a particular user. The standard formula still uses a weighted average of the ratings of the uninformative neighbors.
Interpolation Weights
To address the problem of arbitrary similarity measures, we can use a weighted sum rather than a weighted average:
$$p_{a,i} = \bar{r}_a + \sum_{u \in K} (r_{u,i} - \bar{r}_u) \times w_{a,u}$$
Now, we can allow $\sum_{u \in K} w_{a,u} \neq 1$.
![Page 45: Advanced Recommendations with Collaborative Filteringrjohns15/cse40647.sp14/www/content...Generalizing the Recommender System •Use an ensemble of complementing predictors. •Many](https://reader030.vdocuments.mx/reader030/viewer/2022021717/5b2d2a127f8b9adc6e8ba1f8/html5/thumbnails/45.jpg)
Data Preprocessing
Advanced Topics
Interpolation Weights
To address the other problems, we can model relationships between item $i$ and its neighbors. This can be learned through a least squares problem from all other users that rated $i$:
$$\min_{w} \sum_{v \neq K} \left( r_{vi} - b_{vi} - \sum_{u \in K} w_{a,u} \left( r_{vu} - b_{vu} \right) \right)^2$$
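A sketch of solving such a least squares problem with NumPy. It assumes the $b$ terms are baseline estimates that have already been subtracted, and that every user in the example rated all neighbors (missing values are not handled). The function name and the numbers are hypothetical.

```python
import numpy as np


def interpolation_weights(target_resid, neighbor_resid):
    """Least squares weights w minimizing ||target_resid - neighbor_resid @ w||^2.

    target_resid: shape (n_users,), rating minus baseline for the target item.
    neighbor_resid: shape (n_users, k), rating minus baseline for each of the
        k neighbors, restricted to users who rated all of them (a
        simplification; missing values need separate handling).
    """
    w, *_ = np.linalg.lstsq(neighbor_resid, target_resid, rcond=None)
    return w


# Made-up residuals: the target item is fully explained by the first neighbor.
neighbor_resid = np.array([
    [1.0, 0.5],
    [-1.0, 0.5],
    [0.5, -1.0],
    [0.0, 1.0],
])
target_resid = neighbor_resid @ np.array([0.8, 0.0])
w = interpolation_weights(target_resid, neighbor_resid)
```

Here the recovered weights put everything on the first neighbor and do not sum to 1, which a normalized similarity-based average could not express.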
The Result:
– Interpolation weights derived based on their role; no use of an arbitrary similarity measure.
– Explicitly account for interrelationships among the neighbors.
Challenges:
– Dealing with missing values.
– Avoiding overfitting.
– Efficient implementation.
From Local to Latent Trends
Inherently, nearest neighbors is a local technique. What about capturing non-local, or latent, trends?
Latent Factor Models
• Decompose user ratings on movies into separate user and movie matrices to capture latent factors. This is frequently performed using singular value decomposition (SVD).
• Estimate unknown ratings as inner-products of factors.
• Very powerful model, but can easily overfit.
[Figure: the 6 × 12 ratings matrix (movies by users) is approximated (~) by the product of a 6 × 3 movie-factor matrix and a 3 × 12 user-factor matrix.]
Movie factors:
.2 -.4 .1
.5 .6 -.5
.5 .3 -.2
.3 2.1 1.1
-2 2.1 -.7
.3 .7 -1
User factors:
-.9 2.4 1.4 .3 -.4 .8 -.5 -2 .5 .3 -.2 1.1
1.3 -.1 1.2 -.7 2.9 1.4 -1 .3 1.4 .5 .7 -.8
.1 -.6 .7 .8 .4 -.3 .9 2.4 1.7 .6 -.4 2.1
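A sketch of fitting such factors when most ratings are missing. It does not use a classical SVD, which needs a complete matrix; instead it fits the factors to the known ratings by stochastic gradient descent with L2 regularization, which is a swapped-in technique assumed here. The function names, hyperparameters, and toy ratings are hypothetical.

```python
import numpy as np


def factorize(ratings, n_factors=2, lr=0.01, reg=0.05, epochs=300, seed=0):
    """Fit movie and user factors to the known ratings by SGD.

    ratings: list of (movie, user, rating) triples with 0-based indices.
    Returns (P, Q) so that P[m] @ Q[u] estimates the rating of movie m by
    user u. The regularization term `reg` is there to limit overfitting.
    """
    rng = np.random.default_rng(seed)
    n_movies = 1 + max(m for m, _, _ in ratings)
    n_users = 1 + max(u for _, u, _ in ratings)
    P = rng.normal(scale=0.1, size=(n_movies, n_factors))
    Q = rng.normal(scale=0.1, size=(n_users, n_factors))
    for _ in range(epochs):
        for m, u, r in ratings:
            err = r - P[m] @ Q[u]
            p_old = P[m].copy()  # use the pre-update movie factors for Q
            P[m] += lr * (err * Q[u] - reg * P[m])
            Q[u] += lr * (err * p_old - reg * Q[u])
    return P, Q


def train_rmse(ratings, P, Q):
    """RMSE of the factor model on the ratings it was fitted to."""
    errors = [r - P[m] @ Q[u] for m, u, r in ratings]
    return float(np.sqrt(np.mean(np.square(errors))))


# Made-up sparse ratings (movie, user, rating), for illustration only.
toy = [(0, 0, 5), (0, 1, 3), (0, 3, 1), (1, 0, 4), (1, 2, 2),
       (2, 1, 1), (2, 2, 5), (2, 3, 4), (3, 0, 1), (3, 3, 5)]
P, Q = factorize(toy)
estimate = float(P[1] @ Q[1])  # inner product for an unrated (movie, user) pair
```

A low training error alone says little here; in practice the regularization strength and number of factors would be chosen on held-out ratings.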
Factorization on the Error (RMSE) Scale (from erroneous to accurate)
1.1296: Global average
1.0651: User average
1.0533: Movie average
0.9514: Cinematch (baseline)
0.93 to 0.89: Factorization
0.8693: Ensemble
0.8563: Grand Prize
Ensemble Creation
• Factorization and kNN models are used at various scales.
• These models can be combined to form an ensemble.
• Stacked generalization or blending is used.
– A linear regression model can be trained over the base model predictions.
– Models can be weighted differently at different scales.
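A sketch of blending with linear regression, fitted by ordinary least squares on held-out ratings. The function names and all numbers are made up for illustration; the two columns stand in for a kNN model and a factorization model.

```python
import numpy as np


def fit_blend(base_predictions, targets):
    """Fit blending weights plus an intercept by least squares.

    base_predictions: shape (n_ratings, n_models), one column per base model.
    targets: shape (n_ratings,), true ratings on a held-out set.
    """
    X = np.column_stack([base_predictions, np.ones(len(targets))])
    coef, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return coef  # the last entry is the intercept


def blend(base_predictions, coef):
    """Apply fitted blending coefficients to base model predictions."""
    X = np.column_stack([base_predictions, np.ones(len(base_predictions))])
    return X @ coef


# Made-up held-out predictions: column 0 for kNN, column 1 for factorization.
base = np.array([
    [3.8, 4.2],
    [2.1, 2.6],
    [4.6, 4.1],
    [1.9, 1.2],
    [3.2, 3.0],
])
truth = np.array([4.0, 2.0, 5.0, 1.0, 3.0])
coef = fit_blend(base, truth)
blended = blend(base, coef)
```

On the data it was fitted to, the blend can never have a larger squared error than any single base model, since using one model alone is just a special case of the weights.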
Combining Multi-Scale Views
[Figure: combining global effects, regional effects (factorization), and local effects (kNN) through residual fitting, a weighted average, or a unified model.]
Seek Alternative Perspectives
The previous models all address the movies. The problem, however, is about users!
The Third Axis: Implicit Information
• Improve accuracy by exploiting implicit feedback.
• Implicit behavior is abundant and easy to collect:
– Rental history, search patterns, browsing history, etc.
• Allows predicting personalized ratings for users that never rated.
The Idea:
Characterize users by which movies they rated, rather than how they rated.
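A sketch of turning explicit ratings into this implicit, binary view. The toy matrix (users as rows, movies as columns, 0 meaning "not rated") is made up for illustration.

```python
import numpy as np

# Made-up explicit ratings; rows are users, columns are movies,
# and 0 marks "not rated" in this toy layout.
ratings = np.array([
    [5, 0, 3, 0],
    [0, 2, 0, 0],
    [4, 4, 0, 1],
])

# Implicit view: 1 if the user rated the movie at all, whatever the value.
rated = (ratings > 0).astype(int)

# Users can now be compared by which movies they rated, for example by
# counting the movies that two users have both rated.
co_rated = rated @ rated.T
```

The binary matrix is available even for movies in the qualifying set, where the user is known to have rated the movie but the rating value is hidden.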
The Big Picture
Where do you want to be?
• All over the global-local axis
• Relatively high on the quality axis
• All over the explicit-implicit axis
[Figure: three axes: global to local, quality, and explicit (ratings) to implicit (binary).]
Ensemble on the Error (RMSE) Scale (from erroneous to accurate)
1.1296: Global average
1.0651: User average
1.0533: Movie average
0.9514: Cinematch (baseline)
0.89: Ensemble
0.8693: Ensemble
0.8563: Grand Prize
The Take-Away Messages
Solving challenging data mining and data science problems requires you to:
1. Think deeply – Design better, more innovative algorithms.
2. Think broadly – Use ensembles of multiple predictors.
3. Think differently – Model the data from different perspectives and in different ways.