Google News Personalization
Scalable Online Collaborative Filtering
Abhinandan Das - abhinandan@google.com
Mayur Datar - mayur@google.com
Ashutosh Garg
Shyam Rajaram
Presented by: Aniket Zamwar
Already Studied in Class
•Map Reduce
•Collaborative Filtering
•Content Based Recommendation
•Clustering Techniques - Pros/Cons
Problem Statement
• The scale of operation is very large: on the order of several million news stories, with the set of stories changing at a high rate.
• Given the click history for N users (U = {u1, u2, ..., uN}) over M items (S = {s1, s2, ..., sM}), and a specific user u with click history set Cu consisting of stories {si1, ..., si|Cu|}, recommend K stories to the user.
• The recommendation engine must generate recommendations under strict timing constraints.
Google News Personalization
4/18/13
Approaches
•Collaborative Clustering
•Probabilistic Latent Semantic Indexing
•Covisitation Counts
Problem Setting
•Record User Queries and Clicks
•Recommend news stories using the user's own click history and the click history of the community
Recommender System
•Content based Systems
•Collaborative Filtering Systems
  ‣ Memory-based Algorithms
    ‣ Prediction is calculated as a weighted average of the ratings given by other users.
    ‣ The weight is proportional to the “similarity” between users.
  ‣ Model-based Algorithms
    ‣ Model the users based on their past ratings and use these models to predict ratings of unseen items.
•Mix of memory based + model based systems
Algorithms
•Model based approach
•Clustering Techniques: Probabilistic Latent Semantic Indexing (PLSI) and MinHash
•Memory based approach
•Item Covisitation
Min Hashing
• Probabilistic clustering technique: assigns a pair of users to the same cluster with probability proportional to the overlap between the sets of items the two users have voted for.
• Similarity is calculated using the Jaccard coefficient: S(u-i, u-j) = |Cu-i ∩ Cu-j| / |Cu-i ∪ Cu-j|
• To do: given user u-i, compute the similarity S(u-i, u-j) for all other users u-j, and recommend to u-i the stories voted for by u-j, with weight equal to S(u-i, u-j).
• Issues: doing this in real time does not scale; using a hash table to look up votes for a specific user is also not feasible; offline computation is also not feasible.
• Locality Sensitive Hashing (LSH) comes to the rescue.
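The brute-force scheme described above can be sketched as follows. This is a minimal illustration, not the system's implementation: the names `jaccard`, `naive_recommend`, and the toy `clicks` dictionary are assumptions made for the example. It compares the target user against every other user, which is exactly the step that does not scale.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard coefficient |a ∩ b| / |a ∪ b| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def naive_recommend(user, clicks, k):
    """Rank stories the user has not clicked, weighting each by S(user, other)."""
    scores = defaultdict(float)
    for other, stories in clicks.items():
        if other == user:
            continue
        weight = jaccard(clicks[user], stories)
        if weight == 0.0:
            continue
        # Every story voted for by the other user contributes that user's weight.
        for story in stories - clicks[user]:
            scores[story] += weight
    # Highest score first; ties broken by story id for a stable order.
    return sorted(scores, key=lambda s: (-scores[s], s))[:k]


# Hypothetical click histories (user id -> set of clicked story ids).
clicks = {
    "u1": {"s1", "s2", "s3"},
    "u2": {"s2", "s3", "s4"},
    "u3": {"s5"},
}
print(naive_recommend("u1", clicks, k=2))
```

The loop touches every user for every request, so the cost grows linearly with the number of users per recommendation, which motivates the hashing approach on the next slide.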
Locality Sensitive Hashing
• Key idea: hash the data points using several hash functions, chosen so that for each hash function the probability of collision is much higher for objects that are close to each other than for objects that are far apart.
• Min-hashing: randomly permute the set of items S, and for each user u-i compute the hash value h(u-i) as the index of the first item under the permutation that belongs to the user’s item set Cu-i.
• Min-hashing therefore acts as probabilistic clustering: each hash bucket corresponds to a cluster, and two users u-i and u-j land in the same cluster with probability equal to their item set similarity S(u-i, u-j).
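A small simulation can make the collision property concrete. The sketch below is illustrative only: the helper names (`make_permutation`, `min_hash`, `collision_rate`) and the six-story item set are assumptions, and explicit random permutations are used purely for clarity. It estimates how often two users receive the same min-hash value, which should approach their Jaccard similarity as the number of trials grows.

```python
import random


def make_permutation(items, rng):
    """Return a random permutation of items as a map: item -> index."""
    order = list(items)
    rng.shuffle(order)
    return {item: index for index, item in enumerate(order)}


def min_hash(item_set, permutation):
    """Index of the first item under the permutation that is in item_set.

    Assumes item_set is non-empty.
    """
    return min(permutation[item] for item in item_set)


def collision_rate(set_a, set_b, items, trials=2000, seed=0):
    """Fraction of random permutations under which both sets hash alike."""
    rng = random.Random(seed)
    collisions = 0
    for _ in range(trials):
        permutation = make_permutation(items, rng)
        if min_hash(set_a, permutation) == min_hash(set_b, permutation):
            collisions += 1
    return collisions / trials


# Hypothetical item universe and two users with Jaccard similarity 2/4 = 0.5.
items = ["s1", "s2", "s3", "s4", "s5", "s6"]
rate = collision_rate({"s1", "s2", "s3"}, {"s2", "s3", "s4"}, items)
print(rate)
```

The two sets collide exactly when the first item of their union under the permutation lies in their intersection, which is why the collision probability equals the Jaccard coefficient.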
PLSI
•Probabilistic Latent Semantic models
•Models users and items as random variables; the relationship between users and items is learned by modeling their joint distribution as a mixture distribution.
•A hidden variable Z defines the relationship; it represents user communities and item communities.
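As a rough illustration of the mixture idea, the sketch below scores stories for one user as p(s|u) = Σ_z p(z|u) · p(s|z), the standard PLSI form. The function name `plsi_scores` and all probability values are made up for the example; learning the parameters (typically with EM) is not shown.

```python
def plsi_scores(p_z_given_u, p_s_given_z):
    """Compute p(s|u) = sum over z of p(z|u) * p(s|z) for every story s."""
    scores = {}
    for z, p_z in p_z_given_u.items():
        for story, p_s in p_s_given_z[z].items():
            scores[story] = scores.get(story, 0.0) + p_z * p_s
    return scores


# Hypothetical parameters: one user's community memberships p(z|u) ...
p_z_given_u = {"z_sports": 0.8, "z_politics": 0.2}
# ... and per-community story distributions p(s|z).
p_s_given_z = {
    "z_sports": {"s1": 0.7, "s2": 0.3},
    "z_politics": {"s2": 0.4, "s3": 0.6},
}

scores = plsi_scores(p_z_given_u, p_s_given_z)
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

Because users and stories interact only through Z, a user can receive a non-zero score for a story that no directly similar user has clicked, as long as the story is popular in one of the user's communities.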
Covisitation
•A covisitation is an event in which two stories are clicked by the same user within a certain time interval.
•The data is kept as a graph whose nodes represent items and whose weighted edges represent the time-discounted number of covisitation instances.
•On each user click, the adjacency list representing the graph is updated: in the entry for each item in the user's history, a new entry for the clicked item is added if it is not already there; if it is already there, its age-discounted count is updated.
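The update rule above might look like the following sketch. The slides do not say how counts are discounted with age, so exponential decay with a configurable half-life is an assumption here, as are the class name `CovisitationGraph` and its methods. The caller is expected to pass only the part of the history that falls inside the covisitation time window.

```python
import math
from collections import defaultdict


class CovisitationGraph:
    """Adjacency lists of age-discounted covisitation counts (illustrative)."""

    def __init__(self, half_life_seconds=3600.0):
        # Assumed decay model: a count halves every half_life_seconds.
        self.decay = math.log(2) / half_life_seconds
        # adjacency[a][b] = (discounted count, time of last update)
        self.adjacency = defaultdict(dict)

    def record_click(self, recent_history, clicked, now):
        """Update the entry of every item in recent_history with the clicked item."""
        for item in recent_history:
            if item == clicked:
                continue
            count, last = self.adjacency[item].get(clicked, (0.0, now))
            # Discount the old count by its age, then add the new covisit.
            count = count * math.exp(-self.decay * (now - last)) + 1.0
            self.adjacency[item][clicked] = (count, now)

    def count(self, item, other):
        """Stored discounted count for the pair (item, other), 0.0 if absent."""
        return self.adjacency.get(item, {}).get(other, (0.0, 0.0))[0]


graph = CovisitationGraph(half_life_seconds=10.0)
graph.record_click(["a"], "b", now=0.0)   # new entry for (a, b)
graph.record_click(["a"], "b", now=10.0)  # existing entry: decay, then add 1
print(graph.count("a", "b"))
```

Storing the last update time next to each count lets the discount be applied lazily at write time instead of sweeping the whole graph periodically.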
Covisitation based Recommendation
•Fetch user u-i’s recent click history, limited to the past few hours or days.
•For each item s-i in the user's click history and each candidate item s, look up the entry for the pair (s-i, s) in the adjacency list for s-i stored in Bigtable.
•The value stored in that entry, normalized by the sum of all entries for s-i, is added to the recommendation score of s.
•The recommendation score is then normalized to a value between 0 and 1 by linear scaling.
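The scoring steps above can be sketched as follows, using plain dictionaries in place of the stored adjacency lists. The function name `covisitation_scores` and the example counts are hypothetical. The slides do not specify the linear scaling, so dividing by the largest raw score is an assumption made for illustration.

```python
def covisitation_scores(recent_clicks, adjacency, candidates):
    """Score each candidate story from the user's recent clicks.

    adjacency maps story -> {neighbour story: covisitation count}.
    """
    raw = {story: 0.0 for story in candidates}
    for clicked in recent_clicks:
        neighbours = adjacency.get(clicked, {})
        total = sum(neighbours.values())
        if total == 0:
            continue
        for story in candidates:
            # Entry for (clicked, story), normalized by all entries for clicked.
            raw[story] += neighbours.get(story, 0.0) / total
    # Assumed linear scaling: divide by the largest raw score.
    top = max(raw.values(), default=0.0)
    if top == 0.0:
        return raw
    return {story: value / top for story, value in raw.items()}


# Hypothetical covisitation counts.
adjacency = {
    "s1": {"s3": 3.0, "s4": 1.0},
    "s2": {"s3": 1.0, "s5": 1.0},
}
result = covisitation_scores(["s1", "s2"], adjacency, ["s3", "s4", "s5"])
print(result)
```

Normalizing by the sum of entries for each clicked story keeps a very popular story from dominating the score simply because it has many covisits overall.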
System Setup
•Three components:
• Offline component that clusters users based on their click history
• Online servers, responsible for:
  • Updating story and user statistics each time a user clicks on a news story
  • Generating news story recommendations when a user requests them
• Two data tables:
  • User Table (UT): indexed by user-id; stores user click history and clustering information.
  • Story Table (ST): indexed by story-id; stores real-time click counts for every story-story and story-cluster pair.
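To make the two tables concrete, here is a minimal in-memory sketch. The row classes `UserRow` and `StoryRow`, their field names, and the `record_click` helper are assumptions for illustration only; the real tables live in Bigtable, and time discounting of counts is omitted here.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class UserRow:
    """One User Table row, keyed by user-id (field names are hypothetical)."""
    click_history: list = field(default_factory=list)
    clusters: list = field(default_factory=list)


@dataclass
class StoryRow:
    """One Story Table row, keyed by story-id (field names are hypothetical)."""
    cluster_clicks: dict = field(default_factory=lambda: defaultdict(int))
    covisit_clicks: dict = field(default_factory=lambda: defaultdict(int))


def record_click(user_table, story_table, user_id, story_id):
    """Update story and user statistics for a single click."""
    user = user_table[user_id]
    # Story-cluster counts: one click from each cluster the user belongs to.
    for cluster in user.clusters:
        story_table[story_id].cluster_clicks[cluster] += 1
    # Story-story counts: the clicked story covisits each story in the history.
    for previous in user.click_history:
        story_table[previous].covisit_clicks[story_id] += 1
    user.click_history.append(story_id)


user_table = {"u1": UserRow(click_history=["s1"], clusters=["c7"])}
story_table = defaultdict(StoryRow)
record_click(user_table, story_table, "u1", "s2")
print(story_table["s2"].cluster_clicks["c7"], story_table["s1"].covisit_clicks["s2"])
```

Keying the Story Table by story-id means a recommendation request can read all statistics for a candidate story with a single row lookup.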
System Components
[Figure: system architecture diagram]
NFE: News Front End
NSS: News Statistics Server
NPS: News Personalization Server
UT: User Table
ST: Story Table
Other diagram elements: Offline Log Analysis (Min Hashing, PLSI), cache, buffer
Flow labels: view personalized news page request; user click; UserID + Clicked Story; UserID + Candidate Stories; Ranked Stories; Clusters + Click History; Fetch Statistics; Update Statistics; User Clusters; User Click Histories
Pros
•Scalable combination of content based and collaborative clustering
•Recommendation system built on the Min Hashing and PLSI algorithms
•Algorithms are scaled using MapReduce and a Bigtable representation of the data
•Uses click events as votes for news stories
•Dynamically provides the latest news likely to suit the user's interests
Cons
•Depends heavily on user clicks
•User clicks are treated as positive votes
•Provides no information about negative votes