Recommender System
How does it work?
Group 4: Nguyen Dao Tan Bao Nguyen Thi Ngoc Phu Cao Dinh Qui Pham Huy Thanh
Instructed by Dr. Tim Reichert
Outline
• Introduction
• Collaborative filtering algorithms
  o User-based
  o Item-based
  o Similarity algorithms
• Apache Mahout demo
• Case study – Amazon
• Demo – Music recommender system
Introduction
Source: http://www.cguru.info/information_technology_branch.htm
Technology helps people do many jobs …
… and also search for information
Go to web directories
Or use search engines
So what is the problem?
Find: you know what you need
Search: you know the keywords
? RECOMMENDER SYSTEM
Recommender system
Predicts and produces the most relevant recommendations for its audience, based on their tastes.
Where is it applied?
Breaking news
Source: http://blogs.wsj.com/digits/2014/01/17/amazon-wants-to-ship-your-package-before-you-buy-it/
Amazon Wants to Ship Your Package Before You Buy It
What is the secret behind it?
Recommender Approaches
Item Hierarchy
Attribute-based recommendations
Collaborative filtering – User-user Similarity
Collaborative filtering – Item-item Similarity
Social + Interest Graph Based
Model Based
What is collaborative filtering?
A method of making automatic predictions about the interests of a user by collecting preferences or taste information from many users.
User-based CF
1/25/2014
• General Idea
• Algorithm
• K-Nearest Neighbors
"I'm gonna rent a film to watch with my boyfriend this week. Do you have any suggestions?"
"What kind of film does he like?"
"I don't know, but his best friend really likes insect collecting."
"Maybe your boyfriend is similar to him. You guys can watch Spider-Man!?"
NOT IN OUR SCOPE
"I'm gonna rent a film to watch with my boyfriend this week. Do you have any suggestions?"
"What kind of film does he like?"
"I don't know, but he really enjoyed our last film, Iron Man."
"Really? My boyfriend also likes it, and his favourite one is The Amazing Spider-Man, so maybe you guys can try it."
General Idea
Similar
Recommend
Algorithm
[Figure: worked example of the user-based CF algorithm]
K-Nearest Neighbors
Neighborhood of most similar users is computed first
Only items known to those users are considered
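The two statements above can be sketched in a few lines of Python. This is only a minimal illustration, not the Mahout implementation: the function names, the sample ratings, the use of a Euclidean-distance similarity, and the similarity-weighted average are all assumptions made for the example.

```python
from math import sqrt

def euclidean_similarity(prefs_a, prefs_b):
    """1 / (1 + distance) over shared items; 0.0 if nothing is shared."""
    shared = set(prefs_a) & set(prefs_b)
    if not shared:
        return 0.0
    dist = sqrt(sum((prefs_a[i] - prefs_b[i]) ** 2 for i in shared))
    return 1.0 / (1.0 + dist)

def recommend_user_based(prefs, user, k=2, top_n=3):
    # Step 1: the k most similar users form the neighborhood.
    scored = [(euclidean_similarity(prefs[user], prefs[other]), other)
              for other in prefs if other != user]
    neighbors = sorted(scored, reverse=True)[:k]
    # Step 2: only items known to the neighbors (and new to the user) are candidates.
    totals, weights = {}, {}
    for sim, other in neighbors:
        if sim <= 0:
            continue
        for item, rating in prefs[other].items():
            if item in prefs[user]:
                continue
            totals[item] = totals.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    # Step 3: estimate each candidate as a similarity-weighted average, best first.
    estimates = {item: totals[item] / weights[item] for item in totals}
    return sorted(estimates.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

# Hypothetical ratings, invented for this example.
prefs = {
    "ann":  {"Iron Man": 5.0, "Thor": 4.0},
    "ben":  {"Iron Man": 5.0, "Thor": 4.0, "Spider-Man": 5.0},
    "cara": {"Iron Man": 1.0, "Thor": 2.0, "Titanic": 5.0},
    "dan":  {"Iron Man": 4.0, "Thor": 5.0, "Spider-Man": 4.0, "Hulk": 3.0},
}
result = recommend_user_based(prefs, "ann", k=2)
```

With k=2 the neighborhood of "ann" is "ben" and "dan", so "Titanic" (known only to "cara") is never considered.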
Item-Based CF
• General Idea
• Why do we need it?
• Algorithm
"I'm gonna rent a film to watch with my boyfriend this week. Do you have any suggestions?"
"What kind of film does he like?"
"I don't know, but he really enjoyed our last film, Iron Man."
"Really? If you enjoyed Iron Man, then you should try Iron Man 2."
General Idea
Item-based recommendation is derived from how similar items are to other items, instead of how similar users are to other users.
Similar
Recommend
Why do we need it?
So MANY users!!!
Humans are COMPLEX
10 000 users like
Like
8 000 users like
Algorithm
Check every item for which the user has no preference
For each of them, calculate the similarity between it and every item for which the user has a preference
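A rough Python sketch of these two steps follows. It is an illustration under stated assumptions, not the deck's or Mahout's code: item similarity is taken here to be the overlap of the sets of users who expressed a preference, the estimate is a similarity-weighted average of the user's own ratings, and all names and data are invented.

```python
def item_users(prefs):
    """Invert user -> {item: rating} into item -> set of users."""
    inverted = {}
    for user, ratings in prefs.items():
        for item in ratings:
            inverted.setdefault(item, set()).add(user)
    return inverted

def item_similarity(users_a, users_b):
    """Assumed similarity: shared users divided by all users of either item."""
    union = len(users_a | users_b)
    return len(users_a & users_b) / union if union else 0.0

def recommend_item_based(prefs, user, top_n=3):
    by_item = item_users(prefs)
    rated = prefs[user]
    estimates = {}
    # Check every item for which the user has no preference.
    for candidate in by_item:
        if candidate in rated:
            continue
        num = den = 0.0
        # Compare it with every item for which the user has a preference.
        for item, rating in rated.items():
            sim = item_similarity(by_item[candidate], by_item[item])
            num += sim * rating
            den += sim
        if den > 0:
            estimates[candidate] = num / den
    return sorted(estimates.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

# Hypothetical ratings, invented for this example.
prefs = {
    "ann":  {"Iron Man": 5.0, "Titanic": 1.0},
    "ben":  {"Iron Man": 4.0, "Iron Man 2": 5.0},
    "cara": {"Iron Man": 5.0, "Iron Man 2": 4.0},
    "dan":  {"Titanic": 5.0, "The Notebook": 4.0},
}
result = recommend_item_based(prefs, "ann")
```

Because the item-to-item similarities do not depend on the user being served, they could be computed ahead of time.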
Similarities
• Pearson correlation
• Euclidean distance
• Tanimoto
Pearson Correlation
Example hypothesis: how tall you are affects your self-esteem
The Pearson correlation is a number between –1 and 1
Measure of the strength of a linear association between two variables
• Doesn’t take into account the number of items in which two users’ preferences overlap
• If two users overlap on only one item, no correlation can be computed
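As a small sketch of the points above, the Pearson correlation between two users can be computed over the items both have rated. The function name, the choice to return None when the value is undefined, and the sample ratings are assumptions for illustration only.

```python
from math import sqrt

def pearson(prefs_a, prefs_b):
    """Pearson correlation over the items both users have rated.

    Returns None when it is undefined: fewer than two shared items,
    or no variance in either user's shared ratings.
    """
    shared = sorted(set(prefs_a) & set(prefs_b))
    n = len(shared)
    if n < 2:
        return None
    xs = [prefs_a[i] for i in shared]
    ys = [prefs_b[i] for i in shared]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)

# Hypothetical ratings, invented for this example.
alice = {"Iron Man": 5.0, "Iron Man 2": 4.0, "Spider-Man": 1.0}
bob = {"Iron Man": 4.0, "Iron Man 2": 3.0, "Spider-Man": 0.0, "Thor": 5.0}
carol = {"Iron Man": 5.0}

r_alice_bob = pearson(alice, bob)      # bob rates every shared film one point lower
r_alice_carol = pearson(alice, carol)  # only one shared item
```

The result depends only on how the shared ratings move together, not on how many items are shared, which is the first limitation listed above.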
Tanimoto
Ignores preference values entirely.
• It’s the ratio of the size of the intersection to the size of the union of their preferred items
• When two users’ items completely overlap, the result is 1.0
• When they have nothing in common, it’s 0.0
$T(A, B) = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}$
where A and B are the two users' sets of preferred items
Use it when the underlying data contains only Boolean preferences
or when there is too much noise in the preference values
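The ratio of intersection to union described for Tanimoto can be sketched as below. The function name and the convention of returning 0.0 for two empty sets are assumptions for this example.

```python
def tanimoto(items_a, items_b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(items_a), set(items_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # assumed convention when neither user has any items
    return len(a & b) / union

same = tanimoto({"x", "y"}, {"x", "y"})               # complete overlap
disjoint = tanimoto({"x"}, {"y"})                     # nothing in common
partial = tanimoto({"a", "b", "c"}, {"b", "c", "d"})  # 2 shared out of 4 distinct
```

Only set membership is used, so any rating values attached to the items play no role.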
Mahout Basic Demo
What is Apache Mahout?
• Open source from Apache
• Mahout is a Java library
  o Implementing machine learning techniques
    • Recommendation
    • Clustering
    • Classification
Why do we prefer Mahout?
• Apache License
• Good community & documentation
• Scalable
  o Based on Hadoop (not mandatory!)
Physical Storage(database, files …)
Data Model
Recommender
Application
Recommendation in Mahout
• Input: raw data (user preferences)
• Output: preference estimation
• Step 1
  o Map raw data into a Mahout-compliant DataModel
• Step 2
  o Tune recommender components
    • Similarity measure, neighborhood, …
• Step 3
  o Recommend
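Step 1 can be pictured with a short Python sketch. It is not Mahout code: the function name is hypothetical, and the "userID,itemID,preference" column layout and the "#" comment rule are assumptions made for the example.

```python
import csv
import io

def load_data_model(lines):
    """Parse 'userID,itemID,preference' rows into user -> {item: preference}."""
    model = {}
    for row in csv.reader(lines):
        # Skip blank lines and lines treated here as comments.
        if not row or row[0].startswith("#"):
            continue
        user, item, value = row[0].strip(), row[1].strip(), float(row[2])
        model.setdefault(user, {})[item] = value
    return model

# Hypothetical raw data, invented for this example.
raw = io.StringIO("1,101,5.0\n1,102,3.0\n2,101,2.0\n")
model = load_data_model(raw)
```

Once the raw data is in this shape, steps 2 and 3 only need the in-memory model and never touch the storage format.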
Recommendation Components
• Five Java interfaces
  o DataModel interface
    • MySQLJDBCDataModel, FileDataModel …
  o UserSimilarity interface
    • Methods to calculate the degree of correlation between two users
  o ItemSimilarity interface
    • Methods to calculate the degree of correlation between two items
  o UserNeighborhood interface
    • Methods to define the concept of 'neighborhood'
  o Recommender interface
    • Methods to implement the recommendation step itself
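The benefit of separating these roles is that each one can be swapped without touching the others. The sketch below mimics that idea in Python. It is an analogy only and does not reproduce the Mahout Java API: the class, both similarity functions, and the data are hypothetical.

```python
class UserBasedRecommender:
    """Recommender assembled from a data model, a similarity and a neighborhood size."""

    def __init__(self, model, similarity, neighborhood_size):
        self.model = model
        self.similarity = similarity
        self.neighborhood_size = neighborhood_size

    def neighborhood(self, user):
        # Rank the other users by similarity, most similar first.
        others = [u for u in self.model if u != user]
        ranked = sorted(
            others,
            key=lambda u: (-self.similarity(self.model[user], self.model[u]), u),
        )
        return ranked[: self.neighborhood_size]

    def recommend(self, user, how_many):
        # Score unseen items by summing similarity * preference over the neighbors.
        scores = {}
        for other in self.neighborhood(user):
            weight = self.similarity(self.model[user], self.model[other])
            for item, value in self.model[other].items():
                if item not in self.model[user]:
                    scores[item] = scores.get(item, 0.0) + weight * value
        return sorted(scores, key=lambda item: (-scores[item], item))[:how_many]

def overlap_similarity(a, b):
    """Toy similarity: share of items the two users have in common."""
    union = set(a) | set(b)
    return len(set(a) & set(b)) / len(union) if union else 0.0

def always_one(a, b):
    """Toy similarity that treats every pair of users as equally similar."""
    return 1.0

# Hypothetical data model, invented for this example.
model = {
    "u1": {"A": 5.0, "B": 4.0},
    "u2": {"A": 4.0, "B": 5.0, "C": 5.0},
    "u3": {"D": 3.0},
}
narrow = UserBasedRecommender(model, overlap_similarity, 1).recommend("u1", 2)
broad = UserBasedRecommender(model, always_one, 2).recommend("u1", 2)
```

Changing only the similarity function and the neighborhood size changes the output, while the data model and the recommendation step stay the same.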
Similarity Metrics
• SIMILARITY_COOCCURRENCE
• SIMILARITY_LOGLIKELIHOOD
• SIMILARITY_TANIMOTO_COEFFICIENT
• SIMILARITY_CITY_BLOCK
• SIMILARITY_COSINE
• SIMILARITY_PEARSON_CORRELATION
• SIMILARITY_EUCLIDEAN_DISTANCE
CASE STUDY
“Much is made of what the likes of Facebook, Google and Apple know about users. Truth is, Amazon may know more. And the massive retailer proves it every day” - JP Mangalindan, Writer [*]
References: http://tech.fortune.cnn.com/2012/07/30/amazon-5/
• Amazon's recommendation system is based on a number of simple elements:
  o what a user has bought in the past and recently viewed
  o which items a user has in their virtual shopping cart
  o items the user has rated and liked
  o what other customers have viewed and purchased
• The retail giant calls this "item-to-item collaborative filtering"
• It uses this algorithm to heavily customize the browsing experience for returning customers
• The recommendation system worked, and Amazon reported strong results:
  o a 29% sales increase to $12.83 billion during its 2nd fiscal quarter (as of July 26, 2012)
  o compared to $9.9 billion during the same period the previous year
• Amazon has integrated recommendations into nearly every part of the purchasing process, from product discovery to checkout
• "Our mission is to delight our customers by allowing them to serendipitously discover great products." - an Amazon spokesperson
CASE STUDY – Amazon recommendations services
References http://www.google.com/patents/US7921042
CASE STUDY - Generation of Similar Items Table
References http://www.google.com/patents/US7921042 (Fig.1)
http://www.google.com/patents/US7113917 (Fig.3,4)
The recommendation services components include:
- a recommendation process
- an off-line table generation process
- a similar items table
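The split between an off-line table generation process and a recommendation process that reads the table can be illustrated with a toy Python sketch. It does not reproduce the method in the cited patents: scoring similarity by how often two items appear in the same purchase history is an assumption, and all names and data are invented.

```python
from collections import defaultdict
from itertools import combinations

def build_similar_items_table(baskets, per_item=2):
    """Off-line step: count how often two items appear in the same history."""
    counts = defaultdict(lambda: defaultdict(int))
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    table = {}
    for item, related in counts.items():
        # Keep only the strongest few neighbors of each item.
        ranked = sorted(related.items(), key=lambda kv: (-kv[1], kv[0]))
        table[item] = ranked[:per_item]
    return table

def recommend_from_table(table, items_of_interest, top_n=3):
    """Recommendation step: look up the table and merge the scores."""
    scores = defaultdict(int)
    for item in items_of_interest:
        for other, score in table.get(item, []):
            if other not in items_of_interest:
                scores[other] += score
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]

# Hypothetical purchase histories, invented for this example.
baskets = [
    ["book", "lamp"],
    ["book", "lamp", "pen"],
    ["book", "pen"],
    ["lamp", "desk"],
]
table = build_similar_items_table(baskets)
suggestions = recommend_from_table(table, ["pen"])
```

The expensive pairwise counting happens once, off-line, so the recommendation step is reduced to table lookups.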
CASE STUDY - Generation of Recommendation
References http://www.google.com/patents/US7921042 (Fig.1)
http://www.google.com/patents/US7113917 (Fig.2, Fig.5, Fig.7)
EVALUATION
Experimental Settings
• Offline experiments
  - Performed using a pre-collected data set of users choosing or rating items
  - Simulate the behavior of users who interact with a recommendation system
  - Assume that user behavior when the data was collected will be similar enough to user behavior when the recommender system is deployed
  - Make reliable decisions based on the simulation
• User studies
  - Conducted by recruiting a set of test subjects and asking them to perform several tasks that require interacting with the recommendation system, while observing them
  - We can then check whether the recommendations are used and whether people read different stories with and without recommendations, then ask them whether the recommendations were relevant
• Online experiments
  - Measure the change in user behavior when interacting with different recommendation systems
  - If users of one system follow the recommendations more often, or if some utility gathered from users of one system exceeds the utility gathered from users of the other system, then we can conclude that one system is superior to the other
References: Microsoft Research: Evaluating Recommendation Systems - Guy Shani and Asela Gunawardana
Reliable conclusions
1. Confidence and p-values
2. Multiple tests
Measurement metrics
3. User preference & prediction accuracy: voting from users
  o Root Mean Squared Error (RMSE)
  o Measuring usage prediction
4. Coverage: item space & user space
5. Novelty: recommendations for items that the user did not know about
6. Utility: the recommendation engine can be judged by the revenue that it generates for the website
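Two of the metrics listed above are simple enough to sketch in Python. The function names and sample values are hypothetical, and item-space coverage is taken here to mean the share of catalog items that appear in at least one recommendation list, which is one possible reading of the term.

```python
from math import sqrt

def rmse(predicted, actual):
    """Root mean squared error over the (user, item) pairs in `actual`."""
    if not actual:
        raise ValueError("need at least one held-out rating")
    squared = [(predicted[key] - rating) ** 2 for key, rating in actual.items()]
    return sqrt(sum(squared) / len(squared))

def item_space_coverage(recommendation_lists, catalog):
    """Assumed definition: fraction of catalog items recommended to anyone."""
    recommended = set()
    for items in recommendation_lists:
        recommended.update(items)
    return len(recommended & set(catalog)) / len(catalog)

# Hypothetical held-out ratings and predictions, invented for this example.
held_out = {("ann", "Thor"): 4.0, ("ben", "Hulk"): 2.0}
predictions = {("ann", "Thor"): 3.0, ("ben", "Hulk"): 3.0}
error = rmse(predictions, held_out)

coverage = item_space_coverage([["a", "b"], ["b"]], ["a", "b", "c", "d"])
```

In an offline experiment the held-out ratings would come from the pre-collected data set, hidden from the recommender before the predictions are made.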
References
• Personalized recommendations of items represented within a database http://www.google.com/patents/US7113917
• Computer processes for identifying related items and generating personalized item recommendations http://www.google.com/patents/US7921042
• Microsoft Research: Evaluating Recommendation Systems - Guy Shani and Asela Gunawardana
• Amazon Recommendation – Industry report
Summary
• Recommender systems
• User-based vs item-based
• Similarity metrics
  o Choose the most suitable one depending on the data
• Evaluation and challenges
• Apache Mahout
  o A Java library that implements machine learning techniques
Mahout Music Recommend Demo
IF YOU LIKE BRITNEY, YOU WILL
LOVE ….
Architecture of Recommender Demo
[Figure: components shown: FriendLikes.csv, DataModel, FacebookRecommender, Recommender, FacebookRecommenderSOAP, SOAP, facebook-recommender-demo.war, Glassfish Java 6 EE Server]
Q&A
THANK YOU!