recommender system

Recommender System How does it work ? Group 4: Nguyen Dao Tan Bao Nguyen Thi Ngoc Phu Cao Dinh Qui Pham Huy Thanh Instructed by Dr.Tim Reichert

Upload: bao-nguyen

Post on 11-Nov-2014


Category:

Data & Analytics



DESCRIPTION

The presentation discusses how a recommender system works, for example how Amazon recommends books to customers when they log in to the Amazon website.

TRANSCRIPT

Page 1: Recommender system

Recommender System

How does it work?

Group 4: Nguyen Dao Tan Bao, Nguyen Thi Ngoc Phu, Cao Dinh Qui, Pham Huy Thanh

Instructed by Dr. Tim Reichert

Page 2: Recommender system

Outline

• Introduction
• Collaborative filtering algorithms
  o User-based
  o Item-based
  o Similarity algorithms
  o Apache Mahout demo
• Case study – Amazon
• Demo – Music recommender system

Page 3: Recommender system


Introduction

Source: http://www.cguru.info/information_technology_branch.htm

Technology helps people do many jobs ...

... and also helps them search for information

Page 4: Recommender system


Go to web directories

Page 5: Recommender system


Or use search engines

Page 6: Recommender system

So what is the problem?

Find: you know what you need

Search: you know the keywords

? : RECOMMENDER SYSTEM

Page 7: Recommender system

Recommender system

Predicts and produces the most relevant recommendations for its audience, based on their tastes.

Page 8: Recommender system

Where is it applied?

Page 9: Recommender system


Breaking news

Source: http://blogs.wsj.com/digits/2014/01/17/amazon-wants-to-ship-your-package-before-you-buy-it/

Amazon Wants to Ship Your Package Before You Buy It

What is the secret behind it?

Page 10: Recommender system


Recommender Approaches

Item Hierarchy

Attribute-based recommendations

Collaborative filtering – User-user Similarity

Collaborative filtering – Item-item Similarity

Social + Interest Graph Based

Model Based

Page 16: Recommender system

What is collaborative filtering?

A method of making automatic predictions about the interests of a user by collecting preferences or taste information from many users.

Page 17: Recommender system


User-based CF

1/25/2014

Page 18: Recommender system

User-based CF

• General Idea
• Algorithm
• K-Nearest Neighbors

Page 19: Recommender system

I'm gonna rent a film to watch with my boyfriend this week. Do you have any suggestions?

Page 20: Recommender system

I'm gonna rent a film to watch with my boyfriend this week. Do you have any suggestions?

What kind of film does he like?

Page 21: Recommender system

I don't know, but his best friend really likes insect collecting.

Page 22: Recommender system

I don't know, but his best friend really likes insect collecting.

Maybe your boyfriend is similar to him. You guys can watch Spider-Man!?

Page 23: Recommender system

I don't know, but his best friend really likes insect collecting.

Maybe your boyfriend is similar to him. You guys can watch Spider-Man!?

NOT IN OUR SCOPE

Page 24: Recommender system

I'm gonna rent a film to watch with my boyfriend this week. Do you have any suggestions?

Page 25: Recommender system

I'm gonna rent a film to watch with my boyfriend this week. Do you have any suggestions?

What kind of film does he like?

Page 26: Recommender system

I don't know, but he really enjoyed our last film, Iron Man.

Page 27: Recommender system

I don't know, but he really enjoyed our last film, Iron Man.

Really? My boyfriend also likes it, and his favourite one is The Amazing Spider-Man, so maybe you guys can try it.

Page 28: Recommender system

General Idea

Similar

Recommend

Page 29: Recommender system

Algorithm

Page 33: Recommender system

K-Nearest Neighbors

The neighborhood of the most similar users is computed first.

Only items known to those users are considered.
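The two K-Nearest Neighbors steps above can be sketched in a few lines of Python. This is a minimal illustration, not the Mahout implementation: the toy ratings, the function names, and the simple distance-based similarity (used here as a stand-in for any of the similarity measures discussed later) are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical toy data (not from the slides): user -> {item: rating}.
RATINGS = {
    "alice": {"Iron Man": 5.0, "Iron Man 2": 4.0, "Titanic": 1.0},
    "bob": {"Iron Man": 5.0, "Iron Man 2": 4.5, "The Amazing Spider-Man": 5.0},
    "carol": {"Iron Man": 1.0, "Titanic": 5.0, "The Notebook": 4.5},
    "dave": {"Iron Man": 4.5, "Titanic": 1.5},
}


def euclidean_similarity(a, b):
    """Similarity in (0, 1] derived from the distance over co-rated items."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    distance = sqrt(sum((a[i] - b[i]) ** 2 for i in shared))
    return 1.0 / (1.0 + distance)


def nearest_neighbors(ratings, user, k):
    """Step 1: find the k users most similar to `user`."""
    scored = [
        (euclidean_similarity(ratings[user], ratings[other]), other)
        for other in ratings
        if other != user
    ]
    scored.sort(reverse=True)
    return [(other, sim) for sim, other in scored[:k] if sim > 0.0]


def recommend(ratings, user, k=2, top_n=3):
    """Step 2: score only items the neighbors know and the user does not."""
    totals, weights = {}, {}
    for other, sim in nearest_neighbors(ratings, user, k):
        for item, rating in ratings[other].items():
            if item in ratings[user]:
                continue  # the user already has a preference for this item
            totals[item] = totals.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    # Similarity-weighted average of the neighbors' ratings.
    estimates = {item: totals[item] / weights[item] for item in totals}
    return sorted(estimates.items(), key=lambda pair: pair[1], reverse=True)[:top_n]


recommendations = recommend(RATINGS, "dave", k=2)
```

With k = 2, "carol" falls outside the neighborhood of "dave", so the item only she has rated is never considered. That is exactly the restriction the slide describes.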

Page 35: Recommender system

Item-Based CF

• General Idea
• Why do we need it?
• Algorithm

Page 36: Recommender system

I'm gonna rent a film to watch with my boyfriend this week. Do you have any suggestions?

Page 37: Recommender system

I'm gonna rent a film to watch with my boyfriend this week. Do you have any suggestions?

What kind of film does he like?

Page 38: Recommender system

I don't know, but he really enjoyed our last film, Iron Man.

Page 39: Recommender system

I don't know, but he really enjoyed our last film, Iron Man.

Really? If you enjoyed Iron Man, then you should try Iron Man 2.

Page 40: Recommender system

General Idea

Item-based recommendation is derived from how similar items are to other items, rather than from how similar users are to other users.

Similar

Recommend

Page 41: Recommender system

Why do we need it?

So MANY users!!!

Page 42: Recommender system

Humans are COMPLEX

Page 43: Recommender system


10 000 users like

Like

8 000 users like

Page 44: Recommender system

Algorithm

Check every item for which the user has no preference.

For each of them, calculate the similarity between it and every item for which the user has a preference.
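The item-based procedure described on this slide can be sketched as follows. This is only an illustration under stated assumptions: the toy ratings and function names are hypothetical, item similarity is measured by the overlap of the sets of users who rated each item, and the final estimate is a similarity-weighted average of the user's own ratings, which is one common choice rather than something the slides specify.

```python
# Hypothetical toy data (not from the slides): user -> {item: rating}.
RATINGS = {
    "u1": {"Iron Man": 5.0, "Iron Man 2": 4.0},
    "u2": {"Iron Man": 4.0, "Iron Man 2": 5.0, "Titanic": 2.0},
    "u3": {"Iron Man": 5.0, "Iron Man 2": 4.0, "Titanic": 1.0},
    "u4": {"Titanic": 5.0, "The Notebook": 5.0},
    "u5": {"Iron Man": 5.0, "Titanic": 1.0},
}


def raters(ratings, item):
    """Set of users who expressed a preference for `item`."""
    return {user for user, prefs in ratings.items() if item in prefs}


def item_similarity(ratings, item_a, item_b):
    """Overlap of the two items' rater sets (intersection over union)."""
    a, b = raters(ratings, item_a), raters(ratings, item_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def estimate_preferences(ratings, user):
    """Estimate a preference for every item the user has not rated yet."""
    prefs = ratings[user]
    all_items = {item for p in ratings.values() for item in p}
    estimates = {}
    for candidate in all_items - set(prefs):  # items with no preference
        weighted_sum = 0.0
        weight_total = 0.0
        for known, rating in prefs.items():  # items with a preference
            sim = item_similarity(ratings, candidate, known)
            weighted_sum += sim * rating
            weight_total += sim
        if weight_total > 0.0:
            estimates[candidate] = weighted_sum / weight_total
    return estimates


estimates = estimate_preferences(RATINGS, "u5")
```

Because the item-to-item similarities depend only on the catalogue and not on the current user, they can be computed ahead of time, which is part of why this approach copes better with very large user bases.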

Page 45: Recommender system

Similarities

• Pearson correlation
• Euclidean distance
• Tanimoto

Page 46: Recommender system

Pearson Correlation

Example hypothesis: how tall you are affects your self-esteem.

Page 47: Recommender system

Pearson Correlation

The Pearson correlation is a number between –1 and 1.

It measures the strength and direction of a linear association between two variables.
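For reference, the standard definition of the coefficient can be written as below. The notation is chosen here and does not come from the slides: $x_i$ and $y_i$ are the two variables' values (for two users, their ratings on the $n$ items both have rated), and $\bar{x}$, $\bar{y}$ are the respective means.

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

A value near 1 means the two series rise and fall together, a value near –1 means they move in opposite directions, and a value near 0 means there is no linear association.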

Page 49: Recommender system

Pearson Correlation

• It doesn’t take into account the number of items on which two users’ preferences overlap
• If two users overlap on only one item, no correlation can be computed
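Both limitations are easy to see in a small sketch. The function below is a hypothetical helper written for this illustration (it is not a Mahout API): it computes the Pearson correlation over the items two users have both rated and returns `None` when the value is undefined.

```python
from math import sqrt


def pearson(prefs_a, prefs_b):
    """Pearson correlation over co-rated items, or None if it is undefined."""
    shared = sorted(set(prefs_a) & set(prefs_b))
    n = len(shared)
    if n < 2:
        return None  # zero or one overlapping item: no correlation exists
    xs = [prefs_a[item] for item in shared]
    ys = [prefs_b[item] for item in shared]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return None  # a constant series has no defined correlation
    return covariance / sqrt(var_x * var_y)


# Hypothetical users, keyed by item name.
many_items = pearson({"A": 1, "B": 2, "C": 3}, {"A": 2, "B": 4, "C": 6})
two_items = pearson({"A": 5, "B": 1}, {"A": 1, "B": 5})
one_item = pearson({"A": 5, "Z": 3}, {"A": 4, "Y": 2})
```

Here `one_item` is `None`, matching the second bullet. `two_items` is a perfect negative correlation computed from only two shared items, which shows why ignoring the size of the overlap can be misleading: the score carries no hint of how little evidence it rests on.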

Page 50: Recommender system

Tanimoto

Ignores preference values entirely.

• It’s the ratio of the size of the intersection to the size of the union of the two users’ preferred items
• When two users’ items completely overlap, the result is 1.0
• When they have nothing in common, it’s 0.0

Page 51: Recommender system

Tanimoto

$T(A, B) = \dfrac{|A \cap B|}{|A| + |B| - |A \cap B|}$, where $A$ and $B$ are the two users' sets of preferred items
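The ratio above translates directly into code. This is a minimal sketch with a hypothetical function name; returning 0.0 when both sets are empty is a convention picked for the example, not something the slides define.

```python
def tanimoto(items_a, items_b):
    """Tanimoto coefficient of two collections of preferred items."""
    a, b = set(items_a), set(items_b)
    common = len(a & b)
    denominator = len(a) + len(b) - common  # size of the union
    if denominator == 0:
        return 0.0  # assumption: two empty sets count as "nothing in common"
    return common / denominator


print(tanimoto({"x", "y", "z"}, {"x", "y", "z"}))  # 1.0, complete overlap
print(tanimoto({"x", "y"}, {"p", "q"}))            # 0.0, nothing in common
print(tanimoto({1, 2, 3}, {2, 3, 4}))              # 0.5, 2 shared out of 4
```

Only membership matters: whether a user rated an item 1 or 5 makes no difference to the result.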

Page 52: Recommender system

Tanimoto

Use it when:

• the underlying data contains only Boolean preferences
• there is too much noise in the preference values

Page 53: Recommender system

Mahout Basic Demo

Page 54: Recommender system

What is Apache Mahout?

• Open source, from Apache
• Mahout is a Java library
  o Implementing machine learning techniques
    • Recommendation
    • Clustering
    • Classification

Page 55: Recommender system

Why do we prefer Mahout?

• Apache License
• Good community & documentation
• Scalable
  o Based on Hadoop (not mandatory!)

Page 56: Recommender system

Physical Storage (database, files …)

Data Model

Recommender

Application

Page 57: Recommender system

Recommendation in Mahout

• Input: raw data (user preferences)
• Output: preference estimation
• Step 1
  o Map the raw data into a Mahout-compliant DataModel
• Step 2
  o Tune the recommender components
    • Similarity measure, neighborhood, …
• Step 3
  o Recommend
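To make Step 1 concrete, the sketch below maps raw preference records into an in-memory structure that the later steps can query. It is written in Python purely for illustration and is not Mahout code: the `userID,itemID,preference` line layout, the sample values, and the function name are assumptions made for this example.

```python
import csv
import io

# Hypothetical raw data: one "userID,itemID,preference" record per line.
RAW = """1,101,5.0
1,102,3.0
2,101,2.0
2,103,4.5
"""


def load_data_model(lines):
    """Map raw records to {user_id: {item_id: preference}}."""
    model = {}
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        user_id, item_id, value = int(row[0]), int(row[1]), float(row[2])
        model.setdefault(user_id, {})[item_id] = value
    return model


model = load_data_model(io.StringIO(RAW))
```

Steps 2 and 3 then operate on this structure: a similarity measure and a neighborhood definition are chosen, and the recommender uses them to estimate preferences for the items a user has not rated.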

Page 58: Recommender system

Recommendation Components

• Five Java interfaces
  o DataModel interface
    • MySQLJDBCDataModel, FileDataModel …
  o UserSimilarity interface
    • Methods to calculate the degree of correlation between two users
  o ItemSimilarity interface
    • Methods to calculate the degree of correlation between two items
  o UserNeighborhood interface
    • Methods to define the concept of ‘neighborhood’
  o Recommender interface
    • Methods to implement the recommendation step itself

Page 59: Recommender system

Similarity Metrics

• SIMILARITY_COOCCURRENCE
• SIMILARITY_LOGLIKELIHOOD
• SIMILARITY_TANIMOTO_COEFFICIENT
• SIMILARITY_CITY_BLOCK
• SIMILARITY_COSINE
• SIMILARITY_PEARSON_CORRELATION
• SIMILARITY_EUCLIDEAN_DISTANCE

Page 60: Recommender system

CASE STUDY

“Much is made of what the likes of Facebook, Google and Apple know about users. Truth is, Amazon may know more. And the massive retailer proves it every day” - JP Mangalindan, Writer [*]

References: http://tech.fortune.cnn.com/2012/07/30/amazon-5/

Page 61: Recommender system

CASE STUDY

• Amazon's recommendation system is based on a number of simple elements:
  o what a user has bought in the past and recently viewed
  o which items a user has in their virtual shopping cart
  o items the user has rated and liked
  o what other customers have viewed and purchased
• The retail giant calls this "item-to-item collaborative filtering"
• It uses this algorithm to heavily customize the browsing experience for returning customers

References: http://tech.fortune.cnn.com/2012/07/30/amazon-5/

Page 62: Recommender system

CASE STUDY

• The recommendation system worked, and Amazon reported strong results:
  o a 29% sales increase to $12.83 billion during its 2nd fiscal quarter (as of July 26, 2012)
  o compared to $9.9 billion during the same period the previous year
• Amazon has integrated recommendations into nearly every part of the purchasing process, from product discovery to checkout
• "Our mission is to delight our customers by allowing them to serendipitously discover great products." (an Amazon spokesperson)

References: http://tech.fortune.cnn.com/2012/07/30/amazon-5/

Page 63: Recommender system


CASE STUDY – Amazon recommendations services

References http://www.google.com/patents/US7921042

Page 64: Recommender system

CASE STUDY - Generation of Similar Items Table

References: http://www.google.com/patents/US7921042 (Fig. 1), http://www.google.com/patents/US7113917 (Fig. 3, 4)

The recommendation services components include:
- a recommendation process
- an off-line table generation process
- a similar items table

Page 65: Recommender system

CASE STUDY - Generation of Recommendation

References: http://www.google.com/patents/US7921042 (Fig. 1), http://www.google.com/patents/US7113917 (Fig. 2)

Page 66: Recommender system

CASE STUDY - Generation of Recommendation

References: http://www.google.com/patents/US7921042 (Fig. 1), http://www.google.com/patents/US7113917 (Fig. 5)

Page 67: Recommender system

CASE STUDY - Generation of Recommendation

References: http://www.google.com/patents/US7921042 (Fig. 1), http://www.google.com/patents/US7113917 (Fig. 7)

Page 68: Recommender system

EVALUATION

Experimental Settings

• Offline Experiments
  - Performed using a pre-collected data set of users choosing or rating items.
  - Simulate the behavior of users interacting with a recommendation system.
  - Assume that user behavior when the data was collected will be similar enough to user behavior when the recommender system is deployed,
  - so that reliable decisions can be made based on the simulation.
• User Studies
  - Conducted by recruiting a set of test subjects
  - and asking them to perform several tasks that require interaction with the recommendation system, while observing them.
  - We can then check whether the recommendations are used and whether people read different stories with and without recommendations, and then ask them whether the recommendations were relevant.
• Online Experiments
  - Measure the change in user behavior when interacting with different recommendation systems.
  - If users of one system follow the recommendations more often, or if some utility gathered from users of one system exceeds the utility gathered from users of the other system, then we can conclude that one system is superior to the other.

References: Microsoft Research: Evaluating Recommendation Systems - Guy Shani and Asela Gunawardana
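An offline experiment of the kind described above typically starts by hiding part of the pre-collected preferences and asking the recommender to predict them. The sketch below shows only that splitting step. The function name, the 80/20 ratio, the fixed seed, and the sample tuples are illustrative assumptions, not something prescribed by the slides or the cited paper.

```python
import random


def split_preferences(preferences, test_fraction=0.2, seed=42):
    """Shuffle (user, item, rating) tuples and hold out a test portion."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = list(preferences)
    rng.shuffle(shuffled)
    test_size = int(len(shuffled) * test_fraction)
    cut = len(shuffled) - test_size
    return shuffled[:cut], shuffled[cut:]


# Hypothetical pre-collected data: ten (user, item, rating) records.
collected = [(user, item, float((user + item) % 5 + 1))
             for user in range(1, 6) for item in (101, 102)]
train, test = split_preferences(collected)
```

The recommender is then built from `train` only, and its estimates for the held-out `test` records are compared with the ratings the users actually gave.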

Page 69: Recommender system

EVALUATION

References: Microsoft Research: Evaluating Recommendation Systems - Guy Shani and Asela Gunawardana

Reliable conclusions
1. Confidence and p-values
2. Multiple tests

Metrics
3. User preference and prediction accuracy: votes from users
  o Root Mean Squared Error (RMSE)
  o Measuring usage prediction
4. Coverage: item space and user space
5. Novelty: recommendations for items that the user did not know about
6. Utility: the recommendation engine can be judged by the revenue that it generates for the website
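Two of the metrics listed above are simple enough to show directly. The sketch below computes RMSE over paired predicted and actual ratings, and one plain reading of item-space coverage: the share of the catalogue that appears in at least one recommendation list. The function names and sample values are assumptions made for the example.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root Mean Squared Error between paired predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return sqrt(squared_errors / len(actual))


def item_coverage(recommendation_lists, catalog):
    """Fraction of catalogue items recommended to at least one user."""
    catalog = set(catalog)
    if not catalog:
        raise ValueError("catalog must not be empty")
    recommended = set().union(*recommendation_lists)
    return len(recommended & catalog) / len(catalog)


error = rmse([3.0, 4.0], [5.0, 4.0])
coverage = item_coverage([["a", "b"], ["b"]], ["a", "b", "c", "d"])
```

A lower RMSE means predictions are closer to the real ratings. Coverage guards against a system that scores well on accuracy while only ever recommending a handful of popular items.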

Page 70: Recommender system

References

• Personalized recommendations of items represented within a database, http://www.google.com/patents/US7113917
• Computer processes for identifying related items and generating personalized item recommendations, http://www.google.com/patents/US7921042
• Microsoft Research: Evaluating Recommendation Systems - Guy Shani and Asela Gunawardana
• Amazon Recommendation – Industry report

Page 71: Recommender system

Summary

• Recommender systems
• User-based vs. item-based
• Similarity metrics
  o Choose the most suitable one depending on the data
• Evaluation and challenges
• Apache Mahout
  o A Java library that implements machine learning techniques

Page 72: Recommender system

Mahout Music Recommender Demo

IF YOU LIKE BRITNEY, YOU WILL LOVE ….

Page 73: Recommender system

Architecture of Recommender

DemoFriendLikes.csv

DataModel

FacebookRecommender

Recommender

FacebookRecommenderSOAP

Glassfish Java 6 EE Server

facebook-recommender-demo.war

SOAP

Page 74: Recommender system

Q&A

THANK YOU!