Frank Scholten

[email protected]

Introduction to Collaborative Filtering using Mahout

Agenda

● Introduction
● Mahout / Taste
● Taste Architecture
● Algorithms
● Evaluating algorithms
● Questions?

Recommendation Engines

● Amazon

● Stumbleupon

● Youtube

● Last.fm

● Netflix

● Digg

● Google News

Collaborative Filtering

Clustering

Classification

Is this SPAM?

Users & Items

Preferences

● Explicit: "I rate" (for example, 3 stars)
● Implicit: "I am buying"

Item-based recommendation

Which books are read by people who also read this book?
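The question on this slide can be read as a co-occurrence count. The following is a minimal sketch in plain Java, not Mahout code; the class `AlsoRead`, the method `alsoRead`, and the sample IDs are assumptions made up for the illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: count how often other books are read by readers of one book.
final class AlsoRead {

    /**
     * @param booksByUser maps a user ID to the set of book IDs that user has read
     * @param bookId      the book we start from
     * @return for every other book, the number of users who read both it and bookId
     */
    static Map<Long, Integer> alsoRead(Map<Long, Set<Long>> booksByUser, long bookId) {
        Map<Long, Integer> counts = new HashMap<>();
        for (Set<Long> books : booksByUser.values()) {
            if (!books.contains(bookId)) {
                continue; // this user did not read the starting book
            }
            for (long other : books) {
                if (other != bookId) {
                    counts.merge(other, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<Long, Set<Long>> booksByUser = new HashMap<>();
        booksByUser.put(1L, new HashSet<Long>(Arrays.asList(10L, 20L, 30L)));
        booksByUser.put(2L, new HashSet<Long>(Arrays.asList(10L, 20L)));
        booksByUser.put(3L, new HashSet<Long>(Arrays.asList(20L, 40L)));
        // Books read together with book 10, with the number of shared readers.
        System.out.println(alsoRead(booksByUser, 10L));
    }
}
```

Raw counts favour popular books, which is one reason a normalised similarity measure is usually preferred over plain co-occurrence.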

User-based recommendation

We've got similar tastes, read any good books?

User neighborhood
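A user neighborhood can be sketched as "the n users most similar to me". The example below is an assumption-laden illustration in plain Java, not Mahout's neighborhood classes: the class and method names are hypothetical, and the similarity (1 / (1 + Euclidean distance over shared items)) is just one simple choice.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a nearest-n user neighborhood.
final class UserNeighborhoodSketch {

    // Similarity in (0, 1]: 1 / (1 + Euclidean distance over items both users rated).
    // Returns 0 when the two users have no rated item in common.
    static double similarity(Map<Long, Double> a, Map<Long, Double> b) {
        double sumSquares = 0.0;
        int shared = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                double diff = e.getValue() - other;
                sumSquares += diff * diff;
                shared++;
            }
        }
        return shared == 0 ? 0.0 : 1.0 / (1.0 + Math.sqrt(sumSquares));
    }

    // Up to n users with positive similarity to userId, most similar first.
    // Assumes userId is present in ratingsByUser.
    static List<Long> neighborhood(long userId, Map<Long, Map<Long, Double>> ratingsByUser, int n) {
        Map<Long, Double> own = ratingsByUser.get(userId);
        Map<Long, Double> scores = new HashMap<>();
        for (Map.Entry<Long, Map<Long, Double>> e : ratingsByUser.entrySet()) {
            if (e.getKey() != userId) {
                double s = similarity(own, e.getValue());
                if (s > 0.0) {
                    scores.put(e.getKey(), s);
                }
            }
        }
        List<Long> candidates = new ArrayList<>(scores.keySet());
        candidates.sort((x, y) -> Double.compare(scores.get(y), scores.get(x)));
        return new ArrayList<>(candidates.subList(0, Math.min(n, candidates.size())));
    }
}
```

A fixed n is only one way to bound the neighborhood; a similarity threshold is a common alternative.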

Taste Architecture

DataModel

Recommender

ItemSimilarity or UserSimilarity

Preferences CSV file:

234, 854, 4.0
234, 598, 3.0
234, 458, 5.0
235, 289, 4.0
… , … , ...

3 stars
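Reading such a preferences file can be sketched as follows. This is a plain-Java illustration, not the Taste DataModel code: the class `PreferenceCsv` and its nested `Pref` type are hypothetical, and the column order (user ID, item ID, preference value) is an assumption based on the sample rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: parse "userId, itemId, value" lines into simple objects.
final class PreferenceCsv {

    // Made-up value type mirroring the three CSV columns.
    static final class Pref {
        final long userId;
        final long itemId;
        final float value;

        Pref(long userId, long itemId, float value) {
            this.userId = userId;
            this.itemId = itemId;
            this.value = value;
        }
    }

    static Pref parseLine(String line) {
        String[] parts = line.split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected userId,itemId,value but got: " + line);
        }
        return new Pref(Long.parseLong(parts[0].trim()),
                        Long.parseLong(parts[1].trim()),
                        Float.parseFloat(parts[2].trim()));
    }

    // Parses all non-blank lines; a malformed line raises an exception.
    static List<Pref> parse(List<String> lines) {
        List<Pref> prefs = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                prefs.add(parseLine(line));
            }
        }
        return prefs;
    }

    public static void main(String[] args) {
        List<Pref> prefs = parse(Arrays.asList("234, 854, 4.0", "234, 598, 3.0", "235, 289, 4.0"));
        for (Pref p : prefs) {
            System.out.println(p.userId + " -> " + p.itemId + " : " + p.value);
        }
    }
}
```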

Preferences

● Preference
    ● long userId;
    ● long itemId;
    ● float value;
● PreferenceArray
● Implicit
    ● BooleanUserPreferenceArray & BooleanItemPreferenceArray

DataModels

● FileDataModel

● GenericJDBCDataModel

● MySQLDataModel

Similarity Algorithms

Class Explicit Implicit

TanimotoCoefficientSimilarity

LogLikelihoodSimilarity

EuclideanDistanceSimilarity

PearsonCorrelationSimilarity

SpearmanCorrelationSimilarity

UncenteredCosineSimilarity

Slope One

TanimotoCoefficientSimilarity

$$T(A,B) = \frac{\#\,\text{users preferring } A \text{ AND } B}{\#\,\text{users preferring } A \text{ OR } B}$$
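The Tanimoto (Jaccard) coefficient is the size of the intersection of two sets divided by the size of their union. A minimal sketch in plain Java follows; the class `TanimotoSketch` and its method are hypothetical names for this illustration and are not the Mahout class of the same family.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the Tanimoto coefficient between two items,
// each represented by the set of users who prefer it.
final class TanimotoSketch {

    static double tanimoto(Set<Long> usersOfA, Set<Long> usersOfB) {
        Set<Long> both = new HashSet<>(usersOfA);
        both.retainAll(usersOfB); // users preferring A AND B
        // Size of the union: |A| + |B| - |A AND B|.
        int either = usersOfA.size() + usersOfB.size() - both.size();
        return either == 0 ? 0.0 : (double) both.size() / either;
    }

    public static void main(String[] args) {
        Set<Long> a = new HashSet<Long>(Arrays.asList(1L, 2L, 3L));
        Set<Long> b = new HashSet<Long>(Arrays.asList(2L, 3L, 4L));
        System.out.println(tanimoto(a, b)); // prints 0.5 (2 shared users out of 4)
    }
}
```

Because only set membership is used, the measure needs no preference values, which is why it suits implicit data.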

LogLikelihoodSimilarity

● Hypothesis A = “Items are similar”

● Hypothesis B = “Items are not similar”

● L(A,B) = log (max likelihood A) – log (max likelihood B)

● See “Accurate Methods for the Statistics of Surprise and Coincidence” ~ Ted Dunning
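One common way to compute such a score is the log-likelihood ratio (G²) over a 2x2 table of counts, as described in Dunning's paper. The sketch below is plain Java and makes no claim to match the exact value that LogLikelihoodSimilarity returns; the class name and the meaning assigned to the four counts are assumptions for the illustration. It includes the conventional factor of 2.

```java
// Hypothetical sketch of the log-likelihood ratio for two items A and B.
//   k11 = users preferring both A and B    k12 = users preferring A but not B
//   k21 = users preferring B but not A     k22 = users preferring neither
final class LogLikelihoodSketch {

    static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalised Shannon entropy of a list of counts: N ln N - sum(k ln k).
    static double entropy(long... counts) {
        long sum = 0;
        double result = 0.0;
        for (long c : counts) {
            result += xLogX(c);
            sum += c;
        }
        return xLogX(sum) - result;
    }

    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Guard against tiny negative values caused by floating-point round-off.
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    public static void main(String[] args) {
        // Independent preferences: score close to 0.
        System.out.println(logLikelihoodRatio(10, 10, 10, 10));
        // Preferences that always coincide: a clearly positive score.
        System.out.println(logLikelihoodRatio(10, 0, 0, 10));
    }
}
```

A score near zero means the co-occurrence is what chance would produce; larger values indicate stronger evidence that the two items are related.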

Precomputed Similarities

● MySQLJDBCItemSimilarity
● Generic*Similarity
    ● GenericItemSimilarity.ItemItemSimilarity
    ● GenericUserSimilarity.UserUserSimilarity

Recommenders

long itemId = 345;
GenericItemBasedRecommender itemRec = …
itemRec.mostSimilarItems(itemId, 5);

long userId = 103;
GenericUserBasedRecommender userRec = …
userRec.recommend(userId, 5);
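What a call like `userRec.recommend(userId, 5)` has to produce can be sketched in plain Java. This is not the GenericUserBasedRecommender implementation; it shows one common approach, estimating a preference as the similarity-weighted average of the neighbors' ratings. The class name, method signature, and the precomputed `neighborSimilarity` map are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of user-based recommendation.
final class UserBasedRecommendSketch {

    /**
     * @param ratingsByUser      user ID -> (item ID -> rating); must contain userId
     * @param neighborSimilarity neighbor user ID -> similarity to userId
     * @return up to howMany unseen item IDs, highest estimated preference first
     */
    static List<Long> recommend(long userId,
                                Map<Long, Map<Long, Double>> ratingsByUser,
                                Map<Long, Double> neighborSimilarity,
                                int howMany) {
        Map<Long, Double> own = ratingsByUser.get(userId);
        Map<Long, Double> weightedSum = new HashMap<>();
        Map<Long, Double> weightTotal = new HashMap<>();
        for (Map.Entry<Long, Double> neighbor : neighborSimilarity.entrySet()) {
            double weight = neighbor.getValue();
            Map<Long, Double> theirs = ratingsByUser.get(neighbor.getKey());
            if (theirs == null || weight <= 0.0) {
                continue; // unknown or dissimilar neighbor
            }
            for (Map.Entry<Long, Double> rating : theirs.entrySet()) {
                long item = rating.getKey();
                if (own.containsKey(item)) {
                    continue; // the user already has a preference for this item
                }
                weightedSum.merge(item, weight * rating.getValue(), Double::sum);
                weightTotal.merge(item, weight, Double::sum);
            }
        }
        // Estimated preference = weighted sum of ratings / sum of weights.
        Map<Long, Double> estimates = new HashMap<>();
        for (Map.Entry<Long, Double> e : weightedSum.entrySet()) {
            estimates.put(e.getKey(), e.getValue() / weightTotal.get(e.getKey()));
        }
        List<Long> items = new ArrayList<>(estimates.keySet());
        items.sort((x, y) -> Double.compare(estimates.get(y), estimates.get(x)));
        return new ArrayList<>(items.subList(0, Math.min(howMany, items.size())));
    }
}
```

Items the user has already rated are skipped, so the result contains only new suggestions.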

Recommenders

● User/Item-based recommendation

● Refresh logic

● Access to DataModel

● Recommended because

Evaluating algorithms

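[Diagram: Eval % of the original dataset is used and split by Train % into a training dataset and a test dataset. The Recommender is built from the training dataset, and each estimated preference is compared with the actual preference (for example 3.0) in the test dataset.]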

Evaluating algorithms

● AverageAbsoluteDifferenceRecommenderEvaluator or RMSRecommenderEvaluator
● Evaluation %
● Training %
● RecommenderBuilder
● DataModelBuilder
● DataModel
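The two evaluator names correspond to two ways of scoring estimated preferences against actual ones: average absolute difference and root mean square error. The sketch below shows both measures in plain Java on arrays of paired values; the class `EvaluationSketch` is a hypothetical helper and does not reproduce the Mahout evaluators or their train/test splitting.

```java
// Hypothetical sketch: score estimated preferences against actual preferences.
final class EvaluationSketch {

    // Mean of |estimated - actual| over all test preferences.
    static double averageAbsoluteDifference(double[] estimated, double[] actual) {
        checkSameLength(estimated, actual);
        double total = 0.0;
        for (int i = 0; i < estimated.length; i++) {
            total += Math.abs(estimated[i] - actual[i]);
        }
        return total / estimated.length;
    }

    // Square root of the mean squared difference; penalises large misses more.
    static double rootMeanSquare(double[] estimated, double[] actual) {
        checkSameLength(estimated, actual);
        double total = 0.0;
        for (int i = 0; i < estimated.length; i++) {
            double diff = estimated[i] - actual[i];
            total += diff * diff;
        }
        return Math.sqrt(total / estimated.length);
    }

    private static void checkSameLength(double[] a, double[] b) {
        if (a.length == 0 || a.length != b.length) {
            throw new IllegalArgumentException("Need two non-empty arrays of equal length");
        }
    }

    public static void main(String[] args) {
        double[] estimated = {3.0, 4.0, 5.0};
        double[] actual = {3.0, 5.0, 3.0};
        System.out.println(averageAbsoluteDifference(estimated, actual));
        System.out.println(rootMeanSquare(estimated, actual));
    }
}
```

For both measures a lower score means the recommender's estimates are closer to the held-out preferences.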

Evaluation Demo

● Helper classes for doing evaluation

● TODO - Evaluation of implicit data

● Suggestions welcome

References

Mahout in Action EAP

http://blog.jteam.nl

Mailing list