Post on 25-Jun-2020

Frank Scholten

frank@jteam.nl

Introduction to Collaborative Filtering using Mahout

Agenda

● Introduction

● Mahout / Taste

● Taste Architecture

● Algorithms

● Evaluating algorithms

● Questions?

Recommendation Engines

● Amazon

● Stumbleupon

● Youtube

● Last.fm

● Netflix

● Digg

● Google News

Collaborative Filtering

Clustering

Classification

Is this SPAM?

Users & Items

Preferences

● Explicit: "I rate" (e.g. 3 stars)

● Implicit: "I am buying"

Item-based recommendation

Which books are read by people who also read this one?
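The "people who also read" question can be sketched as a co-occurrence count. This is a minimal stand-alone illustration, not Mahout's item-based recommender; the class `AlsoReadSketch`, the method `alsoRead`, and the sample ids are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-alone sketch; not Mahout's GenericItemBasedRecommender.
final class AlsoReadSketch {

    // For every other book, counts how many readers it shares with bookId.
    static Map<Long, Integer> alsoRead(Map<Long, Set<Long>> booksByUser, long bookId) {
        Map<Long, Integer> counts = new HashMap<>();
        for (Set<Long> books : booksByUser.values()) {
            if (!books.contains(bookId)) {
                continue; // this user did not read the book in question
            }
            for (long other : books) {
                if (other != bookId) {
                    counts.merge(other, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<Long, Set<Long>> booksByUser = new HashMap<>();
        booksByUser.put(1L, new HashSet<>(Arrays.asList(10L, 20L)));
        booksByUser.put(2L, new HashSet<>(Arrays.asList(10L, 20L, 30L)));
        booksByUser.put(3L, new HashSet<>(Arrays.asList(30L)));
        // Books that share readers with book 10, with the number of shared readers.
        System.out.println(alsoRead(booksByUser, 10L));
    }
}
```

Raw counts favour popular books, which is one reason the similarity measures later in the deck normalize the overlap.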

User-based recommendation

We've got similar tastes, read any good books?

User neighborhood
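One common way to form a user neighborhood is to keep the N users most similar to the target user. The sketch below assumes the similarities have already been computed and are passed in as a map; `NeighborhoodSketch` and `nearestN` are hypothetical names, not Taste classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a "nearest N" user neighborhood; names are illustrative.
final class NeighborhoodSketch {

    // Returns the ids of the n users with the highest similarity to the target user.
    static List<Long> nearestN(Map<Long, Double> similarityToTarget, int n) {
        List<Map.Entry<Long, Double>> entries =
                new ArrayList<>(similarityToTarget.entrySet());
        // Sort by similarity, highest first.
        entries.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        List<Long> neighbors = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            neighbors.add(entries.get(i).getKey());
        }
        return neighbors;
    }

    public static void main(String[] args) {
        Map<Long, Double> sims = new HashMap<>();
        sims.put(2L, 0.9);
        sims.put(3L, 0.1);
        sims.put(4L, 0.5);
        System.out.println(nearestN(sims, 2));
    }
}
```

Recommendations for the target user would then be drawn only from items these neighbors prefer.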

Taste Architecture

DataModel

Recommender

ItemSimilarity or UserSimilarity

234, 854, 4.0
234, 598, 3.0
234, 458, 5.0
235, 289, 4.0
… , … , ...

Preferences CSV file
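Each line of the file is a (user id, item id, preference value) triple. In Taste a FileDataModel reads such a file; the stand-alone sketch below only illustrates the format, and the class `PreferenceCsvSketch` and its nested `Pref` type are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Mahout: parses "userId, itemId, value" lines.
final class PreferenceCsvSketch {

    // One (user, item, value) triple.
    static final class Pref {
        final long userId;
        final long itemId;
        final float value;

        Pref(long userId, long itemId, float value) {
            this.userId = userId;
            this.itemId = itemId;
            this.value = value;
        }
    }

    static List<Pref> parse(List<String> lines) {
        List<Pref> prefs = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = trimmed.split("\\s*,\\s*");
            if (parts.length != 3) {
                throw new IllegalArgumentException("Expected 3 fields: " + line);
            }
            prefs.add(new Pref(Long.parseLong(parts[0]),
                               Long.parseLong(parts[1]),
                               Float.parseFloat(parts[2])));
        }
        return prefs;
    }

    public static void main(String[] args) {
        List<Pref> prefs = parse(Arrays.asList(
                "234, 854, 4.0", "234, 598, 3.0", "234, 458, 5.0", "235, 289, 4.0"));
        for (Pref p : prefs) {
            System.out.println(p.userId + " -> " + p.itemId + " : " + p.value);
        }
    }
}
```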

3 stars

Preferences

● Preference
    ● long userId;
    ● long itemId;
    ● float value;

● PreferenceArray

● Implicit: BooleanUserPreferenceArray & BooleanItemPreferenceArray

DataModels

● FileDataModel

● GenericJDBCDataModel

● MySQLJDBCDataModel

Similarity Algorithms

Class Explicit Implicit

TanimotoCoefficientSimilarity

LogLikelihoodSimilarity

EuclideanDistanceSimilarity

PearsonCorrelationSimilarity

SpearmanCorrelationSimilarity

UncenteredCosineSimilarity

Slope One
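As an illustration of a measure that needs explicit preference values, here is a minimal sketch of the Pearson correlation between two users' ratings of the same items. It is not Mahout's PearsonCorrelationSimilarity; `PearsonSketch` is a hypothetical name, and the arrays are assumed to hold ratings for co-rated items in the same order.

```java
// Hypothetical stand-alone sketch; not Mahout's PearsonCorrelationSimilarity.
final class PearsonSketch {

    // Pearson correlation of two equally long rating arrays (co-rated items only).
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length == 0) {
            throw new IllegalArgumentException("Arrays must be non-empty and equally long");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double cov = 0.0;
        double varX = 0.0;
        double varY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        if (varX == 0.0 || varY == 0.0) {
            return Double.NaN; // undefined when one user gives every item the same rating
        }
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        double[] userA = {4.0, 3.0, 5.0};
        double[] userB = {5.0, 3.0, 4.0};
        System.out.println(pearson(userA, userB));
    }
}
```

A value near 1 means the two users rate items alike, a value near -1 means they rate them in opposite ways.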

TanimotoCoefficientSimilarity

$$T(A,B) = \frac{\#\,\text{users preferring A AND B}}{\#\,\text{users preferring A OR B}}$$
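The Tanimoto (Jaccard) coefficient is the size of the intersection of the two user sets divided by the size of their union. A minimal stand-alone sketch, assuming each item is represented by the set of ids of the users who prefer it; `TanimotoSketch` is a hypothetical name, and returning 0 for two empty sets is a choice made here, not something the slides specify.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-alone sketch; not Mahout's TanimotoCoefficientSimilarity.
final class TanimotoSketch {

    // usersA / usersB: ids of the users who prefer item A / item B.
    static double tanimoto(Set<Long> usersA, Set<Long> usersB) {
        Set<Long> both = new HashSet<>(usersA);
        both.retainAll(usersB); // users preferring A AND B
        int union = usersA.size() + usersB.size() - both.size(); // users preferring A OR B
        // Assumption of this sketch: two items nobody prefers get similarity 0.
        return union == 0 ? 0.0 : (double) both.size() / union;
    }

    public static void main(String[] args) {
        Set<Long> usersA = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        Set<Long> usersB = new HashSet<>(Arrays.asList(2L, 3L, 4L));
        System.out.println(tanimoto(usersA, usersB));
    }
}
```

Because only set membership is used, the measure works with implicit (boolean) preferences.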

LogLikelihoodSimilarity

● Hypothesis A = “Items are similar”

● Hypothesis B = “Items are not similar”

● L(A,B) = log (max likelihood A) – log (max likelihood B)

● See “Accurate Methods for the Statistics of Surprise and Coincidence” ~ Ted Dunning
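For two items, a log-likelihood ratio can be computed from a 2x2 table of user counts using the entropy form commonly used for Dunning's test. The sketch below is a stand-alone illustration under that assumption, not Mahout's own class; `LogLikelihoodSketch`, the count names k11..k22, and the final mapping to a similarity are assumptions of this example.

```java
// Hypothetical stand-alone sketch of a log-likelihood ratio; not Mahout's own class.
final class LogLikelihoodSketch {

    private static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized entropy term: xLogX(sum) minus the sum of xLogX(count).
    private static double entropy(long... counts) {
        long sum = 0;
        double parts = 0.0;
        for (long c : counts) {
            sum += c;
            parts += xLogX(c);
        }
        return xLogX(sum) - parts;
    }

    // k11: users who prefer both items, k12: only A, k21: only B, k22: neither.
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Guard against tiny negative values caused by floating-point round-off.
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    // Maps the unbounded ratio into [0, 1); an assumed normalization.
    static double similarity(long k11, long k12, long k21, long k22) {
        return 1.0 - 1.0 / (1.0 + logLikelihoodRatio(k11, k12, k21, k22));
    }

    public static void main(String[] args) {
        System.out.println(logLikelihoodRatio(10, 10, 10, 10)); // preferences look independent
        System.out.println(logLikelihoodRatio(10, 0, 0, 10));   // preferences always coincide
    }
}
```

A ratio near zero means the overlap between the two items is about what chance would produce; a large ratio means the overlap is unlikely to be a coincidence.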

Precomputed Similarities

● MySQLJDBCItemSimilarity

● Generic*Similarity

● GenericItemSimilarity.ItemItemSimilarity

● GenericUserSimilarity.UserUserSimilarity

Recommenders

long itemId = 345;
GenericItemBasedRecommender itemRec = …
itemRec.mostSimilarItems(itemId, 5);

long userId = 103;
GenericUserBasedRecommender userRec = …
userRec.recommend(userId, 5);

● User/Item-based recommendation

● Refresh logic

● Access to DataModel

● Recommended because

Evaluating algorithms

[Figure: evaluation flow. Original dataset (Eval %) → training dataset (Train %) and test dataset → Recommender → estimated preference compared with actual preference]

Evaluating algorithms

● AverageAbsoluteDifferenceRecommenderEvaluator or RMSRecommenderEvaluator

● Evaluation %

● Training %

● RecommenderBuilder

● DataModelBuilder

● DataModel
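The two evaluator names correspond to two ways of scoring estimated preferences against the actual preferences that were held out. The sketch below shows only that scoring step, under the assumption that both sets of values are already available as arrays; `EvaluationSketch` and its methods are hypothetical and do not reproduce how the Taste evaluators split the data.

```java
// Hypothetical stand-alone sketch of the two scores; not Mahout's evaluator classes.
final class EvaluationSketch {

    // Mean of |estimated - actual| over all held-out preferences.
    static double averageAbsoluteDifference(double[] estimated, double[] actual) {
        checkLengths(estimated, actual);
        double sum = 0.0;
        for (int i = 0; i < estimated.length; i++) {
            sum += Math.abs(estimated[i] - actual[i]);
        }
        return sum / estimated.length;
    }

    // Square root of the mean squared difference; penalizes large misses more.
    static double rootMeanSquare(double[] estimated, double[] actual) {
        checkLengths(estimated, actual);
        double sum = 0.0;
        for (int i = 0; i < estimated.length; i++) {
            double diff = estimated[i] - actual[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum / estimated.length);
    }

    private static void checkLengths(double[] estimated, double[] actual) {
        if (estimated.length != actual.length || estimated.length == 0) {
            throw new IllegalArgumentException("Arrays must be non-empty and equally long");
        }
    }

    public static void main(String[] args) {
        double[] estimated = {3.0, 4.0};
        double[] actual = {4.0, 2.0};
        System.out.println(averageAbsoluteDifference(estimated, actual));
        System.out.println(rootMeanSquare(estimated, actual));
    }
}
```

For both scores lower is better, and 0 means every estimate matched the held-out preference exactly.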

Evaluation Demo

● Helper classes for doing evaluation

● TODO - Evaluation of implicit data

● Suggestions welcome

References

Mahout in Action EAP

http://blog.jteam.nl

Mailing list
