
Collaborative Filtering with Temporal Dynamics

Yehuda Koren

Yahoo Research Israel

KDD’09


Outline

Introduction

Temporal Dynamics

Global temporal effects

Baseline predictors

Time-changing baseline predictors

User and item effects

BellKor function

Performance

Exploratory Study

Conclusion



The $1 Million Question


Million Dollars Awarded Sept 21st 2009


Preliminaries

Quiz set and Probe set

– Given

(Kevin, Avatar, 2009/12/20, ★ ★ ★ ★ ★)

(Coca, 2012, 2009/12/10, ★ ★ ★ ★)

– Predict

(Kevin, District 9, 2009/12/18, ?????)

Training

– Dec 31, 1999 – Dec 31, 2005

– 100 million ratings

– 480 thousand users

– 17,770 movies

Testing

– 1.4 million ratings


Ratings Data

[Figure: sparse user × movie ratings matrix, 480,000 users × 17,770 movies; most entries are missing]


Ratings Data

[Figure: the same ratings matrix with each user's most recent ratings withheld as the test set, shown as "?" entries]


Training Data

– 100 million ratings (labels known publicly)

Held-Out Data

– 3 million most recent ratings (labels known only to Netflix)

– Quiz Set (1.5m ratings): scores posted on the leaderboard

– Test Set (1.5m ratings): scores known only to Netflix; used in determining the final winner


Scoring

Quality of the result is measured by RMSE

RMSE = \sqrt{\tfrac{1}{|R|}\sum_{(u,i)\in R}(\hat{r}_{ui} - r_{ui})^2}

Does not necessarily correlate well with user satisfaction

Baseline RMSE scores on test data:

– 1.054: just predict the mean user rating for each movie

– 0.953: Netflix's own system (Cinematch) as of 2006

– 0.941: nearest-neighbor method using correlation

– 0.857: the 10% reduction target required to win the $1 million
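As a quick illustration (not from the original slides), RMSE over paired true ratings and predictions can be computed as follows; the function and variable names are illustrative only.

import math

def rmse(ratings, predictions):
    # Root mean squared error over parallel lists of true ratings and predictions.
    assert len(ratings) == len(predictions) and ratings
    total = sum((p - r) ** 2 for r, p in zip(ratings, predictions))
    return math.sqrt(total / len(ratings))

# Example: small errors on three ratings give an RMSE of about 0.32.
print(rmse([5, 3, 4], [4.5, 3.2, 4.1]))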


Considerations

User preferences change over time

Problem of Concept Drift

Instance Selection

Instance Weighting

A common approach tries different exponential time-decay rates to solve the problem (a sketch follows below)
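A minimal sketch of instance weighting with exponential time decay (not from the slides; the half-life value and data layout are assumptions made here for illustration):

def decay_weight(days_ago, half_life_days=180.0):
    # Exponential time decay: a rating made `half_life_days` ago counts half as much.
    return 0.5 ** (days_ago / half_life_days)

# Toy example: weight each training rating by its age before fitting a model.
ratings = [("kevin", "avatar", 5, 10), ("coca", "2012", 4, 400)]  # (user, item, rating, days_ago)
weighted = [(u, i, r, decay_weight(age)) for (u, i, r, age) in ratings]
print(weighted)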


Considerations

Use the full extent of the time period, not only the present behavior

The key is being able to extract signal from each time point, while neglecting only the noise

Multiple changing concepts should be captured

User and/or item dependencies

Model user and item effects together within a single framework

Do not try to extrapolate future temporal dynamics

Too difficult…


Components of a rating predictor

[Figure: a rating prediction decomposes into a user-movie interaction term, a movie bias, and a user bias]

User-movie interaction

Characterizes the matching between users and movies

Attracts most research in the field

Baseline predictor

• Separates users and movies

• Often overlooked

• Benefits from insights into users' behavior

• Among the main practical contributions of the competition

(slide from Yehuda Koren)
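For reference (added here, using the standard notation of Koren's papers), the decomposition this slide refers to is commonly written as:

\hat{r}_{ui} = \underbrace{\mu + b_u + b_i}_{\text{baseline predictor}} + \underbrace{q_i^{T} p_u}_{\text{user-movie interaction}}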


Global temporal effects

The average movie rating made a sudden jump (early 2004)

Ratings increase with the movie's age at the time of the rating


Baseline predictors

The baseline predictor is b_{ui} = \mu + b_u + b_i, where

– \mu is the overall average rating

– b_u and b_i are the observed deviations of user u and item i from the average

Among other things, these biases capture:

– The rating scale of user u

– The values of other ratings the user gave on the same day (day-specific mood, anchoring, multi-user accounts)

– The popularity of movie i

– Selection bias, related to the number of ratings the user gave on the same day ("frequency")


Time-changing baseline predictors

Two major temporal effects:

– An item's popularity changes over time

– Users change their baseline rating over time

Therefore, take the parameters b_i and b_u as functions of time (an equation sketch follows below)
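In the paper's notation, the time-aware baseline replaces the static biases with time-dependent ones, where t is the day of the rating:

b_{ui}(t) = \mu + b_u(t) + b_i(t)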


Item temporal effect

Must trade off time resolution against having enough ratings per bin

Each bin corresponds to roughly ten consecutive weeks of data

30 bins spanning all days in the dataset
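In the paper's notation, the item bias gains a bin-dependent component, where Bin(t) maps the rating day to one of the 30 bins:

b_i(t) = b_i + b_{i,\mathrm{Bin}(t)}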


User temporal effect

– Static model: b_u(t) = b_u

– Linear model: adds a drift term \alpha_u \cdot \mathrm{dev}_u(t) with \beta = 0.4 (defined below)

– Spline model: b_u(t) follows a smooth spline through time-anchored user bias parameters

– Test RMSE: static 0.9799, linear 0.9605, spline 0.9603 (lower is better)
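For reference, the drift term as defined in the KDD'09 paper, where t_u is the mean rating date of user u:

\mathrm{dev}_u(t) = \mathrm{sign}(t - t_u)\cdot|t - t_u|^{\beta}, \qquad b_u(t) = b_u + \alpha_u \cdot \mathrm{dev}_u(t)

The spline variant instead interpolates b_u(t) through a set of per-user time points using a kernel smoother.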


User-Item temporal effect


Periodic effect

Dayparting

Some products can be more popular in specific seasons or near certain holidays

Different types of television or radio shows are popular throughout different segments of the day

Season, day-of-week effect

Unfortunately, periodic effects did not show significant predictive power
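For completeness, the paper tests periodic effects by adding a period-dependent bias (e.g., period(t) = day of week), an extension that did not improve accuracy:

b_i(t) = b_i + b_{i,\mathrm{Bin}(t)} + b_{i,\mathrm{period}(t)}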


The BellKor function
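This slide presumably shows the full time-aware factor model (timeSVD++) from the paper; in its notation, with R(u) the set of items rated by user u:

\hat{r}_{ui}(t) = \mu + b_i(t) + b_u(t) + q_i^{T}\Big(p_u(t) + |R(u)|^{-1/2}\sum_{j\in R(u)} y_j\Big)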


Performance

– 1.054: just predict the mean user rating for each movie

– 0.953: Netflix's own system (Cinematch) as of 2006

– 0.941: nearest-neighbor method using correlation

– 0.864: BellKor algorithm, 2008

– 0.856: BellKor algorithm, 2009 (10.05% improvement)


An Exploratory Study

Sudden rise in the average movie rating (early 2004). Possible explanations:

– Technical improvements in Netflix's Cinematch recommender

– GUI improvements

– The meaning of a rating changed

– An increase in 'normal users' (?)


An Exploratory Study

Movie's age

– Users prefer new movies for no particular reason

– Older movies are just inherently better than newer ones (x)


Conclusion

Tracking the temporal dynamics of user preferences raises unique challenges

Traditional decay models lose too much signal, thus degrading prediction accuracy

Understanding your data is important, e.g., time effects

Our model won the contest