Multiverse Recommendation: N-dimensional Tensor Factorization for Context-aware Collaborative Filtering
DESCRIPTION
Slides from RecSys 2010 presentation. Context has been recognized as an important factor to consider in personalized Recommender Systems. However, most model-based Collaborative Filtering approaches such as Matrix Factorization do not provide a straightforward way of integrating context information into the model. In this work, we introduce a Collaborative Filtering method based on Tensor Factorization, a generalization of Matrix Factorization that allows for a flexible and generic integration of contextual information by modeling the data as a User-Item-Context N-dimensional tensor instead of the traditional 2D User-Item matrix. In the proposed model, called Multiverse Recommendation, different types of context are considered as additional dimensions in the representation of the data as a tensor.
TRANSCRIPT
Multiverse Recommendation: N-dimensional Tensor Factorization for Context-aware Collaborative Filtering
Alexandros Karatzoglou¹, Xavier Amatriain¹, Linas Baltrunas², Nuria Oliver²
¹ Telefonica Research, Barcelona, Spain
² Free University of Bolzano, Bolzano, Italy
November 4, 2010
Context in Recommender Systems
Context is an important factor to consider in personalized Recommendation
Current State of the Art in Context-aware Recommendation
Pre-Filtering Techniques
Post-Filtering Techniques
Contextual Modeling
The approach presented here fits in the Contextual Modeling category
Collaborative Filtering problem setting
Typical data sizes, e.g. Netflix data: n = 5 × 10^5 users, m = 17 × 10^3 movies
Standard Matrix Factorization
Find U ∈ R^{n×d} and M ∈ R^{d×m} so that F = UM

minimize_{U,M} L(F, Y) + λ Ω(U, M)

[Figure: the Users × Movies rating matrix factorized into U and M]
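The factorization on this slide can be sketched in a few lines of NumPy (a toy illustration, not the authors' implementation; the sizes and random initialization are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, d = 5, 4, 3              # toy sizes: users, movies, latent factors
U = rng.normal(size=(n, d))    # user factor matrix, U in R^{n x d}
M = rng.normal(size=(d, m))    # movie factor matrix, M in R^{d x m}

F = U @ M                      # predicted rating matrix F = UM

# a single predicted rating is the product of a user row and a movie column
i, j = 2, 1
f_ij = U[i] @ M[:, j]
```

Fitting U and M against the observed entries of Y is what the minimization objective above describes.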
Multiverse Recommendation: Tensors for Context-Aware Collaborative Filtering

[Figure: the Users × Movies × Context rating tensor]

F_ijk = S ×_U U_i* ×_M M_j* ×_C C_k*

R[U, M, C, S] := L(F, Y) + Ω[U, M, C] + Ω[S]
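The mode products in the prediction rule contract a core tensor S with one factor row per dimension, Tucker-style. A minimal NumPy sketch using `einsum` (toy sizes and random factors, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, c = 5, 4, 3                  # users, movies, context conditions
dU, dM, dC = 2, 2, 2               # latent dimensions per mode
U = rng.normal(size=(n, dU))       # user factors
M = rng.normal(size=(m, dM))       # movie factors
C = rng.normal(size=(c, dC))       # context factors
S = rng.normal(size=(dU, dM, dC))  # core tensor

# full prediction tensor: F_ijk = S x_U U_i* x_M M_j* x_C C_k*
F = np.einsum('abc,ia,jb,kc->ijk', S, U, M, C)

# a single prediction for user i, movie j, context k
i, j, k = 1, 2, 0
f_ijk = np.einsum('abc,a,b,c->', S, U[i], M[j], C[k])
```

With the context mode removed this reduces to the plain matrix factorization F = UM of the previous slide.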
Regularization

Ω[U, M, C] = λ_M ‖M‖²_F + λ_U ‖U‖²_F + λ_C ‖C‖²_F

Ω[S] := λ_S ‖S‖²_F
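The regularizers are weighted squared Frobenius norms, which in code are just sums of squared entries (a sketch; the λ values and shapes are arbitrary):

```python
import numpy as np

def omega_factors(U, M, C, lam_U, lam_M, lam_C):
    # Omega[U, M, C] = lam_M ||M||_F^2 + lam_U ||U||_F^2 + lam_C ||C||_F^2
    return lam_M * np.sum(M ** 2) + lam_U * np.sum(U ** 2) + lam_C * np.sum(C ** 2)

def omega_core(S, lam_S):
    # Omega[S] = lam_S ||S||_F^2
    return lam_S * np.sum(S ** 2)

U = np.ones((3, 2)); M = np.ones((4, 2)); C = np.ones((2, 2))
S = np.ones((2, 2, 2))
reg = omega_factors(U, M, C, 0.1, 0.1, 0.1) + omega_core(S, 0.1)
```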
Squared Error Loss Function
Many implementations of MF use a simple squared error regression loss function

l(f, y) = (1/2)(f − y)²

thus the loss over all users and items is:

L(F, Y) = Σ_{i}^{n} Σ_{j}^{m} l(f_ij, y_ij)

Note that this loss provides an estimate of the conditional mean
Absolute Error Loss Function
Alternatively one can use the absolute error loss function

l(f, y) = |f − y|

thus the loss over all users and items is:

L(F, Y) = Σ_{i}^{n} Σ_{j}^{m} l(f_ij, y_ij)

which provides an estimate of the conditional median
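The two pointwise losses above are straightforward to write down; a small NumPy sketch summing each over a toy prediction/rating matrix (the values are made up):

```python
import numpy as np

def squared_loss(f, y):
    # l(f, y) = 1/2 (f - y)^2  -> estimates the conditional mean
    return 0.5 * (f - y) ** 2

def absolute_loss(f, y):
    # l(f, y) = |f - y|        -> estimates the conditional median
    return np.abs(f - y)

F = np.array([[3.5, 4.0], [2.0, 1.0]])   # predictions
Y = np.array([[4.0, 4.0], [1.0, 2.0]])   # observed ratings
L_sq = squared_loss(F, Y).sum()          # loss over all users and items
L_abs = absolute_loss(F, Y).sum()
```

The choice matters for robustness: the absolute loss penalizes large residuals linearly rather than quadratically, so outlier ratings pull the fit less.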
Optimization - Stochastic Gradient Descent for TF
The partial gradients with respect to U, M, C and S can then be written as:

∂_{U_i*} l(F_ijk, Y_ijk) = ∂_{F_ijk} l(F_ijk, Y_ijk) S ×_M M_j* ×_C C_k*
∂_{M_j*} l(F_ijk, Y_ijk) = ∂_{F_ijk} l(F_ijk, Y_ijk) S ×_U U_i* ×_C C_k*
∂_{C_k*} l(F_ijk, Y_ijk) = ∂_{F_ijk} l(F_ijk, Y_ijk) S ×_U U_i* ×_M M_j*
∂_S l(F_ijk, Y_ijk) = ∂_{F_ijk} l(F_ijk, Y_ijk) U_i* ⊗ M_j* ⊗ C_k*

We then iteratively update the parameter matrices and tensors using the following update rules:

U_i*^{t+1} = U_i*^t − η ∂_U L − η λ_U U_i*
M_j*^{t+1} = M_j*^t − η ∂_M L − η λ_M M_j*
C_k*^{t+1} = C_k*^t − η ∂_C L − η λ_C C_k*
S^{t+1} = S^t − η ∂_S l(F_ijk, Y_ijk) − η λ_S S

where η is the learning rate.
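The gradient and update rules above can be sketched end-to-end in NumPy for the squared error loss (a toy illustration with made-up sizes, ratings, and hyperparameters, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, c = 5, 4, 3                       # users, movies, context conditions
dU, dM, dC = 2, 2, 2                    # latent dimensions per mode
U = 0.5 * rng.normal(size=(n, dU))
M = 0.5 * rng.normal(size=(m, dM))
C = 0.5 * rng.normal(size=(c, dC))
S = 0.5 * rng.normal(size=(dU, dM, dC)) # core tensor

eta, lam = 0.01, 0.001                  # learning rate, regularization weight

def predict(i, j, k):
    # F_ijk = S x_U U_i* x_M M_j* x_C C_k*
    return np.einsum('abc,a,b,c->', S, U[i], M[j], C[k])

def sgd_step(i, j, k, y):
    """One stochastic update on a single observed rating (i, j, k, y)."""
    g = predict(i, j, k) - y            # dl/dF for the squared error loss
    # partial gradients: contract S with the two factor rows left out
    gU = g * np.einsum('abc,b,c->a', S, M[j], C[k])
    gM = g * np.einsum('abc,a,c->b', S, U[i], C[k])
    gC = g * np.einsum('abc,a,b->c', S, U[i], M[j])
    gS = g * np.einsum('a,b,c->abc', U[i], M[j], C[k])  # outer product
    U[i] -= eta * (gU + lam * U[i])
    M[j] -= eta * (gM + lam * M[j])
    C[k] -= eta * (gC + lam * C[k])
    S[...] -= eta * (gS + lam * S)

# a few toy observed ratings (user, movie, context, rating)
ratings = [(0, 1, 2, 4.0), (1, 0, 0, 2.0), (2, 3, 1, 5.0)]

def total_loss():
    return sum(0.5 * (predict(i, j, k) - y) ** 2 for i, j, k, y in ratings)

before = total_loss()
for _ in range(500):
    for i, j, k, y in ratings:
        sgd_step(i, j, k, y)
after = total_loss()
```

Only the factor rows touched by the sampled rating (plus the shared core S) are updated per step, which is what makes the method scale to sparse rating data.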
Optimization - Stochastic Gradient Descent for TF

[Figure: animation of stochastic updates sweeping over U, M, C and the core tensor S]
Experimental evaluation
We evaluate our model on contextual rating data by computing the Mean Absolute Error (MAE) using 5-fold cross validation, defined as follows:

MAE = (1/K) Σ_{ijk}^{n,m,c} D_ijk |Y_ijk − F_ijk|
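Reading D as the binary indicator of held-out test entries and K as their count, the metric is a one-liner in NumPy (a sketch on made-up data, where 0 marks a missing rating):

```python
import numpy as np

# toy 2x2x2 rating tensor: 0 marks a missing entry
Y = np.array([[[4., 0.], [3., 5.]],
              [[0., 2.], [1., 0.]]])
F = np.full_like(Y, 3.0)           # a trivial constant predictor
D = (Y > 0).astype(float)          # indicator of observed test ratings
K = D.sum()                        # number of test ratings
mae = (D * np.abs(Y - F)).sum() / K
```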
Data
Data set   Users   Movies   Context Dim.   Ratings   Scale
Yahoo!     7642    11915    2              221K      1-5
Adom.      84      192      5              1464      1-13
Food       212     20       2              6360      1-5

Table: Data set statistics
Context Aware Methods
Pre-filtering based approach (G. Adomavicius et al.): computes recommendations using only the ratings made in the same context as the target one.
Item splitting method (L. Baltrunas, F. Ricci): identifies items which have significant differences in their ratings under different context situations.
Results: Context vs. No Context
[Figure: MAE bar plots, panels (a) and (b), comparing No Context against Tensor Factorization]
Figure: Comparison of matrix (no context) and tensor (context) factorization on the Adom and Food data.
Yahoo Artificial Data
[Figure: MAE bar plots for No Context, Reduction, Item-Split and Tensor Factorization at α = 0.1, 0.5 and 0.9]
Figure: Comparison of context-aware methods on the Yahoo! artificial data
Yahoo Artificial Data
[Figure: MAE vs. probability of contextual influence (0.0–0.9) for No Context, Reduction, Item-Split and Tensor Factorization]
Figure: Evolution of MAE values for different methods with increasing influence of the context variable on the Yahoo! data.
Tensor Factorization
[Figure: MAE bar plot for Reduction, Item-Split and Tensor Factorization]
Figure: Comparison of context-aware methods on the Adom data.
Tensor Factorization
[Figure: MAE bar plot for No Context, Reduction and Tensor Factorization]
Figure: Comparison of context-aware methods on the Food data.
Conclusions
Tensor Factorization methods seem to be promising for CARS
Many different TF methods exist
Future work: extend to implicit taste data
Tensor representation of context data seems promising
Thank You !