Multiverse Recommendation: N-dimensional Tensor Factorization for Context-aware Collaborative Filtering
DESCRIPTION
Slides from RecSys 2010 presentation. Context has been recognized as an important factor to consider in personalized Recommender Systems. However, most model-based Collaborative Filtering approaches such as Matrix Factorization do not provide a straightforward way of integrating context information into the model. In this work, we introduce a Collaborative Filtering method based on Tensor Factorization, a generalization of Matrix Factorization that allows for a flexible and generic integration of contextual information by modeling the data as a User-Item-Context N-dimensional tensor instead of the traditional 2D User-Item matrix. In the proposed model, called Multiverse Recommendation, different types of context are considered as additional dimensions in the representation of the data as a tensor.
TRANSCRIPT
Multiverse Recommendation: N-dimensional Tensor Factorization for Context-aware Collaborative Filtering
Alexandros Karatzoglou¹, Xavier Amatriain¹, Linas Baltrunas², Nuria Oliver²
¹ Telefonica Research, Barcelona, Spain
² Free University of Bolzano, Bolzano, Italy
November 4, 2010
Context in Recommender Systems
Context is an important factor to consider in personalized Recommendation
Current State of the Art in Context-aware Recommendation
Pre-Filtering Techniques
Post-Filtering Techniques
Contextual Modeling
The approach presented here fits in the Contextual Modeling category
Collaborative Filtering problem setting
Typical data sizes, e.g. Netflix data: n = 5 × 10^5 users, m = 17 × 10^3 movies
Standard Matrix Factorization
Find U ∈ R^{n×d} and M ∈ R^{d×m} so that F = UM

minimize_{U,M} L(F, Y) + λ Ω(U, M)

[Figure: the Users × Movies rating matrix factorized into U and M]
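The factorization on this slide can be sketched in a few lines of NumPy (a toy illustration, not the authors' implementation; the sizes and random initialization are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, d = 5, 4, 3              # toy sizes: users, movies, latent factors
U = rng.normal(size=(n, d))    # user factor matrix, U in R^{n x d}
M = rng.normal(size=(d, m))    # movie factor matrix, M in R^{d x m}

F = U @ M                      # predicted rating matrix F = UM

# a single predicted rating is the product of a user row and a movie column
i, j = 2, 1
f_ij = U[i] @ M[:, j]
```

Fitting U and M against the observed entries of Y is what the minimization objective above describes.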
Multiverse Recommendation: Tensors for Context-Aware Collaborative Filtering

[Figure: the Users × Movies × Context rating tensor]

F_ijk = S ×_U U_i* ×_M M_j* ×_C C_k*

R[U, M, C, S] := L(F, Y) + Ω[U, M, C] + Ω[S]
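The mode products in the prediction rule contract a core tensor S with one factor row per dimension, Tucker-style. A minimal NumPy sketch using `einsum` (toy sizes and random factors, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, c = 5, 4, 3                  # users, movies, context conditions
dU, dM, dC = 2, 2, 2               # latent dimensions per mode
U = rng.normal(size=(n, dU))       # user factors
M = rng.normal(size=(m, dM))       # movie factors
C = rng.normal(size=(c, dC))       # context factors
S = rng.normal(size=(dU, dM, dC))  # core tensor

# full prediction tensor: F_ijk = S x_U U_i* x_M M_j* x_C C_k*
F = np.einsum('abc,ia,jb,kc->ijk', S, U, M, C)

# a single prediction for user i, movie j, context k
i, j, k = 1, 2, 0
f_ijk = np.einsum('abc,a,b,c->', S, U[i], M[j], C[k])
```

With the context mode removed this reduces to the plain matrix factorization F = UM of the previous slide.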
Regularization

Ω[U, M, C] = λ_M ‖M‖²_F + λ_U ‖U‖²_F + λ_C ‖C‖²_F

Ω[S] := λ_S ‖S‖²_F
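The regularizers are weighted squared Frobenius norms, which in code are just sums of squared entries (a sketch; the λ values and shapes are arbitrary):

```python
import numpy as np

def omega_factors(U, M, C, lam_U, lam_M, lam_C):
    # Omega[U, M, C] = lam_M ||M||_F^2 + lam_U ||U||_F^2 + lam_C ||C||_F^2
    return lam_M * np.sum(M ** 2) + lam_U * np.sum(U ** 2) + lam_C * np.sum(C ** 2)

def omega_core(S, lam_S):
    # Omega[S] = lam_S ||S||_F^2
    return lam_S * np.sum(S ** 2)

U = np.ones((3, 2)); M = np.ones((4, 2)); C = np.ones((2, 2))
S = np.ones((2, 2, 2))
reg = omega_factors(U, M, C, 0.1, 0.1, 0.1) + omega_core(S, 0.1)
```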
Squared Error Loss Function
Many implementations of MF use a simple squared error regression loss function

l(f, y) = (1/2)(f − y)²

thus the loss over all users and items is:

L(F, Y) = Σ_{i}^{n} Σ_{j}^{m} l(f_ij, y_ij)

Note that this loss provides an estimate of the conditional mean
Absolute Error Loss Function
Alternatively one can use the absolute error loss function

l(f, y) = |f − y|

thus the loss over all users and items is:

L(F, Y) = Σ_{i}^{n} Σ_{j}^{m} l(f_ij, y_ij)

which provides an estimate of the conditional median
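The two pointwise losses above are straightforward to write down; a small NumPy sketch summing each over a toy prediction/rating matrix (the values are made up):

```python
import numpy as np

def squared_loss(f, y):
    # l(f, y) = 1/2 (f - y)^2  -> estimates the conditional mean
    return 0.5 * (f - y) ** 2

def absolute_loss(f, y):
    # l(f, y) = |f - y|        -> estimates the conditional median
    return np.abs(f - y)

F = np.array([[3.5, 4.0], [2.0, 1.0]])   # predictions
Y = np.array([[4.0, 4.0], [1.0, 2.0]])   # observed ratings
L_sq = squared_loss(F, Y).sum()          # loss over all users and items
L_abs = absolute_loss(F, Y).sum()
```

The choice matters for robustness: the absolute loss penalizes large residuals linearly rather than quadratically, so outlier ratings pull the fit less.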
Optimization - Stochastic Gradient Descent for TF
The partial gradients with respect to U, M, C and S can then be written as:

∂_{U_i*} l(F_ijk, Y_ijk) = ∂_{F_ijk} l(F_ijk, Y_ijk) S ×_M M_j* ×_C C_k*
∂_{M_j*} l(F_ijk, Y_ijk) = ∂_{F_ijk} l(F_ijk, Y_ijk) S ×_U U_i* ×_C C_k*
∂_{C_k*} l(F_ijk, Y_ijk) = ∂_{F_ijk} l(F_ijk, Y_ijk) S ×_U U_i* ×_M M_j*
∂_S l(F_ijk, Y_ijk) = ∂_{F_ijk} l(F_ijk, Y_ijk) U_i* ⊗ M_j* ⊗ C_k*

We then iteratively update the parameter matrices and tensors using the following update rules:

U_i*^{t+1} = U_i*^t − η ∂_U L − η λ_U U_i*
M_j*^{t+1} = M_j*^t − η ∂_M L − η λ_M M_j*
C_k*^{t+1} = C_k*^t − η ∂_C L − η λ_C C_k*
S^{t+1} = S^t − η ∂_S l(F_ijk, Y_ijk) − η λ_S S

where η is the learning rate.
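The gradient and update rules above can be sketched end-to-end in NumPy for the squared error loss (a toy illustration with made-up sizes, ratings, and hyperparameters, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, c = 5, 4, 3                       # users, movies, context conditions
dU, dM, dC = 2, 2, 2                    # latent dimensions per mode
U = 0.5 * rng.normal(size=(n, dU))
M = 0.5 * rng.normal(size=(m, dM))
C = 0.5 * rng.normal(size=(c, dC))
S = 0.5 * rng.normal(size=(dU, dM, dC)) # core tensor

eta, lam = 0.01, 0.001                  # learning rate, regularization weight

def predict(i, j, k):
    # F_ijk = S x_U U_i* x_M M_j* x_C C_k*
    return np.einsum('abc,a,b,c->', S, U[i], M[j], C[k])

def sgd_step(i, j, k, y):
    """One stochastic update on a single observed rating (i, j, k, y)."""
    g = predict(i, j, k) - y            # dl/dF for the squared error loss
    # partial gradients: contract S with the two factor rows left out
    gU = g * np.einsum('abc,b,c->a', S, M[j], C[k])
    gM = g * np.einsum('abc,a,c->b', S, U[i], C[k])
    gC = g * np.einsum('abc,a,b->c', S, U[i], M[j])
    gS = g * np.einsum('a,b,c->abc', U[i], M[j], C[k])  # outer product
    U[i] -= eta * (gU + lam * U[i])
    M[j] -= eta * (gM + lam * M[j])
    C[k] -= eta * (gC + lam * C[k])
    S[...] -= eta * (gS + lam * S)

# a few toy observed ratings (user, movie, context, rating)
ratings = [(0, 1, 2, 4.0), (1, 0, 0, 2.0), (2, 3, 1, 5.0)]

def total_loss():
    return sum(0.5 * (predict(i, j, k) - y) ** 2 for i, j, k, y in ratings)

before = total_loss()
for _ in range(500):
    for i, j, k, y in ratings:
        sgd_step(i, j, k, y)
after = total_loss()
```

Only the factor rows touched by the sampled rating (plus the shared core S) are updated per step, which is what makes the method scale to sparse rating data.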
Optimization - Stochastic Gradient Descent for TF

[Figure: animation of stochastic updates sweeping over U, M, C and the core tensor S]
Experimental evaluation
We evaluate our model on contextual rating data by computing the Mean Absolute Error (MAE) using 5-fold cross validation, defined as follows:

MAE = (1/K) Σ_{ijk}^{n,m,c} D_ijk |Y_ijk − F_ijk|
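Reading D as the binary indicator of held-out test entries and K as their count, the metric is a one-liner in NumPy (a sketch on made-up data, where 0 marks a missing rating):

```python
import numpy as np

# toy 2x2x2 rating tensor: 0 marks a missing entry
Y = np.array([[[4., 0.], [3., 5.]],
              [[0., 2.], [1., 0.]]])
F = np.full_like(Y, 3.0)           # a trivial constant predictor
D = (Y > 0).astype(float)          # indicator of observed test ratings
K = D.sum()                        # number of test ratings
mae = (D * np.abs(Y - F)).sum() / K
```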
Data
Data set   Users   Movies   Context Dim.   Ratings   Scale
Yahoo!     7642    11915    2              221K      1-5
Adom.      84      192      5              1464      1-13
Food       212     20       2              6360      1-5

Table: Data set statistics
Context Aware Methods
Pre-filtering based approach (G. Adomavicius et al.): computes recommendations using only the ratings made in the same context as the target one.
Item splitting method (L. Baltrunas, F. Ricci): identifies items which have significant differences in their ratings under different context situations.
Results: Context vs. No Context
[Figure: MAE bar plots, panels (a) and (b), comparing No Context against Tensor Factorization]
Figure: Comparison of matrix (no context) and tensor (context) factorization on the Adom and Food data.
Yahoo Artificial Data
[Figure: MAE bar plots for No Context, Reduction, Item-Split and Tensor Factorization at α = 0.1, 0.5 and 0.9]
Figure: Comparison of context-aware methods on the Yahoo! artificial data
Yahoo Artificial Data
[Figure: MAE vs. probability of contextual influence (0.0–0.9) for No Context, Reduction, Item-Split and Tensor Factorization]
Figure: Evolution of MAE values for different methods with increasing influence of the context variable on the Yahoo! data.
Tensor Factorization
[Figure: MAE bar plot for Reduction, Item-Split and Tensor Factorization]
Figure: Comparison of context-aware methods on the Adom data.
Tensor Factorization
[Figure: MAE bar plot for No Context, Reduction and Tensor Factorization]
Figure: Comparison of context-aware methods on the Food data.
Conclusions
Tensor Factorization methods seem to be promising for CARS
Many different TF methods exist
Future work: extend to implicit taste data
Tensor representation of context data seems promising
Thank You !