On the Scalability of Graph Kernels Applied to Collaborative Recommenders
DESCRIPTION
We study the scalability of several recent graph-kernel-based collaborative recommendation algorithms, comparing their runtime and recommendation accuracy with respect to the reduced rank of the subspace. We inspect the exponential and Laplacian exponential kernels, the resistance distance kernel, the regularized Laplacian kernel, and the stochastic diffusion kernel. Furthermore, we introduce new variants of kernels based on the graph Laplacian which, in contrast to existing kernels, also allow negative edge weights and thus negative ratings. We evaluate prediction and recommendation tasks on the Netflix Prize rating corpus, showing that dimensionality reduction not only makes prediction faster, but sometimes also more accurate.
TRANSCRIPT
2008-07-22 | Dipl.-Inform. Jérôme Kunegis | [email protected]
On the Scalability of Graph Kernels Applied to Collaborative Recommenders
Jérôme Kunegis, Andreas Lommatzsch,
Christian Bauckhage, Şahin Albayrak
DAI-Labor, Technische Universität Berlin
18th European Conference on Artificial Intelligence, Patras
Workshop on Recommender Systems
Collaborative Filtering – Definition
Users rate items
• Task: Recommend items to users
• Task: Find similar users
• Task: Find similar items
• Task: Predict missing ratings
…using only ratings, not the content
Examples: Amazon, Netflix, …
Rating Models

Ratings modeled as matrix or graph:

R =

| I\U | U1 | U2 | U3 |
|-----|----|----|----|
| I1  | +1 | +1 | +1 |
| I2  | −1 | −1 | +1 |
| I3  | +1 | −1 |    |

A = [0 R; RT 0] is the adjacency matrix of G
• The rating graph is bipartite with weighted edges between users and items; negative ratings give negative edges
• The adjacency matrix is symmetric and sparse

[Figure: the example rating graph with user nodes U1–U3, item nodes I1–I3, and negative edges marked]
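As an illustration, the block adjacency matrix A = [0 R; RT 0] can be assembled from a rating matrix like this (a minimal numpy sketch; R is the hypothetical example from the slide, with 0 marking the missing rating):

```python
import numpy as np

# Hypothetical items x users rating matrix from the slide's example;
# 0 denotes a missing rating.
R = np.array([[+1, +1, +1],
              [-1, -1, +1],
              [+1, -1,  0]], dtype=float)

n_items, n_users = R.shape

# Symmetric bipartite adjacency matrix A = [0 R; R^T 0]
A = np.block([[np.zeros((n_items, n_items)), R],
              [R.T, np.zeros((n_users, n_users))]])

assert np.allclose(A, A.T)  # A is symmetric, as stated on the slide
```

In practice A would be stored as a sparse matrix, since each user rates only a tiny fraction of the items.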
Dimensionality Reduction
Find a low-rank approximation to A:

A ≈ Ã = Uk Λk UkT

Entry Ãui contains the prediction for Aui.

The parameter k is the reduced dimension.
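The rank-k approximation can be sketched with numpy's symmetric eigendecomposition (a random symmetric matrix stands in for a real adjacency matrix here):

```python
import numpy as np

# Stand-in for a symmetric adjacency matrix A (illustration only)
rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
A = (M + M.T) / 2

k = 2
lam, U = np.linalg.eigh(A)               # eigenvalues in ascending order
idx = np.argsort(np.abs(lam))[::-1][:k]  # keep the k largest by magnitude

# Rank-k approximation A_tilde = U_k Lambda_k U_k^T
A_tilde = U[:, idx] @ np.diag(lam[idx]) @ U[:, idx].T

assert np.linalg.matrix_rank(A_tilde) <= k
```

Keeping eigenvalues by magnitude (not just the largest positive ones) matters for adjacency matrices, whose spectrum contains large negative eigenvalues as well.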
Kernels
A kernel is a similarity function that is positive semi-definite
Represented as a matrix: K = GGT
(when the underlying space is finite)
The matrix à is a kernel when Λ ≥ 0.
Exponential Diffusion Kernel
The matrix exponential of A
exp(αA) = Σi (αi/i!) Ai
is a weighted sum of the powers Ai
• Ai contains the number of paths of length i between all node pairs
• exp(αA) is thus a weighted sum of path counts, with a path of length i weighted by αi/i!
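The equivalence between the power series and the spectral form can be checked numerically (a small illustrative graph, not data from the paper):

```python
import math
import numpy as np

# Small symmetric example graph (star on 3 nodes)
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
alpha = 0.5

# Truncated power series: exp(alpha*A) = sum_i (alpha^i / i!) A^i
K_series = sum((alpha**i / math.factorial(i)) * np.linalg.matrix_power(A, i)
               for i in range(30))

# Same kernel via the eigendecomposition: exp(U Lam U^T) = U exp(Lam) U^T
lam, U = np.linalg.eigh(A)
K_eig = U @ np.diag(np.exp(alpha * lam)) @ U.T

assert np.allclose(K_series, K_eig)
```

The spectral form is what makes the kernel scalable: it needs only the eigenpairs of A, never the explicit matrix powers.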
Laplacian Kernel
Define the Laplacian matrix L
L = D – A with Dii = Σj Aij

Its pseudoinverse K = L+ is a kernel
•Using K = GGT, we can consider the nodes in the space given by G
•Distances in this space correspond to the resistance distance
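A minimal sketch of the resistance distance kernel on a toy triangle graph (2/3 is the known effective resistance between adjacent nodes of a unit-weight triangle):

```python
import numpy as np

# Triangle graph with unit edge weights (illustration only)
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))   # D_ii = sum_j A_ij
L = D - A                    # graph Laplacian
K = np.linalg.pinv(L)        # resistance distance kernel K = L^+

# K is positive semi-definite (all eigenvalues >= 0 up to rounding)
assert np.all(np.linalg.eigvalsh(K) >= -1e-9)

# Resistance distance between nodes u and v: K_uu + K_vv - 2 K_uv
d = K[0, 0] + K[1, 1] - 2 * K[0, 1]  # → 2/3 for adjacent triangle nodes
```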
Laplacian Kernel – Variants
Other Laplacian kernels:
(D – A)+  resistance distance kernel
(1 – α)(I – αD−1A)−1  stochastic diffusion kernel
exp(−αL)  Laplacian exponential diffusion kernel
(I + α(γD – A))−1  regularized Laplacian kernel
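The variants can be sketched side by side on the same toy graph (the values of α and γ are arbitrary illustration choices, not the paper's settings):

```python
import numpy as np

# Toy triangle graph (illustration only)
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))
L = D - A
I = np.eye(3)
alpha, gamma = 0.3, 1.0  # arbitrary example parameters

K_resist = np.linalg.pinv(L)                                     # (D - A)^+
K_stoch = (1 - alpha) * np.linalg.inv(I - alpha * np.linalg.inv(D) @ A)
lam, U = np.linalg.eigh(L)
K_expL = U @ np.diag(np.exp(-alpha * lam)) @ U.T                 # exp(-alpha L)
K_reg = np.linalg.inv(I + alpha * (gamma * D - A))               # regularized
```

Note that on this regular graph all four matrices come out symmetric; for the stochastic diffusion kernel that holds only because D is a multiple of the identity here.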
Signed Laplacian Kernels
• Laplacian kernels are suited for positive data
• Ratings are signed: use signed Laplacian kernels
Let Ls = Ds – A with (Ds)ii = Σj |Aij|
(see my presentation about Modeling Collaborative Similarity with the Signed Resistance Distance Kernel on Thursday)
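A sketch of the signed Laplacian on a hypothetical signed triangle with one foe edge; it remains positive semi-definite despite the negative weight:

```python
import numpy as np

# Signed graph: -1 = foe edge, +1 = friend edge (hypothetical example)
A = np.array([[ 0, +1, -1],
              [+1,  0, +1],
              [-1, +1,  0]], dtype=float)

# Signed Laplacian uses absolute degrees: (D_s)_ii = sum_j |A_ij|
D_s = np.diag(np.abs(A).sum(axis=1))
L_s = D_s - A

# L_s stays positive semi-definite even with negative edge weights
assert np.all(np.linalg.eigvalsh(L_s) >= -1e-9)

K_s = np.linalg.pinv(L_s)  # signed resistance distance kernel
```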
Computation
All kernels are computed using the eigenvalue decomposition of A or L.
Exponentiation and pseudoinversion are simple to apply to the decomposition:
exp(UΛUT) = U exp(Λ) UT
(UΛUT)+ = UΛ+UT
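For scalability, the same trick can be applied to the reduced decomposition, so a kernel is computed from only k eigenpairs (a sketch; the random symmetric matrix is a stand-in for A):

```python
import numpy as np

# Stand-in for a symmetric adjacency matrix (illustration only)
rng = np.random.default_rng(1)
M = rng.standard_normal((8, 8))
A = (M + M.T) / 2
alpha, k = 0.1, 3

lam, U = np.linalg.eigh(A)
idx = np.argsort(np.abs(lam))[::-1][:k]  # k dominant eigenpairs
Uk, lamk = U[:, idx], lam[idx]

# Rank-k kernel approximation: apply exp to the reduced spectrum only,
# so the full decomposition is never needed
K_k = Uk @ np.diag(np.exp(alpha * lamk)) @ Uk.T

assert np.linalg.matrix_rank(K_k) <= k
```

Only the k leading eigenpairs of the sparse matrix have to be computed, which is what makes these kernels applicable to corpora of Netflix scale.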
Prediction and Recommendation
• Normalization: rnorm = (2r – μu – μi) / (σu + σi)
(where μu, μi and σu, σi are the mean and standard deviation of user u's and item i's ratings)
Rating prediction using the nearest-neighbor/weighted-mean algorithm:
• Find the nearest neighbors according to the kernel that have already rated the item
• Compute the weighted mean of their ratings, using the kernel values as weights
For recommendation, rank all items by their prediction
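The prediction step can be sketched as follows (a hypothetical toy rating matrix and a placeholder similarity kernel, not the paper's normalized pipeline):

```python
import numpy as np

# Hypothetical users x items rating matrix; 0 = missing rating
R = np.array([[5, 3, 0],
              [4, 0, 2],
              [0, 4, 5]], dtype=float)

def predict(K, R, u, i, n_neighbors=2):
    """Weighted mean of the ratings of u's nearest neighbors who rated item i."""
    raters = np.flatnonzero(R[:, i] != 0)   # users who rated item i
    raters = raters[raters != u]
    order = np.argsort(-K[u, raters])[:n_neighbors]  # nearest by kernel value
    nn = raters[order]
    w = K[u, nn]
    return float(w @ R[nn, i] / w.sum())

# Placeholder kernel for illustration (any kernel from the talk would do)
K = R @ R.T
print(predict(K, R, u=0, i=2))  # → 3.125
```

For recommendation, one would call predict for every unrated item of user u and rank the items by the predicted value.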
Evaluation – Prediction with Netflix
Root mean squared error on a subset of the Netflix Prize rating prediction task
Evaluation – Recommendation with Netflix
Recommendation on the Netflix data, showing the normalized discounted cumulative gain (NDCG)
Evaluation – Prediction with the Slashdot Zoo
Work in progress: predict whether users are friends or foes. Shown: the mean score (−1 for an incorrect, +1 for a correct prediction)
Observations 1
• The best performance is attained either at a specific k or asymptotically for large k
• General trend: simple dimensionality reduction suffers from overfitting; Laplacian kernels, and to a lesser extent exponential kernels, perform well for large k
• When the best k is finite, it lies at k = 2 or 3 for Netflix and at up to k = 12 for the Slashdot Zoo
• The Laplacian kernels are smoother, making the exact choice of k less critical
• Good recommender, bad predictor: prediction seems to be harder than recommendation
Observations 2
• Good recommender, bad predictor: prediction seems to be harder than recommendation. Which is more relevant in practice?
•Signed kernels perform better than equivalent unsigned kernels.
(and don't forget my presentation on Thursday about the Signed Resistance Distance in CMI4 at 11:30)
THANK YOU!