
Item Recommendation with Variational Autoencoders and Heterogeneous Priors

Giannis Karamanolakis, Kevin Raji Cherian, Ananth Ravi Narayan, Jie Yuan, Da Tang, Tony Jebara

Columbia University (Tony Jebara: Columbia University & Netflix)

Item Recommendation - Collaborative Filtering (CF)

Given a partially observed U × I user-item matrix (unobserved entries marked "?"), predict the missing entries.

Latent Factor Models: factorize the U × I matrix into (U × K) × (K × I)

(-) linear → limited modeling capacity

Non-linear features → Neural Networks → Variational Autoencoders (VAEs)

(+) generalize linear latent factor models
(+) have larger modeling capacity

“Auto-encoding Variational Bayes”, D. P. Kingma, M. Welling, ICLR 2014
“Variational Autoencoders for Collaborative Filtering”, D. Liang, R. G. Krishnan, M. D. Hoffman, T. Jebara, WWW 2018

VAEs for Collaborative Filtering

“Variational Autoencoders for Collaborative Filtering”, D. Liang, R. G. Krishnan, M. D. Hoffman, T. Jebara, WWW 2018.

• Encoder: variational posterior qϕ(zu | xu)
• Prior: p(zu) = 𝒩(0, 𝕀K)
• Decoder: multinomial likelihood over the I items (e.g. predicted probabilities 0.70, 0.00, 0.15, 0.00)

Training Objective: (negative) reconstruction error + regularization term (Kullback-Leibler divergence)

The KL term pulls every user's posterior toward the same prior. ∀ user: “My variational posterior qϕ(zui | xui) should be near my prior p(zui) = 𝒩(0, 𝕀K)”:

- qϕ(zu1 | xu1) near 𝒩(0, 𝕀K)
- qϕ(zu2 | xu2) near 𝒩(0, 𝕀K)
- qϕ(zu3 | xu3) near 𝒩(0, 𝕀K)
- qϕ(zu4 | xu4) near 𝒩(0, 𝕀K)
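The training objective combines the (negative) reconstruction error with the KL regularizer. A minimal NumPy sketch of both terms for one user (illustrative only, not the authors' code; function and variable names are invented):

```python
import numpy as np

def elbo_terms(x, mu, logvar, logits):
    """The two terms of the Mult-VAE objective for one user (sketch).

    x:          binary interaction vector over the I items
    mu, logvar: parameters of the diagonal Gaussian posterior q_phi(z_u | x_u)
    logits:     decoder outputs over the I items
    """
    # Multinomial log-likelihood: sum_i x_i * log softmax(logits)_i
    log_pi = logits - np.log(np.sum(np.exp(logits)))
    recon = float(np.sum(x * log_pi))
    # KL( N(mu, diag(exp(logvar))) || N(0, I_K) ), closed form
    kl = 0.5 * float(np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar))
    return recon, kl  # objective: maximize recon - kl
```

With mu = 0 and unit variance the KL term vanishes, which is exactly the collapse discussed next.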

“Posterior-Collapse” Issue

∀ user: qϕ(zui | xui) collapses to 𝒩(0, 𝕀K)
(Bowman et al. 2016; Hoffman et al. 2016, “ELBO surgery”)

In this paper: use heterogeneous, user-dependent priors!

e.g. for user u4, replace p(zu4) = 𝒩(0, 𝕀K) with p(zu4) = 𝒩(tu4, 𝕊u4)

This work: Item Recommendation with VAEs and Heterogenous Priors

• For each user , we replace by

- Prior parameters encode user preferences

u

- Explicitly encourage user diversity in latent VAE space

Using heterogenous, user-dependent priors

- Prior parameters encode user preferences- Explicitly encourage user diversity in latent VAE space

Using heterogenous, user-dependent priors

• For each user , we replace by

This work: Item Recommendation with VAEs and Heterogenous Priors

u

?How can you encode my preferences?

Have I revealed them?

?

- Prior parameters encode user preferences- Explicitly encourage user diversity in latent VAE space

Using heterogenous, user-dependent priors

• For each user , we replace by

This work: Item Recommendation with VAEs and Heterogenous Priors

u

How can you encode my preferences?

Have I revealed them?

Yes, you have!

By writing reviews…
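Replacing the shared 𝒩(0, 𝕀K) prior with a user-dependent 𝒩(tu, 𝕊u) only changes the KL term of the objective. A sketch of the closed-form KL between diagonal Gaussians (illustrative, not the paper's implementation):

```python
import numpy as np

def kl_heterogeneous(mu, logvar, t_u, s_u):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(t_u, diag(s_u**2)) ).

    With t_u = 0 and s_u = 1 this reduces to the standard N(0, I_K) KL term.
    """
    var = np.exp(logvar)
    return 0.5 * float(np.sum(
        2.0 * np.log(s_u) - logvar + (var + (mu - t_u) ** 2) / s_u ** 2 - 1.0
    ))
```

The term vanishes exactly when the posterior matches the user's own prior, so different users are no longer pulled toward one shared point.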

Item Recommendation with VAEs and Heterogeneous Priors

Users reveal their preferences in text reviews:

• Yelp review → likes burgers, cares about portion size
• IMDB review → likes heist movies, likes suspense

Encoding user preferences from text (2 methods):

• tu, 𝕊u: functions of the user's review text
• Method 1: Word Embeddings (word2vec)
• Method 2: Probabilistic Topic Models (Latent Dirichlet Allocation - LDA)

Method 1: Word Embeddings (word2vec)
(Mikolov et al. 2013: “Efficient estimation of word representations in vector space”)

1. Create review embeddings: average of the K-dim word embeddings of the review's words
2. Represent each user u as a Gaussian distribution 𝒩(tu, 𝕊u):
- tu: average of the review embeddings written by u
- 𝕊u: diagonal covariance matrix, with diagonal values s1, …, sK ∈ ℝ the std of the review embeddings
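Under Method 1, the prior parameters are simple statistics of the user's review embeddings. A NumPy sketch, assuming the per-review embeddings (averaged word2vec vectors) are already computed:

```python
import numpy as np

def user_prior_from_reviews(review_embs):
    """Method 1 sketch: per-user Gaussian prior from review embeddings.

    review_embs: (n_reviews, K) array, one averaged word2vec vector per review.
    Returns (t_u, s_u): prior mean and diagonal stds s_1..s_K.
    """
    t_u = review_embs.mean(axis=0)  # average of the user's review embeddings
    s_u = review_embs.std(axis=0)   # elementwise std -> diagonal of S_u
    return t_u, s_u
```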

Method 2: Probabilistic Topic Models (Latent Dirichlet Allocation - LDA)
(David Blei, Andrew Y. Ng, Michael I. Jordan, 2003: “Latent Dirichlet Allocation”)

Example topics: Topic 1 (horror, blood, gun, crime, …), Topic 2 (love, romance, kiss, wedding, …), Topic K (action, robbery, kill, police, …)

1. Train LDA to extract K topics
2. For each user u:
2a. Concatenate all of the user's reviews in one document
2b. Represent u as the distribution over the K topics
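These LDA steps can be sketched with scikit-learn; the toy corpus below is invented for illustration and is not the paper's data:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Step 2a: one concatenated "document" per user (toy, invented text)
user_docs = [
    "horror blood gun crime blood kill police horror",
    "love romance kiss wedding love romance wedding",
]
counts = CountVectorizer().fit_transform(user_docs)

# Step 1: train LDA to extract K topics (K = 2 here)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# Step 2b: represent each user as a distribution over the K topics
theta = lda.transform(counts)  # shape (n_users, K); each row sums to 1
```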

Item Recommendation with VAEs and Heterogeneous Priors: Item Recommendation Pipeline

1. Online reviews provide both text and ratings
2. Text → user-dependent prior p(zu) (via 1. word2vec or 2. LDA)
3. Ratings → VAE, with the KL term pulling each user's posterior toward p(zu)
4. Decoder scores over items (e.g. 0.70, 0.01, 0.15, 0.00) → item ranking
5. Evaluate on held-out ratings

Evaluation Metrics: NDCG@100, Recall@20, Recall@50
Architecture: encoder MLP I × 600 × 300, decoder MLP 300 × 600 × I
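The ranking metrics can be computed per user as below. This is a common formulation (Recall@k normalized by min(k, #held-out items), binary-relevance NDCG); the paper's exact normalization may differ slightly:

```python
import numpy as np

def recall_at_k(scores, heldout, k):
    """Fraction of a user's held-out items that appear in the top-k ranking."""
    topk = set(np.argsort(-scores)[:k])
    return len(topk & set(heldout)) / min(k, len(heldout))

def ndcg_at_k(scores, heldout, k):
    """Truncated NDCG with binary relevance (held-out item = relevant)."""
    held = set(heldout)
    topk = np.argsort(-scores)[:k]
    gains = np.array([1.0 if i in held else 0.0 for i in topk])
    dcg = float(np.sum(gains / np.log2(np.arange(2, k + 2))))
    idcg = float(np.sum(1.0 / np.log2(np.arange(2, min(len(held), k) + 2))))
    return dcg / idcg
```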

Evaluation Datasets: Online Reviews (Rating & Text)

• Yelp Challenge Dataset
• IMDB Corpus of Movie Reviews

Preprocessing:

• Binarize ratings:
- Yelp: 1-2 stars → 0, 3-5 stars → 1
- IMDB: 1-4 stars → 0, 5-10 stars → 1
• Reduce sparsity (cutoff on % non-empty entries):
- Yelp: discard businesses with < 30 reviews and users with < 5 reviews
- IMDB: discard movies with < 5 reviews and users with < 5 reviews
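The rating binarization amounts to a per-dataset threshold; a sketch:

```python
import numpy as np

def binarize_yelp(stars):
    """Yelp rule from the slides: 1-2 stars -> 0, 3-5 stars -> 1."""
    return (np.asarray(stars) >= 3).astype(int)

def binarize_imdb(stars):
    """IMDB rule from the slides: 1-4 stars -> 0, 5-10 stars -> 1."""
    return (np.asarray(stars) >= 5).astype(int)
```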

Model Comparison & Evaluation Results

• Ranking the items in random order
• Matrix Factorization
• Text-only: ranking items according to cos(tu, ti)
• Mult-VAE (Liang et al. 2018)
• VAE with random user-dependent priors
• VAE with Text Regularization: ℒγ = ℒβ − γ ⋅ dist(zu, tu)
• VAE with heterogeneous user-dependent priors (VAE-HPrior)
• Denoising autoencoder, Mult-DAE (Liang et al. 2018)

(std of scores: ~0.003 to ~0.007)

Relative Performance Improvement, Mult-VAE → VAE-HPrior:

• Yelp: +18.4% NDCG@100, +29.4% Recall@20, +17.7% Recall@50
• IMDB: +14.4% NDCG@100, +18.7% Recall@20, +12.3% Recall@50
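Among the baselines, the text-only model ranks items by cos(tu, ti). A minimal sketch, assuming item embeddings ti are built the same way as the user embeddings tu:

```python
import numpy as np

def rank_items_by_cosine(t_u, item_embs):
    """Rank item indices by cosine similarity between t_u and each item embedding."""
    sims = item_embs @ t_u / (
        np.linalg.norm(item_embs, axis=1) * np.linalg.norm(t_u)
    )
    return np.argsort(-sims)  # best-matching items first
```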

Conclusions

• Extend VAEs to Collaborative Filtering with side information (ratings + text)
- User-agnostic → user-dependent priors
- Prior parameters as functions of the users' review text
- User representations in a multimodal latent space (encoding ratings + text)
• Outperform the existing Mult-VAE model (up to 29.41% relative improvement in Recall@20)
• Perform comparably to a denoising autoencoder (Mult-DAE)

Ongoing & Future Work

• Experiments: VAE-HPrior vs. Mult-DAE on different levels of sparsity
• Models: more effective aspect-based methods for extracting user preferences from text reviews
• Data: use extra side information where available (e.g., geolocation)

Thank you! Questions?

Contact: Giannis Karamanolakis, gkaraman@cs.columbia.edu
