
Item Recommendation with Variational Autoencoders and Heterogeneous Priors

Giannis Karamanolakis, Kevin Raji Cherian, Ananth Ravi Narayan, Jie Yuan, Da Tang, Tony Jebara

Columbia University (Tony Jebara: Columbia University & Netflix)

Item Recommendation - Collaborative Filtering (CF)

Given a partially observed U × I user-item matrix (unobserved entries marked "?"), predict the missing entries.

Latent Factor Models: factorize the U × I matrix into (U × K) × (K × I)

(-) linear → limited modeling capacity

Non-linear features → Neural Networks → Variational Autoencoders (VAEs)

(+) generalize linear latent factor models
(+) have larger modeling capacity

“Auto-encoding Variational Bayes”, D. P. Kingma, M. Welling, ICLR 2014
“Variational Autoencoders for Collaborative Filtering”, D. Liang, R. G. Krishnan, M. D. Hoffman, T. Jebara, WWW 2018

VAEs for Collaborative Filtering

“Variational Autoencoders for Collaborative Filtering”, D. Liang, R. G. Krishnan, M. D. Hoffman, T. Jebara, WWW 2018.

• Encoder: variational posterior qϕ(zu | xu)
• Prior: p(zu) = 𝒩(0, 𝕀K)
• Decoder: multinomial likelihood over the I items (e.g. predicted probabilities 0.70, 0.00, 0.15, 0.00)

Training Objective: (negative) reconstruction error + regularization term (Kullback-Leibler divergence)

The KL term pulls every user's posterior toward the same prior. ∀ user: “My variational posterior qϕ(zui | xui) should be near my prior p(zui) = 𝒩(0, 𝕀K)”:

- qϕ(zu1 | xu1) near 𝒩(0, 𝕀K)
- qϕ(zu2 | xu2) near 𝒩(0, 𝕀K)
- qϕ(zu3 | xu3) near 𝒩(0, 𝕀K)
- qϕ(zu4 | xu4) near 𝒩(0, 𝕀K)
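The training objective combines the (negative) reconstruction error with the KL regularizer. A minimal NumPy sketch of both terms for one user (illustrative only, not the authors' code; function and variable names are invented):

```python
import numpy as np

def elbo_terms(x, mu, logvar, logits):
    """The two terms of the Mult-VAE objective for one user (sketch).

    x:          binary interaction vector over the I items
    mu, logvar: parameters of the diagonal Gaussian posterior q_phi(z_u | x_u)
    logits:     decoder outputs over the I items
    """
    # Multinomial log-likelihood: sum_i x_i * log softmax(logits)_i
    log_pi = logits - np.log(np.sum(np.exp(logits)))
    recon = float(np.sum(x * log_pi))
    # KL( N(mu, diag(exp(logvar))) || N(0, I_K) ), closed form
    kl = 0.5 * float(np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar))
    return recon, kl  # objective: maximize recon - kl
```

With mu = 0 and unit variance the KL term vanishes, which is exactly the collapse discussed next.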

“Posterior-Collapse” Issue

∀ user: qϕ(zui | xui) collapses to 𝒩(0, 𝕀K)
(Bowman et al. 2016; Hoffman et al. 2016, “ELBO surgery”)

In this paper: use heterogeneous, user-dependent priors!

e.g. for user u4, replace p(zu4) = 𝒩(0, 𝕀K) with p(zu4) = 𝒩(tu4, 𝕊u4)

This work: Item Recommendation with VAEs and Heterogenous Priors

• For each user , we replace by

- Prior parameters encode user preferences

u

- Explicitly encourage user diversity in latent VAE space

Using heterogenous, user-dependent priors

- Prior parameters encode user preferences- Explicitly encourage user diversity in latent VAE space

Using heterogenous, user-dependent priors

• For each user , we replace by

This work: Item Recommendation with VAEs and Heterogenous Priors

u

?How can you encode my preferences?

Have I revealed them?

?

- Prior parameters encode user preferences- Explicitly encourage user diversity in latent VAE space

Using heterogenous, user-dependent priors

• For each user , we replace by

This work: Item Recommendation with VAEs and Heterogenous Priors

u

How can you encode my preferences?

Have I revealed them?

Yes, you have!

By writing reviews…
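Replacing the shared 𝒩(0, 𝕀K) prior with a user-dependent 𝒩(tu, 𝕊u) only changes the KL term of the objective. A sketch of the closed-form KL between diagonal Gaussians (illustrative, not the paper's implementation):

```python
import numpy as np

def kl_heterogeneous(mu, logvar, t_u, s_u):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(t_u, diag(s_u**2)) ).

    With t_u = 0 and s_u = 1 this reduces to the standard N(0, I_K) KL term.
    """
    var = np.exp(logvar)
    return 0.5 * float(np.sum(
        2.0 * np.log(s_u) - logvar + (var + (mu - t_u) ** 2) / s_u ** 2 - 1.0
    ))
```

The term vanishes exactly when the posterior matches the user's own prior, so different users are no longer pulled toward one shared point.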

Item Recommendation with VAEs and Heterogeneous Priors

Users reveal their preferences in text reviews:

• Yelp review → likes burgers, cares about portion size
• IMDB review → likes heist movies, likes suspense

Encoding user preferences from text (2 methods):

• tu, 𝕊u: functions of the user's review text
• Method 1: Word Embeddings (word2vec)
• Method 2: Probabilistic Topic Models (Latent Dirichlet Allocation - LDA)

Method 1: Word Embeddings (word2vec)
(Mikolov et al. 2013: “Efficient estimation of word representations in vector space”)

1. Create review embeddings: average of the K-dim word embeddings of the review's words
2. Represent each user u as a Gaussian distribution 𝒩(tu, 𝕊u):
- tu: average of the review embeddings written by u
- 𝕊u: diagonal covariance matrix, with diagonal values s1, …, sK ∈ ℝ the std of the review embeddings
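Under Method 1, the prior parameters are simple statistics of the user's review embeddings. A NumPy sketch, assuming the per-review embeddings (averaged word2vec vectors) are already computed:

```python
import numpy as np

def user_prior_from_reviews(review_embs):
    """Method 1 sketch: per-user Gaussian prior from review embeddings.

    review_embs: (n_reviews, K) array, one averaged word2vec vector per review.
    Returns (t_u, s_u): prior mean and diagonal stds s_1..s_K.
    """
    t_u = review_embs.mean(axis=0)  # average of the user's review embeddings
    s_u = review_embs.std(axis=0)   # elementwise std -> diagonal of S_u
    return t_u, s_u
```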

Method 2: Probabilistic Topic Models (Latent Dirichlet Allocation - LDA)
(David Blei, Andrew Y. Ng, Michael I. Jordan, 2003: “Latent Dirichlet Allocation”)

Example topics: Topic 1 (horror, blood, gun, crime, …), Topic 2 (love, romance, kiss, wedding, …), Topic K (action, robbery, kill, police, …)

1. Train LDA to extract K topics
2. For each user u:
2a. Concatenate all of the user's reviews in one document
2b. Represent u as the distribution over the K topics
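These LDA steps can be sketched with scikit-learn; the toy corpus below is invented for illustration and is not the paper's data:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Step 2a: one concatenated "document" per user (toy, invented text)
user_docs = [
    "horror blood gun crime blood kill police horror",
    "love romance kiss wedding love romance wedding",
]
counts = CountVectorizer().fit_transform(user_docs)

# Step 1: train LDA to extract K topics (K = 2 here)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# Step 2b: represent each user as a distribution over the K topics
theta = lda.transform(counts)  # shape (n_users, K); each row sums to 1
```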

Item Recommendation with VAEs and Heterogeneous Priors: Item Recommendation Pipeline

1. Online reviews provide both text and ratings
2. Text → user-dependent prior p(zu) (via 1. word2vec or 2. LDA)
3. Ratings → VAE, with the KL term pulling each user's posterior toward p(zu)
4. Decoder scores over items (e.g. 0.70, 0.01, 0.15, 0.00) → item ranking
5. Evaluate on held-out ratings

Evaluation Metrics: NDCG@100, Recall@20, Recall@50
Architecture: encoder MLP I × 600 × 300, decoder MLP 300 × 600 × I
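The ranking metrics can be computed per user as below. This is a common formulation (Recall@k normalized by min(k, #held-out items), binary-relevance NDCG); the paper's exact normalization may differ slightly:

```python
import numpy as np

def recall_at_k(scores, heldout, k):
    """Fraction of a user's held-out items that appear in the top-k ranking."""
    topk = set(np.argsort(-scores)[:k])
    return len(topk & set(heldout)) / min(k, len(heldout))

def ndcg_at_k(scores, heldout, k):
    """Truncated NDCG with binary relevance (held-out item = relevant)."""
    held = set(heldout)
    topk = np.argsort(-scores)[:k]
    gains = np.array([1.0 if i in held else 0.0 for i in topk])
    dcg = float(np.sum(gains / np.log2(np.arange(2, k + 2))))
    idcg = float(np.sum(1.0 / np.log2(np.arange(2, min(len(held), k) + 2))))
    return dcg / idcg
```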

Evaluation Datasets: Online Reviews (Rating & Text)

• Yelp Challenge Dataset
• IMDB Corpus of Movie Reviews

Preprocessing:

• Binarize ratings:
- Yelp: 1-2 stars → 0, 3-5 stars → 1
- IMDB: 1-4 stars → 0, 5-10 stars → 1
• Reduce sparsity (cutoff on % non-empty entries):
- Yelp: discard businesses with < 30 reviews and users with < 5 reviews
- IMDB: discard movies with < 5 reviews and users with < 5 reviews
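The rating binarization amounts to a per-dataset threshold; a sketch:

```python
import numpy as np

def binarize_yelp(stars):
    """Yelp rule from the slides: 1-2 stars -> 0, 3-5 stars -> 1."""
    return (np.asarray(stars) >= 3).astype(int)

def binarize_imdb(stars):
    """IMDB rule from the slides: 1-4 stars -> 0, 5-10 stars -> 1."""
    return (np.asarray(stars) >= 5).astype(int)
```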

Model Comparison & Evaluation Results

• Ranking the items in random order
• Matrix Factorization
• Text-only: ranking items according to cos(tu, ti)
• Mult-VAE (Liang et al. 2018)
• VAE with random user-dependent priors
• VAE with Text Regularization: ℒγ = ℒβ − γ ⋅ dist(zu, tu)
• VAE with heterogeneous user-dependent priors (VAE-HPrior)
• Denoising autoencoder, Mult-DAE (Liang et al. 2018)

(std of scores: ~0.003 to ~0.007)

Relative Performance Improvement, Mult-VAE → VAE-HPrior:

• Yelp: +18.4% NDCG@100, +29.4% Recall@20, +17.7% Recall@50
• IMDB: +14.4% NDCG@100, +18.7% Recall@20, +12.3% Recall@50
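Among the baselines, the text-only model ranks items by cos(tu, ti). A minimal sketch, assuming item embeddings ti are built the same way as the user embeddings tu:

```python
import numpy as np

def rank_items_by_cosine(t_u, item_embs):
    """Rank item indices by cosine similarity between t_u and each item embedding."""
    sims = item_embs @ t_u / (
        np.linalg.norm(item_embs, axis=1) * np.linalg.norm(t_u)
    )
    return np.argsort(-sims)  # best-matching items first
```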

Conclusions

• Extend VAEs to Collaborative Filtering with side information (ratings + text)
- User-agnostic → user-dependent priors
- Prior parameters as functions of the users' review text
- User representations in a multimodal latent space (encoding ratings + text)
• Outperform the existing Mult-VAE model (up to 29.41% relative improvement in Recall@20)
• Perform comparably to a denoising autoencoder (Mult-DAE)

Ongoing & Future Work

• Experiments: VAE-HPrior vs. Mult-DAE on different levels of sparsity
• Models: more effective aspect-based methods for extracting user preferences from text reviews
• Data: use extra side information where available (e.g., geolocation)

Thank you! Questions?

Contact: Giannis Karamanolakis, gkaraman@cs.columbia.edu
