uplift-based evaluation and optimization of recommendersl generate a recommendation list l by model...

47
Uplift-based Evaluation and Optimization of Recommenders Masahiro Sato , Janmajay Singh, Sho Takemori, Takashi Sonoda, Qian Zhang, and Tomoko Ohkuma Fuji Xerox Co., Ltd. Sep. 18, 2019.

Upload: others

Post on 28-Jun-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Uplift-based Evaluation and Optimization of Recommenders�

Masahiro Sato, Janmajay Singh, Sho Takemori, Takashi Sonoda, Qian Zhang, and Tomoko Ohkuma Fuji Xerox Co., Ltd. Sep. 18, 2019.

Page 2: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Outline �

l Introduction

l Uplift-based Evaluation

l Uplift-based Optimization

l Experiment

l Summary

Page 3: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Introduction

Page 4: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Motivating Examples�Recommended Items

Page 5: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Motivating Examples�Recommended Items

Purchased Items

Precision 100%! Perfect recommendation?

Page 6: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Motivating Examples�

What if NOT recommended? (counterfactual)

Page 7: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Motivating Examples�

What if NOT recommended? (counterfactual)

Purchased Items

Page 8: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Motivating Examples�Recommended Items

Purchased Items

What if NOT recommended? (counterfactual)

Purchased Items

No Difference!

Page 9: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Motivating Examples�Recommended Items

Page 10: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Motivating Examples�Recommended Items

Purchased Items

Precision 50% Lower than the former.

Page 11: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Motivating Examples�

What if NOT recommended? (counterfactual)

Page 12: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Motivating Examples�

What if NOT recommended? (counterfactual)

Purchased Items

Page 13: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Motivating Examples�Recommended Items

Purchased Items

What if NOT recommended? (counterfactual)

Purchased Items

Sales Increase!

Page 14: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Concept of Uplift�

l Uplift = increase in user actions caused by recommendations

Purchase probability

Uplift (increase by recommen-dation)

Recommendation of this week �

Purchase probability without recommendation

Purchase probability with recommendation

Item A�

Item A�

Page 15: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Focus on Uplift�

l From accuracy-based recommendation To uplift-based recommendation

Uplift

Purchase probability

Uplift-based recommendation

Accuracy-based recommendation

Page 16: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Unobservable Nature�

Both cannot be observed together�

Difficulty in evaluation and optimization�

Uplift (increase by recommen-dation)

Recommendation of this week �

Purchase probability without recommendation

Purchase probability with recommendation

Item A�

Item A�

Page 17: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

l Uplift-based Evaluation <- Causal Inference l Uplift-based Optimization <- Uplift modeling

In this work �

Both cannot be observed together�

Difficulty in evaluation and optimization�

Uplift (increase by recommen-dation)

Recommendation of this week �

Purchase probability without recommendation

Purchase probability with recommendation

Item A�

Item A�

Page 18: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Uplift-based Evaluation

Page 19: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Potential Outcomes and Item Classes�

l Potential outcomes (concept in causal inference): ü  YT in {0,1}: potential outcome if recommended ü  YC in {0,1}: potential outcome if NOT recommended

�purchased�Not

purchased�

Page 20: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Potential Outcomes and Item Classes�

l Potential outcomes (unobservable) : ü  YT in {0,1}: potential outcome if recommended ü  YC in {0,1}: potential outcome if NOT recommended

l Possible item classes (unobservable) : ü  True Uplift (TU). YT = 1 and YC = 0. ü  False Uplift (FU). YT = 1 and YC = 1. ü  True Drop (TD). YT = 0 and YC = 1. ü  False Drop (FD). YT = 0 and YC = 0.

<- want to recommend�

Page 21: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Procedure of Uplift Evaluation�

l  Generate a recommendation list L by model M.

FU � FU � FD � FD � FD � FD � FD � FD �TU � TU �

True Uplift False Uplift True Drop False Drop

FU �

FD �TD �

TU �

Page 22: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Procedure of Uplift Evaluation�

l  Generate a recommendation list L by model M.

l  Check purchase (colored) in offline purchase logs.

FU � FU � FD � FD � FD � FD � FD � FD �TU � TU �

FU � FU � FD � FD � FD � FD � FD � FD �TU � TU �

True Uplift False Uplift True Drop False Drop

FU �

FD �TD �

TU �

Page 23: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Procedure of Uplift Evaluation�

l  Generate a recommendation list L by model M.

l  Check purchase (colored) in offline purchase logs.

l  Split by whether included in offline recommendation logs.

�FU � FU � FD �FD � FD �FD � FD � FD �TU � TU �

FU � FU � FD � FD � FD � FD � FD � FD �TU � TU �

FU � FU � FD � FD � FD � FD � FD � FD �TU � TU �

In offline recommendation logs� Not in recommendation logs�

True Uplift False Uplift True Drop False Drop

FU �

FD �TD �

TU �

Page 24: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Procedure of Uplift Evaluation�

�FU � FU � FD �FD � FD �FD � FD � FD �TU � TU �

l  Calculate the difference of the ratio of purchased items.

�2/5� 1/5�

2/5 – 1/5 = 1/5 �

This equals the ratio of TU items! (2/10 = 1/5)�

True Uplift False Uplift True Drop False Drop

FU �

FD �TD �

TU �

Page 25: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Equation of Uplift Estimates�

FU � FU � FD �FD � FD �FD � FD � FD �TU � TU �

l Simple version of uplift estimates

Average purchase probability with recommendation

Average purchase probability without recommendation

Page 26: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

SNIPS version �

[1] Swaminathan and Joachims. In NIPS ’15.

l Self-normalized inverse propensity scoring (SNIPS) [1]

Propensity score e(X): probability of recommendation conditioned on covariates X.

Page 27: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Common Counterfactual Evaluation�

Similar to previous counterfactual evaluation

[2] Li et al. In WSDM’11. [3] Gilotte et al. In WSDM’18 [4] Gruson et al. In WSDM’19

Page 28: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Our Extension �

Ours includes possibility of purchase without recommendation

Page 29: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Uplift-based Optimization

Page 30: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Observed and Latent Classes�

�l Items are either recommended (R) or not (NR); and

either purchased (P) or not (NP).

CR-P ~ TU or FU CNR-P ~ FU or TD CR-NP ~ FU or TD

CNR-NP ~ TU or FU

True Uplift False Uplift True Drop False Drop

FU �

FD �TD �

TU �

Page 31: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Observed and Latent Classes�

�l Items are either recommended (R) or not (NR); and

either purchased (P) or not (NP).

CR-P ~ TU or FU CNR-P ~ FU or TD CR-NP ~ FU or TD

CNR-NP ~ TU or FU

True Uplift False Uplift True Drop False Drop

FU �

FD �TD �

TU �

CR-P and CNR-NP include TU (True Uplift) items.

Page 32: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Define Training Labels�

�l Items are either recommended (R) or not (NR); and

either purchased (P) or not (NP).

CR-P ~ TU or FU CNR-P ~ FU or TD CR-NP ~ FU or TD

CNR-NP ~ TU or FU

True Uplift False Uplift True Drop False Drop

FU �

FD �TD �

TU �

CR-P and CNR-NP include TU (True Uplift) items.

Analogous to label transformation in uplift modeling

Page 33: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

More about Priorities�

�l Items are either recommended (R) or not (NR); and

either purchased (P) or not (NP).

CR-P ~ TU or FU CNR-P ~ FU or TD CR-NP ~ FU or TD

CNR-NP ~ TU or FU

True Uplift False Uplift True Drop False Drop

FU �

FD �TD �

TU �

Note that we set higher priorities on CR-P, since

Page 34: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Conventional Optimizations�

Point- wise

Pair- wise

RMF (Regularized Matrix Factorization) BPR (Bayesian Personalized Ranking)

CR-P ~ TU or FU CNR-P ~ FU or TD

l Conventional accuracy-based optimization used purchased items as positives.

Posi-tives

True Uplift False Uplift True Drop False Drop

FU �

FD �TD �

TU �

Page 35: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Proposed Optimizations�

�l Our proposed uplift-based optimization used

redefined positive items.

Point- wise

Pair- wise

RMF BPR

ULRMF ULBPR

CR-P ~ TU or FU CNR-P ~ FU or TD

CR-P ~ TU or FU CNR-NP ~ TU or FU

Posi-tives

True Uplift False Uplift True Drop False Drop

FU �

FD �TD �

TU �

Page 36: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Experiment

Page 37: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Research Questions and Used Datasets�

l RQ1: How do our uplift-based recommenders perform compared with other existing methods?

l RQ2: What are the properties of uplift-based optimization?

l RQ3: How do recommended items differ for traditional and uplift-based recommender methods?

Page 38: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Performance Comparison (RQ1)�

l Our ULRMF or ULBPR achieve the best for most cases.

RecResp [5] Sato et al. In UMAP ’16. CausE, CausE-Prod [6] Bonner and Vasile. In RecSys ‘18

Page 39: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Uplift-based Optimization Properties (RQ2)�

l ULRMF & ULBPR converge faster than RMF & BPR. �

Page 40: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Trends of the Recommended Items (RQ3)�

l RMF tends to recommend popular items. l ULRMF recommends items for impulse purchases

(pasta sauce, heat-and-serve meals).

Page 41: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Summary

Page 42: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Summary�

l Conclusions ü  Proposed Uplift-based Evaluation and Optimizations. ü  Uplift improvement over baselines. ü  Investigated characteristics of uplift optimizations.

l Future Work ü  Compare with online A/B experiments. ü  Uplift-based optimizations to neural models.

Page 43: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Appendix

Page 44: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Previous methods targeting uplift�

l Purchase probability prediction with and without recommendations [7,5,6]

l Recommendation based on: (purchase probability if recommended) - (purchase probability if NOT recommended)

l However, models are optimized for purchase prediction, not for uplift -> Direct optimization for uplift can improve uplift!

[7] Bodapati. In JMR ’08. [5] Sato et al. In UMAP ’16. [6] Bonner and Vasile. In RecSys ‘18.

Page 45: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

More on Priorities�

Page 46: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Uplift-based Optimization Properties (RQ2)�

l The optimal α (the probability of regarding NR-NP as positive) is less than 1.

l This supports our claim of treating NR-NP as an intermediate between positive and negative.

Page 47: Uplift-based Evaluation and Optimization of Recommendersl Generate a recommendation list L by model M. l Check purchase (colored) in offline purchase logs. l Split by whether included

Uplift-based Optimization Properties (RQ2)�

l ULBPR outperforms BPR in wide range of data densities.