Estimating the causal impact of recommender systems

Estimating the causal impact of recommendation systems
AMIT SHARMA, JAKE HOFMAN, DUNCAN WATTS
MICROSOFT RESEARCH, NEW YORK
2nd International Conference on Computational Social Science

Upload: amit-sharma

Post on 12-Apr-2017

TRANSCRIPT

Page 1: Estimating the causal impact of recommender systems

Estimating the causal impact of recommendation systems
AMIT SHARMA, JAKE HOFMAN, DUNCAN WATTS
MICROSOFT RESEARCH, NEW YORK

2nd International Conference on Computational Social Science

Page 2: Estimating the causal impact of recommender systems

How much do they change user behavior?

Page 3: Estimating the causal impact of recommender systems

Naively, up to 30% of traffic comes from recommendations

Page 4: Estimating the causal impact of recommender systems

Naively, up to 30% of traffic comes from recommendations

“Burton Snowboard, a sports retailer, reported that personalized product recommendations have driven nearly 25% of total sales since it began offering them in 2008. Prior to this, Burton’s customer recommendations consisted of items from its list of top-selling products.”

Page 5: Estimating the causal impact of recommender systems

This is almost surely an overestimate of the actual effect, because demand for a product is correlated with demand for its recommendations.

Page 6: Estimating the causal impact of recommender systems

Example: product browsing on Amazon.com

Page 7: Estimating the causal impact of recommender systems

Example: product browsing on Amazon.com

Page 8: Estimating the causal impact of recommender systems

Example: product browsing on Amazon.com

Page 9: Estimating the causal impact of recommender systems

Counterfactual browsing: no recommendations

Page 10: Estimating the causal impact of recommender systems

Counterfactual browsing: no recommendations

Page 11: Estimating the causal impact of recommender systems

Problem: Correlated demand may drive page visits, even without recommendations

Page 12: Estimating the causal impact of recommender systems

The problem of correlated demand

[Diagram nodes: Demand for winter accessories; Visits to winter hat; Rec. visits to winter gloves]

Page 13: Estimating the causal impact of recommender systems

Goal: Estimate the extra activity caused by a recommender system that would not have happened otherwise

[Figure: two bars. OBSERVED CLICK-THROUGHS: Causal + Convenience. WITHOUT RECOMMENDER: Convenience + ?]

Page 14: Estimating the causal impact of recommender systems

Ideal experiment: A/B Test

Treatment (A) vs. Control (B)

But experiments:
• may be costly
• hamper user experience
• require full access to the system

Page 15: Estimating the causal impact of recommender systems

Experiments may be costly or infeasible.

Can we derive an observational method to identify the causal effect of recommendations?

Page 16: Estimating the causal impact of recommender systems

Using natural variations to simulate an experiment

Page 17: Estimating the causal impact of recommender systems

Studying sudden spikes, or “shocks”, in demand for a book

[Carmi et al. 2012]

Page 18: Estimating the causal impact of recommender systems

The same author’s recommended book may also have a shock

Page 19: Estimating the causal impact of recommender systems

Past work: Controlling for correlated demand

Past work uses statistical models to control for confounds:

• Carmi et al. [2012], Oestreicher-Singer and Sundararajan [2012] and Lin [2013] construct “complementary sets” of similar, non-recommended products.
• Garfinkel et al. [2006] and Broder et al. [2015] compare to model-predicted clicks without recommendations.

But:
1. These assumptions are hard to verify.
2. Finding examples of valid shocks requires ingenuity and restricts researchers to very specific categories.

Page 20: Estimating the causal impact of recommender systems

Shock-IV: A simpler, more robust method for estimating causal impact.

Page 21: Estimating the causal impact of recommender systems

Distinguishing between recommendation and direct traffic

All visits to a product
• Recommender visits
• Direct visits (proxy for unobserved demand)
  ◦ Search visits
  ◦ Direct browsing

Page 22: Estimating the causal impact of recommender systems

The Shock-IV strategy: Searching for valid shocks

Page 23: Estimating the causal impact of recommender systems

The Shock-IV strategy: Filtering out invalid shocks

Page 24: Estimating the causal impact of recommender systems

Search for products that receive a sudden shock in their traffic while direct traffic to their recommended products remains constant.

Page 25: Estimating the causal impact of recommender systems

Why does it work? Shock as an instrumental variable

[Diagram nodes: Sudden Shock; Demand; Focal visits (X); Rec. visits (Y); Direct visits (Y)]

Page 26: Estimating the causal impact of recommender systems

Computing the causal estimate

Causal CTR (ρ) = Δr/Δv*

• Δr: increase in recommendation clicks
• Δv: increase in visits to focal product

*Same as the Wald estimator for instrumental variables
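As a minimal sketch of the ratio on this slide, the function below computes the causal CTR from visit and click counts just before and during a shock. The function name, argument names, and the example counts are illustrative assumptions, not taken from the talk.

```python
def causal_ctr(visits_before, visits_shock, rec_clicks_before, rec_clicks_shock):
    """Ratio estimate rho = delta_r / delta_v for one focal/recommended pair.

    All four arguments are hypothetical daily counts: visits to the focal
    product and click-throughs to one recommended product, measured on a
    baseline day and on the shock day.
    """
    delta_v = visits_shock - visits_before          # increase in focal visits
    delta_r = rec_clicks_shock - rec_clicks_before  # increase in rec. clicks
    if delta_v <= 0:
        raise ValueError("a shock requires an increase in focal visits")
    return delta_r / delta_v


# Made-up numbers: focal visits rise from 100 to 600 while
# recommendation clicks rise from 20 to 35.
rho = causal_ctr(100, 600, 20, 35)
print(f"causal CTR = {rho:.3f}")
```

The estimate is only meaningful when the shock does not also move direct demand for the recommended product, which is the exclusion restriction the talk tests separately.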

Page 27: Estimating the causal impact of recommender systems

The shock-IV strategy: In equations

At any time t:

When product i experiences a shock in page visits:

(Because constant d_j implies constant convenience visits)
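The equations on this slide did not survive in the transcript. The derivation below is a sketch of the argument the surrounding text implies, not the authors' exact notation: the convenience term c_{ijt} and the assumption that ρ_{ij} stays fixed across the shock are introduced here for illustration, while r, v, ρ and d_j follow the symbols used elsewhere in the talk.

```latex
% Click-throughs from focal product i to recommended product j at time t,
% split into a causal part and a convenience part (c is assumed notation):
r_{ijt} = \rho_{ij}\, v_{it} + c_{ijt}

% Differencing across a shock to product i at time t^*:
\Delta r_{ij} = \rho_{ij}\, \Delta v_{i} + \Delta c_{ij}

% If direct visits d_j to the recommended product stay constant,
% convenience visits are constant too, so \Delta c_{ij} = 0 and
\rho_{ij} = \frac{\Delta r_{ij}}{\Delta v_{i}}
```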

Page 28: Estimating the causal impact of recommender systems

Application to Amazon.com, using Bing toolbar logs

Anonymized browsing logs:

• 23 million pageviews

• 1.3 million Amazon products

• 2 million Bing Toolbar users

Sept 2013-May 2014

Page 29: Estimating the causal impact of recommender systems

Recreating sequence of page visits by a user

Search page → Focal product page → Recommended product page

Page 30: Estimating the causal impact of recommender systems

Recreating sequence of page visits by a user

Timestamp             URL
2014-01-20 09:04:10   http://www.amazon.com/s/ref=nb_sb_noss_1?field-keywords=George%20saunders
2014-01-20 09:04:15   http://www.amazon.com/dp/0812984250/ref=sr_1_1
2014-01-20 09:05:01   http://www.amazon.com/dp/1573225797/ref=pd_sim_b_2

Page 31: Estimating the causal impact of recommender systems

Recreating sequence of page visits by a user

Timestamp             URL
2014-01-20 09:04:10   http://www.amazon.com/s/ref=nb_sb_noss_1?field-keywords=George%20saunders
    User searches for George Saunders
2014-01-20 09:04:15   http://www.amazon.com/dp/0812984250/ref=sr_1_1
    User clicks on the first search result
2014-01-20 09:05:01   http://www.amazon.com/dp/1573225797/ref=pd_sim_b_2
    User clicks on the second recommendation
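One way to turn logged URLs like these into visit types is sketched below. The rules are assumptions inferred only from the three example URLs: a path starting with `/s/` is a search page, a `ref=sr_…` tag marks a click from search results, and a `ref=pd_sim…` tag marks a click on a recommendation. The 10-character product ID pattern and the labels returned are also illustrative choices, not the authors' pipeline.

```python
import re
from urllib.parse import urlparse, parse_qs

# Assumed patterns, based only on the example URLs in the slide.
PRODUCT_RE = re.compile(r"/dp/([A-Z0-9]{10})")
REF_RE = re.compile(r"/ref=([A-Za-z0-9_]+)")


def classify(url):
    """Return (visit_type, detail) for one logged Amazon URL."""
    parsed = urlparse(url)
    if parsed.path.startswith("/s/"):
        # Search page: the detail is the decoded search query.
        query = parse_qs(parsed.query).get("field-keywords", [""])[0]
        return ("search", query)
    product = PRODUCT_RE.search(parsed.path)
    if product is None:
        return ("other", "")
    ref = REF_RE.search(parsed.path)
    ref_tag = ref.group(1) if ref else ""
    if ref_tag.startswith("pd_sim"):
        return ("recommendation", product.group(1))
    if ref_tag.startswith("sr_"):
        return ("search_click", product.group(1))
    return ("direct", product.group(1))


log = [
    "http://www.amazon.com/s/ref=nb_sb_noss_1?field-keywords=George%20saunders",
    "http://www.amazon.com/dp/0812984250/ref=sr_1_1",
    "http://www.amazon.com/dp/1573225797/ref=pd_sim_b_2",
]
for url in log:
    print(classify(url))
```

Under these rules, the second and third visits would count as direct (search-driven) and recommender traffic respectively, which is the split the method relies on.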

Page 32: Estimating the causal impact of recommender systems

I. Traffic shows weekly and seasonal patterns, nearly tripling during the holidays

Page 33: Estimating the causal impact of recommender systems

II. 30% of all pageviews come through recommendations

Page 34: Estimating the causal impact of recommender systems

III. Books and eBooks are the most popular categories by far

Page 35: Estimating the causal impact of recommender systems

IV. Apparel and shoes see a substantially higher fraction of visits through recommendations

Page 36: Estimating the causal impact of recommender systems

Shock-IV: Finding shocks in user visit data

We look for focal products with large and sudden increases in views relative to typical traffic.

Size of shock exceeds:
◦ 5 times median traffic
◦ 5 times the previous day's traffic and 5 times the mean of the last 7 days

Shocked product has:
◦ Visits from at least 10 unique users during the shock
◦ Non-zero visits for at least five out of seven days before and after the shock
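The criteria above can be sketched as a filter over one product's daily series. This is an illustration under stated assumptions, not the authors' code: "size of shock" is taken to be the visit count on the candidate day, the median is computed over the whole series, and the function and parameter names are hypothetical.

```python
from statistics import mean, median


def find_shock_days(visits, users, factor=5, min_users=10, window=7, min_active=5):
    """Return indices of days that satisfy the shock criteria.

    visits: daily page visits for one focal product.
    users:  daily unique-visitor counts for the same product.
    """
    typical = median(visits)  # assumption: median over the full series
    shocks = []
    # Only consider days with a full window of history and follow-up.
    for t in range(window, len(visits) - window):
        before = visits[t - window:t]
        after = visits[t + 1:t + 1 + window]
        is_large = (
            visits[t] > factor * typical
            and visits[t] > factor * visits[t - 1]
            and visits[t] > factor * mean(before)
        )
        # Non-zero visits on at least min_active of the days on each side.
        is_active = (
            sum(v > 0 for v in before) >= min_active
            and sum(v > 0 for v in after) >= min_active
        )
        if is_large and is_active and users[t] >= min_users:
            shocks.append(t)
    return shocks


# Made-up series: steady traffic with one spike on day 7.
visits = [2, 3, 2, 0, 3, 2, 2, 40, 3, 2, 2, 0, 3, 2, 2]
users = [2] * 7 + [30] + [2] * 7
print(find_shock_days(visits, users))
```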

Page 37: Estimating the causal impact of recommender systems

Shock-IV: Ensuring exclusion restriction

Recommended product (Y) should have constant direct visits during the time of the shock.

(1-β): Ratio of the maximum 14-day variation in visits to a recommended product to the size of the shock for the focal product.

• β = 1: Direct traffic to Y is stable relative to the shock to the focal product.
• β = 0: Direct traffic to Y varies at least as much as the shock to the focal product.
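A rough sketch of the β score follows. The slide does not say how "maximum variation" is measured, so this version assumes it is the range (max minus min) of direct visits to the recommended product over a 14-day window, and it clips β at 0 to match the slide's lower bound. Both choices, and all names and numbers, are assumptions.

```python
def exclusion_beta(direct_visits_14d, shock_size):
    """beta = 1 - (variation in direct visits to Y) / (shock size for X).

    direct_visits_14d: daily direct visits to the recommended product over
                       the 14 days around the shock (assumed window).
    shock_size:        size of the shock to the focal product, in visits.
    """
    if shock_size <= 0:
        raise ValueError("shock_size must be positive")
    variation = max(direct_visits_14d) - min(direct_visits_14d)  # assumed measure
    return max(0.0, 1.0 - variation / shock_size)


# Made-up example: direct visits to Y range between 9 and 12
# while the focal product's shock is 40 visits.
beta = exclusion_beta([10, 12, 11, 9, 10, 11, 10, 12, 9, 10, 11, 10, 12, 11], 40)
print(f"beta = {beta:.3f}")
```

A threshold on β (the talk reports results at β = 0.7) would then decide which shocks are accepted as natural experiments.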

Page 38: Estimating the causal impact of recommender systems

How to choose β

[Figure: two panels, each plotting focal product visits against recommended product direct visits. Panel labels: Accept, Reject, Select.]

Page 39: Estimating the causal impact of recommender systems

Using this method, we obtain more than 4,000 natural experiments!

These cover 20% of all products that had at least 10 visits on any single day.

Page 40: Estimating the causal impact of recommender systems

Estimating the causal click-through rate (ρ)

ρ = Δr_{xyt*} / Δv_{xt*}

At β = 0.7, the causal CTR is 3%.

Page 41: Estimating the causal impact of recommender systems

Causal click-through rate by product category

Page 42: Estimating the causal impact of recommender systems

What fraction of the observed click-throughs are causal?

Page 43: Estimating the causal impact of recommender systems

Estimating fraction of observed click-throughs that are causal

Compare the number of estimated causal clicks to all observed recommendation clicks (non-shock period).

λ = ρ_{xy} · v_{xt} / r_{xyt}
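The ratio can be sketched in a few lines. The function name and the counts in the example are hypothetical; only the formula comes from the slide.

```python
def causal_fraction(rho, focal_visits, observed_rec_clicks):
    """lambda = rho * v / r: share of observed recommendation clicks
    that the causal CTR accounts for during a non-shock period."""
    if observed_rec_clicks <= 0:
        raise ValueError("observed_rec_clicks must be positive")
    return rho * focal_visits / observed_rec_clicks


# Hypothetical counts: with rho = 0.03, 1000 focal visits imply about 30
# causal clicks; if 120 recommendation clicks were observed, lambda is 0.25.
lam = causal_fraction(0.03, 1000, 120)
print(f"lambda = {lam:.2f}")
```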

Page 44: Estimating the causal impact of recommender systems

Only a quarter of the observed click-throughs are causal

At β = 0.7, only 25% of recommendation traffic is caused by the recommender.

Page 45: Estimating the causal impact of recommender systems

Generalization? Shocks may be due to discounts or sales

Lower CTR may be due to the holiday season

Page 46: Estimating the causal impact of recommender systems

Local average treatment effect (LATE), not fully generalizable

Shocked products are not a representative sample of all products, nor are the users who visit them.

• Shock-IV method covers roughly one-fifth of all products with at least 10 visits on any single day.
• Our results are robust to sale or holiday effects.
• Causal estimates are consistent with experimental findings (e.g., Belluf et al. [2012]).

Page 47: Estimating the causal impact of recommender systems

More generally…

A robust, scalable method for causal inference.
◦ Causal CTR for Amazon’s recommender system is much lower than the naïve observational CTR.
◦ Can be applied to other domains, such as online ads.

Data mining for instruments
I. Allows us to study a much larger sample of natural experiments, while being able to test the exclusion restriction directly.
II. Can be used for finding potential instruments.

Page 48: Estimating the causal impact of recommender systems

Thank you!

AMIT SHARMA
MICROSOFT RESEARCH
http://www.amitsharma.in

Sharma, A., Hofman, J. M., & Watts, D. J. (2015). Estimating the causal impact of recommendation systems from observational data. In Proceedings of the Sixteenth ACM Conference on Economics and Computation.

Shock-IV: A robust, scalable method for estimating causal impact from observational data, with testable assumptions.

Naïve observational estimates of CTR for recommendation systems may be substantial overestimates.