Causal data mining: Identifying causal effects at scale


Page 1: Causal data mining: Identifying causal effects at scale

1

Causal data mining: Identifying causal effects at scale
AMIT SHARMA, Postdoctoral Researcher, Microsoft Research New York
http://www.amitsharma.in | @amt_shrma

Page 2: Causal data mining: Identifying causal effects at scale

2

A tale of two questions

Q1: How much activity comes from the recommendation system?

Q2: How much activity comes because of the recommendation system?

Page 3: Causal data mining: Identifying causal effects at scale

3

How much activity comes because of the recommendation system?

A causal question.

With recommender: the real world (observed).

Without recommender: the counterfactual world (unobserved).

Page 4: Causal data mining: Identifying causal effects at scale

Understanding causal relationships from data:
1. Modeling user behavior
2. Evaluating systems

Page 5: Causal data mining: Identifying causal effects at scale

Distinguishing between personal preference and homophily in online activity feeds. Sharma and Cosley (2016).
Studying and modeling the effect of social explanations in recommender systems. Sharma and Cosley (2013).

[Figure: example social explanation for SOME MUSICAL ARTIST: "Amit and Dan like this."]

2. Evaluating and improving systems

Understanding causal relationships from data

Page 6: Causal data mining: Identifying causal effects at scale

Understanding causal relationships from data

Averaging Gone Wrong: Using Time-Aware Analyses to Better Understand Behavior. Barbosa, Cosley, Sharma, Cesar (2016).
Auditing search engines for differential satisfaction across demographics. Mehrotra, Anderson, Diaz, Sharma, Wallach (2016).

Page 7: Causal data mining: Identifying causal effects at scale

7

A core problem across the sciences

[Figure: another activity-feed example: "Jake and Duncan like this."]

Understanding causal relationships from data:
- Code profiling, static analysis [Berger et al.]
- Debugging machine learning [Chakarov et al.]
- Decision-making in robotics

Page 8: Causal data mining: Identifying causal effects at scale

8

Why is it hard?

"Without" can mean without a recommender system algorithm, without a code change, or without a social policy or medical treatment.

We observe data from the real world, but no data from the counterfactual world.

Without a randomized experiment, the causal effect is hard to estimate.

Page 9: Causal data mining: Identifying causal effects at scale

9

Difference between prediction and causation

[Causal graph: Cause (X) → Outcome (Y), with Unobserved Confounders (U) affecting both.]

$y = f(x, u)$

Hofman, Sharma, and Watts (2017). Science, 355.6324.

Page 10: Causal data mining: Identifying causal effects at scale

10

Prediction: $y = k(x) + \epsilon$. Given data $\langle X, Y \rangle$, estimate $\hat{k}$.

Causation: Given data $\langle X, Y \rangle$, estimate $\hat{f}$ in $y = f(x, u)$, where the confounders $u$ are unobserved.

Hofman, Sharma, and Watts (2017). Science, 355.6324.

Page 11: Causal data mining: Identifying causal effects at scale

11

Research goal

How can we use large-scale data to infer causal estimates?

Use algorithms to find experiment-like data: quasi ("natural") experiments.

Page 12: Causal data mining: Identifying causal effects at scale

12

Prediction: $y = \beta x + \epsilon$. Given $\langle X, Y \rangle$, estimate $\hat{\beta}$.

Causation: Given $\langle X, Y \rangle$ plus a natural experiment, estimate the causal $\hat{\beta}$.

Page 13: Causal data mining: Identifying causal effects at scale

Combine Pearl's causal graphical model framework with natural experiments.

Page 14: Causal data mining: Identifying causal effects at scale

14

Inverting the natural experiment paradigm

Traditional approach: hypothesize about a natural variation, then argue why it resembles a randomized experiment.

Inverted approach: start from observational data $\langle X, Y \rangle$, develop tests for the validity of a natural variation, then mine for data subsets with such valid variations.

Page 15: Causal data mining: Identifying causal effects at scale

15

[Figure: mining observational data $\langle X, Y \rangle$ yields many natural experiments, in contrast to the single, hand-crafted natural experiments researchers have relied on since the 1850s.]

Page 16: Causal data mining: Identifying causal effects at scale

16

Data mining for causal inference: mine $\langle X, Y \rangle$ data for natural experiments.

1. Split-door criterion: causal effect of recommender systems.
2. Bayesian natural experiment test: validate past economics studies.

Page 17: Causal data mining: Identifying causal effects at scale

17

Part 0: Traditional causal inference using a natural experiment

Page 18: Causal data mining: Identifying causal effects at scale

18

1854: London was having a devastating cholera outbreak

Page 19: Causal data mining: Identifying causal effects at scale

19

Causal question: What is causing cholera?

Air-borne: spreads through air ("miasma").
Water-borne: spreads through contaminated water.

Page 20: Causal data mining: Identifying causal effects at scale

[Competing causal graphs: Polluted Air → Cholera Diagnosis versus Contaminated Water → Cholera Diagnosis, with Neighborhood as a common background factor.]

Page 21: Causal data mining: Identifying causal effects at scale

21

Enter John Snow. He found higher cholera deaths near a water pump, but this could have been merely correlational.

Page 22: Causal data mining: Identifying causal effects at scale

22

New idea: two major water companies served London, one drawing water upstream (Lambeth Water Company) and one downstream (S&V Water Company).

Page 23: Causal data mining: Identifying causal effects at scale

23

No difference in neighborhood, yet an 8-fold increase in cholera among customers of the downstream company (S&V) compared to Lambeth.

Page 24: Causal data mining: Identifying causal effects at scale

24

Led to a change in belief about cholera’s cause.

Page 25: Causal data mining: Identifying causal effects at scale

25

Why was Snow's study so convincing?

- Choice of water company cannot cause cholera.
- Choice of water company was not related to people's neighborhood or its air quality.
- People receiving water from the two companies were interspersed within neighborhoods.

Page 26: Causal data mining: Identifying causal effects at scale

26

Probably the first application of cause-effect principles.

Exclusion: choice of water company cannot cause cholera.
As-if-random: choice of water company is not related to neighborhood.

[Causal graph: Water Company → Contaminated Water → Cholera Diagnosis, with Neighborhood as an unobserved confounder of Contaminated Water and Cholera Diagnosis.]

Page 27: Causal data mining: Identifying causal effects at scale

27

Contaminated Water (X)

Cholera Diagnosis

(Y)

Other factors [e.g.

neighborhood] (U)

Water Compan

y(Z)

As-If-Random

Exclusion

Two assumptions central to causal inference: Exclusion and As-if-random

Page 28: Causal data mining: Identifying causal effects at scale

28

Two assumptions central to causal inference:
Exclusion: $(Z \perp Y \mid X, U)$, the new variable Z affects the outcome only through the cause.
As-if-random: $(Z \perp U)$, Z is independent of the unobserved confounders.

[Causal graph: New variable (Z) → Cause (X) → Outcome (Y), with Unobserved Confounders (U) affecting both X and Y.]
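As a small illustration of what these two assumptions buy (my sketch with an assumed linear data-generating process, not an example from the talk), the Wald ratio cov(Z, Y)/cov(Z, X) recovers the causal effect even though a plain regression of Y on X is confounded; all parameter values below are arbitrary.

```python
import numpy as np

# Sketch with an assumed linear data-generating process: Z satisfies Exclusion
# (it affects Y only through X) and As-if-random (it is independent of U).
rng = np.random.default_rng(1)
n = 200_000
beta = 0.5                                     # true causal effect (assumed)

u = rng.normal(size=n)                         # unobserved confounder
z = rng.normal(size=n)                         # instrument, drawn independently of u
x = z + u + rng.normal(size=n)                 # both z and u move x
y = beta * x + 2.0 * u + rng.normal(size=n)    # z enters y only through x

ols = np.cov(x, y)[0, 1] / np.var(x)           # confounded regression estimate
wald = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1] # instrumental-variable (Wald) ratio
print(f"OLS: {ols:.2f}   IV: {wald:.2f}   truth: {beta}")
```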

Page 29: Causal data mining: Identifying causal effects at scale

1930s: Fisher introduces the randomized experiment.

Since then, these assumptions have formed the core of causal inference.

29

[Causal graph: Randomized Assignment (Z) → Cause (X) → Outcome (Y), with Unobserved Confounders (U).]

Exclusion: randomized assignment should not affect the outcome except through the cause.
As-if-random: randomized assignment should be independent of unobserved confounders.

Page 30: Causal data mining: Identifying causal effects at scale

30

Z is now a special observed variable, called an instrumental variable.

All studies using observational data also need to satisfy these two assumptions.

[Causal graph: Instrumental Variable (Z) → Cause (X) → Outcome (Y), with Unobserved Confounders (U).]

Page 31: Causal data mining: Identifying causal effects at scale

But Exclusion and As-if-random are hard to establish, because of unobserved confounders.

Page 32: Causal data mining: Identifying causal effects at scale

32

More formally: starting from the full dataset $\langle X, Y \rangle$, find subsets of the data that behave like experiments, such that
As-if-random: $(Z \perp U)$, and
Exclusion: $(Z \perp Y \mid X, U)$.

Both conditions are hard to verify from observed data.

Page 33: Causal data mining: Identifying causal effects at scale

33

Current methods haven't changed much from those used by John Snow in the 1850s: researchers use rhetorical arguments to justify an instrumental variable.

1. Manually finding an instrumental variable restricts researchers to single-source events (e.g., weather or lotteries).
2. There is still no guarantee that either Exclusion or As-if-random is satisfied.

Page 34: Causal data mining: Identifying causal effects at scale

34

Causal data mining: Inverting the natural experiment paradigm

Traditional approach: hypothesize about a natural variation, then argue why it resembles a randomized experiment.

Inverted approach: start from observational data $\langle X, Y \rangle$, develop tests for the validity of a natural variation, then mine for data subsets with such valid variations.

Page 35: Causal data mining: Identifying causal effects at scale

35

Part I: Split-door criterion for causal identification

Page 36: Causal data mining: Identifying causal effects at scale

36

Intuition: What if we can observe an auxiliary outcome that is unaffected by the causal variable?

[Causal graph: Cause → Outcome; Unobserved Confounders affect the Cause, the Outcome, and the Auxiliary Outcome; the Cause does not affect the Auxiliary Outcome.]

The outcome can be separated into two observable parts:
i) Primary outcome: (possibly) affected by the cause.
ii) Auxiliary outcome: unaffected by the cause.

Page 37: Causal data mining: Identifying causal effects at scale

37

Intuition: What if we can observe an auxiliary outcome that is unaffected by the causal variable?

[Causal graph as above, with the Outcome relabeled as the Primary Outcome.]


Page 40: Causal data mining: Identifying causal effects at scale

40

Simplest case: Outcome can be separated into two observable parts

i) Primary outcome: (possibly) affected by cause

ii) Auxiliary outcome: unaffected by cause

Page 41: Causal data mining: Identifying causal effects at scale

41

Such outcome data is commonly available in digital systems: recommender systems, ad systems, app notifications, and any content website (such as news).

Let’s take a concrete example: recommender systems

Page 42: Causal data mining: Identifying causal effects at scale

42

Can we find such an auxiliary outcome?

Page 43: Causal data mining: Identifying causal effects at scale

43

Example: Estimating the causal impact of a recommender system (novel recommendations)

Page 44: Causal data mining: Identifying causal effects at scale

44

How much activity comes from the recommendation system?

30% of product page visits.

30% of groups joined.

80% of movies watched.

Sharma and Yan (2013); Sharma, Hofman, and Watts (2015); Gomez-Uribe and Hunt (2015)

Page 45: Causal data mining: Identifying causal effects at scale

Confounding: observed click-throughs may be due to correlated demand.

45

[Causal graph: Demand for The Road → Visits to The Road → Rec. visits to No Country for Old Men ← Demand for No Country for Old Men, where the two demands are linked by correlated demand for Cormac McCarthy.]

Page 46: Causal data mining: Identifying causal effects at scale

46

Observed activity is almost surely an overestimate of the causal effect.

[Figure: of all page visits, the observed activity from the recommender splits into a causal part and a convenience part; the activity that would occur without the recommender is unknown.]

Page 47: Causal data mining: Identifying causal effects at scale

47

Counterfactual thought experiment: What would have happened without recommendations?

Page 48: Causal data mining: Identifying causal effects at scale

48

Hypothetical experiment: a randomized A/B test with a treatment group (A) and a control group (B).

But such experiments can be costly. Can we develop an offline metric?

Page 49: Causal data mining: Identifying causal effects at scale

49

Past work: a traditional instrumental variable. Carmi et al. (2012).

[Causal graph: Instrument → Demand for Cormac McCarthy → Visits to The Road → Rec. visits to No Country for Old Men.]

Page 50: Causal data mining: Identifying causal effects at scale

Data mining approach (Shock-IV): finding valid shocks across product categories.

50

Example shock: a spike in demand for a product due to Oprah.

Given $\langle X, Y \rangle$: argue why a shock resembles a randomized experiment, develop tests for the validity of a shock, and mine for shocks in observational data.

Page 51: Causal data mining: Identifying causal effects at scale

Finding an auxiliary outcome: split the outcome into recommender visits (primary) and direct visits (auxiliary).

51

All visits to a recommended product = recommender visits + direct visits (search visits and direct browsing).

Auxiliary outcome: a proxy for unobserved demand.

Page 52: Causal data mining: Identifying causal effects at scale

52

Causal graphical model for the effect of a recommendation system

[Causal graph: Demand for focal product (U_X) → Visits to focal product (X) → Rec. visits (Y_R); Demand for rec. product (U_Y) → Rec. visits (Y_R) and Direct visits (Y_D); U_X and U_Y may be correlated.]

Page 53: Causal data mining: Identifying causal effects at scale

53

1a. Search for any product with a shock to page visits.

Page 54: Causal data mining: Identifying causal effects at scale

1b. Filtering out invalid natural experiments

54

Page 55: Causal data mining: Identifying causal effects at scale

55

The "split-door" criterion: test whether the auxiliary outcome is independent of the cause. Criterion: $X \perp Y_D$.

This provides a data-driven check of the Exclusion-style assumption.

[Causal graph: Demand for focal product (U_X) → Visits to focal product (X) → Rec. visits (Y_R); Demand for rec. product (U_Y) → Rec. visits (Y_R) and Direct visits (Y_D).]

Page 56: Causal data mining: Identifying causal effects at scale

56

More formally, why does it work?

Theorem 1: Barring incidental equality of parameters, statistical independence of $X$ and $Y_D$ guarantees unconfoundedness between $X$ and $Y_R$.
Proof: follows from properties of causal graphical models and Pearl's do-calculus [Pearl 2009].

[Causal graph: Unobserved variables (U_X) → Cause (X) → Outcome (Y_R); Unobserved variables (U_Y) → Outcome (Y_R) and Auxiliary Outcome (Y_D).]

Page 57: Causal data mining: Identifying causal effects at scale

57

Example: assuming a linear model.

Theorem 1a: Under a linear model, whenever $X \perp Y_D$ holds, an unbiased estimate of the causal effect of $X$ on $Y_R$ can be obtained from the observed data.

(Here $X$ is the treatment, $Y_R$ the outcome, and $U_X$, $U_Y$ the unobserved confounders.)
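A small simulation sketch of this idea, under assumed linear structural equations rather than the paper's exact model: when demand is not correlated, X is independent of Y_D and the naive regression of Y_R on X recovers the causal effect; when demand is correlated, the split-door check fails and the same regression is biased. Variable names mirror the graph above; the numbers are arbitrary.

```python
import numpy as np

# Assumed linear structural equations mirroring the graph on the previous slides.
rng = np.random.default_rng(2)
n = 100_000
rho = 0.3                                          # true causal effect of X on Y_R (assumed)

def simulate(correlated_demand):
    u_x = rng.normal(size=n)                       # demand for the focal product
    u_y = u_x if correlated_demand else rng.normal(size=n)  # demand for the rec. product
    x = u_x + rng.normal(size=n)                   # visits to the focal product
    y_r = rho * x + u_y + rng.normal(size=n)       # recommendation click-throughs
    y_d = u_y + rng.normal(size=n)                 # direct visits (auxiliary outcome)
    return x, y_r, y_d

for correlated in (True, False):
    x, y_r, y_d = simulate(correlated)
    estimate = np.cov(x, y_r)[0, 1] / np.var(x)    # naive regression of Y_R on X
    check = np.corrcoef(x, y_d)[0, 1]              # split-door check: corr(X, Y_D)
    print(f"corr(X, Y_D) = {check:+.2f} -> estimated effect {estimate:.2f} (truth {rho})")
```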

Page 58: Causal data mining: Identifying causal effects at scale

58

Relationship to the instrumental variable technique: both utilize naturally occurring variation in the data.

Instrumental variable: assumptions are Exclusion and As-if-random.
Split-door criterion: an independence test is used to find natural experiments; the only assumption is that the auxiliary outcome is affected by the causes of the primary outcome.

By testing whether the treatment is independent of the auxiliary outcome, the split-door criterion requires a weaker dependence assumption for validity.

Page 59: Causal data mining: Identifying causal effects at scale

59

By testing whether the treatment is independent of the auxiliary outcome, the split-door criterion requires a weaker dependence assumption for validity.

[Figure: side-by-side causal graphs. Instrumental variable: Treatment → Outcome with Unobserved Confounders, where Exclusion must be assumed. Split-door criterion: Treatment → Outcome with Unobserved Confounders also driving an Auxiliary Outcome.]

Page 60: Causal data mining: Identifying causal effects at scale

Data from Amazon.com, collected via the Bing toolbar. Anonymized browsing logs (Sept 2013-May 2014):
- 23 M pageviews
- 2 M Bing Toolbar users
- 1.3 M Amazon products, of which 20 K products have at least 10 visits on any one day.
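A sketch of the kind of preprocessing this implies, using pandas; the file name and column names (timestamp, product_id) are hypothetical, since the talk does not describe the log schema.

```python
import pandas as pd

# Hypothetical log schema: one row per pageview with a timestamp and product_id.
logs = pd.read_csv("amazon_pageviews.csv", parse_dates=["timestamp"])

# Keep products with at least 10 visits on at least one day, mirroring the
# "20 K products with >= 10 visits on any one day" filter described above.
daily = (logs.assign(day=logs["timestamp"].dt.date)
             .groupby(["product_id", "day"]).size()
             .rename("visits").reset_index())
active = daily.loc[daily["visits"] >= 10, "product_id"].unique()
logs = logs[logs["product_id"].isin(active)]
print(f"{len(active)} products pass the activity filter")
```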

Page 61: Causal data mining: Identifying causal effects at scale

61

Constructed the sequence of visits for each user: search page → focal product page → recommended product page.

Page 62: Causal data mining: Identifying causal effects at scale

62

Recreating the sequence of visits from log data:

Timestamp             URL
2014-01-20 09:04:10   http://www.amazon.com/s/ref=nb_sb_noss_1?field-keywords=Cormac%20McCarthy
2014-01-20 09:04:15   http://www.amazon.com/dp/0812984250/ref=sr_1_2
2014-01-20 09:05:01   http://www.amazon.com/dp/1573225797/ref=pd_sim_b_1

Page 63: Causal data mining: Identifying causal effects at scale

63

Recreating the sequence of visits from log data:

2014-01-20 09:04:10   http://www.amazon.com/s/ref=nb_sb_noss_1?field-keywords=Cormac%20McCarthy   (user searches for Cormac McCarthy)
2014-01-20 09:04:15   http://www.amazon.com/dp/0812984250/ref=sr_1_2                               (user clicks on the second search result)
2014-01-20 09:05:01   http://www.amazon.com/dp/1573225797/ref=pd_sim_b_1                           (user clicks on the first recommendation)
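One way to turn such log rows into visit types is to key off the ref= tag in the URL, as in the sketch below. Only the three tag patterns visible in the excerpt above (nb_sb_noss on a search page, sr_* for search results, pd_sim_* for recommendations) are grounded in the slides; the rest of the mapping and the category names are my assumptions.

```python
import re
from urllib.parse import urlparse

# Classify each pageview from its URL, based on the ref= patterns above.
def classify_visit(url: str) -> str:
    if urlparse(url).path.startswith("/s/") or "field-keywords" in url:
        return "search_page"                      # a search-results page
    match = re.search(r"/ref=([^/?]+)", url)
    tag = match.group(1) if match else ""
    if tag.startswith("sr_"):
        return "search_click"                     # product reached from search results
    if tag.startswith("pd_"):
        return "recommendation_click"             # product reached from a recommendation
    return "direct_visit"                         # everything else: direct browsing

urls = [
    "http://www.amazon.com/s/ref=nb_sb_noss_1?field-keywords=Cormac%20McCarthy",
    "http://www.amazon.com/dp/0812984250/ref=sr_1_2",
    "http://www.amazon.com/dp/1573225797/ref=pd_sim_b_1",
]
for u in urls:
    print(f"{classify_visit(u):22s} <- {u}")
```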

Page 64: Causal data mining: Identifying causal effects at scale

I. Weekly and seasonal patterns in traffic, nearly tripling during the holidays.

Page 65: Causal data mining: Identifying causal effects at scale

65

II. 30% of pageviews come from recommendations

Page 66: Causal data mining: Identifying causal effects at scale

III. Books and eBooks are the most popular categories by far

Page 67: Causal data mining: Identifying causal effects at scale

67

Implementing the split-door criterion

[Figure: the time series of $\langle X, Y_D \rangle$ is split into consecutive multi-day windows, giving pairs $(x^{(1)}, y_D^{(1)}), (x^{(2)}, y_D^{(2)}), \ldots, (x^{(n)}, y_D^{(n)})$; each window that passes the independence test is treated as a natural experiment, and the valid windows are combined into a causal effect estimate.]

Page 68: Causal data mining: Identifying causal effects at scale

68

Implementing the split-door criterion:
1. Divide the data into t = 15 day periods.
2. For each time period:
   a) Using Fisher's test, find product pairs (X, Y) such that visits to the focal product (X) are independent of direct visits to the recommended product (Y_D).
   b) Compute the causal estimate from the recommendation click-throughs in these valid periods.
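A compact sketch of this loop is below. It is a simplification of the procedure described here: a plain Pearson correlation test stands in for the independence test (the talk mentions Fisher's test), a high p-value is treated as evidence of independence, and the per-window causal estimate is taken to be the ratio of recommendation click-throughs to focal-product visits. Function and variable names are mine; the toy data are made up.

```python
import numpy as np
from scipy import stats

# Simplified split-door mining loop: a Pearson correlation test stands in for
# the independence test, and a high p-value is treated as evidence that the
# focal-product visits X and the direct visits Y_D are independent in a window.
def split_door_estimates(x, y_r, y_d, window=15, p_cutoff=0.95):
    """x, y_r, y_d: daily counts for one (focal, recommended) product pair."""
    estimates = []
    for start in range(0, len(x) - window + 1, window):
        xs, yrs, yds = (v[start:start + window] for v in (x, y_r, y_d))
        if xs.std() == 0 or yds.std() == 0:
            continue                               # no variation, nothing to test
        _, p = stats.pearsonr(xs, yds)
        if p > p_cutoff:                           # window looks like a natural experiment
            estimates.append(yrs.sum() / xs.sum()) # causal click-through for this window
    return estimates

# Toy daily series for 60 days (made-up numbers, for illustration only).
rng = np.random.default_rng(3)
x = rng.poisson(50, 60).astype(float)              # visits to the focal product
y_r = rng.binomial(50, 0.05, 60).astype(float)     # recommendation click-throughs
y_d = rng.poisson(20, 60).astype(float)            # direct visits to the rec. product
print(split_door_estimates(x, y_r, y_d))
```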

Page 69: Causal data mining: Identifying causal effects at scale

69

Using the split-door criterion, we obtain 23,000 natural experiments for over 12,000 products.
1) The traditional IV method using Oprah Winfrey [Carmi et al.] yields 133 natural experiments.
2) The split-door criterion covers more than half of all ~20 K products.

Page 70: Causal data mining: Identifying causal effects at scale

70

[Figure: example time series of focal-product visits and direct visits, labeled VALID (independent) and INVALID (correlated).]

Page 71: Causal data mining: Identifying causal effects at scale

71

Observational click-through rate overestimates causal effect

Over half of the recommendation click-throughs would have happened anyway.

Page 72: Causal data mining: Identifying causal effects at scale

72

We can vary the confidence in the validity of the obtained natural experiments.

Page 73: Causal data mining: Identifying causal effects at scale

73

Page 74: Causal data mining: Identifying causal effects at scale

74

Similar, but more precise, causal estimates than simply using shocks.

Page 75: Causal data mining: Identifying causal effects at scale

75

Generalization? The distribution of products with a natural experiment is identical to the overall distribution.

Causal estimates are consistent with experimental findings (e.g., Belluf et al. [2012], Lee and Hosanagar [2014]).

Page 76: Causal data mining: Identifying causal effects at scale

76

Generalizable to all products on amazon.com?

- Shocks may be due to discounts or sales.
- Lower CTR may be due to the holiday season.

Page 77: Causal data mining: Identifying causal effects at scale

77

Generalization to all of Amazon.com?

- Split-door products are not a representative sample of all products, nor are the users who participate in them.
- But the split-door criterion covers more than half of all products with at least 10 visits on any single day.
- Causal estimates are consistent with experimental findings (e.g., Belluf et al. [2012], Lee and Hosanagar [2014]).

Page 78: Causal data mining: Identifying causal effects at scale

78

Potential applications: whenever an auxiliary outcome is available.
- Digital systems: recommender systems, ad systems, app notifications; any media website or app (such as newspapers).
- Offline contexts: discount mailers sent by stores; any two marketing channels.
- In the future: effects of medical treatments, teaching interventions, etc.

Page 79: Causal data mining: Identifying causal effects at scale

79

Summary: mining natural experiments at scale.

Unlike traditional natural experiments, the split-door criterion relies on fine-grained data to:
- verify the exclusion assumption [Robustness];
- cover a broad range of data [Generalizability].

It provides an offline metric for computing causal effects in digital systems (e.g., ad systems, media websites, app notifications). Code available for use.

Oprah [Carmi et al.]: 133 shocks, restricted to books.
Split-door criterion: 12,000 natural experiments, representative of the overall product distribution.

Page 80: Causal data mining: Identifying causal effects at scale

80

The spectrum: split-door, regression, and a natural experiment.

[Figure: amount of data plotted against the cutoff for likelihood of independence (ticks at 0, 0.80, 0.95, 1), with regression at 0, the split-door criterion around 0.80-0.95, and a single natural experiment at 1.]

Page 81: Causal data mining: Identifying causal effects at scale

81

Part 2: A general Bayesian test for natural experiments in any dataset

Page 82: Causal data mining: Identifying causal effects at scale

[Causal graph: Instrumental Variable (Z) → Cause (X) → Outcome (Y), with Unobserved Confounders (U); the questionable edges are As-if-random? (U → Z) and Exclusion? (Z → Y).]

Given observed data, can we determine whether it was generated from (a) the above model class (Invalid-IV), or (b) a model class without the red edges (Valid-IV)?

Page 83: Causal data mining: Identifying causal effects at scale

83

Observational Data

[Figure: candidate causal graphs over Cause (X), Outcome (Y), Unobserved Confounders (U), and I.V. (Z). Valid-IV models have $y = f(x, u)$; Invalid-IV models have $y = f(x, z, u)$, with Z entering the outcome directly or depending on U.]

Page 84: Causal data mining: Identifying causal effects at scale

84

Necessary test: by properties of the causal graph (Pearl 1993).

[Causal graph: I.V. (Z) → Cause (X) → Outcome (Y), with Unobserved Confounders (U).]

Page 85: Causal data mining: Identifying causal effects at scale

85

But we would like a sufficient test for instrumental variables.

Page 86: Causal data mining: Identifying causal effects at scale

86

A first try: compare model classes by maximum likelihood.

Every data distribution that can be generated by a Valid-IV model can also be generated by an Invalid-IV model.

$ML_{InvalidIV} = \max_{m' \in InvalidIV} P(Data \mid m')$

So the maximized likelihood of the Invalid-IV class is always at least as high as that of the Valid-IV class, and maximum likelihood alone cannot favor Valid-IV.
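A toy numerical illustration of this point (linear-Gaussian stand-ins of my own, not the talk's discrete setup): because the Invalid-IV class nests the Valid-IV class, its maximized likelihood is never lower, even when the data really come from a valid instrument.

```python
import numpy as np

# Linear-Gaussian stand-in: data truly come from a valid instrument, yet the
# larger Invalid-IV model (which allows a direct Z -> Y edge) never has a
# lower maximized likelihood than the Valid-IV model it nests.
rng = np.random.default_rng(4)
n = 5_000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = z + u + rng.normal(size=n)
y = 0.5 * x + u + rng.normal(size=n)       # no direct effect of z on y

def max_loglik(design, y):
    """Gaussian maximum log-likelihood of y under a linear model on `design`."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    sigma2 = np.var(y - design @ beta)
    return -0.5 * len(y) * (np.log(2 * np.pi * sigma2) + 1)

ones = np.ones_like(y)
valid = np.column_stack([ones, x])         # Valid-IV: y depends on x only
invalid = np.column_stack([ones, x, z])    # Invalid-IV: also allows z -> y
print("max log-likelihood, Valid-IV  :", round(max_loglik(valid, y), 1))
print("max log-likelihood, Invalid-IV:", round(max_loglik(invalid, y), 1))  # >= Valid-IV
```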

Page 87: Causal data mining: Identifying causal effects at scale

87

Sufficiency is almost "impossible".

[Figure: the diamond represents all observable probability distributions P(X, Y | Z); a distribution that passes the necessary test can still be generated by both Valid-IV and Invalid-IV models.]

We can attain a weaker notion: probable sufficiency.

Page 88: Causal data mining: Identifying causal effects at scale

88

A "probably sufficient" criterion:

$ValidityRatio = \dfrac{P(ValidIV \mid Data)}{P(InvalidIV \mid Data)}$

Page 89: Causal data mining: Identifying causal effects at scale

89

Intuition

$ValidityRatio = \dfrac{P(ValidIV \mid Data)}{P(InvalidIV \mid Data)}$

[Figure: the Valid-IV model class (graphs without the extra edges, with generating functions $f_1, f_2, f_3, f_4$) and the Invalid-IV model class (graphs with those edges, with functions $g_1, \ldots, g_4, h_3, h_4$) can each generate the observational data; the ratio compares how probable the data is under each class.]

Page 90: Causal data mining: Identifying causal effects at scale

90

Can formalize as a Bayesian model comparison:
- Develop a generative meta-model of the data.
- Compare the marginal likelihoods of Valid-IV versus Invalid-IV models.

Data is likely to be generated from a Valid-IV model if ValidityRatio ≫ 1.

Page 91: Causal data mining: Identifying causal effects at scale

91

Computing the Validity Ratio: two problems.
1. Each causal model contains an unobserved variable U.
2. There are infinitely many causal models in each sub-class.

Page 92: Causal data mining: Identifying causal effects at scale

92

I. Use a response-variable framework (assumes discrete variables), in which the unobserved $u$ determines which response function $y = f(x, u)$ applies.
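A minimal sketch of the response-variable idea for the simplest case, assuming binary X and Y (my assumption for illustration): the unobserved u only matters through which of four possible response functions it selects, so a distribution over four types replaces the arbitrary U.

```python
# Binary-variable sketch: the effect of the unobserved u on y is captured by
# which of the four possible response functions y = f(x) it selects.
RESPONSE_FUNCS = [
    lambda x: 0,      # y is always 0
    lambda x: x,      # y follows x
    lambda x: 1 - x,  # y opposes x
    lambda x: 1,      # y is always 1
]

def p_y1_given_x(theta):
    """theta: probabilities of the four response types (must sum to 1).
    Returns P(Y = 1 | X = x) implied by mixing over the response functions."""
    return {x: sum(t * f(x) for t, f in zip(theta, RESPONSE_FUNCS)) for x in (0, 1)}

print(p_y1_given_x([0.1, 0.6, 0.1, 0.2]))   # {0: 0.3, 1: 0.8}
```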

Page 93: Causal data mining: Identifying causal effects at scale

93

II. A non-standard integral over infinitely many models.

Denominator (Invalid-IV): a closed-form solution was derived, using properties of Dirichlet and hyper-Dirichlet distributions and the Laplace transform.

Numerator (Valid-IV): no closed-form solution exists; Monte Carlo methods (annealed importance sampling) are used to approximate it.
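The sketch below mimics the flavor of this computation on a deliberately simplified problem of my own (just Z and Y, no X or U, and plain Monte Carlo over the prior instead of annealed importance sampling): the "valid" model forces Y to be independent of Z, the "invalid" model does not, and the ratio of their approximate marginal likelihoods plays the role of the ValidityRatio.

```python
import numpy as np

# Toy validity-ratio computation: marginal likelihoods of an "independent"
# model (Y unaffected by Z) and a "dependent" model (Y may differ by Z),
# approximated by plain Monte Carlo over near-uniform priors.
rng = np.random.default_rng(5)
n = 200
z = rng.integers(0, 2, size=n)
y = rng.binomial(1, 0.3, size=n)           # here Y really is independent of Z

def loglik_indep(t):                       # one success probability for Y
    return np.sum(y * np.log(t[0]) + (1 - y) * np.log(1 - t[0]))

def loglik_dep(t):                         # P(Y=1) allowed to differ by Z
    p = np.where(z == 1, t[1], t[0])
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def marginal_lik(loglik, n_params, draws=20_000):
    thetas = rng.uniform(0.001, 0.999, size=(draws, n_params))  # prior samples
    return np.mean([np.exp(loglik(t)) for t in thetas])

ratio = marginal_lik(loglik_indep, 1) / marginal_lik(loglik_dep, 2)
print(f"ValidityRatio-style Bayes factor: {ratio:.2f}")   # > 1 favors independence
```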

Page 94: Causal data mining: Identifying causal effects at scale

94

Use the NPS (necessary and probably sufficient) test to validate IV studies from the American Economic Review: collected studies from the American Economic Review (AER) with "instrumental variable" in the title or abstract.

Page 95: Causal data mining: Identifying causal effects at scale

95

Many recent studies from the American Economic Review do not pass the test. (Studies collected from AER with "instrumental variable" in the title or abstract.)

Study / Validity Ratio
Effect of Mexican immigration on crime in the United States (2015): 0.07
Effect of subsidy manipulation on Medicare premiums (2015): 1.02
Effect of credit supply on housing prices (2015): 0.01
Effect of Chinese import competition on local labor markets (2013): 0.3
Effect of rural electrification on employment in South Africa (2011): 3.6
Expt: National Job Training Partnership Act (JTPA) Study (2002): 3.4

Page 96: Causal data mining: Identifying causal effects at scale

Conclusion: causal data mining enables causal inference from large-scale data.

It challenges the decades-long belief that causal assumptions cannot be tested from data: we can use data mining to identify causal effects in large-scale data.

Two recipes:
- Create new graphical structures that identify the causal effect: the split-door criterion.
- Use Bayesian modeling to test instrumental variables: the NPS test.

96

Page 97: Causal data mining: Identifying causal effects at scale

97

More generally, a viable methodology for causal inference in large datasets: starting from $\langle X, Y \rangle$, develop tests for the validity of a natural variation, then mine for such valid variations in observational data.

Page 98: Causal data mining: Identifying causal effects at scale

98

More generally, a viable methodology for causal inference in large datasets.

Hard-to-find variations: lotteries, weather, shocks, discontinuities.

Other natural variations: a change in access to digital services, a change in medicines at a hospital, a change in train stops in a city.

Page 99: Causal data mining: Identifying causal effects at scale

99

Future work

[Figure: methods arranged by amount of data (10^2 to 10^10) and ability to experiment: controlled experiments, A/B tests, contextual bandits, split-door, IV test, causal algorithms, warm start (choosing experiments), online + offline.]

Page 100: Causal data mining: Identifying causal effects at scale

100

Future work: Causal inference and machine learning

Causal inference as robust prediction.

Causal inference: the predicted value under the counterfactual distribution P'(X, y).
(Supervised) ML: the predicted value under the training distribution P(X, y).

Page 101: Causal data mining: Identifying causal effects at scale

101

Thank you!
Amit Sharma, http://www.amitsharma.in

1. Hofman, Sharma, and Watts (2017). Prediction and explanation in social systems. Science, 355.6324.

2. Sharma (2016). Necessary and probably sufficient test for finding instrumental variables. Working paper.

3. Sharma, Hofman, and Watts (2016). Split-door criterion for causal identification: An algorithm for finding natural experiments. Under review at Annals of Applied Statistics (AOAS).

4. Sharma, Hofman, and Watts (2015). Estimating the causal impact of recommendation systems from observational data. In Proceedings of the 16th ACM Conference on Economics and Computation.

Page 102: Causal data mining: Identifying causal effects at scale

102

References
1. Angrist and Pischke (2008). Mostly harmless econometrics: An empiricist's companion. Princeton Univ. Press.
2. Belluf, Xavier and Giglio (2012). Case study on the business value impact of personalized recommendations on a large online retailer. In Proc. ACM Conf. on Recommender Systems.
3. Carmi, Oestreicher-Singer and Sundararajan (2012). Is Oprah contagious? Identifying demand spillovers in online networks. SSRN 1694308.
4. Dunning (2012). Natural experiments in the social sciences: A design-based approach. Cambridge University Press.
5. Gomez-Uribe and Hunt (2015). The Netflix recommender system: Algorithms, business value and innovation. ACM Transactions on Management Information Systems.
6. Lee and Hosanagar (2014). When do recommender systems work the best? The moderating effects of product attributes and consumer reviews on recommender performance. In Proc. ACM World Wide Web Conference.

Page 103: Causal data mining: Identifying causal effects at scale

103

References (continued)
7. Lin, Goh and Heng (2013). The demand effects of product recommendation networks: An empirical analysis of network diversity and stability. SSRN 2389339.
8. Linden, Smith and York (2003). Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing.
9. Mulpuru (2006). What you need to know about third-party recommendation engines. Forrester Research.
10. Oestreicher-Singer and Sundararajan (2012). The visible hand? Demand effects of recommendation networks in electronic markets. Management Science.
11. Pearl (2009). Causality: Models, reasoning and inference. Cambridge Univ Press.
12. Sharma and Yan (2013). Pairwise learning in recommendation: Experiments with community recommendation on LinkedIn. In ACM Conf. on Recommender Systems.