causal reasoning and learning systemsseminars/seminars/extra/2012... · causal inference viewpoint...

75
Causal Reasoning and Learning Systems Ed Snelson MSR Cambridge UK / AdCenter

Upload: others

Post on 06-Aug-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Causal Reasoning and Learning Systems

Ed Snelson

MSR Cambridge UK / AdCenter

Page 2: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Work by

Jointly with:

Jonas Peters Joaquin Quiñonero Candela

Denis Xavier Charles D. Max Chickering

Elon Portugaly Dipankar Ray

Patrice Simard Ed Snelson

Léon Bottou

Page 3: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

I. MOTIVATION

Page 4: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

The pesky little ads

Page 5: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

The online ads ecosystem

• We run a marketplace with different players:

• Player: User

– Is looking for something

– Selects search engine (publisher)

– Issues a query

– Clicks or not on organic result(s) or ad(s)

– Is happy / not with relevance, adjusts attitude towards

publisher accordingly

– Goal: few ads, high relevance

Page 6: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

The online ads ecosystem

• We run a marketplace with different players:

• Player: Advertiser

– Is looking for customers to sell to

– Defines campaigns, ads and places bids for clicks on these ads in different ad markets

– Monitors performance of campaigns

– Adjusts campaigns / bids accordingly

– Possibly uses rational strategy to bid

– Goal: maximize total value – total cost

Page 7: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

The online ads ecosystem

• We run a marketplace with different players:

• Player: Publisher

– Has search engine

– Contracts marketplace to provide ads

– Goal: short term and long term revenue, search

engine reputation and market share

Page 8: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Three feedback loops

User Advertiser

Queries

Ads & Bids

Ads Prices

Clicks (and consequences) Learning

ADVERTISER FEEDBACK LOOP

LEARNING FEEDBACK LOOP

USER FEEDBACK

LOOP

Publisher

Page 9: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Learning to run a marketplace

• The learning machine is not a machine but is an organization with lots of people doing stuff and making decisions working in the dark

How can we help?

• Goal: improve marketplace machinery such that its long term revenue is maximal

• Approximate goal by improving multiple performance measures related to all players

• Provide data for decision making • Automatically optimize parts of the

system

Page 10: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

No clear way to decompose complex learning systems into simple auctions or simple MAB problems!

Previous work Auction theory and mechanism design

• Motivate the design of ad auctions

• Address the advertiser feedback loop

• What about the other interactions?

• What about learning?

Algorithmic machine learning

• Often correlational, not causal

• E.g. find the best click-predictor for the current traffic

• Regret bounds for “multi armed bandits” (MAB)

• Deals with the learning feedback loop (explore/exploit)

• What about the other interactions?

Page 11: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

This work

Causal inference viewpoint

• Changing the ad placement algorithms is an intervention on the system.

• We track the consequences of such interventions along the paths of the causal graph.

Much more flexible framework

Leads to powerful estimation methods

Leads to novel learning algorithms

Now in daily use

Page 12: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Outline from here on

II. Experimentation

III. Causal inference

IV. Counterfactual measurements

V. Algorithmic toolbox

Page 13: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

II. EXPERIMENTATION

Page 14: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

The problem with correlation analysis (Simpson’s paradox)

Trying to decide whether a drug helps or not

• Historical data:

• Conclusion: don’t give the drug

But what if the Drs. were saving the drug for the severe cases?

• Conclusion reversed: drug helps for both severe and mild cases

• But other confounding factors might reverse conclusion again…

All Survived Died Survival Rate

Treated 5,000 2,100 2,900 42%

Not Treated 5,000 2,900 2,100 58%

Severe cases (treatment rate 80%)

All Survived Died Survival Rate

Treated 4,000 1,200 2,800 30%

Not Treated 1,000 100 900 10%

Mild case (treatment rate 20%)

All Survived Died Survival Rate

Treated 1,000 900 100 90%

Not Treated 4,000 2,800 1,300 70%

Page 15: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Overkill?

Pervasive causation paradoxes in ad data!

Example. – Logged data shows a positive correlation between

event A “First mainline ad receives a high score 𝑞1” and event B “Second mainline ad receives a click”.

– Do high quality ads encourage clicking below?

– Controlling for event C ”Query categorized as commercial” reverses the correlation for both commercial and noncommercial queries.

Page 16: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Randomized experiments Randomly select who to treat

• Selection independent of all confounding factors

• Therefore eliminates Simpson’s paradox and allows:

Counterfactual estimates

• If we had given penicillin to 𝑥% of the patients, the success rate would have been 60% × 𝑥 + 40% × 1 − 𝑥

• In fact, the selection probability can depend on known factors (e.g. severity). It just has to be independent of all unknown factors.

All population (treatment rate 30%)

All Survived Died Survival Rate

Treated 3,000 1,800 1,200 60%

Not Treated 7,000 2,800 4,200 40%

Page 17: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Experiments in the online world

• AB tests are used throughout the online world to compare different versions of the system

– A random fraction of the traffic (a flight) uses click prediction system A

– Another random fraction uses click prediction system B

• Overlapping experiments

– Simultaneously test two flavors of the ad selection system and two flavors of the click prediction system

• Our framework takes this one stage further

Page 18: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

III. CAUSAL INFERENCE

Page 19: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Structural equation model (SEM)

Direct causes / Known and unknown functions Noise variables / Exogenous variables

Page 20: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Intervention

Interventions as algebraic manipulation of the SEM.

Causal graph must remain acyclic.

* NEW Q=𝒇𝟒

Page 21: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Isolation assumption

What to do with unknown functions? • Replace knowledge by statistics. • Statistics need repeated isolated experiments. • Isolate experiments by assuming an

unknown but invariant joint distribution for the exogenous variables.

• What about the feedback loops?

𝑃(𝑢, 𝑣)

Page 22: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Markov factorization

A Bayes network is born (Pearl, 1988)

Page 23: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Markov interventions

Many interrelated Bayes networks are born (Pearl, 2000)

– They are interrelated because they share some factors.

– More complex algebraic interventions are of course possible.

*

*

Distribution under intervention

Page 24: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

IV. COUNTERFACTUAL MEASUREMENTS

Page 25: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Counterfactuals

Measuring something that did not happen “How would the system have performed if, when the data was collected, we had used 𝑃∗ 𝑞|𝑥, 𝑎 instead of 𝑃 𝑞|𝑥, 𝑎 ? ”

Learning procedure • Collect data that describes the operation of the system

during a past time period. • Find changes that would have increased the performance

of the system if they had been applied during the data collection period.

• Implement and verify…

Page 26: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Replaying past data

Classification example

• Collect labeled data in existing setup

• Replay the past data to evaluate what the performance would have been if we had used classifier θ.

• Requires knowledge of all functions connecting the point of intervention to the point of measurement.

𝑞

Page 27: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Importance sampling

*

*

Distribution under intervention

• Can we estimate the results of the intervention counterfactually (without actually performing the intervention) – Yes if P and P* are non-deterministic (and close enough)

Page 28: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Importance sampling

Actual expectation

𝑌 = ℓ 𝜔 𝑃(𝜔)𝜔

Counterfactual expectation

𝑌∗ = ℓ 𝜔 𝑃∗(𝜔)𝜔

= ℓ 𝜔𝑃∗(𝜔)

𝑃 𝜔𝑃(𝜔)

𝜔

≈ 1

𝑛

𝑃∗(𝜔𝑖)

𝑃 𝜔𝑖

𝑛

𝑖=1

ℓ 𝜔𝑖

Page 29: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Importance sampling

Principle • Reweight past examples to emulate the

probability they would have had under the counterfactual distribution.

𝑤 𝜔𝑖 =𝑃∗(𝜔𝑖)

𝑃 𝜔𝑖=

𝑃∗(𝑞|𝑥, 𝑎)

𝑃(𝑞|𝑥, 𝑎)

• Only requires the knowledge of the function under intervention (before and after)

Factors in P* not in P

Factors in P not in P*

Page 30: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Exploration

𝑃(𝜔) 𝑃∗(𝜔)

Quality of the estimation

• Large ratios undermine estimation quality.

• Confidence intervals reveal whether the data collection distribution 𝑃 𝜔 performs sufficient exploration to answer the counterfactual question of interest.

𝑃(𝜔) 𝑃∗(𝜔)

Page 31: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Confidence intervals

𝑌∗ = ℓ 𝜔 𝑤 𝜔 𝑃(𝜔)𝜔

≈ 1

𝑛 ℓ 𝜔𝑖 𝑤 𝜔𝑖

𝑛

𝑖=1

Using the central limit theorem?

• 𝑤 𝜔𝑖 very large when 𝑃(𝜔𝑖) small.

• A few samples in poorly explored regions dominate the sum with their noisy contributions.

• Solution: ignore them.

Page 32: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Confidence intervals (ii)

Zero-clipped weights

𝑤 𝜔 = 𝑤(𝜔) if less than 𝑅,

0 otherwise.

Easier estimate

𝑌 ∗ = ℓ 𝜔 𝑤 𝜔 𝑃(𝜔)𝜔

≈ 1

𝑛 ℓ 𝜔𝑖 𝑤 𝜔𝑖

𝑛

𝑖=1

Page 33: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Confidence intervals (iii)

Bounding the bias

• Observe 𝑤 𝜔 𝑃(𝜔)𝜔

= 𝑃∗ 𝜔

𝑃 𝜔𝑃 𝜔 = 1

𝜔.

• Assuming 0 ≤ ℓ 𝜔 ≤ 𝑀 we have

0 ≤ 𝑌∗ − 𝑌 ∗ = 𝑤 − 𝑤 ℓ(𝜔) 𝑃(𝜔

𝜔) ≤ 𝑀 𝑤 − 𝑤 𝑃(𝜔

𝜔)

= 𝑀 1 − 𝑤 (𝜔)𝑃(𝜔

𝜔) ≈ 𝑀 1 −1

𝑛 𝑤 (𝜔𝑖)

𝑛

𝑖=1

• This is easy to estimate because 𝑤 (𝜔) is bounded.

• This represents what we miss because of insufficient exploration.

Page 34: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Two-parts confidence interval

Outer confidence interval

• Bounds Y ∗ − Y 𝑛∗

• When this is too large, we must sample more.

Inner confidence interval

• Bounds 𝑌∗ − 𝑌 ∗

• When this is too large, we must explore more.

Page 35: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

The pesky little ads again

Mainline Sidebar

Page 36: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Playing with mainline reserves

Mainline reserves (MLRs)

• Thresholds that control whether ads are displayed in the mainline (north ads)

Randomized bucket

• Random log-normal multiplier applied to MLRs.

• 22M auctions over five weeks

Control buckets

• Same setup with 18% lower mainline reserves

• Same setup without randomization

Page 37: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Playing with mainline reserves (ii)

Inner interval

Outer interval

Control with no randomization

Control with 18% lower MLR

Page 38: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Playing with mainline reserves (iii)

This is easy to estimate

Page 39: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Playing with mainline reserves (iv)

Revenue has always high

variance

Page 40: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

More with the same data

Examples

• Estimates for different randomization variance Good to determine how much to explore.

• Query-dependent reserves Just another counterfactual distribution!

This is the big advantage

• Collect data first, choose question later.

• Randomizing more stuff increases opportunities.

Page 41: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

A generalization of AB tests

• Add a node randomly selecting which click prediction system to use

• Counterfactual measurement: what if the probability for selecting system A was 100%

• Leverage part of algorithmic toolbox, but not all, because node is discrete

Page 42: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

V. ALGORITHMIC TOOLBOX

Page 43: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Algorithmic toolbox

• Improving the confidence intervals: – Exploiting causal graph for much better behaved

weights

– Incorporating predictors invariant to the manipulation

• Counterfactual derivatives and optimization – Counterfactual differences

– Counterfactual derivatives

– Policy gradients

– Optimization (= learning)

• Equilibrium analysis See tech report for extensive details http://leon.bottou.org

Page 44: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Shifting the reweighting point

• Users make click decisions on the basis of what they see. • They cannot see the scores, the reserves, the prices, etc.

Page 45: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Shifting the reweighting point

Actual score distribution

A1 A3

A2 and many more

p=41% p=22% p=12% p=…

Counterfactual score distribution

A1 A3 A2

p=18% p=34% p=17% p=…

• We can estimate counterfactual click yield by reweighting the samples according to P⋄(𝑠| … ) rather than the probabilities of scores P⋄(𝑞| … ).

Page 46: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Shifting the reweighting point

Standard weights

𝑤 𝜔𝑖 =𝑃∗(𝜔𝑖)

𝑃 𝜔𝑖=

𝑃∗(𝑞|𝑥, 𝑎)

𝑃(𝑞|𝑥, 𝑎)

Shifted weights

𝑤 𝜔𝑖 = 𝑃∗(𝑠|𝑥, 𝑎, 𝑏)

𝑃(𝑠|𝑥, 𝑎, 𝑏)

with 𝑃⋄ 𝑠 𝑥, 𝑎, 𝑏 = 𝑃 𝑠 𝑎, 𝑞, 𝑏 𝑃⋄(𝑞|𝑥, 𝑎)𝑞

.

Page 47: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Shifting the reweighting point

Experimental validation

• Mainline reserves

Score reweighting Slate reweighting

Page 48: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Shifting the reweighting point

When can we do this?

• 𝑃⋄(𝜔) factorizes in the right way iff

1. Reweighting variables intercept every causal path connecting the point(s) of intervention to the point of measurement.

2. All functional dependencies between the point(s) of intervention and the reweighting variables are known.

Page 49: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Estimating differences

Comparing two potential interventions

• Is scoring model 𝑀1 better than 𝑀2 ?

• Improved confidence via variance reduction

– Example: since seasonal variations affect both models nearly identically, the variance resulting from these variations cancels in the difference.

Click-thru-rate if we had used model 𝑀1

Click-thru-rate if we had used model 𝑀2 Δ =

Page 50: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Estimating derivatives

Infinitesimal interventions

• Related to “policy gradient” in RL.

• Can drive optimization algorithms to learn the model parameters 𝜃.

Click-thru-rate if we had used 𝑀(𝜃 + 𝑑𝜃)

Click-thru-rate if we had used model 𝑀 𝜃

𝜕𝐶𝑇𝑅

𝜕𝜃=

d𝜃

Page 51: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Learning by Optimizing Counterfactuals

• Sequential design? Thompson sampling comes naturally in this context.

• Metering exploration wisely? Inner confidence interval tells how much exploration we need to answer a counterfactual question.

But it does not tell which questions we should ask.

This was not a problem in practice…

Page 52: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Revisiting the feedback loops

Queries

Ads & Bids

Ads Prices

Clicks (and consequences)

ADVERTISER FEEDBACK LOOP

LEARNING FEEDBACK LOOP

USER FEEDBACK

LOOP

Page 53: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Revisiting the feedback loops

Tracking the equilibrium

• If we increase the ad relevance thresholds :

– We show less ads and lose revenue in the short term.

– Users see more relevant ads, are more likely to click on ads in the future, possibly making up for the lost revenue [eventually].

– Advertisers will [eventually] update their bids. It could go both ways because they receive less clicks from more engaged users…

Page 54: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Counterfactual equilibrium

Counterfactual question

“What would have been the system performance metrics if we had applied an infinitesimal change to the parameter 𝜃 of the scoring model long enough to reach the equilibrium during the data collection period? ”

Estimation technique similar to the quasi-static analysis in physics.

Page 55: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

VI. CONCLUSION

Page 56: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Main messages

• There are systems in the real world that are too complex to easily formalize

– ML can assist humans in running these systems

• Causal inference clarifies many problems

– Ignoring causality => Simpson’s paradox

– Randomness allows inferring causality

• The counterfactual framework is modular

– Randomize in advance, ask later

– Compatible with other methodologies, e.g. optimization using gradients, equilibrium analysis, AB testing

Page 57: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

VII. APPENDIX

Page 58: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Invariant predictors

• Some variables 𝑣 not affected by the intervention have a strong impact on the outcome: time, location, …

• Let 𝜁(𝑣) be an arbitrary predictor of ℓ(𝜔)

𝑌∗ = ℓ 𝜔 − 𝜁(𝜈) 𝑤 𝜔 𝑃 𝜔𝜔

+ 𝜁 𝜈 𝑃(𝜈)𝜈

≈ 1

𝑛 ℓ 𝜔𝑖 − 𝜁 𝜈𝑖 𝑤 𝜔𝑖

𝑛

𝑖=1

+1

𝑛 𝜁 𝑣𝑖

𝑛

𝑖=1

No multiplier 𝑤(𝜔) for the variance captured by 𝜁 𝜈 .

Reduced variance if the predictor 𝜁 𝜈 is any good.

Page 59: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Counterfactual differences

• Which scoring model works best?

• Comparing expectations under counterfactual distributions 𝑃+(𝜔) and 𝑃∗(𝜔).

𝑌+ − 𝑌∗ = ℓ 𝜔 − 𝜁 𝜈 Δ𝑤 𝜔𝜔

𝑃 𝜔

≈ 1

𝑛 ℓ 𝜔𝑖 − 𝜁(𝜈𝑖) Δ𝑤 𝜔𝑖

𝑛

𝑖=1

with Δ𝑤 𝜔 =𝑃+ 𝜔

𝑃 𝜔 −

𝑃∗ 𝜔

𝑃 𝜔

Variance captured by 𝜁 𝜈 is gone!

Page 60: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Counterfactual derivatives

• Counterfactual distribution 𝑃𝜃 𝜔

𝜕𝑌𝜃

𝜕𝜃= ℓ 𝜔 − 𝜁 𝜈 𝑤𝜃

′ 𝜔 𝑃 𝜔𝜔

≈1

𝑛 ℓ 𝜔𝑖 − 𝜁 𝜈𝑖 𝑤𝜃

′ 𝜔𝑖

𝑛

𝑖=1

with 𝑤𝜃′ 𝜔 =

𝜕𝑤𝜃(𝜔)

𝜕𝜃= 𝑤𝜃 𝜔

𝜕 log 𝑃𝜃 𝜔

𝜕𝜃

𝑤𝜃 𝜔 can be large but there are ways…

Page 61: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Policy gradient

Infinitesimal interventions • Assuming 𝑃 𝜔 = 𝑃0 𝜔 and using the previous result:

𝜕𝑌𝜃

𝜕𝜃 𝜃=0

≈ 1

𝑛 ℓ 𝜔𝑖 − 𝜁 𝜈𝑖 𝑤0

′ 𝜔𝑖

𝑛

𝑖=1

with 𝑤0′ 𝜔 =

𝜕𝑤𝜃 𝜔

𝜕𝜃 𝜃=0

= 𝜕 log 𝑃𝜃 𝜔

𝜕𝜃 𝜃=0

Potentially large 𝑤𝜃 𝜔 is gone!

Page 62: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Derivatives and optimization

Tuning squashing exponents and reserves

• Ads ranked by decreasing 𝑏𝑖𝑑 × 𝑝𝐶𝑙𝑖𝑐𝑘 𝛼

• Lahaie & McAfee (2011) show that 𝛼 < 1 is good when click probability estimation gets less accurate.

• Different 𝛼𝑘 and reserves 𝜌𝑘 for each query cluster 𝑘.

• Squashing can increase prices. Optimize advertiser value instead of revenue. “Making the pie larger instead of our slice of the pie!”

Page 63: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Derivatives and optimization

Objective function • 𝑉 𝜶, 𝝆 : Lower bound for advertiser value.

• 𝑁 𝜶, 𝝆 : Upper bound for number of mainline ads

max𝜶,𝝆

𝑉 𝜶, 𝝆 subject to 𝑁 𝜶, 𝝆 < 𝑁0

• We can estimate these functions and their derivatives. Therefore we can optimize easily.

• The alternative is auction simulation. Users are then assumed to behave like 𝑝𝐶𝑙𝑖𝑐𝑘 says.

Page 64: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Derivatives and optimization

Variation of the average number of mainline ads.

Estimated advertiser value (arbitrary units)

Level curves for one particular query cluster

Page 65: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Revisiting the feedback loops

Queries

Ads & Bids

Ads Prices

Clicks (and consequences)

ADVERTISER FEEDBACK LOOP

LEARNING FEEDBACK LOOP

USER FEEDBACK

LOOP

Page 66: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Revisiting the feedback loops

Different time scales • Auctions happen thousands of times per second. • Learning feedback: a couple hours. • User and advertiser feedback: several weeks.

Tracking the equilibrium • Assume the user and advertiser feedbacks converge to

equilibrium • “What will have been the total revenue had we

changed MLR and waited until a new equilibrium was reached”

Page 67: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

User feedback

We make the following causal assumptions

• We can measure a quantity 𝑔 that quantifies the relevance of the displayed ads. – For instance using human labelers.

• The user reaction can be expressed by the effect on the click yield of the average relevance 𝑔 experienced by each user in the recent past. – Realistically, we should also consider an effect on the

number of queries issued by each user. We omit this for simplicity.

Page 68: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

User feedback

Let 𝑌(𝜃, 𝑔 ) = ℓ(𝜔) 𝑃𝜃(𝜔|𝑔 )𝜔

and 𝐺 𝜃, 𝑔 = 𝑔 𝑃𝜃 𝜔 𝑔 𝜔

.

Equilibrium condition : 𝐺 = 𝑔 .

Page 69: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Total derivative

Total derivatives

d𝑌 = 𝜕𝑌

𝜕𝜃 d𝜃 +

𝜕𝑌

𝜕𝑔 d𝑔

d𝐺 = 𝜕𝐺

𝜕𝜃 d𝜃 +

𝜕𝐺

𝜕𝑔 d𝑔 = d𝑔

Solving gives

d𝑔 =𝜕𝐺

𝜕𝜃d𝜃

And substituting gives the answer to our question

d𝑌 = 𝜕𝑌

𝜕𝜃 +

𝜕𝑌

𝜕𝑔 𝜕𝐺

𝜕𝜃 d𝜃

Equilibrium condition

Zero because of the graph structure

Page 70: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Estimating derivatives

• Estimate 𝜕𝑌

𝜕𝜃 and

𝜕𝐺

𝜕𝜃 like policy gradients.

• Estimate 𝜕𝑌

𝜕𝑔 as follows:

1. Randomly submit users to different treatments.

2. At some point switch to same treatment and observe click yield 𝑌 as functions of 𝑔 .

Page 71: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Advertiser feedback

Same thing with total derivatives. For instance, using economics to characterize the equilibrium, avoiding the extra experiment.

𝑦𝑎 , 𝑧𝑎

𝑦 𝑎, 𝑧 𝑎

Page 72: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Pricing theory

Rational advertisers keep Va =𝜕𝑧 𝑎

𝜕𝑦 𝑎=

𝜕𝑧 𝑎

𝜕𝑏𝑎

𝜕𝑦 𝑎

𝜕𝑏𝑎 constant.

Number of clicks 𝑦 𝑎

To

tal p

aid

𝑧 𝑎

Pricing curve Adjusting the bid 𝑏𝑎 moves 𝑦 𝑎 , 𝑧 𝑎 on this curve.

Value curve Advertiser will not pay more than this.

Maximum surplus Best deal for the advertisers. The slope of the pricing curve reveals their value.

Page 73: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Estimating values

• At equilibrium 𝑦 𝑎 = 𝑌𝑎 and 𝑧 𝑎 = 𝑍𝑎.

• Therefore we can compute

Va =𝜕𝑧 𝑎𝜕𝑏𝑎

𝜕𝑦 𝑎𝜕𝑏𝑎

=𝜕𝑍𝑎

𝜕𝑏𝑎

𝜕𝑌𝑎

𝜕𝑏𝑎

• So vector Φ = …𝜕𝑍𝑎

𝜕𝑏𝑎− 𝑉𝑎

𝜕𝑌𝑎

𝜕𝑏𝑎… = 0

• Then we can use the policy gradient equation.

Page 74: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Advertiser feedback equilibrium

Φ was zero before 𝑑𝜃, then it converges to zero when equilibrium is returned so:

dΦ =𝜕Φ

𝜕𝜃d𝜃 +

𝜕Φ

𝜕𝑏𝑎𝑎

d𝑏𝑎 = 0

• Solve the linear system for d𝑏𝑎

d𝜃

• Then answer the counterfactual question

d𝑌 =𝜕𝑌

𝜕𝜃+

𝜕𝑌

𝜕𝑏𝑎 d𝑏𝑎

d𝜃𝑎

d𝜃

Equilibrium condition

Page 75: Causal Reasoning and Learning Systemsseminars/seminars/Extra/2012... · Causal inference viewpoint • Changing the ad placement algorithms is an intervention on the system. • We

Multiple feedback loops

Same procedure:

1. Write total derivatives.

2. Solve the linear system formed by all the equilibrium conditions.

3. Substitute into the total derivative of the counterfactual expectation of interest.