Byron Galbraith, Chief Data Scientist, Talla, at MLconf NYC 2017

Bayesian Bandits
Byron Galbraith, PhD
Cofounder / Chief Data Scientist, Talla
2017.03.24


TRANSCRIPT

Page 1: Byron Galbraith, Chief Data Scientist, Talla, at MLconf NYC 2017

Bayesian Bandits
Byron Galbraith, PhD
Cofounder / Chief Data Scientist, Talla
2017.03.24

Page 2:

Bayesian Bandits for the Impatient

1. Online adaptive learning: “Earn while you learn”

2. Powerful alternative to A/B testing optimization

3. Can be efficient and easy to implement

Page 3:

Dining Ware VR Experiences on Demand

Page 4:

Dining Ware VR Experiences on Demand

Page 5:

Iterated Decision Problems

What product recommendations should we present to subscribers to keep them engaged?

Page 6:

A/B Testing

Page 7:

Exploit vs Explore – What should we do?

Exploit: choose what seems best so far
🙂 Feel good about our decision
🤔 There still may be something better

Explore: try something new
😄 Discover a superior approach
😧 Regret our choice

Page 8:

A/B/n Testing

Page 9:

Regret – What did that experiment cost us?

Page 10:

The Multi-Armed Bandit Problem

http://blog.yhat.com/posts/the-beer-bandit.html

Page 11:

Bandit Solutions

Cumulative regret for the k-armed multi-armed bandit (k-MAB):

$R_T = \sum_{t=1}^{T} \left[ r(Y_t(a^*)) - r(Y_t(a_t)) \right]$

Upper confidence bound (UCB) action selection:

$a_t = \arg\max_i \left[ \bar{r}_{i,t} + c \sqrt{\frac{\log t}{n_i}} \right]$

Softmax action selection from preferences $H$:

$P(A_t = a) = \frac{e^{H_t(a)}}{\sum_{b=1}^{k} e^{H_t(b)}} = \pi_t(a)$

Beta density and Binomial likelihood:

$P(X = x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)} \qquad P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$

which give the conjugate posterior for arm $a$ after $N$ pulls with $r_a$ hits:

$\mathrm{Beta}_a(\alpha + r_a,\ \beta + N - r_a)$

Bayes' rule:

$P(X \mid Y, Z) = \frac{P(Y \mid X, Z)\, P(X \mid Z)}{P(Y \mid Z)}$
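The UCB selection rule can be sketched in a few lines of Python. This is a minimal sketch, not code from the talk; the function name, the `c=2.0` default, and the convention of playing any untried arm first are my own choices:

```python
import math

def ucb1_select(counts, rewards, t, c=2.0):
    """Choose arm argmax_i [ mean_reward_i + c * sqrt(log t / n_i) ].

    counts[i]  = number of times arm i has been pulled (n_i)
    rewards[i] = cumulative reward collected from arm i
    t          = current round number
    """
    # An arm never tried yet has an effectively infinite bonus, so play it.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    scores = [rewards[i] / counts[i] + c * math.sqrt(math.log(t) / counts[i])
              for i in range(len(counts))]
    return scores.index(max(scores))
```

The exploration bonus shrinks as an arm accumulates pulls, so under-sampled arms keep getting revisited until their uncertainty is resolved.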

Page 12:

Thompson Sampling

$P(\theta \mid r, a) \propto P(r \mid \theta, a)\, P(\theta \mid a)$

Posterior $\propto$ Likelihood $\times$ Prior

Page 13:

Bayesian Bandits – The Model

Model whether a recommendation will result in user engagement
• Bernoulli distribution: θ is the likelihood of the event occurring

How do we find θ?
• Conjugate prior
• Beta distribution: α = number of hits, β = number of misses

Only need to keep track of two numbers per option
• # of hits, # of misses
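Because the Beta prior is conjugate to the Bernoulli likelihood, the posterior update is just count arithmetic. A minimal sketch (the function names `beta_update` and `beta_mean` are illustrative, not from the talk):

```python
def beta_update(alpha, beta, hits, misses):
    """Conjugate update: Beta(a, b) prior + Bernoulli data -> Beta posterior."""
    return alpha + hits, beta + misses

def beta_mean(alpha, beta):
    """Posterior mean estimate of the engagement rate theta."""
    return alpha / (alpha + beta)

# Start from the uniform prior Beta(1, 1), then observe 7 hits and 3 misses.
alpha, beta = beta_update(1, 1, hits=7, misses=3)   # -> Beta(8, 4)
```

No likelihood evaluation or numerical integration is needed, which is why each option only costs two integers of state.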

Page 14:

Bayesian Bandits – The Algorithm

1. Initialize each arm's Beta parameters to α = 1, β = 1 (uniform prior)

2. For each user request for recommendations at time t:
   1. Sample θ̂ₐ ~ Beta(αₐ, βₐ) for every arm a
   2. Choose the action corresponding to the largest sampled θ̂
   3. Observe the reward
   4. Update the chosen arm's posterior (hit: α + 1, miss: β + 1)
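The four steps above map almost line-for-line onto Python. A minimal sketch of one round, assuming Bernoulli rewards and in-place count updates (the function name and the `pull` callback interface are my own):

```python
import random

def thompson_step(alphas, betas, pull):
    """One round of Bernoulli Thompson Sampling over len(alphas) arms.

    alphas[i]/betas[i] hold arm i's Beta posterior counts.
    pull(arm) is the environment: it returns a 0/1 reward.
    """
    # 1. Sample a plausible engagement rate for each arm from its posterior.
    samples = [random.betavariate(a, b) for a, b in zip(alphas, betas)]
    # 2. Act greedily with respect to the sampled rates.
    arm = samples.index(max(samples))
    # 3. Observe the reward (1 = hit, 0 = miss) ...
    reward = pull(arm)
    # 4. ... and update only the chosen arm's counts.
    if reward:
        alphas[arm] += 1
    else:
        betas[arm] += 1
    return arm
```

Repeatedly calling `thompson_step` concentrates pulls on the arm with the highest true rate, while arms with wide posteriors still get sampled occasionally.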

Pages 15–19: Belief Adaptation

Page 20:

Bandit Regret
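The regret definition from the earlier slide can be computed directly once the arms' expected rewards are known. A small illustrative sketch (the function name is mine; in a real experiment the true rates are unknown and regret is estimated in simulation):

```python
def cumulative_regret(best_rate, chosen_rates):
    """R_T = sum over rounds of (best arm's expected reward - chosen arm's).

    best_rate:    expected reward of the optimal arm a*
    chosen_rates: expected reward of the arm actually chosen at each round t
    """
    return sum(best_rate - r for r in chosen_rates)
```

A round where the best arm was chosen contributes zero, so a policy that locks onto the best arm quickly has regret that flattens out over time.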

Page 21:

But behavior is dependent on context

• Categorical contexts
  - One bandit model per category
  - One-hot context vector

• Real-valued contexts
  - Can capture interrelatedness of context dimensions
  - More difficult to incorporate effectively

Page 22:

So why would I ever A/B test again?

• Test intent: optimization vs understanding

• Difficulty with non-stationarity: Monday vs Friday behavior

• Deployment: few turnkey options, specialized skill set

https://vwo.com/blog/multi-armed-bandit-algorithm/

Page 23:

Bayesian Bandits for the Patient

1. Thompson Sampling balances exploitation & exploration while minimizing decision regret

2. No need to pre-specify decision splits or a time horizon for experiments

3. Can model a variety of problems and complex interactions

Page 24:

Resources

https://github.com/bgalbraith/bandits