Transcript
Page 1: Byron Galbraith, Chief Data Scientist, Talla, at MLconf NYC 2017

Bayesian Bandits

Byron Galbraith, PhD
Cofounder / Chief Data Scientist, Talla
2017.03.24

Page 2

Bayesian Bandits for the Impatient

1. Online adaptive learning: "Earn while you Learn"
2. Powerful alternative to A/B testing optimization
3. Can be efficient and easy to implement

Page 3

Dining Ware VR Experiences on Demand

Page 4

Dining Ware VR Experiences on Demand

Page 5

Iterated Decision Problems

What product recommendations should we present to subscribers to keep them engaged?

Page 6

A/B Testing

Page 7

Exploit vs Explore - What should we do?

Exploit: choose what seems best so far.
🙂 Feel good about our decision
🤔 There still may be something better

Explore: try something new.
😄 Discover a superior approach
😧 Regret our choice

Page 8

A/B/n Testing

Page 9

Regret - What did that experiment cost us?

Page 10

The Multi-Armed Bandit Problem

http://blog.yhat.com/posts/the-beer-bandit.html

Page 11

Bandit Solutions

𝑅𝑇=βˆ‘π‘‘=1

𝑇

[π‘Ÿ (π‘Œ 𝑑 (π‘Žβˆ— ))βˆ’π‘Ÿ (π‘Œ 𝑑 (π‘Žπ‘‘ )) ]

k-MAB =

π‘Žπ‘‘=argmax𝑖 [π‘Ÿ 𝑖𝑑+

π‘βˆš log 𝑑𝑛𝑖 ]

𝑃 (𝐴𝑑=π‘Ž )= 𝑒h π‘Žπ‘›

βˆ‘π‘=1

π‘˜

𝑒h𝑏𝑛=πœ‹π‘‘ (π‘Ž)

𝑃 (𝑋=π‘₯ )=π‘₯π›Όβˆ’1 (1βˆ’π‘₯ )π›½βˆ’ 1

𝐡 (𝛼 , 𝛽 )𝑃 (𝑋=π‘₯ )=(𝑛π‘₯ )𝑝π‘₯ (1βˆ’π‘ )π‘›βˆ’π‘₯

π΅π‘’π‘‘π‘Žπ‘Ž(𝛼+π‘Ÿπ‘Ž , 𝛽+π‘βˆ’π‘Ÿ π‘Ž)

𝑃 (𝑋|π‘Œ ,𝑍 )= 𝑃 (π‘Œ|𝑋 ,𝑍 )𝑃 ( 𝑋|𝑍 )𝑃 (π‘Œ|𝑍 )

Page 12

Thompson Sampling

$P(\theta \mid r, a) \propto P(r \mid \theta, a)\, P(\theta \mid a)$

Posterior ∝ Likelihood × Prior

Page 13

Bayesian Bandits – The Model

Model whether a recommendation will result in user engagement.

• Bernoulli distribution: $\theta$ = likelihood of the event occurring

How do we find $\theta$?
• Conjugate prior
• Beta distribution: $\alpha$ = number of hits, $\beta$ = number of misses

Only need to keep track of two numbers per option:
• # of hits, # of misses
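Because the Beta prior is conjugate to the Bernoulli likelihood, "two numbers per option" is literally all the state required. A minimal sketch using only the standard library (the option names and tallies are made up for illustration):

```python
import random

# Hypothetical per-option tallies: (hits, misses)
tallies = {"A": (12, 8), "B": (3, 1)}

def sample_engagement_rate(hits, misses, alpha=1, beta=1):
    # Under a Beta(alpha, beta) prior, the posterior after observing
    # `hits` successes and `misses` failures is Beta(alpha + hits, beta + misses)
    return random.betavariate(alpha + hits, beta + misses)

random.seed(0)
draws = {opt: sample_engagement_rate(h, m) for opt, (h, m) in tallies.items()}
best = max(draws, key=draws.get)
```

Option "B" has fewer observations, so its posterior is wider and its draws vary more, which is exactly what drives exploration.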

Page 14

Bayesian Bandits – The Algorithm

1. Initialize $\alpha_a = \beta_a = 1$ for every action $a$ (uniform prior)
2. For each user request for recommendations at time $t$:
   1. Sample $\theta_a \sim \mathrm{Beta}(\alpha_a, \beta_a)$ for each action
   2. Choose the action corresponding to the largest sampled $\theta_a$
   3. Observe the reward $r_t$
   4. Update $\alpha$ and $\beta$ for the chosen action
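The four steps above can be simulated end to end in a short loop. This is a sketch, not the speaker's code: the "true" engagement rates are invented so the Bernoulli rewards can be generated.

```python
import random

random.seed(42)

# Hypothetical true engagement rates, unknown to the algorithm
true_rates = [0.04, 0.06, 0.10]
k = len(true_rates)
alpha = [1] * k  # 1 + hits   (Beta(1, 1) uniform prior)
beta = [1] * k   # 1 + misses

for t in range(5000):
    # 1. Sample theta_a ~ Beta(alpha_a, beta_a) for each arm
    theta = [random.betavariate(alpha[a], beta[a]) for a in range(k)]
    # 2. Choose the arm with the largest sampled theta
    a = max(range(k), key=lambda i: theta[i])
    # 3. Observe a Bernoulli reward
    r = 1 if random.random() < true_rates[a] else 0
    # 4. Update the chosen arm's posterior
    alpha[a] += r
    beta[a] += 1 - r

pulls = [alpha[a] + beta[a] - 2 for a in range(k)]
```

Over the run, traffic concentrates on the arm with the highest true rate while the others still receive occasional exploratory pulls.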

Page 15

Belief Adaptation

Page 16

Belief Adaptation

Page 17

Belief Adaptation

Page 18

Belief Adaptation

Page 19

Belief Adaptation

Page 20

Bandit Regret

Page 21

But behavior is dependent on context

• Categorical contexts
  • One bandit model per category
  • One-hot context vector
• Real-valued contexts
  • Can capture interrelatedness of context dimensions
  • More difficult to incorporate effectively
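The categorical case ("one bandit model per category") is the easy one to sketch: keep an independent Beta-Bernoulli bandit per context key, created on demand. The class name and the "weekday" context are hypothetical.

```python
import random

random.seed(1)

class CategoricalBandits:
    """One independent Beta-Bernoulli Thompson-sampling bandit per category."""

    def __init__(self, n_arms):
        self.n_arms = n_arms
        self.stats = {}  # context -> list of [alpha, beta] pairs, one per arm

    def choose(self, context):
        arms = self.stats.setdefault(
            context, [[1, 1] for _ in range(self.n_arms)])
        samples = [random.betavariate(a, b) for a, b in arms]
        return max(range(self.n_arms), key=lambda i: samples[i])

    def update(self, context, arm, reward):
        pair = self.stats[context][arm]
        pair[0] += reward       # hit
        pair[1] += 1 - reward   # miss

bandits = CategoricalBandits(n_arms=3)
arm = bandits.choose("weekday")
bandits.update("weekday", arm, 1)
```

Each category learns its own arm preferences, but nothing is shared between categories; capturing that shared structure is what the harder real-valued contextual methods buy you.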

Page 22

So why would I ever A/B test again?

• Test intent: optimization vs understanding
• Difficulty with non-stationarity: Monday vs Friday behavior
• Deployment: few turnkey options, specialized skill set

https://vwo.com/blog/multi-armed-bandit-algorithm/

Page 23

Bayesian Bandits for the Patient

1. Thompson Sampling balances exploitation & exploration while minimizing decision regret
2. No need to pre-specify decision splits or a time horizon for experiments
3. Can model a variety of problems and complex interactions

Page 24

Resources

https://github.com/bgalbraith/bandits

