Byron Galbraith, Chief Data Scientist, Talla, at MLconf NYC 2017


TRANSCRIPT

Bayesian Bandits
Byron Galbraith, PhD
Cofounder / Chief Data Scientist, Talla
2017.03.24

Bayesian Bandits for the Impatient

1. Online adaptive learning: “Earn while you Learn”
2. Powerful alternative to A/B testing optimization
3. Can be efficient and easy to implement

Dining Ware VR Experiences on Demand

Iterated Decision Problems

What product recommendations should we present to subscribers to keep them engaged?

A/B Testing

Exploit vs Explore - What should we do?

Exploit: choose what seems best so far
🙂 Feel good about our decision
🤔 There still may be something better

Explore: try something new
😄 Discover a superior approach
😧 Regret our choice

A/B/n Testing

Regret - What did that experiment cost us?

The Multi-Armed Bandit Problem

http://blog.yhat.com/posts/the-beer-bandit.html

Bandit Solutions

Regret over $T$ rounds of the $k$-armed MAB ($k$-MAB):

$$R_T = \sum_{t=1}^{T} \big[\, r(Y_t(a^*)) - r(Y_t(a_t)) \,\big]$$

Upper Confidence Bound (UCB) action selection:

$$a_t = \operatorname*{argmax}_i \left[ \bar{r}_{i,t} + c \sqrt{\frac{\log t}{n_i}} \right]$$

Softmax (gradient bandit) policy:

$$P(A_t = a) = \frac{e^{H_t(a)}}{\sum_{b=1}^{k} e^{H_t(b)}} = \pi_t(a)$$
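To ground these formulas, here is a minimal numpy sketch of UCB selection, the softmax policy, and cumulative regret. It is not from the talk: the function names and the expectation-based regret shortcut are my own assumptions.

```python
import numpy as np

def ucb_action(mean_rewards, counts, t, c=2.0):
    # Play every arm once first; log(t)/n_i is undefined when n_i = 0.
    untried = np.flatnonzero(counts == 0)
    if untried.size > 0:
        return int(untried[0])
    # Empirical mean plus an exploration bonus that shrinks as an arm is sampled.
    return int(np.argmax(mean_rewards + c * np.sqrt(np.log(t) / counts)))

def softmax_policy(preferences):
    # pi_t(a) = exp(H_t(a)) / sum_b exp(H_t(b)); subtract the max for stability.
    z = np.exp(preferences - np.max(preferences))
    return z / z.sum()

def cumulative_regret(optimal_mean, chosen_arm_means):
    # R_T = sum_t [ r(a*) - r(a_t) ], computed here in expectation.
    return np.cumsum(optimal_mean - np.asarray(chosen_arm_means))
```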

Beta distribution (prior):

$$P(X = x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}$$

Binomial distribution (likelihood):

$$P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$$

Conjugate posterior update for arm $a$ after $r_a$ hits in $N$ trials:

$$\mathrm{Beta}_a(\alpha + r_a,\; \beta + N - r_a)$$

Bayes' rule:

$$P(X \mid Y, Z) = \frac{P(Y \mid X, Z)\, P(X \mid Z)}{P(Y \mid Z)}$$

Thompson Sampling

$$\underbrace{P(\theta \mid r, a)}_{\text{Posterior}} \;\propto\; \underbrace{P(r \mid \theta, a)}_{\text{Likelihood}} \; \underbrace{P(\theta \mid a)}_{\text{Prior}}$$

Bayesian Bandits – The Model
Model whether a recommendation will result in user engagement

• Bernoulli distribution: θ = likelihood of the event occurring

How do we find θ?
• Conjugate prior
• Beta distribution: α = number of hits, β = number of misses

Only need to keep track of two numbers per option:
• # of hits, # of misses
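As a quick illustration of that bookkeeping, a small scipy sketch; the hit/miss counts are hypothetical, chosen only for the example.

```python
from scipy import stats

alpha, beta = 1, 1      # Beta(1, 1) = uniform prior
hits, misses = 12, 38   # hypothetical counts for one option

# Conjugacy: the posterior is just Beta(alpha + hits, beta + misses).
posterior = stats.beta(alpha + hits, beta + misses)
print(posterior.mean())          # 0.25: estimated engagement rate
print(posterior.interval(0.95))  # credible interval narrows as data accrues
```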

Bayesian Bandits – The Algorithm
1. Initialize α_a = β_a = 1 for every action a (uniform prior)
2. For each user request for recommendations at time t:
   1. Sample θ̂_a ~ Beta(α_a, β_a) for each action
   2. Choose the action corresponding to the largest θ̂_a
   3. Observe the reward
   4. Update the chosen action's counts: a hit increments α_a, a miss increments β_a
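A minimal end-to-end sketch of this loop, assuming Bernoulli rewards. The class name and the simulated engagement rates are my own illustration; see the speaker's repo under Resources for fuller implementations.

```python
import numpy as np

class BernoulliThompsonBandit:
    """Thompson Sampling for k Bernoulli-reward arms with Beta(1, 1) priors."""

    def __init__(self, k):
        self.alpha = np.ones(k)  # prior + observed hits per arm
        self.beta = np.ones(k)   # prior + observed misses per arm

    def choose(self):
        # Steps 2.1-2.2: sample theta_a ~ Beta(alpha_a, beta_a), take the argmax.
        samples = np.random.beta(self.alpha, self.beta)
        return int(np.argmax(samples))

    def update(self, action, reward):
        # Steps 2.3-2.4: observe a 0/1 reward and update the posterior counts.
        self.alpha[action] += reward
        self.beta[action] += 1 - reward

# Hypothetical simulation: three options with unknown engagement rates.
true_rates = [0.05, 0.11, 0.08]
bandit = BernoulliThompsonBandit(len(true_rates))
for t in range(10000):
    a = bandit.choose()
    r = int(np.random.rand() < true_rates[a])
    bandit.update(a, r)
print(bandit.alpha / (bandit.alpha + bandit.beta))  # posterior means per arm
```

Because each arm is chosen in proportion to its probability of being the best, traffic concentrates on the winning arm while the others are still occasionally re-checked.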

Belief Adaptation

Bandit Regret

But behavior is dependent on context
• Categorical contexts
  • One bandit model per category (see the sketch below)
  • One-hot context vector
• Real-valued contexts
  • Can capture the interrelatedness of context dimensions
  • More difficult to incorporate effectively
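A sketch of the categorical case: one independent Beta-Bernoulli bandit per category, which is what a one-hot context vector amounts to. The class and method names are my own illustration, not from the talk; real-valued contexts would instead need something like a linear payoff model, which is the harder case the slide alludes to.

```python
import numpy as np
from collections import defaultdict

class PerCategoryBandit:
    """One independent Beta-Bernoulli Thompson bandit per categorical context."""

    def __init__(self, k):
        # Lazily create (alpha, beta) count arrays when a category first appears.
        self.models = defaultdict(lambda: (np.ones(k), np.ones(k)))

    def choose(self, category):
        alpha, beta = self.models[category]
        return int(np.argmax(np.random.beta(alpha, beta)))

    def update(self, category, action, reward):
        alpha, beta = self.models[category]
        alpha[action] += reward
        beta[action] += 1 - reward
```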

So why would I ever A/B test again?
• Test intent: optimization vs understanding
• Difficulty with non-stationarity: Monday vs Friday behavior
• Deployment: few turnkey options, specialized skill set

https://vwo.com/blog/multi-armed-bandit-algorithm/

Bayesian Bandits for the Patient
1. Thompson Sampling balances exploitation & exploration while minimizing decision regret
2. No need to pre-specify decision splits or a time horizon for experiments
3. Can model a variety of problems and complex interactions

Resources
https://github.com/bgalbraith/bandits
