TRANSCRIPT
Multi-Armed Bandits
Survey
Click Here
Click-through Rate (Clicks / Impressions): 20%
Click Here Click Here
Click-through Rate: 20% / ?
AB Test
• Randomized Controlled Experiment
• Show each button to 50% of users
AB Test Timeline
[Timeline] Before Test → AB Test → After Test (show winner)
Exploration Phase (Testing) → Exploitation Phase (Show Winner)
Click Here Click Here
Click-through Rate: 20% / 30%
• 10,000 impressions/month
• Need 4,000 clicks by EOM (i.e., a 40% CTR)
• 30% CTR won't be enough
Need to keep testing (Exploration)
[Slide animation: more and more "Click Here" variants are added]
ABCDEFG... Test
Each variant is assigned with probability 1/N
N = # of variants
Not everyone is a winner
Need to keep testing (Exploration)
Need to minimize regret (Exploitation)
Multi-Armed Bandit
Balance of Exploitation & Exploration
Bandit Algorithm Balances Exploitation & Exploration
[Timeline comparison]
AB Test: Before Test → AB Test → After Test (Discrete Exploitation & Exploration Phases)
Multi-Armed Bandit: Before Test → Bandit (Continuous Exploitation & Exploration)
Bandit Favors Winning Arm
Bandit Algorithm Reduces Risk of Testing
AB Test: Best arm exploited with probability 1/N
– More arms: less exploitation
Bandit: Best arm exploited with determined probability
– Reduced exposure to suboptimal arms
Demo
Borrowed from Probabilistic Programming & Bayesian Methods for Hackers
Split Test
Bandit
Winner Breaks Away!
Still sending losers
AB test would have cost 4.3 percentage points
How it works
Epsilon Greedy Algorithm
ε = Probability of Exploration
[Diagram] Start of round:
– With probability ε: Exploration, pick an arm uniformly (1/N each)
– With probability 1 - ε: Exploitation (show best arm)
Overall per round: best arm shown with probability 1 - ε, each other arm with probability ε/N
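A minimal sketch of the diagram above in Python, assuming Bernoulli click/no-click rewards (the class and method names are mine, not the talk's):

import random

class EpsilonGreedy:
    # Explore with probability epsilon; otherwise show the arm with the
    # best observed click-through rate. epsilon = 1 is a plain AB test.
    def __init__(self, n_arms, epsilon=0.1):
        self.epsilon = epsilon
        self.clicks = [0] * n_arms        # successes per arm
        self.impressions = [0] * n_arms   # pulls per arm

    def select_arm(self):
        if random.random() < self.epsilon:
            # Exploration: uniform choice, so each arm gets epsilon / N
            return random.randrange(len(self.clicks))
        # Exploitation: show the arm with the highest observed CTR
        ctr = [c / i if i else 0.0
               for c, i in zip(self.clicks, self.impressions)]
        return max(range(len(ctr)), key=ctr.__getitem__)

    def update(self, arm, clicked):
        self.impressions[arm] += 1
        self.clicks[arm] += int(clicked)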
Epsilon Greedy with ε = 1 = AB Test
Epsilon Greedy Issues
• Constant epsilon:
– Initially under-exploring
– Later over-exploring
– Better if probability of exploration decreases with sample size (annealing; see the sketch below)
• No prior knowledge
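A sketch of one possible annealing schedule (an illustrative choice, not one the talk specifies): ε starts near 1 and decays as impressions accumulate.

def annealed_epsilon(n_arms, total_impressions):
    # Near-pure exploration at the start, mostly exploitation later.
    return n_arms / (n_arms + total_impressions)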
Some Alternatives
• Epsilon-First
• Epsilon-Decreasing
• Softmax
• UCB (UCB1, UCB2)
• Bayesian-UCB
• Thompson Sampling (Bayesian Bandits)
Bandit Algorithm Comparison
Regret: the gap between the reward you would have earned by always showing the best arm and the reward you actually earned
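The comparison chart itself is not recoverable from this transcript; for reference, the standard cumulative regret over T rounds, where μ_a is arm a's true success rate, a_t is the arm pulled at round t, and μ* is the best rate, is:

\[
\mathrm{Regret}(T) = T\,\mu^{*} - \sum_{t=1}^{T} \mu_{a_t},
\qquad \mu^{*} = \max_{a} \mu_{a}
\]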
Thompson Sampling
Setup: Assign each arm a Beta distribution with parameters (α, β) (# Successes, # Failures)
Click Here Click Here Click Here
Beta(α,β) Beta(α,β) Beta(α,β)
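Because the Beta prior is conjugate to the Bernoulli click outcome, each observation is a single-count update (a standard fact the walkthrough below uses):

\[
\mathrm{Beta}(\alpha, \beta) \xrightarrow{\text{success}} \mathrm{Beta}(\alpha + 1, \beta),
\qquad
\mathrm{Beta}(\alpha, \beta) \xrightarrow{\text{failure}} \mathrm{Beta}(\alpha, \beta + 1)
\]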
Thompson Sampling
Setup: Initialize priors with the ignorant state Beta(1,1) (uniform distribution)
– Or initialize with an informed prior to aid convergence
Click Here Click Here Click Here
Beta(1,1) Beta(1,1) Beta(1,1)
For each round:
Thompson Sampling
Click Here Click Here Click Here
Beta(1,1) Beta(1,1) Beta(1,1)
1: Sample random variable X from each arm’s Beta Distribution
2: Select the arm with largest X
3: Observe the result of selected arm
4: Update prior Beta distribution for selected arm
X = 0.7, 0.2, 0.4 → largest X is arm 1 → Success!
For each round:
Thompson Sampling
Click Here Click Here Click Here
Beta(2,1) Beta(1,1) Beta(1,1)
1: Sample random variable X from each arm’s Beta Distribution
2: Select the arm with largest X
3: Observe the result of selected arm
4: Update prior Beta distribution for selected arm
For each round:
Thompson Sampling
Click Here Click Here Click Here
Beta(2,1) Beta(1,1) Beta(1,1)
1: Sample random variable X from each arm’s Beta Distribution
2: Select the arm with largest X
3: Observe the result of selected arm
4: Update prior Beta distribution for selected arm
X = 0.4, 0.8, 0.2 → largest X is arm 2 → Failure!
For each round:
Thompson Sampling
Click Here Click Here Click Here
Beta(2,1) Beta(1,2) Beta(1,1)
1: Sample random variable X from each arm’s Beta Distribution
2: Select the arm with largest X
3: Observe the result of selected arm
4: Update prior Beta distribution for selected arm
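A minimal sketch of the four steps above, assuming Bernoulli rewards and Beta(1,1) priors (names are illustrative):

import random

class ThompsonSampling:
    def __init__(self, n_arms):
        self.alpha = [1] * n_arms  # 1 + successes per arm
        self.beta = [1] * n_arms   # 1 + failures per arm

    def select_arm(self):
        # Step 1: sample X from each arm's Beta posterior
        samples = [random.betavariate(a, b)
                   for a, b in zip(self.alpha, self.beta)]
        # Step 2: pick the arm with the largest X
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, success):
        # Step 4: bump the selected arm's posterior
        if success:
            self.alpha[arm] += 1
        else:
            self.beta[arm] += 1

Step 3 happens outside the class: show the selected arm, observe the click or non-click, then call update.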
Posterior after 100k pulls (30 arms)
Bandits at Meetup
Meetup’s First Bandit
Control: "Welcome To Meetup!" - 60% Open Rate
Winner: "Hi" - 75% Open Rate (+25%)
76 Arms
Avoid Linkbaity Subject Lines
Control: "Save 50%, start your Meetup Group" – 42% Open Rate
Winner: "Here is a coupon" – 53% Open Rate (+26%)
16 Arms
Coupon Email
398 Arms
210% Click-through Difference:
Best:
• "Looking to start the perfect Meetup for you? We'll help you find just the right people"
• "Start the perfect Meetup for you! We'll help you find just the right people"
Worst:
• "Launch your own Meetup in January and save 50%"
• "Start the perfect Meetup for you. 50% off promotion ends February 1st."
Choose the Right Metric of Success
• Success tied to click in last experiment
• Sale end & discount messaging had bad results
• Perhaps people don't know that hosting a Meetup costs $$$?
– Better to tie success to group creation
More Issues
• Email open & click delay
• New subject line effect
– Problem when testing notifications
• Monitor success trends to detect weirdness
Seasonality
• Thompson Sampling should naturally adapt to seasonal changes
– A learning rate can be added for faster adaptation (see the sketch below)
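One way to add that learning rate (an illustrative variant along the lines of discounted Thompson Sampling, not necessarily what Meetup ran): decay each arm's counts toward the Beta(1,1) prior every round, so old evidence fades and the sampler re-explores as seasons change.

def discount(alpha, beta, rate=0.999):
    # rate = 1.0 never forgets; smaller values forget faster.
    alpha = [1 + (a - 1) * rate for a in alpha]
    beta = [1 + (b - 1) * rate for b in beta]
    return alpha, beta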
[Slide: two "Click Here" variants; one labeled "Winner all other times"]
Bandit or Split Test?
AB Test good for:
- Biased Tests
- Complicated Tests
Bandit good for:
- Unbiased Tests
- Many Variants
- Time Constraints
- Set It And Forget It
Thanks!