TRANSCRIPT
Multi-Armed Bandits
Survey
Click Here
Click-through Rate (Clicks / Impressions): 20%
Click Here Click Here
Click-through Rate: 20% / ?
AB Test
• Randomized Controlled Experiment
• Show each button to 50% of users
AB Test Timeline
[Timeline] Before Test → AB Test → After Test (show winner)
Exploration Phase (Testing) → Exploitation Phase (Show Winner)
Click Here Click Here
Click-through Rate: 20% / 30%
• 10,000 impressions/month
• Need 4,000 clicks by EOM (i.e., a 40% CTR)
• 30% CTR won't be enough
Need to keep testing (Exploration)
[Slide animation: more and more "Click Here" variants are added]
ABCDEFG... Test
Each variant is assigned with probability 1/N
N = # of variants
Not everyone is a winner
Need to keep testing (Exploration)
Need to minimize regret (Exploitation)
Multi-Armed Bandit
Balance of Exploitation & Exploration
Bandit Algorithm Balances Exploitation & Exploration
[Timeline comparison]
AB Test: Before Test → AB Test → After Test (Discrete Exploitation & Exploration Phases)
Multi-Armed Bandit: Before Test → Bandit (Continuous Exploitation & Exploration)
Bandit Favors Winning Arm
Bandit Algorithm Reduces Risk of Testing
AB Test: Best arm exploited with probability 1/N
– More arms: less exploitation
Bandit: Best arm exploited with determined probability
– Reduced exposure to suboptimal arms
Demo
Borrowed from Probabilistic Programming & Bayesian Methods for Hackers
Split Test
Bandit
Winner Breaks Away!
Still sending losers
AB test would have cost 4.3 percentage points
How it works
Epsilon Greedy Algorithm
ε = Probability of Exploration
[Diagram] Start of round:
– With probability ε: Exploration, pick an arm uniformly (1/N each)
– With probability 1 - ε: Exploitation (show best arm)
Overall per round: best arm shown with probability 1 - ε, each other arm with probability ε/N
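A minimal sketch of the diagram above in Python, assuming Bernoulli click/no-click rewards (the class and method names are mine, not the talk's):

import random

class EpsilonGreedy:
    # Explore with probability epsilon; otherwise show the arm with the
    # best observed click-through rate. epsilon = 1 is a plain AB test.
    def __init__(self, n_arms, epsilon=0.1):
        self.epsilon = epsilon
        self.clicks = [0] * n_arms        # successes per arm
        self.impressions = [0] * n_arms   # pulls per arm

    def select_arm(self):
        if random.random() < self.epsilon:
            # Exploration: uniform choice, so each arm gets epsilon / N
            return random.randrange(len(self.clicks))
        # Exploitation: show the arm with the highest observed CTR
        ctr = [c / i if i else 0.0
               for c, i in zip(self.clicks, self.impressions)]
        return max(range(len(ctr)), key=ctr.__getitem__)

    def update(self, arm, clicked):
        self.impressions[arm] += 1
        self.clicks[arm] += int(clicked)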
Epsilon Greedy with ε = 1 = AB Test
Epsilon Greedy Issues
• Constant epsilon:
– Initially under-exploring
– Later over-exploring
– Better if probability of exploration decreases with sample size (annealing; see the sketch below)
• No prior knowledge
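A sketch of one possible annealing schedule (an illustrative choice, not one the talk specifies): ε starts near 1 and decays as impressions accumulate.

def annealed_epsilon(n_arms, total_impressions):
    # Near-pure exploration at the start, mostly exploitation later.
    return n_arms / (n_arms + total_impressions)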
Some Alternatives
• Epsilon-First
• Epsilon-Decreasing
• Softmax
• UCB (UCB1, UCB2)
• Bayesian-UCB
• Thompson Sampling (Bayesian Bandits)
Bandit Algorithm Comparison
Regret: the gap between the reward you would have earned by always showing the best arm and the reward you actually earned
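The comparison chart itself is not recoverable from this transcript; for reference, the standard cumulative regret over T rounds, where μ_a is arm a's true success rate, a_t is the arm pulled at round t, and μ* is the best rate, is:

\[
\mathrm{Regret}(T) = T\,\mu^{*} - \sum_{t=1}^{T} \mu_{a_t},
\qquad \mu^{*} = \max_{a} \mu_{a}
\]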
Thompson Sampling
Setup: Assign each arm a Beta distribution with parameters (α, β) (# Successes, # Failures)
Click Here Click Here Click Here
Beta(α,β) Beta(α,β) Beta(α,β)
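Because the Beta prior is conjugate to the Bernoulli click outcome, each observation is a single-count update (a standard fact the walkthrough below uses):

\[
\mathrm{Beta}(\alpha, \beta) \xrightarrow{\text{success}} \mathrm{Beta}(\alpha + 1, \beta),
\qquad
\mathrm{Beta}(\alpha, \beta) \xrightarrow{\text{failure}} \mathrm{Beta}(\alpha, \beta + 1)
\]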
Thompson Sampling
Setup: Initialize priors with the ignorant state Beta(1,1) (uniform distribution)
– Or initialize with an informed prior to aid convergence
Click Here Click Here Click Here
Beta(1,1) Beta(1,1) Beta(1,1)
For each round:
Thompson Sampling
Click Here Click Here Click Here
Beta(1,1) Beta(1,1) Beta(1,1)
1: Sample random variable X from each arm’s Beta Distribution
2: Select the arm with largest X
3: Observe the result of selected arm
4: Update prior Beta distribution for selected arm
X = 0.7, 0.2, 0.4 → largest X is arm 1 → Success!
For each round:
Thompson Sampling
Click Here Click Here Click Here
Beta(2,1) Beta(1,1) Beta(1,1)
1: Sample random variable X from each arm’s Beta Distribution
2: Select the arm with largest X
3: Observe the result of selected arm
4: Update prior Beta distribution for selected arm
For each round:
Thompson Sampling
Click Here Click Here Click Here
Beta(2,1) Beta(1,1) Beta(1,1)
1: Sample random variable X from each arm’s Beta Distribution
2: Select the arm with largest X
3: Observe the result of selected arm
4: Update prior Beta distribution for selected arm
X = 0.4, 0.8, 0.2 → largest X is arm 2 → Failure!
For each round:
Thompson Sampling
Click Here Click Here Click Here
Beta(2,1) Beta(1,2) Beta(1,1)
1: Sample random variable X from each arm’s Beta Distribution
2: Select the arm with largest X
3: Observe the result of selected arm
4: Update prior Beta distribution for selected arm
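A minimal sketch of the four steps above, assuming Bernoulli rewards and Beta(1,1) priors (names are illustrative):

import random

class ThompsonSampling:
    def __init__(self, n_arms):
        self.alpha = [1] * n_arms  # 1 + successes per arm
        self.beta = [1] * n_arms   # 1 + failures per arm

    def select_arm(self):
        # Step 1: sample X from each arm's Beta posterior
        samples = [random.betavariate(a, b)
                   for a, b in zip(self.alpha, self.beta)]
        # Step 2: pick the arm with the largest X
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, success):
        # Step 4: bump the selected arm's posterior
        if success:
            self.alpha[arm] += 1
        else:
            self.beta[arm] += 1

Step 3 happens outside the class: show the selected arm, observe the click or non-click, then call update.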
Posterior after 100k pulls (30 arms)
Bandits at Meetup
Meetup’s First Bandit
Control: "Welcome To Meetup!" - 60% Open Rate
Winner: "Hi" - 75% Open Rate (+25%)
76 Arms
Avoid Linkbaity Subject Lines
Control: "Save 50%, start your Meetup Group" – 42% Open Rate
Winner: "Here is a coupon" – 53% Open Rate (+26%)
16 Arms
Coupon Email
398 Arms
210% Click-through Difference:
Best:
• "Looking to start the perfect Meetup for you? We'll help you find just the right people"
• "Start the perfect Meetup for you! We'll help you find just the right people"
Worst:
• "Launch your own Meetup in January and save 50%"
• "Start the perfect Meetup for you. 50% off promotion ends February 1st."
Choose the Right Metric of Success
• Success tied to click in last experiment
• Sale end & discount messaging had bad results
• Perhaps people don't know that hosting a Meetup costs $$$?
– Better to tie success to group creation
More Issues
• Email open & click delay
• New subject line effect
– Problem when testing notifications
• Monitor success trends to detect weirdness
Seasonality
• Thompson Sampling should naturally adapt to seasonal changes
– A learning rate can be added for faster adaptation (see the sketch below)
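One way to add that learning rate (an illustrative variant along the lines of discounted Thompson Sampling, not necessarily what Meetup ran): decay each arm's counts toward the Beta(1,1) prior every round, so old evidence fades and the sampler re-explores as seasons change.

def discount(alpha, beta, rate=0.999):
    # rate = 1.0 never forgets; smaller values forget faster.
    alpha = [1 + (a - 1) * rate for a in alpha]
    beta = [1 + (b - 1) * rate for b in beta]
    return alpha, beta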
[Slide: two "Click Here" variants; one labeled "Winner all other times"]
Bandit or Split Test?
AB Test good for:
- Biased Tests
- Complicated Tests
Bandit good for:
- Unbiased Tests
- Many Variants
- Time Constraints
- Set It And Forget It
Thanks!