efÞcient bandit algorithms for online multiclass predictionshai/talks/banditron.pdf · efÞcient...

Sham M. Kakade, Shai Shalev-Shwartz, Ambuj Tewari

Efficient Bandit Algorithms for Online Multiclass Prediction

ICML, July 2008

Motivation

Online web advertisement systems

User submits a query

System (the learner) places an ad

User either “clicks” or ignores

Goal: Maximize number of “clicks”

Modeling?

Not the common online learning setting -- if the user ignores the ad, we don't learn which ad was “correct”

Not the common multi-armed bandit setting -- we are also provided with a query

Outline

Online Bandit Multi-class Categorization

Background: The Multi-class Perceptron

The Banditron

Analysis

Experiments

The Separable Case

Extensions and Open Problems

Online Bandit Multiclass Categorization

For t = 1, 2, . . . , T

Receive $x_t \in \mathbb{R}^d$ (query)

Predict $\hat{y}_t \in \{1, \dots, k\}$ (ad)

Pay $\mathbf{1}[\hat{y}_t \neq y_t]$ (click feedback)

$y_t$ is not revealed

Linear Hypotheses

A hypothesis is a mapping $h : \mathbb{R}^d \to \{1, \dots, k\}$

Linear hypothesis: there exists a $k \times d$ matrix $W$ s.t.

$h(x) = \arg\max_r \, (W x)_r$
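A minimal sketch of the linear prediction rule above, using plain Python lists. The function name `predict`, the 0-based labels (0, ..., k-1 instead of 1, ..., k), and breaking ties toward the smallest label are assumptions made for illustration; the slides do not specify them.

```python
from typing import Sequence


def predict(W: Sequence[Sequence[float]], x: Sequence[float]) -> int:
    """Return argmax_r (W x)_r for a k x d matrix W (labels are 0-based here)."""
    # One score per class: the inner product of row r of W with x.
    scores = [sum(w_j * x_j for w_j, x_j in zip(row, x)) for row in W]
    # max() returns the first maximal index, so ties go to the smallest label.
    return max(range(len(scores)), key=lambda r: scores[r])


# Illustrative 3-class, 2-feature example (values are made up).
W = [[1.0, 0.0],
     [0.0, 1.0],
     [-1.0, -1.0]]
label = predict(W, [0.2, 0.9])
```

Each row of `W` acts as a per-class scoring vector, so prediction costs O(kd) per example.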

The Multiclass Perceptron

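For t = 1, 2, . . . , T

Receive $x_t \in \mathbb{R}^d$

Predict $\hat{y}_t = \arg\max_r \, (W^t x_t)_r$

Receive $y_t$

Update: $W^{t+1} = W^t + U^t$ where

\[
U^t =
\begin{pmatrix}
0 & \cdots & 0 \\
\vdots & & \vdots \\
\cdots & x_t & \cdots \\
\vdots & & \vdots \\
\cdots & -x_t & \cdots \\
\vdots & & \vdots \\
0 & \cdots & 0
\end{pmatrix}
\begin{matrix}
\\ \\ \leftarrow \text{row } y_t \\ \\ \leftarrow \text{row } \hat{y}_t \\ \\ \\
\end{matrix}
\]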
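The full-information Perceptron loop above can be sketched as follows. This is an illustrative sketch, not the authors' code: the names `argmax_label` and `perceptron`, the 0-based labels, and the zero initialization of W are assumptions. When the prediction is correct, $x_t$ and $-x_t$ land in the same row and cancel, so the sketch only touches W on mistakes.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def argmax_label(W: Matrix, x: Sequence[float]) -> int:
    """argmax_r (W x)_r with ties broken toward the smallest label."""
    scores = [sum(w * v for w, v in zip(row, x)) for row in W]
    return max(range(len(scores)), key=lambda r: scores[r])


def perceptron(examples: Sequence[Tuple[Sequence[float], int]],
               k: int, d: int) -> Tuple[Matrix, int]:
    """Run the multiclass Perceptron; return the final W and the mistake count."""
    W = [[0.0] * d for _ in range(k)]  # assumed initialization: W^1 = 0
    mistakes = 0
    for x, y in examples:
        y_hat = argmax_label(W, x)
        if y_hat != y:
            mistakes += 1
            for j in range(d):
                W[y][j] += x[j]      # +x_t in row y_t
                W[y_hat][j] -= x[j]  # -x_t in row y_hat_t
    return W, mistakes


# Tiny made-up stream with two classes.
stream = [([1.0, 0.0], 0), ([0.0, 1.0], 1)] * 3
W_final, n_mistakes = perceptron(stream, k=2, d=2)
```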

Perceptron in the Bandit Setting

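\[
U^t =
\begin{pmatrix}
0 & \cdots & 0 \\
\vdots & & \vdots \\
\cdots & x_t & \cdots \\
\vdots & & \vdots \\
\cdots & -x_t & \cdots \\
\vdots & & \vdots \\
0 & \cdots & 0
\end{pmatrix}
\begin{matrix}
\\ \\ \leftarrow \text{row } y_t \\ \\ \leftarrow \text{row } \hat{y}_t \\ \\ \\
\end{matrix}
\]

Problem: We're blind to the value of $y_t$

Solution: Randomization can help!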

Exploration

Explore: instead of predicting $\hat{y}_t$, guess some $\tilde{y}_t$

Suppose we get the feedback 'correct', i.e. $\tilde{y}_t = y_t$

Then, we know that

$\hat{y}_t \neq y_t$

$y_t = \tilde{y}_t$

So, we can update $W$ using the matrix $U^t$

Exploration vs. Exploitation

But, if our current model is correct, i.e. $\hat{y}_t = y_t$

And, we guess some other $\tilde{y}_t$

Then, we both suffer loss and do not know how to update $W$

In this case, it's better to exploit the quality of the current model

We control the exploration-exploitation tradeoff using randomization


The Banditron

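For t = 1, 2, . . . , T

Receive $x_t \in \mathbb{R}^d$

Set $\hat{y}_t = \arg\max_r \, (W^t x_t)_r$

Define: $P(r) = (1 - \gamma)\, \mathbf{1}[r = \hat{y}_t] + \frac{\gamma}{k}$

Randomly sample $\tilde{y}_t$ according to $P$

Predict $\tilde{y}_t$ and receive feedback $\mathbf{1}[\tilde{y}_t = y_t]$

Update: $W^{t+1} = W^t + \tilde{U}^t$ where

\[
\tilde{U}^t =
\begin{pmatrix}
0 & \cdots & 0 \\
\vdots & & \vdots \\
\cdots & \frac{\mathbf{1}[\tilde{y}_t = y_t]}{P(\tilde{y}_t)}\, x_t & \cdots \\
\vdots & & \vdots \\
\cdots & -x_t & \cdots \\
\vdots & & \vdots \\
0 & \cdots & 0
\end{pmatrix}
\begin{matrix}
\\ \\ \leftarrow \text{row } \tilde{y}_t \\ \\ \leftarrow \text{row } \hat{y}_t \\ \\ \\
\end{matrix}
\]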
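One round of the Banditron can be sketched as below. This is a hedged illustration rather than the authors' implementation: the function name `banditron_step`, the `feedback` callback (which stands in for the user's click and only answers whether the shown label is correct), the 0-based labels, and the toy data stream are all assumptions.

```python
import random
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def banditron_step(W: Matrix, x: Sequence[float], gamma: float,
                   feedback: Callable[[int], bool],
                   rng: random.Random) -> int:
    """Run one Banditron round, updating W in place; return the label shown."""
    k = len(W)
    scores = [sum(w * v for w, v in zip(row, x)) for row in W]
    y_hat = max(range(k), key=lambda r: scores[r])
    # P(r) = (1 - gamma) * 1[r = y_hat] + gamma / k
    probs = [(1.0 - gamma) * (r == y_hat) + gamma / k for r in range(k)]
    y_tilde = rng.choices(range(k), weights=probs)[0]
    correct = feedback(y_tilde)  # bandit feedback: 1[y_tilde = y_t] only
    for j, x_j in enumerate(x):
        if correct:
            W[y_tilde][j] += x_j / probs[y_tilde]  # importance-weighted +x_t
        W[y_hat][j] -= x_j                          # -x_t in row y_hat_t
    return y_tilde


# Toy stream (made up): three classes, the true label is never shown to the learner.
rng = random.Random(0)
W = [[0.0, 0.0] for _ in range(3)]
stream = [([1.0, 0.0], 0), ([0.0, 1.0], 1), ([-1.0, -1.0], 2)] * 200
errors = 0
for x, y in stream:
    shown = banditron_step(W, x, 0.1, lambda r, y=y: r == y, rng)
    errors += shown != y
```

Note that the learner only ever calls `feedback` on the label it actually showed, which is what distinguishes this loop from the full-information Perceptron.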

The Banditron Expected Update

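\[
\mathbb{E}[\tilde{U}^t] = \sum_r P(r)
\begin{pmatrix}
0 & \cdots & 0 \\
\vdots & & \vdots \\
\cdots & \frac{\mathbf{1}[y_t = r]}{P(r)}\, x_t & \cdots \\
\vdots & & \vdots \\
\cdots & -x_t & \cdots \\
\vdots & & \vdots \\
0 & \cdots & 0
\end{pmatrix}
\begin{matrix}
\\ \\ \leftarrow \text{row } r \\ \\ \leftarrow \text{row } \hat{y}_t \\ \\ \\
\end{matrix}
= U^t
\]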
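The identity above can be checked numerically by enumerating every possible sampled label and weighting the resulting update by its probability. The sketch below assumes 0-based labels and $\gamma > 0$ (so that every $P(r)$ is positive); the function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def perceptron_update(x: Sequence[float], y: int, y_hat: int, k: int) -> Matrix:
    """U^t: +x in row y, -x in row y_hat (the two cancel when y == y_hat)."""
    return [[x_j * ((r == y) - (r == y_hat)) for x_j in x] for r in range(k)]


def banditron_update(x: Sequence[float], y: int, y_hat: int, y_tilde: int,
                     probs: Sequence[float]) -> Matrix:
    """U-tilde^t for one sampled label y_tilde (requires probs[r] > 0)."""
    k = len(probs)
    return [[x_j * ((y_tilde == y) * (r == y_tilde) / probs[r] - (r == y_hat))
             for x_j in x] for r in range(k)]


def expected_banditron_update(x: Sequence[float], y: int, y_hat: int,
                              gamma: float, k: int) -> Matrix:
    """Sum over all sampled labels r of P(r) * U-tilde^t(r)."""
    probs = [(1.0 - gamma) * (r == y_hat) + gamma / k for r in range(k)]
    expected = [[0.0] * len(x) for _ in range(k)]
    for y_tilde in range(k):
        U_tilde = banditron_update(x, y, y_hat, y_tilde, probs)
        for r in range(k):
            for j in range(len(x)):
                expected[r][j] += probs[y_tilde] * U_tilde[r][j]
    return expected


# Illustrative values: 3 classes, true label 2, greedy prediction 0.
E = expected_banditron_update([0.5, -2.0], y=2, y_hat=0, gamma=0.3, k=3)
U = perceptron_update([0.5, -2.0], y=2, y_hat=0, k=3)
max_gap = max(abs(E[r][j] - U[r][j]) for r in range(3) for j in range(2))
```

Up to floating-point rounding, `max_gap` should be zero: the importance weight $1/P(r)$ exactly compensates for how rarely the correct label is sampled.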

Analysis: The Hinge-Loss

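[Figure: the hinge loss upper-bounding the zero-one loss, with axis labels $(W x_t)_{y_t}$ and $(W x_t)_r$; an annotation marks “The Separable Case”.]

$\ell_t(W) = \max_{r \neq y_t} \left[ 1 - (W x_t)_{y_t} + (W x_t)_r \right]_+$

$\ell_t(W) \geq \mathbf{1}[\hat{y}_t \neq y_t]$ when $\hat{y}_t = \arg\max_r (W x_t)_r$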
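A small sketch of the multiclass hinge loss above. It assumes the loss is clipped at zero (the positive part), which is what makes it an upper bound on the zero-one error; the function name `hinge_loss` and the 0-based labels are also assumptions.

```python
from typing import Sequence


def hinge_loss(W: Sequence[Sequence[float]], x: Sequence[float], y: int) -> float:
    """max over r != y of max(0, 1 - (W x)_y + (W x)_r); needs at least 2 classes."""
    scores = [sum(w * v for w, v in zip(row, x)) for row in W]
    return max(max(0.0, 1.0 - scores[y] + scores[r])
               for r in range(len(W)) if r != y)


# Illustrative two-class example (values are made up).
W = [[1.0, 0.0],
     [0.0, 1.0]]
loss_large_margin = hinge_loss(W, [2.0, 0.0], 0)   # margin 2 >= 1, so loss 0
loss_small_margin = hinge_loss(W, [0.5, 0.0], 0)   # correct, but margin 0.5 < 1
loss_wrong = hinge_loss(W, [2.0, 0.0], 1)          # wrong label wins by 2
```

The loss is zero only when the correct class beats every other class by a margin of at least 1, so it also penalizes correct but low-confidence predictions.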

Mistake Bounds

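Perceptron:

$M \leq L + D + \sqrt{L D}$

Banditron:

$\mathbb{E}[M] \leq L + \gamma T + 3 \max\left\{ \frac{k D}{\gamma}, \sqrt{D \gamma T} \right\} + \sqrt{\frac{k D L}{\gamma}}$

Symbol      Meaning
M           # mistakes
L           competitor loss, $\sum_t \ell_t(W^\star)$
D           competitor margin, $\|W^\star\|_F^2$
k           # classes
T           # rounds
$\gamma$    exploration-exploitation parameter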
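To get a feel for how the two bounds behave, they can be evaluated directly. The sketch below simply transcribes the two formulas; the function names, the grid search over $\gamma$, and all numeric values are illustrative assumptions and are not taken from the talk.

```python
import math
from typing import Sequence


def perceptron_bound(L: float, D: float) -> float:
    """M <= L + D + sqrt(L D)."""
    return L + D + math.sqrt(L * D)


def banditron_bound(L: float, D: float, k: int, T: int, gamma: float) -> float:
    """E[M] <= L + gamma T + 3 max(k D / gamma, sqrt(D gamma T)) + sqrt(k D L / gamma)."""
    return (L + gamma * T
            + 3.0 * max(k * D / gamma, math.sqrt(D * gamma * T))
            + math.sqrt(k * D * L / gamma))


def best_gamma_on_grid(L: float, D: float, k: int, T: int,
                       grid: Sequence[float]) -> float:
    """Return the grid value of gamma with the smallest Banditron bound."""
    return min(grid, key=lambda g: banditron_bound(L, D, k, T, g))


# Hypothetical problem sizes, chosen only to exercise the formulas.
grid = [10.0 ** (-e / 4.0) for e in range(1, 17)]  # 10^-0.25 down to 10^-4
gamma_star = best_gamma_on_grid(L=0.0, D=10.0, k=4, T=1_000_000, grid=grid)
```

The $\gamma T$ term grows with more exploration while the $k D / \gamma$ term shrinks, which is the tradeoff the parameter controls.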

Mistake Bounds (cont.)

                                     Perceptron        Banditron
No noise: $L = 0$                    $D$               $\sqrt{k D T}$
Low noise: $L = O(\sqrt{k D T})$     $\sqrt{k D T}$    $\sqrt{k D T}$
Noisy:                               $L + T^{1/2}$     $L + T^{2/3}$

Experiments

Reuters RCV1

~700k documents

Bag-of-words (d ~ 350k)

4 labels {CCAT, ECAT, GCAT, MCAT}

Synthetic separable data set

9 classes, d = 400, one million instances

A simple simulation of generating text documents

Synthetic non-separable data set

separable + 5% label noise

Experimental Results -- Reuters

[Figure: log-log plot of error rate vs. number of examples ($10^2$ to $10^6$) for Perceptron and Banditron, $\gamma = 0.05$. Annotation: similar slope.]

Experimental Results -- Separable Data

[Figure: log-log plot of error rate vs. number of examples ($10^2$ to $10^6$) for Perceptron and Banditron, $\gamma = 0.014$. Annotations: slope -1 and slope -0.55.]

Experimental Results -- 5% label noise

[Figure: log-log plot of error rate vs. number of examples ($10^2$ to $10^6$) for Perceptron and Banditron, $\gamma = 0.006$. Annotations: 13% and 10%.]

Exploration-Exploitation Parameter

[Figure: two panels, “Reuters” and “5% label noise”, each plotting error rate (0 to 1) against $\gamma$ ($10^{-3}$ to $10^0$, log scale) for Perceptron and Banditron.]

The Separable Case

Halving

Discretized hypothesis space

Predict by majority vote

Remove ’wrong’ hypotheses

Note: can be applied in the bandit setting

Mistake bound $O(k^2 d \log(D d))$

Using the JL lemma we can also obtain $O\left(k^2 D \log\left(\frac{T + k}{\delta}\right) \log(D)\right)$
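The Halving idea listed above can be sketched for a finite set of hypotheses as follows. This is one plausible reading of the slide, not the authors' construction: the discretization step is omitted, “majority vote” is taken as a plurality vote over labels, and the names `bandit_halving_round` and `feedback` are hypothetical. The sketch assumes the separable case, i.e. at least one hypothesis is always correct.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Hypothesis = Callable[[Sequence[float]], int]


def bandit_halving_round(hypotheses: List[Hypothesis], x: Sequence[float],
                         feedback: Callable[[int], bool]
                         ) -> Tuple[int, List[Hypothesis]]:
    """Predict by plurality vote, then drop hypotheses refuted by the feedback."""
    votes = Counter(h(x) for h in hypotheses)
    guess = votes.most_common(1)[0][0]
    if feedback(guess):
        # The guess was right: every hypothesis that disagreed is wrong.
        survivors = [h for h in hypotheses if h(x) == guess]
    else:
        # The guess was wrong: every hypothesis that voted for it is wrong.
        survivors = [h for h in hypotheses if h(x) != guess]
    return guess, survivors


# Three toy hypotheses; the third one is the consistent one (true label is 1).
pool: List[Hypothesis] = [lambda x: 0, lambda x: 0, lambda x: 1]
first_guess, pool_after_one = bandit_halving_round(pool, [1.0], lambda r: r == 1)
second_guess, pool_after_two = bandit_halving_round(pool_after_one, [1.0], lambda r: r == 1)
```

Under this reading, a mistake removes every hypothesis that voted for the plurality label, which is at least a 1/k fraction of the survivors; only the binary "correct or not" signal is needed to prune.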

Extensions and Open Problems

Label Ranking

Predicting a “label ranking”

How to interpret feedback ?

Multiplicative and Margin-based updates

Bandit versions of “Winnow” and “Passive-Aggressive”

Deterministic vs. Randomized strategies

Achievable rates ?

Efficient algorithms for the separable case ?
