Tutorial 11 (Computational Advertising)
DESCRIPTION
Part of the Search Engine course given at the Technion (2011)
TRANSCRIPT
Computational advertising
Kira Radinsky
Slides based on material from the paper
“Bandits for Taxonomies: A Model-based Approach” by
Sandeep Pandey, Deepak Agarwal, Deepayan Chakrabarti,
Vanja Josifovski, in SDM 2007
The Content Match Problem
[Diagram: advertisers submit ads into an ads DB; the ad server shows ads on webpages]
Ad impression: showing an ad to a user
The Content Match Problem
[Diagram: a user clicks on a displayed ad]
Ad click: a user click generates revenue for the ad server and the content provider
The Content Match Problem
[Diagram: ads from the ads DB are matched to webpages]
The Content Match Problem: match ads to pages to maximize clicks
The Content Match Problem
[Diagram: ads from the ads DB are matched to webpages]
Maximizing the number of clicks means: for each webpage, find the ad with the best Click-Through Rate (CTR), without wasting too many impressions on learning this.
Outline
• Problem
• Background: Multi-armed bandits
• Proposed Multi-level Policy
• Experiments
• Conclusions
Background: Bandits
Bandit "arms" with unknown payoff probabilities p1, p2, p3
Goal: pull arms sequentially so as to maximize the total expected reward
• Estimate the payoff probabilities pi
• Bias the estimation process towards better arms
Background: Bandits Solutions
• Try 1: Greedy Solution:
• Compute the sample mean of an arm A by dividing the total reward received from the arm by the number of times the arm has been pulled. At each time step choose the arm with
highest sample mean.
• Try 2: Naïve solution:
• Pull each arm an equal number of times.
• Epsilon-greedy strategy:
• The best bandit is selected for a proportion 1 − ε of the trials,
and another bandit is randomly selected (with uniform
probability) for a proportion ε.
• Many more strategies
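To make this concrete, here is a minimal sketch of the ε-greedy strategy in Python. This is an illustration, not code from the slides; the payoff probabilities in true_ctrs are made-up values for the simulation.

    import random

    def epsilon_greedy(true_ctrs, epsilon=0.1, n_steps=10000):
        """Pick a uniformly random arm with probability epsilon, else exploit."""
        n_arms = len(true_ctrs)
        pulls = [0] * n_arms         # times each arm has been pulled
        rewards = [0.0] * n_arms     # total reward collected per arm
        total_reward = 0.0
        for _ in range(n_steps):
            if random.random() < epsilon or 0 in pulls:
                arm = random.randrange(n_arms)  # explore: uniform random arm
            else:
                # Exploit: the arm with the highest sample mean so far.
                arm = max(range(n_arms), key=lambda a: rewards[a] / pulls[a])
            reward = 1.0 if random.random() < true_ctrs[arm] else 0.0
            pulls[arm] += 1
            rewards[arm] += reward
            total_reward += reward
        return total_reward

    # Three arms whose payoff probabilities are unknown to the policy:
    print(epsilon_greedy([0.02, 0.05, 0.01]))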
Ad matching as a bandit problem
[Diagram: each webpage (Webpage 1, Webpage 2, Webpage 3) is a bandit; the bandit "arms" are the ads]
~10^6 ads, ~10^9 pages
Ad matching as a bandit problem
[Diagram: a webpages × ads matrix]
Content Match = a matrix
• Each row is a bandit (one instance of the MAB problem)
• Each cell has an unknown CTR
Background: Bandits
A bandit policy:
1. Assign a priority to each arm
2. "Pull" the arm with the maximum priority, and observe the reward
3. Update the priorities
[Diagram: three arms with Priority 1, Priority 2, Priority 3; steps 1 and 2 are the allocation step, step 3 is the estimation step]
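The slides leave the priority function abstract. One standard instantiation (an assumption here, not necessarily the policy used in the paper) is UCB1, where an arm's priority is its sample mean plus a confidence bonus:

    import math

    class UCB1Bandit:
        """Priority-based bandit policy: priority = sample mean + confidence bonus."""

        def __init__(self, n_arms):
            self.pulls = [0] * n_arms      # times each arm has been pulled
            self.rewards = [0.0] * n_arms  # total reward collected per arm
            self.t = 0                     # total pulls across all arms

        def select_arm(self):
            # Allocation: pull every arm once, then pick the max-priority arm.
            for arm, n in enumerate(self.pulls):
                if n == 0:
                    return arm
            def priority(arm):
                mean = self.rewards[arm] / self.pulls[arm]
                return mean + math.sqrt(2.0 * math.log(self.t) / self.pulls[arm])
            return max(range(len(self.pulls)), key=priority)

        def update(self, arm, reward):
            # Estimation: fold the observed reward back into the statistics.
            self.t += 1
            self.pulls[arm] += 1
            self.rewards[arm] += reward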
Background: Bandits
Why not simply apply a bandit policy directly to the problem?
• Convergence is too slow: ~10^9 instances of the MAB problem (bandits), with ~10^6 arms per instance (bandit)
• Additional structure is available that can help: taxonomies
Outline
• Problem
• Background: Multi-armed bandits
• Proposed Multi-level Policy
• Experiments
• Conclusions
Multi-level Policy
[Diagram: ads and webpages, each grouped into classes]
Consider only two levels
Multi-level Policy
Consider only two levels
[Diagram: ad parent classes (Apparel, Computers, Travel), each split into ad child classes; a block of cells forms one MAB problem instance (bandit)]
Multi-level Policy
Key idea: CTRs in a block are homogeneous
[Diagram: the same grid of ad parent classes (Apparel, Computers, Travel) and ad child classes; each block is one MAB problem instance (bandit)]
Multi-level Policy
• CTRs in a block are homogeneous
  – Used in allocation (picking an ad for each new page)
  – Used in estimation (updating priorities after each observation)
Multi-level Policy
• CTRs in a block are homogeneous
  – Used in allocation (picking an ad for each new page)
  – Used in estimation (updating priorities after each observation)
Multi-level Policy (Allocation)
[Diagram: a page classifier maps the incoming webpage into the class taxonomy]
• Classify the webpage → page class, parent page class
• Run a bandit on the ad parent classes → pick one ad parent class
Multi-level Policy (Allocation)
[Diagram: the page classifier output guides a descent through the taxonomy to a final ad]
• Classify the webpage → page class, parent page class
• Run a bandit on the ad parent classes → pick one ad parent class
• Run a bandit among the cells → pick one ad class
• In general, continue from root to leaf → final ad
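A sketch of this two-level allocation, reusing the UCB1Bandit sketch above; classify_page, parent_bandits, and child_bandits are hypothetical structures introduced for illustration, not the paper's implementation:

    # Hypothetical two-level allocation. classify_page, parent_bandits and
    # child_bandits are assumed inputs, not structures from the paper.
    def allocate_ad(page, classify_page, parent_bandits, child_bandits):
        # Step 1: classify the webpage -> (parent page class, page class).
        parent_page_class, page_class = classify_page(page)

        # Step 2: run a bandit over the ad parent classes -> one ad parent class.
        ad_parent_class = parent_bandits[parent_page_class].select_arm()

        # Step 3: run a bandit among the cells of the chosen block -> one ad class.
        ad_class = child_bandits[(page_class, ad_parent_class)].select_arm()

        # In general, continue from root to leaf; with two levels this is the end.
        return ad_parent_class, ad_class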
Multi-level Policy (Allocation)
Bandits at higher levels
• use aggregated information
• have fewer bandit arms
so they quickly figure out the best ad parent class
Multi-level Policy
• CTRs in a block are homogeneous
  – Used in allocation (picking an ad for each new page)
  – Used in estimation (updating priorities after each observation)
Multi-level Policy (Estimation)
• CTRs in a block are homogeneous
– Observations from one cell also give information about others in the block
– How can we model this dependence?
Multi-level Policy (Estimation)
• Shrinkage model:
  S_cell | CTR_cell ~ Bin(N_cell, CTR_cell)
  CTR_cell ~ Beta(Params_block)
  where S_cell is the number of clicks in the cell and N_cell is the number of impressions in the cell
• All cells in a block come from the same distribution
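A tiny simulation of this generative model (illustrative only; the Beta parameters a_block and b_block are made-up values):

    import random

    def sample_block(n_cells, impressions, a_block=2.0, b_block=98.0):
        """Simulate one block: every cell's CTR comes from the same Beta prior."""
        cells = []
        for _ in range(n_cells):
            ctr = random.betavariate(a_block, b_block)  # CTR_cell ~ Beta(Params_block)
            # S_cell | CTR_cell ~ Bin(N_cell, CTR_cell)
            clicks = sum(random.random() < ctr for _ in range(impressions))
            cells.append((clicks, impressions, ctr))
        return cells

    print(sample_block(n_cells=3, impressions=1000))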
Multi-level Policy (Estimation)
• Intuitively, this leads to shrinkage of cell CTRs towards the block CTR:
  E[CTR] = α · Prior_block + (1 − α) · S_cell / N_cell
  where E[CTR] is the estimated CTR, Prior_block is the Beta prior (the "block CTR"), and S_cell / N_cell is the observed CTR
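A minimal sketch of this shrinkage estimate, assuming the block prior is Beta(a_block, b_block). The posterior mean is exactly the weighted combination above, with α = (a_block + b_block) / (a_block + b_block + N_cell); the paper's exact parameterization may differ.

    def shrunken_ctr(clicks_cell, impressions_cell, a_block, b_block):
        """Posterior-mean CTR of a cell under a Beta(a_block, b_block) block prior.

        Equals alpha * prior_mean + (1 - alpha) * observed_ctr, where
        alpha = (a_block + b_block) / (a_block + b_block + impressions_cell).
        """
        return (a_block + clicks_cell) / (a_block + b_block + impressions_cell)

    # A cell with few impressions is pulled strongly towards the block CTR (0.02):
    print(shrunken_ctr(clicks_cell=1, impressions_cell=10, a_block=2.0, b_block=98.0))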
Outline
• Problem
• Background: Multi-armed bandits
• Proposed Multi-level Policy
• Experiments
• Conclusions
Experiments [S. Pandey et al. 2007]
[Diagram: taxonomy structure; root at depth 0, 20 nodes at depth 1, 221 nodes at depth 2, ..., ~7000 leaves at depth 7; the policy uses the two levels at depths 1 and 2]
Experiments
• Data collected over a one-day period
• Collected from only one server, under some other ad-matching rules (not our bandit)
• ~229M impressions
• CTR values have been linearly transformed for confidentiality
Experiments (Multi-level Policy)
[Plot: number of clicks vs. number of pulls]
The multi-level policy gives a much higher number of clicks.
Experiments (Multi-level Policy)
[Plot: mean-squared error vs. number of pulls]
The multi-level policy gives a much better (lower) mean-squared error: it has learnt more from its explorations.
Conclusions
• In a CTR-guided system, exploration is a key component
• The short-term penalty of exploration needs to be limited (an exploration budget)
• Most exploration mechanisms use a weighted combination of the predicted CTR (mean) and the CTR uncertainty (variance)
• Explore in a reduced-dimensional space: the class hierarchy
• Traverse the hierarchy top-down to determine the class of the ad to show