Tutorial 11 (Computational Advertising)
DESCRIPTION
Part of the Search Engine course given at the Technion (2011)
TRANSCRIPT
Computational advertising
Kira Radinsky
Slides based on material from the paper
“Bandits for Taxonomies: A Model-based Approach” by
Sandeep Pandey, Deepak Agarwal, Deepayan Chakrabarti,
Vanja Josifovski, in SDM 2007
The Content Match Problem
[Diagram: advertisers submit ads into an ads DB; the ad server shows ads on webpages]
Ad impression: showing an ad to a user
The Content Match Problem
[Diagram: a user clicks on a displayed ad]
Ad click: a user click generates revenue for the ad server and the content provider
The Content Match Problem
[Diagram: ads from the ads DB are matched to webpages]
The Content Match Problem: match ads to pages to maximize clicks
The Content Match Problem
[Diagram: ads from the ads DB are matched to webpages]
Maximizing the number of clicks means: for each webpage, find the ad with the best Click-Through Rate (CTR), without wasting too many impressions on learning this.
Outline
• Problem
• Background: Multi-armed bandits
• Proposed Multi-level Policy
• Experiments
• Conclusions
Background: Bandits
Bandit "arms" with unknown payoff probabilities p1, p2, p3
Goal: pull arms sequentially so as to maximize the total expected reward
• Estimate the payoff probabilities pi
• Bias the estimation process towards better arms
Background: Bandits Solutions
• Try 1: Greedy Solution:
• Compute the sample mean of an arm A by dividing the total reward received from the arm by the number of times the arm has been pulled. At each time step choose the arm with
highest sample mean.
• Try 2: Naïve solution:
• Pull each arm an equal number of times.
• Epsilon-greedy strategy:
• The best bandit is selected for a proportion 1 − ε of the trials,
and another bandit is randomly selected (with uniform
probability) for a proportion ε.
• Many more strategies
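To make this concrete, here is a minimal sketch of the ε-greedy strategy in Python. This is an illustration, not code from the slides; the payoff probabilities in true_ctrs are made-up values for the simulation.

    import random

    def epsilon_greedy(true_ctrs, epsilon=0.1, n_steps=10000):
        """Pick a uniformly random arm with probability epsilon, else exploit."""
        n_arms = len(true_ctrs)
        pulls = [0] * n_arms         # times each arm has been pulled
        rewards = [0.0] * n_arms     # total reward collected per arm
        total_reward = 0.0
        for _ in range(n_steps):
            if random.random() < epsilon or 0 in pulls:
                arm = random.randrange(n_arms)  # explore: uniform random arm
            else:
                # Exploit: the arm with the highest sample mean so far.
                arm = max(range(n_arms), key=lambda a: rewards[a] / pulls[a])
            reward = 1.0 if random.random() < true_ctrs[arm] else 0.0
            pulls[arm] += 1
            rewards[arm] += reward
            total_reward += reward
        return total_reward

    # Three arms whose payoff probabilities are unknown to the policy:
    print(epsilon_greedy([0.02, 0.05, 0.01]))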
Ad matching as a bandit problem
[Diagram: each webpage (Webpage 1, Webpage 2, Webpage 3) is a bandit; the bandit "arms" are the ads]
~10^6 ads, ~10^9 pages
Ad matching as a bandit problem
[Diagram: a webpages × ads matrix]
Content Match = a matrix
• Each row is a bandit (one instance of the MAB problem)
• Each cell has an unknown CTR
Background: Bandits
A bandit policy:
1. Assign a priority to each arm
2. "Pull" the arm with the maximum priority, and observe the reward
3. Update the priorities
[Diagram: three arms with Priority 1, Priority 2, Priority 3; steps 1 and 2 are the allocation step, step 3 is the estimation step]
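The slides leave the priority function abstract. One standard instantiation (an assumption here, not necessarily the policy used in the paper) is UCB1, where an arm's priority is its sample mean plus a confidence bonus:

    import math

    class UCB1Bandit:
        """Priority-based bandit policy: priority = sample mean + confidence bonus."""

        def __init__(self, n_arms):
            self.pulls = [0] * n_arms      # times each arm has been pulled
            self.rewards = [0.0] * n_arms  # total reward collected per arm
            self.t = 0                     # total pulls across all arms

        def select_arm(self):
            # Allocation: pull every arm once, then pick the max-priority arm.
            for arm, n in enumerate(self.pulls):
                if n == 0:
                    return arm
            def priority(arm):
                mean = self.rewards[arm] / self.pulls[arm]
                return mean + math.sqrt(2.0 * math.log(self.t) / self.pulls[arm])
            return max(range(len(self.pulls)), key=priority)

        def update(self, arm, reward):
            # Estimation: fold the observed reward back into the statistics.
            self.t += 1
            self.pulls[arm] += 1
            self.rewards[arm] += reward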
Background: Bandits
Why not simply apply a bandit policy directly to the problem?
• Convergence is too slow: ~10^9 instances of the MAB problem (bandits), with ~10^6 arms per instance (bandit)
• Additional structure is available that can help: taxonomies
Outline
• Problem
• Background: Multi-armed bandits
• Proposed Multi-level Policy
• Experiments
• Conclusions
Multi-level Policy
[Diagram: ads and webpages, each grouped into classes]
Consider only two levels
Multi-level Policy
Consider only two levels
[Diagram: ad parent classes (Apparel, Computers, Travel), each split into ad child classes; a block of cells forms one MAB problem instance (bandit)]
Multi-level Policy
Key idea: CTRs in a block are homogeneous
[Diagram: the same grid of ad parent classes (Apparel, Computers, Travel) and ad child classes; each block is one MAB problem instance (bandit)]
Multi-level Policy
• CTRs in a block are homogeneous
  – Used in allocation (picking an ad for each new page)
  – Used in estimation (updating priorities after each observation)
Multi-level Policy
• CTRs in a block are homogeneous
  – Used in allocation (picking an ad for each new page)
  – Used in estimation (updating priorities after each observation)
Multi-level Policy (Allocation)
[Diagram: a page classifier maps the incoming webpage into the class taxonomy]
• Classify the webpage → page class, parent page class
• Run a bandit on the ad parent classes → pick one ad parent class
Multi-level Policy (Allocation)
[Diagram: the page classifier output guides a descent through the taxonomy to a final ad]
• Classify the webpage → page class, parent page class
• Run a bandit on the ad parent classes → pick one ad parent class
• Run a bandit among the cells → pick one ad class
• In general, continue from root to leaf → final ad
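A sketch of this two-level allocation, reusing the UCB1Bandit sketch above; classify_page, parent_bandits, and child_bandits are hypothetical structures introduced for illustration, not the paper's implementation:

    # Hypothetical two-level allocation. classify_page, parent_bandits and
    # child_bandits are assumed inputs, not structures from the paper.
    def allocate_ad(page, classify_page, parent_bandits, child_bandits):
        # Step 1: classify the webpage -> (parent page class, page class).
        parent_page_class, page_class = classify_page(page)

        # Step 2: run a bandit over the ad parent classes -> one ad parent class.
        ad_parent_class = parent_bandits[parent_page_class].select_arm()

        # Step 3: run a bandit among the cells of the chosen block -> one ad class.
        ad_class = child_bandits[(page_class, ad_parent_class)].select_arm()

        # In general, continue from root to leaf; with two levels this is the end.
        return ad_parent_class, ad_class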
Multi-level Policy (Allocation)
Bandits at higher levels
• use aggregated information
• have fewer bandit arms
so they quickly figure out the best ad parent class
Multi-level Policy
• CTRs in a block are homogeneous
  – Used in allocation (picking an ad for each new page)
  – Used in estimation (updating priorities after each observation)
Multi-level Policy (Estimation)
• CTRs in a block are homogeneous
– Observations from one cell also give information about others in the block
– How can we model this dependence?
Multi-level Policy (Estimation)
• Shrinkage model:
  S_cell | CTR_cell ~ Bin(N_cell, CTR_cell)
  CTR_cell ~ Beta(Params_block)
  where S_cell is the number of clicks in the cell and N_cell is the number of impressions in the cell
• All cells in a block come from the same distribution
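A tiny simulation of this generative model (illustrative only; the Beta parameters a_block and b_block are made-up values):

    import random

    def sample_block(n_cells, impressions, a_block=2.0, b_block=98.0):
        """Simulate one block: every cell's CTR comes from the same Beta prior."""
        cells = []
        for _ in range(n_cells):
            ctr = random.betavariate(a_block, b_block)  # CTR_cell ~ Beta(Params_block)
            # S_cell | CTR_cell ~ Bin(N_cell, CTR_cell)
            clicks = sum(random.random() < ctr for _ in range(impressions))
            cells.append((clicks, impressions, ctr))
        return cells

    print(sample_block(n_cells=3, impressions=1000))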
Multi-level Policy (Estimation)
• Intuitively, this leads to shrinkage of cell CTRs towards the block CTR:
  E[CTR] = α · Prior_block + (1 − α) · S_cell / N_cell
  where E[CTR] is the estimated CTR, Prior_block is the Beta prior (the "block CTR"), and S_cell / N_cell is the observed CTR
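A minimal sketch of this shrinkage estimate, assuming the block prior is Beta(a_block, b_block). The posterior mean is exactly the weighted combination above, with α = (a_block + b_block) / (a_block + b_block + N_cell); the paper's exact parameterization may differ.

    def shrunken_ctr(clicks_cell, impressions_cell, a_block, b_block):
        """Posterior-mean CTR of a cell under a Beta(a_block, b_block) block prior.

        Equals alpha * prior_mean + (1 - alpha) * observed_ctr, where
        alpha = (a_block + b_block) / (a_block + b_block + impressions_cell).
        """
        return (a_block + clicks_cell) / (a_block + b_block + impressions_cell)

    # A cell with few impressions is pulled strongly towards the block CTR (0.02):
    print(shrunken_ctr(clicks_cell=1, impressions_cell=10, a_block=2.0, b_block=98.0))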
Outline
• Problem
• Background: Multi-armed bandits
• Proposed Multi-level Policy
• Experiments
• Conclusions
Experiments [S. Pandey et al. 2007]
[Diagram: taxonomy structure; root at depth 0, 20 nodes at depth 1, 221 nodes at depth 2, ..., ~7000 leaves at depth 7; the policy uses the two levels at depths 1 and 2]
Experiments
• Data collected over a one-day period
• Collected from only one server, under some other ad-matching rules (not our bandit)
• ~229M impressions
• CTR values have been linearly transformed for confidentiality
Experiments (Multi-level Policy)
[Plot: number of clicks vs. number of pulls]
The multi-level policy gives a much higher number of clicks.
Experiments (Multi-level Policy)
[Plot: mean-squared error vs. number of pulls]
The multi-level policy gives a much better (lower) mean-squared error: it has learnt more from its explorations.
Conclusions
• In a CTR-guided system, exploration is a key component
• The short-term penalty of exploration needs to be limited (an exploration budget)
• Most exploration mechanisms use a weighted combination of the predicted CTR (mean) and the CTR uncertainty (variance)
• Explore in a reduced-dimensional space: the class hierarchy
• Traverse the hierarchy top-down to determine the class of the ad to show