KDD2008
April 10, 2010
Scalable Ad Serving
Human Relevance Team
Pascale Queva
Woohee Kwak
Gang Wu and Brendan Kitts
http://team/sites/Broadmatch
Revenue Team
Shuzhen Nong
Paul Clark
Jeremy Tantrum
Test
Binu John
Harish Krishnan
Martin Markov
Gong Cheng
Deepika Othuluru Sharat
Development
Hung Nguyen
Ashok Madala
Gang Wu
Program Management
Gang Wu
Brendan Kitts
Brian Burdick
Algorithms
Ewa Dominowska
Shuzhen Nong
Susan Dumais
Donald Metzler
Chris Meek
Max Chickering
Jesper Lind
Abhinai Srivastava
Gang Wu
Hua Li
Jian Hu
Hua-Jun Zeng
Zheng Chen
Jody Biggs
Bo Thiesson
Kathy Dai
Silviu-Petru Cucerzan
Robert Ragno
The Ad Serving Problem
Banner Advertising
Ad Response
Trigger: Pageview
from User
The Ad Serving Problem
Trigger: Pageview from User
Ad Response
Ad Response
Paid Search Advertising
The Ad Serving Problem: Technical
Challenge to do this at Scale!
• Problem: Given any Trigger, respond with an Ad that maximizes Revenue…
• Scale: For a simple Bayesian or codebook method, Scale = Triggers × Ads
 · 5 million × 9 million = 45 trillion possible pairs to evaluate for suitability
• Speed: Ad serving should be completed in around 50 milliseconds.
 · Can’t store 45 trillion in memory.
• Ad Serving Algorithm: Maintain a codebook of triggers and the ads that should be presented, using a hash for rapid serving. Distribute the hash across machines.
• Data mining problem: Come up with a good code-book to use the precious memory resource.
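As a rough illustration of the serving path (the table contents, names, and shard count below are hypothetical, not from the paper), the codebook can be a hash map from trigger to a precomputed ad list, sharded across machines by a stable hash of the trigger:

```python
import zlib

# Hypothetical codebook: trigger -> precomputed ranked list of ads.
# In production this table would be sharded across machines; here it is a dict.
codebook = {
    "shoes": ["ad_nike_sneakers", "ad_running_shoes"],
    "flights": ["ad_cheap_airfare"],
}

def shard(trigger, num_machines=4):
    # Stable (non-randomized) hash so the same trigger always routes to
    # the same machine; Python's built-in hash() is salted per process.
    return zlib.crc32(trigger.encode()) % num_machines

def serve(trigger, k=1):
    # O(1) expected-time lookup, which is what makes a ~50 ms serving
    # budget feasible despite trillions of candidate (trigger, ad) pairs.
    return codebook.get(trigger, [])[:k]
```

The point of the sketch is that all data-mining cost is paid offline when the codebook is built; serving itself is a single hash lookup.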
The Ad Serving Problem: Definition
• Given any Trigger, respond with an Ad that maximizes Revenue subject to some constraints
• Constraints include:
 · Relevance: CTR > x
 · Storage limit: Number of code-book pairs < N
 · Frequency capping
 · Sequence constraints
 · Competitive exclusion
 · Mainline Reserve constraints
 · And lots more
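A minimal sketch of how two of these constraints might gate codebook admission (thresholds and names below are illustrative assumptions; the real system enforces many more constraints):

```python
# Illustrative thresholds, not the paper's actual values.
MIN_CTR = 0.01         # relevance constraint: CTR > x
MAX_PAIRS = 1_000_000  # storage constraint: number of code-book pairs < N

def admissible(pair_ctr, current_pairs):
    """Check the relevance and storage constraints for one candidate pair."""
    return pair_ctr > MIN_CTR and current_pairs < MAX_PAIRS
```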
• Let’s have a look at Revenue….
Revenue vs. Relevance
Revenue in the Ad Business
Revenue = Ik,t * rk,t * ck,t

Should we serve the ad? Ik,t (0 or 1) * revenue per action rk,t * probability of action ck,t
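A worked instance of this decomposition, with invented numbers:

```python
# All numbers invented for illustration.
I_kt = 1      # decide to serve the ad (0 or 1)
r_kt = 0.50   # revenue per action, e.g. dollars per click
c_kt = 0.02   # probability of action, e.g. predicted CTR

expected_revenue = I_kt * r_kt * c_kt  # dollars per impression
```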
Probability of Action (CTR)
Global CTR = Pr(k): CTR of the advertisement without condition / popularity of the advertisement.

Conditional CTR = Pr(k|t): CTR of the advertisement conditional upon the trigger – basic historical performance.

Smoothed CTR: smoothly vary between the two.

Feature-based model (dtree, linear regression, etc.): disadvantage is that this requires some knowledge of the ads.
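One common way to "smoothly vary between the two" is pseudo-count shrinkage toward the global CTR; the paper's exact smoothing may differ, so treat this as an illustrative sketch:

```python
def smoothed_ctr(clicks, impressions, global_ctr, m=100):
    """Shrink the empirical conditional CTR toward the global CTR.

    With no history (impressions == 0) this returns global_ctr; with
    ample history it approaches clicks / impressions. The pseudo-count
    m controls how much history is needed before the conditional
    estimate dominates.
    """
    return (clicks + m * global_ctr) / (impressions + m)
```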
Revenue = Ik,t * rk,t * ck,t
Ad Serving 101
[Figure: CTR Prediction Accuracy – predicted CTR curves for global CTR, history (conditional CTR), smoothed CTR, linear regression, and decision tree.]

Global CTR does surprisingly well…..

Conditional CTR does well but peters out because we lack data.

One could go model-less, at least for the top 15% of data as measured by conditional probabilities, and generate fairly good results. Ad servers purportedly use this technique….
Revenue in the Ad Business
Revenue = Ik,t * rk,t * ck,t

Should we serve the ad? Ik,t (0 or 1) * revenue per action rk,t * probability of action ck,t
Ad Serving: Solution
• Greedy optimization:
 · Add the Ik,t entries with the highest expected revenue (probability of action * payout for action) to the code-book.
 · Keep adding while the constraints are met.
Revenue = Ik,t * rk,t * ck,t

For each candidate (trigger, ad) pair, compute its expected revenue Ik,t * rk,t * ck,t = dk,t. Sort the pairs by dk,t and pick the highest E[Revenue] entries up to the capacity constraint.
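The sort-and-pick step above can be sketched as follows (the tuple layout and names are hypothetical, not from the paper):

```python
def build_codebook(candidates, capacity):
    """Greedily admit (trigger, ad) pairs by expected revenue r * c.

    candidates: iterable of (trigger, ad, revenue_per_action, ctr) tuples.
    capacity:   the storage constraint N on code-book pairs.
    """
    # Rank every candidate pair by expected revenue d = r * c, descending.
    ranked = sorted(candidates, key=lambda p: p[2] * p[3], reverse=True)
    codebook, n_pairs = {}, 0
    for trigger, ad, r, c in ranked:
        if n_pairs >= capacity:
            break  # storage constraint hit: stop admitting pairs
        codebook.setdefault(trigger, []).append(ad)
        n_pairs += 1
    return codebook
```

In the real system the admission loop would also enforce the relevance and other constraints listed earlier, not just capacity.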
[Figure: Greedy allocation of (trigger, ad) pairs to the ad server – predicted CTR (log scale, 10⁻⁷ to 10⁻¹) versus number of trigger-ads being served (0 to 3×10⁵).]
Some curious things about maximizing revenue….
[Figure: Revenue per display (y axis) versus global CTR of expansion (x axis); each knot is a decile of the trigger-ad population.]

Property noted by Jensen and other authors: a tendency for relevance to be correlated with revenue – advertisers have to be highly relevant to offer to pay such high prices, since otherwise they pay for lots of non-converting clicks.

Hey, what happened!?!? Might advertisers with poor CTRs be trying to make up for it by increasing their bid price?
Ad Serving Application
• Use a lookup table to map to keyword-tagged advertisement.
When a user types in “shoes”, map it to the keyword-tagged
advertisement “nike sneakers” (for example).
• The keyword tags and ad creatives are entered by the
advertiser.
• We can choose whether to add a code-book entry or leave it out.
Building it will be a piece of cake…. Not really!
• 1 year to launch
• 55 algorithms tested from 10 teams! It turned into a competition.
• Unexpected challenges including Porn, Trademark, Bad
expansions, Editorial policy, Adoption and acceptance by
internal teams
Results
• Implemented on Live.com search engine Paid Advertisements.
• Data for 4 months analyzed in this paper, although system has been
running for the past two years.
• 3 billion impressions
• Experimental test setup:
· Test split randomly on Live search traffic
· Control = Basic Ad Serving Algorithm
· Experimental = Optimized Ad Serving Algorithm
• Positive on all metrics including advertiser value, searcher value,
adCenter performance, but required some work to achieve this
Algorithms which are positive on both
CTR and RPS Oct-Nov 2006
[Figure: Scatter of algorithms by RPS % gain (x axis, 0.0%–3.5%, scale removed) versus CTR % gain (y axis, 0.0%–3.5%, scale removed); each point is one algorithm, labeled by ID.]
Ad Serving Revenue versus Control

[Figure: Smart vs. Control, May 2007 – Jan 2008 (scale removed): RPS +5.7%, RPBS +4.1%, CTR (CPBS) +0.7%.]
Ad Serving Revenue versus Control

[Figure: Smartmatch revenue versus control over time, y axis $0–$30,000,000 (scale removed).]
Algorithms in Public Domain
Alg14 and Alg24:
Jidong Wang, Hua-Jun Zeng, Zheng Chen, Hongjun Lu, Li Tao, Wei-Ying Ma. ReCoM: Reinforcement Clustering of Multi-Type Interrelated Data Objects. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'03), pp. 274-281, Toronto, Canada, July 2003.
http://team/sites/Broadmatch/Shared%20Documents/p16477-wang.pdf

Alg11:
Donald Metzler, Susan Dumais, Chris Meek (2006), Similarity Measures for Short Segments of Text, preprint.
http://team/sites/Broadmatch/Shared%20Documents/MetzlerDumaisMeekECIR07-Final.doc
Conclusion
• Greedy optimization method for maximizing Revenue or CTR.
• Used very simple features, e.g. CTR and Conditional CTR, as well as
more complex ones we haven’t discussed.
• Running live, at scale (7% US Traffic), with control groups
• Revenue and Relevance generally correlated (as noted by Jensen
and other authors), but very high revenue is not correlated with
relevance. Inverted “U” Shaped function! Hypothesis: High revenue
advertisers may be compensating for poor CTR by boosting their
Prices as high as possible.
• Conditional CTR and Global CTR are effective methods for predicting
ad performance. They also avoid training.
• Feature-based prediction was the most effective.