Jun Li, Peng Zhang, Yanan Cao, Ping Liu, Li Guo
Chinese Academy of Sciences; State Grid Energy Institute, China
Efficient Behavior Targeting Using SVM Ensemble Indexing
Behavior Targeting (BT) uses users’ historical behavior data to select the most relevant ads for display.
[Figure, example from Yahoo! Research: user behavior data, behavior targeting, ads, targeted users]
Regression for BT
Poisson regression model (Ye Chen, eBay, 2009)
• x: ad clicks and views, page views, search queries and clicks
• y: click-through rate (CTR)
Ye Chen et al., Large-scale behavior targeting (KDD’09 best paper award)
[Figure: view data and click data, each with a Poisson distribution; Poisson regression on views and Poisson regression on clicks, per ad category]
Limitations
• Parameter tuning is very difficult.
• The Poisson assumption does not always hold for real-world behavior data.
• Clicks are typically several orders of magnitude fewer than views.
• User interests are not always fixed; they are often transient.
Classification for BT
SVM for classification
Example 1: 3 users on ad a from Nikon (www.nikon.com)
[Figure: view data and click data per ad category; view-and-click data are positive examples (+), view-but-no-click data are negative examples (-); both feed an SVM for classification]
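The labeling rule in Example 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name label_users, the per-user count dictionaries, and the user ids are all assumptions made for the example.

```python
from typing import Dict


def label_users(views: Dict[str, int], clicks: Dict[str, int]) -> Dict[str, int]:
    """Label each user for one ad (hypothetical helper).

    +1: the user viewed the ad and clicked it.
    -1: the user viewed the ad but never clicked it.
    Users with no views are skipped, since they give no evidence either way.
    """
    labels: Dict[str, int] = {}
    for user, n_views in views.items():
        if n_views <= 0:
            continue
        labels[user] = 1 if clicks.get(user, 0) > 0 else -1
    return labels


# Three made-up users for a single ad.
views = {"u1": 3, "u2": 5, "u3": 2}
clicks = {"u1": 1, "u3": 0}
print(label_users(views, clicks))
```

The labeled users, together with their behavior features, would then form the training set for one SVM.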
Challenges 1,2,3
Classification for BT
Ensemble SVM on data streams
Merits
• No complicated parameters
• No statistical assumptions
• Dynamic model on data streams
Challenge 4
Limitations
The time cost of online ensemble prediction is heavy.
Time cost: A (advertisers) × W (ensemble size) × N (support vectors) × T (features)
Example 2: We collect 2 million behavior events (W = 10) in 1 minute, and prediction on them takes 53 minutes.
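To see where the A × W × N × T factor comes from, here is a rough sketch of prediction without any index. It assumes dense feature vectors, a linear kernel, and majority voting inside each ensemble; none of these details are stated on the slide, and the name naive_predict is hypothetical.

```python
from typing import List, Tuple

# One classifier: (support vectors, per-vector coefficients, bias).
Classifier = Tuple[List[List[float]], List[float], float]


def naive_predict(ensembles: List[List[Classifier]], x: List[float]):
    """Predict one label per advertiser and count multiply-add operations."""
    ops = 0
    labels = []
    for classifiers in ensembles:                 # A advertisers
        vote = 0
        for svs, coefs, bias in classifiers:      # W classifiers per ensemble
            score = bias
            for sv, coef in zip(svs, coefs):      # N support vectors
                for t in range(len(x)):           # T features, zeros included
                    score += coef * sv[t] * x[t]
                    ops += 1
            vote += 1 if score >= 0 else -1
        labels.append(1 if vote >= 0 else -1)     # majority vote (assumed)
    return labels, ops


# Toy sizes: A = 2, W = 3, N = 4, T = 5.
A, W, N, T = 2, 3, 4, 5
clf = ([[1.0] * T for _ in range(N)], [1.0] * N, 0.0)
ensembles = [[clf for _ in range(W)] for _ in range(A)]
labels, ops = naive_predict(ensembles, [1.0] * T)
print(labels, ops)  # ops equals A * W * N * T = 120
```

Every incoming event pays the full product, even when most feature values are zero, which is the cost the index is meant to avoid.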
Solutions
Construct an index structure for the ensemble SVM.
Why does the index work?
• It trades space for time.
• Multiple support vectors share features.
• Support vectors have a sparse structure.
[Figure, mapping: features map to text terms, a support vector maps to a document, the ensemble SVM maps to a document set]
P. Zhang et al., knowledge index for online data streams (KDD 2011 & ICDM 2011)
The index structure
The SVM-index structure
Example 3: based on Example 1, consider an SVM with 3 support vectors.
[Figure: the SVM-index structure, showing ensemble information, support vectors, and the inverted hashing table]
Time complexity O(T)
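A small sketch of an inverted hashing table for one SVM with 3 support vectors, in the spirit of Example 3. The feature names, values, and coefficients are invented for illustration, and a linear kernel is assumed; the structure on the slide may differ in detail.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

SparseVec = Dict[str, float]  # feature name -> value, zeros omitted


def build_inverted_table(support_vectors: List[SparseVec]) -> Dict[str, List[Tuple[int, float]]]:
    """Map each feature to the (support vector id, value) pairs that contain it."""
    table = defaultdict(list)
    for sv_id, sv in enumerate(support_vectors):
        for feature, value in sv.items():
            if value != 0.0:
                table[feature].append((sv_id, value))
    return dict(table)


def decision_value(table, coefs: List[float], bias: float, x: SparseVec) -> float:
    """Linear-kernel decision value, touching only features present in x."""
    dots = defaultdict(float)  # sv_id -> partial dot product with x
    for feature, x_val in x.items():
        for sv_id, sv_val in table.get(feature, ()):
            dots[sv_id] += sv_val * x_val
    return bias + sum(coefs[sv_id] * d for sv_id, d in dots.items())


# Hypothetical support vectors and coefficients.
svs = [
    {"camera": 1.0, "lens": 1.0},
    {"camera": 1.0, "travel": 1.0},
    {"travel": 1.0, "hotel": 1.0},
]
coefs = [0.5, 0.5, -1.0]
table = build_inverted_table(svs)
print(table["camera"])                                  # [(0, 1.0), (1, 1.0)]
print(decision_value(table, coefs, 0.0, {"camera": 1.0, "lens": 1.0}))  # 1.5
```

Because a shared feature such as "camera" is stored once with a list of the support vectors that use it, a lookup visits only the entries that can contribute to the score.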
The index structure
Operations
– Search: predict the label of each incoming user record x
  • Step 1: search for matching support vectors in the inverted indexes (left part of the structure)
  • Step 2: calculate x's class label
– Insert: integrate new classifiers into the ensemble
– Delete: drop outdated classifiers from the ensemble
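The three operations can be sketched as a small Python class. This is an illustration under assumptions, not the released source code: the class name EnsembleIndex, the posting layout, the linear kernel, and majority voting are all choices made for the example.

```python
from collections import defaultdict
from typing import Dict, List

SparseVec = Dict[str, float]


class EnsembleIndex:
    """Hypothetical shared inverted index over all classifiers in an ensemble."""

    def __init__(self) -> None:
        # feature -> {(classifier id, support vector id): value}
        self.postings = defaultdict(dict)
        # classifier id -> (coefficients, bias)
        self.classifiers = {}

    def insert(self, clf_id: str, support_vectors: List[SparseVec],
               coefs: List[float], bias: float) -> None:
        """Integrate a new classifier into the ensemble."""
        self.classifiers[clf_id] = (list(coefs), bias)
        for sv_id, sv in enumerate(support_vectors):
            for feature, value in sv.items():
                self.postings[feature][(clf_id, sv_id)] = value

    def delete(self, clf_id: str) -> None:
        """Drop an outdated classifier and all of its postings."""
        self.classifiers.pop(clf_id, None)
        for feature in list(self.postings):
            entries = self.postings[feature]
            for key in [k for k in entries if k[0] == clf_id]:
                del entries[key]
            if not entries:
                del self.postings[feature]

    def search(self, x: SparseVec) -> int:
        """Predict the label of x by majority vote over the classifiers."""
        # Step 1: collect partial dot products from the inverted index.
        dots = defaultdict(float)
        for feature, x_val in x.items():
            for key, sv_val in self.postings.get(feature, {}).items():
                dots[key] += sv_val * x_val
        # Step 2: turn the dot products into per-classifier scores and vote.
        scores = {cid: bias for cid, (_, bias) in self.classifiers.items()}
        for (cid, sv_id), d in dots.items():
            scores[cid] += self.classifiers[cid][0][sv_id] * d
        votes = sum(1 if s >= 0 else -1 for s in scores.values())
        return 1 if votes >= 0 else -1


index = EnsembleIndex()
index.insert("c1", [{"camera": 1.0}], [1.0], -0.5)
index.insert("c2", [{"travel": 1.0}], [1.0], -0.5)
index.insert("c3", [{"camera": 1.0, "lens": 1.0}], [1.0], -0.5)
print(index.search({"camera": 1.0}))  # 1: c1 and c3 vote +1, c2 votes -1
index.delete("c1")
index.delete("c3")
print(index.search({"camera": 1.0}))  # -1: only c2 remains
```

In this sketch delete scans every posting list; a per-classifier list of touched features would make removal cheaper at the cost of extra memory.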
Memory
See our source code.
Experiments
• Data sets: search engine data
• Comparisons
  – Poisson
  – E-SVM
  – E-Index (our method)
Observations
Comparisons
E-Index has sub-linear prediction time
E-SVM consumes more memory
Comparisons
Ensemble models are more accurate than Poisson regression model
Comparisons
The index method can significantly improve efficiency, especially when the ensemble size is large.
Related Work
• Behavior targeting: regression models vs. classification models
• Stream indexing: Boolean expression indexing in publish/subscribe systems
• Ensemble models: concept drift
Conclusions
Contributions
• Identify and address the prediction efficiency problem of ensemble models for behavior targeting.
• Convert the ensemble SVM model to a document set, and propose a new type of inverted text index structure to achieve sub-linear prediction time.
Future work
• Index more complicated SVM models with non-linear kernels.
For source code, visit our website: streamming.org/homepages/lijun.html
Questions?