
Jun Li, Peng Zhang, Yanan Cao, Ping Liu, Li Guo
Chinese Academy of Sciences
State Grid Energy Institute, China

Efficient Behavior Targeting Using SVM Ensemble Indexing


Behavior Targeting (BT) uses users’ historical behavior data to select the most relevant ads for display.

[Figure: behavior targeting example from Yahoo! Research, relating user behavior data, targeted users, and ads.]


Regression for BT

Poisson regression model (Ye Chen, eBay, 2009):
• x: ad clicks and views, page views, search queries and clicks
• y: click-through rate (CTR)

Ye Chen et al., Large-scale behavior targeting (KDD’09 best paper award)

[Figure: for each ad category, view data and click data are each modeled with a Poisson distribution, giving a Poisson regression on views and a Poisson regression on clicks.]
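A minimal sketch of the two-model setup in the figure, under our own assumptions: for one ad category, view counts and click counts are each modeled by a log-link Poisson regression, rate = exp(w · x), fitted here by plain batch gradient ascent on the log-likelihood, and CTR is estimated as predicted clicks divided by predicted views. The function names, the fitting procedure, and the toy data (including the hypothetical "heavy searcher" flag) are illustrative only; this is not the implementation from Chen et al.

```python
import math

def fit_poisson(X, y, lr=0.02, epochs=5000):
    """Fit a log-link Poisson regression (rate = exp(w . x)) by batch gradient
    ascent on the log-likelihood. Each row of X should include a constant 1.0
    as a bias feature."""
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            rate = math.exp(sum(wj * xj for wj, xj in zip(w, xi)))
            # gradient of y * log(rate) - rate with respect to w is (y - rate) * x
            for j, xj in enumerate(xi):
                grad[j] += (yi - rate) * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w

def predict_rate(w, x):
    return math.exp(sum(wj * xj for wj, xj in zip(w, x)))

def predict_ctr(w_click, w_view, x):
    # CTR estimate = expected clicks / expected views for the same user features
    return predict_rate(w_click, x) / predict_rate(w_view, x)

# Hypothetical toy data: features are [bias, heavy_searcher flag]
X = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
views = [8, 12, 18, 22]
clicks = [1, 1, 3, 5]
w_view = fit_poisson(X, views)
w_click = fit_poisson(X, clicks)
print(round(predict_ctr(w_click, w_view, [1.0, 0.0]), 3))
print(round(predict_ctr(w_click, w_view, [1.0, 1.0]), 3))
```

Because the two features here only identify two user groups, the fitted rates reproduce the group means (10 and 20 views, 1 and 4 clicks), so the script prints 0.1 and 0.2. Real behavior features would need scaling, and the sketch ignores the sparsity and scale issues a production system has to handle.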


Limitations

1. Parameter tuning is very difficult.
2. The Poisson assumption does not always hold for real-world behavior data.
3. Clicks are typically several orders of magnitude fewer than views.
4. User interests are not always fixed; they can be transient.
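One quick way to check the limitation that the Poisson assumption does not always hold is the dispersion index: a Poisson variable has variance equal to its mean, so a variance-to-mean ratio far above 1 signals overdispersion. The sketch below is our own illustration with made-up counts, not an analysis from the slides.

```python
from statistics import mean, pvariance

def dispersion_index(counts):
    """Variance-to-mean ratio of non-negative counts; equals 1 for a Poisson variable."""
    m = mean(counts)
    return pvariance(counts, m) / m

# Hypothetical per-user click counts: most users never click, a few click a lot
bursty = [0, 0, 0, 0, 0, 0, 0, 0, 1, 29]
balanced = [0, 3, 6, 3, 3, 3]
print(round(dispersion_index(bursty), 2))  # 25.07
print(dispersion_index(balanced))          # 1.0
```

Bursty click behavior of this kind is exactly where a single Poisson rate per user underestimates variability.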


Classification for BT: SVM for classification

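Example 1: 3 users on Nikon's (www.nikon.com) ad a.

[Figure: for each ad category, users who viewed and clicked the ad give positive examples (+), users who viewed but did not click give negative examples (-), and the labeled view and click data train an SVM classifier.]

Addresses challenges 1, 2, and 3.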
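A minimal sketch of the labeling rule in Example 1, assuming events arrive as (user, ad, action) tuples: users who viewed and clicked ad a become positive examples, and users who viewed it without clicking become negative examples. The tuple format and the function name are hypothetical; building each user's feature vector from their behavior history and training the SVM are not shown.

```python
def label_users(events, ad):
    """Return {user: +1 or -1} for every user who viewed `ad`:
    +1 if the user also clicked it, -1 if the user viewed it without clicking.
    `events` is an iterable of (user_id, ad_id, action), action in {"view", "click"}."""
    viewed, clicked = set(), set()
    for user, ad_id, action in events:
        if ad_id != ad:
            continue  # events on other ads are irrelevant to this ad's classifier
        if action == "view":
            viewed.add(user)
        elif action == "click":
            clicked.add(user)
    return {user: (1 if user in clicked else -1) for user in sorted(viewed)}

# Hypothetical log for three users and ad "a"
events = [
    ("u1", "a", "view"), ("u1", "a", "click"),
    ("u2", "a", "view"),
    ("u3", "a", "view"), ("u3", "b", "click"),
]
print(label_users(events, "a"))  # {'u1': 1, 'u2': -1, 'u3': -1}
```

Note that u3's click on a different ad does not make u3 a positive example for ad a.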


Classification for BT

Ensemble SVM on data streams

Merits:
• no complicated parameters
• no statistical assumptions
• a dynamic model on data streams

Addresses challenge 4.
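A minimal sketch of an ensemble on a data stream, under assumptions the slides do not spell out: the stream is cut into chunks, one classifier is trained per chunk, only the W most recent classifiers are kept, and the ensemble averages their decision values. `train_fn` is a hypothetical placeholder for training one SVM on a chunk; it must return a function that maps an input to a real-valued score.

```python
from collections import deque

class StreamEnsemble:
    """Keep the `window_size` most recent classifiers; the oldest is dropped automatically."""

    def __init__(self, window_size, train_fn):
        self.members = deque(maxlen=window_size)
        self.train_fn = train_fn

    def update(self, chunk):
        # Train a new classifier on the latest chunk (a list of (x, y) pairs)
        self.members.append(self.train_fn(chunk))

    def decision(self, x):
        # Unweighted average of member scores (an assumption; weighted voting is also common)
        return sum(member(x) for member in self.members) / len(self.members)

    def predict(self, x):
        return 1 if self.decision(x) >= 0 else -1

# Toy stand-in for SVM training: the score is the chunk's mean label (ignores x)
def mean_label_trainer(chunk):
    bias = sum(y for _, y in chunk) / len(chunk)
    return lambda x: bias

ens = StreamEnsemble(window_size=2, train_fn=mean_label_trainer)
ens.update([(None, 1), (None, 1)])
print(ens.predict(None))  # 1
ens.update([(None, -1), (None, -1)])
ens.update([(None, -1), (None, 1), (None, -1)])
print(len(ens.members), ens.predict(None))  # 2 -1
```

Because the window slides, classifiers trained on old behavior fall out of the ensemble, which is one way an ensemble can follow transient user interests.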


Limitations

Computing ensemble predictions online has a heavy time cost:

time cost = A × W × N × T, where A is the number of advertisers, W the ensemble size, N the number of support vectors, and T the number of features.

Example 2: we collect 2 million behavior events in 1 minute (W = 10), and computing the prediction results takes 53 minutes.
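To make the A × W × N × T cost concrete, here is a sketch of brute-force ensemble prediction with linear-kernel SVMs stored as dense support vectors; it counts multiplications in the innermost loop. The data layout (per-advertiser lists of (support vectors, coefficients, bias) triples, with each coefficient standing for α_i y_i) is our own assumption, and the sizes below are made up, not those of Example 2.

```python
def naive_predict_all(x, ensembles):
    """Brute-force scoring of one user vector x against every advertiser's ensemble.
    ensembles: {advertiser: [(support_vectors, coefs, bias), ...]}, each support
    vector a dense list of length T. Returns (scores, number_of_multiplications)."""
    mults = 0
    scores = {}
    for adv, members in ensembles.items():         # A advertisers
        total = 0.0
        for svs, coefs, bias in members:           # W classifiers per ensemble
            f = bias
            for sv, coef in zip(svs, coefs):       # N support vectors per classifier
                dot = 0.0
                for xj, sj in zip(x, sv):          # T features
                    dot += xj * sj
                    mults += 1
                f += coef * dot                    # linear kernel: K(sv, x) = sv . x
            total += f
        scores[adv] = total / len(members)         # average the W decision values
    return scores, mults

A, W, N, T = 3, 2, 4, 5
x = [1.0] * T
ensembles = {
    f"adv{a}": [([[0.1 * (a + k + i + j) for j in range(T)] for i in range(N)],
                 [1.0] * N, 0.0) for k in range(W)]
    for a in range(A)
}
scores, mults = naive_predict_all(x, ensembles)
print(mults, A * W * N * T)  # 120 120
```

Every support vector is visited for every user, even when the two share no non-zero features; that wasted work is what an index can avoid.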


Solutions

Construct an index structure for the ensemble SVM.

Why does the index work? It trades space for time by exploiting:
• features shared among multiple support vectors
• the sparse structure of support vectors

[Figure: mapping an ensemble SVM to a document set: each feature corresponds to a text term, each support vector to a document, and the ensemble SVM to a document set.]

P. Zhang et al., knowledge index for online data streams (KDD 2011 & ICDM 2011)
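A minimal sketch of the support-vector-to-document mapping, assuming sparse support vectors stored as {feature: value} dicts: each support vector is treated as a document and each feature as a term, so an inverted index maps a feature to the postings (classifier id, support vector id, value) that contain it. This layout is our assumption, not necessarily the authors' structure. The hypothetical `postings_touched` helper counts how much a lookup must read, which shows why shared features and sparsity help.

```python
from collections import defaultdict

def build_inverted_index(ensemble):
    """ensemble: list of classifiers; each classifier is a list of sparse support
    vectors ({feature: value}). Returns {feature: [(clf_id, sv_id, value), ...]}."""
    index = defaultdict(list)
    for clf_id, support_vectors in enumerate(ensemble):
        for sv_id, sv in enumerate(support_vectors):   # support vector = document
            for feature, value in sv.items():           # feature = term
                index[feature].append((clf_id, sv_id, value))
    return dict(index)

def postings_touched(index, x):
    """Number of (support vector, feature) pairs read when scoring sparse x."""
    return sum(len(index.get(f, ())) for f in x)

ensemble = [
    [{"camera": 1.0, "lens": 0.5}, {"travel": 2.0}],  # classifier 0
    [{"camera": 0.8}, {"lens": 1.2, "tripod": 0.3}],  # classifier 1
]
index = build_inverted_index(ensemble)
print(index["camera"])  # [(0, 0, 1.0), (1, 0, 0.8)]
print(postings_touched(index, {"camera": 1.0, "shoes": 1.0}))  # 2
```

A user vector whose only indexed feature is "camera" reads two postings, while a brute-force pass would visit all four support vectors across every feature.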


The index structure

The SVM-index structure. Example 3: based on Example 1, consider an SVM with 3 support vectors.

[Figure: the SVM-index, consisting of ensemble information, support vectors, and an inverted hashing table.]

Time complexity: O(T)


The index structure

Operations
– Search: predict the label of each incoming user record x.
  • Step 1: search for support vectors in the left inverted indexes.
  • Step 2: calculate x's class label.
– Insert: integrate new classifiers into the ensemble.
– Delete: drop outdated classifiers from the ensemble.
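The three operations can be sketched as follows, assuming linear-kernel SVMs (the conclusions list non-linear kernels as future work), sparse support vectors, and an unweighted average of member decision values. The class and method names are hypothetical. Search visits only support vectors that share a feature with x; Delete here simply rescans every postings list, which a real implementation would likely avoid.

```python
from collections import defaultdict, deque

class EnsembleIndex:
    """Inverted index over the support vectors of a sliding-window ensemble of linear SVMs."""

    def __init__(self):
        self.postings = defaultdict(list)  # feature -> [(clf_id, sv_id, value), ...]
        self.classifiers = {}              # clf_id -> (coefs, bias); coefs[i] stands for alpha_i * y_i
        self.order = deque()               # clf_ids, oldest first
        self.next_id = 0

    def insert(self, support_vectors, coefs, bias):
        """Insert: integrate a new classifier into the ensemble."""
        clf_id = self.next_id
        self.next_id += 1
        self.classifiers[clf_id] = (coefs, bias)
        self.order.append(clf_id)
        for sv_id, sv in enumerate(support_vectors):
            for feature, value in sv.items():
                self.postings[feature].append((clf_id, sv_id, value))
        return clf_id

    def delete_oldest(self):
        """Delete: drop the oldest classifier and all of its postings."""
        clf_id = self.order.popleft()
        del self.classifiers[clf_id]
        for feature in list(self.postings):
            kept = [p for p in self.postings[feature] if p[0] != clf_id]
            if kept:
                self.postings[feature] = kept
            else:
                del self.postings[feature]

    def search(self, x):
        """Search: return (label, averaged decision value) for sparse x ({feature: value})."""
        # Step 1: accumulate dot products only for support vectors sharing a feature with x
        dots = defaultdict(float)
        for feature, xv in x.items():
            for clf_id, sv_id, value in self.postings.get(feature, ()):
                dots[(clf_id, sv_id)] += xv * value
        # Step 2: turn partial dot products into each classifier's decision value
        scores = {clf_id: bias for clf_id, (_, bias) in self.classifiers.items()}
        for (clf_id, sv_id), dot in dots.items():
            scores[clf_id] += self.classifiers[clf_id][0][sv_id] * dot
        avg = sum(scores.values()) / len(scores)
        return (1 if avg >= 0 else -1), avg

idx = EnsembleIndex()
idx.insert([{"a": 1.0, "b": 2.0}, {"c": 1.0}], coefs=[0.5, -1.0], bias=0.1)
idx.insert([{"b": 1.0}], coefs=[2.0], bias=-0.5)
print(idx.search({"a": 1.0, "b": 1.0}))  # label 1, average about 1.55
idx.delete_oldest()
print(idx.search({"a": 1.0, "b": 1.0}))  # (1, 1.5)
```

For the first query, classifier 0 scores 0.1 + 0.5 × 3 = 1.6 and classifier 1 scores -0.5 + 2 × 1 = 1.5; the support vector holding only feature "c" is never touched.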

Memory: see our source code.


Experiments

Data sets: search engine data

• Comparisons:
– Poisson
– E-SVM
– E-Index (our method)


Observations

Comparisons

E-Index has sub-linear prediction time.

E-SVM consumes more memory.


Comparisons

Ensemble models are more accurate than Poisson regression model


Comparisons

The index method can significantly improve the efficiency, especially when the ensemble size is large.


Related Work

• Behavior targeting: regression models vs. classification models
• Stream indexing: Boolean expression indexing in publish/subscribe systems
• Ensemble models: concept drifting


Conclusions

Contributions
• Identify and address the prediction efficiency problem of ensemble models for behavior targeting.
• Convert the ensemble SVM model to a document set, and propose a new type of inverted text index structure to achieve sub-linear prediction time.

Future work
• Index more complicated SVM models with non-linear kernels.


For source code, visit our website: streamming.org/homepages/lijun.html

Questions?