
Post on 15-Feb-2016


Page 1: Recommendation in Advertising and Social Networks


Recommendation in Advertising and Social Networks

Deepayan Chakrabarti ([email protected])

Page 2: Recommendation in Advertising and Social Networks

This presentation

1) Content Match [KDD 2007]: How can we estimate the click-through rate (CTR) of an ad on a page?

Figure: a matrix of CTRs with ~10^9 pages and ~10^6 ads; entry (i, j) is the CTR for ad j on page i.

Page 3: Recommendation in Advertising and Social Networks

This presentation

1) Estimating CTR for Content Match [KDD ‘07]

2) Theoretical underpinnings [COLT ‘10 best student paper]

Represent relationships as a graph. Recommendation = link prediction. Many useful heuristics exist. Why do these heuristics work?

Goal: Suggest friends

Page 4: Recommendation in Advertising and Social Networks

Estimating CTR for Content Match: Contextual Advertising

Show an ad on a webpage (an “impression”). Revenue is generated if a user clicks. Problem: estimate the click-through rate (CTR) of an ad on a page.

Figure: a matrix of CTRs with ~10^9 pages and ~10^6 ads; entry (i, j) is the CTR for ad j on page i.

Page 5: Recommendation in Advertising and Social Networks

Estimating CTR for Content Match: Why not use the MLE?

1. Few (page, ad) pairs have N > 0 impressions.
2. Very few have c > 0 clicks as well.
3. The MLE, c/N, does not differentiate between 0/10 and 0/100.

We have additional information: hierarchies.
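Point 3 can be made concrete with a small Python sketch. The MLE c/N returns 0 for both 0/10 and 0/100. The `shrunk_ctr` helper and its parameters `prior_ctr` (a rate borrowed from a coarser level) and `prior_strength` (a pseudo-count) are hypothetical. They implement a simple pseudo-count smoother for illustration only, not the tree-structured model of this talk.

```python
def mle_ctr(clicks: int, impressions: int) -> float:
    """Maximum-likelihood CTR estimate c / N (undefined when N == 0)."""
    if impressions == 0:
        raise ValueError("the MLE is undefined for zero impressions")
    return clicks / impressions


def shrunk_ctr(clicks: int, impressions: int,
               prior_ctr: float, prior_strength: float) -> float:
    """Hypothetical pseudo-count estimate: blend the region's own counts with a
    CTR borrowed from a coarser level, worth `prior_strength` impressions."""
    return (clicks + prior_strength * prior_ctr) / (impressions + prior_strength)


# The MLE cannot tell 0/10 from 0/100 ...
print(mle_ctr(0, 10), mle_ctr(0, 100))  # 0.0 0.0
# ... whereas borrowing strength from a coarser estimate can: more impressions
# with still no clicks push the estimate further below prior_ctr.
print(shrunk_ctr(0, 10, prior_ctr=0.01, prior_strength=50),
      shrunk_ctr(0, 100, prior_ctr=0.01, prior_strength=50))
```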

Page 6: Recommendation in Advertising and Social Networks

Estimating CTR for Content Match: Use an existing, well-understood hierarchy

Categorize ads and webpages into leaves of the hierarchy.

CTR estimates of siblings are correlated. The hierarchy allows us to aggregate data: coarser resolutions provide reliable estimates for rare events, which in turn influence estimation at finer resolutions.
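The aggregation step can be sketched in Python. The sketch assumes that a region at depth ℓ is identified by the first ℓ categories of its page path and of its ad path. The category names and counts are hypothetical.

```python
from collections import defaultdict

# Hypothetical leaf-level counts keyed by (page path, ad path). Each path lists
# categories from the root down; each value is (clicks, impressions).
leaf_counts = {
    (("Sports", "Tennis"), ("Apparel", "Shoes")): (0, 40),
    (("Sports", "Golf"), ("Apparel", "Shoes")): (1, 60),
    (("Sports", "Tennis"), ("Apparel", "Hats")): (0, 25),
}


def aggregate_to_depth(counts, depth):
    """Sum clicks and impressions over all regions that share the same
    page ancestor and ad ancestor at the given depth."""
    totals = defaultdict(lambda: [0, 0])
    for (page_path, ad_path), (clicks, impressions) in counts.items():
        key = (page_path[:depth], ad_path[:depth])
        totals[key][0] += clicks
        totals[key][1] += impressions
    return {key: tuple(value) for key, value in totals.items()}


# Two of the three leaves have no clicks, but the coarser region pools
# 1 click over 125 impressions.
print(aggregate_to_depth(leaf_counts, 1))  # {(('Sports',), ('Apparel',)): (1, 125)}
```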

Page 7: Recommendation in Advertising and Social Networks

Estimating CTR for Content Match

Figure: the page hierarchy and the ad hierarchy, with a region highlighted at level i.

Region = (page node, ad node)

Region hierarchy: a cross-product of the page hierarchy and the ad hierarchy.
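The cross-product construction can be sketched as follows. The sketch assumes both hierarchies are descended in lockstep: the children of region (p, a) are all pairs (p′, a′), where p′ is a child of p and a′ is a child of a. The toy hierarchies are hypothetical.

```python
from itertools import product

# Hypothetical toy hierarchies: each maps a node to its list of children.
page_children = {"Root": ["Sports", "Finance"], "Sports": ["Tennis", "Golf"]}
ad_children = {"Root": ["Apparel", "Travel"], "Apparel": ["Shoes", "Hats"]}


def region_children(region):
    """Children of region (page node, ad node) in the cross-product hierarchy:
    every pairing of a child of the page node with a child of the ad node."""
    page_node, ad_node = region
    return list(product(page_children.get(page_node, []),
                        ad_children.get(ad_node, [])))


print(region_children(("Root", "Root")))
# [('Sports', 'Apparel'), ('Sports', 'Travel'), ('Finance', 'Apparel'), ('Finance', 'Travel')]
```

A region has (number of page children) × (number of ad children) children, so the number of regions multiplies at every level.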

Page 8: Recommendation in Advertising and Social Networks

Estimating CTR for Content Match

Figure: the page hierarchy and the ad hierarchy, from level 0 down to level i.

Region = (page node, ad node)

Region hierarchy: a cross-product of the page hierarchy and the ad hierarchy.

Page 9: Recommendation in Advertising and Social Networks

Estimating CTR for Content Match: Our Approach

1. Data Transformation
2. Model
3. Model Fitting

Page 10: Recommendation in Advertising and Social Networks

Data Transformation

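Problem: the raw estimate c/N maps every region with 0 clicks to the same value, 0, and its variance depends on the unknown CTR.

Solution: Freeman-Tukey transform

It differentiates regions with 0 clicks.

Variance stabilization: the variance of the transformed value depends (approximately) only on N, not on the CTR.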
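The two properties can be checked numerically. The exact form of the transform used here is an assumption, since the slide's formula is not shown: y = ½(√(c/N) + √((c+1)/N)), a common Freeman-Tukey variant for counts. The sketch computes exact variances under a binomial click model.

```python
import math


def freeman_tukey(clicks: int, impressions: int) -> float:
    """Assumed Freeman-Tukey form: y = (sqrt(c/N) + sqrt((c+1)/N)) / 2."""
    return 0.5 * (math.sqrt(clicks / impressions)
                  + math.sqrt((clicks + 1) / impressions))


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial(n, p) probability of k, computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def exact_variance(f, n: int, p: float) -> float:
    """Exact variance of f(C, n) when the click count C ~ Binomial(n, p)."""
    probs = [binom_pmf(k, n, p) for k in range(n + 1)]
    mean = sum(q * f(k, n) for k, q in enumerate(probs))
    return sum(q * (f(k, n) - mean) ** 2 for k, q in enumerate(probs))


# Zero-click regions are no longer all mapped to the same value.
print(freeman_tukey(0, 10), freeman_tukey(0, 100))

# Variance stabilization: 4N * Var(c/N) = 4p(1-p) changes with the CTR p,
# while 4N * Var(y) stays close to 1 for both CTRs.
n = 2000
for p in (0.01, 0.05):
    raw = exact_variance(lambda c, m: c / m, n, p)
    ft = exact_variance(freeman_tukey, n, p)
    print(f"p={p}: 4N*Var(c/N)={4 * n * raw:.3f}, 4N*Var(y)={4 * n * ft:.3f}")
```

Under this assumed form, a zero-click region maps to 0.5/√N, so the transformed value still carries information about how many impressions were seen.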

Page 11: Recommendation in Advertising and Social Networks

Model

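Goal: Smoothing across siblings in the hierarchy [Huang+Cressie/2000]

Figure: a parent region with latent state S_parent at level i, and child regions with latent states S1, S2, S3, S4 at level i+1. The observable values y1, y2, y4 sit below the latent states.

1. Each region r has a latent state S_r.
2. y_r is independent of the hierarchy given S_r.
3. S_r is drawn from its parent's state S_pa(r).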

Page 12: Recommendation in Advertising and Social Networks

Model

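Figure: the model for a region r and its parent pa(r). The latent state S_r is drawn from S_pa(r) with variance W_r. The observation y_r depends on S_r and on features u_r through coefficients β_r, with variance V_r. The parent has the same structure (y_pa(r), u_pa(r), β_pa(r), V_pa(r), W_pa(r)).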

Page 13: Recommendation in Advertising and Social Networks

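Model

However, learning W_r, V_r and β_r for each region is clearly infeasible.

Assumptions:
1. All regions at the same level ℓ share the same W(ℓ) and β(ℓ).
2. V_r = V/N_r for some constant V, since the variance of the transformed data y_r is roughly proportional to 1/N_r.

Figure: the same model (S_pa(r), S_r and y_r, with W_r, V_r, β_r and features u_r).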
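Written as equations, the model with these assumptions can be sketched as follows. The Gaussian noise terms ε_r and ω_r are an assumption, consistent with fitting by a Kalman filter; ℓ denotes the level of region r.

```latex
\begin{aligned}
y_r &= u_r^{\top}\beta^{(\ell)} + S_r + \epsilon_r,
  &\qquad \epsilon_r &\sim \mathcal{N}\!\left(0,\, V / N_r\right),\\
S_r &= S_{\mathrm{pa}(r)} + \omega_r,
  &\qquad \omega_r &\sim \mathcal{N}\!\left(0,\, W^{(\ell)}\right).
\end{aligned}
```

In this form, W(ℓ) = 0 forces every S_r to equal its parent's state, which leaves a pure regression on u_r. A very large W(ℓ) lets each region's state follow its own data.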

Page 14: Recommendation in Advertising and Social Networks

Model

Implications: W(ℓ) determines the degree of smoothing.

W(ℓ) → ∞: S_r varies greatly from S_pa(r), and each region learns its own S_r. No smoothing.

W(ℓ) → 0: all S_r are identical, and a regression model on the features u_r is learnt. Maximum smoothing.
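The two limits can be seen in a one-step Python sketch. It assumes the parent's state is known, drops the features u_r (β = 0), and uses S_r ~ N(S_pa(r), W) and y_r ~ N(S_r, V_r). The posterior mean of S_r is then a weighted average of the region's own data and its parent's state. The function name and the numbers are hypothetical.

```python
def smoothed_state(y_r: float, s_parent: float, W: float, V_r: float) -> float:
    """Posterior mean of S_r given y_r and S_pa(r) = s_parent, assuming
    S_r ~ N(s_parent, W) and y_r ~ N(S_r, V_r) (features u_r omitted)."""
    weight_on_own_data = W / (W + V_r)
    return weight_on_own_data * y_r + (1.0 - weight_on_own_data) * s_parent


y_r, s_parent, V_r = 0.30, 0.10, 0.01
for W in (1e-8, 0.01, 1e8):
    # W -> 0 returns (almost) the parent's state; W -> infinity returns (almost) y_r.
    print(W, smoothed_state(y_r, s_parent, W, V_r))
```

With V_r = V/N_r, a region with many impressions has a small V_r and therefore leans more on its own data.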

Page 15: Recommendation in Advertising and Social Networks

Model

Implications: W(ℓ) determines the degree of smoothing. Var(S_r) increases from root to leaf, so estimates are better at coarser resolutions.

Page 16: Recommendation in Advertising and Social Networks

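Model

Implications: W(ℓ) determines the degree of smoothing. Var(S_r) increases from root to leaf. The correlation among regions at level ℓ depends only on the level of their least common ancestor.

Figure: Corr(closely related regions) > Corr(distantly related regions); regions whose least common ancestor is deeper are more strongly correlated.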
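Both implications follow from the states being sums of independent increments along root-to-leaf paths. The Python sketch below assumes a random root state with variance `root_var` and uses hypothetical per-level W values. Two distinct regions share the path down to their least common ancestor, so their covariance is the variance at that ancestor's level.

```python
from itertools import accumulate


def level_variances(root_var: float, W_by_level: list[float]) -> list[float]:
    """Var(S_r) at levels 0, 1, 2, ...: the root variance plus the
    increments W(1), ..., W(level) accumulated along the path from the root."""
    return list(accumulate(W_by_level, initial=root_var))


def region_corr(level: int, lca_level: int, variances: list[float]) -> float:
    """Correlation of two distinct regions at `level` whose least common
    ancestor sits at `lca_level`: they share the path down to that ancestor,
    and their later increments are independent."""
    return variances[lca_level] / variances[level]


variances = level_variances(1.0, [0.5, 0.5, 0.5])   # hypothetical W(1..3)
print(variances)                                     # [1.0, 1.5, 2.0, 2.5]
print(region_corr(3, 2, variances), region_corr(3, 1, variances))  # 0.8 0.6
```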

Page 17: Recommendation in Advertising and Social Networks

Estimating CTR for Content Match: Our Approach

1. Data Transformation (Freeman-Tukey)
2. Model (Tree-structured Markov Chain)
3. Model Fitting

Page 18: Recommendation in Advertising and Social Networks

Model Fitting

Fitting uses a Kalman filtering algorithm:
Filtering: recursively aggregate data from leaves to root.
Smoothing: propagate information from root to leaves.

Complexity: linear in the number of regions, for both time and space.

Figure: the filtering (leaves to root) and smoothing (root to leaves) passes over the region hierarchy.

Page 19: Recommendation in Advertising and Social Networks

Model Fitting

Fitting uses a Kalman filtering algorithm:
Filtering: recursively aggregate data from leaves to root.
Smoothing: propagate information from root to leaves.

The Kalman filter requires knowledge of β, V, and W, so EM is wrapped around the Kalman filter.

Figure: the filtering (leaves to root) and smoothing (root to leaves) passes over the region hierarchy.
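The two passes can be sketched in Python under simplifying assumptions. States are scalar and Gaussian, the features u_r are omitted (β = 0), V_r and W_r are known (the EM loop that re-estimates them is not shown), and the root state has a Gaussian prior. The sketch works in information form (precision and precision-weighted mean), which is one way to implement upward filtering and downward smoothing on a tree. For this linear-Gaussian model it yields exact posteriors. All names are hypothetical, and this is not the authors' code.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Region:
    y: Optional[float]   # transformed observation y_r (None if unobserved)
    V: float             # observation variance V_r
    W: float             # variance of S_r around its parent's state
    children: list[Region] = field(default_factory=list)
    J: float = 0.0       # precision of the message p(subtree data | S_r)
    h: float = 0.0       # precision-weighted mean of that message
    post_mean: float = 0.0
    post_var: float = 0.0


def message_to_parent(c: Region) -> tuple[float, float]:
    """Integrate out S_c ~ N(S_parent, W_c): what c's subtree says about S_parent."""
    scale = 1.0 + c.W * c.J
    return c.J / scale, c.h / scale


def filter_up(r: Region) -> None:
    """Filtering: recursively aggregate evidence from the leaves up to r."""
    r.J, r.h = 0.0, 0.0
    if r.y is not None:
        r.J += 1.0 / r.V
        r.h += r.y / r.V
    for c in r.children:
        filter_up(c)
        J_msg, h_msg = message_to_parent(c)
        r.J += J_msg
        r.h += h_msg


def smooth_down(r: Region, J_post: float, h_post: float) -> None:
    """Smoothing: turn r's full posterior into each child's full posterior."""
    r.post_var, r.post_mean = 1.0 / J_post, h_post / J_post
    for c in r.children:
        J_msg, h_msg = message_to_parent(c)
        # Posterior of S_r using everything except c's own subtree.
        J_rest, h_rest = J_post - J_msg, h_post - h_msg
        # Predict S_c from that, then add c's subtree evidence.
        pred_var = 1.0 / J_rest + c.W
        pred_mean = h_rest / J_rest
        smooth_down(c, 1.0 / pred_var + c.J, pred_mean / pred_var + c.h)


def fit(root: Region, prior_mean: float = 0.0, prior_var: float = 1.0) -> None:
    """Two passes, each touching every region a constant number of times."""
    filter_up(root)
    smooth_down(root, root.J + 1.0 / prior_var, root.h + prior_mean / prior_var)


leaf_a = Region(y=2.0, V=1.0, W=1.0)
leaf_b = Region(y=None, V=1.0, W=1.0)    # a region with no usable data
root = Region(y=None, V=1.0, W=0.0, children=[leaf_a, leaf_b])
fit(root)
print(root.post_mean, leaf_a.post_mean, leaf_b.post_mean)  # about 0.667, 1.333, 0.667
```

The region without data, leaf_b, inherits the root's posterior mean. This is the smoothing across siblings that the model is designed to provide.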

Page 20: Recommendation in Advertising and Social Networks

Experiments

503M impressions.

A 7-level hierarchy, of which the top 3 levels were used.

Zero clicks in 76% of regions in level 2 and 95% of regions in level 3.

Full dataset D_FULL, and a 2/3 sample D_SAMPLE.

Page 21: Recommendation in Advertising and Social Networks

Experiments

Estimate CTRs for all regions R in level 3 that have zero clicks in D_SAMPLE.

Some of these regions, R>0, do get clicks in D_FULL.

A good model should predict higher CTRs for R>0 than for the other regions in R.

Page 22: Recommendation in Advertising and Social Networks

Experiments

We compared 4 models:
TS: our tree-structured model
LM (level-mean): each level smoothed independently
NS (no smoothing): CTR proportional to 1/N_r
Random: assuming |R>0| is given, randomly predict the membership of R>0 out of R
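The text does not state the exact comparison metric, so the one used here is an assumption. The Python sketch scores each model by the fraction of its |R>0| highest-CTR predictions that fall in R>0, and compares this with the expected fraction for the Random baseline. The region names and scores are hypothetical.

```python
def precision_at_k(scores: dict[str, float], positives: set[str]) -> float:
    """Fraction of the top-|R>0| regions (ranked by predicted CTR) that are in R>0."""
    k = len(positives)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return sum(region in positives for region in ranked[:k]) / k


def random_baseline(num_regions: int, num_positives: int) -> float:
    """Expected precision when |R>0| regions are picked uniformly at random from R."""
    return num_positives / num_regions


# Hypothetical predicted CTRs for regions that had zero clicks in D_SAMPLE.
scores = {"r1": 0.004, "r2": 0.001, "r3": 0.003, "r4": 0.002}
r_pos = {"r1", "r3"}          # regions that later receive clicks in D_FULL
print(precision_at_k(scores, r_pos), random_baseline(len(scores), len(r_pos)))  # 1.0 0.5
```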

Page 23: Recommendation in Advertising and Social Networks

Experiments

Figure: comparison of TS, Random, and LM, NS.

Page 24: Recommendation in Advertising and Social Networks

Experiments

The MLE is 0 everywhere, since 0 clicks were observed. What about the estimated CTR?

Figure: estimated CTR vs. impressions, for No Smoothing (NS) and for our model (TS). Annotations: "Variability from coarser resolutions"; "Close to MLE for large N".

Page 25: Recommendation in Advertising and Social Networks

Estimating CTR for Content Match

We presented a method to estimate rates of extremely rare events at multiple resolutions under severe sparsity constraints.

Key points:
1. Tree-structured generative model
2. Extremely fast parameter fitting

Page 26: Recommendation in Advertising and Social Networks

Theoretical underpinnings

1) Estimating CTR for Content Match [KDD ‘07]

2) Theoretical underpinnings of link prediction [COLT ‘10 best student paper]


Page 27: Recommendation in Advertising and Social Networks

Link Prediction: Which pair of nodes {i,j} should be connected?

Figure: example graph with Alice, Bob, and Charlie.

Goal: Recommend a movie

Page 28: Recommendation in Advertising and Social Networks

Link Prediction: Which pair of nodes {i,j} should be connected?

Goal: Suggest friends

Page 29: Recommendation in Advertising and Social Networks

Link Prediction Heuristics

Predict a link between the nodes:
1. connected by the shortest path;
2. with the most common neighbors (length-2 paths);
3. giving more weight to low-degree common neighbors (Adamic/Adar).

Figure: common friends of Alice, Bob, and Charlie. Prolific common friends (1000 followers) give less evidence; less prolific ones (3 followers) give much more evidence.

Page 30: Recommendation in Advertising and Social Networks

Link Prediction Heuristics

Predict a link between the nodes:
1. connected by the shortest path;
2. with the most common neighbors (length-2 paths);
3. giving more weight to low-degree common neighbors (Adamic/Adar);
4. with more short paths (e.g., length-3 paths), giving exponentially decaying weights to longer paths (Katz measure).
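The heuristics above can be sketched in Python on a hypothetical toy graph. Adamic/Adar weights each common neighbor k by 1/log(degree of k). The Katz score is β^ℓ summed over path lengths ℓ and multiplied by the number of length-ℓ walks. Here the Katz sum is truncated at `max_len`, whereas the full measure sums over all lengths.

```python
import math
from collections import defaultdict


def common_neighbors(adj, i, j):
    """Number of length-2 paths between i and j."""
    return len(adj[i] & adj[j])


def adamic_adar(adj, i, j):
    """Sum of 1 / log(degree) over common neighbours: low-degree ones count more."""
    return sum(1.0 / math.log(len(adj[k])) for k in adj[i] & adj[j])


def katz(adj, i, j, beta, max_len):
    """Truncated Katz score: sum over l <= max_len of beta**l * (# walks i -> j of length l)."""
    walks = {i: 1}               # number of walks of the current length ending at each node
    score = 0.0
    for length in range(1, max_len + 1):
        nxt = defaultdict(int)
        for u, count in walks.items():
            for v in adj[u]:
                nxt[v] += count
        walks = nxt
        score += beta ** length * walks.get(j, 0)
    return score


# Hypothetical undirected graph stored as adjacency sets.
edges = [("a", "c"), ("b", "c"), ("a", "d"), ("b", "d"), ("d", "e")]
adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

print(common_neighbors(adj, "a", "b"))              # 2
print(adamic_adar(adj, "a", "b"))                   # 1/log(2) + 1/log(3)
print(katz(adj, "a", "b", beta=0.1, max_len=3))     # about 0.02 (two length-2 walks)
```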

Page 31: Recommendation in Advertising and Social Networks

Previous Empirical Studies*

Figure: link prediction accuracy of Random, Shortest Path, Common Neighbors, Adamic/Adar, and Ensemble of short paths.

*Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007

How do we justify these observations? Especially if the graph is sparse.

Page 32: Recommendation in Advertising and Social Networks

Link Prediction – Generative Model

Unit volume universe

Model:
1. Nodes are uniformly distributed points in a latent space.
2. This space has a distance metric.
3. Points close to each other are likely to be connected in the graph: logistic distance function (Raftery+/2002).

Page 33: Recommendation in Advertising and Social Networks

Link Prediction – Generative Model

Figure: link probability as a function of distance. It is near 1 for close points (higher probability of linking), equals ½ at radius r, and α determines the steepness of the drop.

Link prediction ≈ find the nearest neighbor who is not currently linked to the node. This is equivalent to inferring distances in the latent space.

Model:
1. Nodes are uniformly distributed points in a latent space.
2. This space has a distance metric.
3. Points close to each other are likely to be connected in the graph.
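The generative model can be sketched as a sampler. The sketch assumes the logistic form P(i ~ j | d_ij) = 1/(1 + e^{α(d_ij − r)}), which matches the description above (½ at radius r, steepness set by α). The latent space is taken to be the unit square with Euclidean distance, and boundary effects are ignored. The function names are hypothetical.

```python
import math
import random


def link_probability(d: float, r: float, alpha: float) -> float:
    """Assumed logistic link function 1 / (1 + exp(alpha * (d - r))): equal to 1/2 at
    d == r, near 1 for d << r, and steeper around r as alpha grows."""
    z = alpha * (d - r)
    if z >= 0:                       # numerically stable for large |z|
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def sample_graph(n: int, r: float, alpha: float, dim: int = 2, seed: int = 0):
    """Place n points uniformly in the unit cube [0, 1]^dim (unit volume) and link
    each pair independently with the logistic probability of their distance."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n)]
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < link_probability(math.dist(points[i], points[j]), r, alpha):
                edges.append((i, j))
    return points, edges


print(link_probability(0.1, 0.1, 5.0))   # 0.5
# With a large alpha the logistic behaves almost like a step function at r.
print(link_probability(0.09, 0.1, 1000.0), link_probability(0.11, 0.1, 1000.0))
points, edges = sample_graph(n=200, r=0.1, alpha=50.0)
print(len(edges))
```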

Page 34: Recommendation in Advertising and Social Networks

Previous Empirical Studies*

Figure: link prediction accuracy of Random, Shortest Path, Common Neighbors, Adamic/Adar, and Ensemble of short paths.

*Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007

Especially if the graph is sparse.

Page 35: Recommendation in Advertising and Social Networks

Common Neighbors

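Pr_2(i,j) = Pr(common neighbor | d_ij)

$$\Pr_2(i,j) = \iint \Pr(i \sim k \mid d_{ik})\, \Pr(j \sim k \mid d_{jk})\, P(d_{ik}, d_{jk} \mid d_{ij})\, \mathrm{d}d_{ik}\, \mathrm{d}d_{jk}$$

This is a product of two logistic probabilities, integrated over a volume determined by d_ij.

As α → ∞, the logistic becomes a step function. Much easier to analyze!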

Page 36: Recommendation in Advertising and Social Networks

Common Neighbors

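Everyone has the same radius r.

$$\Pr_2(i,j) = A(r, r, d_{ij})$$

where A(r, r, d_ij) is the volume of the intersection of two balls of radius r centered at i and j.

The number of common neighbors gives a bound on the distance. With high probability, η/N − ε ≤ A(r, r, d_ij) ≤ η/N + ε, so

$$2r\left(1 - \left(\frac{\eta/N + \varepsilon}{V(r)}\right)^{1/D}\right) \le d_{ij} \le 2r\sqrt{1 - \left(\frac{\eta/N - \varepsilon}{V(r)}\right)^{2/D}}$$

η = number of common neighbors
N = number of nodes
V(r) = volume of a ball of radius r in D dimensions
Unit volume universe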
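The bounds above can be evaluated and checked in Python. The check uses a 2-D latent space, an assumption made so that A(r, r, d) has the closed-form lens area. The lower bound uses the fact that the intersection contains a ball of radius r − d/2. The upper bound uses the fact that the intersection lies inside a ball of radius √(r² − d²/4). The function names are hypothetical.

```python
import math


def ball_volume(radius: float, dim: int) -> float:
    """V(r): volume of a D-dimensional ball of the given radius."""
    return math.pi ** (dim / 2) / math.gamma(dim / 2 + 1) * radius ** dim


def distance_bounds(eta: int, n: int, eps: float, r: float, dim: int):
    """Bounds on d_ij implied by eta/N - eps <= A(r, r, d_ij) <= eta/N + eps."""
    v = ball_volume(r, dim)
    hi_frac = min((eta / n + eps) / v, 1.0)
    lo_frac = min((eta / n - eps) / v, 1.0)
    lower = 2 * r * (1 - hi_frac ** (1 / dim))
    if lo_frac <= 0:                 # no positive lower bound on A: no upper bound on d_ij
        return lower, math.inf
    upper = 2 * r * math.sqrt(1 - lo_frac ** (2 / dim))
    return lower, upper


def lens_area(r: float, d: float) -> float:
    """Exact A(r, r, d) in 2-D: area shared by two disks of radius r, centres d apart."""
    if d >= 2 * r:
        return 0.0
    return 2 * r * r * math.acos(d / (2 * r)) - 0.5 * d * math.sqrt(4 * r * r - d * d)


# Set eta to N * A(r, r, d) rounded to an integer, and confirm the bounds bracket d.
r, n = 0.1, 10**6
for d in (0.02, 0.08, 0.15):
    eta = round(n * lens_area(r, d))
    print(d, distance_bounds(eta, n, eps=1e-4, r=r, dim=2))
```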

Page 37: Recommendation in Advertising and Social Networks

Common Neighbors

OPT = node closest to i; MAX = node with the most common neighbors with i.

Theorem: w.h.p.,

$$d_{OPT} \le d_{MAX} \le d_{OPT} + 2\left[\varepsilon / V(1)\right]^{1/D}$$

Link prediction by common neighbors is asymptotically optimal.

Page 38: Recommendation in Advertising and Social Networks

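Common Neighbors: Distinct Radii

Node k has radius r_k. i → k if d_ik ≤ r_k (directed graph); r_k captures the popularity of node k.

Type 1 (i ← k → j): k must lie within r_i of i and within r_j of j, giving A(r_i, r_j, d_ij).

Type 2 (i → k ← j): i and j must both lie within r_k of k, giving A(r_k, r_k, d_ij).

Figure: nodes i, j, k, m, with the radius r_k of node k.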

Page 39: Recommendation in Advertising and Social Networks

Type 2 common neighbors

Example graph: N1 nodes of radius r1 and N2 nodes of radius r2, with r1 << r2.

η1 ~ Bin[N1, A(r1, r1, d_ij)]
η2 ~ Bin[N2, A(r2, r2, d_ij)]

Pick d* to maximize Pr[η1, η2 | d_ij]. Then

w(r1) E[η1 | d*] + w(r2) E[η2 | d*] = w(r1) η1 + w(r2) η2

The right-hand side is a weighted count of common neighbors, and d* is inversely related to it.
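Rather than reproducing the derivation of the weights w(r), this Python sketch solves for d* numerically by grid search. It assumes a 2-D latent space, so A(r, r, d) is a lens area, and uses hypothetical radii and counts. It shows that a few common neighbors of small radius r1 pin the distance down much more tightly than the large-radius ones.

```python
import math


def lens_area(r: float, d: float) -> float:
    """A(r, r, d) in 2-D: area shared by two disks of radius r whose centres are d apart."""
    if d >= 2 * r:
        return 0.0
    return 2 * r * r * math.acos(d / (2 * r)) - 0.5 * d * math.sqrt(4 * r * r - d * d)


def log_likelihood(d: float, groups) -> float:
    """log Pr[eta_1, eta_2, ... | d_ij = d] up to a constant, with
    eta_k ~ Bin(N_k, A(r_k, r_k, d)); `groups` holds (N_k, eta_k, r_k) triples."""
    total = 0.0
    for n_k, eta_k, r_k in groups:
        a = lens_area(r_k, d)
        if a == 0.0:
            if eta_k > 0:
                return -math.inf     # a common neighbour is impossible at this distance
            continue                 # contributes log(1) = 0
        total += eta_k * math.log(a) + (n_k - eta_k) * math.log1p(-a)
    return total


def mle_distance(groups, d_max: float, steps: int = 2000) -> float:
    """Grid search for d* = argmax_d Pr[eta | d]."""
    grid = [d_max * (k + 0.5) / steps for k in range(steps)]
    return max(grid, key=lambda d: log_likelihood(d, groups))


r1, r2 = 0.02, 0.2                   # hypothetical radii, r1 << r2
n1, n2 = 5000, 500
for eta1 in (0, 1, 3):
    print(eta1, mle_distance([(n1, eta1, r1), (n2, 10, r2)], d_max=2 * r2))
```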

Page 40: Recommendation in Advertising and Social Networks

Type 2 common neighbors

Figure: the weight w(r) as a function of r, shown alongside the Adamic/Adar weighting and 1/r. Annotations: "r is close to max radius"; "Presence of common neighbor is very informative"; "Absence is very informative"; "Real world graphs generally fall in this range".

Page 41: Recommendation in Advertising and Social Networks

Previous Empirical Studies*

Figure: link prediction accuracy of Random, Shortest Path, Common Neighbors, Adamic/Adar, and Ensemble of short paths.

*Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007

Especially if the graph is sparse.

Page 42: Recommendation in Advertising and Social Networks

ℓ-hop Paths

Common neighbors = 2-hop paths.

For longer paths:
Bounds are weaker.
For ℓ′ ≥ ℓ, we need η_ℓ′ >> η_ℓ to obtain similar bounds.
This justifies the exponentially decaying weight given to longer paths by the Katz measure.

The slide's bound on d_ij is written in terms of r, η, N, δ and a function g.

Page 43: Recommendation in Advertising and Social Networks

Summary: Three key ingredients

1. Closer points are likelier to be linked (Small World Model: Watts & Strogatz, 1998; Kleinberg, 2001).

2. The triangle inequality holds; this is necessary to extend the results to ℓ-hop paths.

3. Points are spread uniformly at random; otherwise properties will depend on location as well as distance.

Page 44: Recommendation in Advertising and Social Networks

Summary

Figure*: link prediction accuracy of Random, Shortest Path, Common Neighbors, Adamic/Adar, and Ensemble of short paths.

*Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007

The number of paths matters, not the length.

For large dense graphs, common neighbors are enough.

Differentiating between different degrees is important.

In sparse graphs, paths of length 3 or more help in prediction.

Page 45: Recommendation in Advertising and Social Networks

Conclusions

Discussed two problems:

1. Estimating CTR for Content Match
Combat sparsity by hierarchical smoothing

2. Theoretical underpinnings
Latent space model: link prediction ≈ finding nearest neighbors in this space

Page 46: Recommendation in Advertising and Social Networks

Other Work

Web Search
Finding Quicklinks [WWW ‘09]
Titles for Quicklinks [KDD ‘08]
Incorporating tweets into search results [ICWSM ‘11]
Website clustering [WWW ‘10]
Webpage segmentation [WWW ‘08]
Template detection [WWW ‘07]
Finding hidden query aspects [KDD ’09]

Computational Advertising
Combining IR with click feedback [WWW ‘08]
Multi-armed bandits using hierarchies [SDM ‘07, ICML ‘07]
“Mortal” multi-armed bandits [NIPS ‘08]
Traffic Shaping [EC ‘12]

Graph Mining
Epidemic thresholds [SRDS ‘03, Infocom ‘07]
Non-parametric prediction in dynamic graphs
Graph sampling [ICML ‘11]
Graph generation models [SDM ‘04, PKDD ‘05, JMLR ‘10]
Community detection [KDD ‘04, PKDD ‘04]

Page 47: Recommendation in Advertising and Social Networks

Advertising Setting

Figure: a content match ad shown on a webpage.

Display, Content Match, Sponsored Search

Page 48: Recommendation in Advertising and Social Networks

Advertising Setting

Figure annotations: "Pick ads"; "Text ads"; "Match ads to the content".

Display, Content Match, Sponsored Search

Page 49: Recommendation in Advertising and Social Networks

Common Neighbors: Distinct Radii

Node k has radius r_k. i → k if d_ik ≤ r_k (directed graph); r_k captures the popularity of node k.

“Weighted” common neighbors: predict the (i,j) pairs with the highest Σ w(r) η(r), where w(r) is the weight for nodes of radius r and η(r) is the number of common neighbors of radius r.

Figure: nodes i, j, k, m, with the radius r_k of node k.



Page 52: Recommendation in Advertising and Social Networks

Estimating CTR for Content Match

Figure: page classes and ad classes arranged in the page hierarchy and the ad hierarchy, from level 0 down to level i.

Region = (page node, ad node)

Region hierarchy: a cross-product of the page hierarchy and the ad hierarchy.