Understanding Crowds’ Migration on the Web Yong Wang Komal Pal Aleksandar Kuzmanovic Northwestern University http://networks.cs.northwestern.edu


Page 1

Understanding Crowds’ Migration on the Web

Yong Wang, Komal Pal, Aleksandar Kuzmanovic

Northwestern University

http://networks.cs.northwestern.edu

Page 2

A User-Driven Web Network

Node: number of unique visitors to a website.

Edge: number of common visitors between its endpoints.

Fig: Target graph (nodes include MSN and CNN; node and edge labels: 5.8M, 6.1M, 14.3M, 4.3M, 19M, 2.3M, 1.3M, 4.7M, 2M)

Page 3

Motivation

Study the Web from the point of view of its users

– Evaluate properties of the network
• Analyze user movement among websites
• Determine properties of the user-driven Web network
• Compare to Online Social Networks and “classical” Web networks

– Mine data to serve:
• Online advertisers
• Search engines

Page 4

Our Contributions

Generate the user-driven Web network

Study the user-driven Web

Apply the user-driven Web

Page 5

Outline

Generate the user-driven Web network

Study the user-driven Web

Apply the user-driven Web

Page 6

Information Reconstruction

Fact
– A plethora of information is made publicly available on a daily basis
• E.g., Google Trends, AdPlanner, Analytics, ALEXA, etc.

Problem
– The publicly available information snippets are not comprehensive

Approach
– Combine multiple data sources and develop methods to reconstruct globally meaningful information

Page 8

Generating a User-Driven Web

Fig: Figure labels: parent node; child/edge nodes

Page 9

Crawling

Breadth-first search for 15 days

3 seeds: nytimes.com, sina.com.cn, timesofindia.com

US-centric network: ~297K nodes and 2M edges
China-centric network: ~290K nodes and 2.7M edges
India-centric network: ~297K nodes and 2.8M edges

Captured information:
• Number of unique users: Google AdPlanner
• Shared users: Google Trends
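The crawl can be pictured as a breadth-first traversal from one seed. The following is a minimal sketch, not the authors' crawler: `fetch_related` is a hypothetical callback standing in for whatever lookup returns the sites that share visitors with a given site, `max_nodes` is an assumed stopping rule used here in place of the 15-day time limit, and the `a.example` / `b.example` sites and their weights are made up.

```python
from collections import deque

def bfs_crawl(seed, fetch_related, max_nodes=1000):
    """Breadth-first crawl from `seed`; returns (visited sites, weighted edges)."""
    visited = {seed}
    edges = {}
    queue = deque([seed])
    while queue:
        site = queue.popleft()
        for neighbor, weight in fetch_related(site):
            if neighbor not in visited:
                if len(visited) >= max_nodes:
                    continue  # cap reached: ignore sites not yet discovered
                visited.add(neighbor)
                queue.append(neighbor)
            edges[(site, neighbor)] = weight
    return visited, edges

# Hypothetical lookup table in place of a real data source.
toy_related = {
    "nytimes.com": [("a.example", 100), ("b.example", 60)],
    "a.example": [("nytimes.com", 100)],
    "b.example": [("a.example", 100)],
}
crawl_nodes, crawl_edges = bfs_crawl("nytimes.com", lambda s: toy_related.get(s, []))
```

A queue (rather than a stack) is what makes the traversal breadth-first: sites closest to the seed are expanded before more distant ones.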

Page 10

Problems without Normalization

Fig: Sub-graph A: parent A with children D, C, E (weights 100, 20, 10)
Fig: Sub-graph B: parent B with children C, F, G (weights 100, 50, 25)
Fig: Merged graphs A&B without normalization (problems!): each parent's edges keep their own scale

The weight of the first child is always set to 100

Page 11

Ideal Normalized Network

Fig: Normalized graph (target scenario): parent A with children D, C, E (weights 100, 20, 10); parent B with children C, F, G (weights 10, 5, 2.5)

Weights scaled w.r.t. weight(AD)
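The step from the merged graph to the target scenario is a single rescaling of B's weights. The sketch below reproduces that arithmetic with the numbers from the figures; the helper name `rescale` is made up, and the factor is taken directly from the target figure (B's top edge is worth 10 on A's scale) rather than derived from any query.

```python
def rescale(children, factor):
    """Multiply every child weight by a common factor."""
    return {child: weight * factor for child, weight in children.items()}

# Raw per-parent weights as in sub-graphs A and B (top child always 100).
sub_a = {"D": 100, "C": 20, "E": 10}
sub_b = {"C": 100, "F": 50, "G": 25}

# Assumption: on A's scale (weight(AD) = 100), B's top edge is worth 10,
# which is the value shown in the target figure.
factor_b = 10 / sub_b["C"]
normalized_b = rescale(sub_b, factor_b)
```

Because all of B's weights share one factor, their ratios (100 : 50 : 25) are preserved after normalization.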

Page 12

Normalization Process

Fig: Figure labels: parent nodes; relationship between Website 2 and the child nodes of Website 1

Page 13

Normalization Process

Phase 1: Select a starting point (a node with maximum in-degree, say C)
– Select a parent (A) of C, and a child (D) of A
– Normalize all other parent nodes to the weight of AD (by querying the parent nodes together with A)
• Normalized nodes: nodes all of whose edges are normalized

Fig: Nodes A, B, F, G, C, D; legend: normalized node, child of a normalized node

Page 14

Normalization Process

Phase 2: Back link from a child of a normalized node to its parent
– The weight of the forward link must be equal to the weight of the backward link

Fig: Nodes A, C, B, D, E; legend: normalized node, child of a normalized node
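The equality of forward and backward link weights gives a scale factor for the child. The sketch below is one way to read Phase 2, under the assumption that the child's own (raw) list contains a link back to its normalized parent; the function name, the dictionary layout, and the numbers are all hypothetical.

```python
def normalize_via_back_link(parent, child, normalized, raw):
    """Rescale the child's raw weights so its back link matches the forward link.

    normalized[parent][child] : forward weight, already on the global scale
    raw[child][parent]        : backward weight, on the child's own scale
    """
    forward = normalized[parent][child]
    backward = raw[child][parent]
    factor = forward / backward
    return {node: weight * factor for node, weight in raw[child].items()}

# Hypothetical numbers: A is normalized and its edge to C weighs 20.
normalized_weights = {"A": {"C": 20.0}}
# C's own list is on its own scale (top child = 100) and links back to A.
raw_weights = {"C": {"E": 100.0, "A": 50.0}}

c_normalized = normalize_via_back_link("A", "C", normalized_weights, raw_weights)
```

After rescaling, the C-to-A weight equals the A-to-C weight, and C's other edges are expressed on the same global scale.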

Page 15

Normalization Process

Phase 3: A child of a normalized node (D) shares a child (C) with a normalized node (A)
– We can normalize D (by querying it together with node A)
– Note: the shared child (green) could itself be either a normalized node or a child of a normalized node

Fig: Nodes A, B, E, C, D; legend: normalized node, child of a normalized node, either a normalized node or a child of a normalized node

Page 16

Normalization Process

Phase 4: A node (D) shares a child (C) with a normalized node (A)
– We can normalize D (by querying it together with node A)
– Note: Node D (black) is initially neither a normalized node nor a child of a normalized node

Fig: Nodes A, B, E, C, D; legend: normalized node, child of a normalized node, either a normalized node or a child of a normalized node, neither a normalized node nor a child of a normalized node
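Phases 3 and 4 both rely on querying D together with a normalized node A that shares a child C. The slides do not spell out the arithmetic, so the following is only a sketch under a stated assumption: a joint query returns the A–C and D–C weights on one common (but not yet normalized) scale. All names and numbers are hypothetical.

```python
def factor_from_joint_query(norm_a_c, joint_a_c, joint_d_c, raw_d_c):
    """Scale factor that moves D's raw weights onto the normalized scale.

    norm_a_c  : already-normalized weight of edge A-C
    joint_a_c : weight of A-C when A and D are queried together
    joint_d_c : weight of D-C in the same joint query
    raw_d_c   : weight of D-C when D is queried alone
    """
    # The joint query fixes the ratio between the two edges to C.
    norm_d_c = norm_a_c * joint_d_c / joint_a_c
    return norm_d_c / raw_d_c

# Hypothetical values: A-C is worth 20 after normalization; in the joint
# query A-C shows as 100 and D-C as 50; alone, D-C shows as 100.
factor_d = factor_from_joint_query(20.0, 100.0, 50.0, 100.0)
```

Multiplying every raw weight of D by `factor_d` would then place all of D's edges on the same scale as A's.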

Page 17

Normalization Process

Validation
– Popularity ranking of our normalized network compared to Google AdPlanner
– The two ranking results match in 91.66% of cases

Adding absolute traffic
– Google AdPlanner for the number of unique users

Unifying two scale systems
– Top 10 children are sufficient
– Relative weight -> Absolute weight

Page 18

Outline

Generate the user-driven Web network

Study the user-driven Web

Apply the user-driven Web

Page 19

Weighted Degree Distribution

– The sum of link weights for each node
– Log-normal distribution
• OSN and WWW follow a power-law distribution
– Small-traffic sites filtered by Google Trends
– Seed-free properties with distinctions
• Extreme values

Fig: Weighted degree distributions for the US, India, and China networks; annotations: minimum-degree nodes, maximum-degree nodes, high peak => strong connectedness
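The weighted degree defined on this slide (the sum of link weights for each node) can be computed in one pass over the edge list. This is a minimal sketch that assumes an undirected graph with each edge stored once; the edge weights are illustrative values, not data from the crawl.

```python
from collections import defaultdict

def weighted_degree(edges):
    """edges: dict mapping (u, v) -> weight, each undirected edge stored once."""
    degree = defaultdict(float)
    for (u, v), weight in edges.items():
        degree[u] += weight
        degree[v] += weight
    return dict(degree)

# Illustrative weights only.
toy_edges = {("A", "D"): 100, ("A", "C"): 20, ("A", "E"): 10, ("B", "C"): 10}
toy_degree = weighted_degree(toy_edges)
```

A histogram of the resulting values (typically on logarithmic axes) gives the weighted degree distribution.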

Page 20

Average Path Length and Diameter

– The user-driven Web has properties closer to Online Social Networks than to the WWW
• The human component makes the network more connected
– Larger average path length for the Chinese network
• Because high-degree clusters in the core are loosely connected with low-degree clusters at the edges
• For the other 2 networks, high-degree clusters in the core are well connected to the nodes at the edges
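For reference, both metrics can be obtained from breadth-first searches started at every node. The sketch below counts hops on an unweighted, undirected graph; the slides do not say whether edge weights entered the authors' computation, so that choice is an assumption, and the four-node path graph is a made-up example.

```python
from collections import deque

def hop_distances(adj, source):
    """Shortest hop count from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_stats(adj):
    """Return (average shortest-path length, diameter) over reachable ordered pairs.

    Assumes the graph has at least one edge.
    """
    total, pairs, diameter = 0, 0, 0
    for s in adj:
        for t, d in hop_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    return total / pairs, diameter

# Made-up path graph A - B - C - D.
path_graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
avg_len, diam = path_stats(path_graph)
```

Running a search from every node is quadratic in the worst case, so on graphs with hundreds of thousands of nodes one would normally sample sources instead.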

Page 21

Clustering Coefficient

– High clustering coefficients
• 4 orders of magnitude higher than the corresponding random graphs
– Clustering coefficients uniform for the three networks
• China: high-degree and low-degree nodes are separately clustered and loosely connected
• US: high-degree nodes are clustered in the core, while low-degree nodes are not well clustered
• India: a smaller difference between high- and low-degree node clusters
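As a reminder of what is being measured, the local clustering coefficient of a node is the fraction of pairs of its neighbors that are themselves linked. The sketch below uses the standard unweighted definition on an undirected graph; whether the authors used this or a weighted variant is not stated in the slides, and the toy graph is made up.

```python
def local_clustering(adj, node):
    """Fraction of neighbor pairs of `node` that are directly connected."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for u in neighbors for v in neighbors if u < v and v in adj[u])
    return 2.0 * links / (k * (k - 1))

def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)

# Made-up undirected graph: triangle A-B-C plus a pendant node D on A.
toy_graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
toy_cc = average_clustering(toy_graph)
```

Comparing this average with that of a random graph of the same size and density is the usual way to judge whether clustering is "high".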

Page 22

Network Properties

The user-driven Web is closer to Online Social Networks than to the WWW in all properties
– The human component prevails

Seed-free properties
– Independent of the starting crawling point

Scale-free properties
– Independent of the network scale

Page 23

Outline

Generate the user-driven Web network

Study the user-driven Web

Apply the user-driven Web

Page 24

Online Advertising

Fig: Graph with nodes including MSN and CNN; node and edge labels: 5.8M, 6.1M, 14.3M, 4.3M, 19M, 2.3M, 1.3M, 1700, 2M

Page 25

Website Selector

Problem: Find the best selection of websites (ad hosts) that provides maximum visibility at minimum cost

Target users:
– Independent advertisers
– Ad commissioners

Alternative approaches:
– Greedy
• Choose the websites in descending order of their popularity
– Sub-optimal
• Linear optimization without shared user information
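The greedy alternative can be sketched in a few lines. This is an illustration only: the slides say sites are chosen in descending order of popularity, while the budget check and the decision to keep scanning after a site does not fit are assumptions, and the site names, user counts, and costs are made up.

```python
def greedy_by_popularity(users, cost, budget):
    """Pick sites in descending order of unique users while the budget allows."""
    chosen, spent = [], 0
    for site in sorted(users, key=users.get, reverse=True):
        if spent + cost[site] <= budget:
            chosen.append(site)
            spent += cost[site]
    return chosen

# Hypothetical sites: unique users and cost per site.
greedy_users = {"a": 100, "b": 90, "c": 40}
greedy_cost = {"a": 5, "b": 5, "c": 3}
greedy_pick = greedy_by_popularity(greedy_users, greedy_cost, budget=10)
```

The selection ignores how many visitors the chosen sites have in common, which is the weakness the Website Selector addresses.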

Page 26

Modeling

Inputs:
– CPI model: random normal distribution
– User-driven Web
– Budget

Output:
– List of potential ad hosts providing maximum visibility within budget constraints

Page 27

Optimization Problem

Maximize:

$$\sum_i u_i x_i - \sum_j \sum_{k \neq j} s_{jk} x_j x_k$$

subject to the linear constraint:

$$\sum_i c_i x_i \le B$$

where
$x_i$: website (node) $i$
$u_i$: number of unique users on node $x_i$
$s_{jk}$: number of shared users between $x_j$ and $x_k$
$c_i$: CPI for node $x_i$
$B$: budget constraint
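To make the objective concrete, the sketch below evaluates it for a candidate selection and finds the best selection of a tiny instance by exhaustive search. It is not the authors' solver: it assumes each x_i is a 0/1 choice, it stores each shared-user count under a single ordered pair so that every overlap is subtracted once, and all names and numbers are hypothetical. Exhaustive search is only feasible for a handful of sites.

```python
from itertools import combinations

def visibility(selected, users, shared):
    """Objective value: unique users minus shared users among selected sites."""
    total = sum(users[i] for i in selected)
    overlap = sum(shared.get((j, k), 0)
                  for j in selected for k in selected if j != k)
    return total - overlap

def best_selection(users, shared, cost, budget):
    """Try every subset whose total cost fits the budget; keep the best one."""
    sites = sorted(users)
    best, best_value = (), 0
    for r in range(1, len(sites) + 1):
        for subset in combinations(sites, r):
            if sum(cost[i] for i in subset) <= budget:
                value = visibility(subset, users, shared)
                if value > best_value:
                    best, best_value = subset, value
    return best, best_value

# Hypothetical instance: "a" and "b" are popular but share most visitors.
opt_users = {"a": 100, "b": 90, "c": 40}
opt_shared = {("a", "b"): 80}
opt_cost = {"a": 5, "b": 5, "c": 3}
opt_sites, opt_value = best_selection(opt_users, opt_shared, opt_cost, budget=10)
```

In this instance the two most popular sites together reach only 110 distinct visitors, whereas pairing the most popular site with a smaller, non-overlapping one reaches 140 for less money.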

Page 28

Performance Results

Greedy approach used as a baseline

Sub-optimal approach lacks shared-user information
– And hence doesn’t perform well in improving ad visibility

Website Selector improves performance by 22-25%

Page 29

Eliminating High-Volume Websites

5% of the top 1,000 websites eliminated (volume >= 1M)

Several cases of high-volume nodes being ignored due to a significant number of shared users

Fig: Example with MSN and CNN; labels: 2.9M, 11M, 23M, 1.2M, 0.7M; CPI~$42, CPI~$49, CPI~$53

Page 30

Conclusions

Generated the user-driven Web
– Used publicly available information
– Designed methods to fuse pieces into a global network

Studied the user-driven Web and its properties
– Scale- and seed-free network properties
– User-driven Web is different from the “classical Web” but similar to Online Social Networks

Designed the Website Selector
– Incorporates the idea of “shared visitors” between websites
– Increases visibility of ads by 22-25%, increases revenue
– Tailored for ad commissioners

Page 31

Thank You

http://networks.cs.northwestern.edu