Cheap, Easy, and Massively Effective Viral Marketing in Social Networks: Truth or Fiction? Thang N. Dinh, Dung T. Nguyen, My T. Thai Dept. of Computer & Information Science & Engineering University of Florida, Gainesville, FL Hypertext-2012, Milwaukee, WI. USA


Cheap, Easy, and Massively Effective Viral Marketing in Social Networks:

Truth or Fiction?

Thang N. Dinh, Dung T. Nguyen, My T. Thai

Dept. of Computer & Information Science & Engineering

University of Florida, Gainesville, FL

Hypertext-2012, Milwaukee, WI. USA

Spread of Influence

Word-of-mouth effect: we trust our friends more than strangers.

Online Social Networks (OSNs): a platform for spreading INFLUENCE, such as information, innovation, political influence, …

Thang N. Dinh @ ACM HT'12

1. Introduction

Source: www.wikispaces.com

Source: http://3.bp.blogspot.com/

Viral Marketing as an Optimization Problem

Given the network, select the top users such that by targeting them, the spread of influence is maximized (Domingos et al. '01, Richardson et al. '02, Kempe et al. '03, …).

Common perception: targeting only a few nodes (high centrality) influences the whole network. Cheap. Easy. Massively effective.


Information Propagation: Observation 1

M. Cha et al., WWW'09: propagation in Flickr. Not widely: within two yards. Not quickly: it takes a long time.

J. Leskovec et al., ACM TWEB: recommendations often stop after one hop.

The average delay in information propagation across social links is about 140 days!!!


Information Propagation: Observation 2

Many social networks are power-law: many nodes with low degree and a few nodes with high degree.

Number of nodes with degree

Most real-world networks:


Questions ???

In the presence of time-limited propagation, is viral marketing still cheap and massively effective?

How to select seeds for fast propagation?

Does a power-law topology really help spread influence?

Can targeting one (or a few) nodes influence the whole network? How about targeting nodes?


Our contributions

Theoretically justify the seeding size needed to influence the network in the presence of a time limit and a power-law topology.

Study the difference in the hardness of the influence problem in general networks vs. power-law networks.

Provide VirAds, a scalable algorithm for fast influence propagation.


COST-EFFECTIVE, FAST, MASSIVE VIRAL MARKETING PROBLEM

Cost-effective, Fast, and Massive viral marketing problem (CFM)

Given: a network G = (V, E) and a diffusion model.

Objective: spread the influence to the whole network within d hops.

Task: find the minimum set of nodes to target!
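To make the CFM task concrete, the following is a minimal brute-force sketch in Python. It is not the authors' method and only works on tiny graphs. It assumes the deterministic threshold rule from the diffusion model slide; the names `spread`, `min_seed_set`, and `rho` are hypothetical, as is the reading that a node activates once at least a fraction `rho` of its neighbors are active.

```python
from itertools import combinations


def spread(adj, seeds, rho, d):
    """Nodes active after at most d synchronous rounds (assumed threshold rule)."""
    active = set(seeds)
    for _ in range(d):
        newly_active = {
            v for v, nbrs in adj.items()
            if v not in active and nbrs
            and sum(u in active for u in nbrs) >= rho * len(nbrs)
        }
        if not newly_active:
            break
        active |= newly_active
    return active


def min_seed_set(adj, rho, d):
    """Smallest seed set that activates every node within d hops.

    Exhaustive search over subsets by increasing size: exponential time,
    intended only to illustrate the problem statement.
    """
    nodes = sorted(adj)
    for k in range(len(nodes) + 1):
        for candidate in combinations(nodes, k):
            if spread(adj, candidate, rho, d) == set(nodes):
                return set(candidate)


# Hypothetical toy network: a cycle on six nodes.
cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
seeds_one_hop = min_seed_set(cycle, rho=0.5, d=1)
seeds_three_hops = min_seed_set(cycle, rho=0.5, d=3)
```

On this toy cycle the hop limit d changes the answer: a tighter limit forces a larger seed set, which is the trade-off the CFM formulation captures.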


Source: M. G. Rodriguez, J. Leskovec, A. Krause

Diffusion Model

Deterministic model.

Inactive → Active: the node has a fraction ρ of active neighbors.

Active → Inactive: never.

A (slight) generalization of the Majority Voting Model (ρ = 1/2).

A "special case" of the Linear Threshold model, but the threshold is deterministic and a single fraction is used for every node.
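The deterministic rule above can be simulated in a few lines. This is a minimal Python sketch under stated assumptions: rounds are synchronous, a node activates once at least a fraction `rho` of its neighbors are active (the slide does not say whether the comparison is strict), and the function name `spread` and the toy path network are hypothetical.

```python
def spread(adj, seeds, rho, d):
    """Return the nodes active after at most d synchronous rounds.

    adj maps each node to a list of its neighbors (undirected graph).
    Assumption: an inactive node activates once at least a fraction rho
    of its neighbors are active; active nodes never deactivate.
    """
    active = set(seeds)
    for _ in range(d):
        newly_active = {
            v for v, nbrs in adj.items()
            if v not in active and nbrs
            and sum(u in active for u in nbrs) >= rho * len(nbrs)
        }
        if not newly_active:
            break  # fixed point reached before the hop limit
        active |= newly_active
    return active


# Hypothetical toy network: a path a - b - c - d.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
after_one_hop = spread(path, {"a"}, rho=0.5, d=1)
after_three_hops = spread(path, {"a"}, rho=0.5, d=3)
```

With `rho = 0.5` each interior node of the path needs one active neighbor, so the influence advances one hop per round and reaches the far end only after three rounds.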


Hardness of Approximating

CFM is NP-hard.

Approximating CFM within […] is NP-hard, even for […] (adjusting Feige's proof for Set Cover).


2. Hardness of Approximating

[Figure: reduction from Set Cover, with node groups D', D, S, and U. Labels: sets S1, S2, S3, …, S|S|; elements e1, …, e|U|; nodes x1, …, xt and x'1, …, x't; nodes w1, …, Wc(ρ); and a path u, uv1, uv2, …, uvd-1, v.]


Hardness of Approximating (d > 1)

[Figure: two panels, "Direct Failures (d=1)" and "D-hop failures", showing a graph G = (V, E) and a graph G' = (V', E') with nodes a, b, c, d and added nodes w1, w2, …, Wc(ρ). A solution of size k on one side corresponds to a solution on the other, and an optimal solution to an optimal solution. Both panels are annotated with O(ln n) inapproximability.]

3. Power-law Networks

CFM in Power-law Networks

Theorem 1. For power-law networks with […], there is a constant […] that depends only on […], so that influencing the whole network would require targeting at least […] nodes.

Corollary: Even selecting all vertices results in a constant approximation algorithm (vs. hardness).


Power-law networks vs. General Networks

General networks: selecting one node can influence the whole network (e.g. a star graph). Hard to approximate within a factor […].

Power-law networks: must select at least […] nodes to influence the network. Approximating within a constant factor is trivial (just select all nodes in the network).
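The star-graph remark can be checked directly. The sketch below is a hedged Python illustration, assuming the deterministic threshold rule with a hypothetical parameter `rho` (a node activates once at least a fraction `rho` of its neighbors are active); the helper `spread` and the six-node star are made up for the example.

```python
def spread(adj, seeds, rho, d):
    """Nodes active after at most d synchronous rounds (assumed threshold rule)."""
    active = set(seeds)
    for _ in range(d):
        newly_active = {
            v for v, nbrs in adj.items()
            if v not in active and nbrs
            and sum(u in active for u in nbrs) >= rho * len(nbrs)
        }
        if not newly_active:
            break
        active |= newly_active
    return active


# Hypothetical star graph: hub 0 connected to leaves 1..5.
star = {0: [1, 2, 3, 4, 5], **{leaf: [0] for leaf in range(1, 6)}}

from_hub = spread(star, {0}, rho=0.5, d=1)    # seed the hub
from_leaf = spread(star, {1}, rho=0.5, d=10)  # seed a single leaf instead
```

Seeding the hub activates every leaf in one round, because the hub is each leaf's only neighbor. Seeding one leaf gets nowhere: the hub would need at least half of its five neighbors active.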


Optimal solutions via Math. Prog.

Propagation in Erdos's Collaboration network:

[Figure: optimal seeding size. X-axis: number of propagation rounds. Y-axis: seeding size in percent. Panels: ρ = 0.4, ρ = 0.6, ρ = 0.8.]


VIRADS: ALGORITHM FOR FAST INFORMATION PROPAGATION

Efficient Heuristics for Large Scale Networks

VirAds: Fast-Spreading Algorithm

1. Maintain a priority queue of nodes: priority = # affected vertices + # affected edges.

2. Pick the vertex with the highest priority.

3. Recalculate its priority, and select the vertex if the new priority is still the highest; otherwise repeat.

4. Update the number of activated vertices with the selected node.

5. Lazy update: update the priority of only those vertices that are "affected" by the selected vertex.
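The steps above can be sketched in Python as a lazy greedy loop. This is an illustration under several assumptions, not the authors' implementation: the slide does not define "affected", so `priority` uses a hypothetical one-hop reading (inactive neighbors of v count as affected edges; v plus the neighbors it would push over the threshold count as affected vertices). Priorities are recomputed when a vertex is popped, which stands in for step 5, and only inactive vertices are considered as seeds.

```python
import heapq


def spread(adj, seeds, rho, d):
    """Nodes active after at most d synchronous rounds (assumed threshold rule)."""
    active = set(seeds)
    for _ in range(d):
        newly_active = {
            v for v, nbrs in adj.items()
            if v not in active and nbrs
            and sum(u in active for u in nbrs) >= rho * len(nbrs)
        }
        if not newly_active:
            break
        active |= newly_active
    return active


def priority(adj, active, v, rho):
    """Hypothetical one-hop score: # affected vertices + # affected edges."""
    inactive_nbrs = [u for u in adj[v] if u not in active]
    affected_edges = len(inactive_nbrs)
    # v itself, plus inactive neighbors that v would push over the threshold.
    affected_vertices = 1 + sum(
        1 for u in inactive_nbrs
        if sum(w in active for w in adj[u]) + 1 >= rho * len(adj[u])
    )
    return affected_vertices + affected_edges


def lazy_greedy_seeds(adj, rho, d):
    """Select seeds until every node is active within d hops."""
    seeds, active = set(), set()
    # heapq is a min-heap, so priorities are stored negated (step 1).
    heap = [(-priority(adj, active, v, rho), v) for v in adj]
    heapq.heapify(heap)
    while len(active) < len(adj) and heap:
        _, v = heapq.heappop(heap)                 # step 2
        if v in active:
            continue                               # already influenced
        fresh = priority(adj, active, v, rho)      # step 3: recalculate
        if heap and fresh < -heap[0][0]:
            heapq.heappush(heap, (-fresh, v))      # no longer the best: retry
            continue
        seeds.add(v)
        active = spread(adj, seeds, rho, d)        # step 4
    return seeds


# Hypothetical toy network: a path 0 - 1 - 2 - 3 - 4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
chosen = lazy_greedy_seeds(path, rho=0.5, d=1)
```

Storing negated priorities lets the standard-library min-heap act as a max-priority queue. Because stale entries are refreshed only when they reach the top, most vertices are never rescored after a selection.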


4. VirAds: Scalable Algorithm

Heuristics for Large Scale Networks

Datasets:

Physics collaboration network: 37K vertices, 63K edges.

Facebook New Orleans City: 90K vertices, ~4M edges.

Orkut social network: 3M vertices, 220M edges.

Competitors:

Max degree selector.

VirAds: one-hop greedy selector.

Exhaustive Update: expensive multi-hop greedy. Cannot run for large networks (e.g. Orkut).
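For comparison, a max degree baseline might look like the Python sketch below. The slide only names this competitor, so the details are assumptions: vertices are added in decreasing degree order until the whole network is active within d hops, using the same assumed threshold rule with a hypothetical parameter `rho`.

```python
def spread(adj, seeds, rho, d):
    """Nodes active after at most d synchronous rounds (assumed threshold rule)."""
    active = set(seeds)
    for _ in range(d):
        newly_active = {
            v for v, nbrs in adj.items()
            if v not in active and nbrs
            and sum(u in active for u in nbrs) >= rho * len(nbrs)
        }
        if not newly_active:
            break
        active |= newly_active
    return active


def max_degree_seeds(adj, rho, d):
    """Assumed baseline: seed highest-degree vertices first until all are active."""
    seeds = set()
    for v in sorted(adj, key=lambda node: len(adj[node]), reverse=True):
        if spread(adj, seeds, rho, d) == set(adj):
            break  # the current seeds already reach everyone within d hops
        seeds.add(v)
    return seeds


# Hypothetical toy network: a path 0 - 1 - 2 - 3 - 4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
baseline = max_degree_seeds(path, rho=0.5, d=1)
```

On this path the baseline picks the three interior vertices, while two seeds (1 and 3) already suffice for d = 1, which shows how degree alone can overspend on seeds.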

Experimental Results: Solution Quality

[Figure: seeding size when […]. X-axis: number of propagation rounds. Y-axis: seeding size in percent. Panels: Physics, Facebook, Orkut.]

Experimental Results: Running Time

[Figure: running time when […]. X-axis: number of propagation rounds. Y-axis: time in seconds (log scale). Panels: Physics, Facebook, Orkut.]


Summary

Finding seed nodes is a hard problem in general.

It is "not so hard" in power-law networks.

The seeding cost is often NOT cheap.

We propose VirAds, a scalable algorithm for target selection: better than the centrality heuristic, and scalable to networks with millions of nodes.


Acknowledgement

We would like to thank NSF and DTRA for their generous support.

We thank the anonymous reviewers who provided helpful comments to improve the paper.


THANK YOU FOR LISTENING!

Q&A