least cost influence in multiplex social networks

Least Cost Influence in Multiplex Social Networks MODEL REPRESENTATION AND ANALYSIS Presented by: Ayushi Jain Rahul Bobhate Natasha Mandal Ankur Sachdeva Dung T. Nguyen, Huiyuan Zhang, Soham Das, My T. Thai, Thang N. Dinh

Upload: natasha-mandal

Post on 15-Jan-2017


TRANSCRIPT

Page 1: Least Cost Influence in Multiplex Social Networks

Least Cost Influence in Multiplex Social Networks

MODEL REPRESENTATION AND ANALYSIS

Presented by: Ayushi Jain, Rahul Bobhate, Natasha Mandal, Ankur Sachdeva

Paper authors: Dung T. Nguyen, Huiyuan Zhang, Soham Das, My T. Thai, Thang N. Dinh

Page 2: Least Cost Influence in Multiplex Social Networks

Structure

• Define a few terms

• Motivation

• Related work

• Challenges and proposed solution

• Math notations and problem definition

• Lossless coupling

• Lossy coupling

• Influence relay

• Experiments

• Conclusion

Page 3: Least Cost Influence in Multiplex Social Networks

What are multiplex networks?

• Networks in which the same nodes can be connected by multiple kinds of edges, for example users linked on more than one social media platform

• Example: a set of users who interact on Facebook, Twitter & Foursquare

Page 4: Least Cost Influence in Multiplex Social Networks

What is the least cost influence (LCI) problem?

• Find a minimum number of seed users who can eventually influence a large number of users

• Example: how to find the set of influencers with the least advertising cost who can influence a massive number of users

Or

How to find the minimum number of inducements required for product adoption to reach a certain proportion of the population

Page 5: Least Cost Influence in Multiplex Social Networks

Motivation

• Over the past decade, the popularity of online social networks (OSNs) has created a major communication medium for information sharing

• Similar to real social networks: word-of-mouth & peer-pressure effects

Do you know how much time an individual spends on social media on average?

1.72 hours per day

28% of online activity

Page 6: Least Cost Influence in Multiplex Social Networks

Some more statistics!

Number of Facebook users

Page 7: Least Cost Influence in Multiplex Social Networks
Page 8: Least Cost Influence in Multiplex Social Networks

Number of users (in millions)

Page 9: Least Cost Influence in Multiplex Social Networks

Why is it important to study information diffusion in these networks?

• A considerable number of users overlap between networks

• Users can relay the information from one network to another

• Example:

Jack

Page 10: Least Cost Influence in Multiplex Social Networks

If we only consider the information propagation in one network, we will fail to identify the most influential users

Page 11: Least Cost Influence in Multiplex Social Networks

Related work

Single network

• Kempe et al.
  • Find a set of k users who can maximize influence
  • Stochastic process: Independent Cascade (IC) model
  • Probability of influencing friends ∝ strength of friendship
  • NP-hard; greedy algorithm with approximation ratio (1 − 1/e)
  • Linear Threshold (LT) model
  • A user adopts a new product when the total influence of friends exceeds a threshold

• Dinh et al.
  • Suggested an algorithm for a special case of LT
  • Influence between users is uniform, and a user is influenced if a certain fraction ρ of their friends are active

Page 12: Least Cost Influence in Multiplex Social Networks

Multiplex Networks

• Yagan et al.
  • Studied the connection between online and offline networks
  • Investigated the outbreak of information using the SIR model on random networks

• Liu et al.
  • Analyzed networks formed by online interaction and offline events

Drawbacks:
  • Studied flow of information and network clustering, but not LCI
  • Did not study the specific optimization problem of viral marketing

• Shen et al.
  • Studied information propagation in multiplex OSNs
  • Combined all networks into one network by representing an overlapping user as a super node
  • Cannot preserve the individual networks' properties

Page 13: Least Cost Influence in Multiplex Social Networks

Challenges

How to evaluate the influence of overlapping users in multiplex networks?

In which network is a user easier to influence?

Which network propagates the influence better?

Page 14: Least Cost Influence in Multiplex Social Networks

Proposed solution

• In this paper, we study the LCI problem: finding a set of users with minimum cardinality that influences a certain fraction of users in multiplex networks

• Present a model representation based on various coupling schemes that reduce the problem in multiplex networks to an equivalent problem on a single network. The coupling schemes can be applied to the most popular diffusion models, including the Linear Threshold model, the Stochastic Threshold model, and the Independent Cascade model

• Introduce a new metric called influence relay to analyze the influence diffusion process in both a single network and multiplex networks

Page 15: Least Cost Influence in Multiplex Social Networks

Graph Notations

• $G^i$ – Weighted directed graph $(V^i, E^i, \theta^i, W^i)$.

• $V^i$ – Set of vertices in graph $G^i$; represents the users in the network.

• $E^i$ – Set of edges in graph $G^i$; represents the connections between users.

• $W^i$ – Set of weights of the edges in $E^i$; represents the strength of influence or the strength of connection.

• $N_u^{i-}$, $N_u^{i+}$ – Sets of incoming and outgoing neighbors of $u$.

• $\theta^i(u)$ – Threshold indicating the persistence of opinions of $u$.

Page 16: Least Cost Influence in Multiplex Social Networks

Least Cost Influence (LCI) Problem definition

• Given:
  • System of $k$ networks $G^{1 \ldots k}$
  • Set of users $U$
  • Time hop $d$
  • $0 < \beta < 1$

• To find:
  • A seed set $S \subset U$ of minimum cardinality such that
  • At least a $\beta$ fraction of the users in $U$ are active
  • After $d$ hops

Page 17: Least Cost Influence in Multiplex Social Networks
Page 18: Least Cost Influence in Multiplex Social Networks
Page 19: Least Cost Influence in Multiplex Social Networks

Linear Threshold model

• Influence and information diffusion model for a single network

• Can be extended to handle multiple networks

• In the LT model:
  • Every user is either active or inactive
  • A user $u$ is active if they accept the information, OR
  • The total influence of their active neighbors is at least their threshold

• At each time hop, some inactive users are activated, and they go on to activate new users.
  • $d$ is the number of hops over which information is propagated.
  • The set of users active after $d$ hops, caused by seed set $S$, is denoted by $A^d(G^{1 \ldots k}, S)$
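The diffusion rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the names `Network`, `active_set`, and `reaches_fraction` and the dictionary layout are assumptions made for the example, and a user is assumed to become active everywhere as soon as the threshold is met in any one network.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical layout for one network: (in_weights, thresholds), where
# in_weights[u][v] is the weight of edge v -> u and thresholds[u] is u's threshold.
Network = Tuple[Dict[str, Dict[str, float]], Dict[str, float]]


def active_set(networks: List[Network], seeds: Set[str], d: int) -> Set[str]:
    """Users active after d hops of LT diffusion started from `seeds`."""
    active = set(seeds)
    for _ in range(d):
        newly = set()
        for in_weights, thresholds in networks:
            for u, incoming in in_weights.items():
                if u in active:
                    continue
                # Total influence of u's already-active in-neighbors in this network.
                influence = sum(w for v, w in incoming.items() if v in active)
                if influence >= thresholds[u]:
                    newly.add(u)
        if not newly:
            break
        active |= newly
    return active


def reaches_fraction(networks, users, seeds, d, beta) -> bool:
    """True if `seeds` activates at least a beta fraction of `users` within d hops."""
    return len(active_set(networks, seeds, d)) >= beta * len(users)


# Toy system of two networks over users a, b, c, d (made-up numbers).
g1 = ({"b": {"a": 0.6}, "c": {"b": 0.5}}, {"b": 0.5, "c": 0.9})
g2 = ({"c": {"b": 1.0}, "d": {"c": 0.4}}, {"c": 0.8, "d": 0.3})
users = {"a", "b", "c", "d"}

print(sorted(active_set([g1, g2], {"a"}, 1)))  # ['a', 'b']
print(sorted(active_set([g1, g2], {"a"}, 3)))  # ['a', 'b', 'c', 'd']
```

In the toy system, user c cannot be activated through the first network alone (0.5 < 0.9) but is activated through the second, which is the cross-network relay effect the deck is about.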

Page 20: Least Cost Influence in Multiplex Social Networks

Coupling Schemes

• Lossless coupling scheme:
  • Scheme to combine multiple networks into a single network.
  • No loss of data while combining networks. (Obviously!)
  • Advantages:
    • Use existing algorithms
    • Same quality of solution
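Once the networks are coupled into one, any single-network seed-selection routine can be run on the result. The sketch below shows one generic greedy heuristic under the LT rule; it is an illustration only and not necessarily the algorithm used in the paper. The function names and the `in_weights`/`thresholds` layout are assumptions.

```python
def lt_active(in_weights, thresholds, seeds, d):
    """Vertices active after d LT hops on a single network."""
    active = set(seeds)
    for _ in range(d):
        newly = {
            u for u, incoming in in_weights.items()
            if u not in active
            and sum(w for v, w in incoming.items() if v in active) >= thresholds[u]
        }
        if not newly:
            break
        active |= newly
    return active


def greedy_seeds(in_weights, thresholds, users, d, beta):
    """Add the user with the largest resulting spread until a beta fraction is active."""
    seeds = set()
    target = beta * len(users)
    while len(lt_active(in_weights, thresholds, seeds, d)) < target:
        # sorted() makes tie-breaking deterministic (first name wins).
        best = max(
            sorted(users - seeds),
            key=lambda u: len(lt_active(in_weights, thresholds, seeds | {u}, d)),
        )
        seeds.add(best)
    return seeds


# Toy chain a -> b -> c -> d plus an isolated user e (made-up numbers).
in_weights = {"b": {"a": 1.0}, "c": {"b": 1.0}, "d": {"c": 1.0}}
thresholds = {"b": 1.0, "c": 1.0, "d": 1.0}
users = {"a", "b", "c", "d", "e"}

print(sorted(greedy_seeds(in_weights, thresholds, users, d=3, beta=0.8)))  # ['a']
print(sorted(greedy_seeds(in_weights, thresholds, users, d=1, beta=0.8)))  # ['a', 'c']
```

With only one hop allowed, a single seed can reach at most two users in the chain, so the heuristic needs a second seed; with three hops one seed suffices.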

Page 21: Least Cost Influence in Multiplex Social Networks

Challenges

• Heterogeneity of user participation:
  • One user might have joined a single network
  • Another user might have joined multiple networks
  • Recognition of users is difficult

• Inter-network influence propagation
  • A user transmits the information in multiple networks
  • The transmission of influence between networks must be represented in a single network

• Preserving properties of individual networks
  • The coupled network should preserve the diffusion properties of the individual networks
  • It should be possible to establish a relationship between the solution for the coupled network and the solutions for the individual networks

Page 22: Least Cost Influence in Multiplex Social Networks
Page 23: Least Cost Influence in Multiplex Social Networks

Coupling scheme for LT-model

• Solution to 1st challenge
  • Introduce dummy nodes.
  • They represent a user $u$ in a network $G^i$ in which the user is not registered.

• Solution to 2nd challenge
  • Introduce gateway vertices.
  • Introduce synchronization edges.
  • Instead of an edge between two vertices, there exist:
    • An edge from a user to a gateway vertex
    • An edge from the gateway vertex to a user

• Solution to 3rd challenge
  • Nothing else needs to be done.

Page 24: Least Cost Influence in Multiplex Social Networks
Page 25: Least Cost Influence in Multiplex Social Networks
Page 26: Least Cost Influence in Multiplex Social Networks
Page 27: Least Cost Influence in Multiplex Social Networks

Lemmas

• Lemma 1: Suppose that the propagation process in the coupled network $G$ starts from the seed set which contains only gateway vertices $S = \{s_1^0, \ldots, s_p^0\}$; then representative vertices are activated only at even propagation hops.

• Lemma 2: Suppose that the propagation process on $G^{1 \ldots k}$ and $G$ starts from the same seed set $S$; then the following conditions are equivalent:
  • User $u$ is active after $d$ propagation hops in $G^{1 \ldots k}$.
  • There exists $i$ such that $u^i$ is active after $2d - 1$ propagation hops in $G$.
  • Vertex $u^0$ is active after $2d$ propagation hops in $G$.
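The hop-doubling relationship in Lemma 2 can be checked on a toy system. The sketch below uses a simplified coupling that is an assumption of this example and may differ from the paper's exact construction: each user u gets a gateway vertex `(u, 0)` and a representative `(u, i)` in every network i where u has in-edges, each original edge v → u in network i becomes gateway `(v, 0)` → representative `(u, i)`, and each representative feeds its own gateway with weight 1 (gateway threshold 1). Dummy nodes and any further synchronization edges are omitted.

```python
def lt(in_w, theta, seeds, hops):
    """Vertices active after `hops` LT hops on a single network."""
    active = set(seeds)
    for _ in range(hops):
        active |= {
            u for u, inc in in_w.items()
            if u not in active
            and sum(w for v, w in inc.items() if v in active) >= theta[u]
        }
    return active


def lt_multiplex(networks, seeds, d):
    """Users active after d hops; a user activates if any one network's threshold is met."""
    active = set(seeds)
    for _ in range(d):
        newly = set()
        for in_w, theta in networks:
            newly |= lt(in_w, theta, active, 1)
        active |= newly
    return active


def couple(networks):
    """Simplified gateway/representative coupling (an assumption, see lead-in)."""
    in_w, theta = {}, {}
    for i, (net_w, net_theta) in enumerate(networks, start=1):
        for u, inc in net_w.items():
            rep, gate = (u, i), (u, 0)
            in_w[rep] = {(v, 0): w for v, w in inc.items()}  # gateway -> representative
            theta[rep] = net_theta[u]
            in_w.setdefault(gate, {})[rep] = 1.0             # representative -> gateway
            theta[gate] = 1.0
    return in_w, theta


g1 = ({"b": {"a": 0.6}, "c": {"b": 0.5}}, {"b": 0.5, "c": 0.9})
g2 = ({"c": {"b": 1.0}, "d": {"c": 0.4}}, {"c": 0.8, "d": 0.3})
coupled_w, coupled_theta = couple([g1, g2])

for d in range(4):
    users = lt_multiplex([g1, g2], {"a"}, d)
    coupled = lt(coupled_w, coupled_theta, {("a", 0)}, 2 * d)
    gateways = {u for (u, layer) in coupled if layer == 0}
    print(d, sorted(users), sorted(gateways))
```

For each d, the users active after d hops in the multiplex system match the gateway vertices active after 2d hops in the coupled network, which is the equivalence between the first and third conditions of Lemma 2 in this simplified setting.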

Page 28: Least Cost Influence in Multiplex Social Networks

Theorems

• Theorem 1: Given a system of $k$ networks $G^{1 \ldots k}$ with the user set $U$, the coupled network $G$ produced by the lossless coupling scheme, and a seed set $S = \{s_1, s_2, \ldots, s_p\}$, if $A^d(G^{1 \ldots k}, S) = \{a_1, a_2, \ldots, a_q\}$ is the set of active users caused by $S$ after $d$ propagation hops in multiplex networks, then $A^{2d}(G, S) = \{a_1^0, a_1^1, \ldots, a_1^k, \ldots, a_q^0, a_q^1, \ldots, a_q^k\}$ is the set of active vertices caused by $S$ after $2d$ propagation hops in the coupled network.

• Theorem 2: When the lossless scheme is used, the set $S = \{s_1, s_2, \ldots, s_p\}$ influences a $\beta$ fraction of users in $G^{1 \ldots k}$ after $d$ propagation hops if and only if $S = \{s_1^0, s_2^0, \ldots, s_p^0\}$ influences a $\beta$ fraction of vertices in the coupled network $G$ after $2d$ propagation hops.

Page 29: Least Cost Influence in Multiplex Social Networks

Extension to other diffusion models

• The lossless coupling scheme can be used for other diffusion models.
  • Stochastic Threshold model
  • Independent Cascade model

• Similarity between the LT model and other approaches
  • Same approach of using
    • Gateway vertices
    • Representative vertices
    • Synchronization edges

Page 30: Least Cost Influence in Multiplex Social Networks

Lossy Coupling

MOTIVATION

• The coupled network produced by lossless coupling contains a large number of extra vertices and edges.

• It is ideal to have a compact coupled network which contains only users as vertices.

• Such a compact coupled network will inevitably have loss of information.

Page 31: Least Cost Influence in Multiplex Social Networks

Lossy Coupling

GOALS

• The goal is to design a scheme which will minimize this loss of information.

• The solution for finding the Least Cost Influence in the compact coupled network should be very close to the solution in the original multiplex network.

Page 32: Least Cost Influence in Multiplex Social Networks

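Lossy Coupling

OBSERVATION 1

• A user $u$ will be activated if there exists a network $i$ such that $\sum_{v \in N_u^{i-} \cap A} w^i(v, u) \ge \theta^i(u)$, where $A$ is the set of active users.

• We can relax the conditions to activate $u$ with positive parameters $\alpha^i(u)$ as follows:

$$\sum_{i=1}^{k} \alpha^i(u) \sum_{v \in N_u^{i-} \cap A} w^i(v, u) \;\ge\; \sum_{i=1}^{k} \alpha^i(u)\, \theta^i(u)$$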

Page 33: Least Cost Influence in Multiplex Social Networks

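Lossy Coupling

PROPOSITION 1

• For a system of networks $G^{1 \ldots k}$, if

$$\sum_{i=1}^{k} \alpha^i(u) \sum_{v \in N_u^{i-} \cap A} w^i(v, u) \;\ge\; \sum_{i=1}^{k} \alpha^i(u)\, \theta^i(u)$$

is satisfied, then user $u$ is activated.

• This can be seen by checking the condition for a single network $G^i$: if the activation condition failed in every network, the weighted sum could not hold, because every $\alpha^i(u) > 0$.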

Page 34: Least Cost Influence in Multiplex Social Networks

Lossy Coupling

• The parameters $\alpha^i(u)$ can compensate for the extra influence which may be required to activate $u$

• can be made proportional to . In this way, when we choose [].

• In real life, we don’t know in which network $u$ will be activated. Hence, we have to use heuristics.

Page 35: Least Cost Influence in Multiplex Social Networks

Lossy Coupling

OBSERVATION 2

• When $u$ participates in multiple networks, it may be easier to influence $u$ in some networks than in others.

• For example, if a node $u$ is in two networks:

Network 1: $\theta^1(u) = 0.1$, $u$ has 8 in-neighbors and each in-neighbor influences $u$ with weight $w^1(v, u) = 0.1$; it takes 1 neighbor to activate $u$.

Network 2: $\theta^2(u) = 0.7$, $u$ has 8 in-neighbors and each in-neighbor influences $u$ with weight $w^2(v, u) = 0.1$; it takes 7 neighbors to activate $u$.
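The two counts in this example follow from dividing the threshold by the per-neighbor weight and rounding up. A small check, assuming equal weights on all in-edges and activation once the summed weight is at least the threshold (`neighbors_needed` is a name made up for this sketch):

```python
from fractions import Fraction
from math import ceil


def neighbors_needed(threshold: str, weight: str) -> int:
    """Smallest number of equally weighted active in-neighbors whose total reaches the threshold."""
    # Fractions parse the decimal strings exactly, avoiding floating-point rounding.
    return ceil(Fraction(threshold) / Fraction(weight))


print(neighbors_needed("0.1", "0.1"))  # 1 (Network 1)
print(neighbors_needed("0.7", "0.1"))  # 7 (Network 2)
```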

Page 36: Least Cost Influence in Multiplex Social Networks

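Lossy Coupling

EASINESS

• Intuitively, we can say that $u$ is easier to influence in Network 1.

• Formally, the easiness of $u$ in network $i$ is defined by a formula [equation not preserved in the transcript]

• We can use the easiness of $u$ in network $i$ as the parameter $\alpha^i(u)$ in the equation stated in OBSERVATION 1.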

Page 37: Least Cost Influence in Multiplex Social Networks

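Lossy Coupling

• The vertex set $V$ is the set of users $U = \{u_1, \ldots, u_n\}$

• The threshold of vertex $u$ is $\theta(u) = \sum_{i=1}^{k} \alpha^i(u)\, \theta^i(u)$

• The weight of edge $(v, u)$ is $w(v, u) = \sum_{i=1}^{k} \alpha^i(u)\, w^i(v, u)$, where $w^i(v, u) = 0$ if there is no edge from $v$ to $u$ in the $i$-th network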
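A compact coupled network of this kind can be sketched as a weighted sum over the component networks. The code below is an illustration under assumptions: thresholds and edge weights are combined with positive per-user, per-network parameters (called `alpha` here, a name chosen for this sketch), each user is summed only over the networks they belong to, and omitting `alpha` sets every parameter to 1.

```python
def lossy_couple(networks, alpha=None):
    """Combine networks into one (in_weights, thresholds) pair.

    networks: list of (in_weights, thresholds) with in_weights[u][v] = weight of v -> u.
    alpha:    optional list of dicts, alpha[i][u] > 0; defaults to 1.0 everywhere.
    """
    in_w, theta = {}, {}
    for i, (net_w, net_theta) in enumerate(networks):
        for u, t in net_theta.items():
            a = 1.0 if alpha is None else alpha[i][u]
            # Coupled threshold: weighted sum of u's thresholds.
            theta[u] = theta.get(u, 0.0) + a * t
            row = in_w.setdefault(u, {})
            # Coupled edge weight: weighted sum of the weights of v -> u.
            for v, w in net_w.get(u, {}).items():
                row[v] = row.get(v, 0.0) + a * w
    return in_w, theta


# User u appears in two networks (made-up numbers, exactly representable as floats).
n1 = ({"u": {"a": 0.5, "b": 0.25}}, {"u": 0.5})
n2 = ({"u": {"a": 0.25, "c": 0.5}}, {"u": 0.75})

w, th = lossy_couple([n1, n2])
print(th["u"], w["u"])  # 1.25 {'a': 0.75, 'b': 0.25, 'c': 0.5}

w2, th2 = lossy_couple([n1, n2], alpha=[{"u": 2.0}, {"u": 1.0}])
print(th2["u"], w2["u"])  # 1.75 {'a': 1.25, 'b': 0.5, 'c': 0.5}
```

Raising the parameter for the first network makes the coupled condition lean toward that network's threshold and weights, which is the role the heuristics on the surrounding slides play.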

Page 38: Least Cost Influence in Multiplex Social Networks

Lossy Coupling

[Figure: worked example computing the coupled threshold of the blue node and the coupled weight of the edge between the red node and the blue node]

Page 39: Least Cost Influence in Multiplex Social Networks

Lossy Coupling

INVOLVEMENT

• If a user is surrounded by a group of friends who have a high influence on each other, the user tends to get influenced.

• We estimate the involvement of a node in a network by measuring how strongly the 1-hop neighborhood is connected and to what extent influence can propagate from one node to another within that 1-hop neighborhood.

Page 40: Least Cost Influence in Multiplex Social Networks

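Lossy Coupling

• Formally, the involvement of a node $u$ in a network $G^i$ is defined by a formula [equation not preserved in the transcript]

AVERAGE

• All parameters $\alpha^i(u)$ have the same value [value not preserved in the transcript]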

Page 41: Least Cost Influence in Multiplex Social Networks

Lossy Coupling

THEOREM 3

• When a lossy coupling scheme is used, if the set of users $S$ activates a $\beta$ fraction of users in $G$ (lossy coupled network), then it activates at least a $\beta$ fraction of users in $G^{1 \ldots k}$ (original system).

• The proof is based on the fact that the active state of a user in $G$ implies an active state of that user in $G^{1 \ldots k}$.

Page 42: Least Cost Influence in Multiplex Social Networks
Page 43: Least Cost Influence in Multiplex Social Networks

Influence Relay

MOTIVATION

• When information is diffused in multiplex networks, it may flow within a single network or may travel across several networks.

• What is the contribution of each component network to the influence process?

• How much information flows within a network or between networks?

• Quantifying these values will help us understand the diffusion process in multiplex networks.

Page 44: Least Cost Influence in Multiplex Social Networks

Influence Relay

DEFINITION

• The authors proposed influence relay (IR) as a metric to quantify the role of users in propagating information.

• The IR of vertices is recursively defined depending on the order of activation.

• $S$ = seed set, $G$ = coupled network, $d$ = number of hops after which the activation process stops, $h(u)$ = hop at which $u$ is activated.

• All inactive vertices in $G$ have an IR of 0.

Page 45: Least Cost Influence in Multiplex Social Networks

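Influence Relay

• For each activated vertex $u$, the IR of $u$, denoted by $IR(u)$, is a linear combination of the IR of its outgoing neighbors that are activated after $u$.

• Formally, the IR of vertices is defined as:

$$IR(u) = 1 + \sum_{v \in N_u^{+},\; h(v) > h(u)} \frac{w(u, v)}{\sum_{x \in N_v^{-},\; h(x) < h(v)} w(x, v)}\, IR(v)$$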

Page 46: Least Cost Influence in Multiplex Social Networks

Influence Relay

• The IR captures the amount of influence a vertex relays to other vertices after adopting the information.

• Thus, the IR of a vertex $u$ depends largely on the IR of the vertices that $u$ helps to activate and the weight of the edges between $u$ and them.

• The vertex $u$ is responsible for a share $w(u, v) / \sum_{x} w(x, v)$ of $v$’s IR, where $x$ ranges over the in-neighbors of $v$ that are activated before $v$.

• We add 1 to the IR of $u$ since $u$ also contributes itself to the set of activated vertices.

Page 47: Least Cost Influence in Multiplex Social Networks

Influence Relay

COMPUTING INFLUENCE RELAY

• We compute the IR of vertices in reverse order of the diffusion process.

• We construct the influence graph from the seed set $S$ to represent the diffusion process and to calculate the IR of all nodes in $G$.

• The vertex set of the influence graph is the set of activated vertices $A^d(G, S)$.

• There is an edge from $u$ to $v$ in the influence graph if $u$ has passed information to $v$, i.e. $(u, v) \in E$ and $h(u) < h(v)$.

• The influence graph is a directed acyclic graph, and computing its reverse topological ordering takes linear time. The main loop runs over all the edges of the influence graph, so the IR of all vertices can be computed in linear time.

Page 48: Least Cost Influence in Multiplex Social Networks

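Input: A network $G = (V, E, \theta, W)$, a seed set $S$ and the number of hops $d$.

Output: The influence relay IR of all vertices.

I ← the influence graph caused by S on G
for each u ∈ V do
    IR(u) ← 0
end for
Compute the topological ordering u_1, ..., u_n of the vertices in I
for i = n down to 1 do
    IR(u_i) ← IR(u_i) + 1
    total ← 0
    for each in-neighbor v of u_i in I do
        total ← total + w(v, u_i)
    end for
    for each in-neighbor v of u_i in I do
        IR(v) ← IR(v) + IR(u_i) · w(v, u_i) / total
    end for
end for
Return IR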

Page 49: Least Cost Influence in Multiplex Social Networks

Influence Relay

THEOREM 4

• One of the important properties of IR is that it preserves the number of activated vertices.

• The total IR of the seed vertices is equal to the total number of activated vertices:

$$\sum_{s \in S} IR(s) = |A^d(G, S)|$$
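The backward pass and the conservation property can be tried out on a small graph. The following is a sketch under assumptions, not the authors' implementation: it runs LT diffusion on a single (coupled) network, records the hop at which each vertex is activated, and then distributes each activated vertex's influence relay to the in-neighbors that were active before it, in proportion to edge weight. The names `influence_relay` and `out_weights` are chosen for this example.

```python
from collections import defaultdict


def influence_relay(out_weights, thresholds, seeds, d):
    """Return (ir, hop): influence relay per vertex and activation hop per active vertex."""
    in_w = defaultdict(dict)
    for u, outs in out_weights.items():
        for v, w in outs.items():
            in_w[v][u] = w

    # Forward pass: LT diffusion for at most d hops, remembering activation hops.
    hop = {s: 0 for s in seeds}
    for t in range(1, d + 1):
        newly = [u for u in in_w if u not in hop
                 and sum(w for v, w in in_w[u].items() if v in hop) >= thresholds[u]]
        if not newly:
            break
        for u in newly:
            hop[u] = t

    # Backward pass: latest activations first, a reverse topological order
    # of the influence graph (edges only go from earlier to later hops).
    ir = {u: 0.0 for u in set(out_weights) | set(in_w) | set(seeds)}
    for u in sorted(hop, key=hop.get, reverse=True):
        ir[u] += 1.0  # u counts itself
        parents = {v: w for v, w in in_w[u].items() if v in hop and hop[v] < hop[u]}
        total = sum(parents.values())
        for v, w in parents.items():
            ir[v] += ir[u] * w / total
    return ir, hop


# Toy network (made-up numbers): a -> b, a -> c, b -> c, c -> d.
out_weights = {"a": {"b": 0.5, "c": 0.25}, "b": {"c": 0.5}, "c": {"d": 1.0}}
thresholds = {"b": 0.5, "c": 0.75, "d": 2.0}
seeds = {"a"}

ir, hop = influence_relay(out_weights, thresholds, seeds, d=5)
print(hop)                                 # {'a': 0, 'b': 1, 'c': 2}
print({u: round(x, 3) for u, x in sorted(ir.items())})
print(sum(ir[s] for s in seeds), len(hop))  # both equal 3 (up to rounding)
```

Vertex d never becomes active, so its influence relay stays 0, and the seed's influence relay equals the number of activated vertices, as the conservation property states.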

Page 50: Least Cost Influence in Multiplex Social Networks

Influence Relay

INFLUENCE CONTRIBUTION

• To obtain the contribution of a network to the diffusion process, we sum up the IR of all seed vertices in that network.

INTERNAL AND EXTERNAL INFLUENCE

• Influence relay can also be used to quantify the amount of information flowing within and between networks.

Page 51: Least Cost Influence in Multiplex Social Networks

Influence Relay

• When the information is propagated within a component network, called the “target” network, there are two kinds of influence paths:
  • Internal paths include edges only in the target network.
  • External paths include some edges of other networks. They are formed when some of the vertices are activated outside the target network.

• We adapt influence relay to measure the internal influence (which passes through internal paths) and the external influence (which passes through external paths) of the seed set in the target network as follows:

Page 52: Least Cost Influence in Multiplex Social Networks

Influence Relay

• Each vertex $u$ has an internal influence and an external influence.

• Both values are calculated backwards from the activated vertices under $u$’s influence.

• Only an activated vertex in the target network receives 1 more unit of internal influence, since we only consider the influence propagation in the target network.

• If a vertex is activated outside the target network, all of its internal influence is converted to external influence.

Page 53: Least Cost Influence in Multiplex Social Networks
Page 54: Least Cost Influence in Multiplex Social Networks

EXPERIMENTS

Page 55: Least Cost Influence in Multiplex Social Networks

Data Sets

Types of data sets:

• Real Networks

• Synthesized Networks

Page 56: Least Cost Influence in Multiplex Social Networks

Real Networks

• Experiments were performed on 2 data sets:
  • Foursquare (FSQ) and Twitter networks
  • Co-author networks in the areas of Condensed Matter (CM), High-Energy Theory (Het), and Network Science (NetS)

• The number of overlapping users in the first data set, FSQ-Twitter, is 4100.

• For the second data set, the numbers of overlapping users of the network pairs CM-Het, CM-NetS, and Het-NetS are 2860, 517, and 90, respectively.

Page 57: Least Cost Influence in Multiplex Social Networks

Real Networks

Weights of edges are randomly assigned from 0 to 1.

The edge weights are then normalized so that the total weight of the incoming edges of each node is 1.

The threshold of each node is a random value from 0 to 1.
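This parameter setup can be sketched as follows. It is an illustration only: the slides do not say which random distribution was used, so a uniform draw is assumed here, and `random_lt_parameters` and `in_neighbors` are names made up for the example.

```python
import random


def random_lt_parameters(in_neighbors, rng):
    """Random in-edge weights normalized to sum to 1 per node, plus random thresholds.

    in_neighbors[u] lists the nodes with an edge into u.
    """
    in_weights, thresholds = {}, {}
    for u, neighbors in in_neighbors.items():
        raw = {v: rng.random() for v in neighbors}  # uniform in [0, 1), an assumption
        total = sum(raw.values())
        # Nodes with no in-edges (or an all-zero draw) simply get no weights.
        in_weights[u] = {v: w / total for v, w in raw.items()} if total > 0 else {}
        thresholds[u] = rng.random()
    return in_weights, thresholds


rng = random.Random(0)  # fixed seed so the sketch is repeatable
in_neighbors = {"a": ["b", "c"], "b": ["a"], "c": []}
in_weights, thresholds = random_lt_parameters(in_neighbors, rng)

for u, row in in_weights.items():
    print(u, round(sum(row.values()), 6), round(thresholds[u], 3))
```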

Page 58: Least Cost Influence in Multiplex Social Networks

Synthesized Networks

• Synthesized networks generated by the Erdős–Rényi random network model are used for testing networks with controlled parameters.

• Two networks with 10000 nodes are formed by randomly connecting each pair of nodes with probabilities 0.0008 and 0.006, respectively.

• The average degrees of the two networks are about 8 and 60.
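The link between connection probability and average degree can be checked with a small generator. This is a plain-Python sketch with an assumed function name (`erdos_renyi`); it uses a smaller graph than the slides (1000 nodes with probability 0.008, chosen for speed) so that the expected average degree, (n − 1) · p, is again about 8.

```python
import random
from itertools import combinations


def erdos_renyi(n, p, rng):
    """Undirected G(n, p) graph as an adjacency dict: each pair is linked with probability p."""
    adj = {u: set() for u in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


n, p = 1000, 0.008
graph = erdos_renyi(n, p, random.Random(42))
average_degree = sum(len(neighbors) for neighbors in graph.values()) / n

print(round((n - 1) * p, 3))     # expected average degree: 7.992
print(round(average_degree, 2))  # observed value, close to 8

# For the sizes quoted on the slide: (10000 - 1) * 0.0008 and (10000 - 1) * 0.006.
print(round(9999 * 0.0008, 2), round(9999 * 0.006, 2))  # 8.0 59.99
```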

Page 59: Least Cost Influence in Multiplex Social Networks

Comparison of coupling schemes

Solution Quality

• In both networks, the seed size is smallest when the lossless coupling scheme is used.

• The seed sizes are only slightly larger when the lossy coupling schemes are used.

Page 60: Least Cost Influence in Multiplex Social Networks

Comparison of coupling schemes

• A smaller seed size is obtained in two different ways:

• Increasing the fraction of overlapping users.

• Increasing the number of propagation hops.

Page 61: Least Cost Influence in Multiplex Social Networks

Comparison of coupling schemes

Running Time

• The greedy algorithm runs much faster in the lossy coupled networks than in the lossless coupled networks.

• Using the lossy coupled networks reduces the running time by a factor of 2 in FSQ-Twitter and a factor of 4 in the co-author networks, compared with using the lossless coupled networks.

• The major disadvantages of the lossless coupling scheme are the doubled number of hops and the extra nodes and edges.

Page 62: Least Cost Influence in Multiplex Social Networks

Advantages of using coupled networks

Influencing a fraction β of the nodes in all networks:

• The lossless coupling method outperforms running the greedy algorithm on each network separately and taking the union of the produced seed sets.

• With the separate-network approach, the seed set is 30% larger in the co-author networks and 47% larger in FSQ-Twitter than the seed set obtained with the lossless coupling method.

Page 63: Least Cost Influence in Multiplex Social Networks

Influencing a fraction β of the nodes in a particular network:

• The seed size decreases by up to 9%, 25%, 17%, and 26% in CM, Het, FSQ, and Twitter, respectively, when we consider these networks in connection with other networks.

• The external influence is substantial and accounts for a large portion in many cases. For instance, when the influenced fraction β = 0.2, the external influence accounts for 27.3%, 52.7%, and 30.0% of the total influence in CM, Het, and NetS, respectively.

Page 64: Least Cost Influence in Multiplex Social Networks

Analysis of seed sets

• A significant fraction of the seed set consists of overlapping nodes, although only 5%-7% of the users of any network are overlapping users.

• For β = 0.4, the fraction of overlapping seed vertices is around 24.9% and 25% in the co-author and FSQ-Twitter networks, respectively.

• When β is small, the influence contribution of overlapping users is high (approx. 50% when β = 0.2). However, when β is large, overlapping users have already been selected, so they are no longer favored.

Page 65: Least Cost Influence in Multiplex Social Networks

Mutual Impact of networks

• When k increases from 2 to 5, the seed size decreases several times. It implies that the introduction of a new OSN increases the diffusion of information significantly.

• The number of influenced vertices is raised 46% with the support of 3 new networks when k is changed from 2 to 5.

• The fraction of external influence also increases dramatically, from 39% when k = 2 to 67% when k = 5.

• All these results suggest that the existing networks may benefit from the newly introduced competitor.

Page 66: Least Cost Influence in Multiplex Social Networks

Conclusion and future work

• To tackle the LCI problem, novel coupling schemes are introduced to reduce the problem to a version on a single network.

• A new metric is designed to quantify the flow of influence inside and between networks based on the coupled network.

• Exhaustive experiments provide new insights into information diffusion in multiplex networks.

• In the future, the LCI problem can be investigated in multiplex networks with heterogeneous diffusion models, in which each network may have its own diffusion model.

Page 67: Least Cost Influence in Multiplex Social Networks
Page 68: Least Cost Influence in Multiplex Social Networks

Thank you!!
