[ieee 2012 international conference on advances in social networks analysis and mining (asonam 2012)...

5

Click here to load reader

Upload: t

Post on 27-Mar-2017

217 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: [IEEE 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012) - Istanbul (2012.08.26-2012.08.29)] 2012 IEEE/ACM International Conference on Advances

On Measurement of Influence in Social Networks

Behnam HajianSchool of Computer Science

Carleton University, Ottawa, Canada

Email: [email protected]

Tony WhiteSchool of Computer Science

Carleton University, Ottawa, Canada

Email: [email protected]

Abstract—One of the issues to be resolved in social rec-ommender systems is the identification of opinion leaders ina network. Finding effective people in societies has been akey question for many groups; e.g., marketers. The researchundertaken in this paper focuses on finding important nodes ina network based on their behaviour as well as the structure of thenetwork. This paper views the propagation of information in asocial network as a process of infection. The paper proposesan algorithm called the Probability Propagation Method formeasuring the probability of infection of all the nodes in anetwork starting from a given node in the network. Then,assuming independence in activation of nodes in a network,a method is proposed for ranking nodes according to theircapabilities in infecting a larger number of nodes in a network.These methods are validated using simulation software in whicha non-deterministic model of information diffusion is simulatedon several classes of network.

I. INTRODUCTION

On-line recommender systems are often not valued because

their users feel suggestions from these systems are useless [1].

It has been shown that the main reason for this feeling is that

people prefer to get recommendations from people they trust

rather than automated advertising software [2].

According to Iyengar et al. [3], friends have a significant

effect on the purchase decisions of a customer. It has been

observed that people are infected by a new idea from being

exposed to the innovation by their friends, rather than adver-

tisements. This motivates marketers to turn to viral marketing

rather than ad-hoc advertising. It is clear that influence varies

from person to person in a social network and identifying

the most influential people can significantly affect the effec-

tiveness of an advertising campaign. It is this problem that

motivates the research reported in this paper.

This paper focuses on how to rank nodes (people) according

to their ability to infect more nodes if that node individually

gets infected. This problem is referred to in this paper as the

influence measurement problem because influence is modelled

as the ability of a node to infect other nodes in the network.

The remainder of this paper is organized in five sections.

Section two describes related work. In Section three, a model

of information diffusion is proposed in the context of a social

network. Section four describes algorithms for measuring

influence in a social network. In order to validate the proposed

algorithms, a general framework of a social network is simu-

lated. Section five discusses the properties of our simulations,

experimental results and includes a discussion. Finally, the

paper is summarized in Section six.

II. RELATED WORK

There are several methods for measuring influence which

concentrate on the structure of the network rather than the

behavior of nodes and their interactions. However, researchers

have also considered other aspects, such as the analysis of

users’ behavior in a network. These are described in the

following paragraphs.

Meeyoung Cha et al. [4] studied the measurement of influ-

ence in Twitter based on the following three metrics:

• In-degree: number of followers of a user. This represents

the popularity of a user.

• Retweets: the number of times that followers of a specific

node pass-along a posting from a tweeter. Retweeting

causes propagation of a posting or news in a network.

• Mentions: the number of times that the name of a user

is mentioned in his followers’ postings.

Meeyoung Cha et al. [4] used Spearman’s rank correlation

coefficient for comparing users’ influence. The research com-

pared the three measures mentioned above to analyze topics of

the most influential people in Twitter according to the afore-

mentioned analysis and retweets. Meeyoung Cha et al. also

revealed that influence is not gained accidentally but requires

that users need to be consistently active in the network, an

observation corroborated by Khrabrov and Cybenko [5].

Afrasiabi and Benyoucef [6] studied node influence as a

function of link strength and incoming and outgoing clustering

value. Link strength is measured according to the volume

of interactions among users while the clustering value is

measured by the closeness of a node to highly interconnected

communities.

Kempe et al. [7] modelled influence using two diffusion

models: Linear Threshold and Independent Cascade. The

former model lies at the core of most subsequent generaliza-

tions. In the Linear Threshold model, a node v is influenced

by each neighbourhood w with a weight bv,w such that∑∀wneighborofv bv,w ≤ 1. In order for v to become active, the

sum of the weights of its active neighbours should be greater

than a given threshold θv . The Independent Cascade model is

based on interacting particle systems from probability theory

in which a node u may become active at time t+1 according

to the probability pu,v if v had become active in the time t.Khrabrov and Cybenko studied Dynamic Graph Analysis

[5] in which the number of daily mentions for each user is

considered as a indicator for computing different ranks such

2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining

978-0-7695-4799-2/12 $26.00 © 2012 IEEE

DOI 10.1109/ASONAM.2012.27

101

2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining

978-0-7695-4799-2/12 $26.00 © 2012 IEEE

DOI 10.1109/ASONAM.2012.27

101

Page 2: [IEEE 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012) - Istanbul (2012.08.26-2012.08.29)] 2012 IEEE/ACM International Conference on Advances

as PageRank, drank, and starrank for influence analysis of

each node in a network. For example, starrank considers user

importance with respect to his or her neighborhoods. These

researchers used several primitive indices in combination and

show how influence rank changes with time.

III. MODELLING INFORMATION DIFFUSION

1) Definition: A node vi is called infected according to the

posting pk ∈ P if the posting pk belongs to the wall of the

node vi but it is called active if it is capable of infecting its

neighbourhoods.

λpk(vi) =

{1 pk ∈ Wi

0 Otherwise(1)

where, Wi is the wall of the node vi.2) Definition: ζpk(vi, vj) is a function mapping each pair of

nodes (vi, vj) to 1 if the node vi is infectious to his neighbour

vj , otherwise its value is equal to 0.3) Definition: ωpk(vi, vj) is the probability that the jth

individual gets infected from the ith individual by re-sharing

the posting pk from the ith individual.

A. Deterministic Independent Cascade ModelIn this model, a node, vj , has a threshold probability

of infection, θj . A node vi is capable of activating vj by

the posting pk, if the node vi is infectious for vj and the

probability of transmission of pk from vi to vj is greater than

a threshold θj . The activation function for a specific node is:

Λpk(vi, vj) =

{1 ζpk(vi, vj) = 1 and ωpk(vi, vj) > θj0 Otherwise

(2)

B. Non-deterministic Independent Cascade ModelIn this model, a node vi is capable of activating vj by the

posting pk, if the node vi is infectious for vj and the value of

a random variable d is less than or equal to the probability of

transmission of the posting pk from vi to vj . Formally,

Λpk(vi, vj) =

{1 ζpk(vi, vj) = 1 and d < ωpk(vi, vj)0 Otherwise

(3)

C. Activation Probability of a Node in a NetworkAccording to [7], in the Independent Cascade model, when-

ever a node becomes active, each of its neighbourhoods has

a single chance to be activated in a random process with

the probability ω(vi, vj) independent of the history of other

neighbours’ trials in activating vj .As shown in Figure 1, although the probability of activa-

tion of a node from one of its neighbours in a network is

independent of trials from other neighbours, the probability

that vj becomes activated by vi in a network depends on

the probability of activation of vi times the probability of

transmission from vi to vj . In other words:

Pr(Λpk(vi, vj) = 1) = Pr(ζpk(vi, vj) = 1)× ωpk(vi, vj),(4)

1) Definition: The path pathpk(vs, vj) is called an active

path iff ∀ < vn, vm >∈ pathpk(vs, vj) : Λpk(vn, vm) = 1

then the edge < vn, vm > is called an active edge.

Vs

1

2Vi

... VjVi”

Wi’j

...

n n-1

Vi’

Wij

Wi”j3

Fig. 1: Multi-path transmission in a network

If we have an active starting node vs, the probability that any

arbitrary node vj is activated starting from vs can be measured

by finding all of the paths from vs to vj . Mathematically:

Pr(λpk(vj) = 1 | λpk(vs) = 1)

= 1−∏

∀Pa∈pathpk (vs,vj)

(1−∏

∀<vn,vm>∈Pa

ωpk(vn, vm)),

(5)

where pathpk(vs, vj) returns all of the paths starting

from node vs ending at vj . In the above equation∏∀<vn,vm>∈Pa ω

pk(vn, vm) is the probability of activation of

vj from vs through the path Pa where < vn, vm > denotes

the directed edge from vn to vm in the corresponding path.

This problem can be mapped into a network reliability

problem [8], many of which are NP-complete. Fortunately,

however, reliability calculations on acyclic graphs are often

solvable with polynomial algorithms [8].

D. Measuring Influence in a Social Network

1) Definition: The infection function δ(vs) maps every

node vs ∈ V from the set of vertices in the graph to the

expected set of nodes infected in the graph starting from vs.

The function η returns the immediate neighbourhoods infected

from vs.

δ(vs) = η(vs) ∪ (⋃

∀vl∈η(vs)

δ(vl)), (6)

The problem of measuring influence of each node in a socialnetwork is equivalent to rating nodes in the graph according to

the ability of each node to infect other nodes in the network.

In other words, influence is measured according to the ability

to cause propagation of information. Formally, a node vi is

the most influential node in the network iff:

∀vj ∈ V :| δ(vi) |≥| δ(vj) | . (7)

The above equation models influence as a recursive concept

that considers influence of a node as a function of influence

102102

Page 3: [IEEE 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012) - Istanbul (2012.08.26-2012.08.29)] 2012 IEEE/ACM International Conference on Advances

of its neighbourhoods; see [9] for a similar definition.

E. Measuring Influence in non-Deterministic Cascade Model

There is a low complexity method for measuring influence

in a deterministic cascade model [10]. The question of interest

in this paper is measuring the influence of a node considering

a random process in infecting nodes across the network. In a

random process, a piece of information pk may be transmitted

from vi to vj according to the probability ωpk(vi, vj). Accord-

ing to Equation 5 from Section III-C, in order to calculate the

probability of activation of a node, we need to calculate all

of the paths starting from a node to all the other nodes in

the graph. Knowing the activation probabilities, the expected

number of active nodes starting from the node vs (or influence)

is given by the following equation:

E(| δ(vs) |) = 1× Pr(| δ(vs) |= 1)

+ 2× Pr(| δ(vs) |= 2)

+ ...

+ n× Pr(| δ(vs) |= n).

(8)

The first part refers to the probability of activation of one

node in a network which has(n1

)terms as follows:

Pr(| δ(vs) |= 1) =∑

∀vi∈VPr(λ(vi))

∏∀vj �=vi

(1− Pr(λ(vj))).

(9)

The calculation of equations 8 and 9 is equivalent to generating

all of the subsets of a set which is non-polynomial. The goal

of this research is to estimate the value of equation 8 for all

network nodes.

IV. ALGORITHM FOR MEASURING INFLUENCE IN A

SOCIAL NETWORK

The process of activation of a series of nodes can be

represented as a sequence of transitions in a Markov chain

model [11]. The non-deterministic progressive cascade model

is represented as a sequence of transitions between 2n states

where n =| V | is the number of nodes in the network. Each

state can be represented as a permutation of a binary vector of

nodes si = {a1, a2, ..., an} in which each element represents

the activation of a node in the network. The outcome of the

next experiment is independent of the series of previous states.

A. Approximating Activation probability by Iteration

In order for a node vj to become activated, this node has to

be activated by at least one of its neighbourhoods. This fact

is demonstrated in the following equation:

Pr(λpk(vj) = 1)

= 1−∏

∀vi∈η(vj)

(1− Pr(λpk(vi) = 1)× ωpk(vi, vj)), (10)

The second expression in the above equation; i.e.,∏∀vi∈η(vj)

(1− Pr(λpk(vi) = 1)× ωpk(vi, vj)) is the proba-

bility that vj is activated by none of its neighbourhoods.

In order to measure the value of Equation 10, an algorithm

is proposed similar to the power iteration method [12]. In

this algorithm, the probability of activation of the node vj at

time t+ 1 is a function of the probability of activation of its

neighbourhoods at time t. This algorithm runs in 0 < t < MIsteps or until an error threshold, φ, is reached.

Equation 10 results in the following matrix manipulation

where ωvi,vj is the probability of transmission of the idea

from node vi to vj and the operator⊗

is defined as PRt+1j =

1−∏Ni=1(1−ωi,jPRt

i) applied to the following matrices. PRis the activation probability vector:

PRt+1 =

⎡⎢⎣

ωv1,v1 ... ωv1,vN... ...

...

ωvN ,v1 ... ωvN ,vN

⎤⎥⎦⊗

PRt (11)

B. Probability Propagation Algorithm

The above method can be formulated in Algorithm 1 called

the Probability Propagation Method. It is a conjecture that

Equation 13 converges to the probabilities of activations when

it is applied to a directed acyclic graph.

In order to create a directed acyclic graph, the method

that was used in this algorithm is running the algorithm

iteratively for each node while the outgoing edges have been

removed from the corresponding node. The outgoing edges

makes no contribution to the probability of activation for the

corresponding node.

Algorithm 1 Probability propagation algorithm for measuring

activation probability of nodes in a network; removing cycles

by ignoring out going edges for a particular node

t = 0error = φ+ 1Proc Prob-Prop(double φ, int vs.Id, int MI)

for k = 1→| V | dowhile error > φ and t < MI doPRt[vs.Id] = 1t = t+ 1for all vj ∈ V do

tmp = 1for all vi ∈ η(vj) do

if vi �= vk thentmp = tmp ∗ (1− PRt[vi.Id] ∗ ωvivj )

end ifend forPRt+1[vj .Id] = 1− tmp

end forerror =

∑vi∈V |PRt+1[vi.Id]− PRt[vi.Id]|

end whileend forreturn PRt+1

The complexity of the algorithm is O(| E | × | V | ×MI).Assuming activation independence of nodes in the graph of

the corresponding network, the expected number of infections

103103

Page 4: [IEEE 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012) - Istanbul (2012.08.26-2012.08.29)] 2012 IEEE/ACM International Conference on Advances

becomes:

E(| δ(vs) |) =∑

∀vj∈Vpvs,vj (12)

V. EXPERIMENTAL SPECIFICATION

In our social network simulation, individuals are modelled

by agents with creating content and re-sharing behaviours.

Agents have properties: trust values, post propagation rate

PPRi represented by a real number in the range of [0, 1] and

is randomly generated using a uniform random distribution; a

vector of interests, V Ii =< Ii1, ..., Iim >, containing m real

numbers between 0 and 1. Each element Ii1 represents the

degree of interest of the agent i in the topic 1 ≤ j ≤ m.

The wall of each agent is represented by the set Wi ={w1, ..., wl} containing l elements, each is a pointer to a

posting generated by the agent i itself or re-shared by the

agent i from one of its friends. The elements of the wall of

each agent i are accessible by its friend to consume. In this

simulation, consuming a posting pk by an agent vj means re-

sharing of the posting by adding a pointer to the wall of the

agent vj with the probability of ωpk(vi, vj). More formally,

Consume(pk, vi, vj) =

{ Wj = Wj ∪ {pk} if(Γpk (vi, vj) = 1)do nothing Otherwise

(13)

The trust set T ki = {T k

i1, ..., Tkin} is also constructed by

generating its elements using a Gaussian random generator

each of which represents the degree of trust of the agent vihas in its friend, agent vj on the topics related to the posting

pk.

A. Simulation of Posting Creation

In this simulation, a posting pk is represented by the vector

of relatedness, V Tk =< tk1, ..., tkm >. Each element of this

vector is generated according to the corresponding element

in the vector of the interest of the user vi. Hence, a number

−1 < dl < 1 is randomly generated as noise according to the

normal distribution and is added to the corresponding element

of the vector of relatedness. In other words, ∀tkl ∈ V Tk :

tkl =

⎧⎨⎩

Iil + dl 0 ≤ Iil + dl ≤ 10 Iil + dl < 01 Iil + dl > 1

(14)

where Iil is the lth element in the vector of interest of the

user vi.

B. Simulation of Re-Sharing of Postings

Once a node is activated, each of its neighbours gets a single

chance to propagate the posting according to the probability

ω(vs, vi). If the probability of propagation exceeds the value

of random variable 0 ≤ d ≤ 1, the post will be added to the

wall of the target agent vi and the agent will be infected and

activated as well.

Λpk(vs, vi) =

{1 d ≤ ωpk(vs, vi)0 Otherwise

(15)

C. Probability of Re-sharing of a Posting

The probability of propagation of a posting from the user

vi to vj is a function of the similarity between the vector

of interests of agent vj and the vector of related topics of

posting pk, the laziness of the target agent vj and the trust

value between the two agents. For the experimental evaluation

reported in this paper, we have:

ωpk(vi, vj) = Ψ(vj , pk)× τpk(vj , vi)× Γ(vj). (16)

where Ψ(vj , pk) = Correl(vj .V I, pk.V T ).

D. Experimental Evaluation

Algorithm 1 was tested on Epinion, Facebook [13], [14],

random graph, small-world, scale free networks with 10, 100

and 1000 nodes. Table I provides details related to these

datasets. For each topology, one instance of the network

was created using sampling of 10, 100 and 1000 nodes and

selecting the corresponding edges for the sampled nodes. In

the simulation, 1 posting was generated by one of the nodes

randomly, and that posting was tested on all the nodes to

measure the number of infected nodes for 1000 iterations.

Prpk(λ(vi) | λ(vs)) =#of times the node vi gets infected

# of Iterations.

(17)

Dataset of of Type DescriptionNodes Edges

Epinion 75,879 508,837 Directed Who-trusts-whom

of Epinions.com

Facebook 63,891 274,076 Directed Friendship

TABLE I: Properties of Epinion and Facebook datasets

The properties of, and results for, the simulated networks

are shown in Table II. Please note that the threshold φ was

set to 0.001 in all of the experiments. The results of statistical

analysis of these simulations are shown in Table III. Spearman

and Pearson correlation coefficients were used to measure the

correlation between the anticipated number of nodes infected

in the network and the results of simulation on different

networks. In fact, the expected number of nodes calculated by

the proposed algorithm is a rank which sorts nodes according

to their capability of infecting higher numbers of nodes in a

network.

Figure 2 shows the correlation between the results of

prediction of probability propagation algorithm and simulation

of propagation of a posting in the simulated social network

using the Epinion dataset.

VI. FUTURE WORK AND CONCLUSIONS

This paper proposes a formal model of a class of social

network and uses it to characterize the problem of measuring

104104

Page 5: [IEEE 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012) - Istanbul (2012.08.26-2012.08.29)] 2012 IEEE/ACM International Conference on Advances

Dataset of Max, Avg Max,Avg Edges of infected of Hops

Epinion-10 13 5, 2.0 3, 0.69

Facebook-10 6 5, 1.75 3, 0.5

Random Graph-10 5 4, 2.0 2, 0.8

Small World-10 26 6, 2.19 5, 0.99

Scale Free-10 10 4, 1.72 2, 0.51

Epinion-100 1359 29, 4.2 15, 1.75

Facebook-100 63 11, 1.97 6, 0.73

Random Graph-100 239 25, 3.23 14, 1.43

Small World-100 300 20, 2.19 10, 0.93

Scale Free-100 100 14, 2.24 4, 0.72

Epinion-1000 29509 275, 21.39 139, 5.24

Facebook-1000 2184 83, 9.64 37, 2.96

Random Graph-1000 10022 299, 4.81 96, 1.96

Small World-1000 2882 66, 3.1 27, 1.3

Scale Free-1000 1000 42, 1.85 7, 0.51

TABLE II: Statistics about the maximum and average number

and level of propagations and hops to which a posting

penetrated

Dataset 10 100 1000Epinion 0.98, 0.98 0.97, 0.96 0.86, 0.85

Facebook 0.98, 0.99 0.98, 0.89 0.91, 0.73

Scale-free 0.98, 0.99 0.96, 0.90 0.91, 0.73

Random Graph 1.00, 1.00 0.97, 0.94 0.97, 0.96

Small-World 0.95, 0.99 0.98, 0.98 0.86, 0.37

TABLE III: Correlation between simulation and proposed

algorithm in predicting number of infected nodes using

Spearman’s correlation and Pearson correlation

influence in that network. Two diffusion models: one determin-

istic and one non-deterministic are proposed as behavioural

processes useful in characterizing influence.

Section 4 provided an algorithm to measure the probability

of infection of nodes in the network according to a method

called Probability Propagation. This method uses the notion of

a Markov chain process to describe the process of activation of

nodes in a network. This algorithm consists of a combination

of two concepts: Markov chain theory and eigenvector cen-

trality. It is observed that the proposed method can estimate

the number of activated nodes in a network given the starting

point for propagation with a high level of accuracy. Clearly,

further work must be undertaken using data from actual social

networks such as Facebook or Twitter.

REFERENCES

[1] J. Herlocker, J. Konstan, L. Terveen, and J. Riedl, “Evaluating collabora-tive filtering recommender systems,” ACM Transactions on InformationSystems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[2] J. Golbeck, “Generating predictive movie recommendations from trustin social networks,” Trust Management, pp. 93–104, 2006.

[3] R. Iyengar, S. Han, and S. Gupta, “Do friends influence purchases ina social network?” Harvard Business School Marketing Unit WorkingPaper No. 09-123, 2009.

[4] M. Cha, H. Haddadi, F. Benevenuto, and K. Gummadi, “Measuring userinfluence in twitter: The million follower fallacy,” in 4th InternationalAAAI Conference on Weblogs and Social Media (ICWSM), 2010.

[5] A. Khrabrov and G. Cybenko, “Discovering influence in communicationnetworks using dynamic graph analysis,” in Social Computing (Social-Com), 2010 IEEE Second International Conference on. IEEE, 2010,pp. 288–294.

[6] A. Afrasiabi Rad and M. Benyoucef, “Towards detecting influential usersin social networks,” E-Technologies: Transformation in a ConnectedWorld, pp. 227–240, 2011.

[7] D. Kempe, J. Kleinberg, and E. Tardos, “Maximizing the spread ofinfluence through a social network,” in Proceedings of the ninth ACMSIGKDD international conference on Knowledge discovery and datamining. ACM, 2003, pp. 137–146.

[8] M. Ball, “Computational complexity of network reliability analysis: Anoverview,” Reliability, IEEE Transactions on, vol. 35, no. 3, pp. 230–239, 1986.

[9] B. Hajian and T. White, “Modelling influence in a social network:Metrics and evaluation,” in Social Computing (SocialCom), 2011 IEEEThird International Conference on. IEEE, 2011, pp. 498–500.

[10] B. Hajian, “On measuring influence and its properties in social net-works,” Master’s thesis, Carleton University School of Computer Sci-ence, 2012.

[11] L. Tierney, “Introduction to general state-space markov chain theory,”Markov chain Monte Carlo in practice, pp. 59–74, 1996.

[12] I. Ipsen and R. Wills, “Analysis and computation of google’s pagerank,”in 7th IMACS international symposium on iterative methods in scientificcomputing, Fields Institute, Toronto, Canada, vol. 5, no. 8, 2005.

[13] M. Gomez Rodriguez, J. Leskovec, and A. Krause, “Inferring networksof diffusion and influence,” in Proceedings of the 16th ACM SIGKDDinternational conference on Knowledge discovery and data mining.ACM, 2010, pp. 1019–1028.

[14] J. Leskovec, “Stanford large network dataset collection,http://snap.stanford.edu/data/,” November.

105105