[ieee 2012 international conference on advances in social networks analysis and mining (asonam 2012)...
On Measurement of Influence in Social Networks
Behnam Hajian
School of Computer Science
Carleton University, Ottawa, Canada
Email: [email protected]
Tony White
School of Computer Science
Carleton University, Ottawa, Canada
Email: [email protected]
Abstract: One of the issues to be resolved in social recommender systems is the identification of opinion leaders in a network. Finding effective people in societies has been a key question for many groups; e.g., marketers. The research undertaken in this paper focuses on finding important nodes in a network based on their behaviour as well as the structure of the network. This paper views the propagation of information in a social network as a process of infection. The paper proposes an algorithm called the Probability Propagation Method for measuring the probability of infection of all the nodes in a network starting from a given node in the network. Then, assuming independence in activation of nodes in a network, a method is proposed for ranking nodes according to their capabilities in infecting a larger number of nodes in a network. These methods are validated using simulation software in which a non-deterministic model of information diffusion is simulated on several classes of network.
I. INTRODUCTION
On-line recommender systems are often not valued because
their users feel suggestions from these systems are useless [1].
It has been shown that the main reason for this feeling is that
people prefer to get recommendations from people they trust
rather than automated advertising software [2].
According to Iyengar et al. [3], friends have a significant
effect on the purchase decisions of a customer. It has been
observed that people are infected by a new idea from being
exposed to the innovation by their friends, rather than
advertisements. This motivates marketers to turn to viral marketing
rather than ad-hoc advertising. It is clear that influence varies
from person to person in a social network and identifying
the most influential people can significantly affect the
effectiveness of an advertising campaign. It is this problem that
motivates the research reported in this paper.
This paper focuses on how to rank nodes (people) according
to the number of other nodes each is able to infect once it has
itself been infected. This problem is referred to in this paper as the
influence measurement problem because influence is modelled
as the ability of a node to infect other nodes in the network.
The remainder of this paper is organized in five sections.
Section two describes related work. In Section three, a model
of information diffusion is proposed in the context of a social
network. Section four describes algorithms for measuring
influence in a social network. In order to validate the proposed
algorithms, a general framework of a social network is simulated.
Section five discusses the properties of our simulations,
experimental results and includes a discussion. Finally, the
paper is summarized in Section six.
II. RELATED WORK
There are several methods for measuring influence which
concentrate on the structure of the network rather than the
behavior of nodes and their interactions. However, researchers
have also considered other aspects, such as the analysis of
users’ behavior in a network. These are described in the
following paragraphs.
Meeyoung Cha et al. [4] studied the measurement of influ-
ence in Twitter based on the following three metrics:
• In-degree: number of followers of a user. This represents
the popularity of a user.
• Retweets: the number of times that followers of a specific
node pass-along a posting from a tweeter. Retweeting
causes propagation of a posting or news in a network.
• Mentions: the number of times that the name of a user
is mentioned in his followers’ postings.
Cha et al. [4] used Spearman's rank correlation
coefficient for comparing users' influence. The study compared
the three measures above and analyzed the topics of the most
influential Twitter users identified by them. Cha et al. also
found that influence is not gained accidentally but requires
users to be consistently active in the network, an
observation corroborated by Khrabrov and Cybenko [5].
Afrasiabi and Benyoucef [6] studied node influence as a
function of link strength and incoming and outgoing clustering
value. Link strength is measured according to the volume
of interactions among users while the clustering value is
measured by the closeness of a node to highly interconnected
communities.
Kempe et al. [7] modelled influence using two diffusion models: Linear Threshold and Independent Cascade. The former model lies at the core of most subsequent generalizations. In the Linear Threshold model, a node $v$ is influenced by each neighbour $w$ with a weight $b_{v,w}$ such that $\sum_{w \text{ neighbour of } v} b_{v,w} \le 1$. In order for $v$ to become active, the sum of the weights of its active neighbours should be greater than a given threshold $\theta_v$. The Independent Cascade model is based on interacting particle systems from probability theory in which a node $u$ may become active at time $t+1$ according to the probability $p_{u,v}$ if $v$ had become active at time $t$.

Khrabrov and Cybenko studied Dynamic Graph Analysis [5] in which the number of daily mentions for each user is considered as an indicator for computing different ranks such as PageRank, drank, and starrank for influence analysis of each node in a network. For example, starrank considers user importance with respect to his or her neighbourhood. These researchers used several primitive indices in combination and showed how influence rank changes with time.

2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
978-0-7695-4799-2/12 $26.00 © 2012 IEEE
DOI 10.1109/ASONAM.2012.27
III. MODELLING INFORMATION DIFFUSION
1) Definition: A node $v_i$ is called infected according to the posting $p_k \in P$ if the posting $p_k$ belongs to the wall of the node $v_i$, but it is called active if it is capable of infecting its neighbours.
\[
\lambda^{p_k}(v_i) =
\begin{cases}
1 & p_k \in W_i \\
0 & \text{otherwise}
\end{cases}
\tag{1}
\]
where $W_i$ is the wall of the node $v_i$.
2) Definition: $\zeta^{p_k}(v_i, v_j)$ is a function mapping each pair of nodes $(v_i, v_j)$ to 1 if the node $v_i$ is infectious to its neighbour $v_j$; otherwise its value is equal to 0.
3) Definition: $\omega^{p_k}(v_i, v_j)$ is the probability that the $j$th individual gets infected from the $i$th individual by re-sharing the posting $p_k$ from the $i$th individual.
A. Deterministic Independent Cascade Model
In this model, a node $v_j$ has a threshold probability of infection, $\theta_j$. A node $v_i$ is capable of activating $v_j$ by the posting $p_k$ if the node $v_i$ is infectious for $v_j$ and the probability of transmission of $p_k$ from $v_i$ to $v_j$ is greater than the threshold $\theta_j$. The activation function for a specific node is:
\[
\Lambda^{p_k}(v_i, v_j) =
\begin{cases}
1 & \zeta^{p_k}(v_i, v_j) = 1 \text{ and } \omega^{p_k}(v_i, v_j) > \theta_j \\
0 & \text{otherwise}
\end{cases}
\tag{2}
\]
B. Non-deterministic Independent Cascade Model
In this model, a node $v_i$ is capable of activating $v_j$ by the posting $p_k$ if the node $v_i$ is infectious for $v_j$ and the value of a random variable $d$ is less than or equal to the probability of transmission of the posting $p_k$ from $v_i$ to $v_j$. Formally,
\[
\Lambda^{p_k}(v_i, v_j) =
\begin{cases}
1 & \zeta^{p_k}(v_i, v_j) = 1 \text{ and } d \le \omega^{p_k}(v_i, v_j) \\
0 & \text{otherwise}
\end{cases}
\tag{3}
\]
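The two activation rules can be contrasted in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the random variable d is assumed to be drawn uniformly from [0, 1), following the prose statement that activation occurs when d is less than or equal to the transmission probability.

```python
import random

def activates_deterministic(infectious: bool, omega: float, theta_j: float) -> int:
    """Equation 2: v_j is activated when v_i is infectious to it and omega exceeds theta_j."""
    return 1 if infectious and omega > theta_j else 0

def activates_nondeterministic(infectious: bool, omega: float, rng: random.Random) -> int:
    """Equation 3: v_j is activated when a random draw d does not exceed omega."""
    d = rng.random()  # assumption: d is uniform on [0, 1)
    return 1 if infectious and d <= omega else 0

rng = random.Random(42)
trials = 10000
# Empirical activation frequency for a transmission probability of 0.3.
frequency = sum(activates_nondeterministic(True, 0.3, rng) for _ in range(trials)) / trials
```

With the deterministic rule the same inputs always give the same outcome, whereas the non-deterministic rule activates the neighbour in roughly a fraction omega of the trials.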
C. Activation Probability of a Node in a Network
According to [7], in the Independent Cascade model, whenever a node becomes active, each of its neighbours has a single chance to be activated in a random process with the probability $\omega(v_i, v_j)$, independent of the history of other neighbours' trials in activating $v_j$.

As shown in Figure 1, although the probability of activation of a node from one of its neighbours in a network is independent of trials from other neighbours, the probability that $v_j$ becomes activated by $v_i$ in a network depends on the probability of activation of $v_i$ times the probability of transmission from $v_i$ to $v_j$. In other words:
\[
\Pr(\Lambda^{p_k}(v_i, v_j) = 1) = \Pr(\zeta^{p_k}(v_i, v_j) = 1) \times \omega^{p_k}(v_i, v_j)
\tag{4}
\]
1) Definition: An edge $\langle v_n, v_m \rangle$ is called an active edge if $\Lambda^{p_k}(v_n, v_m) = 1$. The path $path^{p_k}(v_s, v_j)$ is called an active path iff every edge on it is active, i.e., $\forall \langle v_n, v_m \rangle \in path^{p_k}(v_s, v_j) : \Lambda^{p_k}(v_n, v_m) = 1$.
Fig. 1: Multi-path transmission in a network
If we have an active starting node $v_s$, the probability that any arbitrary node $v_j$ is activated starting from $v_s$ can be measured by finding all of the paths from $v_s$ to $v_j$. Mathematically:
\[
\Pr(\lambda^{p_k}(v_j) = 1 \mid \lambda^{p_k}(v_s) = 1)
= 1 - \prod_{Pa \in path^{p_k}(v_s, v_j)} \Bigl( 1 - \prod_{\langle v_n, v_m \rangle \in Pa} \omega^{p_k}(v_n, v_m) \Bigr)
\tag{5}
\]
where $path^{p_k}(v_s, v_j)$ returns all of the paths starting from node $v_s$ and ending at $v_j$. In the above equation, $\prod_{\langle v_n, v_m \rangle \in Pa} \omega^{p_k}(v_n, v_m)$ is the probability of activation of $v_j$ from $v_s$ through the path $Pa$, where $\langle v_n, v_m \rangle$ denotes the directed edge from $v_n$ to $v_m$ in the corresponding path.
This problem can be mapped into a network reliability
problem [8], many of which are NP-complete. Fortunately,
however, reliability calculations on acyclic graphs are often
solvable with polynomial algorithms [8].
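To make Equation 5 concrete, the sketch below enumerates all simple paths in a small acyclic graph and combines their probabilities. The graph, its edge probabilities, and the function names are hypothetical illustrations, not taken from the experiments; path enumeration is exponential in general and is used here only because the example is tiny.

```python
from math import prod

def simple_paths(graph, source, target, visited=None):
    """Yield every simple path from source to target as a list of directed edges."""
    visited = (visited or set()) | {source}
    if source == target:
        yield []
        return
    for nxt in graph.get(source, {}):
        if nxt not in visited:
            for rest in simple_paths(graph, nxt, target, visited):
                yield [(source, nxt)] + rest

def activation_probability(graph, source, target):
    """Equation 5: 1 - product over paths of (1 - product of edge probabilities)."""
    miss = 1.0
    for path in simple_paths(graph, source, target):
        miss *= 1.0 - prod(graph[u][v] for u, v in path)
    return 1.0 - miss

# Hypothetical example: s reaches j directly (0.2) and through i (0.5 * 0.4).
graph = {"s": {"i": 0.5, "j": 0.2}, "i": {"j": 0.4}}
p_j = activation_probability(graph, "s", "j")  # 1 - (1 - 0.2) * (1 - 0.2) = 0.36
```

In this example the two paths share no edge, so treating them as independent, as Equation 5 does, introduces no error.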
D. Measuring Influence in a Social Network
1) Definition: The infection function $\delta(v_s)$ maps every node $v_s \in V$ from the set of vertices in the graph to the expected set of nodes infected in the graph starting from $v_s$. The function $\eta$ returns the immediate neighbours infected from $v_s$.
\[
\delta(v_s) = \eta(v_s) \cup \Bigl( \bigcup_{v_l \in \eta(v_s)} \delta(v_l) \Bigr)
\tag{6}
\]
The problem of measuring the influence of each node in a social network is equivalent to rating nodes in the graph according to the ability of each node to infect other nodes in the network. In other words, influence is measured according to the ability to cause propagation of information. Formally, a node $v_i$ is the most influential node in the network iff:
\[
\forall v_j \in V : |\delta(v_i)| \ge |\delta(v_j)|
\tag{7}
\]
Equation 6 models influence as a recursive concept that considers the influence of a node as a function of the influence of its neighbours; see [9] for a similar definition.
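The recursion of Equation 6 and the ranking criterion of Equation 7 can be sketched as follows for one realised outcome of a cascade. The dictionary `eta` is a hypothetical example mapping each node to the neighbours it directly infected; the recursion is unrolled into a worklist so that cycles do not cause infinite recursion.

```python
def infected_set(eta, start):
    """Equation 6, unrolled: everything reachable from start by following eta."""
    reached, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for nbr in eta.get(node, ()):
            if nbr not in reached:
                reached.add(nbr)
                frontier.append(nbr)
    return reached

def most_influential(eta, nodes):
    """Equation 7: a node whose infected set is at least as large as every other node's."""
    return max(nodes, key=lambda v: len(infected_set(eta, v)))

# Hypothetical outcome: a infected b and c, which both infected d.
eta = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
best = most_influential(eta, ["a", "b", "c", "d"])
```

Here node a reaches three nodes, more than any other node, so it is returned as the most influential.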
E. Measuring Influence in non-Deterministic Cascade Model
There is a low complexity method for measuring influence
in a deterministic cascade model [10]. The question of interest
in this paper is measuring the influence of a node considering
a random process in infecting nodes across the network. In a
random process, a piece of information pk may be transmitted
from vi to vj according to the probability ωpk(vi, vj). Accord-
ing to Equation 5 from Section III-C, in order to calculate the
probability of activation of a node, we need to calculate all
of the paths starting from a node to all the other nodes in
the graph. Knowing the activation probabilities, the expected
number of active nodes starting from the node vs (or influence)
is given by the following equation:
\[
E(|\delta(v_s)|) = 1 \times \Pr(|\delta(v_s)| = 1) + 2 \times \Pr(|\delta(v_s)| = 2) + \dots + n \times \Pr(|\delta(v_s)| = n)
\tag{8}
\]
The first term involves the probability that exactly one node in the network is activated, which has $\binom{n}{1}$ terms as follows:
\[
\Pr(|\delta(v_s)| = 1) = \sum_{v_i \in V} \Pr(\lambda(v_i)) \prod_{v_j \ne v_i} \bigl(1 - \Pr(\lambda(v_j))\bigr)
\tag{9}
\]
The calculation of Equations 8 and 9 is equivalent to generating all of the subsets of a set, which cannot be done in polynomial time. The goal of this research is to estimate the value of Equation 8 for all network nodes.
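The cost of evaluating Equation 8 directly can be seen in a small sketch that enumerates every subset of active nodes. It assumes the nodes activate independently (the assumption the paper adopts for Equation 12); the probability values are made up for illustration.

```python
from itertools import combinations
from math import prod

def expected_size_bruteforce(probs):
    """Equation 8 by enumeration: sum over k of k * Pr(exactly k nodes are active)."""
    n = len(probs)
    total = 0.0
    for k in range(1, n + 1):
        for active in combinations(range(n), k):
            chosen = set(active)
            # Probability that exactly the chosen nodes are active (independence assumed).
            pr = prod(p if i in chosen else 1.0 - p for i, p in enumerate(probs))
            total += k * pr
    return total

probs = [0.9, 0.5, 0.2]                   # hypothetical activation probabilities
brute = expected_size_bruteforce(probs)   # visits all 2^n - 1 non-empty subsets
linear = sum(probs)                       # sum of the individual probabilities
```

Both values agree (1.6 for this example), which is why summing the individual activation probabilities, as in Equation 12, avoids the exponential enumeration.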
IV. ALGORITHM FOR MEASURING INFLUENCE IN A
SOCIAL NETWORK
The process of activation of a series of nodes can be represented as a sequence of transitions in a Markov chain model [11]. The non-deterministic progressive cascade model is represented as a sequence of transitions between $2^n$ states, where $n = |V|$ is the number of nodes in the network. Each state can be represented as a binary vector $s_i = (a_1, a_2, ..., a_n)$ in which each element represents the activation of a node in the network. Given the current state, the outcome of the next transition is independent of the series of previous states.
A. Approximating Activation probability by Iteration
In order for a node $v_j$ to become activated, this node has to be activated by at least one of its neighbours. This fact is expressed in the following equation:
\[
\Pr(\lambda^{p_k}(v_j) = 1) = 1 - \prod_{v_i \in \eta(v_j)} \bigl(1 - \Pr(\lambda^{p_k}(v_i) = 1) \times \omega^{p_k}(v_i, v_j)\bigr)
\tag{10}
\]
The second expression in the above equation, i.e., $\prod_{v_i \in \eta(v_j)} \bigl(1 - \Pr(\lambda^{p_k}(v_i) = 1) \times \omega^{p_k}(v_i, v_j)\bigr)$, is the probability that $v_j$ is activated by none of its neighbours.
In order to measure the value of Equation 10, an algorithm is proposed similar to the power iteration method [12]. In this algorithm, the probability of activation of the node $v_j$ at time $t+1$ is a function of the probability of activation of its neighbours at time $t$. This algorithm runs for $0 < t < MI$ steps, where $MI$ bounds the number of iterations, or until an error threshold, $\phi$, is reached.
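Equation 10 results in the following matrix manipulation, where $\omega_{v_i,v_j}$ is the probability of transmission of the idea from node $v_i$ to $v_j$, $PR$ is the activation probability vector, and the operator $\otimes$ is defined as $PR_j^{t+1} = 1 - \prod_{i=1}^{N} (1 - \omega_{i,j} PR_i^t)$ applied to the following matrices:
\[
PR^{t+1} =
\begin{bmatrix}
\omega_{v_1,v_1} & \cdots & \omega_{v_1,v_N} \\
\vdots & \ddots & \vdots \\
\omega_{v_N,v_1} & \cdots & \omega_{v_N,v_N}
\end{bmatrix}
\otimes PR^t
\tag{11}
\]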
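A single application of the operator in Equation 11 can be written directly from its definition. The sketch below uses plain Python lists and a hypothetical three-node transmission matrix in which `omega[i][j]` is the probability of transmission from node i to node j.

```python
def propagate_step(omega, pr):
    """One application of the operator: PR_j(t+1) = 1 - prod_i (1 - omega[i][j] * PR_i(t))."""
    n = len(pr)
    nxt = []
    for j in range(n):
        miss = 1.0  # probability that no node i activates node j
        for i in range(n):
            miss *= 1.0 - omega[i][j] * pr[i]
        nxt.append(1.0 - miss)
    return nxt

# Hypothetical acyclic example: 0 -> 1 (0.5), 0 -> 2 (0.2), 1 -> 2 (0.4).
omega = [[0.0, 0.5, 0.2],
         [0.0, 0.0, 0.4],
         [0.0, 0.0, 0.0]]
pr0 = [1.0, 0.0, 0.0]             # only node 0 is active at t = 0
pr1 = propagate_step(omega, pr0)  # approximately [0.0, 0.5, 0.2]
pr1[0] = 1.0                      # keep the starting node active between steps
pr2 = propagate_step(omega, pr1)  # node 2 rises to about 0.36
```

Because the starting node has no incoming edges, the operator alone would reset its probability to zero, so it is pinned back to 1 between steps.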
B. Probability Propagation Algorithm
The above method can be formulated as Algorithm 1, called
the Probability Propagation Method. It is a conjecture that
the iteration of Equation 11 converges to the probabilities of
activation when it is applied to a directed acyclic graph.
In order to obtain a directed acyclic graph, the algorithm is
run iteratively, once for each node, with the outgoing edges of
the corresponding node ignored. The outgoing edges
make no contribution to the probability of activation for the
corresponding node.
Algorithm 1 Probability propagation algorithm for measuring
activation probability of nodes in a network; removing cycles
by ignoring outgoing edges for a particular node
    t = 0
    error = φ + 1
    Proc Prob-Prop(double φ, int vs.Id, int MI)
    for k = 1 → |V| do
        while error > φ and t < MI do
            PR^t[vs.Id] = 1
            t = t + 1
            for all vj ∈ V do
                tmp = 1
                for all vi ∈ η(vj) do
                    if vi ≠ vk then
                        tmp = tmp * (1 − PR^t[vi.Id] * ω(vi, vj))
                    end if
                end for
                PR^(t+1)[vj.Id] = 1 − tmp
            end for
            error = Σ_{vi ∈ V} |PR^(t+1)[vi.Id] − PR^t[vi.Id]|
        end while
    end for
    return PR^(t+1)
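The complexity of the algorithm is $O(|E| \times |V| \times MI)$.
Assuming activation independence of nodes in the graph of the corresponding network, the expected number of infections becomes:
\[
E(|\delta(v_s)|) = \sum_{v_j \in V} p_{v_s, v_j}
\tag{12}
\]
where $p_{v_s, v_j}$ denotes the probability of activation of $v_j$ when the propagation starts from $v_s$.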
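The following Python sketch shows one way the iteration and the influence score of Equation 12 might be implemented. It is a simplification, not the authors' implementation: it assumes the input graph is already acyclic, so the outer loop of Algorithm 1 that ignores outgoing edges is omitted, and it uses a dense matrix, which costs O(|V|² × MI) rather than the stated complexity. The function names and the example matrix are hypothetical.

```python
def prob_prop(omega, source, phi=0.001, max_iter=100):
    """Iterate Equation 10 from an active source until the change drops to phi.

    omega[i][j] is the transmission probability from node i to node j.
    Assumption: the graph is acyclic, so no cycle-removal step is performed.
    """
    n = len(omega)
    pr = [0.0] * n
    pr[source] = 1.0
    for _ in range(max_iter):
        nxt = []
        for j in range(n):
            miss = 1.0
            for i in range(n):
                miss *= 1.0 - pr[i] * omega[i][j]
            nxt.append(1.0 - miss)
        nxt[source] = 1.0  # the starting node stays active
        error = sum(abs(a - b) for a, b in zip(nxt, pr))
        pr = nxt
        if error <= phi:
            break
    return pr

def expected_infections(omega, source, **kwargs):
    """Equation 12: sum of activation probabilities (the source itself counts as 1)."""
    return sum(prob_prop(omega, source, **kwargs))

# Hypothetical acyclic example: 0 -> 1 (0.5), 0 -> 2 (0.2), 1 -> 2 (0.4).
omega = [[0.0, 0.5, 0.2],
         [0.0, 0.0, 0.4],
         [0.0, 0.0, 0.0]]
scores = [expected_infections(omega, s) for s in range(3)]
ranking = sorted(range(3), key=lambda s: scores[s], reverse=True)
```

Sorting nodes by this score gives the influence ranking; in the example node 0 ranks first because it can reach both other nodes.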
V. EXPERIMENTAL SPECIFICATION
In our social network simulation, individuals are modelled by agents that create content and re-share it. Each agent $i$ has the following properties: trust values; a post propagation rate $PPR_i$, a real number in the range $[0, 1]$ that is randomly generated using a uniform random distribution; and a vector of interests, $VI_i = \langle I_{i1}, ..., I_{im} \rangle$, containing $m$ real numbers between 0 and 1. Each element $I_{ij}$ represents the degree of interest of the agent $i$ in the topic $j$, $1 \le j \le m$.
The wall of each agent is represented by the set $W_i = \{w_1, ..., w_l\}$ containing $l$ elements, each of which is a pointer to a posting generated by the agent $i$ itself or re-shared by the agent $i$ from one of its friends. The elements of the wall of each agent $i$ are accessible to its friends to consume. In this simulation, consuming a posting $p_k$ by an agent $v_j$ means re-sharing the posting by adding a pointer to the wall of the agent $v_j$ with the probability $\omega^{p_k}(v_i, v_j)$. More formally,
\[
Consume(p_k, v_i, v_j) =
\begin{cases}
W_j = W_j \cup \{p_k\} & \text{if } \Gamma^{p_k}(v_i, v_j) = 1 \\
\text{do nothing} & \text{otherwise}
\end{cases}
\tag{13}
\]
The trust set $T_i^k = \{T_{i1}^k, ..., T_{in}^k\}$ is also constructed by generating its elements using a Gaussian random generator; each element $T_{ij}^k$ represents the degree of trust that the agent $v_i$ has in its friend, agent $v_j$, on the topics related to the posting $p_k$.
A. Simulation of Posting Creation
In this simulation, a posting $p_k$ is represented by the vector of relatedness, $VT_k = \langle t_{k1}, ..., t_{km} \rangle$. Each element of this vector is generated according to the corresponding element in the vector of interests of the user $v_i$. To this end, a number $-1 < d_l < 1$ is randomly generated as noise according to the normal distribution and is added to the corresponding element of the vector of interests. In other words, $\forall t_{kl} \in VT_k$:
\[
t_{kl} =
\begin{cases}
I_{il} + d_l & 0 \le I_{il} + d_l \le 1 \\
0 & I_{il} + d_l < 0 \\
1 & I_{il} + d_l > 1
\end{cases}
\tag{14}
\]
where $I_{il}$ is the $l$th element in the vector of interests of the user $v_i$.
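Equation 14 amounts to adding noise to each interest value and clipping the result to [0, 1]. A minimal sketch follows; the noise scale `sigma` and the redraw of out-of-range samples are assumptions, since the text only states that the noise is normally distributed and lies strictly between -1 and 1.

```python
import random

def create_posting(interests, rng, sigma=0.2):
    """Equation 14: relatedness = interest + noise, clipped to the range [0, 1].

    sigma is a hypothetical noise scale that the source does not specify.
    """
    relatedness = []
    for interest in interests:
        d = rng.gauss(0.0, sigma)
        while not -1.0 < d < 1.0:  # redraw the rare sample outside (-1, 1)
            d = rng.gauss(0.0, sigma)
        relatedness.append(min(1.0, max(0.0, interest + d)))
    return relatedness

rng = random.Random(1)
posting = create_posting([0.0, 0.5, 1.0], rng)
```

Postings generated this way stay close to the interests of their creator, with the amount of drift controlled by the noise scale.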
B. Simulation of Re-Sharing of Postings
Once a node is activated, each of its neighbours gets a single chance to re-share the posting according to the probability $\omega(v_s, v_i)$. If the probability of propagation is at least the value of a random variable $0 \le d \le 1$, the post will be added to the wall of the target agent $v_i$ and the agent will be infected and activated as well.
\[
\Lambda^{p_k}(v_s, v_i) =
\begin{cases}
1 & d \le \omega^{p_k}(v_s, v_i) \\
0 & \text{otherwise}
\end{cases}
\tag{15}
\]
C. Probability of Re-sharing of a Posting
The probability of propagation of a posting from the user $v_i$ to $v_j$ is a function of the similarity between the vector of interests of agent $v_j$ and the vector of related topics of posting $p_k$, the laziness of the target agent $v_j$, and the trust value between the two agents. For the experimental evaluation reported in this paper, we have:
\[
\omega^{p_k}(v_i, v_j) = \Psi(v_j, p_k) \times \tau^{p_k}(v_j, v_i) \times \Gamma(v_j)
\tag{16}
\]
where $\Psi(v_j, p_k) = Correl(v_j.VI, p_k.VT)$ is the similarity term, $\tau^{p_k}(v_j, v_i)$ is the trust term, and $\Gamma(v_j)$ is the laziness term.
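A possible reading of Equation 16 in code is given below. Two points are assumptions rather than statements from the paper: Correl is taken to be the Pearson correlation, and the product is clamped to [0, 1] so that a negative correlation yields a probability of zero. All names are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def resharing_probability(interests_j, relatedness_k, trust_ji, laziness_j):
    """Equation 16: omega = Psi * tau * Gamma; clamping to [0, 1] is an assumption."""
    psi = pearson(interests_j, relatedness_k)
    return min(1.0, max(0.0, psi * trust_ji * laziness_j))

# A posting perfectly aligned with the reader's interests, trust 0.8, laziness term 0.5.
omega_aligned = resharing_probability([0.1, 0.5, 0.9], [0.2, 0.6, 1.0], 0.8, 0.5)
# A posting anti-correlated with the reader's interests is never re-shared here.
omega_opposed = resharing_probability([0.1, 0.5, 0.9], [1.0, 0.6, 0.2], 0.8, 0.5)
```

With a perfectly correlated posting the probability reduces to the product of the trust and laziness terms, here 0.4.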
D. Experimental Evaluation
Algorithm 1 was tested on the Epinion and Facebook datasets [13], [14] and on random graph, small-world, and scale-free networks with 10, 100 and 1000 nodes. Table I provides details related to the two datasets. For each topology, one instance of the network was created for each size by sampling 10, 100 and 1000 nodes and selecting the corresponding edges for the sampled nodes. In the simulation, one posting was generated by a randomly chosen node, and the propagation of that posting was simulated for 1000 iterations to measure how often each node was infected:
\[
\Pr^{p_k}(\lambda(v_i) \mid \lambda(v_s)) = \frac{\#\text{ of times the node } v_i \text{ gets infected}}{\#\text{ of iterations}}
\tag{17}
\]
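The estimate in Equation 17 can be reproduced with a short Monte Carlo simulation of the non-deterministic cascade. The sketch below is illustrative only: the three-node graph, its edge probabilities, the seed, and the function names are hypothetical.

```python
import random

def simulate_cascade(omega, source, rng):
    """One run of the non-deterministic cascade; returns the set of infected nodes."""
    infected, frontier = {source}, [source]
    while frontier:
        u = frontier.pop()
        for v, p in omega.get(u, {}).items():
            # Each newly active node gets a single try on each of its neighbours.
            if v not in infected and rng.random() <= p:
                infected.add(v)
                frontier.append(v)
    return infected

def estimate_infection_probability(omega, source, runs, seed=0):
    """Equation 17: fraction of runs in which each node ends up infected."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(runs):
        for v in simulate_cascade(omega, source, rng):
            counts[v] = counts.get(v, 0) + 1
    return {v: c / runs for v, c in counts.items()}

# Hypothetical example: s reaches j directly (0.2) and through i (0.5 * 0.4).
omega = {"s": {"i": 0.5, "j": 0.2}, "i": {"j": 0.4}}
estimate = estimate_infection_probability(omega, "s", runs=20000)
```

For this graph the exact infection probabilities are 0.5 for i and 0.36 for j, and the estimates approach these values as the number of runs grows.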
Dataset    # of Nodes   # of Edges   Type       Description
Epinion    75,879       508,837      Directed   Who-trusts-whom of Epinions.com
Facebook   63,891       274,076      Directed   Friendship
TABLE I: Properties of Epinion and Facebook datasets
The properties of, and results for, the simulated networks
are shown in Table II. Please note that the threshold φ was
set to 0.001 in all of the experiments. The results of statistical
analysis of these simulations are shown in Table III. Spearman
and Pearson correlation coefficients were used to measure the
correlation between the anticipated number of nodes infected
in the network and the results of simulation on different
networks. The expected number of infected nodes calculated by
the proposed algorithm serves as a score that ranks nodes according
to their capability of infecting a larger number of nodes in a
network.
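The comparison between predicted and simulated influence can be sketched with the two coefficients named above, implemented here in plain Python. The `predicted` and `simulated` values are made-up illustrations, not results from the paper.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def spearman(xs, ys):
    """Spearman's coefficient: the Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

predicted = [1.86, 1.40, 1.00, 1.10]  # made-up expected infections per node
simulated = [1.90, 1.35, 1.02, 1.20]  # made-up simulated averages per node
rho = spearman(predicted, simulated)
r = pearson(predicted, simulated)
```

Spearman's coefficient depends only on the ordering of the nodes, so it reflects ranking quality, while the Pearson coefficient also reflects how closely the predicted values track the simulated ones.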
Figure 2 shows the correlation between the results of
prediction of probability propagation algorithm and simulation
of propagation of a posting in the simulated social network
using the Epinion dataset.
Dataset              # of Edges   Max, Avg # of infected   Max, Avg # of Hops
Epinion-10           13           5, 2.0                   3, 0.69
Facebook-10          6            5, 1.75                  3, 0.5
Random Graph-10      5            4, 2.0                   2, 0.8
Small World-10       26           6, 2.19                  5, 0.99
Scale Free-10        10           4, 1.72                  2, 0.51
Epinion-100          1359         29, 4.2                  15, 1.75
Facebook-100         63           11, 1.97                 6, 0.73
Random Graph-100     239          25, 3.23                 14, 1.43
Small World-100      300          20, 2.19                 10, 0.93
Scale Free-100       100          14, 2.24                 4, 0.72
Epinion-1000         29509        275, 21.39               139, 5.24
Facebook-1000        2184         83, 9.64                 37, 2.96
Random Graph-1000    10022        299, 4.81                96, 1.96
Small World-1000     2882         66, 3.1                  27, 1.3
Scale Free-1000      1000         42, 1.85                 7, 0.51
TABLE II: Statistics about the maximum and average number
and level of propagations and hops to which a posting
penetrated

Dataset         10 nodes     100 nodes    1000 nodes
Epinion         0.98, 0.98   0.97, 0.96   0.86, 0.85
Facebook        0.98, 0.99   0.98, 0.89   0.91, 0.73
Scale-free      0.98, 0.99   0.96, 0.90   0.91, 0.73
Random Graph    1.00, 1.00   0.97, 0.94   0.97, 0.96
Small-World     0.95, 0.99   0.98, 0.98   0.86, 0.37
TABLE III: Correlation between simulation and proposed
algorithm in predicting number of infected nodes using
Spearman’s correlation and Pearson correlation

VI. FUTURE WORK AND CONCLUSIONS
This paper proposes a formal model of a class of social
network and uses it to characterize the problem of measuring
influence in that network. Two diffusion models, one deterministic
and one non-deterministic, are proposed as behavioural
processes useful in characterizing influence.
Section 4 provided an algorithm to measure the probability
of infection of nodes in the network according to a method
called Probability Propagation. This method uses the notion of
a Markov chain process to describe the process of activation of
nodes in a network. This algorithm consists of a combination
of two concepts: Markov chain theory and eigenvector centrality.
It is observed that the proposed method can estimate
the number of activated nodes in a network given the starting
point for propagation with a high level of accuracy. Clearly,
further work must be undertaken using data from actual social
networks such as Facebook or Twitter.
REFERENCES
[1] J. Herlocker, J. Konstan, L. Terveen, and J. Riedl, “Evaluating collaborative filtering recommender systems,” ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.
[2] J. Golbeck, “Generating predictive movie recommendations from trust in social networks,” Trust Management, pp. 93–104, 2006.
[3] R. Iyengar, S. Han, and S. Gupta, “Do friends influence purchases in a social network?” Harvard Business School Marketing Unit Working Paper No. 09-123, 2009.
[4] M. Cha, H. Haddadi, F. Benevenuto, and K. Gummadi, “Measuring user influence in twitter: The million follower fallacy,” in 4th International AAAI Conference on Weblogs and Social Media (ICWSM), 2010.
[5] A. Khrabrov and G. Cybenko, “Discovering influence in communication networks using dynamic graph analysis,” in Social Computing (SocialCom), 2010 IEEE Second International Conference on. IEEE, 2010, pp. 288–294.
[6] A. Afrasiabi Rad and M. Benyoucef, “Towards detecting influential users in social networks,” E-Technologies: Transformation in a Connected World, pp. 227–240, 2011.
[7] D. Kempe, J. Kleinberg, and E. Tardos, “Maximizing the spread of influence through a social network,” in Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2003, pp. 137–146.
[8] M. Ball, “Computational complexity of network reliability analysis: An overview,” Reliability, IEEE Transactions on, vol. 35, no. 3, pp. 230–239, 1986.
[9] B. Hajian and T. White, “Modelling influence in a social network: Metrics and evaluation,” in Social Computing (SocialCom), 2011 IEEE Third International Conference on. IEEE, 2011, pp. 498–500.
[10] B. Hajian, “On measuring influence and its properties in social networks,” Master’s thesis, Carleton University School of Computer Science, 2012.
[11] L. Tierney, “Introduction to general state-space markov chain theory,” Markov chain Monte Carlo in practice, pp. 59–74, 1996.
[12] I. Ipsen and R. Wills, “Analysis and computation of google’s pagerank,” in 7th IMACS international symposium on iterative methods in scientific computing, Fields Institute, Toronto, Canada, vol. 5, no. 8, 2005.
[13] M. Gomez Rodriguez, J. Leskovec, and A. Krause, “Inferring networks of diffusion and influence,” in Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2010, pp. 1019–1028.
[14] J. Leskovec, “Stanford large network dataset collection, http://snap.stanford.edu/data/,” November.