Dealing with Disappearance of an Actor Set inSocial Networks
Idrissa Sarr
Universite Cheikh Anta Diop
Dakar, Senegal
Email: [email protected]
Rokia Missaoui
Universite du Quebec en Outaouais
Quebec, Canada
Email: [email protected]
Romain Lalande
Universite du Quebec en Outaouais
Quebec, Canada
Email: [email protected]
Abstract—Social networks are dynamic structures that containa set of entities and links. In such a dynamic environment, aspecific node or a group of nodes can play an important role in theinformation flow transmission within the network and therefore,its disappearance may lead to a disconnected network or abreakdown in the information flow. The objective of this paper isto extend our previous work on managing a node disappearanceto handling the disappearance of a group of nodes. The proposedapproach relies on the role played by a group of nodes toconduct network changes and maintain the network connectedwhile restoring the information flow with a similar quality asbefore the group disappearance. We consider two situations (onlyone versus many communities) and categorize groups of nodesinto three classes (scattered, contiguous and hybrid). Hence, wemanage a group disappearance with respect to its class and thenetwork topology by adding new links in a parsimonious way andfinding a substitute for a leaving group. Our approach differsfrom existing link prediction solutions by the fact that it usesthe information flow quality as a key performance indicator toidentify the potential links to add and/or the possible substituteto a disappearing group. We implement a prototype by usingan open source social network analysis library (NetworkX) andwe validate our solution through experiments. The results showthe benefits of our solution in terms of response time and thenumber of added links.
I. INTRODUCTION
Social network analysis has attracted many researchers from
different fields and covers many topics such as centrality
computation, community detection, link prediction, influential
actor identification, and so on.A social network is generally represented as a graph of
a set of entities/actors (nodes) with links (edges) between
them. The structure of the graph is usually dynamic and can
be analyzed according to different approaches and techniques
[5], [16], [28]. As in any social structure, each actor plays a
more or less important role within the network. For example,
influential customers are seen as more important since they
play the role of those who have the best ability to spread
the information flow within the network, whereas influenced
customers can just follow or receive information from others.
However, even if some nodes are not influential individually,
they can play an important role in the information flow as
a whole. Therefore, the disappearance of a group of nodes
may have the same impact in the information flow as the
disappearance of a central node (i.e., the one with the highest
centrality score).
Furthermore, the problem of information flow in social
networks is very relevant and has been widely studied [2], [6],
[1], [24], [29], [9], [30], [12]. The structure of the network is
identified as one of the factors that can impact the information
flow. For instance, when the network is partitioned, generally
due to the disappearance of central nodes or a group of
nodes, the spread of information is broken or limited to a
small part of the overall network, and therefore some nodes
will never receive the information. With this in mind, it is
essential to efficiently manage the network structure in such a
way that a breakdown will never happen even if some nodes
disappear from the network. In [25], we propose a solution
that deals with the disappearance of one node at a time. The
proposed solution minimizes the number of network updates
while maintaining the information flow quality. It consists
to add new links and possibly select a substitute whenever
a node disappears from the network. This is why we place
it in the realm of link prediction solutions well studied in
[3], [17], [18], [10], [19]. Even though the solution described
in [25] differs from other solutions by using the information
flow quality to find the optimal links and avoid exploding the
number of links, it does not deal with the disappearance of a
group of nodes.
The goal of this paper is to tackle the problem of a
group disappearance from an existing network. The main
contributions of this work can be summarized as follows:
• An algorithm to first detect whether a leaving group of
nodes is a contiguous, scattered or hybrid one. Then, it
possibly finds a substitute for the leaving group and adds
links to maintain the network connected. To this end,
we rely on JOAN and JOAN-C algorithms described in
[25] that minimize the number of network updates while
maintaining the information flow quality. This approach
assumes that the network has only one community.
• An adaptation of our solution to the situation where the
network contains many communities. In such a case, we
first identify the communities in the network that contain
nodes of the leaving group and then, we handle our
algorithm in a parallel manner based on the configuration
of each subgroup in each community.
• A validation of our approach (through implementation
and experiments with network data sets) done by using a
2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
978-0-7695-4799-2/12 $26.00 © 2012 IEEE
DOI 10.1109/ASONAM.2012.85
492
2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
978-0-7695-4799-2/12 $26.00 © 2012 IEEE
DOI 10.1109/ASONAM.2012.85
493
cloud computing infrastructure that shows the feasibility
of our approach and evaluates its performance.
This paper is organized as follows: the next section gives
some background and related work on social network analysis.
Section III summarizes our previous work related to JOAN and
JOAN-C algorithms. In Section IV, the proposed approach is
defined and related procedures are described and illustrated.
The case of a network with well-identified communities is
presented in Section V. The validation of the approach is
given in Section VI while Section VII concludes this paper
and provides future work.
II. BACKGROUND
We consider S, a social network with a set of actors such
as individuals, organizations, objects, or events which are
connected via links (e.g., friendship, common interests or
features, collaboration). The network is represented as a graph
S = 〈V,E〉 where vertices in V are actors and edges in
E are interactions or ties between actors. In this paper, we
assume that all the interactions between actors are symmetric
and uniform, and hence, the edges are single, undirected
and unweighted. Moreover, we assume that the network is
connected and we aim to maintain such connectivity after the
disappearance of a node or even a set of nodes.
In the following, we first recall some basic and useful
notions that help understand our approach.
A. Centrality measures
There are many measures of node centrality in a graph to
capture the importance of an entity within such structure. For
example, the degree centrality, the betweenness centrality, and
the closeness centrality of a given node [28] are the frequently
used centrality measures.
Given Ni, an individual node belonging to a network, the
degree centrality of Ni provides the number of its direct
edges, while its betweenness centrality expresses the amount
of control that it has over the interactions of other nodes and
finally, its closeness centrality indicates how Ni is close to
the other nodes. A node with the highest degree (betweenness
or closeness resp.) centrality is called a leader (mediator or
witness resp.).
Similarly, the same metrics for an individual node can be
defined for a group of individual nodes [7]. For example, the
group degree centrality of a set G is defined as the number
of nodes outside G that are linked to elements of G. The
normalized group degree centrality in a given social network
S of size N is obtained by dividing the group degree by the
number of non-group actors. CSD(G) =
|Γ(G)|N−|G| , where Γ(G)
is the number of nodes linked to G. This centrality measure
will be used in the sequel to estimate the importance of a group
G in the network. For the definitions of other centrality metrics
such as group betweenness centrality and group closenesscentrality, the reader is referred to [7].
B. Distance measures
The geodesic distance between two nodes in a graph is the
number of edges in a shortest path connecting them. In case
of unconnected actors, the geodesic distance is either set to ∞or computed as the greatest distance in the network augmented
by 1 [13].
The eccentricity of a node i is the greatest geodesic distance
between i and any other node j in the network. It expresses
how far j is from i. Nodes with a low eccentricity can be
used to disseminate the information rapidly and play the role
of information flow authorities as studied by [1].
C. Related work
Many social network evolution models have been proposed
in dynamic networks, and are mainly based on adding nodes
or links. Dynamic networks are a kind of graphs [4], [23]
in which nodes and links can be added and/or deleted. The
deletion of a node can be managed by predicting and actually
adding new links. Adding new links is generally done by using
the triadic closure and focal closure [15], [23] through non
deterministic approaches generally limited to geodesic neigh-
borhood. We recall that the triadic closure is the probability
that B is linked to C knowing that A is already linked to B and
C. The focal closure is the probability that A and B that share
an interest could interact while the geodesic neighborhood of
A is the set of nodes located at a geodesic distance (i.e., the
shortest path) from A.
The studies in [18], [27] focus on predicting new links but
exclusively in static social networks. In [18], the objective is to
define approaches to link prediction based on measures that ex-
ploit the proximity of nodes within a network. More precisely,
the idea is to identify proximity measures that efficiently
predict the new links that will likely happen in the future in
a social network by assuming that two close nodes have a
greater probability to be linked. An experimental comparison
of a set of measures (e.g. Adamic/Adar distance, common
neighbor and Katz measures) is also described in [18] and
their impact on the quality of link prediction is proposed. In
[27], the authors take into account the evolution of a network
over time by integrating edge weights (potentially derived from
temporal characteristics) into existing link prediction methods.
They also investigate a new testing method to compute the
performance of prediction algorithms in ranking the neighbors
of a selected node. The proposed solutions usually rely on
the network topology and add new links without taking into
account a specific goal (e.g., information flow quality).
Finally, the issue raised in [14] is a bit more close to ours
since the authors handle the problem of identifying the node
of the network which will replace a given deleted node. To
that end, they propose a Bayesian approach by first computing
the posterior probability of each node in the network. The best
substitute is then the one with the closest posterior probability
to that of the deleted node. Moreover, they take into account
the hierarchical structure of nodes (a background knowledge
not reflected in the network) so that the substitute needs to
be of similar level of hierarchy (e.g. a Vice-president should
493494
be replaced by another Vice-president in the organization).
However, the link prediction problem is not studied. Some
methods to predict the new structure of a social network once
a node disappears is proposed (see [19], [25]). They consist
to possibly find a substitute for the leaving node provided it
plays a role in the network and/or the information flow quality.
New links can then be established and any isolated node is
discarded. However, these studies do not deal with a group
disappearance that can happen in any social network.
III. JOAN AND JOAN-C ALGORITHMS
In this section we describe the overall features of JOAN
and JOAN-C algorithms proposed in [25]. The goal of these
algorithms is to: (i) avoid a degradation of the information flow
that could follow the disappearance of a node in the network,
and (ii) identify a substitute to the disappearing node whenever
it is important in terms of its centrality measures.
To this end, nodes are classified into two main classes:
• Critical nodes are those which play such a central
role that their disappearance can lead to an amount of
changes in the network (disconnected network, increase
of diameter/radius, ...). Based on the node centrality
measures, we identify three kinds of critical nodes,
namely, the leader, the mediator and the witness that
have respectively the highest degree, betweenness and
closeness centrality. Several nodes may have the same
highest centrality measure within a network.
• Non critical nodes play less important roles than those
described above. However, some of them may have high
(but not the highest) centrality measures, and therefore,
we distinguish two kinds of non critical nodes: finger and
follower. A finger is a node whose centrality measure
deviates slightly from the one of a critical node while
followers have the lowest centrality measures. For each
kind of critical node, we define its fingers and followersand we may have k fingers within the non critical nodes.
As we pointed out earlier, JOAN and JOAN-C algorithms
exploit the class of a given node to both estimate the impact of
its disappearance on the information flow and conduct minimal
network changes to restore the information flow with a similar
quality as before the node disappearance. The information flowquality within a network S is defined in [25] as:
γF (S) = 〈λF (S), εS(w)〉,where λF (S) is the information flow degree which gives
the total number of nodes that can receive an information
introduced into the network from the witness and εS(w) is
the eccentricity of the witness in the network. The witness is
considered as the best entry to the network for putting new
information since it is the closest node to any other node in
the network, and hence, its eccentricity determines how long
it takes to an information to reach any node of S.
After a node disappearance, the first step is to check whether
the node is critical or not. If the node is non critical, then we
use JOAN algorithm. Otherwise, we rely on JOAN-C.
A. JOAN algorithm
When a non critical node Ni is leaving the network Sand its associated links are deleted, we observe two cases:
either the remaining network S′ is connected or not. If S′
is connected, we do nothing because the information flow can
circulate through S′ without any breakdown. Otherwise, while
a fixed information flow quality is ensured, a link between
two neighbors Nj and Nk of Ni is added if there is no path
between them (after the node deletion) as follows:
• i) a direct link is added if Nj or Nk is a critical node.
• ii) an indirect link (between Nj and Nk) is added if
neither Nj nor Nk is a critical one. The indirect link
is set between Nj and Nk through another neighbor Ns
which has the shortest path to a critical node Nc.
In other words, links are only added to maintain the net-
work connected and the quality of information flow. JOAN
algorithm always adds links between nodes that were within
the same neighborhood before a node disappearance. The
motivation of doing so is to ensure that two nodes will never
be linked directly if they do not fulfill some conditions such
as having the same interests or being friends of a leaving
individual.
B. JOAN-C algorithm
When a disappearing node Nc is critical (e.g., a leader), we
first find a substitute in its neighborhood since an information
relevant to one person is more likely to be pertinent to
individuals in his direct neighborhood than those outside his
circle. Three cases are considered:
• Nc has some fingers (at least one). In this case, the first
finger (F 1Nc
) (i.e., the finger with the highest centrality
measure of the same kind as Nc) is chosen as the
replacing node.
• There is no finger to replace the leaving critical node.
Then, one of the remaining critical nodes is selected as
a substitute. The critical node with the shortest geodesic
distance to Nc is used as a substitute.
• There is no other critical nodes, which means that the
leaving node was the only critical node (e.g., a central
node of a star network). Then, one of the Nc neighbors
with the highest centrality measure is selected as the
substitute).
For each one of the three cases, any neighbor of Nc is linked
to the substitute if ever it has not been already linked (directly
or indirectly) to it.
The merit of these algorithms lies in the fact that the
insertion of new links is done in a parsimonious way while
maintaining the information flow quality as close as possible
to the one before the node disappearance. Moreover, the
experimental results show the effectiveness of such algorithms
(see [25] for more details).
IV. DISAPPEARANCE OF A GROUP OF NODES
This section describes how we handle the disappearance
of a group of nodes. The main idea is to rely on JOAN and
494495
JOAN-C algorithms to deal with group disappearance. In other
words, since these algorithms were devised to deal with the
disappearance of one node at a time, we combine them with
some additional features in order to efficiently manage the
disappearance of a set of nodes. We recall that our goal is to
maintain the network still connected after a disappearance of
a group of nodes while avoiding to deteriorate the information
flow quality.
The approach used to handle the disappearance of a group
of nodes depends tightly on the group configuration. Actually,
we distinguish three possible configurations of a group G:
scattered, contiguous and hybrid group. In fact, G is consid-
ered as scattered if there is no direct link for any couple of
nodes in the group. However, if there exists a direct link for
any couple of nodes (Ni, Nk) or an indirect link between Ni
and Nk through only nodes inside G, then the group is called
contiguous. Finally, when both configurations coexist, the set
is called a hybrid one. In this section we describe our approach
for the three possible configurations of G and we assume
that the network is connected, unweighted and undirected, and
contains only one community.
A. Scattered group
In this case, each node Ni in G is far from the other ones in
such a way that its disappearance will not involve another node
Nk in the group. In other words, according to the procedures
described in Section III, we handle the disappearance of Ni
without using any Nk ∈ G as a substitute or link directly Ni
to Nk. Therefore, for each node Ni ∈ G, we apply the JOAN
or JOAN-C algorithm based on the class of Ni to manage its
disappearance. The reason is as follows: if all of the nodes in
G disappear simultaneously, their deletions are independent
in such a way that we can manage them in a separate and
iterative manner. This iterative choice is highly dependent on
the fact that the network S has only one community and
trying to manage node deletion in a simultaneous way is quite
impossible. However, when S contains several communities,
we handle the problem in a parallel way and therefore, the
disappearance of a group of nodes can be managed in a
simultaneous way.
Figure 1 depicts a network where a group that contains three
nodes 6, 7 and 2 is leaving. Since there is no direct tie between
these nodes, the set is considered as scattered and thus, we
manage the disappearance of one node at a time as described
in Section III until all the set is processed. For instance, 6 and
7 are not critical based on the node classification described
earlier, thus, their disappearance is managed one after another
by using the JOAN algorithm. However, the disappearance of
node 2 is tackled by JOAN-C algorithm since it was a critical
node before the group disappearance. The neighbors of node 2are now connected to node 8 which is the most critical node.
The result of applying iteratively JOAN and JOAN-C is
depicted in Figure 2 and one can see that the network is still
connected and the information flow quality remains the same
in terms of witness eccentricity value.
Fig. 1. A scattered leaving group.
Fig. 2. After the nodes in the group {2, 7, 6} leave the network.
B. Contiguous group
When nodes are contiguous, trying to handle the disap-
pearance one node at a time by using JOAN or JOAN-C
algorithms is not realistic. The reason is that the algorithms
use the neighborhood of a leaving node to update the network
and therefore any neighbor used for replacing a node can be
removed in the next steps. Precisely, the disappearance of a
node Ni may lead to tie two nodes that will leave afterwards
because they belong to the same leaving group as Ni, and
consequently, this naive network update is time-consuming
and inefficient. That is why the disappearance of a contiguous
group is handled in one shot. Moreover, the disappearance of
a set of nodes is always considered as a critical disappearance
even if some nodes of the leaving set are non critical ones.
The motivation is based on the fact that the role played by a
set of nodes can be as important as the one played by a single
critical node. Thus, whenever a group leaves the network, it
needs to be replaced in order to preserve the overall shape of
the network and ensure the information dissemination. Figure
3 shows a contiguous group G = {2, 4, 8}.
Fig. 3. A contiguous leaving group.
After the disappearance of such a contiguous group G, our
goal is to find a substitute group G′ and thus, to link any
neighbor of G with one node in G′ if ever they have not been
linked yet. To find the neighbors of G, we consider all the
495496
neighbors of any node in G. Formally, let Γ(G) be the set of
neighbors of G, and Γ(Ni) be the set of neighbors of a node
Ni,
Γ(G) = {⋃
i
Γ(Ni) \G | Ni ∈ G}
Moreover, we define the total common neighbors ΓT (G) of a
group G as the set of nodes that belong to each neighborhood
of a node in G.
ΓT (G) = {⋂
i
Γ(Ni) | Ni ∈ Γ(G)}
We define the partial common neighbors (ΓP (G)) of the
group as the union of the intersections of the neighbors of
any couple of nodes in G.
ΓP (G) =⋃
i,k
{Γ(Ni) ∩ Γ(Nk) | (Ni, Nk) ∈ Γ(G)}
Total and partial common neighbors are defined in order
to know what are the nodes we have to consider first when
we look for a substitute. Therefore, once the neighbors of a
leaving group are determined, the following cases happen :
• ΓT (G) �= ∅, which means that there is at least one
common neighbor for all nodes in G. In such a case,
ΓT (G) is used as a substitute group (i.e., G′ = ΓT (G)).Hence, we add direct links between any couple of nodes
in G′ if ever they are not yet directly or indirectly linked.
Furthermore, for any node Ni ∈ Γ(G) and Ni /∈ G′ we
add a link between Ni and only one of the nodes in the
substitute group G′.
• ΓT (G) = ∅ ∧ ΓP (G) �= ∅, which means that there is no
common neighbor to all nodes in the group while some
couples of nodes have some common neighbors. In this
case, we try to find a substitute group G′, which contains
the same number of nodes as G, by including some nodes
in ΓP (G). Since we can get several combinations if the
number of partial common neighbors is higher than the
number of nodes in G, then we choose the combination
that has the highest group degree centrality. Choosing
the combination with the highest degree centrality has
the advantage to select the most leading group. We get
the group with the highest group centrality by considering
first nodes with the highest centrality metrics. Once G′ is
defined, we use the same process as in the first case to link
nodes in G′ and to link the remaining nodes in Γ(G) to
G′. We note that even if a group of nodes with less nodes
than G′ may have a degree centrality higher than the one
of G, we choose G′ with the same number of nodes as Gto avoid connecting all the elements in Γ(G) to a smaller
set. Moreover, choosing a group substitute with at least
the same cardinality as ‖G‖ has the advantage to limit
the number of links to add for maintaining the network
connected after the disappearance of G. In fact, let us
define pNi(G) as the probability that Ni /∈ G be linked
to a group G thats contains n nodes V1, V2, ..., Vn. Since
once Ni is linked to one of the nodes in G it becomes
linked to any other node because the group is contiguous,
thus,
pNi(G) = (pNi
(n⋃
k=1
Vk)) ≤n∑
k=1
pNi(Vk)
Suppose we have G′ ⊆ G′′, which means that n′ ≤ n”and assume n” = n and both G′ and G′′ can be used as
a substitute of G. Thus we obtain,
n′′∑
k=1
pNi(Vk) ≥
n′∑
k=1
pNi(Vk)⇒ pNi
(G′′) ≥ pNi(G′).
This inequality means that using G” as a substitute is
more optimal than using G′ since it is more likely that
a node Nj ∈ Γ(G) is linked to G” than to G′. In other
words, G” minimizes more the number of links to add
than G′ since it is already linked to most of other nodes
in Γ(G) than G′ does. This assertion is well confirmed
by our experimental results (see Section VI).
• ΓT (G) = ∅ ∧ ΓP (G) = ∅. This case is the extreme one
since even if the leaving group is contiguous, its nodes
do not even share a single neighbor. To find a substitute
group, G′, we gather a set of nodes from Γ(G) in order
to get a centrality value similar to or higher than the
degree centrality of G. To this end, we choose the same
number of nodes as in G and we consider first the nodes
with the highest centrality measures. Thus, we compute
the group degree centrality and compare it with the one
of G. If the degree centrality of G′ is less than the one for
G, we create a new group G” by adding a new node to
G′ and check its group degree once again and we repeat
this process until we get a group degree centrality quite
similar to the one of G. However, using a group G′ with
the same cardinality as G is enough if |Γ(G)| = |Γ(G′)|since the group degree centrality of G′ will be greater
than the one of G, i.e., CSD(G) ≤ CS
D(G′). In fact,
CSD(G
′) = |Γ(G′)|N ′−|G′| =
|Γ(G′)|(N−|G|)−|G′| .
So, if |Γ(G)| = |Γ(G′)| ⇒ CSD(G
′) = |Γ(G)|(N−|G|)−|G| .
which means that,|Γ(G)|
(N−|G|)−|G| ≥ |Γ(G)|N−|G| and thus,
CSD(G) ≤ CS
D(G′).
For instance, let G be the group 2, 4, 8 in Figure 3,
which has a group degree centrality equal to 0, 857.
After the disappearance of G, we observe that ΓT (G) =∅∧ΓP (G) = ∅. Thus, let us form a group G′ = {3, 6, 10}and compute its group degree centrality, we find that
the value is 1, which is higher than the group degree
of G. Therefore G′ is considered as the substitute and
the network looks like the one depicted by Figure 4 after
adding links between node of G′ and links between G′
and the remaining node in the neighborhood of G.
496497
Fig. 4. The substitute G′ and additional links after the leaving of the group2, 4, 8 in Figure 3
C. Hybrid group
This case includes both of the two cases described above
and happens when one part of the group is scattered and other
parts are contiguous. Figure 5 shows an example of an hybrid
group. Nodes 2 and 8 are contiguous while node 11 is isolated.
To handle this situation, we first identify the (scattered and
contiguous) subgroups, and for each subgroup we apply one
of the procedures presented earlier.
Fig. 5. An hybrid leaving group
V. GROUP DISAPPEARANCE WITH IDENTIFIED
COMMUNITIES
A network may contain communities that represent clusters
of nodes [21], [8], [11].
Our goal in this section is to adapt our approach in the
case of social networks with communities. To this end, we
use the method presented by [21] to first find communities
within a network. Then, a leaving group of nodes can be
localized either in one single community or its nodes are
spread over many communities. When the overall group of
the leaving node belongs to the same community, we simply
apply the procedure described earlier. However, when the
nodes of the group are distributed over several communities,
we first identify such communities and the corresponding
subgroups. For each subgroup within a community, we apply
the procedure based on the nature of the group (e.g., scattered).
Figure 6 shows an example of a network with two com-
munities. The leaving group has six nodes spread over two
communities with a scattered subgroup and a contiguous
one. To handle the disappearance of this group we apply
simultaneously our algorithm in each community.
The goal behind such approach is twofold. First, we want
to deal with such a real-life situation in which the network
Fig. 6. An hybrid leaving group G = {8, 12, 14, 1, 15, 16} lying in twocommunities
contains several communities. Second, using communities
allows to manage the network evolution in a parallel way and
hence get better performances in large networks.
VI. VALIDATION
In this section, we aim at assessing the performance of the
proposed solution and its feasibility through experimentation.
To this end, we first check if our approach minimizes the
number of added links after a group disappearance. Then, we
evaluate the performance in terms of execution time and added
links when group configurations vary.
A. Experimental Setup
The prototype is written in Python with the social network
library NetworkX [20]. The experiments were conducted on
both a single machine Intel Core i5 (8 GB of RAM and 3.20
GHz running under Linux Ubuntu) and a cloud infrastructure
PiCloud [22].
The data set used for our experiments is Autonomous
systems AS-733 of the Stanford University [26]. This set
describes a graph of Internet routers with a communication
network model of who-talks-to-whom. This communication
model is well-suited to our context since it allows us to see
how information circulates from one node to another one. The
AS-733 network is undirected and unweighted and contains
6474 nodes and 13233 edges. It is organized into sub-graphs
which will allow us to later handle the problem of node
and group of nodes deletion in networks with well-identified
communities.
B. Cost of network updates
We aim in this experiment to evaluate the cost of managing
a group disappearance in terms of links to add for maintaining
the network connected. To this end, we vary the size of the
leaving group from 10% to 40% of the total number of nodes
in the network. We argue that varying the size up to 40%
is enough to assess the performance of our approach and to
cope with most of the real-life situations. The leaving group is
chosen randomly and corresponds most of the time to a hybrid
group. Figure 7 depicts the average number of added links per
(deleted) node when the size of the leaving group increases
and shows that the average number of links to add is very low.
For example, with a disappearance of 1294 nodes (i.e., 20% of
the size of the network), we need to add at most 0.15 links per
497498
deleted node, which is about 190 links for the whole leaving
group. This is mainly due to the fact that we add only links if
ever the network is disconnected and moreover, if the group
size is important, the probability that the remaining nodes are
linked to the group is so high that we do not need to add new
links. This result confirms our assertion in Section IV.
Fig. 7. Average added links per node according to the group size (representedas a proportion of the network size).
C. Impact of using the communities
In this experiment, we assess the performance of managing
a group disappearance from a network that contains com-
munities. To this end, we use the same data sets as before
and run the experiments on a cloud computing infrastructure.
We first find the communities, assign each one of them to a
single machine, and the leaving group is formed of subgroups
spread over communities. We measure the average response
time for managing group disappearance when the size of
the group varies. To better highlight the impact of using
communities, we depict in Figure 8 the time required to
update the network structure in case of both one or many
communities. We observe that the average response time when
the leaving group is distributed over several communities
(dashed curve) is far lower than the case when the network
forms only one community. In the latter case, the network is
stored and managed in a unique machine. However, in the
presence of communities, we manage in a simultaneous way
the disappearance of a set of nodes on different machines
and hence, the overall response time decreases significantly.
Moreover, we observe that the response time of both situations
varies slightly even if the group size increases. This is due to
the fact that the complexity of our procedure does not depend
on the group size but rather on the neighborhood size of the
group.
D. Impact of the network density
The goal of this experiment is to evaluate the performance
of our approach in two cases: a sparse graph and a dense
one with density values of 0.6% and 80%. To this end, we
consider graphs of 3000 nodes and three configurations: sparse
Fig. 8. Response time with or without communities according to group size(represented as a proportion of the network size).
graphs with a leaving hybrid group (HSG), sparse graphs with
a leaving contiguous group (CSG) and finally dense graphs
with a contiguous leaving group (CDG). We do not consider
the case of dense graphs with a hybrid leaving group because
such a case is very rare. In fact, since the graph is dense, any
node is tied to any other node and therefore, any leaving group
is more likely to be a contiguous group. We vary the leaving
group size from 10% to 40% of the initial number of nodes
and we measure the time required to update the network after
the disappearance of a group. The results shown in Figure 9
reveal that the response time is less than one second for a
hybrid leaving group from a sparse graph (HSG curve). This
is mainly due to the fact that most of the nodes in the group
are isolated because of the sparsity of the graph, and hence,
the number of links to add as well as the number of neighbors
to consider for finding a substitute is small and leads to a
rapid network update. Moreover, this result shows again that
the complexity of our algorithm depends on the size of the
neighborhood and since with a sparse graph the number of
neighbors of a given node is small, the time to update the
network remains low.
However, when the leaving group is contiguous, the neigh-
borhood is more important and the time to find whether the
group has some common or partial neighbors increases the
response time even if the graph is sparse (CSG). Furthermore,
this is more remarkable with a dense graph where a node (or
group of nodes) has an important neighborhood size. That is
why the response time with a dense graph for a contiguous
group (CDG) is very high compared to the one with a sparse
graph.
VII. CONCLUSION
The goal of this paper is to propose a method to deal with
the disappearance of an actor set within a social network. We
rely on our previous work that exploits the role played by a
given node to manage node disappearance while maintaining
the information flow quality. Three cases are identified for a
leaving actor set: a scattered, contiguous or hybrid one. When
the group is scattered, the JOAN and JOAN-C algorithms
proposed in [25] are used since nodes are not directly linked
498499
Fig. 9. Response time with dense or sparse graph vs group size
and their disappearance can be managed independently in an
iterative way. However, when the group is contiguous, our
method tries to find a substitute for the leaving group. To this
end, we consider all common neighbors of the leaving group
and find a group with the same number of nodes as the leaving
one. If there are no common neighbors, we gather a set of
nodes from the neighbors of the leaving group in order to get
a centrality degree value close to or higher than the one of the
leaving group. The same number of nodes as in the leaving
group are chosen and the nodes with the highest centrality
measures are considered first. In case of a hybrid group, we
dissociate the contiguous group part from the scattered part
and handle each one of them with the appropriate algorithm.
Our solution minimizes the number of added links while main-
taining the network connected after the disappearance of a set
of nodes. Two situations are also considered: the presence of
only one community and the existence of many communities.
An implementation and validation of our approach are also
studied and promising results are shown.
Since we are concerned with network connectivity, a more
pro-active approach we plan to explore is to notify and
recommend ways to strengthen the weak links.
ACKNOWLEDGMENT
Idrissa Sarr is grateful to “Programme canadien de bourses
de la Francophonie” of the Canadian International Develop-
ment Agency for the postdoctoral fellowship while Rokia
Missaoui acknowledges the financial support of the Natu-
ral Sciences and Engineering Research Council of Canada
(NSERC).
REFERENCES
[1] C. C. Aggarwal, A. Khan, and X. Yan. On flow authority discovery insocial networks. In SDM, pages 522–533, 2011.
[2] R. Ahlswede, C. Ning, S.-Y. Li, and R. Yeung. Network informationflow. Information Theory, IEEE Transactions on, 46(4):1204 –1216, jul2000.
[3] L. Backstrom and J. Leskovec. Supervised random walks: predictingand recommending links in social networks. In WSDM, pages 635–644,2011.
[4] E. Ben-Naim and P. L. Krapivsky. Addition - deletion networks. Journalof Physics A: Mathematical and Theoretical, 40(30):8607, 2007.
[5] P. J. Carrington. Models and methods in social network analysis.Structural analysis in the social sciences. Cambridge Univ. Press, 2005.
[6] M. D. Choudhury, Y.-R. Lin, H. Sundaram, K. S. Candan, L. Xie, andA. Kelliher. How does the data sampling strategy impact the discoveryof information diffusion in social media? In ICWSM, 2010.
[7] M. G. Everett and S. P. Borgatti. Extended cantrality. In P. J. Carrington,editor, Models and methods in social network analysis, pages 57–76.Cambridge University Press, 2005.
[8] S. Fortunato. Community detection in graphs. Physics Reports, 486(3-5):75–174, February 2010.
[9] N. E. Friedkin. Information flow through strong and weak ties inintraorganizational social networks. Social Networks, 3(4):273 – 285,1982.
[10] E. Gilbert and K. Karahalios. Predicting tie strength with social media.In Proceedings of the 27th international conference on Human factorsin computing systems, CHI ’09, pages 211–220, 2009.
[11] M. Girvan and M. E. J. Newman. Community structure in social andbiological networks. Proceedings of the National Academy of Sciencesof the United States of America, 99(12):7821–7826, 2002.
[12] F. Halim, R. H. C. Yap, and Y. Wu. A mapreduce-based maximum-flow algorithm for large small-world network graphs. In ICDCS, pages192–202, 2011.
[13] R. Hanneman and M. Riddle. Introduction to social network methods.http://faculty.ucr.edu/ hanneman/nettext/, 2005.
[14] D. M. A. Hussain and Z. Ahmed. Dynamical adaptation in terroristcells/networks. In SCSS (2), pages 557–562, 2008.
[15] J. M. Kumpula and J. P. Onnela and J. Saramaki, and K. Kaski and J.Kertesz. Emergence of communities in weighted networks. PhysicalReview Letters, 99(22), 2007.
[16] D. Knoke and S. Yang. Social Network Analysis. Sage Publications,second edition edition, 2008.
[17] J. Leskovec, D. Huttenlocher, and J. Kleinberg. Predicting positive andnegative links in online social networks. In Proceedings of the 19thinternational conference on World wide web, WWW ’10, pages 641–650, 2010.
[18] D. Liben-Nowell and J. Kleinberg. The link prediction problem forsocial networks. In CIKM ’03: Proceedings of the twelfth internationalconference on Information and knowledge management, pages 556–559.ACM, 2003.
[19] E. Negre, R. Missaoui, and J. Vaillancourt. Predicting a social networkstructure once a node is deleted. In ASONAM, pages 297–304, 2011.
[20] NetworkX library. http://networkx.lanl.gov/, 2011.[21] M. E. J. Newman. Fast algorithm for detecting community structure in
networks. Phys. Rev. E, 69(6):066133, Jun 2004.[22] PiCloud. http://www.picloud.com/, 2012.[23] R. Toivonen and L. Kovanen and M. Kivel and J. P. Onnela, and J.
Saramki and K. Kaski. A comparative study of social network models:Network evolution models and nodal attribute models. Social Networks,31(4):240–254, October 2009.
[24] M. G. Rodriguez, J. Leskovec, and A. Krause. Inferring networks ofdiffusion and influence. In Proceedings of the 16th ACM SIGKDDinternational conference on Knowledge discovery and data mining, KDD’10, pages 1019–1028, 2010.
[25] I. Sarr and R. Missaoui. Managing Node DisappearanceBased on Information Flow in Social Networks. SocialNetwork Analysis and Mining Journal (SNAM), 2012http://www.springerlink.com/openurl.asp?genre=articleid=doi:10.1007/s13278-012-0071-y.
[26] Stanford dataset. http://snap.stanford.edu/data/as.html, 2009.[27] T. Tylenda, R. Angelova, and S. Bedathur. Towards time-aware link
prediction in evolving social networks. In The 3rd SNA-KDD Workshop’09 (SNA-KDD’09), June 2009.
[28] S. Wasserman and K. Faust. Social network analysis : methods andapplications. Structural analysis in the social sciences. CambridgeUniversity Press, 1 edition, 1994.
[29] F. Wu, B. A. Huberman, L. A. Adamic, and J. R. Tyler. Information flowin social groups. Physica A: Statistical Mechanics and its Applications,337(1-2):327 – 335, 2004.
[30] J. Yang and J. Leskovec. Modeling information diffusion in implicitnetworks. In ICDM, pages 599–608, 2010.
499500