Page 1: [IEEE 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012) - Istanbul (2012.08.26-2012.08.29)] 2012 IEEE/ACM International Conference on Advances

Dealing with Disappearance of an Actor Set in Social Networks

Idrissa Sarr

Universite Cheikh Anta Diop

Dakar, Senegal

Email: [email protected]

Rokia Missaoui

Universite du Quebec en Outaouais

Quebec, Canada

Email: [email protected]

Romain Lalande

Universite du Quebec en Outaouais

Quebec, Canada

Email: [email protected]

Abstract: Social networks are dynamic structures that contain a set of entities and links. In such a dynamic environment, a specific node or a group of nodes can play an important role in the information flow transmission within the network and therefore, its disappearance may lead to a disconnected network or a breakdown in the information flow. The objective of this paper is to extend our previous work on managing a node disappearance to handling the disappearance of a group of nodes. The proposed approach relies on the role played by a group of nodes to conduct network changes and maintain the network connected while restoring the information flow with a similar quality as before the group disappearance. We consider two situations (only one versus many communities) and categorize groups of nodes into three classes (scattered, contiguous and hybrid). Hence, we manage a group disappearance with respect to its class and the network topology by adding new links in a parsimonious way and finding a substitute for a leaving group. Our approach differs from existing link prediction solutions by the fact that it uses the information flow quality as a key performance indicator to identify the potential links to add and/or the possible substitute to a disappearing group. We implement a prototype by using an open source social network analysis library (NetworkX) and we validate our solution through experiments. The results show the benefits of our solution in terms of response time and the number of added links.

I. INTRODUCTION

Social network analysis has attracted many researchers from

different fields and covers many topics such as centrality

computation, community detection, link prediction, influential

actor identification, and so on.

A social network is generally represented as a graph of

a set of entities/actors (nodes) with links (edges) between

them. The structure of the graph is usually dynamic and can

be analyzed according to different approaches and techniques

[5], [16], [28]. As in any social structure, each actor plays a

more or less important role within the network. For example,

influential customers are seen as more important since they

play the role of those who have the best ability to spread

the information flow within the network, whereas influenced

customers can just follow or receive information from others.

However, even if some nodes are not influential individually,

they can play an important role in the information flow as

a whole. Therefore, the disappearance of a group of nodes

may have the same impact in the information flow as the

disappearance of a central node (i.e., the one with the highest

centrality score).

Furthermore, the problem of information flow in social

networks is very relevant and has been widely studied [2], [6],

[1], [24], [29], [9], [30], [12]. The structure of the network is

identified as one of the factors that can impact the information

flow. For instance, when the network is partitioned, generally

due to the disappearance of central nodes or a group of

nodes, the spread of information is broken or limited to a

small part of the overall network, and therefore some nodes

will never receive the information. With this in mind, it is

essential to efficiently manage the network structure in such a

way that a breakdown will never happen even if some nodes

disappear from the network. In [25], we propose a solution

that deals with the disappearance of one node at a time. The

proposed solution minimizes the number of network updates

while maintaining the information flow quality. It consists

of adding new links and possibly selecting a substitute whenever

a node disappears from the network. This is why we place

it in the realm of link prediction solutions well studied in

[3], [17], [18], [10], [19]. Even though the solution described

in [25] differs from other solutions by using the information

flow quality to find the optimal links and avoid exploding the

number of links, it does not deal with the disappearance of a

group of nodes.

The goal of this paper is to tackle the problem of a

group disappearance from an existing network. The main

contributions of this work can be summarized as follows:

• An algorithm to first detect whether a leaving group of

nodes is a contiguous, scattered or hybrid one. Then, it

possibly finds a substitute for the leaving group and adds

links to maintain the network connected. To this end,

we rely on JOAN and JOAN-C algorithms described in

[25] that minimize the number of network updates while

maintaining the information flow quality. This approach

assumes that the network has only one community.

• An adaptation of our solution to the situation where the

network contains many communities. In such a case, we

first identify the communities in the network that contain

nodes of the leaving group and then, we handle our

algorithm in a parallel manner based on the configuration

of each subgroup in each community.

• A validation of our approach (through implementation

and experiments with network data sets) done by using a

2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining

978-0-7695-4799-2/12 $26.00 © 2012 IEEE

DOI 10.1109/ASONAM.2012.85


cloud computing infrastructure that shows the feasibility

of our approach and evaluates its performance.

This paper is organized as follows: the next section gives

some background and related work on social network analysis.

Section III summarizes our previous work related to JOAN and

JOAN-C algorithms. In Section IV, the proposed approach is

defined and related procedures are described and illustrated.

The case of a network with well-identified communities is

presented in Section V. The validation of the approach is

given in Section VI while Section VII concludes this paper

and provides future work.

II. BACKGROUND

We consider S, a social network with a set of actors such

as individuals, organizations, objects, or events which are

connected via links (e.g., friendship, common interests or

features, collaboration). The network is represented as a graph

S = 〈V,E〉 where vertices in V are actors and edges in

E are interactions or ties between actors. In this paper, we

assume that all the interactions between actors are symmetric

and uniform, and hence, the edges are single, undirected

and unweighted. Moreover, we assume that the network is

connected and we aim to maintain such connectivity after the

disappearance of a node or even a set of nodes.

In the following, we first recall some basic and useful

notions that help understand our approach.

A. Centrality measures

There are many measures of node centrality in a graph to

capture the importance of an entity within such structure. For

example, the degree centrality, the betweenness centrality, and

the closeness centrality of a given node [28] are the frequently

used centrality measures.

Given Ni, an individual node belonging to a network, the

degree centrality of Ni provides the number of its direct

edges, while its betweenness centrality expresses the amount

of control that it has over the interactions of other nodes and

finally, its closeness centrality indicates how Ni is close to

the other nodes. A node with the highest degree (betweenness

or closeness resp.) centrality is called a leader (mediator or

witness resp.).

Similarly, the same metrics for an individual node can be

defined for a group of individual nodes [7]. For example, the

group degree centrality of a set G is defined as the number

of nodes outside G that are linked to elements of G. The

normalized group degree centrality in a given social network

S of size N is obtained by dividing the group degree by the

number of non-group actors: $C_D^S(G) = \frac{|\Gamma(G)|}{N - |G|}$, where $\Gamma(G)$

is the set of nodes outside G that are linked to G. This centrality measure

will be used in the sequel to estimate the importance of a group

G in the network. For the definitions of other centrality metrics

such as group betweenness centrality and group closeness centrality, the reader is referred to [7].
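
The normalized group degree can be computed directly from its definition. The following is a minimal sketch in plain Python, assuming the network is stored as a dictionary that maps each node to the set of its neighbors. The helper names (`group_neighbors`, `group_degree_centrality`) and the small example network are illustrative and do not come from the paper.

```python
def group_neighbors(adj, group):
    """Return the nodes outside `group` that are adjacent to some member."""
    group = set(group)
    neighbors = set()
    for node in group:
        neighbors |= adj[node]
    return neighbors - group


def group_degree_centrality(adj, group):
    """Normalized group degree: |Gamma(G)| / (N - |G|)."""
    group = set(group)
    non_group = len(adj) - len(group)
    if non_group == 0:
        raise ValueError("the group covers the whole network")
    return len(group_neighbors(adj, group)) / non_group


# Hypothetical 6-node network: path 1-2-3-4-5 plus node 6 attached to node 3.
adj = {
    1: {2},
    2: {1, 3},
    3: {2, 4, 6},
    4: {3, 5},
    5: {4},
    6: {3},
}

# Group {2, 3} touches nodes 1, 4 and 6 out of the 4 non-group nodes.
print(group_degree_centrality(adj, {2, 3}))
```

Node 5 is the only outsider not adjacent to the group, so three of the four non-group nodes are counted.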

B. Distance measures

The geodesic distance between two nodes in a graph is the

number of edges in a shortest path connecting them. In case

of unconnected actors, the geodesic distance is either set to ∞ or computed as the greatest distance in the network augmented

by 1 [13].

The eccentricity of a node i is the greatest geodesic distance

between i and any other node j in the network. It expresses

how far the most distant node is from i. Nodes with a low eccentricity can be

used to disseminate the information rapidly and play the role

of information flow authorities as studied by [1].
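
Both distance notions can be illustrated with a breadth-first search, which yields geodesic distances in an unweighted graph. This is a minimal sketch under the assumption that the network is a dictionary of neighbor sets; it adopts the ∞ convention for unreachable nodes, and the function names are illustrative.

```python
from collections import deque


def geodesic_distances(adj, source):
    """Shortest-path length from `source` to every reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def eccentricity(adj, node):
    """Greatest geodesic distance from `node`; infinity if a node is unreachable."""
    dist = geodesic_distances(adj, node)
    if len(dist) < len(adj):
        return float("inf")
    return max(dist.values())


# Hypothetical path network 1-2-3-4-5.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}

print(eccentricity(path, 3))  # the middle node is closest to everyone
print(eccentricity(path, 1))  # an end node is farthest from the other end
```

In this example the middle node has the lowest eccentricity, which is why it would be the natural entry point for disseminating information.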

C. Related work

Many social network evolution models have been proposed

in dynamic networks, and are mainly based on adding nodes

or links. Dynamic networks are a kind of graphs [4], [23]

in which nodes and links can be added and/or deleted. The

deletion of a node can be managed by predicting and actually

adding new links. Adding new links is generally done by using

the triadic closure and focal closure [15], [23] through

non-deterministic approaches generally limited to the geodesic neighborhood.

We recall that the triadic closure is the probability

that B is linked to C knowing that A is already linked to B and

C. The focal closure is the probability that A and B that share

an interest could interact, while the geodesic neighborhood of

A is the set of nodes located at a given geodesic distance (i.e., the

shortest-path length) from A.

The studies in [18], [27] focus on predicting new links but

exclusively in static social networks. In [18], the objective is to

define approaches to link prediction based on measures that exploit

the proximity of nodes within a network. More precisely,

the idea is to identify proximity measures that efficiently

predict the new links that will likely happen in the future in

a social network by assuming that two close nodes have a

greater probability to be linked. An experimental comparison

of a set of measures (e.g., Adamic/Adar distance, common

neighbor and Katz measures) is also described in [18] and

their impact on the quality of link prediction is assessed. In

[27], the authors take into account the evolution of a network

over time by integrating edge weights (potentially derived from

temporal characteristics) into existing link prediction methods.

They also investigate a new testing method to compute the

performance of prediction algorithms in ranking the neighbors

of a selected node. The proposed solutions usually rely on

the network topology and add new links without taking into

account a specific goal (e.g., information flow quality).

Finally, the issue raised in [14] is somewhat closer to ours

since the authors handle the problem of identifying the node

of the network which will replace a given deleted node. To

that end, they propose a Bayesian approach by first computing

the posterior probability of each node in the network. The best

substitute is then the one with the closest posterior probability

to that of the deleted node. Moreover, they take into account

the hierarchical structure of nodes (a background knowledge

not reflected in the network) so that the substitute needs to

be of similar level of hierarchy (e.g. a Vice-president should


be replaced by another Vice-president in the organization).

However, the link prediction problem is not studied. Some

methods to predict the new structure of a social network once

a node disappears are proposed in [19], [25]. They consist

of possibly finding a substitute for the leaving node provided it

plays a role in the network and/or the information flow quality.

New links can then be established and any isolated node is

discarded. However, these studies do not deal with a group

disappearance that can happen in any social network.

III. JOAN AND JOAN-C ALGORITHMS

In this section we describe the overall features of JOAN

and JOAN-C algorithms proposed in [25]. The goal of these

algorithms is to: (i) avoid a degradation of the information flow

that could follow the disappearance of a node in the network,

and (ii) identify a substitute to the disappearing node whenever

it is important in terms of its centrality measures.

To this end, nodes are classified into two main classes:

• Critical nodes are those which play such a central

role that their disappearance can lead to an amount of

changes in the network (disconnected network, increase

of diameter/radius, ...). Based on the node centrality

measures, we identify three kinds of critical nodes,

namely, the leader, the mediator and the witness that

have respectively the highest degree, betweenness and

closeness centrality. Several nodes may have the same

highest centrality measure within a network.

• Non critical nodes play less important roles than those

described above. However, some of them may have high

(but not the highest) centrality measures, and therefore,

we distinguish two kinds of non critical nodes: finger and

follower. A finger is a node whose centrality measure

deviates slightly from the one of a critical node while

followers have the lowest centrality measures. For each

kind of critical node, we define its fingers and followers, and we may have k fingers within the non critical nodes.

As we pointed out earlier, JOAN and JOAN-C algorithms

exploit the class of a given node to both estimate the impact of

its disappearance on the information flow and conduct minimal

network changes to restore the information flow with a similar

quality as before the node disappearance. The information flow quality within a network S is defined in [25] as:

$\gamma_F(S) = \langle \lambda_F(S), \varepsilon_S(w) \rangle$,

where $\lambda_F(S)$ is the information flow degree which gives

the total number of nodes that can receive information

introduced into the network from the witness and $\varepsilon_S(w)$ is

the eccentricity of the witness in the network. The witness is

considered as the best entry to the network for putting new

information since it is the closest node to any other node in

the network, and hence, its eccentricity determines how long

it takes for a piece of information to reach any node of S.
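
The pair defining the information flow quality can be sketched as follows. This is an illustration under stated assumptions rather than the implementation of [25]: the network is a dictionary of neighbor sets, the witness is taken as the node with the highest closeness centrality (scaled so that unreachable nodes lower the score, ties broken by smallest label), the flow degree counts the nodes other than the witness that it can reach, and the eccentricity is measured over the reachable nodes only. All function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path length from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def witness(adj):
    """Node with the highest closeness centrality (assumed definition)."""
    def closeness(node):
        dist = bfs_distances(adj, node)
        total = sum(dist.values())
        if total == 0:          # isolated node
            return 0.0
        reach = len(dist) - 1   # nodes reached, excluding `node` itself
        return (reach / (len(adj) - 1)) * (reach / total)
    return max(sorted(adj), key=closeness)


def information_flow_quality(adj):
    """Return (flow degree, witness eccentricity) for the network."""
    w = witness(adj)
    dist = bfs_distances(adj, w)
    return len(dist) - 1, max(dist.values())


# Hypothetical path network 1-2-3-4-5, then the same network after
# node 4 has left without any repair (node 5 becomes isolated).
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
broken = {1: {2}, 2: {1, 3}, 3: {2}, 5: set()}

print(information_flow_quality(path))
print(information_flow_quality(broken))
```

Comparing the two pairs shows the kind of degradation the algorithms aim to avoid: after the unrepaired removal, fewer nodes can be reached from the witness.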

After a node disappearance, the first step is to check whether

the node is critical or not. If the node is non critical, then we

use JOAN algorithm. Otherwise, we rely on JOAN-C.

A. JOAN algorithm

When a non critical node Ni is leaving the network S and its associated links are deleted, we observe two cases:

either the remaining network S′ is connected or not. If S′

is connected, we do nothing because the information flow can

circulate through S′ without any breakdown. Otherwise, while

a fixed information flow quality is ensured, a link between

two neighbors Nj and Nk of Ni is added if there is no path

between them (after the node deletion) as follows:

• i) a direct link is added if Nj or Nk is a critical node.

• ii) an indirect link (between Nj and Nk) is added if

neither Nj nor Nk is a critical one. The indirect link

is set between Nj and Nk through another neighbor Ns

which has the shortest path to a critical node Nc.

In other words, links are only added to maintain the net-

work connected and the quality of information flow. JOAN

algorithm always adds links between nodes that were within

the same neighborhood before a node disappearance. The

motivation of doing so is to ensure that two nodes will never

be linked directly if they do not fulfill some conditions such

as having the same interests or being friends of a leaving

individual.
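
The detection step of this procedure, deciding which former neighbors of the leaving node have lost every path between them, can be sketched as follows. The sketch covers only that check and not the choice between direct and indirect links; it assumes a dictionary of neighbor sets, and the function names are illustrative.

```python
def reachable(adj, source):
    """Set of nodes reachable from `source` (depth-first traversal)."""
    seen, stack = {source}, [source]
    while stack:
        node = stack.pop()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def neighbor_groups_after_removal(adj, leaving):
    """Group the former neighbors of `leaving` by the connected
    component they fall into once `leaving` is deleted."""
    former_neighbors = set(adj[leaving])
    remaining = {n: nbrs - {leaving} for n, nbrs in adj.items() if n != leaving}
    groups, seen = [], set()
    for start in sorted(former_neighbors):
        if start in seen:
            continue
        component = reachable(remaining, start)
        seen |= component
        groups.append(sorted(component & former_neighbors))
    return groups


# Hypothetical networks: a path 1-2-3-4-5 and a cycle 1-2-3-4-1.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
cycle = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {3, 1}}

print(neighbor_groups_after_removal(path, 3))   # two groups: a link is needed
print(neighbor_groups_after_removal(cycle, 3))  # one group: nothing to add
```

A single group means the remaining network is still connected around the deleted node, so no link is added; several groups identify the neighbor pairs between which a link is required.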

B. JOAN-C algorithm

When a disappearing node Nc is critical (e.g., a leader), we

first find a substitute in its neighborhood since an information

relevant to one person is more likely to be pertinent to

individuals in his direct neighborhood than those outside his

circle. Three cases are considered:

• Nc has some fingers (at least one). In this case, the first

finger $F^1_{N_c}$ (i.e., the finger with the highest centrality

measure of the same kind as Nc) is chosen as the

replacing node.

• There is no finger to replace the leaving critical node.

Then, one of the remaining critical nodes is selected as

a substitute. The critical node with the shortest geodesic

distance to Nc is used as a substitute.

• There are no other critical nodes, which means that the

leaving node was the only critical node (e.g., the central

node of a star network). Then, the neighbor of Nc

with the highest centrality measure is selected as the

substitute.

For each of the three cases, any neighbor of Nc is linked

to the substitute if it is not already linked (directly

or indirectly) to it.
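
The three-way choice of a substitute can be sketched as below. This is an illustration under assumptions, not the code of [25]: the caller supplies the fingers of Nc, the set of critical nodes and a centrality score per node (node degree in the example), and ties are broken by the smallest node label. The function name `choose_substitute` is hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path length from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def choose_substitute(adj, nc, fingers, critical, centrality):
    """Pick the substitute of the critical node `nc` before it is deleted."""
    # Case 1: the finger with the highest centrality replaces nc.
    if fingers:
        return max(sorted(fingers), key=lambda n: centrality[n])
    # Case 2: otherwise, the other critical node closest to nc.
    others = sorted(set(critical) - {nc})
    if others:
        dist = bfs_distances(adj, nc)
        return min(others, key=lambda n: dist.get(n, float("inf")))
    # Case 3: otherwise, the neighbor of nc with the highest centrality.
    return max(sorted(adj[nc]), key=lambda n: centrality[n])


# Hypothetical star centered on node 0, with one extra tie between 1 and 2.
adj = {0: {1, 2, 3, 4}, 1: {0, 2}, 2: {0, 1}, 3: {0}, 4: {0}}
degree = {node: len(nbrs) for node, nbrs in adj.items()}

print(choose_substitute(adj, 0, fingers=[3, 2], critical={0}, centrality=degree))
print(choose_substitute(adj, 0, fingers=[], critical={0, 4}, centrality=degree))
print(choose_substitute(adj, 0, fingers=[], critical={0}, centrality=degree))
```

Once a substitute is chosen, each former neighbor of Nc that has no remaining path to it would receive a link, which keeps the number of additions small.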

The merit of these algorithms lies in the fact that the

insertion of new links is done in a parsimonious way while

maintaining the information flow quality as close as possible

to the one before the node disappearance. Moreover, the

experimental results show the effectiveness of such algorithms

(see [25] for more details).

IV. DISAPPEARANCE OF A GROUP OF NODES

This section describes how we handle the disappearance

of a group of nodes. The main idea is to rely on JOAN and


JOAN-C algorithms to deal with group disappearance. In other

words, since these algorithms were devised to deal with the

disappearance of one node at a time, we combine them with

some additional features in order to efficiently manage the

disappearance of a set of nodes. We recall that our goal is to

keep the network connected after the disappearance of

a group of nodes without deteriorating the information

flow quality.

The approach used to handle the disappearance of a group

of nodes depends tightly on the group configuration. Actually,

we distinguish three possible configurations of a group G:

scattered, contiguous and hybrid. G is considered

scattered if there is no direct link between any pair of

nodes in the group. However, if for every pair of nodes

(Ni, Nk) in G there exists a direct link or an indirect link between Ni

and Nk through only nodes inside G, then the group is called

contiguous. Finally, when both configurations coexist, the set

is called a hybrid one. In this section we describe our approach

for the three possible configurations of G and we assume

that the network is connected, unweighted and undirected, and

contains only one community.
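
The three configurations can be told apart by looking at the connected components of the subgraph induced by G. The sketch below assumes a dictionary of neighbor sets; treating a single-node group as contiguous is an assumption of this example, and the function names are illustrative.

```python
def induced_components(adj, group):
    """Connected components of the subgraph induced by `group`."""
    group = set(group)
    components, seen = [], set()
    for start in sorted(group):
        if start in seen:
            continue
        component, stack = {start}, [start]
        while stack:
            node = stack.pop()
            # Only follow links that stay inside the group.
            for nxt in adj[node] & group:
                if nxt not in component:
                    component.add(nxt)
                    stack.append(nxt)
        seen |= component
        components.append(component)
    return components


def classify_group(adj, group):
    """Return 'contiguous', 'scattered' or 'hybrid' for a leaving group."""
    components = induced_components(adj, group)
    if len(components) == 1:
        return "contiguous"
    if all(len(component) == 1 for component in components):
        return "scattered"
    return "hybrid"


# Hypothetical path network 1-2-3-4-5-6-7.
path = {i: {j for j in (i - 1, i + 1) if 1 <= j <= 7} for i in range(1, 8)}

print(classify_group(path, {1, 3, 5}))  # no two members are linked
print(classify_group(path, {2, 3, 4}))  # members form one connected block
print(classify_group(path, {2, 3, 6}))  # one block plus an isolated member
```

The components returned by `induced_components` are also the subgroups to process separately when the group turns out to be hybrid.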

A. Scattered group

In this case, each node Ni in G is far from the other ones in

such a way that its disappearance will not involve another node

Nk in the group. In other words, according to the procedures

described in Section III, we handle the disappearance of Ni

without using any Nk ∈ G as a substitute or linking Ni directly

to Nk. Therefore, for each node Ni ∈ G, we apply the JOAN

or JOAN-C algorithm based on the class of Ni to manage its

disappearance. The reason is as follows: if all of the nodes in

G disappear simultaneously, their deletions are independent

in such a way that we can manage them in a separate and

iterative manner. This iterative choice is highly dependent on

the fact that the network S has only one community and

trying to manage node deletion in a simultaneous way is quite

impossible. However, when S contains several communities,

we handle the problem in a parallel way and therefore, the

disappearance of a group of nodes can be managed in a

simultaneous way.

Figure 1 depicts a network where a group that contains three

nodes 6, 7 and 2 is leaving. Since there is no direct tie between

these nodes, the set is considered as scattered and thus, we

manage the disappearance of one node at a time as described

in Section III until all the set is processed. For instance, 6 and

7 are not critical based on the node classification described

earlier, thus, their disappearance is managed one after another

by using the JOAN algorithm. However, the disappearance of

node 2 is tackled by JOAN-C algorithm since it was a critical

node before the group disappearance. The neighbors of node 2 are now connected to node 8 which is the most critical node.

The result of applying iteratively JOAN and JOAN-C is

depicted in Figure 2 and one can see that the network is still

connected and the information flow quality remains the same

in terms of witness eccentricity value.

Fig. 1. A scattered leaving group.

Fig. 2. After the nodes in the group {2, 7, 6} leave the network.

B. Contiguous group

When nodes are contiguous, trying to handle the disappearance

one node at a time by using JOAN or JOAN-C

algorithms is not realistic. The reason is that the algorithms

use the neighborhood of a leaving node to update the network

and therefore any neighbor used for replacing a node can be

removed in the next steps. Precisely, the disappearance of a

node Ni may lead to linking two nodes that will leave afterwards

because they belong to the same leaving group as Ni, and

consequently, this naive network update is time-consuming

and inefficient. That is why the disappearance of a contiguous

group is handled in one shot. Moreover, the disappearance of

a set of nodes is always considered as a critical disappearance

even if some nodes of the leaving set are non critical ones.

The motivation is based on the fact that the role played by a

set of nodes can be as important as the one played by a single

critical node. Thus, whenever a group leaves the network, it

needs to be replaced in order to preserve the overall shape of

the network and ensure the information dissemination. Figure

3 shows a contiguous group G = {2, 4, 8}.

Fig. 3. A contiguous leaving group.

After the disappearance of such a contiguous group G, our

goal is to find a substitute group G′ and thus, to link any

neighbor of G with one node in G′ if ever they have not been

linked yet. To find the neighbors of G, we consider all the


neighbors of any node in G. Formally, let Γ(G) be the set of

neighbors of G, and Γ(Ni) be the set of neighbors of a node

Ni,

$\Gamma(G) = \Big( \bigcup_{N_i \in G} \Gamma(N_i) \Big) \setminus G$

Moreover, we define the total common neighbors ΓT (G) of a

group G as the set of nodes that belong to each neighborhood

of a node in G.

$\Gamma_T(G) = \bigcap_{N_i \in G} \Gamma(N_i)$

We define the partial common neighbors (ΓP (G)) of the

group as the union of the intersections of the neighbors of

any couple of nodes in G.

$\Gamma_P(G) = \bigcup_{N_i, N_k \in G,\ i \neq k} \big( \Gamma(N_i) \cap \Gamma(N_k) \big)$
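
The three neighbor sets translate almost literally into set operations. The sketch below assumes a dictionary of neighbor sets and a non-empty group. As an assumption of this example, members of G are removed from the total and partial common neighbors, since a leaving node cannot serve as a substitute. The function name `neighbor_sets` is illustrative.

```python
from itertools import combinations


def neighbor_sets(adj, group):
    """Return (all, total common, partial common) neighbors of `group`."""
    group = set(group)
    # Gamma(G): every outside node adjacent to at least one member.
    all_neighbors = set().union(*(adj[n] for n in group)) - group
    # Gamma_T(G): nodes adjacent to every member.
    total_common = set.intersection(*(set(adj[n]) for n in group)) - group
    # Gamma_P(G): nodes adjacent to at least two distinct members.
    partial_common = set()
    for a, b in combinations(sorted(group), 2):
        partial_common |= adj[a] & adj[b]
    partial_common -= group
    return all_neighbors, total_common, partial_common


# Hypothetical network with edges 1-2, 2-3, 1-4, 2-4, 3-5.
adj = {1: {2, 4}, 2: {1, 3, 4}, 3: {2, 5}, 4: {1, 2}, 5: {3}}

print(neighbor_sets(adj, {1, 2, 3}))
```

Here node 4 is shared by members 1 and 2 but not by member 3, so it is a partial common neighbor while the set of total common neighbors is empty.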

Total and partial common neighbors are defined in order

to know what are the nodes we have to consider first when

we look for a substitute. Therefore, once the neighbors of a

leaving group are determined, the following cases can occur:

• ΓT (G) ≠ ∅, which means that there is at least one

common neighbor for all nodes in G. In such a case,

ΓT (G) is used as a substitute group (i.e., G′ = ΓT (G)). Hence, we add direct links between any pair of nodes

in G′ if they are not yet directly or indirectly linked.

Furthermore, for any node Ni ∈ Γ(G) and Ni /∈ G′ we

add a link between Ni and only one of the nodes in the

substitute group G′.

• ΓT (G) = ∅ ∧ ΓP (G) ≠ ∅, which means that there is no

common neighbor to all nodes in the group while some

couples of nodes have some common neighbors. In this

case, we try to find a substitute group G′, which contains

the same number of nodes as G, by including some nodes

in ΓP (G). Since we can get several combinations if the

number of partial common neighbors is higher than the

number of nodes in G, then we choose the combination

that has the highest group degree centrality. Choosing

the combination with the highest degree centrality has

the advantage of selecting the most leading group. We get

the group with the highest group centrality by considering

first nodes with the highest centrality metrics. Once G′ is

defined, we use the same process as in the first case to link

nodes in G′ and to link the remaining nodes in Γ(G) to

G′. We note that even if a group of nodes with fewer nodes

than G′ may have a degree centrality higher than the one

of G, we choose G′ with the same number of nodes as G to avoid connecting all the elements in Γ(G) to a smaller

set. Moreover, choosing a substitute group with at least

the same cardinality as G has the advantage of limiting

the number of links to add for maintaining the network

connected after the disappearance of G. In fact, let us

define pNi(G) as the probability that Ni ∉ G is linked

to a group G that contains n nodes V1, V2, ..., Vn. Since

the group is contiguous, once Ni is linked to one of the nodes in G it becomes

linked (directly or indirectly) to every other node of G;

thus,

$p_{N_i}(G) = p_{N_i}\Big( \bigcup_{k=1}^{n} V_k \Big) \le \sum_{k=1}^{n} p_{N_i}(V_k)$

Suppose we have G′ ⊆ G′′, which means that n′ ≤ n′′, and assume n′′ = n and both G′ and G′′ can be used as

a substitute of G. Thus we obtain,

$\sum_{k=1}^{n''} p_{N_i}(V_k) \ge \sum_{k=1}^{n'} p_{N_i}(V_k) \Rightarrow p_{N_i}(G'') \ge p_{N_i}(G')$.

This inequality means that using G′′ as a substitute is

preferable to using G′ since it is more likely that

a node Nj ∈ Γ(G) is linked to G′′ than to G′. In other

words, G′′ requires fewer added links

than G′ since it is already linked to more of the nodes

in Γ(G) than G′ is. This assertion is confirmed

by our experimental results (see Section VI).

• ΓT (G) = ∅ ∧ ΓP (G) = ∅. This case is the extreme one

since even if the leaving group is contiguous, its nodes

do not even share a single neighbor. To find a substitute

group, G′, we gather a set of nodes from Γ(G) in order

to get a centrality value similar to or higher than the

degree centrality of G. To this end, we choose the same

number of nodes as in G and we consider first the nodes

with the highest centrality measures. Thus, we compute

the group degree centrality and compare it with the one

of G. If the degree centrality of G′ is less than the one for

G, we create a new group G′′ by adding a new node to

G′ and check its group degree once again and we repeat

this process until we get a group degree centrality quite

similar to the one of G. However, using a group G′ with

the same cardinality as G is enough if |Γ(G)| = |Γ(G′)|, since the group degree centrality of G′ will then be at least

that of G, i.e., $C_D^S(G) \le C_D^S(G')$. In fact, with N′ = N − |G| the size of the network after G leaves,

$C_D^S(G') = \frac{|\Gamma(G')|}{N' - |G'|} = \frac{|\Gamma(G')|}{(N - |G|) - |G'|}$.

So, if |Γ(G)| = |Γ(G′)| and |G′| = |G|, then $C_D^S(G') = \frac{|\Gamma(G)|}{(N - |G|) - |G|}$,

which means that $\frac{|\Gamma(G)|}{(N - |G|) - |G|} \ge \frac{|\Gamma(G)|}{N - |G|}$ and thus

$C_D^S(G) \le C_D^S(G')$.

For instance, let G be the group {2, 4, 8} in Figure 3,

which has a group degree centrality equal to 0.857.

After the disappearance of G, we observe that ΓT (G) = ∅ ∧ ΓP (G) = ∅. Thus, let us form a group G′ = {3, 6, 10} and compute its group degree centrality. We find that

the value is 1, which is higher than the group degree

of G. Therefore G′ is considered as the substitute and

the network looks like the one depicted in Figure 4 after

adding links between the nodes of G′ and links between G′

and the remaining nodes in the neighborhood of G.
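
The last case, where no common neighbor exists, can be sketched as a greedy construction. The following is an illustration under assumptions and not the authors' code: the network is a dictionary of neighbor sets, candidates from Γ(G) are ranked by their degree in the network left after G is removed, and the loop stops as soon as the group degree centrality of G′ reaches that of G, which is one possible reading of "quite similar". The function names are hypothetical.

```python
def group_degree_centrality(adj, group):
    """Normalized group degree: |Gamma(G)| / (N - |G|)."""
    group = set(group)
    non_group = len(adj) - len(group)
    if non_group == 0:
        return 0.0
    neighbors = set()
    for node in group:
        neighbors |= adj[node]
    return len(neighbors - group) / non_group


def greedy_substitute(adj, group):
    """Build a substitute group G' from the neighbors of the leaving group."""
    group = set(group)
    target = group_degree_centrality(adj, group)  # measured before removal
    candidates = set().union(*(adj[n] for n in group)) - group
    # Network that remains once the group and its links are deleted.
    remaining = {n: nbrs - group for n, nbrs in adj.items() if n not in group}
    # Highest remaining degree first, smallest label on ties.
    ranked = sorted(candidates, key=lambda n: (-len(remaining[n]), n))
    size = min(len(group), len(ranked))
    substitute = set(ranked[:size])
    # Grow G' one node at a time until it is at least as central as G.
    while size < len(ranked) and group_degree_centrality(remaining, substitute) < target:
        substitute.add(ranked[size])
        size += 1
    return substitute


# Hypothetical network with edges 1-2, 1-3, 2-4, 3-5, 3-7, 4-6, 4-8.
adj = {
    1: {2, 3}, 2: {1, 4}, 3: {1, 5, 7}, 4: {2, 6, 8},
    5: {3}, 6: {4}, 7: {3}, 8: {4},
}

print(greedy_substitute(adj, {1, 2}))
```

In this example nodes 1 and 2 share no neighbor, and the two candidates already match the size of G. The sketch stops at choosing G′; linking the members of G′ to each other and to the rest of Γ(G) is a separate step.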


Fig. 4. The substitute G′ and additional links after the departure of the group {2, 4, 8} in Figure 3

C. Hybrid group

This case includes both of the two cases described above

and happens when one part of the group is scattered and other

parts are contiguous. Figure 5 shows an example of a hybrid

group. Nodes 2 and 8 are contiguous while node 11 is isolated.

To handle this situation, we first identify the (scattered and

contiguous) subgroups, and for each subgroup we apply one

of the procedures presented earlier.

Fig. 5. A hybrid leaving group

V. GROUP DISAPPEARANCE WITH IDENTIFIED

COMMUNITIES

A network may contain communities that represent clusters

of nodes [21], [8], [11].

Our goal in this section is to adapt our approach in the

case of social networks with communities. To this end, we

use the method presented by [21] to first find communities

within a network. Then, a leaving group of nodes can be

localized either in one single community or its nodes are

spread over many communities. When the whole

leaving group belongs to the same community, we simply

apply the procedure described earlier. However, when the

nodes of the group are distributed over several communities,

we first identify such communities and the corresponding

subgroups. For each subgroup within a community, we apply

the procedure based on the nature of the group (e.g., scattered).

Figure 6 shows an example of a network with two communities.

The leaving group has six nodes spread over two

communities with a scattered subgroup and a contiguous

one. To handle the disappearance of this group we apply

our algorithm simultaneously in each community.
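
Splitting a leaving group into per-community subgroups is a simple grouping step. The sketch below assumes that community detection has already produced a mapping from each node to a community label; the mapping, the labels and the function name are hypothetical and are not taken from Figure 6.

```python
from collections import defaultdict


def split_by_community(group, community_of):
    """Map each community label to the members of `group` it contains."""
    subgroups = defaultdict(set)
    for node in group:
        subgroups[community_of[node]].add(node)
    return dict(subgroups)


# Hypothetical assignment of nine nodes to two communities.
community_of = {1: "A", 2: "A", 3: "A", 4: "A", 5: "B", 6: "B", 7: "B", 8: "B", 9: "B"}
leaving = {2, 3, 6, 9}

print(split_by_community(leaving, community_of))
```

Each resulting subgroup can then be classified and handled independently within its own community, which is what makes parallel processing possible.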

The goal behind such an approach is twofold. First, we want

to deal with the real-life situation in which the network

contains several communities. Second, using communities

allows us to manage the network evolution in a parallel way and

hence get better performance in large networks.

Fig. 6. A hybrid leaving group G = {8, 12, 14, 1, 15, 16} lying in two communities

VI. VALIDATION

In this section, we aim at assessing the performance of the

proposed solution and its feasibility through experimentation.

To this end, we first check if our approach minimizes the

number of added links after a group disappearance. Then, we

evaluate the performance in terms of execution time and added

links when group configurations vary.

A. Experimental Setup

The prototype is written in Python with the social network

library NetworkX [20]. The experiments were conducted on

both a single machine Intel Core i5 (8 GB of RAM and 3.20

GHz running under Linux Ubuntu) and a cloud infrastructure

PiCloud [22].

The data set used for our experiments is the Autonomous

Systems AS-733 data set from Stanford University [26]. This set

describes a graph of Internet routers with a communication

network model of who-talks-to-whom. This communication

model is well-suited to our context since it allows us to see

how information circulates from one node to another one. The

AS-733 network is undirected and unweighted and contains

6474 nodes and 13233 edges. It is organized into sub-graphs

which will allow us to later handle the problem of node

and group of nodes deletion in networks with well-identified

communities.
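As a rough illustration of how sparse this data set is, the sketch below reads an edge list and computes the density and average degree of an undirected graph. It is plain Python, not the prototype's code. It assumes the file is a text edge list with `#` comment lines and whitespace-separated node identifiers; the function names and the sample lines are hypothetical.

```python
def parse_edge_list(lines):
    """Read an undirected edge list, skipping comments and self-loops."""
    nodes, edges = set(), set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        u, v = line.split()[:2]
        if u == v:
            continue
        nodes.update((u, v))
        # frozenset makes (u, v) and (v, u) the same undirected edge.
        edges.add(frozenset((u, v)))
    return nodes, edges


def density(n, m):
    """Density of an undirected simple graph with n nodes and m edges."""
    return 2.0 * m / (n * (n - 1))


def average_degree(n, m):
    """Average number of neighbors per node."""
    return 2.0 * m / n


# Hypothetical sample in the assumed format (not taken from AS-733).
sample = ["# FromNodeId\tToNodeId", "1\t2", "2\t1", "2\t3", "3\t4"]
nodes, edges = parse_edge_list(sample)

# Figures reported above for AS-733: 6474 nodes and 13233 edges.
as733_density = density(6474, 13233)
as733_avg_degree = average_degree(6474, 13233)
```

With the reported figures, the density is about 0.063% and a node has about 4 neighbors on average, so the neighborhood of a leaving node is usually small.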

B. Cost of network updates

In this experiment, we evaluate the cost of managing a group disappearance in terms of the links that must be added to keep the network connected. To this end, we vary the size of the leaving group from 10% to 40% of the total number of nodes in the network. We argue that varying the size up to 40% is enough to assess the performance of our approach and to cope with most real-life situations. The leaving group is chosen randomly and corresponds most of the time to a hybrid group. Figure 7 depicts the average number of added links per deleted node as the size of the leaving group increases, and shows that this average is very low. For example, with a disappearance of 1294 nodes (i.e., 20% of the size of the network), we need to add at most 0.15 links per deleted node, which is about 190 links for the whole leaving group. This is mainly because we add links only if the network becomes disconnected; moreover, if the group is large, the probability that the remaining nodes are linked to the group is so high that we do not need to add new links. This result confirms our assertion in Section IV.
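The figures quoted above can be checked with a few lines of arithmetic, and the remark that links are added only when the network becomes disconnected can be illustrated on a toy graph. The sketch below is a simplification and not our algorithm: it only counts the connected components left after a removal and uses the fact that c components need at least c - 1 new links to be joined, ignoring the information flow quality. All function names and the toy graph are hypothetical.

```python
def remove_group(adj, group):
    """Return the adjacency map of the network without the leaving group."""
    group = set(group)
    return {n: nbrs - group for n, nbrs in adj.items() if n not in group}


def connected_components(adj):
    """List the connected components using an iterative depth-first search."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        components.append(component)
    return components


def min_links_to_reconnect(adj, group):
    """Lower bound on added links: one fewer than the number of components."""
    remaining = remove_group(adj, group)
    return max(len(connected_components(remaining)) - 1, 0)


# Arithmetic behind the example: 20% of 6474 nodes, 0.15 links per node.
leaving_size = int(0.20 * 6474)
total_links = round(0.15 * leaving_size)

# Hypothetical toy graph: hub 0 tied to nodes 1..4, plus the edge 1-2.
toy = {0: {1, 2, 3, 4}, 1: {0, 2}, 2: {0, 1}, 3: {0}, 4: {0}}
links_if_hub_leaves = min_links_to_reconnect(toy, {0})
links_if_leaves_leave = min_links_to_reconnect(toy, {3, 4})
```

Removing the hub splits the toy graph into three pieces and requires two links, whereas removing the two peripheral nodes requires none.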

Fig. 7. Average added links per node according to the group size (represented as a proportion of the network size).

C. Impact of using the communities

In this experiment, we assess the performance of managing a group disappearance from a network that contains communities. To this end, we use the same data sets as before and run the experiments on a cloud computing infrastructure. We first find the communities and assign each of them to a single machine; the leaving group is formed of subgroups spread over the communities. We measure the average response time for managing a group disappearance when the size of the group varies. To better highlight the impact of using communities, Figure 8 depicts the time required to update the network structure with one community and with many communities. We observe that the average response time when the leaving group is distributed over several communities (dashed curve) is far lower than when the network forms only one community. In the latter case, the network is stored and managed on a single machine. In the presence of communities, however, we manage the disappearance of a set of nodes simultaneously on different machines and hence, the overall response time decreases significantly. Moreover, we observe that the response time in both situations varies only slightly as the group size increases. This is because the complexity of our procedure does not depend on the group size but rather on the neighborhood size of the group.
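The structure of this experiment can be sketched as follows. The sketch is only an illustration: a local thread pool stands in for the separate machines, and `handle_subgroup` is a hypothetical placeholder for the per-community procedure that merely collects the neighborhood of the subgroup, since this neighborhood is what drives the cost. The community contents are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def handle_subgroup(community_adj, subgroup):
    """Hypothetical placeholder for the per-community procedure.

    Returns the neighbors of the leaving subgroup that stay in the community.
    """
    subgroup = set(subgroup)
    neighborhood = set()
    for node in subgroup:
        neighborhood |= community_adj.get(node, set())
    return neighborhood - subgroup


def handle_all(communities, subgroups, max_workers=None):
    """Process every affected community concurrently, one task per community."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            name: pool.submit(handle_subgroup, communities[name], sub)
            for name, sub in subgroups.items()
        }
        return {name: future.result() for name, future in futures.items()}


# Hypothetical communities, each stored as its own adjacency map.
communities = {
    "A": {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}},
    "B": {5: {6}, 6: {5, 7}, 7: {6}},
}
subgroups = {"A": {1, 2}, "B": {6}}
neighborhoods = handle_all(communities, subgroups)
```

Each task only touches its own community, which is why the work can be distributed without coordination between tasks. Threads in CPython do not speed up CPU-bound work, so this shows the decomposition rather than the measured gain.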

Fig. 8. Response time with or without communities according to group size (represented as a proportion of the network size).

D. Impact of the network density

The goal of this experiment is to evaluate the performance of our approach in two cases: a sparse graph and a dense one, with density values of 0.6% and 80%, respectively. To this end, we consider graphs of 3000 nodes and three configurations: sparse graphs with a leaving hybrid group (HSG), sparse graphs with a leaving contiguous group (CSG) and finally dense graphs with a leaving contiguous group (CDG). We do not consider the case of dense graphs with a hybrid leaving group because such a case is very rare. Indeed, since the graph is dense, almost every node is tied to every other node and therefore, any leaving group is likely to be a contiguous group. We vary the leaving group size from 10% to 40% of the initial number of nodes and we measure the time required to update the network after the disappearance of a group. The results shown in Figure 9 reveal that the response time is less than one second for a hybrid leaving group from a sparse graph (HSG curve). This is mainly because most of the nodes in the group are isolated owing to the sparsity of the graph; hence, the number of links to add and the number of neighbors to consider for finding a substitute are small, which leads to a rapid network update. Moreover, this result shows again that the complexity of our algorithm depends on the size of the neighborhood: since the number of neighbors of a given node is small in a sparse graph, the time to update the network remains low.

However, when the leaving group is contiguous, the neighborhood is larger, and the time needed to determine whether the group has common or partial neighbors increases the response time even if the graph is sparse (CSG). This effect is even more pronounced with a dense graph, where a node (or group of nodes) has a large neighborhood. That is why the response time for a contiguous group in a dense graph (CDG) is very high compared to the one in a sparse graph.
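The link between density and neighborhood size can be made concrete with a short computation. The sketch below assumes the usual definition of density for an undirected simple graph, d = 2m / (n(n - 1)), which gives an average degree of d(n - 1); the function names are hypothetical.

```python
def max_edges(n):
    """Number of edges of a complete undirected graph on n nodes."""
    return n * (n - 1) // 2


def edges_for_density(n, d):
    """Number of edges of an n-node graph with density d (rounded)."""
    return round(d * max_edges(n))


def average_degree(n, d):
    """Average number of neighbors per node for density d."""
    return d * (n - 1)


n = 3000
sparse_edges = edges_for_density(n, 0.006)
dense_edges = edges_for_density(n, 0.80)
sparse_degree = average_degree(n, 0.006)
dense_degree = average_degree(n, 0.80)
```

Under this assumption, a node of the 3000-node sparse graph has about 18 neighbors on average, against about 2399 in the dense graph, which is consistent with the much larger neighborhoods that have to be examined in the CDG configuration.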

Fig. 9. Response time with dense or sparse graph vs. group size

VII. CONCLUSION

The goal of this paper is to propose a method to deal with the disappearance of an actor set within a social network. We rely on our previous work that exploits the role played by a given node to manage node disappearance while maintaining the information flow quality. Three cases are identified for a leaving actor set: scattered, contiguous or hybrid. When the group is scattered, the JOAN and JOAN-C algorithms proposed in [25] are used, since the nodes are not directly linked and their disappearance can be managed independently in an iterative way. However, when the group is contiguous, our method tries to find a substitute for the leaving group. To this end, we consider all common neighbors of the leaving group and find a group with the same number of nodes as the leaving one. If there are no common neighbors, we gather a set of nodes from the neighbors of the leaving group in order to get a centrality degree value close to or higher than that of the leaving group. The same number of nodes as in the leaving group is chosen, and the nodes with the highest centrality measures are considered first. In the case of a hybrid group, we dissociate the contiguous part from the scattered part and handle each of them with the appropriate algorithm. Our solution minimizes the number of added links while keeping the network connected after the disappearance of a set of nodes. Two situations are also considered: the presence of only one community and the existence of many communities. We also implemented and validated our approach, with promising results.
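The substitute selection summarized above for a contiguous group can be sketched as follows. This is a simplified reading and not the implementation used in the prototype. It assumes that centrality is measured by the number of neighbors a candidate keeps once the group has left, that ties are broken by node identifier, and that fewer candidates are returned when not enough are available. The function name and the toy graphs are hypothetical.

```python
def find_substitute(adj, group):
    """Pick up to len(group) substitute nodes for a leaving contiguous group."""
    group = set(group)
    if not group:
        return []
    # Neighbors of each member that are outside the leaving group.
    external = [adj[node] - group for node in group]
    common = set.intersection(*external)
    # Prefer common neighbors; otherwise fall back to any neighbor of the group.
    candidates = common if common else set.union(*external)

    def degree_after(node):
        # Assumed centrality: neighbors left once the group has disappeared.
        return len(adj[node] - group)

    ranked = sorted(candidates, key=lambda node: (-degree_after(node), node))
    return ranked[:len(group)]


# Hypothetical graph in which the group {1, 2} has common neighbors 3 and 4.
with_common = {1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2, 5, 6},
               4: {1, 2, 5}, 5: {3, 4}, 6: {3}}
substitute_common = find_substitute(with_common, {1, 2})

# Hypothetical graph in which the group {1, 2} has no common neighbor.
without_common = {1: {2, 3}, 2: {1, 4, 5}, 3: {1, 6, 7}, 4: {2, 6},
                  5: {2}, 6: {3, 4}, 7: {3}}
substitute_partial = find_substitute(without_common, {1, 2})
```

In both toy graphs the two best-connected candidates are kept, so the substitute has the same number of nodes as the leaving group.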

Since we are concerned with network connectivity, a more proactive approach that we plan to explore is to notify and recommend ways to strengthen the weak links.

ACKNOWLEDGMENT

Idrissa Sarr is grateful to the “Programme canadien de bourses de la Francophonie” of the Canadian International Development Agency for the postdoctoral fellowship, while Rokia Missaoui acknowledges the financial support of the Natural Sciences and Engineering Research Council of Canada (NSERC).

REFERENCES

[1] C. C. Aggarwal, A. Khan, and X. Yan. On flow authority discovery in social networks. In SDM, pages 522–533, 2011.

[2] R. Ahlswede, C. Ning, S.-Y. Li, and R. Yeung. Network information flow. Information Theory, IEEE Transactions on, 46(4):1204–1216, Jul. 2000.

[3] L. Backstrom and J. Leskovec. Supervised random walks: predicting and recommending links in social networks. In WSDM, pages 635–644, 2011.

[4] E. Ben-Naim and P. L. Krapivsky. Addition-deletion networks. Journal of Physics A: Mathematical and Theoretical, 40(30):8607, 2007.

[5] P. J. Carrington. Models and methods in social network analysis. Structural analysis in the social sciences. Cambridge Univ. Press, 2005.

[6] M. D. Choudhury, Y.-R. Lin, H. Sundaram, K. S. Candan, L. Xie, and A. Kelliher. How does the data sampling strategy impact the discovery of information diffusion in social media? In ICWSM, 2010.

[7] M. G. Everett and S. P. Borgatti. Extending centrality. In P. J. Carrington, editor, Models and methods in social network analysis, pages 57–76. Cambridge University Press, 2005.

[8] S. Fortunato. Community detection in graphs. Physics Reports, 486(3-5):75–174, February 2010.

[9] N. E. Friedkin. Information flow through strong and weak ties in intraorganizational social networks. Social Networks, 3(4):273–285, 1982.

[10] E. Gilbert and K. Karahalios. Predicting tie strength with social media. In Proceedings of the 27th international conference on Human factors in computing systems, CHI ’09, pages 211–220, 2009.

[11] M. Girvan and M. E. J. Newman. Community structure in social and biological networks. Proceedings of the National Academy of Sciences of the United States of America, 99(12):7821–7826, 2002.

[12] F. Halim, R. H. C. Yap, and Y. Wu. A MapReduce-based maximum-flow algorithm for large small-world network graphs. In ICDCS, pages 192–202, 2011.

[13] R. Hanneman and M. Riddle. Introduction to social network methods. http://faculty.ucr.edu/~hanneman/nettext/, 2005.

[14] D. M. A. Hussain and Z. Ahmed. Dynamical adaptation in terrorist cells/networks. In SCSS (2), pages 557–562, 2008.

[15] J. M. Kumpula, J. P. Onnela, J. Saramäki, K. Kaski, and J. Kertész. Emergence of communities in weighted networks. Physical Review Letters, 99(22), 2007.

[16] D. Knoke and S. Yang. Social Network Analysis. Sage Publications, second edition, 2008.

[17] J. Leskovec, D. Huttenlocher, and J. Kleinberg. Predicting positive and negative links in online social networks. In Proceedings of the 19th international conference on World wide web, WWW ’10, pages 641–650, 2010.

[18] D. Liben-Nowell and J. Kleinberg. The link prediction problem for social networks. In CIKM ’03: Proceedings of the twelfth international conference on Information and knowledge management, pages 556–559. ACM, 2003.

[19] E. Negre, R. Missaoui, and J. Vaillancourt. Predicting a social network structure once a node is deleted. In ASONAM, pages 297–304, 2011.

[20] NetworkX library. http://networkx.lanl.gov/, 2011.

[21] M. E. J. Newman. Fast algorithm for detecting community structure in networks. Phys. Rev. E, 69(6):066133, Jun 2004.

[22] PiCloud. http://www.picloud.com/, 2012.

[23] R. Toivonen, L. Kovanen, M. Kivelä, J. P. Onnela, J. Saramäki, and K. Kaski. A comparative study of social network models: Network evolution models and nodal attribute models. Social Networks, 31(4):240–254, October 2009.

[24] M. G. Rodriguez, J. Leskovec, and A. Krause. Inferring networks of diffusion and influence. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’10, pages 1019–1028, 2010.

[25] I. Sarr and R. Missaoui. Managing Node Disappearance Based on Information Flow in Social Networks. Social Network Analysis and Mining Journal (SNAM), 2012. http://www.springerlink.com/openurl.asp?genre=article&id=doi:10.1007/s13278-012-0071-y.

[26] Stanford dataset. http://snap.stanford.edu/data/as.html, 2009.

[27] T. Tylenda, R. Angelova, and S. Bedathur. Towards time-aware link prediction in evolving social networks. In The 3rd SNA-KDD Workshop ’09 (SNA-KDD’09), June 2009.

[28] S. Wasserman and K. Faust. Social network analysis: methods and applications. Structural analysis in the social sciences. Cambridge University Press, 1st edition, 1994.

[29] F. Wu, B. A. Huberman, L. A. Adamic, and J. R. Tyler. Information flow in social groups. Physica A: Statistical Mechanics and its Applications, 337(1-2):327–335, 2004.

[30] J. Yang and J. Leskovec. Modeling information diffusion in implicit networks. In ICDM, pages 599–608, 2010.
