IEEE 2013 1st International Conference on Emerging Trends and Applications in Computer Science
ICETACS 2013
978-1-4673-5250-5/13/$31.00 ©2013 IEEE
An Empirical Study of Community and Sub-Community Detection in Social Networks Applying
Newman-Girvan Algorithm
Deepjyoti Choudhury
Department of Information Technology
Assam University
Silchar-788011, India
Saprativa Bhattacharjee
Department of Information Technology
Assam University
Silchar-788011, India
Anirban Das
Department of Information Technology
Assam University
Silchar-788011, India
Abstract— A social network can be represented as a set of people in which each member is connected to one or more other members of the same set. Analysing a social network yields visual and mathematical models of human relationships. Social networks have several inherent properties, such as power-law degree distributions, centrality, the small-world property and modularity. Community structure is another important property of social networks and has become a popular topic of current research. As online social network services such as Facebook, Google+, MySpace and Twitter grow in popularity, their community structure is also becoming more complex. The Newman-Girvan algorithm is a widely used community detection algorithm for social networks. This paper examines the structure of communities as well as sub-communities in social networks by applying the Newman-Girvan algorithm. We have implemented this community detection algorithm and applied it to real-world networks, and we introduce a new concept for detecting sub-communities in such networks. The paper is mainly focused on an empirical study of the Newman-Girvan algorithm.
Keywords— node, graph, community, algorithms, clustering
I. INTRODUCTION
Community structure, or clustering, is one of the most important features of networks that represent real-world systems. Communities in a social network can be defined as groups of nodes with many edges among nodes of the same cluster and fewer edges joining nodes of different clusters. Community detection has been an important issue in sociology, biology, computer science and many other disciplines, all of which commonly represent networks as graphs. Real networks are not random graphs: they display large inhomogeneities, revealing a high level of order and organization. The degree distribution of social networks generally follows a power law. Furthermore, the distribution of edges is non-homogeneous not only globally but also locally, with high concentrations of edges within special groups of nodes and low concentrations between these groups. This feature of real networks is called community structure, or clustering.
The term "Community" first appeared in the book "Gemeinschaft und Gesellschaft", published in 1887. To date, there is no unique, widely accepted definition of a community in social networks.
Figure 1: A simple graph with three communities having twelve nodes.
Figure 1 contains three communities: the nodes within each community are densely interconnected with each other and only sparsely connected to nodes belonging to other communities. In a social network community, nodes are connected on the basis of human relationships such as friendship or being colleagues.
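As a minimal sketch of this definition, assuming an undirected graph stored as an edge list and one community label per node, the Java class below counts intra-community and inter-community edges. The 12-node graph is hypothetical: three 4-node cliques joined by three bridging edges, chosen only to mimic the shape of Figure 1, not its actual edges.

```java
import java.util.ArrayList;
import java.util.List;

/** Counts intra- and inter-community edges for a given node partition. */
public class CommunityEdgeCount {

    /** Returns {intra, inter} for an undirected edge list and one community label per node. */
    public static int[] countEdges(int[][] edges, int[] community) {
        int intra = 0, inter = 0;
        for (int[] e : edges) {
            if (community[e[0]] == community[e[1]]) intra++;
            else inter++;
        }
        return new int[] {intra, inter};
    }

    /** Hypothetical 12-node graph: three 4-node cliques joined by three bridging edges. */
    public static int[][] exampleEdges() {
        List<int[]> list = new ArrayList<>();
        for (int c = 0; c < 3; c++) {
            int base = 4 * c;
            for (int i = 0; i < 4; i++)
                for (int j = i + 1; j < 4; j++)
                    list.add(new int[] {base + i, base + j});
        }
        list.add(new int[] {3, 4});   // bridge between communities 0 and 1
        list.add(new int[] {7, 8});   // bridge between communities 1 and 2
        list.add(new int[] {11, 0});  // bridge between communities 2 and 0
        return list.toArray(new int[0][]);
    }

    /** Nodes 0-3, 4-7 and 8-11 form communities 0, 1 and 2. */
    public static int[] exampleCommunities() {
        int[] labels = new int[12];
        for (int v = 0; v < 12; v++) labels[v] = v / 4;
        return labels;
    }

    public static void main(String[] args) {
        int[] counts = countEdges(exampleEdges(), exampleCommunities());
        System.out.println("intra=" + counts[0] + ", inter=" + counts[1]); // intra=18, inter=3
    }
}
```

Eighteen of the 21 edges fall inside communities, which is the dense-inside, sparse-between pattern the definition describes.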
II. RELATED WORK
In computer science, communities can be regarded as sub-graphs of a network. The whole network is represented as a graph in which several sub-graphs may reside. Connections among the nodes within a sub-graph are dense, whereas connections among nodes belonging to different sub-graphs are comparatively sparse. Newman termed these sub-graphs community structure [1]. This definition emphasises the structural characteristics of a community, in which links or edges within communities are denser than links between communities; this can be measured by the modularity of the division [2]. Several existing community detection algorithms are unable to detect overlapping communities in a social network [3]. Overlapping community detection involves both a definition of community and an evaluation metric, and the evaluation metric is used in particular to analyse and compare existing overlapping community detection algorithms, together with their basic ideas [3]. M. Girvan and M. E. J. Newman [4] proposed a community structure detection algorithm for social and biological networks. Social groupings in a social network are represented as communities.
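Modularity, introduced by Newman and Girvan [2], is one common way to turn this intuition into a number. For a division into communities c, with m edges in total, e_c edges inside community c and d_c the sum of the degrees of the nodes in c, Q = Σ_c [ e_c/m − (d_c/2m)² ]. The sketch below assumes an unweighted, undirected edge list; the toy graph (three 4-node cliques joined in a ring by single edges) is hypothetical, and for it Q = 3 × (6/21 − 1/9) = 11/21 ≈ 0.524.

```java
import java.util.ArrayList;
import java.util.List;

/** Newman-Girvan modularity Q for a partition of an undirected, unweighted graph. */
public class Modularity {

    public static double modularity(int[][] edges, int[] community, int numCommunities) {
        double m = edges.length;
        double[] inside = new double[numCommunities];    // e_c: edges with both ends in c
        double[] degreeSum = new double[numCommunities]; // d_c: total degree of nodes in c
        for (int[] e : edges) {
            int cu = community[e[0]], cv = community[e[1]];
            degreeSum[cu] += 1;   // each edge adds one to the degree of both endpoints
            degreeSum[cv] += 1;
            if (cu == cv) inside[cu] += 1;
        }
        double q = 0;
        for (int c = 0; c < numCommunities; c++) {
            double share = degreeSum[c] / (2 * m);       // fraction of all edge ends attached to c
            q += inside[c] / m - share * share;
        }
        return q;
    }

    /** Hypothetical graph: cliques {0..3}, {4..7}, {8..11} joined by bridges 3-4, 7-8, 11-0. */
    public static int[][] ringOfCliques() {
        List<int[]> list = new ArrayList<>();
        for (int base = 0; base < 12; base += 4)
            for (int i = 0; i < 4; i++)
                for (int j = i + 1; j < 4; j++)
                    list.add(new int[] {base + i, base + j});
        list.add(new int[] {3, 4});
        list.add(new int[] {7, 8});
        list.add(new int[] {11, 0});
        return list.toArray(new int[0][]);
    }

    public static void main(String[] args) {
        int[] labels = new int[12];
        for (int v = 0; v < 12; v++) labels[v] = v / 4;
        double q = modularity(ringOfCliques(), labels, 3);
        System.out.printf("Q = %.4f%n", q); // Q = 0.5238
    }
}
```

Putting every node in one community gives Q = 0, so higher values indicate divisions with more internal edges than a random placement of edge ends would produce.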
Dynamic graphs are generally multi-graphs, and community detection in dynamic networks [5] is a challenging task. In a dynamic graph, links between a pair of nodes can appear or disappear at different time points. Mobility is used as a network transport mechanism for distributing data in many networks. GuoDong Kang et al. proposed two new mobility models in 2011 [6], known as the Social Community Partner Mobility Model (SCP) and the Social Community Leader Mobility Model (SCL).
The minimum-cut method is one of the oldest algorithms for dividing a network into parts. It is used in load balancing for parallel computing in order to minimize communication between processor nodes. Because it divides a network into a number of parts fixed in advance, usually of roughly equal size, it is less than ideal for finding community structure in general networks [1].
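To illustrate why a fixed-size division can miss communities, the sketch below brute-forces the minimum cut over all splits of a small hypothetical graph into two equal halves. The graph is assumed for illustration and is not taken from [1]: a 5-node clique and a 3-node clique joined by one edge. The natural two-community division cuts a single edge, but any 4/4 split must break the 5-clique and so cuts at least 4 edges. Exhaustive search is only feasible for very small graphs.

```java
import java.util.ArrayList;
import java.util.List;

/** Compares a balanced minimum cut with the natural community division (toy example). */
public class BalancedCut {

    /** Number of edges whose endpoints lie on different sides. */
    public static int cutSize(int[][] edges, boolean[] side) {
        int cut = 0;
        for (int[] e : edges) if (side[e[0]] != side[e[1]]) cut++;
        return cut;
    }

    /** Brute-force minimum cut over all splits into two equal halves (n even, n small). */
    public static int minBalancedCut(int n, int[][] edges) {
        int best = Integer.MAX_VALUE;
        for (int mask = 0; mask < (1 << n); mask++) {
            if (Integer.bitCount(mask) != n / 2) continue; // keep only equal-size splits
            boolean[] side = new boolean[n];
            for (int v = 0; v < n; v++) side[v] = (mask >> v & 1) == 1;
            best = Math.min(best, cutSize(edges, side));
        }
        return best;
    }

    /** Hypothetical graph: clique {0..4}, clique {5,6,7}, and the single edge 4-5. */
    public static int[][] example() {
        List<int[]> list = new ArrayList<>();
        for (int i = 0; i < 5; i++) for (int j = i + 1; j < 5; j++) list.add(new int[] {i, j});
        for (int i = 5; i < 8; i++) for (int j = i + 1; j < 8; j++) list.add(new int[] {i, j});
        list.add(new int[] {4, 5});
        return list.toArray(new int[0][]);
    }

    public static void main(String[] args) {
        int[][] edges = example();
        boolean[] natural = new boolean[8];
        for (int v = 5; v < 8; v++) natural[v] = true; // {0..4} versus {5,6,7}
        System.out.println("min balanced cut = " + minBalancedCut(8, edges));   // 4
        System.out.println("natural community cut = " + cutSize(edges, natural)); // 1
    }
}
```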
In the simulation environment, the SCP model [6] represents the office, gymnasium and restaurant as small squares in the simulation area. This introduces the concept of a community destination: when the community moves from the office to the gymnasium, the gymnasium is called the community destination, and in the simulation the destination is the square chosen to correspond to the gymnasium. When the community then moves from the gymnasium to the restaurant, a new square corresponding to the restaurant is chosen as the new community destination. In the Partner Movement case, the members of a community also have their own individual destinations within the gymnasium or restaurant.
Jie Jin et al. [7] proposed a new center-based method designed especially for weighted networks. Owing to its low computational complexity, the method is also suitable for large-scale networks. They demonstrated it on a synthetic network and two real-world networks. Most known community detection techniques use only information about linkage behaviour [8] for community prediction and clustering. Some recent work has shown that node content can help to improve the quality of the detected communities. Moreover, edge content [9] provides a number of unique distinguishing characteristics of communities that cannot be modelled by node content.
III. METHODOLOGY
A. Newman-Girvan algorithm
Hierarchical methods have several shortcomings with respect to detecting communities in a social network. To overcome them, Newman and Girvan presented their community detection algorithm for social networks in 2002 [4]. They introduced a concept popularly known as "edge betweenness" to detect communities in large and complex networks. Instead of measuring which edges are most central to communities, the algorithm focuses on the edges that are least central to communities, that is, the edges that are most "between" communities. The edge betweenness score of an edge is the number of shortest paths between pairs of nodes that run along it; when a pair of nodes is joined by several shortest paths, each path is given equal weight so that the total weight for the pair is one. The edge with the highest edge betweenness score is removed, and repeating this eventually splits the network into the first two communities. If more than one link connects the two communities, these edges are removed one after another in order of highest edge betweenness score. Edges continue to be removed in this way until only single, isolated nodes remain. The procedure of the Newman-Girvan algorithm is stated below:
1) Calculate the betweenness score for all edges in the network.
2) Remove the edge with the highest edge betweenness score.
3) Recalculate the betweenness scores of all edges remaining in the network.
4) Repeat from step 2 until all edges have been removed and only single nodes remain.
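The steps above can be sketched in Java as follows. This is a minimal illustration, not the authors' code, assuming an unweighted, undirected graph stored as adjacency sets. Edge betweenness is accumulated with Brandes' shortest-path counting algorithm (one efficient way to compute the score, not necessarily the one used in the paper), and the loop stops once the graph has split into a requested number of connected components instead of running until every edge is gone. The toy graph, two triangles joined by the bridge 2-3, is hypothetical.

```java
import java.util.*;

/** Minimal Newman-Girvan sketch for an unweighted, undirected graph. */
public class GirvanNewman {

    /** Canonical key for the undirected edge {u, v}. */
    public static long key(int u, int v) {
        return ((long) Math.min(u, v) << 32) | Math.max(u, v);
    }

    public static List<Set<Integer>> build(int n, int[][] edges) {
        List<Set<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new HashSet<>());
        for (int[] e : edges) { adj.get(e[0]).add(e[1]); adj.get(e[1]).add(e[0]); }
        return adj;
    }

    /** Steps 1 and 3: betweenness of every edge, splitting credit equally over shortest paths. */
    public static Map<Long, Double> edgeBetweenness(List<Set<Integer>> adj) {
        int n = adj.size();
        Map<Long, Double> score = new HashMap<>();
        for (int u = 0; u < n; u++)
            for (int v : adj.get(u)) if (u < v) score.put(key(u, v), 0.0);
        for (int s = 0; s < n; s++) {
            // BFS from s: distances, shortest-path counts (sigma) and predecessors.
            int[] dist = new int[n];
            Arrays.fill(dist, -1);
            double[] sigma = new double[n];
            List<List<Integer>> preds = new ArrayList<>();
            for (int i = 0; i < n; i++) preds.add(new ArrayList<>());
            Deque<Integer> order = new ArrayDeque<>();
            Deque<Integer> queue = new ArrayDeque<>();
            dist[s] = 0;
            sigma[s] = 1;
            queue.add(s);
            while (!queue.isEmpty()) {
                int v = queue.poll();
                order.push(v);
                for (int w : adj.get(v)) {
                    if (dist[w] < 0) { dist[w] = dist[v] + 1; queue.add(w); }
                    if (dist[w] == dist[v] + 1) { sigma[w] += sigma[v]; preds.get(w).add(v); }
                }
            }
            // Farthest nodes first: pass each node's credit back along its shortest-path edges.
            double[] delta = new double[n];
            while (!order.isEmpty()) {
                int w = order.pop();
                for (int v : preds.get(w)) {
                    double credit = sigma[v] / sigma[w] * (1 + delta[w]);
                    score.merge(key(v, w), credit, Double::sum);
                    delta[v] += credit;
                }
            }
        }
        score.replaceAll((k, x) -> x / 2); // each pair was counted from both endpoints
        return score;
    }

    public static int countComponents(List<Set<Integer>> adj) {
        int n = adj.size(), count = 0;
        boolean[] seen = new boolean[n];
        Deque<Integer> todo = new ArrayDeque<>();
        for (int s = 0; s < n; s++) {
            if (seen[s]) continue;
            count++;
            seen[s] = true;
            todo.push(s);
            while (!todo.isEmpty())
                for (int w : adj.get(todo.pop()))
                    if (!seen[w]) { seen[w] = true; todo.push(w); }
        }
        return count;
    }

    /** Steps 2-4: remove top-betweenness edges until there are at least `target` components. */
    public static int splitInto(List<Set<Integer>> adj, int target) {
        int removed = 0;
        while (countComponents(adj) < target) {
            Map<Long, Double> score = edgeBetweenness(adj); // recalculated after every removal
            if (score.isEmpty()) break;                     // no edges left to remove
            long best = Collections.max(score.entrySet(), Map.Entry.comparingByValue()).getKey();
            int u = (int) (best >>> 32), v = (int) (best & 0xffffffffL);
            adj.get(u).remove(v);
            adj.get(v).remove(u);
            removed++;
        }
        return removed;
    }

    public static void main(String[] args) {
        // Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the bridge 2-3.
        int[][] edges = {{0, 1}, {0, 2}, {1, 2}, {2, 3}, {3, 4}, {3, 5}, {4, 5}};
        List<Set<Integer>> adj = build(6, edges);
        System.out.println("bridge betweenness = " + edgeBetweenness(adj).get(key(2, 3))); // 9.0
        int removed = splitInto(adj, 2);
        System.out.println(removed + " edge(s) removed, " + countComponents(adj) + " components");
    }
}
```

The bridge lies on the shortest paths of all 3 × 3 cross pairs, so its score of 9 is the highest and its removal alone yields the two triangles. Recomputing every score after each removal makes the full procedure expensive on large graphs, which is worth keeping in mind for the datasets below.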
Newman and Girvan defined the community detection procedure for a network with this algorithm. However, the steps they defined give only a dendrogram of the network, so following them up to the first split yields only two major communities. In this paper, we define a new concept of detecting "sub-communities" in a network by applying the Newman-Girvan algorithm: sub-communities are detected within the two main communities. We define a sub-community as a group of two or more nodes that are densely connected to each other. The number of nodes contained in a sub-community depends upon a threshold value given by the user.
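How the threshold is applied is not spelled out above, so the following is only one possible reading, stated as an assumption: after some Newman-Girvan edge removals, each connected component of the remaining graph with at least `threshold` nodes (and never fewer than two) is reported as a sub-community, and smaller fragments are left unassigned. The sketch groups nodes with a union-find structure; the method name `subCommunities` and the 9-node edge list are hypothetical.

```java
import java.util.*;

/** Hypothetical sub-community filter: components of the pruned graph with >= threshold nodes. */
public class SubCommunities {

    /** Union-find root lookup with path halving. */
    static int find(int[] parent, int x) {
        while (parent[x] != x) { parent[x] = parent[parent[x]]; x = parent[x]; }
        return x;
    }

    public static List<List<Integer>> subCommunities(int n, int[][] remainingEdges, int threshold) {
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
        for (int[] e : remainingEdges) {
            int a = find(parent, e[0]), b = find(parent, e[1]);
            parent[a] = b; // merge the two components
        }
        Map<Integer, List<Integer>> groups = new TreeMap<>();
        for (int v = 0; v < n; v++)
            groups.computeIfAbsent(find(parent, v), r -> new ArrayList<>()).add(v);
        int minSize = Math.max(2, threshold); // a sub-community has two or more nodes
        List<List<Integer>> result = new ArrayList<>();
        for (List<Integer> g : groups.values()) if (g.size() >= minSize) result.add(g);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical edges left after some Newman-Girvan removals on a 9-node graph.
        int[][] remaining = {{0, 1}, {1, 2}, {3, 4}, {5, 6}, {6, 7}, {7, 5}};
        System.out.println(subCommunities(9, remaining, 3)); // [[0, 1, 2], [5, 6, 7]]
    }
}
```

Lowering the threshold to 2 would also report {3, 4}, while node 8 is never reported because a single node cannot form a sub-community.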
B. Data Set
We have tested the Newman-Girvan algorithm on three real-world networks: the Zachary Karate Club, the College Football Network and the Bottlenose Dolphin Network. A brief description of the three datasets is given below:
1) Zachary Karate Club: Zachary [10] compiled this network by studying the friendships among 34 members of a karate club over a period of two years. During that period, the club split into two groups of almost equal size because of a disagreement. The original division of the club into two communities is shown in the results below, where we also identify the sub-communities of the Zachary Karate Club within the two main communities.
2) American College Football Network: The American College Football Network [12] is a real-world network of 115 teams. Each edge represents a regular-season football game between the two teams it connects. The teams are divided into conferences, and teams play games within their own conference more frequently than against teams from other conferences. Twelve conferences, or communities, are defined in the network.
3) Bottlenose Dolphin Network: Lusseau et al. [11] studied the behaviour of dolphins and compiled the Bottlenose Dolphin Network in 2003. It consists of 62 bottlenose dolphins living in Doubtful Sound, New Zealand, and two dolphins are linked if they were observed in statistically frequent association. The network divides into two large groups and contains 159 relations, or edges.
C. Experimental Set-up
All programs are coded in Java. Execution and testing were done on a machine with a 3.10 GHz Intel® Core™ i5 processor and 4 GB of memory.
IV. RESULTS AND DISCUSSIONS
A. Zachary Karate Club Network
Figure 2: Zachary Karate Club is divided into two communities.
Discussion: Most papers report that the Zachary Karate Club divides into two communities: one consisting of the Administrators of the club and the other of the Instructors. A disagreement arose between the two communities, and the Instructors left the club and founded a new one.
Figure 3: Zachary Karate Club is divided into five sub-communities.
Discussion: Five sub-communities are detected in the Zachary Karate Club, and all of them are derived from the two major communities.
B. College Football Network
Figure 4: College Football Network is divided into twelve sub-communities.
Discussion: The College Football Network is a real-world network of games played in the USA. Each node represents a team, and an edge between two nodes represents a game between those teams. The network divides into the twelve sub-communities shown in Figure 4.
Figure 5: College Football Network is divided into two major communities.
Discussion: Most papers divide the College Football Network into twelve communities. Here, we show that the network also contains two major communities, and all the remaining groups can be considered sub-communities derived from these two communities.
C. Bottlenose Dolphins Network
Figure 6: Bottlenose Dolphins Network is divided into two communities.
Discussion: This network consists of 62 bottlenose dolphins, each of which is related to one or more other dolphins, as shown in Figure 6. There are two major communities in the Bottlenose Dolphins Network.
Figure 7: Bottlenose Dolphins Network is divided into five sub-communities.
Discussion: Figure 7 shows the new result on the Bottlenose Dolphins Network: five sub-communities, all derived from the two major communities.
V. CONCLUSIONS & FUTURE WORK
In this paper, we have presented an empirical study of the Newman-Girvan algorithm on various datasets. Our results differ from those presented earlier in that we have defined a new concept of sub-communities. The main drawback of the Newman-Girvan algorithm is the absence of a clear specification of what constitutes a community.
Much is left to individual interpretation, and the problem increases many-fold for unsupervised datasets, where the user has to identify the major communities manually from the dendrogram.
As future work, we hope to apply multi-objective functions to detect stable communities in social networks.
REFERENCES
[1] M. E. J. Newman, "Detecting Community Structure in Networks", Eur. Phys. J. B, vol. 38, pp. 321-330, 2004.
[2] M. E. J. Newman and M. Girvan, "Finding and evaluating community structure in networks", Physical Review E, vol. 69, no. 2, 2004.
[3] L. Zhubing, W. Jian, and L. Yuzhou, "An Overview on Overlapping Community Detection", The 7th International Conference on Computer Science & Education (ICCSE 2012), Melbourne, Australia, July 14-17, 2012.
[4] M. Girvan and M. E. J. Newman, "Community Structure in Social and Biological Networks", Proceedings of the National Academy of Sciences, vol. 99, no. 12, pp. 7821-7826, June 2002.
[5] L. C. Huang, T. J. Yen, and S. C. T. Chou, "Community Detection in Dynamic Social Networks", International Conference on Advances in Social Networks Analysis and Mining (ASONAM), July 2011, pp. 110-117.
[6] G. Kang, M. Diaz, T. Perennou, P. Senac, and L. Xu, "Mobility Model Based on Social Community Detection Scheme", Cross Strait Quad-Regional Radio Science and Wireless Technology Conference, 2011.
[7] J. Jin, L. Pan, C. Wang, and J. Xie, "A Center-based Community Detection Method in Weighted Networks", 23rd IEEE International Conference on Tools with Artificial Intelligence, 2011.
[8] A. Clauset, M. E. J. Newman, and C. Moore, "Finding community structure in very large networks", Phys. Rev. E, vol. 70, 066111, 2004.
[9] G. J. Qi, C. C. Aggarwal, and T. Huang, "Community Detection with Edge Content in Social Media Networks", IEEE 28th International Conference on Data Engineering, 2012.
[10] W. W. Zachary, "An information flow model for conflict and fission in small groups", Journal of Anthropological Research, vol. 33, pp. 452-473, 1977.
[11] D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, and S. M. Dawson, Behavioral Ecology and Sociobiology, vol. 54, pp. 396-405, 2003.
[12] M. Girvan and M. E. J. Newman, Proc. Natl. Acad. Sci. USA, vol. 99, pp. 7821-7826, 2002.