[ieee 2010 international conference on advances in social networks analysis and mining (asonam 2010)...

Detecting Communities in Massive Networks based on Local Community Attractive Force Optimization

Qi Ye
School of Computer Science
Beijing University of Posts and Telecommunications
Beijing, China, 100876
Email: [email protected]

Bin Wu
School of Computer Science
Beijing University of Posts and Telecommunications
Beijing, China, 100876
Email: [email protected]

Yuan Gao and Bai Wang
School of Computer Science
Beijing University of Posts and Telecommunications
Beijing, China, 100876
Email: [email protected], [email protected]

Abstract—Community detection has attracted great interest in the analysis of real-world networks. However, the high computational cost of most community detection algorithms limits their application. In this paper, we propose a heuristic algorithm that extracts the community structure of large networks based on local community attractive force optimization; its time complexity is near linear and its space complexity is linear. The effectiveness of our algorithm is demonstrated by extensive experiments on many computer-generated graphs and publicly available real-world graphs. The results show that our algorithm is extremely fast and makes it easy to explore massive networks interactively.

I. INTRODUCTION

As many complex systems can be described as networks and graphs [1], [2], [3], massive data sets of real-world networks are accumulating at a tremendous pace in various fields, including telecommunications, blogs, groups of web users, online social networks, instant-messaging services, etc. Researchers are increasingly interested in addressing the wide range of challenges posed by massive real-world networks, and detecting communities in such networks is a major one. In this paper, we focus on detecting communities in massive networks based on the local community attractive force proposed by Hu et al. [4]. To overcome the high time complexity of the algorithm proposed by Hu et al. [4], which is $O(|V|^2)$, we propose a novel community extraction algorithm based on theirs. Our algorithm is extremely fast and has low memory requirements: its time complexity is near linear and its space complexity is linear. To evaluate the effectiveness of our community detection algorithm on real-world networks, we test it on several public real-world network data sets.

This article is structured as follows. In Section II, we present previous related work. In Section III, we present our community detection algorithm based on local community attractive force optimization. In Section IV, to verify the validity and utility of our algorithm, we run detailed experiments on many public networks. Section V discusses our experiences and concludes this paper.

II. RELATED WORK

Communities are important structures in many real-world networks. Conventionally, a community can be loosely defined as a subset of nodes in which there are more edges between nodes within the set than to nodes outside it. Many community detection algorithms have been proposed, such as divisive clustering [5], [6], modularity optimization [7], [8], spectral bisection optimization [9], label propagation [10], [11], etc. Newman and Girvan [12] propose the widely adopted concept of modularity to measure the quality of a community partition, and many algorithms are based on global modularity optimization [7], [8]. Clauset, Newman and Moore [8] (CNM) propose an agglomerative hierarchical clustering algorithm based on modularity optimization that incorporates several sophisticated data structures; its expected time complexity is $O(|V| \log^2 |V|)$ on sparse graphs. Blondel et al. [13] introduce another greedy agglomerative clustering algorithm (BGLL) for the general case of weighted graphs, based on local modularity optimization. However, modularity is not a scale-invariant measure, and modularity optimization algorithms may fail to identify communities smaller than a certain scale [14]. Hu et al. [4] propose a community detection algorithm that employs an attractive-force-based self-organizing process; however, its time complexity is $O(|V|^2)$. Raghavan et al. [11] design a simple and fast method based on label propagation whose time complexity is believed to be linear. Leung et al. [10] study the dynamics of that algorithm and propose several methods to avoid the formation of giant communities in large graphs.

III. COMMUNITY DEFINITIONS AND DETECTION

A. Symbols and Definitions

An undirected graph $G$ is a triple consisting of a vertex set $V$, an edge set $E$, and a relation that associates two vertices with each edge. Each graph $G$ can be represented mathematically by an adjacency matrix $A$ with elements $A_{i,j} = 1$ if there is an edge from node $i$ to node $j$ and $A_{i,j} = 0$ otherwise, so the degree of node $i$ can be defined as $k_i = \sum_j A_{i,j}$. Suppose the node set of graph $G$ is partitioned into $m$ communities $c_1, c_2, \cdots, c_m$, and let $C = \{c_1, c_2, \cdots, c_m\}$ be the community set that contains all the communities. Let $AdjCom(i)$ be the set of communities adjacent to node $i$. Suppose $c$ is a community to which node $i$ belongs; then we can split the total degree of node $i$ into two parts: $k_i = k_i^{in}(c) + k_i^{out}(c)$, where $k_i^{in}(c)$ is the number of edges connecting node $i$ to community $c$, and $k_i^{out}(c)$ is the number of edges connecting node $i$ to the rest of the graph.

2010 International Conference on Advances in Social Networks Analysis and Mining
978-0-7695-4138-9/10 $26.00 © 2010 IEEE
DOI 10.1109/ASONAM.2010.32
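As a small illustration of this notation, the following Python sketch computes $k_i^{in}(c)$ and $k_i^{out}(c)$ from an adjacency-list graph. The function name `split_degree`, the input format and the toy graph are illustrative assumptions and are not taken from the original implementation.

```python
def split_degree(adj, community, i):
    """Return (k_in, k_out) for node i with respect to one community.

    adj       -- dict mapping each node to the set of its neighbors
    community -- set of nodes forming community c (hypothetical input format)
    """
    k_in = sum(1 for j in adj[i] if j in community)
    k_out = len(adj[i]) - k_in  # k_i = k_in + k_out
    return k_in, k_out


# Toy undirected graph: a triangle {0, 1, 2} with a pendant edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
k_in, k_out = split_degree(adj, {0, 1, 2}, 2)
print(k_in, k_out)
```

Node 2 has two neighbors inside the triangle and one outside it, so its degree of 3 is split between the two parts.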

B. Community Definition

Comparative definitions of communities are based on link comparison [4]. Radicchi et al. [6] propose comparative definitions of strong and weak communities: the strong community definition imposes a condition on each node in a community, whereas the weak community definition treats all the nodes in the community as a whole. Inspired by these definitions, Hu et al. [4] propose alternative definitions of strong and weak communities and give a corresponding detection algorithm. We call these new definitions the weaker strong community definition and the most weak community definition.

C. Linear Complexity Algorithm

1) Algorithm: Following the community definitions, Hu et al. [4] define the attractive force $F_{i,c}$ of community $c$ on node $i$, which can be formulated as follows:

$$F_{i,c} = \sum_{j \in c} A_{i,j}. \tag{1}$$

As shown in Eq. 1, the community attractive force $F_{i,c}$ depends only on the number of neighbors of $i$ in community $c$. In the algorithm proposed by Hu et al. [4], whose time complexity is $O(|V|^2)$, each node is moved into the community or communities with the largest attractive force. To obtain non-overlapping communities, we find that this process can be modified into a near-linear one by moving each node only into the single community with the largest attractive force. We also note that the definition of community attractive force in the algorithm of Hu et al. [4] is very similar to the label selection strategy of the label propagation algorithm proposed by Raghavan et al. [11], and that it differs from the label score selection strategy proposed by Leung et al. [10]. Our algorithm follows the framework proposed by Hu et al. [4]; our improved version is summarized as follows:

1) Initially, we set each node together with a randomly chosen half of its neighbors to be a community, provided they do not yet belong to any community.

2) For each node $i$, remove it from its community and calculate the force $F_{i,c}$ between node $i$ and every adjacent community $c \in AdjCom(i)$.

3) If the largest new community force is larger than the force of its original community, move node $i$ into the new community; otherwise keep node $i$ in the original community.

4) Repeat steps 2 and 3 until all the communities are stable or a fixed, sufficiently large number of steps $N$ is reached.

5) Identify the community $c(i)$ of each node $i$. For each edge $e_{i,j}$, if $c(i) \neq c(j)$, mark $e_{i,j}$ as unlinked. Then take all the connected components of the graph as the communities.

Fig. 1. The formation of unconnected communities during the process of community detection: (a) Partition 1, (b) Partition 2, (c) Partition 3.
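Steps 1 to 4 of the list above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' Java implementation: the names `initial_partition` and `detect_communities` are hypothetical, step 1 is read as "a random half of the still unassigned neighbors", nodes are swept in dictionary order, ties among equally attractive communities are broken arbitrarily, and the graph is assumed to have no self-loops. Step 5 (splitting into connected components) is not included here.

```python
import random
from collections import Counter


def initial_partition(adj, rng):
    """Step 1: seed communities from unassigned nodes and half of their free neighbors."""
    label = {}
    for i in adj:
        if i in label:
            continue
        label[i] = i  # node i founds a community named after itself
        free = sorted(j for j in adj[i] if j not in label)
        for j in rng.sample(free, len(free) // 2):
            label[j] = i
    return label


def detect_communities(adj, max_sweeps=30, seed=0):
    """Steps 2-4: move nodes until no move strictly increases the attractive force."""
    rng = random.Random(seed)
    label = initial_partition(adj, rng)
    for _ in range(max_sweeps):  # N in step 4
        moved = False
        for i in adj:
            # Eq. (1): F_{i,c} is the number of neighbors of i labeled c.
            forces = Counter(label[j] for j in adj[i])
            if not forces:
                continue  # isolated node, nothing to compare
            best, best_force = max(forces.items(), key=lambda kv: kv[1])
            if best_force > forces[label[i]]:  # strict improvement only
                label[i] = best
                moved = True
        if not moved:
            break
    return label


# Two triangles joined by the bridge edge 2-3 (toy input, not from the paper).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(detect_communities(adj))
```

Because a node moves only when its attractive force strictly increases, every move raises the number of intra-community edges, so on a finite graph the sweeps in this sketch must stop after finitely many moves.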

2) Community Separation and Oscillation: Two cases may hinder community detection. First, a community may become unconnected if we use only local community attractive force optimization, which may also occur in the algorithm proposed by Hu et al. [4]. Second, communities may oscillate during the community detection process in our non-overlapping version.

a) Community Separation: A required property of a community is connectedness, that is, there must be a path between each pair of nodes in the community. In Fig. 1, we use a toy network with 3 obvious communities to show how unconnected communities form during the community detection process. As shown in Fig. 1(a), two communities are linked to node 9, and node 9 experiences equal community force from both of them. In Fig. 1(b), node 9 moves from community 2 to community 1. However, node 9 is a bridge in community 2, and its departure causes community 2 to become unconnected. In this case, it is more reasonable to regard community 2 as two separate small communities, as shown in Fig. 1(c). Therefore, at step 5 of our algorithm, we mark all the edges between different communities as unlinked and take all the connected components as the final communities, just as Raghavan et al. [11] do in their algorithm.
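The component-splitting step described here can be sketched with a breadth-first search that ignores every edge whose endpoints carry different community labels. The function name `split_into_components` and the toy path graph are illustrative assumptions, not part of the original tool.

```python
from collections import deque


def split_into_components(adj, label):
    """Step 5: ignore inter-community edges and return the connected components."""
    seen = set()
    communities = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                # An edge (u, v) whose endpoints differ in label counts as unlinked.
                if v not in seen and label[v] == label[u]:
                    seen.add(v)
                    component.add(v)
                    queue.append(v)
        communities.append(component)
    return communities


# Path 0-1-2 in which the middle node has left the community of its neighbors,
# so the remaining members 0 and 2 are no longer connected to each other.
adj = {0: {1}, 1: {0, 2}, 2: {1}}
label = {0: "a", 1: "b", 2: "a"}
print(split_into_components(adj, label))
```

In the toy input, nodes 0 and 2 share a label but are connected only through node 1, so they end up in separate final communities, mirroring the situation of node 9 in Fig. 1.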

b) Community Oscillation: Fig. 2 shows a case of community oscillation during the community detection process. As shown in Fig. 2, node 10 experiences equal community force from the 3 communities, so during each iteration it may be randomly claimed by any of the neighboring communities. This leads to community oscillation and prevents the algorithm from reaching convergence. To prevent this, we move a node from its old community to a new one only when the community force of the new community is strictly larger than that of the old one.
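The strict-improvement rule can be isolated in a few lines of Python. The helper `next_community` and its dictionary input are hypothetical and only illustrate how ties are handled; they are not the authors' code.

```python
def next_community(forces, current):
    """Return the community a node belongs to after one update.

    forces  -- dict mapping adjacent community id -> attractive force F_{i,c}
    current -- id of the community the node currently belongs to
    """
    best = max(forces, key=forces.get)
    # Move only on a strict improvement; a tie keeps the node where it is.
    if forces[best] > forces.get(current, 0):
        return best
    return current


# A node pulled equally by three communities stays in its current one.
print(next_community({1: 2, 2: 2, 3: 2}, current=2))
```

With equal forces from communities 1, 2 and 3, the node never leaves community 2, so repeated sweeps cannot bounce it back and forth.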

Fig. 2. The oscillation during the community detection process: (a) Partition 1, (b) Partition 2, (c) Partition 3.

3) Data Structure and Complexity Analysis: In this part, we discuss the data structures used in our algorithm. Note that at steps 2 and 3 we need to find out which community each node belongs to, and we must be able to remove a node from and add a node to each community quickly. As the algorithm is non-overlapping and each node belongs to exactly one community, we store the node-community relations in a hash table. Furthermore, to speed up the algorithm, we also keep the communities themselves as hash tables. Under reasonable assumptions, the expected time to search for, add or remove an element in a hash table is $O(1)$ [15].
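A possible layout of these two hash tables is sketched below in Python, whose `dict` and `set` types are hash-based. The class name `Partition`, its attribute names and the singleton initialization are assumptions made for illustration; the original Java data structures are not described in more detail in the text.

```python
class Partition:
    """Node-to-community and community-to-members maps backed by hash tables."""

    def __init__(self, nodes):
        # Start from singletons; the seeding in step 1 of the algorithm differs.
        self.community_of = {i: i for i in nodes}
        self.members = {i: {i} for i in nodes}

    def move(self, node, target):
        """Move node into community target in expected O(1) time."""
        source = self.community_of[node]
        if source == target:
            return
        self.members[source].discard(node)
        if not self.members[source]:
            del self.members[source]  # drop communities that became empty
        self.members.setdefault(target, set()).add(node)
        self.community_of[node] = target


p = Partition(range(4))
p.move(1, 0)
p.move(2, 0)
print(p.community_of, p.members)
```

Keeping both directions of the relation means that looking up a node's community and updating a community's member set each take a constant expected number of hash operations.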

The time complexity of our algorithm is $O((I+1)(|V|+|E|))$, where $|V|$ is the number of nodes, $|E|$ is the number of edges, and $I$ is the number of iterations. Each iteration of steps 2 and 3 visits every node and its incident edges once, and step 5 costs $O(|V|+|E|)$ time to traverse the graph by breadth-first search on an adjacency-list representation. The space complexity of our algorithm is $O(2|V|+|E|)$: storing the sparse graph $G$ in an adjacency-list representation requires $O(|V|+|E|)$ memory, and maintaining the communities in hash tables requires $O(|V|)$ space. Compared with the original algorithm proposed by Hu et al. [4], whose time complexity is about $O(|V|^2)$, our algorithm is much faster: its time complexity is near linear and its space complexity is linear.
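To give a rough sense of scale, the following back-of-the-envelope estimate plugs the Wiki-Talk figures reported in Table I ($|V| = 2{,}394{,}385$, $|E| = 4{,}659{,}565$) and the $I = 15$ iterations reported in Section IV-D into both bounds. Constant factors are ignored, so the numbers are only indicative operation counts, not running times.

$$
(I+1)(|V|+|E|) = 16 \times 7{,}053{,}950 = 112{,}863{,}200 \approx 1.1 \times 10^{8},
\qquad
|V|^2 \approx 5.7 \times 10^{12}.
$$

Under these assumptions the two bounds differ by more than four orders of magnitude on this network.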

IV. ALGORITHM COMPARISON AND EXPERIMENTS

Our algorithm is implemented in the network analysis framework JSNVA [16], [17]. Taking a graph mining and visual analytics approach, and building on the JSNVA framework [17], we develop a tool called TeleComVis [16] in the Java programming language to analyze the structure of massive graphs. All the community detection algorithms in Table I are integrated into this tool, and all of them are implemented as single-threaded Java programs on the Java platform: Java 6.0, Java HotSpot Server VM with a 1.3G heap size. The experiments are performed on an ordinary PC (CPU = Intel Core2 Duo 2.66GHz, L2 Cache = 3072kB, RAM = 3G) running the Windows XP operating system. In the following experiments we set the fixed iteration constant $N = 30$.

A. Partition Goodness Measure

Note that since the core of our algorithm is stochastic, different runs could in principle yield different partitions. We have performed 10 runs of the algorithm on each network and chosen the partition with the best modularity $Q$, just as Hu et al. [4] did in their experiments.

Fig. 3. Performance of community detection algorithms (GN, CNM, Ours) on the benchmark graphs, measured by mutual information: (a) Lancichinetti benchmark, plotted against the mixing parameter $\mu$; (b) Girvan & Newman benchmark, plotted against the external degree $k_{out}$. In the Lancichinetti model graphs, all graphs contain 1000 nodes, and the exponents $\gamma$ and $\beta$ of the degree distribution and the community size distribution are 2 and 1, respectively. In the Girvan and Newman benchmark graphs, each graph contains 128 nodes divided into 4 equal communities.
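The selection of the best run can be sketched as follows, using the standard Newman-Girvan modularity [12] for an undirected, unweighted graph, $Q = \sum_c \left[ l_c/|E| - (d_c/2|E|)^2 \right]$, where $l_c$ is the number of edges inside community $c$ and $d_c$ is the total degree of its nodes. The function names, the input format and the toy graph are assumptions for illustration, and the graph is assumed to contain at least one edge.

```python
def modularity(adj, label):
    """Newman-Girvan modularity Q of a partition of an undirected, unweighted graph."""
    two_m = sum(len(nbrs) for nbrs in adj.values())  # 2|E|
    intra = {}   # community -> edge endpoints inside it (each intra edge counted twice)
    degree = {}  # community -> total degree of its nodes
    for i, nbrs in adj.items():
        c = label[i]
        degree[c] = degree.get(c, 0) + len(nbrs)
        intra[c] = intra.get(c, 0) + sum(1 for j in nbrs if label[j] == c)
    return sum(intra[c] / two_m - (degree[c] / two_m) ** 2 for c in degree)


def best_partition(adj, partitions):
    """Return the candidate partition with the highest modularity."""
    return max(partitions, key=lambda p: modularity(adj, p))


# Two triangles joined by the bridge edge 2-3 (toy input, not from the paper).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {i: "a" for i in adj}
print(modularity(adj, split), modularity(adj, merged))
```

In the experiments described here, the list passed to `best_partition` would hold the partitions produced by the 10 independent runs.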

B. Computer-Generated Graphs

In this part we test the community detection algorithms on the benchmark graphs proposed by Girvan and Newman [5] and Lancichinetti et al. [18], for which the built-in communities are already known. In this way we can find out whether the algorithms reliably detect the known structures under different benchmark models. In the Lancichinetti model graphs, all graphs contain 1000 nodes, and the exponents $\gamma$ and $\beta$ of the degree distribution and the community size distribution are 2 and 1, respectively. Fig. 3(a) and Fig. 3(b) show the widely used normalized mutual information similarity [19] between the partitions found by the different community detection algorithms and the built-in communities in the Lancichinetti benchmark graphs and the Girvan and Newman benchmark graphs, respectively. As shown in Fig. 3(a), on the heterogeneous benchmark graphs proposed by Lancichinetti et al. [18], our algorithm performs better than CNM in most cases. However, the quality of our algorithm drops quickly as the external community degrees of the nodes grow in these benchmark models. We believe this is because the local community force is based not on community density but on the number of links. We also run our algorithm on the linked-clique benchmark graphs proposed by Fortunato and Barthelemy [14], and it finds all the clique communities separately for all clique sizes. As our algorithm is based on a local community definition and is unrelated to modularity optimization, the communities it finds are free of the well-known resolution limit problem caused by modularity optimization.
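For readers who want to reproduce this kind of comparison, the normalized mutual information of [19] between a found partition and the built-in one can be computed from the confusion matrix of the two labelings. The sketch below is an illustration under stated assumptions: the function name and the dictionary input format are hypothetical, both partitions are assumed to cover the same node set, and returning 1.0 when both partitions consist of a single community is a convention chosen here.

```python
from collections import Counter
from math import log


def normalized_mutual_information(a, b):
    """NMI between two partitions given as dicts mapping node -> community label."""
    n = len(a)
    joint = Counter((a[i], b[i]) for i in a)  # confusion matrix entries N_xy
    size_a = Counter(a.values())              # row sums
    size_b = Counter(b.values())              # column sums
    mutual = sum(n_xy * log(n_xy * n / (size_a[x] * size_b[y]))
                 for (x, y), n_xy in joint.items())
    entropy = (-sum(s * log(s / n) for s in size_a.values())
               - sum(s * log(s / n) for s in size_b.values()))
    if entropy == 0:
        return 1.0  # both partitions are a single community (convention)
    return 2 * mutual / entropy


truth = {0: "x", 1: "x", 2: "y", 3: "y"}
found = {0: "q", 1: "q", 2: "p", 3: "p"}      # same grouping, different names
unrelated = {0: "p", 1: "q", 2: "p", 3: "q"}  # statistically independent grouping
print(normalized_mutual_information(truth, found),
      normalized_mutual_information(truth, unrelated))
```

The measure depends only on how nodes are grouped, not on the label names, which is why a relabeled copy of the built-in partition scores the maximum value.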

C. Typical Social Networks

In this section, we present a number of experiments in which our algorithm is applied. As first examples of the algorithm's performance, we apply it to several small social networks. After that, we use our algorithm to explore several massive real-world networks. As Hu et al. [4] do not report the partition modularity scores for these networks, we do not compare our results with theirs.

Fig. 4. The communities extracted by our algorithm in (a) Zachary’s karate club network and (b) the USA college football network.

1) Zachary’s karate club: The first network is the famous “Zachary karate club” network of the friendships between 34 members of a karate club [5]. As shown in Fig. 4(a), we get a partition of 4 communities, including 4 weak communities and 1 strong one. The modularity of the communities found by our algorithm is $Q = 0.416$, while the modularity of the GN partition is $Q = 0.401$. The highest modularity $Q$ found by Raghavan et al. [11] is 0.399. As shown in Fig. 4(a), the two communities of the administrator and the teacher are each separated into 2 smaller communities. We can also see that node 10 is incorrectly partitioned. We believe this is because node 10 experiences equal community force from the communities of the administrator and the teacher.

2) College football network: We also apply our algorithm to the college football network [5]. As shown in Fig. 4(b), we divide the football network into 12 communities with a high degree of success. There are 8 strong communities and 10 weak communities. All the communities satisfy the weaker strong community definition and the most weak community definition. The modularity $Q$ of the partition found by our algorithm is 0.578. The GN algorithm finds 10 communities whose modularity is $Q = 0.599$, and the CNM algorithm finds 7 communities whose modularity is $Q = 0.577$. On this network, our algorithm performs remarkably well compared with the results obtained by the GN and CNM algorithms. The highest modularity $Q$ found by Raghavan et al. [11] is 0.476. We believe that our initial partition condition and the oscillation-preventing condition make our algorithm perform better.

D. Algorithm Performance

To verify the validity and utility of our algorithm, we run detailed experiments on many public network data sets. The networks are provided by Newman¹, Arenas² and Leskovec³. These networks are shown in Table I.

¹ http://www-personal.umich.edu/~mejn/netdata/
² http://deim.urv.cat/~aarenas/data/welcome.htm
³ http://snap.stanford.edu

Although the optimization of modularity may fail to identify modules smaller than a certain scale, we still regard modularity $Q$ as a good a posteriori metric of the goodness of a graph partition. To compare our algorithm with general modularity optimization algorithms, we also run the experiments with the GN algorithm and the CNM algorithm. Table I shows the performance of GN, CNM and our algorithm for community detection in networks of various sizes. In Table I, for each algorithm we display the modularity $Q$ of the communities, the number of communities $|C|$, the size of the largest community $M$ and the computation time $t$ (seconds). We use the subscripts g, c and f to denote the metrics obtained by GN, CNM and our algorithm, respectively. Our algorithm is clearly faster and gives acceptable modularity. Although the community partition criterion of the GN and CNM algorithms is based on modularity optimization, our algorithm still achieves higher modularity on some networks, such as the Karate club network, Dolphin network, Pol-books network, Astro-ph network, etc. We find that our community extraction algorithm is extremely fast on massive networks: we obtain the communities in the Wiki-Talk network, with more than 2 million nodes and 4 million edges, in less than 5 minutes. An interesting phenomenon is that the largest community found by our algorithm is smaller than those found by GN and CNM in most cases. We also find that the algorithm converges very quickly: we obtain the communities in the Wiki-Talk network in just 15 iterations.

To check whether almost all of the communities found by our algorithm satisfy the weaker strong community definition and the most weak community definition, we calculate the proportion $WS$ of weaker strong communities and the proportion $MW$ of most weak communities among all found communities. Again, the subscripts g, c and f denote the metrics obtained by GN, CNM and our algorithm, respectively. As shown in Table II, most of the communities extracted by these 3 algorithms meet the weaker strong community definition, and almost all of them meet the most weak community definition. Considering that the communities found by our algorithm are smaller than those found by the GN and CNM algorithms, we regard our algorithm as performing better with respect to the community definitions proposed by Hu et al. [4].

V. CONCLUSIONS

The massive size of real-world networks makes the time complexity of community detection algorithms an essential issue. Inspired by the algorithm proposed by Hu et al. [4], whose time complexity is $O(|V|^2)$, we propose a novel community detection algorithm based on local community force optimization. Our algorithm is very fast and easy to implement. To evaluate its effectiveness, we conduct many experiments. The results show that our algorithm is very efficient and can enhance our ability to explore massive networks interactively. In the future, we will investigate how to extend this algorithm into a fast overlapping community detection algorithm that produces soft partitions of massive graphs.

TABLE I
COMPARISONS OF THE COMMUNITY DETECTION ALGORITHMS GN, CNM AND OURS

| Network | $\lvert V \rvert$ | $\lvert E \rvert$ | $Q_g$ | $M_g$ | $t_g$ (s) | $Q_c$ | $M_c$ | $t_c$ (s) | $Q_f$ | $M_f$ | $t_f$ (s) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Karate Club | 34 | 78 | 0.401 | 12 | 0 | 0.381 | 17 | 0 | 0.416 | 13 | 0 |
| Dolphin | 62 | 159 | 0.519 | 21 | 0 | 0.492 | 24 | 0 | 0.522 | 20 | 0 |
| Pol-books | 105 | 441 | 0.517 | 45 | 3 | 0.502 | 49 | 0 | 0.525 | 45 | 0 |
| Football | 115 | 613 | 0.599 | 18 | 5 | 0.577 | 27 | 0 | 0.578 | 15 | 0 |
| Jazz | 198 | 2742 | 0.405 | 59 | 238 | 0.439 | 67 | 0 | 0.442 | 73 | 0 |
| Elegans-neu | 297 | 2148 | 0.302 | 150 | 342 | 0.372 | 110 | 0 | 0.327 | 149 | 0 |
| Elegans-meta | 453 | 2025 | 0.401 | 116 | 674 | 0.400 | 151 | 0 | 0.350 | 241 | 0 |
| Net-sci | 1589 | 2742 | 0.958 | 91 | 26 | 0.955 | 105 | 0 | 0.909 | 32 | 0 |
| E-mail | 1133 | 5451 | 0.532 | 251 | 6713 | 0.510 | 375 | 1 | 0.468 | 418 | 0 |
| Pol-blogs | 1490 | 16715 | 0.417 | 556 | 70150 | 0.427 | 634 | 2 | 0.426 | 668 | 0 |
| Power-grid | 4941 | 6594 | 0.933 | 223 | 3238 | 0.933 | 302 | 0 | 0.644 | 30 | 0 |
| CA-GrQc | 5242 | 14484 | 0.849 | 267 | 57394 | 0.817 | 899 | 2 | 0.766 | 189 | 0 |
| Hep-th | 8361 | 15751 | 0.836 | 518 | 166527 | 0.811 | 1147 | 4 | 0.698 | 85 | 0 |
| PGP | 10680 | 24316 | - | - | - | 0.852 | 1126 | 8 | 0.744 | 295 | 0 |
| Astro-ph | 16706 | 121251 | - | - | - | 0.630 | 3907 | 139 | 0.667 | 1048 | 2 |
| Cond-Mat | 40421 | 175693 | - | - | - | 0.654 | 8315 | 662 | 0.592 | 443 | 3 |
| Internet | 22963 | 48436 | - | - | - | 0.635 | 6305 | 121 | 0.529 | 3922 | 1 |
| Enron-email | 36692 | 183831 | - | - | - | 0.519 | 9959 | 680 | 0.540 | 9097 | 3 |
| Insti-email | 265214 | 364481 | - | - | - | 0.749 | 53967 | 17708 | 0.702 | 7634 | 10 |
| Amazon0302 | 262111 | 899792 | - | - | - | 0.814 | 52131 | 16059 | 0.694 | 1132 | 25 |
| Web-Google | 875713 | 4322051 | - | - | - | - | - | - | 0.802 | 5516 | 113 |
| Wiki-Talk | 2394385 | 4659565 | - | - | - | - | - | - | 0.527 | 599904 | 292 |

TABLE II
PROPORTION OF WEAKER STRONG COMMUNITIES AND MOST WEAK COMMUNITIES FOUND BY DIFFERENT ALGORITHMS

| Network | $WS_g$ | $MW_g$ | $WS_c$ | $MW_c$ | $WS_f$ | $MW_f$ |
|---|---|---|---|---|---|---|
| Karate Club | 1.00 | 1.00 | 0.67 | 1.00 | 1.00 | 1.00 |
| Dolphin | 1.00 | 1.00 | 0.75 | 0.75 | 1.00 | 1.00 |
| Pol-books | 1.00 | 1.00 | 0.50 | 0.75 | 1.00 | 1.00 |
| Football | 1.00 | 1.00 | 0.57 | 1.00 | 1.00 | 1.00 |
| Jazz | 1.00 | 1.00 | 0.0 | 0.75 | 1.00 | 1.00 |
| Elegans-neu | 1.00 | 1.00 | 0.40 | 0.80 | 1.00 | 1.00 |
| Elegans-meta | 1.00 | 1.00 | 0.44 | 1.00 | 1.00 | 1.00 |
| Pol-blogs | 1.00 | 1.00 | 0.98 | 0.99 | 1.00 | 1.00 |
| Net-sci | 1.00 | 1.00 | 0.99 | 1.0 | 1.00 | 1.00 |
| Power-grid | 1.00 | 1.00 | 0.93 | 1.00 | 1.00 | 1.00 |
| CA-GrQc | 1.00 | 1.00 | 0.99 | 1.00 | 1.00 | 1.00 |
| Hep-th | 1.00 | 1.00 | 0.99 | 1.00 | 1.00 | 1.00 |
| PGP | - | - | 0.98 | 1.0 | 1.00 | 1.00 |
| Astro-ph | - | - | 0.97 | 1.0 | 0.96 | 0.98 |
| Cond-Mat | - | - | 0.98 | 1.0 | 0.94 | 0.97 |
| Internet | - | - | 0.73 | 1.0 | 1.00 | 1.00 |
| Enron-email | - | - | 0.94 | 1.00 | 1.00 | 1.00 |
| Insti-email | - | - | 1.00 | 1.00 | 1.00 | 1.00 |
| Ama0302 | - | - | 0.96 | 1.00 | 1.00 | 1.00 |
| Web-Google | - | - | - | - | 1.00 | 1.00 |
| Wiki-Talk | - | - | - | - | 1.00 | 1.00 |

ACKNOWLEDGMENT

We thank M. E. J. Newman, Alex Arenas and Jure Leskovec for providing us with the network data sets. This work is supported by the National Science Foundation of China (No. 90924029, 60905025). It is also supported by the National High-tech R&D Program of China (No. 2009AA04Z136) and the National Key Technology R&D Program of China (No. 2006BAH03B05).

REFERENCES

[1] M. E. J. Newman, “The structure of scientific collaboration networks,” Proc. Natl. Acad. Sci., vol. 98, no. 2, pp. 404–409, January 2001.

[2] D. J. Watts and S. H. Strogatz, “Collective dynamics of ‘small-world’ networks,” Nature, vol. 393, no. 6684, pp. 440–442, June 1998.

[3] V. Spirin and L. A. Mirny, “Protein complexes and functional modules in molecular networks,” Proc. Natl. Acad. Sci., vol. 100, no. 21, pp. 12123–12128, October 2003.

[4] Y. Hu, H. Chen, et al., “Comparative definition of community and corresponding identifying algorithm,” Phys. Rev. E, vol. 78, p. 026121, 2008.

[5] M. Girvan and M. E. J. Newman, “Community structure in social and biological networks,” Proc. Natl. Acad. Sci., no. 12, pp. 7821–7826, June 2002.

[6] F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, and D. Parisi, “Defining and identifying communities in networks,” Proc. Natl. Acad. Sci., vol. 101, no. 9, pp. 2658–2663, March 2004.

[7] M. E. J. Newman, “Fast algorithm for detecting community structure in networks,” Phys. Rev. E, vol. 69, no. 066133, 2004.

[8] A. Clauset, M. E. J. Newman, and C. Moore, “Finding community structure in very large networks,” Phys. Rev. E, vol. 70, no. 6, p. 066111, December 2004.

[9] M. E. J. Newman, “Modularity and community structure in networks,” Proc. Natl. Acad. Sci., no. 103, pp. 8577–8582, 2006.

[10] I. X. Y. Leung, P. Hui, P. Lio, and J. Crowcroft, “Towards real-time community detection in large networks,” Phys. Rev. E, vol. 79, no. 6, p. 066107, Jun 2009.

[11] U. N. Raghavan, R. Albert, and S. Kumara, “Near linear time algorithm to detect community structures in large-scale networks,” Phys. Rev. E, vol. 76, no. 3, p. 036106, Sep 2007.

[12] M. E. J. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Phys. Rev. E, vol. 69, p. 026113, 2004.

[13] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in large networks,” J. Stat. Mech., p. 10008, 9 October 2008.

[14] S. Fortunato and M. Barthelemy, “Resolution limit in community detection,” Proc. Natl. Acad. Sci., vol. 104, pp. 36–41, 2007.

[15] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, 2nd ed. MIT Press, 2001.

[16] Q. Ye, B. Wu, L. Suo, et al., “TeleComVis: Exploring temporal communities in telecom networks,” in ECML PKDD, Bled, Slovenia, 2009, pp. 755–758.

[17] Q. Ye, T. Zhu, D. Hu, B. Wu, and N. Du, “Cell phone mini challenge award: Social network accuracy—exploring temporal communication in mobile call graphs,” in IEEE International Symposium on Visual Analytics Science and Technology, Columbus, USA, 2008, pp. 207–208.

[18] A. Lancichinetti, S. Fortunato, and F. Radicchi, “Benchmark graphs for testing community detection algorithms,” Phys. Rev. E, vol. 78, no. 4, p. 046110, Oct 2008.

[19] L. Danon, J. Duch, A. Arenas, and A. Díaz-Guilera, “Comparing community structure identification,” Journal of Statistical Mechanics: Theory and Experiment, vol. 9008, p. 09008, 2005.
