Parallel Community Detection for Massive Graphs
E. Jason Riedy, Henning Meyerhenke, David Ediger, and David A. Bader
13 September 2011

DESCRIPTION

Presentation to the 9th International Conference on Parallel Processing and Applied Mathematics (PPAM11)

TRANSCRIPT

Page 1: Parallel Community Detection for Massive Graphs

Parallel Community Detection for Massive Graphs
E. Jason Riedy, Henning Meyerhenke, David Ediger, and David A. Bader

13 September 2011

Page 2: Parallel Community Detection for Massive Graphs

Main contributions

• First massively parallel algorithm for community detection.

• A very general algorithm using weighted matching, applying to many optimization criteria.

• Partition a 122 million vertex, 1.99 billion edge graph into communities in 2 hours.


Page 3: Parallel Community Detection for Massive Graphs

Exascale data analysis

Health care: finding outbreaks, population epidemiology

Social networks: advertising, searching, grouping

Intelligence: decisions at scale, regulating algorithms

Systems biology: understanding interactions, drug design

Power grid: disruptions, conservation

Simulation: discrete events, cracking meshes

• Graph clustering is common in all application areas.


Page 4: Parallel Community Detection for Massive Graphs

These are not easy graphs.

[Figure: Yifan Hu's (AT&T) visualization of the LiveJournal data set.]


Page 5: Parallel Community Detection for Massive Graphs

But no shortage of structure...

Protein interactions, Giot et al., "A Protein Interaction Map of Drosophila melanogaster", Science 302, 1722–1736, 2003.

Jason’s network via LinkedIn Labs

• Locally, there are clusters or communities.

• First pass over a massive social graph:
  • Find smaller communities of interest.
  • Analyze / visualize top-ranked communities.

• Our part: First massively parallel community detection method.


Page 6: Parallel Community Detection for Massive Graphs

Outline

Motivation

Defining community detection and metrics

Our parallel method

Results
  Implementation and platform details
  Performance

Conclusions and plans


Page 7: Parallel Community Detection for Massive Graphs

Community detection

What do we mean?

• Partition a graph's vertices into disjoint communities.
• A community locally maximizes some metric: modularity, conductance, ...
• Trying to capture that vertices are more similar within one community than between communities.

[Figure: Jason's network via LinkedIn Labs.]


Page 8: Parallel Community Detection for Massive Graphs

Community detection

Assumptions

• Disjoint partitioning of vertices.
• There is no one unique answer.
• Some metrics are NP-complete to optimize [Brandes et al.].
• The graph is a lossy representation.
• Want an adaptable detection method.

[Figure: Jason's network via LinkedIn Labs.]


Page 9: Parallel Community Detection for Massive Graphs

Common community metric: Modularity

• Modularity: the deviation of connectivity within the community induced by a vertex set S from some expected background model of connectivity.

• We take [Newman]'s basic uniform model.

• Let m count all edges in graph G, m_S the edges with both endpoints in S, and x_S the edge endpoints that fall in S (so edges inside S count twice). The modularity Q_S, made concrete in the sketch below:

Q_S = (m_S − x_S^2 / (4m)) / m

• Total modularity: the sum of Q_S over all communities S of the partition.

• A sufficiently positive modularity implies some structure.

• Known issues: Resolution limit, NP-complete opt. prob.
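To make the metric concrete, here is a minimal Python sketch (an illustration, not the talk's code; the function name and edge-list representation are assumptions) that evaluates Q_S for a single community:

```python
# Minimal sketch: modularity Q_S of one community S over an undirected,
# unweighted edge list, following the formula above.

def community_modularity(edges, S):
    """Q_S = (m_S - x_S^2 / (4m)) / m for a community S (a set of vertices).

    edges: iterable of (i, j) pairs, each undirected edge listed once.
    """
    S = set(S)
    m = 0      # total number of edges in the graph
    m_S = 0    # edges with both endpoints inside S
    x_S = 0    # edge endpoints inside S (edges inside S count twice)
    for i, j in edges:
        m += 1
        inside = (i in S) + (j in S)
        x_S += inside
        if inside == 2:
            m_S += 1
    return (m_S - x_S**2 / (4 * m)) / m

# Tiny example: a 4-cycle plus one chord; S = {0, 1} shares one edge.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
print(community_modularity(edges, {0, 1}))   # -0.05: weak "community"
```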


Page 10: Parallel Community Detection for Massive Graphs

Sequential agglomerative method

[Figure: an example graph on vertices A–G.]

• A common method (e.g., [Clauset et al.]) agglomerates vertices into communities (sketched below).
• Each vertex begins in its own community.
• An edge is chosen to contract:
  • Merging maximally increases modularity.
  • Priority queue.
• Known often to fall into an O(n²) performance trap with modularity [Wakita & Tsurumi].
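As a compact illustration of this sequential scheme (a reconstruction under assumptions, not the authors' implementation; the names greedy_agglomerate and gain are invented here), the following Python keeps candidate merges in a priority queue with lazy rescoring and repeatedly applies the merge that most increases modularity. The final relabeling loop hints at where the quadratic behavior creeps in:

```python
import heapq
from collections import defaultdict

def greedy_agglomerate(edges):
    """CNM-style greedy agglomeration over an undirected edge list (sketch)."""
    m = len(edges)
    adj = defaultdict(lambda: defaultdict(float))  # inter-community edge counts
    vol = defaultdict(float)                       # x_S: endpoint count per community
    for i, j in edges:
        vol[i] += 1.0
        vol[j] += 1.0
        if i != j:
            adj[i][j] += 1.0
            adj[j][i] += 1.0
    label = {v: v for v in vol}                    # each vertex's community
    alive = set(vol)

    def gain(a, b):        # modularity change from merging communities a and b
        return (adj[a][b] - vol[a] * vol[b] / (2.0 * m)) / m

    heap = [(-gain(a, b), a, b) for a in adj for b in adj[a] if a < b]
    heapq.heapify(heap)
    while heap:
        g, a, b = heapq.heappop(heap)
        if a not in alive or b not in alive or b not in adj[a]:
            continue                               # stale: endpoint merged away
        if -g != gain(a, b):                       # stale score: re-queue fresh
            heapq.heappush(heap, (-gain(a, b), a, b))
            continue
        if -g <= 0.0:
            break                                  # no merge improves modularity
        for c, w in list(adj[b].items()):          # contract b into a
            if c == a:
                continue
            adj[a][c] += w
            adj[c][a] += w
            del adj[c][b]
            lo, hi = min(a, c), max(a, c)
            heapq.heappush(heap, (-gain(a, c), lo, hi))
        del adj[a][b]
        del adj[b]
        vol[a] += vol[b]
        alive.discard(b)
        for v in label:                            # O(n) relabel per merge: part
            if label[v] == b:                      # of the performance trap
                label[v] = a
    return label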


(Pages 11–13 repeat the slide above, with the figure animating successive edge contractions in the example graph.)

Page 14: Parallel Community Detection for Massive Graphs

New parallel method

[Figure: the example graph on vertices A–G.]

• We use a matching to avoid the queue (one pass is sketched below).
• Compute a heavy-weight, large matching:
  • Greedy algorithm.
  • Maximal matching.
  • Within a factor of 2 of the maximum weight.
• Merge all matched communities at once.
• Maintains some balance.
• Produces different results.
• Agnostic to weighting, matching, ...:
  • Can maximize modularity or minimize conductance.
  • Modifying the matching permits easy exploration.
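Here is a sequential Python simulation of one matching-and-merging pass (a sketch under assumptions; the talk's implementation computes the matching in parallel on the XMT, and the names match_and_contract, mate, and score are invented here; score would carry, e.g., the modularity change per edge):

```python
def match_and_contract(edges, score):
    """One matching pass (illustrative sketch): edges is a list of (i, j)
    pairs, score maps each edge to a merge score such as the modularity
    change from contracting it."""
    # Greedy maximal matching: scan edges by decreasing score, take an edge
    # when both endpoints are still free. This greedy heuristic produces a
    # maximal matching within a factor of 2 of the maximum weight.
    mate = {}
    for i, j in sorted(edges, key=lambda e: score[e], reverse=True):
        if score[(i, j)] <= 0.0:
            break                      # keep only merges that help the metric
        if i != j and i not in mate and j not in mate:
            mate[i] = j
            mate[j] = i
    # Merge all matched pairs at once: map each vertex to a community label.
    root = {}
    for i, j in edges:
        for v in (i, j):
            if v not in root:
                root[v] = min(v, mate.get(v, v))   # canonical label per pair
    # Contract: relabel endpoints and drop edges internal to a merged pair.
    # (Duplicate edges between the same communities would be summed by the
    # hashed compaction sketched on the data-structures slide.)
    contracted = [(root[i], root[j]) for i, j in edges if root[i] != root[j]]
    return contracted, root
```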


(Pages 15–16 repeat the slide above, with the figure showing matched pairs merging simultaneously.)

Page 17: Parallel Community Detection for Massive Graphs

Implementation: Cray XMT

Tolerates latency by massive multithreading.

• Hardware: 128 threads per processor
  • Every-cycle context switch
  • Many outstanding memory requests (180/processor)
• Flexibly supports dynamic load balancing
  • Globally hashed address space, no data cache
• Support for fine-grained, word-level synchronization
  • Full/empty bit on every memory word
• 128-processor XMT at Pacific Northwest Nat'l Lab
  • 500 MHz processors, 16384 threads, 1 TB of shared memory

[Image: cray.com]


Page 18: Parallel Community Detection for Massive Graphs

Implementation: Data structures

Extremely basic for graph G = (V,E)

• An array of (i, j; w) weighted edge pairs, i < j, size |E|
• An array to store self-edges, d(i) = w, size |V|
• A temporary floating-point array for scores, size |E|
• Two additional temporary arrays (|V|, |E|) to store degrees, matching choices, a linked list for compression, ...

• Weights count the number of agglomerated vertices or edges.

• Scoring methods need only vertex-local counts.

• Greedy matching parallelizes trivially over the edge array.

• Contraction compacts the edge list in place through a temporary, hashed linked list of duplicates (a rough stand-in is sketched below).
  • Full/empty bits or compare&swap keep correctness.
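A rough Python stand-in for that contraction step (illustrative only; the XMT code works in place over the flat arrays with full/empty-bit synchronization, while this sketch uses a dictionary as the hashed duplicate list, and contract_edge_list is an invented name):

```python
from collections import defaultdict

def contract_edge_list(edges, d, root):
    """edges: list of [i, j, w] with i < j (the flat array from this slide);
    d: self-loop weights per vertex; root: vertex -> community label from
    the matching step."""
    new_d = defaultdict(float)
    for v, w in d.items():               # carry old self-loops to new labels
        new_d[root[v]] += w
    bucket = defaultdict(float)          # hash merges duplicate (i, j) keys
    for i, j, w in edges:
        ci, cj = root[i], root[j]
        if ci == cj:
            new_d[ci] += w               # intra-community edge becomes a self-loop
        else:
            bucket[(min(ci, cj), max(ci, cj))] += w
    # Compact back into the flat (i, j; w) form, keeping i < j.
    return [[i, j, w] for (i, j), w in sorted(bucket.items())], dict(new_d)
```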


Page 19: Parallel Community Detection for Massive Graphs

Data and experiment

• Generated R-MAT graphs [Chakrabarti et al.] (a = 0.55, b = c = 0.1, and d = 0.25) and extracted the largest component (generator sketched below).

Scale  Edge fact.      |V|          |E|   Avg. degree   Edge group
  18        8       236 605    2 009 752        8.5          2M
  18       16       252 427    3 936 239       15.6          4M
  18       32       259 372    7 605 572       29.3          8M
  19        8       467 993    3 480 977        7.4          4M
  19       16       502 152    7 369 885       14.7          8M
  19       32       517 452   14 853 837       28.7         16M

• Three runs with each parameter setting.

• Tested both plain modularity (CNM, for Clauset-Newman-Moore) and McCloskey-Bader's filtered modularity (MB).
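For reference, a minimal R-MAT edge sampler with these parameters (an illustrative sketch of the generator, not the exact code used; rmat_edge is an invented name, and it omits the per-level noise some generators add):

```python
import random

def rmat_edge(scale, a=0.55, b=0.10, c=0.10, d=0.25):
    """Sample one edge of a 2^scale-vertex R-MAT graph [Chakrabarti et al.]:
    recursively pick a quadrant of the adjacency matrix, one bit per level."""
    i = j = 0
    for _ in range(scale):
        r = random.random()
        i <<= 1
        j <<= 1
        if r < a:               # top-left quadrant: both bits stay 0
            pass
        elif r < a + b:         # top-right
            j |= 1
        elif r < a + b + c:     # bottom-left
            i |= 1
        else:                   # bottom-right
            i |= 1
            j |= 1
    return i, j

# e.g., a scale-18, edge-factor-8 instance as in the table above
# (extracting the largest component is a separate step):
edges = [rmat_edge(18) for _ in range(8 * (1 << 18))]
```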


Page 20: Parallel Community Detection for Massive Graphs

Performance: Time

[Plot: execution time in seconds (log scale, 2^2 to 2^10) vs. number of processors (1–128); one panel per scoring method (CNM, MB), curves grouped by edge count (2M, 4M, 8M, 16M). Labeled CNM times run from 51.4 s down to 2.7 s (2M), 144.6 s to 4.3 s (4M), 464.4 s to 9.8 s (8M), and 1321.1 s to 19.8 s (16M); MB times are nearly identical (51.9 s to 2.7 s, 144.7 s to 4.5 s, 460.8 s to 10.0 s, 1316.6 s to 19.7 s).]


Page 21: Parallel Community Detection for Massive Graphs

Performance: Scaling

[Plot: speedup vs. number of processors (1–128), one panel per scoring method, curves grouped by edge count. Peak speedups for CNM: 19.2x at 32P (2M), 33.4x at 48P (4M), 48.6x at 96P (8M), 66.8x at 128P (16M); for MB: 19.4x at 48P (2M), 32.5x at 48P (4M), 48.0x at 96P (8M), 66.7x at 128P (16M).]


Page 22: Parallel Community Detection for Massive Graphs

Performance: Modularity

[Plot: modularity by scoring method (CNM, MB) for the parallel implementation vs. SNAP, on scale-18 graphs with edge factors 8, 16, and 32. Labeled values fall in four clusters: roughly 0.42 across all three graphs, 0.24/0.17/0.08, 0.32/0.19/0.10, and 0.02/0.01/0.01.]

• SNAP: GT's sequential implementation.

• Plain modularity (CNM) is known for too-large communities (resolution limit).

• MB filters out statistically negligible merges, giving smaller communities and lower modularity.

• Our more balanced, parallel algorithm falls between the two?


Page 23: Parallel Community Detection for Massive Graphs

Performance: Large and real-world

Large R-MAT: Scale 27, edge factor 16 generated a component with 122 million vertices and 1.99 billion edges.
  7258 seconds ≈ 2 hours on 128P.
  R-MAT is known not to have much community structure...

LiveJournal: 4.8 million vertices, 68 million edges.
  2779 seconds ≈ 47 minutes on 128P.

Long tail of around 1 000 matched pairs per iteration. The tail is important to modularity and community count, but is it to users?


Page 24: Parallel Community Detection for Massive Graphs

Conclusions and plans

• First massively parallel algorithm based on matching for agglomerative community detection.
  • Non-deterministic algorithm; still studying the impact.

• Now can easily experiment with agglomerative community detection on real-world graphs:
  • How volatile are modularity and conductance to perturbations?
  • What matching schemes work well?
  • How do different metrics compare in applications?

• Experimenting on OpenMP platforms.
  • CSR format is more cache-friendly, but uses more memory.

• Extending to streaming graph data!

• Code available on request, to be integrated into GT packages like GraphCT.


Page 25: Parallel Community Detection for Massive Graphs

And if you can do better...

Then please do!

10th DIMACS Implementation Challenge — Graph Partitioning and Graph Clustering

• 12–13 Feb. 2012 in Atlanta, http://www.cc.gatech.edu/dimacs10/

• Paper deadline: 21 October 2011, http://www.cc.gatech.edu/dimacs10/data/call-for-papers.pdf

• Challenge problem statement and evaluation rules: http://www.cc.gatech.edu/dimacs10/data/dimacs10-rules.pdf

• Co-sponsored by DIMACS; the Command, Control, and Interoperability Center for Advanced Data Analysis (CCICADA); Pacific Northwest National Lab.; Sandia National Lab.; and Deutsche Forschungsgemeinschaft (DFG).


Page 26: Parallel Community Detection for Massive Graphs

Acknowledgment of support


Page 27: Parallel Community Detection for Massive Graphs

Bibliography I

D. Bader and J. McCloskey. Modularity and graph algorithms. Presented at UMBC, Sept. 2009.

J. Berry, B. Hendrickson, R. LaViolette, and C. Phillips. Tolerating the community detection resolution limit with edge weighting. CoRR, abs/0903.1072, 2009.

U. Brandes, D. Delling, M. Gaertler, R. Görke, M. Hoefer, Z. Nikoloski, and D. Wagner. On modularity clustering. IEEE Trans. Knowledge and Data Engineering, 20(2):172–188, 2008.


Page 28: Parallel Community Detection for Massive Graphs

Bibliography II

D. Chakrabarti, Y. Zhan, and C. Faloutsos. R-MAT: A recursive model for graph mining. In Proc. 4th SIAM Intl. Conf. on Data Mining (SDM), Orlando, FL, Apr. 2004. SIAM.

A. Clauset, M. Newman, and C. Moore. Finding community structure in very large networks. Physical Review E, 70(6):066111, 2004.

M. Newman. Modularity and community structure in networks. Proc. of the National Academy of Sciences, 103(23):8577–8582, 2006.

K. Wakita and T. Tsurumi. Finding community structure in mega-scale social networks. CoRR, abs/cs/0702048, 2007.
