
Streaming Graph Challenge: Stochastic Block Partition (draft)

Edward Kao, Vijay Gadepally, Michael Hurley, Michael Jones, Jeremy Kepner, Sanjeev Mohindra, Paul Monticciolo, Albert Reuther, Siddharth Samsi, William Song, Diane Staheli, Steven Smith

MIT Lincoln Laboratory, Lexington, MA

Abstract—An important objective for analyzing real-world graphs is to achieve scalable performance on large, streaming graphs. A challenging and relevant example is the graph partition problem. As a combinatorial problem, graph partition is NP-hard, but existing relaxation methods provide reasonable approximate solutions that can be scaled for large graphs. Competitive benchmarks and challenges have proven to be an effective means to advance state-of-the-art performance and foster community collaboration. This paper describes a graph partition challenge with a baseline partition algorithm of sub-quadratic complexity. The algorithm employs rigorous Bayesian inferential methods based on a statistical model that captures characteristics of real-world graphs. This strong foundation enables the algorithm to address limitations of well-known graph partition approaches such as modularity maximization. This paper describes various aspects of the challenge including: (1) the data sets and streaming graph generator, (2) the baseline partition algorithm with pseudocode, (3) an argument for the correctness of parallelizing the Bayesian inference, (4) different parallel computation strategies such as node-based parallelism and matrix-based parallelism, (5) evaluation metrics for partition correctness and computational requirements, (6) preliminary timing of a Python-based demonstration code and the open source C++ code, and (7) considerations for partitioning the graph in streaming fashion. Data sets and source code for the algorithm, as well as metrics, with detailed documentation, are available at GraphChallenge.org.

I. INTRODUCTION

In the era of big data, analysis and algorithms often need to scale up to large data sets for real-world applications. With the rise of social media and network data, algorithms on graphs face the same challenge. Competitive benchmarks and challenges have proven to be an effective means to advance state-of-the-art performance and foster community collaboration. Previous benchmarks such as Graph500 [1] and the Pagerank Pipeline [2] are examples of such, targeting analysis of large graphs and focusing on problems with sub-quadratic complexity, such as search, path-finding, and PageRank computation. However, some analyses on graphs with valuable

*This material is based upon work supported by the Defense Advanced Research Projects Agency under Air Force Contract No. FA8721-05-C-0002. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Department of Defense.

applications are NP-hard. The graph partition and the graph isomorphism (i.e. matching) problems are well-known examples. Although these problems are NP-hard, existing relaxation methods provide good approximate solutions that can be scaled to large graphs [3], [4], especially with the aid of high performance computing hardware platforms such as massively parallel CPUs and GPUs. For example, the 10th DIMACS Implementation Challenge [5] resulted in substantial participation in the graph partition problem, mostly with solutions based on modularity maximization. To promote algorithmic and computational advancement in these two important areas of graph analysis, our team has implemented a challenge for graph isomorphism [6] and graph partition at GraphChallenge.org. This paper describes the graph partition challenge with a recommended baseline partition algorithm of sub-quadratic complexity. Furthermore, the algorithm employs rigorous Bayesian inferential methods based on the stochastic blockmodels, which capture characteristics of real-world graphs. Participants are welcome to submit solutions based on other partition algorithms, as long as knowledge of the true number of communities (i.e. blocks) is not assumed. All entries should be submitted with performance evaluation on the challenge data sets using the metrics described in Section V.

Graph partition, also known as community detection and graph clustering, is an important problem with many real-world applications. The objective of graph partition is to discover the distinct community structure of the graph, specifically the community membership for each node in the graph. The partition gives much insight into the interactions and relationships between the nodes and enables detection of nodes belonging to certain communities of interest. Much prior work has been done in the problem space of graph partition, with a comprehensive survey in [7]. The most well-known algorithm is probably the spectral method by [8], where the partition is obtained through the eigenspectrum of the modularity matrix. Most of the existing partition algorithms work through the principle of graph modularity, where the graph is partitioned into communities (i.e. modules) that have much stronger interactions within them than between them.

Page 2: Streaming Graph Challenge: Stochastic Block Partition ...graphchallenge.mit.edu/sites/default/files/... · 3/28/2017  · degree-corrected stochastic blockmodels by Karrer and Newman

Typically, partitioning is done by maximizing the graph modularity [9]. [10] extends the concept of modularity for time-dependent, multiscale, and multiplex graphs. Modularity maximization is an intuitive and convenient approach, but has inherent challenges such as the resolution limit on the size of the detectable communities [11], degeneracies in the objective function, and difficulty in identifying the optimal number of communities [12].

To address these challenges, recent works perform graph partition through membership estimation based on generative statistical models. For example, [13], [14], [15], [16] estimate community memberships using the degree-corrected stochastic blockmodels [17], and [18] proposes a mixed-membership estimation procedure by applying tensor methods to the mixed-membership stochastic blockmodels [19]. The baseline partition algorithm for this challenge is based on [14], [15], [16], because of its rigorous statistical foundation and sub-quadratic computational requirement. Under this approach, each community is represented as a "block" in the model. Going forward, this paper will use the term "block" as the nomenclature for a community or a graph cluster.

When some nodes in the graph have known memberships a priori, these nodes can serve as "cues" in the graph partition problem. [20] is an example of such an approach, using random walks on graphs. This challenge will focus on the graph partition problem where such cues are not available.

In many real-world applications, graph data arrives in streaming fashion over time or over stages of sampling [21]. This challenge addresses this aspect by providing streaming graph data sets and recommending a baseline partition algorithm that is suitable for streaming graphs under the Bayesian inference paradigm.

This paper describes the graph partition challenge in detail, beginning with Section II on the data sets and streaming graph generator. Section III describes the baseline partition algorithm, including pseudocode for the core Bayesian updates. Section IV focuses on the parallel computation of the baseline algorithm, argues for the correctness of parallelizing the Bayesian updates, then proposes parallel computation strategies such as node-based parallelism and matrix-based parallelism. Section V describes the evaluation metrics for both partition correctness and computational requirements, including preliminary timing of a Python-based demonstration code and the open source C++ code [22]. Considerations for partitioning the graph in streaming fashion are given throughout the paper.

II. DATA SETS

The data sets for this challenge consist of graphs of varying sizes and characteristics. Denote a graph $G = (V, E)$, with the set $V$ of $N$ nodes and the set $E$ of $E$ edges. The edges, represented by an $N \times N$ adjacency matrix $A$, can be either directed or undirected, binary or weighted. Specifically, $A_{ij}$ is the weight of the edge from node $i$ to node $j$. An undirected graph will have a symmetric adjacency matrix.

In order to evaluate the partition algorithm implementation on graphs with a wide range of realistic characteristics, graphs are generated according to a truth partition $b^\dagger$ of $B^\dagger$ blocks (i.e. clusters), based on the degree-corrected stochastic blockmodels by Karrer and Newman in [17]. Under this generative model, each edge $A_{ij}$ is drawn from a Poisson distribution of rate $\lambda_{ij}$ governed by the equations below:

$$A_{ij} \sim \mathrm{Poisson}(\lambda_{ij}) \qquad (1)$$
$$\lambda_{ij} = \theta_i \theta_j \Omega_{b_i b_j} \qquad (2)$$

where $\theta_i$ is a correction term that adjusts node $i$'s expected degree, $\Omega_{b_i b_j}$ the strength of interaction between blocks $b_i$ and $b_j$, and $b_i$ the block assignment for node $i$. The degree-corrected stochastic blockmodels enable the generation of graphs with characteristics and variations consistent with real-world graphs. The degree correction term for each node can be drawn from a power-law distribution with an exponent between $-3$ and $-2$ to capture the degree distribution of realistic, scale-free graphs [23]. The block interaction matrix $\Omega$ specifies the strength of within- and between-block (i.e. community) interactions. Stronger between-block interactions will increase the block overlap, making the block partition task more difficult. Lastly, the block assignment for each node (i.e. the truth partition $b^\dagger$) can be drawn from a multinomial distribution with a Dirichlet prior that determines the amount of variation in size between the blocks. Figure 1 shows generated graphs of various characteristics, obtained by adjusting the parameters of the generator. These parameters serve as "knobs" that can be dialed to capture a rich set of characteristics for realism and also for adjusting the difficulty of the block partition task.
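To make the generative model concrete, below is a minimal NumPy sketch of this process. The parameter values (block count, interaction strengths, density scaling) are illustrative assumptions, not the challenge generator's settings; the official generator at GraphChallenge.org is the reference implementation.

import numpy as np

rng = np.random.default_rng(0)
N, B = 100, 4                                # example sizes, not challenge settings
b = rng.choice(B, size=N)                    # block assignments (the challenge draws these
                                             # from a multinomial with a Dirichlet prior)
theta = rng.pareto(1.5, size=N) + 1.0        # heavy-tailed degree corrections (~ x^-2.5 tail)
Omega = np.full((B, B), 0.05) + np.diag(np.full(B, 1.0))   # within > between interaction
lam = 0.05 * np.outer(theta, theta) * Omega[np.ix_(b, b)]  # Poisson rates (Eq. 2)
A = rng.poisson(lam)                         # directed, weighted adjacency matrix (Eq. 1)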

Real-world graphs will also be included in the data sets. Since the truth partition is not available for most real-world graphs, generated graphs with truth will be embedded within the real-world graphs. While the entire graph will be partitioned, evaluation of the correctness of the partition will be done only on the generated part of the hybrid graph. Embedding will be done by adding edges between nodes in the real-world graph and the generated graph, with a relatively small probability proportional to the product of both node degrees.
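As an illustration of this embedding step, here is a minimal sketch. The function name and the scale constant are hypothetical; the challenge's actual embedding procedure is defined by its data generator.

import numpy as np

def embed_graphs(A_real, A_gen, scale=1e-4, seed=1):
    """Join a real-world graph and a generated graph by adding cross
    edges with probability proportional to the product of node degrees."""
    rng = np.random.default_rng(seed)
    d_real = A_real.sum(axis=0) + A_real.sum(axis=1)   # total degrees, real part
    d_gen = A_gen.sum(axis=0) + A_gen.sum(axis=1)      # total degrees, generated part
    p = np.minimum(scale * np.outer(d_real, d_gen), 1.0)
    cross = (rng.random(p.shape) < p).astype(A_real.dtype)
    return np.block([[A_real, cross],
                     [cross.T, A_gen]])                # hybrid adjacency matrix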

In real-world applications, graph data often arrives in streaming fashion, where parts of the input graph become available at different stages. This happens as interactions and relationships take place and are observed over time,


Fig. 1. Generated graphs with varying characteristics: (a) baseline, (b) increased block overlap, (c) higher block size variation, (d) more high degree nodes. Nodes are colored and shaped according to their true block assignments. Graphs are typically much larger; small graphs are shown here for the purpose of demonstration. For simplicity and clarity, the edge directions (i.e. arrows) are not displayed.

or as data is collected incrementally by exploring the graph from starting points (e.g. snowball sampling) [21]. Streaming graph data sets in this challenge are generated in both ways, as demonstrated in Figure 2. The partition algorithm should process the streaming graph at each stage and ingest the next stage upon completion of the current stage. Performance evaluated using the metrics in Section V should be reported at each stage of the processing. For efficiency, it is recommended that the partition algorithm leverage partitions from the previous stage(s) to speed up processing at the current stage. The baseline partition algorithm for this challenge is a natural fit for streaming processing, as discussed in Section III.

III. BASELINE ALGORITHM

This section describes the recommended baseline partition algorithm, although participants are welcome to submit solutions based on other partition algorithms, as long as knowledge of the true number of blocks is not assumed.

The baseline graph partition algorithm for this challenge, chosen for its rigorous statistical foundation and sub-quadratic, $O(E \log^2 E)$, computational requirement, is developed by Tiago Peixoto in [14], [15], [16] based on the degree-corrected stochastic blockmodels by Karrer and Newman in [17]. Given the input graph, the

Fig. 2. Streaming graphs generated in two ways: (a) as edges emerge over time and (b) with snowball sampling, as the graph is explored from starting point(s).

algorithm partitions the nodes into $B$ blocks (i.e. clusters or communities), by updating the nodal block assignment represented by a vector $b$ of $N$ elements, where $b_i \in \{1, 2, \ldots, B\}$, and the inter-block and intra-block edge count matrix (typically sparse in a large graph) represented by $M$ of size $B \times B$, where each element $M_{ij}$ represents the number or the total weight of edges going from block $i$ to block $j$. The diagonal elements represent the edge counts within each block. For conciseness, this matrix will be referred to as the inter-block edge count matrix going forward. The goal of the algorithm is to recover the truth partition $b^\dagger$ of $B^\dagger$ blocks (i.e. clusters).

The algorithm performs a Fibonacci search (i.e. golden section search) [24] through different numbers of blocks $B$ and attempts to find the minimum description length partition. The best overall partition $b^*$ with the optimal number of blocks $B^*$ minimizes the total description length of the model and the observed graph (i.e. the entropy of the fitted model). To avoid being trapped in local minima, the algorithm starts with each node in its own block (i.e. $B = N$) and the blocks are merged at each step of the Fibonacci search, followed by iterative Markov chain Monte Carlo (MCMC) updates on the block assignment for each node to find the best partition for the current number of blocks. The block-merge moves and the nodal updates are both governed by the same underlying log posterior probability of the partition given the observed graph:

$$p(b \mid G) \propto \sum_{t_1, t_2} M_{t_1 t_2} \log \left( \frac{M_{t_1 t_2}}{d_{t_1,\mathrm{out}} \, d_{t_2,\mathrm{in}}} \right) \qquad (3)$$

The log posterior probability is a summation over all pairs of blocks $t_1$ and $t_2$, where $d_{t_1,\mathrm{out}}$ is the total out-


degree for block $t_1$ and $d_{t_2,\mathrm{in}}$ is the total in-degree for block $t_2$. Note that in computing the posterior probabilities on the block assignments, the sufficient statistics for the entire graph are only the inter-block edge counts, giving this algorithm much of its computational advantage. Another nice property of the log posterior probability is that it is also the negative entropy of the fitted model. Therefore, maximizing the posterior probability of the partition also minimizes the overall entropy, fitting nicely into the minimum description length framework. The block-merge moves and the nodal block assignment updates are described in detail next, starting with the nodal updates.
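As a concrete illustration of Equation 3, a minimal NumPy sketch (assuming a dense inter-block edge count matrix) is:

import numpy as np

def log_posterior(M):
    """Log posterior of a partition, up to proportionality (Eq. 3).
    M is the B x B inter-block edge count matrix."""
    d_out = M.sum(axis=1)                   # total out-degree of each block
    d_in = M.sum(axis=0)                    # total in-degree of each block
    r, c = np.nonzero(M)                    # zero counts contribute nothing
    return (M[r, c] * np.log(M[r, c] / (d_out[r] * d_in[c]))).sum()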

A. Nodal Block Assignment Updates

The nodal updates are performed using Markov chain Monte Carlo (MCMC), specifically with Gibbs sampling and the Metropolis-Hastings algorithm, since the partition posterior distribution in Equation 3 does not have a closed form and is best sampled one node at a time. At each MCMC iteration, the block assignment of each node $i$ is updated conditional on the assignments of the other nodes according to the conditional posterior distribution $p(b_i \mid b_{-i}, G)$. Specifically, the block assignment $b_i$ for each node $i$ is updated based on the edges to its neighbors, $A_{i N_i}$ and $A_{N_i i}$, the assignments of its neighbors, $b_{N_i}$, and the inter-block edge count matrix, $M$. For each node $i$, the update begins by proposing a new block assignment. To increase exploration, a block is randomly chosen as the proposal with some predefined probability. Otherwise, the proposal will be chosen from the block assignments of nodes nearby to $i$. The new proposal will be considered for acceptance according to how much it changes the log posterior probability. The acceptance probability is adjusted by the Hastings correction, which accounts for potential asymmetry in the directions of the proposal to achieve the important detailed balance condition that ensures the correct convergence of the MCMC. Algorithm 1 in Appendix A is a detailed description of the block assignment update at each node, using some additional notation: $d_{t,\mathrm{in}} = \sum_k M_{kt}$ is the number of edges into block $t$, $d_{t,\mathrm{out}} = \sum_k M_{tk}$ the number of edges out of block $t$, $d_t = d_{t,\mathrm{in}} + d_{t,\mathrm{out}}$ the number of edges into and out of block $t$, $K_{it}$ the number of edges between node $i$ and block $t$, and $\beta$ the update rate that controls the balance between exploration and exploitation. The block assignments are updated for each node iteratively until convergence, when the improvement in the log posterior probability falls below a threshold.
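For intuition, a minimal Python sketch of the proposal and acceptance steps for one node follows (simplified from Algorithm 1: the computation of the change in log posterior and of the Hastings terms is left to helper code, and edge cases such as empty blocks are ignored).

import numpy as np

def propose_block(i, b, A, M, d, B, rng):
    """Propose a new block for node i: pick a random neighbor j with
    block u = b[j]; with probability B/(d_u + B) explore uniformly,
    otherwise draw from the blocks adjacent to u in the block graph."""
    nbrs = np.nonzero(A[i, :] + A[:, i])[0]      # in- and out-neighbors of i
    u = b[rng.choice(nbrs)]
    if rng.random() <= B / (d[u] + B):
        return rng.integers(B)                   # uniform exploration
    w = M[u, :] + M[:, u]                        # edges between block u and each block
    return rng.choice(B, p=w / w.sum())          # multinomial draw

def accept(delta_S, p_forward, p_backward, beta, rng):
    """Metropolis-Hastings acceptance with the Hastings correction."""
    return rng.random() <= min(np.exp(-beta * delta_S) * p_backward / p_forward, 1.0)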

B. Block-Merge Moves

The block-merge moves work in an almost identical way as the nodal updates described in Algorithm 1 in Appendix A, except that they take place at the block level. Specifically, a block-merge move proposes to reassign all the nodes belonging to the current block $i$ to a proposed block $s$. In other words, it is like applying Algorithm 1 on the block graph, where each node represents an entire block (i.e. all the nodes belonging to that block) and each edge represents the number of edges between the two blocks. Another difference is that the block merges are done in a greedy manner to maximize the log posterior probability, instead of through MCMC. Therefore, the Hastings correction computation step and the proposal acceptance step are not needed. Instead, the best merge move over some number of proposals is computed for each block according to the change in the log posterior probability, and the top merges are carried out to reach the number of blocks targeted by the Fibonacci search. A sketch of this greedy selection appears below.
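This sketch assumes two hypothetical helpers, merge_delta (the change in log posterior from merging block r into block s) and apply_merge (the bookkeeping on M and b); it shows the greedy structure rather than the full baseline implementation.

import numpy as np

def greedy_merge_phase(B, target_B, num_proposals, merge_delta, apply_merge, rng):
    """Evaluate a few candidate merges per block against the previous
    partition, then carry out the best merges until target_B blocks remain."""
    best = []
    for r in range(B):
        candidates = [s for s in rng.choice(B, size=num_proposals) if s != r]
        if candidates:
            delta, s = min((merge_delta(r, s), s) for s in candidates)
            best.append((delta, r, s))
    best.sort()                         # smallest description length increase first
    for delta, r, s in best[: B - target_B]:
        apply_merge(r, s)               # reassign all of block r's nodes to block s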

C. Put It All Together

Overall, the algorithm shifts back and forth between the block-merge moves and the MCMC nodal updates to find the optimal number of blocks $B^*$ with the resulting partition $b^*$. Optimality is defined as having the minimum overall description length, $H$, of the model and the observed graph given the model:

$$H = E \, h\!\left(\frac{B^2}{E}\right) + N \log B - \sum_{r,s} M_{rs} \log \left( \frac{M_{rs}}{d_{r,\mathrm{out}} \, d_{s,\mathrm{in}}} \right) \qquad (4)$$

where the function $h(x) = (1+x)\log(1+x) - x\log(x)$. The number of blocks may be reduced at a fixed rate (e.g. 50%) at each block-merge phase until the Fibonacci 3-point bracket is established. At any given stage of the search for the optimal number of blocks, the past partition with the closest and higher number of blocks is used to begin the block-merge moves, followed by the MCMC nodal updates, to find the best partition at the targeted number of blocks. Figure 3 shows the partition at selected stages of the algorithm on a 500 node graph.
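A minimal NumPy sketch of Equation 4 (dense M assumed):

import numpy as np

def description_length(M, N, E):
    """Overall description length H of the model and graph (Eq. 4)."""
    B = M.shape[0]
    x = B * B / E
    h = (1 + x) * np.log(1 + x) - x * np.log(x)   # model complexity term h(B^2/E)
    d_out = M.sum(axis=1)
    d_in = M.sum(axis=0)
    r, s = np.nonzero(M)
    fit = (M[r, s] * np.log(M[r, s] / (d_out[r] * d_in[s]))).sum()
    return E * h + N * np.log(B) - fit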

The algorithm description in this section is for directed graphs. Very minor modifications, with no impact on the computational requirement, can be applied for undirected graphs. These minor differences are documented in Peixoto's papers [14], [15], [16].

Advantageously, the baseline partition algorithm, with its rigorous statistical foundation, is ideal for processing streaming graphs. Good partitions found on the graph at a previous streaming stage are samples from the posterior distribution of the partition, which can be used as starting partitions for the graph at the current stage. This has the natural Bayesian interpretation of the posterior distribution from a previous state serving as the prior distribution on the current state, as additional data on the graph arrives.


Fig. 3. Partitions at selected stages of the algorithm: (a) 250 blocks, (b) 32 blocks, (c) 8 blocks, (d) 4 blocks. Nodes are colored and shaped according to their block assignments. The algorithm begins with too many blocks (i.e. over-partition) and performs block merges and nodal updates as it searches for the optimal partition. The Fibonacci search eventually converges to the partition with the optimal number of blocks, shown in (c) with 8 blocks.

IV. PARALLEL COMPUTATION STRATEGIES

Significant speedup of the baseline partition algorithm is the primary focus of this graph challenge, and is necessary for computation on large graphs. Since the same core computation, described in Algorithm 1 in Appendix A, is repeated for each block and each node, parallelizing this core computation across the blocks and nodes provides a way to speed up the computation, potentially by the order of the number of processors available. This section first discusses the correctness of parallelizing the MCMC updates. It then examines some of the parallel computation schemes for the baseline algorithm, with their respective advantages and requirements.

A. Correctness of Parallel MCMC Updates

The block-merge moves are readily parallelizable, since each of the potential merge moves is evaluated based on the previous partition and the best merges are carried out. However, the nodal block assignment updates are not so straightforward, since they rely on MCMC through Gibbs sampling, which is by nature a sequential algorithm where each node is updated one at a time. Parallelizing MCMC updates is an area of rising interest, with the increasing demand to perform Bayesian inference on large data sets. Running the baseline partition algorithm on large graphs is a perfect example of this need. Very recently, researchers have proposed to use asynchronous Gibbs sampling as a way to parallelize MCMC updates [25], [26]. In asynchronous Gibbs sampling, the parameters are updated in a parallel and asynchronous fashion without any dependency constraint. In [26], a proof is given to show that when the parameters in the MCMC sparsely influence one another (i.e. Dobrushin's condition), asynchronous Gibbs is able to converge quickly to the correct distribution. It is difficult to show analytically that the MCMC nodal updates here satisfy Dobrushin's condition. However, since the graph is typically quite sparse, the block assignment on each node influences the others only sparsely. This gives intuition on the adequacy of parallel MCMC updates for the baseline partition algorithm. In fact, in the preliminary tests we have conducted so far, parallel MCMC updates based on one-iteration-old block assignments have been shown to result in equally good partitions compared to the sequential updates, based on the quantitative metrics in Section V-A.

B. Parallel Updates on Nodes and Blocks

An intuitive and straightforward parallel computation scheme is to evaluate each block merge and update each nodal block assignment (i.e. Algorithm 1 in Appendix A) in a distributed fashion across multiple processors. The block-merge evaluation is readily parallelizable since the computation is based on the previous partition. The MCMC nodal updates can be parallelized using the one-iteration-old block assignments, essentially approximating the true conditional posterior distribution with $p(b_i \mid b^-_{-i}, G)$. The conditional block assignments, $b^-_{-i}$, may be more "fresh" if asynchronous Gibbs sampling is used, so that some newly updated assignments become available for updates on later nodes. In any case, once all the nodes have been updated in the current iteration, all the new block assignments are gathered and their modifications on the inter-block edge count matrix aggregated (this can also be done in parallel). These new block assignments and the new inter-block edge count matrix are then available for the next iteration of MCMC updates, as sketched below.
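A minimal sketch of one such parallel iteration follows. The per-node update function update_node (Algorithm 1 run against the stale assignments) is a hypothetical helper, and a thread pool stands in for whatever distributed workers an implementation actually uses.

from concurrent.futures import ThreadPoolExecutor
import numpy as np

def parallel_iteration(b_old, A, M_old, update_node, workers=8):
    """One MCMC iteration with one-iteration-old assignments: every node
    is updated against b_old and M_old, then the new assignments are
    gathered and the inter-block edge count matrix rebuilt."""
    N, B = len(b_old), M_old.shape[0]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        b_new = np.fromiter(
            pool.map(lambda i: update_node(i, b_old, A, M_old), range(N)),
            dtype=int, count=N)
    Gamma = np.eye(B, dtype=int)[b_new]       # one-hot assignment matrix
    M_new = Gamma.T @ A @ Gamma               # aggregate the new edge counts
    return b_new, M_new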

C. Batch Updates Using Matrix Operations

Given an efficient parallelized implementation of large-scale matrix operations, one may consider carrying out Algorithm 1 as much as possible with batch computation using matrix operations [27]. Such matrix operations in practice perform parallel computation across all nodes simultaneously.

Under this computation paradigm, the block assignments are represented as a sparse $N \times B$ binary matrix $\Gamma$, where each row $\Gamma_{i\bullet}$ is an indicator vector with a value of one at the block it is assigned to and zeros everywhere else. This representation results in simple matrix products for the inter-block edge counts:

$$M = \Gamma^T A \Gamma \qquad (5)$$

The contributions of node $i$ with block assignment $r$ to row $r$ and column $r$ of the inter-block edge count matrix are:

$$\Delta M^+_{\mathrm{row},i\bullet} = A_{i\bullet} \Gamma \qquad (6)$$

$$\Delta M^+_{\mathrm{col},i\bullet} = A^T_{\bullet i} \Gamma \qquad (7)$$

These contributions are needed for computing the acceptance probabilities of the nodal block assignment proposals, which makes up a large part of the overall computation requirement.
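A minimal SciPy sketch of these products, with the recommended sparse representation of $\Gamma$:

import numpy as np
from scipy import sparse

def edge_count_products(A, b, B):
    """Inter-block edge counts and per-node contributions (Eqs. 5-7)."""
    N = A.shape[0]
    Gamma = sparse.csr_matrix((np.ones(N), (np.arange(N), b)), shape=(N, B))
    M = Gamma.T @ A @ Gamma      # Eq. 5: B x B inter-block edge counts
    dM_row = A @ Gamma           # Eq. 6, all nodes at once: row i is A_i. Gamma
    dM_col = A.T @ Gamma         # Eq. 7, all nodes at once: row i is A_.i^T Gamma
    return M, dM_row, dM_col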

Algorithm 2 in Appendix B is a batch implementation of the nodal updates described in Algorithm 1. The inter-block edge counts under each of the $N$ proposals are represented using a 3D matrix $M^+$ of size $N \times B \times B$. For clarity, computations of the acceptance probabilities involving the inter-block edge counts and degrees are specified using tensor notation. Note that much of this computation may be avoided with clever implementations. For example:

• If the proposed block assignment for a node is the same as its previous assignment, its acceptance probability does not need to be computed.

• New proposals only change two rows and columns of the inter-block edge count matrix, corresponding to moving the counts from the old block to the new block, so most of the entries in $M^+$ are simply copies of $M^-$.

• The inter-block edge count matrix should be sparse, especially when there is a large number of communities, since most communities do not interact with one another. This gives additional opportunity for speeding up operations on this matrix.

• Similarly, each node is likely to connect with only a few different communities (i.e. blocks). Therefore, changes by each nodal proposal on the inter-block edge count matrix will only involve a few selected rows and columns. Limiting the computation of the change in log posterior, $\Delta S$, to these rows and columns may result in significant computation speedup.

V. METRICS

An essential part of this graph challenge is a canonical set of metrics for comprehensive evaluation of the partition algorithm implementation by each participating team. The evaluation should report both the correctness of the partitions produced and the computational requirements, efficiency, and complexity of the implementations. For streaming graphs, evaluation should be done at each stage of the streaming processing: for example, the length of time it took for the algorithm to finish processing the graph after the first two parts of the graph became available, and the correctness of the output partition on the parts available so far. Efficient implementations of the partition algorithm leverage partitions from previous stages of the streaming graph to "jump start" the partition at the current stage.

A. Correctness Metrics

The true partition of the graph is available in this challenge, since the graph is generated with a stochastic block structure, as described in Section II. Therefore, the correctness of the output partition produced by the algorithm implementation can be evaluated against the true partition. On the hybrid graphs, where a generated graph is embedded within a real-world graph with no available true partition, correctness is only evaluated on the generated part.

Evaluation of the output partition (i.e. clustering) against the true partition is well established in the existing literature, and a good overview can be found in [28]. Widely-adopted metrics fall under three general categories: unit counting, pair-wise counting, and information theoretic metrics. The challenge in this paper adopts all of them for comprehensiveness and recommends the pairwise precision-recall as the primary correctness metric for its holistic evaluation and intuitive interpretation. Computation of the correctness metrics described in this section is implemented in Python and shared as a resource for the participants at GraphChallenge.org. Table I provides a simple example to demonstrate each metric, where each cell in row $i$ and column $j$ is the count of nodes belonging to truth block $i$ and reported in output block $j$.

TABLE I
CONTINGENCY TABLE OF TRUE VS. OUTPUT PARTITION

          Output A   Output B   Output C   Total
Truth A      30          2          0        32
Truth B       1         20          3        24
Total        31         22          3        56

In this example, the nodes are divided into two blocks in the true partition, but divided into three blocks in the output partition. Therefore, this is an example of over-clustering (i.e. too many blocks). The diagonal cells represent the nodes that are correctly partitioned, whereas the off-diagonal cells represent the nodes with some kind of partition error.


1) Unit Counting Metrics: The most intuitive metric is perhaps the overall accuracy, specifically the percentage of nodes correctly partitioned. This is simply the fraction of the total count that belongs to the diagonal entries of the contingency table, after the truth blocks and the output blocks have been optimally associated to maximize the diagonal entries, typically using a linear assignment algorithm [29]. In this example, the overall accuracy is simply $50/56 \approx 89\%$. While this single number provides an intuitive overall score, it does not account for the types and distribution of errors. For example, truth block B in Table I has three nodes incorrectly split into output block C. If instead these three nodes were split one-by-one into output blocks C, D, and E, a worse case of over-clustering would have taken place. The overly simplified accuracy cannot make this differentiation.

A way to capture more detail on the types and distribution of errors is to report block-wise precision-recall. Block-wise precision is the fraction of correctly identified nodes for each output block (e.g. Precision(Output A) $= 30/31$) and block-wise recall is the fraction of correctly identified nodes for each truth block (e.g. Recall(Truth B) $= 20/24$). The block-wise precision-recall present an intuitive score for each of the truth and output blocks, and can be useful for diagnosing the block-level behavior of the implementation. However, it does not provide a global measure of correctness.
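A minimal sketch of the unit counting metrics from the contingency table, using SciPy's linear_sum_assignment for the optimal block association:

import numpy as np
from scipy.optimize import linear_sum_assignment

C = np.array([[30, 2, 0],
              [1, 20, 3]])                   # contingency counts from Table I

rows, cols = linear_sum_assignment(-C)       # associate truth and output blocks
accuracy = C[rows, cols].sum() / C.sum()     # overall accuracy: 50/56, about 89%

# block-wise metrics for the matched blocks
precision = C[rows, cols] / C[:, cols].sum(axis=0)   # e.g. Output A: 30/31
recall = C[rows, cols] / C[rows, :].sum(axis=1)      # e.g. Truth B: 20/24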

2) Pairwise Counting Metrics: Measuring the level of agreement between the truth and the output partition by considering every pair of nodes has a long history within the clustering community [30], [31]. The basic idea is simple: consider every pair of nodes, which belongs to one of the following four categories: 1) in the same truth block and the same output block, 2) in different truth blocks and different output blocks, 3) in the same truth block but different output blocks, and 4) in different truth blocks but the same output block. Categories 1) and 2) are the cases of agreement between the truth and the output partition, whereas categories 3) and 4) indicate disagreement. An intuitive overall score on the level of agreement is the fraction of all pairs belonging to categories 1) and 2), known as the Rand index [30]. [31] proposes the adjusted Rand index with a correction to account for the expected value of the index by random chance, to provide a fairer metric across different data sets. Categories 4) and 3) can be interpreted as type I (i.e. false positive) and type II (i.e. false negative) errors, if one considers a "positive" case to be where the pair belongs to the same block. The pairwise precision-recall metrics [32] can be computed as:

$$\text{Pairwise-precision} = \frac{\#\text{Category 1}}{\#\text{Category 1} + \#\text{Category 4}} \qquad (8)$$

$$\text{Pairwise-recall} = \frac{\#\text{Category 1}}{\#\text{Category 1} + \#\text{Category 3}} \qquad (9)$$

Pairwise-precision considers all the pairs reported as belonging to the same output block and measures the fraction of them that are correct, whereas pairwise-recall considers all the pairs belonging to the same truth block and measures the fraction of them reported as belonging to the same output block. In the example of Table I, the pairwise-precision is about 90% and the pairwise-recall about 81%, which indicates this to be a case of over-clustering with more Type II errors. Although pairwise counting is somewhat arbitrary, it does present a holistic and intuitive measure of the overall level of agreement between the output and the true partition. For the challenge, the pairwise precision-recall will serve as the primary metrics for evaluating the correctness of the output partition.
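The pair counts in Equations 8 and 9 can be computed directly from the contingency table; a minimal sketch:

import numpy as np

def pairwise_precision_recall(C):
    """Pairwise precision-recall (Eqs. 8-9) from a contingency table C."""
    pairs = lambda n: n * (n - 1) // 2            # pairs among n nodes, elementwise
    same_both = pairs(C).sum()                    # Category 1: same truth, same output
    same_output = pairs(C.sum(axis=0)).sum()      # Categories 1 + 4
    same_truth = pairs(C.sum(axis=1)).sum()       # Categories 1 + 3
    return same_both / same_output, same_both / same_truth

For Table I this returns approximately (0.90, 0.81), matching the values above.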

3) Information Theoretic Metrics: In recent years, holistic and rigorous metrics have been proposed based on information theory for evaluating partitions and clusterings [28], [33]. Specifically, these metrics are based on the information content of the partitions, measured in Shannon entropy. Naturally, information theoretic precision-recall metrics can be computed as:

$$\text{Information-precision} = \frac{I(T; O)}{H(O)} \qquad (10)$$

$$\text{Information-recall} = \frac{I(T; O)}{H(T)} \qquad (11)$$

where $I(T; O)$ is the mutual information between the truth partition $T$ and the output partition $O$, and $H(O)$ is the entropy (i.e. information content) of the output partition. Using the information theoretic measures, precision is defined as the fraction of the output partition information that is true, and recall is defined as the fraction of the truth partition information captured by the output partition. In the example of Table I, the information theoretic precision is about 57% and the recall about 71%. The precision is lower than the recall because the extra block in the output partition introduces information content that does not correspond to the truth. The information theoretic precision-recall provides a rigorous and comprehensive measure of the correctness of the output partition. However, the information theoretic quantities may not be as intuitive to some, and the metrics tend to be harsh, as even a small number of errors often lowers the metrics significantly.
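A minimal sketch of the information theoretic metrics from the contingency table (entropies in nats; the ratios are independent of the logarithm base):

import numpy as np

def information_precision_recall(C):
    """Information theoretic precision-recall (Eqs. 10-11)."""
    P = C / C.sum()                              # joint distribution over (truth, output)
    pt, po = P.sum(axis=1), P.sum(axis=0)        # marginals
    nz = P > 0
    I = (P[nz] * np.log(P[nz] / np.outer(pt, po)[nz])).sum()  # mutual information I(T;O)
    H_T = -(pt * np.log(pt)).sum()               # entropy of the truth partition
    H_O = -(po * np.log(po)).sum()               # entropy of the output partition
    return I / H_O, I / H_T

For Table I this yields approximately 57% precision and 71% recall, as stated above.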


B. Computational Metrics

The following metrics should be reported by the challenge participants to characterize the computational requirements of their implementations.

• Total number of edges in the graph ($E$): This measures the amount of data processed.

• Execution time: The total amount of time taken for the implementation to complete the partition, in seconds.

• Rate: This metric measures the throughput of the implementation, as the number of edges processed over the execution time ($E$/second). Figure 4 shows preliminary results on this metric for four different implementations of the partition algorithm, run on a desktop with 16-core 2.4 GHz Intel Xeon processors and 128 GB of 1066 MHz DDR3 SDRAM. The four implementations are: 1) a C++ sequential implementation, 2) a C++ parallel implementation [22], 3) a Python sequential implementation without sparse matrices, and 4) a Python sequential implementation with sparse matrices. Since the algorithm complexity is super-linear, the rate drops as the size of the graph increases, with a slope matching the change in rate according to the analytical complexity of the algorithm, $O(E \log^2 E)$.

Fig. 4. Processing rate for four different implementations of the baseline algorithm across graphs of increasing size. Overall, the slope of the rates follows the complexity of the algorithm, $O(E \log^2 E)$.

The C++ implementation is about an order of magnitude faster than the Python implementation. With parallel updates, the C++ implementation gains another order of magnitude in rate when the graph is large enough. The Python implementation without sparse matrices suffers in performance on larger graphs due to the inefficiency of the dense matrix representation. The Python implementation with sparse matrices attempts to address this issue, but it runs very slowly due to the lack of a fast implementation of sparse matrices in Python. All four implementations are available at GraphChallenge.org.

• Energy consumption in watts: The total energy consumption of the computation.

• Rate per energy: This metric captures the throughput achieved per unit of energy consumed, measured in $E$/second/Watt.

• Memory requirement: The amount of memory required to execute the implementation.

• Processor requirement: The number and type of processors used to execute the implementation.

C. Implementation Complexity Metric

• Total lines-of-code count: This measures the complexity of the implementation. SCLC [34] and CLOC [35] are open source line counters that can be used for this metric. The Python demonstration code for this challenge has a total of 569 lines. The C++ open source implementation is part of a bigger package, so it is difficult to count the lines for just the graph partition.

VI. SUMMARY

This paper gives a detailed description of the graph partition challenge, its statistical foundation in the stochastic blockmodels, and comprehensive metrics to evaluate the correctness, computational requirements, and complexity of the competing algorithm implementations. This paper also recommends strategies for massively parallelizing the computation of the algorithm in order to achieve scalability for large graphs. Theoretical arguments for the correctness of the parallelization are also given. Our hope is that this challenge will provide a helpful resource to advance state-of-the-art performance and foster community collaboration on the important and challenging problem of graph partition on large graphs. Data sets and source code for the algorithm, as well as metrics, with detailed documentation, are available at GraphChallenge.org.

VII. ACKNOWLEDGMENT

The authors would like to thank Trung Tran, Tom Salter, David Bader, Jon Berry, Paul Burkhardt, Justin Brukardt, Chris Clarke, Kris Cook, John Feo, Peter Kogge, Chris Long, Jure Leskovec, Richard Murphy, Steve Pritchard, Michael Wolfe, Michael Wright, and the entire GraphBLAS.org community for their support and helpful suggestions. Also, the authors would like to recognize Ryan Soklaski, John Griffith, and Philip Tran for their help on the baseline algorithm implementation, as well as Benjamin Miller for his feedback on the matrix-based parallelism.


REFERENCES

[1] Richard C Murphy, Kyle B Wheeler, Brian W Barrett, and James A Ang. Introducing the Graph 500. Cray Users Group (CUG), 2010.

[2] Patrick Dreher, Chansup Byun, Chris Hill, Vijay Gadepally, Bradley Kuszmaul, and Jeremy Kepner. PageRank pipeline benchmark: Proposal for a holistic system benchmark for big-data platforms. In Parallel and Distributed Processing Symposium Workshops, 2016 IEEE International, pages 929-937. IEEE, 2016.

[3] Yu Jin and Joseph F JaJa. A high performance implementation of spectral clustering on CPU-GPU platforms. In 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, pages 825-834. IEEE, 2016.

[4] Hiroki Kanezashi and Toyotaro Suzumura. An incremental local-first community detection method for dynamic graphs. In 2016 IEEE International Conference on Big Data, pages 3318-3325. IEEE, 2016.

[5] David A Bader, Henning Meyerhenke, Peter Sanders, and Dorothea Wagner. Graph Partitioning and Graph Clustering, volume 588. American Mathematical Soc., 2013.

[6] Siddharth Samsi, Vijay Gadepally, Michael Hurley, Michael Jones, Edward Kao, Sanjeev Mohindra, Paul Monticciolo, Albert Reuther, Steven Smith, William Song, Diane Staheli, and Jeremy Kepner. Subgraph isomorphism graph challenge. In preparation.

[7] Santo Fortunato. Community detection in graphs. Physics Reports, 486(3):75-174, 2010.

[8] Mark EJ Newman. Finding community structure in networks using the eigenvectors of matrices. Physical Review E, 74(3):036104, 2006.

[9] Mark EJ Newman. Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 103(23):8577-8582, 2006.

[10] Peter J Mucha, Thomas Richardson, Kevin Macon, Mason A Porter, and Jukka-Pekka Onnela. Community structure in time-dependent, multiscale, and multiplex networks. Science, 328(5980):876-878, 2010.

[11] Andrea Lancichinetti and Santo Fortunato. Limits of modularity maximization in community detection. Physical Review E, 84(6):066122, 2011.

[12] Benjamin H Good, Yves-Alexandre de Montjoye, and Aaron Clauset. Performance of modularity maximization in practical contexts. Physical Review E, 81(4):046106, 2010.

[13] Brian Ball, Brian Karrer, and Mark EJ Newman. An efficient and principled method for detecting communities in networks. Physical Review E, 84:036103, 2011.

[14] Tiago P Peixoto. Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models. Physical Review E, 89(1):012804, 2014.

[15] Tiago P Peixoto. Parsimonious module inference in large networks. Physical Review Letters, 110(14):148701, 2013.

[16] Tiago P Peixoto. Entropy of stochastic blockmodel ensembles. Physical Review E, 85(5):056122, 2012.

[17] Brian Karrer and Mark EJ Newman. Stochastic blockmodels and community structure in networks. Physical Review E, 83(1):016107, 2011.

[18] Furong Huang, UN Niranjan, M Hakeem, and Animashree Anandkumar. Fast detection of overlapping communities via online tensor methods. arXiv preprint arXiv:1309.0787, 2013.

[19] Edoardo M Airoldi, David M Blei, Stephen E Fienberg, and Eric P Xing. Mixed membership stochastic blockmodels. Journal of Machine Learning Research, 9:1981-2014, 2008.

[20] Steven Thomas Smith, Edward K Kao, Kenneth D Senne, Garrett Bernstein, and Scott Philips. Bayesian discovery of threat networks. IEEE Transactions on Signal Processing, 62(20):5324-5338, 2014.

[21] Nesreen K Ahmed, Jennifer Neville, and Ramana Kompella. Network sampling: From static to streaming graphs. ACM Transactions on Knowledge Discovery from Data (TKDD), 8(2):7, 2014.

[22] Tiago P Peixoto. Graph-tool repository. https://git.skewed.de/count0/graph-tool/tree/master, 2014.

[23] Albert-Laszlo Barabasi. Scale-free networks: a decade and beyond. Science, 325(5939):412-413, 2009.

[24] William H Press, Saul A Teukolsky, William T Vetterling, and Brian P Flannery. Numerical Recipes in C, volume 2. Cambridge Univ Press, 1982.

[25] Alexander Terenin, Daniel Simpson, and David Draper. Asynchronous Gibbs sampling. arXiv preprint arXiv:1509.08999, 2015.

[26] Christopher De Sa, Kunle Olukotun, and Christopher Re. Ensuring rapid mixing and low bias for asynchronous Gibbs sampling. arXiv preprint arXiv:1602.07415, 2016.

[27] Jeremy Kepner and John Gilbert. Graph Algorithms in the Language of Linear Algebra. SIAM, 2011.

[28] Marina Meila. Comparing clusterings: an information based distance. Journal of Multivariate Analysis, 98(5):873-895, 2007.

[29] Harold W Kuhn. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2(1-2):83-97, 1955.

[30] William M Rand. Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66(336):846-850, 1971.

[31] Lawrence Hubert and Phipps Arabie. Comparing partitions. Journal of Classification, 2(1):193-218, 1985.

[32] Arindam Banerjee, Chase Krumpelman, Joydeep Ghosh, Sugato Basu, and Raymond J Mooney. Model-based overlapping clustering. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pages 532-537. ACM, 2005.

[33] Ryan S Holt, Peter A Mastromarino, Edward K Kao, and Michael B Hurley. Information theoretic approach for performance evaluation of multi-class assignment systems. In SPIE Defense, Security, and Sensing, pages 76970R-76970R. International Society for Optics and Photonics, 2010.

[34] Brad Appleton. Source code line counter. http://www.bradapp.com/clearperl/sclc.html.

[35] Al Danial. Count lines of code. https://github.com/AlDanial/cloc, 2017.


APPENDIX A: PARTITION ALGORITHM PSEUDOCODE

Algorithm 1: Block Assignment Update At Each Node i

input : $b^-_i$, $b^-_{N_i}$: current block labels for node $i$ and its neighbors $N_i$
        $M^-$: current $B \times B$ inter-block edge count matrix
        $A_{i N_i}$, $A_{N_i i}$: edges between $i$ and all its neighbors
output: $b^+_i$: the new block assignment for node $i$

// propose a block assignment
obtain the current block assignment $r = b^-_i$
draw a random edge of $i$ which connects with a neighbor $j$; obtain its block assignment $u = b^-_j$
draw a uniform random variable $x_1 \sim \mathrm{Uniform}(0, 1)$
if $x_1 \le \frac{B}{d^-_u + B}$ then
    // with some probability, propose randomly for exploration
    propose $b^+_i = s$ by drawing $s$ randomly from $\{1, 2, \ldots, B\}$
else
    // otherwise, propose by multinomial draw from the blocks neighboring $u$
    propose $b^+_i = s$ from $\mathrm{MultinomialDraw}\left(\frac{M^-_{u\bullet} + M^-_{\bullet u}}{d^-_u}\right)$
end
// accept or reject the proposal
if $s = r$ then
    return $b^+_i = b^-_i$  // proposal is the same as the old assignment; done!
else
    compute $M^+$ under the proposal (update only rows and columns $r$ and $s$, on entries for blocks connected to $i$)
    compute the proposal probabilities for the Hastings correction:
        $p_{r \to s} = \sum_{t \in \{b^-_{N_i}\}} K_{it} \left[ \frac{M^-_{ts} + M^-_{st} + 1}{d^-_t + B} \right]$ and $p_{s \to r} = \sum_{t \in \{b^-_{N_i}\}} K_{it} \left[ \frac{M^+_{tr} + M^+_{rt} + 1}{d^+_t + B} \right]$
    compute the change in log posterior ($t_1$ and $t_2$ only need to cover rows and columns $r$ and $s$):
        $\Delta S = \sum_{t_1, t_2} \left[ -M^+_{t_1 t_2} \log\left( \frac{M^+_{t_1 t_2}}{d^+_{t_1,\mathrm{out}} d^+_{t_2,\mathrm{in}}} \right) + M^-_{t_1 t_2} \log\left( \frac{M^-_{t_1 t_2}}{d^-_{t_1,\mathrm{out}} d^-_{t_2,\mathrm{in}}} \right) \right]$
    compute the probability of acceptance:
        $p_{\mathrm{accept}} = \min\left[ \exp(-\beta \Delta S) \, \frac{p_{s \to r}}{p_{r \to s}}, \, 1 \right]$
    draw a uniform random variable $x_3 \sim \mathrm{Uniform}(0, 1)$
    if $x_3 \le p_{\mathrm{accept}}$ then
        return $b^+_i = s$  // accept the proposal
    else
        return $b^+_i = r$  // reject the proposal
    end
end


APPENDIX B: MATRIX-BASED BATCH UPDATE PSEUDOCODE

Algorithm 2: Batch Assignment Update for All Nodes

input : $\Gamma^-$: current block assignment matrix for all nodes
        $M^-$: current $B \times B$ inter-block edge count matrix
        $A$: graph adjacency matrix
output: $\Gamma^+$: new block assignments for all nodes

// propose new block assignments
compute node degrees: $k = (A + A^T)\mathbf{1}$
compute block degrees: $d^-_\mathrm{out} = M^- \mathbf{1}$ ; $d^-_\mathrm{in} = M^{-T} \mathbf{1}$ ; $d^- = d^-_\mathrm{out} + d^-_\mathrm{in}$
compute the probability for drawing each neighbor: $P_\mathrm{Nbr} = \mathrm{RowDivide}(A + A^T, k)$
draw neighbors ($\mathrm{Nbr}$ is a binary selection matrix): $\mathrm{Nbr} = \mathrm{MultinomialDraw}(P_\mathrm{Nbr})$
compute the probability of a uniform random proposal: $p_\mathrm{UnifProp} = \frac{B}{\mathrm{Nbr}\,\Gamma^- d^- + B}$
compute the probability of block transition: $P_\mathrm{BlkTran} = \mathrm{RowDivide}(M^- + M^{-T}, d^-)$
compute the probability of each block transition proposal: $P_\mathrm{BlkProp} = \mathrm{Nbr}\,\Gamma^- P_\mathrm{BlkTran}$
propose new assignments uniformly: $\Gamma_\mathrm{Unif} = \mathrm{UniformDraw}(B, N)$
propose new assignments from the neighborhood: $\Gamma_\mathrm{Nbr} = \mathrm{MultinomialDraw}(P_\mathrm{BlkProp})$
draw $N$ $\mathrm{Uniform}(0, 1)$ random variables $x$
compute which proposal to use for each node: $I_\mathrm{UnifProp} = x \le p_\mathrm{UnifProp}$
select the block assignment proposal for each node:
    $\Gamma_P = \mathrm{RowMultiply}(\Gamma_\mathrm{Unif}, I_\mathrm{UnifProp}) + \mathrm{RowMultiply}(\Gamma_\mathrm{Nbr}, (1 - I_\mathrm{UnifProp}))$

// accept or reject the proposals
compute the change in edge counts by row and column: $\Delta M^+_\mathrm{row} = A \Gamma^-$ ; $\Delta M^+_\mathrm{col} = A^T \Gamma^-$
update the edge count matrix for each proposal (the resulting matrix is $N \times B \times B$):
    $M^+_{ijk} = M^-_{jk} - \Gamma^-_{ij} \Delta M^+_{\mathrm{row},ik} + \Gamma_{P,ij} \Delta M^+_{\mathrm{row},ik} - \Gamma^-_{ik} \Delta M^+_{\mathrm{col},ij} + \Gamma_{P,ik} \Delta M^+_{\mathrm{col},ij}$
update the block degrees for each proposal (the resulting matrices are $N \times B$):
    $D^+_{\mathrm{out},ij} = d^-_{\mathrm{out},j} - \Gamma^-_{ij} \sum_k \Delta M^+_{\mathrm{row},ik} + \Gamma_{P,ij} \sum_k \Delta M^+_{\mathrm{row},ik}$
    $D^+_{\mathrm{in},ij} = d^-_{\mathrm{in},j} - \Gamma^-_{ij} \sum_k \Delta M^+_{\mathrm{col},ik} + \Gamma_{P,ij} \sum_k \Delta M^+_{\mathrm{col},ik}$
compute the proposal probabilities for the Hastings correction ($N \times 1$ vectors):
    $p_{r \to s} = \left[ (\mathrm{Nbr}\,\Gamma^-) \circ (\Gamma_P M^- + \Gamma_P M^{-T} + 1) \circ \mathrm{RepMat}\!\left(\frac{1}{d^- + B}, N\right) \right] \mathbf{1}$
    $p_{s \to r, i} = \left[ (\mathrm{Nbr}\,\Gamma^-) \circ (\Gamma^- M^+_{i\bullet\bullet} + \Gamma^- M^{+T}_{i\bullet\bullet} + 1) \circ \frac{1}{D^+_\mathrm{out} + D^+_\mathrm{in} + B} \right] \mathbf{1}$
compute the change in log posterior (only need to operate on the impacted rows and columns corresponding to $r$, $s$, and the blocks neighboring $i$):
    $\Delta S_i = \sum_{jk} \left[ -M^+_{ijk} \log\left( \frac{M^+_{ijk}}{D^+_{\mathrm{out},ij} D^+_{\mathrm{in},ik}} \right) + M^-_{jk} \log\left( \frac{M^-_{jk}}{d^-_{\mathrm{out},j} d^-_{\mathrm{in},k}} \right) \right]$
compute the probabilities of accepting each proposal ($N \times 1$ vector):
    $p_\mathrm{Accept} = \min\left[ \exp(-\beta \Delta S) \circ p_{s \to r} \circ \frac{1}{p_{r \to s}}, \, \mathbf{1} \right]$
draw $N$ $\mathrm{Uniform}(0, 1)$ random variables $x_\mathrm{Accept}$
compute which proposals to accept: $I_\mathrm{Accept} = x_\mathrm{Accept} \le p_\mathrm{Accept}$
return $\Gamma^+ = \mathrm{RowMultiply}(\Gamma_P, I_\mathrm{Accept}) + \mathrm{RowMultiply}(\Gamma^-, (1 - I_\mathrm{Accept}))$


APPENDIX C: LIST OF NOTATIONS

Below is a list of notations used in this document:

$N$: Number of nodes in the graph

$B$: Number of blocks in the partition

$A$: Adjacency matrix of size $N \times N$, where $A_{ij}$ is the edge weight from node $i$ to $j$

$k$: Node degree vector of $N$ elements, where $k_i$ is the total (i.e. both in and out) degree of node $i$

$K$: Node degree matrix of $N \times B$ elements, where $K_{it}$ is the total number of edges between node $i$ and block $t$

$N_i$: Neighborhood of node $i$, which is a set containing all the neighbors of $i$

$-$: Superscript that denotes any variable from the previous MCMC iteration

$+$: Superscript that denotes any updated variable in the current MCMC iteration

$b$: Block assignment vector of $N$ elements, where $b_i$ is the block assignment for node $i$

$\Gamma$: Block assignment matrix of $N \times B$ elements, where each row $\Gamma_{i\bullet}$ is a binary indicator vector with 1 only at the block node $i$ is assigned to. $\Gamma_P$ is the proposed block assignment matrix.

$M$: Inter-block edge count matrix of size $B \times B$, where $M_{ij}$ is the number of edges from block $i$ to $j$

$M^+$: Updated inter-block edge count matrix for each proposal, of size $N \times B \times B$

$\Delta M^+_{\mathrm{row/col}}$: Row and column updates to the inter-block edge count matrix, for each proposal. This matrix is of size $N \times B$.

$d_\mathrm{in}$: In-degree vector of $B$ elements, where $d_{\mathrm{in},i}$ is the number of edges into block $i$

$d_\mathrm{out}$: Out-degree vector of $B$ elements, where $d_{\mathrm{out},i}$ is the number of edges out of block $i$

$d$: Total edge count vector of $B$ elements, where $d_i$ is the total number of edges into and out of block $i$; $d = d_\mathrm{in} + d_\mathrm{out}$

$D^+_{\mathrm{in/out}}$: In and out edge count matrix for each block, on each proposal. It is of size $N \times B$.

$\Delta S$: The difference in log posterior between the previous block assignment and the new proposed assignment

$\beta$: Learning rate of the MCMC

$p_{r \to s}$: Probability of proposing block $s$ for the node to be updated, which is currently in block $r$

$p_\mathrm{Accept}$: Probability of accepting the proposed block on the node

$P_\mathrm{Nbr}$: Matrix of $N \times N$ elements, where each element $P_{\mathrm{Nbr},ij}$ is the probability of selecting node $j$ when updating node $i$

$\mathrm{Nbr}$: Matrix of $N \times N$ elements, where each row $\mathrm{Nbr}_{i\bullet}$ is a binary indicator vector with 1 only at $j$, indicating that $j$ is selected when updating $i$

$p_\mathrm{UnifProp}$: Vector of $N$ elements representing the probability of a uniform proposal when updating each node

$P_\mathrm{BlkTran}$: Matrix of $B \times B$ elements, where each element $P_{\mathrm{BlkTran},ij}$ is the probability of landing in block $j$ when randomly traversing an edge from block $i$

$P_\mathrm{BlkProp}$: Matrix of $N \times B$ elements, where each element $P_{\mathrm{BlkProp},ij}$ is the probability of proposing block assignment $j$ for node $i$

$\Gamma_\mathrm{Unif}$: Block assignment matrix from the uniform proposal across all blocks. It has $N \times B$ elements, where each row $\Gamma_{\mathrm{Unif},i\bullet}$ is a binary indicator vector with 1 only at the block node $i$ is assigned to

$\Gamma_\mathrm{Nbr}$: Block assignment matrix from the neighborhood proposal. It has $N \times B$ elements, where each row $\Gamma_{\mathrm{Nbr},i\bullet}$ is a binary indicator vector with 1 only at the block node $i$ is assigned to

$I_\mathrm{UnifProp}$: Binary vector of $N$ elements with 1 at each node taking the uniform proposal and 0 at each node taking the neighborhood proposal

$I_\mathrm{Accept}$: Binary vector of $N$ elements with 1 at each node where the proposal is accepted and 0 where the proposal is rejected

$\mathrm{Uniform}(x, y)$: Uniform distribution with range from $x$ to $y$

$\delta_{tk}$: Kronecker delta, which equals 1 if $t = k$ and 0 otherwise

$\mathrm{RowDivide}(A, b)$: Matrix operator that divides each row of matrix $A$ by the corresponding element in vector $b$

$\mathrm{RowMultiply}(A, b)$: Matrix operator that multiplies each row of matrix $A$ by the corresponding element in vector $b$

$\mathrm{UniformDraw}(B, N)$: Uniformly choose an element from $\{1, 2, \ldots, B\}$ as the block assignment, once for each of the $N$ nodes, and return an $N \times B$ matrix where each row $i$ is a binary indicator vector with 1 only at $j$, indicating node $i$ is assigned block $j$

$\mathrm{MultinomialDraw}(P_\mathrm{BlkProp})$: For each row of the proposal probability matrix $P_{\mathrm{BlkProp},i\bullet}$, draw a block according to the multinomial probability vector $P_{\mathrm{BlkProp},i\bullet}$ and return an $N \times B$ matrix where each row $i$ is a binary indicator vector with 1 only at $j$, indicating node $i$ is assigned block $j$