community structure in time-dependent, multiscale, and ... · pdf filecommunity structure in...

Click here to load reader

Post on 29-Jun-2018

212 views

Category:

Documents

0 download

Embed Size (px)

TRANSCRIPT

  • Community Structure inTime-Dependent, Multiscale,and Multiplex NetworksPeter J. Mucha,1,2* Thomas Richardson,1,3 Kevin Macon,1 Mason A. Porter,4,5 Jukka-Pekka Onnela6,7

    Network science is an interdisciplinary endeavor, with methods and applications drawn from acrossthe natural, social, and information sciences. A prominent problem in network science is thealgorithmic detection of tightly connected groups of nodes known as communities. We developed ageneralized framework of network quality functions that allowed us to study the communitystructure of arbitrary multislice networks, which are combinations of individual networks coupledthrough links that connect each node in one network slice to itself in other slices. This frameworkallows studies of community structure in a general setting encompassing networks that evolve overtime, have multiple types of links (multiplexity), and have multiple scales.

    Thestudy of graphs, or networks, has a longtradition in fields such as sociology andmathematics, and it is now ubiquitous inacademic and everyday settings. An importanttool in network analysis is the detection ofmesoscopic structures known as communities (orcohesive groups), which are defined intuitively asgroups of nodes that are more tightly connected toeach other than they are to the rest of the network(13). One way to quantify communities is by aquality function that compares the number ofintracommunity edges to what one would expectat random.Given the network adjacencymatrixA,where the element Aij details a direct connectionbetween nodes i and j, one can construct a qual-ity functionQ (4, 5) for the partitioning of nodesinto communities as Q = ij (Aij Pij)d(gi, gj),where d(gi, gj) = 1 if the community assignmentsgi and gj of nodes i and j are the same and 0otherwise, and Pij is the expected weight of theedge between i and j under a specified null model.

    The choice of null model is a crucial con-sideration in studying network community struc-ture (2). After selecting a null model appropriateto the network and application at hand, one canuse a variety of computational heuristics to assignnodes to communities to optimize the quality Q(2, 3). However, such null models have not beenavailable for time-dependent networks; analyseshave instead depended on ad hoc methods to

    piece together the structures obtained at differenttimes (69) or have abandoned quality functionsin favor of such alternatives as the MinimumDescriptionLength principle (10). Although tensordecompositions (11) have been used to clusternetwork data with different types of connections,no quality-function method has been developedfor such multiplex networks.

    We developed a methodology to remove theselimits, generalizing the determination of commu-nity structure via quality functions to multislicenetworks that are defined by coupling multipleadjacency matrices (Fig. 1). The connectionsencoded by the network slices are flexible; theycan represent variations across time, variationsacross different types of connections, or evencommunity detection of the same network atdifferent scales. However, the usual procedure forestablishing a quality function as a direct count ofthe intracommunity edge weight minus that

    expected at random fails to provide any contribu-tion from these interslice couplings. Because theyare specified by common identifications of nodesacross slices, interslice couplings are either presentor absent by definition, so when they do fall insidecommunities, their contribution in the count of intra-community edges exactly cancels that expected atrandom. In contrast, by formulating a null model interms of stability of communities under Laplaciandynamics, we have derived a principled generaliza-tion of community detection to multislice networks,

    REPORTS

    1Carolina Center for Interdisciplinary Applied Mathematics,Department of Mathematics, University of North Carolina,Chapel Hill, NC 27599, USA. 2Institute for Advanced Materials,Nanoscience and Technology, University of North Carolina,Chapel Hill, NC 27599, USA. 3Operations Research, NorthCarolina State University, Raleigh, NC 27695, USA. 4OxfordCentre for Industrial and Applied Mathematics, MathematicalInstitute, University of Oxford, Oxford OX1 3LB, UK. 5CABDyNComplexity Centre, University of Oxford, Oxford OX1 1HP, UK.6Department of Health Care Policy, Harvard Medical School,Boston, MA 02115, USA. 7Harvard Kennedy School, HarvardUniversity, Cambridge, MA 02138, USA.

    *To whom correspondence should be addressed. E-mail:mucha@unc.edu

    1

    2

    3

    4

    Fig. 1. Schematic of amultislice network. Four slicess= {1, 2, 3, 4} represented by adjacencies Aijs encodeintraslice connections (solid lines). Interslice con-nections (dashed lines) are encoded byCjrs, specifyingthe coupling of node j to itself between slices r and s.For clarity, interslice couplings are shown for only twonodes and depict two different types of couplings: (i)coupling between neighboring slices, appropriate forordered slices; and (ii) all-to-all interslice coupling,appropriate for categorical slices.

    node

    s

    resolution parameters

    coupling = 0

    1 2 3 4

    5

    10

    15

    20

    25

    30

    node

    s

    resolution parameters

    coupling = 0.1

    1 2 3 4

    5

    10

    15

    20

    25

    30

    node

    s

    resolution parameters

    coupling = 1

    1 2 3 4

    5

    10

    15

    20

    25

    30

    Fig. 2. Multislice community detection of theZachary Karate Club network (22) across multipleresolutions. Colors depict community assignments ofthe 34 nodes (renumbered vertically to groupsimilarly assigned nodes) in each of the 16 slices(with resolution parameters gs = {0.25, 0.5,, 4}),for w = 0 (top), w = 0.1 (middle), and w =1 (bottom). Dashed lines bound the communitiesobtained using the default resolution (g = 1).

    14 MAY 2010 VOL 328 SCIENCE www.sciencemag.org876

    CORRECTED 16 JULY 2010; SEE LAST PAGE

    on

    July

    6, 2

    012

    www.

    scie

    ncem

    ag.o

    rgDo

    wnlo

    aded

    from

  • with a single parameter controlling the interslicecorrespondence of communities.

    Important to our method is the equivalencebetween themodularity quality function (12) [witha resolution parameter (5)] and stability of com-munities under Laplacian dynamics (13), whichwe have generalized to recover the null models forbipartite, directed, and signed networks (14). First,we obtained the resolution-parameter generaliza-

    tion of Barbers null model for bipartite networks(15) by requiring the independent joint probabilitycontribution to stability in (13) to be conditionalon the type of connection necessary to stepbetween two nodes. Second, we recovered thestandard null model for directed networks (16, 17)(again with a resolution parameter) by generaliz-ing the Laplacian dynamics to include motionalong different kinds of connectionsin this case,

    both with and against the direction of a link. Bythis generalization, we similarly recovered a nullmodel for signed networks (18). Third, weinterpreted the stability under Laplacian dynamicsflexibly to permit different spreading weights onthe different types of links, giving multiple reso-lution parameters to recover a general null modelfor signed networks (19).

    We applied these generalizations to derive nullmodels for multislice networks that extend theexisting quality-function methodology, includingan additional parameter w to control the couplingbetween slices. Representing each network slice sby adjacencies Aijs between nodes i and j, withinterslice couplingsCjrs that connect node j in slicer to itself in slice s (Fig. 1), we have restricted ourattention to unipartite, undirected network slices(Aijs = Ajis) and couplings (Cjrs = Cjsr), but we canincorporate additional structure in the slices andcouplings in the same manner as demonstrated forsingle-slice null models. Notating the strengths ofeach node individually in each slice by kjs =iAijsand across slices by cjs = rCjsr, we define themultislice strength by kjs = kjs + cjs. The continuous-time Laplacian dynamics given by

    pis jrAijsdsr dijCjsrpjr

    kjr pis 1

    respects the intraslice nature of Aijs and theinterslice couplings of Cjsr. Using the steady-stateprobability distribution pjr kjr=2m, where 2m = jrkjr, we obtained the multislice null model interms of the probability ris| jr of sampling node i inslice s conditional on whether the multislice struc-ture allowsone to step from ( j, r) to (i, s), accountingfor intra- and interslice steps separately as

    risj jrpjr

    kis2ms

    kjrkjr

    dsr Cjsrcjr

    cjrkjr

    dij

    ! "kjr2m

    2

    where ms = jkjs. The second term in parentheses,which describes the conditional probability ofmotion between two slices, leverages the definitionof the Cjsr coupling. That is, the conditionalprobability of stepping from ( j, r) to (i, s) alongan interslice coupling is nonzero if and only if i = j,and it is proportional to the probability Cjsr/kjr ofselecting the precise interslice link that connects toslice s. Subtracting this conditional joint probabilityfrom the linear (in time) approximation of theexponential describing the Laplacian dynamics,weobtained a multislice generalization of modularity(14):

    Qmultislice 12m

    ijsr

    h#Aijs gs

    kiskjs2ms

    dsr$

    dijCjsridgis,gjr 3

    where we have used reweighting of the conditionalprobabilities, which allows a different resolution gsin each slice. We have absorbed the resolution pa-rameter for the interslice couplings into the mag-nitude of the elements ofCjsr, which, for simplicity,we presume to take binary values {0,w} indicatingthe absence (0) or presence (w) of interslice links.

    1800 1820 1840 1860 1880 1900 1920 1940 196

View more