
[IEEE 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012) - Istanbul (2012.08.26-2012.08.29)] 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining - Real Time Distributed Community Structure Detection in Dynamic Networks


  • Real Time Distributed Community Structure Detection in Dynamic Networks

    Valerie Galluzzi
    Department of Computer Science
    University of Iowa
    14 MacLean Hall, Iowa City, IA, 52242-2429, USA
    Telephone: (319) 353-2037
    Email: valerie-galluzzi@uiowa.edu

    Abstract: Communities can be observed in many real-world graphs. In general, a community can be thought of as a portion of a graph in which intra-community links are dense while inter-community links are sparse. Automatic community structure detection has been well studied in static graphs. However, many practical applications of community structure involve networks in which communities change dynamically over time. Several methods of detecting the community structure of dynamic graphs have been proposed, but most treat the dynamic graph as a series of static snapshots, which creates unrealistic assumptions. Others require large amounts of computational resources or require knowledge of the dynamic graph from start to finish, relegating them to post-processing. For those who desire real-time community structure detection distributed over the observing network, these solutions are insufficient. This paper proposes a new method of community structure detection which allows for real time distributed detection of community structure.

    I. INTRODUCTION

    Many common phenomena can be modeled as networks. Examples from everyday life include information networks like the internet, transportation networks, social networks both electronic and real-world, and epidemiological contact networks. In many of these networks we observe clusters of nodes which are densely connected while their connections to other nodes are comparatively few. Such clusters are known as communities, and are common features of social networks.

    Automatic detection of communities is essential for some applications. One example is the study of disease diffusion through a social network, where a detailed picture of the underlying community structure could help explain disease spread and guide the deployment of interventions like vaccines or warnings to increase hand washing. Automatically detecting community structure in graphs that are static, or unchanging over time, is well explored, and a survey can be found in [1]. However, communities in social network graphs often change over time, leading to dynamic community structure which is difficult to correctly detect automatically.

    Some algorithms for automatically detecting community structure in dynamic graphs have been proposed. Initial attempts tracked communities rather than detecting them in the network [2], [3], [4]. However, these could not find a newly developed community that appears in the network.

    [Figure 1: three edge sequences (Sequence A, Sequence B, Sequence C) shown over timesteps 1 to 4, each with its static snapshot; every snapshot edge has weight 2.]

    Fig. 1. Edge sequences. All sequences map to the same static snapshot. If edges were weighted by number of times observed, weights would be 2.

    More recent algorithms apply known static community detection algorithms to static graphs created from snapshots of the dynamic network [6], [7]. However, these need some way of capturing a dynamic network in a series of static snapshots, which can be challenging. For instance, each observed sequence of edges in Fig. 1 could map to the same static snapshot, even if we choose to weight edges by the number of contacts. This is despite the fact that the different sequences could mean different outcomes. For instance, in an epidemiological setting, a disease starting at the leftmost node and capable of infecting one more node with each timestep would infect 4 nodes in A, 3 nodes in B, and 5 in C. Should they all map to the same static snapshot? These questions make it difficult to create good mappings to static snapshots.

    Others assume that the entire network is known from the first timestep to the last, and apply post-processing techniques to produce community structure assignments [8]. These methods preclude real time results.
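The snapshot ambiguity can be illustrated with a small sketch. The three edge sequences below are hypothetical (they are not the ones drawn in Fig. 1, which are not recoverable here), and the spreading rule is an assumption: in each timestep, a node infected before that timestep infects its partner on every active edge. Under those assumptions the sequences collapse to an identical weighted snapshot yet yield different outbreak sizes.

```python
from collections import Counter

# Hypothetical sequences on a 5-node path 0-1-2-3-4 (NOT the ones in Fig. 1).
# Each sequence lists the set of edges observed at timesteps 1..4.
E1, E2, E3, E4 = (0, 1), (1, 2), (2, 3), (3, 4)
SEQUENCES = {
    "A": [{E1, E4}, {E2, E4}, {E1, E3}, {E2, E3}],
    "B": [{E2, E3}, {E3, E4}, {E1, E4}, {E1, E2}],
    "C": [{E1, E3}, {E2, E4}, {E1, E3}, {E2, E4}],
}


def static_snapshot(sequence):
    """Collapse a temporal edge sequence into edge -> number of observations."""
    return Counter(edge for step in sequence for edge in step)


def infected_count(sequence, source=0):
    """Count infected nodes, allowing one hop of spread per timestep (assumed rule)."""
    infected = {source}
    for step in sequence:
        before = set(infected)  # only nodes infected before this step can transmit
        for u, v in step:
            if u in before or v in before:
                infected.update((u, v))
    return len(infected)


snapshots = {name: static_snapshot(seq) for name, seq in SEQUENCES.items()}
counts = {name: infected_count(seq) for name, seq in SEQUENCES.items()}
```

All three snapshots assign weight 2 to every edge, while `counts` differs per sequence, which is the information a snapshot-based method would lose.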

    However, there exist some applications for which community structure detection should happen in real time, ideally in a distributed fashion, a situation which current methods cannot accommodate. An example is the University of Iowa Computational Epidemiology group's deployment of sensors in the University of Iowa Hospital [9]. Sensors are small radio-equipped devices, and in this case they are attached to people and objects. A contact between two sensors is detected when their radios are able to communicate with each other, and so a network is created by which health care worker movements and interactions can be observed through the contacts of their attached sensors.

    2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining

    978-0-7695-4799-2/12 $26.00 2012 IEEE, DOI 10.1109/ASONAM.2012.213

    An ideal application of community structure detection in this setting would be assessments of individual risk for infection given travel between and within health care worker communities. If this assessment occurred in post-processing, interventions would be difficult to apply on an individualized basis, as the collected sensor data is anonymized to protect worker privacy. A centralized real time approach would depend upon good connections between sensors and a centralized community structure detector, which may be difficult to ensure given that the sensors are frequently moving. A better arrangement would be risk levels which are calculated in real time on a health care worker's sensor based on their travel between communities and then displayed to the worker, avoiding privacy concerns as well as connectivity and post-processing issues.

    Another application would be a system that delivers ads to mobile devices. Such a system might wish to differentiate those who frequent a location from those who are passing through, in order to deliver different content. This could be accomplished by determining whether or not the user is a member of a local community. If community assessment were real time and distributed, then ads could be delivered immediately and the ad system could be deployed without worrying about connectivity to a master community structure detector.

    These applications require real time assessment of community structure in a distributed fashion, which cannot be realized by current community structure detection algorithms. This paper aims to introduce such an algorithm.

    II. ALGORITHM

    The following section describes our algorithm, which is lightweight and customizable. It is inspired by the static network community detection algorithm proposed in [10].

    A. Definitions

    A dynamic graph $G$ is a series of graphs $G = G_0, G_1, \ldots, G_T$, where $G_t(V, E_t)$ is the graph with vertex set $V$ and edge set $E_t$ consisting of the edges at time step $t \in \{0, 1, \ldots, T\}$. Define $c_i$ as the one community of which $i$ is a member. Let $h$ be the maximum number of contacts $i$ will remember. Let a history $H_i$ be the list of the community affiliations of node $i$'s $h$ most recent contacts, each recorded at the time $i$ made contact with them.

    Let $d$ be a parameter such that the most recent contact in $H_i$ has an importance of $h$, the next most recent has an importance of $\max(h - d, 1)$, and the $k$th most recent contact in $H_i$ (counting the most recent as $k = 0$) has an importance of $\max(h - dk, 1)$. Let $p \gg h$ be a number such that $1/p$ is the probability that the node will randomly relabel itself by changing its community to its own ID. Let $t_r$ be the number of contacts that will be notified of relabeling.
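As a quick check of the importance definition, the sketch below evaluates max(h - dk, 1) for each position in a history, with k = 0 taken to be the most recent contact (an indexing assumption that makes the first two importances h and max(h - d, 1)). The function name and the `history_len` argument are illustrative only.

```python
def importance_weights(h, d, history_len=None):
    """Return importances for a history, most recent contact first.

    Position k (k = 0 is the most recent contact) gets max(h - d*k, 1).
    history_len lets the history be shorter than h; it is capped at h.
    """
    n = h if history_len is None else min(history_len, h)
    return [max(h - d * k, 1) for k in range(n)]


weights = importance_weights(h=5, d=2)  # [5, 3, 1, 1, 1]
```

With h = 5 and d = 2, the importance drops by 2 per step and then stays at the floor of 1, so older contacts still count but only minimally.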

    TABLE I
    ALGORITHM FOR EACH NODE

    Require: $H_i$ is the list of the community labels of $i$'s $h$ most recent contacts at time of contact
    Require: $r_i$ = True if $i$ is currently in a relabeling phase, False otherwise
    Require: $rc_i$ is the number of contacts for which $i$ has been in the relabeling phase
    Require: $h$, $d$, $p$, $t_r$ defined as described in II-A

    FindCommunity($i$: Self ID, $j$: Connected Node):
      if $|H_i| \geq h$ then
        Remove the oldest contact in $H_i$
      end if
      $H_i = H_i + c_j$
      if $\neg r_i$ then
        With probability $1/p$:
          Let $a_i = c_i$ and $b_i = i$
          Relabel $c_i$ with $b_i$
          $rc_i = 0$
          $r_i$ = True
      end if
      if $r_i$ then
        Replace all instances of $a_i$ in $H_i$ with $b_i$
      end if
      if $r_j$ then
        Replace all instances of $a_j$ in $H_i$ with $b_j$
      end if
      Let $L_i$ be a list of all communities mentioned in $H_i$
      Let all communities in $L_i$ have weight 1
      for $l_k$ in $H_i$ do
        Add $\max(h - dk, 1)$ to the community $l_k$ in list $L_i$
      end for
      $c_i$ = the highest weight community in $L_i$
      if $rc_i = t_r$ then
        $r_i$ = False
      end if
      if $r_i$ then
        $rc_i = rc_i + 1$
      end if
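The per-node procedure of Table I can be sketched in Python as follows. This is one possible reading, not the author's implementation: the `Node` class and its field names are hypothetical, the relabeling trigger is assumed to apply only when the node is not already in a relabeling phase, the history is assumed to be trimmed once it holds h entries, and ties in the weighted vote are broken by the oldest label in the history, which the source does not specify.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical container for the per-node state named in Table I."""
    node_id: int
    community: int                                    # c_i
    history: List[int] = field(default_factory=list)  # H_i, oldest first
    relabeling: bool = False                          # r_i
    relabel_count: int = 0                            # rc_i
    old_label: Optional[int] = None                   # a_i
    new_label: Optional[int] = None                   # b_i


def find_community(i, j, h, d, p, t_r, rng=random):
    """Update node i after a contact with node j and return i's community."""
    # Keep at most h remembered contacts, then record j's current label.
    if len(i.history) >= h:
        i.history.pop(0)
    i.history.append(j.community)

    # Assumed reading: only a node outside a relabeling phase may start one.
    if not i.relabeling and rng.random() < 1.0 / p:
        i.old_label, i.new_label = i.community, i.node_id
        i.community = i.new_label
        i.relabel_count = 0
        i.relabeling = True

    # Apply relabelings announced by either endpoint to i's history.
    for node in (i, j):
        if node.relabeling:
            i.history = [node.new_label if c == node.old_label else c
                         for c in i.history]

    # Weighted vote: base weight 1, plus max(h - d*k, 1) per history entry,
    # where k = 0 is the most recent contact.
    weights = {c: 1 for c in i.history}
    for k, label in enumerate(reversed(i.history)):
        weights[label] += max(h - d * k, 1)
    # Tie-break (assumption): first label in history order wins.
    i.community = max(weights, key=weights.get)

    # End the relabeling phase after t_r notified contacts.
    if i.relabel_count == t_r:
        i.relabeling = False
    if i.relabeling:
        i.relabel_count += 1
    return i.community
```

Passing `rng` explicitly keeps the random relabeling step reproducible; any object with a `random()` method returning a float in [0, 1) works. In a deployment each sensor would hold only its own `Node` state and read the contact's label and relabeling fields during the radio exchange.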

    B. Algorithm

    Observe the following:

    1) Individuals are more likely to interact with fellow community members than with members of other communities.
    2) Interactions are captured as edges between two individuals.

    Let the set $E_i$ be all edges with $i$ as an endpoint in the subset $S_{b,e} = G_b, G_{b+1}, \ldots, G_e$, where $[b, e]$ is the interval of interest. Given these two observations, we can see that if in set $E_i$ node $i$ interacts (shares edges) mostly with members of community $l$, then it is probably a member of community $l$ during that time. If individuals are known to change community infrequently, then the current community association of the individual is likely to be predictive of future community association.

    Using these insights, we can estimate community membership on an individual basis in real time using only local information with the algorithm shown in Table I.
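The intuition behind the two observations can be shown with a simplified majority vote. The sketch below is not the Table I algorithm: it assumes the neighbors' community labels are already known, and the function name, the `(t, u, v)` edge tuples, and the `labels` mapping are all hypothetical. It only illustrates estimating a node's community from the edges in its set over an interval [b, e].

```python
from collections import Counter


def estimate_community(node, timed_edges, labels, b, e):
    """Guess node's community from its contacts during timesteps b..e.

    timed_edges: iterable of (t, u, v) tuples, one per observed edge.
    labels: mapping from node to its (assumed known) community label.
    Returns the most frequent neighbor community, or None without contacts.
    """
    votes = Counter()
    for t, u, v in timed_edges:
        if b <= t <= e and node in (u, v):
            other = v if u == node else u
            votes[labels[other]] += 1
    return votes.most_common(1)[0][0] if votes else None


edges = [(0, "x", "a"), (1, "x", "b"), (2, "c", "x"), (3, "x", "d")]
labels = {"a": 1, "b": 1, "c": 2, "d": 2}
early = estimate_community("x", edges, labels, b=0, e=2)
late = estimate_community("x", edges, labels, b=2, e=3)
```

Here node "x" looks like a member of community 1 over timesteps 0 to 2 and of community 2 over timesteps 2 to 3, showing how the estimate follows the interval of interest.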

    C. Analysis

    1) Runtime: If this algorithm were simulated using the algorithm in Table II, it would run in $O(TEh)$ time. However, the $T$ and $E$ figures are not relevant to the individual node


  • TABLE II
    ALGORITHM FOR SIMULATION

    Set $c_i = i$ $\forall i$ {Initialize each node's community to be its ID}
    Set $H_i = \{i\}$ $\forall i$ {Initialize each node's histo
