

Community Detection in Multiplex Social Networks

Hung T. Nguyen, Thang N. Dinh
Department of Computer Science
Virginia Commonwealth University
Richmond, Virginia
Email: {hungnt, tndinh}@vcu.edu

Tam Vu
Department of Computer Science and Engineering
University of Colorado, Denver
Denver, Colorado
Email: [email protected]

Abstract: Community detection has long been an important problem in network analysis. Although a large number of methods have been proposed, none of them directly addresses multiplex Online Social Networks (OSNs), in which a user can have multiple accounts in different networks. In this paper, we propose and compare two classes of approaches, named the Unifying Approach and the Coupling Approach, for community detection in multiplex OSNs. Moreover, we develop a specialized NMF-based algorithm for each class. For testing purposes, we extend the LFR benchmark to generate multiplex OSNs. Our extensive experiments show a significant improvement of our methods over the naive approach of finding the community structure (CS) in each network separately.

I. INTRODUCTION

Online social networks (OSNs) have become ubiquitous in everyday life. Many popular OSNs, such as Twitter and Google+, now have millions of users, or even billions of users, as in the case of Facebook [1]. Despite their distinct natures, social networks exhibit several common topological properties, such as the small-world property [2], the scale-free phenomenon [3] and the crucial feature known as community structure (CS) [4].

Communities can be defined intuitively as groups of nodes that are more densely connected to each other than to the rest of the network. For example, a community in Facebook may correspond to a group of users who share a common interest, such as cooking, fashion or music. The goal of community detection, consequently, is to partition a network into meaningful groups of nodes. It therefore lends itself to a wealth of applications, such as forwarding and routing strategies in communication networks [5], [6]. Such structures give us insight into how network function and topology affect each other. A large number of methods have been proposed for community detection (see [7] and the references therein); however, there is no study of the problem in multiplex OSNs, where a user can participate in multiple networks at once.

In multiplex social networks, the participation of users across multiple networks requires us to analyze all the networks (also referred to as layers) simultaneously. The connections in a layer may reveal latent relationships in other layers and, therefore, provide additional information to unveil the underlying structure of those networks. As illustrated in Fig. 1 for a set of users in Facebook, Twitter and LinkedIn, intuition suggests two communities, {1,2} and {3,4,5,6,7}. However, if we find the CS in each layer separately, we obtain several distinct CSs, which usually differ from the intuitive one.

Community detection in multiplex OSNs poses several challenges. First, multiplex OSNs are often heterogeneous, i.e., their layers can be directed or undirected, weighted or unweighted, or have different degree densities. Moreover, the diverse topological wirings of the networks make the problem very complicated.

[Figure 1: three panels labeled Facebook, Twitter and LinkedIn, each showing the connections among the numbered users in that network.]

Fig. 1. A toy example of 7 users participating in three OSNs, namely Facebook, Twitter and LinkedIn. If we analyze each layer separately, node 3 can be grouped with nodes 1 and 2 or with nodes 4 and 5 in the Facebook network. However, with the information from Twitter, we can surely assign node 3 to the same community as nodes 1 and 2. From LinkedIn, we obtain one more piece of structural information: nodes 3, 4, 5, 6, 7 should be in the same group.

Despite a large amount of research on CS detection, CS in multiplex OSNs remains largely unaddressed. The closest works are those on CS detection in multi-relational networks [8]; however, these methods cannot be applied directly to multiplex social networks. The reason lies in a unique feature of multiplex OSNs: they have only one entity type, i.e., the user, and each entity can be present in several layers. Existing approaches ignore this important phenomenon.

In this paper, we address the CS detection problem in multiplex OSNs, where the networks can be directed or undirected, and weighted or unweighted. Our main contributions are:

1) We propose and compare two classes of approaches. The first class, named the unifying approach, finds a consistent CS in the networks by aggregating multiple accounts of the same users. The second class finds mostly consistent CSs in the networks using coupling techniques. We also develop a specialized NMF-based method for each class.

2) We extend the LFR benchmark [7] to create a new benchmark for community detection in multiplex OSNs. The new extension is capable of generating layers with varying node degree distributions and varying fractions of links inside and outside communities.

3) We carry out extensive experiments on synthesized data. The results suggest that our approaches outperform the naive approach of finding the CS in each network separately.

978-1-4673-7131-5/15/$31.00 ©2015 IEEE

IEEE Workshop on Inter-Dependent Networks 2015

Related works. CS detection has attracted enormous attention in network science. Inspired by the work of Newman and Girvan [4], various algorithms have been developed for the problem from a social network perspective, including clique-based, degree-based and matrix-perturbation-based methods [8]. Louvain [9] and Infomap [10] are among the most favored methods according to the benchmark in [7]. The scientific community has been developing tools for temporal networks [11], although much more work remains to be done. An increasingly large number of researchers with diverse expertise have turned their attention to studying multiple layer networks [12].

Nonnegative matrix factorization (NMF) was first introduced by Paatero and Tapper and popularized by Lee and Seung [13]. The main idea is to approximate a nonnegative matrix V by the product of two nonnegative matrix factors W and H. Owing to the natural nonnegativity of the factorization, those works started a massive flow of research covering a wide range of areas, e.g., text mining, spectral data analysis, speech denoising, bioinformatics and many more [12]. More recently, in social science, Lin et al. presented MetaFac [8], which uses a relational hypergraph representation and tensor factorization. Wang et al. subsequently proposed three NMF solutions [14] for undirected networks, directed networks and compound networks of different entities (users and movies). Unfortunately, the method cannot be adapted to multiplex OSNs, which have a single entity type and in which multiple entities in different networks may refer to the same person.

Recently, several problems in single social networks have been generalized to multiple-network settings. Brodka et al. [15] proposed two separate algorithms for evaluating shortest paths in multi-layered social networks. Diffusion processes across multiple networks were investigated in [16]. Kazienko et al. studied a multidimensional social network model and its application in social recommender systems [17]. In [18], the authors studied the problem of matching users between social networks. However, CS detection in multiplex OSNs has not been systematically investigated.

Organization. The rest of the paper is organized as follows. Section II states the problem definition and introduces two classes of methods. Sections III and IV present the two classes of approaches in detail, including the formulations, the update rules and the proofs of convergence. Experimental results are presented in Section V, followed by concluding remarks in Section VI.

II. PROBLEM FORMULATION

We model multiplex OSNs as a collection G of graphs. G consists of p layers, i.e., p single networks. Layer i is denoted by $G_i = (V_i, E_i)$, where $V_i$ and $E_i$ are the set of nodes and the set of edges in that layer, respectively. Note that a node can appear in one or multiple layers. We define the set $V = \bigcup_{i=1}^{p} V_i$ and $n = |V|$, the cardinality of $V$. Now, we can represent each layer in matrix form: $A_i$ is the $n \times n$ adjacency matrix of $G_i$. A three-layer OSN is illustrated in Fig. 1.

Assume there exist k communities in layer i. We model the interaction $(A_i)_{uv}$ between nodes u and v in layer i by a mixture model of the combined effect of all k communities. That is, we approximate $(A_i)_{uv}$ by

\[ (A_i)_{uv} = \sum_{m,l} p_{ml}\, p_{m \to u}\, p_{l \to v}, \]

where $p_{ml}$ is the interaction density between communities m and l, and $p_{m \to u}$ and $p_{l \to v}$ are the probabilities that an interaction with communities m and l involves nodes u and v, respectively. Written in matrix form, we have $A_i = X_i S_i X_i^T$, where $X_i$ is a nonnegative matrix with $(X_i)_{um} = p_{m \to u}$ and $S_i$ is a nonnegative matrix with $(S_i)_{ml} = p_{ml}$. Our goal is to find the CS, which can be represented as an $n \times k$ matrix $X_i$ for each layer i, where each row reflects the community membership of a user: $(X_i)_{um}$ reveals the strength of participation of user u in community m. This representation can be used for either overlapping or disjoint CS. The latter, disjoint CS, is the focus of this paper.
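The matrix form of the mixture model can be illustrated with a small numerical sketch. The values of `X` and `S` below are hypothetical toy numbers chosen only for illustration (4 users, k = 2 communities); they do not come from the paper.

```python
import numpy as np

# Hypothetical toy values: X[u, m] plays the role of p_{m->u},
# S[m, l] plays the role of p_{ml}.
X = np.array([[0.9, 0.1],
              [0.8, 0.2],
              [0.1, 0.9],
              [0.2, 0.8]])
S = np.array([[1.0, 0.1],
              [0.1, 1.0]])

# Matrix form of the mixture model:
# A_hat[u, v] = sum_{m,l} S[m, l] * X[u, m] * X[v, l]
A_hat = X @ S @ X.T

# Disjoint community structure: each user is assigned to the
# community with the largest membership strength in its row of X.
labels = X.argmax(axis=1)
```

Under these toy values, users 0 and 1 fall into the first community and users 2 and 3 into the second, and `A_hat` has larger entries inside each group than across groups.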

Central assumption. If nodes u and v are in the same community in one layer, they are more likely to belong to the same community in the other layers. Based on how strictly we enforce this assumption, we derive two classes of approaches:

Unifying approach: We force the instances of a user in different layers to be in the same community by aggregating all the layers into a single network, in which the multiple instances of a user appear as one node.

Coupling approach: We relax the enforcement by using a coupling scheme. Instead of forcing the instances of a node into the same community, we merely encourage it by creating coupling edges between matching pairs of instances.

III. UNIFYING APPROACH

In this section, we present the unifying approach, which finds a single CS for all layers. We consider two directions: 1) convert the multiple layers into a one-layer network and apply existing algorithms, and 2) adapt the NMF algorithm to the original networks.

A. Network aggregation

To apply existing algorithms to multiplex OSNs, we need to: 1) aggregate all layers into a single network $G_c$, 2) apply existing CS algorithms, e.g., Louvain [9] or Infomap [10], to find the CS in $G_c$, and 3) project the found CS back onto each layer to obtain its CS.

Given a multiple-layer network G as defined in Section II, the aggregated network [12] is denoted by $G_c = (V, E_c)$, where $E_c = E_1 \cup E_2 \cup \dots \cup E_p$ and $E_1, E_2, \dots, E_p$ are the edge sets of the layers.
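The aggregation step can be sketched on adjacency matrices. This is a minimal illustration, assuming every layer is given as an n x n matrix over the common node set V; the helper name `aggregate_layers` is hypothetical, and edge weights are discarded because the definition above is a plain union of edge sets.

```python
import numpy as np

def aggregate_layers(layers):
    """Union of edge sets: entry (u, v) is 1 if the edge exists in at least one layer."""
    stacked = np.stack([np.asarray(A) != 0 for A in layers])
    return stacked.any(axis=0).astype(int)

# Two toy layers over the same three nodes (hypothetical data).
A1 = np.array([[0, 1, 0],
               [1, 0, 0],
               [0, 0, 0]])
A2 = np.array([[0, 0, 0],
               [0, 0, 1],
               [0, 1, 0]])
Ac = aggregate_layers([A1, A2])
```

Any single-network algorithm can then be run on `Ac`, and the resulting labels apply to every layer because all layers share the node indexing.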

Now, we can directly use algorithms for single networks on the aggregated network. However, aggregation has several shortcomings, e.g., the edge types in the layers may differ from each other, or some layers may be weighted while others are unweighted. Those characteristics make it difficult to aggregate the layers. Therefore, we propose an NMF-based algorithm that works on the original multiple-layer networks.

B. NMF-based algorithm on the original networks

We present NMF-based algorithms for both directed and undirected networks.

1) Directed networks: We attempt to find a community membership matrix X that agrees with the structures of the given networks. Specifically, we want to minimize the sum of the differences between $X S X^T$ and the matrices $A_i$, $i = 1..p$. Here S captures the connectivity between communities. The community detection problem can then be cast as a nonnegative matrix factorization problem, and we obtain the following objective function

\[ \min_{X \ge 0,\, S \ge 0} \; \sum_{i=1}^{p} d\left(A_i \,\|\, X S X^T\right), \tag{1} \]

where $d(A \| B)$ is a measure of the difference between two matrices. Two measures are the most popular and well studied in the literature. The first is the square of the Euclidean distance [13]

\[ \|A - B\|_F^2 = \sum_{i,j} (A_{ij} - B_{ij})^2. \tag{2} \]

The second measure, the Kullback-Leibler divergence (KL-divergence) [13] of A from B, is defined as

\[ D(A \| B) = \sum_{i,j} \left( A_{ij} \log \frac{A_{ij}}{B_{ij}} - A_{ij} + B_{ij} \right). \tag{3} \]
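Both measures are straightforward to evaluate numerically. The sketch below is illustrative only: the function names are hypothetical, the convention 0 log 0 = 0 is an assumption made to handle zero entries of an adjacency matrix, and the small constant `eps` is added purely to avoid division by zero.

```python
import numpy as np

def frobenius_sq(A, B):
    """Squared Euclidean (Frobenius) distance of Eq. 2."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    return float(np.sum((A - B) ** 2))

def kl_divergence(A, B, eps=1e-12):
    """Generalized KL-divergence of Eq. 3, assuming 0 * log(0) = 0."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    mask = A > 0  # zero entries of A contribute only through the "+ B" term
    log_term = np.sum(A[mask] * np.log(A[mask] / (B[mask] + eps)))
    return float(log_term - A.sum() + B.sum())
```

Both functions return 0 (up to the `eps` perturbation) when the two matrices coincide and a positive value otherwise.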

a) Using KL-divergence: The cost function using the KL-divergence is as follows

\[ \min_{X \ge 0,\, S \ge 0} \; L = \sum_{i=1}^{p} D\left(A_i \,\|\, X S X^T\right). \tag{4} \]

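We derive the update rules following the framework in [13]

\[ X_{jk} = X_{jk} \left( \frac{\sum_{i=1}^{p} \sum_{l} (A_i)_{lj} (XS)_{lk} / (X S X^T)_{lj}}{p \left( \sum_{t} \left( (XS)_{tk} + (S X^T)_{kt} \right) \right)} + \frac{\sum_{i=1}^{p} \sum_{l} (A_i)_{jl} (S X^T)_{kl} / (X S X^T)_{jl}}{p \left( \sum_{t} \left( (XS)_{tk} + (S X^T)_{kt} \right) \right)} \right), \tag{5} \]

\[ S_{jk} = S_{jk} \left( \frac{\sum_{i=1}^{p} \sum_{s,t} (A_i)_{st} X_{sj} X_{tk} / (X S X^T)_{st}}{p \left( \sum_{s,t} X_{sj} X_{tk} \right)} \right). \tag{6} \]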

Algorithm 1 NMF-based algorithm for directed networks using KL-divergence

Input: Adjacency matrices $\{A_i \mid i = 1..p\}$, max iterations T.
Output: Membership matrix X.
Assign $X_{ij}$, $S_{ij}$ (uniformly) random values in [0,1].
repeat
    Update $X_{jk}$ following Eq. 5
    Update $S_{jk}$ following Eq. 6
until convergence or after T iterations
For each row i, $\arg\max_j \{X_{ij}\}$ is the community that node i is assigned to.
Return the list of communities corresponding to the nodes.

Alg. 1 depicts the NMF-based algorithm for directed networks using the KL-divergence in the unifying approach. The main segment is the updating procedure, in which $X_{jk}$ and $S_{jk}$ are updated in each iteration until convergence or until T updates have been performed. Row i of matrix X shows the participation of user i in all the communities. Therefore, we assign user i to the community corresponding to the largest value in row i of X; if there are several such communities, we choose the first one.
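The procedure of Alg. 1 can be sketched in NumPy as follows. This is a minimal, non-authoritative sketch: the function names, the stopping tolerance `tol`, the random seed and the constant `eps` (added to avoid division by zero) are assumptions for illustration, and the sketch recomputes $X S X^T$ between the two updates, which the algorithm statement does not specify.

```python
import numpy as np

def kl_objective(layers, X, S, eps=1e-12):
    """Objective of Eq. 4: sum over layers of D(A_i || X S X^T), with 0 log 0 = 0."""
    B = X @ S @ X.T + eps
    total = 0.0
    for A in layers:
        mask = A > 0
        total += np.sum(A[mask] * np.log(A[mask] / B[mask])) - A.sum() + B.sum()
    return float(total)

def unifying_nmf_kl(layers, k, max_iter=200, tol=1e-6, seed=0, eps=1e-12):
    layers = [np.asarray(A, dtype=float) for A in layers]
    p, n = len(layers), layers[0].shape[0]
    rng = np.random.default_rng(seed)
    X, S = rng.random((n, k)), rng.random((k, k))  # random values in [0, 1)
    prev = kl_objective(layers, X, S)
    for _ in range(max_iter):
        # Update X (Eq. 5); R[s, t] = sum_i (A_i)[s, t] / (X S X^T)[s, t]
        R = sum(A / (X @ S @ X.T + eps) for A in layers)
        XS, XSt = X @ S, X @ S.T
        num = R.T @ XS + R @ XSt
        den = p * (XS.sum(axis=0) + XSt.sum(axis=0)) + eps
        X = X * num / den
        # Update S (Eq. 6)
        R = sum(A / (X @ S @ X.T + eps) for A in layers)
        col = X.sum(axis=0)
        S = S * (X.T @ R @ X) / (p * np.outer(col, col) + eps)
        cur = kl_objective(layers, X, S)
        if abs(prev - cur) < tol:  # assumed convergence test
            break
        prev = cur
    # argmax returns the first maximal index, matching the tie-breaking rule above.
    return X, S, X.argmax(axis=1)
```

A call such as `X, S, labels = unifying_nmf_kl([A1, A2, A3], k=2)` returns one label per node, shared by all layers.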

To find the number of communities k, we adopt one of the most popular approaches, used in [19]: we choose the k at which the modularity function Q achieves its maximum (see [4] for more details).
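The selection of k can be sketched as below. The sketch assumes that Q is the Newman-Girvan modularity evaluated on a single undirected adjacency matrix (for instance the aggregated network); the text does not state which network Q is computed on, so this is an assumption. The helper `choose_k` and its `detect` argument (a caller-supplied function returning labels for a given k) are hypothetical.

```python
import numpy as np

def modularity(A, labels):
    """Newman-Girvan modularity Q of a partition of an undirected graph."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    two_m = A.sum()                      # equals 2m for a symmetric adjacency matrix
    deg = A.sum(axis=1)
    same = labels[:, None] == labels[None, :]
    return float(((A - np.outer(deg, deg) / two_m) * same).sum() / two_m)

def choose_k(A, candidate_ks, detect):
    """Return the candidate k whose detected partition maximizes Q."""
    return max(candidate_ks, key=lambda k: modularity(A, detect(k)))
```

For two disjoint triangles, the partition into the two triangles gives Q = 0.5, while placing all six nodes in one community gives Q = 0.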

Theorem 1: The value of the objective function in Eq. 4 is non-increasing and converges to a local minimum under the update rules in Eq. 5 and Eq. 6.

Proof: We will show that X and S converge and that the convergence point is a local minimum.

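Convergence: To prove convergence, we need to find auxiliary functions for X and S that lead to the update rules. We define the following auxiliary functions $Q(X, \tilde{X})$ and $Q(S, \tilde{S})$:

\[ Q(X, \tilde{X}) = \sum_{i=1}^{p} \Big( \sum_{jk} (A_i)_{jk} \left( \log (A_i)_{jk} - 1 \right) + \frac{1}{2} \sum_{jk} \left( (Y S \tilde{X}^T)_{jk} + (\tilde{X} S Y^T)_{jk} \right) \Big) - \sum_{i=1}^{p} \Big( \sum_{jk} (A_i)_{jk} \sum_{uv} \eta_{jkuv} \left( \log (X_{jv} S_{vu} X_{ku}) - \log (\eta_{jkuv}) \right) \Big), \]

\[ Q(S, \tilde{S}) = \sum_{i=1}^{p} \Big( \sum_{jk} (A_i)_{jk} \left( \log (A_i)_{jk} - 1 \right) + \sum_{jk} (X S X^T)_{jk} \Big) - \sum_{i=1}^{p} \Big( \sum_{jk} (A_i)_{jk} \sum_{uv} \gamma_{jkuv} \left( \log (X_{jv} S_{vu} X_{ku}) - \log (\gamma_{jkuv}) \right) \Big), \]

where

\[ \gamma_{jkuv} = \frac{X_{jv} \tilde{S}_{vu} X_{ku}}{\sum_{s,t} X_{jt} \tilde{S}_{ts} X_{ks}}, \qquad \eta_{jkuv} = \frac{X_{jv} S_{vu} \tilde{X}_{ku}}{\sum_{s,t} X_{jt} S_{ts} \tilde{X}_{ks}}, \qquad Y_{ij} = \frac{X_{ij}}{\tilde{X}_{ij}}. \]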

Then, we only need to verify that $Q(X, \tilde{X}) \ge F(X)$ and $Q(S, \tilde{S}) \ge F(S)$. For the second summation, these inequalities are equivalent (with $\gamma_{jkuv}$ replaced by $\eta_{jkuv}$ in the case of X) to

\[ -\log \Big( \sum_{u,v} \gamma_{jkuv} \frac{X_{jv} S_{vu} X_{ku}}{\gamma_{jkuv}} \Big) \le -\sum_{u,v} \gamma_{jkuv} \log \Big( \frac{X_{jv} S_{vu} X_{ku}}{\gamma_{jkuv}} \Big), \]

which holds due to Jensen's inequality [13] and the concavity of the logarithmic function. So, we can verify $Q(S, \tilde{S}) \ge F(S)$. We also have $\frac{1}{2} (Y S \tilde{X}^T)_{jk} + \frac{1}{2} (\tilde{X} S Y^T)_{jk} \ge (X S X^T)_{jk}$, which makes the inequality $Q(X, \tilde{X}) \ge F(X)$ hold. Then, taking the derivatives of $Q(X, \tilde{X})$ and $Q(S, \tilde{S})$, we get the update rules.

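Local minimum: We need to show that the update rules satisfy the KKT slackness conditions [13].

Introducing the Lagrangian multipliers $\alpha_{jk}$ and $\beta_{jk}$ to the loss function L, we have

\[ J = \sum_{i=1}^{p} D(A_i \| X S X^T) - \sum_{j,k} \alpha_{jk} X_{jk} - \sum_{j,k} \beta_{jk} S_{jk} = \sum_{i=1}^{p} \sum_{j,k} \Big( (A_i)_{jk} \log \frac{(A_i)_{jk}}{(X S X^T)_{jk}} - (A_i)_{jk} + (X S X^T)_{jk} \Big) - \sum_{j,k} \alpha_{jk} X_{jk} - \sum_{j,k} \beta_{jk} S_{jk}. \]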

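Taking the derivatives of J with respect to $X_{jk}$ and $S_{jk}$ gives

\[ \frac{\partial J}{\partial X_{jk}} = \sum_{i=1}^{p} \Big( -\sum_{l} \frac{(A_i)_{lj} (XS)_{lk}}{(X S X^T)_{lj}} - \sum_{l} \frac{(A_i)_{jl} (S X^T)_{kl}}{(X S X^T)_{jl}} + \sum_{t} \left( (XS)_{tk} + (S X^T)_{kt} \right) \Big) - \alpha_{jk}, \]

\[ \frac{\partial J}{\partial S_{jk}} = \sum_{i=1}^{p} \Big( -\sum_{s,t} \frac{(A_i)_{st} X_{sj} X_{tk}}{(X S X^T)_{st}} + \sum_{s,t} X_{sj} X_{tk} \Big) - \beta_{jk}. \]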

Following the KKT slackness conditions, we get

\[ \frac{\partial J}{\partial X_{jk}} X_{jk} = \Big( \sum_{i=1}^{p} \Big( -\sum_{l} \frac{(A_i)_{lj} (XS)_{lk}}{(X S X^T)_{lj}} - \sum_{l} \frac{(A_i)_{jl} (S X^T)_{kl}}{(X S X^T)_{jl}} + \sum_{t} \left( (XS)_{tk} + (S X^T)_{kt} \right) \Big) - \alpha_{jk} \Big) X_{jk} = 0, \]

\[ \frac{\partial J}{\partial S_{jk}} S_{jk} = \Big( \sum_{i=1}^{p} \Big( -\sum_{s,t} \frac{(A_i)_{st} X_{sj} X_{tk}}{(X S X^T)_{st}} + \sum_{s,t} X_{sj} X_{tk} \Big) - \beta_{jk} \Big) S_{jk} = 0. \]

Then, we can see that the update rules satisfy the above conditions, so X and S converge to a local minimum. Since the matrices $A_i$, S and X are all nonnegative during the updating process, the final X and S are also nonnegative.

b) Using Euclidean distance: We can also use the Euclidean distance and obtain the corresponding cost function

\[ \min_{X \ge 0,\, S \ge 0} \; \sum_{i=1}^{p} \| A_i - X S X^T \|_F^2. \tag{7} \]

Theorem 2: The value of the objective function in Eq. 7 is non-increasing and converges to a local minimum under the following update rules

\[ X_{jk} = X_{jk} \left( \frac{\sum_{i=1}^{p} [A_i^T X S + A_i X S^T]_{jk}}{p\, [X S X^T X S^T + X S^T X^T X S]_{jk}} \right)^{1/4}, \tag{8} \]

\[ S_{jk} = S_{jk} \left( \frac{\sum_{i=1}^{p} [X^T A_i X]_{jk}}{p\, [X^T X S X^T X]_{jk}} \right). \tag{9} \]

We omit the proof of Theorem 2 due to space limits.

2) Undirected networks: The problem in undirected networks is a special case of the problem in directed networks. The adjacency matrix of each layer is symmetric; we therefore factorize $A_i \approx X X^T$ and formulate the resulting problem as:

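a) KL-divergence version:

\[ \min_{X \ge 0} \; \sum_{i=1}^{p} D\left(A_i \,\|\, X X^T\right) \tag{10} \]

with the simplified update rule for the matrix X only

\[ X_{jk} = X_{jk} \left( \frac{\sum_{i=1}^{p} \sum_{l} (A_i)_{lj} X_{lk} / (X X^T)_{lj}}{p \left( \sum_{t} X_{tk} \right)} \right). \tag{11} \]

b) Euclidean distance version:

\[ \min_{X \ge 0} \; \sum_{i=1}^{p} \| A_i - X X^T \|_F^2 \tag{12} \]

with the corresponding update rule

\[ X_{jk} = X_{jk} \left( \frac{\sum_{i=1}^{p} [A_i^T X]_{jk}}{p\, [X X^T X]_{jk}} \right)^{1/4}. \tag{13} \]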
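A single iteration of each undirected update rule can be sketched as follows. The function names are hypothetical and `eps` is an assumed safeguard against division by zero; the sketch only transcribes Eq. 11 and Eq. 13 into matrix operations.

```python
import numpy as np

def undirected_kl_step(layers, X, eps=1e-12):
    """One multiplicative update of X following Eq. 11."""
    p = len(layers)
    # R[l, j] = sum_i (A_i)[l, j] / (X X^T)[l, j]
    R = sum(A / (X @ X.T + eps) for A in layers)
    return X * (R.T @ X) / (p * X.sum(axis=0) + eps)

def undirected_euclid_step(layers, X, eps=1e-12):
    """One multiplicative update of X following Eq. 13."""
    p = len(layers)
    num = sum(A.T @ X for A in layers)
    den = p * (X @ X.T @ X) + eps
    return X * (num / den) ** 0.25
```

A quick sanity check of both rules is that a matrix X with $A_i = X X^T$ for every layer is a fixed point: the multiplicative factor becomes 1 and X is left unchanged.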

IV. COUPLING APPROACH

A. Coupling techniques

To encourage the instances of a node in multiple networks to be in the same community, we create coupling edges between them and construct coupled networks. We investigate four basic coupling schemes [12], namely diagonal, categorical, star and full coupling. In [20], the authors apply two variants, based on star and aggregated couplings, in the context of the least cost influence problem. They name the two variants lossless and lossy coupling schemes: the former is constructed by creating gateway vertices as an intermediate layer, similar to star coupling, whereas aggregation is used for the latter. However, they made some modifications to adapt the schemes to diffusion processes, e.g., defining weights and thresholds.

Diagonal coupling [12]: Given two layers $G_i$ and $G_{i+1}$, if two nodes $u \in G_i$ and $v \in G_{i+1}$ belong to the same entity, there exists a coupling edge (u, v).

Categorical coupling [12]: For any pair of layers $G_i$ and $G_j$, if two nodes $u \in G_i$ and $v \in G_j$ belong to the same entity, there exists a coupling edge (u, v).

Star coupling [12]: We add an intermediate layer $G_{p+1} = (V, E')$ in which $E'$ is empty, and we connect each node in $G_{p+1}$ to all nodes belonging to the same entity in all other layers.

Full coupling [12]: For two adjacent layers $G_i = (V, E_i)$ and $G_{i+1} = (V, E_{i+1})$, if there is an edge $(u, v) \in (E_i \cup E_{i+1})$, we have the coupling edges $(u_i, v_{i+1})$, where $u_i \in G_i$ and $v_{i+1} \in G_{i+1}$, and $(u_{i+1}, v_i)$, where $u_{i+1} \in G_{i+1}$ and $v_i \in G_i$.
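The first three schemes can be sketched as edge lists over node instances written as (layer, entity) pairs. The sketch makes a simplifying assumption that every entity is present in every layer, and the function name `coupling_edges` is hypothetical; full coupling is omitted because it depends on the intra-layer edge sets.

```python
from itertools import combinations

def coupling_edges(n, p, scheme):
    """Coupling edges between instances (layer, entity) for n entities and p layers.

    Assumes every entity appears in every layer (a simplification).
    """
    if scheme == "diagonal":
        pairs = [(i, i + 1) for i in range(p - 1)]   # adjacent layers only
    elif scheme == "categorical":
        pairs = list(combinations(range(p), 2))      # every pair of layers
    elif scheme == "star":
        pairs = [(i, p) for i in range(p)]           # layer index p is the extra hub layer
    else:
        raise ValueError("unknown coupling scheme: " + scheme)
    return [((i, u), (j, u)) for i, j in pairs for u in range(n)]
```

With n = 4 entities and p = 3 layers, diagonal coupling yields 8 edges, while categorical and star coupling each yield 12.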

With the coupling schemes defined, besides the matrices $A_i$ with $i = 1..p$, we introduce matrices $A_{ij}$ representing the coupling connections between layers i and j.

B. Directed networks

To find the CS in multiplex OSNs using the coupling approach, we 1) build the coupled network using a coupling scheme, 2) apply a CS detection algorithm to the coupled network, and 3) extract the CS.

After constructing the coupled network, we can simply apply existing algorithms for single networks, which has the apparent advantage of requiring less effort. Let us take NMF as an example, with the cost function

\[ \min_{X \ge 0,\, S \ge 0} \; d\left(A \,\|\, X S X^T\right), \tag{14} \]

where A is the giant $(n \times p) \times (n \times p)$ adjacency matrix of the coupled network. However, we observe that the matrix A is very sparse, because it contains only $A_i$ and $A_{ij}$ as its building blocks. Therefore, we can take advantage of that structure and obtain the following NMF problem under the KL-divergence

\[ \min_{X_i \ge 0\ \forall i,\ S \ge 0} \; \sum_{i} D\left(A_i \,\|\, X_i S X_i^T\right) + \sum_{i,j} D'\left(A_{ij} \,\|\, X_i S X_j^T\right), \tag{15} \]

where $D'(A \| B) = \sum_{A_{st} \neq 0} \left( A_{st} \log \frac{A_{st}}{B_{st}} - A_{st} + B_{st} \right)$.

The first summation corresponds to each layer separately, whereas the second one takes the couplings into account.
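The block structure of the coupled adjacency matrix and the masked divergence $D'$ can be sketched as below. The function names and the dictionary layout used for the coupling blocks are assumptions made for illustration, and `eps` is only a numerical safeguard.

```python
import numpy as np

def supra_adjacency(layers, couplings):
    """Assemble the (n*p) x (n*p) coupled adjacency matrix.

    layers:    list of p arrays of shape (n, n), the blocks A_i
    couplings: dict mapping (i, j) to an (n, n) array, the blocks A_ij
    """
    p, n = len(layers), layers[0].shape[0]
    A = np.zeros((n * p, n * p))
    for i, Ai in enumerate(layers):
        A[i * n:(i + 1) * n, i * n:(i + 1) * n] = Ai
    for (i, j), Aij in couplings.items():
        A[i * n:(i + 1) * n, j * n:(j + 1) * n] = Aij
    return A

def masked_kl(A, B, eps=1e-12):
    """D'(A || B): the KL terms summed only over the nonzero entries of A."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    mask = A != 0
    a, b = A[mask], B[mask] + eps
    return float(np.sum(a * np.log(a / b) - a + b))
```

Restricting $D'$ to the nonzero entries of $A_{ij}$ means that a missing coupling edge is not penalized, so only known account matches influence the factorization.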

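Theorem 3: The value of the objective function in Eq. 15 is non-increasing and converges to a local minimum under the following update rules

\[ (X_i)_{uv} = (X_i)_{uv} \, \frac{ \sum_{k} \Big( \frac{(A_i)_{uk}}{(X_i S X_i^T)_{uk}} (S X_i^T)_{vk} + \frac{(A_i)_{ku}}{(X_i S X_i^T)_{ku}} (X_i S)_{kv} \Big) + \sum_{j} \Big( \frac{(A_{ij})_{uu}}{(X_i S X_j^T)_{uu}} (S X_j^T)_{vu} + \frac{(A_{ji})_{uu}}{(X_j S X_i^T)_{uu}} (X_j S)_{uv} \Big) }{ \sum_{k} \Big( (S X_i^T)_{vk} + (X_i S)_{kv} \Big) + \sum_{j} \Big( (S X_j^T)_{vu} + (X_j S)_{uv} \Big) }, \tag{16} \]

\[ S_{uv} = S_{uv} \left( \frac{\sum_{i} \sum_{p,q} (A_i)_{pq} (X_i)_{pu} (X_i)_{qv} / (X_i S X_i^T)_{pq}}{\sum_{i} \sum_{p,q} (X_i)_{pu} (X_i)_{qv} + \sum_{i,j} \sum_{k} (X_i)_{ku} (X_i)_{kv}} + \frac{\sum_{i,j} \sum_{k} (A_{ij})_{kk} (X_i)_{ku} (X_i)_{kv} / (X_i S X_j^T)_{kk}}{\sum_{i} \sum_{p,q} (X_i)_{pu} (X_i)_{qv} + \sum_{i,j} \sum_{k} (X_i)_{ku} (X_i)_{kv}} \right). \tag{17} \]

The proof of the theorem is highly similar to that of the unifying approach and is omitted to save space.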


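C. Undirected networks

The problem for undirected networks can be formulated as

\[ \min_{X_i \ge 0\ \forall i} \; \sum_{i} D\left(A_i \,\|\, X_i X_i^T\right) + \sum_{i,j} D'\left(A_{ij} \,\|\, X_i X_j^T\right), \tag{18} \]

which is a special case of the problem for directed networks in which S is the identity matrix. We therefore treat it in the same way and obtain the following update rule

\[ (X_i)_{uv} = (X_i)_{uv} \left( \frac{\sum_{k} (A_i)_{uk} (X_i)_{kv} / (X_i X_i^T)_{ku}}{\sum_{k} (X_i)_{kv} + \sum_{j} (X_j)_{uv}} + \frac{\sum_{j} (A_{ji})_{uu} (X_j)_{uv} / (X_j X_i^T)_{uu}}{\sum_{k} (X_i)_{kv} + \sum_{j} (X_j)_{uv}} \right). \tag{19} \]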

V. EXPERIMENTAL RESULTS

In this section, we compare different algorithms and coupling schemes. Specifically, we present how to modify the LFR benchmark [7] to create multiplex OSNs. We then run our NMF-based algorithms and two of the best algorithms for single networks, namely Infomap [10] and Louvain [9], on coupled networks while varying the fraction of out-community degree over the total degree of each node and simultaneously changing the average node degree in each layer. We also test the algorithms on each layer without coupling to evaluate the benefit of coupling techniques in multiplex OSNs.

The Normalized Mutual Information (NMI) score [21] is used as the measure of accuracy. In the coupling approach, each layer has its own CS; we compute the NMI score for multiple networks at once by combining all the nodes in all the layers into a single CS and computing the NMI score on that CS.

A. Extending the LFR benchmark

The LFR benchmark [7], proposed by Lancichinetti et al. in 2008, takes into account the power-law property of node degrees and community sizes, with tunable exponents. The original benchmark procedure goes through three fundamental stages:

1) Assigning a degree to each node so that the degrees obey a power-law distribution with the provided exponent.

2) Assigning nodes to communities such that the community sizes also follow a power-law distribution with another given exponent. At the same time, the method determines the in-community degree and out-community degree of each node to satisfy the required fraction $\mu$.

3) Drawing random edges that respect the specified degrees.

Unfortunately, the LFR benchmark is incapable of generating multiplex OSNs: it generates single networks with a totally different CS each time it runs, even with identical parameters. Therefore, we make some changes to support multiple layers while still preserving the important power-law characteristics.

The point at which we can alter the LFR benchmark is after step 2, when the nodes have already been assigned to communities. To change the average node degree in each layer, we multiply each node's in-community degree and out-community degree by the ratio of the desired average degree to the average degree calculated by the procedure. Thus, we can generate many layers with the same CS and a different average node degree in each layer.
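The rescaling step can be sketched as follows. The function name and the toy degree values are hypothetical; the sketch returns real-valued degrees and leaves any rounding to integers, which an actual generator would need, to the caller.

```python
import numpy as np

def rescale_degrees(in_deg, out_deg, target_avg):
    """Scale in- and out-community degrees so the average total degree equals target_avg."""
    in_deg = np.asarray(in_deg, dtype=float)
    out_deg = np.asarray(out_deg, dtype=float)
    ratio = target_avg / (in_deg + out_deg).mean()
    return in_deg * ratio, out_deg * ratio

# Hypothetical degrees produced after step 2 for three nodes.
new_in, new_out = rescale_degrees([8, 6, 4], [2, 4, 1], target_avg=20)
```

Because both degrees of a node are multiplied by the same ratio, the fraction of out-community degree over total degree of each node is unchanged, so every generated layer keeps the same community assignment and mixing fraction while its density varies.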

B. Dataset and Settings

We create four types of networks, specified by whether they are directed and whether they are weighted. For each network type, we generate 5 three-layer networks with 1000 nodes, in which the average node degrees of the layers are (5, 5, 5), (15, 15, 15), (20, 20, 20), (25, 25, 25) and (15, 20, 25), respectively.

In reality, we may not know exactly whether two accounts in different OSNs belong to the same user. Therefore, when coupling two layers, we can only connect p% of all the vertices. For the testing dataset, we generate coupled networks with p = 100% and p = 20%.
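Partial matching can be sketched by sampling which entities receive a coupling edge. The sketch assumes a coupling block $A_{ij}$ that is diagonal (as in diagonal or categorical coupling) and that the matched accounts are chosen uniformly at random; the text does not say how the matched subset is selected, so the sampling rule, the function name and the seed are assumptions.

```python
import numpy as np

def sample_coupling_matrix(n, fraction, seed=0):
    """Diagonal coupling block A_ij in which only a given fraction of entities is matched."""
    rng = np.random.default_rng(seed)
    m = int(round(fraction * n))
    matched = rng.choice(n, size=m, replace=False)  # entities with a known match
    Aij = np.zeros((n, n))
    Aij[matched, matched] = 1.0                     # coupling edge between the two instances
    return Aij

A12 = sample_coupling_matrix(1000, 0.2)  # matching fraction of 20%
```

With a fraction of 1.0 the block is the identity matrix, which corresponds to the fully matched setting.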

All the reported experiments are carried out on undirected unweighted networks; the results for the three other network types are similar and are placed in the supplementary materials. We use a Linux system running on a machine with an Intel Core Dual 3 GHz CPU and 4 GB of RAM as the testing environment.

C. Experimental results

We use the following notations

Comparison of the algorithms. Figs. 2 and 3 present the NMI scores for all the algorithms with varying average node degrees and mixing parameters $\mu$ in undirected unweighted networks. Consistently across all three experiments, the NMF-based algorithms give the highest NMI scores and remain stable in all network settings. Infomap depends heavily on the type of coupling: it performs as well as the NMF-based algorithms on aggregated and full-coupled networks but extremely poorly on the other coupled networks. Meanwhile, Louvain's results lie between those of the two other methods on all the datasets.

Comparison of coupling schemes. Regarding the coupling schemes, we see that aggregated and full-coupled networks work best for all the algorithms, yielding the highest NMI scores. Diagonal-coupled, categorical-coupled and star-coupled networks are suitable only for the NMF-based algorithms, which behave on them exactly as they do on aggregated or full-coupled networks. However, the results of Infomap on these networks approach 0 quickly, even for very small values of the mixing parameter.

Comparison of coupling and non-coupling. By non-coupling we refer to the approach that finds the CS in each layer separately. We run the three algorithms on networks without coupling and with full coupling; the results are reported in Fig. 4. We can easily see that, for the same algorithm, running on the coupled network gives better results, especially in the case of the NMF-based algorithm. Overall, the NMF-based algorithm shows the best accuracy in terms of NMI score before $\mu$ reaches 0.7. Fig. 5 shows the results when we keep $\mu = 0.3$ and change p. We see that the NMF-based algorithm on the coupled network is by far better than the others; for Infomap and Louvain, running on coupled networks becomes better than the non-coupled cases when $p \ge 0.3$.
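The NMI scores discussed in this section compare a detected partition with the planted one. A minimal sketch is given below, assuming the normalization 2 I(A;B) / (H(A) + H(B)); the exact variant defined in [21] is not restated in the text, so this choice and the function name are assumptions.

```python
import numpy as np
from collections import Counter

def nmi(labels_a, labels_b):
    """Normalized mutual information between two partitions of the same nodes."""
    n = len(labels_a)
    ca, cb = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = sum(c / n * np.log(c * n / (ca[a] * cb[b])) for (a, b), c in joint.items())
    ha = -sum(c / n * np.log(c / n) for c in ca.values())
    hb = -sum(c / n * np.log(c / n) for c in cb.values())
    if ha + hb == 0:  # both partitions consist of a single community
        return 1.0
    return float(2 * mi / (ha + hb))
```

For the coupling approach, the label lists of all layers can be concatenated (for the detected and the planted partitions alike) before calling `nmi`, which mirrors the combination of all nodes in all layers into a single CS described earlier.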

IEEE Workshop on Inter-Dependent Networks 2015

658

Page 6: Community Detection in Multiplex Social Networks

[Figure: panels plotting Normalized Mutual Information against the mixing parameter µ, with one curve per average node degree k.]

Fig. 2. NMI scores on network with p = 100%.

[Figure: panels plotting Normalized Mutual Information against the mixing parameter µ, with one curve per average node degree k.]

Fig. 3. NMI scores on network with p = 20%.

[Figure: Normalized Mutual Information against the mixing parameter µ for Infomap, Louvain and the NMF-based algorithm, each on the full-coupled and on the non-coupled network.]

Fig. 4. Quality of detection with different mixing parameters (p = 20%)

[Figure: Normalized Mutual Information against the matching fraction p for Infomap, Louvain and the NMF-based algorithm, each on the full-coupled and on the non-coupled network.]

Fig. 5. Quality of detection with different matching fractions (µ = 0.3)

In summary, NMF-based algorithms show the best results and can be used in all the classes of multiplex OSNs. Louvain achieves good stability and medium accuracy when compared to NMF. Although Infomap works very well on full-coupled and aggregated networks, it is not an acceptable candidate on diagonal-, categorical- and star-coupled networks.

VI. CONCLUDING REMARKS

In this work, we investigate the community detection problem in multiplex OSNs. We propose and compare two classes of approaches, namely unifying and coupling, and develop a specialized NMF-based algorithm for each. Our intensive experiments show that NMF-based algorithms perform consistently and give better results than Infomap and Louvain on our benchmark. Although Infomap and Louvain only work well on aggregated and full-coupled networks, they run much faster than NMF-based algorithms.

REFERENCES

[1] http://newsroom.fb.com/company-info/, updated: 7.1.2014.

[2] D. J. Watts and S. H. Strogatz, “Collective dynamics of small-world networks,” Nature, vol. 393, no. 6684, pp. 440–442, 1998.

[3] R. Pastor-Satorras and A. Vespignani, “Epidemic spreading in scale-free networks,” Phys. Rev. Lett., vol. 86, no. 14, p. 3200, 2001.

[4] M. E. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Phys. Rev. E, vol. 69, no. 2, p. 026113, 2004.

[5] T. N. Dinh, Y. Xuan, and M. T. Thai, “Towards social-aware routing in dynamic communication networks,” in IPCCC, 2009, pp. 161–168.

[6] N. P. Nguyen, T. N. Dinh, Y. Xuan, and M. T. Thai, “Adaptive algorithms for detecting community structure in dynamic social networks,” in INFOCOM. IEEE, 2011, pp. 2282–2290.

[7] A. Lancichinetti, S. Fortunato, and F. Radicchi, “Benchmark graphs for testing community detection algorithms,” Phys. Rev. E, vol. 78, no. 4, p. 046110, 2008.

[8] Y.-R. Lin, J. Sun, P. Castro, R. Konuru, H. Sundaram, and A. Kelliher, “Metafac: community discovery via relational hypergraph factorization,” in Proc. of the 15th ACM SIGKDD. ACM, 2009.

[9] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in large networks,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2008, no. 10, p. P10008, 2008.

[10] M. Rosvall and C. T. Bergstrom, “Maps of random walks on complex networks reveal community structure,” Proc. Natl. Acad. Sci., vol. 105, no. 4, 2008.

[11] P. Holme and J. Saramaki, “Temporal networks,” Phys. Rep., vol. 519, no. 3, pp. 97–125, 2012.

[12] M. Kivela, A. Arenas, M. Barthelemy, J. P. Gleeson, Y. Moreno, and M. A. Porter, “Multilayer networks,” arXiv preprint arXiv:1309.7233, 2013.

[13] D. D. Lee and H. S. Seung, “Algorithms for non-negative matrix factorization,” in Adv. in neu. info. proc. sys., 2001, pp. 556–562.

[14] F. Wang, T. Li, X. Wang, S. Zhu, and C. Ding, “Community discovery using nonnegative matrix factorization,” Data Mining and Knowledge Discovery, vol. 22, no. 3, pp. 493–521, 2011.

[15] P. Brodka, P. Stawiak, and P. Kazienko, “Shortest path discovery in the multi-layered social network,” in ASONAM. IEEE, 2011, pp. 497–501.

[16] M. Magnani and L. Rossi, “The ml-model for multi-layer social networks,” in ASONAM. IEEE, 2011, pp. 5–12.

[17] P. Kazienko, E. Kukla, K. Musial, T. Kajdanowicz, P. Brodka, and J. Gaworecki, “A generic model for a multidimensional temporal social network,” in e-Tech. and Net. for Dev. Springer, 2011, pp. 1–14.

[18] L. Yartseva and M. Grossglauser, “On the performance of percolation graph matching,” in Proc. of the first ACM conf. on Online social networks. ACM, 2013, pp. 119–130.

[19] Z.-Y. Zhang, Y. Wang, and Y.-Y. Ahn, “Overlapping community detection in complex networks using symmetric binary matrix factorization,” Phys. Rev. E, vol. 87, no. 6, p. 062803, 2013.

[20] D. T. Nguyen, H. Zhang, S. Das, M. T. Thai, and T. N. Dinh, “Least cost influence in multiplex social networks: Model representation and analysis,” in ICDM. IEEE, 2013, pp. 567–576.

[21] L. Danon, A. Díaz-Guilera, J. Duch, and A. Arenas, “Comparing community structure identification,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2005, no. 09, p. P09008, 2005.
