social network analysis: community detectionkanawati/ars/rk-eisti-ado2.pdf · introduction complex...

104
INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Social network analysis: Community detection Rushed Kanawati LIPN, CNRS UMR 7030; USPC http://lipn.fr/kanawati [email protected] 1 / 104

Upload: others

Post on 12-Jun-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

Social network analysis: Community detection

Rushed KanawatiLIPN, CNRS UMR 7030; USPC

http://lipn.fr/∼[email protected]

October 26, 2016

1 / 104

Page 2: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

PLAN

1 INTRODUCTION

2 COMPLEX NETWORK ANALYSIS TASKS

3 Community detectionLocal community detectionGlobal communities detectionCommunity evaluation approaches

4 Challenges

5 Conclusion

2 / 104

Page 3: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

COMPLEX NETWORK

DefinitionGraphs modeling (direct/indirect) interactions among actors.

Basic topological featuresI Low Density

I Small Diameter

I Heterogeneous degreedistribution.

I High Clusteringcoefficient

I Community structure

3 / 104

Page 4: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

EXAMPLE I : SCIENTIFIC COLLABORATION NETWORK

Density : 10−4

Diameter : 24

Clustering coeff.: 0.67

Co-authorship network - DBLP 1980-1984 (for authors

active for more than 10 years)

4 / 104

Page 5: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

EXAMPLE II: SPATIAL NETWORK

Public sites accessibility in the Bourget districtprojet FUI UrbanD 2009-2012

Densiy : 0.052

Diameter : 7

clustering Coef.: 0.87

A relative-neighbourhood graph 5 / 104

Page 6: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

EXAMPLE III: PRODUCT PURCHASE NETWORK

projet ANR CADI 2008-2010

Density : 0.02

Diameter : 8

Clustering coef. : 0.84

Music purchase on Mondomix site during April-June 2004

6 / 104

Page 7: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

EXAMPLE IV: CO-RATING NETWORK

Movie Lens

Density : 0.061

Diameter : 4

Clustering coef. : 0.626

Users that co-rate by 1 at least one movie

7 / 104

Page 8: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

EXAMPLE V: PRODUCT NETWORK

MovieLens

Density : 0.22

Diameter : 4

Clustering coef. : 0.746

Movies co-rated 5 by at least 1 user

8 / 104

Page 9: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

ANOTHER EXAMPLE

I Graph of the Internet

I Nodes = serviceproviders

I Edges : Connections

[CHK+07]

9 / 104

Page 10: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

ANOTHER EXAMPLE

I Mubarak’s resignationannouncement.

I Nodes = Twitter user

I Edges : retweet on

#jan25 hashtag

http://gephi.org/2011/the-egyptian-revolution-on-twitter/

10 / 104

Page 11: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

LAST EXAMPLE !

I Ingredients networkextracted fom cookingreceipts

I Nodes = Ingredients

I Edges : usage in the

same receipt.

[Lada Adamic Course on Network Analysis: Coursera]

11 / 104

Page 12: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

NOTATIONS

A graph G =< V,E ⊆ V × V >

I V is the set of nodes (a.k.a. vertices,actors, sites)

I E is the set of edges (a.k.a. ties, links,bonds)

NotationsI AG Adjacency Matrix d : aij 6= 0 si les nœuds (vi, vj) ∈ E, 0

otherwise.I n = |V|I m = |E|. We have often m ∼ nI Γ(v) : neighbor’s of node v. Γ(v) = {x ∈ V : (x, v) ∈ E}.I Node degree : d(v) =‖ Γ(v) ‖

12 / 104

Page 13: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

COMPLEX NETWORK ANALYSIS: TASKS

Node orientedI Nodes ranking in function of their importance, influence, . . .I Applying centrality functionsI Applications: Ranking (ex. researchers ! ), Viral Marekting,

Cyper-attacks (prevention); . . .

Network orientedI Network formation & evolution modelsI Link prediction, Diffusion modelsI Applications: Explanation, Recommendation, Anomalies

detection, infirmation diffusion, . . .

Community oriented

13 / 104

Page 14: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

CENTRALITY: EXAMPLES

Degree centrality

I Cd(v) = ‖(Γ(v)‖maxu∈V‖Γ(u)‖

I Complexity : O(m)

14 / 104

Page 15: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

CENTRALITY: SOME EXAMPLES

Closeness centrality

I Cc(v) = 1∑u∈V sp(v,u)

I sp(u, v) : shortest path lengthbetween u, v.

I Complexity =O(n×m + n2logn)

15 / 104

Page 16: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

CENTRALITY: SOME EXAMPLES

Betweenness centrality

I Ci(v) =∑

s,t∈V,stv

σs,t(v)σs,t

I σs,t(v) : number of shortestpaths linking s to t thatinclude v

I σs,t : total number of shortestpaths linking s to t

I Complexity = O(n3)

16 / 104

Page 17: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

COMMUNITY ?

DefinitionsI A dense subgraph loosely coupled to other modules in the

networkI A community is a set of nodes seen as one by nodes outside the

communityI A subgraph where almost all nodes are linked to other nodes in

the community.I . . .

17 / 104

Page 18: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

COMMUNITY DETECTION PROBLEM

I Local community identification (ego-centred).

I Network partition computing

I Overlapping community detection

18 / 104

Page 19: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

LOCAL COMMUNITY

19 / 104

Page 20: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

LOCAL COMMUNITY

20 / 104

Page 21: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

LOCAL COMMUNITY

1 C← {φ}, B← {n0} S← Γ(n0)

2 Q← 0 /* a community quality function */

3 While Q can be enhanced Do

1 n← argmaxn∈SQ2 S← S− {n}3 D← D + {n}4 update B,S,C

4 Return D

21 / 104

Page 22: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

QUALITY FUNCTIONS : EXEMPLES I

Local modularity R [Cla05]

R = BinBin+Bout

Local modularity M [LWP08]

M = DinDout

Local modularity L [CZG09]

L = LinLex

where : Lin =

∑i∈D‖Γ(i)∩D‖

‖D‖ , Lex =

∑i∈B‖Γ(i)∩S‖

‖B‖

And many many others . . . [YL12]

22 / 104

Page 23: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

LOCAL MODULARITY LIMITATIONS: AN EXEMPLE

� Blank nodes enhance Bin and Din without affecting Bout and Dout

� Bank nodes will be added if M or R modularities are used� Low precision computed communities� Proposed solution: Ensemble approaches

23 / 104

Page 24: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

MULTI-OBJECTIVE LOCAL COMMUNITY

IDENTIFICATION [?]

Three main approaches

Combine then RankEnsemble rankingEnsemble clustering

24 / 104

Page 25: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

COMBINE THEN RANK

Principle

Let Qi(s) be the local modularity value induced by node s ∈ S

Qi(s) =

Qi(s)−min

u∈SQi(u)

maxu∈S

Qi(u)−minu∈S

Qi(u) if maxu∈S

Qi(u) 6= minu∈S

Qi(u)

1 otherwise

Qcom(s) = 1k

k∑i=1

Qi(s)

25 / 104

Page 26: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

ENSEMBLE RANKING APPROACHES

Principle

I Rank S in function of each local modularity Qi

I Select the winner after applying ensemble ranking approachI What stopping criteria to apply ?

Stopping criteria

I Strict policy : All modularities should be enhancedI Majority policy : Majority of local modularities are enhancedI Least gain policy : At least one local modularity is enhanced.

26 / 104

Page 27: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

ENSEMBLE RANKING

ProblemI Let S be a set of elements to rank by n rankersI Let σi be the rank provided by ranker iI Goal: Compute a consensus rank of S.

Deja Vu: Social choice algorithms, but . . .

I Small number of voters and big number of candidatesI Algorithmic efficiency is requiredI Output could be a complete rank

27 / 104

Page 28: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

ENSEMBLE RANKING

ProblemI Let S be a set of elements to rank by n rankersI Let σi be the rank provided by ranker iI Goal: Compute a consensus rank of S.

Deja Vu: Social choice algorithms, but . . .

I Small number of voters and big number of candidatesI Algorithmic efficiency is requiredI Output could be a complete rank

28 / 104

Page 29: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

ENSEMBLE RANKING : APPROACHES

Jean-Charles de Borda [1733-1799]

BordaI Borda’s score of i ∈ σk :

Bσk(i) = {count(j)|σk(j) < σk(i) ; j ∈ σk}.I Rank elements in function of

B(i) =∑k

t=1 wt × Bσt(i).

29 / 104

Page 30: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

ENSEMBLE RANKING : APPROACHES

Marquis de Condorcet [1743-1794]

CondorcetI The winner is the candidate who defeats

every other candidate in pairwisemajority-rule election

I The winner may not exists

Condorcet 6= BordaI Votes : 6× A � B � C, 4× B � C � AI Borda winner : BI Condorcet winner : A

30 / 104

Page 31: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

ENSEMBLE RANKING : APPROACHES

Marquis de Condorcet [1743-1794]

CondorcetI The winner is the candidate who defeats

every other candidate in pairwisemajority-rule election

I The winner may not exists

Condorcet 6= BordaI Votes : 6× A � B � C, 4× B � C � AI Borda winner : BI Condorcet winner : A

31 / 104

Page 32: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

ENSEMBLE RANKING : APPROACHES

Extended Condorcet criterionIf for every a ∈ A and b ∈ B, majority prefers a to b the all elements in Ashould be ranked before any element in B.

Kenneth Arrow, 1921-

Arrow’s TheoremNo vote rule can have the following desiredproprieties :I Every result must be achievable somehow.I Monotonicity.I Independence of irrelevant attributes.I Non-dictatorship.

32 / 104

Page 33: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

ENSEMBLE RANKING : APPROACHES

John Kemeny 1926-1992

Optimal Kemeny rank aggregation

I Let d() be distance over rankings σi (ex.Kendall τ , Spearman’s footrule)

I Find π that minimise∑

i d(π, σi)

I NP-Hard problem

I Approximation : Local Kemeny : twoadjacent candidats are in the good order.

I Local Kemeny : Apply Bubble sort using themajority preference partial order relationship

I Approximate Kemeny : Apply QuickSort

33 / 104

Page 34: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

ENSEMBLE CLUSTERING APPROACHES

Principle

I Let CQivq be the the local community of vq applying Qi.

I We have a natural partition : PQi = {CQivq ,C

Qivq }

I Apply an ensemble clustering approach.

34 / 104

Page 35: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

ENSEMBLE CLUSTERING: PRINCIPLE

from A. Topchy et. al. Clustering Ensembles: Models of Consensus and Weak Partitions. PAMI, 2005

35 / 104

Page 36: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

ENSEMBLE CLUSTERING: APPROACHES

CSPA: Cluster-based Similarity Partitioning Algorithm

I Let K be the number of basic models, Ci(x) be the cluster in modeli to which x belongs.

I Define a similarity graph on objects : sim(v,u) =

K∑i=1

δ(Ci(v),Ci(u))

K

I Cluster the obtained graph :Isolate connected components after prunning edgesApply community detection approach

I Complexity : O(n2kr) : n # objects, k # of clusters, r# of clusteringsolutions

36 / 104

Page 37: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

CSPA : EXEMPLE

from Seifi, M. Cœurs stables de communautes dans les graphes de terrain. These de l’universite Paris 6, 2012

37 / 104

Page 38: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

ENSEMBLE CLUSTERING: APPROACHES

HGPA: HyperGraph-Partitioning Algorithm

I Construct a hypergraph where nodes are objects and hyperedgesare clusters.

I Partition the hypergraph by minimizing the number of cuthyperedges

I Each component forms a meta cluster

I Complexity : O(nkr)

38 / 104

Page 39: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

ENSEMBLE CLUSTERING: APPROACHES

MCLA: Meta-Clustering Algorithm

I Each cluster from a base model is an itemI Similarity is defined as the percentage of shared common objectsI Conduct meta-clustering on these clustersI Assign an object to its most associated meta-cluster

I Complexity : O(nk2r2)

39 / 104

Page 40: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

EXPERIMENTS

Protocol ([Bag08])

1 Apply the different algorithms on nodes in networks for which aground-truth community partition is known.

2 For each node compute the distance between the real-partitionand the computed one (Ex. NMI [Mei03])

3 Compute average and standard deviation for the network.

40 / 104

Page 41: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

DATASETS

Zachary’s Karate Club

US Political books network

Football network

Dolphins social network

41 / 104

Page 42: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

RESULTS : EVALUATING STOPPING CRITERIA (NMI)

Zachary’s Karate Club

US Political books network

Football network

Dolphins social network

42 / 104

Page 43: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

RESULTS : COMPARATIVE RESULTS (NMI)

Zachary’s Karate Club

US Political books network

Football network

Dolphins social network

43 / 104

Page 44: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

GLOBAL COMMUNITIES DETECTION

ProblemI Divid the set of nodes in a number of

(overlapping) subsets such that inducedsubgraphs are dense and loosley coupled.

Recommended readings

I S. Fortunato. Community detection in graphs. Physics Reports, 2010,486, 75-174

I L. Tang, H. Liu. Community Detection and Mining in Social Media,Morgan Claypool Publishers, 2010

I R. Kanawati, Detection de communautes dans les grands graphesd’interactions multiplexes : etat de l’art, (RNTI 2014), hal-00881668

44 / 104

Page 45: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

GLOBAL COMMUNITIES DETECTION

ProblemI Divid the set of nodes in a number of

(overlapping) subsets such that inducedsubgraphs are dense and loosley coupled.

Recommended readings

I S. Fortunato. Community detection in graphs. Physics Reports, 2010,486, 75-174

I L. Tang, H. Liu. Community Detection and Mining in Social Media,Morgan Claypool Publishers, 2010

I R. Kanawati, Detection de communautes dans les grands graphesd’interactions multiplexes : etat de l’art, (RNTI 2014), hal-00881668

45 / 104

Page 46: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

COMMUNITIES DETECTION: METHODS

Classification

I Group-basedNodes are grouped in function of shared topological features (ex. clique)

I Network-basedClustering, Graph-cut, block-models, modularity optimization

I Propagation-basedUnstable approaches, good when used with ensemble clustering

I Seed-centredfrom local community identification to global community detection

46 / 104

Page 47: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

GROUP-BASED APPROCHES

Principle

Search for special (dense) subgraphs:I k-cliqueI n-cliqueI γ-dense cliqueI K-core

47 / 104

Page 48: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

GROUP-BASED APPROCHES

from Symeon Papadopoulos, Community Detection inSocial Media, CERTH-ITI, 22 June 2011

48 / 104

Page 49: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

EXAMPLE: CLIQUE PERCOLATION

Suits fairly dense graph.

49 / 104

Page 50: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

NETWORK-BASED APPROCHES

Clustering approaches

I Apply classical clustering approaches using graph-based diastasefunction

I Different types of Graph-based distances: neighborhood-based,path-based (Random-walk)

I Usually requires the number of clusters to discover

50 / 104

Page 51: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

MODULARITY OPTIMIZATION APPROACHES

Modularity: a partition quality criteria

Q(P) =1

2m

∑c∈P

∑i,j∈c

(Aij −didj

2m) (1)

Figure: Example : Q = (15+6)−(11.25+2.56)25 = 0.275

51 / 104

Page 52: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

MODULARITY OPTIMIZATION APPROACHES

I Applying classical optimization algorithms (ex. Geneticalgorithms [Piz12]).

I Applying hierarchical clustering and select the level with Qmax(ex. Walktrap [PL06])

I Divisive approach : Girvan-Newman algorithm [GN02]I Greedy optimization : Louvain algorithm [BGL08]I . . .

52 / 104

Page 53: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

EXAMPLE: GIRVAN-NEWMAN ALGORITHM

1 Compute betweenness centrality for each edge.

2 Remove edge with highest score.

3 Re-compute all scores.

4 Repeat 2nd step.

Complexity : O(n3)

53 / 104

Page 54: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

EXAMPLE: LOUVAIN APPROACH

54 / 104

Page 55: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

MODULARITY OPTIMIZATION LIMITATIONS

Hypothesis

The best partition of a graph is the one that maximize themodularity.If a network has a community structure, then it is popsicle to finda precise partition with maximal modularityIf a network has a community structure, then partitions inducinghigh modularity values are structurally similar.

All three hypothesis do not hold [GdMC10, LF11].

55 / 104

Page 56: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

MODULARITY RESOLUTION LIMITE

For m = 3 MaxQ = 0.675 while natural partition has Q = 0.65 56 / 104

Page 57: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

MODULARITY DISTRIBUTION

57 / 104

Page 58: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

PROPAGATION-BASED APPROACHES

Algorithm 1 Label propagation

Require: G =< V,E > a connected graph,1: Initialize each node with unique label lv2: while Labels are not stable do3: for v ∈ V do

lv =l |Γl(v)| /* random tie-breaking */4: end for5: end while6: return communities from labels

Γl(v) : set of neighbors having label l

58 / 104

Page 59: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

LABEL PROPAGATION

Advantages

I Complexity : O(m)

I Highly parallel

Disadvantages

I No convergence guarantee, oscillationphenomena

I Low robustnessDifferent runs yields very different community

structure due to randomness

59 / 104

Page 60: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

LABEL PROPAGATION: CONVERGENCE

ENHANCEMENT

I Asynchronous label update, [RAK07]I Semi-synchronous label update (graph coloring + propagation by

color) [CG12].

I Making stability even worse !I Harden parallel implementations.

60 / 104

Page 61: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

ROBUST LABEL PROPAGATION

I Label hop attenuation [LHLC09]I Balanced label propagation [SB11].

I Multiplex approach !adding new neighborhood similarity based relationships betweenadjacent nodes→ multiplex network

I Ensemble clustering→ Communities core [OGS10, SG12, LF12].

61 / 104

Page 62: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

PARALLEL LP

62 / 104

Page 63: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

LABEL PROPAGATION PREPROCESSING (EPP)

1 Apply an ensemble clustering on results of basic Parallel LP

2 Coarse the graph according to obtained communities

3 Apply a high quality community detection algorithm on coarsedgraph.

4 Expand obtained results to the initial graph.

63 / 104

Page 64: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

EVALUATIONS

benchmark networks Pareto front

64 / 104

Page 65: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

SEED-CENTRIC ALGORITHMS (KANAWATI, SCSM’2014)

Algorithm 2 General seed-centric community detection algorithm

Require: G =< V,E > a connected graph,1: C ← ∅2: S← compute seeds(G)3: for s ∈ S do4: Cs ← compute local com(s,G)5: C ← C + Cs6: end for7: return compute community(C)

65 / 104

Page 66: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

SEED-CENTRIC APPROACHES

Table: Characteristics of major seed-centric algorithms

Algorithm Seed Nature Seed Number Seed selection Local Com. Com. computationLeaders-Followers [SZ10] Single Computed informed Agglomerative -Top-Leaders [KCZ10] Single Input Random Expansion -PapadopoulosKVS12 Subgraph Computed Informed Expansion -WhangGD13 Single Computed informed Expansion -BollobasR09 Subgraph Computed informed expansion -

Licod [Kan11] Set Computed Informed Agglomerative -Yasca [?] Single Computed Informed Expansion Ensemble clustering

66 / 104

Page 67: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

EXEMPLE: LICOD (KANAWATI, SOCILACOMP’2011)

Licod : the idea !

1 Compute a set of seeds that are likely to be leaders in theircommunities

Heuristic : nodes having higher centralities than their neighbors

2 Each node in the graph ranks seeds in function of its ownpreference

3 Each node modify its preference vector in function of neighbor’spreferences

4 iterate max times or till convergence.

67 / 104

Page 68: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

LICOD: SOME RESULTS

Comparative results on benchmark networks : NMI

More results in ([YK14])68 / 104

Page 69: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

THE YASCA ALGORITHM (KANAWATI, COCOON’2014)

The algorithm

Yet Another Seed-centric Community detection Algorithm

1 Compute a set of diverse seed nodes

2 For each node compute a bi-partition of the whole graphe basedon local community identification.

3 Apply ensemble clustering to obtain a graph partition.

69 / 104

Page 70: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

YASCA: SOME RESULTS

Comparative Results on benchmark networks : NMI

Select 15% of high central nodes and 15% of low central nodes

70 / 104

Page 71: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

EVALUATION METHODS

I Topological criteria : is it useful for applications ?I Ground-truth comparaison : hard to find on large-scaleI Task-oriented approaches : Clustering , Recommendation, link

prediction, . . .

71 / 104

Page 72: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

TOPOLOGICAL EVALUATION CRITERIA

Quality of C = {S1, . . . ,Si} :

Q(C) =

∑i

f (Si)

|C|(2)

f () is a single- community quality metric4 types of quality functions

Internal connectivityExterneal connectivityHybrid functionsModel-based functions: the modularity

72 / 104

Page 73: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

INTERNAL CONNECTIVITY FUNCTIONS

Internal density : f (S) = 2×mSnS×(nS−1)

Average Degree : f (S) = 2×mSnS

FOMD: Fraction over Median Degree : f (s) = |{u:u∈S,|(u,v),v∈S|>dm}|nS

,dm est le median de degres des nœuds dans V

TPR: Triangle Participation Ratio : |{u∈S}:∃v,w∈S:(u,v),(w,v),(u,w)∈E|nS

73 / 104

Page 74: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

EXTERNAL CONNECTIVITY FUNCTIONS

Expansion : f (S) = CSnS

Cut : f (S) = CsnS×(N−nS)

74 / 104

Page 75: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

HYBRID FUNCTIONS

I Conductance : f (S) = CS2mS+CS

I MAX-ODF : Out Degree Fraction : f (S) = maxu∈S|{(u,v)∈E,v6∈S}|

d(u)

I AVG-ODF : f (S) = 1nS×∑u∈S

|{(u,v)∈E,v6∈S}|d(u)

75 / 104

Page 76: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

GROUND-TRUTH BASED EVALUATION

I Principle : Computing a similarity between obtained partition anda ground-truth one.

I Ground-truth :Expert defined.Model-generated

I Two types of metrics :Agreement-based metrics: purity, rand, ARIInformation theory metrics: MI, NMI, . . . , etc.

76 / 104

Page 77: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

PURITY

I P = {p1, p2, . . . , pk} : a computed partitionI R = {r1, r2, . . . , rn} : ground-truth partition

I purity(P,R) = 1|V|∑k

j=1 maxi(|pk ∩ ri|)

77 / 104

Page 78: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

RAND INDEX

I P = {p1, p2, . . . , pk} : a computed partitionI R = {r1, r2, . . . , rn} : ground-truth partitionI a : # pairs of nodes in a same community according to P andRI b : # pairs of nodes in a same community according to P and in

different communities inRI c : # pairs of nodes in a different communities according to P and

in same community inRI d : # pairs of nodes in a different communities in P andRI

I rand(P,R) = a+da+b+c+d

I ARI = C2n(a+d)−[(a+b)(a+c)+(c+d)(b+d)]

(C2n)2−[(a+b)(a+c)+(c+d)(b+d)]

I E(ARI)=0I rappel : Ck

n = n!k!(n−k)!

78 / 104

Page 79: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

MUTUAL INFORMATION

I(X,Y) = H(X) + H(Y)−H(X,Y)

H(X) =∑nx

i=1 p(xi)× log( 1p(xi)

)

Normalisation : NMI(U,V) = I(U,V)√H(U)H(V)

79 / 104

Page 80: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

CHALLENGES

I Real-world networks are DynamicI Real-world networks are HeterogeneousI Nodes have usually semantic attributesI Scale problemI Towards multiplex network mining

80 / 104

Page 81: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

MULTIPLEX NETWORK

DefinitionA set of nodes related by different types of relations

Source: muxviz

MotivationI Real networks are dynamic.I Real networks are heterogeneous.I Nodes are usually qualified by a set

of attributes.

81 / 104

Page 82: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

MULTIPLEX NETWORK: NOTATIONS

G =< V,E1, . . . ,Eα : Ek ⊆ V × V ∀k ∈ {1, . . . , α} >

I V: set of nodes (a.k.a. vertices, actors,sites)

I Ek: set of edges of type k (a.k.a. ties,links, bonds)

NotationsI A[k] Adjacency Matrix of slice k : a[k]ij 6= 0 si les nœuds (vi, vj) ∈ Ek, 0 otherwise.

I m[k] = |Ek|. We have often m ∼ n

I Neighbor’s of v in slice k: Γ(v)[k] = {x ∈ V : (x, v) ∈ Ek}.I Node degree in slice k: dk

v =‖ Γ(v)[k] ‖

I Total degree of node i: dtotv =

∑αs=1 d[s]v

I All neighbors of v : Γ(v)tot = ∪s∈{1,...,α}Γ(c)[s]

82 / 104

Page 83: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

COMMUNITY ?

Some definitions

I A dense subgraph loosely coupled to other modules in the network

I A community is a set of nodes seen as one by nodes outside thecommunity

I A subgraph where almost all nodes are linked to other nodes in thecommunity.

What is a dense subgraph in a multiplex network ?BerlingerioCG11

83 / 104

Page 84: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

COMMUNITY DETECTION IN MULTIPLEX NETWORKS

Approaches

1 Transformation into a monoplex community detection problem

I Layer aggregation approaches.I Ensemble clustering approachesI Hypergraph transformation based approaches

2 Generalization of monoplex oriented algorithms to multiplexnetworks.

I Generalized-modularity optimizationI Seed-centric approaches

84 / 104

Page 85: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

LAYER AGGREGATION

Aggregation functions

Aij =

{1 ∃1 ≤ l ≤ α : A[l]

ij 6= 0

0 otherwise

Aij =‖ {d : A[d]ij 6= 0} ‖

Aij =1α

α∑k=1

wkA[k]ij

Aij = sim(vi, vj)

85 / 104

Page 86: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

LAYER AGGREGATION

Aggregation functions

Aij =

{1 ∃1 ≤ l ≤ α : A[l]

ij 6= 0

0 otherwise

Aij =‖ {d : A[d]ij 6= 0} ‖

Aij =1α

α∑k=1

wkA[k]ij

Aij = sim(vi, vj)

86 / 104

Page 87: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

ENSEMBLE CLUSTERING APPROACHES

Ensemble Clustering Strehl2003

I CSPA: Cluster-based Similarity Partitioning AlgorithmI HGPA: HyperGraph-Partitioning AlgorithmI MCLA: Meta-Clustering AlgorithmI . . .

87 / 104

Page 88: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

ENSEMBLE CLUSTERING APPROACHES

Ensemble Clustering Strehl2003

I CSPA: Cluster-based Similarity Partitioning AlgorithmI HGPA: HyperGraph-Partitioning AlgorithmI MCLA: Meta-Clustering AlgorithmI . . .

88 / 104

Page 89: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

K-UNIFORM HYPERGRAPH TRANSFORMATION

KIVELA2013MULTILAYER

Principle

I A k-uniform hypergraph is a hypergraph in which the cardinalityof each hyperedge is exactly k

I Mapping a multiplex to a 3-uniform hypergraphH = (V, E) suchthat :

V = V ∪ {1, . . . , α}(u, v, i) ∈ E if ∃l : A[l]

uv 6= 0, u, v ∈ V, i ∈ {1, . . . , α}I Apply community detection approaches in Hypergraphs (Ex.

tensor factorization approaches)

89 / 104

Page 90: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

MULTIPLEX MODULARITY

Generalized modularity mucha2010communityI

Qmultiplex(P) =1

∑c∈P

∑i,j∈c

k,l:1→α

A[s]ij − λk

d[k]i d[k]

j

2m[k]

δkl + δijCklij

I µ =

∑j∈V

k,l:1→α

m[k] + Cljk

I Cklij Inter slice coupling = 0 ∀i 6= j

90 / 104

Page 91: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

MODULARITY OPTIMIZATION LIMITATIONS

Hypothesis

I The best partition of a graph is the one that maximize themodularity.

I If a network has a community structure, then it is possible to finda precise partition with maximal modularity.

I If a network has a community structure, then partitions havinghigh modularity values are structurally similar.

All three hypothesis do not hold Good10, LAN11a.

91 / 104

Page 92: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

MUXLICOD

Multiplex degree centrality [?]

dmultiplexi = −

α∑k=1

d[k]i

d[tot]i

log

(d[k]

i

d[tot]i

)

Multiplex shortest path

SP(u, v)multiplex =

α∑k=1

SP(u, v)[k]

α

Multiplex neighborhood

Γmux(v) = {x ∈ Γ(v)tot :Γ(v)tot ∩ Γ(x)tot

Γ(v)tot ∪ Γ(x)tot ≥ δ}

92 / 104

Page 93: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

EXPERIMENTS

Datasets

1 European airports network

2 Bibliographical network (DBLP) : Three layer network :co-authorship, co-venue, co-citing.

93 / 104

Page 94: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

EVALUATION CRITERIA

1 Multiplex modularity

2 Redundancy [?]

ρ(c) =∑

(u,v)∈ ¯Pc

‖ {k : ∃A[k]uv 6= 0} ‖

α× ‖ Pc ‖

¯P the set of couple (u, v) which are directly connected in at leasttwo layers

94 / 104

Page 95: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

RESULTS: LAYER AGGREGATION APPROACHES

Figure: European airports multiplex network

95 / 104

Page 96: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

RESULTS: PARTITION AGGREGATION APPROACHES

Figure: European airports multiplex network

96 / 104

Page 97: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

RESULTS: MUXLICOD

Figure: European airports multiplex network

97 / 104

Page 98: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

RESULTS: MUXLICOD

Figure: DBLP 3-layer multiplex

98 / 104

Page 99: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

CONCLUSIONS I

I Networks are everywhere (ou presque)

I Network analysis can be applied to different kind of tasks

I Community detection can help in different analysis andapplication tasks

I Challenges : Scale-problems, multiplex network miningProblems : Evaluation and interoperation of computedcommunities.

99 / 104

Page 100: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

BIBLIOGRAPHY I

J. P. Bagrow, Evaluating local community methods in networks, J. Stat. Mech. 2008 (2008), no. 5, P05001.

Vincent D Blondel, Jean-loup Guillaume, and Etienne Lefebvre, Fast unfolding of communities in large networks, Journal ofStatistical Mechanics: Theory and Experiment 2008 (2008), P10008.

Gennaro Cordasco and Luisa Gargano, Label propagation algorithm: a semi-synchronous approach, IJSNM 1 (2012), no. 1, 3–26.

S. Carmi, S. Havlin, S. Kirkpatrick, Y. Shavitt, and E. Shir, A model of internet topology using k-shell decomposition, PNAS 104(2007), no. 27, 11150–11154.

Aaron Clauset, Finding local community structure in networks, Physical Review E (2005).

Jiyang Chen, Osmar R. Zaıane, and Randy Goebel, Local community identification in social networks, ASONAM, 2009,pp. 237–242.

100 / 104

Page 101: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

BIBLIOGRAPHY II

B. H. Good, Y.-A. de Montjoye, and A. Clauset., The performance of modularity maximization in practical contexts., PhysicalReview E (2010), no. 81, 046106.

M. Girvan and M. E. J. Newman, Community structure in social and biological networks, PNAS 99 (2002), no. 12, 7821–7826.

Rushed Kanawati, Licod: Leaders identification for community detection in complex networks, SocialCom/PASSAT, 2011,pp. 577–582.

Reihaneh Rabbany Khorasgani, Jiyang Chen, and Osmar R Zaiane, Top leaders community detection approach in informationnetworks, 4th SNA-KDD Workshop on Social Network Mining and Analysis (Washington D.C.), 2010.

Andrea Lancichinetti and Santo Fortunato, Limits of modularity maximization in community detection, CoRR abs/1107.1(2011).

, Consensus clustering in complex networks, Sci. Rep. 2 (2012).

101 / 104

Page 102: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

BIBLIOGRAPHY III

Ian X.Y. Leung, Pan Hui, Pietro Lio, and Jon Crowcroft, Towards real-time community detection in large networks, Phys. Phy. E.79 (2009), no. 6, 066107.

Feng Luo, James Zijun Wang, and Eric Promislow, Exploring local community structures in large networks, Web Intelligenceand Agent Systems 6 (2008), no. 4, 387–400.

Marina Meila, Comparing clusterings by the variation of information, COLT (Bernhard Scholkopf and Manfred K. Warmuth,eds.), Lecture Notes in Computer Science, vol. 2777, Springer, 2003, pp. 173–187.

Michael Ovelgonne and Andreas Geyer-Schulz, Cluster cores and modularity maximization, ICDM Workshops, 2010,pp. 1204–1213.

Clara Pizzuti, A multiobjective genetic algorithm to find communities in complex networks, IEEE Trans. EvolutionaryComputation 16 (2012), no. 3, 418–430.

102 / 104

Page 103: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

BIBLIOGRAPHY IV

Pascal Pons and Matthieu Latapy, Computing communities in large networks using random walks, J. Graph Algorithms Appl.10 (2006), no. 2, 191–218.

Usha N. Raghavan, Roka Albert, and Soundar Kumara, Near linear time algorithm to detect community structures in large-scalenetworks, Physical Review E 76 (2007), 1–12.

Lovro Subelj and Marko Bajec, Robust network community detection using balanced propagation, European Physics Journal B81 (2011), no. 3, 353–362.

Massoud Seifi and Jean-Loup Guillaume, Community cores in evolving networks, WWW (Companion Volume) (Alain Mille,Fabien L. Gandon, Jacques Misselis, Michael Rabinovich, and Steffen Staab, eds.), ACM, 2012, pp. 1173–1180.

D Shah and T Zaman, Community detection in networks: The leader-follower algorithm, Workshop on Networks AcrossDisciplines in Theory and Applications, NIPS, 2010.

103 / 104

Page 104: Social network analysis: Community detectionkanawati/ars/rk-eisti-ado2.pdf · INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detectionChallengesConclusion PLAN 1 INTRODUCTION

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

BIBLIOGRAPHY V

Zied Yakoubi and Rushed Kanawati, Interactions in complex systems, ch. Community detection in complex interactionnetworks: Seed-based approaches, Cambridge Scholar Publishing, 2014.

Jaewon Yang and Jure Leskovec, Defining and evaluating network communities based on ground-truth, ICDM(Mohammed Javeed Zaki, Arno Siebes, Jeffrey Xu Yu, Bart Goethals, Geoffrey I. Webb, and Xindong Wu, eds.), IEEEComputer Society, 2012, pp. 745–754.

104 / 104