KARLSRUHE INSTITUTE OF TECHNOLOGY – INSTITUTE OF THEORETICAL INFORMATICS

Algorithmic Methods for Complex Network Analysis: Graph Clustering

Summer School on Algorithm Engineering

Dorothea Wagner | September 19, 2014

KIT – University of the State of Baden-Wuerttemberg and National Laboratory of the Helmholtz Association

www.kit.edu

Scenario of Network Analysis

Given a network . . .

[Figure: drawing of a network with 34 numbered vertices]

- explore the instance
- derive its structure
- identify its properties

How can we learn about the instance?

Dorothea Wagner –Algorithmic Methods for Complex Network Analysis: Graph Clustering September 19, 2014 2/49


An Archetypal Example

“Zachary’s Karate Club”, a real social network

[Figure: Zachary’s karate club network, vertices numbered 1 to 34]

- 2 years of observation
- 34 vertices = members
- 78 edges = social ties
- the club split up after a dispute: manager vs. trainer
- the archetype of toy examples

Caused by an “unequal flow of sentiments and information across the ties”, a “factional division led to a formal separation of the club”. [Wayne Zachary: An Information Flow Model for Conflict and Fission in Small Groups, ’77]


A Glimpse of Network Analysis

graph clustering / detecting communities

[Figure: the karate club network clustered into four groups, Group 1 to Group 4; box = cluster]


Scaling of Real-World Instances

“Zachary’s Karate Club” (vertices/edges = 34/78)


“US college football”: teams and matches (vertices/edges = 115/616)


variables of a SAT instance; edges = direct dependencies (electronic components) (vertices/edges ≈ 2K/6K)


scientific collaborations: 3-hop neighborhood of D. Wagner (DBLP) (vertices/edges ≈ 10K/40K)


physical Internet: autonomous systems (vertices/edges ≈ 20K/60K)


instance                       vertices   edges
coauthors in DBLP              300K       1M
roads in the USA               24M        60M
WWW: .UK domain ’02            20M        500M
(neurons in the human brain    10¹¹       ∼10¹⁷)


. . . no limit to be expected. . .

Clustering: Intuition to Formalization

Task: partition a graph into natural groups
Paradigm: intra-cluster density vs. inter-cluster sparsity

Different approaches exist to formalize this paradigm, usually:

Paradigm of Graph Clustering: intra-cluster density vs. inter-cluster sparsity

⇓ Mathematical Formalization

quality measures for clusterings

Many measures exist; optimizing them is generally (NP-)hard.
There is no single, universally best strategy.


Algorithm Engineering

[Figure: the algorithm engineering cycle around “Algorithms”: design, analyze, implement, experiment]

- modelling reality is hard
- finding optima is hard
- satisfying the needs of the application is hard

Still, we do need to cluster ⇒ we need a good foundation.


Clustering vs. Partitioning

                   clustering          partitioning
purpose            analysis (pred.)    handling of the instance
. . . and then?    zoom/abstraction    computations on parts
# of parts         open                predefined (upper bound)
size of parts      open                upper bound (or even fixed)
criteria           various (later)     weighted cuts
constraints        often none          see above
applications       various (later)     often: distributed finite element methods on 3D meshes of objects


Bicriterial Formulations

Observations:
1. clusterings are often “nice” if balanced (like partitions)
2. intra-density vs. inter-sparsity is bicriterial

Bicriterial (or multicriterial) measures for clusterings can help:
- constrain sparsity within clusters
- constrain density between clusters
- explicitly formulate desiderata

(more on bicriteria later)


Postulations to a Measure

Given a graph G and a clustering C, a quality measure should behave as follows:

- more intra-cluster edges ⇒ higher quality
- fewer inter-cluster edges ⇒ higher quality
- cliques must never be separated
- clusters must be connected
- random clusterings should have bad quality
- disjoint cliques should approach maximum quality
- locality of the measure (being better/worse in one part does not depend on what is done in other parts of the graph)
- double the instance, what should happen? . . . the same result
- comparable results across instances
- fulfill the desiderata of the application
- . . .


Formalization via Bottleneck

[Figure: a small social network whose vertices are labeled with first names, shown twice with a clustering]

Quality of the clustering, upper cluster:
- inter-cluster sparsity: 2 edges for cutting off 7 nodes (cheap)
- intra-cluster density: best additional cut: 3 edges for cutting off 4 nodes (expensive)


Examples: Conductance, Expansion

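conductance of a cut $(C, V \setminus C)$:

$$\varphi(C, V \setminus C) := \frac{\omega(E(C, V \setminus C))}{\min\left\{\sum_{v \in C} \omega(v),\ \sum_{v \in V \setminus C} \omega(v)\right\}}$$

(i.e., the thickness of the bottleneck that cuts off $C$)

$$\text{inter-cluster conductance}(\mathcal{C}) := 1 - \max_{C \in \mathcal{C}} \varphi(C, V \setminus C)$$

(i.e., 1 − the worst bottleneck induced by some $C \in \mathcal{C}$)

$$\text{intra-cluster conductance}(\mathcal{C}) := \min_{C \in \mathcal{C}} \ \min_{P \uplus Q = C} \varphi|_C(P, Q)$$

(i.e., the best bottleneck still left uncut inside some $C \in \mathcal{C}$)

expansion of a cut $(C, V \setminus C)$:

$$\psi(C, V \setminus C) := \frac{\omega(E(C, V \setminus C))}{\min\{|C|,\ |V \setminus C|\}}$$

(i.e., in $\varphi$, replace $\omega(v)$ by 1; intra- and inter-cluster expansion are defined analogously)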
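The following is a minimal Python sketch of these definitions. It assumes an unweighted, undirected graph stored as adjacency sets, with unit edge weights. It also assumes ω(v) = deg(v); the slide does not fix ω(v), and degree is only one common choice. All function names are hypothetical. Intra-cluster conductance is computed by brute force over all bipartitions P ⊎ Q of each cluster, so it only works for tiny clusters.

```python
from itertools import combinations

# Assumptions (not from the slides): undirected, unweighted graph given as
# {vertex: set_of_neighbours}, every edge has weight 1, and the vertex
# weight omega(v) is the degree deg(v).


def cut_size(graph, part):
    """omega(E(C, V \\ C)): number of edges with exactly one endpoint in `part`."""
    return sum(1 for u in part for v in graph[u] if v not in part)


def conductance(graph, part):
    """phi(C, V \\ C) with omega(v) = deg(v).

    If one side has volume 0, no edge can cross the cut, so 0.0 is returned.
    """
    vol_in = sum(len(graph[v]) for v in part)
    vol_out = sum(len(graph[v]) for v in graph if v not in part)
    denom = min(vol_in, vol_out)
    return cut_size(graph, part) / denom if denom > 0 else 0.0


def expansion(graph, part):
    """psi(C, V \\ C): like phi, but every vertex has weight 1."""
    denom = min(len(part), len(graph) - len(part))
    return cut_size(graph, part) / denom if denom > 0 else 0.0


def inter_cluster_conductance(graph, clustering):
    """1 - the largest (worst) phi(C, V \\ C) over all clusters C."""
    return 1.0 - max(conductance(graph, set(c)) for c in clustering)


def intra_cluster_conductance(graph, clustering):
    """Smallest phi over all bipartitions P, Q of every cluster C, measured in
    the subgraph induced by C. Brute force, exponential in |C|.

    phi never exceeds 1, so 1.0 is returned if no cluster can be split
    (an assumed convention for clusters of size 1).
    """
    best = 1.0
    for cluster in clustering:
        cluster = set(cluster)
        induced = {v: graph[v] & cluster for v in cluster}
        members = list(cluster)
        # phi is symmetric in P and Q, so |P| <= |C| / 2 suffices.
        for r in range(1, len(members) // 2 + 1):
            for p in combinations(members, r):
                best = min(best, conductance(induced, set(p)))
    return best


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3.
two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
                 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
clustering = [{0, 1, 2}, {3, 4, 5}]
print(conductance(two_triangles, {0, 1, 2}))                 # 1/7
print(expansion(two_triangles, {0, 1, 2}))                   # 1/3
print(inter_cluster_conductance(two_triangles, clustering))  # 6/7
print(intra_cluster_conductance(two_triangles, clustering))  # 1.0
```

In this example, the bridge 2-3 is a thin bottleneck between the clusters, so it yields a low cut conductance (1/7). Every split inside a triangle cuts half of its volume, so it yields the maximum value 1.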


Formalization: Counting Edges

[Figure: a small social network whose vertices are labeled with first names, with a clustering]

Measuring clustering quality by counting edges:
- inter-cluster sparsity: 6 edges out of ca. 800 node pairs (few)
- intra-cluster density: 53 edges out of 99 node pairs (many)

Example quality measure: coverage = # intra-cluster edges / # edges ≈ 0.9
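As a worked check of the coverage value above, assume that the 53 intra-cluster edges and the 6 inter-cluster edges are all the edges of the drawn graph:

$$\mathrm{cov}(\mathcal{C}) = \frac{53}{53 + 6} = \frac{53}{59} \approx 0.898 \approx 0.9$$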


Example Counting Measures

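coverage:

$$\mathrm{cov}(\mathcal{C}) := \frac{\#\,\text{intra-cluster edges}}{\#\,\text{edges}}$$

(i.e., the fraction of covered edges)

performance:

$$\mathrm{perf}(\mathcal{C}) := \frac{\#\,\text{intra-cluster edges} + \#\,\text{absent inter-cluster edges}}{\frac{1}{2}\,n(n-1)}$$

(i.e., the fraction of correctly classified pairs of nodes)

density:

$$\mathrm{den}(\mathcal{C}) := \frac{1}{2}\,\frac{\#\,\text{intra-cluster edges}}{\#\,\text{possible intra-cluster edges}} + \frac{1}{2}\,\frac{\#\,\text{absent inter-cluster edges}}{\#\,\text{possible inter-cluster edges}}$$

(i.e., the fractions of correct intra- and inter-cluster edges)

modularity:

$$\mathrm{mod}(\mathcal{C}) := \mathrm{cov}(\mathcal{C}) - \mathbb{E}[\mathrm{cov}(\mathcal{C})]$$

(i.e., how clear is the clustering compared to a random network?)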


Motivation for Modularity

[Figure: a small social network whose vertices are labeled with first names, with a clustering]

coverage = # intra-cluster edges / # edges ≈ 0.9

only one cluster ⇒ coverage = 1.0


A Promising Remedy

[Newman and Girvan: Finding and evaluating community structure in networks, ’04]: “. . . if we subtract from [coverage] the expected value [. . . ], we do get a useful measure.”

Modularity

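$$\mathrm{mod}(\mathcal{C}) := \mathrm{cov}(\mathcal{C}) - \mathbb{E}[\mathrm{cov}(\mathcal{C})] = \frac{\#\,\text{intra-cluster edges}}{|E|} - \frac{1}{4|E|^2} \sum_{C \in \mathcal{C}} \Big(\sum_{v \in C} \deg(v)\Big)^2$$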

first: stopping criterion for cutting; then: optimization criterion
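The modularity formula translates directly into code. The sketch below assumes a simple undirected graph stored as adjacency sets and a clustering given as a list of disjoint vertex sets covering all vertices. The function name is hypothetical.

```python
# Assumptions (not from the slides): simple undirected graph as
# {vertex: set_of_neighbours}; a clustering is a list of disjoint vertex sets
# covering all vertices. The function name is hypothetical.


def modularity(graph, clustering):
    """#intra-cluster edges / |E| - 1/(4|E|^2) * sum_C (sum_{v in C} deg(v))^2."""
    m = sum(len(nbrs) for nbrs in graph.values()) / 2  # |E|
    cluster_of = {v: i for i, c in enumerate(clustering) for v in c}
    # Every intra-cluster edge is seen once from each endpoint, hence the / 2.
    intra = sum(1 for u in graph for v in graph[u] if cluster_of[u] == cluster_of[v]) / 2
    degree_sums = [sum(len(graph[v]) for v in c) for c in clustering]
    expected = sum(d * d for d in degree_sums) / (4 * m * m)
    return intra / m - expected


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3.
two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
                 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(modularity(two_triangles, [{0, 1, 2}, {3, 4, 5}]))  # 5/14
print(modularity(two_triangles, [set(two_triangles)]))   # 0.0
```

The single all-encompassing cluster has coverage 1, but its expected coverage is also 1, so its modularity drops to 0. This is exactly the effect the subtraction is meant to achieve.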


Modularity in Practice

- easy to use & implement
- reasonable behavior on many practical instances; heavily used in various fields:
  ecosystem exploration, collaboration analyses, biochemistry, structure of the Internet (AS graph, WWW, routers)
- close to human intuition of quality [Görke et al.: Comp. aspects of lucidity-driven clustering, 2010]
- scaling behavior (double the instance, the result differs) [folklore]
- non-locality of optimal clusterings [folklore]
- resolution limit (no tiny and large clusters at the same time) [Fortunato and Barthélemy ’07]
- large sparse graphs ⇝ high values, balanced clusters [Good et al.: The performance of modularity maximization in practical contexts, 2009]


Modularity, Algorithmic Theory

The complexity of modularity optimization:

finding C with maximum modularity is NP-hard; reduction from 3-PARTITION

- the restriction to |C| = 2 is also hard ⇒ not FPT w.r.t. |C| (unless P = NP)
- greedy maximization (later) does not approximate
- very limited families are combinatorially solvable
- ILP formulation, feasible for ≈ |V| ≤ 200

[Brandes et al.: On modularity clustering, 2008]

diverse results on approximability on specific classes of graphs

[DasGupta, Devine: On the complexity of Newman’s community finding approach for biological and social networks, 2011]
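To show why exact optimization quickly becomes infeasible, the sketch below finds a maximum-modularity clustering by enumerating every partition of the vertex set. There are Bell(n) such partitions. This is a hypothetical brute-force baseline, not the ILP formulation cited above. It assumes a simple undirected graph stored as adjacency sets and is only usable for roughly n ≤ 10.

```python
# Brute-force baseline (not the ILP from the slide): enumerate all Bell(n)
# partitions and keep the best. Assumptions: simple undirected graph as
# {vertex: set_of_neighbours}; function names are hypothetical.


def modularity(graph, clustering):
    m = sum(len(nbrs) for nbrs in graph.values()) / 2
    cluster_of = {v: i for i, c in enumerate(clustering) for v in c}
    intra = sum(1 for u in graph for v in graph[u] if cluster_of[u] == cluster_of[v]) / 2
    degree_sums = [sum(len(graph[v]) for v in c) for c in clustering]
    return intra / m - sum(d * d for d in degree_sums) / (4 * m * m)


def set_partitions(items):
    """Yield every partition of the list `items` as a list of sets."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn (building new sets) ...
        for i in range(len(partition)):
            yield partition[:i] + [partition[i] | {first}] + partition[i + 1:]
        # ... or open a new block containing only `first`.
        yield partition + [{first}]


def exact_max_modularity(graph):
    best_value, best_clustering = float("-inf"), None
    for clustering in set_partitions(list(graph)):
        value = modularity(graph, clustering)
        if value > best_value:
            best_value, best_clustering = value, clustering
    return best_value, best_clustering


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3.
two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
                 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
value, best = exact_max_modularity(two_triangles)
print(value, best)  # value 5/14, clusters {0, 1, 2} and {3, 4, 5}
```

Six vertices already give 203 partitions. The count grows faster than exponentially, which is why exact methods stop at small instances.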


How to Cluster?

Optimization of a quality function:
- Bottom-up: start with singletons ⇒ merge clusters
- Top-down: start with a single cluster containing all nodes ⇒ split clusters
- Local opt.: start with a random clustering ⇒ migrate nodes

- Variants of recursive min-cutting
- Percolation of the network by removal of highly central edges
- Spectral methods using eigenanalysis of the adjacency matrix or Laplacian
- Direct identification of dense substructures
- Random walks
- Geometric approaches
- . . .
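The bottom-up strategy can be illustrated with a naive greedy agglomeration. Starting from singletons, it repeatedly merges the two adjacent clusters A, B whose union increases modularity the most, and stops when no merge helps. The merge gain e_AB/|E| − d_A·d_B/(2|E|²) follows from the modularity formula, where e_AB is the number of edges between A and B and d_A, d_B are the degree sums. This is a hypothetical sketch in the spirit of greedy modularity agglomeration, not a tuned implementation. It assumes a simple undirected graph stored as adjacency sets. Like any greedy maximization of modularity, it comes with no approximation guarantee.

```python
# Assumptions (not from the slides): simple undirected graph as
# {vertex: set_of_neighbours}; names are hypothetical. This naive version
# rescans every edge before each merge; efficient implementations (e.g. CNM)
# keep the merge gains in a priority queue instead.


def greedy_modularity_merge(graph):
    """Bottom-up: start with singletons, merge the best adjacent pair while
    modularity increases. Returns a list of frozensets."""
    m = sum(len(nbrs) for nbrs in graph.values()) / 2
    members = {i: {v} for i, v in enumerate(graph)}
    if m == 0:  # no edges: nothing to merge
        return [frozenset(c) for c in members.values()]
    cluster_of = {v: i for i, v in enumerate(graph)}
    degree_sum = {i: len(graph[v]) for i, v in enumerate(graph)}
    while True:
        # e_AB: number of edges between each pair of distinct clusters A < B
        # (each edge is counted once, from the endpoint in the smaller id).
        between = {}
        for u in graph:
            for v in graph[u]:
                a, b = cluster_of[u], cluster_of[v]
                if a < b:
                    between[(a, b)] = between.get((a, b), 0) + 1
        # Modularity gain of merging A and B: e_AB/m - d_A*d_B/(2*m^2).
        best_gain, best_pair = 0.0, None
        for (a, b), e_ab in between.items():
            gain = e_ab / m - degree_sum[a] * degree_sum[b] / (2 * m * m)
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:  # no merge increases modularity
            break
        a, b = best_pair
        for v in members[b]:
            cluster_of[v] = a
        members[a] |= members.pop(b)
        degree_sum[a] += degree_sum.pop(b)
    return [frozenset(c) for c in members.values()]


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3.
two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
                 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
result = greedy_modularity_merge(two_triangles)
print(sorted(sorted(c) for c in result))  # [[0, 1, 2], [3, 4, 5]]
```

On this toy graph the greedy run recovers the two triangles. On larger graphs, early merges cannot be undone, which is one reason local refinement (migrating nodes) is often combined with it.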

Dorothea Wagner –Algorithmic Methods for Complex Network Analysis: Graph Clustering September 19, 2014 19/49

How to Cluster?Optimization of quality function:

Bottom-up: start with singletons

⇒ merge clustersTop-down: start with the one-cluster

⇒ split clusters

Local Opt.: start with random clustering

⇒ migrate nodes

Variants of recursive min-cutting

Percolation of network by removal of highly central edges

Spectral methods using eigenanalysis of adjacency Laplacian

Direct identification of dense substructures

Random walks

Geometric approaches

. . .

Dorothea Wagner –Algorithmic Methods for Complex Network Analysis: Graph Clustering September 19, 2014 19/49

How to Cluster?Optimization of quality function:

Bottom-up: start with singletons⇒ merge clusters

Top-down: start with the one-cluster

⇒ split clusters

Local Opt.: start with random clustering

⇒ migrate nodes

Variants of recursive min-cutting

Percolation of network by removal of highly central edges

Spectral methods using eigenanalysis of adjacency Laplacian

Direct identification of dense substructures

Random walks

Geometric approaches

. . .

Dorothea Wagner –Algorithmic Methods for Complex Network Analysis: Graph Clustering September 19, 2014 19/49

How to Cluster?Optimization of quality function:

Bottom-up: start with singletons⇒ merge clusters

Top-down: start with the one-cluster

⇒ split clusters

Local Opt.: start with random clustering

⇒ migrate nodes

Variants of recursive min-cutting

Percolation of network by removal of highly central edges

Spectral methods using eigenanalysis of adjacency Laplacian

Direct identification of dense substructures

Random walks

Geometric approaches

. . .

Dorothea Wagner –Algorithmic Methods for Complex Network Analysis: Graph Clustering September 19, 2014 19/49

How to Cluster?Optimization of quality function:

Bottom-up: start with singletons⇒ merge clusters

Top-down: start with the one-cluster

⇒ split clusters

Local Opt.: start with random clustering

⇒ migrate nodes

Variants of recursive min-cutting

Percolation of network by removal of highly central edges

Spectral methods using eigenanalysis of adjacency Laplacian

Direct identification of dense substructures

Random walks

Geometric approaches

. . .

Dorothea Wagner –Algorithmic Methods for Complex Network Analysis: Graph Clustering September 19, 2014 19/49

How to Cluster?Optimization of quality function:

Bottom-up: start with singletons⇒ merge clusters

Top-down: start with the one-cluster

⇒ split clusters

Local Opt.: start with random clustering

⇒ migrate nodes

Variants of recursive min-cutting

Percolation of network by removal of highly central edges

Spectral methods using eigenanalysis of adjacency Laplacian

Direct identification of dense substructures

Random walks

Geometric approaches

. . .

Dorothea Wagner –Algorithmic Methods for Complex Network Analysis: Graph Clustering September 19, 2014 19/49

How to Cluster?Optimization of quality function:

Bottom-up: start with singletons⇒ merge clustersTop-down: start with the one-cluster

⇒ split clustersLocal Opt.: start with random clustering

⇒ migrate nodes

Variants of recursive min-cutting

Percolation of network by removal of highly central edges

Spectral methods using eigenanalysis of adjacency Laplacian

Direct identification of dense substructures

Random walks

Geometric approaches

. . .

Dorothea Wagner –Algorithmic Methods for Complex Network Analysis: Graph Clustering September 19, 2014 19/49

How to Cluster?Optimization of quality function:

Bottom-up: start with singletons⇒ merge clustersTop-down: start with the one-cluster⇒ split clusters

Local Opt.: start with random clustering

⇒ migrate nodes

Variants of recursive min-cutting

Percolation of network by removal of highly central edges

Spectral methods using eigenanalysis of adjacency Laplacian

Direct identification of dense substructures

Random walks

Geometric approaches

. . .

Dorothea Wagner –Algorithmic Methods for Complex Network Analysis: Graph Clustering September 19, 2014 19/49

How to Cluster?Optimization of quality function:

Bottom-up: start with singletons⇒ merge clustersTop-down: start with the one-cluster⇒ split clustersLocal Opt.: start with random clustering

⇒ migrate nodes

Variants of recursive min-cutting

Percolation of network by removal of highly central edges

Spectral methods using eigenanalysis of adjacency Laplacian

Direct identification of dense substructures

Random walks

Geometric approaches

. . .

Dorothea Wagner –Algorithmic Methods for Complex Network Analysis: Graph Clustering September 19, 2014 19/49

How to Cluster?Optimization of quality function:

Bottom-up: start with singletons⇒ merge clustersTop-down: start with the one-cluster⇒ split clustersLocal Opt.: start with random clustering⇒ migrate nodes

Variants of recursive min-cutting

Percolation of network by removal of highly central edges

Spectral methods using eigenanalysis of adjacency Laplacian

Direct identification of dense substructures

Random walks

Geometric approaches

. . .

Dorothea Wagner –Algorithmic Methods for Complex Network Analysis: Graph Clustering September 19, 2014 19/49

How to Cluster?Optimization of quality function:

Bottom-up: start with singletons⇒ merge clustersTop-down: start with the one-cluster⇒ split clustersLocal Opt.: start with random clustering⇒ migrate nodes

Variants of recursive min-cutting

Percolation of network by removal of highly central edges

Spectral methods using eigenanalysis of adjacency Laplacian

Direct identification of dense substructures

Random walks

Geometric approaches

. . .

Density-Constrained Clustering: Overview

New optimization problem: find clusterings with guaranteed intra-cluster density and good inter-cluster sparsity.

This talk:
Systematic collection of sparsity and density measures
Classification of measures with respect to their behavior
Experimental evaluation of greedy merge vs. greedy moves
Qualitative comparison of clusterings obtained by optimizing different measures

See also:
[Schumm et al.: Density-Constrained Graph Clustering, WADS 2011]
[Kappes et al.: Experiments on Density-Constrained Graph Clustering, to appear in ACM JEA]

Inter-cluster Sparsity: Cut-based

Isolated view: each cluster induces a cut.
Pairwise view: each pair of clusters induces a cut in the subgraph induced by the two clusters.
Global view: a clustering with k clusters induces a k-way cut.
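To make the three views concrete, here is a minimal Python sketch that counts cut edges in each view. It assumes an undirected simple graph given as an edge list and a clustering given as a list of disjoint vertex sets; the toy graph and the function names are hypothetical, not taken from the talk.

```python
from itertools import combinations

def cut_size(edges, side):
    """Number of edges with exactly one endpoint in `side`."""
    return sum(1 for u, v in edges if (u in side) != (v in side))

def isolated_cuts(edges, clusters):
    # isolated view: one cut (C, V \ C) per cluster C
    return [cut_size(edges, c) for c in clusters]

def pairwise_cuts(edges, clusters):
    # pairwise view: for every pair (Ci, Cj), the cut inside the subgraph
    # induced by Ci and Cj, i.e. the edges running between Ci and Cj
    return {
        (i, j): sum(1 for u, v in edges
                    if (u in ci and v in cj) or (u in cj and v in ci))
        for (i, ci), (j, cj) in combinations(enumerate(clusters), 2)
    }

def global_cut(edges, clusters):
    # global view: the k-way cut, i.e. all edges whose endpoints lie in different clusters
    label = {v: i for i, c in enumerate(clusters) for v in c}
    return sum(1 for u, v in edges if label[u] != label[v])

# Hypothetical toy graph: two triangles joined by the edge (2, 3), plus a pendant vertex 6.
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3), (5, 6)]
CLUSTERS = [{0, 1, 2}, {3, 4, 5}, {6}]

print(isolated_cuts(EDGES, CLUSTERS))   # [1, 2, 1]
print(pairwise_cuts(EDGES, CLUSTERS))   # {(0, 1): 1, (0, 2): 0, (1, 2): 1}
print(global_cut(EDGES, CLUSTERS))      # 2
```

Every cut edge is seen twice in the isolated view (once from each side), so the isolated cut sizes sum to twice the global cut.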

Inter-cluster Sparsity: Degrees of Freedom

Measures: number of cut edges, density, conductance, expansion
Set of cuts: isolated (one for each cluster), pairwise (one for each pair of clusters), global (k-way cut)
Combinations: average sparsity, minimum sparsity

⇒ 14 (reasonable) inter-cluster sparsity measures
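The degrees of freedom combine mechanically. The sketch below covers the isolated view with all four measures and both combinations. It assumes the usual textbook normalizations (cut density = cut / (|S|·|V∖S|), conductance = cut / min(vol(S), vol(V∖S)), expansion = cut / min(|S|, |V∖S|)); the exact definitions in the cited papers may differ in details. "Minimum sparsity" is taken here as the largest (worst) inter-cluster value, and the toy graph is hypothetical.

```python
from fractions import Fraction

def ratio(num, den):
    # assumption: an empty side or a side of zero volume yields the value 0
    return Fraction(num, den) if den else Fraction(0)

def isolated_measures(n, edges, clusters):
    """Measures of the isolated cut (C, V - C) for every cluster C."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    total_vol = sum(deg)
    result = []
    for c in clusters:
        cut = sum(1 for u, v in edges if (u in c) != (v in c))
        s, t = len(c), n - len(c)
        vol_s = sum(deg[v] for v in c)
        vol_t = total_vol - vol_s
        result.append({
            "edges": cut,                                   # number of cut edges
            "density": ratio(cut, s * t),                   # cut / (|S| * |V - S|)
            "conductance": ratio(cut, min(vol_s, vol_t)),   # cut / smaller volume
            "expansion": ratio(cut, min(s, t)),             # cut / smaller side
        })
    return result

def worst(values):
    # "minimum sparsity": the largest (worst) inter-cluster value
    return max(values)

def average(values):
    values = list(values)
    return Fraction(sum(values), len(values))

EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3), (5, 6)]
CLUSTERS = [{0, 1, 2}, {3, 4, 5}, {6}]
ms = isolated_measures(7, EDGES, CLUSTERS)

max_conductance = worst(m["conductance"] for m in ms)    # 1 (the pendant vertex)
avg_conductance = average(m["conductance"] for m in ms)  # 13/28
```

Swapping the dictionary key or the combination function yields the other isolated measures; the pairwise and global views only change which cuts are fed in.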

Intra-cluster density

Definitions analogous to inter-cluster sparsity are possible, but finding a cut with optimal density/conductance/expansion is NP-hard.

Practical approach: evaluate |intra-cluster edges| / |possible intra-cluster edges|

⇒ minimum/average/global intra-cluster density
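A sketch of the three intra-cluster density variants built from this ratio, assuming an undirected simple graph. Treating singleton clusters (no possible intra-cluster edges) as having density 1 is a convention assumed here; the toy graph is hypothetical. Note that the global variant is one ratio of sums, not an average of ratios.

```python
from fractions import Fraction
from math import comb

def intra_stats(edges, clusters):
    """Per cluster: (intra-cluster edges, possible intra-cluster edges)."""
    label = {v: i for i, c in enumerate(clusters) for v in c}
    inner = [0] * len(clusters)
    for u, v in edges:
        if label[u] == label[v]:
            inner[label[u]] += 1
    return [(inner[i], comb(len(c), 2)) for i, c in enumerate(clusters)]

def density(edges_in, possible):
    # assumption: a cluster without possible edges (a singleton) counts as fully dense
    return Fraction(edges_in, possible) if possible else Fraction(1)

def minimum_density(stats):
    return min(density(m, p) for m, p in stats)

def average_density(stats):
    return Fraction(sum(density(m, p) for m, p in stats), len(stats))

def global_density(stats):
    # one ratio over all clusters together
    return density(sum(m for m, _ in stats), sum(p for _, p in stats))

EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3), (5, 6)]
stats = intra_stats(EDGES, [{0, 1, 2, 3}, {4, 5}, {6}])   # [(4, 6), (1, 1), (0, 0)]
print(minimum_density(stats), average_density(stats), global_density(stats))  # 2/3 8/9 5/7
```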

Problem Statement

Density-Constrained Clustering: Given a graph G = (V, E), among all clusterings with an intra-cluster density of no less than α, find a clustering C with optimum inter-cluster sparsity.

3 possible intra-cluster density measures × 14 possible inter-cluster sparsity measures ⇒ family of 42 optimization problems
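For intuition, one member of this family (minimum intra-cluster density as the constraint, number of inter-cluster edges as the objective) can be solved exactly on toy graphs by enumerating all set partitions. This is a hypothetical reference implementation for checking heuristics, not a practical algorithm: the number of partitions grows faster than exponentially. The toy graph and α = 3/4 are illustrative choices.

```python
from fractions import Fraction
from math import comb

def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):                  # add `first` to an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part                      # or open a new block for it

def exact_density_constrained(n, edges, alpha):
    """Minimize the number of inter-cluster edges over all clusterings whose
    minimum intra-cluster density is at least alpha (singletons always pass)."""
    best, best_cost = None, None
    for part in set_partitions(list(range(n))):
        label = {v: i for i, block in enumerate(part) for v in block}
        inner, cost = [0] * len(part), 0
        for u, v in edges:
            if label[u] == label[v]:
                inner[label[u]] += 1
            else:
                cost += 1
        feasible = all(comb(len(b), 2) == 0 or
                       Fraction(inner[i], comb(len(b), 2)) >= alpha
                       for i, b in enumerate(part))
        if feasible and (best_cost is None or cost < best_cost):
            best, best_cost = part, cost
    return best, best_cost

EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3), (5, 6)]
best, cost = exact_density_constrained(7, EDGES, Fraction(3, 4))
```

On this 7-vertex graph, 877 clusterings (the Bell number B7) are examined; the optimum keeps both triangles and isolates the pendant vertex, at a cost of 2 inter-cluster edges.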

Complexity (Example)

[Figure: reduction gadget with labels K_n … K_n, S_1 … S_m, V_X, v_x1 … v_xn]

Reduction from Exact Cover by 3-Sets

Theorem: Density-Constrained Clustering combining any intra-cluster density measure with the number of inter-cluster edges is NP-hard.

⇒ This motivates the use of heuristic greedy algorithms.

Greedy Algorithms

Greedy Merge (GM)
Popular for modularity-based clustering
Idea: merge clusters iteratively

Greedy Vertex Moving (GVM)
Closely related to algorithms for graph partitioning
Very successful for optimizing modularity [Rotta et al. ’11]
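The talk does not spell out GVM. As an assumption, the sketch below uses a common form: start from singletons, visit the vertices in turn, move each one to the adjacent cluster that removes the most inter-cluster edges provided both affected clusters still meet the density threshold, and stop after a round without moves. The objective (number of inter-cluster edges), the constraint (minimum intra-cluster density ≥ α), the function name, and the toy graph are illustrative choices.

```python
from fractions import Fraction
from math import comb

def greedy_vertex_moving(n, edges, alpha, max_rounds=100):
    """Hypothetical GVM sketch: objective = number of inter-cluster edges,
    constraint = every cluster has intra-cluster density >= alpha."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    label = list(range(n))                       # start from singletons
    members = {i: {i} for i in range(n)}

    def dense_enough(cluster):
        pairs = comb(len(cluster), 2)
        if pairs == 0:                           # empty or singleton cluster
            return True
        inner = sum(1 for u in cluster for w in adj[u] if w in cluster) // 2
        return Fraction(inner, pairs) >= alpha

    for _ in range(max_rounds):
        moved = False
        for v in range(n):
            src = label[v]
            links = {}                           # cluster id -> neighbours of v in it
            for w in adj[v]:
                links[label[w]] = links.get(label[w], 0) + 1
            stay = links.get(src, 0)
            best, best_gain = None, 0
            for c, k in links.items():
                # moving v from src to c turns k inter-cluster edges into intra-cluster
                # ones and `stay` intra-cluster edges into inter-cluster ones
                if c == src or k - stay <= best_gain:
                    continue
                if dense_enough(members[c] | {v}) and dense_enough(members[src] - {v}):
                    best, best_gain = c, k - stay
            if best is not None:
                members[src].discard(v)
                if not members[src]:
                    del members[src]
                members[best].add(v)
                label[v] = best
                moved = True
        if not moved:
            break
    return [sorted(c) for c in members.values()]

EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3), (5, 6)]
clusters = greedy_vertex_moving(7, EDGES, Fraction(3, 4))
```

Only strictly improving moves are accepted, so every move lowers the number of inter-cluster edges and the loop terminates even without the round limit.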

Generic Greedy Merge Algorithm

Idea: merge clusters greedily
Objective: increase inter-cluster sparsity
Constraint: intra-cluster density must not drop below a given threshold

Example: minimize the number of inter-cluster edges such that the density of each cluster is at least 3/4
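A minimal sketch of the generic scheme for the example's objective (number of inter-cluster edges) and constraint (every cluster has density at least α). It scans all pairs of clusters in each step, which is simple but quadratic per merge; the toy graph, the function name, and α = 3/4 are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def greedy_merge(n, edges, alpha):
    """Start from singletons and repeatedly perform the feasible merge that removes
    the most inter-cluster edges; stop when no feasible merge improves."""
    clusters = [frozenset([v]) for v in range(n)]

    def between(a, b):
        # inter-cluster edges between a and b = edges removed by merging them
        return sum(1 for u, v in edges if (u in a and v in b) or (u in b and v in a))

    def dense_enough(c):
        # constraint: intra-cluster density of the merged cluster must stay >= alpha
        inner = sum(1 for u, v in edges if u in c and v in c)
        pairs = comb(len(c), 2)
        return pairs == 0 or Fraction(inner, pairs) >= alpha

    while True:
        best, best_gain = None, 0
        for a, b in combinations(clusters, 2):
            gain = between(a, b)
            if gain > best_gain and dense_enough(a | b):
                best, best_gain = (a, b), gain
        if best is None:
            return clusters
        a, b = best
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]

EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3), (5, 6)]
result = greedy_merge(7, EDGES, Fraction(3, 4))
print(sorted(sorted(c) for c in result))   # [[0, 1, 2], [3, 4, 5], [6]]
```

Merging can only remove inter-cluster edges, so this objective alone would merge everything; here the density constraint stops the process (merging {3, 4, 5} with {6} would give density 4/6 < 3/4).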

Influence of Measures on Algorithm: Coarseness

[Figure: "Rough Intuition", relating inter-cluster sparsity and intra-cluster density]

Question: Without constraints, is there always a merge that improves the objective function?

(Un-)Boundedness

Definition: An objective function f is unbounded if for every clustering C with |C| > 1 there exists a merge that does not deteriorate f.

Example: maximum pairwise inter-cluster conductance is bounded; e.g., modularity is bounded as well.

Bounded: mpxc, mpxe, apxd, apxc, aixd
Unbounded: nxe, gxd, mixc, mixd, mixe, aixc, aixe, mpxd

(Abbreviations: m/a = maximum/average, i/p/g = isolated/pairwise/global, x = inter-cluster, d/c/e = density/conductance/expansion; nxe = number of inter-cluster edges.)
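Whether a given clustering admits a non-deteriorating merge can be checked by brute force over all pairs of clusters. In the hypothetical example below, two disjoint triangles form the clustering: merging them drops modularity from 1/2 to 0, a witness that modularity is bounded, whereas the number of inter-cluster edges never increases under a merge, so nxe always has a non-deteriorating merge. The helper names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

def modularity(edges, clusters):
    # Newman-Girvan modularity: sum over clusters of e_C/m - (vol(C)/2m)^2
    m = len(edges)
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    q = Fraction(0)
    for c in clusters:
        inner = sum(1 for u, v in edges if u in c and v in c)
        vol = sum(deg.get(v, 0) for v in c)
        q += Fraction(inner, m) - Fraction(vol, 2 * m) ** 2
    return q

def inter_edges(edges, clusters):
    # number of inter-cluster edges (to be minimized)
    label = {v: i for i, c in enumerate(clusters) for v in c}
    return sum(1 for u, v in edges if label[u] != label[v])

def has_non_deteriorating_merge(edges, clusters, f, maximize):
    """True if some merge of two clusters does not make f worse."""
    current = f(edges, clusters)
    for i, j in combinations(range(len(clusters)), 2):
        merged = [c for k, c in enumerate(clusters) if k not in (i, j)]
        merged.append(clusters[i] | clusters[j])
        value = f(edges, merged)
        if (value >= current) if maximize else (value <= current):
            return True
    return False

# Hypothetical witness: two disjoint triangles, each forming one cluster.
TRIANGLES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
SPLIT = [{0, 1, 2}, {3, 4, 5}]
```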

Influence of Measures on Algorithm

[Figure: the set of feasible merges, which has to be updated after each merge]

Question: Does the feasibility of a merge depend only on the clusters involved?

⇒ Context insensitivity of an intra-cluster measure

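Context Insensitivity

Definition: A constraint is context insensitive if the feasibility of a merge does not depend on the remainder of the clustering.

E.g., global intra-cluster density is context sensitive. Constraint: |intra-cluster edges| / |possible intra-cluster edges| ≥ 0.7. Starting from a ratio of 1/1, a merge that adds one intra-cluster edge and two possible intra-cluster pairs yields 2/3 < 0.7 and is infeasible; after an unrelated merge raises the ratio to 2/2, the same merge yields 3/4 ≥ 0.7 and is feasible.

E.g., minimum intra-cluster density is context insensitive: a merge only changes the merged cluster, so its feasibility depends only on the density of that cluster.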
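The ratios on the slide can be reproduced with a hypothetical five-vertex graph (edges a–b, a–c, d–e, chosen here for illustration). The sketch checks the same merge (AB with C) in two contexts: under the global ratio it is infeasible in one and feasible in the other, while under minimum density it is infeasible in both.

```python
from fractions import Fraction
from math import comb

# Hypothetical graph chosen to reproduce the ratios 1/1, 2/3, 2/2, 3/4.
EDGES = [("a", "b"), ("a", "c"), ("d", "e")]
ALPHA = Fraction(7, 10)

def counts(cluster):
    """(intra-cluster edges, possible intra-cluster pairs) of one cluster."""
    inner = sum(1 for u, v in EDGES if u in cluster and v in cluster)
    return inner, comb(len(cluster), 2)

def global_density(clusters):
    inner = sum(counts(c)[0] for c in clusters)
    pairs = sum(counts(c)[1] for c in clusters)
    return Fraction(inner, pairs)

def min_density(clusters):
    # singletons have no possible pairs and are ignored
    return min(Fraction(*counts(c)) for c in clusters if counts(c)[1] > 0)

def merge(clusters, x, y):
    return [c for c in clusters if c not in (x, y)] + [x | y]

AB, C, D, E = frozenset("ab"), frozenset("c"), frozenset("d"), frozenset("e")
ctx1 = [AB, C, D, E]            # global density 1/1
ctx2 = merge(ctx1, D, E)        # after an unrelated merge: 2/2

print(global_density(merge(ctx1, AB, C)) >= ALPHA)   # False: 2/3 < 0.7
print(global_density(merge(ctx2, AB, C)) >= ALPHA)   # True:  3/4 >= 0.7
print(min_density(merge(ctx1, AB, C)), min_density(merge(ctx2, AB, C)))  # 2/3 2/3
```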

Context Insensitivity: Classification

Context insensitive: minimum intra-cluster density
Context sensitive: average intra-cluster density, global intra-cluster density

Influence of Measures on Algorithm

[Figure: the feasible merges kept in a heap, with the optimum merge on top]

Question: Given context insensitivity, can the set of feasible merges be maintained efficiently in a heap?

⇒ Locality of an objective function
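When the constraint is context insensitive and the objective is local, a plain binary heap with lazy deletion suffices: after a merge, only the entries involving the merged clusters go stale, and they are recognized by version stamps. The sketch below assumes the number of inter-cluster edges as the objective (its key, the number of edges between two clusters, depends only on those two clusters) and minimum intra-cluster density as the constraint; it is an illustration with hypothetical names, not the implementation from the cited papers.

```python
import heapq
from fractions import Fraction
from math import comb

def heap_greedy_merge(n, edges, alpha):
    """Greedy merge for the number of inter-cluster edges under minimum
    intra-cluster density >= alpha, with candidate merges kept in a heap."""
    members = {v: {v} for v in range(n)}
    inner = {v: 0 for v in range(n)}            # intra-cluster edges per cluster
    links = {v: {} for v in range(n)}           # links[a][b] = edges between a and b
    for u, v in edges:
        links[u][v] = links[u].get(v, 0) + 1
        links[v][u] = links[v].get(u, 0) + 1
    version = {v: 0 for v in range(n)}          # bumped whenever a cluster changes
    heap = []

    def push(a, b):
        # context insensitive: feasibility depends only on clusters a and b
        size = len(members[a]) + len(members[b])
        total = inner[a] + inner[b] + links[a][b]
        if Fraction(total, comb(size, 2)) >= alpha:
            # local: the key (edges between a and b) ignores all other clusters
            heapq.heappush(heap, (-links[a][b], a, b, version[a], version[b]))

    for a in range(n):
        for b in links[a]:
            if a < b:
                push(a, b)

    while heap:
        _, a, b, va, vb = heapq.heappop(heap)
        if a not in members or b not in members or (version[a], version[b]) != (va, vb):
            continue                            # stale entry: lazy deletion
        inner[a] += inner.pop(b) + links[a].pop(b)   # merge cluster b into a
        members[a] |= members.pop(b)
        del links[b][a]
        for c, k in links.pop(b).items():       # redirect b's remaining edges to a
            links[a][c] = links[a].get(c, 0) + k
            links[c][a] = links[c].get(a, 0) + k
            del links[c][b]
        version[a] += 1
        for c in links[a]:
            push(a, c)
    return [sorted(m) for m in members.values()]

EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3), (5, 6)]
result = heap_greedy_merge(7, EDGES, Fraction(3, 4))
```

Entries between clusters that are not touched by a merge are never re-keyed; this is exactly what context insensitivity (feasibility) and locality (key) buy.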

Locality: Intuition

Example: maximum isolated inter-cluster conductance

First approach: use the gain in inter-cluster sparsity as the heap key.

[Figure: heap of candidate merges such as (A, B), (C, D), (E, F), (C, F), (G, H), (G, I), keyed by their gain (e.g. −0.3, 0, 0.3) and split into good and bad merges; after merging G and I, the keys of other merges change]

Clever tie-breaking possible?

Needed: a suitable order on merges that does not change when unrelated clusters merge.

Existence of such an order ≈ locality of the inter-cluster measure

Example: Max. isolated inter-cluster conductance

Current sequence of the conductances of all clusters (sorted): A 0.5, B 0.4, C 0.3, D 0.3, E 0.1
Sequence if A and B are merged: A ∪ B 0.45, C 0.3, D 0.3, E 0.1
Sequence if A and D are merged: A ∪ D 0.45, B 0.4, C 0.3, E 0.1

Compare lexicographically: both sequences start with 0.45, and in the second position 0.3 < 0.4, so merging A and B is better!

Ordering merges lexicographically is stable. Two merges can be compared in constant time by comparing keys consisting of three numbers.

⇒ Maximum isolated inter-cluster conductance is local
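The lexicographic comparison can be sketched directly. Both merges lower the maximum from 0.5 to 0.45, so a key based on the gain alone sees a tie, while the sorted sequences decide in favour of merging A and B. This naive version builds the full sorted sequence (O(k log k) per key) instead of the constant-size three-number key mentioned on the slide; the conductances of merged clusters are taken as given rather than computed from a graph, and the second dictionary (merging C and E, value 0.2) is a hypothetical context.

```python
def merge_key(phi, a, b, merged_phi):
    """Conductances of all clusters after merging a and b, sorted in decreasing
    order; a lexicographically smaller key is a better merge."""
    rest = [value for name, value in phi.items() if name not in (a, b)]
    return tuple(sorted(rest + [merged_phi], reverse=True))

def max_gain(phi, a, b, merged_phi):
    # change of the objective (the maximum conductance); negative = improvement
    return max(merge_key(phi, a, b, merged_phi)) - max(phi.values())

# Values from the slide; the conductance of each merged cluster is taken as given.
PHI = {"A": 0.5, "B": 0.4, "C": 0.3, "D": 0.3, "E": 0.1}
key_ab = merge_key(PHI, "A", "B", 0.45)   # (0.45, 0.3, 0.3, 0.1)
key_ad = merge_key(PHI, "A", "D", 0.45)   # (0.45, 0.4, 0.3, 0.1)

print(max_gain(PHI, "A", "B", 0.45) == max_gain(PHI, "A", "D", 0.45))  # True: a tie
print(key_ab < key_ad)                                                 # True: merge A, B

# Hypothetical unrelated merge of C and E (new conductance 0.2 assumed):
PHI2 = {"A": 0.5, "B": 0.4, "CE": 0.2, "D": 0.3}
print(merge_key(PHI2, "A", "B", 0.45) < merge_key(PHI2, "A", "D", 0.45))  # still True
```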

Locality: Results

Does such an order exist for all objective functions?

No: global inter-cluster density, |inter-cluster edges| / |possible inter-cluster edges|, is not local. With a ratio of 17/43, one merge gives 15/39 and another gives 16/42, so the second is better; in a context where the rest of the clustering differs (ratio 9/37), the same two merges give 7/33 and 8/36, so now the first is better.

Local: mixd, mixc, mixe, aixd, aixc, aixe, nxe
Not local: mpxd, apxd, mpxc, mpxe, gxd, apxe, apxc
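Because only the totals matter, the effect of a merge on global inter-cluster density can be modelled as subtracting a fixed (edges, pairs) amount. In this small sketch, two hypothetical merges X = (2, 4) and Y = (1, 1), inferred from the slide's ratios, are compared in the two contexts 17/43 and 9/37, and their order flips.

```python
from fractions import Fraction

def after(state, merge):
    """state = (inter-cluster edges, possible inter-cluster vertex pairs);
    merge = (edges, pairs) that the merge turns into intra-cluster ones."""
    return (state[0] - merge[0], state[1] - merge[1])

def density(state):
    return Fraction(state[0], state[1])

X = (2, 4)   # hypothetical merge, e.g. two clusters of 2 vertices joined by 2 edges
Y = (1, 1)   # hypothetical merge of two adjacent singletons

for state in [(17, 43), (9, 37)]:   # two different contexts
    dx, dy = density(after(state, X)), density(after(state, Y))
    print(state, "X better" if dx < dy else "Y better")
# (17, 43) Y better
# (9, 37) X better
```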


Locality: Results

Does such an order exist for all objective functions?

better

worse

|inter-cluster edges||possible inter-cluster edges| =

836

global inter-cluster density is not local localmixdmixcmixe

aixdaixcaixe

nxe

not localmpxdapxdmpxc

mpxegxdapxe

apxc

Dorothea Wagner –Algorithmic Methods for Complex Network Analysis: Graph Clustering September 19, 2014 36/49
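A minimal Python sketch of why global inter-cluster density (gxd) is not local. It assumes our reading of locality: a local objective ranks two candidate merges the same way no matter how the rest of the clustering looks. The graph, the vertex names and the helpers `global_intercluster_density`, `merged` and `preferred_merge` are hypothetical. Merge m1 joins two singletons with one edge between them, merge m2 joins two 2-vertex clusters with two edges between them. Between the two runs only the clustering of a far-away 10-clique H changes, yet the preferred merge flips.

```python
from fractions import Fraction
from itertools import combinations


def global_intercluster_density(edges, clusters):
    """gxd = |inter-cluster edges| / |possible inter-cluster edges| (lower is better)."""
    cluster_of = {v: i for i, c in enumerate(clusters) for v in c}
    n = len(cluster_of)
    inter = sum(1 for u, v in edges if cluster_of[u] != cluster_of[v])
    possible = n * (n - 1) // 2 - sum(len(c) * (len(c) - 1) // 2 for c in clusters)
    return Fraction(inter, possible)


def merged(clusters, a, b):
    """Return a copy of the clustering in which clusters a and b are joined."""
    return [c for c in clusters if c != a and c != b] + [a | b]


# Hypothetical example: a small core offering two candidate merges ...
core_edges = [("a", "b"), ("c", "d"), ("f", "g"), ("c", "f"), ("d", "g")]
# ... and a 10-clique H that shares no edge with the core.
H = [f"h{i}" for i in range(10)]
edges = core_edges + list(combinations(H, 2))

core = [{"a"}, {"b"}, {"c", "d"}, {"f", "g"}]
m1 = ({"a"}, {"b"})
m2 = ({"c", "d"}, {"f", "g"})


def preferred_merge(clusters):
    """Name of the merge that yields the lower gxd on this clustering."""
    d1 = global_intercluster_density(edges, merged(clusters, *m1))
    d2 = global_intercluster_density(edges, merged(clusters, *m2))
    return "m1" if d1 < d2 else "m2"


h_as_singletons = core + [{h} for h in H]   # H split into singletons
h_as_one_cluster = core + [set(H)]          # H kept as one cluster
print(preferred_merge(h_as_singletons), preferred_merge(h_as_one_cluster))  # m1 m2
```

Neither merge touches H, so a local objective would have to rank m1 and m2 identically in both runs; exact fractions avoid any floating-point doubt about the flip.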

Influence of Measures on Algorithm

[Figure: the connected merges as a subset of all feasible merges, annotated "sufficient?" and "important?"]

Question: Do we have to consider pairs of unconnected clusters?

⇒ Connectedness of an objective function

Disconnectedness

Definition: An objective function f is connected if merging unconnected clusters is never the best option with respect to f.

Maximum pairwise inter-cluster conductance is not connected.

[Figure: example graph in which merging two unconnected clusters is the "Best option!" for maximum pairwise inter-cluster conductance]

Connected: nxe
Unconnected: gxd, mixc, mixd, mixe, aixc, aixe, mpxd, mpxc, mpxe, apxd, apxc, aixd
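The question whether unconnected pairs must be considered can be probed by brute force on small instances: evaluate every feasible merge, including merges of clusters that share no edge, and check whether the winner is connected. This Python sketch is hypothetical (the names `best_merge`, `nxe` and `feasible` are ours) and uses the number of inter-cluster edges, the only measure listed as connected, as the objective. A connected winner on a few instances is evidence, not a proof; for an unconnected objective one would look for an instance whose winner has `connected == False`.

```python
from itertools import combinations


def nxe(edges, clusters):
    """Number of inter-cluster edges (lower is better)."""
    cluster_of = {v: i for i, c in enumerate(clusters) for v in c}
    return sum(1 for u, v in edges if cluster_of[u] != cluster_of[v])


def best_merge(edges, clusters, objective, feasible=lambda cluster: True):
    """Evaluate every feasible merge, including merges of unconnected clusters.

    Returns (value, A, B, connected) for the best merge, or None if no merge
    is feasible. `connected` tells whether at least one edge runs between A and B.
    """
    best = None
    for A, B in combinations(clusters, 2):
        union = A | B
        if not feasible(union):
            continue
        rest = [c for c in clusters if c != A and c != B]
        value = objective(edges, rest + [union])
        connected = any((u in A and v in B) or (u in B and v in A) for u, v in edges)
        if best is None or value < best[0]:
            best = (value, A, B, connected)
    return best


# Hypothetical probe: the path 0-1-2-3 with singleton clusters.
path = [(0, 1), (1, 2), (2, 3)]
singletons = [frozenset([v]) for v in range(4)]
print(best_merge(path, singletons, nxe))  # (2, frozenset({0}), frozenset({1}), True)
```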

Influence of Measures on Efficiency

(Provided the necessary data can be maintained efficiently:)

Context insensitivity + Locality ⇒ O(n² log n) running time

Context insensitivity + Locality + Connectedness ⇒ O(md log n) running time & linear space
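The bounds above rest on keeping merge candidates in a priority queue. The following Python sketch illustrates that idea under our own assumptions; it is not the implementation behind the stated bounds. The objective is the number of inter-cluster edges (nxe), and every cluster must keep intra-cluster density (internal edges divided by possible internal pairs) of at least α. Because nxe is connected, only adjacent cluster pairs enter the heap. Both the gain of a merge and the density of the merged cluster depend only on the two clusters involved, so a merge creates new entries only for the new cluster, and outdated entries are skipped lazily when popped. The function name `greedy_merge_nxe` is hypothetical.

```python
import heapq
from collections import defaultdict
from math import comb


def greedy_merge_nxe(n, edges, alpha):
    """Greedy merging that reduces nxe while every cluster keeps
    intra-cluster density >= alpha. Vertices are 0..n-1, graph assumed simple."""
    members = {v: {v} for v in range(n)}      # cluster id -> vertices
    inner = {v: 0 for v in range(n)}          # cluster id -> internal edges
    between = defaultdict(dict)               # between[a][b] = edges between a and b
    for u, v in edges:
        between[u][v] = between[u].get(v, 0) + 1
        between[v][u] = between[v].get(u, 0) + 1
    # Only adjacent pairs are candidates (nxe is connected); largest gain first.
    heap = [(-w, a, b) for a in list(between) for b, w in between[a].items() if a < b]
    heapq.heapify(heap)
    next_id = n
    while heap:
        neg_w, a, b = heapq.heappop(heap)
        w = -neg_w
        if a not in members or b not in members:
            continue                          # stale entry: a side was merged away
        size = len(members[a]) + len(members[b])
        if (inner[a] + inner[b] + w) / comb(size, 2) < alpha:
            continue                          # infeasible, and stays so while a and b exist
        c, next_id = next_id, next_id + 1
        members[c] = members.pop(a) | members.pop(b)
        inner[c] = inner.pop(a) + inner.pop(b) + w
        # Redirect the adjacency of a and b to the new cluster c.
        for old in (a, b):
            for x, wx in between.pop(old).items():
                del between[x][old]
                if x not in (a, b):
                    between[c][x] = between[c].get(x, 0) + wx
        for x, wx in between[c].items():
            between[x][c] = wx
            heapq.heappush(heap, (-wx, min(c, x), max(c, x)))
    return list(members.values())


# Hypothetical example: two triangles joined by the bridge 2-3, alpha = 3/4.
bridge_graph = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
print(sorted(sorted(c) for c in greedy_merge_nxe(6, bridge_graph, 0.75)))
```

The final merge of the two triangles would give density 7/15 < 3/4, so it is discarded and the triangles remain separate clusters.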

Example: Email Graph of our Department

[Figure: clusterings of the department's email graph by a modularity-based algorithm and by greedy merge (mid + aixc); one region is labelled "chair"]

Local Moving

Example: Minimize the number of intercluster edges such that the density of each cluster is at least 3/4.

[Figure: local moving on a small example graph, shown over several animation frames]

Idea: Move vertices greedily.
Objective: Increase intercluster sparsity.
Constraint: Intracluster density must not drop below a given threshold.

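A minimal Python sketch of local moving for the example objective (minimize the number of inter-cluster edges subject to intra-cluster density of at least 3/4). The following are our own assumptions, not taken from the slides: density is internal edges divided by possible internal pairs, clusters with fewer than two vertices count as density 1, a vertex may only move into a cluster that contains one of its neighbours, and each vertex takes its best feasible move. Every move strictly lowers nxe, so the loop terminates. The names `density` and `local_moving` are hypothetical.

```python
from itertools import combinations


def density(cluster, adj):
    """Intra-cluster density; clusters with fewer than two vertices count as 1.0 (assumption)."""
    k = len(cluster)
    if k < 2:
        return 1.0
    inside = sum(1 for u, v in combinations(cluster, 2) if v in adj[u])
    return inside / (k * (k - 1) / 2)


def local_moving(adj, cluster_of, alpha):
    """Move single vertices to neighbouring clusters while this lowers nxe and
    both affected clusters keep density >= alpha. Modifies cluster_of in place."""
    members = {}
    for v, c in cluster_of.items():
        members.setdefault(c, set()).add(v)
    improved = True
    while improved:
        improved = False
        for v in sorted(adj):
            src = cluster_of[v]
            links = {}                          # cluster id -> edges from v into it
            for u in adj[v]:
                links[cluster_of[u]] = links.get(cluster_of[u], 0) + 1
            best, best_gain = None, 0
            for dst, w in links.items():
                gain = w - links.get(src, 0)    # decrease in the number of inter-cluster edges
                if dst == src or gain <= best_gain:
                    continue
                if density(members[src] - {v}, adj) < alpha:
                    continue
                if density(members[dst] | {v}, adj) < alpha:
                    continue
                best, best_gain = dst, gain
            if best is not None:
                members[src].remove(v)
                members[best].add(v)
                cluster_of[v] = best
                improved = True
    return cluster_of


# Hypothetical example: two triangles joined by the bridge 2-3, starting from singletons.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(local_moving(adj, {v: v for v in adj}, 0.75))
```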

Greedy Vertex Moving

Idea: Use Local Moving on multiple levels.

[Figure: multilevel scheme with two "contract" steps followed by two "project" steps]
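A hedged sketch of the two operations in the figure, contract and project, for the unweighted setting of the earlier examples. As our own choice, each coarse node remembers how many vertices and internal edges it represents: these two numbers, together with the edge count between two coarse nodes, suffice to evaluate the density of a union of clusters on the coarse level. The names `contract` and `project` are hypothetical; running local moving on the coarse graph would additionally need a weighted variant of the moving step, which is not shown.

```python
from collections import Counter


def contract(edges, cluster_of):
    """Collapse each cluster into one coarse node.

    Returns (size, inner, weight): vertices per cluster, internal edges per
    cluster, and the number of edges between each pair of clusters.
    """
    size = Counter(cluster_of.values())
    inner = Counter()
    weight = Counter()
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu == cv:
            inner[cu] += 1
        else:
            weight[(min(cu, cv), max(cu, cv))] += 1
    return size, inner, weight


def project(cluster_of, coarse_cluster_of):
    """Give every original vertex the cluster of the coarse node it was collapsed into."""
    return {v: coarse_cluster_of[c] for v, c in cluster_of.items()}


# Hypothetical example: two triangles joined by the bridge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
fine = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(contract(edges, fine))
```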

Effectiveness: Merge vs. Move

Question: Which greedy algorithm is more effective?

Setup:
Preliminary experiments: pairwise measures behave counter-intuitively ⇒ left out of the experimental analysis
Experiments on real-world networks taken from the benchmark sets of Arenas and Newman

Outcome: different configurations, varying
the intracluster density measure
the intercluster sparsity measure
the parameter α

Summary: In 74 percent of all configurations, greedy vertex moving performs better than greedy merging.

Social Network of Dolphins

[Lusseau ’04]

[Figures: clusterings of the dolphin network under the following objectives]
Objectives: average intercluster density, maximum intercluster density, global intercluster density, intercluster edges; restriction: global intracluster density > 0.2
Objectives: average intercluster conductance, average intercluster expansion; restriction: global intracluster density > 0.2
Objectives: maximum intercluster expansion, maximum intercluster conductance; restriction: global intracluster density > 0.2
Objective: modularity

Planted Partition Graphs: Setup

Planted partition graph: [Figure: hidden clusters with intra-cluster edge probability p_in and inter-cluster edge probability p_out]

Question: What is the distance between the clustering found by an objective function and the hidden clustering?

Parameter α ≈ expected intracluster density
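A minimal sketch of the planted-partition setup, assuming the usual model in which each pair of vertices in the same hidden cluster is joined with probability p_in and each other pair with probability p_out; cluster sizes, probabilities and the seed below are illustrative. It also measures the global intracluster density of the hidden clustering (intra-cluster edges divided by intra-cluster vertex pairs, our definition), which in expectation equals p_in and is therefore a natural choice for α. The function names are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def planted_partition(sizes, p_in, p_out, seed=None):
    """Hidden clusters of the given sizes; each vertex pair is joined with
    probability p_in if both share a cluster and p_out otherwise."""
    rng = random.Random(seed)
    cluster_of = {}
    next_vertex = 0
    for c, k in enumerate(sizes):
        for _ in range(k):
            cluster_of[next_vertex] = c
            next_vertex += 1
    edges = [
        (u, v)
        for u, v in combinations(range(next_vertex), 2)
        if rng.random() < (p_in if cluster_of[u] == cluster_of[v] else p_out)
    ]
    return edges, cluster_of


def global_intracluster_density(edges, cluster_of):
    """Intra-cluster edges divided by the number of intra-cluster vertex pairs."""
    pairs = sum(k * (k - 1) // 2 for k in Counter(cluster_of.values()).values())
    intra = sum(1 for u, v in edges if cluster_of[u] == cluster_of[v])
    return intra / pairs


edges, hidden = planted_partition([50, 50, 50, 50], p_in=0.6, p_out=0.05, seed=7)
print(round(global_intracluster_density(edges, hidden), 2))  # close to p_in = 0.6
```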

Planted Partition Graphs: Rough Summary

[Figure: distance to the reference clustering (axis from 0.2 to 1.0) for ML-MOD and, in two panels labelled "global intracluster density" and "minimum intracluster density", for MOD, NXE, GXD, MIXD, AIXD, MIXC, AIXC, MIXE, AIXE; one frame marks the "reference"]

Planted Partition Graphs: Insights

Investigating different configurations yields further insights:
Using average intracluster density as the constraint leads to very unbalanced clusterings.
Constraining modularity by maximum intracluster density improves its results . . . especially if the expected number of clusters is high.
Fine reference clusterings unbalance the maximum objectives.
Average intercluster expansion/density identify many clusters.

Conclusion

Clustering as a bicriterial problem: optimize inter-cluster sparsity while respecting intra-cluster density
Collection of new measures
Algorithm engineering aspects:
Formulation of measures
Classification of measures with respect to greedy merge ⇒ insights into the behavior of measures
Experimental evaluation of greedy methods
Experimental comparison on planted partition graphs

Thank you for your attention!