KARLSRUHE INSTITUTE OF TECHNOLOGY – INSTITUTE OF THEORETICAL INFORMATICS
Algorithmic Methods for Complex Network Analysis: Graph Clustering
Summer School on Algorithm Engineering
Dorothea Wagner | September 19, 2014
KIT – University of the State of Baden-Wuerttemberg and National Laboratory of the Helmholtz Association
www.kit.edu
Scenario of Network Analysis
Given a network . . .
[Figure: an example network with 34 numbered vertices]
- explore the instance
- derive its structure
- identify its properties
How can we learn about the instance?
Dorothea Wagner –Algorithmic Methods for Complex Network Analysis: Graph Clustering September 19, 2014 2/49
An Archetypal Example
“Zachary’s Karate Club”, a real social network
[Figure: the karate club network, 34 numbered vertices]
- 2 years of observation
- 34 vertices = members
- 78 edges = social ties
- the club split up after a dispute: manager vs. trainer
- archetype of toy examples
Caused by an “unequal flow of sentiments and information across the ties”, a “factional division led to a formal separation of the club”.
[Wayne Zachary: An Information Flow Model for Conflict and Fission in Small Groups, ’77]
A Glimpse of Network Analysis
graph clustering / detecting communities
[Figure: the 34-vertex network partitioned into four clusters (box = cluster)]
Group 1: 24, 25, 26, 28, 29, 32
Group 2: 5, 6, 7, 11, 17
Group 3: 1, 2, 3, 4, 8, 12, 13, 14, 18, 20, 22
Group 4: 9, 10, 15, 16, 19, 21, 23, 27, 30, 31, 33, 34
Scaling of Real-World Instances
- “Zachary’s Karate Club” (vertices/edges = 34/78)
- “US college football”: teams and matches (vertices/edges = 115/616)
- variables of a SAT instance, edges = direct dependencies (electronic components) (vertices/edges ≈ 2K/6K)
- scientific collaborations: 3-hop neighborhood of D. Wagner (DBLP) (vertices/edges ≈ 10K/40K)
- physical Internet: autonomous systems (vertices/edges ≈ 20K/60K)
instance                     vertices   edges
coauthors in DBLP            300K       1M
roads in the USA             24M        60M
WWW: .UK domain ’02          20M        500M
(neurons in human brain      10^11      ∼ 10^17)
. . . no limit to be expected . . .
Clustering: Intuition to Formalization
Task: partition a graph into natural groups
Paradigm: intra-cluster density vs. inter-cluster sparsity
Different approaches exist to formalize this paradigm, usually:
Paradigm of Graph Clustering (intra-cluster density vs. inter-cluster sparsity)
⇓ Mathematical Formalization
quality measures for clusterings
Many exist; optimization is generally (NP-)hard. There is no single, universally best strategy.
Algorithm Engineering
[Figure: the algorithm engineering cycle around “Algorithms”: design, analyze, implement, experiment]
- modelling reality is hard
- finding optima is hard
- satisfying the needs of the application is hard
still, we do need to cluster ⇒ need a good foundation
Clustering vs. Partitioning
                   clustering          partitioning
purpose            analysis (pred.)    handling of instance
. . . and then?    zoom/abstraction    computations on parts
# of parts         open                predefined (upper bound)
size of parts      open                upper bound (or even fixed)
criteria           various (later)     weighted cuts
constraints        often none          see above
applications       various (later)     often: distributed finite element methods on 3d meshes of objects
Bicriterial Formulations
Observations:
1. clusterings are often “nice” if balanced (like partitions)
2. intra-density vs. inter-sparsity is bicriterial
Bicriterial (or multicriterial) measures for clusterings can help:
- constrain sparsity within clusters
- constrain density between clusters
- explicitly formulate desiderata
(more on bicriteria later)
Postulations to a Measure
Given a graph G and a clustering C, a quality measure should behave as follows:
- more intra-edges ⇒ higher quality
- fewer inter-edges ⇒ higher quality
- cliques must never be separated
- clusters must be connected
- random clusterings should have bad quality
- disjoint cliques should approach maximum quality
- locality of the measure (being better/worse in one part does not depend on what is done in other parts of the graph)
- double the instance, what should happen . . . same result
- comparable results across instances
- fulfill the desiderata of the application
. . .
Formalization via Bottleneck
[Figure: a social network of people identified by first name (Raul, Susan, Phil, …), shown with a clustering]
Quality of the clustering, upper cluster:
- inter-cluster sparsity: 2 edges for cutting off 7 nodes (cheap)
- intra-cluster density: best additional cut: 3 edges for cutting off 4 nodes (expensive)
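A quick worked reading of the two counts above, assuming cut cost is compared as cut edges per node cut off (the normalization that expansion makes precise):

```latex
\underbrace{\frac{2}{7} \approx 0.29}_{\text{cut separating the upper cluster}}
\qquad \text{vs.} \qquad
\underbrace{\frac{3}{4} = 0.75}_{\text{best additional cut inside it}}
```

The outer cut is cheap, while the best inner cut is more than twice as expensive per node. This is the pattern of a cluster that is well separated and internally dense.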
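Examples: Conductance, Expansion
conductance of a cut (C, V \ C):
$$\phi(C, V \setminus C) := \frac{\omega(E(C, V \setminus C))}{\min\{\sum_{v \in C} \omega(v), \sum_{v \in V \setminus C} \omega(v)\}}$$
(i.e.: thickness of the bottleneck that cuts off C)
$$\text{inter-cluster conductance}(\mathcal{C}) := 1 - \max_{C \in \mathcal{C}} \phi(C, V \setminus C)$$
(i.e.: 1 − the worst bottleneck induced by some cluster C)
$$\text{intra-cluster conductance}(\mathcal{C}) := \min_{C \in \mathcal{C}} \min_{P \uplus Q = C} \phi_{|C}(P, Q)$$
(i.e.: the best bottleneck still left uncut inside some cluster C; $\phi_{|C}$ is the conductance in the subgraph induced by C)
expansion of a cut (C, V \ C):
$$\psi(C, V \setminus C) := \frac{\omega(E(C, V \setminus C))}{\min\{|C|, |V \setminus C|\}}$$
(i.e.: in φ, replace ω(v) by 1; intra- and inter-cluster expansion are defined analogously)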
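The definitions above can be sketched in Python for a small weighted graph. This is a minimal sketch under stated assumptions: the node weight ω(v) is taken to be the weighted degree of v, a zero denominator is mapped to 0.0, and intra-cluster conductance is computed by brute force over all splits of each cluster. That is exponential in the cluster size, so it only works for toy instances. All function names and the toy graph are illustrative, not from the talk.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Edge = Tuple[Hashable, Hashable, float]  # undirected edge (u, v, weight)


def cut_weight(edges: Sequence[Edge], side: Set[Hashable]) -> float:
    """omega(E(C, V \\ C)): total weight of edges with exactly one endpoint in `side`."""
    return sum(w for u, v, w in edges if (u in side) != (v in side))


def weighted_degrees(nodes: Sequence[Hashable], edges: Sequence[Edge]) -> Dict[Hashable, float]:
    """Assumption: the node weight omega(v) is the weighted degree of v."""
    deg = {v: 0.0 for v in nodes}
    for u, v, w in edges:
        deg[u] += w
        deg[v] += w
    return deg


def conductance(nodes: Sequence[Hashable], edges: Sequence[Edge], side: Set[Hashable]) -> float:
    """phi(C, V \\ C). A zero denominator (nothing to cut off) is mapped to 0.0 here."""
    omega = weighted_degrees(nodes, edges)
    vol_in = sum(omega[v] for v in nodes if v in side)
    vol_out = sum(omega[v] for v in nodes if v not in side)
    denom = min(vol_in, vol_out)
    return cut_weight(edges, side) / denom if denom > 0 else 0.0


def expansion(nodes: Sequence[Hashable], edges: Sequence[Edge], side: Set[Hashable]) -> float:
    """psi(C, V \\ C): as phi, but every node weighs 1 instead of omega(v)."""
    size_in = sum(1 for v in nodes if v in side)
    denom = min(size_in, len(nodes) - size_in)
    return cut_weight(edges, side) / denom if denom > 0 else 0.0


def inter_cluster_conductance(nodes, edges, clusters: List[Set[Hashable]]) -> float:
    """1 - the worst (largest) conductance among the cuts that cut off one cluster."""
    return 1.0 - max(conductance(nodes, edges, c) for c in clusters)


def intra_cluster_conductance(edges, clusters: List[Set[Hashable]]) -> float:
    """min over clusters C and splits P + Q = C of phi in the subgraph induced by C.

    Brute force over all splits: exponential in the cluster size, tiny clusters only.
    Clusters with fewer than 2 nodes have no split and are skipped.
    """
    best = float("inf")
    for c in clusters:
        members = list(c)
        if len(members) < 2:
            continue
        induced = [(u, v, w) for u, v, w in edges if u in c and v in c]
        for r in range(1, len(members) // 2 + 1):
            for part in combinations(members, r):
                best = min(best, conductance(members, induced, set(part)))
    return best


# Toy instance (ours, not from the slides): two triangles joined by one bridge edge.
nodes = [0, 1, 2, 3, 4, 5]
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0), (2, 3, 1.0)]
clusters = [{0, 1, 2}, {3, 4, 5}]

print(conductance(nodes, edges, clusters[0]))              # 1/7
print(expansion(nodes, edges, clusters[0]))                # 1/3
print(inter_cluster_conductance(nodes, edges, clusters))   # about 6/7
print(intra_cluster_conductance(edges, clusters))          # 1.0
```

On this toy graph the bridge is a thin bottleneck (conductance 1/7), while every split inside a triangle is as thick as possible (1.0). So both the inter- and the intra-cluster values come out high.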
Formalization: Counting Edges
[Figure: the same social network with its clustering]
Measuring clustering quality by counting edges:
- inter-cluster sparsity: 6 edges of ca. 800 node pairs (few)
- intra-cluster density: 53 edges of 99 node pairs (many)
Example quality measure:
$$\text{coverage} = \frac{\#\,\text{intra-cluster edges}}{\#\,\text{edges}} \approx 0.9$$
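This reads the slide's counts literally and assumes that the 53 intra-cluster edges and the 6 inter-cluster edges together make up all edges of the graph. Under that assumption, the coverage value works out as:

```latex
\mathrm{cov}(\mathcal{C}) = \frac{53}{53 + 6} = \frac{53}{59} \approx 0.898 \approx 0.9
```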
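Example Counting Measures
coverage:
$$\mathrm{cov}(\mathcal{C}) := \frac{\#\,\text{intra-cluster edges}}{\#\,\text{edges}}$$
(i.e.: fraction of covered edges)
performance:
$$\mathrm{perf}(\mathcal{C}) := \frac{\#\,\text{intra-cluster edges} + \#\,\text{absent inter-cluster edges}}{\frac{1}{2}\, n(n-1)}$$
(i.e.: fraction of correctly classified pairs of nodes)
density:
$$\mathrm{den}(\mathcal{C}) := \frac{1}{2} \cdot \frac{\#\,\text{intra-cluster edges}}{\#\,\text{possible intra-cluster edges}} + \frac{1}{2} \cdot \frac{\#\,\text{absent inter-cluster edges}}{\#\,\text{possible inter-cluster edges}}$$
(i.e.: fractions of correct intra- and inter-edges)
modularity:
$$\mathrm{mod}(\mathcal{C}) := \mathrm{cov}(\mathcal{C}) - \mathbb{E}[\mathrm{cov}(\mathcal{C})]$$
(i.e.: how clear is the clustering, compared to a random network?)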
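The three purely counting-based measures can be sketched as follows. The sketch assumes a simple undirected, unweighted graph with no loops or parallel edges, and a clustering given as a node-to-label map. The slides do not say what density should be when there are no possible pairs of one kind (all singletons, or one single cluster). Here such an empty fraction is treated as 1.0, which is our own assumption. Function names and the toy graph are illustrative.

```python
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]  # simple undirected graph: no loops, no parallel edges


def _counts(nodes: List[Hashable], edges: List[Edge], cluster_of: Dict[Hashable, int]):
    """Return (#intra edges, #inter edges, #possible intra pairs, #possible inter pairs)."""
    n = len(nodes)
    sizes: Dict[int, int] = {}
    for v in nodes:
        sizes[cluster_of[v]] = sizes.get(cluster_of[v], 0) + 1
    possible_intra = sum(s * (s - 1) // 2 for s in sizes.values())
    possible_inter = n * (n - 1) // 2 - possible_intra
    intra = sum(1 for u, v in edges if cluster_of[u] == cluster_of[v])
    return intra, len(edges) - intra, possible_intra, possible_inter


def coverage(nodes, edges, cluster_of) -> float:
    intra, _, _, _ = _counts(nodes, edges, cluster_of)
    return intra / len(edges)


def performance(nodes, edges, cluster_of) -> float:
    intra, inter, _, possible_inter = _counts(nodes, edges, cluster_of)
    absent_inter = possible_inter - inter  # node pairs correctly left unlinked
    n = len(nodes)
    return (intra + absent_inter) / (n * (n - 1) / 2)


def _ratio(num: int, den: int) -> float:
    # Assumption: an empty fraction (no possible pairs of that kind) counts as 1.0.
    return num / den if den > 0 else 1.0


def density(nodes, edges, cluster_of) -> float:
    intra, inter, possible_intra, possible_inter = _counts(nodes, edges, cluster_of)
    return 0.5 * _ratio(intra, possible_intra) + 0.5 * _ratio(possible_inter - inter, possible_inter)


# Toy instance (ours): two triangles joined by one bridge edge, one cluster per triangle.
nodes = [0, 1, 2, 3, 4, 5]
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
cluster_of = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}

print(coverage(nodes, edges, cluster_of))     # 6/7, about 0.857
print(performance(nodes, edges, cluster_of))  # 14/15, about 0.933
print(density(nodes, edges, cluster_of))      # 17/18, about 0.944
```

Putting every node into one cluster gives coverage 1.0 on any graph. This is the weakness that motivates modularity.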
Motivation for Modularity
[Figure: the same social network with its clustering]
coverage = # intra-cluster edges / # edges ≈ 0.9
only one cluster ⇒ coverage = 1.0, i.e. coverage alone is maximized by the trivial clustering
A Promising Remedy
[Girvan and Newman: Finding and evaluating community structure in networks, ’04]: “. . . if we subtract from [coverage] the expected value [. . . ], we do get a useful measure.”
Modularity
$$\mathrm{mod}(\mathcal{C}) := \mathrm{cov}(\mathcal{C}) - \mathbb{E}[\mathrm{cov}(\mathcal{C})] = \frac{\#\,\text{intra-cluster edges}}{|E|} - \frac{1}{4|E|^2} \sum_{C \in \mathcal{C}} \Big( \sum_{v \in C} \deg(v) \Big)^2$$
first: stopping criterion for cutting; then: optimization criterion
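The closed form above translates directly into a few lines of Python. This is a sketch for a simple, unweighted, undirected graph. The names `modularity` and `disjoint_triangles` are illustrative. The example also checks the postulate "disjoint cliques should approach maximum quality". For k disjoint triangles, each its own cluster, the formula gives 1 − 1/k, while the one-cluster clustering scores exactly 0.

```python
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]  # simple undirected, unweighted graph


def modularity(edges: List[Edge], cluster_of: Dict[Hashable, int]) -> float:
    """#intra/m - 1/(4 m^2) * sum over clusters of (sum of degrees in the cluster)^2."""
    m = len(edges)
    intra = 0
    volume: Dict[int, int] = {}  # cluster -> sum of degrees of its nodes
    for u, v in edges:
        if cluster_of[u] == cluster_of[v]:
            intra += 1
        for x in (u, v):  # each edge adds 1 to the degree of both endpoints
            volume[cluster_of[x]] = volume.get(cluster_of[x], 0) + 1
    return intra / m - sum(vol * vol for vol in volume.values()) / (4 * m * m)


def disjoint_triangles(k: int):
    """k disjoint triangles (cliques K3), each triangle as its own cluster."""
    edges, cluster_of = [], {}
    for i in range(k):
        a, b, c = 3 * i, 3 * i + 1, 3 * i + 2
        edges += [(a, b), (b, c), (a, c)]
        cluster_of.update({a: i, b: i, c: i})
    return edges, cluster_of


edges, cluster_of = disjoint_triangles(10)
print(modularity(edges, cluster_of))                  # 0.9, i.e. 1 - 1/k for k = 10
print(modularity(edges, {v: 0 for v in cluster_of}))  # one cluster: 0.0
```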
Modularity in Practice
- easy to use & implement
- reasonable behavior on many practical instances; heavily used in various fields: ecosystem exploration, collaboration analyses, biochemistry, structure of the internet (AS graph, WWW, routers)
- close to human intuition of quality [Görke et al.: Comp. aspects of lucidity-driven clustering, 2010]
Known issues:
- scaling behavior (double the instance, the result differs) [folklore]
- non-locality of optimal clustering [folklore]
- resolution limit (no tiny and large clusters at the same time) [Fortunato and Barthélemy ’07]
- large sparse graphs ⇝ high values, balanced clusters [Good et al.: The performance of modularity maximization in practical contexts, 2009]
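The scaling issue can be illustrated with a small experiment. The test instance is a made-up "ring of triangles", not taken from the slides: n triangles, with one edge linking each triangle to the next. The sketch compares only two candidate clusterings: one cluster per triangle, and pairs of adjacent triangles. It therefore says nothing about the true optimum. It only shows that the candidate modularity prefers can flip when the instance is doubled (disjoint union of two copies). The helpers `ring_of_triangles` and `group_triangles` are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def modularity(edges: List[Edge], cluster_of: Dict[int, int]) -> float:
    """Modularity of a clustering of a simple unweighted graph (formula from 'A Promising Remedy')."""
    m = len(edges)
    intra = sum(1 for u, v in edges if cluster_of[u] == cluster_of[v])
    volume: Dict[int, int] = {}
    for u, v in edges:
        for x in (u, v):
            volume[cluster_of[x]] = volume.get(cluster_of[x], 0) + 1
    return intra / m - sum(vol * vol for vol in volume.values()) / (4 * m * m)


def ring_of_triangles(n: int, offset: int = 0) -> List[Edge]:
    """n >= 3 triangles in a ring; triangle i has nodes offset+3i..offset+3i+2 and one
    edge to the next triangle (indices taken modulo n)."""
    edges: List[Edge] = []
    for i in range(n):
        a, b, c = offset + 3 * i, offset + 3 * i + 1, offset + 3 * i + 2
        edges += [(a, b), (b, c), (a, c), (c, offset + 3 * ((i + 1) % n))]
    return edges


def group_triangles(n: int, group: int, offset: int = 0, label: int = 0) -> Dict[int, int]:
    """Cluster every `group` consecutive triangles together (group=1: one per triangle)."""
    return {offset + 3 * i + j: label + i // group for i in range(n) for j in range(3)}


n = 6
one = ring_of_triangles(n)
two = one + ring_of_triangles(n, offset=3 * n)  # disjoint union of two copies

single_1 = modularity(one, group_triangles(n, 1))
pairs_1 = modularity(one, group_triangles(n, 2))
single_2 = modularity(two, {**group_triangles(n, 1), **group_triangles(n, 1, 3 * n, n)})
pairs_2 = modularity(two, {**group_triangles(n, 2), **group_triangles(n, 2, 3 * n, n)})

print(single_1 > pairs_1)  # True: one ring prefers single triangles
print(single_2 < pairs_2)  # True: the doubled instance prefers pairs
```

For one ring of n triangles the two candidates score 3/4 − 1/n and 7/8 − 2/n. For two disjoint copies they score 3/4 − 1/(2n) and 7/8 − 1/n. The expected-coverage term halves when the instance is doubled, so the preference flips for 4 < n < 8. That is why n = 6 is used here.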
Modularity, Algorithmic Theory
The complexity of modularity optimization:
- finding C with maximum modularity is NP-hard; reduction from 3-PARTITION
- restriction to |C| = 2 is also hard ⇒ not FPT w.r.t. |C|
- greedy maximization (later) does not approximate
- very limited families are combinatorially solvable
- ILP formulation, feasible for ≈ |V| ≤ 200
[Brandes et al.: On modularity clustering, 2008]
- diverse results on approximability on specific classes of graphs
[DasGupta, Devine: On the complexity of Newman’s community finding approach for biological and social networks, 2011]
How to Cluster?
- Optimization of a quality function:
  - Bottom-up: start with singletons ⇒ merge clusters
  - Top-down: start with the one-cluster (all nodes together) ⇒ split clusters
  - Local opt.: start with a random clustering ⇒ migrate nodes
- Variants of recursive min-cutting
- Percolation of the network by removal of highly central edges
- Spectral methods using eigenanalysis of the adjacency or Laplacian matrix
- Direct identification of dense substructures
- Random walks
- Geometric approaches
- . . .
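The bottom-up strategy can be sketched for modularity as follows. Start with singletons, repeatedly merge the pair of adjacent clusters whose merge raises modularity the most, and stop when no merge helps. Merging clusters A and B changes modularity by e(A,B)/m − vol(A)·vol(B)/(2m²), where e(A,B) is the number of edges between them and vol is the degree sum. This is a naive, unoptimized sketch for a simple unweighted graph. It recounts inter-cluster edges in every round and is meant for illustration only. Like any greedy method, it carries no optimality guarantee. The function name `greedy_merge` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]  # simple undirected, unweighted graph


def greedy_merge(nodes: List[Hashable], edges: List[Edge]) -> List[Set[Hashable]]:
    """Bottom-up: start with singletons, repeatedly merge the pair of adjacent
    clusters with the largest modularity gain, stop when no merge improves."""
    m = len(edges)
    cluster_of: Dict[Hashable, Hashable] = {v: v for v in nodes}  # label = a member node
    members: Dict[Hashable, Set[Hashable]] = {v: {v} for v in nodes}
    volume: Dict[Hashable, int] = defaultdict(int)  # cluster -> sum of degrees
    for u, v in edges:
        volume[u] += 1
        volume[v] += 1

    while True:
        between: Dict[frozenset, int] = defaultdict(int)  # edges between two clusters
        for u, v in edges:
            a, b = cluster_of[u], cluster_of[v]
            if a != b:
                between[frozenset((a, b))] += 1
        best_gain, best_pair = 1e-12, None  # small tolerance against rounding noise
        for pair, e_ab in between.items():
            a, b = tuple(pair)
            # Merging A and B changes mod by e_AB/m - vol(A) * vol(B) / (2 m^2).
            gain = e_ab / m - volume[a] * volume[b] / (2 * m * m)
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:  # no merge improves modularity: stop
            return list(members.values())
        a, b = best_pair
        for v in members[b]:
            cluster_of[v] = a
        members[a] |= members.pop(b)
        volume[a] += volume.pop(b)


# Toy instance (ours): two triangles joined by one bridge edge.
nodes = [0, 1, 2, 3, 4, 5]
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(sorted(sorted(c) for c in greedy_merge(nodes, edges)))  # [[0, 1, 2], [3, 4, 5]]
```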
Dorothea Wagner –Algorithmic Methods for Complex Network Analysis: Graph Clustering September 19, 2014 19/49
How to Cluster?Optimization of quality function:
Bottom-up: start with singletons
⇒ merge clustersTop-down: start with the one-cluster
⇒ split clusters
Local Opt.: start with random clustering
⇒ migrate nodes
Variants of recursive min-cutting
Percolation of network by removal of highly central edges
Spectral methods using eigenanalysis of adjacency Laplacian
Direct identification of dense substructures
Random walks
Geometric approaches
. . .
Dorothea Wagner –Algorithmic Methods for Complex Network Analysis: Graph Clustering September 19, 2014 19/49
How to Cluster?Optimization of quality function:
Bottom-up: start with singletons⇒ merge clusters
Top-down: start with the one-cluster
⇒ split clusters
Local Opt.: start with random clustering
⇒ migrate nodes
Variants of recursive min-cutting
Percolation of network by removal of highly central edges
Spectral methods using eigenanalysis of adjacency Laplacian
Direct identification of dense substructures
Random walks
Geometric approaches
. . .
Dorothea Wagner –Algorithmic Methods for Complex Network Analysis: Graph Clustering September 19, 2014 19/49
How to Cluster?Optimization of quality function:
Bottom-up: start with singletons⇒ merge clusters
Top-down: start with the one-cluster
⇒ split clusters
Local Opt.: start with random clustering
⇒ migrate nodes
Variants of recursive min-cutting
Percolation of network by removal of highly central edges
Spectral methods using eigenanalysis of adjacency Laplacian
Direct identification of dense substructures
Random walks
Geometric approaches
. . .
Dorothea Wagner –Algorithmic Methods for Complex Network Analysis: Graph Clustering September 19, 2014 19/49
How to Cluster?Optimization of quality function:
Bottom-up: start with singletons⇒ merge clusters
Top-down: start with the one-cluster
⇒ split clusters
Local Opt.: start with random clustering
⇒ migrate nodes
Variants of recursive min-cutting
Percolation of network by removal of highly central edges
Spectral methods using eigenanalysis of adjacency Laplacian
Direct identification of dense substructures
Random walks
Geometric approaches
. . .
Dorothea Wagner –Algorithmic Methods for Complex Network Analysis: Graph Clustering September 19, 2014 19/49
How to Cluster?Optimization of quality function:
Bottom-up: start with singletons⇒ merge clusters
Top-down: start with the one-cluster
⇒ split clusters
Local Opt.: start with random clustering
⇒ migrate nodes
Variants of recursive min-cutting
Percolation of network by removal of highly central edges
Spectral methods using eigenanalysis of adjacency Laplacian
Direct identification of dense substructures
Random walks
Geometric approaches
. . .
Dorothea Wagner –Algorithmic Methods for Complex Network Analysis: Graph Clustering September 19, 2014 19/49
How to Cluster?Optimization of quality function:
Bottom-up: start with singletons⇒ merge clustersTop-down: start with the one-cluster
⇒ split clustersLocal Opt.: start with random clustering
⇒ migrate nodes
Variants of recursive min-cutting
Percolation of network by removal of highly central edges
Spectral methods using eigenanalysis of adjacency Laplacian
Direct identification of dense substructures
Random walks
Geometric approaches
. . .
Dorothea Wagner –Algorithmic Methods for Complex Network Analysis: Graph Clustering September 19, 2014 19/49
How to Cluster?Optimization of quality function:
Bottom-up: start with singletons⇒ merge clustersTop-down: start with the one-cluster⇒ split clusters
Local Opt.: start with random clustering
⇒ migrate nodes
Variants of recursive min-cutting
Percolation of network by removal of highly central edges
Spectral methods using eigenanalysis of adjacency Laplacian
Direct identification of dense substructures
Random walks
Geometric approaches
. . .
Dorothea Wagner –Algorithmic Methods for Complex Network Analysis: Graph Clustering September 19, 2014 19/49
How to Cluster?Optimization of quality function:
Bottom-up: start with singletons⇒ merge clustersTop-down: start with the one-cluster⇒ split clustersLocal Opt.: start with random clustering
⇒ migrate nodes
Variants of recursive min-cutting
Percolation of network by removal of highly central edges
Spectral methods using eigenanalysis of adjacency Laplacian
Direct identification of dense substructures
Random walks
Geometric approaches
. . .
Dorothea Wagner –Algorithmic Methods for Complex Network Analysis: Graph Clustering September 19, 2014 19/49
How to Cluster?Optimization of quality function:
Bottom-up: start with singletons⇒ merge clustersTop-down: start with the one-cluster⇒ split clustersLocal Opt.: start with random clustering⇒ migrate nodes
Variants of recursive min-cutting
Percolation of network by removal of highly central edges
Spectral methods using eigenanalysis of adjacency Laplacian
Direct identification of dense substructures
Random walks
Geometric approaches
. . .
Dorothea Wagner –Algorithmic Methods for Complex Network Analysis: Graph Clustering September 19, 2014 19/49
How to Cluster?Optimization of quality function:
Bottom-up: start with singletons⇒ merge clustersTop-down: start with the one-cluster⇒ split clustersLocal Opt.: start with random clustering⇒ migrate nodes
Variants of recursive min-cutting
Percolation of network by removal of highly central edges
Spectral methods using eigenanalysis of adjacency Laplacian
Direct identification of dense substructures
Random walks
Geometric approaches
. . .
Dorothea Wagner –Algorithmic Methods for Complex Network Analysis: Graph Clustering September 19, 2014 19/49
How to Cluster?Optimization of quality function:
Bottom-up: start with singletons⇒ merge clustersTop-down: start with the one-cluster⇒ split clustersLocal Opt.: start with random clustering⇒ migrate nodes
Variants of recursive min-cutting
Percolation of network by removal of highly central edges
Spectral methods using eigenanalysis of adjacency Laplacian
Direct identification of dense substructures
Random walks
Geometric approaches
. . .
Dorothea Wagner –Algorithmic Methods for Complex Network Analysis: Graph Clustering September 19, 2014 19/49
How to Cluster?Optimization of quality function:
Bottom-up: start with singletons⇒ merge clustersTop-down: start with the one-cluster⇒ split clustersLocal Opt.: start with random clustering⇒ migrate nodes
Variants of recursive min-cutting
Percolation of network by removal of highly central edges
Spectral methods using eigenanalysis of adjacency Laplacian
Direct identification of dense substructures
Random walks
Geometric approaches
. . .
Density-Constrained Clustering: Overview
New Optimization Problem: Find clusterings with guaranteed intra-cluster density and good inter-cluster sparsity
This talk:
  Systematic collection of sparsity and density measures
  Classification of measures with respect to their behavior
  Experimental evaluation of greedy merge vs. greedy moves
  Qualitative comparison of clusterings obtained by optimizing different measures
See also:
  [Schumm et al.: Density-Constrained Graph Clustering, WADS 2011]
  [Kappes et al.: Experiments on Density-Constrained Graph Clustering, to appear in ACM JEA]
Inter-cluster sparsity: Cut-based
Isolated View: Each cluster induces a cut
Pairwise View: Each pair of clusters induces a cut in their subgraph
Global View: A clustering with k clusters induces a k-way cut
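The three views can be made concrete with a small sketch. It assumes an undirected graph given as a vertex set and an edge list, and a clustering given as a list of vertex sets; the function names are illustrative and not taken from the paper. Each view is measured here simply by its number of cut edges.

```python
from itertools import combinations

def cut_edges(edges, s, t):
    """Count edges with one endpoint in s and the other in t."""
    return sum(1 for u, v in edges if (u in s and v in t) or (u in t and v in s))

def cut_views(vertices, edges, clustering):
    """Return the isolated, pairwise and global cut sizes of a clustering."""
    # Isolated view: cluster C against all remaining vertices V - C.
    isolated = [cut_edges(edges, c, vertices - c) for c in clustering]
    # Pairwise view: cut between C_i and C_j inside the subgraph induced by C_i ∪ C_j.
    pairwise = {(i, j): cut_edges(edges, clustering[i], clustering[j])
                for i, j in combinations(range(len(clustering)), 2)}
    # Global view: every inter-cluster edge is cut exactly once by the k-way cut.
    global_cut = sum(pairwise.values())
    return isolated, pairwise, global_cut

# Two triangles joined by the edge (3, 4).
V = {1, 2, 3, 4, 5, 6}
E = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
print(cut_views(V, E, [{1, 2, 3}, {4, 5, 6}]))  # ([1, 1], {(0, 1): 1}, 1)
```

Note that every inter-cluster edge appears in exactly two isolated cuts, so the isolated cut sizes sum to twice the global cut.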
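Inter-cluster Sparsity: Degrees of Freedom
Measures: number of cut-edges, density, conductance, expansion
Set of cuts: isolated (one for each cluster), pairwise (one for each pair of clusters), global (k-way cut)
Combinations: average sparsity, minimum sparsity
⇒ 14 (reasonable) inter-cluster sparsity measures
Abbreviations used below: nxe = number of inter-cluster edges, gxd = global inter-cluster density; the other codes read m/a (maximum cut value, i.e., minimum sparsity / average), i/p (isolated/pairwise), x (inter-cluster), d/c/e (density/conductance/expansion), e.g., mpxc = maximum pairwise inter-cluster conductance.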
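For a single cut (S, V \ S), the measures can be sketched with their usual textbook definitions: density = cut edges / (|S|·|V \ S|), conductance = cut edges / min(vol(S), vol(V \ S)) with vol = sum of degrees, and expansion = cut edges / min(|S|, |V \ S|). The slides do not spell out the normalizations, so these are assumptions. The helper `max_isolated_conductance` (a hypothetical name) shows one combination: the worst case over all isolated cuts.

```python
from fractions import Fraction

def cut_measures(vertices, edges, s):
    """Sparsity measures of the cut (s, V - s); smaller values mean a sparser cut."""
    t = vertices - s
    cut = sum(1 for u, v in edges if (u in s) != (v in s))
    degree = {x: 0 for x in vertices}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    vol_s = sum(degree[x] for x in s)
    vol_t = sum(degree[x] for x in t)
    return {
        "cut_edges": cut,
        "density": Fraction(cut, len(s) * len(t)),        # cut edges / possible cut edges
        "conductance": Fraction(cut, min(vol_s, vol_t)),  # assumes both volumes > 0
        "expansion": Fraction(cut, min(len(s), len(t))),
    }

def max_isolated_conductance(vertices, edges, clustering):
    """Worst-case (minimum-sparsity) combination over the isolated cuts."""
    return max(cut_measures(vertices, edges, c)["conductance"] for c in clustering)

V = {1, 2, 3, 4, 5, 6}
E = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
print(cut_measures(V, E, {1, 2, 3}))
print(max_isolated_conductance(V, E, [{1, 2, 3}, {4, 5, 6}]))  # 1/7
```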
Intra-cluster density
Definitions analogous to inter-cluster sparsity are possible
Finding a cut with optimal density/conductance/expansion is NP-hard
Practical approach: evaluate |intra-cluster edges| / |possible intra-cluster edges|
⇒ minimum/average/global intra-cluster density
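The three intra-cluster density variants can be sketched as follows: per-cluster ratios aggregated by minimum or average, and a global ratio of summed intra-cluster edges over summed possible intra-cluster edges. How singletons are handled is not stated on the slide; counting them as density 1 is an assumption made here so that they never violate a lower bound.

```python
from fractions import Fraction

def intra_edges(edges, cluster):
    return sum(1 for u, v in edges if u in cluster and v in cluster)

def possible_pairs(cluster):
    n = len(cluster)
    return n * (n - 1) // 2

def intra_densities(edges, clustering):
    """Return (minimum, average, global) intra-cluster density.

    Assumption: a singleton has no possible intra-cluster edge and is counted
    with density 1, so it never violates a lower bound on the density.
    """
    per_cluster = [Fraction(intra_edges(edges, c), possible_pairs(c)) if len(c) > 1
                   else Fraction(1) for c in clustering]
    total_intra = sum(intra_edges(edges, c) for c in clustering)
    total_possible = sum(possible_pairs(c) for c in clustering)
    global_density = (Fraction(total_intra, total_possible) if total_possible
                      else Fraction(1))
    return min(per_cluster), sum(per_cluster) / len(per_cluster), global_density

E = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
print(intra_densities(E, [{1, 2, 3, 4}, {5, 6}]))  # minimum 2/3, average 5/6, global 5/7
```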
Problem Statement
Density-Constrained Clustering
Given a graph G = (V, E), among all clusterings with an intra-cluster density of no less than α, find a clustering C with optimum inter-cluster sparsity.
3 possible intra-cluster density measures × 14 possible inter-cluster sparsity measures ⇒ family of 42 optimization problems
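For one of the 42 combinations (number of inter-cluster edges subject to minimum intra-cluster density ≥ α), the problem statement can be made concrete with an exhaustive solver for tiny graphs. This is only a reference point under stated assumptions, not a method from the talk: it enumerates all set partitions, so its running time grows like the Bell numbers, and singletons are again assumed never to violate the constraint.

```python
from fractions import Fraction

def set_partitions(items):
    """Yield every partition of the list `items` as a list of sets."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Add `first` to each existing block in turn, or open a new block for it.
        for i in range(len(partition)):
            yield partition[:i] + [partition[i] | {first}] + partition[i + 1:]
        yield [{first}] + partition

def min_intra_density(edges, clustering):
    densities = []
    for c in clustering:
        n = len(c)
        if n > 1:  # assumption: singletons never violate the constraint
            m = sum(1 for u, v in edges if u in c and v in c)
            densities.append(Fraction(m, n * (n - 1) // 2))
    return min(densities, default=Fraction(1))

def inter_edges(edges, clustering):
    label = {v: i for i, c in enumerate(clustering) for v in c}
    return sum(1 for u, v in edges if label[u] != label[v])

def exact_density_constrained(vertices, edges, alpha):
    """Fewest inter-cluster edges such that every cluster has density >= alpha."""
    best = None
    for clustering in set_partitions(sorted(vertices)):
        if min_intra_density(edges, clustering) >= alpha:
            value = inter_edges(edges, clustering)
            if best is None or value < best[0]:
                best = (value, clustering)
    return best

V = {1, 2, 3, 4, 5, 6}
E = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
print(exact_density_constrained(V, E, Fraction(3, 4)))
```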
Complexity (Example)
[Figure: reduction construction with labels K_n (several copies), S_1, …, S_m, V_X, v_{x_1}, …, v_{x_n}]
Reduction from Exact Cover by 3-Sets
Theorem: Density-Constrained Clustering combining any intra-cluster density measure with the number of inter-cluster edges is NP-hard.
⇒ motivates use of heuristic greedy algorithms
Greedy Algorithms
Greedy Merge (GM)
  Popular for modularity-based clustering
  Idea: Merge clusters iteratively
Greedy Vertex Moving (GVM)
  Closely related to algorithms for graph partitioning
  Very successful for optimizing modularity [Rotta et al. ’11]
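The vertex-moving idea can be sketched for one measure combination: minimize the number of inter-cluster edges while every cluster keeps intra-cluster density ≥ α. This is a minimal illustration under assumptions (adjacency given as a dict of neighbour sets, start from singletons, first-improvement moves, singletons always feasible); it is not the implementation of Rotta et al. or of the authors, and `greedy_vertex_moving` is a hypothetical name.

```python
from fractions import Fraction

def density_ok(adj, cluster, alpha):
    """Minimum intra-cluster density constraint for a single cluster."""
    n = len(cluster)
    if n <= 1:
        return True  # assumption: singletons (and empty clusters) never violate it
    m = sum(len(adj[v] & cluster) for v in cluster) // 2
    return Fraction(m, n * (n - 1) // 2) >= alpha

def greedy_vertex_moving(adj, alpha, max_rounds=100):
    """Move single vertices to neighbouring clusters while this reduces the
    number of inter-cluster edges and keeps both clusters feasible."""
    label = {v: v for v in adj}          # start from singletons
    clusters = {v: {v} for v in adj}
    for _ in range(max_rounds):
        moved = False
        for v in sorted(adj):
            src = label[v]
            for dst in sorted({label[u] for u in adj[v]} - {src}):
                # Edges to dst become intra-cluster, edges to src become inter-cluster.
                gain = len(adj[v] & clusters[dst]) - len(adj[v] & clusters[src])
                if gain <= 0:
                    continue
                new_src, new_dst = clusters[src] - {v}, clusters[dst] | {v}
                if density_ok(adj, new_src, alpha) and density_ok(adj, new_dst, alpha):
                    clusters[src], clusters[dst] = new_src, new_dst
                    label[v] = dst
                    moved = True
                    break
        if not moved:
            break
    return [c for c in clusters.values() if c]

# Two triangles joined by the edge 3-4.
ADJ = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
print(greedy_vertex_moving(ADJ, Fraction(3, 4)))  # [{1, 2, 3}, {4, 5, 6}]
```

Each accepted move strictly decreases the number of inter-cluster edges, so the loop terminates even without the `max_rounds` cap; the cap is only a safety net.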
Generic Greedy Merge Algorithm
Example: Minimize the number of inter-cluster edges such that the density of each cluster is at least 3/4
Idea: Merge clusters greedily
Objective: Increase inter-cluster sparsity
Constraint: Intra-cluster density must not drop below a given threshold
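The slide's example (fewest inter-cluster edges, every cluster with density at least 3/4) can be sketched as a plain greedy merge. Assumptions: adjacency as a dict of neighbour sets, singletons feasible by convention, and all cluster pairs rescanned in every round instead of being kept in a heap. Only pairs with at least one edge between them are considered, since merging unconnected clusters cannot reduce the number of inter-cluster edges. `greedy_merge` is a hypothetical name, not the authors' code.

```python
from fractions import Fraction
from itertools import combinations

def density_ok(adj, cluster, alpha):
    n = len(cluster)
    if n <= 1:
        return True  # assumption: singletons are always feasible
    m = sum(len(adj[v] & cluster) for v in cluster) // 2
    return Fraction(m, n * (n - 1) // 2) >= alpha

def greedy_merge(adj, alpha):
    """Start from singletons; repeatedly apply the feasible merge that removes
    the most inter-cluster edges, until no such merge exists."""
    clusters = [frozenset([v]) for v in sorted(adj)]
    while True:
        best = None
        for a, b in combinations(clusters, 2):
            between = sum(len(adj[v] & b) for v in a)  # edges that become intra-cluster
            if between > 0 and density_ok(adj, a | b, alpha):
                if best is None or between > best[0]:
                    best = (between, a, b)
        if best is None:
            return [set(c) for c in clusters]
        _, a, b = best
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]

ADJ = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
print(greedy_merge(ADJ, Fraction(3, 4)))  # [{1, 2, 3}, {4, 5, 6}]
```

On this graph the final merge of the two triangles is rejected because the joined cluster would have density 7/15 < 3/4.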
Influence of Measures on Algorithm: Coarseness
[Figure: rough intuition, plotted against inter-cluster sparsity and intra-cluster density]
Question: Without constraints, is there always a merge that improves the objective function?
(Un-)Boundedness
Definition: An objective function f is unbounded if for any clustering C with |C| > 1 there exists a merge that does not deteriorate f.
Max. pw. inter-cluster conductance is bounded
Other bounded measures include, e.g., modularity
bounded: mpxc, mpxe, apxd, apxc, aixd
unbounded: nxe, gxd, mixc, mixd, mixe, aixc, aixe, mixd, mpxd
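The definition can be checked directly on a given clustering: does some merge of two clusters leave the objective no worse? The sketch below assumes "lower is better" for every objective, so modularity (which is maximized) is negated. On two triangles joined by a bridge, split into the two triangles, merging them lowers the number of inter-cluster edges from 1 to 0 but drops modularity from 5/14 to 0. A single clustering only illustrates the definition; it does not prove (un)boundedness.

```python
from fractions import Fraction
from itertools import combinations

def inter_edges(edges, clustering):
    """Number of inter-cluster edges (lower is better)."""
    label = {v: i for i, c in enumerate(clustering) for v in c}
    return sum(1 for u, v in edges if label[u] != label[v])

def neg_modularity(edges, clustering):
    """Negated modularity, so that lower is better here as well."""
    m = len(edges)
    q = Fraction(0)
    for c in clustering:
        e_c = sum(1 for u, v in edges if u in c and v in c)
        vol_c = sum((u in c) + (v in c) for u, v in edges)  # sum of degrees in c
        q += Fraction(e_c, m) - Fraction(vol_c, 2 * m) ** 2
    return -q

def has_nondeteriorating_merge(f, edges, clustering):
    """True if some merge of two clusters does not make f worse."""
    current = f(edges, clustering)
    for i, j in combinations(range(len(clustering)), 2):
        merged = [c for k, c in enumerate(clustering) if k not in (i, j)]
        merged.append(clustering[i] | clustering[j])
        if f(edges, merged) <= current:
            return True
    return False

E = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
C = [{1, 2, 3}, {4, 5, 6}]
print(has_nondeteriorating_merge(inter_edges, E, C))     # True
print(has_nondeteriorating_merge(neg_modularity, E, C))  # False
```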
Influence of Measures on Algorithm
[Figure: after a merge, must the set of feasible merges be updated?]
Question: Does the feasibility of a merge depend only on the involved clusters?
⇒ Context insensitivity of an intra-cluster measure
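Context Insensitivity
Definition: A constraint is context insensitive if the feasibility of a merge does not depend on the remainder of the clustering.
E.g., global intra-cluster density is context sensitive:
Constraint: |intra-cluster edges| / |possible intra-cluster edges| ≥ 0.7
In the example, starting from density 1/1, a merge that would give 2/3 < 0.7 is infeasible; after a merge elsewhere (density now 2/2), the same merge gives 3/4 ≥ 0.7 and becomes feasible.
E.g., minimum intra-cluster density is context insensitive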
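The density values of the example can be reproduced with a small graph constructed for this sketch (it is an assumption, not necessarily the graph on the slide): edges a–b, b–c and d–e, starting clustering {a, b}, {c}, {d}, {e}, threshold α = 0.7. Merging {a, b} with {c} is infeasible under the global constraint before the unrelated merge of {d} and {e}, and feasible after it; under the minimum constraint its feasibility does not change.

```python
from fractions import Fraction

EDGES = [("a", "b"), ("b", "c"), ("d", "e")]
ALPHA = Fraction(7, 10)

def intra(cluster):
    return sum(1 for u, v in EDGES if u in cluster and v in cluster)

def pairs(cluster):
    return len(cluster) * (len(cluster) - 1) // 2

def global_density(clustering):
    return Fraction(sum(intra(c) for c in clustering), sum(pairs(c) for c in clustering))

def min_density(clustering):
    # Singletons are skipped: they have no possible intra-cluster edge.
    return min(Fraction(intra(c), pairs(c)) for c in clustering if len(c) > 1)

def merge(clustering, x, y):
    return [c for c in clustering if c not in (x, y)] + [x | y]

AB, C, D, E = {"a", "b"}, {"c"}, {"d"}, {"e"}
before = [AB, C, D, E]              # global density 1/1
after_other = merge(before, D, E)   # unrelated merge: global density 2/2

for context in (before, after_other):
    candidate = merge(context, AB, C)  # the same merge in both contexts
    print(global_density(candidate), global_density(candidate) >= ALPHA,
          min_density(candidate), min_density(candidate) >= ALPHA)
# 2/3 False 2/3 False
# 3/4 True 2/3 False
```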
Context Insensitivity: Classification
context insensitive: minimum intra-cluster density
context sensitive: average intra-cluster density, global intra-cluster density
Influence of Measures on Algorithm
[Figure: feasible merges kept in a heap, with the optimum on top?]
Question: Given context insensitivity, can the set of feasible merges be efficiently maintained in a heap?
⇒ Locality of an objective function
Locality: Intuition
Example: Maximum isolated inter-cluster conductance
First approach: Use the gain in inter-cluster sparsity as key
[Figure: candidate merges (A,B), (C,D), (E,F), (C,F), (G,H), (G,I) keyed by their gain and split into good and bad merges, shown before and after merging G and I]
Clever tie-breaking possible?
Needed: A suitable order that does not change if unrelated clusters merge
Existence of such an order ≈ locality of the inter-cluster measure
Example: Max. isolated inter-cluster conductance
Current sequence of conductance of all clusters (sorted): A 0.5, B 0.4, C 0.3, D 0.3, E 0.1
Sequence if A and B are merged: A ∪ B 0.45, C 0.3, D 0.3, E 0.1
Sequence if A and D are merged: A ∪ D 0.45, B 0.4, C 0.3, E 0.1
Compare lexicographically: merging A and B is better!
Ordering merges lexicographically is stable
Two merges can be compared in constant time by comparing keys consisting of three numbers
⇒ Maximum isolated inter-cluster conductance is local
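The lexicographic comparison can be sketched by building the full sorted sequence for each candidate merge; the conductance of the merged cluster (0.45 on the slide) is taken as an input because computing it requires the graph. This compares whole sequences in O(k log k) time and does not reproduce the constant-time three-number key from the slide. Because an isolated cut (X, V \ X) does not depend on how the remaining vertices are clustered, the merged value stays the same after an unrelated merge; the merge of C and E with conductance 0.2 below is a hypothetical value.

```python
def sequence_after_merge(conductance, x, y, merged_value):
    """Conductances of all clusters, sorted in decreasing order, after merging x and y."""
    rest = [phi for name, phi in conductance.items() if name not in (x, y)]
    return sorted(rest + [merged_value], reverse=True)

current = {"A": 0.5, "B": 0.4, "C": 0.3, "D": 0.3, "E": 0.1}
merge_ab = sequence_after_merge(current, "A", "B", 0.45)
merge_ad = sequence_after_merge(current, "A", "D", 0.45)
print(merge_ab)             # [0.45, 0.3, 0.3, 0.1]
print(merge_ad)             # [0.45, 0.4, 0.3, 0.1]
print(merge_ab < merge_ad)  # True: Python lists compare lexicographically

# Hypothetical unrelated merge of C and E (conductance 0.2): the order is unchanged.
later = {"A": 0.5, "B": 0.4, "D": 0.3, "CE": 0.2}
print(sequence_after_merge(later, "A", "B", 0.45)
      < sequence_after_merge(later, "A", "D", 0.45))  # True
```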
Locality: Results
Does such an order exist for all objective functions?
better
worse
|inter-cluster edges||possible inter-cluster edges| =
836
global inter-cluster density is not local localmixdmixcmixe
aixdaixcaixe
nxe
not localmpxdapxdmpxc
mpxegxdapxe
apxc
Dorothea Wagner –Algorithmic Methods for Complex Network Analysis: Graph Clustering September 19, 2014 36/49
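The ratio on this slide, |inter-cluster edges| / |possible inter-cluster edges|, is the global inter-cluster density (gxd). A minimal Python sketch that computes it from an edge list and a vertex-to-cluster map; the function name and the convention of returning 0 for a clustering with a single cluster are assumptions. Since the denominator depends on the sizes of all clusters, the effect of a merge on this value depends on the rest of the clustering, which is consistent with gxd not being local.

```python
from collections import Counter
from fractions import Fraction


def global_intercluster_density(edges, clustering):
    """Return |inter-cluster edges| / |possible inter-cluster edges| as a Fraction.

    edges: list of undirected edges (u, v), no self-loops or duplicates.
    clustering: dict mapping every vertex to a cluster id.
    """
    n = len(clustering)
    # Edges whose endpoints lie in different clusters.
    inter = sum(1 for u, v in edges if clustering[u] != clustering[v])
    # Possible inter-cluster pairs: all vertex pairs minus pairs inside a cluster.
    sizes = Counter(clustering.values())
    possible = n * (n - 1) // 2 - sum(s * (s - 1) // 2 for s in sizes.values())
    # Assumption: a clustering consisting of one cluster gets density 0.
    return Fraction(inter, possible) if possible else Fraction(0)


# Path 0-1-2-3 split into {0, 1} and {2, 3}: one inter-cluster edge, four possible pairs.
print(global_intercluster_density([(0, 1), (1, 2), (2, 3)], {0: 0, 1: 0, 2: 1, 3: 1}))  # 1/4
```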
Influence of Measures on Algorithm
[Figure: feasible merges vs. connected merges; labels "sufficient?" and "important?"]
Question: Do we have to consider pairs of unconnected clusters?
⇒ Connectedness of an objective function
Disconnectedness
Definition: An objective function f is connected if merging unconnected clusters is never the best option with respect to f.
Max. pairwise inter-cluster conductance is not connected.
[Figure: counterexample with values 1/4 (first build) and 1/8 (later builds); one merge is marked "Best option!"]
Connected: nxe
Unconnected: gxd, mixc, mixd, mixe, aixc, aixe, mpxd, mpxc, mpxe, apxd, apxc, aixd
Influence of Measures on Efficiency
(Given the necessary data can be maintained efficiently:)
Context insensitivity + Locality = $O(n^2 \log n)$ running time
Context insensitivity + Locality + Connectedness = $O(md \log n)$ running time & linear space
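A naive Python sketch of greedy merging for one configuration, assuming the objective is the number of inter-cluster edges (nxe) and the constraint is that every cluster keeps density at least α (the pair used in the Local Moving example). Because nxe is connected, only pairs of clusters joined by at least one edge are considered. Each step rescans all cluster pairs, so this illustrates the merge rule only, not the $O(md \log n)$ data structures; the function names and the convention that singletons have density 1 are assumptions.

```python
from fractions import Fraction


def density(cluster, adj):
    """Intra-cluster edges divided by possible intra-cluster pairs."""
    s = len(cluster)
    if s < 2:
        return Fraction(1)  # assumption: singletons count as fully dense
    # Each intra-cluster edge is seen from both endpoints, hence the halving.
    intra = sum(1 for u in cluster for v in adj[u] if v in cluster) // 2
    return Fraction(intra, s * (s - 1) // 2)


def greedy_merge(adj, alpha):
    """Merge the connected pair that removes most inter-cluster edges, while feasible."""
    clusters = [frozenset([v]) for v in adj]
    while True:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                # Edges between a and b = decrease in nxe if they are merged.
                between = sum(1 for u in a for v in adj[u] if v in b)
                if between == 0:
                    continue  # unconnected pair: skipped, since nxe is connected
                if density(a | b, adj) < alpha:
                    continue  # merge would violate the density constraint
                if best is None or between > best[0]:
                    best = (between, i, j)
        if best is None:
            return clusters
        _, i, j = best
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)


# Two triangles joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(sorted(sorted(c) for c in greedy_merge(adj, Fraction(3, 4))))  # [[0, 1, 2], [3, 4, 5]]
```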
Example: Email Graph of our Department
[Figure: clusterings of the email graph (the "chair" is labeled); panels: modularity-based algorithm, greedy merge (mid + aixc)]
Local Moving
Example: Minimize the number of intercluster edges such that the density of each cluster is at least 3/4.
[Figure: example graph with vertices labeled 1 to 7; over successive builds the number of intercluster edges shown drops 9 → 8 → 7 → 6 → 5 → 3 → 1]
Idea: Move vertices greedily
Objective: Increase intercluster sparsity
Constraint: Intracluster density must not drop below given threshold
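A minimal Python sketch of local moving for the example's configuration, assuming the objective is the number of inter-cluster edges and every cluster must keep density at least α. Each vertex is moved to the neighboring cluster with the largest strict decrease in nxe, provided both the cluster it leaves and the cluster it joins stay feasible; since every accepted move lowers nxe, the loop terminates. The visiting order, the best-target rule, and the singleton-density convention are assumptions, not details given on the slide.

```python
from fractions import Fraction


def density(members, adj):
    s = len(members)
    if s < 2:
        return Fraction(1)  # assumption: empty clusters and singletons count as dense
    intra = sum(1 for u in members for v in adj[u] if v in members) // 2
    return Fraction(intra, s * (s - 1) // 2)


def local_moving(adj, clustering, alpha):
    """Greedily move single vertices while this lowers the number of inter-cluster edges."""
    cl = dict(clustering)
    improved = True
    while improved:
        improved = False
        for v in adj:
            src = cl[v]
            # Number of v's neighbours in each cluster.
            links = {}
            for u in adj[v]:
                links[cl[u]] = links.get(cl[u], 0) + 1
            best_gain, best_target = 0, None
            for target, k in links.items():
                if target == src:
                    continue
                # Edges to target become internal, edges to src become inter-cluster.
                gain = k - links.get(src, 0)
                if gain <= best_gain:
                    continue
                rest = {u for u in cl if cl[u] == src and u != v}
                joined = {u for u in cl if cl[u] == target} | {v}
                if density(rest, adj) >= alpha and density(joined, adj) >= alpha:
                    best_gain, best_target = gain, target
            if best_target is not None:
                cl[v] = best_target
                improved = True
    return cl


adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
start = {0: "A", 1: "A", 2: "B", 3: "B", 4: "B", 5: "B"}
print(local_moving(adj, start, Fraction(1, 2)))  # {0: 'A', 1: 'A', 2: 'A', 3: 'B', 4: 'B', 5: 'B'}
```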
Greedy Vertex Moving
Idea: Use Local Moving on multiple levels
[Figure: local moving on each level; contract, contract, then project, project back to the original graph]
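The figure shows only the contract/project cycle. A sketch of those two operations in Python, assuming the graph is stored as a dict of weighted undirected edges; the function names are hypothetical. Contracting keeps, per cluster, the internal edge weight and the number of original vertices, since a density constraint evaluated on the coarse level would still need them. Running local moving on the weighted coarse graph is not shown.

```python
from collections import Counter, defaultdict


def contract(weighted_edges, clustering):
    """Collapse every cluster into a single super-vertex.

    weighted_edges: dict {(u, v): weight} of an undirected graph without self-loops.
    clustering: dict vertex -> cluster id (ids must be mutually comparable).
    Returns (coarse_edges, internal_weight, sizes):
      coarse_edges    {(c1, c2): summed weight} between distinct clusters, c1 < c2
      internal_weight {c: total weight of edges inside c}
      sizes           {c: number of original vertices in c}
    """
    coarse = defaultdict(int)
    internal = defaultdict(int)
    for (u, v), w in weighted_edges.items():
        cu, cv = clustering[u], clustering[v]
        if cu == cv:
            internal[cu] += w
        else:
            coarse[(min(cu, cv), max(cu, cv))] += w
    return dict(coarse), dict(internal), dict(Counter(clustering.values()))


def project(fine_to_coarse, coarse_clustering):
    """Give every fine vertex the cluster of the super-vertex it was contracted into."""
    return {v: coarse_clustering[c] for v, c in fine_to_coarse.items()}


edges = {(0, 1): 1, (1, 2): 1, (2, 3): 1}
level = {0: 0, 1: 0, 2: 1, 3: 1}
print(contract(edges, level))  # ({(0, 1): 1}, {0: 1, 1: 1}, {0: 2, 1: 2})
print(project(level, {0: "X", 1: "X"}))  # {0: 'X', 1: 'X', 2: 'X', 3: 'X'}
```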
Effectiveness: Merge vs. Move
Question: Which greedy algorithm is more effective?
Setup:
Preliminary experiments: pairwise measures behave counter-intuitively ⇒ left out of the experimental analysis
Experiments on real-world networks taken from the benchmark sets of Arenas and Newman
Outcome:
Different configurations: intracluster density measure, intercluster sparsity measure, parameter α
Summary: In 74 percent of all configurations, greedy vertex moving performs better than greedy merging.
Social Network of Dolphins
[Lusseau ’04]
Objectives: average intercluster density, maximum intercluster density, global intercluster density, intercluster edges; restriction: global intracluster density > 0.2
Objectives: av. intercluster conductance, av. intercluster expansion; restriction: global intracluster density > 0.2
Objectives: max. intercluster expansion, max. intercluster conductance; restriction: global intracluster density > 0.2
Objective: modularity
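For comparison with the last panel, a short Python sketch of the standard Newman-Girvan modularity, $Q = \sum_c \left(\frac{e_c}{m} - \left(\frac{\mathrm{vol}_c}{2m}\right)^2\right)$, where $e_c$ is the number of edges inside cluster $c$, $\mathrm{vol}_c$ the sum of degrees in $c$, and $m$ the number of edges. It assumes a simple, unweighted, undirected graph; the slides do not say which modularity variant or algorithm produced the panel.

```python
from collections import defaultdict


def modularity(edges, clustering):
    """Newman-Girvan modularity of a simple, unweighted, undirected graph."""
    m = len(edges)
    inside = defaultdict(int)  # e_c: edges with both endpoints in c
    volume = defaultdict(int)  # vol_c: sum of degrees of the vertices in c
    for u, v in edges:
        cu, cv = clustering[u], clustering[v]
        volume[cu] += 1
        volume[cv] += 1
        if cu == cv:
            inside[cu] += 1
    return sum(inside[c] / m - (volume[c] / (2 * m)) ** 2 for c in volume)


# Two triangles joined by one edge, clustered into the two triangles: Q = 5/14.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
print(round(modularity(edges, {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}), 3))  # 0.357
```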
Planted Partition Graphs: Setup
Planted partition graph: vertex pairs inside a hidden cluster are joined with probability $p_\text{in}$, pairs in different hidden clusters with probability $p_\text{out}$.
Question: What is the distance between the clustering found by an objective function and the hidden clustering?
Parameter α ≈ expected intracluster density
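A minimal Python sketch of a planted partition generator, assuming k hidden clusters of equal size (the slides do not state the cluster sizes or the parameter values used). Each pair inside a hidden cluster is an edge with probability $p_\text{in}$, so the expected intracluster density equals $p_\text{in}$, which is the value the parameter α is set close to.

```python
import random


def planted_partition(k, size, p_in, p_out, seed=None):
    """Sample a planted partition graph with k hidden clusters of `size` vertices each."""
    rng = random.Random(seed)
    n = k * size
    hidden = {v: v // size for v in range(n)}  # vertex -> hidden cluster index
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            # Pairs inside a hidden cluster use p_in, all other pairs use p_out.
            p = p_in if hidden[u] == hidden[v] else p_out
            if rng.random() < p:
                edges.append((u, v))
    return edges, hidden


edges, hidden = planted_partition(k=4, size=25, p_in=0.3, p_out=0.02, seed=7)
intra = sum(1 for u, v in edges if hidden[u] == hidden[v])
# Empirical intracluster density; its expected value is p_in.
print(intra / (4 * (25 * 24 // 2)))
```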
Planted Partition Graphs: Rough Summary
[Figure: distance to reference clustering (axis ticks 0.2 to 1.0) for ML-MOD and for the objectives MOD, NXE, GXD, MIXD, AIXD, MIXC, AIXC, MIXE, AIXE, each under two constraint variants: global intracluster density and minimum intracluster density; one build highlights the reference]
Planted Partition Graphs: Insights
Investigating different configurations yields further insights:
Using average intracluster density as constraint leads to very unbalanced clusterings.
Constraining modularity by maximum intracluster density improves its results . . . especially if the expected number of clusters is high.
Fine reference clusterings disbalance maximum objectives.
Average intercluster expansion/density identify many clusters.
Conclusion
Clustering as bicriterial problem:
Optimize inter-cluster sparsity respecting intra-cluster density
Collection of new measures
Algorithm Engineering aspects:
Formulation of measures
Classification of measures with respect to greedy merge ⇒ insights about behavior of measures
Experimental evaluation of greedy methods
Experimental comparison on planted partition graphs
Thank you for your attention!