Source: web.iitd.ac.in/~bspanda/graphclustering.pdf
Introduction to Graph Cluster Analysis
Outline
• Introduction to Clustering
• Introduction to Graph Clustering
• Algorithms for Graph Clustering
k-Spanning Tree
Shared Nearest Neighbor
Betweenness Centrality Based
Highly Connected Components
Maximal Clique Enumeration
Kernel k-means
• Application
What is Cluster Analysis?
The process of dividing a set of input data into possibly overlapping subsets, where elements in each subset are considered related by some similarity measure
[Figure: example data grouped into 2 clusters and into 3 clusters]
What is Graph Clustering?
• Types
– Between-graph
• Clustering a set of graphs
– Within-graph
• Clustering the nodes/edges of a single graph
Between-graph Clustering
Between-graph clustering methods divide a set of graphs into different clusters
E.g., A set of graphs representing chemical compounds can be grouped into clusters based on their structural similarity
Within-graph Clustering
Within-graph clustering methods divide the nodes of a graph into clusters
E.g., In a social networking graph, these clusters could represent people with the same or similar hobbies
Note: In this lecture we will look at different algorithms to perform within-graph clustering
Graph-Based Clustering
• Graph-Based clustering uses the proximity graph
– Start with the proximity matrix
– Consider each point as a node in a graph
– Each edge between two nodes has a weight which is the proximity between the two points
– Initially the proximity graph is fully connected
– MIN (single-link) and MAX (complete-link) can be viewed as starting with this graph
• In the simplest case, clusters are connected components in the graph.
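As a minimal sketch of the simplest case, the code below builds a proximity graph from a few 2-D points and reads the clusters off as connected components. The distance cutoff `max_distance`, the sample points, and the function names are assumptions made for illustration; the slides do not prescribe how links are dropped from the fully connected graph.

```python
from itertools import combinations
from math import dist


def proximity_graph(points, max_distance):
    """Adjacency sets: i and j are linked when their distance is at most max_distance."""
    adjacency = {i: set() for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= max_distance:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


def connected_components(adjacency):
    """Return the connected components as a list of sorted node lists."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))
    return components


# Hypothetical sample data: two well-separated groups of points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0)]
clusters = connected_components(proximity_graph(points, max_distance=1.5))
print(clusters)  # [[0, 1, 2], [3, 4]]
```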
Graph-Based Clustering: Sparsification
• The amount of data that needs to be processed is drastically reduced
– Sparsification can eliminate more than 99% of the entries in a proximity matrix
– The amount of time required to cluster the data is drastically reduced
– The size of the problems that can be handled is increased
Graph-Based Clustering: Sparsification …
• Clustering may work better
– Sparsification techniques keep the connections to the most similar (nearest) neighbors of a point while breaking the connections to less similar points.
– The nearest neighbors of a point tend to belong to the same class as the point itself.
– This reduces the impact of noise and outliers and sharpens the distinction between clusters.
• Sparsification facilitates the use of graph partitioning algorithms (or algorithms based on graph partitioning algorithms)
– Chameleon and Hypergraph-based Clustering
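One common way to sparsify is to keep only each point's k most similar neighbors. The sketch below assumes a small dense similarity matrix given as nested lists; the function name `sparsify_knn`, the tie-breaking by index, and the sample values are illustrative assumptions rather than part of the slides.

```python
def sparsify_knn(similarity, k):
    """Keep, for each point, only the links to its k most similar other points.

    Returns a dict mapping each point index to {neighbor: similarity}.
    """
    n = len(similarity)
    sparse = {}
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Sort by decreasing similarity; ties are broken by index for determinism.
        others.sort(key=lambda j: (-similarity[i][j], j))
        sparse[i] = {j: similarity[i][j] for j in others[:k]}
    return sparse


# Hypothetical similarity matrix for five points (1.0 on the diagonal).
similarity = [
    [1.0, 0.9, 0.8, 0.1, 0.2],
    [0.9, 1.0, 0.7, 0.2, 0.1],
    [0.8, 0.7, 1.0, 0.3, 0.2],
    [0.1, 0.2, 0.3, 1.0, 0.9],
    [0.2, 0.1, 0.2, 0.9, 1.0],
]
sparse = sparsify_knn(similarity, k=2)
kept = sum(len(links) for links in sparse.values())
print(sparse[3])  # {4: 0.9, 2: 0.3}
print(kept, "of", 5 * 4, "off-diagonal entries kept")  # 10 of 20 off-diagonal entries kept
```

Note that the result is not necessarily symmetric: point 2 may keep a link to point 0 even if point 0 does not keep one back.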
Sparsification in the Clustering Process
Minimum Spanning Tree based Clustering
[Figure: the Minimum Spanning Tree and k are the inputs to k-Spanning Tree, which outputs k groups of non-overlapping vertices]
STEPS:
• Obtains the Minimum Spanning Tree (MST) of input graph G
• Removes k-1 heaviest edges from the MST
• Results in k clusters
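The steps above can be sketched as follows, assuming edge weights are distances. Kruskal's algorithm is used here to obtain the MST; that choice, the function names, and the example graph are assumptions for illustration, since the slides do not say how the MST is computed.

```python
def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def minimum_spanning_tree(num_nodes, edges):
    """Kruskal's algorithm. edges is a list of (weight, u, v); returns the MST edges."""
    parent = list(range(num_nodes))
    tree = []
    for weight, u, v in sorted(edges):
        root_u, root_v = find(parent, u), find(parent, v)
        if root_u != root_v:  # the edge joins two different trees, so it closes no cycle
            parent[root_u] = root_v
            tree.append((weight, u, v))
    return tree


def k_spanning_tree_clusters(num_nodes, edges, k):
    """Build the MST, drop its k-1 heaviest edges, and return the resulting components."""
    tree = sorted(minimum_spanning_tree(num_nodes, edges))
    kept = tree[: len(tree) - (k - 1)]
    parent = list(range(num_nodes))
    for _, u, v in kept:
        parent[find(parent, u)] = find(parent, v)
    groups = {}
    for node in range(num_nodes):
        groups.setdefault(find(parent, node), []).append(node)
    return sorted(groups.values())


# Hypothetical weighted graph on nodes 0..4, given as (weight, u, v).
edges = [(2, 0, 1), (3, 1, 2), (6, 0, 2), (2, 2, 3), (4, 3, 4), (5, 2, 4), (7, 1, 3)]
mst = minimum_spanning_tree(5, edges)
print(sum(weight for weight, _, _ in mst))  # 11
print(k_spanning_tree_clusters(5, edges, k=3))  # [[0, 1], [2, 3], [4]]
```

If several MST edges tie for heaviest, which of them is removed depends on the sort order, so the clusters are not unique in that case.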
What is a Spanning Tree?
A connected subgraph with no cycles that includes all vertices in the graph
[Figure: graph G and one of its spanning trees, with Weight = 17]
Note: Weight can represent either the distance or the similarity between two vertices
What is a Minimum Spanning Tree (MST)?
[Figure: graph G and three of its spanning trees, with weights 11, 13, and 17]
The spanning tree of a graph with the minimum possible sum of edge weights, if the edge weights represent distance
Note: Use the maximum possible sum of edge weights if the edge weights represent similarity
k-Spanning Tree
Remove k-1 edges with highest weight from the Minimum Spanning Tree
Note: k is the number of clusters
[Figure: with k=3, removing the 2 heaviest edges from the Minimum Spanning Tree leaves 3 clusters]
Shared Nearest Neighbor Clustering
[Figure: the Shared Nearest Neighbor Graph (SNN) and a threshold τ are the inputs to Shared Nearest Neighbor Clustering, which outputs groups of non-overlapping vertices]
STEPS:
• Obtains the Shared Nearest Neighbor Graph (SNN) of input graph G
• Removes edges from the SNN with weight less than τ
What is Shared Nearest Neighbor?
Shared Nearest Neighbor is a proximity measure and denotes the number of neighbor nodes common between any given pair of nodes
[Figure: a pair of nodes u and v]
Shared Nearest Neighbor (SNN) Graph
[Figure: input graph G on nodes 0 to 4 and its SNN graph with weighted edges]
Given input graph G, weight each edge (u,v) with the number of shared nearest neighbors between u and v
E.g., Node 0 and Node 1 have 2 neighbors in common: Node 2 and Node 3
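A short sketch of the SNN weighting: for every edge (u,v) of G, count the neighbors that u and v have in common. The edge list below is a hypothetical graph on nodes 0 to 4, chosen so that nodes 0 and 1 share exactly nodes 2 and 3; it is not claimed to be the graph drawn on the slide.

```python
def snn_graph(edges):
    """Weight each edge (u, v) by the number of neighbors shared by u and v."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return {(u, v): len(adjacency[u] & adjacency[v]) for u, v in edges}


# Hypothetical input graph G on nodes 0..4.
graph_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
snn = snn_graph(graph_edges)
print(snn[(0, 1)])  # 2  (shared neighbors: 2 and 3)
print(snn[(2, 3)])  # 3  (shared neighbors: 0, 1 and 4)
print(snn[(3, 4)])  # 1  (shared neighbor: 2)
```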
Shared Nearest Neighbor Clustering: Jarvis-Patrick Algorithm
[Figure: SNN graph of input graph G on nodes 0 to 4, and the clusters obtained with τ = 3]
If u and v share more than τ neighbors
Place them in the same cluster
E.g., τ = 3
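The thresholding step can be sketched as below. The SNN weights are hypothetical values for a graph on nodes 0 to 4. One assumption is worth flagging: the code follows the STEPS wording (remove edges with weight less than τ), so an edge whose weight equals τ is kept; if "more than τ" is read strictly, change `>=` to `>`.

```python
def jarvis_patrick(nodes, snn_weights, tau):
    """Drop SNN edges lighter than tau and return the connected components."""
    adjacency = {node: set() for node in nodes}
    for (u, v), weight in snn_weights.items():
        if weight >= tau:  # edges with weight less than tau are removed
            adjacency[u].add(v)
            adjacency[v].add(u)
    clusters, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, cluster = [start], []
        while stack:
            node = stack.pop()
            cluster.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(sorted(cluster))
    return sorted(clusters)


# Hypothetical SNN edge weights on nodes 0..4.
snn_weights = {
    (0, 1): 2, (0, 2): 2, (0, 3): 2, (1, 2): 2,
    (1, 3): 2, (2, 3): 3, (2, 4): 1, (3, 4): 1,
}
print(jarvis_patrick(range(5), snn_weights, tau=3))  # [[0], [1], [2, 3], [4]]
print(jarvis_patrick(range(5), snn_weights, tau=2))  # [[0, 1, 2, 3], [4]]
```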
What is Betweenness Centrality?
Two types:
– Vertex Betweenness
– Edge Betweenness
Betweenness centrality quantifies the degree to which a vertex (or edge) occurs on the shortest paths between all other pairs of nodes
Vertex Betweenness
The number of shortest paths in the graph G that pass through a given node S
E.g., Sharon is likely a liaison between NCSU and DUKE and hence many connections between DUKE and NCSU pass through Sharon
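A minimal sketch of computing vertex betweenness on an unweighted, undirected graph, using the standard breadth-first accumulation due to Brandes. Two assumptions: when a pair of nodes has several shortest paths, each path contributes a fraction rather than a full count, and the scores are left unnormalized. The small graph and its node names (other than Sharon) are hypothetical.

```python
from collections import deque


def vertex_betweenness(adjacency):
    """Unnormalized vertex betweenness for an unweighted, undirected graph."""
    betweenness = {v: 0.0 for v in adjacency}
    for source in adjacency:
        distance = {source: 0}
        num_paths = {v: 0 for v in adjacency}
        num_paths[source] = 1
        predecessors = {v: [] for v in adjacency}
        order, queue = [], deque([source])
        while queue:  # breadth-first search that also counts shortest paths
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if w not in distance:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    num_paths[w] += num_paths[v]
                    predecessors[w].append(v)
        dependency = {v: 0.0 for v in adjacency}
        for w in reversed(order):  # accumulate from the farthest nodes inward
            for v in predecessors[w]:
                dependency[v] += num_paths[v] / num_paths[w] * (1.0 + dependency[w])
            if w != source:
                betweenness[w] += dependency[w]
    # Each unordered pair was counted from both endpoints.
    return {v: score / 2.0 for v, score in betweenness.items()}


# Hypothetical network: two pairs of friends that only meet through Sharon.
network = {
    "N1": {"N2", "Sharon"},
    "N2": {"N1", "Sharon"},
    "D1": {"D2", "Sharon"},
    "D2": {"D1", "Sharon"},
    "Sharon": {"N1", "N2", "D1", "D2"},
}
scores = vertex_betweenness(network)
print(scores["Sharon"])  # 4.0
print(scores["N1"])  # 0.0
```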
Edge Betweenness
The number of shortest paths in the graph G that pass through given edge (S, B)
E.g., Sharon and Bob both study at NCSU and they are the only link between the NY DANCE and CISCO groups
Vertices and Edges with high Betweenness form good starting points to identify clusters
Vertex Betweenness Clustering
Given input graph G:
1. Compute the betweenness for each vertex
2. Select vertex v with the highest betweenness (e.g., vertex 3 with value 0.67)
3. Disconnect the graph at the selected vertex (e.g., vertex 3) and copy the vertex to both components
4. Repeat until the highest vertex betweenness ≤ μ
Edge-Betweenness Clustering: Girvan and Newman Algorithm
Given input graph G:
1. Compute the betweenness for each edge
2. Select the edge with the highest betweenness (e.g., edge (3,4) with value 0.571)
3. Disconnect the graph at the selected edge (e.g., (3,4))
4. Repeat until the highest edge betweenness ≤ μ
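The loop can be sketched as follows for a small unweighted graph. Edge betweenness is computed with a Brandes-style breadth-first accumulation and left unnormalized, so the threshold `mu` here is in raw path counts and is not comparable to normalized values such as 0.571. The example graph and all function names are assumptions for illustration.

```python
from collections import deque


def edge_betweenness(adjacency):
    """Unnormalized edge betweenness for an unweighted, undirected graph."""
    scores = {frozenset((u, v)): 0.0 for u in adjacency for v in adjacency[u]}
    for source in adjacency:
        distance = {source: 0}
        num_paths = {v: 0 for v in adjacency}
        num_paths[source] = 1
        predecessors = {v: [] for v in adjacency}
        order, queue = [], deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if w not in distance:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    num_paths[w] += num_paths[v]
                    predecessors[w].append(v)
        dependency = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in predecessors[w]:
                share = num_paths[v] / num_paths[w] * (1.0 + dependency[w])
                scores[frozenset((v, w))] += share
                dependency[v] += share
    return {edge: score / 2.0 for edge, score in scores.items()}


def components(adjacency):
    """Connected components as a sorted list of sorted node lists."""
    seen, result = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            node = stack.pop()
            group.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        result.append(sorted(group))
    return sorted(result)


def girvan_newman(adjacency, mu):
    """Remove the highest-betweenness edge until no edge scores above mu."""
    graph = {v: set(neighbors) for v, neighbors in adjacency.items()}
    while True:
        scores = edge_betweenness(graph)
        if not scores:
            break
        edge, highest = max(scores.items(), key=lambda item: item[1])
        if highest <= mu:
            break
        u, v = tuple(edge)
        graph[u].discard(v)
        graph[v].discard(u)
    return components(graph)


# Hypothetical graph: two triangles joined by the single edge (2, 3).
graph = {}
for a, b in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)
print(edge_betweenness(graph)[frozenset((2, 3))])  # 9.0
print(girvan_newman(graph, mu=5))  # [[0, 1, 2], [3, 4, 5]]
```

Betweenness is recomputed after every removal, because removing one edge changes the shortest paths that pass through the others.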
What is a Highly Connected Subgraph?
• Requires the following definitions
– Cut
– Minimum Edge Cut (MinCut)
– Edge Connectivity (EC)
Cut
• The set of edges whose removal disconnects a graph
[Figure: a graph on nodes 0 to 8 and two of its cuts]
Cut = {(0,1),(1,2),(1,3)}
Cut = {(3,5),(4,2)}
Minimum Cut
The minimum set of edges whose removal disconnects a graph
[Figure: a graph on nodes 0 to 8 and its minimum cut]
MinCut = {(3,5),(4,2)}
Edge Connectivity (EC)
• Minimum NUMBER of edges that will disconnect a graph
[Figure: a graph on nodes 0 to 8]
MinCut = {(3,5),(4,2)}
EC = |MinCut| = |{(3,5),(4,2)}| = 2
Highly Connected Subgraph (HCS)
A graph G = (V,E) is highly connected if EC(G) > |V|/2
[Figure: graph G on nodes 0 to 8]
E.g., for G, EC(G) = 2 and |V| = 9, so EC(G) > |V|/2 would require 2 > 9/2, which is false
G is NOT a highly connected subgraph
HCS Clustering
Given input graph G:
1. Find the minimum cut MinCut(G) (e.g., {(3,5),(4,2)})
2. Is EC(G) > |V|/2?
– YES: Return G
– NO: Divide G into G1 and G2 using MinCut, then process graph G1 and process graph G2
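A sketch of the recursion for tiny graphs. The minimum cut is found by brute force over every two-sided split of the nodes, which is exponential and only meant to make the logic visible; a real implementation would use a proper minimum-cut algorithm. The example graph is hypothetical: a 4-clique and a 5-clique joined by the two edges (3,5) and (2,4).

```python
from itertools import combinations


def minimum_cut(nodes, edges):
    """Brute-force minimum edge cut over all splits of the nodes into two non-empty sides.

    Returns (cut_edges, side_a, side_b). Needs at least two nodes.
    """
    nodes = sorted(nodes)
    first, rest = nodes[0], nodes[1:]
    best = None
    for size in range(len(rest)):  # side_a always holds `first`; side_b is never empty
        for chosen in combinations(rest, size):
            side_a = {first, *chosen}
            cut = [(u, v) for u, v in edges if (u in side_a) != (v in side_a)]
            if best is None or len(cut) < len(best[0]):
                best = (cut, side_a, set(nodes) - side_a)
    return best


def hcs(nodes, edges):
    """Split recursively along minimum cuts until every part is highly connected."""
    nodes = set(nodes)
    if len(nodes) < 2:
        return [sorted(nodes)]
    cut, side_a, side_b = minimum_cut(nodes, edges)
    if len(cut) > len(nodes) / 2:  # EC(G) > |V|/2, so G is highly connected
        return [sorted(nodes)]
    clusters = []
    for side in (side_a, side_b):
        inner = [(u, v) for u, v in edges if u in side and v in side]
        clusters.extend(hcs(side, inner))
    return sorted(clusters)


# Hypothetical graph on nodes 0..8.
hcs_edges = (
    list(combinations(range(4), 2))
    + list(combinations(range(4, 9), 2))
    + [(3, 5), (2, 4)]
)
cut, _, _ = minimum_cut(range(9), hcs_edges)
print(sorted(cut))  # [(2, 4), (3, 5)]
print(hcs(range(9), hcs_edges))  # [[0, 1, 2, 3], [4, 5, 6, 7, 8]]
```

In this sketch a single leftover vertex is returned as its own cluster; how singletons are treated is a design choice the slides do not address.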
What is a Clique?
A subgraph C of graph G with edges between all pairs of nodes
[Figure: graph G on nodes 4 to 8 and a clique C on nodes 5, 6, and 7]
What is a Maximal Clique?
A maximal clique is a clique that is not part of a larger clique.
[Figure: in the graph on nodes 4 to 8, the clique on nodes 5, 6, and 7 is part of the maximal clique on nodes 5, 6, 7, and 8]
Maximal Clique Enumeration: Bron and Kerbosch Algorithm
Input: graph G
BK(C,P,N)
C – vertices in current clique
P – vertices that can be added to C
N – vertices that cannot be added to C
Condition: If both P and N are empty, output C as maximal clique
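A minimal sketch of BK(C,P,N) in its basic form without pivoting. The recursion extends C by one vertex of P at a time, restricts P and N to that vertex's neighbors, and then moves the vertex from P to N. The example graph is hypothetical: nodes 5, 6, 7, 8 are pairwise adjacent and node 4 is adjacent to 5 and 6.

```python
def bron_kerbosch(adjacency, C, P, N, output):
    """Append every maximal clique that extends C to `output`.

    C: vertices in the current clique
    P: vertices that can still be added to C
    N: vertices that cannot be added (already explored)
    """
    if not P and not N:
        output.append(sorted(C))
        return
    for v in list(P):
        bron_kerbosch(adjacency, C | {v}, P & adjacency[v], N & adjacency[v], output)
        P = P - {v}  # v has been fully explored
        N = N | {v}


# Hypothetical graph on nodes 4..8.
clique_graph = {
    4: {5, 6},
    5: {4, 6, 7, 8},
    6: {4, 5, 7, 8},
    7: {5, 6, 8},
    8: {5, 6, 7},
}
cliques = []
bron_kerbosch(clique_graph, set(), set(clique_graph), set(), cliques)
print(sorted(cliques))  # [[4, 5, 6], [5, 6, 7, 8]]
```

The clique on nodes 5, 6, 7 is not reported because it is contained in the larger clique on nodes 5, 6, 7, 8.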
What is k-means?
• k-means is a clustering algorithm applied to vector data points
• k-means recap:
– Select k data points from input as centroids
1. Assign other data points to the nearest centroid
2. Recompute centroid for each cluster
3. Repeat Steps 1 and 2 until centroids don’t change
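The recap can be sketched as below for points given as tuples. Two assumptions: the initial centroids are simply the first k input points (the slides only say to select k data points), and the loop stops when the assignment no longer changes, which also means the centroids no longer change. The sample points are hypothetical.

```python
from math import dist


def k_means(points, k, max_iterations=100):
    """Plain k-means; returns (assignment, centroids)."""
    centroids = [points[i] for i in range(k)]  # assumption: first k points as centroids
    assignment = None
    for _ in range(max_iterations):
        # Step 1: assign every point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:  # nothing moved, so centroids are stable
            break
        assignment = new_assignment
        # Step 2: recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment, centroids


# Hypothetical 2-D data with two visible groups.
data = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.0), (1.0, 0.5), (8.5, 9.0)]
assignment, centroids = k_means(data, k=2)
print(assignment)  # [0, 0, 1, 1, 0, 1]
```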
k-means on Graphs: Kernel k-means
• Basic algorithm is the same as k-means on Vector data
• We utilize the “kernel trick”
• “kernel trick” recap
– We know that we can use within-graph kernel functions to calculate the inner product of a pair of vertices in a user-defined feature space.
– We replace the standard distance/proximity measures used in k-means with this within-graph kernel function
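A sketch of how the kernel replaces the distance. With kernel matrix K, the squared feature-space distance from vertex i to the mean of cluster C is K[i][i] - 2 * mean over j in C of K[i][j] + mean over j, l in C of K[j][l], so centroids never need to be formed explicitly. The kernel used in the example, K = (A + I)^2 for adjacency matrix A, is only an illustrative assumption; the slides do not fix a particular within-graph kernel, and the starting assignment is hypothetical.

```python
def kernel_k_means(kernel, initial_assignment, k, max_iterations=100):
    """Kernel k-means on a precomputed kernel matrix; returns the final assignment."""
    n = len(kernel)
    assignment = list(initial_assignment)
    for _ in range(max_iterations):
        clusters = [[i for i in range(n) if assignment[i] == c] for c in range(k)]
        new_assignment = []
        for i in range(n):
            best, best_distance = assignment[i], float("inf")
            for c, members in enumerate(clusters):
                if not members:
                    continue
                size = len(members)
                cross = sum(kernel[i][j] for j in members) / size
                within = sum(kernel[j][l] for j in members for l in members) / size**2
                distance = kernel[i][i] - 2.0 * cross + within
                if distance < best_distance:
                    best, best_distance = c, distance
            new_assignment.append(best)
        if new_assignment == assignment:
            break
        assignment = new_assignment
    return assignment


# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by the edge (2, 3).
size = 6
adjacency_matrix = [[0] * size for _ in range(size)]
for a, b in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]:
    adjacency_matrix[a][b] = adjacency_matrix[b][a] = 1
# Illustrative kernel: K = (A + I)^2, a product of a symmetric matrix with itself.
b_matrix = [
    [adjacency_matrix[i][j] + (1 if i == j else 0) for j in range(size)]
    for i in range(size)
]
graph_kernel = [
    [sum(b_matrix[i][m] * b_matrix[m][j] for m in range(size)) for j in range(size)]
    for i in range(size)
]
labels = kernel_k_means(graph_kernel, initial_assignment=[0, 0, 0, 0, 1, 1], k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```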
Application
• Functional modules in protein-protein interaction networks
• Subgraphs with pair-wise interacting nodes => Maximal cliques
Chameleon: Clustering Using Dynamic Modeling
• Adapt to the characteristics of the data set to find the natural clusters
• Use a dynamic model to measure the similarity between clusters
– Main property is the relative closeness and relative inter-connectivity of the cluster
– Two clusters are combined if the resulting cluster shares certain properties with the constituent clusters
– The merging scheme preserves self-similarity
• One of the areas of application is spatial data
Characteristics of Spatial Data Sets
• Clusters are defined as densely populated regions of the space
• Clusters have arbitrary shapes, orientation, and non-uniform sizes
• Difference in densities across clusters and variation in density within clusters
• Existence of special artifacts (streaks) and noise
The clustering algorithm must address the above characteristics and also require minimal supervision.
Chameleon: Steps
• Preprocessing Step: Represent the Data by a Graph
– Given a set of points, construct the k-nearest-neighbor (k-NN) graph to capture the relationship between a point and its k nearest neighbors
– Concept of neighborhood is captured dynamically (even if region is sparse)
• Phase 1: Use a multilevel graph partitioning algorithm on the graph to find a large number of clusters of well-connected vertices
– Each cluster should contain mostly points from one “true” cluster, i.e., is a sub-cluster of a “real” cluster
Chameleon: Steps …
• Phase 2: Use Hierarchical Agglomerative Clustering to merge sub-clusters
– Two clusters are combined if the resulting cluster shares certain properties with the constituent clusters
– Two key properties used to model cluster similarity:
• Relative Interconnectivity: Absolute interconnectivity of two clusters normalized by the internal connectivity of the clusters
• Relative Closeness: Absolute closeness of two clusters normalized by the internal closeness of the clusters
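The two properties above are usually written as ratios. The following is a sketch of the formulas as they are commonly stated for Chameleon; the slides give only the verbal definitions, so the symbols are assumptions made here: $E_{ij}$ is the set of edges connecting clusters $C_i$ and $C_j$, $B_i$ is the set of edges cut by a minimum-weight bisection of $C_i$, $w(\cdot)$ is the total edge weight of a set, and $\bar{w}(\cdot)$ is its average edge weight.

```latex
\[
RI(C_i, C_j) = \frac{w(E_{ij})}{\tfrac{1}{2}\bigl(w(B_i) + w(B_j)\bigr)}
\qquad
RC(C_i, C_j) = \frac{\bar{w}(E_{ij})}
  {\frac{|C_i|}{|C_i| + |C_j|}\,\bar{w}(B_i) + \frac{|C_j|}{|C_i| + |C_j|}\,\bar{w}(B_j)}
\]
```

In both ratios the numerator is the absolute quantity measured between the two clusters and the denominator is the corresponding internal quantity, which is what "normalized by" refers to in the bullets above.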
Experimental Results: CHAMELEON
Experimental Results: CHAMELEON
Experimental Results: CURE (10 clusters)
Experimental Results: CURE (15 clusters)
Experimental Results: CHAMELEON
Shared Near Neighbor Approach
SNN graph: the weight of an edge is the number of shared neighbors between vertices given that the vertices are connected
[Figure: connected vertices i and j; in the SNN graph the edge between them has weight 4]
Creating the SNN Graph
Sparse Graph: link weights are similarities between neighboring points
Shared Near Neighbor Graph: link weights are number of Shared Nearest Neighbors
ROCK (RObust Clustering using linKs)
• Clustering algorithm for data with categorical and Boolean attributes
– A pair of points is defined to be neighbors if their similarity is greater than some threshold
– Use a hierarchical clustering scheme to cluster the data.
1. Obtain a sample of points from the data set
2. Compute the link value for each set of points, i.e., transform the original similarities (computed by Jaccard coefficient) into similarities that reflect the number of shared neighbors between points
3. Perform an agglomerative hierarchical clustering on the data using the “number of shared neighbors” as similarity measure and maximizing “the shared neighbors” objective function
4. Assign the remaining points to the clusters that have been found
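Step 2 can be sketched as follows for records represented as sets of attribute values. Two points are neighbors when their Jaccard coefficient exceeds the threshold, and the link value of a pair is the number of neighbors they share. Whether a point counts as its own neighbor is a convention; this sketch assumes it does not. The records and the threshold value are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient of two non-empty sets."""
    return len(a & b) / len(a | b)


def rock_links(records, threshold):
    """Return {(i, j): number of neighbors shared by records i and j}."""
    n = len(records)
    neighbors = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if jaccard(records[i], records[j]) > threshold:
            neighbors[i].add(j)
            neighbors[j].add(i)
    return {
        (i, j): len(neighbors[i] & neighbors[j]) for i, j in combinations(range(n), 2)
    }


# Hypothetical categorical records.
records = [
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"a", "c", "d"},
    {"b", "c", "d"},
    {"x", "y"},
]
links = rock_links(records, threshold=0.4)
print(links[(0, 1)])  # 2  (records 2 and 3 are neighbors of both)
print(links[(0, 4)])  # 0
```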
Jarvis-Patrick Clustering
• First, the k-nearest neighbors of all points are found
– In graph terms this can be regarded as breaking all but the k strongest links from a point to other points in the proximity graph
• A pair of points is put in the same cluster if
– the two points share more than T neighbors and
– the two points are in each other's k nearest neighbor lists
• For instance, we might choose a nearest neighbor list of size 20 and put points in the same cluster if they share more than 10 near neighbors
• Jarvis-Patrick clustering is too brittle: lowering the threshold from T to T - 1 can be enough to merge clusters
When Jarvis-Patrick Works Reasonably Well
Original Points Jarvis Patrick Clustering
6 shared neighbors out of 20
![Page 58: Introduction to Graph Cluster Analysisweb.iitd.ac.in/~bspanda/graphclustering.pdf · similarity measure 4 2 Clusters 3 Clusters . ... Clustering Using Dynamic Modeling ... Use a multilevel](https://reader034.vdocuments.mx/reader034/viewer/2022051106/5b7740b37f8b9a515a8c9a38/html5/thumbnails/58.jpg)
When Jarvis-Patrick Does NOT Work Well
Figure panels: smallest threshold, T, that does not merge clusters; threshold of T - 1
![Page 59: Introduction to Graph Cluster Analysisweb.iitd.ac.in/~bspanda/graphclustering.pdf · similarity measure 4 2 Clusters 3 Clusters . ... Clustering Using Dynamic Modeling ... Use a multilevel](https://reader034.vdocuments.mx/reader034/viewer/2022051106/5b7740b37f8b9a515a8c9a38/html5/thumbnails/59.jpg)
SNN Clustering Algorithm
1. Compute the similarity matrix. This corresponds to a similarity graph with data points for nodes and edges whose weights are the similarities between data points
2. Sparsify the similarity matrix by keeping only the k most similar neighbors. This corresponds to keeping only the k strongest links of the similarity graph
3. Construct the shared nearest neighbor graph from the sparsified similarity matrix. At this point, we could apply a similarity threshold and find the connected components to obtain the clusters (Jarvis-Patrick algorithm)
4. Find the SNN density of each point. Using a user-specified parameter, Eps, find the number of points that have an SNN similarity of Eps or greater to each point. This is the SNN density of the point
![Page 60: Introduction to Graph Cluster Analysisweb.iitd.ac.in/~bspanda/graphclustering.pdf · similarity measure 4 2 Clusters 3 Clusters . ... Clustering Using Dynamic Modeling ... Use a multilevel](https://reader034.vdocuments.mx/reader034/viewer/2022051106/5b7740b37f8b9a515a8c9a38/html5/thumbnails/60.jpg)
SNN Clustering Algorithm …
5. Find the core points. Using a user-specified parameter, MinPts, find the core points, i.e., all points that have an SNN density greater than MinPts
6. Form clusters from the core points. If two core points are within a radius, Eps, of each other, they are placed in the same cluster
7. Discard all noise points. All non-core points that are not within a radius of Eps of a core point are discarded
8. Assign all non-noise, non-core points to clusters. This can be done by assigning such points to the nearest core point
(Note that steps 4-8 are DBSCAN)
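The eight steps can be sketched end to end as below. This is a simplified illustration rather than the authors' implementation. It assumes Euclidean distance, brute-force neighbor search, an SNN similarity that is nonzero only for points in each other's k-NN lists, and "within a radius of Eps" read as "SNN similarity of at least Eps". The function name `snn_cluster` and its parameter names are hypothetical.

```python
import math


def snn_cluster(points, k, eps, min_pts):
    """Return one label per point: a cluster id (int) or None for noise."""
    n = len(points)

    # Steps 1-2: keep only the k nearest neighbors of every point.
    nn = []
    for i in range(n):
        order = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        nn.append(set(order[:k]))

    # Step 3: SNN similarity = number of shared neighbors (mutual pairs only).
    def snn(i, j):
        if i in nn[j] and j in nn[i]:
            return len(nn[i] & nn[j])
        return 0

    # Step 4: SNN density = number of points with SNN similarity >= eps.
    close = [[j for j in range(n) if j != i and snn(i, j) >= eps] for i in range(n)]
    density = [len(c) for c in close]

    # Step 5: core points have SNN density greater than min_pts.
    core = [d > min_pts for d in density]

    # Step 6: core points that are close to each other share a cluster.
    labels = [None] * n
    next_id = 0
    for i in range(n):
        if not core[i] or labels[i] is not None:
            continue
        labels[i] = next_id
        stack = [i]
        while stack:
            u = stack.pop()
            for v in close[u]:
                if core[v] and labels[v] is None:
                    labels[v] = next_id
                    stack.append(v)
        next_id += 1

    # Steps 7-8: a non-core point joins its most similar nearby core point;
    # with no core point nearby it stays None, i.e. it is discarded as noise.
    for i in range(n):
        if core[i]:
            continue
        cores_near = [j for j in close[i] if core[j]]
        if cores_near:
            best = max(cores_near, key=lambda j: snn(i, j))
            labels[i] = labels[best]
    return labels


# Two squares of four points each plus one distant outlier.
data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
result = snn_cluster(data, k=3, eps=2, min_pts=2)
```

In this toy input every point of a square shares two neighbors with each of its three square-mates, so all eight become core points in two clusters. The outlier appears in no other point's neighbor list, has SNN density 0, and is left unlabeled.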
![Page 61: Introduction to Graph Cluster Analysisweb.iitd.ac.in/~bspanda/graphclustering.pdf · similarity measure 4 2 Clusters 3 Clusters . ... Clustering Using Dynamic Modeling ... Use a multilevel](https://reader034.vdocuments.mx/reader034/viewer/2022051106/5b7740b37f8b9a515a8c9a38/html5/thumbnails/61.jpg)
SNN Density
a) All Points b) High SNN Density
c) Medium SNN Density d) Low SNN Density
![Page 62: Introduction to Graph Cluster Analysisweb.iitd.ac.in/~bspanda/graphclustering.pdf · similarity measure 4 2 Clusters 3 Clusters . ... Clustering Using Dynamic Modeling ... Use a multilevel](https://reader034.vdocuments.mx/reader034/viewer/2022051106/5b7740b37f8b9a515a8c9a38/html5/thumbnails/62.jpg)
SNN Clustering Can Handle Differing Densities
Original Points SNN Clustering
![Page 63: Introduction to Graph Cluster Analysisweb.iitd.ac.in/~bspanda/graphclustering.pdf · similarity measure 4 2 Clusters 3 Clusters . ... Clustering Using Dynamic Modeling ... Use a multilevel](https://reader034.vdocuments.mx/reader034/viewer/2022051106/5b7740b37f8b9a515a8c9a38/html5/thumbnails/63.jpg)
SNN Clustering Can Handle Other Difficult Situations
![Page 64: Introduction to Graph Cluster Analysisweb.iitd.ac.in/~bspanda/graphclustering.pdf · similarity measure 4 2 Clusters 3 Clusters . ... Clustering Using Dynamic Modeling ... Use a multilevel](https://reader034.vdocuments.mx/reader034/viewer/2022051106/5b7740b37f8b9a515a8c9a38/html5/thumbnails/64.jpg)
Finding Clusters of Time Series In Spatio-Temporal Data
Figure: 26 SLP Clusters via Shared Nearest Neighbor Clustering (100 NN, 1982-1994). Axes: longitude (-180 to 180), latitude (-90 to 90). Caption: SNN Clusters of SLP.
Figure: SNN Density of SLP Time Series Data. Axes: longitude (-180 to 180), latitude (-90 to 90). Caption: SNN Density of Points on the Globe.
![Page 65: Introduction to Graph Cluster Analysisweb.iitd.ac.in/~bspanda/graphclustering.pdf · similarity measure 4 2 Clusters 3 Clusters . ... Clustering Using Dynamic Modeling ... Use a multilevel](https://reader034.vdocuments.mx/reader034/viewer/2022051106/5b7740b37f8b9a515a8c9a38/html5/thumbnails/65.jpg)
Features and Limitations of SNN Clustering
• Does not cluster all the points
• Complexity of SNN clustering is high
– O(n * time to find the number of neighbors within Eps)
– In the worst case, this is O(n^2)
– For lower dimensions, there are more efficient ways to find the nearest neighbors
• R* Tree
• k-d Trees