Unsupervised Learning Clustering Algorithms
Unsupervised Learning -- Ana Fred
Outline
Part 1: Basic Concepts of Data Clustering
Unsupervised Learning and Clustering
: Problem formulation – cluster analysis
: Taxonomies of Clustering Techniques
: Data types and Proximity Measures
: Difficulties and open problems
Part 2: Clustering Algorithms
Hierarchical methods
: Single-link
: Complete-link
: Clustering Based on Dissimilarity Increments Criteria
From Single Clustering to Ensemble Methods - April 2009
Hierarchical Clustering
Uses an n×n proximity matrix:
: D(i,j): proximity (similarity or distance) between patterns i and j
[Figure: dendrogram over objects a–e. Read left to right (steps 0–4), clusters are successively merged: agglomerative. Read right to left (steps 4–0), clusters are successively split: divisive.]
Hierarchical Clustering: Agglomerative Methods
1. Start with n clusters, each containing a single object
2. Find the most similar pair of clusters Ci and Cj in the proximity
matrix and merge them into a single cluster
3. Update the proximity matrix (reduce its order by one, replacing
the two merged clusters with the new cluster)
4. Repeat steps (2) and (3) until a single cluster is obtained (i.e. n−1
times)
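The four steps above can be sketched directly in Python. This is a minimal illustration, not the lecture's code; `agglomerate` and the toy points are my own, and the `link` argument selects single-link (min) or complete-link (max) merging:

```python
import math

def agglomerate(D, link=min):
    """Generic agglomerative clustering on a distance matrix D.
    `link` combines inter-cluster distances: min -> single-link, max -> complete-link.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples."""
    clusters = [frozenset([i]) for i in range(len(D))]
    history = []
    while len(clusters) > 1:
        # Step 2: find the closest pair of clusters.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = link(D[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # Step 3: merge the pair; the matrix update is implicit in `link`.
        history.append((clusters[a], clusters[b], d))
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return history

# Toy data: the five 2-D points used in the worked example later in the deck.
pts = [(4, 4), (8, 4), (15, 8), (24, 4), (24, 12)]
D = [[math.dist(p, q) for q in pts] for p in pts]
for a, b, d in agglomerate(D):
    print(sorted(a), sorted(b), round(d, 1))
```

With single-link merging the history reproduces the merge distances of the worked example: 4, 8, 8.1, 9.8.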
Hierarchical Clustering: Agglomerative Methods
Similarity measures between clusters:
: Well-known similarity measures can be written using the Lance-Williams
formula, which expresses the distance between cluster k and the cluster i+j
obtained by merging clusters i and j:
d(i+j, k) = a_i d(i,k) + a_j d(j,k) + b d(i,j) + c |d(i,k) − d(j,k)|

where n_i denotes the number of patterns in cluster i:

Single-link: a_i = a_j = 0.5; b = 0; c = −0.5, giving d(i+j,k) = min{d(i,k), d(j,k)}
Complete-link: a_i = a_j = 0.5; b = 0; c = 0.5, giving d(i+j,k) = max{d(i,k), d(j,k)}
Centroid: a_i = n_i/(n_i+n_j); a_j = n_j/(n_i+n_j); b = −n_i n_j/(n_i+n_j)²; c = 0
Median: a_i = a_j = 0.5; b = −0.25; c = 0
Average link: a_i = n_i/(n_i+n_j); a_j = n_j/(n_i+n_j); b = 0; c = 0
Ward's method (minimum variance): a_i = (n_i+n_k)/(n_i+n_j+n_k); a_j = (n_j+n_k)/(n_i+n_j+n_k); b = −n_k/(n_i+n_j+n_k); c = 0
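The recurrence can be checked numerically. A small sketch (the function name is my own) confirming that the single- and complete-link coefficients reproduce the min and max rules:

```python
def lance_williams(d_ik, d_jk, d_ij, ai, aj, b, c):
    """Distance from the merged cluster (i+j) to k under the Lance-Williams recurrence."""
    return ai * d_ik + aj * d_jk + b * d_ij + c * abs(d_ik - d_jk)

# Arbitrary toy distances between clusters i, j, k.
d_ik, d_jk, d_ij = 3.0, 7.0, 2.0

# Single-link coefficients (0.5, 0.5, 0, -0.5) reproduce min(d_ik, d_jk);
# complete-link coefficients (0.5, 0.5, 0, 0.5) reproduce max(d_ik, d_jk).
single = lance_williams(d_ik, d_jk, d_ij, 0.5, 0.5, 0.0, -0.5)
complete = lance_williams(d_ik, d_jk, d_ij, 0.5, 0.5, 0.0, 0.5)
print(single, complete)  # 3.0 7.0
```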
Hierarchical Clustering: Agglomerative Methods
Single link: the distance between two clusters is the distance between
their closest points, d(C_i, C_j) = min_{a∈C_i, b∈C_j} d(a, b).
Also called "neighbor joining."
Hierarchical Clustering: Agglomerative Methods
Complete link: the distance between two clusters is the distance between
their farthest pair of points, d(C_i, C_j) = max_{a∈C_i, b∈C_j} d(a, b).
Hierarchical Clustering: Agglomerative Methods
Centroid: the distance between two clusters is the distance between
their centroids.
Hierarchical Clustering: Agglomerative Methods
Average link: the distance between two clusters is the average distance
between all pairs of points, one from each cluster.
Ward’s link: minimizes the sum-of-squares
criterion (a measure of heterogeneity).
Hierarchical Clustering: Agglomerative Methods
Single linkage: d(C_i, C_j) = min_{a∈C_i, b∈C_j} d(a, b)

Data points:
point   x   y
1       4   4
2       8   4
3      15   8
4      24   4
5      24  12

Initial proximity matrix (Euclidean distances):
       1     2     3     4     5
1      -     4    11.7   20   21.5
2      4     -    8.1    16   17.9
3    11.7   8.1    -    9.8   9.8
4     20    16    9.8    -     8
5    21.5  17.9   9.8    8     -

The smallest entry is d(1,2) = 4, so clusters 1 and 2 are merged:
       1,2    3     4     5
1,2     -    8.1   16   17.9
3      8.1    -    9.8   9.8
4      16    9.8    -     8
5     17.9   9.8    8     -
Single linkage (continued): the smallest entry is now d(4,5) = 8, so
clusters 4 and 5 are merged:
       1,2    3    4,5
1,2     -    8.1   16
3      8.1    -    9.8
4,5    16    9.8    -

Next, cluster 3 joins {1,2} at distance 8.1:
        1,2,3  4,5
1,2,3     -    9.8
4,5      9.8    -

The final merge, at 9.8, completes the dendrogram.
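The reduced matrices in this example can be reproduced directly from the point coordinates. A small sketch, assuming Euclidean distances as in the slides (the helper names are my own):

```python
import math

# Point ids and coordinates from the worked example.
pts = {1: (4, 4), 2: (8, 4), 3: (15, 8), 4: (24, 4), 5: (24, 12)}

def d(ca, cb):
    """Single-link distance between two clusters given as tuples of point ids."""
    return min(math.dist(pts[a], pts[b]) for a in ca for b in cb)

# Rebuild the reduced proximity matrix after the first merge, {1,2}:
step1 = {(c1, c2): round(d(c1, c2), 1)
         for c1, c2 in [((1, 2), (3,)), ((1, 2), (4,)), ((1, 2), (5,)),
                        ((3,), (4,)), ((3,), (5,)), ((4,), (5,))]}
print(step1)
```

The printed entries match the slide's matrix: d({1,2},3) = 8.1, d({1,2},4) = 16, d({1,2},5) = 17.9, d(3,4) = d(3,5) = 9.8, d(4,5) = 8.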
Complete link: d(C_i, C_j) = max_{a∈C_i, b∈C_j} d(a, b)

Initial proximity matrix (same data as before):
       1     2     3     4     5
1      -     4    11.7   20   21.5
2      4     -    8.1    16   17.9
3    11.7   8.1    -    9.8   9.8
4     20    16    9.8    -     8
5    21.5  17.9   9.8    8     -

The smallest entry is d(1,2) = 4, so clusters 1 and 2 merge; under
complete link the updated distances take the maximum:
       1,2    3     4     5
1,2     -   11.7   20   21.5
3     11.7    -    9.8   9.8
4      20    9.8    -     8
5     21.5   9.8    8     -

Next, d(4,5) = 8 is smallest, so clusters 4 and 5 merge:
       1,2    3    4,5
1,2     -   11.7  21.5
3     11.7    -    9.8
4,5   21.5   9.8    -
Complete link (continued): the smallest entry is now d({4,5}, 3) = 9.8,
so cluster 3 joins {4,5}:
        1,2   3,4,5
1,2      -    21.5
3,4,5   21.5    -

The final merge, at 21.5, completes the dendrogram.
Single-link and Complete-Link
: A clustering of the data objects is obtained by cutting the dendrogram at
the desired level; each connected component then forms a cluster.
: Single-link favours connectedness; complete-link favours compactness.
Single-link and Complete-Link
SL algorithm:
: Favors connectedness
: Equivalent to building a minimum spanning tree (MST) and cutting it at its weak links
[Figure: MST built over the data points; cutting the weakest edges yields the clusters]
Single-link and Complete-Link
SL algorithm:
: Favors connectedness
: Detects arbitrary-shaped clusters with even densities
: Cannot handle distinct density clusters
. Single-link method, th=0.49
Single-link and Complete-Link
SL algorithm:
: Favors connectedness
: Detects arbitrary-shaped clusters with even densities
: Cannot handle distinct density clusters
: Is sensitive to in-between patterns
Single-link and Complete-Link
SL algorithm:
: Favors connectedness
: Detects arbitrary-shaped clusters with even densities
: Cannot handle distinct density clusters
: Is sensitive to in-between patterns
: Needs criteria to set the final number of clusters
CL algorithm:
: Favors compactness
Single-link and Complete-Link
SL algorithm:
: Favors connectedness
: Detects arbitrary-shaped clusters with even densities
: Cannot handle distinct density clusters
: Is sensitive to in-between patterns
: Needs criteria to set the final number of clusters
CL algorithm:
: Favors compactness
: Imposes spherical-shaped clusters on data
Single-link and Complete-Link
SL algorithm:
: Favors connectedness
: Detects arbitrary-shaped clusters with even densities
: Cannot handle distinct density clusters
: Is sensitive to in-between patterns
: Needs criteria to set the final number of clusters
CL algorithm:
: Favors compactness
: Imposes spherical-shaped clusters on data
: Needs criteria to set the final number of clusters
Hierarchical Clustering
Weakness
: Do not scale well: time complexity of O(n²), where n is the total
number of objects
: Can never undo a previous merge or split
Integration of hierarchical with distance-based clustering
: BIRCH (Zhang, Ramakrishnan & Livny, 1996): uses a Clustering-Feature
tree (CF-tree) and incrementally adjusts the quality of sub-clusters
: CURE (Guha, Rastogi & Shim, 1998): selects well-scattered points
from the cluster and then shrinks them towards the center of the
cluster by a specified fraction
: CHAMELEON (Karypis, Han & Kumar, 1999):
hierarchical clustering using dynamic modeling
1. A graph partitioning algorithm clusters the objects into a large
number of relatively small sub-clusters
2. An agglomerative hierarchical clustering algorithm finds the
genuine clusters by repeatedly combining these sub-clusters
Clustering Based on Dissimilarity Increments Criteria
Smoothness Hypothesis:
: A cluster is a set of patterns sharing important characteristics in a
given context
: A dissimilarity measure encapsulates the notion of pattern
resemblance
: Patterns with higher resemblance are more likely to belong to the same
cluster and should be associated first
: Dissimilarity between neighboring patterns within a cluster should
not change abruptly
: Merging well-separated clusters produces abrupt changes
in the dissimilarity values
A. Fred and J. Leitão, "A New Cluster Isolation Criterion Based on Dissimilarity Increments", IEEE Trans. PAMI, 2003.
Clustering Based on Dissimilarity Increments Criteria
Dissimilarity Increments:
Given a pattern x_i, let x_j be its nearest neighbor and x_k the nearest
neighbor of x_j other than x_i:
x_j : argmin_l d(x_i, x_l), l ≠ i
x_k : argmin_l d(x_j, x_l), l ≠ i, j
The dissimilarity increment is
d_inc(x_i, x_j, x_k) = | d(x_i, x_j) − d(x_j, x_k) |
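The increment definition translates directly into code. A minimal sketch (the function name and toy points are my own):

```python
import math

def dissimilarity_increment(points, i):
    """d_inc for point i: |d(x_i, x_j) - d(x_j, x_k)|, where x_j is the nearest
    neighbour of x_i and x_k is the nearest neighbour of x_j excluding x_i."""
    d = lambda a, b: math.dist(points[a], points[b])
    j = min((l for l in range(len(points)) if l != i), key=lambda l: d(i, l))
    k = min((l for l in range(len(points)) if l not in (i, j)), key=lambda l: d(j, l))
    return abs(d(i, j) - d(j, k))

# Toy 1-D-like configuration: three evenly spaced points plus one outlier.
pts = [(0, 0), (1, 0), (2, 0), (10, 0)]
print(dissimilarity_increment(pts, 0))  # inside the chain: |1 - 1| = 0.0
print(dissimilarity_increment(pts, 3))  # outlier: |8 - 1| = 7.0
```

Smooth, evenly spaced patterns give near-zero increments, while a point reached across a gap produces a large increment, which is exactly what the isolation criterion exploits.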
Clustering Based on Dissimilarity Increments Criteria
Distribution of Dissimilarity Increments:
: Uniformly distributed data
Clustering Based on Dissimilarity Increments Criteria
Distribution of Dissimilarity Increments:
: 2D Gaussian data
Clustering Based on Dissimilarity Increments Criteria
Distribution of Dissimilarity Increments:
: Ring-shaped data
Clustering Based on Dissimilarity Increments Criteria
Distribution of Dissimilarity Increments:
: Directional expanding data
Clustering Based on Dissimilarity Increments Criteria
Distribution of Dissimilarity Increments:
: Exponential distribution: p(x) = λ exp(−λx)
Clustering Based on Dissimilarity Increments Criteria
Exponential distribution:
: Higher-density patterns -> higher λ
: Well-separated clusters -> d_inc falls on the tail of p(x)
Clustering Based on Dissimilarity Increments Criteria
Gap between clusters: gap_i = d(C_i, C_j) − d_t(C_i)
[Figure: two well-separated clusters C_i and C_j; d(C_i, C_j) is the distance between the clusters and d_t(C_i), d_t(C_j) are within-cluster distances, defining large gaps gap_i and gap_j]
Clustering Based on Dissimilarity Increments Criteria
Gap between clusters: gap_i = d(C_i, C_j) − d_t(C_i)
[Figure: touching clusters C_i and C_j, for which the gaps gap_i and gap_j are small]
Clustering Based on Dissimilarity Increments Criteria
Cluster Isolation criterion:
Let C_i, C_k be two clusters which are candidates for merging, and let μ_i, μ_k
be the respective mean values of the dissimilarity increments in each
cluster. Compute the gaps gap_i and gap_k. If
gap_i > α·μ_i (respectively gap_k > α·μ_k), isolate cluster C_i (C_k) and proceed the
clustering strategy with the remaining patterns. If neither cluster exceeds
its gap limit, merge them.
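The decision rule can be sketched as a small helper. This is a toy illustration; the function name, the stand-alone gap/mean arguments, and the default α = 3 are my own choices (α following the parameter discussion in these slides):

```python
def isolation_decision(gap_i, mu_i, gap_k, mu_k, alpha=3.0):
    """Cluster isolation criterion (sketch): isolate C_i if gap_i > alpha*mu_i,
    isolate C_k if gap_k > alpha*mu_k, otherwise merge the candidate pair."""
    isolated = []
    if gap_i > alpha * mu_i:
        isolated.append("Ci")
    if gap_k > alpha * mu_k:
        isolated.append("Ck")
    return isolated or ["merge"]

# Small gaps relative to the mean increments -> the pair is merged.
print(isolation_decision(gap_i=0.5, mu_i=0.4, gap_k=0.3, mu_k=0.4))  # ['merge']
# A gap far on the tail of C_i's increment distribution -> C_i is isolated.
print(isolation_decision(gap_i=2.0, mu_i=0.4, gap_k=0.3, mu_k=0.4))  # ['Ci']
```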
Clustering Based on Dissimilarity Increments Criteria
Setting the Isolation Criterion Parameter α:
: Result: the tangent line to the exponential density at a point that is a
multiple α of the distribution mean, α/λ, crosses the x axis at (α+1)/λ
: α ∈ [3, 5] covers the significant part of the distribution
[Figure: exponential distribution with λ = 0.05, and tangent lines at points
that are multiples of the mean]
Clustering Based on Dissimilarity Increments Criteria
Hierarchical Clustering Algorithm:
: A statistic of the dissimilarity increments within each cluster is maintained and
updated during cluster merging
: Clusters are obtained by comparing dissimilarity increments with a dynamic
threshold, α·μ_i, based on the cluster statistics
Clustering Based on Dissimilarity Increments Criteria
Results:
:Ring-Shaped Clusters
Clustering Based on Dissimilarity Increments Criteria
Results:
:Ring-Shaped Clusters
. Single-link method, th=0.49
Clustering Based on Dissimilarity Increments Criteria
Results:
:Ring-Shaped Clusters
. Dissimilarity increments-based method:
[Figure: the two rings are isolated; the gap g_1 between them is large relative to the within-cluster increments d_1, d_2]
Clustering Based on Dissimilarity Increments Criteria
Results:
:Ring-Shaped Clusters
. Dissimilarity increments-based method:
[Figure: both ring-shaped clusters correctly identified]
Clustering Based on Dissimilarity Increments Criteria
Results:
:2-D Patterns with Complex Structure
Dissimilarity increments-based method (2 < α < 9) vs. single-link (th = 1.1)
Clustering of Contour Images
: The data set is composed of 634 contour images of 15 types of hardware tools, t1 to t15.
: Counting each pose as a distinct sub-class of the object type gives a total of 24 classes.
Clustering of Contour Images
Contour extraction
: the object boundary is sampled at 50 equally spaced points
: the angle between consecutive segments is quantized into 8 levels
Pipeline: contour extraction (based on a thresholding method) -> string
contour description (8-directional differential chain code)
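A differential chain code of this kind might be computed as follows. This is an illustrative sketch, not the lecture's implementation; it assumes the contour steps are already quantized to the 8 compass directions, and all names are my own:

```python
# The 8 chain-code directions, counter-clockwise from East.
DIRS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]

sign = lambda v: (v > 0) - (v < 0)

def chain_code(contour):
    """Map each step between consecutive contour points (closed polygon)
    to one of the 8 directions."""
    codes = []
    for (x0, y0), (x1, y1) in zip(contour, contour[1:] + contour[:1]):
        codes.append(DIRS.index((sign(x1 - x0), sign(y1 - y0))))
    return codes

def differential(codes):
    """Differential chain code: successive direction changes mod 8,
    which makes the description rotation-invariant."""
    return [(c - p) % 8 for p, c in zip(codes[-1:] + codes[:-1], codes)]

# A unit square: constant turning, so every differential code is the same.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(differential(chain_code(square)))  # [2, 2, 2, 2]
```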
Clustering of Contour Images
Clustering of Contour Images
Dendrograms for tool classes t1–t5: single-link vs. the dissimilarity
increments-based method (string-edit distance).
Clustering Based on Dissimilarity Increments Criteria
Description:
: Hierarchical agglomerative algorithm adopting a cluster isolation
criterion based on dissimilarity increments
Strength
: Not tied to a particular dissimilarity measure
(the examples used Euclidean and string-edit distances)
: Identifies clusters of arbitrary shape and size
: The number of clusters is found intrinsically
Weakness
: Sensitive to in-between points connecting touching clusters
Outline
Partitional Methods
: K-Means
: Spectral Clustering
: EM-based Gaussian Mixture Decomposition
Part 3: Validation of clustering solutions
Cluster Validity Measures
Part 4: Ensemble Methods
Evidence Accumulation Clustering
Partitional Methods
K-Means: minimizes the squared-error cost function
J = Σ_{i=1}^{k} Σ_{x∈C_i} ||x − μ_i||²
: Algorithm:
. Input: k, the number of clusters; the data set
1. Randomly select k seed points from the data set, and take them as initial centroids
2. Partition the data into k clusters by assigning each object to the cluster with the nearest centroid
3. Compute the centroids of the clusters of the current partition (the centroid is the mean point of the cluster)
4. Go back to step 2, stopping when the assignments no longer change
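The four steps translate into a compact sketch. Plain Python with my own names; a minimal illustration rather than an optimized implementation:

```python
import math, random

def kmeans(data, k, iters=100, seed=0):
    """Plain k-means: random seeds, nearest-centroid assignment,
    centroid recomputation, stop at a fixed point."""
    rng = random.Random(seed)
    centroids = rng.sample(data, k)                        # 1. random seed points
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in data:                                     # 2. assign to nearest centroid
            i = min(range(k), key=lambda i: math.dist(x, centroids[i]))
            clusters[i].append(x)
        new = [tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
               for i, cl in enumerate(clusters)]           # 3. recompute centroids
        if new == centroids:                               # 4. assignments are stable
            break
        centroids = new
    return centroids, clusters

# Two well separated blobs of three points each.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(data, 2)
print(sorted(centroids))
```

For this toy data any initialization converges to the two blobs, illustrating step 4's fixed point; the dependence on initialization discussed on the following slides shows up on less well separated data.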
Partitional Methods: K-Means
Favors compactness
Partitional Methods: K-Means
Favors compactness
Strength
: Fast algorithm (O(tkn) – t is the number of iterations; normally, k, t << n.)
: Scalability
: Often terminates at a local optimum.
Weakness
: Imposes spherical-shaped clusters
K-means clustering of uniform data (k=4)
Partitional Methods: K-Means
Favors compactness
Strength
: Fast algorithm (O(tkn) – t is the number of iterations; normally, k, t << n.)
: Scalability
: Often terminates at a local optimum.
Weakness
: Imposes spherical-shaped clusters
: Is sensitive to the number of objects in clusters
Partitional Methods: K-Means
Favors compactness
Strength
: Fast algorithm (O(tkn) – t is the number of iterations; normally, k, t << n.)
: Scalability
: Often terminates at a local optimum.
Weakness
: Imposes spherical-shaped clusters
: Is sensitive to the number of objects in clusters
: Dependence on initialization
Partitional Methods: K-Means
Favors compactness
Strength
: Fast algorithm (O(tkn) – t is the number of iterations; normally, k, t << n.)
: Scalability
: Often terminates at a local optimum.
Weakness
: Imposes spherical-shaped clusters
: Is sensitive to the number of objects in clusters
: Dependence on initialization
Partitional Methods: K-Means
Favors compactness
Strength
: Fast algorithm (O(tkn) – t is the number of iterations; normally, k, t << n.)
: Scalability
: Often terminates at a local optimum.
Weakness
: Imposes spherical-shaped clusters
: Is sensitive to the number of objects in clusters
: Dependence on initialization
: Needs criteria to set the final number of clusters
: Applicable only when mean is defined (what about categorical
data?)
Variations of K-Means Method
K-Means (MacQueen, 1967): each cluster is represented by the center
of the cluster
Variants of k-means differ in:
: Selection of the initial k means
: Dissimilarity calculations (e.g. Mahalanobis distance -> elliptic clusters)
: Strategies to calculate cluster means
. K-medoids: each cluster is represented by one of the objects in the cluster
: Fuzzy version: Fuzzy K-Means
Handling categorical data: k-modes (Huang, 1998)
: Replaces means of clusters with modes
: Uses new dissimilarity measures to deal with categorical objects
: Uses a frequency-based method to update the modes of clusters
Spectral Clustering
For a given data set X, spectral clustering finds a set of data clusters on the basis of a spectral analysis of a similarity graph.
The clustering problem is defined in terms of a complete graph G with vertices V = {1, …, N} corresponding to the data points, where each edge between two vertices is weighted by the similarity between them.
The weight matrix is also called the affinity matrix or the similarity matrix. A common choice is the Gaussian kernel:
A_ij = exp( −||x_i − x_j||² / (2σ²) )
Spectral Clustering
Cutting edges of G yields disjoint subgraphs of G, which are taken as the clusters of X.
The goal of clustering is to organize the data set into disjoint subsets with high intra-cluster similarity and low inter-cluster similarity.
The resulting clusters should be as compact and isolated as possible.
Spectral Clustering
Graph partitioning for data clustering can be interpreted as the minimization of an objective function, in which compactness and isolation are quantified by subset sums of edge weights.
Common objective functions:
Ratio cut: Rcut(C_1, …, C_k) := Σ_{l=1}^{k} cut(C_l, X\C_l) / card C_l
Normalised cut: Ncut(C_1, …, C_k) := Σ_{l=1}^{k} cut(C_l, X\C_l) / cut(C_l, X)
Min-max cut: Mcut(C_1, …, C_k) := Σ_{l=1}^{k} cut(C_l, X\C_l) / cut(C_l, C_l)
• cut(A, B) is the sum of the edge weights between ∀p ∈ A and ∀p ∈ B
• X\C_l is the complement of C_l ⊂ X
• card C_l denotes the number of points in C_l
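All three objectives share the cut(·,·) building block. A sketch of cut and two of the objectives, evaluated on a hypothetical toy affinity matrix (all names are my own):

```python
def cut(A, S, T):
    """Sum of the edge weights between index sets S and T."""
    return sum(A[i][j] for i in S for j in T)

def rcut(A, clusters):
    """Ratio cut: sum over clusters of cut(C, X\\C) / card(C)."""
    X = set(range(len(A)))
    return sum(cut(A, C, X - C) / len(C) for C in clusters)

def ncut(A, clusters):
    """Normalised cut: sum over clusters of cut(C, X\\C) / cut(C, X)."""
    X = set(range(len(A)))
    return sum(cut(A, C, X - C) / cut(A, C, X) for C in clusters)

# Toy affinity matrix: two triangles joined by a single weak edge (weight 0.1).
A = [[0,   1, 1, 0.1, 0, 0],
     [1,   0, 1, 0,   0, 0],
     [1,   1, 0, 0,   0, 0],
     [0.1, 0, 0, 0,   1, 1],
     [0,   0, 0, 1,   0, 1],
     [0,   0, 0, 1,   1, 0]]
good = [{0, 1, 2}, {3, 4, 5}]   # cut only the weak edge
bad = [{0, 1, 3}, {2, 4, 5}]    # cut four strong edges
print(rcut(A, good), rcut(A, bad))  # the intended partition has the lower cost
```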
Spectral Clustering
The solution of the minimization problem for any of the previous objective functions is obtained from the matrix of the first k eigenvectors of a matrix derived from the affinity matrix (a Laplacian matrix).
The eigenvectors for Ncut and Mcut are identical, and are obtained from the symmetrical Laplacian
L_sym := D^{−1/2} A D^{−1/2}
Another common choice is
L_rw := D^{−1} A
• D is a diagonal matrix whose i-th diagonal entry is the sum of the i-th row of A
Distinct algorithms differ in how they produce and use the eigenvectors and how they derive clusters from them:
: Some use one eigenvector at a time
: Others use the top k eigenvectors simultaneously
Closely related to spectral graph partitioning, in which the second eigenvector of a graph’s Laplacian is used to define a semi-optimal cut; the second eigenvector solves a relaxation of an NP-hard discrete graph partitioning problem, giving an approximation to the optimal cut.
Spectral Clustering (Ng et al, 2001)
Maps the feature space into a new space, Y, based on the
eigenvectors of a matrix derived from an affinity matrix associated
with the data set:
A_ij = exp( −||x_i − x_j||² / (2σ²) )
The data partition is obtained by applying the K-means algorithm
in the new (eigenvector) space.
A. Y. Ng and M. I. Jordan and Y. Weiss, On Spectral Clustering: Analysis and an algorithm, NIPS 2001
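A sketch of this pipeline with NumPy. The function name is my own, and the final K-means step is replaced here by a simple row-comparison that suffices for two well separated blobs; this is an illustration of the idea, not the paper's exact algorithm:

```python
import numpy as np

def spectral_embedding(X, k, sigma):
    """Gaussian affinity -> normalised matrix D^(-1/2) A D^(-1/2) ->
    top-k eigenvectors -> unit-norm rows (the NJW-style embedding)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # squared distances
    A = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(A, 0)                               # zero self-affinity
    d = A.sum(1)
    L = A / np.sqrt(np.outer(d, d))                      # D^(-1/2) A D^(-1/2)
    _, vecs = np.linalg.eigh(L)                          # ascending eigenvalues
    Y = vecs[:, -k:]                                     # top-k eigenvectors
    return Y / np.linalg.norm(Y, axis=1, keepdims=True)  # row-normalise

# Two well separated blobs; rows of Y for points in the same cluster point in
# the same direction, so comparing each row with row 0 recovers the partition
# (standing in for the final K-means step).
X = np.array([[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]])
Y = spectral_embedding(X, k=2, sigma=1.0)
labels = (Y @ Y[0] > 0.5).astype(int)
print(labels)  # [1 1 1 0 0 0]
```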
Spectral Clustering (Ng et al, 2001)
Algorithm:
Spectral Clustering (Ng et al, 2001)
Spectral Clustering (Ng et al, 2001)
Results strongly depend on the parameter values: k and σ
K=2, σ=0.1 K=2, σ=0.4
Spectral Clustering (Ng et al, 2001)
Results strongly depend on the parameter values: k and σ
K=2, σ=0.1 K=2, σ=0.4
Spectral Clustering (Ng et al, 2001)
Results strongly depend on the parameter values: k and σ
K=3, σ=0.1 K=3, σ=0.45
Spectral Clustering (Ng et al, 2001)
Results strongly depend on the parameter values: k and σ
K=3, σ=0.1 K=3, σ=0.45
Spectral Clustering
Selection of parameter values — candidate criteria:
: MSE of the K-means partition in the eigenvector space
: Eigengap: the difference between consecutive eigenvalues
: Rcut: the ratio-cut value of the resulting partition
Selection of Parameters: Global Results on selecting σ and K
None of the studied methods is suitable for automatic selection of the spectral clustering parameters.
A majority-voting decision did not significantly improve the results.
[Table: percentage of correct classification as a function of σ]
Spectral Clustering
Strength
: Detects arbitrary-shaped clusters
: With an adequate similarity measure between patterns, can be
applied to all types of data
Weakness
: Computationally heavy
: Needs criteria to set the final number of clusters and the scaling factor σ
Model-Based Clustering: Finite Mixtures
k random sources, with probability density functions f_i(x), i = 1, …, k.
A source is chosen at random, and the random variable X is drawn from it:
Conditional: f(x | source i) = f_i(x)
Joint: f(x and source i) = f_i(x) α_i
Unconditional: f(x) = Σ_{all sources} f(x and source i) = Σ_{i=1}^{k} α_i f_i(x)
Model-Based Clustering: Finite Mixtures
each component models one cluster
clustering = mixture fitting
f(x | Θ) = Σ_{i=1}^{k} α_i f(x | θ_i)
Gaussian Mixture Decomposition
Mixture Model:
f(x | Θ) = Σ_{i=1}^{k} α_i f(x | θ_i)
: Component densities f(x | θ_i): Gaussian
: Mixing probabilities: α_i ≥ 0 and Σ_{i=1}^{k} α_i = 1
Arbitrary covariances: f(x | θ_i) = N(x | μ_i, C_i),
Θ = {μ_1, …, μ_k, C_1, …, C_k, α_1, …, α_{k−1}}
Common covariance: f(x | θ_i) = N(x | μ_i, C),
Θ = {μ_1, …, μ_k, C, α_1, …, α_{k−1}}
Mixture Model Fitting
n independent observations: x = {x^(1), x^(2), …, x^(n)}
Mixture density model: f(x | Θ) = Σ_{i=1}^{k} α_i f(x | θ_i)
Log-likelihood of the mixture:
L(x, Θ) = Σ_{j=1}^{n} log f(x^(j) | Θ) = Σ_{j=1}^{n} log Σ_{i=1}^{k} α_i f(x^(j) | θ_i)
Estimate Θ̂ that maximizes the (log-)likelihood (the ML estimate of Θ):
Θ̂ = argmax_Θ L(x, Θ)
Gaussian Mixture Model Fitting
Problem: the likelihood function is unbounded as det(C_i) → 0
. There is no global maximum
. Unusual goal: a “good” local maximum
Example: a 2-component Gaussian mixture on data {x_1, …, x_n}:
f(x | μ_1, μ_2, σ²) = (1/2) (1/√(2π)) e^{−(x−μ_1)²/2} + (1/2) (1/(√(2π) σ)) e^{−(x−μ_2)²/(2σ²)}
Setting the mean of the σ-component equal to one data point (μ_2 = x_1),
L(x, Θ) contains a term log(1/σ + …), so L(x, Θ) → ∞ as σ → 0.
Mixture Model Fitting
The ML estimate has no closed-form solution
Standard alternative: the expectation-maximization (EM) algorithm:
: Missing data problem:
. Observed data: x = {x^(1), x^(2), …, x^(n)}
From Single Clustering to Ensemble Methods - April 2009
74
Unsupervised Learning Clustering Algorithms
Unsupervised Learning -- Ana Fred
Mixture Model Fitting
ML estimate has no closed-form solution
Standard alternative: expectation-maximization (EM) algorithm:
: Missing data problem:
. Observed data: x = {x^(1), x^(2),..., x^(n)}
. Missing data: z = {z^(1), z^(2),..., z^(n)}
  Missing labels (“colors”): z^(j) = [z_1^(j), z_2^(j),..., z_k^(j)]^T = [0...0 1 0...0]^T,
  with the “1” at position i ⇔ x^(j) generated by component i
. Complete log-likelihood function:
L_c(x, z, Θ) = Σ_{j=1}^n log f(x^(j), z^(j)|Θ) = Σ_{j=1}^n Σ_{i=1}^k z_i^(j) log[α_i f(x^(j)|θ_i)]
The EM Algorithm
The E-step: compute the expected value of L_c(x, z, Θ):
Q(Θ, Θ̂^(t)) = E[L_c(x, z, Θ) | x, Θ̂^(t)]
The M-step: update the parameter estimates:
Θ̂^(t+1) = argmax_Θ Q(Θ, Θ̂^(t))
EM-Algorithm for Mixtures of Gaussians
Iterative procedure: Θ̂^(0), Θ̂^(1),..., Θ̂^(t), Θ̂^(t+1),...
The E-step:
w_i^(j,t) = α̂_i^(t) f(x^(j)|θ̂_i^(t)) / Σ_{m=1}^k α̂_m^(t) f(x^(j)|θ̂_m^(t))
w_i^(j,t): estimate, at iteration t, of the probability that x^(j) was produced by component i
The M-step:
α̂_i^(t+1) = (1/n) Σ_{j=1}^n w_i^(j,t)
μ̂_i^(t+1) = Σ_{j=1}^n w_i^(j,t) x^(j) / Σ_{j=1}^n w_i^(j,t)
Ĉ_i^(t+1) = Σ_{j=1}^n w_i^(j,t) (x^(j) - μ̂_i^(t+1))(x^(j) - μ̂_i^(t+1))^T / Σ_{j=1}^n w_i^(j,t)
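The E- and M-steps above can be sketched end to end. This is a minimal 1-D implementation (scalar variances stand in for the covariance matrices C_i); the synthetic data and the initial parameter guesses are invented for illustration:

```python
import math
import random

def gauss(x, mu, var):
    """Gaussian density N(x | mu, var)."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm(xs, alphas, mus, variances, n_iter=50):
    """EM for a 1-D Gaussian mixture, following the slide's update equations."""
    k, n = len(alphas), len(xs)
    for _ in range(n_iter):
        # E-step: w[j][i] = alpha_i f(x_j|theta_i) / sum_m alpha_m f(x_j|theta_m)
        w = []
        for x in xs:
            p = [alphas[i] * gauss(x, mus[i], variances[i]) for i in range(k)]
            s = sum(p)
            w.append([pi / s for pi in p])
        # M-step: re-estimate mixing weights, means and variances
        for i in range(k):
            wi = sum(w[j][i] for j in range(n))
            alphas[i] = wi / n
            mus[i] = sum(w[j][i] * xs[j] for j in range(n)) / wi
            variances[i] = sum(w[j][i] * (xs[j] - mus[i]) ** 2
                               for j in range(n)) / wi
    return alphas, mus, variances

# Synthetic 1-D sample: two well-separated clusters (illustrative data only)
random.seed(0)
xs = ([random.gauss(0.0, 1.0) for _ in range(200)]
      + [random.gauss(8.0, 1.0) for _ in range(200)])
alphas, mus, variances = em_gmm(xs, [0.5, 0.5], [1.0, 6.0], [1.0, 1.0])
print(sorted(mus))
```

With this well-separated data, the estimated means converge near the generating values 0 and 8; with poorly chosen initial parameters the same algorithm can settle in a different local maximum.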
Gaussian Mixture Decomposition: Model Selection
How many components?
: The maximized likelihood never decreases when k increases
: So the likelihood alone cannot be used to select k; a complexity penalty is needed
: Usually: k̂ = argmin_k C(Θ̂^(k), k),  k = k_min,..., k_max
  with C(Θ̂^(k)) = -L(x, Θ̂^(k)) + P(Θ̂^(k)),  P(·) an increasing penalty on model complexity
: Criteria in this category:
. Minimum description length (MDL), Rissanen and Ristad, 1992.
. Akaike’s information criterion (AIC), Windham and Cutler, 1992.
. Schwarz’s Bayesian inference criterion (BIC), Fraley and Raftery, 1998.
: Resampling-based techniques
. Bootstrap for clustering, Jain and Moreau, 1987.
. Bootstrap for Gaussian mixtures, McLachlan, 1987.
. Cross-validation, Smyth, 1998.
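The selection rule k̂ = argmin_k { -L + P } can be sketched with a BIC/MDL-style penalty (p/2)·log(n). The maximized log-likelihood values below are hypothetical numbers chosen only to illustrate the typical pattern: the likelihood keeps rising with k, but the gains level off:

```python
import math

def n_params(k):
    """Free parameters of a 1-D k-component GMM: k means, k variances, k-1 weights."""
    return 3 * k - 1

def select_k(logliks, n):
    """k_hat = argmin_k { -L(x, Theta_k) + (p(k)/2) log(n) }."""
    def cost(k):
        return -logliks[k] + 0.5 * n_params(k) * math.log(n)
    return min(logliks, key=cost)

# Hypothetical maximized log-likelihoods for k = 1..4 on n = 200 points
logliks = {1: -520.0, 2: -430.0, 3: -428.0, 4: -427.5}
print(select_k(logliks, 200))
```

The penalty turns the monotone likelihood curve into a criterion with a minimum at the “knee”, here k = 2.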
Gaussian Mixture Decomposition: Model Selection
Given Θ^(k), the shortest code-length for x (Shannon’s):
L(x|Θ^(k)) = -log f(x|Θ^(k))
: Total code-length (two-part code):
L(x, Θ^(k)) = -log f(x|Θ^(k)) + L(Θ^(k))    [L(Θ^(k)): parameter code-length]
: MDL criterion:
Θ̂^(k) = argmin_{Θ^(k)} { -log f(x|Θ^(k)) + L(Θ^(k)) }
: L(each component of Θ^(k)) = (1/2) log(n'),  where n' is the amount of data from which the parameter is estimated
Gaussian Mixture Decomposition: Model Selection
Classical MDL: n' = n
Θ̂^(k) = argmin_{Θ^(k)} { -log f(x|Θ^(k)) + (k(N_p+1)/2) log(n) }
Mixtures MDL (MMDL) (Figueiredo, 2002):
Θ̂^(k) = argmin_{Θ^(k)} { -log f(x|Θ^(k)) + (k(N_p+1)/2) log(n) + (N_p/2) Σ_{m=1}^k log(α_m) }
N_p is the number of parameters of each component:
. Gaussian, arbitrary covariances: N_p = d + d(d+1)/2
. Gaussian, common covariance: N_p = d
: Using EM and redefining the M-step:
α̂_i^(t+1) = [Σ_{j=1}^n w_i^(j,t) - N_p/2]_+ / Σ_{m=1}^k [Σ_{j=1}^n w_m^(j,t) - N_p/2]_+ ,  with [y]_+ = max{y, 0}
This M-step may annihilate components
M. Figueiredo and A. K. Jain, Unsupervised Learning of Finite Mixture Models, IEEE TPAMI, 2002
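The annihilating α-update can be sketched directly. The responsibility sums below are hypothetical numbers: components whose total responsibility (effective mass) falls below N_p/2 get weight zero and drop out of the mixture:

```python
def mmdl_alpha_update(w, n_p):
    """alpha_i <- [sum_j w[i][j] - Np/2]_+ / sum_m [sum_j w[m][j] - Np/2]_+ ."""
    masses = [max(0.0, sum(w_i) - n_p / 2.0) for w_i in w]
    z = sum(masses)
    return [m / z for m in masses]

# Hypothetical responsibilities w[i][j] for 3 components on 4 points.
# With Np = 2 (1-D Gaussian: mean + variance), the threshold Np/2 = 1.0.
w = [[0.9, 0.8, 0.9, 0.7],    # total mass 3.3 -> survives
     [0.1, 0.1, 0.05, 0.25],  # total mass 0.5 -> annihilated
     [0.0, 0.1, 0.05, 0.05]]  # total mass 0.2 -> annihilated
alphas = mmdl_alpha_update(w, n_p=2)
print(alphas)
```

Starting EM with a deliberately large k and letting this M-step prune weak components is how the MMDL criterion avoids fitting a separate model for every candidate k.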
Gaussian Mixture Decomposition – EM MMDL
Examples
Gaussian Mixture Decomposition
Strengths
: Model-based approach
: Good for Gaussian-distributed data
: Handles touching (overlapping) clusters
Weaknesses
: Unable to detect arbitrarily shaped clusters
: Dependent on initialization
Gaussian Mixture Decomposition
It is a local (greedy) algorithm: the likelihood never decreases across iterations, so the final solution depends on the starting point
=> Initialization dependent
[Figure: two runs from different initializations, converging after 74 and 270 iterations to different solutions]
References
A. K. Jain and R. C. Dubes, Algorithms for Clustering Data, Prentice Hall, 1988.
A. K. Jain, M. N. Murty and P. J. Flynn, “Data Clustering: A Review”, ACM Computing Surveys, vol. 31, no. 3, pp. 264-323, 1999.
J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2001.
A. Y. Ng, M. I. Jordan and Y. Weiss, “On Spectral Clustering: Analysis and an Algorithm”, in Advances in Neural Information Processing Systems 14, T. G. Dietterich, S. Becker and Z. Ghahramani (eds.), MIT Press, 2002.
M. Figueiredo and A. K. Jain, “Unsupervised Learning of Finite Mixture Models”, IEEE Trans. on Pattern Analysis and Machine Intelligence, 2002.
A. L. N. Fred and J. M. N. Leitão, “A New Cluster Isolation Criterion Based on Dissimilarity Increments”, IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 25, no. 8, 2003.