Lecture notes for Chapter 9 of Introduction to Data Mining
TRANSCRIPT
Data Mining
Cluster Analysis: Advanced Concepts and Algorithms
Lecture Notes for Chapter 9
Introduction to Data Mining
by Tan, Steinbach, Kumar
© Tan, Steinbach, Kumar, Introduction to Data Mining, 4/18/2004
Types of Clustering Algorithms
Prototype-based, density-based, and graph-based
(Added by Dr. Rafea)
Prototype-Based (Fuzzy C-Means)
http://www.cse.aucegypt.edu/~rafea/CSCE564/slides/Clustering.pdf
http://www.cse.aucegypt.edu/~rafea/CSCE564/slides/Fuzzy-C_Means%20example.pdf
(Added By Dr. Rafea)
Hard (Crisp) vs. Soft (Fuzzy) Clustering
- Soft clustering allows a point to belong to more than one cluster
- For K-means, generalize the objective function:

  SSE = \sum_{j=1}^{k} \sum_{i=1}^{m} w_{ij} \, dist(x_i, c_j)^2,  subject to \sum_{j=1}^{k} w_{ij} = 1

  where w_{ij} is the weight with which object x_i belongs to cluster c_j
- To minimize SSE, repeat the following steps:
  - Fix c_j and determine w_{ij} (cluster assignment)
  - Fix w_{ij} and recompute c_j
- Hard clustering: w_{ij} ∈ {0, 1}
Soft (Fuzzy) Clustering: Estimating Weights
Consider a point x = 2 lying between centroids c1 = 1 and c2 = 5:

  SSE(x) = w_{x1} (2 - 1)^2 + w_{x2} (2 - 5)^2 = w_{x1} + 9 w_{x2}

SSE(x) is minimized when w_{x1} = 1, w_{x2} = 0
Fuzzy C-means
Objective function:

  SSE = \sum_{j=1}^{k} \sum_{i=1}^{m} w_{ij}^p \, dist(x_i, c_j)^2,  subject to \sum_{j=1}^{k} w_{ij} = 1

- w_{ij}: weight with which object x_i belongs to cluster c_j
- p: fuzzifier (p > 1); a power for the weight, not a superscript, that controls how "fuzzy" the clustering is
- To minimize the objective function, repeat the following:
  - Fix c_j and determine w_{ij}
  - Fix w_{ij} and recompute c_j
- Fuzzy c-means clustering: w_{ij} ∈ [0, 1]

Bezdek, James C. Pattern Recognition with Fuzzy Objective Function Algorithms. Kluwer Academic Publishers, 1981.
Fuzzy C-means
With the same point x = 2, centroids c1 = 1 and c2 = 5, and p = 2:

  SSE(x) = w_{x1}^2 (2 - 1)^2 + w_{x2}^2 (2 - 5)^2 = w_{x1}^2 + 9 w_{x2}^2

SSE(x) is minimized when w_{x1} = 0.9, w_{x2} = 0.1

[Figure: plot of SSE(x) as a function of the weights]
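The minimizing weights can be checked directly: substituting w_{x2} = 1 - w_{x1} into the squared-weight SSE and setting the derivative to zero gives w_{x1} = 0.9. A quick numeric check (the point and centroid values are the ones from the slide):

```python
# Minimize SSE(x) = w1^2 * (2-1)^2 + w2^2 * (2-5)^2 subject to w1 + w2 = 1.
# Substitute w2 = 1 - w1 and set the derivative to zero:
#   d/dw1 [w1^2 + 9*(1 - w1)^2] = 2*w1 - 18*(1 - w1) = 0  =>  w1 = 18/20 = 0.9
w1 = 18 / 20
w2 = 1 - w1
sse = w1**2 * (2 - 1)**2 + w2**2 * (2 - 5)**2
print(w1, w2, sse)  # 0.9 0.1 0.9
```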
Fuzzy C-means
Objective function:

  SSE = \sum_{j=1}^{k} \sum_{i=1}^{m} w_{ij}^p \, dist(x_i, c_j)^2,  subject to \sum_{j=1}^{k} w_{ij} = 1,  p: fuzzifier (p > 1)

Initialization: choose the weights w_{ij} randomly

Repeat:
- Update centroids:  c_j = \sum_{i=1}^{m} w_{ij}^p x_i / \sum_{i=1}^{m} w_{ij}^p
- Update weights:  w_{ij} = (1 / dist(x_i, c_j)^2)^{1/(p-1)} / \sum_{q=1}^{k} (1 / dist(x_i, c_q)^2)^{1/(p-1)}
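The loop above can be sketched in NumPy. This is a minimal illustration, not the textbook's code; the function name and the small epsilon guard against zero distances are my additions:

```python
import numpy as np

def fuzzy_c_means(X, k, p=2, n_iter=100, seed=0):
    """Sketch of fuzzy c-means following the update rules above.

    X: (m, d) data matrix; k: number of clusters; p: fuzzifier (p > 1).
    Returns (centroids, weights), where weights[i, j] is the membership
    of point i in cluster j and each row of weights sums to 1.
    """
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    # Initialization: choose the weights randomly, each row summing to 1.
    w = rng.random((m, k))
    w /= w.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        wp = w ** p
        # Update centroids: c_j = sum_i w_ij^p x_i / sum_i w_ij^p
        c = (wp.T @ X) / wp.sum(axis=0)[:, None]
        # Update weights: w_ij proportional to (1 / dist(x_i, c_j)^2)^(1/(p-1))
        d2 = ((X[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # avoid division by zero at a centroid
        inv = (1.0 / d2) ** (1.0 / (p - 1))
        w = inv / inv.sum(axis=1, keepdims=True)
    return c, w
```

On two well-separated groups of 1-D points, the centroids land near the group means and each point's largest membership picks out its own group.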
Fuzzy K-means Applied to Sample Data
[Figure: points colored by their maximum membership, on a scale from 0.5 to 0.95]
An Example Application: Image Segmentation
Modified versions of fuzzy c-means have been used for image segmentation
- Especially fMRI images (functional magnetic resonance images)

References
- Gong, Maoguo, Yan Liang, Jiao Shi, Wenping Ma, and Jingjing Ma. "Fuzzy c-means clustering with local information and kernel metric for image segmentation." IEEE Transactions on Image Processing 22, no. 2 (2013): 573-584.
- Ahmed, Mohamed N., Sameh M. Yamany, Nevin Mohamed, Aly A. Farag, and Thomas Moriarty. "A modified fuzzy c-means algorithm for bias field estimation and segmentation of MRI data." IEEE Transactions on Medical Imaging 21, no. 3 (2002): 193-199.

[Figure: from left to right: original images, fuzzy c-means, EM, BCFCM]
Hard (Crisp) vs Soft (Probabilistic) Clustering
The idea is to model the set of data points as arising from a mixture of distributions
- Typically, the normal (Gaussian) distribution is used
- But other distributions have been used very profitably

Clusters are found by estimating the parameters of the statistical distributions
- Can use a k-means-like algorithm, called the Expectation-Maximization (EM) algorithm, to estimate these parameters
  - Actually, k-means is a special case of this approach
- Provides a compact representation of clusters
- The probabilities with which a point belongs to each cluster provide a functionality similar to fuzzy clustering
Probabilistic Clustering: Example
Informal example: consider modeling the points that generate the following histogram.

It looks like a combination of two normal (Gaussian) distributions.

Suppose we can estimate the mean and standard deviation of each normal distribution:
- This completely describes the two clusters
- We can compute the probabilities with which each point belongs to each cluster
- We can assign each point to the cluster (distribution) for which it is most probable
Probabilistic Clustering: EM Algorithm
Initialize the parameters
Repeat
  For each point, compute its probability under each distribution
  Using these probabilities, update the parameters of each distribution
Until there is no change

Very similar to K-means:
- Consists of assignment and update steps
- Can use random initialization
  - Problem of local minima
- For normal distributions, typically use K-means to initialize
- If using normal distributions, can find elliptical as well as spherical shapes
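The loop above can be sketched for the simplest case, a mixture of two 1-D Gaussians. This is an illustrative sketch, not the textbook's implementation; the deterministic min/max initialization and the variance floor are my assumptions:

```python
import math

def em_two_gaussians(xs, n_iter=50):
    """EM sketch for a mixture of two 1-D Gaussians, following the loop above:
    the E-step computes each point's probability under each distribution,
    the M-step re-estimates the parameters from those probabilities."""
    # Initialize parameters: means at the data extremes, unit variance, equal priors.
    mu = [min(xs), max(xs)]
    var = [1.0, 1.0]
    prior = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility resp[i][j] = p(C_j | x_i) via Bayes rule.
        resp = []
        for x in xs:
            dens = [prior[j]
                    * math.exp(-(x - mu[j]) ** 2 / (2 * var[j]))
                    / math.sqrt(2 * math.pi * var[j])
                    for j in range(2)]
            s = sum(dens)
            resp.append([d / s for d in dens])
        # M-step: update priors, means, and variances from the responsibilities.
        for j in range(2):
            nj = sum(r[j] for r in resp)
            prior[j] = nj / len(xs)
            mu[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var[j] = max(sum(r[j] * (x - mu[j]) ** 2
                             for r, x in zip(resp, xs)) / nj, 1e-6)
    return mu, var, prior
```

On data drawn from two separated groups, the estimated means converge to the two group means, mirroring the two-bump histogram example above.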
Probabilistic Clustering: Updating Centroids
Update formula for weights, assuming an estimate for the statistical parameters:

  c_j = \sum_{i=1}^{m} x_i \, p(C_j | x_i) / \sum_{i=1}^{m} p(C_j | x_i)

Very similar to the fuzzy k-means formula
- Weights are probabilities
- Weights are not raised to a power
- Probabilities are calculated using Bayes rule (x_i is a point and C_j is a cluster, i.e. a distribution):

  p(C_j | x_i) = p(x_i | C_j) p(C_j) / \sum_{l=1}^{k} p(x_i | C_l) p(C_l)

Need to assign a weight p(C_j) to each cluster
- Weights may not be equal
- Similar to prior probabilities
- Can be estimated:  p(C_j) = (1/m) \sum_{i=1}^{m} p(C_j | x_i)
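The Bayes-rule weight update works out numerically as follows; the likelihood and prior values here are made up purely for illustration:

```python
# Hypothetical likelihoods of a single point x under two clusters,
# with unequal cluster weights (priors), plugged into Bayes rule:
#   p(C_j | x) = p(x | C_j) p(C_j) / sum_l p(x | C_l) p(C_l)
lik = [0.3, 0.1]    # p(x | C_1), p(x | C_2)  (assumed values)
prior = [0.8, 0.2]  # p(C_1), p(C_2)          (assumed values)
num = [l * pr for l, pr in zip(lik, prior)]   # [0.24, 0.02]
post = [n / sum(num) for n in num]
print(post)  # [0.9230769..., 0.0769230...]
```

Note that the posteriors sum to 1, so they can be used directly as the soft cluster assignment of x.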
More Detailed EM Algorithm
Probabilistic Clustering Applied to Sample Data
[Figure: points colored by their maximum probability, on a scale from 0.5 to 0.95]
Probabilistic Clustering: Dense and Sparse Clusters
[Figure: scatter plot of a dense cluster and a sparse cluster, with a point marked "?" between them; which cluster should it belong to?]
Problems with EM
Convergence can be slow

Only guarantees finding local maxima

Makes some significant statistical assumptions

The number of parameters for the Gaussian distribution grows as O(d^2), where d is the number of dimensions
- Parameters associated with the covariance matrix
- K-means only estimates cluster means, which grow as O(d)
Useful Links to Probabilistic Clustering
https://www.youtube.com/watch?v=iQoXFmbXRJA&t=6s
https://www.youtube.com/watch?v=TG6Bh-NFhA0
Density Based (Grid Based Clustering)
Algorithm
1. Define a set of grid cells
2. Assign objects to the appropriate cells and compute the density of each cell
3. Eliminate cells having a density below a specified threshold
4. Form clusters from adjacent groups of dense cells
(Added By Dr. Rafea)
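The four steps above can be sketched for 2-D points. This is an illustrative sketch; the cell size, the density threshold of 9 (matching the example on the next slide), and the choice of 4-connectivity for "adjacent" cells are assumptions:

```python
def grid_cluster(points, cell=1.0, min_density=9):
    """Grid-based clustering sketch: bin points into square grid cells,
    keep cells with at least `min_density` points, then merge adjacent
    dense cells (4-connectivity) into clusters via flood fill."""
    # Steps 1-2: define grid cells and compute each cell's density.
    cells = {}
    for x, y in points:
        key = (int(x // cell), int(y // cell))
        cells[key] = cells.get(key, 0) + 1
    # Step 3: eliminate cells below the density threshold.
    dense = {k for k, n in cells.items() if n >= min_density}
    # Step 4: form clusters from adjacent groups of dense cells.
    clusters, seen = [], set()
    for start in dense:
        if start in seen:
            continue
        comp, stack = [], [start]
        seen.add(start)
        while stack:
            i, j = stack.pop()
            comp.append((i, j))
            for nb in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        clusters.append(comp)
    return clusters
```

A sparse group of points below the threshold simply disappears, which is exactly the "losing parts of the cluster" effect the example slide points out.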
Example
Define a set of grid cells
Assign objects to the cells and compute their densities
Discard cells having fewer than 9 objects (losing parts of the cluster)

[Figure: a grid of cell densities overlaid on the data; most cells are 0, with dense cells such as 18, 17, 13, 14, and 24 forming the clusters]
Graph-Based Clustering
Graph-based clustering uses the proximity graph
- Start with the proximity matrix
- Consider each point as a node in a graph
- Each edge between two nodes has a weight, which is the proximity between the two points
- Initially the proximity graph is fully connected
- MIN (single-link) and MAX (complete-link) can be viewed as starting with this graph

In the simplest case, clusters are connected components in the graph.
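That simplest case can be sketched directly: keep an edge between two points only when their distance is small enough, then take connected components. The threshold value is an assumption for illustration:

```python
def connected_component_clusters(dist, threshold):
    """Cluster points as the connected components of the graph that keeps
    an edge i-j whenever dist[i][j] <= threshold (the 'simplest case'
    of graph-based clustering described above).

    dist: symmetric n x n distance matrix (list of lists).
    Returns a list of integer cluster labels, one per point."""
    n = len(dist)
    labels = [-1] * n
    next_label = 0
    for s in range(n):
        if labels[s] != -1:
            continue  # already reached from an earlier component
        stack = [s]
        labels[s] = next_label
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and dist[i][j] <= threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels
```

For four points on a line at 0, 1, 10, 11 with threshold 2, the two nearby pairs form two components.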
Graph-Based Clustering: Sparsification
The amount of data that needs to be processed is drastically reduced
- Sparsification can eliminate more than 99% of the entries in a proximity matrix
- The amount of time required to cluster the data is drastically reduced
- The size of the problems that can be handled is increased
Graph-Based Clustering: Sparsification β¦
Clustering may work better
- Sparsification techniques keep the connections to the most similar (nearest) neighbors of a point while breaking the connections to less similar points.
- The nearest neighbors of a point tend to belong to the same class as the point itself.
- This reduces the impact of noise and outliers and sharpens the distinction between clusters.

Sparsification facilitates the use of graph partitioning algorithms (or algorithms based on graph partitioning)
- Chameleon and hypergraph-based clustering
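A common sparsification scheme is to keep only each point's k nearest neighbors in the proximity matrix. A minimal sketch under that assumption (the function name and the use of `None` for removed entries are mine; note the result need not be symmetric, since nearest-neighbor relations are not):

```python
def sparsify_knn(dist, k):
    """Keep only each point's k nearest neighbors (smallest distances),
    setting every other off-diagonal entry to None; a sketch of the
    sparsification step described above."""
    n = len(dist)
    sparse = [[None] * n for _ in range(n)]
    for i in range(n):
        # Indices of the k nearest neighbors of i (excluding i itself).
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: dist[i][j])
        for j in order[:k]:
            sparse[i][j] = dist[i][j]
    return sparse
```

With k much smaller than n, this removes the vast majority of matrix entries, which is where the speed and problem-size gains listed above come from.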
Sparsification in the Clustering Process