Cluster Analysis
8/20/2019 Cluster Analysis
http://slidepdf.com/reader/full/cluster-anlysis 1/30
Data Mining
Cluster Analysis
Lecture Notes 10
Probabilistic Model-Based Clustering
Probabilistic Model-Based Clustering
- Clustering methods discussed so far
  - Every data object is assigned to exactly one cluster
- Some applications may need fuzzy or soft cluster assignment
  - Ex.: an e-game could belong to both entertainment and software
- Methods:
  - fuzzy clusters
  - probabilistic model-based clusters
Fuzzy Clustering
- Given a set of objects X = {x1, …, xn}, a fuzzy set S is a subset of X that allows each object in X to have a membership degree between 0 and 1.
- A fuzzy set S can be modeled as a function F_S : X → [0, 1]
- Example: the popularity of cameras is defined as a fuzzy mapping

  pop(o) = 1, if 1000 or more units of o are sold; i/1000, if i (i < 1000) units of o are sold

  | Camera | Sales (units) |
  |--------|---------------|
  | A      | 50            |
  | B      | 1320          |
  | C      | 860           |
  | D      | 270           |

  - Then, A(0.05), B(1), C(0.86), D(0.27)
Fuzzy Clustering
- k fuzzy clusters C1, …, Ck, represented as a partition matrix M = [w_ij] (w_ij is the membership degree of object o_i in fuzzy cluster C_j)
  - for each object o_i and cluster C_j: 0 ≤ w_ij ≤ 1 (fuzzy set)
  - for each object o_i: Σ_{j=1}^{k} w_ij = 1 (equal participation in the clustering)
  - for each cluster C_j: 0 < Σ_{i=1}^{n} w_ij < n (ensures there is no empty cluster)
- For an object o_i, the SSE is

  SSE(o_i) = Σ_{j=1}^{k} w_ij^p · dist(o_i, c_j)²

  - where the parameter p controls the influence of the degrees of membership
- For a cluster C_j, the SSE is

  SSE(C_j) = Σ_{i=1}^{n} w_ij^p · dist(o_i, c_j)²

- To measure how well a clustering C fits the data:

  SSE(C) = Σ_{i=1}^{n} Σ_{j=1}^{k} w_ij^p · dist(o_i, c_j)²
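The SSE formulas above translate directly into code. A minimal sketch in Python/NumPy; `fuzzy_sse` and the toy objects are illustrative names of my choosing, not from the lecture:

```python
import numpy as np

def fuzzy_sse(X, centers, W, p=2):
    """Total SSE of a fuzzy clustering:
    SSE(C) = sum_i sum_j w_ij^p * dist(o_i, c_j)^2."""
    # Squared Euclidean distance between every object and every center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float((W ** p * d2).sum())

# Two 1-D objects and two clusters; a hard assignment is the special
# case where every row of W is a one-hot vector.
X = np.array([[0.0], [4.0]])
centers = np.array([[0.0], [4.0]])
W = np.array([[1.0, 0.0], [0.0, 1.0]])
print(fuzzy_sse(X, centers, W))  # 0.0: each object sits on its own center
```

Note how p acts here: since 0 ≤ w_ij ≤ 1, raising p shrinks the terms with small membership degrees, so they contribute less to the total error.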
Fuzzy Clustering - Example
- Example: suppose the store has six reviews

  | Review ID | Keywords                       |
  |-----------|--------------------------------|
  | R1        | digital camera, lens           |
  | R2        | digital camera                 |
  | R3        | lens                           |
  | R4        | digital camera, lens, computer |
  | R5        | computer, CPU                  |
  | R6        | computer, computer game        |

- Let the cluster features be
  - C1: digital camera and lens
  - C2: computer
- For review R_i and cluster C_j, w_ij is defined as

  w_ij = |R_i ∩ C_j| / |R_i ∩ (C1 ∪ C2)| = |R_i ∩ C_j| / |R_i ∩ {digital camera, lens, computer}|

- The partition matrix:

  M =
  | 1   | 0   |
  | 1   | 0   |
  | 1   | 0   |
  | 2/3 | 1/3 |
  | 0   | 1   |
  | 0   | 1   |

- Here review R4 belongs to clusters C1 and C2 with membership degrees 2/3 and 1/3, respectively
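The membership formula for this example is easy to reproduce with Python sets; `membership` and the variable names are mine, and the keyword strings follow the review table above:

```python
def membership(review, cluster, vocab):
    """w_ij = |R_i ∩ C_j| / |R_i ∩ (C1 ∪ C2)|, per the slide's definition."""
    overlap = len(review & vocab)
    return len(review & cluster) / overlap if overlap else 0.0

C1 = {"digital camera", "lens"}
C2 = {"computer"}
vocab = C1 | C2  # {digital camera, lens, computer}

R4 = {"digital camera", "lens", "computer"}
print(membership(R4, C1, vocab), membership(R4, C2, vocab))
# R4 overlaps both clusters: 2/3 and 1/3, matching row 4 of the partition matrix
```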
Model-Based Clustering
- A set C of k probabilistic clusters C1, …, Ck with probability density functions f1, …, fk, respectively, and their probabilities ω1, …, ωk
- Probability of an object o generated by cluster C_j:

  P(o | C_j) = ω_j f_j(o)

- Probability of o generated by the set of clusters C:

  P(o | C) = Σ_{j=1}^{k} ω_j f_j(o)

- Since objects are assumed to be generated independently, for a data set D = {o1, …, on} we have

  P(D | C) = Π_{i=1}^{n} P(o_i | C) = Π_{i=1}^{n} Σ_{j=1}^{k} ω_j f_j(o_i)
Model-Based Clustering
- Task: find a set C of k probabilistic clusters such that P(D|C) is maximized
- However, maximizing P(D|C) is often intractable, since the probability density function of a cluster can take an arbitrarily complicated form
- To make it computationally feasible (as a compromise), assume the probability density functions are parameterized distributions
- Given O = {o1, …, on} (the n observed objects), Θ = {θ1, …, θk} (the parameters of the k distributions), and P_j(o_i | θ_j), the probability that o_i is generated from the j-th distribution with parameter θ_j, we have

  P(o_i | Θ) = Σ_{j=1}^{k} ω_j P_j(o_i | θ_j),

  P(O | Θ) = Π_{i=1}^{n} Σ_{j=1}^{k} ω_j P_j(o_i | θ_j)
Univariate Gaussian Mixture Model
- Assume the probability density function of each cluster follows a 1-d Gaussian distribution, and suppose that there are k clusters
- The probability density function of cluster j is centered at μ_j with standard deviation σ_j; with θ_j = (μ_j, σ_j), we have

  P(o_i | θ_j) = 1/(√(2π) σ_j) · e^{−(o_i − μ_j)² / (2σ_j²)}

- Assuming that each cluster has the same probability, that is ω1 = ω2 = ⋯ = ωk = 1/k, we have

  P(o_i | Θ) = (1/k) Σ_{j=1}^{k} 1/(√(2π) σ_j) · e^{−(o_i − μ_j)² / (2σ_j²)},

  P(O | Θ) = Π_{i=1}^{n} (1/k) Σ_{j=1}^{k} 1/(√(2π) σ_j) · e^{−(o_i − μ_j)² / (2σ_j²)}
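As a sanity check on the equal-weight mixture density above, here is a small sketch; the function names are mine:

```python
import math

def gauss_pdf(x, mu, sigma):
    """Univariate normal density: 1/(sqrt(2*pi)*sigma) * exp(-(x-mu)^2 / (2*sigma^2))."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def mixture_likelihood(data, params):
    """P(O|Θ) under equal cluster weights 1/k: the product over objects of
    (1/k) * sum over clusters of the component density."""
    k = len(params)
    prob = 1.0
    for o in data:
        prob *= sum(gauss_pdf(o, mu, sigma) for mu, sigma in params) / k
    return prob

# With one cluster the mixture reduces to a plain Gaussian density.
print(mixture_likelihood([0.0], [(0.0, 1.0)]))  # 1/sqrt(2*pi) ≈ 0.3989
```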
The EM (Expectation Maximization) Algorithm
- The k-means algorithm has two steps at each iteration:
  - Expectation step (E-step): given the current cluster centers, each object is assigned to the cluster whose center is closest to the object — an object is expected to belong to the closest cluster
  - Maximization step (M-step): given the cluster assignment, for each cluster the algorithm adjusts the center so that the sum of the distances from the objects assigned to this cluster to the new center is minimized
- The EM algorithm: a framework to approach maximum likelihood or maximum a posteriori estimates of parameters in statistical models
  - The E-step assigns objects to clusters according to the current fuzzy clustering or parameters of probabilistic clusters
  - The M-step finds the new clustering or parameters that minimize the sum of squared error (SSE) or maximize the expected likelihood
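The two k-means steps described above can be sketched as follows; this is a generic illustration with names of my choosing, not the lecture's code:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means: E-step = hard assignment to the closest center,
    M-step = move each center to the mean of its assigned objects."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # E-step: squared distance from every object to every center, then argmin.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # M-step: recompute each non-empty cluster's center.
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

X = np.array([[0.0], [1.0], [10.0], [11.0]])
centers, labels = kmeans(X, k=2)
print(sorted(c[0] for c in centers))  # [0.5, 10.5]
```

On this well-separated data any distinct pair of initial centers converges to the same two means, which is why the output is deterministic here.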
Example: Fuzzy Clustering Using the EM Algorithm
- Initially, let c1 = a and c2 = b
- 1st E-step: assign each object o to each cluster with a membership weight computed from its distances to the current centers c1 and c2
- 1st M-step: recalculate the centroids according to the partition matrix, minimizing the sum of squared error (SSE)
- Iterate until the cluster centers converge or the change is small enough
Computing Mixture Models with EM
- Given n objects O = {o1, …, on}, we want to mine a set of parameters Θ = {θ1, …, θk} such that P(O|Θ) is maximized, where θ_j = (μ_j, σ_j) are the mean and standard deviation of the j-th univariate Gaussian distribution
- We initially assign random values to the parameters θ_j, then iteratively conduct the E- and M-steps until convergence or until the change is sufficiently small
- At the E-step, for each object o_i, calculate the probability that o_i belongs to each distribution:

  P(θ_j | o_i, Θ) = P(o_i | θ_j) / Σ_{l=1}^{k} P(o_i | θ_l)

- At the M-step, adjust the parameters θ_j = (μ_j, σ_j) so that the expected likelihood P(O|Θ) is maximized:

  μ_j = Σ_{i=1}^{n} o_i P(θ_j | o_i, Θ) / Σ_{i=1}^{n} P(θ_j | o_i, Θ),

  σ_j = √( Σ_{i=1}^{n} P(θ_j | o_i, Θ) (o_i − μ_j)² / Σ_{i=1}^{n} P(θ_j | o_i, Θ) )
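Putting the E- and M-step formulas above together gives a compact EM loop for a 1-d, equal-weight (ω_j = 1/k) Gaussian mixture. This is a hedged sketch: `em_gmm_1d` is my name, and for reproducibility it initializes the means from evenly spaced sorted data points rather than randomly as the slides suggest:

```python
import math

def em_gmm_1d(data, k, iters=50):
    """EM for a univariate Gaussian mixture with equal weights 1/k."""
    s = sorted(data)
    # Deterministic init: spread the means over the sorted data.
    mus = [s[len(s) // 2]] if k == 1 else [s[i * (len(s) - 1) // (k - 1)] for i in range(k)]
    sigmas = [1.0] * k
    for _ in range(iters):
        # E-step: posterior P(theta_j | o_i, Theta) for every object/cluster pair.
        R = []
        for o in data:
            dens = [math.exp(-(o - mus[j]) ** 2 / (2 * sigmas[j] ** 2))
                    / (math.sqrt(2 * math.pi) * sigmas[j]) for j in range(k)]
            total = sum(dens)
            R.append([d / total for d in dens])
        # M-step: weighted mean and standard deviation per cluster.
        for j in range(k):
            wj = sum(r[j] for r in R)
            mus[j] = sum(r[j] * o for r, o in zip(R, data)) / wj
            sigmas[j] = math.sqrt(
                sum(r[j] * (o - mus[j]) ** 2 for r, o in zip(R, data)) / wj) or 1e-6
    return list(zip(mus, sigmas))

data = [0.0, 0.5, 1.0, 9.0, 9.5, 10.0]
print(sorted(mu for mu, s in em_gmm_1d(data, k=2)))  # means near 0.5 and 9.5
```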
Advantages and Disadvantages of Mixture Models
- Strengths
  - Mixture models are more general than partitioning and fuzzy clustering
  - Clusters can be characterized by a small number of parameters
  - The results may satisfy the statistical assumptions of the generative models
- Weaknesses
  - Converges to a local optimum (overcome: run multiple times with random initialization)
  - Computationally expensive if the number of distributions is large or the data set contains very few observed data points
  - Needs large data sets
  - Hard to estimate the number of clusters
Notes about slides
- The following slides are adapted from the "Introduction to Data Mining" book, which describes EM in a simpler way.
- There may be some notation differences from the previous slides.
Model-based clustering
- In order to understand our data, we will assume that there is a generative process (a model) that creates/describes the data, and we will try to find the model that best fits the data
  - Models of different complexity can be defined, but we will assume that our model is a distribution from which data points are sampled
  - Example: the data is the height of all people in Greece
- In most cases, a single distribution is not good enough to describe all data points: different parts of the data follow a different distribution
  - Example: the data is the height of all people in Greece and China
  - We need a mixture model
  - Different distributions correspond to different clusters in the data
Gaussian Distribution
- Example: the data is the height of all people in Greece
  - Experience has shown that this data follows a Gaussian (normal) distribution
  - Reminder — the normal distribution:

    P(x) = 1/(√(2π) σ) · e^{−(x − μ)² / (2σ²)}

  - μ = mean, σ = standard deviation
Gaussian Model
- What is a model?
  - A Gaussian distribution is fully defined by the mean μ and the standard deviation σ
  - We define our model as the pair of parameters θ = (μ, σ)
- This is a general principle: a model is defined as a vector of parameters θ
- Fitting the model: we want to find the normal distribution that best fits our data
  - Find the best values for μ and σ
  - But what does "best fit" mean?
Maximum Likelihood Estimation (MLE)
- Suppose that we have a vector X = (x1, …, xn) of values
- We want to fit a Gaussian N(μ, σ) model to the data
- Probability of observing a point x_i:

  P(x_i) = 1/(√(2π) σ) · e^{−(x_i − μ)² / (2σ²)}

- Probability of observing all points (assuming independence):

  P(X) = Π_{i=1}^{n} P(x_i)

- We want to find the parameters θ = (μ, σ) that maximize the probability P(X|θ)
Maximum Likelihood Estimation (MLE)
- Note: these are also the most likely parameters given the data
- If we have no prior information about θ or X, then maximizing P(X|θ) is the same as maximizing P(θ|X)
A Gaussian Distribution
Mixture of Gaussians
- Suppose that you have the heights of people from Greece and China. In this case the data is the result of a mixture of two Gaussians
  - One for Greek people, and one for Chinese people
  - Identifying, for each value, which Gaussian is most likely to have generated it will give us a clustering
Mixture model
- A value x_i is generated according to the following process:
  - First select the nationality
    - With probability π_G select Greek, with probability π_C select Chinese (π_G + π_C = 1)
  - Given the nationality, generate the point from the corresponding Gaussian
    - P(x_i | θ_G) ~ N(μ_G, σ_G) if Greek
    - P(x_i | θ_C) ~ N(μ_C, σ_C) if Chinese
Mixture Model
- Our model has the following parameters: Θ = (π_G, π_C, μ_G, μ_C, σ_G, σ_C), where π_G, π_C are the mixture probabilities and μ_G, μ_C, σ_G, σ_C are the distribution parameters
- For a value x_i, we have:

  P(x_i | Θ) = π_G P(x_i | θ_G) + π_C P(x_i | θ_C)

- For all values X = (x1, …, xn):

  P(X | Θ) = Π_{i=1}^{n} P(x_i | Θ)

- We want to estimate the parameters that maximize the likelihood of the data
Mixture Models
- Once we have the parameters Θ = (π_G, π_C, μ_G, μ_C, σ_G, σ_C), we can estimate the membership probabilities P(G|x_i) and P(C|x_i) for each point x_i:

  P(G | x_i) = π_G P(x_i | θ_G) / (π_G P(x_i | θ_G) + π_C P(x_i | θ_C))

  - This is the probability that point x_i belongs to the Greek or the Chinese population (cluster)
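The membership probabilities follow from Bayes' rule over the two components. A sketch with assumed names and toy parameter values (not from the slides):

```python
import math

def gauss_pdf(x, mu, sigma):
    """Univariate normal density."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def memberships(x, pi_g, theta_g, pi_c, theta_c):
    """P(G|x) and P(C|x): each component's prior-weighted density,
    normalized so the two posteriors sum to 1."""
    pg = pi_g * gauss_pdf(x, *theta_g)
    pc = pi_c * gauss_pdf(x, *theta_c)
    return pg / (pg + pc), pc / (pg + pc)

# Equal priors; toy components N(0, 1) and N(10, 1).
print(memberships(5.0, 0.5, (0.0, 1.0), 0.5, (10.0, 1.0)))
# The halfway point is equally likely under both components: (0.5, 0.5)
```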
EM (Expectation Maximization) Algorithm
- Initialize the values of the parameters in Θ to some random values
- Repeat until convergence
  - E-step: given the parameters Θ, estimate the membership probabilities P(G|x_i) and P(C|x_i)
  - M-step: compute the parameter values that (in expectation) maximize the data likelihood (the MLE estimates; π_G and π_C are the fractions of the population in G and C)
Relationship to K-means
- E-step: assignment of points to clusters
  - K-means: hard assignment; EM: soft assignment
- M-step: computation of centroids
  - K-means assumes a common fixed variance (spherical clusters)
  - EM can change the variance for different clusters or different dimensions (ellipsoid clusters)
- If the variance is fixed, then both minimize the same error function
EM Clustering
Mixture Model and K-means Clustering