K-means clustering via principal component analysis
K-means Clustering via Principal Component Analysis. According to the paper by Chris Ding and Xiaofeng He from Int’l Conf. Machine Learning, Banff, Canada, 2004.
1
K-means Clustering via Principal Component Analysis
According to the paper by Chris Ding and Xiaofeng He from Int’l Conf.
Machine Learning, Banff, Canada, 2004
2
Traditional K-means Clustering
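Minimizing the sum of squared errors
$$J_K = \sum_{k=1}^{K} \sum_{i \in C_k} (\mathbf{x}_i - \mathbf{m}_k)^2$$
Where data matrix
$$X = (\mathbf{x}_1, \dots, \mathbf{x}_n), \qquad \mathbf{x}_i = (x_1, \dots, x_d)$$
Centroid of cluster $C_k$
$$\mathbf{m}_k = \frac{1}{n_k} \sum_{i \in C_k} \mathbf{x}_i$$
$n_k$ is the number of points in $C_k$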
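The objective above can be evaluated directly for a given assignment of points to clusters. The following is a minimal sketch, assuming NumPy and a data matrix stored as d × n (one point per column, as in the slide); the function name `kmeans_objective` and the toy data are hypothetical.

```python
import numpy as np

def kmeans_objective(X, labels):
    """Sum of squared errors J_K; X is d x n with one data point per column."""
    J = 0.0
    for k in np.unique(labels):
        C_k = X[:, labels == k]                # points assigned to cluster k
        m_k = C_k.mean(axis=1, keepdims=True)  # centroid: (1/n_k) * sum of the points
        J += np.sum((C_k - m_k) ** 2)          # squared distances to the centroid
    return float(J)

# Hypothetical toy data: four points in 2-D forming two obvious groups.
X = np.array([[0.0, 2.0, 10.0, 12.0],
              [0.0, 1.0,  0.0,  1.0]])
labels = np.array([0, 0, 1, 1])
J = kmeans_objective(X, labels)
print(J)
```

In this toy case each cluster has its centroid midway between its two points, so each cluster contributes 2.5 and J is 5.0.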
3
Principal Component Analysis (PCA)
Centered data matrix
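$$Y = (\mathbf{y}_1, \dots, \mathbf{y}_n), \qquad \mathbf{y}_i = \mathbf{x}_i - \bar{\mathbf{x}}, \qquad \bar{\mathbf{x}} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i$$
Covariance matrix
$$\frac{1}{n-1} \sum_{i=1}^{n} (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^T = \frac{1}{n-1} Y Y^T$$
Factor $\frac{1}{n-1}$ is ignored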
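As an illustration of the centering step, the sketch below builds Y from X and compares the scaled product Y Yᵀ with NumPy's own covariance estimate. It assumes the d × n column layout used on the slides; `center_columns` and the sample values are hypothetical.

```python
import numpy as np

def center_columns(X):
    """Subtract the mean point x_bar from every column of X (d x n)."""
    x_bar = X.mean(axis=1, keepdims=True)
    return X - x_bar

# Hypothetical sample: n = 4 points in d = 2 dimensions.
X = np.array([[1.0, 2.0, 3.0, 6.0],
              [2.0, 0.0, 4.0, 2.0]])
n = X.shape[1]

Y = center_columns(X)
S = Y @ Y.T            # d x d matrix Y Y^T, the covariance without its 1/(n-1) factor
cov = S / (n - 1)      # putting the factor back gives the usual sample covariance

print(np.allclose(cov, np.cov(X)))  # np.cov treats rows as variables by default
```

Dropping the 1/(n-1) factor rescales the eigenvalues but leaves the eigenvectors unchanged, which is presumably why the slides ignore it.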
4
PCA - continuation
Eigenvalues and eigenvectors
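$$Y Y^T \mathbf{u}_k = \lambda_k \mathbf{u}_k, \qquad Y^T Y \mathbf{v}_k = \lambda_k \mathbf{v}_k, \qquad \mathbf{v}_k = Y^T \mathbf{u}_k / \lambda_k^{1/2}$$
Singular value decomposition (SVD)
$$Y = \sum_k \lambda_k^{1/2} \mathbf{u}_k \mathbf{v}_k^T$$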
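These relations can be checked numerically. The sketch below is only an illustration, assuming NumPy's `np.linalg.svd` and random test data; the squared singular values play the role of the eigenvalues λ_k.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 8))              # hypothetical data: d = 3, n = 8
Y = X - X.mean(axis=1, keepdims=True)    # centered data matrix

# Thin SVD: Y = U diag(s) Vt, with s = lambda_k^{1/2}
U, s, Vt = np.linalg.svd(Y, full_matrices=False)
lam = s ** 2
V = Vt.T                                 # columns are the vectors v_k

eig_u_ok = np.allclose(Y @ Y.T @ U, U * lam)   # Y Y^T u_k = lambda_k u_k
eig_v_ok = np.allclose(Y.T @ Y @ V, V * lam)   # Y^T Y v_k = lambda_k v_k
link_ok = np.allclose(V, (Y.T @ U) / s)        # v_k = Y^T u_k / lambda_k^{1/2}
recon_ok = np.allclose(Y, (U * s) @ Vt)        # Y = sum_k lambda_k^{1/2} u_k v_k^T

print(eig_u_ok, eig_v_ok, link_ok, recon_ok)
```

Using the SVD rather than two separate eigendecompositions keeps the signs of u_k and v_k consistent with each other.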
5
PCA - example
6
K-means → PCA
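Indicator vectors
$$\mathbf{h}_k = (0, \dots, 0, \underbrace{1, \dots, 1}_{n_k}, 0, \dots, 0)^T / n_k^{1/2}, \qquad H_K = (\mathbf{h}_1, \dots, \mathbf{h}_K)$$
Criterion
$$J_K = \mathrm{Tr}(X^T X) - \mathrm{Tr}(H_K^T X^T X H_K)$$
Linear transform by K × K orthonormal matrix T
$$Q_K = (\mathbf{q}_1, \dots, \mathbf{q}_K) = H_K T$$
Last column of T
$$\mathbf{t}_K = (\sqrt{n_1/n}, \dots, \sqrt{n_K/n})^T$$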
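The trace form of the criterion can be compared with the direct sum of squared errors on a small example. This is a sketch under the assumption that labels are integers 0..K-1 and no cluster is empty; `indicator_matrix` and the toy data are hypothetical names and values.

```python
import numpy as np

def indicator_matrix(labels, K):
    """H_K (n x K): column k is 1/sqrt(n_k) on members of cluster k, 0 elsewhere.
    Assumes every cluster 0..K-1 has at least one member."""
    H = np.zeros((labels.size, K))
    for k in range(K):
        members = labels == k
        H[members, k] = 1.0 / np.sqrt(members.sum())
    return H

X = np.array([[0.0, 2.0, 10.0, 12.0],
              [0.0, 1.0,  0.0,  1.0]])   # d x n toy data
labels = np.array([0, 0, 1, 1])
H = indicator_matrix(labels, K=2)

# Trace form: J_K = Tr(X^T X) - Tr(H^T X^T X H)
G = X.T @ X
J_trace = np.trace(G) - np.trace(H.T @ G @ H)

# Direct form: squared distances of each point to its cluster centroid
J_direct = sum(
    np.sum((X[:, labels == k] - X[:, labels == k].mean(axis=1, keepdims=True)) ** 2)
    for k in range(2)
)

# Applying t_K = (sqrt(n_1/n), ..., sqrt(n_K/n))^T to H gives a constant vector
t_K = np.sqrt(np.bincount(labels) / labels.size)
q_K = H @ t_K

print(J_trace, J_direct, q_K)
```

The columns of H are orthonormal, and H applied to t_K has every entry equal to 1/√n, so this direction carries no cluster information.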
7
K-means → PCA - continuation
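Therefore
$$\mathbf{q}_K = \sqrt{\frac{n_1}{n}}\,\mathbf{h}_1 + \dots + \sqrt{\frac{n_K}{n}}\,\mathbf{h}_K = \sqrt{\frac{1}{n}}\,\mathbf{e}$$
Criterion
$$J_K = \mathrm{Tr}(Y^T Y) - \mathrm{Tr}(Q_{K-1}^T Y^T Y Q_{K-1})$$
Optimization becomes
$$\max_{Q_{K-1}} \mathrm{Tr}(Q_{K-1}^T Y^T Y Q_{K-1})$$
Solution is first K-1 principal components
$$Q_{K-1} = (\mathbf{v}_1, \dots, \mathbf{v}_{K-1})$$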
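A short sketch of this relaxed solution follows, assuming NumPy and the same d × n layout. Because the maximum of the trace equals the sum of the K-1 largest eigenvalues of YᵀY, Tr(YᵀY) minus that sum is a lower bound on J_K. For K = 2 the Ding and He paper (not the slide itself) reads the two clusters off the sign of v_1, which the example also shows. The function names and data are hypothetical.

```python
import numpy as np

def first_components(X, K):
    """Relaxed solution Q_{K-1} = (v_1, ..., v_{K-1}) as an n x (K-1) matrix."""
    Y = X - X.mean(axis=1, keepdims=True)
    _, _, Vt = np.linalg.svd(Y, full_matrices=False)
    return Vt[:K - 1].T

def relaxed_lower_bound(X, K):
    """Tr(Y^T Y) minus the K-1 largest eigenvalues of Y^T Y."""
    Y = X - X.mean(axis=1, keepdims=True)
    lam = np.linalg.svd(Y, compute_uv=False) ** 2   # eigenvalues, descending
    return float(lam.sum() - lam[:K - 1].sum())

X = np.array([[0.0, 2.0, 10.0, 12.0],
              [0.0, 1.0,  0.0,  1.0]])   # d x n toy data

Q = first_components(X, K=2)
split = Q[:, 0] > 0                      # K = 2: sign of v_1 separates the two groups
bound = relaxed_lower_bound(X, K=2)

print(split, bound)
```

The sign of a singular vector is arbitrary, so which group comes out as `True` may differ between runs or libraries; only the partition is meaningful. For this data the best two-cluster assignment has J = 5.0, which stays above the bound.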
8
PCA → K-means
Clustering by PCA
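$$C = \sum_{k=1}^{K} \mathbf{h}_k \mathbf{h}_k^T = \sum_{k=1}^{K} \mathbf{q}_k \mathbf{q}_k^T \approx \mathbf{e}\mathbf{e}^T / n + \sum_{k=1}^{K-1} \mathbf{v}_k \mathbf{v}_k^T$$
Probability of connectivity between i and j
$$p_{ij} = \frac{c_{ij}}{c_{ii}^{1/2} c_{jj}^{1/2}}$$
$$p_{ij} = \begin{cases} 1 & \text{if } p_{ij} \ge \beta \\ 0 & \text{if } p_{ij} < \beta \end{cases} \qquad 0 < \beta < 1, \text{ usually } \beta = 0.5$$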
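The connectivity construction can be sketched as follows. This assumes C is built as eeᵀ/n plus the outer products of the first K-1 principal components, that the threshold is a parameter here called `beta` with default 0.5, and that entries at or above the threshold are set to 1; the function name `pca_connectivity` is hypothetical.

```python
import numpy as np

def pca_connectivity(X, K, beta=0.5):
    """Thresholded connectivity matrix (n x n) from the first K-1 principal components."""
    n = X.shape[1]
    Y = X - X.mean(axis=1, keepdims=True)            # centered data, d x n
    _, _, Vt = np.linalg.svd(Y, full_matrices=False)
    V = Vt[:K - 1].T                                  # n x (K-1), columns v_k
    C = np.ones((n, n)) / n + V @ V.T                 # e e^T / n + sum_k v_k v_k^T
    d = np.sqrt(np.diag(C))                           # c_ii^{1/2}, always positive
    P = C / np.outer(d, d)                            # p_ij = c_ij / (c_ii c_jj)^{1/2}
    return (P >= beta).astype(int)                    # 1 if connected, else 0

X = np.array([[0.0, 2.0, 10.0, 12.0],
              [0.0, 1.0,  0.0,  1.0]])   # d x n toy data
P = pca_connectivity(X, K=2)
print(P)
```

For this toy data the result is block diagonal: points 1 and 2 are connected to each other, points 3 and 4 are connected to each other, and there are no connections across the two groups.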
17
Eigenvalues
• Case 1: 164030, 58, 5
• Case 2: 212920, 1892, 157
20
Thank you for your attention