Enhanced Visual Analysis for Cluster Tendency Assessment and Data Partitioning
Liang Wang, Xin Geng, James Bezdek, Christopher Leckie, and Kotagiri Ramamohanarao
Presented by Wen-Chung Liao, 2010/12/08
Intelligent Database Systems Lab, National Yunlin University of Science and Technology (國立雲林科技大學)

Outline
• VAT
• Motivation
• Objectives
• Methodology
─ SpecVAT
─ A-SpecVAT
─ P-SpecVAT
─ E-SpecVAT
• Experimental results
• Conclusions
• Comments

VAT (Visual Assessment of cluster Tendency)
• Display the dissimilarity matrix D as an image I(D), then reorder the rows and columns of D to obtain $\tilde{D}$ and its image $I(\tilde{D})$.
• VAT: find a permutation P so that the reordered matrix $\tilde{D}$ is as close to block-diagonal form as possible.
1. Only D is required as the input.
2. Matrix reordering produces neither a partition nor a hierarchy of clusters.
[Figure: D → $\tilde{D}$ → $I(\tilde{D})$]
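
The slide does not spell out how P is found. As a minimal sketch, assuming the commonly described VAT ordering (start at one end of the largest dissimilarity, then repeatedly append the unselected object closest to the already selected set, as in Prim's algorithm), the reordering could look like this. The function names `vat_order` and `vat` are illustrative, not taken from the slides.

```python
import numpy as np

def vat_order(D):
    """Return a VAT-style ordering of the objects (assumed Prim-like rule)."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Start from one endpoint of the largest pairwise dissimilarity.
    i, _ = np.unravel_index(np.argmax(D), D.shape)
    order = [int(i)]
    remaining = np.ones(n, dtype=bool)
    remaining[i] = False
    # dmin[j] = smallest dissimilarity from object j to the selected set.
    dmin = D[i].copy()
    for _ in range(n - 1):
        candidates = np.where(remaining)[0]
        j = candidates[np.argmin(dmin[candidates])]
        order.append(int(j))
        remaining[j] = False
        dmin = np.minimum(dmin, D[j])
    return np.array(order)

def vat(D):
    """Return the reordered matrix and the ordering used to build it."""
    D = np.asarray(D, dtype=float)
    p = vat_order(D)
    return D[np.ix_(p, p)], p
```

Displaying the returned matrix as a grayscale image gives the reordered dissimilarity image; compact clusters should then appear as dark blocks along the diagonal.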

Motivation
• Reordered dissimilarity images (RDIs) are only effective for compact, well-separated clusters.
• However, many practical applications involve data sets with highly complex structure.

Objectives
• Propose a new approach to generating RDIs that combines VAT with spectral analysis of pairwise data.
• Spectral VAT (SpecVAT)
─ The images clearly show the number of clusters c and the approximate size of each cluster, even for data sets with highly irregular cluster structures.
─ The cluster structure in the data can be reliably estimated by visual inspection.
─ A-SpecVAT: automated determination of the number of clusters c.
─ P-SpecVAT: partition the data into c groups.
─ E-SpecVAT: handle large data sets in a “sampling plus extension” manner.
[Figure: VAT image vs. SpecVAT image]

SPECTRAL VAT
[Diagram]
─ VAT: D → VAT → $\tilde{D}$ → $I(\tilde{D})$
─ SpecVAT: D → spectral mapping → D' → VAT → $\tilde{D}'$ → $I(\tilde{D}')$
─ Spectral mapping: D → W → L' → V → V' → D'
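
SPECTRAL VAT (continued)
• D → W: W is the n × n affinity matrix.
• W → L': L' is the normalized Laplacian matrix; M is diagonal, with $m_{ii} = \sum_{j=1}^{n} w_{ij}$.
• L' → V
• V → V'
• V' → D': $d'_{ij} = d(u_i, u_j)$, where $u_i$ is the i-th row of V'.
• Complexity annotations: O(n^3) for D → W → L' → V → V' → D', and O(K n^2).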
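
The transcript keeps the chain D → W → L' → V → V' → D' but not the formulas behind each arrow. The sketch below fills them in with explicit assumptions: a Gaussian affinity with a single bandwidth `sigma`, the normalization L' = M^{-1/2} W M^{-1/2}, V as the k eigenvectors with the largest eigenvalues, V' as V with unit-length rows, and Euclidean distance for d. These choices follow the usual spectral clustering recipe and may differ from the authors' exact definitions. The function name `spectral_mapping` is illustrative.

```python
import numpy as np

def spectral_mapping(D, k, sigma=None):
    """Map a dissimilarity matrix D to D' through a k-dimensional spectral embedding."""
    D = np.asarray(D, dtype=float)
    if sigma is None:
        # Assumption: use the median positive dissimilarity as the bandwidth.
        sigma = np.median(D[D > 0])
    # D -> W: Gaussian affinity (assumed form), zero self-affinity.
    W = np.exp(-(D ** 2) / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # W -> L': m_ii = sum_j w_ij, L' = M^{-1/2} W M^{-1/2} (assumed form).
    m = np.maximum(W.sum(axis=1), 1e-12)
    inv_sqrt = 1.0 / np.sqrt(m)
    L = inv_sqrt[:, None] * W * inv_sqrt[None, :]
    # L' -> V: eigenvectors of the k largest eigenvalues (eigh sorts ascending).
    _, vecs = np.linalg.eigh(L)
    V = vecs[:, -k:]
    # V -> V': normalize each row to unit length.
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    Vn = V / np.maximum(norms, 1e-12)
    # V' -> D': pairwise Euclidean distances between rows u_i of V'.
    diff = Vn[:, None, :] - Vn[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))
```

The eigendecomposition dominates the cost, which matches the O(n^3) annotation on the slide.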

AUTOMATIC CLUSTER TENDENCY ASSESSMENT
Find a “best” SpecVAT image in terms of “clarity” and “block structure.”

AUTOMATIC CLUSTER TENDENCY ASSESSMENT (continued)
• Split the gray levels of the image into two classes at a threshold T:
─ C1: diagonal dark blocks (“within-cluster blocks”), levels [1, ..., T]
─ C2: non-dark blocks (“between-cluster blocks”), levels [T+1, ..., L]
• Measures for evaluating the class separability (clarity) are built from:
─ $\sigma_W^2$: the within-class variance
─ $\sigma_B^2$: the between-class variance
─ $\sigma_T^2$: the total variance of levels
• ξ is the simplest measure to use for obtaining an optimal threshold T*.
• A-SpecVAT: select the best SpecVAT image and determine the number of clusters c from it.
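
The formula for ξ did not survive in the transcript. A minimal sketch, assuming ξ is the ratio of between-class variance to total variance used in Otsu-style thresholding, $\xi(T) = \sigma_B^2(T) / \sigma_T^2$, with T* the threshold that maximizes it. The function name `otsu_separability` and the default of 256 gray levels are illustrative choices.

```python
import numpy as np

def otsu_separability(image, levels=256):
    """Return (T*, xi) for a grayscale image, assuming xi = sigma_B^2 / sigma_T^2."""
    x = np.asarray(image, dtype=float).ravel()
    lo, hi = x.min(), x.max()
    if hi == lo:
        return 0, 0.0  # a constant image has no class separability
    # Quantize intensities to integer gray levels 0 .. levels-1.
    g = np.round((x - lo) / (hi - lo) * (levels - 1)).astype(int)
    p = np.bincount(g, minlength=levels) / g.size
    idx = np.arange(levels)
    omega = np.cumsum(p)        # probability of class C1 for each threshold
    mu = np.cumsum(p * idx)     # cumulative first moment up to each threshold
    mu_T = mu[-1]               # overall mean level
    sigma_T2 = np.sum(p * (idx - mu_T) ** 2)
    denom = omega * (1.0 - omega)
    valid = denom > 1e-12
    sigma_B2 = np.zeros(levels)
    sigma_B2[valid] = (mu_T * omega[valid] - mu[valid]) ** 2 / denom[valid]
    T = int(np.argmax(sigma_B2))
    return T, float(sigma_B2[T] / sigma_T2)
```

Under this assumption, an image made of crisp dark blocks on a light background scores close to 1, while a blurry image scores lower, so comparing the score across the SpecVAT images for different k gives one way to pick the "best" image.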
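
VISUAL DATA PARTITIONING
• A c-partition matrix U for O (a data set): row i contains ones in the $n_i$ consecutive columns of cluster i and zeros elsewhere.
$$U = \begin{bmatrix} 1 \cdots 1 & & & \\ & 1 \cdots 1 & & \\ & & \ddots & \\ & & & 1 \cdots 1 \end{bmatrix}, \quad \text{block widths } n_1, n_2, n_3, \ldots, n_c$$
• $U = \{n_1 : n_2 : \cdots : n_c\}$, with $n_1 + n_2 + \cdots + n_c = n$
• What makes a good candidate partition U? A large contrast between the dark blocks along the main diagonal and the pixels adjacent to them.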
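
To make the notation concrete, here is a small sketch that builds the c × n partition matrix from the block sizes n_1 : n_2 : ... : n_c. It assumes the objects are already in reordered (VAT) order, so each cluster occupies consecutive columns. The function name `partition_matrix` is illustrative.

```python
import numpy as np

def partition_matrix(sizes):
    """Build a c x n 0/1 matrix whose row i marks the n_i consecutive members of cluster i."""
    sizes = list(sizes)
    n = sum(sizes)
    U = np.zeros((len(sizes), n), dtype=int)
    start = 0
    for i, n_i in enumerate(sizes):
        U[i, start:start + n_i] = 1
        start += n_i
    return U
```

For example, `partition_matrix([4, 2, 3])` describes n = 9 objects split into three consecutive groups; every column sums to 1 because each object belongs to exactly one cluster.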

VISUAL DATA PARTITIONING (continued)
• Let U be a candidate partition.
─ $E_w$: mean dissimilarity within dark regions
─ $E_b$: mean dissimilarity between dark and non-dark regions
• The best candidate partition is searched for by a genetic algorithm (GA).
[Figure: P-SpecVAT image with the $E_w$ and $E_b$ regions marked]
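
The objective that combines E_w and E_b is not recoverable from the transcript, and the slide says the search is done by a GA. The sketch below is therefore only an illustration under two assumptions: the score to maximize is E_b − E_w, and E_b is taken as the mean over all off-diagonal-block entries. It replaces the GA with exhaustive search over cut positions, which is feasible only for small n and c. The names `block_contrast` and `best_partition` are illustrative.

```python
import numpy as np
from itertools import combinations

def block_contrast(D_tilde, sizes):
    """Return (E_w, E_b) for a reordered matrix and consecutive block sizes."""
    D_tilde = np.asarray(D_tilde, dtype=float)
    n = D_tilde.shape[0]
    assert sum(sizes) == n, "block sizes must sum to n"
    labels = np.repeat(np.arange(len(sizes)), sizes)
    same = labels[:, None] == labels[None, :]
    within = same & ~np.eye(n, dtype=bool)   # diagonal blocks, excluding self-pairs
    E_w = D_tilde[within].mean() if within.any() else 0.0
    E_b = D_tilde[~same].mean() if (~same).any() else 0.0
    return E_w, E_b

def best_partition(D_tilde, c):
    """Exhaustively pick the c consecutive blocks maximizing E_b - E_w (assumed score)."""
    n = np.asarray(D_tilde).shape[0]
    best_score, best_sizes = None, None
    for cuts in combinations(range(1, n), c - 1):
        bounds = (0,) + cuts + (n,)
        sizes = [b - a for a, b in zip(bounds[:-1], bounds[1:])]
        E_w, E_b = block_contrast(D_tilde, sizes)
        score = E_b - E_w
        if best_score is None or score > best_score:
            best_score, best_sizes = score, sizes
    return best_sizes
```

The number of candidate cut sets grows combinatorially with n and c, which is presumably why a heuristic search such as a GA is used in practice.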
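
DEALING WITH LARGE DATA SETS (E-SpecVAT)
• The spectral mapping D → W → L' → V → V' → D' costs O(n^3), so it is applied to a sample rather than to the full data.
• Sample m (<< n) rows from D (n × n), giving $D_S$ (m × m) and $D_B$ (m × (n − m)).
• [Diagram, only partly recoverable: S (m × m) and B (m × (n − m)) are blocks of W; $U_S$ holds the eigenvectors of S; $\tilde{U}$ holds the eigenvectors of $\tilde{W}$; V (n × k) is the first k columns of $\tilde{U}$.]
• SpecVAT on the sample: O(m^3), with VAT producing $I(\tilde{D}_S)$.
• A-SpecVAT: determine the number of clusters c, O(K n^2).
• P-SpecVAT, then out-of-sample extension (kNN).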
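
The "sampling plus extension" step can be illustrated with a short sketch of the kNN out-of-sample extension: once the m sampled objects have cluster labels, every object is assigned the majority label among its k nearest sampled objects. The input layout (an m × n matrix of dissimilarities from sampled objects to all objects), the default k = 3, and the function name `knn_extend` are assumptions made for this example.

```python
import numpy as np

def knn_extend(D_sample_to_all, sample_labels, k=3):
    """Label all n objects by majority vote among their k nearest sampled objects.

    D_sample_to_all: (m, n) dissimilarities from each sampled object to every object.
    sample_labels:   (m,) non-negative integer cluster labels of the sampled objects.
    """
    D = np.asarray(D_sample_to_all, dtype=float)
    sample_labels = np.asarray(sample_labels, dtype=int)
    k = min(k, D.shape[0])
    # Indices (into the sample) of the k nearest sampled objects for each column.
    nearest = np.argsort(D, axis=0)[:k]
    votes = sample_labels[nearest]           # shape (k, n)
    return np.array([np.bincount(votes[:, j]).argmax()
                     for j in range(D.shape[1])])
```

Only the m rows of D that belong to the sample are needed, so the full n × n matrix never has to be processed by the O(n^3) spectral mapping.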

EXPERIMENTAL RESULTS
[Figures; surviving panel labels: R-2, S-8]

EXPERIMENTAL RESULTS (continued)
• n = 3,000,000 2D data points drawn from a mixture of 5 normal distributions
• High-resolution image segmentation
─ infeasible to use the full data
─ five 481 × 321 images
─ 154,401 pixels each
─ 300 samples

Conclusions
• The VAT algorithm has been improved by using spectral analysis of the proximity matrix of the data.
• Finding a direct visual validation method is an important issue for future work.

Comments
• Advantages
─ Provides thorough mathematical analysis (a good learning example).
• Shortcomings
─ …
• Applications
─ Clustering
─ Image segmentation