Enhanced Visual Analysis for Cluster Tendency Assessment and Data Partitioning

Intelligent Database Systems Lab, 國立雲林科技大學 National Yunlin University of Science and Technology. Enhanced Visual Analysis for Cluster Tendency Assessment and Data Partitioning. Liang Wang, Xin Geng, James Bezdek, Christopher Leckie, and Kotagiri Ramamohanarao. Presented by Wen-Chung Liao, 2010/12/08.



TRANSCRIPT

Page 1: Enhanced Visual Analysis for Cluster Tendency Assessment and Data Partitioning

Intelligent Database Systems Lab

國立雲林科技大學 National Yunlin University of Science and Technology

Enhanced Visual Analysis for Cluster Tendency Assessment and Data Partitioning

Liang Wang, Xin Geng, James Bezdek,

Christopher Leckie, and Kotagiri Ramamohanarao

Presented by Wen-Chung Liao, 2010/12/08

Page 2: Enhanced Visual Analysis for Cluster Tendency Assessment and Data Partitioning

N.Y.U.S.T. I. M.

Outlines

• VAT
• Motivation
• Objectives
• Methodology
─ SpecVAT
─ A-SpecVAT
─ P-SpecVAT
─ E-SpecVAT
• Experimental results
• Conclusions
• Comments

Page 3: Enhanced Visual Analysis for Cluster Tendency Assessment and Data Partitioning

VAT (Visual Assessment of cluster Tendency)

VAT:
• Reorder the rows and columns of the dissimilarity matrix $D$.
• Find a permutation $P$ so that the reordered matrix $\tilde{D}$ is as close to block diagonal form as possible.

1. Only $D$ is required as the input.
2. Matrix reordering produces neither a partition nor a hierarchy of clusters.

[Figure: the image $I(D)$ of the original matrix $D$, and the reordered matrix $\tilde{D}$]
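The reordering step on this slide can be sketched in a few lines of Python. This is a minimal sketch of the commonly described VAT ordering (start at an object involved in the largest dissimilarity, then repeatedly append the unselected object closest to the selected set). The slide itself does not give these steps, so treat the procedure and the function name `vat_order` as assumptions.

```python
import numpy as np

def vat_order(D):
    """Return a VAT-style ordering and the reordered dissimilarity matrix.

    Assumption: Prim-like ordering as usually described for VAT;
    the slide only says that rows and columns of D are reordered.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Start from the row that contains the largest dissimilarity.
    i, _ = np.unravel_index(np.argmax(D), D.shape)
    order = [int(i)]
    remaining = [k for k in range(n) if k != i]
    while remaining:
        # Distance from each remaining object to its nearest selected object.
        sub = D[np.ix_(order, remaining)]
        j = remaining[int(np.argmin(sub.min(axis=0)))]
        order.append(j)
        remaining.remove(j)
    order = np.array(order)
    # Apply the same permutation to rows and columns.
    return order, D[np.ix_(order, order)]
```

Displaying the returned matrix as a gray-level image gives the reordered dissimilarity image; dark blocks along the diagonal suggest clusters.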

Page 4: Enhanced Visual Analysis for Cluster Tendency Assessment and Data Partitioning

Motivation

Reordered dissimilarity images (RDIs)
─ only effective for compact, well-separated clusters.

However, many practical applications involve data sets with highly complex structure.

Page 5: Enhanced Visual Analysis for Cluster Tendency Assessment and Data Partitioning

Objectives

Propose a new approach to generating RDIs that combines VAT with spectral analysis of pairwise data.

Spectral VAT (SpecVAT)
─ images can clearly show the number of clusters c and the approximate size of each cluster for data sets with highly irregular cluster structures.
─ the cluster structure in the data can be reliably estimated by visual inspection.
─ A-SpecVAT: automated determination of the number of clusters c.
─ P-SpecVAT: partition the data into c groups.
─ E-SpecVAT: handle large data sets in a “sampling plus extension” manner.

[Figure: VAT image compared with SpecVAT image]

Page 6: Enhanced Visual Analysis for Cluster Tendency Assessment and Data Partitioning

SPECTRAL VAT

VAT: $D \to \tilde{D} \to I(\tilde{D})$

SpecVAT: $D \to W \to L' \to V \to V' \to D'$ (spectral mapping), followed by VAT: $D' \to \tilde{D}' \to I(\tilde{D}')$

Page 7: Enhanced Visual Analysis for Cluster Tendency Assessment and Data Partitioning

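SPECTRAL VAT

• $D \to W$: $W$ is the $n \times n$ affinity matrix.
• $M$ is diagonal, with $m_{ii} = \sum_{j=1}^{n} w_{ij}$.
• $W \to L'$: $L'$ is the normalized Laplacian matrix.
• $L' \to V \to V'$
• $V' \to D'$: $D' = [d_{ij}]$ with $d_{ij} = d(u_i, u_j)$, where $u_i$ is the $i$th row of $V'$.

Complexity annotations on the slide for the pipeline $D \to W \to L' \to V \to V' \to D'$: $O(n^3)$ and $O(Kn^2)$.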
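The spectral mapping from $D$ to $D'$ can be illustrated with the sketch below. The slide names the steps but not their formulas, so the choices here are assumptions: a Gaussian affinity with a single global scale `sigma`, $L' = M^{-1/2} W M^{-1/2}$, $V$ as the $k$ eigenvectors with the largest eigenvalues, $V'$ as $V$ with unit-length rows, and Euclidean distance for $d(u_i, u_j)$. The function name `spectral_map` is hypothetical.

```python
import numpy as np

def spectral_map(D, k, sigma=1.0):
    """Map a dissimilarity matrix D to a new dissimilarity matrix D'.

    Assumptions (not stated on the slide): Gaussian affinity,
    L' = M^{-1/2} W M^{-1/2}, top-k eigenvectors, row normalization.
    """
    D = np.asarray(D, dtype=float)
    # D -> W: n x n affinity matrix with zero diagonal.
    W = np.exp(-(D ** 2) / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # M is diagonal with m_ii = sum_j w_ij.
    m = W.sum(axis=1)
    inv_sqrt_m = 1.0 / np.sqrt(np.maximum(m, 1e-12))
    # W -> L': normalized matrix.
    L = W * inv_sqrt_m[:, None] * inv_sqrt_m[None, :]
    # L' -> V: eigh returns eigenvalues in ascending order,
    # so the last k columns belong to the k largest eigenvalues.
    _, eigvecs = np.linalg.eigh(L)
    V = eigvecs[:, -k:]
    # V -> V': scale each row to unit Euclidean length.
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    V_prime = V / np.maximum(norms, 1e-12)
    # V' -> D': pairwise Euclidean distances between rows u_i, u_j.
    diff = V_prime[:, None, :] - V_prime[None, :, :]
    return np.linalg.norm(diff, axis=2)
```

The full eigendecomposition is the cubic-cost step, which matches the $O(n^3)$ annotation on the slide. Feeding the returned matrix to VAT would then produce the SpecVAT image.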

Page 8: Enhanced Visual Analysis for Cluster Tendency Assessment and Data Partitioning

AUTOMATIC CLUSTER TENDENCY ASSESSMENT

Find a “best” SpecVAT image in terms of “clarity” and “block structure.”

Page 9: Enhanced Visual Analysis for Cluster Tendency Assessment and Data Partitioning

AUTOMATIC CLUSTER TENDENCY ASSESSMENT

• A threshold T splits the gray levels of the image into two classes:
─ C1: diagonal dark blocks, “within-cluster blocks”, levels [1, ..., T]
─ C2: non-dark blocks, “between-cluster blocks”, levels [T+1, ..., L]

• Measures for evaluating the class separability (clarity) are built from:
─ $\sigma_W^2$: the within-class variance
─ $\sigma_B^2$: the between-class variance
─ $\sigma_T^2$: the total variance of levels

• ξ is the simplest measure to obtain an optimal threshold T*

[Figure: threshold T separating classes C1 and C2]

A-SpecVAT
• Select the best SpecVAT image and determine the number of clusters c from it.
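The thresholding idea on this slide resembles Otsu-style threshold selection, sketched below. The transcript does not give the formula for ξ, so this sketch assumes $\xi = \sigma_B^2 / \sigma_T^2$ and picks the threshold that maximizes it. The function name `otsu_threshold` is hypothetical, and gray levels are indexed from 0 to L-1 here instead of 1 to L.

```python
import numpy as np

def otsu_threshold(image, levels=256):
    """Return (T*, xi(T*)) for an integer gray-level image.

    Assumption: xi = between-class variance / total variance.
    Class C1 holds levels 0..T, class C2 holds levels T+1..levels-1.
    """
    pixels = np.asarray(image, dtype=np.int64).ravel()
    hist = np.bincount(pixels, minlength=levels).astype(float)
    p = hist / hist.sum()                 # gray-level probabilities
    g = np.arange(levels)
    mu_total = (p * g).sum()              # mean level of the whole image
    var_total = (p * (g - mu_total) ** 2).sum()
    omega = np.cumsum(p)                  # probability of C1 for each T
    mu = np.cumsum(p * g)                 # first-order cumulative moment
    valid = (omega > 0) & (omega < 1)     # both classes must be non-empty
    var_between = np.zeros(levels)
    var_between[valid] = (mu_total * omega[valid] - mu[valid]) ** 2 / (
        omega[valid] * (1.0 - omega[valid])
    )
    if var_total <= 0:
        return 0, 0.0                     # constant image: no separability
    xi = var_between / var_total
    T = int(np.argmax(xi))
    return T, float(xi[T])
```

Under this assumption, a SpecVAT image with crisp dark blocks gives a value of ξ close to 1, so comparing ξ across candidate images is one way to rank their clarity.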

Page 10: Enhanced Visual Analysis for Cluster Tendency Assessment and Data Partitioning

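VISUAL DATA PARTITIONING

A c-partition matrix U for O (a data set), with one block of ones per cluster:

$$U = \begin{bmatrix} 1 \cdots 1 & & & \\ & 1 \cdots 1 & & \\ & & \ddots & \\ & & & 1 \cdots 1 \end{bmatrix}, \quad \text{block widths } n_1, n_2, n_3, \ldots, n_c$$

$U = \{n_1 : n_2 : \cdots : n_c\}$, with $n_1 + n_2 + \cdots + n_c = n$

What makes a good candidate partition U? The contrast difference between the dark blocks along the main diagonal and the pixels adjacent to them.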

Page 11: Enhanced Visual Analysis for Cluster Tendency Assessment and Data Partitioning

VISUAL DATA PARTITIONING

Let U be a candidate partition.

$E_w$: mean dissimilarity within dark regions
$E_b$: mean dissimilarity between dark and nondark regions

(by GA)

[Figure: image with the $E_w$ and $E_b$ regions marked]

P-SpecVAT
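To make $E_w$ and $E_b$ concrete, the sketch below scores one candidate partition of a reordered dissimilarity matrix. It is an illustration only: it assumes $E_b$ is averaged over every entry outside the diagonal blocks, the function name `block_contrast` is hypothetical, and neither the slide's objective function nor the GA search is reproduced.

```python
import numpy as np

def block_contrast(R, sizes):
    """Return (E_w, E_b) for a candidate partition of a reordered matrix R.

    sizes = [n_1, ..., n_c] are the diagonal block widths, summing to n.
    Assumption: E_b averages all entries outside the diagonal blocks.
    At least two blocks are needed, otherwise E_b is undefined.
    """
    R = np.asarray(R, dtype=float)
    n = R.shape[0]
    if sum(sizes) != n or len(sizes) < 2:
        raise ValueError("sizes must contain at least two widths summing to n")
    # Block label of every object in the reordered sequence.
    labels = np.repeat(np.arange(len(sizes)), sizes)
    same = labels[:, None] == labels[None, :]
    E_w = R[same].mean()    # inside the diagonal blocks (dark regions)
    E_b = R[~same].mean()   # outside the diagonal blocks
    return float(E_w), float(E_b)
```

A candidate whose blocks line up with the dark regions yields a small $E_w$ and a large $E_b$, so a search procedure could compare candidates by the gap between the two.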

Page 12: Enhanced Visual Analysis for Cluster Tendency Assessment and Data Partitioning

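DEALING WITH LARGE DATA SETS (E-SpecVAT)

• SpecVAT on the full data, $D \to W \to L' \to V \to V' \to D'$, costs $O(n^3)$.
• Sampling: take $m$ ($\ll n$) rows from $D$.
• SpecVAT on the sample costs $O(m^3)$.
• VAT then gives the sample image $I(\tilde{D}_S)$.
• A-SpecVAT: determine the number of clusters c.
• P-SpecVAT
• Out-of-sample extension (kNN)

The slide also carries an $O(Kn^2)$ annotation.

[Diagram: $D$ ($n \times n$) is split into a sample block $S$ ($m \times m$) and a block $B$; eigenvectors are computed for the sample block and used to approximate the eigenvectors of the full matrix, and the first $k$ columns form $V$.]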
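The out-of-sample extension step can be sketched as a k-nearest-neighbour vote: each object outside the sample takes the majority label of its nearest sampled objects. The slide only says "kNN", so the voting rule, the function name `knn_extend`, and the argument layout are assumptions.

```python
import numpy as np

def knn_extend(D_rest_to_sample, sample_labels, k=3):
    """Assign labels to non-sampled objects by majority vote.

    D_rest_to_sample[i, j] is the dissimilarity between non-sampled
    object i and sampled object j. sample_labels holds non-negative
    integer cluster labels for the sampled objects.
    """
    D = np.asarray(D_rest_to_sample, dtype=float)
    sample_labels = np.asarray(sample_labels, dtype=np.int64)
    n_clusters = int(sample_labels.max()) + 1
    # Indices of the k nearest sampled objects for every row.
    nearest = np.argsort(D, axis=1)[:, :k]
    votes = sample_labels[nearest]
    # Majority vote per row; ties go to the smallest label.
    return np.array(
        [np.bincount(v, minlength=n_clusters).argmax() for v in votes]
    )
```

Only the dissimilarities between the remaining objects and the sample are needed, so the full $n \times n$ matrix never has to be built.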

Page 13: Enhanced Visual Analysis for Cluster Tendency Assessment and Data Partitioning

EXPERIMENTAL RESULTS

Page 14: Enhanced Visual Analysis for Cluster Tendency Assessment and Data Partitioning


Page 15: Enhanced Visual Analysis for Cluster Tendency Assessment and Data Partitioning


R-2S-8

Page 16: Enhanced Visual Analysis for Cluster Tendency Assessment and Data Partitioning

• n = 3,000,000 2D data points
• a mixture of 5 normal distributions

• high-resolution image segmentation
• infeasible to use the full data
• five 481 × 321 images
• 154,401 pixels
• 300 samples

Page 17: Enhanced Visual Analysis for Cluster Tendency Assessment and Data Partitioning

Conclusions

The VAT algorithm has been improved by using spectral analysis of the proximity matrix of the data.

Finding a direct visual validation method will be one of the important issues in our future work.

Page 18: Enhanced Visual Analysis for Cluster Tendency Assessment and Data Partitioning

Comments

Advantages
─ Provides sound mathematical analysis (a good learning example).

Shortcomings
─ …

Applications
─ Clustering
─ Image segmentation