A new initialization method for Fuzzy C-Means using Fuzzy Subtractive Clustering
Thanh Le, Tom Altman, University of Colorado Denver
July 19, 2011

Page 1: A new initialization method for Fuzzy C-Means using Fuzzy Subtractive Clustering

A new initialization method for Fuzzy C-Means using Fuzzy Subtractive Clustering

Thanh Le, Tom Altman
University of Colorado Denver

July 19, 2011

Page 2

Overview
- Introduction
  - Data clustering: approaches and current challenges
- fzSC
  - A novel fuzzy subtractive clustering method for FCM parameter initialization
- Datasets
  - Artificial and real datasets for testing fzSC
- Experimental results
- Discussion

Page 3

Clustering problem
- Data points are clustered based on
  - Similarity
  - Dissimilarity
- Clusters are defined by
  - Number of clusters
  - Cluster boundaries and overlaps
  - Compactness within clusters
  - Separation between clusters

Page 4

Clustering approaches
- Hierarchical approach
- Partitioning approach
  - Hard clustering approach
    - Crisp cluster boundaries
    - Crisp cluster membership
  - Soft/fuzzy clustering approach
    - Soft/fuzzy membership
    - Overlapping cluster boundaries
    - Most appropriate for real-world problems

Page 5

Fuzzy C-Means algorithm

The model:

$$J(X \mid U, V) = \sum_{i=1}^{n} \sum_{k=1}^{c} u_{ki}^{m} \, \|x_i - v_k\|^2 \rightarrow \min, \quad m > 1$$

$$\sum_{k=1}^{c} u_{ki} = 1, \quad i = 1..n$$

Features:
- Fuzzy membership, soft cluster boundaries
- Each data point can belong to multiple clusters, so more relationship information is provided
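
To make the model concrete, here is a minimal pure-Python sketch that evaluates the objective J and applies the standard alternating FCM updates (centers from memberships, then memberships from centers). It is not the authors' implementation: the function names `sq_dist`, `fcm_objective`, and `fcm_step`, the choice m = 2, the `eps` guard, and the toy data are all illustrative assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def fcm_objective(X, U, V, m=2.0):
    """J(X | U, V) = sum_i sum_k u_ki^m * ||x_i - v_k||^2."""
    return sum(U[k][i] ** m * sq_dist(X[i], V[k])
               for k in range(len(V)) for i in range(len(X)))

def fcm_step(X, U, m=2.0, eps=1e-12):
    """One alternating update: centers from memberships, then memberships."""
    n, c, dim = len(X), len(U), len(X[0])
    V = []
    for k in range(c):
        w = [U[k][i] ** m for i in range(n)]   # fuzzified weights u_ki^m
        total = sum(w)
        V.append([sum(w[i] * X[i][d] for i in range(n)) / total
                  for d in range(dim)])
    new_U = [[0.0] * n for _ in range(c)]
    for i in range(n):
        # eps (an assumption) avoids division by zero when a point sits on a center
        d2 = [max(sq_dist(X[i], V[k]), eps) for k in range(c)]
        for k in range(c):
            new_U[k][i] = 1.0 / sum((d2[k] / d2[l]) ** (1.0 / (m - 1.0))
                                    for l in range(c))
    return new_U, V

# Toy data (illustrative only): two well-separated groups on a line.
X = [[0.0], [0.2], [0.4], [5.0], [5.2], [5.4]]
# Initial partition matrix, c = 2 rows, n = 6 columns; each column sums to 1.
U = [[0.6, 0.5, 0.7, 0.4, 0.3, 0.45],
     [0.4, 0.5, 0.3, 0.6, 0.7, 0.55]]
history = []
for _ in range(20):
    U, V = fcm_step(X, U)
    history.append(fcm_objective(X, U, V))
```

The initial `U` is where an initialization method plugs in: a different starting partition can lead the same iteration to a different local optimum of J.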

Page 6

Fuzzy C-Means (contd.)
- Possibility-based model
- Fuzzy sets to describe clusters
- Model parameters estimated using an iterative process
- Rapid convergence
- Challenges:
  - Determining the number of clusters
  - Initializing the partition matrix to avoid local optima

Page 7

Methods for partition matrix initialization
- Based on randomization
  - Problem: different randomization methods depend on different data distributions
- Using heuristic algorithms: Particle Swarm
  - Problem: slow convergence because of velocity adjustment
- Integrated with optimization algorithms
  - Problem: still based on other methods of partition matrix initialization

Page 8

Methods for partition matrix initialization (contd.): using Subtractive Clustering

- Mountain function (the data density), where $\sigma$ is the mountain peak radius:

$$M(x_i) = \sum_{j=1}^{n} e^{-\frac{\|x_i - x_j\|^2}{2\sigma^2}}$$

- Mountain amendment (density adjustment), where $\beta$ is the mountain radius, $x^*$ is the selected cluster candidate, and $M^*$ is its density:

$$M_t(x_j) = M_{t-1}(x_j) - M^* e^{-\frac{\|x_j - x^*\|^2}{2\beta^2}}$$

- Cluster candidate: the densest data point. Cluster center selection stops when the following holds, where $\delta$ is the threshold:

$$\frac{M^*_t}{M^*_0} < \delta$$
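
The three ingredients of traditional subtractive clustering (mountain function, mountain amendment, and the stopping threshold) can be sketched in a few lines of pure Python. This is an illustrative sketch only: the function name `subtractive_clustering`, the parameter names `sigma` (mountain peak radius), `beta` (mountain radius), and `delta` (stopping threshold), and the toy data and parameter values are assumptions chosen for the example.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def subtractive_clustering(X, sigma, beta, delta):
    """Pick cluster centers by repeatedly selecting the densest point.

    sigma: mountain peak radius, beta: mountain radius,
    delta: stop once the current peak drops below delta * first peak.
    """
    n = len(X)
    # Mountain function: density at every data point (n * n distance evaluations).
    M = [sum(math.exp(-sq_dist(X[i], X[j]) / (2.0 * sigma ** 2))
             for j in range(n))
         for i in range(n)]
    first_peak = max(M)
    centers = []
    while len(centers) < n:
        star = max(range(n), key=lambda j: M[j])   # densest remaining point
        peak = M[star]
        if peak / first_peak < delta:
            break
        centers.append(X[star])
        # Mountain amendment: subtract the influence of the selected peak.
        M = [M[j] - peak * math.exp(-sq_dist(X[j], X[star]) / (2.0 * beta ** 2))
             for j in range(n)]
    return centers

# Toy data (illustrative only): a group of 3 points and a group of 4 points.
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
     [3.0, 3.0], [3.1, 3.0], [3.0, 3.1], [3.1, 3.1]]
centers = subtractive_clustering(X, sigma=0.5, beta=0.75, delta=0.15)
```

All three parameters have to be chosen by hand here, and the initial density computation touches every pair of points.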

Page 9

Subtractive Clustering method: the problems
- Mountain peak radius?
- Remaining density to be selected?
- Mountain radius?
- Computational time: O(n^2)

[Figure: panels labeled OK and NO]

Page 10

The proposed method: fzSC for partition matrix initialization

1. Generate a random fuzzy partition
2. Compute cluster density using a histogram
3. Use the strong uniform fuzzy partition concept
4. Estimate the mountain function based on cluster density
5. Amend the mountain function:
   1. Update cluster density (step 2)
   2. Re-estimate the mountain function (step 4)
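
Only step 1 is sketched here, since the slides do not give enough detail to reproduce the histogram-based density or the mountain function estimate. The sketch shows one simple way to generate a random fuzzy partition in which every data point's memberships sum to 1. The function name `random_fuzzy_partition`, the normalization scheme, and the `seed` parameter are assumptions, not necessarily what fzSC uses.

```python
import random

def random_fuzzy_partition(c, n, seed=None):
    """Return a c x n matrix U with u_ki > 0 and every column summing to 1."""
    rng = random.Random(seed)
    U = [[0.0] * n for _ in range(c)]
    for i in range(n):
        # Small offset (an assumption) keeps every raw value strictly positive.
        raw = [rng.random() + 1e-12 for _ in range(c)]
        total = sum(raw)
        for k in range(c):
            U[k][i] = raw[k] / total   # normalize memberships of point i
    return U

# Example: 3 clusters, 5 data points.
U = random_fuzzy_partition(c=3, n=5, seed=0)
```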

Page 11

fzSC: optimal number of clusters

1. The densest data point is a cluster candidate
   - Data density is not much affected: say, less than 0.05 of the data density is removed by the mountain function amendment process
   - The number of such points is less than n
2. The mountain peak radius, mountain radius, and stopping threshold are not required
3. Computational time: O(c*n)

Page 12

Datasets
- Artificial datasets
  - Finite mixture model based datasets
  - A manually created (MC) dataset
    - Data were generated using a finite mixture model
    - Clusters were moved to have different distances among clusters
- Real datasets
  - Iris, Wine, Glass and Breast Cancer Wisconsin datasets from the UC Irvine Machine Learning Repository

Page 13

Visualization of the fzSC result on the manually created (MC) dataset

Rectangles: cluster centers of the random fuzzy partition. Circles: cluster centers found by fzSC.

Page 14

A visualization…

Stars: cluster centers of the random fuzzy partition. Circles: cluster centers found by fzSC.

The utility is available online: http://ouray.ucdenver.edu/~tnle/fzsc/

Page 15

Experimental results on the manually created dataset

The algorithm performance on the MC dataset

Algorithm    Correctness ratio by class                 Avg. ratio
             1      2      3      4      5      6
fzSC         1.00   1.00   1.00   1.00   1.00   1.00    1.00
k-means      0.97   0.87   1.00   1.00   1.00   0.75    0.93
k-medians    0.95   0.82   1.00   1.00   1.00   0.62    0.90
FCM          0.97   1.00   0.95   1.00   1.00   0.96    0.98
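
As a quick arithmetic check on the table above, the following snippet recomputes each algorithm's average ratio as the unweighted mean of its six per-class ratios. That the reported average is an unweighted mean is an assumption; the variable names are illustrative, and the numbers are copied from the table.

```python
# Per-class correctness ratios copied from the MC dataset table.
ratios = {
    "fzSC":      [1.00, 1.00, 1.00, 1.00, 1.00, 1.00],
    "k-means":   [0.97, 0.87, 1.00, 1.00, 1.00, 0.75],
    "k-medians": [0.95, 0.82, 1.00, 1.00, 1.00, 0.62],
    "FCM":       [0.97, 1.00, 0.95, 1.00, 1.00, 0.96],
}
# Average ratios as reported in the table's last column.
reported = {"fzSC": 1.00, "k-means": 0.93, "k-medians": 0.90, "FCM": 0.98}

# Unweighted mean over the six classes (assumed definition of "Avg. ratio").
averages = {name: sum(r) / len(r) for name, r in ratios.items()}

for name in ratios:
    print(f"{name:10s} computed={averages[name]:.4f} reported={reported[name]:.2f}")
```

Under that assumption the computed means agree with the reported column to two decimal places.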

Page 16

Experimental results on artificial datasets

Correctness ratio in determining cluster number

Number of clusters      Dataset dimension
generated in dataset    2      3      4      5
5                       0.97   1.00   1.00   1.00
6                       1.00   0.98   0.90   1.00
7                       1.00   1.00   1.00   1.00
8                       1.00   0.99   0.97   1.00
9                       0.87   0.99   1.00   0.96

Page 17

Experimental results on real datasets

Correctness ratio in determining cluster number

Dataset                    # data points   Known # clusters   Predicted # clusters   Ratio
Iris                       150             3                  3                      1.00
Wine                       178             3                  3                      1.00
Glass                      214             6                  6                      0.95
                                                              5                      0.05
Breast Cancer Wisconsin    699             6                  6                      0.65
                                                              5                      0.35

Page 18

Discussion: the advantages of fzSC
- Over traditional subtractive clustering
  - The mountain peak radius, mountain radius, and stopping threshold are not required
  - Computational time: O(c*n) vs. O(n^2)
- Over heuristic-based approaches
  - Rapid convergence
  - Escapes local optima
- Over probability-model-based approaches
  - Rapid convergence
  - No assumption about the data distribution

Page 19

Discussion: future work

Combine fzSC with biological cluster validation methods and optimization algorithms to develop novel clustering algorithms for gene expression data analysis.

Page 20

Thank you!

Questions?

We acknowledge the support from the Vietnamese Ministry of Education and Training, the 322 scholarship program.