Gaussian Kernel Width Exploration and Cone Cluster Labeling for Support Vector Clustering


Page 1: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering

Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering

Department of Computer Science, University of Massachusetts Lowell

Nov. 28, 2007

Sei-Hyung Lee, Karen Daniels

Page 2: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering


Outline

• Clustering Overview

• SVC Background and Related Work

• Selection of Gaussian Kernel Widths

• Cone Cluster Labeling

• Comparisons

• Contributions

• Future Work

Page 3: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering

Clustering Overview

• Clustering – discovering natural groups in data
• Clustering problems arise in
  – bioinformatics
    • patterns of gene expression
  – data mining/compression
  – pattern recognition/classification

Page 4: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering

Definition of Clustering

• Definition by Everitt (1974)
  – "A cluster is a set of entities which are alike, and entities from different clusters are not alike."
  – If we assume that the objects to be clustered are represented as points in the measurement space, then "Clusters may be described as connected regions of a multi-dimensional space containing a relatively high density of points, separated from other such regions by a region containing a relatively low density of points."

Pages 5 – 9: figure-only slides (clustering examples; no recoverable text).

Page 10: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering

Sample Clustering Taxonomy (Zaiane 1999)

• Partitioning (fixed number of clusters k)
• Hierarchical
• Density-based
• Grid-based
• Model-based: Statistical (COBWEB), Neural Network (SOM)

Hybrids are also possible.

Source: http://www.cs.ualberta.ca/~zaiane/courses/cmput690/slides/ (Chapter 8)

Page 11: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering

Strengths and Weaknesses

Partitioning
  Strengths: relatively efficient, O(ikn)
  Weaknesses: splits large clusters and merges small ones; finds only spherical shapes; sensitive to outliers (k-means); requires a choice of k; sensitive to initial selection

Hierarchical
  Strengths: does not require a choice of k
  Weaknesses: merge/split steps can never be undone; requires a termination condition; does not scale well

Density-based
  Strengths: discovers arbitrary shapes
  Weaknesses: sensitive to parameters

Grid-based
  Strengths: fast processing time
  Weaknesses: sensitive to parameters; can't find arbitrary shapes

Model-based
  Strengths: exploits the underlying data distribution
  Weaknesses: assumption is not always true; expensive to update; difficult for large data sets; slow

Page 12: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering

Comparison of Clustering Techniques

                           Scalability  Arbitrary  Handles   Order       High       Time
                                        Shape      Noise     Dependency  Dimension  Complexity
Partitional    k-means     YES          NO         NO        NO          YES        O(ikN)
               k-medoids   YES          NO         Outlier   NO          YES        O(ikN)
               CLARANS     YES          NO         Outlier   NO          NO         O(N²)
Hierarchical   BIRCH       YES          NO         ?         NO          NO         O(N)
               CURE        YES          YES        YES       NO          NO         O(N² log N)
               SVC         ?            YES        YES       NO          YES        O((N−N_bsv)·N_sv)
Density-based  DBSCAN      YES          YES        YES       NO          NO         O(N log N)
Grid-based     STING       YES          NO         ?         NO          NO         O(N)
Model-based    COBWEB      NO           ?          ?         YES         NO         ?

k = number of clusters; i = number of iterations; N = number of data points; N_sv = number of support vectors; N_bsv = number of bounded support vectors. SVC time is for a single combination of parameters.

Page 13: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering

Jain et al. Taxonomy (1999)

(Figure annotations: single link — distance between 2 clusters = minimum of distances between all inter-cluster pairs; complete link — distance between 2 clusters = maximum of distances between all inter-cluster pairs.)

Cross-cutting Issues

Agglomerative vs. Divisive

Monothetic vs. Polythetic (sequential feature consideration)

Hard vs. Fuzzy

Deterministic vs. Stochastic

Incremental vs. Non-incremental

Page 14: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering

More Recent Clustering Surveys

• Clustering Large Datasets (Mercer 2003)
  – Hybrid Methods: e.g., Distribution-Based Clustering Algorithm for Clustering Large Spatial Datasets (Xu et al. 1998)
    • Hybrid: model-based, density-based, grid-based
• Doctoral Thesis (Lee 2005)
  – Boundary-Detecting Methods:
    • AUTOCLUST (Estivill-Castro et al. 2000) – Voronoi modeling and Delaunay triangulation
    • Random Walks (Harel et al. 2001) – Delaunay triangulation modeling and k-nearest-neighbors; random walk in a weighted graph
    • Support Vector Clustering (Ben-Hur et al. 2001) – one-class Support Vector Machine + cluster labeling

Page 15: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering

Overview of SVM

• Map non-linearly separable data into a feature space where they are linearly separable.
• Class of hyperplanes: f(x) = ⟨ω, Φ(x)⟩ + b = 0

  where ω is the normal vector of a hyper-plane, b is the offset from the origin, and Φ is a non-linear mapping.

Page 16: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering

Overview of SVC

• Support Vector Clustering (SVC)
  – clustering algorithm using (one-class) SVM
  – able to handle arbitrary shaped clusters
  – able to handle outliers
  – able to handle high dimensions, but…
  – needs input parameters
    • for the kernel function that defines the inner product in feature space, e.g., the Gaussian kernel width q in K(x, y) = e^{−q‖x−y‖²} (see the kernel sketch below)
    • soft margin C to control outliers
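The kernel matrix is the only feature-space object SVC ever manipulates explicitly. As a concrete reference point, here is a minimal NumPy sketch of computing it; the function name is ours, not from the talk.

```python
import numpy as np

def gaussian_kernel(X, q):
    """Kernel matrix K[i, j] = exp(-q * ||x_i - x_j||^2).

    X : (N, d) array of data points; q : Gaussian kernel width.
    """
    sq = np.sum(X ** 2, axis=1)                      # ||x_i||^2 for each row
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared pairwise distances
    d2 = np.maximum(d2, 0.0)                         # guard tiny negatives from rounding
    return np.exp(-q * d2)
```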

Page 17: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering

Gaussian Kernel

K(x, y) = e^{−q‖x−y‖²}

(Figure: images Φ(x) lie on the unit ball in feature space; the minimal hyper-sphere has radius R and center a, with SVs on its surface and BSVs outside.)

• R: radius of the minimal hyper-sphere
• a: center of the sphere
• R(x): distance between Φ(x) and a
• BSV: data x outside of the sphere, R(x) > R; Num(BSV) is controlled by C
• SV: data x on the surface of the sphere, R(x) = R; Num(SV) is controlled by q
• Others: data x inside of the sphere, R(x) < R

SVC Main Idea

"Attract" the hyper-plane onto data points instead of "repel."
Data space contours are not explicitly available.

Page 18: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering

Find Minimal Hyper-sphere (with BSVs)

Primal problem:

  minimize R²
  subject to ‖Φ(x_j) − a‖² ≤ R² + ξ_j and ξ_j ≥ 0 for all j            (1)

Lagrangian:

  L = R² − Σ_j (R² + ξ_j − ‖Φ(x_j) − a‖²) β_j − Σ_j ξ_j μ_j + C Σ_j ξ_j   (2)

  where β_j ≥ 0 and μ_j ≥ 0 are Lagrange multipliers, C is a constant, and C Σ_j ξ_j is a penalty term for BSVs.

Setting ∂L/∂R, ∂L/∂a, and ∂L/∂ξ_j to zero gives:

  Σ_j β_j = 1                                                            (3)
  a = Σ_j β_j Φ(x_j)        (by (3))                                     (4)
  β_j = C − μ_j                                                          (5)

KKT conditions:

  ξ_j μ_j = 0
  (R² + ξ_j − ‖Φ(x_j) − a‖²) β_j = 0    (only points on the boundary contribute)

  BSV (outside sphere): ξ_j > 0, μ_j = 0, so β_j = C by (5)
  SV (on surface of sphere): ξ_j = 0, 0 < β_j < C
  Inside sphere: ξ_j = 0, β_j = 0

Wolfe dual form of (2), substituting (3) and (4) and using K(x_i, x_j) = ⟨Φ(x_i), Φ(x_j)⟩:

  maximize W = Σ_j β_j K(x_j, x_j) − Σ_{i,j} β_i β_j K(x_i, x_j)
  subject to 0 ≤ β_j ≤ C and Σ_j β_j = 1

Use β_j to classify data points.
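Since K(x_j, x_j) = 1 for the Gaussian kernel and Σ_j β_j = 1, maximizing W is equivalent to minimizing βᵀKβ over the box-constrained simplex. A hedged sketch using SciPy's SLSQP solver follows; any QP solver would do, and the function name is ours, not the authors'.

```python
import numpy as np
from scipy.optimize import minimize

def solve_lagrangian(K, C):
    """Solve the Wolfe dual: min beta^T K beta  s.t.  sum(beta) = 1, 0 <= beta_j <= C.

    For the Gaussian kernel K_jj = 1, so this minimization is equivalent
    to maximizing W = 1 - beta^T K beta.
    """
    N = K.shape[0]
    beta0 = np.full(N, 1.0 / N)   # feasible start: uniform weights
    res = minimize(
        lambda b: b @ K @ b,
        beta0,
        jac=lambda b: 2.0 * K @ b,
        bounds=[(0.0, C)] * N,
        constraints=[{"type": "eq", "fun": lambda b: b.sum() - 1.0}],
        method="SLSQP",
    )
    return res.x
```

Support vectors are then the points with 0 < β_j < C, and bounded support vectors those with β_j = C, matching the KKT cases above.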

Page 19: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering

Relationship Between Minimal Hyper-sphere and Cluster Contours

R: radius of the minimal hyper-sphere
a: center of the sphere
R(x): distance between Φ(x) and a

  R²(x) = ‖Φ(x) − a‖²
        = K(x, x) − 2 Σ_j β_j K(x_j, x) + Σ_{i,j} β_i β_j K(x_i, x_j)
        = 1 − 2 Σ_j β_j K(x_j, x) + Σ_{i,j} β_i β_j K(x_i, x_j)

Contours: {x | R(x) = R}, the points whose images lie on the surface of the minimal sphere.

Challenge: contour boundaries are not explicitly available.

Number of clusters increases with increasing q.
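The last line above translates directly into a few lines of NumPy. A sketch, reusing gaussian_kernel and the β returned by solve_lagrangian from the earlier sketches:

```python
import numpy as np

def r_squared(x, X, beta, q):
    """Contour function R^2(x) = 1 - 2 sum_j beta_j K(x_j, x) + beta^T K beta."""
    kx = np.exp(-q * ((X - x) ** 2).sum(axis=1))   # K(x_j, x) for every x_j
    K = gaussian_kernel(X, q)                      # from the Page 16 sketch
    return 1.0 - 2.0 * beta @ kx + beta @ K @ beta
```

The sphere radius R itself is R(v) for any support vector v (any point with 0 < β_v < C), since those lie exactly on the sphere surface.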

Page 20: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering

SVC High-Level Pseudo-Code

SVC(X)
  q ← initial value
  C ← initial C (= 1)
  loop
    K ← computeKernel(X, q)
    β ← solveLagrangian(K, C)
    clusterLabeling(X, β)
    if clustering result is satisfactory, exit
    choose new q and/or C
  end loop
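Tying the earlier sketches together, the loop might look like this in Python. Here cluster_labeling, satisfactory, and next_q are hypothetical stubs standing in for the steps the talk leaves to the user; a CCL labeling sketch appears later.

```python
def svc(X, q=1.0, C=1.0, max_rounds=10):
    """High-level SVC loop: kernel -> dual -> labeling, then adjust q and/or C."""
    labels = None
    for _ in range(max_rounds):
        K = gaussian_kernel(X, q)                 # Page 16 sketch
        beta = solve_lagrangian(K, C)             # Page 18 sketch
        labels = cluster_labeling(X, beta, q, C)  # stub; e.g., CCL, sketched later
        if satisfactory(labels):                  # user-supplied quality check (stub)
            break
        q = next_q(q)                             # stub; e.g., secant-like update
    return labels
```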

Page 21: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering

Previous Work on SVC

• Tax and Duin (1999): novelty detection using (one-class) SVM.
• SVC suggested by A. Ben-Hur, V. Vapnik, et al. (2001)
  – Complete Graph
  – Support Vector Graph
• J. Yang, et al. (2002): Proximity Graph
• J. Park, et al. (2004): Spectral Graph Partitioning
• J. Lee, et al. (2005): Gradient Descent
• W. Puma-Villanueva et al. (2005): Ensembles
• S. Lee and K. Daniels (2004, 2005, 2006, 2007): kernel width exploration and fast cluster labeling

Page 22: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering

Previous Work on Cluster Labeling

• Complete Graph (CG): tests all pairs (x_i, x_j) in X.
• Support Vector Graph (SVG): tests all pairs (x_i, x_j) where x_i or x_j is a SV.
• Proximity Graph (PG): tests all pairs (x_i, x_j) where x_i and x_j are linked in a PG.

Page 23: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering

Gradient Descent (GD)

(Figure legend: support vectors; non-SV data points; stable equilibrium points.)

Page 24: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering

Traditional Sample Points Technique

• CG, SVG, PG, and GD use this technique.
• Place m sample points on the line segment between x_i and x_j; the pair is judged connected only if every sample point stays inside the contour (figure cases: ① disconnected, ② disconnected, ③ connected).

Page 25: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering

Problems of Sample Points Technique

(Figure: false negative — the segment between x_i and x_j leaves the contour even though both points belong to the same cluster, so sampling declares them disconnected; false positive — all m sample points happen to fall inside contours, so x_i and x_j are declared connected even though they lie in different clusters.)

Page 26: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering


CG Result (C=1)

Page 27: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering

Problems of SVC

• Difficult to find appropriate q and C
  – no guidance for choosing q and C
  – too much trial and error
• Slow cluster labeling
  – O(N²·N_sv·m) time for the CG method, where m is the number of sample points on the line segment connecting any pair of data points
  – general size of the Delaunay triangulation in d dimensions = O(N^⌈d/2⌉)
• Bad performance in high dimensions
  – as the number of principal components is increased, there is a performance degradation

Page 28: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering

Our q Exploration

• Lemmas
  – If q = 0, then R² = 0
  – If q = ∞, then β_i = 1/N for all i ∈ {1, …, N}
  – If q = ∞, then R² = 1 − 1/N
  – R² = 1 iff q = ∞ and N = ∞
  – If N is finite, then R² ≤ 1 − 1/N < 1
• Theorem
  – Under certain circumstances, R² is a monotonically nondecreasing function of q
  – Secant-like algorithm (see the sketch below)
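The talk does not spell out the secant-like algorithm, so the following is only one plausible reading, built on the lemmas above: walk q upward along the monotone curve R²(q) with secant steps aimed at its finite-N ceiling 1 − 1/N, collecting each visited width into the q-list. All function and variable names here are ours, and this is not presented as the authors' exact algorithm.

```python
import numpy as np

def r2_of_q(X, q, C=1.0):
    """R^2 as a function of q: at the dual optimum, W = 1 - beta^T K beta = R^2."""
    K = gaussian_kernel(X, q)          # Page 16 sketch
    beta = solve_lagrangian(K, C)      # Page 18 sketch
    return 1.0 - beta @ K @ beta

def secant_q_list(X, C=1.0, q0=1e-4, tol=1e-3, max_iter=30):
    """Assumed secant-like exploration of the monotone curve R^2(q)."""
    target = 1.0 - 1.0 / len(X)        # finite-N ceiling from the lemmas
    q_prev, r_prev = q0, r2_of_q(X, q0, C)
    q_cur, q_list = 2.0 * q0, [q0]
    for _ in range(max_iter):
        r_cur = r2_of_q(X, q_cur, C)
        q_list.append(q_cur)
        slope = (r_cur - r_prev) / (q_cur - q_prev)
        if abs(r_cur - r_prev) < tol or slope <= 0:   # curve has flattened: stop
            break
        q_prev, r_prev = q_cur, r_cur
        q_cur += (target - r_cur) / slope             # secant step toward the ceiling
    return q_list
```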

Page 29: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering

q-list Length Analysis

• Estimated q-list length ≈ lg( max_{i,j} ‖x_i − x_j‖² ) − lg( min_{i,j} ‖x_i − x_j‖² ) (computed below)
• The estimate depends only on spatial characteristics of the data set, not on its dimensionality or the number of data points.
• 89% accuracy w.r.t. the actual q-list length for all datasets considered.
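The estimate is just the number of doublings between the smallest and largest squared pairwise distance; a sketch (function name ours):

```python
import numpy as np

def estimated_q_list_length(X):
    """lg(max_ij ||x_i - x_j||^2) - lg(min_ij ||x_i - x_j||^2) over distinct pairs."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    d2 = d2[np.triu_indices(len(X), k=1)]   # distinct pairs only
    return np.log2(d2.max()) - np.log2(d2.min())
```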

Page 30: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering

Our Recent q Exploration Work

• The R² vs. q curve typically has one critical radius of curvature, at q*.
• Approximate q* to yield q̂* (without cluster labeling).
• Use q̂* as the starting q value in the sequence.

Page 31: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering

q Exploration Results

(Table residue: dataset dimensions 9, 25, 34, 200.)

• 2D: on average, the actual number is 32% of the estimate and 22% of the secant length.
• Higher dimensions: on average, the actual number is 112% of the estimate and 82% of the secant length.

Page 32: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering


2D q Exploration Results

Page 33: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering


Higher Dimensional q Exploration Results

Page 34: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering

Cone Cluster Labeling (CCL)

• Motivation: avoid line segment sampling.
• Approach:
  – Leverage the geometry of feature space.
  – For the Gaussian kernel K(x, y) = e^{−q‖x−y‖²}:
    • Images of all data points lie on the surface of the unit ball in feature space.
    • A hyper-sphere in data space corresponds to a cone in feature space with apex at the origin.

(Figures: sample 2D data space; low-dimensional view of high-dimensional feature space.)

Page 35: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering

Cone Cluster Labeling

• P: intersection between the surface of the unit ball and the minimal hypersphere in feature space.
• Support Vector Cone P_{v_i}: the cone for support vector v_i, with apex at the origin and base angle θ (figure shows cones for v_i and v_j with images Φ(v_i), Φ(v_j)).
• Covering: P ≈ ∪_{v_i ∈ V} P_{v_i}, where V is the set of support vectors.

Page 36: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering

Cone Cluster Labeling

• Cone base angles are all equal: θ.
• All cones have a′ in common, with cos θ = ⟨Φ(v_i), a′⟩.
• The Pythagorean Theorem holds in feature space: ‖a′‖ = √(1 − R²).
• To derive the data-space hyper-sphere radius, use cos θ = √(1 − R²).

Page 37: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering

Cone Cluster Labeling

• P′: mapping of P into the data space.
• The cone of Φ(v_i) corresponds to a support vector hypersphere S_{v_i} centered at v_i with radius Z:

  Z = √( −ln(cos θ) / q ) = √( −ln(1 − R²) / (2q) )

• ∪_{v_i ∈ V} S_{v_i} approximately covers P′ (see the sketch below).

(Figures: contours for q = 0.003 and q = 0.137.)
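Putting the last two slides together, θ and Z follow directly from R² and q; a small sketch (function name ours):

```python
import numpy as np

def cone_radius(R2, q):
    """Data-space radius Z of each support vector hypersphere.

    cos(theta) = sqrt(1 - R^2);  Z = sqrt(-ln(cos theta) / q) = sqrt(-ln(1 - R^2) / (2q)).
    """
    cos_theta = np.sqrt(1.0 - R2)
    return np.sqrt(-np.log(cos_theta) / q)
```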

Page 38: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering

Cone Cluster Labeling

ConeClusterLabeling(X, Q, V)
  for each q ∈ Q
    compute Z for q
    AdjacencyMatrix ← ConstructConnectivity(V, Z)
    Labels ← FindConnComponents(AdjacencyMatrix)
    for each x ∈ X where x ∉ V
      idx ← find the nearest SV to x
      Labels(x) ← Labels(x_idx)
    end for
    print Labels
  end for
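A compact Python sketch of the labeling step for one q. One point needs hedging: the slide does not state the connectivity test inside ConstructConnectivity, so we assume two support vectors are adjacent when their radius-Z hyperspheres overlap (‖v_i − v_j‖ ≤ 2Z); non-SVs then inherit the label of their nearest SV, as in the pseudocode.

```python
import numpy as np

def cone_cluster_labeling(X, sv_idx, R2, q):
    """CCL sketch: label SVs via overlapping spheres, then label non-SVs.

    X: (N, d) data; sv_idx: indices of support vectors; R2: squared sphere radius.
    The adjacency test (||v_i - v_j|| <= 2Z) is our assumption, not stated on the slide.
    """
    V = X[sv_idx]
    Z = cone_radius(R2, q)                                   # Page 37 sketch
    d = np.sqrt(((V[:, None, :] - V[None, :, :]) ** 2).sum(-1))
    adj = d <= 2.0 * Z                                       # assumed overlap criterion
    # Connected components over the SV adjacency graph (simple DFS).
    labels_sv = -np.ones(len(V), dtype=int)
    comp = 0
    for s in range(len(V)):
        if labels_sv[s] >= 0:
            continue
        stack, labels_sv[s] = [s], comp
        while stack:
            u = stack.pop()
            for w in np.flatnonzero(adj[u]):
                if labels_sv[w] < 0:
                    labels_sv[w] = comp
                    stack.append(w)
        comp += 1
    # Each point (non-SVs included) inherits the label of its nearest SV.
    labels = np.empty(len(X), dtype=int)
    for i, x in enumerate(X):
        labels[i] = labels_sv[np.argmin(((V - x) ** 2).sum(-1))]
    return labels
```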

Page 39: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering


2D CCL Results (C=1)

Page 40: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering

Sample Higher Dimensional CCL Results in "Heat Map" Form

• N = 12, d = 9: 3 clusters
• N = 30, d = 25: 5 clusters
• N = 205, d = 200: 5 clusters

Page 41: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering

Comparison – cluster labeling algorithms

                             CG              SVG              PG                     GD                          CCL
Construct Adjacency Matrix   O(N²·N_sv·m)    O(N·N²_sv·m)     O(N(log N + N_sv·m))   O(m(N²·i + N_sv·N²_sep))    O(N²_sv)
Find Connected Components    O(N²)           O(N·N_sv)        O(N²)                  O(N²_sep)                   O(N²_sv)
Non-SV Labeling              N/A             O((N−N_sv)N_sv)  O((N−N_sv)N_sv)        O(N−N_sep)                  O((N−N_sv)N_sv)
TOTAL                        O(N²·N_sv·m)    O(N·N²_sv·m)     O(N² + N·N_sv·m)       O(m(N²·i + N_sv·N²_sep))    O(N·N_sv)

m: number of sample points; i: number of iterations for convergence; N_sep: number of stable equilibrium points. Time is for a single (q, C) combination.

Page 42: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering

Comparisons – 2D

(Charts: Construct Adjacency Matrix, Find Connected Components, Non-SV Labeling, and Total Time for Cluster Labeling.)

Page 43: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering

Comparisons – HD

(Charts: Construct Adjacency Matrix, Find Connected Components, and Non-SV Labeling.)

Page 44: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering

Contributions

• Automatically generate Gaussian kernel width values
  – include appropriate width values for our test data sets
  – obtain some reasonable cluster results from the q-list
• Faster cluster labeling method
  – faster than the other SVC cluster labeling algorithms
  – good clustering quality

Page 45: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering


Future Work

“The presence or absence of robust, efficient parallel clustering techniques will determine the success or failure of cluster analysis in large-scale data mining applications in the future.” - Jain et al. 1999

Make SVC scalable!

Page 46: Gaussian Kernel Width Exploration and Cone Cluster Labeling For Support Vector Clustering


End