A cluster validity measure with a hybrid parameter search method for the support vector clustering algorithm
Presenter: Lin, Shu-Han
Authors: Jeen-Shing Wang, Jen-Chieh Chiang
PR (2008)
Intelligent Database Systems Lab, N.Y.U.S.T. I.M.



Page 1: A cluster validity measure with a hybrid parameter search method for the support vector clustering algorithm

Intelligent Database Systems Lab, N.Y.U.S.T. I.M.

Presenter: Lin, Shu-Han
Authors: Jeen-Shing Wang, Jen-Chieh Chiang

PR (2008)

Page 2: Outline

Introduction of SVC
Motivation
Objective
Methodology
Experiments
Conclusion
Comments

Page 3: SVC

SVC is derived from SVMs.
SVMs are a supervised learning technique:
  Fast convergence
  Good generalization performance
  Robustness to noise

SVC is an unsupervised approach:
1. Data points are mapped to a high-dimensional feature space using a Gaussian kernel.
2. Look for the smallest sphere that encloses the data in that feature space.
3. Map the sphere back to data space, where it forms a set of contours.
4. The contours are treated as the cluster boundaries.
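The four SVC steps can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: it covers the no-outlier case (C = 1), it approximates the dual weights with Frank-Wolfe iterations chosen purely for brevity, and it decides cluster membership by sampling points on the segment between two data points. The names `fit_sphere`, `radius_sq` and `svc_labels` are hypothetical.

```python
import math

def gaussian_kernel(x, y, q):
    """K(x, y) = exp(-q * ||x - y||^2)."""
    return math.exp(-q * sum((a - b) ** 2 for a, b in zip(x, y)))

def fit_sphere(points, q, iters=500):
    """Approximate the dual weights beta for C = 1 (assumption: no outliers).

    With a Gaussian kernel K(x, x) = 1, so the dual reduces to minimising
    beta^T K beta over the probability simplex. Frank-Wolfe is an assumed
    stand-in for a proper QP solver.
    """
    n = len(points)
    K = [[gaussian_kernel(p, r, q) for r in points] for p in points]
    beta = [1.0 / n] * n
    for t in range(iters):
        grad = [2.0 * sum(K[i][j] * beta[j] for j in range(n)) for i in range(n)]
        k = min(range(n), key=grad.__getitem__)   # best simplex vertex
        step = 2.0 / (t + 2.0)
        beta = [(1.0 - step) * b for b in beta]
        beta[k] += step
    return beta, K

def radius_sq(x, points, beta, q, const):
    """Squared feature-space distance from the image of x to the sphere center."""
    cross = sum(b * gaussian_kernel(x, p, q) for b, p in zip(beta, points))
    return 1.0 - 2.0 * cross + const

def svc_labels(points, q, samples=10):
    """Label points: two points share a cluster if the segment between them
    stays inside the sphere when mapped to feature space."""
    beta, K = fit_sphere(points, q)
    n = len(points)
    const = sum(beta[i] * beta[j] * K[i][j] for i in range(n) for j in range(n))
    # Radius taken as the largest distance of any data point (plus a tolerance).
    r2 = max(radius_sq(p, points, beta, q, const) for p in points) + 1e-9

    def connected(p, r):
        for s in range(1, samples):
            t = s / samples
            y = [a + t * (b - a) for a, b in zip(p, r)]
            if radius_sq(y, points, beta, q, const) > r2:
                return False
        return True

    labels = [-1] * n
    current = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        labels[i] = current
        stack = [i]
        while stack:                      # flood-fill one connected component
            u = stack.pop()
            for v in range(n):
                if labels[v] == -1 and connected(points[u], points[v]):
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels
```

Calling `svc_labels(points, q=1.0)` on two well-separated groups of points should place each group in its own cluster; larger q tightens the contours and tends to split the data further.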

Page 4: SVC - Sphere Analysis

To find the minimal enclosing sphere with soft margin:

To solve this problem, introduce the Lagrangian function:
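The equations for this slide are not in the transcript. In the standard SVC formulation, which the slide appears to follow, the soft-margin sphere problem and its Lagrangian are written as below. Here Φ is the feature map, a the sphere center, R its radius, ξ_j the slack variables, C the penalty constant, and β_j ≥ 0, μ_j ≥ 0 the Lagrange multipliers; treat this as a reconstruction under that assumption rather than a copy of the slide.

```latex
\[
\min_{R,\,a,\,\xi}\; R^2 + C \sum_j \xi_j
\quad \text{s.t.} \quad
\|\Phi(x_j) - a\|^2 \le R^2 + \xi_j, \qquad \xi_j \ge 0
\]
\[
L = R^2 - \sum_j \bigl(R^2 + \xi_j - \|\Phi(x_j) - a\|^2\bigr)\,\beta_j
      - \sum_j \xi_j \mu_j + C \sum_j \xi_j
\]
```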

Page 5: SVC - Sphere Analysis

Page 6: SVC - Sphere Analysis

Karush-Kuhn-Tucker complementarity:
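The complementarity conditions themselves are missing from the transcript. For the standard SVC sphere problem, with slack variables ξ_j, multipliers β_j and μ_j, feature map Φ, center a and radius R, they take the following form; this is a reconstruction under the assumption that the slide follows that formulation.

```latex
\[
\xi_j \,\mu_j = 0, \qquad
\bigl(R^2 + \xi_j - \|\Phi(x_j) - a\|^2\bigr)\,\beta_j = 0
\]
```

A point with ξ_j > 0 lies outside the sphere and has μ_j = 0; a point with 0 < β_j < C lies exactly on the sphere surface.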

Page 7: SVC - Sphere Analysis

To find the minimal enclosing sphere with soft margin:

C: regulates the existence of outliers

Wolfe dual optimization problem

Bounded SV (BSV): outlier
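The dual problem is not reproduced in the transcript. In the standard SVC formulation the Wolfe dual, written with a kernel K(x_i, x_j) = Φ(x_i)·Φ(x_j) and dual weights β_j, is as follows; this is offered as a reconstruction under that assumption.

```latex
\[
\max_{\beta}\; W = \sum_j K(x_j, x_j)\,\beta_j - \sum_{i,j} \beta_i \beta_j K(x_i, x_j)
\quad \text{s.t.} \quad
0 \le \beta_j \le C, \qquad \sum_j \beta_j = 1
\]
```

Points with 0 < β_j < C are support vectors on the sphere surface, and points with β_j = C are bounded support vectors, which lie outside the sphere and are treated as outliers.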

Page 8: SVC - Sphere Analysis

Mercer kernel: Gaussian

Gaussian function:

q: controls the number of clusters and the smoothness/tightness of the cluster boundaries.

The distance (similarity) between x and a:
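The kernel and distance formulas for this slide are missing from the transcript. In the standard SVC formulation they read as follows, where q is the Gaussian width parameter, β_j are the dual weights, and a = Σ_j β_j Φ(x_j) is the sphere center; this is a reconstruction under that assumption.

```latex
\[
K(x_i, x_j) = e^{-q \|x_i - x_j\|^2}
\]
\[
R^2(x) = \|\Phi(x) - a\|^2
       = K(x, x) - 2 \sum_j \beta_j K(x_j, x) + \sum_{i,j} \beta_i \beta_j K(x_i, x_j)
\]
```

Because K(x, x) = 1 for the Gaussian kernel, only the middle term depends on x, so the contours R(x) = R are level sets of a weighted sum of Gaussians centered on the data points.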

Page 9: Motivation

Drawbacks of cluster validation measures
  Compactness:
    Different densities or sizes
    Decreases monotonically as the number of clusters increases
  Separation:
    Irregular cluster structures

Page 10: Motivation

Their previous study can handle:
  Different sizes
  Different densities
  Arbitrary shapes

But…

Page 11: Objectives – A cluster validity method and a parameter search algorithm for SVC

Automatically determine the two parameters:
  q: increasing q leads to an increasing number of clusters
  C: regulates the existence of outliers and overlapping clusters

Identify the optimal cluster structure

Page 12: Methodology - Idea

q is related to the densities of the clusters.
Each cluster structure corresponds to an interval of q.
Identifying the optimal structure is equivalent to finding the largest interval.

N = 64, max # of clusters = √N = 8
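The "largest interval" idea can be illustrated with a small helper. This is a sketch under assumptions, not the measure from the paper: it uses the number of clusters as a stand-in for "cluster structure", takes pre-computed (q, cluster count) samples as input, skips the trivial single-cluster structure by default, and caps the count at √N. The function name and the `min_clusters` parameter are hypothetical.

```python
import math

def largest_stable_interval(samples, n_points, min_clusters=2):
    """Find the widest run of q values over which the cluster count is constant.

    samples: list of (q, n_clusters) pairs sorted by increasing q.
    Counts above sqrt(n_points) are ignored; min_clusters=2 (an assumption)
    skips the single-cluster structure.
    Returns (q_start, q_end, n_clusters), or None if nothing qualifies.
    """
    max_clusters = math.isqrt(n_points)
    best = None
    i = 0
    while i < len(samples):
        j = i
        # Extend the run while the cluster count stays the same.
        while j + 1 < len(samples) and samples[j + 1][1] == samples[i][1]:
            j += 1
        k = samples[i][1]
        width = samples[j][0] - samples[i][0]
        if min_clusters <= k <= max_clusters:
            if best is None or width > best[1] - best[0]:
                best = (samples[i][0], samples[j][0], k)
        i = j + 1
    return best
```

With N = 64 the cap is 8 clusters, so a run of q values that all yield, say, two clusters wins over a shorter run yielding three, and any run yielding nine or more clusters is ignored.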

Page 13: Methodology - Problem

How to locate the overall search range of q
How to detect outliers/noise
How to identify the largest interval

Page 14: Methodology – Locate range of q

Lower bound

Upper bound: employ K-Means to get clusters, and get the variance vi of each cluster

Ascending order: cluster size

n = 3: the variances of the 3 biggest clusters
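The variance step of the upper-bound procedure can be sketched as follows. The formula that turns these variances into the upper bound of q is not in the transcript and is not reconstructed here. The sketch assumes cluster labels are already available (for example from a K-Means run) and defines a cluster's variance as the mean squared distance of its members to their centroid, which is an assumption; the function name is hypothetical.

```python
from collections import defaultdict

def largest_cluster_variances(points, labels, n=3):
    """Return the variances of the n biggest clusters, ordered by ascending size.

    points: list of coordinate tuples; labels: one cluster label per point
    (assumed to come from K-Means). Variance here means the mean squared
    distance to the cluster centroid (an assumption).
    """
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)

    def variance(members):
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total = sum(sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in members)
        return total / len(members)

    by_size = sorted(groups.values(), key=len)    # ascending cluster size
    return [variance(members) for members in by_size[-n:]]
```

Small clusters, which are more likely to be noise, fall at the front of the ascending order and are left out when only the n biggest clusters are used.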

Page 15: Methodology – Outlier Detection

Set q = qmax, the tightest value of q

Outlier: singleton cluster

This gives Copt; remove these outliers
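Flagging singleton clusters as outliers is simple to express in code. The sketch below assumes cluster labels obtained from an SVC run at q = qmax and only separates singletons from the rest; how Copt is derived from the outlier count is not in the transcript and is not reproduced. The function name is hypothetical.

```python
from collections import Counter

def split_singletons(labels):
    """Split point indices into outliers (singleton clusters) and kept points.

    labels: one cluster label per point, assumed to come from SVC at q = qmax.
    Returns (outlier_indices, kept_indices).
    """
    counts = Counter(labels)
    outliers = [i for i, lab in enumerate(labels) if counts[lab] == 1]
    kept = [i for i, lab in enumerate(labels) if counts[lab] > 1]
    return outliers, kept
```

The kept indices identify the data that remains after the outliers are removed.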

Page 16: Methodology – the largest interval

qopt

Page 17: Methodology – the largest interval

Fibonacci search: locate the interval where the cluster structure is the same

Bisection search

n: iteration
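The bisection step can be sketched as a search for the value of q at which the cluster structure changes. This is an illustration under assumptions, not the paper's procedure: the structure is summarized by a cluster count returned from a caller-supplied function, the count is assumed to change exactly once between the two starting values, and the Fibonacci search stage is not shown. The names `bisect_boundary` and `count_clusters` are hypothetical.

```python
def bisect_boundary(count_clusters, q_lo, q_hi, tol=1e-3):
    """Narrow down the q value where the cluster count stops matching q_lo's.

    count_clusters: callable q -> number of clusters (e.g. one SVC run per call).
    Assumes count_clusters(q_lo) != count_clusters(q_hi) and a single change
    in between. Returns (q_lo, q_hi, n) with q_hi - q_lo <= tol, where n is
    the number of iterations used.
    """
    k = count_clusters(q_lo)
    n = 0
    while q_hi - q_lo > tol:
        mid = 0.5 * (q_lo + q_hi)
        if count_clusters(mid) == k:
            q_lo = mid        # structure unchanged: boundary is to the right
        else:
            q_hi = mid        # structure changed: boundary is to the left
        n += 1
    return q_lo, q_hi, n
```

Each iteration halves the bracket, so an initial width w needs about log2(w / tol) clustering runs to pin down one end of an interval.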

Page 18: Methodology – Overview

Locate range of q

Outlier detection

The largest interval

Page 19: Experiments - Benchmark and Artificial Examples

Page 20: Experiments - Outlier

Copt

Page 21: Experiments

Page 22: Conclusions

A new measure: inspired by observations of q

Determines the optimal cluster structure with its corresponding range of q and C

Page 23: Comments

Advantage
  Inspired by observation of the parameters

Drawback
  …

Application
  SVC
  DBSCAN: MinPts / Eps