Classification: Support Vector Machine (10/10/07)



TRANSCRIPT

Page 1: Classification: Support Vector Machine

Classification: Support Vector Machine

10/10/07

Page 2: Classification: Support Vector Machine

What hyperplane (line) can separate the two classes of data?

Page 3: Classification: Support Vector Machine

What hyperplane (line) can separate the two classes of data?

But there are many other choices!

Which one is the best?

Page 4: Classification: Support Vector Machine

What hyperplane (line) can separate the two classes of data?

But there are many other choices!

Which one is the best?

M: margin

[Figure: the two classes, labeled $y_i = +1$ and $y_i = -1$, separated by the hyperplane $x^T\beta + \beta_0 = 0$, with margin M.]

Page 5: Classification: Support Vector Machine

Optimal separating hyperplane

The best hyperplane is the one that maximizes the margin, M.

[Figure: the two classes, labeled $y_i = +1$ and $y_i = -1$, with the maximal-margin hyperplane and margin M on either side.]

Page 6: Classification: Support Vector Machine

Computing the margin width

A hyperplane is $\{x : f(x) = x^T\beta + \beta_0 = 0\}$.

[Figure: classes $y_i = +1$ and $y_i = -1$ with the three parallel planes $x^T\beta + \beta_0 = 1$, $x^T\beta + \beta_0 = 0$, and $x^T\beta + \beta_0 = -1$, and points $x_+$, $x_-$ on the outer two.]

Find $x_+$ and $x_-$ on the "plus" and "minus" planes, so that $x_+ - x_-$ is perpendicular to $\beta$.

Then $M = \|x_+ - x_-\|$.

Page 7: Classification: Support Vector Machine

Computing the margin width

Find $x_+$ and $x_-$ on the "plus" and "minus" planes, so that $x_+ - x_-$ is perpendicular to $\beta$. Then $M = \|x_+ - x_-\|$.

Since $x_+^T\beta + \beta_0 = 1$ and $x_-^T\beta + \beta_0 = -1$, subtracting gives $(x_+ - x_-)^T\beta = 2$.

Because $x_+ - x_-$ is parallel to $\beta$, it follows that

$M = \|x_+ - x_-\| = 2/\|\beta\|$.
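For concreteness, a minimal sketch (not from the slides; it assumes scikit-learn and a small made-up 2-D dataset) of reading the margin width $M = 2/\|\beta\|$ off a fitted linear SVM:

```python
# A sketch, not the slides' code: fit a (nearly) hard-margin linear SVM on toy
# data and compute the margin width M = 2 / ||beta|| from the learned weights.
import numpy as np
from sklearn.svm import SVC

# Two small, linearly separable point clouds (made-up data).
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],
              [4.0, 4.0], [5.0, 4.5], [4.5, 5.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)    # a very large C approximates the hard-margin case
clf.fit(X, y)

beta = clf.coef_[0]                  # beta is perpendicular to the separating hyperplane
M = 2.0 / np.linalg.norm(beta)       # margin width M = 2 / ||beta||
print("margin width M =", M)
```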

Page 8: Classification: Support Vector Machine

Computing the margin width

The hyperplane is separating if

$y_i(x_i^T\beta + \beta_0) > 0, \quad \forall i$.

The maximizing problem is

$\max_{\beta, \beta_0} \frac{2}{\|\beta\|}$

subject to

$y_i(x_i^T\beta + \beta_0) \geq 1, \quad \forall i$.

[Figure: margin M between the classes $y_i = +1$ and $y_i = -1$; the points lying on the margin boundaries are the support vectors.]

Page 9: Classification: Support Vector Machine

Optimal separating hyperplane

Rewrite the problem as

$\min_{\beta, \beta_0} \frac{1}{2}\|\beta\|^2$

subject to

$y_i(x_i^T\beta + \beta_0) \geq 1, \quad \forall i$.

Lagrange function:

$L_P = \frac{1}{2}\|\beta\|^2 - \sum_i \alpha_i \left[ y_i(x_i^T\beta + \beta_0) - 1 \right]$

To minimize, set the partial derivatives to 0:

$\beta = \sum_i \alpha_i y_i x_i, \qquad \sum_i \alpha_i y_i = 0$

Can be solved by quadratic programming.
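As a concrete illustration of the quadratic program above, a minimal sketch (assumptions: tiny made-up data, and scipy's SLSQP used as a stand-in for a dedicated QP solver):

```python
# Solve min (1/2)||beta||^2  subject to  y_i (x_i^T beta + beta_0) >= 1 for all i,
# on four toy points. This is the primal problem stated above.
import numpy as np
from scipy.optimize import minimize

X = np.array([[1.0, 1.0], [2.0, 1.5], [4.0, 4.0], [5.0, 4.5]])
y = np.array([-1.0, -1.0, 1.0, 1.0])

def objective(p):
    beta = p[:-1]                               # p packs (beta, beta_0)
    return 0.5 * beta @ beta

def margin_constraints(p):
    beta, beta0 = p[:-1], p[-1]
    return y * (X @ beta + beta0) - 1.0         # each entry must be >= 0

res = minimize(objective, x0=np.zeros(X.shape[1] + 1), method="SLSQP",
               constraints=[{"type": "ineq", "fun": margin_constraints}])
beta, beta0 = res.x[:-1], res.x[-1]
print("beta =", beta, "beta_0 =", beta0, "M =", 2 / np.linalg.norm(beta))
```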

Page 10: Classification: Support Vector Machine

When the two classes are non-separable

What is the best hyperplane?

Idea: allow some points to lie on the wrong side, but not by much. Introduce slack variables $\xi_i \geq 0$ measuring by how much each point violates its margin.

[Figure: overlapping classes $y_i = +1$ and $y_i = -1$, with the slack variables $\xi_i$ marking the points on the wrong side of their margin.]

Page 11: Classification: Support Vector Machine

Support vector machine

When the two classes are not separable, the problem is slightly modified:

Find

$\min_{\beta, \beta_0} \frac{1}{2}\|\beta\|^2$

subject to

$y_i(x_i^T\beta + \beta_0) \geq 1 - \xi_i, \quad \forall i, \qquad \xi_i \geq 0, \qquad \sum_i \xi_i \leq \text{constant}$.

Can be solved using quadratic programming.

[Figure: overlapping classes $y_i = +1$ and $y_i = -1$ with the soft-margin hyperplane.]
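In practice the non-separable problem is usually solved in its equivalent penalized form, where a cost parameter C takes the place of the bound on $\sum_i \xi_i$. A minimal sketch (assuming scikit-learn and made-up overlapping data):

```python
# Soft-margin SVM on two overlapping Gaussian clouds; a smaller C tolerates
# more slack (more points with xi_i > 0).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(1.5, 1.0, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
print("support vectors per class:", clf.n_support_)
```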

Page 12: Classification: Support Vector Machine

Convert a non-separable case to a separable one by a nonlinear transformation

non-separable in 1D

Page 13: Classification: Support Vector Machine

Convert a non-separable case to a separable one by a nonlinear transformation

separable after the transformation $h(x) = (x, f(x))$
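A minimal numeric illustration of this idea (made-up 1-D points; $f(x) = x^2$ is just one example of a transform):

```python
# 1-D points whose label depends on |x| are not linearly separable on the line,
# but the transform h(x) = (x, f(x)) with f(x) = x^2 makes them separable in 2-D.
import numpy as np

x = np.array([-3.0, -2.5, -0.5, 0.0, 0.5, 2.5, 3.0])
y = np.where(np.abs(x) > 1.0, 1, -1)         # outer points vs. inner points

h = np.column_stack([x, x ** 2])             # h(x) = (x, x^2)
# In the transformed space the horizontal line x^2 = 1 separates the classes.
print("separable after the transform:", np.all((h[:, 1] > 1.0) == (y == 1)))
```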

Page 14: Classification: Support Vector Machine

Kernel function

• Introduce nonlinear transformations h(x), and work with the transformed features. Then the separating function is

$\hat{y} = \mathrm{sign}\left(h(x)^T\beta + \beta_0\right)$.

• In fact, all you need is the kernel function:

$K(x, x') = \langle h(x), h(x') \rangle$

• Common kernels: for example, the polynomial kernel and the radial basis (Gaussian) kernel.
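A minimal sketch (assuming scikit-learn and made-up data with a ring-shaped class boundary): the SVM only ever touches the data through $K(x, x')$, so a built-in kernel name and an explicitly supplied kernel function give the same classifier.

```python
# Fit the same kernel SVM twice: once with the built-in "rbf" kernel and once
# with an explicit kernel function K(x, x').
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 2))
y = np.where(np.linalg.norm(X, axis=1) > 1.0, 1, -1)   # ring-shaped boundary

clf_builtin = SVC(kernel="rbf", gamma=1.0).fit(X, y)
clf_explicit = SVC(kernel=lambda A, B: rbf_kernel(A, B, gamma=1.0)).fit(X, y)

print(clf_builtin.score(X, y), clf_explicit.score(X, y))
```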

Page 15: Classification: Support Vector Machine

Applications

Page 16: Classification: Support Vector Machine

Prediction of central nervous system embryonic tumor outcome

• 42 patient samples

• 5 cancer types

• Array contains 6817 genes

• Question: are different tumor types distinguishable from their gene expression patterns?

(Pomeroy et al. 2002)

Page 17: Classification: Support Vector Machine

(Pomeroy et al. 2002)

Page 18: Classification: Support Vector Machine

Gene expressions within a cancer type cluster together

(Pomeroy et al. 2002)

Page 19: Classification: Support Vector Machine

PCA based on all genes

(Pomeroy et al. 2002)

Page 20: Classification: Support Vector Machine

PCA based on a subset of informational genes

(Pomeroy et al. 2002)

Page 21: Classification: Support Vector Machine
Page 22: Classification: Support Vector Machine
Page 23: Classification: Support Vector Machine

(Khan et al. 2001)

Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks

• Four different cancer types

• 88 samples

• 6567 genes

• Goal: to predict cancer types from gene expression data

Page 24: Classification: Support Vector Machine

(Khan et al. 2001)

Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks

Page 25: Classification: Support Vector Machine

Procedures

• Filter out genes that have low expression values (retain 2308 genes)

• Dimension reduction using PCA: select the top 10 principal components

• 3-fold cross-validation (a code sketch of this pipeline follows below)

(Khan et al. 2001)
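A minimal sketch of a pipeline of this shape (random placeholder data with the stated dimensions; scikit-learn's MLPClassifier is used as a generic stand-in for the linear ANNs of Khan et al.):

```python
# Dimension reduction to the top 10 principal components, then a small neural
# network, evaluated with 3-fold cross-validation.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(88, 2308))        # 88 samples x 2308 retained genes (placeholder values)
y = rng.integers(0, 4, size=88)        # 4 cancer types (random labels, shapes only)

pipe = Pipeline([
    ("pca", PCA(n_components=10)),                                # top 10 principal components
    ("ann", MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000)),
])
print("3-fold CV accuracy:", cross_val_score(pipe, X, y, cv=3))
```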

Page 26: Classification: Support Vector Machine

Artificial Neural Network

Page 27: Classification: Support Vector Machine
Page 28: Classification: Support Vector Machine

(Khan et al. 2001)

Page 29: Classification: Support Vector Machine

Procedures

• Filter out genes that have low expression values (retain 2308 genes)

• Dimension reduction using PCA: select the top 10 principal components

• 3-fold cross-validation

• Repeat 1250 times

(Khan et al. 2001)

Page 30: Classification: Support Vector Machine

(Khan et al. 2001)

Page 31: Classification: Support Vector Machine

(Khan et al. 2001)

Page 32: Classification: Support Vector Machine

Acknowledgement

• Sources of slides:

– Cheng Li

– http://www.cs.cornell.edu/johannes/papers/2001/kdd2001-tutorial-final.pdf

– www.cse.msu.edu/~lawhiu/intro_SVM_new.ppt

Page 33: Classification: Support Vector Machine

Aggregating predictors

• Sometimes aggregating several predictors can perform better than any single predictor alone. Aggregation is achieved by a weighted sum of different predictors, which can be the same kind of predictor obtained from slightly perturbed training datasets.

• The key to the improvement in accuracy is the instability of the individual classifiers, such as classification trees.
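A minimal sketch of the aggregation idea (bagging, with scikit-learn and a synthetic dataset): the same kind of unstable predictor, a classification tree, is fit to bootstrap-perturbed training sets and the predictions are aggregated.

```python
# Compare a single classification tree with an aggregate of trees fit to
# bootstrap resamples of the training data.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

single = DecisionTreeClassifier(random_state=0)
bagged = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)

print("single tree CV accuracy :", cross_val_score(single, X, y, cv=5).mean())
print("bagged trees CV accuracy:", cross_val_score(bagged, X, y, cv=5).mean())
```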

Page 34: Classification: Support Vector Machine

AdaBoost

• Step 1: Initialize the observation weights $w_i = 1/N$, $i = 1, \dots, N$.

• Step 2: For m = 1 to M:

– Fit a classifier $G_m(x)$ to the training data using the weights $w_i$.

– Compute

$err_m = \frac{\sum_{i=1}^{N} w_i I(y_i \neq G_m(x_i))}{\sum_{i=1}^{N} w_i}$

– Compute

$\alpha_m = \log\left((1 - err_m)/err_m\right)$

– Set

$w_i \leftarrow w_i \exp\left[\alpha_m I(y_i \neq G_m(x_i))\right], \quad i = 1, \dots, N$

(misclassified observations are given more weight)

• Step 3: Output

$G(x) = \mathrm{sign}\left(\sum_{m=1}^{M} \alpha_m G_m(x)\right)$
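A minimal sketch of the algorithm above (assumptions: synthetic data, depth-1 trees as the weak classifiers $G_m$, and a small numerical guard on $err_m$):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

X, y01 = make_classification(n_samples=200, n_features=10, random_state=0)
y = 2 * y01 - 1                              # labels in {-1, +1}

N, M = len(y), 50
w = np.full(N, 1.0 / N)                      # Step 1: w_i = 1/N
classifiers, alphas = [], []

for m in range(M):                           # Step 2
    G = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    miss = (G.predict(X) != y).astype(float)
    err = np.clip((w * miss).sum() / w.sum(), 1e-10, 1 - 1e-10)   # err_m (guarded)
    alpha = np.log((1.0 - err) / err)                             # alpha_m
    w = w * np.exp(alpha * miss)             # misclassified obs get more weight
    classifiers.append(G)
    alphas.append(alpha)

# Step 3: G(x) = sign( sum_m alpha_m G_m(x) )
agg = sum(a * G.predict(X) for a, G in zip(alphas, classifiers))
print("training accuracy:", np.mean(np.sign(agg) == y))
```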

Page 35: Classification: Support Vector Machine

Boosting

Page 36: Classification: Support Vector Machine

Optimal separating hyperplane

• Substituting, we get the Lagrange (Wolfe) dual function

$L_D = \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j x_i^T x_j$

subject to

$\alpha_i \geq 0, \quad \forall i, \qquad \sum_i \alpha_i y_i = 0$.

To complete the steps, see Burges et al.

• If $\alpha_i > 0$, then $y_i(x_i^T\beta + \beta_0) = 1$.

These $x_i$'s are called the support vectors.

$\beta = \sum_i \alpha_i y_i x_i$ is determined only by the support vectors.
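A minimal numeric check of this last point (assuming scikit-learn and made-up separable data): $\beta$ can be rebuilt from the support vectors alone, since the fitted model stores the products $\alpha_i y_i$ in `dual_coef_`.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(-1.5, 1.0, (40, 2)), rng.normal(1.5, 1.0, (40, 2))])
y = np.array([-1] * 40 + [1] * 40)

clf = SVC(kernel="linear", C=1.0).fit(X, y)
beta_from_sv = clf.dual_coef_[0] @ clf.support_vectors_   # sum_i (alpha_i y_i) x_i
print(np.allclose(beta_from_sv, clf.coef_[0]))            # True: beta depends only on the SVs
```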

Page 37: Classification: Support Vector Machine

Support vector machine

The Lagrange function is

$L_P = \frac{1}{2}\|\beta\|^2 + C \sum_i \xi_i - \sum_i \alpha_i \left[ y_i(x_i^T\beta + \beta_0) - (1 - \xi_i) \right] - \sum_i \mu_i \xi_i$

Setting the partial derivatives to 0:

$\beta = \sum_i \alpha_i y_i x_i, \qquad \sum_i \alpha_i y_i = 0, \qquad \alpha_i = C - \mu_i$

Substituting, we get

$L_D = \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j x_i^T x_j$

subject to

$0 \leq \alpha_i \leq C, \qquad \sum_i \alpha_i y_i = 0$.
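For completeness, a short worked substitution for the separable case of page 36 (the slides defer the full steps to Burges et al.; the non-separable case is analogous, with the $C\sum_i \xi_i$ and $\mu_i \xi_i$ terms cancelling). Plugging $\beta = \sum_i \alpha_i y_i x_i$ and $\sum_i \alpha_i y_i = 0$ into $L_P$ gives

$$\begin{aligned}
L_P &= \tfrac{1}{2}\|\beta\|^2 - \sum_i \alpha_i\left[y_i(x_i^T\beta + \beta_0) - 1\right] \\
    &= \tfrac{1}{2}\sum_{i,j}\alpha_i\alpha_j y_i y_j x_i^T x_j
       - \sum_{i,j}\alpha_i\alpha_j y_i y_j x_i^T x_j
       - \beta_0 \sum_i \alpha_i y_i + \sum_i \alpha_i \\
    &= \sum_i \alpha_i - \tfrac{1}{2}\sum_{i,j}\alpha_i\alpha_j y_i y_j x_i^T x_j = L_D.
\end{aligned}$$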