
Support Vector Machines (SVM): A Tool for Machine Learning

Yixin Chen, Ph.D. Candidate, CSE

1/10/2002


Presentation Outline
- Introduction
- Linear Learning Machines
- Support Vector Machines (SVM)
- Examples
- Conclusions


Introduction
- Building machines capable of learning from experiences.
- Experiences are usually specified by a finite amount of training data.
- The goal is to achieve high generalization performance via learning from the training set.
- The construction of a good learning machine is a compromise between the accuracy attained on a particular training set and the "capacity" of the machine.
- SVMs have large learning capacity and can have excellent generalization performance.


Linear Learning Machines
- Binary classification uses a linear function g(x) = wᵗx + w0, where x is the feature vector, w is the weight vector, and w0 is the bias or threshold weight.
- A two-category classifier implements the decision rule: decide class 1 if g(x) > 0 and class −1 if g(x) < 0.
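The decision rule above can be sketched in a few lines of NumPy (the weight values here are made up for illustration):

```python
import numpy as np

def g(x, w, w0):
    """Linear discriminant g(x) = w^t x + w0."""
    return np.dot(w, x) + w0

def classify(x, w, w0):
    """Decide class 1 if g(x) > 0, class -1 if g(x) < 0."""
    return 1 if g(x, w, w0) > 0 else -1

# Hypothetical weights for a 2-D feature space
w = np.array([1.0, -1.0])
w0 = 0.5

print(classify(np.array([2.0, 0.0]), w, w0))   # g = 2.5 > 0  -> class 1
print(classify(np.array([0.0, 3.0]), w, w0))   # g = -2.5 < 0 -> class -1
```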


A Simple Linear Classifier


Some Properties of Linear Learning Machines
- The decision surface is a hyperplane.
- The feature space is divided into two half-spaces.


Several Questions
- Does there exist a hyperplane which separates the training set?
- If yes, how can we compute it? Is it unique?
- If it is not unique, how can we find an "optimal" one?
- What can we do if no separating hyperplane exists?


Facts
- If the training set is linearly separable, then there exist infinitely many separating hyperplanes for the given training set.
- If the training set is linearly inseparable, then there does not exist any separating hyperplane for the given training set.


Support Vector Machines: The Linearly Separable Case


Support Vector Machines

Margin: 2/|w|
H1: wᵗx − w0 = 1
H:  wᵗx − w0 = 0
H2: wᵗx − w0 = −1


Support Vector Machines

Maximizing the margin 2/|w| is equivalent to minimizing |w|/2.


Support Vector Machines

Quadratic Program (Maximal Margin)
min_{w,w0} |w|²/2
s.t. wᵗxᵢ ≥ w0 + 1 for yᵢ = 1, and wᵗxᵢ ≤ w0 − 1 for yᵢ = −1
(or equivalently yᵢ(wᵗxᵢ − w0) ≥ 1)

Dual QP (Maximal Margin)
min_α (1/2) Σ_{i=1,…,m} Σ_{j=1,…,m} yᵢyⱼαᵢαⱼ xᵢᵗxⱼ − Σ_{i=1,…,m} αᵢ
s.t. Σ_{i=1,…,m} yᵢαᵢ = 0, αᵢ ≥ 0, i = 1,…,m

Support Vectors: w is a linear combination of the support vectors, w = Σᵢ αᵢyᵢxᵢ, where αᵢ > 0 only for the support vectors.
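For the special case of one training point per class, the maximal-margin QP can be solved by hand: the optimal hyperplane is the perpendicular bisector of the segment joining the two points. A minimal numeric check of that closed form, with made-up points:

```python
import numpy as np

# One training point per class: x1 has label +1, x2 has label -1 (toy example)
x1 = np.array([3.0, 1.0])
x2 = np.array([1.0, 1.0])

d = x1 - x2
w = 2.0 * d / np.dot(d, d)        # weight vector of the maximal-margin hyperplane
w0 = np.dot(w, (x1 + x2) / 2.0)   # the hyperplane passes through the midpoint

# Both points sit exactly on the margin hyperplanes H1 and H2
print(np.dot(w, x1) - w0)         # -> 1.0
print(np.dot(w, x2) - w0)         # -> -1.0
print(2.0 / np.linalg.norm(w))    # margin 2/|w| = |x1 - x2| = 2.0
```

Both constraints are active, so both points are support vectors, and the margin 2/|w| equals the distance between them.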


Support Vector Machines: The Linearly Inseparable Case


Support Vector Machines

Maximize Margin and Minimize Error (Soft Margin)
min_{w,w0,z} |w|²/2 + C Σ_{i=1,…,m} zᵢ
s.t. yᵢ(wᵗxᵢ − w0) + zᵢ ≥ 1, zᵢ ≥ 0, i = 1,…,m
(zᵢ is a slack or error variable)

Dual QP (Soft Margin)
min_α (1/2) Σ_{i=1,…,m} Σ_{j=1,…,m} yᵢyⱼαᵢαⱼ xᵢᵗxⱼ − Σ_{i=1,…,m} αᵢ
s.t. Σ_{i=1,…,m} yᵢαᵢ = 0, C ≥ αᵢ ≥ 0, i = 1,…,m
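The soft-margin primal is equivalent to unconstrained minimization of |w|²/2 + C Σᵢ max(0, 1 − yᵢ(wᵗxᵢ − w0)), the hinge-loss form. A minimal subgradient-descent sketch of that equivalent problem; the learning rate, epoch count, and toy data are illustrative choices, not from the slides:

```python
import numpy as np

def train_soft_margin(X, y, C=1.0, lr=0.01, epochs=2000):
    """Subgradient descent on |w|^2/2 + C * sum_i max(0, 1 - y_i*(w^t x_i - w0))."""
    m, n = X.shape
    w, w0 = np.zeros(n), 0.0
    for _ in range(epochs):
        viol = y * (X @ w - w0) < 1                        # points with nonzero hinge loss
        grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
        grad_w0 = C * y[viol].sum()
        w -= lr * grad_w
        w0 -= lr * grad_w0
    return w, w0

# Hypothetical 2-D training set with labels +1 / -1
X = np.array([[2., 2.], [3., 3.], [2., 3.], [-2., -2.], [-3., -3.], [-2., -3.]])
y = np.array([1., 1., 1., -1., -1., -1.])

w, w0 = train_soft_margin(X, y, C=1.0)
print(np.sign(X @ w - w0))   # all six training points classified correctly
```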


Support Vector Machines: Nonlinear Mappings via Kernels

Idea: map the original features into a higher-dimensional feature space, x → φ(x), and design the classifier in the new feature space. The classifier is nonlinear in the original feature space but linear in the new feature space. (With an appropriate nonlinear mapping to a sufficiently high dimension, data from two categories can always be separated by a hyperplane.)
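A classic illustration of this idea (not from the slides) is the XOR pattern: it is not linearly separable in 2-D, but becomes separable once the product feature x1·x2 is appended:

```python
import numpy as np

# XOR-style data: opposite corners share a label; not linearly separable in 2-D
X = np.array([[1., 1.], [-1., -1.], [1., -1.], [-1., 1.]])
y = np.array([1, 1, -1, -1])

def phi(x):
    """Nonlinear map (x1, x2) -> (x1, x2, x1*x2) into a 3-D feature space."""
    return np.array([x[0], x[1], x[0] * x[1]])

# In the mapped space, the hyperplane with w = (0, 0, 1), w0 = 0 separates the classes
w = np.array([0.0, 0.0, 1.0])
preds = np.array([np.sign(np.dot(w, phi(x))) for x in X])
print(preds)   # -> [ 1.  1. -1. -1.], matching y
```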


Support Vector Machines

Maximal Margin
min_α (1/2) Σ_{i=1,…,m} Σ_{j=1,…,m} yᵢyⱼαᵢαⱼ φ(xᵢ)ᵗφ(xⱼ) − Σ_{i=1,…,m} αᵢ
s.t. Σ_{i=1,…,m} yᵢαᵢ = 0, αᵢ ≥ 0, i = 1,…,m

Soft Margin
min_α (1/2) Σ_{i=1,…,m} Σ_{j=1,…,m} yᵢyⱼαᵢαⱼ φ(xᵢ)ᵗφ(xⱼ) − Σ_{i=1,…,m} αᵢ
s.t. Σ_{i=1,…,m} yᵢαᵢ = 0, C ≥ αᵢ ≥ 0, i = 1,…,m


Support Vector Machines: Role of Kernels

A kernel simplifies the computation of the inner product in the new feature space:
K(x, y) = φ(x)ᵗφ(y)

Some popular kernels:
- Polynomial: K(x, y) = (xᵗy + 1)^p
- Gaussian: K(x, y) = exp(−|x − y|²/(2σ²))
- Sigmoid: K(x, y) = tanh(κxᵗy − δ)
Each can be used with both the maximal-margin and soft-margin formulations.
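These kernels are one-liners in NumPy. For the degree-2 polynomial kernel on 2-D inputs, the explicit map φ is small enough to verify K(x, y) = φ(x)ᵗφ(y) numerically; the κ and δ defaults in the sigmoid kernel below are illustrative:

```python
import numpy as np

def poly_kernel(x, y, p=2):
    return (np.dot(x, y) + 1.0) ** p

def gauss_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def sigmoid_kernel(x, y, kappa=1.0, delta=0.0):
    return np.tanh(kappa * np.dot(x, y) - delta)

def phi_poly2(x):
    """Explicit feature map whose inner product equals (x^t y + 1)^2 for 2-D input."""
    x1, x2 = x
    r2 = np.sqrt(2.0)
    return np.array([1.0, r2 * x1, r2 * x2, x1 ** 2, x2 ** 2, r2 * x1 * x2])

x = np.array([1.0, 2.0])
y = np.array([0.5, -1.0])
print(poly_kernel(x, y))                    # -> 0.25
print(np.dot(phi_poly2(x), phi_poly2(y)))   # -> 0.25, same value via the explicit map
```

The point of the kernel is that the left-hand computation never forms the 6-D vectors at all.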


Support Vector Machines

Maximal Margin
min_α (1/2) Σ_{i=1,…,m} Σ_{j=1,…,m} yᵢyⱼαᵢαⱼ K(xᵢ, xⱼ) − Σ_{i=1,…,m} αᵢ
s.t. Σ_{i=1,…,m} yᵢαᵢ = 0, αᵢ ≥ 0, i = 1,…,m

Soft Margin
min_α (1/2) Σ_{i=1,…,m} Σ_{j=1,…,m} yᵢyⱼαᵢαⱼ K(xᵢ, xⱼ) − Σ_{i=1,…,m} αᵢ
s.t. Σ_{i=1,…,m} yᵢαᵢ = 0, C ≥ αᵢ ≥ 0, i = 1,…,m


Examples: Checker-Board Problem

(figure: 3 × 3 checker-board training data, plotted on [0, 180] × [0, 180])


Checker-Board Problem

(figures: decision-region boundaries for σ = 10 and σ = 5, on [0, 180] × [0, 180])

169 training samples, Gauss kernel, soft margin, C = 1000


Checker-Board Problem

(figures: decision-region boundaries for σ = 15 and σ = 20, on [0, 180] × [0, 180])

169 training samples, Gauss kernel, soft margin, C = 1000
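A sketch of how an experiment like this might be reproduced (not the author's code). Two simplifications are assumed: the 169 samples are laid out as a 13 × 13 grid, and the bias term is dropped so the equality constraint Σᵢ yᵢαᵢ = 0 disappears from the dual, which can then be maximized by clipped Newton steps on one αᵢ at a time:

```python
import numpy as np

def gauss_gram(X, sigma):
    """Gram matrix K[i, j] = exp(-|x_i - x_j|^2 / (2 sigma^2))."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma ** 2))

def train_dual(K, y, C, passes=100):
    """Coordinate ascent on the bias-free soft-margin dual (0 <= alpha_i <= C)."""
    m = len(y)
    alpha = np.zeros(m)
    for _ in range(passes):
        for i in range(m):
            f_i = np.sum(alpha * y * K[:, i])       # decision value at x_i
            step = (1.0 - y[i] * f_i) / K[i, i]     # Newton step along alpha_i
            alpha[i] = np.clip(alpha[i] + step, 0.0, C)
    return alpha

# 13 x 13 grid on [0, 180]^2, labeled by 3 x 3 checker-board cell parity (assumed layout)
g = np.linspace(0.0, 180.0, 13)
X = np.array([[a, b] for a in g for b in g])       # 169 training samples
y = np.where((X[:, 0] // 60 + X[:, 1] // 60) % 2 == 0, 1.0, -1.0)

K = gauss_gram(X, sigma=10.0)
alpha = train_dual(K, y, C=1000.0)
train_acc = np.mean(np.sign(K @ (alpha * y)) == y)
print(train_acc)
```

With σ = 10 and C = 1000 this should fit the training grid closely; sweeping σ as in the slides trades smoothness of the region boundary against training error.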


Examples: Two-Spiral Problem

(figure: two interleaved spirals of training data, plotted on [−30, 30] × [−30, 30])


Two-Spiral Problem

(figures: decision-region boundaries for σ = 2 and σ = 1, spiral 1 vs. spiral 2, on [−30, 30] × [−30, 30])

154 training samples, Gauss kernel, soft margin, C = 1000
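The slides do not say how the 154 spiral points were generated; one plausible generator (the turn count and radii are guesses tuned to match the [−30, 30] plot range) is:

```python
import numpy as np

def two_spirals(n_per_class=77, turns=2.0, r_max=30.0):
    """Two interleaved Archimedean spirals; the second is the first rotated by 180 degrees."""
    t = np.linspace(0.25, 1.0, n_per_class)   # parameter along the spiral
    theta = 2.0 * np.pi * turns * t
    r = r_max * t
    s1 = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
    s2 = -s1                                  # 180-degree rotation gives the second spiral
    X = np.vstack([s1, s2])
    y = np.hstack([np.ones(n_per_class), -np.ones(n_per_class)])
    return X, y

X, y = two_spirals()
print(X.shape, y.shape)          # -> (154, 2) (154,)
print(np.abs(X).max() <= 30.0)   # points stay inside the plotted range -> True
```

The same Gaussian-kernel soft-margin training used for the checker-board problem applies unchanged to this data.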


Two-Spiral Problem

(figures: decision-region boundaries for σ = 5 and σ = 7, spiral 1 vs. spiral 2, on [−30, 30] × [−30, 30])

154 training samples, Gauss kernel, soft margin, C = 1000


Conclusions

Advantages:
- Training always finds a global minimum, since the underlying QP is convex.
- Simple and clear geometric interpretation.

Limitations:
- Choice of kernel.
- Training a multi-class SVM in one step.


References

- N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, 2000.
- R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, John Wiley & Sons, Inc., 2001.
- C. J. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 2(2), 121-167, 1998.
- K. P. Bennett and C. Campbell, Support Vector Machines: Hype or Hallelujah?, SIGKDD Explorations, 2(2), 1-13, 2000.
- SVMLight, http://svmlight.joachims.org/
