Support Vector Machine (SVM)


Page 1: Support Vector Machine (SVM)

Support Vector Machine (SVM)

Based on Nello Cristianini's presentation: http://www.support-vector.net/tutorial.html

Page 2: Support Vector Machine (SVM)

Basic Idea

• Use a Linear Learning Machine (LLM).

• Overcome the linearity constraint: map the data non-linearly into a higher-dimensional space.

• Select between hyperplanes: use the margin as the selection criterion.

• Generalization depends on the margin.

Page 3: Support Vector Machine (SVM)

General idea

(Figure: the original problem and the transformed problem.)

Page 4: Support Vector Machine (SVM)

Kernel Based Algorithms

• Two separate components:

• Learning algorithm: works in the embedded space.

• Kernel function: performs the embedding.

Page 5: Support Vector Machine (SVM)

Basic Example: Kernel Perceptron

• Hyperplane classification: f(x) = <w,x> + b = <w',x'>, h(x) = sign(f(x)) (with the augmented vectors w' = (w,b), x' = (x,1))

• Perceptron algorithm: for each sample point (xi, ti), ti ∈ {-1,+1} (see the sketch below):

IF ti <wk, xi> < 0 THEN /* mistake */

wk+1 = wk + ti xi

k = k + 1
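A minimal sketch of this update loop in Python; numpy, the epoch count, and the explicit bias update are my own additions, not from the slide:

    import numpy as np

    def perceptron(X, t, epochs=10):
        """Primal perceptron: X is an (n, d) array of inputs, t an array of +/-1 labels."""
        n, d = X.shape
        w, b = np.zeros(d), 0.0
        for _ in range(epochs):
            for i in range(n):
                if t[i] * (np.dot(w, X[i]) + b) <= 0:    # mistake on (x_i, t_i)
                    w = w + t[i] * X[i]                  # w_{k+1} = w_k + t_i x_i
                    b = b + t[i]                         # bias kept explicit here
        return w, b

For linearly separable data the loop stops making mistakes; the bound on the next slide quantifies how many mistakes can occur.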

Page 6: Support Vector Machine (SVM)

Recall

• Margin of hyperplane w on the sample S:

D(w) = min over (xi, ti) in S of ti ( <w, xi> + b )

• Mistake bound (Novikoff): the perceptron makes at most

M(x) <= ( 2 max_i ||xi|| / D(w*) )^2

mistakes, where w* is a maximal-margin separating hyperplane.

Page 7: Support Vector Machine (SVM)

Observations

• Solution is a linear combination of the inputs: w = Σi ai ti xi

where ai > 0

• Mistake driven: only points on which we make a mistake influence the solution!

• Support vectors: the points with non-zero ai

Page 8: Support Vector Machine (SVM)

Dual representation

• Rewrite the basic function: f(x) = <w,x> + b = Σi ai ti <xi, x> + b, using

w = Σi ai ti xi

• Change the update rule: IF tj ( Σi ai ti <xi, xj> + b ) < 0

THEN aj = aj + 1

• Observation: the data appears only inside inner products!
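The same algorithm in its dual form, as a sketch; the kernel argument k is a placeholder for any inner product or kernel function:

    import numpy as np

    def kernel_perceptron(X, t, k=np.dot, epochs=10):
        """Dual perceptron: only the counts a_j change; data enters only through k(x_i, x_j)."""
        n = X.shape[0]
        a, b = np.zeros(n), 0.0
        K = np.array([[k(X[i], X[j]) for j in range(n)] for i in range(n)])  # Gram matrix
        for _ in range(epochs):
            for j in range(n):
                f_j = np.sum(a * t * K[:, j]) + b
                if t[j] * f_j <= 0:        # mistake on (x_j, t_j)
                    a[j] += 1              # a_j = a_j + 1
                    b += t[j]
        return a, b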

Page 9: Support Vector Machine (SVM)

Limitation of Perceptron

• Only linear separations

• Only converges for linearly separable data

• Only defined on vectorial data

Page 10: Support Vector Machine (SVM)

The idea of a Kernel

• Embed the data into a different space

• Possibly higher dimension

• Linearly separable in the new space.

(Figure: the original problem and the transformed problem.)

Page 11: Support Vector Machine (SVM)

Kernel Mapping

• Need only to compute inner-products.

• Mapping: M(x)

• Kernel: K(x,y) = < M(x) , M(y)>

• Dimensionality of M(x): unimportant!

• Need only to compute K(x,y)

• Using it in the embedded space: Replace <x,y> by K(x,y)

Page 12: Support Vector Machine (SVM)

Example

x = (x1, x2); z = (z1, z2); K(x,z) = (<x,z>)^2

K(x,z) = (<x,z>)^2 = (x1 z1 + x2 z2)^2

= x1^2 z1^2 + 2 x1 z1 x2 z2 + x2^2 z2^2

= < [x1^2, x2^2, sqrt(2) x1 x2] , [z1^2, z2^2, sqrt(2) z1 z2] >

= <M(x), M(z)>, with M(v) = (v1^2, v2^2, sqrt(2) v1 v2)
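A quick numerical check of this identity; the two vectors are arbitrary examples:

    import numpy as np

    def M(v):
        # explicit embedding for the degree-2 kernel
        return np.array([v[0]**2, v[1]**2, np.sqrt(2) * v[0] * v[1]])

    x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
    print(np.dot(x, z) ** 2)      # K(x,z) = (<x,z>)^2  -> 1.0
    print(np.dot(M(x), M(z)))     # <M(x), M(z)>        -> 1.0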

Page 13: Support Vector Machine (SVM)

Polynomial Kernel

(Figure: the original problem and the transformed problem.)

Page 14: Support Vector Machine (SVM)

Kernel Matrix

K(1,1) K(1,2) K(1,3) K(1,4)
K(2,1) K(2,2) K(2,3) K(2,4)
K(3,1) K(3,2) K(3,3) K(3,4)
K(4,1) K(4,2) K(4,3) K(4,4)
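A sketch of how such a kernel matrix would be computed for a small sample; the four points and the degree-2 kernel are arbitrary choices:

    import numpy as np

    def gram_matrix(X, k):
        """K[i, j] = k(x_i, x_j) for every pair of sample points."""
        n = X.shape[0]
        K = np.empty((n, n))
        for i in range(n):
            for j in range(n):
                K[i, j] = k(X[i], X[j])
        return K

    X = np.array([[0., 1.], [1., 0.], [1., 1.], [2., 2.]])
    print(gram_matrix(X, lambda x, z: np.dot(x, z) ** 2))   # a 4x4 matrix as above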

Page 15: Support Vector Machine (SVM)

Example of Basic Kernels

• Polynomial: K(x,z) = (<x,z>)^d

• Gaussian: K(x,z) = exp{ -||x-z||^2 / 2 }
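Both kernels written out as functions; the degree d and the width sigma are free parameters chosen here for illustration:

    import numpy as np

    def polynomial_kernel(x, z, d=2):
        return np.dot(x, z) ** d

    def gaussian_kernel(x, z, sigma=1.0):
        # sigma = 1 recovers the exp{-||x-z||^2 / 2} form on the slide
        return np.exp(-np.linalg.norm(x - z) ** 2 / (2 * sigma ** 2))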

Page 16: Support Vector Machine (SVM)

Kernel: Closure Properties

• K(x,z) = K1(x,z) + c, for a constant c >= 0

• K(x,z) = c * K1(x,z), for c > 0

• K(x,z) = K1(x,z) * K2(x,z)

• K(x,z) = K1(x,z) + K2(x,z)

• Create new kernels using basic ones!
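For instance, a sketch of a new kernel assembled from these rules; k1, k2, and the constant c are placeholders for any valid kernels and any positive constant:

    def combined_kernel(x, z, k1, k2, c=1.0):
        # sum, positive scaling, and product of kernels are all kernels again
        return c * k1(x, z) + k1(x, z) * k2(x, z)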

Page 17: Support Vector Machine (SVM)

Support Vector Machines

• Linear Learning Machines (LLM)

• Use dual representation

• Work in the kernel-induced feature space: f(x) = Σi ai ti K(xi, x) + b

• Which hyperplane to select?
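The decision function as a sketch; the support vectors sv_X, coefficients a, labels t, and bias b are assumed to come out of training:

    import numpy as np

    def svm_decision(x, sv_X, a, t, b, k):
        """f(x) = sum_i a_i t_i K(x_i, x) + b, summed over the support vectors."""
        return sum(a[i] * t[i] * k(sv_X[i], x) for i in range(len(a))) + b

    # classification: h(x) = sign(f(x))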

Page 18: Support Vector Machine (SVM)

Generalization of SVM

• PAC theory: error = O( VCdim / m )

• Problem: VCdim >> m

• No preference between consistent hyperplanes

Page 19: Support Vector Machine (SVM)

Margin based bounds

• H: Basic Hypothesis class

• conv(H): finite convex combinations of H

• D: Distribution over X and {+1,-1}

• S: Sample of size m over D

Page 20: Support Vector Machine (SVM)

Margin based bounds

• THEOREM: for every f in conv(H)

Pr_D[ y f(x) <= 0 ] <= Pr_S[ y f(x) <= θ ] + L

where, with probability at least 1 - δ over the sample S,

L = O( sqrt( ( (log m · log|H|) / θ^2 + log(1/δ) ) / m ) )

Page 21: Support Vector Machine (SVM)

Maximal Margin Classifier

• Maximizes the margin

• Minimizes the overfitting due to margin selection.

• Increases the margin rather than reducing dimensionality

Page 22: Support Vector Machine (SVM)

SVM: Support Vectors

Page 23: Support Vector Machine (SVM)

Margins

• Geometric margin: min_i ti f(xi) / ||w||

• Functional margin: min_i ti f(xi)
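A sketch computing both quantities for a given hyperplane (w, b) over a sample (X, t):

    import numpy as np

    def margins(X, t, w, b):
        functional = np.min(t * (X @ w + b))           # min_i t_i f(x_i)
        geometric = functional / np.linalg.norm(w)     # min_i t_i f(x_i) / ||w||
        return functional, geometric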

Page 24: Support Vector Machine (SVM)

Main trick in SVM

• Insist on a functional margin of at least 1; the support vectors have functional margin exactly 1.

• Geometric margin = 1 / ||w||

• Proof: a support vector satisfies ti ( <w,xi> + b ) = 1, so its distance to the hyperplane is ti ( <w,xi> + b ) / ||w|| = 1 / ||w||.

Page 25: Support Vector Machine (SVM)

SVM criteria

• Find a hyperplane (w,b)

• That minimizes: ||w||^2 = <w,w> (equivalently, maximizes the geometric margin 1/||w||)

• Subject to: for all i, ti ( <w,xi> + b ) >= 1

Page 26: Support Vector Machine (SVM)

Quadratic Programming

• Quadratic objective function.

• Linear constraints.

• Unique optimum (convex problem).

• Polynomial time algorithms.

Page 27: Support Vector Machine (SVM)

Dual Problem

• Maximize: W(a) = Σi ai - 1/2 Σi,j ai aj ti tj K(xi, xj)

• Subject to: Σi ai ti = 0

and ai >= 0
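A minimal sketch of solving this hard-margin dual with a general-purpose solver; scipy's SLSQP is used here only for illustration, not what a production SVM would rely on:

    import numpy as np
    from scipy.optimize import minimize

    def solve_dual(K, t):
        """Maximize W(a) = sum_i a_i - 1/2 sum_{i,j} a_i a_j t_i t_j K_ij
        subject to sum_i a_i t_i = 0 and a_i >= 0."""
        n = len(t)
        Q = np.outer(t, t) * K

        def neg_W(a):                                  # minimize -W(a)
            return 0.5 * a @ Q @ a - np.sum(a)

        res = minimize(neg_W, np.zeros(n),
                       bounds=[(0.0, None)] * n,
                       constraints=[{"type": "eq", "fun": lambda a: a @ t}])
        return res.x                                   # the coefficients a_i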

Page 28: Support Vector Machine (SVM)

Applications: Text

• Classify a text into given categories: sports, news, business, science, …

• Feature space: bag of words, a huge sparse vector!

Page 29: Support Vector Machine (SVM)

Applications: Text

• Practicalities: Mw(x) = tfw · log(idfw) / K (K a normalization constant)

tfw = term frequency of w in the document

idfw = inverse document frequency of w:

idfw = # documents / # documents containing w

• Inner product <M(x),M(z)>: cheap for sparse vectors

• SVM: finds a hyperplane in "document space"
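A sketch of this feature map on raw strings; the whitespace tokenization and the choice of the Euclidean norm as the normalizer K are simplifying assumptions:

    import math
    from collections import Counter

    def tfidf(doc, corpus):
        """M_w(x) = tf_w * log(idf_w) / K, returned as a sparse dict word -> weight."""
        tf = Counter(doc.split())
        feats = {}
        for w, f in tf.items():
            df = sum(1 for d in corpus if w in d.split())        # documents containing w
            feats[w] = f * math.log(len(corpus) / max(df, 1))    # tf_w * log(idf_w)
        norm = math.sqrt(sum(v * v for v in feats.values())) or 1.0
        return {w: v / norm for w, v in feats.items()}

    def sparse_dot(u, v):
        # inner product <M(x), M(z)> touches only the words the two documents share
        return sum(val * v[w] for w, val in u.items() if w in v)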