Support Vector Machines: An Introduction
林宗勳 ([email protected])

Preface

Support Vector Machines (SVMs) are a classification algorithm, a machine learning method proposed by Vapnik and others on the basis of statistical learning theory. SVMs show distinctive advantages on small-sample, nonlinear, and high-dimensional pattern recognition problems. They have been applied to practical problems such as handwriting recognition, 3-D object recognition, face recognition, and text and image classification, where they outperform existing learning methods and generalize well: decision rules learned from a limited training set still achieve a small error on an independent test set.

The Concept of SVM

Simply put, SVM solves the following problem: find a hyperplane that separates two different sets of points. The term "hyperplane" is used because real data may be high-dimensional, and a hyperplane is the analogue of a plane in a high-dimensional space. In the two-dimensional example shown in the figure, we want a line that separates the black points from the white points, and we also want the line's distance to the boundary of each set (the margin) to be as large as possible, so that points can be classified unambiguously; otherwise numerical precision issues can easily introduce errors.



Formally, we are given training data $\{(x_i, y_i)\},\ i = 1, \dots, n$, where $x_i \in R^d$ and $y_i \in \{+1, -1\}$.

We look for a decision function of the form $f(x) = w^T x - b$ such that $f(x_i) \geq 0$ for the points with $y_i = +1$ and $f(x_i) < 0$ for the points with $y_i = -1$; the sign of $f(x)$ then classifies a new point $x$.

Any hyperplane that separates the two classes is called a separating hyperplane; among all separating hyperplanes, the one that maximizes the margin is called the optimal separating hyperplane (OSH).

Support hyperplanes: the two hyperplanes parallel to the separating hyperplane that pass through the points of each class closest to it are called support hyperplanes. The optimal separating hyperplane lies exactly halfway between its two support hyperplanes, which can be written as

$$w^T x = b + \delta, \qquad w^T x = b - \delta.$$

The parameters $w$, $b$, $\delta$ are over-parameterized (an over-parameterized problem): multiplying all three by a common factor describes the same pair of hyperplanes. We therefore fix the scale by setting $\delta = 1$. The optimal separating hyperplane is then the separating hyperplane whose margin, the distance between its two support hyperplanes, is maximal. Let $d$ be the distance from the separating hyperplane $w^T x = b$ to either support hyperplane:

$$d = \frac{|(b + 1) - b|}{\|w\|} = \frac{1}{\|w\|}, \qquad d = \frac{|b - (b - 1)|}{\|w\|} = \frac{1}{\|w\|}.$$

The margin of the separating hyperplane is therefore margin $= 2d = 2 / \|w\|$, so minimizing $\|w\|$ maximizes the margin.
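As a quick numeric illustration (a hypothetical example, not from the original text):

    $$w = (3, 4)^T \;\Rightarrow\; \|w\| = \sqrt{3^2 + 4^2} = 5 \;\Rightarrow\; \text{margin} = \frac{2}{\|w\|} = \frac{2}{5},$$

so among hyperplanes obeying the $\delta = 1$ normalization, a smaller $\|w\|$ means a wider margin.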

Requiring that no training point lies between the two support hyperplanes gives the constraints

$$w^T x_i - b \geq +1 \ \text{ for } y_i = +1, \qquad w^T x_i - b \leq -1 \ \text{ for } y_i = -1,$$

which combine into the single condition

$$y_i (w^T x_i - b) - 1 \geq 0.$$

The optimization problem is therefore

minimize $\ \frac{1}{2}\|w\|^2$
subject to $\ y_i (w^T x_i - b) - 1 \geq 0 \quad \forall i.$

This is the primal problem of the SVM. It can be solved with the Lagrange Multiplier Method: define $L(w, b, \alpha_i)$, where the $\alpha_i$ are the Lagrange multipliers:

$$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{N} \alpha_i \left[ y_i (w^T x_i - b) - 1 \right].$$

At the optimum, points with $y_i (w^T x_i - b) - 1 = 0$ may have $\alpha_i \geq 0$, while points with $y_i (w^T x_i - b) - 1 > 0$ must have $\alpha_i = 0$. Readers unfamiliar with the Lagrange Multiplier Method may consult:
http://episte.math.ntu.edu.tw/entries/en_lagrange_mul/index.html
http://episte.math.ntu.edu.tw/entries/en_lagrange_mul/notes.html

See also the Tutorial on Lagrange Multipliers:
http://www.twocw.net/mit/NR/rdonlyres/Electrical-Engineering-and-Computer-Science/6-867Machine-LearningFall2002/CFAA1B28-850A-422E-AF84-4A1C1DB51D2C/0/lagrange.pdf

Minimizing $L$ with respect to $w$ and $b$ gives

$$\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w^* = \sum_{i=1}^{N} \alpha_i y_i x_i, \qquad \frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{N} \alpha_i y_i = 0.$$

Substituting these back into $L$ yields the dual Lagrangian, which is maximized over the $\alpha_i$:

maximize $\ L_D = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j\, x_i^T x_j$
subject to $\ \alpha_i \geq 0, \quad \sum_{i=1}^{N} \alpha_i y_i = 0.$

Collecting the optimality conditions:

$$\frac{\partial L_p}{\partial w} = 0 \;\Rightarrow\; w - \sum_{i=1}^{N} \alpha_i y_i x_i = 0$$
$$\frac{\partial L_p}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{N} \alpha_i y_i = 0$$
$$y_i (w^T x_i - b) - 1 \geq 0$$
$$\alpha_i \geq 0 \quad \text{(Lagrange multiplier condition)}$$
$$\alpha_i \left[ y_i (w^T x_i - b) - 1 \right] = 0 \quad \text{(complementary slackness)}$$

Together these are the KKT (Karush-Kuhn-Tucker) conditions. Support vectors: among the training data, the points that satisfy the KKT conditions with $\alpha_i > 0$ are called support vectors; they lie exactly on the support hyperplanes.
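To make the conditions concrete, here is a small worked example (my own, not in the original). Take two training points $x_1 = (1, 1)^T$ with $y_1 = +1$ and $x_2 = (-1, -1)^T$ with $y_2 = -1$. By symmetry the optimal separating hyperplane is $w^T x = 0$ with

    $$w = \left(\tfrac{1}{2}, \tfrac{1}{2}\right)^T, \quad b = 0, \quad \alpha_1 = \alpha_2 = \tfrac{1}{4}.$$

Both constraints hold with equality ($y_i (w^T x_i - b) - 1 = 0$), $\sum_i \alpha_i y_i = \tfrac{1}{4} - \tfrac{1}{4} = 0$, and $\sum_i \alpha_i y_i x_i = \tfrac{1}{4}(1,1)^T + \tfrac{1}{4}(1,1)^T = w$, so every KKT condition is satisfied and both points are support vectors.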

Once the $\alpha_i$ are known, only the support vectors contribute, and the decision function becomes

$$f(x) = w^{*T} x - b^* = \sum_{i} \alpha_i y_i\, x_i^T x - b^*,$$

where

$$b^* = \sum_{i} \alpha_i y_i\, x_i^T x_j - y_j, \quad \text{for any } j \text{ such that } x_j \text{ is a support vector.}$$
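A minimal sketch of evaluating this decision function (my own illustration, reusing the toy example above; the linear kernel and the variable names are assumptions):

    import numpy as np

    alpha = np.array([0.25, 0.25])             # multipliers alpha_i of the support vectors
    y = np.array([+1.0, -1.0])                 # labels y_i
    X = np.array([[1.0, 1.0], [-1.0, -1.0]])   # support vectors x_i, one per row

    b = np.sum(alpha * y * (X @ X[0])) - y[0]  # b* from any support vector (here j = 0)

    def f(x):
        # f(x) = sum_i alpha_i y_i x_i^T x - b*
        return np.sum(alpha * y * (X @ x)) - b

    print(f(np.array([2.0, 0.5])) > 0)    # True: classified as +1
    print(f(np.array([-1.0, -2.0])) > 0)  # False: classified as -1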

Kernel

When the data are not linearly separable in the input space, we can map each point $x$ by a function $\phi$ into a higher-dimensional feature space where the classes become separable. In the dual problem and the decision function, the data appear only through inner products $x_i^T x_j$, so we never need to compute $\phi(x)$ explicitly: it suffices to replace $x_i^T x_j$ by a kernel function $k(x_i, x_j) = \phi(x_i)^T \phi(x_j)$. A commonly used kernel is the radial basis function (RBF):

$$k(x_i, x_j) = \exp\left( -\frac{\|x_i - x_j\|^2}{2\sigma^2} \right).$$
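A minimal sketch of this kernel (my own illustration). Note that libsvm writes the exponent as $-\gamma \|x_i - x_j\|^2$, i.e. $\gamma = 1 / (2\sigma^2)$; the sigma value below is an arbitrary choice:

    import numpy as np

    def rbf_kernel(xi, xj, sigma=1.0):
        d2 = np.sum((xi - xj) ** 2)              # squared Euclidean distance
        return np.exp(-d2 / (2.0 * sigma ** 2))  # k(xi, xj) in (0, 1]

    print(rbf_kernel(np.array([1.0, 0.0]), np.array([0.0, 0.0])))  # exp(-0.5) = 0.6065...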

The Non-Separable Case

Even with a kernel, the data may not be perfectly separable. In that case no hyperplane satisfies all the constraints above, so we relax them with slack variables $\xi_i$:

$$w^T x_i - b \geq +1 - \xi_i \ \text{ for } y_i = +1, \qquad w^T x_i - b \leq -1 + \xi_i \ \text{ for } y_i = -1, \qquad \xi_i \geq 0,$$

and charge a cost for the violations, weighted by a parameter $C$:

$$\text{Cost} = C \sum_{i} \xi_i^{k}.$$

Taking $k = 1$, the primal problem becomes

minimize $\ \frac{1}{2}\|w\|^2 + C \sum_{i} \xi_i$
subject to $\ y_i (w^T x_i - b) - 1 + \xi_i \geq 0, \quad \xi_i \geq 0.$

Applying the Lagrange Multiplier Method again, with multipliers $\alpha_i$ and $\mu_i$:

$$L(w, b, \xi, \alpha, \mu) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i - \sum_{i=1}^{N} \alpha_i \left[ y_i (w^T x_i - b) - 1 + \xi_i \right] - \sum_{i=1}^{N} \mu_i \xi_i.$$

The KKT conditions become:

$$\frac{\partial L_p}{\partial w} = 0 \;\Rightarrow\; w - \sum_{i=1}^{N} \alpha_i y_i x_i = 0$$
$$\frac{\partial L_p}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{N} \alpha_i y_i = 0$$
$$\frac{\partial L_p}{\partial \xi_i} = 0 \;\Rightarrow\; C - \alpha_i - \mu_i = 0$$
$$y_i (w^T x_i - b) - 1 + \xi_i \geq 0, \qquad \xi_i \geq 0$$
$$\alpha_i \geq 0 \quad \text{(Lagrange multiplier condition)}$$
$$\mu_i \geq 0 \quad \text{(Lagrange multiplier condition)}$$
$$\alpha_i \left[ y_i (w^T x_i - b) - 1 + \xi_i \right] = 0 \quad \text{(complementary slackness)}$$
$$\mu_i \xi_i = 0 \quad \text{(complementary slackness)}$$

For a more detailed treatment of the SVM derivation, see:
http://www.twocw.net/mit/Electrical-Engineering-and-Computer-Science/6-867Machine-LearningFall2002/LectureNotes/index.htm

Multi-class SVM

A basic SVM is a binary classifier. To classify more than two classes with SVMs, two common strategies are:

a. one-against-rest

For k classes, train k SVMs: the m-th SVM is trained with class m on one side and all the remaining classes on the other.

b. one-against-one

Train one SVM for every pair of classes, k(k-1)/2 SVMs in total: the SVM for classes a and b decides between a and b, each pairwise SVM casts a vote, and the class with the most votes wins (a sketch of the voting appears below).
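A minimal sketch of one-against-one voting (my own illustration; 'classifiers' is a hypothetical dict where classifiers[(a, b)](x) is the pairwise SVM for classes a and b and returns either a or b):

    from collections import Counter
    from itertools import combinations

    def predict_one_against_one(x, classes, classifiers):
        votes = Counter()
        for a, b in combinations(sorted(classes), 2):  # all k(k-1)/2 class pairs
            votes[classifiers[(a, b)](x)] += 1         # each pairwise SVM casts one vote
        return votes.most_common(1)[0][0]              # the class with the most votes wins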

c. hierarchical grouping

The classes can also be separated in groups. For example, with classes 1, 3, 6, 7, one SVM first separates the group {1, 3} from the group {6, 7} (an SVM trained on classes 1 and 3 versus classes 6 and 7); depending on which side a point falls, a further SVM then separates 1 from 3, or 6 from 7.

Example: Face Recognition with libsvm

As a concrete example, we apply SVMs to face recognition using the libsvm library:
http://www.csie.ntu.edu.tw/~cjlin/libsvm/
Further libsvm documentation is available at:
http://www.csie.ntu.edu.tw/~cjlin/libsvm/otherdocuments/

The experiment uses the Yale Face Database (http://cvc.yale.edu/projects/yalefaces/yalefaces.html), which contains 15 subjects with 11 images per subject under varying illumination.

For each subject, 8 images are used for training and 3 for testing. The procedure:

1. Normalize each face image to 20 x 20 pixels, so each image becomes a 20 x 20 = 400-dimensional vector.

2. Use PCA for feature extraction, i.e., compute the eigenfaces:

a. Compute the average face

$$\Psi = \frac{1}{M} \sum_{n=1}^{M} \Gamma_n$$

(where M is the number of training images).

b. Subtract the average face from each image vector:

$$\Phi_i = \Gamma_i - \Psi$$

c. Collect the vectors $\Phi_i$ as the columns of a matrix $A$.

d. Compute the covariance matrix of $A$:

$$C = \frac{1}{M} \sum_{n=1}^{M} \Phi_n \Phi_n^T = \frac{1}{M} A A^T = \begin{pmatrix} \operatorname{var}(p_1) & \cdots & \operatorname{cov}(p_1, p_N) \\ \vdots & \ddots & \vdots \\ \operatorname{cov}(p_N, p_1) & \cdots & \operatorname{var}(p_N) \end{pmatrix}$$

e. Compute the eigenvectors of the covariance matrix.

f. Keep the 14 eigenvectors with the largest eigenvalues (the eigenfaces).

g. Project each face vector onto these 14 eigenvectors, giving a 14-dimensional feature vector per image.
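A minimal sketch of steps a-g (my own illustration; it assumes the M training images are already cropped to 20 x 20 and flattened into 400-dimensional rows, and forms the covariance as A^T A / M because images are rows here rather than columns):

    import numpy as np

    def eigenface_features(images, k=14):
        G = np.asarray(images, dtype=float)        # M x 400, one image per row
        psi = G.mean(axis=0)                       # a. average face
        A = G - psi                                # b./c. subtract the average face
        C = A.T @ A / len(G)                       # d. covariance matrix (400 x 400)
        vals, vecs = np.linalg.eigh(C)             # e. eigenvectors (C is symmetric)
        top = vecs[:, np.argsort(vals)[::-1][:k]]  # f. the k largest-eigenvalue ones
        return A @ top                             # g. project: M x k feature vectors

Usage: features = eigenface_features(training_images) returns one 14-dimensional row per face.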

Training and Testing with libsvm

libsvm expects its input files in the following plain-text format, one sample per line:

<label> <index1>:<value1> <index2>:<value2> ...

Our TrainingSet.txt therefore looks like this (features 4 through 13 elided):

1 1:599.252014 2:287.783844 3:618.011169 ... 14:112.386940
1 1:142.808151 2:540.754883 3:236.278305 ... 14:-50.537468
1 1:527.042419 2:183.781281 3:586.409546 ... 14:109.996880
1 1:413.529144 2:150.624924 3:604.855286 ... 14:62.5474210
1 1:192.096954 2:824.704895 3:-345.25378 ... 14:-30.935461
1 1:396.304565 2:291.588989 3:848.980896 ... 14:43.3401150
1 1:492.790314 2:294.677399 3:823.742981 ... 14:96.3041920
1 1:-1109.4270 2:-385.86004 3:205.677658 ... 14:138.645569
2 1:557.744751 2:-123.60966 3:-210.88768 ... 14:127.031754
2 1:547.599426 2:-254.80453 3:496.689636 ... 14:-159.22918
2 1:404.558716 2:-173.10485 3:-214.58169 ... 14:148.403870
2 1:237.526169 2:-147.54145 3:-181.04240 ... 14:41.8436470

and so on for the remaining subjects; a line for subject 15 looks like

15 1:-916.79571 2:-569.626953 3:299.984253 ... 14:163.308319

Because the raw feature values span very different ranges, we first normalize them to [-1, +1] with libsvm's svmscale tool:

svmscale usage:
svmscale [-l lower] [-u upper] [-y y_lower y_upper] [-s save_filename] [-r restore_filename] filename
(default: lower = -1, upper = 1, no y scaling)

svmscale -l -1 -u 1 -s range TrainingSet.txt > TrainingSet.scale

After scaling, we train with svmtrain:

svmtrain TrainingSet.scale

which produces the model file TrainingSet.scale.model.
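Looking back at the file format, a minimal sketch (my own illustration) of writing such a file from Python; 'labels' is a sequence of class ids and 'features' an M x 14 array of values:

    def write_libsvm(path, labels, features):
        with open(path, "w") as f:
            for label, row in zip(labels, features):
                pairs = " ".join(f"{j + 1}:{v:.6f}" for j, v in enumerate(row))
                f.write(f"{label} {pairs}\n")  # e.g. "1 1:599.252014 2:287.783844 ..."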

TrainingSet.scale.model begins with the model parameters:

svm_type c_svc
kernel_type rbf
gamma 0.0714286
nr_class 15
total_sv 119
rho 0.0148653 0.0397785 -0.0480648 -0.128948 0.11245 0.239645 0.0185018
label 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
nr_sv 8 8 8 8 7 8 8 8 8 8 8 8 8 8 8
SV

followed by the list of support vectors. Prediction is done with svmpredict:

Usage: svm-predict [options] test_file model_file output_file
options:
-b probability_estimates: whether to predict probability estimates, 0 or 1 (default 0); for one-class SVM only 0 is supported

svmpredict TrainingSet.scale TrainingSet.scale.model out
Accuracy = 94.1667% (113/120) (classification)
Mean squared error = 3.95 (regression)
Squared correlation coefficient = 0.804937 (regression)

The 94.1667% here is the accuracy on the training set itself. To obtain parameters that generalize, we combine cross-validation with a grid search over the two SVM parameters:

1. cost: the constant C of the non-separable-case SVM above;
2. gamma: the parameter of the RBF kernel.

Different choices of cost and gamma can give very different accuracy, so we search for a good (cost, gamma) pair rather than guessing.

To evaluate a parameter setting, libsvm uses n-fold cross-validation: the training data are split into n parts, each part is held out once for validation while the remaining n-1 parts are used for training, and the n validation accuracies are averaged. Selecting parameters this way guards against over-fitting: parameters that merely memorize the training set score poorly on the held-out folds.
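A minimal sketch of n-fold cross-validation (my own illustration; 'train' and 'accuracy' are hypothetical stand-ins for an svmtrain/svmpredict round trip on index splits):

    import numpy as np

    def cross_validate(X, y, train, accuracy, n=5):
        folds = np.array_split(np.random.permutation(len(y)), n)
        rates = []
        for i in range(n):
            held_out = folds[i]                               # validation fold
            rest = np.concatenate(folds[:i] + folds[i + 1:])  # the other n-1 folds
            model = train(X[rest], y[rest])
            rates.append(accuracy(model, X[held_out], y[held_out]))
        return float(np.mean(rates))                          # averaged validation rate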

libsvm ships with a script, grid.py, that performs the search. grid.py tries parameter combinations over an exponentially spaced grid, by default

$$C = 2^{-5}, 2^{-3}, \dots, 2^{15}, \qquad \gamma = 2^{-15}, 2^{-13}, \dots, 2^{3},$$

and reports the cross-validation rate for each pair:

grid.py TrainingSet.scale
[local] 13 3 43.3333 (best c=128.0, g=0.0078125, rate=82.5)
[local] 13 -9 82.5 (best c=128.0, g=0.0078125, rate=82.5)

[local] 13 -3 80.8333 (best c=128.0, g=0.0078125, rate=82.5)

The best parameters found by grid.py are cost = 128.0 and gamma = 0.0078125, with a cross-validation rate of 82.5%; we then retrain the SVM with cost=128, gamma=0.0078125.
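A minimal sketch of the search grid.py performs (my own illustration; 'cv_rate(c, g)' is a hypothetical stand-in for the cross-validation accuracy obtained with cost c and gamma g):

    def grid_search(cv_rate):
        best = (None, None, -1.0)
        for log2c in range(-5, 16, 2):       # C = 2^-5, 2^-3, ..., 2^15
            for log2g in range(-15, 4, 2):   # gamma = 2^-15, 2^-13, ..., 2^3
                c, g = 2.0 ** log2c, 2.0 ** log2g
                rate = cv_rate(c, g)
                if rate > best[2]:
                    best = (c, g, rate)      # keep the best pair seen so far
        return best                          # (cost, gamma, cross-validation rate)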

Remember that the testing data must be scaled the same way (using the range file saved by svmscale) before prediction.