An Introduction to Support Vector Machines - 林宗勳 - cmlab.csie.ntu...
Support Vector Machines ([email protected]) Support Vector Machines(SVM)(Classification) Vapnik SVM
SVM SVM (hyperplane)
(margin)
Given training data {(x_i, y_i), i = 1, ..., n} with x_i ∈ R^d and y_i ∈ {+1, -1}, define the linear function f(x) = w^T x + b. A point x is assigned label +1 when f(x) > 0 and label -1 when f(x) < 0. Any hyperplane f(x) = 0 that separates the two classes is a separating hyperplane; among all separating hyperplanes, the one that maximizes the margin is the optimal separating hyperplane (OSH).
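The classification rule above can be sketched in a few lines; the weight vector, bias, and sample points below are made-up values for illustration:

```python
# Classify points by the sign of f(x) = w^T x + b.
# w, b, and the sample points are hypothetical values, not from the text.

def f(w, b, x):
    """Linear decision function f(x) = w^T x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Label +1 if f(x) > 0, else -1."""
    return 1 if f(w, b, x) > 0 else -1

w = [2.0, -1.0]   # hypothetical weight vector
b = -1.0          # hypothetical bias

print(classify(w, b, [3.0, 1.0]))   # f = 2*3 - 1 - 1 = 4 > 0 -> +1
print(classify(w, b, [0.0, 2.0]))   # f = 0 - 2 - 1 = -3 < 0 -> -1
```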
Support hyperplanes: the two hyperplanes parallel to the separating hyperplane that pass through the training points closest to it, one for each class. The OSH is the separating hyperplane whose support hyperplanes are farthest apart. Writing the support hyperplanes as

    w^T x + b = +1
    w^T x + b = -1

involves no loss of generality: the pair (w, b) is only determined up to scale (an over-parameterized problem), so the right-hand sides can always be rescaled to ±1. Among all separating hyperplanes, the OSH is the one whose distance d to its support hyperplanes is largest.
The distance between two parallel hyperplanes w^T x + b = c_1 and w^T x + b = c_2 is |c_1 - c_2| / ||w||, so each support hyperplane w^T x + b = ±1 lies at distance

    d = 1 / ||w||

from the separating hyperplane w^T x + b = 0. The margin of a separating hyperplane is measured by this distance d.
margin = 2d = 2/||w||

Maximizing the margin is therefore equivalent to minimizing ||w||. The support hyperplanes must still separate the data correctly, i.e. every training point satisfies

    w^T x_i + b >= +1   if y_i = +1
    w^T x_i + b <= -1   if y_i = -1

or, combined into a single inequality,

    y_i (w^T x_i + b) - 1 >= 0.

The optimal separating hyperplane is thus the solution of

    minimize    (1/2) ||w||^2
    subject to  y_i (w^T x_i + b) - 1 >= 0,   i = 1, ..., n

This is the primal problem of the SVM. It is solved with the Lagrange multiplier method: form L(w, b, α) with one multiplier α_i per constraint:
    L(w, b, α) = (1/2) ||w||^2 - sum_{i=1}^{N} α_i [ y_i (w^T x_i + b) - 1 ]
where the multipliers satisfy

    α_i >= 0   if y_i (w^T x_i + b) - 1 = 0
    α_i = 0    if y_i (w^T x_i + b) - 1 > 0

For background on the Lagrange multiplier method, see:
http://episte.math.ntu.edu.tw/entries/en_lagrange_mul/index.html
http://episte.math.ntu.edu.tw/entries/en_lagrange_mul/notes.html
Tutorial on Lagrange Multipliers:
http://www.twocw.net/mit/NR/rdonlyres/Electrical-Engineering-and-Computer-Science/6-867Machine-LearningFall2002/CFAA1B28-850A-422E-AF84-4A1C1DB51D2C/0/lagrange.pdf
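As a concrete instance of the method the tutorials above describe, consider a toy problem (not from the text): maximize f(x, y) = x + y subject to g(x, y) = x^2 + y^2 - 1 = 0. Stationarity ∇f = λ∇g gives 1 = 2λx and 1 = 2λy, hence x = y, and the constraint then forces x = y = 1/√2:

```python
import math

# Toy Lagrange-multiplier check (hypothetical example, not from the text):
# maximize f(x, y) = x + y subject to g(x, y) = x^2 + y^2 - 1 = 0.
# Stationarity grad f = lam * grad g gives 1 = 2*lam*x and 1 = 2*lam*y.

lam = 1 / math.sqrt(2)
x = y = 1 / (2 * lam)          # from 1 = 2*lam*x

# The candidate satisfies both the constraint and the stationarity conditions:
assert abs(x**2 + y**2 - 1) < 1e-12        # g(x, y) = 0
assert abs(1 - 2 * lam * x) < 1e-12        # df/dx = lam * dg/dx
print(x + y)                                # maximum value sqrt(2)
```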
Setting the derivatives of L with respect to w and b to zero:

    ∂L/∂w = 0   =>   w* = sum_{i=1}^{N} α_i y_i x_i
    ∂L/∂b = 0   =>   sum_{i=1}^{N} α_i y_i = 0
Substituting these back into L eliminates w and b and yields the dual Lagrangian:

    maximize    L_D = sum_{i=1}^{N} α_i - (1/2) sum_{i,j} α_i α_j y_i y_j x_i^T x_j
    subject to  α_i >= 0,   sum_{i=1}^{N} α_i y_i = 0
Collecting everything, the solution is characterized by the KKT (Karush-Kuhn-Tucker) conditions:

    ∂L_p/∂w = 0   =>   w = sum_{i=1}^{N} α_i y_i x_i
    ∂L_p/∂b = 0   =>   sum_{i=1}^{N} α_i y_i = 0
    y_i (w^T x_i + b) - 1 >= 0
    α_i >= 0                                  (Lagrange multiplier condition)
    α_i [ y_i (w^T x_i + b) - 1 ] = 0         (complementary slackness)

Support vectors: by complementary slackness, any training point with α_i > 0 must lie exactly on a support hyperplane; such points are called support vectors.
Only the support vectors (the points with α_i > 0) appear in the final classifier:

    f(x) = w*^T x + b* = sum_i α_i y_i x_i^T x + b*

    b* = y_j - sum_i α_i y_i x_i^T x_j,   where x_j is any support vector
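Under the formulas above, a tiny made-up example (two 1-D support vectors with hand-picked multipliers) shows how f(x) is evaluated from the support vectors alone:

```python
# Evaluate f(x) = sum_i alpha_i * y_i * (x_i^T x) + b* using only the support
# vectors.  The points and multipliers below are hypothetical.

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

# Support vectors (1-D points), labels, and multipliers; alpha = 0.5 each
# satisfies the dual constraint sum_i alpha_i * y_i = 0.
svs    = [[1.0], [-1.0]]
ys     = [1, -1]
alphas = [0.5, 0.5]

# b* = y_j - sum_i alpha_i y_i x_i^T x_j for any support vector x_j
j = 0
b = ys[j] - sum(a * y * dot(x, svs[j]) for a, y, x in zip(alphas, ys, svs))

def f(x):
    return sum(a * y * dot(sv, x) for a, y, sv in zip(alphas, ys, svs)) + b

print(f([2.0]))   # w = sum alpha_i y_i x_i = 1 and b = 0, so f(2) = 2.0
```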
Kernel

When the data are not linearly separable in the input space, they can be mapped by some φ into a higher-dimensional feature space and separated there. Since the training data x enter the dual problem only through inner products x_i^T x_j, it suffices to replace x_i^T x_j with φ(x_i)^T φ(x_j); a kernel function computes this inner product directly, without ever forming φ(x). A popular choice is the radial basis function (RBF) kernel:

    k(x_i, x_j) = φ(x_i)^T φ(x_j) = exp( - ||x_i - x_j||^2 / (2 σ^2) )
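The RBF kernel above is straightforward to compute directly; σ below is a free parameter chosen only for illustration:

```python
import math

# RBF kernel k(xi, xj) = exp(-||xi - xj||^2 / (2 * sigma^2)).
# sigma = 1.0 is an arbitrary illustrative choice.

def rbf_kernel(xi, xj, sigma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq_dist / (2 * sigma ** 2))

print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))   # identical points -> 1.0
print(rbf_kernel([0.0, 0.0], [3.0, 4.0]))   # distant points -> near 0
```

Note that k(x, x) = 1 for every x and that the kernel is symmetric, as required of an inner product in feature space.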
The Non-Separable Case

Even with a kernel, the training data may not be perfectly separable, so no OSH exists. The constraints are therefore relaxed with slack variables ξ_i:

    w^T x_i + b >= +1 - ξ_i   if y_i = +1
    w^T x_i + b <= -1 + ξ_i   if y_i = -1
    ξ_i >= 0

Each violation is charged a cost controlled by a parameter C:

    Cost = C sum_i ξ_i^k

With k = 1, the soft-margin primal problem becomes

    minimize    (1/2) ||w||^2 + C sum_i ξ_i
    subject to  y_i (w^T x_i + b) >= 1 - ξ_i,   ξ_i >= 0
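The objective above can be evaluated directly for any candidate (w, b): each slack ξ_i = max(0, 1 - y_i(w^T x_i + b)) measures how far point i falls on the wrong side of its support hyperplane. The data and parameters below are made up:

```python
# Compute the soft-margin objective (1/2)||w||^2 + C * sum_i xi_i for a
# candidate hyperplane.  All numbers are hypothetical.

def f(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def soft_margin_objective(w, b, X, y, C):
    # Slack of point i: zero when it clears its support hyperplane,
    # positive when it violates the margin.
    slacks = [max(0.0, 1.0 - yi * f(w, b, xi)) for xi, yi in zip(X, y)]
    return 0.5 * sum(wi * wi for wi in w) + C * sum(slacks), slacks

X = [[2.0], [0.5], [-2.0]]
y = [1, 1, -1]          # the point at 0.5 violates the margin for w=1, b=0
obj, slacks = soft_margin_objective([1.0], 0.0, X, y, C=10.0)
print(slacks)            # [0.0, 0.5, 0.0]
print(obj)               # 0.5*1 + 10*0.5 = 5.5
```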
Applying the Lagrange multiplier method again, with multipliers α_i for the margin constraints and μ_i for ξ_i >= 0:

    L(w, b, ξ, α, μ) = (1/2) ||w||^2 + C sum_{i=1}^{N} ξ_i
                       - sum_{i=1}^{N} α_i [ y_i (w^T x_i + b) - 1 + ξ_i ]
                       - sum_{i=1}^{N} μ_i ξ_i
The KKT conditions become:

    ∂L_p/∂w = 0    =>   w = sum_{i=1}^{N} α_i y_i x_i
    ∂L_p/∂b = 0    =>   sum_{i=1}^{N} α_i y_i = 0
    ∂L_p/∂ξ_i = 0  =>   C - α_i - μ_i = 0
    y_i (w^T x_i + b) - 1 + ξ_i >= 0,   ξ_i >= 0
    α_i >= 0                                   (Lagrange multiplier condition)
    μ_i >= 0                                   (Lagrange multiplier condition)
    α_i [ y_i (w^T x_i + b) - 1 + ξ_i ] = 0    (complementary slackness)
    μ_i ξ_i = 0                                (complementary slackness)

More on SVM: http://www.twocw.net/mit/Electrical-Engineering-and-Computer-Science/6-867Machine-LearningFall2002/LectureNotes/index.htm

Multi-class SVM

A single SVM is a binary classifier. To classify more than two classes, several SVMs are combined:

a. one-against-rest
With k classes, train k SVMs, where the m-th SVM separates class m from all the remaining classes; a test point is assigned to the class whose SVM gives it the largest decision value.

b. one-against-one

Train one SVM for every pair of classes, k(k-1)/2 SVMs in total. Each pairwise SVM for classes a and b votes for either a or b, and the test point is assigned to the class with the most votes.
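The one-against-one voting scheme can be sketched as follows; the pairwise decision function below is a hypothetical stand-in for trained SVMs (it simply picks the class whose made-up center is nearer):

```python
from itertools import combinations
from collections import Counter

# One-against-one multi-class sketch: one binary decision per pair of
# classes, with a majority vote over all k(k-1)/2 pairs.

def pairwise_winner(a, b, x):
    # Hypothetical pairwise "SVM": the class whose (made-up) center is
    # nearer to x wins.  A real system would call a trained binary SVM.
    centers = {1: -2.0, 2: 0.0, 3: 3.0}
    return a if abs(x - centers[a]) <= abs(x - centers[b]) else b

def one_vs_one_predict(classes, x):
    votes = Counter(pairwise_winner(a, b, x) for a, b in combinations(classes, 2))
    return votes.most_common(1)[0][0]

print(one_vs_one_predict([1, 2, 3], 2.5))   # class 3's center is nearest
```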
SVMs can also be arranged hierarchically: for example, with classes 1, 3, 6, 7, a first SVM separates {1, 3} from {6, 7}, and a second-level SVM (the {1, 3} SVM or the {6, 7} SVM) then decides the final class.

Example: face recognition with libsvm

libsvm is available at http://www.csie.ntu.edu.tw/~cjlin/libsvm/, with further documentation at http://www.csie.ntu.edu.tw/~cjlin/libsvm/otherdocuments/. The experiment uses the Yale face database (http://cvc.yale.edu/projects/yalefaces/yalefaces.html): 15 subjects with 11 images each, taken under different illumination conditions.
For each subject, 8 images are used for training and the remaining 3 for testing.

1. Each image is resized to 20 x 20 pixels.

2. PCA is applied for feature extraction, producing eigenfaces:

a. Compute the average face over the M training images:

    Ψ = (1/M) sum_{n=1}^{M} Γ_n

b. Subtract the average face from every training face vector:

    Φ_i = Γ_i - Ψ

c. Collect the vectors Φ_i as the columns of a matrix A.

d. Form the covariance matrix

    C = (1/M) sum_{n=1}^{M} Φ_n Φ_n^T = A A^T
      = [ var(p_1)        ...  cov(p_1, p_N) ]
        [    ...          ...       ...      ]
        [ cov(p_N, p_1)   ...  var(p_N)      ]

where p_1, ..., p_N are the N pixel values.

e. Compute the eigenvectors of the covariance matrix.

f. Keep the 14 eigenvectors with the largest eigenvalues.

g. Project each mean-subtracted face onto these 14 eigenvectors, giving a 14-dimensional feature vector per image.
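The steps above can be sketched in miniature; the three 4-pixel "faces" below are made-up data, only one eigenvector is extracted (via power iteration rather than a library eigensolver), and a real run would use 400-pixel images and 14 eigenvectors:

```python
# Minimal eigenface-style PCA sketch on hypothetical 4-pixel "faces":
# average face, mean subtraction, covariance, top eigenvector, projection.

def vec_sub(a, b): return [x - y for x, y in zip(a, b)]
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def mat_vec(M, v): return [dot(row, v) for row in M]
def norm(v): return dot(v, v) ** 0.5

faces = [  # M = 3 training "faces", each flattened to N = 4 pixels
    [2.0, 0.0, 1.0, 3.0],
    [4.0, 2.0, 1.0, 5.0],
    [0.0, 1.0, 1.0, 1.0],
]
M = len(faces)

# a. average face
avg = [sum(col) / M for col in zip(*faces)]
# b. subtract the average face from every face
phis = [vec_sub(f, avg) for f in faces]
# d. covariance matrix C over the pixel dimensions
N = len(avg)
C = [[sum(p[i] * p[j] for p in phis) / M for j in range(N)] for i in range(N)]

# e./f. dominant eigenvector of C via power iteration
v = [1.0] * N
for _ in range(200):
    w = mat_vec(C, v)
    n = norm(w)
    v = [x / n for x in w]

# g. project each mean-subtracted face onto the eigenvector -> 1-D feature
features = [dot(p, v) for p in phis]
print(features)
```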
Training with libsvm

libsvm expects one training sample per line, in the format:

    <label> <index1>:<value1> <index2>:<value2> ...

Here each image becomes one line: its class label (1 to 15) followed by its 14 PCA features. TrainingSet.txt therefore contains 120 lines (8 per subject) such as:

    15 1:-916.79571 2:-569.626953 3:299.984253 ... 14:163.308319

Before training, every feature is normalized to [-1, +1] with libsvm's svmscale tool:

    svmscale usage:
    svmscale [-l lower] [-u upper] [-y y_lower y_upper] [-s save_filename] [-r restore_filename] filename
    (default: lower = -1, upper = 1, no y scaling)

    svmscale -l -1 -u 1 -s range TrainingSet.txt > TrainingSet.scale

The scaled data are then trained with svmtrain, which produces the model file TrainingSet.scale.model:

    svmtrain TrainingSet.scale
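svmscale's default behavior, linearly mapping each feature to [-1, +1] using the per-feature minimum and maximum of the training set, can be sketched as follows; the feature values are made up:

```python
# Per-feature linear scaling to [lower, upper], mirroring svmscale's
# default of lower = -1, upper = 1.  Feature values are hypothetical.

def fit_ranges(rows):
    """Record each feature's (min, max) over the training rows."""
    return [(min(col), max(col)) for col in zip(*rows)]

def scale(row, ranges, lower=-1.0, upper=1.0):
    out = []
    for v, (lo, hi) in zip(row, ranges):
        if hi == lo:                      # constant feature: leave untouched
            out.append(v)
        else:
            out.append(lower + (upper - lower) * (v - lo) / (hi - lo))
    return out

train = [[599.25, 287.78], [142.81, 540.75], [527.04, 183.78]]
ranges = fit_ranges(train)
scaled = [scale(r, ranges) for r in train]
print(scaled)                   # each feature now spans [-1, 1]

# The same ranges must later be applied to the test data (svmscale's -r
# option restores them), even if test values then fall outside [-1, 1].
print(scale([700.0, 100.0], ranges))
```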
The resulting TrainingSet.scale.model begins with the learned parameters:

    svm_type c_svc
    kernel_type rbf
    gamma 0.0714286
    nr_class 15
    total_sv 119
    rho 0.0148653 0.0397785 -0.0480648 -0.128948 0.11245 0.239645 0.0185018
    label 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
    nr_sv 8 8 8 8 7 8 8 8 8 8 8 8 8 8 8
    SV

Prediction uses svmpredict:

    Usage: svm-predict [options] test_file model_file output_file
    options:
    -b probability_estimates: whether to predict probability estimates, 0 or 1 (default 0); for one-class SVM only 0 is supported

    svmpredict TrainingSet.scale TrainingSet.scale.model out
    Accuracy = 94.1667% (113/120) (classification)
    Mean squared error = 3.95 (regression)
    Squared correlation coefficient = 0.804937 (regression)

The 94.1667% here is accuracy on the training set itself, so it says little about generalization. Better parameters are found by cross-validation combined with a grid search over the two SVM parameters:

1. cost: the penalty C of the non-separable-case SVM;
2. gamma: the parameter of the RBF kernel.

Different values of cost and gamma are tried and compared, and the best pair is kept.
libsvm supports n-fold cross-validation: the training data are split into n parts, each part in turn is held out for testing while the remaining parts are used for training, and the accuracies are averaged. Unlike training accuracy, this estimate penalizes over-fitting, so it is a sound criterion for choosing parameters.

libsvm ships with grid.py, which performs the grid search: it evaluates the cross-validation accuracy of every (C, gamma) combination on a grid. The default grid is

    C = 2^-5, 2^-3, ..., 2^15        gamma = 2^-15, 2^-13, ..., 2^3

Running grid.py TrainingSet.scale prints lines such as:

    [local] 13 3 43.3333 (best c=128.0, g=0.0078125, rate=82.5)
    [local] 13 -9 82.5 (best c=128.0, g=0.0078125, rate=82.5)
    [local] 13 -3 80.8333 (best c=128.0, g=0.0078125, rate=82.5)

The best cross-validation rate, 82.5%, is reached at cost = 128.0 and gamma = 0.0078125, so the final model is retrained with cost=128, gamma=0.0078125.

Finally, the test data must be scaled with the same scaling parameters as the training data (the range file saved by svmscale) before running svmpredict.
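The grid search itself is a pair of nested loops over the default grid. The sketch below enumerates that grid; cross_val_accuracy is a hypothetical stand-in for running n-fold cross-validation with libsvm at each grid point, rigged here only so that its peak matches the values reported above (C = 128, gamma = 0.0078125):

```python
import math

# Grid-search sketch over (cost, gamma) on grid.py's default grid:
# C = 2^-5, 2^-3, ..., 2^15 and gamma = 2^-15, 2^-13, ..., 2^3.

def cross_val_accuracy(C, gamma):
    # Placeholder score, NOT a real cross-validation run: it simply peaks
    # at log2(C) = 7 and log2(gamma) = -7 to mimic the reported best.
    return 82.5 - abs(math.log2(C) - 7) - abs(math.log2(gamma) + 7)

Cs     = [2.0 ** e for e in range(-5, 16, 2)]   # 11 values of C
gammas = [2.0 ** e for e in range(-15, 4, 2)]   # 10 values of gamma

best = max(((C, g) for C in Cs for g in gammas),
           key=lambda p: cross_val_accuracy(*p))
print(best)   # (128.0, 0.0078125)
```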