Two Important Parts of SMO (Selection Heuristics & Stopping Criterion)
A good selection of the pair of points to update speeds up convergence.
The selection heuristics may depend on the stopping criterion.
Stopping criterion: duality gap => it is natural to choose the points that most violate the KKT conditions (too expensive).
How to Solve an Unconstrained MP
Get an initial point and iteratively decrease the objective function value.
Newton's method is highly recommended: it is a locally and quadratically convergent algorithm.
Stop once the stopping criteria are satisfied.
Steepest descent might not be a good choice.
Need to choose a good step size to guarantee global convergence.
Steepest Descent with Exact Line Search
Start with any $x^0 \in \mathbb{R}^n$. Having $x^i$, stop if $\nabla f(x^i) = 0$. Else compute $x^{i+1}$ as follows:
(i) Steepest descent direction: $d^i = -\nabla f(x^i)$
(ii) Exact line search: choose a stepsize $\lambda \in \mathbb{R}$ such that $\frac{d}{d\lambda} f(x^i + \lambda d^i) = \nabla f(x^i + \lambda d^i)' d^i = 0$
(iii) Updating: $x^{i+1} = x^i + \lambda d^i$
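The three steps above can be sketched for the special case of a convex quadratic, where the exact line search has a closed form. This is a minimal illustration, not part of the original slides: the objective f(x) = 0.5 x'Qx - b'x, the function name `steepest_descent_quadratic`, and the tolerance are all assumptions chosen for the example.

```python
import numpy as np

def steepest_descent_quadratic(Q, b, x0, tol=1e-10, max_iter=10_000):
    """Minimize f(x) = 0.5 x'Qx - b'x for a symmetric positive definite Q."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = Q @ x - b                    # gradient of f at the current point
        if np.linalg.norm(g) <= tol:     # stop when the gradient (numerically) vanishes
            break
        d = -g                           # (i) steepest descent direction
        lam = (g @ g) / (d @ Q @ d)      # (ii) exact line search: solves grad f(x + lam d)'d = 0
        x = x + lam * d                  # (iii) update
    return x

Q = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_star = steepest_descent_quadratic(Q, b, np.zeros(2))
```

For a general nonlinear f the stepsize in (ii) has no closed form and has to be found by a one-dimensional search.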
Newton's Method
Start with any $x^0 \in \mathbb{R}^n$. Having $x^i$, stop if $\nabla f(x^i) = 0$. Else compute $x^{i+1}$ as follows:
(i) Newton direction: solve $\nabla^2 f(x^i) d^i = -\nabla f(x^i)$
(ii) Updating: $x^{i+1} = x^i + d^i$
Have to solve a system of linear equations here!
Converges only when $x^0$ is close enough to $x^*$.
Otherwise it may fail to converge to the optimal solution. Example:
$f(x) = -\frac{1}{6}x^6 + \frac{1}{4}x^4 + 2x^2$
Quadratic approximation of $f$ at $x^i$: $g(x) = f(x^i) + f'(x^i)(x - x^i) + \frac{1}{2}f''(x^i)(x - x^i)^2$
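The failure mode can be checked numerically on the example function f(x) = -x^6/6 + x^4/4 + 2x^2, whose local minimizer is x = 0. The sketch below is an illustration only: the helper name `newton_1d` and the two starting points are assumptions, not taken from the slides.

```python
def df(x):
    """First derivative of f(x) = -x**6/6 + x**4/4 + 2*x**2."""
    return -x**5 + x**3 + 4 * x

def d2f(x):
    """Second derivative of the same f."""
    return -5 * x**4 + 3 * x**2 + 4

def newton_1d(x0, steps):
    """Return the pure Newton iterates x, x - f'(x)/f''(x), ... (no stepsize control)."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - df(x) / d2f(x))
    return xs

far = newton_1d(1.0, 4)    # starts too far away: the iterates cycle between 1 and -1
near = newton_1d(0.5, 8)   # starts close enough: the iterates approach 0
```

From x0 = 1 the iterates alternate between 1 and -1 and never reach the minimizer, which is why a stepsize rule such as Armijo's is added to obtain global convergence.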
SVM as an Unconstrained Minimization Problem
(QP): $\min_{\xi \ge 0,\, w,\, b} \; \frac{C}{2}\|\xi\|_2^2 + \frac{1}{2}(\|w\|_2^2 + b^2)$ s.t. $D(Aw + eb) + \xi \ge e$
At the solution of (QP): $\xi = (e - D(Aw + eb))_+$, where $(\cdot)_+ = \max\{\cdot, 0\}$.
Hence (QP) is equivalent to the nonsmooth SVM:
$\min_{w,\, b} \; \frac{C}{2}\|(e - D(Aw + eb))_+\|_2^2 + \frac{1}{2}(\|w\|_2^2 + b^2)$
Change (QP) into an unconstrained MP
Reduce (n+1+l) variables to (n+1) variables
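Smooth the Plus Function: Integrate
Step function: $x_*$. Smooth counterpart, the sigmoid function: $\frac{1}{1 + \varepsilon^{-\beta x}}$ (shown with $\beta = 5$)
Plus function: $x_+$. Smooth counterpart, the p-function: $p(x, \beta)$ (shown as $p(x, 5)$)
$p(x, \beta) := x + \frac{1}{\beta}\log(1 + \varepsilon^{-\beta x})$, where $\varepsilon$ is the base of the natural logarithm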
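A short numerical sketch of the p-function may help show how closely it tracks the plus function. The names `plus` and `p`, and the argument name `beta` for the smoothing parameter, are assumptions made for this example. The identity x + (1/beta) log(1 + exp(-beta x)) = (1/beta) log(1 + exp(beta x)) is used so that `np.logaddexp` can evaluate it without overflow.

```python
import numpy as np

def plus(x):
    """Plus function (x)_+ = max(x, 0), applied elementwise."""
    return np.maximum(x, 0.0)

def p(x, beta=5.0):
    """Smooth approximation of (x)_+: x + log(1 + exp(-beta*x)) / beta."""
    x = np.asarray(x, dtype=float)
    return np.logaddexp(0.0, beta * x) / beta   # same value, overflow-safe form

x = np.linspace(-3.0, 3.0, 601)
gap_5 = np.max(p(x, 5.0) - plus(x))     # largest gap for beta = 5, attained at x = 0
gap_50 = np.max(p(x, 50.0) - plus(x))   # a larger beta gives a tighter approximation
```

The gap is largest at x = 0, where it equals log(2)/beta, so it shrinks as the smoothing parameter grows.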
SSVM: Smooth Support Vector Machine
Replacing the plus function $(\cdot)_+$ in the nonsmooth SVM by the smooth $p(\cdot, \beta)$ gives our SSVM:
$\min_{(w,\, b) \in \mathbb{R}^{n+1}} \; \frac{C}{2}\|p(e - D(Aw + eb), \beta)\|_2^2 + \frac{1}{2}(\|w\|_2^2 + b^2)$
The solution of SSVM converges to the solution of the nonsmooth SVM as $\beta$ goes to infinity. (Typically, $\beta = 5$.)
Here, $p(\cdot, \beta)$ is an accurate smooth approximation of $(\cdot)_+$, obtained by integrating the sigmoid function of neural networks. (sigmoid = smoothed step)
Newton-Armijo Method: Quadratic Approximation of SSVM
The sequence $\{(w^i, b^i)\}$ generated by solving a quadratic approximation of SSVM converges to the unique solution $(w^*, b^*)$ of SSVM at a quadratic rate.
Converges in 6 to 8 iterations
At each iteration we solve a linear system of n+1 equations in n+1 variables
Complexity depends on the dimension of the input space
It might be necessary to select a stepsize
Newton-Armijo Algorithm
Start with any $(w^0, b^0) \in \mathbb{R}^{n+1}$. Having $(w^i, b^i)$, stop if $\nabla\Phi_\beta(w^i, b^i) = 0$; else:
(i) Newton direction: $\nabla^2\Phi_\beta(w^i, b^i)\, d^i = -\nabla\Phi_\beta(w^i, b^i)'$
(ii) Armijo stepsize: $(w^{i+1}, b^{i+1}) = (w^i, b^i) + \lambda_i d^i$, with $\lambda_i \in \{1, \frac{1}{2}, \frac{1}{4}, \ldots\}$ such that Armijo's rule is satisfied
Globally and quadratically converges to the unique solution in a finite number of steps
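Newton-Armijo Algorithm for SSVM:
$\min \Phi_\alpha(w, \gamma)$, where $\Phi_\alpha(w, \gamma) := \frac{\nu}{2}\|p(e - D(Aw - e\gamma), \alpha)\|_2^2 + \frac{1}{2}\|w, \gamma\|_2^2$
Start with any $(w^0, \gamma^0) \in \mathbb{R}^{n+1}$. Having $(w^i, \gamma^i)$, stop if $\nabla\Phi_\alpha(w^i, \gamma^i) = 0$. Else compute $(w^{i+1}, \gamma^{i+1})$ as follows:
(i) Newton Direction: Determine direction $d^i \in \mathbb{R}^{n+1}$ by solving n+1 linear equations in n+1 variables:
$\nabla^2\Phi_\alpha(w^i, \gamma^i)\, d^i = -\nabla\Phi_\alpha(w^i, \gamma^i)'$
(ii) Armijo Stepsize: Choose $\lambda_i = \max\{1, \frac{1}{2}, \frac{1}{4}, \ldots\}$ such that:
$\Phi_\alpha(w^i, \gamma^i) - \Phi_\alpha((w^i, \gamma^i) + \lambda_i d^i) \ge -\delta \lambda_i \nabla\Phi_\alpha(w^i, \gamma^i)\, d^i$, where $\delta \in (0, \frac{1}{2})$
(iii) Updating: $(w^{i+1}, \gamma^{i+1}) = (w^i, \gamma^i) + \lambda_i d^i$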
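The algorithm above can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `ssvm_newton_armijo`, the default values of `nu`, `alpha` and `delta`, the gradient tolerance, and the toy data are all hypothetical. The gradient and Hessian are derived by hand from the objective, writing z = (w, gamma), H = D[A, -e] and r = e - Hz.

```python
import numpy as np

def p(x, alpha):
    """Smooth plus function x + log(1 + exp(-alpha*x)) / alpha, in overflow-safe form."""
    return np.logaddexp(0.0, alpha * x) / alpha

def sigmoid(t):
    """Derivative of p with respect to x, evaluated at t = alpha*x."""
    return 0.5 * (1.0 + np.tanh(0.5 * t))

def ssvm_newton_armijo(A, y, nu=1.0, alpha=5.0, delta=0.25, tol=1e-8, max_iter=50):
    """Minimize nu/2 * ||p(e - D(Aw - e*gamma), alpha)||^2 + 1/2 * ||(w, gamma)||^2."""
    m, n = A.shape
    H = y[:, None] * np.hstack([A, -np.ones((m, 1))])    # rows of D[A, -e]

    def phi(z):
        return 0.5 * nu * np.sum(p(1.0 - H @ z, alpha) ** 2) + 0.5 * z @ z

    z = np.zeros(n + 1)                                   # start at (w0, gamma0) = 0
    for _ in range(max_iter):
        r = 1.0 - H @ z
        pr, s = p(r, alpha), sigmoid(alpha * r)
        grad = z - nu * H.T @ (pr * s)
        if np.linalg.norm(grad) <= tol:                   # numerical stand-in for grad = 0
            break
        curv = s**2 + pr * alpha * s * (1.0 - s)          # d/dr of p(r) * p'(r)
        hess = np.eye(n + 1) + nu * (H.T * curv) @ H
        d = np.linalg.solve(hess, -grad)                  # (i) n+1 equations, n+1 unknowns
        lam = 1.0                                         # (ii) try 1, 1/2, 1/4, ...
        while phi(z) - phi(z + lam * d) < -delta * lam * (grad @ d) and lam > 1e-12:
            lam *= 0.5
        z = z + lam * d                                   # (iii) update
    return z[:n], z[n]

A = np.array([[2.0, 2.0], [3.0, 3.0], [2.0, 3.0],
              [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
w, gamma = ssvm_newton_armijo(A, y)
pred = np.sign(A @ w - gamma)    # classify with the separating surface x'w = gamma
```

The linear system has only n+1 unknowns, so the cost per iteration depends on the input dimension rather than on the number of training points.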
Comparisons of SSVM with other SVMs
Each cell: tenfold test set correctness % (best in red) / CPU time in seconds

Dataset (size m × n) | SSVM (Linear Eqns.) | SVM $\|\cdot\|_2^2$ (QP) | SVM $\|\cdot\|_1$ (LP)
Cleveland Heart (297 × 13) | 86.13 / 1.63 | 84.55 / 18.71 | 72.12 / 67.55
BUPA Liver (345 × 6) | 70.33 / 1.05 | 64.03 / 19.94 | 69.86 / 124.23
Ionosphere (351 × 34) | 89.63 / 3.69 | 86.10 / 42.41 | 89.17 / 128.15
Pima Indians (768 × 8) | 78.12 / 1.54 | 74.47 / 286.59 | 77.07 / 1138.0
WPBC, 24 months (155 × 32) | 83.47 / 2.32 | 71.08 / 6.25 | 82.02 / 12.50
WPBC, 60 months (110 × 22) | 68.18 / 1.03 | 66.23 / 3.72 | 61.83 / 4.91
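The Perceptron Algorithm (Dual Form)
$w = \sum_{i=1}^{l} \alpha_i y_i x_i$
Given a linearly separable training set $S$, with $\alpha = 0$, $\alpha \in \mathbb{R}^l$, $b = 0$, and $R = \max_{1 \le i \le l} \|x_i\|$
Repeat:
  for i = 1 to l
    if $y_i\left(\sum_{j=1}^{l} \alpha_j y_j \langle x_j \cdot x_i \rangle + b\right) \le 0$ then
      $\alpha_i \leftarrow \alpha_i + 1$; $b \leftarrow b + y_i R^2$
    end if
  end for
until no mistakes made within the for loop
return: $(\alpha, b)$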
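The loop above translates almost line for line into NumPy. The sketch below is a minimal version under assumptions: the function name `dual_perceptron`, the `max_epochs` safeguard, and the toy data are hypothetical, and the bias update `b += y[i] * R**2` follows the textbook dual form of the algorithm, so treat it as an assumption wherever the slide text does not show it.

```python
import numpy as np

def dual_perceptron(X, y, max_epochs=1000):
    """Dual-form perceptron: alpha[i] counts the mistakes made on training point i."""
    l = X.shape[0]
    alpha = np.zeros(l)
    b = 0.0
    R = np.max(np.linalg.norm(X, axis=1))    # radius of the data, R = max_i ||x_i||
    G = X @ X.T                               # Gram matrix of inner products <x_j, x_i>
    for _ in range(max_epochs):               # safeguard: stop even if data is not separable
        mistakes = 0
        for i in range(l):
            if y[i] * (np.sum(alpha * y * G[:, i]) + b) <= 0:
                alpha[i] += 1                 # one more mistake on point i
                b += y[i] * R**2              # assumed textbook bias update
                mistakes += 1
        if mistakes == 0:                     # a full pass without mistakes: done
            break
    return alpha, b

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
alpha, b = dual_perceptron(X, y)
margins = y * ((X @ X.T) @ (alpha * y) + b)   # positive for every correctly classified point
```

The training data enter only through the Gram matrix `G`, which is what makes it possible to replace the inner product by a kernel later on.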
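Nonlinear SVM Motivation
Linear SVM (linear separating surface: $x'w = \gamma$):
(QP): $\min_{y \ge 0,\, w,\, \gamma} \; \frac{\nu}{2}\|y\|_2^2 + \frac{1}{2}\|w, \gamma\|_2^2$ s.t. $D(Aw - e\gamma) + y \ge e$
By QP "duality", $w = A'Du$. Maximizing the margin in the "dual space" gives:
$\min_{y \ge 0,\, u,\, \gamma} \; \frac{\nu}{2}\|y\|_2^2 + \frac{1}{2}\|u, \gamma\|_2^2$ s.t. $D(AA'Du - e\gamma) + y \ge e$
Dual SSVM with separator $x'A'Du = \gamma$:
$\min_{u,\, \gamma} \; \frac{\nu}{2}\|p(e - D(AA'Du - e\gamma), \alpha)\|_2^2 + \frac{1}{2}\|u, \gamma\|_2^2$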
Nonlinear Smooth SVM
Replace $AA'$ by a nonlinear kernel $K(A, A')$:
$\min_{u,\, \gamma} \; \frac{\nu}{2}\|p(e - D(K(A, A')Du - e\gamma), \alpha)\|_2^2 + \frac{1}{2}\|u, \gamma\|_2^2$
The kernel matrix $K(A, A') \in \mathbb{R}^{m \times m}$ is fully dense
Use Newton algorithm to solve the problem
Each iteration solves m+1 linear equations in m+1 variables
Nonlinear classifier depends on entire dataset:
Nonlinear Classifier: $K(x', A')Du = \gamma$
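To make the kernel matrix and the nonlinear classifier concrete, here is a small sketch. The slides do not say which kernel is used, so the Gaussian kernel, its width parameter `mu`, the function names, and the placeholder vector `u` (all ones, not a trained solution) are assumptions for illustration only.

```python
import numpy as np

def gaussian_kernel(A, B, mu=0.1):
    """K[i, j] = exp(-mu * ||A_i - B_j||^2) for the rows A_i of A and B_j of B."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-mu * np.maximum(sq, 0.0))    # clip tiny negative rounding errors

def classify(x, A, y, u, gamma, mu=0.1):
    """Sign of K(x', A') D u - gamma for a single point x."""
    k = gaussian_kernel(x[None, :], A, mu)      # 1 x m row: needs every training point
    return float(np.sign(k[0] @ (y * u) - gamma))

A = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
K = gaussian_kernel(A, A)      # m x m and fully dense
u = np.ones(4)                 # placeholder weights, not the result of training
label_a = classify(np.array([0.0, 0.5]), A, y, u, gamma=0.0)
label_b = classify(np.array([5.0, 5.5]), A, y, u, gamma=0.0)
```

Evaluating the classifier at a new point requires a kernel value against every row of `A`, which is the sense in which the classifier depends on the entire dataset.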
Difficulties with Nonlinear SVM for Large Problems
The nonlinear kernel $K(A, A') \in \mathbb{R}^{m \times m}$ is fully dense
Long CPU time to compute the dense kernel matrix
Need to generate and store $O(m^2)$ entries
Runs out of memory while storing the kernel matrix
Computational complexity depends on m
Complexity of nonlinear SSVM $\sim O((m+1)^3)$
Separating surface depends on almost entire dataset
Need to store the entire dataset even after solving the problem
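A quick back-of-the-envelope calculation shows how fast the storage requirement grows. The helper name `dense_kernel_gib` and the choice of 8-byte double-precision entries are assumptions for this estimate.

```python
def dense_kernel_gib(m, bytes_per_entry=8):
    """Memory needed to hold a dense m x m kernel matrix, in GiB."""
    return m * m * bytes_per_entry / 2**30

small = dense_kernel_gib(1_024)     # about a thousand training points
large = dense_kernel_gib(32_768)    # 32 times more points, 1024 times more memory
```

Because storage grows with m squared, a 32-fold increase in training points multiplies the memory by 1024, on top of the roughly cubic cost of solving each (m+1) by (m+1) Newton system.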