A BA-based algorithm for parameter optimization of support vector machine
Alaa Tharwat
Electrical Dept., Suez Canal University, Egypt; Scientific Research Group in Egypt (SRGE)
Email: [email protected]
November 3, 2016
Alaa Tharwat November 3, 2016 1 / 26
Agenda
The main objective.
Support Vector Machine (SVM).
Bat Algorithm (BA).
The proposed Model.
Experimental Results and Discussions.
Conclusions and Future Work.
The main objective
The main objective is to:
Optimize the SVM parameters.
Support Vector Machine (SVM)
[Figure omitted: a two-class scatter plot (axes x1, x2) showing the separating hyperplane w.xi + b = 0, the two planes H1 (w.xi + b = +1) and H2 (w.xi + b = −1), the support vectors, and the margin of width 2/‖w‖.]
Figure: The structure of building a classifier, which includes N samples and c discriminant functions or classes.
Support Vector Machine (SVM)
The aim of SVM is to select the values of w and b that orient the hyperplane to be as far as possible from the closest samples, and to construct the two planes, H1 and H2, as follows:

H1 → wT xi + b = +1 for yi = +1
H2 → wT xi + b = −1 for yi = −1    (1)

These two equations can be combined as follows:

yi(wT xi + b) − 1 ≥ 0, ∀i = 1, 2, . . . , N    (2)

In SVM, the margin width needs to be maximized subject to Eq. (2) as follows:

min (1/2)‖w‖²
s.t. yi(wT xi + b) − 1 ≥ 0, ∀i = 1, 2, . . . , N    (3)
Support Vector Machine (SVM)
In Eq. (3), minimizing ‖w‖ is equivalent to minimizing (1/2)‖w‖². Moreover, Eq. (3) represents a quadratic programming problem, which is formalized into a Lagrange formula by combining the objective function (min (1/2)‖w‖²) and the constraints (yi(wT xi + b) − 1 ≥ 0) as follows:

min LP = ‖w‖²/2 − ∑_i αi (yi(wT xi + b) − 1)
       = ‖w‖²/2 − ∑_i αi yi(wT xi + b) + ∑_{i=1}^N αi    (4)

where αi ≥ 0, i = 1, 2, . . . , N, are the Lagrange multipliers; each Lagrange multiplier (αi) corresponds to one training sample (xi, yi), and LP denotes the primal problem.
Support Vector Machine (SVM)
To calculate the w, b, and α that minimize Eq. (4), LP is differentiated with respect to w and b, and the derivatives are set to zero:

∂LP/∂w = 0 ⇒ w = ∑_{i=1}^N αi yi xi    (5)

∂LP/∂b = 0 ⇒ ∑_{i=1}^N αi yi = 0    (6)

Substituting Eqs. (5) and (6) into Eq. (4), the dual problem can be written as follows:

max LD = ∑_{i=1}^N αi − (1/2) ∑_{i,j} αi αj yi yj xiT xj
s.t. αi ≥ 0, ∑_{i=1}^N αi yi = 0, ∀i = 1, 2, . . . , N    (7)
Support Vector Machine (SVM)
In SVM, most of the αi's are zero; thus, sparseness is a common property of SVM. The non-zero αi's correspond to the Support Vectors (SVs), which are the samples closest to the separating hyperplane; hence, the SVs determine the maximum-width margin.
A new sample x0 is classified by evaluating y0 = sgn(wT x0 + b); if y0 is positive, the new sample belongs to the positive class; otherwise, it belongs to the negative class.
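The decision rule above can be sketched in a few lines. This is a minimal illustration (not the authors' code) using scikit-learn's SVC with a linear kernel on synthetic two-class data; the clusters and the test point are made up for the example.

```python
# Sketch of the SVM decision rule y0 = sgn(w^T x0 + b) with a linear SVM.
import numpy as np
from sklearn.svm import SVC

# Toy two-class data: class +1 clustered around (2, 2), class -1 around (0, 0).
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) + 2, rng.randn(20, 2)])
y = np.array([+1] * 20 + [-1] * 20)

clf = SVC(kernel="linear", C=1.0).fit(X, y)
w = clf.coef_[0]        # the weight vector w
b = clf.intercept_[0]   # the bias b

# Classify a new sample by the sign of w^T x0 + b.
x0 = np.array([2.5, 2.5])          # a point well inside the +1 cluster
y0 = int(np.sign(w @ x0 + b))
print(y0)                          # sign of the decision value
print(y0 == int(clf.predict([x0])[0]))  # matches sklearn's own prediction
```

The manual `sgn(w @ x0 + b)` and `clf.predict` agree, since for a linear kernel the classifier is exactly the learned hyperplane.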
Support Vector Machine (SVM)
In the case of non-separable data, more misclassified samples result. Therefore, the constraints of the linear SVM must be relaxed by adding slack variables, εi, as denoted in Eq. (8):

wT xi + b ≥ +1 − εi for yi = +1
wT xi + b ≤ −1 + εi for yi = −1    (8)

yi(wT xi + b) − 1 + εi ≥ 0, where εi ≥ 0    (9)

min (1/2)‖w‖² + C ∑_{i=1}^N εi
s.t. yi(wT xi + b) − 1 + εi ≥ 0, ∀i = 1, 2, . . . , N    (10)

Equation (10) is formalized into a Lagrange formula as follows:

LP = (1/2)‖w‖² + C ∑_{i=1}^N εi − ∑_{i=1}^N αi[yi(wT xi + b) − 1 + εi] − ∑_{i=1}^N μi εi    (11)

where μi ≥ 0 are the Lagrange multipliers enforcing the constraints εi ≥ 0.
Support Vector Machine (SVM)
∂LP/∂εi = 0 ⇒ C = αi + μi    (12)

where μi ≥ 0 is the Lagrange multiplier of the constraint εi ≥ 0. From Eq. (12) it can be noticed that αi is limited by the upper bound C, i.e. 0 ≤ αi ≤ C. If the data are non-linearly separable, SVM uses kernel functions to map the data into a higher-dimensional space through a nonlinear mapping φ:

min (1/2)‖w‖² + C ∑_{i=1}^N εi
s.t. yi(wT φ(xi) + b) − 1 + εi ≥ 0, ∀i = 1, 2, . . . , N    (13)
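The kernel trick of Eq. (13) can be tried on data that are not linearly separable. A hedged sketch using scikit-learn's SVC with the RBF kernel on a synthetic concentric-circles dataset (not the slides' data); the slides parameterize the RBF kernel by σ, while scikit-learn uses gamma = 1/(2σ²).

```python
# Nonlinear SVM via the RBF kernel: concentric circles cannot be separated
# by a linear hyperplane in the input space, but are separable after the
# implicit mapping phi induced by the kernel.
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_circles

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

sigma = 0.5
clf = SVC(kernel="rbf", C=10.0, gamma=1.0 / (2 * sigma**2)).fit(X, y)
train_error = 1.0 - clf.score(X, y)   # training error rate
print(f"training error: {train_error:.3f}")
print(f"support vectors: {len(clf.support_)}")
```

With a suitable (C, σ) the training error is near zero; the next slides show how badly chosen values of either parameter degrade this.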
Support Vector Machine (SVM): SVM Parameter Optimization
[Figure omitted: three scatter-plot panels, (a) C = 0.01, (b) C = 1, (c) C = 100.]
Figure: The effect of the penalty parameter (C) with linear SVM. Decision boundaries (blue lines), two planes (black lines), support vectors marked with green squares, and misclassified samples marked with red squares.
Support Vector Machine (SVM): SVM Parameter Optimization
Table: The training error rate, number of SVs, and number of misclassified samples of the linear and RBF kernel SVM using different values of C.

        Linear kernel                    RBF kernel (σ = 0.1)
C       Error (%)  # SVs  # Misc.       Error (%)  # SVs  # Misc.
0.01    53.75      52     26            9.51       768    82
0.1     7.14       34     4             6.14       453    53
1       0          18     0             1.04       187    9
10      0          8      0             1.04       75     9
100     0          4      0             0.46       40     4
1000    0          3      0             3.82       40     33
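The table's trend for the linear kernel can be reproduced on toy data: as C grows, the soft margin narrows and the number of support vectors shrinks. This is a hedged illustration on synthetic well-separated clusters, not the slides' dataset, so the exact counts will differ.

```python
# Effect of C on the number of support vectors for a linear SVM.
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(100, 2) + 3, rng.randn(100, 2) - 3])  # well separated
y = np.array([1] * 100 + [-1] * 100)

counts = []
for C in [0.01, 0.1, 1, 10, 100]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    counts.append(len(clf.support_))   # number of support vectors at this C

print(dict(zip([0.01, 0.1, 1, 10, 100], counts)))
```

Small C bounds every αi by a tiny value (Eq. (12)), so many samples must become support vectors; large C lets a few samples carry the whole margin.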
Support Vector Machine (SVM): SVM Parameter Optimization
[Figure omitted: three scatter-plot panels, (a) C = 0.01, (b) C = 1, (c) C = 100.]
Figure: The effect of the penalty parameter (C) with nonlinear SVM, where the RBF kernel was used and σ = 0.1. Decision boundaries (black lines), support vectors marked with green squares, and misclassified samples marked with red squares.
Support Vector Machine (SVM): SVM Parameter Optimization
[Figure omitted: three scatter-plot panels, (a) σ = 0.05, (b) σ = 0.1, (c) σ = 0.2.]
Figure: The effect of the RBF parameter (σ) with nonlinear SVM when C = 10. Decision boundaries (blue lines), support vectors marked with green squares, and misclassified samples marked with red squares.
Support Vector Machine (SVM): SVM Parameter Optimization
Table: Training error, number of support vectors, and number of misclassifiedsamples of the RBF kernel SVM when C = 10 using different values of σ.
σ       Training error (%)   # SVs   # Misc. Samples
0.01    44.96                324     388
0.05    0.23                 100     2
0.1     1.04                 74      9
0.2     30.59                216     264
Bat Algorithm (BA)
1 Bats’ Positions (Xi): The positions of the bats are used to calculatethe objective function at that location.
2 Bats’ Velocity (Vi): The directed velocity of the bats is used tomove the bats in the search space to the optimal solution.
3 Pulse Rate (ri): ri ∈ [0, 1] is updated, i.e. increased, as the iterations proceed: ri^{t+1} = ri^0 [1 − exp(−γt)], where γ > 0 is a constant and t is the current iteration.
4 Frequency (fi): this parameter is used to adjust the velocity of bats.
5 Loudness (Ai): The loudness of the emitted sound varies from high when the bat is searching for prey to low when the prey is near.
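The five components above fit together in the standard bat algorithm. The following is a minimal sketch (not the paper's implementation) minimizing the sphere function f(x) = Σx²; the constants, the local-walk step size of 0.1, and the loudness decay α = 0.9 are illustrative choices.

```python
# Standard bat algorithm updates: frequency f_i, velocity V_i, position X_i,
# pulse rate r_i^{t+1} = r_i^0 [1 - exp(-gamma*t)], and decaying loudness A_i.
import numpy as np

def sphere(x):
    """Objective to minimize; global minimum 0 at the origin."""
    return float(np.sum(x ** 2))

rng = np.random.RandomState(1)
n_bats, dim, n_iter = 20, 2, 200
f_lo, f_hi = 0.0, 2.0            # frequency range [fmin, fmax]
alpha, gamma = 0.9, 0.9          # loudness decay, pulse-rate growth constant

X = rng.uniform(-5, 5, (n_bats, dim))   # bat positions
V = np.zeros((n_bats, dim))             # bat velocities
A = np.full(n_bats, 0.5)                # loudness A_i
r0 = np.full(n_bats, 0.5)               # initial pulse rate r_i^0
fitness = np.array([sphere(x) for x in X])
best = X[np.argmin(fitness)].copy()

for t in range(1, n_iter + 1):
    r = r0 * (1.0 - np.exp(-gamma * t))          # pulse rate increases with t
    for i in range(n_bats):
        f_i = f_lo + (f_hi - f_lo) * rng.rand()  # frequency adjusts velocity
        V[i] = V[i] + (X[i] - best) * f_i        # velocity update
        x_new = X[i] + V[i]                      # position update
        if rng.rand() > r[i]:                    # local random walk near best
            x_new = best + 0.1 * rng.randn(dim)
        f_new = sphere(x_new)
        if f_new < fitness[i] and rng.rand() < A[i]:  # accept, loudness drops
            X[i], fitness[i] = x_new, f_new
            A[i] *= alpha
        if f_new < sphere(best):                 # track global best
            best = x_new.copy()

print(f"best solution: {best}, f = {sphere(best):.6f}")
```

In BA-SVM the positions X would be candidate (C, σ) pairs and the objective would be the SVM error rate instead of the sphere function.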
The Proposed Model: BA-SVM
[Flowchart omitted: the dataset is scaled and split into training and testing sets; the BA parameters are initialized and a population of candidate parameters (C and σ) is generated; each candidate is used to train an SVM classifier and its fitness (F) is evaluated; if the stopping criterion is satisfied, the optimized (C and σ) are returned; otherwise, the bat algorithm generates new solutions and the loop repeats.]
Figure: Flowchart of the proposed model (BA-SVM).
The Proposed Model: BA-SVM
Data preprocessing.
Parameters’ Initialization.
Fitness evaluation.
Minimize: F = Ne / N    (14)

where Ne is the number of misclassified samples and N is the total number of samples.
Termination criteria.
Updating positions.
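The fitness evaluation step can be sketched as a function of a candidate (C, σ). This is a hedged illustration of Eq. (14), F = Ne/N, using scikit-learn and a simple hold-out split on the iris dataset; the dataset, split, and conversion gamma = 1/(2σ²) are assumptions of the sketch, not the paper's exact protocol.

```python
# Fitness of a candidate (C, sigma): the fraction of misclassified samples.
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

def fitness(C, sigma):
    """F = Ne / N for an RBF-SVM trained with the candidate (C, sigma)."""
    clf = SVC(kernel="rbf", C=C, gamma=1.0 / (2 * sigma**2)).fit(X_tr, y_tr)
    y_pred = clf.predict(X_te)
    Ne = int(np.sum(y_pred != y_te))   # number of misclassified samples
    return Ne / len(y_te)              # F in [0, 1]; lower is better

print(fitness(10.0, 0.5))
```

The bat algorithm then treats each bat's position as a (C, σ) pair and minimizes this F over the search space.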
Experimental Results and Discussions
Table: Datasets description.
Dataset Dimension # Samples # Classes
Iris (D1)                    4    150   3
Ionosphere (D2)              34   351   2
Liver-disorders (D3)         6    345   2
Breast Cancer (D4)           13   683   2
Sonar (D5)                   60   208   2
Tic-Tac-Toe (D6)             9    958   2
Glass (D7)                   9    214   7
Wine (D8)                    13   178   3
Pima Indians Diabetes (D9)   8    768   2
Experimental Results and Discussions: Parameter setting for BA
[Figure omitted: two panels, (a) testing error rate (%) vs. number of bats; (b) CPU time (secs) vs. number of bats.]
Figure: Effect of the number of bats on the performance of the BA-SVM model for the iris dataset: (a) testing error rate of the BA-SVM model with different numbers of bats; (b) CPU time of the BA-SVM using different numbers of bats.
Experimental Results and Discussions: Parameter setting for BA
[Figure omitted: two panels, (a) testing error rate (%) vs. number of iterations for three runs; (b) CPU time (secs) vs. number of iterations for three runs.]
Figure: Effect of the number of iterations on the performance of the BA-SVM model for the iris dataset using three runs: (a) testing error rate of the BA-SVM model with different numbers of iterations; (b) CPU time of the BA-SVM using different numbers of iterations.
Experimental Results and Discussions: Parameter setting for BA
Table: The initial parameters of bat algorithm.
Parameter Value
Frequency (fmin and fmax)        fmin = 0 and fmax = 2
Pulse rate (r)                   0.5
Loudness (A)                     0.5
Population size                  20
Maximum number of iterations     20
Experimental Results and Discussions: BA-SVM vs. Grid Search
Table: Results of the proposed BA-SVM algorithm and the grid search SVM algorithm (using the RBF kernel).

          Grid Search SVM               BA-SVM                       p-value for
Dataset   Cost time (s)  Test err (%)   Cost time (s)  Test err (%)  Wilcoxon test
D1        268.8          0 ± 0.2        168.2          0 ± 0         <0.005
D2        1064.2         2.1 ± 0.6      645.3          0.3 ± 0.2     <0.005
D3        4399.2         15.1 ± 2.3     2820.4         12 ± 1.2      <0.005
D4        35630.0        2.4 ± 0.7      25450.0        0.8 ± 0.3     <0.005
D5        532.9          1.2 ± 0.5      319.1          0.9 ± 0.8     0.0052
D6        53824.2        6.7 ± 1.4      37120.6        2.1 ± 0.6     <0.005
D7        1710.6         17.4 ± 2.4     1056.7         13.5 ± 1.2    <0.005
D8        587.36         3.7 ± 1.0      367.1          0.3 ± 0.2     <0.005
D9        2028.8         19.8 ± 2.7     1276.0         14.3 ± 1.5    <0.005
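The grid-search baseline can be sketched as an exhaustive scan over a (C, σ) grid scored by cross-validation. This is a minimal illustration, not the paper's code: the grid values, the wine dataset, and 5-fold cross-validation are assumptions of the sketch.

```python
# Grid search over (C, sigma) for an RBF-SVM, the baseline BA-SVM beats in
# cost time: every pair on the grid must be trained and scored.
from itertools import product
from sklearn.svm import SVC
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)
X = StandardScaler().fit_transform(X)   # scaling step, as in the proposed model

C_grid = [0.1, 1, 10, 100]
sigma_grid = [0.1, 0.5, 1.0, 2.0]

best_err, best_pair = 1.0, None
for C, sigma in product(C_grid, sigma_grid):
    clf = SVC(kernel="rbf", C=C, gamma=1.0 / (2 * sigma**2))
    acc = cross_val_score(clf, X, y, cv=5).mean()   # mean CV accuracy
    if 1.0 - acc < best_err:
        best_err, best_pair = 1.0 - acc, (C, sigma)

print(f"best (C, sigma) = {best_pair}, CV error = {best_err:.3f}")
```

The cost grows with the product of the grid sizes, which is why BA-SVM, which samples the space adaptively, reaches comparable or lower error in less time in the table above.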
Experimental Results and Discussions: BA-SVM vs. other Optimization Algorithms
Table: Comparison between the BA-SVM, the PSO+SVM approach proposed by Lin [15], and the GA+SVM approach proposed by Huang [16] (test error, %). The last two columns give p-values for the Wilcoxon test.

Dataset   (1) BA-SVM     (2) PSO+SVM     (3) GA+SVM      p (1 vs. 2)   p (1 vs. 3)
D1        0.0 ± 0.0      2.0 ± 0.23      0.0 ± 0.0       <0.005        0.0052
D2        0.30 ± 0.20    2.50 ± 0.65     0.57 ± 0.53     <0.005        <0.005
D3        12.0 ± 1.20    14.23 ± 1.93    16.86 ± 2.45    <0.005        <0.005
D4        0.80 ± 0.30    2.05 ± 0.74     1.0 ± 0.42      <0.005        <0.005
D5        0.90 ± 0.80    11.68 ± 2.64    8.40 ± 3.14     <0.005        <0.005
D6        2.10 ± 0.60    6.48 ± 2.47     8.41 ± 2.19     <0.005        <0.005
D7        13.50 ± 1.20   21.96 ± 4.59    19.26 ± 4.55    <0.005        <0.005
D8        0.30 ± 0.20    0.40 ± 0.33     0.0 ± 0.15      <0.005        0.0058
D9        14.30 ± 1.50   19.79 ± 6.12    16.4 ± 4.21     <0.005        <0.005
Experimental Results and Discussions: BA-SVM vs. other Optimization Algorithms
[Figure omitted: (a) the test-error (%) surface over log(C) and log(σ); (b) the corresponding contour plot of the test error over the same axes.]
Figure: Test error surface and contour plot over the (C, σ) parameter space on the iris dataset.
Conclusions and Future Work
The parameters of SVM (C and σ) and the influence of each parameter on classification performance.
How BA optimizes the SVM parameters.