
Page 1: A BA-based algorithm for parameter optimization of support vector machine

A BA-based algorithm for parameter optimization of support vector machine

Alaa Tharwat
Electrical Dept., Suez Canal University, Egypt
Scientific Research Group in Egypt (SRGE)
Email: [email protected]

November 3, 2016

Page 2: A BA-based algorithm for parameter optimization of support vector machine

Agenda

The main objective.

Support Vector Machine (SVM).

Bat Algorithm (BA).

The Proposed Model.

Experimental Results and Discussions.

Conclusions and Future Work.


Page 3: A BA-based algorithm for parameter optimization of support vector machine

The main objective

The main objective is to optimize the SVM parameters, i.e., the penalty parameter C and the RBF kernel parameter σ, using the Bat Algorithm (BA).


Page 4: A BA-based algorithm for parameter optimization of support vector machine

Support Vector Machine (SVM)

[Figure: a two-class example in the (x1, x2) plane, with Class 1 and Class 2 samples. The separating hyperplane wT xi + b = 0 lies midway between the planes H1 (wT xi + b = +1) and H2 (wT xi + b = −1); the support vectors lie on H1 and H2, each at distance 1/‖w‖ from the hyperplane, so the margin width is 2/‖w‖.]

Figure: The structure of building a classifier, which includes N samples and c discriminant functions or classes.

Page 5: A BA-based algorithm for parameter optimization of support vector machine

Support Vector Machine (SVM)

The aim of SVM is to select the values of w and b that orientate the hyperplane to be as far as possible from the closest samples and to construct the two planes, H1 and H2, as follows:

H1 → wT xi + b = +1 for yi = +1
H2 → wT xi + b = −1 for yi = −1    (1)

These two equations can be combined as follows:

yi(wT xi + b) − 1 ≥ 0, ∀i = 1, 2, . . . , N    (2)

In SVM, the margin width needs to be maximized subject to Eq. (2) as follows:

min (1/2)‖w‖²
s.t. yi(wT xi + b) − 1 ≥ 0, ∀i = 1, 2, . . . , N    (3)

Page 6: A BA-based algorithm for parameter optimization of support vector machine

Support Vector Machine (SVM)

In Eq. (3), minimizing ‖w‖ is equivalent to minimizing (1/2)‖w‖². Moreover, Eq. (3) represents a quadratic programming problem that is formalized into a Lagrange formula by combining the objective function (min (1/2)‖w‖²) and the constraints (yi(wT xi + b) − 1 ≥ 0) as follows:

min LP = (1/2)‖w‖² − Σi αi(yi(wT xi + b) − 1)
       = (1/2)‖w‖² − Σi αi yi(wT xi + b) + Σ_{i=1}^{N} αi    (4)

where αi ≥ 0, i = 1, 2, . . . , N, represent the Lagrange multipliers; each Lagrange multiplier (αi) corresponds to one training sample (xi, yi), and LP represents the primal problem.

Page 7: A BA-based algorithm for parameter optimization of support vector machine

Support Vector Machine (SVM)

To calculate w, b, and α, which minimize Eq. (4), LP is differentiated with respect to w and b and the derivatives are set to zero as follows:

∂LP/∂w = 0 ⇒ w = Σ_{i=1}^{N} αi yi xi    (5)

∂LP/∂b = 0 ⇒ Σ_{i=1}^{N} αi yi = 0    (6)

Substituting Eqs. (5) and (6) into Eq. (4), the dual problem can be written as follows:

max LD = Σ_{i=1}^{N} αi − (1/2) Σ_{i,j} αi αj yi yj xiT xj
s.t. αi ≥ 0, Σ_{i=1}^{N} αi yi = 0, ∀i = 1, 2, . . . , N    (7)

Page 8: A BA-based algorithm for parameter optimization of support vector machine

Support Vector Machine (SVM)

In SVM, most of the αi's are zero; thus, sparseness is a common property of SVM. The non-zero αi's correspond to the Support Vectors (SVs), which are the samples closest to the separating hyperplane; hence, the SVs determine the maximum-width margin.

A new sample x0 is classified by evaluating y0 = sgn(wT x0 + b); if y0 is positive, the new sample belongs to the positive class; otherwise, it belongs to the negative class.
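This decision rule can be sketched in a few lines (a minimal NumPy illustration; the hyperplane w, b and the toy samples below are made up for the example, not taken from the slides):

```python
import numpy as np

def svm_predict(w, b, X):
    """Assign each row of X to class +1 or -1 by the sign of w^T x + b."""
    return np.sign(X @ w + b)

# a made-up separating hyperplane x1 + x2 - 3 = 0 and two toy samples
w = np.array([1.0, 1.0])
b = -3.0
X = np.array([[2.5, 2.0],    # above the plane -> +1
              [0.5, 1.0]])   # below the plane -> -1
y0 = svm_predict(w, b, X)    # array([ 1., -1.])
```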


Page 9: A BA-based algorithm for parameter optimization of support vector machine

Support Vector Machine (SVM)

In the case of non-separable data, some samples are misclassified. Therefore, the constraints of the linear SVM must be relaxed by adding a slack variable, εi, as denoted in Eq. (8):

wT xi + b ≥ +1 − εi for yi = +1
wT xi + b ≤ −1 + εi for yi = −1    (8)

yi(wT xi + b) − 1 + εi ≥ 0, where εi ≥ 0    (9)

min (1/2)‖w‖² + C Σ_{i=1}^{N} εi
s.t. yi(wT xi + b) − 1 + εi ≥ 0, ∀i = 1, 2, . . . , N    (10)

Equation (10) is formalized into a Lagrange formula as follows:

LP = (1/2)‖w‖² + C Σ_{i=1}^{N} εi − Σ_{i=1}^{N} αi[yi(wT xi + b) − 1 + εi] − Σ_{i=1}^{N} μi εi    (11)

where μi ≥ 0 are the Lagrange multipliers associated with the constraints εi ≥ 0.

Page 10: A BA-based algorithm for parameter optimization of support vector machine

Support Vector Machine (SVM)

∂LP/∂εi = 0 ⇒ C = αi + μi    (12)

where μi ≥ 0 is the Lagrange multiplier associated with the constraint εi ≥ 0. Since μi ≥ 0, it can be noticed from Eq. (12) that αi is limited by the upper bound C. If the data are non-linearly separable, SVM uses kernel functions to map the data into a higher-dimensional space through a nonlinear function φ:

min (1/2)‖w‖² + C Σ_{i=1}^{N} εi
s.t. yi(wT φ(xi) + b) − 1 + εi ≥ 0, ∀i = 1, 2, . . . , N    (13)
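The mapping φ is never computed explicitly; only inner products K(xi, xj) = φ(xi)ᵀφ(xj) are needed. A minimal sketch of the RBF kernel used throughout these slides (the parameterization K(x, z) = exp(−‖x − z‖²/(2σ²)) is assumed here to match the σ notation; NumPy, not the authors' code):

```python
import numpy as np

def rbf_kernel(X1, X2, sigma=0.1):
    """RBF (Gaussian) kernel matrix: K[i, j] = exp(-||x_i - z_j||^2 / (2 sigma^2))."""
    sq = (np.sum(X1**2, axis=1)[:, None]
          + np.sum(X2**2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)                      # squared pairwise distances
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

X = np.array([[0.0, 0.0], [0.1, 0.0]])
K = rbf_kernel(X, X, sigma=0.1)   # K[0, 0] = 1, K[0, 1] = exp(-0.5)
```

A smaller σ makes the kernel more localized, which matches the over- and underfitting behavior shown in the σ table below.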


Page 11: A BA-based algorithm for parameter optimization of support vector machine

Support Vector Machine (SVM) SVM Parameter Optimization

[Figure panels: (a) C = 0.01, (b) C = 1, (c) C = 100.]

Figure: The effect of the penalty parameter (C) with linear SVM. Decision boundaries (blue lines), two planes (black lines), support vectors marked with green squares, and misclassified samples marked with red squares.

Page 12: A BA-based algorithm for parameter optimization of support vector machine

Support Vector Machine (SVM) SVM Parameter Optimization

Table: The training error rate, number of SVs, and number of misclassified samples of the linear and RBF kernel SVM using different values of C.

C    | Linear: error (%) | Linear: # SVs | Linear: # misc. | RBF (σ = 0.1): error (%) | RBF: # SVs | RBF: # misc.
0.01 | 53.75 | 52 | 26 | 9.51 | 768 | 82
0.1  | 7.14  | 34 | 4  | 6.14 | 453 | 53
1    | 0     | 18 | 0  | 1.04 | 187 | 9
10   | 0     | 8  | 0  | 1.04 | 75  | 9
100  | 0     | 4  | 0  | 0.46 | 40  | 4
1000 | 0     | 3  | 0  | 3.82 | 40  | 33

Page 13: A BA-based algorithm for parameter optimization of support vector machine

Support Vector Machine (SVM) SVM Parameter Optimization

[Figure panels: (a) C = 0.01, (b) C = 1, (c) C = 100.]

Figure: The effect of the penalty parameter (C) with nonlinear SVM, where the RBF kernel was used and σ = 0.1. Decision boundaries (black lines), support vectors marked with green squares, and misclassified samples marked with red squares.

Page 14: A BA-based algorithm for parameter optimization of support vector machine

Support Vector Machine (SVM) SVM Parameter Optimization

[Figure panels: (a) σ = 0.05, (b) σ = 0.1, (c) σ = 0.2.]

Figure: The effect of the RBF parameter (σ) with nonlinear SVM when C = 10. Decision boundaries (blue lines), support vectors marked with green squares, and misclassified samples marked with red squares.

Page 15: A BA-based algorithm for parameter optimization of support vector machine

Support Vector Machine (SVM) SVM Parameter Optimization

Table: Training error, number of support vectors, and number of misclassified samples of the RBF kernel SVM when C = 10 using different values of σ.

σ    | Training error (%) | # SVs | # Misc. Samples
0.01 | 44.96              | 324   | 388
0.05 | 0.23               | 100   | 2
0.1  | 1.04               | 74    | 9
0.2  | 30.59              | 216   | 264

Page 16: A BA-based algorithm for parameter optimization of support vector machine

Bat Algorithm (BA)

1 Bats’ Positions (Xi): The positions of the bats are used to calculatethe objective function at that location.

2 Bats’ Velocity (Vi): The directed velocity of the bats is used tomove the bats in the search space to the optimal solution.

3 Pulse Rate (ri): the pulse rate ri ∈ [0, 1] is updated, i.e., increased, as the iterations proceed: ri^(t+1) = ri^0 [1 − exp(−γt)], where γ > 0 is a constant and t is the current iteration.

4 Frequency (fi): this parameter is used to adjust the velocity of bats.

5 Loudness (Ai): The loudness of the emitted sound varies from a high loudness when searching for prey to a low loudness when the prey is near.
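Putting these components together, the core of one BA iteration can be sketched as follows (a simplified illustration of the standard frequency/velocity/position update only; the pulse-rate-gated local random walk and the loudness update are omitted, and the variable names are mine, not the authors'):

```python
import numpy as np

rng = np.random.default_rng(0)

def bat_step(X, V, best, fmin=0.0, fmax=2.0):
    """One global-search step of the bat algorithm.

    Each bat i draws a frequency f_i in [fmin, fmax] and is pulled toward
    the best solution found so far:
        f_i = fmin + (fmax - fmin) * beta,  beta ~ U(0, 1)
        V_i = V_i + (X_i - best) * f_i
        X_i = X_i + V_i
    """
    n = X.shape[0]
    f = fmin + (fmax - fmin) * rng.random((n, 1))  # frequency per bat
    V = V + (X - best) * f                          # velocity update
    return X + V, V                                 # position update

X = rng.random((5, 2))      # 5 bats in a 2-D search space
V = np.zeros((5, 2))
best = X[2].copy()          # pretend bat 2 currently has the best fitness
X_new, V_new = bat_step(X, V, best)
```

Note that the bat sitting at the current best position does not move in this step, since its pull term (X_i − best) is zero.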


Page 17: A BA-based algorithm for parameter optimization of support vector machine

The Proposed Model: BA-SVM

[Figure: flowchart of the proposed model. The dataset is scaled and split into training and testing sets. The BA parameters are initialized and a population of candidate parameters (C and σ) is generated. For each candidate, an SVM classifier is trained and its fitness (F) is evaluated. If the stopping criterion is not satisfied, the bat algorithm generates new solutions; otherwise, the optimized (C and σ) are returned.]

Figure: Flowchart of the proposed model (BA-SVM).

Page 18: A BA-based algorithm for parameter optimization of support vector machine

The Proposed Model: BA-SVM

Data preprocessing.

Parameters’ Initialization.

Fitness evaluation.

Minimize: F = Ne / N    (14)

where Ne is the number of misclassified samples and N is the total number of samples.

Termination criteria.

Updating positions.
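The fitness in Eq. (14) is simply the misclassification rate of the SVM trained with a candidate (C, σ). Only the fitness computation is shown concretely below; how the predictions are produced (training an SVM with the candidate parameters) is left abstract, and the toy labels are invented for the example:

```python
import numpy as np

def fitness(y_true, y_pred):
    """Eq. (14): F = Ne / N, the fraction of misclassified samples.
    BA minimizes F over candidate (C, sigma) pairs."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return np.mean(y_true != y_pred)

# toy check: 1 error out of 4 samples -> F = 0.25
F = fitness([1, 1, -1, -1], [1, -1, -1, -1])
```

In the full model, each bat's position encodes a (C, σ) pair; the SVM is trained with those parameters and F is evaluated before the bats' positions are updated.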


Page 19: A BA-based algorithm for parameter optimization of support vector machine

Experimental Results and Discussions

Table: Datasets description.

Dataset                    | Dimension | # Samples | # Classes
Iris (D1)                  | 4         | 150       | 3
Ionosphere (D2)            | 34        | 351       | 2
Liver-disorders (D3)       | 6         | 345       | 2
Breast Cancer (D4)         | 13        | 683       | 2
Sonar (D5)                 | 60        | 208       | 2
Tic-Tac-Toe (D6)           | 9         | 958       | 2
Glass (D7)                 | 9         | 214       | 7
Wine (D8)                  | 13        | 178       | 3
Pima Indians Diabetes (D9) | 8         | 768       | 2

Page 20: A BA-based algorithm for parameter optimization of support vector machine

Experimental Results and Discussions Parameter setting for BA

[Figure panels: (a) testing error rate (%) vs. number of bats; (b) CPU time (secs) vs. number of bats.]

Figure: Effect of the number of bats on the performance of the BA-SVM model for the iris dataset: (a) testing error rate of the BA-SVM model with different numbers of bats; (b) CPU time of the BA-SVM using different numbers of bats.

Page 21: A BA-based algorithm for parameter optimization of support vector machine

Experimental Results and Discussions Parameter setting for BA

[Figure panels: (a) testing error rate (%) vs. number of iterations; (b) CPU time (secs) vs. number of iterations; each panel shows three runs.]

Figure: Effect of the number of iterations on the performance of the BA-SVM model for the iris dataset using three runs: (a) testing error rate of the BA-SVM model with different numbers of iterations; (b) CPU time of the BA-SVM using different numbers of iterations.

Page 22: A BA-based algorithm for parameter optimization of support vector machine

Experimental Results and Discussions Parameter setting for BA

Table: The initial parameters of the bat algorithm.

Parameter                    | Value
Frequency (fmin and fmax)    | fmin = 0 and fmax = 2
Pulse Rate (r)               | 0.5
Loudness (A)                 | 0.5
Population size              | 20
Maximum number of iterations | 20

Page 23: A BA-based algorithm for parameter optimization of support vector machine

Experimental Results and Discussions BA-SVM vs. Grid Search

Table: Results of the proposed BA-SVM algorithm and the grid search SVM algorithm (using the RBF kernel).

Dataset | Grid search: cost time (s) | Grid search: test error (%) | BA-SVM: cost time (s) | BA-SVM: test error (%) | p-value (Wilcoxon)
D1 | 268.8   | 0 ± 0.2    | 168.2   | 0 ± 0      | <0.005
D2 | 1064.2  | 2.1 ± 0.6  | 645.3   | 0.3 ± 0.2  | <0.005
D3 | 4399.2  | 15.1 ± 2.3 | 2820.4  | 12 ± 1.2   | <0.005
D4 | 35630.0 | 2.4 ± 0.7  | 25450.0 | 0.8 ± 0.3  | <0.005
D5 | 532.9   | 1.2 ± 0.5  | 319.1   | 0.9 ± 0.8  | 0.0052
D6 | 53824.2 | 6.7 ± 1.4  | 37120.6 | 2.1 ± 0.6  | <0.005
D7 | 1710.6  | 17.4 ± 2.4 | 1056.7  | 13.5 ± 1.2 | <0.005
D8 | 587.36  | 3.7 ± 1.0  | 367.1   | 0.3 ± 0.2  | <0.005
D9 | 2028.8  | 19.8 ± 2.7 | 1276.0  | 14.3 ± 1.5 | <0.005
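The grid-search baseline evaluates every (C, σ) pair on a fixed grid and keeps the one with the lowest validation error, which is why its cost time is much higher than BA-SVM's. A generic sketch (the error function here is a synthetic stand-in with a known minimum, not an actual trained SVM):

```python
import numpy as np
from itertools import product

def grid_search(error_fn, C_grid, sigma_grid):
    """Evaluate error_fn on every (C, sigma) pair; return the best pair."""
    return min(product(C_grid, sigma_grid), key=lambda p: error_fn(*p))

# synthetic error surface whose minimum sits at C = 10, sigma = 0.1
toy_error = lambda C, s: (np.log10(C) - 1.0) ** 2 + (s - 0.1) ** 2

C_grid = [0.01, 0.1, 1, 10, 100, 1000]
sigma_grid = [0.01, 0.05, 0.1, 0.2]
best = grid_search(toy_error, C_grid, sigma_grid)   # (10, 0.1)
```

The cost grows with the product of the grid sizes, whereas BA spends its evaluations adaptively, concentrating on promising regions of the (C, σ) space.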

Page 24: A BA-based algorithm for parameter optimization of support vector machine

Experimental Results and Discussions BA-SVM vs. other Optimization Algorithms

Table: Comparison between the BA-SVM, the PSO+SVM approach proposed by Lin [15], and the GA+SVM approach proposed by Huang [16] (%).

Dataset | (1) BA-SVM   | (2) PSO+SVM   | (3) GA+SVM    | p (Wilcoxon, 1 vs. 2) | p (Wilcoxon, 1 vs. 3)
D1 | 0.0 ± 0.0    | 2.0 ± 0.23   | 0.0 ± 0.0    | <0.005 | 0.0052
D2 | 0.30 ± 0.20  | 2.50 ± 0.65  | 0.57 ± 0.53  | <0.005 | <0.005
D3 | 12.0 ± 1.20  | 14.23 ± 1.93 | 16.86 ± 2.45 | <0.005 | <0.005
D4 | 0.80 ± 0.30  | 2.05 ± 0.74  | 1.0 ± 0.42   | <0.005 | <0.005
D5 | 0.90 ± 0.80  | 11.68 ± 2.64 | 8.40 ± 3.14  | <0.005 | <0.005
D6 | 2.10 ± 0.60  | 6.48 ± 2.47  | 8.41 ± 2.19  | <0.005 | <0.005
D7 | 13.50 ± 1.20 | 21.96 ± 4.59 | 19.26 ± 4.55 | <0.005 | <0.005
D8 | 0.30 ± 0.20  | 0.40 ± 0.33  | 0.0 ± 0.15   | <0.005 | 0.0058
D9 | 14.30 ± 1.50 | 19.79 ± 6.12 | 16.4 ± 4.21  | <0.005 | <0.005

Page 25: A BA-based algorithm for parameter optimization of support vector machine

Experimental Results and Discussions BA-SVM vs. other Optimization Algorithms

[Figure panels: (a) error surface of the test error (%) over log(C) and log(σ); (b) contour plot of the same surface.]

Figure: Test error surface and contour plot with parameters on the iris dataset.

Page 26: A BA-based algorithm for parameter optimization of support vector machine

Conclusions and Future Work

The parameters of SVM (C and σ) and the influence of each parameter on the classification performance were discussed.

It was shown how BA optimizes the SVM parameters, reaching lower test error rates than grid search, PSO+SVM, and GA+SVM on the tested datasets.