Multiclass SVM Design and Parameter Selection with Genetic Algorithms
Ana Carolina Lorena
André C. P. L. F. de Carvalho
ICMC - USP
Topics
- Introduction
- Code-Matrix Decomposition
- Evolutionary Design of Code-Matrices
- Optimization of SVMs Parameters
- Conclusions
Introduction
- Classification problems
  • Binary
  • Multiclass
- Support Vector Machines (SVMs)
  • Statistical learning theory
  • Margin maximization
  • Originally: binary problems
- Multiclass extension → Direct or Decomposition
Introduction
- Main decomposition strategies
  • Multiclass decomposition defined a priori
- This work: adapt decompositions to each problem with Genetic Algorithms
  • Extension to determine the SVMs parameters
Code-Matrix Decomposition
- Represented by a code-matrix M

             f1         f2         f3         f4
  class 1    +1         -1         -1         +1
  class 2    -1         +1          0         -1
  class 3     0         -1         +1         -1
  class 4    -1          0         -1         +1
          (1)x(2,4)  (2)x(1,3)  (3)x(1,4)  (1,4)x(2,3)
Code-Matrix Decomposition
- Decoding
  • Each binary classifier f1, ..., f4 is applied to a new example x
  • The output vector is compared with the rows of M by a decoding function d

             f1    f2    f3    f4
  class 1    +1    -1    -1    -1
  class 2    -1    +1    -1    -1
  class 3    -1    -1    +1    -1
  class 4    -1    -1    -1    +1

  x → (f1, f2, f3, f4)(x) = (-0.1, 2.0, -0.1, -0.1) → d → class 2
- Decoding function d: margin-based
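The margin-based decoding step above can be sketched as follows. This is a minimal illustration, not the authors' implementation; the loss used here, a sum of hinge terms max(0, 1 - m_ij * f_j(x)) over the non-zero matrix entries, is one common margin-based choice:

```python
# Margin-based decoding sketch: the predicted class is the row of the
# code matrix with the smallest margin-based loss against the vector
# of binary classifier outputs.
import numpy as np

def decode(M, outputs):
    """M: (k classes, L classifiers) matrix with entries in {-1, 0, +1};
    outputs: length-L vector of real-valued classifier outputs f_j(x)."""
    # Entries with m_ij = 0 contribute nothing to the loss of row i.
    losses = np.where(M != 0, np.maximum(0.0, 1.0 - M * outputs), 0.0).sum(axis=1)
    return int(np.argmin(losses))  # index of the predicted class

# OAA matrix for 4 classes, and the output vector from the slide:
M = np.array([[+1, -1, -1, -1],
              [-1, +1, -1, -1],
              [-1, -1, +1, -1],
              [-1, -1, -1, +1]])
print(decode(M, np.array([-0.1, 2.0, -0.1, -0.1])))  # 1, i.e. class 2
```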
Code-Matrix Decomposition
- Main decompositions:
  • One-against-all (OAA): one classifier for each class

  OAA        f1           f2           f3           f4
  class 1    +1           -1           -1           -1
  class 2    -1           +1           -1           -1
  class 3    -1           -1           +1           -1
  class 4    -1           -1           -1           +1
          (1)x(2,3,4)  (2)x(1,3,4)  (3)x(1,2,4)  (4)x(1,2,3)
Code-Matrix Decomposition
- Main decompositions:
  • One-against-all (OAA): one classifier for each class
  • One-against-one (OAO): one classifier for each pair of classes

  OAO        f1       f2       f3       f4       f5       f6
  class 1    +1       +1       +1        0        0        0
  class 2    -1        0        0       +1       +1        0
  class 3     0       -1        0       -1        0       +1
  class 4     0        0       -1        0       -1       -1
          (1)x(2)  (1)x(3)  (1)x(4)  (2)x(3)  (2)x(4)  (3)x(4)
Code-Matrix Decomposition
- Main decompositions:
  • One-against-all (OAA): one classifier for each class
  • One-against-one (OAO): one classifier for each pair of classes
  • Error Correcting Output Codes (ECOC): error-correcting codes for the classes

  ECOC       f1    f2    f3    f4    f5    f6    f7
  class 1    +1    +1    +1    +1    +1    +1    +1
  class 2    -1    -1    -1    -1    +1    +1    +1
  class 3    -1    -1    +1    +1    -1    -1    +1
  class 4    -1    +1    -1    +1    -1    +1    -1

  (f1: (1)x(2,3,4), f2: (1,4)x(2,3), f3: (1,3)x(2,4), f4: (1,3,4)x(2),
   f5: (1,2)x(3,4), f6: (1,2,4)x(3), f7: (1,2,3)x(4))
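The OAA and OAO matrices above can also be built programmatically. A short sketch (the helper names are ours, not from the slides):

```python
# Build the standard OAA and OAO code matrices for k classes,
# following the definitions on the slides.
import itertools
import numpy as np

def oaa_matrix(k):
    # One column per class: +1 for that class, -1 for all others.
    return 2 * np.eye(k, dtype=int) - 1

def oao_matrix(k):
    # One column per pair (i, j): +1 for class i, -1 for class j, 0 elsewhere.
    cols = []
    for i, j in itertools.combinations(range(k), 2):
        col = np.zeros(k, dtype=int)
        col[i], col[j] = +1, -1
        cols.append(col)
    return np.stack(cols, axis=1)

print(oaa_matrix(4))        # the 4x4 OAA matrix from the slide
print(oao_matrix(4).shape)  # (4, 6): one column per pair of classes
```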
Evolutionary Design of Code-Matrices
- Determine the combination of binary predictors
  • Matrices adapted to each application
- Genetic Algorithms choose:
  • The binary classifiers
  • The number of classifiers
  • Avoiding equal classifiers
Individuals Representation
- Code matrices of varying sizes
  • Between a minimum (log2 k) and a maximum number of columns

  Individual 1                 Individual 2
           f1  f2  f3  f4           f1  f2  f3  f4  f5  f6
  class 1  +1  -1  -1  -1           +1  +1  +1   0   0   0
  class 2  -1  +1  -1  -1           -1   0   0  +1  +1   0
  class 3  -1  -1  +1  -1            0  -1   0  -1   0  +1
  class 4  -1  -1  -1  +1            0   0  -1   0  -1  -1
Individuals Evaluation
- Predictive power in the multiclass problem solution
  • Error on a validation set
  • Includes unknown classifications
- Simpler solutions (Occam's razor)
  • Equivalent columns are penalized
  • Number of binary classifiers is minimized: preference is given to smaller individuals
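The evaluation criteria above can be sketched as a single fitness score. This is a minimal illustration of the idea, with illustrative penalty weights rather than the authors' exact formula; lower is better:

```python
# Fitness sketch: validation error plus penalties for equivalent
# columns and for matrix size (weights are illustrative assumptions).
import numpy as np

def fitness(M, validation_error, size_weight=0.01, dup_weight=0.1):
    k, n_cols = M.shape
    # Count pairs of equivalent columns: identical or sign-flipped
    # columns induce the same binary problem.
    dup = 0
    for i in range(n_cols):
        for j in range(i + 1, n_cols):
            if np.array_equal(M[:, i], M[:, j]) or np.array_equal(M[:, i], -M[:, j]):
                dup += 1
    return validation_error + dup_weight * dup + size_weight * n_cols

M = np.array([[+1, -1, +1],
              [-1, +1, -1],
              [-1, -1, +1],
              [-1, -1, -1]])
# All three columns here are distinct; a duplicated column would add 0.1.
print(fitness(M, validation_error=0.25))
```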
Genetic Operators
- Cross-over
  • Column exchange: the parents swap the columns after a randomly chosen cross-over point, producing two offspring
  • Columns permute: the offspring receive permuted subsets of the parents' columns
[figure: Parent 1 (4 columns) and Parent 2 (6 columns) recombined at a cross-over point into Offspring 1 and Offspring 2]
- Mutation
  • Element alteration
  • Column alteration
  • Column insertion
  • Column removal
[figure: example individuals before and after each mutation type]
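The operators listed above can be sketched on code matrices with entries in {-1, 0, +1}. An assumed implementation of the column-exchange cross-over and two of the mutation types (details such as the random choices are ours):

```python
# Sketch of column-exchange cross-over and two mutation operators
# acting on code matrices (rows = classes, columns = classifiers).
import numpy as np

rng = np.random.default_rng(0)

def column_exchange(p1, p2):
    # Swap the columns after a random cross-over point in each parent;
    # offspring may differ in width when the parents do.
    c1 = rng.integers(1, p1.shape[1])
    c2 = rng.integers(1, p2.shape[1])
    o1 = np.hstack([p1[:, :c1], p2[:, c2:]])
    o2 = np.hstack([p2[:, :c2], p1[:, c1:]])
    return o1, o2

def element_alteration(M):
    # Flip one randomly chosen entry to another value in {-1, 0, +1}.
    M = M.copy()
    i, j = rng.integers(M.shape[0]), rng.integers(M.shape[1])
    M[i, j] = rng.choice([v for v in (-1, 0, 1) if v != M[i, j]])
    return M

def column_removal(M):
    # Drop one randomly chosen column (a real GA would respect the
    # minimum matrix size).
    return np.delete(M, rng.integers(M.shape[1]), axis=1)

p1 = 2 * np.eye(4, dtype=int) - 1   # OAA-like parent, 4 columns
p2 = np.hstack([p1, -p1[:, :2]])    # wider parent, 6 columns
o1, o2 = column_exchange(p1, p2)
print(o1.shape[1] + o2.shape[1])    # total number of columns is preserved: 10
```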
Evolutionary Design of Code-Matrices
- Choice of cross-over and mutation types
  • Probabilistic: depends on the performance in previous generations
- Elitism
- Tournament selection
- Initial population
  • Random solutions
  • Common code-matrices
Experiments – Datasets

  Dataset    #Train    #Classes
  car        1728      4
  lung       203       5
  segment    2310      7
  fungi      2355      9

- Train-test division
- Validation sets
  • Holdout in the training sets
- Pre-processing
  • Nominal attributes
  • Attribute selection
  • Normalization
  • Cross-validation
Experiments I
- OAA: most used decomposition
  • Good results for SVMs
  • Fewer binary classifiers
- GAs: maximum number of classifiers = k
- Objectives: determine adequate decompositions that are simpler than OAA, with similar or higher accuracy rates
Experiments I
- SVMs parameters (equal in all binary SVMs):
  • Gaussian kernel, σ = 0.01
  • Regularization constant C = 100
- GAs configuration:

  Cycles    Population    Mutation    Selection    Cross-over
  50*k      5*k           0.05        0.75         0.8
Experiments I
- GAs: executed 30 times for each partition
  • Matrices with medium performance used in the analysis
- Random matrices
  • Error-correcting heuristic
  • Varying numbers of classifiers
  • 30 for each dataset
  • Matrices with medium performance used in the analysis
Experiments I - Results
- Accuracy rates (%) on car, lun, seg and fun, comparing OAA, random matrices (Rand.) and GA [bar chart; values range from 39.8 to 97.8]
Experiments I - Results
- Number of binary classifiers [bar chart]:

  Dataset    OAA    GA
  car        4      3.6
  lun        5      3.6
  seg        7      6.5
  fun        9      7.7
Experiments I - Conclusions
- GA accomplished its goals
  • Comparable to or better than OAA
  • Fewer binary classifiers
  • Similar to or better than the other matrices
- Obtained matrices:
  • Adapted to the applications and also to the SVMs
- Next step: adapt the SVMs parameters as well
Optimization of SVMs Parameters
- Extension in the individuals
  • Parameters for each SVM: bit strings encoding C and σ are attached to each column of the code matrix
  • Decoding of the bits: parameter = b^(d+a), where b = base, d = decimal value of the bits, a = constant
  • Example: b = 10, a = -3, d = 0 → σ = 10^(0-3) = 0.001
[figure: a 4-class code matrix (f1-f4) with the bit strings for C and σ attached to each classifier]
Experiments II
- Two bits for each parameter
  • C ∈ {10^0, 10^1, 10^2, 10^3} (a = 0 and b = 10)
  • σ ∈ {10^-3, 10^-2, 10^-1, 10^0} (a = -3 and b = 10)
- Same parameters for the GA
- Objectives: determine decompositions and the parameters of the binary SVMs; can the accuracy rates achieved be improved?
Experiments II - Results
- Accuracy rates (%) [bar chart]:

  Dataset    GA      GAsel
  car        97.8    99.3
  lun        85.5    90.2
  seg        62      65.1
  fun        96.2    96.7
Experiments II - Results
- Number of binary classifiers [bar chart]:

  Dataset    GA     GAsel
  car        3.6    3.5
  lun        3.6    3.6
  seg        6.5    6.8
  fun        7.7    7.7
Experiments II - Conclusions
- Parameter adjustment can improve the solutions
  • Difficult to adjust by trial and error
  • Usually, common values are used for all binary classifiers
- GAs can be used in this process
Conclusions
- GAs can be used to adapt code-matrices
  • And also the parameters of the binary technique
- Use of fewer binary SVMs
- Good results in Bioinformatics applications
- Computational cost of the GAs