Multiclass SVM Design and Parameter Selection with Genetic Algorithms
Ana Carolina Lorena
André C. P. L. F. de Carvalho
ICMC - USP
Topics
- Introduction
- Code-Matrix Decomposition
- Evolutionary Design of Code-Matrices
- Optimization of SVMs Parameters
- Conclusions
Introduction
- Classification problems
  • Binary
  • Multiclass
- Support Vector Machines (SVMs)
  • Statistical learning theory
  • Margin maximization
  • Originally: binary problems
- Multiclass extension → Direct or Decomposition
Introduction
- Main decomposition strategies
  • Multiclass decomposition defined a priori
- This work: adapt decompositions to each problem with Genetic Algorithms
  • Extension to determine the SVMs parameters
Code-Matrix Decomposition
- Represented by a code-matrix M

             f1         f2         f3         f4
  class 1    +1         -1         -1         +1
  class 2    -1         +1          0         -1
  class 3     0         -1         +1         -1
  class 4    -1          0         -1         +1
          (1)x(2,4)  (2)x(1,3)  (3)x(1,4)  (1,4)x(2,3)
Code-Matrix Decomposition
- Decoding
  • Each binary classifier f1, ..., f4 is applied to a new example x
  • The output vector is compared with the rows of M by a decoding function d

             f1    f2    f3    f4
  class 1    +1    -1    -1    -1
  class 2    -1    +1    -1    -1
  class 3    -1    -1    +1    -1
  class 4    -1    -1    -1    +1

  x → (f1, f2, f3, f4)(x) = (-0.1, 2.0, -0.1, -0.1) → d → class 2
- Decoding function d: margin-based
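The margin-based decoding step above can be sketched as follows. This is a minimal illustration, not the authors' implementation; the loss used here, a sum of hinge terms max(0, 1 - m_ij * f_j(x)) over the non-zero matrix entries, is one common margin-based choice:

```python
# Margin-based decoding sketch: the predicted class is the row of the
# code matrix with the smallest margin-based loss against the vector
# of binary classifier outputs.
import numpy as np

def decode(M, outputs):
    """M: (k classes, L classifiers) matrix with entries in {-1, 0, +1};
    outputs: length-L vector of real-valued classifier outputs f_j(x)."""
    # Entries with m_ij = 0 contribute nothing to the loss of row i.
    losses = np.where(M != 0, np.maximum(0.0, 1.0 - M * outputs), 0.0).sum(axis=1)
    return int(np.argmin(losses))  # index of the predicted class

# OAA matrix for 4 classes, and the output vector from the slide:
M = np.array([[+1, -1, -1, -1],
              [-1, +1, -1, -1],
              [-1, -1, +1, -1],
              [-1, -1, -1, +1]])
print(decode(M, np.array([-0.1, 2.0, -0.1, -0.1])))  # 1, i.e. class 2
```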
Code-Matrix Decomposition
- Main decompositions:
  • One-against-all (OAA): one classifier for each class

  OAA        f1           f2           f3           f4
  class 1    +1           -1           -1           -1
  class 2    -1           +1           -1           -1
  class 3    -1           -1           +1           -1
  class 4    -1           -1           -1           +1
          (1)x(2,3,4)  (2)x(1,3,4)  (3)x(1,2,4)  (4)x(1,2,3)
Code-Matrix Decomposition
- Main decompositions:
  • One-against-all (OAA): one classifier for each class
  • One-against-one (OAO): one classifier for each pair of classes

  OAO        f1       f2       f3       f4       f5       f6
  class 1    +1       +1       +1        0        0        0
  class 2    -1        0        0       +1       +1        0
  class 3     0       -1        0       -1        0       +1
  class 4     0        0       -1        0       -1       -1
          (1)x(2)  (1)x(3)  (1)x(4)  (2)x(3)  (2)x(4)  (3)x(4)
Code-Matrix Decomposition
- Main decompositions:
  • One-against-all (OAA): one classifier for each class
  • One-against-one (OAO): one classifier for each pair of classes
  • Error Correcting Output Codes (ECOC): error-correcting codes for the classes

  ECOC       f1    f2    f3    f4    f5    f6    f7
  class 1    +1    +1    +1    +1    +1    +1    +1
  class 2    -1    -1    -1    -1    +1    +1    +1
  class 3    -1    -1    +1    +1    -1    -1    +1
  class 4    -1    +1    -1    +1    -1    +1    -1

  (f1: (1)x(2,3,4), f2: (1,4)x(2,3), f3: (1,3)x(2,4), f4: (1,3,4)x(2),
   f5: (1,2)x(3,4), f6: (1,2,4)x(3), f7: (1,2,3)x(4))
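The OAA and OAO matrices above can also be built programmatically. A short sketch (the helper names are ours, not from the slides):

```python
# Build the standard OAA and OAO code matrices for k classes,
# following the definitions on the slides.
import itertools
import numpy as np

def oaa_matrix(k):
    # One column per class: +1 for that class, -1 for all others.
    return 2 * np.eye(k, dtype=int) - 1

def oao_matrix(k):
    # One column per pair (i, j): +1 for class i, -1 for class j, 0 elsewhere.
    cols = []
    for i, j in itertools.combinations(range(k), 2):
        col = np.zeros(k, dtype=int)
        col[i], col[j] = +1, -1
        cols.append(col)
    return np.stack(cols, axis=1)

print(oaa_matrix(4))        # the 4x4 OAA matrix from the slide
print(oao_matrix(4).shape)  # (4, 6): one column per pair of classes
```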
Evolutionary Design of Code-Matrices
- Determine the combination of binary predictors
  • Matrices adapted to each application
- Genetic Algorithms choose:
  • The binary classifiers
  • The number of classifiers
  • Avoiding equal classifiers
Individuals Representation
- Code matrices of varying sizes
  • Between a minimum (log2 k) and a maximum number of columns

  Individual 1                 Individual 2
           f1  f2  f3  f4           f1  f2  f3  f4  f5  f6
  class 1  +1  -1  -1  -1           +1  +1  +1   0   0   0
  class 2  -1  +1  -1  -1           -1   0   0  +1  +1   0
  class 3  -1  -1  +1  -1            0  -1   0  -1   0  +1
  class 4  -1  -1  -1  +1            0   0  -1   0  -1  -1
Individuals Evaluation
- Predictive power in the multiclass problem solution
  • Error on a validation set
  • Includes unknown classifications
- Simpler solutions (Occam's razor)
  • Equivalent columns are penalized
  • Number of binary classifiers is minimized: preference is given to smaller individuals
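The evaluation criteria above can be sketched as a single fitness score. This is a minimal illustration of the idea, with illustrative penalty weights rather than the authors' exact formula; lower is better:

```python
# Fitness sketch: validation error plus penalties for equivalent
# columns and for matrix size (weights are illustrative assumptions).
import numpy as np

def fitness(M, validation_error, size_weight=0.01, dup_weight=0.1):
    k, n_cols = M.shape
    # Count pairs of equivalent columns: identical or sign-flipped
    # columns induce the same binary problem.
    dup = 0
    for i in range(n_cols):
        for j in range(i + 1, n_cols):
            if np.array_equal(M[:, i], M[:, j]) or np.array_equal(M[:, i], -M[:, j]):
                dup += 1
    return validation_error + dup_weight * dup + size_weight * n_cols

M = np.array([[+1, -1, +1],
              [-1, +1, -1],
              [-1, -1, +1],
              [-1, -1, -1]])
# All three columns here are distinct; a duplicated column would add 0.1.
print(fitness(M, validation_error=0.25))
```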
Genetic Operators
- Cross-over
  • Column exchange: the parents swap the columns after a randomly chosen cross-over point, producing two offspring
  • Columns permute: the offspring receive permuted subsets of the parents' columns
[figure: Parent 1 (4 columns) and Parent 2 (6 columns) recombined at a cross-over point into Offspring 1 and Offspring 2]
- Mutation
  • Element alteration
  • Column alteration
  • Column insertion
  • Column removal
[figure: example individuals before and after each mutation type]
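The operators listed above can be sketched on code matrices with entries in {-1, 0, +1}. An assumed implementation of the column-exchange cross-over and two of the mutation types (details such as the random choices are ours):

```python
# Sketch of column-exchange cross-over and two mutation operators
# acting on code matrices (rows = classes, columns = classifiers).
import numpy as np

rng = np.random.default_rng(0)

def column_exchange(p1, p2):
    # Swap the columns after a random cross-over point in each parent;
    # offspring may differ in width when the parents do.
    c1 = rng.integers(1, p1.shape[1])
    c2 = rng.integers(1, p2.shape[1])
    o1 = np.hstack([p1[:, :c1], p2[:, c2:]])
    o2 = np.hstack([p2[:, :c2], p1[:, c1:]])
    return o1, o2

def element_alteration(M):
    # Flip one randomly chosen entry to another value in {-1, 0, +1}.
    M = M.copy()
    i, j = rng.integers(M.shape[0]), rng.integers(M.shape[1])
    M[i, j] = rng.choice([v for v in (-1, 0, 1) if v != M[i, j]])
    return M

def column_removal(M):
    # Drop one randomly chosen column (a real GA would respect the
    # minimum matrix size).
    return np.delete(M, rng.integers(M.shape[1]), axis=1)

p1 = 2 * np.eye(4, dtype=int) - 1   # OAA-like parent, 4 columns
p2 = np.hstack([p1, -p1[:, :2]])    # wider parent, 6 columns
o1, o2 = column_exchange(p1, p2)
print(o1.shape[1] + o2.shape[1])    # total number of columns is preserved: 10
```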
Evolutionary Design of Code-Matrices
- Choice of cross-over and mutation types
  • Probabilistic: depends on the performance in previous generations
- Elitism
- Tournament selection
- Initial population
  • Random solutions
  • Common code-matrices
Experiments – Datasets

  Dataset    #Train    #Classes
  car        1728      4
  lung       203       5
  segment    2310      7
  fungi      2355      9

- Train-test division
- Validation sets
  • Holdout in the training sets
- Pre-processing
  • Nominal attributes
  • Attribute selection
  • Normalization
  • Cross-validation
Experiments I
- OAA: most used decomposition
  • Good results for SVMs
  • Fewer binary classifiers
- GAs: maximum number of classifiers = k
- Objectives: determine adequate decompositions that are simpler than OAA, with similar or higher accuracy rates
Experiments I
- SVMs parameters (equal in all binary SVMs):
  • Gaussian kernel, σ = 0.01
  • Regularization constant C = 100
- GAs configuration:

  Cycles    Population    Mutation    Selection    Cross-over
  50*k      5*k           0.05        0.75         0.8
Experiments I
- GAs: executed 30 times for each partition
  • Matrices with medium performance used in the analysis
- Random matrices
  • Error-correcting heuristic
  • Varying numbers of classifiers
  • 30 for each dataset
  • Matrices with medium performance used in the analysis
Experiments I - Results
- Accuracy rates (%) on car, lun, seg and fun, comparing OAA, random matrices (Rand.) and GA [bar chart; values range from 39.8 to 97.8]
Experiments I - Results
- Number of binary classifiers [bar chart]:

  Dataset    OAA    GA
  car        4      3.6
  lun        5      3.6
  seg        7      6.5
  fun        9      7.7
Experiments I - Conclusions
- GA accomplished its goals
  • Comparable to or better than OAA
  • Fewer binary classifiers
  • Similar to or better than the other matrices
- Obtained matrices:
  • Adapted to the applications and also to the SVMs
- Next step: adapt the SVMs parameters as well
Optimization of SVMs Parameters
- Extension in the individuals
  • Parameters for each SVM: bit strings encoding C and σ are attached to each column of the code matrix
  • Decoding of the bits: parameter = b^(d+a), where b = base, d = decimal value of the bits, a = constant
  • Example: b = 10, a = -3, d = 0 → σ = 10^(0-3) = 0.001
[figure: a 4-class code matrix (f1-f4) with the bit strings for C and σ attached to each classifier]
Experiments II
- Two bits for each parameter
  • C ∈ {10^0, 10^1, 10^2, 10^3} (a = 0 and b = 10)
  • σ ∈ {10^-3, 10^-2, 10^-1, 10^0} (a = -3 and b = 10)
- Same parameters for the GA
- Objectives: determine decompositions and the parameters of the binary SVMs; can the accuracy rates achieved be improved?
Experiments II - Results
- Accuracy rates (%) [bar chart]:

  Dataset    GA      GAsel
  car        97.8    99.3
  lun        85.5    90.2
  seg        62      65.1
  fun        96.2    96.7
Experiments II - Results
- Number of binary classifiers [bar chart]:

  Dataset    GA     GAsel
  car        3.6    3.5
  lun        3.6    3.6
  seg        6.5    6.8
  fun        7.7    7.7
Experiments II - Conclusions
- Parameter adjustment can improve the solutions
  • Difficult to adjust by trial and error
  • Usually, common values are used for all binary classifiers
- GAs can be used in this process
Conclusions
- GAs can be used to adapt code-matrices
  • And also the parameters of the binary technique
- Use of fewer binary SVMs
- Good results in Bioinformatics applications
- Computational cost of the GAs