20110809 svm in pd prediction
TRANSCRIPT
7/28/2019 20110809 SVM in PD Prediction
1/52
Genetic Algorithm for Support Vector Machines Optimization in Probability of Default Prediction

Wolfgang Härdle
Dedy Dwi Prastyo

Ladislaus von Bortkiewicz Chair of Statistics
C.A.S.E. Center for Applied Statistics and Economics
Humboldt-Universität zu Berlin
http://lvb.wiwi.hu-berlin.de
http://www.case.hu-berlin.de
Introduction 1-1
Classifier
Figure 1: Linear classifier functions (1 and 2) and a non-linear one (3)
Support Vector Machines
Introduction 1-2
Loss
A non-linear classifier function f is described by a function class F fixed a priori, e.g. the class of linear classifiers (hyperplanes).

L(x, y) = \frac{1}{2} |f(x) - y| =
\begin{cases}
0, & \text{if classification is correct,} \\
1, & \text{if classification is wrong.}
\end{cases}
Introduction 1-3
Expected and Empirical Risk
Expected risk: the expected value of the loss under the true probability measure

R(f) = \frac{1}{2} \int |f(x) - y| \, dF(x, y)

Empirical risk: the average value of the loss over the training set

\hat{R}(f) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{2} |f(x_i) - y_i|
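As a quick illustration, the empirical risk above is just the average 0-1 loss over the training sample. A minimal numpy sketch (the data and the sign classifier are made up for illustration):

```python
import numpy as np

def empirical_risk(f, X, y):
    """Average 0-1 loss (1/2)|f(x_i) - y_i| of classifier f over a sample."""
    preds = np.array([f(x) for x in X])
    return float(np.mean(0.5 * np.abs(preds - y)))

# Toy data: a sign classifier on 1-D inputs; one point is labeled
# against the sign rule, so it is misclassified.
X = np.array([[-2.0], [-1.0], [1.0], [3.0]])
y = np.array([-1, 1, 1, 1])
f = lambda x: 1 if x[0] >= 0 else -1
print(empirical_risk(f, X, y))   # 0.25: one of four points is wrong
```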
Introduction 1-4
VC bound
Vapnik-Chervonenkis (VC) bound: there is a function \phi (monotone increasing in the VC dimension h) such that for all f \in F, with probability 1 - \eta, the following holds:

R(f) \le \hat{R}(f) + \phi\left(\frac{h}{n}, \frac{\log(\eta)}{n}\right)
Outline
1. Introduction
2. Support Vector Machine (SVM)
3. Feature Selection
4. Application
5. Conclusions
Support Vector Machines 3-1
SVM
Classification
Data D_n = \{(x_1, y_1), \ldots, (x_n, y_n)\} \in (\mathcal{X} \times \mathcal{Y})^n
\mathcal{X} \subseteq \mathbb{R}^d and \mathcal{Y} = \{-1, 1\}
Goal: predict Y for a new observation x \in \mathcal{X}, based on the information in D_n
Support Vector Machines 3-2
Linearly (Non-) Separable Case detail
Figure 2: Hyperplane and its margin (d) in the linearly (non-)separable case
Support Vector Machines 3-3
SVM Dual Problem
\max_{\alpha} L_D(\alpha) = \max_{\alpha} \left\{ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j x_i^{\top} x_j \right\},

\text{s.t. } 0 \le \alpha_i \le C, \quad \sum_{i=1}^{n} \alpha_i y_i = 0
Support Vector Machines 3-4
Data Space → Feature Space
Figure 3: Mapping a two-dimensional data space into a three-dimensional feature space, \mathbb{R}^2 \to \mathbb{R}^3. The transformation \Psi(x_1, x_2) = (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2) corresponds to K(x_i, x_j) = (x_i^{\top} x_j)^2
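The kernel trick in Figure 3 can be checked numerically: the inner product of the mapped points equals the degree-2 polynomial kernel evaluated in the data space. A small sketch (the test points are arbitrary):

```python
import numpy as np

# Explicit feature map psi(x1, x2) = (x1^2, sqrt(2)*x1*x2, x2^2), R^2 -> R^3
def psi(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def k(xi, xj):
    return float(xi @ xj) ** 2   # K(xi, xj) = (xi' xj)^2

xi = np.array([1.0, 2.0])
xj = np.array([3.0, -1.0])

# Inner product in feature space equals the kernel in data space
print(psi(xi) @ psi(xj), k(xi, xj))   # both give 1.0 for these points
```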
Support Vector Machines 3-5
Non-Linear SVM
\max_{\alpha} L_D(\alpha) = \max_{\alpha} \left\{ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \right\},

\text{s.t. } 0 \le \alpha_i \le C, \quad \sum_{i=1}^{n} \alpha_i y_i = 0

Gaussian RBF kernel: K(x_i, x_j) = \exp\left(-\frac{1}{\sigma^2} \|x_i - x_j\|^2\right)
Polynomial kernel: K(x_i, x_j) = (x_i^{\top} x_j + 1)^p
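The two kernels can be sketched directly; the symbol σ for the RBF width is an assumption here, since the extraction lost the parameter symbol and the slides only report fitted values:

```python
import numpy as np

# Kernel sketches matching the slide; sigma is the assumed RBF width symbol.
def rbf_kernel(xi, xj, sigma=1.0):
    d = xi - xj
    return np.exp(-(d @ d) / sigma ** 2)

def poly_kernel(xi, xj, p=2):
    return (xi @ xj + 1.0) ** p

xi = np.array([1.0, 0.0])
xj = np.array([0.0, 1.0])
print(rbf_kernel(xi, xi))        # 1.0: a point has maximal similarity to itself
print(poly_kernel(xi, xj, p=2))  # (0 + 1)^2 = 1.0
```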
Feature Selection 4-1
Structural Risk Minimization (SRM)
Search for the model structure S_h,

S_{h_1} \subset S_{h_2} \subset \ldots \subset S_h \subset \ldots \subset S_{h_k} = F

such that f \in S_h minimises the expected risk bound, where F is the class of linear functions and h is the VC dimension, i.e.

SVM(h_1) \subset \ldots \subset SVM(h) \subset \ldots \subset SVM(h_k) = F

with h corresponding to the value of the SVM (kernel) parameter
Feature Selection 4-2
Evolutionary Feature Selection GA
Feature selection and SVM parameter optimization via evolutionary optimization: the Genetic Algorithm (GA)
GA searches for a globally optimal solution
Feature Selection 4-3
GA - SVM
[Diagram: a population is evaluated for fitness by learning SVM models on the data; selection, crossover, and mutation on the mating pool generate the next population]

Figure 4: Iteration (generation) in GA-SVM
Application 5-1
Credit Scoring & Probability of Default
Score (Sc) from the SVM method

Sc(x) = \sum_{i=1}^{n} \alpha_i y_i K(x_i, x)

Probability of Default (PD)

\hat{f}(y = 1 \mid Sc) = \frac{1}{1 + \exp(\beta_0 + \beta_1 Sc)}

\beta_0 and \beta_1 are estimated by minimizing the negative log-likelihood function (Karatzoglou and Meyer, 2006)
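The score-to-PD mapping is a logistic (Platt-style) transformation; a minimal sketch, with illustrative coefficients b0, b1 in place of the fitted ones:

```python
import math

# Platt-style mapping of an SVM score to a default probability,
# PD(Sc) = 1 / (1 + exp(b0 + b1 * Sc)). In practice b0, b1 are fit by
# minimizing the negative log-likelihood on training scores; the values
# here are illustrative.
def pd_from_score(score, b0=0.0, b1=-1.0):
    return 1.0 / (1.0 + math.exp(b0 + b1 * score))

# With b1 < 0, a larger score maps to a higher PD (sign conventions vary).
print(pd_from_score(0.0))   # 0.5 at the decision boundary
```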
Application 5-2
Validation of Scores
Discriminatory power (of the score)
Cumulative Accuracy Profile (CAP) curve
Receiver Operating Characteristic (ROC) curve
Accuracy, Specificity, Sensitivity
Application 5-3
[Plot residue: CAP curve with the sample ranked by its score on the horizontal axis, showing the perfect model, random model, a model with zero predictive power, and the model being evaluated; ROC curve with false alarm rate against hit rate and the same reference models]

Figure 5: CAP curve (left) and ROC curve (right)
Application 5-4
Discriminatory power
Cumulative Accuracy Profile (CAP) curve
  CAP / Power / Lorenz curve
  Accuracy Ratio (AR)
  Total sample vs. default sample
Receiver Operating Characteristic (ROC) curve
  ROC curve
  Area Under Curve (AUC)
  Non-default sample vs. default sample
Relationship: AR = 2 AUC - 1
Application 5-5
Discriminatory power (contd)
                 sample
predicted        default (1)           non-default (-1)
(1)              True Positive (TP)    False Positive (FP)
(-1)             False Negative (FN)   True Negative (TN)
total            P                     N

Accuracy: P(\hat{Y} = Y) = (TP + TN) / (P + N)
Specificity: P(\hat{Y} = -1 \mid Y = -1) = TN / N
Sensitivity: P(\hat{Y} = 1 \mid Y = 1) = TP / P
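These measures follow directly from the confusion counts, and AR can be obtained from AUC via AR = 2 AUC - 1. A sketch with made-up counts:

```python
# Classification quality measures from confusion-matrix counts, as on the
# slide (P = TP + FN defaults, N = FP + TN non-defaults); counts are toy data.
def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

def specificity(fp, tn):
    return tn / (fp + tn)        # P(Yhat = -1 | Y = -1) = TN / N

def sensitivity(tp, fn):
    return tp / (tp + fn)        # P(Yhat = 1 | Y = 1) = TP / P

def accuracy_ratio(auc):
    return 2 * auc - 1           # AR = 2 * AUC - 1

print(accuracy(40, 10, 5, 45))   # 0.85
print(accuracy_ratio(0.5))       # 0.0: a random model has AUC 0.5, hence AR 0
```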
Application 5-6
Example: Small Sample
100 solvent and insolvent companies
X3 Operating Income / Total Asset
X24 Account Payable / Total Asset
Application 5-7
[Plot residue: two SVM classification plots of x3 against x24]

Figure 6: SVM plot, C = 1 and σ = 1/2, training error 0.19 (left), and GA-SVM, C = 14.86 and σ = 1/121.61, training error 0 (right).
Application 5-8
Credit reform data
type                  solvent (%)    insolvent (%)   total (%)
Manufacturing         27.37 (26.06)  25.70 (1.22)    27.29
Construction          13.88 (13.22)  39.70 (1.89)    15.11
Wholesale and retail  24.78 (23.60)  20.10 (0.96)    24.56
Real estate           17.28 (16.46)   9.40 (0.45)    16.90
total                 83.31 (79.34)  94.90 (4.52)    83.86
others                16.69 (15.90)   5.10 (0.24)    16.14
#                     20,000         1,000           21,000

Table 1: Credit reform data
Application 5-9
Pre-processing
year    solvent # (%)   insolvent # (%)   total # (%)
1997     872 ( 9.08)     86 (0.90)         958 ( 9.98)
1998     928 ( 9.66)     92 (0.96)        1020 (10.62)
1999    1005 (10.47)    112 (1.17)        1117 (11.63)
2000    1379 (14.36)    102 (1.06)        1481 (15.42)
2001    1989 (20.71)    111 (1.16)        2100 (21.87)
2002    2791 (29.07)    135 (1.41)        2926 (30.47)
total   8964 (93.36)    638 (6.64)        9602 (100)

Table 2: Pre-processed credit reform data
Application 5-10
Scenario
scenario     training set   testing set
Scenario-1   1997           1998
Scenario-2   1997-1998      1999
Scenario-3   1997-1999      2000
Scenario-4   1997-2000      2001
Scenario-5   1997-2001      2002

Table 3: Training and testing data set
Application 5-11
Full model, X1, . . . , X28
Predictors 28 financial ratio variables
Population (# solutions) 20
Evolutionary iteration (generation) 100
Elitism 0.2 of population
Crossover rate 0.5, mutation rate 0.1
Optimal SVM parameters: σ = 1/178.75 and C = 63.44
Application 5-12
Quality of classification (1/2)
Disc. power    training   testing
AR             1          1
AUC            1          1
Accuracy       1          1
Specificity    1          1
Sensitivity    1          1

Table 4: Discriminatory power of Scenario-1, 2, 3, 4, 5
Application 5-13
Quality of classification (2/2)
training    TE (CV)    testing   TE (CV)
1997        0 (8.98)   1998      0 ( 9.02)
1997-1998   0 (8.99)   1999      0 (10.03)
1997-1999   0 (9.37)   2000      0 ( 6.89)
1997-2000   0 (8.57)   2001      0 ( 5.29)
1997-2001   0 (4.55)   2002      0 ( 4.61)

Table 5: Percentage of Training Error (TE) and Cross-Validation error (CV, with 5 groups)
Conclusion 6-1
Conclusion
Optimal feature selection (via the Genetic Algorithm) leads to perfect classification
Cross-validation overcomes the overfitting in the training and testing errors
References 7-1
References
Chen, S., Härdle, W. and Moro, R.
Estimation of Default Probabilities with Support Vector Machines
Quantitative Finance, 2011, 11, 135-154

Holland, J.H.
Adaptation in Natural and Artificial Systems
University of Michigan Press, 1975
References 7-2
References
Karatzoglou, A. and Meyer, D.
Support Vector Machines in R
Journal of Statistical Software, 2006, 15:9, 1-28

Zhang, J. L. and Härdle, W.
The Bayesian Additive Classification Tree Applied to Credit Risk Modelling
Computational Statistics and Data Analysis, 2010, 54, 1197-1205
Appendix Linearly Separable Case 9-1
Linearly Separable Case back
Figure 7: Separating hyperplane and its margin (d) in the linearly separable case
Appendix Linearly Separable Case 9-2
Choose f \in F such that the margin (d_- + d_+) is maximal
No-error separation if all i = 1, 2, \ldots, n satisfy

x_i^{\top} w + b \ge +1 \text{ for } y_i = +1
x_i^{\top} w + b \le -1 \text{ for } y_i = -1

Both constraints are combined into

y_i(x_i^{\top} w + b) - 1 \ge 0, \quad i = 1, 2, \ldots, n
Appendix Linearly Separable Case 9-3
The distance between each margin and the separating hyperplane is d_+ = d_- = 1/\|w\|
Maximizing the margin, d_+ + d_- = 2/\|w\|, can be attained by minimizing \|w\| or \|w\|^2
Lagrangian for the primal problem

L_P(w, b) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i \{y_i(x_i^{\top} w + b) - 1\}
Appendix Linearly Separable Case 9-4
Karush-Kuhn-Tucker (KKT) first-order optimality conditions

\frac{\partial L_P}{\partial w_k} = 0: \quad w_k - \sum_{i=1}^{n} \alpha_i y_i x_{ik} = 0, \quad k = 1, \ldots, d
\frac{\partial L_P}{\partial b} = 0: \quad \sum_{i=1}^{n} \alpha_i y_i = 0
y_i(x_i^{\top} w + b) - 1 \ge 0, \quad i = 1, \ldots, n
\alpha_i \ge 0
\alpha_i \{y_i(x_i^{\top} w + b) - 1\} = 0
Appendix Linearly Separable Case 9-5
Solution w = \sum_{i=1}^{n} \alpha_i y_i x_i, therefore

\frac{1}{2}\|w\|^2 = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j x_i^{\top} x_j

\sum_{i=1}^{n} \alpha_i \{y_i(x_i^{\top} w + b) - 1\} = \sum_{i=1}^{n} \alpha_i y_i x_i^{\top} \sum_{j=1}^{n} \alpha_j y_j x_j - \sum_{i=1}^{n} \alpha_i = \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j x_i^{\top} x_j - \sum_{i=1}^{n} \alpha_i

Lagrangian for the dual problem

L_D(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j x_i^{\top} x_j
Appendix Linearly Separable Case 9-6
Primal and dual problems

\min_{w, b} L_P(w, b) \qquad \max_{\alpha} L_D(\alpha) \text{ s.t. } \alpha_i \ge 0, \ \sum_{i=1}^{n} \alpha_i y_i = 0

The optimization problem is convex, therefore the dual and primal formulations give the same solution
Support vector: a point x_i for which y_i(x_i^{\top} w + b) = 1 holds
Appendix Linearly Non-separable Case 10-1
Linearly Non-separable Case back
Figure 8: Hyperplane and its margin in the linearly non-separable case
Appendix Linearly Non-separable Case 10-2
Slack variables \xi_i represent the violation from strict separation

x_i^{\top} w + b \ge 1 - \xi_i \text{ for } y_i = 1
x_i^{\top} w + b \le -1 + \xi_i \text{ for } y_i = -1
\xi_i \ge 0

The constraints are combined into

y_i(x_i^{\top} w + b) \ge 1 - \xi_i \text{ and } \xi_i \ge 0

If \xi_i > 0, the objective function is

\frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i
Appendix Linearly Non-separable Case 10-3
Lagrange function for the primal problem

L_P(w, b, \xi) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} \alpha_i \{y_i(x_i^{\top} w + b) - 1 + \xi_i\} - \sum_{i=1}^{n} \mu_i \xi_i,

where \alpha_i \ge 0 and \mu_i \ge 0 are Lagrange multipliers

Primal problem

\min_{w, b, \xi} L_P(w, b, \xi)
Appendix Linearly Non-separable Case 10-4
First-order conditions

\frac{\partial L_P}{\partial w_k} = 0: \quad w_k - \sum_{i=1}^{n} \alpha_i y_i x_{ik} = 0
\frac{\partial L_P}{\partial b} = 0: \quad \sum_{i=1}^{n} \alpha_i y_i = 0
\frac{\partial L_P}{\partial \xi_i} = 0: \quad C - \alpha_i - \mu_i = 0

s.t. \alpha_i \ge 0, \ \mu_i \ge 0, \ \mu_i \xi_i = 0
\alpha_i \{y_i(x_i^{\top} w + b) - 1 + \xi_i\} = 0
Appendix Linearly Non-separable Case 10-5
Note that \sum_{i=1}^{n} \alpha_i y_i b = 0. Translate the primal problem into

L_D(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j x_i^{\top} x_j + \sum_{i=1}^{n} \xi_i (C - \alpha_i - \mu_i)

The last term is 0, therefore the dual problem is

\max_{\alpha} L_D(\alpha) = \max_{\alpha} \left\{ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j x_i^{\top} x_j \right\},

\text{s.t. } 0 \le \alpha_i \le C, \quad \sum_{i=1}^{n} \alpha_i y_i = 0
back
Appendix Genetic Algorithm 11-1
What is a Genetic Algorithm ?
A genetic algorithm is a search and optimization technique based on Darwin's principle of natural selection (Holland, 1975)
Appendix Genetic Algorithm 11-2
GA Initialization Back
[Diagram: a population of binary-encoded chromosomes is decoded into real-number solutions, which are plotted against the objective function f_obj with its global maximum and global minimum]

Figure 9: GA at the first generation
Appendix Genetic Algorithm 11-3
GA Convergence

Figure 10: Solutions at the 1st generation (left) and the r-th generation (right)
Appendix Genetic Algorithm 11-4
GA Decoding
Figure 11: Decoding

\theta = \text{lower} + (\text{upper} - \text{lower}) \cdot \frac{\sum_{i=0}^{l-1} a_i 2^i}{2^l}

where \theta is the solution (i.e. parameter C or \sigma) and a_i is an allele
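The decoding rule can be sketched as follows; the helper name and the example bounds are illustrative:

```python
# Sketch of the slide's decoding rule: a binary chromosome a_0..a_{l-1}
# is mapped to a real parameter in [lower, upper].
def decode(bits, lower, upper):
    l = len(bits)
    integer = sum(a * 2 ** i for i, a in enumerate(bits))  # sum of a_i * 2^i
    return lower + (upper - lower) * integer / 2 ** l

print(decode([0, 0, 0, 0], 0.0, 16.0))   # 0.0: all-zero bits give the lower bound
print(decode([1, 1, 1, 1], 0.0, 16.0))   # 15.0 = 0 + 16 * 15/16
```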
Appendix Genetic Algorithm 11-5
GA Fitness evaluation
Calculate f(\theta_i), i = 1, \ldots, popsize
Evaluate the fitness f_{dp}(\theta_i)
f_{dp}(\theta_i): AR, AUC, accuracy, specificity, sensitivity

Relative fitness: p_i = \frac{f_{dp}(\theta_i)}{\sum_{k=1}^{popsize} f_{dp}(\theta_k)}

Figure 12: Proportion to be chosen in the next iteration (generation)
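The relative-fitness computation is a simple normalization; the fitness values below are made-up discriminatory-power scores:

```python
# Relative fitness p_i = fdp_i / sum_k fdp_k, as defined on the slide.
def relative_fitness(fdp):
    total = sum(fdp)
    return [f / total for f in fdp]

# Toy discriminatory-power values for a population of three solutions;
# the resulting fractions are proportional to fitness and sum to 1.
p = relative_fitness([0.9, 0.6, 0.5])
print(p)
```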
Appendix Genetic Algorithm 11-6
GA Roulette wheel
[Diagram: roulette wheel partitioned according to the relative fitness values p_i]

rand \sim U(0, 1)
Select the i-th chromosome if \sum_{k=1}^{i-1} p_k < rand \le \sum_{k=1}^{i} p_k
Repeat popsize times to get popsize new chromosomes
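Roulette-wheel selection picks the first chromosome whose cumulative relative fitness covers the uniform draw. A sketch with illustrative probabilities:

```python
import random

# Roulette-wheel selection: draw rand ~ U(0,1) and pick the first
# chromosome whose cumulative relative fitness reaches it.
def roulette_select(p, rand):
    cum = 0.0
    for i, pi in enumerate(p):
        cum += pi
        if rand <= cum:
            return i
    return len(p) - 1   # guard against floating-point round-off

p = [0.45, 0.30, 0.25]
print(roulette_select(p, 0.10))   # 0: falls in [0, 0.45]
print(roulette_select(p, 0.50))   # 1: falls in (0.45, 0.75]
print(roulette_select(p, 0.90))   # 2: falls in (0.75, 1.0]

# Repeating popsize times yields the new population
random.seed(0)
new_pop = [roulette_select(p, random.random()) for _ in range(3)]
```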
Appendix Genetic Algorithm 11-7
GA Crossover
Figure 13: Crossover in nature

Figure 14: Randomly chosen one-point crossover (top) and two-point crossover (bottom)
Appendix Genetic Algorithm 11-8
GA Reproductive operator

[Diagram: parents produce offspring by one-point crossover; an offspring of crossover becomes a mutant by bit-flip mutation]

Figure 15: One-point crossover (top) and bit-flip mutation (bottom)
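Both reproductive operators in Figure 15 are a few lines each; in a full GA the crossover point and the mutated gene would be drawn at random, here they are fixed for illustration:

```python
# One-point crossover and bit-flip mutation for binary chromosomes,
# matching Figure 15.
def one_point_crossover(parent1, parent2, point):
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2

def bit_flip_mutation(chromosome, gene):
    mutant = list(chromosome)
    mutant[gene] = 1 - mutant[gene]   # flip the selected bit
    return mutant

c1, c2 = one_point_crossover([1, 1, 1, 1], [0, 0, 0, 0], point=2)
print(c1, c2)                              # [1, 1, 0, 0] [0, 0, 1, 1]
print(bit_flip_mutation([1, 0, 1], 1))     # [1, 1, 1]
```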
Appendix Genetic Algorithm 11-9
GA Elitism
The best solution in each iteration is maintained in a separate memory place
When the new population replaces the old one, check whether the best solution is in the population
If not, replace any member of the population with the best solution
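A minimal elitism sketch following these steps (the toy objective and population are illustrative; here the worst member is the one replaced):

```python
# Elitism sketch: keep the best solution found so far and re-insert it
# if the new generation lost it.
def apply_elitism(population, best, fitness):
    if best not in population:
        # replace the worst member with the preserved best solution
        worst = min(range(len(population)), key=lambda i: fitness(population[i]))
        population[worst] = best
    return population

fitness = lambda x: -abs(x - 7)           # toy objective, maximum at 7
pop = [2, 5, 9]
pop = apply_elitism(pop, 7, fitness)
print(pop)                                # [7, 5, 9]: worst member 2 replaced
```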
Appendix Genetic Algorithm 11-10
Nature to Computer Mapping Back
Nature                   GA-SVM
Population               Set of parameters
Individual (phenotype)   Parameters
Fitness                  Discriminatory power
Chromosome (genotype)    Encoding of parameters
Gene                     Binary encoding
Reproduction             Crossover
Generation               Iteration

Table 6: Nature to GA-SVM mapping