


Expert Systems with Applications 38 (2011) 12939–12945

Contents lists available at ScienceDirect

Expert Systems with Applications

journal homepage: www.elsevier.com/locate/eswa

A genetic algorithm-based approach to cost-sensitive bankruptcy prediction

Ning Chen a,*, Bernardete Ribeiro b, Armando S. Vieira a, João Duarte a, João C. Neves c

a GECAD, Instituto Superior de Engenharia do Porto, Instituto Politecnico do Porto, Rua Dr. Antonio Bernardino de Almeida, 431, 4200-072 Porto, Portugal
b CISUC, Department of Informatics Engineering, University of Coimbra, Rua Silvio Lima-Polo II, Coimbra 3030-790, Portugal
c ISEG-School of Economics, Technical University of Lisbon, Rua do Quelhas 6, Lisbon 1200-781, Portugal

Keywords: Neural network; Learning vector quantization; Classification; Cost-sensitive learning; Feature selection; Genetic algorithm

0957-4174/$ - see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2011.04.090

* Corresponding author. Tel.: +351 228340500; fax: +351 228321159.
E-mail addresses: [email protected] (N. Chen), [email protected] (B. Ribeiro), [email protected] (A.S. Vieira), [email protected] (J. Duarte), [email protected] (J.C. Neves).

Abstract

The prediction of bankruptcy is of significant importance given the present-day increase in bankrupt companies. In practical applications, the cost of misclassification is worth considering in the modeling in order to make accurate and desirable decisions. An effective prediction system requires the integration of the cost preference into the construction and optimization of prediction models. This paper presents an evolutionary approach for simultaneously optimizing the complexity and the weights of a learning vector quantization network under the asymmetric cost preference. Experimental evidence on a real-world data set demonstrates that the proposed algorithm leads to a significant reduction of features without degrading prediction capability.

© 2011 Elsevier Ltd. All rights reserved.

1. Introduction

Bankruptcy prediction is a widely studied research topic in financial analysis due to the increasing number of bankrupt enterprises and the deepening financial crisis. For example, in Portugal the number of enterprise bankruptcies increased by 49% during 2009 compared to the previous year. Prediction systems that can correctly identify the risk of failure are important for bank decisions and early warning.

In the field of bankruptcy prediction, two errors are of most concern, namely the type I error and the type II error. The former stands for classifying a bankrupt company as a solvent one, which results in the cost of losing principal and interest, while the latter stands for classifying a solvent company as a bankrupt one, which results in the cost of losing profit. As is well known, the two costs are usually asymmetric and should be considered in practical applications to make a tradeoff between the two errors. Various studies focus on modifying conventional classification algorithms in order to incorporate the cost preference into the classification. Some researchers argue that, by specifying an appropriate cost-relevant objective function, classification can be regarded as an optimization problem and solved by evolutionary algorithms. The genetic algorithm (GA) gained rapid popularity and proved to be effective in optimizing the linear discriminant analysis model, support vector machines, back-propagation neural networks, etc. However, few studies illustrate the integration of GA and learning vector quantization (LVQ) for the purpose of cost-sensitive bankruptcy prediction.

It has been shown that LVQ with genetically evolved connected weights outperforms a modified LVQ that integrates the cost information into the learning methodology (Chen, Ribeiro, Vieira, Duarte, & Neves, 2010). The idea is to enhance LVQ training through the global search of the genetic algorithm with respect to an appropriate fitness function. Since only the connected weights are optimized, without taking feature selection and parameter determination into consideration, the optimal solution may be missed. As much evidence points out, feature selection plays an important role in classification in terms of improving predictive accuracy and decreasing model complexity. Additionally, the resulting predictive model is somewhat dependent on the parameters employed. GA provides the facility to simultaneously optimize the factors that potentially impact performance, so it does not require prior knowledge about the important features or the number of units needed to represent the classes. In this paper, we present a genetic algorithm-based approach to integrate connected weight optimization, parameter determination and feature selection in one evolutionary procedure. The cost preference is directly incorporated into the fitness function of the genetic algorithm for performance evaluation. The performance of the proposed algorithm is investigated on real-life data. The obtained results demonstrate that the reduction of features contributes to the improvement of prediction capability.

The rest of this paper is organized as follows. Section 2 reviews the previous work on bankruptcy prediction, cost-sensitive learning



and feature selection. Section 3 presents the framework of the GAFS-LVQ algorithm. Section 4 describes the experimental design and results. The last section summarizes the paper with contributions and future research issues.

2. Related work

2.1. Bankruptcy prediction

Bankruptcy prediction has a profound impact on bank decisions and profitability. The main concern is to construct a prediction model representing the relationship between bankruptcy and financial ratios, and then deploy the model to identify a high risk of failure in the future. In the literature, bankruptcy prediction has been addressed by statistical methods and machine learning methods. Statistical methods comprise discriminant analysis, the logistic model, factor analysis, etc. Machine learning methods include neural networks, decision trees, support vector machines, case-based reasoning, fuzzy logic, rough sets, and hybrid and ensemble approaches (Ravi & Ravi, 2007). Among them, the neural network is one of the most widely applied tools and its capability has been proved by a large variety of work. Vector quantization (VQ) forms a quantized approximation of input vectors through a finite number of prototypes (connected weights). LVQ is a supervised variant of VQ and is useful for complicated non-linear separation problems (Kohonen, 2001). The network is composed of two levels, in which the input level is fully connected with the output level. The modeling technique is based on neurons representing prototype vectors and the nearest neighbor classification rule. The goal of learning is to determine the weights that best represent the classes. LVQ has been employed to detect distressed companies with satisfactory performance (Boyacioglu, Kara, & Baykan, 2009; Chen & Vieira, 2009; Neves & Vieira, 2006). In this paper, we use the LVQ model for cost-sensitive bankruptcy prediction.

2.2. Cost-sensitive learning

Cost-sensitive learning addresses the challenging classification problem in which different costs are associated with different errors. Compared with most existing classification methods, which aim to minimize the total number of errors, cost-sensitive learning assigns costs to the errors and intends to minimize the total cost of errors. Generally, cost-sensitive learning is performed at the data level or the algorithm level. The first approach acts as a preprocessing (or postprocessing) phase of general-purpose error-based classifiers, such as stratification, which changes the frequency of classes in the training data (Chan & Stolfo, 1998), MetaCost, which re-labels the training samples with their estimated minimal-cost classes (Domingos, 1999), and threshold-moving, which moves the output towards the expensive class (Pendharkar, 2008). The second approach explicitly incorporates the cost information into the learning methodology. Existing algorithms include decision trees, regularized least squares, boosting, back-propagation neural networks, and mathematical programming (Koh, 1992; Ling, Yang, Wang, & Zhang, 2004; Pendharkar & Nanda, 2006; Sun, Kamela, Wong, & Wang, 2007; Vo & Won, 2007). These methods are implemented by adapting the learning methodology to an asymmetric cost preference. Evolutionary algorithms are a promising approach to cost-sensitive learning, which can be represented as an optimization problem. The results reported in Nanda and Pendharkar (2001) show that a genetic-based approach that incorporates the asymmetric cost preference into the linear discriminant analysis model leads to desirable results.
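The expected-cost decision rule underlying data-level methods such as threshold-moving and MetaCost can be sketched as follows (an illustrative sketch, not the paper's implementation; the function and variable names are ours):

```python
# Illustrative sketch: pick the label whose expected misclassification
# cost is lower, given a classifier's probability estimate.

def min_cost_label(p_default, cost_default, cost_nondefault):
    """p_default: estimated probability of the 'default' class.
    cost_default: cost of misclassifying a 'default' company (type I).
    cost_nondefault: cost of misclassifying a 'non-default' company (type II)."""
    # Expected cost of predicting 'non-default' is p_default * cost_default;
    # expected cost of predicting 'default' is (1 - p_default) * cost_nondefault.
    if p_default * cost_default >= (1.0 - p_default) * cost_nondefault:
        return "default"
    return "non-default"

# With symmetric costs the threshold is 0.5; making type I errors 10x
# costlier moves the threshold so far that 30% suffices for 'default'.
print(min_cost_label(0.3, 1, 1))   # -> non-default
print(min_cost_label(0.3, 10, 1))  # -> default
```

This is the same tradeoff that Section 3.2 later builds into the fitness function at the algorithm level.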

Some efforts have been undertaken to make LVQ cost-sensitive at both the data level and the algorithm level. In Chen, Vieira, and Duarte (2009), the cost matrix is integrated with the basic LVQ algorithm using standard sampling and threshold-moving techniques. In Chen, Vieira, Duarte, Ribeiro, and Neves (2009), a cost-LVQ algorithm is presented based on a modification of a batch LVQ algorithm. The cost information is incorporated into the model when performing the update of map neurons, so that instances of the expensive class are harder to misclassify. A hybrid algorithm (Chen et al., 2010) employs a genetic algorithm to optimize the connected weights of a LVQ model. The prototypes are coded as the input to genetic evolution and optimized through the genetic operators. The superiority of this approach is demonstrated compared to the local search strategy.

2.3. Feature selection

In the field of bankruptcy prediction, a large number of indicators are usually involved, so that the training data is insufficient to cover the decision space, which is known as the curse of dimensionality. Feature selection addresses the problem by removing irrelevant, redundant and correlated features, improving the accuracy and compactness of the classification model, decreasing the computational effort, and facilitating the use of models.

Feature selection is basically an optimization problem which searches through the space of feature subsets to identify the relevant features. Previous studies can be divided into two categories, namely the filter approach and the wrapper approach. The filter approach selects features based on desirable properties before model construction. Despite its computational efficiency, the filter approach ignores the induction algorithm and is prone to unexpected failures (Fogel, 2000). The wrapper approach embeds feature selection into model learning and searches for an optimal solution for the particular classifier employed. Several search strategies are deployed, including greedy backward and forward techniques and evolutionary techniques. Evolutionary techniques produce superior and more reliable results than greedy techniques, which do not consider the correlation among features. An embedded feature selection paradigm is implemented in the weight training procedure for neural networks (Castellani & Marques, 2008). Genetic algorithms are also used to improve the performance of support vector machines (SVM) in both feature subset selection and parameter optimization (Min, Lee, & Han, 2006). Besides, swarm intelligence, represented by ant colony optimization and particle swarm optimization, has been employed for solving the feature selection problem and enhancing the prediction capability of machine learning models (Lin, Ying, Chen, & Lee, 2008; Marinakis, Marinaki, Doumpos, & Zopounidis, 2009). In this paper, we embed wrapper feature selection and LVQ learning into a genetic algorithm framework.

3. Simultaneous optimization of feature selection and parameter determination using genetic algorithm

The complexity of prediction models is a critical problem relevant to underfitting or overfitting. Regarding LVQ, the complexity comprises the number of weights (number of units) and the size of weights (subset of features). A feasible way is to integrate complexity optimization with weight optimization in the framework of GA. GA is an evolutionary technique to find the optimal or near-optimal solution to optimization problems. A population of solutions is generated randomly and evolved towards the optimum under the direction of a fitness function. GA has been extensively applied to various combinatorial optimization problems in conjunction with machine learning methods. In an evolutionary LVQ, GA is used to discover the right number of prototypes needed to represent the classes (Cordella, Stefano, Fontanella, & Marcelli, 2006). In this section, we present a GA-based approach

Page 3: A genetic algorithm-based approach to cost-sensitive bankruptcy prediction

Table 1
Confusion matrix.

Real class  | Predicted class
            | Default | Non-default | Total
Default     | Bb      | Bg          | B
Non-default | Gb      | Gg          | G
Total       | b       | g           | T


to optimize a LVQ model for cost-sensitive prediction. The global nature of GA is exploited to search through the solution space for the optimal complexity and weights of a LVQ model.

3.1. Solution representation

The subset of features and the number of units have a significant influence on the optimal connected weights; therefore it is reasonable to optimize these relevant factors simultaneously. For ease of implementation, they are encoded as real values in the chromosome as shown in Fig. 1. The feature indicators, taking values in [0,1], stand for the presence of the corresponding feature in model training: a value smaller than 0.5 denotes that the feature is unselected, otherwise the feature is selected. The number of units is a positive value restricted to a specified range (set to [20,100] in our experiment). The map size is determined so that the ratio between the side lengths of the map grid is the square root of the ratio of the two biggest eigenvalues of the training data, and their product is as close to the desired number of map units as possible (SOM Toolbox). Since the data is normalized to the unit range in preprocessing, the connected weights are restricted between 0 and 1. In order to accommodate the maximum number of units and features, the size of the weights is set to n × d, where n is the upper bound on the number of units and d is the total number of features. For a given specification of the parameters, only a portion of the weights is used.
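The chromosome layout described above can be sketched as a small decoding routine (our illustrative code; `decode`, its bounds handling, and the toy values are assumptions, not the authors' implementation):

```python
# Sketch of decoding a real-valued chromosome laid out as in Fig. 1:
# d feature indicators, one unit count, then n_max * d connected weights.
# The [20, 100] unit range follows the paper; everything else is ours.

def decode(chromosome, d, n_max, n_min=20):
    # an indicator >= 0.5 means the feature is selected
    features = [i for i in range(d) if chromosome[i] >= 0.5]
    # the unit-count gene is rounded and clipped to the allowed range
    n_units = max(n_min, min(n_max, int(round(chromosome[d]))))
    flat = chromosome[d + 1:]
    # weights are stored row-major for n_max units; only the first
    # n_units rows and the selected feature columns are actually used
    weights = [[flat[u * d + j] for j in features] for u in range(n_units)]
    return features, n_units, weights

d, n_max = 4, 30
chrom = [0.9, 0.2, 0.7, 0.1, 21.6] + [0.5] * (n_max * d)
feats, n, w = decode(chrom, d, n_max)
print(feats, n)  # -> [0, 2] 22
```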

3.2. Fitness function

The bankruptcy prediction problem is to separate companies into two classes: default and non-default. Given a LVQ model, prediction is performed by calculating the distance between the instance and each prototype and assigning it the label of the nearest one. The binary classification produces a two-dimensional confusion matrix containing the distribution of instances over the real class and the predicted class. In the confusion matrix shown in Table 1, Bg denotes the number of misclassifications of a 'default' company as 'non-default', and Gb denotes the number of misclassifications of a 'non-default' company as 'default'.
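The nearest-prototype prediction and the Bg/Gb counts of Table 1 can be sketched as follows (toy prototypes, toy samples, and helper names are ours):

```python
# Minimal nearest-prototype (LVQ-style) prediction and the error counts
# of Table 1: Bg = defaults predicted non-default, Gb = the converse.

def predict(x, prototypes, labels):
    # assign the label of the nearest prototype (squared Euclidean distance)
    dists = [sum((xi - pi) ** 2 for xi, pi in zip(x, p)) for p in prototypes]
    return labels[dists.index(min(dists))]

prototypes = [[0.2, 0.8], [0.9, 0.1]]
labels = ["default", "non-default"]

samples = [([0.1, 0.9], "default"), ([0.8, 0.2], "non-default"),
           ([0.3, 0.6], "default"), ([0.7, 0.4], "default")]

Bg = sum(1 for x, y in samples
         if y == "default" and predict(x, prototypes, labels) == "non-default")
Gb = sum(1 for x, y in samples
         if y == "non-default" and predict(x, prototypes, labels) == "default")
print(Bg, Gb)  # -> 1 0  (only the last sample falls on the wrong side)
```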

Normally the performance of an individual is evaluated by the prediction accuracy, i.e., the total number of instances correctly classified. Since the cost preference impacts the assessment of decisions made based on the learned model, this simple assessment is replaced by cost-relevant measures. In the case of an asymmetric cost preference, the fitness function should be modified to incorporate the cost information through a tradeoff between the two kinds of errors. Assume that Cb denotes the misclassification cost of a 'default' company and Cg denotes the misclassification cost of a 'non-default' company. Some metrics have been presented. In Nanda and Pendharkar (2001), an objective function is defined as

\[
\frac{C_b}{C_b + C_g}\,B_g + \frac{C_g}{C_b + C_g}\,G_b
\]

The expected misclassification cost (EMC) is defined in West (2000) and Lee and Chen (2005), in which Pb is the prior probability of class 'default' and Pg is the prior probability of class 'non-default':

\[
\mathrm{EMC} = C_b \, P_b \, \frac{B_g}{B} + C_g \, P_g \, \frac{G_b}{G}
\]

[Fig. 1. Representation of individuals: the chromosome consists of the feature indicators a1, a2, ..., ad, the number of units n, and the connected weights w11, w12, ..., wnd.]

As can be proved, the two metrics are equivalent.

Substituting the empirical priors \(P_b = B/T\) and \(P_g = G/T\):

\[
\begin{aligned}
\mathrm{EMC} &= C_b \, P_b \, \frac{B_g}{B} + C_g \, P_g \, \frac{G_b}{G}
             = C_b \, \frac{B}{T} \cdot \frac{B_g}{B} + C_g \, \frac{G}{T} \cdot \frac{G_b}{G} \\
             &= C_b \, \frac{B_g}{T} + C_g \, \frac{G_b}{T}
             = \frac{C_b + C_g}{T} \left( \frac{C_b}{C_b + C_g}\,B_g + \frac{C_g}{C_b + C_g}\,G_b \right)
\end{aligned}
\]

In the case of a symmetric cost preference, minimizing the EMC value is the same as minimizing the total number of misclassifications (Bg + Gb).
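The equivalence of the two metrics can be checked numerically (made-up counts; the variables follow the paper's notation):

```python
# Numerical check that the EMC equals the scaled Nanda-Pendharkar
# objective: EMC = (Cb+Cg)/T * [Cb/(Cb+Cg)*Bg + Cg/(Cb+Cg)*Gb].

B, G = 100, 100        # defaults and non-defaults in the sample (T = B + G)
Bg, Gb = 12, 7         # type I and type II error counts
Cb, Cg = 10.0, 1.0     # misclassification costs
T = B + G
Pb, Pg = B / T, G / T  # class priors estimated from the data

emc = Cb * Pb * (Bg / B) + Cg * Pg * (Gb / G)
objective = Cb / (Cb + Cg) * Bg + Cg / (Cb + Cg) * Gb
print(emc, (Cb + Cg) / T * objective)  # both equal 0.635
```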

3.3. Stop condition

The evolutionary procedure starts from an initial population and improves the solutions iteratively until the termination condition is satisfied. The optimal solution to the problem is the one with the best fitness over all generations. Generally, the termination condition is set as a predefined number of iterations. However, overfitting is a critical problem for classification models, especially when the model is excessively complex in relation to the amount of data available. Such a model usually performs more accurately in fitting known data (training data) but less accurately in predicting new data. In this study, we use an independent validation data set to prevent overfitting, namely for defining an early stop of the training process. Every 5 iterations, the best-so-far model is evaluated on the validation data. If the predefined maximal number of iterations is reached or the generalization performance degrades, the evolution terminates.

3.4. Framework of GAFS-LVQ algorithm

The flowchart of the algorithm is outlined in Fig. 2. The algorithm starts from an initial population of individuals composed of the feature indicators, number of units, and connected weights. For each individual, the training and validation data sets are constructed to comprise the selected features, and a LVQ model is configured with the given number of units and the corresponding weights. Afterwards, the units are labeled by the majority voting scheme: the samples in the training data are projected to their best-matching unit (BMU) and the majority label in each Voronoi set is selected. The predictive performance of the resulting model is evaluated on the training data in terms of the EMC value. The individuals with higher fitness values are selected to generate the next population through crossover and mutation operators. Meanwhile, the generalization performance is measured on the validation data set to avoid overfitting. The evolution terminates when the stopping condition is satisfied, which can be the maximum number of iterations or a degradation of the generalization performance. Finally, the optimal solution is used to predict



[Fig. 2. Framework of GAFS-LVQ approach: an initial population (features, no. of map units, connected weights) is coded into solutions; each solution is map-labelled and fitness-evaluated on the training data; genetic operators (selection, crossover, mutation) produce the next generation until the termination condition holds; the best model is selected via fitness evaluation on the validation data and finally evaluated on the test data.]

Table 2
Financial ratios and selected features. The √ marks indicate selection under the 10FS, 15FS and GAFS columns; where fewer than three marks appear, their assignment to individual columns could not be recovered from the extraction.

Var. | Description                             | 10FS/15FS/GAFS
x1   | Number of employees last available year | √
x2   | Capital employed/fixed assets           |
x3   | Financial debt/capital employed         |
x4   | Depreciation of tangible assets         | √


the bankruptcy on the test data through the nearest neighbor approach, which projects the instance to its BMU. The pseudocode of the proposed algorithm is presented in Fig. 3.

Fig. 3. The GAFS-LVQ algorithm.
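The evolutionary loop of Figs. 2 and 3 can be sketched roughly as follows (a heavily simplified, generic GA in which a toy quadratic fitness stands in for the EMC of a decoded LVQ model; all function names and parameter values are our assumptions, not the authors' settings):

```python
# Simplified sketch of the GAFS-LVQ loop: evolve real-valued chromosomes,
# score by a fitness to be minimized (EMC in the paper), and stop early
# when the best-so-far model degrades on held-out validation data.

import random

def evolve(fitness, val_fitness, dim, pop_size=20, max_iter=100,
           check_every=5, seed=0):
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    best, best_val = None, float("inf")
    for it in range(max_iter):
        scored = sorted(pop, key=fitness)            # lower is better
        if best is None or fitness(scored[0]) < fitness(best):
            best = scored[0][:]
        # early stopping: every few generations, check the best-so-far
        # solution on validation data and stop if it got worse
        if it % check_every == 0:
            v = val_fitness(best)
            if v > best_val:
                break
            best_val = v
        parents = scored[: pop_size // 2]            # selection (elitist)
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, dim)              # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.1:                   # mutation
                child[rng.randrange(dim)] = rng.random()
            children.append(child)
        pop = parents + children
    return best

# toy stand-in for the EMC fitness: optimum at the all-0.5 chromosome
f = lambda c: sum((x - 0.5) ** 2 for x in c)
best = evolve(f, f, dim=8)
print(round(f(best), 4))
```

In the real algorithm, `fitness` would decode the chromosome into feature indicators, a unit count and prototype weights (Section 3.1), train-label the map, and return the EMC on the training data.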

4. Research design and experimental results

4.1. Data description

The experimental data used in this study is a real-life bankruptcy data set called the Diane data. It contains financial statements of privately owned French companies of small or medium size. As shown in Table 2, each instance is characterized by 30 financial ratios in 2006 and the fate (default or non-default) in the following year. The objective is to predict the failure (declared bankruptcy or a restructuring plan submitted to the French court) of companies in year 2007 according to the financial statements one year before. We use the following strategies to preprocess the data for our experiment:

1. A set of 600 default companies is selected, each with at most 10 missing values;

2. A set of 600 non-default companies is sampled randomly to obtain a balanced data set;

3. The missing values are replaced by the value of the closest available year;

4. The ratios are preprocessed by a logarithm operation to decrease the scatter of the data distribution;

5. A linear normalization is applied so that the data is transformed to the range between 0 and 1.
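Steps 4 and 5 above can be sketched as follows (our implementation; the signed log transform is an assumption, since the paper does not specify how zero or negative ratios are handled):

```python
# Sketch of the preprocessing: a signed log transform to reduce the
# scatter of the ratios, then linear (min-max) normalization to [0, 1].

import math

def log_scale(x):
    # symmetric log that is defined for zero and negative ratios
    return math.copysign(math.log1p(abs(x)), x)

def min_max(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]  # assumes hi > lo

ratios = [-50.0, 0.0, 3.0, 800.0]      # made-up financial ratios
scaled = min_max([log_scale(v) for v in ratios])
print([round(v, 3) for v in scaled])   # monotone, within [0, 1]
```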

4.2. Experimental results

To test the efficacy of the proposed algorithm, a 5-fold cross-validation procedure is used. The whole data set is separated randomly into 5 disjoint folds, each containing approximately N/5 instances, where N is the total number of companies. In each trial, 4 folds are used for model training and the remaining one is used for testing. The first part is further divided into training data (60% out of

Table 2 (continued)

Var. | Description                      | 10FS/15FS/GAFS
x5   | Working capital/current assets   |
x6   | Current ratio                    | √√
x7   | Liquidity ratio                  | √
x8   | Stock turnover days              |
x9   | Collection period days           | √
x10  | Credit period days               | √
x11  | Turnover per employee            | √
x12  | Interest/turnover                | √
x13  | Debt period days                 |
x14  | Financial debt/equity            | √√√
x15  | Financial debt/cashflow          | √√√
x16  | Cashflow/turnover                | √√√
x17  | Working capital/turnover days    | √√
x18  | Net current assets/turnover days |
x19  | Working capital needs/turnover   |
x20  | Export                           |
x21  | Added value per employee in EUR  | √√
x22  | Total assets turnover            |
x23  | Operating profit margin          | √√√
x24  | Net profit margin                | √√
x25  | Added value margin               |
x26  | Part of employees                | √√
x27  | Return on capital employed       | √√√
x28  | Return on total assets           | √√√
x29  | EBIT margin                      | √√√
x30  | EBITDA margin                    | √√√
x31  | Class (default, non-default)     |



the whole data) and validation data (20% out of the whole data) randomly. Afterwards, the LVQ model is learned using the training data and evaluated on the validation data to avoid overfitting. Then the test data set is input to the resulting optimal model and the class is predicted. After the experiment is repeated 5 times, the performance is evaluated by the average results.
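The splitting protocol above can be sketched as follows (our code; the index-based splits and the shuffling seed are assumptions):

```python
# Sketch of the evaluation protocol: 5-fold CV where, for each trial,
# the 4 training folds are further split so that 60% of the whole data
# is used for training and 20% for validation, leaving a 20% test fold.

import random

def five_fold_indices(n, seed=0):
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::5] for i in range(5)]
    for k in range(5):
        test = folds[k]
        rest = [i for j in range(5) if j != k for i in folds[j]]
        cut = (3 * len(rest)) // 4   # 3/4 of the rest = 60% of the whole
        yield rest[:cut], rest[cut:], test

n = 1200                              # 600 default + 600 non-default
for train, val, test in five_fold_indices(n):
    assert len(train) == 720 and len(val) == 240 and len(test) == 240
    assert not set(train) & set(test) and not set(val) & set(test)
print("splits ok")
```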

Experiments are conducted to find the effect of the number of map units on the performance of the proposed algorithm, varying the number of units from 20 to 100. Since the type I error has a stronger impact than the type II error, the cost preference of the 'default' class is set alternately to 1, 2, 5 and 10, while the cost preference of the 'non-default' class is fixed at 1. Table 3 presents the average number of features found over the 5 constructed training data sets and the performance on training and test data (the errors are in percent). The optimal number of map units for the best prediction changes with the cost preference, suggesting the inclusion of parameter determination in the evolutionary procedure.

We compare the proposed GAFS-LVQ algorithm with the variants using all features (30FS GA-LVQ), 15 features selected by information gain (15FS GA-LVQ) and 10 features selected by chi-squared statistics (10FS GA-LVQ), respectively. The selected features are shown in Table 2 (for GAFS-LVQ, the features are sorted in descending order of the frequency with which they are selected across all cases, and the top 16 features are given). The prediction results are given in Table 4. When the cost ratio increases from 1 to 10, the type I error decreases while the type II error increases, indicating the bias of prediction towards the expensive class. As can be observed, the GAFS-LVQ approach uses almost half of the available features for model construction, while producing better results than using all the features. The superiority appears more significant compared to the approaches using pre-selected features (both 15FS GA-LVQ and 10FS GA-LVQ). In particular, GAFS-LVQ produces better results on both the type I error and the type II error than the approaches without feature selection or using filtered features. Therefore, the genetic

Table 3
Prediction results (values in bold mean the best performance at a fixed cost preference). Training-data error columns; the training EMC and test-data columns appear in the continuation below.

Units | Cost ratio | Ave. fea. | Err1 | Err2 | Err
20    | 1  | 15 | 11.0 | 5.8  | 8.4
20    | 2  | 13 | 5.8  | 14.9 | 10.3
20    | 5  | 15 | 4.0  | 26.4 | 15.2
20    | 10 | 13 | 1.5  | 49.3 | 25.3
40    | 1  | 13 | 12.4 | 5.1  | 8.7
40    | 2  | 13 | 8.4  | 9.9  | 9.2
40    | 5  | 15 | 3.6  | 25.1 | 14.3
40    | 10 | 14 | 1.9  | 40.8 | 21.3
60    | 1  | 14 | 10.8 | 5.2  | 8.1
60    | 2  | 11 | 7.9  | 10.8 | 9.3
60    | 5  | 13 | 4.3  | 23.2 | 13.7
60    | 10 | 15 | 1.9  | 34.8 | 18.2
80    | 1  | 13 | 12.6 | 4.5  | 8.6
80    | 2  | 15 | 8.56 | 10.4 | 9.4
80    | 5  | 17 | 3.9  | 21.7 | 12.7
80    | 10 | 15 | 1.6  | 49.9 | 25.6
100   | 1  | 11 | 11.1 | 5.4  | 8.2
100   | 2  | 10 | 6.7  | 10.7 | 8.7
100   | 5  | 15 | 3.5  | 22.8 | 13.1
100   | 10 | 16 | 2.7  | 35.0 | 18.8

algorithm paradigm significantly reduces the dimensionality of the features and simultaneously leads to better performance.

Finally, we compare the proposed algorithm with MetaCost, a general cost-sensitive method, in which J4.8, a one-hidden-layer MLP, SVM, and KNN (k = 5) are used as the base classifier, respectively. In Table 5, the results of MetaCost with and without feature selection are presented. As mentioned above, in the former case 15 features are selected by the information gain criterion. As can be seen, GAFS-LVQ always outperforms MetaCost with reduced features in minimizing the EMC. Compared to MetaCost without feature selection, GAFS-LVQ achieves a lower EMC in most cases, especially when the cost asymmetry becomes critical. The obtained results support the conclusion that GAFS-LVQ is effective in cost reduction for bankruptcy prediction.

5. Conclusion and future work

Due to the presence of asymmetric cost preferences in bankruptcy prediction, classifiers sensitive to the cost are of great interest to decision makers. In the proposed approach, a genetic algorithm is used to evolve the complexity and weights of a LVQ model for optimal or near-optimal bankruptcy prediction. The asymmetric cost is incorporated directly into the fitness function and is responsible for the evolution of solutions. The approach has been tested on real-life bankruptcy data and compared with previous methods. The obtained results demonstrate the effectiveness of the proposed algorithm in reducing the features without affecting the prediction performance.

Several issues will be addressed in further study. The result found by the proposed algorithm is not always the best compared with the fixed-unit approach, which might be caused by early termination and the empirical specification of parameters. The search efficiency can be enhanced by adaptive genetic operators and variable-length encoding. Despite the use of the early stopping criterion,

(Table continued: training EMC and test-data results, rows in the same order as above.)

Number of units  Cost ratio  EMC (train)  Test data
                                          Err1   Err2   Err    EMC
20                1          0.084        14.0    9.7   11.8   0.118
                  2          0.132         9.0   17.9   13.4   0.179
                  5          0.233         8.0   27.6   17.7   0.333
                 10          0.320         3.0   51.9   27.4   0.409
40                1          0.087        12.3    9.8   11.0   0.110
                  2          0.134        13.9   15.0   14.4   0.213
                  5          0.215         6.5   27.2   16.8   0.298
                 10          0.298         3.9   41.5   22.5   0.397
60                1          0.081        16.7   10.7   13.7   0.137
                  2          0.132        11.4   12.8   12.2   0.179
                  5          0.222         7.8   23.7   15.8   0.314
                 10          0.267         4.9   36.6   20.8   0.425
80                1          0.086        17.3    7.8   12.5   0.125
                  2          0.137        14.2   14.8   14.6   0.218
                  5          0.205         7.66  24.6   16.2   0.315
                 10          0.329         3.56  57.6   30.5   0.463
100               1          0.082        17.2   12.0   14.6   0.146
                  2          0.121        12.6   15.5   14.1   0.204
                  5          0.201         5.0   25.2   15.1   0.25
                 10          0.310         3.8   38.1   20.8   0.381


Table 4
Prediction results (values in bold indicate statistical significance at the 5% level compared to GAFS-LVQ).

Method       Cost ratio  Ave. fea.  Ave. units  Training data                Test data
                                                Err1   Err2   Err    EMC     Err1   Err2   Err    EMC
GAFS-LVQ      1          11         72          11.7    5.7    8.7   0.087   17.1    9.7   13.3   0.133
              2          14         60           6.0   10.6    8.3   0.113   10.2   13.3   11.7   0.168
              5          14         60           3.3   26.5   14.8   0.214    5.5   27.7   16.6   0.276
             10          15         52           1.6   38.2   19.8   0.268    3.3   40.9   22.0   0.37
30FS GA-LVQ   1          30         69          12.5    5.7    9.1   0.091   17.9   10.4   14.3   0.143
              2          30         67           8.4   12.6   10.5   0.147   11.9   15.1   13.4   0.193
              5          30         55           3.1   28.0   15.5   0.217    6.9   31.0   18.8   0.325
             10          30         44           1.9   46.5   24.1   0.326    3.7   47.6   25.6   0.421
15FS GA-LVQ   1          15         81          14.6    5.8   10.2   0.102   20.6    9.8   15.3   0.153
              2          15         71          11.8   12.0   11.9   0.178   17.4   16.0   16.7   0.253
              5          15         50           5.8   30.6   18.2   0.29     8.7   35.0   21.8   0.392
             10          15         50           2.8   46.3   24.5   0.37     4.9   51.7   28.2   0.499
10FS GA-LVQ   1          10         77          17.4    4.0   10.7   0.107   21.7    8.0   14.9   0.149
              2          10         75          12.4   11.3   11.9   0.181   18.0   14.5   16.3   0.252
              5          10         77           6.8   32.7   19.8   0.334   10.1   33.2   21.5   0.415
             10          10         27           3.2   58.0   30.4   0.449    3.8   58.6   31.3   0.486

Table 5
Comparison results.

Method  Cost ratio  30FS                          15FS
                    Err1   Err2   Err    EMC      Err1   Err2   Err    EMC
J4.8     1          12.7    9.7   11.2   0.112    20.2   10.2   15.2   0.152
         2          11.2   13.2   12.2   0.178    15.0   16.8   15.9   0.234
         5           9.2   16.8   13.0   0.313    12.2   21.5   16.8   0.412
        10           6.5   28.5   17.5   0.468    10.7   32.2   21.4   0.694
MLP      1          12.2    6.7    9.4   0.094    23.5    6.0   14.8   0.148
         2          11.3    6.7    9.0   0.147    20.5    9.7   15.1   0.253
         5           9.2   12.7   10.9   0.293    13.2   21.3   17.3   0.436
        10           7.8   15.0   11.4   0.467     0     100    50     0.500
SVM      1          15.7    4.3   10.0   0.100    26.3    4.0   15.2   0.152
         2          15.8    4.8   10.3   0.183    25.7    4.2   14.9   0.278
         5          14.0    6.2   10.1   0.381    23.8    6.0   14.9   0.629
        10          12.5    7.5   10.0   0.663    21.5    8.5   15.0   1.118
KNN      1          21.5    4.7   13.1   0.131    22.5    6.7   14.6   0.146
         2          15.8    9.5   12.7   0.206    16.0   16.7   16.3   0.243
         5          10.7    2.2   15.4   0.368     9.5   33.8   21.7   0.407
        10           7.5    3.2   19.5   0.533     6.5   45.5   26.0   0.553

12944 N. Chen et al. / Expert Systems with Applications 38 (2011) 12939–12945

overfitting still seems to be a critical problem. Ensemble methods can make the base classifier more robust to overfitting (Kima & Kang, 2010). The combination of the devised approach with bagging or boosting might improve the generalization performance. Since LVQ employs a nearest-neighbor approach for classifying data, the feature indicators can be replaced by weights and used in the distance calculation, as done in Cho, Hong, and Ha (2010). Additionally, for comparison purposes we will consider the use of different machine learning methods (such as SVM and case-based reasoning) in conjunction with other evolutionary algorithms (such as particle swarm optimization and ant colony optimization).
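The weighted-distance extension mentioned above could take the following shape: continuous per-feature weights (a hypothetical lambda vector, with 0/1 weights recovering plain feature selection) enter the squared Euclidean distance used to find the nearest LVQ prototype. This is an illustrative sketch, not part of the published method.

```python
import numpy as np

def weighted_nearest_prototype(x, prototypes, labels, feature_weights):
    """Classify x by its nearest LVQ prototype under the weighted distance
    d(x, w) = sum_j lambda_j * (x_j - w_j)**2.
    A binary 0/1 weight vector recovers plain feature selection."""
    diffs = prototypes - x                        # (n_prototypes, n_features)
    dists = (feature_weights * diffs ** 2).sum(axis=1)
    return labels[int(np.argmin(dists))]
```

Down-weighting an irrelevant feature changes which prototype is nearest, which is exactly the mechanism the cited work exploits.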

Acknowledgement

This work was supported by Project C2007-FCT/442/2006-GECAD/ISEP (Knowledge Based, Cognitive and Learning Systems) from Fundação para a Ciência e a Tecnologia.

References

Boyacioglu, M. A., Kara, Y., & Baykan, O. K. (2009). Predicting bank financial failures using neural networks, support vector machines and multivariate statistical methods: A comparative analysis in the sample of savings deposit insurance fund (SDIF) transferred banks in Turkey. Expert Systems with Applications, 36, 3355–3366.

Castellani, M., & Marques, N. C. (2008). FeaSANNT – An embedded evolutionary feature selection approach for neural network classifiers. VIMation Journal, Knowledge, Service & Production: IT as an Enabler, 1, 46–53.

Chan, P., & Stolfo, S. (1998). Toward scalable learning with non-uniform class and cost distributions. In Proceedings of the 4th international conference on knowledge discovery and data mining, New York, NY (pp. 164–168).

Chen, N., & Vieira, A. (2009). Bankruptcy prediction based on independent component analysis. In Proceedings of the 1st international conference on agents and artificial intelligence (ICAART09), Porto, Portugal (pp. 150–155).

Chen, N., Vieira, A., & Duarte, J. (2009). Cost-sensitive LVQ for bankruptcy prediction: An empirical study. In W. Li & J. Zhou (Eds.), Proceedings of the 2nd IEEE international conference on computer science and information technology, Beijing (pp. 115–119).

Chen, N., Vieira, A. S., Duarte, J., Ribeiro, B., & Neves, J. C. (2009). Cost-sensitive learning vector quantization for financial distress prediction. In L. S. Lopes et al. (Eds.), LNAI (Vol. 5816, pp. 374–385).

Chen, N., Ribeiro, B., Vieira, A. S., Duarte, J., & Neves, J. C. (2010). Hybrid genetic algorithm and learning vector quantization modeling for cost-sensitive bankruptcy prediction. In 2nd international conference on machine learning and computing (ICMLC), Bangalore, India (pp. 213–217).

Cho, S., Hong, H., & Ha, B. C. (2010). A hybrid approach based on the combination of variable selection using decision trees and case-based reasoning using the Mahalanobis distance: For bankruptcy prediction. Expert Systems with Applications, 37, 3482–3488.

Cordella, L. P., Stefano, C. D., Fontanella, F., & Marcelli, A. (2006). Evolutionary generation of prototypes for a learning vector quantization classifier. In F. Rothlauf et al. (Eds.), EvoWorkshops 2006, LNCS (Vol. 3907, pp. 391–402).

Domingos, P. (1999). MetaCost: A general method for making classifiers cost-sensitive. In Proceedings of the 5th ACM SIGKDD international conference on knowledge discovery and data mining, United States (pp. 155–164).

Fogel, D. B. (2000). Evolutionary computation: Toward a new philosophy of machine intelligence (2nd ed.). New York: IEEE Press.

Kima, M. J., & Kang, D. K. (2010). Ensemble with neural networks for bankruptcy prediction. Expert Systems with Applications, 37, 3373–3379.



Koh, H. C. (1992). The sensitivity of optimal cutoff points to misclassification costs of type I and type II errors in the going-concern prediction context. Journal of Business Finance and Accounting, 17, 187–197.

Kohonen, T. (2001). Self-organizing maps (3rd ed.). Springer-Verlag.

Lee, T., & Chen, I. (2005). A two-stage hybrid credit scoring model using artificial neural networks and multivariate adaptive regression splines. Expert Systems with Applications, 28(4), 743–752.

Ling, C. X., Yang, Q., Wang, J., & Zhang, S. (2004). Decision trees with minimal costs. In C. E. Brodley (Ed.), ACM international conference proceeding series, 21st international conference on machine learning (ICML), Banff, Canada (Vol. 69).

Lin, S. W., Ying, K. C., Chen, S. C., & Lee, Z. J. (2008). Particle swarm optimization for parameter determination and feature selection of support vector machine. Expert Systems with Applications, 35, 1817–1824.

Marinakis, Y., Marinaki, M., Doumpos, M., & Zopounidis, C. (2009). Ant colony and particle swarm optimization for financial classification problems. Expert Systems with Applications, 36, 10604–10611.

Min, S. H., Lee, J., & Han, I. (2006). Hybrid genetic algorithms and support vector machines for bankruptcy prediction. Expert Systems with Applications, 31, 652–660.

Nanda, S., & Pendharkar, P. (2001). Linear models for minimizing misclassification costs in bankruptcy prediction. International Journal of Intelligent Systems in Accounting, Finance & Management, 10, 155–168.

Neves, J. C., & Vieira, A. (2006). Improving bankruptcy prediction with hidden layer learning vector quantization. European Accounting Review, 15(2), 253–271.

Pendharkar, P. C. (2008). A threshold varying bisection method for cost sensitive learning in neural networks. Expert Systems with Applications, 34, 1456–1464.

Pendharkar, P., & Nanda, S. (2006). A misclassification cost-minimizing evolutionary–neural classification approach. Naval Research Logistics, 53(5), 432–447.

Ravi, K. P., & Ravi, V. (2007). Bankruptcy prediction in banks and firms via statistical and intelligent techniques – A review. European Journal of Operational Research, 180(1), 1–28.

Laboratory of Computer and Information Sciences & Neural Networks Research Center, Helsinki University of Technology: SOM Toolbox 2.0. Available from: <http://www.cis.hut.fi/somtoolbox>.

Sun, Y. M., Kamela, M. S., Wong, A. K. C., & Wang, Y. (2007). Cost-sensitive boosting for classification of imbalanced data. Pattern Recognition, 40, 3358–3378.

Vo, N. H., & Won, Y. (2007). Classification of unbalanced medical data with weighted regularized least squares. Frontiers in the Convergence of Bioscience and Information Technologies, 347–352.

West, D. (2000). Neural network credit scoring models. Computers and Operations Research, 27, 1131–1152.