a data mining approach to the prediction of corporate failure

24
Intelligent Database Systems Lab 國國國國國國國國 National Yunlin University of Science and T echnology A data mining approach to the prediction of corporate failure Advisor : Dr. Hsu Presenter : ching-wen Hong Author : Feng Yu Lin, Sally M cClean

Upload: jered

Post on 29-Jan-2016

56 views

Category:

Documents


2 download

DESCRIPTION

A data mining approach to the prediction of corporate failure. Advisor : Dr. Hsu Presenter : ching-wen Hong Author : Feng Yu Lin, Sally McClean. Outline. Motivation Objection Classifier technique The five steps of Data Mining: the SAS SEMMA methodology Sampling - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: A data mining approach to the prediction of corporate failure

Intelligent Database Systems Lab

國立雲林科技大學National Yunlin University of Science and Technology

A data mining approach to the prediction

of corporate failure

Advisor : Dr. Hsu

Presenter : ching-wen Hong

Author : Feng Yu Lin, Sally McClean

Page 2: A data mining approach to the prediction of corporate failure

2 Intelligent Database Systems Lab

N.Y.U.S.T.

I. M.Outline Motivation

Objection

Classifier technique

The five steps of Data Mining: the SAS SEMMA methodology

Sampling

Data exploration

Data manipulation

Modelling and results

Assessment

Conclusion

My opinion

Page 3: A data mining approach to the prediction of corporate failure

3 Intelligent Database Systems Lab

N.Y.U.S.T.

I. M.Motivation Due to recent changes in the world economy and as more firms,

large or small, seem to fail now more than ever corporate failure prediction is of increasing importance.

Page 4: A data mining approach to the prediction of corporate failure

4 Intelligent Database Systems Lab

N.Y.U.S.T.

I. M.Objection This paper uses a data mining approach to the prediction of

corporate failure.

Page 5: A data mining approach to the prediction of corporate failure

5 Intelligent Database Systems Lab

N.Y.U.S.T.

I. M.Classifier technique The classifiers include numerous statistical methods and machine l

earning methods.

The statistical methods include discriminant analysis (DA) and logistic regression (LG).

The machine learning methods include artificial neural networks (NN) and decision tree method (C5.0).

Another one, the hybrid method is proposed by the author.

Page 6: A data mining approach to the prediction of corporate failure

6 Intelligent Database Systems Lab

N.Y.U.S.T.

I. M.

The five steps of Data Mining: the SAS SEMMA methodology

Sampling: The data sampling consists of company financial data from the UK. Explore: Preprocessing of dataManipulate: Feature selectionModel: The classification models used to the prediction of corporate failure Assess: Comparison of prediction accuracy

Page 7: A data mining approach to the prediction of corporate failure

7 Intelligent Database Systems Lab

N.Y.U.S.T.

I. M.Sampling The data sampling consists of company financial data fr

om the UK. The financial data were accessed from Datastream/ICV.

The companies are divided into two groups:one is the failed companies group and the other is the nonfailed companies group.

This training sample consists of 690 nonfailed companies and 106 failed companies.(1980-1990)

The test dataset consists of 289 nonfailed companies and 48 failed companies.(1991-1999)

Page 8: A data mining approach to the prediction of corporate failure

8 Intelligent Database Systems Lab

N.Y.U.S.T.

I. M.

Page 9: A data mining approach to the prediction of corporate failure

9 Intelligent Database Systems Lab

N.Y.U.S.T.

I. M.Data exploration Exploration or preprocessing of data is very important and is sometimes the most time-consuming part.

Exploration of data is included: (1)The data is presented in a ready to use state. (2)The preprocessing of missing data. (3)We must filter out the redundant records such as duplicated data.

We decide to delete x719,x734,x735,x761,x766and x792, since these variables include too many missing values.

Page 10: A data mining approach to the prediction of corporate failure

10 Intelligent Database Systems Lab

N.Y.U.S.T.

I. M.Data manipulation One main purpose of data manipulation is to fea

ture selection.

We will use two feature selection methods. The first one is based on financial theory and human judgement( Feature selection І ), the second is based on ANOVA ( Feature selection Ⅱ ),.

Page 11: A data mining approach to the prediction of corporate failure

11 Intelligent Database Systems Lab

N.Y.U.S.T.

I. M.

Feature selection І- human judgement based on financial theory

Laurent[21],Ezzamuel et al.[13] and Clarke[10] attempted to reduce the wide variety of financial ratios into several major groups (e.g. profitability, liquidity,etc), the suggestion being that a researcher need only select one ratio from each of the groups to obtain an indication of the companies overall performance.

The features selected are from these categories: return on capital employed(ROCE);Turnover to total assets employed(turnover/TA);capital gearing(CG);working capital(WC)ratio.

Page 12: A data mining approach to the prediction of corporate failure

12 Intelligent Database Systems Lab

N.Y.U.S.T.

I. M.

Feature selection Ⅱ- ANOVA statistical method

To classify failed companies and the nonfailed companies effectively, the predictors should be chosen to enable us to distinguish between these two groups.

We use ANOVA to select the variables that are significantly different between these two groups.

Table 3, the ANOVA output of the training set, we can get the variables indicated by *, which indicate a significantly different between the failed and nonfailed group.

Page 13: A data mining approach to the prediction of corporate failure

13 Intelligent Database Systems Lab

N.Y.U.S.T.

I. M.

Page 14: A data mining approach to the prediction of corporate failure

14 Intelligent Database Systems Lab

N.Y.U.S.T.

I. M.Modelling and results

The classification models used in this study are:Statistics:DA,LG;Machine learning methods:decision trees,NN.

Page 15: A data mining approach to the prediction of corporate failure

15 Intelligent Database Systems Lab

N.Y.U.S.T.

I. M.

Hybrid models combining different classifiers

A hybrid method usually integrates two or more technologies. The purpose of integrating technologies is to strengthen the best features of each.

A hybrid method that combines the best features of several classification models is developed to increase the prediction performance.

Page 16: A data mining approach to the prediction of corporate failure

16 Intelligent Database Systems Lab

N.Y.U.S.T.

I. M.

Our purpose is to minimize the number of wrong predictions:

Min δ=∑i=1n (Ti*Oi)

s.t. Oi=f(w1Vi1+ w2Vi2+ w3Vi3+…+ wmVim-θ),i=1,…,nn=the number of classifiers.m=the number of companies in the sample.Ti the actual outcome of ith company in the test sample.Oi the predicted outcome of ith company in the test sam

ple based on the hybrid method Oi=1 if w1Vi1+ w2Vi2+ w3Vi3+…+ wmVim>θ, Oi=0 elseVij the predicted output of ith company in the test sampl

e by classifier j.wj is the weight of classifier j,Θ is the threshold. *return

s 1 if both sides of *are different, *returns 0 if both sides of *are same.

Page 17: A data mining approach to the prediction of corporate failure

17 Intelligent Database Systems Lab

N.Y.U.S.T.

I. M.

The problem of finding such a combination of classifiers is to try to find (1)the combination of weight,(2)their respective classifiers and (3) the threshold θ such that the miss hit δ between actual output Ti and Oi is as small as possible.

Page 18: A data mining approach to the prediction of corporate failure

18 Intelligent Database Systems Lab

N.Y.U.S.T.

I. M.The hybrid algorithm

1.Compte the total hit ratio (total accuracy) for the training sample itself for all m independent classifiers.(w1, w2 ,… , wm )

2.Take the outcome prediction Vij of classifier j for all the companies i in the test sample for j=1,…,m, i=1,…,n

3.Take all the population of classifiers, or subsets of it. Compute w1Vi1+ w2Vi2+ w3Vi3+…+ wm

Vim for each companies i in the test sample.

Page 19: A data mining approach to the prediction of corporate failure

19 Intelligent Database Systems Lab

N.Y.U.S.T.

I. M.

4. if w1Vi1+ w2Vi2+ w3Vi3+…+ wmVim>θ,then Oi=1 , else Oi=0. Adjust the parameter θ, where 0<θ< w1+ w2+ w3+…+ wm , such that the misclassification δ is as small as possible.

Page 20: A data mining approach to the prediction of corporate failure

20 Intelligent Database Systems Lab

N.Y.U.S.T.

I. M.

We present three hybrid classifiers.

Hybrid1-DA+LG+NN+C5.0

Hybrid2-DA+NN+C5.0

Hybrid3-LG+C5.0

Page 21: A data mining approach to the prediction of corporate failure

21 Intelligent Database Systems Lab

N.Y.U.S.T.

I. M.

Page 22: A data mining approach to the prediction of corporate failure

22 Intelligent Database Systems Lab

N.Y.U.S.T.

I. M.

Assessment

Page 23: A data mining approach to the prediction of corporate failure

23 Intelligent Database Systems Lab

N.Y.U.S.T.

I. M.Conclusions

For all the models,DA,LG,NN,C5.0 and hybrid classifiers, we found the ANOVA feature selection is better than human judgement feature selection except for DA.

The machine learning methods (NN and decision trees) show better performance than the statistical approach.

We present three hybrid classifiers:Hybrid1-DA+LG +NN+C5.0;Hybrid2-DA+NN+C5.0; Hybrid3- LG + C5.0.The empirical tests show that a hybrid classifiers produces higher prediction accuracy than individual classifiers.

Page 24: A data mining approach to the prediction of corporate failure

24 Intelligent Database Systems Lab

N.Y.U.S.T.

I. M.My opinion Advantage: A hybrid method that combines the best features of

several classification models is developed to increase the prediction performance.

Disadvantage: (1)The hybrid algorithm is time-consuming in computing w1Vi1+ w2Vi2+ w3Vi3+…+ wmVim ,i=1,…,n for subsets of

classifiers. (2) The hybrid method is not good in the application.