research article hierarchical neural regression models for...

10
Hindawi Publishing Corporation Journal of Engineering Volume 2013, Article ID 543940, 9 pages http://dx.doi.org/10.1155/2013/543940 Research Article Hierarchical Neural Regression Models for Customer Churn Prediction Golshan Mohammadi, 1 Reza Tavakkoli-Moghaddam, 2 and Mehrdad Mohammadi 2 1 Department of Finance Management, Faculty of Humanities and social Sciences, Islamic Azad University, Sanandaj Branch, Sanandaj, Iran 2 Department of Industrial Engineering, College of Engineering, University of Tehran, P.O. Box 11155/4563, Tehran, Iran Correspondence should be addressed to Mehrdad Mohammadi; [email protected] Received 25 November 2012; Revised 27 January 2013; Accepted 1 February 2013 Academic Editor: Jie Zhou Copyright © 2013 Golshan Mohammadi et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. As customers are the main assets of each industry, customer churn prediction is becoming a major task for companies to remain in competition with competitors. In the literature, the better applicability and efficiency of hierarchical data mining techniques has been reported. is paper considers three hierarchical models by combining four different data mining techniques for churn prediction, which are backpropagation artificial neural networks (ANN), self-organizing maps (SOM), alpha-cut fuzzy c-means (-FCM), and Cox proportional hazards regression model. e hierarchical models are ANN + ANN + Cox, SOM + ANN + Cox, and -FCM + ANN + Cox. In particular, the first component of the models aims to cluster data in two churner and nonchurner groups and also filter out unrepresentative data or outliers. en, the clustered data as the outputs are used to assign customers to churner and nonchurner groups by the second technique. Finally, the correctly classified data are used to create Cox proportional hazards model. To evaluate the performance of the hierarchical models, an Iranian mobile dataset is considered. e experimental results show that the hierarchical models outperform the single Cox regression baseline model in terms of prediction accuracy, Types I and II errors, RMSE, and MAD metrics. In addition, the -FCM + ANN + Cox model significantly performs better than the two other hierarchical models. 1. Introduction In today’s competitive world, customer churn management (CCM) is an important task for each service provider to build long-term and profitable relationships with specific customers [1, 2]. e service providers in telecommunication industry suffer from attracting valuable customers with competitors; this is known as customer churn. Recently, there have been many changes in the telecommunications industry, such as, loyalty program for more profitable customers [3]. Loyal customers are the most fertile source of data for decision making. is data reflects the customers’ actual behavior and those factors affect their loyalty. e potential value of customers can be evaluated by these data [3], also assessing the risk that they will stop paying their bills, and predicting their future needs [4]. Besides, because customer attrition will absolutely result in loss of incomes, customer churn management has received increasing attention in the whole marketing and management literature. Moreover, it has been proven that considerable impact on incomes is occurred by small change in retention rate [5]. e effective customer churn management for com- panies needs building more comprehensive and accurate churn prediction model. Recently, several customer churn prediction models have been presented in a number of domains such as telecommunications [68], retail markets [9, 10], subscription management [11, 12], banking service providers [13], and wireless commerce [14]. Among pre- vious studies in the literature, statistical and data min- ing techniques have been applied to build the prediction models.

Upload: others

Post on 11-Mar-2020

21 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Research Article Hierarchical Neural Regression Models for ...downloads.hindawi.com/journals/je/2013/543940.pdf · Research Article Hierarchical Neural Regression Models for Customer

Hindawi Publishing CorporationJournal of EngineeringVolume 2013 Article ID 543940 9 pageshttpdxdoiorg1011552013543940

Research ArticleHierarchical Neural Regression Models forCustomer Churn Prediction

Golshan Mohammadi1 Reza Tavakkoli-Moghaddam2 and Mehrdad Mohammadi2

1 Department of Finance Management Faculty of Humanities and social Sciences Islamic Azad UniversitySanandaj Branch Sanandaj Iran

2Department of Industrial Engineering College of Engineering University of TehranPO Box 111554563 Tehran Iran

Correspondence should be addressed to Mehrdad Mohammadi mehrdadmohamadiutacir

Received 25 November 2012 Revised 27 January 2013 Accepted 1 February 2013

Academic Editor Jie Zhou

Copyright copy 2013 Golshan Mohammadi et al This is an open access article distributed under the Creative Commons AttributionLicense which permits unrestricted use distribution and reproduction in any medium provided the original work is properlycited

As customers are the main assets of each industry customer churn prediction is becoming a major task for companies to remainin competition with competitors In the literature the better applicability and efficiency of hierarchical data mining techniqueshas been reported This paper considers three hierarchical models by combining four different data mining techniques for churnprediction which are backpropagation artificial neural networks (ANN) self-organizing maps (SOM) alpha-cut fuzzy c-means(120572-FCM) and Cox proportional hazards regression model The hierarchical models are ANN + ANN + Cox SOM + ANN + Coxand 120572-FCM + ANN + Cox In particular the first component of the models aims to cluster data in two churner and nonchurnergroups and also filter out unrepresentative data or outliers Then the clustered data as the outputs are used to assign customers tochurner and nonchurner groups by the second technique Finally the correctly classified data are used to create Cox proportionalhazards model To evaluate the performance of the hierarchical models an Iranian mobile dataset is considered The experimentalresults show that the hierarchical models outperform the single Cox regression baseline model in terms of prediction accuracyTypes I and II errors RMSE and MAD metrics In addition the 120572-FCM + ANN + Cox model significantly performs better thanthe two other hierarchical models

1 Introduction

In todayrsquos competitive world customer churn management(CCM) is an important task for each service provider tobuild long-term and profitable relationships with specificcustomers [1 2]The service providers in telecommunicationindustry suffer from attracting valuable customers withcompetitors this is known as customer churn Recently therehave beenmany changes in the telecommunications industrysuch as loyalty program for more profitable customers [3]Loyal customers are the most fertile source of data fordecision making This data reflects the customersrsquo actualbehavior and those factors affect their loyalty The potentialvalue of customers can be evaluated by these data [3] alsoassessing the risk that they will stop paying their bills andpredicting their future needs [4]

Besides because customer attrition will absolutely resultin loss of incomes customer churnmanagement has receivedincreasing attention in thewholemarketing andmanagementliterature Moreover it has been proven that considerableimpact on incomes is occurred by small change in retentionrate [5]

The effective customer churn management for com-panies needs building more comprehensive and accuratechurn prediction model Recently several customer churnprediction models have been presented in a number ofdomains such as telecommunications [6ndash8] retail markets[9 10] subscription management [11 12] banking serviceproviders [13] and wireless commerce [14] Among pre-vious studies in the literature statistical and data min-ing techniques have been applied to build the predictionmodels

2 Journal of Engineering

These techniques include artificial neural networks(ANNs) [7] Bayesian networks [6 9] decision trees [15 16]AdaBoosting [13] logistic regression [10 11 16 17] randomforest [10 11] the proportional hazard model [5] and SVMsLessmann and Voszlig [18] gave a detailed review on this topic

Two main tasks of data mining techniques are describingremarkable pattern or relationship in the data and alsopredicting a conceptual model which data followed up[2]

In the literature it has been proven that hybrid datamining approaches by combining clustering and classifica-tion techniques have better performance in comparison withsingle clustering and classification data mining techniquesHybrid approaches are particularly combined of two learningstages in which the first one is preprocessing the data and thesecond one is the final prediction output [7] Other hybriddatamining techniques for predicting customer churnmodelinclude using well-known metaheuristic algorithms (eggenetic algorithm) based on neural network which outper-form traditional local search gradient descentgradient ascentneural networks that use Rumelhart et al [19] procedure forupdating connection weights [20ndash22]

In addition to predicting the customer churn model anddetermining that which customer belongs to which class (iechurned and nonchurned classes) companies are eager toknow when why and with what probability their customerstry to switch their subscription Having knowledge aboutthose factors which significantly affect customers churnbehavior is more important than just knowing classes ofcustomers These effective factors are needed for companiesto plan their long-term strategies for decreasing customerchurn rate and above all scheduling and adopting bestmarketing strategies based on when and why their customerslike to break up their relationship because some companiessuffer from marketing expenses in some especial times whilethey are not aware of what their customers want On theother hand having knowledge about effective factors andprobability of attrition enables companies to focus on thosecustomers who are more likely to churn This useful infor-mation can be extracted using survival analysis of customersIn order to determine the hazard probability function ofthe customers and the above-mentioned information theCox proportional hazard method is applied as a last partof hierarchical methods because the ANNs are not able tocalculate the churn probability of the customers Anotherreason for using the Cox regression model is our used dataThe customer churn data consists of censored data Censoreddata occurs when you know that a measurement exceedssome threshold but you do not know by howmuch So in thisstudy each customer who has not churned till the end of theexperiment is considered as a right censored data Thereforethe Cox regression model is conducted on the customer datato cope with censored data

However few papers studied hierarchical data miningtechniques for customer churn prediction Therefore in thispaper some data mining techniques are presented to createthe hierarchical model of customer churn prediction Thehierarchical methods are based on combining clustering thatis alpha-cut fuzzy c-means (120572-FCM) self-organizing maps

(SOM) and artificial neural network (ANN) classificationtechniques that is ANN and survival analysis that is Coxproportional hazard regression model which their combina-tions are 120572-FCM + ANN + Cox SOM + ANN + Cox andANN + ANN + Cox To evaluate the performance of thehierarchical models an Iranian mobile dataset is consideredfor comparison between the hierarchical models and thesingle Cox regression baseline model in terms of predictionaccuracy Types I and II errors RMSE and MAD metricsIt also should be mentioned that some other well-knowntechniques such as Fuzzy ARTMAP [23] and LLMF [24]were used in designing some other hierarchicalmethods (egSOM + Fuzzy ARTMAP + Cox ANN + Fuzzy ARTMAP +Cox ANN + LLMF + Cox SOM + LLMF + Cox and 120572-FCM+ LLMF + Cox) but just the above-mentioned hierarchicaltechniques are proposed and reported based on their betterperformance Finally some of contributions of this paper areas follows

(i) Considering nonchurned customer as censored dataand using Cox regression model as a first time inthe literature in order to determine customers churnprediction

(ii) Determining important factors affecting the cus-tomer churn in the Iranian telecommunication indus-try

(iii) Determining the hazard and survival functions ofeach customer based on effective factors

(iv) Proposing some new combination of data miningtechniques containing ANN SOM 120572-FCM and Coxregression as hierarchical methods

(v) Conducting the proposed hierarchical methods on adataset of Iranian telephony market

(vi) Comparing different proposed hierarchical methods

The rest of our paper is organized as follows In Section 2we describe the proposed data mining techniques in thispaper Section 3 describes the research methodology andSection 4 presents the experimental results Finally the con-clusion is provided in Section 5

2 Proposed Data Mining Techniques

In order to create effective and accurate customer churnprediction models many data mining techniques have beenconsidered over the past time in the marketing and man-agement literature (eg [12 25]) The proposed data miningtechniques are as follows

21 Alpha-Cut Fuzzy C-Means Clustering Clustering is anunsupervised learning technique that breaks down a set ofpatterns into groups (or clusters) Clustering technique refersto the partitioning of a set of data object into clusters Inparticular no predefined classes are assumed [26]

Classical clustering partitions each observation isassigned to a single group (cluster) without consideringthe degree of distinction or similarity of the observation

Journal of Engineering 3

from all the other possible clusters This type of clustering isoften called hard or crisp clustering [5] Nevertheless fuzzyclustering methods based on the fuzzy set theory and on theconcept of membership functions have been developed Inthe fuzzy clustering observations are allowed to belong tomore than one cluster with different degrees of membership

Fuzzy clustering of an observation X into c clusters ischaracterized by cmembership functions 120583 119895 as follows

120583119895 119883 997888rarr [0 1] 119895 = 1 119888

119888

sum

119895=1

120583119895 (119909119894) = 1 119894 = 1 2 119899

0 lt

119899

sum

119894=1

120583119895 (119909119894) lt 119899 119895 = 1 2 119888

(1)

Membership function is calculated based on the distanceof observations fromclustersrsquo centerThewell-knownmethodof fuzzy clustering is the fuzzy c-means technique (FCM)initially proposed by Dunn [27] FCM applies two consecu-tive steps including (a) calculation of the clustersrsquo center and(b) assigning the observations to these clustersrsquo center usingspecific form of distance in order tominimize a standard lossfunction (SLF) as follows

SLF =

119888

sum

119896=1

119899

sum

119894=1

[120583119896 (119909119894)]1198981003817100381710038171003817119909119894 minus 119888119896

1003817100381710038171003817

2 (2)

where cluster center 119888119896 andmembership function of observa-tion i in cluster k are calculated by (3) and (4) respectively

119888119896 =sum119894 [120583119896 (119909119894)]

119898119909119894

sum119894 [120583119896 (119909119894)]119898 (3)

120583119896 (119909119894) =(1119889119896119894)

1(119898minus1)

sum119888119896=1 (1119889119896119894)

1(119898minus1) (4)

where 119889119896119894 is the distance metric for observation i in cluster k

22 Self-Organizing Maps A new form of a neural networkarchitecture called self-organizing map (SOM) was proposedby Kohonen [28] which has proved extremely efficient whenthe high degree of dimensionality and complexity accursesin input data SOM is used to find out relationships in adataset and cluster data according to the similarity of data(ie similar expression patterns) where the nature of theclassification cannot be predicted by the model creatorsor there may be more than one method to cluster thecharacteristics of a dataset [29] Figure 1 shows an exampleof a 4 times 4 SOM

23 Artificial Neural Network Classification is one of thecommonly used data mining techniques categorizing assupervised learning techniques It determines the valueof some variables and classifies according to results Thecommon algorithms of classification include decision trees

Output nodes

Input nodes

Figure 1 A 4 times 4 Kohonenrsquos self-organizing map

artificial neural networks (ANNs) and so on [30] in whichartificial neural networks are the most recently appliedmethods in literature

An ANN consists of some nodes and links between themThe ANN takes a number of input data and produces a singleoutput data through an internal weighting system ANNscan be categorized into single-layer perception or multilayerperception (MLP) The multilayer perception consists ofmultiple layers of simple two taste sigmoid processing nodesor neurons that act together using internal weighted systemIn addition the neural network consists in one or moreseveral intermediary layers between the input and outputlayers Such intermediary layers are called hidden layers andnodes embedded in these layers are called hidden nodesFigure 2 illustrates a multilayer neural network

24 Cox Proportional Hazards Model According to the Coxand Oakes [31] the Cox model is based on a modellingapproach in order to analysing survival data The purpose ofthe model is to simultaneously explore the effects of severalvariables on survival The Cox model is a well-recognisedstatistical technique for analysing survival data Survivalanalysis typically examines the relationship of the survivaldistribution to covariates Most commonly this examinationentails the specification of a linear-like model for the loghazard For example a parametric model based on theexponential distribution may be written as follows

log ℎ119894 (119905) = 120572 + 12057311199091198941 + 12057321199091198942 + sdot sdot sdot + 120573119896119909119894119896 (5)

or equivalently

ℎ119894 (119905) = exp (120572 + 12057311199091198941 + 12057321199091198942 + sdot sdot sdot + 120573119896119909119894119896)

(6)

Equation (5) is a linear model for the log-hazard or amultiplicative model for the hazard In (5) i is a subscriptfor observation and the xrsquos are the covariates The constant 120572in this model represents a kind of log-baseline hazard sincelog ℎ119894(119905) = 120572 [or ℎ119894(119905) = 119890

120572] when all of the xrsquos are zero

Equation (6) is similar to parametric regressionmodels basedon the other survival distributions

4 Journal of Engineering

Output layer

Input layer

Hidden layer

Figure 2 Multilayer neural network

The Cox model in contrast leaves the baseline hazardfunction 120572(119905) = log ℎ0(119905) unspecified

log ℎ119894 (119905) = 120572 (119905) + 12057311199091198941 + 12057321199091198942 + sdot sdot sdot + 120573119896119909119894119896

(7)

ℎ119894 (119905) = ℎ0 (119905) exp (120572 + 12057311199091198941 + 12057321199091198942 + sdot sdot sdot + 120573119896119909119894119896) (8)

where (8) is a semiparametric because while the baselinehazard can take any form the covariates enter the modellinearly

3 Research Methodology

31 Data Set For the purpose of this paper we considera CRM data set provided by an Iranian mobile operatorSpecifically the dataset contains 3150 subscribers including495 churners and 2655 nonchurners from September 2008toAugust 2009 In addition the subscribers have to bematurecustomers who were with the mobile operator for at least 2months Churn was then calculated based on whether thesubscriber left the company during the 10 remained monthsChurned customer is defined as a customerwhohas notmadeany contact with the operator (eg making a call charging acredit changing subscription etc)

32 Model Development

321 The Baseline As the last part of all proposed hierarchi-cal methods is Cox regression method and also the final aimof these hierarchical methods is determining better hazardand survival functions for customer churn prediction there-fore we use the original dataset to create a Cox proportionalhazards regression model as the baseline Cox model forcomparison

322 ANN + ANN + Cox The first hierarchical model isbased on combining two ANN models and Cox regressionin which the first ANN performs the data reduction taskand the second ANN for churn classification and the lastCox regression for hazard function prediction As there isno 100 accuracy there are a number of correctly andincorrectly predicted data from the training set by the firstANN model Consequently the incorrectly predicted data

can be regarded as outliers since the ANN model cannotpredict them accuratelyThen the correctly predicted data bythe first ANNmodel are used to train the secondANNmodelas the classification model Finally the corrected classifieddata from second ANN are used by Cox regression to predicthazard function

323 SOM+ANN+Cox For the second hierarchicalmodela self-organizingmap (SOM) which is a clustering techniqueis used for the data reduction task Then the correctedclustered data are used to train the second model basedon ANN Finally the classification result is used to hazardfunction prediction To develop the SOM the map size is setby 2 lowast 2 3 lowast 3 4 lowast 4 5 lowast 5 and 6 lowast 6 respectively in orderto obtain the highest rate of prediction accuracy Then twoclusters of SOM which contain the highest proportion of thechurner and nonchurner groups respectively are selected asthe clustering result

324 120572-FCM + ANN + Cox In the third hierarchicalmodel 120572-FCM which is a clustering approach is used fordata reduction task In the fuzzy c-means (FCM) clusteringalgorithm almost none of the data points have amembershipvalue of 1 Besides noise and outliers may cause difficultiesin obtaining appropriate clustering results from the FCMalgorithmTherefore many studies have been done about theFCM algorithm in the literature [32] Furthermore studiesabout FCM can be divided into two categories One isto extend the dissimilarity (or distance) measure d(119909119895 119886119894)between the data point119909119895 and the cluster center 119886119894 in the FCMobjective function by replacing the Euclidean distance withother types of metric measures [33] The other category is toextend the FCM objective function by adding a penalty term[34]

One of the best methods for assigning a data point toexactly one cluster is that if the membership value 120583119894119895 of thedata point 119909119895 in the ith cluster is larger than a given value 120572then the point 119909119895 will exactly belong to the ith cluster withmembership value of 1 and 1205831198941015840119895 = 0 for all i = 119894

1015840 In order toguarantee that no two of these c cluster cores will overlapthe value of 120572 is set to interval [05 1] [35] The cluster coresgenerated by FCM120572 can be calculated by (9)

120583119894119895 =

119889 (119909119895 119886119894)minus1(119898minus1)

sum119888119896=1 119889 (119909119895 119886119896)

minus1(119898minus1)gt 120572 (9)

which is equivalent to

119889 (119909119895 119886119894)minus1(119898minus1)

lt

1

120572sum119888119896=1 119889 (119909119895 119886119896)

minus1(119898minus1) (10)

where m is the fuzziness index so its value is considered as2 Interesting readers are referred to [35] for more detailThen the corrected clustered data are used to train secondANN model in order to customer classification Finally thehazard function using Cox regression is predicted based onthe corrected classified result from ANNmodel

Journal of Engineering 5

325 Evaluation Method To evaluate the proposed churnpredictionmodels prediction accuracy and the Type I and IIerrors are considered They can be measured by a confusionmatrix shown in Table 1 The rate of prediction accuracy isdefined as (119886 + 119889)(119886 + 119887 + 119888 + 119889)

The Type I error is the error of not rejecting a nullhypothesis when the alternative hypothesis is the true stateof nature In this paper it means that the customer is notchurned when the model has predicted that the hazardfunction of that customer is more than 120572 (ie 120572 is the alphacut in fuzzy c-means clustering method) On the other handthe Type II error is defined as the error of rejecting a nullhypothesis when it is the true state of nature It means thatthe customer is churned when the model has predicted thatthe survival function of that customer is more than 120572

We also compare the performance of the proposedmodelwith pure Cox proportional hazards model in predictingthe churn or survival probability of the customers Theobserved outcome for each customer in the sample is eitherchurn or survival (ie still active) by the end of the studyperiod We compute the deviation between observed andpredicted outcomes (ie the probability of churn or survivalas predicted by the model) for both proposed and pure Coxmodel The Root Mean Squared Error (RMSE) and MeanAbsolute Deviation (MAD) are calculated for comparingboth models as follows

RMSE =radic

sum119873ch119894=1 (119878119894ch minus 0)

2

119873ch+

sum119873Nch119895=1 (1 minus 119878

119895

Nch)2

119873Nch

MAD =

1

119873ch

119873ch

sum

119894=1

10038161003816100381610038161003816119890119894ch minus 119890ch

10038161003816100381610038161003816+

1

119873Nch

119873Nch

sum

119895=1

10038161003816100381610038161003816119890119895

Nch minus 119890Nch10038161003816100381610038161003816

(11)

where 119878119894ch and 119878

119895

Nch are the survival probability of churnedcustomer i and non-churned customer j respectively119873ch and 119873Nch are the number of churned and nonchurnedcustomer respectively 119890119894ch is the deviation of churned cus-tomer i fromzero (ie 119890119894ch = (119878

119894chminus0)) and 119890

119895

Nch is the deviationof non-churned customer j from one (ie 119890119895Nch = (1 minus 119878

119895

Nch))and 119890ch and 119890Nch are the mean of the deviation of churnedand non-churned customers respectively

4 Experimental Results

41 The Baseline In order to create the Cox model 2350and remained 800 numbers of data are used for trainingand testing the Cox model respectively Table 2 shows theprediction performance of the baseline Cox proportionalhazardsmodel based on type I and II errors accuracy RMSEandMADmetrics On average the baseline Cox proportionalhazards model provides about 84 accuracy meaning thatin 128 cases of data the Cox model was unable to correctlypredict the survival and hazard probability based on value ofalpha-cut 07 The type I and II errors were equal to 87 and 41cases of incorrectly predicted data The baseline Cox modelalso provides 0083 and 0098 as the RMSE and MAD errormetrics respectively

75

80

85

90

95

100

Base

line

8479

8842

8631

9586

Accu

racy

()

AN

N+

AN

N+

Cox

120572FC

M+

AN

N+

Cox

SOM+

AN

N+

Cox

Figure 3 The accuracy of hierarchical models

0

20

40

60

80

100Er

ror (

)

Type I errorType II error

87

61

76

2135 32 34

12

Base

line

AN

N+

AN

N+

Cox

120572FC

M+

AN

N+

Cox

SOM+

AN

N+

Cox

Figure 4 The Type I and II errors of hierarchical models

42 ANN + ANN + Cox For the first hierarchical modelbased on combining two ANN models and Cox regressionthe first ANN model performs the data reduction taskTherefore we run the ANNmodel by a set of different hiddenlayer and learning epochsThe result of different combinationof hidden layer and learning epochs is as Table 3 in whichan ANN models with 16 and 12 hidden layer and 100 and300 learning epochs are considered for two ANN modelrespectively Finally the accuracy and other performancemetrics for hierarchical ANN+ANN+Coxmodel are shownin Table 4

43 SOM+ANN+Cox To construct the second hierarchicalmodel by combination of SOM ANN and Cox regression2 lowast 2 3 lowast 3 4 lowast 4 5 lowast 5 and 6 lowast 6 SOMs are used to

6 Journal of Engineering

Table 1 Confusion matrix

ActualNonchurners Churners

Predicted Nonchurners 119886 119887 (II 119878lowast gt 120572)

Churners 119888 (I119867lowastlowast gt 120572) 119889

lowastProbability of survivallowastlowastProbability of hazard

Table 2 The prediction performance of the baseline Cox proportional hazards model

Performance metricsAccuracy Error type I Error type II RMSE MAD

Value 8479 87 35 0091 0098

cluster the data at first We found that 4 lowast 4 SOM performsthe best which can provide the highest rate of accuracy fortwo clusters that is churner and nonchurner clusters Thenthe accurate clustered data are used for training classifierANN and the result of different hidden layer and learningepochs is as Table 5 in which an ANN model with 16 hiddenlayer and 200 epochs is considered as classifiermodel Finallythe accuracy and other performance metrics for hierarchicalSOM + ANN + Cox model are shown in Table 6

44 120572-FCM + ANN + Cox To construct the third hierarchi-cal model based on alpha-cut fuzzy c-means classifier ANNmodel and Cox proportional hazards model the alpha-cutfuzzy c-means with alpha-cut equal to 07 and ANN with 16hidden layer and 100 learning epochs were found regardingto the best reported accuracy The 120572-FCM used for datareduction task results in two clusters including 2210 churnersand 940 nonchurnersThen 2350 and remained 800 numbersof data are respectively used for training and testing theclassification ANNmodel where the prediction performanceof ANNmodel is shown in Table 7 Finally Table 8 shows theperformance metrics of 120572-FCM + ANN + Cox hierarchicalmodels

On average the 120572-FCM+ANN+Cox hierarchical modelprovides about 9549 accuracy based on alpha-cut equalto 07 and ANN with 16 hidden layer and 100 learningepochs The type I and II errors are equal to 21 and 12 casesof incorrectly predicted data The 120572-FCM + ANN + Coxhierarchicalmodel also provides 0031 and 0042 as the RMSEand MAD error metrics respectively

In order to show the high performance of 120572-FCM+ANN+ Cox hierarchical model the accuracy errors type I and IIRMSE and MAD metrics are illustrated in Figures 3 4 and5 respectively

5 Conclusion

As customers are the main competitive advantage of eachindustry customer churn prediction is becoming a majortask for companies to remain in competition with otherindustries Therefore building an effective customer churnprediction model which provides an acceptable level ofaccuracy has become a research problem for companies in

002

004

006

008

01

012

Erro

r

MADRMSE

00980088 0091

0042

0091

0071

0084

0031Ba

selin

e

AN

N+

AN

N+

Cox

120572FC

M+

AN

N+

Cox

SOM+

AN

N+

Cox

Figure 5 The RMSE and MAD errors of hierarchical models

recent years In the literature the better applicability andefficiency of hierarchical data mining techniques in orderto predict customer attrition by combining two or moretechniques has been reported over a number of differentdomain problems In this paper we consider three differenthierarchical datamining techniques based on combination ofsome neural networks and regressionmodel to examine theirperformances for telecommunication industry In particu-lar backpropagation artificial neural networks (ANN) self-organizing maps (SOM) alpha-cut fuzzy c-means and Coxproportional hazard model are considered ConsequentlyANN + ANN + Cox SOM + ANN + Cox and 120572-FCM +ANN + Cox hierarchical models are developed in whichthe first component of the hierarchical models filter outunrepresentative data or outliers Then the corrected outputclustered data are used to classify customer into churner andnonchurner groups

To evaluate the performance of the hierarchical modelsan Iranian mobile dataset is considered The experimental

Journal of Engineering 7

Table 3 Prediction performance of ANN + ANN hierarchical models

Hidden layerLearning epochs

Clustering ANN Classification ANN50 100 200 300 50 100 200 300

8 8790 8934 9066 9117 8565 8750 8959 918412 8350 8742 9011 9087 8523 8825 9190 924916 9025 9269 8845 9077 9118 8843 8857 847620 8854 9021 8580 9106 9054 8792 8428 860824 9155 9185 8929 8646 8685 8841 8680 834628 9161 9021 8602 9129 9169 9055 8937 899532 9078 8935 8813 8611 9128 8655 8480 9039

Table 4 Performance metrics of ANN + ANN + Cox hierarchical models

Performance metricsAccuracy Error type I Error type II RMSE MAD

Value 8842 61 32 0071 0088

Table 5 Prediction performance of ANN in SOM + ANN + Cox hierarchical models

Hidden layer Learning epochs50 100 200 300

8 8340 8530 8930 906712 8431 8646 8931 897716 8745 9028 9388 911820 9030 9029 9187 840624 8576 8944 8337 938728 9099 9157 8759 902032 9146 8328 8571 8960

Table 6 Performance metrics of SOM + ANN + Cox hierarchical models

Performance metricsAccuracy Error type I Error type II RMSE MAD

Value 8631 76 34 0084 0091

Table 7 Prediction performance of ANN in 120572-FCM + ANN + Cox hierarchical models

Hidden layer Learning epochs50 100 200 300

8 8432 8531 8796 892112 8471 8776 8663 863416 8735 9320 9267 907820 9035 8381 9101 895924 8811 8621 8878 883328 8372 8654 8998 875332 8569 8791 8541 8715

Table 8 Performance metrics of 120572-FCM + ANN + Cox hierarchical models

Performance metricsAccuracy Error type I Error type II RMSE MAD

Value 9586 21 12 0031 0042

8 Journal of Engineering

results show that the hierarchical models outperform thesingle Cox regression baseline model in terms of predictionaccuracy types I and II errors RMSE and MAD metricsIn addition the 120572-FCM + ANN + Cox model significantlyperforms better than the SOM + ANN + Cox and ANN +

ANN + Cox modelsFor future work other prediction techniques can be

applied such as support vectormachines genetic algorithmslogistic regression and so forth Finally other domaindatasets about churn prediction can be used for furthercomparison

References

[1] K Coussement and D Van den Poel ldquoIntegrating the voice ofcustomers through call center emails into a decision supportsystem for churn predictionrdquo Information and Managementvol 45 no 3 pp 164ndash174 2008

[2] E W T Ngai L Xiu and D C K Chau ldquoApplication ofdata mining techniques in customer relationship managementa literature review and classificationrdquo Expert Systems withApplications vol 36 no 2 pp 2592ndash2602 2009

[3] C Hung and C F Tsai ldquoMarket segmentation based onhierarchical self-organizing map for markets of multimedia ondemandrdquo Expert Systems with Applications vol 34 no 1 pp780ndash787 2008

[4] M J A Berry and G S Linoff Data Mining Techniques ForMarketing Sales and Customer Support John Wiley amp Sons2003

[5] D Van den Poel and B Lariviere ldquoCustomer attrition analysisfor financial services using proportional hazard modelsrdquo Euro-pean Journal of Operational Research vol 157 no 1 pp 196ndash2172004

[6] P Kisioglu and Y I Topcu ldquoApplying Bayesian Belief Networkapproach to customer churn analysis a case study on thetelecom industry of Turkeyrdquo Expert Systems with Applicationsvol 38 no 6 pp 7151ndash7157 2011

[7] C F Tsai and Y H Lu ldquoCustomer churn prediction by hybridneural networksrdquo Expert Systems with Applications vol 36 no10 pp 12547ndash12553 2009

[8] W Verbeke K Dejaeger D Martens J Hur and B BaesensldquoNew insights into churn prediction in the telecommunicationsector a profit driven data mining approachrdquo European Journalof Operational Research vol 218 pp 211ndash229 2011

[9] B Baesens G Verstraeten D Van den Poel M Egmont-Petersen P VanKenhove and J Vanthienen ldquoBayesian networkclassifiers for identifying the slope of the customer lifecycle oflong-life customersrdquo European Journal of Operational Researchvol 156 no 2 pp 508ndash523 2004

[10] W Buckinx and D Van den Poel ldquoCustomer base analysis par-tial defection of behaviourally loyal clients in a non-contractualFMCG retail settingrdquo European Journal of Operational Researchvol 164 no 1 pp 252ndash268 2005

[11] J Burez and D Van den Poel ldquoCRM at a pay-TV companyusing analytical models to reduce customer attrition by targetedmarketing for subscription servicesrdquo Expert Systems with Appli-cations vol 32 no 2 pp 277ndash288 2007

[12] K Coussement and D Van den Poel ldquoChurn prediction in sub-scription services an application of support vector machineswhile comparing two parameter-selection techniquesrdquo ExpertSystems with Applications vol 34 no 1 pp 313ndash327 2008

[13] N Glady B Baesens andC Croux ldquoModeling churn using cus-tomer lifetime valuerdquo European Journal of Operational Researchvol 197 no 1 pp 402ndash411 2009

[14] X Yu S Guo J Guo and X Huang ldquoAn extended supportvector machine forecasting framework for customer churn ine-commercerdquo Expert Systems with Applications vol 38 no 3pp 1425ndash1430 2011

[15] J Qi L Zhang Y Liu et al ldquoADTreesLogit model for customerchurn predictionrdquoAnnals of Operations Research vol 168 no 1pp 247ndash265 2009

[16] B Huang M T Kechadi and B Buckley ldquoCustomer churnprediction in telecommunicationsrdquo Expert Systems with Appli-cations vol 39 no 1 pp 1414ndash1425 2012

[17] AKeramati and SM SArdabili ldquoChurn analysis for an Iranianmobile operatorrdquo Telecommunications Policy vol 35 no 4 pp344ndash356 2011

[18] S Lessmann and S Voszlig ldquoA reference model for customer-centric data mining with support vector machinesrdquo EuropeanJournal of Operational Research vol 199 no 2 pp 520ndash5302009

[19] D E Rumelhart G E Hinton and R J Williams ldquoLearn-ing internal representations by error propagationrdquo in ParallelDistributed Processing Explorations in the Microstructure ofCognition vol 1 pp 318ndash362MIT Press CambrigeMass USA1986

[20] P C Pendharkar and J A Rodger ldquoAn empirical study of impactof crossover operators on the performance of non-binarygenetic algorithm based neural approaches for classificationrdquoComputers and Operations Research vol 31 no 4 pp 481ndash4982004

[21] P Pendharkar and S Nanda ldquoA misclassification cost-minimizing evolutionary-neural classification approachrdquoNaval Research Logistics vol 53 no 5 pp 432ndash447 2006

[22] P C Pendharkar ldquoA comparison of gradient ascent gradientdescent and genetic-algorithm- based artificial neural networksfor the binary classification problemrdquo Expert Systems vol 24no 2 pp 65ndash86 2007

[23] E Granger M A Rubin S Grossberg and P Lavoie ldquoA What-and-Where fusion neural network for recognition and trackingof multiple radar emittersrdquo Neural Networks vol 14 no 3 pp325ndash344 2001

[24] J Sharifie C Lucas andBNAraabi ldquoLocally linear neurofuzzymodeling and prediction of geomagnetic disturbances based onsolar wind conditionsrdquo Space Weather vol 4 no 6 pp 1ndash122006

[25] S Y Hung D C Yen andH YWang ldquoApplying datamining totelecom churn managementrdquo Expert Systems with Applicationsvol 31 no 3 pp 515ndash524 2006

[26] A Jain M Murty and P Flyn ldquoData clustering a reviewrdquo ACMComputing Surveys vol 31 no 3 pp 264ndash323 1999

[27] J Dunn ldquoA fuzzy relative of the ISO-data process and itsuse in detecting compact well-separated clustersrdquo Journal ofCybernetics vol 3 no 3 pp 32ndash57 1973

[28] T Kohonen ldquoAdaptive associative and self-organizing func-tions in neural computingrdquo Applied Optics vol 26 no 23 pp4910ndash4918 1987

[29] X Zhang J Edwards and J Harding ldquoPersonalised online salesusing web usage data miningrdquo Computers in Industry vol 58no 8-9 pp 772ndash782 2007

[30] J Han and M Kamber Data Mining Concepts and TechniquesMorgan Kaufmann San Francisco Calif USA 2001

Journal of Engineering 9

[31] D R Cox and D Oakes Analysis of Survival Data Chapmanand Hall London UK 1984

[32] J Yu and M S Yang ldquoOptimality test for generalized FCM andits application to parameter selectionrdquo IEEE Transactions onFuzzy Systems vol 13 no 1 pp 164ndash176 2005

[33] K L Wu and M S Yang ldquoAlternative c-means clusteringalgorithmsrdquo Pattern Recognition vol 35 no 10 pp 2267ndash22782002

[34] N R Pal K Pal J M Keller and J C Bezdek ldquoA possibilisticfuzzy c-means clustering algorithmrdquo IEEE Transactions onFuzzy Systems vol 13 no 4 pp 517ndash530 2005

[35] M S Yang K L Wu J N Hsieh and J Yu ldquoAlpha-cut imple-mented fuzzy clustering algorithms and switching regressionsrdquoIEEETransactions on SystemsMan andCybernetics Part B vol38 no 3 pp 588ndash603 2008

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 2: Research Article Hierarchical Neural Regression Models for ...downloads.hindawi.com/journals/je/2013/543940.pdf · Research Article Hierarchical Neural Regression Models for Customer

2 Journal of Engineering

These techniques include artificial neural networks(ANNs) [7] Bayesian networks [6 9] decision trees [15 16]AdaBoosting [13] logistic regression [10 11 16 17] randomforest [10 11] the proportional hazard model [5] and SVMsLessmann and Voszlig [18] gave a detailed review on this topic

Two main tasks of data mining techniques are describingremarkable pattern or relationship in the data and alsopredicting a conceptual model which data followed up[2]

In the literature it has been proven that hybrid datamining approaches by combining clustering and classifica-tion techniques have better performance in comparison withsingle clustering and classification data mining techniquesHybrid approaches are particularly combined of two learningstages in which the first one is preprocessing the data and thesecond one is the final prediction output [7] Other hybriddatamining techniques for predicting customer churnmodelinclude using well-known metaheuristic algorithms (eggenetic algorithm) based on neural network which outper-form traditional local search gradient descentgradient ascentneural networks that use Rumelhart et al [19] procedure forupdating connection weights [20ndash22]

In addition to predicting the customer churn model anddetermining that which customer belongs to which class (iechurned and nonchurned classes) companies are eager toknow when why and with what probability their customerstry to switch their subscription Having knowledge aboutthose factors which significantly affect customers churnbehavior is more important than just knowing classes ofcustomers These effective factors are needed for companiesto plan their long-term strategies for decreasing customerchurn rate and above all scheduling and adopting bestmarketing strategies based on when and why their customerslike to break up their relationship because some companiessuffer from marketing expenses in some especial times whilethey are not aware of what their customers want On theother hand having knowledge about effective factors andprobability of attrition enables companies to focus on thosecustomers who are more likely to churn This useful infor-mation can be extracted using survival analysis of customersIn order to determine the hazard probability function ofthe customers and the above-mentioned information theCox proportional hazard method is applied as a last partof hierarchical methods because the ANNs are not able tocalculate the churn probability of the customers Anotherreason for using the Cox regression model is our used dataThe customer churn data consists of censored data Censoreddata occurs when you know that a measurement exceedssome threshold but you do not know by howmuch So in thisstudy each customer who has not churned till the end of theexperiment is considered as a right censored data Thereforethe Cox regression model is conducted on the customer datato cope with censored data

However few papers studied hierarchical data miningtechniques for customer churn prediction Therefore in thispaper some data mining techniques are presented to createthe hierarchical model of customer churn prediction Thehierarchical methods are based on combining clustering thatis alpha-cut fuzzy c-means (120572-FCM) self-organizing maps

(SOM) and artificial neural network (ANN) classificationtechniques that is ANN and survival analysis that is Coxproportional hazard regression model which their combina-tions are 120572-FCM + ANN + Cox SOM + ANN + Cox andANN + ANN + Cox To evaluate the performance of thehierarchical models an Iranian mobile dataset is consideredfor comparison between the hierarchical models and thesingle Cox regression baseline model in terms of predictionaccuracy Types I and II errors RMSE and MAD metricsIt also should be mentioned that some other well-knowntechniques such as Fuzzy ARTMAP [23] and LLMF [24]were used in designing some other hierarchicalmethods (egSOM + Fuzzy ARTMAP + Cox ANN + Fuzzy ARTMAP +Cox ANN + LLMF + Cox SOM + LLMF + Cox and 120572-FCM+ LLMF + Cox) but just the above-mentioned hierarchicaltechniques are proposed and reported based on their betterperformance Finally some of contributions of this paper areas follows

(i) Considering nonchurned customer as censored dataand using Cox regression model as a first time inthe literature in order to determine customers churnprediction

(ii) Determining important factors affecting the cus-tomer churn in the Iranian telecommunication indus-try

(iii) Determining the hazard and survival functions ofeach customer based on effective factors

(iv) Proposing some new combination of data miningtechniques containing ANN SOM 120572-FCM and Coxregression as hierarchical methods

(v) Conducting the proposed hierarchical methods on adataset of Iranian telephony market

(vi) Comparing different proposed hierarchical methods

The rest of our paper is organized as follows In Section 2we describe the proposed data mining techniques in thispaper Section 3 describes the research methodology andSection 4 presents the experimental results Finally the con-clusion is provided in Section 5

2 Proposed Data Mining Techniques

In order to create effective and accurate customer churnprediction models many data mining techniques have beenconsidered over the past time in the marketing and man-agement literature (eg [12 25]) The proposed data miningtechniques are as follows

21 Alpha-Cut Fuzzy C-Means Clustering Clustering is anunsupervised learning technique that breaks down a set ofpatterns into groups (or clusters) Clustering technique refersto the partitioning of a set of data object into clusters Inparticular no predefined classes are assumed [26]

Classical clustering partitions each observation isassigned to a single group (cluster) without consideringthe degree of distinction or similarity of the observation

Journal of Engineering 3

from all the other possible clusters This type of clustering isoften called hard or crisp clustering [5] Nevertheless fuzzyclustering methods based on the fuzzy set theory and on theconcept of membership functions have been developed Inthe fuzzy clustering observations are allowed to belong tomore than one cluster with different degrees of membership

Fuzzy clustering of an observation X into c clusters ischaracterized by cmembership functions 120583 119895 as follows

120583119895 119883 997888rarr [0 1] 119895 = 1 119888

119888

sum

119895=1

120583119895 (119909119894) = 1 119894 = 1 2 119899

0 lt

119899

sum

119894=1

120583119895 (119909119894) lt 119899 119895 = 1 2 119888

(1)

Membership function is calculated based on the distanceof observations fromclustersrsquo centerThewell-knownmethodof fuzzy clustering is the fuzzy c-means technique (FCM)initially proposed by Dunn [27] FCM applies two consecu-tive steps including (a) calculation of the clustersrsquo center and(b) assigning the observations to these clustersrsquo center usingspecific form of distance in order tominimize a standard lossfunction (SLF) as follows

SLF =

119888

sum

119896=1

119899

sum

119894=1

[120583119896 (119909119894)]1198981003817100381710038171003817119909119894 minus 119888119896

1003817100381710038171003817

2 (2)

where cluster center 119888119896 andmembership function of observa-tion i in cluster k are calculated by (3) and (4) respectively

119888119896 =sum119894 [120583119896 (119909119894)]

119898119909119894

sum119894 [120583119896 (119909119894)]119898 (3)

120583119896 (119909119894) =(1119889119896119894)

1(119898minus1)

sum119888119896=1 (1119889119896119894)

1(119898minus1) (4)

where 119889119896119894 is the distance metric for observation i in cluster k

22 Self-Organizing Maps A new form of a neural networkarchitecture called self-organizing map (SOM) was proposedby Kohonen [28] which has proved extremely efficient whenthe high degree of dimensionality and complexity accursesin input data SOM is used to find out relationships in adataset and cluster data according to the similarity of data(ie similar expression patterns) where the nature of theclassification cannot be predicted by the model creatorsor there may be more than one method to cluster thecharacteristics of a dataset [29] Figure 1 shows an exampleof a 4 times 4 SOM

23 Artificial Neural Network Classification is one of thecommonly used data mining techniques categorizing assupervised learning techniques It determines the valueof some variables and classifies according to results Thecommon algorithms of classification include decision trees

Output nodes

Input nodes

Figure 1 A 4 times 4 Kohonenrsquos self-organizing map

artificial neural networks (ANNs) and so on [30] in whichartificial neural networks are the most recently appliedmethods in literature

An ANN consists of some nodes and links between themThe ANN takes a number of input data and produces a singleoutput data through an internal weighting system ANNscan be categorized into single-layer perception or multilayerperception (MLP) The multilayer perception consists ofmultiple layers of simple two taste sigmoid processing nodesor neurons that act together using internal weighted systemIn addition the neural network consists in one or moreseveral intermediary layers between the input and outputlayers Such intermediary layers are called hidden layers andnodes embedded in these layers are called hidden nodesFigure 2 illustrates a multilayer neural network

24 Cox Proportional Hazards Model According to the Coxand Oakes [31] the Cox model is based on a modellingapproach in order to analysing survival data The purpose ofthe model is to simultaneously explore the effects of severalvariables on survival The Cox model is a well-recognisedstatistical technique for analysing survival data Survivalanalysis typically examines the relationship of the survivaldistribution to covariates Most commonly this examinationentails the specification of a linear-like model for the loghazard For example a parametric model based on theexponential distribution may be written as follows

log ℎ119894 (119905) = 120572 + 12057311199091198941 + 12057321199091198942 + sdot sdot sdot + 120573119896119909119894119896 (5)

or equivalently

ℎ119894 (119905) = exp (120572 + 12057311199091198941 + 12057321199091198942 + sdot sdot sdot + 120573119896119909119894119896)

(6)

Equation (5) is a linear model for the log-hazard or amultiplicative model for the hazard In (5) i is a subscriptfor observation and the xrsquos are the covariates The constant 120572in this model represents a kind of log-baseline hazard sincelog ℎ119894(119905) = 120572 [or ℎ119894(119905) = 119890

120572] when all of the xrsquos are zero

Equation (6) is similar to parametric regressionmodels basedon the other survival distributions

4 Journal of Engineering

Output layer

Input layer

Hidden layer

Figure 2 Multilayer neural network

The Cox model in contrast leaves the baseline hazardfunction 120572(119905) = log ℎ0(119905) unspecified

log ℎ119894 (119905) = 120572 (119905) + 12057311199091198941 + 12057321199091198942 + sdot sdot sdot + 120573119896119909119894119896

(7)

ℎ119894 (119905) = ℎ0 (119905) exp (120572 + 12057311199091198941 + 12057321199091198942 + sdot sdot sdot + 120573119896119909119894119896) (8)

where (8) is a semiparametric because while the baselinehazard can take any form the covariates enter the modellinearly

3 Research Methodology

31 Data Set For the purpose of this paper we considera CRM data set provided by an Iranian mobile operatorSpecifically the dataset contains 3150 subscribers including495 churners and 2655 nonchurners from September 2008toAugust 2009 In addition the subscribers have to bematurecustomers who were with the mobile operator for at least 2months Churn was then calculated based on whether thesubscriber left the company during the 10 remained monthsChurned customer is defined as a customerwhohas notmadeany contact with the operator (eg making a call charging acredit changing subscription etc)

32 Model Development

321 The Baseline As the last part of all proposed hierarchi-cal methods is Cox regression method and also the final aimof these hierarchical methods is determining better hazardand survival functions for customer churn prediction there-fore we use the original dataset to create a Cox proportionalhazards regression model as the baseline Cox model forcomparison

322 ANN + ANN + Cox The first hierarchical model isbased on combining two ANN models and Cox regressionin which the first ANN performs the data reduction taskand the second ANN for churn classification and the lastCox regression for hazard function prediction As there isno 100 accuracy there are a number of correctly andincorrectly predicted data from the training set by the firstANN model Consequently the incorrectly predicted data

can be regarded as outliers since the ANN model cannotpredict them accuratelyThen the correctly predicted data bythe first ANNmodel are used to train the secondANNmodelas the classification model Finally the corrected classifieddata from second ANN are used by Cox regression to predicthazard function

323 SOM+ANN+Cox For the second hierarchicalmodela self-organizingmap (SOM) which is a clustering techniqueis used for the data reduction task Then the correctedclustered data are used to train the second model basedon ANN Finally the classification result is used to hazardfunction prediction To develop the SOM the map size is setby 2 lowast 2 3 lowast 3 4 lowast 4 5 lowast 5 and 6 lowast 6 respectively in orderto obtain the highest rate of prediction accuracy Then twoclusters of SOM which contain the highest proportion of thechurner and nonchurner groups respectively are selected asthe clustering result

324 120572-FCM + ANN + Cox In the third hierarchicalmodel 120572-FCM which is a clustering approach is used fordata reduction task In the fuzzy c-means (FCM) clusteringalgorithm almost none of the data points have amembershipvalue of 1 Besides noise and outliers may cause difficultiesin obtaining appropriate clustering results from the FCMalgorithmTherefore many studies have been done about theFCM algorithm in the literature [32] Furthermore studiesabout FCM can be divided into two categories One isto extend the dissimilarity (or distance) measure d(119909119895 119886119894)between the data point119909119895 and the cluster center 119886119894 in the FCMobjective function by replacing the Euclidean distance withother types of metric measures [33] The other category is toextend the FCM objective function by adding a penalty term[34]

One of the best methods for assigning a data point toexactly one cluster is that if the membership value 120583119894119895 of thedata point 119909119895 in the ith cluster is larger than a given value 120572then the point 119909119895 will exactly belong to the ith cluster withmembership value of 1 and 1205831198941015840119895 = 0 for all i = 119894

1015840 In order toguarantee that no two of these c cluster cores will overlapthe value of 120572 is set to interval [05 1] [35] The cluster coresgenerated by FCM120572 can be calculated by (9)

120583119894119895 =

119889 (119909119895 119886119894)minus1(119898minus1)

sum119888119896=1 119889 (119909119895 119886119896)

minus1(119898minus1)gt 120572 (9)

which is equivalent to

119889 (119909119895 119886119894)minus1(119898minus1)

lt

1

120572sum119888119896=1 119889 (119909119895 119886119896)

minus1(119898minus1) (10)

where m is the fuzziness index so its value is considered as2 Interesting readers are referred to [35] for more detailThen the corrected clustered data are used to train secondANN model in order to customer classification Finally thehazard function using Cox regression is predicted based onthe corrected classified result from ANNmodel

Journal of Engineering 5

325 Evaluation Method To evaluate the proposed churnpredictionmodels prediction accuracy and the Type I and IIerrors are considered They can be measured by a confusionmatrix shown in Table 1 The rate of prediction accuracy isdefined as (119886 + 119889)(119886 + 119887 + 119888 + 119889)

The Type I error is the error of not rejecting a nullhypothesis when the alternative hypothesis is the true stateof nature In this paper it means that the customer is notchurned when the model has predicted that the hazardfunction of that customer is more than 120572 (ie 120572 is the alphacut in fuzzy c-means clustering method) On the other handthe Type II error is defined as the error of rejecting a nullhypothesis when it is the true state of nature It means thatthe customer is churned when the model has predicted thatthe survival function of that customer is more than 120572

We also compare the performance of the proposedmodelwith pure Cox proportional hazards model in predictingthe churn or survival probability of the customers Theobserved outcome for each customer in the sample is eitherchurn or survival (ie still active) by the end of the studyperiod We compute the deviation between observed andpredicted outcomes (ie the probability of churn or survivalas predicted by the model) for both proposed and pure Coxmodel The Root Mean Squared Error (RMSE) and MeanAbsolute Deviation (MAD) are calculated for comparingboth models as follows

RMSE =radic

sum119873ch119894=1 (119878119894ch minus 0)

2

119873ch+

sum119873Nch119895=1 (1 minus 119878

119895

Nch)2

119873Nch

MAD =

1

119873ch

119873ch

sum

119894=1

10038161003816100381610038161003816119890119894ch minus 119890ch

10038161003816100381610038161003816+

1

119873Nch

119873Nch

sum

119895=1

10038161003816100381610038161003816119890119895

Nch minus 119890Nch10038161003816100381610038161003816

(11)

where 119878119894ch and 119878

119895

Nch are the survival probability of churnedcustomer i and non-churned customer j respectively119873ch and 119873Nch are the number of churned and nonchurnedcustomer respectively 119890119894ch is the deviation of churned cus-tomer i fromzero (ie 119890119894ch = (119878

119894chminus0)) and 119890

119895

Nch is the deviationof non-churned customer j from one (ie 119890119895Nch = (1 minus 119878

119895

Nch))and 119890ch and 119890Nch are the mean of the deviation of churnedand non-churned customers respectively

4 Experimental Results

41 The Baseline In order to create the Cox model 2350and remained 800 numbers of data are used for trainingand testing the Cox model respectively Table 2 shows theprediction performance of the baseline Cox proportionalhazardsmodel based on type I and II errors accuracy RMSEandMADmetrics On average the baseline Cox proportionalhazards model provides about 84 accuracy meaning thatin 128 cases of data the Cox model was unable to correctlypredict the survival and hazard probability based on value ofalpha-cut 07 The type I and II errors were equal to 87 and 41cases of incorrectly predicted data The baseline Cox modelalso provides 0083 and 0098 as the RMSE and MAD errormetrics respectively

75

80

85

90

95

100

Base

line

8479

8842

8631

9586

Accu

racy

()

AN

N+

AN

N+

Cox

120572FC

M+

AN

N+

Cox

SOM+

AN

N+

Cox

Figure 3 The accuracy of hierarchical models

0

20

40

60

80

100Er

ror (

)

Type I errorType II error

87

61

76

2135 32 34

12

Base

line

AN

N+

AN

N+

Cox

120572FC

M+

AN

N+

Cox

SOM+

AN

N+

Cox

Figure 4 The Type I and II errors of hierarchical models

42 ANN + ANN + Cox For the first hierarchical modelbased on combining two ANN models and Cox regressionthe first ANN model performs the data reduction taskTherefore we run the ANNmodel by a set of different hiddenlayer and learning epochsThe result of different combinationof hidden layer and learning epochs is as Table 3 in whichan ANN models with 16 and 12 hidden layer and 100 and300 learning epochs are considered for two ANN modelrespectively Finally the accuracy and other performancemetrics for hierarchical ANN+ANN+Coxmodel are shownin Table 4

43 SOM+ANN+Cox To construct the second hierarchicalmodel by combination of SOM ANN and Cox regression2 lowast 2 3 lowast 3 4 lowast 4 5 lowast 5 and 6 lowast 6 SOMs are used to

6 Journal of Engineering

Table 1 Confusion matrix

ActualNonchurners Churners

Predicted Nonchurners 119886 119887 (II 119878lowast gt 120572)

Churners 119888 (I119867lowastlowast gt 120572) 119889

lowastProbability of survivallowastlowastProbability of hazard

Table 2 The prediction performance of the baseline Cox proportional hazards model

Performance metricsAccuracy Error type I Error type II RMSE MAD

Value 8479 87 35 0091 0098

cluster the data at first We found that 4 lowast 4 SOM performsthe best which can provide the highest rate of accuracy fortwo clusters that is churner and nonchurner clusters Thenthe accurate clustered data are used for training classifierANN and the result of different hidden layer and learningepochs is as Table 5 in which an ANN model with 16 hiddenlayer and 200 epochs is considered as classifiermodel Finallythe accuracy and other performance metrics for hierarchicalSOM + ANN + Cox model are shown in Table 6

44 120572-FCM + ANN + Cox To construct the third hierarchi-cal model based on alpha-cut fuzzy c-means classifier ANNmodel and Cox proportional hazards model the alpha-cutfuzzy c-means with alpha-cut equal to 07 and ANN with 16hidden layer and 100 learning epochs were found regardingto the best reported accuracy The 120572-FCM used for datareduction task results in two clusters including 2210 churnersand 940 nonchurnersThen 2350 and remained 800 numbersof data are respectively used for training and testing theclassification ANNmodel where the prediction performanceof ANNmodel is shown in Table 7 Finally Table 8 shows theperformance metrics of 120572-FCM + ANN + Cox hierarchicalmodels

On average the 120572-FCM+ANN+Cox hierarchical modelprovides about 9549 accuracy based on alpha-cut equalto 07 and ANN with 16 hidden layer and 100 learningepochs The type I and II errors are equal to 21 and 12 casesof incorrectly predicted data The 120572-FCM + ANN + Coxhierarchicalmodel also provides 0031 and 0042 as the RMSEand MAD error metrics respectively

In order to show the high performance of 120572-FCM+ANN+ Cox hierarchical model the accuracy errors type I and IIRMSE and MAD metrics are illustrated in Figures 3 4 and5 respectively

5 Conclusion

As customers are the main competitive advantage of eachindustry customer churn prediction is becoming a majortask for companies to remain in competition with otherindustries Therefore building an effective customer churnprediction model which provides an acceptable level ofaccuracy has become a research problem for companies in

002

004

006

008

01

012

Erro

r

MADRMSE

00980088 0091

0042

0091

0071

0084

0031Ba

selin

e

AN

N+

AN

N+

Cox

120572FC

M+

AN

N+

Cox

SOM+

AN

N+

Cox

Figure 5 The RMSE and MAD errors of hierarchical models

recent years In the literature the better applicability andefficiency of hierarchical data mining techniques in orderto predict customer attrition by combining two or moretechniques has been reported over a number of differentdomain problems In this paper we consider three differenthierarchical datamining techniques based on combination ofsome neural networks and regressionmodel to examine theirperformances for telecommunication industry In particu-lar backpropagation artificial neural networks (ANN) self-organizing maps (SOM) alpha-cut fuzzy c-means and Coxproportional hazard model are considered ConsequentlyANN + ANN + Cox SOM + ANN + Cox and 120572-FCM +ANN + Cox hierarchical models are developed in whichthe first component of the hierarchical models filter outunrepresentative data or outliers Then the corrected outputclustered data are used to classify customer into churner andnonchurner groups

To evaluate the performance of the hierarchical modelsan Iranian mobile dataset is considered The experimental

Journal of Engineering 7

Table 3 Prediction performance of ANN + ANN hierarchical models

Hidden layerLearning epochs

Clustering ANN Classification ANN50 100 200 300 50 100 200 300

8 8790 8934 9066 9117 8565 8750 8959 918412 8350 8742 9011 9087 8523 8825 9190 924916 9025 9269 8845 9077 9118 8843 8857 847620 8854 9021 8580 9106 9054 8792 8428 860824 9155 9185 8929 8646 8685 8841 8680 834628 9161 9021 8602 9129 9169 9055 8937 899532 9078 8935 8813 8611 9128 8655 8480 9039

Table 4 Performance metrics of ANN + ANN + Cox hierarchical models

Performance metricsAccuracy Error type I Error type II RMSE MAD

Value 8842 61 32 0071 0088

Table 5 Prediction performance of ANN in SOM + ANN + Cox hierarchical models

Hidden layer Learning epochs50 100 200 300

8 8340 8530 8930 906712 8431 8646 8931 897716 8745 9028 9388 911820 9030 9029 9187 840624 8576 8944 8337 938728 9099 9157 8759 902032 9146 8328 8571 8960

Table 6 Performance metrics of SOM + ANN + Cox hierarchical models

Performance metricsAccuracy Error type I Error type II RMSE MAD

Value 8631 76 34 0084 0091

Table 7 Prediction performance of ANN in 120572-FCM + ANN + Cox hierarchical models

Hidden layer Learning epochs50 100 200 300

8 8432 8531 8796 892112 8471 8776 8663 863416 8735 9320 9267 907820 9035 8381 9101 895924 8811 8621 8878 883328 8372 8654 8998 875332 8569 8791 8541 8715

Table 8 Performance metrics of 120572-FCM + ANN + Cox hierarchical models

Performance metricsAccuracy Error type I Error type II RMSE MAD

Value 9586 21 12 0031 0042

8 Journal of Engineering

results show that the hierarchical models outperform thesingle Cox regression baseline model in terms of predictionaccuracy types I and II errors RMSE and MAD metricsIn addition the 120572-FCM + ANN + Cox model significantlyperforms better than the SOM + ANN + Cox and ANN +

ANN + Cox modelsFor future work other prediction techniques can be

applied such as support vectormachines genetic algorithmslogistic regression and so forth Finally other domaindatasets about churn prediction can be used for furthercomparison

References

[1] K Coussement and D Van den Poel ldquoIntegrating the voice ofcustomers through call center emails into a decision supportsystem for churn predictionrdquo Information and Managementvol 45 no 3 pp 164ndash174 2008

[2] E W T Ngai L Xiu and D C K Chau ldquoApplication ofdata mining techniques in customer relationship managementa literature review and classificationrdquo Expert Systems withApplications vol 36 no 2 pp 2592ndash2602 2009

[3] C Hung and C F Tsai ldquoMarket segmentation based onhierarchical self-organizing map for markets of multimedia ondemandrdquo Expert Systems with Applications vol 34 no 1 pp780ndash787 2008

[4] M J A Berry and G S Linoff Data Mining Techniques ForMarketing Sales and Customer Support John Wiley amp Sons2003

[5] D Van den Poel and B Lariviere ldquoCustomer attrition analysisfor financial services using proportional hazard modelsrdquo Euro-pean Journal of Operational Research vol 157 no 1 pp 196ndash2172004

[6] P Kisioglu and Y I Topcu ldquoApplying Bayesian Belief Networkapproach to customer churn analysis a case study on thetelecom industry of Turkeyrdquo Expert Systems with Applicationsvol 38 no 6 pp 7151ndash7157 2011

[7] C F Tsai and Y H Lu ldquoCustomer churn prediction by hybridneural networksrdquo Expert Systems with Applications vol 36 no10 pp 12547ndash12553 2009

[8] W Verbeke K Dejaeger D Martens J Hur and B BaesensldquoNew insights into churn prediction in the telecommunicationsector a profit driven data mining approachrdquo European Journalof Operational Research vol 218 pp 211ndash229 2011

[9] B Baesens G Verstraeten D Van den Poel M Egmont-Petersen P VanKenhove and J Vanthienen ldquoBayesian networkclassifiers for identifying the slope of the customer lifecycle oflong-life customersrdquo European Journal of Operational Researchvol 156 no 2 pp 508ndash523 2004

[10] W Buckinx and D Van den Poel ldquoCustomer base analysis par-tial defection of behaviourally loyal clients in a non-contractualFMCG retail settingrdquo European Journal of Operational Researchvol 164 no 1 pp 252ndash268 2005

[11] J Burez and D Van den Poel ldquoCRM at a pay-TV companyusing analytical models to reduce customer attrition by targetedmarketing for subscription servicesrdquo Expert Systems with Appli-cations vol 32 no 2 pp 277ndash288 2007

[12] K Coussement and D Van den Poel ldquoChurn prediction in sub-scription services an application of support vector machineswhile comparing two parameter-selection techniquesrdquo ExpertSystems with Applications vol 34 no 1 pp 313ndash327 2008

[13] N Glady B Baesens andC Croux ldquoModeling churn using cus-tomer lifetime valuerdquo European Journal of Operational Researchvol 197 no 1 pp 402ndash411 2009

[14] X Yu S Guo J Guo and X Huang ldquoAn extended supportvector machine forecasting framework for customer churn ine-commercerdquo Expert Systems with Applications vol 38 no 3pp 1425ndash1430 2011

[15] J Qi L Zhang Y Liu et al ldquoADTreesLogit model for customerchurn predictionrdquoAnnals of Operations Research vol 168 no 1pp 247ndash265 2009

[16] B Huang M T Kechadi and B Buckley ldquoCustomer churnprediction in telecommunicationsrdquo Expert Systems with Appli-cations vol 39 no 1 pp 1414ndash1425 2012

[17] AKeramati and SM SArdabili ldquoChurn analysis for an Iranianmobile operatorrdquo Telecommunications Policy vol 35 no 4 pp344ndash356 2011

[18] S Lessmann and S Voszlig ldquoA reference model for customer-centric data mining with support vector machinesrdquo EuropeanJournal of Operational Research vol 199 no 2 pp 520ndash5302009

[19] D E Rumelhart G E Hinton and R J Williams ldquoLearn-ing internal representations by error propagationrdquo in ParallelDistributed Processing Explorations in the Microstructure ofCognition vol 1 pp 318ndash362MIT Press CambrigeMass USA1986

[20] P C Pendharkar and J A Rodger ldquoAn empirical study of impactof crossover operators on the performance of non-binarygenetic algorithm based neural approaches for classificationrdquoComputers and Operations Research vol 31 no 4 pp 481ndash4982004

[21] P Pendharkar and S Nanda ldquoA misclassification cost-minimizing evolutionary-neural classification approachrdquoNaval Research Logistics vol 53 no 5 pp 432ndash447 2006

[22] P C Pendharkar ldquoA comparison of gradient ascent gradientdescent and genetic-algorithm- based artificial neural networksfor the binary classification problemrdquo Expert Systems vol 24no 2 pp 65ndash86 2007

[23] E Granger M A Rubin S Grossberg and P Lavoie ldquoA What-and-Where fusion neural network for recognition and trackingof multiple radar emittersrdquo Neural Networks vol 14 no 3 pp325ndash344 2001

[24] J Sharifie C Lucas andBNAraabi ldquoLocally linear neurofuzzymodeling and prediction of geomagnetic disturbances based onsolar wind conditionsrdquo Space Weather vol 4 no 6 pp 1ndash122006

[25] S Y Hung D C Yen andH YWang ldquoApplying datamining totelecom churn managementrdquo Expert Systems with Applicationsvol 31 no 3 pp 515ndash524 2006

[26] A Jain M Murty and P Flyn ldquoData clustering a reviewrdquo ACMComputing Surveys vol 31 no 3 pp 264ndash323 1999

[27] J Dunn ldquoA fuzzy relative of the ISO-data process and itsuse in detecting compact well-separated clustersrdquo Journal ofCybernetics vol 3 no 3 pp 32ndash57 1973

[28] T Kohonen ldquoAdaptive associative and self-organizing func-tions in neural computingrdquo Applied Optics vol 26 no 23 pp4910ndash4918 1987

[29] X Zhang J Edwards and J Harding ldquoPersonalised online salesusing web usage data miningrdquo Computers in Industry vol 58no 8-9 pp 772ndash782 2007

[30] J Han and M Kamber Data Mining Concepts and TechniquesMorgan Kaufmann San Francisco Calif USA 2001

Journal of Engineering 9

[31] D R Cox and D Oakes Analysis of Survival Data Chapmanand Hall London UK 1984

[32] J Yu and M S Yang ldquoOptimality test for generalized FCM andits application to parameter selectionrdquo IEEE Transactions onFuzzy Systems vol 13 no 1 pp 164ndash176 2005

[33] K L Wu and M S Yang ldquoAlternative c-means clusteringalgorithmsrdquo Pattern Recognition vol 35 no 10 pp 2267ndash22782002

[34] N R Pal K Pal J M Keller and J C Bezdek ldquoA possibilisticfuzzy c-means clustering algorithmrdquo IEEE Transactions onFuzzy Systems vol 13 no 4 pp 517ndash530 2005

[35] M S Yang K L Wu J N Hsieh and J Yu ldquoAlpha-cut imple-mented fuzzy clustering algorithms and switching regressionsrdquoIEEETransactions on SystemsMan andCybernetics Part B vol38 no 3 pp 588ndash603 2008

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 3: Research Article Hierarchical Neural Regression Models for ...downloads.hindawi.com/journals/je/2013/543940.pdf · Research Article Hierarchical Neural Regression Models for Customer

Journal of Engineering 3

from all the other possible clusters This type of clustering isoften called hard or crisp clustering [5] Nevertheless fuzzyclustering methods based on the fuzzy set theory and on theconcept of membership functions have been developed Inthe fuzzy clustering observations are allowed to belong tomore than one cluster with different degrees of membership

Fuzzy clustering of an observation X into c clusters ischaracterized by cmembership functions 120583 119895 as follows

120583119895 119883 997888rarr [0 1] 119895 = 1 119888

119888

sum

119895=1

120583119895 (119909119894) = 1 119894 = 1 2 119899

0 lt

119899

sum

119894=1

120583119895 (119909119894) lt 119899 119895 = 1 2 119888

(1)

Membership function is calculated based on the distanceof observations fromclustersrsquo centerThewell-knownmethodof fuzzy clustering is the fuzzy c-means technique (FCM)initially proposed by Dunn [27] FCM applies two consecu-tive steps including (a) calculation of the clustersrsquo center and(b) assigning the observations to these clustersrsquo center usingspecific form of distance in order tominimize a standard lossfunction (SLF) as follows

SLF =

119888

sum

119896=1

119899

sum

119894=1

[120583119896 (119909119894)]1198981003817100381710038171003817119909119894 minus 119888119896

1003817100381710038171003817

2 (2)

where cluster center 119888119896 andmembership function of observa-tion i in cluster k are calculated by (3) and (4) respectively

119888119896 =sum119894 [120583119896 (119909119894)]

119898119909119894

sum119894 [120583119896 (119909119894)]119898 (3)

120583119896 (119909119894) =(1119889119896119894)

1(119898minus1)

sum119888119896=1 (1119889119896119894)

1(119898minus1) (4)

where 119889119896119894 is the distance metric for observation i in cluster k

22 Self-Organizing Maps A new form of a neural networkarchitecture called self-organizing map (SOM) was proposedby Kohonen [28] which has proved extremely efficient whenthe high degree of dimensionality and complexity accursesin input data SOM is used to find out relationships in adataset and cluster data according to the similarity of data(ie similar expression patterns) where the nature of theclassification cannot be predicted by the model creatorsor there may be more than one method to cluster thecharacteristics of a dataset [29] Figure 1 shows an exampleof a 4 times 4 SOM

23 Artificial Neural Network Classification is one of thecommonly used data mining techniques categorizing assupervised learning techniques It determines the valueof some variables and classifies according to results Thecommon algorithms of classification include decision trees

Output nodes

Input nodes

Figure 1 A 4 times 4 Kohonenrsquos self-organizing map

artificial neural networks (ANNs) and so on [30] in whichartificial neural networks are the most recently appliedmethods in literature

An ANN consists of some nodes and links between themThe ANN takes a number of input data and produces a singleoutput data through an internal weighting system ANNscan be categorized into single-layer perception or multilayerperception (MLP) The multilayer perception consists ofmultiple layers of simple two taste sigmoid processing nodesor neurons that act together using internal weighted systemIn addition the neural network consists in one or moreseveral intermediary layers between the input and outputlayers Such intermediary layers are called hidden layers andnodes embedded in these layers are called hidden nodesFigure 2 illustrates a multilayer neural network

24 Cox Proportional Hazards Model According to the Coxand Oakes [31] the Cox model is based on a modellingapproach in order to analysing survival data The purpose ofthe model is to simultaneously explore the effects of severalvariables on survival The Cox model is a well-recognisedstatistical technique for analysing survival data Survivalanalysis typically examines the relationship of the survivaldistribution to covariates Most commonly this examinationentails the specification of a linear-like model for the loghazard For example a parametric model based on theexponential distribution may be written as follows

log ℎ119894 (119905) = 120572 + 12057311199091198941 + 12057321199091198942 + sdot sdot sdot + 120573119896119909119894119896 (5)

or equivalently

ℎ119894 (119905) = exp (120572 + 12057311199091198941 + 12057321199091198942 + sdot sdot sdot + 120573119896119909119894119896)

(6)

Equation (5) is a linear model for the log-hazard or amultiplicative model for the hazard In (5) i is a subscriptfor observation and the xrsquos are the covariates The constant 120572in this model represents a kind of log-baseline hazard sincelog ℎ119894(119905) = 120572 [or ℎ119894(119905) = 119890

120572] when all of the xrsquos are zero

Equation (6) is similar to parametric regressionmodels basedon the other survival distributions

4 Journal of Engineering

Output layer

Input layer

Hidden layer

Figure 2 Multilayer neural network

The Cox model in contrast leaves the baseline hazardfunction 120572(119905) = log ℎ0(119905) unspecified

log ℎ119894 (119905) = 120572 (119905) + 12057311199091198941 + 12057321199091198942 + sdot sdot sdot + 120573119896119909119894119896

(7)

ℎ119894 (119905) = ℎ0 (119905) exp (120572 + 12057311199091198941 + 12057321199091198942 + sdot sdot sdot + 120573119896119909119894119896) (8)

where (8) is a semiparametric because while the baselinehazard can take any form the covariates enter the modellinearly

3 Research Methodology

31 Data Set For the purpose of this paper we considera CRM data set provided by an Iranian mobile operatorSpecifically the dataset contains 3150 subscribers including495 churners and 2655 nonchurners from September 2008toAugust 2009 In addition the subscribers have to bematurecustomers who were with the mobile operator for at least 2months Churn was then calculated based on whether thesubscriber left the company during the 10 remained monthsChurned customer is defined as a customerwhohas notmadeany contact with the operator (eg making a call charging acredit changing subscription etc)

32 Model Development

321 The Baseline As the last part of all proposed hierarchi-cal methods is Cox regression method and also the final aimof these hierarchical methods is determining better hazardand survival functions for customer churn prediction there-fore we use the original dataset to create a Cox proportionalhazards regression model as the baseline Cox model forcomparison

322 ANN + ANN + Cox The first hierarchical model isbased on combining two ANN models and Cox regressionin which the first ANN performs the data reduction taskand the second ANN for churn classification and the lastCox regression for hazard function prediction As there isno 100 accuracy there are a number of correctly andincorrectly predicted data from the training set by the firstANN model Consequently the incorrectly predicted data

can be regarded as outliers since the ANN model cannotpredict them accuratelyThen the correctly predicted data bythe first ANNmodel are used to train the secondANNmodelas the classification model Finally the corrected classifieddata from second ANN are used by Cox regression to predicthazard function

323 SOM+ANN+Cox For the second hierarchicalmodela self-organizingmap (SOM) which is a clustering techniqueis used for the data reduction task Then the correctedclustered data are used to train the second model basedon ANN Finally the classification result is used to hazardfunction prediction To develop the SOM the map size is setby 2 lowast 2 3 lowast 3 4 lowast 4 5 lowast 5 and 6 lowast 6 respectively in orderto obtain the highest rate of prediction accuracy Then twoclusters of SOM which contain the highest proportion of thechurner and nonchurner groups respectively are selected asthe clustering result

324 120572-FCM + ANN + Cox In the third hierarchicalmodel 120572-FCM which is a clustering approach is used fordata reduction task In the fuzzy c-means (FCM) clusteringalgorithm almost none of the data points have amembershipvalue of 1 Besides noise and outliers may cause difficultiesin obtaining appropriate clustering results from the FCMalgorithmTherefore many studies have been done about theFCM algorithm in the literature [32] Furthermore studiesabout FCM can be divided into two categories One isto extend the dissimilarity (or distance) measure d(119909119895 119886119894)between the data point119909119895 and the cluster center 119886119894 in the FCMobjective function by replacing the Euclidean distance withother types of metric measures [33] The other category is toextend the FCM objective function by adding a penalty term[34]

One of the best methods for assigning a data point toexactly one cluster is that if the membership value 120583119894119895 of thedata point 119909119895 in the ith cluster is larger than a given value 120572then the point 119909119895 will exactly belong to the ith cluster withmembership value of 1 and 1205831198941015840119895 = 0 for all i = 119894

1015840 In order toguarantee that no two of these c cluster cores will overlapthe value of 120572 is set to interval [05 1] [35] The cluster coresgenerated by FCM120572 can be calculated by (9)

120583119894119895 =

119889 (119909119895 119886119894)minus1(119898minus1)

sum119888119896=1 119889 (119909119895 119886119896)

minus1(119898minus1)gt 120572 (9)

which is equivalent to

119889 (119909119895 119886119894)minus1(119898minus1)

lt

1

120572sum119888119896=1 119889 (119909119895 119886119896)

minus1(119898minus1) (10)

where m is the fuzziness index so its value is considered as2 Interesting readers are referred to [35] for more detailThen the corrected clustered data are used to train secondANN model in order to customer classification Finally thehazard function using Cox regression is predicted based onthe corrected classified result from ANNmodel

Journal of Engineering 5

325 Evaluation Method To evaluate the proposed churnpredictionmodels prediction accuracy and the Type I and IIerrors are considered They can be measured by a confusionmatrix shown in Table 1 The rate of prediction accuracy isdefined as (119886 + 119889)(119886 + 119887 + 119888 + 119889)

The Type I error is the error of not rejecting a nullhypothesis when the alternative hypothesis is the true stateof nature In this paper it means that the customer is notchurned when the model has predicted that the hazardfunction of that customer is more than 120572 (ie 120572 is the alphacut in fuzzy c-means clustering method) On the other handthe Type II error is defined as the error of rejecting a nullhypothesis when it is the true state of nature It means thatthe customer is churned when the model has predicted thatthe survival function of that customer is more than 120572

We also compare the performance of the proposedmodelwith pure Cox proportional hazards model in predictingthe churn or survival probability of the customers Theobserved outcome for each customer in the sample is eitherchurn or survival (ie still active) by the end of the studyperiod We compute the deviation between observed andpredicted outcomes (ie the probability of churn or survivalas predicted by the model) for both proposed and pure Coxmodel The Root Mean Squared Error (RMSE) and MeanAbsolute Deviation (MAD) are calculated for comparingboth models as follows

RMSE =radic

sum119873ch119894=1 (119878119894ch minus 0)

2

119873ch+

sum119873Nch119895=1 (1 minus 119878

119895

Nch)2

119873Nch

MAD =

1

119873ch

119873ch

sum

119894=1

10038161003816100381610038161003816119890119894ch minus 119890ch

10038161003816100381610038161003816+

1

119873Nch

119873Nch

sum

119895=1

10038161003816100381610038161003816119890119895

Nch minus 119890Nch10038161003816100381610038161003816

(11)

where 119878119894ch and 119878

119895

Nch are the survival probability of churnedcustomer i and non-churned customer j respectively119873ch and 119873Nch are the number of churned and nonchurnedcustomer respectively 119890119894ch is the deviation of churned cus-tomer i fromzero (ie 119890119894ch = (119878

119894chminus0)) and 119890

119895

Nch is the deviationof non-churned customer j from one (ie 119890119895Nch = (1 minus 119878

119895

Nch))and 119890ch and 119890Nch are the mean of the deviation of churnedand non-churned customers respectively

4 Experimental Results

41 The Baseline In order to create the Cox model 2350and remained 800 numbers of data are used for trainingand testing the Cox model respectively Table 2 shows theprediction performance of the baseline Cox proportionalhazardsmodel based on type I and II errors accuracy RMSEandMADmetrics On average the baseline Cox proportionalhazards model provides about 84 accuracy meaning thatin 128 cases of data the Cox model was unable to correctlypredict the survival and hazard probability based on value ofalpha-cut 07 The type I and II errors were equal to 87 and 41cases of incorrectly predicted data The baseline Cox modelalso provides 0083 and 0098 as the RMSE and MAD errormetrics respectively

75

80

85

90

95

100

Base

line

8479

8842

8631

9586

Accu

racy

()

AN

N+

AN

N+

Cox

120572FC

M+

AN

N+

Cox

SOM+

AN

N+

Cox

Figure 3 The accuracy of hierarchical models

0

20

40

60

80

100Er

ror (

)

Type I errorType II error

87

61

76

2135 32 34

12

Base

line

AN

N+

AN

N+

Cox

120572FC

M+

AN

N+

Cox

SOM+

AN

N+

Cox

Figure 4 The Type I and II errors of hierarchical models

42 ANN + ANN + Cox For the first hierarchical modelbased on combining two ANN models and Cox regressionthe first ANN model performs the data reduction taskTherefore we run the ANNmodel by a set of different hiddenlayer and learning epochsThe result of different combinationof hidden layer and learning epochs is as Table 3 in whichan ANN models with 16 and 12 hidden layer and 100 and300 learning epochs are considered for two ANN modelrespectively Finally the accuracy and other performancemetrics for hierarchical ANN+ANN+Coxmodel are shownin Table 4

43 SOM+ANN+Cox To construct the second hierarchicalmodel by combination of SOM ANN and Cox regression2 lowast 2 3 lowast 3 4 lowast 4 5 lowast 5 and 6 lowast 6 SOMs are used to

6 Journal of Engineering

Table 1 Confusion matrix

ActualNonchurners Churners

Predicted Nonchurners 119886 119887 (II 119878lowast gt 120572)

Churners 119888 (I119867lowastlowast gt 120572) 119889

lowastProbability of survivallowastlowastProbability of hazard

Table 2 The prediction performance of the baseline Cox proportional hazards model

Performance metricsAccuracy Error type I Error type II RMSE MAD

Value 8479 87 35 0091 0098

cluster the data at first We found that 4 lowast 4 SOM performsthe best which can provide the highest rate of accuracy fortwo clusters that is churner and nonchurner clusters Thenthe accurate clustered data are used for training classifierANN and the result of different hidden layer and learningepochs is as Table 5 in which an ANN model with 16 hiddenlayer and 200 epochs is considered as classifiermodel Finallythe accuracy and other performance metrics for hierarchicalSOM + ANN + Cox model are shown in Table 6

44 120572-FCM + ANN + Cox To construct the third hierarchi-cal model based on alpha-cut fuzzy c-means classifier ANNmodel and Cox proportional hazards model the alpha-cutfuzzy c-means with alpha-cut equal to 07 and ANN with 16hidden layer and 100 learning epochs were found regardingto the best reported accuracy The 120572-FCM used for datareduction task results in two clusters including 2210 churnersand 940 nonchurnersThen 2350 and remained 800 numbersof data are respectively used for training and testing theclassification ANNmodel where the prediction performanceof ANNmodel is shown in Table 7 Finally Table 8 shows theperformance metrics of 120572-FCM + ANN + Cox hierarchicalmodels

On average the 120572-FCM+ANN+Cox hierarchical modelprovides about 9549 accuracy based on alpha-cut equalto 07 and ANN with 16 hidden layer and 100 learningepochs The type I and II errors are equal to 21 and 12 casesof incorrectly predicted data The 120572-FCM + ANN + Coxhierarchicalmodel also provides 0031 and 0042 as the RMSEand MAD error metrics respectively

In order to show the high performance of 120572-FCM+ANN+ Cox hierarchical model the accuracy errors type I and IIRMSE and MAD metrics are illustrated in Figures 3 4 and5 respectively

5 Conclusion

As customers are the main competitive advantage of eachindustry customer churn prediction is becoming a majortask for companies to remain in competition with otherindustries Therefore building an effective customer churnprediction model which provides an acceptable level ofaccuracy has become a research problem for companies in

002

004

006

008

01

012

Erro

r

MADRMSE

00980088 0091

0042

0091

0071

0084

0031Ba

selin

e

AN

N+

AN

N+

Cox

120572FC

M+

AN

N+

Cox

SOM+

AN

N+

Cox

Figure 5 The RMSE and MAD errors of hierarchical models

recent years In the literature the better applicability andefficiency of hierarchical data mining techniques in orderto predict customer attrition by combining two or moretechniques has been reported over a number of differentdomain problems In this paper we consider three differenthierarchical datamining techniques based on combination ofsome neural networks and regressionmodel to examine theirperformances for telecommunication industry In particu-lar backpropagation artificial neural networks (ANN) self-organizing maps (SOM) alpha-cut fuzzy c-means and Coxproportional hazard model are considered ConsequentlyANN + ANN + Cox SOM + ANN + Cox and 120572-FCM +ANN + Cox hierarchical models are developed in whichthe first component of the hierarchical models filter outunrepresentative data or outliers Then the corrected outputclustered data are used to classify customer into churner andnonchurner groups

To evaluate the performance of the hierarchical modelsan Iranian mobile dataset is considered The experimental

Journal of Engineering 7

Table 3 Prediction performance of ANN + ANN hierarchical models

Hidden layerLearning epochs

Clustering ANN Classification ANN50 100 200 300 50 100 200 300

8 8790 8934 9066 9117 8565 8750 8959 918412 8350 8742 9011 9087 8523 8825 9190 924916 9025 9269 8845 9077 9118 8843 8857 847620 8854 9021 8580 9106 9054 8792 8428 860824 9155 9185 8929 8646 8685 8841 8680 834628 9161 9021 8602 9129 9169 9055 8937 899532 9078 8935 8813 8611 9128 8655 8480 9039

Table 4 Performance metrics of ANN + ANN + Cox hierarchical models

Performance metricsAccuracy Error type I Error type II RMSE MAD

Value 8842 61 32 0071 0088

Table 5 Prediction performance of ANN in SOM + ANN + Cox hierarchical models

Hidden layer Learning epochs50 100 200 300

8 8340 8530 8930 906712 8431 8646 8931 897716 8745 9028 9388 911820 9030 9029 9187 840624 8576 8944 8337 938728 9099 9157 8759 902032 9146 8328 8571 8960

Table 6 Performance metrics of SOM + ANN + Cox hierarchical models

Performance metricsAccuracy Error type I Error type II RMSE MAD

Value 8631 76 34 0084 0091

Table 7 Prediction performance of ANN in 120572-FCM + ANN + Cox hierarchical models

Hidden layer Learning epochs50 100 200 300

8 8432 8531 8796 892112 8471 8776 8663 863416 8735 9320 9267 907820 9035 8381 9101 895924 8811 8621 8878 883328 8372 8654 8998 875332 8569 8791 8541 8715

Table 8 Performance metrics of 120572-FCM + ANN + Cox hierarchical models

Performance metricsAccuracy Error type I Error type II RMSE MAD

Value 9586 21 12 0031 0042

8 Journal of Engineering

results show that the hierarchical models outperform thesingle Cox regression baseline model in terms of predictionaccuracy types I and II errors RMSE and MAD metricsIn addition the 120572-FCM + ANN + Cox model significantlyperforms better than the SOM + ANN + Cox and ANN +

ANN + Cox modelsFor future work other prediction techniques can be

applied such as support vectormachines genetic algorithmslogistic regression and so forth Finally other domaindatasets about churn prediction can be used for furthercomparison

References

[1] K Coussement and D Van den Poel ldquoIntegrating the voice ofcustomers through call center emails into a decision supportsystem for churn predictionrdquo Information and Managementvol 45 no 3 pp 164ndash174 2008

[2] E W T Ngai L Xiu and D C K Chau ldquoApplication ofdata mining techniques in customer relationship managementa literature review and classificationrdquo Expert Systems withApplications vol 36 no 2 pp 2592ndash2602 2009

[3] C Hung and C F Tsai ldquoMarket segmentation based onhierarchical self-organizing map for markets of multimedia ondemandrdquo Expert Systems with Applications vol 34 no 1 pp780ndash787 2008

[4] M J A Berry and G S Linoff Data Mining Techniques ForMarketing Sales and Customer Support John Wiley amp Sons2003

[5] D Van den Poel and B Lariviere ldquoCustomer attrition analysisfor financial services using proportional hazard modelsrdquo Euro-pean Journal of Operational Research vol 157 no 1 pp 196ndash2172004

[6] P Kisioglu and Y I Topcu ldquoApplying Bayesian Belief Networkapproach to customer churn analysis a case study on thetelecom industry of Turkeyrdquo Expert Systems with Applicationsvol 38 no 6 pp 7151ndash7157 2011

[7] C F Tsai and Y H Lu ldquoCustomer churn prediction by hybridneural networksrdquo Expert Systems with Applications vol 36 no10 pp 12547ndash12553 2009

[8] W Verbeke K Dejaeger D Martens J Hur and B BaesensldquoNew insights into churn prediction in the telecommunicationsector a profit driven data mining approachrdquo European Journalof Operational Research vol 218 pp 211ndash229 2011

[9] B Baesens G Verstraeten D Van den Poel M Egmont-Petersen P VanKenhove and J Vanthienen ldquoBayesian networkclassifiers for identifying the slope of the customer lifecycle oflong-life customersrdquo European Journal of Operational Researchvol 156 no 2 pp 508ndash523 2004

[10] W Buckinx and D Van den Poel ldquoCustomer base analysis par-tial defection of behaviourally loyal clients in a non-contractualFMCG retail settingrdquo European Journal of Operational Researchvol 164 no 1 pp 252ndash268 2005

[11] J Burez and D Van den Poel ldquoCRM at a pay-TV companyusing analytical models to reduce customer attrition by targetedmarketing for subscription servicesrdquo Expert Systems with Appli-cations vol 32 no 2 pp 277ndash288 2007

[12] K Coussement and D Van den Poel ldquoChurn prediction in sub-scription services an application of support vector machineswhile comparing two parameter-selection techniquesrdquo ExpertSystems with Applications vol 34 no 1 pp 313ndash327 2008

[13] N Glady B Baesens andC Croux ldquoModeling churn using cus-tomer lifetime valuerdquo European Journal of Operational Researchvol 197 no 1 pp 402ndash411 2009

[14] X Yu S Guo J Guo and X Huang ldquoAn extended supportvector machine forecasting framework for customer churn ine-commercerdquo Expert Systems with Applications vol 38 no 3pp 1425ndash1430 2011

[15] J Qi L Zhang Y Liu et al ldquoADTreesLogit model for customerchurn predictionrdquoAnnals of Operations Research vol 168 no 1pp 247ndash265 2009

[16] B Huang M T Kechadi and B Buckley ldquoCustomer churnprediction in telecommunicationsrdquo Expert Systems with Appli-cations vol 39 no 1 pp 1414ndash1425 2012

[17] AKeramati and SM SArdabili ldquoChurn analysis for an Iranianmobile operatorrdquo Telecommunications Policy vol 35 no 4 pp344ndash356 2011

[18] S Lessmann and S Voszlig ldquoA reference model for customer-centric data mining with support vector machinesrdquo EuropeanJournal of Operational Research vol 199 no 2 pp 520ndash5302009

[19] D E Rumelhart G E Hinton and R J Williams ldquoLearn-ing internal representations by error propagationrdquo in ParallelDistributed Processing Explorations in the Microstructure ofCognition vol 1 pp 318ndash362MIT Press CambrigeMass USA1986

[20] P C Pendharkar and J A Rodger ldquoAn empirical study of impactof crossover operators on the performance of non-binarygenetic algorithm based neural approaches for classificationrdquoComputers and Operations Research vol 31 no 4 pp 481ndash4982004

[21] P Pendharkar and S Nanda ldquoA misclassification cost-minimizing evolutionary-neural classification approachrdquoNaval Research Logistics vol 53 no 5 pp 432ndash447 2006

[22] P C Pendharkar ldquoA comparison of gradient ascent gradientdescent and genetic-algorithm- based artificial neural networksfor the binary classification problemrdquo Expert Systems vol 24no 2 pp 65ndash86 2007

[23] E Granger M A Rubin S Grossberg and P Lavoie ldquoA What-and-Where fusion neural network for recognition and trackingof multiple radar emittersrdquo Neural Networks vol 14 no 3 pp325ndash344 2001

[24] J Sharifie C Lucas andBNAraabi ldquoLocally linear neurofuzzymodeling and prediction of geomagnetic disturbances based onsolar wind conditionsrdquo Space Weather vol 4 no 6 pp 1ndash122006

[25] S Y Hung D C Yen andH YWang ldquoApplying datamining totelecom churn managementrdquo Expert Systems with Applicationsvol 31 no 3 pp 515ndash524 2006

[26] A Jain M Murty and P Flyn ldquoData clustering a reviewrdquo ACMComputing Surveys vol 31 no 3 pp 264ndash323 1999

[27] J Dunn ldquoA fuzzy relative of the ISO-data process and itsuse in detecting compact well-separated clustersrdquo Journal ofCybernetics vol 3 no 3 pp 32ndash57 1973

[28] T Kohonen ldquoAdaptive associative and self-organizing func-tions in neural computingrdquo Applied Optics vol 26 no 23 pp4910ndash4918 1987

[29] X Zhang J Edwards and J Harding ldquoPersonalised online salesusing web usage data miningrdquo Computers in Industry vol 58no 8-9 pp 772ndash782 2007

[30] J Han and M Kamber Data Mining Concepts and TechniquesMorgan Kaufmann San Francisco Calif USA 2001

Journal of Engineering 9

[31] D R Cox and D Oakes Analysis of Survival Data Chapmanand Hall London UK 1984

[32] J Yu and M S Yang ldquoOptimality test for generalized FCM andits application to parameter selectionrdquo IEEE Transactions onFuzzy Systems vol 13 no 1 pp 164ndash176 2005

[33] K L Wu and M S Yang ldquoAlternative c-means clusteringalgorithmsrdquo Pattern Recognition vol 35 no 10 pp 2267ndash22782002

[34] N R Pal K Pal J M Keller and J C Bezdek ldquoA possibilisticfuzzy c-means clustering algorithmrdquo IEEE Transactions onFuzzy Systems vol 13 no 4 pp 517ndash530 2005

[35] M S Yang K L Wu J N Hsieh and J Yu ldquoAlpha-cut imple-mented fuzzy clustering algorithms and switching regressionsrdquoIEEETransactions on SystemsMan andCybernetics Part B vol38 no 3 pp 588ndash603 2008

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 4: Research Article Hierarchical Neural Regression Models for ...downloads.hindawi.com/journals/je/2013/543940.pdf · Research Article Hierarchical Neural Regression Models for Customer

4 Journal of Engineering

Output layer

Input layer

Hidden layer

Figure 2 Multilayer neural network

The Cox model in contrast leaves the baseline hazardfunction 120572(119905) = log ℎ0(119905) unspecified

log ℎ119894 (119905) = 120572 (119905) + 12057311199091198941 + 12057321199091198942 + sdot sdot sdot + 120573119896119909119894119896

(7)

ℎ119894 (119905) = ℎ0 (119905) exp (120572 + 12057311199091198941 + 12057321199091198942 + sdot sdot sdot + 120573119896119909119894119896) (8)

where (8) is a semiparametric because while the baselinehazard can take any form the covariates enter the modellinearly

3 Research Methodology

31 Data Set For the purpose of this paper we considera CRM data set provided by an Iranian mobile operatorSpecifically the dataset contains 3150 subscribers including495 churners and 2655 nonchurners from September 2008toAugust 2009 In addition the subscribers have to bematurecustomers who were with the mobile operator for at least 2months Churn was then calculated based on whether thesubscriber left the company during the 10 remained monthsChurned customer is defined as a customerwhohas notmadeany contact with the operator (eg making a call charging acredit changing subscription etc)

32 Model Development

321 The Baseline As the last part of all proposed hierarchi-cal methods is Cox regression method and also the final aimof these hierarchical methods is determining better hazardand survival functions for customer churn prediction there-fore we use the original dataset to create a Cox proportionalhazards regression model as the baseline Cox model forcomparison

322 ANN + ANN + Cox The first hierarchical model isbased on combining two ANN models and Cox regressionin which the first ANN performs the data reduction taskand the second ANN for churn classification and the lastCox regression for hazard function prediction As there isno 100 accuracy there are a number of correctly andincorrectly predicted data from the training set by the firstANN model Consequently the incorrectly predicted data

can be regarded as outliers since the ANN model cannotpredict them accuratelyThen the correctly predicted data bythe first ANNmodel are used to train the secondANNmodelas the classification model Finally the corrected classifieddata from second ANN are used by Cox regression to predicthazard function

323 SOM+ANN+Cox For the second hierarchicalmodela self-organizingmap (SOM) which is a clustering techniqueis used for the data reduction task Then the correctedclustered data are used to train the second model basedon ANN Finally the classification result is used to hazardfunction prediction To develop the SOM the map size is setby 2 lowast 2 3 lowast 3 4 lowast 4 5 lowast 5 and 6 lowast 6 respectively in orderto obtain the highest rate of prediction accuracy Then twoclusters of SOM which contain the highest proportion of thechurner and nonchurner groups respectively are selected asthe clustering result

324 120572-FCM + ANN + Cox In the third hierarchicalmodel 120572-FCM which is a clustering approach is used fordata reduction task In the fuzzy c-means (FCM) clusteringalgorithm almost none of the data points have amembershipvalue of 1 Besides noise and outliers may cause difficultiesin obtaining appropriate clustering results from the FCMalgorithmTherefore many studies have been done about theFCM algorithm in the literature [32] Furthermore studiesabout FCM can be divided into two categories One isto extend the dissimilarity (or distance) measure d(119909119895 119886119894)between the data point119909119895 and the cluster center 119886119894 in the FCMobjective function by replacing the Euclidean distance withother types of metric measures [33] The other category is toextend the FCM objective function by adding a penalty term[34]

One of the best methods for assigning a data point toexactly one cluster is that if the membership value 120583119894119895 of thedata point 119909119895 in the ith cluster is larger than a given value 120572then the point 119909119895 will exactly belong to the ith cluster withmembership value of 1 and 1205831198941015840119895 = 0 for all i = 119894

1015840 In order toguarantee that no two of these c cluster cores will overlapthe value of 120572 is set to interval [05 1] [35] The cluster coresgenerated by FCM120572 can be calculated by (9)

120583119894119895 =

119889 (119909119895 119886119894)minus1(119898minus1)

sum119888119896=1 119889 (119909119895 119886119896)

minus1(119898minus1)gt 120572 (9)

which is equivalent to

119889 (119909119895 119886119894)minus1(119898minus1)

lt

1

120572sum119888119896=1 119889 (119909119895 119886119896)

minus1(119898minus1) (10)

where m is the fuzziness index so its value is considered as2 Interesting readers are referred to [35] for more detailThen the corrected clustered data are used to train secondANN model in order to customer classification Finally thehazard function using Cox regression is predicted based onthe corrected classified result from ANNmodel

Journal of Engineering 5

325 Evaluation Method To evaluate the proposed churnpredictionmodels prediction accuracy and the Type I and IIerrors are considered They can be measured by a confusionmatrix shown in Table 1 The rate of prediction accuracy isdefined as (119886 + 119889)(119886 + 119887 + 119888 + 119889)

The Type I error is the error of not rejecting a nullhypothesis when the alternative hypothesis is the true stateof nature In this paper it means that the customer is notchurned when the model has predicted that the hazardfunction of that customer is more than 120572 (ie 120572 is the alphacut in fuzzy c-means clustering method) On the other handthe Type II error is defined as the error of rejecting a nullhypothesis when it is the true state of nature It means thatthe customer is churned when the model has predicted thatthe survival function of that customer is more than 120572

We also compare the performance of the proposedmodelwith pure Cox proportional hazards model in predictingthe churn or survival probability of the customers Theobserved outcome for each customer in the sample is eitherchurn or survival (ie still active) by the end of the studyperiod We compute the deviation between observed andpredicted outcomes (ie the probability of churn or survivalas predicted by the model) for both proposed and pure Coxmodel The Root Mean Squared Error (RMSE) and MeanAbsolute Deviation (MAD) are calculated for comparingboth models as follows

RMSE =radic

sum119873ch119894=1 (119878119894ch minus 0)

2

119873ch+

sum119873Nch119895=1 (1 minus 119878

119895

Nch)2

119873Nch

MAD =

1

119873ch

119873ch

sum

119894=1

10038161003816100381610038161003816119890119894ch minus 119890ch

10038161003816100381610038161003816+

1

119873Nch

119873Nch

sum

119895=1

10038161003816100381610038161003816119890119895

Nch minus 119890Nch10038161003816100381610038161003816

(11)

where 119878119894ch and 119878

119895

Nch are the survival probability of churnedcustomer i and non-churned customer j respectively119873ch and 119873Nch are the number of churned and nonchurnedcustomer respectively 119890119894ch is the deviation of churned cus-tomer i fromzero (ie 119890119894ch = (119878

119894chminus0)) and 119890

119895

Nch is the deviationof non-churned customer j from one (ie 119890119895Nch = (1 minus 119878

119895

Nch))and 119890ch and 119890Nch are the mean of the deviation of churnedand non-churned customers respectively

4 Experimental Results

41 The Baseline In order to create the Cox model 2350and remained 800 numbers of data are used for trainingand testing the Cox model respectively Table 2 shows theprediction performance of the baseline Cox proportionalhazardsmodel based on type I and II errors accuracy RMSEandMADmetrics On average the baseline Cox proportionalhazards model provides about 84 accuracy meaning thatin 128 cases of data the Cox model was unable to correctlypredict the survival and hazard probability based on value ofalpha-cut 07 The type I and II errors were equal to 87 and 41cases of incorrectly predicted data The baseline Cox modelalso provides 0083 and 0098 as the RMSE and MAD errormetrics respectively

75

80

85

90

95

100

Base

line

8479

8842

8631

9586

Accu

racy

()

AN

N+

AN

N+

Cox

120572FC

M+

AN

N+

Cox

SOM+

AN

N+

Cox

Figure 3 The accuracy of hierarchical models

0

20

40

60

80

100Er

ror (

)

Type I errorType II error

87

61

76

2135 32 34

12

Base

line

AN

N+

AN

N+

Cox

120572FC

M+

AN

N+

Cox

SOM+

AN

N+

Cox

Figure 4 The Type I and II errors of hierarchical models

42 ANN + ANN + Cox For the first hierarchical modelbased on combining two ANN models and Cox regressionthe first ANN model performs the data reduction taskTherefore we run the ANNmodel by a set of different hiddenlayer and learning epochsThe result of different combinationof hidden layer and learning epochs is as Table 3 in whichan ANN models with 16 and 12 hidden layer and 100 and300 learning epochs are considered for two ANN modelrespectively Finally the accuracy and other performancemetrics for hierarchical ANN+ANN+Coxmodel are shownin Table 4

43 SOM+ANN+Cox To construct the second hierarchicalmodel by combination of SOM ANN and Cox regression2 lowast 2 3 lowast 3 4 lowast 4 5 lowast 5 and 6 lowast 6 SOMs are used to

6 Journal of Engineering

Table 1 Confusion matrix

ActualNonchurners Churners

Predicted Nonchurners 119886 119887 (II 119878lowast gt 120572)

Churners 119888 (I119867lowastlowast gt 120572) 119889

lowastProbability of survivallowastlowastProbability of hazard

Table 2 The prediction performance of the baseline Cox proportional hazards model

Performance metricsAccuracy Error type I Error type II RMSE MAD

Value 8479 87 35 0091 0098

cluster the data at first We found that 4 lowast 4 SOM performsthe best which can provide the highest rate of accuracy fortwo clusters that is churner and nonchurner clusters Thenthe accurate clustered data are used for training classifierANN and the result of different hidden layer and learningepochs is as Table 5 in which an ANN model with 16 hiddenlayer and 200 epochs is considered as classifiermodel Finallythe accuracy and other performance metrics for hierarchicalSOM + ANN + Cox model are shown in Table 6

44 120572-FCM + ANN + Cox To construct the third hierarchi-cal model based on alpha-cut fuzzy c-means classifier ANNmodel and Cox proportional hazards model the alpha-cutfuzzy c-means with alpha-cut equal to 07 and ANN with 16hidden layer and 100 learning epochs were found regardingto the best reported accuracy The 120572-FCM used for datareduction task results in two clusters including 2210 churnersand 940 nonchurnersThen 2350 and remained 800 numbersof data are respectively used for training and testing theclassification ANNmodel where the prediction performanceof ANNmodel is shown in Table 7 Finally Table 8 shows theperformance metrics of 120572-FCM + ANN + Cox hierarchicalmodels

On average the 120572-FCM+ANN+Cox hierarchical modelprovides about 9549 accuracy based on alpha-cut equalto 07 and ANN with 16 hidden layer and 100 learningepochs The type I and II errors are equal to 21 and 12 casesof incorrectly predicted data The 120572-FCM + ANN + Coxhierarchicalmodel also provides 0031 and 0042 as the RMSEand MAD error metrics respectively

In order to show the high performance of 120572-FCM+ANN+ Cox hierarchical model the accuracy errors type I and IIRMSE and MAD metrics are illustrated in Figures 3 4 and5 respectively

5 Conclusion

As customers are the main competitive advantage of eachindustry customer churn prediction is becoming a majortask for companies to remain in competition with otherindustries Therefore building an effective customer churnprediction model which provides an acceptable level ofaccuracy has become a research problem for companies in

002

004

006

008

01

012

Erro

r

MADRMSE

00980088 0091

0042

0091

0071

0084

0031Ba

selin

e

AN

N+

AN

N+

Cox

120572FC

M+

AN

N+

Cox

SOM+

AN

N+

Cox

Figure 5 The RMSE and MAD errors of hierarchical models

recent years In the literature the better applicability andefficiency of hierarchical data mining techniques in orderto predict customer attrition by combining two or moretechniques has been reported over a number of differentdomain problems In this paper we consider three differenthierarchical datamining techniques based on combination ofsome neural networks and regressionmodel to examine theirperformances for telecommunication industry In particu-lar backpropagation artificial neural networks (ANN) self-organizing maps (SOM) alpha-cut fuzzy c-means and Coxproportional hazard model are considered ConsequentlyANN + ANN + Cox SOM + ANN + Cox and 120572-FCM +ANN + Cox hierarchical models are developed in whichthe first component of the hierarchical models filter outunrepresentative data or outliers Then the corrected outputclustered data are used to classify customer into churner andnonchurner groups

To evaluate the performance of the hierarchical modelsan Iranian mobile dataset is considered The experimental

Journal of Engineering 7

Table 3 Prediction performance of ANN + ANN hierarchical models

Hidden layerLearning epochs

Clustering ANN Classification ANN50 100 200 300 50 100 200 300

8 8790 8934 9066 9117 8565 8750 8959 918412 8350 8742 9011 9087 8523 8825 9190 924916 9025 9269 8845 9077 9118 8843 8857 847620 8854 9021 8580 9106 9054 8792 8428 860824 9155 9185 8929 8646 8685 8841 8680 834628 9161 9021 8602 9129 9169 9055 8937 899532 9078 8935 8813 8611 9128 8655 8480 9039

Table 4 Performance metrics of ANN + ANN + Cox hierarchical models

Performance metricsAccuracy Error type I Error type II RMSE MAD

Value 8842 61 32 0071 0088

Table 5 Prediction performance of ANN in SOM + ANN + Cox hierarchical models

Hidden layer Learning epochs50 100 200 300

8 8340 8530 8930 906712 8431 8646 8931 897716 8745 9028 9388 911820 9030 9029 9187 840624 8576 8944 8337 938728 9099 9157 8759 902032 9146 8328 8571 8960

Table 6 Performance metrics of SOM + ANN + Cox hierarchical models

Performance metricsAccuracy Error type I Error type II RMSE MAD

Value 8631 76 34 0084 0091

Table 7 Prediction performance of ANN in 120572-FCM + ANN + Cox hierarchical models

Hidden layer Learning epochs50 100 200 300

8 8432 8531 8796 892112 8471 8776 8663 863416 8735 9320 9267 907820 9035 8381 9101 895924 8811 8621 8878 883328 8372 8654 8998 875332 8569 8791 8541 8715

Table 8 Performance metrics of 120572-FCM + ANN + Cox hierarchical models

Performance metricsAccuracy Error type I Error type II RMSE MAD

Value 9586 21 12 0031 0042

8 Journal of Engineering

results show that the hierarchical models outperform thesingle Cox regression baseline model in terms of predictionaccuracy types I and II errors RMSE and MAD metricsIn addition the 120572-FCM + ANN + Cox model significantlyperforms better than the SOM + ANN + Cox and ANN +

ANN + Cox modelsFor future work other prediction techniques can be

applied such as support vectormachines genetic algorithmslogistic regression and so forth Finally other domaindatasets about churn prediction can be used for furthercomparison

References

[1] K Coussement and D Van den Poel ldquoIntegrating the voice ofcustomers through call center emails into a decision supportsystem for churn predictionrdquo Information and Managementvol 45 no 3 pp 164ndash174 2008

[2] E W T Ngai L Xiu and D C K Chau ldquoApplication ofdata mining techniques in customer relationship managementa literature review and classificationrdquo Expert Systems withApplications vol 36 no 2 pp 2592ndash2602 2009

[3] C Hung and C F Tsai ldquoMarket segmentation based onhierarchical self-organizing map for markets of multimedia ondemandrdquo Expert Systems with Applications vol 34 no 1 pp780ndash787 2008

[4] M J A Berry and G S Linoff Data Mining Techniques ForMarketing Sales and Customer Support John Wiley amp Sons2003

[5] D Van den Poel and B Lariviere ldquoCustomer attrition analysisfor financial services using proportional hazard modelsrdquo Euro-pean Journal of Operational Research vol 157 no 1 pp 196ndash2172004

[6] P Kisioglu and Y I Topcu ldquoApplying Bayesian Belief Networkapproach to customer churn analysis a case study on thetelecom industry of Turkeyrdquo Expert Systems with Applicationsvol 38 no 6 pp 7151ndash7157 2011

[7] C F Tsai and Y H Lu ldquoCustomer churn prediction by hybridneural networksrdquo Expert Systems with Applications vol 36 no10 pp 12547ndash12553 2009

[8] W Verbeke K Dejaeger D Martens J Hur and B BaesensldquoNew insights into churn prediction in the telecommunicationsector a profit driven data mining approachrdquo European Journalof Operational Research vol 218 pp 211ndash229 2011

[9] B Baesens G Verstraeten D Van den Poel M Egmont-Petersen P VanKenhove and J Vanthienen ldquoBayesian networkclassifiers for identifying the slope of the customer lifecycle oflong-life customersrdquo European Journal of Operational Researchvol 156 no 2 pp 508ndash523 2004

[10] W Buckinx and D Van den Poel ldquoCustomer base analysis par-tial defection of behaviourally loyal clients in a non-contractualFMCG retail settingrdquo European Journal of Operational Researchvol 164 no 1 pp 252ndash268 2005

[11] J Burez and D Van den Poel ldquoCRM at a pay-TV companyusing analytical models to reduce customer attrition by targetedmarketing for subscription servicesrdquo Expert Systems with Appli-cations vol 32 no 2 pp 277ndash288 2007

[12] K Coussement and D Van den Poel ldquoChurn prediction in sub-scription services an application of support vector machineswhile comparing two parameter-selection techniquesrdquo ExpertSystems with Applications vol 34 no 1 pp 313ndash327 2008

[13] N Glady B Baesens andC Croux ldquoModeling churn using cus-tomer lifetime valuerdquo European Journal of Operational Researchvol 197 no 1 pp 402ndash411 2009

[14] X Yu S Guo J Guo and X Huang ldquoAn extended supportvector machine forecasting framework for customer churn ine-commercerdquo Expert Systems with Applications vol 38 no 3pp 1425ndash1430 2011

[15] J Qi L Zhang Y Liu et al ldquoADTreesLogit model for customerchurn predictionrdquoAnnals of Operations Research vol 168 no 1pp 247ndash265 2009

[16] B Huang M T Kechadi and B Buckley ldquoCustomer churnprediction in telecommunicationsrdquo Expert Systems with Appli-cations vol 39 no 1 pp 1414ndash1425 2012

[17] AKeramati and SM SArdabili ldquoChurn analysis for an Iranianmobile operatorrdquo Telecommunications Policy vol 35 no 4 pp344ndash356 2011

[18] S Lessmann and S Voszlig ldquoA reference model for customer-centric data mining with support vector machinesrdquo EuropeanJournal of Operational Research vol 199 no 2 pp 520ndash5302009

[19] D E Rumelhart G E Hinton and R J Williams ldquoLearn-ing internal representations by error propagationrdquo in ParallelDistributed Processing Explorations in the Microstructure ofCognition vol 1 pp 318ndash362MIT Press CambrigeMass USA1986

[20] P C Pendharkar and J A Rodger ldquoAn empirical study of impactof crossover operators on the performance of non-binarygenetic algorithm based neural approaches for classificationrdquoComputers and Operations Research vol 31 no 4 pp 481ndash4982004

[21] P Pendharkar and S Nanda ldquoA misclassification cost-minimizing evolutionary-neural classification approachrdquoNaval Research Logistics vol 53 no 5 pp 432ndash447 2006

[22] P C Pendharkar ldquoA comparison of gradient ascent gradientdescent and genetic-algorithm- based artificial neural networksfor the binary classification problemrdquo Expert Systems vol 24no 2 pp 65ndash86 2007

[23] E Granger M A Rubin S Grossberg and P Lavoie ldquoA What-and-Where fusion neural network for recognition and trackingof multiple radar emittersrdquo Neural Networks vol 14 no 3 pp325ndash344 2001

[24] J Sharifie C Lucas andBNAraabi ldquoLocally linear neurofuzzymodeling and prediction of geomagnetic disturbances based onsolar wind conditionsrdquo Space Weather vol 4 no 6 pp 1ndash122006

[25] S Y Hung D C Yen andH YWang ldquoApplying datamining totelecom churn managementrdquo Expert Systems with Applicationsvol 31 no 3 pp 515ndash524 2006

[26] A Jain M Murty and P Flyn ldquoData clustering a reviewrdquo ACMComputing Surveys vol 31 no 3 pp 264ndash323 1999

[27] J Dunn ldquoA fuzzy relative of the ISO-data process and itsuse in detecting compact well-separated clustersrdquo Journal ofCybernetics vol 3 no 3 pp 32ndash57 1973

[28] T Kohonen ldquoAdaptive associative and self-organizing func-tions in neural computingrdquo Applied Optics vol 26 no 23 pp4910ndash4918 1987

[29] X Zhang J Edwards and J Harding ldquoPersonalised online salesusing web usage data miningrdquo Computers in Industry vol 58no 8-9 pp 772ndash782 2007

[30] J Han and M Kamber Data Mining Concepts and TechniquesMorgan Kaufmann San Francisco Calif USA 2001

Journal of Engineering 9

[31] D R Cox and D Oakes Analysis of Survival Data Chapmanand Hall London UK 1984

[32] J Yu and M S Yang ldquoOptimality test for generalized FCM andits application to parameter selectionrdquo IEEE Transactions onFuzzy Systems vol 13 no 1 pp 164ndash176 2005

[33] K L Wu and M S Yang ldquoAlternative c-means clusteringalgorithmsrdquo Pattern Recognition vol 35 no 10 pp 2267ndash22782002

[34] N R Pal K Pal J M Keller and J C Bezdek ldquoA possibilisticfuzzy c-means clustering algorithmrdquo IEEE Transactions onFuzzy Systems vol 13 no 4 pp 517ndash530 2005

[35] M S Yang K L Wu J N Hsieh and J Yu ldquoAlpha-cut imple-mented fuzzy clustering algorithms and switching regressionsrdquoIEEETransactions on SystemsMan andCybernetics Part B vol38 no 3 pp 588ndash603 2008

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 5: Research Article Hierarchical Neural Regression Models for ...downloads.hindawi.com/journals/je/2013/543940.pdf · Research Article Hierarchical Neural Regression Models for Customer

Journal of Engineering 5

325 Evaluation Method To evaluate the proposed churnpredictionmodels prediction accuracy and the Type I and IIerrors are considered They can be measured by a confusionmatrix shown in Table 1 The rate of prediction accuracy isdefined as (119886 + 119889)(119886 + 119887 + 119888 + 119889)

The Type I error is the error of not rejecting a nullhypothesis when the alternative hypothesis is the true stateof nature In this paper it means that the customer is notchurned when the model has predicted that the hazardfunction of that customer is more than 120572 (ie 120572 is the alphacut in fuzzy c-means clustering method) On the other handthe Type II error is defined as the error of rejecting a nullhypothesis when it is the true state of nature It means thatthe customer is churned when the model has predicted thatthe survival function of that customer is more than 120572

We also compare the performance of the proposedmodelwith pure Cox proportional hazards model in predictingthe churn or survival probability of the customers Theobserved outcome for each customer in the sample is eitherchurn or survival (ie still active) by the end of the studyperiod We compute the deviation between observed andpredicted outcomes (ie the probability of churn or survivalas predicted by the model) for both proposed and pure Coxmodel The Root Mean Squared Error (RMSE) and MeanAbsolute Deviation (MAD) are calculated for comparingboth models as follows

RMSE =radic

sum119873ch119894=1 (119878119894ch minus 0)

2

119873ch+

sum119873Nch119895=1 (1 minus 119878

119895

Nch)2

119873Nch

MAD =

1

119873ch

119873ch

sum

119894=1

10038161003816100381610038161003816119890119894ch minus 119890ch

10038161003816100381610038161003816+

1

119873Nch

119873Nch

sum

119895=1

10038161003816100381610038161003816119890119895

Nch minus 119890Nch10038161003816100381610038161003816

(11)

where 119878119894ch and 119878

119895

Nch are the survival probability of churnedcustomer i and non-churned customer j respectively119873ch and 119873Nch are the number of churned and nonchurnedcustomer respectively 119890119894ch is the deviation of churned cus-tomer i fromzero (ie 119890119894ch = (119878

119894chminus0)) and 119890

119895

Nch is the deviationof non-churned customer j from one (ie 119890119895Nch = (1 minus 119878

119895

Nch))and 119890ch and 119890Nch are the mean of the deviation of churnedand non-churned customers respectively

4 Experimental Results

41 The Baseline In order to create the Cox model 2350and remained 800 numbers of data are used for trainingand testing the Cox model respectively Table 2 shows theprediction performance of the baseline Cox proportionalhazardsmodel based on type I and II errors accuracy RMSEandMADmetrics On average the baseline Cox proportionalhazards model provides about 84 accuracy meaning thatin 128 cases of data the Cox model was unable to correctlypredict the survival and hazard probability based on value ofalpha-cut 07 The type I and II errors were equal to 87 and 41cases of incorrectly predicted data The baseline Cox modelalso provides 0083 and 0098 as the RMSE and MAD errormetrics respectively

75

80

85

90

95

100

Base

line

8479

8842

8631

9586

Accu

racy

()

AN

N+

AN

N+

Cox

120572FC

M+

AN

N+

Cox

SOM+

AN

N+

Cox

Figure 3 The accuracy of hierarchical models

0

20

40

60

80

100Er

ror (

)

Type I errorType II error

87

61

76

2135 32 34

12

Base

line

AN

N+

AN

N+

Cox

120572FC

M+

AN

N+

Cox

SOM+

AN

N+

Cox

Figure 4 The Type I and II errors of hierarchical models

42 ANN + ANN + Cox For the first hierarchical modelbased on combining two ANN models and Cox regressionthe first ANN model performs the data reduction taskTherefore we run the ANNmodel by a set of different hiddenlayer and learning epochsThe result of different combinationof hidden layer and learning epochs is as Table 3 in whichan ANN models with 16 and 12 hidden layer and 100 and300 learning epochs are considered for two ANN modelrespectively Finally the accuracy and other performancemetrics for hierarchical ANN+ANN+Coxmodel are shownin Table 4

43 SOM+ANN+Cox To construct the second hierarchicalmodel by combination of SOM ANN and Cox regression2 lowast 2 3 lowast 3 4 lowast 4 5 lowast 5 and 6 lowast 6 SOMs are used to

6 Journal of Engineering

Table 1 Confusion matrix

ActualNonchurners Churners

Predicted Nonchurners 119886 119887 (II 119878lowast gt 120572)

Churners 119888 (I119867lowastlowast gt 120572) 119889

lowastProbability of survivallowastlowastProbability of hazard

Table 2 The prediction performance of the baseline Cox proportional hazards model

Performance metricsAccuracy Error type I Error type II RMSE MAD

Value 8479 87 35 0091 0098

cluster the data at first We found that 4 lowast 4 SOM performsthe best which can provide the highest rate of accuracy fortwo clusters that is churner and nonchurner clusters Thenthe accurate clustered data are used for training classifierANN and the result of different hidden layer and learningepochs is as Table 5 in which an ANN model with 16 hiddenlayer and 200 epochs is considered as classifiermodel Finallythe accuracy and other performance metrics for hierarchicalSOM + ANN + Cox model are shown in Table 6

44 120572-FCM + ANN + Cox To construct the third hierarchi-cal model based on alpha-cut fuzzy c-means classifier ANNmodel and Cox proportional hazards model the alpha-cutfuzzy c-means with alpha-cut equal to 07 and ANN with 16hidden layer and 100 learning epochs were found regardingto the best reported accuracy The 120572-FCM used for datareduction task results in two clusters including 2210 churnersand 940 nonchurnersThen 2350 and remained 800 numbersof data are respectively used for training and testing theclassification ANNmodel where the prediction performanceof ANNmodel is shown in Table 7 Finally Table 8 shows theperformance metrics of 120572-FCM + ANN + Cox hierarchicalmodels

On average the 120572-FCM+ANN+Cox hierarchical modelprovides about 9549 accuracy based on alpha-cut equalto 07 and ANN with 16 hidden layer and 100 learningepochs The type I and II errors are equal to 21 and 12 casesof incorrectly predicted data The 120572-FCM + ANN + Coxhierarchicalmodel also provides 0031 and 0042 as the RMSEand MAD error metrics respectively

In order to show the high performance of 120572-FCM+ANN+ Cox hierarchical model the accuracy errors type I and IIRMSE and MAD metrics are illustrated in Figures 3 4 and5 respectively

5 Conclusion

As customers are the main competitive advantage of eachindustry customer churn prediction is becoming a majortask for companies to remain in competition with otherindustries Therefore building an effective customer churnprediction model which provides an acceptable level ofaccuracy has become a research problem for companies in

002

004

006

008

01

012

Erro

r

MADRMSE

00980088 0091

0042

0091

0071

0084

0031Ba

selin

e

AN

N+

AN

N+

Cox

120572FC

M+

AN

N+

Cox

SOM+

AN

N+

Cox

Figure 5 The RMSE and MAD errors of hierarchical models

recent years In the literature the better applicability andefficiency of hierarchical data mining techniques in orderto predict customer attrition by combining two or moretechniques has been reported over a number of differentdomain problems In this paper we consider three differenthierarchical datamining techniques based on combination ofsome neural networks and regressionmodel to examine theirperformances for telecommunication industry In particu-lar backpropagation artificial neural networks (ANN) self-organizing maps (SOM) alpha-cut fuzzy c-means and Coxproportional hazard model are considered ConsequentlyANN + ANN + Cox SOM + ANN + Cox and 120572-FCM +ANN + Cox hierarchical models are developed in whichthe first component of the hierarchical models filter outunrepresentative data or outliers Then the corrected outputclustered data are used to classify customer into churner andnonchurner groups

To evaluate the performance of the hierarchical modelsan Iranian mobile dataset is considered The experimental

Journal of Engineering 7

Table 3 Prediction performance of ANN + ANN hierarchical models

Hidden layerLearning epochs

Clustering ANN Classification ANN50 100 200 300 50 100 200 300

8 8790 8934 9066 9117 8565 8750 8959 918412 8350 8742 9011 9087 8523 8825 9190 924916 9025 9269 8845 9077 9118 8843 8857 847620 8854 9021 8580 9106 9054 8792 8428 860824 9155 9185 8929 8646 8685 8841 8680 834628 9161 9021 8602 9129 9169 9055 8937 899532 9078 8935 8813 8611 9128 8655 8480 9039

Table 4 Performance metrics of ANN + ANN + Cox hierarchical models

Performance metricsAccuracy Error type I Error type II RMSE MAD

Value 8842 61 32 0071 0088

Table 5 Prediction performance of ANN in SOM + ANN + Cox hierarchical models

Hidden layer Learning epochs50 100 200 300

8 8340 8530 8930 906712 8431 8646 8931 897716 8745 9028 9388 911820 9030 9029 9187 840624 8576 8944 8337 938728 9099 9157 8759 902032 9146 8328 8571 8960

Table 6 Performance metrics of SOM + ANN + Cox hierarchical models

Performance metricsAccuracy Error type I Error type II RMSE MAD

Value 8631 76 34 0084 0091

Table 7 Prediction performance of ANN in 120572-FCM + ANN + Cox hierarchical models

Hidden layer Learning epochs50 100 200 300

8 8432 8531 8796 892112 8471 8776 8663 863416 8735 9320 9267 907820 9035 8381 9101 895924 8811 8621 8878 883328 8372 8654 8998 875332 8569 8791 8541 8715

Table 8 Performance metrics of 120572-FCM + ANN + Cox hierarchical models

Performance metricsAccuracy Error type I Error type II RMSE MAD

Value 9586 21 12 0031 0042

8 Journal of Engineering

results show that the hierarchical models outperform thesingle Cox regression baseline model in terms of predictionaccuracy types I and II errors RMSE and MAD metricsIn addition the 120572-FCM + ANN + Cox model significantlyperforms better than the SOM + ANN + Cox and ANN +

ANN + Cox modelsFor future work other prediction techniques can be

applied such as support vectormachines genetic algorithmslogistic regression and so forth Finally other domaindatasets about churn prediction can be used for furthercomparison

References

[1] K Coussement and D Van den Poel ldquoIntegrating the voice ofcustomers through call center emails into a decision supportsystem for churn predictionrdquo Information and Managementvol 45 no 3 pp 164ndash174 2008

[2] E W T Ngai L Xiu and D C K Chau ldquoApplication ofdata mining techniques in customer relationship managementa literature review and classificationrdquo Expert Systems withApplications vol 36 no 2 pp 2592ndash2602 2009

[3] C Hung and C F Tsai ldquoMarket segmentation based onhierarchical self-organizing map for markets of multimedia ondemandrdquo Expert Systems with Applications vol 34 no 1 pp780ndash787 2008

[4] M J A Berry and G S Linoff Data Mining Techniques ForMarketing Sales and Customer Support John Wiley amp Sons2003

[5] D Van den Poel and B Lariviere ldquoCustomer attrition analysisfor financial services using proportional hazard modelsrdquo Euro-pean Journal of Operational Research vol 157 no 1 pp 196ndash2172004

[6] P Kisioglu and Y I Topcu ldquoApplying Bayesian Belief Networkapproach to customer churn analysis a case study on thetelecom industry of Turkeyrdquo Expert Systems with Applicationsvol 38 no 6 pp 7151ndash7157 2011

[7] C F Tsai and Y H Lu ldquoCustomer churn prediction by hybridneural networksrdquo Expert Systems with Applications vol 36 no10 pp 12547ndash12553 2009

[8] W Verbeke K Dejaeger D Martens J Hur and B BaesensldquoNew insights into churn prediction in the telecommunicationsector a profit driven data mining approachrdquo European Journalof Operational Research vol 218 pp 211ndash229 2011

[9] B Baesens G Verstraeten D Van den Poel M Egmont-Petersen P VanKenhove and J Vanthienen ldquoBayesian networkclassifiers for identifying the slope of the customer lifecycle oflong-life customersrdquo European Journal of Operational Researchvol 156 no 2 pp 508ndash523 2004

[10] W Buckinx and D Van den Poel ldquoCustomer base analysis par-tial defection of behaviourally loyal clients in a non-contractualFMCG retail settingrdquo European Journal of Operational Researchvol 164 no 1 pp 252ndash268 2005

[11] J Burez and D Van den Poel ldquoCRM at a pay-TV companyusing analytical models to reduce customer attrition by targetedmarketing for subscription servicesrdquo Expert Systems with Appli-cations vol 32 no 2 pp 277ndash288 2007

[12] K Coussement and D Van den Poel ldquoChurn prediction in sub-scription services an application of support vector machineswhile comparing two parameter-selection techniquesrdquo ExpertSystems with Applications vol 34 no 1 pp 313ndash327 2008

[13] N Glady B Baesens andC Croux ldquoModeling churn using cus-tomer lifetime valuerdquo European Journal of Operational Researchvol 197 no 1 pp 402ndash411 2009

[14] X Yu S Guo J Guo and X Huang ldquoAn extended supportvector machine forecasting framework for customer churn ine-commercerdquo Expert Systems with Applications vol 38 no 3pp 1425ndash1430 2011

[15] J Qi L Zhang Y Liu et al ldquoADTreesLogit model for customerchurn predictionrdquoAnnals of Operations Research vol 168 no 1pp 247ndash265 2009

[16] B Huang M T Kechadi and B Buckley ldquoCustomer churnprediction in telecommunicationsrdquo Expert Systems with Appli-cations vol 39 no 1 pp 1414ndash1425 2012

[17] AKeramati and SM SArdabili ldquoChurn analysis for an Iranianmobile operatorrdquo Telecommunications Policy vol 35 no 4 pp344ndash356 2011

[18] S Lessmann and S Voszlig ldquoA reference model for customer-centric data mining with support vector machinesrdquo EuropeanJournal of Operational Research vol 199 no 2 pp 520ndash5302009

[19] D E Rumelhart G E Hinton and R J Williams ldquoLearn-ing internal representations by error propagationrdquo in ParallelDistributed Processing Explorations in the Microstructure ofCognition vol 1 pp 318ndash362MIT Press CambrigeMass USA1986

[20] P C Pendharkar and J A Rodger ldquoAn empirical study of impactof crossover operators on the performance of non-binarygenetic algorithm based neural approaches for classificationrdquoComputers and Operations Research vol 31 no 4 pp 481ndash4982004

[21] P Pendharkar and S Nanda ldquoA misclassification cost-minimizing evolutionary-neural classification approachrdquoNaval Research Logistics vol 53 no 5 pp 432ndash447 2006

[22] P C Pendharkar ldquoA comparison of gradient ascent gradientdescent and genetic-algorithm- based artificial neural networksfor the binary classification problemrdquo Expert Systems vol 24no 2 pp 65ndash86 2007

[23] E Granger M A Rubin S Grossberg and P Lavoie ldquoA What-and-Where fusion neural network for recognition and trackingof multiple radar emittersrdquo Neural Networks vol 14 no 3 pp325ndash344 2001

[24] J Sharifie C Lucas andBNAraabi ldquoLocally linear neurofuzzymodeling and prediction of geomagnetic disturbances based onsolar wind conditionsrdquo Space Weather vol 4 no 6 pp 1ndash122006

[25] S Y Hung D C Yen andH YWang ldquoApplying datamining totelecom churn managementrdquo Expert Systems with Applicationsvol 31 no 3 pp 515ndash524 2006

[26] A Jain M Murty and P Flyn ldquoData clustering a reviewrdquo ACMComputing Surveys vol 31 no 3 pp 264ndash323 1999

[27] J Dunn ldquoA fuzzy relative of the ISO-data process and itsuse in detecting compact well-separated clustersrdquo Journal ofCybernetics vol 3 no 3 pp 32ndash57 1973

[28] T Kohonen ldquoAdaptive associative and self-organizing func-tions in neural computingrdquo Applied Optics vol 26 no 23 pp4910ndash4918 1987

[29] X Zhang J Edwards and J Harding ldquoPersonalised online salesusing web usage data miningrdquo Computers in Industry vol 58no 8-9 pp 772ndash782 2007

[30] J Han and M Kamber Data Mining Concepts and TechniquesMorgan Kaufmann San Francisco Calif USA 2001

Journal of Engineering 9

[31] D R Cox and D Oakes Analysis of Survival Data Chapmanand Hall London UK 1984

[32] J Yu and M S Yang ldquoOptimality test for generalized FCM andits application to parameter selectionrdquo IEEE Transactions onFuzzy Systems vol 13 no 1 pp 164ndash176 2005

[33] K L Wu and M S Yang ldquoAlternative c-means clusteringalgorithmsrdquo Pattern Recognition vol 35 no 10 pp 2267ndash22782002

[34] N R Pal K Pal J M Keller and J C Bezdek ldquoA possibilisticfuzzy c-means clustering algorithmrdquo IEEE Transactions onFuzzy Systems vol 13 no 4 pp 517ndash530 2005

[35] M S Yang K L Wu J N Hsieh and J Yu ldquoAlpha-cut imple-mented fuzzy clustering algorithms and switching regressionsrdquoIEEETransactions on SystemsMan andCybernetics Part B vol38 no 3 pp 588ndash603 2008

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 6: Research Article Hierarchical Neural Regression Models for ...downloads.hindawi.com/journals/je/2013/543940.pdf · Research Article Hierarchical Neural Regression Models for Customer

6 Journal of Engineering

Table 1 Confusion matrix

ActualNonchurners Churners

Predicted Nonchurners 119886 119887 (II 119878lowast gt 120572)

Churners 119888 (I119867lowastlowast gt 120572) 119889

lowastProbability of survivallowastlowastProbability of hazard

Table 2 The prediction performance of the baseline Cox proportional hazards model

Performance metricsAccuracy Error type I Error type II RMSE MAD

Value 8479 87 35 0091 0098

cluster the data at first We found that 4 lowast 4 SOM performsthe best which can provide the highest rate of accuracy fortwo clusters that is churner and nonchurner clusters Thenthe accurate clustered data are used for training classifierANN and the result of different hidden layer and learningepochs is as Table 5 in which an ANN model with 16 hiddenlayer and 200 epochs is considered as classifiermodel Finallythe accuracy and other performance metrics for hierarchicalSOM + ANN + Cox model are shown in Table 6

44 120572-FCM + ANN + Cox To construct the third hierarchi-cal model based on alpha-cut fuzzy c-means classifier ANNmodel and Cox proportional hazards model the alpha-cutfuzzy c-means with alpha-cut equal to 07 and ANN with 16hidden layer and 100 learning epochs were found regardingto the best reported accuracy The 120572-FCM used for datareduction task results in two clusters including 2210 churnersand 940 nonchurnersThen 2350 and remained 800 numbersof data are respectively used for training and testing theclassification ANNmodel where the prediction performanceof ANNmodel is shown in Table 7 Finally Table 8 shows theperformance metrics of 120572-FCM + ANN + Cox hierarchicalmodels

On average the 120572-FCM+ANN+Cox hierarchical modelprovides about 9549 accuracy based on alpha-cut equalto 07 and ANN with 16 hidden layer and 100 learningepochs The type I and II errors are equal to 21 and 12 casesof incorrectly predicted data The 120572-FCM + ANN + Coxhierarchicalmodel also provides 0031 and 0042 as the RMSEand MAD error metrics respectively

In order to show the high performance of 120572-FCM+ANN+ Cox hierarchical model the accuracy errors type I and IIRMSE and MAD metrics are illustrated in Figures 3 4 and5 respectively

5 Conclusion

As customers are the main competitive advantage of eachindustry customer churn prediction is becoming a majortask for companies to remain in competition with otherindustries Therefore building an effective customer churnprediction model which provides an acceptable level ofaccuracy has become a research problem for companies in

002

004

006

008

01

012

Erro

r

MADRMSE

00980088 0091

0042

0091

0071

0084

0031Ba

selin

e

AN

N+

AN

N+

Cox

120572FC

M+

AN

N+

Cox

SOM+

AN

N+

Cox

Figure 5 The RMSE and MAD errors of hierarchical models

recent years In the literature the better applicability andefficiency of hierarchical data mining techniques in orderto predict customer attrition by combining two or moretechniques has been reported over a number of differentdomain problems In this paper we consider three differenthierarchical datamining techniques based on combination ofsome neural networks and regressionmodel to examine theirperformances for telecommunication industry In particu-lar backpropagation artificial neural networks (ANN) self-organizing maps (SOM) alpha-cut fuzzy c-means and Coxproportional hazard model are considered ConsequentlyANN + ANN + Cox SOM + ANN + Cox and 120572-FCM +ANN + Cox hierarchical models are developed in whichthe first component of the hierarchical models filter outunrepresentative data or outliers Then the corrected outputclustered data are used to classify customer into churner andnonchurner groups

To evaluate the performance of the hierarchical modelsan Iranian mobile dataset is considered The experimental

Journal of Engineering 7

Table 3 Prediction performance of ANN + ANN hierarchical models

Hidden layerLearning epochs

Clustering ANN Classification ANN50 100 200 300 50 100 200 300

8 8790 8934 9066 9117 8565 8750 8959 918412 8350 8742 9011 9087 8523 8825 9190 924916 9025 9269 8845 9077 9118 8843 8857 847620 8854 9021 8580 9106 9054 8792 8428 860824 9155 9185 8929 8646 8685 8841 8680 834628 9161 9021 8602 9129 9169 9055 8937 899532 9078 8935 8813 8611 9128 8655 8480 9039

Table 4 Performance metrics of ANN + ANN + Cox hierarchical models

Performance metricsAccuracy Error type I Error type II RMSE MAD

Value 8842 61 32 0071 0088

Table 5 Prediction performance of ANN in SOM + ANN + Cox hierarchical models

Hidden layer Learning epochs50 100 200 300

8 8340 8530 8930 906712 8431 8646 8931 897716 8745 9028 9388 911820 9030 9029 9187 840624 8576 8944 8337 938728 9099 9157 8759 902032 9146 8328 8571 8960

Table 6 Performance metrics of SOM + ANN + Cox hierarchical models

Performance metricsAccuracy Error type I Error type II RMSE MAD

Value 8631 76 34 0084 0091

Table 7 Prediction performance of ANN in 120572-FCM + ANN + Cox hierarchical models

Hidden layer Learning epochs50 100 200 300

8 8432 8531 8796 892112 8471 8776 8663 863416 8735 9320 9267 907820 9035 8381 9101 895924 8811 8621 8878 883328 8372 8654 8998 875332 8569 8791 8541 8715

Table 8 Performance metrics of 120572-FCM + ANN + Cox hierarchical models

Performance metricsAccuracy Error type I Error type II RMSE MAD

Value 9586 21 12 0031 0042

8 Journal of Engineering

results show that the hierarchical models outperform thesingle Cox regression baseline model in terms of predictionaccuracy types I and II errors RMSE and MAD metricsIn addition the 120572-FCM + ANN + Cox model significantlyperforms better than the SOM + ANN + Cox and ANN +

ANN + Cox modelsFor future work other prediction techniques can be

applied such as support vectormachines genetic algorithmslogistic regression and so forth Finally other domaindatasets about churn prediction can be used for furthercomparison

References

[1] K Coussement and D Van den Poel ldquoIntegrating the voice ofcustomers through call center emails into a decision supportsystem for churn predictionrdquo Information and Managementvol 45 no 3 pp 164ndash174 2008

[2] E W T Ngai L Xiu and D C K Chau ldquoApplication ofdata mining techniques in customer relationship managementa literature review and classificationrdquo Expert Systems withApplications vol 36 no 2 pp 2592ndash2602 2009

[3] C Hung and C F Tsai ldquoMarket segmentation based onhierarchical self-organizing map for markets of multimedia ondemandrdquo Expert Systems with Applications vol 34 no 1 pp780ndash787 2008

[4] M J A Berry and G S Linoff Data Mining Techniques ForMarketing Sales and Customer Support John Wiley amp Sons2003

[5] D Van den Poel and B Lariviere ldquoCustomer attrition analysisfor financial services using proportional hazard modelsrdquo Euro-pean Journal of Operational Research vol 157 no 1 pp 196ndash2172004

[6] P Kisioglu and Y I Topcu ldquoApplying Bayesian Belief Networkapproach to customer churn analysis a case study on thetelecom industry of Turkeyrdquo Expert Systems with Applicationsvol 38 no 6 pp 7151ndash7157 2011

[7] C F Tsai and Y H Lu ldquoCustomer churn prediction by hybridneural networksrdquo Expert Systems with Applications vol 36 no10 pp 12547ndash12553 2009

[8] W Verbeke K Dejaeger D Martens J Hur and B BaesensldquoNew insights into churn prediction in the telecommunicationsector a profit driven data mining approachrdquo European Journalof Operational Research vol 218 pp 211ndash229 2011

[9] B Baesens G Verstraeten D Van den Poel M Egmont-Petersen P VanKenhove and J Vanthienen ldquoBayesian networkclassifiers for identifying the slope of the customer lifecycle oflong-life customersrdquo European Journal of Operational Researchvol 156 no 2 pp 508ndash523 2004

[10] W Buckinx and D Van den Poel ldquoCustomer base analysis par-tial defection of behaviourally loyal clients in a non-contractualFMCG retail settingrdquo European Journal of Operational Researchvol 164 no 1 pp 252ndash268 2005

[11] J Burez and D Van den Poel ldquoCRM at a pay-TV companyusing analytical models to reduce customer attrition by targetedmarketing for subscription servicesrdquo Expert Systems with Appli-cations vol 32 no 2 pp 277ndash288 2007

[12] K Coussement and D Van den Poel ldquoChurn prediction in sub-scription services an application of support vector machineswhile comparing two parameter-selection techniquesrdquo ExpertSystems with Applications vol 34 no 1 pp 313ndash327 2008

[13] N Glady B Baesens andC Croux ldquoModeling churn using cus-tomer lifetime valuerdquo European Journal of Operational Researchvol 197 no 1 pp 402ndash411 2009

[14] X Yu S Guo J Guo and X Huang ldquoAn extended supportvector machine forecasting framework for customer churn ine-commercerdquo Expert Systems with Applications vol 38 no 3pp 1425ndash1430 2011

[15] J Qi L Zhang Y Liu et al ldquoADTreesLogit model for customerchurn predictionrdquoAnnals of Operations Research vol 168 no 1pp 247ndash265 2009

[16] B Huang M T Kechadi and B Buckley ldquoCustomer churnprediction in telecommunicationsrdquo Expert Systems with Appli-cations vol 39 no 1 pp 1414ndash1425 2012

[17] AKeramati and SM SArdabili ldquoChurn analysis for an Iranianmobile operatorrdquo Telecommunications Policy vol 35 no 4 pp344ndash356 2011

[18] S Lessmann and S Voszlig ldquoA reference model for customer-centric data mining with support vector machinesrdquo EuropeanJournal of Operational Research vol 199 no 2 pp 520ndash5302009

[19] D E Rumelhart G E Hinton and R J Williams ldquoLearn-ing internal representations by error propagationrdquo in ParallelDistributed Processing Explorations in the Microstructure ofCognition vol 1 pp 318ndash362MIT Press CambrigeMass USA1986

[20] P C Pendharkar and J A Rodger ldquoAn empirical study of impactof crossover operators on the performance of non-binarygenetic algorithm based neural approaches for classificationrdquoComputers and Operations Research vol 31 no 4 pp 481ndash4982004

[21] P Pendharkar and S Nanda ldquoA misclassification cost-minimizing evolutionary-neural classification approachrdquoNaval Research Logistics vol 53 no 5 pp 432ndash447 2006

[22] P C Pendharkar ldquoA comparison of gradient ascent gradientdescent and genetic-algorithm- based artificial neural networksfor the binary classification problemrdquo Expert Systems vol 24no 2 pp 65ndash86 2007

[23] E Granger M A Rubin S Grossberg and P Lavoie ldquoA What-and-Where fusion neural network for recognition and trackingof multiple radar emittersrdquo Neural Networks vol 14 no 3 pp325ndash344 2001

[24] J Sharifie C Lucas andBNAraabi ldquoLocally linear neurofuzzymodeling and prediction of geomagnetic disturbances based onsolar wind conditionsrdquo Space Weather vol 4 no 6 pp 1ndash122006

[25] S Y Hung D C Yen andH YWang ldquoApplying datamining totelecom churn managementrdquo Expert Systems with Applicationsvol 31 no 3 pp 515ndash524 2006

[26] A Jain M Murty and P Flyn ldquoData clustering a reviewrdquo ACMComputing Surveys vol 31 no 3 pp 264ndash323 1999

[27] J Dunn ldquoA fuzzy relative of the ISO-data process and itsuse in detecting compact well-separated clustersrdquo Journal ofCybernetics vol 3 no 3 pp 32ndash57 1973

[28] T Kohonen ldquoAdaptive associative and self-organizing func-tions in neural computingrdquo Applied Optics vol 26 no 23 pp4910ndash4918 1987

[29] X Zhang J Edwards and J Harding ldquoPersonalised online salesusing web usage data miningrdquo Computers in Industry vol 58no 8-9 pp 772ndash782 2007

[30] J Han and M Kamber Data Mining Concepts and TechniquesMorgan Kaufmann San Francisco Calif USA 2001

Journal of Engineering 9

[31] D R Cox and D Oakes Analysis of Survival Data Chapmanand Hall London UK 1984

[32] J Yu and M S Yang ldquoOptimality test for generalized FCM andits application to parameter selectionrdquo IEEE Transactions onFuzzy Systems vol 13 no 1 pp 164ndash176 2005

[33] K L Wu and M S Yang ldquoAlternative c-means clusteringalgorithmsrdquo Pattern Recognition vol 35 no 10 pp 2267ndash22782002

[34] N R Pal K Pal J M Keller and J C Bezdek ldquoA possibilisticfuzzy c-means clustering algorithmrdquo IEEE Transactions onFuzzy Systems vol 13 no 4 pp 517ndash530 2005

[35] M S Yang K L Wu J N Hsieh and J Yu ldquoAlpha-cut imple-mented fuzzy clustering algorithms and switching regressionsrdquoIEEETransactions on SystemsMan andCybernetics Part B vol38 no 3 pp 588ndash603 2008

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 7: Research Article Hierarchical Neural Regression Models for ...downloads.hindawi.com/journals/je/2013/543940.pdf · Research Article Hierarchical Neural Regression Models for Customer

Journal of Engineering 7

Table 3 Prediction performance of ANN + ANN hierarchical models

Hidden layerLearning epochs

Clustering ANN Classification ANN50 100 200 300 50 100 200 300

8 8790 8934 9066 9117 8565 8750 8959 918412 8350 8742 9011 9087 8523 8825 9190 924916 9025 9269 8845 9077 9118 8843 8857 847620 8854 9021 8580 9106 9054 8792 8428 860824 9155 9185 8929 8646 8685 8841 8680 834628 9161 9021 8602 9129 9169 9055 8937 899532 9078 8935 8813 8611 9128 8655 8480 9039

Table 4 Performance metrics of ANN + ANN + Cox hierarchical models

Performance metricsAccuracy Error type I Error type II RMSE MAD

Value 8842 61 32 0071 0088

Table 5 Prediction performance of ANN in SOM + ANN + Cox hierarchical models

Hidden layer Learning epochs50 100 200 300

8 8340 8530 8930 906712 8431 8646 8931 897716 8745 9028 9388 911820 9030 9029 9187 840624 8576 8944 8337 938728 9099 9157 8759 902032 9146 8328 8571 8960

Table 6 Performance metrics of SOM + ANN + Cox hierarchical models

Performance metricsAccuracy Error type I Error type II RMSE MAD

Value 8631 76 34 0084 0091

Table 7 Prediction performance of ANN in 120572-FCM + ANN + Cox hierarchical models

Hidden layer Learning epochs50 100 200 300

8 8432 8531 8796 892112 8471 8776 8663 863416 8735 9320 9267 907820 9035 8381 9101 895924 8811 8621 8878 883328 8372 8654 8998 875332 8569 8791 8541 8715

Table 8 Performance metrics of 120572-FCM + ANN + Cox hierarchical models

Performance metricsAccuracy Error type I Error type II RMSE MAD

Value 9586 21 12 0031 0042

8 Journal of Engineering

results show that the hierarchical models outperform thesingle Cox regression baseline model in terms of predictionaccuracy types I and II errors RMSE and MAD metricsIn addition the 120572-FCM + ANN + Cox model significantlyperforms better than the SOM + ANN + Cox and ANN +

ANN + Cox modelsFor future work other prediction techniques can be

applied such as support vectormachines genetic algorithmslogistic regression and so forth Finally other domaindatasets about churn prediction can be used for furthercomparison

References

[1] K Coussement and D Van den Poel ldquoIntegrating the voice ofcustomers through call center emails into a decision supportsystem for churn predictionrdquo Information and Managementvol 45 no 3 pp 164ndash174 2008

[2] E W T Ngai L Xiu and D C K Chau ldquoApplication ofdata mining techniques in customer relationship managementa literature review and classificationrdquo Expert Systems withApplications vol 36 no 2 pp 2592ndash2602 2009

[3] C Hung and C F Tsai ldquoMarket segmentation based onhierarchical self-organizing map for markets of multimedia ondemandrdquo Expert Systems with Applications vol 34 no 1 pp780ndash787 2008

[4] M J A Berry and G S Linoff Data Mining Techniques ForMarketing Sales and Customer Support John Wiley amp Sons2003

[5] D Van den Poel and B Lariviere ldquoCustomer attrition analysisfor financial services using proportional hazard modelsrdquo Euro-pean Journal of Operational Research vol 157 no 1 pp 196ndash2172004

[6] P Kisioglu and Y I Topcu ldquoApplying Bayesian Belief Networkapproach to customer churn analysis a case study on thetelecom industry of Turkeyrdquo Expert Systems with Applicationsvol 38 no 6 pp 7151ndash7157 2011

[7] C F Tsai and Y H Lu ldquoCustomer churn prediction by hybridneural networksrdquo Expert Systems with Applications vol 36 no10 pp 12547ndash12553 2009

[8] W Verbeke K Dejaeger D Martens J Hur and B BaesensldquoNew insights into churn prediction in the telecommunicationsector a profit driven data mining approachrdquo European Journalof Operational Research vol 218 pp 211ndash229 2011

[9] B Baesens G Verstraeten D Van den Poel M Egmont-Petersen P VanKenhove and J Vanthienen ldquoBayesian networkclassifiers for identifying the slope of the customer lifecycle oflong-life customersrdquo European Journal of Operational Researchvol 156 no 2 pp 508ndash523 2004

[10] W Buckinx and D Van den Poel ldquoCustomer base analysis par-tial defection of behaviourally loyal clients in a non-contractualFMCG retail settingrdquo European Journal of Operational Researchvol 164 no 1 pp 252ndash268 2005

[11] J Burez and D Van den Poel ldquoCRM at a pay-TV companyusing analytical models to reduce customer attrition by targetedmarketing for subscription servicesrdquo Expert Systems with Appli-cations vol 32 no 2 pp 277ndash288 2007

[12] K Coussement and D Van den Poel ldquoChurn prediction in sub-scription services an application of support vector machineswhile comparing two parameter-selection techniquesrdquo ExpertSystems with Applications vol 34 no 1 pp 313ndash327 2008

[13] N Glady B Baesens andC Croux ldquoModeling churn using cus-tomer lifetime valuerdquo European Journal of Operational Researchvol 197 no 1 pp 402ndash411 2009

[14] X Yu S Guo J Guo and X Huang ldquoAn extended supportvector machine forecasting framework for customer churn ine-commercerdquo Expert Systems with Applications vol 38 no 3pp 1425ndash1430 2011

[15] J Qi L Zhang Y Liu et al ldquoADTreesLogit model for customerchurn predictionrdquoAnnals of Operations Research vol 168 no 1pp 247ndash265 2009

[16] B Huang M T Kechadi and B Buckley ldquoCustomer churnprediction in telecommunicationsrdquo Expert Systems with Appli-cations vol 39 no 1 pp 1414ndash1425 2012

[17] AKeramati and SM SArdabili ldquoChurn analysis for an Iranianmobile operatorrdquo Telecommunications Policy vol 35 no 4 pp344ndash356 2011

[18] S Lessmann and S Voszlig ldquoA reference model for customer-centric data mining with support vector machinesrdquo EuropeanJournal of Operational Research vol 199 no 2 pp 520ndash5302009

[19] D E Rumelhart G E Hinton and R J Williams ldquoLearn-ing internal representations by error propagationrdquo in ParallelDistributed Processing Explorations in the Microstructure ofCognition vol 1 pp 318ndash362MIT Press CambrigeMass USA1986

[20] P C Pendharkar and J A Rodger ldquoAn empirical study of impactof crossover operators on the performance of non-binarygenetic algorithm based neural approaches for classificationrdquoComputers and Operations Research vol 31 no 4 pp 481ndash4982004

[21] P Pendharkar and S Nanda ldquoA misclassification cost-minimizing evolutionary-neural classification approachrdquoNaval Research Logistics vol 53 no 5 pp 432ndash447 2006

[22] P C Pendharkar ldquoA comparison of gradient ascent gradientdescent and genetic-algorithm- based artificial neural networksfor the binary classification problemrdquo Expert Systems vol 24no 2 pp 65ndash86 2007

[23] E Granger M A Rubin S Grossberg and P Lavoie ldquoA What-and-Where fusion neural network for recognition and trackingof multiple radar emittersrdquo Neural Networks vol 14 no 3 pp325ndash344 2001

[24] J Sharifie C Lucas andBNAraabi ldquoLocally linear neurofuzzymodeling and prediction of geomagnetic disturbances based onsolar wind conditionsrdquo Space Weather vol 4 no 6 pp 1ndash122006

[25] S Y Hung D C Yen andH YWang ldquoApplying datamining totelecom churn managementrdquo Expert Systems with Applicationsvol 31 no 3 pp 515ndash524 2006

[26] A Jain M Murty and P Flyn ldquoData clustering a reviewrdquo ACMComputing Surveys vol 31 no 3 pp 264ndash323 1999

[27] J Dunn ldquoA fuzzy relative of the ISO-data process and itsuse in detecting compact well-separated clustersrdquo Journal ofCybernetics vol 3 no 3 pp 32ndash57 1973

[28] T Kohonen ldquoAdaptive associative and self-organizing func-tions in neural computingrdquo Applied Optics vol 26 no 23 pp4910ndash4918 1987

[29] X Zhang J Edwards and J Harding ldquoPersonalised online salesusing web usage data miningrdquo Computers in Industry vol 58no 8-9 pp 772ndash782 2007

[30] J Han and M Kamber Data Mining Concepts and TechniquesMorgan Kaufmann San Francisco Calif USA 2001

Journal of Engineering 9

[31] D R Cox and D Oakes Analysis of Survival Data Chapmanand Hall London UK 1984

[32] J Yu and M S Yang ldquoOptimality test for generalized FCM andits application to parameter selectionrdquo IEEE Transactions onFuzzy Systems vol 13 no 1 pp 164ndash176 2005

[33] K L Wu and M S Yang ldquoAlternative c-means clusteringalgorithmsrdquo Pattern Recognition vol 35 no 10 pp 2267ndash22782002

[34] N R Pal K Pal J M Keller and J C Bezdek ldquoA possibilisticfuzzy c-means clustering algorithmrdquo IEEE Transactions onFuzzy Systems vol 13 no 4 pp 517ndash530 2005

[35] M S Yang K L Wu J N Hsieh and J Yu ldquoAlpha-cut imple-mented fuzzy clustering algorithms and switching regressionsrdquoIEEETransactions on SystemsMan andCybernetics Part B vol38 no 3 pp 588ndash603 2008

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 8: Research Article Hierarchical Neural Regression Models for ...downloads.hindawi.com/journals/je/2013/543940.pdf · Research Article Hierarchical Neural Regression Models for Customer

8 Journal of Engineering

results show that the hierarchical models outperform thesingle Cox regression baseline model in terms of predictionaccuracy types I and II errors RMSE and MAD metricsIn addition the 120572-FCM + ANN + Cox model significantlyperforms better than the SOM + ANN + Cox and ANN +

ANN + Cox modelsFor future work other prediction techniques can be

applied such as support vectormachines genetic algorithmslogistic regression and so forth Finally other domaindatasets about churn prediction can be used for furthercomparison

References

[1] K Coussement and D Van den Poel ldquoIntegrating the voice ofcustomers through call center emails into a decision supportsystem for churn predictionrdquo Information and Managementvol 45 no 3 pp 164ndash174 2008

[2] E W T Ngai L Xiu and D C K Chau ldquoApplication ofdata mining techniques in customer relationship managementa literature review and classificationrdquo Expert Systems withApplications vol 36 no 2 pp 2592ndash2602 2009

[3] C Hung and C F Tsai ldquoMarket segmentation based onhierarchical self-organizing map for markets of multimedia ondemandrdquo Expert Systems with Applications vol 34 no 1 pp780ndash787 2008

[4] M J A Berry and G S Linoff Data Mining Techniques ForMarketing Sales and Customer Support John Wiley amp Sons2003

[5] D Van den Poel and B Lariviere ldquoCustomer attrition analysisfor financial services using proportional hazard modelsrdquo Euro-pean Journal of Operational Research vol 157 no 1 pp 196ndash2172004

[6] P Kisioglu and Y I Topcu ldquoApplying Bayesian Belief Networkapproach to customer churn analysis a case study on thetelecom industry of Turkeyrdquo Expert Systems with Applicationsvol 38 no 6 pp 7151ndash7157 2011

[7] C F Tsai and Y H Lu ldquoCustomer churn prediction by hybridneural networksrdquo Expert Systems with Applications vol 36 no10 pp 12547ndash12553 2009

[8] W Verbeke K Dejaeger D Martens J Hur and B BaesensldquoNew insights into churn prediction in the telecommunicationsector a profit driven data mining approachrdquo European Journalof Operational Research vol 218 pp 211ndash229 2011

[9] B Baesens G Verstraeten D Van den Poel M Egmont-Petersen P VanKenhove and J Vanthienen ldquoBayesian networkclassifiers for identifying the slope of the customer lifecycle oflong-life customersrdquo European Journal of Operational Researchvol 156 no 2 pp 508ndash523 2004

[10] W Buckinx and D Van den Poel ldquoCustomer base analysis par-tial defection of behaviourally loyal clients in a non-contractualFMCG retail settingrdquo European Journal of Operational Researchvol 164 no 1 pp 252ndash268 2005

[11] J Burez and D Van den Poel ldquoCRM at a pay-TV companyusing analytical models to reduce customer attrition by targetedmarketing for subscription servicesrdquo Expert Systems with Appli-cations vol 32 no 2 pp 277ndash288 2007

[12] K Coussement and D Van den Poel ldquoChurn prediction in sub-scription services an application of support vector machineswhile comparing two parameter-selection techniquesrdquo ExpertSystems with Applications vol 34 no 1 pp 313ndash327 2008

[13] N Glady B Baesens andC Croux ldquoModeling churn using cus-tomer lifetime valuerdquo European Journal of Operational Researchvol 197 no 1 pp 402ndash411 2009

[14] X Yu S Guo J Guo and X Huang ldquoAn extended supportvector machine forecasting framework for customer churn ine-commercerdquo Expert Systems with Applications vol 38 no 3pp 1425ndash1430 2011

[15] J Qi L Zhang Y Liu et al ldquoADTreesLogit model for customerchurn predictionrdquoAnnals of Operations Research vol 168 no 1pp 247ndash265 2009

[16] B Huang M T Kechadi and B Buckley ldquoCustomer churnprediction in telecommunicationsrdquo Expert Systems with Appli-cations vol 39 no 1 pp 1414ndash1425 2012

[17] AKeramati and SM SArdabili ldquoChurn analysis for an Iranianmobile operatorrdquo Telecommunications Policy vol 35 no 4 pp344ndash356 2011

[18] S Lessmann and S Voszlig ldquoA reference model for customer-centric data mining with support vector machinesrdquo EuropeanJournal of Operational Research vol 199 no 2 pp 520ndash5302009

[19] D E Rumelhart G E Hinton and R J Williams ldquoLearn-ing internal representations by error propagationrdquo in ParallelDistributed Processing Explorations in the Microstructure ofCognition vol 1 pp 318ndash362MIT Press CambrigeMass USA1986

[20] P C Pendharkar and J A Rodger ldquoAn empirical study of impactof crossover operators on the performance of non-binarygenetic algorithm based neural approaches for classificationrdquoComputers and Operations Research vol 31 no 4 pp 481ndash4982004

[21] P Pendharkar and S Nanda ldquoA misclassification cost-minimizing evolutionary-neural classification approachrdquoNaval Research Logistics vol 53 no 5 pp 432ndash447 2006

[22] P C Pendharkar ldquoA comparison of gradient ascent gradientdescent and genetic-algorithm- based artificial neural networksfor the binary classification problemrdquo Expert Systems vol 24no 2 pp 65ndash86 2007

[23] E Granger M A Rubin S Grossberg and P Lavoie ldquoA What-and-Where fusion neural network for recognition and trackingof multiple radar emittersrdquo Neural Networks vol 14 no 3 pp325ndash344 2001

[24] J Sharifie C Lucas andBNAraabi ldquoLocally linear neurofuzzymodeling and prediction of geomagnetic disturbances based onsolar wind conditionsrdquo Space Weather vol 4 no 6 pp 1ndash122006

[25] S Y Hung D C Yen andH YWang ldquoApplying datamining totelecom churn managementrdquo Expert Systems with Applicationsvol 31 no 3 pp 515ndash524 2006

[26] A Jain M Murty and P Flyn ldquoData clustering a reviewrdquo ACMComputing Surveys vol 31 no 3 pp 264ndash323 1999

[27] J Dunn ldquoA fuzzy relative of the ISO-data process and itsuse in detecting compact well-separated clustersrdquo Journal ofCybernetics vol 3 no 3 pp 32ndash57 1973

[28] T Kohonen ldquoAdaptive associative and self-organizing func-tions in neural computingrdquo Applied Optics vol 26 no 23 pp4910ndash4918 1987

[29] X Zhang J Edwards and J Harding ldquoPersonalised online salesusing web usage data miningrdquo Computers in Industry vol 58no 8-9 pp 772ndash782 2007

[30] J Han and M Kamber Data Mining Concepts and TechniquesMorgan Kaufmann San Francisco Calif USA 2001

Journal of Engineering 9

[31] D R Cox and D Oakes Analysis of Survival Data Chapmanand Hall London UK 1984

[32] J Yu and M S Yang ldquoOptimality test for generalized FCM andits application to parameter selectionrdquo IEEE Transactions onFuzzy Systems vol 13 no 1 pp 164ndash176 2005

[33] K L Wu and M S Yang ldquoAlternative c-means clusteringalgorithmsrdquo Pattern Recognition vol 35 no 10 pp 2267ndash22782002

[34] N R Pal K Pal J M Keller and J C Bezdek ldquoA possibilisticfuzzy c-means clustering algorithmrdquo IEEE Transactions onFuzzy Systems vol 13 no 4 pp 517ndash530 2005

[35] M S Yang K L Wu J N Hsieh and J Yu ldquoAlpha-cut imple-mented fuzzy clustering algorithms and switching regressionsrdquoIEEETransactions on SystemsMan andCybernetics Part B vol38 no 3 pp 588ndash603 2008

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 9: Research Article Hierarchical Neural Regression Models for ...downloads.hindawi.com/journals/je/2013/543940.pdf · Research Article Hierarchical Neural Regression Models for Customer

Journal of Engineering 9

[31] D R Cox and D Oakes Analysis of Survival Data Chapmanand Hall London UK 1984

[32] J Yu and M S Yang ldquoOptimality test for generalized FCM andits application to parameter selectionrdquo IEEE Transactions onFuzzy Systems vol 13 no 1 pp 164ndash176 2005

[33] K L Wu and M S Yang ldquoAlternative c-means clusteringalgorithmsrdquo Pattern Recognition vol 35 no 10 pp 2267ndash22782002

[34] N R Pal K Pal J M Keller and J C Bezdek ldquoA possibilisticfuzzy c-means clustering algorithmrdquo IEEE Transactions onFuzzy Systems vol 13 no 4 pp 517ndash530 2005

[35] M S Yang K L Wu J N Hsieh and J Yu ldquoAlpha-cut imple-mented fuzzy clustering algorithms and switching regressionsrdquoIEEETransactions on SystemsMan andCybernetics Part B vol38 no 3 pp 588ndash603 2008

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 10: Research Article Hierarchical Neural Regression Models for ...downloads.hindawi.com/journals/je/2013/543940.pdf · Research Article Hierarchical Neural Regression Models for Customer

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of