data mining and statistics for decision making (tufféry/data mining and statistics for decision...

13 Factors for success in a data mining project

The aim of this chapter is to present the factors for success in a data mining project in business,

particularly where the project is implemented in-house rather than outsourced. It will describe

the pitfalls to be avoided and provide an outline of the expected return on investment.

13.1 The subject

The subject of the study must of course be one that requires the use of data mining tools and

cannot be dealt with by simple descriptive statistics. Data mining will not help us to find

the ‘20% of customers who generate 80% of the profits’, but it is useful for determining

their profile or for discovering those who do not form part of this group at present, but who will

in future.

The subject, the target population and the objectives must be precisely specified. We must

avoid constructing a score on a certain customer segment and then extending it to another for

which it is inappropriate. The results must be capable of being measured. We must try to

estimate the return on investment.

The objectives must be realistic: if the rate of response to a mailing is 1%, it may perhaps

be increased to 3%, but certainly not 10%. Unrealistically ambitious objectives can lead

to disappointment which will harm the credibility of data mining and its wider application

in the business.

The business must have at least a degree of expertise on the subject.

The subject must be a challenge for the enterprise, and must offer some real benefits. This

is particularly true of a first project, which must be convincing and develop loyalty.

The business must be both willing and able to implement the solutions proposed by data

mining. For example, it is necessary to check the IT and electronic publishing resources: there

is no point in devising customized mailings if they cannot be provided at an acceptable cost.

Data Mining and Statistics for Decision Making, First Edition. Stéphane Tufféry.

© 2011 John Wiley & Sons, Ltd. Published 2011 by John Wiley & Sons, Ltd. ISBN: 978-0-470-68829-8

13.2 The people

The project must be supported by a business decision. The decision makers must be made

aware of the project and must back it.

The specialist staff of the business must be mobilized:

. before the project, to specify its content, outline the correct underlying concepts,

identify useful information sources, and supply the necessary data and definitions for

the study;

. during the study, to assess relevance and identify elements for closer examination

among the phenomena discovered by the statistician;

. after the study, to use the results and take the appropriate action.

IT specialists are needed to extract the data, construct the database to be supplied to the

statistician, and if necessary to program statistical models subsequently in an industrial

computing environment.

Statisticians are needed to analyse and format the data, detect any anomalies, choose the

appropriate modelling techniques, implement them correctly, produce effective models from

these, test the models and analyse the results of their application.

A thorough knowledge of statistics, data, and the nature and customers of the business is

essential if we are to decide whether a clustering is correct and usable (the clusters must be

homogeneous, consistent and readable) and whether a model is correct (the coefficients of the

variables must have relevant values and signs, and sufficient reliability), and for the purposes

of regrouping or transforming the data and constructing good indicators or excluding

redundant data. A model or a classification may be far from self-evident and may even

include unexpected elements (which is what makes data mining useful), but it must not be

unlikely, incomprehensible or unusable by the people for whom it is designed.

We can see that the business will require a lot of in-house skills. The active cooperation of

all these people is essential. Specialist staff and future users must be involved in the progress

of the study: the knowledge contained in the data is only one part of the business’s general

know-how. Data mining on its own will not provide the best models; these will be created by

the interplay between the knowledge extracted from the data and the experience of specialist

staff. However, it is preferable for a single person, the data miner, to be skilled in three areas

(knowledge of the business, statistics, and information technology) to ensure the fast and

effective deployment of the project.

Finally, we must add the necessary legal expertise to decide on the legal and regulatory

aspects of using the data and carrying out the planned processing.

13.3 The data

We must have data which are known, reliable and usable, and there must be enough of them

(see Sections 2.4 and 3.3–3.5). We need to archive the data that change over time and whose

variation is to be analysed, or those which will be used to predict subsequent phenomena and

behaviour. We need to keep all information on earlier business operations: who has been

contacted, by what channel, who has responded, after what time interval, who has been

followed up, how many times, who has accepted or refused, what the cost of the process was,

and so on. This will help us avoid the problems mentioned in Section 11.16.2. At the very

least, we must preserve the number of business contacts with each customer and their results.

This is because data mining does not guess the profile of ‘good’ customers, but extrapolates

from the data provided, mainly the results of earlier operations, which can be used to extract

positive and negative profiles relating to risk, propensity, etc. It is therefore absolutely

essential to store this information on business operations.
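
As a minimal sketch of what such an archive might hold, the record below stores one business contact per row. The class name, the field names and the sample values are all hypothetical; they simply mirror the items listed above (channel, response, time interval, follow-ups, acceptance or refusal, cost).

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ContactRecord:
    """One business contact with one customer (illustrative field names)."""
    customer_id: str
    campaign_id: str
    channel: str                           # e.g. "mail", "phone", "branch"
    contact_date: date
    follow_ups: int = 0
    response_date: Optional[date] = None   # None if the customer never responded
    accepted: Optional[bool] = None        # None if there was no response
    cost: float = 0.0

    def response_delay_days(self) -> Optional[int]:
        """Time interval between the contact and the response, in days."""
        if self.response_date is None:
            return None
        return (self.response_date - self.contact_date).days


def summarize(history):
    """The bare minimum to keep: number of contacts per customer and their results."""
    summary = {}
    for rec in history:
        s = summary.setdefault(
            rec.customer_id, {"contacts": 0, "accepted": 0, "refused": 0}
        )
        s["contacts"] += 1
        if rec.accepted is True:
            s["accepted"] += 1
        elif rec.accepted is False:
            s["refused"] += 1
    return summary


# Hypothetical history of three contacts.
history = [
    ContactRecord("C001", "spring_loan", "mail", date(2011, 3, 1), follow_ups=1,
                  response_date=date(2011, 3, 15), accepted=True, cost=6.0),
    ContactRecord("C001", "autumn_card", "phone", date(2011, 9, 5), cost=5.0),
    ContactRecord("C002", "spring_loan", "mail", date(2011, 3, 1),
                  response_date=date(2011, 3, 4), accepted=False, cost=1.0),
]

print(summarize(history))
```

Keeping refusals and non-responses alongside acceptances is the point of the design: a predictive model needs the negative profiles as much as the positive ones.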

The multiplication of customer contact and distribution channels also offers many sources

of information, which tends to be scattered accordingly. To make the best use of this

information, we must be able to consolidate it into a coherent form in a synthetic

database, in order to obtain a unique and comprehensive view of each customer. This is not

easy, and is not always achieved.

13.4 The IT systems

If a business is implementing data warehousing and data mining projects at the same time, it is

preferable to execute them in parallel rather than in sequence. To archive the data first and then

carry out the data mining would not be an acceptable procedure, because to some extent it is

the data mining itself that determines what must be archived in a reliable form, and how to go

about this task. It would also be unfortunate if we waited several years for the completion of

the data warehouse and then found, in the early stages of data mining, that we lacked some

important data which were not considered when the warehouse was designed. There are at

least four good reasons for the early application of data mining:

. it will generate a return on investment which will provide evidence of the value of data

mining, especially to managers in charge of budgets;

. it will identify the most important data and indicators for the construction of relevant

models;

. it will start the process of archiving business operations, as mentioned in Section 13.3;

. it will support and develop the skills of the participants.

For a first trial, or pilot project, it is possible and may even be preferable to use existing

tools, or new tools that are easily implemented, for data collection and output, and to wait

for the results of the initial trials before deciding on major changes in the IT systems for

full-scale operation.

It will also be necessary to archive numerous files, requiring a large storage capacity.

Unlike conventional management information systems, which back up data on media suitable

for limited, one-off retrieval, data mining often has to process several years of archives

simultaneously. Whereas a simple back-up system only has to retrieve a given stored file on a

given day, the data mining system must be able to put a very large number of files – up to several

tens of terabytes – on-line.

Data mining also uses specialized data models. We cannot directly ‘mine’ the production

data or the tables of a data centre or data warehouse. We must first set up a special data mart,

known as a mining mart or modelling base (Section 2.3), resulting in an even greater increase

in the volume of computer data to be stored. However, we can try to standardize the

descriptions of the modelling bases so that they can be used for a number of different studies or

applications in the business.

Finally, it is obvious that a successful implementation of data mining is dependent on full

integration into the users’ workstations, especially those of the customer service

representatives, who need to have fast and straightforward access to the data, given the context in which

they use them. The output must be user-friendly and allow for direct, simple and unambiguous

interpretation. If necessary it must be made simpler than the detailed results supplied by the

data mining algorithms. For example, the average human mind will find it difficult to think of

more than six or seven customer segments simultaneously and instantly place a customer in

the correct segment.

13.5 The business culture

Data mining must form part of the business culture. The business must ensure that it maintains

its expertise in data mining and statistics, as well as the quality of the data gathered and stored.

Every business operation dependent on data mining must be carefully managed in its

implementation and monitoring (recording the results). The iterative nature of data mining

must be clearly understood, and the results of an operation dependent on a data mining study

must be used automatically to enrich the next study.

Care must be taken in presenting and ‘selling’ data mining to marketing and sales

managers and to field staff, who may think it calls their know-how into question or is designed

to replace it. They must be persuaded that data mining only offers an aid to decision making,

not the decision itself, which is always up to them. They must also be reminded to keep the

marketing databases up to date, especially as regards the results of campaigns, namely the

acceptances and refusals. Customer service representatives must be made aware of the gains

in productivity and security that they can expect from data mining.

Marketing managers must also be involved in data mining studies, so that they do not

feel that they have lost control of the choice of target customers. Data mining will not make

their experience obsolete, but should incorporate it, not downstream, when the targets of each

campaign are being defined, but upstream, in the design of the data

mining models. In some businesses it will be essential to make the change from ‘product-

orientated’ to ‘customer-orientated’ marketing. Conventional product-orientated marketing

starts with a product i, looks for the period of the year Pi which will be best for selling it,

looks for the customers Ci who are likely to buy it, and targets customers C1 in period P1,

customers C2 in period P2, and so on. The drawback of this method is that the intersection of

the Ci is not necessarily empty: in other words, the same customer may be targeted several

times without any consistency in the marketing communications and trading logic. At best,

this is useless; at worst, it detracts from the customer’s image of the business. Of course, it

would be possible to exclude previously targeted customers from each campaign, but there

would be no guarantee that the order of targeting would lead to the best results. Conversely,

some customers will never be targeted. Moving to customer-orientated marketing means that

marketing operations are carried out according to the profile of customers, their requirements

or their life events, rather than according to the events in the life of the products. We

know that a given customer belongs to a given segment characterized by a certain

consumption of products, services and means of access, and that this customer should


therefore be offered certain products and services in a certain order of priority and via a

certain channel. It will be the strong trends in the customer segment that determine the

priorities for this customer, not the sequence of marketing campaigns and the randomness of

targeting. This will require a radical review of working habits, or even of whole organizations,

where the marketing management is structured according to product lines rather than

customer segments.
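
The overlap problem can be illustrated with a few sets. The customer names, target groups, segments and priorities below are invented for the example. The sketch first counts how often each customer is targeted under product-orientated marketing, then derives a single ordered list of offers per customer from the customer's segment.

```python
from collections import Counter

all_customers = {"ann", "bob", "cat", "dan", "eve"}

# Product-orientated: each product i has its own target group C_i (hypothetical).
targets = {
    "product_1": {"ann", "bob"},
    "product_2": {"bob", "cat"},
    "product_3": {"ann", "bob"},
}

# The C_i overlap, so some customers are contacted several times, others never.
times_targeted = Counter(c for group in targets.values() for c in group)
targeted_several_times = {c for c, n in times_targeted.items() if n > 1}
never_targeted = all_customers - set(times_targeted)

# Customer-orientated: the segment fixes the order of priority (hypothetical).
segment_of = {"ann": "young_saver", "bob": "home_owner", "cat": "young_saver",
              "dan": "retiree", "eve": "home_owner"}
priorities = {
    "young_saver": ["product_2", "product_1"],
    "home_owner": ["product_3", "product_1"],
    "retiree": ["product_1"],
}
offer_plan = {c: priorities[segment_of[c]] for c in sorted(all_customers)}

print(sorted(targeted_several_times))
print(sorted(never_targeted))
print(offer_plan["dan"])
```

In the second approach every customer appears exactly once, with a consistent sequence of offers, which is the property the product-by-product targeting cannot guarantee.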

13.6 Data mining: eight common misconceptions

13.6.1 No a priori knowledge is needed

It is true that some descriptive methods such as cluster analysis can be used without knowing

what the resulting clusters will be, or even the appropriate number of clusters.

However, it is important to know that the result of the clustering is influenced by the choice

of data and their coding at the input of the algorithm (an example is the standardization of

continuous variables), and it is therefore impossible to be completely neutral. We could

imagine a system in which all the available computer data were fed into the clustering process.

But even if this were technically feasible, such a solution would mean that the result of the

classification would be dependent on the computer data model, rather than on the business

or statistical requirements, which would obviously be unsatisfactory. Additionally, for

purely technical reasons, there may be redundant data which could distort the result of the

cluster analysis.
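
The effect of coding choices can be seen on a toy example. The sketch below uses four invented customers described by age and annual income, and looks for the nearest neighbour of customer "a" by Euclidean distance, first on the raw values and then on standardized values. It is only an illustration of the sensitivity to scale, not a clustering algorithm; the data and function names are assumptions made for the example.

```python
import math
import statistics

# Hypothetical customers: (age in years, annual income).
customers = {
    "a": (25, 30_000),
    "b": (60, 30_500),
    "c": (27, 31_500),
    "d": (45, 40_000),
}


def standardize(points):
    """Centre each variable and divide it by its (population) standard deviation."""
    columns = list(zip(*points.values()))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return {
        name: tuple((x - m) / s for x, m, s in zip(values, means, stdevs))
        for name, values in points.items()
    }


def nearest(points, name):
    """Name of the point closest to `name` in Euclidean distance."""
    return min(
        (other for other in points if other != name),
        key=lambda other: math.dist(points[name], points[other]),
    )


print(nearest(customers, "a"))               # income dominates the raw distance
print(nearest(standardize(customers), "a"))  # age now counts as much as income
```

On the raw data the 60-year-old "b" is the closest to "a" because the income scale swamps the age difference; after standardization the closest is "c". Neither answer is neutral: each follows from a coding decision taken before the algorithm runs.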

As for the predictive methods of data mining, these require some a priori input in all cases,

because it is necessary to choose a target (dependent) variable whose definition and categories

will have to be carefully weighed.

In any case, someone who knows what he is looking for is more likely to find it!

13.6.2 No specialist staff are needed

The assistance of professional specialists (in production, engineering, risk assessment,

marketing, etc.) is indispensable at several stages of a data mining study. First of all, it is

required for the definition of the objectives. For example, before drawing up a risk score for a

financial establishment, we need to agree on the definition of a risk: is it a delayed payment, a

downgrading of debt, or a financial loss for the establishment? This is not a question for the

statistician only. It must be answered by the professional specialist, who will consider the

regulatory constraints and the policy of the establishment among other matters.

The assistance of specialists is also required in building up the store of useful and legally

usable data, including both raw and composite data. It is useful to know which data are

considered to be relevant by the specialists, and which may have concealed pitfalls, even if the

statistician may subsequently question certain prejudices about the importance of some of the

data, such as the debt ratio for the granting of credit.

Finally, such assistance is essential for analysing the results. Given two classifications of

equal statistical merit, a marketing analyst may prefer one which he considers to be more

suitable for business use. On seeing the initial results of a study, a professional specialist may

also say whether he considers them to be predictable, new and worth investigating, or

surprising and highly suspect, in which case the validity of the data, the sampling, and the use

of the data mining tools will be called into question. The professional specialist may also be

consulted to discover if a correlation between the dependent variable and an independent

variable is created simply by the definition of the variable, or if it can be considered valid. In

some complex problems such as the analysis of the financial health of businesses based on

their accounting data, the cooperation of the statistician and the professional specialist is

essential if errors of interpretation are to be avoided.

13.6.3 No statisticians are needed (‘you can just press a button’)

In any data mining study, the most time-consuming and most decisive stage is data processing.

It is entirely dependent on the use of statistical analyses for verifying the reliability of the

variables, their distribution, their correlations, etc., and for carrying out reliability improvements,

transformations, discretizations and groupings on categories and the like, before the

data mining algorithms are used. These operations are not performed in the same way for

every algorithm. Not all algorithms can accept every type of input variable. Variables with

missing values can be retained in some cases, but not in others. Some algorithms also require

preliminary sampling (see below). In predictive methods, we must ensure that variables

correlated by definition with the dependent variable are not included among the independent

variables. We must also be wary of the phenomenon of overfitting. Finally, the setting of the

parameters of data mining algorithms can have a considerable effect on the results, and certain

seemingly fine adjustments can lead to surprising differences. Simply encoding a qualitative

variable as a ‘discrete numeric’ variable may be enough to distort the results completely, even

if it is only one variable among a hundred other correctly coded ones.
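
To see why a numeric code can distort the results, consider a qualitative variable with three unordered categories. In the sketch below (the categories and the two codings are chosen for illustration only), coding the categories as 1, 2, 3 makes some pairs appear twice as far apart as others, whereas an indicator ("one-hot") coding keeps all pairs equidistant.

```python
import math
from itertools import combinations

# Hypothetical unordered categories of a qualitative variable.
categories = ["branch", "internet", "phone"]

# Coding 1: arbitrary 'discrete numeric' codes 1, 2, 3.
numeric_code = {cat: (float(i),) for i, cat in enumerate(categories, start=1)}

# Coding 2: one indicator (0/1) variable per category.
one_hot = {
    cat: tuple(1.0 if j == i else 0.0 for j in range(len(categories)))
    for i, cat in enumerate(categories)
}


def pairwise_distances(coding):
    """Euclidean distance between every pair of categories under a given coding."""
    return {
        (u, v): math.dist(coding[u], coding[v])
        for u, v in combinations(coding, 2)
    }


for name, coding in [("numeric", numeric_code), ("one-hot", one_hot)]:
    print(name, pairwise_distances(coding))
```

Any method based on distances or linear combinations will read the numeric codes as saying that "phone" is further from "branch" than "internet" is, an ordering that exists only in the coding.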

Finally, the data processing phase is interleaved with the modelling phase, as the first

models produced are hardly ever completely satisfactory, and require further data transformations

before the operations are repeated.

On completion of the data processing, the reading of the results may be deceptive; for

example, correlation may be confused with causation.

In conclusion, I quote Philippe Besse, from his course on ‘Statistical modelling and

learning’ at the University of Toulouse (France):

With the tools now available, it is becoming so easy to start the computation process that some

people compare a data miner with a driver, saying that you do not need to be a skilled mechanic

to drive a car. However, the designer of a modelling, segmentation or discrimination procedure

has to make more or less implicit decisions which are far from being neutral and which are far

more complex than the simple choice of a fuel by a driver at a service station.

13.6.4 Data mining will reveal unbelievable wonders

The models produced by data mining are rarely marvellous or extraordinary; they normally

make use of variables considered to be discriminating by professional specialists, in a

common-sense way. So what does data mining offer us? Simply the fact that there are

thousands of common-sense combinations of variables known to be discriminating for any

given problem area, and that data mining enables us to detect the very best possible

combination (or one of the best), together with the precise parameter that should be assigned

to each of the variables. Ultimately, a small improvement in each rule among a set of several

targeting rules is enough to multiply the response rate by a factor of 3–4.


13.6.5 Data mining is revolutionary

Data mining incorporates conventional data analysis, and only differs from it in the following

ways (see also Section A.1.2):

. some of the techniques used, such as decision trees and neural networks, are exclusive to

data mining;

. the number of individuals studied is often larger in data mining, where the optimization

of the algorithms for data processing may be crucial;

. data mining sometimes prefers a slightly less precise model, if it is much more

understandable;

. data mining models are integrated into industrialized data processing procedures, with

automatic updates, computation and outputs.

In spite of everything, we cannot claim that data mining is really a radically new approach.

13.6.6 You must use all the available data

We might think that the results of a data mining model will improve as the number of input

variables increases. However, this is not the case. Models are degraded by unreliable or

incomplete variables and by the presence of outliers; furthermore, redundant variables may

affect a cluster analysis, variables with categories having irregular frequencies may affect a

factor analysis, poorly discriminating or excessively intercorrelated variables may reduce

the predictive power of a discriminant analysis, and an excessive number of variables may

swamp a neural network. Quite often, when a good score model has been built, an attempt is

made to improve it by incorporating a new variable, but, even though the relevance and

reliability of this variable have been ascertained, it actually degrades the quality, and above all

the robustness, of the model.

13.6.7 You must always sample

It is always tricky to achieve satisfactory sampling. A thorough knowledge of the population

to be sampled is a prerequisite. Since this knowledge is not always available, especially with

the kind of unstable populations formed by customers, we must avoid sampling as far as

possible. As an example of the problems caused by sampling, if the distribution of a variable in

the training sample differs from its distribution across the whole population, this may have a

major impact on a method using this variable. It is also best to avoid sampling when we are

looking for rare phenomena (e.g. types of fraud) or narrow customer segments.
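
The difficulty with rare phenomena can be quantified with a short calculation. The figures below are purely hypothetical (one million customers, 100 known fraud cases, a simple random sample of 10 000); the sketch computes the expected number of fraud cases in the sample and the hypergeometric probability that the sample contains none at all.

```python
def expected_rare_cases(population, rare, sample):
    """Expected number of rare cases in a simple random sample without replacement."""
    return sample * rare / population


def prob_no_rare_case(population, rare, sample):
    """Probability that the sample contains none of the rare cases.

    Hypergeometric P(X = 0) = C(N - K, n) / C(N, n), written as a product
    over the K rare cases to avoid very large binomial coefficients.
    """
    p = 1.0
    for j in range(rare):
        p *= (population - sample - j) / (population - j)
    return p


# Hypothetical figures, chosen only for illustration.
population, frauds, sample = 1_000_000, 100, 10_000

print(expected_rare_cases(population, frauds, sample))
print(prob_no_rare_case(population, frauds, sample))
```

Under these assumptions the sample holds a single fraud case on average, and there is a chance of more than one in three that it holds none, which leaves nothing to model.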

13.6.8 You must never sample

Predictive methods based on modelling (inductive methods) require sampling, because

they work by building a model based on part of the population, and then testing the model

on another part of the population. The test phase is essential for selecting the best of the

resulting models.
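
A minimal sketch of this split is given below, using only the Python standard library. The function name, the 70/30 proportion and the fixed seed are assumptions made for the example, not recommendations from the text.

```python
import random


def train_test_split(records, test_fraction=0.3, seed=0):
    """Randomly split records into a training part and a test part."""
    rng = random.Random(seed)      # fixed seed so that the split is reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical population of 1000 customer identifiers.
customers = list(range(1000))
train, test = train_test_split(customers)

print(len(train), len(test))
```

Each candidate model would be built on `train` only and compared with the others on `test`, so that the selection rests on individuals the models have never seen.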

It may also be desirable to work on a sample of the population in order to avoid prohibitive

computing time for large volumes of data. Sometimes it is best to sample and perform more

in-depth calculations on a sample, rather than more superficial calculations on the total

population. In the words of Jerome H. Friedman: ‘a powerful computationally intense

procedure operating on a subsample of the data may in fact provide superior accuracy than

a less sophisticated one using the entire data base’.1

13.7 Return on investment

The return on investment (ROI) is generated by an increase in the response rate to marketing

campaigns, an increase in the productivity of sales staff, a better distribution of resources, an

increase in customer loyalty, a reduction in defaults, etc.

Many figures have been quoted on the subject of this ROI. The truth is that it is often

difficult to quantify, because the gains due to the use of data mining are not always

distinguished from those due to good communication, effective marketing, and motivated

personnel. In some cases, these various factors cannot be separated: one example is that of the

bank which, having established a risk score, a propensity score for consumer credit and a

monthly repayment capacity for each customer, sent a customized offer of credit to each of its

customers having a good score for risk and propensity (the ‘core target’ group). The amount of

credit offered to each customer was not a standard (rather low) amount such as €1000, €2000

or D3000, but an amount corresponding to his capacity to repay, which was itself calculated

according to his profile, income, expenditure, commitments and scores. This was a clear

example of ‘one-to-one’ marketing. The results were much better than usual, as demonstrated

both quantitatively (in the increased take-up rate) and qualitatively (in the appreciation of the

sales personnel and telephone sales staff). How much of this was due to the quality of

targeting, and how much was due to the customization of the mailing and the amount offered,

which were highly appreciated by the customers? The answer will never be known, and in any

case is irrelevant, since the customization would not have been possible without the

information provided by data mining. Clearly, the essential factor in the return on investment

is not the possession of the best data mining tools (although this certainly cannot be

disregarded), but the ability to use them in an integrated database marketing strategy.

Data mining is only one element in database marketing, among others such as:

. the marketing communication style;

. the sales dialogue used;

. the format of the mailings sent to customers (colour or black and white? etc.);

. the provision of a dedicated telephone number;

. a system of telephone follow-ups;

. the training of the sales staff;

. the quality of the data output from data mining;

. the recording and storage of information supplied by customers;

. the adaptation of the marketing processes (changing from ‘product’ to ‘customer’

marketing);

. the adaptation of the sales procedures (including decision-making powers).

1 Friedman, J.H. (1997) Data mining and statistics: what’s the connection? http://www-stat.stanford.edu/~jhf/ftp/dm-stat.pdf

Table 13.1 Calculated return on investment.

| | | Conventional targeting | Targeting using data mining |
|---|---|---|---|
| A | number of customers targeted | 30 000 | 15 000 |
| B | cost of each mailing | €1 | €1 |
| C | cost of each telephone follow-up | €5 | €5 |
| D | total cost (= A × (B + C)) | €180 000 | €90 000 |
| E | number of new subscriptions | 1 000 | 1 500 |
| F | subscription rate (= E/A) | 3.33% | 10% |
| G | cost per subscription (= D/E) | €180 | €60 |
| H | annual turnover per subscription | €150 | €175 (larger amounts taken up) |
| I | total annual turnover (= H × E) | €150 000 | €262 500 |
| | ROI (= I/D) | 83% | 292% |

Table 13.2 Calculated return on investment due to increased loyalty.

| | | |
|---|---|---|
| A | cost of acquiring a new customer | €150 |
| B | annual profit from each departing customer | €450 |
| C | customer activation time | 0.5 year |
| D | loss due to a departure (= A + (B × C)) | €375 |
| E | cost of increasing loyalty of a detected ‘departing’ customer | €50 |
| F | total number of customers | 1 000 000 |
| G | number of departures per year | 80 000 |
| H | attrition rate (= G/F) | 8% |
| I | number of ‘departing customers’ detected (correctly or incorrectly) | 40 000 |
| J | total cost of increasing loyalty (= E × I) | €2 000 000 |
| K | number of actual departing customers retained | 8 000 |
| L | losses avoided (= D × K) | €3 000 000 |
| | net total profit (= L - J) | €1 000 000 |

However, if we really need to provide accurate information on the quality of a targeting

process based on data mining, because there will always be some managers concerned about

the soundness of their investments, and also because it is important to measure performance in

order to improve it, there is one way of achieving this. This is to add a random ‘control’ sample

of customers from one (or more) conventional target groups, identified by the marketing

department, to the marketing target generated by data mining. We must then treat all the

customers in the same way (using the same channels, the same media, the same communications,

the same follow-ups, etc.) and compare the results at the end of the campaign.

They can be presented as in Table 13.1, where the last row shows the ROI, which is greater

than 100% if the cost of the campaign is recovered by the turnover in less than one year, and less than 100% otherwise.
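
The arithmetic of Table 13.1 can be reproduced in a few lines. The function and parameter names below are ours, introduced for the sketch; the input figures are those of the table, and the formulas are the ones given in its row labels.

```python
def campaign_roi(customers_targeted, mailing_cost, follow_up_cost,
                 new_subscriptions, turnover_per_subscription):
    """Recompute rows D, F, G, I and the ROI of Table 13.1 from rows A, B, C, E, H."""
    total_cost = customers_targeted * (mailing_cost + follow_up_cost)   # D = A x (B + C)
    total_turnover = turnover_per_subscription * new_subscriptions      # I = H x E
    return {
        "total_cost": total_cost,
        "subscription_rate": new_subscriptions / customers_targeted,    # F = E / A
        "cost_per_subscription": total_cost / new_subscriptions,        # G = D / E
        "total_turnover": total_turnover,
        "roi": total_turnover / total_cost,                             # ROI = I / D
    }


conventional = campaign_roi(30_000, 1, 5, 1_000, 150)
data_mining = campaign_roi(15_000, 1, 5, 1_500, 175)

for name, result in [("conventional", conventional), ("data mining", data_mining)]:
    print(f"{name}: cost per subscription {result['cost_per_subscription']:.0f}, "
          f"ROI {result['roi']:.0%}")
```

The same function can be applied to the control sample and to the data mining target of a real campaign, provided both groups have been treated identically.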

In another field, the development of customer loyalty is also an important source of profit

for a business. We can attempt to estimate this as in Table 13.2. For the sake of completeness,

we should deduct the software costs and the salaries of data miners from the profits and ROI.

However, these costs are often small compared with the savings they offer.
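
The calculation of Table 13.2 can be sketched in the same way. Again, the function and parameter names are illustrative; the figures and formulas are those of the table, and software and salary costs are left out, as they are in the table.

```python
def loyalty_profit(acquisition_cost, annual_profit, activation_years,
                   retention_cost, detected, retained):
    """Recompute rows D, J, L and the net profit of Table 13.2."""
    loss_per_departure = acquisition_cost + annual_profit * activation_years  # D = A + (B x C)
    total_retention_cost = retention_cost * detected                          # J = E x I
    losses_avoided = loss_per_departure * retained                            # L = D x K
    return {
        "loss_per_departure": loss_per_departure,
        "total_retention_cost": total_retention_cost,
        "losses_avoided": losses_avoided,
        "net_profit": losses_avoided - total_retention_cost,                  # L - J
    }


result = loyalty_profit(acquisition_cost=150, annual_profit=450, activation_years=0.5,
                        retention_cost=50, detected=40_000, retained=8_000)
print(result)
```

Writing the calculation as a function makes it easy to see how the net profit reacts to the number of departing customers actually retained, which is the figure most dependent on the quality of the attrition model.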

