Page 1: Application of Data Mining and Machine Learning techniques for Fraud Detection_v.1.0

Application of Data Mining and Machine Learning for Credit Card Fraud Detection

A Comparative Analysis on two Academic Papers

Author: Christian Adom, Computer Science Department, City University London

26th April 2015

Abstract

The use of credit cards as a payment method has become increasingly popular in recent years. As advancements in e-commerce technologies continue to emerge, consumers are increasingly taking advantage of the convenience and flexibility offered by credit card purchases. This change in consumer behaviour has given rise to an unprecedented increase in cases of credit card fraud and has subsequently led to substantial financial loss for consumers, issuing banks and merchants in the payments industry. In this paper we compare and discuss two academic research papers that attempt to apply data mining and machine learning techniques to address the problem of credit card fraud detection and prevention. Our aim is to critically assess the methodologies, techniques and results presented by the researchers of both papers in countering the problem of credit card fraud. Next we identify the areas of similarity in the approach and methodologies used, while clearly delineating the differences in the techniques implemented. Finally we provide a short excursion into the use of these techniques in industry.

1 Introduction

The use of credit cards as the primary payment method has become increasingly popular with consumers in recent years. A study conducted in 2014 by the UK Cards Association revealed there were approximately 175.6 million cards in issue (55.4 million of which were credit cards) and that card expenditure rose by £0.6 billion, amounting to a total of £49.0 billion. [1]


There is strong evidence to suggest that this growing trend in consumer behaviour is driven by advancements in e-commerce technologies and that consumers are increasingly taking advantage of the convenience and flexibility offered by credit card purchases.

Although this is a prosperous economic time for the e-commerce market, issuing banks, merchants and payment providers are now facing an unprecedented rise in the number of credit card fraud cases as a result of this growth. A study conducted by Financial Fraud Action UK (FFA) estimates that fraud losses on UK cards totalled £450.4 million in 2013, a 16% increase from £388.3 million in 2012. [2]

In an effort to address this growing problem, a number of fraud detection models and techniques have been proposed by researchers within both the payments industry and academia. A survey of the current literature on fraud detection methods reveals a number of approaches to tackling this problem. Below are a few of the research areas: [3] [4] [5]

• Bayesian Network

• Genetic Algorithm

• Neural Network

• Support Vector Machine

• Decision Tree

• Fuzzy Logic Based System

• Hidden Markov Model

• Meta Learning Strategy

For the purpose of this paper, we will discuss two academic research papers that address the issue of credit card fraud detection and prevention, namely:

1. Credit Card Fraud detection using Hidden Markov Model [6]

2. Neural data mining for Credit card Fraud Detection [7]

The aim of this paper is to critically assess the methodologies, techniques and results presented by the researchers in countering the problem of credit card fraud. Areas of similarity in the approach and methodologies used are identified, while clearly delineating the differences in the techniques implemented.


Generally, credit card fraud can be divided into two types, off-line and on-line fraud. For off-line fraud, the crime is committed by using the stolen physical card to make (usually face-to-face) unauthorized transactions. In the case of on-line fraud, the fraud is committed by stealing the card details (without the knowledge of the legitimate card-holder) and proceeding to make unauthorized transactions through the internet, phone or other card-holder-not-present (CNP) channels. [8]

In order to successfully detect this kind of fraud it is necessary to develop methods to analyse the spending pattern on a card and find inconsistencies with respect to the normal spending patterns of the card-holder. Generally, humans exhibit specific behaviour in their spending habits, thus every card-holder can be represented by a set of patterns containing unique information such as purchase category, time/location of purchase, transaction amount, etc. Deviation from known patterns is considered to be a potential threat.

The principles introduced above are the key insights into successful fraud detection presented in both papers, and it is on this common ground that we will address the research. The remainder of this paper is organised as follows: Sections 2 and 3 present a summary and review of the two papers under comparison. Section 4 covers a comparative analysis.

2 Credit Card Fraud Detection Using Hidden Markov Model

Credit card fraud detection for CNP transactions using a Hidden Markov Model (HMM) is based on the analysis of the spending pattern of a card-holder. The key approach is to model the sequence of operations for processing credit card transactions using the HMM.

The details of items purchased in individual transactions (not known to the fraud detection system) are represented as the underlying finite Markov chain, which is not observable. The transactions can only be observed through a stochastic process that produces the sequence of the amount of money spent in each transaction.

The HMM is trained with the normal behaviour of the card-holder by generating synthetic data. The need to generate artificial data arises from the difficulties researchers face in obtaining real credit card data sets due to security, privacy and cost issues. Upon completion of model training, the fraud detection system (FDS) is tested by running transactions through the system. Incoming transactions not accepted by the HMM with sufficiently high probability are marked as fraud.

2.1 HMM Background

In probability theory, a Markov Model is a stochastic model used to model randomly changing systems where it is assumed that future states depend only on the present state and not on the sequence of events that preceded it. [9] When the system being modelled is assumed to be a Markov process with unobserved (hidden) states, we can represent it as a Hidden Markov Model.

An HMM has a finite set of states which are governed by a set of transition probabilities, where for any particular state, an outcome or observation can be generated according to an associated probability distribution. It is only the outcome and not the state that is visible to an external observer (hence the term "Hidden").

2.1.1 Characteristics of an HMM

To set the foundation for understanding how HMMs are applied to the problem of fraud detection, it is necessary to provide a formal description of the characteristics of an HMM.

Each Hidden Markov Model is defined by the following elements: states, observation symbols, transition probabilities, and initial probabilities.

1. The N states of the model are defined as:

$$S = \{S_1, S_2, \ldots, S_N\} \tag{1}$$

2. The M observation symbols per state are defined as:

$$V = \{V_1, V_2, \ldots, V_M\} \tag{2}$$

3. The state transition probability distribution A is given by:

$$a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i) \tag{3}$$

where $q_t$ is the current state.

The transition probabilities satisfy the stochastic constraints:

$$a_{ij} \ge 0, \quad 1 \le i, j \le N \quad \text{and} \quad \sum_{j=1}^{N} a_{ij} = 1, \quad 1 \le i \le N$$


4. The observation symbol probability distribution B in each state is given by:

$$b_j(k) = P(V_k \mid S_j), \quad 1 \le j \le N, \quad 1 \le k \le M \tag{4}$$

The observation symbol probabilities satisfy the stochastic constraints:

$$b_j(k) \ge 0, \quad 1 \le j \le N, \quad 1 \le k \le M \quad \text{and} \quad \sum_{k=1}^{M} b_j(k) = 1, \quad 1 \le j \le N$$

5. The initial state probability vector π gives the probability that the model is in state $S_i$ at the initial time step (t = 1) and is defined as:

$$\pi_i = P(q_1 = S_i), \quad 1 \le i \le N \quad \text{such that} \quad \sum_{i=1}^{N} \pi_i = 1 \tag{5}$$

From the above definitions, the complete specification of an HMM requires the estimation of two model parameters, N and M, and three probability distributions A, B, and π. This is represented by the notation:

$$\lambda = (A, B, \pi) \tag{6}$$
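As an illustration of the definitions above, the following is a minimal Python sketch of a container for λ = (A, B, π) that checks the stochastic constraints. The class name `HMM`, the method `validate` and all numeric values are assumptions made for illustration only; they do not come from either paper.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class HMM:
    """Container for lambda = (A, B, pi) with N states and M symbols."""
    A: List[List[float]]   # N x N state transition probabilities a_ij
    B: List[List[float]]   # N x M observation probabilities b_j(k)
    pi: List[float]        # N initial state probabilities pi_i

    def validate(self) -> None:
        """Raise ValueError if a stochastic constraint is violated."""
        n = len(self.pi)
        if len(self.A) != n or len(self.B) != n:
            raise ValueError("A and B need one row per state")
        if any(len(row) != n for row in self.A):
            raise ValueError("A must be N x N")
        if len({len(row) for row in self.B}) != 1:
            raise ValueError("every row of B needs M entries")
        for row in self.A + self.B + [self.pi]:
            if any(p < 0 for p in row):
                raise ValueError("probabilities must be non-negative")
            if not math.isclose(sum(row), 1.0, abs_tol=1e-9):
                raise ValueError("each distribution must sum to 1")


# Made-up numbers for three hidden states and three observation symbols.
example = HMM(
    A=[[0.6, 0.1, 0.3], [0.3, 0.4, 0.3], [0.4, 0.2, 0.4]],
    B=[[0.8, 0.15, 0.05], [0.1, 0.3, 0.6], [0.5, 0.4, 0.1]],
    pi=[0.5, 0.2, 0.3],
)
example.validate()
```

Each row of A, each row of B and the vector π is treated as one probability distribution, which mirrors the three sum-to-one constraints stated above.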

2.2 Implementing the HMM for Credit Card Fraud Detection

With the mathematical characteristics of the HMM defined, we now turn to providing a high level outline of the fraud detection process using an HMM. This can be summarised in three steps:

1. Each incoming transaction is submitted to the FDS (usually running at an issuing bank) for verification.

2. The FDS tries to find an anomaly in the transaction based on the spending profile of the card-holder.

3. If the FDS confirms the transaction to be fraudulent, it raises an alert and the issuing bank declines the transaction.

This general process raises a number of questions, such as:

• How are the credit card transaction processing operations mapped in terms of an HMM?

• How are the spending profiles of the card-holders determined and categorised?

We discuss these and other issues in the next section.


2.2.1 HMM for Credit Card Transaction Processing

The process for mapping credit card transaction processes in terms of an HMM can be enumerated in six steps:

1. Decide on observation symbols in model.

2. Quantize purchase values x into M price ranges V1, V2, ..., VM forming the observation symbols.

3. Define the observation symbols as V = {l, m, h}, making M = 3, where l = low, m = medium, h = high.

4. The transition in purchase type is used as the state transition in the model.

5. The set of all possible types of purchases forms the set of hidden states of the HMM.

6. Compute the probability matrices A, B and π

We can make some general comments about the steps defined above. In step 4 the transition in purchase type is used as the state transition in the model. This choice seems to contradict our intuition: since a card-holder makes different kinds of purchases of different amounts over a period of time, the natural choice would be to consider the sequence of transaction amounts instead. However, the sequence of types of purchase is considered to be more reliable than the transaction amounts because a card-holder makes purchases depending on his/her need for obtaining different types of items over a period of time. This spending behaviour subsequently generates a sequence of transaction amounts. Furthermore, the individual transaction amount generally depends on the associated type of purchase.
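Steps 2 and 3 can be sketched in Python as follows. The two range boundaries `LOW_MAX` and `MEDIUM_MAX` and the function names are hypothetical; the text does not state how the price ranges are chosen, so fixed cut-offs are used here purely for illustration.

```python
from typing import List

# Hypothetical range boundaries (currency units); not taken from the paper.
LOW_MAX = 50.0
MEDIUM_MAX = 200.0


def to_symbol(amount: float) -> str:
    """Map one purchase amount to an observation symbol l, m or h."""
    if amount < 0:
        raise ValueError("purchase amount must be non-negative")
    if amount <= LOW_MAX:
        return "l"
    if amount <= MEDIUM_MAX:
        return "m"
    return "h"


def to_sequence(amounts: List[float]) -> List[str]:
    """Turn a card-holder's transaction amounts into a symbol sequence."""
    return [to_symbol(a) for a in amounts]
```

With these assumed boundaries, `to_sequence([12.5, 80.0, 950.0])` yields the observation sequence `['l', 'm', 'h']`.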

Figure 1: Special case of fully connected HMM


In step 6, the optimal values for these parameters are determined in the training phase using the Baum-Welch (forward–backward) algorithm.

With the process defined we present a graphical representation of an HMM as shown in Figure 1. This is a special case of a fully connected HMM in which every state of the model can be reached in a single step from any other state. In this figure GR, EL and MI represent Groceries, Electronics and Miscellaneous purchases respectively.

2.3 Process flow of FDS

Figure 2: Process flow of the FDS

After obtaining an estimate of the HMM parameters through the training phase, Abhinav et al obtain an initial sequence of symbols from the card-holder's transactions. This sequence is then passed as input to the HMM to compute the probability of acceptance:

$$\alpha_1 = P(O_1, O_2, \ldots, O_R \mid \lambda) \tag{7}$$

They then form another sequence of length R by discarding $O_1$ and appending $O_{R+1}$. This new sequence is passed to the HMM and the probability of acceptance is computed as:

$$\alpha_2 = P(O_2, O_3, \ldots, O_{R+1} \mid \lambda) \tag{8}$$

where $O_{R+1}$ is the symbol generated by a new transaction at time t + 1. The metric for accepting or declining the sequence is defined as:

$$\Delta\alpha = \alpha_1 - \alpha_2 \tag{9}$$


where if Δα > 0 they conclude that the new sequence is accepted by the HMM with lower probability and is therefore potentially fraudulent, given that the following additional condition holds:

$$\frac{\Delta\alpha}{\alpha_1} \ge \text{Threshold} \tag{10}$$

Alternatively, if the condition does not hold then $O_{R+1}$ is permanently added to the sequence and the new sequence is used for determining the validity of the next transaction.

The complete process flow of the FDS is shown in Figure 2, where the process is divided into two separate phases (Training and Detection).
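The detection rule of equations (7) to (10) can be sketched in Python as below. The acceptance probability P(O | λ) is computed here with the standard forward algorithm, which is an assumption about how the probability would be evaluated rather than something the text spells out. Observation symbols are assumed to be integer indices into the columns of B, and the function names are hypothetical.

```python
from typing import List, Sequence


def sequence_probability(obs: Sequence[int], A: List[List[float]],
                         B: List[List[float]], pi: List[float]) -> float:
    """Forward algorithm: probability of the observation sequence given
    lambda = (A, B, pi). Symbols are integer column indices into B."""
    n = len(pi)
    # Initialisation with the first observation.
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction over the remaining observations.
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)


def is_suspicious(window: Sequence[int], new_symbol: int,
                  A: List[List[float]], B: List[List[float]],
                  pi: List[float], threshold: float) -> bool:
    """Apply equations (7)-(10) to one incoming transaction symbol."""
    alpha1 = sequence_probability(window, A, B, pi)
    shifted = list(window[1:]) + [new_symbol]   # drop O_1, append O_{R+1}
    alpha2 = sequence_probability(shifted, A, B, pi)
    delta = alpha1 - alpha2
    return delta > 0 and alpha1 > 0 and delta / alpha1 >= threshold
```

When `is_suspicious` returns `False`, the caller would keep the shifted sequence as the new window for the next transaction, as described above.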

2.4 Results and Analysis

In the final stage of testing and analysis, large-scale simulations were carried out to test the effectiveness of the FDS. Abhinav et al used True Positive (TP), False Positive (FP), TP-FP spread and Accuracy metrics to measure the capability of the system. In this context, TP represents the fraction of fraudulent transactions correctly classified as fraudulent, whereas FP is the fraction of genuine transactions incorrectly classified as fraudulent. Furthermore, the converse of these metrics, True Negative (TN) and False Negative (FN), were also used as part of the accuracy calculation.

To measure the performance of the FDS, the difference between TP and FP, called the TP-FP spread, was used as a metric. Accuracy represents the fraction of the total number of transactions (both genuine and fraudulent) that have been detected correctly, and is given by:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{11}$$
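A minimal Python sketch of these metrics is given below. It assumes the four quantities are available as raw counts of transactions; the class and function names are illustrative and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Raw classification counts, with 'positive' meaning fraudulent."""
    tp: int  # fraudulent, classified as fraudulent
    fp: int  # genuine, classified as fraudulent
    tn: int  # genuine, classified as genuine
    fn: int  # fraudulent, classified as genuine


def tp_fraction(c: Counts) -> float:
    """Fraction of fraudulent transactions correctly classified."""
    return c.tp / (c.tp + c.fn)


def fp_fraction(c: Counts) -> float:
    """Fraction of genuine transactions incorrectly classified."""
    return c.fp / (c.fp + c.tn)


def tp_fp_spread(c: Counts) -> float:
    """Difference between the TP and FP fractions."""
    return tp_fraction(c) - fp_fraction(c)


def accuracy(c: Counts) -> float:
    """Equation (11): fraction of all transactions detected correctly."""
    return (c.tp + c.tn) / (c.tp + c.tn + c.fp + c.fn)
```

For example, with the made-up counts `Counts(tp=8, fp=5, tn=85, fn=2)`, 93 of 100 transactions are classified correctly, so the accuracy is 0.93.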

Lastly, experiments were carried out to determine the correct combination of HMM design parameters, namely number of states, sequence length, and threshold value. After obtaining these parameters, a comparative study with another fraud detection system was carried out as a means of benchmarking the system.

2.5 Performance Comparison

The performance of the proposed system was measured while varying the number of fraudulent transactions and the spending profile of the card-holder. The performance was then compared with the credit card fraud detection technique proposed by S.J Stolfo et al in the paper "Credit Card Fraud Detection Using Meta-Learning". [10]


Abhinav et al carried out experiments by considering four profiles, noting that one of them is a mixed profile, meaning that spending profile was not considered in their approach. The other profiles they considered are (55 35 10), (70 20 10), and (95 3 2). Here, an (x y z) profile represents a card-holder who has carried out x% of their transactions in the low, y% in the medium, and z% in the high range; (95 3 2), for example, is a low spending profile. The goal was to determine how the system performed for different mixes of transaction amount ranges in the transactions.

For every combination of spending profile and malicious transaction distribution, they carried out 100 test runs and recorded the average result. For consistency the same set of data was used to determine the performance of both the approach used by Abhinav et al and that of S.J Stolfo et al (denoted "OA" and "ST" respectively for convenience).

Fig 3a shows the variation of TP and FP for the two approaches using the spending profile (95 3 2). The variation of TP-FP and Accuracy is also shown in Fig 3b. The graph shows that the TP of the researchers' approach is markedly close to that of Stolfo et al's approach. Furthermore, both approaches have similar values of FP. They concluded that the two systems had comparable accuracies and average TP-FP spread and showed a similar trend with variation in µ.

Figure 3: Performance variation of the two systems (OA and ST) for the spending profile (95 3 2)


From the testing results, Abhinav et al concluded that the proposed system has an overall Accuracy of 80%, even under large input condition variations, which is much higher than the overall Accuracy of the method proposed by Stolfo. Their system therefore correctly detects most of the fraudulent transactions.

3 Neural Data Mining for Card Fraud Detection

This paper applies a combination of data mining techniques and a neural network algorithm to address the fraud detection problem. The aim is to obtain high fraud coverage, combined with a low false alarm rate. The general approach is to model the sequence of operations in credit card transaction processing using a confidence-based neural network. To ensure the accuracy and effectiveness of fraud detection, receiver operating characteristic (ROC) analysis is applied as a means of measuring the accuracy of the model.

A neural network is initially trained with synthetic data. Then, if the trained neural network model (NNM) accepts an incoming credit card transaction only with sufficiently low confidence, the transaction is considered to be fraudulent. The paper shows how confidence values, a neural network algorithm and ROC analysis can be combined successfully to perform credit card fraud detection.

3.1 Background on Neural Networks

Fraud detection using an Artificial Neural Network takes inspiration from the biological nervous system and brain function of human beings. The approach is based on the brain's ability to learn from past experience and apply the data/knowledge in decision making and problem solving. The goal of researchers has been to apply these principles to credit card fraud detection methods. [11]

The neural network is usually trained with both personal and historical transaction data of the card-holder, such as occupation, income, transaction amount, purchase location, frequency and time period of purchases. In addition to training the network with this information, it is also very common to include the variety of credit card fraud faced by a particular issuing bank in the training data.

The neural network is typically depicted as having three interconnected layers (as shown in Figure 4):

• Input Layer: Receives input from an external source such as a database


Figure 4: Multi-layered neural network

• Hidden Layer: A layer that is hidden from external observers and receives input from the input layer or another hidden layer

• Output Layer: Exposes the network to external observers and provides the final output of the network

The output of the neural network generally takes real values between 0 and 1. If the output is below some specified threshold value (for example 0.6) then the transaction is classified as genuine. If the output is above the specified threshold then the "probability" that the transaction is fraudulent is considered to be high. This is a rather subtle distinction.

3.2 Data Set

As discussed in previous sections, a critical part of designing an effective NNM is to supply the model with realistic transactional data. However, due to security, privacy and cost issues, researchers face difficulties in obtaining credit card data sets. The typical approach taken by researchers has therefore been to generate "synthetic" data to facilitate the development and testing of the model.


In generating the data, it is necessary to provide the neural network with a mix of genuine as well as fraudulent transactions to train the classifiers. Tao et al specify a ratio of approximately 100 good transactions for each fraudulent transaction in the training data set in order to accurately simulate real customer transactions.

In designing the data set for the model, it is important to clearly specify the credit card payment-related training data attributes for the NNM. Tao et al specify key attributes such as:

• time of transaction

• location of transaction

• type of merchandise

• business code for merchandise

• business type for merchandise

• transaction amount

Furthermore, the idea of "Actual Target Values" is used (for classification) to guide the neural network learning process, where a target value of 1 represents abnormal, and 0 represents normal.

3.3 Calculation of confidence value

The unique approach introduced by Tao et al in the application of neural network techniques for fraud detection is to convert both training and testing data into confidence values before passing them into the NNM. These values are formatted to the range [0.0, 1.0] and each input then contains historical information. Furthermore, they categorise the input attributes into discrete and continuous values, where attributes such as time, location, type of merchandise, etc. are defined as discrete whilst an attribute such as transaction amount belongs to the continuous category. Therefore there are two separate methods proposed for the calculation of confidence values based on the category of input attributes.

To illustrate the calculation of confidence values for discrete and continuous attributes, they consider the location of the transaction and the transaction amount respectively as examples:


Given a sequence of transactions:

$$X = \{x_1, x_2, \ldots, x_n\}$$

The confidence for the transaction location is given by:

$$C(x_i) = \frac{m_{x_i}}{n} \tag{12}$$

where:
n is the number of uses of the credit card
$x_i$, for i = 1, 2, ..., n, is the location of use of the credit card
$m_{x_i}$ denotes the number of uses of the credit card in the location $x_i$

The confidence for the transaction amount is given by:

$$C(x_i) = e^{-\frac{1}{2}\left(\frac{x_i - \mu}{\sigma}\right)^2} \tag{13}$$

where:
n is the number of uses of the credit card
$x_i$, for i = 1, 2, ..., n, is the transaction amount of use i of the credit card
σ is the standard deviation of the transaction amount
µ is the average of the transaction amount
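The two confidence calculations in equations (12) and (13) can be sketched in Python as follows. The function names are hypothetical, and the use of the population standard deviation (`statistics.pstdev`) is an assumption, since the text does not say which estimator is used.

```python
import math
from collections import Counter
from statistics import mean, pstdev
from typing import Sequence


def location_confidence(history: Sequence[str], location: str) -> float:
    """Equation (12): share of past card uses made at this location."""
    if not history:
        raise ValueError("history must contain at least one transaction")
    return Counter(history)[location] / len(history)


def amount_confidence(history: Sequence[float], amount: float) -> float:
    """Equation (13): Gaussian-shaped confidence around the mean amount.

    Assumption: sigma is the population standard deviation of the history.
    """
    if not history:
        raise ValueError("history must contain at least one transaction")
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # Degenerate history: every past amount was identical.
        return 1.0 if amount == mu else 0.0
    z = (amount - mu) / sigma
    return math.exp(-0.5 * z * z)
```

Both functions return values in [0.0, 1.0]: a location used for three of four past transactions gets confidence 0.75, and an amount equal to the historical mean gets confidence 1.0.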

The purpose of the calculation of the confidence values is twofold. Firstly, the confidence value is tested against a threshold value that enables the researchers to determine whether the transaction is genuine or fraudulent. Secondly, through the confidence calculation, all neural network inputs are formatted to the range [0.0, 1.0]. The formatted data helps to speed up the neural network learning process.

3.4 Back Propagation and Receiver operating characteristic

A brief overview of the theory and operation of the NNM was given in Sections 3 and 3.1. In this section we look at the methods applied by Tao et al in further detail.

3.4.1 Back Propagation

In the proposed system, the researchers apply a multi-layer neural network model and a backpropagation (BP) algorithm on the model. The BP algorithm is a common approach to training artificial neural networks. It computes the gradient of a cost function with respect to all the weights in the network. The gradient is passed as input into an optimization method (such as steepest descent) which subsequently updates the weights, with the aim of minimizing the cost function [12].


In this study the BP algorithm learns by iteratively processing a data set of training "tuples" (a finite ordered list of elements):

$$X = \{x_1, x_2, \ldots, x_n\}$$

The algorithm compares the network's prediction for each tuple with the actual known target value. For each training tuple, the weights are adjusted to minimize the mean squared error between the network's prediction and the actual target value. The modifications are made in the backwards direction, from the output layer $Y = \{y_1, y_2, \ldots, y_n\}$, through each hidden layer down to the first hidden layer.

For this study the researchers used a sigmoid function:

$$S(t) = \frac{1}{1 + e^{-t}} \tag{14}$$

for the nodes in the hidden layers and the output layer.
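To make the weight-update procedure concrete, the following is a minimal pure-Python sketch of backpropagation for a network with one hidden layer, one output node and sigmoid activations throughout. It is not the authors' implementation: the class name `TinyNetwork`, the layer sizes, the learning rate and the per-tuple steepest-descent update are all assumptions chosen for brevity.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(t: float) -> float:
    """Equation (14)."""
    return 1.0 / (1.0 + math.exp(-t))


class TinyNetwork:
    """One hidden layer and a single output node, all sigmoid units."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # One weight row per node; the last entry of each row is the bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x: List[float]) -> Tuple[List[float], float]:
        """Return the hidden activations and the network output."""
        h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0])))
             for row in self.w_hidden]
        y = sigmoid(sum(w * v for w, v in zip(self.w_out, h + [1.0])))
        return h, y

    def train_tuple(self, x: List[float], target: float, lr: float) -> None:
        """One steepest-descent step on the squared error of one tuple."""
        h, y = self.forward(x)
        # Output delta: dE/dnet for E = 0.5 * (y - target)^2.
        delta_out = (y - target) * y * (1.0 - y)
        # Hidden deltas are propagated backwards through the output weights
        # (computed before those weights are updated).
        delta_hidden = [delta_out * self.w_out[j] * h[j] * (1.0 - h[j])
                        for j in range(len(h))]
        for j, v in enumerate(h + [1.0]):
            self.w_out[j] -= lr * delta_out * v
        for j, row in enumerate(self.w_hidden):
            for i, v in enumerate(x + [1.0]):
                row[i] -= lr * delta_hidden[j] * v


def mean_squared_error(net: TinyNetwork,
                       data: Sequence[Tuple[List[float], float]]) -> float:
    """Mean squared error between predictions and target values."""
    return sum((net.forward(x)[1] - t) ** 2 for x, t in data) / len(data)
```

In use, each input vector would hold confidence values in [0.0, 1.0] and each target would be 1 (abnormal) or 0 (normal); calling `train_tuple` repeatedly over the training tuples drives the mean squared error down.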

3.4.2 Receiver operating characteristic (ROC)

In this paper, ROC analysis has the dual purpose of ensuring that both the accuracy and the performance of the model are adequate. This is achieved by obtaining an optimal threshold for determining whether a transaction is genuine or fraudulent. This threshold value is tested against the output of the NNM, which takes the form of confidence values $Y = \{y_1, y_2, \ldots, y_n\}$. The classification of the transaction as genuine or fraudulent is then determined by whether this confidence value is higher or lower than the threshold.

A crucial part of ROC analysis is the specification of the confusion matrix, and Table 1 shows the layout of the matrix. The confusion matrix compares actual classification values (rows) against model predictions of fraud (columns). If the model predicted fraud with perfect accuracy, all observations in the confusion matrix would reside in the two cells labelled "True Positive" and "True Negative". The objective is to maximize correct predictions while managing the increase in false alarms. [13]

In this context the False Positive Rate is the ratio of normal spending patterns incorrectly detected as abnormal to the total number of normal spending patterns and is given by:

$$FPR = \frac{FP}{FP + TN} \tag{15}$$

Conversely, the True Positive Rate is the ratio of abnormal spending patterns correctly detected as abnormal to the total number of abnormal spending patterns and is given by:

$$TPR = \frac{TP}{TP + FN} \tag{16}$$


Table 1: Confusion matrix

Actual Classification | Predicted Y | Predicted N

Y | True Positive | False Negative

N | False Positive | True Negative

With the FPR and TPR metrics defined, we introduce another important metric at this point, namely the Youden Index. In medical/biological sciences, the Youden Index (or Youden exponent as defined in this paper) is typically used as a summary measure of the ROC curve. It both measures the effectiveness of a "diagnostic marker" and enables the selection of an optimal threshold value (cutoff point) for the marker. It is defined as J = Sensitivity + Specificity − 1 [14]. Its value ranges from 0 to 1, where a value of 1 indicates that there are no false positives or false negatives, i.e. the test is perfect. (A value of 0 indicates the converse.)

In the context of this paper, when considering the optimal point on the ROC curve, Tao et al define the maximal value of the Youden exponent E as:

$$E = TPR - FPR \tag{17}$$

Then, taking into consideration the cost of false negatives and false positives, the weighted exponent (CE) is defined as:

$$CE = \frac{FNC}{FPC + FNC} \cdot TPR - \frac{FNC}{FPC + FNC} \cdot FPR \tag{18}$$

where FNC is the cost of a false negative and FPC is the cost of a false positive, satisfying the constraints:

$$0 \le FPC \le 1, \quad 0 \le FNC \le 1, \quad FPC + FNC \ne 0 \tag{19}$$

and:

$$FPC = FNC \ne 0 \tag{20}$$

Now when equation [20] holds, equation [18] reduces to:

$$CE = \frac{1}{2}(TPR - FPR) = \frac{1}{2} E \tag{21}$$

The crucial point illustrated here is that the use of the cost-weighted exponent overcomes the inadequacies of setting a threshold without considering error cost.
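The threshold selection implied by equation (17) can be sketched in Python as below. The sketch scans a list of candidate thresholds and keeps the one with the largest E = TPR − FPR. It assumes that a network output at or above the threshold is predicted as fraudulent (label 1); the function names and the candidate grid are hypothetical, and the cost weighting of equation (18) is deliberately left out.

```python
from typing import Sequence, Tuple


def rates(scores: Sequence[float], labels: Sequence[int],
          threshold: float) -> Tuple[float, float]:
    """Return (TPR, FPR) when score >= threshold predicts fraud (label 1)."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("both classes must be present in the labels")
    return tp / (tp + fn), fp / (fp + tn)


def youden_exponent(scores: Sequence[float], labels: Sequence[int],
                    threshold: float) -> float:
    """Equation (17): E = TPR - FPR at the given threshold."""
    tpr, fpr = rates(scores, labels, threshold)
    return tpr - fpr


def best_threshold(scores: Sequence[float], labels: Sequence[int],
                   candidates: Sequence[float]) -> Tuple[float, float]:
    """Return the candidate threshold with maximal E, together with E."""
    best = max(candidates, key=lambda t: youden_exponent(scores, labels, t))
    return best, youden_exponent(scores, labels, best)
```

Each candidate threshold corresponds to one point on the ROC curve, so maximising E picks the point furthest above the diagonal.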


3.5 Results and Analysis

In this section we summarise the experimental results and concluding analysis obtained by the researchers after running test transactions through the FDS.

Firstly, 7000 records of synthetic data were used for training the NNM and 3000 for testing purposes.

Figure 5: Calculation of confidence and classification values per record

The table in Figure 5 shows 10 records of card-holder behaviour attributes (time of transaction, merchant type, business code, etc.) with their associated values (0.56, 0.83, 0.55, etc.) after confidence values were calculated. For each record a target value of 0 (normal) or 1 (abnormal) was assigned based on testing the output $y_i$ against a specified threshold value (see the section on ROC).

We now consider the ROC curve shown in Figure 6. From this the researchers attempt to show that setting the threshold value at 0.4 improves the detection accuracy of the model. Without considering the cost factor (CE) the model obtains the optimal value at the point where the TPR reaches 91.2% and the FPR is 13.55%, providing a reasonably good ratio of true positives to false positives. They can then choose to factor in the cost by computing the threshold value according to equation [21], noting that the optimal threshold value will be adjusted when the relative cost changes.


Figure 6: ROC curve

4 Comparative Analysis

In this section we compare and contrast both papers by identifying the areas of similarity in the approach used, while clearly delineating the differences. Figure 7 provides an illustration of the key areas that will be discussed in this section, where we attempt to show that there is a general concord in the scope, motivation and methodologies applied in both papers whilst expounding on the primary differences in the implementation and techniques applied to solving the fraud detection problem.

4.1 Areas of concordance

4.1.1 Scope

As an introduction to this section, it should be stated that these two papers were carefully selected from a range of available literature on the topic of fraud detection, due to the fact that they both approached this issue from a perspective that incorporated data mining, machine learning and statistical techniques within the context of credit card fraud detection. Therefore the scope of their research was closely aligned, making for an interesting exposition and comparison.

Figure 7: Illustration of various components of both papers

4.1.2 Motivation

As discussed in Section 1, in the retail market environment, e-commerce has rapidly grown and gained popularity due to its ability to facilitate instantaneous transactions. Subsequently, credit card payment has become the most important means of payment due to the rapid development of information technology globally. However, as the usage of credit cards increases, the rate of fraudulent practices is also increasing substantially. Both Abhinav et al. and Tao et al are acutely aware of this problem and its implications for card-holders and especially issuing banks, who face the risk of losing millions in fraud compensation and fines. It is therefore clear that it is this common concern that acts as the motivation for the research presented by both papers.

4.1.3 Methodology

The usage of transaction data to understand the spending pattern of card-holders and to detect credit card fraud is not a new concept and has been largely recognised by researchers in this area (as is evidenced by the extensive literature on fraud detection techniques) as the most effective means of solving the fraud detection problem. It is therefore not surprising that we should find this methodological approach adopted by both Abhinav et al. and Tao et al in their work.


The common approach adopted by both is to model the sequence of op-erations for processing credit card transactions, then test the model by run-ning transactions through them. This subsequently leads to Abhinav et al.and Tao et al introducing TP an FP metrics as a means to ensure the ac-curacy and effectiveness of their fraud detection model. Furthermore, theresearchers analyse historical transactional data and attempt to find incon-sistency in spending pattern as a means of detecting fraudulent transactions.

To ensure the fraud detection model is successful in its primary purpose,Abhinav and Toa place a strong emphasis on training their models withrealistic data that contains a good mix/ratio of fraudulent to genuine trans-actions. However they both have to deal with issues surrounding security,privacy and cost of obtaining real transaction data, hence the need to gen-erate synthetic data.

Lastly, both sets of researchers run test transactions against their trained models and decide whether to accept or reject each transaction based on a specified threshold value.

4.2 Implementation and Technical differences

4.2.1 Modelling the sequence of operations in credit card transaction processing

In the approach proposed by Abhinav et al., the key idea is to model the sequence of operations in credit card transaction processing using an HMM. The details of the items purchased in individual transactions are represented by the underlying finite Markov chain, whose states are not observable. The transactions can only be observed through another stochastic process that produces the sequence of amounts spent in each transaction. In the work of Tao et al., by contrast, the sequence of operations in credit card transaction processing is modelled using a confidence-based neural network and ROC analysis. Confidence values are calculated separately for discrete and continuous input attributes.

4.3 Training Data and Learning Time

The distinctive feature of the approach introduced by Tao et al. in applying neural network techniques to fraud detection is that both the training and the testing data are converted into confidence values before being fed into the neural network model (NNM). These values are scaled to the range [0.0, 1.0], so that each input carries historical information by the time it reaches the network. In the approach of Abhinav et al., once the sequence is formed from the cardholder's transactions, it is passed directly into the HMM without such formatting.
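The exact confidence formulas used by Tao et al. are not reproduced in this comparison, so the sketch below is only a plausible stand-in showing how discrete and continuous attributes might each be mapped into [0.0, 1.0] using a cardholder's history. Both functions, their names and the scaling rules are our own assumptions.

```python
from collections import Counter


def discrete_confidence(history, value):
    """Assumed rule: share of past transactions that used this discrete value."""
    if not history:
        return 0.0
    return Counter(history)[value] / len(history)


def continuous_confidence(history, value):
    """Assumed rule: 1.0 at the historical mean, falling linearly to 0.0
    once the value is a full historical range (max - min) away from it."""
    mean = sum(history) / len(history)
    spread = (max(history) - min(history)) or 1.0  # avoid division by zero
    return 1.0 - min(1.0, abs(value - mean) / spread)


# Hypothetical cardholder history.
merchant_types = ["grocery", "grocery", "fuel", "online"]
amounts = [10.0, 20.0, 30.0]

usual_merchant = discrete_confidence(merchant_types, "grocery")
new_merchant = discrete_confidence(merchant_types, "jewellery")
usual_amount = continuous_confidence(amounts, 25.0)
huge_amount = continuous_confidence(amounts, 500.0)
print(usual_merchant, new_merchant, usual_amount, huge_amount)
```

Whatever the precise formula, the point is that each network input already summarises how typical the attribute is for that cardholder, rather than being a raw transaction field.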

With regard to the training and learning time of the model, it is important to note the differences in the training approaches employed by the two groups of researchers. In the implementation of Abhinav et al., although training is done offline, the learning time of the model can have a strong impact on the scalability of the system, because an HMM has to be trained for every cardholder. Considering that an issuing bank such as HSBC processes millions of transactions for an equally large number of cardholders, this implementation could lead to performance and scalability issues.

In the approach of Tao et al., the challenge to the performance and scalability of the system lies in finding an efficient optimization algorithm for minimizing the cost function via its gradient and subsequently updating the weights. Fortunately, optimization algorithms are well studied, and a large body of techniques is available to address this problem.
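As a rough illustration of the weight update referred to above, the sketch below performs plain gradient descent on a squared-error cost for a single sigmoid unit. It is not the network of Tao et al.; the single unit, the learning rate `lr` and the toy input are all assumptions chosen for brevity.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_step(weights, bias, inputs, target, lr):
    """One gradient-descent update for a single sigmoid unit.

    Cost: E = 0.5 * (out - target)^2. Returns the updated weights and bias,
    together with the cost measured before the update.
    """
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    out = sigmoid(z)
    # Chain rule: dE/dz = (out - target) * sigmoid'(z)
    delta = (out - target) * out * (1.0 - out)
    new_weights = [w - lr * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias - lr * delta
    return new_weights, new_bias, 0.5 * (out - target) ** 2


# Hypothetical training example: two confidence-style inputs labelled as fraud (1.0).
weights, bias = [0.0, 0.0], 0.0
errors = []
for _ in range(200):
    weights, bias, err = train_step(weights, bias, [0.9, 0.1], 1.0, lr=0.5)
    errors.append(err)
print(f"cost before training: {errors[0]:.4f}, after: {errors[-1]:.4f}")
```

More efficient optimizers change how the step is computed, but the structure (evaluate the cost gradient, then adjust the weights) stays the same.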

4.4 Threshold values

In the work of Tao et al., the key insight is to take the costs of false negatives and false positives into consideration. The approach overcomes the inadequacy of setting a threshold without considering error costs by adjusting the optimal threshold value when the relative cost changes. Furthermore, ROC analysis is applied to show that setting the threshold at the optimal value improves the detection accuracy of the model. In the implementation of Abhinav et al., by contrast, the threshold value is learnt empirically during the training stage using the Baum-Welch algorithm, and is then effectively fixed for the card-holder's Markov model.
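The effect of relative error cost on the optimal threshold can be illustrated with a small sketch. The scores, labels and cost values below are invented for illustration, and the exhaustive search over candidate thresholds is our own simplification rather than the procedure used by Tao et al.

```python
def total_cost(scores, labels, threshold, cost_fn, cost_fp):
    """Total misclassification cost when scores >= threshold are flagged as fraud."""
    cost = 0.0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if label == 1 and not flagged:
            cost += cost_fn  # missed fraud (false negative)
        elif label == 0 and flagged:
            cost += cost_fp  # genuine transaction declined (false positive)
    return cost


def best_threshold(scores, labels, cost_fn, cost_fp):
    """Lowest-cost threshold among the observed scores (ties go to the smallest)."""
    candidates = sorted(set(scores))
    return min(candidates,
               key=lambda t: total_cost(scores, labels, t, cost_fn, cost_fp))


# Hypothetical model outputs in [0, 1] with ground truth (1 = fraud).
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 0, 1, 1, 1]

equal_costs = best_threshold(scores, labels, cost_fn=1, cost_fp=1)
costly_misses = best_threshold(scores, labels, cost_fn=10, cost_fp=1)
print(equal_costs, costly_misses)
```

With equal costs the search settles on a threshold of 0.7, but when a missed fraud is ten times as costly as a false alarm the optimal threshold drops to 0.35, so that more transactions are flagged.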

4.5 Acceptance and Declining of incoming transactions

In the implementation of Tao et al., the output of the neural network takes real values between 0 and 1. If the output is below a specified threshold value, the transaction is classified as genuine; if the output is above the threshold, the "probability" that the transaction is fraudulent is considered to be high.

By comparison, in the work of Abhinav et al., after estimating the HMM parameters in the training phase, the authors obtain an initial sequence of symbols from the cardholder's transactions. This sequence is then passed to the HMM to compute its probability of acceptance. Subsequently, if the metric ∆α > 0, they conclude that the sequence is accepted by the HMM with low probability and is therefore potentially fraudulent, provided additional conditions are satisfied.
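The acceptance probability of a symbol sequence under an HMM is normally computed with the forward algorithm, and the sketch below uses it to compare a cardholder's existing sequence with the sequence obtained after a new transaction arrives. The model parameters are made up rather than learnt, and the relative-drop test with its 0.5 cut-off is only an assumed stand-in for the additional conditions mentioned above.

```python
def forward_probability(pi, A, B, observations):
    """P(observations | model) for a discrete HMM, via the forward algorithm.

    pi: initial state probabilities, A: state transition matrix,
    B: emission matrix indexed as B[state][symbol].
    """
    n = len(pi)
    alpha = [pi[i] * B[i][observations[0]] for i in range(n)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs]
            for j in range(n)
        ]
    return sum(alpha)


# Hypothetical two-state model; symbols: 0 = low, 1 = medium, 2 = high spend.
pi = [0.6, 0.4]
A = [[0.7, 0.3],
     [0.4, 0.6]]
B = [[0.70, 0.25, 0.05],
     [0.20, 0.50, 0.30]]

old_sequence = [0, 0, 1, 0]                  # recent spending history
new_symbol = 2                               # incoming high-value transaction
new_sequence = old_sequence[1:] + [new_symbol]

alpha1 = forward_probability(pi, A, B, old_sequence)
alpha2 = forward_probability(pi, A, B, new_sequence)
delta_alpha = alpha1 - alpha2

# Assumed extra condition: the relative drop must reach an illustrative cut-off.
THRESHOLD = 0.5
is_suspicious = delta_alpha > 0 and delta_alpha / alpha1 >= THRESHOLD
print(f"alpha1={alpha1:.6f} alpha2={alpha2:.6f} suspicious={is_suspicious}")
```

Here the unusually high amount makes the new sequence much less likely under the cardholder's model, so ∆α is positive and the transaction is flagged for further checks.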

5 Usage of HMM and NNM in Industry

To place these papers in the context of how the methods discussed are applied, it is helpful to examine one case of general application of these techniques and another related directly to credit card fraud detection. Considering HMMs first, they have applications in a broad range of scientific and mathematical fields where the goal is to recover a data sequence that is not immediately observable (but other data that depend on the sequence are). For example, HMMs are applied in automatic speech recognition, where the model is trained to recognize speech utterances from given observations [15]. They have also been used extensively in biological sequence analysis, where HMMs can be used to solve various problems such as pairwise and multiple sequence alignment, gene annotation and classification [16].

Neural network models are an immensely popular technique for fraud detection and are used by some of the world's largest banks. For example, Santander bank uses a fraud detection system called Falcon Fraud Manager from FICO, which is based heavily on neural models and leverages adaptive analytics. The adaptive model adjusts the base neural network "Falcon score" in response to real-time fraud tactics that were not present when the neural network model was trained [17][18]. FICO analytics software and tools are used across multiple industries to manage risk, fight fraud, build more profitable customer relationships, optimize operations and meet strict government regulations [19].

References

[1] UK Cards Association. http://www.theukcardsassociation.org.uk/wm documents/December 2014 Full Report.pdf

[2] Financial Fraud Action UK. http://www.financialfraudaction.org.uk/downloads.asp?genre=consumer

[3] S. Benson Edwin Raj, A. Annie Portia, Analysis on Credit Card Fraud Detection Methods. International Conference on Computer, Communication and Electrical Technology – ICCCET2011, 18th, 19th March, 2011

[4] Masoumeh Zareapoor, Seeja.K.R, and M.Afshar.Alam, Analysis of Credit Card Fraud Detection Techniques: based on Certain Design Criteria. International Journal of Computer Applications (0975 – 8887) Volume 52 – No.3, August 2012

[5] Khyati Chaudhary, Jyoti Yadav, Bhawna Mallick, A review of Fraud Detection Techniques: Credit Card. International Journal of Computer Applications (0975 – 8887) Volume 45 – No.1, May 2012

[6] Abhinav Srivastava, Amlan Kundu, Shamik Sural and Arun K. Majumdar, Credit Card Fraud detection using Hidden Markov Model. IEEE Transactions on dependable and secure computing VOL. 5, NO. 1, January-March 2008

[7] Tao Guo, Gui-Yang Li, Neural data mining for Credit card Fraud Detection. Proceedings of the Seventh International Conference on Machine Learning and Cybernetics, Kunming, 12-15 July 2008

[8] Yufeng Kou, Chang-Tien Lu, Sirirat Sinvongwattana, Survey of Fraud Detection Techniques. Proceedings of the 2004 IEEE International Conference on Networking, Sensing & Control, Taipei, Taiwan, March 21-23, 2004

[9] Markov Model. http://en.wikipedia.org/wiki/Markov model

[10] S.J. Stolfo, D.W. Fan, W. Lee, A.L. Prodromidis, and P.K. Chan, Credit Card Fraud Detection Using Meta-Learning: Issues and Initial Results. Proc. AAAI Workshop AI Methods in Fraud and Risk Management, pp. 83-90, 1997

[11] Khyati Chaudhary, Jyoti Yadav, Bhawna Mallick, A review of Fraud Detection Techniques: Credit Card. International Journal of Computer Applications (0975 – 8887) Volume 45 – No.1, May 2012

[12] David E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams, Learning representations by back-propagating errors. Nature 323 (6088): 533–536, 8 October 1986

[13] Using Data Mining Techniques for Fraud Detection, A SAS Institute Best Practices Paper. http://www.ag.unr.edu/gf/dm/dmfraud.pdf

[14] Ronen Fluss, David Faraggi, and Benjamin Reiser, Estimation of the Youden index and its associated cutoff point. Biometrical Journal, 2005

[15] HMM Speech Recognition. http://www.fysiskplanering.se/fou/cuppsats.nsf/all/e156a6197d8b0678c1256bbb003f6207

[16] Byung-Jun Yoon, Hidden Markov Models and their Applications in Biological Sequence Analysis. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2766791/

[17] FICO Analytics. http://www.fico.com/en/node/8140?file=5380

[18] FICO Analytics. http://www.fico.com/en/blogs/tag/score-performance/page/5/

[19] FICO Analytics. http://www.fico.com/en/about-us
