

A graph-based, semi-supervised, credit card fraud detection system

Bertrand Lebichot, Fabian Braun, Olivier Caelen and Marco Saerens

Bertrand Lebichot (e-mail: [email protected]) · Marco Saerens (e-mail: [email protected])
Universite Catholique de Louvain, Place des Doyens 1, 1348 Louvain-la-Neuve, Belgium

Fabian Braun (e-mail: [email protected])
Worldline GmbH, R&D, Pascalstrasse 19, 52076 Aachen, Germany

Olivier Caelen (e-mail: [email protected])
Worldline SA/NV, R&D, Chaussee de Haecht 1442, 1130 Bruxelles, Belgium

© Springer International Publishing AG 2017
H. Cherifi et al. (eds.), Complex Networks & Their Applications V, Studies in Computational Intelligence 693, DOI 10.1007/978-3-319-50901-3_57

Abstract Global card fraud losses amounted to 16.31 billion US dollars in 2014 [18]. To limit these losses, automated Fraud Detection Systems (FDS) are used to deny a transaction before it is granted. In this paper, we start from a graph-based FDS named APATE [28]: this algorithm uses collective inference to spread fraudulent influence through a network, starting from a limited set of confirmed fraudulent transactions. We propose several improvements to this approach, drawn from the network data analysis literature [16] and from semi-supervised learning [9]. Furthermore, we redesigned APATE to fit the reality of the e-commerce field. Those improvements have a high impact on performance, multiplying Precision@100 by three, both on fraudulent card and on fraudulent transaction prediction. This new method is assessed on a three-month real-life e-commerce credit card transaction data set obtained from a large credit card issuer.

1 Introduction

E-commerce is becoming more and more important for global trade: sales of goods and services amounted to roughly 2,000 billion dollars in 2014, and it was estimated that, of the 7,223 million people on earth, 20% were e-shoppers [14]. Part of the reason for this success is the ease of online credit card transactions and cross-border purchases. Furthermore, most organizations, companies and government agencies have adopted e-commerce to increase their productivity or efficiency in trading products or services [4].


Of course, e-commerce is used by both legitimate users and fraudsters. The Association of Certified Fraud Examiners (ACFE) defines fraud as "the use of one's occupation for personal enrichment through the deliberate misuse or misapplication of the employing organization's resources or assets" [8].

Global card fraud losses amounted to 16.31 billion US dollars in 2014 and are forecast to continue to increase [18]. These losses have increased the importance of fighting fraud: in a competitive environment, fraud has a serious business impact if it is not managed, and prevention (and repression) procedures must be undertaken.

For those reasons, e-commerce companies and credit card issuers need automated systems that identify incoming fraudulent transactions, or transactions that do not correspond to normal behavior. Data mining and machine learning offer various techniques to find patterns in data; here, the goal is to discriminate between genuine and fraudulent transactions. Such Fraud Detection Systems (FDS) exist and are similar to the detection approaches used in Intrusion Detection Systems (IDS). FDS use misuse-based and anomaly-based approaches to detect fraud [15].

However, several issues and challenges hinder the development of an ideal FDS for e-commerce systems [11], such as:

• Concept drift: fraudsters conceive new fraudulent methods over time. Furthermore, normal behavior also varies with time (peak consumption at Christmas, for instance).

• Six-second rule [28]: the acceptance check must be processed quickly, as the algorithm must decide within six seconds whether a transaction can be pursued.

• Large amount of data: millions of transactions occur per day, all of which have to be analyzed, and acceptance must be granted within seconds.

• Unbalanced data: frauds fortunately represent less than 1% of transactions, but learning a pattern is harder on an unbalanced dataset.

These challenges lead to a high false alert rate, low detection accuracy or slow detection (see [1] for more details).

This work focuses on automatically detecting fraudulent e-commerce transactions using network-related (or graph-related) features. Our work is based on a recent paper [28] which introduced an automated and field-oriented approach to detect fraudulent patterns in credit card transactions by applying supervised data mining techniques. More precisely, this algorithm uses collective inference to spread fraudulent influence through a network, starting from a limited set of confirmed fraudulent transactions, and takes a decision based on risk scores measuring the suspiciousness of transactions, card holders and merchants.

In this paper, several improvements from the graph literature and from semi-supervised learning are proposed. The resulting fraud detection method is tested on a three-month real-life e-commerce credit card transaction data set obtained from a large credit card issuer in Belgium.

The following questions are addressed:

1. Can we enhance existing graph-based FDS in terms of performance?
2. How can we make FDS as suitable for real applications as possible?
3. Is semi-supervised learning [9] or feedback [11] useful for this graph-based FDS?

Our approach takes into account various field realities such as the six-second rule, concept drift, large datasets and unbalanced data. It has also been designed in collaboration with field experts to guarantee its applicability.

The rest of this paper is organized as follows: Section 2 introduces background and notation. Section 3 reviews related work. Section 4 details the proposed contributions. Experimental comparisons are presented and analyzed in Section 5. Finally, Section 6 concludes this paper.

2 Background and Notation

This section first introduces some basic facts about fraud detection, since the behavior of fraudsters has to be taken into account when developing algorithms designed to counter them. Some useful graph notation is then reviewed.

2.1 Frauds

There are many fraud detection domains, but internet e-commerce presents a challenging data mining task (see Section 1) because it blurs the boundaries between fraud detection systems and network intrusion detection systems.

As in many domains, profit-motivated fraudsters interact with the affected business. This interaction is described comprehensively in [2, 24]: the fraudster can be internal or external to the business, can commit fraud either as a customer (consumer) or as a supplier (provider), and has different basic profiles. From this description, it follows that the modus operandi of professional fraudsters (as opposed to occasional ones) changes over time. Therefore, fraud detection algorithms should also adapt themselves to new behaviors. This is referred to as "concept drift": the constant change in fraudsters' behavior.

2.2 Graphs

Consider a weighted directed graph or network, $G$, assumed strongly connected, with a set of $n$ nodes $V$ (or vertices) and a set of edges $E$ (or arcs, links) [6, 22]. The adjacency matrix of the graph, containing non-negative affinities between nodes, is denoted as $A$, with elements $[A]_{ij} = a_{ij} \geq 0$. The natural random walk on $G$ is defined in a standard way. In node $i$, the random walker chooses the next edge to follow according to the reference transition probabilities

$$p_{ij} = \frac{a_{ij}}{\sum_{j'=1}^{n} a_{ij'}} \qquad (1)$$


representing the probability of jumping from node $i$ to node $j \in \mathrm{Succ}(i)$, the set of successor nodes of $i$. The corresponding transition probability matrix will be denoted as $P$. In other words, the random walker chooses to follow an edge with a probability proportional to its affinity (apart from the sum-to-one normalization), therefore favoring edges associated with a large affinity. The matrix $P$, containing the $p_{ij}$, is stochastic and is called the reference transition matrix.
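As an illustration of Eq. (1), the following minimal sketch row-normalizes a small adjacency matrix into a reference transition matrix. The function name `transition_matrix`, the three-node graph and the handling of nodes without outgoing edges are assumptions made for this example; they do not come from the paper.

```python
from typing import List

Matrix = List[List[float]]


def transition_matrix(adjacency: Matrix) -> Matrix:
    """Row-normalize a non-negative adjacency matrix, following Eq. (1)."""
    transitions = []
    for row in adjacency:
        total = sum(row)
        if total == 0:
            # Assumption: a node without outgoing edges keeps an all-zero row.
            transitions.append([0.0] * len(row))
        else:
            transitions.append([a / total for a in row])
    return transitions


# Hypothetical 3-node weighted directed graph, for illustration only.
A = [
    [0.0, 2.0, 2.0],
    [1.0, 0.0, 3.0],
    [4.0, 0.0, 0.0],
]
P = transition_matrix(A)
for row in P:
    print(row)
```

Each row of `P` sums to one, so the walker leaving node 1 of this toy graph moves to node 3 three times as often as to node 0... more precisely, with probability 0.75 against 0.25, in proportion to the affinities 3 and 1.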

3 Related Work

Credit card fraud detection has received a lot of attention, but the number of available publications is limited. Indeed, credit card issuers protect their data sources, and most algorithms are produced in-house, concealing the details of the models [28].

As for any machine learning modeling process, two main approaches can be used: a supervised and an unsupervised scheme. Supervised learning uses labels (the observed outcome of an instance, here the fraud tag) to build the classification model, whereas unsupervised learning simply extracts clusters of similar data that are then processed. Common unsupervised techniques are peer group analysis [29] and self-organizing maps [30], while common supervised techniques are artificial neural networks (ANN), random forests, meta-learning, case-based reasoning, Bayesian belief networks, decision trees, logistic regression, hidden Markov models, association rules, support vector machines, Bayes minimum risk and genetic algorithms. The reader is advised to consult [12] for more details about credit card fraud detection, and [24] for a wider review of fraud detection.

According to [28], APATE was the only model to include network knowledge in the prediction models for fraud detection: this model first builds a tripartite graph (see below) and then extracts relevant risk scores for each node. [28] shows that this information, added to more conventional features, increases the performance of the fraud detection system.

In this work, we follow the methodology of APATE [28] (which is described in this section to make this paper self-contained), and propose several improvements in the next section. Other types of graphs were also investigated (bipartite, ...) but they did not provide better results and are therefore not presented here.

In particular, APATE starts with a set of time-stamped, labeled transactions. The goal is, of course, to fit a model that predicts whether future transactions are fraudulent or genuine. Furthermore, for each transaction of this dataset, the card holder (or user) and the merchant (or retailer) are known. APATE thus creates a tripartite adjacency matrix $A^{\text{tri}}$ (there are three types of nodes: transactions, card holders and merchants) as follows:

$$A^{\text{tri}} = \begin{bmatrix} 0_{t \times t} & A_{t \times c} & A_{t \times m} \\ A_{c \times t} & 0_{c \times c} & 0_{c \times m} \\ A_{m \times t} & 0_{m \times c} & 0_{m \times m} \end{bmatrix} \qquad (2)$$

where $A_{t \times c} = (A_{c \times t})^T$ is an adjacency matrix in which transactions are linked with their corresponding card holders, $A_{t \times m} = (A_{m \times t})^T$ is an adjacency matrix in which transactions are linked with their corresponding merchants, and $0_{\cdot \times \cdot}$ is a correctly sized matrix full of zeros. From $A^{\text{tri}}$, the transition matrix $P$ is derived (see Section 2.2).
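The block structure of Eq. (2) can be sketched as follows. This is only an illustration under stated assumptions: nodes are ordered as transactions, then card holders, then merchants; every link has weight 1 (no time decay yet); and the function name `tripartite_adjacency` and the toy transactions are hypothetical.

```python
from typing import List, Tuple


def tripartite_adjacency(
    transactions: List[Tuple[int, int]], n_cards: int, n_merchants: int
) -> List[List[float]]:
    """Build A_tri for transactions given as (card_index, merchant_index).

    Node order (an assumption of this sketch): all transactions first,
    then all card holders, then all merchants.
    """
    t = len(transactions)
    size = t + n_cards + n_merchants
    a_tri = [[0.0] * size for _ in range(size)]
    for trx, (card, merchant) in enumerate(transactions):
        c = t + card                 # position of the card holder node
        m = t + n_cards + merchant   # position of the merchant node
        # Symmetric links: transaction <-> card holder, transaction <-> merchant.
        a_tri[trx][c] = a_tri[c][trx] = 1.0
        a_tri[trx][m] = a_tri[m][trx] = 1.0
    return a_tri


# Three hypothetical transactions between 2 card holders and 2 merchants.
trx_list = [(0, 0), (0, 1), (1, 1)]
A_tri = tripartite_adjacency(trx_list, n_cards=2, n_merchants=2)
for row in A_tri:
    print(row)
```

Only the transaction/card-holder and transaction/merchant blocks are filled; every other block stays at zero, as in Eq. (2).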

A column vector $\mathbf{r}_0 = [\mathbf{r}_0^{\text{Trx}}, \mathbf{r}_0^{\text{CH}}, \mathbf{r}_0^{\text{Mer}}]^T$ of length equal to the total number of transactions (hence the superscript Trx), card holders (CH) and merchants (Mer) is also created. The vector is full of zeros, except for known fraudulent transactions, where it is equal to one (it is therefore always zero for merchants and card holders). Finally, element $k$ of the vector $\mathbf{r}_0$ is denoted $[\mathbf{r}_0]_k$.

Then, in a convergence procedure similar to the PageRank algorithm [23], $\mathbf{r}_0$ is updated to spread the fraud label through the tripartite graph. This is known as a random walk with restart (RWWR) procedure [19]:

$$\mathbf{r}_k = \alpha \, P^T \mathbf{r}_{k-1} + (1 - \alpha) \, \mathbf{r}_0 \qquad (3)$$

where $\alpha$ is the probability of continuing the walk and $(1 - \alpha)$ is the probability of restarting the walk from a fraudulent transaction. This parameter could be tuned, but was fixed to 0.85 in the experimental comparisons (see [23]). The procedure diffuses the information about the transactions through the network.
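The iteration of Eq. (3) can be sketched in a few lines. The function name `rwwr`, the stopping tolerance and the three-node example (two transactions sharing one card holder) are assumptions of this sketch; a real deployment would use sparse matrices rather than nested lists.

```python
from typing import List


def rwwr(P: List[List[float]], r0: List[float], alpha: float = 0.85,
         tol: float = 1e-10, max_iter: int = 1000) -> List[float]:
    """Iterate r_k = alpha * P^T r_{k-1} + (1 - alpha) * r_0 until it stabilizes."""
    n = len(r0)
    r = list(r0)
    for _ in range(max_iter):
        # [P^T r]_j = sum_i p_ij * r_i
        new = [
            alpha * sum(P[i][j] * r[i] for i in range(n)) + (1 - alpha) * r0[j]
            for j in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new, r)) < tol:
            return new
        r = new
    return r


# Hypothetical graph: nodes 0 and 1 are transactions, node 2 is their card holder.
P = [
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0],
]
r0 = [1.0, 0.0, 0.0]  # only transaction 0 is a confirmed fraud
scores = rwwr(P, r0)
print(scores)
```

In this toy case the unlabeled transaction 1 receives a non-zero score only because it shares a card holder with the fraudulent transaction 0, which is the effect the diffusion is meant to produce.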

Eq. 3 is iterated until convergence. Then, from $\mathbf{r}_{k_c}$ (where $k_c$ stands for $k$ at convergence), $\mathbf{r}_{k_c}^{\text{Trx}}$, $\mathbf{r}_{k_c}^{\text{CH}}$ and $\mathbf{r}_{k_c}^{\text{Mer}}$ can be extracted and considered as risk measures for each transaction, card holder and merchant, respectively.

As fraud detection models should adapt dynamically to a changing environment, this procedure is repeated several times, introducing a time decay factor. Each non-zero entry of $A^{\text{tri}}$ and $\mathbf{r}_0$ is modified to characterize transactions based on current and normal past customer behavior (see [28] for more details):

$$\begin{cases} [A^{\text{tri}}]_{ij} \leftarrow e^{-\gamma \, t([A^{\text{tri}}]_{ij})} & \text{or } 0 \text{ if no relation} \\ [\mathbf{r}_0]_k \leftarrow e^{-\gamma \, t([\mathbf{r}_0]_k)} & \text{or } 0 \text{ if no fraud} \end{cases} \qquad (4)$$

where $t(\cdot)$ is the (scalar) time at which the transaction between $i$ and $j$ in matrix $A^{\text{tri}}$ occurred (or transaction $k$ for vector $\mathbf{r}_0$), and $\gamma$ is a scalar set in such a way that the half-life of the exponential is one day, one week or one month (i.e. elements are equal to 0.5 at half-life). For instance, if a transaction occurred two weeks ago, the corresponding element of $A^{\text{tri}}$ is equal to 0.25 with week decay and to $1/2^{14}$ with day decay.
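The worked example above can be checked numerically. In this sketch, $t(\cdot)$ is read as the age of the transaction in days, and $\gamma = \ln 2 / \text{half-life}$ follows from requiring a weight of 0.5 at half-life; the function names are hypothetical.

```python
import math


def decay_rate(half_life_days: float) -> float:
    """Gamma such that exp(-gamma * half_life_days) equals 0.5."""
    return math.log(2) / half_life_days


def decayed_weight(age_days: float, half_life_days: float) -> float:
    """Weight of a link (or fraud label) whose transaction is age_days old."""
    return math.exp(-decay_rate(half_life_days) * age_days)


# A transaction that occurred two weeks (14 days) ago.
week_decay = decayed_weight(14, half_life_days=7)  # two half-lives
day_decay = decayed_weight(14, half_life_days=1)   # fourteen half-lives
print(week_decay, day_decay)
```

With a one-day half-life the two-week-old transaction is practically erased, while it still carries a quarter of its original weight under week decay.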

Therefore, for each transaction of our starting dataset, we have 12 new features: a risk score for the transaction, for the card holder and for the merchant, each computed for four time windows (no decay, day decay, week decay and month decay).

However, this procedure cannot be computed in less than a few minutes, which is incompatible with the six-second rule. Convergence on a graph with millions of nodes is expensive, and the model is therefore re-estimated daily, overnight. Transactions made during the testing day are evaluated using the model trained during the previous night. For card holders and merchants, the graph-based feature values are extracted (looked up) from the trained model, since they are likely to be part of the previous data.

Naturally, for a new transaction that is not part of the model, the transaction-based features have to be estimated, which is done through the formula:

$$\text{score}(\text{Trx}_{i,k}) = \frac{1}{\sum_{j=1}^{n} p_{ji} + 1} \, \text{score}(\text{Mer}_i) + \frac{1}{\sum_{j=1}^{m} p_{jk} + 1} \, \text{score}(\text{CH}_k) \qquad (5)$$

where $\text{score}(\text{Trx}_{i,k})$ stands for the score of the new transaction between merchant $i$ and card holder $k$, $\text{score}(\text{Mer}_i)$ stands for the score of merchant $i$ and $\text{score}(\text{CH}_k)$ stands for the score of card holder $k$. It represents the score of a new transaction $l$ after one new iteration of Eq. 3 when this transaction is added to $P$ (with $p_{li} = 1$ and $p_{lk} = 1$). If a new transaction involves a new merchant and/or a new card holder, $\text{score}(\text{Mer}_i)$ and/or $\text{score}(\text{CH}_k)$ are set to zero accordingly.
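A possible reading of Eq. (5) is sketched below. It assumes that each sum runs over the column of $P$ belonging to the merchant (or card holder), i.e. over the transition probabilities pointing to that node, and that an unknown merchant or card holder is passed as `None`. The function name, the node indices and the scores are hypothetical.

```python
from typing import List, Optional


def new_transaction_score(P: List[List[float]], scores: List[float],
                          merchant: Optional[int],
                          card_holder: Optional[int]) -> float:
    """Score of a transaction that was not in the graph when P was built."""

    def contribution(node: Optional[int]) -> float:
        if node is None:
            # New merchant or card holder: its score is set to zero.
            return 0.0
        # Assumption: sum of the transition probabilities pointing to the node.
        incoming = sum(row[node] for row in P)
        return scores[node] / (incoming + 1.0)

    return contribution(merchant) + contribution(card_holder)


# Hypothetical model: node 0 is a transaction, node 1 its card holder,
# node 2 its merchant; the scores are made up for the example.
P = [
    [0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
]
scores = [0.4, 0.3, 0.3]
known = new_transaction_score(P, scores, merchant=2, card_holder=1)
new_merchant = new_transaction_score(P, scores, merchant=None, card_holder=1)
print(known, new_merchant)
```

The second call shows the effect discussed later in the paper: when the merchant is unknown, its contribution vanishes and the transaction score drops.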

Finally, those 12 new features (plus transaction-related features, see Table 1) are fed to a random forest classification model, as this model proved to perform well for the problem at hand, namely predicting fraudulent transactions [3, 12].

Table 1: Features used by the random forest classifier. The first group contains demographic features and the second group contains graph-based features. Notice that each transaction is linked with a card holder and with a merchant at a certain date: this information is only used to build the tripartite graph.

| Variable name | Description | Group |
|---|---|---|
| inBEL/EURO/OTH | Issuing region: Belgium/Europe/World | Demographic |
| TX AMOUNT | Amount of transaction | Demographic |
| TX 3D SECURE | Transaction used 3D secure | Demographic |
| AGE | Age of card holder | Demographic |
| langNED/FRE/OTH | Card holder language: Dutch/French/Other | Demographic |
| isMAL/FEM | Card holder is Male/Female | Demographic |
| isFoM | Card holder gender unknown | Demographic |
| BROKER | Code of card provider | Demographic |
| cardMCD/VIS/OTH | Card is a Mastercard/Visa/Other | Demographic |
| 01 Mer score | Merchant risk score (boolean, no time damping) | Graph-based |
| ST/MT/LT Mer score | Day/week/month decay merchant risk score (3 features) | Graph-based |
| 01 CH score | Card holder risk score (boolean, no time damping) | Graph-based |
| ST/MT/LT CH score | Day/week/month decay card holder risk score (3 features) | Graph-based |
| 01 Trx score | Transaction risk score (boolean, no time damping) | Graph-based |
| ST/MT/LT Trx score | Day/week/month decay transaction risk score (3 features) | Graph-based |
| TX FRAUD | Target variable: Fraud/Genuine | Target |

4 The Proposed Model

While showing good performance, APATE can be improved in various ways.

4.1 Dealing with hubs

From the literature, it is known that the presence of hubs in a network can harm the classifier [17, 25, 26]: hubs are nodes having a high degree and are therefore neighbors of a large number of nodes. In our dataset, they correspond to popular nodes such as popular online shops like Amazon (given as an example only, since the dataset is anonymised). Those nodes tend to accumulate a high risk score since they are connected to a lot of transactions. A simple way to counterbalance this accumulation is to divide the risk score by the node degree after convergence. In general, it is possible to divide by any power of the node degree and/or by different powers for the three types of nodes of the tripartite graph (transactions, card holders and merchants). In practice, however, we did not find any combination that significantly beats the simple divide-by-node-degree option (results are not reported here).
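The divide-by-degree correction can be illustrated as follows. The function name `damp_hubs`, the `power` parameter (1 gives the simple divide-by-node-degree option) and the numbers are assumptions of this sketch; the degree is taken to be the sum of the node's affinities.

```python
from typing import List


def damp_hubs(scores: List[float], degrees: List[float],
              power: float = 1.0) -> List[float]:
    """Divide each converged risk score by (a power of) its node degree."""
    return [
        score / degree ** power if degree > 0 else 0.0
        for score, degree in zip(scores, degrees)
    ]


# Hypothetical merchants: a hub with 600 transactions, a small shop with 2.
raw_scores = [0.60, 0.02]
degrees = [600.0, 2.0]
damped = damp_hubs(raw_scores, degrees)
print(damped)
```

Before damping, the hub looks thirty times riskier than the small shop; after damping, the small shop ranks higher, because its risk is concentrated on far fewer transactions.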

Furthermore, it allows us to make a link with the regularized commute time kernel, $K = (D - \alpha A)^{-1}$ (where $D$ is the degree matrix): element $i, j$ of this kernel can be interpreted as the discounted cumulated probability of visiting node $j$ when starting from node $i$ (see [16, 21, 31] for details). The (scalar) parameter $\alpha \in \, ]0, 1]$ corresponds to an evaporating or killed random walk where the random walker has a $(1 - \alpha)$ probability of disappearing at each step (it therefore has the same interpretation as for the RWWR used in APATE, see Section 3). This method provided the best results in a recent comparative study on semi-supervised classification [16] and the second best results in another one [20]. In practice, the efficient implementation proposed in [21], Equation (22), for semi-supervised classification with the regularized commute time kernel is used, and is referred to as RCTK.
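To make the kernel concrete, the sketch below computes $K \mathbf{r}_0$ by solving $(D - \alpha A)\,\mathbf{x} = \mathbf{r}_0$ with a plain Jacobi iteration. This is not the efficient implementation of [21]; it is a naive stand-in that assumes $\alpha < 1$, takes the degree as the row sum of $A$, and assumes that scores are obtained by applying the kernel to the fraud indicator vector. All names and the toy graph are hypothetical.

```python
from typing import List


def rctk_scores(adjacency: List[List[float]], r0: List[float],
                alpha: float = 0.85, tol: float = 1e-12,
                max_iter: int = 10000) -> List[float]:
    """Solve (D - alpha * A) x = r0, i.e. x = K r0, by Jacobi iteration."""
    n = len(r0)
    degrees = [sum(row) for row in adjacency]
    x = [0.0] * n
    for _ in range(max_iter):
        new = []
        for i in range(n):
            if degrees[i] == 0:
                # Assumption: isolated nodes simply get a zero score.
                new.append(0.0)
                continue
            neighbours = sum(adjacency[i][j] * x[j] for j in range(n))
            new.append((r0[i] + alpha * neighbours) / degrees[i])
        if max(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    return x


# Hypothetical graph: transactions 0 and 1 share card holder 2.
A = [
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
r0 = [1.0, 0.0, 0.0]
x = rctk_scores(A, r0)
print(x)
```

Each update divides by the node degree, which is where the hub damping of this kernel comes from; the iteration converges here because $\alpha D^{-1} A$ has spectral radius below one when $\alpha < 1$.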

4.2 Introducing a time gap

On the other hand, unlike in [28], the model cannot be based on the past few days. Indeed, fraudulent transaction tags (the variable we want to predict) cannot be known with certainty without feedback from human investigators. Moreover, since the fraudsters' modus operandi is known to change over time (see Section 2.1), it is not acceptable to build our model on old, less reliable (but fully inspected) data. However, it takes several days to inspect all transactions, mainly because it is sometimes the card holders who report undetected frauds. Of course, this makes our fraud detection problem harder [10].

In agreement with field experts, we designed a real-life scenario containing three sets of data:

1. Training set: data where the transaction fraud labels can be taken as reliable.
2. Gap set: data where the transaction fraud labels are unknown.
3. Test set: data of the day on which the algorithm is currently tested.

It therefore corresponds to a semi-supervised learning (SSL) scheme, as the training data are partially labeled. If the Gap set is left aside entirely, this is again a usual supervised learning (SL) problem. Both cases (SL and SSL) were investigated:

• For the SL scheme, only the Training set is used to build the graph, and only the Training set is used to train the random forest.

• For the SSL scheme, the Training set and the Gap set are used to build the graph, and only the Training set is used to train the random forest.

Once again, in agreement with field experts, 15 days of training data and seven days for the gap set were chosen [5, 11]. This scenario is depicted in Fig. 1. Notice that in this figure, τ controls the testing day and that models are systematically built (overnight) on the 22 previous days. By changing τ, we get different testing days.

[Figure 1: timeline with marks at τ−22, τ−7 and τ. The training set spans τ−22 to τ−7, the gap set spans τ−7 to τ, and the test set is the single day starting at τ.]

Fig. 1: Real-life FDS scenario with three sets of data. It takes several days to inspect all transactions, mainly because it is sometimes the card holder who reports undetected frauds. Hence, in practice, the fraud tags of the Gap set are unknown. This scenario is repeated each day, as the parameter τ is incremented.
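The sliding train/gap/test scenario can be sketched with half-open date intervals. The function name `split_windows`, the dictionary layout and the calendar date are assumptions for illustration; only the 15-day and 7-day lengths come from the text.

```python
from datetime import date, timedelta
from typing import Dict, Tuple

Window = Tuple[date, date]  # half-open interval [start, end)


def split_windows(tau: date, train_days: int = 15,
                  gap_days: int = 7) -> Dict[str, Window]:
    """Return the training, gap and test windows for testing day tau."""
    gap_start = tau - timedelta(days=gap_days)
    train_start = gap_start - timedelta(days=train_days)
    return {
        "train": (train_start, gap_start),       # labels taken as reliable
        "gap": (gap_start, tau),                 # labels unknown
        "test": (tau, tau + timedelta(days=1)),  # the day being scored
    }


# Hypothetical testing day; incrementing tau slides all three windows.
windows = split_windows(date(2016, 3, 23))
for name, (start, end) in windows.items():
    print(name, start, end)
```

In the SL scheme only the `train` window would feed the graph, while in the SSL scheme the graph would be built on `train` and `gap` together; in both cases the random forest is trained on `train` only.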

4.3 Including investigators' feedback

Even if, in this last scenario, it is not possible to know all fraud tags for the gap set, it is still conceivable that a fraction of previous alerts has been confirmed or overturned by human investigators (typically, when a fraud alert occurs, the card is blocked and the card holder is contacted by phone). In our case, we set this number of feedbacks to 100 per day, in agreement with field experts. It is a realistic average number of cards that a human investigator can check per day, usually by contacting the card holder. So each day, the 100 most probable fraudulent cards (according to the model) are checked and then used as feedback. Hence, in each of our gap sets (except in the starting condition), 700 cards have been checked by human investigators. We take advantage of these investigated cases to try to predict fraudulent transactions more accurately. On average, this means that roughly 1400 transaction feedbacks (two transactions per card) from previous testing days (previous τ's of our model) are available. This option will be referred to as +FB and only makes sense in an SSL scheme.

4.4 Removing merchant scores

Finally, we observed that removing merchant scores raises the performance. This is surprising at first glance but, after investigation, it turns out that new transactions involving new merchants cause issues (with our set-up, this corresponds to roughly 20% of merchants). In this case, the risk score is set to zero, causing the method to underestimate the risk. This should clearly be tackled, but we leave it for further work. This last option will be referred to as noM.

5 Experimental comparisons

In this section, the possible variations of the considered algorithms are compared on a real-life e-commerce credit card transaction data set obtained from a large credit card issuer in Belgium. Those graph-based algorithms compute additional features and were presented in Sections 3 and 4. For practical purposes, the considered algorithms are recalled in Table 2, and the classifier is always a random forest with 400 trees.

The database is composed of 25,445,744 transactions spread over 139 days, and the fraud ratio is 0.31%. The list of features can be found in Table 1. In this table, the first group contains socio-demographic features, which are taken as-is. The second group contains the graph-based features described in Sections 3 and 4. Notice that each transaction is linked with a card holder and with a merchant at a certain date: those three pieces of information (card holder, merchant and date) are used to build the tripartite graph. Finally, this database does not focus on a certain type of card fraud (stolen, card-not-present, ...) but contains all reported fraudulent transactions in this time period.

Table 2: The nine compared models (see Sections 3 and 4 for acronyms). The considered variations of the APATE algorithm differ along four dimensions: merchant score status, hubs status, learning scheme and use of feedback. Precision@100 (see Section 5), both for fraudulent card and for fraudulent transaction prediction, is also reported (formatted as mean ± std).

| Classifier name | Mer Score | Damp hubs | Learning | Feedback | Card Pr@100 | Trx Pr@100 |
|---|---|---|---|---|---|---|
| RWWR SL = APATE | used | no | Supervised | no | 18.64 ± 4.66 | 27.78 ± 11.61 |
| RWWR SSL | used | no | Semi-supervised | no | 16.95 ± 4.46 | 20.85 ± 10.14 |
| RWWR SSL +FB | used | no | Semi-supervised | yes | 14.19 ± 4.43 | 13.89 ± 8.49 |
| RCTK SL | used | yes | Supervised | no | 23.78 ± 9.52 | 40.50 ± 18.00 |
| RCTK SSL | used | yes | Semi-supervised | no | 44.55 ± 9.55 | 50.58 ± 13.99 |
| RCTK SSL +FB | used | yes | Semi-supervised | yes | 37.15 ± 10.14 | 49.06 ± 14.70 |
| RCTK noM SL | discarded | yes | Supervised | no | 45.35 ± 9.06 | 62.25 ± 11.97 |
| RCTK noM SSL | discarded | yes | Semi-supervised | no | 56.08 ± 8.06 | 81.61 ± 9.00 |
| RCTK noM SSL +FB | discarded | yes | Semi-supervised | yes | 56.65 ± 8.69 | 84.13 ± 8.42 |

[Figure 2: Friedman/Nemenyi test for Cards Prec@100. The horizontal axis gives the mean rank (0 to 9) of each compared method: RCTK noM SSL +FB, RCTK noM SSL, RCTK noM SL, RCTK SSL +FB, RCTK SSL, RCTK SL, RWWR SSL +FB, RWWR SSL, RWWR SL.]

Fig. 2: Mean rank (circles and crosses) and critical difference (plain line) of the Friedman/Nemenyi test, obtained on a three-month real-life e-commerce credit card transaction data set. The blue (bottom circle) method has the best mean rank and is significantly better than the red (crosses) methods. The critical difference is 1.14. The performance metric is Pr@100 (Precision@100) on fraudulent card prediction.

As a performance indicator, Precision@100 [27] was chosen, in agreement with field experts. It means that the 100 most probable (according to the models) fraudulent transactions are checked by human investigators each day (and added as feedback in RWWR SSL +FB, RCTK SSL +FB and RCTK noM SSL +FB). Similarly, the most probable fraudulent transactions are considered until 100 cards have been checked, as human investigators usually verify all transactions of a card when they investigate. Precision@100 reports the number of truly fraudulent transactions or cards among the 100 investigated ones. Notice that this last, card-based metric is more realistic, as it roughly corresponds to the normal workload of a team of human investigators.
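The card-based Precision@100 described above can be sketched as follows. The function name, the tuple layout `(score, card_id, is_fraud)` and the rule that a card counts as fraudulent when at least one of its transactions is fraudulent are assumptions of this example; the result is returned as a fraction rather than a percentage.

```python
from typing import List, Tuple

Transaction = Tuple[float, str, bool]  # (model score, card id, is fraud)


def card_precision_at_k(transactions: List[Transaction], k: int = 100) -> float:
    """Fraction of fraudulent cards among the k first cards reached when
    going through transactions by decreasing model score."""
    ranked = sorted(transactions, key=lambda trx: trx[0], reverse=True)
    checked: List[str] = []
    seen = set()
    for _, card, _ in ranked:
        if card not in seen:
            seen.add(card)
            checked.append(card)
            if len(checked) == k:
                break
    # Assumption: a card is fraudulent if any of its transactions is.
    fraud_cards = {card for _, card, is_fraud in transactions if is_fraud}
    if not checked:
        return 0.0
    return sum(1 for card in checked if card in fraud_cards) / len(checked)


# Hypothetical scored transactions for one testing day.
day = [
    (0.9, "card_a", True),
    (0.8, "card_a", False),
    (0.7, "card_b", False),
    (0.6, "card_c", True),
    (0.1, "card_d", False),
]
print(card_precision_at_k(day, k=2))
```

With `k=2`, the investigators reach `card_a` and `card_b`; only the first is fraudulent, so the card precision is 0.5 even though two of the three top-scored transactions belong to the same card.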

Figure 2 compares the methods of Table 2 through a Friedman/Nemenyi test [13]. To do so, we adopt a sliding window approach: each day (a different τ in Fig. 1) is considered as a different (train-gap-test) dataset. This test compares the rankings of the methods of Table 2. The Friedman null hypothesis is rejected with α = 0.05, and the Nemenyi critical difference is equal to 1.14. A method is considered significantly better than another if its mean rank is larger by more than this amount.
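The ranking step and the critical difference can be sketched as below. The sketch assumes the usual Nemenyi formula $CD = q_\alpha \sqrt{k(k+1)/(6N)}$ for $k$ methods and $N$ datasets, with $q_{0.05}$ taken from the standard table of critical values (to be checked against [13]); ranks are assigned so that a larger rank means a better method, matching the convention above. The function names and the small result matrix are hypothetical.

```python
import math
from typing import List

# Standard tabulated critical values q_0.05 for the Nemenyi test (assumed here).
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
               7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}


def mean_ranks(results: List[List[float]]) -> List[float]:
    """results[d][m] is the performance of method m on day d (larger is better).
    Rank 1 is the worst method of the day; ties share the average rank."""
    k = len(results[0])
    totals = [0.0] * k
    for day in results:
        order = sorted(range(k), key=lambda m: day[m])
        i = 0
        while i < k:
            j = i
            while j + 1 < k and day[order[j + 1]] == day[order[i]]:
                j += 1
            average_rank = (i + j) / 2 + 1
            for m in order[i:j + 1]:
                totals[m] += average_rank
            i = j + 1
    return [total / len(results) for total in totals]


def nemenyi_cd(k: int, n_datasets: int) -> float:
    """Critical difference for k methods compared on n_datasets datasets."""
    return Q_ALPHA_005[k] * math.sqrt(k * (k + 1) / (6.0 * n_datasets))


# Hypothetical Pr@100 values of three methods on two testing days.
toy = [[0.2, 0.5, 0.9], [0.1, 0.4, 0.4]]
print(mean_ranks(toy))
print(nemenyi_cd(k=9, n_datasets=100))
```

Two methods would then be declared significantly different when their mean ranks differ by more than the value returned by `nemenyi_cd`.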

Firstly, RCTK always beats RWWR; RWWR noM was therefore discarded. This superiority indicates that tackling the hubs problem is actually important.

Secondly, SSL leads to a huge improvement, but only if hubs have been damped. Compared to SL, the frauds predicted by SSL tend to contain more frauds with fraudulent activity during the gap days. As the fraud tag is hidden for the gap set, this means that this information is obtained by network analysis (train+gap).

Thirdly, even if +FB brings additional information, it only increases performance when hubs are tackled (RCTK) and merchant scores are removed (noM). Moreover, the results are not significantly better on our three-month dataset. Further analysis (not reported here) shows that with more days of data and more checked cards, this improvement becomes significant (with α = 0.05).

Lastly, removing merchant scores raises performance, as explained in Section 4.

Overall, the best combination is RCTK noM SSL +FB, but it is not significantly better than RCTK noM SSL.

Finally, Figure 3 indicates how frequently each feature is selected by the random forest classifier. The method is RCTK SSL +FB, and it selects the Mer scores most often. Unfortunately, new transactions involving new merchants cause issues. In this case, the risk score is set to zero, causing the method to underestimate the risk, which results in a biased prediction. Discarding those four features (Mer scores) does increase the overall performance, and the selected variables of the random forests stay similarly distributed.

[Figure 3: bar chart titled "Feature frequency". Vertical axis: number of selections [%], from 0 to 7. Features along the horizontal axis, in order: MT Mer score, ST Mer score, 01 Mer score, LT Mer score, ST CH score, MT CH score, MT Trx score, ST Trx score, BROKER, 01 Trx score, LT Trx score, 01 CH score, LT CH score, AGE, TX AMOUNT, cardOTH, langOTH, isFoM, inOTHERS, cardVIS, TX 3D SEC., langNED, cardMCD, inEURO, isFEM, inBEL, langFRE, isMAL.]

Fig. 3: Selected variables of random forests for the RCTK SSL +FB model for all days. Mer scores tend to bias the prediction. Discarding those four features does increase the overall performance (see Figure 2), and the selected variables of the random forests stay similarly distributed.

6 Conclusion

In this paper, we start from an existing Fraud Detection System (FDS), APATE, and bring several improvements which have a large impact on performance: damping hub nodes (RCTK), introducing restrictions due to real application (SSL, Gap set, Pr@100 as a metric), and introducing feedback information from human investigators (+FB). Those improvements multiply the Pr@100 by three, both on fraudulent card and transaction prediction (for acronyms, see Section 4).
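For readers unfamiliar with the metric, the following is a minimal sketch of Precision@k, assuming the usual definition: the fraction of confirmed frauds among the k highest-scored cards or transactions. The function name and the choice to divide by k even when fewer than k items are available are assumptions of this sketch, not details taken from the paper.

```python
def precision_at_k(scores, labels, k=100):
    """Fraction of true frauds among the k items with the highest risk score.

    scores: risk scores, higher means more suspicious.
    labels: 1 (or True) for confirmed fraud, 0 (or False) otherwise.
    The denominator is always k, i.e. the number of items investigators
    are assumed to be able to check.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    # Rank items from most to least suspicious and keep the top k.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(1 for _, label in ranked[:k] if label)
    return hits / k
```

For example, `precision_at_k([0.9, 0.8, 0.1, 0.7], [1, 0, 1, 1], k=2)` ranks the items scored 0.9 and 0.8 first, of which one is a fraud, and so returns 0.5.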

However, introducing feedback does not lead to a significant improvement: feedback impact can be increased if more cards are checked, but this is unrealistic for investigators. New transactions involving new merchants are still an issue (see noM in Section 4), which is left for further work: a possible way would be to mimic the learning procedure from [7]. Another envisaged direction is to introduce semi-supervised learning not only in the graph analysis but also in the main classifier.

Acknowledgements This work was partially supported by the Immediate and Brufence projects funded by Innoviris. We thank this institution for giving us the opportunity to conduct both fundamental and applied research.

References

[1] Abdallah, A., Maarof, M.A., Zainal, A.: Fraud detection system. Journal of Network and Computer Applications 68, 90–113 (2016)

[2] Baesens, B., Van Vlasselaer, V., Verbeke, W.: Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques: A Guide to Data Science for Fraud Detection. Wiley Publishing (2015)

[3] Bhattacharyya, S., Jha, S., Tharakunnel, K., Westland, J.C.: Data mining for credit card fraud: A comparative study. Decision Support Systems 50(3), 602–613 (2011)

[4] Bolton, R., Hand, D.: Statistical fraud detection: A review. Statistical Science 17, 235–249 (2002)

[5] Bolton, R.J., Hand, D.J.: Unsupervised profiling methods for fraud detection. In: Proceedings of the Credit Scoring and Credit Control VII Conference, pp. 235–255 (2001)

[6] Brandes, U., Erlebach, T.: Network analysis: methodological foundations. Springer-Verlag (2005)

[7] Braun, F., Caelen, O., Smirnov, E., Kelk, S., Lebichot, B.: Improving card fraud detection through suspicious pattern discovery. Submitted for publication (2016)

[8] Association of Certified Fraud Examiners: Report to the nation (2002). URL http://www.acfe.com/uploadedFiles/ACFE_Website/Content/documents/2002RttN.pdf

[9] Chapelle, O., Scholkopf, B., Zien, A.: Semi-supervised learning. MIT Press (2006)

[10] Dal Pozzolo, A.: Adaptive machine learning for credit card fraud detection. Ph.D. thesis, Universite Libre de Bruxelles (2015)

[11] Dal Pozzolo, A., Boracchi, G., Caelen, O., Alippi, C., Bontempi, G.: Credit card fraud detection and concept-drift adaptation with delayed supervised information. In: Proceedings of the International Joint Conference on Neural Networks, pp. 1–8. IEEE (2015)

[12] Dal Pozzolo, A., Caelen, O., Le Borgne, Y.A., Waterschoot, S., Bontempi, G.: Learned lessons in credit card fraud detection from a practitioner perspective. Expert Systems with Applications 41(10), 4915–4928 (2014)

[13] Demsar, J.: Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research 7, 1–30 (2006)

[14] Ecommerce Europe: Global B2C e-commerce light report 2015 (2014). URL https://www.ecommerce-europe.eu/facts-figures/free-light-reports

[15] Fawcett, T., Provost, F.: Adaptive fraud detection. Data Mining and Knowledge Discovery 1, 291–316 (1997)

[16] Fouss, F., Francoisse, K., Yen, L., Pirotte, A., Saerens, M.: An experimental investigation of kernels on a graph on collaborative recommendation and semisupervised classification. Neural Networks 31, 53–72 (2012)

[17] Hara, K., Suzuki, I., Shimbo, M., Kobayashi, K., Fukumizu, K., Radovanovic, M.: Localized centering: Reducing hubness in large-sample data. In: Proceedings of the Association for the Advancement of Artificial Intelligence Conference, pp. 2645–2651 (2015)

[18] HSN Consultants, Inc.: The Nilson Report (2015). URL https://www.nilsonreport.com/publication_newsletter_archive_issue.php?issue=1068

[19] Kemeny, J.G., Snell, J.L.: Finite Markov Chains. Springer-Verlag (1976)


[20] Lebichot, B., Kivimaki, I., Francoisse, K., Saerens, M.: Semi-supervised classification through the bag-of-paths group betweenness. IEEE Transactions on Neural Networks and Learning Systems 25, 1173–1186 (2014)

[21] Mantrach, A., van Zeebroeck, N., Francq, P., Shimbo, M., Bersini, H., Saerens, M.: Semi-supervised classification and betweenness computation on large, sparse, directed graphs. Pattern Recognition 44(6), 1212–1224 (2011)

[22] Newman, M.: Networks: an introduction. Oxford University Press (2010)

[23] Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank citation ranking: Bringing order to the web. Technical Report 1999-66, Stanford InfoLab (1999). Previous number = SIDL-WP-1999-0120

[24] Phua, C., Lee, V., Smith-Miles, K., Gayler, R.: A comprehensive survey of data mining-based fraud detection research. Computing Research Repository abs/1009.6119 (2010)

[25] Radovanovic, M., Nanopoulos, A., Ivanovic, M.: Hubs in space: Popular nearest neighbors in high-dimensional data. Journal of Machine Learning Research 11, 2487–2531 (2010)

[26] Radovanovic, M., Nanopoulos, A., Ivanovic, M.: On the existence of obstinate results in vector space models. In: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '10, pp. 186–193. ACM (2010)

[27] Theodoridis, S., Koutroumbas, K.: Pattern recognition, 4th ed. Academic Press (2009)

[28] Van Vlasselaer, V., Bravo, C., Caelen, O., Eliassi-Rad, T., Akoglu, L., Snoeck, M., Baesens, B.: Apate: A novel approach for automated credit card transaction fraud detection using network-based extensions. Decision Support Systems 75, 38–48 (2015)

[29] Weston, D.J., Hand, D.J., Adams, N.M., Whitrow, C., Juszczak, P.: Plastic card fraud detection using peer group analysis. Advances in Data Analysis and Classification 2(1), 45–62 (2008)

[30] Zaslavsky, V., Strizhak, A.: Credit card fraud detection using self-organizing maps. Information and Security 18, 48 (2006)

[31] Zhou, D., Bousquet, O., Lal, T., Weston, J., Scholkopf, B.: Learning with local and global consistency. In: Proceedings of the Neural Information Processing Systems Conference (NIPS 2003), pp. 237–244 (2003)