machine learning and fraud detection - mcubed ai london · machine learning and fraud detection...

69
World Class Payment and Enterprise Solutions for the global financial sector Machine Learning and Fraud Detection September 2019 Tamsin Crossland Senior Architect @CrosslandTamsin

Upload: others

Post on 20-May-2020

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

World Class Payment and Enterprise Solutionsfor the global financial sector

Machine Learning and Fraud Detection

September 2019

Tamsin Crossland – Senior Architect

@CrosslandTamsin

Page 2: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

2

Two main types of article on AI

Page 3: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

Machine Learning and Fraud Detection

• Payments

• Demonstration

• Thoughts

3

Page 4: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

Machine Learning and Fraud Detection

• Payments

• Demonstration

• Thoughts

4

Page 5: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

5

Page 6: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

• Vulnerabilities in payments services have increased as the shift to digital and mobile customer platforms accelerates.

The increasing scale, diversity, and complexity of fraud.

6

Page 7: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

• New solutions have also led to payments transactions being executed more quickly, leaving banks and processors with less time to identify, counteract, and recover the underlying funds when necessary.

The increasing scale, diversity, and complexity of fraud.

7

Page 8: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

• The sophistication of fraud has increased:• greater collaboration among bad actors, including:

the exchange of stolen data, new techniques, and expertise on the dark web.

The increasing scale, diversity, and complexity of fraud.

8

Page 9: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

The fraud threat facing banks and payments firms has grown dramatically in recent years.

Estimates of fraud’s impact on consumers and financial institutions vary significantly but losses to banks alone are conservatively estimated to exceed $31 billion globally by 2018.

9

Page 10: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

Instant Payments

Page 11: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

11

Rule Based Systems

Example: if a credit card transaction is more than ten times larger than the average for this customer

Allow the human experts to apply their subject matter expertise.

Difficult and time-consuming to implement well.

Includes the painstaking definition of every single rule for anomaly possible

If experts make an omission, undetected anomalies will happen and nobody will suspect it.

Today, legacy systems apply about 300 different rules on average to approve a transaction

Page 12: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

NeuralNetwork

12

Page 13: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

Weights and Biases

13

Page 14: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

Training

14

Page 15: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

Training

15

FraudFraud

FraudulentTransaction

Non-FraudulentTransaction

Page 16: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

Use Case

16

Page 17: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

Rule Based versus Machine Learning

17

Rule Based Machine Learning

Catches obvious fraudulent scenarios Finds hidden correlations in data

Large amount of manual work to enumerate all

possible detection rules

Automatic detection of possible fraud scenarios

Easier to explain More difficult to explain

Page 18: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

Machine Learning and Fraud Detection

• Payments

• Demonstration

• Thoughts

18

Page 19: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

Install Tensorflow

19

Page 20: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

Install Libraries

20

Data Analysis data mining and data analysis

winpty docker exec -i -t 07a24f61e7b6 bashpip install pandaspip install -U scikit-learn

Page 21: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

21

Contains two days worth of credit card transactions made in September 2013 by European cardholders.

492 frauds out of 284,807 transactions (0.172%).

Due to confidentiality issues, cannot provide the original features and more background information about the data.

Contains only numerical input variables which are the result of a Principal Component Analysis transformation

(a method of extracting relevant information from confusing data sets).

Page 22: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

22

Feature 'Time' contains the seconds elapsed between

each transaction and the first transaction in the dataset.

The only features which have not been transformed with PCA are 'Time' and 'Amount’

Feature 'Class' is the response variable and it takes

value 1 in case of fraud and

0 otherwise.

Features V1, V2, ... V28 are the principal components obtained with PCA

The feature 'Amount' is the transaction Amount

Page 23: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

Demonstration 1

23

Page 24: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

24

Page 25: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

Balance data

25

Page 26: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

49%

26

Page 27: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

Data Loss

284315 -> 492

27

Page 28: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

28

Page 29: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

Attempt 3Janio Martinez Bachmann

29

Page 30: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

Libraries

30

a library for making statistical graphics in Python.

Toolbox for imbalanced dataset in machine learning.

Page 31: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

31

Page 32: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

Underfitting and Overfitting

32

Page 33: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

Overfitting

33

Page 34: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

Outliers

34

Page 35: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

Principal Component Analysis

35

Page 36: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

Demonstration 2

36

Page 37: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

37

Scale time and amount

Page 38: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

Random under-sampling

38

Page 39: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

39

Page 40: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

Correlation MatrixUsed to show which features heavily influence whether a transaction is a fraud

40

Page 41: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

41

Page 42: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

Anomaly detection

42

Page 43: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

43

Page 44: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

44

After implementing outlier reduction our accuracy has been improved by over 3%! Some outliers can distort the accuracy of our models but remember, we have to avoid an extreme amount of information loss or else our model runs the risk of underfitting.

Page 45: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

45

Dimensionality Reduction and Clustering

Page 46: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

46

Dimensionality Reduction and Clustering

t-SNE takes a high-dimensional dataset and reduces it to a low-dimensional graph whilst still retaining a lot of the information.

Page 47: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

SMOTE

Solving the Class Imbalance: SMOTE creates synthetic points from the minority class in order to reach an equal balance between the minority and majority class.

Location of the synthetic points: SMOTE picks the distance between the closest neighbors of the minority class, in between these distances it creates synthetic points.

Final Effect: More information is retained since we didn't have to delete any rows unlike in random undersampling.

Synthetic Minority Over-sampling Technique

47

Page 48: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

48

Compile the model

The following example uses accuracy, the fraction of the transactions that are correctly

classified.

optimizers shape and mold your model into its most accurate possible form by futzing

with the weights. The loss function is the guide to the terrain, telling the optimizer when

it’s moving in the right or wrong direction.

Page 49: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

49

Page 50: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

Confusion Matrix

50

Predicted: no Predicted: Yes

Actual: no True negative False positive

Actual: yes False negative True positive

Page 51: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

51

Predicted: no Predicted: Yes

Actual: no True negative False positive

Actual: yes False negative True positive

Page 52: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

Using SMOTE

52

Page 53: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

53

Page 54: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

54

Page 55: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

Unsupervised Learning

Page 56: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

56

Page 57: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

Iris Data Set

57

• 50 samples from each of three species of Iris.

• Four features were measured from each sample:

• the length and the width of the sepals and petals, in centimeters.

• the objective of K-means is simple:

• group similar data points together and discover underlying patterns.

• To achieve this objective, K-means looks for a fixed number (k) of clusters in a dataset.

Page 58: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

KMeans

58

K-means clustering is a type of unsupervised learning, which is used when you have unlabeled data (i.e., data without defined categories or groups).

The goal of this algorithm is to find groups in the data, with the number of groups represented by the variable K.

The algorithm works iteratively to assign each data point to one of K groups based on the features that are provided. Data points are clustered based on feature similarity.

Page 59: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

Demonstration 3

59

Page 60: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

60

Page 61: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

61

Page 62: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

Machine Learning and Fraud Detection

• Payments

• Demonstration

• Thoughts

62

Page 63: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

Two Questions Every Machine Learning Project Should Ask

Is the purpose of the project ethical?

63

Is the implementation of the project ethical?

@CrosslandTamsin

Page 64: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

Two Questions Every Machine Learning Project Should Ask

Is the purpose of the project ethical?

64

what are the additional benefits of the project?

who does it benefit?

Page 65: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

65

Is the purpose of the project ethical?

Page 66: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

Two Questions Every Machine Learning Project Should Ask

66

Is the implementation of the project ethical?Does it implement unfair bias?

Disclose to stakeholders about their interactions with an AI

Governance:• secure,

• reliable and robust, and

• appropriate processes are in place to ensure responsibility and accountability for those AI systems

Page 67: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

67

Is the implementation of the project ethical?

Page 68: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

One last thing

Is it Intelligent?

68

Page 69: Machine Learning and Fraud Detection - Mcubed AI London · Machine Learning and Fraud Detection September 2019 Tamsin Crossland ... pip install -U scikit-learn. 21 ... a library for

69

Fraud

FraudulentTransaction

Non-FraudulentTransaction

@CrosslandTamsin