Fraud Analytics with Machine Learning and Big Data Engineering for Telecom

Upload: sudarson-roy-pratihar

Post on 16-Apr-2017

Fraud Analytics with Machine Learning & Engineering (FAME) for Telecom using Big Data

Presented by:

Sudarson Roy Pratihar

Pranab Kumar Dash

Subhadip Paul

Amartya Kumar Das

Copyright © 2015 Authors. All rights reserved.

A Quick Intro – Telecom Frauds

• Have you received missed calls from unknown overseas numbers?

• Have you heard of PBX hacking and corporates facing huge bills?

Problem Definition

• The telecom industry loses USD 46.3 billion globally to various kinds of fraud

• 10% of operators have bad debt due to fraud

• Detection is a cat-and-mouse game: fraud patterns change to evade the available data mining techniques

• Raising timely alerts while processing a huge volume of call records is a challenge

• Alerts with a high false positive rate increase operational expenses

Importance to Telecom Industry & Society

• An efficient, self-adaptive detection mechanism can significantly reduce the loss due to fraud (about 2.1% of revenue) and the operational cost

• Less “bad money” in the system

Data Source

• More than 1 TB of Call Detail Records (CDRs) from a reputed wholesale carrier, used as historical data

• Tested on a few weeks of live CDRs from the carrier

Analytics Technique

• The basic components of FAME are:

– A self-adaptive machine learning methodology

– An actionable dashboard that lets the operations and investigations teams act on alerts and sends their feedback to the machine learning model for adjusting weights

– A high-performance big data platform for data processing and machine learning
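The slides do not say how feedback adjusts the weights. As one possible reading, the sketch below treats each model as a weighted voter and halves the weight of any model that disagreed with the operator's verdict on an alert. The function name, the model names, the starting weights and the `penalty` factor are all hypothetical.

```python
def update_weights(weights, model_flags, is_fraud, penalty=0.5):
    """Down-weight models that disagreed with the operator, then renormalize.

    weights     -- dict of model name -> current weight (assumed to sum to 1)
    model_flags -- dict of model name -> True if that model flagged the record
    is_fraud    -- the operator's verdict on the alert
    penalty     -- hypothetical multiplicative penalty for a wrong model
    """
    updated = {}
    for name, weight in weights.items():
        was_correct = model_flags[name] == is_fraud
        updated[name] = weight if was_correct else weight * penalty
    total = sum(updated.values())
    return {name: weight / total for name, weight in updated.items()}


# Hypothetical starting weights for three models.
weights = {"logistic": 0.4, "forest": 0.4, "threshold": 0.2}

# An operator marks an alert as a false positive; only the threshold rule
# had flagged the record, so only its weight is reduced.
flags = {"logistic": False, "forest": False, "threshold": True}
weights = update_weights(weights, flags, is_fraud=False)
print(weights)
```

Renormalizing after each update keeps the weights comparable over time, so a model that is repeatedly wrong fades out without ever being removed outright.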

How it detects and adapts …

[Diagram: detect-and-adapt loop with ten numbered steps. Components: CDR Feed, Fraud Detection Model Pipeline, Novelty Detection Pipeline / Stacking, Actionable Dashboards, Pattern validation and tuning workbench. Flow labels: Frauds detected, Remaining Data, New Patterns, More frauds, New model addition / Tuning of existing, Operators feedback. Roles: Analyst, Operator.]

Novelty Detection Pipeline

• Novelty detection is run separately on origin and destination numbers

• Various contextual anomaly detection techniques are used and their outputs are combined

• Some examples of the algorithms used:

– Box-plot-based outlier detection

– Clustering to find clusters with a distinct centroid

– Mahalanobis distance: Mdist > ɸ · IQR
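Two of the listed detectors can be sketched in a few lines of standard-library Python. This is an illustration under several assumptions, not the FAME implementation: the two features per number are hypothetical, the IQR in the Mahalanobis rule is taken over the distances themselves, ɸ is set to 3, and the two outputs are combined by a simple union.

```python
import math
import statistics


def quartile_spread(values):
    """Return (Q1, Q3, IQR) using the inclusive quartile method."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q3, q3 - q1


def boxplot_outliers(values, k=1.5):
    """Indices of values outside the box-plot fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3, iqr = quartile_spread(values)
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def mahalanobis_2d(points):
    """Mahalanobis distance of each 2-D point from the sample mean."""
    n = len(points)
    mx = statistics.fmean(p[0] for p in points)
    my = statistics.fmean(p[1] for p in points)
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    dists = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form with the inverse of the 2x2 covariance matrix.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        dists.append(math.sqrt(d2))
    return dists


def mahalanobis_novelties(points, phi=3.0):
    """Indices where Mdist > phi * IQR (IQR of the distances: an assumption)."""
    dists = mahalanobis_2d(points)
    _, _, iqr = quartile_spread(dists)
    return [i for i, d in enumerate(dists) if d > phi * iqr]


# Toy per-number profiles: (calls per hour, average call minutes). Hypothetical.
profiles = [(10, 5), (10, 7), (12, 5), (12, 7), (11, 6), (11, 6), (50, 6)]

box_flags = boxplot_outliers([p[0] for p in profiles])
maha_flags = mahalanobis_novelties(profiles)
combined = sorted(set(box_flags) | set(maha_flags))  # union: an assumption
print(combined)
```

In practice the same functions would be applied once to origin-number profiles and once to destination-number profiles, in line with the first bullet.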

Novelty Detection – Illustrations

Fraud Detection Pipeline

• Use historical data and flag records with the “Novelty Detection Pipeline”

• Verify the flagged records and label them

• Build separate models (logistic regression, random forest and threshold-based) for different patterns

• Combine the outputs of the models
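The last step, combining model outputs, can be illustrated with a weighted average of per-model fraud scores. The slides do not specify the combination rule, so this is only one plausible sketch: the feature names, coefficients, limits, weights and the 0.5 cut-off are all hypothetical, and the random forest is left out for brevity (any function returning a score between 0 and 1 could take its place).

```python
import math


def logistic_score(features, coef, intercept):
    """Fraud probability from an already-fitted logistic regression."""
    z = intercept + sum(c * features[name] for name, c in coef.items())
    return 1.0 / (1.0 + math.exp(-z))


def threshold_score(features, limits):
    """1.0 if any feature exceeds its limit, otherwise 0.0."""
    exceeded = any(features[name] > limit for name, limit in limits.items())
    return 1.0 if exceeded else 0.0


def combine(scores, weights):
    """Weighted average of the per-model scores."""
    total = sum(weights[name] for name in scores)
    return sum(weights[name] * s for name, s in scores.items()) / total


# Hypothetical fitted parameters and model weights.
COEF = {"calls_per_hour": 0.05, "intl_ratio": 3.0}
INTERCEPT = -4.0
LIMITS = {"calls_per_hour": 40}
WEIGHTS = {"logistic": 0.6, "threshold": 0.4}


def fraud_score(features):
    scores = {
        "logistic": logistic_score(features, COEF, INTERCEPT),
        "threshold": threshold_score(features, LIMITS),
    }
    return combine(scores, WEIGHTS)


suspicious = fraud_score({"calls_per_hour": 50, "intl_ratio": 0.9})
ordinary = fraud_score({"calls_per_hour": 10, "intl_ratio": 0.1})
print(suspicious > 0.5, ordinary > 0.5)
```

Keeping each pattern's model behind a common "features in, score out" interface makes it straightforward to add a new model when a new fraud pattern is validated.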

System Behind Magic …

[Diagram: layered architecture. Actionable dashboard; ensemble of self-adaptive algorithms; big data platform powered by Hadoop & Spark; integration facets; feedback; CDR feed from telecom system.]

Platform Behind Magic …

Accuracy Results

[Bar chart: true positive, false positive and accuracy on a 0 to 1 scale, shown separately for B-Number and A-Number detection.]

• Accuracy is reported individually for origin (A-Number) and destination (B-Number) detection

• The combined mechanism has <5% false positives
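For reference, the three quantities on the chart can be computed from labelled alerts as below. The labels are toy values chosen for illustration only and are not the results reported in this deck.

```python
def detection_rates(predicted, actual):
    """True positive rate, false positive rate and accuracy from boolean labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    return {
        "true_positive_rate": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
    }


# Toy labels (hypothetical): True means fraud.
actual = [True, True, True, False, False, False, False, False, False, False]
predicted = [True, True, False, True, False, False, False, False, False, False]

rates = detection_rates(predicted, actual)
print(rates)
```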

What Next …

• Test on different types of telecom fraud

• Extend this industrialized approach to other areas (such as network intrusion detection)

• Productize as a cloud-based service as well as an on-premise implementation

Contact Us @

Amartya Kumar Das: https://in.linkedin.com/pub/amartya-das/b/72b/637

Subhadip Paul: https://in.linkedin.com/in/subhadippaul

Pranab Kumar Dash: www.linkedin.com/profile/view?id=19155039

Sudarson Roy Pratihar: www.linkedin.com/in/sudarson

Follow us #FAMETELCO