data mining for fraud detection

27
CS490D: Introduction to Data Mining Prof. Chris Clifton April 14, 2004 Fraud and Misuse Detection

Upload: tommy96

Post on 21-Nov-2014

3.723 views

Category:

Documents


10 download

DESCRIPTION

 

TRANSCRIPT

Page 1: Data Mining for Fraud Detection

CS490D:Introduction to Data Mining

Prof. Chris Clifton

April 14, 2004

Fraud and Misuse Detection

Page 2: Data Mining for Fraud Detection

What is Fraud Detection?

• Identify wrongful actions– Is right and wrong universal?– If so, why not just prevent wrong actions

• Identify actions by the wrong people

• Identify suspect actions– Legal– But probably not right

Page 3: Data Mining for Fraud Detection

In Data Mining terms…

• Classification?– Classify into fraudulent and non-fraudulent

behavior– What do we need to do this?

• Outlier Detection– Assume non-fraudulent behavior is normal– Find the exceptions

• Problems?

Page 4: Data Mining for Fraud Detection

– +–

Solution: Differential Profiling

• Determine individual behavior– What is normal for the

individual– What separates one

individual from another

• Gives profile of individual behavior

• How do we do this?Profile

ClassificationMining

ProfileProfile

+ +–

Page 5: Data Mining for Fraud Detection

Has this been done?Intrusion Detection (Lane&Brodley)

• Profiled computer users based on command sequences– Command– Some (but not all) argument information– Sequence information

Page 6: Data Mining for Fraud Detection

ResultsAccuracy Time to Alarm

Page 7: Data Mining for Fraud Detection

Scaling Issues

• What happens with millions of users?– Credit card– Cell phone

• What about new users?

• Ideas?

Page 8: Data Mining for Fraud Detection

Multi-user profiles

• Cluster users

• Develop profiles for clusters– E.g., differential profiling

• Old customers: Do they match profile for their cluster?– Allows wider range of acceptable behavior

• New customer: Do they match any profile?

Page 9: Data Mining for Fraud Detection

Data mining for detection and prevention

Page 10: Data Mining for Fraud Detection

Matching known fraud/non-compliance

• Which new cases are similar to known cases?

• How can we define similarity?• How can we rate or score

similarity?

Page 11: Data Mining for Fraud Detection

Anomalies and irregularities

• How can we detect anomalous or unusual behavior?

• What do we mean by usual?• Can we rate or score cases on

their degree of anomaly?

Page 12: Data Mining for Fraud Detection

Techniques used to identify fraud

Predict and Classify– Regression

algorithms (predict numeric outcome): neural networks, CART, Regression, GLM

– Classification algorithms (predict symbolic outcome): CART, C5.0, logistic regression

Group and Find Associations

– Clustering/Grouping algorithms: K-means, Kohonen, 2Step, Factor analysis

– Association algorithms: apriori, GRI, Capri, Sequence

Page 13: Data Mining for Fraud Detection

Techniques for finding fraud:

• Predict the expected value for a claim, compare that with the actual value of the claim.

• Those cases that fall far outside the expected range should be evaluated more closely

Page 14: Data Mining for Fraud Detection

Techniques for finding fraud:

•Build a profile of the characteristics of fraudulent behavior.•Pull out the cases that meet the historical characteristics of fraud.

Decision Trees and Rules

Page 15: Data Mining for Fraud Detection

Techniques for finding fraud:

• Group behavior using a clustering algorithm

• Find groups of events using the association algorithms

• Identify outliers and investigate

Clustering and Associations

Page 16: Data Mining for Fraud Detection

Fraud detection using CRISP-DM

Provides a systematic way to detect fraud and abuse

Ensures auditing and investigative efforts are maximized

Continually assesses and updates models to identify new emerging fraud patterns

Leads to higher recoupments

Page 17: Data Mining for Fraud Detection

Data mining in action: Fraud, waste and abusecase studies

Page 18: Data Mining for Fraud Detection

Payment Error Prevention

…used this information to focus their auditing effort

The US Health Care Finance Administration needed to isolate the likely causes of payment error by developing a profile of acceptable billing practices and...

Page 19: Data Mining for Fraud Detection

Payment error prevention solution

• Clementine™• Using audited discharge records, built

profiles of appropriate decisions such as diagnosis coding and admission

• Matched new cases• Cases not matching are audited

Page 20: Data Mining for Fraud Detection

Payment error prevention results

• Detected 50% of past incorrect payments – resulting in significant recovery of funding lost to payment errors

• PRO analysts able to use resultant Clementine models to prevent future error

Page 21: Data Mining for Fraud Detection

Billing and payment fraud

Identified suspicious cases to focus investigations

The US Defense Finance and Accounting Service needed to find fraud in millions of Dept of Defense transactions and...

Page 22: Data Mining for Fraud Detection

Billing and payment fraud solution

• Clementine• Detection models based on known fraud

patterns• Analyzed all transactions – scored based on

similarity to these known patterns• High scoring transactions flagged for

investigation

Page 23: Data Mining for Fraud Detection

Billing and payment fraud results

• Identified over 1,200 payments for further investigation

• Integrated the detection process• Anomaly detection methods (e.g.,

clustering) will serve as ‘sentinel’ systems for previously undetected fraud patterns

Page 24: Data Mining for Fraud Detection

Audit selection

Focused audit investigations on cases with the highest likely

adjustments

The Washington State Department of Revenue needed to detect erroneous tax returns and...

Page 25: Data Mining for Fraud Detection

Audit selection solution

• Clementine• Using previously audited returns • Model adjustment (recovery) per auditor

hour based on return information• Models will then score future returns

showing highest potential adjustment

Page 26: Data Mining for Fraud Detection

Audit selection results

• Maximizes auditors’ time by focusing on cases likely to yield the highest return

• Closes the ‘tax gap’

Page 27: Data Mining for Fraud Detection

Data mining - key to detecting and preventing fraud, waste and abuse

• Learn from the past– High quality, evidence based

decisions• Predict

– Prevent future instances• React to changing circumstances

– Models kept current, from latest data