© 2015 The MathWorks, Inc.

Fraud Detection with MATLAB

Ian McKenna, Ph.D.


Agenda

Introduction: Background on Fraud Detection

Challenges: Knowing your Risk

Overview of the MATLAB Solution

– Connect to financial data sources

– Calculate fraud indicators

– Classify funds with machine learning

– Generate reports & deploy applications

Questions & Answers


Fraud Detection

Detecting when people intentionally act secretly to deprive another of something of value

Types

– Returns Forensics

– Linguistic Based Cues

http://nakedshorts.typepad.com/files/madoff_fairfieldsentry3x.pdf


Types of Fraud

Corporate

– Financial statement falsification

Securities and commodities

– Hedge Fund returns manipulation

– Stock markets manipulation, regulation compliance

Healthcare

Mortgage

Identity theft (credit card)

Insurance

Mass marketing

Asset forfeiture/money laundering


Hedge Fund Returns Manipulation

More prone to fraud due to decreased regulation

– SEC stats indicate 1% misbehave

Scenarios

– Misbehavior: hedge fund managers have some discretion in valuing illiquid investments. Academics have devised methods to analyze and flag potentially “manipulated” fund returns.

– Outright fraud: quantitative screening and dedicated algorithms can save a lot of time


Return-Based Analysis

Number of negative monthly returns is used to judge a manager’s performance

Attract investors by misreporting returns

Distortion possible for returns at manager’s discretion

– Illiquid assets, complex assets

E.g. a discontinuity exists at zero but disappears if returns are computed bimonthly

“Suspicious Patterns in Hedge Fund Returns and the Risk of Fraud”. Bollen, Nicolas P.B. and Veronika K. Pool (2012) Review of Financial Studies 25, 2673-2702.
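The discontinuity at zero mentioned above can be illustrated with a rough sketch. It is written in Python rather than MATLAB and is not the test from the cited paper: it simply assumes that, for a smooth distribution, the histogram bin just below zero should hold about as many observations as the average of its two neighbours. The function name `zero_discontinuity_z`, the default `bin_width`, and the sample data are all hypothetical.

```python
import math


def zero_discontinuity_z(returns, bin_width=0.005):
    """Compare the bin just below zero with the mean of its two neighbours.

    Bins (w = bin_width): left = [-2w, -w), below = [-w, 0), above = [0, w).
    Returns a z-score; a strongly negative value means fewer small losses
    than the neighbouring bins would suggest.
    """
    n = len(returns)
    if n == 0:
        return 0.0
    w = bin_width
    left = sum(1 for r in returns if -2 * w <= r < -w)
    below = sum(1 for r in returns if -w <= r < 0)
    above = sum(1 for r in returns if 0 <= r < w)

    expected = (left + above) / 2
    p_below = below / n
    p_neigh = (left + above) / n
    # Rough variance of (observed - expected), treating bin counts as binomial.
    variance = n * p_below * (1 - p_below) + 0.25 * n * p_neigh * (1 - p_neigh)
    if variance == 0:
        return 0.0
    return (below - expected) / math.sqrt(variance)


# Made-up monthly returns: many small gains, almost no small losses.
sample = [-0.008] * 10 + [-0.002] * 2 + [0.002] * 10
print(round(zero_discontinuity_z(sample), 2))
```

A z-score well below zero flags a shortage of small losses. The result depends on the chosen bin width, so in practice it is worth trying several widths before drawing any conclusion.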


Returns Distribution Discontinuity


Benford’s Law

Frequency distribution of digits in many real-life sources of data:

– Electricity bills

– Street addresses

– Stock prices

– Population numbers

– Death rates

– Physical and mathematical constants

– Processes described by power laws
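For reference, Benford's Law gives the expected frequency of a leading digit d as log10(1 + 1/d). A short sketch in Python (not the MATLAB code from the talk; the function name is hypothetical) tabulates those frequencies:

```python
import math


def benford_first_digit_probs():
    """Expected frequency of each leading digit 1..9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


for digit, prob in benford_first_digit_probs().items():
    print(digit, f"{prob:.3f}")
```

The digit 1 leads roughly 30% of the time and the digit 9 less than 5%, which is why a near-uniform spread of first digits in reported figures can look suspicious.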


Stock Market Returns First Digit Frequency

Source: Checking Financial Markets via Benford's Law, Marco Corazza, Andrea Ellero, and Alberto Zorzi




Challenges in Fraud Detection

Cost/Economics

– Most cases not fraud

– Manual analysis

Data

– Huge data sets

– Complex data types

– Data integration

Change

– Evolutionary

– Secrecy in detection methods


Challenges Faced During Model Development

Traditional approaches and their challenges:

Off-the-shelf software

– Challenge: Inability to work with custom and complex data

In-house development with traditional languages

– Challenge: Adapting requires long development times

Spreadsheets, Excel

– Challenge: Limited data size

Combination of the above

– Challenge: Inefficiencies in integration & automation


Computational Finance Workflow

Access: Files, Databases, Datafeeds

Research and Quantify: Data Analysis & Visualization, Financial Modeling, Application Development

Share: Reporting, Applications, Production

Automate


The Desired Report

Three funds to analyze and report:

– Gateway Fund

– American Funds Growth Fund

– Fairfield Sentry (known fraudulent Madoff fund)




Implemented Methods – Returns Based

Returns distribution and discontinuity at 0

– Check for a discontinuity at 0 in the distribution of monthly returns

Low correlation with other assets

– Regress fund returns on the combination of style factors that maximizes the explanatory power of the analysis

Unconditional serial correlation

– Check whether monthly returns are serially correlated, i.e. correlated with the previous month's value. Managers investing in illiquid securities with no end-of-month quoted price may smooth their returns relative to all available market information

Conditional serial correlation

– Using the optimal factor model constructed in “Low correlation with other assets”, check for serial correlation occurring especially after a down month (i.e. when a suspicious manager has the highest incentive to “catch up”)
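As an illustration of the unconditional serial correlation check, here is a minimal Python sketch (not the MATLAB implementation from the talk). It computes the sample lag-1 autocorrelation and compares it with the usual rough 95% band of 1.96/sqrt(n) for uncorrelated data. The function names and the sample series are hypothetical, and the conditional version based on a factor model is not covered.

```python
import math


def lag1_autocorrelation(returns):
    """Sample lag-1 autocorrelation of a return series."""
    n = len(returns)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(returns) / n
    denom = sum((r - mean) ** 2 for r in returns)
    if denom == 0:
        return 0.0  # constant series: no variation to correlate
    num = sum((returns[t] - mean) * (returns[t - 1] - mean) for t in range(1, n))
    return num / denom


def looks_serially_correlated(returns):
    """True if |rho_1| exceeds the rough 95% band for uncorrelated data."""
    return abs(lag1_autocorrelation(returns)) > 1.96 / math.sqrt(len(returns))


# Made-up, smoothly trending monthly returns.
smooth = [0.010, 0.011, 0.012, 0.013, 0.014, 0.015, 0.016, 0.017]
print(lag1_autocorrelation(smooth), looks_serially_correlated(smooth))
```

A large positive value is consistent with smoothed returns, but it is only an indicator: genuinely illiquid holdings can produce the same pattern without any misreporting.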


Implemented Methods – Returns Based

Number of returns equal to 0

– Calculate the theoretical number of returns equal to 0, using the cumulative distribution function and binomial coefficients, for a time series exhibiting the same characteristics (average return and variance) as the fund. Then compare that number with the actual count.

Number of negative returns

– Calculate the theoretical number of negative returns as above. Then compare that number with the actual count.

Number of unique returns/length of identical recurring series

– Calculate the theoretical value of each pattern. Unique returns is the number of unique values in the time series, and length of identical series is the number of consecutive observations that are identical. Then compare these theoretical values with the actual counts.
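The count-based indicators above can be sketched as follows, in Python rather than MATLAB. The sketch assumes that returns are independent and normally distributed with the fund's sample mean and standard deviation, which is one possible reading of "the same characteristics" and not necessarily the model used in the talk. All function names and sample data are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def negative_return_check(returns):
    """Expected vs. actual number of negative returns under a normal model."""
    n = len(returns)
    p_neg = NormalDist(mean(returns), stdev(returns)).cdf(0.0)
    expected = n * p_neg
    actual = sum(1 for r in returns if r < 0)
    sd = (n * p_neg * (1 - p_neg)) ** 0.5  # binomial standard deviation
    z = (actual - expected) / sd if sd > 0 else 0.0
    return expected, actual, z


def unique_count(returns):
    """Number of distinct values in the series."""
    return len(set(returns))


def longest_identical_run(returns):
    """Length of the longest stretch of consecutive identical values."""
    best = run = 0
    previous = object()  # sentinel that equals nothing in the series
    for r in returns:
        run = run + 1 if r == previous else 1
        best = max(best, run)
        previous = r
    return best


# Made-up monthly returns with no losses and a repeated value.
sample = [0.010, 0.012, 0.012, 0.012, 0.015, 0.011, 0.018, 0.013]
print(negative_return_check(sample))
print(unique_count(sample), longest_identical_run(sample))
```

A negative z-score means fewer losses than the fitted model predicts. For the uniqueness and run-length indicators this sketch only reports the observed values; the theoretical benchmarks described above would still need to be derived or simulated.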


Implemented Methods – Returns Based

Sample distribution of the last digit

– Check with a goodness-of-fit test whether the last digit of the returns is uniformly distributed

Sample distribution of the first digit

– Check with a goodness-of-fit test whether the first digit of the returns follows Benford’s Law

Supervised classification methods

– Using machine learning tools (such as neural networks and classification methods), train a model to identify potential fraudsters. Input variables consist of all the indicators described so far, computed for previously identified fraudulent and non-fraudulent funds. Apply the fitted model to a new fund to obtain its classification.
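The last-digit and first-digit checks listed above can be sketched with a Pearson chi-square statistic. This is a Python illustration, not the MATLAB code from the talk. It assumes returns are reported in percent with two decimals, so the digits are read from the value scaled to an integer; the function names and the `decimals` parameter are hypothetical. The statistic is compared with the standard 5% critical values of the chi-square distribution (16.92 for 9 degrees of freedom, 15.51 for 8).

```python
import math
from collections import Counter

CRIT_LAST_DIGIT = 16.92   # chi-square 5% critical value, 9 degrees of freedom
CRIT_FIRST_DIGIT = 15.51  # chi-square 5% critical value, 8 degrees of freedom


def _scaled(returns_pct, decimals):
    """Returns as non-negative integers, e.g. 1.23% -> 123 for decimals=2."""
    return [int(round(abs(r) * 10 ** decimals)) for r in returns_pct]


def last_digit_chi2(returns_pct, decimals=2):
    """Chi-square statistic of last digits against a uniform distribution."""
    digits = [s % 10 for s in _scaled(returns_pct, decimals)]
    if not digits:
        raise ValueError("no observations")
    counts = Counter(digits)
    expected = len(digits) / 10
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))


def first_digit_chi2(returns_pct, decimals=2):
    """Chi-square statistic of first digits against Benford's Law."""
    digits = [int(str(s)[0]) for s in _scaled(returns_pct, decimals) if s > 0]
    if not digits:
        raise ValueError("no non-zero observations")
    counts = Counter(digits)
    n = len(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (counts.get(d, 0) - expected) ** 2 / expected
    return stat


# Made-up returns in percent, all ending in the same digit.
sample = [1.25, 0.85, 1.15, 0.95, 1.05, 1.35, 0.75, 1.45, 1.55, 0.65]
print(last_digit_chi2(sample) > CRIT_LAST_DIGIT)
print(first_digit_chi2(sample) > CRIT_FIRST_DIGIT)
```

With short return histories the expected counts per digit are small and the chi-square approximation is weak, so a rejection is best treated as a prompt for closer review.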


Text Based Indicators

Idea from published research in criminal investigation

Hypothesis - deceptive senders display:

– Higher quantity

– Higher expressivity

– Higher informality

– Higher uncertainty

– Higher nonimmediacy

– Lower complexity

– Lower diversity

– Lower specificity

“Automating Linguistics-Based Cues for Detecting Deception in Text-based Asynchronous Computer-Mediated Communication”. Lina Zhou (Department of Information Systems, University of Maryland, Baltimore County, MD, USA), Judee K. Burgoon, Jay F. Nunamaker, Jr. and Doug Twitchell (Center for the Management of Information, University of Arizona, Tucson, AZ, USA). Group Decision and Negotiation 13: 81–106, 2004.


Implemented Methods – Text Based

Measure of Complexity

– Average number of statements (average concepts per sentence)

– Average sentence length (average complexity of structures)

– Vocabulary complexity (average word length)

Measure of Uncertainty

– Average use of modifiers (number of adjectives/adverbs per sentence)

– Average references to others (number of “he”, “they”, …)

Measure of Expressivity

– Emotiveness (number of adjectives compared to nouns)

Measure of Diversity

– Lexical diversity (number of unique words)
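Some of the text measures above need no part-of-speech tagging and can be sketched with the Python standard library. This is an illustration, not the MATLAB and Java tagger code from the talk. The pronoun list, the sentence-splitting rule, and the choice to report lexical diversity as unique words divided by total words are all assumptions made here. Modifier counts and emotiveness are left out because they require a tagger.

```python
import re

# Illustrative list of third-person pronouns; not taken from the cited paper.
OTHER_PRONOUNS = {"he", "she", "they", "him", "her", "them", "his", "their"}


def text_cues(text):
    """Simple linguistic cues computed from raw text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not sentences or not words:
        raise ValueError("text contains no words")
    others = sum(1 for w in words if w in OTHER_PRONOUNS)
    return {
        "avg_sentence_length": len(words) / len(sentences),
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "lexical_diversity": len(set(words)) / len(words),
        "other_refs_per_sentence": others / len(sentences),
    }


# Made-up example text.
print(text_cues("They said returns are steady. He said returns are steady."))
```

Each value becomes one input feature per document, alongside the returns-based indicators, when comparing a fund's communications against known cases.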


Classifying Words

Java POS Tagger

Reference online dictionary

Only a few lines of code


Comparison: American Growth Fund


Comparison: Madoff


MATLAB Solutions

Traditional approaches, their challenges, and the MATLAB solutions:

Off-the-shelf software

– Challenge: Inability to work with custom and complex data

– Solution: Flexible Modeling; work with structured/unstructured

In-house development with traditional languages

– Challenge: Adapting requires long development times

– Solution: Rapid Prototyping; Advanced

Spreadsheets, Excel

– Challenge: Limited data size

– Solution: Work with Big Data Sets; Database/Hadoop

Combination of the above

– Challenge: Inefficiencies in integration & automation

– Solution: Easy to Integrate & Deploy; automated reports, encrypted models


Financial Modeling Workflow

Access: Files, Databases, Datafeeds (products: Datafeed, Database, Spreadsheet Link EX, Trading)

Research and Quantify: Data Analysis and Visualization, Financial Modeling, Application Development (products: Financial, Statistics & Machine Learning, Optimization, Financial Instruments, Econometrics, MATLAB, Parallel Computing, MATLAB Distributed Computing Server)

Share: Reporting, Applications, Production (products: Report Generator, MATLAB Compiler, MATLAB Compiler SDK, Production Server)


Q&A