richard a. derrig ph. d. opal consulting llc visiting scholar, wharton school university of...

48
Richard A. Derrig Ph. D. Richard A. Derrig Ph. D. OPAL Consulting LLC OPAL Consulting LLC Visiting Scholar, Wharton School Visiting Scholar, Wharton School University of Pennsylvania University of Pennsylvania New Applications of Statistics and Data Mining Techniques to Classification and Fraud Detection in Insurance. Bogotá, Columbia November 3, 2005

Post on 18-Dec-2015

216 views

Category:

Documents


0 download

TRANSCRIPT

Richard A. Derrig Ph. D.Richard A. Derrig Ph. D.

OPAL Consulting LLC OPAL Consulting LLC

Visiting Scholar, Wharton SchoolVisiting Scholar, Wharton SchoolUniversity of PennsylvaniaUniversity of Pennsylvania

Richard A. Derrig Ph. D.Richard A. Derrig Ph. D.

OPAL Consulting LLC OPAL Consulting LLC

Visiting Scholar, Wharton SchoolVisiting Scholar, Wharton SchoolUniversity of PennsylvaniaUniversity of Pennsylvania

New Applications of Statistics and Data Mining Techniques to Classification

and Fraud Detection in Insurance.

Bogotá, ColumbiaNovember 3, 2005

Insurance Fraud Bureau Insurance Fraud Bureau of Massachusettsof Massachusetts

You can steal more

with a ball point than

by gun point.

Insurance Fraud Bureau Insurance Fraud Bureau of Massachusettsof Massachusetts

Insurance fraud is known as a high reward, low risk crime.

GENERAL INSURANCE GENERAL INSURANCE PROBLEMSPROBLEMS

WHAT: Product Design WHERE: Market Characteristics WHO: Classification & Sale HOW: Claims Paid WHEN: Forecasting WHY: Profit (Expected)

TRADITIONALTRADITIONALMATHEMATICAL TECHNIQUESMATHEMATICAL TECHNIQUES

Arithmetic (Spreadsheets) Probability & Statistics (Range of Outcomes) Curve Fitting (Interpolation & Extrapolation) Model Building (Equations for Processes) Valuation (Risk, Investments, Catastrophes) Numerical Method (Analytic Solution Rare)

NON-TRADITIONAL NON-TRADITIONAL MATHEMATICSMATHEMATICS

Fuzzy Sets & Fuzzy Logic Elements: “in/out/partially both” Logic: “true/false/maybe” Decisions: “incompatible criteria”

Artificial Intelligence: “data mining” Neural Networks: “learning algorithms”

CLASSIFICATIONCLASSIFICATION

Segmentation: A major exercise for insurance underwriting and claims

Underwriting: Find profitable risks from among the available market

Claims: Sort claims into easy pay and claims needing investigation

Fuzzy Logic Compared with Fuzzy Logic Compared with ProbabilityProbability

Probability: Measures randomness; Measures whether or not event occurs;

and Randomness dissipates over time or with

further knowledge. Fuzziness:

Measures vagueness in language; Measures extent to which event occurs;

and Vagueness does not dissipate with time or

further knowledge.

Fuzzy LogicFuzzy Logic

The field of Pattern Recognition is a search for structure in data.

Old View: Given N objects, divide them into 2 < C < N clusters of homogeneous or similar types. Similarity can be based upon multiple features or criteria but each object is in one and only one cluster.

New View: Objects can be members of one or several clusters with varying strengths of membership; i.e. fuzzy clusters are Fuzzy Sets of clusters.

Example A: Classification of Individual Risks Example B: Classification of Injury Claims

Clusters

FUZZY SETSFUZZY SETSTOWN RATING CLASSIFICATION When is one Town near another for Auto

Insurance Rating?- Geographic Proximity (Traditional)- Overall Cost Index (Massachusetts)

Geographically close Towns do not have the same Expected Losses.

Clusters by Cost Produce Border Problems:Towns between Territories. Fuzzy Clusters acknowledge the Borders.

Are Overall Clusters correct for each Insurance Coverage?

Fuzzy Clustering on Five Auto Coverage Indices is better and demonstrates a weakness in Overall Crisp Clustering.

Fuzzy Clustering of Fraud Study Claims by Fuzzy Clustering of Fraud Study Claims by Assessment DataAssessment Data

Membership Value Cut at 0.2Membership Value Cut at 0.2

0

1

2

3

4

5

6

0 10 20 30 40 50 60 70 80 90 100 110 120 130

CLAIM FEATURE VECTOR ID

FIN

AL

CL

UST

ER

S

ValidBuild-up (Inj. Sus. Level <5)

Opportunistic Fraud

(0,0,0,0,0,0)

(0,1,0,1,3,0)

(1,4,0,4,6,0)

(1,7,0,7,7,0)

(7,8,7,8,8,0)

= Full Member =Partial Member Alpha = .2

SuspicionCenters

Build-up (Inj. Sus. Level = to or >5)

(A, C, Is, Ij, T, LN)

Planned Fraud

FRAUDFRAUD

The Major QuestionsWhat Is Fraud?

How Much Fraud is There?

What Companies Do about Fraud?

How Can We Identify a Fraudulent Claim?

FRAUD DEFINITIONFRAUD DEFINITION

Principles Clear and willful act Proscribed by law Obtaining money or value Under false pretenses

Abuse: Fails one or more Principles

Fraud DefinitionFraud Definition

COSTS Fraud (Criminal, Hard) Small

Mass. Auto & WC < 1% Abuse (Not Criminal, Soft Fraud),

BIG Bucks, Depends on Line “Abuse” is (legally) a gray area, unethical

behavior “Abuse” Containment is a Matter for

Company/Industry/Regulator

HOW MUCH CLAIM FRAUD?HOW MUCH CLAIM FRAUD?

10%Fraud

FRAUD TYPESFRAUD TYPES Insurer Fraud

Fraudulent Company Fraudulent Management

Agent Fraud No Policy False Premium

Company Fraud Embezzlement Inside/Outside Arrangements

Claim Fraud Claimant/Insured Providers/Rings

CLAIM FRAUD INDICATORSCLAIM FRAUD INDICATORSVALIDATION PROCEDURESVALIDATION PROCEDURES

Canadian Coalition Against Insurance Fraud (1997) 305 Fraud Indicators (45 vehicle theft)

“No one indicator by itself is necessarily suspicious”.

Problem: How to validate the systematic use of Fraud Indicators?

AIB FRAUD INDICATORSAIB FRAUD INDICATORSAIB FRAUD INDICATORSAIB FRAUD INDICATORS

Accident Characteristics (19) No report by police officer at scene No witnesses to accident

Claimant Characteristics (11) Retained an attorney very quickly Had a history of previous claims

Insured Driver Characteristics (8) Had a history of previous claims Gave address as hotel or P.O. Box

1989 Examples

AIB FRAUD INDICATORSAIB FRAUD INDICATORSAIB FRAUD INDICATORSAIB FRAUD INDICATORS

Injury Characteristics (12) Injury consisted of strain/sprain only No objective evidence of injury

Treatment Characteristics (9) Large number of visits to a chiropractor DC provided 3 or more modalities on most visits

Lost Wages Characteristics (6) Claimant worked for self or family member Employer wage differs from claimed wage loss

1989 Examples

REAL PROBLEMREAL PROBLEM

Classify all claims Identify valid classes

Pay the claim No hassle Visa Example

Identify (possible) fraud Investigation needed

Identify “gray” classes Minimize with “learning” algorithms

Experience and Judgment Artificial Intelligence Systems

Regression Models Fuzzy Clusters Neural Networks Expert Systems Genetic Algorithms All of the Above

FRAUDULENT CLAIM IDENTIFICATION

FRAUDULENT CLAIM IDENTIFICATION

DMDatabases

Scoring Functions

Graded Output

Non-Suspicious ClaimsRoutine Claims

Suspicious ClaimsComplicated Claims

POTENTIAL VALUE OF AN ARTIFICIAL POTENTIAL VALUE OF AN ARTIFICIAL INTELLIGENCE SCORING SYSTEMINTELLIGENCE SCORING SYSTEM

Screening to Detect Fraud Early Auditing of Closed Claims to Measure

Fraud Sorting to Select Efficiently among

Special Investigative Unit Referrals Providing Evidence to Support a Denial Protecting against Bad-Faith

Using Kohonen’s Self-Organizing Feature Map to Using Kohonen’s Self-Organizing Feature Map to Uncover Automobile Bodily Injury Claims FraudUncover Automobile Bodily Injury Claims Fraud

PATRICK L. BROCKETT

Gus S. Wortham Chaired Prof. of Risk Management

University of Texas at Austin

XIAOHUA XIA

University of Texas, at Austin

RICHARD A. DERRIG

Senior Vice President

Automobile Insurers Bureau of Massachusetts

Vice President of Research

Insurance Fraud Bureau of Massachusetts

JOURNAL OF RISK AND INSURANCE, 65:2, 245-274, 1998,

Self-Organizing Feature Maps T. Kohonen 1982-1990 (Cybernetics) Reference vectors map to OUTPUT format in

topologically faithful way. Example: Map onto 40x40 2-dimensional square.

Iterative Process Adjusts All Reference Vectors in a “Neighborhood” of the Nearest One. Neighborhood Size Shrinks over Iterations

NEURAL NETWORKSNEURAL NETWORKSNEURAL NETWORKSNEURAL NETWORKS

MAPPING: PATTERNS-TO-UNITS

Pat

tern

s

FEATURE MAPFEATURE MAPSUSPICION LEVELSSUSPICION LEVELS

1 4 7 10 13 16S1

S4

S7

S10

S13

S16

4-5

3-4

2-3

1-2

0-1

FEATURE MAPFEATURE MAPSIMILIARITY OF A CLAIMSIMILIARITY OF A CLAIM

1 5 9

13

17

S1

S4

S7

S10

S13

S16

4-5

3-4

2-3

1-2

0-1

DATA MODELING EXAMPLE: CLUSTERINGDATA MODELING EXAMPLE: CLUSTERING

Data on 16,000 Medicaid providers analyzed by unsupervised neural net

Neural network clustered Medicaid providers based on 100+ features

Investigators validated a small set of known fraudulent providers

Visualization tool displays clustering, showing known fraud and abuse

Subset of 100 providers with similar patterns investigated: Hit rate > 70% Cube size proportional to annual Medicaid revenues

© 1999 Intelligent Technologies Corporation

Modeling Hidden Exposures in Claim Modeling Hidden Exposures in Claim Severity via the EM AlgorithmSeverity via the EM Algorithm

Grzegorz A. RempalaDepartment of Mathematics

University of Louisvilleand

Richard A. DerrigOPAL Consulting LLC &

Wharton School, University of Pennsylvania

Claim Cost Distribution: Empirical versus Fitted

Hidden Exposures - OverviewHidden Exposures - Overview Modeling hidden risk exposures as additional

dimension(s) of the loss severity distribution

Considering the mixtures of probability distributions as the model for losses affected by hidden exposures with some parameters of the mixtures considered missing (i.e., unobservable in practice)

Approach is feasible due to advancements in the computer driven methodologies dealing with partially hidden or incomplete data models

Empirical data imputation has become more sophisticated and the availability of ever faster computing power have made it increasingly possible to solve these problems via iterative algorithms

Figure 1: Overall distribution of the 348 BI medical bill amounts from Appendix B Figure 1: Overall distribution of the 348 BI medical bill amounts from Appendix B compared with that submitted by provider A. compared with that submitted by provider A.

Left panel: frequency histograms (provider A’s histogram in filled bars).

Right panel: density estimators (provider A’s density in dashed line)

Source: Modeling Hidden Exposures in Claim Severity via the EM Algorithm, Grzegorz A. Rempala, Richard A. Derrig, pg. 9, 11/18/02

Figure 2: EM FitFigure 2: EM Fit

Left panel: mixture of normal distributions fitted via the EM algorithm to BI data

Right panel: Three normal components of the mixture.

Source: Modeling Hidden Exposures in Claim Severity via the EM Algorithm, Grzegorz A. Rempala, Richard A. Derrig, pg. 13, 11/18/02

Figure 3: Latent risk in BI data modeled by the EM Algorithm with Figure 3: Latent risk in BI data modeled by the EM Algorithm with m m = 3.= 3.

Left panel: set of responsibilities δ j 3.

Right panel: the third component of the normal mixture compared with the distribution of provider A’s claims (“A” claims density estimator is a solid curve)

Source: Modeling Hidden Exposures in Claim Severity via the EM Algorithm, Grzegorz A. Rempala, Richard A. Derrig, pg. 14, 11/18/02

Fraud Classification Using Principal Component Fraud Classification Using Principal Component Analysis of RIDITsAnalysis of RIDITs

PATRICK L. BROCKETTGus S. Wortham Chaired Prof. of Risk Management

University of Texas at AustinRICHARD A. DERRIGSenior Vice President

Automobile Insurers Bureau of MassachusettsVice President of Research

Insurance Fraud Bureau of MassachusettsLINDA L. GOLDEN

Marlene & Morton Meyerson Centennial Professor in BusinessUniversity of Texas

Austin, TexasARNOLD LEVINE

Professor EmeritusDepartment of Mathematics

Tulane UniversityNew Orleans LAMARK ALPERT

Professor of MarketingUniversity of Texas

Austin, TexasJOURNAL OF RISK AND INSURANCE, 69:3, SEPT. 2002

THE PROBLEMTHE PROBLEM

Data: Features have no natural metric-scale Model: Stochastic process has no parametric

form Classification: Inverse image of one dimensional

scoring function and decision rule Feature Value: Identify which features are

“important”

PRIDIT METHOD OVERVIEWPRIDIT METHOD OVERVIEW

1. DATA: N Claims, T Features, K sub T Responses, Monotone In “Fraud”

2. RIDIT score each possible response: proportion below minus proportion above, score centered at zero.

3. RESPONSE WEIGHTS: Principal Component of Claims x Features with RIDIT in Cells.

4. SCORE: Sum weights x claim ridit score.

5. PARTITION: above and below zero.

TABLE 1Computation of PRIDIT Scores

VariableVariable

Label

Proportion of “Yes”

Bt1 ("Yes")

Bt2 (“No")

Large # of Visits to Chiropractor TRT1 44% -.56 .44

Chiropractor provided 3 or more modalities on most visits

TRT2 12% -.88 .12

Large # of visits to a physical therapist TRT3 8% -.92 .08

MRI or CT scan but no inpatient hospital charges

TRT4 20% -.80 .20

Use of “high volume” medical provider TRT5 31% -.69 .31

Significant gaps in course of treatment TRT6 9% -.91 .09

Treatment was unusually prolonged

(> 6 months)

TRT7 24% -.76 .24

Indep. Medical examiner questioned extent of treatment

TRT8 11% -.89 .11

Medical audit raised questions about charges

TRT9 4% -.96 .04

TABLE 2Weights for Treatment Variables

VariablePRIDIT Weights

W(∞)

Regression Weights

TRT1 .30 .32***

TRT2 .19 .19***

TRT3 .53 .22***

TRT4 .38 .07

TRT5 .02 .08*

TRT6 .70 -.01

TRT7 .82 .03

TRT8 .37 .18***

TRT9 -.13 .24**

Regression significance shown at 1% (***), 5% (**) or 10% (*) levels.

TABLE 3PRIDIT Transformed Indicators, Scores and Classes

Claim TRT1

TRT2

TRT3

TRT4

TRT5

TRT6

TRT7

TRT8

TRT9

Score Class

1 0.44 0.12 0.08 0.2 0.31 0.09 0.24 0.11 0.04 .07 2

2 0.44 0.12 0.08 0.2 -0.69 0.09 0.24 0.11 0.04 .07 2

3 0.44 -0.88 -0.92 0.2 0.31 -0.91 -0.76 0.11 0.04 -.25 1

4 -0.56 0.12 0.08 0.2 0.31 0.09 0.24 0.11 0.04 .04 2

5 -0.56 -0.88 0.08 0.2 0.31 0.09 0.24 0.11 0.04 .02 2

6 0.44 0.12 0.08 0.2 0.31 0.09 0.24 0.11 0.04 .07 2

7 -0.56 0.12 0.08 0.2 0.31 0.09 -0.76 -0.89 0.04 -.10 1

8 -0.44 0.12 0.08 0.2 -0.69 0.09 0.24 0.11 0.04 .02 2

9 -0.56 -0.88 0.08 -0.8 0.31 0.09 0.24 0.11 -0.96 .05 2

10 -0.56 0.12 0.08 0.2 0.31 0.09 0.24 0.11 0.04 .04 2

TABLE 7

AIB Fraud and Suspicion Score DataTop 10 Fraud Indicators by Weight

PRIDIT Adj. Reg. Score Inv. Reg. Score

ACC3 ACC1 ACC11

ACC4 ACC9 CLT4

ACC15 ACC10 CLT7

CLT11 ACC19 CLT11

INJ1 CLT11 INJ1

INJ2 INS6 INJ3

INJ5 INJ2 INJ8

INJ6 INJ9 INJ11

INS8 TRT1 TRT1

TRT1 LW6 TRT9

TABLE 10

AIB Fraud Indicator and Suspicious Score Classes

Fraud/Non-fraud Classifications

PRIDIT

FraudNon-fraud All

AdjusterRegressionScore

Fraud 30 5 35

Non-Fraud

32 60 92

All 62 65 127

( =11.3) [4.0, 31.8]

REFERENCESREFERENCESBrockett, Patrick L., Derrig, Richard A., Golden, Linda L., Levine, Albert and Alpert, Mark,

(2002), Fraud Classification Using Principal Component Analysis of RIDITs, Journal of Risk and Insurance, 69:3, 341-373.

Brockett, Patrick L., Xiaohua, Xia and Derrig, Richard A., (1998), Using Kohonen’ Self-Organizing Feature Map to Uncover Automobile Bodily Injury Claims Fraud, Journal of Risk and Insurance, 65:245-274

Derrig, R.A. and H.I. Weisberg, [2004], Determinants of Total Compensation for Auto Bodily Injury Liability Under No-Fault: Investigation, Negotiation and the Suspicion of Fraud, Insurance and Risk Management, Volume 71, (4), pp. 633-662.

Derrig, R.A., H.I. Weisberg and Xiu Chen, [1994], Behavioral Factors and Lotteries Under No-Fault with a Monetary Threshold: A Study of Massachusetts Automobile Claims, Journal of Risk and Insurance, 61:2, 245-275.

Rempala, G. and R.A. Derrig, (2005), Modeling Hidden Exposures in Claim Severity via the EM Algorithm, NAAJ, v9,n2,108-128

Viaene, Stijn, Derrig, Richard A., Dedene, Guido, (2004), A Case Study of Applying Boosting Naïve Bayes to Claim Fraud Diagnosis, IEEE Transactions on Knowledge and Data Engineering, v18,n5, May

Viaene, Stijn, Derrig, Richard A., Baesens, Bart, and Dedene, Guido, (2002), A Comparison of State-of-the-Art Classification Techniques for Expert Automobile Insurance Fraud Detection, Journal of Risk and Insurance, 69:3, 373-423.

Fuzzy ReferencesFuzzy ReferencesInsurance Related MaterialBrockett, P.L., Cooper, W.W., Golden, L.L. and Pitaktong, V. (1994), A neural network method for obtaining an early warning of insurer solvency, Journal of Risk and Insurance 61, pp. 402-424.Cummins, J.D. and Derrig, R.A. (1997), Fuzzy financial pricing of property‑liability insurance, North American Actuarial Journal, 1:4, pp. 21-44.Cummins, J.D. and Derrig, R.A. (1993), Fuzzy trends in property-liability insurance claim costs, Journal of Risk and Insurance, September, 60, pp. 429-465.Derrig, R.A. and Ostaszewski, K.M. (1995), Fuzzy techniques of pattern recognition in risk and claim classification, Journal of Risk and Insurance, 62:3 pp.447-482. Derrig, R.A. and Ostaszewski, K.M. (1997), Managing the tax liability of a propety-liability insurance company, Journal of Risk and Insurance, 64:4, pp.695-711.DeWit, G.W. (1982), Underwriting and uncertainty, Insurance: Mathematics and Economics 1, pp. 277-285.Jablonowski, M. (1993), An expert system for retention selection, CPCU Journal, pp. 214-221.Lemaire, Jean (1990), Fuzzy insurance, Astin Bulletin 20(1), pp. 33-55.Ostaszewski, K.M. (1993), An Investigation into Possible Applications of Fuzzy Sets Methods in Actuarial Science, Society of Actuaries, Schaumburg, Illinois.Young, V. (1994), Application of fuzzy sets to group health underwriting, Transactions of the Society of Actuaries 45, pp. 551-590.Young, V. (1996), Rate changing: A fuzzy logic approach, Journal of Risk and Insurance, 63:3, pp.461-484.