the applicability of benford's law fraud detection in the social sciences johannes bauer

Post on 18-Jan-2016

213 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

The Applicability of Benford's Law

Fraud detection in the social sciences

Johannes Bauer

2 of 18

0 1 2 3 4 5 6 7 8 9

Benford 0,120 0,114 0,109 0,104 0,100 0,097 0,093 0,090 0,088 0,085

0,0

0,1

0,2

0,3

0,4

second significant digit

0 1 2 3 4 5 6 7 8 9

Benford 0,102 0,101 0,101 0,101 0,100 0,100 0,099 0,099 0,099 0,098

0,0

0,1

0,2

0,3

0,4

third significant digit

0 1 2 3 4 5 6 7 8 9

Benford 0,120 0,114 0,109 0,104 0,100 0,097 0,093 0,090 0,088 0,085

0 1 2 3 4 5 6 7 8 9

Benford 0,100 0,100 0,100 0,100 0,100 0,100 0,099 0,099 0,099 0,099

0,0

0,1

0,2

0,3

0,4

fourth significant digit

Benford distribution

k

i

kikk ddDdDP

1

111011 ])10(1[log)...(

1 2 3 4 5 6 7 8 9

Benford 0,301 0,176 0,125 0,097 0,079 0,066 0,058 0,051 0,046

0,0

0,1

0,2

0,3

0,4

first significant digit

3 of 18

• Paraguay

• Share prices

• The surface of rivers

• House numbers

• Company balance sheets

Two ways to Benford's law

• Multiplications – R.W. Hammering (1970)

• Distributions – Theodor Hill (1995)

Benford distribution

Benford distributed data

4 of 18

Detecting fraud

Approach:

• Distribution of digits with specific values

• Uncovering of fraudulent data

5 of 18

First step: What is Benford distributed

Datasource: Kölner Zeitschrift für Soziologie und Sozialforschung Feb.1985 to Mar.2007 with support from Lehrstuhl Braun

N 1.digit 2.digit 3.digit 4.digit

Not standardised regressions 2180 X X X X

Logistic regressions 2251 X X X

T-values 1325 X X

Cox regressions 506 X X

Pseudo r² 239 X X X

Chi-square values 188 X X X

R² 131 X (X)

6 of 18

First step: Unified distribution of digits

Normaldistribution

Mean: 3 Standard error: 2

Density between 3 and 4 (second digit)

7 of 18

Research survey LS Braun

Hypothesis: 'The higher the education a person has, the fewer cigarettes he consumes per day'

1. digit: Ho rejected (X²=103.39,df = 8,p = 0.000)

2. digit: Ho rejected (X²=122.59,df = 9,p = 0.000)

Fraudulent regressions coefficients Distribution of the first digit (n= 4621)

0,0

0,1

0,2

0,3

0,4

Benford 0,301 0,176 0,125 0,097 0,079 0,067 0,058 0,051 0,046

Students 0,309 0,173 0,136 0,080 0,072 0,048 0,053 0,061 0,067

1 2 3 4 5 6 7 8 9

8 of 18

Research survey: 3. and 4. digit

Fraudulent regressions coefficients Distribution of the third digit (n= 4541)

0,0

0,1

0,2

0,3

0,4

Benford 0,10 0,10 0,10 0,10 0,10 0,10 0,09 0,09 0,09 0,09

Studenten 0,09 0,15 0,10 0,14 0,09 0,07 0,07 0,09 0,08 0,07

0 1 2 3 4 5 6 7 8 9

Fraudulent regressions coefficients Distribution of the fourth digit (n= 4378)

0,0

0,1

0,2

0,3

0,4

Benford 0,10 0,10 0,10 0,10 0,10 0,10 0,09 0,09 0,09 0,09

Studenten 0,03 0,18 0,11 0,14 0,09 0,08 0,08 0,09 0,08 0,08

0 1 2 3 4 5 6 7 8 9

Ho rejected(X² = 304.89, df=9, p= 0.000)

Ho rejected(X² = 622.20, df=9, p= 0.000)

Students Students

9 of 18

Research survey: Individual data

Deviations from the Benford distribution

1. digit 2. digit 3. digit 4. digit

35 40 42 41

47 persons

0.744 0.851 0.893 0.872

absolute

percentage

10 of 18

Second step: detecting fraud

Approach:

• At what point is fraud recognisable?

Method:

• Random selection of fraudulent digits

• Goodness of fit test used on a Benford distribution

• Logistic regression

11 of 18

first significant digit

Second step: results

Mean number of cases in order to reject H0 with a probability of 95 %:

1. digit: 989 cases

2. digit: 766 cases

3. digit: 351 cases

4. digit: 138 cases

12 of 18

first significant digit ~ 50 % fabricated data

2. digit: 3308 cases

3. digit: 1351 cases

4. digit: 585 cases

Second step: results

Mean number of cases in order to reject H0 with a probability of 95 %:

1. digit: 4001 cases

13 of 18

first significant digit ~ 10 % fabricated data

2. digit: 78883 cases

3. digit: 31266 cases

4. digit: 12592 cases

Second step: results

Mean number of cases in order to reject H0 with a probability of 95 %:

1. digit: 94439 cases

14 of 18

Second step: Testing the results

first significant digit ~ 100 % fabricated data

47 persons, each with 100 data

1. digit

7 - 8

0.167

absolut

percentage

exp.deviation

1. digit

35

0.744

absolut

percentage

observed

15 of 18

Second step: Individual data

Mean number of cases in order to reject H0 with a probability of 95 %:

1. digit: 136 cases

2. digit: 102 cases

3. digit: 100 cases

4. digit: 69 cases

first significant digit ~ 100 % fabricated data

16 of 18

Second step: Individual data

first digit

fourth digit

17 of 18

Summary of results

Fraud detection with Benford's law – proposals:

• Observation of individual data („The Wisdom of Crowds“ effect)

• Observation of higher digits

• Recording of all possible metric coefficients

• Application of Goodness of fit test, which react more strongly to the sample size (i.e. chi-square g.o.f. test)

• With a very small degree of fraud, the result is strongly dependent on the procedure of the forger

18 of 18

Literature

Benford, F. (1938): The Law of Anomalous Numbers, in: Proceedings of the American Philosophical Society, 78(4): 551-572

Diekmann, A. (2007): Not the First Digit! Using Benford’s Law to Detect Fraudulent Scientific Data, in: Journal of Applied Statistics, 34(3): 321-329

Hill, T.P. (1996): A Statistical Derivation of the Significant-Digit Law, in: Statistical Science, 10(4) 354-363

Hammering, R.W. (1970): On the Distribution of Numbers, in: Bell System Technical Journal, 49(8): 1609-1625

Judge, G.; Schechter, L. (2006): Detection problems in survey data using Benford's Law: Preprint.

Nigrini, M. (1999): I've Got Your Number. How a Mathematical Phenomenon Can Help CPAs Uncover Fraud and Other Irregularities, in: Journal of Accountancy, May: 79-83

Pietronero, L.; Tossati, E.; Tossati, V., Vespignani, A. (2001): Explaining the uneven distribution of numbers in nature: the laws of Benford and Zipf, in: Physica A, 293(1): 297-304

top related