the applicability of benford's law fraud detection in the social sciences johannes bauer
TRANSCRIPT
The Applicability of Benford's Law
Fraud detection in the social sciences
Johannes Bauer
2 of 18
0 1 2 3 4 5 6 7 8 9
Benford 0,120 0,114 0,109 0,104 0,100 0,097 0,093 0,090 0,088 0,085
0,0
0,1
0,2
0,3
0,4
second significant digit
0 1 2 3 4 5 6 7 8 9
Benford 0,102 0,101 0,101 0,101 0,100 0,100 0,099 0,099 0,099 0,098
0,0
0,1
0,2
0,3
0,4
third significant digit
0 1 2 3 4 5 6 7 8 9
Benford 0,120 0,114 0,109 0,104 0,100 0,097 0,093 0,090 0,088 0,085
0 1 2 3 4 5 6 7 8 9
Benford 0,100 0,100 0,100 0,100 0,100 0,100 0,099 0,099 0,099 0,099
0,0
0,1
0,2
0,3
0,4
fourth significant digit
Benford distribution
k
i
kikk ddDdDP
1
111011 ])10(1[log)...(
1 2 3 4 5 6 7 8 9
Benford 0,301 0,176 0,125 0,097 0,079 0,066 0,058 0,051 0,046
0,0
0,1
0,2
0,3
0,4
first significant digit
3 of 18
• Paraguay
• Share prices
• The surface of rivers
• House numbers
• Company balance sheets
Two ways to Benford's law
• Multiplications – R.W. Hammering (1970)
• Distributions – Theodor Hill (1995)
Benford distribution
Benford distributed data
4 of 18
Detecting fraud
Approach:
• Distribution of digits with specific values
• Uncovering of fraudulent data
5 of 18
First step: What is Benford distributed
Datasource: Kölner Zeitschrift für Soziologie und Sozialforschung Feb.1985 to Mar.2007 with support from Lehrstuhl Braun
N 1.digit 2.digit 3.digit 4.digit
Not standardised regressions 2180 X X X X
Logistic regressions 2251 X X X
T-values 1325 X X
Cox regressions 506 X X
Pseudo r² 239 X X X
Chi-square values 188 X X X
R² 131 X (X)
6 of 18
First step: Unified distribution of digits
Normaldistribution
Mean: 3 Standard error: 2
Density between 3 and 4 (second digit)
7 of 18
Research survey LS Braun
Hypothesis: 'The higher the education a person has, the fewer cigarettes he consumes per day'
1. digit: Ho rejected (X²=103.39,df = 8,p = 0.000)
2. digit: Ho rejected (X²=122.59,df = 9,p = 0.000)
Fraudulent regressions coefficients Distribution of the first digit (n= 4621)
0,0
0,1
0,2
0,3
0,4
Benford 0,301 0,176 0,125 0,097 0,079 0,067 0,058 0,051 0,046
Students 0,309 0,173 0,136 0,080 0,072 0,048 0,053 0,061 0,067
1 2 3 4 5 6 7 8 9
8 of 18
Research survey: 3. and 4. digit
Fraudulent regressions coefficients Distribution of the third digit (n= 4541)
0,0
0,1
0,2
0,3
0,4
Benford 0,10 0,10 0,10 0,10 0,10 0,10 0,09 0,09 0,09 0,09
Studenten 0,09 0,15 0,10 0,14 0,09 0,07 0,07 0,09 0,08 0,07
0 1 2 3 4 5 6 7 8 9
Fraudulent regressions coefficients Distribution of the fourth digit (n= 4378)
0,0
0,1
0,2
0,3
0,4
Benford 0,10 0,10 0,10 0,10 0,10 0,10 0,09 0,09 0,09 0,09
Studenten 0,03 0,18 0,11 0,14 0,09 0,08 0,08 0,09 0,08 0,08
0 1 2 3 4 5 6 7 8 9
Ho rejected(X² = 304.89, df=9, p= 0.000)
Ho rejected(X² = 622.20, df=9, p= 0.000)
Students Students
9 of 18
Research survey: Individual data
Deviations from the Benford distribution
1. digit 2. digit 3. digit 4. digit
35 40 42 41
47 persons
0.744 0.851 0.893 0.872
absolute
percentage
10 of 18
Second step: detecting fraud
Approach:
• At what point is fraud recognisable?
Method:
• Random selection of fraudulent digits
• Goodness of fit test used on a Benford distribution
• Logistic regression
11 of 18
first significant digit
Second step: results
Mean number of cases in order to reject H0 with a probability of 95 %:
1. digit: 989 cases
2. digit: 766 cases
3. digit: 351 cases
4. digit: 138 cases
12 of 18
first significant digit ~ 50 % fabricated data
2. digit: 3308 cases
3. digit: 1351 cases
4. digit: 585 cases
Second step: results
Mean number of cases in order to reject H0 with a probability of 95 %:
1. digit: 4001 cases
13 of 18
first significant digit ~ 10 % fabricated data
2. digit: 78883 cases
3. digit: 31266 cases
4. digit: 12592 cases
Second step: results
Mean number of cases in order to reject H0 with a probability of 95 %:
1. digit: 94439 cases
14 of 18
Second step: Testing the results
first significant digit ~ 100 % fabricated data
47 persons, each with 100 data
1. digit
7 - 8
0.167
absolut
percentage
exp.deviation
1. digit
35
0.744
absolut
percentage
observed
15 of 18
Second step: Individual data
Mean number of cases in order to reject H0 with a probability of 95 %:
1. digit: 136 cases
2. digit: 102 cases
3. digit: 100 cases
4. digit: 69 cases
first significant digit ~ 100 % fabricated data
16 of 18
Second step: Individual data
first digit
fourth digit
17 of 18
Summary of results
Fraud detection with Benford's law – proposals:
• Observation of individual data („The Wisdom of Crowds“ effect)
• Observation of higher digits
• Recording of all possible metric coefficients
• Application of Goodness of fit test, which react more strongly to the sample size (i.e. chi-square g.o.f. test)
• With a very small degree of fraud, the result is strongly dependent on the procedure of the forger
18 of 18
Literature
Benford, F. (1938): The Law of Anomalous Numbers, in: Proceedings of the American Philosophical Society, 78(4): 551-572
Diekmann, A. (2007): Not the First Digit! Using Benford’s Law to Detect Fraudulent Scientific Data, in: Journal of Applied Statistics, 34(3): 321-329
Hill, T.P. (1996): A Statistical Derivation of the Significant-Digit Law, in: Statistical Science, 10(4) 354-363
Hammering, R.W. (1970): On the Distribution of Numbers, in: Bell System Technical Journal, 49(8): 1609-1625
Judge, G.; Schechter, L. (2006): Detection problems in survey data using Benford's Law: Preprint.
Nigrini, M. (1999): I've Got Your Number. How a Mathematical Phenomenon Can Help CPAs Uncover Fraud and Other Irregularities, in: Journal of Accountancy, May: 79-83
Pietronero, L.; Tossati, E.; Tossati, V., Vespignani, A. (2001): Explaining the uneven distribution of numbers in nature: the laws of Benford and Zipf, in: Physica A, 293(1): 297-304