![Page 1: The fundamental problem of Forensic Statistics](https://reader031.vdocuments.mx/reader031/viewer/2022020218/55a4f03a1a28ab26408b47ad/html5/thumbnails/1.jpg)
The fundamental problem of Forensic Statistics
How to assess the evidential value of a rare type match
Giulia Cereda, Université de Lausanne
Richard D. Gill, University of Leiden
![Page 2: The fundamental problem of Forensic Statistics](https://reader031.vdocuments.mx/reader031/viewer/2022020218/55a4f03a1a28ab26408b47ad/html5/thumbnails/2.jpg)
The problem
• A crime.
• A piece of evidence found at the crime scene (DNA, fingerprint, footprint, handwriting, etc.).
• A suspect (identified independently).
• A match between the suspect’s characteristic and the evidence’s characteristic.
• A database that counts the frequency of each characteristic.
• The database frequency of the crime-stain (and suspect) characteristic is 0.
![Page 3: The fundamental problem of Forensic Statistics](https://reader031.vdocuments.mx/reader031/viewer/2022020218/55a4f03a1a28ab26408b47ad/html5/thumbnails/3.jpg)
Example
• A DNA stain is found on the victim’s body.
• Y-STR profile of type h.
• A suspect is identified who is also of Y-STR type h.
• The reference Y-STR database does not contain type h.
Small databases
![Page 4: The fundamental problem of Forensic Statistics](https://reader031.vdocuments.mx/reader031/viewer/2022020218/55a4f03a1a28ab26408b47ad/html5/thumbnails/4.jpg)
Generalized Good (GG) method: a nonparametric Good-type estimator based on Good (1953).
DiscLap-method (Andersen et al. 2013).
Explore other methods (Brenner 2010, Roewer 2000, …).
How should this kind of evidence be evaluated?
![Page 5: The fundamental problem of Forensic Statistics](https://reader031.vdocuments.mx/reader031/viewer/2022020218/55a4f03a1a28ab26408b47ad/html5/thumbnails/5.jpg)
The Likelihood Ratio
E is the evidence to be evaluated
B is the background information
Hp: the suspect left the stain
Hd: someone else left the stain
Many possible choices of E and B.
THE likelihood ratio does not exist.
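With E, B, Hp and Hd as above, the likelihood ratio has the usual form. Its value depends on what is put into E and into B, which is why there is no single "the" likelihood ratio:

```latex
\mathrm{LR} = \frac{\Pr(E \mid H_p, B)}{\Pr(E \mid H_d, B)}
```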
![Page 6: The fundamental problem of Forensic Statistics](https://reader031.vdocuments.mx/reader031/viewer/2022020218/55a4f03a1a28ab26408b47ad/html5/thumbnails/6.jpg)
Typical choice
• E = the particular haplotype of the suspect and of the crime stain
• B = the list of haplotypes in the database
e.g. Discrete Laplace method
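A sketch of why this choice reduces the LR to a single population frequency. It assumes that the population frequency p_h of haplotype h is known and that, under Hd, the suspect's haplotype and the stain donor's haplotype are independent draws from the population (the symbol p_h is introduced here for illustration). Under Hp the stain has type h with probability 1 once the suspect does:

```latex
\mathrm{LR}
= \frac{\Pr(h_s = h,\ h_c = h \mid H_p, B)}{\Pr(h_s = h,\ h_c = h \mid H_d, B)}
= \frac{p_h \cdot 1}{p_h \cdot p_h}
= \frac{1}{p_h}
```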
![Page 7: The fundamental problem of Forensic Statistics](https://reader031.vdocuments.mx/reader031/viewer/2022020218/55a4f03a1a28ab26408b47ad/html5/thumbnails/7.jpg)
This frequency is not known. It can only be estimated (e.g. with the DiscLap method), and this introduces uncertainty.
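To see why the rare type case is hard, consider the naive plug-in estimate of the frequency: the count of h in the database divided by the database size. The minimal sketch below (function names are hypothetical, for illustration only) shows that for a type absent from the database this estimate is 0, so the plug-in 1/p̂ is not a usable LR:

```python
from collections import Counter
import math

# Hypothetical helper names, for illustration only.
def naive_frequency(database, haplotype):
    """Relative frequency of `haplotype` in `database` (count / N)."""
    return Counter(database)[haplotype] / len(database)

def plug_in_lr(p_hat):
    """Plug-in LR = 1 / p_hat; infinite when the estimate is 0."""
    return math.inf if p_hat == 0 else 1.0 / p_hat

database = ["a", "a", "b", "c"]          # toy database of size N = 4
p_hat = naive_frequency(database, "h")   # h never occurs in the database
print(p_hat, plug_in_lr(p_hat))          # 0.0 inf
```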
![Page 8: The fundamental problem of Forensic Statistics](https://reader031.vdocuments.mx/reader031/viewer/2022020218/55a4f03a1a28ab26408b47ad/html5/thumbnails/8.jpg)
A different choice
• E = the number of times the suspect’s haplotype (hs) and the crime-stain haplotype (hc) occur in the database, and whether or not they are the same haplotype.
• B = the frequencies of the frequencies of the database.
Information about the particular haplotype is ignored.
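Under this choice, E keeps only two counts and an equality indicator. A minimal sketch, assuming haplotypes are stored as tuples of allele repeat numbers (the function name is hypothetical):

```python
from collections import Counter

def reduced_evidence(database, h_s, h_c):
    """Hypothetical helper: return (count of h_s, count of h_c, h_s == h_c).

    The identity of the haplotypes themselves is discarded.
    """
    counts = Counter(database)
    return counts[h_s], counts[h_c], h_s == h_c

db = [(12, 13, 30, 24, 10, 11, 13), (13, 13, 30, 24, 10, 11, 13)]
h = (14, 13, 30, 24, 10, 11, 13)   # hypothetical haplotype not in db
print(reduced_evidence(db, h, h))  # (0, 0, True)
```

In the rare type match both counts are 0 and the indicator is True.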
![Page 9: The fundamental problem of Forensic Statistics](https://reader031.vdocuments.mx/reader031/viewer/2022020218/55a4f03a1a28ab26408b47ad/html5/thumbnails/9.jpg)
• D database
Gotham City, 12,13,30,24,10,11,13
Gotham City, 12,13,30,24,10,11,14
Gotham City, 13,12,30,24,10,11,13
Gotham City, 13,13,29,23,10,11,13
Gotham City, 13,13,29,24,10,11,14
Gotham City, 13,13,29,24,11,13,13
Gotham City, 13,13,29,24,11,13,13
Gotham City, 13,13,30,24,10,11,13
Gotham City, 13,13,30,24,10,11,13
Gotham City, 13,13,30,24,10,11,13
Gotham City, 13,13,30,24,10,11,13
D’ database count
Gotham City, 12,13,30,24,10,11,13 1
Gotham City, 12,13,30,24,10,11,14 1
Gotham City, 13,12,30,24,10,11,13 1
Gotham City, 13,13,29,23,10,11,13 1
Gotham City, 13,13,29,24,10,11,14 1
Gotham City, 13,13,29,24,11,13,13 2
Gotham City, 13,13,30,24,10,11,13 4
The frequencies of frequencies
N1 5
N2 1
N3 0
N4 1
Df frequencies of frequencies
Information is discarded
N1 is the number of haplotypes that occur once in D (singletons).
N2 is the number of haplotypes that occur twice (doubletons), etc.
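The passage from D to D' to Df can be reproduced with Python's `collections.Counter`. A minimal sketch on the Gotham City listing (the location column is dropped; `D_count` is a name chosen here for D'):

```python
from collections import Counter

# The 11 Gotham City haplotypes of database D (allele repeat numbers).
D = [
    (12, 13, 30, 24, 10, 11, 13),
    (12, 13, 30, 24, 10, 11, 14),
    (13, 12, 30, 24, 10, 11, 13),
    (13, 13, 29, 23, 10, 11, 13),
    (13, 13, 29, 24, 10, 11, 14),
    (13, 13, 29, 24, 11, 13, 13),
    (13, 13, 29, 24, 11, 13, 13),
    (13, 13, 30, 24, 10, 11, 13),
    (13, 13, 30, 24, 10, 11, 13),
    (13, 13, 30, 24, 10, 11, 13),
    (13, 13, 30, 24, 10, 11, 13),
]

# D': the database count, one entry per distinct haplotype.
D_count = Counter(D)

# Df: frequencies of frequencies, N_j = number of haplotypes seen exactly j times.
Df = Counter(D_count.values())

for j in range(1, max(Df) + 1):
    print(f"N{j} {Df[j]}")   # N1 5, N2 1, N3 0, N4 1
```

Only Df is kept as background information; which haplotype is a singleton, for instance, is lost.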
![Page 10: The fundamental problem of Forensic Statistics](https://reader031.vdocuments.mx/reader031/viewer/2022020218/55a4f03a1a28ab26408b47ad/html5/thumbnails/10.jpg)
A database D of size N
Gotham City, 12,13,30,24,10,11,13
Gotham City, 12,13,30,24,10,11,14
Gotham City, 13,12,30,24,10,11,13
Gotham City, 13,13,29,23,10,11,13
Gotham City, 13,13,29,24,10,11,14
Gotham City, 13,13,29,24,11,13,13
Gotham City, 13,13,29,24,11,13,13
Gotham City, 13,13,30,24,10,11,13
Gotham City, 13,13,30,24,10,11,13
Gotham City, 13,13,30,24,10,11,13
Gotham City, 13,13,30,24,10,11,13
can be considered as an i.i.d. sample (Y1, Y2, …, YN) from species {1, 2, …, s} with probabilities (p1, p2, …, ps).
The database count
Gotham City, 12,13,30,24,10,11,13 1
Gotham City, 12,13,30,24,10,11,14 1
Gotham City, 13,12,30,24,10,11,13 1
Gotham City, 13,13,29,23,10,11,13 1
Gotham City, 13,13,29,24,10,11,14 1
Gotham City, 13,13,29,24,11,13,13 2
Gotham City, 13,13,30,24,10,11,13 4
is a realization of the random vector (X1, X2, …, Xs), where Xj = #{i | Yi = j}.
The frequencies of frequencies
consist of (N1, N2, …), where Nj = #{i | Xi = j}.
N1 5
N2 1
N3 0
N4 1
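The model can be simulated directly: draw Y1, …, YN i.i.d. from the species probabilities, then form X and the frequencies of frequencies. A minimal sketch; the probabilities and function names are hypothetical, and only Nj for j ≥ 1 is computed (unseen species are not counted):

```python
import random
from collections import Counter

def sample_database(p, N, rng):
    """Draw Y_1..Y_N i.i.d. from species 1..s with probabilities p."""
    species = range(1, len(p) + 1)
    return rng.choices(species, weights=p, k=N)

def counts_and_fof(Y, s):
    """Return X = (X_1..X_s) with X_j = #{i : Y_i = j}, and N_j = #{i : X_i = j}."""
    c = Counter(Y)
    X = [c[j] for j in range(1, s + 1)]
    fof = Counter(x for x in X if x > 0)   # N_j for j >= 1 only
    return X, fof

rng = random.Random(1)
p = [0.4, 0.3, 0.1, 0.1, 0.05, 0.05]   # hypothetical species probabilities
Y = sample_database(p, 11, rng)
X, fof = counts_and_fof(Y, len(p))
print(X, dict(fof))
```

Whatever the sample, the identities Σj j·Nj = N and Σj Nj = number of observed species always hold.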
![Page 11: The fundamental problem of Forensic Statistics](https://reader031.vdocuments.mx/reader031/viewer/2022020218/55a4f03a1a28ab26408b47ad/html5/thumbnails/11.jpg)
• E = the number of times the suspect’s haplotype (hs) and the crime-stain haplotype (hc) occur in the database, and whether or not they are the same haplotype.
• B = the frequencies of the frequencies of the database (Df).
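A sketch of what this choice gives in the rare type case (hs = hc, neither present in a database of size N), under the i.i.d. model and, as a simplifying assumption, setting aside the conditioning on Df. Under Hp the evidence reduces to "the suspect's type is new"; under Hd the suspect's and the donor's types must be the same new type:

```latex
\mathrm{LR}
= \frac{\Pr(h_s \text{ new} \mid H_p)}{\Pr(h_s = h_c,\ h_s \text{ new} \mid H_d)}
= \frac{\sum_{i=1}^{s} p_i \,(1 - p_i)^{N}}{\sum_{i=1}^{s} p_i^{2} \,(1 - p_i)^{N}}
```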
![Page 12: The fundamental problem of Forensic Statistics](https://reader031.vdocuments.mx/reader031/viewer/2022020218/55a4f03a1a28ab26408b47ad/html5/thumbnails/12.jpg)
![Page 13: The fundamental problem of Forensic Statistics](https://reader031.vdocuments.mx/reader031/viewer/2022020218/55a4f03a1a28ab26408b47ad/html5/thumbnails/13.jpg)
unbiased estimator for the numerator
unbiased estimator for the denominator
It is more sensible to estimate instead of .
is approximately unbiased for .
This suggests using
as an estimator for
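Good-type estimators rest on moment identities for the frequencies of frequencies. Under the i.i.d. model, for a database of size N, E[N1] = N Σ pi(1 − pi)^(N−1) and E[N2] = C(N, 2) Σ pi²(1 − pi)^(N−2). So N1/N and N2/C(N, 2) are unbiased for sums of the same shape as Σ pi(1 − pi)^N and Σ pi²(1 − pi)^N. The sketch below checks these two identities by simulation on a hypothetical population; it is not the authors' GG estimator:

```python
import random
from collections import Counter
from math import comb

def expected_n1_n2(p, N):
    """Exact E[N1] and E[N2] for an i.i.d. sample of size N."""
    e1 = N * sum(q * (1 - q) ** (N - 1) for q in p)
    e2 = comb(N, 2) * sum(q ** 2 * (1 - q) ** (N - 2) for q in p)
    return e1, e2

def simulated_n1_n2(p, N, reps, rng):
    """Monte Carlo averages of N1 and N2 over `reps` simulated databases."""
    tot1 = tot2 = 0
    for _ in range(reps):
        counts = Counter(rng.choices(range(len(p)), weights=p, k=N))
        fof = Counter(counts.values())   # frequencies of frequencies
        tot1 += fof[1]
        tot2 += fof[2]
    return tot1 / reps, tot2 / reps

p = [1 / 50] * 50   # hypothetical population: 50 equally frequent types
N = 20
exact = expected_n1_n2(p, N)
approx = simulated_n1_n2(p, N, reps=2000, rng=random.Random(0))
print(exact, approx)
```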
![Page 14: The fundamental problem of Forensic Statistics](https://reader031.vdocuments.mx/reader031/viewer/2022020218/55a4f03a1a28ab26408b47ad/html5/thumbnails/14.jpg)
How well estimates the true (unknown) ?
Take a big database of size 12,727 and consider it as the world population. Set C1 = 0, C2 = 0.
Then,
1. Sample a small database of size N = 100 + 1 + 1.
2. If the 101st type is new with respect to the first 100, increase C1: C1 = C1 + 1.
3. If the 101st is a new type and is equal to the 102nd, increase C2: C2 = C2 + 1.
4. Repeat steps 1-3 M = 10,000 times.
P1 = C1/M, P2 = C2/M.
distribution of over many replications of small databases (size N=100) sampled from a bigger one (size N=12,727) which we pretend is the population.
And from which we obtain a value for 2.603:
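Steps 1-4 translate directly into a Monte Carlo loop. A minimal sketch under stated assumptions: the small databases are drawn with replacement, and since the 12,727-haplotype database is not reproduced here, a hypothetical stand-in population is used. On our reading, P1 and P2 estimate the probabilities of the evidence under Hp and Hd in the rare type case, so P1/P2 plays the role of the "true" LR in this experiment:

```python
import random

def simulate_p1_p2(population, n=100, m=10_000, seed=0):
    """Monte Carlo version of steps 1-4; returns (P1, P2).

    `population` holds one haplotype label per individual of the big
    database. Drawing with replacement is an assumption of this sketch.
    """
    rng = random.Random(seed)
    c1 = c2 = 0
    for _ in range(m):
        sample = rng.choices(population, k=n + 2)   # step 1: size n + 1 + 1
        first_n = set(sample[:n])
        h_101, h_102 = sample[n], sample[n + 1]
        if h_101 not in first_n:                     # step 2: 101st type is new
            c1 += 1
            if h_102 == h_101:                       # step 3: new and equal to 102nd
                c2 += 1
    return c1 / m, c2 / m                            # after step 4: P1, P2

# Hypothetical stand-in population: 300 types seen 1 to 5 times each.
population = [f"h{i}" for i in range(300) for _ in range(1 + i % 5)]
P1, P2 = simulate_p1_p2(population)
print(P1, P2, P1 / P2 if P2 > 0 else float("inf"))
```

Nesting step 3 inside step 2 guarantees C2 ≤ C1, since a match is only counted when the 101st type is new.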
![Page 15: The fundamental problem of Forensic Statistics](https://reader031.vdocuments.mx/reader031/viewer/2022020218/55a4f03a1a28ab26408b47ad/html5/thumbnails/15.jpg)
We sample 1000 databases of size 100 from the big one, and for each we calculate the estimate :
Performance of the GG-method
We know .
![Page 16: The fundamental problem of Forensic Statistics](https://reader031.vdocuments.mx/reader031/viewer/2022020218/55a4f03a1a28ab26408b47ad/html5/thumbnails/16.jpg)
![Page 17: The fundamental problem of Forensic Statistics](https://reader031.vdocuments.mx/reader031/viewer/2022020218/55a4f03a1a28ab26408b47ad/html5/thumbnails/17.jpg)
How well estimates the true (unknown) ?
Distribution over many replications of small databases (size N = 100) and a new haplotype, sampled from a bigger one (size 12,727).
For each sampled database, the true frequency of the new haplotype h is taken to be its frequency in the big database.
The estimated frequency is calculated using the Discrete Laplace method with default options (iterations, init_y …).
We calculate the distribution of and for each database and new haplotype sampled.
![Page 18: The fundamental problem of Forensic Statistics](https://reader031.vdocuments.mx/reader031/viewer/2022020218/55a4f03a1a28ab26408b47ad/html5/thumbnails/18.jpg)
Performance of the DiscLap-method
Comparing the distribution of
![Page 19: The fundamental problem of Forensic Statistics](https://reader031.vdocuments.mx/reader031/viewer/2022020218/55a4f03a1a28ab26408b47ad/html5/thumbnails/19.jpg)
Comparing the errors of the two methods
[Figure: log10(Ratio_Andersen) for the DiscLap-method and log10(Ratio_Gill) for the GG-method, each plotted against Index (0 to 1000); y-axis from 0 to 6.]
![Page 20: The fundamental problem of Forensic Statistics](https://reader031.vdocuments.mx/reader031/viewer/2022020218/55a4f03a1a28ab26408b47ad/html5/thumbnails/20.jpg)
Comparing the errors of the two methods
[Figure: log10(Ratio_Andersen) for the DiscLap-method and log10(Ratio_Gill) for the GG-method, on a common scale from −1 to 6.]
![Page 21: The fundamental problem of Forensic Statistics](https://reader031.vdocuments.mx/reader031/viewer/2022020218/55a4f03a1a28ab26408b47ad/html5/thumbnails/21.jpg)
Remarks
Basic uncertainty:
• whether or not the trace comes from the suspect.
Two more levels of uncertainty:
• whether or not the model M that we are assuming for Pr is “correct enough”;
• whether or not the parameters of Pr in the model M are “correct enough”.
![Page 22: The fundamental problem of Forensic Statistics](https://reader031.vdocuments.mx/reader031/viewer/2022020218/55a4f03a1a28ab26408b47ad/html5/thumbnails/22.jpg)
Maybe DiscLap was never intended to be used for such small databases.
Maybe DiscLap performs better for our purpose when used in cleverer ways, targeted to our purpose.
The error in the DiscLap method comes from two levels of uncertainty:
• population vs DiscLap;
• parameter estimation (within DiscLap).
The GG method is “model-free” and thus has only one level of uncertainty.
![Page 23: The fundamental problem of Forensic Statistics](https://reader031.vdocuments.mx/reader031/viewer/2022020218/55a4f03a1a28ab26408b47ad/html5/thumbnails/23.jpg)
Conclusions
• The situation is more complex than it appears.
• Using more information gives a less accurate LR.
• Assuming less gives a more reliable LR.
![Page 24: The fundamental problem of Forensic Statistics](https://reader031.vdocuments.mx/reader031/viewer/2022020218/55a4f03a1a28ab26408b47ad/html5/thumbnails/24.jpg)
References