Analysis of Y-Chromosomal STR Population Data Using the Discrete Laplace Model 9th ICFIS Leiden University, August 2014 Mikkel Meyer Andersen , Poul Svante Eriksen and Niels Morling


Analysis of Y-Chromosomal STR Population Data Using the Discrete Laplace Model

9th ICFIS, Leiden University, August 2014

Mikkel Meyer Andersen, Poul Svante Eriksen and Niels Morling


The Discrete Laplace Method

Outline:

- Introduction
- Estimators
- Discrete Laplace
- Match probability
- Mixture analysis
- Cluster analysis
- Conclusion

Themes

- The discrete Laplace method and its applications
- Comparing methods for calculating LR for Y-STR data


Introduction


Evidential weight

$H_p$ (prosecutor’s hypothesis): ’The suspect left the Y-chromosome DNA in the crime stain.’

$H_d$ (defence attorney’s hypothesis): ’A random man left the Y-chromosome DNA in the crime stain.’

$E$: Evidence (e.g. DNA profile from crime scene)

$$LR = \frac{P(E \mid H_p)}{P(E \mid H_d)}$$

Match:

$$LR = \frac{1}{P(E \mid H_d)}$$

Non-match:

$$LR = \frac{0}{P(E \mid H_d)}$$

(Ideal situation, no errors, etc.)

Y-STR: Loci are not independent ⇒ No product rule


Sparsity of Y-STRs

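19,630 samples; number of haplotypes observed n times, by forensic marker set:

| Times observed | MHT (9 loci) | SWGDAM (11 loci) | PPY12 (12 loci) | Yfiler (17 loci) | PPY23 (23 loci) |
|---|---|---|---|---|---|
| n = 1 (singletons) | 6,083 (31.0%) | 8,495 (43.3%) | 9,092 (46.3%) | 15,263 (77.8%) | 18,237 (92.9%) |
| n = 2 (doubletons) | 1,131 | 1,227 | 1,260 | 1,064 | 531 |
| n = 3 | 435 | 436 | 416 | 256 | 64 |
| n = 4 | 226 | 199 | 196 | 94 | 16 |
| n = 5 | 114 | 101 | 106 | 63 | 6 |
| n = 6 | 86 | 85 | 85 | 21 | 2 |
| n = 7 | 63 | 51 | 50 | 12 | 2 |
| n = 8 | 43 | 50 | 41 | 12 | 1 |
| n = 9 | 29 | 29 | 34 | 9 | |
| n = 10 | 31 | 21 | 24 | 4 | |
| n = 11 | 22 | 24 | 28 | 5 | 1 |
| ... | | | | | |
| n ∈ (30, 40] | 13 | 11 | 7 | 1 | |
| ... | | | | | |
| n ∈ (100, 515] | 8 | 4 | 4 | | |

Purps J, Siegert S, et al. (2014). A global analysis of Y-chromosomal haplotype diversity for 23 STR loci. Forensic Science International: Genetics, Volume 12, 2014, p. 12-23.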


Estimators


Estimators

- Forensic ’conservatism’ (innocent suspect): For whom – what about paternity, immigration, etc.?
- Precise (low prediction error) – how do we measure this (more later)?
- Does it work for all datasets, also for those only consisting of singletons?
- Statistical model: Guaranteed behaviour (e.g. probabilities sum to 1)
- Assign probability to all possible haplotypes (e.g. for mixture LR)
- Probability mass 1 to be distributed among possible haplotypes


Estimators

- Match probability ≈ DNA profile population frequency
- Count method (works for any trait, e.g. blood type)
  - $n$: Dataset size
  - $n_x$: Number of times $x$ is observed in the dataset
  - $P(X = x) = n_x / n$


Estimators

- Include in dataset (new observation)
  - Additional information: Under $H_d$, suspect considered as a random (wrongly accused) individual from the population; the haplotype is just another random sample
  - Old dataset: $D^-$ of size $n$
  - New dataset: $D$ of size $n + 1$
  - $P(X = x) = (n_x + 1)/(n + 1)$
  - $n_x = 0$: $P(X = x) = \frac{1}{n+1}$
  - With $n_x$ now counted in the new dataset $D$: $\sum_{x \in D} \frac{n_x}{n+1} = \frac{1}{n+1} \sum_{x \in D} n_x = \frac{n+1}{n+1} = 1$, hence $P(X = x) = 0$ for $x \notin D$
- Corrected count estimators:
  - Brenner’s κ (CH Brenner (2010) / HE Robbins (1968))
  - Generalised Good (IJ Good (1953), G Cereda/R Gill)
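As an illustration only, the count estimator and the "include in dataset" variant can be sketched in a few lines of Python. The toy haplotypes and the function names are hypothetical and not taken from the talk; exact fractions are used so the results are easy to read.

```python
from collections import Counter
from fractions import Fraction


def count_estimate(dataset, x):
    """Plain count estimator: n_x / n."""
    counts = Counter(dataset)
    return Fraction(counts[x], len(dataset))


def include_in_dataset_estimate(dataset, x):
    """Add the suspect's haplotype x to the old dataset first: (n_x + 1) / (n + 1)."""
    counts = Counter(dataset)
    return Fraction(counts[x] + 1, len(dataset) + 1)


# Hypothetical toy dataset D^- of n = 4 two-locus haplotypes.
old = [(14, 13), (14, 13), (15, 12), (16, 13)]
seen, unseen = (14, 13), (13, 14)

print(count_estimate(old, seen), count_estimate(old, unseen))
print(include_in_dataset_estimate(old, seen), include_in_dataset_estimate(old, unseen))
```

The unseen haplotype gets probability 0 from the plain count estimator and 1/(n + 1) once it is included. In both cases the mass over the observed haplotypes already sums to 1, which leaves nothing for haplotypes outside the dataset.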


The Discrete Laplace method


Motivation

- Haplotype probability distribution (statistical model)
- Enables a wide range of inferences using one model:
  - Haplotype frequency estimation (observed and unobserved)
  - Mixtures (e.g. separation and LR)
  - Cluster analysis
  - ...
- Not a new ad-hoc tool for each task
- A statistical model gives desirable properties:
  - $P(x)$: Probability mass function
  - Consistent: $\sum_{x \in H} P(x) = 1$
  - $P(x) > 0$ for all $x \in H$


Model

- Y-STR: Loci not statistically independent
- Our approach: Condition on [something] to obtain independence between loci


Discrete Laplace distribution

Discrete Laplace distributed $X \sim DL(p, \mu)$:

- Dispersion parameter $0 < p < 1$ and
- Location parameter $\mu \in \mathbb{Z} = \{\ldots, -2, -1, 0, 1, 2, \ldots\}$

Probability mass function:

$$f(X = x; p, \mu) = \frac{1 - p}{1 + p} \cdot p^{|x - \mu|} \quad \text{for } x \in \mathbb{Z}$$

Perfectly homogeneous population with 1-locus haplotypes:

$$P(X = x) = f(X = x; p, \mu)$$

[Figure: f(X = x; p = 0.3, µ = 13) plotted against x (e.g. Y-STR allele) for x = 8 to 18.]
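The probability mass function above is easy to evaluate directly. The following is a minimal Python sketch (the function name `dl_pmf` is an assumption made for illustration, not the API of any package) that tabulates the plotted case p = 0.3, µ = 13 and checks numerically that the mass sums to 1.

```python
def dl_pmf(x: int, p: float, mu: int) -> float:
    """Discrete Laplace pmf: f(x; p, mu) = (1 - p) / (1 + p) * p ** |x - mu|."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return (1.0 - p) / (1.0 + p) * p ** abs(x - mu)


# Parameters of the plotted example: p = 0.3, mu = 13.
for allele in range(8, 19):
    print(allele, round(dl_pmf(allele, 0.3, 13), 4))

# The mass over a wide window around mu should be numerically close to 1.
total = sum(dl_pmf(x, 0.3, 13) for x in range(13 - 200, 13 + 201))
print(total)
```

The distribution is symmetric around µ and decays geometrically with rate p, so a smaller p concentrates more mass on the central allele.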


Statistical model for Y-STR haplotypes

Perfectly homogeneous population with r-locus haplotypes:

$$P(X = (x_1, x_2, \ldots, x_r)) = \prod_{k=1}^{r} f(x_k; p_k, \mu_k)$$

- $\vec{\mu} = (\mu_1, \mu_2, \ldots, \mu_r)$: Central haplotype
- $\vec{p} = (p_1, p_2, \ldots, p_r)$: Discrete Laplace parameters (one for each locus)
- Mutations happen independently across loci (relative to $\vec{\mu}$)


Statistical model for Y-STR haplotypes

Non-homogeneous population with c subpopulations and r-locus haplotypes:

$$P(X = (x_1, x_2, \ldots, x_r)) = \sum_{j=1}^{c} \tau_j \prod_{k=1}^{r} f(x_k; p_{jk}, \mu_{jk})$$

- $\tau_j$: A priori probability for originating from the j’th subpopulation ($\sum_{j=1}^{c} \tau_j = 1$)
- $\vec{\mu}_j = (\mu_{j1}, \mu_{j2}, \ldots, \mu_{jr})$: Central haplotype for the j’th subpopulation
- $\vec{p}_j = (p_{j1}, p_{j2}, \ldots, p_{jr})$: Parameters for all loci at the j’th subpopulation
- Parameter estimation from observations using R library disclapmix
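To make the mixture formula concrete, here is a small Python sketch that evaluates it for given parameters. It does not estimate anything and is not the disclapmix interface; the function names and all parameter values are hypothetical, chosen only for illustration.

```python
from itertools import product
from math import prod


def dl_pmf(x, p, mu):
    """Discrete Laplace pmf: (1 - p) / (1 + p) * p ** |x - mu|."""
    return (1.0 - p) / (1.0 + p) * p ** abs(x - mu)


def mixture_prob(x, tau, p, mu):
    """P(X = x) = sum_j tau_j * prod_k f(x_k; p_jk, mu_jk)."""
    return sum(
        t * prod(dl_pmf(xk, pk, mk) for xk, pk, mk in zip(x, pj, mj))
        for t, pj, mj in zip(tau, p, mu)
    )


# Hypothetical parameters: c = 2 subpopulations, r = 2 loci.
tau = [0.6, 0.4]
p = [[0.2, 0.1], [0.3, 0.25]]
mu = [[14, 13], [16, 11]]

print(mixture_prob((14, 13), tau, p, mu))  # central haplotype of subpopulation 1
print(mixture_prob((15, 12), tau, p, mu))  # a haplotype between the two centres

# Summing over a generous grid of haplotypes should give a total close to 1.
grid_total = sum(mixture_prob(x, tau, p, mu) for x in product(range(-20, 50), repeat=2))
print(grid_total)
```

Every haplotype on the integer grid receives a positive probability, including haplotypes that never occur in a dataset, which is the property the count estimators lack.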


Data and fit

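[Figure: observed allele probabilities at DYS392 (x-axis: allele, 6 to 16; y-axis: probability, 0.0 to 0.5), overlaid in turn with the estimated distribution for c = 1, 2 and 3.]

c: Number of subpopulations

$$P(X = x) = \sum_{j=1}^{c} \tau_j f(x; p_j, \mu_j)$$

- 3 subpopulations: $\hat{\mu}_j$ = 11, 13, 14 with $\hat{\tau}_j$ = 52%, 46%, 2%
- Observed vs expected:

| Allele | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|
| Observed | 0.5248 | 0.0567 | 0.3322 | 0.0714 | 0.0083 |
| Expected | 0.5248 | 0.0567 | 0.3315 | 0.0715 | 0.0089 |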


Estimated (c = 1):

$$P(\text{DYS392} = x) = 1 \cdot f(x; p = 0.41, \mu = 11)$$



Estimated (c = 2):

$$P(\text{DYS392} = x) = 0.519 \cdot f(x; p = 0.004, \mu = 11) + 0.481 \cdot f(x; p = 0.179, \mu = 13)$$

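The c = 1 and c = 2 fits quoted for DYS392 can be evaluated with a short Python sketch and set beside the observed frequencies. The parameter values are the ones on the slides; the helper names are assumptions for illustration, and the c = 3 parameters are not given in full, so that fit is not reproduced here.

```python
def dl_pmf(x, p, mu):
    """Discrete Laplace pmf: (1 - p) / (1 + p) * p ** |x - mu|."""
    return (1.0 - p) / (1.0 + p) * p ** abs(x - mu)


def fitted_prob(x, components):
    """One-locus mixture: sum_j tau_j * f(x; p_j, mu_j)."""
    return sum(tau * dl_pmf(x, p, mu) for tau, p, mu in components)


# (tau_j, p_j, mu_j) as quoted for the two fits.
fit_c1 = [(1.0, 0.41, 11)]
fit_c2 = [(0.519, 0.004, 11), (0.481, 0.179, 13)]

observed = {11: 0.5248, 12: 0.0567, 13: 0.3322, 14: 0.0714, 15: 0.0083}
for allele, obs in observed.items():
    print(allele, obs, round(fitted_prob(allele, fit_c1), 4), round(fitted_prob(allele, fit_c2), 4))
```

A single component centred at allele 11 cannot reproduce the second peak at allele 13, which is why the fit improves as components are added.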



Match probability


Simulation study

Simulation study:

- Simulate populations (each 7 loci and 20 million individuals)
- Draw random datasets
- Estimate haplotype frequencies of all singletons and compare with the true values
- Result: Smaller prediction error than those with count estimator and Brenner’s κ method


Estimate match probability

Real data (Y23 dataset)

Singleton proportions:

| Population | Size, n | 5 loci | 7 loci |
|---|---|---|---|
| World | 18,925 | 0.026 | 0.108 |
| Europe | 11,664 | 0.029 | 0.101 |

- Dataset sizes: 200 and 500
- Sampling cases with singleton haplotype:
  1. Draw dataset, D, from population
  2. Draw an extra observation, h
  3. If h ∈ D, skip and go to next sample
  4. If h ∉ D: Estimate frequency and compare to $n_h/n$ (’true’)
- 100 cases for each dataset size, population and locus count
- Compare to Brenner’s κ method and Generalised Good
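The four sampling steps can be sketched as a loop. This Python outline rests on several assumptions that the slides do not state: the dataset is drawn without replacement, the toy population is synthetic, and the estimator is a placeholder (1/(n + 1)) standing in for the discrete Laplace, Brenner or Good estimators. All names are hypothetical.

```python
import random
from collections import Counter


def sample_singleton_cases(population, dataset_size, n_cases, estimator, seed=0):
    """Collect cases where the extra observation h is not in the drawn dataset."""
    rng = random.Random(seed)
    true_counts = Counter(population)
    cases = []
    while len(cases) < n_cases:
        dataset = rng.sample(population, dataset_size)  # 1. draw dataset D
        h = rng.choice(population)                      # 2. draw an extra observation h
        if h in dataset:                                # 3. h already observed: skip
            continue
        estimate = estimator(dataset, h)                # 4. estimate, keep 'true' n_h / n
        cases.append((h, estimate, true_counts[h] / len(population)))
    return cases


# Synthetic toy population of two-locus haplotypes (not real Y-STR data).
population = [(i % 60, (7 * i) % 11) for i in range(3000)]


def placeholder_estimator(dataset, h):
    """Stand-in estimator: 1 / (n + 1) for a haplotype absent from the dataset."""
    return 1 / (len(dataset) + 1)


cases = sample_singleton_cases(population, 200, 100, placeholder_estimator)
print(cases[:3])
```

Each case pairs an estimate with the population frequency treated as the truth, which is the input needed for the prediction-error summaries.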


Estimate match probability

[Figure: LR (log scale, 10^2 to 10^8) by estimator (Brenner, Good, Disclap), in two panels: 5 loci and 7 loci.]


Prediction error

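| | Probability | LR | LR inflation |
|---|---|---|---|
| Case 1 | $p_1 = 0.01$, $\hat{p}_1 = 0.00995$ | $LR = 100.0$, $\widehat{LR} = 100.5$ | $\widehat{LR}/LR = 1.005$ |
| Case 2 | $p_2 = 0.0001$, $\hat{p}_2 = 0.00015$ | $LR = 10{,}000$, $\widehat{LR} = 6{,}667$ | $\widehat{LR}/LR = 0.667$ |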


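| Error type | $\hat{p}_i - p_i$ | $\frac{\hat{p}_i - p_i}{p_i}$ | $(\hat{p}_i - p_i)^2$ | $\frac{(\hat{p}_i - p_i)^2}{p_i}$ | $\log_{10}\left(\frac{\hat{p}_i}{p_i}\right)$ |
|---|---|---|---|---|---|
| Case 1 | −0.00005 | −0.005 | 2.5 · 10^−9 | 2.5 · 10^−7 | −0.002 |
| Case 2 | 0.00005 | 0.5 | 2.5 · 10^−9 | 2.5 · 10^−5 | 0.176 |

Taking summary (sum, mean, median, ...): What is 0?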
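The five error measures in the table can be recomputed from the two cases with a few lines of Python. This is only a sketch to make the arithmetic explicit; the function and key names are illustrative.

```python
from math import log10


def error_measures(p_hat, p):
    """Return the five error types for an estimate p_hat of the true value p."""
    diff = p_hat - p
    return {
        "difference": diff,
        "relative": diff / p,
        "squared": diff ** 2,
        "squared_relative": diff ** 2 / p,
        "log10_ratio": log10(p_hat / p),
    }


# (estimate, true value) for the two cases in the table.
cases = {"Case 1": (0.00995, 0.01), "Case 2": (0.00015, 0.0001)}
for name, (p_hat, p) in cases.items():
    # LR inflation is (1 / p_hat) / (1 / p) = p / p_hat.
    print(name, error_measures(p_hat, p), "LR inflation:", p / p_hat)
```

The absolute and squared differences are identical in size for the two cases, while the relative, squared-relative and log-ratio measures separate them, which is why the choice of error type matters when summarising.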


Estimate match probability

[Figure: summaries (mean, median, 99th quantile, max) of $(\hat{p}_h - p_h)^2 / p_h$ by estimator (Brenner, Good, Disclap), in two panels: 5 loci and 7 loci.]


YHRD

- The discrete Laplace method and Brenner’s κ are implemented in the upcoming version of http://www.yhrd.org
- The discrete Laplace method helped find haplotypes with wrong metapopulation assignments


Mixture separation


Mixture separation

Yfiler trace, 15 loci (DYS385a/b removed):

| Locus | Alleles |
|---|---|
| DYS19 | 14, 15 |
| DYS389I | 13, 14 |
| DYS389II’ | 16, 17 |
| DYS390 | 24, 26 |
| DYS391 | 10, 11 |
| DYS392 | 11, 13 |
| DYS393 | 13 |
| DYS438 | 11, 12 |
| DYS439 | 10, 11 |
| DYS437 | 14, 15 |
| DYS448 | 19, 20 |
| DYS456 | 15, 16 |
| DYS458 | 14, 18 |
| DYS635 | 23 |
| Y GATA H4 | 12, 13 |

$2^{13-1} = 4{,}096$ possible contributor pairs
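The count of 4,096 follows because 13 of the 15 loci show two alleles, and swapping the two contributors gives the same pair. A Python sketch of the enumeration is given below; it assumes exactly two contributors and no drop-out or drop-in, and the function name is hypothetical.

```python
from itertools import product

# The trace from the table: locus -> observed alleles.
trace = {
    "DYS19": (14, 15), "DYS389I": (13, 14), "DYS389II'": (16, 17),
    "DYS390": (24, 26), "DYS391": (10, 11), "DYS392": (11, 13),
    "DYS393": (13,), "DYS438": (11, 12), "DYS439": (10, 11),
    "DYS437": (14, 15), "DYS448": (19, 20), "DYS456": (15, 16),
    "DYS458": (14, 18), "DYS635": (23,), "Y GATA H4": (12, 13),
}


def contributor_pairs(trace):
    """Enumerate unordered pairs of haplotypes that together explain the trace."""
    loci = list(trace)
    split = [locus for locus in loci if len(trace[locus]) == 2]
    pairs = []
    # Fix the first two-allele locus so each unordered pair is produced once.
    for choice in product((0, 1), repeat=len(split) - 1):
        pick = dict(zip(split, (0,) + choice))
        h1 = tuple(trace[l][pick[l]] if l in pick else trace[l][0] for l in loci)
        h2 = tuple(trace[l][1 - pick[l]] if l in pick else trace[l][0] for l in loci)
        pairs.append((h1, h2))
    return pairs


pairs = contributor_pairs(trace)
print(len(pairs))
```

Each pair is complementary: at a two-allele locus the contributors carry different alleles, and at a one-allele locus they share it.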


Mixture separation

| | Danish: DEN (21) | Danish: DEN (15) | Danish: DEN (10) | Somali: SOM (10) | German: GER (7) |
|---|---|---|---|---|---|
| Loci | 21 | 15 | 10 | 10 | 7 |
| n | 181 | 181 | 181 | 201 | 3,443 |
| Singletons | 181 (100%) | 164 (90.6%) | 112 (61.9%) | 56 (27.9%) | 662 (19.2%) |

- For each dataset, 550 mixtures were simulated
- i’th contributor pair $c_i = \{h_{i,1}, h_{i,2}\}$, find $\hat{p}_i = \hat{P}(h_{i,1}) \hat{P}(h_{i,2})$
- Order all pairs according to the $\hat{p}_i$ values (highest to lowest)
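The ranking step can be sketched as follows in Python. The haplotype probabilities here are made-up toy numbers standing in for estimates from a fitted model, and the function names are hypothetical.

```python
def rank_pairs(pairs, prob):
    """Score each pair by P(h1) * P(h2) and sort from highest to lowest."""
    scored = [(prob(h1) * prob(h2), (h1, h2)) for h1, h2 in pairs]
    return sorted(scored, key=lambda item: item[0], reverse=True)


def rank_of(ranked, true_pair):
    """1-based position of the true contributor pair, ignoring contributor order."""
    target = frozenset(true_pair)
    for position, (_, pair) in enumerate(ranked, start=1):
        if frozenset(pair) == target:
            return position
    return None


# Toy two-locus trace with alleles {14, 15} and {10, 11}: two candidate pairs.
toy_prob = {(14, 10): 0.10, (15, 11): 0.20, (14, 11): 0.05, (15, 10): 0.01}
candidate_pairs = [((14, 11), (15, 10)), ((14, 10), (15, 11))]

ranked = rank_pairs(candidate_pairs, toy_prob.get)
print(ranked)
print(rank_of(ranked, ((15, 11), (14, 10))))
```

Repeating this over simulated mixtures and recording the rank of the true pair gives the "Rank ≤ x" proportions used to evaluate the separation.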


Mixture separation

| Probability | DEN (21) | DEN (15) | DEN (10) | SOM (10) | GER (7) |
|---|---|---|---|---|---|
| Rank ≤ 1 | 13% | 26% | 45% | 72% | 53% |
| Rank ≤ 5 | 33% | 55% | 84% | 94% | 89% |
| Rank ≤ 10 | 42% | 69% | 93% | 98% | 97% |
| Random ≤ 10 | 0.03% | 0.78% | 12.15% | 26.79% | 53.93% |

[Figure: P(True rank ≤ x) for x = 1 to 10, one panel per dataset (DEN (21), DEN (15), DEN (10), SOM (10), GER (7)); rankings compared: Discrete Laplace and Random.]


Mixture LR

$H_p$: $S + U$

$H_d$: $U_1 + U_2$

$$LR = \frac{P(H_U)}{\sum_{(H_{U_1}, H_{U_2})} P(H_{U_1}) P(H_{U_2})}$$

[Figure: log10(LR) plotted against log10(1/P(Hs)), one panel per dataset (DEN (21), DEN (15), DEN (10), SOM (10), GER (7)); points grouped by predicted rank: 1, 2−5, 6−10, 11−50, 51−100, >= 101.]
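The mixture LR formula can be illustrated with a toy Python calculation. The sketch assumes that the sum in the denominator runs over the candidate contributor pairs supplied by the caller (the transcript does not say whether pairs are ordered or unordered), and the haplotype probabilities are invented for illustration rather than taken from any fitted model.

```python
def mixture_lr(h_unknown, candidate_pairs, prob):
    """LR = P(H_U) / sum over candidate pairs of P(H_U1) * P(H_U2)."""
    numerator = prob(h_unknown)
    denominator = sum(prob(h1) * prob(h2) for h1, h2 in candidate_pairs)
    return numerator / denominator


# Toy two-locus trace with alleles {14, 15} and {10, 11}.
toy_prob = {(14, 10): 0.10, (15, 11): 0.20, (14, 11): 0.05, (15, 10): 0.01}
candidate_pairs = [((14, 10), (15, 11)), ((14, 11), (15, 10))]

# Suspect S = (14, 10); under Hp the unknown contributor U must be (15, 11).
lr = mixture_lr((15, 11), candidate_pairs, toy_prob.get)
print(lr)
```

Because the model assigns a probability to every haplotype, the denominator can be evaluated even when most candidate haplotypes were never observed in the reference dataset.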


Cluster analysis


Cluster analysis

- $\tau_j = P(\text{From subpopulation } j)$
- Haplotype frequency by summing the contributions from each subpopulation:

$$P(\text{Haplotype} = x) = \sum_{j=1}^{c} \tau_j \cdot \underbrace{P(\text{Haplotype} = x \mid \text{From subpopulation } j)}_{\text{Discrete Laplace model}}$$

- Bayes’ theorem:

$$P(\text{From subpopulation } j \mid \text{Haplotype} = x) = \frac{\tau_j \cdot P(\text{Haplotype} = x \mid \text{From subpopulation } j)}{P(\text{Haplotype} = x)}$$
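Bayes' theorem turns the fitted mixture into soft cluster assignments. A minimal Python sketch with hypothetical parameters (two subpopulations, two loci; none of the numbers come from the talk) is given below.

```python
from math import prod


def dl_pmf(x, p, mu):
    """Discrete Laplace pmf: (1 - p) / (1 + p) * p ** |x - mu|."""
    return (1.0 - p) / (1.0 + p) * p ** abs(x - mu)


def posterior_membership(x, tau, p, mu):
    """P(subpopulation j | haplotype x) for every j, via Bayes' theorem."""
    joint = [
        t * prod(dl_pmf(xk, pk, mk) for xk, pk, mk in zip(x, pj, mj))
        for t, pj, mj in zip(tau, p, mu)
    ]
    total = sum(joint)  # this is P(Haplotype = x)
    return [value / total for value in joint]


# Hypothetical parameters: c = 2 subpopulations, r = 2 loci.
tau = [0.6, 0.4]
p = [[0.2, 0.1], [0.3, 0.25]]
mu = [[14, 13], [16, 11]]

posterior = posterior_membership((14, 13), tau, p, mu)
print(posterior)
```

Assigning each haplotype to the subpopulation with the highest posterior probability yields a hard clustering, while the probabilities themselves show how ambiguous each assignment is.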


Cluster analysis of European data (7 loci)

14,13,29,25,11,13,13 (R1b1b2a2g)

14,13,29,25,10,13,13 (R1b1b2a1)

14,13,29,24,10,13,13 (R1b1b2a2c)

14,13,30,24,10,13,13 (R1b1b2a2g)

14,13,29,23,10,13,13 (R1b1b2a2c)

14,13,30,23,11,13,12 (R1b1b)

14,13,29,23,11,13,13 (R1b1b2a1)

15,13,29,24,11,13,13 (R1b1b2a2g)

14,13,30,24,11,13,13 (R1b1b2a2c)

14,13,29,24,11,13,13 (J1a)

14,14,30,24,11,13,13 (R1b1b2a2c)

14,14,30,24,11,14,14 (N1c)

14,14,30,23,11,14,14 (N1c1)

15,13,29,23,10,14,14 (N1c)

15,14,31,23,10,12,14 (I2b)

15,13,30,23,10,12,14 (I2b)

15,12,29,22,10,11,14 (G2a3)

15,12,29,22,10,11,13 (G2a3b)

15,12,28,22,10,11,13 (G2a3)

14,12,29,22,10,11,13 (I1)

14,12,28,22,10,11,13 (I1)

14,12,28,23,10,11,13 (I1)

15,12,28,24,10,11,12 (J2b2)

15,13,29,23,10,11,12 (J1)

14,13,30,23,10,11,12 (J1e)

14,13,29,23,10,11,12 (J2a8)

13,14,30,24, 9,11,13 (E1b1b1b)

13,13,30,24,10,11,13 (E1b1b1a2)

13,13,31,24,10,11,13 (E1b1b1a)

14,13,29,24,11,11,13 (R1b1b2a2g)

16,13,31,24,11,11,13 (I2a)

16,13,31,24,10,11,13 (I2a)

16,13,29,24,10,11,13 (R1a1a)

16,13,29,25,10,11,13 (R1a1a7)

16,13,30,25,10,11,13 (R1a1a)

15,13,30,25,10,11,13 (D2)

15,13,30,25,11,11,13 (R1a)

16,13,30,25,11,11,13 (R1a)

17,13,30,25,11,11,13 (R1a)

17,13,30,25,10,11,13 (R1a1a7)

First analysed in ’Signature of recent historical events in the European Y-chromosomal STR haplotype distribution’ by Roewer et al. in 2005


Cluster analysis of Y23 (21 loci, from Purps J, Siegert S, et al. (2014))

029: 14,13,16,24,10,13,13,12,12,15,19,16,11,23,12,18,22,13,12,17,11

043: 14,13,17,24,11,12,13,11,11,14,19,16,16,23,13,19,22,12,12,17,10

041: 14,13,17,23,11,13,13,12,12,15,19,17,16,23,11,18,22,12,12,17,08

011: 14,12,15,24,11,13,13,12,12,15,21,16,16,23,11,19,22,12,14,17,10

016: 14,12,16,23,11,13,13,12,11,15,19,15,18,23,12,17,21,12,12,19,10

040: 14,13,17,23,11,13,13,12,11,15,19,16,20,23,11,17,21,12,12,17,10

025: 14,13,16,23,11,13,13,12,13,15,18,19,17,23,11,17,22,13,12,17,10

048: 14,14,16,23,11,13,13,12,12,15,18,16,17,25,12,18,22,13,12,17,09

052: 14,14,16,24,11,13,13,12,11,15,18,16,17,23,12,18,22,13,13,20,10

024: 14,13,16,24,11,13,13,12,12,14,18,16,17,23,11,18,22,12,12,17,10

044: 14,13,16,24,11,13,13,12,12,14,18,16,17,23,11,18,22,13,12,17,10

071: 14,13,16,24,11,13,12,12,12,14,18,16,18,23,11,18,22,14,12,17,10

042: 14,13,16,23,11,13,13,12,12,15,19,17,17,23,11,17,22,13,12,17,10

026: 14,13,16,23,11,13,13,12,12,15,19,15,18,23,12,17,22,13,12,17,10

037: 14,13,16,24,11,13,14,12,11,15,19,15,19,23,11,17,22,12,12,17,10

045: 14,13,17,24,11,13,12,12,12,15,19,15,16,23,12,18,22,12,12,18,10

035: 14,13,16,24,11,13,13,12,12,15,19,15,17,24,12,18,22,12,13,17,10

019: 14,13,16,24,11,13,13,12,12,15,19,15,17,24,12,18,22,13,12,16,11

034: 14,13,16,24,11,13,13,12,12,15,19,15,16,23,12,19,22,13,12,17,10

036: 14,13,16,24,11,13,13,12,12,15,19,16,17,23,12,18,22,13,12,17,10

033: 14,13,16,24,11,13,13,12,12,15,19,15,18,23,12,18,23,13,12,17,10

021: 14,13,16,24,10,13,13,12,12,15,19,15,17,23,12,18,22,12,12,17,10

023: 14,13,16,24,10,13,13,12,12,15,20,15,17,24,11,18,22,13,12,18,10

060: 14,12,16,24,10,13,13,12,13,15,19,16,16,23,12,17,22,13,12,16,10

032: 14,13,16,24,11,13,13,11,11,15,19,16,17,23,12,18,22,13,12,17,10

028: 14,13,16,24,10,13,13,11,11,15,19,16,17,23,12,18,22,12,12,17,10

050: 14,14,16,24,10,13,13,12,11,15,19,16,16,23,12,17,22,13,13,18,10

049: 14,14,16,24,10,11,12,09,12,15,20,15,14,21,11,15,22,14,13,16,10

074: 15,13,17,23,10,13,13,09,11,14,19,15,17,21,11,16,23,13,12,17,10

046: 15,14,16,23,10,13,14,10,11,14,18,15,16,21,12,17,24,13,11,18,11

001: 14,13,16,23,10,15,13,10,11,14,19,15,17,21,12,18,23,12,12,17,11

009: 13,14,17,24,10,14,13,11,11,14,19,15,16,22,12,19,24,12,12,18,10

004: 13,13,17,24,10,14,13,11,12,14,20,15,16,22,11,18,25,12,12,18,10

002: 13,13,17,23,10,14,13,11,12,14,19,15,17,22,12,18,24,13,11,17,10

092: 16,12,16,25,10,13,12,10,12,14,19,14,18,22,12,18,25,13,11,18,10

061: 15,12,16,24,10,13,13,10,12,15,19,15,16,22,12,18,24,11,12,15,11

082: 15,13,16,25,11,13,14,10,12,14,18,15,17,22,11,19,23,12,10,17,12

065: 15,13,17,24,11,13,14,10,11,14,18,15,17,21,10,18,23,12,10,16,12

063: 15,12,16,23,10,14,13,10,12,14,18,15,16,21,12,17,24,12,11,19,11

058: 15,12,16,23,10,14,13,10,12,14,18,17,15,21,12,16,24,12,11,19,11

059: 15,12,17,23,11,14,13,10,11,14,18,17,15,19,12,18,25,12,11,18,11

091: 16,12,16,23,10,14,13,10,11,14,16,17,18,21,12,16,27,13,11,18,11

066: 16,12,18,24,11,13,13,10,11,14,21,14,18,21,12,17,25,12,11,19,11

070: 15,13,16,23,11,14,14,10,10,14,19,14,17,22,12,16,20,11,11,19,11

051: 14,14,16,23,10,14,14,10,10,14,19,14,17,22,12,17,20,11,11,19,11

053: 14,14,16,24,11,14,14,10,10,14,19,14,17,21,12,18,20,11,11,19,11

022: 15,14,16,22,10,13,13,13,12,14,18,15,18,20,12,19,21,12,11,20,12

031: 16,14,15,23,10,13,13,13,12,14,18,15,17,21,11,17,23,12,11,18,13

057: 14,13,16,23,10,14,13,10,12,15,18,15,15,20,12,17,24,12,11,20,11

006: 13,14,16,24,10,15,14,11,11,15,18,15,14,22,10,18,24,14,12,19,12

010: 13,14,17,24,10,14,14,10,13,15,21,16,17,22,11,20,23,12,10,18,10

030: 14,12,16,24,10,14,12,11,12,15,20,15,18,20,12,18,23,12,11,18,11

055: 15,12,16,23,10,12,12,10,12,15,19,15,18,20,12,19,22,12,11,17,11

093: 15,12,16,24,10,13,12,10,12,15,20,15,17,21,12,18,24,13,11,18,11

027: 14,12,16,22,10,14,11,10,12,15,19,15,15,23,12,17,23,13,12,15,10

064: 15,12,16,24,10,11,12,09,12,16,19,13,16,21,11,16,23,13,12,17,09

089: 15,12,16,24,10,11,12,09,12,16,19,13,17,21,11,17,23,13,12,17,09

067: 15,14,16,22,10,11,12,09,11,14,19,15,18,20,12,18,23,13,12,16,09

039: 15,13,16,23,10,11,12,09,11,14,20,15,17,21,11,17,23,13,11,18,09

020: 14,13,16,23,10,11,12,09,11,15,20,15,16,22,11,17,23,12,12,17,10

054: 14,14,17,23,10,11,12,09,11,15,20,15,16,21,11,17,22,12,11,17,10

038: 14,13,17,22,10,11,12,09,11,14,21,15,15,21,11,17,23,12,11,17,08

068: 15,13,16,23,09,11,12,09,12,14,21,16,14,22,12,16,22,12,12,18,10

007: 13,14,16,24,09,11,13,10,10,14,20,16,18,21,12,18,27,11,11,22,12

077: 16,13,18,24,11,11,13,10,13,15,20,15,17,23,11,18,30,11,13,18,10

076:

15,

13,1

8,25

,11,

11,1

3,11

,10,

14,2

0,15

,15,

23,1

2,18

,23,

12,1

2,19

,10

084:

16,

13,1

7,25

,11,

11,1

3,11

,10,

14,2

0,15

,15,

23,1

3,18

,23,

12,1

2,19

,10

088:

16,

13,1

7,25

,11,

11,1

3,11

,11,

14,2

0,17

,15,

23,1

3,17

,23,

12,1

2,19

,10

087:

16,

13,1

7,25

,10,

11,1

3,11

,10,

14,2

0,16

,16,

23,1

2,18

,23,

12,1

2,18

,10

090:

16,

13,1

7,25

,10,

11,1

3,11

,11,

14,2

0,16

,16,

23,1

2,18

,25,

12,1

2,19

,10

072:

14,

12,1

6,25

,10,

11,1

3,11

,11,

14,1

9,15

,18,

24,1

1,15

,25,

13,1

2,18

,11

047:

14,

14,1

6,23

,10,

10,1

4,11

,11,

16,1

9,15

,17,

24,1

2,17

,25,

12,1

1,18

,10

056:

15,

12,1

7,22

,10,

11,1

4,10

,12,

16,2

1,15

,17,

21,1

2,17

,21,

12,0

9,18

,11

018:

15,

12,1

7,22

,10,

11,1

4,10

,11,

16,2

1,15

,16,

20,1

2,15

,21,

12,0

9,18

,11

080:

15,

12,1

7,21

,10,

11,1

4,10

,11,

16,2

2,15

,16,

21,1

1,15

,21,

12,1

0,18

,12

012:

14,

12,1

6,22

,10,

11,1

3,10

,11,

16,2

0,14

,15,

22,1

1,17

,25,

13,1

1,19

,12

013:

14,

12,1

6,22

,10,

11,1

3,10

,11,

16,2

0,14

,15,

22,1

1,16

,26,

12,1

1,19

,12

015:

14,

12,1

6,23

,10,

11,1

3,10

,11,

16,2

0,14

,15,

21,1

1,16

,25,

12,1

1,20

,12

083:

16,

12,1

6,25

,11,

11,1

3,10

,11,

15,2

1,14

,16,

21,1

0,16

,23,

12,1

1,18

,13

017:

17,

13,1

5,23

,10,

11,1

3,10

,12,

15,2

1,14

,17,

22,1

2,17

,22,

10,1

2,19

,12

003:

13,

13,1

7,24

,10,

11,1

3,10

,12,

14,2

0,16

,16,

22,1

2,17

,22,

12,1

2,20

,12

008:

13,

13,1

7,24

,10,

11,1

3,10

,12,

14,2

0,16

,15,

22,1

2,18

,22,

12,1

2,19

,12

005:

13,

13,1

7,24

,10,

11,1

3,10

,12,

14,2

0,15

,17,

21,1

1,18

,25,

12,1

1,19

,12

098:

16,

14,1

7,25

,10,

11,1

3,10

,12,

14,1

8,15

,15,

21,1

1,18

,26,

12,1

2,17

,12

078:

15,

13,1

6,23

,10,

11,1

4,10

,12,

14,2

1,15

,16,

21,1

1,18

,24,

12,1

2,16

,09

014:

15,

12,1

7,23

,10,

11,1

3,10

,11,

16,2

0,15

,17,

21,1

1,17

,25,

12,1

1,17

,12

069:

15,

13,1

6,23

,10,

12,1

4,10

,11,

15,2

0,14

,16,

21,1

1,18

,25,

12,1

2,19

,12

073:

15,

14,1

8,23

,10,

12,1

5,10

,11,

14,2

0,14

,15,

21,1

0,17

,27,

12,1

2,19

,12

094:

15,

14,1

7,23

,10,

12,1

4,10

,11,

14,2

0,15

,15,

21,1

1,17

,27,

12,1

2,18

,12

095:

15,

13,1

7,21

,10,

11,1

4,11

,12,

14,2

0,15

,17,

22,1

1,16

,25,

11,1

2,19

,13

086:

16,

13,1

7,21

,10,

11,1

5,11

,12,

14,2

1,15

,16,

21,1

1,17

,25,

11,1

1,18

,13

099:

16,

13,1

8,21

,10,

11,1

5,11

,12,

14,2

1,15

,16,

20,1

1,17

,24,

11,1

1,18

,13

100:

17,

13,1

7,21

,10,

11,1

4,11

,12,

14,2

1,16

,16,

21,1

1,16

,25,

11,1

1,17

,13

096:

17,

13,1

7,21

,10,

11,1

4,11

,12,

14,2

1,17

,16,

21,1

1,18

,25,

11,1

1,17

,13

085:

16,

13,1

7,21

,10,

11,1

3,11

,12,

13,2

1,16

,16,

21,1

1,17

,25,

12,1

1,17

,13

097:

16,

14,1

7,22

,10,

11,1

3,11

,12,

14,2

1,15

,16,

21,1

0,16

,27,

12,1

1,14

,12

079:

15,

13,1

8,21

,10,

11,1

3,11

,12,

14,2

1,15

,17,

21,1

2,15

,26,

12,1

1,19

,15

075:

15,

13,1

8,21

,10,

11,1

3,11

,12,

14,2

1,15

,17,

21,1

2,15

,28,

11,1

1,19

,13

081:

15,

14,1

8,21

,10,

11,1

3,11

,12,

14,2

1,15

,15,

22,1

3,16

,28,

11,1

2,19

,13

062:

15,

12,1

7,23

,10,

10,1

4,09

,12,

16,2

1,15

,19,

21,1

2,20

,24,

12,1

0,18

,11


Pairwise population distances

Pairwise population distances:

- 7-locus, 12,727 European males (91 locations): Correlation(AMOVA, discrete Laplace) = 0.90
- 10-locus, 2,736 African males (26 locations): Correlation(AMOVA, discrete Laplace) = 0.82
- 21-locus (Y23), 18,925 males (129 locations): Correlation(AMOVA, discrete Laplace) = 0.78
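The slide does not say how the correlation between the two sets of pairwise distances was computed. The following is a minimal sketch, assuming a plain Pearson correlation over the upper triangle of two symmetric distance matrices. The function names and the toy matrices are hypothetical and are not taken from the talk.

```python
import math


def upper_triangle(matrix):
    """Return the entries above the diagonal of a square matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def matrix_correlation(dist_a, dist_b):
    """Correlate two symmetric distance matrices via their upper triangles."""
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))


# Made-up distances between three hypothetical sampling locations.
amova_like = [[0.00, 0.10, 0.30],
              [0.10, 0.00, 0.25],
              [0.30, 0.25, 0.00]]
laplace_like = [[0.00, 0.20, 0.50],
                [0.20, 0.00, 0.45],
                [0.50, 0.45, 0.00]]

r = matrix_correlation(amova_like, laplace_like)
print(round(r, 3))
```

Because the entries of a distance matrix are not independent, a permutation-based test (such as a Mantel test) would be needed to attach a significance level to such a correlation; the sketch only reproduces the descriptive number.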


Concluding remarks


The discrete Laplace method

- Sound statistical properties
- Applications
  - Estimation of Y-STR haplotype population frequencies
  - Mixture analysis
  - Cluster analysis
- Computationally feasible
- Open source software: R libraries disclap and disclapmix (and fwsim for simulating populations)
- Criticism
  - Intermediate alleles (e.g. 10.2)
  - Duplications (e.g. DYS385a/b)
  - Copy number variation (e.g. Yfiler Plus)
  - µ̂'s difficult to estimate (curse of dimensionality)
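To make the frequency-estimation application and the first criticism concrete, here is a minimal sketch of the discrete Laplace probability mass function, P(D = d) = (1 - p)/(1 + p) · p^|d| for integer d, and of a haplotype probability under a mixture of clusters with independent loci. It assumes the usual mixture form of the model; the function names and all parameter values are made up for illustration, and this is not the interface of the disclap or disclapmix R libraries.

```python
def discrete_laplace_pmf(d, p):
    """P(D = d) = (1 - p) / (1 + p) * p**|d| for integer d and 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if d != int(d):
        # The distribution lives on the integers, so an intermediate
        # allele such as 10.2 has no probability under this form.
        raise ValueError("d must be an integer allele difference")
    return (1.0 - p) / (1.0 + p) * p ** abs(d)


def haplotype_probability(haplotype, clusters):
    """Mixture probability: sum over clusters of weight * product over loci.

    Each cluster is (weight, central_haplotype, per_locus_p).
    """
    total = 0.0
    for weight, center, dispersions in clusters:
        term = weight
        for allele, central_allele, p in zip(haplotype, center, dispersions):
            term *= discrete_laplace_pmf(allele - central_allele, p)
        total += term
    return total


# Hypothetical two-cluster model over three loci (values are made up).
clusters = [
    (0.6, (14, 13, 29), (0.20, 0.10, 0.30)),
    (0.4, (15, 12, 30), (0.25, 0.15, 0.20)),
]

print(haplotype_probability((14, 13, 29), clusters))
```

The integer check is a deliberate design choice in this sketch: it shows why intermediate alleles need separate handling before a profile can be scored.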


Conclusion

- Match probability is of great interest and is difficult
- Validation of methods
  - Open source software (e.g. R library, C++ program): Compare to your own method
  - Availability of real data (PPY23)
    - Purps J, Siegert S, et al. (2014). A global analysis of Y-chromosomal haplotype diversity for 23 STR loci. Forensic Science International: Genetics, Volume 12, 2014, p. 12-23. http://dx.doi.org/10.1016/j.fsigen.2014.04.008
    - R-object: http://people.math.aau.dk/~mikl/?p=y23
  - Battery of simulated populations
  - Measure of prediction error
- For a matching profile (e.g. Y23 or Yfiler Plus), use only a subset (e.g. 7 or 10 loci) for LR calculations?
  - Easier to validate
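The subset idea in the last bullet can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the LR is taken as 1 divided by the estimated match probability, the match probability comes from a simple placeholder count estimator (n + 1)/(N + 1) rather than from the discrete Laplace method, and the choice of loci, the profile, and the database are all made up.

```python
# Hypothetical subset of loci used for the LR calculation.
SUBSET = ("DYS19", "DYS390", "DYS391")


def restrict(profile, loci):
    """Project a profile (locus -> allele) onto the chosen loci."""
    return tuple(profile[locus] for locus in loci)


def count_estimate(haplotype, database):
    """Placeholder frequency estimator: (matches + 1) / (database size + 1)."""
    matches = sum(1 for entry in database if entry == haplotype)
    return (matches + 1) / (len(database) + 1)


def likelihood_ratio(match_probability):
    """LR under the assumption LR = 1 / match probability."""
    return 1.0 / match_probability


# Made-up matching profile typed on more loci than the subset.
profile = {"DYS19": 14, "DYS389I": 13, "DYS390": 24, "DYS391": 10}

# Made-up reference database, already restricted to SUBSET.
database = [(14, 24, 10), (15, 23, 10), (14, 23, 11), (16, 25, 11)]

reduced = restrict(profile, SUBSET)
lr = likelihood_ratio(count_estimate(reduced, database))
print(reduced, lr)
```

Any other frequency estimator can be swapped in for `count_estimate` without changing the rest of the sketch, which is what makes a comparison of methods on the same reduced profiles straightforward.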


Thank you for your attention