
Making decisions with biometric systems: the usefulness of a Bayesian perspective

A. Nautsch*, D. Ramos Castro†, J. González Rodríguez†, Christian Rathgeb*, Christoph Busch*

*Hochschule Darmstadt, CRISP, CASED, da/sec Security Research Group
†Universidad Autónoma de Madrid, ATVS Biometric Recognition Group

NIST IBPC'16, Gaithersburg, 03.05.2016

Outline

1. Decision Frameworks in Biometrics and Forensics

2. Bayesian Method: making good decisions

3. Metrics, operating points and examples

4. Conclusion


Decision Frameworks

Biometric Systems in ISO/IEC JTC1 SC37 SD11

[Figure: the general biometric system diagram from ISO/IEC JTC1 SC37 SD11]

⇒ Note: separate decision subsystem


Decision Frameworks

Making Decisions with Biometric Systems

- Access control: accept/reject decision
- Forensic investigation: decide the k-best list to investigate, e.g., in AFIS
- Intelligence: decide where to establish relevant links in a database
- Forensic evaluation: communicate findings for the court to decide a verdict

Decisions are involved in most applications of biometric systems.

Decision Frameworks

Making Decisions with Biometric Systems

- The decision maker faces multiple sources of information; the biometric system is one of them, but also:
  - prior knowledge about users/impostors/suspects
  - other evidence, e.g., from other biometric systems
  - ...
- Decisions must consider all of that information
- Formalizing the decision framework helps, especially in complex problems
  - Example: medical diagnosis support

Bayesian Method

Bayesian Decisions with Biometric Systems

- A proposal: Bayesian decision theory
  - Decisions are made based on posterior probabilities
  - Considering all the relevant information available
  - Updating strategy: likelihood ratios (LR)

Example: biometric systems in the forensic evaluation of the evidence

- Prior probability: all information prior to the (forensic) evidence
- Weight of the evidence: the likelihood ratio (LR)
- Posterior probability: all information, including the (forensic) evidence

[1] I. Evett: Towards a Uniform Framework for Reporting Opinions in Forensic Science Casework, Science and Justice, 1998.

Bayesian Method

Value of Evidence: Likelihood Ratio (LR)

- Two-class (H1, H2) decision framework
- Likelihood ratio: the probabilistic value of the evidence; also the ratio of posterior to prior odds

Inference: prior odds × LR = posterior odds

$$\frac{P(H_1)}{P(H_2)} \times \frac{P(E \mid H_1)}{P(E \mid H_2)} = \frac{P(H_1 \mid E)}{P(H_2 \mid E)}$$

Example: prior odds of 1:99, i.e., P(H1) = 1%; with LR = 1000, the posterior odds become 1000:99, i.e., P(H1 | E) ≈ 91%.
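A minimal numeric sketch of this odds-form update (plain Python; the function name is ours, not from the talk), reproducing the 1:99 / LR = 1000 example:

```python
# Odds-form Bayes update: posterior odds = prior odds x LR.

def posterior_prob(prior_h1, lr):
    """Update P(H1) with a likelihood ratio; returns P(H1 | E)."""
    prior_odds = prior_h1 / (1.0 - prior_h1)   # 0.01 -> odds of 1:99
    posterior_odds = prior_odds * lr           # 1000:99
    return posterior_odds / (1.0 + posterior_odds)

print(round(posterior_prob(0.01, 1000.0), 2))  # 0.91, i.e. P(H1 | E) = 91%
```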

Bayesian Method

Decisions Using Biometric Systems

- Binary classes (hypotheses): H1 and H2
- Inference:
  - Prior probability: before knowing the biometric system outcome
  - Posterior probability: after the biometric system outcome
  - The LR is the value of the biometric evidence ⇒ it changes prior odds into posterior odds

Inference chain: prior odds → LR (biometric system) → posterior odds P(H1 | E)/P(H2 | E)

Bayesian Method

Decisions Using Biometric Systems

- Costs: the penalty of a wrong decision when H1 actually holds (Cf1) or when H2 actually holds (Cf2)
- They can differ; example in access control: is it worse to reject a genuine user (cost Cf1) or to accept an impostor (cost Cf2)?

Inference chain: prior odds → LR (biometric system) → posterior odds; the costs Cf1, Cf2 enter the decision

Bayesian Method

Decisions Using Biometric Systems

- Decision: the minimum-risk decision, i.e., the one with minimum expected cost
- The decision rule considers the posterior odds and the costs:

$$\frac{P(H_1 \mid E)}{P(H_2 \mid E)} \gtrless \frac{C_{f2}}{C_{f1}}$$

Inference chain: prior odds → LR (biometric system) → posterior odds; with costs Cf1, Cf2 → decision: H1 or H2?
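As a sketch of this rule in code (plain Python; variable names are our assumption): decide H1 exactly when the posterior odds exceed Cf2/Cf1.

```python
# Minimum-risk decision: compare posterior odds against the cost ratio.

def min_risk_decision(posterior_h1, cf1, cf2):
    """posterior_h1 = P(H1 | E); cf1/cf2 = costs of a wrong decision
    when H1/H2 actually hold. Returns the minimum-expected-cost choice."""
    posterior_odds = posterior_h1 / (1.0 - posterior_h1)
    return "H1" if posterior_odds >= cf2 / cf1 else "H2"

# Access control with a costly false match (cf2 = 10): a posterior of 80%
# for H1 is not enough to accept, since odds of 4:1 < cf2/cf1 = 10.
print(min_risk_decision(0.80, cf1=1.0, cf2=10.0))  # 'H2'
```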

Bayesian Method

Decision Process: Competences

- Total separation between
  - the comparator (the biometric system, outputting an LR) and
  - the decision maker (depends on the application)

Competence of the comparator: the LR. Competence of the decision maker: prior odds, costs Cf1, Cf2, and the decision H1 or H2.

Bayesian Method

Decision Process: Consequences

The LR should lead to the correct decision!

- Duty of the biometric system: yielding LR values that lead to the correct decisions
  - The LR should support H1 when H1 is actually true
  - The LR should support H2 when H2 is actually true
- LR values must be calibrated, which leads to better decisions

Bayesian Method

Biometric Systems

- Score-based architecture
  - Widely used, especially in black-box implementations (COTS)

[Diagram: suspect and criminal samples are compared by the biometric system, which outputs a score]

- The score is in general the only output of the system
  - It may not be directly interpretable as a likelihood ratio
  - That depends on its calibration performance

Bayesian Method

LR-Based Computation with Biometric Systems

- A further stage is necessary: the score-to-LR transformation

Biometric system → score → score-to-LR → LR

- Objective of the biometric system: output discriminating scores (score-based architecture; improve ROC/DET curves)
- Objective of the score-to-LR stage: transform the score into a meaningful LR ⇒ calibration of LRs [2,3]

[2] N. Brümmer and J. du Preez: Application-Independent Evaluation of Speaker Detection, Computer Speech and Language, 2006.
[3] D. Ramos and J. González Rodríguez: Reliable Support: Measuring Calibration of Likelihood Ratios, Forensic Science International, 2013.

Bayesian Method

Bayesian Decisions: Advantages

- Competences of the biometric system are delimited:
  - Biometric system: comparator
  - Decision maker: final decision considering all the information
  - Separation of roles: important in some fields (e.g., forensics)!
- Information is integrated formally ⇒ the LR enters a probabilistic framework
- LR computation: great experience in other fields ⇒ example: forensic biometrics

Metrics and Examples

Revisiting ISO/IEC JTC1 SC37 SD11

[Diagram: mapping decision-framework quantities to evaluation metrics]

- Priors: $\frac{P(H_1)}{P(H_2)} = \frac{\pi}{1-\pi}$; together with the costs Cf1, Cf2 ⇒ effective prior π̃
- FNMR, FMR ↦ DET
- DCF ↦ APE & NBER
- ECE
- Cllr

Metrics and Examples

Detection Error Trade-off (DET) diagrams

[Figure: score distributions (pdfs of H1 and H2 scores), FNMR and FMR as functions of the threshold, and the resulting (steppy) DET curve of FNMR vs. FMR in %]

[4] N. Brümmer and E. de Villiers: The BOSARIS Toolkit User Guide: Theory, Algorithms and Code for Binary Classifier Score Processing, Tech. Rep., AGNITIO Research, December 2011.

Metrics and Examples

From Bayesian Decisions to Cost Functions

- Bayes' theorem: prior odds × LR = posterior odds

$$\frac{P(H_1)}{P(H_2)} \times \frac{P(E \mid H_1)}{P(E \mid H_2)} = \frac{P(H_1 \mid E)}{P(H_2 \mid E)}$$

- Decision rule:

$$\frac{P(H_1 \mid E)}{P(H_2 \mid E)} \gtrless \frac{C_{f2}}{C_{f1}}$$

- Bayesian threshold η for log-LRs (LLRs), expressed through the posterior odds:

$$\mathrm{LLR} \gtrless \eta = \log\frac{C_{f2}}{C_{f1}} - \log\frac{P(H_1)}{P(H_2)}$$


Metrics and Examples

From Bayesian Decisions to Cost Functions

- Bayesian error rate: the Decision Cost Function (DCF)

$$\mathrm{DCF}(P(H_1), P(H_2), C_{f1}, C_{f2}) = P(H_1)\,\mathrm{FNMR}(\eta)\,C_{f1} + P(H_2)\,\mathrm{FMR}(\eta)\,C_{f2}$$

- Simplifying the operating point $(P(H_1), P(H_2), C_{f1}, C_{f2}) \mapsto \tilde{\pi}$:

1. Mutually exclusive priors: $\log\frac{P(H_1)}{P(H_2)} = \log\frac{\pi}{1-\pi} = \operatorname{logit}\pi$

$$\mathrm{DCF}(\pi, C_{f1}, C_{f2}) = \pi\,\mathrm{FNMR}(\eta)\,C_{f1} + (1-\pi)\,\mathrm{FMR}(\eta)\,C_{f2}$$

2. Introducing an effective prior: $\tilde{\pi} = \dfrac{\pi\,C_{f1}}{\pi\,C_{f1} + (1-\pi)\,C_{f2}}$

$$\mathrm{DCF}(\tilde{\pi}) = \tilde{\pi}\,\mathrm{FNMR}(\eta) + (1-\tilde{\pi})\,\mathrm{FMR}(\eta) = \mathrm{DCF}(\tilde{\pi}, 1, 1), \qquad \eta = -\operatorname{logit}\tilde{\pi}$$

⇒ meaningful LLR operating points: π̃ or η

[4] N. Brümmer and E. de Villiers: The BOSARIS Toolkit User Guide: Theory, Algorithms and Code for Binary Classifier Score Processing, Tech. Rep., AGNITIO Research, December 2011.
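The following sketch (NumPy; our function names) computes the effective prior, the Bayes threshold η = −logit π̃, and the resulting empirical DCF(π̃) from LLR scores:

```python
import numpy as np

def effective_prior(pi, cf1, cf2):
    """Collapse (pi, Cf1, Cf2) into the effective prior pi_tilde."""
    return pi * cf1 / (pi * cf1 + (1.0 - pi) * cf2)

def dcf(llrs_h1, llrs_h2, pi_eff):
    """DCF(pi~) = pi~ FNMR(eta) + (1 - pi~) FMR(eta), with eta = -logit(pi~)."""
    eta = -np.log(pi_eff / (1.0 - pi_eff))
    fnmr = np.mean(np.asarray(llrs_h1) < eta)   # H1 trials decided as H2
    fmr = np.mean(np.asarray(llrs_h2) >= eta)   # H2 trials decided as H1
    return pi_eff * fnmr + (1.0 - pi_eff) * fmr
```

For example, pi = 0.5 with cf1 = 1 and cf2 = 100 yields π̃ = 1/101 and η = log 100 ≈ 4.6, the 1:100 operating point used on the next slide.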



Metrics and Examples

Example on Decision Cost Functions (DCFs)

- Speaker recognition i-vector/PLDA scores (I4U list / NIST SRE'12)

[Figure: LLR score pdfs for H1 and H2; FNMR and FMR as functions of the threshold η; cost curves DCF(π̃ = 1/2) and DCF(π̃ = 1/101) vs. η, each with the actual and minimum cost marked]

- Example: DCF(1:1, η = 0) vs. DCF(1:100, η ≈ 4.6)

⇒ actual vs. minimum DCF: calibration loss
⇒ LLR meaning: aligning scores for Bayesian support
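A sketch of the actual-vs-minimum comparison (NumPy; builds on the dcf helper above; names ours): minDCF sweeps every empirical threshold, and the gap actDCF − minDCF is the calibration loss.

```python
import numpy as np

def min_dcf(llrs_h1, llrs_h2, pi_eff):
    """Minimum DCF over all thresholds: the best any calibration could do."""
    h1, h2 = np.asarray(llrs_h1), np.asarray(llrs_h2)
    best = min(pi_eff, 1.0 - pi_eff)            # default decision as ceiling
    for t in np.unique(np.concatenate([h1, h2])):
        cost = pi_eff * np.mean(h1 < t) + (1.0 - pi_eff) * np.mean(h2 >= t)
        best = min(best, cost)
    return best

# calibration_loss = dcf(h1, h2, pi_eff) - min_dcf(h1, h2, pi_eff)
```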


Metrics and Examples

Visualizing DCFs

- Applied Probability of Error (APE) curve
  - Simulating DCFs over multiple operating points
  - Default: all LLRs = 0, i.e., DCF = min(π̃, 1 − π̃)
  - Area under the APE curve: cost of the LLR scores ⇒ goodness of LLRs: Cllr

[Figure: APE plot of P(error) vs. logit π̃ = −η, showing the actual DCF (area: Cllr), the minimum DCF (area: Cllr^min, interpolated via the ROC convex hull, ROCCH), and the default system; EER: 0.5%]

[5] N. Brümmer: FoCal: Tools for Fusion and Calibration of Automatic Speaker Detection Systems, Tech. Rep., 2005.
[6] D.A. van Leeuwen and N. Brümmer: An Introduction to Application-Independent Evaluation of Speaker Recognition Systems, Speaker Classification I: Fundamentals, Features, and Methods, Springer LNCS, 2007.
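An APE-style sweep can be sketched as follows (NumPy; our names; the default curve assumes the min(π̃, 1 − π̃) form stated above): evaluate the actual Bayes error and the default system across a grid of operating points logit π̃.

```python
import numpy as np

def ape_curve(llrs_h1, llrs_h2, lo=-10.0, hi=10.0, n=201):
    """Return (logit_pi, actual, default) arrays for an APE plot."""
    h1, h2 = np.asarray(llrs_h1), np.asarray(llrs_h2)
    logit_pi = np.linspace(lo, hi, n)
    pi = 1.0 / (1.0 + np.exp(-logit_pi))        # effective prior grid
    eta = -logit_pi                             # Bayes thresholds
    actual = np.array([p * np.mean(h1 < e) + (1 - p) * np.mean(h2 >= e)
                       for p, e in zip(pi, eta)])
    default = np.minimum(pi, 1.0 - pi)          # all LLRs = 0
    return logit_pi, actual, default
```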

Metrics and Examples

Normalized Bayesian Error Rate (NBER)

- APE plots can be visually misleading about error impact:
  - at the EER operating point, many scores are at risk of mismatch
  - at the FMR1000 operating point, only few scores are at risk of mismatch
- Normalizing by the default performance ⇒ a wider range of operating points can be compared

[Figure: NBER vs. η = −logit π̃, showing the default DCFnorm (LLRs = 0), the actual DCFnorm, the minimum DCFnorm, and the 30-FMs / 30-FNMs lines]

[4] N. Brümmer and E. de Villiers: The BOSARIS Toolkit User Guide: Theory, Algorithms and Code for Binary Classifier Score Processing, Tech. Rep., AGNITIO Research, December 2011.

Note: in the BOSARIS toolkit, the x-axis is swapped, i.e., it depicts purely the effective prior.
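Dividing each point of the APE sweep by the default performance gives the NBER; a sketch reusing ape_curve from above (names ours, under the same min(π̃, 1 − π̃) default assumption):

```python
import numpy as np

def nber_curve(llrs_h1, llrs_h2, **kwargs):
    """NBER(eta) = DCF(pi~) / min(pi~, 1 - pi~); 1.0 marks the default."""
    logit_pi, actual, default = ape_curve(llrs_h1, llrs_h2, **kwargs)
    return -logit_pi, actual / default          # x-axis: eta = -logit(pi~)
```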

Metrics and Examples

Revisiting ISO/IEC JTC1 SC37 SD11

[Diagram as before: priors π/(1 − π) and costs Cf1, Cf2 ⇒ effective prior π̃; covered so far (✓): FNMR, FMR ↦ DET; DCF ↦ APE & NBER; Cllr. Next: ECE.]

Metrics and Examples

Empirical Cross-Entropy (ECE)

- Objective measure of performance, motivated by information theory
- Prior entropy → (evidence: information gain) → posterior entropy
- Divergence of the system from the ground truth (GoT)
- ECE: approximating the Kullback-Leibler divergence D_GoT||system, offset by the GoT entropy:

$$H_\text{system}(H_1, H_2 \mid \mathrm{LLRs}) = H_\text{GoT}(H_1, H_2 \mid \mathrm{LLRs}) + D_{\text{GoT}\|\text{system}}(H_1, H_2 \mid \mathrm{LLRs})$$

Metrics and Examples

Empirical Cross-Entropy (ECE)

- We expect the reference but obtain the system's LLRs
- Measures the performance of LRs in terms of uncertainty; the lower, the better
- Calibration loss: the gap between overall performance and discriminating power (ECE of the system vs. ECE after optimal calibration)
- Cllr is the ECE at prior log(odds) = 0 ⇒ assumes no information on the H1/H2 prior

[Figure: ECE vs. prior log10(odds) for the system, the optimal calibration, and the default (LLRs = 0); Cllr and Cllr^min are read off at log10(odds) = 0]

[7] D. Ramos Castro and J. González Rodríguez: Cross-entropy Analysis of the Information in Forensic Speaker Recognition, Odyssey, 2008.

Metrics and Examples

Examples

- Signature recognition [8]
  - Performance of feature-space normalization
  - Simulation of application-independent decision performances

[Figure: ECE vs. prior log10(odds) for the baseline GMM-UBM and with feature warping]

[8] A. Nautsch, C. Rathgeb, C. Busch: Bridging Gaps: An Application of Feature Warping to Online Signature Verification, ICCST, 2014.

Metrics and Examples

Examples

- Speaker recognition [9]
  - Overview of application-dependent decision costs in the 10 dB / 10 s condition
  - Conventional score normalization vs. quality-based normalization

[Figure: NBER vs. η = −logit π̃ for the baseline and the quality-based system: DCFnorm, minDCFnorm, and 30-FMs / 30-FNMs lines, plus the default]

[9] A. Nautsch, R. Saeidi, C. Rathgeb, C. Busch: Analysis of Mutual Duration and Noise Effects in Speaker Recognition: Benefits of Condition-Matched Cohort Selection in Score Normalization, Interspeech, 2015.

Metrics and Examples

Examples

- Speaker recognition [10]
  - Examining calibration schemes in 55 quality conditions
  - Discrimination vs. calibration loss on the 55-pooled condition
  - Goal: approximate binning performance while avoiding binning

[Figure: Cllr in bits, split into discrimination and calibration loss, per calibration scheme: conventional, QMF, FQE, binning]

[10] A. Nautsch, R. Saeidi, C. Rathgeb, C. Busch: Robustness of Quality-based Score Calibration of Speaker Recognition Systems with respect to low-SNR and short-duration conditions, Odyssey, 2016 (to appear).

Metrics and Examples

Examples

- Recurring challenges in biometrics
  - NIST Speaker Recognition Evaluation (SRE) ⇒ DCFs (since 1996) & Cllr (since 2006)
  - ICDAR Competition on Signature Verification and Writer Identification (SigWIcomp) ⇒ Cllr & Cllr^min (both since 2011)
- Non-biometric forensics [11]: glass objects, car paints, inks

[11] G. Zadora, A. Martyna, D. Ramos, C. Aitken: Statistical Analysis in Forensic Science: Evidential Values of Multivariate Physicochemical Data, John Wiley and Sons, 2014.

Conclusion

Summary

- Bayesian decision framework
  - Bayes' theorem & decision rule employing costs
  - Biometric systems: generators of Bayesian support (LLRs)
  - Decisions by posterior knowledge: priors and the LLR score
- Score-to-LLR calibration: meaningful LLRs
  - A necessary step, requiring a calibration data set
  - Essential for validation/accreditation
- Performance reporting with a decoupled decision policy
  - APE curves, NBER diagrams, ECE plots
  - Scalars: actDCF, minDCF, Cllr & Cllr^min

[Figure: Cllr in bits, split into discrimination loss and calibration loss]

Conclusion

Perspectives

- From forensics to biometrics in general
- Forensics: distinct separation of role provinces

Suspect reference / recovered probe → feature extraction → evidence analysis (comparison) → score → decision: guilty (accept) or not guilty (reject).
Province of the forensic scientist: up to the score. Province of the court: the decision.

⇒ Non-forensic biometric companion/equivalent: vendor system (comparator) vs. customer decision policy

Note: neither forensic scientists nor courts shall be automated; it is an analogy.

Conclusion

Application fields

- Operating-point-independent performance reporting
  - Discrimination loss ↦ goodness of scores without calibration
  - System calibration (meaningful LLRs)
  - Forensic state of the art
⇒ The European Network of Forensic Science Institutes (ENFSI) adopted the Bayesian methodology (strong recommendation)
- Not strictly needed for fixed-operating-point testing
⇒ But fundamental in technology testing

This work has been funded by the Center for Advanced Security Research Darmstadt (CASED) and the Hesse government (project no. 467/15-09, BioMobile).

Evaluation of evidence strength

- Metrics in the Bayesian framework
- Application-independent generalization [2]: goodness of (log-likelihood-ratio) scores, Cllr:

$$C_\mathrm{llr} = \frac{0.5}{|H_1|} \sum_{S \in H_1} \mathrm{ld}\left(1 + e^{-S}\right) + \frac{0.5}{|H_2|} \sum_{S \in H_2} \mathrm{ld}\left(1 + e^{S}\right)$$

- Information-theoretic generalization [7]: Empirical Cross-Entropy (ECE):

$$\mathrm{ECE} = \frac{\pi}{|H_1|} \sum_{S \in H_1} \mathrm{ld}\!\left(1 + e^{-\left(S + \log\frac{\pi}{1-\pi}\right)}\right) + \frac{1-\pi}{|H_2|} \sum_{S \in H_2} \mathrm{ld}\!\left(1 + e^{S + \log\frac{\pi}{1-\pi}}\right)$$

- Both metrics represent (cross-)entropy in bits (ld = log2)
- Performance reporting with a decoupled decision layer

[2] N. Brümmer and J. du Preez: Application-Independent Evaluation of Speaker Detection, Computer Speech and Language, 2006.
[7] D. Ramos Castro and J. González Rodríguez: Cross-entropy Analysis of the Information in Forensic Speaker Recognition, Odyssey, 2008.
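A direct transcription of both formulas (NumPy; log2 corresponds to ld on the slide; function names are ours):

```python
import numpy as np

def cllr(llrs_h1, llrs_h2):
    """C_llr in bits (lower is better; 0 for perfect LLRs)."""
    h1 = np.asarray(llrs_h1, dtype=float)
    h2 = np.asarray(llrs_h2, dtype=float)
    return 0.5 * np.mean(np.log2(1.0 + np.exp(-h1))) \
         + 0.5 * np.mean(np.log2(1.0 + np.exp(h2)))

def ece(llrs_h1, llrs_h2, logit_pi):
    """Empirical cross-entropy at prior log-odds logit_pi (in bits)."""
    pi = 1.0 / (1.0 + np.exp(-logit_pi))
    post1 = np.asarray(llrs_h1, dtype=float) + logit_pi  # posterior log-odds
    post2 = np.asarray(llrs_h2, dtype=float) + logit_pi
    return pi * np.mean(np.log2(1.0 + np.exp(-post1))) \
         + (1.0 - pi) * np.mean(np.log2(1.0 + np.exp(post2)))

# Sanity check (per the slides): ece(h1, h2, 0.0) equals cllr(h1, h2).
```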

Brief introduction to calibration

- Linear: logistic regression (robust model)
  - Transform: S_cal = w0 + w1 · S
- Non-linear: pool-adjacent-violators (PAV) algorithm (optimal)
  - Transform: monotonic, non-parametric mapping function

[Figure: PAV-calibrated LLRs vs. system scores; a monotonic step mapping]
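Both mappings can be sketched with scikit-learn (our assumption; the slides' references use the FoCal and BOSARIS toolkits, not this code). Labels are assumed to be 0/1 for H2/H1 trials.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression

def logreg_calibration(scores, labels):
    """Linear map S_cal = w0 + w1*S fitted by logistic regression.
    Caveat: the raw intercept absorbs the training prior log-odds; FoCal's
    weighted training removes it so the output is a prior-free LLR."""
    clf = LogisticRegression(C=1e6)  # weak regularization, near-pure fit
    clf.fit(np.asarray(scores, dtype=float).reshape(-1, 1), labels)
    w1, w0 = clf.coef_[0, 0], clf.intercept_[0]
    return lambda s: w0 + w1 * np.asarray(s, dtype=float)

def pav_calibration(scores, labels):
    """Monotonic, non-parametric PAV (isotonic) map to posterior log-odds."""
    iso = IsotonicRegression(y_min=1e-6, y_max=1 - 1e-6, out_of_bounds="clip")
    p = iso.fit_transform(np.asarray(scores, dtype=float), labels)
    return np.log(p / (1.0 - p))     # training scores mapped to log-odds
```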