TRANSCRIPT
Making decisions with biometric systems: the usefulness of a Bayesian perspective
A. Nautsch*, D. Ramos Castro†, J. González Rodríguez†, Christian Rathgeb*, Christoph Busch*
*Hochschule Darmstadt, CRISP, CASED, da/sec Security Research Group
†Universidad Autónoma de Madrid, ATVS Biometric Recognition Group
NIST IBPC’16, Gaithersburg, 03.05.2016
Nautsch, Ramos, et al. Bayesian Biometrics / NIST IBPC’16, Gaithersburg, 03.05.2016 1/32
Outline
1. Decision Frameworks in Biometrics and Forensics
2. Bayesian Method: making good decisions
3. Metrics, operating points and examples
4. Conclusion
Decision Frameworks
Biometric Systems in ISO/IEC JTC1 SC37 SD11
⇒ Note: separate decision subsystem
Decision Frameworks
Making Decisions with Biometric Systems
I Access control: accept/reject decision
I Forensic investigation: decide the k-best list to investigate (e.g., AFIS)
I Intelligence: decide where to establish relevant links in a database
I Forensic evaluation: communicate for the court to decide a verdict
Decisions are involved in most applications of biometric systems
Decision Frameworks
Making Decisions with Biometric Systems
I Decision maker faces multiple sources of information
  The biometric system is one of them, but also . . .
  I Prior knowledge about users/impostors/suspects
  I Other evidence from other biometric systems
  I . . .
I Decisions must consider all that information
I Formalizing the decision framework helps
  I Especially in complex problems
  I Example: medical diagnosis support
Bayesian Method
Bayesian Decisions with Biometric Systems
I A proposal: Bayesian decision theory
  I Decisions are made based on posterior probabilities
  I Considering all the relevant information available
  I Updating strategy: likelihood ratios (LR)
Example: biometric systems in forensic evaluation of the evidence
I Prior probability: all information prior to the (forensic) evidence
I Posterior probability: all information, including the (forensic) evidence
I Weight of the evidence: Likelihood Ratio (LR)
[1] I. Evett: Towards a uniform framework for reporting opinions in forensic science casework, Science and Justice, 1998.
Bayesian Method
Value of Evidence: Likelihood Ratio (LR)
I Two-class (H1, H2) decision framework
I Likelihood Ratio: probabilistic value of the evidence, also: the ratio of posterior to prior odds
Prior odds × LR = Posterior odds

P(H1)/P(H2) × P(E | H1)/P(E | H2) = P(H1 | E)/P(H2 | E)

Example (inference): prior odds 1:99, i.e., P(H1) = 1%; LR = 1000 ⇒ posterior odds 1000:99, i.e., P(H1 | E) ≈ 91%
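The update on this slide is just a multiplication of odds; a minimal Python sketch (the function names are ours, the numbers are the slide's example):

```python
def posterior_odds(prior_odds: float, lr: float) -> float:
    """Bayesian update: posterior odds = prior odds * likelihood ratio."""
    return prior_odds * lr

def odds_to_probability(odds: float) -> float:
    """Convert odds P(H1)/P(H2) into the probability P(H1)."""
    return odds / (1.0 + odds)

prior = 1 / 99                       # prior odds 1:99, i.e. P(H1) = 1%
post = posterior_odds(prior, 1000)   # LR = 1000 -> posterior odds 1000:99
print(round(odds_to_probability(post), 2))  # -> 0.91, i.e. P(H1 | E) ~ 91%
```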
Bayesian Method
Decisions Using Biometric Systems
I Binary classes (hypotheses): H1 and H2
I Inference
  I Prior probability: before knowing the biometric system outcome
  I Posterior probability: after the biometric system outcome
  I LR is the value of the biometric evidence ⇒ changes prior odds into posterior odds
[Diagram: prior odds and the LR (biometric system) feed the inference of the posterior odds P(H1 | E)/P(H2 | E)]
Bayesian Method
Decisions Using Biometric Systems
I Costs: penalty of making a wrong decision when H1 holds (Cf1) or when H2 holds (Cf2)
I They can differ; example in access control:
  I is it worse to reject a genuine user (cost Cf1)
  I or to accept an impostor (cost Cf2)?
[Diagram: prior odds and the LR (biometric system) yield posterior odds via inference; the costs Cf1, Cf2 enter the decision]
Bayesian Method
Decisions Using Biometric Systems
I Decision: minimum-risk decision, i.e., minimum mean cost
I Decision rule considers
  I Posterior odds
  I Costs

  P(H1 | E) · Cf1 ≷ P(H2 | E) · Cf2
[Diagram: prior odds and the LR (biometric system) yield posterior odds; together with the costs Cf1, Cf2, the inference leads to the decision: H1 or H2?]
Bayesian Method
Decision Process: Competences
I Total separation between
  I The comparator: the biometric system outputting an LR
  I The decision maker: depends on the application
Bayesian Method
Decision Process: Consequences
I Duty of the biometric system: yielding LR values that lead to the correct decisions
  I The LR should support H1 when H1 is actually true
  I The LR should support H2 when H2 is actually true
I LR values must be calibrated, which leads to better decisions
Bayesian Method
Biometric Systems
I Score-based architecture
  I Widely extended
  I Especially in black-box implementations (COTS)

[Diagram: suspect and criminal samples enter the biometric system, which outputs a score]

I Score: in general the only output of the system
  I It may not be directly interpretable as a likelihood ratio
  I Depends on its calibration performance
Bayesian Method
LR-Based Computation with Biometric Systems
I A further stage is necessary: score-to-LR transformation
Biometric System → Score → Score-to-LR → LR

I Objective of the biometric system: output discriminating scores
  I Score-based architecture
  I Improve ROC/DET curves
I Objective of the score-to-LR stage: transforming the score into a meaningful LR
  ⇒ Calibration of LRs [2,3]

[2] N. Brümmer and J. du Preez: Application-Independent Evaluation of Speaker Detection, Computer Speech and Language, 2006.
[3] D. Ramos and J. González-Rodríguez: Reliable support: Measuring calibration of likelihood ratios, Forensic Science International, 2013.
Bayesian Method
Bayesian Decisions: Advantages
I Competences of the biometric system are delimited
  I Biometric system: comparator
  I Decision maker: final decision considering all the information
  I Separation of roles: important in some fields (e.g., forensics)!
I Information is integrated formally ⇒ the LR enters a probabilistic framework
I LR computation: great experience in other fields ⇒ example: forensic biometrics
Metrics and Examples
Revisiting ISO/IEC JTC1 SC37 SD11

[Diagram: mapping the decision framework onto metrics]
I Priors: P(H1)/P(H2) = π/(1−π) ⇒ π
I Error rates: FNMR, FMR ↦ DET
I Costs: Cf1, Cf2
I DCF ↦ APE & NBER
I Cllr
I ECE
Metrics and Examples
Detection Error Trade-off (DET) diagrams

[Figure: H1/H2 score distributions, FNMR and FMR as functions of the threshold, and the resulting (steppy) DET curve of FNMR vs. FMR (both in %)]

[4] N. Brümmer and E. de Villiers: The BOSARIS Toolkit User Guide: Theory, Algorithms and Code for Binary Classifier Score Processing, Tech. Rep., AGNITIO Research, 2011.
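A DET diagram plots empirical FNMR against FMR over all thresholds; both rates can be computed per threshold as below (a sketch with made-up toy scores; the convention that ties count as matches is our assumption):

```python
def fnmr(mated_scores, threshold):
    """False non-match rate: fraction of mated (H1) scores below the threshold."""
    return sum(s < threshold for s in mated_scores) / len(mated_scores)

def fmr(nonmated_scores, threshold):
    """False match rate: fraction of non-mated (H2) scores at or above the threshold."""
    return sum(s >= threshold for s in nonmated_scores) / len(nonmated_scores)

mated = [2.0, 3.5, -0.5, 4.2, 1.1]   # toy H1 scores
nonmated = [-3.0, -1.2, 0.4, -2.5]   # toy H2 scores
print(fnmr(mated, 0.0), fmr(nonmated, 0.0))  # -> 0.2 0.25
```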
Metrics and Examples
From Bayesian Decisions to Cost Functions

I Bayes theorem (prior odds × LR = posterior odds)

  P(H1)/P(H2) × P(E | H1)/P(E | H2) = P(H1 | E)/P(H2 | E)

I Decision rule

  P(H1 | E) · Cf1 ≷ P(H2 | E) · Cf2  ⇔  P(H1 | E)/P(H2 | E) ≷ Cf2/Cf1

I Bayesian threshold η for log-LRs (LLRs) by posterior odds

  LLR ≷ η = log(Cf2/Cf1) − log(P(H1)/P(H2))
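The Bayesian threshold can be sketched directly; the helper names are ours, and the equal-prior, 1:100-cost case reproduces the η ≈ 4.6 used on the DCF example slide:

```python
import math

def bayes_threshold(p_h1: float, cf1: float, cf2: float) -> float:
    """eta = log(Cf2/Cf1) - log(P(H1)/P(H2)), with P(H2) = 1 - P(H1)."""
    return math.log(cf2 / cf1) - math.log(p_h1 / (1.0 - p_h1))

def decide(llr: float, eta: float) -> str:
    """Decide H1 iff the log-likelihood ratio exceeds the Bayesian threshold."""
    return "H1" if llr > eta else "H2"

eta = bayes_threshold(0.5, cf1=1.0, cf2=100.0)  # equal priors, 1:100 costs
print(round(eta, 1))     # -> 4.6
print(decide(5.0, eta))  # -> H1
```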
Metrics and Examples
From Bayesian Decisions to Cost Functions

I Bayesian error rate: Decision Cost Function (DCF)

  DCF(P(H1), P(H2), Cf1, Cf2) = P(H1) FNMR(η) Cf1 + P(H2) FMR(η) Cf2

I Simplifying the operating point: (P(H1), P(H2), Cf1, Cf2) ↦ π̃
  1. Mutually exclusive priors: log P(H1)/P(H2) = log π/(1−π) = logit π

     DCF(π, Cf1, Cf2) = π FNMR(η) Cf1 + (1 − π) FMR(η) Cf2

  2. Introducing an effective prior: π̃ = π Cf1 / (π Cf1 + (1 − π) Cf2)

     DCF(π̃) = π̃ FNMR(η) + (1 − π̃) FMR(η) = DCF(π̃, 1, 1), with η = −logit π̃

⇒ meaningful LLR operating points: π̃ or η [4]

[4] N. Brümmer and E. de Villiers: The BOSARIS Toolkit User Guide: Theory, Algorithms and Code for Binary Classifier Score Processing, Tech. Rep., AGNITIO Research, December 2011.
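The DCF and the effective-prior collapse of step 2 can be sketched as follows (function names ours; FNMR and FMR are passed as plain numbers here, while in practice they come from the evaluated score set):

```python
def effective_prior(pi: float, cf1: float, cf2: float) -> float:
    """pi_tilde = pi*Cf1 / (pi*Cf1 + (1 - pi)*Cf2)."""
    return pi * cf1 / (pi * cf1 + (1.0 - pi) * cf2)

def dcf(pi: float, fnmr: float, fmr: float, cf1: float = 1.0, cf2: float = 1.0) -> float:
    """DCF(pi, Cf1, Cf2) = pi * FNMR(eta) * Cf1 + (1 - pi) * FMR(eta) * Cf2."""
    return pi * fnmr * cf1 + (1.0 - pi) * fmr * cf2

# Equal priors with 1:100 costs collapse to the effective prior 1/101,
# one of the operating points shown on the DCF example slide.
print(round(effective_prior(0.5, cf1=1.0, cf2=100.0), 4))  # -> 0.0099
```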
Metrics and Examples
Example on Decision Cost Functions (DCFs)

I Speaker recognition i-vector/PLDA scores (I4U list / NIST SRE'12)

[Figure: H1/H2 LLR distributions; FNMR and FMR as functions of η; actual and minimum DCF over η for the operating points DCF(π̃ = 1/101) and DCF(π̃ = 1/2)]

I Example: DCF(1:1, η = 0) vs. DCF(1:100, η ≈ 4.6)
⇒ actual vs. minimum DCF: calibration loss
⇒ LLR meaning: aligning scores for Bayesian support
Metrics and Examples
Visualizing DCFs

I Applied Probability of Error (APE) curve
  I Simulating DCFs on multiple operating points
  I default: all LLRs = 0, i.e., DCF = min(π̃, 1 − π̃)
  I Area under the APE curve: cost of the LLR scores
⇒ Goodness of LLRs: Cllr

[Figure: APE plot of P(error) vs. logit π̃ = −η, showing the actual DCF, the minimum DCF (interpolated via the ROC convex hull, ROCCH) and the default curve; the areas under the curves give Cllr and Cllr^min; EER: 0.5%]

[5] N. Brümmer: FoCal: Tools for Fusion and Calibration of automatic speaker detection systems, Tech. Rep., 2005.
[6] D.A. van Leeuwen and N. Brümmer: An Introduction to Application-Independent Evaluation of Speaker Recognition Systems, Speaker Classification I: Fundamentals, Features, and Methods, Springer LNCS, 2007.
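Cllr, the area-based goodness measure named here, has the closed form from [2] (reproduced in the appendix); a stdlib-only sketch:

```python
import math

def cllr(llrs_h1, llrs_h2) -> float:
    """Cllr = 0.5/|H1| * sum log2(1 + e^-S) + 0.5/|H2| * sum log2(1 + e^S)."""
    cost_h1 = sum(math.log2(1.0 + math.exp(-s)) for s in llrs_h1) / len(llrs_h1)
    cost_h2 = sum(math.log2(1.0 + math.exp(s)) for s in llrs_h2) / len(llrs_h2)
    return 0.5 * (cost_h1 + cost_h2)

# The default system (all LLRs = 0) costs exactly 1 bit;
# well-calibrated, well-separated LLRs cost much less.
print(cllr([0.0, 0.0], [0.0, 0.0]))          # -> 1.0
print(cllr([4.0, 6.0], [-5.0, -3.0]) < 0.1)  # -> True
```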
Metrics and Examples
Normalized Bayesian Error Rate (NBER)
I APE plots can be visually misleading on error impact
  I EER operating point: lots of scores to mismatch
  I FMR1000 operating point: few scores to mismatch
I Normalizing by the default performance ⇒ a wider range of operating points can be compared
[Figure: NBER plot of the normalized Bayesian error rate vs. η = −logit π̃, showing the default, actual DCFnorm and minimum DCFnorm curves, with 30-FMs and 30-FNMs bounds]
[4] N. Brümmer and E. de Villiers: The BOSARIS Toolkit User Guide: Theory, Algorithms and Code for Binary Classifier Score Processing, Tech. Rep., AGNITIO Research, December 2011.
Note: in the BOSARIS toolkit, the x-axis is swapped, i.e., it depicts purely the effective prior.
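A sketch of the normalization we read this slide as describing: the DCF at an operating point divided by the default system's DCF, min(π̃, 1 − π̃). Treat this exact convention as an assumption and check it against the BOSARIS guide [4]:

```python
def normalized_dcf(pi: float, fnmr: float, fmr: float) -> float:
    """DCF at effective prior pi, divided by the default DCF min(pi, 1 - pi)."""
    dcf = pi * fnmr + (1.0 - pi) * fmr
    default = min(pi, 1.0 - pi)  # best cost achievable with all LLRs = 0
    return dcf / default

# A system at FNMR = FMR = 10% scores 0.2 of the default cost at pi = 0.5;
# the normalization makes different operating points comparable.
print(normalized_dcf(0.5, fnmr=0.1, fmr=0.1))  # -> 0.2
```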
Metrics and Examples
Revisiting ISO/IEC JTC1 SC37 SD11

[Diagram, as before: P(H1)/P(H2) = π/(1−π) ⇒ π; FNMR, FMR ↦ DET; Cf1, Cf2; DCF ↦ APE & NBER (covered); still open: Cllr, ECE]
Metrics and Examples
Empirical Cross-Entropy (ECE)
I Objective measure of performance
I Motivation from information theory
  I Prior entropy → (evidence) → posterior entropy: information gain
I Divergence of the system from the Ground-of-Truth (GoT)
  I ECE: approximating the Kullback-Leibler divergence D(GoT || system)

[Diagram: the system cross-entropy Hsystem(H1, H2 | LLRs) decomposes into the divergence D(GoT || system)(H1, H2 | LLRs) plus the GoT entropy HGoT(H1, H2 | LLRs)]
Metrics and Examples
Empirical Cross-Entropy (ECE)
I We expect the reference, but obtain the system's LLRs
I Measuring the performance of the LR in terms of uncertainty
  I The lower the better
I Calibration loss: overall performance ⇔ discriminating power
I Cllr at prior log(odds) = 0 ⇒ no information on the H1/H2 prior

[Figure: ECE vs. prior log10(odds); curves: system, optimal calibration, default (LLRs = 0); at log10(odds) = 0 the system curve reads Cllr and the optimally calibrated curve Cllr^min]

[7] D. Ramos Castro and J. González-Rodríguez: Cross-entropy Analysis of the Information in Forensic Speaker Recognition, Odyssey, 2008.
Metrics and Examples
Examples

I Signature recognition [8]
  I Performance of feature-space normalization
  I Simulation of application-independent decision performances

[Figure: two ECE plots over prior log10(odds), for the baseline GMM-UBM system and with feature warping]

[8] A. Nautsch, C. Rathgeb, C. Busch: Bridging Gaps: An Application of Feature Warping to Online Signature Verification, ICCST, 2014.
Metrics and Examples
Examples

I Speaker recognition [9]
  I Overview of application-dependent decision costs in the 10 dB / 10 s condition
  I Conventional score normalization vs. quality-based normalization

[Figure: NBER plot over η = −logit π̃ comparing baseline and quality-based systems: DCFnorm, minDCFnorm, 30-FMs and 30-FNMs bounds, and the default curve]

[9] A. Nautsch, R. Saeidi, C. Rathgeb, C. Busch: Analysis of mutual duration and noise effects in speaker recognition: benefits of condition-matched cohort selection in score normalization, Interspeech, 2015.
Metrics and Examples
Examples

I Speaker recognition [10]
  I Examining calibration schemes in 55 quality conditions
  I Discrimination vs. calibration loss on the 55-pooled condition
  I Goal: approximating the binning performance while avoiding binning

[Figure: Cllr in bits (0.15 to 0.35) per calibration scheme (conventional, QMF, FQE, binning), split into discrimination loss and calibration loss]

[10] A. Nautsch, R. Saeidi, C. Rathgeb, C. Busch: Robustness of Quality-based Score Calibration of Speaker Recognition Systems with respect to low-SNR and short-duration conditions, Odyssey, 2016 (to appear).
Metrics and Examples
Examples
I Recurring challenges in biometrics
  I NIST Speaker Recognition Evaluation (SRE) ⇒ DCFs (since 1996) & Cllr (since 2006)
  I ICDAR Competition on Signature Verification and Writer Identification (SigWIcomp) ⇒ Cllr & Cllr^min (both since 2011)
I Non-biometric forensics [11]
  I Glass objects
  I Car paints
  I Inks

[11] G. Zadora, A. Martyna, D. Ramos, C. Aitken: Statistical Analysis in Forensic Science: Evidential Values of Multivariate Physicochemical Data, John Wiley and Sons, 2014.
Conclusion
Summary

I Bayesian decision framework
  I Bayes theorem & decision rule employing costs
  I Biometric systems: generators of Bayesian support (LLRs)
  I Decisions by posterior knowledge of priors and the LLR score
I Score-to-LLR calibration: meaningful LLRs
  I Necessary step, requiring a calibration data set
  I Essential for validation/accreditation
I Performance reporting with a decoupled decision policy
  I APE curves
  I NBER diagrams
  I ECE plots
  I Scalars: actDCF, minDCF, Cllr & Cllr^min

[Figure: Cllr in bits, split into discrimination loss and calibration loss]
Conclusion
Perspectives
I From forensics to biometrics in general
I Forensics: distinct separation of role provinces
[Diagram: suspect reference and recovered probe → feature extraction → evidence analysis (comparison) → score → guilty (accept) / not guilty (reject); the score marks the boundary between the province of the forensic scientist and the province of the court]

⇒ Non-forensic biometric companion/equivalent: the vendor system vs. the customer's decision policy

Note: neither forensic scientists nor courts shall be automated; it is an analogy.
Conclusion
Application fields
I Operating-point-independent performance reporting
  I Discrimination loss ↦ goodness of scores without calibration
  I System calibration (meaningful)
I Forensic state of the art
  ⇒ European Network of Forensic Science Institutes (ENFSI): adopted the Bayesian methodology (strong recommendation)
I Fixed-operational testing: no need
  ⇒ But: fundamental in technology testing

This work has been funded by the Center for Advanced Security Research Darmstadt (CASED), and the Hesse government (project no. 467/15-09, BioMobile).
Evaluation of evidence strength

I Metrics in the Bayesian framework
I Application-independent generalization [2]: goodness of (log-likelihood-ratio) scores, Cllr

  Cllr = 0.5/|H1| · Σ_{S∈H1} ld(1 + e^{−S}) + 0.5/|H2| · Σ_{S∈H2} ld(1 + e^{S})

I Information-theoretic generalization [7]: Empirical Cross-Entropy (ECE)

  ECE = π/|H1| · Σ_{S∈H1} ld(1 + e^{−(S + log π/(1−π))}) + (1−π)/|H2| · Σ_{S∈H2} ld(1 + e^{S + log π/(1−π)})

I Metrics represent (cross-)entropy in bits
I Performance reporting with a decoupled decision layer

[2] N. Brümmer and J. du Preez: Application-Independent Evaluation of Speaker Detection, Computer Speech and Language, 2006.
[7] D. Ramos Castro and J. González-Rodríguez: Cross-entropy Analysis of the Information in Forensic Speaker Recognition, Odyssey, 2008.
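The ECE formula, stdlib-only (function names ours); at prior log-odds 0 it coincides with Cllr, matching the remark on the ECE slides ("ld" is the binary logarithm):

```python
import math

def ece(llrs_h1, llrs_h2, prior: float) -> float:
    """ECE(pi) with the prior log-odds offset logit = log(pi / (1 - pi))."""
    logit = math.log(prior / (1.0 - prior))
    h1 = sum(math.log2(1.0 + math.exp(-(s + logit))) for s in llrs_h1) / len(llrs_h1)
    h2 = sum(math.log2(1.0 + math.exp(s + logit)) for s in llrs_h2) / len(llrs_h2)
    return prior * h1 + (1.0 - prior) * h2

# At prior = 0.5 (log-odds 0) ECE reduces to Cllr: the default system costs 1 bit.
print(ece([0.0], [0.0], 0.5))  # -> 1.0
```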
Brief introduction to calibration

I Linear: logistic regression (robust model)
  I Transform: Scal = w0 + w1 · S
I Non-linear: Pool-Adjacent-Violators (PAV) algorithm (optimal)
  I Transform: monotonic, non-parametric mapping function

[Figure: PAV-calibrated LLRs vs. system scores, a monotonic step mapping]
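Both calibration maps can be sketched briefly; the logistic-regression weights w0, w1 below are placeholders (in practice they are fit on a calibration set, e.g. with FoCal [5]), and the PAV routine is a plain isotonic-pooling implementation:

```python
def linear_calibrate(score: float, w0: float, w1: float) -> float:
    """Linear calibration map: S_cal = w0 + w1 * S."""
    return w0 + w1 * score

def pav(values):
    """Pool-Adjacent-Violators: isotonic (monotonic, non-parametric) fit.
    `values` are assumed already ordered by the underlying score."""
    merged = []  # list of [block mean, block size]
    for v in values:
        merged.append([v, 1])
        # pool adjacent blocks while monotonicity is violated
        while len(merged) > 1 and merged[-2][0] > merged[-1][0]:
            m2 = merged.pop()
            m1 = merged.pop()
            size = m1[1] + m2[1]
            merged.append([(m1[0] * m1[1] + m2[0] * m2[1]) / size, size])
    out = []
    for mean, size in merged:
        out.extend([mean] * size)
    return out

print(pav([1.0, 3.0, 2.0, 4.0]))  # -> [1.0, 2.5, 2.5, 4.0]
```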