Designing multiple biometric systems:
Measure of ensemble effectiveness
Allen Tang, OPLab @ NTUIM
Agenda
Introduction
Measures of performance
Measures of ensemble effectiveness
Combination Rules
Experimental Results
Conclusion
2
INTRODUCTION
Introduction
Multimodal biometrics performs better than any single biometric trait
Fuse the results of multiple biometric experts
Fusion at the matching-score level is the easiest to implement
4
Introduction
Which biometric experts shall we choose?
How to evaluate ensemble effectiveness?
Which measure gives out the best result?
5
MEASURES OF PERFORMANCE
Measures of performance
Notation
E = {E1, …, Ej, …, EN}: a set of N experts
U = {ui}: the set of users
sj: the set of all scores produced by Ej over all users
sij: the score assigned by Ej to user ui
fj(ui): the function by which Ej produces sij for ui
th: threshold; gen: genuine; imp: impostor
7
Measures of performance: Basic
False Rejection Rate (FRR) for expert Ej:
FRRj(th) = ∫_{−∞}^{th} p(sj|gen) dsj = P(sj < th | gen) ......(1)
False Acceptance Rate (FAR) for expert Ej:
FARj(th) = ∫_{th}^{+∞} p(sj|imp) dsj = P(sj > th | imp) ......(2)
8
Measures of performance: Basic
p(sj|gen): the score distribution of Ej for genuine users
p(sj|imp): the score distribution of Ej for impostor users
The threshold (th) changes with the requirements of the application at hand
9
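Eqs. (1) and (2) can be estimated empirically from finite score samples. A minimal sketch in Python; the score values are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical matching scores produced by one expert Ej.
gen_scores = np.array([0.7, 0.8, 0.85, 0.9, 0.95])  # genuine users
imp_scores = np.array([0.1, 0.2, 0.3, 0.55, 0.75])  # impostor users

def frr(gen_scores, th):
    """Eq. (1): fraction of genuine scores falling below the threshold."""
    return np.mean(gen_scores < th)

def far(imp_scores, th):
    """Eq. (2): fraction of impostor scores at or above the threshold."""
    return np.mean(imp_scores >= th)

th = 0.6
print(frr(gen_scores, th))  # 0.0 -> no genuine user rejected
print(far(imp_scores, th))  # 0.2 -> one impostor in five accepted
```

Raising th trades a lower FAR for a higher FRR, which is why th is set by the application's requirements.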
Measures of performance
Area under the ROC curve (AUC)
Equal error rate (EER)
The “decidability” index d’
10
Measures of performance
11
Measures of performance: AUC
Estimate AUC by the Mann-Whitney statistic:

AUC = (1/(n⁺n⁻)) Σ_{p=1}^{n⁺} Σ_{q=1}^{n⁻} I(s_{p,j}^{gen}, s_{q,j}^{imp}) ......(3)

This formulation of AUC is also called the “probability of correct pair-wise ranking”, as it computes the probability P(s^{gen} > s^{imp})
12
Measures of performance: AUC
n⁺/n⁻: number of genuine/impostor users
s_{p,j}^{gen}: the p-th genuine score produced by Ej
s_{q,j}^{imp}: the q-th impostor score produced by Ej

I(s_{p,j}^{gen}, s_{q,j}^{imp}) =
  1 if s_{p,j}^{gen} > s_{q,j}^{imp}
  0.5 if s_{p,j}^{gen} = s_{q,j}^{imp}
  0 otherwise
13
Measures of performance: AUC
Features of the AUC estimated by the WMW statistic:
Theoretically equivalent to the value obtained by integrating the ROC curve
Attains a more reliable estimate of the AUC in real cases (finite samples)
Divides all scores sij into 2 sets: {s_{p,j}^{gen}} and {s_{q,j}^{imp}}
14
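Eq. (3) can be sketched directly from its definition by enumerating all (genuine, impostor) score pairs; the scores below are hypothetical:

```python
import numpy as np

def auc_wmw(gen, imp):
    """Eq. (3): Mann-Whitney estimate of the AUC, averaging the
    indicator I over all n+ * n- (genuine, impostor) score pairs:
    1 for a correctly ranked pair, 0.5 for a tie, 0 otherwise."""
    gen = np.asarray(gen)[:, None]   # column: n+ genuine scores
    imp = np.asarray(imp)[None, :]   # row:    n- impostor scores
    wins = (gen > imp).sum() + 0.5 * (gen == imp).sum()
    return wins / (gen.size * imp.size)

print(auc_wmw([0.7, 0.8, 0.9], [0.2, 0.7, 0.4]))  # 8.5 of 9 pairs -> ~0.944
```

Because it averages over the finite set of score pairs, this estimate is well defined even when the empirical ROC curve is a coarse staircase.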
Measures of performance: EER
EER is the point of the ROC curve where FAR and FRR are equal
The lower the value of EER, the better the performance of a biometric system
15
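A minimal empirical sketch of the EER, scanning the observed scores as candidate thresholds (toy data; a real system would interpolate along the ROC curve):

```python
import numpy as np

def eer(gen, imp):
    """Find the operating point where FAR and FRR are (nearly) equal by
    sweeping candidate thresholds taken from the observed scores, and
    return the average of FAR and FRR at that point."""
    ths = np.sort(np.concatenate([gen, imp]))
    best = None
    for th in ths:
        frr = np.mean(gen < th)    # genuine users rejected
        far = np.mean(imp >= th)   # impostors accepted
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]

gen = np.array([0.7, 0.8, 0.85, 0.9])
imp = np.array([0.1, 0.3, 0.75, 0.5])
print(eer(gen, imp))  # 0.25: FAR = FRR = 0.25 at th = 0.75
```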
Measures of performance: d’
In biometrics, d’ measures the separability of the distributions of genuine and impostor scores:

d’ = |μgen − μimp| / sqrt((σ²gen + σ²imp)/2)

16
Measures of performance: d’
μgen/μimp: mean of genuine/impostor score distribution
σgen/σimp: std. deviation of genuine/impostor score distribution
The larger the d’, the better the performance of a biometric system
17
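The decidability index and its normalization D’ = log10(1 + d’) can be sketched as below; the score samples are hypothetical, and d’ uses the pooled-variance form given above:

```python
import numpy as np

def decidability(gen, imp):
    """d' = |mu_gen - mu_imp| / sqrt((sigma_gen^2 + sigma_imp^2) / 2)."""
    num = abs(np.mean(gen) - np.mean(imp))
    den = np.sqrt((np.var(gen) + np.var(imp)) / 2)
    return num / den

def normalized_d(d_prime, base=10):
    """D' = log_b(1 + d'), with b = 10 as in the experiments."""
    return np.log(1 + d_prime) / np.log(base)

# Hypothetical, well-separated score samples.
d = decidability(np.array([0.8, 0.9, 1.0]), np.array([0.1, 0.2, 0.3]))
print(d)                 # large d' -> well-separated distributions
print(normalized_d(d))   # compressed to a scale comparable with AUC/EER
```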
MEASURES OF ENSEMBLE EFFECTIVENESS
Measures of ensemble effectiveness
4 measures for estimating the effectiveness of an ensemble of biometric experts: AUC, EER, d’, and the Score Dissimilarity (SD) index
But we must take the differences in performance among the experts into account
19
Measures of ensemble effectiveness
Generic, weighted and normalized performance measure (pm) formulation:
pmδ = μpm ∙ (1 − tanh(σpm))
For AUC: AUCδ = μAUC ∙ (1 − tanh(σAUC))
The higher the AUC average, the better the performance of an ensemble of experts
20
Measures of ensemble effectiveness
For EER: EERδ = μEER ∙ (1 − tanh(σEER))
The lower the EER average, the better the performance of an ensemble of experts
For d’, whose value can be much larger than 1, use the normalized D’ = logb(1 + d’) instead of d’, with base b = 10 chosen according to the values of d’ observed in the experiments
Thus D’δ = μD’ ∙ (1 − tanh(σD’)) is used
21
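The weighted measure pmδ = μpm ∙ (1 − tanh(σpm)) rewards a high average and penalizes spread between the experts' values; a sketch with hypothetical AUC values:

```python
import numpy as np

def weighted_measure(values):
    """pm_delta = mean * (1 - tanh(std)): a high average measure is
    rewarded, a large spread between the experts' values is penalized."""
    values = np.asarray(values, dtype=float)
    return values.mean() * (1 - np.tanh(values.std()))

# Two hypothetical expert pairs with the same average AUC:
a = weighted_measure([0.90, 0.90])   # identical experts
b = weighted_measure([0.99, 0.81])   # same mean, large spread
print(a)  # 0.9
print(b)  # < 0.9: the spread is penalized
```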
Measures of ensemble effectiveness: SD index
SD index is based on the WMW formulation of the AUC, and is designed to measure the amount of improvement in AUC of the combination of an ensemble of experts
SD index is a measure of the amount of AUC that can be “recovered” by exploiting the complementarity of the experts
22
Measures of ensemble effectiveness: SD index
Consider 2 experts E1 & E2 and all possible score pairs {{s_{p,1}^{gen}, s_{q,1}^{imp}}, {s_{p,2}^{gen}, s_{q,2}^{imp}}}; divide these pairs into 4 subsets S00, S10, S01, S11, where Suv contains the pairs ranked correctly by E1 (u = 1) and/or by E2 (v = 1)
23
Measures of ensemble effectiveness: SD index
The AUC of E1 & E2 are listed below, where card(Suv) is the cardinality of the subset Suv:

AUC1 = [card(S11) + card(S10)] / (n⁺n⁻)
AUC2 = [card(S11) + card(S01)] / (n⁺n⁻)

SD index is defined as:

SD = [card(S10) + card(S01)] / [card(S11) + card(S10) + card(S01)] ......(4)

24
Measures of ensemble effectiveness: SD index
The higher the value of SD, the higher the maximum AUC that could be obtained by combining the scores
But the actual increment of AUC depends on the combination method, and high SD values are usually related to pairs of low-performance experts
Performance-measure formulation for SD: SDδ = μSD ∙ (1 − tanh(σSD))
25
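Eq. (4) can be sketched by enumerating all (genuine, impostor) pairs for two experts scoring the same attempts; the score values below are hypothetical:

```python
def sd_index(gen1, imp1, gen2, imp2):
    """Eq. (4): SD = [card(S10) + card(S01)] /
                     [card(S11) + card(S10) + card(S01)],
    where S_uv holds the (genuine, impostor) score pairs ranked
    correctly by E1 (u = 1) and/or by E2 (v = 1)."""
    s11 = s10 = s01 = 0
    for p in range(len(gen1)):
        for q in range(len(imp1)):
            e1_ok = gen1[p] > imp1[q]   # E1 ranks this pair correctly
            e2_ok = gen2[p] > imp2[q]   # E2 ranks this pair correctly
            if e1_ok and e2_ok:
                s11 += 1
            elif e1_ok:
                s10 += 1
            elif e2_ok:
                s01 += 1
    return (s10 + s01) / (s11 + s10 + s01)

# Hypothetical scores for the same 2 genuine and 2 impostor attempts
# as seen by two experts E1 and E2:
print(sd_index([0.9, 0.6], [0.5, 0.7],
               [0.4, 0.8], [0.3, 0.2]))  # 1 pair recoverable of 4 -> 0.25
```

A pair counted in S10 or S01 is one an ideal combiner could "recover", which is exactly what SD measures.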
COMBINATION RULES
Combination Rules
Combination (fusion) in this work is at the score level, as it is the most widely used and flexible combination level
Investigate the performance of 4 combination methods: mean rule, product rule, linear combination by LDA, and DSS
LDA & DSS require a training phase to estimate the parameters needed to perform the combination
27
Combination Rules: Mean Rule
The mean rule is applied directly to the matching scores produced by the set of N experts:

S_{i,mean} = (1/N) Σ_{j=1}^{N} s_ij

28
Combination Rules: Product Rule
The product rule is applied directly to the matching scores produced by the set of N experts:

S_{i,prod} = Π_{j=1}^{N} s_ij

29
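Both rules can be sketched on a score matrix with one row per user and one column per expert. The values are toy data, and the product rule is assumed here to be the plain product of the scores:

```python
import numpy as np

# Hypothetical score matrix: rows are users, columns are the N experts.
scores = np.array([
    [0.8, 0.9, 0.7],   # user u1 (genuine-looking scores)
    [0.2, 0.4, 0.3],   # user u2 (impostor-looking scores)
])

mean_fused = scores.mean(axis=1)   # S_i,mean = (1/N) sum_j s_ij
prod_fused = scores.prod(axis=1)   # S_i,prod = prod_j s_ij

print(mean_fused)  # [0.8 0.3]
print(prod_fused)
```

Note how the product rule punishes any single low score much more harshly than the mean rule, which is one reason its behavior can differ so much in the experiments below.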
Combination Rules: Linear Combination by LDA
Linear discriminant analysis (LDA) can be used to compute the weights of a linear combination of the scores
This rule aims at a fused score with minimum within-class variation and maximum between-class variation:

S_{i,LDA} = Wᵗ S_i

30
Combination Rules: Linear Combination by LDA
Wᵗ (W): transformation vector computed using a training set
S_i: vector of the scores assigned to the user ui by all the experts
μgen/μimp: mean of the genuine/impostor score distribution
S_W: within-class scatter matrix

W = S_W⁻¹ (μgen − μimp)

31
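The weight vector W = S_W⁻¹(μgen − μimp) can be sketched as follows; the training score vectors are hypothetical:

```python
import numpy as np

def lda_weights(gen_vecs, imp_vecs):
    """W = S_W^{-1} (mu_gen - mu_imp), where S_W is the within-class
    scatter matrix pooled over the genuine and impostor classes."""
    gen_vecs, imp_vecs = np.asarray(gen_vecs), np.asarray(imp_vecs)
    mu_g, mu_i = gen_vecs.mean(axis=0), imp_vecs.mean(axis=0)
    dg, di = gen_vecs - mu_g, imp_vecs - mu_i
    s_w = dg.T @ dg + di.T @ di          # pooled within-class scatter
    return np.linalg.solve(s_w, mu_g - mu_i)

# Toy training score vectors (rows: users, columns: the N experts).
gen_train = [[0.8, 0.9], [0.7, 0.8], [0.9, 0.95]]
imp_train = [[0.2, 0.3], [0.3, 0.2], [0.25, 0.4]]
w = lda_weights(gen_train, imp_train)

# Fused score S_i,LDA = W^t S_i for a genuine-looking and an
# impostor-looking score vector:
print(np.array([0.75, 0.9]) @ w)
print(np.array([0.2, 0.3]) @ w)
```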
Combination Rules: DSS
Dynamic score selection (DSS) selects one of the scores sij available for each user ui, instead of fusing them into a new score
The ideal selector is based on knowledge of the state of nature of each user:

S_i* = max_j{s_ij} if u_i is genuine; min_j{s_ij} if u_i is an impostor ......(5)

32
Combination Rules: DSS
DSS selects the scores according to an estimate of the state of nature of each user; the algorithm is based on a quadratic discriminant classifier (QDC)
For the estimation, a vector space is built where the vector components are the scores assigned to the user by the N experts
33
Combination Rules: DSS
Train a classifier on this vector space using a training set of genuine and impostor users
Use the classifier to estimate the state of nature of each user
Given the estimated state of nature of the user, select the user’s score according to (5)
34
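The three steps above can be sketched with a minimal Gaussian quadratic classifier standing in for the QDC; the training data and the classifier details are illustrative assumptions, not the exact experimental setup:

```python
import numpy as np

class GaussianQDC:
    """Minimal quadratic discriminant classifier: one Gaussian per class,
    fitted on the score vectors of a labelled training set."""
    def fit(self, gen_vecs, imp_vecs):
        self.params = {}
        for label, vecs in (("gen", np.asarray(gen_vecs)),
                            ("imp", np.asarray(imp_vecs))):
            mu = vecs.mean(axis=0)
            cov = np.cov(vecs.T) + 1e-6 * np.eye(vecs.shape[1])  # regularized
            self.params[label] = (mu, np.linalg.inv(cov),
                                  np.log(np.linalg.det(cov)))
        return self

    def predict(self, x):
        # Pick the class with the higher Gaussian log-likelihood.
        def loglik(label):
            mu, icov, logdet = self.params[label]
            d = x - mu
            return -0.5 * (d @ icov @ d + logdet)
        return "gen" if loglik("gen") > loglik("imp") else "imp"

def dss(score_vec, clf):
    """Eq. (5): take the max of the expert scores if the user is
    estimated to be genuine, the min if estimated to be an impostor."""
    est = clf.predict(np.asarray(score_vec, dtype=float))
    return max(score_vec) if est == "gen" else min(score_vec)

gen_train = [[0.8, 0.9], [0.7, 0.8], [0.9, 0.95]]
imp_train = [[0.2, 0.3], [0.3, 0.2], [0.25, 0.4]]
clf = GaussianQDC().fit(gen_train, imp_train)
print(dss([0.85, 0.7], clf))  # estimated genuine -> max score
print(dss([0.2, 0.35], clf))  # estimated impostor -> min score
```

Unlike the fusion rules, DSS outputs one of the original expert scores unchanged; only the choice of which score depends on the classifier.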
EXPERIMENTAL RESULTS
Experimental Results: Goal
Investigate the correlation between the measures of ensemble effectiveness and the final performance achieved by the combined experts
Identify the measure that best predicts this performance
36
Experimental Results: Preparation
Score source: 41 experts and 4 DBs from the open category of the 3rd Fingerprint Verification Competition (FVC2004)
Number of scores: for each sensor and each expert, a total of 7750 scores; attempts from genuine/impostor users are 2800/4950
For LDA & DSS training, divide the scores into 4 subsets, each with 700 genuine and 1238 impostor scores
37
Experimental Results: Process
Number of expert pairs: 13,120 (41 × 40 × 2 × 4)
For each pair, compute the measures of effectiveness based on AUC, EER, d’ and the SD index
Combine the pairs using the 4 combination rules, then compute the resulting values of AUC and EER to show the performance
Use a graphical representation of the results of the experiments
38
Experimental Results: AUCδ plotted against AUC
39
Experimental Results: AUCδ plotted against AUC
40
Experimental Results: AUCδ plotted against AUC
According to the graphs, AUCδ isn’t useful, because it shows no clear relationship with the AUC of the combination rules
A high AUCδ attains a high AUC, but lower AUCδ values yield AUCs spread over a wide range
A high AUCδ corresponds to pairs of high-performance experts with similar behavior
The mean rule performs best
41
Experimental Results: AUCδ plotted against EER
42
Experimental Results: AUCδ plotted against EER
43
Experimental Results: AUCδ plotted against EER
AUCδ is uncorrelated with the EER too
For any value of AUCδ, the EER spans a wide range of values
The performance of the combination in terms of EER cannot be predicted from AUCδ
44
Experimental Results: EERδ plotted against AUC
45
Experimental Results: EERδ plotted against AUC
46
Experimental Results: EERδ plotted against AUC
The behavior is better than AUCδ’s, but there is still no clear relationship between EERδ and AUC
The mean rule has the best result too
47
Experimental Results: EERδ plotted against EER
48
Experimental Results: EERδ plotted against EER
49
Experimental Results: EERδ plotted against EER
No correlation between EERδ and EER
The graphs of AUCδ against EER and of EERδ against EER show similar results
So AUC and EER are not suitable for evaluating combinations of experts, despite being widely used for unimodal biometric systems
50
Experimental Results: D’δ plotted against AUC
51
Experimental Results: D’δ plotted against AUC
52
Experimental Results: D’δ plotted against AUC
Higher values of D'δ guarantee smaller ranges of values of the performance of the combination
D'δ has a higher and clearer correlation with the performance of the combination
The mean rule gives the best result, and the product rule the worst
53
Experimental Results: D’δ plotted against EER
54
Experimental Results: D’δ plotted against EER
55
Experimental Results: D’δ plotted against EER
D'δ has a better correlation with EER too
D'δ is much better than AUCδ and EERδ
D'δ is a good measure for evaluating the effectiveness of candidate ensembles of biometric experts
56
Experimental Results: SDδ plotted against AUC
57
Experimental Results: SDδ plotted against AUC
58
Experimental Results: SDδ plotted against AUC
SDδ does have some correlation with AUC, because SD is designed to predict the maximum improvement in AUC obtainable by combining experts, but the relationship is still not clear enough
Small values of SDδ guarantee high performance, especially for pairs of high-performance experts, because the higher the AUC of the individual experts, the smaller their complementarity
59
Experimental Results: SDδ plotted against EER
60
Experimental Results: SDδ plotted against EER
61
Experimental Results: SDδ plotted against EER
The correlation of SDδ with EER isn’t as good as with AUC
The result from the product rule is still poor
62
CONCLUSION
Conclusion
In predicting performance improvement, the product rule is the worst, the mean rule is the best, and LDA & DSS are not far from the mean rule
LDA & DSS give results similar to each other
In general, the performance of the combined experts is not highly correlated with that of the single experts
64
Conclusion
The best measure of ensemble effectiveness is D'δ, while AUCδ and EERδ aren’t good enough, and SDδ performs like AUCδ
Based on the above results, D'δ with the mean rule tops every other pair of measure and combination rule, and is the most suitable measure of ensemble effectiveness
65
THANKS FOR LISTENING! It’s Q&A time!