05/06/2005
CSIS
© M. Gibbons
On Evaluating Open Biometric Identification Systems
Spring 2005
Michael Gibbons
School of Computer Science & Information Systems
• Introduction
  – Biometric models, error rate types
  – Motivation, hypothesis, approach
• Visualizations in Pattern Classification
  – Classifier overview
• Experiments
• Conclusions

Overview
• What is a biometric?
  – A person’s biological characteristic, e.g., fingerprint, voice, iris, or hand geometry
  – A person’s behavioral characteristic, e.g., signature
• Biometric applications have started drawing a lot of attention, but there is danger…
  – Biometrics are not 100% reliable

Introduction
• Two types of biometric models:
  – Verification systems
    • A user is identified by an ID or smart card and is verified by their biometric
  – Identification systems
    • A user is identified by their biometric
• Positive vs. negative models

Biometric Models
• Closed environment
  – A system consisting of only the people in this room
• Open environment
  – A system consisting of the U.S. population

Open vs. Closed Environments
Error Rate Types

[Figure: decision-region diagrams contrasting errors in a Nearest Neighbor approach with errors in an SVM or ANN approach; regions are labeled FR (false reject), FA(1), and FA(2) (the two false accept types)]
• The frequencies at which the false accepts and false rejects occur are known as the False Accept Rate (FAR) and the False Reject Rate (FRR), respectively
• These two error rates are used to determine the two key performance measurements of a biometric system:
Security = 1 − FAR
Convenience = 1 − FRR

Convenience and Security
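The two measurements can be computed directly from error counts. A minimal sketch follows; the counts below are illustrative numbers, not results from this study:

```python
# Hypothetical counts from an evaluation run (illustrative numbers only).
false_accepts, impostor_attempts = 12, 400   # non-members wrongly accepted
false_rejects, genuine_attempts = 5, 100     # members wrongly rejected

FAR = false_accepts / impostor_attempts      # False Accept Rate
FRR = false_rejects / genuine_attempts       # False Reject Rate

security = 1 - FAR       # high security = few false accepts
convenience = 1 - FRR    # high convenience = few false rejects

print(f"FAR={FAR:.3f}  FRR={FRR:.3f}  security={security:.3f}  convenience={convenience:.3f}")
```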
• Many researchers have claimed high identification accuracies on closed systems consisting of a few hundred or a few thousand members
  – One may ask whether there really are any situations that correspond to closed worlds
  – What happens to the security of these systems as the population becomes larger and open, i.e., as non-members are added?
• This study considers the identification problem in an open environment

Motivation
• Our hypothesis is that the accuracies reported for closed systems are relevant only to those systems and may not generalize well to larger, open systems containing non-members
  – We claim that the classifier with the lowest error rate on a closed system is not necessarily the best for security

Hypothesis
• Since it is impractical to test a true population, we use a reverse approach to support the hypothesis
  – We work with a database M of m members, but assume a closed system of m′ members, where m′ < m, and train the system on the subset M′ of m′ members
  – We then have m − m′ members left to test how well the system holds up when non-members attempt to enter it
• This approach is used on two biometric databases, one consisting of writer data and the other of iris data
  – First, we take a look at visualizations of pattern classification

Approach
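The reverse approach can be sketched as follows; the subject count matches the writer database used later in the study, but the integer ID scheme and the random member/non-member split are illustrative assumptions:

```python
import random

m = 841        # total subjects in database M (writer database size from this study)
m_prime = 400  # assumed closed-system size m' < m
subjects = list(range(m))

random.seed(0)
random.shuffle(subjects)
members = subjects[:m_prime]      # subset M': the "closed" system is trained on these
non_members = subjects[m_prime:]  # remaining m - m' subjects act as intruders

print(len(members), len(non_members))  # 400 441
```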
• In this section we explore pattern classification in two dimensions in order to produce visualizations that help explain the decision boundaries of the following classifiers:
  – Nearest Neighbor (NN)
  – Artificial Neural Network (ANN)
  – Support Vector Machine (SVM)

Visualization of Pattern Classification
• To classify a test subject, NN computes the distance from the test subject d to each member dᵢ of the database, and assigns the test subject the identity of the closest member
• The distances can be computed using various metrics, such as city-block distance or Euclidean distance
• A threshold on the closest distance provides reject capability

Nearest Neighbor
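A minimal sketch of this classifier, assuming Euclidean distance and a toy 2-D gallery (the points and labels below are illustrative, not the writer or iris features):

```python
import numpy as np

def nn_classify(test_point, gallery, labels, threshold):
    """Assign the label of the closest gallery point, or reject (None)
    if even the closest point is farther away than the threshold."""
    dists = np.linalg.norm(gallery - test_point, axis=1)  # Euclidean distances
    i = int(np.argmin(dists))
    return labels[i] if dists[i] <= threshold else None

gallery = np.array([[0.0, 0.0], [10.0, 10.0]])
labels = ["alice", "bob"]
print(nn_classify(np.array([0.5, 0.5]), gallery, labels, threshold=2.0))  # alice
print(nn_classify(np.array([5.0, 5.0]), gallery, labels, threshold=2.0))  # None (rejected)
```

Without the threshold, every test point is forced onto some member, which is exactly what makes a pure NN identification system insecure against non-members.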
• We chose a 1-vs-all implementation of the ANN to provide reject capability
• The 1-vs-all approach becomes a series of dichotomy problems:
  – class 1 vs. classes 2 and 3,
  – class 2 vs. classes 1 and 3,
  – …
  – class x vs. all classes except class x
• Any point that falls into more than one of the dichotomy decision regions is rejected

ANN (1-vs-all)
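The reject rule can be sketched independently of the underlying network; the 0.5 threshold and the confidence scores below are illustrative, and rejecting points claimed by no dichotomizer is an added convention that this deck does not state explicitly:

```python
def one_vs_all_decision(scores, threshold=0.5):
    """scores[k] is the k-th dichotomizer's confidence that the point
    belongs to class k. Accept only if exactly one dichotomizer claims
    the point; points in zero or several decision regions are rejected."""
    claimed = [k for k, s in enumerate(scores) if s >= threshold]
    return claimed[0] if len(claimed) == 1 else None

print(one_vs_all_decision([0.9, 0.2, 0.1]))  # 0 (only class 0 claims it)
print(one_vs_all_decision([0.8, 0.7, 0.1]))  # None (overlapping regions)
print(one_vs_all_decision([0.1, 0.2, 0.3]))  # None (no region)
```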
• The SVM is a pattern classification technique that has gained a lot of attention in recent years
• The basic idea is to maximize the margin between the classes of data points
  – Maximizing the margin provides better generalization than many other pattern classifiers (for example, neural networks)
  – The points that lie on the hyperplanes bounding the margin are called the support vectors
• What if the data are not separable?
  – The power of the SVM lies in mapping functions that transform the feature space into a higher dimension; these mapping functions are called kernels

Support Vector Machines (1-vs-all)
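A sketch using scikit-learn's SVC with an RBF kernel in one-vs-rest mode; the three-member toy data below are illustrative stand-ins, not the datasets used in this study:

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D data for three members: 30 noisy points around each center.
rng = np.random.default_rng(0)
centers = np.array([[20.0, 20.0], [50.0, 80.0], [80.0, 30.0]])
X = np.vstack([c + rng.normal(0, 5, size=(30, 2)) for c in centers])
y = np.repeat([0, 1, 2], 30)

# The RBF kernel implicitly maps the features into a higher-dimensional
# space, allowing a margin even when the data are not linearly separable.
clf = SVC(kernel="rbf", gamma="scale", decision_function_shape="ovr")
clf.fit(X, y)
print(clf.predict([[22.0, 18.0], [48.0, 79.0]]))  # one point near each of classes 0 and 1
```

Note that a stock SVC always outputs some class; a reject option (as discussed for open systems) requires an additional threshold on the decision values.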
• To produce the separated data, we choose three center points and randomly generate 200 points within radius r of the center points
• The test points consist of all points (i, j) where i = 1…100 and j = 1…100
• We then examine how the classifiers above classify the test points after training on the separated data

Simple 3-Member Dataset
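The dataset construction can be sketched as follows, assuming 200 points per center (the deck's wording leaves open whether 200 is per center or in total) and hypothetical center coordinates and radius:

```python
import numpy as np

rng = np.random.default_rng(42)
centers = np.array([[25.0, 25.0], [50.0, 75.0], [75.0, 30.0]])  # hypothetical centers
r = 8.0                                                          # hypothetical radius

def sample_disk(center, n, r, rng):
    """Uniformly sample n points inside the disk of radius r around center."""
    theta = rng.uniform(0, 2 * np.pi, n)
    rho = r * np.sqrt(rng.uniform(0, 1, n))  # sqrt gives uniform area density
    return center + np.column_stack([rho * np.cos(theta), rho * np.sin(theta)])

train = np.vstack([sample_disk(c, 200, r, rng) for c in centers])

# Test grid: all integer points (i, j) with i and j ranging over 1..100.
ii, jj = np.meshgrid(np.arange(1, 101), np.arange(1, 101))
test = np.column_stack([ii.ravel(), jj.ravel()])
print(train.shape, test.shape)  # (600, 2) (10000, 2)
```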
Nearest Neighbor

[Figure: NN decision regions for the training data with no threshold, threshold = 8, and threshold = 2]
• Our hypothesis is that biometric identification on closed systems does not generalize well to larger, open systems containing non-members
• In the previous section we saw visualizations of the various classifiers on a simple 2-dimensional dataset
  – We now investigate the hypothesis further by conducting experiments on subset databases M′ drawn from both the writer and iris databases

Experiments
• Two biometric databases are used to support our claims in this study: the writer and iris biometric databases
Biometric Databases
• For each of the databases, training sets were created
  – Training sets for the writer data consisted of 50, 100, 200, and 400 members
  – Training sets for the iris data consisted of 5, 15, 25, and 35 members
  – These sets included all instances per member, i.e., 3 per member for writer and 10 per member for iris
• For each training set we created combined evaluation sets consisting of the trained members plus an increasing number of non-members
  – The evaluation sets for the 50-writer trained SVM consisted of 50, 100, 200, 400, 700, and 841 subjects, where the first 50 subjects are the members and the remaining subjects are non-members
  – Similarly, the evaluation sets for the 25-iris trained SVM consisted of 25, 35, 45, and 52 subjects, where the first 25 subjects are the members and the remaining subjects are non-members

Experiment Setup
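The nested evaluation sets for the 50-writer case can be sketched as follows; the integer subject IDs are stand-ins for the actual writer identities:

```python
# Hypothetical subject IDs; the set sizes follow the 50-writer setup above.
members = list(range(50))        # the 50 trained members come first
all_subjects = list(range(841))  # full writer database

eval_sizes = [50, 100, 200, 400, 700, 841]
evaluation_sets = {n: all_subjects[:n] for n in eval_sizes}

for n in eval_sizes:
    subjects = evaluation_sets[n]
    # Every evaluation set starts with the members, then adds non-members.
    assert subjects[:50] == members
    print(f"{n} subjects: {n - 50} non-members")
```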
SVM Results

[Figures: security curves for the SVM on the writer data and for the SVM on the iris data]
• As hypothesized, for each curve, the security monotonically decreases as the number of non-members increases
  – It might also be noted that the final rates to which the security curves decrease appear to converge
  – To ensure that this is not an artifact of the particular handwriting data used, we obtained similar experimental results using multiple classifiers on both the writer and iris data

Observations
• We now present a comparison of the results from the two classifiers used in this experiment
  – The next figures illustrate the security performance for 100 members of the writer database and 15 members of the iris database
• Notice that, although Nearest Neighbor does not perform as well in the closed environment, it eventually meets and surpasses the performance of the SVM as non-members enter the system

Classifier Comparison
Classifier Comparison

[Figures: SVM vs. NN on the writer data with 100 members; SVM vs. NN vs. ANN on the iris data with 15 members]
• Based on the security results in the previous figures, we recognize that the curves appear to be of exponential form and that we might be able to extrapolate the security of a system for large populations containing non-members
  – After some fitting trials, we find the most similar curve to be:

Security Convergence
y = a·e^(((x − b)/c)^(1/2)) + d
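A fit of this kind can be sketched with SciPy's curve_fit; the simple decaying-exponential form with asymptote d used below, and all of the numbers, are illustrative stand-ins rather than the study's actual data or fitted parameters:

```python
import numpy as np
from scipy.optimize import curve_fit

# A decaying exponential that levels off at the asymptote d, standing in
# for the fitted form on the slide (synthetic data, not the writer/iris results).
def model(x, a, c, d):
    return a * np.exp(-x / c) + d

x = np.array([50, 100, 200, 400, 700, 841], dtype=float)  # population sizes
y = model(x, a=0.35, c=200.0, d=0.55)                     # synthetic security values

(a_hat, c_hat, d_hat), _ = curve_fit(model, x, y, p0=[0.5, 100.0, 0.5])
print(f"extrapolated asymptotic security d ≈ {d_hat:.3f}")
```

The fitted asymptote d is the quantity of interest here: it estimates the security level the system converges to as the open population grows without bound.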
• System security (1 − FAR) decreases rapidly for closed systems when they are tested in open-system mode
  – Thus, the high accuracy rates often obtained for closed biometric identification problems do not appear to generalize well to the open-system problem
• We also found that, although systems can be trained for greater closed-system security using SVM rather than NN classifiers, NN systems generalize better to open systems
• An estimate of the expected error was also projected, based on the asymptote of an exponential curve fitted to the data

Conclusions