05/06/2005
CSIS
© M. Gibbons
On Evaluating Open Biometric Identification Systems
Spring 2005
Michael Gibbons
School of Computer Science & Information Systems
• Introduction
  – Biometric models, error rate types
  – Motivation, hypothesis, approach
• Visualizations in Pattern Classification
  – Classifier overview
• Experiments
• Conclusions

Overview
• What is a biometric?
  – A person’s biological characteristic, e.g., fingerprint, voice, iris, or hand geometry
  – A person’s behavioral characteristic, e.g., signature
• Biometric applications have started drawing a lot of attention, but there is danger…
  – Biometrics are not 100% reliable

Introduction
• Two types of biometric models:
  – Verification systems
    • A user is identified by an ID or smart card and is verified by their biometric
  – Identification systems
    • A user is identified by their biometric
• Positive vs. negative models

Biometric Models
• Closed environment
  – A system consisting of only the people in this room
• Open environment
  – A system consisting of the U.S. population

Open vs. Closed Environments
Error Rate Types

[Figure: decision-region diagrams contrasting errors in a Nearest Neighbor approach with errors in an SVM or ANN approach; regions are labeled FR (false reject), FA(1), and FA(2) (the two false accept types)]
• The frequencies at which the false accepts and false rejects occur are known as the False Accept Rate (FAR) and the False Reject Rate (FRR), respectively
• These two error rates are used to determine the two key performance measurements of a biometric system:
Security = 1 − FAR
Convenience = 1 − FRR

Convenience and Security
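The two measurements can be computed directly from error counts. A minimal sketch follows; the counts below are illustrative numbers, not results from this study:

```python
# Hypothetical counts from an evaluation run (illustrative numbers only).
false_accepts, impostor_attempts = 12, 400   # non-members wrongly accepted
false_rejects, genuine_attempts = 5, 100     # members wrongly rejected

FAR = false_accepts / impostor_attempts      # False Accept Rate
FRR = false_rejects / genuine_attempts       # False Reject Rate

security = 1 - FAR       # high security = few false accepts
convenience = 1 - FRR    # high convenience = few false rejects

print(f"FAR={FAR:.3f}  FRR={FRR:.3f}  security={security:.3f}  convenience={convenience:.3f}")
```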
• Many researchers have claimed high identification accuracies on closed systems consisting of a few hundred or a few thousand members
  – One may ask whether there really are any situations that correspond to closed worlds
  – What happens to the security of these systems as the population becomes larger and open, i.e., as non-members are added?
• This study considers the identification problem in an open environment

Motivation
• Our hypothesis is that the accuracies reported for closed systems are relevant only to those systems and may not generalize well to larger, open systems containing non-members
  – We claim that the classifier with the lowest error rate on a closed system is not necessarily the best for security

Hypothesis
• Since it is impractical to test a true population, we use a reverse approach to support the hypothesis
  – We work with a database M of m members, but assume a closed system of m′ members, where m′ < m, and train the system on the subset M′ of m′ members
  – We then have m − m′ members left to test how well the system holds up when non-members attempt to enter it
• This approach is used on two biometric databases, one consisting of writer data and the other of iris data
  – First, we take a look at visualizations of pattern classification

Approach
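The reverse approach can be sketched as follows; the subject count matches the writer database used later in the study, but the integer ID scheme and the random member/non-member split are illustrative assumptions:

```python
import random

m = 841        # total subjects in database M (writer database size from this study)
m_prime = 400  # assumed closed-system size m' < m
subjects = list(range(m))

random.seed(0)
random.shuffle(subjects)
members = subjects[:m_prime]      # subset M': the "closed" system is trained on these
non_members = subjects[m_prime:]  # remaining m - m' subjects act as intruders

print(len(members), len(non_members))  # 400 441
```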
• In this section we explore pattern classification in two dimensions in order to produce visualizations that help explain the decision boundaries of the following classifiers:
  – Nearest Neighbor (NN)
  – Artificial Neural Network (ANN)
  – Support Vector Machine (SVM)

Visualization of Pattern Classification
• To classify a test subject, NN computes the distance from the test subject d to each member dᵢ of the database, and assigns the test subject the identity of the closest member
• The distances can be computed using various metrics, such as city-block distance or Euclidean distance
• A threshold on the closest distance provides reject capability

Nearest Neighbor
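A minimal sketch of this classifier, assuming Euclidean distance and a toy 2-D gallery (the points and labels below are illustrative, not the writer or iris features):

```python
import numpy as np

def nn_classify(test_point, gallery, labels, threshold):
    """Assign the label of the closest gallery point, or reject (None)
    if even the closest point is farther away than the threshold."""
    dists = np.linalg.norm(gallery - test_point, axis=1)  # Euclidean distances
    i = int(np.argmin(dists))
    return labels[i] if dists[i] <= threshold else None

gallery = np.array([[0.0, 0.0], [10.0, 10.0]])
labels = ["alice", "bob"]
print(nn_classify(np.array([0.5, 0.5]), gallery, labels, threshold=2.0))  # alice
print(nn_classify(np.array([5.0, 5.0]), gallery, labels, threshold=2.0))  # None (rejected)
```

Without the threshold, every test point is forced onto some member, which is exactly what makes a pure NN identification system insecure against non-members.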
• We chose a 1-vs-all implementation of the ANN to provide reject capability
• The 1-vs-all approach becomes a series of dichotomy problems:
  – class 1 vs. classes 2 and 3,
  – class 2 vs. classes 1 and 3,
  – …
  – class x vs. all classes except class x
• Any point that falls into more than one of the dichotomy decision regions is rejected

ANN (1-vs-all)
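The reject rule can be sketched independently of the underlying network; the 0.5 threshold and the confidence scores below are illustrative, and rejecting points claimed by no dichotomizer is an added convention that this deck does not state explicitly:

```python
def one_vs_all_decision(scores, threshold=0.5):
    """scores[k] is the k-th dichotomizer's confidence that the point
    belongs to class k. Accept only if exactly one dichotomizer claims
    the point; points in zero or several decision regions are rejected."""
    claimed = [k for k, s in enumerate(scores) if s >= threshold]
    return claimed[0] if len(claimed) == 1 else None

print(one_vs_all_decision([0.9, 0.2, 0.1]))  # 0 (only class 0 claims it)
print(one_vs_all_decision([0.8, 0.7, 0.1]))  # None (overlapping regions)
print(one_vs_all_decision([0.1, 0.2, 0.3]))  # None (no region)
```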
• The SVM is a pattern classification technique that has gained a lot of attention in recent years
• The basic idea is to maximize the margin between the classes of data points
  – Maximizing the margin provides better generalization than many other pattern classifiers (for example, neural networks)
  – The points that lie on the hyperplanes bounding the margin are called the support vectors
• What if the data are not separable?
  – The power of the SVM lies in mapping functions that transform the feature space into a higher dimension; these mapping functions are called kernels

Support Vector Machines (1-vs-all)
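A sketch using scikit-learn's SVC with an RBF kernel in one-vs-rest mode; the three-member toy data below are illustrative stand-ins, not the datasets used in this study:

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D data for three members: 30 noisy points around each center.
rng = np.random.default_rng(0)
centers = np.array([[20.0, 20.0], [50.0, 80.0], [80.0, 30.0]])
X = np.vstack([c + rng.normal(0, 5, size=(30, 2)) for c in centers])
y = np.repeat([0, 1, 2], 30)

# The RBF kernel implicitly maps the features into a higher-dimensional
# space, allowing a margin even when the data are not linearly separable.
clf = SVC(kernel="rbf", gamma="scale", decision_function_shape="ovr")
clf.fit(X, y)
print(clf.predict([[22.0, 18.0], [48.0, 79.0]]))  # one point near each of classes 0 and 1
```

Note that a stock SVC always outputs some class; a reject option (as discussed for open systems) requires an additional threshold on the decision values.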
• To produce the separated data, we choose three center points and randomly generate 200 points within radius r of the center points
• The test points consist of all points (i, j) where i = 1…100 and j = 1…100
• We then examine how the classifiers above classify the test points after training on the separated data

Simple 3-Member Dataset
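The dataset construction can be sketched as follows, assuming 200 points per center (the deck's wording leaves open whether 200 is per center or in total) and hypothetical center coordinates and radius:

```python
import numpy as np

rng = np.random.default_rng(42)
centers = np.array([[25.0, 25.0], [50.0, 75.0], [75.0, 30.0]])  # hypothetical centers
r = 8.0                                                          # hypothetical radius

def sample_disk(center, n, r, rng):
    """Uniformly sample n points inside the disk of radius r around center."""
    theta = rng.uniform(0, 2 * np.pi, n)
    rho = r * np.sqrt(rng.uniform(0, 1, n))  # sqrt gives uniform area density
    return center + np.column_stack([rho * np.cos(theta), rho * np.sin(theta)])

train = np.vstack([sample_disk(c, 200, r, rng) for c in centers])

# Test grid: all integer points (i, j) with i and j ranging over 1..100.
ii, jj = np.meshgrid(np.arange(1, 101), np.arange(1, 101))
test = np.column_stack([ii.ravel(), jj.ravel()])
print(train.shape, test.shape)  # (600, 2) (10000, 2)
```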
Nearest Neighbor

[Figure: NN decision regions for the training data with no threshold, threshold = 8, and threshold = 2]
• Our hypothesis is that biometric identification on closed systems does not generalize well to larger, open systems containing non-members
• In the previous section we saw visualizations of the various classifiers on a simple 2-dimensional dataset
  – We now investigate the hypothesis further by conducting experiments on subset databases M′ drawn from both the writer and iris databases

Experiments
• Two biometric databases are used to support our claims in this study: the writer and iris biometric databases
Biometric Databases
• For each of the databases, training sets were created
  – Training sets for the writer data consisted of 50, 100, 200, and 400 members
  – Training sets for the iris data consisted of 5, 15, 25, and 35 members
  – These sets included all instances per member, i.e., 3 per member for writer and 10 per member for iris
• For each training set we created combined evaluation sets consisting of the trained members plus an increasing number of non-members
  – The evaluation sets for the 50-writer trained SVM consisted of 50, 100, 200, 400, 700, and 841 subjects, where the first 50 subjects are the members and the remaining subjects are non-members
  – Similarly, the evaluation sets for the 25-iris trained SVM consisted of 25, 35, 45, and 52 subjects, where the first 25 subjects are the members and the remaining subjects are non-members

Experiment Setup
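The nested evaluation sets for the 50-writer case can be sketched as follows; the integer subject IDs are stand-ins for the actual writer identities:

```python
# Hypothetical subject IDs; the set sizes follow the 50-writer setup above.
members = list(range(50))        # the 50 trained members come first
all_subjects = list(range(841))  # full writer database

eval_sizes = [50, 100, 200, 400, 700, 841]
evaluation_sets = {n: all_subjects[:n] for n in eval_sizes}

for n in eval_sizes:
    subjects = evaluation_sets[n]
    # Every evaluation set starts with the members, then adds non-members.
    assert subjects[:50] == members
    print(f"{n} subjects: {n - 50} non-members")
```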
SVM Results

[Figures: security curves for the SVM on the writer data and for the SVM on the iris data]
• As hypothesized, for each curve, the security monotonically decreases as the number of non-members increases
  – It might also be noted that the final rates to which the security curves decrease appear to converge
  – To ensure that this is not an artifact of the particular handwriting data used, we obtained similar experimental results using multiple classifiers on both the writer and iris data

Observations
• We now present a comparison of the results from the two classifiers used in this experiment
  – The next figures illustrate the security performance for 100 members of the writer database and 15 members of the iris database
• Notice that, although Nearest Neighbor does not perform as well in the closed environment, it eventually meets and surpasses the performance of the SVM as non-members enter the system

Classifier Comparison
Classifier Comparison

[Figures: SVM vs. NN on the writer data with 100 members; SVM vs. NN vs. ANN on the iris data with 15 members]
• Based on the security results in the previous figures, we recognize that the curves appear to be of exponential form and that we might be able to extrapolate the security of a system for large populations containing non-members
  – After some fitting trials, we find the most similar curve to be:

Security Convergence
y = a·e^(((x − b)/c)^(1/2)) + d
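A fit of this kind can be sketched with SciPy's curve_fit; the simple decaying-exponential form with asymptote d used below, and all of the numbers, are illustrative stand-ins rather than the study's actual data or fitted parameters:

```python
import numpy as np
from scipy.optimize import curve_fit

# A decaying exponential that levels off at the asymptote d, standing in
# for the fitted form on the slide (synthetic data, not the writer/iris results).
def model(x, a, c, d):
    return a * np.exp(-x / c) + d

x = np.array([50, 100, 200, 400, 700, 841], dtype=float)  # population sizes
y = model(x, a=0.35, c=200.0, d=0.55)                     # synthetic security values

(a_hat, c_hat, d_hat), _ = curve_fit(model, x, y, p0=[0.5, 100.0, 0.5])
print(f"extrapolated asymptotic security d ≈ {d_hat:.3f}")
```

The fitted asymptote d is the quantity of interest here: it estimates the security level the system converges to as the open population grows without bound.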
• System security (1 − FAR) decreases rapidly for closed systems when they are tested in open-system mode
  – Thus, the high accuracy rates often obtained for closed biometric identification problems do not appear to generalize well to the open-system problem
• We also found that, although systems can be trained for greater closed-system security using SVM rather than NN classifiers, NN systems generalize better to open systems
• An estimate of the expected error was also projected, based on the asymptote of an exponential curve fitted to the data

Conclusions