
Integrating Face Recognition into Security Systems

Volker Vetter¹, Thomas Zielke¹, and Werner von Seelen²

¹ C-VIS Computer Vision und Automation GmbH, D-44799 Bochum, Germany

² Institut für Neuroinformatik, Ruhr-Universität, D-44780 Bochum, [email protected]

Abstract. Automated processing of facial images has become a serious market for both hardware and software products. For the commercial success of face recognition systems it is crucial that the process of face image capturing is very convenient for the people exposed to such systems. As a consequence, the whole imaging setup has to be carefully designed for each operational scenario. This paper mainly deals with the problem of face image capturing in real security applications. In this context, we use the term face recognition in a broader sense, including the important functionality of face spotting followed by face validation. The latter provides the front end of video-based automated person authentication. We describe two examples of a successful integration of face recognition into a security system: one is a system for recognizing authorization to use a vehicle, and the other is an automatic "assistant" for a security desk officer. Both applications require techniques for face detection and face segmentation. Key problems are the camera setup and the size of the surveillance area. We propose an approach using a tracking camera with a high-speed pan-tilt unit.

1 Introduction

Convenience of operation is a primary prerequisite for the acceptance of a face recognition system as an access control device. The required user cooperation should be kept to a minimum, i.e. it should be limited to looking into a camera for a short period of time. The camera mounting position should therefore be in view of a person requesting access. A good camera position will not force the user to behave in an unnatural way. The observed area should be adapted to the environment. Using a fixed camera often leads to a small observation area because of the image resolution required for recognition.

When integrating a face recognition system into a real-world application, the issues of face detection and face segmentation become as important as the recognition process itself. In Section 3 we present our approach to face candidate segmentation. It combines low-level, model-based face candidate generation with a neural-net-based final check of the candidate regions. The methods were chosen under the conditions of an unconstrained environment and

Fig. 1. Access control scenarios: (a) driver's face monitoring, (b) PIN/video input console, (c) service/reception counter surveillance

real-time operation on limited hardware resources. The use of different image features supporting each other should improve robustness. Section 4 presents a face recognition system integrated into a car for driver access control. In Section 5 a surveillance application at a security counter is described.

2 Integrating a Face Recognition System for Hands-Free Authorization Checks

For user comfort, security systems have to avoid as much user interaction as possible. Fig. 1 shows different access control scenarios.

Inside a car the environment is quite constrained due to the fixed position of the driver's seat (Fig. 1 (a)). This restricts the region where the driver's head can be found. The driver usually has to cooperate when unlocking the system. During the trip he will be looking straight out of the front window, looking into the mirrors, and monitoring the panel instruments. A camera position in the vicinity of one of these positions guarantees frontal face images during vehicle operation. However, ideal moments for facial snapshots have to be selected by the recognition system.

A PIN/video console is found as part of some existing security systems at a protected entrance or an automated teller machine (Fig. 1 (b)). These systems require a high degree of user interaction: at least a key card and/or a PIN code has to be entered. During these actions the user position is approximately predefined, since the user has to look at the device. It seems reasonable to use that moment for face image acquisition. Such a setup usually tries to control the image capturing process by providing visual user feedback through a semi-transparent mirror. In the case of a simple image capturing device there is no other way to obtain high-quality facial images.

In Fig. 1 (c) a surveillance situation at a clearance counter is depicted. To avoid unwanted attention, a face recognition system has to operate autonomously, without requiring any interaction from the monitored persons, in this

Fig. 2. Face candidate segmentation: (a) scene from the driver access control system, (b) Hough space after transformation with a multi-scale facial template, (c) difference image from an edge-filtered sequence, (d) finally segmented face.

operational environment. Usually the monitored person will look at the security officer during service. This situation is difficult to cope with: the operating environment is quite unconstrained, and the position variance of possible face locations is high. It is the task of the face recognition system to choose the right moment for face image capturing. In general, a fixed camera will not be able to obtain images of sufficient resolution while covering the complete surveillance area. We propose a tracking camera to overcome this difficulty.

3 Face Candidate Segmentation

Most face recognition algorithms show best performance when dealing with centered, frontal face images, normalized in size and illumination, with a uniform background [1]. The task of face candidate segmentation is to process real-world video images with respect to these conditions.

Starting with image acquisition (see Fig. 2 (a)), the face candidate segmentation first has to decide whether a face is present in the digitized image or not. We solve this with three low-level processes looking in parallel for form, motion, and color properties of faces. Of course, none of these properties guarantees finding a face under all circumstances, but they are unlikely to fail all together. From the low-level processes, candidate regions are assigned and ranked; overlapping regions are merged if the overlap is more than 50% (see the sketch below). The 2D position and size measurements are tracked by a steady-state Kalman filter.
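A minimal Python sketch of this merge-and-rank step, assuming each cue delivers scored bounding boxes; only the 50% overlap criterion comes from the text, while the greedy merge order and the additive scoring are our illustrative choices:

```python
# Merge overlapping face-candidate regions from the form, motion, and
# color cues. Boxes are (x, y, w, h) tuples with a confidence score.

def overlap_ratio(a, b):
    """Intersection area divided by the smaller box's area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    return (ix * iy) / min(aw * ah, bw * bh)

def merge_candidates(boxes, scores, thresh=0.5):
    """Greedily merge boxes whose mutual overlap exceeds `thresh`;
    merged boxes accumulate their cue scores (our ranking choice)."""
    merged = []
    for box, score in sorted(zip(boxes, scores), key=lambda p: -p[1]):
        for i, (mbox, mscore) in enumerate(merged):
            if overlap_ratio(box, mbox) > thresh:
                merged[i] = (mbox, mscore + score)  # cues reinforce each other
                break
        else:
            merged.append((box, score))
    return sorted(merged, key=lambda p: -p[1])  # ranked candidate list
```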

High-ranking regions that can be tracked over some time are checked by a neural net for the presence of a face. The segmented window with the highest neural net score, which must exceed a given threshold, is considered to contain the current face in front of the camera. This candidate window (Fig. 2 (d)) is passed to the face recognition module.

3.1 Model-Based Face Candidate Segmentation

The basic idea of model-based face candidate segmentation is to convolve predefined models with images containing faces. At matching face candidate positions the convolution result differs from that in the neighborhood. We limit our focus to algorithms that are able to work on non-uniform backgrounds. This group of algorithms often uses multi-scale template matching approaches [7, 13], sometimes supported by neural networks [10], and Hough-based techniques [6]. We have chosen a Hough-based technique because of its inherent property of statistical template matching at low computational cost. Although a multidimensional template is used, the search space has only dimension two.

The acquired image is first appropriately scaled in size. Then a generalized Hough transformation with a facial template is performed (see Fig. 2 (b)). Local maxima in the Hough space are identified and a first candidate list is created. Since maxima in Hough space are ambiguous, a back-transformation is required to verify candidate positions.
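To make the voting scheme concrete, the sketch below implements Hough-style voting with an elliptical head-outline template: every edge pixel votes for the ellipse centers that could explain it, and peaks in the two-dimensional accumulator are face-candidate positions. The template radii, the number of sampled contour angles, and the peak selection are illustrative assumptions, not the authors' parameters:

```python
import numpy as np

def hough_face_candidates(edges, a=20, b=26, n_angles=64, top_k=5):
    """edges: binary edge image; (a, b): ellipse half-axes in pixels."""
    h, w = edges.shape
    acc = np.zeros((h, w), dtype=np.int32)
    ys, xs = np.nonzero(edges)
    for t in np.linspace(0.0, 2 * np.pi, n_angles, endpoint=False):
        # Each edge pixel votes for the center of an ellipse it could lie on.
        cy = np.round(ys - b * np.sin(t)).astype(int)
        cx = np.round(xs - a * np.cos(t)).astype(int)
        ok = (cy >= 0) & (cy < h) & (cx >= 0) & (cx < w)
        np.add.at(acc, (cy[ok], cx[ok]), 1)
    # Take the strongest accumulator cells as first candidates; a real
    # system would extract local maxima and verify by back-transformation.
    flat = np.argsort(acc.ravel())[::-1][:top_k]
    return [np.unravel_index(i, acc.shape) for i in flat], acc
```

Note that the accumulator remains two-dimensional even though the template has several contour points, which is exactly the low-cost property the text attributes to the Hough approach.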

A critical issue in the Hough transformation is, of course, template scale variance. Candidate positions are found most reliably if the face's size variance in the image is small, so that the template size variance may also be kept small. Small here means about 50% of a mean face size. Size estimates within this range are possible, for example, if persons are monitored at an approximately known distance.

Our model-based face candidate segmentation will fail in cases of very poor face-background contrast and complex backgrounds containing shapes similar in form and size to our face model.

3.2 Color-Based Face Candidate Segmentation

As a second face estimate we use color segmentation. If available, skin color is a strong feature for the presence of a face [2, 3]. Considering varying skin colors does not lead to an unacceptable amount of misclassifications on complex backgrounds.

Our algorithm is derived from the chroma keying technique. Each pixel of a color image in YCrCb space is classified as flesh-tone colored or not. A split-and-merge algorithm identifies connected regions, and again a candidate list is created. The quality is derived from the color match and the shape of the region.

In the chroma keying technique, the Cr and Cb components of the video signal are used to specify a key color range. The Y component gives an additional limit. The color ranges permissible for faces have been empirically derived from sample images.
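The following sketch illustrates this classification step. The YCrCb threshold ranges are placeholder values (the paper derives its ranges empirically from sample images), and connected-component labeling stands in for the split-and-merge algorithm mentioned above:

```python
import numpy as np
from scipy.ndimage import label

def skin_mask(ycrcb, cr_range=(135, 175), cb_range=(80, 125), y_min=40):
    """Classify each pixel as flesh tone by a box in the Cr-Cb plane,
    with the Y component as an additional lower limit."""
    y, cr, cb = ycrcb[..., 0], ycrcb[..., 1], ycrcb[..., 2]
    return ((cr >= cr_range[0]) & (cr <= cr_range[1]) &
            (cb >= cb_range[0]) & (cb <= cb_range[1]) & (y >= y_min))

def skin_regions(ycrcb, min_area=200):
    """Group flesh-tone pixels into candidate regions (bounding boxes)."""
    labels, n = label(skin_mask(ycrcb))
    boxes = []
    for i in range(1, n + 1):
        ys, xs = np.nonzero(labels == i)
        if len(ys) >= min_area:  # reject small speckle regions
            boxes.append((xs.min(), ys.min(), xs.max(), ys.max()))
    return boxes
```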

Color-based face spotting works best with high-quality 3-chip CCD cameras. Some low-cost color cameras place flesh tone colors close to the origin of the Cr-Cb plane. By carefully selecting the camera type and the video digitizer, sufficiently good results have been achieved with certain low-cost multimedia components. Color-based face segmentation is best used as support for other segmentation techniques.

3.3 Motion-Based Face Candidate Segmentation

Although not very reliable on its own, motion is a good cue for figure-ground segmentation. Where a complex background may mislead form- and color-based

Fig. 3. (a) Driver requesting access, (b) dashboard with user display, and (c) IR spot, camera, and user console

face segmentation, motion is a good indicator of "living" persons. Motion detection requires image sequence processing. In a real-time system this limits the computational complexity of the algorithms that can be considered for implementation. Motion may be detected by optical flow computation, or by evaluating image differences between subsequent images or between a current image and an accumulated background image. All methods are subject to distortions by illumination changes and non-stationary background.

Our method is similar to that in [9]. We use differences of subsequent edge-filtered images (see Fig. 2 (c)), which are evaluated by a lateral histogram technique. The results are stabilized by considering only regions that can be tracked over time. The computational cost of the process is low, so it is also used to control the regions of interest for the form- and color-based segmentation modules.
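A minimal sketch of this motion cue following the stated pipeline; the Sobel operator stands in for the unspecified edge filter, and the threshold fraction is an assumed parameter:

```python
import numpy as np
from scipy.ndimage import sobel

def motion_box(prev_gray, curr_gray, frac=0.25):
    """Difference two edge-filtered frames and localize the moving region
    via lateral histograms (projections of the difference onto both axes)."""
    e0 = np.hypot(sobel(prev_gray, 0), sobel(prev_gray, 1))
    e1 = np.hypot(sobel(curr_gray, 0), sobel(curr_gray, 1))
    diff = np.abs(e1 - e0)
    col_hist = diff.sum(axis=0)  # lateral histogram along x
    row_hist = diff.sum(axis=1)  # lateral histogram along y

    def span(hist):
        active = np.nonzero(hist > frac * hist.max())[0]
        return (active[0], active[-1]) if active.size else None

    return span(col_hist), span(row_hist)  # (x-range, y-range) of motion
```

The projections keep the per-frame cost low, which is why the same result can also steer the regions of interest for the form- and color-based modules.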

4 Driver Access Control

Due to an increasing rate of car thefts, electronic motor locks have become a standard feature of new cars. The weakness of these systems is the transferable key or chip that is required to start the engine. In order to assure the highest security even in case of attempted "carnapping", biometric identification can be used instead of, or in addition to, a conventional authorization check. Face recognition seems to be the only method that allows "authorization monitoring", i.e. repeated identification even during driving.

4.1 System Outline

We have combined our face recognition system with an electronic motor lock and integrated the complete system into a test car [12]. A driver requesting access has to look for a short moment straight at a camera in the dashboard.

The identification process is automatically started when a person is detected in the camera's surveillance area. On-line recognition is performed on the basis of a pretrained driver database. The training process can only be started with a chip card as a special ID. It is possible to enhance the data set of persons already known to the system. Thereby the acceptance level of identification can be increased, and individual variation in appearance (like drastic changes of facial hair or make-up) can be dealt with. The car is equipped with a graphical cockpit display. It is used for the required dialogs in training mode. The interactive dialogs use an easy-to-use two-button interface for all settings.

4.2 Car Integration

Fig. 3 shows the installation of the face recognition system in the test car. Due to the constrained environment inside the car, monochrome form- and motion-based face segmentation is sufficient. This allows the use of a monochrome camera combined with an infrared spot. In this way, operation under difficult lighting conditions is possible without dazzling the driver. Direct sunlight exposure of the driver's face is still a problem.

Some effort was required to find a good position for the camera and the IR spot. Drivers of different sizes in multiple seat settings need to be covered. In addition, the window regions seen by the camera should be as small as possible to avoid sunlight interference. Finally, some free space at the mounting position should facilitate a clean integration without obstructing the driver's field of view. We chose a position behind the gearshift, which allows an unobstructed view of the driver's face at the required resolution. The covered window area in the image is rather small.

The image processing hardware consists of two digital signal processors (DSPs) of type TMS320C40 on a TIM-40 motherboard. One DSP module contains an on-board video digitizer and a video output connected to the cockpit display. This module carries out the image acquisition, the motion-based face candidate segmentation, and parts of the recognition/training process. The second DSP board carries out, in parallel, the form-based face candidate segmentation and some preprocessing steps for the recognition process.

The system is connected to the car communication bus (i.e. the A-BUS). Information is exchanged with other car devices via this interface: the door and engine status is requested, and the cockpit display is controlled. In the future, the system may also distribute information about the current driver to adjust seat and mirror positions.

5 Security Counter Application

As an example of an application where the environment is quite unconstrained, we describe a system that has been developed for border control stations. The task is to automatically capture facial images of persons standing at a counter during security questioning. The situation seen through the camera's lens is depicted in Fig. 4 (a).

Fig. 4. Scenario of the security counter application: (a) scene and (b) geometric analysis. Panel (b) sketches the window, the observation box, the queue of passengers, the head of the passenger being served, the head of the officer, the camera and its field of view, the desk, and the operational field for face spotting (approximate volume: 1 m width × 1 m height × 0.7 m depth).

A closer look at the scenario (Fig. 4 (b)) shows that the variation of possible head positions is much larger than in the driver access control system; in other words, the surveillance area is larger. Fig. 4 (b) is in fact a geometric analysis of what can be achieved with a conventional surveillance camera on a fixed wall mount. Even under the assumption that the person being served stands close to the desk, the area within the field of view of the camera becomes quite small if a face resolution of 18 dpi is considered a lower bound. To obtain facial images of adequate resolution, one can either increase the camera resolution with non-standard equipment or use a tracking camera system. Due to previous experience with active vision systems [5, 11] and the advantage of using standard video hardware, we have chosen the second approach.
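To make this constraint concrete with a rough, illustrative calculation (the sensor format is our assumption, not stated in the text): a standard PAL camera delivers about 768 pixels per line, so at the 18 dpi lower bound its field of view may span at most 768/18 ≈ 43 inches, i.e. roughly 1.1 m horizontally, which a single fixed camera cannot reconcile with the full counter area.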

5.1 Bifocal Camera Head

To fulfill the requirements of the application, a novel bifocal camera head has been developed. A fixed survey camera with a wide-angle lens is used to find regions of attention in the whole observed area. By means of a fast pan-tilt unit (PTU), a portrait camera with a telephoto lens is pointed at these regions to obtain high-resolution facial images. The effectiveness of this approach has recently been confirmed in [4]. The complete head is displayed in Fig. 5 (a).

The distance between the two cameras is small relative to the distance between the camera head and the observed persons. Under the assumption of a common projection center, this allows the guiding angle for the PTU to be derived without a three-dimensional model. Due to mechanical restrictions, this assumption is of course only an approximation. The guidance error is smaller than 2.5° at a distance of three meters. This is small enough to bring the region of interest into the field of view of the portrait camera.
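As a rough illustration of this open-loop guidance step, the pinhole-model sketch below maps a pixel in the survey image directly to pan and tilt angles under the common-projection-center assumption; the focal lengths and principal point are placeholder calibration values, not figures from the paper:

```python
import math

def guiding_angles(u, v, fx=500.0, fy=500.0, cx=384.0, cy=288.0):
    """Map a survey-image pixel (u, v) to PTU (pan, tilt) in degrees.
    fx, fy: focal lengths in pixels; (cx, cy): principal point."""
    pan = math.degrees(math.atan2(u - cx, fx))   # positive: target right of center
    tilt = math.degrees(math.atan2(cy - v, fy))  # image y axis points downward
    return pan, tilt
```

Because no depth estimate enters the computation, the result is only approximate, which is consistent with the bounded guidance error stated above.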

Once the head of a person is "seen" by the portrait camera, closed-loop tracking of the face from the images of the portrait camera itself leads to higher-precision movements. The tracked face is kept in the center of the portrait camera by driving the error signal (the distance of the face centroid from the image center) to zero.

Fig. 5. (a) Bifocal camera head, (b) FaceCheck user interface

Fig. 6. Block diagram for active face tracking. (Recoverable labels from the diagram: survey camera and portrait camera, each with a vision module and a geometric transformation S resp. P; geometric fusion of the face/head/body position estimates; context analysis; face recognition; the desired face position; and the PTU controller issuing control data.)

5.2 Active Head Tracking

Fig. 6 shows the control scheme of the active tracking system. The system works in two modes, the survey mode and the portrait tracking mode:

- In survey mode, the portrait camera is controlled in open loop by signals derived from the fixed survey camera.

- In portrait tracking mode, the portrait camera is controlled by closed-loop feedback of its own video signal.

Mode selection is performed by a data fusion module that judges state information from the survey vision module, the portrait vision module, and the face recognition module. The state information of the vision modules contains data about the presence and position of a face in view of the modules. The recognition module determines the duration of the closed-loop tracking process. Once a person has been actively tracked and recognized, the attention of the portrait camera should be switched. Both vision modules run in parallel all the time, so even if portrait tracking mode is active, the complete scene in view of the survey camera is monitored for other persons. From the position information, the data fusion module can determine whether another person is in view of the survey camera.
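A schematic sketch of how such a mode-selection rule could look; this is our reading of the fusion module's behavior as described above, not the authors' code:

```python
def select_mode(current_mode, portrait_sees_face, recognized):
    """Return the next mode: 'survey' (open loop) or 'portrait' (closed loop)."""
    if current_mode == "portrait":
        # The recognition module ends closed-loop tracking; losing the
        # target also hands control back to the survey camera.
        if recognized or not portrait_sees_face:
            return "survey"
        return "portrait"
    # In survey mode, switch once the saccade has brought a face into
    # the portrait camera's field of view.
    return "portrait" if portrait_sees_face else "survey"
```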

The two inputs on the left define the desired position of the tracked head in the image of the portrait camera; they carry a constant signal, the image center. The image coordinates are geometrically transformed into pan and tilt angles of the pan-tilt unit.

The controller is implemented as an α-β tracking filter [8], which smooths these target positions into a fluid movement in real time. It is a steady-state Kalman filter:

x_M(k) = x(k-1) + T v_x(k-1)   (1)

x(k) = α x_R(k) + (1-α) x_M(k)   (2)

v_x(k) = v_x(k-1) + (β/T)(x_R(k) - x_M(k))   (3)

where x_R(k) is the raw data position, x(k) the filtered position, x_M(k) the model (predicted) position, and v_x(k) the estimated velocity of the tracked face. α and β are positive constants smaller than one, chosen depending on the sampling period T. Due to the constant coefficients, the α-β tracking filter is much simpler to compute than a full Kalman filter.
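The following short Python sketch implements Eqs. (1)-(3) as written; the gain values and the 25 Hz sampling period are illustrative assumptions, since the text only states that α and β are chosen depending on T:

```python
class AlphaBetaTracker:
    """Alpha-beta tracking filter; use one instance per image coordinate
    of the face centroid."""

    def __init__(self, x0, alpha=0.5, beta=0.1, T=0.04):  # T: 25 Hz video
        self.x, self.v = float(x0), 0.0
        self.alpha, self.beta, self.T = alpha, beta, T

    def update(self, x_raw):
        x_model = self.x + self.T * self.v           # Eq. (1): predict position
        r = x_raw - x_model                          # innovation (residual)
        self.x = x_model + self.alpha * r            # Eq. (2): correct position
        self.v = self.v + (self.beta / self.T) * r   # Eq. (3): correct velocity
        return self.x

# Usage: fx, fy = AlphaBetaTracker(u0), AlphaBetaTracker(v0), then feed the
# raw centroid measurements each frame to obtain smoothed PTU targets.
```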

The pan-tilt unit with the attached portrait camera is able to perform either a fast saccadic move to a fixed position or a smooth, speed-controlled motion to track a moving object.

6 Conclusions

On the research side, a lot of work has been done on face identification, face recognition, and face spotting in images. However, the integration of research results concerning facial image processing into practical application systems is still a very difficult and risky undertaking.

Car security systems based on video authentication of the driver may become mass products in a couple of years. We built a functional prototype from which we learned valuable lessons on how to design robust algorithms for real-time face spotting. This problem had been underestimated at first, and a number of methods from research papers that we tried out failed completely in the environment of the test car.

The second application we described has led to the development of a novel bifocal camera unit supporting high-speed camera pointing toward a human face. Again, the robustness of face image spotting and validation has turned out to be a crucial technical problem. In addition, the issue of user convenience has been a driving force for the technical development. If people are to accept computers watching their faces, they at least do not want to be bothered with "posing" for a standardized snapshot. People feel that choosing the right moment and the most suitable viewing direction is the computer's business.

7 Acknowledgement

We wish to thank Gerd-Jürgen Gießing, Markus Ohlhofer, and all other contributors to the FaceCheck³ system. Hubert Weisser and Rudolf Mai of Volkswagen AG have been valuable partners in our joint efforts towards a driver-recognizing car.

References

1. R. Brunelli and T. Poggio, Face Recognition: Features versus Templates, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 15, No. 10, pp. 1042-1052, 1993.

2. T. C. Chang, T. S. Huang, and C. Novak, Facial Feature Extraction from Color Images, in Proc. International Conference on Pattern Recognition, Vol. II, pp. 39-43, 1994.

3. Y. Dai and Y. Nakano, Face-Texture Model Based on SGLD and its Application in Face Detection in a Color Scene, Pattern Recognition, Vol. 29, No. 6, pp. 1007-1017, 1996.

4. T. Darrell, B. Moghaddam, and A. Pentland, Active Face Tracking and Pose Estimation in an Interactive Room, in Proc. Computer Vision and Pattern Recognition, pp. 67-72, 1996.

5. W. Gillner, S. Bohrer, and V. Vetter, Objektverfolgung mit pyramidenbasierten optischen Flussfeldern (Object tracking with pyramid-based optical flow fields), in Bildverarbeitung '93: Forschen, Entwickeln, Anwenden, Technische Akademie Esslingen, 1993, in German.

6. V. Govindaraju, D. B. Sher, R. K. Srihari, and S. N. Srihari, Locating Human Faces in Newspaper Photographs, in Proc. Computer Vision and Pattern Recognition, pp. 549-554, 1989.

7. A. Jacquin and A. Eleftheriadis, Automatic Location Tracking of Faces and Facial Features in Video Sequences, in Proc. International Workshop on Automatic Face- and Gesture-Recognition, pp. 142-147, 1995.

8. C. L. Phillips and H. T. Nagle, Digital Control System Analysis and Design, Prentice Hall, 2nd ed., 1989.

9. C. Ponticos, A Robust Real Time Face Location Algorithm for Videophones, in Proc. British Machine Vision Conference, pp. 449-458, 1993.

10. H. A. Rowley, S. Baluja, and T. Kanade, Neural Network-Based Face Detection, in Proc. Computer Vision and Pattern Recognition, pp. 203-208, 1996.

11. W. von Seelen et al., A Neural Architecture for Autonomous Visually Guided Robots - Results of the NAMOS Project, Fortschr.-Ber. VDI Reihe 10 Nr. 388, VDI Verlag, 1995.

12. V. Vetter, G.-J. Gießing, R. Mai, and H. Weisser, Driver Face Recognition as a Security and Safety Feature, in Law Enforcement Technologies: Identification Technologies and Traffic Safety, Proc. SPIE 2511, pp. 182-190, 1995.

13. G. Yang and T. S. Huang, Human Face Detection in a Complex Background, Pattern Recognition, Vol. 27, No. 1, pp. 53-63, 1994.

This article was processed using the LaTeX macro package with LLNCS style.

³ FaceCheck is a registered trademark of C-VIS Computer Vision und Automation GmbH.