face extraction 1
TRANSCRIPT
-
8/8/2019 Face Extraction 1
1/22
Face Extraction From Live Video
SITE Technical Report TR-2006
Adam Fourney
School of Information Technology and Engineering
University of Ottawa
Ottawa, Canada, K1N 6N5
Project supervised by: Dr. Robert Laganière
Table of Contents
Introduction
Technology
    The Open Source Computer Vision Library (OpenCV)
    Microsoft's DirectShow
    The Background Segmentation Component
Architecture
Graphical User Interface
    FACE Dialog
    Input Configuration Dialog
    Output Configuration Dialog
    Graph Manager
Face Extractor
    Face Export Rules
    Modes of Operation
    Mechanism By Which Face Images are Exported
    Possible Improvements
Face Detector
    The OpenCV Face Detector
    Measurements Used to Assess Image Quality
        Inferring About Image Quality Using Haar-Classifiers
        Gaze Direction
        Motion, Skin, and Motion & Skin Content
            Pixel Motion Detection
            Skin Detection
        Quality of Lighting
            Measuring the Width of a Histogram
        Image Sharpness
        Image Dimensions and Area
    Possible Improvements
Pedestrian Tracker
    Possible Improvements
Appendix A: Building From Source
References
Introduction
The first step in any biometric face identification process is recognizing, with a high degree of
accuracy, the regions of input video frames that constitute human faces. There has been much research focused on this particular task. Thankfully, this has resulted in some very robust solutions for detecting
faces in digital images.
However, frames from live video streams typically arrive at a rate of between 15 and 30 frames per second. Each frame may contain several faces. This means that faces might be detected at a rate which is much higher than the original video frame rate. As mentioned above, these faces are destined to be input into a biometric face identification software package. This software is likely complex, and certainly requires some finite amount of time to process each face. It is very possible that the high rate of input could overwhelm the software.

Even if the face identification software is very efficient and can keep up with the high rate of incoming faces, much of the processing is wasteful. Many of the faces will belong to individuals who have already been accounted for in previous frames.
The software described in this paper aims to alleviate the situation by detecting faces as fast as
possible, but only exporting select faces for post-processing. In fact, the software aims to export one image for every pedestrian that enters the camera's field of view. This is accomplished by associating face images to individual pedestrians. Each time a new image is associated to a pedestrian, it must be compared to the best image previously associated to that individual. If the new image is an improvement, then it replaces the best image. When a pedestrian leaves the camera's field of view, the pedestrian's best image is exported.
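As a rough illustration, the per-pedestrian selection just described might be sketched as follows. The class and member names here are illustrative stand-ins, not identifiers from the project's actual source:

```cpp
#include <cassert>
#include <map>

// Hypothetical record of the best face seen so far for one pedestrian.
struct BestFace {
    double score;     // quality score: larger is better
    int frameNumber;  // frame in which this face was captured
};

// Tracks the best face per pedestrian id; offer() returns true when the
// new face replaces the stored best (i.e., when it should be exported).
class BestFaceTracker {
public:
    bool offer(int pedestrianId, double score, int frame) {
        std::map<int, BestFace>::iterator it = best_.find(pedestrianId);
        if (it == best_.end() || score > it->second.score) {
            best_[pedestrianId] = BestFace{score, frame};
            return true;  // improvement: export this image
        }
        return false;     // no improvement: discard
    }

    // Called when the pedestrian leaves the field of view; the best
    // image is exported one final time and the record is dropped.
    BestFace onExit(int pedestrianId) {
        BestFace result = best_[pedestrianId];
        best_.erase(pedestrianId);
        return result;
    }

private:
    std::map<int, BestFace> best_;
};
```

Only images that improve on the stored best trigger an export, which is what keeps the output rate far below the raw detection rate.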
Technology
The current implementation of the project relies heavily on three important technologies. Without them, the project would not have been possible. The following sections list these technologies and discuss why they are so invaluable.
The Open Source Computer Vision Library (OpenCV)
The Open Source Computer Vision Library is a development library written in the C/C++ programming languages. The library includes over 300 functions, ranging from basic image processing routines all the way up to state-of-the-art computer vision operations. As the OpenCV documentation describes:
Example applications of the OpenCV library are Human-Computer Interaction (HCI); Object Identification, Segmentation and Recognition; Face Recognition; Gesture Recognition; Motion Tracking, Ego Motion, Motion Understanding; Structure From Motion (SFM); and Mobile Robotics. (What is OpenCV, 2006)
The importance of OpenCV to this project cannot be stressed enough; the representation of all images processed by the face extraction software is defined by a structure located in one of the OpenCV libraries. OpenCV routines are used in almost every instance where image processing is done. Finally, without OpenCV's object detection routines, none of this project would have been possible; these routines are used to detect faces in the video sequences, and the results are truly amazing.
Microsoft's DirectShow
Microsoft's DirectShow is an application programming interface that allows developers to manipulate multimedia data in various useful ways. Microsoft describes the DirectShow API as
follows:
The Microsoft DirectShow application programming interface is a media-streaming architecture for the Microsoft Windows platform. Using DirectShow, your applications can perform high-quality video and audio playback or capture. (Microsoft, 2006)
DirectShow is also occasionally known by its original code name, Quartz, and was designed to replace Microsoft's earlier Video for Windows technology (DirectShow, 2006). Like Video for Windows, DirectShow provides a standardized interface for working with video input devices as well as with multimedia files. It also provides a technology called Intelligent Connect, which makes it even easier to program for a wide range of input devices and video encodings.
DirectShow is a very complicated API, with an equally complicated history. It is often criticized
as being overly complex. Perhaps the Wikipedia article describes this situation best:
DirectShow is infamous for its complexity and is often regarded by many people as one of Microsoft's most complex development libraries/APIs. A long-running semi-joke on the "Microsoft.public.win32.programmer.directx.video" newsgroup is "see you in 6 months" whenever someone wants to develop a new filter for DirectShow.
(DirectShow, 2006)
Thankfully, this project did not require the development of any new DirectShow filters, and in general, the technology seemed relatively manageable.
The Background Segmentation Component
The final technology used for the project was a background segmentation component contributed by Dr. Robert Laganière. This component is a C++ class that, among other things, is able to determine which pixels of a video frame constitute the foreground. This is accomplished by building a statistical model of the scene's background and comparing each video frame against this model. This project uses background segmentation for motion detection and object tracking. Both the face detector and pedestrian tracker components require the segmented image, which is output by the background segmentation component.
Architecture
The face extraction software is composed of six main components, as described by the
following diagram. Each component will be the subject of a lengthy discussion.
Graphical User Interface
The graphical user interface (GUI) of the face extractor software is arguably the least important
component of the entire project. For this reason, the level of detail in this section will be quite minimal. Additionally, this document focuses more on design than on usability. Thus, the following discussion will not cover the individual interface widgets, nor will it serve as a manual for anyone operating the software. Instead, it will simply discuss where certain aspects of the implementation can be found, and
what functionality should be expected.
The GUI for the face export application is composed of four main classes: the main dialog, the input configuration dialog, the output configuration dialog, and the graph manager. The graph manager is the most important (and most complex) sub-component of this part of the system. In addition to the aforementioned classes, there are a few other classes that simply provide some custom
controls to the GUI.
FACE Dialog
Header File: ./FaceDlg.h
C++ File: ./FaceDlg.cpp
Namespace:
C++ Class Name: CFaceDlg
The entire application was originally designed as a Microsoft Foundation Classes (MFC) dialog
project. Every dialog application, including this project, begins by displaying a single main dialog window. The face extractor project uses the FACE Dialog for exactly this purpose.
From this dialog, users can:
1. Configure the input settings
2. Configure the output settings
3. Start and stop video capture
4. Configure the individual settings for DirectShow capture graph pins and filters
Input Configuration Dialog
Header File: ./ConfigInputDlg.h
C++ File: ./ConfigInputDlg.cpp
Namespace:
C++ Class Name: CConfigInputDlg
The input configuration dialog allows users to select a video input device, or a file to which
video has been previously saved. The list of available input devices includes all DirectShow filters that
are in the CLSID_VideoInputDeviceCategory category. These typically include web cameras, TV tuner
cards, and video capture cards.
Output Configuration Dialog
Header File: ./ConfigOutputDlg.h
C++ File: ./ConfigOutputDlg.cpp
Namespace:
C++ Class Name: CConfigOutputDlg
Unlike input configuration, output configuration is entirely optional. These settings allow users to save the processed video sequences to a file. They also allow users to specify a directory where the exported face images are saved (currently, all images are saved in the JPEG format). If a user decides to save the video to a file, then the user is prompted for a valid file name. They may also select and configure a video compressor.
Graph Manager
Header File: ./GraphManager/GraphManager.h
C++ File: ./GraphManager/GraphManager.cpp
Namespace:
C++ Classes: GraphManager, FilterDescriptor
The GraphManager class is one of the largest and most complicated classes of the entire project.
This class is responsible for the construction and destruction of the Microsoft DirectShow capture graphs used by the application. The graph manager makes heavy use of the Intelligent Connect technology (by constructing graphs using the CaptureGraphBuilder2 interface). Therefore, it supports many different video capture devices and multimedia file encodings. In fact, the face extractor software has been tested with various brands of web cameras and at least one brand of TV tuner card (Hauppauge WinTV). Interestingly, when using the TV tuner card, Intelligent Connect is wise enough to include all filters required to control the TV tuner.
The graph manager was inspired by the SequenceProcessor class included in Dr. Laganière's OpenCV / DirectShow tutorial. In fact, both the SequenceProcessor and the GraphManager use functions defined in the file filters.h, which was also included with the tutorial. There are, however, some major differences between these two classes. The first major difference is the use of the CaptureGraphBuilder2 interface, which was described above. The second difference is that the GraphManager uses display names (rather than friendly names) to identify filters internally; this allows the system to distinguish between multiple physical devices that share the same friendly name. The final difference is that the graph manager class provides support for displaying a filter's properties or an output pin's properties. For example, to change the channel on a TV tuner device, one simply displays the properties of the TV tuner filter and selects the appropriate channel. Different devices have different properties, and these property sheets are built into the filters themselves.
So far, the discussion has focused on how the graph manager controls video input. Not surprisingly, it also controls the video output. In all cases, video is rendered directly to the screen; however, the graph manager also allows users to save the output to disk. Video can be saved in one of many supported video encodings, and in most cases, the user can specify the quality and compression settings of the video encoder.
At the time of writing, there are some outstanding issues regarding the graph manager. In particular, the graph manager has trouble constructing filter graphs when using an uncompressed video file as the source. Additionally, the ability to pause playback has not been entirely implemented; as a result, the face extractor software does not have a pause button on the main interface. Another issue that needs consideration involves the installation of filter graph event handlers: at this time, there is no clean way to forward events to the controlling window. The final issue is that there are currently no provisions for seeking within video files. Despite these issues, the graph manager provides some very powerful functionality, and the above issues will likely be resolved in the near future.
Face Extractor
Header File: ./FaceExtractor.h
C++ File: ./FaceExtractor.cpp
Namespace:
C++ Classes: Face, FaceGroup, ExtractorObserver, Extractor
The face extractor component is the next highest level of abstraction below the GUI. Essentially, it is the only interface that any programmer using the system is likely to need. The face extractor receives input video frames and exports face images according to a well-defined set of rules. All image processing is accomplished by three of the face extractor's subcomponents: the face detector, the background segmentation component, and the pedestrian tracker. The face extractor merely interprets the results of its subcomponents and uses this information to associate faces to pedestrians. This association is achieved by assigning an identifier to each instance of a face. If two face images have the same identifier, then both images are from the same individual.
The face extractor is also responsible for determining when a face image should be exported. As mentioned in the introduction, the idea of the entire project is to export only the best face captured for each pedestrian. The face extractor, however, provides slightly more flexibility. It exports a face image if any one of the following three conditions is met:
1. No previous face image has been captured for a given pedestrian.
2. The current face image is an improvement over all previously exported images.
3. If a pedestrian leaves the scene, then the best face ever captured is re-exported.
In addition to identifiers, exports are also labeled with an event. Events describe which of the above three conditions caused the export to take place.
Finally, each exported face is also given a score value. These scores are simply numerical
values that indicate the quality of the exported image. As a consequence of the export conditions, the score values of a sequence of images exported for a single individual are monotonically increasing.
Using the output of the face extractor component, developers can devise several high-level post-processing rules. For example, developers can decide to process a face as soon as its score crosses some pre-determined threshold. An alternative rule would be to process the best available face as soon as the pedestrian leaves the scene.
Face Export Rules
The three rules described in the previous section assume that faces can be uniquely associated to pedestrians. Unfortunately, the relationship between the faces returned by the face detector and the pedestrians returned by the pedestrian tracker is not usually one-to-one. For example, two people might be walking side by side, and the pedestrian tracker might incorrectly assume that they are a single moving object. In this case, the face detector might associate two faces to one pedestrian. Worse yet, the face detector might locate one face in some frames and two faces in other frames. Therefore, the face export rules must be slightly more complex than previously stated.
In total, five rules are used to determine when faces are exported. These rules operate on face groups rather than on individual face images. Face groups are simply unordered collections of face images that are all from the same video frame and are all associated to the same pedestrian. At any given time, two face groups are maintained for each pedestrian: the current face group and the historical best face group. The current face group represents the faces detected in the current video frame. The historical best face group represents the best faces ever associated to the pedestrian.

The rules given below describe the circumstances under which the historical best face group is updated to reflect the current face group. Whenever such an update occurs, all faces in the current face group are exported. In addition to this simple behavior, the face extractor re-exports a pedestrian's historical best face group whenever the pedestrian tracker indicates that the individual has left the camera's field of view. The five rules are as follows:
1. If a pedestrian is considered new, then the first face group associated to the pedestrian is considered the historical best.

2. If the current face group contains a single face image, and the historical best face group also contains a single image, then the historical best group is updated only if the new face is an improvement over the previous best image. The exported face image is assigned the same unique identifier as the image it is destined to replace.

3. If the historical best face group contains more than one image, and this number does not increase with the current face group, then update the historical best face group when the best image from the current group is better than the worst image in the historical group. This rule is necessary because it is impossible to determine the association of new faces to old faces. Therefore, it is impossible to determine which of the many faces may have improved. For this
same reason, all face images in the group are given new unique identifiers.
4. If the current face group contains more faces than the historical best face group, then automatically update the historical face group. Of all the rules, this one is perhaps the most obscure. The reason that the historical best face group is updated is that doing so causes the faces to be exported. This is important because it is impossible to determine which face in the group is new (and thus not yet exported). The current faces are all given new unique identifiers in order to ensure that none of the previously exported faces are replaced.
5. If none of the previous rules are applicable, then take no action.
These rules are designed so that the face extractor errs on the side of caution. Whenever a pedestrian confuses the face extractor, the software exports every face image that might be associated with the pedestrian. In order for these rules to capture all possible scenarios, the pedestrian tracker must also be programmed to err on the side of caution; if the tracker ever gets confused about a pedestrian, it must assign the pedestrian a new identifier and treat it as a new entity.
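A condensed sketch of these five rules is given below, under the simplifying assumption that face groups can be reduced to lists of quality scores (the real face groups carry full face records, identifiers, and events):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Decide whether the historical best face group should be replaced by the
// current face group, following the five rules described in the text.
// Groups are represented here simply as vectors of quality scores.
bool shouldUpdateHistoricalBest(const std::vector<double>& current,
                                const std::vector<double>& best) {
    if (best.empty()) return true;                  // rule 1: new pedestrian
    if (current.empty()) return false;              // rule 5: nothing to compare
    if (current.size() > best.size()) return true;  // rule 4: more faces than before
    if (best.size() == 1 && current.size() == 1)
        return current[0] > best[0];                // rule 2: single-face comparison
    // rule 3: group-to-group comparison; no face-to-face correspondence
    // is known, so compare the best new face against the worst old one.
    double bestOfCurrent = *std::max_element(current.begin(), current.end());
    double worstOfBest   = *std::min_element(best.begin(), best.end());
    return bestOfCurrent > worstOfBest;
}
```

Note how rules 3 and 4 are deliberately permissive: when the correspondence between faces is ambiguous, the sketch (like the real rules) favors exporting too much over losing a face.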
Modes of Operation
The face exporter component has two modes of operation. These modes simply control the sequencing of operations that are applied to the video frames. In the current version of the software, these modes are not accessible programmatically; instead, they are specified using compiler constants. For example, to search for faces before locating pedestrians, one would compile the application with the OPERATION_MODE constant set to the value FIND_FACES_THEN_PEDESTRIANS. However, to locate pedestrians and then search for faces, the constant should be set to FIND_PEDESTRIANS_THEN_FACES. The following sections describe these modes in detail.
FIND_FACES_THEN_PEDESTRIANS: (Suggested mode of operation)
The find faces then pedestrians mode of operation searches for faces before searching for (and tracking) pedestrians. The location of each face is then input into the pedestrian tracker in the form of a hint. Currently, the tracker uses hints to ensure that all faces are associated to a pedestrian. For example, if a face is detected and no pedestrian is nearby, then a new pedestrian record is created. This new pedestrian is described by the smallest rectangular region that contains the orphaned face. Under normal circumstances, faces are associated to whichever pedestrian yields the largest intersection with the rectangular region describing the face. This mode of operation ensures that all faces detected by the face detector have the opportunity to be processed.
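The largest-intersection association might look roughly like the following sketch. The Rect type and both function names are hypothetical stand-ins; the project's real code would use OpenCV's rectangle structures:

```cpp
#include <cassert>

// Hypothetical axis-aligned rectangle (x, y is the top-left corner).
struct Rect { int x, y, w, h; };

// Area of overlap between two rectangles (0 when they do not intersect).
int intersectionArea(const Rect& a, const Rect& b) {
    int left   = (a.x > b.x) ? a.x : b.x;
    int top    = (a.y > b.y) ? a.y : b.y;
    int right  = (a.x + a.w < b.x + b.w) ? a.x + a.w : b.x + b.w;
    int bottom = (a.y + a.h < b.y + b.h) ? a.y + a.h : b.y + b.h;
    if (right <= left || bottom <= top) return 0;
    return (right - left) * (bottom - top);
}

// Returns the index of the pedestrian whose region overlaps the face the
// most, or -1 when no pedestrian overlaps it (an "orphaned" face, which
// would trigger the creation of a new pedestrian record).
int associateFace(const Rect& face, const Rect* pedestrians, int count) {
    int bestIndex = -1, bestArea = 0;
    for (int i = 0; i < count; ++i) {
        int area = intersectionArea(face, pedestrians[i]);
        if (area > bestArea) { bestArea = area; bestIndex = i; }
    }
    return bestIndex;
}
```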
FIND_PEDESTRIANS_THEN_FACES:
The find pedestrians then faces mode of operation is based on the idea that faces should only be located in regions where pedestrians are found. This idea seems rather sound; however, the pedestrian tracker occasionally returns regions that do not encompass the entire pedestrian. In this case, the face detector may fail to detect faces that are cut off by a pedestrian's overly small bounding rectangle. This problem can be remedied by expanding the search region around each pedestrian. However, if there are several pedestrians in the scene, then the search regions may overlap; portions of the frame may be
searched twice. Finally, this mode of operation is guaranteed to call the face detection routines once per pedestrian per frame. Without careful attention to detail, the overhead of these multiple calls may be significant.

All of the above issues can be resolved, but the sources of error are numerous and the benefits are not significant. For this reason, this mode of operation is not recommended. Selecting this mode will work, but the various algorithms still need to be tuned to address the above issues.
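The compile-time selection of the two modes might be arranged as in the following sketch. The constant names follow the text, but everything else here is illustrative:

```cpp
#include <cassert>

// Illustrative mode constants; the names come from the text, but the
// surrounding code is a sketch, not the project's actual source.
#define FIND_FACES_THEN_PEDESTRIANS 1
#define FIND_PEDESTRIANS_THEN_FACES 2
#define OPERATION_MODE FIND_FACES_THEN_PEDESTRIANS

// Returns which pipeline was compiled in, so the choice can be verified.
int compiledMode() {
#if OPERATION_MODE == FIND_FACES_THEN_PEDESTRIANS
    // Detect faces first, then feed their locations to the tracker as hints.
    return FIND_FACES_THEN_PEDESTRIANS;
#else
    // Track pedestrians first, then search each region for faces.
    return FIND_PEDESTRIANS_THEN_FACES;
#endif
}
```

Because the mode is a preprocessor constant, switching it requires recompiling the application, which matches the text's note that the modes are not accessible programmatically.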
Mechanism By Which Face Images are Exported
Up until now, the discussion has simply mentioned that face images should be exported, but has not explained the mechanism by which this occurs. The face extractor component implements the observer / observable design pattern. Developers interested in the output of the face extractor simply register their classes as observers. Whenever a face image is to be exported, the face extractor notifies the observers by calling their updateFace() method. The arguments provided to this method are as follows:
color
    This is the RGB color of the rectangle drawn around the face in the output video. This information is not required for the main functionality, but it improves the usability of the GUI (it allows users to easily associate exported images with faces outlined in the video).

previousFace
    This is a pointer to the face record being replaced by the newFace. If this value is NULL, then no face has previously been exported on behalf of the pedestrian (i.e., indicating a new pedestrian).

newFace
    This is a pointer to the face record being exported. If this value is NULL, then it indicates that the pedestrian has left the scene. In this case, the previousFace is provided as a record of the best face exported on behalf of the pedestrian.
The individual observers are responsible for determining which of the export events are important, and which ones to ignore.
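A minimal sketch of this observer / observable arrangement is shown below. The names Face, ExtractorObserver, and Extractor follow the text, but the signatures are simplified (for instance, the color argument of updateFace() is omitted), so this should be read as an illustration of the pattern rather than the project's interface:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for the project's face record.
struct Face { int id; double score; };

class ExtractorObserver {
public:
    virtual ~ExtractorObserver() {}
    // previousFace is null for a new pedestrian; newFace is null when
    // the pedestrian has left the scene.
    virtual void updateFace(const Face* previousFace, const Face* newFace) = 0;
};

class Extractor {
public:
    void addObserver(ExtractorObserver* obs) { observers_.push_back(obs); }
    void exportFace(const Face* previous, const Face* current) {
        for (std::size_t i = 0; i < observers_.size(); ++i)
            observers_[i]->updateFace(previous, current);
    }
private:
    std::vector<ExtractorObserver*> observers_;
};

// Example observer that only reacts to "pedestrian left" events, i.e. the
// re-export of the best face when newFace is null.
class ExitLogger : public ExtractorObserver {
public:
    int exits = 0;
    void updateFace(const Face* previousFace, const Face* newFace) override {
        if (newFace == nullptr && previousFace != nullptr)
            ++exits;  // final best face re-exported on exit
    }
};
```

An observer like ExitLogger implements exactly the "process the best face when the pedestrian leaves" post-processing rule mentioned earlier, while ignoring intermediate exports.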
Possible Improvements
The five export rules used by the face extractor may be overly cautious; when many individuals
are erroneously grouped into a single pedestrian, the rate of exported images is quite high. There are at least two ways to correct this problem:

1. one can improve the pedestrian tracker so that there are fewer grouping mistakes, or
2. a mechanism can be developed to help pair faces across face groups. For example, such a mechanism could use a person's shirt color to help match faces belonging to the same individual.
Additionally, the face extractor should offer better support for the find pedestrians then faces mode of operation. This would allow the software to function on much higher resolution images, provided that the pedestrians do not occupy the entire viewing area.
Finally, the face extractor occasionally fails to export faces when pedestrians leave the scene. It is not known whether this problem is the result of a bug in the face extractor or in the pedestrian tracker. This issue will hopefully be resolved in the near future.
Face Detector
Header File: ./FaceDetector/FaceDetect.h
C++ File: ./FaceDetector/FaceDetect.cpp
Namespace: FaceDetector
C++ Classes: Face, Detector
The face detector is the most important and most complex of all the project components. It is responsible not only for detecting faces in image sequences, but also for assessing their quality. The OpenCV library provided the mechanism by which faces can be detected, but the mechanism used to assess image quality needed to be built from the ground up. Measuring image quality was certainly the most challenging aspect of the entire project.
The OpenCV Face Detector
In OpenCV, face detection is accomplished by invoking a single library function, cvHaarDetectObjects. This function uses a technique known as cascading Haar classifiers in order to recognize certain objects (in this case, faces). A tutorial on the OpenCV documentation wiki describes this technique as follows:
First, a classifier (namely a cascade of boosted classifiers working with haar-like features) is trained with a few hundreds of sample views of a particular object (i.e., a face or a car), called positive examples, that are scaled to the same size (say, 20x20), and negative examples - arbitrary images of the same size.

After a classifier is trained, it can be applied to a region of interest (of the same size as used during the training) in an input image. The classifier outputs a "1" if the region is likely to show the object (i.e., face/car), and "0" otherwise. To search for the object in the whole image one can move the search window across the image and check every location
using the classifier. The classifier is designed so that it can be easily "resized" in order to be able to find the objects of interest at different sizes, which is more efficient than resizing the image itself. So, to find an object of an unknown size in the image the scan procedure should be done several times at different scales.
(Face Detection using OpenCV, 2006)
Currently, the face detector component uses several of these classifiers to identify faces that are oriented in different directions. In order to improve the runtime of the face detector, the classifiers are applied to a half-scale copy of each input frame. This means that faces smaller than 40x40 pixels will not be detected. However, the software does allow developers to decide whether scaling should take place.
Measurements Used to Assess Image Quality
In order to assess the quality of face images, a series of metrics is used to measure various aspects of the input. These measurements are then fed into a linear function that returns the final score. The scores increase in value as the image quality improves; therefore, larger scores are better than smaller scores. The following list enumerates the measurements used for this purpose:
1. The particular Haar-Classifier that detected the face
2. Gaze direction
3. Motion and skin content
4. Quality of lighting
5. Sharpness of the image
6. The size of the detected face
Each of the above criteria will be discussed in great detail in this section.
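A hypothetical example of such a linear scoring function is given below. The weights are placeholders chosen only to illustrate the shape of the computation; they are not the values used by the project:

```cpp
#include <cassert>

// Linear combination of the six quality measurements. Each argument is
// assumed to be normalized so that larger values indicate better quality;
// the weights are illustrative placeholders.
double faceScore(double classifierRank, double gaze, double motionSkin,
                 double lighting, double sharpness, double size) {
    const double w[6] = {1.0, 2.0, 1.0, 1.0, 1.0, 0.5};
    return w[0] * classifierRank + w[1] * gaze + w[2] * motionSkin +
           w[3] * lighting + w[4] * sharpness + w[5] * size;
}
```

Because the function is linear and every weight is positive, improving any single measurement (a more frontal gaze, say) strictly increases the final score, which is what makes the exported scores comparable across frames.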
Inferring About Image Quality Using Haar-Classifiers
The face detector uses several classifiers to detect faces in the video frames. Some classifiers are more accurate than others, and some classifiers detect certain gaze directions but not others. For example, results returned from a frontal face classifier are more desirable than those returned by a profile face classifier.
Gaze Direction
In the introduction to this paper, a biometric face recognition scenario was used to introduce the concept of face extraction from live video. Face recognition packages perform best when individuals are facing towards the camera. For this reason, gaze direction is an enormously important metric in determining the quality of a face image. Since gaze direction is simply the direction in which an individual is looking, it can be estimated by locating an individual's eyes in relation to the rest of the head. For example, if an individual is looking directly forward, the midpoint
The above discussion assumes that pixels exhibiting motion can be easily detected. It also
assumes that pixels representing human skin are equally simple to identify. Thankfully, this is in fact
the case. The following sub-sections describe how this is accomplished.
Pixel Motion Detection
The face detector constructs a motion history image to determine which pixels have recently
experienced motion. Rather than identifying a color or a shade of gray, each pixel in a motion history
image encodes the most recent time at which the pixel was considered part of the foreground. This is
accomplished by numbering input video frames and using the output of the background segmentation
component to selectively update the pixels of the motion history image. For the purpose of this project,
pixels that have experienced motion are exactly those that were recently considered part of the
foreground. These pixels can be easily identified by a simple thresholding operation applied to the
motion history image.
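The update and thresholding steps above can be sketched as follows. This is a minimal illustration, assuming the background segmentation component supplies a binary foreground mask for each numbered frame (the report does not show the actual implementation):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Each entry of `mhi` stores the frame number at which that pixel was
// last considered part of the foreground (0 = never).
void updateMHI(std::vector<int>& mhi, const std::vector<bool>& foreground,
               int frameNo) {
    for (std::size_t i = 0; i < mhi.size(); ++i)
        if (foreground[i]) mhi[i] = frameNo;
}

// Simple thresholding: a pixel "has motion" if it was foreground within
// the last `duration` frames.
std::vector<bool> motionMask(const std::vector<int>& mhi, int frameNo,
                             int duration) {
    std::vector<bool> mask(mhi.size());
    for (std::size_t i = 0; i < mhi.size(); ++i)
        mask[i] = (mhi[i] > 0) && (frameNo - mhi[i]) <= duration;
    return mask;
}
```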
Skin Detection
The skin detector used by the face extraction software is based on the research of Margaret M.
Fleck and David A. Forsyth, as described in their paper Naked People Skin Filter. This filter uses
texture and color information to determine which pixels likely represent human skin. The main idea is
that pixels representing skin are generally tightly confined to a small region of the hue-saturation color
space; in particular, skin tends to range from red to yellow in hue, and it tends to be only moderately
saturated. Additionally, skin tends to be rather smooth in texture. This is well represented by areas
where the variance in pixel intensity is low (although texture is not considered in the current
implementation of the face detector).
Currently, the skin detector uses the Hue/Saturation/Luminosity color space, in which the hue
ranges from 0 (red) to 360 (red again), and in which the saturation ranges from 0 to 1. Hues between
0 and 38 tend to be described as reddish-yellow, while values between 330 and 360 are considered
reddish-blue. During informal experimentation, pixels representing skin fell within one of these two
ranges. Additionally, pixels representing skin tended to be more saturated as the hue approached what
might be considered yellow. These results closely agree with the results described in the
aforementioned paper, although a different color space and pair of regions were used there. The particular
regions used by the face detector are described in the following table:
Region           Hue        Saturation
Reddish-Yellow   0 - 38     0 - 1.0
Reddish-Blue     330 - 359  0 - 0.6
The above values were heavily influenced by an article entitled Skin
Color Analysis, authored by Jamie Sherrah and Shaogang Gong.
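The per-pixel test implied by the table can be sketched directly. This assumes hue in degrees [0, 360) and saturation in [0, 1], matching the ranges stated above:

```cpp
#include <cassert>

// Classify a pixel as skin if its hue/saturation falls in either of the
// two regions from the table above (reddish-yellow or reddish-blue).
bool isSkinPixel(double hue, double sat) {
    bool reddishYellow = (hue >= 0.0 && hue <= 38.0) &&
                         (sat >= 0.0 && sat <= 1.0);
    bool reddishBlue   = (hue >= 330.0 && hue <= 359.0) &&
                         (sat >= 0.0 && sat <= 0.6);
    return reddishYellow || reddishBlue;
}
```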
Once the pixels representing skin are identified (producing the skin mask image), a second filter
-
8/8/2019 Face Extraction 1
16/22
14
is applied to all neighboring pixels. This second filter uses a less strict set of rules in an attempt to
intelligently close gaps that might occur in the original skin mask.
Interestingly, images illuminated by natural or fluorescent lighting tended to be shifted toward
the blue portion of the color spectrum, while images illuminated by incandescent lighting were shifted
towards reddish-orange. For this reason, the above regions are slightly larger than would be necessary
if the light source could be controlled.
Quality of Lighting
When developing the face detector component, one of the most frustrating phenomena occurred
when testing the application after a long night of programming: at night, the software was carefully
tuned, and measurements like skin detection and face feature detection worked wonderfully. In
daylight, however, the lighting was harsh, the colors were shifted, and the carefully adjusted
measurements needed to be re-calibrated. The harsh lighting also greatly confused the edge detector
(horizontal gradient map) used when detecting face features. For this reason, the quality of lighting is
an important metric for assessing image quality. Even if the face extraction software could cope with
poor lighting, it is not known how biometric face recognition software (or other post-processing) might
cope with such images.
The first step in assessing the quality of lighting is converting the color input frames to
grayscale intensity images. Once a grayscale image has been acquired, a histogram of the image's pixel
intensities is computed. The general assumption is that the quality of lighting is directly proportional to
the width of this histogram. This is not always a valid assumption; it fails when a subject is not evenly
illuminated.
In order to help address the problems caused by uneven illumination, one can rely on the
assumption that faces are symmetric across a central vertical axis. This central axis is determined when
the face detector locates the face features. If the lighting is soft and even, then the distribution of
grayscale values on one side of an individual's face should be similar to the distribution of values on the
other side of the face. This comparison is done by computing the histograms of the left and right halves
of the face, normalizing each of these histograms, and then computing their intersection. The final
lighting score is computed by multiplying the weight of this intersection by the width of the original
histogram.
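The symmetry comparison can be sketched as a normalized-histogram intersection. This is a minimal illustration of the computation described above, not the report's actual code:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Intersection of two normalized histograms (each summing to 1):
// the sum of bin-wise minima. Identical distributions yield 1.0.
double histogramIntersection(const std::vector<double>& left,
                             const std::vector<double>& right) {
    double sum = 0.0;
    for (std::size_t i = 0; i < left.size(); ++i)
        sum += std::min(left[i], right[i]);
    return sum;
}

// Final lighting score: symmetry weight times the width of the
// whole-face intensity histogram.
double lightingScore(double intersection, double histogramWidth) {
    return intersection * histogramWidth;
}
```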
Measuring the Width of a Histogram
In the above discussion, histogram width was not well defined. For the purpose of this
application, a histogram's width is defined as the smallest number of consecutive histogram bins that
account for 95% of the pixels in the input image. To compute this value, a greedy algorithm is used.
This algorithm locates the mean of the histogram, which becomes the first bin added to a region called
the histogram body. Histogram bins with indices higher than the largest index in the body are said to be
in the head of the histogram. Similarly, bins with lower indices are said to be in the tail.
The greedy algorithm iteratively grows the body of the histogram by claiming the lowest index
from the head, or the largest index from the tail. If the head of the histogram accounts for more pixels
than the tail, the body expands in the direction of the head. Otherwise, the body expands in the
direction of the tail. This expansion continues until the body accounts for 95% of all pixels.
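The greedy procedure above can be sketched as follows. The "mean of the histogram" is interpreted here as the count-weighted mean bin index, which the report leaves unspecified:

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Smallest run of consecutive bins (the "body") covering at least
// `coverage` (e.g. 0.95) of all pixels, grown greedily from the mean bin
// toward whichever side currently holds more mass.
int histogramWidth(const std::vector<double>& hist, double coverage) {
    double total = std::accumulate(hist.begin(), hist.end(), 0.0);
    double weighted = 0.0;
    for (std::size_t i = 0; i < hist.size(); ++i) weighted += i * hist[i];
    int lo = static_cast<int>(weighted / total);  // mean bin index
    int hi = lo;
    double body = hist[lo];
    while (body < coverage * total) {
        double headMass = 0.0, tailMass = 0.0;
        for (std::size_t i = hi + 1; i < hist.size(); ++i) headMass += hist[i];
        for (int i = 0; i < lo; ++i) tailMass += hist[i];
        if (headMass >= tailMass && hi + 1 < static_cast<int>(hist.size()))
            body += hist[++hi];       // claim lowest index of the head
        else if (lo > 0)
            body += hist[--lo];       // claim largest index of the tail
        else
            break;                    // nothing left to claim
    }
    return hi - lo + 1;
}
```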
Image Sharpness
In addition to the quality of lighting, image sharpness is another good indicator of image
quality. In this sense, the adjective sharp is used as the antonym of blurry. Images can be blurred for
several reasons, including motion blur or a camera that is incorrectly focused. In all cases, blurred
images are certainly less desirable than sharp images. The challenge is finding a viable method for
measuring image sharpness.
Currently, the face extraction software attempts to measure the amount of high-frequency
content contained in an image in order to judge its sharpness. In images, high-frequency content can be
defined as content that encodes edges, lines, and areas where the pixel intensities change significantly
over short distances. With faces, the high-frequency content tends to concentrate around the eyes, lips,
and other face features.
In order to find high-frequency content, the software uses the Laplacian operator as a highpass
filter. Pixels that survive the highpass filter (have values greater than some pre-determined threshold)
are counted, and the result is divided by the total image area. Thus, the current measure of sharpness is
simply the percentage of pixels that are considered to encode edges, lines, and other high-frequency
content.
In general images, this approach may not always be valid; one perfectly focused image may
contain fewer edges than another complex, but blurry, image. Thankfully, the face extraction software
should only ever compare face images to similar images acquired in previous frames. Thus, any change
in the high-frequency content can generally be attributed to such things as motion blur.
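The Laplacian-based measure can be sketched with a discrete 4-neighbour Laplacian; the exact kernel and threshold used by the software are not stated in the report, so both are illustrative here:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sharpness as the fraction of pixels whose 4-neighbour Laplacian
// response exceeds `threshold`. `img` is a grayscale image stored in
// row-major order with width `w` and height `h`; border pixels are
// skipped for simplicity.
double sharpness(const std::vector<double>& img, int w, int h,
                 double threshold) {
    int strong = 0;
    for (int y = 1; y < h - 1; ++y) {
        for (int x = 1; x < w - 1; ++x) {
            double c = img[y * w + x];
            double lap = img[(y - 1) * w + x] + img[(y + 1) * w + x]
                       + img[y * w + x - 1] + img[y * w + x + 1] - 4.0 * c;
            if (std::fabs(lap) > threshold) ++strong;
        }
    }
    return static_cast<double>(strong) / (w * h);
}
```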
Image Dimensions and Area
The final assumption about face image quality is that larger images are better than smaller
images. It should be noted that the OpenCV face detector returns regions that are square. If the
face regions were not square, then it would likely be the case that certain aspect ratios would be better
than others.
Unfortunately, one cannot always assume that large images are best. In fact, large images often
indicate a false positive returned from the OpenCV face detector. Often, only a certain range of sizes
will be reasonable for a given scene. For example, a security camera responsible for observing a large
lobby will expect smaller face images than a personal web camera sitting on top of a user's computer
monitor. This suggests that the range of acceptable image sizes should be configurable by the end-user
of the system. Unfortunately, at this time, these parameters can only be changed by modifying
constants defined in the face detector source code.
Another option worth mentioning is that it might be possible for the software to learn on its own
about the range of acceptable image sizes. For example, an average face size can be determined by
considering the dimensions of all face regions that have scored well in the other face quality metrics.
Once this is accomplished, candidate face regions that differ significantly from this model can be
discarded. At this time, this option has not been explored.
Possible Improvements
The main problem with the face detector is that an image's score is not always a reasonable
indicator of quality. This is not the fault of the individual quality metrics, but rather of the
function that combines these individual values into the final score. As mentioned earlier, this
function is nothing more than a simple weighted sum of the aforementioned quality metrics. The
weights associated with each metric were chosen almost arbitrarily, then hand-tuned over a series
of tests until the results seemed adequate. There is almost certainly a better approach to be taken.
Pedestrian Tracker
Header File: ./PedestrianTracker/PedestrianTracker.h
C++ File: ./PedestrianTracker/PedestrianTracker.cpp
Namespace: PedestrianTracker
C++ Classes: Pedestrian, Tracker
The pedestrian tracker is an important component of the system, but could be the subject of an
entire project. For this reason, the tracker was kept as simple as possible while still producing
acceptable results. The current implementation processes each frame of the video sequence. This
processing operates in four distinct phases:
1. The first phase uses the background segmentation component to identify the foreground
pixels. It then locates all of the connected foreground components and their bounding
rectangles.
2. The second phase attempts to associate each of the connected components with a
pedestrian detected in the previous video frame. A pedestrian is nothing more than a
rectangular region of interest, and the association is made by determining which
pedestrian each component intersects with the greatest area. If no association is possible,
then the component is considered a new pedestrian.
3. The third phase of the tracker groups the connected components based upon the pedestrian
with which each is associated. A bounding rectangle is computed for each of these groups.
These large bounding rectangles are then assigned to the appropriate pedestrians (replacing
their bounding rectangles from the previous frame).
4. The fourth, and final, phase determines the foreground-pixel density of each of the resultant
pedestrians. This density is simply the percentage of pixels within the pedestrian's current
bounding rectangle that belong to the foreground. If this density is low, then it is assumed
that the results are incorrect. In this case, the tracker re-divides the pedestrian into its
individual components.
The above algorithm is based upon the assumption that pedestrians in the current frame
should be associated with nearby pedestrians from the previous frame. It also assumes that pedestrians
may be composed of several distinct foreground components.
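The greatest-intersection-area association used in phase 2 can be sketched as follows. The `Rect` type and function names here are illustrative, not the report's actual Pedestrian/Tracker classes:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Rect { int x, y, w, h; };

// Area of overlap between two axis-aligned rectangles (0 if disjoint).
int intersectionArea(const Rect& a, const Rect& b) {
    int ix1 = std::max(a.x, b.x);
    int iy1 = std::max(a.y, b.y);
    int ix2 = std::min(a.x + a.w, b.x + b.w);
    int iy2 = std::min(a.y + a.h, b.y + b.h);
    if (ix2 <= ix1 || iy2 <= iy1) return 0;
    return (ix2 - ix1) * (iy2 - iy1);
}

// Index of the previous-frame pedestrian that the component overlaps
// with the greatest area, or -1 when there is no overlap at all (in
// which case the component becomes a new pedestrian).
int associate(const Rect& component, const std::vector<Rect>& pedestrians) {
    int best = -1, bestArea = 0;
    for (std::size_t i = 0; i < pedestrians.size(); ++i) {
        int area = intersectionArea(component, pedestrians[i]);
        if (area > bestArea) { bestArea = area; best = static_cast<int>(i); }
    }
    return best;
}
```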
Possible Improvements
As mentioned above, the current generation of the pedestrian tracker is very simple. It can, and
probably should, be replaced at a later date. At present, the tracker is slightly over-zealous about
merging objects. It also has difficulty tracking fast-moving objects, although pedestrians do not
usually move very fast. Finally, the tracker fails to recognize when objects merge, although this
latter issue could probably be resolved easily.
Appendix A: Building From Source
Assumptions:
The following discussion assumes that OpenCV beta 5 is installed in the following default location on
Windows XP Service Pack 2:
C:\Program Files\OpenCV
This implies that the Haar classifier data sets are located as follows:
C:\Program Files\OpenCV\data\haarcascades
Finally, it assumes that OpenCV has been properly installed, and that the system path has been
modified to include:
C:\Program Files\OpenCV\bin
It also assumes that developers are using Microsoft Visual Studio 2003.
If the Assumptions Fail:
If any of the above assumptions fail, then let <OpenCVDir> be the location where OpenCV
was actually installed. The following modifications are then necessary:
Changes to ./FaceDetector/FaceDetect.h
The following constants need to be modified to point to the appropriate Haar datasets:
#define DEFAULT_FRONTAL_CLASSIFIER_PATH \
    <OpenCVDir>\\data\\haarcascades\\haarcascade_frontalface_default.xml

#define DEFAULT_PROFILE_CLASSIFIER_PATH \
    <OpenCVDir>\\data\\haarcascades\\haarcascade_profileface.xml
Changes to the Visual Studio Project
Project -> Properties -> C/C++ -> General -> Additional Include Directories must be set to:

"<OpenCVDir>\otherlibs\highgui";
"<OpenCVDir>\filters\ProxyTrans";
"<OpenCVDir>\cxcore\include";
"<OpenCVDir>\cvaux\include";
"<OpenCVDir>\cv\include"
Also,
Project -> Properties -> Linker -> General -> Additional Library Directories must be set to:
"\lib";
Finally, just to be thorough, make sure that Project -> Properties -> Linker -> Input -> Additional
Dependencies is set to:

strmiids.lib quartz.lib cv.lib cxcore.lib highgui.lib
Running the Binaries on Systems without OpenCV
If one wishes to run the face extractor software on a system where OpenCV is not installed, then the
following files must be included in the same directory as the binary executable FACE.exe:
cv097.dll
cv097d.dll
cvaux097.dll
cxcore097.dll
cxcore097d.dll
haarcascade_frontalface_default.xml
haarcascade_profileface.xml
highgui096d.dll
highgui097.dll
proxytrans.ax
Additionally, the constants DEFAULT_FRONTAL_CLASSIFIER_PATH and
DEFAULT_PROFILE_CLASSIFIER_PATH defined in ./FaceDetector/FaceDetect.h must be modified to
load the classifiers from the local ./ directory.
Finally, the proxy transform filter must be registered. This can be achieved by executing the following
command at the command shell:
regsvr32 proxytrans.ax
References
Fleck, M. & Forsyth, D. (n.d.). Naked People Skin Filter. Berkeley-Iowa Naked People Finder.
Retrieved April 17, 2006, from http://www.cs.hmc.edu/~fleck/naked-skin.html

Laganière, R. (2003). A step-by-step guide to the use of the Intel OpenCV library and the Microsoft
DirectShow technology. Retrieved April 17, 2006, from
http://www.site.uottawa.ca/~laganier/tutorial/opencv+directshow/

Microsoft Corporation (2006). Microsoft DirectShow 9.0. MSDN Library. Retrieved April 17, 2006,
from http://msdn.microsoft.com/library/default.asp?url=/library/en-
us/directshow/htm/directshow.asp

OpenCV Community (2006). Face Detection using OpenCV. OpenCV Library Wiki. Retrieved April
17, 2006, from http://opencvlibrary.sourceforge.net/FaceDetection

OpenCV Community (2006). What is OpenCV?. OpenCV Library Wiki. Retrieved April 17, 2006,
from http://opencvlibrary.sourceforge.net/

Peng, K., et al. (2005). A Robust Algorithm for Eye Detection on Gray Intensity Face without
Spectacles. Journal of Computer Science and Technology. Retrieved April 17, 2006, from
http://journal.info.unlp.edu.ar/Journal/journal15/papers/JCST-Oct05-3.pdf

Sherrah, J. & Gong, S. (2001). Skin Color Analysis. CVonline. Retrieved April 17, 2006, from
http://homepages.inf.ed.ac.uk/cgi/rbf/CVONLINE/entries.pl?TAG288

Wikipedia contributors (2006). DirectShow. Wikipedia, The Free Encyclopedia. Retrieved April 17,
2006, from http://en.wikipedia.org/w/index.php?title=DirectShow&oldid=48688926.