face extraction 1
TRANSCRIPT
-
8/8/2019 Face Extraction 1
1/22
Face Extraction From Live Video
SITE Technical Report TR-2006
Adam Fourney
School of Information Technology and Engineering
University of Ottawa
Ottawa, Canada, K1N 6N5
Project supervised by: Dr. Robert Laganière
Table of Contents
Introduction
Technology
    The Open Source Computer Vision Library (OpenCV)
    Microsoft's DirectShow
    The Background Segmentation Component
Architecture
Graphical User Interface
    FACE Dialog
    Input Configuration Dialog
    Output Configuration Dialog
    Graph Manager
Face Extractor
    Face Export Rules
    Modes of Operation
    Mechanism By Which Face Images are Exported
    Possible Improvements
Face Detector
    The OpenCV Face Detector
    Measurements Used to Assess Image Quality
        Inferring About Image Quality Using Haar-Classifiers
        Gaze Direction
        Motion, Skin, and Motion & Skin Content
            Pixel Motion Detection
            Skin Detection
        Quality of Lighting
            Measuring the Width of a Histogram
        Image Sharpness
        Image Dimensions and Area
    Possible Improvements
Pedestrian Tracker
    Possible Improvements
Appendix A: Building From Source
References
Introduction
The first step in any biometric face identification process is recognizing, with a high degree of
accuracy, the regions of input video frames that constitute human faces. There has been much research focused on this particular task. Thankfully, this has resulted in some very robust solutions for detecting
faces in digital images.
However, frames from live video streams typically arrive at a rate of between 15 and 30 frames per second. Each frame may contain several faces. This means that faces might be detected at a rate which is much higher than the original video frame rate. As mentioned above, these faces are destined to be input into a biometric face identification software package. This software is likely complex, and certainly requires some finite amount of time to process each face. It is very possible that the high rate of input could overwhelm the software.

Even if the face identification software is very efficient and can keep up with the high rate of incoming faces, much of the processing is wasteful. Many of the faces will belong to individuals who have already been accounted for in previous frames.
The software described in this paper aims to alleviate the situation by detecting faces as fast as
possible, but only exporting select faces for post-processing. In fact, the software aims to export one image for every pedestrian that enters the camera's field of view. This is accomplished by associating face images to individual pedestrians. Each time a new image is associated to a pedestrian, it must be compared to the best image previously associated to that individual. If the new image is an improvement, then it replaces the best image. When a pedestrian leaves the camera's field of view, the pedestrian's best image is exported.
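As a rough illustration, the per-pedestrian selection just described might be sketched as follows. The class and member names here are illustrative stand-ins, not identifiers from the project's actual source:

```cpp
#include <cassert>
#include <map>

// Hypothetical record of the best face seen so far for one pedestrian.
struct BestFace {
    double score;     // quality score: larger is better
    int frameNumber;  // frame in which this face was captured
};

// Tracks the best face per pedestrian id; offer() returns true when the
// new face replaces the stored best (i.e., when it should be exported).
class BestFaceTracker {
public:
    bool offer(int pedestrianId, double score, int frame) {
        std::map<int, BestFace>::iterator it = best_.find(pedestrianId);
        if (it == best_.end() || score > it->second.score) {
            best_[pedestrianId] = BestFace{score, frame};
            return true;  // improvement: export this image
        }
        return false;     // no improvement: discard
    }

    // Called when the pedestrian leaves the field of view; the best
    // image is exported one final time and the record is dropped.
    BestFace onExit(int pedestrianId) {
        BestFace result = best_[pedestrianId];
        best_.erase(pedestrianId);
        return result;
    }

private:
    std::map<int, BestFace> best_;
};
```

Only images that improve on the stored best trigger an export, which is what keeps the output rate far below the raw detection rate.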
Technology
The current implementation of the project relies heavily on three important technologies. Without them, the project would not have been possible. The following sections list these technologies and discuss why they are so invaluable.
The Open Source Computer Vision Library (OpenCV)
The Open Source Computer Vision Library is a development library written in the C/C++ programming languages. The library includes over 300 functions, ranging from basic image processing routines all the way up to state-of-the-art computer vision operations. As the OpenCV documentation describes:
Example applications of the OpenCV library are Human-Computer Interaction (HCI); Object Identification, Segmentation and Recognition; Face Recognition; Gesture Recognition; Motion Tracking, Ego Motion, Motion Understanding; Structure From Motion (SFM); and Mobile Robotics. (What is OpenCV, 2006)
The importance of OpenCV to this project cannot be stressed enough; the representation of all images processed by the face extraction software is defined by a structure located in one of the OpenCV libraries. OpenCV routines are used in almost every instance where image processing is done. Finally, without OpenCV's object detection routines, none of this project would have been possible; these routines are used to detect faces in the video sequences, and the results are truly amazing.
Microsoft's DirectShow
Microsoft's DirectShow is an application programming interface that allows developers to manipulate multimedia data in various useful ways. Microsoft describes the DirectShow API as
follows:
The Microsoft DirectShow application programming interface is a media-streaming architecture for the Microsoft Windows platform. Using DirectShow, your applications can perform high-quality video and audio playback or capture. (Microsoft, 2006)
DirectShow is also occasionally known by its original code name, Quartz, and was designed to replace Microsoft's earlier Video for Windows technology (DirectShow, 2006). Like Video for Windows, DirectShow provides a standardized interface for working with video input devices as well as with multimedia files. It also provides a technology called Intelligent Connect, which makes it even easier to program for a wide range of input devices and video encodings.
DirectShow is a very complicated API, with an equally complicated history. It is often criticized
as being overly complex. Perhaps the Wikipedia article describes this situation best:
DirectShow is infamous for its complexity and is often regarded by many people as one of Microsoft's most complex development libraries/APIs. A long-running semi-joke on the "Microsoft.public.win32.programmer.directx.video" newsgroup is "see you in 6 months" whenever someone wants to develop a new filter for DirectShow.
(DirectShow, 2006)
Thankfully, this project did not require the development of any new DirectShow filters, and in general, the technology seemed relatively manageable.
The Background Segmentation Component
The final technology used for the project was a background segmentation component contributed by Dr. Robert Laganière. This component is a C++ class that, among other things, is able to determine which pixels of a video frame constitute the foreground. This is accomplished by building a statistical model of the scene's background and comparing each video frame against this model. This project uses background segmentation for motion detection and object tracking. Both the face detector and pedestrian tracker components require the segmented image, which is output by the background segmentation component.
Architecture
The face extraction software is composed of six main components, as described by the
following diagram. Each component will be the subject of a lengthy discussion.
Graphical User Interface
The graphical user interface (GUI) of the face extractor software is arguably the least important
component of the entire project. For this reason, the level of detail in this section will be quite minimal. Additionally, this document focuses more on design than on usability. Thus, the following discussion will not cover the individual interface widgets, nor will it serve as a manual for anyone operating the software. Instead, it will simply discuss where certain aspects of the implementation can be found, and
what functionality should be expected.
The GUI for the face export application is composed of four main classes: the main dialog, the input configuration dialog, the output configuration dialog, and the graph manager. The graph manager is the most important (and most complex) sub-component of this part of the system. In addition to the aforementioned classes, there are a few other classes that simply provide some custom
controls to the GUI.
FACE Dialog
Header File: ./FaceDlg.h
C++ File: ./FaceDlg.cpp
Namespace:
C++ Class Name: CFaceDlg
The entire application was originally designed as a Microsoft Foundation Classes (MFC) dialog
project. Every dialog application, including this project, begins by displaying a single main dialog window. The face extractor project uses the FACE Dialog for exactly this purpose.
From this dialog, users can:
1. Configure the input settings
2. Configure the output settings
3. Start and stop video capture
4. Configure the individual settings for DirectShow capture graph pins and filters
Input Configuration Dialog
Header File: ./ConfigInputDlg.h
C++ File: ./ConfigInputDlg.cpp
Namespace:
C++ Class Name: CConfigInputDlg
The input configuration dialog allows users to select a video input device, or a file to which
video has been previously saved. The list of available input devices includes all DirectShow filters that
are in the CLSID_VideoInputDeviceCategory category. These typically include web cameras, TV tuner
cards, and video capture cards.
Output Configuration Dialog
Header File: ./ConfigOutputDlg.h
C++ File: ./ConfigOutputDlg.cpp
Namespace:
C++ Class Name: CConfigOutputDlg
Unlike input configuration, output configuration is entirely optional. These settings allow users to save the processed video sequences to a file. They also allow users to specify a directory where the exported face images are saved (currently, all images are saved in the JPEG format). If a user decides to save the video to a file, then the user is prompted for a valid file name. They may also select and configure a video compressor.
Graph Manager
Header File: ./GraphManager/GraphManager.h
C++ File: ./GraphManager/GraphManager.cpp
Namespace:
C++ Classes: GraphManager, FilterDescriptor
The GraphManager class is one of the largest and most complicated classes of the entire project.
This class is responsible for the construction and destruction of the Microsoft DirectShow capture graphs used by the application. The graph manager makes heavy use of the Intelligent Connect technology (by constructing graphs using the CaptureGraphBuilder2 interface). Therefore, it supports many different video capture devices and multimedia file encodings. In fact, the face extractor software has been tested with various brands of web cameras and at least one brand of TV tuner card (Hauppauge WinTV). Interestingly, when using the TV tuner card, Intelligent Connect is wise enough to include all filters required to control the TV tuner.
The graph manager was inspired by the SequenceProcessor class included in Dr. Laganière's OpenCV / DirectShow tutorial. In fact, both the SequenceProcessor and the GraphManager use functions defined in the file filters.h, which was also included with the tutorial. There are, however, some major differences between these two classes. The first major difference is the use of the CaptureGraphBuilder2 interface, which was described above. The second difference is that the GraphManager uses display names (rather than friendly names) to identify filters internally; this allows the system to distinguish between multiple physical devices that share the same friendly name. The final difference is that the graph manager class provides support for displaying a filter's properties or an output pin's properties. For example, to change the channel on a TV tuner device, one simply displays the properties of the TV tuner filter and selects the appropriate channel. Different devices have different properties, and these property sheets are built into the filters themselves.
So far, the discussion has focused on how the graph manager controls video input. Not surprisingly, it also controls the video output. In all cases, video is rendered directly to the screen; however, the graph manager also allows users to save the output to disk. Video can be saved in one of many supported video encodings, and in most cases, the user can specify the quality and compression settings of the video encoder.
At the time of writing, there are some outstanding issues regarding the graph manager. In particular, the graph manager has trouble constructing filter graphs when using an uncompressed video file as the source. Additionally, the ability to pause playback has not been entirely implemented; as a result, the face extractor software does not have a pause button on the main interface. Another issue that needs consideration involves the installation of filter graph event handlers: at this time, there is no clean way to forward events to the controlling window. The final issue is that there are currently no provisions for seeking within video files. Despite these issues, the graph manager provides some very powerful functionality, and the above issues will likely be resolved in the near future.
Face Extractor
Header File: ./FaceExtractor.h
C++ File: ./FaceExtractor.cpp
Namespace:
C++ Classes: Face, FaceGroup, ExtractorObserver, Extractor
The face extractor component is the next highest level of abstraction below the GUI. Essentially, it is the only interface that any programmer using the system is likely to need. The face extractor receives input video frames and exports face images according to a well-defined set of rules. All image processing is accomplished by three of the face extractor's subcomponents: the face detector, the background segmentation component, and the pedestrian tracker. The face extractor merely interprets the results of its subcomponents and uses this information to associate faces to pedestrians. This association is achieved by assigning an identifier to each instance of a face. If two face images have the same identifier, then both images are from the same individual.
The face extractor is also responsible for determining when a face image should be exported. As mentioned in the introduction, the idea of the entire project is to export only the best face captured for each pedestrian. The face extractor, however, provides slightly more flexibility. It exports a face image if any one of the following three conditions is met:
1. No previous face image has been captured for a given pedestrian.
2. The current face image is an improvement over all previously exported images.
3. If a pedestrian leaves the scene, then the best face ever captured is re-exported.
In addition to identifiers, exports are also labeled with an event. Events describe which of the above three conditions caused the export to take place.
Finally, each exported face is also given a score value. These scores are simply numerical
values that indicate the quality of the exported image. As a consequence of the export conditions, the score values of a sequence of images exported for a single individual are monotonically increasing.
Using the output of the face extractor component, developers can devise several high-level post-processing rules. For example, developers can decide to process a face as soon as its score crosses some pre-determined threshold. An alternative rule would be to process the best available face as soon as the pedestrian leaves the scene.
Face Export Rules
The three rules described in the previous section assume that faces can be uniquely associated to pedestrians. Unfortunately, the relationship between the faces returned by the face detector and the pedestrians returned by the pedestrian tracker is not usually one-to-one. For example, two people might be walking side by side, and the pedestrian tracker might incorrectly assume that they are a single moving object. In this case, the face detector might associate two faces to one pedestrian. Worse yet, the face detector might locate one face in some frames and two faces in other frames. Therefore, the face export rules must be slightly more complex than previously stated.
In total, five rules are used to determine when faces are exported. These rules operate on face groups rather than on individual face images. Face groups are simply unordered collections of face images that are all from the same video frame and are all associated to the same pedestrian. At any given time, two face groups are maintained for each pedestrian: the current face group and the historical best face group. The current face group represents the faces detected in the current video frame. The historical best face group represents the best faces ever associated to the pedestrian.

The rules given below describe the circumstances under which the historical best face group is updated to reflect the current face group. Whenever such an update occurs, all faces in the current face group are exported. In addition to this simple behavior, the face extractor re-exports a pedestrian's historical best face group whenever the pedestrian tracker indicates that the individual has left the camera's field of view. The five rules are as follows:
1. If a pedestrian is considered new, then the first face group associated to the pedestrian is considered the historical best.

2. If the current face group contains a single face image, and the historical best face group also contains a single image, then the historical best group is updated only if the new face is an improvement over the previous best image. The exported face image is assigned the same unique identifier as the image it is destined to replace.

3. If the historical best face group contains more than one image, and this number does not increase with the current face group, then update the historical best face group when the best image from the current group is better than the worst image in the historical group. This rule is necessary because it is impossible to determine the association of new faces to old faces. Therefore, it is impossible to determine which of the many faces may have improved. For this
same reason, all face images in the group are given new unique identifiers.
4. If the current face group contains more faces than the historical best face group, then automatically update the historical face group. Of all the rules, this one is perhaps the most obscure. The reason that the historical best face group is updated is that doing so causes the faces to be exported. This is important because it is impossible to determine which face in the group is new (and thus not yet exported). The current faces are all given new unique identifiers in order to ensure that none of the previously exported faces are replaced.
5. If none of the previous rules are applicable, then take no action.
These rules are designed so that the face extractor errs on the side of caution. Whenever a pedestrian confuses the face extractor, the software exports every face image that might be associated with the pedestrian. In order for these rules to capture all possible scenarios, the pedestrian tracker must also be programmed to err on the side of caution; if the tracker ever gets confused about a pedestrian, it must assign the pedestrian a new identifier and treat it as a new entity.
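A condensed sketch of these five rules is given below, under the simplifying assumption that face groups can be reduced to lists of quality scores (the real face groups carry full face records, identifiers, and events):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Decide whether the historical best face group should be replaced by the
// current face group, following the five rules described in the text.
// Groups are represented here simply as vectors of quality scores.
bool shouldUpdateHistoricalBest(const std::vector<double>& current,
                                const std::vector<double>& best) {
    if (best.empty()) return true;                  // rule 1: new pedestrian
    if (current.empty()) return false;              // rule 5: nothing to compare
    if (current.size() > best.size()) return true;  // rule 4: more faces than before
    if (best.size() == 1 && current.size() == 1)
        return current[0] > best[0];                // rule 2: single-face comparison
    // rule 3: group-to-group comparison; no face-to-face correspondence
    // is known, so compare the best new face against the worst old one.
    double bestOfCurrent = *std::max_element(current.begin(), current.end());
    double worstOfBest   = *std::min_element(best.begin(), best.end());
    return bestOfCurrent > worstOfBest;
}
```

Note how rules 3 and 4 are deliberately permissive: when the correspondence between faces is ambiguous, the sketch (like the real rules) favors exporting too much over losing a face.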
Modes of Operation
The face exporter component has two modes of operation. These modes simply control the sequencing of operations that are applied to the video frames. In the current version of the software, these modes are not accessible programmatically; instead, they are specified using compiler constants. For example, to search for faces before locating pedestrians, one would compile the application with the OPERATION_MODE constant set to the value FIND_FACES_THEN_PEDESTRIANS. However, to locate pedestrians and then search for faces, the constant should be set to FIND_PEDESTRIANS_THEN_FACES. The following sections describe these modes in detail.
FIND_FACES_THEN_PEDESTRIANS: (Suggested mode of operation)
The find faces then pedestrians mode of operation searches for faces before searching for (and tracking) pedestrians. The location of each face is then input into the pedestrian tracker in the form of a hint. Currently, the tracker uses hints to ensure that all faces are associated to a pedestrian. For example, if a face is detected and no pedestrian is nearby, then a new pedestrian record is created. This new pedestrian is described by the smallest rectangular region that contains the orphaned face. Under normal circumstances, faces are associated to whichever pedestrian yields the largest intersection with the rectangular region describing the face. This mode of operation ensures that all faces detected by the face detector have the opportunity to be processed.
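The largest-intersection association might look roughly like the following sketch. The Rect type and both function names are hypothetical stand-ins; the project's real code would use OpenCV's rectangle structures:

```cpp
#include <cassert>

// Hypothetical axis-aligned rectangle (x, y is the top-left corner).
struct Rect { int x, y, w, h; };

// Area of overlap between two rectangles (0 when they do not intersect).
int intersectionArea(const Rect& a, const Rect& b) {
    int left   = (a.x > b.x) ? a.x : b.x;
    int top    = (a.y > b.y) ? a.y : b.y;
    int right  = (a.x + a.w < b.x + b.w) ? a.x + a.w : b.x + b.w;
    int bottom = (a.y + a.h < b.y + b.h) ? a.y + a.h : b.y + b.h;
    if (right <= left || bottom <= top) return 0;
    return (right - left) * (bottom - top);
}

// Returns the index of the pedestrian whose region overlaps the face the
// most, or -1 when no pedestrian overlaps it (an "orphaned" face, which
// would trigger the creation of a new pedestrian record).
int associateFace(const Rect& face, const Rect* pedestrians, int count) {
    int bestIndex = -1, bestArea = 0;
    for (int i = 0; i < count; ++i) {
        int area = intersectionArea(face, pedestrians[i]);
        if (area > bestArea) { bestArea = area; bestIndex = i; }
    }
    return bestIndex;
}
```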
FIND_PEDESTRIANS_THEN_FACES:
The find pedestrians then faces mode of operation is based on the idea that faces should only be located in regions where pedestrians are found. This idea seems rather sound; however, the pedestrian tracker occasionally returns regions that do not encompass the entire pedestrian. In this case, the face detector may fail to detect faces that are cut off by a pedestrian's overly small bounding rectangle. This problem can be remedied by expanding the search region around each pedestrian. However, if there are several pedestrians in the scene, then the search regions may overlap; portions of the frame may be
searched twice. Finally, this mode of operation is guaranteed to call the face detection routines once per pedestrian per frame. Without careful attention to detail, the overhead of these multiple calls may be significant.

All of the above issues can be resolved, but the sources of error are numerous and the benefits are not significant. For this reason, this mode of operation is not recommended. Selecting this mode will work, but the various algorithms still need to be tuned to address the above issues.
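The compile-time selection of the two modes might be arranged as in the following sketch. The constant names follow the text, but everything else here is illustrative:

```cpp
#include <cassert>

// Illustrative mode constants; the names come from the text, but the
// surrounding code is a sketch, not the project's actual source.
#define FIND_FACES_THEN_PEDESTRIANS 1
#define FIND_PEDESTRIANS_THEN_FACES 2
#define OPERATION_MODE FIND_FACES_THEN_PEDESTRIANS

// Returns which pipeline was compiled in, so the choice can be verified.
int compiledMode() {
#if OPERATION_MODE == FIND_FACES_THEN_PEDESTRIANS
    // Detect faces first, then feed their locations to the tracker as hints.
    return FIND_FACES_THEN_PEDESTRIANS;
#else
    // Track pedestrians first, then search each region for faces.
    return FIND_PEDESTRIANS_THEN_FACES;
#endif
}
```

Because the mode is a preprocessor constant, switching it requires recompiling the application, which matches the text's note that the modes are not accessible programmatically.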
Mechanism By Which Face Images are Exported
Up until now, the discussion has simply mentioned that face images should be exported, but has not explained the mechanism by which this occurs. The face extractor component implements the observer / observable design pattern. Developers interested in the output of the face extractor simply register their classes as observers. Whenever a face image is to be exported, the face extractor notifies the observers by calling their updateFace() method. The arguments provided to this method are as follows:
color
    This is the RGB color of the rectangle drawn around the face in the output video. This information is not required for the main functionality, but it improves the usability of the GUI (it allows users to easily associate exported images with faces outlined in the video).

previousFace
    This is a pointer to the face record being replaced by the newFace. If this value is NULL, then no face has previously been exported on behalf of the pedestrian (i.e., indicating a new pedestrian).

newFace
    This is a pointer to the face record being exported. If this value is NULL, then it indicates that the pedestrian has left the scene. In this case, the previousFace is provided as a record of the best face exported on behalf of the pedestrian.
The individual observers are responsible for determining which of the export events are important, and which ones to ignore.
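A minimal sketch of this observer / observable arrangement is shown below. The names Face, ExtractorObserver, and Extractor follow the text, but the signatures are simplified (for instance, the color argument of updateFace() is omitted), so this should be read as an illustration of the pattern rather than the project's interface:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for the project's face record.
struct Face { int id; double score; };

class ExtractorObserver {
public:
    virtual ~ExtractorObserver() {}
    // previousFace is null for a new pedestrian; newFace is null when
    // the pedestrian has left the scene.
    virtual void updateFace(const Face* previousFace, const Face* newFace) = 0;
};

class Extractor {
public:
    void addObserver(ExtractorObserver* obs) { observers_.push_back(obs); }
    void exportFace(const Face* previous, const Face* current) {
        for (std::size_t i = 0; i < observers_.size(); ++i)
            observers_[i]->updateFace(previous, current);
    }
private:
    std::vector<ExtractorObserver*> observers_;
};

// Example observer that only reacts to "pedestrian left" events, i.e. the
// re-export of the best face when newFace is null.
class ExitLogger : public ExtractorObserver {
public:
    int exits = 0;
    void updateFace(const Face* previousFace, const Face* newFace) override {
        if (newFace == nullptr && previousFace != nullptr)
            ++exits;  // final best face re-exported on exit
    }
};
```

An observer like ExitLogger implements exactly the "process the best face when the pedestrian leaves" post-processing rule mentioned earlier, while ignoring intermediate exports.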
Possible Improvements
The five export rules used by the face extractor may be overly cautious; when many individuals
are erroneously grouped into a single pedestrian, the rate of exported images is quite high. There are at least two ways to correct this problem:

1. one can improve the pedestrian tracker so that there are fewer grouping mistakes, or
2. a mechanism can be developed to help pair faces across face groups. For example, such a mechanism could use a person's shirt color to help match faces belonging to the same individual.
Additionally, the face extractor should offer better support for the find pedestrians then faces mode of operation. This would allow the software to function on much higher resolution images, provided that the pedestrians do not occupy the entire viewing area.
Finally, the face extractor occasionally fails to export faces when pedestrians leave the scene. It is not known whether this problem is the result of a bug in the face extractor or in the pedestrian tracker. This issue will hopefully be resolved in the near future.
Face Detector
Header File: ./FaceDetector/FaceDetect.h
C++ File: ./FaceDetector/FaceDetect.cpp
Namespace: FaceDetector
C++ Classes: Face, Detector
The face detector is the most important and most complex of all the project components. It is responsible not only for detecting faces in image sequences, but also for assessing their quality. The OpenCV library provided the mechanism by which faces can be detected, but the mechanism used to assess image quality needed to be built from the ground up. Measuring image quality was certainly the most challenging aspect of the entire project.
The OpenCV Face Detector
In OpenCV, face detection is accomplished by invoking a single library function, cvHaarDetectObjects. This function uses a technique known as cascading Haar classifiers in order to recognize certain objects (in this case, faces). A tutorial on the OpenCV documentation wiki describes this technique as follows:
First, a classifier (namely a cascade of boosted classifiers working with haar-like features) is trained with a few hundreds of sample views of a particular object (i.e., a face or a car), called positive examples, that are scaled to the same size (say, 20x20), and negative examples - arbitrary images of the same size.

After a classifier is trained, it can be applied to a region of interest (of the same size as used during the training) in an input image. The classifier outputs a "1" if the region is likely to show the object (i.e., face/car), and "0" otherwise. To search for the object in the whole image one can move the search window across the image and check every location
using the classifier. The classifier is designed so that it can be easily "resized" in order to be able to find the objects of interest at different sizes, which is more efficient than resizing the image itself. So, to find an object of an unknown size in the image the scan procedure should be done several times at different scales.
(Face Detection using OpenCV, 2006)
Currently, the face detector component uses several of these classifiers to identify faces that are oriented in different directions. In order to improve the runtime of the face detector, the classifiers are applied to a half-scale copy of each input frame. This means that faces smaller than 40x40 pixels will not be detected. However, the software does allow developers to decide whether scaling should take place.
Measurements Used to Assess Image Quality
In order to assess the quality of face images, a series of metrics is used to measure various aspects of the input. These measurements are then fed into a linear function that returns the final score. The scores increase in value as the image quality improves; therefore, larger scores are better than smaller scores. The following list enumerates the measurements used for this purpose:
1. The particular Haar-Classifier that detected the face
2. Gaze direction
3. Motion and skin content
4. Quality of lighting
5. Sharpness of the image
6. The size of the detected face
Each of the above criteria will be discussed in great detail in this section.
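A hypothetical example of such a linear scoring function is given below. The weights are placeholders chosen only to illustrate the shape of the computation; they are not the values used by the project:

```cpp
#include <cassert>

// Linear combination of the six quality measurements. Each argument is
// assumed to be normalized so that larger values indicate better quality;
// the weights are illustrative placeholders.
double faceScore(double classifierRank, double gaze, double motionSkin,
                 double lighting, double sharpness, double size) {
    const double w[6] = {1.0, 2.0, 1.0, 1.0, 1.0, 0.5};
    return w[0] * classifierRank + w[1] * gaze + w[2] * motionSkin +
           w[3] * lighting + w[4] * sharpness + w[5] * size;
}
```

Because the function is linear and every weight is positive, improving any single measurement (a more frontal gaze, say) strictly increases the final score, which is what makes the exported scores comparable across frames.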
Inferring About Image Quality Using Haar-Classifiers
The face detector uses several classifiers to detect faces in the video frames. Some classifiers are more accurate than others, and some classifiers detect certain gaze directions but not others. For example, results returned from a frontal face classifier are more desirable than those returned by a profile face classifier.
Gaze Direction
In the introduction to this paper, a biometric face recognition scenario was used to introduce the concept of face extraction from live video. Face recognition packages perform best when individuals are facing towards the camera. For this reason, gaze direction is an enormously important metric in determining the quality of a face image. Since gaze direction is simply the direction in which an individual is looking, it can be estimated by locating an individual's eyes in relation to the rest of the head. For example, if an individual is looking directly forward, the midpoint
The above discussion assumes that pixels exhibiting motion can be easily detected. It also
assumes that pixels representing human skin are equally simple to identify. Thankfully, this is in fact
the case. The following sub-sections describe how this is accomplished.
Pixel Motion Detection
The face detector constructs a motion history image to determine which pixels have recently
experienced motion. Rather than identifying a color or a shade of gray, each pixel in a motion history
image encodes the most recent time at which the pixel was considered part of the foreground. This is
accomplished by numbering input video frames and using the output of the background segmentation
component to selectively update the pixels of the motion history image. For the purpose of this project,
pixels that have experienced motion are exactly those that were recently considered part of the
foreground. These pixels can be easily identified by a simple thresholding operation applied to the
motion history image.
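The update and thresholding steps above can be sketched as follows. This is a minimal illustration, assuming the background segmentation component supplies a binary foreground mask for each numbered frame (the report does not show the actual implementation):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Each entry of `mhi` stores the frame number at which that pixel was
// last considered part of the foreground (0 = never).
void updateMHI(std::vector<int>& mhi, const std::vector<bool>& foreground,
               int frameNo) {
    for (std::size_t i = 0; i < mhi.size(); ++i)
        if (foreground[i]) mhi[i] = frameNo;
}

// Simple thresholding: a pixel "has motion" if it was foreground within
// the last `duration` frames.
std::vector<bool> motionMask(const std::vector<int>& mhi, int frameNo,
                             int duration) {
    std::vector<bool> mask(mhi.size());
    for (std::size_t i = 0; i < mhi.size(); ++i)
        mask[i] = (mhi[i] > 0) && (frameNo - mhi[i]) <= duration;
    return mask;
}
```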
Skin Detection
The skin detector used by the face extraction software is based on the research of Margaret M.
Fleck and David A. Forsyth, as described in their paper Naked People Skin Filter. This filter uses
texture and color information to determine which pixels likely represent human skin. The main idea is
that pixels representing skin are generally tightly confined to a small region of the hue-saturation color
space; in particular, skin tends to range from red to yellow in hue, and it tends to be only moderately
saturated. Additionally, skin tends to be rather smooth in texture. This is well represented by areas
where the variance in pixel intensity is low (although texture is not considered in the current
implementation of the face detector).
Currently, the skin detector uses the Hue/Saturation/Luminosity color space, in which the hue
ranges from 0 (red) to 360 (red again), and in which the saturation ranges from 0 to 1. Hues between
0 and 38 tend to be described as reddish-yellow, while values between 330 and 360 are considered
reddish-blue. During informal experimentation, pixels representing skin fell within one of these two
ranges. Additionally, pixels representing skin tended to be more saturated as the hue approached what
might be considered yellow. These results closely agree with the results described in the
aforementioned paper, although a different color space and pair of regions were used there. The particular
regions used by the face detector are described in the following table:
Region           Hue        Saturation
Reddish-Yellow   0 - 38     0 - 1.0
Reddish-Blue     330 - 359  0 - 0.6
The above values were heavily influenced by an article entitled Skin
Color Analysis, authored by Jamie Sherrah and Shaogang Gong.
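The per-pixel test implied by the table can be sketched directly. This assumes hue in degrees [0, 360) and saturation in [0, 1], matching the ranges stated above:

```cpp
#include <cassert>

// Classify a pixel as skin if its hue/saturation falls in either of the
// two regions from the table above (reddish-yellow or reddish-blue).
bool isSkinPixel(double hue, double sat) {
    bool reddishYellow = (hue >= 0.0 && hue <= 38.0) &&
                         (sat >= 0.0 && sat <= 1.0);
    bool reddishBlue   = (hue >= 330.0 && hue <= 359.0) &&
                         (sat >= 0.0 && sat <= 0.6);
    return reddishYellow || reddishBlue;
}
```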
Once the pixels representing skin are identified (producing the skin mask image), a second filter
-
8/8/2019 Face Extraction 1
16/22
14
is applied to all neighboring pixels. This second filter uses a less strict set of rules in an attempt to
intelligently close gaps that might occur in the original skin mask.
Interestingly, images illuminated by natural or fluorescent lighting tended to be shifted toward
the blue portion of the color spectrum, while images illuminated by incandescent lighting were shifted
towards reddish-orange. For this reason, the above regions are slightly larger than would be necessary
if the light source could be controlled.
Quality of Lighting
When developing the face detector component, one of the most frustrating phenomena occurred
when testing the application after a long night of programming: at night, the software was carefully
tuned, and measurements like skin detection and face feature detection worked wonderfully. In
daylight, however, the lighting was harsh, the colors were shifted, and the carefully adjusted
measurements needed to be re-calibrated. The harsh lighting also greatly confused the edge detector
(horizontal gradient map) used when detecting face features. For this reason, the quality of lighting is
an important metric for assessing image quality. Even if the face extraction software could cope with
poor lighting, it is not known how biometric face recognition software (or other post-processing) might
cope with such images.
The first step in assessing the quality of lighting is converting the color input frames to
grayscale intensity images. Once a grayscale image has been acquired, a histogram of the image's pixel
intensities is computed. The general assumption is that the quality of lighting is directly proportional to
the width of this histogram. This is not always a valid assumption; it fails when a subject is not evenly
illuminated.
In order to help address the problems caused by uneven illumination, one can rely on the
assumption that faces are symmetric across a central vertical axis. This central axis is determined when
the face detector locates the face features. If the lighting is soft and even, then the distribution of
grayscale values on one side of an individual's face should be similar to the distribution of values on the
other side of the face. This comparison is done by computing the histograms of the left and right halves
of the face, normalizing each of these histograms, and then computing their intersection. The final
lighting score is computed by multiplying the weight of this intersection by the width of the original
histogram.
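The symmetry comparison can be sketched as a normalized-histogram intersection. This is a minimal illustration of the computation described above, not the report's actual code:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Intersection of two normalized histograms (each summing to 1):
// the sum of bin-wise minima. Identical distributions yield 1.0.
double histogramIntersection(const std::vector<double>& left,
                             const std::vector<double>& right) {
    double sum = 0.0;
    for (std::size_t i = 0; i < left.size(); ++i)
        sum += std::min(left[i], right[i]);
    return sum;
}

// Final lighting score: symmetry weight times the width of the
// whole-face intensity histogram.
double lightingScore(double intersection, double histogramWidth) {
    return intersection * histogramWidth;
}
```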
Measuring the Width of a Histogram
In the above discussion, histogram width was not well defined. For the purpose of this
application, a histogram's width is defined as the smallest number of consecutive histogram bins that
account for 95% of the pixels in the input image. To compute this value, a greedy algorithm is used.
This algorithm locates the mean of the histogram, which becomes the first bin added to a region called
the histogram body. Histogram bins with indices higher than the largest index in the body are said to be
in the head of the histogram. Similarly, bins with lower indices are said to be in the tail.
The greedy algorithm iteratively grows the body of the histogram by claiming the lowest index
from the head, or the largest index from the tail. If the head of the histogram accounts for more pixels
than the tail, the body expands in the direction of the head. Otherwise, the body expands in the
direction of the tail. This expansion continues until the body accounts for 95% of all pixels.
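The greedy procedure above can be sketched as follows. The "mean of the histogram" is interpreted here as the count-weighted mean bin index, which the report leaves unspecified:

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Smallest run of consecutive bins (the "body") covering at least
// `coverage` (e.g. 0.95) of all pixels, grown greedily from the mean bin
// toward whichever side currently holds more mass.
int histogramWidth(const std::vector<double>& hist, double coverage) {
    double total = std::accumulate(hist.begin(), hist.end(), 0.0);
    double weighted = 0.0;
    for (std::size_t i = 0; i < hist.size(); ++i) weighted += i * hist[i];
    int lo = static_cast<int>(weighted / total);  // mean bin index
    int hi = lo;
    double body = hist[lo];
    while (body < coverage * total) {
        double headMass = 0.0, tailMass = 0.0;
        for (std::size_t i = hi + 1; i < hist.size(); ++i) headMass += hist[i];
        for (int i = 0; i < lo; ++i) tailMass += hist[i];
        if (headMass >= tailMass && hi + 1 < static_cast<int>(hist.size()))
            body += hist[++hi];       // claim lowest index of the head
        else if (lo > 0)
            body += hist[--lo];       // claim largest index of the tail
        else
            break;                    // nothing left to claim
    }
    return hi - lo + 1;
}
```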
Image Sharpness
In addition to the quality of lighting, image sharpness is another good indicator of image
quality. In this sense, the adjective sharp is used as the antonym of blurry. Images can be blurred for
several reasons, including motion blur or a camera that is incorrectly focused. In all cases, blurred
images are certainly less desirable than sharp images. The challenge is finding a viable method for
measuring image sharpness.
Currently, the face extraction software attempts to measure the amount of high-frequency
content contained in an image in order to judge its sharpness. In images, high-frequency content can be
defined as content that encodes edges, lines, and areas where the pixel intensities change significantly
over short distances. With faces, the high-frequency content tends to concentrate around the eyes, lips,
and other face features.
In order to find high-frequency content, the software uses the Laplacian operator as a highpass
filter. Pixels that survive the highpass filter (have values greater than some pre-determined threshold)
are counted, and the result is divided by the total image area. Thus, the current measure of sharpness is
simply the percentage of pixels that are considered to encode edges, lines, and other high-frequency
content.
In general images, this approach may not always be valid; one perfectly focused image may
contain fewer edges than another complex, but blurry, image. Thankfully, the face extraction software
should only ever compare face images to similar images acquired in previous frames. Thus, any change
in the high-frequency content can generally be attributed to such things as motion blur.
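The Laplacian-based measure can be sketched with a discrete 4-neighbour Laplacian; the exact kernel and threshold used by the software are not stated in the report, so both are illustrative here:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sharpness as the fraction of pixels whose 4-neighbour Laplacian
// response exceeds `threshold`. `img` is a grayscale image stored in
// row-major order with width `w` and height `h`; border pixels are
// skipped for simplicity.
double sharpness(const std::vector<double>& img, int w, int h,
                 double threshold) {
    int strong = 0;
    for (int y = 1; y < h - 1; ++y) {
        for (int x = 1; x < w - 1; ++x) {
            double c = img[y * w + x];
            double lap = img[(y - 1) * w + x] + img[(y + 1) * w + x]
                       + img[y * w + x - 1] + img[y * w + x + 1] - 4.0 * c;
            if (std::fabs(lap) > threshold) ++strong;
        }
    }
    return static_cast<double>(strong) / (w * h);
}
```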
Image Dimensions and Area
The final assumption about face image quality is that larger images are better than smaller
images. It should be noted that the OpenCV face detector returns regions that are square. If the
face regions were not square, then it would likely be the case that certain aspect ratios would be better
than others.
Unfortunately, one cannot always assume that large images are best. In fact, large images often
indicate a false positive returned from the OpenCV face detector. Often, only a certain range of sizes
will be reasonable for a given scene. For example, a security camera responsible for observing a large
lobby will expect smaller face images than a personal web camera sitting on top of a user's computer
monitor. This suggests that the range of acceptable image sizes should be configurable by the end-user
of the system. Unfortunately, at this time, these parameters can only be changed by modifying
constants defined in the face detector source code.
Another option worth mentioning is that it might be possible for the software to learn on its own
about the range of acceptable image sizes. For example, an average face size can be determined by
considering the dimensions of all face regions that have scored well in the other face quality metrics.
Once this is accomplished, candidate face regions that differ significantly from this model can be
discarded. At this time, this option has not been explored.
Possible Improvements
The main problem with the face detector is that an image's score is not always a reasonable
indicator of quality. This is not the fault of the individual quality metrics, but rather of the
function that combines these individual values into the final score. As mentioned earlier, this
function is nothing more than a simple weighted sum of the aforementioned quality metrics. The
weights associated with each metric were chosen almost arbitrarily, then hand-tuned over a series
of tests until the results seemed adequate. There is almost certainly a better approach to be taken.
Pedestrian Tracker
Header File: ./PedestrianTracker/PedestrianTracker.h
C++ File: ./PedestrianTracker/PedestrianTracker.cpp
Namespace: PedestrianTracker
C++ Classes: Pedestrian, Tracker
The pedestrian tracker is an important component of the system, but could be the subject of an
entire project. For this reason, the tracker was kept as simple as possible while still producing
acceptable results. The current implementation processes each frame of the video sequence. This
processing operates in four distinct phases:
1. The first phase uses the background segmentation component to identify the foreground
pixels. It then locates all of the connected foreground components and their bounding
rectangles.
2. The second phase attempts to associate each of the connected components with a
pedestrian detected in the previous video frame. A pedestrian is nothing more than a
rectangular region of interest, and the association is made by determining which
pedestrian each component intersects with the greatest area. If no association is possible,
then the component is considered a new pedestrian.
3. The third phase of the tracker groups the connected components based upon the pedestrian
with which each is associated. A bounding rectangle is computed for each of these groups.
These large bounding rectangles are then assigned to the appropriate pedestrians (replacing
their bounding rectangles from the previous frame).
4. The fourth, and final, phase determines the foreground-pixel density of each of the resultant
pedestrians. This density is simply the percentage of pixels within the pedestrian's current
bounding rectangle that belong to the foreground. If this density is low, then it is assumed
that the results are incorrect. In this case, the tracker re-divides the pedestrian into its
individual components.
The above algorithm is based upon the assumption that pedestrians in the current frame
should be associated with nearby pedestrians from the previous frame. It also assumes that pedestrians
may be composed of several distinct foreground components.
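The greatest-intersection-area association used in phase 2 can be sketched as follows. The `Rect` type and function names here are illustrative, not the report's actual Pedestrian/Tracker classes:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Rect { int x, y, w, h; };

// Area of overlap between two axis-aligned rectangles (0 if disjoint).
int intersectionArea(const Rect& a, const Rect& b) {
    int ix1 = std::max(a.x, b.x);
    int iy1 = std::max(a.y, b.y);
    int ix2 = std::min(a.x + a.w, b.x + b.w);
    int iy2 = std::min(a.y + a.h, b.y + b.h);
    if (ix2 <= ix1 || iy2 <= iy1) return 0;
    return (ix2 - ix1) * (iy2 - iy1);
}

// Index of the previous-frame pedestrian that the component overlaps
// with the greatest area, or -1 when there is no overlap at all (in
// which case the component becomes a new pedestrian).
int associate(const Rect& component, const std::vector<Rect>& pedestrians) {
    int best = -1, bestArea = 0;
    for (std::size_t i = 0; i < pedestrians.size(); ++i) {
        int area = intersectionArea(component, pedestrians[i]);
        if (area > bestArea) { bestArea = area; best = static_cast<int>(i); }
    }
    return best;
}
```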
Possible Improvements
As mentioned above, the current generation of the pedestrian tracker is very simple. It can, and
probably should, be replaced at a later date. At present, the tracker is slightly over-zealous about
merging objects. It also has difficulty tracking fast-moving objects, although pedestrians do not
usually move very fast. Finally, the tracker fails to recognize when objects merge, although this
latter issue could probably be resolved easily.
Appendix A: Building From Source
Assumptions:
The following discussion assumes that OpenCV beta 5 is installed in the following default location on
Windows XP Service Pack 2:
C:\Program Files\OpenCV
This implies that the Haar classifier data sets are located as follows:
C:\Program Files\OpenCV\data\haarcascades
Finally, it assumes that OpenCV has been properly installed, and that the system path has been
modified to include:
C:\Program Files\OpenCV\bin
It also assumes that developers are using Microsoft Visual Studio 2003.
If the Assumptions Fail:
If any of the above assumptions fail, then let <OpenCVDir> be the location where OpenCV
was actually installed. The following modifications are then necessary:
Changes to ./FaceDetector/FaceDetect.h
The following constants need to be modified to point to the appropriate Haar datasets:
#define DEFAULT_FRONTAL_CLASSIFIER_PATH \
    <OpenCVDir>\\data\\haarcascades\\haarcascade_frontalface_default.xml

#define DEFAULT_PROFILE_CLASSIFIER_PATH \
    <OpenCVDir>\\data\\haarcascades\\haarcascade_profileface.xml
Changes to the Visual Studio Project
Project -> Properties -> C/C++ -> General -> Additional Include Directories must be set to:

"<OpenCVDir>\otherlibs\highgui";
"<OpenCVDir>\filters\ProxyTrans";
"<OpenCVDir>\cxcore\include";
"<OpenCVDir>\cvaux\include";
"<OpenCVDir>\cv\include"
Also,
Project -> Properties -> Linker -> General -> Additional Library Directories must be set to:
"\lib";
Finally, just to be thorough, make sure that Project -> Properties -> Linker -> Input -> Additional
Dependencies is set to:

strmiids.lib quartz.lib cv.lib cxcore.lib highgui.lib
Running the Binaries on Systems without OpenCV
If one wishes to run the face extractor software on a system where OpenCV is not installed, then the
following files must be included in the same directory as the binary executable FACE.exe:
cv097.dll
cv097d.dll
cvaux097.dll
cxcore097.dll
cxcore097d.dll
haarcascade_frontalface_default.xml
haarcascade_profileface.xml
highgui096d.dll
highgui097.dll
proxytrans.ax
Additionally, the constants DEFAULT_FRONTAL_CLASSIFIER_PATH and
DEFAULT_PROFILE_CLASSIFIER_PATH defined in ./FaceDetector/FaceDetect.h must be modified to
load the classifiers from the local ./ directory.
Finally, the proxy transform filter must be registered. This can be achieved by executing the following
command at the command shell:
regsvr32 proxytrans.ax
References
Fleck, M. & Forsyth, D. (n.d.). Naked People Skin Filter. Berkeley-Iowa Naked People Finder.
Retrieved April 17, 2006, from http://www.cs.hmc.edu/~fleck/naked-skin.html

Laganière, R. (2003). A step-by-step guide to the use of the Intel OpenCV library and the Microsoft
DirectShow technology. Retrieved April 17, 2006, from
http://www.site.uottawa.ca/~laganier/tutorial/opencv+directshow/

Microsoft Corporation (2006). Microsoft DirectShow 9.0. MSDN Library. Retrieved April 17, 2006,
from http://msdn.microsoft.com/library/default.asp?url=/library/en-
us/directshow/htm/directshow.asp

OpenCV Community (2006). Face Detection using OpenCV. OpenCV Library Wiki. Retrieved April
17, 2006, from http://opencvlibrary.sourceforge.net/FaceDetection

OpenCV Community (2006). What is OpenCV?. OpenCV Library Wiki. Retrieved April 17, 2006,
from http://opencvlibrary.sourceforge.net/

Peng, K., et al. (2005). A Robust Algorithm for Eye Detection on Gray Intensity Face without
Spectacles. Journal of Computer Science and Technology. Retrieved April 17, 2006, from
http://journal.info.unlp.edu.ar/Journal/journal15/papers/JCST-Oct05-3.pdf

Sherrah, J. & Gong, S. (2001). Skin Color Analysis. CVonline. Retrieved April 17, 2006, from
http://homepages.inf.ed.ac.uk/cgi/rbf/CVONLINE/entries.pl?TAG288

Wikipedia contributors (2006). DirectShow. Wikipedia, The Free Encyclopedia. Retrieved April 17,
2006, from http://en.wikipedia.org/w/index.php?title=DirectShow&oldid=48688926.