Pattern Recognition 36 (2003) 2501–2511
www.elsevier.com/locate/patcog

Gradient feature extraction for classification-based face detection

Lin-Lin Huang (a,*), Akinobu Shimizu (a), Yoshihiro Hagihara (b), Hidefumi Kobatake (a)

a Graduate School of BASE, Tokyo University of Agriculture and Technology, 2-24-16 Naka-cho, Koganei-shi, Tokyo 184-8588, Japan

b Faculty of Engineering, Iwate University, 3-18-8 Ueda, Morioka, Iwate 020-8550, Japan

Received 8 August 2002; accepted 2 April 2003

Abstract

Face detection from cluttered images is challenging due to the wide variability of face appearances and the complexity of image backgrounds. This paper proposes a classification-based method for locating frontal faces in cluttered images. To improve the detection performance, we extract gradient direction features from local window images as the input of the underlying two-class classifier. The gradient direction representation provides better discrimination ability than the image intensity, and we show that the combination of gradient directionality and intensity outperforms the gradient feature alone. The underlying classifier is a polynomial neural network (PNN) on a reduced feature subspace learned by principal component analysis (PCA). The incorporation of the residual of subspace projection into the PNN was shown to improve the classification performance. The classifier is trained on samples of face and non-face images to discriminate between the two classes. The superior detection performance of the proposed method is justified in experiments on a large number of images.
© 2003 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved. doi:10.1016/S0031-3203(03)00130-4

Keywords: Face detection; Classification; Gradient direction; Polynomial neural network; PCA

1. Introduction

Detecting human faces from scene images has been an active research topic because it has many potential applications, ranging from biometric person identification, security surveillance, and image and video database retrieval to intelligent human–computer interfaces [1,2]. An obvious application of face detection is to serve as the pre-processor of face recognition systems, locating faces prior to recognition. Face detection from cluttered images is very challenging due to the wide variability of face appearances and the complexity of image backgrounds. The variability of face images lies in the diversity of individuals, the pose and expressions of the face, attachments, lighting conditions, etc. Although significant progress has been made in the last two decades, there is still a gap between the requirements and the actual performance.

* Corresponding author. Tel./fax: +81-42-388-7438.
E-mail address: [email protected] (L.-L. Huang).

The methods proposed for face detection so far can be roughly divided into two categories: feature-based methods and classification-based ones. Feature-based methods make explicit use of the general structure of faces and the geometric relationship of face components (features) to hierarchically group feature hypotheses into faces [3–10]. Usually, they are implemented in a multi-stage template-based or rule-based system, and low-level features are often extracted by edge detection. To improve the detection speed, color information is often used to preclude non-face image regions. Since the performance of feature-based methods relies on the reliable location of facial features, they are susceptible to partial occlusion, excessive deformation, and image degradation.

In classification-based methods, face detection is considered as a two-class classification problem. On training with a large number of face and non-face samples, the underlying classifier is able to accurately classify any test pattern to one of the two classes.



The face/non-face sample or test pattern is extracted from a scanning window of the image. The window is shifted over multiply scaled images so as to detect faces of variable sizes and locations. Taking advantage of the state of the art of pattern classification, a large variety of classifiers and learning methods are available to perform classification-based detection. The classification methods include statistical methods [11–16], artificial neural networks [17–21], support vector machines (SVMs) [22,23], etc. The method of Sung and Poggio [24] can be viewed as a hybrid of a statistical method and a neural network because it builds subspace models for the face and non-face classes and then combines the subspaces using a neural network for decision.

Compared to feature-based ones, classification-based methods are computationally expensive due to the exhaustive examination of re-scaled and shifted windows. Nevertheless, due to the discrimination ability of learning-based classification, they not only perform well on clean images, but also provide high detection accuracy on low-quality images, as reflected in the published results in the literature. In particular, the best reported face detection results have almost all been produced by neural networks or statistical learning methods [24,19,20]. The computational complexity of classification-based methods can be alleviated by using geometric features or color information to filter out non-face regions.

The performance of classification-based face detection is influenced by the representation of the image pattern. Generally, statistical pattern classification and neural networks require that the input pattern be represented as a feature vector of fixed dimensionality. That is, the local image in the scanning window is normalized to a fixed size, and then the intensity values compose a raw feature vector. The classification-based detection methods so far have almost exclusively used the image intensity values as the input features of the classifier. Some works have considered feature extraction from image intensity, basically under the subspace projection framework [11–13,24,25]. On the other hand, using higher-order local statistics such as gradients and local moments in the feature vector is expected to give better classification performance. Features extracted in this way can also be combined with subspace projection.

This paper proposes a classification-based face detection method using gradient feature extraction. The underlying classifier is a polynomial neural network (PNN) [26,27], which is a single-layer network performing nonlinear classification by using the polynomial expansion of pattern features as the network input. For the classification of high-dimensional data, the complexity of the PNN is managed by dimensionality reduction via principal component analysis (PCA). Using image intensity as the raw feature, the PNN has yielded promising detection performance [28]. To further improve the detection performance, we exploit the directionality of the image gradient, which is a quite intuitive and stable shape descriptor. Unlike feature-based detection methods, which use edges as structural primitives, we measure the gradient directions numerically and store them in a feature vector for classification. We have tried three options of gradient features: the gradient map, directional decomposition, and the combination of directional decomposition and image intensity. The directional decomposition of the gradient map significantly improves the detection performance, while the direct use of the gradient map shows no advantage over the image intensity. The best result is produced by combining the directional decomposition and the intensity into a single feature vector.

The rest of this paper is organized as follows. Section 2 gives an overview of the face detection system; Section 3 describes the gradient feature extraction approach; Section 4 explains the PNN structure and the learning algorithm; the experimental results are presented in Section 5, and Section 6 provides concluding remarks.

2. System overview

The diagram of the face detection system is shown in Fig. 1. As in most previous works, to detect faces of variable sizes and locations, the original input image is re-scaled to multiple images of variable sizes, and on each re-scaled image, the local images in scanning windows of standard size are examined exhaustively by a face/non-face classifier. If a local image is judged to be a face pattern, the corresponding area in the original image signifies a detected face. Considering that in a 2D image different faces do not overlap,¹ the detected faces overlapping within the same scale as well as across different scales compete with one another in an arbitration procedure. After mapping the detected face regions in re-scaled images back to the original image, whenever more than one face region overlaps, only the region of maximum face likelihood is retained.
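The scan-and-arbitrate procedure can be summarized in Python. The sketch below is illustrative, not the authors' implementation: the classifier, scale list, window step, and the overlap test are all placeholder assumptions.

```python
import numpy as np

def detect_faces(image, classify, scales, win=20, step=2, threshold=0.5):
    """Multi-scale sliding-window detection (illustrative sketch).

    `classify` is assumed to map a win x win window to a face
    likelihood in [0, 1]; `scales` is the list of resize factors.
    """
    candidates = []
    for s in scales:
        # Nearest-neighbour resize keeps the sketch dependency-free.
        h, w = int(image.shape[0] * s), int(image.shape[1] * s)
        ys = (np.arange(h) / s).astype(int).clip(0, image.shape[0] - 1)
        xs = (np.arange(w) / s).astype(int).clip(0, image.shape[1] - 1)
        scaled = image[np.ix_(ys, xs)]
        for y in range(0, h - win + 1, step):
            for x in range(0, w - win + 1, step):
                p = classify(scaled[y:y + win, x:x + win])
                if p > threshold:
                    # Map the window back to original image coordinates.
                    candidates.append((p, x / s, y / s, win / s))
    return arbitrate(candidates)

def arbitrate(candidates):
    """Keep only the highest-likelihood box among overlapping ones."""
    kept = []
    for c in sorted(candidates, reverse=True):  # best likelihood first
        if all(not overlap(c, k) for k in kept):
            kept.append(c)
    return kept

def overlap(a, b):
    # Boxes are (likelihood, x, y, size); test intersection of squares.
    ax, ay, asz = a[1:]
    bx, by, bsz = b[1:]
    return ax < bx + bsz and bx < ax + asz and ay < by + bsz and by < ay + asz
```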

In our system, the size of the scanning window is set to 20 × 20 pixels. The image in a scanning window is called a window image or local image. Each window image is classified as a face or non-face pattern, or assigned a face likelihood measure, by the underlying classifier. Before classification, the local image undergoes pre-processing to reduce the intensity variation, and feature extraction to give a better representation. As shown in Fig. 2, in pre-processing, an optimally fitted intensity plane is first subtracted from the local image so as to compensate for the inhomogeneity of illumination. Then the contrast of intensity is normalized by histogram equalization. These two steps are effective in alleviating the variation of lighting conditions.
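A minimal sketch of this pre-processing, assuming the "optimally fitted intensity plane" is the least-squares plane a·x + b·y + c (the paper does not spell out the fitting criterion), followed by standard cumulative-histogram equalization:

```python
import numpy as np

def preprocess(window):
    """Illumination correction + histogram equalization (sketch)."""
    h, w = window.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Least-squares plane fit: columns are x, y, 1.
    A = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)], axis=1)
    coeff, *_ = np.linalg.lstsq(A, window.ravel().astype(float), rcond=None)
    plane = (A @ coeff).reshape(h, w)
    corrected = window - plane              # remove the illumination gradient

    # Histogram equalization on the corrected image, rescaled to 0..255.
    g = corrected - corrected.min()
    g = (255 * g / max(g.max(), 1e-9)).astype(np.uint8)
    hist = np.bincount(g.ravel(), minlength=256)
    cdf = hist.cumsum() / g.size
    return (255 * cdf[g]).astype(np.uint8)
```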

After pre-processing, the intensity values of the local image can serve as feature values for classification. To achieve better performance, more discriminative features can be extracted by local correlation or global subspace projection.

¹ In the case of occlusion, the faces do not overlap in the 2D image, because the occluded face is invisible and cannot be detected by the classifier.


Fig. 1. Diagram of the face detection system. The local images in scanning windows of multi-scaled images are input to the two-class classifier for face/non-face discrimination.

Fig. 2. Pre-processing of local image. On each row, the three images from left to right are the local window image, the one after illumination correction, and the one after histogram equalization, respectively.

We adopt a gradient directional decomposition strategy for extracting direction features. The experimental results show that the directional decomposition of the gradient yields superior performance, while the direct use of the gradient map does not perform well. The details of gradient feature extraction are given in Section 3.

The classifier for face/non-face discrimination is a PNN. The PNN has one output unit that gives the measure of face likelihood. This likelihood measure is used in the within-scale and inter-scale arbitration of detected faces. For classifying a local image pattern, when the likelihood measure exceeds a threshold, we state that the local image is a face pattern; otherwise it is not. The output unit of the PNN takes the pattern features and their polynomial expansion as inputs. Since the number of polynomial terms in high dimensionality is huge, the number of features is reduced by PCA. The details of the PNN are given in Section 4.

3. Gradient feature extraction

Most classification-based methods have used the intensity values of window images as the input features of the classifier. Edge detection has been frequently applied in feature-based face detection for facial feature extraction. However, measuring the directionality of edges in feature space has rarely been tried in classification-based detection. The directionality of edges is a visually prominent feature and is highly stable for characterizing shapes. Specifically, in frontal or nearly frontal face images, despite the variation of face identity and expressions, the facial contour is approximately an oval shape, and the eyes and mouth are approximately horizontal lines. Therefore, input features composed of both the directionality of edges and intensity values should be more informative and robust to illumination and facial expression changes.

In our approach, we extract direction features via the directional decomposition of the gradient map. The direct use of the gradient map as the input of a classifier does not yield promising detection performance because the directionality is not measured explicitly. The use of directional decomposition of the gradient map in face detection is inspired by a previous work on on-line character recognition which reported promising results using directional gradient features [29]. The gradient direction features are obtained in three steps: gradient computation, directional decomposition, and feature reduction.

After the pre-processing of the local window image f(x, y) (20 × 20 pixels), the gradient vector g(x, y) = [g_x, g_y]^T is computed at each pixel location using the Sobel operator. The two masks of the Sobel operator, for computing the horizontal and vertical gradient components respectively, are shown in Fig. 3. Accordingly, the two components are computed by

g_x(x, y) = f(x+1, y-1) + 2f(x+1, y) + f(x+1, y+1) - f(x-1, y-1) - 2f(x-1, y) - f(x-1, y+1),

g_y(x, y) = f(x-1, y+1) + 2f(x, y+1) + f(x+1, y+1) - f(x-1, y-1) - 2f(x, y-1) - f(x+1, y-1).   (1)

The gradient vector g(x, y) is stored in a gradient map containing two components, g_x(x, y) and g_y(x, y), which correspond to two sub-images. The gradient vector can also be equivalently represented by its magnitude (vector length) and direction angle in 2-D space.

Fig. 3. Sobel masks for gradient computation.
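Eq. (1) translates directly into code. The sketch below transcribes the two component formulas; the treatment of border pixels (replicate padding here) is our assumption, as the paper does not specify it.

```python
import numpy as np

def sobel_gradient(f):
    """Gradient components g_x, g_y of Eq. (1) (sketch)."""
    p = np.pad(f.astype(float), 1, mode="edge")
    # Shifted view: sh(dx, dy) is f(x + dx, y + dy), with y as the
    # row index and x as the column index.
    def sh(dx, dy):
        return p[1 + dy:1 + dy + f.shape[0], 1 + dx:1 + dx + f.shape[1]]

    gx = (sh(1, -1) + 2 * sh(1, 0) + sh(1, 1)
          - sh(-1, -1) - 2 * sh(-1, 0) - sh(-1, 1))
    gy = (sh(-1, 1) + 2 * sh(0, 1) + sh(1, 1)
          - sh(-1, -1) - 2 * sh(0, -1) - sh(1, -1))
    return gx, gy
```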


Fig. 4. Eight chaincode directions.

Fig. 5. Directional decomposition of gradient vector.


For directional decomposition of the gradient map, a good way is to decompose each gradient vector into components in multiple discrete directions [29]. Accordingly, we decompose the gradient vector into components in the eight chaincode directions shown in Fig. 4. If a gradient vector lies between two discrete directions, it is decomposed into two components along those two directions. For example, in Fig. 5, a gradient vector g in region M2 (between directions 2 and 3) is decomposed into two components g2 and g3. The magnitudes of g2 and g3 are assigned to the directional sub-images f2(x, y) and f3(x, y), respectively. If the direction of a gradient vector coincides with a discrete chaincode direction, the magnitude of the vector is exclusively assigned to the corresponding directional sub-image.
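A sketch of this decomposition, assuming chaincode direction d points at angle (d - 1) × 45°; each gradient vector is split between its two neighbouring discrete directions so that the two components re-sum to the original vector:

```python
import numpy as np

def decompose_directions(gx, gy):
    """Decompose gradient vectors into 8 chaincode directions (sketch)."""
    h, w = gx.shape
    sub = np.zeros((8, h, w))                 # f_d(x, y), d = 1..8
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)
    step = np.pi / 4
    lo = np.floor(ang / step).astype(int)     # lower neighbouring direction
    frac = ang / step - lo                    # position between the two
    # Components along the two adjacent unit directions (45 deg apart),
    # g = a * u_lo + b * u_hi, derived from the sine rule; if the vector
    # coincides with a discrete direction (frac = 0), b vanishes.
    a = mag * np.sin((1 - frac) * step) / np.sin(step)
    b = mag * np.sin(frac * step) / np.sin(step)
    for d in range(8):
        sub[d] += np.where(lo % 8 == d, a, 0)
        sub[(d + 1) % 8] += np.where(lo % 8 == d, b, 0)
    return sub                                # sub[d - 1] holds direction d
```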

The directional decomposition of the gradient map results in eight directional sub-images f_d(x, y), d = 1, ..., 8, each with 20 × 20 values. To generate a feature vector of moderate dimensionality, each pair of sub-images of opposite directions is merged into a single orientation sub-image.

Fig. 6. The intensity image (upper-left), two gradient components (upper-right), and four orientation sub-images (lower).

Though the 8-direction representation provides better discrimination ability than the 4-orientation representation, the reduction to four sub-images significantly reduces the complexity of classification. On each orientation sub-image, the 20 × 20 pixel values are reduced to 10 × 10 by block averaging. Further, some corner pixels in the 10 × 10 map are excluded because they rarely constitute facial parts: three pixels are excluded from each upper corner, and one pixel from each lower corner. We then construct a 368-dimensional feature vector by concatenating the pixel values of the four sub-images.
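The reduction from eight 20 × 20 sub-images to the 368-D vector can be sketched as follows; which three pixels are dropped at each upper corner is our assumption, since the paper gives only the counts:

```python
import numpy as np

def orientation_features(sub):
    """Merge opposite directions, block-average, mask corners (sketch).

    Produces the 368-D vector: 4 orientation maps x 92 values.
    """
    feats = []
    for d in range(4):
        ori = sub[d] + sub[d + 4]                       # opposite pair
        # 2x2 block averaging: 20x20 -> 10x10.
        m = ori.reshape(10, 2, 10, 2).mean(axis=(1, 3))
        keep = np.ones((10, 10), dtype=bool)
        keep[0, 0] = keep[0, 1] = keep[1, 0] = False    # upper-left corner
        keep[0, -1] = keep[0, -2] = keep[1, -1] = False # upper-right corner
        keep[-1, 0] = keep[-1, -1] = False              # lower corners
        feats.append(m[keep])                           # 92 values
    return np.concatenate(feats)                        # 368-D
```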

Fig. 6 shows an example of gradient feature extraction. The intensity image and the two component sub-images of the gradient (g_x and g_y) are shown in the upper row. The four orientation sub-images (un-reduced) are shown in the lower row; from left to right, f1(x, y), f2(x, y), f3(x, y), and f4(x, y).

In addition to the directional decomposition of the gradient, we test the face detection performance of various feature vectors in our experiments. The feature vectors are listed in the following.

• Feature vector Inten: intensity values of the pre-processed local image, 368-D, obtained by excluding 32 corner pixels (10 from each upper corner and 6 from each lower corner) from the 20 × 20 pixels. This feature vector has been tested previously by the authors [28].

• Feature vector Sobel: masked Sobel gradient map, 184-D. Each of the two gradient component sub-images is reduced to 10 × 10 pixels and then eight corner pixels are excluded.

• Feature vector Decom1: directional decomposition of the gradient map, 368-D. The eight directional sub-images are merged into four orientation ones, each reduced to 10 × 10 pixels with eight corner pixels excluded.

• Feature vector Decom2: modified directional decomposition, 276-D. Considering that in face images vertical edges are less stable, we abandon the directional sub-images f1(x, y) and f5(x, y). The feature values are extracted from three orientation sub-images.

• Feature vector Combin: combined feature vector of Decom2 and reduced intensity. The 20 × 20 intensity image is compressed to 10 × 10 and masked to 92 values, so the dimensionality of the combined feature vector is 368-D. Though the gradient direction feature outperforms the image intensity, the intensity may be complementary to the direction feature. (An illustrative assembly of this vector follows the list.)


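For illustration, the Combin vector could be assembled as below; the corner mask reuses the same hypothetical 92-pixel pattern as the earlier orientation-feature sketch:

```python
import numpy as np

def corner_mask():
    """92-pixel keep-mask on a 10x10 map (assumed corner pattern)."""
    keep = np.ones((10, 10), dtype=bool)
    keep[0, 0] = keep[0, 1] = keep[1, 0] = False
    keep[0, -1] = keep[0, -2] = keep[1, -1] = False
    keep[-1, 0] = keep[-1, -1] = False
    return keep

def combin_vector(window, sub):
    """Assemble the Combin feature vector (illustrative sketch).

    `window` is the pre-processed 20x20 image, `sub` the 8 directional
    sub-images; the horizontal pair f1/f5 is dropped, as in Decom2.
    """
    mask = corner_mask()
    feats = []
    for d in (1, 2, 3):                        # skip d = 0 (the f1/f5 pair)
        ori = sub[d] + sub[d + 4]
        feats.append(ori.reshape(10, 2, 10, 2).mean(axis=(1, 3))[mask])
    inten = window.reshape(10, 2, 10, 2).mean(axis=(1, 3))[mask]
    feats.append(inten)                        # reduced intensity, 92 values
    return np.concatenate(feats)               # 3 * 92 + 92 = 368-D
```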

4. Polynomial neural network

The PNN is a single-layer network which uses as inputs not only the feature measurements of the input pattern but also polynomial terms of those measurements. For face detection, the PNN has one single output unit for two-class classification. The architecture of the PNN is shown in Fig. 7. The number of polynomial terms, i.e., the number of inputs to the output unit, increases rapidly with the number of features. Nevertheless, the size of a second-order (binomial) network is acceptable and the classification performance is promising. Compared to the multilayer perceptron (MLP), the PNN is faster in learning and less susceptible to local minima because of its single-layer architecture. In our previous experiments, the PNN was shown to outperform the MLP in face detection [28].

Denoting the input pattern as a feature vector \mathbf{x} = (x_1, x_2, \ldots, x_d)^T, the output of the PNN is computed by

y(\mathbf{x}) = s\left( \sum_{i=1}^{d} w_i x_i + \sum_{i=1}^{d} \sum_{j=i}^{d} w_{ij} x_i x_j + w_0 \right),   (2)

where s(\cdot) is the sigmoid activation function

s(a) = \frac{1}{1 + \exp(-a)}.

In our problem, the input vector comprises the intensity values, gradient values, or directional strengths of the local image.

Fig. 7. Architecture of the PNN for two-class classification. The polynomials include both linear features and binomial terms, and the residual of subspace projection is used as well.

To reduce the complexity of the PNN for high-dimensional data, the dimensionality of the raw feature vector is reduced by PCA, wherein the raw vector is projected onto a linear subspace:

z_j = (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\phi}_j, \quad j = 1, 2, \ldots, m,   (3)

where z_j denotes the projection of \mathbf{x} onto the j-th axis of the subspace, \boldsymbol{\phi}_j denotes the eigenvector of that axis, and \boldsymbol{\mu} denotes the mean vector of the pattern space. The eigenvectors are computed by PCA (the K–L transform) on a dataset of face samples. The eigenvectors corresponding to the m largest eigenvalues are selected such that the error of pattern reconstruction from the subspace is minimized.
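A compact sketch of the subspace learning and the projection of Eq. (3); `face_vectors` is a hypothetical N x d array of face feature vectors:

```python
import numpy as np

def learn_subspace(face_vectors, m):
    """PCA subspace from face samples (sketch): mean and m eigenvectors."""
    mu = face_vectors.mean(axis=0)
    # Eigen-decomposition of the sample covariance matrix.
    cov = np.cov(face_vectors - mu, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1][:m]        # m largest eigenvalues
    return mu, vecs[:, order]                 # columns are the axes phi_j

def project(x, mu, phi):
    """Projections z_j = (x - mu)^T phi_j of Eq. (3)."""
    return (x - mu) @ phi                     # m-dimensional z
```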

Using the projections of the image pattern onto the subspace as the features, the output of the PNN takes the form

y(\mathbf{x}) = s\left( \sum_{i=1}^{m} w_i z_i + \sum_{i=1}^{m} \sum_{j=i}^{m} w_{ij} z_i z_j + w_0 \right).   (4)

In this form, the reconstruction error of the feature space (the distance from the feature subspace, DFFS) is totally ignored. The DFFS is an important indicator of the deviation of the input pattern from the subspace. When the subspace is learned from face samples, the DFFS indicates how dissimilar a pattern is from a face. Hence, we integrate the DFFS into the PNN in the hope of improving the detection performance:

y(\mathbf{x}) = s\left( \sum_{i=1}^{m} w_i z_i + \sum_{i=1}^{m} \sum_{j=i}^{m} w_{ij} z_i z_j + w_D D_f + w_0 \right) = s(\mathbf{w}^T \mathbf{z}_E + w_0),   (5)

where \mathbf{w} denotes the vector composed of all connecting weights and \mathbf{z}_E is the vector composed of all the inputs to the output unit, including the DFFS:

D_f = \|\mathbf{x} - \boldsymbol{\mu}\|^2 - \sum_{j=1}^{m} z_j^2.   (6)
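Eqs. (5) and (6) in code form, a sketch with hypothetical weight arrays; only the upper triangle of the quadratic weights is used, matching the binomial expansion z_i z_j with j >= i:

```python
import numpy as np

def dffs(x, mu, z):
    """Distance from feature space, Eq. (6): ||x - mu||^2 - sum z_j^2."""
    return np.sum((x - mu) ** 2) - np.sum(z ** 2)

def pnn_output(z, Df, w_lin, w_quad, w_D, w0):
    """PNN output of Eq. (5) with the DFFS term (sketch)."""
    quad = 0.0
    m = len(z)
    for i in range(m):
        for j in range(i, m):                 # upper triangle only
            quad += w_quad[i, j] * z[i] * z[j]
    a = w_lin @ z + quad + w_D * Df + w0
    return 1.0 / (1.0 + np.exp(-a))           # sigmoid s(a)
```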

The connecting weights of the PNN are trained by supervised learning on a dataset of face and non-face samples with the aim of minimizing the empirical loss of mean square error (MSE):

E = \sum_{n=1}^{N_x} [y(\mathbf{x}_n) - t_n]^2 + \lambda \|\mathbf{w}\|^2 = \sum_{n=1}^{N_x} E_n,   (7)

where t_n denotes the target output for the input pattern \mathbf{x}_n, with value 1 for a face pattern and 0 for a non-face pattern; \lambda is a coefficient of weight decay, which helps to improve the generalization performance.

The connecting weights are updated by stochastic gradient descent [30]. The example patterns are fed into the network repeatedly to update the weights until the empirical loss reaches a local minimum. On an input pattern \mathbf{z}_n = \mathbf{z}(\mathbf{x}_n), the connecting weights are updated by gradient descent:

\mathbf{w}(n+1) = \mathbf{w}(n) - \eta(n) \frac{\partial E_n}{\partial \mathbf{w}}, \qquad w_0(n+1) = w_0(n) - \eta(n) \frac{\partial E_n}{\partial w_0},   (8)


where \eta(n) is a learning rate, which is small enough and decreases progressively. The partial derivatives are computed by

\frac{\partial E_n}{\partial \mathbf{w}} = [y(\mathbf{x}_n) - t_n]\, y(1 - y)\, \mathbf{z}_E + \frac{\lambda}{N_x} \mathbf{w}, \qquad \frac{\partial E_n}{\partial w_0} = [y(\mathbf{x}_n) - t_n]\, y(1 - y).   (9)

Since the PNN is a single-layer network, the training process is quite fast and the result is not influenced by the random initialization of weights.
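A sketch of this training procedure following Eqs. (7)–(9); the epoch count and the learning-rate schedule are illustrative choices, not values from the paper:

```python
import numpy as np

def train_pnn(Z_E, targets, lam, epochs=20, eta0=0.1):
    """Stochastic gradient descent for the PNN, Eqs. (7)-(9) (sketch).

    Rows of Z_E are the expanded inputs z_E (projections, binomial
    terms and the DFFS); targets are 1 for face, 0 for non-face.
    """
    N, D = Z_E.shape
    w = np.zeros(D)                           # single layer: init is uncritical
    w0 = 0.0
    step = 0
    for _ in range(epochs):
        for n in np.random.permutation(N):
            step += 1
            eta = eta0 / (1.0 + 1e-4 * step)  # slowly decreasing rate
            y = 1.0 / (1.0 + np.exp(-(w @ Z_E[n] + w0)))
            err = (y - targets[n]) * y * (1.0 - y)
            w -= eta * (err * Z_E[n] + (lam / N) * w)   # Eq. (9)
            w0 -= eta * err
    return w, w0
```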

5. Experimental results

To test the performance of gradient features for face detection, we run experiments on a large number of simple and complex images. The face/non-face samples or local image patterns are represented by a feature vector and classified by a PNN, which assigns a face likelihood. The PNN is trained with a large number of face and non-face samples.

5.1. Image databases

Two types of images were used in our experiments. The type-1 images have clear faces and simple backgrounds, while the type-2 images have varying numbers of faces, varying face sizes, and varying clarity. The type-1 images were downloaded from websites of multiple sources, and the 123 type-2 images are contained in the test set of [19].²

We used 2987 type-1 images (Train Set1, containing 2990 faces) for collecting face samples, among which 501 images were also used for collecting non-face samples. Non-face samples were also collected from 14 type-2 images which contain no faces but constitute various scenes and buildings. The images for collecting non-face samples are contained in a set called Train Set2. Another 270 type-1 images (Test Set1) and the remaining 109 type-2 images (Test Set2) were used in testing.

In collecting face samples from type-1 images, the face boxes were manually located using a mouse pointer. Each box is automatically adjusted to a square. The local image within the face box is normalized to 20 × 20 pixels. After pre-processing and feature extraction, the feature values (intensity values or gradient features) are stored. The square face box is also varied in aspect ratio and size to generate four variations. In addition, the mirror reflection of a face image about the vertical axis gives another variation. Combining the aspect ratio/size/reflection variations, a face image gives 10 variations of face patterns. In total, 29,900 face patterns were extracted from 2990 face images. Besides training the face/non-face classifier, the face samples were first used to compute the feature subspace by PCA.

² Test set 1 of Ref. [19] contains 130 images. We did not use the 7 images that contain extremely large or small faces.

The non-face samples were collected from the images of Train Set2 with a preliminary classifier to identify the background patterns that resemble faces, as done in previous works [24,19]. The non-face samples were collected in three phases. In the first phase, local window images in the background area³ of 228 type-1 images in Train Set2 were compared with the mean vector of the face samples, and the local images whose Euclidean distance falls under a threshold were considered confusing non-face samples. Under the feature representation Inten, 44,644 non-face samples were collected in the first phase. These samples are used to train the first-phase PNN of each feature representation by extracting the respective features from the collected local images. In the second and third phases, the local window images are classified by the PNN trained with previously collected samples, and so the local images must be represented in the same feature vector as the PNN.

The first-phase PNN was used to classify the local images of another 273 type-1 images in Train Set2, wherein the patterns with PNN output higher than a threshold were collected as non-face samples. About 20,000 non-face samples were gathered in this phase. The first-phase and second-phase non-face samples were used together to train a second-phase PNN. Then, in the third phase, the second-phase PNN was used to classify the local images of the 14 type-2 images in Train Set2. About 1000 non-face samples were collected and used together with the previous samples to train a third-phase PNN, which is used for face detection on test images.

In non-face sample collection and face detection, the type-1 images were normalized to 10 scales, starting from 0.1 and increasing by a factor of 1.21. The type-2 images have smaller faces, so the scales start from 0.2 and increase by the same factor of 1.21. As a result, the 10th scale of type-2 images ends at 1.11. This implies that faces as small as 18 × 18 pixels (normalized to 20 × 20 pixels at scale 1.11) can be detected.
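The scale arithmetic can be verified directly:

```python
# Ten scales for type-2 images: 0.2 * 1.21**k, k = 0..9.
scales = [0.2 * 1.21 ** k for k in range(10)]
print(round(scales[-1], 2))    # 1.11, the largest scale
print(round(20 / scales[-1]))  # 18: a 20x20 window at scale 1.11
                               # corresponds to an 18x18 face
```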

The 270 simple images of Test Set1 contain 270 faces. In scanning the images at 10 scales (starting from 0.1), there are in total 26,885,679 shifted windows to be classified. The 109 images of Test Set2 contain 487 faces in total, and there are 77,827,098 shifted windows in 10 scales (starting from 0.2). In evaluating the detection performance, the detection rate is the percentage of correctly detected faces with respect to the total number of faces, while the false positive (false alarm) rate is the ratio of false acceptances to the total number of scanned windows.

5.2. Detection results

The five feature vectors described in Section 3 are used to represent the face/non-face samples and local images. Each feature representation has a linear subspace learned by PCA from the face samples represented in that feature vector. The dimensionality of the subspace (the number of principal components) is varied so as to test the effect of subspace dimensionality on the detection performance.

³ In images containing faces, if the scanning window does not overlap largely with a face box, the local image is considered to be in the background area.


Fig. 8. Detection performances of five feature vectors on the images of Test Set2.


First, to fairly compare the performance of the five feature vectors, the dimensionality of the feature subspace was set to the same number, m = 100. The performance was tested on the images of Test Set2. The ROC (receiver operating characteristic) curves of the trade-off between correct detection rate and false positive rate under variable decision thresholds are shown in Fig. 8. From the results, we can see that the directional decomposition of the gradient (Decom1 and Decom2) significantly improves the detection performance compared to the intensity (Inten) and the gradient map (Sobel), while the direct use of the gradient map shows no advantage over the intensity. The exclusion of the horizontal directional sub-images in the directional decomposition (Decom2) seems not to influence the detection performance. This is because the facial features are dominated by horizontal edges (vertical gradients). The best performance is given by the combination of directional decomposition and image intensity (Combin), which significantly outperforms the directional decomposition alone.

The combined feature vector Combin was further tested with variable subspace dimensionality in PNN classification. The ROC curves for subspace dimensionalities m = 100, 150, 180 are shown in Fig. 9. The results show that the performance at subspace dimensionalities PC150 and PC180 is superior to that at PC100, while PC150 and PC180 do not differ in performance. The detection rates and false positive rates on Test Set1 and Test Set2 are listed in Table 1.

Fig. 9. Detection performances of feature vector Combin on the images of Test Set2.

We can see that on the simple images of Test Set1, the detection rate is 100% and the false positive rate is very low. On Test Set2, the detection rate is lower and the false positive rate is higher. This is because the images in Test Set2 mostly have low-resolution face appearance and complex backgrounds.

Some examples of face detection with the feature vector Combin-PC150 are shown in Figs. 10, 11, and 12. Fig. 10 shows examples from Test Set1, while Figs. 11 and 12 show examples from Test Set2. From these examples, we can see that the classification-based method on the combined intensity and gradient direction feature is quite robust against low image quality and variations of face identity, expression, lighting condition, attachments, etc. The missed faces are inherently ambiguous or rotated excessively. The false positives, on the other hand, mostly resemble human faces when viewed in isolation.

5.3. Comparison with other systems

For comparison with other systems reported in the literature, we mention the method proposed in [24], which models the distributions of face and non-face patterns and uses a multi-layer perceptron (MLP) to make the final decision. The detection results of [24] have been considered to be among the best for face detection from complex backgrounds [1,2]. The results of their system on a set of 23 complex images, along with the results of our method on the same images, are listed in Table 2. From the results, we can see that our method yields a higher detection rate with fewer false positives.

Our method also consumes much less computational resources than that of [24]. Specifically, for processing a local window image of 19 × 19 pixels, the classifier of [24] needs more than 254,700 multiplications, while our method (Combin-PC100) needs only 41,850 multiplications for processing a local image of 20 × 20 pixels. Therefore, our method is much faster in detection while its performance is competitive.


Table 1
Detection results of combined feature vector Combin

           Test Set1                                  Test Set2
PNN        Det. rate  False pos.  False pos. rate     Det. rate  False pos.  False pos. rate
PC100      100%       17          6.3 × 10⁻⁷          86.04%     91          3.3 × 10⁻⁶
PC150      100%       3           1.1 × 10⁻⁷          86.04%     52          6.7 × 10⁻⁷
PC180      100%       2           0.7 × 10⁻⁷          86.04%     53          6.8 × 10⁻⁷

PC = number of principal components.

Fig. 10. Examples of face detection on the images of Test Set1.

Fig. 11. Examples of face detection on the images of Test Set2.


Fig. 12. More examples of face detection on the images of Test Set2.

Table 2
Detection results on 23 test images of Ref. [24]

Method          True positives   Detection rate   False positives
[24]            126              84.56%           13
Combin-PC100    128              85.23%           11


6. Conclusion

We proposed in this paper a classification-based approach for locating frontal or nearly frontal faces in cluttered images. To improve the detection performance, we extract gradient direction features from local window images as the input of the underlying two-class classifier. The directional decomposition of the gradient provides better discrimination ability than the image intensity and the gradient map, and it was shown that the combination of gradient direction and intensity further improves the detection performance. In experiments of face detection on images with simple and complex backgrounds, the proposed method yielded superior performance.


References

[1] E. Hjelmås, B.K. Low, Face detection: a survey, Comput. Vision Image Understanding 83 (2001) 236–274.
[2] M.H. Yang, D.J. Kriegman, N. Ahuja, Detecting faces in images: a survey, IEEE Trans. Pattern Anal. Mach. Intell. 24 (1) (2002) 34–58.
[3] K.C. Yow, R. Cipolla, Feature-based human face detection, Image Vision Comput. 15 (9) (1997) 713–735.
[4] J. Miao, B. Yin, K. Wang, L. Shen, X. Chen, A hierarchical multi-scale and multi-angle system for human face detection in a complex background using gravity-center template, Pattern Recognition 32 (1999) 1237–1248.
[5] T. Kondo, H. Yan, Automatic human face detection and recognition under non-uniform illumination, Pattern Recognition 32 (1999) 1707–1718.
[6] D. Maio, D. Maltoni, Real-time face location on gray-level static images, Pattern Recognition 33 (2000) 1525–1539.
[7] C. Han, H. Liao, G. Yu, L. Chen, Fast face detection via morphology-based pre-processing, Pattern Recognition 33 (2000) 1701–1712.
[8] C. Lin, K. Fan, Triangle-based approach to the detection of human face, Pattern Recognition 34 (2001) 1271–1284.
[9] K. Wong, K. Lam, W. Siu, An efficient algorithm for human face detection and facial feature extraction under different conditions, Pattern Recognition 34 (2001) 1993–2004.
[10] H. Yao, W. Gao, Face detection and location based on skin chrominance and lip chrominance transformation from color images, Pattern Recognition 34 (2001) 1555–1564.
[11] A. Pentland, B. Moghaddam, T. Starner, View-based and modular eigenspaces for face recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, USA, 1994, pp. 84–91.
[12] M.S. Lew, N. Huijsmans, Information theory and face detection, in: Proceedings of the 13th International Conference on Pattern Recognition, San Francisco, USA, 1996, Vol. III, pp. 601–605.
[13] B. Moghaddam, A. Pentland, Probabilistic visual learning for object representation, IEEE Trans. Pattern Anal. Mach. Intell. 19 (7) (1997) 696–720.
[14] A.J. Colmenarez, T.S. Huang, Face detection with information-based maximum discrimination, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Puerto Rico, 1997, pp. 782–787.
[15] H. Schneiderman, T. Kanade, Probabilistic modeling of local appearance and spatial relationships for object recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Queensland, Australia, 1998, pp. 45–51.
[16] L. Meng, T. Nguyen, D. Castanon, An image-based Bayesian framework for face detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, South Carolina, USA, 2000, pp. 302–307.
[17] P. Juell, R. Marsh, A hierarchical neural network for human face detection, Pattern Recognition 32 (3) (1996) 781–787.
[18] S.H. Lin, S.Y. Kung, L.J. Lin, Face recognition/detection by probabilistic decision-based neural network, IEEE Trans. Neural Networks 8 (1) (1997) 114–132.
[19] H.A. Rowley, S. Baluja, T. Kanade, Neural network-based face detection, IEEE Trans. Pattern Anal. Mach. Intell. 20 (1) (1998) 23–38.
[20] R. Feraud, O.J. Bernier, J.E. Viallet, M. Collobert, A fast and accurate face detector based on neural networks, IEEE Trans. Pattern Anal. Mach. Intell. 23 (1) (2001) 42–53.
[21] L.L. Huang, A. Shimizu, Y. Hagihara, H. Kobatake, Face detection using a modified radial basis function network, in: Proceedings of the 16th International Conference on Pattern Recognition, Quebec, Canada, 2002, pp. 342–345.
[22] E. Osuna, R. Freund, F. Girosi, Training support vector machines: an application to face detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Puerto Rico, 1997, pp. 130–136.
[23] M.H. Yang, D. Roth, N. Ahuja, Face detection using large margin classifiers, in: Proceedings of the International Conference on Image Processing, Thessaloniki, Greece, 2001, pp. 665–668.
[24] K.K. Sung, T. Poggio, Example-based learning for view-based human face detection, IEEE Trans. Pattern Anal. Mach. Intell. 20 (1) (1998) 39–50.
[25] Q. Song, J. Robinson, A feature space for face image processing, in: Proceedings of the 15th International Conference on Pattern Recognition, Barcelona, Spain, 2000, Vol. II, pp. 97–100.
[26] J. Schürmann, Pattern Classification: A Unified View of Statistical Pattern Recognition and Neural Networks, Wiley Interscience, New York, 1996.
[27] U. Kreßel, J. Schürmann, Pattern classification techniques based on function approximation, in: H. Bunke, P.S.P. Wang (Eds.), Handbook of Character Recognition and Document Image Analysis, World Scientific, Singapore, 1997, pp. 49–78.
[28] L.L. Huang, A. Shimizu, Y. Hagihara, H. Kobatake, Face detection from cluttered images using a polynomial neural network, Neurocomputing 51 (2003) 197–211.
[29] A. Kawamura, et al., On-line recognition of freely handwritten Japanese characters using directional feature densities, in: Proceedings of the 11th International Conference on Pattern Recognition, The Hague, Netherlands, 1992, Vol. II, pp. 183–186.
[30] H. Robbins, S. Monro, A stochastic approximation method, Ann. Math. Stat. 22 (1951) 400–407.

About the Author—LIN-LIN HUANG was born in April 1968. She received the B.S. degree from Wuhan University, the M.E. degree from Beijing Polytechnic University, and the Ph.D. degree from Tokyo University of Agriculture and Technology, in 1989, 1994, and 2002, respectively. She was a lecturer at Northern Jiaotong University, Beijing, China, from 1994 to 1998. Currently she is a research associate at Tokyo University of Agriculture and Technology. Her research interests include pattern recognition, image processing, and computer vision.

About the Author—AKINOBU SHIMIZU was born in October 1965. He received his B.E. and Ph.D. degrees from the Graduate School of Engineering, Nagoya University, in 1989 and 1994, respectively. He became a research associate at Nagoya University in 1994, and has been an associate professor in the Graduate School of Bio-Applications and Systems Engineering, Tokyo University of Agriculture and Technology since 1998.


His research interests include image processing and analysis. He is a member of the Japanese Society of Medical Imaging Technology, the Japanese Society for Medical and Biological Engineering, the Japan Society of Computer Aided Diagnosis of Medical Images, and the IEEE.

About the Author—YOSHIHIRO HAGIHARA was born in Kanagawa Prefecture, Japan, in May 1964. He received the B.E. and Ph.D. degrees from Tokyo University of Agriculture and Technology in 1990 and 1996, respectively. From 1993 to 1997 he worked as a researcher with the Systems Development Laboratory, Hitachi, Ltd. From 1997 to 2002 he worked as a research associate at Tokyo University of Agriculture and Technology. In 2002 he joined Iwate University, where he is now a Lecturer. His research interests range from pattern recognition to image processing with industrial applications.

About the Author—HIDEFUMI KOBATAKE was born in November 1943. He received the B.E., M.E., and Ph.D. degrees from The University of Tokyo, Japan, in 1967, 1969, and 1972, respectively. He is now a Professor at the Graduate School of Bio-Applications and Systems Engineering, Tokyo University of Agriculture and Technology, Tokyo, Japan. His research activities are in the areas of speech processing, image processing, and applications of digital signal processing. He has received several awards, including a 1987 Society of Instrument and Control Engineers Best Monograph Award and a 1998 Three Dimensional Image Conference Best Paper Award. He is a member of the IEEE, the Society of Instrument and Control Engineers, the Acoustical Society of Japan, etc.