
1

Face Detection - Advanced Topics in Information Processing

Junhong Liu

Agenda

• Introduction of Face Detection

• Detecting Faces in A Single Image

• Face Image Databases

• A Bayesian Discriminating Features Method

• Conclusions

2

Introduction of Face Detection

• Definition: Given an arbitrary image, the goal of face detection is to determine whether or not there are any faces in the image and, if present, return the image location and extent of each face.

• Numerous methods have been proposed to detect faces in a single intensity or color image; Yang et al. surveyed the face detection methods published before 2001 [YKA].

• To introduce recent methods and applications for detecting faces, this presentation collects techniques, mainly published after 2001, from IEEE Transactions on Pattern Analysis and Machine Intelligence, Journal of Pattern Recognition, Pattern Recognition Letters, and Proc. Pattern Recognition.

• Challenges: face detection from a single image is a challenging task because of variability in scale, location, orientation, pose, etc. The challenges associated with face detection can be attributed to the following factors:

3

Introduction of Face Detection (Cont.)

(1) Pose. The images of a face vary due to the relative camera-face pose (frontal, 45°, profile, upside down), and some facial features such as an eye or the nose may become partially or wholly occluded.

(2) Presence or absence of structural components. Facial features such as beards, mustaches, and glasses may or may not be present and there is a great deal of variability among these components including shape, color, and size.

(3) Facial expression. The appearance of faces is directly affected by a person's facial expression.

(4) Occlusion. Faces may be partially occluded by other objects.

(5) Image orientation. Face images directly vary for different rotations about the camera's optical axis.

(6) Imaging conditions. When the image is formed, factors such as lighting (spectra, source distribution, and intensity) and camera characteristics (sensor response, lenses) affect the appearance of a face.

4

Introduction of Face Detection (Cont.)

Closely related problems of face detection:

(1) Face localization aims to determine the image position of a single face.

(2) Facial feature detection detects the presence and location of features: eyes, nose, nostrils, eyebrows, mouth, lips, ears, etc.

(3) Face recognition or face identification compares an input image against a database and reports a match, if any.

(4) Face authentication verifies the claim of the identity of an individual in an input image.

(5) Face tracking methods estimate the location and possibly the orientation of a face in an image sequence in real time.

(6) Facial expression recognition concerns identifying the affective states (happy, sad, disgusted, etc.) of humans.

5

Detecting Faces in A Single Image

Recent techniques to detect faces in a single image are classified into four categories:

(1) Knowledge-based methods. These rule-based methods encode human knowledge of what constitutes a typical face. The rules capture the relationships between facial features. These methods are mainly for face localization.

(2) Feature invariant approaches. These algorithms aim to find structural features that exist even when the pose, viewpoint, or lighting conditions vary, and then use these to locate faces. These methods are mainly for face localization.

(3) Template matching methods. Several standard patterns of a face are stored to describe the face as a whole or the facial features separately. The correlations between an input image and the stored patterns are computed for detection. These methods have been used for localization and detection.

(4) Appearance-based methods. Models (or templates) are learned from a set of training images that should capture the representative variability of facial appearance. These learned models are then used for detection. These methods are designed mainly for face detection.

6

Detecting Faces In A Single Image: Knowledge-Based Top-Down Methods (1)

• In this approach, face detection methods are developed based on the rules derived from the researcher's knowledge of human faces.

• Simple rules describe the features of a face and their relationships; e.g., a face often appears in an image with two eyes symmetric to each other, a nose, and a mouth.

• The relationships between features can be represented by their relative distances and positions.

(1) Facial features in an input image are extracted first;
(2) Face candidates are identified based on the coded rules;
(3) A verification process is usually applied to reduce false detections.
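As a toy illustration of how such coded rules might look, the sketch below checks a few geometric constraints on candidate feature locations; the rules, thresholds, and function name are hypothetical, not taken from any cited method.

```python
def plausible_face(eyes, nose, mouth, tol=0.25):
    """Toy rule check (hypothetical thresholds): two eyes roughly level
    and symmetric about a vertical midline, nose and mouth below them.
    Coordinates are (x, y) pixels, with y increasing downward."""
    (lx, ly), (rx, ry) = eyes
    eye_dist = abs(rx - lx)
    if eye_dist == 0:
        return False
    midline = (lx + rx) / 2.0
    level = abs(ry - ly) < tol * eye_dist               # eyes roughly level
    nose_ok = (abs(nose[0] - midline) < tol * eye_dist
               and nose[1] > max(ly, ry))               # nose below the eyes
    mouth_ok = (abs(mouth[0] - midline) < tol * eye_dist
                and mouth[1] > nose[1])                 # mouth below the nose
    return level and nose_ok and mouth_ok
```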

17

Template Matching: Deformable Templates by Perlibakas (3.1)

It is based on mathematical morphology and the variational calculus for the detection of a face contour in still grayscale images [P].

(1) The facial features (eyes and lips) are detected using mathematical morphology and heuristic rules.

(2) Using these features, the image is filtered and an edge map is prepared.

(3) The face contour is detected using an active contour model (a variational snake) by minimizing its internal and external energy. The internal energy is defined by the contour tension and rigidity; the external energy is defined by the generalized gradient vector flow field of the image edge map with the detected face features discarded. The initial contour is calculated from the detected face features.

The contour detection experiments were performed on a database of 427 face images. Automatically detected contours were compared with manually labelled contours using area-based and Euclidean distance-based error measures. Filtering the image and edge map with respect to the detected face features reduces the number of edges that could cause contour detection errors.
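Perlibakas's GVF snake is not reproduced here, but scikit-image ships a generic active contour that illustrates the internal-energy parameters (tension and rigidity) mentioned in step (3). A minimal sketch, assuming only a rough face location for the initial circle:

```python
import numpy as np
from skimage import data
from skimage.color import rgb2gray
from skimage.filters import gaussian
from skimage.segmentation import active_contour

img = rgb2gray(data.astronaut())            # sample image containing a face

# Initial contour: a circle around the rough face location, in (row, col).
s = np.linspace(0, 2 * np.pi, 400)
init = np.column_stack([100 + 100 * np.sin(s), 220 + 100 * np.cos(s)])

snake = active_contour(
    gaussian(img, sigma=3),                 # smoothed image drives the edges
    init,
    alpha=0.015,                            # contour tension (internal energy)
    beta=10.0,                              # contour rigidity (internal energy)
    gamma=0.001)                            # time-step parameter
```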

19

Detecting Faces in A Single Image: Appearance-Based Methods (4)

• In general, the appearance-based methods rely on techniques from statistical analysis and machine learning to find the relevant characteristics of face and nonface images.

• The learned characteristics take the form of distribution models or discriminant functions, which are then used for face detection.

• Dimensionality reduction is usually carried out for the sake of computational efficiency and detection efficacy.

• Templates are still used, but they are learned from example images.

23

Appearance-Based Methods: Support Vector Machines (4.2)

• Support Vector Machines (SVMs) can be considered a new paradigm to train polynomial function, neural network (NN), or radial basis function (RBF) classifiers.

• While most methods for training a classifier (e.g., Bayesian, NN, and RBF) are based on minimizing the training error, i.e., the empirical risk, SVMs operate on another induction principle, called structural risk minimization, which aims to minimize an upper bound on the expected generalization error.

• An SVM classifier is a linear classifier where the separating hyperplane is chosen to minimize the expected classification error of the unseen patterns.

• This optimal hyperplane is defined by a weighted combination of a small subset of the training vectors, called support vectors.

• Ma and Ding [MD] and Sahbi and Boujemaa [SB] each proposed methods based on hierarchical SVMs. Ai et al. presented a subspace approach with SVMs [AYX]. Xi and Lee developed a coordinate system and several SVMs to detect faces and extract facial features [XL].
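As a rough sketch of how an SVM face/nonface patch classifier is trained and applied, consider the snippet below; the random placeholder data stands in for real labeled face patches and is not any cited author's setup.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical data: flattened 16x16 grayscale patches, label 1 = face.
rng = np.random.default_rng(0)
X_train = rng.random((200, 256))
y_train = rng.integers(0, 2, 200)

# Polynomial-kernel SVM, one common choice for face/nonface patches.
clf = SVC(kernel="poly", degree=2, C=1.0)
clf.fit(X_train, y_train)

patch = rng.random((1, 256))        # one candidate patch to classify
is_face = clf.predict(patch)[0] == 1
```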

28

Appearance-Based Methods: Bayes Classifier (4.6)

Chengjun Liu presented a Bayesian Discriminating Features (BDF) method for multiple frontal face detection [L]. The BDF method, trained on images from only one database, works on test images from diverse sources and displays robust generalization performance. It integrates the discriminating feature analysis of the input image, the statistical modeling of the face and nonface classes, and the Bayes classifier for multiple frontal face detection:

(1) Feature analysis derives a discriminating feature vector by combining the input image, its 1D Haar wavelet representation, and its amplitude projections.

(2) Statistical modeling estimates the conditional probability density functions (PDFs) of the face and nonface classes. The face class is modeled as a multivariate normal distribution. The nonface class includes "the rest of the world"; a subset of the nonfaces that lies closest to the face class is derived and then modeled as a multivariate normal distribution.

(3) The Bayes classifier applies the estimated conditional PDFs to detect multiple frontal faces in an image.

Experimental results using 887 images (containing a total of 1,034 faces) from diverse image sources show the feasibility of the BDF method. In particular, the BDF method achieves 98.5 percent face detection accuracy with one false detection.

30

Face Image Databases

• Most face detection methods require a training data set of face images. The face image databases used by the referenced papers are:

(1) BioID 2003 [BI]
(2) the Champion Database [CD]
(3) the Database of Faces [DF]
(4) MIT-CBCL [MC]
(5) PICS 2003 [PIC]
(6) the FERET database [PWHR]
(7) Yahoo News Photos [YNP]

31

A Bayesian Discriminating Features Method (BDF)

– one of the appearance-based methods developed by Chengjun Liu

• Discriminating Feature Analysis
• Statistical Modeling of Face and Nonface Classes
• The Bayesian Classifier for Multiple Frontal Face Detection
• Experiments

32

Discriminating Feature Analysis

Fig. 1 Face and natural images. (a) Some examples of the training faces that have been normalized to the standard resolution, 16*16. (b) An example natural image.

33

Discriminating Feature Analysis (Cont.)

• The discriminating feature analysis derives a feature vector with enhanced discriminating power for face detection by combining the input image, its 1D Haar wavelet representation, and its amplitude projections.

Fig. 2 Discriminating feature analysis of the mean face and the mean nonface. (a) The first image is the mean face, the second and third images are its 1D Haar wavelet representation, and the last two bar graphs are its amplitude projections. (b) The mean nonface, its 1D Haar wavelet representation, and its amplitude projections.

34

Discriminating Feature Analysis (Cont.)

• Let $I(i,j) \in \mathbb{R}^{m \times n}$ represent an input image, and $X \in \mathbb{R}^{mn}$ be the vector formed by concatenating the rows (or columns) of $I(i,j)$.

• The 1D Haar representation of $I(i,j)$ yields two images, $I_h(i,j) \in \mathbb{R}^{(m-1) \times n}$ and $I_v(i,j) \in \mathbb{R}^{m \times (n-1)}$, corresponding to the horizontal and vertical difference images, respectively:

$$I_h(i,j) = I(i+1,j) - I(i,j), \quad 1 \le i < m,\ 1 \le j \le n \qquad (1)$$

$$I_v(i,j) = I(i,j+1) - I(i,j), \quad 1 \le i \le m,\ 1 \le j < n \qquad (2)$$

Let $X_h \in \mathbb{R}^{(m-1)n}$ and $X_v \in \mathbb{R}^{m(n-1)}$ be the vectors formed by concatenating the rows (or columns) of $I_h(i,j)$ and $I_v(i,j)$.

• The amplitude projections of $I(i,j)$ along its rows and columns form the horizontal (row) and vertical (column) projections, $X_r \in \mathbb{R}^m$ and $X_c \in \mathbb{R}^n$, respectively:

$$X_r(i) = \sum_{j=1}^{n} I(i,j), \quad 1 \le i \le m \qquad (3)$$

$$X_c(j) = \sum_{i=1}^{m} I(i,j), \quad 1 \le j \le n \qquad (4)$$
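A minimal NumPy sketch of Eqs. (1)-(4); the function name is ours:

```python
import numpy as np

def discriminating_features(img):
    """Sketch of Eqs. (1)-(4): 1D Haar difference images and
    amplitude projections of a grayscale image (float array)."""
    I = np.asarray(img, dtype=float)
    Ih = I[1:, :] - I[:-1, :]      # Eq. (1): horizontal (row) differences
    Iv = I[:, 1:] - I[:, :-1]      # Eq. (2): vertical (column) differences
    Xr = I.sum(axis=1)             # Eq. (3): row (horizontal) projection
    Xc = I.sum(axis=0)             # Eq. (4): column (vertical) projection
    return Ih, Iv, Xr, Xc
```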

35

Discriminating Feature Analysis (Cont.)

The vectors $X$, $X_h$, $X_v$, $X_r$, and $X_c$ are normalized by subtracting the means of their components and dividing by their standard deviations, respectively, to get the normalized vectors $\hat{X}$, $\hat{X}_h$, $\hat{X}_v$, $\hat{X}_r$, and $\hat{X}_c$.

A new feature vector $\tilde{Y}$ is defined as the concatenation of the normalized vectors:

$$\tilde{Y} = (\hat{X}^t\ \hat{X}_h^t\ \hat{X}_v^t\ \hat{X}_r^t\ \hat{X}_c^t)^t, \qquad (5)$$

where $t$ is the transpose operator and $N = 3mn$ is the dimensionality of the feature vector $\tilde{Y}$.

36

Discriminating Feature Analysis (Cont.)

The normalized vector $Y$ of $\tilde{Y}$ defines the discriminating feature vector, $Y \in \mathbb{R}^N$, which is the feature vector for the multiple frontal face detection system, and which combines the input image, its 1D Haar wavelet representation, and its amplitude projections for enhanced discriminating power:

$$Y = \frac{\tilde{Y} - \mu}{\sigma}, \qquad (6)$$

where $\mu$ and $\sigma$ are the mean and the standard deviation of $\tilde{Y}$.

37

Statistical Modeling of Face and Nonface Classes

• The main objective of statistical modeling of face and nonface classes is to estimate the conditional probability density functions (PDFs) of these two classes.

• The face class contains only faces; the nonface class encompasses all the other objects.

• The BDF method derives a subset of nonfaces that lies closest to the face class.

• The BDF method models the faces and this particular subset of nonfaces each as a multivariate normal distribution.

38

Statistical Modeling of Face and Nonface Classes: Face Class Modeling

The conditional density function of the face class, $\omega_f$, is modeled as a multivariate normal distribution:

$$p(Y|\omega_f) = \frac{1}{(2\pi)^{N/2}\, |\Sigma_f|^{1/2}} \exp\left\{-\frac{1}{2}(Y - M_f)^t \Sigma_f^{-1} (Y - M_f)\right\}, \qquad (7)$$

where $M_f \in \mathbb{R}^N$ and $\Sigma_f \in \mathbb{R}^{N \times N}$ are the mean and the covariance matrix of the face class. Taking the natural logarithm of both sides, we have

$$\ln[p(Y|\omega_f)] = -\frac{1}{2}(Y - M_f)^t \Sigma_f^{-1} (Y - M_f) - \frac{N}{2}\ln(2\pi) - \frac{1}{2}\ln|\Sigma_f|. \qquad (8)$$

39

Face Class Modeling (Cont.)

The covariance matrix, $\Sigma_f \in \mathbb{R}^{N \times N}$, can be factorized into the following form using principal component analysis (PCA):

$$\Sigma_f = \Phi_f \Lambda_f \Phi_f^t, \quad \Phi_f \Phi_f^t = \Phi_f^t \Phi_f = I_N, \quad \Lambda_f = \mathrm{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_N\}, \qquad (9)$$

where

$\Phi_f \in \mathbb{R}^{N \times N}$: an orthogonal eigenvector matrix,

$\Lambda_f \in \mathbb{R}^{N \times N}$: a diagonal eigenvalue matrix with diagonal elements (eigenvalues) in decreasing order ($\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_N$),

$I_N \in \mathbb{R}^{N \times N}$: an identity matrix.

An important property of PCA is its optimal signal reconstruction in the sense of minimum mean-square error when only a subset of principal components is used to represent the original signal.
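A minimal NumPy sketch of this factorization (Eq. (9)); the function name is ours:

```python
import numpy as np

def pca_factorize(samples):
    """Sketch of Eq. (9): eigendecomposition of the sample covariance
    matrix, with eigenvalues sorted in decreasing order."""
    X = np.asarray(samples, dtype=float)       # shape (num_samples, N)
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)       # N x N covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # returned in ascending order
    order = np.argsort(eigvals)[::-1]          # reorder: decreasing eigenvalues
    return mean, eigvecs[:, order], eigvals[order]
```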

40

Face Class Modeling (Cont.)

The principal components are defined by the following vector, $Z \in \mathbb{R}^N$:

$$Z = \Phi_f^t (Y - M_f). \qquad (10)$$

It then follows from (8), (9), and (10) that

$$\ln[p(Y|\omega_f)] = -\frac{1}{2} Z^t \Lambda_f^{-1} Z - \frac{N}{2}\ln(2\pi) - \frac{1}{2}\ln|\Lambda_f|. \qquad (11)$$

Applying the optimal signal reconstruction property of PCA, we use only the first $M$ ($M \le N$) principal components to estimate the conditional density function. A model by Moghaddam and Pentland estimates the remaining $N - M$ eigenvalues, $(\lambda_{M+1}, \lambda_{M+2}, \ldots, \lambda_N)$, by the average of those values:

$$\rho = \frac{1}{N-M} \sum_{k=M+1}^{N} \lambda_k. \qquad (12)$$

41

Face Class Modeling (Cont.)

It then follows from (11) and (12) that

$$\ln[p(Y|\omega_f)] = -\frac{1}{2}\left[\sum_{i=1}^{M}\frac{z_i^2}{\lambda_i} + \frac{\|Y - M_f\|^2 - \sum_{i=1}^{M} z_i^2}{\rho}\right] - \frac{1}{2}\left[\sum_{i=1}^{M}\ln\lambda_i + (N-M)\ln\rho + N\ln(2\pi)\right], \qquad (13)$$

where $\|\cdot\|$ denotes the norm operator and the $z_i$ are the components of $Z$ defined by (10). Eq. (13) states that the conditional density function of the face class can be estimated using the first $M$ principal components, the input image, the mean face, and the eigenvalues of the face class.
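A sketch of Eqs. (10)-(13), assuming the mean, eigenvectors, and eigenvalues come from pca_factorize() above (function name is ours):

```python
import numpy as np

def log_density(Y, mean, eigvecs, eigvals, M):
    """Sketch of Eqs. (10)-(13): Gaussian log-density estimated from the
    first M principal components (Moghaddam-Pentland style residual)."""
    d = np.asarray(Y, dtype=float) - mean
    z = eigvecs[:, :M].T @ d                   # Eq. (10), first M components
    lam = eigvals[:M]
    N = d.size
    rho = eigvals[M:].mean()                   # Eq. (12): averaged tail
    mahal = np.sum(z**2 / lam)                 # in-subspace Mahalanobis term
    resid = (d @ d - np.sum(z**2)) / rho       # out-of-subspace residual term
    return -0.5 * (mahal + resid
                   + np.sum(np.log(lam))
                   + (N - M) * np.log(rho)
                   + N * np.log(2 * np.pi))    # Eq. (13)
```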

42

Statistical Modeling of Face and Nonface Classes: Nonface Class Modeling

The nonface class modeling starts with the generation of nonface samples by applying (13) to natural images that do not contain any human faces at all. Those subimages of the natural scenes that lie closest to the face class are chosen as training samples for estimating the conditional density function of the nonface class, $\omega_n$, which is also modeled as a multivariate normal distribution:

$$p(Y|\omega_n) = \frac{1}{(2\pi)^{N/2}\, |\Sigma_n|^{1/2}} \exp\left\{-\frac{1}{2}(Y - M_n)^t \Sigma_n^{-1} (Y - M_n)\right\}, \qquad (14)$$

where $M_n \in \mathbb{R}^N$ and $\Sigma_n \in \mathbb{R}^{N \times N}$ are the mean and the covariance matrix of the nonface class.

43

Nonface Class Modeling (Cont.)

Factorize the covariance matrix, $\Sigma_n$, using PCA:

$$\Sigma_n = \Phi_n \Lambda_n \Phi_n^t, \quad \Phi_n \Phi_n^t = \Phi_n^t \Phi_n = I_N, \quad \Lambda_n = \mathrm{diag}\{\lambda_1^{(n)}, \lambda_2^{(n)}, \ldots, \lambda_N^{(n)}\}, \qquad (15)$$

where $\Phi_n \in \mathbb{R}^{N \times N}$ is an orthogonal eigenvector matrix, $\Lambda_n \in \mathbb{R}^{N \times N}$ a diagonal eigenvalue matrix with diagonal elements in decreasing order, and $I_N \in \mathbb{R}^{N \times N}$ an identity matrix.

The principal components are defined by the following vector, $U \in \mathbb{R}^N$:

$$U = \Phi_n^t (Y - M_n). \qquad (16)$$

Estimate the remaining $N - M$ eigenvalues, $(\lambda_{M+1}^{(n)}, \lambda_{M+2}^{(n)}, \ldots, \lambda_N^{(n)})$, by the average of those values:

$$\rho_n = \frac{1}{N-M} \sum_{k=M+1}^{N} \lambda_k^{(n)}. \qquad (17)$$

44

Nonface Class Modeling (Cont.)

The conditional density function of the nonface class can be estimated as follows:

$$\ln[p(Y|\omega_n)] = -\frac{1}{2}\left[\sum_{i=1}^{M}\frac{u_i^2}{\lambda_i^{(n)}} + \frac{\|Y - M_n\|^2 - \sum_{i=1}^{M} u_i^2}{\rho_n}\right] - \frac{1}{2}\left[\sum_{i=1}^{M}\ln\lambda_i^{(n)} + (N-M)\ln\rho_n + N\ln(2\pi)\right], \qquad (18)$$

where the $u_i$ are the components of $U$ defined by (16). Eq. (18) states that the conditional density function of the nonface class can be estimated using the first $M$ principal components, the input image, the mean nonface, and the eigenvalues of the nonface class.

45

The Bayesian Classifier for Multiple Frontal Face Detection

• Let $Y \in \mathbb{R}^N$ be the discriminating feature vector constructed from an input pattern, i.e., a subimage of some test image.

• Let the a posteriori probabilities of the face class and the nonface class given $Y$ be $P(\omega_f|Y)$ and $P(\omega_n|Y)$. The pattern is classified to the face class or the nonface class according to the Bayes decision rule for minimum error:

$$Y \in \begin{cases} \omega_f & \text{if } P(\omega_f|Y) > P(\omega_n|Y), \\ \omega_n & \text{otherwise}. \end{cases} \qquad (19)$$

46

The Bayesian Classifier for Multiple Frontal Face Detection (Cont.)

The a posteriori probabilities, $P(\omega_f|Y)$ and $P(\omega_n|Y)$, can be computed from the conditional PDFs using the Bayes theorem:

$$P(\omega_f|Y) = \frac{P(\omega_f)\, p(Y|\omega_f)}{p(Y)}, \qquad P(\omega_n|Y) = \frac{P(\omega_n)\, p(Y|\omega_n)}{p(Y)}, \qquad (20)$$

where $P(\omega_f)$ and $P(\omega_n)$ are the a priori probabilities of the face class and the nonface class, and $p(Y)$ is the mixture density function.

47

The Bayesian Classifier for Multiple Frontal Face Detection (Cont.)

From (13), (18), and (20), the Bayes decision rule for face detection is then defined as follows:

$$Y \in \begin{cases} \omega_f & \text{if } \delta_f < \delta_n - \tau, \\ \omega_n & \text{otherwise}, \end{cases} \qquad (21)$$

where $\delta_f$, $\delta_n$, and $\tau$ are as follows:

$$\delta_f = \sum_{i=1}^{M}\frac{z_i^2}{\lambda_i} + \frac{\|Y - M_f\|^2 - \sum_{i=1}^{M} z_i^2}{\rho} + \sum_{i=1}^{M}\ln\lambda_i + (N-M)\ln\rho, \qquad (22)$$

$$\delta_n = \sum_{i=1}^{M}\frac{u_i^2}{\lambda_i^{(n)}} + \frac{\|Y - M_n\|^2 - \sum_{i=1}^{M} u_i^2}{\rho_n} + \sum_{i=1}^{M}\ln\lambda_i^{(n)} + (N-M)\ln\rho_n, \qquad (23)$$

$$\tau = 2\ln\frac{P(\omega_n)}{P(\omega_f)}. \qquad (24)$$

48

The Bayesian Classifier for Multiple Frontal Face Detection (Cont.)

• δf and δn can be calculated from the input pattern Y, the face class parameters (the mean face, the first M eigenvectors, and the eigenvalues), and the nonface class parameters (the mean nonface, the first M eigenvectors, and the eigenvalues).

• τ is a constant, which functions as a control parameter: the larger the value, the fewer the false detections.

• To further control the false detection rate, the BDF method introduces another control parameter, θ, to the face detection system, such that

$$Y \in \begin{cases} \omega_f & \text{if } (\delta_f < \delta_n - \tau) \text{ and } (\delta_f < \theta), \\ \omega_n & \text{otherwise}. \end{cases} \qquad (25)$$

The control parameters, τ and θ, are empirically chosen for the face detection system.
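Putting Eqs. (21)-(25) together, a minimal sketch of the resulting classifier, assuming the decision rule reconstructed above; the parameter packaging and function names are ours:

```python
import numpy as np

def classify(Y, face, nonface, tau=300.0, theta=500.0):
    """Sketch of Eqs. (21)-(25). `face` and `nonface` are
    (mean, eigvecs, eigvals, M) tuples, e.g., from pca_factorize();
    tau and theta are the empirically chosen control parameters."""
    def delta(Y, mean, eigvecs, eigvals, M):
        d = np.asarray(Y, dtype=float) - mean
        z = eigvecs[:, :M].T @ d
        lam, rho, N = eigvals[:M], eigvals[M:].mean(), d.size
        return (np.sum(z**2 / lam)
                + (d @ d - np.sum(z**2)) / rho
                + np.sum(np.log(lam))
                + (N - M) * np.log(rho))       # Eqs. (22)/(23)
    df = delta(Y, *face)
    dn = delta(Y, *nonface)
    return (df < dn - tau) and (df < theta)    # Eqs. (21) and (25)
```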

49

Experiments: Data

• The training data for the BDF method consist of 600 FERET frontal face images from Batch 15 [PWHR] and nine natural images.

• The face class thus contains 1,200 face samples for training after including the mirror images of the FERET data.

• The nonface class consists of 4,500 nonface samples, which are generated by choosing the subimages that lie closest to the face class from the nine natural images.

• The BDF method is applied to detect frontal faces from three testing data sets: SET1, SET2, and SET3.

• SET1, consisting of all frontal face images of Batches 12, 13, and 14 from the FERET database, contains mainly head or head and shoulder pictures.

50

Data (Cont.)

• SET2, consisting of all frontal face images from the FERET Batch 2, contains upper body pictures and faces with glasses, some even with bright reflections.

• SET1 and SET2 consist of 511 and 296 images, respectively. Each image contains only one face (Fig. 3).

• SET3 is created from the MIT-CMU test sets [RBK] that contain frontal faces from diverse sources (Fig. 4):

– the World Wide Web,
– photographs and newspaper pictures,
– broadcast television.

It consists of 80 images with a total of 227 faces, including many different-sized faces, rotated faces, very large faces, very small faces, low-quality face images, partially occluded faces, and slightly pose-angled faces.

51

Data (Cont.)

Fig. 3(a,b) Face detection examples. A square indicates a successfully detected face region. The resolution of the images is 256*384, and the faces are detected at different scales. (a) From SET1. (b) From SET2: some images contain faces with glasses having bright reflections.

52

Data (Cont.)

Fig. 4(a - g). Face detection samples from SET3. (a) Multiple frontal faces; (b) Multiple frontal faces with rotations; (c) Large frontal face; (d) Small frontal face; (e) Face in low quality image; (f) Partially occluded face; (g) Slightly pose-angled face.

53

Experiments: Statistical Learning of the BDF Method

• Learning the face class parameters: The statistical modeling of the face and the nonface classes requires the estimation of the parameters of these two classes from the training images.

• They are calculated as follows:

(1) Normalize the 600 FERET images to a spatial resolution of 16*16 based on the fixed eye locations and interocular distance (Fig. 1a).

(2) Add the mirror images of the 600 FERET faces to the face training set, increasing the number of training samples to 1,200.

(3) Derive the discriminating feature vectors.

(4) Derive the face class parameters: the mean face, the face class eigenvectors, the eigenvalues, and M. A good choice of M balances the face detection performance against the computational complexity; empirically, M = 10.

54

Statistical Learning of BDF (Cont.)

• Learning the nonface class parameters. It starts with the generation of nonface samples from the nine natural images (Fig. 1b).

• The nonface images, chosen from the subimages of these nine natural images, have the standard spatial resolution of 16*16 and lie closest to the face class.

• 4,500 nonface images are generated from the nine natural images (Fig. 2b).

• After the generation of the nonface samples, the nonface class parameters can be calculated in the same way as the face class parameters are computed.

• Setting the two control parameters τ and θ: to control the false detection rate, these two control parameters are empirically chosen and set as τ = 300 and θ = 500.

55

Experiments: Testing Performance of BDF

• The BDF method successfully detects 507 faces from the 511 images in SET1 without any false detection, and 290 faces from the 296 images in SET2 with no false detection (Fig. 3).

• SET3 is used to test the generalization performance of the BDF method. Fig. 4 shows part of the detection results.

– In Fig. 4a, three faces are successfully detected at the scales 20 and 26, respectively. The scale 20 means that the original image is resized by a ratio of 16/20. Note that one face with a large pose is not detected, since the BDF method is trained to detect multiple frontal faces.

– The BDF method, trained only on the upright frontal faces, can also detect rotated faces by rotating the test images to a number of predefined angles, such as ±5°, ±10°, ±15°, and ±20°. In Fig. 4b, two scales (30, 38) and one rotation of -20° are required.
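As a sketch of this multi-scale scanning procedure (the window step, scale set, and resize helper are our assumptions; classify_fn stands for the Bayes classifier sketched earlier):

```python
def detect(image, classify_fn, scales=(16, 20, 26, 30, 38), step=2):
    """Resize the image by 16/scale for each scale, slide a 16x16
    window over it, and classify every subimage; returns the hits
    mapped back to original image coordinates."""
    from skimage.transform import resize
    hits = []
    for s in scales:
        ratio = 16.0 / s                      # e.g., scale 20 -> ratio 16/20
        h = int(image.shape[0] * ratio)
        w = int(image.shape[1] * ratio)
        small = resize(image, (h, w))
        for i in range(0, h - 16, step):
            for j in range(0, w - 16, step):
                if classify_fn(small[i:i+16, j:j+16]):
                    # map the window back to original image coordinates
                    hits.append((int(i / ratio), int(j / ratio), s))
    return hits
```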

56

Testing Performance of BDF (Cont.)

• (Cont.)

– The BDF method is also tested on images that contain very large or very small faces. Figs. 4c-d show the real face detection performance on these test images.

– The generalization performance of the BDF method is further tested using low-quality face images, partially occluded faces, and slightly pose-angled faces (Figs. 4e-g). The successful performance shows the robustness of the BDF method in real face detection.

– In SET3, there are six faces that are not detected by the BDF method: three pose-angled faces, a baby face, a masked face, and one in a low-quality image.

– Fig. 5 shows some examples of missed faces and false detections: a low-resolution face in Fig. 5a and a slightly pose-angled face in Fig. 5b. A false detection also occurs in Fig. 5b.

57

Testing Performance of BDF (Cont.)

Fig. 5 Examples of missed faces and false detection. A missed face in (a) and a slightly pose-angled face in (b). A false detection occurs in (b).

58

Testing Performance of BDF (Cont.)

• (Cont.)

– The experimental results using 80 test images (containing in total 227 faces) from MIT-CMU test sets show that the BDF method detects 221 out of the 227 faces in these images with one false detection.

• The following table summarizes the detection performance of the BDF method for the testing data sets: SET1, SET2, and SET3. The overall face detection performance of the BDF method using 887 images containing a total of 1,034 faces is 98.5 percent correct face detection rate with one false detection.

data    sources                         images    faces    detected    false detections
SET1    FERET Batches 12, 13, and 14       511      511         507                   0
SET2    FERET Batch 2                      296      296         290                   0
SET3    MIT-CMU test sets                   80      227         221                   1
Total   —                                  887    1,034       1,018                   1

59

Conclusions

• I attempt to provide a survey of research on face detection after 2001 by classifying the methods from about 20 papers from several sources into four main categories. However, some methods can be classified into more than one category. For example, template matching methods usually use a face model and subtemplates to extract facial features [P], and then use these features to locate or detect faces. Furthermore, the boundary between knowledge-based methods and some template matching methods is blurry, since the latter usually implicitly apply human knowledge to define the face templates [P].

• I have detailed one method while reporting the performance of the other related methods. But there is a lack of uniformity in how methods are evaluated, so it is imprudent to declare explicitly which methods have the lowest error rates.

• Although significant progress has been made in the past, there is still work to be done, and we believe that a robust face detection system should be effective under full variation in:

(1) lighting conditions; (2) orientation, pose, and partial occlusion; (3) facial expression; (4) presence of structural components, facial hair, and a variety of hair styles.