Extraction of Male and Female Facial Features Using Neural
Networks
Tsuneo Kanno
Department of Information and Computer Science, Polytechnic University, Sagamihara, Japan 229-1196
Hiroshi Nagahashi
Department of Information Processing, Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of
Technology, Yokohama, Japan 227-8503
Takeshi Agui
Faculty of Engineering, TOIN University of Yokohama, Yokohama, Japan 225-5802
SUMMARY
Features of male/female facial images are examined by analyzing the distribution of connection weights between the hidden-layer units and the input-layer units of a neural network trained to distinguish them. Twenty-five gray-level (black-and-white) facial images were used for each of the male and female samples. Several different numbers of mosaic blocks were used for the whole face region and for each facial component region. The training parameters of the neural network and the minimum number of hidden-layer units that can discriminate male/female were obtained by a genetic algorithm. A neural network with these parameters converges stably in every region and can discriminate male/female with 100% accuracy. In the feature extraction experiments, hidden-layer units were obtained that respond significantly to male or female facial images. The facial features of males/females can be extracted by analyzing the connection weights between the hidden-layer units and the input-layer units. The facial features extracted by the proposed method are similar to those obtained in psychological experiments and facial measurements. © 2000 Scripta Technica, Syst Comp Jpn, 31(3): 68–76, 2000
Key words: Neural networks; male and female fa-
cial features; hidden-layer unit; connection weight; genetic
algorithm.
1. Introduction
Humans can instantly recognize some personal fea-
tures from a face such as personal identity, sex, age, and
facial expressions. It is thought that personal identification is carried out by synthesizing common facial features. Facial features have long been studied in psychology and cognitive science [1], and more recently in kansei (affective) engineering [2]. Facial components, such as the eyes, nose, and mouth, have been used for male/female identification [3–6]. Measured positions of facial components have also been used for male/female identification [7, 8].
[7, 8]. For example, Yamaguchi and colleagues [8] have
Systems and Computers in Japan, Vol. 31, No. 3, 2000
Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J81-D-II, No. 11, November 1998, pp. 2645–2652
examined the positions of facial components (at 26 feature
points) to estimate the age and sex of each person, using 50
Japanese men and 50 Japanese women (all in their 20s), and
25 Japanese boys and 25 Japanese girls (average age 6
years). Their results show that facial features are deter-
mined by relatively small parts such as eyes, eyebrows, and
nose.
Computer recognition of human faces has been stud-
ied since the 1970s [9, 10]. Facial discrimination of
male/female by using a neural network (NN) has shown
high accuracy [11–13]. An NN is a parallel distributed information-processing system and is well suited to processing data containing ambiguous elements (e.g., the edges of a nose or a mouth). Because the coefficients that form its circuit structure are adjustable, an NN can find regularities in an object through repeated training. The hidden layer of a hierarchical NN condenses the input data obtained through training, thereby carrying out a kind of multivariate analysis. If back-propagation (BP) training is carried out ideally, the NN extracts the k principal components of the data in its k hidden units [14].
Kosugi [13] obtained accurate sex and personal identification by using mosaic facial images with an NN. He also examined male/female features automatically extracted from facial images by the hidden-layer units.
These sex and personal identification methods using a conventional NN aim at practical automatic facial identification, without examining details such as fine feature extraction for each facial element, the characteristics of the hidden-layer units by which image features are extracted, or the effect of the number of mosaic blocks in the original image.
This paper examines the features of male/female facial images by using the distribution of connection weights between the hidden-layer units and the input-layer units of a hierarchical NN that can identify these images with 100% accuracy after BP training. In this method, facial features are extracted by using not only the whole facial region but also each facial-component region, choosing the number of mosaic blocks for each area. The results of the experiments were evaluated by comparison with data reported from psychological experiments and facial measurements. The training parameters of the NN and the minimum required number of hidden-layer units were determined by a genetic algorithm (GA).
2. Feature Extraction Using NN
2.1. Structure of NN
An NN with a three-layer hierarchical structure (a single hidden layer) is used in this paper. A BP method including an inertia term is used for training. Units in adjacent layers are fully connected, and units within the same layer are not connected. The output Hj of hidden-layer unit j and the output Ok of output-layer unit k are given, respectively, by

Hj = f( Σ_{i=1}^{NI} wij · Ii )    (1)

Ok = f( Σ_{j=1}^{NH} vjk · Hj )    (2)

with the sigmoid function

f(x) = 1 / (1 + exp(−x/ε))    (3)

where
Ii: output of input-layer unit i
wij: connection weight between input-layer unit i and hidden-layer unit j
vjk: connection weight between hidden-layer unit j and output-layer unit k
NI: number of input-layer units
NH: number of hidden-layer units
ε: slant of the sigmoid function

The mean square error E between the teaching signal tnk and the output Onk of the output-layer unit is given by

E = (1 / (ND · NO)) Σ_{n=1}^{ND} Σ_{k=1}^{NO} (tnk − Onk)²    (4)

where
ND: number of training patterns
NO: number of output-layer units

The connection weights wij and vjk are corrected sequentially, and the NN is trained repeatedly until E becomes less than a preset value. When E falls below the preset value, training is stopped and convergence is regarded as achieved.
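As an illustration, the forward pass and training error of Section 2.1 can be sketched in Python with NumPy. This is a minimal sketch, not the authors' implementation: the array shapes and the exact sigmoid form are assumptions consistent with the definitions above.

```python
import numpy as np

def sigmoid(x, eps=1.0):
    # Sigmoid with slant parameter eps (form assumed).
    return 1.0 / (1.0 + np.exp(-x / eps))

def forward(I, w, v, eps=1.0):
    # I: (NI,) input vector; w: (NI, NH) input-to-hidden weights;
    # v: (NH, NO) hidden-to-output weights.
    H = sigmoid(I @ w, eps)   # hidden-layer outputs
    O = sigmoid(H @ v, eps)   # output-layer outputs
    return H, O

def mse(T, O):
    # Mean square error over all patterns and output units.
    return float(np.mean((T - O) ** 2))
```

Training repeats the forward pass and BP weight correction until `mse` falls below the preset value.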
2.2. Determination of learning parameters using GA
Determining the parameters (such as the training coefficient η, the slant ε of the sigmoid function, and the inertia coefficient α) for each training problem in pattern recognition using a hierarchical NN requires much experience and knowledge. These training parameters significantly affect the capability of the NN, such as the number of training iterations and the convergence stability [15].
Recently the GA method has been used for determi-
nation of the structure of an NN and its learning parameters.
NNs with parameters obtained by the GA have excellent convergence stability [16]. The GA can search a large space for the best value speedily and efficiently [17]. The authors have successfully used the GA to determine the training parameters of an NN for male/female facial identification [18].
In this experiment, the parameters η, ε, α, and NH are obtained by using the GA while varying the number of mosaic blocks in each region of the object. These parameters are encoded as the genes of each individual. An NN is formed from the parameters given by the genes of each individual and is repeatedly trained on the input facial image data, so that E and NT are obtained. The fitness f of an individual is defined by

f = w1 · (Emin / E) + w2 · (Hmax − NH) / Hmax + w3 · (Nmax − NT) / Nmax    (5)
where
Emin: Minimum mean square error
E: Mean square error
Hmax: Maximum number of hidden-layer units
NH: Number of hidden-layer units
Nmax: Maximum number of learning steps
NT: Number of training steps
w1, w2, w3: Weight coefficients
Emin, Hmax, and Nmax are determined by preliminary experiments. If E is still higher than Emin when training reaches Nmax, f is forced to 0 so that the fitness becomes low. The second term of Eq. (5) increases as the number of hidden-layer units is reduced, raising f. The weights w1, w2, and w3 are set according to the relative importance of E, NH, and NT. If NH is large, the gender-specific facial features may become dispersed over many units. In this experiment, to obtain the minimum necessary number of hidden-layer units, w2 is chosen to be greater than w1 and w3. The convergence of
the NN and NT are significantly influenced by the distribu-
tion of the initial connection weight, which is determined
by using random numbers. A mean fitness, which is a mean
value of fitness of five trainings with different initial
weights, was used for each individual.
Among the individuals generated in the computer, those with a large f (i.e., with small E, NH, and NT) survive under the genetic rules of generation alteration (selection, crossover, mutation). In the selection process, individuals with a large f survive at a certain rate Gr, in order of fitness, and the rest are discarded. In this experiment, an "elite strategy" (in which certain individuals are passed unchanged to the next generation) was used. In the crossover process, pairs of individuals having excellent chromosomes are crossed over with probability Cr in each genetic block, so that the total number of individuals Np remains constant. Mutation is carried out by bit reversal in the chromosome of an individual with mutation probability Mr. The generation alteration is repeated a predetermined number of times. The parameters obtained from the chromosome of the individual with the highest f in the last generation are then used as the number of hidden-layer units and the training parameters.
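One generation alteration of the kind described above can be sketched as follows. This is a hedged sketch, not the authors' code: the concrete fitness form is a hypothetical one that merely reproduces the stated behavior (f = 0 for non-converged individuals; the second term grows as NH shrinks), and chromosomes are modeled as lists of 0/1 bits.

```python
import random

def fitness(E, NH, NT, Emin=1e-4, Hmax=40, Nmax=500, w=(0.2, 0.6, 0.2)):
    # Hypothetical concrete fitness: rewards small E, NH, and NT.
    if NT >= Nmax and E > Emin:
        return 0.0                                  # non-converged individual
    w1, w2, w3 = w
    return (w1 * Emin / max(E, Emin)
            + w2 * (Hmax - NH) / Hmax               # grows as NH shrinks
            + w3 * (Nmax - NT) / Nmax)

def step(pop, evaluate, Gr=0.05, Cr=0.20, Mr=0.02):
    """One generation alteration: elitist selection, crossover, mutation."""
    ranked = sorted(pop, key=evaluate, reverse=True)
    n_elite = max(1, int(Gr * len(pop)))
    new_pop = ranked[:n_elite]                      # elites survive unchanged
    while len(new_pop) < len(pop):                  # keep Np constant
        a, b = random.sample(ranked[:len(pop) // 2], 2)  # fitter parents
        child = [y if random.random() < Cr else x for x, y in zip(a, b)]
        child = [bit ^ (random.random() < Mr) for bit in child]  # bit reversal
        new_pop.append(child)
    return new_pop
```

In the experiments of Section 3.2, `evaluate` would train an NN built from the decoded chromosome and return the mean of five fitness values obtained with different initial weights.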
2.3. Extraction of male/female facial features
Male/female facial features can be identified by using an NN (with parameters obtained by the GA method) and repeating its training. In this process, each hidden-layer unit shows one of four patterns: (a) the unit responds significantly to male facial images alone, (b) the unit responds significantly to female facial images alone, (c) the unit responds to both male and female facial images, or (d) the unit does not respond to any facial image. It is not possible to predict which pattern appears in a given hidden-layer unit, since this depends on the initial connection weights. However, the NN identifies a male or female face by using units of type (a) or (b). A part in which the input Ii connected to a positive connection weight wij is large (i.e., the image is bright) and a part in which the input Ii connected to a negative connection weight wij is small (i.e., the image is dark) contribute significantly to increasing the output Hj of hidden-layer unit j. For example, the output Hj of a hidden-layer unit j that responds to male facial images alone strongly depends on the connection weights wij connecting the inputs Ii to j.

Therefore, the dominant features by which an NN identifies male/female facial images can be estimated from the distribution of wij. This paper uses this method.
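The analysis of Section 2.3 can be sketched as follows; the array shapes and helper names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def feature_unit(H_male, H_female):
    # H_male, H_female: (ND, NH) hidden-layer outputs for the male and
    # female training images. A type (a)/(b) unit responds strongly to one
    # sex only, so the largest gap in mean activation picks out such a unit.
    gap = H_male.mean(axis=0) - H_female.mean(axis=0)
    return int(np.argmax(np.abs(gap)))

def weight_map(w, j, shape):
    # Reshape the input weights wij of hidden unit j back onto the mosaic
    # grid and normalize by the maximum absolute weight, as in Fig. 3.
    m = w[:, j].reshape(shape)
    return m / np.abs(m).max()
```

Plotting `weight_map` with square sizes proportional to absolute value (white positive, black negative) reproduces the style of visualization used in Fig. 3.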
3. Experimental Results and Discussion
3.1. Input facial image
To extract a feature of each part of a facial image by
using the proposed method, it is necessary to normalize the
size and position of the input image. Also, it is necessary to
minimize the unevenness of photographic conditions of the
image, since the image is a gray-scale type (black-and-
white). In this experiment, 60 facial photographs (25 male
and 25 female faces for training, and 5 male and 5 female
faces for the NN evaluation) were sampled from a school
album in which all of the photographs were taken under
relatively uniform light conditions. Each photograph was
scanned by an image scanner, and all were manually normalized so that the background was removed, the eye
positions were leveled, and the face width was constant.
Faces with glasses, hair decorations, and mustaches were
excluded. Each photograph was converted from a color image to a 256-level gray-scale (black-and-white) image with 600 × 640 pixels.
To extract details of important features, seven regions
of each face as shown in Fig. 1(a) were used for the
experiment: (r1) the whole face except for hair, (r2) eyes
with eyebrows, (r3) eyebrows, (r4) eyes, (r5) nose and
mouth, (r6) nose, and (r7) mouth. (r7) was adjusted verti-
cally for each face since individual differences of mouth
size are considerable.
Each facial region is converted into a mosaic image, each mosaic block being a rectangle. The block size varies from 8 × 8 pixels (12 × 12 for the r1 region) up to the maximum block size, at which the NN can no longer identify a male/female face. The mean gray level of the pixels in each mosaic block is used as its gray level. Figure 1(b) shows an example in which 16 × 16 mosaic blocks (24 × 24 pixels each) are used for the whole (r1) region.
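The block-mean mosaic conversion described above can be sketched as follows (a minimal NumPy sketch; the cropping policy for images that do not divide evenly is an assumption):

```python
import numpy as np

def to_mosaic(img, blocks):
    # img: 2-D gray-level array; blocks: (rows, cols) of mosaic blocks.
    # Each block takes the mean gray level of its pixels.
    br, bc = blocks
    h, w = img.shape
    img = img[: h - h % br, : w - w % bc]   # crop to an even grid
    return img.reshape(br, img.shape[0] // br,
                       bc, img.shape[1] // bc).mean(axis=(1, 3))
```

For example, a 384 × 384 region with `blocks=(16, 16)` yields 16 × 16 blocks of 24 × 24 pixels each, as in Fig. 1(b).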
The number of training images is greater than the number of evaluation images because the aim is to examine the elements of male/female identification in an NN whose training has been completed. The evaluation images were also applied to the trained NN as untrained data, in order to evaluate the degree of completion of the NN.
3.2. Determination of training parameters by
using GA, and evaluation
The number of input-layer units for the NN is the
same as the number of mosaic blocks in each facial image.
The number of hidden-layer units is the minimum number necessary for convergence, as determined by the GA. Two output-layer units were used since there are two categories (male and female). To determine the parameters for the NN and GA, preliminary experiments were carried out for the nose (r6) and mouth (r7) regions, in which convergence is most difficult. The number of bits for each training parameter of the NN was determined experimentally by assuming a maximum of 40 hidden-layer units, based on the number of mosaic blocks (the same as the number of input-layer units) and on the training parameters used in the XOR problem. The maximum number of training iterations and the minimum mean square error were determined in consideration of the convergence of the NN and the rate of correct identification.
Increasing the total number of individuals, a parameter of the GA, is useful because it expands the search region of a single generation. Raising the upper limit on the number of generation alterations is also useful in the search for the fittest value. However, increasing these values also increases the computational load and search time. Therefore, the amount of computation was taken into account in setting the upper limits on the total number of individuals and the number of generation alterations. The survival probability, crossover probability, mutation probability, and weight coefficients were determined by considering the distribution of fitness and the transition of the fittest individuals between generations.
The experiments were carried out based on these preliminary experiments. Twenty-seven bits were used for the chromosome of each individual, which encodes the training parameters of the NN and the number of hidden-layer units NH. η, ε, and α each range from 0.00 to 1.27; these values are derived from 7-bit integers extracted from the 27-bit chromosome, each multiplied by 0.01. A 6-bit integer (0 to 63) encodes the number of hidden-layer units. In this experiment, the NN (with parameters obtained from each individual) sequentially learns the 50 facial images, and learning is repeated until the mean square error falls below Emin = 10^−4 or the number of training iterations exceeds Nmax = 500. Np = 100, an upper limit of 50 generation alterations, and Gr = 0.05 were used. The crossover probability Cr = 0.20 was applied to each block of parameter bits. Mutation was performed by bit reversal with probability Mr = 0.02. The fitness f of each individual was obtained using w1 = 0.2, w2 = 0.6, and w3 = 0.2 for the weight coefficients of Eq. (5), so that the minimum necessary number of hidden-layer units is obtained.
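The chromosome decoding described above can be sketched as follows. The 27-bit layout (three 7-bit parameter fields followed by one 6-bit field) is an assumption consistent with the stated ranges; the paper does not give the field order.

```python
def decode(chrom):
    # chrom: 27-character string of '0'/'1' bits.
    # Three 7-bit fields scaled by 0.01 give 0.00-1.27; a 6-bit field gives 0-63.
    assert len(chrom) == 27
    eta   = int(chrom[0:7], 2) * 0.01    # training coefficient
    eps   = int(chrom[7:14], 2) * 0.01   # slant of sigmoid function
    alpha = int(chrom[14:21], 2) * 0.01  # inertia coefficient
    NH    = int(chrom[21:27], 2)         # number of hidden-layer units
    return eta, eps, alpha, NH
```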
Table 1 shows the parameters obtained by the GA for each region and mosaic resolution. Parameters could not be obtained by the GA for mosaics coarser than those shown in Table 1. The convergence rate, mean number of training iterations, and male/female identification rate for trained and untrained facial images (over 100 different initial connection weights) were used to evaluate the parameters. An NN with parameters obtained by the GA converges reliably and identifies trained and untrained male/female facial images with almost 100% accuracy. An exception is the (r5) region, with 2 × 3 mosaic blocks over the nose and mouth, whose convergence rate is about 60%.

Fig. 1. Seven regions used in the experiment and an example of mosaic pattern.
Using an NN with the parameters obtained by the GA, the effects of the number of hidden-layer units on the convergence rate and the number of training iterations were investigated. The number of hidden-layer units was varied from 1 to Hmax = 40, and 100 experiments were carried out with different initial connection weights. Figure 2 shows the results obtained for the whole facial region (r1). The results show that this kind of NN converges almost 100% of the time, independently of the initial connection weights, provided the number of hidden-layer units is greater than the number obtained by the GA. An exception is the nose-and-mouth region (r5), where the number of mosaics is 2 × 3.
3.3. Characteristics of hidden-layer unit
Table 1. Neural network parameters obtained by the GA, and the results of evaluation

Fig. 2. Effects of the number of hidden units on convergence rate and training iterations.

When the 50 male and female facial images are applied to the trained NN, hidden-layer units of the four types [(a) to (d) in Section 2.3] are obtained. Figure 3 shows the distribution of the connection weights between the hidden-layer units and the input-layer units in the whole facial region (r1), with varying numbers of mosaic blocks: (a) 4 × 4 mosaics, (b) 8 × 8 mosaics, (c) 16 × 16 mosaics, and (d) 32 × 32 mosaics. The size of each square in Fig. 3 is proportional to the absolute value of the connection weight, a white square being positive and a black square negative. The connection weights are normalized by the maximum absolute connection weight.
The influence of the initial values and of the number of hidden-layer units on the distribution of connection weights was investigated. The experiments were repeated 100 times with different initial values, and the connection weights of the feature hidden-layer units of the four types were obtained. The four weight distributions were evaluated by the mean variance of each mosaic block across runs (within-group variance). The effect of the number of hidden-layer units was evaluated by varying it from the number obtained by the GA to a maximum of 40. Table 2 shows the within-group variance of connection weights in each region. The male/female facial features shown in Fig. 3 are almost independent of the initial values of the connection weights. However, if the number of hidden-layer units is changed, the within-group variance becomes larger over the whole region. The reason is apparently that when the number of hidden-layer units increases, the facial features are dispersed over more units, and when it decreases, the features are condensed, so that the within-group variance is affected. The connection weight distribution of these hidden-layer units has a larger variance and less regularity than the two other connection weight distributions. The role of these feature hidden-layer units needs to be investigated in detail.
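The within-group variance evaluation described above can be sketched as follows (a minimal sketch; the exact averaging the authors used is assumed to be variance per mosaic block across runs, averaged over blocks):

```python
import numpy as np

def within_group_variance(maps):
    # maps: (runs, NI) normalized weight vectors for the same feature unit,
    # collected from repeated trainings with different initial weights.
    # Small values mean the extracted feature is stable against initialization.
    return float(np.var(maps, axis=0).mean())
```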
3.4. Results of extraction of facial features
3.4.1. Whole facial area (r1)
Male/female features in the whole face (r1) are estimated from the distribution of connection weights shown in Fig. 3. The connection weights for male and female faces have almost the same magnitudes with opposite signs. For example, a dark part in a male face corresponds to a bright part in a female face. Even the NN with 4 × 4 mosaic blocks could identify male/female accurately. However, it is difficult to know which elements in this NN contributed to the identification. Such elements can be found easily in Figs. 3(c) and 3(d), which have more mosaic blocks.

The cheeks and chin of most female faces have a lighter skin color, and the connection weights there are positive (i.e., a female face is generally brighter than a male face). The right edge and chin of most faces have large positive connection weights, as shown in Fig. 3(c). This is because the photographs were taken with the faces illuminated from the right side.
Fig. 3. Four configurations of connection weights in the
r1 region of both males and females.
Table 2. Within-group variance of connection weights
in each region
The negative areas on the left and right sides of the female facial images are due to the shadow of hair on the forehead, which is darker than in male facial images. The large positive areas around the ears of male facial images are due to the fact that most males' hair does not reach their ears.
The results of these experiments agree with the finding, obtained in many psychological experiments, that hair shape is the main factor in discriminating male/female [1], and with an investigation of the skin color of Japanese youths [19].
3.4.2. Regions of eyebrows and eyes (r2, r3, r4)
Figure 4 shows the distributions of connection weights in the regions of the eyebrows, eyes, nose, and mouth, superimposed on the original facial images (16 × 16 mosaic blocks), each of which produced the largest output-layer value for male or female.

Generally, the eyebrows of men are thicker and darker than those of women. The authors' experiments show that the connection weight of the eyebrow region (r2) of a female face is negative. This is caused by shade cast by the hair at the top of the female face when photographed (although photographs were preselected to avoid this effect). The results for the eyebrow region (r3) show that a large negative connection weight appears below the eyebrows in male facial images (a characteristic of male faces).

There are large connection weights around the pupils of female faces. This indicates that females have large distinct pupils, while males have a narrow eyelid opening.
These results agree with the measured results of
Yamaguchi and colleagues [8] in which facial configura-
tions of Japanese were analyzed.
3.4.3. Nose and mouth regions (r5, r6, r7)
In the nose region (r5, r6), there are many negative connection weights to the left side of the nose in female faces. This is caused by the shadow cast by the nose when the face was photographed. The nose height can be estimated from its shadow. This suggests that females have narrower and higher noses than males. The finding that the female nose is narrower than the male nose agrees with Yamaguchi's paper [8].

In the mouth region (r7), many female facial images have positive connection weights around the right side of the nose. This appears to be an effect of the photographic conditions in addition to the bright color of the lips. The connection weight is negative around the mouth in male faces and positive in female faces. This is due to the difference in male/female skin color and the fact that males' lips are thicker than females'.
4. Conclusions
In this paper, the features of male/female faces were extracted using an NN. The experiments yielded features of hair style and skin color (from the whole face); the sizes of the eyebrows and eyes (from the eye region); the nose height (from the nose region); and the thickness of the lips (from the mouth region). The experiments in this paper used only 50 training images and 10 evaluation images (all from school photographs). However, the results are similar to those of psychological experiments and facial measurements, showing that the NN can accurately identify partial features of male/female faces.
Future plans of this project are: (a) to sample facial
images from a wide age group; (b) to include color infor-
mation; (c) to include the features of the whole face [8, 20];
and (d) to examine in detail the roles of hidden-layer units.
These extensions will increase the applications of the proposed method, including understanding the information-processing mechanism used in discriminating male/female identity.

Fig. 4. Examples of connection weights in the r2, r3, r4, r5, r6, r7 regions of both male and female.
Acknowledgments. The authors wish to express
their thanks to Professor Yasumasa Teramachi (Department
of Information and Computer Science, Polytechnic Univer-
sity) for his advice, and to Mr. Hideki Kumano and Shin-
ichi Okamoto (Vocational Promotion Corporation) for their
support in preparing facial images.
REFERENCES
1. Bruce V (translated by Yoshikawa S). Facial recognition and information processing. Science-Sha; 1990.
2. Multidisciplinary Research Council of Japan (Ed.), supervised by Ichimatsu S, Muraoka Y. Sensitivity and information processing. Kyoritsu-Shuppan; 1993.
3. Roberts T, Bruce V. Feature saliency in judging sex and familiarity of faces. Perception 1988;17:475–481.
4. Bruce V, Burton AM, Hanna E, Healy P, Mason O. Sex discrimination: How do we tell the difference between male and female faces? Perception 1993;22:131–152.
5. Kanno T, Agui T, Nagahashi H. Psychological elements for male/female identification. Spring National Conference IEICE, Japan, D-518, p 279, 1993.
6. Kumano H, Kanno T, Teramachi Y, Nagahashi H. Fixation points measurement system considering influence of peripheral vision. Tech Rep IEICE 1996;HCS96-27.
7. Burton AM, Bruce V, Dench N. What's the difference between men and women? Evidence from facial measurement. Perception 1993;22:153–176.
8. Yamaguchi M, Kato T, Akamatsu S. Relationship between physical traits and subjective impressions of the face. Trans IEICE 1996;J79-A:279–287.
9. Sakai T, Nagao M, Kanade T. Analysis of facial photographs using computer. Trans IEICE 1973;56-D:226–233.
10. Minami T. Facial identity technology. SICE 1986;25.
11. Golomb BA, Lawrence DT, Sejnowski TJ. SEXNET: A neural network identifies sex from human faces. Adv Neural Inf Process Syst 1991;3:572–577.
12. Kaai H, Tamura S. Gender and individual classification by mosaic facial images with different resolutions using neural network. ITE 1992;46:93–96.
13. Kosugi M. Human face recognition using mosaic pattern and neural networks. Trans IEICE 1993;J76-D-II:1123–1139.
14. Funaba K. Principle of hierarchical neural networks. SICE 1991;30:280–284.
15. Agui T, Nagahashi H, Takahashi H. Neural programs. Shokodo; 1993.
16. Takahashi H, Nakajima M. A study of feedforward neural networks using genetic algorithms. Trans IEICE 1996;J79-D-II:1920–1928.
17. Nagao T, Agui T. Genetic algorithm. Shokodo; 1993.
18. Sugano T, Kumano H, Teramachi Y, Nagahashi H. Designing neural networks for recognition of male and female faces using genetic algorithms. Trans IEICE 1996;J80-D-II:2251–2253.
19. Color Science Association of Japan (Ed.). Handbook of color science. Tokyo University Press; 1991. p 1097–1117.
20. Rhodes G. Looking at faces: First-order and second-order features as determinants of facial appearance. Perception 1988;17:43–63.
Tsuneo Kanno graduated from the Electronics Department of Polytechnic University in 1975. He then joined the
Department of Information and Computer Science of that university and presently is a lecturer (information engineering). He
has engaged in research on pattern recognition and human communication. He is a member of IEICE and the Institute of Printing
of Japan.
Hiroshi Nagahashi graduated from the Electrical Engineering Department of Tokyo Institute of Technology in 1975 and
completed his doctoral course in 1981. He then joined Yamagata University. He moved to Tokyo Institute of Technology in 1990
and presently is a professor (physical information engineering) in the Department of Information Processing, Interdisciplinary
Graduate School of Science and Engineering. He has engaged in research mainly on pattern recognition and image processing.
He holds a D.Eng. degree. He is a member of the Institute of Information Processing of Japan, IEEE, and IEICE.
Takeshi Agui graduated from the Electrical Engineering Department of Tokyo Institute of Technology in 1959 and
completed his doctoral course in 1964. He was with that institute from 1964 to 1996. He then joined TOIN University of
Yokohama as a professor (control system engineering). He has engaged in research on pattern recognition and image processing.
He holds a D.Eng. degree. He is a member of the Institute of Information Processing of Japan, IEEE, and IEICE.