Extraction of Male and Female Facial Features Using Neural
Networks
Tsuneo Kanno
Department of Information and Computer Science, Polytechnic University, Sagamihara, Japan 229-1196
Hiroshi Nagahashi
Department of Information Processing, Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of
Technology, Yokohama, Japan 227-8503
Takeshi Agui
Faculty of Engineering, TOIN University of Yokohama, Yokohama, Japan 225-5802
SUMMARY
Features of male/female facial images are examined by analyzing the distribution of connection weights between the hidden-layer units and the input-layer units of a neural network trained to distinguish them. Twenty-five gray-level (black-and-white) facial images were used for each of the male and female samples. Several different numbers of mosaic blocks were used for the whole face region and for each facial component region. The training parameters of the neural network and the minimum number of hidden-layer units that can discriminate male/female were obtained by a genetic algorithm. A neural network with these parameters converges stably in every region and can discriminate male/female with 100% accuracy. In the feature extraction experiments, hidden-layer units were obtained that respond significantly to male or female facial images. The facial features of males/females can be extracted by analyzing the connection weights between the hidden-layer units and the input-layer units. The facial features extracted by the proposed method are similar to those obtained in psychological experiments and facial measurements. © 2000 Scripta Technica, Syst Comp Jpn, 31(3): 68–76, 2000
Key words: Neural networks; male and female fa-
cial features; hidden-layer unit; connection weight; genetic
algorithm.
1. Introduction
Humans can instantly recognize some personal fea-
tures from a face such as personal identity, sex, age, and
facial expressions. It is thought that personal identification is carried out by synthesizing common facial features. Facial features have long been studied in psychology and cognitive science [1], and more recently in kansei (affective) engineering [2]. Facial components, such as the eyes, nose, and mouth, have been used for male/female identification [3–6]. Measured positions of facial components have also been used for male/female identification [7, 8].
[7, 8]. For example, Yamaguchi and colleagues [8] have
Systems and Computers in Japan, Vol. 31, No. 3, 2000
Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J81-D-II, No. 11, November 1998, pp. 2645–2652
examined the positions of facial components (at 26 feature
points) to estimate the age and sex of each person, using 50
Japanese men and 50 Japanese women (all in their 20s), and
25 Japanese boys and 25 Japanese girls (average age 6
years). Their results show that facial features are deter-
mined by relatively small parts such as eyes, eyebrows, and
nose.
Computer recognition of human faces has been stud-
ied since the 1970s [9, 10]. Facial discrimination of
male/female by using a neural network (NN) has shown
high accuracy [11–13]. An NN is a parallel distributed information-processing system and is well suited to processing data containing ambiguous elements (e.g., the edges of a nose or a mouth). Because the coefficients that form its circuit structure are adjustable, an NN can find regularities in an object through repeated training. The hidden layer of a hierarchical NN condenses the input data obtained through training, thereby carrying out a kind of multivariate analysis. If back-propagation (BP) training is carried out ideally, the NN extracts the k principal components of the data in its k hidden units [14].
Kosugi [13] obtained accurate sex and personal identification by using mosaic facial images with an NN. He also examined male/female features automatically extracted from facial images by the hidden-layer units.
These sex and personal identification methods using a conventional NN aim at practical automatic facial identification, without examining details such as fine feature extraction for each facial element, the characteristics of the hidden-layer units by which image features are extracted, or the effect of the number of mosaic blocks in the original image.
This paper examines the features of male/female facial images by using the distribution of connection weights between the hidden-layer units and the input-layer units of a hierarchical NN that can identify these images with 100% accuracy after BP training. In this method, facial features are extracted by using not only the whole facial region but also each facial-component region, choosing the number of mosaic blocks for each area. The results of the experiments were evaluated by comparison with data reported from psychological experiments and facial measurements. The training parameters of the NN and the minimum required number of hidden-layer units were determined by a genetic algorithm (GA).
2. Feature Extraction Using NN
2.1. Structure of NN
An NN with a three-layer hierarchical structure (a single hidden layer) is used in this paper. A BP method including an inertia term is used for training. Units in adjacent layers are fully connected, and units within the same layer are not connected. The output Hj of hidden-layer unit j and the output Ok of output-layer unit k are given, respectively, by

Hj = f( Σ_{i=1}^{NI} wij · Ii )    (1)

Ok = f( Σ_{j=1}^{NH} vjk · Hj )    (2)

with the sigmoid function

f(x) = 1 / (1 + exp(−x/ε))    (3)

where
Ii: output of input-layer unit i
wij: connection weight between input-layer unit i and hidden-layer unit j
vjk: connection weight between hidden-layer unit j and output-layer unit k
NI: number of input-layer units
NH: number of hidden-layer units
ε: slant of the sigmoid function

The mean square error E between the teaching signal tnk and the output Onk of the output-layer unit is given by

E = (1 / (ND · NO)) Σ_{n=1}^{ND} Σ_{k=1}^{NO} (tnk − Onk)²    (4)

where
ND: number of training patterns
NO: number of output-layer units

The connection weights wij and vjk are corrected sequentially, and the NN is trained repeatedly until E becomes less than a preset value. When E falls below the preset value, training is stopped and convergence is regarded as achieved.
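As an illustration, the forward pass and training error of Section 2.1 can be sketched in Python with NumPy. This is a minimal sketch, not the authors' implementation: the array shapes and the exact sigmoid form are assumptions consistent with the definitions above.

```python
import numpy as np

def sigmoid(x, eps=1.0):
    # Sigmoid with slant parameter eps (form assumed).
    return 1.0 / (1.0 + np.exp(-x / eps))

def forward(I, w, v, eps=1.0):
    # I: (NI,) input vector; w: (NI, NH) input-to-hidden weights;
    # v: (NH, NO) hidden-to-output weights.
    H = sigmoid(I @ w, eps)   # hidden-layer outputs
    O = sigmoid(H @ v, eps)   # output-layer outputs
    return H, O

def mse(T, O):
    # Mean square error over all patterns and output units.
    return float(np.mean((T - O) ** 2))
```

Training repeats the forward pass and BP weight correction until `mse` falls below the preset value.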
2.2. Determination of learning parameters using GA
Determining the parameters (such as the training coefficient η, the slant ε of the sigmoid function, and the inertia coefficient α) for each training problem in pattern recognition using a hierarchical NN requires much experience and knowledge. These training parameters significantly affect the capability of the NN, such as the number of training iterations and the convergence stability [15].
Recently the GA method has been used for determi-
nation of the structure of an NN and its learning parameters.
NNs with parameters obtained by the GA have excellent convergence stability [16]. The GA can search a large space for the best value speedily and efficiently [17]. The authors have successfully used the GA to determine the training parameters of an NN for male/female facial identification [18].
In this experiment, the parameters η, ε, α, and NH are obtained by using the GA while varying the number of mosaic blocks in each region of the object. These parameters are encoded as the genes of each individual. An NN is formed from the parameters given by the genes of each individual and is repeatedly trained on the input facial image data, so that E and NT are obtained. The fitness f of an individual is defined by

f = w1 · (Emin / E) + w2 · (Hmax − NH) / Hmax + w3 · (Nmax − NT) / Nmax    (5)
where
Emin: Minimum mean square error
E: Mean square error
Hmax: Maximum number of hidden-layer units
NH: Number of hidden-layer units
Nmax: Maximum number of learning steps
NT: Number of training steps
w1, w2, w3: Weight coefficients
Emin, Hmax, and Nmax are determined by preliminary experiments. If E is still higher than Emin when training reaches Nmax, f is forced to 0 so that the fitness becomes low. The second term of Eq. (5) increases as the number of hidden-layer units is reduced, raising f. The weights w1, w2, and w3 are set according to the relative importance of E, NH, and NT. If NH is large, the gender-specific facial features may become dispersed over many units. In this experiment, to obtain the minimum necessary number of hidden-layer units, w2 is chosen to be greater than w1 and w3. The convergence of
the NN and NT are significantly influenced by the distribu-
tion of the initial connection weight, which is determined
by using random numbers. A mean fitness, which is a mean
value of fitness of five trainings with different initial
weights, was used for each individual.
Among the individuals generated in the computer, those with a large f (i.e., with small E, NH, and NT) survive under the genetic rules of generation alteration (selection, crossover, mutation). In the selection process, individuals with a large f survive at a certain rate Gr, in order of fitness, and the rest are discarded. In this experiment, an "elite strategy" (in which certain individuals are passed unchanged to the next generation) was used. In the crossover process, pairs of individuals having excellent chromosomes are crossed over with probability Cr in each genetic block, so that the total number of individuals Np remains constant. Mutation is carried out by bit reversal in the chromosome of an individual with mutation probability Mr. The generation alteration is repeated a predetermined number of times. The parameters obtained from the chromosome of the individual with the highest f in the last generation are then used as the number of hidden-layer units and the training parameters.
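One generation alteration of the kind described above can be sketched as follows. This is a hedged sketch, not the authors' code: the concrete fitness form is a hypothetical one that merely reproduces the stated behavior (f = 0 for non-converged individuals; the second term grows as NH shrinks), and chromosomes are modeled as lists of 0/1 bits.

```python
import random

def fitness(E, NH, NT, Emin=1e-4, Hmax=40, Nmax=500, w=(0.2, 0.6, 0.2)):
    # Hypothetical concrete fitness: rewards small E, NH, and NT.
    if NT >= Nmax and E > Emin:
        return 0.0                                  # non-converged individual
    w1, w2, w3 = w
    return (w1 * Emin / max(E, Emin)
            + w2 * (Hmax - NH) / Hmax               # grows as NH shrinks
            + w3 * (Nmax - NT) / Nmax)

def step(pop, evaluate, Gr=0.05, Cr=0.20, Mr=0.02):
    """One generation alteration: elitist selection, crossover, mutation."""
    ranked = sorted(pop, key=evaluate, reverse=True)
    n_elite = max(1, int(Gr * len(pop)))
    new_pop = ranked[:n_elite]                      # elites survive unchanged
    while len(new_pop) < len(pop):                  # keep Np constant
        a, b = random.sample(ranked[:len(pop) // 2], 2)  # fitter parents
        child = [y if random.random() < Cr else x for x, y in zip(a, b)]
        child = [bit ^ (random.random() < Mr) for bit in child]  # bit reversal
        new_pop.append(child)
    return new_pop
```

In the experiments of Section 3.2, `evaluate` would train an NN built from the decoded chromosome and return the mean of five fitness values obtained with different initial weights.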
2.3. Extraction of male/female facial features
Male/female facial features can be identified by using an NN (with parameters obtained by the GA method) and repeating its training. In this process, each hidden-layer unit shows one of four patterns: (a) the unit responds significantly to male facial images alone, (b) the unit responds significantly to female facial images alone, (c) the unit responds to both male and female facial images, or (d) the unit does not respond to any facial image. It is not possible to predict which pattern appears in a given hidden-layer unit, since this depends on the initial connection weights. However, the NN identifies a male or female face by using units of type (a) or (b). A part in which the input Ii connected to a positive connection weight wij is large (i.e., the image is bright) and a part in which the input Ii connected to a negative connection weight wij is small (i.e., the image is dark) contribute significantly to increasing the output Hj of hidden-layer unit j. For example, the output Hj of a hidden-layer unit j that responds to male facial images alone strongly depends on the connection weights wij connecting the inputs Ii to j.

Therefore, the dominant features by which an NN identifies male/female facial images can be estimated from the distribution of wij. This paper uses this method.
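The analysis of Section 2.3 can be sketched as follows; the array shapes and helper names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def feature_unit(H_male, H_female):
    # H_male, H_female: (ND, NH) hidden-layer outputs for the male and
    # female training images. A type (a)/(b) unit responds strongly to one
    # sex only, so the largest gap in mean activation picks out such a unit.
    gap = H_male.mean(axis=0) - H_female.mean(axis=0)
    return int(np.argmax(np.abs(gap)))

def weight_map(w, j, shape):
    # Reshape the input weights wij of hidden unit j back onto the mosaic
    # grid and normalize by the maximum absolute weight, as in Fig. 3.
    m = w[:, j].reshape(shape)
    return m / np.abs(m).max()
```

Plotting `weight_map` with square sizes proportional to absolute value (white positive, black negative) reproduces the style of visualization used in Fig. 3.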
3. Experimental Results and Discussion
3.1. Input facial image
To extract a feature of each part of a facial image by
using the proposed method, it is necessary to normalize the
size and position of the input image. Also, it is necessary to
minimize the unevenness of photographic conditions of the
image, since the image is a gray-scale type (black-and-
white). In this experiment, 60 facial photographs (25 male
and 25 female faces for training, and 5 male and 5 female
faces for the NN evaluation) were sampled from a school
album in which all of the photographs were taken under
relatively uniform light conditions. Each photograph was
scanned by an image scanner, and all were manually normalized so that the background was removed, the eye
positions were leveled, and the face width was constant.
Faces with glasses, hair decorations, and mustaches were
excluded. Each photograph was converted from a color image to a 256-level gray-scale (black-and-white) image with 600 × 640 pixels.
To extract details of important features, seven regions
of each face as shown in Fig. 1(a) were used for the
experiment: (r1) the whole face except for hair, (r2) eyes
with eyebrows, (r3) eyebrows, (r4) eyes, (r5) nose and
mouth, (r6) nose, and (r7) mouth. (r7) was adjusted verti-
cally for each face since individual differences of mouth
size are considerable.
Each facial region is converted into a mosaic image, each mosaic block being a rectangle. The block size varies from 8 × 8 pixels (12 × 12 for the r1 region) up to the maximum block size, at which the NN can no longer identify a male/female face. The mean gray level of the pixels in each mosaic block is used as its gray level. Figure 1(b) shows an example in which 16 × 16 mosaic blocks (24 × 24 pixels each) are used for the whole (r1) region.
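The block-mean mosaic conversion described above can be sketched as follows (a minimal NumPy sketch; the cropping policy for images that do not divide evenly is an assumption):

```python
import numpy as np

def to_mosaic(img, blocks):
    # img: 2-D gray-level array; blocks: (rows, cols) of mosaic blocks.
    # Each block takes the mean gray level of its pixels.
    br, bc = blocks
    h, w = img.shape
    img = img[: h - h % br, : w - w % bc]   # crop to an even grid
    return img.reshape(br, img.shape[0] // br,
                       bc, img.shape[1] // bc).mean(axis=(1, 3))
```

For example, a 384 × 384 region with `blocks=(16, 16)` yields 16 × 16 blocks of 24 × 24 pixels each, as in Fig. 1(b).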
The number of training images is greater than the number of evaluation images because the aim is to examine the elements of male/female identification in an NN whose training has been completed. The evaluation images were also applied to the trained NN as untrained data, in order to evaluate the degree of completion of the NN.
3.2. Determination of training parameters by
using GA, and evaluation
The number of input-layer units for the NN is the
same as the number of mosaic blocks in each facial image.
The number of hidden-layer units is the minimum number necessary for convergence, as determined by the GA. Two output-layer units were used since there are two categories (male and female). To determine the parameters for the NN and GA, preliminary experiments were carried out for the nose (r6) and mouth (r7) regions, in which convergence is most difficult. The number of bits for each training parameter of the NN was determined experimentally by assuming a maximum of 40 hidden-layer units, based on the number of mosaic blocks (the same as the number of input-layer units) and on the training parameters used in the XOR problem. The maximum number of training iterations and the minimum mean square error were determined in consideration of the convergence of the NN and the rate of correct identification.
Increasing the total number of individuals, a parameter of the GA, is useful because it expands the search region of a single generation. Raising the upper limit on the number of generation alterations is also useful in the search for the fittest value. However, increasing these values also increases the computational load and search time. Therefore, the amount of computation was taken into account in setting the upper limits on the total number of individuals and the number of generation alterations. The survival probability, crossover probability, mutation probability, and weight coefficients were determined by considering the distribution of fitness and the transition of the fittest individuals between generations.
The experiments were carried out based on these preliminary experiments. Twenty-seven bits were used for the chromosome of each individual, which encodes the training parameters of the NN and the number of hidden-layer units NH. η, ε, and α each range from 0.00 to 1.27; these values are derived from 7-bit integers extracted from the 27-bit chromosome, each multiplied by 0.01. A 6-bit integer (0 to 63) encodes the number of hidden-layer units. In this experiment, the NN (with parameters obtained from each individual) sequentially learns the 50 facial images, and learning is repeated until the mean square error falls below Emin = 10^−4 or the number of training iterations exceeds Nmax = 500. Np = 100, an upper limit of 50 generation alterations, and Gr = 0.05 were used. The crossover probability Cr = 0.20 was applied to each block of parameter bits. Mutation was performed by bit reversal with probability Mr = 0.02. The fitness f of each individual was obtained using w1 = 0.2, w2 = 0.6, and w3 = 0.2 for the weight coefficients of Eq. (5), so that the minimum necessary number of hidden-layer units is obtained.
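The chromosome decoding described above can be sketched as follows. The 27-bit layout (three 7-bit parameter fields followed by one 6-bit field) is an assumption consistent with the stated ranges; the paper does not give the field order.

```python
def decode(chrom):
    # chrom: 27-character string of '0'/'1' bits.
    # Three 7-bit fields scaled by 0.01 give 0.00-1.27; a 6-bit field gives 0-63.
    assert len(chrom) == 27
    eta   = int(chrom[0:7], 2) * 0.01    # training coefficient
    eps   = int(chrom[7:14], 2) * 0.01   # slant of sigmoid function
    alpha = int(chrom[14:21], 2) * 0.01  # inertia coefficient
    NH    = int(chrom[21:27], 2)         # number of hidden-layer units
    return eta, eps, alpha, NH
```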
Table 1 shows the parameters obtained by the GA for each region and mosaic resolution. Parameters could not be obtained by the GA for mosaics coarser than those shown in Table 1. The convergence rate, mean number of training iterations, and male/female identification rate for trained and untrained facial images (over 100 different initial connection weights) were used to evaluate the parameters. An NN with parameters obtained by the GA converges reliably and identifies trained and untrained male/female facial images with almost 100% accuracy. An exception is the (r5) region, with 2 × 3 mosaic blocks over the nose and mouth, whose convergence rate is about 60%.

Fig. 1. Seven regions used in the experiment and an example of mosaic pattern.
Using an NN with the parameters obtained by the GA, the effects of the number of hidden-layer units on the convergence rate and the number of training iterations were investigated. The number of hidden-layer units was varied from 1 to Hmax = 40, and 100 experiments were carried out with different initial connection weights. Figure 2 shows the results obtained for the whole facial region (r1). The results show that this kind of NN converges almost 100% of the time, independently of the initial connection weights, provided the number of hidden-layer units is greater than the number obtained by the GA. An exception is the nose-and-mouth region (r5), where the number of mosaics is 2 × 3.
3.3. Characteristics of hidden-layer unit
Table 1. Neural network parameters obtained by the GA, and the results of evaluation

Fig. 2. Effects of the number of hidden units on convergence rate and training iterations.

When the 50 male and female facial images are applied to the trained NN, hidden-layer units of the four types [(a) to (d) in Section 2.3] are obtained. Figure 3 shows the distribution of the connection weights between the hidden-layer units and the input-layer units in the whole facial region (r1), with varying numbers of mosaic blocks: (a) 4 × 4 mosaics, (b) 8 × 8 mosaics, (c) 16 × 16 mosaics, and (d) 32 × 32 mosaics. The size of each square in Fig. 3 is proportional to the absolute value of the connection weight, a white square being positive and a black square negative. The connection weights are normalized by the maximum absolute connection weight.
The influence of the initial values and of the number of hidden-layer units on the distribution of connection weights was investigated. The experiments were repeated 100 times with different initial values, and the connection weights of the feature hidden-layer units of the four types were obtained. The four weight distributions were evaluated by the mean variance of each mosaic block across runs (within-group variance). The effect of the number of hidden-layer units was evaluated by varying it from the number obtained by the GA to a maximum of 40. Table 2 shows the within-group variance of connection weights in each region. The male/female facial features shown in Fig. 3 are almost independent of the initial values of the connection weights. However, if the number of hidden-layer units is changed, the within-group variance becomes larger over the whole region. The reason is apparently that when the number of hidden-layer units increases, the facial features are dispersed over more units, and when it decreases, the features are condensed, so that the within-group variance is affected. The connection weight distribution of these hidden-layer units has a larger variance and less regularity than the two other connection weight distributions. The role of these feature hidden-layer units needs to be investigated in detail.
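The within-group variance evaluation described above can be sketched as follows (a minimal sketch; the exact averaging the authors used is assumed to be variance per mosaic block across runs, averaged over blocks):

```python
import numpy as np

def within_group_variance(maps):
    # maps: (runs, NI) normalized weight vectors for the same feature unit,
    # collected from repeated trainings with different initial weights.
    # Small values mean the extracted feature is stable against initialization.
    return float(np.var(maps, axis=0).mean())
```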
3.4. Results of extraction of facial features
3.4.1. Whole facial area (r1)
Male/female features in the whole face (r1) are estimated from the distribution of connection weights shown in Fig. 3. The connection weights for male and female faces have almost the same magnitudes with opposite signs. For example, a dark part in a male face corresponds to a bright part in a female face. Even the NN with 4 × 4 mosaic blocks could identify male/female accurately. However, it is difficult to know which elements in this NN contributed to the identification. Such elements can be found easily in Figs. 3(c) and 3(d), which have more mosaic blocks.

The cheeks and chin of most female faces have a lighter skin color, and the connection weights there are positive (i.e., a female face is generally brighter than a male face). The right edge and chin of most faces have large positive connection weights, as shown in Fig. 3(c). This is because the photographs were taken with the faces illuminated from the right side.
Fig. 3. Four configurations of connection weights in the
r1 region of both males and females.
Table 2. Within-group variance of connection weights
in each region
The negative areas on the left and right sides of the female facial images are due to the shadow of hair on the forehead, which is darker than in male facial images. The large positive areas around the ears of male facial images are due to the fact that most males' hair does not reach their ears.
The results of these experiments agree with the finding, obtained in many psychological experiments, that hair shape is the main factor in discriminating male/female [1], and with an investigation of the skin color of Japanese youths [19].
3.4.2. Regions of eyebrows and eyes (r2, r3, r4)
Figure 4 shows the distributions of connection weights in the regions of the eyebrows, eyes, nose, and mouth, superimposed on the original facial images (16 × 16 mosaic blocks), each of which produced the largest output-layer value for male or female.

Generally, the eyebrows of men are thicker and darker than those of women. The authors' experiments show that the connection weight of the eyebrow region (r2) of a female face is negative. This is caused by shade cast by the hair at the top of the female face when photographed (although photographs were preselected to avoid this effect). The results for the eyebrow region (r3) show that a large negative connection weight appears below the eyebrows in male facial images (a characteristic of male faces).

There are large connection weights around the pupils of female faces. This indicates that females have large distinct pupils, while males have a narrow eyelid opening.
These results agree with the measured results of
Yamaguchi and colleagues [8] in which facial configura-
tions of Japanese were analyzed.
3.4.3. Nose and mouth regions (r5, r6, r7)
In the nose region (r5, r6), there are many negative connection weights to the left side of the nose in female faces. This is caused by the shadow cast by the nose when the face was photographed. The nose height can be estimated from its shadow. This suggests that females have narrower and higher noses than males. The finding that the female nose is narrower than the male nose agrees with Yamaguchi's paper [8].

In the mouth region (r7), many female facial images have positive connection weights around the right side of the nose. This appears to be an effect of the photographic conditions in addition to the bright color of the lips. The connection weight is negative around the mouth in male faces and positive in female faces. This is due to the difference in male/female skin color and the fact that males' lips are thicker than females'.
4. Conclusions
In this paper, the features of male/female faces were extracted using an NN. The experiments yielded features of hair style and skin color (from the whole face); the sizes of the eyebrows and eyes (from the eye region); the nose height (from the nose region); and the thickness of the lips (from the mouth region). The experiments in this paper used only 50 training images and 10 evaluation images (all from school photographs). However, the results are similar to those of psychological experiments and facial measurements, showing that the NN can accurately identify partial features of male/female faces.
Future plans of this project are: (a) to sample facial
images from a wide age group; (b) to include color infor-
mation; (c) to include the features of the whole face [8, 20];
and (d) to examine in detail the roles of hidden-layer units.
These extensions will increase the applications of the proposed method, including understanding the information-processing mechanism used in discriminating male/female identity.

Fig. 4. Examples of connection weights in the r2, r3, r4, r5, r6, r7 regions of both male and female.
Acknowledgments. The authors wish to express
their thanks to Professor Yasumasa Teramachi (Department
of Information and Computer Science, Polytechnic Univer-
sity) for his advice, and to Mr. Hideki Kumano and Shin-
ichi Okamoto (Vocational Promotion Corporation) for their
support in preparing facial images.
REFERENCES
1. Bruce V (translated by Yoshikawa S). Facial recognition and information processing. Science-Sha; 1990.
2. Multidisciplinary Research Council of Japan (Ed.), supervised by Ichimatsu S, Muraoka Y. Sensitivity and information processing. Kyoritsu-Shuppan; 1993.
3. Roberts T, Bruce V. Feature saliency in judging sex and familiarity of faces. Perception 1988;17:475–481.
4. Bruce V, Burton AM, Hanna E, Healy P, Mason O. Sex discrimination: How do we tell the difference between male and female faces? Perception 1993;22:131–152.
5. Kanno T, Agui T, Nagahashi H. Psychological elements for male/female identification. Spring National Conference IEICE, Japan, D-518, p 279, 1993.
6. Kumano H, Kanno T, Teramachi Y, Nagahashi H. Fixation points measurement system considering influence of peripheral vision. Tech Rep IEICE 1996;HCS96-27.
7. Burton AM, Bruce V, Dench N. What's the difference between men and women? Evidence from facial measurement. Perception 1993;22:153–176.
8. Yamaguchi M, Kato T, Akamatsu S. Relationship between physical traits and subjective impressions of the face. Trans IEICE 1996;J79-A:279–287.
9. Sakai T, Nagao M, Kanade T. Analysis of facial photographs using computer. Trans IEICE 1973;56-D:226–233.
10. Minami T. Facial identity technology. SICE 1986;25.
11. Golomb BA, Lawrence DT, Sejnowski TJ. SEXNET: A neural network identifies sex from human faces. Adv Neural Inf Process Syst 1991;3:572–577.
12. Kaai H, Tamura S. Gender and individual classification by mosaic facial images with different resolutions using neural network. ITE 1992;46:93–96.
13. Kosugi M. Human face recognition using mosaic pattern and neural networks. Trans IEICE 1993;J76-D-II:1123–1139.
14. Funaba K. Principle of hierarchical neural networks. SICE 1991;30:280–284.
15. Agui T, Nagahashi H, Takahashi H. Neural programs. Shokodo; 1993.
16. Takahashi H, Nakajima M. A study of feedforward neural networks using genetic algorithms. Trans IEICE 1996;J79-D-II:1920–1928.
17. Nagao T, Agui T. Genetic algorithm. Shokodo; 1993.
18. Sugano T, Kumano H, Teramachi Y, Nagahashi H. Designing neural networks for recognition of male and female faces using genetic algorithms. Trans IEICE 1996;J80-D-II:2251–2253.
19. Color Science Association of Japan (Ed.). Handbook of color science. Tokyo University Press; 1991. p 1097–1117.
20. Rhodes G. Looking at faces: First-order and second-order features as determinants of facial appearance. Perception 1988;17:43–63.
Tsuneo Kanno graduated from the Electronics Department of Polytechnic University in 1975. He then joined the
Department of Information and Computer Science of that university and presently is a lecturer (information engineering). He
has engaged in research on pattern recognition and human communication. He is a member of IEICE and the Institute of Printing
of Japan.
Hiroshi Nagahashi graduated from the Electrical Engineering Department of Tokyo Institute of Technology in 1975 and
completed his doctoral course in 1981. He then joined Yamagata University. He moved to Tokyo Institute of Technology in 1990
and presently is a professor (physical information engineering) in the Department of Information Processing, Interdisciplinary
Graduate School of Science and Engineering. He has engaged in research mainly on pattern recognition and image processing.
He holds a D.Eng. degree. He is a member of the Institute of Information Processing of Japan, IEEE, and IEICE.
Takeshi Agui graduated from the Electrical Engineering Department of Tokyo Institute of Technology in 1959 and
completed his doctoral course in 1964. He was with that institute from 1964 to 1996. He then joined TOIN University of
Yokohama as a professor (control system engineering). He has engaged in research on pattern recognition and image processing.
He holds a D.Eng. degree. He is a member of the Institute of Information Processing of Japan, IEEE, and IEICE.