Gender Recognition



    ABSTRACT

With more and more systems being developed to identify a person, there is a need for a system that can perform personal identification tasks, such as determining gender, without any human intervention. In this thesis, we consider the problem of gender classification from frontal facial images using genetic feature subset selection. We argue that feature selection is an important issue in gender classification and demonstrate that Genetic Algorithms (GA) can select good subsets of features (i.e., features that encode mostly gender information), reducing the classification error. Gender is one of the most prominent attributes of a facial image, and improving existing gender classification methods will raise the performance of face retrieval and classification methods for large repositories.

The gender recognition system consists of three steps. First, some pre-processing is applied to the input image. Second, facial features are extracted and taken as the input of the genetic algorithm (GA). Third, classification is carried out by the genetic algorithm to identify the gender.

Key Words: Gender Recognition, Genetic Algorithms, Male, Female.


INDEX

CHAPTER 1: INTRODUCTION
1.1 Genetic Algorithm
1.1.1 Methodology of Genetic Algorithm
1.1.2 Terminologies of Genetic Algorithm
1.1.3 Operators of Genetic Algorithm
1.1.4 Parameters of Genetic Algorithm
1.2 Motivation
1.3 Problem Statement
1.4 Organization of the Thesis

CHAPTER 2: BACKGROUND
2.1 Search Space
2.2 Basic Recommendations
2.3 Selection Techniques
2.3.1 Roulette Wheel Selection
2.3.2 Rank Selection
2.3.3 Steady-State Selection
2.3.4 Elitism
2.4 Encoding Schemes
2.4.1 Binary Encoding
2.4.2 Permutation Encoding
2.4.3 Value Encoding
2.4.4 Tree Encoding
2.5 Eigenspace Representation
2.5.1 Computation of the Eigenfaces
2.6 Independent Component Analysis
2.6.1 Face Image Representation Based on ICA

CHAPTER 3: LITERATURE SURVEY
3.1 Overall Review of Previous Work
3.2 Methodology
3.2.1 Face Detection
3.2.2 Feature Extraction
3.2.3 Classifier

CHAPTER 4: THE PROPOSED APPROACH
4.1 Introduction
4.2 Feature Extraction
4.3 Genetic Feature Selection
4.3.1 Initial Population
4.3.2 Crossover
4.3.3 Mutation
4.4 Gender Classification
4.5 Development of the Gender Recognition System

CHAPTER 5: RESULT AND DISCUSSION
5.1 Introduction
5.2 Databases Used
5.3 Experimental Results
5.3.1 Correct Classification Result
5.3.2 False Classification Result
5.4 Result Analysis

CHAPTER 6: CONCLUSION AND SCOPE OF THE FUTURE WORK
6.1 Conclusion
6.2 Limitations
6.3 Future Research Work

REFERENCES


List of Figures

Figure 1: Working of Genetic Algorithm
Figure 2: Example of chromosomes with tree encoding
Figure 3: Zigzag scan of DCT coefficients
Figure 4: Example of k-NN classification
Figure 5: Gender Recognition Methodology Example
Figure 6: Proposed Gender Recognition System Flow
Figure 7: Neural network, interconnected group of nodes
Figure 8: GUI of the Gender Recognition System
Figure 9: Indian facial database sample
Figure 10: Stanford student database sample
Figure 11: Indian Female Face
Figure 12: Female gender recognized
Figure 13: Indian Male Face
Figure 14: Male gender recognized
Figure 15: Male Face
Figure 16: Male gender recognized
Figure 17: Female Face
Figure 18: Female gender recognized
Figure 19: Experiment on Indian Female Face
Figure 20: False gender recognized


List of Tables

Table 1: Examples of chromosomes
Table 2: Crossover operator on chromosomes
Table 3: Mutation operator on chromosomes


    CHAPTER 1: INTRODUCTION

1.1 Genetic Algorithm

    In the computer science field of artificial intelligence, a genetic algorithm (GA) is a search

    heuristic that mimics the process of natural evolution. This heuristic is routinely used to generate

    useful solutions to optimization and search problems. Genetic algorithms belong to the larger

    class of evolutionary algorithms, which generate solutions to optimization problems using

techniques inspired by natural evolution, such as inheritance, mutation, selection, and crossover. [7]

    A typical genetic algorithm requires:

    a) A genetic representation of the solution domain.

    b) A fitness function to evaluate the solution domain.

    1.1.1 Methodology of Genetic Algorithm:

    In a genetic algorithm, a population of strings (called chromosomes or the genotype of the

    genome), which encode candidate solutions (called individuals or phenotypes) to an optimization

    problem, evolves toward better solutions. Traditionally, solutions are represented in binary as

    strings of 0s and 1s, but other encodings are also possible. The evolution usually starts from a

    population of randomly generated individuals and happens in generations.

    In each generation, the fitness of every individual in the population is evaluated; multiple

    individuals are stochastically selected from the current population, and modified to form a new

    population.


    The new population is then used in the next iteration of the algorithm. Commonly, the algorithm

    terminates when either a maximum number of generations has been produced, or a satisfactory

    fitness level has been reached for the population. If the algorithm has terminated due to a

    maximum number of generations, a satisfactory solution may or may not have been reached.[7]

Figure 1: Working of Genetic Algorithm


    1.1.2 Terminologies of Genetic Algorithm:

    There are some essentials terminologies used to understand Genetic Algorithm, they are

    as follows:

    Chromosome should in some way contain information about solution which it represents.

    Population of n chromosomes: Set of all possible solutions.

    Children: New solution generated from two present and available solutions.

    Selection: Select two parent chromosomes from a population according to their fitness

    (the better fitness, the bigger chance to be selected).

    Accepting: Place new offspring in a new population.

    Elitism: At least one best solution is copied without changes to a new population, so the

    best solution found can survive to end of run.

Outline of the Basic Genetic Algorithm (a minimal code sketch of this loop follows the outline):

Step 1: [Start] Generate a random population of n chromosomes (suitable solutions).

Step 2: [Fitness] Evaluate the fitness f(x) of each chromosome x in the population.

Step 3: [New population] Create a new population by repeating the following steps until the new population is complete:

i. [Selection]

ii. [Crossover]

iii. [Mutation]

iv. [Accepting]

Step 4: [Replace] Use the newly generated population for a further run of the algorithm.

Step 5: [Test] If the end condition is satisfied, stop and return the best solution.

Step 6: [Loop] Go to Step 2.
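To make the outline concrete, here is a minimal, self-contained Python sketch of such a loop. It is illustrative only: the toy fitness function (counting 1-bits), the tournament-style selection, and all parameter values are assumptions chosen to show the structure of the algorithm, not the settings used in this thesis.

```python
import random

# Illustrative settings only; not the thesis's actual configuration.
CHROM_LEN, POP_SIZE, GENERATIONS = 16, 20, 50
CROSSOVER_RATE, MUTATION_RATE = 0.9, 0.01

def fitness(chrom):
    return sum(chrom)  # toy objective: maximize the number of 1s

def select(pop):
    # tournament of size 2: the fitter of two random individuals wins
    a, b = random.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2):
    if random.random() < CROSSOVER_RATE:
        point = random.randint(1, CHROM_LEN - 1)
        return p1[:point] + p2[point:]
    return p1[:]  # no crossover: offspring is a copy of a parent

def mutate(chrom):
    return [1 - g if random.random() < MUTATION_RATE else g for g in chrom]

# Step 1: random initial population
population = [[random.randint(0, 1) for _ in range(CHROM_LEN)] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):                       # Steps 2-6
    best = max(population, key=fitness)            # elitism: keep the best individual
    offspring = [mutate(crossover(select(population), select(population)))
                 for _ in range(POP_SIZE - 1)]
    population = [best] + offspring                # Step 4: replace the population
print(max(population, key=fitness))
```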


    1.1.3 Operators of Genetic Algorithm:

The chromosome should in some way contain information about the solution it represents. The most commonly used encoding is a binary string. A chromosome could then look like this:

    Chromosome 1 1101100100110110

    Chromosome 2 1101111000011110

    Table 1: Examples of chromosomes

    Each chromosome has one binary string. Each bit in this string can represent some characteristic

    of the solution.

Crossover:

After we have decided which encoding to use, we can proceed to crossover. Crossover selects genes from the parent chromosomes and creates a new offspring. The simplest way to do this is to choose a crossover point at random, copy everything before this point from the first parent, and copy everything after the crossover point from the second parent.

Crossover can then look like this (| is the crossover point):

    Chromosome 1 11011 | 00100110110

    Chromosome 2 11011 | 11000011110

    Offspring 1 11011 | 11000011110

    Offspring 2 11011 | 00100110110

    Table 2: Crossover operator on chromosomes


There are other ways to perform crossover; for example, we can choose more than one crossover point. A crossover operator designed for a specific problem can improve the performance of the genetic algorithm.

Mutation:

After crossover is performed, mutation takes place. Its purpose is to prevent all solutions in the population from falling into a local optimum of the problem being solved. Mutation randomly changes the new offspring. For binary encoding, we can switch a few randomly chosen bits from 1 to 0 or from 0 to 1 (a code sketch of both operators is given after Table 3). Mutation can then look like the following:

    Original offspring 1 1101111000011110

    Original offspring 2 1101100100110110

    Mutated offspring 1 1100111000011110

    Mutated offspring 2 1101101100110110

    Table 3: Mutation operator on chromosomes
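The two operators illustrated in Tables 2 and 3 translate directly to code for binary-string chromosomes. The following sketch is illustrative; the per-bit mutation rate is an assumed value.

```python
import random

def one_point_crossover(parent1, parent2, point=None):
    """Copy everything before the crossover point from one parent, the rest from the other."""
    if point is None:
        point = random.randint(1, len(parent1) - 1)
    return parent1[:point] + parent2[point:], parent2[:point] + parent1[point:]

def bit_flip_mutation(chromosome, rate=0.01):
    """Flip each bit independently with a small (assumed) probability."""
    flipped = {'0': '1', '1': '0'}
    return ''.join(flipped[b] if random.random() < rate else b for b in chromosome)

# Reproducing the crossover point of Table 2 (after the 5th bit):
c1, c2 = "1101100100110110", "1101111000011110"
o1, o2 = one_point_crossover(c1, c2, point=5)
print(o1)  # 1101111000011110
print(o2)  # 1101100100110110
print(bit_flip_mutation(o1, rate=0.1))  # a few randomly flipped bits
```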

1.1.4 Parameters of Genetic Algorithm:

There are two basic parameters of a GA: crossover probability and mutation probability.

Crossover probability says how often crossover will be performed. If there is no crossover, the offspring is an exact copy of a parent. If there is crossover, the offspring is made from parts of the parents' chromosomes. If the crossover probability is 100%, then all offspring are made by crossover. If it is 0%, the whole new generation is made from exact copies of chromosomes from the old population (but this does not mean that the new generation is the same!).


Crossover is performed in the hope that the new chromosomes will combine good parts of the old chromosomes and thus may be better. However, it is good to let some part of the population survive to the next generation.

Mutation probability says how often parts of a chromosome will be mutated. If there is no mutation, the offspring is taken after crossover (or copying) without any change. If mutation is performed, part of the chromosome is changed. If the mutation probability is 100%, the whole chromosome is changed; if it is 0%, nothing is changed. Mutation is performed to prevent the GA from falling into a local extreme, but it should not occur very often, because then the GA would in fact turn into a random search. [7]

1.2 Motivation:

Automatic gender identification plays an important role in the identification of a person. Gender identification can effectively reduce the search time by limiting the subsequent searching stage to either a male database or a female database. Automatic gender identification can also provide an important clue in various security and surveillance applications.

Recognizing human gender plays an important role in many human-computer interaction (HCI) areas. For example, search engines need an image filter to determine the gender of people in images from the Internet; demographic research can use gender information extracted from images to count the number of men and women entering a shopping mall or movie theater; and a smart building might use gender for surveillance and control of access to certain areas.

In psychology studies for HCI, the main focus is on how humans discriminate between males and females and what kind of features are more discriminative. A successful gender classification approach can boost the performance of many other applications, including face recognition and smart human-computer interfaces.

Most gender recognition systems are template based, since a feature-based system requires automatic facial feature extraction, which is itself a complex and time-consuming task. Thus the idea of a genetic algorithm is used in order to eliminate the gender-irrelevant features and thereby decrease the error rate by 17.7%. [4]


    1.3 Problem Statement:

The aim of this project is to apply the efficient working methodology of the Genetic Algorithm to gender recognition, i.e., to the process of determining the gender of a subject from face images, thereby replacing the existing, limited-range conventional method of face recognition. Using the Genetic Algorithm, we can eliminate the gender-irrelevant features and hence remove the static nature of gender classification in the presently existing systems.

1.4 Organization of the Thesis:

Besides this chapter, the thesis consists of five more chapters. A brief overview of each chapter is given below:

Chapter 1: This chapter gives a brief introduction to the gender identification problem. The motivation for the present work and applications of the system are presented.

Chapter 2: This chapter discusses the background study required for the development of the system and all related methods used to achieve the objective.

Chapter 3: This chapter presents a brief review of all previous work done on gender recognition systems and their performance.

Chapter 4: This chapter presents the proposed approach to the gender recognition problem. It is divided into three sections: feature selection, feature extraction, and gender classification.

Chapter 5: Experimental results and the performance of the proposed gender recognition model are presented in this chapter.

Chapter 6: Conclusions are given in this chapter. It also discusses the scope of future work.


CHAPTER 2: BACKGROUND

    Genetic algorithms are a part of evolutionary computing, which is a rapidly growing area of

    artificial intelligence. Genetic algorithms are inspired by Darwin's theory about evolution.

Simply put, the solution to a problem solved by a genetic algorithm is evolved.

    2.1 Search Space:

If we are solving a problem, we are usually looking for some solution that will be the best among others. The space of all feasible solutions (that is, the objects among which the desired solution lies) is called the search space (also the state space). Each point in the search space represents one feasible solution, and each feasible solution can be "marked" by its value or fitness for the problem.

We are looking for our solution, which is one point (or more) among the feasible solutions, that is, one point in the search space. Looking for a solution is then equivalent to looking for some extreme (minimum or maximum) in the search space. The search space may be entirely known by the time the problem is solved, but usually we know only a few points of it and generate other points as the process of finding a solution continues.

The problem is that the search can be very complicated: one does not know where to look for the solution or where to start. There are many methods for finding a suitable solution (i.e., not necessarily the best solution), for example hill climbing, tabu search, simulated annealing, and genetic algorithms. The solution found by these methods is often considered a good solution, because it is often not possible to prove what the real optimum is. [7]

As we have seen in Chapter 1, the outline of the basic GA is very general. There are many things that can be implemented differently for various problems. The first question is how to create the chromosomes, i.e., what type of encoding to choose. Connected with this are crossover and mutation, the two basic operators of a GA. The next question is how to select parents for crossover. This can be done in many ways, but the main idea is to select the better parents, in the hope that the better parents will produce better offspring. [7]

You may also think that building the new population only from new offspring could cause the loss of the best chromosome from the last population. This is true, which is why so-called elitism is often used. This means that at least one best solution is copied without changes to the new population, so the best solution found can survive to the end of the run.

    2.2 Basic Recommendations:

The following are standard recommendations for the respective parameters:

Crossover rate: The crossover rate should generally be high, about 80%-95%. (However, some results show that for some problems a crossover rate of about 60% is best.)

Mutation rate: On the other hand, the mutation rate should be very low. The best reported rates are about 0.5%-1%.

Population size: A good population size is about 20-30, although sizes of 50-100 are sometimes reported as best. Some research also shows that the best population size depends on the encoding, i.e., on the size of the encoded string. For example, for a chromosome with 32 bits the population could be, say, 32, but it should surely be about twice the best population size for a chromosome with 16 bits.

Selection: Basic roulette wheel selection can be used, but sometimes rank selection is better.

Encoding: The encoding depends on the problem and also on the size of the problem instance.


2.3 Selection Techniques:

2.3.1 Roulette Wheel Selection: Parents are selected according to their fitness. The better the chromosomes are, the more chances they have to be selected. Imagine a roulette wheel on which all chromosomes in the population are placed, with each chromosome occupying a slice whose size is proportional to its fitness.

This can be simulated by the following algorithm (a short code sketch is given below):

Step 1: [Sum] Calculate the sum of the fitness values of all chromosomes in the population: the sum S.

Step 2: [Select] Generate a random number r from the interval (0, S).

Step 3: [Loop] Go through the population, accumulating the fitness values from 0 into a running sum s. When the sum s becomes greater than r, stop and return the chromosome at which you are.

Of course, Step 1 is performed only once for each population.
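A short Python sketch of this procedure, with placeholder chromosomes and fitness values chosen only for illustration:

```python
import random

def roulette_wheel_select(population, fitnesses):
    """Pick one chromosome with probability proportional to its fitness (Steps 1-3 above)."""
    total = sum(fitnesses)                       # Step 1: sum of all fitness values, S
    r = random.uniform(0, total)                 # Step 2: random number r in (0, S)
    running = 0.0
    for chromosome, fit in zip(population, fitnesses):
        running += fit                           # Step 3: accumulate until the sum exceeds r
        if running >= r:
            return chromosome
    return population[-1]                        # guard against floating-point round-off

# Placeholder population: the fitter string "1111" is selected far more often than "0001".
pop, fits = ["1111", "0101", "0001"], [4, 2, 1]
picks = [roulette_wheel_select(pop, fits) for _ in range(1000)]
print({c: picks.count(c) for c in pop})
```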

2.3.2 Rank Selection: The previous selection scheme has problems when the fitness values differ very much. For example, if the best chromosome's fitness occupies 90% of the roulette wheel, then the other chromosomes have very few chances to be selected. Rank selection first ranks the population and then gives every chromosome a fitness based on this ranking. The worst has fitness 1, the second worst 2, etc., and the best has fitness N (the number of chromosomes in the population). After this, all the chromosomes have a chance to be selected. However, this method can lead to slower convergence, because the best chromosomes no longer differ so much from the others. [7]

2.3.3 Steady-State Selection: The main idea of this selection is that a big part of the chromosomes should survive to the next generation. The GA then works in the following way: in every generation, a few (good, high-fitness) chromosomes are selected for creating new offspring; then some (bad, low-fitness) chromosomes are removed and the new offspring are placed in their place. The rest of the population survives into the new generation.


2.3.4 Elitism: When creating a new population by crossover and mutation, there is a big chance that we will lose the best chromosome. Elitism is the name of the method which first copies the best chromosome (or a few best chromosomes) to the new population; the rest is done in the classical way. Elitism can rapidly increase the performance of a GA, because it prevents losing the best solution found. [7]

    2.4 Encoding Schemes:

2.4.1 Binary Encoding: Binary encoding is the most common, mainly because the first works on GAs used this type of encoding. In binary encoding, every chromosome is a string of bits, 0 or 1. Binary encoding gives many possible chromosomes even with a small number of alleles. On the other hand, this encoding is often not natural for many problems, and sometimes corrections must be made after crossover and/or mutation. [7]

Example problem: Knapsack problem

The problem: There are items with given values and sizes, and the knapsack has a given capacity. Select items to maximize the total value in the knapsack without exceeding its capacity.

Encoding: Each bit says whether the corresponding item is in the knapsack.

2.4.2 Permutation Encoding: In permutation encoding, every chromosome is a string of numbers that represents a position in a sequence. Permutation encoding is useful only for ordering problems, and even for these problems, corrections must be made after some types of crossover and mutation to keep the chromosome consistent (i.e., to keep it a real sequence). [7]

Example problem: Travelling salesman problem (TSP)

The problem: There are cities with given distances between them. A travelling salesman has to visit all of them, but he does not want to travel very far. Find a sequence of cities that minimizes the travelled distance.

Encoding: The chromosome gives the order in which the salesman will visit the cities.


2.4.3 Value Encoding: Direct value encoding can be used in problems where complicated values, such as real numbers, are needed. Using binary encoding for this type of problem would be very difficult. In value encoding, every chromosome is a string of values. The values can be anything connected to the problem, from integers, real numbers, or characters to more complicated objects. Value encoding is very good for some special problems. On the other hand, for this encoding it is often necessary to develop new crossover and mutation operators specific to the problem.

Example problem: Finding weights for a neural network

The problem: A neural network with a given architecture is provided. Find the weights for the inputs of the neurons so that the network is trained to produce the desired output.

Encoding: Real values in the chromosome represent the corresponding input weights. [7]

2.4.4 Tree Encoding: Tree encoding is used mainly for evolving programs or expressions, i.e., for genetic programming. In tree encoding, every chromosome is a tree of objects, such as functions or commands in a programming language.

Figure 2: Example of chromosomes with tree encoding, e.g. the expression tree for ( + x ( / 5 y ) ) and the program tree for ( do_until step wall )


Tree encoding is good for evolving programs. The programming language LISP is often used for this, because programs written in it are represented directly in this form and can easily be parsed as a tree, so crossover and mutation can be done relatively easily.

Example problem: Finding a function from given values

The problem: Some input and output values are given. The task is to find a function that gives outputs as close as possible to the desired ones for all inputs.

Encoding: Chromosomes are functions represented as trees.

2.5 Eigenspace Representation:

Eigenspace representations of images use PCA [9] to linearly project an image into a low-dimensional space. This space is spanned by the principal components (i.e., the eigenvectors corresponding to the largest eigenvalues) of the distribution of the training images. After an image has been projected into the eigenspace, a feature vector containing the coefficients of the projection is used to represent the image. We refer to these features as eigen features.

    The projection coefficients allow us to represent images as linear combinations of the

    eigenvectors. It is well known that the projection coefficients define a compact image

    representation and that a given image can be reconstructed from its projection coefficients and

    the eigenvectors (i.e., basis). The eigen space representation of images is very powerful and has

    been used in various applications such as image compression and face recognition.


2.5.1 Computation of the Eigenfaces:

Eigenfaces are a set of eigenvectors. They can be considered a set of "standardized face ingredients", derived from statistical analysis (PCA) of many pictures of faces. The computation proceeds as follows (a code sketch of these steps is given after the list):

Step 1: Obtain the face images I_1, I_2, ..., I_M (training faces).

Step 2: Represent every image I_i as a vector Γ_i.

Step 3: Compute the average face vector: Ψ = (1/M) Σ_{i=1..M} Γ_i.

Step 4: Subtract the mean face: Φ_i = Γ_i − Ψ.

Step 5: Compute the covariance matrix C = (1/M) Σ_{n=1..M} Φ_n Φ_n^T = A A^T (an N^2 x N^2 matrix), where A = [Φ_1 Φ_2 ... Φ_M] (an N^2 x M matrix).

Step 6: Compute the eigenvectors u_i of A A^T.

Step 7: Keep only the K eigenvectors corresponding to the K largest eigenvalues. [5]
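A compact NumPy sketch of Steps 1 to 7, assuming the training faces have already been flattened into the rows of a matrix. It uses the standard trick of diagonalizing the small M x M matrix A^T A instead of forming the huge N^2 x N^2 covariance matrix directly; the eigenvectors of A A^T are then recovered by multiplying with A.

```python
import numpy as np

def eigenfaces(images, k):
    """images: (M, N*N) array with one flattened training face per row.
    Returns the mean face and K eigenfaces, following Steps 1-7 above."""
    X = np.asarray(images, dtype=np.float64)
    mean_face = X.mean(axis=0)                   # Step 3: average face vector
    A = (X - mean_face).T                        # Step 4: mean-subtracted faces, shape (N*N, M)
    small = A.T @ A                              # M x M surrogate for C = A A^T
    eigvals, eigvecs = np.linalg.eigh(small)
    order = np.argsort(eigvals)[::-1][:k]        # Step 7: K largest eigenvalues
    U = A @ eigvecs[:, order]                    # Step 6: back to eigenvectors of A A^T
    U /= np.linalg.norm(U, axis=0)               # normalize each eigenface (column)
    return mean_face, U

# Eigen features of a new flattened face `face`: weights = U.T @ (face - mean_face)
```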


2.6 Independent Component Analysis:

Independent component analysis (ICA) [6] is a statistical model in which the observed data are expressed as a linear combination of underlying latent variables. The latent variables are assumed to be non-Gaussian and mutually independent, and the task is to find both the latent variables and the mixing process. The ICA model is given by formula (1):

x = As (1)

where x = (x_1, ..., x_m)^T is the observed random vector and s = (s_1, ..., s_m)^T is the vector of components, in which the s_i are as independent as possible in the sense of higher-order statistics, and A is a constant m x m mixing matrix. Both A and s are unknown. The above model is identifiable under the following fundamental restrictions: at most one of the independent components s_i may be Gaussian, and the matrix A must be of full column rank.

2.6.1 Face image representation based on ICA:

Suppose the normalized face images are of size w x h, and the i-th face image is expressed as a row vector x_i = [x_i(1,1), ..., x_i(1,h), x_i(2,1), ..., x_i(w,h)]. The m face images in the training set form the observed matrix X = (x_1, x_2, ..., x_m)^T.

According to the ICA principle, these face images can be linearly combined from n basis images S with statistically independent coefficients A. FastICA is performed to obtain the separating matrix W and to calculate the estimate of the basis images S through Y = WX. Each face image is then projected into a low-dimensional space spanned by the estimated basis images Y. After an image has been projected into this subspace, a feature vector containing the coefficients of the projection is used to represent the image.
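For illustration, the same kind of representation can be computed with scikit-learn's FastICA; in the sketch below the data matrix, image size, and number of components are placeholder assumptions standing in for the real training faces.

```python
import numpy as np
from sklearn.decomposition import FastICA

# X stands in for the training set: m normalized face images of size w x h,
# flattened one per row (random data here; sizes and component count are assumptions).
rng = np.random.default_rng(0)
X = rng.random((40, 32 * 32))

ica = FastICA(n_components=20, random_state=0)
train_features = ica.fit_transform(X)      # ICA coefficients for each training face
new_face = rng.random((1, 32 * 32))        # placeholder for a new normalized face
new_features = ica.transform(new_face)     # projection into the same ICA subspace
print(train_features.shape, new_features.shape)   # (40, 20) (1, 20)
```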


    CHAPTER 3: LITERATURE SURVEY

3.1 Overall Review of Previous Work:

Based on the type of features used, existing gender classification approaches fall into one of two categories: geometry-based and appearance-based.

Geometry-based methods use metric features, e.g., face width, face length, mouth size, eye size, and distances, angles, and areas among salient feature points (eyes, nose, etc.). In Burton et al. [1], 73 points were extracted from a database containing 179 frontal facial images. Discriminant analysis was then used to classify gender using point-to-point distances. The accuracy reported on the training data was 85%. Fellous et al. [2] computed 22 normalized distances using a database with 109 images; the accuracy reported in that work was 90%. Brunelli et al. [3] used 16 geometrical features as the input to two competing hyper-basis function networks. A database with 168 images was used for training, and the reported accuracy was 79% on novel faces.

Appearance-based methods learn the decision boundary between the male and female classes from training imagery without extracting any geometrical features. A representative method belonging to this category is the eigenface approach [4]. Cottrell et al. [5] proposed a face categorization method using a two-stage neural network, one for face compression and one for face classification; the output of the hidden layer of the compression network performs dimensionality reduction similar to the eigenface method. The accuracy reported was 63% on a database containing 64 images. Golomb et al. [6] used a similar method and referred to their gender classification network as SEXNET; using a database containing 90 images, they reported 91.9% accuracy. Yen et al. [7] followed the same scheme using a larger database (1400 face images).


They reported 90% accuracy. Abdi et al. [8] compared raw-image and PCA-based image representations using Radial Basis Function (RBF) and perceptron networks. Using 160 facial images, the best performance of 91.8% was achieved by a perceptron classifier trained with PCA-based features. O'Toole et al. [9], [10] have also reported good performance using PCA and neural networks. Using raw images, Moghaddam et al. [11] investigated gender classification using SVMs on a database with 1755 face images; they reported 96.6% accuracy using RBF kernels. Following [5], the proposed approach to gender recognition is explained step by step below, and the system flow is represented in Figure 5 (Gender Recognition Methodology Example).

3.2 Methodology:

The gender classification method consists of three main modules: face detection, feature extraction/selection, and classification. An input facial image is passed to the face detector to extract the face from the image; the Viola and Jones face detection method is used for this purpose. Then histogram equalization is performed to stretch the contrast of the image, which helps overcome illumination variation in the images. Earlier work showed that low-resolution images give a comparable level of classification accuracy, so we can decrease the computational cost by reducing the size of the image.

After face detection, the image is resized to 32x32. The resized image is divided into 16 blocks of size 8x8, and the coefficients of each 8x8 block are sorted according to the zigzag scan order. The sorted coefficients are arranged in a vector and passed to the KNN classifier. [2]

3.2.1 Face Detection:

Viola and Jones (2001) presented a cascade face detection technique. It is a well-known and robust frontal face detection method, and its computation is very fast. The detector extracts faces from the image by scanning from the top-left corner to the bottom-right corner of the image.

The technique has three main modules. First, images are represented in the form of integral images, which makes feature computation very fast. The second module uses the AdaBoost learning algorithm for feature selection. The third module uses a cascade of AdaBoost classifiers to quickly eliminate background regions of the image while spending more computation on promising object-like regions, which speeds up the detection process significantly. [2]
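For reference, here is a minimal sketch of this detection step using OpenCV's bundled Haar cascade for the Viola-Jones detector; the image file name is a placeholder, and the equalization and 32x32 resize mirror the pipeline described above rather than code from the thesis.

```python
import cv2

# OpenCV ships the trained Haar cascade used by the Viola-Jones detector.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
detector = cv2.CascadeClassifier(cascade_path)

image = cv2.imread("sample_face.jpg")               # placeholder file name
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.equalizeHist(gray)                       # histogram equalization, as in the pipeline

faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    face = cv2.resize(gray[y:y + h, x:x + w], (32, 32))   # crop and resize to 32x32
```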

3.2.2 Feature Extraction:

The DCT can be used for dimension reduction. The DCT coefficients are sorted according to the zigzag scan order; in this way we sort the coefficients in order of decreasing importance, i.e., high-variance coefficients are picked first. Like other transforms, the Discrete Cosine Transform (DCT) attempts to decorrelate the image data. After decorrelation, each transformed coefficient can be encoded independently without losing compression efficiency. [1]

The DCT coefficients with high variance are mainly located in the upper-left corner of the DCT matrix. Accordingly, we scan the DCT coefficient matrix in a zigzag manner starting from the upper-left corner and subsequently convert it to a one-dimensional (1-D) vector. This is similar to sorting by importance: high-importance coefficients are located in the top-left corner of the block. When a total of 16 coefficients are selected from an image, only the first coefficient of each of the 16 DCT blocks is selected. As the number of selected coefficients increases, so does the size of the feature vector. For a feature vector of size 32, the first 2 coefficients from each DCT block are selected, and in the same manner feature vectors of size 48, 64, 128, and 256 were created (a code sketch of this block-wise zigzag selection is given after Figure 3).

    Figure 3: Zigzag scan of DCT coefficients
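A minimal SciPy sketch of the block-wise DCT and zigzag coefficient selection described above; the random 32x32 array stands in for a real resized face image.

```python
import numpy as np
from scipy.fft import dctn

def zigzag_indices(n=8):
    """(row, col) pairs of an n x n block in zigzag order, starting at the upper-left corner."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))

def block_dct_features(face_32x32, coeffs_per_block=1):
    """Split a 32x32 face into 16 blocks of 8x8, apply the 2-D DCT to each block, and keep
    the first `coeffs_per_block` zigzag-ordered coefficients from every block."""
    keep = zigzag_indices(8)[:coeffs_per_block]
    features = []
    for r in range(0, 32, 8):
        for c in range(0, 32, 8):
            block = dctn(face_32x32[r:r + 8, c:c + 8], norm="ortho")
            features.extend(block[i, j] for i, j in keep)
    return np.array(features)

face = np.random.rand(32, 32)                      # placeholder for a resized face image
print(block_dct_features(face, coeffs_per_block=2).shape)   # 16 blocks x 2 coefficients = (32,)
```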


    3.2.3 Classifier:

KNN is a supervised learning classifier. For 1-NN we assign the test sample to the class of its closest neighbor; for KNN we assign the majority class of its K closest neighbors, where the parameter K is the number of neighbors. It is usual to use the Euclidean distance to find the closest neighbors, though other distance measures, such as the Manhattan distance, could in principle be used instead. [2]

    Figure 4: Example of k-NN Classification
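A minimal scikit-learn sketch of this classification step; the feature arrays and labels are random placeholders standing in for the real DCT feature vectors.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Placeholder feature vectors and labels standing in for the real extracted features.
rng = np.random.default_rng(0)
train_features = rng.random((360, 32))       # e.g. 32-coefficient DCT vectors
train_labels = rng.integers(0, 2, 360)       # 0 = female, 1 = male (assumed label coding)

knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(train_features, train_labels)

test_features = rng.random((120, 32))
predictions = knn.predict(test_features)     # majority vote among the 5 nearest neighbors
print(predictions[:10])
```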


    Figure 5: Gender Recognition Methodology Example


    CHAPTER 4: THE PROPOSED APPROACH

4.1 Introduction:

Although several gender classification methods have been reported in the literature, gender classification has attracted less attention than other research topics in computer vision. Almost every reported gender classification method uses the complete set of features extracted from frontal images, or uses the raw image directly. Frontal images obviously contain a lot of information, such as age, race, and gender. If the objective is to perform gender classification, then information from unrelated sources might confuse the classifier. Automatic feature subset selection distinguishes our proposed gender classification method from other reported approaches: a GA is used to select gender-related features automatically and improve the performance of the gender classifier.

Figure 6: Proposed Gender Recognition System Flow (frontal facial image → represent each image as a feature vector → select a subset of gender-relevant features using the Genetic Algorithm → gender classification with a classifier trained on the available data set)


    4.2 Feature Extraction:

Gender detection can be stated as feature extraction followed by classification based on those features. After the acquisition of the image, the feature extraction process starts: the image is first converted into a grey-level image, the centroid of the face image is calculated from it, and then, using the centroid, only the face is cropped and the features are collected. Gender-relevant features are:

1. Geometrical (nose width, lips width, vertical distance between eyes and nose, vertical distance between eyes and lips, etc.)

2. External (eyebrow thickness, hair information)

3. Textural (pixel details)

Some distinguishing feature properties for the two genders are summarized below:

Female: The eye region, including the eyebrows, and the lip region are better clues than the mouth and chin region.

Male: The lower portion of the face is a better clue than the eye region.

In total, nine features can be selected:

1. Geometrical features:

I. Vertical distance between eyes and nose tip

II. Nose width

III. Vertical distance between eyes and lips centre

IV. Lips width

2. External features:

I. Eyebrow thickness

II. Moustache detection

III. Long hair information

3. Textural features:

I. Eyebrow pixels

II. Lips pixels


    4.3 Genetic Feature Selection:

Most gender classification methods in the literature follow the same basic strategy: (a) feature extraction is applied to the raw images; (b) a classifier is trained using all the features extracted from the images. The problem with this strategy is that it uses all the extracted features for gender classification. As a result, gender-irrelevant information might be fed to the gender classifier. This might not allow the classifier to generalize well, especially when the training set is small.

GAs are a class of optimization procedures inspired by the mechanisms of natural selection. GAs operate iteratively on a population of structures, each of which represents a candidate solution to the problem, encoded as a string of symbols (chromosome). A randomly generated set of such strings forms the initial population from which the GA starts its search. Three basic genetic operators guide this search: selection, crossover, and mutation.

The goal of feature subset selection is to use fewer features to achieve the same or better performance. Therefore, the fitness evaluation contains two terms: (i) accuracy and (ii) the number of features used. Only the features in the subset encoded by an individual are used to train the NN classifier. The performance of the NN is estimated using a validation data set and used to guide the GA. Each feature subset contains a certain number of features; if two subsets achieve the same performance while containing different numbers of features, the subset with fewer features is preferred. Between accuracy and feature subset size, accuracy is our major concern. Combining these two terms, the fitness function is given as:

Fitness = 10^4 * Accuracy + 0.4 * Zeros

where Accuracy is the accuracy rate achieved by an individual and Zeros is the number of zeros in the chromosome. The accuracy ranges roughly from 0.5 to 1 (i.e., the first term assumes values in the interval 5,000 to 10,000). The number of zeros ranges from 0 to l, where l is the length of the chromosome (i.e., the second term assumes values in the interval 0 to 100, since l = 250 here).


Overall, the higher the accuracy, the higher the fitness. Also, the fewer the features used, the higher the number of zeros and, as a result, the higher the fitness. It should be noted that individuals with higher accuracy will outweigh individuals with lower accuracy, no matter how many features they contain.
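As a quick illustration, the fitness function above can be written directly; the 250-bit mask and the accuracy value in the example below are placeholders.

```python
def ga_fitness(chromosome, accuracy):
    """Fitness = 10^4 * Accuracy + 0.4 * Zeros, as defined above.
    `chromosome` is the 0/1 feature mask of an individual; `accuracy` is the validation
    accuracy (roughly 0.5-1.0) of the NN trained on the selected features only."""
    zeros = sum(1 for gene in chromosome if gene == 0)
    return 10_000 * accuracy + 0.4 * zeros

# Example with a 250-bit chromosome (values are illustrative):
mask = [1] * 100 + [0] * 150
print(ga_fitness(mask, accuracy=0.92))       # 9200 + 0.4 * 150 = 9260.0
```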

    4.3.1 Initial Population:

In general, the initial population is generated randomly (e.g., each bit in an individual is set by flipping a coin). In this way, however, we would end up with a population where each individual contains, on average, the same number of 1s and 0s. To explore subsets with different numbers of features, the number of 1s for each individual is generated randomly, and the 1s are then randomly scattered throughout the chromosome.

    4.3.2 Crossover:

In general, we do not know how the features depend on each other. If dependent features are far apart in the chromosome, it is more probable that traditional 1-point crossover will destroy the schemata. To avoid this problem, uniform crossover is used here (a brief sketch follows).
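A brief illustrative sketch of uniform crossover on two equal-length chromosomes (not the thesis's actual implementation):

```python
import random

def uniform_crossover(parent1, parent2):
    """Each gene position is taken from either parent with equal probability, so features
    that are far apart in the chromosome are not broken up systematically."""
    child1, child2 = [], []
    for g1, g2 in zip(parent1, parent2):
        if random.random() < 0.5:
            child1.append(g1)
            child2.append(g2)
        else:
            child1.append(g2)
            child2.append(g1)
    return child1, child2

print(uniform_crossover([1, 1, 0, 0, 1], [0, 1, 1, 0, 0]))
```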

    4.3.3 Mutation:

Mutation is a very low-probability operator that simply flips a specific bit; it plays the role of restoring lost genetic material. Our selection strategy is cross-generational: assuming a population of size N, the offspring double the size of the population, and we select the best N individuals from the combined parent-offspring population.

The extracted features of the training face images are fed into the genetic algorithm for gender classification.

    4.4 Gender Classification:

The first step in any classification technique is the representation of the faces in terms of an input vector. Once all the features are extracted, a neural network classifier is trained to classify the input vector as male or female.


A neural network consists of units (neurons), arranged in layers, which convert an input vector into some output. Each unit takes an input, applies a (often nonlinear) function to it, and then passes the output on to the next layer. Generally the networks are defined to be feed-forward: a unit feeds its output to all the units of the next layer, but there is no feedback to the previous layer. Weightings are applied to the signals passing from one unit to another, and it is these weightings which are tuned in the training phase to adapt the neural network to the particular problem at hand. This is the learning phase.

Once a neural network is configured, it forms appropriate internal feature extractors and classifiers based on the training examples. In the training phase, the network uses the training set to update the weights of its neurons in order to reduce the network error. After the training phase, the trained network is used for classification. The representation is internally distributed across the network as a series of independent weights, which has many advantages: noise immunity, pattern generalization, and interpolation capability.
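A minimal scikit-learn sketch of such a feed-forward classifier trained on GA-selected feature vectors; the arrays, label coding, and hidden-layer size are placeholder assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Placeholder data: GA-selected feature vectors and gender labels (assumed coding 0/1).
rng = np.random.default_rng(0)
selected_train = rng.random((360, 40))       # features kept by the GA mask
train_labels = rng.integers(0, 2, 360)       # 0 = female, 1 = male

net = MLPClassifier(hidden_layer_sizes=(20,), activation="logistic",
                    max_iter=1000, random_state=0)
net.fit(selected_train, train_labels)        # training phase: weights tuned to reduce error

selected_test = rng.random((120, 40))
print(net.predict(selected_test)[:10])       # trained network used for classification
```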


    4.5 Development of the Gender Recognition System:

MATLAB R2012b is used for the development of the system, including the steps explained in the sections above. An intuitive graphical user interface has been developed, as shown below:

Figure 8: GUI of the Gender Recognition System

The code has been trained and the trained parameters are saved in the file gabestopt.mat; these parameters are loaded when you click on the "Gender Recognition" button. If you want to re-train the system from scratch, just click on the "GA Optimization" button, which loads the gender_db_file.mat file (all facial images with their sex labels). The code is then trained from a random state (the recognition rate in this random state should be about 50%). You can train the system on only a subset of the original images; in this case you have to load the gender_db_file.mat file. The "Program info" and "Source code" buttons provide the respective topics from the documentation.


    CHAPTER 5: RESULT AND DISCUSSION

5.1 Introduction:

We have developed a system for facial gender recognition that is capable of extracting the most informative features from an image using an approach based on genetic algorithms. The code uses some features in the spatial domain and uses a genetic algorithm to optimize the feature vector extraction.

    5.2 Databases Used:

In order to train the classifier, we need a large database of facial images of both genders, and we subsequently test the classifier to assess its performance. To achieve this, I have used the Stanford Medical College facial database (200 male and 200 female faces) and the IIT Kanpur facial database (40 male and 40 female faces). Given below is a sample of the facial images from each database.

    Figure 9: Indian facial database sample


    Figure 10: Stanford student database sample

5.3 Experimental Results:

We have performed a number of experiments and comparisons in order to demonstrate the performance of the proposed gender classification approach. The code has been tested with the Stanford medical student database and the IIT Kanpur face database (240 female images and 240 male images; 75% were used for training and 25% for testing, hence there are 360 training images and 120 test images in total, randomly selected with no overlap between the training and test images). The results were positive for most of the cases but showed some misclassifications too, as explained below with examples.

5.3.1 Correct Classification Result:

Given below are examples of correct gender classification for a female face and a male face. Out of the 120 tested faces, 113 showed a positive result.


    Figure 11: Indian Female Face

    Figure 12: Female gender recognized


    Figure 13: Indian Male Face

    Figure 14: Male gender recognized


The code has also been tested with the Stanford medical student database alone, which includes 200 female images and 200 male images; 90% were used for training and 10% for testing, hence there are 360 training images and 40 test images in total, randomly selected with no overlap between the training and test images. Since these faces have, to some extent, different patterns in their facial features than the Indian faces, in the case of ethnic deviation we can train our system with a new data set using the "GA Optimization" button in our GUI. Below are two randomly selected faces from the database and their respective outputs as recognized gender.

    Figure 15: Male Face


    Figure 16: Male gender recognized

    Figure 17: Female Face


    Figure 18: Female gender recognized

    5.3.2 False Classification Result:

Given below is an example of false gender identification. In total, we observed seven cases of false classification.

    Figure 19: Experiment on Indian Female Face


Figure 20: False gender recognized

    5.4 Result Analysis:

The code has been tested with the Stanford student database and the Indian facial database (240 female images and 240 male images; 75% were used for training and 25% for testing, hence there are 360 training images and 120 test images in total, randomly selected with no overlap between the training and test images). The results showed a recognition rate of 94.16%.


    CHAPTER 6: CONCLUSION AND SCOPE OF THE FUTURE WORK

    6.1 Conclusion:

Automatic gender identification plays an important role in the identification of a person. Gender identification can effectively reduce the search time by limiting the subsequent searching stage to either a male database or a female database. Automatic gender identification can also provide an important clue in various security and surveillance applications.

A successful gender classification approach can boost the performance of many other applications, including face recognition and smart human-computer interfaces. Despite its importance, it has received relatively little attention in the literature. Thus, an automatic feature-selection-based gender classification scheme is explained in this thesis. We argue that feature selection is important for gender classification, and demonstrate that removing features that do not encode important gender information from the representation of the faces, using a genetic algorithm, reduces the error rate significantly.

    6.2 Limitations:

Despite the good performance of our gender recognition method, as explained in the previous section, some limitations do exist:

Gender recognition is based on 2-D images, which are sensitive to the viewpoint of the camera and to lighting conditions.

Intrinsic factors of genetics and ethnic deviations, such as national or cultural heritage, are not taken into account.

Extrinsic factors of the environment and behavioural choices (e.g., sun exposure, drugs, cigarettes) are not included in our system.


    6.3 Future Research Work:

3-D human body shapes obtained by laser scanning can be used for gender recognition to avoid the limitations caused by 2-D images. Different machine-learning algorithms and feature-extraction methods can be investigated and analyzed for this purpose.

The idea of multi-ethnic face feature estimation can also be incorporated to reduce the fault rate in face feature extraction, since patterns become confounded by intrinsic factors of genetics, gender differences, and ethnic deviations and, equally importantly, by extrinsic factors of the environment and behavioural choices (e.g., sun exposure, drugs, cigarettes).


    REFERENCES

1. Karl Ricanek, Jr., Yishi Wang and Susan J. Simmons: Generalized Multi-Ethnic Face Age Estimation. 2011 IEEE Computer Society Conference, Volume 1, 2011.

2. S. Ravi, S. Wilson: Face detection with facial features and gender classification based on Support Vector Machine. International Journal of Imaging Science and Engineering, India, 2010.

3. Zhen-Hua Wang, Zhi-Chun Mu: Gender classification using selected independent features based on genetic algorithm. Proceedings of the Eighth International Conference on Machine Learning, Baoding, 2009.

4. C. R. Vimal Chand: Face and Gender Recognition Using Genetic Algorithm and Hopfield Neural Network. Global Journal of Computer Science and Technology, 2003 IEEE Computer Society Conference, p. 511, Volume 1, 2003.

5. M. Nazir, Muhammad Ishtiaq, Anab Batool, M. Arfan Jaffar, Anwar M. Mirza: Feature Selection for Efficient Gender Classification. National University of Computer and Emerging Sciences, FAST, Islamabad, Pakistan.

6. M. Turk and A. Pentland: Eigenfaces for recognition. Journal of Cognitive Neuroscience, vol. 3, 2005.

7. Lindsay I. Smith: A Tutorial on Principal Component Analysis.

8. B. Moghaddam and M.-H. Yang: Learning gender with support faces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24(5): 707-711.

9. Melanie Mitchell: An Introduction to Genetic Algorithms.

10. B. Edelman et al.: Sex Classification of Face Areas: How Well Can a Linear Neural Network Predict Human Performance? Journal of Biological Systems, vol. 6, no. 3, 1998.

11. L. Eshelman: The CHC adaptive search algorithm: how to have safe search when engaging in non-traditional genetic recombination. Proceedings of the Foundation of Genetic Algorithms Workshop, pp. 265-283, 2007.