Genetic Algorithm Page 27
CHAPTER 2
GENETIC ALGORITHM
“Genetic algorithm (GA) is basically a method for solving constrained and unconstrained
optimization problems. GA is based on Darwin’s theory of natural evolution as set out in
The Origin of Species, and on the concept of ‘survival of the fittest’. Just as in nature
the fit species survive while the unfit are eliminated, so among a number of available
solutions only the fitter solutions survive, while the less fit solutions are discarded.
GA represents solutions in the form of chromosomes, and the fitness of each chromosome is
evaluated. The fitter solutions are selected for reproduction using the crossover operator,
while the mutation operator is used to maintain the diversity of the population. Fitter
chromosomes replace less fit chromosomes, and the process continues until an optimal
solution is found on the basis of some pre-specified criteria. This chapter provides an
overview of GA. GA works on a population of multiple points, in contrast to traditional
approaches which work on a single point. The various types of encoding, selection,
crossover, mutation and replacement mechanisms are discussed in this chapter. The major
advantage of GA is that it can be used in situations where numerical or mathematical models
fail. As it is an evolutionary algorithm, one can easily view the progress within each
iteration. GA can be used in a number of application areas such as optimization, design,
robotics, image processing, machine learning, automatic programming, etc. GA can also
be used in YM for purposes such as airline booking, the hotel industry, air traffic control,
choice-based network revenue models, etc. Overall, GA can prove to be a very effective tool
for YM.”
2.1 Genetic Algorithm
In the previous chapter, an overview of yield management and optimization was given. It was
observed that yield management is basically an optimization problem. The major conditions
for yield management, as discussed, are fixed capacity, perishable inventory and price
discrimination. Yield management problems are stochastic in nature, since when and how many
customers will arrive is not known but merely predicted. These conditions form an ideal
platform for applying a genetic algorithm to the yield management problem. An insight into
the genetic algorithm is now presented.
A genetic algorithm (GA) is a method for solving both constrained and unconstrained
optimization problems based on a natural selection process that mimics biological evolution.
The algorithm repeatedly modifies a population of individual solutions. At each step, the
genetic algorithm randomly selects individuals from the current population and uses them as
parents to produce the children for the next generation. Over successive generations, the
population evolves toward an optimal solution.
One can apply the genetic algorithm to solve problems that are not well suited for standard
optimization algorithms, including problems in which the objective function is discontinuous,
nondifferentiable, stochastic, or highly nonlinear.
The genetic algorithm is based on the theory of natural evolution that Charles Darwin set
out in The Origin of Species. According to this theory, “over several generations,
biological organisms evolve based on the principle of natural selection, ‘survival of the
fittest’, to reach certain remarkable tasks”.
In nature, individuals in a population compete with each other for the basic resources of
life such as food, shelter, etc. Within the same species, individuals also compete to
attract mates for reproduction. In this selection procedure, poorly performing individuals
have less chance to survive, while the most adapted or “fit” individuals produce a
relatively large number of offspring. It can also be observed that during reproduction, a
recombination of the good characteristics of each ancestor can produce “best fit” offspring
whose fitness is, in general, greater than that of either parent. After a few generations,
species evolve spontaneously to become more and more adapted to their environment and may
be sustained for a longer period of time.
In 1975, Holland developed this idea in his book “Adaptation in Natural and Artificial
Systems”. He described how to apply the principles of natural evolution to optimization
problems and built the first genetic algorithms. Holland’s theory has since been developed
by leaps and bounds, and Genetic Algorithms (GAs) now stand as a powerful tool
for solving search and optimization problems. Genetic algorithms are based on the basic
principle of genetics and evolution.
2.2 Historical Background
Holland’s influence in the development of the GA has been very important, but several other
scientists with different backgrounds were also involved in developing similar ideas. 1975
was a pivotal year in the development of genetic algorithms. It was in that year that Holland’s
book was published, but perhaps more relevantly for those interested in metaheuristics, that
year also saw the completion of a doctoral thesis by one of Holland’s graduate students, Ken
DeJong (1975). Other students of Holland’s had completed theses in this area before, but this
was the first to provide a thorough treatment of the GA’s capabilities in optimization.
Another graduate student of Holland’s, David Goldberg, first produced an award-winning
doctoral thesis on the application of GAs to gas pipeline optimization, and then, in 1989,
an influential book, Genetic Algorithms in Search, Optimization, and Machine Learning. This
was the final catalyst in setting off a sustained development of GA theory and applications
that is still growing rapidly.
Optimization had a fairly small place in Holland’s work on adaptive systems, yet the majority
of research on GAs tends to assume this is their purpose. Nevertheless, using GAs for
optimization is very popular, and frequently successful in real applications, and to those
interested in metaheuristics, it will undoubtedly be the viewpoint that is most useful.
When GA is used to solve optimization problems, good results are obtained quite quickly. A
heuristic is a part of an optimization algorithm that uses the information currently gathered by
the algorithm and acts as a carrier to decide which solution candidate should be tested next,
or how the next individual can be produced [Thomas (2007)]. Genetic algorithms are a guided
random search, and among evolutionary algorithms one of the most popular optimization
techniques for multi-objective optimization problems. Genetic algorithms have been found to
be capable of finding solutions for a wide variety of problems for which no acceptable
algorithmic solutions exist. GAs have been used for solving various NP-complete problems
[Vijay Lakshmi and Radha Krishnan (2007)]. GA attempts to arrive at optimal solutions
through a process similar to biological evolution. To use a genetic algorithm, it is required to
represent the solution of the problem as a genome (or chromosome). The genetic algorithm
then creates a population of solutions and applies genetic operators such as mutation and
crossover to evolve the solutions in order to find the best one. These operate on a population
of potential solutions, applying the principle of survival of the fittest to generate improved
estimations to a solution. At each generation, a new set of approximations is created by the
process of selecting individuals according to their level of fitness and breeding them together
using genetic operators inspired by natural genetics. This process leads to the evolution of
better populations than the previous populations [Eiben & Smith (2003), Michalewicz
(1996)]. The GA consists of an iterative process that evolves a working set of individuals
called a population toward an objective function, or fitness function [Goldberg (1989),
Whitley (1994)]. Genetic algorithms are typically implemented using computer simulations
in which an optimization problem is specified.
2.3 Natural Selection
The Origin of Species is based on the “preservation of favourable variations and rejection
of unfavourable variations”. Variation refers to the changes shown by the individuals of a
species and by offspring of the same parents. Far more individuals are born than can
survive, so there is a continuous struggle for life. Individuals with an advantage have a
greater chance of survival, i.e., the survival of the fittest. For example, giraffes with
long necks can feed from tall trees as well as from the ground, whereas goats and deer with
short necks can feed only from the ground. As a result, natural selection plays a major
role in this survival process [S.N. Sivanandam et al. (2008)]. On similar lines, in each
iteration of a genetic algorithm the favourable (fit) individuals survive while the
unfavourable (unfit) individuals die out. The process continues iteration after iteration
until a stage is reached at which stable or optimized solutions are obtained, which can be
termed adaptability.
The following Table 2.1 lists the corresponding terms used in natural evolution and in
genetic algorithms.
Table 2.1 Comparison of natural evolution and genetic algorithm terminology
Natural evolution Genetic algorithm
Chromosome String
Gene Feature or character
Allele Feature value
Locus String position
Genotype Structure or coded string
Phenotype Parameter set, a decoded structure
2.4 Basic Principle
The working principle of a standard GA is illustrated in algorithm 2.1. The major steps
involved are the generation of a population of solutions, finding the objective function and
fitness function and the application of genetic operators. These aspects are described with the
help of a basic genetic algorithm as below.
Algorithm 2.1 Basic Genetic Algorithm
1. [Start] Generate a random population of n chromosomes/individuals (suitable and possible
solutions for the problem).
2. [Fitness] Evaluate the fitness f(x) of each chromosome/individual x in the population.
3. [New population] Create a new population by repeating the following steps until the new
population is complete:
(a) [Selection] Select two parent chromosomes from the population according to their
fitness (the better the fitness, the bigger the chance of being selected).
(b) [Crossover] With a crossover probability, cross over the parents to form new offspring
(children). If no crossover is performed, the offspring are exact copies of the parents.
(c) [Mutation] With a mutation probability, mutate the new offspring at each locus
(position in the chromosome).
(d) [Accepting] Place the new offspring in the new population.
4. [Replace] Use the newly generated population for a further run of the algorithm.
5. [Test] If the end condition is satisfied, stop, and return the best solution in the
current population.
6. [Loop] Go to step 2 for fitness evaluation.
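The steps of Algorithm 2.1 can be sketched in code. The following is a minimal illustration only: the toy problem (maximizing f(x) = x² over 5-bit integers), the tournament selection scheme and all parameter values are assumptions chosen for the sketch, not taken from this thesis.

```python
import random

random.seed(42)

# Assumed toy problem: maximize f(x) = x^2 for x in [0, 31],
# each chromosome being a 5-bit string that encodes x.
N_BITS, POP_SIZE, GENERATIONS = 5, 20, 40
P_CROSSOVER, P_MUTATION = 0.8, 0.05

def fitness(chrom):
    x = int("".join(map(str, chrom)), 2)   # decode bit list -> integer
    return x * x

def select(pop):
    # Tournament selection: the fitter of two random individuals wins.
    a, b = random.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2):
    if random.random() < P_CROSSOVER:
        cut = random.randint(1, N_BITS - 1)        # single-point crossover
        return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
    return p1[:], p2[:]                            # exact copies of parents

def mutate(chrom):
    return [1 - g if random.random() < P_MUTATION else g for g in chrom]

# [Start] random initial population
pop = [[random.randint(0, 1) for _ in range(N_BITS)] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    new_pop = []
    while len(new_pop) < POP_SIZE:                 # [New population]
        c1, c2 = crossover(select(pop), select(pop))
        new_pop += [mutate(c1), mutate(c2)]
    pop = new_pop                                  # [Replace]

best = max(pop, key=fitness)
print(best, fitness(best))   # the population should drift toward x = 31
```

A fixed stopping criterion (a generation count) stands in for the [Test] step; any of the termination conditions discussed later in this chapter could be substituted.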
The basic principle behind GAs is that they create and maintain a population of individuals
represented by chromosomes. Chromosomes are essentially character strings analogous to
biological chromosomes in DNA. These chromosomes are typically encoded solutions to
a problem. The chromosomes then undergo a process of evolution according to rules of
selection, reproduction and mutation. Each individual in the environment (represented by a
chromosome) receives a measure of its fitness in the environment. Reproduction selects
individuals with high fitness values in the population, and through crossover and mutation of
such individuals, a new population is derived in which individuals may be even better fitted
to their environment. The process of crossover involves two chromosomes swapping chunks
of data and is analogous to the process of sexual reproduction. Mutation introduces slight
changes into a small proportion of the population and is representative of an evolutionary
step.
2.5 Difference between Traditional and Genetic Approach
An algorithm is a series of steps for solving a problem. A genetic algorithm is a problem
solving method that uses genetics as its model of problem solving. It’s a search technique to
find approximate solutions to optimization and search problems. One can easily differentiate
between a traditional algorithm and a genetic algorithm.
Table 2.2 Difference between Traditional and Genetic Approach
Traditional Algorithm:
• Generates a single point at each iteration; the sequence of points approaches an optimal
solution.
• Selects the next point in the sequence by a deterministic computation.
• Improvement in each iteration is problem specific.
Genetic Algorithm:
• Generates a population of points at each iteration; the best point in the population
approaches an optimal solution.
• Selects the next population by computations that use random number generators.
• Convergence in each iteration is problem independent.
The differences can also be shown with the help of the following Fig. 2.1.
Fig. 2.1 Comparison of traditional and genetic approaches
2.6 Exploitation and Exploration
Search is one of the more universal problem-solving methods for problems in which one
cannot determine a priori a sequence of steps leading to a solution. Search can be
performed with either blind strategies or heuristic strategies. Blind search strategies do
not use information about the problem domain, while heuristic search strategies use
additional information to guide the search along the most promising directions.
There are two important issues in search strategies: exploiting the best solution and exploring
the search space. Michalewicz(1996) gave a comparison on hill climbing search, random
search and genetic search. Hill climbing is an example of a strategy which exploits the best
solution for possible improvement, ignoring the exploration of the search space. Random
search is an example of a strategy which explores the search space, ignoring the exploitation
of the promising regions of the search space. GA is a class of general-purpose search
methods combining elements of directed and stochastic search, which can produce a
remarkable balance between exploitation and exploration of the search space. At the
beginning of a genetic search there is a widely random and diverse population, and the
crossover operator tends to perform a wide-spread search, exploring the whole solution
space. As high-fitness solutions develop, the crossover operator searches in the
neighbourhood of each of them. In other words, the kind of search (exploitation or
exploration) a crossover performs is determined by the environment of the genetic system
(the diversity of the population) and not by the operator itself.
2.7 Population-based Search
Generally, an algorithm for solving optimization problems is a sequence of computational
steps which asymptotically converges to an optimal solution. Most classical optimization
methods generate a deterministic sequence of computations based on the gradient or
higher-order derivatives of the objective function, applied to a single point in the
search space. The point is then gradually improved along the steepest descent direction
through iterations. This point-to-point approach carries the danger of falling into local
optima. GA, in contrast, performs a multi-directional search by maintaining a population of
potential solutions. This population-to-population approach helps the search escape from
local optima. The population undergoes a simulated evolution: at each generation the
relatively good solutions are reproduced, while the relatively bad solutions die. GA uses
probabilistic transition rules to select which solutions reproduce and which die, so as to
guide the search toward regions of the search space with likely improvement.
2.8 Building block hypothesis
Genetic algorithms are simple to implement, but their behaviour is difficult to understand. In
particular it is difficult to understand why these algorithms frequently succeed at generating
solutions of high fitness when applied to practical problems. The building block hypothesis
consists of:
(i) A description of a heuristic that performs adaptation by identifying and recombining
building blocks, i.e. low order, low defining-length schemata with above average fitness.
(ii) A hypothesis that a genetic algorithm performs adaptation by implicitly and efficiently
implementing this heuristic.
2.9 Implementation of Genetic Algorithm
GAs encode the decision variables of a search problem into finite-length strings over an
alphabet of certain cardinality. To evolve good solutions and to implement natural
selection, one needs
a measure for distinguishing good solutions from bad solutions. The measure could be an
objective function that is a mathematical model or a computer simulation. In essence, the
fitness measure must determine a candidate solution’s relative fitness, which will
subsequently be used by the GA to guide the evolution of good solutions.
Another important concept of GAs is the notion of population. The population size, which is
usually a user-specified parameter, is one of the important factors affecting the scalability and
performance of genetic algorithms. For example, small population sizes might lead to
premature convergence and yield substandard solutions. On the other hand, large population
sizes lead to unnecessary expenditure of valuable computational time [Sastry et.al. (2005)].
Once the problem is encoded in a chromosomal manner and a fitness measure for
discriminating good solutions from bad ones has been chosen, the GA can start evolving
solutions to the search problem using the steps already specified in Algorithm 2.1.
In the next subsections details of these steps will be discussed.
2.9.1 Initialization
Usually there are only two main components of most genetic algorithms that are problem
dependent: the problem encoding and the evaluation function. Consider a parameter
optimization problem where one must optimize a set of variables either to maximize some
target such as a profit, or to minimize cost or some measure of error. The goal is to set the
various parameters so as to optimize some output. In more traditional terms, some function
‘f’ should be minimized, or maximized.
The most common form of representing a solution as a chromosome is a string of binary
digits. Each bit in this string is a gene. The process of converting the solution from its
original form into the bit string is known as encoding. The specific encoding scheme used is
application dependent. The solution bit strings are decoded to enable their evaluation using a
fitness measure. Chromosomes are all of the same type and same length. The population size
remains constant from generation to generation.
2.9.2 Encoding
Encoding of chromosomes is the first question to ask when starting to solve a problem with
a genetic algorithm. Genetic algorithms work on the encoded space and the solution space
alternately: genetic operations work on the encoded space, i.e. chromosomes, while
evaluation and selection work on the solution space [Gen & Mitsuo (1996)]. A chromosome
should in some way contain information about the solution it represents. The most common
way of encoding is a binary string, in which each bit can represent some characteristic of
the solution; alternatively, the whole string can represent a number. Of course, there are
many other ways of encoding, and the choice depends mainly on the problem to be solved.
For example, one can encode integers or real numbers directly; sometimes it is useful to
encode permutations, and so on. Various types of encodings are available and can be
selected according to the nature of the problem.
Genetic algorithms follow two basic principles for choosing the encoding method namely:
1. The principle of meaningful building blocks: The schemata should be short, of low order,
and relatively unrelated to schemata over other fixed positions.
2. The principle of minimal alphabets: The alphabet of the encoding should be as small as
possible while still allowing a natural representation of solutions.
The first principle states that the user should select a coding such that the building blocks of
the underlying problem are small and relatively unrelated to building blocks at other
positions. The principle of meaningful building blocks is directly motivated by the schema
theorem. If schemata are highly fit, short and of low order, then their numbers exponentially
increase over the generations. If the high-quality schemata are long or of high order, they are
disrupted by crossover and mutation and they cannot be propagated properly. The second
principle states that the user should select the smallest alphabet that permits a natural
expression of the problem, so that the number of exploitable schemata is maximized
[Goldberg (1989)]. The principle of minimal alphabets tells us to increase the potential
number of schemata by reducing the cardinality of the alphabet: with a minimal alphabet the
number of possible schemata is maximal. This is why Goldberg advises the use of bit-string
representations, since high-quality schemata are more difficult to find when using
alphabets of higher cardinality.
2.9.2.1 Binary Encoding
The most common way of encoding is a binary string, which can be represented as in Fig.
2.2. Each chromosome is encoded as a binary (bit) string, and each bit can represent some
characteristic of the solution. Every bit string is therefore a solution, but not
necessarily the best solution. Alternatively, the whole string can represent a number. How
bit strings encode solutions differs from problem to problem.
Binary encoding gives many possible chromosomes with a small number of alleles. On the
other hand, this encoding is not natural for many problems, and sometimes corrections must
be made after a genetic operation is completed. Binary-coded strings of 1s and 0s are the
most widely used; the length of the string depends on the required accuracy. In this
encoding integers are represented exactly, while only a finite number of real numbers can
be represented; the number of representable real numbers increases with the string length.
Chromosome 1 1100011101110010
Chromosome 2 0110010101110011
Fig. 2.2 Binary Encoding
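The accuracy remark above can be made concrete with a small sketch. This is an assumed illustration (the [0, 10] range, the 16-bit length and the function names are invented): a real variable x in [lo, hi] maps to an n-bit string, so the resolution is (hi − lo)/(2ⁿ − 1).

```python
# Sketch of binary encoding of a real-valued decision variable.

def encode(x, lo, hi, n_bits):
    # Map x in [lo, hi] onto one of 2**n_bits discrete levels.
    level = round((x - lo) / (hi - lo) * (2**n_bits - 1))
    return format(level, f"0{n_bits}b")       # a bit string like those in Fig. 2.2

def decode(bits, lo, hi):
    # Inverse mapping: bit string back to a real value.
    level = int(bits, 2)
    return lo + level * (hi - lo) / (2**len(bits) - 1)

chrom = encode(3.7, 0.0, 10.0, 16)
x = decode(chrom, 0.0, 10.0)
print(chrom, round(x, 4))   # decoded value lies within one resolution step of 3.7
```

With 16 bits the resolution on [0, 10] is 10/65535 ≈ 0.00015; longer strings give finer accuracy, matching the statement that string length depends on the accuracy required.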
2.9.2.2 Octal Encoding
This encoding uses strings made up of octal numbers (0–7). The basic advantage of this
encoding scheme over binary encoding is its smaller size.
Chromosome 1 20346151
Chromosome 2 12670231
Fig. 2.3 Octal Encoding
2.9.2.3 Hexadecimal Encoding
This encoding uses strings made up of hexadecimal numbers (0–9, A–F). The advantage of
this encoding scheme over binary and octal encoding is again its smaller size.
Chromosome 1 A09B
Chromosome 2 932F
Fig. 2.4 Hexadecimal Encoding
2.9.2.4 Gray Encoding
An ordinary binary representation of the variable values may slow the convergence of a GA,
and increasing the number of bits in the representation magnifies the problem [Haupt and
Haupt (1998)]. Gray code can avoid this problem by redefining the binary numbers so that
consecutive numbers have a Hamming distance of one [Taub and Schilling (1986)]. Gray codes
speed up convergence by keeping the algorithm's attention on converging toward a solution
[Caruana and Schaffer (1988)]. As in binary strings, however, a change to a single bit in
an arbitrary location of a Gray-coded string may still cause a large change in the decoded
integer value, and decoding Gray-coded strings into the corresponding decision variables
introduces an artificial non-linearity in the relationship between the string and the
decoded value.
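The Hamming-distance-of-one property can be illustrated with the standard binary-reflected Gray code. This is a sketch using the textbook conversion formulas, not an operator taken from this thesis:

```python
# Binary-reflected Gray code: consecutive integers differ in exactly one bit.

def to_gray(n):
    return n ^ (n >> 1)

def from_gray(g):
    # Undo the XOR cascade to recover the ordinary binary value.
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

# Consecutive Gray codes always have Hamming distance 1:
for i in range(7):
    diff = to_gray(i) ^ to_gray(i + 1)
    assert bin(diff).count("1") == 1

print([format(to_gray(i), "03b") for i in range(8)])
# → ['000', '001', '011', '010', '110', '111', '101', '100']
```

Note that under ordinary binary coding the step from 3 (011) to 4 (100) flips three bits, whereas in the Gray sequence above every step flips exactly one, which is what keeps small moves in the integer space small in the string space.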
2.9.2.5 Permutation Encoding
In permutation encoding, every chromosome is a string of integer/real values which
represents a number in a sequence. Permutation encoding is useful only for ordering
problems. Even for these problems, for some types of crossover and mutation, corrections
must be made after the genetic operation is completed in order to keep the chromosome
consistent, i.e., a valid permutation.
Chromosome 1 1 4 2 7 3 9 8 6 5
Chromosome 2 2 6 1 9 3 7 4 5 8
Fig. 2.5 Permutation Encoding
Permutations are also important for scheduling applications, variants of which are often
NP-complete. This encoding is also called path representation or order representation
[Starkweather et al. (1991)].
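The consistency corrections mentioned above arise because a naive one-point crossover of two permutations generally duplicates some values and drops others. Ordering problems therefore use permutation-preserving operators; the order crossover (OX) below is one standard example, given here as an assumed illustration rather than the operator used later in this thesis:

```python
import random

random.seed(0)

# Order crossover (OX): every child is guaranteed to be a valid permutation.
def order_crossover(p1, p2):
    n = len(p1)
    a, b = sorted(random.sample(range(n), 2))
    child = [None] * n
    child[a:b] = p1[a:b]                      # copy one slice from parent 1
    fill = [g for g in p2 if g not in child]  # remaining genes, in parent 2's order
    for i in range(n):
        if child[i] is None:
            child[i] = fill.pop(0)
    return child

p1 = [1, 4, 2, 7, 3, 9, 8, 6, 5]              # the chromosomes of Fig. 2.5
p2 = [2, 6, 1, 9, 3, 7, 4, 5, 8]
child = order_crossover(p1, p2)
print(child)
assert sorted(child) == sorted(p1)            # still a valid permutation
```

A naive single-point cut of these two parents could, for example, produce a child containing the value 9 twice and missing 8 entirely; OX avoids the need for any after-the-fact repair.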
2.9.2.6 Value Encoding
In value encoding, every chromosome is a string of values, and the values can be anything
connected to the problem: numbers, real numbers, characters or even complicated objects.
Direct value encoding is used in problems where complicated values, such as real numbers,
are involved and where binary encoding would be very difficult. This encoding produces very
good results for some special problems; on the other hand, it is often necessary to develop
new crossover and mutation operators specific to the problem.
Chromosome 1 2.125 0.2398 8.0127 0.0932 3.2917
Chromosome 2 AFJBADCEHISTKJHTP
Chromosome 3 (back) (back) (right) (left) (forward)
Fig. 2.6 Value Encoding
2.9.2.7 Tree encoding
Tree encoding is used mainly for genetic programming. In the tree encoding every
chromosome is a tree of some objects, such as functions or commands in programming
language. The representation space is defined by defining the set of functions and terminals
to label the nodes in the trees. Trees provide rich representation that is sufficient to represent
computer programs, analytical functions, and variable length structure, even computer
hardware. The parse tree is a popular representation for evolving executable structures
[Back et al. (1997)]. It incorporates a natural recursive definition, which allows for
dynamically sized structures, though most parse tree representations place a restriction on
the size of evolving programs. In a parse tree representation, the contents of the parse
tree determine the power and suitability of the representation. Due to the acyclic nature
of parse trees, iterative computations are not naturally represented and it is very
difficult to identify the stopping criteria, so the evolved function is evaluated within an
implied loop that re-executes it until some predetermined stopping criterion is satisfied.
Tree encoding allows the search space to be open ended, but as a consequence a tree may
grow in an uncontrolled way. Large trees are difficult to understand and simplify, and they
hinder structured, hierarchical candidate solutions. If no restriction is placed on the
size of evolving programs, their growth can swamp the available computational resources.
Size restriction is implemented in two ways: depth limitation restricts the size of an
evolving parse tree based on a user-defined maximal depth parameter, while node limitation
places a limit on the total number of nodes available to an individual parse tree. Node
limitation is preferred over depth limitation because it encodes fewer restrictions on the
structural organization of evolving programs [Angeline (1996)].
Fig.2.7 Image courtesy: http://www.myreaders.info/09 Genetic Algorithms.pdf
2.9.3 Fitness Evaluation
A fitness function is a particular type of objective function that prescribes the
optimality of a solution in a genetic algorithm, so that a particular chromosome may be
ranked against all the other chromosomes. Optimal chromosomes, or at least the more nearly
optimal ones, are allowed to breed and mix their genetic material by any of several
techniques, producing a new generation that will hopefully be even better. An ideal fitness
function correlates closely with the algorithm's goal and can be computed quickly. Speed of
execution is very important, as a typical genetic algorithm [DeJong (2006)] must be
iterated many times to produce a usable result for a non-trivial problem. This is one of
the main drawbacks of GAs in real-world applications and limits their applicability in some
industries.
Sometimes approximate models may be one of the most promising approaches, especially in
the following cases:
• Fitness computation time of a single solution is extremely high,
• Precise model for fitness computation is missing,
• The fitness function is uncertain or noisy.
Another way of looking at fitness functions is in terms of a fitness landscape, which shows
the fitness for each possible chromosome. Definition of the fitness function is not
straightforward in many cases and often is performed iteratively if the fittest solutions
produced by GA are not what is desired. In some cases, it is very hard or impossible to come
up even with a guess of what the fitness function definition might be. Interactive genetic
algorithms [Kershenbaum (1996), Davis (1987)] address this difficulty to some extent.
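One simple mitigation of expensive fitness computation, complementary to the approximate models listed above, is to cache evaluations when identical chromosomes recur across generations. The sketch below is an assumed illustration; the x² fitness is a stand-in for a costly simulation or model:

```python
from functools import lru_cache

calls = 0   # counts how often the expensive evaluation actually runs

@lru_cache(maxsize=None)
def fitness(bits):
    # The chromosome is passed as a hashable string so results can be cached.
    global calls
    calls += 1
    x = int(bits, 2)
    return x * x        # stand-in for a costly simulation or model

for _ in range(3):
    fitness("11111")    # repeated chromosomes hit the cache, not the model

print(fitness("11111"), calls)   # → 961 1
```

Caching helps only when duplicates are frequent (small alphabets, converging populations); for genuinely unique chromosomes, surrogate models remain the more promising approach, as noted above.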
2.9.4 Genetic Operators
A Genetic Operator is an operator used in genetic algorithms to maintain genetic diversity.
Genetic variation is a necessity for the process of evolution. Genetic operators used in genetic
algorithms are analogous to those which occur in the natural world: survival of the fittest, or
selection; reproduction (crossover, also called recombination); and mutation. Genetic
diversity, the level of biodiversity, refers to the total number of genetic characteristics in the
genetic makeup of a species. It is distinguished from genetic variability, which describes the
tendency of genetic characteristics to vary.
When GA proceeds, both the search direction to optimal solution and the search speed should
be considered as important factors, in order to keep a balance between exploration and
exploitation in search space. In general, the exploitation of the accumulated information
resulting from GA search is done by the selection mechanism, while the exploration to new
regions of the search space is accounted for by genetic operators. The genetic operators
mimic the process of heredity of genes to create new offspring at each generation. The
operators are used to alter the genetic composition of individuals during reproduction. In
essence, the operators perform a random search and cannot guarantee an improved offspring.
There are three common genetic operators: selection, crossover and mutation.
2.9.4.1 Selection
Selection is the process of selecting two or more parents from the population for crossing.
After deciding on an encoding, the next step is to decide how to perform selection i.e., how to
choose individuals in the population that will create offspring for the next generation and how
many offspring each will create. The purpose of selection is to emphasize fitter individuals in
the population in hopes that their off springs have higher fitness. Chromosomes are selected
from the initial population to be parents for reproduction. The problem is how to select these
chromosomes. According to Darwin’s theory of evolution the best ones survive to create new
offspring.
Selection is a method that randomly picks chromosomes out of the population according to
their evaluation function. The higher the fitness function, the more chance an individual has
to be selected. The selection pressure is defined as the degree to which the better individuals are favoured: the higher the selection pressure, the more the better individuals are favoured.
This selection pressure drives the GA to improve the population fitness over the successive
generations.
The convergence rate of GA is largely determined by the magnitude of the selection pressure,
with higher selection pressures resulting in higher convergence rates. Genetic Algorithms
should be able to identify optimal or near-optimal solutions under a wide range of selection pressures. However, if the selection pressure is too low, the convergence rate will be slow, and the GA will take an unnecessarily long time to find the optimal solution. If the selection pressure is too high, there is an increased chance of the GA prematurely converging to an incorrect (sub-optimal) solution. In addition to providing selection pressure, selection schemes should also preserve population diversity, as this helps to avoid premature convergence [Sivanandam et al. (2008)].
Typically one can distinguish two types of selection scheme, proportionate selection and
ordinal-based selection. Proportionate-based selection picks out individuals based upon their
fitness values relative to the fitness of the other individuals in the population. Ordinal-based
selection schemes select individuals not upon their raw fitness, but upon their rank within the population. This makes the selection pressure independent of the fitness distribution of the population; it is based solely on the relative ordering (ranking) of the population.
It is also possible to use a scaling function to redistribute the fitness range of the population
in order to adapt the selection pressure. Selection has to be balanced with variation from crossover and mutation: too strong a selection means that sub-optimal but highly fit individuals will take over the population, reducing the diversity needed for change and progress, while too weak a selection will result in too slow an evolution. The selection methods generally used are:
Roulette Wheel Selection
Roulette selection is one of the traditional GA selection techniques. This reproduction
operator is the proportionate reproductive operator where a string is selected from the mating
pool with a probability proportional to the fitness. The principle of roulette selection is a
linear search through a roulette wheel with the slots in the wheel weighted in proportion to
the individual’s fitness values. A target value is set, which is a random proportion of the sum
of the fitnesses in the population. The population is stepped through until the target value is
reached. This is only a moderately strong selection technique, since fit individuals are not guaranteed to be selected, but merely have a greater chance. A fit individual will contribute more to the target value, but if the target is not yet exceeded, the next chromosome in line has a chance, and it may be weak. It is essential that the population not be sorted by fitness, since this would dramatically bias the selection. The roulette wheel selection mechanism is shown in the figure below:
Fig. 2.8 Roulette Wheel Selection Mechanism for four chromosomes
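The linear search described above can be sketched in Python. This is a minimal illustration, not a production implementation; the population and fitness values passed to it are whatever the caller supplies.

```python
import random

def roulette_wheel_select(population, fitnesses):
    """Pick one individual with probability proportional to its fitness.

    A random target value is drawn from [0, total fitness), and the
    population is stepped through until the running sum of fitnesses
    reaches the target, exactly as described in the text.
    """
    total = sum(fitnesses)
    target = random.uniform(0, total)
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if running >= target:
            return individual
    return population[-1]  # guard against floating-point round-off
```

Note that the individual with the largest fitness slice is merely more likely to be chosen; a weak individual can still win, which is what keeps this selection pressure moderate.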
Random Selection
This technique randomly selects a parent from the population. In terms of disruption of
genetic codes, random selection is a little more disruptive, on average, than roulette wheel
selection.
Rank Selection
Roulette selection has problems when fitness values differ greatly. For example, if the best chromosome's fitness occupies 90% of the roulette wheel, then the other chromosomes will have very little chance of being selected.
Rank selection first ranks the population, and then every chromosome receives a fitness value from this ranking. The worst will have fitness 1, the second worst fitness 2, and so on; the best will have fitness N (the number of chromosomes in the population).
Figures 2.9(a) and 2.9(b) show how the situation changes when raw fitness values are replaced by rank (order) numbers.
Fig. 2.9(a) Situation before Ranking (Graph of fitness)
Fig. 2.9(b) Situation after Ranking (Graph of order numbers)
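A sketch of this scheme in Python: each individual is assigned its rank (worst = 1, best = N) and the roulette-style draw is then made over ranks rather than raw fitnesses, so a single dominant fitness value cannot monopolise the wheel. This is illustrative only.

```python
import random

def rank_select(population, fitnesses):
    """Select one individual by rank rather than raw fitness.

    The worst individual receives rank 1 and the best rank N, as in
    the text; selection probability is proportional to rank.
    """
    n = len(population)
    # Sort indices from worst to best fitness; rank = position + 1.
    order = sorted(range(n), key=lambda i: fitnesses[i])
    ranks = [0] * n
    for position, idx in enumerate(order):
        ranks[idx] = position + 1          # worst -> 1, best -> n
    target = random.uniform(0, n * (n + 1) / 2)  # sum of ranks 1..n
    running = 0.0
    for individual, rank in zip(population, ranks):
        running += rank
        if running >= target:
            return individual
    return population[-1]
```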
Tournament Selection
An ideal selection strategy should be such that it is able to adjust its selective pressure and
population diversity so as to fine-tune GA search performance. Unlike roulette wheel selection, the tournament selection strategy provides selective pressure by holding a tournament competition among Nu randomly chosen individuals. The winner of the tournament is the individual with the highest fitness among the Nu competitors. Tournament winners are then inserted into the mating pool. The
tournament competition is repeated until the mating pool for generating new offspring is
filled. The mating pool, comprising the tournament winners, has a higher average population fitness. The fitness difference provides the selection pressure, which drives the GA to improve the fitness of succeeding generations. This method is efficient and leads to an optimal
solution.
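Tournament selection is short enough to sketch directly. In this illustrative Python version the tournament size plays the role of Nu in the text; a size of 2 (binary tournament) is a common default, though that choice is an assumption here.

```python
import random

def tournament_select(population, fitnesses, tournament_size=2):
    """Hold a tournament among `tournament_size` randomly chosen
    individuals and return the fittest competitor (the winner)."""
    competitors = random.sample(range(len(population)), tournament_size)
    winner = max(competitors, key=lambda i: fitnesses[i])
    return population[winner]
```

Raising the tournament size raises the selection pressure: with a tournament the size of the whole population, the best individual always wins.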
2.9.4.2 Crossover
Crossover depends upon the encoding scheme used for the problem. Crossover operates on
selected genes from parent chromosomes and creates new offspring. The simplest way of
performing crossover is to choose randomly some crossover point and copy everything before
this point from the first parent and then copy everything after the crossover point from the
other parent. There exist many other ways to perform crossover like n-point crossover,
uniform crossover, order crossover etc. Crossover can be quite complicated and depends
mainly on the encoding of chromosomes [Beasley, Bull & Martin (1993)].
The various types of crossover are discussed below:
One Point Crossover
The traditional genetic algorithm uses single point crossover, where the two mating
chromosomes are cut once at corresponding points and the sections after the cuts exchanged.
Here, a cross-site or crossover point is selected randomly along the length of the mated strings, and the bits next to the cross-site are exchanged. If an appropriate site is chosen, better children can be obtained by combining good parents; otherwise, string quality is severely hampered.
The following Fig. 2.10 illustrates single point crossover and it can be observed that the bits
next to the crossover point are exchanged to produce children. The crossover point can be
chosen randomly.
11001011 + 11011111 = 11001111
Fig. 2.10 One Point Crossover
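A minimal Python sketch of one-point crossover on bit strings; when the random cut falls after the fourth bit it reproduces the Fig. 2.10 example ('11001011' with '11011111' giving '11001111' as one child).

```python
import random

def one_point_crossover(parent1, parent2):
    """Cut both parent strings at one random point and swap the tails,
    producing two children."""
    assert len(parent1) == len(parent2)
    point = random.randint(1, len(parent1) - 1)  # cut strictly inside
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2
```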
Two Point Crossover
Apart from single point crossover, many different crossover algorithms have been devised,
often involving more than one cut point. It should be noted that adding further crossover
points reduces the performance of the GA. The problem with adding additional crossover
points is that building blocks are more likely to be disrupted.
However, an advantage of having more crossover points is that the problem space may be
searched more thoroughly. In two-point crossover, two crossover points are chosen and the
contents between these points are exchanged between two mated parents as shown in
fig.2.11.
The main problem with one-point crossover is that the head and the tail of one chromosome cannot be passed together to the offspring. If both the head and the tail of a chromosome contain good genetic information, none of the offspring obtained directly with
one-point crossover will share the two good features. Two-point crossover avoids this drawback and is therefore generally considered better than one-point crossover.
11001011 + 11011111 = 11011111
Fig. 2.11 Two Point Crossover
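The two-point variant can be sketched the same way: two cut points are drawn and only the middle segment is exchanged, so each child keeps one parent's head and tail together. This is an illustrative sketch.

```python
import random

def two_point_crossover(parent1, parent2):
    """Choose two cut points and exchange the segment between them,
    so the head and tail of each parent stay together in one child."""
    assert len(parent1) == len(parent2)
    a, b = sorted(random.sample(range(1, len(parent1)), 2))
    child1 = parent1[:a] + parent2[a:b] + parent1[b:]
    child2 = parent2[:a] + parent1[a:b] + parent2[b:]
    return child1, child2
```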
Multi-Point Crossover (N-Point crossover)
The problem found in one-point crossover may also occur in two-point crossover. In fact, this problem can be generalized to each gene position in a chromosome. Genes that are close together on a chromosome have a greater chance of being passed together to the offspring obtained through an N-point crossover. Consequently, the efficiency of an N-point crossover depends on the position of the genes within the chromosome. In a genetic representation, genes that encode dependent characteristics of the solution should therefore be placed close together. This situation in GA is known as the gene locus problem, which can be eliminated with the help of uniform crossover.
Uniform Crossover
Uniform crossover is quite different from the N-point crossover. Each gene in the offspring is
created by copying the corresponding gene from one or the other parent chosen according to
a randomly generated binary crossover mask of the same length as the chromosomes. Where there is a 1 in the crossover mask, the gene is copied from the first parent, and where there is a 0 in the mask, the gene is copied from the second parent. A new crossover mask is randomly generated for each pair of parents. Offspring therefore contain a mixture of genes from each parent. The number of effective crossing points is not fixed, but averages half the chromosome length. In Fig. 2.12, new children are produced using the uniform crossover
approach.
11001011 + 11011101 = 11011111
Fig. 2.12 Uniform Crossover
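The mask-based copying described above can be sketched as follows; the second child simply uses the mask with the roles of the parents reversed. This is a minimal illustration.

```python
import random

def uniform_crossover(parent1, parent2):
    """Build each child gene-by-gene from a random binary mask: a 1
    copies the gene from the first parent, a 0 from the second (and
    vice versa for the second child)."""
    assert len(parent1) == len(parent2)
    mask = [random.randint(0, 1) for _ in parent1]
    child1 = "".join(a if m else b
                     for a, b, m in zip(parent1, parent2, mask))
    child2 = "".join(b if m else a
                     for a, b, m in zip(parent1, parent2, mask))
    return child1, child2
```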
Arithmetic Crossover
Chromosomes having real value or floating point representation undergo arithmetic
crossover. This crossover creates a new allele at each gene position in the offspring, whose value lies between the values of the parent alleles. The new allele values for the offspring are computed using the following equations:
Offspring1 = w*Parent1 + (1-w)*Parent2
Offspring2 = (1-w)*Parent1 + w*Parent2
where w (0 ≤ w ≤ 1) is a constant weight factor.
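The two equations translate directly into code for real-valued chromosomes represented as lists of floats; the weight value used in the example below is arbitrary.

```python
def arithmetic_crossover(parent1, parent2, w=0.7):
    """Blend two real-valued chromosomes gene-by-gene using the
    equations above; each child allele lies between the parent
    alleles whenever 0 <= w <= 1."""
    child1 = [w * a + (1 - w) * b for a, b in zip(parent1, parent2)]
    child2 = [(1 - w) * a + w * b for a, b in zip(parent1, parent2)]
    return child1, child2
```

With w = 0.5 both children coincide at the midpoint of the parents, e.g. parents [1.0, 2.0] and [3.0, 6.0] both yield [2.0, 4.0].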
2.9.4.3 Mutation
Mutation is a background operator which produces spontaneous random changes in various
chromosomes. A simple way to achieve mutation would be to alter one or more genes. In
GA, mutation serves the crucial role of either (a) replacing the genes lost from the population
during the selection process so that they can be tried in a new context or (b) providing the
genes that were not present in the initial population. The mutation probability is defined as the percentage of the total number of genes in the population that are mutated. It
controls the probability with which new genes are introduced into the population for trial. If it
is too low, many genes that would have been useful are never tried out, while if it is too high,
there will be much random perturbation, the offspring will start losing their resemblance to
the parents, and the algorithm [Knuth (1997)] will lose the ability to learn from the history of
the search.
There are a number of techniques for mutation, some of which are discussed below:
Flipping
Flipping of a bit involves changing a 0 to 1 and a 1 to 0, based on a generated mutation chromosome. Fig. 2.13 illustrates the flipping concept.
11001001 => 10001001
Fig.2.13 Flipping Mutation
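A sketch of bit-flip mutation in Python: rather than a pre-generated mutation chromosome, this version flips each bit independently with the given mutation probability, which is the usual equivalent formulation.

```python
import random

def flip_mutation(chromosome, mutation_rate=0.01):
    """Flip each bit independently with probability `mutation_rate`
    (0 becomes 1 and 1 becomes 0)."""
    return "".join(
        ("1" if bit == "0" else "0")
        if random.random() < mutation_rate else bit
        for bit in chromosome
    )
```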
Interchanging
Two random positions of the string are chosen and the bits corresponding to those positions
are interchanged. This is shown in Fig. 2.14.
Parent 10100110
Child 11100010
Fig. 2.14 Interchanging Mutation
Reversing
A random position is chosen, the bits following that position are reversed, and a child chromosome is produced. This is shown in Fig. 2.15.
Parent 10100110
Child 10100011
Fig. 2.15 Reversing Mutation
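The interchanging and reversing operators of Figs. 2.14 and 2.15 can be sketched together; both permute existing bits, so neither changes the multiset of genes in the chromosome. Illustrative only.

```python
import random

def interchange_mutation(chromosome):
    """Swap the bits at two randomly chosen positions (Fig. 2.14)."""
    i, j = random.sample(range(len(chromosome)), 2)
    bits = list(chromosome)
    bits[i], bits[j] = bits[j], bits[i]
    return "".join(bits)

def reverse_mutation(chromosome):
    """Reverse the bits following a randomly chosen position
    (Fig. 2.15)."""
    point = random.randrange(len(chromosome))
    return chromosome[:point] + chromosome[point:][::-1]
```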
2.9.5 Replacement
This is the last stage of the breeding cycle and acts as a building block for the next iteration. When a new generation of offspring is produced, the major question is which of these newly generated offspring should move forward to the next generation, and which chromosomes of the current generation they should replace. The answer to this question again lies in Darwin's principle of "survival of the fittest" [Fogel (1995)]: better-fit individuals should have more chances to survive and be carried forward to the next generation, leaving behind the less fit ones. The process of forming the next generation of individuals by replacing or removing some offspring or parent individuals is carried out by a replacement scheme [Sivanandam et al. (2008)]. Basically, there are two kinds of replacement strategies for maintaining the population: generational replacement and steady-state replacement. In
generational replacement, the entire population of genomes is replaced at each generation. With elitism, the complete population is replaced except for the best member of each generation, which is carried over to the next generation without modification [Affenzeller, Winkler & Wagner (2009)]. In this case, generations are non-overlapping. Steady-state
replacement involves overlapping populations, in which only a small fraction of the population is replaced in each iteration. In steady-state replacement, new individuals are inserted into the population as soon as they are created [Sarma & De Jong (1997)]. The various replacement techniques are discussed below:
Random Replacement
The children replace two randomly chosen individuals in the population. The parents are also
candidates for selection. This can be useful for continuing the search in small populations,
since weak individuals can be introduced into the population.
Weak Parent Replacement
In weak parent replacement, a weaker parent is replaced by a strong child. Of the four individuals involved, only the fittest two, parent or child, return to the population. This process improves the overall fitness of the population.
Both Parents
Both-parents replacement is simple: the children replace the parents, so each individual only gets to breed once. As a result, the population and its genetic material move around, but this leads to a problem when combined with a selection technique that strongly favours fit parents: the fit breed and are then disposed of.
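As an illustration of one of the strategies above, generational replacement with elitism can be sketched as follows: the whole population is replaced by the offspring, except that the single best member of the old generation survives unchanged. The function signature is a sketch, not a standard API.

```python
def generational_replacement_with_elitism(population, offspring, fitness):
    """Replace the whole population with the offspring, except that
    the best member of the old generation is carried over unchanged.
    `fitness` is a callable mapping an individual to a number."""
    elite = max(population, key=fitness)
    # Keep the population size constant: drop one offspring slot
    # to make room for the elite individual.
    return offspring[:len(population) - 1] + [elite]
```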
2.9.6 Termination
This step terminates the algorithm. Although termination depends on the problem and the user, the following are some general criteria under which the algorithm can be terminated.
• Maximum generations–The genetic algorithm stops when the specified number of generations has evolved.
• Elapsed time–The genetic process will end when a specified time has elapsed.
• No change in fitness–The genetic process will end if there is no change to the population’s
best fitness for a specified number of generations.
• Stall generations–The algorithm stops if there is no improvement in the objective function
for a sequence of consecutive generations of length Stall generations.
• Stall time limit–The algorithm stops if there is no improvement in the objective function
during an interval of time in seconds equal to stall time limit.
The termination or convergence criterion finally brings the search to a halt.
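Two of the criteria above, maximum generations and stall generations, can be combined into a single stopping test. The parameter values below are placeholders; in practice they are chosen per problem.

```python
def should_terminate(generation, best_history,
                     max_generations=100, stall_generations=20):
    """Check two criteria: a maximum generation count, and a stall
    limit (no improvement in the best fitness for a run of
    consecutive generations). `best_history` holds the best fitness
    recorded at each generation so far."""
    if generation >= max_generations:
        return True
    if len(best_history) > stall_generations:
        recent = best_history[-stall_generations:]
        # No recent value beats the value just before the window.
        if max(recent) <= best_history[-stall_generations - 1]:
            return True
    return False
```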
2.9.7 Parameters
There are a number of parameters [Merek (1998)] that control the precise operation of the genetic algorithm. Some of the most important are as follows:
Crossover probability: This is a measure of how often crossover will be performed. If the crossover probability is 100%, then all offspring are made by crossover; if it is 0%, the whole new generation is made from exact copies of chromosomes from the old population. Crossover is performed in the hope that the new chromosomes will contain good parts of the old chromosomes and will therefore be better. Crossover rates should generally be high, about 80-95%, though in some problems a crossover rate of 60% is sufficient.
Mutation probability: This is a measure of how often parts of a chromosome will be mutated. If the mutation probability is 100%, the whole chromosome is changed; if it is 0%, nothing is
changed. Mutation generally prevents the genetic algorithm from falling into local extremes and helps in recovering lost genetic material. Mutation should not occur very often, because the genetic algorithm would then degenerate into a random search. The mutation rate should generally be very low, around 0.5-1%.
Population size: This is the number of chromosomes present in the population (representing one generation). If there are too few chromosomes, the genetic algorithm has few options available for crossover and only a small part of the search space is explored. Conversely, if there are too many chromosomes in one population, the genetic algorithm slows down. It is quite surprising that a higher population size does not always improve the performance of a GA. A population size of 20-30 is usually found to be good enough, though in some specialized problems population sizes of 50-100 are reported as best.
Other parameters: Encoding depends upon the problem type and the size of the problem instance. The selection method also depends upon the problem, although the generally used selection methods are roulette wheel, tournament and rank selection.
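The parameters above can be tied together in a minimal generational GA loop. The sketch below uses binary tournament selection, one-point crossover applied with the crossover rate, and per-bit mutation with the mutation rate; all default parameter values are illustrative, and the one-max fitness at the end is a toy example, not a YM objective.

```python
import random

def run_ga(fitness, chromosome_length=8, population_size=20,
           crossover_rate=0.9, mutation_rate=0.01, generations=50):
    """Minimal generational GA. `fitness` maps a bit string to a
    number to be maximised."""
    def mutate(chrom):
        # Flip each bit independently with probability mutation_rate.
        return "".join(
            ("1" if b == "0" else "0")
            if random.random() < mutation_rate else b
            for b in chrom)

    population = ["".join(random.choice("01")
                          for _ in range(chromosome_length))
                  for _ in range(population_size)]
    for _ in range(generations):
        offspring = []
        while len(offspring) < population_size:
            # Binary tournament selection for each parent.
            p1 = max(random.sample(population, 2), key=fitness)
            p2 = max(random.sample(population, 2), key=fitness)
            if random.random() < crossover_rate:
                point = random.randint(1, chromosome_length - 1)
                c1 = p1[:point] + p2[point:]
                c2 = p2[:point] + p1[point:]
            else:
                c1, c2 = p1, p2
            offspring += [mutate(c1), mutate(c2)]
        population = offspring[:population_size]
    return max(population, key=fitness)

# Toy usage: one-max, i.e. maximise the number of 1s in the string.
best = run_ga(lambda s: s.count("1"))
```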
2.10 Advantages of GA systems
There are a number of advantages of Genetic Algorithms, some of which are as follows:
The main advantage of the GA lies in its parallelism. Most search techniques start from one point and work with a single point in each iteration until a final solution is reached, so a problem of local maxima may exist in them. The GA, by contrast, starts with multiple points in the search space, and hence the problem of local maxima generally does not arise.
The GA is much easier to implement as compared to other techniques as it requires no
knowledge or gradient information about the response surface. The advantage of the
GA approach is the ease with which it can handle arbitrary kinds of constraints and
objectives; all such things can be handled as weighted components of the fitness
function, making it easy to adapt the GA scheduler to the particular requirements of a
very wide range of possible overall objectives.
GA can be used when no algorithms or heuristics are available for solving a problem.
A GA based system can be built as long as a solution representation and an evaluation
scheme can be worked out. Since it only requires the description of a good solution
and not how to achieve it, the need for expert access is minimized.
Optimization problems in which the constraints and objective functions are non-linear
and/or discontinuous are not amenable to solution by traditional methods such as
linear programming. The GA can solve such problems. It does not guarantee optimal solutions, but produces near-optimal solutions which are likely to be very good.
Solution time with GA is highly predictable – it is determined by the size of the
population, time taken to decode and evaluate a solution and the number of
generations of population.
GAs use simple operations, but are able to solve problems which are computationally prohibitive for traditional algorithmic and numerical techniques. One example is the travelling salesman problem (TSP).
2.11 Limitations of GA based systems
Although GAs have a number of advantages, they also have some limitations, some of which are described below:
One of the biggest problems in implementing a GA is the identification of the fitness function. As the optimal solution depends heavily on the fitness function, it must be determined accurately. There are no standard techniques available to define a fitness function; it is the sole responsibility of the user to define it.
Sometimes premature convergence can occur, and the diversity in the population is therefore lost; maintaining diversity is one of the major objectives of a GA.
Another problem relates to the choice of various parameters, such as the population size, mutation rate, crossover rate, and the selection method and its strength.
The termination criteria are also not standardized; to date, no single effective termination criterion has been identified.
GAs themselves are blind to the optimization process, as they only look at the fitness value of each chromosome, without knowing what that fitness value actually means. As a result, their capability to explain why a particular solution was arrived at is practically nil.
Although GA are moderately scalable – an increased number of variables can be
accommodated by increasing the length of the chromosome – a longer chromosome
also makes finding the solution more time-consuming. The longer the chromosome, the larger the population needs to be, since there are more potential combinations of genes. This results in more time required for decoding and fitness evaluation.
In general, GAs do not require extensive access to data. But some applications may need to access and process data from the organization's databases in order to
evaluate the fitness of solutions. For these applications, the quality and quantity of data are important.
2.12 Applications of Genetic Algorithm
Genetic algorithms have been used for difficult problems (such as NP-hard problems), for
machine learning, and for evolving simple programs. They have also been used for art, evolving pictures and music. A few applications of GA are as follows:
Business: Genetic Algorithms have been used to solve many different types
of business problems in functional areas such as finance, marketing, information
systems, and production/operations. Within these functional areas, GAs have performed a variety of applications such as tactical asset allocation, job scheduling, machine-part grouping, and computer network design.
Optimization: GAs have been used in a wide variety of optimization tasks, including
numerical optimization, and combinatorial optimization problems such as traveling
salesman problem (TSP), circuit design [Louis (1993)], job shop scheduling [Goldstein (1991)], video and sound quality optimization, telecommunication routing, the state assignment problem, timetabling, and traffic and shipment routing.
Automatic Programming: They are used to evolve computer programs for specific
tasks and to design other computational structures as in Cellular automata and sorting
networks.
Design: They are also used to optimize the structure and operational design of
buildings, factories, machines etc. They are used to design heat exchangers, robot
gripping arms, flywheels, turbines etc.
Robotics: A robot's design depends on the job it is intended to do. With the help of genetic algorithms, a range of optimal designs and components can be searched for each specific use, yielding entirely new types of robots.
Machine Learning: These algorithms are used for machine learning applications such as prediction and protein structure prediction. They are also used to design neural networks, and to evolve rules for learning classifier systems and symbolic production systems.
Evolvable Hardware: Genetic algorithms are used to develop computer models that use stochastic operators to evolve new configurations from old ones, so as to develop new electronic circuits that can be termed evolvable hardware.
Game Playing: Genetic algorithms are also applied in game theory, and so are widely used in developing computer games and simulated environments.
Encryption and code breaking: Genetic algorithms can be used both to create encryption for sensitive data and to break such codes.
Image processing: With medical X-rays or satellite images, there is often a need to
align two images of the same area, taken at different times. By comparing a random
sample of points on the two images, a GA can efficiently find a set of equations that transform one image to fit onto the other [Goldberg (1989)].
2.13 Applications of Genetic Algorithm in Yield Management
Yield management, as discussed in the previous chapter, is the problem of maximizing revenue by selling the right inventory unit to the right type of customer, at the right time and for the right price. The basic conditions for applying yield management are perishable inventory, price discrimination and fixed capacity. On the basis of these conditions, it was observed that YM is basically an optimization problem. In the present chapter, it has been observed that GA is a very effective approach for optimization.
So far, not much work applying GA to YM has been found in the literature. Still, some applications have been identified. These comprise a decision-making tool for YM [Pulugutha et al. (2003); Jeng (2011)], an air traffic control system [Xiao-Bing et al. (2007)], a choice-based network revenue model [Etebari et al. (2011)], crop yield management [Martin (2009)], airline booking [George et al. (2012)], pricing inventory [Ganji et al. (2013)], advertising time allocation [Reza Alaei et al. (2011)], and project planning [Karova et al. (2008)].
2.14 Summary
Genetic Algorithm is an algorithm based on Darwin's theory of "survival of the fittest". It tries to replicate this theory in various problems and has been found to be
quite successful. The algorithm is based on the various steps such as initialization, selection,
crossover, mutation and replacement. The biggest problem in implementing a GA is
identifying the fitness function. However, if the fitness function is accurately identified, the
GA can converge in a speedy manner. Another important aspect in using a GA is the use of
operators which are selection, crossover and mutation. The selection mechanism depends on
the problem, though it should be selected so that neither the convergence is premature nor it
is very slow. Crossover and mutation are very important aspects for maintaining the diversity
in the population. The probabilities of crossover and mutation should be chosen such that neither are the more fit solutions lost nor is diversity lost. The algorithm should terminate in a finite number of steps, depending on various criteria. The major advantage of the GA is its parallelism, i.e. it works on multiple solutions in the search space simultaneously, as compared with methods such as hill climbing which work from a single point.
The advantage of starting with multiple points is that the solution will not be trapped in local
maxima and the chances of finding the global maximum are very high. GAs can be used in a number of applications such as optimization, business, robotics, machine learning, networking, image processing, etc. GAs are very helpful when the developer does not have precise domain expertise, because they possess the ability to explore and learn from the problem domain. Predictions have been made that advances in mathematics, fuzzy logic, chaos and fractals will promote and enhance the work currently being undertaken with GAs. The future will bring forth new applications of genetic algorithms and new techniques which will allow GAs to be fully exploited.