
ISSN 0005-1179, Automation and Remote Control, 2007, Vol. 68, No. 5, pp. 811–821. © Pleiades Publishing, Ltd., 2007. Original Russian Text © L.N. Korolev, 2007, published in Avtomatika i Telemekhanika, 2007, No. 5, pp. 71–83.

TOPICAL ISSUE

On Evolutionary Algorithms, Neural-Network Computations,

and Genetic Programming. Mathematical Problems 1

L. N. Korolev

Moscow State University, Moscow, Russia

Received December 18, 2006

Abstract—Some problems related to evolutionary and genetic algorithms, genetic programming, and neural-network computations in solving applied problems that reduce to the analysis of functions prescribed at permutations are considered in outline. The natural parallelism of these algorithms and the possibility of realizing them on modern computers are noted.

PACS number: 02.60.Pn

DOI: 10.1134/S0005117907050086

1. INTRODUCTION

Since the 1980s, the field of scientific and applied investigation known as “Machine Learning” (ML), a part of computer science and information technology, has been developing. The aim of this paper is to clarify the assumptions, constraints, and mathematical problems that arise in this field, which rests mainly on heuristics and on hypotheses more or less adequate to the subject of study within whose limits problems are solved by ML methods. Evolutionary and genetic algorithms, genetic programming, neural-network computations (neurocomputing), and cellular automata belong to this field. The emergence of the notion “Machine Learning” is connected with A. Samuel, who published an article in 1963 [1] on some problems of machine learning using the example of the game of checkers.

2. ON NEURAL NETWORKS

From the formal point of view, a neural network can be represented as a particular case of a parameterized transformer: a vector of values is fed to its input, and another vector, possibly from another vector space, appears at the output. Such a transformation can be written in ordinary mathematical symbolism as a system of m functions of the form

$$
\begin{cases}
Y_1 = F_1(x_1, \dots, x_n, \omega_{11}, \dots, \omega_{k1}) \\
Y_2 = F_2(x_1, \dots, x_n, \omega_{12}, \dots, \omega_{k2}) \\
\quad\vdots \\
Y_m = F_m(x_1, \dots, x_n, \omega_{1m}, \dots, \omega_{km}),
\end{cases}
\tag{1}
$$

where ω_{ij} are some parameters. System of functions (1) can be visualized by the scheme in Fig. 1, which has much in common with a single-layer neural network.

1 This work was supported by the Russian Foundation for Basic Research, project nos. 06-01-00586 and 06-01-00046, by the program “Leading Scientific Schools,” project no. NSh-4774.2006.1, by the grant of the President of the Russian Federation, project no. MK-1777.2005.1, by INTAS, project no. 05-109-5267, and by the grant of the Lavrent’ev Competition for Youth Projects of the Siberian Branch of the Russian Academy of Sciences.


In contrast to the general transformer, in a neural network all functions F_i have identical form, e.g., the simplest one [2]:

$$
F_i = \begin{cases}
1, & \text{if } \sum\limits_{i=1}^{n} x_i \omega_i > \nu \\
0, & \text{if } \sum\limits_{i=1}^{n} x_i \omega_i \le \nu,
\end{cases}
$$

where ν is a constant known as a threshold. It is to be noted that the values of all functions in system (1) can be computed simultaneously. This implies that transformations of this type have an inner, natural parallelism.

Let us make the following remark. Figure 1 can be interpreted as a representation of a single-layer parallel-operating computer network. In practice, multilayer networks are used more often, since applying only one layer to compute the resultant values of the functions F_i loses the information on the network parameters connected with the other functions.
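As an illustration, here is a minimal sketch in Python of the single-layer threshold network just described; the function and variable names (threshold_layer, W, nu) and the sample data are illustrative assumptions, not notation from the paper.

```python
import numpy as np

def threshold_layer(x: np.ndarray, W: np.ndarray, nu: float) -> np.ndarray:
    """Compute Y_j = 1 if sum_i W[j, i] * x[i] > nu, else 0, for j = 1..m.
    One matrix-vector product evaluates all m outputs 'simultaneously'."""
    return (W @ x > nu).astype(int)

x = np.array([0.2, 0.7, 0.1])                      # input vector x_1, ..., x_n
W = np.random.default_rng(0).normal(size=(2, 3))   # parameters omega_ij
print(threshold_layer(x, W, nu=0.5))               # output vector Y_1, ..., Y_m
```

The row-wise independence of the matrix product is exactly the natural parallelism noted above.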

The first work on the formal analysis of mathematical models of neural-like networks appeared in 1943. It was “A Logical Calculus of the Ideas Immanent in Nervous Activity” by W.S. McCulloch and W.H. Pitts [3]. It was translated into Russian in 1956.

Fundamental analysis of mathematical models of neural networks was conducted by several famous experts in mathematical logic, including M. Minsky, S. Papert, and others.

Rosenblatt’s perceptron model [4], which appeared in 1962, was an important stage in the development of neurocomputing ideas. It was successfully used for solving some problems of pattern recognition.

Rosenblatt’s perceptron model was thoroughly investigated by Minsky and Papert [5], who proved the impossibility of applying the perceptron to some simple problems of recognizing geometric objects, in particular, the impossibility of recognizing certain such patterns at all.

Numerous improvements of neural network models were then proposed that lifted a series of the perceptron’s constraints and expanded the domain of solvable problems. However, the computer power of that time was insufficient to solve many real problems of recognition and classification.

Analyzing the first period of the development of neural network research, one can say that this field belonged to mathematics: it studied a class of mathematical models that map a physiological reality, including its terminology.

The complexity and uniqueness of these studies lay in the fact that neural networks are models of parallel processing, in which all nodes (neurons) operate simultaneously and simultaneously exchange information with each other.

Fig. 1. A scheme of system (1): the inputs x_1, x_2, …, x_n enter the blocks F_1, F_2, …, F_m, whose outputs are Y_1, Y_2, …, Y_m.


The distinctive feature of the mathematical models of neural networks used for computations is the principle of their “learning” and “self-learning,” i.e., of their being made ready for solving a certain class of problems.

The process of “learning” and “self-learning” consists in fitting the parameters ω_1, …, ω_k so that the “output” vector space possesses predefined properties.

In a number of problems, the set of output values Y_i must denote the number of the class to which the input vector is attributed; in other problems, this set is compressed information; in still others, the approximate value of some function, etc.

As was noted, the ideas of genetic programming appeared at the end of the 1950s [6]. Professor J. Koza played a crucial role in developing these ideas; in 1992, he published his first monograph devoted to many aspects of the new trend in programming automation [7, 8].

Before speaking about genetic programming, we should clarify the ideas connected with evolutionary algorithms in general and genetic algorithms in particular.

3. EVOLUTIONARY ALGORITHMS

Evolutionary algorithms can be related to the well-known “cut and try” method connected with the solution of enumeration problems that occur when searching for right solutions or extreme values in the absence of knowledge or a theory of the behavior of the object, process, or function under study.

The method of solving such ill-formalized and ill-defined problems is direct enumeration.

In continuous mathematics, the method of gradient descent is widely used for finding extremums of functions; in discrete mathematics, the branch-and-bound method is used, which allows removing a large number of variants that certainly do not lead to the right solution.

In both these cases of extremum search by direct enumeration, information on the characteristics of the function or functional whose extremum must be found is necessary. In solving practical problems, this information can be obtained in the course of the solution process.

Let there be a finite set of N unique elements. Uniqueness implies that each element has a name intrinsic only to it and possesses a certain set of attached characteristics. Each element of this set can have its sequence number. If there are N elements in the set, then sequence numbers can be assigned to them in N! different ways. Suppose that each numbering is associated with some value and that there exists an algorithm for computing this value. Let us call the function that depends on the way the set’s elements are numbered a function prescribed at permutations, or a function of permutations F(p).

As the names of the elements we can select the numbers from 1 to N and call this sequence the initial permutation p_1 = (1, 2, 3, …, N). As is known, any other permutation can be obtained by sequentially transposing two adjacent elements; accordingly, all permutations can be numbered from 1 to M = N!.

Many problems reduce to the search for extremums of functions of permutations. The classic example of a function prescribed at permutations is the traveling salesman problem. If we number all the cities that the traveling salesman must pass through and prescribe the distance between every pair of cities, then a permutation can be regarded as the order of passing through the cities, and the total traveled distance as the value of the function at the selected permutation. With a small number of cities, this problem can be solved by direct enumeration.
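For concreteness, here is a minimal sketch of the traveling salesman problem as a function of permutations, solved by direct enumeration over all N! orders; the coordinates and names are illustrative assumptions.

```python
import itertools
import math

cities = [(0, 0), (3, 0), (3, 4), (0, 4), (1, 2)]   # toy instance, N = 5

def tour_length(p: tuple) -> float:
    """F(p): length of the closed tour that visits the cities in order p."""
    return sum(math.dist(cities[p[i]], cities[p[(i + 1) % len(p)]])
               for i in range(len(p)))

# With a small number of cities, direct enumeration over all N! orders works.
best = min(itertools.permutations(range(len(cities))), key=tour_length)
print(best, tour_length(best))
```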

But if N is larger than, e.g., 300, then according to the Stirling formula

$$N! \approx \left(\frac{N}{e}\right)^N \sqrt{2\pi N}$$

we obtain 300! > 10^450. There is no computer that can cope with such a number of enumerations, since the number of elementary particles in the Universe does not exceed 10^200!
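The estimate is easy to verify numerically; a quick check with the standard library (illustrative):

```python
import math

# lgamma(N + 1) = ln(N!), so dividing by ln(10) gives log10(N!).
print(math.lgamma(301) / math.log(10))   # ≈ 614.5, hence 300! > 10^450
```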


Another example of a problem reduced to the search for the extremum of a function at permutations is the well-known knapsack problem.

In the simplest case, its formulation is as follows. There are N objects, each with its weight and cost. The knapsack can be loaded only up to a certain weight. We should choose the most valuable objects among all N and place them into the knapsack without exceeding the permissible weight. Here two values, weight and cost (value), are connected with each object.
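For concreteness, here is a minimal sketch of the knapsack problem as a function of permutations: the permutation is read as a loading order, and objects are packed greedily while they fit. The greedy rule, the data, and the names are illustrative assumptions, not the paper’s formulation.

```python
weights = [4, 3, 2, 5, 1]
costs = [10, 7, 4, 12, 2]
CAPACITY = 8

def knapsack_value(p: tuple) -> int:
    """F(p): the total cost collected when loading in the order given by p."""
    load, value = 0, 0
    for i in p:
        if load + weights[i] <= CAPACITY:   # skip objects that no longer fit
            load, value = load + weights[i], value + costs[i]
    return value

print(knapsack_value((3, 0, 1, 2, 4)))      # the value of one particular order
```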

We can try to solve both these problems by choosing permutations at random, one after another, in the hope of coming across a successful permutation that delivers the desired extremum of the function or comes close to it.

Common sense suggests that such problems must be solved by direct enumeration with removal of inadmissible variants of the sequence of passing through the cities or of loading the knapsack. It is necessary to find the laws that define the rules for constructing “good” permutations.

We should act as follows: choose several permutations at random, compute the corresponding values of the function for all of them, thoroughly analyze their composition, and elaborate a strategy for forming new permutations that bring the values of the function closer to the desired extremum.

If, e.g., we seek the maximum of the function, then for further analysis we should try to choose permutations that have something in common with those initial randomly chosen permutations whose function values are large. However, if we head only for “good” permutations, we can fall into a local maximum. Thus, we should continue testing random permutations, hoping not to get stuck at a local extremum.

Formally, this implies that to obtain new permutations on the basis of those selected earlier, we need an operation generating similar permutations and an operation that moves us away from “good” permutations. For some problems, these two operations can be formally determined as algebraic operations over permutations [9, 10].
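A minimal sketch of two such operations on list-based permutations (the helper names and the particular moves are illustrative assumptions): an adjacent transposition generates a similar permutation, while reshuffling a random segment jumps away from a “good” one.

```python
import random

def adjacent_transposition(p: list) -> list:
    """Generate a similar permutation: swap two neighboring elements."""
    q = p[:]
    i = random.randrange(len(q) - 1)
    q[i], q[i + 1] = q[i + 1], q[i]
    return q

def segment_shuffle(p: list) -> list:
    """Move away from a permutation: scramble one randomly chosen segment."""
    q = p[:]
    i, j = sorted(random.sample(range(len(q)), 2))
    segment = q[i:j + 1]
    random.shuffle(segment)
    q[i:j + 1] = segment
    return q

p = list(range(6))
print(adjacent_transposition(p), segment_shuffle(p))
```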

Nature “invented” its own method for finding right solutions, one that guarantees the improvement of individuals’ fitness for the habitat and does not demand a priori knowledge of the function or functional.

Darwin’s theory of evolution is a hypothesis that explains the mechanisms by which stable populations of organisms (species) better suited to habitation and reproduction emerge. These mechanisms are “interbreeding,” “mutation,” and “selection” (natural selection for the reproduction of new populations of species).

Modern genetics revealed the details of the mechanisms of interbreeding and mutation that occur at the molecular level in genes or the genome. An individual’s genome contains the information necessary for the appearance of a concrete individual and predetermines its adaptation to the habitat.

In connection with this, ideas appeared of applying the mechanisms of natural selection to problems of searching for the best solutions in domains where there is no a priori knowledge of the object whose extreme value must be found.

In mathematics, there are methods that allow finding local extremums of functions. The first step of numerical methods for finding an extremum is, as a rule, as follows: several points at which the values of the function are computed are “scattered” randomly over some segment of the function’s domain of definition. At the following steps, the points with the largest (smallest) values of the function are selected, and new “promising” points for further analysis are found around them by applying, e.g., the ideas of gradient descent. The choice of “promising” points can be called selection; the production of new points can be called the generation of new generations of points. Such numerical methods are iterative, and each iteration step consists in the selection of objects and the production of a new generation of points for further analysis. Such algorithms are called, by analogy with


Darwin’s theory, evolutionary. Nowadays, it is difficult to establish which author first introduced this non-mathematical notion into numerical methods.

There are many evolutionary algorithms that use mechanisms of the natural-selection type in different forms.

To state some problems that occur in attempting to simulate natural processes, let us consider a simple example: the problem of searching for the global extremum of some function of one variable.

Let nothing be known about the function except that we have an algorithm which allows computing its value at any arbitrarily chosen point. Bearing computer processing in mind, let us assume that the search for the global extremum is conducted for a function prescribed only on a finite set of discrete points and, hence, on a finite segment.

This removes the possibility of directly applying the well-developed mathematical methods of searching for the extremums of functions prescribed by their differential properties.

To find the global maximum of a function f(x) prescribed in this way, an algorithm called the genetic algorithm [11–13] is often used. It proceeds as follows:

• let us take at random n values x_i^{(1)} belonging to the domain of definition, compute (find) the values y_i^{(1)} = f(x_i^{(1)}) at all these points, and reason and act as follows:

• let us place the x_i^{(1)} in ascending order of the corresponding values y_i^{(1)};

• let us suppose that the values x_i^{(1)} corresponding to large values of f(x_i^{(1)}) are closer to the desired maximum of our function than the others and consider them “good” (it is to be noted that this is a hypothesis based only on intuition);

• let us form another sequence x_j^{(2)} of the same length n, not randomly but so that one part of the new x_j^{(2)} are close to the “good” x_i^{(1)}; the second part, to avoid falling into a local maximum, covers the rest of the domain; and the third part is rejected;

• having obtained the new sequence x_j^{(2)}, which is called a “population,” let us generate new populations in the same way.

We will continue this process of constructing new populations x_i^{(k)}, hoping that at some step we will find a point at which the function f(x) achieves its maximum.

What can prove that our hopes were justified and that the coordinate of the desired extremum was obtained at some iteration? For example, the fact that new iterations do not improve the results, or that the whole domain of definition of the function whose extremum we are seeking has been, from the expert’s point of view, quite “densely” investigated. These and many other criteria for terminating the iterations of a genetic algorithm obviously have nothing in common with formal, strict mathematical proofs of the validity of the obtained results.

Genetic algorithms use terminology borrowed from Darwin’s theory of evolution; it is used in mathematical studies that have nothing in common with biology.

In our example, the sequences x_j^{(k)} = (x_{j1}^{(k)}, x_{j2}^{(k)}, …, x_{jn}^{(k)}) are identified as populations; the x_i, as genomes that generate values of the function; and the function itself is identified as the fitness function or the quality function.

We spoke above about functions prescribed at permutations. In that sense, the genome is a permutation; a population is the set of permutations under study; the fitness function is the function of permutations. The techniques for obtaining the points of a new population on the basis of the analysis of the points (genomes) of the previous population are identified as the operations of interbreeding and mutation. The choice of “good” points and the rejection of “bad” ones can be called selection.

Referring to our simple example, it is necessary to note the following. If only good genomes of the previous population are chosen for obtaining a new population, there is a great probability


of “getting caught” near a point of local maximum and failing to find the population that contains the global maximum. To avoid this, nature “invented” the process of mutation, i.e., a random change of a genome element that results in the variability of a separate individual, for the worse or for the better. Following this mechanism of variability, genetic algorithms use the operation of mutation, which, as a rule, guarantees a way out of “getting caught” near “good” points. The more complicated operation of “interbreeding” can be interpreted as an attempt to form a new individual by mixing the properties of two other individuals (parents).

Having stated the idea of the genetic algorithm, we can now describe the mechanisms of the basic operations used in the computations.

The general scheme of operation of the genetic algorithm in our simple problem of searching for the maximum of a function can be represented as follows.

There is a “black box” whose mechanism is unknown. It receives some data at the input and generates some values at the output, which can be compared with each other by ranking them according to their magnitude or importance.

Characteristics of some objects, which we will identify as an individual’s genome, can be input. The genome must be encoded by some sequence of symbols. From the computational point of view, any code for a computer is a structured sequence of binary digits. It is structured in the sense that the whole binary sequence is divided into fields (subsequences) that carry some semantic load determined by the formulation of the problem.

The binary elements of such a sequence are often called genes; the semantic fields of the sequence, chromosomes. It should be noted that in the formal formulation of problems solved by genetic algorithms, these terms are not connected with their original biological meaning.

In the general case, we can suppose that the input of the “black box” receives data in binary code and that at the output the results are coded in binary form as well. By these results we can rank the genomes of the population generated at each step of the algorithm’s operation.

The operation of interbreeding formally consists in the following: the genomes of two “parents” are selected from the population, and the genomes of several “descendants” are formed from them according to rules reminiscent of cross-pollination: a descendant takes part of its chromosomes (blocks of genes) from one parent and the other part from the other parent. They are combined so that their total number, and sometimes their position, is the same as in the parents. In classic genetic algorithms, all individuals of all populations have the same total length of the binary genome.

The one-place operation of mutation consists in the fact that one binary digit (gene) of a randomly chosen representative of the population has its value reversed, thus producing a new representative. The word “randomly” denotes that this process is conducted with some predetermined probability.

The operation of selection consists in choosing, among the genomes obtained as a result of interbreedings and mutations, those that form the new population for the next iteration step of the genetic algorithm. In selection the probabilistic approach is also used, i.e., probabilities are given for choosing new genomes and for leaving old individuals in the population.

The probabilities above are parameters of the genetic algorithm, since the algorithm’s performance depends to a great extent on their choice. The input parameters of the genetic algorithm also include the size of the genome, the prescribed size of the population, and the number of iteration steps deemed necessary for obtaining a good-enough solution.
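To make the three operations concrete, here is a minimal sketch for binary genomes; the one-point crossover, the mutation probability, and the truncation selection are common textbook choices taken here as illustrative assumptions.

```python
import random

def interbreed(a: list, b: list) -> tuple:
    """One-point crossover: descendants exchange blocks of the parents' genes."""
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(g: list, p: float = 0.01) -> list:
    """Reverse each binary gene independently with probability p."""
    return [1 - bit if random.random() < p else bit for bit in g]

def select(pop: list, fitness, k: int) -> list:
    """Keep the k genomes with the best values of the fitness function."""
    return sorted(pop, key=fitness, reverse=True)[:k]

a, b = [0, 0, 1, 1, 0, 1], [1, 1, 0, 0, 1, 0]
c, d = interbreed(a, b)
print(mutate(c), mutate(d))
print(select([a, b, c, d], fitness=sum, k=2))   # toy fitness: the number of 1s
```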

In 1975, J. Holland [6] proved the theorem known as the schemata theorem. As a rough approximation, the schemata theorem asserts that the probability of approximating the extremum of the fitness function increases with the number of iteration steps of the genetic algorithm.


Fig. 2. An example of a genome scheme: 0 0 ∗ ∗ 1 0 1 1 ∗ 0.

The theorem proceeds from the fact that the genome is a binary sequence, every gene (binary digit) of which can change independently of the others as a result of the operations of interbreeding and mutation. The theorem asserts that the number of genome digits that do not change from generation to generation increases (in probability) as a result of selection, which leaves in the following generation mainly genomes with “good” values of the fitness function.

Let us explain the assertion of the theorem by the following example. Let the genome be a sequence of n binary digits. We consider a population of some generation consisting of M individuals that have the best values of the fitness function. To each digit of the genome we put in correspondence the repetition frequency of its value. For example, suppose that value 0 of the gene with a given number is repeated κ times in the population; value 1 of this gene is then repeated M − κ times. If κ ≈ M − κ, then we will suppose that such a gene is “random,” or non-informative, and denote its uncertain value by the symbol “∗”; if κ ≫ M − κ, then we will assume that the probability for the gene to attain a certain value is proportional to the frequency of its appearance in the population. This allows constructing a “scheme” of the genome that reflects the degree to which certain gene values are attained in the population. The scheme will look, e.g., as in Fig. 2.

In the population, the genes marked by ∗ change their values (from 0 to 1 and vice versa) from individual to individual in a random way; so we can suppose that the digits ∗ influence the value of the fitness function very little. The digits marked by 0 or 1 testify to the fact that the corresponding genes assume fixed values in the majority of “good” individuals.

Under some assumptions concerning the probabilities of mutations and interbreedings and the selection rules, the occurrence probability of uncertainties of the type ∗ gradually decreases from population to population, and the symbols ∗ gradually vanish from the schemes.
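A minimal sketch of constructing such a scheme from a population of binary genomes; the 0.8 dominance threshold and the names are illustrative assumptions.

```python
def scheme(population: list, threshold: float = 0.8) -> str:
    """Mark each gene position with its dominant value, or '*' when the
    frequencies of 0 and 1 are too balanced (kappa close to M - kappa)."""
    n, M = len(population[0]), len(population)
    out = []
    for j in range(n):
        ones = sum(g[j] for g in population)    # how often this gene equals 1
        if ones >= threshold * M:
            out.append("1")
        elif M - ones >= threshold * M:
            out.append("0")
        else:
            out.append("*")                     # "random", non-informative gene
    return " ".join(out)

pop = [[0, 0, 1, 1, 1], [0, 1, 1, 0, 1], [0, 0, 1, 1, 1], [0, 1, 1, 1, 1]]
print(scheme(pop))                              # e.g. "0 * 1 * 1"
```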

A genome with given values of all its digits can be interpreted as one of the corners of the n-dimensional cube. The set of corners that satisfy a scheme in which only several digits have fixed values forms an analog of a hyperplane in ordinary n-dimensional space.

If the genome under study is a binary sequence of length n, then the number of all possible binary sequences of this length is equal to 2^n; e.g., with n = 64 it is about 10^19. Direct enumeration to solve the problem of searching for the extremum of the fitness function would therefore require a huge number of computations. Practical experiments with the knapsack problem demonstrate that to achieve the stated aim we need about 200 iterations with a population size of 40 genomes in each population; thus, we have to compute only 200 × 40 ≈ 10^4 values of the fitness function. The acceleration produced by the genetic algorithm compared to direct enumeration is then of the order of 10^19/10^4 = 10^15. Work is accelerated by many orders of magnitude.

Under some artificial assumptions, the convergence of the genetic algorithm to the solution of the stated problem can be proved in a mathematically exact manner.

As was mentioned, in some real problems it is difficult to prove that the algorithm gave an exact (or close to exact) solution; this is one of the disadvantages of genetic computations.

4. ON GENETIC PROGRAMMING

There exists another formulation of the problem in the model with the “black box.” Let the reaction of the black box to the input data, which we will also identify as genomes, be known. We are to find the program that guides this apparatus, i.e., the program that produces the values of the function computed by it.


Fig. 3. The graph of data flow and operations for the expression y = ((a + b)c + (a − d)x) r, with leaf nodes a, b, c, d, x, r and operation nodes +, −, ∗.


The problem of genetic programming is reduced to constructing, by using the ideas of genetic algorithms, a program that transforms data similarly to the black box. It is a question of constructing a real computing program that can be assigned to the computer for actual execution, i.e., a program encoded in one of the formal algorithmic languages that the computer can understand.

The formal mechanism of genetic programming is as follows [14, 15]. The model of any program is represented in the form of a graph, or of a list in the sense of the algorithmic language LISP.

For simplicity, we will consider a program as a visually prescribed graph. Any computation of an arithmetic expression can be interpreted as a flow of computations determined by the corresponding graph. For example, let us consider a simple expression:

y = ((a + b)c + (a − d)x) r.

This expression can be represented by a graph of data flow and of operations over the data (Fig. 3). The sequence of computations expressed by this graph can be the following:

a ⇒ b ⇒ + ⇒ c ⇒ ∗ ⇒ a ⇒ d ⇒ − ⇒ x ⇒ ∗ ⇒ + ⇒ r ⇒ ∗.

Let us identify this sequence of symbols reflecting the computation graph as a genome, or a set of chromosomes, carrying the following semantic load: letter codes denote variables and constants; the codes of operation symbols, the set of arithmetic operations admissible on the computer; the code of the symbol ⇒ denotes an arrow in the program graph.

Complying with evident rules, let us now choose at random a sequence of symbols of these three types that forms a program able to compute. We substitute the initial values perceived by our black box into this randomly chosen program. Probably, the result of computations according to this randomly formed program will differ considerably from the expected value. Let us regard the error of the calculations as the value of the fitness function of the genetic algorithm that constructs the wanted program.

Thus, we have determined the type of the genome and the fitness function; to realize the genetic algorithm, it remains to reasonably determine the operations of interbreeding, mutation, and selection, the size of the population, and the criteria for stopping the iteration process of choosing the right program, i.e., the program that minimizes the error of reproducing the black box’s response to the input data.
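A minimal sketch of a program genome as the arrow-ordered (postfix) symbol sequence above, evaluated by a stack machine, with the calculation error as the fitness; the names and the error measure are illustrative assumptions.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def run(genome: list, env: dict) -> float:
    """Evaluate a postfix program genome over the variable bindings env."""
    stack = []
    for sym in genome:
        if sym in OPS:
            b, a = stack.pop(), stack.pop()
            stack.append(OPS[sym](a, b))
        else:
            stack.append(env[sym])
    return stack.pop()

genome = ["a", "b", "+", "c", "*", "a", "d", "-", "x", "*", "+", "r", "*"]
env = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 0.5, "x": 4.0, "r": 2.0}
expected = ((env["a"] + env["b"]) * env["c"]
            + (env["a"] - env["d"]) * env["x"]) * env["r"]
print(abs(run(genome, env) - expected))   # fitness: error vs the black box (0.0)
```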

Let us consider a more formal formulation of the genetic programming problem connected, as a simple example, with the design of a program for computing a function of one variable.


Fig. 4. Interbreeding of tree programs: two parents exchange branches to produce two descendants.

Suppose that, instead of the black box, we have a finite table of pairs of numbers (x_i, y_i), where x_i denotes an argument and y_i the value of the function at this argument. We will seek a program, represented by a tree graph, that computes from each value x_i a value z_i such that the z_i differ minimally in aggregate from the y_i, i.e., we will seek a program that minimizes the function

$$F(p) = \sum_{i=1}^{n} |y_i - z_i|,$$

where z_i = p(x_i). The argument of the function F is the program-genome p mentioned above. Let the size of the population of programs be equal to, e.g., 10, and let us construct at random 10 graph programs with a small number of nodes and of connections between them. For each graph program p_i, let us compute the value of the function F(p_i) and arrange the p_i according to the values of F(p_i).

Further, we will act according to the classic genetic algorithm for obtaining the populations of programs at the following steps. However, we will conduct the operation of interbreeding another way, namely: we choose two graphs from the population, separate some branches in each of them, and exchange their places, as, e.g., in Fig. 4.

We obtain two descendants from two parents, which will enter the next population if the selection algorithm so decides.

The operation of graph mutation consists in the fact that either one node is added to the tree, or one branch or one node is removed. While conducting the operations of mutation and interbreeding, the rule that regular syntax is preserved for the new graphs making up the new population is observed.
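As an illustration of the branch exchange of Fig. 4, here is a minimal sketch of interbreeding on tree programs encoded as nested tuples; the tuple encoding and the helper names are illustrative assumptions.

```python
import random

def subtrees(t, path=()):
    """Enumerate (path, subtree) pairs; children are indexed from 1."""
    yield path, t
    if isinstance(t, tuple):
        for i, child in enumerate(t[1:], start=1):
            yield from subtrees(child, path + (i,))

def replace_at(t, path, new):
    """Return a copy of tree t with the subtree at path replaced by new."""
    if not path:
        return new
    parts = list(t)
    parts[path[0]] = replace_at(parts[path[0]], path[1:], new)
    return tuple(parts)

def interbreed(p1, p2):
    """Swap a randomly chosen branch of each parent; syntax stays regular."""
    path1, sub1 = random.choice(list(subtrees(p1)))
    path2, sub2 = random.choice(list(subtrees(p2)))
    return replace_at(p1, path1, sub2), replace_at(p2, path2, sub1)

parent1 = ("+", ("*", "a", "b"), "c")
parent2 = ("-", "x", ("*", "d", "r"))
print(*interbreed(parent1, parent2), sep="\n")
```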

In contrast to the ordinary genetic algorithm, here the genomes can be of different lengths, with different numbers of nodes and of connections between them, which creates extra complexity in organizing the design of good programs by the algorithm.

Representation of programs in the form of trees and other graph structures is widespread in the practice of genetic programming (GP). However, the developed GP theory allows interpreting the concept of a program in different ways. Using GP, we can construct, instead of computer programs, electrical networks with prescribed characteristics, functions of Boolean algebra, and also any automatic devices that are a composition of elementary functional components united into one system.

Some automatic radio-engineering devices have been constructed, and even patented, on the basis of the ideas of GP [11].

In the general case, a program in GP should be understood as a plan of action for achieving the stated goal with the best parameters of the final result, i.e., a plan for achieving the extreme value of some functional.


In the formal formulation, the program is a structure consisting of nodes and connections between them. Semantically, a structure of tree type must admit the following interpretation: each node must be able to fulfill some operations, and the root node must deliver the solution of the stated problem.

For a node to be able to execute its assigned operation, the tasks of the subset of nodes directly connected with it must have been fulfilled. This concerns each node: it must carry out its work once the nodes of the next hierarchy level connected with it have fulfilled theirs. The leaves of the tree of nodes are terminal nodes; they have no connections with lower-level nodes, and the execution of work starts from them. In terms of computer programs, terminal nodes are the initial program data or procedures that do not require input parameters.

The basic elements of any genetic program are terminals and functions. When using GP for solving an optimization problem in a subject of study, we should choose an alphabet of terminal symbols and a set of used functions. In designing computer programs, as a rule, a standard set of arithmetic operations, elementary functions, and operators of branching and cycles is chosen. The terminal symbols here are identifiers of the initial data and of intermediate computation outputs; the latter are sometimes functions of memory access for writing and reading. The set of functions can include more complex elements, such as ready library programs used in this subject of study.

The application of computer simulation of real processes with the goal of obtaining practically useful results keeps growing.

Processes that occur in nature at different levels, starting with the biomolecular level and ending with the macroprocesses of living-matter development, give much food for creating new algorithms imitating natural processes. It is enough to cite the names of some of them: “the ant colony optimization algorithm,” which reproduces the logic of ants’ behavior in finding the shortest way to food, “the flocking algorithm,” which solves some navigation problems, etc.

However, in this field of studies and applications there are too many heuristics that, though adequate, lack exact mathematical proofs of the validity of their assumptions. The credibility and validity of the obtained solutions is demonstrated by experiment on databases accepted as world standards; the efficiency and accuracy of numerous heuristics of pattern recognition, classification, clustering, and prediction are tested on them.

In Russia, scientists in a number of institutes and universities develop the ideas of neural computations, genetic programming, and evolutionary computations. Over the last 3–4 years, more than 50 scientific articles in journals and collected works have been devoted to this issue.

Nevertheless, we have to admit that the situation leaves much to be desired. Perhaps the reason is that all heuristics of this type deal with discrete sequences and finite sets, for which there are no such important notions as infinity, continuity, limit, etc.; i.e., there are no foundations of the kind that underlie classical mathematics with its great success in solving numerous problems.

Computer calculations are finite and discrete. Apparently, it is high time to create and develop a classical computer mathematics that takes these peculiarities of computer engineering into consideration.

Of great interest here is the research conducted by the school of Academician Yu.I. Zhuravlev on mathematical problems of prediction and on the algebraic approach to the analysis of problems of this type [9, 10, 16].

5. CONCLUSIONS

The algorithms described above have an important peculiarity: by their construction, they are easy to parallelize. However, in most cases of neural-network computations this parallelization is “fine-grained,” i.e., the computational complexity at each node of the computational network is very small compared with the capabilities of the processing nodes of modern cluster installations


realized on multicore microprocessors with gigaflop speed. In most problems related to genetic algorithms and genetic programming, computation of the fitness function requires great computational resources of memory and speed, and parallelization in this case is justified and reasonable.

One of the problems in the use of evolutionary algorithms that require prelearning is that the iterative learning process parallelizes poorly, since it is sequential. The design of parallel learning algorithms is one more example of the use of heuristics whose reliability, as a rule, is not proved. The heart of parallel learning algorithms is that the learning sample is divided into subsets, and each node is offered its own subset of the total sample for independent learning (a sketch follows below). Obviously, the smaller the sample, the less precise the tuning of the neural network. Each node may teach the neural network in its own way, and the problem of adjusting the results of such “independent” learning does not have well-grounded solutions.
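A minimal sketch of this sample-splitting scheme, with a deliberately trivial stand-in “learner” and an averaging merge; the merge step is precisely the kind of heuristic whose reliability, as noted, is not proved.

```python
from concurrent.futures import ProcessPoolExecutor

def learn(subset):
    """Stand-in 'learning': fit a single parameter as the mean of the subset."""
    return sum(subset) / len(subset)

def parallel_learn(sample, nodes=4):
    chunks = [sample[i::nodes] for i in range(nodes)]   # each node gets a subset
    with ProcessPoolExecutor(max_workers=nodes) as pool:
        params = list(pool.map(learn, chunks))          # independent learning
    return sum(params) / len(params)                    # heuristic adjustment

if __name__ == "__main__":
    print(parallel_learn([float(i) for i in range(100)]))
```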

Nevertheless, modern multiprocessors with massive parallelism, which occupy the leading positions in the TOP500 list, give a chance to solve practically important problems on the basis of the principles of machine learning.

REFERENCES

1. Samuel, A.L., Some Studies in Machine Learning Using the Game of Checkers, in Computers and Thought, New York: McGraw-Hill, 1963.
2. Osovskii, S., Neironnye seti dlya obrabotki informatsii (Neural Networks for Data Processing), Moscow: Finansy i Statistika, 2002.
3. McCulloch, W.S. and Pitts, W.H., A Logical Calculus of the Ideas Immanent in Nervous Activity, Bull. Math. Biophysics, 1943, vol. 5, pp. 115–133.
4. Rosenblatt, F., Principles of Neurodynamics, New York: Spartan, 1962. Translated under the title Printsipy neirodinamiki, Moscow: Nauka, 1965.
5. Minsky, M. and Papert, S., Perceptrons, Cambridge: MIT Press, 1969. Translated under the title Perseptrony, Moscow: Nauka, 1971.
6. Holland, J., Adaptation in Natural and Artificial Systems, Cambridge: MIT Press, 1992.
7. Koza, J., Genetic Programming: On the Programming of Computers by Means of Natural Selection, Cambridge: MIT Press, 1992.
8. Koza, J., Genetic Programming II: Automatic Discovery of Reusable Programs, Cambridge: MIT Press, 1994.
9. Chekhovich, Yu.V., Application of the Algebraic Approach to Problems of Trends Separation, in Matematicheskie metody raspoznavaniya obrazov (MMRO-10). Trudy 10 Vserossiiskoi konferentsii Vychislitel'nogo tsentra Rossiiskoi akademii nauk (Mathematical Methods of Pattern Recognition. Proc. 10th All-Russian Conference of the Computing Centre of the Russian Academy of Sciences), Moscow: ALEV-V, 2001, pp. 315–316.
10. Rudakov, K.V. and Chekhovich, Yu.V., On Synthesis of Learning Algorithms of Trends Separation (the Algebraic Approach), Prikl. Mat. Inform., 2001, no. 8, pp. 97–113.
11. Srinivas, M. and Patnaik, L., Genetic Algorithms: A Survey, Computer, 1994, vol. 27, no. 6, pp. 28–43.
12. Ribeiro, J. et al., Genetic Algorithms: Programming Environments, Computer, 1994, vol. 27, no. 6, pp. 17–26.
13. Forrest, S., Genetic Algorithms: Principles of Natural Selection Applied to Computation, Science, 1993, vol. 261, pp. 872–878.
14. Banzhaf, W., Nordin, P., Keller, R.E., et al., Genetic Programming: An Introduction, Heidelberg: Springer, 1998.
15. Tou, J.T. and Gonzalez, R.C., Pattern Recognition Principles, Reading: Addison-Wesley, 1974. Translated under the title Printsipy raspoznavaniya obrazov, Moscow: Mir, 1978.
16. Zhuravlev, Yu.I., On Recognition Algorithms with Representative Sets (on Logical Algorithms), Zh. Vychisl. Mat. Mat. Fiz., 2002, vol. 42, pp. 1425–1435.

This paper was recommended for publication by A.I. Kibzun, a member of the Editorial Board.
