


    Turing-complete Data Structure for

    Genetic Programming

    Taro Yabuki

    Graduate School of Frontier Sciences

    The University of Tokyo

    [email protected]

    Hitoshi Iba

    Graduate School of Frontier Sciences

    The University of Tokyo

    [email protected]

Abstract: In generating a program automatically, if we do not know in advance whether the problem is solvable, then the representation of the program must be Turing-complete, i.e. the representation must be able to express any algorithm. However, the tree structure used by standard Genetic Programming is not Turing-complete. We propose a representation scheme which is a recurrent network consisting of trees. It makes Genetic Programming Turing-complete without introducing any new non-terminals. In addition, we empirically show how it succeeds in evolving language classifiers.

Keywords: Genetic Programming, language classifier, Turing-completeness

1 Introduction

Genetic Programming (GP) is a technique to generate programs or functions automatically [5]. GP is a type of evolutionary computing that aims to solve problems through the repetition of modification and selection of prospective solution candidates. Various representations for programs or functions have been used. The most popular GP (standard GP, SGP) uses a single parse tree (S-expression) as a program representation. S-expressions (e.g. (+ (/ (− y 2) 5) x)) are made by combining non-terminals (e.g. +, −, ×, and /) and terminals (e.g. x, y, and integers from 0 to 10).

We have proposed a substitution for the single S-expression: a recurrent network consisting of trees (RTN). In the following paragraphs, we will explain why a new representation is required for GP.

When using GP, we must set various configurations, for example: the representation of individuals, the components of the representation, the way to use the representation, and the evolutionary operators. The strategy to set these configurations depends on the objective tasks.

The objective tasks of GP can be classified as follows: (1) programs that can be easily written by humans, (2) programs that are simpler or more efficient than those written by humans, and (3) programs that solve unsolved problems.

0-7803-7952-7/03/$17.00 © 2003 IEEE.

If the task belongs to either the first or second class, then the configurations can be decided with reference to a previously known solution. However, if the task belongs to the third class, then it is not easy to set the configurations. Choosing a smaller non-terminal set and restricting the expressiveness of individuals may make the search easier. However, should the search fail, it will be impossible to find out whether the failure is attributable to GP or to the configurations. For example, suppose we try to generate a classifier for the language {ww | w ∈ {0, 1}*}. If we use a representation whose repertoire is the same as that of a pushdown automaton, then we will never succeed.

One conceivable approach is to start with simple settings and gradually introduce complex ones. One proposed method composes the S-expression of basic arithmetic functions in the early stage, and then introduces a loop or recursion as the search progresses [5]. A strategy like this is adoptable for a task belonging to the first or second classes mentioned above, but not for the unsolved problems, because it is not clear how a loop or recursion affects the expressiveness of individuals. For unsolved problems, a strategy in which we can confirm the increase of expressiveness is desirable.

2 Expressiveness of the SGP

According to the theory of computability, the ability of a computer can be classified, from smallest to largest, as table, finite state automaton (FSA), pushdown automaton (PDA), or Turing machine (TM) [9].

The expressiveness of a GP individual depends on both its components and the way the individual is used. If we compose an S-expression of basic arithmetic functions and use it as a program, then the repertoire of such a program is limited. For example, an S-expression composed of the four arithmetic functions, variables, and constants cannot even cover the repertoire of a table.

It is obvious that if we can repeat the evaluation of an S-expression with a finite memory and non-terminals to access it, then the expressiveness of the GP individual is equivalent to that of an FSA. Similarly, if there are non-terminals to access an infinite stack, then the expressiveness is equivalent to that of a PDA. If there are non-terminals to access an infinitely indexed memory and we can repeat the evaluation until the data stored in the memory meets a halting condition, then the expressiveness is equivalent to that of a TM, i.e. Turing-complete [10].

In short, the expressiveness of a GP individual can be extended. However, it is not easy to shift naturally from the PDA phase (the non-terminal set includes PUSH and POP) to the TM phase (the non-terminal set includes READ and WRITE). On the other hand, the RTN described in the next section can shift from table to TM naturally.

    3 Recurrent tree network

An outline of RTN is given as follows. An example of RTN is shown in Fig. 1 (P in Fig. 1 is a function which returns the remainder of its argument divided by 2). Each node has both a value and a pure function expressed by an S-expression (in this paper, S-expressions are written as normal expressions for the sake of readability). The value is bound to the output of the function. The function has at most four parameters (i.e. a, b, c, and d). Those parameters are bound to the values of nodes. The RTN of Fig. 1 can be rewritten as {v1′ = (v1 − P(v1))/2, v2′ = P(v1)·v2}.

The programs are executed according to discrete time steps. Define the function and the value of the n-th node at time t as fn and v(n, t), respectively, the number of the parameters as k, and the index of the i-th linked node as ln,i. The value at t + 1 will be v(n, t + 1) = fn(v(ln,1, t), …, v(ln,k, t)).

Suppose the value of the first node (#1) is bound to the input data and the value of the second node is 1 at t = 0. For example, when the input data is the binary digit 1011, the transition of the RTN is given in Table 1. When the value of #1 becomes 0, the value of #2 is 0 if and only if the inputted binary digit contains a 0.

    Figure 1: Example of RTN (left) and the relation be-tween variables and link (right).

    Table 1: Transition of RTN of Fig. 1

time step   0      1     2    3   4
#1 value    1011   101   10   1   0
#2 value    1      1     1    0   0
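As a sketch (helper names are ours, not from the paper), this two-node RTN can be simulated directly from the recurrence v1′ = (v1 − P(v1))/2, v2′ = P(v1)·v2, reproducing the transition of Table 1:

```python
def P(x):
    """Remainder of x divided by 2, as in Fig. 1."""
    return x % 2

def run_rtn(bits):
    # The value of node #1 is bound to the input, read as a binary numeral;
    # node #2 starts at 1. Both nodes update synchronously each time step.
    v1, v2 = int(bits, 2), 1
    while v1 != 0:              # halting condition: value of #1 is 0
        v1, v2 = (v1 - P(v1)) // 2, P(v1) * v2
    return v2                   # 0 iff the input contained a 0

print(run_rtn("1011"))  # -> 0
print(run_rtn("1111"))  # -> 1
```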

    3.1 Description of an RTN

RTN is defined by the following six factors: (1) the list of S-expressions, (2) the network topology vector, (3) the initial state, (4) the way to input data, (5) the halting condition, and (6) the way to output a result.

In the case of the RTN shown in Fig. 1, the list of S-expressions is given as {(c − P(c))/2, P(a)·d}. The network is described as follows. The link of #1 can be expressed by {∗, ∗, 1, ∗}, because the third parameter, i.e. c, is bound to the value of #1. The behavior of this RTN does not change even if any integer enters a position with the character ∗, because there are no other parameters to be bound except for c. Similarly, the link of #2 is {1, ∗, ∗, 2}. The network topology vector is the concatenation of these links, i.e. {∗, ∗, 1, ∗, 1, ∗, ∗, 2}. Practically, integers between 1 and the network size are put in the positions of ∗, and the network topology vector becomes {1, 2, 1, 2, 1, 1, 1, 2}.

Additionally, the initial value of all nodes is one. The value of #1 is bound to the input data. The halting condition is that the value of #1 is 0. The result is the value of #2 at the time when the RTN halts. Consequently, the RTN shown in Fig. 1 is defined completely.
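Putting the six factors together, a minimal interpreter for this particular RTN might look as follows (a sketch under our own naming; the paper gives no code). It decodes the topology vector into parameter bindings, four links per node:

```python
P = lambda x: x % 2   # remainder of division by 2

# Factor (1): list of S-expressions, one pure function per node
exprs = [lambda a, b, c, d: (c - P(c)) // 2,   # node #1
         lambda a, b, c, d: P(a) * d]          # node #2

# Factor (2): network topology vector, four links per node
topology = [1, 2, 1, 2, 1, 1, 1, 2]

def run(input_value):
    # Factors (3)-(6): initial state, input binding, halting, output
    values = [input_value, 1]            # #1 bound to the input, #2 starts at 1
    while values[0] != 0:                # halt when the value of #1 is 0
        values = [f(*(values[topology[4 * i + j] - 1] for j in range(4)))
                  for i, f in enumerate(exprs)]
    return values[1]                     # result: value of #2 at halting

print(run(0b1011))  # -> 0
```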

    3.2 Evolutionary operators

As mentioned above, the data structure of the RTN of Fig. 1 is {{(c − P(c))/2, P(a)·d}, {1, 2, 1, 2, 1, 1, 1, 2}}. Various evolutionary operators for this data structure can be introduced: (1) replacement of an S-expression by a randomly generated S-expression, (2) exchange of S-expressions between two RTNs, (3) mutation on the network topology vector, (4) exchange of links of nodes between two RTNs, (5) crossover of two network topology vectors1, and (6) increment, duplication, and decrement of nodes.
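Operators (3) and (5) act only on the topology vector. Under assumed details (uniform relinking, one-point crossover; the paper does not fix these), they might be sketched as:

```python
import random

def mutate_topology(vec, n_nodes):
    # Operator (3): relink one randomly chosen parameter to a random node
    out = list(vec)
    out[random.randrange(len(out))] = random.randint(1, n_nodes)
    return out

def crossover_topology(a, b):
    # Operator (5): one-point crossover, as in an ordinary GA; the two
    # parents must have the same node number (see footnote 1)
    assert len(a) == len(b)
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]
```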

3.3 Differences between RTN and SGP

RTN is a natural extension of the SGP. SGP uses a single S-expression as an individual representation. On the other hand, RTN uses plural S-expressions. Special non-terminals are not needed, but the four arithmetic functions and a function P are needed to make RTN

1 Crossover of two network topology vectors is the same as the crossover used in the ordinary Genetic Algorithm. However, the node numbers of the parents, i.e. the sizes of their network topology vectors, must be the same. The number of nodes is changed by the node increment or decrement operators. Thus, these operators must be used to keep the node number of all RTNs in the population the same.

Turing-complete, as discussed in the next section. The number of variables is at least four. Variables of the SGP are bound to the input data. On the other hand, in RTN the values of nodes are bound to the input data.

4 Turing-completeness of RTN

RTN can simulate an arbitrary TM if there are four nodes and the function of each node is composed of arithmetic functions, a function P, four variables, and constants. In short, RTN can represent any algorithm. The proof is straightforward. Firstly, the TM is arithmetized. Secondly, an RTN to simulate the arithmetized TM is constructed.

    4.1 Arithmetization of a TM

The tape of an arithmetized Turing machine is not a string of characters but an integer. Arithmetization makes it possible to express the movements of a TM in the form of simple arithmetic, rather than symbol operations [6].

Suppose that the TM is in the following state, and the head is moving rightwards, i.e. D(qi, sj) = 1 (if the head is moving leftwards, then D(qi, sj) = 0):

    … b3 b2 b1 b0 [sj] c0 c1 c2 c3 …        (state: qi, head on sj)

where sj represents the character of the tape that is currently being read, and qi represents the state of the machine. New symbols s, m, and n are defined as s = sj, m = Σi≥0 bi 2^i, and n = Σi≥0 ci 2^i. As a result, the state of the tape can be expressed by the three numbers s, m, and n. (The sums above are actually finite, because the number of 1s on the tape is finite.) Furthermore, by expressing the state of the TM qi by an integer q, the complete state of the TM can ultimately be expressed by the four numbers q, s, m, and n.

After the tape has been rewritten to sj′ = R(q, s) and the head has moved rightwards, the TM is as follows:

    … b3 b2 b1 b0 sj′ [c0] c1 c2 c3 …       (state: qi′, head on c0)

This step can be described as a transition of q, s, m, and n. In this example, the new numbers q′, s′, m′, and n′ are obviously q′ = Q(q, s), s′ = P(n), m′ = R(q, s) + 2m, and n′ = H(n), where Q(q, s) is the subsequent state, and H and P are the quotient and the remainder on division by 2, respectively.

It is easy to check that the transition of the TM can be expressed in ordinary cases as follows:

    q′ = Q(q, s),                                      (1)
    s′ = P(m)(1 − D(q, s)) + P(n)D(q, s),              (2)
    m′ = (R(q, s) + 2m)D(q, s) + H(m)(1 − D(q, s)),    (3)
    n′ = (R(q, s) + 2n)(1 − D(q, s)) + H(n)D(q, s).    (4)

    4.2 Simulating TM by RTN

An arithmetized TM can be simulated by the RTN shown in Fig. 2. The expressions of the nodes are as follows:

    #1: Q(a, b),
    #2: P(c)(1 − D(a, b)) + P(d)D(a, b),
    #3: (R(a, b) + 2c)D(a, b) + H(c)(1 − D(a, b)),
    #4: (R(a, b) + 2d)(1 − D(a, b)) + H(d)D(a, b).

Note that a, b, c, and d are local variables of each node. In short, the variable a of #1 and that of #2 have no relation.

It is straightforward to prove that a node of RTN can express Q(q, s), R(q, s), and D(q, s) defined as tables. As a result, an RTN can simulate an arbitrary TM if there are four nodes and each node consists of arithmetic functions, a function P, four variables, and constants. We do not assert that this is the minimum non-terminal set. In addition, if we do not worry about the size of the non-terminal set, then the above discussion can be omitted by introducing the non-terminal if.

    Figure 2: RTN to simulate TM.
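As an illustration, consider a toy one-state TM (our own example, not from the paper) that rewrites every 1 to 0 while moving right and halts on reading 0. The tables Q, R, and D define the machine, and each update applies exactly the four node expressions of Fig. 2:

```python
P = lambda x: x % 2    # remainder of division by 2
H = lambda x: x // 2   # quotient of division by 2

# Toy machine: state 0 scans rightwards, writing 0 over every cell;
# state 1 is the halting state
Q = {(0, 0): 1, (0, 1): 0}   # next state
R = {(0, 0): 0, (0, 1): 0}   # symbol to write
D = {(0, 0): 1, (0, 1): 1}   # 1 = move right, 0 = move left

def step(q, s, m, n):
    d = D[(q, s)]
    return (Q[(q, s)],                                  # node #1
            P(m) * (1 - d) + P(n) * d,                  # node #2
            (R[(q, s)] + 2 * m) * d + H(m) * (1 - d),   # node #3
            (R[(q, s)] + 2 * n) * (1 - d) + H(n) * d)   # node #4

q, s, m, n = 0, 1, 0, 5      # head on 1, cells 1, 0, 1 to the right (n = 5)
while q != 1:                # run until the halting state
    q, s, m, n = step(q, s, m, n)
print(m)  # -> 0: every cell the head passed over was rewritten to 0
```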

    4.3 List as a node value

The value of a node mentioned above is a number. RTN can be extended to deal with a list of numbers as a node value. Suppose the tapes m and n in the previous proof are expressed by lists; then the transition of the TM can be written as follows:

    q′ = Q(q, s),                                            (5)
    s′ = if D(q, s) = 0 then car(m) else car(n),             (6)
    m′ = if D(q, s) = 0 then cdr(m) else cons(R(q, s), m),   (7)
    n′ = if D(q, s) = 0 then cons(R(q, s), n) else cdr(n),   (8)

where car is a function that returns the first element of the list given as its argument, cdr is a function that returns a list containing all but the first element, and cons is a function that takes an expression and a list and returns a new list whose first element is the expression and whose remaining elements are those of the old list. In addition, car returns nil if its argument is an empty list, and cdr of an empty list likewise returns nil.
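A sketch of one list-based step, keeping the direction convention of Eqs. (1)-(4) and rendering car, cdr, and cons as Python list operations (function names are ours):

```python
car = lambda xs: xs[0] if xs else None   # car of an empty list is nil
cdr = lambda xs: xs[1:]                  # all but the first element
cons = lambda x, xs: [x] + xs            # prepend x to the list

def step(q, s, m, n, Q, R, D):
    if D[(q, s)] == 0:   # moving left: pop from m, push the written symbol onto n
        return Q[(q, s)], car(m), cdr(m), cons(R[(q, s)], n)
    else:                # moving right: push the written symbol onto m, pop from n
        return Q[(q, s)], car(n), cons(R[(q, s)], m), cdr(n)
```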


    Table 3: Transition of solution

t   #1          #2    …    #n
0   {1,0,1,1}   {}    …    {}
1   {0,1,1}     1
2   {1,1}       0
3   {1}         1
4   {}          1
…   {}          {}
i   {}          {}    …    1 or 0

    5.2 Setting of evolutionary computing

The setting of the evolutionary computing is shown in Table 4. Each evolutionary operator makes one-third of the offspring. The non-terminal divide returns its first argument directly if the second argument is 0. Similarly, p returns its argument directly if it is not an integer. The non-terminal if is defined as if(a, b, c, d) := if (a = b) then c else d.
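The protected non-terminals just described can be written out directly (a sketch; `if_` merely avoids Python's keyword):

```python
def divide(a, b):
    # Protected division: return the first argument when the divisor is 0
    return a if b == 0 else a / b

def p(a):
    # Remainder mod 2 for integers; non-integers pass through unchanged
    return a % 2 if isinstance(a, int) else a

def if_(a, b, c, d):
    # if(a, b, c, d) := if (a = b) then c else d
    return c if a == b else d
```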

    Table 4: Setting of evolutionary computing

Population size    1,000 (L1, L4) or 5,000 (L2, L3)
Number of nodes    20
Operators          Replacement of an S-expression, mutation on the
                   network topology vector, and crossover of two
                   network topology vectors
Selection method   Tournament (size 5)
Terminals          a, b, c, d, nil, and integers from 0 to 10
Non-terminals      plus, minus, times, divide, p, if, car, cdr, and cons

    5.3 Results

The results are summarized in Table 5 (only nodes with a causal relationship to the node that outputs an answer are shown). For example, the RTN for L1 was generated at the fifth generation. Its ninth node outputs an answer four steps after the value of #1 becomes nil. The fitness scores of the generated RTNs are all 1, i.e. each generated RTN classifies the training data perfectly.

    Table 5: Evolved classifiers for Tomita languages

L1 (generation: 5, node: 9, time: 4)
v1′ = cdr(v1)
v2′ = car(v1)
v9′ = times(if(v19, divide(v19, minus(divide(v19, cdr(if(v19, v15, times(v20, v15), cdr(v19)))), v19)), 1, v20), v15)
v15′ = v17
v17′ = times(v19, v17)
v19′ = v2
v20′ = v19

L2 (generation: 14, node: 18, time: 2)
v1′ = cdr(v1)
v2′ = car(v1)
v10′ = cdr(p(v16))
v16′ = minus(divide(6, v1), times(car(p(times(v2, 3))), v10))
v18′ = times(v20, v10)
v20′ = cdr(p(if(cons(car(v2), v2), cons(p(if(v18, v2, cons(cons(4, car(v2)), divide(v2, v2)), minus(car(minus(v2, v2)), car(plus(divide(0, 3), v2))))), v18), plus(4, v2), minus(9, v2))))

L3 (generation: 249, node: 4, time: 4)
v1′ = cdr(v1)
v2′ = car(v1)
v4′ = times(v17, v10)
v6′ = if(v17, 1, v10, v4)
v8′ = times(cdr(v2), v17)
v10′ = v6
v17′ = p(plus(5, minus(v2, v8)))

L4 (generation: 40, node: 19, time: 9)
v1′ = cdr(v1)
v2′ = car(v1)
v3′ = v20
v5′ = v6
v6′ = cdr(plus(v19, v19))
v9′ = if(1, v16, v16, if(v10, p(v16), 3, 8))
v10′ = if(times(1, v15), v5, cons(v3, v3), v10)
v12′ = v9
v13′ = cdr(v19)
v15′ = v20
v16′ = v2
v19′ = p(v12)
v20′ = if(v12, v20, v20, v13)

    5.4 Validations

We used strings whose length was smaller than 11. To validate the results, we tried to classify all strings whose length was smaller than 17 using the generated RTNs. As a result, the RTNs generated by the evolutionary computing could classify the strings perfectly.

6 Discussion

Generation of the Tomita languages was tackled by STROGANOFF [3], which is an integrated system of the SGP and recurrent neural networks. However, its highest score for Tomita language L4 was 91%. On the other hand, our score is 100%. This comparison clearly shows the effectiveness of our approach.

Various representations for GP have been proposed so far. Representations related to our work are as follows.

In the case of the SGP, an individual is represented as a single S-expression which consists of non-terminals (functions) and terminals (variables and constants). Variables are bound to the input data. The output of the program is the evaluated value of the S-expression. It has been proved that the expressiveness of an SGP individual becomes Turing-complete by adding functions to access an indexed memory and repeating the evaluation of the S-expression [10].

There are other Turing-complete representations, for example, Cellular Automata and recurrent artificial neural networks. One of the drawbacks of these representations is that it is not easy to introduce new functions. On the other hand, GP using S-expressions can introduce new functions easily by adding new non-terminals.

There are various representations using a graph, for example, GP-automata [2], Parallel Distributed Genetic Programming (PDGP) [8], PADO [11], and Multiple Interacting Programs (MIPs) [1]. Linear-Graph GP [4] also uses a graph, but its nodes are not S-expressions.

GP-automata differs from RTN in expressiveness. Each node of GP-automata corresponds to a state of a finite state automaton (FSA). Thus, the expressiveness of GP-automata is the same as that of an FSA.

PDGP differs from RTN in that each node of PDGP is not an S-expression but a non-terminal. As mentioned above, it is easy to show the Turing-completeness of RTN, because the nodes of RTN are S-expressions. The expressiveness of PDGP is not clear.

PADO differs from RTN in that PADO has an indexed memory, and there are non-terminals like READ and WRITE. Thus, the data flowing in the network are sometimes treated as pure data and otherwise as memory indices. Consequently, in introducing new non-terminals, we must consider the case in which they are used to calculate a memory index. On the other hand, RTN has nothing like an indexed memory, and a non-terminal like "read the value of #n" cannot be used. Thus, users of RTN need not consider this problem.

Multiple Interacting Programs (MIPs) is very similar to RTN. It is different from RTN in the two points described below, and these differences bring some drawbacks.

Firstly, many terminals are needed for MIPs. For example, in the case of MIPs, the program shown in Fig. 1 is represented as {v1′ = (v1 − P(v1))/2, v2′ = P(v1)·v2}. On the other hand, in RTN it is represented as {v1′ = (a − P(a))/2, v2′ = P(a)·b}, where the value of #1 is connected to the variable a in both #1 and #2. This difference is conspicuous when the network becomes large. In the case of RTN, only four variables are needed.

Secondly, no policy to decide the network size, the link number of each node, the non-terminal set, etc. is given for MIPs. If a proper non-terminal set is not given, then the search may be difficult. For example, generation of a bit reverser is reported in [1]. The non-terminal set used there includes functions like sin and cos. As a result, the generated program is complicated and limited to 5 bits. On the other hand, we confirmed experimentally that RTN can generate the bit reverser easily without any special non-terminals beyond what is essential for Turing-completeness.

7 Conclusion and future works

We proposed a representation scheme for GP: a recurrent network of trees (RTN). RTN was proved to be Turing-complete. As an example, RTN was applied to generate classifiers for the Tomita languages. The generated classifier was better than one that had been generated by using GP. However, this task can be achieved with limited representations. Thus, a task which is impossible for such a limited representation is one of the future works.

We expect the nodes of RTN will be good units to encapsulate, introduce, and keep knowledge, i.e. good building blocks for GP. Research on this topic, including a yardstick for judging this matter, is also one of our future works.

There are still few cases in which programs have actually been evolved in a system using Turing-complete representations [7, 10]. Further study is required, including the question as to whether such programs can evolve within the framework of evolutionary computing.

References

[1] P.J. Angeline. Multiple interacting programs: A representation for evolving complex behaviors. Cybernetics and Systems, 29(8):779-806, 1998.

[2] D. Ashlock. GP-automata for dividing the dollar. In Genetic Programming 1997: Proceedings of the Second Annual Conference, pages 18-26, 1997.

[3] H. Iba et al. Temporal data processing using genetic programming. In Proceedings of the Sixth International Conference ICGA-95, pages 279-286, 1995.

[4] W. Kantschik and W. Banzhaf. Linear-graph GP: A new GP structure. In Genetic Programming, Proceedings of the 5th European Conference, EuroGP 2002, volume 2278 of LNCS, pages 83-92, 2002.

[5] J.R. Koza et al. Genetic Programming III. Morgan Kaufmann, 1999.

[6] M.L. Minsky. Computation: Finite and Infinite Machines. Prentice-Hall, Inc., 1967.

[7] P. Nordin and W. Banzhaf. Evolving Turing-complete programs for a register machine with self-modifying code. In Proceedings of the Sixth International Conference ICGA-95, pages 318-325, 1995.

[8] R. Poli. Evolution of graph-like programs with parallel distributed genetic programming. In Genetic Algorithms: Proceedings of the Seventh International Conference, pages 346-353, 1997.

[9] M. Sipser. Introduction to the Theory of Computation. Thomson Learning, 1997.

[10] A. Teller. Turing completeness in the language of genetic programming with indexed memory. In Proceedings of the 1994 IEEE World Congress on Computational Intelligence, volume 1, pages 136-141, 1994.

[11] A. Teller and M. Veloso. PADO: A new learning architecture for object recognition. In Symbolic Visual Learning, pages 81-116. Oxford University Press, 1996.