
University of Cape Town
Computer Science Honours Thesis

Assessing the feasibility of Novelty Search in the generalised pole balancing domain.

Allen Huang

Supervised by Dr. Geoff Nitschke, David Shorten

   Category                                                     Min  Max  Chosen
1  Requirement Analysis and Design                                0   20       0
2  Theoretical Analysis                                           0   25       0
3  Experiment Design and Execution                                0   20      18
4  System Development and Implementation                          0   15       0
5  Results, Findings and Conclusion                              10   20      17
6  Aim Formulation and Background Work                           10   15      15
7  Quality of Report Writing and Presentation                    10   10
8  Adherence to Project Proposal and Quality of Deliverables     10   10
9  Overall General Project Evaluation                             0   10      10
   Total Marks                                                        80      80

Department of Computer Science, 2014


Abstract

The standard pole balancing problem, an effective benchmark for controllers in simulated environments, is often ineffective for real-world scenarios. We implement a more generalised version of the pole balancing problem in an effort to create controllers with real-world relevance. Additionally, all previous neuro-evolution approaches to the pole balancing problem were objective-based. As the size of the search space increases for the generalised version, we test a non objective-based neuro-evolution approach and compare its performance with the objective-based approach.

The results of this thesis show that the non objective-based approach is less effective at finding controllers for the generalised version of the pole balancing problem; the poor performance may be due to the lack of identifiable deception in the pole balancing domain.


Acknowledgements

Dr Geoff Nitschke has not only provided valuable critique and input into this research and thesis, but he also introduced me to the realm of Evolutionary Computation, in which I am now deeply interested. I would like to take this opportunity to thank him, as well as David Shorten for his assistance in this project.

Joel Lehman and Ken Stanley, the creators of Novelty Search, have also provided valuable insight for my research; thank you.

I would also like to thank Hussain Kajee for working with me on this project, Federico Lorenzi, who volunteered to proof-read this thesis, and Tina Hsu for her support.

Lastly, I would like to thank my parents and my sister for always supporting me in all my decisions.


Contents

1 Introduction
2 Background
   2.1 Pole Balancing Problem
   2.2 Neuro-Evolution
   2.3 Overview of Neuro-Evolution Approaches
   2.4 Objective-Based Approaches
   2.5 Non Objective-Based Approaches
   2.6 Limitations of Previous Solutions
   2.7 NEAT
      2.7.1 Representation
      2.7.2 Mutation
      2.7.3 Recombination
   2.8 A Non Objective-Based Approach for Pole Balancing
   2.9 Novelty Search
   2.10 Motivation
3 Methods
   3.1 Fitness-Based Algorithm
   3.2 Novelty Search Algorithm
   3.3 Novelty Search
      3.3.1 Novelty Metrics
   3.4 Distance Metrics
   3.5 Novelty Archive


4 Experiments
   4.1 Tools
   4.2 Experiment Process
   4.3 Single Starting Position Experiments
   4.4 41 Starting Position Experiments
5 Results
   5.1 Single Starting Position
      5.1.1 Easy Version
      5.1.2 Difficult Version
   5.2 41 Starting Positions
      5.2.1 Easy Version
      5.2.2 Difficult Version
6 Discussion
   6.1 Distance Metrics
   6.2 Novelty Metrics
      6.2.1 Conflation
      6.2.2 Deception in the Pole Balancing Domain
   6.3 Limitations
7 Conclusion
8 Future Work


List of Figures

1  Maze Navigation Maps. The large circle represents the starting position of the robot and the small circle represents the goal. Cul-de-sacs in both maps that lead toward the goal create the potential for deception (Lehman and Stanley, 2011).
2  Single Pole Balancing Problem (James, 2014).
3  Two Pole Balancing Problem (James, 2014).
4  The feedback nature of neuro-evolution (Gomez et al., 2008).
5  Genotype and phenotype representations of NEAT (Stanley and Miikkulainen, 2002).
6  Mutations in NEAT. The top number in each gene represents the innovation number; below that, the connection between two nodes is represented. A gene can be either enabled or disabled (denoted by DIS and shaded in gray) (Stanley and Miikkulainen, 2002).
7  Parent genotypes for different ANN topologies are matched up using the genes' innovation numbers (Stanley and Miikkulainen, 2002).
8  FIFO Queue Structure for novelty archive.
9  Multiple pole angles were chosen by using a fixed-interval range from the input starting position.
10 Single starting position (Easy Version) - average maximum fitness.
11 Single starting position (Difficult Version) - average maximum fitness.
12 41 starting positions (Easy Version) - average maximum fitness.
13 41 starting positions (Difficult Version) - average maximum fitness.
14 41 starting positions (Easy Version) - novelty search approaches only.
15 41 starting positions (Difficult Version) - novelty search approaches only.


List of Tables

1  Summary of Experiments: a list of all the experiments performed. Each experiment was run for the easy and difficult versions.
2  Single starting position (Easy Version) - success rates.
3  Single starting position (Difficult Version) - success rates.
4  Single Starting Positions (Easy) - Wilcoxon Statistical Test
5  Single Starting Positions (Hard) - Wilcoxon Statistical Test
6  41 starting positions (Easy Version) - success rates.
7  41 starting positions (Difficult Version) - success rates.
8  41 Starting Positions (Easy) - Wilcoxon Statistical Test
9  41 Starting Positions (Hard) - Wilcoxon Statistical Test
10 One pole with complete state information provided to the controller.
11 One pole with incomplete state information provided to the controller.
12 Two poles with complete state information provided to the controller.
13 Two poles with incomplete state information provided to the controller.
14 NEAT Parameters
15 Pole balancing specific parameters
16 Novelty Search Parameters


1 Introduction

In the field of machine learning, control problems such as finless rocket control (Gomez and Miikkulainen, 2003), aircraft control, manufacturing and robot locomotion are of high relevance due to their practical real-world uses. Control problems require the controller or agent to observe the state of the system or environment and output a control signal that affects future states of the environment in some desirable way (Gomez et al., 2008).

The pole balancing, or inverted pendulum, system is a benchmark problem used to assess solutions to control problems. The system consists of a pole hinged to a wheeled cart on a finite stretch of track (see figure 2) (Gomez et al., 2008).

In previous research, various ontogenetic1 and phylogenetic2 approaches for adapting artificial neural networks (ANNs), outlined by Gomez et al. (2008), were used to find suitable controller solutions to the pole balancing problem with a fixed initial state. Only a few of the tested approaches proved effective at finding a solution in a reasonable number of evaluations (see appendix A). The results of the experiments3 conducted by Gomez et al. (2008) showed that the phylogenetic approaches far outperformed their ontogenetic counterparts.

Neuro-evolution is an approach for evolving and adapting ANNs (Gomez et al., 2008). Genetic representations of the ANNs are adapted with feedback from the environment. In the context of pole balancing, the ANNs represent the controller's decisions and the environment represents the angular positions of the poles and the position of the cart on the track.

Previous approaches employed for the pole balancing problem were objective-based (Gomez et al., 2008), and fitness functions were used to assess solutions' performances. Fitness functions are prone to local optima and deception. Additionally, the adapted controllers, which had been optimised for the fixed initial state input, were not tested with different or dynamic initial states.

A deceptive search space causes a solution to become trapped in an area from which it is unable to ultimately reach the objective. Deception usually lures the solution closer towards the goal but prevents it from reaching it (Lehman and Stanley, 2011). An example of deception in a problem space can be seen in figure 1. Non objective-based approaches, however, do not get trapped in these deceptive regions and are thus more effective at traversing the search space. To the author's knowledge, a non objective-based approach has not been used or tested for pole balancing.

This thesis presents our implementation of novelty search, a non objective-based approach by Lehman and Stanley (2011), for pole balancing. The performance of an objective-based approach will be compared with that of our novelty search approach. Novelty search (Lehman and Stanley, 2011) has been tested on other problem domains such as maze navigation and biped locomotion. However, it has not been implemented or tested for the pole balancing domain (Lehman, Joel, 2014).

1Examples include: Reinforcement Learning (Sutton and Barto, 1998), Random Weight Guessing (Gomez et al., 2006) and Back-Propagation (Werbos, 1990).

2Examples include: Neuro-evolution (Floreano et al., 2008), Genetic Algorithms (Davis et al., 1991), Evolutionary Strategies (Beyer and Schwefel, 2002) and Genetic Programming (Banzhaf et al., 1998).

3Refer to appendix A for experiment results of Gomez et al. (2008).


Figure 1: Maze Navigation Maps. The large circle represents the starting position of the robot and the small circle represents the goal. Cul-de-sacs in both maps that lead toward the goal create the potential for deception (Lehman and Stanley, 2011).

Further, our research aims to test how these (objective and non objective-based) approaches fare at finding generalised controllers4. With the increase in the size of the search space (by incorporating multiple initial conditions), deception might arise in the pole balancing domain, which otherwise has no immediately identifiable deceptions (Lehman, Joel, 2014).

This thesis first introduces background information on the pole balancing problem, the neuro-evolution of augmenting topologies (NEAT) algorithm (Stanley and Miikkulainen, 2002) and novelty search. It then moves on to our research by describing the experiment processes and approaches, and finally presents our findings on the performance and feasibility of novelty search for the pole balancing problem.

2 Background

In this section, we will introduce the concept of pole balancing and the various objective-based neuro-evolution approaches that have been used. We will then explore the realm of non objective-based approaches for pole balancing, the concept of novelty search and the possibility of creating a non objective-based neuro-evolution approach to find a controller that can solve the pole balancing problem for dynamic initial states.

2.1 Pole Balancing Problem

The pole balancing problem falls under dynamics and control problems, in which a system receives feedback from the environment before deciding how to act in a way that is favourable to future states (Gomez et al., 2008). It is one of the most popular benchmarks for general control problems in control theory. An approach that can find a suitable controller to solve the pole balancing problem can, theoretically, be applied to find suitable controllers for other problems such as aircraft control, manufacturing and robotic locomotion (Gomez and Miikkulainen, 2003).

4Controllers that are able to balance poles regardless of initial conditions: pole angles and lengths.


Figure 2: Single Pole Balancing Problem (James, 2014)

Figure 3: Two Pole Balancing Problem (James, 2014)

The basic pole balancing problem consists of a pole hinged to a wheeled cart on a finite stretch of track (see figure 2) (Gomez et al., 2008). The objective is to apply force to the cart at regular intervals such that the pole is balanced indefinitely and the cart stays within the track boundaries (Gomez et al., 2008). More challenging versions of the pole balancing problem can be created by either modifying the mechanical system, for example, by adding an additional pole next to or on top of the existing pole, or by restricting the amount of state information provided to the controller, for example, by not providing the controller with the poles' velocities (Wieland, 1991).
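For concreteness, the basic single pole system can be simulated with the standard cart-pole equations of motion used in the benchmark literature. The sketch below is illustrative: the masses, pole length, force magnitude and failure thresholds are common benchmark values, not necessarily those used in this thesis.

```python
import math

# Common single pole balancing parameters (illustrative benchmark values,
# not this thesis's exact experimental setup).
GRAVITY = 9.8        # m/s^2
CART_MASS = 1.0      # kg
POLE_MASS = 0.1      # kg
POLE_HALF_LEN = 0.5  # m (half the pole length)
FORCE_MAG = 10.0     # N, magnitude of the bang-bang control force
DT = 0.02            # s, simulation time step

def step(x, x_dot, theta, theta_dot, force):
    """Advance the cart-pole state by one time step using Euler integration."""
    total_mass = CART_MASS + POLE_MASS
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    temp = (force + POLE_MASS * POLE_HALF_LEN * theta_dot**2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        POLE_HALF_LEN * (4.0 / 3.0 - POLE_MASS * cos_t**2 / total_mass))
    x_acc = temp - POLE_MASS * POLE_HALF_LEN * theta_acc * cos_t / total_mass
    return (x + DT * x_dot, x_dot + DT * x_acc,
            theta + DT * theta_dot, theta_dot + DT * theta_acc)

def failed(x, theta):
    """Episode ends when the cart leaves the track or the pole falls too far."""
    return abs(x) > 2.4 or abs(theta) > math.radians(12)
```

A controller is then evaluated by feeding it the state at each step and counting how long it keeps `failed` returning False.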

What makes control problems interesting and difficult is that the environment is often non-linear and noisy, making it impossible to obtain an accurate or tractable mathematical model. Additionally, due to this complexity, there is very little a priori knowledge about the problem (Gomez et al., 2008) with which to design effective controllers to solve it.

We are interested in eventually solving the two pole balancing problem (see figure 3) with poles of varying length ratios placed next to each other. The difficulty increases as the ratio between the poles decreases, that is, when the poles' lengths become closer to each other. Balancing two poles of the same length is highly improbable. According to Stanley (2014), "difficulty increases enormously to the point where if they are the same length the problem is theoretically impossible".

2.2 Neuro-Evolution

Neuro-evolution is the artificial evolution of ANNs using evolutionary algorithms (Stanley and Miikkulainen, 2002). It searches through the behaviour space for an ANN that performs well at a given task. Figure 4 shows how an objective-based neuro-evolution algorithm that uses fitness functions works.

Figure 4: The feedback nature of neuro-evolution (Gomez et al., 2008).

Each genotype in the evolutionary algorithm (for example, a genetic algorithm (Davis et al., 1991)) is transformed into an ANN and evaluated on the task. The ANN receives input from the environment and produces an output signal that affects the environment. Thereafter, a fitness is assigned to the ANN according to its performance on the problem. ANNs that perform well are recombined to generate new ANNs (Gomez et al., 2008).

An objective-based neuro-evolution approach proceeds as follows (Stanley and Miikkulainen, 2002):

1. A population of genotypes, each encoding an ANN controller, is initialised, either randomly or using a heuristic.

2. Each genotype in the population is converted into its ANN and evaluated. A fitness measure is assigned to the genotypes.

3. Genotypes are selected according to a selection criterion (for example, fitness proportionate (Zitzler et al., 2000) or elitism (Thierens and Goldberg, 1994)) and then recombined to produce offspring (Eiben and Smith, 2003).

4. Offspring are mutated and then evaluated. Fitness is assigned to each offspring.

5. Using a replacement criterion (for example, fitness proportionate or elitism), offspring replace parents in the population.

6. Steps 2 - 5 are repeated until a stopping condition (a solution is found or the maximum number of evaluations is reached) is met.

The selection pressure in steps 3 (recombining well-performing ANNs) and 5 (replacing weak parents with fit offspring) forces newer generations to have high fitness, with the aim of finding good solutions in a reasonable time.
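The six steps above can be sketched as a generic loop. All names here (`evaluate`, `mutate`, `recombine`, and so on) are hypothetical placeholders for the problem-specific operators; this is a minimal illustration of the objective-based scheme, not NEAT or any other specific algorithm.

```python
import random

def evolve(init_genotype, evaluate, mutate, recombine,
           pop_size=50, max_evals=10_000, target_fitness=1.0):
    """Generic objective-based evolutionary loop following steps 1-6.

    `evaluate` maps a genotype (e.g. a decoded ANN) to a scalar fitness;
    `mutate` and `recombine` are the problem-specific variation operators.
    """
    # Step 1: initialise a population of genotypes.
    population = [init_genotype() for _ in range(pop_size)]
    # Step 2: decode and evaluate each genotype.
    fitness = [evaluate(g) for g in population]
    evals = pop_size
    while evals < max_evals:
        if max(fitness) >= target_fitness:      # Step 6: stopping condition
            break
        # Step 3: fitness-proportionate selection, then recombination.
        total = sum(fitness)
        weights = [f / total for f in fitness] if total > 0 else None
        parents = random.choices(population, weights=weights, k=pop_size)
        offspring = [recombine(random.choice(parents), random.choice(parents))
                     for _ in range(pop_size)]
        # Step 4: mutate and evaluate the offspring.
        offspring = [mutate(c) for c in offspring]
        child_fit = [evaluate(c) for c in offspring]
        evals += pop_size
        # Step 5: elitist replacement -- keep the best of parents + offspring.
        ranked = sorted(zip(fitness + child_fit, population + offspring),
                        key=lambda p: p[0], reverse=True)[:pop_size]
        fitness = [f for f, _ in ranked]
        population = [g for _, g in ranked]
    return max(zip(fitness, population), key=lambda p: p[0])
```

The selection pressure lives in the `weights` used for selection and the elitist replacement; a non objective-based variant would replace `evaluate` with a novelty measure rather than a fitness function.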


2.3 Overview of Neuro-Evolution Approaches

Neuro-evolution approaches have been popular for solving the pole balancing problem as they have proved to be effective (Gomez et al., 2008). In the following sub-sections, various objective-based neuro-evolution approaches will be described and an introduction to non objective-based approaches will be outlined.

Results from Gomez et al. (2008) showed that the neuro-evolution approaches were faster and more effective than the ontogenetic approaches that were tested5. Other past research has further reinforced this by showing that neuro-evolution is more efficient than reinforcement learning methods such as Adaptive Heuristic Critic and Q-learning on single pole balancing and robotic arm control (Moriarty and Miikkulainen, 1996). For this reason, neuro-evolution is our primary area of interest and we have thus omitted the ontogenetic approaches altogether.

Neuro-evolution can be categorised into two broad types: objective-based and non objective-based approaches. Past research only tested objective-based approaches when solving the pole balancing problem (Gomez et al., 2008; Lehman, Joel, 2014). However, we will outline non objective-based approaches, their relevance in the pole balancing domain and how they may be an effective means of finding a general problem solver.

Many approaches were investigated in Gomez et al. (2008), but only the neuro-evolution approaches are listed in the following sub-section, with short descriptions, as they will be the focus of our research.

2.4 Objective-Based Approaches

An objective-based approach is one where solutions are measured with respect to objectives chosen a priori by researchers. A well-defined fitness function ensures that the search is effective and does not quickly become stuck in local optima.

More details of each algorithm, along with how genotypes were assessed (what types of fitness functions were used), are discussed below:

Symbiotic, Adaptive Neuro-Evolution (SANE): Developed by Moriarty (1997), SANE is a cooperative co-evolution approach. Two separate populations, a population of neurons and a population of network blueprints (which specify how neurons are combined to form complete networks), are evolved simultaneously. At each generation, blueprints and neurons are selected at random to form networks. Neurons that combine to form good networks receive high fitness values and are recombined in a single population. Fit blueprints are also recombined to produce even better combinations.

Conventional Neuro-evolution (CNE): Developed by Gomez et al. (2008), CNE is a single population neuro-evolution approach. Network weights are encoded as real numbers, selection is based on rank and mutation is performed using burst mutation. CNE evolves genotypes at the network level, with each genotype representing a complete neural network.

5Refer to appendix A for experiment results of Gomez et al. (2008).


Evolutionary Programming (EP): Developed by Saravanan and Fogel (1995), EP is a mutation-based evolutionary approach that can be used to search the space of neural networks. Networks are represented by two n-dimensional vectors: one contains the synaptic weight values and the other the standard deviation values for the synaptic weights. Networks are constructed using weights from the first vector and offspring are produced by applying Gaussian noise to each weight with its corresponding standard deviation.

Cellular Encoding (CE): Developed by Gruau et al. (1996), CE is an approach that evolves graph-writing programs. A graph-writing program constructs a neural network out of cells (neurons) using a sequence of operations that either copy cells or modify the contents within the cells. Standard genetic programming crossover and mutation approaches are used in recombining the programs.

Covariance Matrix Adaptation Evolutionary Strategies (CMA-ES): Developed by Hansen and Ostermeier (2001), CMA-ES is an evolutionary strategies approach that evolves the covariance matrix of the mutation operator. The objective is to favour previously selected mutation steps in future generations. Rigorous pursuit of this objective results in a completely derandomized self-adaptation scheme of normal mutation distributions.

Neuro-Evolution of Augmenting Topologies (NEAT): Developed by Stanley and Miikkulainen (2002), NEAT is similar to CE (it evolves topology as well as synaptic weights), but is progressive in that it starts with a population of simple, minimal networks that increase in complexity over the evolutionary process. Complexity is increased by either adding more connections between neurons or adding neurons. Genotypes are grouped into species based on their topological similarity and only ANNs of the same species recombine with each other. This ensures diversity and protects innovative solutions from getting lost in the evolutionary process. NEAT is described in further detail later in this review.

Enforced Sub-Populations (ESP): Developed by Gomez and Miikkulainen (1997), ESP is also a co-evolution approach. It uses cooperative co-evolution to evaluate neurons from different sub-populations.

Cooperative Synapse Neuro-evolution (CoSyNE): Developed by Gomez et al. (2008), CoSyNE is a cooperative co-evolution approach that evolves genotype synaptic weights. For each network connection in a predefined network, there is a sub-population of real-valued synaptic weights. The sub-populations are permuted in such a way that each weight forms part of a potentially different network in the next generation.

Although some of these approaches are not strictly objective-based (SANE, for example, could be adapted to become non objective-based), the past research which utilised them in the pole balancing domain was objective-based.

2.5 Non Objective-Based Approaches

Non objective-based approaches do not require researchers to craft fitness functions to assess a genotype's performance. Many researchers (Gould (1996), Miconi (2008) and Sigmund (1995)) have argued that fitness functions, which induce selection pressure (pressure to adapt in a certain way), actually restrict the search and oppose innovation.


In many problem domains, it is also extremely difficult to craft effective fitness functions, as this requires an a priori understanding of the fitness landscape and the stepping stones to the objective (Woolley and Stanley, 2011). An oversight made by the researcher when designing a fitness function could cause the search to be trapped in local optima.

In a non objective-based approach, stepping stones that would have been thrown away by an objective-based approach, because they appear to be far from the objective, are preserved. These stepping stones may lead to the ultimate objective and prevent solutions from getting trapped by deception.

Since the pole balancing problem does not have any immediately identifiable and obvious deceptions, non objective-based approaches have been considered overkill (Lehman, Joel, 2014) and thus have not yet been applied.

2.6 Limitations of Previous Solutions

Current objective-based solutions to the pole balancing problem (Moriarty and Miikkulainen, 1996; Gomez et al., 2008; Saravanan and Fogel, 1995; Gruau et al., 1996; Hansen and Ostermeier, 2001; Stanley and Miikkulainen, 2002; Gomez and Miikkulainen, 2003) have only been tested on a fixed initial state, that is, one or two poles starting at a specific angle with the cart at a specific location on the track. After the runs were completed, some of the solutions were able to solve the pole balancing problem with that initial state. They were not tested on problems with different or varying initial states.

Therefore, the assumption is that changing the initial state would require solutions to be rebuilt or re-evolved from the beginning to be effective. These solutions are thus limited to a fixed initial state, which renders them useless in real-world scenarios where the initial state of a problem is not predefined.

Thus far, solutions have not been derived and tested to work for dynamic initial states, that is, a solution controller that can solve the pole balancing problem with any initial state.

2.7 NEAT

Neuro-Evolution of Augmenting Topologies by Stanley and Miikkulainen (2002) is a complexification algorithm in that it starts out with very simple ANNs and progressively increases the number of neurons and connections between them. This is analogous to biological evolution, where species increase in complexity over evolutionary generations (Darwin, 1969).

NEAT is of particular interest to us due to its applicability and efficiency for novelty search (Lehman and Stanley, 2011).


Figure 5: Genotype and phenotype representations of NEAT (Stanley and Miikkulainen, 2002).

2.7.1 Representation

The NEAT algorithm uses a genetic encoding that is designed to allow corresponding genes to be easily lined up during recombination. A genotype6 consists of multiple genes (see figure 5).

Each gene carries a historical marking known as an innovation number, which allows for speciation and ensures that only genes of the same origin are matched up during crossover, solving the problem of competing conventions7.
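A minimal sketch of this encoding is shown below; the field names are our own and are not taken from any particular NEAT implementation.

```python
from dataclasses import dataclass, field

@dataclass
class ConnectionGene:
    """One gene: a single connection between two nodes (see figure 5)."""
    in_node: int     # source node id
    out_node: int    # destination node id
    weight: float    # synaptic weight
    enabled: bool    # disabled genes remain in the genotype but are not expressed
    innovation: int  # historical marking: a global counter recording when this
                     # connection first appeared anywhere in the population

@dataclass
class Genotype:
    """A complete genotype: node ids plus a sequence of connection genes."""
    nodes: list
    connections: list = field(default_factory=list)
```

Because the innovation counter is global, two genotypes containing a gene with the same innovation number are guaranteed to be describing the same structural feature, which is what makes the alignment during recombination possible.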

2.7.2 Mutation

Mutation can occur in two different ways in NEAT: connection weight mutation and structural mutation.

As in any other neuro-evolution approach, mutation of a connection weight either occurs or does not occur on each gene in each generation.

Structural mutation in NEAT occurs by either adding new connections or adding new nodes to the genotypes. When adding a new connection, a connection is made between two previously unconnected nodes. A gene representing the new connection is added to the end of the genotype sequence and assigned the next incremental innovation number (see figure 6).

When adding a new node, an existing connection is split and the new node is placed where the connection used to be. The old connection is disabled and two new connections are added to the genotype. The new connection leading into the new node receives a weight of 1, and the new connection leading out receives the same weight as the old connection (see node 6 in figure 6). Both of these mutations expand the size of the genotype by adding new genes (Stanley and Miikkulainen, 2002).
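The add-node mutation just described can be sketched as follows. Genes are represented here as plain dicts purely for illustration; the key names are assumptions, not an established API.

```python
import random

def add_node_mutation(nodes, connections, next_node, next_innov):
    """Split a randomly chosen enabled connection with a new node.

    Genes are dicts with keys "in", "out", "weight", "enabled", "innov"
    (an illustrative representation, not any specific NEAT library's).
    Returns the updated global innovation counter.
    """
    enabled = [g for g in connections if g["enabled"]]
    if not enabled:
        return next_innov
    old = random.choice(enabled)
    old["enabled"] = False                 # the old connection is disabled
    nodes.append(next_node)                # the new node replaces it
    # The connection leading into the new node receives a weight of 1 ...
    connections.append({"in": old["in"], "out": next_node, "weight": 1.0,
                        "enabled": True, "innov": next_innov})
    # ... and the connection leading out inherits the old connection's weight.
    connections.append({"in": next_node, "out": old["out"],
                        "weight": old["weight"], "enabled": True,
                        "innov": next_innov + 1})
    return next_innov + 2
```

Initialising the in-connection to 1 and copying the old weight to the out-connection means the new, deeper network initially computes almost the same function as before, so the mutation is minimally disruptive.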

[6] A genotype's gene specifies one connection between nodes, whilst the whole genotype represents the complete ANN.

[7] The Competing Conventions Problem, also known as the Permutations Problem, is where there are many genotypic ways to express an ANN. When genotypes of the same ANN cross over, they are likely to produce damaged offspring (Stanley and Miikkulainen, 2002).
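The add-node mutation described above can be sketched as follows. The names are hypothetical (not ANJI's API), and a real implementation would draw innovation numbers from a global counter shared across the population.

```java
import java.util.List;

// Sketch of NEAT's add-node structural mutation (hypothetical names).
public class AddNodeMutation {
    public static class Gene {
        public final int innovation, inNode, outNode;
        public final double weight;
        public boolean enabled;
        public Gene(int innovation, int inNode, int outNode, double weight) {
            this.innovation = innovation; this.inNode = inNode;
            this.outNode = outNode; this.weight = weight; this.enabled = true;
        }
    }

    // Splits the given connection: disables it, inserts a new node in its
    // place, and adds two new genes. The incoming connection gets weight 1.0
    // and the outgoing connection inherits the old weight, so the network's
    // behaviour is initially unchanged by the mutation.
    public static void splitConnection(List<Gene> genotype, Gene old,
                                       int newNodeId, int nextInnovation) {
        old.enabled = false;
        genotype.add(new Gene(nextInnovation, old.inNode, newNodeId, 1.0));
        genotype.add(new Gene(nextInnovation + 1, newNodeId, old.outNode, old.weight));
    }
}
```

Preserving the old weight on the outgoing connection is what keeps the mutated network functionally close to its parent.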


Figure 6: Mutations in NEAT. The top number in each gene represents the innovation number; below that, the connection between two nodes is represented. A gene can either be enabled or disabled (disabled genes are denoted by DIS and shaded in gray) (Stanley and Miikkulainen, 2002).

2.7.3 Recombination

Parents are recombined by matching up their genes' innovation numbers. Matching genes are inherited by the offspring randomly. Disjoint genes (those that do not match in the middle) and excess genes (those that do not match at the end) are inherited from the more fit parent (see figure 7). In the case that the fitness of both parents is the same, these genes are also inherited randomly (Stanley and Miikkulainen, 2002).
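The alignment step can be sketched as below, assuming (hypothetically) that the fitter parent is passed first; matching genes are chosen at random, while disjoint and excess genes come from the fitter parent.

```java
import java.util.*;

// Sketch of NEAT recombination (hypothetical names, not ANJI's API).
// Genes are aligned by innovation number across the two parents.
public class Crossover {
    public record Gene(int innovation, double weight) {}

    public static List<Gene> recombine(List<Gene> fitter, List<Gene> weaker,
                                       Random rng) {
        Map<Integer, Gene> weakerByInnovation = new HashMap<>();
        for (Gene g : weaker) weakerByInnovation.put(g.innovation(), g);

        List<Gene> child = new ArrayList<>();
        for (Gene g : fitter) {
            Gene match = weakerByInnovation.get(g.innovation());
            // Matching genes: pick either parent's copy at random.
            // Disjoint/excess genes (no match): inherit from the fitter parent.
            child.add(match != null && rng.nextBoolean() ? match : g);
        }
        return child;
    }
}
```

Because the loop walks the fitter parent's gene list, the weaker parent's disjoint and excess genes are implicitly dropped, matching the inheritance rule described above.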

2.8 A Non Objective-Based Approach for Pole Balancing

All the previous neuro-evolution approaches for solving the pole balancing problem have been objective-based and used traditional fitness functions. A reason that a non objective-based approach has not been tested may be that the pole balancing problem does not have any immediately identifiable or obvious deceptions.

"A deceptive problem is one in which a reasonable EA will not reach the desired objective in a reasonable amount of time. That is, by exploiting objective fitness in a deceptive problem, a population's trajectory is unlikely to uncover a path through the search space that ultimately leads to the objective" (Lehman and Stanley, 2011).

Since we are interested in solving the pole balancing problem with two poles of varying lengths, deception in the problem space may arise when the poles become similar in length (where difficulty increases drastically). A non objective-based approach such as novelty search may prove to be effective in finding suitable solutions.

As non objective-based approaches have not been employed, they may be a potential avenue of pursuit.


Figure 7: Parent genotypes for different ANN topologies are matched up using the genes' innovation numbers (Stanley and Miikkulainen, 2002).

2.9 Novelty Search

Novelty search is a non objective-based neuro-evolution approach, proposed by Lehman and Stanley (2011), where solutions are rewarded by a novelty metric based on how significantly different or novel their behaviours (phenotypes) are with respect to previous solutions.

Novelty search presents a new way of traversing the search space. It operates on the premise that novel solutions provide the stepping stones to the final objective, ones that objective-based approaches would have discarded as they appear to be too far away from the objective. This way, novelty search ensures that solutions are not trapped or deceived into a local optimum, which plagues objective-based approaches.

The novelty search algorithm encourages novelty and diversity in its genotypes via the maintenance of a novelty archive. This is achieved by keeping a permanent archive of past genotypes whose behaviours were highly novel when they originated. Newly produced genotypes are measured against the population and the archive. If a genotype's novelty is sufficiently high (that is, above a certain threshold), it too is added to the archive. The aim is to characterise how far away in the behaviour space new genotypes are from the rest of the population and its predecessors. Thus, a good novelty metric should compute the sparseness at any point in the behaviour space. If a solution appears in a dense cluster of previously visited points, it is considered less novel and rewarded less (Lehman and Stanley, 2011). Searching the behaviour space ensures that genotypes that produce similar behaviours are not blindly rewarded.

Unlike objective-based approaches, which induce a pressure to adapt according to the objective, novelty search, by rewarding highly novel genotypes, creates a constant pressure for the neuro-evolution to produce something new.

Novelty search may resemble prior diversity maintenance techniques used in objective-based approaches, which attempt to open up the search and prevent premature convergence to a local optimum. However, these approaches are still ultimately guided by fitness functions. Novelty search, which only rewards solutions by their behavioural diversity, is thus immune to the deceptions that are inherent in fitness functions.

Experiments conducted by Lehman and Stanley (2011) on two problem domains, maze navigation and biped locomotion, showed that novelty search far outperformed its objective-based counterparts. In some cases, the objective-based counterparts were unable to find a solution altogether, reinforcing the shortcomings of objective-based approaches.

However, novelty search is not without limitations. Since novelty search ignores objectives completely, there is no bias towards optimisation once a result is found. A possible strategy to mitigate this is to take the most promising results found by novelty search and further optimise them using an objective function. This strategy exploits the strengths of both approaches: novelty search is used to find good approximate solutions, while objective optimisation is used for tuning them (Lehman and Stanley, 2011). To appropriately exploit both novelty search and objective optimisation, a multi-objective formulation proposed by Mouret (2011), where novelty and fitness are rewarded concurrently, might be most effective.

The NEAT algorithm has been adapted into a novelty search approach by Lehman and Stanley (2011). The adaptation was straightforward, as all that was required was to replace the fitness function with a novelty metric, because the underlying NEAT algorithm ensured that the ANNs became more complex as solutions were being explored. Implicitly, this means that once simple ANNs have been explored, more complex ones will create novel behaviours (ones that were not producible by simpler ANNs), ensuring that the search is not random and does not revisit already explored areas.

This makes NEAT an effective algorithm for novelty search, as it encourages effective exploration of the search space.

2.10 Motivation

Control theory problems which have real-world applications are often benchmarked using the pole balancing, or inverted pendulum, problem. Researchers have thus attempted to use neuro-evolution to find suitable controllers that would be able to solve the pole balancing problem.

Although some of the previous approaches proved to be effective, they only solved the problem for fixed initial states; their solutions had to be rebuilt to remain effective. A general solution that can solve pole balancing for any initial input has yet to be found. Additionally, the approaches taken were objective-based and used fitness functions to assess genotype solutions along the evolutionary process. A non objective-based approach has not been used and could prove effective in finding a general pole balancing controller, especially for very difficult versions (two poles of similar or the same length).


A popular non objective-based approach, novelty search, which was effective in finding solutions for the maze navigation and robot biped locomotion domains (Lehman and Stanley, 2011), has not been tested on the pole balancing problem (Lehman, Joel, 2014).

Given the advantages of novelty search evidenced in previous research, we hypothesise that novelty search will be effective for finding a generalised controller for a difficult version of the two pole balancing problem with dynamic initial states. The behaviour space is also drastically larger than that of problems with fixed states, further reinforcing the need for a non-objective search (like novelty search), which allows more effective coverage of the search space.

In the following sections, we investigate the feasibility and performance of novelty search on the standard fixed initial input pole balancing problem and on a more generalised pole balancing problem.

3 Methods

The objective-based, fitness function approach was used as a benchmark for our novelty search implementation. The following subsections outline the fitness function and novelty search algorithms.

3.1 Fitness-Based Algorithm

Since the aim of the pole balancing problem is to balance the poles for a certain amount of time, the fitness function is a simple measure of how long a genotype is able to balance the pole or poles for.

The pole balancing simulation is run on each genotype in the population. Based on how long the genotype was able to balance the poles for, the genotype receives a fitness score.

The fitness score is thus calculated as follows:

fitness(x) = balancedTimesteps(x) / totalTimesteps

where totalTimesteps is the maximum number of timesteps [8] as defined in the pole balancing parameters.

[8] A timestep is a simulated time interval.
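The fitness formula amounts to a one-line normalisation into [0, 1]; as a minimal Java sketch (the method name is hypothetical):

```java
// Minimal sketch of the fitness function: the fraction of the simulation's
// timesteps for which the genotype kept the poles balanced.
public class Fitness {
    public static double fitness(int balancedTimesteps, int totalTimesteps) {
        // Cast before dividing to avoid integer truncation.
        return (double) balancedTimesteps / totalTimesteps;
    }
}
```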


The pseudocode for the algorithm which scores all the genotypes in the population proceeds as follows:

Data: Population of genotypes
Result: Fitness for each genotype
for genotype ∈ Genotypes do
    genotype.fitness = runSimulation(genotype);
end

Algorithm 1: Genotype scoring with fitness function

Algorithm 1 scores all the genotypes in the population by their fitness and is used as the foundation for novelty search and for scoring the genotypes' novelty.

3.2 Novelty Search Algorithm

Implementing novelty search involves removing the fitness function and replacing it with a novelty metric. The runSimulation function in the previous section outputs the genotype's fitness. For our novelty search implementation, this was changed to output only the ANN's behaviour using its metric. After that, the behaviours are ranked and closely related genotypes are compared with one another. The genotype's fitness is also recorded, but not used for the selection process. It is only used as a stop condition, triggered when a genotype's fitness matches or exceeds the fitness threshold.

The algorithm proceeds as follows:

Data: Population of genotypes
Result: Novelty for each genotype
for genotype ∈ Genotypes do
    runSimulation(genotype);                     /* record data for novelty metric */
end
sort(Genotypes);                                 /* sorts genotypes by similarity */
calculateSparseness(Genotypes, neighbours);      /* assigns novelty score */

Algorithm 2: Genotype scoring with novelty metric

Similar to Algorithm 1, Algorithm 2 runs the simulation on each genotype in the population. The actual assignment of the novelty scores is performed in calculateSparseness. Thus, the main focus here is how these novelty scores are assigned in the calculateSparseness function.


Algorithm 3 provides a more detailed look at the calculateSparseness algorithm:

Data: Genotypes, neighbours of each genotype
Result: Each genotype's novelty score
for genotype ∈ Genotypes do
    sparseness = 0;                              /* initialise */
    for neighbour ∈ Neighbours do
        sparseness += distance(genotype.metrics, neighbour.metrics);
    end
    genotype.sparseness = sparseness / Neighbours.size;
end

Algorithm 3: Sparseness calculation algorithm

A metric is some characteristic which describes the genotype and is domain specific. In the pole balancing domain, this could be the poles' angles or velocities, the cart position or velocity, or a combination thereof. The novelty metrics we used and tested are described in more detail in the following subsections.
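Algorithm 3 can be made concrete in Java as follows. The names are hypothetical, and selecting the k nearest neighbours by sorting all candidate distances is an implementation choice not specified above; 'others' is assumed to hold the rest of the population plus the archive, excluding the genotype itself.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of the sparseness (novelty) computation with k-nearest neighbours.
// A behaviour is a vector of metric values recorded over the simulation.
public class Sparseness {
    public static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(sum);
    }

    // Average distance from 'behaviour' to its k nearest neighbours among
    // 'others' (rest of the population plus the novelty archive).
    public static double sparseness(double[] behaviour, List<double[]> others, int k) {
        List<Double> distances = new ArrayList<>();
        for (double[] o : others) distances.add(euclidean(behaviour, o));
        Collections.sort(distances);             // nearest neighbours first
        int n = Math.min(k, distances.size());
        double sum = 0;
        for (int i = 0; i < n; i++) sum += distances.get(i);
        return sum / n;                          // average over the k nearest
    }
}
```

A genotype sitting in a dense cluster has small nearest-neighbour distances and therefore low sparseness, which is exactly the "rewarded less" case described earlier.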

In both implementations (fitness and novelty search), elitism (Thierens and Goldberg, 1994) is used as the selection mechanism (for a full list of experiment parameters, see appendix B).

3.3 Novelty Search

Novelty search uses a novelty metric to compare ANNs' behaviours and uniqueness. We thus had to test various metrics to find an optimum metric (or combination of metrics) for this domain.

3.3.1 Novelty Metrics

The novelty metric measures how unique an ANN's behaviour is when compared to the other genotypes of the population and to previously novel genotypes (in the novelty archive) (Lehman and Stanley, 2011). The single pole balancing problem has four different observable properties that can be used to build a novelty or behaviour metric. These observable properties are:

1. Cart position.

2. Cart velocity.

3. Pole (1) angle.

4. Pole (1) angular velocity.


However, since we are working with the double pole balancing problem, three more observable properties can be introduced, namely:

1. Pole (2) angle.

2. Pole (2) angular velocity.

3. Angle between poles 1 and 2.

A good novelty metric has to appropriately represent an ANN's behaviour, and thus different combinations of the above observable properties were used when composing the novelty metric, to assess their performance. The different novelty metrics implemented for this thesis include:

• Pole angles only.

• Pole velocities only.

• Pole angles combined with velocities.

• Cart position and angle between poles.

• Combination of cart position, velocity, pole angles, velocities and angle between poles (all state information).

Each genotype in a population is given a novelty measure based on its sparseness. The sparseness characterises how far away, or how sparse, the genotype is from the rest of the population and its predecessors (in the novelty archive) (Lehman and Stanley, 2011). Sparseness is measured as the average distance to the k-nearest neighbours of that point, where k is fixed. Neighbours are composed of other genotypes in the same generation as well as neighbours in the novelty archive. The measure of sparseness of any genotype in the population is given by:

sparseness(x) = (1/k) * Σ_{i=0}^{k} distance(x, µ_i)    (1)

where µ_i is the ith-nearest neighbour of x with respect to the distance metric (Lehman and Stanley, 2011).

Equation 1 describes a general measure of sparseness for any genotype in the search space. The distance between two genotypes is defined by a suitable distance metric. In our experiments, two different distance metrics are explored, namely vector difference and Euclidean distance; these are described in more detail in the following section.

The novelty metric records data at each timestep in the simulation. This means that each timestep is a new dimension of the metric. When distances are computed, each dimension of one genotype's novelty metric is compared with the corresponding dimension in the other's novelty metric.


Figure 8: FIFO queue structure for the novelty archive.

3.4 Distance Metrics

The Euclidean distance, derived from the Pythagorean theorem, is the distance between two points in space (Gower, 1982). For n dimensions, the formula is as follows:

distance(x, µ) = √((x_1 − µ_1)² + (x_2 − µ_2)² + ... + (x_n − µ_n)²)    (2)

Vector difference is another metric that was tested; it is the sum of the differences of each dimension:

distance(x, µ) = (x_1 − µ_1) + (x_2 − µ_2) + ... + (x_n − µ_n)    (3)

which is equivalent to the difference between the sum of all the dimensions of one genotype and that of the other:

distance(x, µ) = Σ_{i=1}^{n} x_i − Σ_{i=1}^{n} µ_i    (4)

For later experiments, the appropriate (and better performing) distance metric to use was determined experimentally from the initial experiments.
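Both metrics can be sketched directly from equations 2 and 3. Note that vector difference can cancel: equal and opposite per-dimension differences sum towards zero, so two quite different behaviours may appear identical.

```java
// Sketch of the two distance metrics compared in the experiments.
public class DistanceMetrics {
    // Euclidean distance (equation 2): squared differences cannot cancel.
    public static double euclidean(double[] x, double[] mu) {
        double sum = 0;
        for (int i = 0; i < x.length; i++) sum += (x[i] - mu[i]) * (x[i] - mu[i]);
        return Math.sqrt(sum);
    }

    // Vector difference (equations 3 and 4): the plain sum of per-dimension
    // differences. Opposite-signed differences cancel, which can conflate
    // genuinely different behaviours into a distance of zero.
    public static double vectorDifference(double[] x, double[] mu) {
        double sum = 0;
        for (int i = 0; i < x.length; i++) sum += x[i] - mu[i];
        return sum;
    }
}
```

The cancellation behaviour is one plausible reason, discussed in section 6.1, why vector difference performed worse in the experiments.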

3.5 Novelty Archive

The novelty archive stores genotypes that were novel when they first appeared. The novelty threshold determines whether a genotype gets added to the archive or not. This threshold is dynamic: if too many genotypes (more than a certain portion of the population) exceed the threshold, it is increased (and the genotypes are not added in this instance), and if no solutions are added for a number of generations, it decreases. It will not decrease below the threshold floor.

Our novelty archive is also fixed in size and follows a queue data structure (first-in first-out), which means that when a new solution gets added (enqueued), the oldest solution in the novelty archive is discarded (dequeued); see figure 8.

Since each metric returns different average novelty values for each genotype, the threshold parameter has been adapted to be optimal for each novelty metric.
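The fixed-size FIFO behaviour and the dynamic threshold might be sketched as follows; the adjustment factors and method names are illustrative placeholders, not values or APIs from our implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of a fixed-size FIFO novelty archive with a dynamic threshold.
// Adjustment amounts here are illustrative, not the experiment's parameters.
public class NoveltyArchive {
    private final Deque<double[]> archive = new ArrayDeque<>();
    private final int capacity;
    private final double thresholdFloor;
    private double threshold;

    public NoveltyArchive(int capacity, double threshold, double thresholdFloor) {
        this.capacity = capacity;
        this.threshold = threshold;
        this.thresholdFloor = thresholdFloor;
    }

    // Add a behaviour if it is novel enough; evict the oldest entry when full.
    public boolean maybeAdd(double[] behaviour, double sparseness) {
        if (sparseness < threshold) return false;
        if (archive.size() == capacity) archive.removeFirst();   // FIFO eviction
        archive.addLast(behaviour);
        return true;
    }

    // Raise the threshold when too many genotypes are being archived; lower
    // it when none have been archived for a while, but never below the floor.
    public void raiseThreshold() { threshold *= 1.2; }
    public void lowerThreshold() { threshold = Math.max(thresholdFloor, threshold * 0.95); }

    public int size() { return archive.size(); }
    public double threshold() { return threshold; }
}
```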


Starting Positions   Distance Metric      Approach           Novelty Metric

Single               not applicable       Fitness Function   not applicable
Single               Vector Difference    Novelty Search     Pole angles
Single               Euclidean Distance   Novelty Search     Pole angles

41                   not applicable       Fitness Function   not applicable
41                   Euclidean Distance   Novelty Search     All states
41                   Euclidean Distance   Novelty Search     Cart pos. & angle between poles
41                   Euclidean Distance   Novelty Search     Pole angles
41                   Euclidean Distance   Novelty Search     Pole angles & velocities
41                   Euclidean Distance   Novelty Search     Pole velocities

Table 1: Summary of experiments. This is a list of all the experiments performed. Each experiment was run for the easy and difficult versions.

4 Experiments

4.1 Tools

Java was used as the main programming language for these experiments. The NEAT algorithm was chosen for this experiment, and Another NEAT Java Implementation (ANJI) [9] was used as the basis for our experiments, as it included a pole balancing simulator.

A novelty metric (which replaced the fitness function) was implemented to create a NEAT-based novelty search approach using ANJI. The included objective-based pole balancing simulator was used as a benchmark against which to compare our novelty search's performance.

Auxiliary tools such as SQLite and JFreeChart were used for the experiment's database and for graphing results.

4.2 Experiment Process

For all experiments, we chose the non-Markovian version of the pole balancing problem, that is, the version where the velocities are unknown to the ANNs. The non-Markovian version is more representative of a generalised problem solver, as it is more difficult than the Markovian version (input velocities known). The non-Markovian version is also more realistic, as input velocities are often unknown in real-world scenarios.

There are two categories of experiments: single starting positions (initial conditions) and multiple starting positions.

In both categories of experiments, we begin with easy variations of the two pole balancing system, where the ratio of the poles' lengths is large (for example, one pole being 1m and the other, 40cm). We then move on to a more difficult version where the poles' lengths are almost the same (for example, one pole being 1m and the other, 90cm). As the poles' lengths become closer to each other, the difficulty increases drastically (Stanley, 2014).

[9] ANJI (James, 2014). URL: http://anji.sourceforge.net/polebalance.htm


Figure 9: Multiple pole angles were chosen by using a fixed-interval range from the input starting position.

We started with a single starting position; thereafter, to find more generalised controllers, a simulation with 41 starting pole angles was tested.

For the single starting positions, the experiments were run 500 times for both the easy and difficult versions; for the multiple starting positions, the easy version was run 60 times and the difficult version, 35 times.

For all experiments, the standard objective-based approach (fitness function) was used as a benchmark against which to compare novelty search. Parameters of the simulation that were common to both the fitness and novelty search approaches were standardised. This was to ensure that one approach did not have an unfair advantage over the other in terms of optimised parameters. For the full list of parameters, please refer to appendix B.

Experiments for the objective-based approach, described in section 3.1, the novelty search metrics, described in section 3.3.1, and the novelty search distance metrics, described in section 3.4, were all performed. Table 1 shows a summary of all the experiments; each was run for both the easy version and the difficult version.

4.3 Single Starting Position Experiments

The single starting position experiments intend to show the overall performance of novelty search compared to the objective-based approach, and the distance metrics' performance, for non-generalised versions of the pole balancing experiment.

We used pole angles [10] for the novelty search metric in these experiments. This means the pole angles were recorded at each timestep and used to represent an ANN's behaviour.

4.4 41 Starting Position Experiments

These experiments intend to show the overall performance of novelty search compared with the objective-based approach for a more generalised version of the pole balancing experiment. Additionally, they also outline the performance of the different novelty metrics for novelty search.

[10] We chose pole angles as it was determined experimentally that this was the most effective approach.


The starting positions of the multiple position simulations were varied inside a fixed range around the input starting position (see figure 9). A 40% variation range, that is, 20% to the left and 20% to the right, was chosen for our experiments. For the 41 positions, the range was divided up into 41 equal parts (-20%, -19%, ... , 19%, 20%). The angles at the respective segments made up the variation starting positions. The number of positions, 41, was chosen for the multiple starting angles because it corresponds to 1 degree of granularity from -20% to 20% (including 0%) variation, and this was an appropriate level of granularity for representing a generalised problem.
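The construction of the 41 starting angles can be sketched as follows. Treating each offset as a percentage scaling of the input starting angle is an assumption about the setup described above, and the class and method names are hypothetical.

```java
// Sketch of generating the 41 starting pole angles: percentage offsets from
// -20% to +20% in 1% steps, applied to the input starting angle. Whether the
// offset scales the angle itself is an assumption about the setup.
public class StartingPositions {
    public static double[] generate(double baseAngle) {
        double[] angles = new double[41];
        for (int i = 0; i < 41; i++) {
            int percent = i - 20;                     // -20, -19, ..., 0, ..., 20
            angles[i] = baseAngle * (1.0 + percent / 100.0);
        }
        return angles;
    }
}
```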

In these experiments, we tested the different novelty metrics outlined earlier. Euclidean distance was used in favour of vector difference, as it was the more successful metric in the single starting position experiments.

5 Results

Results from all experiments, starting from easy variations with a single initial position through to very difficult versions with multiple initial positions, are outlined in this section.

The results are presented as follows:

1. Single starting position.

   • Easy version.
   • Difficult version.

2. 41 starting positions.

   • Easy version.
   • Difficult version.

In the following results subsections, we present a graph plotting the average maximum fitness (champion fitness of all runs) at each generation and a table outlining the success rate (how often, out of all runs, solutions were found) for each approach.


Figure 10: Single starting position (Easy Version) - average maximum fitness.

5.1 Single Starting Position

5.1.1 Easy Version

Figure 10 and table 2 outline the average fitness and success rate of each approach respectively. From these results, we can see that the objective-based (fitness function) approach outperforms both novelty search approaches by far. The novelty metric used for the novelty search approaches in this experiment is pole angles only. Euclidean distance also outperforms vector difference distance.

Approach                          % Success

Fitness Function                  74.3%
Novelty Search (Vector Diff.)     23.15%
Novelty Search (Euclidean Dist.)  30.13%

Table 2: Single starting position (Easy Version) - success rates.


Figure 11: Single starting position (Difficult Version) - average maximum fitness.

5.1.2 Difficult Version

Figure 11 and table 3 outline results for the difficult version of the pole balancing experiment with a single starting position. From these results, we can again see that the objective-based approach was the best-performing, followed by novelty search with Euclidean distance and finally novelty search with vector difference distance. The difference in performance is magnified in this experiment.

Approach                          % Success

Fitness Function                  1.00%
Novelty Search (Vector Diff.)     0.00%
Novelty Search (Euclidean Dist.)  0.60%

Table 3: Single starting position (Difficult Version) - success rates.


Comparison                         Z-Value    P-Value   p ≤ 0.05      p ≤ 0.01

Novelty (VD) & Novelty (Euclid)    -9.6869    0         significant   significant
Fitness & Novelty (Euclid)         -12.2627   0         significant   significant
Fitness & Novelty (VD)             -12.2627   0         significant   significant

Table 4: Single Starting Positions (Easy) - Wilcoxon Statistical Test

Comparison                         Z-Value    P-Value   p ≤ 0.05      p ≤ 0.01

Novelty (VD) & Novelty (Euclid)    -12.2321   0         significant   significant
Fitness & Novelty (Euclid)         -12.2321   0         significant   significant
Fitness & Novelty (VD)             -12.2321   0         significant   significant

Table 5: Single Starting Positions (Hard) - Wilcoxon Statistical Test

Tables 4 and 5 summarise the results from the Wilcoxon two-tailed signed-rank test. The values used were the average champion fitness per run, that is, the mean fitness of all generations for each run. As shown, in all tests there was statistical significance between the two distance metrics, and between each distance metric and the objective-based approach.


Figure 12: 41 starting positions (Easy Version) - average maximum fitness.

5.2 41 Starting Positions

5.2.1 Easy Version

Figure 12 and table 6 outline the average fitness and success rate of each approach respectively for the easy version. Figure 14 (attached on the following pages) shows a more detailed look at just the different novelty search approaches (with different metrics). From the results, we can see that the objective-based approach outperforms all novelty search approaches, and that novelty search with pole angles performed the best out of all the novelty search approaches.

Approach                                                % Success

Fitness Function                                        93.33%
Novelty Search (Pole angles only)                       31.67%
Novelty Search (Pole velocities only)                   0%
Novelty Search (Pole angles combined with velocities)   5%
Novelty Search (Cart position and angle between poles)  0%
Novelty Search (All state information)                  13.33%

Table 6: 41 starting positions (Easy Version) - success rates.


Figure 13: 41 starting positions (Difficult Version) - average maximum fitness.

5.2.2 Difficult Version

Figure 13 and table 7 outline the average fitness and success rate of each approach respectively for the difficult version. Figure 15 (attached on the following pages) shows a more detailed look at just the different novelty search approaches (with different metrics). From the results, we can again see that the objective-based approach outperforms all novelty search approaches in terms of average fitness. However, neither the objective-based nor the novelty search approaches yielded any successful controllers.

Approach                                                % Success

Fitness Function                                        0%
Novelty Search (Pole angles only)                       0%
Novelty Search (Pole velocities only)                   0%
Novelty Search (Pole angles combined with velocities)   0%
Novelty Search (Cart position and angle between poles)  0%
Novelty Search (All state information)                  0%

Table 7: 41 starting positions (Difficult Version) - success rates.


Comparison                        Z-Value   P-Value   p ≤ 0.05      p ≤ 0.01

Novelty (PA) & Novelty (PV)       -6.7359   0         significant   significant
Novelty (PA) & Novelty (PA&V)     -6.5518   0         significant   significant
Novelty (PA) & Novelty (CP&ABP)   -6.7285   0         significant   significant
Novelty (PA) & Novelty (ALL)      -6.677    0         significant   significant
Fitness & Novelty (PA)            -6.7359   0         significant   significant

Table 8: 41 Starting Positions (Easy) - Wilcoxon Statistical Test

Comparison                            Z-Value   P-Value   p ≤ 0.05      p ≤ 0.01

Novelty (CP&ABP) & Novelty (PV)       -5.1594   0         significant   significant
Novelty (CP&ABP) & Novelty (PA&V)     -4.6025   0         significant   significant
Novelty (CP&ABP) & Novelty (PA)       -5.1594   0         significant   significant
Novelty (CP&ABP) & Novelty (ALL)      -5.1267   0         significant   significant
Fitness & Novelty (CP&ABP)            -5.1594   0         significant   significant

Table 9: 41 Starting Positions (Hard) - Wilcoxon Statistical Test

Tables 8 and 9 summarise the results from the Wilcoxon two-tailed signed-rank test. The values used were the average champion fitness per run. We only tested the best-performing novelty search metric against the other metrics and against the objective-based approach.


Figure 14: 41 starting positions (Easy Version) - (novelty search approaches only).


Figure 15: 41 starting positions (Difficult Version) - (novelty search approaches only).


6 Discussion

The overall results show that the objective-based approach vastly outperforms novelty search in all experiments. Besides the success rates (tables 2, 3, 6 and 7), the average fitness charts show the difference in performance of each approach at every generation.

The statistical tests (in tables 4, 5, 8 and 9) further reinforce the significance of the performance differences.

Furthermore, the objective-based approach was relatively good at solving the easy version of the 41 starting position pole balancing problem. However, as with the single starting positions, the difficult versions could not be solved effectively by either the fitness or the novelty search approach.

Therefore, this contradicts our initial hypothesis that a non-objective search, like novelty search, is more effective than an objective-based approach on a larger behaviour space.

6.1 Distance Metrics

The aim of this experiment was to test whether using different distance metrics would affect novelty search's performance. Vector difference performed significantly worse than Euclidean distance (tables 2 and 3), especially when the difficulty was increased. Based on these results, we discarded vector difference from the multiple starting position experiments.

This could be due to the vector difference distance conflating the behaviours of the solutions; aggressive conflation may lead to poor performance. Conflation is discussed further in the following sections.
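The conflating effect can be illustrated with a small sketch. Note that the definition of vector difference used here (the magnitude of summed signed component-wise differences) and the behaviour traces are assumptions for illustration only:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two behaviour vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def vector_difference(a, b):
    # Assumed definition: magnitude of the summed signed component-wise
    # differences; opposite deviations cancel out.
    return abs(sum(x - y for x, y in zip(a, b)))

trace_a = [0.10, -0.10, 0.10, -0.10]  # oscillating pole-angle trace
trace_b = [-0.10, 0.10, -0.10, 0.10]  # the mirror-image trace

print(euclidean(trace_a, trace_b))          # non-zero: behaviours differ
print(vector_difference(trace_a, trace_b))  # zero: the traces are conflated
```

Under the signed-sum reading, two mirror-image trajectories collapse to the same point in behaviour space, which is exactly the kind of conflation that would blunt the novelty signal.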

6.2 Novelty Metrics

Different novelty metrics were tested to determine whether the choice of metric would affect novelty search's performance. For the easy experiment, the novelty metric which used the pole angles recorded at each timestep performed the best of all tested metrics. This could be due to the nature of the domain, in which the poles are the most significant in characterising a solution's behaviour. Unlike the pole angle metric, other metrics such as cart position and velocities may correspond to multiple pole angles (that is, velocity A might correspond to pole angle A, B or C). Combining pole angles with other metrics (see all states and pole angles and velocities) seems to decrease the performance of the search (see tables 6, 7, 8 and 9).

However, for the difficult experiment, the performance of the different metrics was vastly different. Cart position and angle between poles topped the rankings, while pole angles ranked only third out of the five novelty metrics. Combining pole angles with velocity actually improved performance (despite pole velocities performing worse than pole angles), which also contradicts the previous observation. This observation cannot, however, conclusively show that cart position and angle between poles actually performed any better, since all novelty search metrics had a 0% success rate in terms of finding controllers that could actually balance the poles.


6.2.1 Conflation

Conflation occurs when the precision of the novelty metric is reduced such that the behaviours of solutions are combined, or fused, if they end up with the same result regardless of the path taken to reach that result (Lehman and Stanley, 2011).

Initially, when constructing the novelty metrics, we started with the pole angles, but we only took the pole angles at the end of the simulation (final timestep). This means that positions at all other timesteps were ignored and solutions with the same final positions were conflated.

The performance of this metric was extremely poor, as the level of conflation might have been too high. Considering that each of our experiments ran for 350 timesteps, simplifying the novelty metric such that only the last timestep was recorded might have been too extreme. For further research, it would be interesting to explore the effects of conflation at each level.
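The two extremes can be sketched as follows; the pole-angle traces are invented for illustration:

```python
def full_trace(angles):
    """No conflation: characterise a solution by its whole trajectory."""
    return list(angles)

def final_state_only(angles):
    """Heavy conflation: characterise a solution by its endpoint alone."""
    return [angles[-1]]

run_a = [0.07, 0.05, 0.02, 0.0]   # settles smoothly towards upright
run_b = [0.07, -0.30, 0.40, 0.0]  # wild swings, but the same endpoint

print(final_state_only(run_a) == final_state_only(run_b))  # conflated
print(full_trace(run_a) == full_trace(run_b))              # distinguished
```

Intermediate levels (e.g. sampling every nth timestep) would sit between these two characterisations, which is the design space the paragraph above proposes exploring.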

Lehman and Stanley (2011) explored conflation in the maze navigating robot domain, where they conflated all controllers that ended up in the same location. We did not directly experiment with conflation; however, whilst implementing the novelty metrics, we noticed that performance was affected when there was conflation. In their results, conflating too much worsened the quality of the search, and this was also the case for our domain.

6.2.2 Deception in the Pole Balancing Domain

It is important to note that we could not identify any deceptions in the pole balancing domain, even after introducing a wide range of starting positions and despite making the problem much more difficult when the poles were closer in length.

6.3 Limitations

Due to the time constraints of this research, and for standardisation, we used the same NEAT parameters for fitness and novelty to test their relative performances. This may not be ideal: the parameters may have been optimised for fitness, so using the same parameters might have hindered novelty search and may not represent its true potential.

Additionally, since there are many other metrics and observable properties in the pole balancing domain (positions and velocities of various parts of the mechanics), it was impossible to test every combination of these for the novelty metric. For example, using the cart position and pole velocities was not tested.

The novelty search parameters were not experimentally tested to find optimum values for each novelty metric. Despite the normalisation (see table 16), these parameters could have been optimised for one metric whilst limiting the performance of the others.


7 Conclusion

The pole balancing problem is a benchmark problem used to assess controllers. Despite previous successful efforts at solving the pole balancing problem with fixed initial inputs, a more generalised version that could be effective in the real world had not been explored. Further, all previous neuro-evolution approaches used for solving the pole balancing problem were objective-based and used fitness functions.

In an effort to create a more generalised controller, we explored the use of a non objective-based approach for solving the generalised pole balancing problem. Our main objective was to test whether novelty search (a popular non objective-based approach by Lehman and Stanley (2011)) was suitable for this domain and, secondarily, to test whether a generalised pole balancing controller could be evolved using novelty search. We used the standard objective-based fitness approach as a control against which to compare novelty search's results.

Novelty search had not previously been used in this specific domain, so we had to find the most suitable distance and novelty metrics. We experimented with two different distance metrics used in the novelty search sparseness calculation and five different novelty metrics for representing ANN behaviours.
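For reference, the sparseness measure of Lehman and Stanley (2011) is the average distance from a behaviour to its k nearest neighbours among the current population and the archive. The sketch below assumes Euclidean distance, an illustrative value of k and invented two-dimensional behaviour vectors:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two behaviour vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def sparseness(behaviour, others, k=3):
    """Average distance to the k nearest neighbours (Lehman and Stanley, 2011)."""
    dists = sorted(euclidean(behaviour, o) for o in others)
    return sum(dists[:k]) / k

# Invented two-dimensional behaviour characterisations.
population = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0]]
archive = [[0.05, 0.05]]

# A behaviour far from the cluster scores high, i.e. it is novel.
s = sparseness([0.9, 0.9], population + archive, k=3)
print(round(s, 4))
```

A genotype whose behaviour lies in a sparse region of behaviour space receives a high score and becomes a candidate for the archive; swapping `euclidean` for another distance function is all that is needed to change the distance metric.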

To create a more generalised problem, we varied the initial starting angle when evolving and assessing the genotypes. We chose to vary the initial starting angles over a range of 40%. Both fitness and novelty search approaches were tested and assessed on this generalised version.

The results overall show that the objective-based approach is far more effective at finding solutions for both the non-generalised and generalised pole balancing problem. Specific to novelty search, Euclidean distance outperformed vector difference distance, and pole angles were the most appropriate novelty metric for most scenarios¹¹.

Novelty search has in the past been shown to be very effective at evolving controllers for deceptive domains (Lehman and Stanley, 2011). However, unlike other domains such as maze navigation or biped locomotion, we could not identify deception in the pole balancing domain, despite introducing more starting positions. This absence of deception could explain why novelty search performed so poorly compared to an objective-based approach. To our knowledge, this is the first time novelty search has been tested on a non-deceptive domain.

The objective-based approach showed promising results for the easy version of the multiple starting positions experiments. This means that a generalised controller might be possible using an objective-based approach. However, our research cannot conclusively rule out novelty search for pole balancing. Many parameters and metrics which could be optimised might improve the performance significantly. This is discussed further in the following section.

¹¹ See section 6.2.


8 Future Work

Various areas, due to the constraints of our research, have not been explored and could be potential avenues of future research. As discussed in the limitations section (6.3) of this thesis, our results might not be representative of the true potential of novelty search.

An obvious area which could improve novelty search's performance would be to optimise its parameters (such as novelty thresholds and archive size) and to use different novelty metrics. Since our novelty search approach used parameters that were included in the ANJI framework and were possibly optimised for the objective-based approach, there could be room for improvement for novelty search. This could potentially produce different results to the ones we have presented here.

Implementing a hybrid approach (a combination of novelty search and fitness) could be an interesting and effective use of novelty search. The novelty metric in the hybrid approach could act as a diversity-maintenance mechanism which ensures that the search is not too exploitative, especially for generalised versions of the problem. This, along with optimised parameters, could potentially match or outperform the objective-based approach.

In the interest of creating generalised controllers for the pole balancing problem, we varied the poles from left to right uniformly together. This means that when pole 1 was varied to position -20%, so was pole 2. Varying the pole angles independently (for the multiple starting positions) could be more representative of the generalised problem. This would potentially be more difficult than having the poles vary together and would better represent noisy real-world environments.
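A sketch of how such uniformly-varied starting positions could be generated follows. The interpretation of the percentages as fractions of the 0.07 radian base angle from table 15 is an assumption for illustration:

```python
BASE_ANGLE = 0.07  # radians, the base start angle from table 15

def starting_positions(steps=41, span=0.40):
    """Offsets spread evenly over `span` (e.g. -20% to +20% in 1% steps)."""
    step = span / (steps - 1)
    offsets = [-span / 2 + i * step for i in range(steps)]
    # Both poles share the same offset, i.e. they are varied together.
    return [(BASE_ANGLE * (1 + off), BASE_ANGLE * (1 + off)) for off in offsets]

positions = starting_positions()
print(len(positions))      # 41 starting positions
print(positions[0])        # both poles offset by -20%
print(positions[20])       # the unmodified base angle
```

Varying the poles independently, as proposed above, would instead draw a separate offset for each pole, growing the set of start states quadratically.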

Finally, objective-based approaches should be investigated more thoroughly for the generalised pole balancing problem, as they showed promising results in our research.


Approach        Evaluations   CPU Time

AHC             189,500       95
PGRL            28,779        1,163
Q-MLP           2,056         53
SARSA-CMAC      540           487
SARSA-CABA      965           1,713
RPG             (863)         -
CMA-ES*         283           -
CNE*            352           5
SANE*           302           5
NEAT*           743           7
ESP*            289           4
RWG             199           2
CoSyNE*         98            1

Table 10: One pole with complete state information provided to the controller.

Approach        Evaluations   CPU Time

VAPS            (500,000)     (5 days)
SARSA-CABA      15,617        6,754
SARSA-CMAC      13,562        2,034
Q-MLP           11,331        340
RWG             8,557         3
RPG             (1,893)       -
NEAT*           1,523         15
SANE*           1,212         6
CNE*            724           15
ESP*            589           11
CoSyNE*         127           2

Table 11: One pole with incomplete state information provided to the controller.

Appendix A. Results from Gomez et al. (2008)

Tables 10, 11, 12 and 13, retrieved from Gomez et al. (2008), show the number of evaluations and the CPU time it took for each approach to find suitable solutions for the different versions of the pole balancing problem. Results on each table are averages of 50 runs for all approaches. Neuro-evolution approaches are denoted with an asterisk (*) next to their name.

It is evident from the results that neuro-evolution outperformed the ontogenetic approaches. As the difficulty increased, some of the ontogenetic approaches were not able to solve the problem at all and were omitted from the table.


Approach        Evaluations   CPU Time

RWG             474,329       70
EP*             307,200       -
CNE*            22,100        73
SANE*           12,600        37
Q-MLP           10,582        153
RPG             (4,981)       -
NEAT*           3,600         31
ESP*            3,800         22
CoSyNE*         954           4
CMA-ES*         895           -

Table 12: Two poles with complete state information provided to the controller.

                      Evaluations
Approach        Standard Fitness   Damping Fitness

RWG             415,209            1,232,296
CE*             -                  (840,000)
SANE*           262,700            451,612
CNE*            76,906             87,623
ESP*            7,374              26,342
NEAT*           -                  6,929
RPG             (5,649)            -
CMA-ES*         3,521              6,061
CoSyNE*         1,249              3,416

Table 13: Two poles with incomplete state information provided to the controller.


Parameter                       Value

Add Conn. Mutation Rate         0.07
Remove Conn. Mutation Rate      0.06
Remove Conn. Max Weight         200
Add Neuron Mutation Rate        0.04
Prune Mutation Rate             1
Weight Mutation Rate            0.85
Weight Standard Deviation       1.5
Weight Max                      500
Weight Min                      500
Survival Rate                   0.2
Elitism                         True
Elitism Minimum Species Size    1

Table 14: NEAT Parameters

Parameter                 Single (E)     Single (D)     Multiple (E)   Multiple (D)

Population Size           35             35             35             35
Generations               500            500            2000           2000
Timesteps                 350            350            350            350
Pole 1 start angle        0.07 radians   0.07 radians   0.07 radians   0.07 radians
Pole 2 start angle        0.07 radians   0.07 radians   0.07 radians   0.07 radians
Pole 1 length             1.0            1.0            1.0            1.0
Pole 2 length             0.4            0.9            0.4            0.9
Runs for Each Approach    500            500            60             35

Table 15: Pole balancing specific parameters

Appendix B. Pole Balancing Experiment Parameters

NEAT Parameters

The NEAT parameters were not altered between experiments. Using the same parameters for the different approaches ensured that the experiment process was standardised and that one approach was not more optimised than the other.

Table 14 lists all the NEAT parameters that were kept consistent between the different experiments.

Pole Balancing Parameters

Table 15 outlines all the parameters specific to pole balancing. Each approach is shown in this table. The Single columns give the parameters for the single starting position experiments and the Multiple columns those for the multiple (41) starting position experiments. E denotes the easy version (large pole length ratios) and D denotes the difficult version (small pole length ratios).


Novelty Metric                      PA         PV         PA&V       CP&ABP     ALL

Archive Threshold                   0.005      0.001      0.001      0.003      0.0035
Archive Threshold Floor             0.000025   0.000005   0.000005   0.000015   0.0000175
Novel Archive Size                  105        105        105        105        105
Novelty Neighbours                  100        100        100        100        100
Threshold Increase (proportion)     1/4        1/4        1/4        1/4        1/4
Threshold Decrease (generations)    10         10         10         10         10

Table 16: Novelty Search Parameters

Table 16 outlines the parameters for the novelty search approach. PA, PV, PA&V, CP&ABP and ALL stand for pole angles, pole velocities, pole angles & velocities, cart position & angle between poles and all states respectively. These thresholds were normalised across the different metrics.

The threshold increase and decrease parameters refer to the dynamics of the threshold. The threshold is increased if more than the given proportion of the population exceeds the current threshold. It is decreased after no genotypes are added for the given number of generations.
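This update rule can be sketched as follows. The adjustment factor of 5% is an assumption for illustration; the actual constants used by the ANJI-based implementation may differ:

```python
class DynamicThreshold:
    """Archive threshold that adapts to how often genotypes are archived."""

    def __init__(self, threshold, floor, increase_proportion=0.25,
                 decrease_generations=10, factor=0.05):
        self.threshold = threshold
        self.floor = floor
        self.increase_proportion = increase_proportion
        self.decrease_generations = decrease_generations
        self.factor = factor  # assumed +/-5% adjustment per trigger
        self.stagnant = 0     # generations without archive additions

    def update(self, added, population_size):
        """Call once per generation with the number of genotypes archived."""
        if added > self.increase_proportion * population_size:
            # Too many additions: raise the bar for novelty.
            self.threshold *= 1 + self.factor
            self.stagnant = 0
        elif added == 0:
            self.stagnant += 1
            if self.stagnant >= self.decrease_generations:
                # Prolonged stagnation: lower the bar, but not below the floor.
                self.threshold = max(self.floor, self.threshold * (1 - self.factor))
                self.stagnant = 0
        else:
            self.stagnant = 0

t = DynamicThreshold(threshold=0.005, floor=0.000025)  # PA values from table 16
t.update(added=10, population_size=35)  # 10/35 exceeds 1/4, so the threshold rises
print(t.threshold)
```

The floor parameter corresponds to the Archive Threshold Floor row in table 16 and prevents the threshold from collapsing to zero during long stagnant periods.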


References

Banzhaf, W., Nordin, P., Keller, R. E. and Francone, F. D.: 1998, Genetic programming: an introduction, Vol. 1, Morgan Kaufmann, San Francisco.

Beyer, H.-G. and Schwefel, H.-P.: 2002, Evolution strategies – a comprehensive introduction, Natural Computing 1(1), 3–52.

Darwin, C.: 1969, On the Origin of Species by Means of Natural Selection, Or the Preservation of Favoured Races in the Struggle for Life, Culture et Civilisation.

Davis, L. et al.: 1991, Handbook of genetic algorithms, Vol. 115, Van Nostrand Reinhold, New York.

Eiben, A. E. and Smith, J. E.: 2003, Introduction to evolutionary computing, Springer.

Floreano, D., Durr, P. and Mattiussi, C.: 2008, Neuroevolution: from architectures to learning, Evolutionary Intelligence 1(1), 47–62.

Gomez, F. J. and Miikkulainen, R.: 2003, Active guidance for a finless rocket using neuroevolution, Genetic and Evolutionary Computation – GECCO 2003, Springer, pp. 2084–2095.

Gomez, F. and Miikkulainen, R.: 1997, Incremental evolution of complex general behavior, Adaptive Behavior 5(3-4), 317–342.

Gomez, F., Schmidhuber, J. and Miikkulainen, R.: 2006, Efficient non-linear control through neuroevolution, Machine Learning: ECML 2006, Springer, pp. 654–662.

Gomez, F., Schmidhuber, J. and Miikkulainen, R.: 2008, Accelerated neural evolution through cooperatively coevolved synapses, The Journal of Machine Learning Research 9, 937–965.

Gould, S. J.: 1996, Full house: the spread of excellence from Plato to Darwin. See also excerpts of Stephen Jay Gould on Stanford Presidential Lectures in the Humanities and Arts, p. 197.

Gower, J. C.: 1982, Euclidean distance geometry, Mathematical Scientist 7(1), 1–14.

Gruau, F., Whitley, D. and Pyeatt, L.: 1996, A comparison between cellular encoding and direct encoding for genetic neural networks, Proceedings of the First Annual Conference on Genetic Programming, MIT Press, pp. 81–89.

Hansen, N. and Ostermeier, A.: 2001, Completely derandomized self-adaptation in evolution strategies, Evolutionary Computation 9(2), 159–195.

James: 2014, Pole balancing tutorial. URL: http://anji.sourceforge.net/polebalance.htm

Lehman, J. and Stanley, K. O.: 2011, Abandoning objectives: Evolution through the search for novelty alone, Evolutionary Computation 19(2), 189–223.

Lehman, Joel: 2014, Personal communication [email].

Miconi, T.: 2008, Evolution and complexity: The double-edged sword, Artificial Life 14(3), 325–344.


Moriarty, D. E.: 1997, Symbiotic evolution of neural networks in sequential decision tasks, PhD thesis, Citeseer.

Moriarty, D. E. and Miikkulainen, R.: 1996, Efficient reinforcement learning through symbiotic evolution, Machine Learning 22(1-3), 11–32.

Mouret, J.-B.: 2011, Novelty-based multiobjectivization, New Horizons in Evolutionary Robotics, Springer, pp. 139–154.

Saravanan, N. and Fogel, D. B.: 1995, Evolving neural control systems, IEEE Intelligent Systems 10(3), 23–27.

Sigmund, K.: 1995, Games of life: explorations in ecology, evolution and behavior.

Stanley: 2014, Personal communication [email].

Stanley, K. O. and Miikkulainen, R.: 2002, Evolving neural networks through augmenting topologies, Evolutionary Computation 10(2), 99–127.

Sutton, R. S. and Barto, A. G.: 1998, Introduction to reinforcement learning, MIT Press.

Thierens, D. and Goldberg, D.: 1994, Elitist recombination: An integrated selection recombination GA, Proceedings of the First IEEE Conference on Evolutionary Computation, IEEE World Congress on Computational Intelligence, IEEE, pp. 508–512.

Werbos, P. J.: 1990, Backpropagation through time: what it does and how to do it, Proceedings of the IEEE 78(10), 1550–1560.

Wieland, A. P.: 1991, Evolving neural network controllers for unstable systems, IJCNN-91-Seattle International Joint Conference on Neural Networks, Vol. 2, IEEE, pp. 667–673.

Woolley, B. G. and Stanley, K. O.: 2011, On the deleterious effects of a priori objectives on evolution and representation, Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation, ACM, pp. 957–964.

Zitzler, E., Deb, K. and Thiele, L.: 2000, Comparison of multiobjective evolutionary algorithms: Empirical results, Evolutionary Computation 8(2), 173–195.
