CONTROL SYSTEM FOR A ROBOTIC ARM

Christos Skopelitis


ABSTRACT

1 INTRODUCTION
1.1 BACKGROUND
1.2 AIM
1.3 PAPER LAYOUT

2 SIMULATION
2.1 ARM DESCRIPTION
2.2 ARM KINEMATICS

3 TIME-DELAY NEURAL NETWORK
3.1 A TDNN FOR ROBOTIC ARM CONTROL

4 LEARNING ALGORITHMS
4.1 BPTT
4.2 GA

5 EXPERIMENTAL SET-UP
5.1 RESULTS
5.1.1 Experiment 1: Different Sensory Input
5.1.2 Experiment 2: Different Tasks

6 DISCUSSION
6.1 GA/BP HYBRID
6.2 FURTHER WORK
6.3 CONCLUDING REMARKS

ACKNOWLEDGMENTS

BIBLIOGRAPHY

APPENDIX: C CODE


Abstract

This dissertation will present a control system for a robotic arm. A Time Delay Neural Network (TDNN), a sub-category of dynamic neural networks, will control the arm. The aim of the project is to define a configuration for the neural network that will produce the best possible behaviour on the simple, but not trivial, task of target tracking. In order to achieve this, different ways of posing the problem will be evaluated and compared, i.e. different sensor types and network configurations. The neural network will be trained on tasks of varying difficulty using two of the most widely used algorithms in Artificial Intelligence (AI), Genetic Algorithms and Backpropagation. The reader will find in this paper a brief outline of these algorithms, as well as the details of the implementation. In addition, the project includes a comparison of the two algorithms, through the results obtained from training the neural network controlling the arm, which will demonstrate the strengths and weaknesses of each for the problem at hand. Finally, the performance of these two algorithms will be compared with that of a relatively new approach, the Genetic Algorithm-Backpropagation hybrid, which has recently attracted researchers' interest and has been shown to perform better than either of the two algorithms alone [1].


1 Introduction

1.1 Background

There has been a lot of work in recent years on robotic arm control using neural networks. The aim of these projects is to develop arms that are more flexible, less expensive to set up and more efficient in their performance. One problem is that the robotic arms used in industry, for example, are very task-specific. They need to be configured manually and very carefully, a process that is very expensive. In addition, they are capable of performing only a very limited variety of tasks, since they have no adaptive or evolutionary mechanisms. The aim of these projects is therefore to produce robotic arms with adaptive behaviour, which are thus more robust to errors and input perturbations and easier to train.

One of the inspirations for this project is the work of Moriarty and Miikkulainen [14] in this area. The aim of their work was to evolve neuro-control in an OSCAR-6 robot arm, controlled by a neural network, using a Genetic Algorithm. The arm was trained to reach random target locations while avoiding obstacles. The network used input from a visual camera situated at the end of the manipulator to evolve the obstacle avoidance behaviour.

A more specialised field of science, which has also invested in arm control using neural networks, shares the same goals. In recent years a lot of research has also been conducted in the field of cortical prosthetic arm control. More specifically, the aim of these projects is to create prosthetic arms controlled by a neural network receiving input directly from the motor cortex.

Fig. 1: The work of Wassberg J. et al [18]

Recent work in this field [4][18] also forms the basis of this project. Experiments were performed in primates, where the arm of a monkey was restrained, allowing it to move its wrist in an angle range [-π/2, π/2]. The level of activation of the monkey's motor cortex was recorded for every target and normalised into a training set of activations and target wrist positions. These patterns were used to train a Time Delay Neural Network (TDNN) with Backpropagation with momentum.


1.2 Aim

The aim of this project is to configure a neuro-control system using sensory information received by a robot arm. Effectively, the goal is to configure a Time Delay Neural Network to develop a target tracking behaviour. In practice, target-tracking behaviour means following a target in space for a preset amount of time. Target tracking is similar to the task set to the robotic arm in [4], but with one major difference: the TDNN will have to learn how to coordinate and move two joints instead of one in order to perform the required task. The performance of the arm will be evaluated on different tasks generated by varying the original task parameters. The parameters that will be varied during testing are the distance between two consecutive points on the trajectory and the smoothness of the trajectory. All the legal trajectories are elliptical, since elliptical trajectories are very smooth; thus all the points of each trajectory should satisfy the general equation for an ellipsoid (1.1).

x^2/a^2 + y^2/b^2 + z^2/c^2 = 1   (1.1)

However, since it is very difficult to achieve target-tracking behaviour for all possible trajectories, a few restrictions apply to the trajectories that the robot arm will be required to predict. Firstly, the arm will be initialised within a restricted area of the test space. In addition, the arm should be able to reach one point of the trajectory at each time step. In practice this means that the arm is required to make at most two rotations, one for each joint, to reach the next target in the trajectory from its current position. The magnitude of the rotation will be adjusted according to the task's level of difficulty: the further apart the points of the trajectory are, the harder the task. The length of the trajectory, i.e. the number of points that make up each trajectory, will also be varied to test its effect on the arm's behaviour. In order to achieve this task, two widely used learning algorithms will be employed to train the TDNN. The performance of Backpropagation (BP) and a Genetic Algorithm (GA) will be compared and contrasted based on the quality of the results they produce. They will also be tested on tasks of varying difficulty.

1.3 Paper Layout

This paper consists of six chapters. Chapter 1 will provide the reader with the necessary background information on the project, including its motivation. In Chapter 2 the simulation used will be discussed, and the basic geometry required will be outlined. In Chapter 3 the basics of a neural network will be outlined; a short comparison between static and dynamic neural networks will illustrate the advantages of using a Time Delay Neural Network (TDNN) for the task, and the basics of a TDNN will also be outlined. Chapter 4 will provide a short background on the Backpropagation and Genetic learning algorithms, together with the results of the tests performed to obtain the best possible results from each algorithm. In Chapter 5 the experimental set-up will be presented, along with the specifics of the tests performed and their results. Chapter 6 is the final chapter, in which the results obtained from the tests will be discussed and the two learning algorithms compared and contrasted. In addition, the connectionist/Genetic Algorithm hybrid will be compared with Backpropagation and the Genetic Algorithm. Further work that can be done on this project will also be discussed.


2 Simulation

This is the most important part of the whole project. The simulation is a separate module that is totally independent of the other parameters of the problem, such as the type of neural network or the learning algorithms used. Configuring these parameters through a series of trial and error tests requires a constant point of reference that remains unchanged throughout the testing and is independent of the problem-specific requirements. The simulation module provides this capability, thus allowing a great deal of flexibility in the tests used to configure these parameters.

The simulation module takes as input the angle change for each joint, which is the output of the neural network. The new coordinates of the arm joints are then calculated and returned to the control structure of the arm. The purpose of the simulation is to provide this crucial information about the problem at each time step. Since a physical arm could not be built for obvious reasons, this module simulates all the inputs of the network that would be collected from the arm's sensors and applies the effect of the network output to the position of the arm in space.

Before deciding what kind of simulation was needed for this project, a few aspects had to be considered. The first was the nature of the problem at hand. As was seen in the previous section, the task assigned to the robotic arm is target tracking. This involves following a target, with movement subject to restrictions, in the arena. As defined by the task, the target is not a physical object; it is merely a legal point in the arena. That means it has no mass, and is therefore not subject to physical forces like gravity and inertia. Furthermore, the arm is weightless and thus also not subject to physical forces. This assumption means that it moves at constant speed at all times and that the acceleration time from speed U_0 = 0 to U_1 = U_max is negligible.
Simulations can be divided into two categories. A physical simulation is a representation of the physical world that takes into account all the physical properties of objects. All the forces that could be applied to an object, like gravitational acceleration, friction and inertia, are simulated as accurately as possible using formulae from physics, like Newton's three laws of motion. For example, a robot arm in a physical simulation has weight. The joint motors are required to produce an output above a certain threshold, which depends on the physical properties of the arm and its position at any given time, in order for the arm to start accelerating towards the required speed. Furthermore, when reaching for a target the arm requires a certain distance to come to a stop, due to inertia. It is apparent that creating a physical simulation from scratch would be very time-expensive and unnecessary, since the task does not require taking into account any of the physical properties of the real world.

A geometric simulation, on the other hand, is far simpler. In this kind of simulation all the physical properties of the objects are ignored, apart from their physical size. All the objects are weightless and no forces are applied to them. The only aspect of the real world that is simulated is the geometrical properties of the objects. For example, a robotic arm in a geometric simulation has length, width and height (if the simulation is in three dimensions). It can move at constant speed, meaning that the arm can reach maximum speed instantly (Δt ≈ 0). A geometric simulation therefore fulfils the requirements of this project, and one was used since it is simple enough to allow the completion of the tasks with the lowest time and space complexity.


2.1 Arm Description

The arm has three degrees of freedom and can move in 3D space. It consists of two joints, both equipped with motors that allow the joint to rotate according to a set of restrictions. The "shoulder" joint has two motors, allowing the arm to rotate in the range (0, π) in the xy plane and in the range (0, π/2) in the yz plane. The "elbow" joint has one motor, allowing the arm to rotate in the range (−π, π) in the yz plane. As can be seen from these rotation restrictions, the arm can reach all points within a sphere of radius r = L1 + L2, where L1 and L2 are the lengths of the two parts of the arm. The centre of this sphere is located at the "shoulder" coordinates O(0, 0). The legal coordinates of each point P(x, y, z) in the arena have to satisfy four equations:

x = ρ sinφ cosθ
y = ρ sinφ sinθ
z = ρ cosφ,  ∀ρ ∈ (0, r)   (2.1)

x^2 + y^2 + z^2 = ρ^2,  ∀ρ ∈ (0, r)   (2.2)

where θ is the longitude and φ is the colatitude (fig. 2). All four of the above equations are directly derived from the properties of a geometrical sphere. It should be noted that (2.2) is a special case of the ellipsoid (1.1).

2.2 Arm Kinematics

During the movement of the arm only the geometrical characteristics are of interest, so the position of the arm is simply calculated using the equations from the previous section. The coordinates of the manipulator can be calculated using:

( )[ ]( )[ ]( )2,1211

2,1211

2,1211

coscos

sinsinsin

cossinsin

θθθφθθθφθθθ

LLz

LLy

LLx

m

m

m

+=+=+=

(2.3)

Fig. 2: The Simulated robot arm


where θ1 and θ2 are the rotations of the "shoulder" and the "elbow" respectively in the yz plane, whereas φ is the rotation of the arm in the xy plane (fig. 2). These formulae provide the position of the arm when the three angles θ1, θ2, φ are known. However, during training the magnitudes of the three angles have to be found when only the position of the manipulator is known. In that case, the only information available, apart from the desired manipulator coordinates M(x_m, y_m, z_m), is the lengths of the first and second links, L1 and L2 respectively. The angles can be calculated geometrically using the equations shown below (2.4).

φ = atan(y_m / x_m)
r^2 = x_m^2 + y_m^2 + z_m^2
λ = asin(r / 2L)
θ2 = π − 2λ  ⇔  θ2 + 2λ = π
ω = π/2 − λ
cosκ = z_m / r  ⇒  κ = acos(z_m / r)
θ1 = κ − ω   (2.4)

where L = L1 = L2 (the two links are of equal length), λ is half the angle between the links, ω is the base angle of the isosceles triangle formed by the two links and the line from the "shoulder" to the target, and κ is the colatitude of the target.

It should be noted that these equations could only be used in the restricted test space that will be defined in Chapter 5.


3 Time-Delay Neural Network

The choice of network to control the robotic arm was obvious: a type of dynamic (feedback) neural network would be the best choice. The requirements the chosen network should fulfil are clear from the definition of the problem. Firstly, the network must be able to process time- as well as space-dependent input data. More importantly, it should be able to represent relationships between events in time. Finally, the features learned by the network should be invariant in time. The best choice, according to previous experiments performed on similar tasks [4], is a time-delay neural network. A TDNN is effectively a feedforward network with dynamic (time-delay) elements that give it the ability to process time series data. This is achieved by unfolding the input data sequence over time, thus converting the time-dependent input into a static pattern. This means that time becomes another dimension of the problem. Practically, however, this process is performed over a finite period of time. Because of the delay operations, the TDNN is categorised under dynamic neural networks [10]. In order to give the reader a better understanding of the TDNN, its description is given in relation to the feedforward neural network (FFNN) described next. The basic neural unit's input in a FFNN is the sum of the weighted inputs received from the previous layer, and the output is calculated by applying some kind of activation function f, usually a linear or squashing function:

a_j = Σ_i w_ij y_i
y_j = f(a_j),  1 ≤ i ≤ N_i,  1 ≤ j ≤ N_j   (3.1)

where N_i and N_j represent the number of neurons in the previous layer and the current layer respectively (fig. 3).

The same principles apply to the basic dynamic unit of a TDNN, with the difference that in this case all the values are time-dependent (3.2).

Fig. 3: The basic unit of a FFNN with two inputs


a_jt = Σ_t Σ_i w_ijt y_it
y_jt = f(a_jt),  1 ≤ i ≤ N_i,  1 ≤ j ≤ N_j,  1 ≤ t ≤ D_i   (3.2)

where D_i is the number of delay taps per physical node, also called the delay length. The term node in the context of temporal networks is defined as a finite impulse response (FIR) filter. In fig. 4 the temporal version of the feedforward network of fig. 3, with two delay taps, can be seen. The delay taps give the TDNN the ability to relate and compare current and past input; in other words, they work as a kind of short-term memory. A node with no delays is said to have one tap (fig. 3). The output nodes don't have delay taps, because applying the learning rule to a delayed output tap is equivalent to applying the learning rule to the output of the last training step and is consequently redundant. However, repeatedly training on previous output patterns might lead to different behaviour and may be worth investigating [6].

3.1 A TDNN for Robotic Arm Control

The TDNN consists of three layers: the input layer, one hidden layer and the output layer. The input layer receives the input from the two visual sensors situated in the test area. The network has ten inputs¹. The information used as input is the coordinates of the target (x, y and z), the coordinates of the manipulator, the coordinates of the "elbow" joint, and finally the Euclidean distance between the manipulator and the target. The x and z coordinates are normalised in the range [−1, 1], whereas the

¹ For a more detailed description of the visual sensor, refer to Chapter 5. The sensor type will be varied in the tests shown in Chapter 5.

Fig. 4: A TDNN consisting of two finite impulse response (FIR) filters


y coordinate and the distance are normalised in the range [0, 1]. This normalisation makes the inputs more uniform, which helps the learning algorithms perform better. Each of the input nodes has D_i = 4 delay taps. All inputs, current and past, are fully interconnected with the physical nodes of the hidden layer, i.e. the first tap of each FIR filter. The hidden layer consists of four hidden neurons. Each neuron receives the weighted sum of the inputs from the previous layer. Every FIR filter has D_j = 4 delay taps that are fully interconnected with the output layer. The weighted sum for each hidden neuron is passed to the bipolar sigmoid function (3.3), whose output is in the range [−1, 1].

b(x) = 2 / (1 + e^(−x)) − 1   (3.3)

The output layer has three neurons. Each neuron receives the weighted sum of the activations of the previous (hidden) layer, which is passed through the bipolar sigmoid function (3.3). The output of this function, as mentioned above, is in the range [−1, 1], which is why this particular activation function was chosen. The output values can be scaled to the desired range, representing the change in each joint's angle, which in turn can be translated to motor activation values. The weights of the TDNN will be configured using the learning algorithms presented in the following section.


4 Learning Algorithms

As in every problem that involves neural networks, the modifiable parameters of the network chosen for this problem have to be properly configured in order to achieve the required target tracking behaviour. The weights of the TDNN are not, however, the only parameters that need to be configured: the number of hidden layers, the number of hidden neurons per layer, the number of inputs, and the number of delay taps per node per layer also have to be set. These parameters will be configured by trial and error during the tests performed in Chapter 5. The aim of these tests is to determine the configuration of the TDNN and the learning parameters that produces the best possible behaviour. Two very popular training algorithms will be compared in order to determine which produces the best solution for the problem.

The first training algorithm that will be used is Backpropagation (BP) [16]. BP was introduced in 1986 and has been widely used ever since, as it is a very efficient learning algorithm for neural networks. Mathematically, BP is a gradient descent of the mean-squared error as a function of the weights. In other words, the algorithm aims to minimise the error between the actual response of the network and the target response. Backpropagation is one of the most famous supervised learning algorithms. When these kinds of algorithms are applied, the network is trained using a set of training patterns, each consisting of an input pattern and the target (desired) response of the network. The training patterns are presented to the network in random order every training cycle.
The algorithm then modifies the weights of the network according to the error of the output, which is calculated by the "teacher" algorithm as the difference between the actual response and the target response. The algorithm works in two phases. In the first phase, called the forward pass, the input training pattern is applied to the network and the network's response o_k, 1 ≤ k ≤ N_k, is calculated. In the second phase, called the backward pass, the error ε of the response of the output neurons is calculated with respect to the target values t_k (4.1).

Fig 5: (a) General learning scheme for an error-based learning algorithm (like BP), (b) runtime scheme


E = (1/2) Σ_k ε_k^2,  ε_k = t_k − o_k,  1 ≤ k ≤ N_k   (4.1)

All the weights of the network are then modified by the weight update rule (4.2), each in proportion to its share of the "fault" for the error in the network response. This algorithm attempts to solve the credit assignment problem: the problem of assigning "credit" or "blame" to the individual elements (hidden units) involved in forming the overall response of a learning system.

Δw_ji = −η ∂E/∂w_ji   (4.2)

The coefficient η is called the learning rate. This algorithm in this form, however, is not very useful for training temporal neural nets like a TDNN. The problem is obvious: in a temporal network, inputs are presented through time, so the patterns applied to the network change all the time. Furthermore, the amount of input to the network may vary through time. For example, at time t1 the network has ten inputs fully interconnected with the layer above, and at time t2 it has twenty inputs fully interconnected with the layer above: ten current inputs and ten past inputs from t1. To solve this problem, a new version of the training algorithm was presented [6][17], called Temporal Backpropagation or Backpropagation Through Time (BPTT). This version of the algorithm basically unfolds the TDNN into its static equivalent and then applies normal BP to the static network produced. Unfolding the temporal network into a static one is very tricky, since there is a major pitfall in the process: the input is not always fully interconnected with the hidden layer above. That is because, for example, an input I1 at time t1 cannot connect to the second delay tap (at t2), since when I1 was applied to the network that delay tap simply didn't exist. Furthermore, some weights are constrained to be the same, since they are effectively weights of the same input to a node at a different time. The weights are still updated in the same manner, taking into account the fact that the static network is not fully interconnected:

Δw_ijt = −η Σ_{n=1..D_j} δ_jn y_it,  ∀i, j ∈ [1, N_i] ∪ [1, N_j] ∪ [1, N_k]   (4.3)

δ_jn = (t_j − o_j) f'(a_jn),  1 ≤ j ≤ N_k  (output layer)

δ_jn = f'(a_jn) Σ_{k=1..N_k} Σ_{m=max(1, n−D_j)..min(n, D_j)} w_kjm δ_km,  1 ≤ j ≤ N_j  (hidden layer)   (4.4)

where N_i, N_j, N_k are the numbers of physical nodes in the input, hidden and output layers respectively, and D_x is the number of delay taps per node in layer x. In order to avoid the problem of the hidden layer being only partly interconnected with the input layer during the backward pass, the number of delay taps per node for the input and hidden layers has been assigned the same value (D_i = D_j) for all the experiments performed in the following chapters.


The second learning algorithm that will be used is a Genetic Algorithm (GA) [7]. John Holland and his students first introduced the GA in 1975. The algorithm is inspired by the Darwinian theory of evolution. A GA is implemented by encoding a certain predefined number of solutions to a specific problem in a chromosome-like data structure called the genome. A very important design issue has to be addressed when facing the problem of encoding a neural network architecture on a GA genome: how to encode the weights of the network. This has been a subject of discussion in the GA community, and the two most widely used schemes are the binary and the real-number scheme. One of the great advantages of the binary scheme is that the solutions are encoded on the chromosomes as 1s and 0s, ensuring that the results obtained are not tied to the semantics of the problem. Alternatively, the network's weights can be encoded on the chromosomes as real numbers: each weight is a discrete element, and each real number of the chromosome represents one weight. The second approach was chosen for three reasons. Firstly, it would be very computationally expensive to convert all the weights of the network from binary to real numbers for every evaluation of the chromosomes. Secondly, the binary encoding scheme would produce very long chromosomes, since each weight would be represented by a series of sub-strings. This would be impractical, since most programming languages don't allow very big structures (normally not over 64K).
Finally, the most serious problem of the binary encoding scheme is that if too few bits are used to represent each connection weight, the training time and the quality of the results (if any are produced) will be unacceptable, because some combinations of real-number weights cannot be approximated by discrete values. It has been shown [20] that the real-number encoding scheme can produce results matching the binary encoding scheme. The collection of all the chromosomes, or individuals, in a generation is called a population. All the individuals in the current population are evaluated, and a reproductive opportunity is assigned to each individual according to the quality of the solution it represents. This reproductive opportunity, also called fitness, is proportional to the quality of the solution: the better the solution, the higher the probability that the individual will "reproduce". This gives the best solutions a better chance to "reproduce", since the chance that they will produce an even better solution is higher than for chromosomes representing a poor solution. After fitness has been assigned to every individual, a certain number of individuals is chosen from the current population to create the mating population. To do this, a parent selection algorithm is applied to the current population. The purpose of these algorithms is to exploit the fact that individuals, or parents, with a large fitness represent better solutions, and thus should have a greater probability of being chosen to "reproduce" than individuals with a low fitness. Following this, the mating population is recombined using various genetic operators. During recombination, sub-strings of the best chromosomes, or genes, are combined to form new solutions. The most widely used genetic operators are Crossover, Mutation and Reproduction. Each of these operators has a predefined probability of being applied.
The probability of mutation is usually kept very small, typically around 0.05. The genetic operation is an iterative process, which stops when the new population (the next generation), consisting of the recombined individuals of the mating population, reaches the required size. During the genetic operations two individuals are chosen from the mating population, and the operators' probabilities decide which genetic operation is performed. There are different kinds of Crossover; in principle, this operator exchanges genetic material between the two chosen individuals, the parents, based on a common crossover point that indicates which parts of the genome are exchanged (fig. 6).

Fig. 6: The Crossover Operator
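The crossover operation of fig. 6 can be sketched as follows. The dissertation's simulations used MatLab; Python is used here purely for illustration, and the function and argument names are illustrative, not taken from the original implementation.

```python
import random

def one_point_crossover(parent_a, parent_b, point=None):
    """Exchange the genome tails of two parents at a single crossover
    point, producing two offspring."""
    if point is None:
        # pick a crossover point strictly inside the genome
        point = random.randint(1, len(parent_a) - 1)
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

# Example with real-valued genomes and the crossover point fixed at 2:
c1, c2 = one_point_crossover([0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8], point=2)
print(c1)  # [0.1, 0.2, 0.7, 0.8]
print(c2)  # [0.5, 0.6, 0.3, 0.4]
```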

There are also many ways to perform Mutation; in its basic form, a randomly chosen part of the genome is changed to an arbitrary value. This operation represents a random search around the most successful individuals of the previous generation and is therefore not very powerful. It usually produces solutions worse than the parent; however, there is a chance it will produce a more successful solution, and that is the reason it is used.

Fig. 7: The Mutation Operator
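In the same illustrative Python style (the original work used MatLab), basic mutation over a real-valued genome can be sketched as below; the default bounds and probability follow values used later in the text, and the name `mutate` is an assumption.

```python
import random

def mutate(genome, p_m=0.05, low=-0.125, high=0.125):
    """Basic mutation: each real-valued gene is independently replaced
    with an arbitrary value from the initialisation range [low, high]
    with probability p_m."""
    return [random.uniform(low, high) if random.random() < p_m else g
            for g in genome]

genome = [0.1, -0.05, 0.02, 0.11]
mutated = mutate(genome)  # most genes survive unchanged at p_m = 0.05
```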

The Reproduction operator simply copies the two parents into the next generation. After the genetic operators have been applied, the new generation goes through the same process. The GA is repeated either for a set number of generations or until a stopping criterion is fulfilled. Both the BP and the GA have modifiable parameters that can be configured to produce the best possible results; these parameters will be configured experimentally, by trial and error. Furthermore, there are special optimisation techniques that can be applied to both learning algorithms to further improve their performance. These will be discussed in detail in the following section. It should be noted at this point that the initial weights for all the tests are initialised with small random values, with the initialisation range determined by trial and error.
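Putting the pieces together, the generational loop described above can be sketched as follows. This is a minimal Python skeleton, not the dissertation's MatLab implementation: truncation selection stands in for the parent-selection operator (the methods actually compared are tested in a later section), and all helper names are illustrative. The parameter defaults follow values reported in the text.

```python
import random

def run_ga(fitness, genome_len, pop_size=100, mating_size=50,
           generations=100, p_c=0.7, p_m=0.05, init=0.125):
    """Generational GA skeleton: evaluate, select a mating population,
    recombine with crossover / mutation / reproduction, and carry the
    best individual forward (elitism). `fitness` returns a score where
    higher is better."""
    pop = [[random.uniform(-init, init) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        mating = scored[:mating_size]      # stand-in parent selection
        nxt = [scored[0][:]]               # elitism: keep the best individual
        while len(nxt) < pop_size:
            p1, p2 = random.sample(mating, 2)
            if random.random() < p_c:      # crossover
                pt = random.randint(1, genome_len - 1)
                c1, c2 = p1[:pt] + p2[pt:], p2[:pt] + p1[pt:]
            else:                          # reproduction: copy the parents
                c1, c2 = p1[:], p2[:]
            for c in (c1, c2):             # mutation
                for i in range(genome_len):
                    if random.random() < p_m:
                        c[i] = random.uniform(-init, init)
            nxt += [c1, c2]
        pop = nxt[:pop_size]
    return max(pop, key=fitness)

# usage: evolve a genome whose genes should all approach 1
best = run_ga(lambda g: -sum((x - 1) ** 2 for x in g), genome_len=4,
              pop_size=20, mating_size=10, generations=30, init=1.0)
```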

4.1 BPTT

The performance of BPTT on the tasks assigned to the arm can be improved by modifying the algorithm's adjustable parameters. In the following section various tests will be performed to determine the configuration of the TDNN and of BPTT's parameters that produces the best results. The tests will be performed on a single task that involves training the TDNN on a number of smooth trajectories. A trajectory is defined as a collection of points in space that constitute the sequence of the arm's manipulator coordinates recorded over a predefined number of movements,


i.e. the length of the trajectory. The task the arm has to perform is to predict a trajectory that involves the movement of both its joints. A further restriction is that each point of the trajectory has to be within a rotation angle in the range [π/24, π/12] for both joints. The tests performed are outlined below. In order to allow a meaningful comparison, the performance of the BP in all the tests is evaluated in terms of the lowest training error produced. In addition, the results obtained in each test are carried over into the next. All the training trajectories consist of four points and all tests are executed for 200 iterations.

The initial weight region is very important to the quality of the results produced by the algorithms, and the following test aims to determine it. The test was performed for weights in regions of the form [-m, m], m ∈ [0.0625, 1]. The best results were observed for m = 0.5 (fig. 8).

Next, the performance of BPTT is measured as a function of the number of hidden neurons. The aim of this test is to determine the network configuration that produces the lowest training error. The test is performed over a range of [1, 20] hidden neurons. As can be seen from the results in fig. 9, BPTT works better in certain regions of the number-of-neurons space. The lowest error was produced for

N_j = 12.

The aim of the next test is to determine the configuration of the TDNN. As was seen in an earlier chapter, the delay element of each node corresponds to the amount of past information that contributes to the TDNN's response at every time moment. The amount of information stored in the TDNN's short-term memory could clearly affect its response, so this test evaluates the performance of the learning algorithm as a function of the delay length per FIR filter (node). The only constraint is that the delay length has to be equal to or less than the trajectory length. It has already been mentioned that a network with one delay tap per node is a FFNN. By testing values for the delay length in the range [1, T], where T = 4 is the length of the trajectory, the performance of a FFNN with the

Fig. 9: Testing for the optimum number of hidden neurons

Fig. 10: Testing for the optimum number of delay taps per node

Fig. 8: Initial Weight region test


same configuration is evaluated on the particular task. The best results were achieved for D_i = D_j = T − 1. It is worth noting that the feedforward network (D_i = D_j = 1), with the same number of inputs, performed considerably worse than the best dynamic network (fig. 10).

In the next test the learning rate η will be configured. This parameter, along with the error, determines the amount of change each weight is subject to. The parameter usually takes a value in the range (0, 1), so the test will be performed for η values in that range. BPTT with a plain learning rate has a drawback: it has no means of escaping local minima of the error function. A local minimum is the smallest value of a function f(x) within a small range of the input x, and is usually different from the global minimum (the best solution to the problem of minimising the error). Many techniques have been proposed to deal with this problem. The most widely used is adding a momentum term that affects the amount of change each weight is subject to. With momentum μ the update rule for the weights becomes:

Δw_ji(t) = (1 − μ) η Σ_{n=1}^{D_j} δ_j(t) y_i(t − n) + μ Δw_ji(t − 1),   0 < μ < 1      (4.5)
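The update of eq. (4.5), as read here, can be sketched for a single weight in Python (the original simulations were in MatLab). The function and argument names are illustrative; `y_past` stands for the delayed activations y_i(t − n), n = 1..D_j.

```python
def momentum_update(prev_delta, delta_j, y_past, eta=0.1, mu=0.2):
    """One weight update with momentum: a (1 - mu) fraction of the
    gradient step, summed over the delayed activations, plus a mu
    fraction of the previous update (eq. 4.5)."""
    grad_step = eta * sum(delta_j * y for y in y_past)
    return (1.0 - mu) * grad_step + mu * prev_delta

# With no previous update, delta_j = 1 and two unit activations:
dw = momentum_update(0.0, 1.0, [1.0, 1.0])
print(dw)  # 0.16  (= 0.8 * 0.1 * 2)
```

Repeated updates in the same direction accumulate through the momentum term, which is exactly the step-size growth described below the equation.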

The role of the momentum term is to add a small fraction μ of the previous weight update to the current one. When the weights change in a certain direction, positive or negative, the momentum term helps increase the size of the steps taken in that direction. The momentum μ is usually set in the range (0, 1), and the test compares the performance of the network for values of μ in that range. It should be noted, however, that a very large μ could result in less-than-optimal solutions, since the solution can be missed. Since η and μ are known to be closely linked, this test is performed for all pairs of η, μ in the range (0, 1). As expected, the best training results were obtained with the combination η = 0.1, μ = 0.2² (fig. 11). Finally, the training of the network is meaningless without generalisation; after training, the TDNN should be able to control the arm for every target

² This is the most common η, μ combination.

Fig. 11: Testing to find the optimum η and μ combination

Fig. 12: Early Stopping. In this case the training will stop at 120 iterations to avoid over-fitting.


trajectory in the test area satisfying the restrictions, not only the training patterns. As a result, to avoid over-fitting, the training needs a means of evaluating the generalisation capability of the TDNN. To achieve this, another simple and very widely used technique is adopted: early stopping. The available patterns are split into two sets: the training set and the validation set. The network is trained normally using the training set, and after each training iteration the performance of the network is evaluated on the validation set. Since the validation set is not presented to the network during training, the validation error measures the amount of generalisation the TDNN can provide. Consequently, the aim of the training is to minimise the validation error. The effectiveness of early stopping can be seen in fig. 12: if the network is trained for more than 120 iterations its generalisation capability becomes poorer, so training is stopped at iteration 120, where the lowest generalisation error was achieved.
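The early-stopping bookkeeping can be sketched as below (illustrative Python, not the MatLab implementation; the two callbacks are stand-ins for one BPTT pass over the training set and one error measurement on the validation set).

```python
def train_with_early_stopping(train_step, validation_error, max_iters=200):
    """Train for up to max_iters iterations, measuring the validation
    error after each one, and remember the iteration with the lowest
    error, i.e. the best generalisation."""
    best_err, best_iter = float("inf"), 0
    for it in range(1, max_iters + 1):
        train_step()
        err = validation_error(it)
        if err < best_err:
            best_err, best_iter = err, it
    return best_iter, best_err

# A toy validation curve that bottoms out at iteration 120, as in fig. 12:
best_iter, best_err = train_with_early_stopping(
    lambda: None, lambda it: (it - 120) ** 2, max_iters=200)
print(best_iter)  # 120
```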

4.2 GA

As seen in the previous chapter, the GA's parameters also need to be defined in order to produce the best possible results. To allow a meaningful comparison between the two algorithms, the GA will be configured on the same task used to configure the BP parameters. In other words, the task is to follow a trajectory with the restriction that each pair of points is positioned in space such that the arm needs to rotate both joints by an angle in the range [π/24, π/12] in order to go from one point to the other. Each trajectory consists of four points. The quality of the results produced by the GA under the various configurations will be measured by the maximum fitness produced in each test. As before, the results obtained from each test are immediately carried over into the next. Furthermore, the same number of training patterns used in the BP training is also used here.

Before starting the test phase, the core of the GA has to be defined: the fitness function. The fitness function is the most important part of a genetic algorithm, since it is directly related to the quality of the results the GA will produce; a suitable fitness function can mean the difference between getting the desired results and getting no good results at all. The task is essentially a matter of minimising the distance between the predicted position of the arm and the desired position. The simplest way of calculating the distance between two points A(x1, y1, z1) and B(x2, y2, z2) in 3D space is the Euclidean distance:

d = √((x2 − x1)² + (y2 − y1)² + (z2 − z1)²)      (4.6)

What we need is for the fitness of each genotype to increase when the distance decreases; distance and fitness are thus inversely proportional. The fitness function is defined in (4.7).

f = c / D,   D = Σ_{k=1}^{T} d_k      (4.7)
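Equations (4.6) and (4.7) combine into a few lines of code. This Python sketch is for illustration only (the dissertation used MatLab); guarding against D = 0 is an implementation choice of this sketch, not specified in the text.

```python
import math

def fitness(predicted, desired, c=1.0):
    """Fitness of a genotype per eqs. (4.6)-(4.7): sum the Euclidean
    distances d_k between predicted and desired trajectory points,
    then invert so fitness rises as distance falls. c is the maximum
    fitness and is task-dependent."""
    D = sum(math.dist(p, q) for p, q in zip(predicted, desired))
    return c if D == 0 else c / D

# One-point trajectory, wrist at the origin, target at (3, 4, 0):
print(fitness([(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0)], c=10.0))  # 2.0
```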


where T is the number of points contained in each trajectory (the trajectory length). The coefficient c, which also represents the maximum fitness, is task-dependent. The weights obviously have to be initialised within a certain region, usually with small random values. This test aims to determine the initial weight region that produces the best behaviour, which was obtained when the weights were initialised in the region [−0.125, 0.125] (fig. 13).

A second test is performed to determine the parent selection method. As mentioned in the previous section, the parent selection operator is applied after the evaluation of all the individuals in the current population. Three methods are compared: the tournament, roulette wheel and random parent selection methods. The random selection method is the simplest of the three: the individuals that will constitute the mating population are selected at random. The tournament method is more elaborate: two individuals are selected at random from the current population and their fitness values are compared; the fitter of the two is the winner and is added to the mating population. Finally, the roulette wheel is the most complicated. The mechanism operates in three stages: firstly, the fitness of all the individuals in the population is summed. Then a random number r between 0 and the summed fitness S is generated. Finally, the fitness of the individuals is accumulated one by one, in order, until the running sum exceeds r; the last individual whose fitness was added to the sum is added to the mating population. The roulette wheel parent selection method was chosen since it performed better than the other two (fig. 14).

The next test is performed to determine the number of individuals in each generation. The population size is linked to the quality of the results because it represents the amount of weight space explored in every generation; a small population size could lead to erroneous solutions, such as a local minimum. Usually, the population size for a

Fig. 13: Testing for the optimum initial weight region

Fig. 14: Parent Selection methods comparison

Fig. 15: Testing for the optimum population size
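The three-stage roulette wheel mechanism described above can be sketched as follows (illustrative Python; the original work used MatLab, and the function name is an assumption).

```python
import random

def roulette_wheel_select(population, fitnesses):
    """Roulette wheel parent selection: sum the fitnesses, draw a
    random number r in [0, S], then accumulate fitnesses in order
    until the running sum exceeds r; return that individual."""
    total = sum(fitnesses)
    r = random.uniform(0, total)
    acc = 0.0
    for individual, f in zip(population, fitnesses):
        acc += f
        if acc > r:
            return individual
    return population[-1]  # guard against floating-point round-off

# Fitter individuals occupy a larger slice of the wheel, so they are
# chosen more often; an individual with zero fitness is never chosen.
winner = roulette_wheel_select(["a", "b", "c"], [1.0, 1.0, 8.0])
```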


GA is in the range [30, 100]. The test will be performed on ten values in that range to determine the optimum population size. In a similar test the mating population size will also be determined, since it is as important as the size of each generation. Since the results encouraged further testing with a higher population size, the performance of the algorithm was also tested with 150 individuals in the population. As can be seen from the results in fig. 15, the best results were achieved with 150 individuals. However, 100 individuals were chosen as the best solution for two reasons: firstly, the difference in performance between the GAs with 100 and 150 individuals is very small; more importantly, training a 150-individual GA is very time-expensive.

The mating population is usually smaller than each generation's population; however, values taken from the same range as in the previous test are used for this test as well. As can be seen from the results in fig. 16, the size of the mating population does not have a significant effect on the training. Consequently, a mating population of 50 individuals is chosen for the sake of time-efficiency.

After the mating population has been created, the individuals are recombined using three genetic operators: Crossover, Mutation and Reproduction. The first two have modifiable parameters that need to be tuned in order to produce the best possible results. The crossover operator can be divided into two categories: two-point and uniform crossover. Two-point crossover is a variation of one-point crossover: in one-point crossover the offspring consists of two large "chunks" of genetic material, one from each parent, with the number of genes in each "chunk" determined by the crossover point; the only difference between one-point and two-point crossover is the number of crossover points. In uniform crossover, on the other hand, each gene of the offspring has a 50% chance of coming from each parent. Two-point crossover was chosen because of the computational cost of uniform crossover (fig. 18).
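The two crossover variants compared here can be sketched side by side (illustrative Python rather than the original MatLab; names are assumptions).

```python
import random

def two_point_crossover(p1, p2, lo=None, hi=None):
    """Exchange the gene segment between two crossover points.
    Pass both lo and hi, or neither (then they are drawn at random)."""
    if lo is None:
        lo, hi = sorted(random.sample(range(1, len(p1)), 2))
    return (p1[:lo] + p2[lo:hi] + p1[hi:],
            p2[:lo] + p1[lo:hi] + p2[hi:])

def uniform_crossover(p1, p2):
    """Each offspring gene comes from either parent with 50% chance;
    the second child receives the complementary genes."""
    picks = [random.random() < 0.5 for _ in p1]
    c1 = [a if pick else b for pick, a, b in zip(picks, p1, p2)]
    c2 = [b if pick else a for pick, a, b in zip(picks, p1, p2)]
    return c1, c2

# Two-point example with fixed points: the middle segment is swapped.
c1, c2 = two_point_crossover([1, 2, 3, 4, 5], [6, 7, 8, 9, 10], lo=1, hi=3)
print(c1)  # [1, 7, 8, 4, 5]
print(c2)  # [6, 2, 3, 9, 10]
```

Uniform crossover makes one random decision per gene, which is one reason it costs more per recombination than the two cut points of two-point crossover.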

Fig. 16: Testing for Mating Population size

Fig. 18: Two-point vs. Uniform Crossover

Fig. 17: Testing for optimum mutation probability


The probability of applying each operator is usually 0.7 for Crossover and very small for Mutation. Nevertheless, the exact probability of Mutation has to be defined by trial and error; small mutation values in the range [0.01, 0.1] are compared. As can be seen from fig. 17, the best results were achieved with P_m = 0.05.

In order to prevent the best solution of each generation from being lost from one generation to the next, another very common technique is used: elitism. This technique replaces one individual of the next generation with the best solution of the current generation, i.e. the individual with the highest fitness. As a result, there is always a good solution in each generation, which increases the probability of a better solution being created (fig. 19).

Fig. 19: Elitism
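Elitism amounts to a few lines of bookkeeping between generations. In this illustrative Python sketch (the original used MatLab), the text only says one individual of the next generation is replaced; replacing the least fit one is an assumption made here.

```python
def apply_elitism(current_pop, next_pop, fitness):
    """Carry the fittest individual of the current generation into the
    next one, so the best solution found so far is never lost."""
    best = max(current_pop, key=fitness)
    worst_i = min(range(len(next_pop)), key=lambda i: fitness(next_pop[i]))
    result = list(next_pop)
    result[worst_i] = best
    return result

# The best current individual [9] displaces the weakest next-generation one:
print(apply_elitism([[3], [9]], [[1], [2]], fitness=sum))  # [[9], [2]]
```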


5 Experimental Set-up

Before each test is performed, the TDNN and the learning algorithms' parameters have to be configured according to the tests performed in the previous section. Firstly, in order to improve the results of the training, the inputs of the neural network are pre-processed before training: the data are normalised to [−1, 1]. Normalising the training data has been shown to reduce the training error. Furthermore, restrictions apply to all the tests, for two reasons: firstly, the restrictions allow the network to learn the desired behaviour with reasonable efficiency, and secondly, they allow a meaningful comparison of the results of the testing. To begin with, the manipulator is initialised randomly within a confined area of the test space. As already mentioned in a previous section, the test space, i.e. the area in which the arm can legally be positioned, is also restricted. The arm is initialised in a range of [π/24, 5π/24] in the xy plane and [π/4, π/2] in the yz plane. The common comparison quantity for all the tests is the minimum mean distance from each point on a trajectory observed during testing.

The two algorithms are trained on 300 randomly generated trajectories. In the case of BPTT, a number of them are used for training and the rest as the validation set, which is the measure of the generalisation capability; in the GA, all the patterns are used for training. Apart from this difference, the training of the network using the two algorithms is similar. The first step when training the network using BPTT is to initialise all the learnable parameters. The training and validation sets are created taking into account the restrictions mentioned in the previous sections. Training is then performed as normal: each pattern is presented to the network. Before each training pattern is presented, all the delay taps are initialised, which in practice means that they are emptied of any values they might hold and their contents set to 0. The TDNN is required to produce its prediction for each point of the trajectory according to the current weight configuration. The error calculations take place and BPTT is applied using the learning rate η and the momentum coefficient μ determined in the previous section; the weights are then modified according to the calculated error. After all the training patterns have been presented and the weights modified accordingly, the network's generalisation capability is evaluated on the validation set. These steps are repeated for 200 iterations. At every iteration the current network configuration is printed to a file, and at the end of the training the configuration that produced the lowest generalisation error is saved, as it is the best possible configuration that could be achieved by the learning algorithm. The performance of the network is then tested.
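The pre-processing and the outer BPTT loop just described can be sketched as follows. This is an illustrative Python outline (the actual simulations ran in MatLab 5.3); `net` is an assumed object whose methods stand in for the steps named in the text.

```python
def normalise(values):
    """Pre-processing step: linearly rescale the inputs to [-1, 1]."""
    lo, hi = min(values), max(values)
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]

def bptt_outer_loop(net, train_set, val_set, iterations=200):
    """Outer training loop: for each pattern, empty the delay taps and
    train; after each pass, measure the validation error and keep the
    configuration with the lowest one."""
    best_err, best_cfg = float("inf"), None
    for _ in range(iterations):
        for pattern in train_set:
            net.reset_taps()            # empty the FIR delay lines (set to 0)
            net.train_pattern(pattern)  # forward pass, error, BPTT update
        err = net.validation_error(val_set)
        if err < best_err:              # best generalisation so far
            best_err, best_cfg = err, net.snapshot()
    return best_cfg, best_err

print(normalise([0.0, 5.0, 10.0]))  # [-1.0, 0.0, 1.0]
```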
If the results are not satisfactory, the training steps are repeated for another 200 iterations, until the network produces results within acceptable error. When training the TDNN using the GA, the training trajectories are initialised with respect to the restrictions outlined in the previous section, and the initial generation is generated randomly, again with respect to the restrictions. The training is similar to BPTT: each pattern is presented to the network, which is required to produce its prediction, and the distance of the manipulator from the desired position is added to the mean observed distance so far. After every individual in the current generation (each representing a TDNN configuration) has been evaluated, its fitness is calculated by the fitness function defined in the previous section. Once the current population has been evaluated, its best individual is printed to a file.


Applying the genetic operators to the current population creates the next generation. The training steps are repeated for 100 generations at a time, until the best possible behaviour has been achieved.

After the learning parameters have been configured as described in Chapter 5, two sets of experiments will be performed with the control systems derived from each training algorithm. In the first set of experiments the performance of the arm will be evaluated using different kinds of input. The results of the experiments will be compared to determine the kind of input that produces the best behaviour for each training algorithm, and the best kind of input will be immediately integrated into the control system. In the second set of experiments the arm's behaviour will be evaluated on different tasks: the control system configured by each training algorithm will be tested on variations of the original task to determine which configuration performs best. The tasks on which the control systems will be tested are harder than the original test. As mentioned in Chapter 2, the amount of joint rotation required to reach each point in the trajectory is directly linked to the task's level of difficulty: the further apart the points of the trajectory are, the harder the task.

For all the experiments the application used for the simulation was MatLab 5.3. MatLab was used as an offline simulation that provided the necessary tools for representing the different arm states during training and the predicted vs. desired trajectories. This was achieved by storing the coordinates of the arm during testing and then simply plotting the results using the appropriate MatLab command.

5.1 Results

5.1.1 Experiment 1: Different Sensory Input

In this first set of experiments the behaviour of the arm will be evaluated with respect to different kinds of sensory input. For the purpose of this experiment three different kinds of sensors will be used, one for each test. Firstly, the input of the network will be calculated from two cameras situated in the test area, external to the arm (fig. 20). Both cameras have a "god's eye" point of view: the first camera's view captures the position of the arm and the target in the xy plane, the other in the yz plane. The images from the low-resolution cameras are processed and the following information is extracted: the x, y and z coordinates of the target, of the wrist joint and of the elbow joint. The Euclidean distance of the wrist joint from the target is also calculated. The resulting network has ten inputs.

Secondly, the input of the network will be calculated as in [5]. In this case the manipulator has six directional proximity sensors that can sense the target in the positive and negative x, y and z directions. The sensors are located behind, in front of, to the left of, to the right of, above and below the manipulator. All the sensors have range r. If no obstacle

Fig. 20: External cameras as sensors, “god’s eye” point of view


is within range of a sensor, its activation is set to r; in any other case the activation is set to the relative distance of the wrist and the target in the corresponding direction. In this test the network has six inputs. Different values for the range r will also be tested.

In the final test, four low-resolution cameras situated on the manipulator will provide the input of the network. Each camera has range r and a field of view of π/2. Effectively, the sensors detect the direction in which the target lies: if a target point is within range of a sensor its activation is 1, otherwise 0. This is the most realistic sensory input of the three. For this test the TDNN will have eight inputs, and different values for the range r will again be tested.
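The six-sensor scheme of the second test can be sketched as follows. This illustrative Python reading (the original used MatLab) interprets "relative distance in the corresponding direction" as the distance along that sensor's axis, which is an assumption; the function name is also an assumption.

```python
def proximity_activations(wrist, target, r=5.0):
    """Activations of six directional proximity sensors ordered
    (+x, -x, +y, -y, +z, -z): r when the target is out of range in that
    direction, otherwise the wrist-target distance along that axis."""
    acts = []
    for axis in range(3):
        d = target[axis] - wrist[axis]
        for sign in (1, -1):
            in_range = 0 <= sign * d <= r
            acts.append(abs(d) if in_range else r)
    return acts

# Target 2 units in front of the wrist along +x, range r = 5:
print(proximity_activations((0.0, 0.0, 0.0), (2.0, 0.0, 0.0)))
# [2.0, 5.0, 0.0, 0.0, 0.0, 0.0]
```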

Table 1: Learning Algorithms' Configuration for Low-Resolution Cameras

                               BPTT            GA
Initial weight space           [-0.5, 0.5]     [-0.125, 0.125]
Number of outputs              3               3
Number of hidden nodes         12              6
Number of iterations           200             100
Trajectory range               [π/24, π/12]    [π/24, π/12]
Number of training patterns    250             200
Number of validation patterns  50              50
η                              0.1             N/A
μ                              0.2             N/A
Population size                N/A             100
Mating population size         N/A             50
P_c                            N/A             0.7
P_m                            N/A             0.05

Fig. 21: Average observed distance

Fig. 22: Test on previously unseen trajectories


Table 2: Learning Algorithms' Configuration for Infrared Proximity Sensors

                               BPTT            GA
Initial weight space           [-0.5, 0.5]     [-0.125, 0.125]
Range                          5               5
Number of outputs              3               3
Number of hidden nodes         12              6
Number of iterations           200             100
Trajectory range               [π/24, π/12]    [π/24, π/12]
Number of training patterns    250             200
Number of validation patterns  50              50
η                              0.1             N/A
μ                              0.2             N/A
Population size                N/A             100
Mating population size         N/A             50
P_c                            N/A             0.7
P_m                            N/A             0.05

Fig. 24: Test on previously unseen trajectories

Fig. 23: Average observed distance from target


Table 3: Learning Algorithms' Configuration for Low-Resolution Cameras on the End Effectors

                               BPTT            GA
Initial weight space           [-0.5, 0.5]     [-0.125, 0.125]
Range                          3               3
Number of outputs              3               3
Number of hidden nodes         12              6
Number of iterations           200             100
Trajectory range               [π/24, π/12]    [π/24, π/12]
Number of training patterns    250             200
Number of validation patterns  50              50
η                              0.1             N/A
μ                              0.2             N/A
Population size                N/A             100
Mating population size         N/A             50
P_c                            N/A             0.7
P_m                            N/A             0.05

In figures 21, 23 and 25 the average distance from the target training trajectories is plotted. Each graph shows values averaged over 10 simulations. On average, an acceptable solution was found for both learning algorithms within the first 100 iterations. Figures 22, 24 and 26 show the behaviour of the arm when tested on a random, previously unseen trajectory: the target trajectory is represented by the red dotted line, and the actual trajectory produced by the arm during testing by a black or blue line.

Fig. 25: Average observed distance

Fig. 26: Testing on previously unseen trajectories


5.1.2 Experiment 2: Different Tasks

The best results were obtained when the TDNN takes its input from low-resolution cameras situated on the end effectors and is evolved using the GA. In this section the robustness of the best neuro-controller obtained in the previous section will be evaluated on variations of the original task. As described earlier, the points of each trajectory were created subject to a few restrictions; here, the amount of joint rotation needed to reach each target in the trajectory will be increased. Consequently, this test will determine the best control system, namely the one that behaves best on the most difficult task. Currently, the amount of rotation required of each joint is restricted to a range of size π/24. The size of the allowed rotation range will be doubled in order to evaluate the adaptive capability of the neuro-controller.

Table 4: GA configuration for range size π/12

                               BPTT                    GA
Input type                     Low-resolution cameras  Low-resolution cameras
Initial weight space           [-0.5, 0.5]             [-0.125, 0.125]
Range                          4                       4
Number of outputs              3                       3
Number of hidden nodes         12                      6
Number of iterations           200                     100
Number of training patterns    250                     200
Number of validation patterns  50                      50
η                              0.1                     N/A
μ                              0.2                     N/A
Population size                N/A                     100
Mating population size         N/A                     50
P_c                            N/A                     0.7
P_m                            N/A                     0.05

Fig. 27: Average distance from target

Fig. 28: Test on previously unseen trajectory


Table 5: GA configuration for range size π/6

                               BPTT                    GA
Input type                     Low-resolution cameras  Low-resolution cameras
Initial weight space           [-0.5, 0.5]             [-0.125, 0.125]
Range                          4                       4
Number of outputs              3                       3
Number of hidden nodes         12                      10
Number of iterations           100                     100
Number of training patterns    250                     250
Number of validation patterns  50                      50
η                              0.1                     N/A
μ                              0.2                     N/A
Population size                N/A                     100
Mating population size         N/A                     50
P_c                            N/A                     0.7
P_m                            N/A                     0.05

Fig. 29: Average distance from target

Fig. 30: Test on a previously unseen trajectory


Table 6: GA configuration for range size π/3

                               BPTT                    GA
Input type                     Low-resolution cameras  Low-resolution cameras
Initial weight space           [-0.5, 0.5]             [-0.125, 0.125]
Range                          4                       4
Number of outputs              3                       3
Number of hidden nodes         12                      6
Number of iterations           200                     100
Number of training patterns    250                     200
Number of validation patterns  50                      50
η                              0.1                     N/A
μ                              0.2                     N/A
Population size                N/A                     100
Mating population size         N/A                     50
P_c                            N/A                     0.7
P_m                            N/A                     0.05

In figures 27, 29 and 31 the average distance from the target training trajectories observed during testing is plotted. Each graph shows values averaged over 10 simulations. On average, an acceptable solution was found for both learning algorithms within the first 100 iterations. Figures 28, 30 and 32 show the behaviour of the arm when tested on a random, previously unseen trajectory: the target trajectory is represented by the red dotted line, and the actual trajectory produced by the arm by a black or blue line. It is worth noting, however, that on the difficult tasks the GA was unable to achieve better solutions than BPTT with the TDNN configuration of 6 hidden neurons in one layer used in the previous tests. Nevertheless, it was observed that by increasing the number of hidden neurons from 6 to 10, the GA was able to outperform BPTT.

Fig. 31: Average distance from target

Fig. 32: Test on a previously unseen trajectory


6 Discussion

This paper has reported a series of experiments aimed at finding the best arm neuro-controller for the task described in Chapter 1. These experiments verified [1] that artificial evolution of control architectures using Genetic Algorithms can be far more efficient than local gradient descent algorithms like Backpropagation.

As mentioned in a previous section, Backpropagation, and by extension BPTT, performs a local search of the weight space in order to configure the neural network to produce the best results. The weights of the network are modified in proportion to the amount of "fault" the learning rule assigns to them for the difference between the actual output and the desired output. A first weakness of this algorithm is that BPTT finds the optimum (or near-optimum) solution to a problem only within the limits of the initial weight region; a better solution could lie in another region of the weight space that is inaccessible to the algorithm due to its local nature.

Another disadvantage lies in one of the requirements of BPTT: the size and number of the hidden layers can play a very important role in training. Two hidden layers are sufficient to approximate any function, provided that there are enough hidden neurons in each layer [12]. In practice, however, networks with very large architectures are slow to train and require a lot of computation time. Furthermore, a lot of testing is required to find the optimum number of neurons. A good way of dealing with this problem is to initialise training with a very large number of hidden neurons and then use a pruning algorithm to find the optimum number of weights³. In addition to the need for very large networks to approximate a function, there are other problems with the learning algorithm, such as over-fitting and local minima.

Optimisation techniques do exist to deal with these problems, such as early stopping and the momentum term. The conclusion is that BPTT needs such optimisation techniques in order to produce the best possible results and avoid potential pitfalls. Additionally, BPTT is very sensitive to external and internal perturbations of the problem parameters. As seen in fig. 11, changing the learning rate from 0.2 to 0.3 can mean the difference between learning and not learning at all. Furthermore, whenever the parameters of the problem change, the parameters of BP and the network have to be reconfigured in order to achieve the best solution. As can be seen from fig. 24, BPTT may also be unable to train a network if the task becomes too hard (an "impossible task"). Finally, the algorithm may fail to produce good results when inappropriate inputs are used.

The solution to the training problems of BPTT appears to be artificial evolution. Evolving an arm neuro-controller produces better results than BPTT. The superiority of the Genetic Algorithm was expected, since it performs a global search of the weight space for the best network configuration. It should also be noted that the behaviour produced by the evolved neuro-controller was achieved with smaller networks than BPTT required. In addition, the GA's search for the optimum solution does not depend on the problem semantics. Testing provides sufficient evidence that, due to these two factors, the GA's search of the weight space is more powerful, and thus more effective, than the search performed by BPTT.

³ Pruning algorithms remove nodes that do not contribute to the output of the network, thus reducing the size of the network.


Problems such as local minima have a lesser effect on artificial evolution than on gradient-descent-based algorithms like BPTT: a locally optimum solution simply receives a lower fitness than a globally better solution. Additionally, the evolved neuro-controllers appeared to be very robust to changes in the problem parameters, such as changes in the kind of inputs. Moreover, the GA was still able to produce acceptable arm behaviour on more difficult problems. The only drawback of this algorithm is that it is computationally very expensive: each generation involves evaluating every individual of the current population on a very large number of training trajectories. The time complexity of each BPTT iteration and each GA generation can be calculated using

C_BPTT = NumberOfTrainingPatterns
C_GA = NumberOfTrainingPatterns × PopulationSize        (6.1)

It is obvious that C_BPTT < C_GA when the number of training patterns is the same for both algorithms. Nevertheless, trading speed for quality of results seems worthwhile, as long as the time complexity of the GA is not too high.

6.1 GA/BP Hybrid

As was mentioned earlier, the initial weight region can affect the quality of the results of the Backpropagation algorithm. It has been proposed [1] that evolutionary training using a GA can be improved by incorporating a local search mechanism into the artificial evolution. Combining the global sampling capability of a GA with fine-tuning by local search using BPTT could produce better results than either mechanism working separately. In practice, this local/global search algorithm allows the hybrid to locate a good initial weight region and then apply a local optimisation algorithm, such as a gradient descent algorithm like BP, to find a near-optimum solution within that region. Belew et al. [1] used a GA to perform a global search for a good set of initial weights and then used BP to fine-tune those results. Their results showed that the GA/BP hybrid was more efficient than BP and very competitive with the GA. Effectively, this hybrid training algorithm provides a more efficient way of choosing the initial weights than the random initialisation scheme.

The same tests that were performed for the GA will be repeated for the hybrid, in order to find a configuration of the learnable parameters that produces the best possible results. Nevertheless, the BPTT configuration will be kept the same, since the hybrid algorithm practically runs BPTT from different initial weights; reconfiguring the parameters of BPTT is therefore redundant. The BP will use momentum and the GA will use elitism. The only thing that changes is the manner in which each encoded solution is evaluated. Since the aim of the algorithm in this case is to produce an initial configuration that, after fine-tuning, yields the best generalisation error, the fitness function becomes:

f = 1 / (c·ε_V)        (6.2)

where ε_V is the mean validation error and c is a normalisation coefficient. It could be argued that duplicating an already time-intensive learning algorithm like BP 100 times to create a population, and over 100 generations, might not be very


practical from a time-complexity point of view. If the desired results can be achieved using BP alone, this may well be true. However, there are two main reasons for using GA/connectionist hybrid algorithms rather than a GA or BP on its own. Firstly, as mentioned before, BP is known to be sensitive to stochastic variation; the use of optimisation techniques combined with random restarts aims to improve the quality of the results. Additionally, architectures have been evolved that are superior to the standard fully interconnected hidden layer, and extending BP to find such architectures has proven difficult. Moreover, the global sampling capability of the hybrid algorithm has proven far superior to random initialisation. Finally, as can be noted from most of the experiments, the use of the hybrid greatly reduced the time complexity of BP. The three algorithms will be compared by evaluating the behaviour they produce using the same controller.

Training the network with the GA/BP hybrid algorithm is very similar to training with the GA; the major difference lies in the way each genotype is evaluated. In practice, to evaluate each genotype, BPTT with momentum is applied as described above. A validation set has been created from the training set. After BP has been applied for 100 epochs, the chromosome's generalisation capability is measured on the validation set, and the mean error recorded is used to calculate the fitness of the individual. After 100 generations, training with the hybrid stops, and the solution with the best fitness, saved as described above, is used to initialise the weights of the network. BP is then applied, producing the near-optimal solution.

Table 6: GA/BP Hybrid configuration

Initial weight space            [-1, 1]
Range                           4
Number of outputs               3
Number of hidden nodes          10
Number of iterations            50
Trajectory range                [π/24, π/12]
Number of training patterns     150
Number of validation patterns   50
η                               0.1
μ                               0.2
Population size                 100
Mating population size          50
P_c                             0.7
P_m                             0.05


As can be seen from figs. 33, 34 and 35, the GA/BP hybrid has outperformed either of the two algorithms on its own. The hybrid managed to locate a good weight region, which produced a far better result than BPTT. The GA/BP hybrid also managed to outperform the GA, which had produced the best results so far.

6.2 Further Work

The results produced for the simple task of object tracking are promising. The performance of the three algorithms was evaluated and the superiority of the hybrid GA/BP algorithm was demonstrated. However, the capability of evolving connectionist structures through artificial evolution has barely been explored in this paper. Experiments [1] [14] [20] have shown that evolution can do more than configure the initial weights of a network for use by a local optimisation algorithm. Evolutionary search could replace the old-fashioned trial-and-error tests used to find the optimum parameter configuration; as mentioned before, even after an algorithm has been chosen, there are still parameters to configure in order to achieve the best possible results. Evolving network architectures is also within the capabilities of an evolutionary algorithm. Since it has been shown that the parameters of the learning algorithms interact with each other, it is only natural to evolve them together: encoding the BP's parameters on a chromosome along with the weights could be considered a further exploration of the interactions between learning algorithms and network architectures. Many researchers have addressed the evolution of neural network architectures and the search for the optimum parameters of learning algorithms like BP. Another equally important field that remains relatively unexplored is the artificial evolution of the learning, or weight-updating, rules. Learning rules can be evolved, as long as they fulfil two basic requirements:

Fig. 33: The GA/BP hybrid training error
Fig. 34: Training error running BP with the same parameters used to test the hybrid
Fig. 35: Comparison of the average distance from the target trajectory of the hybrid and GA


• The weight update has to depend on local information, such as the activation of the input node or the current connection weight.
• The update rule has to be the same for all the nodes of the network⁴.

6.3 Concluding Remarks

The results obtained from testing confirm that this project has been a success. The project showed the strengths and weaknesses of Backpropagation and the Genetic Algorithm. As was shown, the Genetic Algorithm performs a more powerful search in the weight space than Backpropagation, because the search performed by the former is global, while the local search performed by BPTT explores a much smaller region of the weight space. Furthermore, it was shown that gradient descent/GA hybrids can perform better even on difficult tasks⁵. Additionally, such hybrids can configure not only the parameters of the local search algorithm but network architectures as well. It was successfully demonstrated that the hybrids are far superior to BPTT, and even to the GA.

Acknowledgments

This work was made possible through a scholarship received from the ALEXANDER S. ONASSIS PUBLIC BENEFIT FOUNDATION: Greek Section of Scholarships and Research, to which I express my gratitude. This work was also made possible by the invaluable help of my supervisor, Phil Husbands. The author would like to thank Geoffrey Spikes for his very useful comments.

⁴ Recent work has shown that different updating rules for groups of nodes can produce very interesting results.
⁵ The definition of the level of difficulty of a task was given in Chapter 1.


Bibliography

[1] Belew R.K., McInerney J., Schraudolph N.N., 1990, "Evolving Networks: Using the Genetic Algorithm with Connectionist Learning", CSE Technical Report #CS90-174.
[2] Bengio Y., Frasconi P., Gori M., 1993, "Recurrent Neural Networks for Adaptive Temporal Processing".
[3] Brady M., 1989, Robotics Science, Massachusetts Institute of Technology.
[4] Burrow M., Dugger J., Humphrey D.R., Reed D.J., Hochberg L.R., "Cortical Control of a Robot Using a Time-Delay Neural Network".
[5] Castaño A., Hutchinson S., "Visual Compliance: Task-Directed Visual Servo Control".
[6] Edwards T.R., 1991, "An Overview of Temporal Backpropagation", Adaptive Systems, EE-373.
[7] Goldberg D.E., 1989, Genetic Algorithms in Search, Optimisation, and Machine Learning, Reading, Mass.: Addison-Wesley.
[8] Gottfried B., 1996, Programming with C, McGraw-Hill.
[9] Groover M.P., Weiss M., Nagel R.N., Odrey N.G., 1986, Industrial Robotics: Technology, Programming and Applications, McGraw-Hill.
[10] Gupta M.M., Rao D.H., 1994, Neuro-Control Systems: Theory and Applications, IEEE Press.
[11] Haykin S., 1999, Neural Networks: A Comprehensive Foundation, Prentice Hall.
[12] Hertz J., Krogh A., Palmer R.G., 1991, Introduction to the Theory of Neural Computation, Addison-Wesley, Redwood City.
[13] Miller W.T. III, Sutton R.S., Werbos P.J., 1990, Neural Networks for Control, Massachusetts Institute of Technology.
[14] Moriarty D.E., Miikkulainen R., 1996, "Evolving Obstacle Avoidance Behaviour in a Robot Arm", From Animals to Animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior, Cape Cod, MA.
[15] Ochoa G., Harvey I., Buxton H., "On Recombination and Optimal Mutation Rates".
[16] Rumelhart D.E., Hinton G.E., Williams R.J., 1986, "Learning Internal Representations by Error Propagation", Cambridge, MA: MIT Press, pp. 318-362.
[17] Waibel A., Hanazawa T., Hinton G., Shikano K., Lang K.J., 1989, "Phoneme Recognition Using Time-Delay Neural Networks", IEEE Transactions on Acoustics, Speech and Signal Processing, vol. ASSP-37, pp. 328-339.
[18] Wessberg J., Stambaugh C.R., Kralik J.D., Beck P.D., Laubach M., Chapin J.K., Kim J., Biggs S.J., Srinivasan M.A., Nicolelis M.A.L., 2000, "Real-time Prediction of Hand Trajectory by Ensembles of Cortical Neurons in Primates", Nature, vol. 408, pp. 361-365.
[19] Whitley D., "A Genetic Algorithm Tutorial".
[20] Yao X., 1993, "A Review of Evolutionary Artificial Neural Networks", International Journal of Intelligent Systems, vol. 8, pp. 539-567.
[21] Xenos Th.P., 1997, Algebra and Analytic Geometry (in Greek), Ziti Publications.


Appendix: C code

Continued on next page.