
Self-supervised learning for a miniature robot

Fredrik Lindqvist, [email protected]

October 1, 2003

Abstract

This thesis examines the possibility of having a miniature robot learn and adapt to its environment so that it may perform complex behaviors. With the system described here, a Khepera robot is able to find and move through a door while avoiding obstacles and walls. A neural network combined with a self-supervised learning algorithm makes this possible; a Q-learning approach supplies the utility values required for training the network. As a result the network can learn within a matter of seconds and needs as few as 30 training examples.


Contents

1 Introduction
   1.1 Why use learning robots?
   1.2 Summary of results

2 The Khepera robot
   2.1 Basics
   2.2 IR-sensors
   2.3 Camera module
   2.4 Training setup

3 Neural Networks
   3.1 Biological model
   3.2 Computer model
   3.3 Training Neural Networks
      3.3.1 Backpropagation algorithm
      3.3.2 Resilient Backpropagation

4 Learning
   4.1 Unsupervised learning
   4.2 Supervised learning
   4.3 Self-supervised learning
   4.4 Reinforcement learning
      4.4.1 Q-learning

5 Robots
   5.1 Classical AI robotics
   5.2 Behavior-based robots
   5.3 Braitenberg vehicles

6 Neural Networks in robots
   6.1 Case study: ALVINN
      6.1.1 Introduction
      6.1.2 Image preprocessing
      6.1.3 Architecture
   6.2 Case study: ALDER
      6.2.1 Introduction
      6.2.2 Control structure
      6.2.3 Experiments

7 Problem Specification

Fredrik Lindqvist 2 Master Thesis


8 Implementation
   8.1 The environment
   8.2 Image preprocessing
   8.3 Training
      8.3.1 Data gathering
      8.3.2 Utility values
      8.3.3 Training the neural net
      8.3.4 Neural network implementation
   8.4 Robot control
   8.5 Sequencer
   8.6 Behaviors
      8.6.1 Using the Neural Network
      8.6.2 Aim toward door
      8.6.3 Move toward door
      8.6.4 Avoid obstacle
      8.6.5 Special Case: Annoyed behavior
   8.7 Results

9 Conclusions

10 Encountered problems

11 Future work

12 Acknowledgements


List of Figures

1 Khepera base unit
2 IR-sensor placement
3 Reflection values
4 Output from IR-sensors
5 Camera image
6 Khepera with camera turret
7 Training setup
8 Biological neuron
9 Artificial neuron
10 Tansig function
11 Reinforcement learning
12 Behavior-based architecture
13 Braitenberg vehicle
14 ALVINN car
15 ALVINN image transformation
16 ALVINN network design
17 ALDER architecture
18 Khepera
19 Example image
20 Example image
21 Example image
22 Example image
23 Neural Network
24 Network training
25 Sequencer
26 Robot during execution
27 Robot during execution
28 Robot during execution
29 Robot during execution
30 Robot during execution


1 Introduction

Given a small miniature robot equipped with a camera and eight IR-sensors, the purpose of the project was to create a control program, based on a neural network, capable of learning different behaviors. The network had to be trained with a self-supervised algorithm, which meant that a method of determining the value, or utility, of actions had to be created. Furthermore, several minor programs had to be developed, such as an image-preprocessing algorithm and a sequencer to choose between different behaviors. It was decided that the robot would be required to:

1. Avoid walls and obstacles

2. Aim at the center of a "door"

3. Move toward and through a "door"

For a full specification, see Section 7.

1.1 Why use learning robots?

Why is it important that robots have the ability to learn? Learning is often viewed as the very essence of an intelligent system; without this ability, there would be no intelligence at all [Ark98]. A robot that can learn and adapt to its environment gains in lifespan and usability. If the environment, the task, or even the robot itself should change, all that is required is retraining instead of reprogramming [Neh00]. Programming a robot is something only experts with the right tools can do, but training is often easy and takes only a few minutes. Some designs do not even require training: in Section 6.2 we will see a robot that learns behaviors by itself. All it needs is a given set of rules, and even if the environment changes it only requires a couple of seconds to adapt.

So if we want future robots that can adapt to different environments and perform different tasks, some kind of training ability is fundamental.

1.2 Summary of results

The work on the Khepera robot can be divided into several parts. Part one was to create a set of drivers providing a safe and stable platform for communicating with the robot and handling its responses in a simple way. Much of this work had already been done in an earlier course focused on the Khepera robot [Lin00].


After that, several helper functions were created for handling the sensors, moving the robot, and performing basic tasks. Later, several small programs had to be written to simplify handling of the neural network. Preprocessing of data was an important part of the work: it was quickly decided that handling all the inputs from the camera was unnecessary, so a function to reduce the data to what was needed was created; for more information see Section 8.2.

With the help of data-gathering functions, neural-network training algorithms, and image preprocessing, the real work of implementing a neural-network-based robot-control program could begin. Several control designs were tried and cast aside until the sequencer described in Section 8.5 was found to work as desired. A final program was made that implemented all the behaviors and chose the correct behavior depending on the current situation.

The final design was found to have some flaws that could leave it unable to move from certain positions, so a fourth behavior was added: the "annoyed" behavior, see Section 8.6.5 for more information. After including this, the robot performed as expected. In good conditions it had no problem completing the goal of driving through the door. When it saw no door it explored the arena, looking for the door while avoiding obstacles.

As explained in Sections 2 and 10, the robot itself proved to have several hardware-related problems that the solution had to be adjusted to. With a more advanced robot, a more advanced program would have been possible.


Figure 1: The base unit of a Khepera robot

2 The Khepera robot

The Khepera robot was designed by K-Team as a research and teaching tool for universities [Kt99b]. By connecting the robot with a cable to a computer's serial port, communication between robot and computer is possible.

Currently there is a newer model of the Khepera out ("Khepera 2"), but during this project the older version was used.

Technical specifications of the Khepera:

• 16 MHz 32-bit CPU, MC68331

• 256 Kb RAM

• 256 Kb ROM

• Diameter: 55 mm

• Height: 30 mm

• 2 motors for motion and steering

• Max speed: 80 mm/s

• 8 infrared (IR) proximity sensors


2.1 Basics

A Khepera robot is 55 mm in diameter and, in its basic configuration, only 30 mm high, see Figure 1. For movement there are two wheels which can be set to individual speeds. The base unit is equipped with eight IR-sensors that can detect nearby walls and objects. If the user requires more interaction with the surroundings, an additional module can be mounted on top of the Khepera. Available modules are [Kt99b]:

• A gripper to move objects

• Video turret

• One dimensional vision turret

• Matrix vision, two dimensional

• General I/O turret

• Radio communication turret

2.2 IR-sensors

Eight IR-sensors are placed around the robot, see Figure 2. They provide a way to detect and avoid walls. The sensors are of the type Siemens SFH 900 and under optimal circumstances they can detect an obstacle at 5 cm [Sie96].

A sensor's ability to detect an object depends on the material, the ambient light, and how reflective the material is. For example, a white aluminum wall was easily detected from several centimeters away, while the robot had a hard time detecting black paper at all. See Figure 3.

The output from a sensor is in the range 0 to 1024. However, due to the poor quality of the sensors, most responses were either at the maximum or at zero. As a test, the robot was placed standing still with walls approximately one centimeter to the right, left, and front. For 20 seconds the outputs from the IR-sensors were gathered and plotted in a graph, see Figure 4. Since the robot was standing still, each sensor should have reported a single constant value, but the sensors returned several different values with a spread of as much as 170 units, a difference of about 17 %.
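The spread arithmetic can be checked with a few lines. The project itself was programmed in MatLab; this is a Python sketch for illustration only, and the sample readings are invented:

```python
def sensor_spread(samples, full_scale=1024):
    """Spread of repeated readings from one stationary sensor,
    in raw units and as a percentage of full scale."""
    spread = max(samples) - min(samples)
    return spread, 100.0 * spread / full_scale

# Invented sample values: a spread of 170 units on the 0-1024 scale
# corresponds to the roughly 17 % quoted above.
units, percent = sensor_spread([850, 1020, 930, 1000, 880])
```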

2.3 Camera module

To provide more ways of interacting with the world, the Khepera robot is equipped with a K213 vision turret. The turret has a visual range of 36° in the horizontal plane.


Figure 2: The IR-sensors are placed around the robot

Figure 3: Reflection values from different materials


Figure 4: Output during 20 seconds from a robot standing still

Figure 5: An image from the K213 camera turret


Figure 6: Khepera robot with a camera turret

The maximum range of the camera is about 25 cm. The image is presented as 64x1 pixels with 256 grey levels per pixel [Kt99a]. A typical image from the camera can be seen in Figure 5. It is possible to get less detailed pictures from the camera: commands are available that return only 32 or 16 pixels, and the amount of information in each pixel can be lowered from 8 bits to 4 bits. This would increase the communication speed since less data would be sent, but in this project all available information was needed, so the settings were kept at their default values.
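The K213 performs this data reduction on board; the Python sketch below only mirrors the arithmetic involved, and the neighbour-averaging scheme for the pixel reduction is an assumption of the sketch, not taken from the turret's documentation:

```python
def downsample(scan, target):
    """Average neighbouring pixels of a 64-pixel line scan down to
    `target` pixels (e.g. 32 or 16)."""
    step = len(scan) // target
    return [sum(scan[i:i + step]) // step
            for i in range(0, len(scan), step)]

def quantize_to_4bit(scan):
    """Reduce 8-bit grey levels (0-255) to 4-bit (0-15) by dropping
    the four least significant bits."""
    return [p >> 4 for p in scan]
```

Either reduction halves (or quarters) the bytes sent over the serial link, which is the speed gain mentioned above.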

Originally the turret automatically adjusts the camera to the ambient light, but that proved to be slow and gave different scan times depending on how much light was available. Therefore the robot was modified with an LED placed over the light sensor [Mui02]. To ensure images with good accuracy, the robot had to move at low speed or, preferably, not at all. With the camera module attached, the robot's height increases to about 60 mm, see Figure 6.


Figure 7: A typical training setup

2.4 Training setup

The Khepera robot has an onboard CPU and RAM, which makes it possible to download and run programs on the robot itself. However, that is very inconvenient during development since debugging is difficult. Instead, a cable was connected between the robot and the serial port of a PC. With drivers developed for this purpose it was possible to program in MatLab and send data to and receive data from the robot. See Figure 7 for a typical setup.
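As an illustration of what such drivers do at the protocol level, here is a hedged Python sketch (the thesis drivers were written in MatLab). The `D,left,right` motor command and the `n,...`-style reply to the proximity-read command follow the Khepera ASCII serial protocol as the K-Team manual [Kt99b] describes it; treat the exact syntax as an assumption and consult the manual:

```python
def set_speed_command(left, right):
    """Build the ASCII command that sets the two wheel speeds.
    The 'D,left,right' form is assumed from the K-Team manual."""
    return "D,%d,%d\n" % (left, right)

def parse_proximity_reply(reply):
    """Parse the reply to a proximity-sensor read,
    e.g. 'n,123,0,...' -> a list of eight integers."""
    return [int(v) for v in reply.strip().split(",")[1:]]
```

Wrapping command construction and reply parsing like this is what makes the rest of the control code independent of the raw serial traffic.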


3 Neural Networks

A neural network (NN) is a mathematical system with the ability to find meaning in complicated and sometimes noisy data. Furthermore, it can extract patterns that are often impossible for a human viewer to find. Quite often a well-trained NN is used as an expert system to help people make the right decision. For example, there exist several neural networks that can assist doctors in diagnosing patients with breast cancer [Rip98].

Other advantages of neural networks include [RN95]:

• Adaptive learning: A NN can learn how to perform different tasks based on a set of training data.

• Self-organizing: A NN can create its own organization or representation of the information it receives during training.

• Fault tolerant: NNs generally degrade gracefully, which means that if a part of the network is lost, the network will still work and only lose part of its abilities.

In real life, neural networks are used in many different areas; cancer diagnosis has already been mentioned. Neural networks are often used whenever there is a need for prediction or forecasting [Ste03]. Examples:

• Sales forecasting

• Industrial process control

• Customer research

• Data validation

• Risk management

• Target marketing

• Recognition of speakers in communications

• Diagnosis of hepatitis

• Recovery of telecommunications from faulty software

• Interpretation of multi-meaning Chinese words


Figure 8: A neuron from a human brain

• Undersea mine detection

• Texture analysis

• Three-dimensional object recognition

• Handwritten word recognition

• Facial recognition

• Data compression

3.1 Biological model

Neural networks are inspired by the brain, so it can be useful to make a comparison between the brain and a computer implementation of a neural network. It is still unclear exactly how the brain works. We do know that it is responsible for thoughts, actions, memories, and much more. The brain consists of billions of nerve cells, or neurons as they are called. A neuron consists of dendrites, synapses, and an axon. The axon is a single strand of fiber that stretches out; at the end it divides into several smaller fibers, which connect with dendrites from other nearby cell bodies. The


Figure 9: The computer model of a neuron, a perceptron

connection point is known as a synapse, see Figure 8 [RN95]. A signal starts at one cell and branches out through the dendrites to connected cells. The synapses can either increase or decrease the signal. When a cell body receives one or more signals from its dendrites, it measures the total strength of the signal; if it is above a threshold, the cell sends a signal forward along its axon to other cells, causing the whole event to start again in several other cells [Gur97]. The synapses are not static; they can change over time, which is what scientists believe is the basis for learning in the brain. When millions of simple cells work together like this, we end up with an intelligent system. And as Russell says: "Even though a computer is a million times faster in raw switching speed, the brain ends up being a billion times faster at what it does" [RN95], p. 566.

3.2 Computer model

A neural network is not meant to be a model of the human brain; the brain is far too complex to mimic in any simple manner. Instead, a neural network can be said to be inspired by the way the brain works.

The most basic version of a neural network is the single artificial neuron introduced by McCulloch and Pitts in 1943 (see [MP43]). It consists of a number of inputs with weights associated with them, one output, and an activation function, see Figure 9. The inputs are all boolean, i.e. 1 or 0, and the activation function is a simple threshold function. In this sense it is very similar to the workings of a brain cell. Such a unit is also known as a Threshold Logic Unit, TLU [Gur97].


Later, networks grew in size and changed design. Binary networks proved too limited, so activation functions changed to sigmoidal functions and inputs could take any value, which meant the outputs became continuous. The output of a general neural network unit is calculated step by step:

• Each input is multiplied by the weight associated with it.

• The results are added together.

• The sum is passed through an activation function.

• The output of the unit is the result of that function.

With changes to the weights and the activation function, we can achieve different behaviors.

A perceptron can represent several boolean functions, such as the common AND, OR, and NOT. For a perceptron to be able to represent a function, the function must be linearly separable. This proved to be a big problem, which Minsky and Papert pointed out in their book "Perceptrons" [MP69] by showing that the logic function XOR cannot be implemented in a perceptron. This caused a decline in all work on perceptrons until multilayered neural networks were introduced and Cybenko showed that adding a single hidden layer makes it possible to approximate any continuous function [RN95].
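The unit described above can be sketched in a few lines (Python for illustration; the weights and thresholds are chosen by hand). AND and OR are linearly separable, so a single unit represents each; no choice of weights and threshold reproduces XOR:

```python
def tlu(inputs, weights, threshold):
    """Threshold Logic Unit: weighted sum followed by a hard threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

def AND(a, b):
    # Fires only when both inputs are 1.
    return tlu([a, b], [1, 1], threshold=2)

def OR(a, b):
    # Fires when at least one input is 1.
    return tlu([a, b], [1, 1], threshold=1)
```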

3.3 Training Neural Networks

3.3.1 Backpropagation algorithm

The backpropagation (BP) algorithm is a way to learn in multilayer neural networks. It was actually first discovered by Bryson and Ho in 1969, but was ignored until the 1980s. All learning algorithms try to minimize the error of the network's output by changing weights. With a single-layer network this is easy: we can directly see the value of the weights and how much each contributes to the error. But in a multilayered network there is no easy way to assign blame to the weights between the input and the hidden layers, so it becomes more difficult to change the weights. With BP, the network divides the error back among the connected neurons so that weights are changed according to how much they contributed to the error. During training with BP, the network performs a gradient-descent search to try to find the weights that minimize the error. There are some problems with this form of training; the biggest is the possibility of getting caught in a local minimum [CF03]. For a more detailed description of the algorithm, see [Zur95].
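A toy illustration of how the error is divided back and the weights moved by gradient descent. This is a didactic Python sketch, not the thesis's MatLab implementation; the network size (2 inputs, 1 hidden unit, 1 output), learning rate, and data are invented:

```python
import math

def train_step(x, target, w_hid, w_out, lr=0.5):
    """One backpropagation step for a toy 2-input, 1-hidden-unit,
    1-output network with tanh activations and squared error."""
    # forward pass
    h = math.tanh(sum(xi * wi for xi, wi in zip(x, w_hid)))
    y = math.tanh(h * w_out)
    # backward pass: the output error is divided back toward the inputs
    err = target - y
    delta_out = err * (1 - y * y)                 # tanh'(v) = 1 - tanh(v)^2
    delta_hid = delta_out * w_out * (1 - h * h)   # hidden unit's share of the error
    # gradient-descent weight updates
    w_out = w_out + lr * delta_out * h
    w_hid = [wi + lr * delta_hid * xi for wi, xi in zip(w_hid, x)]
    return w_hid, w_out, err
```

Repeating the step on the same example shrinks the error, which is all gradient descent promises; on harder problems it may still settle in a local minimum, as noted above.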


Figure 10: The graph of the tansig function

3.3.2 Resilient Backpropagation

Typically, multilayer networks have a sigmoidal function as their transfer function. In this project a tansig function has been used, see Figure 10. Sigmoidal functions are often referred to as squashing functions since that is exactly what they do: they compress an infinite input range into a finite output range. This means that the slope of the function approaches zero as the input gets very large; in Figure 10 it is apparent that the derivative approaches zero as the input goes to infinity. The problem is that when the gradient has a small absolute value, only small changes are made to the weights, even when the weights are far from optimal.

With resilient backpropagation, training considers only the signs of the derivatives; the magnitude of the derivative has no effect at all on the weight change. Resilient backpropagation training is often much faster than standard BP training [DB98].
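The sign-only step-size rule can be sketched as follows (a Python illustration; the function name is invented and the constants are the defaults commonly quoted in the literature, assumed here rather than taken from the thesis):

```python
def rprop_step_size(sign_now, sign_prev, step,
                    eta_plus=1.2, eta_minus=0.5,
                    step_min=1e-6, step_max=50.0):
    """Resilient backpropagation adapts a per-weight step size from the
    *sign* of the error derivative only. Same sign as last time: grow
    the step. Opposite sign (a minimum was overshot): shrink it."""
    if sign_now * sign_prev > 0:
        return min(step * eta_plus, step_max)
    if sign_now * sign_prev < 0:
        return max(step * eta_minus, step_min)
    return step

# The weight then moves by -sign(derivative) * step; the derivative's
# magnitude never enters the update, so flat sigmoid tails cause no stall.
```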


4 Learning

Unless a system (be it a robot or just software) has complete knowledge of the environment in which it is supposed to operate, it needs some way to gather new knowledge. But even with full knowledge, the system will run into problems as soon as something changes. Since the change is something new, the system does not know how to relate to the changed environment and cannot perform as expected [Ark98]. Learning is a way to make sure we have a system that can adapt to changing conditions. It is the ability to use percepts and observations not just to interact with the world, but also to evaluate the system's own decision-making process [RN95].

4.1 Unsupervised learning

Unsupervised learning is used when there is no knowledge of how inputs pair with outputs; instead, we let the network find that structure itself. Often unsupervised learning is used as a way to group similar inputs together [Neh00]. When a network has stabilized after training, it will give similar outputs for inputs with similar characteristics. The network may even discover features that the user has never noticed. Kohonen's self-organizing maps are a good example of networks with this ability, see [Koh01].

4.2 Supervised learning

For every situation an artificially intelligent system can encounter, there are inputs and possible outputs. The inputs are the different actions that exist, e.g. move forward, brake, drop object, turn left. Every action gives some sort of output, e.g. bumped into a wall, entered a dark area, recharged battery. With supervised learning, the system knows what the best output is in a certain situation; often this knowledge comes from a human teacher who controls the robot. When there is a connection between input and output that the network can learn, this is known as supervised learning [RN95]. The backpropagation algorithm is probably the most widely used supervised learning algorithm; see Section 3.3.1 for more information.

4.3 Self-supervised learning

With self-supervised learning, the system has an internal critic that has knowledge of which actions would be good or bad. By consulting the critic, the


Figure 11: An example of how a reinforcement system works

system can evaluate several possible actions to determine which one will improve the situation the most. Self-supervised and reinforcement learning are very similar since both use an internal critic, but the difference is that with self-supervised learning the critic trains itself based on its internal programming, not according to an external reviewer [Bra02].

4.4 Reinforcement learning

According to Ronald Arkin, reinforcement learning is one of the most widely used methods for adapting robotic control systems ([Ark98], page 310). With this type of learning the system also uses a critic, as in self-supervised learning. The critic checks the outcome of the controller's actions and gives a reward or punishment based on that outcome. The critic has no idea what the optimal action would be, but it can determine whether an action was good or bad, see Figure 11. This approach stems from the law of effect, which psychologists have studied for more than 60 years [RN95]. Unlike self-supervised learning, this method means that the system usually does not perform the correct action directly, but on the other hand there is no need to know the correct action. A common problem with this type of learning is the credit assignment problem: if we have several behaviors working together, how do we determine which behavior caused the outcome of an action?

4.4.1 Q-learning

Q-learning is a form of reinforcement learning that tries to determine the expected utility of a certain action before it happens. Exactly how that


works depends on the situation. In his PhD thesis, Mark Humphrys [Hum97] creates a house-robot problem in which the robot learns from an estimate of what the outcome of an action will be. The estimate is produced by a neural network that is trained more and more as time progresses. The utility values for Q-learning could have been kept in a lookup table as well, but such a table would need to store all possible states combined with all possible actions. This would require a lot of memory, so it is often easier to train a neural network, which can generalize and learn similarities.

In this project the network uses data from two timesteps (at t and t+1) to establish a utility value. The network connects the utility of the situation at t+1 with the state and action at t and trains to remember that connection. During execution the robot then has good knowledge of what the outcome of a certain action will be, provided something similar happened during training. For more details regarding training, see Section 8.3.3.
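The underlying update can be illustrated in its standard tabular form (a Python sketch; the thesis replaces the table with a neural network, as discussed above, and the learning rate and discount values here are invented):

```python
def q_update(q, state, action, reward, next_state, actions,
             alpha=0.5, gamma=0.9):
    """One tabular Q-learning update: the utility of (state, action) at
    time t is moved toward the reward plus the discounted best utility
    reachable at t+1."""
    old = q.get((state, action), 0.0)
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q
```

Each entry couples what was done at t with what the situation was worth at t+1, which is exactly the two-timestep connection the network is trained to remember.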


5 Robots

In this section we will look more closely at the architecture of intelligent robots. The architecture, according to Russell and Norvig, ". . . defines how the job of generating actions from percepts is organized" [RN95], p. 786. In the beginning the approach was what is now called the classical approach, which involved planning and carefully observing the environment. Around 1985 a new concept of artificial intelligence arrived: Rodney Brooks presented a behavior-based architecture built on direct actions with little or no planning, and Braitenberg wrote about small intelligent robots inspired by insects.

5.1 Classical AI robotics

By the late 1960s, technology had advanced to the stage where vision systems were available that could analyze and recognize simple objects. Together with these tools and path-planning algorithms, the first intelligent robots emerged. One of the first was "Shakey", created at the Stanford Research Institute by Nilsson [Ark98]. It used the STRIPS system, which is based on first-order-logic theorem proving, for navigation. The approach was often called "sense-plan-act", since it in general followed this algorithm:

1. Observe the surrounding environment

2. Make an internal plan of the area, noting certain features of interest (such as doors)

3. Adapt the robot's plan (for example, to move through the door) to the environment

4. Execute the plan

5. If something changes, for example a person enters the room, the robot has to create a completely new plan

The classical approach to intelligent robots has two important characteristics: the ability to represent hierarchical structure by abstraction, and the use of "strong" knowledge [Ark98]. Strong knowledge means that the robot is able to map an object it sees to a real object. At the time, the idea was that knowledge and knowledge representation were central to intelligence and therefore had to be central to robotics as well [Ark98]. However, the researchers behind "Shakey" discovered several problems with their architecture [RN95]:


Figure 12: An example of a behavior-based architecture. Each of the boxes represents a behavior [Wer98]

• The theorem prover was too inefficient for non-trivial plans

• Integrating geometric and symbolic representations of the world was too difficult

• Plans fail due to errors

5.2 Behavior-based robots

In 1986, Rodney Brooks introduced the revolutionary idea that instead of decomposing a robot's architecture into functional components, such as vision, planning, and learning, it should be decomposed into behaviors. A behavior could be obstacle avoidance, wall-following, exploration, or light-seeking [RN95]. The behaviors are typically arranged in a hierarchical structure with the most basic actions at the bottom and the more advanced at the top, and the more basic behaviors take precedence over those above them. For example, the system in Figure 12 would not start to explore if it was about to bump into a wall.
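The precedence rule can be sketched as fixed-priority arbitration (a Python illustration in the spirit of Figure 12, not the thesis's own sequencer; the percept keys and behavior names are invented):

```python
def arbitrate(percepts, behaviors):
    """Behaviors are ordered from most basic to most advanced; the first
    one whose trigger fires wins, suppressing all behaviors above it."""
    for trigger, action in behaviors:
        if trigger(percepts):
            return action
    return "explore"  # default when nothing more basic applies

# Hypothetical behavior stack, most basic first.
behaviors = [
    (lambda p: p["near_obstacle"], "avoid"),
    (lambda p: p["sees_door"], "approach_door"),
]
```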

In his very influential paper, Brooks made several points about why a behavior-based approach is better than the "sense-plan-act" approach [Bro86]:

• Simplicity is a virtue


Figure 13: An example of a Braitenberg vehicle

• Complex behavior need not necessarily be the product of a complex control system

• Robustness in the presence of noisy or failing sensors is a design goal

• Planning is just a way to avoid figuring out what to do next

There is also a combination of the classical and behavior-based designs, called hybrid systems.

5.3 Braitenberg vehicles

Braitenberg vehicles are a model suggested by the neuroscientist Valentino Braitenberg in 1984. Instead of planning or deciding between behaviors, there is a direct connection between sensors and motors. It was a thought-provoking experiment for connectionists at the time: simple machines able to mimic human and animal behaviors [Eng02]. Braitenberg originally designed 12 vehicles, all with different behaviors, see Figure 13. A vehicle could, for example, be a wall-follower that always followed a wall on its left or right side. By combining sensors, such as light detectors with proximity sensors, two behaviors could work together. Whenever two actions collided, the conflict was solved with behavior fusion, meaning that the actions were combined into a third action that would hopefully work for the situation [Hel03]. Emergent behaviors were a frequent phenomenon: when the robots worked in a natural environment, the resulting behaviors were impossible to predict, so new and unplanned behaviors could occur. An example of an emergent behavior would be a light-seeking robot


that found two light sources; it would then start to move around both sources in a figure-eight shape [SL02].

In this project a version of Braitenberg's avoid obstacles behavior is used, see Section 8.6.4 for more information. The robot has a direct connection between the sensors and the motors, giving it a purely reactive behavior of the kind Braitenberg describes.
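As an illustration of this direct sensor-to-motor coupling, a minimal Braitenberg-style vehicle can be sketched in a few lines. All names and constants here are illustrative, not the thesis implementation (which is described in Section 8.6.4); the crossed wiring shown corresponds to a vehicle that turns toward a stimulus.

```python
# Minimal sketch of a Braitenberg-style vehicle: two sensors wired
# directly to two motors, with no planning layer in between.
# The base speed and gain are arbitrary illustration values.

def braitenberg_step(left_sensor, right_sensor, base=2.0, gain=0.1):
    """Map sensor readings directly to (left, right) wheel speeds."""
    left_motor = base + gain * right_sensor   # crossed connections: a strong
    right_motor = base + gain * left_sensor   # stimulus on one side speeds up
    return left_motor, right_motor            # the opposite wheel
```

With a stimulus on the left (left_sensor high), the right wheel speeds up and the vehicle turns left, toward the source.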


Figure 14: The car used for the ALVINN project [Pom95]

6 Neural Networks in robots

6.1 Case study: ALVINN

6.1.1 Introduction

Autonomous Land Vehicle In a Neural Network, or ALVINN for short, was a project to create a car that was able to drive on its own [Pom95]. The system consisted of a camera mounted on a Chevy van with the computer equipment loaded in the back, see Figure 14.

6.1.2 Image preprocessing

An important part of the system was the camera that fed inputs to the neural network. The percepts came from a camera capable of giving a 30x32 pixel image and/or a range finder with an output of 8x32 pixels. The image of the road and its immediate surroundings was then coded proportionally to the intensity of the color blue. Blue was found to give the highest contrast between road and non-road [Zur95]. A problem with teaching neural networks is to gather good and varied training examples. Since ALVINN typically would be trained by driving on a road, it was important to learn as quickly as possible. If the network was trained while driving on a long and straight stretch of road, it would perform poorly in curves. A possible solution could be to have the driver swerve. That would give the network training examples of the driver recovering from errors, but it would also teach the network to imitate the driver's swerving [Pom95].

The solution was to geometrically transform every image to produce different images in which the car would appear to be in a different position, see


Figure 15: A view of the car with both the original image and a transformed image

Figure 15. Since the part of the road that the camera sees is more or less flat, it was possible to transform and rotate the image. The camera gives a trapezoidal view of the road, and with several image manipulation techniques they produced an image in which it seemed like the car had been turned some degrees left or right. The transformed images would have some missing pixels due to the transformation; those were filled with an extrapolation scheme that found a corresponding pixel parallel to the road [Pom95].

This gave the network many examples to train on for each image provided by the camera. Through testing it was found that 14 training examples per sensor input were enough. The transformed images were shifted at random by up to ±0.6 meters and ±6°.

6.1.3 Architecture

After an image had been preprocessed it was presented to the network as 960 inputs. The network had a hidden layer with 29 units featuring a signed squashing function. The hidden layer is then connected to an output layer


Figure 16: The basic design of ALVINN neural network


Figure 17: The design of the ALDER robot

of 46 units. The outputs indicate how the car should turn. If the unit in the middle has the highest output, the car should drive straight; units to the left and right indicate a sharp turn left and right, respectively. The network can be seen in Figure 16. The overall design has proven very effective; as Zurada writes: "It would take many months of algorithm development and parameter tuning [. . . ] to develop a conventional autonomous driver of comparable efficiency" [Zur95], p. 9-10. A final testament to how successful ALVINN was is the "No Hands Across America" project [NOH96], in which two researchers drove from Pittsburgh, PA to San Diego, CA in a car using the design from ALVINN. The distance traveled was over 5000 kilometers at an average speed of 90 km/h, and during this time the computer system was in control of the driving 98.2% of the time [NOH96].

6.2 Case study: ALDER

6.2.1 Introduction

In 1989 Ulrich Nehmzow conducted experiments with a mobile robot that was able to learn through trial and error with a self-supervising process. In merely a few learning steps the robot was able to follow certain rules and adapt to a changing environment without human intervention [Neh00]. The program behind the robot consists of a neural network that associates input


patterns with outputs. Together with a monitor and a move selector the robot makes its moves, see Figure 17. To define a desired behavior Nehmzow uses something he refers to as "instinct rules". These rules are not behaviors themselves but a set of constants that correspond to some sensor state. A set of instinct rules is then used to make the robot learn different behaviors. With each rule there is a dedicated sensor that decides whether the rule is violated or not. The resulting behavior of the robot is to perform actions that make sure the rules are not violated.

6.2.2 Control structure

The pattern associator is a perceptron with the sensors as inputs and possible motor actions as outputs. From a certain sensor state the perceptron yields a vector of outputs, each representing a utility. The move selector chooses the action corresponding to the output with the highest value. Meanwhile a monitor examines the new state, and if an instinct rule is still violated the action with the second highest rating will be performed. This continues until an action is found that doesn't violate any rules. The monitor will then teach the pattern associator to associate the sensor state with the successful motor action.
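The selection loop described above can be sketched as follows. This is a simplification under stated assumptions: the predicted utilities stand in for the pattern associator's outputs (here a plain dictionary), violates stands in for the monitor's instinct-rule check, and the teaching step is reduced to reinforcing a table entry rather than training a perceptron.

```python
# Sketch of ALDER's trial-and-error loop: try actions in order of
# predicted utility until the monitor finds one that does not violate
# an instinct rule, then reinforce that state-action pairing.

def select_and_learn(utilities, violates, associator):
    """utilities: action -> predicted utility; violates(action) -> bool."""
    for action in sorted(utilities, key=utilities.get, reverse=True):
        if not violates(action):           # monitor approves this action
            associator[action] = associator.get(action, 0) + 1  # reinforce
            return action
    return None                            # no rule-satisfying action found
```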

6.2.3 Experiments

All behaviors have their origin in the instinct rules, so the user defines the rules, and according to Nehmzow a new behavior will be learned in less than 20 learning steps.

Examples of instinct rules ([Neh00]):

• Keep the forward motion sensor ’on’ at all times

• Keep whiskers (touch sensors) ’off’ at all times

• Do touch something every four seconds

• The whiskers that were touched last time must not be touched this time

A combination of the rules above creates different results. If you, for example, want to create a corridor-following robot, all the rules above would be used. Experiments performed by Nehmzow with different robots and different sensors show that a simple obstacle avoidance behavior (first and second rule) is learned in under 10 learning steps [Neh00].


When the environment or the design of the robot changes, the pattern associator will quickly learn to adapt to the new situation. The test robots used were equipped with whiskers that detected walls and obstacles. In one experiment these whiskers were swapped, and it only took between four and six learning steps for the robot to change the pattern associator and regain its obstacle avoidance behavior.


7 Problem Specification

For a Khepera robot a reactive behavior can be described with two functions f and g:

Ml(t) = f(s(t))
Mr(t) = g(s(t))

where Ml(t) and Mr(t) are the velocity commands for the left and right wheels at time t, and s(t) is the sensor data measured at time t. To learn the functions, supervised learning with a teacher would normally be used. The teacher could be either a human or a program, and would provide the correct outputs for a large number of situations characterized by the sensor signals s.

An alternative way of making the robot learn a correct behavior is the following: let a neural network u = h(s, Ml, Mr) represent the utility u of the situation the robot would end up in if, in situation s, it performed the action (Ml, Mr). By designing the measure of utility, the robot can be made to learn different behaviors. For example, by giving high utility to situations with high speed and activated sensors on just one side of the robot, we would end up with a wall-following robot.

The function h can be taught by, at time t, letting the robot choose, at random or according to a schedule, an action (Ml(t), Mr(t)) which is then applied to the motors. After a timestep the utility u(t + 1) is measured. Training data of the form [s(t), Ml(t), Mr(t)] −→ u(t + 1) can then be automatically generated and used to train the network h.

A trained network is then used in the following way: for a given sensor vector s(t), the values (Ml, Mr) which maximize h(s(t), Ml, Mr) are calculated. These values are chosen as the control values, i.e.

(Ml(t), Mr(t)) = argmax_(Ml, Mr) h(s(t), Ml, Mr)
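This control step can be sketched as follows, assuming a finite candidate set of wheel-speed pairs and a toy stand-in for the trained utility network h; both are illustrative assumptions, not the thesis model.

```python
# Choose the (Ml, Mr) pair that maximizes the utility network h(s, Ml, Mr).
# The candidate grid and toy_h are illustrative: the toy utility simply
# prefers fast, straight driving.

def select_action(h, s, candidates):
    """Return the action (Ml, Mr) maximizing h(s, Ml, Mr)."""
    return max(candidates, key=lambda a: h(s, a[0], a[1]))

def toy_h(s, ml, mr):
    return ml + mr - abs(ml - mr)  # reward speed, punish turning

candidates = [(ml, mr) for ml in (-5, 0, 5) for mr in (-5, 0, 5)]
best = select_action(toy_h, s=None, candidates=candidates)  # (5, 5)
```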

Through discussion it was decided that a good goal for the robot would be to find and move toward a simulated door, all while avoiding obstacles along the way. For this, the utility function must give a high utility when the robot is looking directly at the door. That is, a small difference between the middle of the door and the middle of the camera image means the robot is performing well and should be rewarded with a high utility.


Figure 18: Khepera robot in its working environment

8 Implementation

The architecture for the robot consisted of three layers of functions. At the lowest level were basic functions to open and close a communication port to the Khepera robot and to send and receive commands. These functions had been created earlier and were coded in C. The communication programs only performed the most basic duties of sending and receiving, so one of the first tasks was to create a layer of drivers that performed some kind of error control. As the final layer the neural network program was created. It used all the functionality of the programs below, making it very robust and able to handle errors in a safe way.

8.1 The environment

The robot works in an arena of about 50 x 70 centimeters. The entire arena is made of aluminum with a black bottom and white-painted walls, see Figure 18 for an overview. The colors of the arena are important to get the best possible response from the Khepera robot. If the bottom had been white, more light would have been reflected, which would have caused the IR-sensors to react differently. The simulated door consisted of two plastic blocks with magnets underneath. These blocks were placed so that they formed an opening of about 10 centimeters. The ends facing the opening were painted black. Black was fairly easy for the robot to see against the otherwise white world. During discussion it was decided that this did not simplify the problem too much and


Figure 19: Output image from example

that it in fact simulated the real world, since most doors have a frame around them with a different color than the walls. To provide more light for the experiment, an ordinary table lamp was placed directly above the arena.

8.2 Image preprocessing

As explained in Section 2.3 the output from the camera is 0 ≤ p(i) ≤ 255, where p(i) denotes a pixel and i = (0...64). To make the neural network as small as possible while still maintaining all the important information, several steps of preprocessing had to be performed.

1. Differentiate pixels. To make sure that changing light conditions did not influence the experiments, the image output was differentiated. That is, p(1) = (p(2) − p(1)), p(2) = (p(3) − p(2)), and so on. This typically leaves a vector where −150 ≤ p(i) ≤ 150. Since the doorposts were black and all surrounding walls white, the posts would be very easy to spot.

2. Find pixels where p(i) ≤ −40. If the robot is seeing the doorposts, there will be two pixels with a very low difference, since the image changes from white to black.

3. Find pixels where p(i) ≥ 40. As in the previous step, this will find where the doorposts end.

4. Pair the first pixel from step 2 with the first pixel found in step 3. This gives the first doorpost.

5. Pair the second pixel from step 2 with the second pixel found in step 3. This gives the second doorpost.

6. To get the middle of the first doorpost, find the middle of the two pixels from step 4.


7. To get the middle of the second doorpost, find the middle of the two pixels from step 5.

The last two steps give two numbers between 0 and 64 that we use to determine where the door is1. In the steps above it is assumed that both doorposts are visible, but the algorithm works just as well if only one, or none, is found. In the case that just one doorpost is found, the second value will be zero.

Example: The robot is about 30 cm from the doorposts, looking at them from straight ahead. This gives the following numerical result:

240 240 241 247 253 253 253 253 253 254 252 253 253

253 254 253 253 252 254 80 67 71 69 97 226 240

240 233 240 231 231 229 232 227 225 224 224 224 215

220 212 208 208 198 193 176 86 65 80 73 96 192

250 249 240 231 226 230 224 213 198 196 192 176

The numbers are the pixels from left to right. If we plot this, using the individual values as grey levels, we get Figure 19.

Following the algorithm above we differentiate the vector and get:

0 1 6 6 0 0 0 0 1 -2 1 0 0

1 -1 0 -1 2 -174 -13 4 -2 28 129 14 0

-7 7 -9 0 -2 3 -5 -2 -1 0 0 -9 5

-8 -4 0 -10 -5 -17 -90 -21 15 -7 23 96 58

-1 -9 -9 -5 4 -6 -11 -15 -2 -4 -16

Even now it is quite clear where the doorposts are located: there are four values that indicate a big difference. The indices of those pixels are 19, 24, 46 and 51. Pairing them together, we see that the first doorpost starts at pixel 19 and ends at 24. Doing the same thing with the second doorpost and finding the middle values, we get the result 22 and 49. These are the values we use as sensor inputs to the neural network. In this example the robot saw both doorposts, but if only the one to the left had been visible, the end result would have been [22, 0]. Since there is no foolproof way of knowing whether the one doorpost we see is the left or the right one, the program always sets the first value to the index. Therefore, if we had seen only the right one, we would end up with the result [49, 0].

From an image with 64 pixels we are able to extract all the useful information and narrow it down to just two numbers. This means a compression of 97% with almost no loss of information.

1 Actually the values can only be between 2 and 62, since the algorithm needs at least one white pixel before/after a doorpost.
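The preprocessing steps above can be sketched as follows. The thresholds (±40) and 1-based diff indices follow the text; rounding the midpoints half up is an assumption, chosen so that the code reproduces the worked example. Extra positive edges (such as the 58 in the example) are discarded by the first-with-first pairing of steps 4 and 5.

```python
# Sketch of the doorpost-detection preprocessing (Section 8.2).

def find_door(pixels, neg_thresh=-40, pos_thresh=40):
    """Reduce a 64-pixel scanline to the midpoints of up to two doorposts."""
    # Step 1: differentiate neighbouring pixels.
    diffs = [pixels[i + 1] - pixels[i] for i in range(len(pixels) - 1)]
    # Steps 2-3: 1-based indices of large negative / positive jumps.
    starts = [i + 1 for i, d in enumerate(diffs) if d <= neg_thresh]
    ends = [i + 1 for i, d in enumerate(diffs) if d >= pos_thresh]
    # Steps 4-7: pair the k-th start with the k-th end, take the midpoint.
    mids = [(s + e + 1) // 2 for s, e in zip(starts, ends)]  # round half up
    mids = mids[:2] + [0] * (2 - len(mids))  # pad with 0 if fewer than two posts
    return mids[0], mids[1]
```

Running find_door on the 64 example pixels above returns (22, 49).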


8.3 Training

8.3.1 Data gathering

The training data had to resemble the way the robot would work in the experiment, so a basic program was created. First the user had to manually aim the robot so that it would barely see the left doorpost. The program would then start and turn the robot to the left at a fixed turn rate. After a set time it would reverse its turn and rotate back to the original position. During this whole procedure data was collected. The data consisted of:

• The image after preprocessing

• The action made by the robot, i.e. turning left/right.

It would have been possible to use an approach similar to what Pomerleau did in ALVINN (see Section 6.1 on page 26) and just have the robot turn from left to right and mirror the result to mimic the movement from right to left. However, due to errors in the Khepera robot, with wheels slipping and odometers not being exact, it was best to have the robot perform the complete movement. Since the robot needs to know when to move forward instead of turning, the first part of the training data was copied and used to determine the utility values of forward movements.

8.3.2 Utility values

In this experiment the utility tells the neural network how well the robot is aimed toward the door. If an action makes the robot better aimed, the robot is more likely to choose that action. The utility values are used during training, and they are important since they determine the output of the net: the neural network is trained so that its output is similar to the utility values for the input.

The algorithm for determining the utility value u has three cases. From the camera we get the preprocessed image [x, y], where x and y are the indices of the doorposts.

• If x = y = 0, the robot sees no part of the door; set u = 0

• If 0 < x ≤ 64 and y = 0, the robot sees one part of the goal; set u = 1

• If 0 < x, y ≤ 64, the robot sees both parts of the door:

1. Set z1 = pmax/2, where pmax is the maximum number of pixels used in the camera.


2. Set z2 = (x + y)/2

3. Set z3 = |z1 − z2|

4. If z3 < 1, set z3 = 1

5. The utility value is set to u = 2 + 2/z3

6. u will vary from 2 to 4, depending on how well aimed the robot is

This makes sure that an image with one doorpost is more attractive than one without any doorpost at all, and that an image with two doorposts is more attractive than one with a single doorpost. The utility values of 0, 1 and 2-4, combined with the training, make sure the robot chooses the action that brings its aim closer to the center of the door.
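The three cases can be written out directly. The reconstruction u = 2 + 2/z3 is used for the final step, which gives u = 4 when the door is centered and values approaching 2 otherwise:

```python
# Sketch of the utility computation from Section 8.3.2, with pmax = 64.

def utility(x, y, pmax=64):
    """Utility of a preprocessed image [x, y] (doorpost indices, 0 = not seen)."""
    if x == 0 and y == 0:
        return 0.0             # no part of the door is visible
    if y == 0:
        return 1.0             # only one doorpost is visible
    z1 = pmax / 2              # center of the camera image
    z2 = (x + y) / 2           # center of the door
    z3 = max(abs(z1 - z2), 1)  # distance, clamped below at 1
    return 2 + 2 / z3          # 4 when centered, approaching 2 when not
```

For the worked example [22, 49] this gives z3 = 3.5 and u ≈ 2.57.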

8.3.3 Training the neural net

The basic design can be described as a combination of Q-learning and self-supervised learning. It uses the idea from Q-learning of deciding actions by predicting future states, while doing all training by itself as described in Section 4.3.

With the aid of helper programs, training data was gathered. Typically a set of 30 states would be used. The first state represented what the robot saw at time t and the action a(t) it was about to perform. The following state at time t + 1 would then contain information about the outcome of action a(t). By calculating the utility value of state t + 1, the network is able to learn what the outcome of each encountered state will be.

During training, the network was presented with the following data:

• Input:

– The preprocessed image at time t

– The action about to be performed, a(t)

• Target:

– The utility value of the situation at time t+ 1

Example: At time t we have the situation in Figure 20. The robot is about to execute the action turn left. To assign the correct utility for this action we need to look at the image at the next timestep. At t + 1 we have the image in Figure 21. The robot has improved its situation, since it more or less has the door straight ahead. This means that the situation at time t is given a high utility value.


Figure 20: Camera image at time t

Figure 21: Camera image at time t+ 1

At the following timestep the robot is about to turn right. We determine the utility value by looking at the coming image, Figure 22.

Figure 22: Camera image at time t+ 2

It is clear that the situation is worse, so we punish the robot by giving a low utility value to the situation at t + 1. Still, it is important to give the robot some utility, since it can still see at least one part of the door; if the previous situation had been that the robot didn't see anything at all, this would have meant an improvement. Also note that the doorpost at the far left of the picture is not seen well enough to be identified as part of the door.

During training the action of not turning at all is examined, but during execution the action of moving forward is used instead. In this case the images at t and t + 1 would be the same, and the situation at t would be given a medium utility value.


Figure 23: The neural network used in the final design

All training data is analyzed in this manner, and the resulting utility values and training sets are used to train the neural network. The network will also generalize, so that it is able to estimate a reasonable answer for situations that weren't encountered during the training phase.
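The pairing of each state with the utility of its successor can be sketched like this. The log format is an assumed simplification, and the utility function restates the Section 8.3.2 formula:

```python
# Sketch of assembling training pairs from a logged run (Section 8.3.3):
# the input is the preprocessed image and the action at time t, and the
# target is the utility of the state at t + 1.

def utility(x, y, pmax=64):
    if x == 0 and y == 0:
        return 0.0
    if y == 0:
        return 1.0
    z3 = max(abs(pmax / 2 - (x + y) / 2), 1)
    return 2 + 2 / z3

def build_training_set(log):
    """log: list of ((x, y), action) tuples in time order."""
    data = []
    for ((x, y), action), ((nx, ny), _) in zip(log, log[1:]):
        inputs = (x, y, action)     # state and action at time t
        target = utility(nx, ny)    # utility of the outcome at t + 1
        data.append((inputs, target))
    return data
```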

8.3.4 Neural network implementation

After trying several different variants, the final network had the design shown in Figure 23. The transfer function of the neurons in the hidden layer was the tansig function, which squashes the values between -1 and +1 as explained in Section 3.3.2.

Since we need outputs between 0 and 4, we cannot use a sigmoid function. Instead Matlab's 'purelin' function is used. The purelin function is what it sounds like: a straight line going through (0,0) at a 45° angle. This means the final layer is able to deliver any value as output [DB98].

During training the resilient backpropagation algorithm is used, as described in Section 3.3.2. Any backpropagation algorithm could have been used with a similar result, but resilient backpropagation requires less memory and is reasonably fast [DB98]. A goal is set to train the network until the average error is below 0.3. Normally this occurs within a maximum of 300 epochs, often considerably sooner; in the example in Figure 24 it occurred after 248 epochs.
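A pure-Python sketch of the forward pass of this architecture (a tanh "tansig" hidden layer followed by a linear "purelin" output, so the output is not squashed and can reach the 0-4 utility range). The sizes and weights are placeholders, not the trained Matlab network:

```python
import math

# Forward pass of a one-hidden-layer network: tanh hidden units,
# linear output unit. Weights are plain nested lists for illustration.

def forward(x, w_hidden, b_hidden, w_out, b_out):
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]          # tansig layer
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out  # purelin layer
```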


Figure 24: Graph of error value during training

8.4 Robot control

The robot works with three main behaviors:

• Avoid obstacles

• Aim toward the center of the door

• Move toward the door

These behaviors result in three actions: move forward, turn left, or turn right. To decide what to do in a given situation, a sequencer is implemented to choose between the behaviors. The sequencer is similar to what Arkin calls perceptual sequencing ([Ark98], page 279) and what [Wer98] refers to as an orienter (page 18). A fourth behavior had to be added: it was discovered that the robot could encounter a form of dead end from which it was incapable of finding a way out. This depended on the images it saw and the amount of training, so to make sure the robot could overcome this problem an annoyed behavior was created; for more information, see Section 8.6.5 on page 45. The sequencer analyzes the inputs from the percepts and decides which behavior is most appropriate. Table 1 gives an overview of the outputs under various circumstances.

Often a robot has some kind of explorer behavior (as in [Wer98]). However, this implementation does not have a specific explorer behavior, which


Figure 25: A diagram of the internal sequencer

Table 1: Summary of possible actions

Situation                              Behavior    Action(s)
IR-sensors detect an obstacle          Avoid       Varying movements
Camera sees the door to the right      Aim         Turn left
Camera sees the door to the left       Aim         Turn right
Camera sees the door straight ahead    Forward     Move forward
No door and no obstacle is detected    Avoid       Varying movements


would have been used when moving around and trying to find the door. Instead, the avoid obstacle behavior has been modified to create a robot that tries to cover a lot of area.

If the camera sees some part of the door the sequencer will call the neuralnetwork which will determine if the robot should turn or move forward. SeeSection 8.6.1 for more information on how the network is used.

8.5 Sequencer

The sequencer is the control center of the robot. It gathers all data and analyzes it to determine the most appropriate behavior for the current circumstances. The design can be seen in Figure 25.

Algorithm for the sequencer:

1. Get camera image

2. Preprocess the image; this gives [x, y], where x and y are the indices of the doorposts

3. If x > pmax or y > pmax, stop the program. pmax is the maximum number of pixels we use from the camera

4. Get the inputs from the IR-sensors, R(i), where i = (0...7) according to Figure 2

5. If the sum of all IR-sensor readings, R(0) + R(1) + ... + R(7), exceeds 6000, stop the program

6. Choose behavior:

• If R(i) > 200 for any i = (1...6), choose the avoid obstacle behavior

• If we cannot see any part of the door, choose the avoid obstacle behavior

• If none of the above apply, we can see the door, so let the neural net decide the behavior, see 8.6.1

Since the program will run until it is aborted, two easy ways to stop it were added, see points 3 and 5 above. When the user either placed a hand around the robot or a finger over the camera lens, the program would detect that and abort the execution. For the IR-sensors the sequencer checks the sum itself, but the camera is checked by the image processing program: if the image is very dark, the program returns [pmax + 1, pmax + 1], i.e. the maximum number of pixels the camera can show plus one.
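The sequencer steps can be sketched as follows. The camera and IR accessors are replaced by plain arguments, net_choose stands in for the neural-network call of 8.6.1, and reading step 6 as "any front sensor above 200" is an assumption:

```python
# Sketch of the sequencer decision logic (steps 1-6 above).

PMAX = 64

def sequence(image_xy, ir, net_choose):
    """Return 'stop', 'avoid', or the action chosen by the network."""
    x, y = image_xy
    if x > PMAX or y > PMAX:          # covered lens: abort
        return "stop"
    if sum(ir) > 6000:                # hand around the robot: abort
        return "stop"
    if any(ir[i] > 200 for i in range(1, 7)):
        return "avoid"                # obstacle detected
    if x == 0 and y == 0:
        return "avoid"                # no door in sight: keep exploring
    return net_choose(x, y)           # door visible: let the network decide
```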


8.6 Behaviors

8.6.1 Using the Neural Network

When the sequencer sees at least one part of the door, it calls a subfunction which uses the neural network to determine the best course of action. As stated in Section 8.2, the image after preprocessing consists of two numbers which indicate where the door is. The neural network uses that information to determine whether the robot should turn or move forward.

Algorithm for the neural network. Given:
NET, the trained network
[x, y], the position indices of the door
u, the utility value from the network, and umax, the highest value
a, the actions, and amax, the best action
a−1, a0 and a1, the possible actions, with a−1 meaning a left turn, a0 no turn and a1 a right turn.

1. Set umax = 0, where umax is the best utility value

2. Set amax = 0

3. Try u = NET (a−1, x, y)

4. If u > umax:

(a) Set umax = u

(b) Set amax = −1

5. Try u = NET (a0, x, y)

6. If u > umax:

(a) Set umax = u

(b) Set amax = 0

7. Try u = NET (a1, x, y)

8. If u > umax:

(a) Set umax = u

(b) Set amax = 1

The neural network gives the action that will best improve the current situation, and the sequencer chooses a behavior to carry out that action.
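The eight steps translate almost line for line into code; net is a stand-in argument for the trained network NET:

```python
# Direct transcription of steps 1-8 above: evaluate the network for
# each of the three actions and keep the best.

def choose_action(net, x, y):
    u_max, a_max = 0.0, 0
    for action in (-1, 0, 1):          # left turn, no turn, right turn
        u = net(action, x, y)
        if u > u_max:
            u_max, a_max = u, action
    return a_max, u_max
```

Initializing umax to 0 means the default action a0 (no turn) is kept if no action yields positive utility, as in the listed algorithm.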


8.6.2 Aim toward door

From the algorithm in 8.6.1 the neural network has determined the most appropriate turn. The turns themselves are coded internally as either -1 or 1, with -1 meaning a left turn. A turn factor is multiplied with the turn, and a control function is called to perform the action. The turn factor is 10, giving a turn of -10 or 10 pulses 2, which translates to about 2° each time the robot aims.

8.6.3 Move toward door

When the neural network finds the best turn to be no turn at all, it is an indication that the robot is already well aimed: no left or right turn will improve its situation. Therefore a move forward behavior starts. The behavior consists of setting the robot to move forward with equal velocity on both wheels for 2/10 of a second. After that the robot stops, and control is returned to the sequencer, which determines the next action.

8.6.4 Avoid obstacle

The avoid obstacle behavior is inspired by Braitenberg's ideas regarding reactive behaviors (see Section 5.3): sensor inputs should be directly responsible for motor actions. In this case the IR-sensor readings are multiplied by different factors for each wheel and summed to get one speed per wheel. The complete formula looks like this [Hel03]:

ML = Σ(i=0..7) Wi · ri + W0
MR = Σ(i=0..7) Vi · ri + V0

where ML and MR are the speeds of the left and right wheel respectively. Wi and Vi are vectors of factors, one for each wheel, which are multiplied with the IR-sensor readings ri. The offsets W0 and V0 are added to give a base speed when all sensor inputs are zero. Before using the sums to set the speeds it is often necessary to multiply them by a small number; in this case 1/100 was used.

The behavior a Braitenberg vehicle exhibits depends on how the weights W0...7 and V0...7 are set. For this experiment the following weights were used:

W = [ 0  3 -1 -4 -3  0  0  0]
V = [ 0 -3 -1 -5  3  0  0  0]

When comparing the order of the numbers with the placement of the IR-sensors (Figure 2), we notice several things:

• The sensors closest to the wheels are not used (the factor is set to zero); this is to make sure the robot can pass through the door without

2 The turn factor can be altered by the annoyed behavior, see Section 8.6.5.

Fredrik Lindqvist 44 Master Thesis

8 IMPLEMENTATION 8.6 Behaviors

interference.

• Normally the left and right weights are mirror images of each other, but the two front weights need to be asymmetric. Otherwise the robot would come to a standstill if both front sensors were equally activated.

• Both of the back sensors are deactivated. They could be set to a positive value, which would enable the user to "hurry" the robot by moving an object close to the back sensors.

The weights above, together with a low gain factor, result in a behavior that gives a powerful response which turns the robot. This had to be done to make sure the robot would make sharp turns whenever it encountered a wall; otherwise the robot would be inclined to just make a short turn and follow the wall. It should also be noted that the speeds are set to a certain velocity and run for 4/10 of a second, whereas the aim behavior moves the wheels a set distance.
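The formula and weights above can be sketched directly in code. The 1/100 gain comes from the text; the base speed constant is an assumption, since the values of the offsets W0 and V0 are not given:

```python
# Sketch of the avoid obstacle behavior with the listed weights.

W = [0, 3, -1, -4, -3, 0, 0, 0]
V = [0, -3, -1, -5, 3, 0, 0, 0]
GAIN = 1 / 100
BASE = 5  # assumed base speed when no sensor is active

def avoid_speeds(ir):
    """Map eight IR readings to (left, right) wheel speeds."""
    ml = GAIN * sum(w * r for w, r in zip(W, ir)) + BASE
    mr = GAIN * sum(v * r for v, r in zip(V, ir)) + BASE
    return ml, mr
```

For example, with only sensor 1 strongly active, one wheel speeds up while the other reverses, producing the sharp turn described above.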

8.6.5 Special Case: Annoyed behavior

At first the entire design was stateless and used no memory to remember previous locations. However, during testing the robot could from time to time run into a situation it was unable to move away from. When the robot saw just one doorpost it had no way of knowing whether it was the left or the right one. Often the problem was solved by the robot either turning and losing track of the door, or turning so that it saw both doorposts. But at times the robot would start to oscillate: it would go from turning left to turning right and be unable to escape from this position.

To remedy this problem an "annoyed" behavior was added. It is a simple behavior that keeps track of two things: the previous turn and the total annoyed factor. The annoyed factor is a number that is raised or lowered to keep track of when the robot has problems. The actions can be divided into two categories, good and bad movements. A good movement is when the robot is adjusting its aim to get into a better position; going from turning left/right to moving forward is an indication that the robot is performing "good" actions.

But if the turn changes from left to right or vice versa, it is an indication of a problem, a "bad" movement. The annoyed factor starts at zero; each good movement lowers it by 0.1, and each bad movement increases it by 0.5. When it goes above 1.0 the annoyed behavior is activated, which means that the turn factor explained in 8.6.2 is set to 100 instead of 10. This causes the robot to move away from the difficult situation. There


is also a 50% chance that this action will improve the robot's situation and make it better aimed at the door. This added behavior solved the oscillation problems that would sometimes occur.
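The bookkeeping described above can be sketched as a small class. Turn flips count as "bad" (+0.5) and turning-then-forward counts as "good" (-0.1), as in the text; how other transitions affect the factor is not specified, so they are treated as neutral here, and the factor is clamped at zero (both assumptions).

```python
# Sketch of the annoyed behavior's state tracking.

class AnnoyedMonitor:
    def __init__(self):
        self.prev = None      # previous action: -1 left, 0 forward, 1 right
        self.factor = 0.0     # the annoyed factor

    def update(self, action):
        if self.prev in (-1, 1) and action == -self.prev:
            self.factor += 0.5                         # bad: oscillating turns
        elif self.prev in (-1, 1) and action == 0:
            self.factor = max(0.0, self.factor - 0.1)  # good: settled on forward
        self.prev = action

    def turn_factor(self):
        return 100 if self.factor > 1.0 else 10        # escalate when annoyed
```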

8.7 Results

Figure 26: Picture of the robot when executing

Figures 26 and 27 show how the robot is able to see the door and move toward it. During the approach, the aim behavior is used several times to make sure the robot moves straight toward the door. Since the robot is round, it is not possible to see in the figures when the robot is performing the aim behavior.

Since the robot has a limited field of view of about 36° (see Section 2.3 on page 9), it will lose track of the door when it comes within about 15-20 cm. It will then default to its avoid obstacle behavior. If the robot doesn't sense any objects with the IR-sensors, it will keep moving straight forward. This enables it to move through the door. The robot has been trained with data gathered from the center back of the arena, but will still be able to

Fredrik Lindqvist 46 Master Thesis

8 IMPLEMENTATION 8.7 Results

Figure 27: Picture of the robot when executing

move toward the door from almost any position as long as it can see bothparts of the door. In Figure 28 the robot is starting in the lower half andmoving toward the door. First the avoid obstacle behavior is active since therobot is unable to see the door. When it nears the corner it makes a quickleft turn to avoid going into the wall and can see the door. After that theaim and move toward door behaviors make sure the robot moves to the doorin a straight line.

In the example in Figure 29 the robot sees the goal and is moving toward it. It has not been trained with this extreme angle but is still able to generalize enough to move toward the door. However, once it gets closer to the door the IR-sensors will detect the wall, the avoid-obstacle behavior will claim precedence over moving toward the door, and the robot will avoid the wall.

In this case the ideal movement would have been the path drawn in Figure 30, but this path would require the robot to lose track of the door for several seconds. The robot would be forced to remember the door or perform a predetermined set of movements to enable it to pass through the door. However, both of these actions would be against the behavior-based approach as set out by Brooks [Bro86]. Werger suggests a perceptual decay whereby the robots in his experiment can remember where an object was the last time it was seen. In his case, that enabled his soccer-playing robots to circle around a ball without actually seeing it [Wer98]. For this experiment a behavior like that would be against the robot's original programming: if the robot sees a part of the door, it should always turn to try to center on the middle of the door, not walk away from it.

Figure 28: Picture of the robot when executing


Figure 29: The robot walks toward door, but sees the wall and starts theavoid behavior

9 Conclusions

To perform the tasks stated in Section 7, the camera was an important part. Since almost all relevant information came through the camera, it was imperative to keep as much information as possible. But at the same time, too much information leads to problems with the network: it becomes more difficult to train, takes more time to train, and requires more time to execute. So the solution was not to compress the information but to extract all relevant information. All we need to know from the camera is: do we see the door, and if so, what are the indices of the doorposts? This enabled a compression from 64 values to just 2. As a result we end up with a small network that is able to learn well within a reasonable time and executes quickly.
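A reduction from the 64-pixel camera line to two doorpost indices could look like the sketch below. The thesis does not give this code; the threshold value and the assumption that the doorposts show up as the darkest pixel stripes against a lighter background are mine.

```python
def doorpost_indices(pixels, threshold=80):
    """Reduce a 64-pixel intensity line to two doorpost indices.

    Assumptions (mine, not from the thesis): pixels are 0-255
    intensities and the doorposts are the darkest stripes, below
    `threshold`. Returns (left_index, right_index), or None when
    fewer than two dark stripes are visible (whole door not seen).
    """
    dark = [i for i, p in enumerate(pixels) if p < threshold]
    if not dark:
        return None
    # Group consecutive dark pixels into stripes (candidate doorposts).
    stripes = [[dark[0]]]
    for i in dark[1:]:
        if i == stripes[-1][-1] + 1:
            stripes[-1].append(i)
        else:
            stripes.append([i])
    if len(stripes) < 2:
        return None  # only one post visible: left/right is ambiguous
    # Use the centers of the outermost stripes as the two indices.
    left = sum(stripes[0]) // len(stripes[0])
    right = sum(stripes[-1]) // len(stripes[-1])
    return (left, right)
```

Returning None when only one stripe is visible mirrors the ambiguity discussed later: with two indices alone, a single visible post cannot be classified as left or right.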

By doing all testing and implementation on a real robot instead of in a simulation (as in [SL02], for example), several problems concerning the robot and the hardware were encountered and had to be dealt with. But in return, the problem that was solved was based on how a real robot actually works in the real world, not how a perfect robot would behave in a perfect world. The robot performed as expected and was able to complete all goals.

Figure 30: Robot walking toward door but failing, and the ideal path

When it saw both parts of the door it would always start to turn toward the goal and walk toward it. Depending on the angle it would not always succeed in walking all the way through the door, but as explained in Section 8.7 this was due to the limited field of view of the camera rather than a problem with the design.

The performance when the robot was only able to see one part of the door was much worse, as anticipated. This is the case where using the entire image instead of a preprocessed one could have helped. If the camera only sees one part of the door, it has no way of knowing whether it sees the left or the right part. Consequently the robot cannot train a correct behavior for this, and during execution it is more luck than skill if the robot actually turns to find the center of the door. With a full image, the network would have had more clues as to which part of the door it saw and would most likely have performed slightly better.

When the robot started so that it couldn't see any part of the door, it started the avoid-obstacle behavior and began to explore the arena. To find the door it had to make a sharp turn when it approached a wall, so that the robot would face the center of the arena instead of looking toward a wall. If the circumstances were right it could see the door, and it would then most of the time be able to turn toward the door and move through it.


10 Encountered problems

Throughout the entire project the hardware proved to be more of a problem than anticipated. To overcome this, several special solutions had to be devised. Some of the problems were:

Before sending a command to the robot, it must have completed all old commands. This proved to be a problem when the robot was moving and a turn command was sent, which created an unpredictable result. Often the robot would set each motor to some random value and start to spin out of control. The solution was to make sure that the robot was standing still after each command. When sending a command to make the robot turn a certain number of degrees, control was not given back to the computer until the robot had come to a complete standstill.
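The wait-for-standstill fix can be sketched like this. The `robot.turn` and `robot.wheel_speeds` names are hypothetical stand-ins for whatever the serial protocol exposes, not real Khepera API calls, and the polling interval and timeout are my own choices.

```python
import time

def turn_and_wait(robot, degrees, poll_interval=0.05, timeout=5.0):
    """Send a turn command and block until the robot stands still.

    `robot` is a hypothetical interface with `turn(degrees)` and
    `wheel_speeds()` (returning the two motor speeds). Control only
    returns to the caller once both wheels report zero speed, so the
    next command never interrupts a motion in progress.
    """
    robot.turn(degrees)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        left, right = robot.wheel_speeds()
        if left == 0 and right == 0:  # complete standstill
            return True
        time.sleep(poll_interval)
    return False  # robot never settled; caller should recover
```

Polling with a timeout, rather than blocking forever, means a stuck robot surfaces as a recoverable failure instead of hanging the control loop.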

As stated in Section 2.3 the camera has several limitations. To ensure that the images were good, the robot had to be halted for 2/10 of a second. This helped to get good images, but at times not even this helped. With a good behavior-based system, however, 5-10% bad images didn't cause any serious problems.

To measure the distance to nearby objects eight IR-sensors are used, butthey proved to be very unreliable as already stated in Section 2.2.


11 Future work

Ideas for further development of the same design would be to use a larger neural net and, instead of preprocessing the image, use the entire 64-pixel image. That could give the robot the ability to distinguish between left and right doorposts. Of course this comes at the additional cost of the neural network being harder to train and taking much more time to compute.

In its current implementation the neural network is trained offline, i.e. it is trained before it is used. For a new version it would be interesting to have the network train during execution. The required changes would be rather small.

If the program had been downloaded into the robot's own memory, an improvement in speed would have been likely, since the robot would not need to constantly send and receive data from the computer. But due to lack of time this was never implemented.

K-team has other cameras available, one of which can give a 360-degree field of view. Since many of the problems during this project were connected with the camera, it would be interesting to see if a better camera would help.


12 Acknowledgements

At the end of a big project such as this I feel I need to thank some people who made this possible:

My supervisor Thomas Hellstrom at Umea University, for his expertise and helpful ideas.

My family, for lots of encouragement and support.

My friends, for helping me with proofreading, especially my opponent Peter Kemppe.

And of course; A students best friend: Caffeine.


References

[Ark98] Ronald C. Arkin, Behavior-Based Robotics, The MIT Press, 1998, ISBN 0-262-01165-5.

[Bra02] Brainbuilders, Brainbuilders, et rollespil om kunstig intelligens, 2002, http://www.brainbuilders.dk/ingame/visartikel.php?26.

[Bro86] Rodney Brooks, A robust layered control system for a mobile robot, IEEE Journal of Robotics and Automation (1986).

[CF03] Philippe Crochat and Daniel Franklin (eds.), Back-propagation neural network tutorial, 2003, http://ieee.uow.edu.au/~daniel/software/libneural/BPN_tutorial/BPN_English/BPN_English/BPN_English.html.

[DB98] Howard Demuth and Mark Beale, Neural Network Toolbox, for use with MATLAB, version 3, Mathworks Inc., January 1998.

[Eng02] Imagination Engines, Databots, 2002, http://www.imagination-engines.com/databots.htm.

[Gur97] Kevin Gurney (ed.), Neural nets, 1997, http://www.shef.ac.uk/psychology/gurney/notes/l1/l1.html.

[Hel03] Thomas Hellstrom, Course material for Artificial Intelligence 2, Tech. report, Umea University, 2003, http://www.cs.umu.se/kurser/TDBD93/VT03/.

[Hum97] Mark Humphrys, Action selection methods using reinforcement learning, Ph.D. thesis, Trinity Hall, Cambridge, 1997, http://www.compapp.dcu.ie/~humphrys/PhD/index.html.

[Koh01] Teuvo Kohonen, Self-Organizing Maps, 3rd ed., Springer-Verlag, 2001, ISBN 3-540-67921-9.

[Kt99a] K-team, K213 Vision Turret Manual, version 1.3, March 1999, http://www.k-team.com/download/khepera/documentation/K213Manual.pdf.

[Kt99b] K-team, Khepera User Manual, version 5.02, 1999, http://www.k-team.com/download/khepera/documentation/KheperaUserManual.pdf.

[Lin00] Fredrik Lindqvist, Robotfotboll, en beteendebaserad ansats, course assignment in Artificial Intelligence 2, Umea University, 2000.

[MP43] Warren McCulloch and Walter Pitts, A logical calculus of the ideas immanent in nervous activity, Bulletin of Mathematical Biophysics (1943), no. 7, 115-133.

[MP69] Marvin Minsky and Seymour Papert, Perceptrons: An Introduction to Computational Geometry, The MIT Press, 1969.

[Mui02] Muir, Variable frequency oscillator, Tech. report, K-team, August 2002, http://www.k-team.com/download/khepera/k213/k213oscillator.pdf.

[Neh00] Ulrich Nehmzow, Mobile Robotics: A Practical Introduction, Springer-Verlag, 2000.

[NOH96] NOHAA (ed.), No hands across America, 1996, http://www-2.cs.cmu.edu/afs/cs/user/tjochem/www/nhaa/nhaa_home_page.html.

[Pom95] Dean Pomerleau, Neural network vision for robot driving, 1995, http://www.ri.cmu.edu/pub_files/pub2/pomerleau_dean_1995_1/pomerleau_dean_1995_1.pdf.

[Rip98] Ruth Mary Ripley, Neural network models for breast cancer prognosis, Ph.D. thesis, St Cross College, University of Oxford, Trinity Term, 1998, http://www.stats.ox.ac.uk/~ruth/thesis.pdf.

[RN95] Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, Prentice Hall, Inc., 1995, ISBN 0-13-360124-2.

[Sie96] Siemens, Product sheet SFH 900, Tech. report, Siemens, 1996, http://www.k-team.com/download/khepera/datasheets/SFH900_IRSensors.pdf.

[SL02] Hans Svedaker and Fredrik Lindborg, Teaching a miniature robot behaviours using genetic programming, Master's thesis, Umea University, 2002, UMNAD 385/02.

[Ste03] Chris Stergiou (ed.), What is a neural network?, 2003, http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol1/cs11/article1.html.

[Wer98] Barry Brian Werger, Cooperation without deliberation: A minimal behavior-based approach to multi-robot teams, 1998, http://www-robotics.usc.edu/~barry/papers/aij.pdf.

[Zur95] Jacek M. Zurada, Introduction to Artificial Neural Systems, PWS Publishing Company, 1995, ISBN 0-534-95460-X.