
Page 1: Learning goal-directed sensory-based navigation of a mobile robot

Pergamon Neural Networks, Vol. 7, No. 3, pp. 553-563, 1994

Copyright © 1994 Elsevier Science Ltd Printed in the USA. All rights reserved

0893-6080/94 $6.00 + .00

CONTRIBUTED ARTICLE

Learning Goal-Directed Sensory-Based Navigation of a Mobile Robot

JUN TANI AND NAOHIRO FUKUMURA

Sony Computer Science Laboratory

(Received 10 February 1993; revised and accepted 14 October 1993)

Abstract--This paper presents a novel scheme for sensory-based navigation of a mobile robot: the robot is trained to learn a goal-directed task under adequate supervision, utilizing local sensory inputs. Focusing on the topological changes of temporal sensory flow, our scheme constructs a correct mapping from sensory input sequences to the maneuvering outputs through neural adaptation such that a hypothetical vector field that achieves the goal can be generated. The simulation experiments show that a robot, utilizing our scheme, can learn tasks of homing and sequential routing successfully in the work space of a certain geometrical complexity.

Keywords--Neural networks, Sensory motor system, Mobile robot, Navigation, Vector field, Learning.

1. INTRODUCTION

There is growing interest in the study of the sensory-based, goal-directed navigation of a mobile robot. The objective is that a robot acquires local sensory information, utilizes an adequate type of domain knowledge, and navigates to the specified goal. One conventional approach is map-based navigation (Elfes, 1987; Durrant-Whyte & Leonard, 1989; Freyberger, Kampmann, & Schmidt, 1990), building a detailed environmental map in global coordinates, which the robot can utilize during the navigation. This approach, however, contains a potential difficulty in the positioning of the robot (estimating the current position by matching the local sensory profile to the geometry in the map), which is attributed to the limitation of the centralized representation scheme, as discussed in Brooks (1987).

Mataric presented an alternative approach, the so-called behavior-based representation in the navigation problem (Mataric, 1992). In her approach, the robot neither builds nor accesses environmental modeling of global coordinates; instead it acquires a chain of landmarks as a representation of an environmental model through experiences. Although the methodology has the considerable advantage of requiring neither precise geometrical modeling nor good positioning, its conditionings, such as predetermination of landmark types and an assumption of boundary following, prevent realizations of complex navigations.

Acknowledgements: The authors would like to thank Masahiro Fujita and Stan Berkovich for useful discussions of this study.

Requests for reprints should be sent to Jun Tani, Sony Computer Science Laboratory Inc., Takanawa Muse Bldg. 3-14-13, Higashi-gotanda, Shinagawa-ku, Tokyo, 141 Japan.

Our study presented in this paper, also following the behavior-based concept, investigates a general methodology, the vector field approach. The idea is to construct a hypothetical vector field on the task coordinate so that the robot can achieve the goal from an arbitrary initial position, following its flow. In the case that the position of the robot and the specified goal are accessible, the idea simply falls into that of the potential method (Khatib, 1986; Payton, 1990), in which vector flow to the goal is explicitly shown on the task space. In our case, assuming no access to the global position but only to the local sensory information, we need a novel methodology that can reconstruct an equivalent vector field utilizing the sensory inputs.

Conducting this study, we considered a simulated mobile robot equipped with a range sensor. The navigation tasks assigned to this robot were (a) to home to a predetermined goal, and (b) to cycle through multiple locations in a predetermined sequence. For these tasks, the robot has to find its way home or into the target path sequence from arbitrary starting locations in the work space. We assume a supervisor who knows about the task in a global manner. The robot is trained for the tasks under the necessary supervision, adapting the sensorimotor mapping on a neural architecture. The following discussion summarizes the problems that arise in applying the vector field approach to the scheme of sensory-based navigation.

553


2. THE VECTOR FIELD APPROACH

2.1. Embedding

The objective is to construct a hypothetical vector field in the task coordinate that converges to the goal point. We assume a corresponding vector field in a certain sensory-based coordinate, whose mapping determines the maneuvering direction (motor command) of the robot. This implies that the sensory-based coordinate should be chosen adequately so that the vector flow in the task coordinate is successfully embedded (without crossings). Figure 1 illustrates this concept schematically. The vector flow in the task coordinate is transferred to those in two different types of sensory-based coordinates, in which the mapping of the lower case generates a section of crossing. This contains a certain area where the position is not uniquely identifiable from the obtained sensory state, with the result that the motor command cannot be determined from the obtained sensory state.

Finding a good space for the sensory-based state is an interesting problem. Generally, embedding is achieved with an increase in the dimensionality of the embedded space, which can be realized in two ways in the problem of navigation. One is to utilize a fusion of multiple sources of sensory information. For example, combining range sensing and dead reckoning naturally increases the dimensionality of the total sensory input. The other is to utilize the history of the sensing flow rather than just a single shot of the current sensing. The latter can be intuitively understood; positioning can be achieved more easily by watching the past changes of the scenery during maneuvering. We will focus on the latter approach in this study.

FIGURE 1. Embedding of the vector field of the task coordinate into those of different sensory-based coordinates. The embedding fails in the downward case.

FIGURE 2. Mobile robot with range sensors. The range profile Ri consists of N values from which the potential profile Pi is calculated. The robot focuses on a local minimum in the potential profile.

2.2. Topological Trajectory

We assume a constraint on the trajectory to be generated: a trajectory should be a smooth one, avoiding collisions with obstacles. This assumption reduces degrees of freedom in the navigation and simplifies the whole problem dramatically. We assume a basic control scheme that realizes this constraint. Incorporating this control scheme, the task of navigation is reduced to decisions of maneuvering direction at a finite set of representative points. An approximation of the desired vector field in the task coordinate can be reconstructed only by acquiring the topological trajectory consisting of those representative vectors. The next section will introduce the exact maneuvering methodology embodying these concepts.

3. MANEUVERING SCHEME

This section describes the exact maneuvering scheme implementing the above concept. In the current formulation, maneuvering commands are generated as the output of a composite system consisting of two levels: the control level generates a collision-free smooth trajectory by a variant of the potential method (Khatib, 1986), and the navigation level decides the topological trajectory. The navigation level is adaptive in the training phase; the control level, on the other hand, is not. First we will describe the control level, based on the mobile robot schema assumed for this study.

The robot has a sensor belt on its forward side holding N range sensors as shown in Figure 2. Thus the robot can sense the forward range profile of the surrounding environment in robot-centered polar coordinates as Ri (1 <= i <= N). Consider the angular potential profile Pi, such that the potential in each angular direction increases as the range decreases:

    Pi = k3(Rp - Ri)^3 + k2(Rp - Ri)^2 + k1(Rp - Ri)   if Ri < Rp,
    Pi = 0.0   otherwise,


where Rp is the limit distance of the range sensor. The obtained profile is smoothed with an appropriate Gaussian filter. The maneuvering focus of the robot is a local minimum in the input potential profile. The robot basically proceeds toward a particular potential valley (open space in the environment) by targeting its bottom with a constant control gain. The robot is assumed to maintain a constant velocity v0 and to manipulate only its angular direction θ. The dynamics of robot motion are thus described by

    dθ/dt = kp(θf - θ) + c · obst(θmin, Rmin, R0)    (1a)
    dx/dt = v0 cos θ,   dy/dt = v0 sin θ             (1b)

where θf is the angular direction of the focus point, kp is a control gain, and (x, y) is the robot's position. A function is added as the second term of eqn (1a) in order to avoid tangential collisions at some critical corners (see Appendix). This function generates a repulsive rotational moment acting on θ when the robot happens to approach extremely close to obstacles. The exact form of the function is

    obst(θmin, Rmin, R0) = a · [h3(R0 - Rmin)^3 + h2(R0 - Rmin)^2 + h1(R0 - Rmin)]   if Rmin < R0,
    obst(θmin, Rmin, R0) = 0.0   otherwise,

where Rmin and θmin are the minimum range and its angular direction among the range data values, and R0 is a threshold range. Assuming that the value of this function is negligible at most times, the dynamics of eqn (1) has the general character of convergent dynamics tracking the local minimum in the sensory profile.
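As a concrete illustration, the control level described above — the potential profile, its local-minimum foci, and eqns (1a)-(1b) — might be sketched as follows. This is only a sketch under assumptions: the parameter values follow the caption of Figure 7, while the discrete Gaussian filter, the sign convention of the repulsive term, and the Euler integration step are our own choices.

```python
import math
import numpy as np

def potential_profile(R, Rp=1.0, k1=0.05, k2=0.1, k3=0.85, sigma_bins=2):
    """Angular potential profile P_i computed from the range profile R_i.
    Directions with close obstacles (R_i < Rp) get positive potential;
    the profile is then smoothed with a discrete Gaussian filter."""
    R = np.asarray(R, dtype=float)
    d = np.where(R < Rp, Rp - R, 0.0)            # (Rp - R_i), zero when R_i >= Rp
    P = k3 * d**3 + k2 * d**2 + k1 * d
    half = 3 * sigma_bins
    x = np.arange(-half, half + 1)
    g = np.exp(-x**2 / (2.0 * sigma_bins**2))
    g /= g.sum()
    return np.convolve(np.pad(P, half, mode="edge"), g, mode="valid")

def local_minima(P):
    """Indices of strict interior local minima (candidate maneuvering foci)."""
    return [i for i in range(1, len(P) - 1) if P[i - 1] > P[i] < P[i + 1]]

def obst(theta_min, R_min, R0=0.12, h1=1.0, h2=1.0, h3=1.0):
    """Repulsive rotational moment of eqn (1a), active only when R_min < R0.
    Turning away from the nearest obstacle's side is our assumed sign rule."""
    if R_min >= R0:
        return 0.0
    d = R0 - R_min
    mag = h1 * d + h2 * d**2 + h3 * d**3
    return mag if theta_min < 0.0 else -mag

def step(x, y, theta, theta_f, theta_min, R_min,
         kp=1.0, c=0.56, v0=0.01, dt=0.1):
    """One Euler-integration step of eqns (1a)-(1b)."""
    theta_dot = kp * (theta_f - theta) + c * obst(theta_min, R_min)
    theta += theta_dot * dt
    x += v0 * math.cos(theta) * dt
    y += v0 * math.sin(theta) * dt
    return x, y, theta
```

With the repulsive term inactive, repeated calls to `step` converge the heading θ toward θf, which is the "convergent dynamics tracking the local minimum" character noted above.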

Meanwhile, the strategy of the navigation level focuses on topological changes in the sensory profile as the robot moves. Figure 3a shows a particular robot trajectory in the work space and Figure 3b shows the corresponding temporal sequence of sensory potential profiles. After starting in the initial location, the robot moves upward through the space by tracking a local minimum. The corresponding potential profile contains a single basin. (In the shaded sequence the middle part, where the potential values are low, is darker and the sides, with higher potential, are brighter.) As the robot moves through the work space the profile gradually changes until another local minimum appears when the robot reaches location (A) and perceives a new open area to the right. (In the figure the robot is shown pointing toward two local minima.) At this moment of catastrophe (bifurcation point) the navigation level decides whether to transit the focus to the new local minimum or remain with the current one. In this example, suppose that the robot decides to transit the focus to the new local minimum and proceeds along the right-hand branch. The same decision process is repeated at point (B), where it transits to a new local minimum to travel in an upward direction, and point (C), where it decides to remain with the current focus and continue to the left without transit; these decisions generate the resultant trajectory shown. The branching at each bifurcation point is actually decided by the neural networks based on the past bifurcation sequence (such as the current and last times) in the sensory inflow, as will be detailed later.

FIGURE 3. An example trajectory and the corresponding bifurcation sequence in the sensory data flow. (a) The trajectory contains three bifurcation points (A), (B), and (C). (b) In the spatiotemporal sensory data, the dark part indicates low potential and the light part high potential. The section of the potential profile is shown at each bifurcation point. The arrows indicate the branching decisions of stay or transit.

In the actual implementation, the generation of a new local minimum in the profile is confirmed only after it has persisted for a period Tlag, so that the noisy generation of local minima due to minor concavities in the geometry can be prevented. In the current formulation the robot deals with the primal topology of the geometry, which can be obtained by filtering the raw sensory data in time as well as space.
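The Tlag confirmation amounts to a debounce on candidate minima. A minimal sketch of this logic follows; the class and its interface are hypothetical, and in the real system candidate minima would additionally have to be associated with one another across successive profiles.

```python
class MinimumConfirmer:
    """Confirm a newly appeared local minimum only after it has persisted
    continuously for T_lag, filtering out noise from minor concavities."""

    def __init__(self, t_lag=0.5):
        self.t_lag = t_lag
        self.first_seen = {}   # candidate minimum id -> time first observed

    def update(self, candidates, t):
        """candidates: ids of the local minima present at time t.
        Returns the subset confirmed (present for at least t_lag)."""
        # Drop timers of minima that vanished; start timers for new ones.
        self.first_seen = {c: self.first_seen.get(c, t) for c in candidates}
        return {c for c in candidates if t - self.first_seen[c] >= self.t_lag}
```

A minimum that flickers in and out of existence never accumulates the required continuance period, so it is never offered to the navigation level as a branching option.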

Figure 4 shows multiple trajectories starting from different initial locations in the same vicinity, which all converge when the same branching decisions are taken at the encountered bifurcation points. It can be seen that this bifurcation sequence encodes a topological structure common to these variants. The navigation level deals with this topological level of the navigation in generating the desired trajectory.

FIGURE 4. Convergence of multiple trajectories A, B, C, and D starting from neighboring initial locations under identical branching decisions.

4. NEURAL ADAPTATION

This section discusses how the adaptation mechanism in the navigation level is implemented by a neural adaptation scheme. The basic idea is to self-organize an adequate mapping between the sensory state as input and the branching decision as output at each bifurcation point. The major issue of concern is the embedding problem, as discussed in the introduction: whether the output can be mapped uniquely from the defined input space, and whether the mapping preserves enough generality.

Facing this problem, we decided to define the sensory-based state by the regressive sequence of sensory profiles rather than the single profile at each moment of bifurcation; therefore its dimensionality can be adjusted by the length of regression used.

At each moment, the filtered sensory profile of the robot in the later simulation consists of N = 20 range values. Since the essential information in the sensory profile at each moment is assumed to be far less, we compress the information in the profile by the vector quantization technique known as the Kohonen network (Kohonen, 1982). The Kohonen network consists of an n-dimensional lattice of m nodes along each dimension. The address of the winner unit in this lattice denotes the output vector of the network. The virtue of this scheme is that the original topological structure of the input space is well preserved in the output space, which assures the generality of the transformation, such that the output vector turns out to be similar if the input profile is similar. The N-dimensional vector describing the potential profile, Pt, at each bifurcation sequential time t is fed into the network, and its mapping into an output vector vt of fewer dimensions n is self-organized. This normalized output vector is fed into the feedforward network in a regressive manner, Vt = (vt, vt-1, ..., vt-l+1), and its mapping into maneuvering output is self-organized through back propagation learning (Rumelhart, Hinton, & Williams, 1986). In this paper, we call this discrete vector Vt the sensory-based state.
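A minimal sketch of such a vector quantizer follows. It is our simplified reading of the Kohonen network: the random initialization, fixed learning rate, fixed neighborhood width, and Gaussian neighborhood function are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

class KohonenNet:
    """Minimal self-organizing map on an n-D lattice of m^n units.
    The winner unit's lattice address serves as the compressed output."""

    def __init__(self, n_input=20, n_dim=3, m=6, lr=0.2, radius=1.0):
        shape = (m,) * n_dim
        self.w = rng.random((m**n_dim, n_input))       # codebook vectors
        self.coords = np.array(
            np.unravel_index(np.arange(m**n_dim), shape)).T
        self.m, self.lr, self.radius = m, lr, radius

    def winner(self, p):
        """Lattice address of the best-matching unit, normalized to [0, 1]."""
        i = np.argmin(np.linalg.norm(self.w - p, axis=1))
        return self.coords[i] / (self.m - 1)

    def train_step(self, p):
        """Move the winner and its lattice neighbors toward the input p."""
        i = np.argmin(np.linalg.norm(self.w - p, axis=1))
        d = np.linalg.norm(self.coords - self.coords[i], axis=1)
        h = np.exp(-(d / self.radius) ** 2)            # neighborhood function
        self.w += self.lr * h[:, None] * (p - self.w)
```

Because nearby lattice units are pulled together toward similar inputs, similar potential profiles end up with nearby winner addresses, which is the topology-preserving property the text relies on.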


FIGURE 5. A composite neural system: Kohonen net and feedforward net. The output of the Kohonen net (the winner position in the 3-D output space) is fed into the feedforward net as a regression.

In the simulation, the Kohonen network is set up with an N = 20 dimensional input vector and an n = 3 dimensional output space consisting of 6 × 6 × 6 (m = 6) units. The regression of this output is fed into a three-layered feedforward network with six hidden units. The length of regression is set to two (l = 2: current and last times); therefore the input layer of the network consists of six units. The output layer consists of two units: one corresponds to branching decisions (transit the focus or not) and the other to stopping. (The robot is also trained to stop when it arrives at the goal.) The neural architecture employed is summarized in Figure 5.
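The feedforward half of this architecture can be sketched as below: the two 3-D Kohonen output vectors are concatenated into the six-unit input, passed through six hidden units to two outputs, and trained by plain backpropagation. The weight initialization, learning rate, and squared-error loss are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class NavigationNet:
    """Three-layer feedforward net: input = regression of two 3-D Kohonen
    outputs (6 units), 6 hidden units, 2 outputs (branching and stopping)."""

    def __init__(self, n_in=6, n_hid=6, n_out=2, lr=0.5):
        self.W1 = rng.normal(0.0, 0.5, (n_hid, n_in))
        self.b1 = np.zeros(n_hid)
        self.W2 = rng.normal(0.0, 0.5, (n_out, n_hid))
        self.b2 = np.zeros(n_out)
        self.lr = lr

    def forward(self, v_t, v_prev):
        x = np.concatenate([v_t, v_prev])      # regression of length l = 2
        h = sigmoid(self.W1 @ x + self.b1)
        y = sigmoid(self.W2 @ h + self.b2)
        return x, h, y

    def train_step(self, v_t, v_prev, target):
        """One backpropagation update on the squared error."""
        x, h, y = self.forward(v_t, v_prev)
        dy = (y - target) * y * (1 - y)        # output-layer delta
        dh = (self.W2.T @ dy) * h * (1 - h)    # hidden-layer delta
        self.W2 -= self.lr * np.outer(dy, h)
        self.b2 -= self.lr * dy
        self.W1 -= self.lr * np.outer(dh, x)
        self.b1 -= self.lr * dh
        return y
```

In use, the supervisor's branching and stopping decisions at each bifurcation point supply the target vector, and the accumulated (input, target) pairs are iterated over until the outputs match.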

5. TRAINING FOR EXAMPLE TASKS

5.1. Outline

The robot learns each task with the assistance of a trainer who is assumed to know the optimal paths. To train it for a homing task, the robot is located at an arbitrary starting point and the trainer guides it back to the fixed goal (home) by teaching it how to branch at each bifurcation point and when to stop upon arrival. Guidance to the goal is repeated from varying starting points so that the robot learns a general, robust homing scheme in the work space. The sensory input and corresponding teaching output at each bifurcation point are accumulated during repeated guidance of the robot and fed to the composite neural network as the training data. Training for a cyclic routing task follows the same procedure; the robot is repeatedly guided from various arbitrarily selected starting points into the desired loop.

5.2. Simulations

A work space of some geometrical complexity was adopted for the homing and cyclic routing task simulations. For the homing, the goal is located in a concave area near the center of the work space. The employed geometry and tasks are schematized in Figure 6. Because the robot sometimes dropped into the wrong concave, we enhanced the navigation strategy in order to avoid the problem: when the robot comes extremely close to the dead end of a concave (such a situation is easily detected by monitoring the range values in the forward direction with respect to a certain threshold), the robot is instructed to turn around by 180 degrees, and only afterward does the network decide whether to stop (if the concave is recognized as the goal) or to continue traveling.

Training was conducted in three stages, with an increasing cumulative number of guidance patterns (8, 17, and 37 patterns). The effectiveness of training was tested at each stage by observing the robot's ability to home from a set of untrained starting points. Ten starting points were arbitrarily selected for testing. The results of the experiment are summarized in Figure 7. The map at the left of each part of Figure 7a-c shows a stepwise trace of each training trajectory. The branching direction taught at each bifurcation point is indicated by a short bar projecting from the circled point. The number of training points (bifurcation points) was 38, 86, and 195, respectively, for stages 1, 2, and 3. Traces of the 10 testing-phase trajectories are shown in the corresponding map on the right. The short bars indicate the branching decisions generated by the neural network. It can be seen clearly that the robot's navigation skill depends on the number of training patterns it is taught. With little training the robot is likely to become lost and wander, depending on its starting point. With more training, it tends to proceed to the goal more directly, regardless of the starting point. The time steps required to home from the 10 initial locations totaled 2411, 1953, and 1312, respectively, for stages 1, 2, and 3. The trajectories navigated in stage 3 were nearly perfect (with one exception that was once drawn into the lower right-hand corner). It can be said that the desired vector field converging to the home is constructed mostly by stage 3.

FIGURE 6. Tasks and work space.

FIGURE 7. Homing training and testing. Training and testing were conducted in three stages with an increasing cumulative number of training patterns (a-c). For each stage, the left side shows the training patterns and the right side shows the test result trajectories. Small bars projecting from the circled bifurcation points indicate the branching direction taken. (d) shows corresponding spatiotemporal sequences of sensory input as well as the output vector of the Kohonen network for five test trajectories in stage 3. The dimensions of the work space are 1.0 × 1.0. Employed parameters are kp = 1.0, c = 0.56, k1 = 0.05, k2 = 0.1, k3 = 0.85, Rp = 1.0, h1 = 1.0, h2 = 1.0, h3 = 1.0, R0 = 0.12, v0 = 0.01, Tlag = 0.5, and σ of the Gaussian filter in radians is 0.384.

Figure 7d shows the resultant sensory sequences corresponding to the five test trajectories A, B, C, D, and E of stage 3. The output vector of the Kohonen network is shown at each bifurcation point, immediately to the right, by three short bars. Trajectories A and B converged into one way before homing, and C, D, and E converged into the other in the work space; as a result, the last two output vectors of the Kohonen net in A and B become quite close, as do the last four in C, D, and E.

Training for the cyclic routing task was conducted using the same work space. The robot was trained to go around the two islands in the upper part of the work space in a figure eight. Training was conducted from 30 different starting points that generated 307 training points (see Figure 8). Afterward the robot was tested on 10 arbitrarily selected starting points, as shown at the right of Figure 8. All the test trajectories converge directly into the desired loop and endlessly cycle through it. The results indicate that the vector flows, converging to this cycling, were successfully constructed through the training.

We examined the robustness of the presented scheme by introducing disturbances into the task environment. After the robot's neural network was trained to home (stage 3), it was tested with an additional, unexpected obstacle in the work space (see Figure 9). It can be seen that the robot wanders a bit but eventually finds a way to the goal. We believe that such a disturbance could partially destroy the shape of the vector field, but those tested were not fatal. Even though the branching decisions can be wrong at some positions because of unknown disturbances, these errors can be recovered from later by finding and passing through known spaces. One might say that the navigation scheme is sufficiently robust to cope with a restricted class of environmental changes so long as the global convergence of the vector flow is conserved.

5.3. Embedding in Sensory-Based State Space

In the simulation, the sensory-based state space was defined with a regression length of two. This section examines its adequacy in relation to the uniqueness of the mapping: the sensory-based state must be mapped uniquely into the maneuvering output. The trajectories of the example tasks described in the previous section are transversal in the (x, y, θ) task coordinate space, which means that the branching decision can be determined uniquely if the current position is identifiable from the defined regression of sensory profiles. Thus the adequacy of the mapping can be confirmed by showing that the correspondence between the defined sensory-based state space and positioning is one to one.

FIGURE 8. Cyclic routing training and testing. The assigned task is to cycle around the two islands in the upper part in a figure eight. The parameter settings used are the same as those for the homing task.

FIGURE 9. Robust navigation in the face of unexpected disturbances. The robot was tested after training for the homing task (stage 3). Arrows point to unknown obstacles.

Using the training points of the homing task (stage 3) and the cyclic routing task, we examined the effect on the correspondence when the regression length was varied between l = 1 (only the current time step) and l = 2. For l = 1, the sensory-based state space is a discrete state space of 6 × 6 × 6 (the output of the Kohonen network at the current time). We examined the sensory state for all of the training data and screened out the pairs of points that share the same sensory state. Figure 10a,b shows this result for l = 1. The pairs of positions connected with dotted lines share the same sensory state. There are several training points that share the same sensory state even though their positions are distant in the work space, which indicates that there exists some ambiguity in this sensory-based state space. The same was repeated for l = 2 and its result is shown in Figure 10c,d. This time it is seen that the ambiguity is gone in the defined sensory-based state space.
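The screening described above can be sketched as follows. The helper function and its distance threshold are hypothetical illustrations, not values from the paper: training points are grouped by their discrete sensory-based state, and any pair that shares a state while lying far apart in the work space is flagged as ambiguous.

```python
import math
from collections import defaultdict

def ambiguous_pairs(points, dist_thresh=0.2):
    """Screen training points for pairs that share the same discrete
    sensory-based state but lie far apart in the work space.

    points: list of (state, (x, y)) where `state` is the hashable
    regression of Kohonen outputs.  Returns the offending pairs."""
    by_state = defaultdict(list)
    for state, pos in points:
        by_state[state].append(pos)
    pairs = []
    for positions in by_state.values():
        for i in range(len(positions)):
            for j in range(i + 1, len(positions)):
                if math.dist(positions[i], positions[j]) > dist_thresh:
                    pairs.append((positions[i], positions[j]))
    return pairs
```

Lengthening the regression refines the states (each state carries more history), so positions that collide under l = 1 can become distinguishable under l = 2, which is exactly the effect reported in Figure 10.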

These results empirically demonstrate that the trajectories of the desired tasks were successfully embedded into the internal discrete state space once the necessary dimensionality was gained in the temporal direction.

6. DISCUSSIONS

FIGURE 10. Effect of regression length l on ambiguity of the sensory-based state space. Couples of positions connected by dotted lines share the same sensory state. Among the training points for (a) homing and (b) cyclic routing with l = 1, some ambiguity exists in the sensory state. For (c) homing and (d) cyclic routing with l = 2, no ambiguity remains.

This section offers an interpretation of our results within the dynamical systems perspective (Kugler, Kelso, & Turvey, 1980; Saltzman & Kelso, 1987) discussed in the study of actions and motor programs in biological systems. In this framework, a successful system realizes its desired trajectories in attractor dynamics of low dimensionality by setting adequate interactions between environmental variables and internal variables. The idea also applies to our scheme. The experiments showed that the robot, starting from arbitrary initial locations, converged to the home point or to the cyclic loops after the necessary training. The essential point is that the global dynamics of the robot, which results from the coupling of the topological decisions and the collision-free control, eventually realized the desired tasks by means of fixed-point and limit-cycle attractor dynamics.
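As a minimal illustration of the fixed-point case, consider an explicit vector field pointing toward the home position. This is our own sketch, not the paper's learned mapping (which arises implicitly from the trained neural architecture); the function names and step sizes are assumptions.

```python
import math

def home_field(x, y, home=(0.0, 0.0)):
    """Hypothetical unit vector field pointing toward the home position.

    A fixed-point attractor: following the field from any start converges
    to `home`. In the paper the field emerges from the learned
    sensory-to-maneuver mapping; here it is given analytically.
    """
    dx, dy = home[0] - x, home[1] - y
    norm = math.hypot(dx, dy)
    if norm < 1e-9:
        return 0.0, 0.0
    return dx / norm, dy / norm

def follow(start, steps=200, dt=0.1):
    """Integrate the field with simple Euler steps of length dt."""
    x, y = start
    for _ in range(steps):
        vx, vy = home_field(x, y)
        x, y = x + dt * vx, y + dt * vy
    return x, y

# Trajectories from arbitrary initial locations all end up within one
# step length of the home point.
for start in [(3.0, 4.0), (-5.0, 2.0), (1.0, -6.0)]:
    x, y = follow(start)
    print(math.hypot(x, y) < 0.11)  # converged close to home
```

A limit-cycle attractor for the cyclic routing task could be sketched analogously by adding a tangential component to the field around a closed loop.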

Our study employed a supervised training scheme; thus it can be said that the global dynamical structure is determined by how the robot has been trained. If the training is not sufficient (not covering the work space), some representative vectors remain unlearned, which prevents smooth convergence of the navigation and can even generate undesired infinite looping. Training should be repeated until the vectors at most locations point in the right directions.

FIGURE 11. Tangential collisions.

Eliminating the supervised training and introducing self-learning by trial and error is one future direction of our study. It will be quite interesting to investigate how the global dynamical structure can be self-organized by applying self-learning methodologies such as reinforcement learning (Barto, Sutton, & Watkins, 1990; Sutton, 1988; Whitehead & Ballard, 1991).

7. SUMMARY

Learning sensory-based, goal-directed navigation of a mobile robot was formulated as constructing a convergent hypothetical vector field in the task space utilizing local sensory information. We focused on the temporal sequence of topological changes in the sensory flow, rather than on the exact sensory profile at each moment, and used it to construct the correct mapping to maneuvering decision outputs on a neural architecture through supervised training. The simulation experiments showed that the robot successfully learned the goal-directed navigation tasks of homing and sequential routing under the proposed scheme.

REFERENCES

Barto, A., Sutton, R., & Watkins, C. (1990). Sequential decision problems and neural networks. In D. S. Touretzky (Ed.), Advances in neural information processing systems 2 (pp. 686-693). San Mateo, CA: Morgan Kaufmann.

Brooks, R. (1987). Intelligence without representation. Artificial Intelligence, 47, 139-159.

Durrant-Whyte, H., & Leonard, J. (1989). Navigation by correlating geometrical sensor data. Proceedings of the IEEE/RSJ International Workshop on Intelligent Robots and Systems '89, 440-447.

Elfes, A. (1987). Sonar-based real-world mapping and navigation. IEEE Journal of Robotics and Automation, 3, 249-265.

Freyberger, F., Kampmann, P., & Schmidt, K. (1990). Constructing maps for indoor navigation of a mobile robot by using an active 3D range image device. Proceedings of the IEEE/RSJ International Workshop on Intelligent Robots and Systems '90, 143-148.

Khatib, O. (1986). Real-time obstacle avoidance for manipulators and mobile robots. The International Journal of Robotics Research, 5, 90-98.

Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43, 59-69.

Kugler, P. N., Kelso, J. A. S., & Turvey, M. T. (1980). On the concept of coordinative structures as dissipative structures: 1. Theoretical lines of convergence. In G. E. Stelmach and J. Requin (Eds.), Tutorials in motor behavior (pp. 49-70). Amsterdam: North- Holland.

Mataric, M. (1992). Integration of representation into goal-driven behavior-based robots. IEEE Transactions on Robotics and Automation, 8, 304-312.

Payton, D. (1990). Internalized plans: A representation for action resources. Robotics and Autonomous Systems, 6, 89-103.

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel distributed processing (pp. 318-362). Cambridge, MA: MIT Press.

Saltzman, E. L., & Kelso, J. A. S. (1987). Skilled actions: A task- dynamic approach. Psychological Review, 94, 84-106.

Sutton, R. (1988). Learning to predict by the method of temporal differences. Machine Learning, 3, 9-44.

Whitehead, S., & Ballard, D. (1991). Learning to perceive and act by trial and error. Machine Learning, 7, 45-83.

APPENDIX: DESCRIPTION OF TANGENTIAL COLLISIONS

Consider the example shown in Figure 11. If the robot has a focus in the direction indicated by the arrow and proceeds straight in that direction, it collides tangentially with the corner marked by a cross. The function obst( ) added in eqn (1a) generates a slight repulsive rotational moment from the corner when the robot approaches it closely. With this rotational moment, the robot can pass the corner without collision, as indicated by the solid line.
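The paper does not give obst( ) explicitly. The following is a hypothetical sketch of such a repulsive rotational moment, loosely in the potential-field style of Khatib (1986); the function signature, the influence distance d0, and the gain k are all our assumptions, not the authors' formulation.

```python
import math

def obst(robot_xy, heading, corner_xy, d0=1.0, k=0.3):
    """Hypothetical repulsive rotational moment from a nearby corner.

    Returns a small angular correction (radians) that turns the robot's
    heading away from the corner when it is closer than d0; zero otherwise.
    """
    dx = corner_xy[0] - robot_xy[0]
    dy = corner_xy[1] - robot_xy[1]
    d = math.hypot(dx, dy)
    if d >= d0 or d < 1e-9:
        return 0.0  # corner out of range (or degenerate): no moment
    # Signed angle from the current heading to the corner direction.
    ang = math.atan2(dy, dx) - heading
    ang = math.atan2(math.sin(ang), math.cos(ang))  # wrap to (-pi, pi]
    # Rotate away from the corner, more strongly the closer the robot is.
    return -math.copysign(k * (1.0 / d - 1.0 / d0), ang)

# Approaching a corner slightly to the left of the heading: the moment
# is negative, so the robot steers right and grazes past the corner.
print(obst((0.0, 0.0), 0.0, (0.5, 0.1)) < 0.0)
```

The 1/d - 1/d0 profile makes the correction vanish smoothly at the boundary of the influence region, so the moment only perturbs the maneuvering output in the immediate vicinity of the corner.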