
Robotics and Autonomous Systems 899 (2001) 1–13

Conceptual representations of actions for autonomous robots

A. Chella a,∗, S. Gaglio a, R. Pirrone b

a Dipartimento di Ingegneria Automatica e Informatica, Università di Palermo, Viale delle Scienze, 90128 Palermo, Italy

b Centro di Studio sulle Reti di Elaboratori, CNR, Palermo, Italy

Received 20 April 2000; received in revised form 20 September 2000; accepted 4 October 2000

Abstract

An autonomous robot involved in long and complex missions should be able to generate, update and process its own plans of action. In this perspective, it is not plausible that the meaning of the representations used by the robot is given from the outside of the system itself. Rather, the meaning of internal symbols must be firmly anchored to the world through the perceptual abilities and the overall activities of the robot. According to these premises, in this paper we present an approach to action representation that is based on a “conceptual” level of representation, acting as an intermediate level between symbols and data coming from sensors. Symbolic representations are interpreted by mapping them on the conceptual level through a mapping mechanism based on artificial neural networks. Examples of the proposed framework are reported, based on experiments performed on an RWI-B12 autonomous robot. © 2001 Elsevier Science B.V. All rights reserved.

Keywords: Artificial vision; Conceptual spaces; Actions; Processes; Representation levels; Neural networks; Hybrid processing

1. Introduction

An autonomous robot engaged in complex missions, such as surveillance in unstructured environments [7], space missions [28], office mail distribution [34], or service applications [37], must perform long sequences of actions. In such cases the robot needs high-level planning capabilities in order to generate the correct action sequences according to the current task and the environmental context. High-level planning requires the ability to build and process rich inner representations of the environment and of the agents acting in it, including the robot itself and its own actions [30].

The AI community has developed several approaches to the description of actions and the generation of robot plans.

∗ Corresponding author.
E-mail addresses: [email protected] (A. Chella), [email protected] (S. Gaglio), [email protected] (R. Pirrone).

Fikes and Nilsson [13] developed the well-known STRIPS planner. Penberthy and Weld [31] proposed UCPOP, which allows for non-linear plans. Blum and Furst [4] described Graphplan, which is based on efficient graph-search algorithms. Weld et al. [36] extended Graphplan in SGP (Sensory Graphplan) to allow for sensing actions. The GOLOG system [23] is based on the situation calculus. Although these planners allow for rich and expressive descriptions of actions and tasks, they manage only in a very limited way the problem of linking perception with action.

Generally, a typical symbolic planner delegates to some sort of execution module the burden of taking into account the “situatedness” of the robot in the environment. The most famous example is PLANEX [12], which executes the plans generated by STRIPS. Another approach consists in generating conditional plans that depend on sensing actions, i.e., logical statements whose truth value depends on the state of the external environment.



In all these cases, the link between perception and action of an effective robot working in real environments is not taken into account. In the case of GOLOG, a system called GOLEX has been developed [17], with the aim of linking high-level symbolic representations with perception and control within a real robotic architecture. The link consists simply in the introduction of PROLOG clauses that define high-level symbols in terms of low-level primitives. In addition, the perceptual aspects taken into account are very simple.

On the other side, researchers involved in mobile robot architectures have developed working robots rich in sensors that effectively “situate” the robot in the environment. Examples of this kind are, among others, RHINO developed by Thrun et al. [33], AuRA developed by Arkin [2], SOMASS proposed by Malcolm and Smithers [25], and the Animate Agent Architecture [14] (see [21] for a review). Although the operations of these robots are impressive, high-level action planning generally plays a limited role in them.

In robotics, various techniques have been developed in order to extract structured representations from sensory data (see, e.g., [19,22]). In a different, though related, field, various systems for natural language interpretation and the symbolic description of dynamic scenes have been proposed (among them VITRA, proposed by Herzog [18]).

Our aim is to develop a representation of actions based on a principled integration of the approaches of mobile robot architectures and of symbolic planners. We assume that this integration requires the introduction of a missing link between these two kinds of approaches. The role of such a link is played by the notion of conceptual space (CS), which is a central feature of the proposed system. The CS acts as an intermediate action representation level between “subconceptual” (i.e., reactive) actions and symbolic actions. The theory of conceptual spaces has been developed by Gärdenfors and is described in detail in [16]. Elsewhere [8–10] we argued for its relevance to robot vision in the field of static and dynamic scene analysis.

Conceptual spaces provide a principled way for relating a low-level, unstructured representation of data with a high-level, linguistic formalism. A conceptual space is a metric space whose dimensions are strictly related to the quantities processed in the subconceptual area (e.g., sensory-related quantities).

Fig. 1. The three computational areas of the system and the relationships among them.

Examples of possible dimensions of a CS are color, pitch, and spatial coordinates. Such dimensions do not depend on any specific linguistic description. In this sense, conceptual spaces are prior to the symbolic characterization of cognitive phenomena.

In this perspective, our system is organized in three computational areas (Fig. 1). They must not be interpreted as a hierarchy of levels of higher abstraction; rather, they are concurrent computational components working together on different commitments.

The subconceptual area is concerned with the low-level processing of perceptual data coming from the robot sensors. The term subconceptual suggests that information is not yet organized in terms of conceptual structures and categories. Our subconceptual area includes reactive behavior modules, such as a self-localization module and a wandering and obstacle detection module, described in detail in [9], and some 3D reconstruction modules, described in Section 7.

In the linguistic area, representation and processing are based on a logic-oriented formalism. We adopted NeoClassic, a hybrid formalism based on a Description Logic in the KL-ONE tradition, that has been employed also in industrial applications [5,6,27]. Our linguistic area includes the symbolic representations of situations and actions and is described in detail in Section 4.

The conceptual area is intermediate between the subconceptual and the linguistic areas. This area is based on the notion of conceptual spaces previously introduced.


Here, data are organized in conceptual structures that are still independent of any linguistic description. The symbolic formalism of the linguistic area is interpreted on these structures. In particular, in our system we choose the CS so that each point represents a simple 3D primitive, as in a CAD system, along with its motion parameters. In this way, a moving scene is described as a set of points in CS, i.e., the set of its moving parts. The conceptual space is described in detail in Section 2.

In Fig. 1 the links between these areas are drawn. There is no privileged direction in the flow of information among them: some computations are strictly bottom-up, with data flowing from the subconceptual up to the linguistic through the conceptual area; other computations combine top-down with bottom-up processing. In particular, the mapping between the conceptual area and the linguistic area plays a main role in the proposed system, because it gives the interpretation of linguistic symbols in terms of conceptual structures. It is achieved through an expectation generation mechanism implemented by means of suitable recurrent neural networks [15]. This mapping is described in detail in Section 5.

The proposed system has been implemented on a Real World Interface RWI-B12 autonomous robot equipped with an Ethernet radio link and a vision head composed of a pan-tilt unit on which a CCD video camera is mounted. The robot environment is a laboratory area populated by big white boxes, vases, persons and other obstacles.

2. Conceptual space

The theory of conceptual spaces provides a principled way for relating the low-level, unstructured representation of data coming out from the robot sensors with a high-level, linguistic formalism. As stated before, a CS is a metric space whose dimensions are strictly related to the quantities processed in the subconceptual area. By analogy with the term pixel, we call knoxel a point in a CS. A knoxel is an epistemologically primitive element at the considered level of analysis.

The basic blocks of our representations in CS are the superquadrics [32]. Superquadrics are 3D geometric primitives derived from the quadrics parametric equation with the trigonometric functions raised to two real exponents. The parametric form of a superquadric is:

f(η, ω) = [ ax cos^ε1(η) cos^ε2(ω),  ay cos^ε1(η) sin^ε2(ω),  az sin^ε1(η) ]^T,   (1)

where −π/2 ≤ η ≤ π/2 and −π ≤ ω < π. The quantities ax, ay, az are the lengths of the superquadric axes, and the exponents ε1, ε2 are the form factors: ε1 acts in terms of the longitude, and ε2 in terms of the latitude of the object's surface. A value less than 1 lets the superquadric take on a squared form, while a value near 1 lets it take on a rounded form. Eq. (1) describes a superquadric in canonical form; to describe a superquadric in general position in 3D space, three center coordinates px, py, pz and three orientation parameters ϕ, ϑ, ψ should be added. So, a generic superquadric m is represented as a vector in the R^11 space:

m = [ax ay az ε1 ε2 px py pz ϕ ϑ ψ].   (2)
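As an illustration of Eqs. (1) and (2), the following minimal sketch (not part of the original system; NumPy and the function name are our assumptions) samples surface points of a superquadric in canonical pose from its first five parameters:

import numpy as np

def superquadric_surface(ax, ay, az, eps1, eps2, n=40):
    # Sample the parametric surface of Eq. (1) on an n x n grid of (eta, omega).
    eta = np.linspace(-np.pi / 2, np.pi / 2, n)
    omega = np.linspace(-np.pi, np.pi, n, endpoint=False)
    eta, omega = np.meshgrid(eta, omega)
    # Signed power keeps the sign of the trigonometric factor for real exponents.
    spow = lambda t, e: np.sign(t) * np.abs(t) ** e
    x = ax * spow(np.cos(eta), eps1) * spow(np.cos(omega), eps2)
    y = ay * spow(np.cos(eta), eps1) * spow(np.sin(omega), eps2)
    z = az * spow(np.sin(eta), eps1)
    return np.stack([x.ravel(), y.ravel(), z.ravel()], axis=1)

# A box-like superquadric: small form factors give an almost squared shape.
points = superquadric_surface(1.0, 0.5, 0.3, 0.1, 0.1)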

An example of 3D reconstruction in terms of superquadrics of a scene in which our robot is in front of a box is shown in Fig. 2 (right). How the robot may perform this reconstruction is described in Section 7.

In order to account for the dynamic aspects of actions, e.g., when the robot moves towards the box, we adopt a CS in which each point represents a whole simple motion of a superquadric. In this sense, the space is intrinsically dynamic, since the generic motion of an object is represented in its wholeness, rather than as a sequence of single, static frames.

The decision of which kind of motion can be considered simple is not straightforward, and it is strictly related to the problem of motion segmentation. Marr and Vaina [26] adopt the term motion segment to indicate simple movements. According to their State-Motion-State schema, a simple motion consists of the motion interval between two subsequent (possibly instantaneous) rest states. We generalize this schema by considering a simple motion as a motion interval between two subsequent generic discontinuities in the motion parameters.
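For illustration only, such discontinuities could be detected on a sampled parameter trajectory as in the sketch below; the use of a second difference and the threshold value are our assumptions, not the segmentation procedure actually used by the system:

import numpy as np

def segment_simple_motions(m, threshold=0.5):
    # m: (T, 11) array, one row of superquadric parameters per time sample.
    # A "simple motion" is cut wherever the parameter velocity changes abruptly.
    velocity = np.diff(m, axis=0)
    jump = np.linalg.norm(np.diff(velocity, axis=0), axis=1)
    cuts = [0] + [int(i) + 1 for i in np.where(jump > threshold)[0]] + [len(m) - 1]
    return list(zip(cuts[:-1], cuts[1:]))   # list of (start, end) index pairs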

Let us call m(t) a function associated to a generic superquadric that, for each instant t, gives as its value the vector of the geometric parameters of the superquadric during a simple motion:


Fig. 2. An example of 3D reconstruction by means of superquadrics.

m(t) = [ax(t) ay(t) az(t) ε1(t) ε2(t) px(t) py(t) pz(t) ϕ(t) ϑ(t) ψ(t)].   (3)

In this way, also changes in the shape and size of the superquadric can be taken into account.

If we represent a moving superquadric in the R^11 space described above, we obtain a set of points corresponding to subsequent instants of time (the sample values of the function m(t)). This solution is not satisfactory because it does not capture the motion in its wholeness. A possible alternative is suggested by the well-known discrete Fourier transform (DFT) [29]. Given a generic parameter of a superquadric, e.g., ax, consider the function ax(t) that, for each instant t, returns the corresponding value of ax. ax(t) can be seen as the superimposition of a discrete number of trigonometric functions. This allows the representation of the function ax(t) in a discrete functional space, whose basis functions are trigonometric functions.

By a suitable composition of the time functions of all the superquadric parameters, the overall function m(t) may be represented in its turn in a discrete functional space. We adopt the resulting functional space as our dynamic conceptual space. This CS can be thought of as an “explosion” of the R^11 space in which each main axis is split into a number of new axes, each one corresponding to a harmonic component. In this way a point in the CS represents a superquadric along with its own simple motion. This new CS is also consistent with the static space: a superquadric at rest will have its harmonic components equal to zero.
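A minimal sketch of this construction, under our own assumptions that each simple motion is uniformly sampled and that only the first few harmonics are retained, is the following:

import numpy as np

def knoxel_from_motion(m, n_harmonics=4):
    # m: (T, 11) samples of m(t) over one simple motion.
    # Each of the 11 parameter axes is "exploded" into its first DFT harmonics,
    # so the whole simple motion collapses into a single CS point (a knoxel).
    spectrum = np.fft.rfft(m, axis=0)[:n_harmonics]          # (n_harmonics, 11)
    return np.concatenate([spectrum.real.ravel(), spectrum.imag.ravel()])

# A superquadric at rest: constant parameters, hence zero higher harmonics.
static_motion = np.tile([1.0, 0.5, 0.3, 0.1, 0.1, 0, 0, 0, 0, 0, 0], (64, 1))
k = knoxel_from_motion(static_motion)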

In Fig. 3 (left) a static CS is schematically depicted; Fig. 3 (right) shows the dynamic CS obtained from it. In the CS on the left, the axes represent superquadric parameters; in the rightmost figure each of them is split into a group of axes that represent the harmonics of the corresponding superquadric parameter. The figure is a pictorial representation of the CS: our effective CS is made up of 11 dimensions (the superquadric parameters) plus their harmonic components. It is easy to extend this framework in order to consider other superquadric features, such as their colors: each new parameter is a new dimension or a set of new dimensions in CS.

As far as single knoxels are concerned, the metric defined on CS is a standard Euclidean metric. In the case of sequences of knoxels, which are the conceptual counterpart of more complex entities in the world, we assume that their degree of similarity is implicitly learned by the system on the basis of a set of examples (see Section 5).

It is worth noting that knoxels correspond to the world as it is perceived by the agent, i.e., as it results from the data coming from perceptual processes. As a consequence, the conceptual representation of the world is in general incomplete, affected by uncertainty, and can be (at least partially) wrong. In our approach, the different available techniques for dealing with uncertainty can be applied at different levels: subconceptual, conceptual and linguistic. In our opinion, a fully satisfactory treatment of uncertainty and of incompleteness of data requires the development of the “conceptually driven” active perception feedback mentioned in Section 8.

3. Situations and actions in conceptual space

3.1. Situations

A simple motion of a superquadric corresponds to a knoxel in CS. Let us now consider a scene made up of the robot itself, along with several other movable objects.


Fig. 3. An evocative, pictorial representation of the conceptual space.

Objects may be approximated by one or more superquadrics. Consider the robot moving towards a white block, as in Fig. 2. We call this kind of scene a Situation. It may be represented in CS by the set of the knoxels corresponding to the simple motions of its components, as in Fig. 4, where ka corresponds to the white box, and kb corresponds to the robot moving towards it.

Fig. 4. An evocative picture of a Situation like that of Fig. 2 in CS.

In this case, ka corresponds to a quiet object and its harmonic components are zero, while kb corresponds to a moving object.

It should be noted that, on the one side, a configuration in CS may correspond to a state of affairs perceived by the robot, i.e., it may correspond to the actual arrangement of the external world as far as it is accessible to the robot itself. On the other side, a configuration in CS may correspond to a scene imagined by the robot. For example, it may correspond to a goal, or to some dangerous state of affairs that the robot must figure out in order to avoid it.

3.2. Actions

In a Situation, the motions in the scene occur simultaneously, i.e., they correspond to a single configuration of knoxels in the conceptual space. To consider a composition of several motions arranged according to a temporal sequence, we introduce the notion of Action [1].

An Action corresponds to a “scattering” from one Situation to another Situation of knoxels in the conceptual space. We assume that the Situations within an Action are separated by instantaneous events. In the transition between two subsequent configurations, a “scattering” of at least one knoxel occurs. This corresponds to a discontinuity in time that is associated to an instantaneous event [26].


Fig. 5. An example of Action.

It should be noted that, in this respect, the adopted description of Actions is strictly anchored to the robot perception, because an Action is described as a change in the scene perceived by the robot.

The robot may perceive an Action passively, when it sees some changes in the scene, e.g., a person in the robot environment changing his/her position. More important, the robot may be the actor of the Action itself, when it moves or when it interacts with the environment, e.g., when it pushes an object. In both cases, an Action corresponds to a scattering from one Situation to another.

Fig. 5 shows a simple Action performed by the robot. In the figure, the robot goes towards the white box (a), it moves right (b), then it turns left in order to avoid the obstacle (c). This Action may be represented in CS (Fig. 6) as a double scattering of the knoxel kb representing the robot. (The knoxel ka representing the box remains unchanged.)
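Purely as an illustration, a scattering between two subsequent CS configurations can be made concrete as follows (the data structure, labels and tolerance are our assumptions):

import numpy as np

def scattered_knoxels(situation_a, situation_b, tol=1e-3):
    # situation_*: dict mapping an object label (e.g. "robot", "box") to its knoxel.
    # Returns the labels whose knoxel changes between the two configurations,
    # i.e. the knoxels involved in the scattering.
    return [label for label, k in situation_a.items()
            if np.linalg.norm(k - situation_b[label]) > tol]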

4. Linguistic area

The representation of actions in the linguistic area is based on a high-level, logic-oriented formalism. As previously stated, we adopted NeoClassic, a hybrid formalism based on a Description Logic in the KL-ONE tradition. A hybrid formalism in this sense is constituted by two different components: a terminological component for the description of concepts, and an assertional component that stores information concerning a specific context.


Fig. 6. An evocative picture of the Action of Fig. 5 in CS.

In the domain of robot actions, the terminological component contains the description of relevant concepts such as Situation, Action, time instant, and so on. The assertional component stores the assertions describing specific situations and actions.

Fig. 7 shows a fragment of the terminological knowledge base (in order to increase readability, we adopted a graphical notation of the kind used by Brachman and Schmoltze [6]). In the upper part of the figure some highly general concepts are represented. In the lower part, the Avoid concept is shown, as an example of the description of an action in the terminological KB.

Fig. 7. A fragment of the terminological KB.

Every Situation has a starting and an ending instant. So, the concept Situation is related to Time instant by the roles start and end. A Robot is an example of a moving object. Also Actions have a start instant and an end instant. An Action involves a temporal evolution (a scattering in CS). Actions have at least two parts, that are Situations not occurring simultaneously: the precond (i.e., the precondition) and the effect of the action itself.

An example of an Action is Avoid. According to the KB reported in the figure, the precondition of Avoid is a Blocked_path Situation, in which the robot and a blocking object participate. The effect of the Avoid action is a Free_path Situation.

The temporal relations between Situations are not explicitly represented in the terminology. The formalism could be easily extended with temporal operators; these extensions have been proposed and deeply studied in the literature [3]. However, we do not face these aspects in this paper. In the following section, we will show how it is possible to deal with many aspects of temporal ordering by means of the mechanism of mapping between the conceptual and the linguistic area based on neural networks.

In general, we assume that the description of the concepts in the symbolic KB (e.g., Blocked_path) is not completely exhaustive. We symbolically represent only the information that is necessary for linguistic inferences. One of the key assumptions of our approach is that the referential grounding of symbols does not rest on rich explicit symbolic descriptions, but on the information implicitly stored in the mapping mechanism (see Section 5).


The assertional component contains facts expressed as assertions in a predicative language, in which the concepts of the terminological component correspond to one-argument predicates, and the roles (e.g., precond, part_of) correspond to two-argument relations. For example, the following predicates describe that the instance av1 of the Action Avoid has as a precondition the instance bl1 of the Situation Blocked_path and as an effect the Situation Free_path:

Avoid(av1)
precond(av1, bl1)
effect(av1, fr1)
Blocked_path(bl1)
Free_path(fr1)
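Purely for illustration (the actual system uses NeoClassic, not the ad hoc structures below), such assertions can be mirrored and queried as follows:

facts = {
    ("Avoid", "av1"),
    ("Blocked_path", "bl1"),
    ("Free_path", "fr1"),
    ("precond", "av1", "bl1"),
    ("effect", "av1", "fr1"),
}

def fillers(role, instance):
    # Return the fillers of a two-argument relation for a given first argument.
    return [f[2] for f in facts if f[0] == role and f[1] == instance]

print(fillers("effect", "av1"))    # ['fr1']: the expected effect Situation of av1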

5. Mapping between the conceptual and the linguistic area

5.1. Generation of expectations

The mapping between the symbolic representations in the linguistic area and the structures in the conceptual space is based on a suitable sequential mechanism of expectations.

The recognition of a certain component of a Situation (a knoxel in CS) will elicit the expectation of other components of the same Situation in the scene. In this case, the mechanism seeks the corresponding knoxels in the current CS configuration. We call this type of expectation synchronic, because it refers to a single Situation in CS.

The recognition of a certain Situation in CS could also elicit the expectation of a scattering in the arrangement of the knoxels in the scene; i.e., the mechanism generates the expectations for another Situation in a subsequent CS configuration. We call this expectation diachronic, in the sense that it involves subsequent configurations of CS. Diachronic expectations can be related with the link existing between a Situation perceived as the precondition of an Action, and the corresponding Situation expected as the effect of the action itself. In this way diachronic expectations can prefigure the Situation resulting as the outcome of an action. For example, when the robot recognizes an instance of the Blocked_path Situation, it generates the expectations for a Free_path Situation as the effect of an Avoid action.

We take into account two main sources of expectations. On the one side, expectations could be generated on the basis of the structural information stored in the symbolic knowledge base, as in the previous example of the Avoid action. We call these expectations linguistic. As soon as a Situation is recognized and that Situation is the precond of an Action, the symbolic description elicits the expectation of the effect Situation.

On the other side, expectations could also be generated by a purely associative, Hebbian mechanism between Situations. Suppose that the robot has learnt that when it sees a person with the arm pointing to the right, it must turn to the right. The system could learn to associate these Situations and to perform the related action. We call these expectations associative.

5.2. The neural network implementation of the mapping

The mapping between the conceptual space and the linguistic area is implemented by means of recurrent neural networks. Each concept C in the linguistic area is associated with a suitable recurrent neural network which acts as a “predictive filter” on the sequences of knoxels corresponding to C. We adopted multi-layered neural networks with local feedback in the hidden units [15]. For the sake of brevity, we cannot enter into details here. The neural implementation of the mapping mechanism is not substantially different from the one we have developed for the interpretation of static scenes. So, for greater detail, we refer the reader to [8], where the neural network for static scene interpretation is described.

Let us consider a set of knoxels s = {k1, k2, . . . , km} corresponding to an instance of a Situation concept C, e.g., Blocked_path. When a knoxel of s, say k1, has been individuated by the subconceptual area and it is presented as input to the recurrent network associated to C, the network generates as output another knoxel of s, say k2.

In this way, the network predicts the presence of k2 in CS. The expectation is considered confirmed when the subconceptual area individuates a knoxel k∗ such that k2 ≈ k∗. If the expectation is confirmed, then the network receives k2 as input and generates a new expected knoxel k3, and so on.


The network therefore recognizes the configuration of knoxels of the associated concept according to a recognition and expectation loop.

If C is a Situation, the generated sequences refer to a single CS configuration. If C is an Action, the sequences refer to a succession of different CS configurations. The first case is an example of synchronic attention, the second one is an example of diachronic attention.

Recurrent neural networks make it possible to avoid an exhaustive linguistic description of conceptual categories: in some sense, prototype Situations and Actions “emerge” from the activity of the neural networks by means of a training phase based on examples. In addition, the measure of similarity between a prototype and a given Situation or Action is implicit in the behavior of the network and is determined by learning.
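The overall recognition and expectation loop can be summarized by the following sketch, in which predict() stands for the recurrent network associated with a concept C and observe() for the knoxels currently individuated by the subconceptual area; both interfaces, the tolerance and the step limit are illustrative assumptions:

import numpy as np

def expectation_loop(first_knoxel, predict, observe, tol=0.1, max_steps=10):
    # Recognition-and-expectation loop: the network predicts the next knoxel of
    # the concept; the expectation is confirmed if a close knoxel is observed.
    k, matched = first_knoxel, [first_knoxel]
    for _ in range(max_steps):
        expected = predict(k)                      # network output, e.g. k2 from k1
        candidates = observe()                     # knoxels individuated in CS
        dists = [np.linalg.norm(expected - c) for c in candidates]
        if not dists or min(dists) > tol:
            return False, matched                  # expectation not confirmed
        k = candidates[int(np.argmin(dists))]      # confirmed knoxel k*
        matched.append(k)
    return True, matched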

6. Planning in conceptual space

The proposed framework for the interpretation of robot actions may be extended in order to allow the robot to deliberate its own sequences of actions. In this perspective, some forms of planning may be performed by the system taking advantage of the representations in CS. Note that we are not claiming that all planning must be performed within CS. Not all the knowledge of the system is simultaneously represented at the conceptual level. The conceptual space acts as some sort of working memory. Long-term declarative knowledge is stored at the linguistic level. The more “abstract” forms of reasoning, which are less perceptually constrained, are likely to be performed mainly within the linguistic area.

The forms of planning that are more directly related to perceptual information can take great advantage of the representations in the conceptual area. In the proposed framework, in fact, the preconditions of an action can be simply verified by geometric inspections in the CS, while in STRIPS-like planners the preconditions are verified by means of logical inferences on symbolic assertions. Also the effects of an action need not be described by adding or deleting symbolic assertions, as in STRIPS, but they can be easily described by the Situation in CS resulting from the expectations of the execution of the action itself.

In order to explain this point, let us suppose that the robot has recognized the current Situation p, e.g., that it is in front of a box. Let us also suppose that the robot knows that its goal g is to be in a certain position with a certain orientation. A set of expected Situations {e1, e2, . . .} is generated by means of the interaction of both the linguistic and the associative modalities described above. Each ei in this set can be recognized to be the effect of some action ai in a set {a1, a2, . . .}, where each ai is geometrically compatible with the current Situation p.

The robot chooses an action ai according to some criteria; e.g., ai is the action whose expected effect ei has the minimum Euclidean distance in CS from the “goal” g. Once ai has been chosen, the robot can execute it; then it may update the current Situation p according to the new perceptions, and restart the mechanism of generation of expectations.
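In a minimal form (our own sketch, with the criterion restricted to the Euclidean distance mentioned above), the selection step reads:

import numpy as np

def choose_action(expected_effects, goal):
    # expected_effects: dict mapping each compatible action a_i to the CS point
    # of its expected effect e_i; goal: CS point of the goal Situation g.
    return min(expected_effects,
               key=lambda a: np.linalg.norm(expected_effects[a] - goal))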

On the one side, linguistic expectations are the main source of deliberative robot plans: the prefiguration of the effect of an action is driven by the description of the action in the linguistic KB. This mechanism is similar to the selection of actions in deliberative forward planners. On the other side, associative expectations are at the basis of a more reactive form of planning: in this latter case, perceived situations can “reactively” recall some expected effect of an action.

This process of action selection has the effect of “situating” the robot in its environment, and allows it to have its own goals, in the sense of Maes [24]. In this way, the symbols processed by the robot are always firmly grounded in the robot perceptions.

7. The 3D vision system

Although a precise 3D reconstruction of a scene is a hard task from the computational point of view, it is not difficult to obtain approximate 3D reconstructions under certain conditions. We have implemented some 3D reconstruction modules, each one acting on particular categories of objects. In the following we describe two examples of such modules: the polyhedral reconstruction module and the module for the reconstruction of symmetric objects. More in general, the task of 3D reconstruction of the perceived scenes could take great advantage of a “conceptually driven” active perception loop of the kind we mention among future developments (see Section 8).


Fig. 8. The operation of the polyhedral reconstruction module.

Images of the scene are acquired by a color CCD camera on the top of the RWI-B12 robot. The images are processed by the 3D reconstruction modules in order to recover the superquadric parameters approximating the entities in the scene. It should be noted that the robot is always able to represent itself in CS, as far as it knows its own shape, and it is able to estimate its position by the odometer sensors and by a suitable self-localization module, as described in [10].

7.1. The polyhedral reconstruction module

The polyhedral reconstruction module estimates the superquadric parameters approximating a polyhedron. This module is activated when the robot encounters large box-like objects, such as the box shown in Fig. 2. It is based on the analysis of the possible vertices of a parallelepiped, according to the methodology described, among others, by Waltz [35] (see Fig. 8).

The image acquired by the robot (a) is processed by means of the Hough transform [11] in order to highlight and extract the lines in the image (b). The extracted lines are then analyzed in order to find the possible intersections that individuate trihedral vertices (c).

The knowledge of the position and of the inclination angle of the robot camera allows the system to estimate the effective dimensions of the box, and to approximate it by means of a square-shaped superquadric (d).
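The line-extraction step (b) can be sketched with off-the-shelf tools as follows; the paper only states that the Hough transform [11] is applied, so the specific OpenCV calls, thresholds and file name below are illustrative assumptions:

import cv2
import numpy as np

image = cv2.imread("box_view.png", cv2.IMREAD_GRAYSCALE)   # hypothetical file name
edges = cv2.Canny(image, 50, 150)
# Probabilistic Hough transform: returns line segments as (x1, y1, x2, y2).
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, 60,
                        minLineLength=40, maxLineGap=5)
# The segments' pairwise intersections are then tested for trihedral-vertex
# configurations, as in step (c) of Fig. 8.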

7.2. The reconstruction module for symmetric objects

The reconstruction module for symmetric objects is activated when the robot is in front of some object presenting an axial symmetry (e.g., a vase). It is based on the analysis of the occluding contours of symmetric objects (see Fig. 9).

The module works on sequences of images acquired while the robot turns around the object. For each view (a), the occluding contour is approximated by means of a b-snake [20] (b). A b-snake is a deformable curve that moves in the image under the influence of forces related to the local distribution of the gray levels, and it is “attracted” by the contours of the object. The b-snakes of the object contours extracted during the robot movements may be considered as the wireframe of the 3D surface that approximates the whole object (c). In order to recover the approximating superquadrics, the object is segmented into its constituent convex regions.


Fig. 9. The operation of the reconstruction module for symmetric objects.

Each region is then approximated by means of a suitable superquadric (d).
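Once 3D surface samples of a convex region are available, one standard way (not necessarily the one adopted by the authors) to recover the approximating superquadric is to fit the superquadric inside–outside function by least squares; a minimal canonical-pose sketch, assuming SciPy, is:

import numpy as np
from scipy.optimize import least_squares

def fit_superquadric(points):
    # points: (N, 3) surface samples of one convex region (canonical pose assumed).
    def residuals(p):
        ax, ay, az, e1, e2 = p
        x, y, z = np.abs(points).T + 1e-9
        # Inside-outside function: equals 1 for points on the superquadric surface.
        f = ((x / ax) ** (2 / e2) + (y / ay) ** (2 / e2)) ** (e2 / e1) \
            + (z / az) ** (2 / e1)
        return f - 1.0
    p0 = np.concatenate([np.abs(points).max(axis=0) + 1e-3, [1.0, 1.0]])
    bounds = ([1e-3] * 3 + [0.1, 0.1], [np.inf] * 3 + [2.0, 2.0])
    return least_squares(residuals, p0, bounds=bounds).x    # [ax, ay, az, eps1, eps2]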

8. Conclusions

We have presented a framework for the description of Situations and Actions at an intermediate “conceptual” level between the subconceptual and the linguistic ones. The main advantage of this framework is that it suitably grounds the symbols of the robot, needed for reasoning about its own actions and for planning new actions, in the robot perceptions.

Until now the generation of expectations in our system has been mainly used to drive the analysis and the interpretation of the conceptual representations within the robot itself. As far as future research is concerned, an important development would consist in using expectations to drive the robot's exploration of the external world. The expectations generated at the linguistic and conceptual levels could be fed back through the subconceptual area to the actuators and the sensors of the system, in order to acquire new information from the environment, achieving in this way some form of “conceptually driven” active perception.

A further line of future development consists in expanding the expressiveness of the linguistic formalism, and in coupling more complex forms of symbolic reasoning with the inferences performed at the conceptual level. Among the most promising extensions of the symbolic capabilities of our system are the explicit representation of time, the explicit treatment of uncertainty, and symbolic, high-level deliberative planning techniques.

In addition, we are working to extend our framework to define suitable plans in a multirobot environment. In this case, the knoxels in CS are generated through suitable processes of information fusion of the perceptions of the team of robots. Also the Situations and Actions must be referred to the whole robot team, which may be considered as a single “autonomous entity” with its own perceptions, actions and a suitable common and shared conceptual space.

Acknowledgements

The authors wish to thank Marcello Frixione and Peter Gärdenfors for interesting discussions about the topics of the paper. Giuseppe Sajeva and Ignazio Infantino contributed to the implementation of the reported experimental setup.


This work has been partially supported by the Project “An Intelligent System for Autonomous Robots Supervision in Space” of the ASI and by “Progetto Cofinanziato CERTAMEN” of the MURST.

References

[1] J.F. Allen, Towards a general theory of action and time, Artificial Intelligence 23 (2) (1984) 123–154.

[2] R.C. Arkin, Integrating behavioral, perceptual, and world knowledge in reactive navigation, Robotics and Autonomous Systems 6 (1990) 105–122.

[3] A. Artale, E. Franconi, A temporal description logic for reasoning about actions and plans, Journal of Artificial Intelligence Research 9 (1998) 463–506.

[4] A. Blum, M. Furst, Fast planning through planning graph analysis, Artificial Intelligence 90 (1–2) (1997) 281–300.

[5] R.J. Brachman, D.L. McGuinness, P.F. Patel-Schneider, L. Alperin Resnick, A. Borgida, Living with CLASSIC: when and how to use a KL-ONE-like language, in: J. Sowa (Ed.), Principles of Semantic Networks: Explorations in the Representation of Knowledge, Morgan Kaufmann, San Mateo, CA, 1991, pp. 401–456.

[6] R.J. Brachman, J.C. Schmoltze, An overview of the KL-ONE knowledge representation system, Cognitive Science 9 (2) (1985) 171–216.

[7] H. Buxton, S. Gong, Visual surveillance in a dynamic and uncertain world, Artificial Intelligence 78 (1995) 371–405.

[8] A. Chella, M. Frixione, S. Gaglio, A cognitive architecture for artificial vision, Artificial Intelligence 89 (1997) 73–111.

[9] A. Chella, M. Frixione, S. Gaglio, An architecture for autonomous agents exploiting conceptual representations, Robotics and Autonomous Systems 25 (3–4) (1998) 231–240.

[10] A. Chella, S. Gaglio, G. Sajeva, F. Torterolo, An architecture for autonomous agents integrating symbolic and behavioral processing, in: Proceedings of the Second EUROMICRO Workshop on Advanced Mobile Robots, IEEE Computer Society Press, Los Alamitos, CA, 1997, pp. 45–50.

[11] R.O. Duda, P.E. Hart, Use of the Hough transform to detect lines and curves in pictures, Communications of the ACM 15 (1972) 11–15.

[12] R.E. Fikes, P.E. Hart, N.J. Nilsson, Learning and executing generalized robot plans, Artificial Intelligence 3 (4) (1972) 251–288.

[13] R.E. Fikes, N.J. Nilsson, STRIPS: a new approach to the application of theorem proving to problem solving, Artificial Intelligence 2 (3–4) (1971) 189–208.

[14] K.J. Firby, P.N. Prokopowicz, M.J. Swain, The animate agent architecture, in: D. Kortenkamp, R.P. Bonasso, R. Murphy (Eds.), Artificial Intelligence and Mobile Robots — Case Studies of Successful Robot Systems, AAAI Press/MIT Press, Cambridge, MA, 1998.

[15] P. Frasconi, M. Gori, G. Soda, Local feedback multilayered networks, Neural Computation 4 (1) (1992) 120–130.

[16] P. Gärdenfors, Conceptual Spaces, MIT Press/Bradford Books, Cambridge, MA, 2000.

[17] D. Haehnel, W. Burgard, G. Lakemeyer, GOLEX — bridging the gap between logic (GOLOG) and a real robot, in: Proceedings of the 22nd German Conference on Artificial Intelligence (KI-98), Bremen, Germany, 1998.

[18] G. Herzog, From visual input to verbal output in the visual translator, in: R.K. Srihari (Ed.), Proceedings of the AAAI Fall Symposium on Computational Models for Integrating Language and Vision, Cambridge, MA, 1995.

[19] I. Horswill, Integrating vision and natural language without central models, in: Proceedings of the AAAI Fall Symposium on Embodied Language and Action, 1995.

[20] M. Kass, A. Witkin, D. Terzopoulos, Snakes: active contour models, in: Proceedings of the First International Conference on Computer Vision, Springer, Berlin, 1987, pp. 259–268.

[21] D. Kortenkamp, R.P. Bonasso, R. Murphy (Eds.), Artificial Intelligence and Mobile Robots — Case Studies of Successful Robot Systems, AAAI Press/MIT Press, Cambridge, MA, 1998.

[22] B. Kuipers, Y.-T. Byun, A robot exploration and mapping strategy based on a semantic hierarchy of spatial representations, Robotics and Autonomous Systems 8 (1991) 47–63.

[23] H.J. Levesque, R. Reiter, Y. Lesperance, F. Lin, R. Scherl, GOLOG: a logic programming language for dynamic domains, Journal of Logic Programming 31 (1997) 59–84.

[24] P. Maes, Designing autonomous agents, Robotics and Autonomous Systems 6 (1990) 1–2.

[25] C. Malcolm, T. Smithers, Symbol grounding via a hybrid architecture in an autonomous assembly system, Robotics and Autonomous Systems 6 (1990) 123–144.

[26] D. Marr, L. Vaina, Representation and recognition of the movements of shapes, Proceedings of the Royal Society of London, Series B 214 (1982) 501–524.

[27] D.L. McGuinness, J.R. Wright, An industrial strength description logic-based configurator platform, IEEE Intelligent Systems 13 (4) (1998) 69–77.

[28] N. Muscettola, P.P. Nayak, B. Pell, B.C. Williams, Remote Agent: to boldly go where no AI system has gone before, Artificial Intelligence 103 (1–2) (1998) 5–47.

[29] A.V. Oppenheim, R.W. Schafer, Discrete-Time Signal Processing, Prentice-Hall, Englewood Cliffs, NJ, 1989.

[30] C. Owen, U. Nehmzow, Route learning in mobile robots through self-organization, in: Proceedings of the First EUROMICRO Workshop on Advanced Mobile Robots, IEEE Computer Society Press, Los Alamitos, CA, 1996.

[31] J.S. Penberthy, D.S. Weld, UCPOP: a sound, complete, partial order planner for ADL, in: Proceedings of KR-92, 1992, pp. 103–114.

[32] A.P. Pentland, Perceptual organization and the representation of natural form, Artificial Intelligence 28 (1986) 293–331.

[33] S. Thrun, A. Bücken, W. Burgard, D. Fox, T. Fröhlinghaus, D. Henning, T. Hofmann, M. Krell, T. Schmidt, Map learning and high-speed navigation in RHINO, in: D. Kortenkamp, R.P. Bonasso, R. Murphy (Eds.), Artificial Intelligence and Mobile Robots — Case Studies of Successful Robot Systems, AAAI Press/MIT Press, Cambridge, MA, 1998, pp. 21–52.

[34] S.J. Vestli, N. Tschichold-Gürman, MOPS, a system for mail distribution in office-type buildings, Service Robot: An International Journal 2 (2) (1996).

[35] D. Waltz, Understanding line drawings of scenes with shadows, in: P.H. Winston (Ed.), The Psychology of Computer Vision, McGraw-Hill, New York, 1975.

[36] D. Weld, C. Anderson, D. Smith, Extending Graphplan to handle uncertainty and sensing actions, in: Proceedings of the 16th AAAI, 1998.

[37] A. Zelinsky (Ed.), Field and Service Robotics, Springer, New York, 1998.

A. Chella was born in Florence, Italy, on 4 March 1961. He received his Laurea Degree in electronic engineering in 1988 and his Ph.D. in computer engineering in 1993 from the University of Palermo, Italy. Currently, he is an Associate Professor of robotics at the University of Palermo. His research interests are in the field of autonomous robotics, artificial vision, neural networks, hybrid (symbolic–subsymbolic) systems and knowledge representation. He is a member of IEEE and AAAI.

S. Gaglio was born in Agrigento, Italy, on 11 April 1954. He graduated in electrical engineering from the University of Genoa, Italy, in 1977. In 1978, he received his M.S.E.E. degree from the Georgia Institute of Technology, Atlanta, GA. Since 1986, he has been a Professor of artificial intelligence at the University of Palermo, Italy. His present research activities are in the area of artificial intelligence and robotics. He is a member of IEEE, ACM and AAAI.

R. Pirrone was born in Palermo, Italy, on 2 May 1966. He received his Laurea Degree in electronic engineering in 1997 and his Ph.D. in computer engineering in 1995 from the University of Palermo, Italy. Currently, he is an Assistant Professor at the University of Palermo. His research interests are in the field of autonomous robotics, artificial vision and neural networks. He is a member of IEEE.