
Multi-Task Vision for Mobile Robots∗

D. Hernandez, J. Cabrera, A. Naranjo, A. Dominguez, J. Isern

IUSIANI, ULPGC

[email protected]

Abstract

This paper describes the architecture of an active-vision system conceived to ease the concurrent utilization of a visual sensor by several tasks on a mobile robot. We describe the functional architecture of the system in detail and provide several solutions to the problem of sharing the visual attention when several visual tasks need to be interleaved. The system's design hides this complexity from client processes, which can be designed as if they were exclusive users of the visual system. Some preliminary results obtained in experiments with real mobile robots are also provided.

1 Introduction

The control of the gaze in an active vision system has attracted the attention of researchers under different approaches. Usually it is formulated as a problem of detection of significant points in the image. In this context, several aspects such as saliency, bottom-up vs. top-down control, computational modelling, etc., have been analyzed [1][3]. An alternative view considers the shared-resource nature of the sensor, transforming the scenario into a management/coordination problem where the control of the gaze must be shared among a set of dynamic concurrent tasks.

In close relation with the aforementioned, but from a more engineering point of view, researchers and engineers involved in the development of vision systems have been primarily concerned with the visual capabilities of the system in terms of performance, reliability, knowledge integration, etc. However, very little attention has been devoted to the problem of modular composition of vision capabilities in perception-action systems. In fact, the majority of vision systems are still designed and integrated in a way that is very primitive according to modern software engineering principles, and there are few published results on how to control the visual attention of the system among several tasks that execute concurrently [8]. On the other hand, robotic architectures and frameworks (Player, URBI, ...) are normally more focused on hardware abstraction and software reuse aspects.

∗This work has been partially supported by the Spanish Education Ministry and FEDER (project TIN2004-07087) and the Canary Islands Government (project PI2003/160).

To pose a simple analogy that clarifies these issues: when we execute programs that read or write files kept on the hard disk, we are not normally aware of any contention problem and need not care whether any other process is accessing the same disk at the same time. This is managed by the underlying services, simplifying the writing of programs, which can be coded more easily as if they had exclusive access to the device. Putting it in more general terms, we could consider this feature an "enabling technology", as it eases the development of much more complex programs that build on top of this and other features.

How many of these ideas are currently applied when designing vision systems? In our opinion, not many. Perhaps because designing vision systems has been considered mainly a research endeavor, much more concerned with other, higher-level questions, these low-level issues tend to be simply ignored. If we translate the former example to the context of vision systems, some important drawbacks can easily be identified.

• Vision systems tend to be monolithic developments. If several visual tasks need to execute concurrently, this must be anticipated from the design stage. If some type of resource arbitration is necessary, it is embedded in the code of the related tasks.

• Within this approach it is very difficult to add new visual capabilities that may compete against other tasks for the attention of the system.

• As long as contention situations are dealt with internally and treated by means of ad-hoc solutions, the development of such systems does not produce any reusable technology for coping with these problems.

As a selection of related work, the following three systems can be mentioned. Dickmanns and colleagues [5] have studied the problem of gaze control in the context of their MarVEye Project, where an active multi-camera head is used to drive a car on a highway. In that project, several areas of interest are promoted and ranked by different modules of the control architecture using a measure of information gain. The gaze controller tries to design the gaze trajectory that can provide the highest gain of information to the system within 2 seconds. Several areas of attention can be chained in a gaze trajectory, though in practice this number is never higher than two.

Humanoid robots need to rely on their visual capabilities to perceive the world. Even for simple navigation, a number of tasks (obstacle avoidance, localization, ...) need to operate concurrently to provide the robot with minimal navigation capabilities. Several groups have explored the problem of gaze arbitration in this scenario, both in simulation and with real robots. Seara et al. [6][7] have experimented with a biped robot that used a combination of two tasks to visually avoid obstacles and localize itself. The decision of where to look next was solved in two stages. Firstly, each task selects its next preferred focus of attention as the one providing the largest reduction of uncertainty in the robot localization, or in the location of obstacles. In a second stage, a multiagent decision schema, along with a winner-selection society model, was used to finally decide which task was granted the control of the gaze. Of the several society models that were evaluated, the best results, as judged by the authors, were obtained by a society that tries to minimize the total "collective unhappiness". Here the concept of unhappiness derives from losing the opportunity to reduce uncertainty (in self or obstacle localization).

Sprague et al. [9][8] have designed a simulation environment where a biped robot must walk along a lane while it picks up litter and avoids obstacles, using vision as the only sensor. These capabilities are implemented as visual behaviors, using a reinforcement learning method to discover the optimal gaze control policy for each task. The lateral position of the robot within the lane and the positions of obstacles and litter are modelled by Kalman filters. Every 300 msec, the gaze is given to the task that provides the largest gain in uncertainty reduction.

Similar in spirit to this related research, the motivation of our work is consequently twofold: to contribute to building a vision system that is more consistent from an engineering point of view, and to take a first step towards systems where vision becomes integrated in an action context with a higher semantic and cognitive level (an "intelligent" way of looking).

In the next sections we present the proposed architecture, with its design objectives and main components. Some experimental results obtained on a real robot, along with conclusions and intended future developments, are described in the last two sections.

2 MTVS architecture

Motivated by the previously stated description of the problem, we have designed and implemented MTVS (Multi-Tasking Vision System), a proposed architecture for active-vision systems in multi-tasking environments. MTVS has been designed to deal with the scheduling of concurrent visual tasks in such a way that resource arbitration is hidden from the user.

In more detail, MTVS pursues the following objectives:

• The assignment of the gaze control to a task is based on a simple scheduler model, so that the behavior can be easily interpreted by an external observer.

• The client tasks are integrated in the system individually, with no coordination requirements.

• The set of tasks managed (the activity) can change dynamically.

• The clients should not assume any a priori response-time guarantee, though the system can offer high-priority attention modes.

• The system offers services based on a reduced set of visual primitives, in pre-categorical terms.

2.1 Internal structure

Figure 1 shows an example with two clients and the basic elements making up the vision system architecture: a system server, a task scheduler and the data acquisition subsystem. Basically, the clients connect to the system server to ask for visual services with a given configuration (client A active). In response, the system server launches both a task thread, to deal with internal scheduling issues, and a devoted server thread that will be in charge of the interaction with the external client. The scheduler analyzes the tasks' demands under a given scheduling policy and selects one to receive the gaze control. In combination, a second covert scheduler checks for compatibility between tasks so that images can be shared among them (overlapping FOAs). The data acquisition subsystem processes the different sensor data streams (images, head pose and robot pose) to generate time stamps and pose labels for the served images as accurately as possible.
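To make this control flow concrete, the following minimal sketch mimics one scheduling cycle: an overt step grants the gaze to a single task, and a covert step reuses the captured image for any other task whose requested focus of attention (FOA) overlaps. The class and field names, the urgency-based selection and the angular overlap test are assumptions made for this sketch, not the actual MTVS code.

```python
# Illustrative sketch of one MTVS scheduling cycle (names and the overlap
# test are assumptions, not the actual implementation).
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TaskRequest:
    name: str
    foa: tuple            # requested focus of attention, e.g. (pan, tilt) in degrees
    urgency: float        # how much the task needs the gaze right now

@dataclass
class VisionSystem:
    tasks: List[TaskRequest] = field(default_factory=list)

    def cycle(self) -> Optional[TaskRequest]:
        """One cycle: overt scheduling, then covert image sharing."""
        if not self.tasks:
            return None
        winner = max(self.tasks, key=lambda t: t.urgency)   # overt scheduler
        image = self.acquire(winner.foa)                     # move head, grab frame
        self.deliver(winner, image)
        # Covert scheduler: tasks whose FOA overlaps the winner's can reuse
        # the same image without moving the camera.
        for t in self.tasks:
            if t is not winner and self.overlaps(t.foa, winner.foa):
                self.deliver(t, image)
        return winner

    @staticmethod
    def overlaps(foa_a, foa_b, tol_deg=5.0):
        return all(abs(a - b) <= tol_deg for a, b in zip(foa_a, foa_b))

    def acquire(self, foa):
        ...   # point the neck/eye, capture and time/pose-stamp the image

    def deliver(self, task, image):
        ...   # hand the image to the task's devoted server thread
```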

2.2 Visual services

Clients can connect to the vision system and use it through a number of pre-categorical, low-level services. The MTVS services are built from basic visual capabilities (primitives) that have also been explored by other authors [2]:

WATCH: Capture N images of a 3D point with a given camera configuration.

SCAN: Take N images while the head is moving along a trajectory.

SEARCH: Detect a model pre-categorically in a given image area.

TRACK: Track a model pre-categorically.

NOTIFY: Inform the client about movement, color or other changes.

Except for WATCH, the rest of the primitives can be executed discontinuously, allowing for the implementation of interruptible visual tasks.

The clients also regulate their activity in the system by means of the messages they interchange with their devoted server. Currently, the following messages have been defined for a task: creation, suspension, reconfiguration (modify parameters, change priority, commute primitive on success) and annihilation.
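As a self-contained illustration of how these services and task-control messages could be represented on the client side, the sketch below encodes the primitives and messages as enumerations; only the primitive and message names come from the text, while the request structure and its fields are assumptions.

```python
# Sketch of a possible representation of the MTVS primitives and task messages.
from dataclasses import dataclass
from enum import Enum, auto

class Primitive(Enum):
    WATCH = auto()    # capture N images of a 3D point with a camera configuration
    SCAN = auto()     # take N images while the head moves along a trajectory
    SEARCH = auto()   # detect a model pre-categorically in an image area
    TRACK = auto()    # track a model pre-categorically
    NOTIFY = auto()   # inform the client about movement, color or other changes

class Message(Enum):
    CREATE = auto()
    SUSPEND = auto()
    RECONFIGURE = auto()   # modify parameters, change priority, commute primitive
    ANNIHILATE = auto()

@dataclass
class ServiceRequest:
    primitive: Primitive
    params: dict
    priority: int = 1
    interruptible: bool = True   # every primitive but WATCH may run discontinuously

# Example: a client asking the system to track a color model at priority 2.
req = ServiceRequest(Primitive.TRACK, {"model": "yellow_blob"}, priority=2)
```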

2.3 Scheduling policies

Several scheduling policies have been implemented and studied inside MTVS. This analysis has considered two main groups of schedulers: time-based and urgency-based schedulers.

Time-based schedulers

Three types of time-based schedulers have been studied: Round-Robin (RR), Earliest Deadline First (EDF) and EDF with priorities (EDFP). The prioritized RR algorithm quickly revealed itself as useless in a dynamic and contextual action schema. First, it makes no sense to assign similar time slices to different tasks, and second, the time spent on saccadic movements, especially when a slow neck is involved, is wasted.

Figure 1: Control Architecture: example with two clients

The EDF algorithm yielded slightly better performance than RR, but was difficult to generalize, as visual tasks are not suitable for being modelled as periodic tasks. The best results of this group were obtained by the EDFP algorithm, combining critical tasks (strict deadline) with non-critical tasks. Each time a task is considered for execution and not selected, its priority is incremented [4].
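A minimal sketch of this EDFP idea, assuming per-task absolute deadlines and an integer priority that is aged when a task is passed over, might look as follows (the field names and the tie-breaking rule are assumptions):

```python
# Sketch of EDF-with-priorities selection plus priority aging.
import time
from dataclasses import dataclass

@dataclass
class VisualTask:
    name: str
    deadline: float        # absolute time by which the task wants the gaze
    priority: int = 0
    critical: bool = False # critical tasks have strict deadlines

def edfp_select(tasks):
    if not tasks:
        return None
    now = time.time()
    # Critical tasks with an imminent deadline win outright.
    urgent = [t for t in tasks if t.critical and t.deadline <= now]
    if urgent:
        winner = min(urgent, key=lambda t: t.deadline)
    else:
        # Otherwise pick by priority, breaking ties by earliest deadline.
        winner = max(tasks, key=lambda t: (t.priority, -t.deadline))
    # Aging: every task considered but not chosen gains priority.
    for t in tasks:
        if t is not winner:
            t.priority += 1
    return winner
```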

Urgency-based schedulers

The concept of urgency correlates well with a criterion of minimizing the loss incurred when a task does not receive the control of the gaze within a time window. This measure can also be related to uncertainty in many visual tasks.

Two schedulers have been studied in this group: lottery [9] and max-urgency. The lottery scheduler is based on a randomized scheme where the probability of a task being selected to obtain the gaze control is directly proportional to its urgency. Every task has the possibility of gaining the control of the gaze, but the random unpredictability can sometimes produce undesirable effects.

The max-urgency scheduler substitutes the weighted voting with a direct selection of the task with the highest urgency value. This scheme has produced acceptable results, provided that the urgency of a task is reduced significantly after it gains the control of the gaze (similar to an inhibition-of-return mechanism). Experiments comparing the scheduling policies have not been included due to space limitations.
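The two urgency-based selection rules can be sketched as follows; the inhibition factor and the example values are assumptions introduced only to show the qualitative behavior described above.

```python
# Sketch of the lottery and max-urgency schedulers.
import random

def lottery_select(urgencies):
    """Pick a task with probability proportional to its urgency."""
    names = list(urgencies)
    weights = [urgencies[n] for n in names]
    return random.choices(names, weights=weights, k=1)[0]

def max_urgency_select(urgencies, inhibition=0.2):
    """Pick the most urgent task, then damp its urgency so it does not
    monopolize the gaze (an inhibition-of-return-like mechanism)."""
    winner = max(urgencies, key=urgencies.get)
    urgencies[winner] *= inhibition
    return winner

# Example: following is currently more urgent than obstacle avoidance.
tasks = {"follow": 0.8, "avoid": 0.5}
print(max_urgency_select(tasks), tasks)  # "follow" wins and its urgency drops
```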

2.4 Implementation

The vision system can be operated in a number of configurations. In its simplest configuration the system may comprise a simple pan/tilt unit and a camera, or a motorized camera, or it can integrate both systems, as illustrated in the experiments described later. In this last configuration, a relatively slow pan/tilt unit, acting as the neck of the system, carries a motorized camera (Sony EVI-G21) equipped with a fast pan/tilt system, which can be considered the eye of the system. This mechanical system in turn can be used in isolation or installed on a mobile robot.

The vision system runs under the Linux OS as a multithreaded process. Clients run as independent processes. They may share the same host, in which case the communication primitives are implemented using shared memory, or run on different hosts connected by a local network. The interface to the image acquisition hardware is based on the Video4Linux2 services, so that a large number of cameras and/or digitizers can be supported. For the image processing required by the primitives of the system or at the clients (color/shape detection), the OpenCV library is used.

Figure 2: Robot Neck-Eye hardware
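As an example of the kind of color detection involved on the client side, the following sketch (assuming OpenCV 4 with its Python bindings) finds yellow color blobs such as the cylinders used in the experiments of Section 3; the HSV thresholds and the minimum-area filter are assumptions.

```python
# Illustrative color-blob detection; thresholds are assumed, not from the paper.
import cv2
import numpy as np

def find_yellow_blobs(bgr_image, min_area=200):
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array([20, 100, 100]), np.array([35, 255, 255]))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    blobs = []
    for c in contours:
        if cv2.contourArea(c) >= min_area:
            blobs.append(cv2.boundingRect(c))   # image-plane bounding boxes (x, y, w, h)
    return blobs
```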

3 Mobile robot experiments

A set of experiments was carried out to analyze the behavior of MTVS in a real robotic application. The basic experimental setup consists of two ActivMedia Pioneer robots, one with the basic configuration and the other mounting an active vision system formed by a Directed Perception PTU (neck) and a motorized Sony EVI-G21 camera (eye).

Two main tasks were combined throughout the different experiments: target following and obstacle avoidance. The target following task commands the active vision robot (pursuer) to detect and follow a special square target mounted on the other robot (leader), trying to keep a predefined constant distance between them. The obstacle avoidance task looks for colored cylinders on the floor, estimating their 2D position as exactly as possible. Kalman filtering is used to model both the target and obstacle positions.
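As a rough illustration of this modelling, the sketch below implements a Kalman update for a static 2D object position observed through noisy vision measurements; the static state model and the noise values are simplifying assumptions, since the paper does not detail its exact filter.

```python
# Minimal Kalman update for a static 2D position (simplified assumption).
import numpy as np

class Position2DKalman:
    def __init__(self, x0, p0=1e6, meas_var=0.02):
        self.x = np.asarray(x0, dtype=float)   # estimated (x, y) in metres
        self.P = np.eye(2) * p0                # estimate covariance
        self.R = np.eye(2) * meas_var          # measurement noise covariance

    def update(self, z):
        """Fuse one position measurement z = (x, y); H = I, no process model."""
        S = self.P + self.R                    # innovation covariance
        K = self.P @ np.linalg.inv(S)          # Kalman gain
        self.x = self.x + K @ (np.asarray(z, dtype=float) - self.x)
        self.P = (np.eye(2) - K) @ self.P
        return self.x, self.P

# Each new detection of an obstacle shrinks its uncertainty ellipse (from P).
kf = Position2DKalman(x0=(2.0, 0.5))
kf.update((2.1, 0.45))
```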

3.1 One-task experiments

As a reference for the maximum expected performance of each task, some experiments were designed involving only one task.

Experiment 1: Follow Robot only

In this experiment, the leader robot is commanded to move forward at a constant speed of 200 mm/sec, while the pursuer must try to keep a constant separation of 2 meters. Several tests have been conducted along the main corridor of our lab, following a 15-meter straight-line path. The pursuer was able to stabilize the reference distance with a maximum error of around 150 mm.

Experiment 2: Obstacle avoidance only

Now the active vision robot is commanded to explore the environment looking for objects (yellow cylinders), trying to reduce their position uncertainty below a predefined threshold. The robot moves in a straight line inside a corridor formed by 8 cylinders equally distributed in a zigzag pattern along the path. Figure 3 illustrates the robot path and the different detections for each localized object, including their first (larger) and minimum uncertainty ellipses. The results show how the robot was able to localize all the objects, with minimum uncertainty ellipses ranging from 100 to 200 mm in diameter.

Figure 3: Obstacle avoidance-only experiment

3.2 Multiple-task experiments

The multiple-task experiments consider a scenario in which each task computes its desired camera configuration and urgency and asks the MTVS scheduler for the gaze control. The scheduler uses this information to select where to look next and how to distribute images. The obstacle avoidance task is extended to classify special configurations of objects as "doors" (two objects aligned perpendicularly to the robot's initial orientation, with a predefined separation).

The urgency of the following task is computed as a function of the distance error, the robot velocity and the time. This urgency increases as the distance between the robots differs from the reference, as the velocity grows, and as the elapsed time since the last image was received becomes larger.

The urgency of the obstacle avoidance task is computed separately for three possible foci of attention: front (the urgency increases when the robot moves towards visually unexplored areas), worst estimated object (the urgency increases as the position of a previously detected object is not confirmed with new images), and closest door (the urgency increases with narrow doors).
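The following sketch captures the qualitative behavior of these urgency measures; the functional forms and weights are assumptions, since the text only states which variables each urgency grows with.

```python
# Illustrative urgency functions (weights and forms are assumptions).
def follow_urgency(dist_error_m, leader_speed_mps, image_age_s,
                   w_err=1.0, w_speed=0.5, w_time=0.2):
    """Grows with distance error, leader speed and the age of the last image."""
    return w_err * abs(dist_error_m) + w_speed * leader_speed_mps + w_time * image_age_s

def worst_object_urgency(time_unconfirmed_s, w_obj=0.3):
    """Grows while a previously detected object is not re-confirmed."""
    return w_obj * time_unconfirmed_s

def door_urgency(door_width_m, w_door=0.5, min_width_m=0.3):
    """Grows as the closest door gets narrower."""
    return w_door / max(door_width_m, min_width_m)
```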

The first two simple multiple-task experiments try to illustrate the image-sharing capability of MTVS. In each case one task has higher priority, so in a context of exclusive camera sensor access the secondary task shows very low performance. When the sharing of images is allowed (by comparing FOAs), however, the overall performance can be improved. In Experiment 5 a more complex scenario including doors is analyzed.

Experiment 3: Obstacle avoidance and robot following competing for the gaze (following priority)

In this experiment, the control of the gaze is only granted to the avoidance task when both the leader speed and the distance error are low. Typically, the following task performance is not affected significantly, but the avoidance task degrades, yielding few object localizations with poor precision. As an example, the upper plot of Figure 4 presents the results of a non-sharing run where only half of the potential objects (all on the right side, due to the position of the closest obstacle) have been detected, with large uncertainty ellipses. As the lower plot of the figure shows, the sharing of images permits a much better behavior of the obstacle avoidance task.

Figure 4: Follow (priority) and avoidance experiment

Experiment 4: Obstacle avoidance and robot following competing for the gaze (avoidance priority)

In this experiment the control of the gaze is only granted to the following task when the closest objects have been localized with precision. When sharing is not allowed, the avoidance task keeps an acceptable performance while the following task fails, as the leader robot goes out of the visual field with no reaction. When sharing is permitted, the following task behavior improves, but only initially. Soon the camera must be pointed laterally to reduce the localization error of the objects, and the captured images are no longer valid for the following task, which degrades rapidly.

Experiment 5: Localize doors and robot following competing for the gaze (narrow and wide doors)

The configuration of objects used for this experiment consists of a set of four "doors": two of the narrow type (600 mm wide) and two of the wide type (1500 mm wide). All doors are located in a straight line in front of the robot, the first one (wide) three meters ahead and the rest every 1.5 meters, alternating narrow and wide types. The leader robot is commanded to move at constant speed, crossing the doors centered.

Figure 5 illustrates how the camera is pointed to both sides when crossing narrow doors. As a consequence of this behavior, the pursuer robot slows down when approaching a narrow door until the positions of the door extremes have been estimated with the required precision (compare the final error ellipses for narrow and wide doors). After traversing the door, the robot accelerates to recover the desired following distance from the leader.

Figure 5: Narrow and wide doors experiment

4 Conclusions and future work

In this paper we propose an open architecture for the integration of concurrent visual tasks. The clients' requests are articulated on the basis of a reduced set of services or visual primitives. All the low-level control/coordination aspects are hidden from the clients, simplifying the programming and allowing for an open and dynamic composition of visual activity from much simpler visual capabilities.

Regarding the gaze control assignment problem, several schedulers have been implemented. The best results are obtained by a contextual scheme governed by urgencies, taking the interaction of the agent with its environment as the organizing principle instead of temporal frequencies. Usually, a correspondence between urgency and uncertainty about a relevant task element can be established.

The system described in this paper is just a prototype, mainly a proof of concept, and it can be improved in many aspects. The following are just a few of them. We plan to improve the adaptability of the system to different active vision heads (hardware abstraction). A first step is to extend the system to a binocular head, where new problems such as eye coordination, vergence and accommodation must be tackled. Another issue is the need for an acceptance test for new service requests to avoid overloading the system. Besides, the introduction of homeostasis mechanisms could help make the system more robust, and the inclusion of bottom-up directed attention (independent movement, e.g.) would allow for new saliency-based primitives.

References

[1] Arkin, R., ed.: Behavior-Based Robotics. MIT Press (1998)

[2] Christensen, H., Granum, E.: Control of perception. In Crowley, J., Christensen, H., eds.: Vision as Process. Springer-Verlag (1995)

[3] Itti, L.: Models of bottom-up attention and saliency. In Itti, L., Rees, G., Tsotsos, J.K., eds.: Neurobiology of Attention. Elsevier Academic Press (2005)

[4] Kushleyeva, Y., Salvucci, D.D., Lee, F.J.: Deciding when to switch tasks in time-critical multitasking. Cognitive Systems Research (2005)

[5] Pellkofer, M., Lützeler, M., Dickmanns, E.: Interaction of perception and gaze control in autonomous vehicles. In: SPIE: Intelligent Robots and Computer Vision XX: Algorithms, Techniques and Active Vision, Newton, USA (2001)

[6] F. Seara, J., Lorch, O., Schmidt, G.: Gaze Control for Goal-Oriented Humanoid Walking. In: Proceedings of the IEEE/RAS International Conference on Humanoid Robots (Humanoids), Tokyo, Japan (November 2001) 187-195

[7] F. Seara, J., Strobl, K.H., Martin, E., Schmidt, G.: Task-oriented and Situation-dependent Gaze Control for Vision Guided Autonomous Walking. In: Proceedings of the IEEE/RAS International Conference on Humanoid Robots (Humanoids), Munich and Karlsruhe, Germany (October 2003)

[8] Sprague, N., Ballard, D.: Eye movements for reward maximization. In: Advances in Neural Information Processing Systems (Vol. 16). MIT Press (2003)

[9] Sprague, N., Ballard, D., Robinson, A.: Modeling attention with embodied visual behaviors. ACM Transactions on Applied Perception (2005)