

International Journal of Computer Vision 39(2), 81–96, 2000. © 2000 Kluwer Academic Publishers. Manufactured in The Netherlands.

Control of a Camera for Active Vision: Foveal Vision, Smooth Tracking and Saccade

EHUD RIVLIN
Department of Computer Science, Technion, Israel Institute of Technology, Haifa 32000, Israel
[email protected]

HÉCTOR ROTSTEIN
Rafael—Armament Development Authority and Department of Electrical Engineering, Technion, Israel Institute of Technology, Haifa 32000, Israel
[email protected]

Abstract. Several characteristics of the human oculomotor system have been suggested to be useful also for active vision mechanisms. Among others, foveal vision and a tracking scheme based on two different modes, called smooth pursuit and saccade, have often been postulated or implemented. The purpose of this paper is to formulate a setup in which the benefit of implementing these schemes can be evaluated in a systematic manner, based on control considerations but incorporating image processing constraints. First, the advantage of using foveal vision is evaluated by computing the size of the foveal window which will allow tracking of the largest possible class of signals. By using linear optimal control theory, this problem can be formulated as a one-variable maximization.

Second, foveal vision leads naturally to smooth pursuit, defined as the performance that can be achieved by the controller resulting in the optimal size of the foveal window. This controller is relatively simple (i.e., linear, time-invariant), as is to be expected for this control loop.

Finally, when smooth pursuit fails, a corrective action must be performed to re-center the target on the fovea. Recent results in linear optimal control provide the necessary tools for addressing this challenging problem in a systematic manner.

Keywords: active vision, smooth pursuit, saccade

1. Introduction

“Active Vision” refers to the ability to move an image acquisition system in a controlled manner, in order to facilitate or allow certain machine vision tasks (Bajcsy, 1998; Swain and Stricker, 1993). Active vision systems usually consist of one or more cameras mounted in such a way that their orientation and imaging parameters (focus, zoom, aperture) can be adjusted in real time. A simple active vision mechanism includes a single camera mounted on one or more mechanical degrees of freedom. More complicated stereo vision mechanisms or “robot heads” are constructed using two moving cameras mounted on a static or moving platform; overall, the resulting mechanism usually attempts to match the kinematics and dynamics of the human oculomotor system.

The first active vision systems were constructed in the late eighties (see, e.g., Krotkov and Bajcsy, 1993; Ferrier and Clark, 1993); since then, sustained advances in the hardware for active vision have given rise to high performance systems, in some respects comparable with the human oculomotor system (Fiala et al., 1994). This progress has spurred the need both for highly efficient dedicated image processing tools and for control systems capable of exploiting the potential characteristics of the mechanisms.


1.1. Human Visual System

Following the “natural” model provided by the human visual system, gaze control in robot heads is usually organized as a number of low level control loops; these loops should interact and, hopefully, cooperate to achieve the desired performance (Brown, 1990a). Two attributes of the human visual system are of interest in this paper: non-uniform resolution and eye movement. The former refers to the fact that the human eye has a relatively small region with a high concentration of visual sensors in the center, called the “fovea,” and a more or less exponential decay of the concentration of receptors towards the periphery. This seems to be the result of a tradeoff between high resolution vision and the necessity to operate, and survive, in real time as dictated by neurocomputing constraints.

The fact that the fovea is relatively small implies that it should be reoriented in order to span the desired region of visual acuity. Hence the need for eye movement which, interestingly enough, is not evident from everyday experience. Eye movement has been the subject of intensive studies by researchers from different fields, and a large amount of sometimes conflicting data is available in the literature. The interested reader is referred to Robinson (1968) and Carpenter (1988) for an account of the activity from a control systems perspective.

Eye movement is a highly complex task in which two main mechanisms can be recognized. The first mechanism is called saccade, and consists of a rapid shift in the eye position. Saccade is characterized by a fast velocity and a relatively large time delay, presumably caused by the processing time of the retinal information. For some time it was thought that vision is impaired during saccade due to the so-called “saccadic suppression” (Robinson, 1968). Further studies have shown that a certain amount of visual processing occurs during the course of a saccade (Carpenter, 1988), although they affect ongoing computations in an indirect manner. Specific quantitative information about saccade in the human oculomotor system may be found in Fiala et al. (1994), which also contains pointers to the literature on the subject.

The second main mechanism of the oculomotor system is called smooth pursuit (SP). As suggested by the name, smooth pursuit is characterized by smooth and slow eye movements, involving a relatively short time delay. The main objective of SP is to keep the target within the fovea, compensating for relatively slow target movements. It is interesting to note that, although saccades seem to be driven by positional error, there is no consensus among researchers as to whether this error, the velocity of the image, or some combination of the two drives smooth pursuit.

1.2. Modeling Assumptions and Previous Work

High performance active vision systems require control strategies adequate for dealing with the following control problems:

– Sampled measurements. State-of-the-art commercial cameras acquire images at a rate of about 30 frames/sec, which is to say that only a sampled version of the position of the target is available for tracking. Notice, though, that performance should be achieved in “continuous” time and not at the sampling instants only.

– Large time-delays. Extracting control-relevant information, e.g., the position of the target, out of the data delivered by a camera can be an expensive and time consuming computational task. A delay then appears between the instant a measurement is taken and the instant the information on the position becomes available. This time-delay is made larger once other tasks, like communication and control-law generation, are taken into account. As is well known, time-delays may severely limit the performance of a control system.

– Tight specifications. Active vision systems should achieve the performance expected from current mechanical heads and computer hardware. The control strategy should get the most out of the system, given the dynamic performance of the active vision mechanism and the hardware available for computation.

– Interaction/cooperation between the single-loop controllers. Following the premise that control loops should be designed one at a time, care should be taken to guarantee that the loops “cooperate” towards the common objective.

The first issue mentioned above deserves some more discussion, since it can be confused with the second one. Basic control systems are usually classified into two groups: continuous and discrete time. Most physical systems, and a robot head is an example, evolve in continuous time. As opposed to this, a digital computer deals with events happening at discrete time intervals. The connection between these two worlds is through sampler and hold (i.e., A/D and D/A) devices.


In an active vision system, sampling is done through the image acquisition device, while the hold is implemented by the computer when sending piecewise constant commands to the motors of the mechanism. It is clear that the computer/controller “sees” the system as a discrete-time one: information is obtained and control actions are taken at discrete times only. On the other hand, control signals to the motors are in continuous time (although constant over the sampling intervals) and so is the movement of the target that one wants to track.
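The hold operation described above can be sketched in a few lines. This is a minimal illustration of a zero-order hold, assuming the controller outputs are stored in a list; the function name `zero_order_hold` and its arguments are hypothetical and do not come from the paper.

```python
def zero_order_hold(commands, period, t):
    """Return the piecewise-constant command applied at continuous time t.

    commands[k] is held over the interval [k*period, (k+1)*period).
    After the last update, the last command keeps being applied.
    """
    if t < 0:
        raise ValueError("t must be non-negative")
    k = int(t // period)             # index of the most recent update
    k = min(k, len(commands) - 1)    # keep holding the final command
    return commands[k]


# Hypothetical controller outputs, updated every 0.5 s.
u = [0.0, 1.0, -0.5]
motor_input = [zero_order_hold(u, 0.5, t) for t in (0.25, 0.5, 1.2)]
```

The motor therefore receives a continuous-time signal even though the controller only acts at discrete instants, which is why performance has to be assessed between samples as well.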

When the sampling rate is fast (as compared with the system dynamics) it is justified to model the control system as either continuous or discrete-time. Indeed, the discrete-time controller behaves almost like a continuous-time one, while the behavior of the system over the inter-sample interval can be ignored. When the sampling rate is low, this modeling becomes dubious: performance should be achieved in continuous time, but control action can only be taken at discrete intervals, due to measurement and computational constraints. Consequently, if a slow-sampled control system is designed using the discrete or continuous-time paradigms, extensive simulations or experiments should be carried out to validate the design.

Let us review next the previous work on active vision control, and check how it fits the above modeling premises. Ferrier and Clark (Ferrier and Clark, 1993; Clark and Ferrier, 1993) studied the control of the Harvard Binocular Head. Their control is based on the model of the oculomotor control described by Robinson, with separate subsystems for smooth pursuit and saccadic motion. The smooth pursuit loop uses PI control plus some delay in the loop (modeled as a continuous-time one) and is inspired by the Smith predictor. Saccadic movements are controlled by a sampled-data loop. An alternative approach was pursued by Pahlavan and Eklundh (1993) for the Royal Institute of Technology Head. In their control scheme, smooth pursuit and saccade are two independent loops, designed using linear prediction for the saccade. However, since both loops have approximately the same bandwidth, it appears to be difficult to distinguish between them in practical operation.

A more detailed approach was proposed by Brown, Coombs and co-workers (Brown et al., 1993; Coombs and Brown, 1991; Coombs and Brown, 1993), working on the control of the Rochester Robot Head. These researchers introduce a Smith Predictor and a Kalman Filter in an attempt to compensate for time delays in the loop, following earlier suggestions by Brown (1990a, 1990b). Notice that the delay is modeled as a continuous-time one. The general strategy is to use PID controllers coupled with predictors for smooth pursuit, while switching to an open-loop bang-bang controller for saccadic movements. Comparing with the previous discussion, these researchers attempt to circumvent the sampled-data nature of gaze control by including predictors that should “fill in” the inter-sample behavior.

Other works have been reported by Milios et al. (1993), Christensen (1993) and Fiala et al. (1994), and are all based on considering pursuit and saccade as separate mechanisms, with PIDs and possibly some predictors and delays on the feedback loops taking care of smooth pursuit, and open-loop controllers based on linear predictions for the saccadic movement. The tuning of the controllers is done using classical control theory, followed by on-line adjustments based on the outcome of experiments. Switching between controllers is triggered by a positional error larger than some threshold. Pursuit controllers are reset after each saccade. Some applications use Kalman filters in an attempt to predict the behavior by means of an observer (Christensen, 1995).

Murray et al. (1993) proposed a rather different scheme. First, they introduced non-uniform resolution by dividing the image into a coarse region and a foveal window at the center of the image. They also proposed an alternative scheme for the gaze control loop, based on a supervisor which should take care of deciding whether to pursue or saccade. In additional work (Murray et al., 1995), switching from saccade to pursuit was also considered; in particular, the authors suggested that in order for the switching to succeed, both the position and the velocity of the target and the camera should be matched. This observation follows as a special case of the approach considered in the present paper.

It follows from this review that the sampled-data nature of active vision systems has been somewhat understated in the literature. The image acquisition process has been modeled as a pure time delay, which it is not: it should more accurately be modeled as a sampler followed by a discrete-time delay or shift. Often the overall system has been considered as discrete-time, which again it is not: although the control action is taken at discrete intervals, specifications should be achieved in continuous time. As opposed to these approaches, in the present paper we will consider a model that takes the sampled-data nature of the system explicitly into account.


1.3. Discussion and Main Problems

The discussion in the previous paragraph suggests that some active vision issues remain unsolved in spite of the activity in the area. In particular, this paper attempts to clarify the following topics:

1. Smooth pursuit and saccade. In spite of its popularity and common sense appeal, to the best of our knowledge there is no proof available for the need of two separate mechanisms for tracking. A naive observer would argue for tracking as “fast” as possible all the time, instead of slow pursuit movements possibly followed by a sudden burst of activity. The basic question is then to establish under what conditions the two mechanisms are desirable for improved performance.

2. Uniform vs. non-uniform sensing. Foveal vision has been mainly motivated by the needs of image processing in real time. The question we would like to address is the following: Is there a need for foveal vision based on control considerations? And if so, what is the optimal size of the foveal window? To the best of our knowledge, no results in this area have been reported so far.

3. Interrelationship between pursuit and saccade. If two control modes are desirable (and we anticipate here that this may well be the case), then control laws must be computed for each mode, together with a mechanism for switching between them. Switching is clearly critical for a satisfactory operation. This is less so if the target of the saccade is stationary, which is one of the assumptions of many of the previous works on this subject, but becomes critical for a moving target (or platform).

The purpose of this paper is twofold. First, the modeling of an active vision system is reviewed along the lines outlined in the previous section. Second, answers are provided for the three problems mentioned above.

The paper is organized as follows. In the next section the model for active vision control is introduced, including some simplifying assumptions. In Section 3, the need for non-uniform resolution is formulated as an optimal control problem, which will eventually lead to the conclusion that the necessity of active vision depends both on the system dynamics and on the optimality criteria. A flexible framework is provided which allows the computation of the optimal size of the fovea whenever non-uniform resolution is found to be convenient. As a natural corollary to this problem, smooth pursuit will be defined in rather precise terms, and the necessity of saccadic movements in order to achieve good overall performance will be established. The necessity of two control laws has often been postulated in the active vision literature and also used in actual implementations, but to the best of our knowledge the material in Section 3 is the first systematic presentation on the subject. This leads to Section 4, which deals with the problem of finding a suitable paradigm for saccadic control, and constructing a switching mechanism between the two control modes. Section 5 contains some simulation and experimental results. The good performance of the smooth pursuit control law designed following the theory in Section 3 is experimentally verified by comparing it with a controller designed using a classical control approach. The performance of saccadic control was evaluated through simulations. Finally, Section 6 presents some conclusions.

2. Setup and Modeling Considerations

For the purpose of addressing the basic problems of foveal vision and tracking mechanism, it suffices to consider a configuration with a single camera with only one degree of freedom. For convenience, additional simplifying assumptions are introduced, which yield the simplest problem with all the essential features of active vision control built into it. Remarks on how to remove these assumptions will be outlined. Consider the setup illustrated in Fig. 1. Here the camera is mounted on a motor and has one degree of freedom, i.e., the angle θ that the optical axis forms with the horizontal. This angle can be modified by using the control signal u commanding the motor.

Figure 1. A simplified setup.


Figure 2. Data-acquisition block diagram.

The image of the object is acquired by the camera connected to a vision card, which entails a sampling process, at a typical rate of at most 30 Hz, and also a spatial discretization which will be neglected in what follows. Each image should be processed in order to extract information about the position of the object, e.g., the angle φ that the centroid of the object forms with the horizon, as measured from the axis of rotation. The time consumed by this processing will depend on the amount of data present, i.e., the “size” of the image, and on the sophistication of the image processing algorithm. In the model illustrated in Fig. 2, the image processing stage is lumped together with other effects, like control law computation and communication delays, in a pure time delay τ proportional to the size of the image (plus some overhead). It is worth stressing that, as opposed to, e.g., Sharkey and Murray (1996), this time delay is of discrete nature.

If the delay τ is larger than the sampling period T, the sequence of images has to be down-sampled by a corresponding factor q as in Fig. 2, unless parallel processing units are employed. In the simplest possible case, q will be equal to the smallest integer larger than τ/T, but it can be made smaller subject to hardware availability. For ease of exposition, the former case is considered in the sequel. Assuming that the hardware and the image processing algorithms are given, the sub-sampling rate will only be a function of the size of the image x (see below for details); the notation q_x will be used to stress this fact.
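As a small worked example of the rule just stated, the sketch below computes the down-sampling factor from a delay and a frame period. The linear delay model and its coefficients (`seconds_per_unit`, `overhead`) are assumptions made only for illustration; the text says no more than that τ is proportional to the image size plus some overhead.

```python
import math


def processing_delay(x, seconds_per_unit, overhead):
    """Hypothetical delay model: proportional to the image size x, plus overhead."""
    return seconds_per_unit * x + overhead


def downsampling_factor(tau, T):
    """Smallest integer strictly larger than tau / T (the simplest case in the text).

    Read literally, an exact ratio tau / T = n gives q = n + 1.
    """
    return math.floor(tau / T) + 1


T = 1.0 / 30.0                                   # frame period of a 30 Hz camera
tau = processing_delay(100, 0.0004, 0.01)        # illustrative numbers, about 0.05 s
q_x = downsampling_factor(tau, T)                # tau / T is about 1.5, so q_x = 2
```

With these illustrative numbers the controller would see every second frame, i.e., an effective sampling period of q_x T.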

Figure 3. Closed-loop system.

The feedback block-diagram, including the motor and the load, is shown in Fig. 3. The block S_T is a sampler with sampling period T, which is followed by a down-sampler with down-sampling rate q; the continuous-time signal is thus effectively sampled with sampling period qT. The block H_qT represents a hold function (typically a zero-order hold) which translates the discrete-time output of the controller into a continuous-time signal. Several comments are in order:

– The system dynamics are all lumped into the plant P_in, and a feedback controller C_in is included in order to obtain good position regulation and desensitization of the electro-mechanical system from plant variations, possible neglected nonlinearities, platform motion and disturbances. Standard hardware may be used to implement this “inner” loop, which will usually work at much larger sampling rates. The transfer function of this closed loop is called P, which is assumed to be known. Notice that if the active vision system is mounted on a static platform, then the above stabilization problem is rather straightforward. For a moving platform, however, this problem can become quite complicated; stabilization should then be achieved with respect to “inertial” space.

– While the actual image and the angle θ evolve in continuous time, the acquired image and the input to the controller are discrete-time signals with different sampling rates whenever q > 1; the resulting closed loop is then multi-rate. It is worth stressing that neither the continuous-time error e(t) = φ(t) − θ(t) nor the one resulting from the “fast” sampling, e(kT) = φ(kT) − θ(kT), can be measured (the latter if q > 1), and only the samples ε(k) = φ(kqT) − θ(kqT) are available for control. It is the belief of the authors that this observation has been somewhat neglected in the previous literature.

– Finally, the figure is an idealized setup, since a more realistic configuration should take into account noisy measurements, plant/model mismatch and possible disturbances. However, most of these problems can be dealt with by using classical control techniques when designing the stabilization controller C_in described above.

In this paper, continuous-time signals will be denoted as, e.g., w(t), θ(t), and sometimes the dependence on t will be dropped when no confusion can arise. Discrete-time signals will be denoted by, e.g., ε(k). When corresponding to sampling of a continuous-time signal, the equality φ(k) = φ(kT) holds, where the sampling period T should be clear from the context.
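The three error signals distinguished in the second comment above, e(t), e(kT) and ε(k), can be made concrete with a short sketch. The target and camera trajectories used here are arbitrary placeholders chosen only so that the code runs; they are not taken from the paper.

```python
import math


def sample(signal, period, n):
    """Return [signal(0), signal(period), ..., signal((n - 1) * period)]."""
    return [signal(k * period) for k in range(n)]


T = 1.0 / 30.0   # camera frame period (about 30 frames/sec, as in the text)
q = 3            # hypothetical down-sampling factor


def phi(t):
    """Hypothetical target angle."""
    return 0.5 * t


def theta(t):
    """Hypothetical camera angle."""
    return 0.5 * t - 0.1 * math.cos(2.0 * t)


def e(t):
    """Continuous-time error e(t) = phi(t) - theta(t)."""
    return phi(t) - theta(t)


fast = sample(e, T, 12)   # e(kT): not measurable by the controller when q > 1
slow = fast[::q]          # eps(k) = e(k q T): the only samples available for control
```

Out of twelve frames, only four error values reach the controller in this sketch; everything the loop does between them is open loop from the controller's point of view.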


3. Is Non-Uniform Resolution Convenient?

As opposed to the human visual system, most cameras available commercially have uniform resolution, raising the question of whether it is beneficial to implement a fovea in an active vision system. Although foveated vision has been implicitly or explicitly (e.g., in Murray et al., 1993) implemented before, the objective of this section is to justify non-uniform resolution in precise terms.

A foveal window should be easy to implement in hardware or a combination of hardware and software, by keeping high resolution on a specified region of the image and reducing the resolution, e.g., by filtering and down-sampling, on the rest. Restricting high resolution to a small region reduces computational times, therefore leading to faster sampling rates and smaller time-delays, and suggesting the potential for better performance. At the same time, reducing the size of the window in which a target should remain makes the tracking specification tighter, and more so the smaller the region. This describes the basic tradeoff involved in deciding the potential benefits of implementing multi-resolution sensing. The purpose of this section is to formulate this tradeoff in a systematic manner, which will allow the computation of the size of the fovea in some optimal sense.

Consider the feedback configuration illustrated in Fig. 4. A reference model M has been included which generates the position φ(t) of the object as a function of the external signal w(t). Inclusion of M does not necessarily imply an a priori knowledge of the behavior of the target¹ since, for instance, M could be a single or double integrator, which corresponds to assuming that φ is generated by the velocity or acceleration of the target, which should then be characterized in some useful sense. It is worth stressing that this does not imply that w(t) is available for feedback: the control system is driven by the positional error alone, since this is the only quantity available for measurement. The signal w(t) is introduced as an artifice for designing the controller C.

Figure 4. The feedback configuration with reference model.

When w(t) denotes the acceleration of the target, a feasible controller should drive e(t) asymptotically to 0 whenever w(t) ≡ 0, i.e., zero asymptotic error for constant velocity. This is a desirable characteristic also observed in the human visual system. As discussed in Yamamoto (1994), this cannot be achieved by using the discrete-time controller C alone, but it is possible to connect between the output of the controller and the input of the plant a pure integrator or, in general, a filter F(s) as shown in the figure. The same observation can be understood in a perhaps more intuitive manner once motion is allowed to the platform on which the active vision system is mounted. Then, the stabilization objective should be formulated with respect to inertial space, in which the velocity of the camera axis rather than its actual position is measured and controlled.

The signal w is assumed to be an integrable function belonging to a set W(α) parameterized by a positive real number α. Examples are the sets

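$$\mathcal{W}(\alpha)_\infty \doteq \{\, w \ \text{s.t.}\ |w(t)| \le \alpha \quad \forall t \ge 0 \,\} \tag{1}$$

or

$$\mathcal{W}(\alpha)_2 \doteq \Big\{\, w \ \text{s.t.}\ \int_0^\infty |w(t)|^2 \, dt \le \alpha^2 \,\Big\}. \tag{2}$$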

W(α)_∞ and W(α)_2 correspond to signals which are uniformly bounded for each time t and signals with bounded energy, respectively. The subscript in the notation reflects the fact that, in mathematical terms, the sets in (1) and (2) are the balls of radius α in the L_∞ and L_2 norms, respectively. In general, the set W(α) should satisfy a monotone inclusion property as a function of α: W(α_1) ⊂ W(α_2) if α_1 < α_2. This property is clearly satisfied by W(α)_∞ and W(α)_2.
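The two signal classes can be checked numerically on a sampled record of w. The sketch below is only an approximation of the continuous-time definitions: it assumes that w is available as uniformly spaced samples with spacing `dt`, and it uses a rectangle rule for the energy integral. Both function names are hypothetical.

```python
def in_w_inf(samples, alpha):
    """Approximate test for W(alpha)_inf: |w(t)| <= alpha at every available sample."""
    return all(abs(w) <= alpha for w in samples)


def in_w_2(samples, dt, alpha):
    """Approximate test for W(alpha)_2: integral of |w(t)|^2 <= alpha^2.

    The integral is replaced by a rectangle-rule sum over the samples.
    """
    energy = sum(w * w for w in samples) * dt
    return energy <= alpha ** 2


w = [0.5, -1.0, 0.25]              # illustrative samples of w(t)
bounded = in_w_inf(w, 1.0)         # True: the largest magnitude is 1.0
finite_energy = in_w_2(w, 0.1, 1.0)  # True: energy is about 0.131 <= 1.0
```

The monotone inclusion property shows up directly: any record accepted for some α is also accepted for every larger α, since both tests only compare against a threshold that grows with α.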

Together with the reference model M, the set W(α) gives a degree of freedom available for design. In particular, the choice of W(α) is dictated by the class of movements that the camera is expected to be able to track; W(α)_∞ is a reasonable choice whenever little is known a priori about w(t), and constitutes the main example considered in the sequel.

The other ingredient in the present approach is the half size of the fovea, denoted x and measured in the same units as θ and φ. If e(t) = θ(t) − φ(t) denotes the difference between the position of the camera and the target at time t, the control objective is to design a discrete-time controller C such that

1. The closed-loop system is stable, and
2. |e(t)| ≤ x for each t ≥ 0, whenever w ∈ W(α).


Notice that the specification in item 2 is made in terms of the continuous-time error e(t) and not the sampled one ε(k) which is available to the controller. The reason for this is that concentrating on ε(k) may result in the target not remaining within the fovea during the inter-sample time, which may be undesirable for image processing purposes; moreover, it may lead to oscillatory responses, which should be avoided since the velocity of the object with respect to the camera should be relatively small to prevent image blurring. More generally, and subject to the particular application, one may want to consider a specification combining the continuous and sampled behavior as considered in Mirkin and Palmor (1997), since this could boost the achievable performance.
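The difference between the sampled and the continuous-time specification can be seen on a deliberately extreme toy signal. In the sketch below the error is a sinusoid that vanishes at every controller sampling instant but exceeds the fovea half-size in between; the values of `qT`, `x` and `A` are arbitrary assumptions chosen for illustration.

```python
import math

qT = 0.1   # hypothetical controller sampling period, in seconds
x = 0.05   # hypothetical fovea half-size, in radians
A = 0.2    # hypothetical oscillation amplitude, larger than x


def e(t):
    """Toy error: (numerically) zero at t = k*qT, equal to +/-A halfway between."""
    return A * math.sin(math.pi * t / qT)


sampled = [e(k * qT) for k in range(10)]        # what the controller sees
dense = [e(k * qT / 50.0) for k in range(500)]  # fine grid standing in for e(t)

sampled_ok = all(abs(v) <= x for v in sampled)     # True: looks perfectly tracked
continuous_ok = all(abs(v) <= x for v in dense)    # False: target leaves the fovea
```

A design judged only on ε(k) would accept this response, while the target spends most of each sampling interval outside the fovea.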

The existence of a controller that satisfies the above criterion will depend in general on α, since e(t) cannot be guaranteed to be small for arbitrarily “large” signals. Alternatively, given a controller C and a fovea half-size x, there exists a largest possible value of α, depending both on the controller and on the fovea size, that still satisfies the criterion. The dependence on the controller can be removed by finding the “best” possible one, which solves the optimization problem:

Problem 1 (Maximum Size of Input). Given x, find the largest α_x for which there exists a controller C_x that guarantees |e(t)| ≤ x for any w(t) ∈ W(α_x).

A solution to this problem is presented in Appendix A, by formulating an equivalent¹ optimal control problem.

Note that the bound α_x will be small both for x ≈ 0 and, typically, also for large values of x, since image processing delays become dominant. The maximum of α_x will then be achieved for some finite value of x:

$$\alpha^* \doteq \max_x \alpha_x.$$

In principle, the maximum can be achieved for more than one value of x, so let x* denote the largest such value less than the half size X of the camera. Then:

Non-uniform resolution is beneficial whenever 0 < x* < X, since then a controller can be designed that keeps |e(t)| ≤ x* for every w in the largest possible set W(α*).
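The one-variable maximization above can be sketched as a plain grid search, assuming that α_x can be evaluated for any given x (in the paper this requires solving Problem 1). The curve `toy_alpha` below is purely illustrative and is not derived from the paper: it only mimics the qualitative shape described in the text, small for x near 0 and small again for large x. Including the endpoint x = X in the grid is also an assumption, made so that the full image can be compared with smaller windows.

```python
def best_fovea_half_size(alpha_of_x, X, n=1000):
    """Grid search for alpha* = max_x alpha_x over 0 < x <= X.

    Returns (x_star, alpha_star). Ties are resolved in favor of the
    largest x, matching the choice of x* in the text.
    """
    best_x, best_alpha = None, float("-inf")
    for i in range(1, n + 1):
        x = X * i / n
        a = alpha_of_x(x)
        if a >= best_alpha:          # ">=" keeps the largest maximizer
            best_x, best_alpha = x, a
    return best_x, best_alpha


def toy_alpha(x):
    """Purely illustrative tradeoff curve (NOT from the paper)."""
    return x / (1.0 + (4.0 * x) ** 2)


X = 1.0                                    # hypothetical half size of the full image
x_star, alpha_star = best_fovea_half_size(toy_alpha, X)
foveation_helps = 0.0 < x_star < X         # interior optimum: a fovea is beneficial
```

For this toy curve the search returns x* = 0.25 and α* = 0.125, an interior optimum; a curve that kept increasing up to x = X would instead return x* = X, indicating no benefit from a foveal window.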

The associated controllerCs = Cx∗ , which for thecases considered above is linear and time-invariant, willbe referred to as a smooth tracking controller.Cs guar-antees that the target will remain inside the fovea for

the worst casew ∈W(α∗), althoughw can potentiallynot be inW(α∗) and still the objective|e| < x∗ besatisfied.

Remark. As explained in the appendix, the computation of C_s is made complicated by the fact that the system is sampled-data. In the L1 (or rather, induced L∞ − L∞) optimal case, this problem was considered in Dullerud and Francis (1992) by approximating the continuous-time signals using fast sampling and then solving the resulting discrete-time problem. If the signal w is assumed to have bounded energy, then this problem was briefly considered in Bamieh et al. (1991).

The main conclusions of this section are that the optimal size of the fovea can be computed as the solution to a maximization problem, and that the benefit of implementing a foveal window depends on (compare with Murray et al., 1995) a) the limits imposed by the dynamics of the mechanical system, b) computational delays and other hardware constraints, and c) the characterization of the signal w (i.e., the definition of W(α)), which in turn reflects the set of movements of the target φ(t) one expects to track. Recent progress in robot head construction suggests that b) is the major factor now limiting the achievable performance.
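The one-variable maximization can be sketched numerically. The following is only an illustration under stated assumptions: it uses the relation α_x = x/γ^x from Appendix A and the linear delay model τ = a_0 + a_1 x from Section 5, while the table `gamma_of_q`, the delay coefficients and the candidate grid are made-up values, not data from the experiments.

```python
import math

def alpha_for_fovea(x, gamma_of_q, a0, a1, frame_period):
    """Return (alpha_x, q) for a normalized fovea half-size x.

    The processing delay a0 + a1*x fixes how many camera frames q elapse
    per control update; gamma_of_q[q] is the (assumed, precomputed)
    optimal induced norm for that sub-sampling factor.
    """
    delay = a0 + a1 * x
    q = max(1, math.ceil(delay / frame_period))
    return x / gamma_of_q[q], q

def best_fovea(candidates, gamma_of_q, a0, a1, frame_period):
    """Scan candidate half-sizes; ties go to the largest x, as in the text."""
    best = None
    for x in sorted(candidates):
        alpha, q = alpha_for_fovea(x, gamma_of_q, a0, a1, frame_period)
        if best is None or alpha >= best[1]:
            best = (x, alpha, q)
    return best

# Hypothetical numbers, for illustration only.
GAMMA_OF_Q = {1: 0.10, 2: 0.12, 3: 0.30, 4: 0.60}
CANDIDATES = [i / 20 for i in range(1, 21)]
X_STAR, ALPHA_STAR, Q_STAR = best_fovea(CANDIDATES, GAMMA_OF_Q, 0.01, 0.1, 1 / 30)
```

Because γ^x is piece-wise constant in x, only the largest x within each sub-sampling level can be optimal, so a coarse scan of this kind is enough for a sketch.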

4. Smooth-Pursuit and Saccade

The discussion in the previous section provides a complete answer to the first and a partial answer to the second questions raised in the introduction. Moreover, it establishes that a single linear time-invariant controller cannot generically guarantee that the target will remain within the fovea for arbitrary signals φ(t). Therefore, although smooth pursuit achieved by a single linear time-invariant controller may suffice in some cases, it may be inadequate by itself for many practical situations. For instance, almost by definition C_s cannot be used to perform fixation shifts.

The purpose of this section is to develop a control strategy for the case when the target moves out of the fovea or a fixation shift is specified by a higher level controller, which are characterized by |e(t_v)| > x for some time t_v. A similar problem was discussed before in Murray et al. (1995); the main conclusions obtained in that work through simulations can be considered as a special case of the general problem considered below. It is important to stress that the objective of a saccade is both to center the target on the fovea at some time t_s > t_v and to guarantee that the smooth controller will be able to perform satisfactorily for t ≥ t_s if the assumption on w(t) is satisfied. Since performance can be poor in the interval [t_v, t_s], a natural objective is to make this interval as short as possible.

It is instructive to think of the example of a television broadcast of a football game (see Note 2). If the ball remains in a relatively small sector of the field moving at low velocities, the camera tracks its trajectory with smooth movements, but a ball kicked strongly will usually require rapid camera movement in an attempt to maintain it in, or bring it back to, the field of view. These last movements, referred to as “saccades” in the sequel, are of a different nature than the ones required for smooth pursuit. First, they appear to be more reflective in the sense that they involve a higher level of processing on the part of the operator. As another example which highlights this fact, it is interesting to note that the visual system of a newborn is able to perform some smooth pursuit, but a child is able to follow the flight of a ball only after he or she is several years old! Second, they involve larger control actions as compared to the ones generated by smooth pursuit. Third, the error |e(t)| is reduced only at some time t_s in the future as opposed to the uniformity achieved by the smooth controller. In order to do that, it is necessary to be able to predict φ(t_s), which in turn requires finding a suitable model for φ(t). As will become clear, this model is critical for the success of the saccadic correction. Fourth, and related to the previous one, the control system appears to become refractory to new input, which is consistent with our previous treatment and closed-loop stability. Going back to the example, if the ball hits an obstacle and bounces back after the saccade has been triggered, then this is not immediately taken into account but rather a saccade is completed to the wrong place before a second saccade is executed to correct for the error. These examples capture some of the main characteristics of the saccadic movements of the human oculomotor system (Robinson, 1968).

4.1. Switching Between Controllers

As described above, a saccade is a fast movement triggered when the smooth controller cannot keep the target within the fovea, with the objective of bringing the target back to the smooth-pursuit regime. It is worth stressing that the saccadic control should not only achieve fast target re-engagement, but it should also guarantee that the target could be tracked by the smooth pursuit controller right after the saccade. The driving idea behind our approach to saccadic control is to generate the control law in such a way that from the instant the saccade has been completed, the smooth pursuit control loop can be closed without introducing transients. This is achieved by making the internal states of the physical system right after the saccade identical to the ones that could have been reached by the smooth-pursuit closed-loop.

A systematic derivation of the control law is rather technical, especially because of the difficulties introduced by the sampled-data nature of the controller. A derivation of the equations governing the saccade, as well as surrounding theoretical results, are given in Appendix B.

4.2. Saccadic Control

The discussion in the previous section provides the framework for the systematic treatment of saccadic control. Following the approach in Rivlin et al. (1997), four different stages are considered.

Switch On. Suppose that the constraint on |e(t)| is violated at time t_v so that the smooth controller can no longer guarantee good performance or even continue its normal operation. A saccadic action is then triggered, which requires relatively lengthy computations. Meanwhile, the camera should somehow be operated in a way that will possibly facilitate the future correction. In the absence of additional information about the variations of the position of the target, one could select a fictitious signal w_{t_v} in such a way that the error criterion remains constant from t_v until the saccadic control is employed. Taking then e_v ≐ e(t_v),

$$x_C(j) = A_C^{j-k_v} x_C(k_v) + \sum_{i=k_v}^{j-1} A_C^{j-i} b_C e_v$$

where x_C(k_v) (see Note 3) is the internal state of the smooth pursuit controller C_1 at sampling instant k_v and j > k_v. The output signal of C_1 is

$$u_C^1(j) = c_C x_C(j) + d_C e_v = c_C A_C^{j-k_v} x_C(k_v) + \left[ c_C A_C \left(A_C^{j-k_v} - I\right) \left(A_C - I\right)^{-1} b_C + d_C \right] e_v$$


and

$$v(j) = v(j-1) + u_C^1(j-1). \tag{3}$$

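Take now jh < t ≤ (j + 1)h. The control input to the plant is

$$u(t) = u(jh) + v(j)(t - jh)$$

due to the continuous time integrator, giving

$$\theta(t) = c e^{A(t-t_v)} x_P(t_v) + c \int_{t_v}^{t} e^{A(t-s)} b\, u(s)\, ds = c e^{A(t-t_v)} x_P(t_v) + c \int_{t_v}^{t} e^{A(t-s)} b \left( u(\lfloor s/h \rfloor h) + v(\lfloor s/h \rfloor)(s - jh) \right) ds$$

and

$$\phi(t) = e_v + c e^{A(t-t_v)} x_P(t_v) + c \int_{t_v}^{t} e^{A(t-s)} b \left( u(\lfloor s/h \rfloor h) + v(\lfloor s/h \rfloor)(s - jh) \right) ds.$$

Taking derivatives:

$$\dot{\phi}(t) = c A e^{A(t-t_v)} x_P(t_v) + c b \left( u(jh) + v(j)(t - jh) \right)$$

$$w_v(t) = \ddot{\phi}(t) = c A^2 e^{A(t-t_v)} x_P(t_v) + c b\, v(j).$$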

Replacing v(j) with (3) gives an expression for the virtual reference signal, which can be computed on-line and hence used for driving the camera while performing the two computational stages discussed next.
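The Switch On stage can be illustrated by simply iterating the controller with its input frozen at e_v. The sketch below is a simplification, not the derivation above: it assumes a scalar controller state, uses the update x_C(k+1) = A_C x_C(k) + b_C ε(k) from Appendix B with ε(k) = e_v, and accumulates the output through equation (3). All names and numbers are hypothetical.

```python
def hold_error_rollout(a_c, b_c, c_c, d_c, x0, v0, e_v, steps):
    """Iterate a scalar controller C1 while the error is held at e_v.

    Returns a list of (x_C, u1, v) tuples, one per sampling instant,
    where u1 is the output of C1 and v is the discrete integrator state.
    """
    x, v = x0, v0
    history = []
    for _ in range(steps):
        u1 = c_c * x + d_c * e_v      # output of C1 for the frozen error
        history.append((x, u1, v))
        v = v + u1                    # equation (3): v(j) = v(j-1) + u1(j-1)
        x = a_c * x + b_c * e_v       # controller state update, constant input
    return history

# Hypothetical controller coefficients, for illustration only.
ROLLOUT = hold_error_rollout(0.5, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 3)
```

The sequence v(j) produced this way is what enters the expression for the virtual reference signal, so it can be generated on-line with a few multiplications per sample.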

Modeling. In order to reduce the error signal below the fovea half-size at some future instant τ, it is necessary to predict the values of the signal φ(t) for t ≥ τ, based on measurements which are usually costly to obtain and potentially contaminated by noise. The success of the saccadic control action may depend on the accuracy of these predictions. As an example, suppose that the target is located at some stationary point lying outside the foveal window (this is a standard experiment when evaluating human saccades (Robinson, 1968)); then the modeling problem reduces to determining the new position, which can presumably be done accurately. On the other hand, suppose that in the football example discussed in the beginning of this section, the ball bounces back after being kicked; the prediction will then most probably be poor and, in the real-life example, additional saccadic corrections would be required before returning to the smooth pursuit regime.

The reference Bar-Shalom and Fortmann (1988) contains an array of different algorithms for the computation of models for predicting signals under various sets of assumptions. The algorithm of choice should be selected depending on the standing assumptions for φ(t) and the noise level corrupting the measurements. This selection is important since it sets the time lag required to have a prediction of future position and determines the a priori accuracy of the predictions. A popular choice in the active vision field is to select α−β or α−β−γ filters for prediction. These filters have the advantage of simplicity, and their coefficients are usually selected by using the steady-state solution to a corresponding Kalman filtering problem (Bar-Shalom and Fortmann, 1988). However, much better predictions can be made if a priori knowledge of the variations of φ(t) is available and exploited.
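For concreteness, a textbook α−β filter can be sketched as follows. This is the standard constant-velocity form, not necessarily the variant used on any particular robot head; the gains, sample period and measurement sequence are arbitrary illustrative values rather than steady-state Kalman gains.

```python
def alpha_beta_track(measurements, dt, alpha, beta, x0=0.0, v0=0.0):
    """Run a standard alpha-beta filter over position measurements.

    Returns the list of (position, velocity) estimates after each update.
    """
    x, v = x0, v0
    estimates = []
    for z in measurements:
        x_pred = x + dt * v            # constant-velocity prediction
        residual = z - x_pred          # innovation
        x = x_pred + alpha * residual  # position correction
        v = v + (beta / dt) * residual # velocity correction
        estimates.append((x, v))
    return estimates

def predict_ahead(x, v, horizon):
    """Extrapolate the target position 'horizon' seconds ahead."""
    return x + horizon * v

# Illustrative run: a target moving at 2 units/s, sampled at 30 Hz, no noise.
DT = 1 / 30
TRACK = alpha_beta_track([2.0 * k * DT for k in range(1, 201)], DT, 0.5, 0.1)
```

The extrapolation in `predict_ahead` is the kind of quantity a saccade needs, namely a position estimate at a future instant; its quality degrades with the horizon and with any departure of the target from constant velocity.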

Saccade. Once the model is available at time, say, t_p, it is possible to compute x_M(t_s) for some future time instant and hence the time-varying target set O(x_M(k_s)). The problem is now to generate the control signal u_sac(t) that drives the plant from x_P(t_p) to O(x_M(t_s)). A natural objective is to do this in the shortest possible time, not only because of the tracking objective but also since the future prediction of x_M(t_s) potentially deteriorates with time. It is implicitly assumed that the internal state of the plant is measurable for feedback; this can be achieved at least approximately if the internal control loop discussed before is designed so that P can be accurately approximated by a second order system, for which both position and velocity are measured.

The computation of the saccadic control appears to be challenging; it can be approximated by using fast-sampling, i.e., replacing the continuous-time virtual input w(t) by a piece-wise constant function:

$$w(t) = w(k), \qquad kh \le t < (k+1)h$$

where h ≪ T. This reduces the problem to a discrete-time multi-rate one. The advantage is that in that case linear programming based algorithms exist (de Vlieger et al., 1982) for solving these problems, and they allow the inclusion of additional constraints, like bounds on the tolerable control actions. Notice that the constraint on the target set is a linear one, and so can be incorporated with minor modification into the formulation.
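A minimal sketch of the linear-programming idea, under strong simplifying assumptions: the plant is a discrete-time double integrator, the target set is reduced to a single state, and the minimum time is found by increasing the horizon until a bounded input sequence becomes feasible. It relies on NumPy and on `scipy.optimize.linprog`; the function name `min_time_input` and all numbers are hypothetical and this is not the algorithm of de Vlieger et al. (1982).

```python
import numpy as np
from scipy.optimize import linprog

def min_time_input(A, B, x0, target, u_max, max_horizon=50):
    """Smallest horizon N (and an input sequence) with |u_k| <= u_max
    that drives x(k+1) = A x(k) + B u(k) from x0 exactly to target."""
    for N in range(1, max_horizon + 1):
        # x_N = A^N x0 + sum_i A^(N-1-i) B u_i  (linear in the inputs)
        G = np.column_stack(
            [np.linalg.matrix_power(A, N - 1 - i) @ B for i in range(N)]
        )
        rhs = target - np.linalg.matrix_power(A, N) @ x0
        res = linprog(
            c=np.zeros(N),                 # pure feasibility problem
            A_eq=G,
            b_eq=rhs,
            bounds=[(-u_max, u_max)] * N,  # bound on the control action
            method="highs",
        )
        if res.status == 0:
            return N, res.x
    return None, None

# Double integrator with unit sampling period (position, velocity).
A_DI = np.array([[1.0, 1.0], [0.0, 1.0]])
B_DI = np.array([0.5, 1.0])
N_MIN, U_SEQ = min_time_input(A_DI, B_DI, np.zeros(2), np.array([3.0, 0.0]), 1.0)
```

Replacing the equality constraint by linear inequalities describing a polyhedral set would be one way to accommodate a target set rather than a single target state.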

Switch Off. Linear optimal controllers, such as an L1-optimal one, assume that the initial state of the plant to be controlled is zero. If the initial state is non-zero and unknown, then the controller can no longer guarantee the desired performance and should be replaced by a usually more complicated one (e.g., non-linear, time-varying). As claimed above, if the initial state is known, then the same controller can be used if it is properly initialized, since this amounts to finding a fictitious but legal disturbance that would drive the state of the plant to the actual non-zero initial state, when the plant is interconnected with the optimal controller. Then, it is possible to “read out” the state of the controller and initialize the actual configuration so that the optimal performance can be guaranteed.
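The sequencing of the four stages can be summarized as a small state machine. This is only a schematic reading of the stages above: the mode names, the function `next_mode` and the boolean flags `model_ready` and `in_target_set` are hypothetical, and the actual computations performed in each stage are left out.

```python
from enum import Enum, auto

class Mode(Enum):
    SMOOTH = auto()     # smooth pursuit controller in the loop
    SWITCH_ON = auto()  # fovea constraint violated, virtual reference set up
    MODELING = auto()   # target model being estimated
    SACCADE = auto()    # open-loop drive towards the target set

def next_mode(mode, error, fovea_half_size, model_ready, in_target_set):
    """Return the mode for the next sampling instant."""
    if mode is Mode.SMOOTH:
        return Mode.SWITCH_ON if abs(error) > fovea_half_size else Mode.SMOOTH
    if mode is Mode.SWITCH_ON:
        return Mode.MODELING
    if mode is Mode.MODELING:
        return Mode.SACCADE if model_ready else Mode.MODELING
    # During the saccade the error is deliberately ignored (refractory
    # behavior); Switch Off happens once the target set has been reached,
    # at which point the smooth controller state must be re-initialized.
    return Mode.SMOOTH if in_target_set else Mode.SACCADE
```

Note that the error argument has no effect while a saccade is in progress, which mirrors the refractory behavior described in the football example.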

5. Simulation and Experimental Results

The control strategy presented above was tested on the Technion Robot Head. The head consists of two cameras mounted on a mechanism providing four mechanical degrees of freedom (see Fig. 5). For the purposes of the present research, the pan axis of one camera was used for control, while a laser pointer was attached to the other camera. This laser pointer provides a target for the control system which can be controlled by the computer.

Figure 5. The Technion Robot Head.

Through experiments, the following open-loop transfer function was obtained for the pan axis of interest, relating normalized voltage input to the motor with angular displacement of the axis:

$$P_{in}(s) = \frac{-3.3012\, s + 3.9197 \times 10^4}{s^2 + 64.7983\, s}$$

The parameters of the controller available with the controller board of the Technion Head were subsequently designed using classical control considerations. It was found that proportional control suffices to provide very good open loop performance, in terms of stability, bandwidth and noise rejection properties. After taking all the gains in the loop into account, the controller was seen to be

$$C_{in} = 0.5.$$

Next, the theory presented above for the design of a smooth pursuit controller was applied. In order to do this, an integrator was added in cascade with the internal closed loop of P_in and C_in:

$$\frac{1}{s}\, \frac{P_{in} C_{in}}{1 + P_{in} C_{in}} = 0.5\, \frac{-3.3012\, s + 3.9197 \times 10^4}{s^3 + 63.1477\, s^2 + 1.96 \times 10^4\, s}.$$
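The closed-loop expression can be checked with a few lines of polynomial arithmetic. The sketch below takes the proportional gain as 0.5, which is the value that reproduces the printed denominator; the helper names are hypothetical and polynomials are stored as coefficient lists, highest power first.

```python
def poly_add(p, q):
    """Add two polynomials given as coefficient lists (highest power first)."""
    n = max(len(p), len(q))
    p = [0.0] * (n - len(p)) + list(p)
    q = [0.0] * (n - len(q)) + list(q)
    return [a + b for a, b in zip(p, q)]

def closed_loop_with_integrator(num_p, den_p, gain):
    """Return (num, den) of (1/s) * P*C / (1 + P*C) for a constant gain C."""
    num = [gain * c for c in num_p]   # C * N(s)
    den = poly_add(den_p, num)        # D(s) + C * N(s)
    return num, den + [0.0]           # multiplying by s appends a zero root

NUM_P = [-3.3012, 3.9197e4]           # numerator of P_in(s)
DEN_P = [1.0, 64.7983, 0.0]           # s^2 + 64.7983 s
NUM_CL, DEN_CL = closed_loop_with_integrator(NUM_P, DEN_P, 0.5)
```

The coefficient of s² comes out as 64.7983 − 0.5 · 3.3012 = 63.1477 and that of s as 0.5 · 3.9197 × 10⁴ ≈ 1.96 × 10⁴, in agreement with the expression above.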

The optimal performance of the system when considering discrete-time PID controllers with sampling time h = q(1/30) sec is illustrated in Fig. 6. As expected from the interconnection, the performance degrades monotonically with decreasing sampling rate. These PID controllers were optimally tuned following the theory in Section 3. PID controllers were chosen because actual optimal controllers could be of high complexity (indeed, optimality is achieved only for infinite dimensional controllers); also, PID controllers are easier to implement and require minimum software changes.

Figure 6. The optimal performance of the system when considering discrete-time PID controllers. The vertical axis x(pix)/gamma measures the worst-case error (see Appendix A).


Figure 7. The performance of the smooth pursuit controller.

A model for the image processing block is now required to trade off against the performance degradation induced by the sampling rate. Due to the character of the target, a simple correlation algorithm was employed to localize the target from each essentially one-dimensional image. As a consequence, the computational time can be modeled through the equation

$$\tau = a_0 + a_1 x$$

where x is half the size of the fovea, normalized between zero and one. Comparing this cost with the performance in Fig. 6, it was found that the optimal trade-off is achieved at q = 2, meaning that the optimal fovea has a half-size of 30 pixels, and that only one out of two images should be used as input to the control system. The resulting controller is capable of tracking, within the fovea, target accelerations of 250°/sec². The performance of the smooth pursuit controller is illustrated in Fig. 7. There, the target is initially at rest at the center of the fovea, then it accelerates with the maximal admissible acceleration in one direction, and finally decelerates and accelerates in the opposite direction. The figure shows that over the whole trajectory the target remains inside the fovea.

Figure 8 presents the performance of the smooth pursuit controller (the solid line) vs. a standard PID controller (dashed line) for an input φ(t) = t − t ∗ cos(2π/7), as obtained running on the Technion Robot Head. The size of the fovea is 4 degrees (about 60 pixels). One can see that the standard PID controller takes the target outside of the fovea at about the tenth second. We also checked the difference in reaction to acceleration of the two controllers. The results are presented in Fig. 9. One can see that the standard PID controller converges to a steady state after about 8 seconds, with a relatively large error (1.5 degrees, about 22 pixels).

Figure 8. The performance of the smooth pursuit controller (the solid line) vs. a standard PID controller (dashed line).

Figure 9. The performance of the smooth pursuit controller (the solid line) vs. a standard PID controller (dashed line).

Saccadic Control

The implementation and evaluation of saccadic control is much more complex and, consequently, the performance will be illustrated by means of a simulation example which captures the behavior of the two loops quite well. Figure 10 presents the performance of the system for a sinusoidal input when running in two-mode control. The input signal is such that smooth pursuit is broken twice during each period. However, saccadic control is able to bring the target again within the fovea and switch back to smooth pursuit. It is interesting to observe that since the target is not static, most of the previous approaches to saccadic control would probably fail to switch successfully back to smooth pursuit. A possible exception to this is the control scheme proposed in Murray et al. (1995).

Figure 10. The performance of the system in a two-mode control.

6. Conclusions

In this paper some of the fundamental problems regarding the control of an active vision system have been addressed. It has been shown that the benefit of implementing foveal vision can be formulated as an optimization problem, since a trade-off appears between having a small window which would yield small computational delays but tighter control objectives or relaxing the control objectives but obtaining more challenging dynamics. Following the current approach, the size of the fovea is chosen as the one giving best tracking capabilities, as measured by the size of the signals which the system is guaranteed to track. It was also shown that foveal vision is tightly related with smooth pursuit, since the solution to the former provides a controller which makes the latter meaningful.

If the performance provided by the smooth controller does not suffice, which will be the case if the camera is expected to perform in a realistic environment, then one is necessarily led to consider a two-mode controller, in which the smooth controller is replaced whenever the error fails to meet specifications by a saccadic controller. This latter controller is substantially more sophisticated since it has to have capabilities of modeling the evolution of the target, generating a signal which will drive the system momentarily and then another one that should position the camera in such a way that the smooth pursuit controller can be switched back into the loop and perform according to specifications. It was established that, in the light of recent developments in optimal control theory, all these requirements can be formulated in a systematic manner.

The setup considered in this paper is clearly oversimplified, and several adjustments might be required in order to implement the different stages. Computation of the optimal fovea size and smooth controller appear to be straightforward, although several modifications should be introduced to make the design more realistic; for instance, model uncertainty and noise corrupting the measurements should be introduced into the picture. Although this makes the calculations slightly more involved, since for instance the calculation of α_x requires itself some iterations, they can still be performed off-line in a reasonable time. As for saccadic control, it is clear that since it involves intensive on-line computations it should be implemented carefully to obtain satisfactory results, and a priori knowledge on the type of target the system is expected to track should be used to speed up computations. In this respect, notice that the modeling stage has been de-emphasized although it is critical for achieving good performance, and hence constitutes the degree of freedom that the designer has at hand for tailoring saccades to specific applications.

Although the present paper differs from most of the works in the area of active vision by stressing the control aspects of the problem, it is the feeling of the authors that it is also tightly connected with them since issues that have been discussed before can be easily accommodated into the current approach, including image processing techniques or modeling of the position of the target.

As stated in the introduction, the most simple setup was considered in order to highlight the most important problems involved in tracking by an active vision system. The next steps are implementing these ideas in an actual system and increasing the complexity by including one more camera and other degrees of freedom. The latter appears to require rewriting the theory in this paper in terms of multivariable systems (e.g., systems with more than one input and/or output) while the former seems to be more challenging since specifications are harder to write down in a systematic manner. It will be interesting to find out in what sense the organization of the human oculomotor system provides clues as to how this should be done. As an incidental remark, several apparently desirable characteristics have been hypothesized about the behavior of this system, including the L1 optimal characteristic. Experiments are planned to contrast these hypotheses with real data.


Appendix

A. A Solution to Problem 1

In mathematical terms, Problem 1 may be written as:

$$\alpha_x = \sup \left\{ \alpha : \inf_{C \in \mathcal{C}}\ \sup_{w \in W(\alpha)} |e(t)| \le x,\ t \ge 0 \right\} \tag{A1}$$

where C ∈ 𝒞 is used to denote that the controller is stabilizing. This problem is closely related to optimal control problems with an induced-norm criterion. To see this, let T_ew(C) denote the transfer function between w and e for a given controller C. For C stabilizing, T_ew(C) is stable and it is possible to define the system norm:

$$\|T_{ew}(C)\|_{\infty,i} \doteq \sup_{w \in L_i} \frac{\|e(t)\|_\infty}{\|w(t)\|_i} = \sup_{w \in W(\alpha)} \frac{\|e(t)\|_\infty}{\alpha}$$

where the last equality follows from linearity and the definition of W(α). The relevance of this norm is that, given an input w, it is possible to bound the norm of the output as:

$$\|e(t)\|_\infty \le \|T_{ew}(C)\|_{\infty,i}\, \|w\|_i$$

and the bound is tight in the sense that there always exists an input w such that it holds as an equality. The controller C can be chosen optimally as a solution to the problem

$$\gamma_i^x = \inf_{C \in \mathcal{C}} \|T_{ew}(C)\|_{\infty,i} = \inf_{C \in \mathcal{C}}\ \sup_{w \in L_i} \frac{\|e\|_\infty}{\|w\|_i}.$$

A solution C_x to this problem is min-max optimal in the sense that it guarantees that the norm of the output will remain smaller than γ_i^x ‖w‖_i for a given input w ∈ W(α)_i. It follows that ‖e(t)‖_∞ ≤ x if α = x/γ_i^x, and for α > x/γ_i^x there always exists w ∈ W(α)_i such that the constraint on the norm of e(t) is violated. From a computational point of view, C_x as above may be found by using control theory techniques: in the case i = 2, C_x is known as a generalized H2 controller, while for i = ∞ it is called an L1 controller. The performance γ^x depends on x via the sub-sampling rate q_x, and hence will be piece-wise constant: only variations of x large enough to change the integer q will affect it. Given that x is positive and bounded above by the physical size of the camera, the computation of only a finite number of values for γ^x is required. Notice that γ^x will typically be an increasing function of x since continuous time performance deteriorates as the sampling period becomes larger.
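The tightness of the induced-norm bound is easy to see in a purely discrete-time analogue, where the induced ∞-norm of a finite impulse response equals the sum of the absolute values of its samples. The sketch below uses that simplified setting (it ignores inter-sample behavior, which is what makes the sampled-data problem above harder); the impulse response and all function names are hypothetical.

```python
def l1_norm(h):
    """Sum of absolute values of an impulse response (its l1 norm)."""
    return sum(abs(v) for v in h)

def worst_case_input(h, alpha):
    """Input of amplitude alpha whose signs are aligned with the reversed
    impulse response, so that the last output sample attains the bound."""
    return [alpha * (1.0 if v >= 0 else -1.0) for v in reversed(h)]

def output_at_end(h, w):
    """Convolution of h and w evaluated at the final sample."""
    n = len(h)
    return sum(h[i] * w[n - 1 - i] for i in range(n))

def largest_input_size(x, gamma):
    """alpha = x / gamma: largest input size keeping |e| <= x."""
    return x / gamma

H = [1.0, -0.5, 0.25]        # hypothetical error impulse response
GAMMA = l1_norm(H)
W_BAD = worst_case_input(H, 2.0)
E_PEAK = output_at_end(H, W_BAD)
```

With these numbers the peak error equals GAMMA times the input amplitude, so any input bound above x/GAMMA admits a signal that pushes the error beyond x.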

B. Saccade Computation

Let the plant P be described by the state space equations:

$$\dot{x}_P(t) = A x_P(t) + b u(t)$$

$$\theta(t) = c x_P(t)$$

which can be denoted in compact notation

$$P = \left( \begin{array}{c|c} A & b \\ \hline c & 0 \end{array} \right).$$

In the sequel, this compact notation will be used to denote transfer matrices whenever convenient. The discrete-time controller has a state-space representation

$$x_C(k+1) = A_C x_C(k) + b_C\, \varepsilon(k)$$

$$v(k) = c_C x_C(k)$$

and the same compact notation may be used; the correct interpretation can be obtained from the context. Since a sampling device is linear but not time-invariant, the closed-loop of a sampled-data system is time-varying. Fortunately, it is periodically time-varying and hence lifting techniques may be used in order to write down the equations in a compact and elegant form; the reader is referred to Bamieh and Pearson (1992) for a comprehensive introduction to lifting for sampled-data systems. In a nutshell, continuous-time signals can be lifted into discrete-time ones by “cutting” them into pieces, and difference equations can be found for the correspondingly lifted systems (assuming they are finite-dimensional). As a consequence, the closed-loop behavior is defined by a set of discrete-time state-space equations, albeit with operators replacing the state-space realization matrices.

In order to make our treatment specific, the model M will be assumed to be a double-integrator:

$$M = \left( \begin{array}{cc|c} 0 & 1 & 0 \\ 0 & 0 & 1 \\ \hline 1 & 0 & 0 \end{array} \right).$$


In order to have a) bounded L1 norm between w and e and b) zero steady-state error whenever w is constant, we take F to be the integrator:

$$F = \left( \begin{array}{c|c} 0 & 1 \\ \hline 1 & 0 \end{array} \right)$$

and set

$$C(z) = \frac{1}{z-1}\, C_1(z),$$

which also makes the discrete-time controller strictly causal, a condition that every feasible controller should satisfy since ε(k) cannot affect the current control signal. It follows from the internal model principle for sampled-data systems (Yamamoto, 1994) that these two conditions are necessary and sufficient for satisfying a) and b). It is implicitly assumed here that the plant contains no pole at zero, since otherwise either the continuous or discrete time integrator (or both) may not be required. S_T is an ideal sampler while the interconnection of S_T and the down-sampler is an ideal sampler with sampling period h = qT. The hold H_h is assumed to be a zero-order hold while the controller C_1 has a state-space realization

$$C_1 = \left( \begin{array}{c|c} A_C & b_C \\ \hline c_C & d_C \end{array} \right)$$

and then

$$C = \left( \begin{array}{c|c} 1 & h \\ \hline 1 & 0 \end{array} \right) \left( \begin{array}{c|c} A_C & b_C \\ \hline c_C & d_C \end{array} \right) = \left( \begin{array}{cc|c} 1 & h c_C & h d_C \\ 0 & A_C & b_C \\ \hline 1 & 0 & 0 \end{array} \right).$$

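Using lifting techniques, it is possible to get a representation for the closed-loop when kh ≤ t < (k + 1)h in terms of state-space equations of the form:

$$x_S(k+1) = A_S x_S(k) + \int_0^h B_S(h-s)\, w(kh+s)\, ds$$

$$e(t) = C_S(t)\, x_S(k) + \int_0^h D_S(h-s)\, w(kh+s)\, ds$$

where

$$A_S = \begin{bmatrix} 1 & h & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & c \int_0^h e^{Ar}\, dr & -\psi_a(h) b & 0 \\ 0 & 0 & 0 & e^{Ah} & -\psi_b(h) b & 0 \\ h d_C & 0 & h d_C & 0 & 1 & h c_C \\ b_C & 0 & b_C & 0 & 0 & A_C \end{bmatrix}$$

$$C_S(t) = \begin{bmatrix} 1 & t & 1 & c \int_0^t e^{Ar}\, dr & -\psi_a(t) b c_C & 0 \end{bmatrix}$$

$$B_S(t) = \begin{bmatrix} t \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \qquad D_S(t) = t$$

and

$$\begin{bmatrix} \psi_a(t) \\ \psi_b(t) \end{bmatrix} = \begin{bmatrix} c \int_0^t \int_0^{t-s} e^{Ar}\, dr\, ds \\ \int_0^t e^{A(t-s)}\, ds \end{bmatrix}.$$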

The corresponding state vector is formed by stacking the ones of the reference model, plant and controller:

$$x_S = \begin{bmatrix} x_M \\ x_P \\ x_C \end{bmatrix}.$$

It follows by the internal model principle that the unstable states of M should be unobservable. To exhibit this fact, note that A is invertible (since otherwise the plant P would have some poles at zero), assume that c · b ≠ 0 and consider the change of variables:

$$\begin{bmatrix} x_M \\ x_a \\ x_b \end{bmatrix} = \begin{bmatrix} x_M \\ x_P + \begin{bmatrix} 1 & 0 \\ 0 & A^{-1} b / cb \end{bmatrix} x_M \\ x_C + \begin{bmatrix} 0 & -1/cb \\ 0 & 0 \end{bmatrix} x_M \end{bmatrix}.$$

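Then, it is straightforward to verify that the closed loop system may be written as S = S_uo + S_o, where

$$S_{uo} = \left( \begin{array}{cc|c} 0 & h & t \\ 0 & 1 & 1 \\ \hline 0 & 0 & 0 \end{array} \right)$$

$$S_o = \left( \begin{array}{cccc|c} 1 & c \int_0^h e^{Ar}\, dr & -\psi_a(h) b & 0 & t \\ 0 & e^{Ah} & -\psi_b(h) b & 0 & A^{-1} b / cb \\ h d_C & 0 & 1 & h c_C & -1/cb \\ b_C & 0 & 0 & A_C & 0 \\ \hline 1 & c \int_0^t e^{Ar}\, dr & -\psi_a(t) b c_C & 0 & t \end{array} \right);$$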


here S_uo is unobservable from the error signal. These systems generate the error signal e(t) as a function of w(t).

Given φ(t_s) at some future time t_s = qk_s, the objective of the saccadic control is to synthesize a control action that would allow the smooth controller to be switched back into the loop at time t_s. A moment of reflection shows that it is not enough to guarantee that |e(t)| will remain smaller than x for t > t_s since, for instance, there may be a large velocity mismatch between the camera and the target at t_s. The problem is then to find an appropriate “target set” for the saccade; a similar problem was addressed in Rivlin et al. (1997) for the discrete-time case.

Before proceeding, some notation is required. Given a periodically time-varying system G in lifted form, an initial state x_0 at time t_0 and some (integrable) function w(t), let F_G(k, x_0, t_0, w) denote the linear function mapping x_0 into the state trajectory x_G(k):

$$x_G(k) = F_G(k, x_0, t_0, w).$$

Consider the Reachable Set R_G of G, defined as the set of all states that can be reached from 0 in a finite number of samples by using inputs w ∈ W(α*):

$$R_G \doteq \{ x_G : \exists\, k_f,\ w \in W(\alpha^*) \ \text{giving}\ x_G = F_G(k_f, 0, 0, w) \}.$$
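For intuition, the reachable set can be written down explicitly in the simplest possible case. The sketch below assumes a scalar discrete-time system x(k+1) = a x(k) + b w(k) with |w(k)| ≤ α, for which the states reachable from 0 in k steps form the interval [−r_k, r_k]; this is a toy stand-in for the lifted sampled-data system above, and the function name and numbers are hypothetical.

```python
def reachable_radius(a, b, alpha, steps):
    """Radius r_k of the set reachable from 0 in 'steps' samples for
    x(k+1) = a*x(k) + b*w(k) with |w(k)| <= alpha."""
    r = 0.0
    for _ in range(steps):
        # The extreme state is obtained by pushing with full amplitude
        # in the direction that a*x already points to.
        r = abs(a) * r + abs(b) * alpha
    return r

def is_reachable(x, a, b, alpha, steps):
    """True if x can be reached from 0 within 'steps' samples."""
    return abs(x) <= reachable_radius(a, b, alpha, steps)

R3 = reachable_radius(0.5, 1.0, 2.0, 3)
```

For |a| < 1 the radius converges to |b| α / (1 − |a|), so the reachable set is bounded; a target-set test for the saccade then reduces, in this toy case, to an interval membership check.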

Let R^P_{S_o} denote the projection of the reachable set R_{S_o} into the states of the plant:

$$R^P_{S_o} = \begin{bmatrix} I & 0 \end{bmatrix} R_{S_o}.$$

It was shown in Blanchini and Sznaier (1995) that if the system is discrete or continuous-time and W(α*) = W(α*)_∞, then if x_P ∈ R^P_S for some time instant, it is possible to construct a non-linear, static state-feedback controller that will force the future states to remain within R^P_S. This fact was shown also to be true for the sampled-data case in Mirkin et al. (1998) (see Note 4) where, perhaps more interestingly, it was also shown that it is possible to select a state x_C(t_s) for the smooth controller C_s in such a way that the reduced closed-loop state [x_a; x_b] will remain inside R_{S_o} for future time samples if w ∈ W(α*) and hence |e(t)| will remain bounded by x. This motivates the following definition of target set for saccadic control.

Definition 1 (Target Set). Given an internal state of the reference model x°_M, the state x_P belongs to the target set O(x°_M) if there exist k_s and w ∈ W(α*) such that

$$\begin{bmatrix} I & 0 & 0 \end{bmatrix} F_S(k_s, 0, 0, w) = x^\circ_M$$

$$\begin{bmatrix} 0 & I & 0 \end{bmatrix} F_S(k_s, 0, 0, w) = x_P.$$

The set O(τ, x°_M) contains the states of the plant which can be reached by signals w ∈ W(α*) in a finite number of sample intervals if the internal state of the reference model is constrained to be equal to the one at τ, x_M(τ). The important observation is that if u_sac is now computed so that x_P(τ) ∈ O(x_C(τ)), then the smooth controller can be switched back into the loop at time qτ by initializing its internal mode to x_C(τ) = [0 0 I] F_S(k_s, 0, 0, w_v) where w_v ∈ W(α) is such that

$$\begin{bmatrix} x_M(\tau) \\ x_P(\tau) \end{bmatrix} = \begin{bmatrix} I & 0 & 0 \\ 0 & I & 0 \end{bmatrix} F_S(k_s, 0, 0, w_v).$$

It follows from the reasoning in Rivlin et al. (1997) (see also Mirkin et al., 1998, for a more detailed treatment) that |e(t)| < x for t ≥ τ if the future disturbances satisfy |w(t)| ≤ α*. The reason is that the closed-loop system will behave for t > τ as if the past input to the system had been w_v (a similar interpretation can be made for the case of norm-bounded signals).

Acknowledgments

This paper has benefited from discussions with many people. In particular, we wish to thank Leonid Mirkin (who also shared his expertise in sampled-data systems) and Ruth Onn for patiently listening to early ideas and for their criticism. We also thank Rafi Sivan and Shuka Zeevi for their helpful comments. Finally, the second author acknowledges the additional physical motivation provided by Ariel and Noam.

Notes

1. Except for some smoothness assumption that any physical model should satisfy.

2. Here and in what follows, by “football” we refer to the real football as opposed to American football. No previous knowledge of the game is assumed on the part of the reader.

3. State vectors are denoted below with x and a sub-index, not to be confused with the fovea half-size x without indices.

4. As a matter of fact, the results in Mirkin et al. (1998) evolve from Rivlin et al. (1997) which in turn is inspired by the present work.


References

Bajcsy, R. 1988. Active perception. Proceedings of the IEEE, 76(8). Special issue on Computer Vision.

Bamieh, B., Pearson, J.B., Francis, B., and Tannenbaum, A. 1991. A lifting technique for linear periodic systems with applications to sampled-data control. Systems & Control Letters, 17:79–88.

Bamieh, B. and Pearson, J.B. 1992. A general framework for linear periodic systems with applications to H∞ sampled-data control. IEEE Transactions on Automatic Control, 37(4):418–435.

Bar-Shalom, Y. and Fortmann, T.E. 1988. Tracking and Data Association, Mathematics in Science and Engineering. Academic Press.

Blanchini, F. and Sznaier, M. 1995. Persistent disturbance rejection via static state feedback. IEEE Transactions on Automatic Control, 40(6):1127–1131.

Brown, C. 1990a. Gaze controls with interactions and delays. IEEE Transactions on Systems, Man and Cybernetics, 20(1):518–527.

Brown, C. 1990b. Prediction and cooperation in gaze control. Biological Cybernetics, 63:61–70.

Brown, C., Coombs, D., and Soong, J. 1993. Real-time smooth pursuit tracking. In Active Vision, A. Blake and A. Yuille (Eds.). MIT Press, pp. 123–136.

Carpenter, R.H.S. 1988. Movements of the Eyes. Pion Limited.

Christensen, H.I. 1993. A low-cost robot camera head. International Journal of Pattern Recognition and Artificial Intelligence, 7(1):69–84.

Christensen, H.I. 1995. Private communication.

Clark, J.J. and Ferrier, N.J. 1993. Attentive visual servoing. In Active Vision, A. Blake and A. Yuille (Eds.). MIT Press, pp. 137–154.

Coombs, D.J. and Brown, C.M. 1991. Cooperative gaze holding in binocular vision. IEEE Control Systems Magazine, 11(3):24–33.

Coombs, D.J. and Brown, C.M. 1993. Real-time binocular smooth pursuit. International Journal of Computer Vision, 11(2):147–164.

de Vlieger, J.H., Verbruggen, H.B., and Bruijn, P.M. 1982. A time-optimal control algorithm for digital computer control. Automatica, 18(2):239–244.

Dullerud, G. and Francis, B. 1992. L1 design and analysis in sampled-data systems. IEEE Transactions on Automatic Control, 436–446.

Ferrier, N.J. and Clark, J.J. 1993. The Harvard binocular head. International Journal of Pattern Recognition and Artificial Intelligence, 7(1):9–31.

Fiala, J., Lumia, R., Roberts, K., and Wavering, A. 1994. TRICLOPS: A tool for studying active vision. International Journal of Computer Vision, 12(2/3):231–250.

Krotkov, E. and Bajcsy, R. 1993. Active vision for reliable ranging: Cooperating focus, stereo, and vergence. International Journal of Computer Vision, 11(2):187–203.

Milios, E., Jenkin, M., and Tsotsos, J. 1993. Design and performance of TRISH, a binocular robot head with torsional eye movements. International Journal of Pattern Recognition and Artificial Intelligence, 7(1):51–68.

Mirkin, L. and Palmor, Z. 1997. Sampled-data H∞-optimal control with mixed discrete/continuous specifications. Automatica, 33(1):1997–2014.

Mirkin, L., Rivlin, E., and Rotstein, H. 1998. On static feedback for the L1 and other optimal control problems. International Journal of Control.

Murray, D., Bradshaw, K., McLauchlan, P., Reid, I., and Sharkey, P. 1995. Driving saccade to pursuit using image motion. International Journal of Computer Vision, 16:205–228.

Murray, D.W., Du, F., McLauchlan, P.F., Reid, I.D., Sharkey, P.M., and Brady, M. 1993. Design of stereo heads. In Active Vision, A. Blake and A. Yuille (Eds.). MIT Press, pp. 155–172.

Pahlavan, K. and Eklundh, J.-O. 1993. Heads, eyes and head-eye systems. International Journal of Pattern Recognition and Artificial Intelligence, 7(1):33–49.

Rivlin, E., Rotstein, H., and Zeevi, Y. 1997. Two-mode control: An oculomotor-based approach to tracking systems. IEEE Transactions on Automatic Control, 43(6):833–843.

Robinson, D.A. 1968. The oculomotor control system: A review. Proceedings of the IEEE, 56(6):1032–1049.

Sharkey, P.M. and Murray, D.W. 1996. Delays versus performance of visually guided systems. IEE Proceedings - Control Theory and Applications, 143(5):436–447.

Swain, M.J. and Stricker, M.A. 1993. Promising directions in active vision. International Journal of Computer Vision, 11(2):109–126.

Yamamoto, Y. 1994. A function space approach to sampled-data control systems and tracking problems. IEEE Transactions on Automatic Control, 39(4):703–713.