Motion from Fixation

Stefano Soatto and Pietro Perona
Control and Dynamical Systems
California Institute of Technology 116-81
Pasadena CA 91125, [email protected]

20, 1995

Abstract

We study the problem of estimating rigid motion from a sequence of monocular perspective images obtained by navigating around an object while fixating a particular feature point. The motivation comes from the mechanics of the human eye, which either smoothly pursues some fixation point in the scene, or "saccades" between different fixation points. In particular, we are interested in understanding whether fixation helps the process of estimating motion, in the sense that it makes it more robust, better conditioned or simpler to solve.

We cast the problem in the framework of "dynamic epipolar geometry", and propose an implicit dynamical model for recursively estimating motion from fixation. This allows us to compare directly the quality of the estimates of motion obtained by imposing the fixation constraint, or by assuming a general rigid motion, simply by changing the geometry of the parameter space while maintaining the same structure of the recursive estimator. We also present a closed-form static solution from two views, and a recursive estimator of the absolute attitude between the viewer and the scene.

One important issue is how the estimates degrade in the presence of disturbances in the tracking procedure. We describe a simple fixation control that converges exponentially, complemented by an image shift-registration for achieving sub-pixel accuracy, and we assess how small deviations from perfect tracking affect the estimates of motion.

1 Introduction

When a rigid object is moving in front of us (or we are moving relative to it), the information coming from the time-varying projection of the object onto one of our eyes suffices to estimate its motion, even when its shape is unknown.

Research sponsored by NSF NYI Award, NSF ERC in Neuromorphic Systems Engineering at Caltech, ONR grant N00014-93-1-0990. This work is registered as CDS technical report n. CIT-CDS 95-006, February 1995.

In order to observe the motion of the object while holding our head still and one eye closed, we can choose either to track it (or a particular feature on its surface) by moving the eye, or to hold the eye still (by fixating some feature in the still background) and let the object cross our field of view. When it is us moving in the environment (or "object"), our eye constantly "holds" on some particular feature in the scene (smooth pursuit) or "jumps" between different features (saccadic motion).

From a geometric point of view there is no difference between the observer moving and the object moving, and the problem of estimating rigid motion from a sequence of projections is by now fairly well understood. In this paper we explore how the fixation constraint modifies the geometry of the problem, and whether it facilitates the task.

This problem has been partly addressed before in the literature on computational vision. In [6, 5], the fixation constraint is exploited for recovering the Focus of Expansion (FOE) and the time-to-collision using normal optical flow, and then computing the full ego-motion, including the portion due to the fixating motion. In [12], a pixel shift in the image is used to derive a constraint equation which is solved using static optimization in order to recover ego-motion parameters, similarly to what is done in [3, 10]. However, nowhere in the literature is the estimation of motion performed by imposing the fixation constraint directly compared with the estimation of a general rigid motion, due to the lack of a common framework. More seriously, most of the algorithms assume that perfect tracking of the fixation point has been performed, and it is not assessed how they degrade in the presence of inevitable tracking errors.

In this paper we study the motion estimation problem in the framework of dynamic epipolar geometry, and assess how such geometry is modified under the fixation assumption. Since dynamic motion estimation schemes have been proposed in the framework of epipolar geometry [11], we modify them in order to embed the fixation assumption. As a result, we can directly compare the estimates obtained by enforcing the fixation constraint with the estimates obtained by assuming general rigid motion. We also assess analytically how (small) perturbations of the fixation constraint affect the quality of the estimates, and we perform simulation experiments in order to probe the boundaries of validity of the fixation model.

1.1 Scenario

We will consider a system with a camera mounted on a two-degree-of-freedom actuated joint (the eye) standing on a platform which is moving freely (with 6 degrees of freedom) in the environment (the head), as in figure 1. The architecture of the overall system is composed of two parts: an inner control loop that actuates the eye so as to maintain a given feature in the center of the image plane, or to saccade to a different fixation point given by a higher-level decision system; an estimator then reconstructs the relative motion between the eye and the object which is due to the motion of the head within the environment. These estimates can then be used to elaborate control actions with different tasks, such as obstacle avoidance, "optimal" estimation of structure, target pursuing etc.

The overall functioning of the scheme can be summarized as follows (see figure 1):

1. Select features.

Figure 1: Overall setup of motion from fixation: an inner tracking loop controls the two degrees of freedom of the eye so as to maintain a given feature in the center of the image. The images are then fed into the motion estimation algorithm that recursively estimates the motion of the head within the environment. The estimates can possibly be fed back to the head in order to accomplish different control tasks such as navigation, inspection, docking etc. (outer dashed loop).

2. Select a target or fixation point. This could be the feature closest to the center of the image, or the best-conditioned feature, or the focus of expansion, or the singularity in the motion field, or any other location assigned by a higher-level system.

3. Control the gaze of the eye to the fixation point. Simple control strategies can be implemented, such as a one-step deadbeat, or control on the sphere with exponential convergence. The kinematics and geometry of the eye mechanism must be included in the model (it will be a change of coordinates in the state-space sphere); the dynamics can be neglected in a first approximation.

4. Fine-tune fixation by shifting the origin of the image plane.

5. Track features between successive time instants. This process (the correspondence problem) is greatly facilitated by two facts. First, since we fixate one point on the visible object, features move only slightly in the image, and always remain within the field of view. Second, knowledge of the motion of the camera from the actuators helps predict the position of the features at successive frames.

6. Go to 3 (inner, fast tracking loop).

7. Estimate the relative motion between the object and the viewer. Both velocity and absolute orientation can be estimated. Check the quality of tracking.

8. Possibly take control action on the head in order to achieve specified tasks (outer loop).

We will only briefly describe the realization of the inner control loop (the "tracking" or "fixation" loop), which consists of a control system defined on a two-sphere, with measurements in the real projective plane (section 1.2). This problem is well understood and extensive literature is available on the topic (see [4] and references therein). The rest of the paper assumes that tracking has been performed within some level of accuracy and analyzes the problem of estimating the remaining degrees of freedom. In section 2 we review the setup of epipolar geometry and show how it is modified by the fixation assumption. In section 3 we show how the epipolar representation can be used to formulate dynamic (recursive) estimators of motion. The fixation assumption modifies the parameter space, but not the structure of the estimator, which makes it possible to compare motion estimators embedding the fixation constraint with estimators of general rigid motions. We present both a closed-form solution from two views and a recursive solution based upon the epipolar representation. In section 5 we describe a model for estimating absolute attitude under the fixation constraint.

While it is evident that fixation reduces the number of degrees of freedom, so that the estimator following the tracking loop operates on a smaller-dimensional space and is hence more constrained, it is not trivial to assess how possible imprecisions in the tracking stage propagate onto the estimation stage. In section 4 we assess the sensitivity of the estimates with respect to the fixation constraint, and define a measure of "goodness of tracking" that can be computed during the estimation phase. In section 6 we substantiate our analysis with simulation experiments on noisy synthetic image sequences.

1.2 Fixation control

The task of the inner tracking loop is to keep a given point in the center of the image plane. Equivalently, we can enforce that a given direction (projection ray) in $\mathbb{R}^3$ coincides with the optical axis (see figure 2). In order to do so, we can act on two motors that drive the joint on top of which the camera is mounted. If we call $[\theta\ \phi]^T$ the angles at the joint, which describe the local coordinates of the state $s$ of the eye on the sphere, and $u_1$ and $u_2$ the torques applied to the motors, then the geometry, kinematics and dynamics of the eye can be described as a nonlinear dynamical system of the form

$$\dot{s} = f(s, u), \quad s \in S^2. \tag{1}$$

If we call $\mathbf{x}_0$ the spherical coordinates of the target point in the reference centered in the optical center of the camera, with the $Z$-axis along the optical axis, then the motion of the camera $s(t)$ induces a vector field of the form

$$\dot{\mathbf{x}}_0 = g(\mathbf{x}_0, s), \quad \mathbf{x}_0 \in S^2. \tag{2}$$

However, we cannot measure directly the spherical coordinates of the target point, since it is projected on a flat image plane, rather than on a spherical retina (figure 2). In fact, the actual measurement is a local diffeomorphism

$$\pi : S^2 \to \mathbb{RP}^2, \quad \mathbf{x}_0 \mapsto \mathbf{y}_0. \tag{3}$$

Our overall dynamic model can therefore be summarized as

$$\begin{cases} \dot{s} = f(s, u), & s \in S^2 \\ \dot{\mathbf{x}}_0 = g(\mathbf{x}_0, s), & \mathbf{x}_0 \in S^2 \\ \mathbf{y}_0 = \pi(\mathbf{x}_0) + n_0, & \mathbf{y}_0 \in \mathbb{RP}^2 \end{cases} \tag{4}$$

where $n_0$ is a noise term due to the uncertainty in the tracking procedure. The goal of the inner tracking module can then be expressed as follows: take the control action $u(t)$ such that $\mathbf{y}_0(t) \to [0\ 0\ 1] \in \mathbb{RP}^2$ exponentially as $t \to \infty$.

When we neglect the dynamics of the eye, and assume that we are able to act on the velocity of the joints through our actuators, we can simplify our model into one of the form

$$\begin{cases} \dot{\mathbf{x}}_0 = u, & \mathbf{x}_0 \in S^2 \\ \mathbf{y}_0 = h(\mathbf{x}_0) + n_0, & \mathbf{y}_0 \in \mathbb{RP}^2 \end{cases} \tag{5}$$

which we can write in local coordinates, provided that $\mathbf{y}_0$ is close enough to $h(\mathbf{x}_0)$, as

$$\begin{cases} \dot{\mathbf{x}}_0 = u, & \mathbf{x}_0 \in \mathbb{R}^2 \\ \mathbf{y}_0 = h(\mathbf{x}_0) + n_0, & \mathbf{y}_0 \in \mathbb{R}^2 \end{cases} \tag{6}$$

where $h$ comprises a change of coordinates on the sphere and the perspective projection. From the above expression it is immediate to formulate a proportional control law with exponential convergence to the target fixation point $\mathbf{y}_0$, either in the workspace,

$$u_w(\mathbf{x}, \mathbf{y}_0) = k_p \left( h^{-1}(\mathbf{y}_0) - \mathbf{x} \right), \tag{7}$$

or in the output space, represented for simplicity as the two-sphere,

$$u_o(\mathbf{x}, \mathbf{y}_0) = J_h(\mathbf{x})\, k_p\, v_G(\mathbf{x}, \mathbf{y}_0) \tag{8}$$

where $k_p$ is the proportional constant, $J_h$ is the Jacobian of $h$,

$$J_h(\mathbf{x}) := \frac{\partial h}{\partial \mathbf{x}}(\mathbf{x}), \tag{9}$$

and $v_G$ is the geodesic versor

$$v_G(\mathbf{x}, \mathbf{y}_0) = \frac{(h(\mathbf{x}) \wedge \mathbf{y}_0) \wedge h(\mathbf{x})}{d} \tag{10}$$

with $d = \arccos(\langle h(\mathbf{x}), \mathbf{y}_0 \rangle)$ the distance between the output and the target along the geodesic [4].

Exponential convergence is required as a means of counteracting noise. In fact, if the control is fast, it can reject disturbances at a rate faster than they arrive, which helps the system not to diverge in the presence of noise and disturbances. The above controls can easily be shown to generate exponential convergence to the desired goal [4].
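As a concrete illustration of the geodesic proportional law of eqs. (8)-(10), the following sketch (our own numerical illustration, not code from the paper; it takes $h$ to be the identity on the sphere and uses made-up values for $k_p$ and the sampling time) drives a gaze direction toward the target along the great circle and exhibits the exponential decay of the geodesic distance:

```python
import numpy as np

def geodesic_control_step(x, y0, kp, dt):
    """One Euler step of the proportional control along the geodesic on S^2.

    The control moves x toward the target y0 along the great circle joining
    them, with speed proportional to the geodesic distance d; the direction
    is the geodesic versor (x ^ y0) ^ x of eq. (10), with h = identity."""
    d = np.arccos(np.clip(np.dot(x, y0), -1.0, 1.0))   # geodesic distance
    if d < 1e-12:
        return x
    vg = np.cross(np.cross(x, y0), x)                  # tangent at x toward y0
    vg /= np.linalg.norm(vg)                           # unit geodesic versor
    step = kp * d * dt                                 # proportional law
    return np.cos(step) * x + np.sin(step) * vg        # exponential map on S^2

x = np.array([1.0, 0.0, 0.0])       # initial gaze direction
y0 = np.array([0.0, 0.0, 1.0])      # target fixation direction (optical axis)
kp, dt = 5.0, 0.01                  # illustrative gain and sampling time
dist = []
for _ in range(200):
    dist.append(np.arccos(np.clip(np.dot(x, y0), -1.0, 1.0)))
    x = geodesic_control_step(x, y0, kp, dt)
```

Since each step moves exactly along the great circle through the target, the distance contracts by the factor $(1 - k_p\,dt)$ per step, i.e. $d(t) \approx d(0)\,e^{-k_p t}$: the exponential convergence invoked above for rejecting tracking disturbances.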

1.3 Tracking and shift registration

The purpose of the eye motion control is to keep a prescribed feature at the origin of the image plane using the two degrees of freedom of the spherical joint of the eye. In principle, tracking of the target feature could be accomplished locally by shifting the origin of the image plane at each step, provided that the feature remains within the field of view (see figure 2). In general, a combination of the two techniques is to be employed. The eye is rotated in order to maintain the target feature as close as possible to the center of the image; then the image plane is shifted, with a purely "software" operation, in order to translate the origin of the image plane onto the target feature. Provided that the feature tracking scheme achieves sub-pixel accuracy [2], the shift-registration allows us to perform the tracking within one-pixel accuracy on the image plane.

Figure 2: Tracking amounts to controlling the camera so as to bring one specified feature point to the origin of the image plane. The same task can be accomplished locally by shifting the image plane, a purely software operation. The two operations are equivalent locally, to the extent that the target feature does not exit the field of view.

2 Epipolar geometry under fixation

In the present section we analyze the functioning of the second stage of the scheme depicted in figure 1, which consists of estimating the relative motion between the viewer and the object being fixated. Since one point of the object is still in the image plane, the object is free only to rotate about this point, and to translate along the fixation line. Therefore there are overall 4 degrees of freedom left from the fixation loop.

We start off by studying how the well-known setup of epipolar geometry is transformed under the fixation conditions.

Figure 3: Imaging geometry. The viewer reference is centered in the center of projection, with the Z-axis pointing along the optical axis. The object reference frame is centered in the fixation point. Under the fixation conditions the object can only rotate about the fixation point and translate along the fixation axis.

2.1 Notation

We call $\mathbf{X} = [X\ Y\ Z]^T \in \mathbb{R}^3$ the coordinates of a generic point $P$ with respect to an orthonormal reference frame centered in the center of projection, with $Z$ along the optical axis and $X, Y$ parallel to the image plane and arranged so as to form a right-handed frame (see figure 3). The relative attitude between the camera and the object (or scene) is described by a rigid motion $g \in SE(3)$:

$$\begin{cases} \mathbf{P}_i(t) = {}^{t}g_o \circ \mathbf{P}_i \\ \mathbf{P}_i(t+1) = {}^{t+1}g_o \circ \mathbf{P}_i \end{cases} \ \Rightarrow\ \mathbf{P}(t+1) = {}^{t+1}g_o\, {}^{t}g_o^{-1}\, \mathbf{P}(t) \tag{11}$$

where $g_o \in SE(3)$ is the change of coordinates between the viewer reference frame at time $t$ and the object coordinate frame centered in the fixation point $\mathbf{P}_0(t) = [0\ 0\ d(t)]^T$. Since we are interested in the displacement relative to the moving frame (ego-motion), we can assume that the object reference is aligned with the viewer reference at time $t$, so that we can write the relative orientation between time $t$ and $t+1$ in coordinates as

$$\mathbf{X}_i(t+1) = R(t)\left(\mathbf{X}_i(t) - \begin{bmatrix} 0 \\ 0 \\ d(t) \end{bmatrix}\right) + \begin{bmatrix} 0 \\ 0 \\ d(t+1) \end{bmatrix} \tag{12}$$

which we will write as

$$\mathbf{X}_i(t+1) = R(t)\,\mathbf{X}_i(t) + d(t)\, T(R, v) \tag{13}$$

where

$$T(R, v) := \begin{bmatrix} -R_{13} \\ -R_{23} \\ -R_{33} + v \end{bmatrix} \tag{14}$$

and

$$v := \frac{d(t+1)}{d(t)} \neq 0 \tag{15}$$

is the relative velocity along the fixation axis. The matrix $R \in SO(3)$ is an orthonormal rotation matrix that describes the change of coordinates between the viewer's reference at time $t$ and that at time $t+1$ relative to the object. $T \in \mathbb{R}^3$ describes the translation of the origin of the viewer's reference frame.

What we are able to measure is the perspective projection of the point features onto the image plane, which for simplicity we represent as the real projective plane. The projection map associates to each $\mathbf{p} \neq 0$ its projective coordinates as an element of $\mathbb{RP}^2$:

$$\pi : \mathbb{R}^3 \setminus \{0\} \to \mathbb{RP}^2, \quad \mathbf{X} \mapsto \mathbf{x} := \begin{bmatrix} \dfrac{X}{Z} & \dfrac{Y}{Z} & 1 \end{bmatrix}^T. \tag{16}$$

We usually measure $\mathbf{x}$ up to some error $\mathbf{n}$, which is well modeled as a white, zero-mean, normally distributed process with covariance $R_n$:

$$\mathbf{y} = \mathbf{x} + \mathbf{n}, \quad \mathbf{n} \in \mathcal{N}(0, R_n).$$

Due to the fixation constraint, the camera is only allowed to translate along the fixation axis, rotate about the fixation axis (cyclorotation), and move on a sphere centered in the fixation point with radius equal to the distance from the fixation point to the optical center. Therefore there are 4 degrees of freedom in the velocity. These can also easily be seen from the object reference frame: the object reference is free to rotate about the fixation point (3 degrees of freedom) but can only translate along the fixation axis (1 degree of freedom). In eq. (13), these 4 degrees of freedom are encoded into $R(t)$ (3 DOF) and $v(t)$ (1 DOF). Note, however, that the distance from the fixation point $d(t)$ also enters the model. The epipolar constraint, which will be derived in the next subsection, involves only relative orientation and measured projections, while it gets rid of the 3-D structure and of the absolute distance $d$.
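The decomposition (13)-(15) can be checked numerically. The sketch below (our own verification of the algebra, using arbitrary made-up motion values) builds a rotation via Rodrigues' formula and confirms that the fixating change of coordinates (12) coincides with $R\mathbf{X} + d\,T(R, v)$:

```python
import numpy as np

def skew(w):
    """The 'hat' operator: skew(w) @ x == np.cross(w, x)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def rodrigues(omega):
    """Rotation matrix exp(omega^) via Rodrigues' formula."""
    th = np.linalg.norm(omega)
    if th < 1e-12:
        return np.eye(3)
    K = skew(omega / th)
    return np.eye(3) + np.sin(th) * K + (1.0 - np.cos(th)) * (K @ K)

rng = np.random.default_rng(0)
R = rodrigues(rng.normal(size=3))          # arbitrary relative rotation (3 DOF)
d, d1 = 2.0, 2.5                           # distances to the fixation point
v = d1 / d                                 # relative velocity along fixation axis
e3 = np.array([0.0, 0.0, 1.0])

X = rng.normal(size=3) + np.array([0.0, 0.0, 5.0])   # a generic scene point

# eq. (12): change of coordinates about the fixation point P0 = [0 0 d]^T
X1_direct = R @ (X - d * e3) + d1 * e3

# eqs. (13)-(14): X(t+1) = R X(t) + d T(R, v), with T = -R[:,2] + v e3
T = -R[:, 2] + v * e3
X1_model = R @ X + d * T

print(np.allclose(X1_direct, X1_model))    # True
```

The agreement is exact: expanding $d\,T = -d\,R e_3 + d(t+1)\,e_3$ recovers eq. (12) term by term.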

Figure 4: Coplanarity constraint: the coordinates of each point in the reference of the viewer at time t, the coordinates of the same point at time t+1, and the translation vector are coplanar.

2.2 Coplanarity constraint

The well-known coplanarity constraint (or "epipolar constraint", or "essential constraint") of Longuet-Higgins [8] imposes that the vectors $T(R(t), v(t))$, $\mathbf{X}_i(t+1)$ and $\mathbf{X}_i(t)$ be coplanar for all $t$ and for all points $P_i$ (figure 4). The triple product of the above vectors is therefore zero; if we pre-multiply both sides of (13) by $\lambda\,\mathbf{X}_i(t+1)^T (T\wedge)$, where $\lambda \in \mathbb{R} \setminus \{0\}$, we get

$$0 = \lambda\, \mathbf{X}_i(t+1)^T (T\wedge) R(t)\, \mathbf{X}_i(t) \tag{17}$$

which we will write as

$$\mathbf{X}_i(t+1)^T Q(t)\, \mathbf{X}_i(t) = 0 \tag{18}$$

with

$$Q(t) := Q(R(t), v(t)) = \lambda\,(T(R(t), v(t))\wedge)\, R(t). \tag{19}$$

We will use the notation $Q(t)$ when emphasizing the time dependence, and $Q(R, v)$ when stressing the dependence of $Q$ on the 3 rotation parameters contained in $R$ and on the relative velocity along the fixation axis $v$. Note that $Q$ is an element of a 4-dimensional differentiable manifold which is embedded in $\mathbb{R}^9$, since $Q$ is realized as a $3 \times 3$ matrix.

Since the coordinates of each point $\mathbf{X}_i(t)$ and their projective coordinates $\mathbf{x}_i(t)$ span the same direction in $\mathbb{R}^3$, the constraint (18) holds for $\mathbf{x}_i$ in place of $\mathbf{X}_i$ (just divide eq. (18) by $X_{i3}(t+1)\, X_{i3}(t)$):

$$\mathbf{x}_i(t+1)^T Q(t)\, \mathbf{x}_i(t) = 0 \quad \forall t,\ \forall i. \tag{20}$$
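The constraint (20) can be verified numerically. The sketch below (our own illustrative check, with arbitrary synthetic motion values) builds $Q = (T\wedge)R$ for a fixation motion, moves random points rigidly by eq. (13), and confirms that every projected correspondence annihilates $Q$, with no knowledge of depth or of the absolute distance $d$:

```python
import numpy as np

def skew(w):
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def rodrigues(omega):
    th = np.linalg.norm(omega)
    if th < 1e-12:
        return np.eye(3)
    K = skew(omega / th)
    return np.eye(3) + np.sin(th) * K + (1.0 - np.cos(th)) * (K @ K)

rng = np.random.default_rng(1)
R = rodrigues(0.2 * rng.normal(size=3))    # fixation motion: R (3 DOF) ...
v, d = 1.1, 3.0                            # ... and v (1 DOF); d drops out
e3 = np.array([0.0, 0.0, 1.0])
T = -R[:, 2] + v * e3                      # eq. (14)
Q = skew(T) @ R                            # eq. (19), lambda = 1

# random 3-D points in front of the camera, moved rigidly by eq. (13)
X0 = rng.normal(size=(20, 3)) + np.array([0.0, 0.0, 6.0])
X1 = X0 @ R.T + d * T                      # X1[i] = R X0[i] + d T

x0 = X0 / X0[:, 2:3]                       # projective coordinates, eq. (16)
x1 = X1 / X1[:, 2:3]

residuals = np.einsum('ni,ij,nj->n', x1, Q, x0)   # eq. (20) for every point
print(np.max(np.abs(residuals)))                  # numerically zero
```

Note also that the bottom-right entry of this $Q$ vanishes, the consistency check on the quality of fixation derived in section 2.3 below.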

2.3 Structure of the essential manifold

For a generic $T \in \mathbb{R}^3$ and a rotation matrix $R$, the matrix $Q = (T\wedge)R$ belongs to the so-called "essential manifold"

$$E := \{ SR \mid S \in so(3),\ R \in SO(3) \}, \tag{21}$$

which can be characterized as the tangent bundle of the rotation group, $TSO(3)$ [11]. Under the fixation constraint, $T$ has a special structure which restricts $Q$ to a submanifold of the essential manifold. In this section we study the geometry of the submanifold induced by the fixation constraint. We have already seen that the dimension of the space reduces from 6 down to 4, since two degrees of freedom are used in order to keep the projection of the fixation point still in the image plane.

After some simple algebra, it is easy to see that

$$Q(R, v) = \lambda\left( R S^T + v\, S R \right) \tag{22}$$

where

$$S := \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \tag{23}$$

and $\lambda$ is an unknown scaling factor due to the homogeneous nature of the coplanarity constraint. If we restrict the essential matrices $Q \in E$ to have unit norm (as in the definition of the "normalized essential manifold" [11]), then $\lambda$ is fixed to be $\lambda = \frac{1}{\|Q\|}$. Note that this arbitrary scaling affects neither the relative velocity $v$ (which is already a scaled parameter) nor the rotation matrix $R$. We will see in section 2.4 that $\lambda = \frac{1}{\|Q\|}$ is a necessary choice in order to avoid singularities in the representation. Under the fixation constraint, both the essential matrix $Q$ and its normalized version $\frac{Q}{\|Q\|}$ belong to a four-dimensional submanifold of the essential manifold $E$. The essential matrix is therefore defined, under the fixation constraint, by the Sylvester equation (22), with strongly structured unknowns $R \in SO(3)$ and $v \in \mathbb{R}$. Other equivalent expressions can be derived as follows, assuming $\lambda = 1$:

$$Q = \left( R S^T R^T + v S \right) R \tag{24}$$

$$Q = \left( \left( -R \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} + v \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right) \wedge \right) R \tag{25}$$

$$Q = \begin{bmatrix} -R_{:2} \mid R_{:1} \mid 0 \end{bmatrix} + v \begin{bmatrix} -R_{2:} \\ R_{1:} \\ 0 \end{bmatrix} \tag{26}$$

$$Q = \begin{bmatrix} -R_{12} - vR_{21} & R_{11} - vR_{22} & -vR_{23} \\ -R_{22} + vR_{11} & R_{21} + vR_{12} & vR_{13} \\ -R_{32} & R_{31} & 0 \end{bmatrix}. \tag{27}$$

Another useful way of writing the epipolar constraint can be derived as follows. Since the constraints (20) are linear in the components of the essential matrix $Q$, we can reorder them

as

$$\chi(t)\, Q = 0 \tag{28}$$

where $\chi(t)$ is an $N \times 9$ matrix which depends on the measurements $\mathbf{x}_i(t), \mathbf{x}_i(t+1)$, whose generic row can be written as

$$\chi_{i:} = \begin{bmatrix} x_{i1}(t+1)x_{i1}(t) & x_{i1}(t+1)x_{i2}(t) & x_{i1}(t+1) & x_{i2}(t+1)x_{i1}(t) & x_{i2}(t+1)x_{i2}(t) & x_{i2}(t+1) & x_{i1}(t) & x_{i2}(t) & 1 \end{bmatrix} \tag{29}$$

and $Q$ is now interpreted as a 9-dimensional column vector obtained by stacking the rows of $Q$ one on top of the other. It is easy to verify that the above can be written as follows:

$$\chi(t)\, \mathcal{S}(v)\, R = 0 \tag{30}$$

where

$$\mathcal{S}(v) := \begin{bmatrix} S & -vI & 0 \\ vI & S & 0 \\ 0 & 0 & S \end{bmatrix} \tag{31}$$

is a skew-symmetric $9 \times 9$ matrix with rank 8 which depends only upon the translational velocity $v$. Here $I$ is the 3-dimensional identity matrix and $R$ is the usual rotation matrix, now interpreted as a nine-dimensional column vector obtained by stacking the rows of $R$ on top of each other. We will not make a distinction between $3 \times 3$ matrices and 9-dimensional column vectors whenever it is clear from the context which representation is employed. Since both the last row and the last column of $\mathcal{S}$ are identically zero, we can delete them, along with the last column of $\chi$ and the last element of $R$, which is then interpreted as an 8-dimensional column vector.

From the above characterizations of the essential matrix constrained by the fixation hypothesis it is possible to draw some interesting conclusions. In particular, left-multiplying (22) by $[0\ 0\ 1]$ annihilates the second (rightmost) term of the right-hand side, while the column vector $[0\ 0\ 1]^T$ annihilates the leftmost term if right-multiplied. From this simple observation we can derive a necessary condition which acts as a consistency check for the quality of fixation:

$$Q_{33} = 0. \tag{32}$$

In general, from a number of point matches, we can derive an approximate estimate of the matrix $\frac{Q}{\|Q\|}$ which, due to noise, will be such that $Q_{33} \neq 0$; later, in section 4, we will see how $|Q_{33}|$ gives a measure of how accurate the inner tracking loop is.

2.4 Singularities and normalization of the epipolar representation

In the characterizations of the essential matrices described in the previous section, the unknown scaling factor has been taken into account by fixing the scalar $\lambda = 1$, so that the matrix $Q$ is uniquely defined. However, there is a continuum of possible motions which correspond to the essential matrix

$$Q(v, \Omega) = 0; \tag{33}$$

Figure 5: Epipolar setup. Under the fixation constraint, both the centers of projection at times t and t+1 and the optical centers of the two cameras lie on the same plane, the epipolar plane. The intersection of the epipolar plane with the image planes is the epipolar line. The epipolar plane is invariant under fixation, for the camera can only translate along the plane, and rotate about a direction orthogonal to it.

in particular,

$$v = 1,\quad \Omega = \begin{bmatrix} 0 \\ 0 \\ \alpha \end{bmatrix},\ \alpha \in [0, \pi) \ \Rightarrow\ Q(v, \Omega) = 0 \tag{34}$$

since $Q = (T\wedge)\, e^{\Omega\wedge}$ with $T = 0$; therefore all motions consisting of pure cyclorotation (rotation about the optical axis, or fixation axis) generate a zero essential matrix, or an undefined normalized essential matrix.

If we know that motion occurs only about the optical axis, we can easily estimate the amount of rotation by solving in a least-squares sense the rigid motion equations (12), which reduce, in the case of pure cyclorotation, to

$$\mathbf{x}_i(t+1) = e^{\alpha\,[0\ 0\ 1]^T\wedge}\, \mathbf{x}_i(t). \tag{35}$$

In order to get rid of the singularity just mentioned, we need to normalize the essential matrices. Since the epipolar constraint is defined up to a scale, it can be arbitrarily multiplied by a constant. In particular, if we multiply it by $\frac{1}{\|Q\|}$ we get rid of the singularity, since the translation vector $T$ is then constrained to be of unit norm. Note that we do not lose any degree of freedom in the representation, for the scaling does not affect the motion parameters $v, \Omega$.

In section 3.3 we will see that this representation affects the convergence of the filter for estimating motion even away from the singular configuration. When the object purely rotates about the optical axis, the translation vector is undefined; we will see in section 3.3 how it is possible to sort out this situation.

3 Estimation from the epipolar constraint

The epipolar constraint, with the addition of the fixation assumption, can be used to estimate the 4 free parameters (three for rotation and one for relative translation along the fixation axis). The first solution we propose is a closed-form solution which is correct in the absence of noise, but is far from efficient in the presence of uncertainty, since the structure of the epipolar constraint is not imposed in the estimation.

The second solution is a more correct one, for it enforces the structure of the epipolar constraint during the estimation. It consists of a dynamic estimator in the local coordinates of the essential manifold. The constraints are enforced by construction and the structure of the parameter manifold is exploited, while the computation is carried out by an Implicit Extended Kalman Filter (IEKF) along the lines of [11].

3.1 Closed-form, two-frame solutions

Consider $N$ visible points $P_i$, $i = 1 \ldots N$, and the $N$ corresponding scalar constraints (20). The constraints are linear in the components of $Q$, and can be used for estimating a generic $3 \times 3$ matrix $Q$ which is least-squares compatible with the measurements, in the same way as [8, 13, 11].

Once the matrix $Q$ has been estimated, we can derive a set of constraints for the components of the rotation matrix $R$. For the sake of simplicity, assume that we represent the rotation matrix locally using Euler angles $\phi \neq 0$, $\theta \neq 0$ and $\psi \neq 0$:

$$R = R_Z(\phi) R_Y(\theta) R_Z(\psi) = \begin{bmatrix} c_\phi c_\theta c_\psi - s_\phi s_\psi & -c_\phi c_\theta s_\psi - s_\phi c_\psi & c_\phi s_\theta \\ s_\phi c_\theta c_\psi + c_\phi s_\psi & -s_\phi c_\theta s_\psi + c_\phi c_\psi & s_\phi s_\theta \\ -s_\theta c_\psi & s_\theta s_\psi & c_\theta \end{bmatrix} \tag{36}$$

where $R_Z(\phi)$ indicates a rotation about the $Z$-axis of $\phi$ radians,

$$R_Z(\phi) = \begin{bmatrix} c_\phi & -s_\phi & 0 \\ s_\phi & c_\phi & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{37}$$

and similarly for $R_Y(\theta)$ and $R_Z(\psi)$. From the above expression of $R$, and the expression for $Q$ given in eq. (27), it is immediate to solve for the Euler angles:

$$\phi = \arctan\left( \frac{-Q_{13}}{Q_{23}} \right) \tag{38}$$

$$\theta = \arcsin \sqrt{Q_{31}^2 + Q_{32}^2} \tag{39}$$

$$\psi = \arctan\left( \frac{Q_{31}}{Q_{32}} \right) \tag{40}$$

provided that $Q_{23} \neq 0$ and $Q_{32} \neq 0$. It is immediate to see that $Q_{23} = Q_{32} = 0$ only if rotation occurs only about the optical axis, with an angle $\alpha = \phi + \psi$. In such a case, equation (27) becomes

$$Q = \begin{bmatrix} s_\alpha(1 - v) & c_\alpha(1 - v) & 0 \\ -c_\alpha(1 - v) & s_\alpha(1 - v) & 0 \\ 0 & 0 & 0 \end{bmatrix} \tag{42}$$

and we can solve for

$$\alpha = \phi + \psi = \arctan\left( \frac{Q_{22}}{Q_{12}} \right) \tag{43}$$

provided that $Q_{12} \neq 0$; otherwise we have $\phi = \theta = \psi = 0$. Once the rotation parameters have been estimated, the translation parameter $v$ can be recovered from the other elements of $Q$. For instance, when $\theta = 0$,

$$v = 1 - \sqrt{Q_{11}^2 + Q_{21}^2}. \tag{44}$$

Alternatively, one may start with a different local coordinate parametrization of $R$, for example the exponential coordinatization

$$R = e^{\Omega \wedge} \tag{45}$$

and plug the result into equation (22), which can then be solved for the three unknowns $\Omega_1 \ldots \Omega_3$ using an iterative optimization method such as gradient descent.

It must be stressed that these methods do not enforce the structure of the parameter space during the estimation process. Rather, generic, non-structured parameters are estimated, and their structure is imposed a posteriori in order to recover an approximation of the desired estimates.

The epipolar constraints can also be used for formulating nonlinear filters that estimate the motion components over time, while taking into account the geometry of the parameter space. This is done in the next section.

3.2 Implicit dynamical filter for motion from fixation

Consider the local parametrization of the essential matrix $Q(R, v)$,

$$\xi := \begin{bmatrix} v \\ \Omega \end{bmatrix} \in \mathbb{R}^4 \tag{46}$$

where $\Omega \in \mathbb{R}^3$ is defined for $\|\Omega\| \in [0, \pi)$ by the equation [9]

$$e^{\Omega \wedge} := R. \tag{47}$$
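The exponential coordinatization of eqs. (45) and (47) and its inverse can be sketched as follows (standard Rodrigues formula and matrix logarithm, our own helper functions rather than code from the paper, stated for $\|\Omega\| \in [0, \pi)$):

```python
import numpy as np

def exp_so3(omega):
    """R = e^(omega^), eq. (45), via Rodrigues' formula."""
    th = np.linalg.norm(omega)
    if th < 1e-12:
        return np.eye(3)
    k = omega / th
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(th) * K + (1.0 - np.cos(th)) * (K @ K)

def log_so3(R):
    """Inverse of eq. (47): Omega with ||Omega|| in [0, pi) such that
    e^(Omega^) = R, read off from the skew part R - R^T = 2 sin(th) (k^)."""
    th = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if th < 1e-12:
        return np.zeros(3)
    w = np.array([R[2, 1] - R[1, 2],
                  R[0, 2] - R[2, 0],
                  R[1, 0] - R[0, 1]])
    return th / (2.0 * np.sin(th)) * w

omega = np.array([0.1, -0.3, 0.2])
R = exp_so3(omega)
print(np.allclose(log_so3(R), omega))   # True: round trip within [0, pi)
```

The chart is valid away from $\|\Omega\| = \pi$, which is consistent with the restriction $\|\Omega\| \in [0, \pi)$ imposed in eq. (47).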

We can write a dynamic model in the local coordinates of the essential manifold, having as implicit measurement constraints the epipolar constraint (20), where the matrix $Q$ is now expressed as a function of the local coordinates, $Q(\xi)$:

$$\begin{cases} \mathbf{x}_i(t+1)^T Q(\xi(t))\, \mathbf{x}_i(t) = 0, & \xi \in \mathbb{R}^4 \\ \mathbf{y}_i(t) = \mathbf{x}_i(t) + \mathbf{n}_i(t), & \forall i = 1 \ldots N. \end{cases} \tag{48}$$

Estimating motion amounts to identifying the parameters $\xi$ from the above model. This can be done using the local identification procedure presented in [11], which is the IEKF based upon the model

$$\begin{cases} \xi(t+1) = \xi(t) + n_\xi(t) \\ \mathbf{y}_i(t+1)^T Q(\xi(t))\, \mathbf{y}_i(t) = \tilde{n}_i(t), & \forall i = 1 \ldots N \end{cases} \tag{49}$$

where the second-order statistic of the residual $\tilde{n}$ is computed according to [11]. An alternative way of writing the above model is

$$\begin{cases} \xi(t+1) = \xi(t) + n_\xi(t) \\ \chi(t)\, \mathcal{S}(\xi_1)\, R(\xi_2, \xi_3, \xi_4) = 0. \end{cases} \tag{50}$$

The equations of the estimator, as derived from [11], are as follows.

Prediction step:

$$\hat{\xi}(t+1|t) = \hat{\xi}(t|t), \quad \hat{\xi}(0|0) = \xi_0 \tag{51}$$

$$P(t+1|t) = P(t|t) + Q_\xi \tag{52}$$

where $Q_\xi$ is the variance of the noise $n_\xi$ driving the random walk model, which is intended as a tuning parameter, and $P$ is the variance of the estimation error of the filter.

Update step:

$$\hat{\xi}(t+1|t+1) = \hat{\xi}(t+1|t) + L(t+1) \begin{bmatrix} \vdots \\ \mathbf{y}_i(t+1)^T Q(\hat{\xi}(t+1|t))\, \mathbf{y}_i(t) \\ \vdots \end{bmatrix} \tag{53}$$

$$P(t+1|t+1) = \Gamma(t+1) P(t+1|t) \Gamma(t+1)^T + L(t+1) R_n L^T(t+1) \tag{54}$$

where $L(t+1)$ is the Extended Kalman gain [7], and $\Gamma = I - LC$, with $C := \frac{\partial Q}{\partial \xi}(\hat{\xi}(t+1|t))$.

3.3 Dealing with singularities in the representation

In section 2.4 we pointed out a singularity in the non-normalized epipolar representation when the relative motion between the scene and the object consists of pure rotation about the optical axis. This phenomenon is to be expected, for pure rotation about the optical axis generates zero ego-motion translation:

$$T = -R_{:3} + \begin{bmatrix} 0 \\ 0 \\ v \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \tag{55}$$


which is a singular configuration for the motion estimation problem [11]. As long as there is a non-zero translation (that is, as long as there is some component of rotation about an axis not corresponding to the optical axis), the constraints are well defined. However, serious problems may occur while estimating motion even when the motion parameters are far away from the singular point.

In order to visualize this, we can imagine the innovation of the filter as living on a residual surface that maps each particular motion (v, Ω) onto ℝᴺ when N feature points are visible. The filter will try to update the state (v, Ω) so as to reach the minimum of the residual. Of course the motion that generated the data corresponds to a minimum of the residual surface (it would be zero in the absence of noise). However, the location v = 1, Ω = [0 0 α]ᵀ also corresponds to a zero of the residual, which is a hole in the residual surface. Therefore the filter must be able to reach the minimum without falling into the singularity (see figure 6).

This can be done provided that the initial conditions are close to the minimum of the residual surface corresponding to the true motion. However, in the presence of high measurement noise levels, the residual surface becomes increasingly irregular, and eventually the filter falls into the singularity. This effect will be illustrated in the experimental section, where we will show that in the presence of high noise levels, the filter initialized far enough from the true value of the state falls into the singularity, the innovation goes to zero and the variance of the state increases.
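The "hole" described above can be checked numerically. A minimal sketch, assuming the translation model T = -R e3 + [0 0 v]ᵀ of equation (55) and the non-normalized essential matrix Q = T̂R (helper names are illustrative, not the paper's):

```python
import numpy as np

def hat(w):
    # Skew-symmetric matrix of w, so that hat(w) @ x == np.cross(w, x).
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def Rz(a):
    # Rotation about the optical axis (cyclorotation) by angle a.
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0, 0.0, 1.0]])

def Q_fixation(v, R):
    # Non-normalized essential matrix under fixation: the induced
    # translation is T = -R e3 + [0, 0, v]^T, as in equation (55).
    T = -R[:, 2] + np.array([0.0, 0.0, v])
    return hat(T) @ R

# At v = 1 with pure cyclorotation, T = 0 and Q vanishes identically, so
# the epipolar residual x'^T Q x is zero for ANY image data: the hole.
Q_hole = Q_fixation(1.0, Rz(0.7))
Q_generic = Q_fixation(0.5, Rz(0.7))
```

`Q_hole` is the zero matrix, so every feature satisfies the constraint there regardless of the data, whereas `Q_generic` does not vanish.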


Figure 6: Singularity in the non-normalized epipolar representation. The residual surface, where the innovation of the filter takes values, has a minimum corresponding to the true motion, but also a minimum corresponding to cyclorotation. The filter must be able to converge to the true minimum without falling into the singularity. The normalized epipolar representation is a way of getting rid of the singularity, for the translation vector is constrained to have unit norm.
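Stepping back to the recursion (51)-(54): one prediction-update cycle of the implicit filter can be sketched as follows. Here `residual` and `jac` are hypothetical stand-ins for the stacked epipolar innovation and its Jacobian C; this is an illustrative sketch, not the authors' implementation:

```python
import numpy as np

def iekf_step(xi, P, residual, jac, Qn, Rn):
    """One step of the recursive estimator of equations (51)-(54)."""
    # Prediction step (51)-(52): the state model is a random walk.
    xi_pred = xi.copy()
    P_pred = P + Qn
    # Update step (53)-(54), on the implicit measurement
    # "0 = residual(xi) + noise": the innovation is -residual(xi_pred).
    e = residual(xi_pred)
    C = jac(xi_pred)                         # N x 4 linearization
    S = C @ P_pred @ C.T + Rn                # innovation covariance
    L = P_pred @ C.T @ np.linalg.inv(S)      # extended Kalman gain
    xi_new = xi_pred - L @ e
    Gamma = np.eye(len(xi)) - L @ C
    P_new = Gamma @ P_pred @ Gamma.T + L @ Rn @ L.T
    return xi_new, P_new
```

On a well-conditioned residual (away from the singularity) the recursion contracts the state toward the zero of the residual while the variance P settles at a level set by the tuning parameters Qn and Rn.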


One way of getting rid of this singularity is to use the normalized essential matrix, which corresponds to dividing the epipolar constraint by the norm of the translation. This eliminates the singularity, since T is constrained to have unit norm. However, the motion corresponding to pure cyclorotation gives an essential matrix which is undefined, and therefore the filter will give arbitrary estimates.

In order to sort out the case of pure rotation about the optical axis, we can first try to fit the data into the purely cyclo-rotational model

    x(t+1) = R_Z(θ) x(t).   (56)

If the residual is big enough, it means that the rotation is not purely about the optical axis. Therefore the translation induced in the viewer's reference is non-zero, and the normalized epipolar constraint is well defined. We will see in the experimental section how the filter based upon the normalized epipolar representation performs where the non-normalized filter would fall into the singularity.

4 Vergence control, quality of fixation and sensitivity of constraints

One may argue that, in the proposed architecture, the estimation scheme that follows the fixation loop is "blind", in the sense that it cannot reject disturbances due to imperfect tracking. In the present section we analyze how the estimation algorithm is modified in the presence of non-perfect tracking, and how it can assess the quality of fixation.

We will consider two different kinds of non-perfect tracking: one in which the two optical axes (at time t and t+1) intersect at a point which is not the desired fixation point, and one in which the two optical axes do not intersect at all.

4.1 Vergence control

Let us assume that the optical axis of the camera at time t intersects the optical axis at time t+1 in a "vergence point" which is different from the desired fixation point (see figure 5). Consider the plane determined by the two centers of projection and the fixation point in the camera at time t, which is called the epipolar plane at time t.

If the optical axes intersect, there must exist one point on the projection of the optical axis of the camera at time t which passes through the optical center of the camera at time t+1. Equivalently, the optical center at time t+1 must belong to the epipolar plane. It is immediate to see that this can happen only if the direction of rotation is orthogonal to the direction of translation, which is constrained to belong to the epipolar plane (see fig. 5). In brief, the epipolar plane is invariant under the vergence conditions.

Therefore, under the vergence conditions, we can identify one point P0 at the intersection of the optical axes for which the fixation constraint is satisfied, although it is not the desired fixation point. From Chasles' theorem [9] we can conclude that the algorithm proposed in the previous section estimates the motion of the object relative to the point P0, rather than relative to the desired fixation point. If the mismatch between the target point and the


actual vergence point is along the epipolar line, then the mismatch along the optical axis is approximately proportional to d, the distance between the optical center and the target fixation point.

A natural question to ask at this point is how the algorithm following the fixation loop can verify whether the vergence conditions are satisfied and, if they are not, send a feedback signal to the fixation loop.

4.2 Vergence conditions, quality of fixation

When the optical axes do not intersect, the epipolar constraint is not satisfied for the optical center. The vergence constraint between two time instants can be expressed by saying that the two optical axes intersect:

    ∃ X0 such that x0(t) = [0 0 1]ᵀ ⇒ x0(t+1) = [0 0 1]ᵀ.

It is immediate to verify that the above condition holds if and only if the direction of translation is orthogonal to the direction of rotation. Indeed, a more synthetic condition can be derived by observing that

    the optical axes intersect ⟺ Q33 = 0.

In fact, if the optical axes intersect, the optical center x0 must clearly satisfy the epipolar constraint:

    x0(t+1)ᵀ Q x0(t) = 0 ⇒ Q33 = 0.   (57)

Vice versa, assume that the optical axes do not intersect, i.e. that for all x0 the condition x0(t) = [0 0 1]ᵀ implies x0(t+1) ≠ [0 0 1]ᵀ, while Q33 = 0. Write x0(t+1) as [α β 1]ᵀ with (α, β) ≠ (0, 0). Then the epipolar constraint must be violated for all correspondences of the form x0(t) = [0 0 1]ᵀ:

    [α β 1] Q [0 0 1]ᵀ ≠ 0 ⇒ α Q13 + β Q23 + Q33 ≠ 0.   (58)

If Q13 = Q23 = 0, then we conclude that Q33 ≠ 0, whence the contradiction. If at least one of Q13, Q23 is non-zero, by choosing α = Q23, β = -Q13 we conclude again that Q33 ≠ 0, which contradicts the hypotheses.

Therefore, when the vergence conditions are not satisfied and the optical axes do not intersect, the scalar |Q33| is a measure of the quality of vergence. From a geometrical point of view, Q33 is the volume of the parallelepiped with sides equal to the translation vector, the optical axis of the camera at time t and the one at time t+1.

Since at each step we can estimate the matrix Q from all the visible points, we could use Q33 as a sensory signal to be fed back to the fixation loop. This would allow us to design a vergence control that exploits all the visible features, rather than the projection of the fixation point alone. This issue is not explored in the present paper and is an object of future research.
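The test above can be sketched in a few lines, assuming as before an essential matrix of the form Q = T̂R; the numerical setup below is a hypothetical example, not from the paper:

```python
import numpy as np

def hat(w):
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def Rx(a):
    # Rotation about the camera X axis; both optical axes then lie in x = 0.
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, np.cos(a), -np.sin(a)],
                     [0.0, np.sin(a),  np.cos(a)]])

def vergence_quality(T, R):
    # |Q33| for Q = hat(T) @ R: the volume spanned by the translation and
    # the two optical axes. It is zero iff the optical axes intersect.
    Q = hat(T) @ R
    return abs(Q[2, 2])

# Translation inside the plane of the two optical axes: the axes intersect.
q_good = vergence_quality(np.array([0.0, 0.1, -0.05]), Rx(0.3))
# Translation out of that plane: the axes are skew and |Q33| > 0.
q_bad = vergence_quality(np.array([0.1, 0.0, 0.0]), Rx(0.3))
```

Since Q can be estimated from all visible features at each step, `vergence_quality` is the kind of scalar that could be fed back to the fixation loop as suggested above.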


4.3 Sensitivity and degradation of the constraint

In the previous sections we have treated the problem of motion estimation as an identification task where the class of models was determined by the epipolar constraint under the fixation assumption. We now want to ask ourselves: suppose the actual process generating the data does not exactly fall within the given class of models; how do small deviations from the class affect the quality of the estimates?

More specifically, suppose that our camera is not tracking the fixation point exactly. The measurements we get from the image plane then do not satisfy the epipolar constraint of eq. (22) for any choice of the parameters. However, if the deviation from the constraints is small, we would like our estimates to deviate little from the true motion parameters.

Suppose that our measurements are generated by an object which rotates about the fixation point with Ω, translates along the fixation axis by v, and also drifts away from the fixation point with some velocities ε1 and ε2 along X and Y respectively. Therefore the model generating the data looks like

    X_i(t+1; ε) = R(Ω) ( X_i(t) - [ 0 ; 0 ; d(t) ] ) + [ ε1 ; ε2 ; d(t+1) ]   (59)

where we measure

    x_i(t; ε) = π(X_i(t; ε))   (60)

which we collect into the matrix

    χ(t; ε)   (61)

as in equation (29). For ε = 0 the epipolar constraint is satisfied by the actual motion parameters v, Ω:

    χ(t; 0) S(v) R(Ω) = 0   (62)

where S and R are a 9×9 matrix and a 9-vector defined as in (30). However, in the presence of disturbances ε ≠ 0, there is no element in the class of models that satisfies the constraints, i.e.

    ∀ ε ≠ 0,   χ(t; ε) S(ṽ) R(Ω̃) ≠ 0   ∀ ṽ ∈ ℝ, ∀ Ω̃ ∈ ℝ³.   (63)

At this point, assuming ε small, we may seek the perturbations ṽ = v - δv and Ω̃ = Ω - δΩ that make the above residual zero up to second-order terms:

    δv, δΩ = arg min ‖ χ(t; ε) S(v - δv) R(Ω - δΩ) ‖.   (64)

This is essentially the task of the recursive filter described in the previous sections, where the process to be minimized is the innovation.
Expanding around the zero-perturbation conditions, we have

    χ(t; ε) S(v - δv) R(Ω - δΩ) = χ(t; 0) S(v) R(Ω)
        + (∂χ/∂ε1) S(v) R(Ω) ε1 + (∂χ/∂ε2) S(v) R(Ω) ε2
        - χ(t; 0) (∂S/∂v)(v) R(Ω) δv - χ(t; 0) S(v) (∂R/∂Ω)(Ω) δΩ
        + O(ε², δv², δΩ²).   (65)


We can now find the perturbations δv = δv(ε; v, Ω) and δΩ = δΩ(ε; v, Ω) that make the residual zero up to higher-order terms from

    [ (∂χ/∂ε1) S R    (∂χ/∂ε2) S R ] ε = χ(t; 0) [ (∂S/∂v) R    S (∂R/∂Ω) ] [ δv ; δΩ ]   (66)

which we will write as

    B(v, Ω) ε = A(v, Ω) [ δv ; δΩ ].   (67)

The N×4 matrix A loses normal column rank only at the singular configuration v = 1, Ω = [0 0 α]ᵀ for all α ∈ [0, π). However, this configuration does not belong to the state space of the filter, for it has been eliminated by the normalization constraint. Therefore we can conclude

    [ δv ; δΩ ] = (AᵀA)⁻¹ Aᵀ B ε := C(v, Ω) ε   (68)

and the induced norm of the matrix C(v, Ω) is a measure of the "gain" between (small) disturbances in the constraints (or drifts outside the model class) and the errors in the estimates. In the experimental section we will show the result of a simulation where the disturbance level was increased up to the point at which the filter based upon the fixation constraint did not converge.

5 Attitude estimation from fixation

In some cases it may be desirable to reconstruct not only the relative velocity between the object being fixated and the viewer, but also their relative configuration, along the lines of [1]. Of course the relative configuration, assuming the initial time as the base frame, can be obtained by integrating velocity information, and this is indeed the only feasible solution when the motion of the viewer induces drastic changes in the image, such as occlusions, the appearance of new objects, etc.
While in most applications the scene changes significantly and we cannot assume that the same features are visible over extended periods of time, in the case of fixation we can assume that the object stays in the field of view, and we can integrate structure information from the same features to the extent to which they are visible.

Notice that, while in all the previous cases involving estimation of velocity (or relative configuration in the moving frame) we could decouple the motion parameters from the structure, and therefore formulate filters involving only motion parameters and measured projections, in the case of absolute orientation it is necessary to include structure in the state of the filter.

The fixation assumption gives the strong constraint that the object being fixated rotates about the fixation point and translates along the fixation axis. As a result, the object remains in the field of view as long as we fixate it. Therefore we will adopt an object-centered model, where the coordinates of each point are constant over time:

    ^o P_i = const.   (69)


Since we measure the projection of the coordinates of each point in the reference frame of the camera, we can enforce that the coordinates relative to the camera reference at the first instant are constant:

    ^{t0}P_i := ^o P_i - [ 0 ; 0 ; d0 ] = const   (70)

which relates to the measured projection via

    y_i(t) = π( ^t R_{t0} ^{t0}P_i + [ 0 ; 0 ; d(t) ] )   (71)

where ^t R_{t0} is the relative orientation between the viewer's reference at time t and the same reference frame at the initial time t0.

We may conceive at this point a dynamic model having the trivial constant dynamics of the points in the state, and the above projection as the measurement constraint. In order to do so, we have to insert ^t R_{t0} and d(t), along with their derivatives, into the state of the filter, which becomes therefore 3N + 8-dimensional:

    ^{t0}P_i(t+1) = ^{t0}P_i(t),   ^{t0}P_i(0) = [ y1 ; y2 ; 1 ]   ∀ i = 1 ... N
    ^t R_{t0}(t+1) = ^t R_{t0}(t) e^Ω̂,   ^t R_{t0}(0) = I
    Ω(t+1) = Ω(t) + n_Ω,   Ω(0) = 0
    d(t+1) = d(t) + v(t),   d(0) = d0
    v(t+1) = v(t) + n_v(t),   v(0) = 0
    y_i(t) = π( ^t R_{t0} ^{t0}P_i + [ 0 ; 0 ; d(t) ] )   (72)

where π denotes an ideal perspective projection. In the case of weak perspective, the last measurement equation transforms into

    y_i(t) = [ 1 0 0 ; 0 1 0 ] ^t R_{t0} ^{t0}P_i / d.   (73)

There is an additional constraint that can be imposed in order to set the overall scaling, which is

    ^{t0}P_0(t) = [ 0 ; 0 ; 1 ]   ∀ t.   (74)

The above can be imposed either as a measurement constraint, or as a model constraint by setting the variance of the corresponding state to zero, as in [1].

The above model may be reduced to a minimal one by removing the dynamics of the absolute orientation d(t), R(t), and by exploiting the fact that

    ^{t0}P_i = [ X ; Y ; Z ]_i = [ x ; y ; 1 ]_i(0) · Z_0^i.   (75)
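The attitude and distance bookkeeping in the model (72) amounts to integrating the rotation and distance states from the velocity states. A minimal sketch, assuming the right-multiplication composition R(t+1) = R(t) e^Ω̂ read from (72) (function names are illustrative):

```python
import numpy as np

def hat(w):
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exp_so3(omega):
    # Rodrigues' formula for R = exp(hat(omega)).
    theta = np.linalg.norm(omega)
    if theta < 1e-12:
        return np.eye(3)
    K = hat(np.asarray(omega) / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def integrate_attitude(Omegas, vs, d0=1.0):
    # R(t+1) = R(t) exp(hat(Omega(t))), d(t+1) = d(t) + v(t),
    # with R(0) = I and d(0) = d0, as in the state equations of (72).
    R, d = np.eye(3), d0
    for Omega, v in zip(Omegas, vs):
        R = R @ exp_so3(np.asarray(Omega))
        d = d + v
    return R, d
```

This is exactly the recursion that the reduced model below delegates to equation (77), with d0 = 1.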


Since we measure the initial projection of each feature point, we can leave only the scaling (the initial depth Z_0^i) in the state. It must be noticed, however, that the error in the location of the initial features is propagated through time, since we do not update the states corresponding to the measured projections. If one is willing to trade the drift due to the initial measurement error for the elimination of 2N states from the model, one ends up with the following system:

    Z_0^i(t+1) = Z_0^i(t),   Z_0^i(0) = 1
    Ω(t+1) = Ω(t) + n_Ω(t),   Ω(0) = 0
    v(t+1) = v(t) + n_v(t),   v(0) = 0
    y_i(t) = π( R(t) [ y1(0) ; y2(0) ; 1 ] Z_0^i(t) + [ 0 ; 0 ; d(t) ] )
    Z_0^0(t) = 1   (76)

where R(t) and d(t) are computed from the states Ω(t) and v(t) at each time by integrating

    R(t+1) = R(t) e^Ω̂(t),   R(0) = I
    d(t+1) = d(t) + v(t),   d(0) = 1.   (77)

A simple EKF based upon the model above recovers the structure modulo the initial distance d0 from the fixation point. If such a distance is known, it is possible to recover the full structure, as well as the motion parameters Ω(t) and v(t).

6 Experiments

6.1 Experimental conditions

In order to test the effectiveness of the proposed schemes, and to compare them against equivalent motion estimation techniques that do not take into account the fixation constraint, we have generated a cloud of dots within a cubic volume at d = 2 m in front of the viewer. These dots are projected onto an ideal image plane with unit focal length and 500×500 pixels, corresponding to a visual angle of approximately 30°. Noise has been added to the projections with a standard deviation of 1 pixel, corresponding to the average performance of current feature-tracking techniques [2]. One random point in the cloud is chosen as the fixation point, and the cloud is then made to rotate about this point and translate along the fixation axis with smooth but non-constant velocity.

6.2 Recursive filters

In figure 7 (top-left), the 4 components of the state of the filter described in section 3.2 are plotted, along with the ground truth in dotted lines.
The plot on the right shows the absolute estimation error.

The same data have been fed to the essential filter [11], which estimates 5 states corresponding to the direction of translation and the rotational velocity without enforcing the fixation constraint. The states corresponding to the same motion described above, together with the ground truth, are plotted in the left plot of figure 7 (bottom). The estimation error is marginally higher than that of the filter with the fixation constraint.
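The simulation setup of Section 6.1 can be sketched as follows. The sizes and the noise model follow the text; the helper names, the cube side and the random-number handling are assumptions:

```python
import numpy as np

def make_cloud(n=50, depth=2.0, side=1.0, seed=0):
    # Random dots in a cubic volume centered at `depth` meters in front of
    # the viewer; the cube side is an assumption (the paper does not give it).
    rng = np.random.default_rng(seed)
    P = rng.uniform(-side / 2.0, side / 2.0, size=(n, 3))
    P[:, 2] += depth
    return P

def project(P, noise_std_px=1.0, res=500, fov_deg=30.0, seed=1):
    # Ideal perspective projection onto a unit-focal-length image plane,
    # plus Gaussian feature-tracking noise expressed in pixels: `res` pixels
    # are taken to span a visual angle of `fov_deg` degrees.
    rng = np.random.default_rng(seed)
    x = P[:, :2] / P[:, 2:3]
    px = 2.0 * np.tan(np.radians(fov_deg / 2.0)) / res  # plane units per pixel
    return x + rng.normal(0.0, noise_std_px * px, size=x.shape)
```

With `noise_std_px=1.0` this reproduces the 1-pixel tracking-noise level used in the experiments above.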

Figure 7: (top-left) Estimates of the 4-dimensional state of the filter for estimating relative orientation under the fixation constraint. Filter estimates are in solid lines, while ground truth is in dotted lines. The estimation error (top-right) is smooth and strongly correlated, which is a symptom of poor tuning of the filter. If we do not enforce the fixation constraint, we need to estimate 5 motion parameters. The filter which does not enforce the fixation constraint converges faster (bottom-left) and the estimation error is larger but far less correlated (bottom-right), which indicates that the potential limits of the scheme have been achieved.

In our preliminary set of experiments we have observed a higher robustness level in the filter enforcing the fixation constraint. For example, the maximum noise level tolerable by the filter not enforcing the fixation constraint in this particular experimental setup is 1.5 pixels, while the filter enforcing fixation performs up to 2.5 pixels, as reported in figure 8.

6.3 Attitude estimation

In figure 9 we report the estimates of the absolute orientation and structure as estimated by the filter described in section 5. The structure parameters (initial depth of all points) have been plotted against the true parameters, assuming that the initial distance of the fixation



Figure 8: (left) Convergence of the states of the filter enforcing the fixation constraint for a noise level in the feature tracking of 3 pixels. The filter that does not enforce the fixation constraint does not converge in the same experimental situation. Initial conditions, tuning of the filters and noise levels are the same for both filters.

point is known. In general, structure can be recovered only up to a scale factor. The four motion components are also plotted, along with the estimation error, in the right plot.

It must be noticed that this filter has an N+4-dimensional state, unlike the one described above, which has dimension 4. Furthermore, the filter has proven very sensitive to the initial conditions in the motion parameters, while the structure parameters can be safely initialized to 1, which corresponds to having the visible objects flat on the image plane. The error is significantly correlated and convergence is slow for the motion parameters, which are observable only through 2 levels of bracketing with the state equation.

In case occlusions occur in the image plane, or some features disappear or exit the field of view, it is necessary to resort to the schemes described in section 3.2, unless we are willing to deal with a filter with a variable number of states.

6.4 Singularities and normalization

As we have mentioned in section 2.4, the non-normalized epipolar representation contains a singularity in v = 1, Ω = [0 0 α]ᵀ, where the innovation of the filter becomes zero. Therefore, even when the motion does not correspond to pure rotation about the optical axis (the singular configuration), the filter may converge to the singular configuration whenever it is initialized far enough from the true state. In particular, when the noise level increases, the residual surface becomes more and more irregular, and it becomes easier for the filter to fall into the singular configuration.

In figure 10 (left) we show the state of the filter initialized far from the true initial conditions for a measurement noise level of 1 pixel.
The filter converges to a state corresponding to v = 1 and Ω = [0 0 α]ᵀ for some α. Correspondingly, the innovation goes to zero (fig. 10 right) and the filter saturates. The variance of the estimation error keeps increasing after the filter has saturated. In figure 10 (bottom) we plot the state with error bars



Figure 9: (top-left) Estimates of the N+4-dimensional state of the filter for estimating absolute orientation and structure. Success in the estimation process depends crucially on the initial conditions of the motion parameters (bottom-left), while the structure parameters can be safely initialized to 1, which corresponds to having the visible objects flat on the image plane. The estimation error (top-right) is strongly correlated and decays slowly. The estimation error for the motion parameters, initialized within 1% of the true values, is plotted in (bottom-right) for comparison with the relative motion estimation scheme.

corresponding to the diagonal elements of the variance/covariance matrix of the estimation error. It can be seen that, after the variance decreases due to the initial convergence towards the minimum, it keeps increasing steadily once the filter has saturated.

When the same initial conditions and noise levels are applied to the filter based upon the normalized essential matrices, convergence is achieved without any problems of saturation (figure 11).

6.5 Sensitivity to the fixation constraint

In order to experiment with the degradation of the filter enforcing the fixation constraint in the presence of motions that violate the fixation assumptions, we have perturbed the experiments described above by translating the cloud on a plane orthogonal to the fixation axis, at random, with a standard deviation ranging from 1% to 6% of the norm of the essential matrix. We


have started from the true initial conditions and added no noise to the measurements. For each level of disturbance, we have run 100 experiments and computed the estimation error for the translation along the fixation axis and for the rotation components. The results are plotted in figure 12, where we show the average error across the different trials, with the standard deviation shown as an error bar. The results seem to confirm that the degradation of the estimates is graceful for small disturbances. However, when the disturbance exceeds 6% of the overall norm of the current relative motion, the filter does not reach convergence.

7 Conclusions

We have studied the problem of estimating the motion of a rigid object viewed from a monocular perspective camera which is actuated so as to track one particular feature point in the scene. We have cast the problem in the framework of epipolar geometry, and formulated both closed-form and recursive schemes for estimating motion and attitude using the fixation constraint. The framework of dynamic epipolar geometry allows us to compare the proposed scheme directly against the equivalent scheme that does not enforce the fixation constraint. Also, the degradation of the performance in the presence of disturbances in the fixation hypothesis is assessed.

The performance of the estimators has been compared via simulations to the equivalent estimation schemes that do not enforce the fixation constraint. The results seem to indicate that using the fixation constraint helps achieve better accuracy in the presence of perfect tracking. Degradation of the performance in the presence of disturbances in the fixation constraint is graceful for small disturbances. It will be a subject of future research to study how to compensate for non-perfect tracking by feeding back a measure of "goodness of fixation" and performing a shift-registration of the origin of the image plane.

References

[1] A. Azarbayejani, B. Horowitz, and A. Pentland.
Recursive estimation of structure and motion using relative orientation constraints. Proc. CVPR, New York, 1993.

[2] J. Barron, D. Fleet, and S. Beauchemin. Performance of optical flow techniques. RPL-TR 9107, Queen's University, Kingston, Ontario, Robotics and Perception Laboratory, 1992. Also in Proc. CVPR 1992, pp. 236-242.

[3] M. J. Barth and S. Tsuji. Egomotion determination through an intelligent gaze control strategy. IEEE Trans. Pattern Anal. Mach. Intell., 1993.

[4] F. Bullo, R. M. Murray, and A. Sarti. Control on the sphere and reduced attitude stabilization. In Proceedings of the IFAC Symposium on Nonlinear Control Systems (NOLCOS), Tahoe City, June 1995.

[5] C. Fermuller and Y. Aloimonos. Tracking facilitates 3-d motion estimation. Biological Cybernetics, 67:259-268, 1992.


[6] C. Fermuller and Y. Aloimonos. The role of fixation in visual motion analysis. Int. Journal of Computer Vision, 11(2):165-186, 1993.

[7] A. H. Jazwinski. Stochastic Processes and Filtering Theory. Academic Press, 1970.

[8] H. C. Longuet-Higgins. A computer algorithm for reconstructing a scene from two projections. Nature, 293:133-135, 1981.

[9] R. M. Murray, Z. Li, and S. S. Sastry. A Mathematical Introduction to Robotic Manipulation. CRC Press, 1994.

[10] D. Raviv and M. Herman. A unified approach to camera fixation and vision-based road following. IEEE Trans. on Systems, Man and Cybernetics, 24(8), 1994.

[11] S. Soatto, R. Frezza, and P. Perona. Motion estimation via dynamic vision. Submitted to the IEEE Trans. on Automatic Control. Registered as Technical Report CIT-CDS-94-004, California Institute of Technology. Reduced version to appear in the proc. of the 33rd IEEE Conference on Decision and Control. Available through the Worldwide Web Mosaic (http://avalon.caltech.edu/cds/techreports/), 1994.

[12] M. A. Taalebinezhaad. Direct recovery of motion and shape in the general case by fixation. IEEE Trans. Pattern Anal. Mach. Intell., 1992.

[13] J. Weng, T. S. Huang, and N. Ahuja. Motion and structure from line correspondences: closed-form solution, uniqueness and optimization. IEEE Trans. Pattern Anal. Mach. Intell., 14(3):318-336, 1992.




Figure 10: (top-left) Convergence of the filter to the singular configuration. For a noise level of 1 pixel and initial conditions far enough from the true values, the state of the filter ends up in the minimum of the residual surface corresponding to cyclorotation (all states are zero but Ω3, which is arbitrary). Correspondingly, the innovation becomes zero (top-right) and the variance increases (bottom plot). The variance is represented via the error bars in the motion estimates, which are the diagonal elements of the variance/covariance matrix of the estimation error.



Figure 11: (top-left) Convergence of the filter enforcing the normalization constraint. There are no singular configurations in the state manifold, and the filter converges fast to the correct estimate. The innovation is small but non-zero (top-right), and the variance of the state decreases as time grows (bottom).



Figure 12: Estimation error versus disturbances in the fixation constraint. The plots show the average over 100 trials, with the standard deviation across trials shown as an error bar. When the fixation constraint is violated by adding spurious translation components ranging from 1 to 6 percent of the norm of the fixating motion, the estimation error increases gracefully. The left plot shows the estimation error for the translation along the optical axis; the right plot shows the norm of the estimation error for the rotational velocity.
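The Monte-Carlo protocol behind figure 12 can be sketched generically as follows. `run_filter` is a hypothetical stand-in for any of the estimators above and must return the pair of translation and rotation errors for one trial:

```python
import numpy as np

def perturbation_sweep(run_filter, levels, n_trials=100, seed=0):
    # For each disturbance level (spurious translation orthogonal to the
    # fixation axis, as a fraction of the motion norm), run n_trials
    # experiments and report the mean and standard deviation of the errors,
    # as plotted in figure 12. run_filter(level, rng) -> (err_transl, err_rot).
    rng = np.random.default_rng(seed)
    stats = []
    for level in levels:
        errs = np.array([run_filter(level, rng) for _ in range(n_trials)])
        stats.append((errs.mean(axis=0), errs.std(axis=0)))
    return stats
```

The per-level means become the plotted curves and the per-level standard deviations become the error bars.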