a constrained optimization approach to globally consistent …alonzo/pubs/papers/iros02.pdf ·...

AbstractMobile robot localization from large-scale appearancemosaics has been showing increasing promise as a low-cost, high-performance and infrastructure free solution tovehicle-guidance in man-made environments. The genera-tion of the globally consistent high-resolution mosaics cru-cial to this procedure suffers from the same problem ofloop-closure in cyclic environments that is commonlyencountered in all map-building procedures. This paperpresents a batch solution to the problem of reliably gener-ating globally consistent mosaics at low computationalcost, that simultaneously exploits the topological con-straints among the observations and minimizes the totalresidual in observed features. An extension to a generalscalable framework that facilitates an incremental onlinemapping strategy is also presented, along with resultsusing simulated data and from real indoor environments.

1 Introduction

Mosaic-based localization [5] is a technique employingreal-time imagery to track motion over a previously storedhigh-resolution image of a known environment. Naturally-occurring features in a composite mosaic built usingimages from a downward-looking camera encode positioninformation corresponding to the robot pose when the fea-ture observation was made. Template matching betweenobserved and predicted views provide absolute positionfixes to dampen the otherwise unbounded growth of errorsoriginating in a primary position estimation system such asodometry. Previous papers on the subject of mosaic-basedposition estimation have discussed the feasibility andimplementation issues [5] of this approach, and algorithmsto construct the mosaic itself [10].

1.1 ChallengesMap construction requires a solution to the dual problemsof recovering the motion history of the robot, as well asreconstructing the robot’s environment based on observa-tions made over the duration of the robot motion. In ourapplication, the observations are in the form of imagescaptured sequentially in one-dimensional swaths calledsegments by passing a calibrated downward looking cam-era over the floor. Recorded poses correspond to the 2Dlocation and heading of each image as determined from

dead-reckoning. For high-speed navigation to be possible,the mosaic constructed using these images must satisfy theconditions of local smoothness, whereby the error in tem-porally adjacent images is bounded to some acceptablysmall value, and global consistency, whereby the positionreported at a particular location becomes neither time norpath dependent, and is uniquely represented in the con-structed model. These requirements are common to allmap-building procedures, independent of the type of sen-sor used for the observations.The observations made by the sensor in mosaic-basedlocalization differ from those of sensors like laser range-finders and those used in beacon-based navigation sys-tems, in that the features observed have low persistence inboth temporal and spatial domains. The problem of gener-ating accurate mosaics also typically involves the manipu-lation of thousands of images, and hence demands a time-efficient and scalable solution to be tractable.

1.2 Prior WorkConsiderable literature exists on the field of image mosa-icing and its applications, although only more recentlyhave real-time and globally consistent solutions beenaddressed [9]. Real-time video mosaicing of the oceanfloor for navigation, exploration and wreckage visualiza-tion has also received attention. Work by Fleischer et al.[2] used iterative smoother-follower techniques to reduceerrors accumulated over an image chain, but no mentionwas made of the tractability of its extension to networks.Lu and Milios [6] enforced global consistency in theirwork on map-building with laser range-scans, solving anoverconstrained system of measurements to compute therequired least-squares perturbation to absolute scan poses.Gutmann and Konolige [4] extended the approach to anonline algorithm for mapping cyclic environments. A localregistration step linked scans in a K-neighborhood of thelast scan in time depending primarily on K. Loops wereclosed incrementally when a patch correlation schemereturned an unambiguous high match score, using thescheme of [6]. The complexity of this step, however, was

in the number of scans. Although marginal costreductions could be achieved using sparse matrix tech-niques, it was still far too prohibitive for our purposes.Work by Newman [8] on the problem of simultaneous

O n3( )

A Constrained Optimization Approach to Globally Consistent Mapping

Ranjith Unnikrishnan and Alonzo Kelly

Robotics Institute

Carnegie Mellon University

Pittsburgh, USA

email: [email protected], [email protected]

localization and mapping (SLAM) recognized that obser-vations of relative positional relationships between land-marks had uncertainties that were uncorrelated touncertainties in robot position. An overcomplete, and pos-sibly inconsistent set of such observations formed the statevector in a relative map, that could be updated in time lin-ear in the number of observations. When a consistent mapwas explicitly required, geometric projection filtering wasdone to repeatedly project the relative map estimates ontoa constraint space described by the topological relation-ships between the observations. The constraint equationswere chosen heuristically amongst a large but finite set,such that they spanned a large subset of the observations.Previous work by the authors [10] described how the span-ing set of constraint equations required to enforce globalconsistency could be easily extracted, and presented analgorithm for the same. Each loop equation in this set cor-responded to an element of the fundamental cycle basis ofthe topological graph describing the network of observa-tions. On the assumption of good local estimates fromodometry, an efficient scalable solution to the problem ofmap-building in large cyclic environments was presentedfor observations with a short sensory horizon.This paper presents a formulation that systematicallyrelaxes the assumption of good initial estimates, while pre-serving the low time complexity of the algorithm in [10].Our method employs the approach of total residual mini-mization between observations, while incorporatingknowledge of prior state uncertainty and enforcing topo-logical constraints to maintain map consistency. A frame-work for the solution that facilitates incremental onlineimplementation is also proposed.

2 Residual at an Image Overlap

Consider a linear sequence of images, such as that shownin Figure 1. The images overlap such that each image con-tains a portion of the scene in common with both theimage before and after it in the sequence. Let each imagebe assigned with an index i with magnitude increasing inorder of image capture. Let each image i be associatedwith its pose expressed relative to the image preced-ing it in its sequence. The first image in each sequence hasits pose expressed relative to the world frame w.

2.1 Non-temporally adjacent overlapsLet i and j be two images that are not adjacent in sequence,and possibly even belong to different image sequences,but have features common to them. Such image pairs con-

stitute potential loop-forming overlaps in a network ofimages. Let at least two point features be identified in theimages i and j. The residual of the m-th feature observed inboth images can be expressed in terms of the relative posebetween the images, as

where is the feature residual, is theposition of the m-th feature in the i-th image (subscript)represented in the coordinate frame of image i (super-script), is the 2-by-2 rotation matrix relating the coordi-nate frame of image j with respect to that of image i, and

is the corresponding displacement between thetwo frames.On linearizing, this can be expressed as

where is the Jacobian relating the changein position of feature m in image j with change in relativepose of image j with respect to image i, and rep-resents the linearized change in relative pose.Collecting the terms for feature residuals in non-tempo-rally adjacent overlapping images, where , gives asystem of the form

which can be solved iteratively by the least-norm left-pseudoinverse solution as

Here the matrix to be inverted for each pair of overlappingimages is a 3-by-3 matrix, and hence its computation isinexpensive. Each such solution from feature matchingprovides an initial estimate for the relative pose betweennon-temporally overlapping images.Note that this formulation leaves open the scope of usingrobust estimation techniques, such as iteratively re-weighted least squares (IRLS), commonly encountered incomputer vision literature for outlier rejection. Thereexists a large body of literature on the topic, and furtherdiscussion is avoided with the understanding that the tech-niques can be suitably incorporated in each term involvingfeature residual minimization using appropriate weightingcoefficients.

2.2 Temporally adjacent overlapsFor overlaps between images i-1 and i that are adjacent insequence, the initial estimates of relative pose areprovided by dead-reckoning. We can however concatenatesimilar equations for residuals occurring at each tempo-rally adjacent overlap, to give a system of equations in thelinearized observer form

where the concatenated change vector of change in resid-ual is

23

5

1

4xw

ywx5

y5

Figure 1: A linear mosaic of 5 images

ρii 1–

zij m( ) p i m,( )i

Rjip j m,( )

j– tj

i–=

zij m( ) ℜ2∈ p i m,( )i ℜ2∈

Rji

tji ℜ2∈

∆zij m( ) H j m,( )i ∆ρj

i–=

H j m,( )i ℜ2 3×∈

∆ρji ℜ3∈

j i 1+≠

∆ zij Hij∆xij=

∆ xij HijT

Hij( )1–Hij

T ∆zij=

ρii 1–

∆z H∆x=

∆z ∆z01 0( ) ∆z01 1( ) … ∆zij m( ) … ∆zN 1N M( )–

T=

and the state vector of change in relative pose is given by

2.3 Error ModelingIn registered non-temporally adjacent image pairs, theerrors in estimates of feature residuals arise primarily fromimage noise which leads to inaccuracies in specification ofpoint correspondences. This error can be treated as zeromean noise, uncorrelated between residual measurements.We model this error in a loop-closing overlap betweenimages i and j as a diagonal matrix with entries , ourestimate of variance in pixel noise. Then the covariancematrix describing non-systematic error in estimate of rela-tive pose between images i and j at loop-closure is givenby

For temporally adjacent images, say i-1 and i, the covari-ance can be estimated using a suitable choice ofodometry model. Hence the a priori covariance of thestate vector consisting of the concatenated set of relativeposes between temporally adjacent images and relativeposes between loop-closing non-temporally adjacent over-laps is given by the block diagonal matrix as

The matrix is block diagonal and non-singular, as errorsbetween relative poses are inherently decoupled, and theuncertainty in estimate of non-temporally adjacent relativeposes is derived solely from feature matching and is inde-pendent of poses of other images participating in the loop.

3 Loop analysis

Consider the two cycled network shown in figure 2, com-posed of 5 overlapping segments. The image pairs (a,h),(i,j), (b,c), (d,e), (k,l) and (f,g) form 6 non-temporallyadjacent overlapping image pairs. Let us assume that indi-ces a through h are in increasing order of magnitude. Asper our convention, every image pose is referenced with

respect to its immediate predecessor having a smallerimage index. Transforms represented by upper-case refer to unknowns to be determined, but whose initial esti-mates are available either from odometry, in the case oftemporally adjacent images, or from feature matchingdescribed earlier, in the case of overlaps between non-tem-porally adjacent images. A subscript index refers to theframe being referenced, and a superscript index refers tothe frame with respect to which the reference is made.Hence the homogenous transform relating coordinateframe of image b to that of image a is given by

Figure 3 shows an alternate representation of the network,termed a topology graph, having the property of havingthe same number of cycles as the original network ofobservations. We define a topology graph as a combina-tion of a vertex set , and an edge set . Each vertex

is composed of a set of sub-nodes . Each sub-node belonging to a vertex represents a unique“footprint” of features associated with , i.e. the imagescorresponding to each subnode of a vertex constitute non-temporally adjacent overlaps with common features.Edges between sub-nodes correspond to observed poserelationships.The homogenous transform expressing a base frame rela-tive to itself, as observed along any cycle in the graph, say(a-b-c-d-e-f-g-h-a), should equal the identity. Mathemati-cally,

Each equation is a function of the relative poses betweenimages participating in the associated cycle. The set ofsuch equations constitutes the constraint equations that areto be satisfied for consistency. In our example, there arethree such equations corresponding to the three possiblecycles (a-b-c-d-e-f-g-h-a), (a-i-j-k-l-f-g-h-a), and (i-b-c-d-l-k-j-i) in the graph. As detailed in [10], the linearizedform of any two of these three equations form an indepen-

∆x ∆ρ10 ∆ρ2

1 … ∆ρji … ∆ρN

N 1–T

=

σ2

Pij σ2Hij

THij( )

1–=

Pi - 1, i

P0

P0 diag P01 P12 … Pij …PN 1N–, , , ,( )=

xa

xd

xcxb

xgxf

xe

xh yaˆ

yhˆ

ybˆyc

ˆ

ydˆ

yeˆ

yfˆ

ygˆ

yw

xw

xi

yiˆ

xj

yjˆ

xk

ykˆ xl

ylˆ

Figure 2: Two-cycled network of 5 image segments

ab

c

d

ef

g

h

xa

yaˆ

xh

yhˆ

xb

ybˆ

xc

ycˆ

xd

ydˆ

xeyeˆxf

yfˆ

xg

ygˆ

Figure 3: Topology graph correspondingto a two-cycled network.

xi

yiˆ

xj

yjˆ

xk

ykˆ

xlylˆ

i

j

k

l

T

Tb

aTa 1+

aTa 2+

a 1+Ta 3+

a 2+…Tb

b 1–=

υ εv υ∈ Φ

n Φ∈ vv

Taa

Tba

Tcb

Tdc

Ted

Tfe

Tgf

Thg

Tah⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ I3 3×= =

dent basis in the space of constraints. The set of cycles thatare used form an independent and complete cycle cover,termed in graph theory as a fundamental cycle basis.

4 Formulation as a Constrained Estimation

Let each equation in the chosen set of L constraint equa-tions be of the form

where X is the concatenated set of relative poses betweentemporally adjacent images and non-temporally adjacentoverlapping images.The maximum likelihood solution to X is then a minimizerof the Lagrangian

or in matrix representation as

The first term in J penalizes the sum of square residuals offeature observations in all overlaps, and the second termpenalizes inverse covariance weighted perturbations ofrelative poses from their estimates. These two termstogether constitute a full non-linear optimization that facil-itates linearization of the estimation problem at every stepusing all feature observations and the current best estimateof the entire history of the robot trajectory. The third termcontaining the vector of Lagrange multipliers enforcethe loop equations as hard constraints that are to be met bythe posterior estimate upto first order accuracy.The true X is a combination of the estimate value and aperturbation . Hence the cost function can be rewrittenas

Linearizing and differentiating to extract the Lagrangeconditions at a point of constrained minima gives

and

which together yield the system

Note that this is similar to the basic form of the VariableState Dimension filter [7], and that recursive partitioning

can be done to take advantage of the sparseness of the sys-tem matrix to be inverted. The matrix is blockdiagonal with a block size of 3, and can hence be invertedin time linear in the number of images. The solution to thesystem can thus be obtained directly by solving for in

and substituting back to solve for in

The above update is performed iteratively until conver-gence is achieved as per some suitable criterion. The Lev-enberg-Marquardt procedure can be used to modify thisupdate rule by adding a value to the diagonal elementsof the linearized system before inversion. This value of is increased to smoothly switch towards gradient descent,or decreased to switch towards Gauss-Newton update,depending on whether the value of the cost function Jincreases or decreases respectively.Computational Complexity: The state vector con-sists of relative pose terms between temporallyadjacent images, and terms corresponding to k non-temporally adjacent overlaps. If we denote the total num-ber of relative poses in the state vector by n, the total num-ber of loops by l, and the average number of featuresconsidered in each overlap by p (typically chosen to be 6),it can be shown that can be computed in

and is subsequently computable in time. The computational complexity of each

iteration in the algorithm is hence bounded by where typically .

5 Constrained estimation as a geometric projection operation

Back-substitution of the value of yields a solution to of the form

where

The unconstrained posterior estimate of state obtained ononly minimizing feature residuals is given by the weightedleft pseudoinverse solution

with posterior covariance given by This implies that the constrained estimate of required per-turbation in relative pose is of the form

which is the constrained estimator form of the ExtendedKalman filter, with equivalent to theKalman gain matrix. The constraints consisting of the setof fundamental loop equations are equivalent to artificialobservations with zero measurement noise [8].Hence the Lagrange formulation of the map-building

gl X( ) bl=

J X λ,( ) 12--- zij m( )

Tzij m( )

12--- ρj

i ρji

–( )TPij

1– ρji ρj

i–( )

i j,∑+

m∑

i j,∑=

+ λlT

gl X( ) b–[ ]l 1=

L

∑

J X λ,( ) 12---z X( )

Tz X( ) 1

2--- X X–( )

TPo

1–X X–( )+=

λTg X( ) b–[ ]+

λ

X∆X

J ∆X λ,( ) 12---z X ∆X+( )

Tz X ∆X+( ) 1

2---∆X

TPo

1– ∆X+=

+ λTg X ∆X+( ) b–[ ]

∆X∂∂J

HTz X( ) H

TH∆X Po

1– ∆X GTλ+ + + 0= =

λ∂∂J

g X( ) G∆X b–+ 0= =

HTH Po

1–+ G

T

G 0

∆X

λH

Tz X( )–

b g X( )–=

HTH Po1–

+

λ

G HTH Po

1–+( )

1–G

Tλ g X( ) b–[ ] G HTH Po

1–+( )

1–H

Tz X( )–=

∆X

HTH Po

1–+( )∆X H

Tz X( )– G

Tλ–=

µµ

∆Xni 1– ni

nk

λO np nl

2l3

+ +( ) ∆XO np nl+( )

O np nl2

l3

+ +( ) l n«

λ∆X

∆X I KG–[ ] HTH P0

1–+[ ]

1–H

Tz X( )– K b g X( )–[ ]+=

K HTH P0

1–+[ ]

1–G

TG H

TH P0

1–+[ ]

1–G

T[ ]1–

=

∆Xr HTH P0

1–+[ ]

1–H

Tz X( )–=

Pr HTH P0

1–+[ ]

1–=

∆X I KG–[ ]∆Xr K b g X( )–[ ]+=

K PrGT GPrG

T[ ] 1–=

problem is a constrained estimator that (a) projects the fea-ture residual vector into the column space of H toyield a least-squares unconstrained left pseudoinverse esti-mate of perturbation , and (b) projects the uncon-strained estimate with covariance onto a linearizedhyperplane approximation of the non-linear constraint sur-face evaluated at the unconstrained estimate.

6 Incremental Online Mapping

The ability to decouple the residual minimization and con-sistency enforcement steps using a Kalman filter suggeststhe possibility of an incremental map-building strategy. Ateach step of incorporation of a new sensor reading, thereare a finite number of possible cases, depending on thechange enforced in the topology graph.Let there be n elements in the state vector X with covari-ance given by matrix . Let a new observation be made,and the state element corresponding to the relative posebetween the current observation and the temporally pre-ceding observation be Case A: No feature is observed that is present in a non-temporally adjacent observation. In this case there is nochange in the topology graph. The current observation isthen simply an extension of a sequence of observationswhose associated relative poses are independent. The statevector X is simply augmented by the new relative pose

, and the diagonal of by its covariance from anodometry model.Case B: A feature is observed that is present in a non-tem-porally adjacent observation with no new loop formed.Here an additional node is added to the topology graph. Inaddition to augmenting the state vector and covariancematrix as in case A, another relative pose between the non-temporally adjacent overlapping images in added to X andits covariance matrix from feature matching is added as adiagonal element in .Case C: A feature is observed that is present in a non-tem-porally adjacent observation and a loop is closed. The statevector and covariance matrix are augmented as in Case B,and a subsequent step of constraint enforcement is carriedout. We identify the newly added cycle in the topologygraph as the shortest cycle containing the newly addededge, with the weight of each graph edge set proportionalto the number of images lying in the edge. The state vectorand covariance matrix can then be partitioned as

where consists of the relative pose elements that are part

of a previously assimilated cycle in the topology graph,i.e. part of a cycle other than the one newly created,

consists of the relative pose elements that are part of

both, a previously assimilated cycle in the topology graph,as well as the newly created cycle,

consists of the relative pose elements that are onlypart of the newly created cycle and none other, and

consists of the relative pose elements that are notpart of any cycle.Then and are both block diagonal elements,where each sub-block is a 3-by-3 matrix.The constrained posterior estimate of the state vector canthen be expressed as a projection onto a linearized sub-space of the constraint equation. Let the constraint corre-sponding to new cycle be . Since g is not afunction of state elements in and , its Jacobian withrespect to the state can be written as

The Kalman gain matrix is then given by

and the matrix to be inverted can be written as

and can be shown to be non-singular. With careful imple-mentation, the updated estimate can be computed in

time, which is minimized by the choiceof shortest cycle. The covariance update step is moreexpensive at cost, but the step need onlybe performed at points of loop closure in the course of therobot’s motion.

7 Results

Figures 4-5 show results with a toy problem of 122images, using a vehicle simulated at 1/10th scale and mov-ing at 0.5m/s. The actual images used in this experiment

z X( )

∆XrPr

g X( ) b=

P -( )

xn 1+

xn 1+ P -( )

P -( )

X

xm

xl

xa

xb

= P-( )

Pmm Pml 0 0

Plm Pll 0 0

0 0 Paa 0

0 0 0 Pbb

=

xm nm

xl nl

xa na

xb nb

Figure 4: Mosaic constructed using only initial pose estimatesfrom dead-reckoning. Note the discrepancy in alignment ofthe circled feature at the crossover point (lower left corner).

Paa Pbb

g X( ) b=xm xb

G 0 Gl Ga 0=

K P(-)

GT

GP(-)

GT[ ]

1–=

S GP(-)

GT

GlPllGlT

GaPaaGaT

+[ ]= =

O nmnl nl2

na+ +( )

O nm nl na+ +( )2( )

are extracted from a larger aerial photograph. Initial esti-mates of vehicle pose are obtained from a simulated differ-ential heading based dead-reckoning system. Theodometers on the left and right wheels are each calibratedto 95% accuracy in wheel velocity measurements. Figure4 illustrates the indication of a systematic rightward driftby dead-reckoning measurements, leading to distortionand inconsistency in the generated map on traversing aloop. Figure 5 shows the map after consistency wasenforced with our algorithm.Figure 6 shows results with a 170m long vehicle guidepathin our indoor laboratory, mapped using our test vehiclewith a total of 1836 images. The mosaic built using onlyinitial pose estimates from dead-reckoning had 4 intersec-tions that were initially unresolved.

8 Conclusions and Future Work

What we have presented is an map-building algorithm thatenforces global consistency by exploiting the topology ofobservations made with sensors having small spatial fieldof view, while minimizing observed feature residuals. Thealgorithm is in complexity, where typi-cally the number of loops l is much smaller than the num-ber of images n, and hence scales well to large datasets inthe environments common to our application. In our work,we have generated reliable cyclic mosaics ranging from80m to 800m in total length, both in laboratory and factoryenvironments. The mosaics have been used by our testvehicles at speeds exceeding normal operating conditionsfor up to 40hr stretches without error requiring humanintervention.Ongoing work includes application of the algorithm indomains with other sensor modalities, including buildingdense maps with laser-range data, and eliminating theneed to survey positions of laser reflectors for the installa-tion of bearing-only guidance systems in factories. Consis-

tent odometry error modeling and incorporation of onlineauto-calibration techniques are also being pursued.

9 References[1] H.F. Durrant-Whyte, “Integration of Disparate Sensor

Observations”, Proc. IEEE Intl. Conf. Robotics and Auto-mation, 1986, p. 1464

[2] S.D. Fleischer, H.H. Wang, S.M. Rock, M.J. Lee, “VideoMosaicing along arbitrary vehicle paths”, Proc. Symposiumon Autonomous Underwater Vehicle Technology, 1996, pp.293-299

[3] J. Guivant and E. Nebot, “Optimization of the SimultaneousLocalization and Map Building Algorithm for Real TimeImplementation”, IEEE Trans. on Robotics and Automation,June 2001, 17(3), pp. 242-257

[4] J.-S. Gutmann and K. Konolige, “Incremental Mapping ofLarge Cyclic Environments”, Proc. Computational Intelli-gence in Robots and Automation (CIRA) 1999, pp. 318-325

[5] A. Kelly, “Mobile Robot Localization from Large-ScaleAppearance Mosaics”, Intl. Journal of Robotics Research,19(11), 2000

[6] F. Lu and E. Milios, “Globally consistent range scan align-ment for environment mapping”, Autonomous Robots,4:pp.333-349, 1997

[7] P.F. McLauchlan, “The Variable State Dimension Filterapplied to Surface-based Structure from Motion”, CVSSPTechnical Report VSSP-TR-4/99

[8] P. Newman, “On the Structure and Solution of the Simulta-neous Localization and Map Building Problem”, Ph.D. the-sis submitted in March 1999, Australian Centre for FieldRobotics, Univ. of Sydney.

[9] H.S. Sawhney, S. Hsu and R. Kumar, “Robust video mosa-icing through topology inference and local-to-global align-ment”, Proc. European Conf. on Computer Vision (ECCV),Freiburg, Germany, vol.2, pp.103-119, June 1998.

[10] R. Unnikrishnan and A. Kelly, “Mosaicing Large CyclicEnvironments for Visual Navigation in Autonomous Vehi-cles”, Proc. IEEE Conf. on Robotics and Automation(ICRA), May 2002.

Figure 5: Mosaic after consistency enforcement

O np nl2

l3

+ +( )

Figure 6: Map of indoor vehicle guidepath. Regions ofthe final mosaic at the intersections are highlighted.

5m

a constrained optimization approach to globally consistent …alonzo/pubs/papers/iros02.pdf ·...

Documents