
Implementation of an Update Scheme for Monocular Visual SLAM

Zhenhe Chen, Ranga Rodrigo, and Jagath Samarabandu
Department of Electrical and Computer Engineering
The University of Western Ontario, London, Ontario, Canada

Email: {zchen56, brodrigo, jagath}@uwo.ca

Abstract— An autonomous mobile robot is an intelligent agent that explores an unknown environment with minimal human intervention. Building a relative map that describes the spatial model of the environment is essential for exploration by such a robot. Recent advances in robot navigation have motivated mapping algorithms to evolve into simultaneous localization and map-building (SLAM). Initial uncertainty is one of the key factors in SLAM. We briefly introduce an update scheme for feature initialization in monocular vision based SLAM, presented within a detailed implementation of feature detection and matching and of 3-D reconstruction by multiple view geometry (MVG) inside an extended Kalman filter (EKF) framework. Experiments clearly show that the proposed scheme maximizes the optimization capacity of the EKF.

Keywords—Monocular visual SLAM, feature detection and matching, 3-D reconstruction, feature initialization, extended Kalman filter

I. INTRODUCTION

Making a robot navigate autonomously is a challenge that has received great interest in the robotics and computer vision research communities. We cast the autonomous robot navigation problem within the simultaneous localization and map-building (SLAM) paradigm. The SLAM problem examines the ability of an autonomous robot, starting in an unknown environment, to incrementally build an environment map and simultaneously localize itself within this map. In robot navigation, accurate mapping and location estimates in either structured or unstructured environments are an essential requirement for the high-level tasks of a robot, such as path planning.

Robotic map-building can be traced back about twenty years. An important milestone of map building is the series of papers presented by Smith et al. [1], who introduced a powerful probabilistic framework for SLAM. Encouraged by their success, a large number of probability-based solutions to SLAM were published [2]–[4]. Most of them used probabilistic frameworks to manage the measurement noise (sensing errors) and turn the measurements into a map. Only a few used non-probabilistic techniques [5].

A classical probabilistic approach to generating maps in SLAM is based on the extended Kalman filter (EKF) [2], [3]. This is because the EKF estimates the full posterior over the map in an online fashion [4].

The existing SLAM algorithms rely heavily on sophisticated sensing systems, for example, the fusion of GPS, inertial sensors, and LADAR.

Fig. 1. The flowchart of implementing MVG in the proposed SLAM algorithm. Scene snapshots provide image pairs for low-level image processing (SIFT feature detection and matching); the 3-D reconstruction block then computes the fundamental matrix, consistent camera matrices, and a 3-D reconstruction up to a projective transform; an upgrade to Euclidean coordinates follows ("true" camera matrix calculation, camera rotation and translation by RQ decomposition, feature reconstruction by DLT), whose outputs (1. scene/feature 3-D coordinates, 2. camera rotation and translation) feed the EKF framework.

We propose to solve the SLAM problem using monocular vision. With such a simple sensing system, we make two contributions, which are shown in this paper. The first is to apply multiple view geometry (MVG) techniques, resulting in a novel update scheme for feature initialization. The second, which is the main task of this paper, is to demonstrate and evaluate the proposed update scheme by implementing the SLAM algorithm in an indoor environment using a monocular vision sensor.

In Section II we introduce the three main components of the monocular visual SLAM: feature detection and matching, reconstruction of the 3-D locations of the features and camera centres, and a novel update scheme for EKF-based SLAM. In addition to the conceptual background of these techniques, the formulation necessary for implementation is also elaborated in that section. Section III presents experimental results, which clearly show that robotic SLAM with a monocular vision system is feasible and is a promising avenue.

II. THEORETICAL BACKGROUND

A. Feature Detection and Matching

Given a sequence of images captured by an onboard monocular vision sensor, the first step is to detect feature candidates in the successive images for matching them across different views. The classic method of detecting significant features in indoor environments is to use corner detection techniques [6], [7].


However, corner detection techniques suffer from mismatches under image scaling and rotation caused by changes of the 3-D camera viewpoint [8], [9]. We apply a recently published feature detection technique, the scale-invariant feature transform (SIFT), to implement distinctive feature detection and matching. The outliers are then removed by employing RANdom SAmple Consensus (RANSAC).

1) Scale-Invariant Feature Transform: SIFT was developed by Lowe for extracting image features [8]. The features extracted by SIFT are invariant to image translation, scaling, and rotation, and partially invariant to change of illumination and 3-D camera viewpoint. These characteristics enable SIFT to be a good candidate for a robust SLAM algorithm.

2) RANSAC: RANSAC is a powerful algorithm for simultaneously solving the correspondence problem and estimating the fundamental matrix relating a pair of images. RANSAC is used to improve matching techniques that match two point sets in the images: the points consistent with the estimated model are called inliers, and the remaining points are called outliers. An adaptive RANSAC algorithm that handles an unknown proportion of inliers and draws the samples independently of each other was proposed by Nielsen [10].
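The following sketch illustrates this detect-match-prune pipeline with OpenCV (which the paper cites for its fundamental matrix routines); it is not the authors' implementation. The image file names, Lowe's ratio threshold of 0.8, and the RANSAC reprojection threshold are illustrative choices, and cv2.SIFT_create assumes an OpenCV build (4.4 or newer) in which SIFT ships with the main module.

```python
import cv2
import numpy as np

# Load a pair of successive views (file names are placeholders).
img1 = cv2.imread("view1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view2.png", cv2.IMREAD_GRAYSCALE)

# 1) SIFT feature detection and description in both images.
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# 2) Putative matching with Lowe's ratio test.
matcher = cv2.BFMatcher()
knn = matcher.knnMatch(des1, des2, k=2)
good = [m[0] for m in knn if len(m) == 2 and m[0].distance < 0.8 * m[1].distance]
pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

# 3) RANSAC-based fundamental matrix estimation; the mask flags inliers,
#    i.e. matches consistent with the epipolar geometry (l' = F x).
F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.99)
inliers1 = pts1[mask.ravel() == 1]
inliers2 = pts2[mask.ravel() == 1]
print(f"{len(inliers1)} inlier correspondences retained out of {len(good)}")
```

The surviving inlier set and the estimated F are exactly the quantities consumed by the 3-D reconstruction steps of Section II-B.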

B. Metric 3-D Reconstruction by Multiple View Geometry

In this section, we have two goals. The first is to reconstruct the scene structure and camera motion in 3-space, up to a projective transform, after we have detected and matched feature points in the image pairs. The computer vision theory behind any vision based technique (monocular, stereo, or MVG) can be summarized as follows:

• Correspondence geometry: given an image point x in the image of one camera, find the position of the corresponding point x′ in the image of the other (or the same) camera. The epipolar geometry of two views solves this problem [11].

• Motion geometry: given a set of corresponding image points {xi ↔ x′i}, determine the camera matrices P and P′ for each camera, which describe the translation and rotation of the cameras. The matrices can be solved for by fundamental matrix computation [12] and reconstruction techniques [13].

• Scene geometry: given a set of corresponding points and the cameras P and P′, find the position of the scene point X in the 3-D Euclidean coordinate system. A classic method is to use triangulation if the physical installation of the stereo cameras is given [14].

The second goal is to upgrade the projective reconstruction to a metric reconstruction in the Euclidean coordinate system.

1) Epipolar Geometry: The first step of 3-D reconstruction after image point matching is to apply epipolar theory, which answers the one-to-one correspondence geometry problem (Fig. 2(a)). Following the flowchart in Fig. 1, we briefly discuss the three components of the "3-D reconstruction" block: determining the fundamental matrix, the consistent camera matrices, and the 3-D projective reconstruction.

• Fundamental matrix: this is the algebraic representation of the epipolar geometry [12]. It describes the mapping between a point x and its epipolar line l′ in View 2. Customarily, the fundamental matrix is denoted F, such that

l′ = Fx.

Fig. 2 demonstrates this mapping relationship. A robust RANSAC-based fundamental matrix estimation can be found in [15], which is part of the Intel Open Computer Vision Library (OpenCV). The algorithm combines the "7-point" correspondence algorithm with RANSAC. RANSAC works as a "search engine": from a large number of samples of putative correspondences, it estimates the fundamental matrix and identifies the set of inliers consistent with this estimate. The reason for calculating F is the inherent relationship between F and the camera matrix pair (P, P′): F depends only on the projective properties of (P, P′). Furthermore, F does not depend on the choice of world frame; put differently, the camera matrices change if the world coordinate frame is transformed, but F does not.

Fig. 2. Illustrations of epipolar geometry and its application to drawing epipolar lines onto one image of the pair. (a) Epipolar geometry: camera centres C and C′, scene point X, image points x and x′, epipoles e and e′, epipolar lines l and l′, and the epipolar plane spanning View 1 and View 2. (b) Epipolar lines in one view.

• Consistent camera matrix: Different pairs of camera matrices that differ by a projective transformation give rise to the same fundamental matrix. In other words, there is an ambiguity in recovering the camera matrices given F. Following Luong and Vieville [16], the camera matrices corresponding to a fundamental matrix F may be chosen as

P = [I | 0] ,   P′ = [BF | e′] ,   (1)

where B is any skew-symmetric matrix, which ensures that the following matrix is also skew-symmetric:

$$[BF \mid e']^{T} F \,[I \mid 0] = \begin{bmatrix} F^{T}B^{T}F & 0 \\ e'^{T}F & 0 \end{bmatrix} = \begin{bmatrix} F^{T}B^{T}F & 0 \\ 0^{T} & 0 \end{bmatrix}.$$

Specifically, Luong and Vieville suggested choosing B such that

B = [e′]× ,

where [e′]× is the skew-symmetric matrix encoding the cross product with the View 2 epipole, so that [e′]× F is the cross product of e′ with the fundamental matrix. Note that the camera matrix pair (P, P′) retrieved from the fundamental matrix in (1) is a "consistent" camera matrix pair: the two matrices are consistent with each other and with the fundamental matrix (a code sketch at the end of this subsection illustrates this construction).


However, they do not reflect information from the world coordinate frame, such as the translation: they are related to the "true" camera matrix pair by a projective transform (Fig. 1). We return to fully solving for the "true" camera matrix pair, using the direct linear transform (DLT) algorithm and RQ decomposition, in Section II-B.2.

• 3-D projective reconstruction: the linear triangulation method is one of the simplest techniques for solving for the 3-space point position X in homogeneous coordinates, given the following information:

– the image points x and x′ and their corresponding camera matrices P and P′ in the different images;

– the relationship between them: x = PX and x′ = P′X.

Note that the camera matrix pair (P, P′) need not reflect the Euclidean coordinate system; it can be the pair of consistent camera matrices derived from the fundamental matrix.

The linear triangulation method is the direct analogue of the DLT method [11]. Therefore, a DLT algorithm for 2-D to 3-D point correspondences is implemented for the linear triangulation calculation. In each image we have a 2-D measurement, x = PX and x′ = P′X, and these equations can be combined into the form AX = 0, which is linear in X.

Similar to the DLT method for the 2-D to 2-D homography transform [11], the homogeneous scale factor is eliminated by a cross product to give three equations for each image point, of which two are linearly independent. Specifically, for the first image, x×(PX) = 0:

$$\begin{bmatrix} x\,(P^{3T}X) - (P^{1T}X) \\ y\,(P^{3T}X) - (P^{2T}X) \\ x\,(P^{2T}X) - y\,(P^{1T}X) \end{bmatrix} = 0,$$

where x = [x, y]^T and P^{iT} is the i-th row of P:

$$P = \begin{bmatrix} p_{11} & p_{12} & p_{13} & p_{14} \\ p_{21} & p_{22} & p_{23} & p_{24} \\ p_{31} & p_{32} & p_{33} & p_{34} \end{bmatrix} = \begin{pmatrix} P^{1T} \\ P^{2T} \\ P^{3T} \end{pmatrix}.$$

These equations are linear in the components of X. Using the two equations x×(PX) = 0 and x′×(P′X) = 0, an equation of the form AX = 0 can be written as follows:

$$A = \begin{bmatrix} x\,P^{3T} - P^{1T} \\ y\,P^{3T} - P^{2T} \\ x'\,P'^{3T} - P'^{1T} \\ y'\,P'^{3T} - P'^{2T} \end{bmatrix}, \qquad (2)$$

giving a total of four equations in four homogeneous unknowns. Since the solution is determined only up to scale, this is a redundant set of equations. The solution for X is the right unit singular vector corresponding to the smallest singular value of A [11].
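As a concrete illustration of Eqs. (1) and (2), the NumPy sketch below builds the consistent camera matrix pair from an estimated F (taking B = [e′]×) and triangulates matched points via the SVD solution of AX = 0. It is a minimal sketch under the paper's assumptions, not the authors' code; the function names are ours, and `F`, `inliers1`, `inliers2` refer to the hypothetical outputs of the earlier matching sketch.

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix [v]x such that [v]x @ u = v x u."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def consistent_cameras(F):
    """Camera pair of Eq. (1): P = [I|0], P' = [[e']x F | e'],
    where e' is the left null vector of F (the View 2 epipole)."""
    U, _, _ = np.linalg.svd(F)
    e2 = U[:, -1]                       # F.T @ e2 = 0 up to numerical noise
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = np.hstack([skew(e2) @ F, e2.reshape(3, 1)])
    return P1, P2

def triangulate(P1, P2, x1, x2):
    """Linear triangulation: solve A X = 0 with A as in Eq. (2).
    x1, x2 are pixel coordinates (x, y); X is returned in homogeneous
    coordinates (a projective reconstruction if P1, P2 are only consistent)."""
    A = np.array([x1[0] * P1[2] - P1[0],
                  x1[1] * P1[2] - P1[1],
                  x2[0] * P2[2] - P2[0],
                  x2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1]                       # right singular vector of smallest singular value

# Example usage on the inlier correspondences found earlier:
# P1, P2 = consistent_cameras(F)
# X_proj = np.array([triangulate(P1, P2, p, q) for p, q in zip(inliers1, inliers2)])
```

The homogeneous points are deliberately left un-normalized here, since they are only upgraded to metric coordinates in the next subsection.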

2) From Projective to Metric Reconstruction:

• Scene reconstruction: It is obvious that the i-th projectively reconstructed scene point Xi, in arbitrary homogeneous coordinates, cannot be directly used in robotic mapping. Here we denote the i-th scene point in the world frame by Xwi = [xwi, ywi, zwi]^T. In our algorithm, we employ the direct reconstruction technique proposed by Hartley et al. [11]. Its main idea is to recover a metric reconstruction directly from a projective reconstruction if "ground control points" are given. These so-called ground control points are points with known 3-D coordinates in the world coordinate frame, such that

Xwi = H^{−1} Xi,   (3)

where H is a 4×4 projective transform. In (3), each point correspondence provides 3 linearly independent equations on the elements of H. Since H has 15 degrees of freedom, a linear solution is obtained provided n ≥ 5 (and no 4 of the points are coplanar). Moreover, this projective transform H is the same for all the points. Therefore, using only 5 non-coplanar ground control points Xwi and comparing them with the corresponding reconstructed points Xi, the projective transform H is computed using (3), where i takes only 5 values out of the total n feature points [11]. Once H is computed, it can be applied to all the reconstructed points (in homogeneous coordinates), mapping these points to their counterparts in the world frame.

• Motion Reconstruction: In addition to 3-D feature point reconstruction, SLAM requires the robot pose as well. We assume that the robot frame is fixed w.r.t. the camera frame and is known. Then, once we determine the camera position and orientation, the robot pose is found as well. In this section, we apply DLT to uniquely determine the position (translation) and orientation (rotation) of the camera in a 2-D world coordinate frame. In other words, we need to evaluate the 3 parameters xv = (xv, zv, φ)^T. Since the relative position between the robot and the camera is fixed, we take xv to be the same as the robot motion parameters. All these parameters can be found if the "true" camera matrix Pri for the i-th image is known. Given world-to-image point correspondences {Xwi ↔ xi}, the camera projection matrix Pri can be calculated with the DLT algorithm and refined by maximum likelihood estimation [11]. The camera matrix can also be expressed as

Pri = Kci Rci [I | ti] = Kci [Rci | Rci ti],   (4)

where Kci stands for the calibration matrix of the i-th camera, Rci for its orientation w.r.t. the world coordinate system, and ti for its translation. To acquire the range and bearing of the feature points relative to the camera (robot), we need to compute Rci and ti at each sequential step (image), where i = 1, . . . , m. RQ decomposition [17] can evaluate the calibration matrix Kci and the rotation matrix Rci if Pri is given:

Kci Rci = RQ(Mi),   (5)

where Mi is the left 3×3 submatrix of Pri. Directly, the translation vector can be computed as follows:

ti = Rci^{−1} Kci^{−1} Ni,   (6)

where Ni is the right 3×1 subvector of Pri, i.e. Pri = [Mi | Ni]. Then xv = (xv, zv, φ)^T can be retrieved from Rci and ti for the EKF update.
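Both halves of this subsection, the metric upgrade of Eq. (3) and the camera recovery of Eqs. (4)–(6), can be prototyped in a few lines of NumPy/SciPy. The sketch below is our own illustration under the paper's assumptions, not the published implementation: fit_projective_transform solves Eq. (3) by DLT from n ≥ 5 ground control points, estimate_camera_dlt is a standard linear DLT resection (the maximum likelihood refinement cited from [11] is omitted), and decompose_camera applies RQ decomposition to recover Kci, Rci, and ti.

```python
import numpy as np
from scipy.linalg import rq

def fit_projective_transform(X_proj, X_world):
    """DLT estimate of the 4x4 transform H with X_proj ~ H X_world (Eq. (3)),
    from n >= 5 homogeneous 4-vector correspondences, no four of them coplanar."""
    rows = []
    for Xp, Xw in zip(X_proj, X_world):
        for j in range(3):                        # enforce parallelism of Xp and H Xw
            r = np.zeros(16)
            r[4 * j:4 * (j + 1)] = Xp[3] * Xw     # + Xp_4 * (H Xw)_j
            r[12:16] -= Xp[j] * Xw                # - Xp_j * (H Xw)_4
            rows.append(r)
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    return Vt[-1].reshape(4, 4)

def estimate_camera_dlt(X_world, x_img):
    """Linear DLT resection: 3x4 camera matrix P from >= 6 world-to-image
    correspondences {X_wi <-> x_i} (X_world Euclidean 3-vectors, x_img pixels)."""
    rows = []
    for X, x in zip(X_world, x_img):
        Xh = np.append(X, 1.0)
        rows.append(np.concatenate([np.zeros(4), -Xh, x[1] * Xh]))
        rows.append(np.concatenate([Xh, np.zeros(4), -x[0] * Xh]))
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    return Vt[-1].reshape(3, 4)

def decompose_camera(P):
    """Split P = [M | N] into K (calibration), R (rotation) and t, Eqs. (4)-(6)."""
    M, N = P[:, :3], P[:, 3]
    K, R = rq(M)                        # RQ decomposition, Eq. (5)
    S = np.diag(np.sign(np.diag(K)))    # resolve the sign ambiguity of RQ
    K, R = K @ S, S @ R
    t = np.linalg.solve(R, np.linalg.solve(K, N))   # t = R^-1 K^-1 N, Eq. (6)
    return K / K[2, 2], R, t

# Hypothetical usage: upgrade the projective points with 5 ground control
# points, then recover the camera pose from world-to-image correspondences.
# H = fit_projective_transform(X_proj[:5], X_world_ctrl)
# X_metric = (np.linalg.inv(H) @ X_proj.T).T
# X_metric /= X_metric[:, 3:4]
# K, R, t = decompose_camera(estimate_camera_dlt(X_metric[:, :3], pts_img))
```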

C. An Update Scheme of Visual SLAM

We propose a novel technique using MVG for feature initialization in vision based SLAM. Fig. 3(a) demonstrates the conventional procedure. If only one feature exists, the robot starts to observe the feature xf1 at the (k − 1)st step. Such a feature initialization, which must meet a sufficient-baseline requirement, is called delayed initialization in the bearing-only SLAM problem [4], [18]. Our scheme falls into this category. After the robot obtains the first observation z^1_k, the predicted location of the feature, xf1,k|k−1, is computed by the vector sum xf1,k|k−1 = xv,k|k−1 + z^1_k. Most algorithms use the point-to-point measurement model:

z^1_{k|k−1} = xf1 − xv,k|k−1.   (7)

Since the kth step is the first time an estimated location of the feature is obtained, xf1 = xf1,k|k−1, and thus z^1_k = z^1_{k|k−1}. Consequently, at the kth step no innovation ν^1_k = z^1_k − z^1_{k|k−1} is provided, and the state vector at this step cannot be updated. Put differently, xv,k|k = xv,k|k−1 and xf1,k|k = xf1,k|k−1. Thus in Fig. 3(a) the predicted pose of the vehicle (dashed triangle) overlaps the updated pose (dotted triangle) at both the (k − 1)st and kth steps, and there is no update for the feature at the kth step either (xf1,k|k = xf1,k|k−1). The update stage has to wait until the (k + 1)st step, when the feature is reobserved.

Fig. 3. Comparison between conventional feature initialization and the one using MVG. (a) Conventional initialization. (b) Initialization with the update scheme.

In contrast, if the MVG techniques are applied [19], the reconstructed locations of the feature, xf1,k, and of the camera (robot), xv,k, are determined at the kth step. In comparison with Eq. (7), the xf1,k provided by MVG does not drop the index k, because xf1,k is determined by MVG at the current kth step, whereas xf1 in Eq. (7) is calculated from the previous step. Thus both the predicted observation z^1_{k|k−1} and the observation z^1_k are available by subtraction:

z^1_{k|k−1} = xf1,k − xv,k|k−1,
z^1_k = xf1,k − xv,k,   (8)

where xv,k|k−1 is calculated from the pose prediction. xf1,k and xv,k will later be updated to xf1,k|k and xv,k|k respectively, which are the optimal estimates from the EKF. Once z^1_{k|k−1} and z^1_k are available, the innovation can be calculated.

TABLE I
ALGORITHM INTEGRATING MVG, THE EKF, AND THE UPDATE SCHEME

Inputs: state vector μk|k = [xv,k|k, xfi,k|k], state covariance Σk|k, control vector uk

• Prediction
  State prediction: μk+1|k = fk(xv,k|k, uk, 0, xfi)
  State error covariance prediction: Σk+1|k = Fμ,k Σk|k Fμ,k^T + FQ,k Qk FQ,k^T

FOR all observed features (i = 1, . . . , l)
  • Observation
    Observation of the i-th feature: z^i_{k+1} = hk+1(xv,k+1, xfi, wk+1)
    IF the i-th feature has never been seen before
      – Augment the new feature: xfi,k+1|k = g(xv,k+1|k, z^i_{k+1})
      – Augment the corresponding covariance:
        Σk+1|k = Y_{v,fi} [ Σk+1|k  0 ; 0  R ] Y_{v,fi}^T
    END IF (∗)
    Predicted observation of the i-th feature: z^i_{k+1|k} = hk+1(xr,k+1|k, p^i_{k+1}, 0)
    Innovation term: ν^i_{k+1} = z^i_{k+1} − z^i_{k+1|k}
    Kalman gain: K^i_{k+1} = Σk+1|k H^{iT}_{μ,k+1} (S^i_{k+1})^{−1}
  • Update
    Update the state vector with the i-th feature observation: μk+1|k = μk+1|k + K^i_{k+1} ν^i_{k+1}
    Update the state error covariance with the i-th feature: Σk+1|k = (I − K^i_{k+1} H^i_{μ,k+1}) Σk+1|k
END FOR

Prepare the return values: (∗∗)
  State vector: μk+1|k+1 = μk+1|k
  State error covariance: Σk+1|k+1 = Σk+1|k

Returns: μk+1|k+1 and Σk+1|k+1

Updating at the kth step is now possible. In Fig. 3(b), the update is performed at both the kth and (k + 1)st steps. The updated vehicle pose xv,k|k (dotted triangle) is closer to the true pose xv,k (solid triangle) than the predicted pose xv,k|k−1 is. Table I summarizes the algorithm integrating MVG, the EKF, and the proposed update scheme; the notation follows common usage in [2], [4], [20]. Note that conventional algorithms without the update scheme jump from the first asterisk (immediately after "END IF") to the second asterisk ("Prepare the return values"), which indicates that they cannot execute an update in the feature initialization stage. In contrast, thanks to the provisions of MVG, the proposed update scheme can be performed in the feature initialization stage. Therefore, the optimal-estimation capacity of the Kalman filter (prediction-update) is maximized.
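The "Update" rows of Table I are the standard EKF correction, which the proposed scheme can run already at feature initialization because Eq. (8) supplies both z^i_{k+1} and z^i_{k+1|k}. The sketch below is a generic illustration of that correction and of the feature-augmentation step, not the authors' implementation: the measurement Jacobian H, the measurement noise covariance R, and the augmentation Jacobian Y are hypothetical arguments that must come from the particular motion and measurement models, which the paper does not spell out.

```python
import numpy as np

def ekf_update(mu, Sigma, z, z_pred, H, R):
    """One EKF correction (the 'Update' rows of Table I).
    mu, Sigma : predicted state mean and covariance
    z, z_pred : observation and predicted observation (cf. Eq. (8))
    H, R      : measurement Jacobian and measurement noise covariance."""
    nu = z - z_pred                                  # innovation
    S = H @ Sigma @ H.T + R                          # innovation covariance
    K = Sigma @ H.T @ np.linalg.inv(S)               # Kalman gain
    mu_new = mu + K @ nu
    Sigma_new = (np.eye(len(mu)) - K @ H) @ Sigma
    return mu_new, Sigma_new

def augment_feature(mu, Sigma, x_f_new, Y, R):
    """Append a newly initialized feature x_f_new = g(x_v, z) to the state and
    propagate its covariance through the augmentation Jacobian Y (Table I)."""
    mu_aug = np.concatenate([mu, x_f_new])
    Sigma_stacked = np.block([[Sigma, np.zeros((len(mu), R.shape[0]))],
                              [np.zeros((R.shape[0], len(mu))), R]])
    Sigma_aug = Y @ Sigma_stacked @ Y.T
    return mu_aug, Sigma_aug
```

In the conventional flow, a newly augmented feature skips ekf_update until it is reobserved; with the MVG provisions, both arguments z and z_pred exist at initialization, so the correction can be applied immediately.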

III. EXPERIMENTS

In this section, a series of experiments is carried out to verify the proposed update scheme. We use a camera to simulate the mobile robot. Fig. 4(a) shows the camera and scene setup. An Elphel NC333L network camera is placed on a wooden board. Circles and diameters are drawn symmetrically on the board, and pre-designed step distances are labelled for the transformation of the camera. Three objects with known sizes are placed in the scene.


A global frame (the world coordinate system) is carefully selected to ensure that the camera (robot) traverses a 2-D plane and that the positions of the camera and the objects are convenient to measure. Fig. 4(b) demonstrates the measurement of the orientation of the camera using a laser pointer on a grid panel placed parallel to the front planes of the objects. Therefore, the ground truth of the camera and the scene is available at every time step when the camera is moved.

We now present the experiments on computer vision based metric reconstruction, which are further used for the verification of the EKF SLAM algorithm.

Fig. 4. Experiment setup. (a) Camera and scene setup. (b) Camera orientation measurement.

A. Metric Reconstruction

As outlined in Fig. 1, the image pairs are fed to the low-level image processing block. Fig. 5 illustrates the feature detection and matching using images 24 and 25. The cross symbols are the SIFT feature detection results. After the RANSAC examination, successfully matched feature points are labelled; these are later used for the fundamental matrix calculation. Such correspondence matching is accurate. Once we have the fundamental matrix, the consistent camera matrices can be retrieved by (1), which results in the projective reconstruction Xi of the scene by (2).

Fig. 5. Feature detection and matching using SIFT and RANSAC (images 24 and 25).

Fig. 6. Comparison between ground truth and reconstructed feature points in images 4 and 5 (3-D plot over the x, y, and z axes; numbered points show reconstructed results against ground truth). Average reconstruction error: 3.2 cm/point.

Fig. 7. Comparisons among ground truth ("+"), MVG reconstructed camera centres (triangles), and EKF estimates ("o"), plotted in the x-z plane.

In the first image pair, five randomly picked feature points are provided with their true locations as ground control points, so that the projective transform H is computed by (3). By this means, the projective reconstruction Xi is upgraded to the metric reconstruction Xwi in the world coordinate system. After the locations of these features are acquired in the first view pair, any five such features can be used as the ground control points to upgrade the projective reconstruction to a metric reconstruction in later image pairs. Fig. 6 shows the comparison between the metric reconstruction and the ground truth in images 4 and 5; the average reconstruction error is 3.2 cm/point. Consequently, we can use the scene reconstruction to calculate the camera centres by (4)–(6). The small triangles in Fig. 7 are the reconstructed camera centres. The average reconstruction error in norm is 3.4 cm/step.

B. Update Scheme Verification Using MVG Provisions

MVG provides scene and motion reconstructions to the EKF framework, so that at every time step the robot has both the true observation z^i_k and the predicted observation z^i_{k|k−1}, even during new feature initialization. This makes it possible to implement the proposed update scheme as discussed in Section II-C. In addition to the analysis of a 1-D monobot in [20], significant improvement results are illustrated in Fig. 8. The map-building estimates of two features are shown using both the proposed update scheme and the conventional acquire-and-augment method.


Fig. 8. Comparison of estimation errors between the update scheme and the conventional EKF (norm of 3-D error in cm versus time step, for Features 1 and 2 under each method). Average error using the scheme: Feature #1: 2.1 cm, #2: 1.9 cm; using the conventional EKF: #1: 2.8 cm, #2: 2.4 cm.

It is apparent that the contribution of the proposed update scheme is significant enough to benefit two essential factors in SLAM. First, for both features the initial uncertainty is reduced. Second, the initial uncertainty affects the later uncertainty, so at every time step the mapping error of a feature using the scheme is lower than its counterpart using the conventional method.

IV. CONCLUSION

We present a complete computer vision based framework to achieve the goal of SLAM using only a monocular vision sensor. We briefly address the crucial steps of both the computer vision background and the proposed update scheme. A series of experiments is executed to examine the feasibility and accuracy of the proposed algorithm for the SLAM problem. The significantly improved results indicate that the algorithm fulfils expectations well. Encouraged by this successful intermediate step, we are currently working on implementing the proposed algorithm on a real robot with a vision sensor.

REFERENCES

[1] R. Smith and P. Cheeseman, "On the representation of spatial uncertainty," International Journal of Robotics Research, vol. 5, pp. 56–68, 1986.
[2] M. W. M. G. Dissanayake, P. Newman, and S. Clark, "A solution to the simultaneous localization and map building (SLAM) problem," IEEE Transactions on Robotics and Automation, vol. 17, no. 3, pp. 229–241, June 2001.
[3] J. Castellanos, J. Neira, and J. Tardos, "Multisensor fusion for simultaneous localization and map building," IEEE Transactions on Robotics and Automation, vol. 17, no. 6, pp. 908–914, 2001.
[4] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics. Cambridge, MA, USA: MIT Press, 2005.
[5] M. Di-Marco, A. Garulli, A. Giannitrapani, and A. Vicino, "Simultaneous localization and map building for a team of cooperating robots: a set membership approach," IEEE Transactions on Robotics and Automation, vol. 19, no. 2, pp. 238–249, 2003.
[6] C. J. Harris and M. Stephens, "A combined corner and edge detector," in Proceedings of the 4th Alvey Vision Conference, Manchester, UK, 1988, pp. 147–151.
[7] C. S. Kenney, M. Zuliani, and B. S. Manjunath, "An axiomatic approach to corner detection," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, San Diego, CA, June 2005, pp. 191–197.
[8] D. G. Lowe, "Object recognition from local scale-invariant features," in Proceedings of the Seventh International Conference on Computer Vision, Kerkyra, Greece, September 1999, pp. 1150–1157.
[9] S. Se, D. Lowe, and J. Little, "Mobile robot localization and mapping with uncertainty using scale-invariant visual landmarks," International Journal of Robotics Research, vol. 21, no. 8, pp. 735–758, 2002.
[10] F. Nielsen, Visual Computing: Geometry, Graphics, and Vision. Hingham, MA, USA: Charles River Media Inc., 2005.
[11] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd Edition. Cambridge, United Kingdom: Cambridge University Press, 2004.
[12] G. Xu and Z. Zhang, Epipolar Geometry in Stereo, Motion and Object Recognition. Kluwer Academic Publishers, 1996.
[13] T. Moons, L. V. Gool, M. V. Diest, and E. Pauwels, Applications of Invariance in Computer Vision. Springer-Verlag, 1994, ch. Affine reconstruction from perspective image pairs.
[14] A. J. Davison and D. W. Murray, "Simultaneous localization and map-building using active vision," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 865–880, 2002.
[15] Open Source Computer Vision Library, Intel Corporation, 2001. [Online]. Available: http://developer.intel.com
[16] Q. T. Luong and T. Vieville, "Canonical representations for the geometries of multiple projective views," Computer Vision and Image Understanding, vol. 64, no. 2, pp. 193–199, September 1996.
[17] C. Popeea, B. Dumitrescu, and B. Jora, "An efficient algorithm for FIR filter bank completion," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Salt Lake City, Utah, May 2001, pp. 3629–3632.
[18] Z. Chen, J. Samarabandu, and R. Rodrigo, "Recent advances in simultaneous localization and map-building using computer vision," Advanced Robotics, vol. 21, no. 5, May 2007, to appear.
[19] Z. Chen and J. Samarabandu, "Using multiple view geometry within extended Kalman filter framework for simultaneous localization and map-building," in Proceedings of the IEEE International Conference on Mechatronics and Automation, Niagara Falls, Canada, July–August 2005, pp. 695–700.
[20] J. Andrade-Cetto, "Environment learning for indoor mobile robots," Ph.D. dissertation, Universitat Politecnica de Catalunya, 2003.
