Robot Positioning Using Camera-Space Manipulation With a Linear Camera Model

<ul><li><p>726 IEEE TRANSACTIONS ON ROBOTICS, VOL. 26, NO. 4, AUGUST 2010</p><p>Robot Positioning Using Camera-Space Manipulation With a Linear Camera Model</p><p>Juan Manuel Rendon-Mancha, Antonio Cardenas, Marco A. García, Emilio Gonzalez-Galvan, and Bruno Lara</p><p>Abstract: This paper presents a new version of the camera-space-manipulation (CSM) method. The set of nonlinear view parameters of the classic CSM is replaced with a linear model. Simulations and experiments show a similar precision error for the two methods. However, the new approach is simpler to implement and faster.</p><p>Index Terms: Camera matrix, camera-space manipulation (CSM), pinhole camera model, robot control, vision-based control.</p><p>I. INTRODUCTION</p><p>Within computer-vision research, an area that has received much attention is 3-D reconstruction. There are various methods to estimate 3-D structure from 2-D images, including shape from shading [1], [2], depth from focus [3], and stereovision [4]. When precision and speed are important for the reconstruction process, the usual choice has been stereovision [5]. Three-dimensional reconstruction by stereovision consists in estimating the 3-D structure of objects from two or more 2-D images of the world. The classic stereovision method works by first matching points across the images and then computing their 3-D position by triangulation [6]. In order to perform triangulation computations, a camera model is necessary. A camera model maps points from 3-D space to a 2-D image. For each specific camera used, the parameters of the camera model must be known; the computation of these parameters is known as the camera-calibration process [7]-[9].
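The matching-then-triangulation pipeline described above can be sketched numerically. The following is a minimal illustration, not code from the paper: it assumes two known 3x4 camera matrices with hypothetical values and a noise-free matched pixel pair, and recovers the 3-D point by linear (DLT) triangulation.

```python
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """Linear (DLT) triangulation: solve A X = 0 for the homogeneous 3-D point."""
    A = np.vstack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)          # null vector = last row of Vt
    X = Vt[-1]
    return X[:3] / X[3]                  # dehomogenize

# Two synthetic normalized cameras (illustrative values only).
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])               # camera at the origin
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])  # shifted 1 unit in X
X_true = np.array([0.3, -0.2, 4.0])

# Project the point into both cameras, then triangulate it back.
uv1 = P1 @ np.append(X_true, 1.0); uv1 = uv1[:2] / uv1[2]
uv2 = P2 @ np.append(X_true, 1.0); uv2 = uv2[:2] / uv2[2]
print(triangulate(P1, P2, uv1, uv2))     # recovers approximately [0.3, -0.2, 4.0]
```

With noisy or more than two views, the same construction yields a taller matrix A and the SVD gives the least-squares solution.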
In the literature, the most widely used model is the matrix associated with the pinhole camera model [10].</p><p>Throughout the years, computer-vision research has seen an evolution of camera-calibration algorithms, with new proposals improving robustness in different scenarios [11]-[13]. Most calibration methods make use of a calibration pattern to find the parameters of the camera model. However, in recent years, new methods that do not require calibration patterns have been developed [10], [14], [15].</p><p>Manuscript received January 20, 2010; revised May 10, 2010; accepted April 11, 2010. Date of publication June 14, 2010; date of current version August 10, 2010. This paper was recommended for publication by Associate Editor L. Villani and Editor K. Lynch upon evaluation of the reviewers' comments. This work was supported in part by the National Council of Science and Technology of Mexico and San Luis Potosí under Grant Fondos Mixtos FMSLP-2006-C01-6204, in part by the Universidad Autonoma de San Luis Potosí (UASLP) Research Funding Program under Grant CO8-FAI-10-23.59 and Grant C08-PIFI-05.18.18, and in part by the 2006-2008 academic development program under Grant UASLP-CA-78.</p><p>J. M. Rendon-Mancha and B. Lara are with the Facultad de Ciencias, Universidad Autonoma del Estado de Morelos, C.P. 62209 Cuernavaca, Mexico (e-mail:;</p><p>A. Cardenas, M. A. García, and E. Gonzalez-Galvan are with the Facultad de Ingeniería, Universidad Autonoma de San Luis Potosí, C.P. 78290 San Luis Potosí, Mexico (e-mail:;;</p><p>Color versions of one or more of the figures in this paper are available online at</p><p>Digital Object Identifier 10.1109/TRO.2010.2050518</p><p>Similarly, different methods that make use of visual information to control robot manipulators have been developed [16]. The use of 2-D images coming from cameras to control robotic mechanisms involves a mapping between different coordinate systems.
One such method is known as camera-space manipulation (CSM) [17]-[19], which proposes that many real-world tasks may be described by two or more camera-space tasks.</p><p>CSM relies on the estimation of a relationship between the position of certain cues on the robot manipulator and their corresponding position in images taken by at least two cameras. This relationship is implicit in the so-called observation equations. The traditional CSM method is based on the nominal kinematics model of the manipulator, the orthographic camera model, and an adjustment known as flattening that takes the perspective effect into account.</p><p>The CSM method does not provide a way to accurately determine the coordinates of a point in some 3-D Cartesian-coordinate system; instead, it provides a way to bring to closure a point A with a point B in camera space and, by this means, guarantee closure in 3-D physical space. The strategy to achieve CSM point-position control is to generate a trajectory for the movement of the robot arm. This trajectory should drive the robot joints to produce camera-space coincidence of points A (i.e., the target) and B (i.e., the end-effector). The inverse nominal kinematics estimates the angles θ_i, i = 1, . . . , 6, of the robot arm that produce this coincidence [20].</p><p>In contrast with other widely used vision-based robot-control techniques, such as teleoperation and visual servoing (VS), CSM does not require real-time updating of the state of the plant; its open-loop control, which is based on estimation techniques, makes the CSM scheme a reliable approach to establish a vision-based system for robot control. However, VS is normally faster and more robust than CSM when time delays are not present in the system [20].
Recently, the CSM method has been successfully implemented in a wide variety of fields, including space exploration [21], [22], warehouse environments [23], [24], industrial tasks [25]-[28], and mobile robots [20].</p><p>In this paper, the authors propose a modification to the classic CSM method by establishing the observation equations based on the matrix associated with the pinhole camera model [29] instead of a combination of the orthographic camera model and the flattening procedure. The latter will be referred to throughout this paper as the orthographic + flattening CSM (OFL-CSM).</p><p>It is important to distinguish between the new method and a previous pinhole-based method discarded in former works [20], [30]. That proposal used the nonlinear equations associated with the pinhole model. A detailed discussion can be found at the end of Section III-D.</p><p>The proposed modification, which is referred to throughout this paper as the linear camera model CSM (LCM-CSM), presents several advantages. First, the camera matrix is a linear model, which makes the estimation of the view parameters easier to implement and less computationally expensive. In addition, given sufficient data points, the linearity of the proposed method guarantees that a solution is found. Second, this estimation can be done in a single step, as opposed to the two steps necessary in the regular OFL-CSM method. Next, the proposed method does not require the manual estimation of any parameter. Finally, the proposed model can explicitly handle nonsquare pixels.</p><p>A further contribution of this paper is the rewriting of the classic CSM equations using the pinhole-model terminology, thus providing a novel analysis of the OFL-CSM method and a very close comparison with the proposed strategy.</p><p>This paper is organized as follows. Section II reviews the pinhole camera model. Section III outlines the traditional OFL-CSM method. In Section IV, the proposed LCM-CSM method is described.
Section V presents the software and hardware used as a testbed. Experimental results of three different analyses are reported in Section VI. Finally, Section VII is dedicated to conclusions.</p><p>1552-3098/$26.00 © 2010 IEEE</p></li><li><p>II. PINHOLE CAMERA MODEL</p><p>A linear relationship between points in 3-D space and their projections on a plane can be established based on the following well-known pinhole camera model [29]:</p><p>λ [x, y, 1]^T = [α_x s p_x; 0 α_y p_y; 0 0 1] [X, Y, Z]^T</p><p>where [x, y]^T is the point in the image, λ is a scale factor, [X, Y, Z]^T is the point in the 3-D coordinate system of the camera, α_x = m_x f and α_y = m_y f, where f is the focal distance, m_x is the ratio between the pixel width and the 3-D units (e.g., meters), m_y is the ratio between the pixel height and the 3-D units, p_x and p_y are the coordinates of the principal point, and s is the skew factor, which is zero if the axes of the sensor are orthogonal (i.e., the normal case).</p><p>In order to take into account the fact that the camera and the 3-D points have different reference frames, a Euclidean transform is added. Thus, we have</p><p>λ [x, y, 1]^T = [α_x s p_x; 0 α_y p_y; 0 0 1] [r_11 r_12 r_13 T_x; r_21 r_22 r_23 T_y; r_31 r_32 r_33 T_z] [X', Y', Z', 1]^T. (1)</p><p>Equation (1) has 11 degrees of freedom (DOF), given that the rotation matrix, i.e., r_11, r_12, . . . , r_33, has only 3 DOF. After multiplying the matrices of (1), we obtain the 3 x 4 matrix P [see (2)], which also has 11 DOF, given that it is homogeneous (it is defined up to scale) [29].</p><p>Thus, the relationship between a point in 3-D space and its projection in the image is given by the following equation:</p><p>λ [x, y, 1]^T = [p_11 p_12 p_13 p_14; p_21 p_22 p_23 p_24; p_31 p_32 p_33 p_34] [X', Y', Z', 1]^T (2)</p><p>where [x, y]^T is the point in the image, and [X', Y', Z']^T is the point in the world reference frame.
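As a concrete illustration of (2), a 3x4 camera matrix P maps a homogeneous world point to a homogeneous pixel, which is then normalized by its third component (the scale factor λ). The intrinsic and pose values below are hypothetical, chosen only to make the sketch runnable:

```python
import numpy as np

# Assumed intrinsics (hypothetical): alpha_x = alpha_y = 800, zero skew,
# principal point (320, 240); pose: identity rotation, translation (0, 0, 5).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
Rt = np.hstack([np.eye(3), np.array([[0.0], [0.0], [5.0]])])
P = K @ Rt                              # the 3x4 matrix of (2)

Xw = np.array([0.5, -0.25, 0.0, 1.0])   # homogeneous world point
x = P @ Xw                              # homogeneous pixel: lambda * [x, y, 1]
x = x[:2] / x[2]                        # divide by the scale factor lambda
print(x)                                # [400. 200.]
```

The division by the third component is exactly the elimination of λ between the three rows of (2).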
If p_34 is not zero, the matrix can be divided by p_34. To calculate the elements of P, at least six points are required (i = 1, 2, . . . , 6). However, five points and the x or y coordinate of a sixth point are enough to find the analytical solution for the 11 parameters of P.</p><p>If more than six points are used in computing P, an overconstrained system is obtained and a minimization process is required. An overconstrained system produces better results when estimating P, in the sense that the minimization process averages out the white noise in the estimation of P.</p><p>In order to increase accuracy, some methods perform camera calibration in two or more steps. In the method developed by Tsai [7]-[9], which is well known in the computer-vision community, the extrinsic parameters (i.e., rotation and translation in 3-D) are estimated first. An orthogonality constraint is used in a second step. Finally, the perspective parameter, i.e., the focal distance, is taken into account. This strategy has proved to provide a very accurate calibration.</p><p>III. ORTHOGRAPHIC + FLATTENING CAMERA-SPACE-MANIPULATION METHOD</p><p>The OFL-CSM method works by establishing and refining a mapping between the projections on camera space of visual cues located on the manipulator and the internal-joint configuration of the robot [17]. If a manipulator has to perform a positioning task, such a mapping has to be determined for each of at least two participating cameras.</p><p>In the original OFL-CSM method, this mapping is computed in two steps. In the first step, the method uses an orthographic camera model [31] and the nominal forward-kinematics model of the robot arm. In the second step, known as flattening, the effect of perspective projection is included in order to increase the precision of the orthographic model [30].</p><p>A.
First Step: Orthographic Camera Model</p><p>In an orthographic camera model, the distance between the camera and the target is critical for accuracy. The larger the distance, the better the approximation; as the distance goes to infinity, the model becomes exact.</p><p>The mapping of this first step is described by a set of view parameters C = [C_1, C_2, . . . , C_6]^T and is represented by the following nonlinear equations:</p><p>x_i = C_5 + (C_1^2 + C_2^2 - C_3^2 - C_4^2) R_Xi(θ) + 2(C_2 C_3 + C_1 C_4) R_Yi(θ) + 2(C_2 C_4 - C_1 C_3) R_Zi(θ)</p><p>y_i = C_6 + 2(C_2 C_3 - C_1 C_4) R_Xi(θ) + (C_1^2 - C_2^2 + C_3^2 - C_4^2) R_Yi(θ) + 2(C_3 C_4 + C_1 C_2) R_Zi(θ) (3)</p><p>where the coordinates (x_i, y_i)^1 represent the camera-space location of the center of the ith visual feature located on the end-effector of the robot arm. The position vector (R_Xi(θ), R_Yi(θ), R_Zi(θ)) describes the location of each visual feature's center relative to a reference frame fixed to the world, as given by the forward-kinematics model of the robot arm. The internal-joint configuration of an n-DOF arm is denoted by θ = [θ_1, θ_2, . . . , θ_n]^T. For convenience, (3) is rewritten as</p><p>x_i = h_x(R_Xi(θ), R_Yi(θ), R_Zi(θ); C), y_i = h_y(R_Xi(θ), R_Yi(θ), R_Zi(θ); C).</p><p>The view parameters C are estimated for each camera through the acquisition of a number m of simultaneous joint and camera-space samples by minimizing a scalar J over all C = [C_1, C_2, . . . , C_6]^T:</p><p>J(C) = Σ_{k=1}^{m} Σ_{i=1}^{n(k)} {[x_{i,k} - h_x(R_Xi(θ_k), R_Yi(θ_k), R_Zi(θ_k); C)]^2 + [y_{i,k} - h_y(R_Xi(θ_k), R_Yi(θ_k), R_Zi(θ_k); C)]^2} W_{i,k} (4)</p><p>where the coordinates (x_{i,k}, y_{i,k}) represent the detected camera-space coordinates of the ith visual feature located on the robot's end-effector in the camera for pose k.
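The mapping (3) can be sketched directly: the first four view parameters act like the components of an unnormalized quaternion whose induced scaled rotation projects the kinematic position onto the image plane, and C_5, C_6 shift the result. This is a minimal illustration for checking the algebra; the parameter values are made up, not from the paper:

```python
def h(C, RX, RY, RZ):
    """Orthographic CSM view model of (3): map a 3-D point from the robot's
    forward kinematics to camera-space coordinates (x, y)."""
    C1, C2, C3, C4, C5, C6 = C
    x = C5 + (C1**2 + C2**2 - C3**2 - C4**2) * RX \
           + 2 * (C2*C3 + C1*C4) * RY + 2 * (C2*C4 - C1*C3) * RZ
    y = C6 + 2 * (C2*C3 - C1*C4) * RX \
           + (C1**2 - C2**2 + C3**2 - C4**2) * RY + 2 * (C3*C4 + C1*C2) * RZ
    return x, y

# With C = [1, 0, 0, 0, ...] the "rotation" part is the identity, so the
# model reduces to a pure camera-space offset (C5, C6).
print(h([1.0, 0.0, 0.0, 0.0, 100.0, 50.0], 10.0, 20.0, 30.0))  # (110.0, 70.0)
```

Estimating C then amounts to fitting this function to the sampled pairs by minimizing the weighted residual J of (4), e.g., with a nonlinear least-squares routine.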
The matrix W_{i,k} contains the relative weight given to each of the measurements, to take into account the distance of the samples to the target [18], [20].</p><p>The orthographic projection of a 3-D point [X, Y, Z]^T onto the plane Z = 0 is given by the following equation:</p><p>[x, y]^T = [1 0 0; 0 1 0] [X, Y, Z]^T.</p><p>^1 Hereafter, a capital letter will denote a 3-D coordinate, and a lowercase letter will represent a camera-space coordinate.</p></li><li><p>The orthographic model of (3) can also be described as</p><p>[x, y]^T = m [r_11 r_12 r_13 T_x; r_21 r_22 r_23 T_y] [X', Y', Z', 1]^T = [m(r_11 X' + r_12 Y' + r_13 Z' + T_x); m(r_21 X' + r_22 Y' + r_23 Z' + T_y)]. (5)</p><p>This model is composed of a scale factor, an orthographic projection, and a 3-D rotation + translation; hence, it can be defined as</p><p>m r_11 = C_1^2 + C_2^2 - C_3^2 - C_4^2, m r_12 = 2(C_2 C_3 + C_1 C_4), m r_13 = 2(C_2 C_4 - C_1 C_3), m r_21 = 2(C_2 C_3 - C_1 C_4), m r_22 = C_1^2 - C_2^2 + C_3^2 - C_4^2, m r_23 = 2(C_3 C_4 + C_1 C_2), m T_x = C_5, m T_y = C_6, X' = R_Xi(θ), Y' = R_Yi(θ), Z' = R_Zi(θ) (6)</p><p>where m is an isotropic scale factor, which allows modeling only square-pixel cameras, r_11, r_12, . . . , r_23 are rotation-matrix terms having only 3 DOF, and T_x and T_y are the terms of a translation vector. Thus, the matrix equation has the same DOF as C_1, C_2, . . . , C_6.</p><p>B. Second Step: Flattening Procedure</p><p>A second step, known as flattening, is performed in order to increase the precision of the orthographic model [30].
The main contribution of this step is to include in the calculations the effect of perspective projection, thereby fitting the camera-space data to the orthographic camera model.</p><p>The complete (i.e., OFL) model can be described with the following equation:</p><p>λ [x, y, 1]^T = [m/Z_r 0 0; 0 m/Z_r 0; 0 0 m s_w] [r_11 r_12 r_13 T_x; r_21 r_22 r_23 T_y; r_31 r_32 r_33 T_z] [X', Y', Z', 1]^T (7)</p><p>where s_w can be 0 or 1. When s_w is set to 0, (7) corresponds to the orthographic model (only the first two rows remain, with λ = 1), and when it is set to 1, the equation takes the perspective effect into account. Equation (7) can be broken into the following two equations:</p><p>[X, Y, Z]^T = [m 0 0; 0 m 0; 0 0 m s_w] [r_11 r_12 r_13 T_x; r_21 r_22 r_23 T_y; r_31 r_32 r_33 T_z] [X', Y', Z', 1]^T (8)</p><p>λ [x, y, 1]^T = [1/Z_r 0 0; 0 1/Z_r 0; 0 0 1] [X, Y, Z]^T (9)</p><p>where [X, Y, Z]^T represents a point given in the 3-D camera coordinate system.</p><p>Fig. 1. Camera fixed-reference system.</p><p>C. Camera-Space-Manipulation Algorithm</p><p>The classic CSM algorithm is summarized as follows:</p><p>1) Orthographic stage: Iteratively estimate C by minimizing (8) with s_w = 0 for at least two cameras.</p><p>2) Flattening stage:</p><p>a) Set Zr at an...</p></li></ul>