
726 IEEE TRANSACTIONS ON ROBOTICS, VOL. 26, NO. 4, AUGUST 2010

Robot Positioning Using Camera-Space Manipulation With a Linear Camera Model

Juan Manuel Rendon-Mancha, Antonio Cardenas, Marco A. García, Emilio Gonzalez-Galvan, and Bruno Lara

Abstract—This paper presents a new version of the camera-space-manipulation (CSM) method. The set of nonlinear view parameters of the classic CSM is replaced with a linear model. Simulations and experiments show similar precision errors for the two methods. However, the new approach is simpler to implement and is faster.

Index Terms—Camera matrix, camera-space manipulation (CSM), pinhole camera model, robot control, vision-based control.

I. INTRODUCTION

Within computer-vision research, an area that has received much attention is 3-D reconstruction. There are various methods to estimate 3-D structure from 2-D images, including shape from shading [1], [2], depth from focus [3], and stereovision [4]. When precision and speed are important for the reconstruction process, stereovision has been the method of choice [5]. Three-dimensional reconstruction by stereovision consists of estimating the 3-D structure of objects from two or more 2-D images of the world. The classic stereovision method works by first matching points in different images and then computing their 3-D position by triangulation [6]. Triangulation requires a camera model, which maps points from a 3-D space to a 2-D image. For each specific camera used, the parameters of the camera model must be known; the computation of these parameters is known as the camera-calibration process [7]–[9]. In the literature, the most widely used model is the matrix associated with the pinhole camera model [10].

Over the years, camera-calibration algorithms have evolved, with new proposals improving robustness in different scenarios [11]–[13]. Most calibration methods use a calibration pattern to find the parameters of the camera model. However, in recent years, new methods that do not require calibration patterns have been developed [10], [14], [15].

Similarly, different methods that use visual information to control robot manipulators have been developed [16]. The use of 2-D images coming from cameras to control robotic mechanisms involves a mapping between different coordinate systems. One such method is known as camera-space manipulation (CSM) [17]–[19], which proposes that many real-world tasks may be described by two or more camera-space tasks.

Manuscript received January 20, 2010; revised May 10, 2010; accepted April 11, 2010. Date of publication June 14, 2010; date of current version August 10, 2010. This paper was recommended for publication by Associate Editor L. Villani and Editor K. Lynch upon evaluation of the reviewers’ comments. This work was supported in part by the National Council of Science and Technology of Mexico and San Luis Potosí under Grant Fondos Mixtos FMSLP-2006-C01-6204, in part by the Universidad Autonoma de San Luis Potosí (UASLP) Research Founding Program under Grant CO8-FAI-10-23.59 and Grant C08-PIFI-05.18.18, and in part by the 2006–2008 Academic development under Grant UASLP-CA-78.

J. M. Rendon-Mancha and B. Lara are with the Facultad de Ciencias, Universidad Autonoma del Estado de Morelos, C.P. 62209 Cuernavaca, Mexico.

A. Cardenas, M. A. García, and E. Gonzalez-Galvan are with the Facultad de Ingeniería, Universidad Autonoma de San Luis Potosí, C.P. 78290 San Luis Potosí, Mexico.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TRO.2010.2050518

CSM relies on the estimation of a relationship between the position of certain cues on the robot manipulator and their corresponding position in images taken by at least two cameras. This relationship is implicit in the so-called observation equations. The traditional CSM method is based on the nominal kinematics model of the manipulator, the orthographic camera model, and an adjustment known as flattening that takes into account the perspective effect.

The CSM method does not provide a way to accurately determine the coordinates of a point in some 3-D Cartesian coordinate system; instead, it provides a way to bring a point A into closure with a point B in camera space and, by this means, guarantee closure in 3-D physical space. The strategy for achieving CSM point-position control is to generate a trajectory for the movement of the robot arm. This trajectory should drive the robot joints to produce camera-space coincidence of points A (i.e., the target) and B (i.e., the end-effector). The inverse nominal kinematics estimates the joint angles $\theta_i$, $i = 1, \ldots, 6$, of the robot arm that produce this coincidence [20].

In contrast with other widely used vision-based robot-control techniques, such as teleoperation and visual servoing (VS), CSM does not require real-time updating of the state of the plant; its open-loop control, which is based on estimation techniques, makes the CSM scheme a reliable approach for establishing a vision-based robot-control system. However, VS is normally faster and is more robust than CSM when time delays are not present in the system [20]. Recently, the CSM method has been successfully implemented in a wide variety of fields, including space exploration [21], [22], warehouse environments [23], [24], industrial tasks [25]–[28], and mobile robots [20].

In this paper, the authors propose a modification of the classic CSM method that establishes the observation equations based on the matrix associated with the pinhole camera model [29] instead of a combination of the orthographic camera model and the flattening procedure. The latter is referred to throughout this paper as the orthographic + flattening CSM (OFL-CSM).

It is important to distinguish between the new method and a previous pinhole-based method that was discarded in earlier works [20], [30]. That proposal used the nonlinear equations associated with the pinhole model. A detailed discussion can be found at the end of Section III-D.

The proposed modification, which is referred to throughout this paper as the linear camera model CSM (LCM-CSM), presents several advantages. First, the camera matrix is a linear model, which makes the estimation of the view parameters easier to implement and less computationally expensive. In addition, given sufficient data points, the linearity of the proposed method guarantees that a solution is found. Second, this estimation can be done in a single step, as opposed to the two steps necessary in the regular OFL-CSM method. Third, the proposed method does not require the manual estimation of any parameter. Finally, the proposed model explicitly handles nonsquare pixels.

A further contribution of this paper is the rewriting of the classic CSM equations in pinhole-model terminology, which provides a novel analysis of the OFL-CSM method and a close comparison with the proposed strategy.

This paper is organized as follows. Section II reviews the pinhole camera model. Section III outlines the traditional OFL-CSM method. In Section IV, the proposed LCM-CSM method is described. Section V presents the software and hardware used as a testbed. Experimental results of three different analyses are reported in Section VI. Finally, Section VII is dedicated to conclusions.

1552-3098/$26.00 © 2010 IEEE


II. PINHOLE CAMERA MODEL

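A linear relationship between points in 3-D space and their projections onto a plane can be established based on the following well-known pinhole camera model [29]:

$$\begin{bmatrix}\rho x\\ \rho y\\ \rho\end{bmatrix} = \begin{bmatrix}\alpha_x & s & p_x & 0\\ 0 & \alpha_y & p_y & 0\\ 0 & 0 & 1 & 0\end{bmatrix}\begin{bmatrix}X\\ Y\\ Z\\ 1\end{bmatrix}$$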

where $[x, y]^T$ is the point in the image, $\rho$ is a scale factor, $[X, Y, Z]^T$ is the point in the 3-D coordinate system of the camera, and $\alpha_x = -m_x f$, $\alpha_y = -m_y f$, where $f$ is the focal distance, $m_x$ is the ratio between the pixel width and the 3-D units (e.g., pixels per meter), and $m_y$ is the ratio between the pixel height and the 3-D units. The coordinates of the principal point are $p_x$ and $p_y$, and $s$ is the skew factor, which is zero if the axes of the sensor are orthogonal (the usual case).

To take into account the fact that the camera and the 3-D points have different reference frames, a Euclidean transformation (rotation and translation) is added. Thus, we have

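$$\begin{bmatrix}\rho x\\ \rho y\\ \rho\end{bmatrix} = \begin{bmatrix}\alpha_x & s & p_x\\ 0 & \alpha_y & p_y\\ 0 & 0 & 1\end{bmatrix}\begin{bmatrix}r_{11} & r_{12} & r_{13} & T_x\\ r_{21} & r_{22} & r_{23} & T_y\\ r_{31} & r_{32} & r_{33} & T_z\end{bmatrix}\begin{bmatrix}X'\\ Y'\\ Z'\\ 1\end{bmatrix}. \tag{1}$$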

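Equation (1) has 11 degrees of freedom (DOF): five intrinsic parameters ($\alpha_x$, $\alpha_y$, $s$, $p_x$, $p_y$), three for the rotation (the nine entries $r_{11}, r_{12}, \ldots, r_{33}$ of the rotation matrix have only 3 DOF), and three for the translation ($T_x$, $T_y$, $T_z$). After multiplying the matrices of (1), we obtain the matrix $P_{3\times 4}$ [see (2)], which also has 11 DOF because it is homogeneous: it is defined only up to scale, so its 12 entries carry one redundant overall factor [29].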

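Thus, the relationship between the point in 3-D space and its projection in the image is given by the following equation:

$$\rho\begin{bmatrix}x\\ y\\ 1\end{bmatrix} = \begin{bmatrix}p_{11} & p_{12} & p_{13} & p_{14}\\ p_{21} & p_{22} & p_{23} & p_{24}\\ p_{31} & p_{32} & p_{33} & p_{34}\end{bmatrix}\begin{bmatrix}X'\\ Y'\\ Z'\\ 1\end{bmatrix} \tag{2}$$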

where $[x, y]^T$ is the point in the image and $[X', Y', Z']^T$ is the point in the world reference frame. If $p_{34}$ is not zero, the matrix can be divided by $p_{34}$, which leaves 11 unknown parameters. Each point provides two equations (one for $x$ and one for $y$); therefore, at least six points are required to calculate the elements of P ($i = 1, 2, \ldots, 6$). Strictly, five points plus the $x$ or the $y$ coordinate of a sixth point (i.e., 11 equations) are enough to find the analytical solution for the 11 parameters of P.
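To make (1) and (2) concrete, the following Python sketch builds $P = K[R \mid T]$ and projects a world point by dividing by the third row ($\rho$). The function names and the numeric values (pixel scales with aspect ratio 0.8, pose, principal point) are illustrative assumptions, not the authors' settings. It also shows that scaling P leaves the projection unchanged, which is why P has only 11 DOF.

```python
import math


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def camera_matrix(ax, ay, s, px, py, R, T):
    """Build the 3x4 matrix P = K [R | T] of (1)-(2)."""
    K = [[ax, s, px], [0.0, ay, py], [0.0, 0.0, 1.0]]
    Rt = [list(R[i]) + [T[i]] for i in range(3)]
    return matmul(K, Rt)


def project(P, Xw):
    """Apply (2): rho [x, y, 1]^T = P [X', Y', Z', 1]^T, then divide by rho."""
    Xh = list(Xw) + [1.0]
    u, v, rho = (sum(p * c for p, c in zip(row, Xh)) for row in P)
    return u / rho, v / rho


def rot_z(t):
    """Rotation by angle t (radians) about the camera Z axis."""
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


# Hypothetical intrinsics: alpha_x = -m_x f, alpha_y = -m_y f, aspect ratio 0.8.
P = camera_matrix(-800.0, -640.0, 0.0, 320.0, 240.0, rot_z(0.3), [0.1, -0.2, 5.0])
x1 = project(P, [0.5, 0.2, 1.0])
x2 = project([[2.0 * p for p in row] for row in P], [0.5, 0.2, 1.0])
print(x1, x2)  # the same image point: P is only defined up to scale
```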

If more than six points are used in computing P, an overconstrained system is obtained, and a minimization process is required. An overconstrained system produces better estimates of P, in the sense that the minimization process averages out white noise in the measurements.
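The linear estimation described above can be sketched as follows, assuming $p_{34} = 1$ and plain Python. Each correspondence contributes the two rows obtained by multiplying out (2) and eliminating $\rho$; with more than 11 equations, the normal equations $A^T A\,p = A^T b$ give the least-squares solution. The helper names and test values are hypothetical, and a QR- or SVD-based solver would be numerically preferable in practice.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def estimate_P(world, image):
    """Least-squares estimate of P (with p34 = 1) from six or more correspondences."""
    A, b = [], []
    for (X, Y, Z), (x, y) in zip(world, image):
        # x (p31 X + p32 Y + p33 Z + 1) = p11 X + p12 Y + p13 Z + p14, likewise for y
        A.append([X, Y, Z, 1.0, 0.0, 0.0, 0.0, 0.0, -x * X, -x * Y, -x * Z])
        b.append(x)
        A.append([0.0, 0.0, 0.0, 0.0, X, Y, Z, 1.0, -y * X, -y * Y, -y * Z])
        b.append(y)
    AtA = [[sum(r[i] * r[j] for r in A) for j in range(11)] for i in range(11)]
    Atb = [sum(r[i] * v for r, v in zip(A, b)) for i in range(11)]
    p = solve(AtA, Atb)
    return [p[0:4], p[4:8], p[8:11] + [1.0]]


def project(P, Xw):
    """Apply (2) and divide by the third row."""
    u, v, rho = (sum(pi * c for pi, c in zip(row, list(Xw) + [1.0])) for row in P)
    return u / rho, v / rho


P_true = [[1.2, 0.1, -0.3, 0.5], [0.05, 0.9, 0.2, -0.4], [0.01, -0.02, 0.03, 1.0]]
world = [(X, Y, Z) for X in (0.0, 1.0) for Y in (0.0, 1.0) for Z in (0.0, 1.0)]
world += [(0.5, 0.2, 0.7), (0.3, 0.8, 0.4)]  # 10 points: 20 equations, 11 unknowns
image = [project(P_true, Xw) for Xw in world]
P_est = estimate_P(world, image)
```

Because the problem is linear in the unknowns, the least-squares minimum is unique whenever the stacked matrix has full column rank, so no initial guess is needed.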

To increase accuracy, some methods perform camera calibration in two or more steps. In the method developed by Tsai [7]–[9], which is well known in the computer-vision community, the extrinsic parameters (i.e., rotation and translation in 3-D) are estimated first. An orthogonality constraint is used in a second step. Finally, the perspective parameter, i.e., the focal distance, is taken into account. This strategy has proved to provide very accurate calibration.

III. ORTHOGRAPHIC + FLATTENING CAMERA-SPACE-MANIPULATION METHOD

The OFL-CSM method works by establishing and refining a mapping between the camera-space projections of visual cues located on the manipulator and the internal-joint configuration of the robot [17]. If a manipulator has to perform a positioning task, such a mapping has to be determined for each of at least two participating cameras.

In the original OFL-CSM method, this mapping is computed in two steps. In the first step, the method uses an orthographic camera model [31] and the nominal forward kinematics model of the robot arm. In the second step, known as flattening, the effect of perspective projection is included in order to increase the precision of the orthographic model [30].

A. First Step: Orthographic Camera Model

In an orthographic camera model, the distance between the camera and the target is critical for accuracy. The larger the distance, the better the approximation; as the distance goes to infinity, the model becomes exact.
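The effect of distance on the orthographic approximation can be checked numerically. The sketch below assumes a scaled-orthographic (weak-perspective) reference, $x \approx (f/Z_0)X$ with a fixed reference depth $Z_0$, which is a common textbook stand-in rather than the exact CSM model, and measures the worst relative gap to the true perspective projection $x = fX/Z$ for an object with a fixed depth spread. The function name and values are hypothetical.

```python
def max_relative_gap(Z0, depth_offsets, f=1.0, X=0.5):
    """Worst |x_persp - x_ortho| / |x_ortho| over points at depths Z0 + dz."""
    x_ortho = f * X / Z0  # orthographic projection with a fixed scale f / Z0
    return max(abs(f * X / (Z0 + dz) - x_ortho) / abs(x_ortho) for dz in depth_offsets)


spread = [-0.2, -0.1, 0.0, 0.1, 0.2]  # object depth variation, same units as Z0
gaps = [max_relative_gap(Z0, spread) for Z0 in (1.0, 10.0, 100.0, 1000.0)]
print(gaps)  # shrinks roughly like 0.2 / Z0 as the camera moves away
```

The relative gap equals $|dz|/(Z_0 + dz)$, so it vanishes only in the limit of infinite distance, which matches the statement above.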

The mapping of this first step is described by a set of “view parameters” given by $C = [C_1, C_2, \ldots, C_6]^T$ and is represented by the following nonlinear equations:

$$\begin{aligned}
x_i &= C_5 + (C_1^2 + C_2^2 - C_3^2 - C_4^2)\,R_{Xi}(\Theta) + 2(C_2C_3 + C_1C_4)\,R_{Yi}(\Theta) + 2(C_2C_4 - C_1C_3)\,R_{Zi}(\Theta)\\
y_i &= C_6 + 2(C_2C_3 - C_1C_4)\,R_{Xi}(\Theta) + (C_1^2 - C_2^2 + C_3^2 - C_4^2)\,R_{Yi}(\Theta) + 2(C_3C_4 + C_1C_2)\,R_{Zi}(\Theta)
\end{aligned} \tag{3}$$

where the coordinates $(x_i, y_i)$¹ represent the camera-space location of the center of the $i$th visual feature located on the end-effector of the robot arm. The position vector $(R_{Xi}(\Theta), R_{Yi}(\Theta), R_{Zi}(\Theta))$ describes the location of each visual feature’s center relative to a reference frame fixed to the world, as given by the forward kinematic model of the robot arm. The internal-joint configuration of an $n$-DOF arm is denoted by $\Theta = [\theta_1, \theta_2, \ldots, \theta_n]^T$. For convenience, (3) is rewritten as

$$[x_i, y_i]^T = \mathbf{h}\big(R_{Xi}(\Theta), R_{Yi}(\Theta), R_{Zi}(\Theta); C\big), \qquad \mathbf{h} = [h_x, h_y]^T.$$

The view parameters $C$ are estimated for each camera by acquiring $m$ simultaneous joint and camera-space samples and minimizing a scalar $J$ over all $C = [C_1, C_2, \ldots, C_6]^T$:

$$J(C) = \sum_{k=1}^{m}\left[\sum_{i=1}^{n(k)}\left\{\left[x_{i,k} - h_x\big(R_{Xi}(\Theta_k), R_{Yi}(\Theta_k), R_{Zi}(\Theta_k); C\big)\right]^2 + \left[y_{i,k} - h_y\big(R_{Xi}(\Theta_k), R_{Yi}(\Theta_k), R_{Zi}(\Theta_k); C\big)\right]^2\right\} W_{i,k}\right] \tag{4}$$

where $(x_{i,k}, y_{i,k})$ are the detected camera-space coordinates, in the camera, of the $i$th visual feature located on the robot’s end-effector for pose $k$. The term $W_{i,k}$ contains the relative weight given to each measurement to take into account the distance of the samples to the target [18], [20].
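As a sketch of how (3) and (4) fit together, the following evaluates $h_x$, $h_y$ and the weighted cost $J(C)$. It assumes each weight $W_{i,k}$ is a scalar and stores each pose $k$ as a list of $(x_{i,k}, y_{i,k}, (R_{Xi}, R_{Yi}, R_{Zi}), W_{i,k})$ tuples; that data layout and the sample values are hypothetical. The iterative minimization of $J$ over $C$ is not shown.

```python
def h(C, R):
    """Orthographic CSM model (3): camera-space (x, y) of a feature at world position R."""
    C1, C2, C3, C4, C5, C6 = C
    RX, RY, RZ = R
    x = (C5 + (C1**2 + C2**2 - C3**2 - C4**2) * RX
         + 2 * (C2 * C3 + C1 * C4) * RY + 2 * (C2 * C4 - C1 * C3) * RZ)
    y = (C6 + 2 * (C2 * C3 - C1 * C4) * RX
         + (C1**2 - C2**2 + C3**2 - C4**2) * RY + 2 * (C3 * C4 + C1 * C2) * RZ)
    return x, y


def J(C, poses):
    """Weighted sum of squared camera-space residuals, as in (4)."""
    total = 0.0
    for features in poses:            # k = 1, ..., m
        for x, y, R, W in features:   # i = 1, ..., n(k)
            hx, hy = h(C, R)
            total += ((x - hx) ** 2 + (y - hy) ** 2) * W
    return total


C = [0.9, 0.1, -0.2, 0.05, 10.0, -5.0]
R = (1.0, 2.0, 3.0)
x, y = h(C, R)
print(J(C, [[(x, y, R, 1.0)]]))        # 0.0 for exact data
print(J(C, [[(x + 1.0, y, R, 2.0)]]))  # about 2.0: one unit of x error with weight 2
```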

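The orthographic projection of a 3-D point $[X, Y, Z]^T$ onto a plane perpendicular to the $Z$ axis is given by the following equation:

$$\begin{bmatrix}x\\ y\end{bmatrix} = \begin{bmatrix}1 & 0 & 0\\ 0 & 1 & 0\end{bmatrix}\begin{bmatrix}X\\ Y\\ Z\end{bmatrix}.$$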

¹Hereafter, a capital letter denotes a 3-D coordinate, and a lowercase letter represents a camera-space coordinate.


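The “orthographic” model of (3) can also be described as

$$\begin{bmatrix}x\\ y\end{bmatrix} = m\begin{bmatrix}r_{11} & r_{12} & r_{13} & T_x\\ r_{21} & r_{22} & r_{23} & T_y\end{bmatrix}\begin{bmatrix}X'\\ Y'\\ Z'\\ 1\end{bmatrix} = \begin{bmatrix}m r_{11}X' + m r_{12}Y' + m r_{13}Z' + m T_x\\ m r_{21}X' + m r_{22}Y' + m r_{23}Z' + m T_y\end{bmatrix}. \tag{5}$$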

This model is composed of a scale factor, an orthographic projection, and a 3-D rotation + translation; hence, it can be defined as

$$\begin{aligned}
m r_{11} &= C_1^2 + C_2^2 - C_3^2 - C_4^2\\
m r_{12} &= 2(C_2C_3 + C_1C_4)\\
m r_{13} &= 2(C_2C_4 - C_1C_3)\\
m r_{21} &= 2(C_2C_3 - C_1C_4)\\
m r_{22} &= C_1^2 - C_2^2 + C_3^2 - C_4^2\\
m r_{23} &= 2(C_3C_4 + C_1C_2)\\
m T_x &= C_5\\
m T_y &= C_6\\
X' &= R_{Xi}(\Theta)\\
Y' &= R_{Yi}(\Theta)\\
Z' &= R_{Zi}(\Theta)
\end{aligned} \tag{6}$$

where $m$ is an isotropic scale factor, which allows only square-pixel cameras to be modeled, $r_{11}, r_{12}, \ldots, r_{23}$ are rotation-matrix terms having only 3 DOF, and $T_x$ and $T_y$ are terms of a translation vector. Thus, the matrix equation has the same number of DOF (six: one for $m$, three for the rotation, and two for $T_x$ and $T_y$) as $C_1, C_2, \ldots, C_6$.
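The correspondence in (6) can be verified numerically. The sketch below relies on a standard property of Euler parameters (quaternions): for unnormalized $C_1, \ldots, C_4$, the two rows in (6) equal $m$ times two orthonormal rotation rows, with $m = C_1^2 + C_2^2 + C_3^2 + C_4^2$. It then checks that (5), written with $m$, $r_{ij}$, $T_x = C_5/m$, and $T_y = C_6/m$, reproduces (3). The values of $C$ and of the world point are arbitrary.

```python
def scaled_rows(C):
    """Left-hand sides m*r_1j and m*r_2j of (6), built from C1..C4."""
    C1, C2, C3, C4 = C[:4]
    row1 = [C1**2 + C2**2 - C3**2 - C4**2, 2 * (C2 * C3 + C1 * C4), 2 * (C2 * C4 - C1 * C3)]
    row2 = [2 * (C2 * C3 - C1 * C4), C1**2 - C2**2 + C3**2 - C4**2, 2 * (C3 * C4 + C1 * C2)]
    return row1, row2


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


C = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
m = sum(c * c for c in C[:4])         # isotropic scale factor
mr1, mr2 = scaled_rows(C)
r1 = [v / m for v in mr1]             # unit rotation rows r_1j and r_2j
r2 = [v / m for v in mr2]
Tx, Ty = C[4] / m, C[5] / m           # from m*T_x = C5 and m*T_y = C6

Xp = [0.7, -1.3, 2.5]                 # a world point (X', Y', Z')
x5 = m * (dot(r1, Xp) + Tx)           # model (5)
y5 = m * (dot(r2, Xp) + Ty)
x3 = C[4] + dot(mr1, Xp)              # model (3) with the same coefficients
y3 = C[5] + dot(mr2, Xp)
print(m, dot(r1, r1), dot(r2, r2), dot(r1, r2))  # 30.0, ~1, ~1, ~0
```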

B. Second Step: Flattening Procedure

A second step, known as flattening, is performed in order to increase the precision of the orthographic model [30]. The main contribution of this step is to include the effect of perspective projection in the calculations, thereby fitting the camera-space data to the orthographic camera model.

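The complete model (i.e., OFL) can be described with the following equation:

$$\rho\begin{bmatrix}x\\ y\\ 1\end{bmatrix} = \begin{bmatrix}Z_r m & 0 & 0\\ 0 & Z_r m & 0\\ 0 & 0 & s_w\end{bmatrix}\begin{bmatrix}r_{11} & r_{12} & r_{13} & T_x\\ r_{21} & r_{22} & r_{23} & T_y\\ r_{31} & r_{32} & r_{33} & T_z\end{bmatrix}\begin{bmatrix}X'\\ Y'\\ Z'\\ 1\end{bmatrix} \tag{7}$$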

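where $s_w$ can be 0 or 1. When $s_w$ is set to 0, (7) corresponds to the orthographic model, and when it is set to 1, the equation takes into account the perspective effect. Equation (7) can be broken into the following two equations:

$$\begin{bmatrix}X\\ Y\\ Z\end{bmatrix} = \begin{bmatrix}m & 0 & 0\\ 0 & m & 0\\ 0 & 0 & s_w\end{bmatrix}\begin{bmatrix}r_{11} & r_{12} & r_{13} & T_x\\ r_{21} & r_{22} & r_{23} & T_y\\ r_{31} & r_{32} & r_{33} & T_z\end{bmatrix}\begin{bmatrix}X'\\ Y'\\ Z'\\ 1\end{bmatrix} \tag{8}$$

$$\rho\begin{bmatrix}x\\ y\\ 1\end{bmatrix} = \begin{bmatrix}Z_r & 0 & 0\\ 0 & Z_r & 0\\ 0 & 0 & 1\end{bmatrix}\begin{bmatrix}X\\ Y\\ Z\end{bmatrix} \tag{9}$$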

where $[X, Y, Z]^T$ represents a point given in the 3-D camera coordinate system.

Fig. 1. Camera fixed-reference system.

C. Camera-Space-Manipulation Algorithm

The classic CSM algorithm is summarized as follows:

1) Orthographic stage: Iteratively estimate $C$ by minimizing (4), using (8) with $s_w = 0$, for at least two cameras.

2) Flattening stage:
   a) Set $Z_r$ to an approximate value, known as the flattening distance (see Fig. 1).
   b) Estimate $Z$ for a set of reference points using (8) with $s_w = 1$.
   c) Compute the “flattened points” $X$, $Y$ using (9).
   d) Use the computed $X$, $Y$ in (8), with $s_w = 0$, to reestimate $C$ [by minimizing (4)].
   e) If convergence has not been reached, go to step b).

The points $X$, $Y$ are called the flattened points; in effect, flattening means converting the real projections $x$, $y$ into orthographic projections $X$, $Y$. At this point, the effect of perspective has already been taken into account. The OFL model has seven parameters, because $Z_r$, which is introduced for flattening, is added to $C_1, \ldots, C_6$. The final estimation of the complete model is very accurate even if $Z_r$ is not initialized precisely, because the parameter $m$ [see (7)] absorbs the error. For more details, see [19], [30], and [32].
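Steps b) and c) can be illustrated under an idealized assumption: if the depth $Z$ from (8) with $s_w = 1$ were exact, then inverting (9) ($\rho = Z$, so $X = xZ/Z_r$ and $Y = yZ/Z_r$) would turn a perspective observation from (7) into exactly the orthographic value $m(r_{11}X' + r_{12}Y' + r_{13}Z' + T_x)$ of (5). The parameter values below are hypothetical, and in the actual algorithm $Z$ comes from the current model estimate rather than from ground truth.

```python
import math


def rot_x(t):
    """Rotation by angle t (radians) about the X axis."""
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def flatten(x, y, Z, Zr):
    """Invert (9): rho = Z, so X = x Z / Zr and Y = y Z / Zr."""
    return x * Z / Zr, y * Z / Zr


m, Zr = 0.9, 1500.0                       # hypothetical scale and flattening distance
R, T = rot_x(0.25), [20.0, -35.0, 1400.0]
Xp = [100.0, 50.0, -80.0]                 # world point (X', Y', Z')

Z = dot(R[2], Xp) + T[2]                  # depth from (8) with s_w = 1
x = Zr * m * (dot(R[0], Xp) + T[0]) / Z   # perspective observation, (7) with s_w = 1
y = Zr * m * (dot(R[1], Xp) + T[1]) / Z

X_flat, Y_flat = flatten(x, y, Z, Zr)
X_ortho = m * (dot(R[0], Xp) + T[0])      # orthographic model (5)
Y_ortho = m * (dot(R[1], Xp) + T[1])
print(X_flat - X_ortho, Y_flat - Y_ortho)  # both close to zero
```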

D. Comparison Between Proposed and Classic Methods

Equation (7), with $s_w = 1$, can be expressed in nonmatrix form as follows:

$$\begin{aligned}
x &= Z_r\frac{X}{Z} = Z_r m\,\frac{r_{11}X' + r_{12}Y' + r_{13}Z' + T_x}{r_{31}X' + r_{32}Y' + r_{33}Z' + T_z}\\
y &= Z_r\frac{Y}{Z} = Z_r m\,\frac{r_{21}X' + r_{22}Y' + r_{23}Z' + T_y}{r_{31}X' + r_{32}Y' + r_{33}Z' + T_z}
\end{aligned} \tag{10}$$

To compare the pinhole model with the OFL model, it can be assumed in (1) that $s = p_x = p_y = 0$; performing the matrix multiplication and dividing by the third row, we have

$$\begin{aligned}
x &= \alpha_x\,\frac{r_{11}X' + r_{12}Y' + r_{13}Z' + T_x}{r_{31}X' + r_{32}Y' + r_{33}Z' + T_z}\\
y &= \alpha_y\,\frac{r_{21}X' + r_{22}Y' + r_{23}Z' + T_y}{r_{31}X' + r_{32}Y' + r_{33}Z' + T_z}
\end{aligned} \tag{11}$$

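From the comparison of (10) and (11), it can be seen that the OFL model is similar to the pinhole model: the two coincide when $\alpha_x = \alpha_y = Z_r m$, i.e., the OFL model corresponds to a pinhole camera with square pixels, zero skew, and the principal point fixed at the image origin.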
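A short numerical check of this comparison, under the assumption $s = p_x = p_y = 0$: evaluating (10) and (11) for the same pose shows that they coincide when $\alpha_x = \alpha_y = Z_r m$, whereas a nonsquare-pixel pinhole camera ($\alpha_y = 0.8\,\alpha_x$, an illustrative choice) cannot be matched by the single scale $Z_r m$. Pose and parameter values are hypothetical.

```python
import math


def rot_y(t):
    """Rotation by angle t (radians) about the Y axis."""
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def ofl(m, Zr, R, T, Xp):
    """OFL model (10), i.e., (7) with s_w = 1."""
    den = dot(R[2], Xp) + T[2]
    return (Zr * m * (dot(R[0], Xp) + T[0]) / den,
            Zr * m * (dot(R[1], Xp) + T[1]) / den)


def pinhole(ax, ay, R, T, Xp):
    """Pinhole model (11): (1) with s = p_x = p_y = 0."""
    den = dot(R[2], Xp) + T[2]
    return (ax * (dot(R[0], Xp) + T[0]) / den,
            ay * (dot(R[1], Xp) + T[1]) / den)


m, Zr = 0.9, 1500.0
R, T = rot_y(-0.4), [15.0, 40.0, 1200.0]
Xp = [60.0, -25.0, 90.0]

a = ofl(m, Zr, R, T, Xp)
b = pinhole(Zr * m, Zr * m, R, T, Xp)        # square pixels: identical to (10)
c = pinhole(Zr * m, 0.8 * Zr * m, R, T, Xp)  # nonsquare pixels: y no longer matches
print(a, b, c)
```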


Fig. 2. Typical CSM system.

In the seminal CSM papers, which appeared in 1986 [17], [33], the view parameters $C$ were estimated using an orthographic camera model, on the assumption that the cameras were set far from the visual marks and that long-focal-length lenses were used. This model suited the specific application tackled in those papers. Later, in 1997 [30], the flattening procedure was introduced to take into account the perspective effect, thus fitting the camera-space data to the orthographic camera model. This flattening is an additional step on top of the first approach. Also in that work, a different set of six view parameters was introduced, based on the Euler parameters plus the nonlinear relationship associated with the pinhole camera model (i.e., $x = fX/Z$, $y = fY/Z$). Additionally, a solution for the estimation of these view parameters using a minimization process was proposed. However, problems of numerical instability were reported to be associated with this solution [30], [34]. These results have since served as the argument for systematically discarding, in CSM works, the pinhole model in the form presented in [30].

However, to the best of our knowledge, using the 11 parameters of the matrix associated with the pinhole model as the new set of view parameters had never been considered.

IV. CAMERA-SPACE MANIPULATION WITH THE PINHOLE MODEL

Instead of the combination of the orthographic camera model and the flattening procedure, the authors propose to establish the relationship based on the linear pinhole camera model. In other words, we modify the CSM method to use the 11 coefficients of the matrix P ($p_{11}, \ldots, p_{33}$, with $p_{34} = 1$) instead of the view parameters $C_1, \ldots, C_6$ and $Z_r$. This modification of the CSM method represents the main contribution of this paper.

This new model presents several advantages. First, the camera matrix P is a linear model; as detailed in Section II, P is calculated directly from five and a half points as a linear system. This is done in only one step instead of the two optimization steps of the standard OFL-CSM model. Second, the proposed method does not require the prior estimation of any parameter (such as $Z_r$ in the OFL-CSM). Furthermore, if more than five and a half points are used, an overconstrained linear system is obtained, and a minimization method, such as linear least squares, can be used to obtain the P parameters. Since the model equation is linear, the minimum obtained is global, and the minimization process is stable. Finally, as can be seen from the description of the model, this method can handle nonsquare pixels.

V. EXPERIMENTAL IMPLEMENTATION

This section describes the experimental procedure used to validate the proposed strategy. In Section V-A, the hardware testbed is described. Section V-B describes the software implementation.

A. Hardware Setup

A vision-based control system was implemented for a 6-DOF industrial robot. The whole system consists of two charge-coupled-device (CCD) cameras fixed to the walls of the laboratory, a laser mounted on a 2-DOF pan-tilt unit (PTU), a PUMA 761 industrial robot, and a PC-based controller (see Fig. 2). The PC executes the vision-based control algorithms as well as the control algorithms for the robot arm.

The original morphology of the PUMA 761 has been retained. A cylindrical structure containing a 6 × 3 grid of LEDs was added to the end-effector. This array is used to locate the end-effector in the images. A pulsewidth-modulated (PWM) ICM-1900 Interconnect Module is part of the new controller hardware. A DMC 1800 PCI controller is used for the PC-based control. The image-acquisition hardware is a PCI frame-grabber board, and the cameras are Sony SSC-M383 high-resolution black-and-white cameras with 1/3-in CCDs and Rainbow 1639 VCS varifocal lenses (6.5–82.5 mm). The operating system used was Fedora Core 4 (FC4).

B. Software Setup

A graphical user interface (GUI) that permits a remote user to exercise high-level supervisory control over the system was developed. The system was programmed in Java and C++ using the Data Translation software-development kit for frame-grabber boards [35]. The aim of the GUI is to allow the human supervisor to identify a surface feature of interest as a target point and to facilitate the use of the system by enabling operator control over the PTU and the laser device.

The GUI displays an image on the computer monitor, allowing the user to view the workspace of the robot and to point and click on the displayed image. The user selects a point of interest on the image, and the GUI leaves a mark on the screen to denote the selected point (see Fig. 3). The user can either “OK” the selection or refresh the screen.

Once the user is satisfied with the target position, as it appears on the surface of interest viewed on the monitor, selecting “OK” continues the process. A laser spot is then driven by the system, using VS control, so that it is physically centered on the designated target feature. Then, the CSM controller takes over and commands the robot to position the tip of the end-effector at the point targeted by the laser spot.

VI. EXPERIMENTAL RESULTS

A. Sensitivity Analysis Using Simulated Data

To test the sensitivity of the two methods to noise and other factors, a set of simulations using synthetic data was performed. Note that no perspective effect is modeled in the data: the LCM-CSM is capable of handling such an effect, but the OFL-CSM would then need the flattening procedure and, therefore, the manual calibration of the parameter $Z_r$.

The simulation was performed by first generating a set of random Euler parameters to build a rotation matrix and then generating a set of ten random points in 3-D. Using the rotation matrix and an orthographic projection matrix, the image projections of these ten random points were computed.
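The data-generation step of one simulation run can be sketched as follows. Normalizing four random Euler parameters gives a rotation matrix whose first two rows follow (6) with $m = 1$; the third row is taken as their cross product. The Gaussian draw of the Euler parameters, the symmetric uniform sampling range, the optional Gaussian noise, and applying the aspect ratio to the $y$ coordinate are all assumptions made for illustration, since the text does not specify these details; the function names are hypothetical.

```python
import math
import random


def rotation_from_euler_params(q):
    """Rotation matrix from normalized Euler parameters (rows 1-2 as in (6), m = 1)."""
    n = math.sqrt(sum(v * v for v in q))
    e0, e1, e2, e3 = (v / n for v in q)
    r1 = [e0**2 + e1**2 - e2**2 - e3**2, 2 * (e1 * e2 + e0 * e3), 2 * (e1 * e3 - e0 * e2)]
    r2 = [2 * (e1 * e2 - e0 * e3), e0**2 - e1**2 + e2**2 - e3**2, 2 * (e2 * e3 + e0 * e1)]
    r3 = [r1[1] * r2[2] - r1[2] * r2[1],  # r3 = r1 x r2 keeps the frame right-handed
          r1[2] * r2[0] - r1[0] * r2[2],
          r1[0] * r2[1] - r1[1] * r2[0]]
    return [r1, r2, r3]


def simulate_run(rng, span=10.0, n_points=10, aspect=1.0, noise=0.0):
    """One run: random rotation, random 3-D points, orthographic image projections."""
    R = rotation_from_euler_params([rng.gauss(0.0, 1.0) for _ in range(4)])
    points = [[rng.uniform(-span, span) for _ in range(3)] for _ in range(n_points)]
    image = []
    for P in points:
        x = sum(r * c for r, c in zip(R[0], P)) + rng.gauss(0.0, noise)
        y = aspect * sum(r * c for r, c in zip(R[1], P)) + rng.gauss(0.0, noise)
        image.append((x, y))
    return R, points, image


rng = random.Random(0)
R, points, image = simulate_run(rng, span=10.0)                # "local" samples
_, far_points, _ = simulate_run(rng, span=1000.0, aspect=0.8)  # "nonlocal", nonsquare
```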

Both CSM models, i.e., LCM-CSM and OFL-CSM, are initialized using the artificial 3-D points and their image projections. Using the calibrated models and the 3-D points, new image projections are computed for the points. The error is defined as the difference between the original image projection and the model-computed image projection for each of the points.

Fig. 3. Point-and-click interface.

All these steps constitute a single simulation run, which allows the analysis of the ten randomly generated points. To make the sensitivity analysis reflect factors of the actual setup, eight different scenarios (2 × 2 × 2) were constructed from variations of the following factors.

1) Noise presence: Both methods are tested without noise and with noise added to the original projections, thereby simulating measurement errors.

2) Square versus nonsquare pixels: Two pixel aspect ratios were used, i.e., 1 and 0.8. The latter is the estimated aspect ratio of the cameras used.

3) Local versus nonlocal samples: To initialize the models, the artificial points are used to span either a local region or a wider area. A point was considered local when the magnitudes of its X-, Y-, and Z-components were all between 0 and 10 units. For the nonlocal samples, these magnitudes vary between 0 and 1000 units.

To validate the analysis, it is necessary to ensure that the simulation includes points that are widespread over a range of values. For this reason, each scenario was run using 100 different rotation matrices, each with a different set of ten random points. The result is eight sets containing 1000 points each. Using the exact projections and the projections approximated by the two models, the sum-squared error (sse) and variance ($\sigma^2$) were calculated. The results are summarized in Tables I and II.
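The two summary statistics can be computed as sketched below. The text does not say exactly how the errors are aggregated, so this sketch assumes that each point's error is the Euclidean distance between its exact and model-computed projections, that sse sums the squared distances, and that $\sigma^2$ is the population variance of the distances; `error_summary` is a hypothetical helper.

```python
import math
import statistics


def error_summary(exact, approx):
    """Return (sse, variance) of the per-point projection errors."""
    errors = [math.dist(p, q) for p, q in zip(exact, approx)]
    sse = sum(e * e for e in errors)
    return sse, statistics.pvariance(errors)


print(error_summary([(0.0, 0.0), (1.0, 1.0)], [(3.0, 4.0), (1.0, 1.0)]))  # (25.0, 6.25)
```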

As expected, when no noise is added to the artificial data, neither method shows any error. The exception is the OFL-CSM with nonsquare pixels and nonlocal samples, which indicates that the method is not well suited to this configuration.

In the presence of noise, the results vary across scenarios. For square pixels (see Table I), the effect of the distance between samples is negligible, although it affects the LCM-CSM more significantly. In both cases, the accuracy increases as the distance between samples increases. For nonsquare pixels (see Table II), the separation distance among the samples becomes a key factor, affecting the performance of the two methods in opposite directions.

TABLE I. RESULTS OF SIMULATION OVER 1000 POINTS USING A PIXEL ASPECT RATIO OF 1 (SQUARE PIXELS)

TABLE II. RESULTS OF SIMULATION OVER 1000 POINTS USING A PIXEL ASPECT RATIO OF 0.8 (NONSQUARE PIXELS)

Fig. 4. Error as a function of the separation distance among points.

The results in Table II show that the OFL-CSM produces better results when the artificial points are local and loses accuracy when these points span a wider area. It is worth noting that, in the scenario with nonsquare pixels and nonlocal samples, the presence of noise does not seem to be critical. The LCM-CSM performs best when the points are far apart.

To show the effect of the distance between 3-D points for both methods, a series of simulations was run varying this distance, using nonsquare pixels and added noise. The results are shown in Fig. 4, where the sse is plotted against the distance between data points. The sse is the average of ten runs (i.e., 100 data points) for each distance.

When the distance between points is too small, the LCM-CSM estimate is not accurate; however, beyond a minimal distance, the model becomes accurate and shows a small error. The OFL-CSM, on the other hand, is accurate for local samples but shows a steep increase in the error as the distance grows.

TABLE III. RESULTS OF THE SENSITIVITY ANALYSIS IN REAL DATA OVER TEN SAMPLES USING A PIXEL ASPECT RATIO OF 0.8 (NONSQUARE PIXELS)

Fig. 5. Final positioning error.

As can be seen, both methods show variations in the error when the distance among points is of the same order of magnitude as the added noise (i.e., ten units). The LCM-CSM performs worse in this region because the image center (principal point) is a free parameter of the model: noisy data and local samples, combined with an image center that is implicit in the model, produce a poor estimation of the parameters. For the OFL-CSM, this point is fixed, which makes the model less sensitive to these conditions; nevertheless, small variations are still present.

B. Sensitivity Analysis Using Experimental Data

A sensitivity analysis using experimental data was performed for the OFL-CSM and LCM-CSM methods, using local and nonlocal samples. For each case, a set of ten points is used. The camera has nonsquare pixels with an aspect ratio of 0.8. The results are summarized in Table III.

These results confirm the findings of the sensitivity analysis using simulated data reported in the previous section.

C. Positioning Test

In order to experimentally evaluate the performance of the proposed method, a series of positioning tests was performed. The accuracy of the system using the regular OFL-CSM was used as a benchmark to evaluate the LCM-CSM.

The experiments consisted of the 3-D positioning of the end-effector of a PUMA robot at a point targeted by a laser spot controlled through the GUI described in Section V-B. For each method, task, and axis, the difference between the reached point (measured physically along each Cartesian axis) and the desired point (indicated by the laser spot) was calculated (see Fig. 5).

In all of these tests, the end-effector of the robot was driven autonomously to the selected target position. The parameters of the model were updated locally with the data acquired from the visual features at the end-effector of the robot arm as it approached the target. It is important to note that the visual features span an area wide enough to prevent their locality from becoming a critical factor in the performance of the LCM-CSM.

Fig. 6. SSE for each method and each axis.

Fig. 7. Variance of the error for each method and each axis.

A set of 200 positioning tasks was executed and documented, i.e., 100 tests to evaluate the OFL-CSM approach and 100 tests for the LCM-CSM. Fig. 6 shows the sse for each of the axes (X, Y, Z) and for each of the two methods.

From Fig. 6, it can be observed that the OFL-CSM method is more accurate on only one of the axes. However, a Student’s t-test on the mean Euclidean distance for the two methods shows that the two results are not statistically different ($t(198) = 1.40$, $p = 0.16$). The overall error for the OFL-CSM is $e_{\mathrm{OFL}} = 0.97$, and for the LCM-CSM, $e_{\mathrm{LCM}} = 0.88$. Fig. 7 shows the variance, again for each of the axes (X, Y, Z) and for each of the two methods.

The OFL-CSM method shows a smaller variance for two of the three axes, and for the total error, the difference is negligible (i.e., $\sigma^2_{\mathrm{OFL}} = 0.19$ and $\sigma^2_{\mathrm{LCM}} = 0.15$). This is confirmed by the results of an F-test on the variances ($F(99, 99) = 0.79$, $p = 0.24$).
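For reference, the two test statistics can be computed from per-test errors as sketched below; the helper names are hypothetical, and p-values, which require the t and F distribution functions (available, e.g., in SciPy's `scipy.stats`), are not computed. As a consistency check on the reported numbers, $0.15/0.19 \approx 0.79$, which matches the reported $F(99, 99)$ for the ratio LCM over OFL, and with 100 tests per method the pooled t-test has $100 + 100 - 2 = 198$ degrees of freedom.

```python
import math
import statistics


def euclidean_errors(per_axis):
    """Per-test Euclidean distance from (ex, ey, ez) positioning errors."""
    return [math.sqrt(ex * ex + ey * ey + ez * ez) for ex, ey, ez in per_axis]


def pooled_t(a, b):
    """Student's two-sample t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2


def f_ratio(a, b):
    """Variance ratio var(a) / var(b) and its (len(a) - 1, len(b) - 1) degrees of freedom."""
    return statistics.variance(a) / statistics.variance(b), (len(a) - 1, len(b) - 1)


print(euclidean_errors([(3.0, 4.0, 0.0)]))         # [5.0]
print(pooled_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))
print(f_ratio([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))   # (1.0, (2, 2))
```

With the experimental data, `f_ratio(lcm_errors, ofl_errors)` would correspond to the ordering used for the reported F value.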

VII. CONCLUSION

In this paper, an improvement of the CSM method is proposed that introduces a different, linear set of view parameters. In this new LCM-based CSM strategy, the observation equation is established using the matrix associated with the pinhole camera model instead of the standard CSM combination of orthographic camera model and flattening (OFL).

An important and necessary assumption of the orthographic camera model is that, to achieve convergence, the distance between the focal point of the camera and the center of the 3-D world space must be large enough for the perspective effect to be neglected. A flattening procedure is used to overcome this problem. Furthermore, the OFL-CSM strategy requires an initial estimate of the parameter $Z_r$, which must be made manually. The parameter analysis in this paper shows that the error produced by an inaccurate estimate of $Z_r$ is absorbed by the parameters $C_1, C_2, \ldots, C_6$. The presented rewriting of the CSM equations in pinhole-model terminology clarifies the correspondence between the parameters of both models.

The modification proposed in this paper for the classic CSM method presents several advantages. The first advantage is that the camera matrix is a linear model, with a global-minimum solution. At the same time, the linearity of the model makes it easier to implement than the OFL-CSM. The second advantage derives from the fact that the proposed method is based on the pinhole camera model, which can handle perspective effects. This allows all parameters to be computed in a single step, as opposed to the two-step algorithm of the OFL-CSM method. Furthermore, these two facts make the LCM-CSM method less computationally expensive. Another advantage is that the proposed method takes into account the fact that pixels can be nonsquare.

To test the sensitivity of the two methods, a series of simulations and a test with the real robot were performed, showing the strengths and weaknesses of both methods. In summary, with a pixel aspect ratio of 1, both methods perform equally well, with the OFL-CSM showing a slightly lower sse. However, for cameras with a pixel aspect ratio of 0.8, i.e., nonsquare pixels, the LCM-CSM performs better as the distance between points increases.

Both methods were implemented on an industrial 6-DOF robot manipulator to perform a large set of positioning tasks. Based on the experimental results and the statistical analysis, we conclude that the two approaches, OFL-CSM and LCM-CSM, are indistinguishable with respect to final positioning error.

REFERENCES

[1] B. K. P. Horn, “Obtaining shape from shading information,” in The Psychology of Computer Vision, P. H. Winston, Ed. New York: McGraw-Hill, 1975, pp. 5–109.

[2] B. K. P. Horn and M. J. Brooks, Eds., Shape From Shading. Cambridge, MA: MIT, 1990.

[3] J. Ens and P. Lawrence, “An investigation of methods for determining depth from focus,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 15, no. 2, pp. 97–108, Feb. 1993.

[4] U. R. Dhond and J. K. Aggarwal, “Structure from stereo: A review,” in Autonomous Mobile Robots: Perception, Mapping, and Navigation, vol. 1, S. S. Iyengar and A. Elfes, Eds. Los Alamitos, CA: IEEE Comput. Soc. Press, 1991, pp. 25–46.

[5] M. Z. Brown, D. Burschka, and G. D. Hager, “Advances in computational stereo,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 8, pp. 993–1008, 2003.

[6] R. I. Hartley and P. Sturm, “Triangulation,” Comput. Vis. Image Understanding, vol. 68, no. 2, pp. 146–157, 1997.

[7] R. Y. Tsai, “An efficient and accurate camera calibration technique for 3-D machine vision,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Miami Beach, FL, 1986, pp. 364–374.

[8] R. Y. Tsai, “A versatile camera calibration technique for high-accuracy 3-D machine vision metrology using off-the-shelf TV cameras and lenses,” IEEE J. Robot. Autom., vol. RA-3, no. 4, pp. 323–344, Aug. 1987.

[9] R. Y. Tsai and R. K. Lenz, “A new technique for fully autonomous and efficient 3-D robotics hand/eye calibration,” IEEE Trans. Robot. Autom., vol. 5, no. 3, pp. 345–358, Jun. 1989.

[10] O. D. Faugeras, Q.-T. Luong, and S. J. Maybank, “Camera self-calibration: Theory and experiments,” in Proc. Eur. Conf. Comput. Vis., 1992, pp. 321–334. [Online]. Available: citeseer.ist.psu.edu/faugeras92camera.html

[11] K.-Y. K. Wong, P. R. Mendonca, and R. Cipolla, “Camera calibration from surfaces of revolution,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 2, pp. 147–161, Feb. 2003.

[12] J.-Y. Guillemaut, A. S. Aguado, and J. Illingworth, “Using points at infinity for parameter decoupling in camera calibration,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 2, pp. 265–270, Feb. 2005.

[13] M. Agrawal and L. S. Davis, “Camera calibration using spheres: A semi-definite programming approach,” in Proc. 9th IEEE Int. Conf. Comput. Vis., 2003, vol. 2, pp. 782–789.

[14] L. Quan and B. Triggs, “A unification of autocalibration methods,” in Proc. Asian Conf. Comput. Vis., Jan. 2000, pp. 917–922. [Online]. Available: http://lear.inrialpes.fr/pubs/2000/QT00

[15] M. Chandraker, S. Agarwal, F. Kahl, D. Nister, and D. Kriegman, “Autocalibration via rank-constrained estimation of the absolute quadric,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., 2007, pp. 1–8.

[16] P. I. Corke. (1994). Visual control of robot manipulators—A review. [Online]. Available: citeseer.ist.psu.edu/corke94visual.html

[17] S. B. Skaar, W. H. Brockman, and R. Hanson, “Camera space manipulation,” Int. J. Robot. Res., vol. 6, no. 4, pp. 20–32, 1987.

[18] S. B. Skaar, W. H. Brockman, and W. S. Jang, “Three dimensional camera space manipulation,” Int. J. Robot. Res., vol. 9, no. 4, pp. 22–39, 1990.

[19] E. J. Gonzalez-Galvan, S. B. Skaar, and M. J. Seelinger, “Efficient camera-space target disposition in a matrix of moments structure using camera-space manipulation,” Int. J. Robot. Res., vol. 18, no. 8, pp. 809–818, 1999.

[20] A. Cardenas, M. Seelinger, B. Goodwine, and S. B. Skaar, “Vision-based control of a mobile base and on-board arm,” Int. J. Robot. Res., vol. 22, no. 9, pp. 677–698, 2003.

[21] E. T. Baumgartner and P. S. Schenker, “Autonomous image-plane robot control for Martian lander operations,” in Proc. IEEE Int. Conf. Robot. Autom., 1996, pp. 726–731.

[22] M. L. Robinson, E. T. Baumgartner, K. M. Nickels, and T. E. Litwin, “Hybrid image plane/stereo (HIPS) manipulation for robotic space applications,” Auton. Robots, vol. 23, no. 2, pp. 83–96, Aug. 2007.

[23] M. Seelinger and J.-D. Yoder, “Automatic visual guidance of a forklift engaging a pallet,” Robot. Auton. Syst., vol. 54, no. 12, pp. 1026–1038, 2006.

[24] M. Seelinger, J.-D. Yoder, E. T. Baumgartner, and S. B. Skaar, “High precision visual control of mobile manipulators,” IEEE Trans. Robot. Autom., vol. 18, no. 6, pp. 957–965, Dec. 2002.

[25] Z. Fan, “Industrial applications of camera space manipulation with structured light,” Ph.D. dissertation, Univ. Notre Dame, Notre Dame, IN, 2003.

[26] M. Seelinger, E. Gonzalez-Galvan, M. Robinson, and S. Skaar, “Towards a robotic plasma spraying operation using vision,” IEEE Trans. Robot. Autom., vol. 5, no. 4, pp. 33–46, Dec. 1998.

[27] A. Loredo-Flores, E. J. Gonzalez-Galvan, J. J. Cervantes-Sanchez, and A. Martinez-Soto, “Optimization of industrial, vision-based, intuitively generated robot point-allocating tasks using genetic algorithms,” IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 38, no. 4, pp. 600–608, Jul. 2008.

[28] E. J. Gonzalez-Galvan, A. Loredo-Flores, J. J. Cervantes-Sanchez, L. A. Aguilera-Cortes, and S. B. Skaar, “An optimal path-generation algorithm for manufacturing of arbitrarily curved surfaces using uncalibrated vision,” Robot. Comput.-Integr. Manuf., vol. 24, no. 1, pp. 77–91, 2008.

[29] R. I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd ed. Cambridge, U.K.: Cambridge Univ. Press, 2004.

[30] E. J. Gonzalez-Galvan, S. B. Skaar, U. A. Korde, and W. Z. Chen, “Application of a precision enhancing measure in 3-D rigid-body positioning using camera-space manipulation,” Int. J. Robot. Res., vol. 16, no. 2, pp. 240–257, 1997.

[31] B. Horn, Robot Vision. Cambridge, MA: MIT, 1986.

[32] E. J. Gonzalez-Galvan, F. Pazos, S. Skaar, and A. Cardenas-Galindo, “Camera pan/tilt to eliminate the workspace-size/pixel-resolution trade-off with camera-space manipulation,” Robot. Comput.-Integr. Manuf., vol. 18, no. 2, pp. 95–104, 2002.


[33] S. B. Skaar, R. Hanson, and W. H. Brockman, “An adaptive vision-based manipulator control scheme,” in Proc. AIAA Guid., Navigation Control Conf., 1986, pp. 608–614.

[34] P. Puget and T. Skordas, “An optimal solution for mobile camera calibration,” in Proc. First Eur. Conf. Comput. Vis. London, U.K.: Springer-Verlag, 1990, pp. 187–198.

[35] Data Translation, Frame Grabber SDK User Manual. Marlboro, MA: Data Translation, 1997.

Adaptive Task-Space Tracking Control of Robots Without Task-Space- and Joint-Space-Velocity Measurements

Xinwu Liang, Xinhan Huang, Min Wang, and Xiangjin Zeng

Abstract—Task-space tracking control of robots without exact knowledge of their kinematics and dynamics has been studied before under the assumption that joint velocities are available for controller design. However, velocity measurements can be contaminated by noise, resulting in poor system performance or even instability. Therefore, in this paper, we propose a new task-space tracking controller for robots that uses neither task-space nor joint-space velocity measurements, under the condition that both the robot kinematics and dynamics are unknown. To compensate for the missing velocity measurements, we use well-known sliding-observer design techniques to estimate the joint velocities for the controller. Our main concern, the stability analysis of the controller combined with the sliding observer, is carried out using Lyapunov analysis and the sliding-patch concept. Simulation results are presented to show the performance of the proposed controller–observer design.

Index Terms—Adaptive control, asymptotically stable, sliding observer, sliding patch, task-space tracking control.

Manuscript received October 17, 2009; revised April 6, 2010; accepted May 23, 2010. Date of publication July 12, 2010; date of current version August 10, 2010. This paper was recommended for publication by Associate Editor I.-M. Chen and Editor G. Oriolo upon evaluation of the reviewers’ comments. This work was supported by the National Natural Science Foundation of China under Project 60873032.

The authors are with the Intelligent Control and Robotics Laboratory, Department of Control Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, China.

Digital Object Identifier 10.1109/TRO.2010.2051594

I. INTRODUCTION

Most robot controllers proposed or used today are based on joint space, i.e., both the performance specifications and the control laws are defined in joint space [1]–[5]. However, almost all tasks performed by robots are defined in task space, such as Cartesian space or sensor/image space. When cameras are used to obtain position measurements, Cartesian-space control is also known as position-based visual servoing, which is sensitive to the precision of the robot and camera calibration, while image-space control is known as image-based visual servoing [6]–[9]. Hence, in traditional robot controllers, task-space specifications must be converted into joint-space specifications by solving inverse-kinematics problems. However, even when very high precision is achieved in joint space, high task precision cannot be achieved if the system parameters are uncertain. This is usually the case, especially when cameras are used without tedious calibration. To enhance task-control precision, many researchers have studied task-space control of robot manipulators with uncertain system parameters, including both kinematic and dynamic parameters.
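A planar two-link arm illustrates this argument: high joint-space precision does not guarantee task-space precision when kinematic parameters are uncertain. In this minimal sketch, the joint command comes from closed-form inverse kinematics using nominal link lengths, while the hand position is evaluated with the true lengths. Both sets of lengths (in meters) are hypothetical. Even if an ideal joint-space controller reaches the commanded angles exactly, the end effector misses the target.

```python
import numpy as np


def forward_kinematics(q, l1, l2):
    """End-effector position of a planar two-link arm."""
    return np.array([l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1]),
                     l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1])])


def inverse_kinematics(x, l1, l2):
    """Closed-form inverse kinematics (branch with q2 >= 0)."""
    c2 = (x @ x - l1**2 - l2**2) / (2.0 * l1 * l2)
    q2 = np.arccos(np.clip(c2, -1.0, 1.0))
    q1 = np.arctan2(x[1], x[0]) - np.arctan2(l2 * np.sin(q2), l1 + l2 * np.cos(q2))
    return np.array([q1, q2])


x_desired = np.array([1.2, 0.6])   # target in task space (m)
true_lengths = (1.0, 0.8)          # hypothetical actual link lengths (m)
nominal_lengths = (0.95, 0.85)     # hypothetical, slightly wrong model (m)

# Joint command computed from the nominal model ...
q_cmd = inverse_kinematics(x_desired, *nominal_lengths)
# ... reached exactly by an ideal joint-space controller, yet the real arm's
# hand lands elsewhere because its true link lengths differ.
x_reached = forward_kinematics(q_cmd, *true_lengths)
task_error = float(np.linalg.norm(x_reached - x_desired))
print(f"task-space error: {task_error:.3f} m")
```

With these hypothetical values the miss is several centimeters, although the joint-space error is zero by construction.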

For the regulation problem, [10] presents an approximate Jacobian proportional–integral–derivative (PID) control law for setpoint control of a robot with uncertain kinematics. Feedback control laws for setpoint control of a robot with both uncertain kinematics and uncertain gravitational force are proposed in [11] and [12]. Instead of joint-space damping, two approximate Jacobian control laws with task-space damping are proposed in [13]. These laws, however, require the exact task-space velocity. This can be problematic in practice because task-space measurements are noisier than joint-space measurements. To overcome this problem, [14] uses an estimated task-space velocity in the feedback loop, obtained by applying the approximate Jacobian transformation to the exact joint-space velocity. In addition, [13] uses both the transpose-Jacobian and the inverse-Jacobian strategies, whereas [14] uses only the transpose-Jacobian strategy; as stated in [15], the two strategies are dual. All of the above regulation laws exploit a static estimate of the Jacobian to achieve task-space regulation objectives. Hence, [16] proposes adaptive Jacobian control laws that eliminate the assumption that a best-guess estimate of the Jacobian is available. Along the same lines, [17] designs a similar, computationally simple strategy. Moreover, [18] develops an amplitude-limited torque-input controller, with an emphasis on actuator constraints in addition to uncertain kinematics and dynamics. Such a strategy is very helpful when system input constraints must be considered.
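The sketch below makes the approximate-Jacobian idea concrete with a velocity-level (kinematic) simplification of transpose-Jacobian regulation. Joint velocities are commanded as q_dot = -K Ĵ(q)^T (x - x_d), where x is the measured task-space position and Ĵ is built from wrong link lengths. The sketch ignores robot dynamics and gravity and is not any of the torque-level controllers of [10]–[18]. The planar two-link arm, the gain K, and the length errors are hypothetical. For the values below, the arm stays away from singularities and the product of the true and estimated Jacobians remains positive definite. Under those conditions the task-space error still converges to zero, even though the Jacobian is only approximate.

```python
import numpy as np


def fk(q, lengths):
    """Task-space position of a planar two-link arm."""
    l1, l2 = lengths
    return np.array([l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1]),
                     l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1])])


def jacobian(q, lengths):
    """Analytic Jacobian d(fk)/dq for the given link lengths."""
    l1, l2 = lengths
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                     [l1 * c1 + l2 * c12, l2 * c12]])


def regulate(q0, x_d, true_lengths, est_lengths, gain=2.0, dt=0.01, steps=5000):
    """Velocity-level transpose-Jacobian regulation with an approximate Jacobian.

    The task-space position is "measured" on the true arm; only the Jacobian
    uses the (wrong) estimated link lengths.
    """
    q = np.array(q0, dtype=float)
    for _ in range(steps):
        e = fk(q, true_lengths) - x_d
        q_dot = -gain * jacobian(q, est_lengths).T @ e
        q += dt * q_dot   # Euler integration of the joint motion
    return q


true_lengths = (1.0, 0.8)    # hypothetical actual link lengths
est_lengths = (0.9, 0.9)     # hypothetical uncertain estimates
x_d = fk(np.array([0.6, 0.9]), true_lengths)   # reachable setpoint
q_final = regulate([0.3, 1.2], x_d, true_lengths, est_lengths)
print(np.linalg.norm(fk(q_final, true_lengths) - x_d))
```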

The task-space tracking-control problem, on the other hand, has received less attention in the robotics research community. Cheah et al. [19] derive a new adaptive Jacobian controller for trajectory tracking of robots with uncertain kinematics and dynamics. In addition to these two uncertainties, [20] also accounts for uncertainty in a simple proportional actuator model in its tracking-controller design. Furthermore, [21] develops an adaptive tracking controller for robot manipulators with uncertain kinematics and dynamics. This controller is based on the unit-quaternion representation, which avoids the singularities associated with three-parameter representations. All of the aforementioned controllers assume that either the joint-space velocity or the task-space velocity is measurable.

In this paper, we propose a new adaptive task-space tracking controller for robot manipulators with uncertain kinematics and dynamics that requires neither task-space nor joint-space velocity measurements. Here, the exact joint-space velocity used in [19] is replaced by an estimated joint-space velocity provided by the sliding observers designed in this paper. Sliding observers have been successfully combined with joint-space controllers [22]–[26]. Those controllers with joint-velocity estimation achieve very satisfactory system performance and guarantee the overall closed-loop stability of robot systems under joint-space control. Most importantly, they provide a systematic framework for analyzing overall closed-loop stability with the help of the Filippov solution concept [27], using the so-called reduced-order manifold dynamics. Furthermore, sliding observers are robust to uncertainties in the robot dynamics. This robustness is useful for our design, since the robot dynamics are assumed to be unknown. As shown later in this paper, the designed sliding observers can also make the overall closed-loop system of the robot manipulator stable under task-space control.
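The sliding observers mentioned above share a common structure. The sketch below shows a textbook-style sliding observer for a single joint that estimates joint velocity from sampled positions alone, with no dynamic model. A switching term drives the position estimate x̂1 toward the measured position, and the same switching term drives the velocity estimate x̂2. To limit chattering, the sign function is replaced by a saturation with boundary layer φ, a common practical variant; inside the layer, the observer behaves like a high-gain linear observer. The gains k1 and k2, the width φ, and the test trajectory q(t) = sin t are hypothetical. This is not the observer designed in this paper.

```python
import numpy as np


def sliding_observer(q_meas, dt, k1=2.0, k2=100.0, phi=0.01, x_hat0=(0.0, 0.0)):
    """Estimate the velocity of one joint from sampled positions only.

    Observer (no dynamic model is used):
        xhat1' = xhat2 + k1 * sat((q - xhat1) / phi)
        xhat2' =         k2 * sat((q - xhat1) / phi)
    sat() clips to [-1, 1]; it replaces sign() (boundary layer of width phi).
    Returns the velocity estimate at each sample.
    """
    x1, x2 = x_hat0
    v_hat = np.empty(len(q_meas))
    for i, q in enumerate(q_meas):
        s = np.clip((q - x1) / phi, -1.0, 1.0)
        v_hat[i] = x2
        # Forward-Euler update of both observer states.
        x1, x2 = x1 + dt * (x2 + k1 * s), x2 + dt * k2 * s
    return v_hat


dt = 1e-4
t = np.arange(0.0, 5.0, dt)
q = np.sin(t)                      # hypothetical joint trajectory; true velocity cos(t)
v_hat = sliding_observer(q, dt, x_hat0=(0.05, 0.0))   # deliberately wrong initial guess
late = t > 2.0                     # skip the initial reaching transient
print(f"max velocity error after 2 s: {np.max(np.abs(v_hat[late] - np.cos(t[late]))):.3f}")
```

A wider boundary layer reduces chattering but lowers the effective observer gain. This lowers bandwidth and enlarges the residual estimation error, which is the usual trade-off when tuning φ.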

1552-3098/$26.00 © 2010 IEEE
