
Vol.12 No.2 JOURNAL OF ELECTRONICS April 1995

DETERMINING POSE PARAMETERS WITHOUT PRIMITIVE-TO-PRIMITIVE CORRESPONDENCE

Cao Jun
(Department of Computer Science & Technology, Tsinghua University, Beijing 100084)

He Zhenya
(Department of Radio Engineering, Southeast University, Nanjing 210018)

Abstract   The problem of determining the pose of an object in 3-D space is essential in many computer vision applications. In this paper, a model-based approach for solving this problem is proposed. This approach does not require knowledge of point-to-point correspondences between 3-D points on the model and 2-D points in the observed image. The spatial location of the object is iteratively estimated and updated from values globally defined over the model image and the observed image.

Keywords   Pose determination; Object localization; Image processing; Computer vision

I. Introduction

In automatic assembly, maintenance, and manufacturing situations, it is often required to determine the pose of a three-dimensional (3-D) object located in front of a machine vision system. In some circumstances, the object is not placed precisely in the same location every time, but randomly placed within a certain area. In order to relax the need to constrain the environment, the machine vision system must autonomously adapt to its environment.

Pose determination consists of two steps: primitive matching and computation of the pose parameters based on the corresponding primitives. The former is the key step. Once the correspondence between the primitives of the 3-D object model and the primitives of the observed 2-D image has been set up, it is easy to solve for the pose parameters. The accuracy of pose determination depends on the accuracy of primitive matching. However, primitive matching is one of the most difficult problems in computer vision research, and no satisfactory solution is available. Moreover, pose determination methods based on point primitives or line primitives are sensitive to noise. Because of the accidental mistakes introduced by low-level processing algorithms and the measurement errors, there are inevitably many mismatched primitive pairs, which affect the accuracy of pose determination.

Research in physiology shows that human eyes do not lose the ability to perceive pose and structure even when no precise primitive correspondence relation is known. So there must be more general correspondence relations that can be used to determine pose parameters without knowledge of primitive-to-primitive correspondence. In this paper, we present a correspondless approach for determining pose parameters.


Unlike most prior methods [1-4], our approach does not require knowledge of point-to-point correspondences between 3-D points on the model and 2-D points in the observed image. Instead of matching the model image generated by the projection equation with the observed image directly, we measure the values of feature functions defined globally over the model image and the observed image. The spatial location of an object can be computed by applying the Newton method for the necessary number of iterations.

II. Image Feature Function

An image is a function f defined on the image plane. The value f(x, y) depends on the type of imaging device used. For example, if the imaging device is a monochrome camera, then f(x, y) is the image intensity, or gray level, at point (x, y). If the imaging device is a color camera, f(x, y) may be a vector with three components corresponding to the intensities of R, G, B, respectively. If the imaging device is a range finder, f(x, y) represents the range information at (x, y). In any case, a feature function F can be defined over the set of images, which maps an image f into a real number. For example, F may be defined as the total length of the lines in the image, or as the number of corners in the image. F is called the image feature function.

We regard the transformation of a 3-D object from a known position in the model coordinate system to a position in the sensor coordinate system as a hypothetical 3-D rigid motion. As an object moves in 3-D space, its image varies. Hence, the numerical values of feature functions may also change. If the image feature function is correctly chosen, the variation in the values of the feature functions will reveal information about the underlying motion. So the motion parameters can be computed from the variation of the values of the feature functions when that variation is expressed as a function of the motion parameters. We will determine the pose parameters in the light of this idea. Since the feature functions are defined over the entire image, no information about correspondences between the model image and the observed image is required.

The 2-D image of a 3-D object at a certain pose can be obtained by the projection equation. This kind of image is called the generated image. Correspondingly, the image obtained by the vision sensor is called the observed image.

1. Image feature function based on the point primitives

Let p_k(x_k, y_k), k = 1, \ldots, n, be the point primitives of an image. The feature function of the image is defined as

F = \sum_{k=1}^{n} w(x_k, y_k) \qquad (1)

where w(x, y) is an arbitrarily given weight function defined over the image.
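As an illustrative sketch (not part of the original paper), Eq.(1) can be evaluated directly; the names `point_feature`, `points`, and `w` are hypothetical:

```python
import numpy as np

def point_feature(points, w):
    """Evaluate Eq.(1): F = sum of w(x_k, y_k) over the point primitives.

    points : (n, 2) array of 2-D point primitives (x_k, y_k)
    w      : weight function mapping (x, y) to a real number
    """
    return sum(w(x, y) for x, y in points)

# Example with a hypothetical weight function w(x, y) = x * y
points = np.array([[1.0, 2.0], [3.0, 4.0]])
F = point_feature(points, lambda x, y: x * y)  # 2.0 + 12.0 = 14.0
```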

2. Image feature function based on the line primitives

Let L_k, k = 1, \ldots, m, be the line primitives of an image. The feature function of the image is defined as

F = \sum_{k=1}^{m} \int_{L_k} w(x, y)\, ds \qquad (2)

where w(x, y) is an arbitrarily given weight function defined over the image.

Let the two end points of an arbitrary line primitive L be (x_0, y_0) and (x_P, y_P) respectively. We divide L into P parts, with dividing points (x_l, y_l), l = 0, 1, \ldots, P. Using the trapezoid formula, we obtain the approximate equation for the line integral

\int_L w(x, y)\, ds = \frac{1}{2} \sum_{l=0}^{P-1} \left[ w(x_l, y_l) + w(x_{l+1}, y_{l+1}) \right] \Delta s_l \qquad (3)

where \Delta s_l is given by

\Delta s_l = \sqrt{(x_{l+1} - x_l)^2 + (y_{l+1} - y_l)^2}, \quad l = 0, 1, \ldots, P - 1 \qquad (4)
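A sketch of the trapezoid approximation of Eqs.(2)-(4), assuming each line primitive is given by its two end points; the names `line_feature` and `segments` are hypothetical:

```python
import numpy as np

def line_feature(segments, w, P=16):
    """Evaluate Eq.(2) via the trapezoid rule of Eqs.(3)-(4).

    segments : list of ((x0, y0), (xP, yP)) end-point pairs, one per line primitive
    w        : weight function over the image plane
    P        : number of subdivisions per line primitive
    """
    F = 0.0
    for (x0, y0), (xP, yP) in segments:
        # Dividing points (x_l, y_l), l = 0..P, spaced evenly along the segment
        xs = np.linspace(x0, xP, P + 1)
        ys = np.linspace(y0, yP, P + 1)
        ds = np.hypot(np.diff(xs), np.diff(ys))          # Eq.(4): segment lengths
        wv = np.array([w(x, y) for x, y in zip(xs, ys)])
        F += 0.5 * np.sum((wv[:-1] + wv[1:]) * ds)       # Eq.(3): trapezoid sum
    return F
```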

3. Selection of the weight functions

Let p_k(x_k, y_k), k = 1, \ldots, n, be the n point primitives of the generated image, and \bar{p}(\bar{x}, \bar{y}) be their geometric centroid. \bar{x} and \bar{y} can be obtained using the following equations:

\bar{x} = \frac{1}{n} \sum_{k=1}^{n} x_k \qquad (5)

\bar{y} = \frac{1}{n} \sum_{k=1}^{n} y_k \qquad (6)

Let d be the maximum of the distances of the feature points from their centroid, which can be written as

d = \max_{1 \le k \le n} \sqrt{(x_k - \bar{x})^2 + (y_k - \bar{y})^2} \qquad (7)

We use the following weight functions in our method:

w_1(x, y) = 1/d \qquad (8)
w_2(x, y) = (x - \bar{x})/d \qquad (9)
w_3(x, y) = (y - \bar{y})/d \qquad (10)
w_4(x, y) = (x - \bar{x})^2/d^2 \qquad (11)
w_5(x, y) = (y - \bar{y})^2/d^2 \qquad (12)
w_6(x, y) = (x - \bar{x})(y - \bar{y})/d^2 \qquad (13)
w_7(x, y) = (x - \bar{x})^3/d^3 \qquad (14)
w_8(x, y) = (x - \bar{x})^2 (y - \bar{y})/d^3 \qquad (15)
w_9(x, y) = (x - \bar{x})(y - \bar{y})^2/d^3 \qquad (16)
w_{10}(x, y) = (y - \bar{y})^3/d^3 \qquad (17)

In real applications, some of the weight functions given above can be chosen to form independent image feature functions according to the requirements of the pose determination algorithm. The weight functions are revised at each iteration step, since (\bar{x}, \bar{y}) and d change their values each time a new model image is generated.
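A sketch of Eqs.(5)-(17), assuming the point primitives are given as an (n, 2) array; the helper name `make_weight_functions` is hypothetical:

```python
import numpy as np

def make_weight_functions(points):
    """Build the ten weight functions of Eqs.(8)-(17) from the current point
    primitives; recomputed whenever a new model image is generated."""
    xbar, ybar = points.mean(axis=0)  # centroid, Eqs.(5)-(6)
    d = np.max(np.hypot(points[:, 0] - xbar, points[:, 1] - ybar))  # Eq.(7)
    u = lambda x: (x - xbar) / d      # normalized x-offset
    v = lambda y: (y - ybar) / d      # normalized y-offset
    return [
        lambda x, y: 1.0 / d,           # w1,  Eq.(8)
        lambda x, y: u(x),              # w2,  Eq.(9)
        lambda x, y: v(y),              # w3,  Eq.(10)
        lambda x, y: u(x) ** 2,         # w4,  Eq.(11)
        lambda x, y: v(y) ** 2,         # w5,  Eq.(12)
        lambda x, y: u(x) * v(y),       # w6,  Eq.(13)
        lambda x, y: u(x) ** 3,         # w7,  Eq.(14)
        lambda x, y: u(x) ** 2 * v(y),  # w8,  Eq.(15)
        lambda x, y: u(x) * v(y) ** 2,  # w9,  Eq.(16)
        lambda x, y: v(y) ** 3,         # w10, Eq.(17)
    ]
```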

III. Computing the Derivatives of Image Feature Functions with Respect to Pose Parameters

As a 3-D object transforms from the model coordinate system to the sensor coordinate system, its 2-D perspective projection image varies. Hence, the numerical values of feature functions may also change. It is the variation in the values of the pose parameters that causes the image feature functions to change their values. We will calculate the derivatives of the image feature functions with respect to the pose parameters.

The overall transformation from a 3-D object point P_0 with coordinates (X_0, Y_0, Z_0) to an image point (x, y) can be decomposed into the following two steps.

Step 1   Transformation from the model coordinate system (X_m, Y_m, Z_m) to the sensor coordinate system (X_s, Y_s, Z_s). Let the point P_1(X_1, Y_1, Z_1) be the transformed point of P_0 under the 3 \times 3 rotation matrix R, and the point P(X, Y, Z) be the transformed point of P_1 under the translation vector T = [t_x, t_y, t_z]^T:

P = R P_0 + T = P_1 + T = \begin{pmatrix} X_1 + t_x \\ Y_1 + t_y \\ Z_1 + t_z \end{pmatrix} \qquad (18)

A 3-D rotation is specified by its axis and the angle of counterclockwise rotation around it. The rotation matrix R is given by

R = \begin{pmatrix}
\cos\varphi\cos\theta & \cos\varphi\sin\theta\sin\psi - \sin\varphi\cos\psi & \cos\varphi\sin\theta\cos\psi + \sin\varphi\sin\psi \\
\sin\varphi\cos\theta & \sin\varphi\sin\theta\sin\psi + \cos\varphi\cos\psi & \sin\varphi\sin\theta\cos\psi - \cos\varphi\sin\psi \\
-\sin\theta & \cos\theta\sin\psi & \cos\theta\cos\psi
\end{pmatrix} \qquad (19)

where \psi, \theta, and \varphi are the rotation angles about the X_s, Y_s, and Z_s axes of the sensor coordinate system respectively.

Step 2   Transformation from the 3-D sensor coordinate system (X_s, Y_s, Z_s) to the image plane coordinate system (x_i, y_i) using the perspective projection model:

x = \frac{X_1 + t_x}{Z_1 + t_z} f = \frac{X}{Z} f \qquad (20)

y = \frac{Y_1 + t_y}{Z_1 + t_z} f = \frac{Y}{Z} f \qquad (21)

For simplification, we use (t_1, t_2, t_3, t_4, t_5, t_6) to denote the pose parameters (t_x, t_y, t_z, \psi, \theta, \varphi).
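A sketch of Eqs.(18)-(21), assuming a pinhole model with focal length f; the function names are hypothetical:

```python
import numpy as np

def rotation_matrix(psi, theta, phi):
    """Eq.(19): rotation angles psi, theta, phi about the X_s, Y_s, Z_s axes."""
    cps, sps = np.cos(psi), np.sin(psi)
    cth, sth = np.cos(theta), np.sin(theta)
    cph, sph = np.cos(phi), np.sin(phi)
    return np.array([
        [cph * cth, cph * sth * sps - sph * cps, cph * sth * cps + sph * sps],
        [sph * cth, sph * sth * sps + cph * cps, sph * sth * cps - cph * sps],
        [-sth,      cth * sps,                   cth * cps],
    ])

def project(P0, t, f):
    """Eqs.(18), (20), (21): rigid transform, then perspective projection.

    P0 : (n, 3) model points; t : (t_x, t_y, t_z, psi, theta, phi); f : focal length
    """
    t = np.asarray(t, dtype=float)
    R = rotation_matrix(t[3], t[4], t[5])
    P = P0 @ R.T + t[:3]          # Eq.(18): P = R P0 + T
    x = f * P[:, 0] / P[:, 2]     # Eq.(20)
    y = f * P[:, 1] / P[:, 2]     # Eq.(21)
    return np.column_stack([x, y])
```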

For point primitive based image feature functions, the derivatives \partial F/\partial t_i, i = 1, 2, \ldots, 6, can be expressed as

\frac{\partial F}{\partial t_i} = \sum_{k=1}^{n} \left[ \frac{\partial w}{\partial x}(x_k, y_k) \frac{\partial x_k}{\partial t_i} + \frac{\partial w}{\partial y}(x_k, y_k) \frac{\partial y_k}{\partial t_i} \right] \qquad (22)

For line primitive based image feature functions, the derivatives \partial F/\partial t_i, i = 1, 2, \ldots, 6, can be expressed as

\frac{\partial F}{\partial t_i} = \sum_{k=1}^{m} \int_{L_k} \left[ \frac{\partial w}{\partial x} \frac{\partial x}{\partial t_i} + \frac{\partial w}{\partial y} \frac{\partial y}{\partial t_i} \right] ds \qquad (23)

The partial derivatives of x and y with respect to the three rotation parameters require the partial derivatives of X_1, Y_1, and Z_1 with respect to the same parameters, which can be expressed in a simple form using \psi, \theta, and \varphi as the rotation parameters. For example, the derivative of X_1 at a point (X_1, Y_1, Z_1) with respect to a counterclockwise rotation of \varphi about the Z axis is simply -Y_1. This follows from the fact that (X_1, Y_1, Z_1) = (r\cos\varphi, r\sin\varphi, Z_1), where r is the distance of the point from the Z axis, and therefore \partial X_1/\partial\varphi = -r\sin\varphi = -Y_1.

Tab.1 gives these derivatives for all combinations of variables.

Tab.1  Partial derivatives of X_1, Y_1, Z_1 with respect to \psi, \theta, \varphi

            X_1      Y_1      Z_1
\psi        0        -Z_1     Y_1
\theta      Z_1      0        -X_1
\varphi     -Y_1     X_1      0

The partial derivatives of x and y with respect to the three translation parameters can be calculated in a straightforward way. For example, from Eq.(20) we know that

x = \frac{X_1 + t_x}{Z_1 + t_z} f = \frac{X}{Z} f

so

\frac{\partial x}{\partial t_x} = \frac{f}{Z}, \quad \frac{\partial x}{\partial t_y} = 0, \quad \frac{\partial x}{\partial t_z} = -\frac{Xf}{Z^2}

Tab.2 gives the derivatives of x and y with respect to each of the six unknown parameters.

Tab.2  Partial derivatives of x, y with respect to each of the pose parameters

            x                           y
t_x         f/Z                         0
t_y         0                           f/Z
t_z         -fX/Z^2                     -fY/Z^2
\psi        -fXY_1/Z^2                  -fZ_1/Z - fYY_1/Z^2
\theta      fZ_1/Z + fXX_1/Z^2          fYX_1/Z^2
\varphi     -fY_1/Z                     fX_1/Z
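A sketch that evaluates the entries of Tab.2 for a single point by the chain rule on Eqs.(20)-(21) and Tab.1; the name `image_jacobian` is hypothetical:

```python
import numpy as np

def image_jacobian(P1, t, f):
    """Rows of Tab.2: partial derivatives of (x, y) for one point with
    respect to (t_x, t_y, t_z, psi, theta, phi).

    P1 : rotated model point (X1, Y1, Z1); t : pose parameters; f : focal length
    """
    X1, Y1, Z1 = P1
    X, Y, Z = X1 + t[0], Y1 + t[1], Z1 + t[2]
    # Tab.1: dP1/dpsi = (0, -Z1, Y1), dP1/dtheta = (Z1, 0, -X1), dP1/dphi = (-Y1, X1, 0)
    dX1 = np.array([0.0, Z1, -Y1])   # dX1 / d(psi, theta, phi)
    dY1 = np.array([-Z1, 0.0, X1])
    dZ1 = np.array([Y1, -X1, 0.0])
    # Chain rule on x = f X / Z and y = f Y / Z (Eqs.(20)-(21))
    dx_rot = f * dX1 / Z - f * X * dZ1 / Z**2
    dy_rot = f * dY1 / Z - f * Y * dZ1 / Z**2
    dx = np.concatenate([[f / Z, 0.0, -f * X / Z**2], dx_rot])
    dy = np.concatenate([[0.0, f / Z, -f * Y / Z**2], dy_rot])
    return dx, dy
```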


IV. Iterative Determination of the Pose Parameters

While the 3-D object model is transforming from the model coordinate system to the sensor coordinate system, the feature function of the generated image changes its value with the variation in the values of the pose parameters. The feature function of the generated image has the same value as that of the observed image when the object model is at the imaged position. So we obtain equations using the difference between the values of the feature functions of the generated image and the observed image.

We denote the feature functions of the observed image and the generated image as FS and FG respectively. FS is a constant, while FG can be regarded as a function of the pose parameters. M independent weight functions are chosen to form M feature functions of the observed image and the generated image. We define a set of nonlinear equations as follows:

E_1 = FG_1(t_1, t_2, t_3, t_4, t_5, t_6) - FS_1 = 0
E_2 = FG_2(t_1, t_2, t_3, t_4, t_5, t_6) - FS_2 = 0
\vdots
E_M = FG_M(t_1, t_2, t_3, t_4, t_5, t_6) - FS_M = 0 \qquad (24)

Written in compact form,

E(t) = FG(t) - FS = 0 \qquad (25)

where E = [E_1, E_2, \ldots, E_M]^T, FG = [FG_1, FG_2, \ldots, FG_M]^T, FS = [FS_1, FS_2, \ldots, FS_M]^T, and t = [t_1, t_2, \ldots, t_6]^T.

Using the Newton formula [6], we obtain the following iterative equation:

t^{k+1} = t^k + \Delta t^k, \quad J \Delta t^k + E(t^k) = 0, \quad k = 0, 1, 2, \ldots \qquad (26)

where J = E'(t^k) is the Jacobian matrix whose elements are

J_{ij} = \frac{\partial E_i}{\partial t_j}, \quad 1 \le i \le M, \ 1 \le j \le 6 \qquad (27)

From J \Delta t + E(t) = 0, we can express E_i, the difference between the values of the feature function of the generated image and the observed image, as a combination of the variations introduced by each parameter increment:

E_i = -\left( \frac{\partial E_i}{\partial t_1} \Delta t_1 + \frac{\partial E_i}{\partial t_2} \Delta t_2 + \cdots + \frac{\partial E_i}{\partial t_6} \Delta t_6 \right) \qquad (28)

In the Newton iteration algorithm, the six pose parameter corrections \Delta t = (\Delta t_1, \Delta t_2, \Delta t_3, \Delta t_4, \Delta t_5, \Delta t_6) are determined by solving the linear equations, so the calculation of the inverse of the Jacobian matrix can be avoided. If the number of equations equals the number of pose parameters, i.e., M = 6, \Delta t can be obtained using the classical methods for solving linear equations. The Jacobian matrix must be nonsingular to guarantee the uniqueness of the solution. If the number of equations is larger than the number of pose parameters, i.e., M > 6, \Delta t can be obtained using the least-squares method for over-constrained equations, which can be found in Ref.[6].

The iteration procedure for computing the pose of a 3-D object can be described as follows:

Step 1   Let k = 0. Give initial estimates for the unknown parameters t^0 and the precision requirement \varepsilon.

Step 2   For the k-th iteration, project the 3-D object model stored in a database onto the image using the current parameter estimates and obtain the generated image. Measure the discrepancy between the generated image and the observed image by comparing the values of their feature functions, \|E(t^k)\|.

Step 3   If \|E(t^k)\| < \varepsilon, the estimates are correct: t^k is the desired pose parameter vector, and the procedure terminates. Otherwise, continue.

Step 4   Obtain the improved translation vector and rotation matrix using the following iteration equations:

t^{k+1} = t^k + \Delta t^k, \quad J \Delta t^k + E(t^k) = 0

Step 5   k \leftarrow k + 1, and go to Step 2.
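A minimal sketch of Steps 1-5, assuming callables that evaluate FG(t) and the Jacobian J of Eq.(27); for M > 6 the update is obtained by least squares as in Ref.[6]. All names are hypothetical:

```python
import numpy as np

def estimate_pose(FS, features, jacobian, t0, eps=1e-6, max_iter=50):
    """Newton iteration of Eq.(26), following Steps 1-5.

    FS       : values of the M feature functions on the observed image
    features : t -> M-vector of feature values on the generated image, FG(t)
    jacobian : t -> (M, 6) Jacobian matrix J of Eq.(27)
    t0       : initial pose estimate (t_x, t_y, t_z, psi, theta, phi)
    """
    t = np.asarray(t0, dtype=float)
    for _ in range(max_iter):
        E = features(t) - FS               # Eq.(25): E(t) = FG(t) - FS
        if np.linalg.norm(E) < eps:        # Step 3: termination test
            break
        J = jacobian(t)
        # Step 4: solve J dt = -E; least squares handles M > 6
        dt, *_ = np.linalg.lstsq(J, -E, rcond=None)
        t = t + dt                         # t^{k+1} = t^k + dt^k
    return t
```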

V. Initial Estimation of the Pose Parameters

Initial estimates for the pose parameters have a great effect on the accuracy of the iteration procedure. Since in general the iterations converge more rapidly when the generated model image is closer to the observed image in geometric shape, the following preprocessing is effective. Before starting the iterations, the object model is translated in the scene in such a way that the centroid of its projection image coincides with that of the observed image and the size of the generated image becomes equal to that of the observed image.

Let \bar{p}(\bar{x}, \bar{y}) and \bar{p}'(\bar{x}', \bar{y}') be the geometric centroids of the generated image and the observed image respectively. Let d and d' be the maximum distances from the point primitives of the generated image and the observed image to their respective geometric centroids. We define \bar{P}(\bar{X}, \bar{Y}, \bar{Z}), the 3-D point corresponding to \bar{p}(\bar{x}, \bar{y}), as follows:

\bar{Z} = \frac{1}{N} \sum_{k=1}^{N} Z_k \qquad (29)

\bar{X} = \frac{\bar{x}}{f} \bar{Z} \qquad (30)

\bar{Y} = \frac{\bar{y}}{f} \bar{Z} \qquad (31)

where Z_k is the Z coordinate of the point primitive P_k(X_k, Y_k, Z_k), k = 1, 2, \ldots, N, on the 3-D object model.


We will find a translation vector V = [v_x, v_y, v_z]^T such that, after translating \bar{P} by V, the centroid of the generated image coincides with that of the observed image and d is similar to d'. An intermediate variable D is defined as

D = \frac{\bar{Z}}{f} d \qquad (32)

Let the coordinates of the 3-D point \bar{P}'(\bar{X}', \bar{Y}', \bar{Z}') be

\bar{Z}' = \frac{D}{d'} f \qquad (33)

\bar{X}' = \frac{\bar{x}'}{f} \bar{Z}' \qquad (34)

\bar{Y}' = \frac{\bar{y}'}{f} \bar{Z}' \qquad (35)

The geometric relations between \bar{p} and \bar{P}, d and D, \bar{p}' and \bar{P}', d' and D are shown in Fig.1. It is obvious that the geometric centroid of the generated image coincides with that of the observed image and d equals d' if point \bar{P} is translated to point \bar{P}'. So the translation vector V = [v_x, v_y, v_z]^T can be written as

v_x = \bar{X}' - \bar{X} \qquad (36)

v_y = \bar{Y}' - \bar{Y} \qquad (37)

v_z = \bar{Z}' - \bar{Z} \qquad (38)


Fig.1 Initial translation of the object model
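A sketch of the initial translation of Eqs.(29)-(38), assuming point primitives are available for both images and the model; the name `initial_translation` is hypothetical:

```python
import numpy as np

def initial_translation(model_pts, gen_pts, obs_pts, f):
    """Initial translation vector V of Section V, Eqs.(29)-(38).

    model_pts : (N, 3) 3-D point primitives of the object model
    gen_pts   : (N, 2) point primitives of the generated image
    obs_pts   : (M, 2) point primitives of the observed image
    f         : focal length
    """
    def centroid_and_d(pts):
        c = pts.mean(axis=0)                                   # Eqs.(5)-(6)
        return c, np.max(np.hypot(pts[:, 0] - c[0], pts[:, 1] - c[1]))  # Eq.(7)

    (xb, yb), d = centroid_and_d(gen_pts)      # generated image centroid and d
    (xb2, yb2), d2 = centroid_and_d(obs_pts)   # observed image centroid and d'
    Zb = model_pts[:, 2].mean()                # Eq.(29)
    Xb, Yb = xb * Zb / f, yb * Zb / f          # Eqs.(30)-(31)
    D = Zb * d / f                             # Eq.(32): 3-D extent matching d
    Zb2 = D * f / d2                           # Eq.(33): depth matching d'
    Xb2, Yb2 = xb2 * Zb2 / f, yb2 * Zb2 / f    # Eqs.(34)-(35)
    # Eqs.(36)-(38): translation aligning centroid and scale
    return np.array([Xb2 - Xb, Yb2 - Yb, Zb2 - Zb])
```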

VI. Experimental Results and Conclusions

Computer simulation has been carried out to show the performance of the proposed pose determination method. One of the object models is shown in Fig.2. The 2-D projection image of the object model at the pose (40, 30, 100, 60, -60, -10) is shown in Fig.3. We regard Fig.3 as the observed image of the 3-D object model shown in Fig.2. Before starting the iterations, we translate the object model using the method given in Section V to make the centroid of the generated image coincide with that of the observed image. So the initial estimates for the pose parameters are (32.10, 25.92, 63.35, 0, 0, 0). The generated image at the initial pose is shown in Fig.4.


Fig.2 3-D object model

Fig.3 The observed image

Fig.4 The generated image at initial pose

All the weight functions given in Section II were chosen to form 10 line primitive based image feature functions. Observations were made with regard to the number of iterations, the smoothness of convergence, and the estimation accuracy. The iterations are as follows:


0: [32.10, 25.92, 63.35, 0, 0, 0]
1: [40.34, 28.15, 96.38, 52.61, -31.17, -49.98]
2: [40.26, 29.05, 98.12, 57.92, -56.80, -18.52]
3: [40.21, 29.37, 101.07, 58.69, -58.17, -12.75]
4: [40.19, 29.51, 100.81, 59.23, -59.65, -10.63]
5: [40.17, 29.60, 100.63, 59.85, -59.86, -10.39]
6: [40.16, 29.63, 100.59, 59.88, -59.90, -10.33]

The results of the 6-th iteration are the pose parameters at which the termination condition of the algorithm is satisfied.

The iteration process is shown in Fig.5.


Fig.5 The iteration process

The initial estimates for the pose parameters have a great effect on the convergence of the algorithm. The reason is that the Newton method is a locally convergent algorithm, which converges to the correct solution rapidly if appropriate initial estimates are available. That is why the estimated pose parameters approach the correct solution in only one iteration in our simulation. The iteration procedure cannot be guaranteed to converge to the correct solution without selecting appropriate initial estimates using the proposed method. Because we assumed that the rotation parameters took the value zero instead of making any initial estimates for them, the convergence of the algorithm could be guaranteed when the rotation angles were small, but the algorithm could converge to a wrong solution for an axis-symmetrical object when the rotation angles were large. However, in automatic assembly, maintenance, and manufacturing situations, the uncertainty of the object relative to the operator is small, so the proposed pose determination method is especially suitable for such situations.

Compared with pose determination methods based on space projection and assumption verification, the proposed correspondless pose determination method has an obvious advantage in computational complexity. For example, in Ref.[2], the correspondence relations between model primitives and image primitives are projected into the 6-D pose parameter space, and the position and attitude of the 3-D object are determined according to the peak point of the pose parameter space. Let the number of model primitives be N. If three points are chosen as the primitive group, the computational complexity of assumption verification is O(N^3). The number of matches is tremendous for a large value of N. In addition, it is not an easy task to solve for the clustering point in the 6-D Hough space. In the pose determination algorithm proposed in this paper, the selected matching primitives are only a subset of the model primitives, and the number of selected primitives has no remarkable impact on the calculation of the weight functions. So the computational complexity is independent of the structural complexity of the object model.

Like most current pose determination methods, the proposed method assumes a known correspondence relation between the object model and the projection image. If the observed image contains the 2-D projections of many object models, a divide-and-conquer scheme can be adopted to determine the correspondence relation between each 3-D object model and its 2-D projection. Methods for model matching and for processing partly occluded objects are research issues presently under investigation.

References

[1] S. T. Barnard, Comput. Vision Graph. Image Processing, 29(1985)1, 87-99.
[2] S. Linnainmaa, D. Harwood, L. S. Davis, IEEE Trans. on PAMI, PAMI-10(1988)5, 634-647.
[3] T. M. Silberberg, D. Harwood, L. S. Davis, Three-dimensional object recognition using oriented model points, in Techniques for 3-D Machine Perception, A. Rosenfeld, Ed. North-Holland, Amsterdam, 1988, 271-320.
[4] D. G. Lowe, Artificial Intelligence, 31(1987)3, 355-395.
[5] A. Rosenfeld, A. C. Kak, Digital Picture Processing, Academic Press, New York, 1982, 551-559.
[6] J. Ortega, W. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, 1970, 318-327.