
Pattern Recognition Letters 5 (1987) 151-168

February 1987, North-Holland

Texture, contour, shape, and motion

Chris BROWN, John ALOIMONOS, Mike SWAIN, Paul CHOU and Anup BASU

Computer Science Department, University of Rochester, Rochester, NY 14627, USA

Received September 1985

Abstract: Intrinsic image calculation exploits constraints arising from physical and imaging processes to derive physical scene parameters from input images. After a brief review of a paradigmatic intrinsic image calculation we turn to a recent result in shape from texture and then to a new result that derives shape and motion from a sequence of patterned inputs. Experimental results are demonstrated for synthetic and natural images.

Key words: Intrinsic image calculation, texture, contour, shape, motion.

1. Intrinsic image calculation

One of the first and best-known examples of intrinsic image calculation (Barrow and Tenenbaum, 1978) is the recovery of shape from intensity (e.g. Horn, 1977; Ikeuchi and Horn, 1981). Shape of a smooth surface is defined to be local surface orientation or viewer-centered relative depth, which are derivable from each other. Orientation is parameterized by the unit normal vector of the surface (x, y, z), or by the projection of such vectors with z > 0 from the origin onto the plane z = 1. This projection yields the popular (p, q) or gradient space representation commonly used in reasoning about line drawings. A polar version of (p, q) space is (slant, tilt) space. For example, the orientation of a plane that is tilted in an upward direction lies somewhere on the positive p axis, the farther away from the origin the greater the slant of the plane. At infinite slant the plane is edge-on to the viewer, with its normal perpendicular to the line of sight.

For a fixed orthographic viewing geometry, lighting geometry that is invariant over the surface, and a reflectance function that is invariant over the surface (implying no shadows), the only variations in the brightness of the image of the surface arise from differences in the orientation of the surface. This sort of situation, restricted as it is, has been basic to intrinsic image calculation: there are image variations that are caused by variations in which we are interested. In the simpler situations it is only the scene variations that cause the image variations, and there is some hope of inverting the process to derive scene characteristics from image characteristics.

In shape from shading, the output is an array of (p, q) values, indexed by image location, that corresponds to the input array of brightness values. A reflectance map, which may be empirically obtained from a calibration object, relates orientation to brightness. Generally there are infinitely many orientations that produce the same brightness, but often they lie on a one-dimensional locus in (p, q) space. A unique orientation may be derived by minimizing a global error measure. One error term is the squared difference of the brightness at an image point (i, j) and the predicted brightness of that (i, j) point (obtained from its current (p, q) value and the reflectance map). Another error term measures the smoothness of the object at a point (say by the sums of squares of differences of neighboring p and q values). Minimizing the global error then takes the form of a Gauss-Seidel iterative process that can be roughly envisioned as each orientation vector at an (i, j) location taking on the average orientation of its neighbors and then bending in the direction that


maximally reduces the resulting brightness error. This minimization can go on in iterative passes, with each pass proceeding in parallel for all vectors. Boundary conditions (such as knowing the orientation of a set of points on the object a priori) are needed to start the process, which it is hoped will converge to a reasonable answer. Details are available in (Ikeuchi and Horn, 1981; Aloimonos and Swain, 1985).
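For concreteness, the following is a minimal sketch of one such relaxation pass, assuming the reflectance map is available as a callable R(p, q) returning predicted brightness; the toroidal border handling, step sizes and all names are our illustrative choices rather than the authors' implementation.

```python
import numpy as np

def relaxation_pass(p, q, I, R, lam=0.1, eps=1e-3):
    """One parallel pass of the iterative shape-from-shading scheme.

    p, q : current orientation estimates in gradient space (2-D arrays).
    I    : observed brightness image (2-D array).
    R    : reflectance map, a callable R(p, q) -> predicted brightness.
    lam  : weight trading brightness error against smoothness.
    """
    # Smoothness: move each (p, q) toward the average of its four neighbors.
    def nbr_avg(a):
        return 0.25 * (np.roll(a, 1, 0) + np.roll(a, -1, 0) +
                       np.roll(a, 1, 1) + np.roll(a, -1, 1))
    p_bar, q_bar = nbr_avg(p), nbr_avg(q)

    # Brightness: bend each vector in the direction that most reduces the
    # squared brightness error, via finite-difference derivatives of R.
    err = I - R(p_bar, q_bar)
    dRdp = (R(p_bar + eps, q_bar) - R(p_bar, q_bar)) / eps
    dRdq = (R(p_bar, q_bar + eps) - R(p_bar, q_bar)) / eps
    return p_bar + lam * err * dRdp, q_bar + lam * err * dRdq
```

Points of known orientation (the boundary conditions) would simply be reset to their fixed values after each pass.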

Intrinsic parameter calculation is open to further research. First, real images result from the conflation of several physical phenomena: the reflectance may be changing, from shadows or paint, over the same area in which the surface is curving. Unscrambling and correctly attributing such multi-causal variations is an obvious goal, and such results are always of interest. Second, the uniqueness of answers obtained by intrinsic image calculations is an interesting topic. Generally the convergence and uniqueness of parallel iterative relaxation processes is not guaranteed.

2. Shape from texture

Gibson (1950) was one of the first to enunciate the idea that surface orientation could be derived from variations in texture. Since 1976 several researchers have addressed this problem (cf. Aloimonos and Swain (1985) for a brief catalog) using various assumptions about texture and imaging geometry. In the work described below, the assumptions are that texture elements, or texels, are scattered over the surface. The uniformity of their distribution is not an issue, nor is their shape. There is an assumption that their area is constant, and that any three nearest-neighbor texels lie in a plane that closely approximates the surface. This last amounts to an assumption of smoothness relative to texel size. There is an assumption that the variation in depth over the surface is small relative to the distance from the camera. Finally, Ohta's (1981) perspective approximation is taken to hold, and this amounts again to a limitation on the size of texels relative to the surface curvature. We develop a constraint that arises purely from imaging geometry and relates imaged texel area to orientation. This constraint takes the same form as the shape from shading constraint for Lambertian surfaces.

Let a coordinate system OXYZ be fixed with respect to the camera, with the -Z axis pointing along the optical axis, and O the nodal point of the eye (center of the lens). The image plane is assumed to be perpendicular to the Z axis at the point (0, 0, -1) (i.e., focal length = 1). Ohta approximates perspective projection by a two-stage affine transformation. A texel is taken to lie on the tangent plane Q of the surface at the texel's centroid. Q has orientation (p, q). A plane P is erected through the texel centroid parallel to the image plane, and the texel's shape is projected, parallel to the line joining the viewpoint and the texel centroid, onto P. This first stage is a skew transformation. The second stage is a true point projection from plane P to the image plane through the viewpoint, which amounts to a pure scaling by some constant factor, say 1/β. Ohta's transformation approximates location- and depth-dependent perspective foreshortening and size distortion, and the approximations are quite good for small shapes.

To represent the original pattern of the surface texel, we use an (a, b, c) coordinate system, with its origin at the mass center of the texel and the (a, b) plane identical to the plane Q. To represent the pattern of the image texel, we use an (a', b', c') coordinate system, with its origin at the point (A, B, -1), where (A, B) is the mass center of the image texel, and the axes a', b', c' parallel to the axes X, Y, Z respectively. Then the transformation from (a, b) to (a', b') under the two-step projection process is given by the following affine transformation. In our work we choose one P plane for the entire image, which is why depth variation should be small relative to depth.


$$[a'\;\;b'] \;=\; [a\;\;b]\;\frac{1}{\beta}\begin{bmatrix} \dfrac{1-pA}{\sqrt{1+p^2}} & \dfrac{q(p+A)}{\sqrt{(1+p^2)(1+p^2+q^2)}} \\[2ex] \dfrac{-pB}{\sqrt{1+p^2}} & \dfrac{qB-p^2-1}{\sqrt{(1+p^2)(1+p^2+q^2)}} \end{bmatrix}$$
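As a consistency check on the transformation as reconstructed here, its determinant should equal the area ratio of equation (1) in Section 3 up to sign; a small numerical sketch with arbitrary test values:

```python
import numpy as np

def paraperspective_matrix(p, q, A, B, beta):
    """2x2 affine map from texel-plane coordinates (a, b) to image (a', b')."""
    d1 = np.sqrt(1 + p**2)
    d2 = np.sqrt((1 + p**2) * (1 + p**2 + q**2))
    return np.array([[(1 - p*A) / d1,        q*(p + A) / d2],
                     [     -p*B / d1, (q*B - p**2 - 1) / d2]]) / beta

p, q, A, B, beta = 0.7, -0.4, 0.2, 0.1, 3.0
det = np.linalg.det(paraperspective_matrix(p, q, A, B, beta))
ratio = (1 - A*p - B*q) / (beta**2 * np.sqrt(1 + p**2 + q**2))
print(abs(det), ratio)   # the two numbers agree
```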


It is clear that this transformation is the relation between two 2-D patterns, one in the 3-D space and the other its image on the image plane. We now use the above affine transformation to develop the desired constraint.

3. The constraint

The determinant of the matrix of an affine transformation is equal to the ratio of the areas of the two patterns before and after the transformation. Specifically, if S_w is the area of a world texel that lies on a plane with gradient (p, q) and S_I is the area of its image that has mass center (A, B), then we have:

$$\frac{S_I}{S_w} \;=\; \frac{1}{\beta^2}\,\frac{1 - Ap - Bq}{\sqrt{1+p^2+q^2}}\,. \tag{1}$$

Equation (1) relates the area of a world texel S_w, its gradient (p, q), the area S_I of its image and its mass center (A, B). If we call the quantity S_I 'textural intensity' and the quantity S_w/β² 'textural albedo', then equation (1) is very similar to the image irradiance equation for Lambertian surfaces:

$$I \;=\; \lambda\,\frac{1 + Ap + Bq}{\sqrt{1+p^2+q^2}}\,,$$

where I is the intensity, (p, q) the gradient of the surface point whose image has intensity I, λ is the albedo at that point and (A, B, 1) the direction of the light source (Horn, 1977; Ikeuchi and Horn, 1981). Thus equation (1) can be used to recover surface orientation, using the standard methods described earlier.
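In code the analogy is direct: equation (1) is a 'reflectance map' in which imaged texel area plays the role of brightness. A one-function sketch (the name is ours):

```python
import math

def textural_intensity(p, q, A, B, albedo):
    """Predicted texel image area from equation (1).

    albedo: the textural albedo S_w / beta**2.
    (A, B): mass center of the texel in the image.
    (p, q): gradient of the plane carrying the texel.
    """
    return albedo * (1 - A*p - B*q) / math.sqrt(1 + p**2 + q**2)
```

This function can play the role of the callable R(p, q) in the relaxation sketch of Section 1, with measured texel areas standing in for the brightness image.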

4. Recovering the textural albedo

To recover orientation, we must know the textural albedo λ = S_w/β². We cannot know β from a static monocular view; neither can we know S_w in general. But it turns out that we can compute approximately the ratio S_w/β².

Following Ohta et al. (1981) and assuming local planarity, i.e. that three neighboring texels belong to the same plane, which we call Q, we have that

$$\frac{f_1}{f_2} \;=\; \left(\frac{s_1}{s_2}\right)^{1/3},$$

where f₁, f₂ are the distances from two texels to the vanishing line of the plane Q along the line joining the two texels, and s₁, s₂ are the areas of the two texels in the image. Since |f₁ - f₂| is just the distance between the two texels in the image and is known, a point on the vanishing line may be determined. With a third texel, two points may be determined, which give the equation of the vanishing line. Since the equation of


the vanishing line of the plane Q is px + qy = 1, the orientation of the plane Q can be determined, and from that an approximation of the textural albedo is found.
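A sketch of this construction, under our reading of it: each pair of neighboring texels yields one vanishing-line point along their joining line, and two such points determine px + qy = 1. All helper names, and the convention that the larger texel lies farther from the vanishing line, are our own.

```python
import numpy as np

def vanishing_point(P1, s1, P2, s2):
    """Vanishing-line point on the line joining two texel centers.

    Uses f1/f2 = (s1/s2)**(1/3) together with |f1 - f2| = |P1 - P2|.
    Degenerate (no finite point) when the two image areas are equal.
    """
    P1, P2 = np.asarray(P1, float), np.asarray(P2, float)
    if s1 < s2:                       # let P1 be the larger, farther texel
        P1, P2, s1, s2 = P2, P1, s2, s1
    r = (s1 / s2) ** (1.0 / 3.0)      # f1 / f2 > 1
    d = np.linalg.norm(P1 - P2)       # f1 - f2
    f2 = d / (r - 1.0)                # distance from the smaller texel
    return P2 + f2 * (P2 - P1) / d    # beyond the smaller texel

def plane_gradient(V1, V2):
    """Solve p*x + q*y = 1 through two vanishing points for (p, q)."""
    return np.linalg.solve(np.array([V1, V2], float), np.ones(2))
```

With (p, q) of the local plane Q in hand, equation (1) gives the local textural albedo as λ = S_I √(1+p²+q²)/(1 - Ap - Bq).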

5. Experiments

The algorithm was tested on artificial images of a plane, cylinder and sphere. There are four distinct steps to the algorithm:

(1) Location of texels.
(2) Minimum triangulation of the texel centers.
(3) Calculation of initial orientations and local textural albedo.
(4) Iterative process.

In (1), connected regions in the image are detected (a sketch follows this paragraph). Their centers of gravity are taken to be the locations of the texels. Their size is recorded and the texels that are on the boundary are marked (Ballard and Brown, 1982). In (2), the locations of the texels are triangulated so that the sum of the lengths of the lines is minimum (Aho, Hopcroft and Ullman, 1974). In (3), the initial orientations are calculated using the method in Section 4. Finally, azimuthal equidistant coordinates (AEC) were used through the iterative process instead of the gradient space p and q, since AEC change linearly with change in orientation. The estimate of λ was calculated from the local orientation with the lowest values of p and q. Due to curvature of the surface, convex objects tend to overestimate λ while concave objects tend to underestimate it. These errors are minimized when the surface of the objects is most nearly perpendicular to the image plane. The algorithm is quite insensitive to the initial orientations given to those texels whose orientations were allowed to vary through the iterative process. Boundary texel orientations were not allowed to change; the error in calculating their values was the predominant factor influencing the total error. The iterative process took under 10 iterations, and it always converged for our synthetic images.
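A minimal sketch of step (1), assuming a binarized input and SciPy's connected-component labeling; marking boundary texels by contact with the image border is our simplification of the boundary test:

```python
import numpy as np
from scipy import ndimage

def locate_texels(binary):
    """Step (1): texels as connected regions of a binarized texture image.

    Returns centers of gravity, areas, and a flag for boundary texels,
    whose orientations are held fixed during the iterative process.
    """
    labels, n = ndimage.label(binary)
    idx = range(1, n + 1)
    centers = ndimage.center_of_mass(binary, labels, idx)
    areas = ndimage.sum(binary, labels, idx)

    border = np.zeros(labels.shape, dtype=bool)
    border[[0, -1], :] = True
    border[:, [0, -1]] = True
    touching = set(np.unique(labels[border])) - {0}
    on_boundary = [i in touching for i in idx]
    return centers, areas, on_boundary
```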


The final error values were:

Image       Fractional error
Plane       Negligible
Sphere      0.005
Cylinder    0.015

The errors in the above table denote the average fractional error at each texel. The error at each texel was taken to be φ/(4π), where φ is the solid angle subtended by rotating the calculated orientation about the actual orientation. Figure 1 gives a pictorial description of the error at each texel.

Figure 1. (K: calculated orientation; y: actual orientation; error = S/(area of the sphere).)

Figures 2-4 show the process. The (a) figures are the inputs. The (b) figures show the starting point of the relaxation after applying Ohta's algorithm locally. The (c) parts show the final reconstruction after relaxation.


Figures 2-4.


6. An extension to natural images

In natural images, the assumption that a surface is covered with texels of identical area is not very realistic. This section refers to recent work by Aloimonos and Chou (1985) on shape from the images of textured planes, based on the uniform density assumption under strong segmentation (identification of texels) and weak segmentation (edge finding). Consider a world plane -z = px + qy + c, and the plane Π passing through the center of mass of a small area s of the world plane. If we consider an area S₁ in the


image, then in order to find the area in the world plane whose projection is S₁, we must multiply the area S₁ by the factor

$$R \;=\; \operatorname{abs}\!\left(\frac{c^2\,\sqrt{1+p^2+q^2}}{(1 - Ap - Bq)^3}\right),$$

where (A, B) is the center of gravity of the image area S₁.

7. Exploiting the uniform density assumption

The uniform density assumption states that if Κ and Λ are any two regions in the world plane and they contain K₁ and K₂ texels respectively, then K₁/area(Κ) = K₂/area(Λ). So consider any two regions in the


image, with areas S₁ and S₂ and containing K₁ and K₂ texels respectively; then, under the assumption that the world texels are uniformly distributed, we have K₁/(S₁R₁) = K₂/(S₂R₂) with

$$R_i \;=\; \operatorname{abs}\!\left(\frac{c^2\,\sqrt{1+p^2+q^2}}{(1 - A_i p - B_i q)^3}\right), \qquad i = 1, 2,$$

and (A₁, B₁), (A₂, B₂) the centers of gravity of the image regions S₁ and S₂ respectively. From this we get:

$$\left(\frac{K_1}{S_1}\right)^{1/3} - \left(\frac{K_2}{S_2}\right)^{1/3} \;=\; \left[\left(\frac{K_1}{S_1}\right)^{1/3}\!A_1 - \left(\frac{K_2}{S_2}\right)^{1/3}\!A_2\right]p \;+\; \left[\left(\frac{K_1}{S_1}\right)^{1/3}\!B_1 - \left(\frac{K_2}{S_2}\right)^{1/3}\!B_2\right]q\,.$$

The above equation represents a line in p-q space: any two regions in the image constrain (p, q) to lie on a line in gradient space. Thus, taking any two pairs of regions we can solve explicitly for p and q. (To overcome undesirable results due to errors from the digitization process and the density fluctuations of the regions, we employed a least-squares-fit mechanism by considering several image pairs, sketched below. A Hough transform estimation method might also be appropriate.) Figure 5 is the image of a plane parallel to the image plane, covered with random dots (texels). Figure 6 is the image of the dotted plane rotated and translated with tilt = 135° and slant = 30°. Our program, based on the scheme described above, recovered tilt = 134.4° and slant = 29.5°.
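A sketch of that least-squares step, with our own data layout: each region is summarized by its texel count K, image area S and centroid (A, B); each pair of regions contributes one linear equation in (p, q):

```python
import numpy as np
from itertools import combinations

def plane_from_regions(regions):
    """Estimate (p, q) from image regions [(K, S, A, B), ...] by stacking
    the gradient-space line of every region pair and solving least squares."""
    rows, rhs = [], []
    for (K1, S1, A1, B1), (K2, S2, A2, B2) in combinations(regions, 2):
        w1, w2 = (K1 / S1) ** (1/3), (K2 / S2) ** (1/3)
        rows.append([w1*A1 - w2*A2, w1*B1 - w2*B2])
        rhs.append(w1 - w2)
    (p, q), *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return p, q
```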


Figure 5.

8. Solving the problem with a weaker segmentation

The method of the last section makes the strong assumption that the texel (hence region) segmentation problem has been solved. As a practical matter, that is not the case. On the other hand, in the recent literature (Bandyopadhyay, 1984; Ballard and Brown, 1982; Marr and Hildreth, 1979) there are several methods for the computation of partial boundaries of the texels (edges) at every point in a textured image. Let us redefine density to be the total length of the texel boundaries per unit area. The uniform density assumption states that this density is the same everywhere in the world plane.

Figures 6-10.

Figures 11-14.

This assumption is not far from the previous one and seems to be true for a large subset of natural images, in contrast with Witkin's (1981) isotropy assumption, which does not seem to hold true for many natural images. Aloimonos and Chou (1985) describe a method that finds the orientation of the world plane under the new assumption, and below we describe experiments based on that method.

Figure 7 presents the image of a plane parallel to the image plane, covered with random line segments. Figure 8 presents the image of this plane rotated with tilt = 135° and slant = 30°. The program recovered tilt = 133.77° and slant = 30.40°. Figure 9 presents the image of a plane (parallel to the image plane) covered with randomly generated small circles. Figure 10 presents the image of this plane rotated with tilt = 135° and slant = 30°. The program recovered tilt = 135.54° and slant = 29.77°. The natural images used were first preprocessed to find the boundaries of texels (edges) by applying the modified Frei-Chen operators introduced by Bandyopadhyay (1984). Figure 11 shows the photograph of a textured floor with slant = 45° and tilt = 108°. Figure 13 shows the edges after the preprocessing. The algorithm produced slant = 45.87° and tilt = 109.43°. Finally, Figure 14 shows the photograph of a part of a grass field with slant = 60° and tilt = 0°. Figure 12 shows the images of its edges after the preprocessing. The program recovered slant = 63.057° and tilt = -1.076°.

Aloimonos and Chou (1985) present a theoretical analysis of the error introduced by the affine approximation to the perspective projection.


9. Orientation from contour

This section refers to the recovery of surface orientation from contour information; in particular, it is proved that the change of the perceived area of a planar contour across different cameras in a known configuration is enough to recover the 3-D structure of the contour, without knowledge of the point-to-point correspondence between the different images.

The recovery of three-dimensional shape and surface orientation from a two-dimensional contour is a fundamental process in any visual system. Recently, a number of methods have been proposed for computing this shape from contour. For the most part, previous techniques have concentrated on trying to identify a few simple, general constraints and assumptions that are consistent with the nature of all possible objects and imaging geometries in order to recover a single 'best' interpretation from among the many possible for a given image. For example, Kanade (1981) defines shape constraints in terms of image space regularities such as parallel lines and skew symmetries under orthographic projection. Witkin (1981) looks for the most uniform distribution of tangents to a contour over a set of possible inverse projections in object space under orthography. Similarly, Brady and Yuille (1984) search for the most compact shape (using the measure of area over perimeter squared) in the object space of inverse-projected planar contours.

Rather than attempting to maximize some general shape-based evaluation function over the space of possible inverse projective transforms of a given image contour, we propose to find a unique solution by using more than one camera, since it can easily be proved that a single image (under orthography or perspective) of a planar contour admits infinitely many interpretations of the structure of the world plane on which the contour lies. Finally, the need for a unique solution, which is guaranteed in our approach, comes also from the fact that there exist many real-world counterexamples to the non-purposive evaluation functions that have been developed to date. For example, Kanade's and Witkin's measures incorrectly estimate surface orientation for regular shapes such as ellipses and rectangles, since they will compute them to be rotated circles and squares (e.g. if we view a rectangular table top, we do not see it as a rotated square surface, but as a rotated rectangle).

In the sequel, we will present two methods for the unique recovery of shape from contour, one based on three views and the other based on two views, without having to solve the point-to-point correspondence between the different images of the contour. We proceed with the following proposition (Figure 15).

Proposition. Let a coordinate system OXYZ be fixed, with the -Z axis pointing along the optical axis. We consider that the image plane Im₁ is perpendicular to the Z axis at the point (0, 0, -1). Let Π be a plane with equation -Z = pX + qY + c in the world, where (p, q) is the gradient of the plane, that contains a contour C. Furthermore, we consider two more cameras with image planes Im₂ and Im₃, whose coordinate systems (nodal points) are such that any world point has the same depth with respect to any of the cameras. Then, assuming that the projection of the contour C onto any of the image planes follows the two-step paraperspective projection process (Section 2), the images C₁, C₂ and C₃ of the contour in the three cameras are enough to determine uniquely the orientation of the plane Π, without having to solve the point-to-point correspondence between C₁, C₂, C₃.

Proof. Let S₁, S₂ and S₃ be the areas of the contours C₁, C₂ and C₃ respectively. Let also the depth of the center of gravity of the contour C be β. If S_w is the area of the contour C on the plane Π, and (A₁, B₁), (A₂, B₂) and (A₃, B₃) are the centers of gravity of the image contours C₁, C₂ and C₃ respectively, then using the properties of the paraperspective projection we can easily prove (eq. (1) of Section 3) that

$$\frac{S_1}{S_w} \;=\; \frac{1}{\beta^2}\,\frac{1 - A_1 p - B_1 q}{\sqrt{1+p^2+q^2}}\,,$$

$$\frac{S_2}{S_w} \;=\; \frac{1}{\beta^2}\,\frac{1 - A_2 p - B_2 q}{\sqrt{1+p^2+q^2}}\,, \tag{2}$$

$$\frac{S_3}{S_w} \;=\; \frac{1}{\beta^2}\,\frac{1 - A_3 p - B_3 q}{\sqrt{1+p^2+q^2}}\,. \tag{3}$$

Dividing the above equations appropriately, we derive

$$\frac{S_2}{S_1} \;=\; \frac{1 - A_2 p - B_2 q}{1 - A_1 p - B_1 q}\,, \qquad \frac{S_3}{S_1} \;=\; \frac{1 - A_3 p - B_3 q}{1 - A_1 p - B_1 q}\,. \tag{4}$$


Equations (4) constitute a linear system with unknowns p and q, which in general has a unique solution.

A degenerate case in the solution of the above system arises when the centers of all three image planes are collinear. Experiments using the above method on perspective images computed the orientation of the world contour with great accuracy. Despite the fact that the paraperspective projection is an approximation of the perspective projection, and the error depends on many factors (slant, tilt, depth, size of the contour; for a detailed discussion, see Aloimonos and Chou (1985)), it seems that in the above method much of the introduced error is cancelled. This fact was brought to our attention by extensive experiments. We are currently working towards a theoretical explanation of this error cancellation.
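Rearranged, (4) is the explicit 2×2 linear system below; the algebraic arrangement is ours:

```python
import numpy as np

def orientation_from_three_views(S1, A1, B1, S2, A2, B2, S3, A3, B3):
    """Solve system (4) for the gradient (p, q) of the contour's plane.

    S_i is the contour area in image i, (A_i, B_i) its center of gravity.
    The system degenerates when the three camera centers are collinear.
    """
    M = np.array([[S1*A2 - S2*A1, S1*B2 - S2*B1],
                  [S1*A3 - S3*A1, S1*B3 - S3*B1]], dtype=float)
    b = np.array([S1 - S2, S1 - S3], dtype=float)
    return np.linalg.solve(M, b)    # (p, q)
```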

10. Solving the problem with two frames

In the previous section, we used three frames for the recovery of shape from contour. But the information we used from the image contours was only their area, and in particular how the area changes from view to view. A useful piece of information that we have not yet utilized is the length of the contour (which is of course independent of its area). Using this information, we can solve the shape from contour problem with two projections (binocular observer), but in a computationally much harder way (nonlinear equations).

Figure 15.

Figure 16.

Consider a coordinate system OXYZ fixed with respect to the left camera, with the -Z axis again pointing along the optical axis. We consider that the image plane of the left camera is perpendicular to the Z axis at the point (0, 0, -1). Consider now the nodal point of the right camera to be the point (Δx, 0, 0) and the image plane of the right camera identical to that of the left camera (see Figure 16). Consider also a contour C on a world plane Π with equation -Z = pX + qY + c, and let C_L and C_R be the projections



of the contour C on the left and right image respectively, using the paraperspective projection. We can easily prove that a small line segment (l cos θ, l sin θ) on the image plane is due to the projection of a line segment on the world plane with length l·L_θ, with

$$L_\theta \;=\; \frac{\beta}{1 - Ap - Bq}\,\sqrt{k_1\cos^2\theta + k_2\sin^2\theta + k_3\sin\theta\cos\theta}\,,$$

where

$$k_1 = (1-qB)^2 + (pB)^2 + p^2, \qquad k_2 = (1-pA)^2 + (qA)^2 + q^2, \qquad k_3 = 2\big((1-qB)qA + (1-pA)pB + pq\big),$$

and (A, B) is the center of gravity of the area under consideration. So, given a contour in an image, if we break the contour into small line segments (edges) (l_i cos θ_i, l_i sin θ_i), i = 1, ..., n, then the length of the contour in the world plane is given by

$$L \;=\; \sum_{i=1}^{n} l_i L_{\theta_i}\,,$$

with k₁, k₂, k₃ as above, where β is the depth of the center of gravity of the world contour and (A, B) the center of gravity of the image contour.


Figure 17.

If we consider now the left and right images of the contour C (Figure 17), and we compute the length of the world contour from each one, we should find the same answer. In other words, if L_L and L_R are the lengths of the world contour that we compute from the left and right image respectively, we must have

$$L_L \;=\; L_R\,. \tag{6}$$

Equation (6) is an equation in the unknowns p, q, but it is in a complicated form that does not permit easy algebraic manipulation.

On the other hand, if S_w, S_L, S_R are the areas of the world contour, the left image contour and the right image contour respectively, then we have

$$\frac{S_L}{S_w} \;=\; \frac{1}{\beta^2}\,\frac{1 - A_L p - B_L q}{\sqrt{1+p^2+q^2}}\,, \tag{7}$$

$$\frac{S_R}{S_w} \;=\; \frac{1}{\beta^2}\,\frac{1 - A_R p - B_R q}{\sqrt{1+p^2+q^2}}\,, \tag{8}$$


where (A_L, B_L) and (A_R, B_R) are the centers of gravity of the left and the right image contour respectively. From (7) and (8) we conclude

$$\frac{S_L}{S_R} \;=\; \frac{1 - A_L p - B_L q}{1 - A_R p - B_R q}\,. \tag{9}$$

Equation (9) represents a straight line in gradient space, or a great circle in the (equivalent) Gaussian sphere formalism. Equations (6) and (9) constitute a nonlinear system in the unknowns p and q. We are currently working on a theoretical analysis concerning the number of solutions of this system. Preliminary experimental results, based on the following discrete method, indicate that there exists a unique solution. The discrete method we used is as follows: equation (9) represents a great circle on the Gaussian sphere (constant azimuth, varying elevation). By taking different values for the elevation angle (180 values, if the different values are 1 degree apart) we solve for the gradient (p, q) and we choose the (p, q) that makes the function (L_L - L_R)² minimum.
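A sketch of this discrete search, with contours given as image polygons. We parameterize the constraint line (9) directly in gradient space rather than sweeping elevation on the Gaussian sphere, and the depth β cancels in the comparison because both cameras see the contour at the same depth; all names are illustrative.

```python
import numpy as np

def polygon_area(P):
    """Shoelace area of a closed polygon given as an (n, 2) array."""
    x, y = P[:, 0], P[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def world_length(P, p, q, beta=1.0):
    """World length of an image polygon via L = sum l_i L_i (Section 10)."""
    A, B = P.mean(axis=0)                      # image center of gravity (approx.)
    d = np.roll(P, -1, axis=0) - P             # edge vectors
    l = np.hypot(d[:, 0], d[:, 1])             # segment lengths l_i
    c, s = d[:, 0] / l, d[:, 1] / l            # cos(theta_i), sin(theta_i)
    k1 = (1 - q*B)**2 + (p*B)**2 + p**2
    k2 = (1 - p*A)**2 + (q*A)**2 + q**2
    k3 = 2 * ((1 - q*B)*q*A + (1 - p*A)*p*B + p*q)
    return np.sum(l * beta / (1 - A*p - B*q)
                  * np.sqrt(k1*c**2 + k2*s**2 + k3*c*s))

def binocular_orientation(CL, CR, t_range=20.0, n=1800):
    """Sweep the gradient-space line (9) for the (p, q) minimizing (L_L - L_R)^2."""
    SL, SR = polygon_area(CL), polygon_area(CR)
    (AL, BL), (AR, BR) = CL.mean(axis=0), CR.mean(axis=0)
    # Line (9) rewritten as a*p + b*q = d.
    a, b, d = SR*AL - SL*AR, SR*BL - SL*BR, SR - SL
    base = d * np.array([a, b]) / (a*a + b*b)  # a point on the line
    direc = np.array([-b, a]) / np.hypot(a, b) # unit direction along the line
    best, best_err = None, np.inf
    for t in np.linspace(-t_range, t_range, n):
        p, q = base + t * direc
        # beta cancels: both views use the same default depth.
        err = (world_length(CL, p, q) - world_length(CR, p, q)) ** 2
        if err < best_err:
            best, best_err = (p, q), err
    return best
```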

Up to this point, we have presented two methods for the determination of shape from contour, one based on three views and the relative change of area among the different views, and the other based on two views (binocular observer) and the change of area and perimeter of the contours between the two views. We now proceed to a method for 3-D motion determination without having to find point-to-point correspondence between successive dynamic frames.

11. Determining 3-D motion in perspective without correspondence

In this section we give a method for the detection of the 3-D motion of a moving contour in true perspective projection, without using point-to-point correspondence. Here we treat only the case of pure translation. The general case (rotation plus translation) can be found in (Aloimonos and Basu, 1985).

Figure 18.

Consider a coordinate system OXYZ fixed with respect to the camera, O the nodal point of the eye, and the image plane perpendicular to the -Z axis (focal length 1) that is pointing along the optical axis (Figure 18). Let us represent points on the image plane with small letters, (x, y), and points in the world with capital letters, (X, Y, Z).


Let a point P = (X₁, Y₁, Z₁) in the world have perspective image (x₁, y₁), where x₁ = X₁/Z₁ and y₁ = Y₁/Z₁. If the point P moves to the position P′ = (X₂, Y₂, Z₂) with

$$X_2 = X_1 + \Delta X, \qquad Y_2 = Y_1 + \Delta Y, \qquad Z_2 = Z_1 + \Delta Z,$$

then we desire to find the direction of the translation (ΔX/ΔZ, ΔY/ΔZ). If the image of P′ is (x₂, y₂), then the observed motion of the world point in the image plane is given by the displacement vector (x₂ - x₁, y₂ - y₁) (which in the case of very small motion is also known as optic flow).

We can easily prove that

$$x_2 - x_1 = \frac{\Delta X - x_1\,\Delta Z}{Z_1 + \Delta Z}\,, \qquad y_2 - y_1 = \frac{\Delta Y - y_1\,\Delta Z}{Z_1 + \Delta Z}\,,$$

but under the assumption that the depth is large (and the motion in depth small), the equations above become:

$$x_2 - x_1 = (\Delta X - x_1\,\Delta Z)/Z, \tag{10}$$

$$y_2 - y_1 = (\Delta Y - y_1\,\Delta Z)/Z. \tag{11}$$

Until now, all the published methods for the recovery of the direction (ΔX/ΔZ, ΔY/ΔZ) have been based on the above equations (10) and (11) (see Ullman, 1979; Longuet-Higgins, 1981; Tsai and Huang, 1984; Bandyopadhyay and Aloimonos, 1985), which of course require knowledge of the correspondence between points in the successive frames. In the next section, we present a method for the recovery of the translational direction (ΔX/ΔZ, ΔY/ΔZ) of a moving planar contour without having to solve the correspondence problem.

12. Motion of a planar contour without correspondence

Consider again a coordinate system OXYZ fixed with respect to the camera and, following the imaging geometry introduced in the previous sections, consider a contour C on a plane Z = pX + qY + c that is moving along the vector (ΔX, ΔY, ΔZ), and let C₁ and C₂ be the two successive images of the contour C (see Figure 19). We suppose that the orientation of the contour's world plane is already known (i.e., it has been found by one of the two methods already presented). In what follows, to facilitate analysis, we will present a discrete analysis (i.e., we will talk about summation over all discrete points of the contour, instead of integration along the contour).

Figure 19.

Consider a point (x_i, y_i) on contour C₁ (the first frame of the sequence), which moves to a point (x_i', y_i') on contour C₂. Note that for the moment we do not worry about where the point (x_i', y_i') is on the second contour C₂. The only important thing is that (x_i', y_i') ∈ C₂ and it is the corresponding point of (x_i, y_i) ∈ C₁. From that, we have

$$x_i' - x_i \;=\; \frac{\Delta X - x_i\,\Delta Z}{Z_i}\,, \tag{12}$$

$$y_i' - y_i \;=\; \frac{\Delta Y - y_i\,\Delta Z}{Z_i}\,, \tag{13}$$

where Z_i is the depth of the contour point whose image is the point (x_i, y_i). Taking into account that

$$Z_i = pX_i + qY_i + c \quad\text{or}\quad 1 = px_i + qy_i + c/Z_i \quad\text{or}\quad 1/Z_i = \tfrac{1}{c}\,(1 - px_i - qy_i),$$

equations (12) and (13) become:

$$x_i' - x_i \;=\; \frac{\Delta X - x_i\,\Delta Z}{c}\,(1 - px_i - qy_i)\,, \tag{14}$$

$$y_i' - y_i \;=\; \frac{\Delta Y - y_i\,\Delta Z}{c}\,(1 - px_i - qy_i)\,. \tag{15}$$

Equations (14) and (15) relate the x and y coordinates respectively of two corresponding points, the first on contour C₁ and the second on contour C₂.

If we write equation (14) for all the points on the two contours, and we sum up all these equations, we get the following:

$$\sum_i x_i' - \sum_i x_i \;=\; \sum_i \frac{\Delta X - x_i\,\Delta Z}{c}\,(1 - px_i - qy_i)\,. \tag{16}$$

In equation (16), $\sum_i x_i'$ denotes the sum of the x coordinates of all the points on contour C₂, $\sum_i x_i$ denotes the sum of the x coordinates of all the points on contour C₁, and the right-hand side of the above equation is summed over all points of contour C₁. Equation (16) becomes:

$$c\Big(\sum_i x_i' - \sum_i x_i\Big) \;=\; \Delta X \sum_i (1 - px_i - qy_i) \;-\; \Delta Z \sum_i x_i(1 - px_i - qy_i)\,. \tag{17}$$

In an analogous way, working with equation (15) we get:

$$c\Big(\sum_i y_i' - \sum_i y_i\Big) \;=\; \Delta Y \sum_i (1 - px_i - qy_i) \;-\; \Delta Z \sum_i y_i(1 - px_i - qy_i)\,. \tag{18}$$

From equations (17), (18) we conclude:


$$\frac{\sum_i x_i' - \sum_i x_i}{\sum_i y_i' - \sum_i y_i} \;=\; \frac{\dfrac{\Delta X}{\Delta Z}\sum_i (1 - px_i - qy_i) \;-\; \sum_i x_i(1 - px_i - qy_i)}{\dfrac{\Delta Y}{\Delta Z}\sum_i (1 - px_i - qy_i) \;-\; \sum_i y_i(1 - px_i - qy_i)}\,. \tag{19}$$

Equation (19) is a linear equation in the unknowns ΔX/ΔZ and ΔY/ΔZ. This equation is due to the motion of the contour in one image frame. By considering a binocular observer, we get two linear equations (eq. (19)) from the motion of the contour in both the left and right images, which gives a unique solution for the direction of the translation (ΔX/ΔZ, ΔY/ΔZ), without using point-to-point correspondence. Obviously, to apply this method to natural images, one would have to solve the problem of contour correspondence (macro-correspondence), which seems easier than the point-to-point correspondence.
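A sketch of the resulting computation, in our own data layout: each camera's pair of frames yields one linear equation of the form (19) in the unknowns u = ΔX/ΔZ and v = ΔY/ΔZ, and the binocular observer gives a 2×2 system:

```python
import numpy as np

def view_equation(C1, C2, p, q):
    """One linear equation in (u, v) = (dX/dZ, dY/dZ) from equation (19).

    C1, C2: (n, 2) arrays of contour points in the two frames of one
    camera (no point-to-point correspondence needed, only the sums).
    (p, q): known gradient of the contour's world plane.
    """
    x, y = C1[:, 0], C1[:, 1]
    w = 1 - p*x - q*y                   # the weights (1 - p x_i - q y_i)
    T, Sx, Sy = w.sum(), (x*w).sum(), (y*w).sum()
    dX = C2[:, 0].sum() - x.sum()       # sum x' - sum x
    dY = C2[:, 1].sum() - y.sum()       # sum y' - sum y
    # (19) cross-multiplied: dX*(v*T - Sy) = dY*(u*T - Sx).
    return [-dY*T, dX*T], dX*Sy - dY*Sx

def translation_direction(left1, left2, right1, right2, p, q):
    """Solve the 2x2 system from the left and right image pairs."""
    r1, b1 = view_equation(left1, left2, p, q)
    r2, b2 = view_equation(right1, right2, p, q)
    u, v = np.linalg.solve(np.array([r1, r2]), np.array([b1, b2]))
    return u, v
```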

Figures 20-24.


Finally, experimental results based on this method are very accurate (for motion parameter determination without using correspondence). Another method was presented by Kanatani (1985a, 1985b), but numerical instabilities affect the desired result a great deal.

Figures 20-24 show results of binocular and trinocular experiments. Figure 20 shows the perspective images of a planar contour taken by three cameras at the positions (0, 0), (0, 50) and (50, 0) respectively. The actual orientation of the contour in space was given by the gradient (p, q) = (15, 25). The computed orientation was (p, q) = (14.99, 24.99). Figure 21 shows again the perspective images of a planar contour taken by three cameras at the positions (0, 0), (0, 50) and (50, 0) respectively. The actual orientation of the contour in space was (p, q) = (30, 5) and the estimated orientation was (p, q) = (30, 4.99). Figure 22 shows the images of a translating planar contour (human figure) taken by a binocular system at two different time instants. The actual orientation of the contour in space was (p, q) = (10, 5) and the actual direction of translation (dx/dz, dy/dz) = (-4, 6). Our program recovered orientation (p, q) = (10.00007, 5.000297) and direction of translation (dx/dz, dy/dz) = (-4.000309, 6.00463). Figure 23 shows again the perspective images of a translating planar contour taken by a binocular system at two different time instants. The actual orientation of the contour was (p, q) = (-25, 30) and the direction of translation (dx/dz, dy/dz) = (50, 60). The computed orientation from these images was (p, q) = (-24.99, 30.000021) and the computed direction of translation (dx/dz, dy/dz) = (49.858421, 59.830266). Finally, Figure 24 shows the perspective images of a translating planar contour taken by a binocular system at two different times. The actual orientation of the contour was (p, q) = (10, -11) and the direction of translation (dx/dz, dy/dz) = (1.66, 3.33). The estimated parameters from these images were (p, q) = (9.99, -11.000383) and (dx/dz, dy/dz) = (1.66, 3.33).

Acknowledgements

Our thanks go to Dana Ballard and Amit Bandyopadhyay for their help during the preparation of this paper. This research was sponsored by the Defense Advanced Research Projects Agency under Grant DACA76-85-C-0001.

References

Aho, A.V., J.E. Hopcroft and J.D. Ullman (1974). The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, MA.

Aloimonos, J. (1986). Shape and motion from contour. Proc. IEEE-CVPR, Miami, FL, June.

Aloimonos, J. and P. Chou (1985). Detection of surface orientation and motion from texture. TR 161, Dept. of Computer Science, Univ. of Rochester, January.

Aloimonos, J. and M. Swain (1985). Shape from texture. Proceedings IJCAI 85, Los Angeles, CA, August.

Bajcsy, R. and L. Lieberman (1976). Texture gradients as a depth cue. Comp. Graphics Image Processing 5, 52-67.

Ballard, D.H. and C.M. Brown (1982). Computer Vision. Prentice-Hall, Englewood Cliffs, NJ.

Bandyopadhyay, A. (1984). Interesting points, disparities and correspondence. Proceedings DARPA Image Understanding Workshop, October.

Bandyopadhyay, A. and J. Aloimonos (1985). Perception of structure and motion of rigid objects. TR 169, Dept. of Computer Science, Univ. of Rochester.

Barrow, H.G. and J.M. Tenenbaum (1978). Recovering intrinsic scene characteristics from images. In: A. Hanson and E. Riseman, Eds., Computer Vision Systems. Academic Press, New York.

Brady, M. and A. Yuille (1984). An extremum principle for shape from contour. IEEE Trans. Pattern Anal. Mach. Intell. 6, 288-301.

Gibson, J.J. (1950). The Perception of the Visual World. Houghton Mifflin, Boston.

Horn, B.K.P. (1977). Understanding image intensities. Artificial Intell. 8(2), 201-231.

Ikeuchi, K. (1984). Shape from regular patterns. Artificial Intell. 22, 49-75.

Ikeuchi, K. and B.K.P. Horn (1981). Numerical shape from shading and occluding boundaries. Artificial Intell. 17, 141-184.

Kanade, T. (1981). Recovery of the three-dimensional shape of an object from a single view. Artificial Intell. 17, 409-460.


Kanatani, K. (1985a). Tracing planar surface motion from projection without knowing correspondence. CVGIP 29, 1-12.

Kanatani, K. (1985b). Detecting the motion of a planar surface by line and surface integrals. CVGIP 29, 13-22.

Kender, J.R. (1980). Shape from texture: An aggregation transform that maps a class of textures into surface orientation. Proceedings IJCAI, 475-480.

Kender, J.R. (1979). Shape from texture: A computational paradigm. Proceedings DARPA Image Understanding Workshop, April, 79-84.

Longuet-Higgins, H.C. (1981). A computer algorithm for reconstructing a scene from two projections. Nature 293, 133-135.

Marr, D. and E. Hildreth (1979). A theory of edge detection. AI Memo 518, MIT.

Ohta, Y., K. Maenobu and T. Sakai (1981). Obtaining surface orientation from texels under perspective projection. Proceedings IJCAI, 746-751.

Tsai, R.Y. and T.S. Huang (1984). Uniqueness and estimation of three-dimensional motion parameters of rigid objects with curved surfaces. IEEE Trans. Pattern Anal. Mach. Intell. 6, 13-27.

Ullman, S. (1979). The Interpretation of Visual Motion. MIT Press, Cambridge, MA.

Witkin, A. (1981). Recovering surface shape and orientation from texture. Artificial Intell. 17, 17-45.
