Velocity as a Cue to Segmentation




IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, MAY 1975, pp. 390-394

JERRY L. POTTER

Abstract-Although motion is one of the most fundamental aspects of the visual world, its use has not been widely explored in the field of scene description. This correspondence describes a program that demonstrates the feasibility of segmenting a scene into regions on the basis of velocity. A velocity measure is obtained by determining the amount of displacement of a reference feature between two pictures of the same scene. The pictures are taken only a short time apart so that the reference features can be correlated between the pictures on the basis of spatial position alone. This velocity measurement is associated with absolute reference points superimposed over the scene. It is assumed that all parts of an object have the same velocity measurement. Thus the velocity measurements associated with the reference points provide a means of classifying them as "belonging" to the same object. Points "inside" moving objects fall into either of two velocity measurement classes, which are easily correlated. Since the velocity of an object is independent of its color, a scene is segmented into regions that contain complete objects even if the objects are not monochromatic.

INTRODUCTION

The field of scene description has, in addition to the traditional problems of pattern recognition, the burden of dividing the scene into segments consisting of recognizable objects. Roberts [12] approached the problem by finding points that were on the edges of objects. Line segments were passed through these points, and the resulting line drawings were fitted to mathematical models. Guzman [5] segmented line drawings into objects on the basis of vertex configurations. Both of these procedures involve several steps.

Attempts have been made to segment a scene directly into regions [11], [2]. These attempts are based on the concept of finding areas of points of similar gray scale values. These methods have been used to segment a scene into surfaces. Yet because an object may consist of several surfaces, they do not necessarily segment a scene directly into complete objects. However, it is possible to segment a scene directly into regions consisting of single whole objects with the use of motion. This is because all of the points of an object move in concert. Therefore, if the motion of a point in a scene can be determined, it can be grouped with other points that have the same motion on the hypothesis that they are part of the same object.¹ A fundamental motivation for this approach to motion extraction is the ability to integrate motion cues with other standard cues such as gray scale values and proximity. However, this correspondence is concerned with demonstrating the use of motion for segmentation and does not attempt to combine motion and spatial cues for scene segmentation.

Motion is one of the most important cues to visual processing. Lettvin et al. [8] established that some of a frog's ganglion cells are capable of detecting motion. Motion detectors have also been reported in the cat [6], the pigeon [9], and the rabbit [1]. In fact, according to Walls [17] many animals may not see an object at all until it moves.

Despite this evidence for the importance of motion to visual perception, the field of scene description has by and large ignored it. In [15] Uhr approaches the subject. He shows how to recognize objects the individual features of which appear at different points in time. Several attempts have been made to use motion in the recognition of handwriting [4], [10], [14]. Generally, the tip of a pen is monitored during the writing process. The position of the pen as a function of time is then used as a basis for character recognition. Several articles in meteorology have been concerned with cloud motion [13], [7], [3]. These articles are primarily concerned with the detection of the global motion of cloud banks. They do not deal with local motion. Therefore, the techniques discussed, while quite good for their particular application, are not universal. For example, they tend to work poorly if motion in more than one general direction is present in a single scene.

THE SYSTEM

The program described here is part of a more comprehensive system designed to investigate the uses of motion in scene description. The system consists of two subsystems. One subsystem is written in PAL assembly language for the PDP-11 computer. It is responsible for collecting data from a Zeltex model ZD470 analog-to-digital converter which is connected to a KGM 113TM television camera. The output of the converter consists of a gray scale value ranging from zero to 255 for each point. The equipment is capable of sampling 524 288 points of the television camera picture. In practice, this is too much data to handle economically, so the PDP-11 subsystem selects a 2500-point sample. The data gathered on the PDP-11 is transmitted to a Univac 1110 computer by a remote interface arrangement. The second subsystem is programmed in Fortran 5 and performs the analysis described below.

ASSOCIATION OF VELOCITY AND POINTS

Velocity is the measurement of change of position over time. Therefore, the determination of velocity requires at least two pictures of the same scene taken at slightly different moments in time. The television camera is kept in the same position for the two pictures, so that the boundaries of both pictures can be aligned with each other. A Cartesian coordinate grid is superimposed over the pictures such that the positive x and y axes overlay the bottom and left boundaries, respectively. Thus the point (x,y) refers to the same physical location in both pictures. The alignment of the two pictures and the grid is essential so that the measurements from the two different pictures can be accurately compared.

The goal of the program is to segment a scene into regions consisting of collections of points such that every point in a region has the same velocity. However, measurements of velocity cannot be made directly on points, but only on objects and their details or features. Therefore, details of objects that can be used to correlate the positions of an object in the two pictures of the scene must be identified. These details, which enable the measurement of motion, are called reference details. The primary criterion for a reference detail is that it must be likely that at least one detail will be present in any given area of any given scene. In addition, it is advantageous for reference details to be easy to identify. A spatial discontinuity in gray scale value is a detail that is likely to be present in almost any portion of a scene. Moreover, discontinuities are easy to detect. However, if a scene contains many discontinuities, it is difficult to correlate them in the two pictures, especially when the discontinuities are in motion. This drawback is overcome by making the restriction that the two pictures have to be taken sufficiently close in time so that the discontinuities can be correlated on the basis of their proximity alone.

Manuscript received March 23, 1974; revised December 26, 1974. This work was supported in part by the National Science Foundation under Grant GJ-36312.

The author was with the Computer Science Department, University of Wisconsin, Madison, Wis. He is now with the Xerox Corporation, Rochester, N.Y. 14644.

¹ It is a basic assumption that "features" that move with the same velocity are part of the same object. Certainly there are exceptions, but this assumption, especially in the limiting sense, seems to hold generally. That is, this assumption can be made valid for most objects by restricting the motion to sufficiently short intervals.
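As noted above, a spatial discontinuity in gray scale value is easy to detect: it is a substantial and abrupt change between neighboring samples along a scan line. A minimal sketch of such a detector is given below (in Python, for illustration only); the threshold of 40 gray levels on the converter's 0 to 255 scale is an assumed value, not one given in the correspondence.

    # Minimal sketch: find gray-scale discontinuities along one row of samples.
    # The threshold is an illustrative assumption; the correspondence only
    # requires a "substantial and abrupt" change in gray scale value.
    THRESHOLD = 40

    def discontinuities_in_row(row):
        """Return the indices at which an abrupt gray-scale change occurs."""
        return [i + 1 for i in range(len(row) - 1)
                if abs(row[i + 1] - row[i]) >= THRESHOLD]

    # A dark object (gray level 30) against a light background (gray level 200):
    print(discontinuities_in_row([200, 200, 30, 30, 30, 200, 200]))  # [2, 5]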


A spatial gray scale discontinuity can be loosely thought of as an edge. However, it is important to realize that an edge is a two-dimensional spatial entity, while a gray scale discontinuity is a sudden change in the gray scale value along a single dimension. Edges or parts of edges are frequently manifest as gray scale discontinuities in two-dimensional representations of three-dimensional scenes. However, one of the major problems in scene analysis is that this is not always the case. That is, under strong light, spatial discontinuities may be present in a picture where no distinct edge is present in the scene, while under weak light, the edge of an object may not produce a sufficiently distinct discontinuity to be identified as an edge. One of the advantages of the process outlined here is that it makes no difference if the discontinuity is due to an edge or to some other aspect of the object (such as a highlight). The only criterion for the procedure to be effective is that the discontinuity move in concert with all of the features that can be associated with a single moving object.

A method of relating discontinuities to points is needed. A straightforward method is to associate a point with its nearest discontinuities. This is done by simply scanning in a straight line from a point outward until a discontinuity (or the border of the picture) is detected by a substantial and abrupt change in the gray scale value (see Fig. 1). The distance between the point and the discontinuity provides a unique association between them for a given direction of scanning. Such associations are made for each of the four directions parallel to the positive x, the positive y, the negative x, and the negative y axes.

Fig. 1. Associating discontinuities with a point (scans from the point in the 0, 90, 180, and 270 degree directions).

The velocity² of a discontinuity is proportionate to the distance it moves between the time when the first and second pictures are taken. The distance that a discontinuity moves is determined by subtracting the distance between a point and a discontinuity in the first picture from the distance between them in the second. This distance difference is assigned to the point as an approximate measure of the magnitude of its velocity in the direction of the scan (see Fig. 2). A measurement of the true velocity of a given point is obtained by taking a velocity measurement in each of the four directions previously mentioned.

Fig. 2. Determination of velocity for the zero degree direction (11 units in picture 1, 14 units in picture 2; units of velocity = 14 - 11 = 3 units).

An important property of this measurement of velocity is that its value is independent of the exact location of the point. Instead, its value is a function of the discontinuities that directly surround the point. If these discontinuities correspond to the edges of a single object, they are said to be "related," and the point is said to be "inside" the object. Every point surrounded by the same set of related discontinuities has the same velocity measurement. Moreover, the value of the velocity measurement associated with each surrounded point corresponds directly to the velocity of the object the edges of which are responsible for the discontinuities. Note also that if two "things" have the same velocity, then the points inside both "things" will have identical velocity measurements. Thus a cube that is in motion, even if it has a different gray scale value for each face, will be initially included in one segment because all the faces give the same velocity measurement (see Fig. 3).

Fig. 3. Effect of segmentation on basis of motion (velocity for point B = 4 - 2 = 2 units; velocity for point C = 5 - 3 = 2 units).

This method of determining a velocity measurement is simple. Unfortunately, the resulting measurements require extensive analysis to be useful for the segmentation of a completely general scene. Since the immediate purpose of this research is to determine the feasibility of using motion for segmentation purposes, only simple scenes consisting of rectangular shaped nonoccluded objects the sides of which are parallel to the axes have been completely analyzed to date. This analysis and the description of a segmentation program based on it follow.

² It should be pointed out that velocity is a vector quantity that has a direction component as well as magnitude.
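The velocity measurement of Fig. 2 can be sketched directly: from a chosen grid point, scan outward in each of the four axis directions until a discontinuity (or the picture border) is reached, record the distance in each picture, and subtract the picture 1 distance from the picture 2 distance. The sketch below assumes the pictures are stored as arrays indexed picture[y][x], uses the same kind of threshold test as above, and treats increasing row index as the positive y direction; all of these are assumptions made for illustration only.

    # Sketch of the differenced velocity measurement (Fig. 2): the distance
    # from a point to its nearest discontinuity is measured in each of the
    # four axis directions in both pictures, and the picture 1 distance is
    # subtracted from the picture 2 distance for each direction.
    THRESHOLD = 40
    # 0, 90, 180, and 270 degrees correspond to the positive x, positive y,
    # negative x, and negative y axes respectively.
    DIRECTIONS = {0: (1, 0), 90: (0, 1), 180: (-1, 0), 270: (0, -1)}

    def distance_to_discontinuity(picture, x, y, dx, dy):
        """Scan from (x, y) until an abrupt gray-scale change or the border."""
        steps = 0
        while True:
            nx, ny = x + dx, y + dy
            if not (0 <= ny < len(picture) and 0 <= nx < len(picture[0])):
                return steps                    # border of the picture reached
            steps += 1
            if abs(picture[ny][nx] - picture[y][x]) >= THRESHOLD:
                return steps                    # discontinuity found
            x, y = nx, ny

    def differenced_velocity(picture1, picture2, x, y):
        """Picture 2 distance minus picture 1 distance, per direction."""
        return {d: distance_to_discontinuity(picture2, x, y, dx, dy)
                   - distance_to_discontinuity(picture1, x, y, dx, dy)
                for d, (dx, dy) in DIRECTIONS.items()}

For the situation of Fig. 2, where the nearest discontinuity in the zero degree direction is 11 units away in picture 1 and 14 units away in picture 2, the differenced value is 14 - 11 = 3 units.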


TABLE I
(Distances in grid units; 0, 90, 180, and 270 are the scan directions in degrees; each summed column pairs a direction with its complement.)

             Picture 1           Picture 2          Differenced Motion        Summed Motion        Region
           0   90  180  270    0   90  180  270     0    90   180  270   0+180 90+270 180+0 270+90  Type
Point A   11    3    9    1   15    3    5    1     4     0    -4    0     24     4     16     4      1
Point B    7    5    1    7    3    1   17    3    -4    -4    16   -4      4     8     24     8      2
Point C   19    2    1    2    3    6    5    6   -16     4     4    4      4     8     24     8      3
Point D    2    6    6    6    2    6    2    6     0     0    -4    0      8    12      4    12      4
Point E    3    6    1    6    7    6    1    6     4     0     0    0      8    12      4    12      5
Point F   27    3    5    1   27    3    5    9     0     0     0    8     32     4     32    12      6
Point G   31    2    1   10   31    2    1   10     0     0     0    0     32    12     32    12      7
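The differenced and summed columns of Table I follow mechanically from the raw point-to-discontinuity distances: each differenced entry is the picture 2 distance minus the picture 1 distance in the same direction, and each summed entry is the picture 2 distance plus the picture 1 distance in the complementary direction. The short sketch below reproduces the row for point A (the two dictionaries simply restate the table's Picture 1 and Picture 2 values).

    # Reproduce the differenced and summed measurements of Table I for point A.
    COMPLEMENT = {0: 180, 90: 270, 180: 0, 270: 90}

    def differenced(pic1, pic2):
        return {d: pic2[d] - pic1[d] for d in pic1}

    def summed(pic1, pic2):
        # picture 2 distance plus the picture 1 distance in the complementary direction
        return {d: pic2[d] + pic1[COMPLEMENT[d]] for d in pic1}

    point_a_pic1 = {0: 11, 90: 3, 180: 9, 270: 1}
    point_a_pic2 = {0: 15, 90: 3, 180: 5, 270: 1}

    print(differenced(point_a_pic1, point_a_pic2))  # {0: 4, 90: 0, 180: -4, 270: 0}
    print(summed(point_a_pic1, point_a_pic2))       # {0: 24, 90: 4, 180: 16, 270: 4}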

CLASSIFICATION OF POINTS ON BASIS OF VELOCITY

Some points in a picture may be surrounded by discontinuities that are not related but instead are formed by edges of different objects, as in Fig. 4. Some points may be surrounded by related discontinuities in one picture but not in the other. Each of these different situations requires a different interpretation of the velocity associated with the points. If motion is limited to one object at a time, then seven different types of regions can be identified.

Fig. 4. Example of a point surrounded by unrelated discontinuities, shown for picture 1 and picture 2.

Fig. 5 illustrates type 1-3 regions. A type 1 region contains points that are inside the same moving object in both pictures. For this reason, it is referred to as the "body" of the object. Points of the body of an object have the property that the differenced velocity measurements in complementary directions always sum to zero. The entry for point A in Table I is derived from Fig. 5.

A type 2 region contains points that are surrounded by discontinuities corresponding to the edges of an object in the second picture but not in the first. A type 3 region (the converse of type 2) contains points that are inside the object in the first picture but not in the second. A type 2 region is referred to as the "forward shadow" of an object because the body moves forward into it. A type 3 region is called the "backward shadow" because the points in the shadows are inside the object in one picture but not in the other. All four of the differenced velocity values are always nonzero. Moreover, since the common discontinuity is found by scanning in one direction in the first picture and in the complementary³ direction in the second, its velocity is determined by summing the distances between the point and the discontinuity in the two complementary directions, one distance from each picture. These values are shown as the summed velocity measurements in Table I. In Fig. 5, for example, the common discontinuity is found by scanning in the negative x axis direction from point C in the first picture. However, in the second picture, the same discontinuity is in the positive x axis direction from point C.

Fig. 5. Type 1-3 regions (labeled Type 3, Type 1, and Type 2).

Fig. 6 depicts type 4-7 regions. Type 4 regions contain points that are outside of the object in both pictures but are aligned such that they detect the object moving toward the point. Type 5 regions are complementary to type 4; they have points that detect the object moving away. Three of the four differenced motion values are always zero. The fourth reflects the velocity of the discontinuity and is, therefore, the same as in the related type 1 (body) region. Points D and E in Fig. 6 and Table I illustrate this point.

A type 6 region is an unusual phenomenon. It contains points that are associated with "pseudomotion." Pseudomotion is motion that is perpendicular to the motion of the object that caused it. The detection of pseudomotion is caused by the object moving out of or into the path of the discontinuity detection scan. Three of the four differenced velocity values are always zero for points in type 6 regions. The fourth value has no relationship to the body of the object that caused it. See point F in Fig. 6 and Table I for an example of a point in a type 6 region.

Finally, a type 7 region is one in which the points are surrounded by discontinuities that are not in motion. Therefore, all four differenced velocity values are zero. See point G in Fig. 6 and Table I.

³ The positive x axis and the negative x axis directions are complementary. The positive y axis and the negative y axis directions are complementary.
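Up to the distinction among types 4, 5, and 6 (which requires knowing the object's direction of motion relative to the point), the seven region types can be told apart from the pattern of the four differenced values alone: type 7 points have all four values zero, shadow points (types 2 and 3) have all four nonzero, body points have complementary pairs that cancel, and types 4 through 6 have exactly one nonzero value. A sketch of that test follows, assuming the differenced values are supplied as a direction-to-value mapping.

    # Sketch: infer the region type of Table I from the four differenced values.
    # Types 4, 5, and 6 cannot be separated without the object's direction of
    # motion, so they are reported together here.
    COMPLEMENT = {0: 180, 90: 270, 180: 0, 270: 90}

    def region_type(diff):
        values = list(diff.values())
        if all(v == 0 for v in values):
            return "type 7 (stationary surroundings)"
        if all(v != 0 for v in values):
            return "type 2 or 3 (forward or backward shadow)"
        if all(diff[d] + diff[COMPLEMENT[d]] == 0 for d in diff):
            return "type 1 (body)"
        return "type 4, 5, or 6 (background beside a moving object)"

    print(region_type({0: 4, 90: 0, 180: -4, 270: 0}))     # point A: body
    print(region_type({0: -4, 90: -4, 180: 16, 270: -4}))  # point B: shadow
    print(region_type({0: 0, 90: 0, 180: 0, 270: 8}))      # point F: type 4-6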


Fig. 6. Type 4-7 regions (points D, E, and F of Table I), shown for picture 1 and picture 2.

Fig. 7. Misclassified regions, shown for picture 1 and picture 2.

The velocity measurement associated with each point inside an object is the same. This means that processing more than one point in a region to obtain more velocity information is a waste of effort. Therefore, a serial organization instead of the more common parallel organization was used.

A serial organization requires a method of ordering and selecting the points for processing. A short preprocessor was written which looks at a list of n (a parameter) evenly distributed points and orders them on the basis of their proximity to discontinuities. Those points that are the closest⁴ to the most discontinuities are ranked highest. The program processes the highest ranked points first.

As each new point is processed, eight velocity measurements are made. Four, the differenced velocity measurements, are made by subtracting the distance between the point and the discontinuity in the first picture from the corresponding distance in the second picture for each of the four directions. The other four, the summed velocity measurements, are obtained by adding to the distance between the point and discontinuity in the second picture the distance in the complementary direction from the first picture. A sample set of measurements for the points in Figs. 5 and 6 is given in Table I. On the basis of these measurements, the point is initially classified as body (type 1) if the complementary directions of the differenced velocity sum to zero; shadow (type 2 and 3) if all four of the differenced velocity values are nonzero; and background (type 4-7) otherwise.

If the new point is classified as body, its velocity values are compared with all previously processed body points. If there is a match, they are grouped together. If there is no match, the previously processed shadow points are examined for a velocity (speed and direction) match. If a match is found, the points' grid values are compared to determine the shadow type.⁵ If they are a forward shadow-body pair, they are associated but not grouped together. Thus points can be added to either group on the basis of an initial match alone. The two associated groups correspond to a single object. If the matched point is a backward shadow point, no association is made because the matched point is not part of the object in the current scene. If the new body point is not successfully grouped with any other point, it is simply entered in the list of processed body points.

If the initial classification of the point is shadow, the motion values are compared with all previously processed points that were classified as shadow. If there is a match, the points are grouped together. Otherwise, the previously processed body points are investigated to look for an exact velocity match. If found, the points are tested for the forward shadow-body relationship. If the test is successful, the points, as in the preceding, are associated but not grouped together. If the test is unsuccessful, the search continues.

⁴ The smallest sum of the Euclidean distances in each of the four axis directions.

⁵ Forward shadow points are in the positive velocity direction from the body points.
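The grouping pass just described reduces to a small amount of bookkeeping over the ranked points: a body point joins an existing body group whose velocity matches, a shadow point joins a matching shadow group, and when a new group finds an equal-velocity group of the other kind that passes the forward shadow test, the two groups are associated (but not merged) as one object. The sketch below is a loose paraphrase under those assumptions: each point is assumed to arrive with its initial classification and an already extracted (direction, speed) velocity, the point-ordering preprocessor is not shown, and the forward shadow test of footnote 5 is reduced to a placeholder.

    # Loose sketch of the serial grouping pass.  Points arrive highest ranked
    # first, each carrying 'kind' ("body", "shadow", or "background") and a
    # 'velocity' pair (direction in degrees, speed in grid units).
    def is_forward_shadow(body_group, shadow_group):
        # Placeholder for footnote 5: forward shadow points lie in the positive
        # velocity direction from the body points; a full test compares the
        # points' grid coordinates against the shared velocity direction.
        return True

    def group_points(ranked_points):
        body, shadow, background = [], [], []   # body/shadow hold groups of points
        associations = []                       # (body group, shadow group) pairs
        for p in ranked_points:
            if p["kind"] == "background":
                background.append(p)            # background points are simply pooled
                continue
            same, other = (body, shadow) if p["kind"] == "body" else (shadow, body)
            for group in same:                  # join a group with matching velocity
                if group["velocity"] == p["velocity"]:
                    group["members"].append(p)
                    break
            else:                               # no match: start a new group, then
                new = {"velocity": p["velocity"], "members": [p]}
                same.append(new)
                for group in other:             # look for an equal-velocity partner
                    if group["velocity"] == new["velocity"]:
                        b, s = (new, group) if p["kind"] == "body" else (group, new)
                        if is_forward_shadow(b, s):
                            associations.append((b, s))  # body plus forward shadow: one object
                        break
        return body, shadow, background, associations

    points = [
        {"name": "A", "kind": "body",   "velocity": (0, 4)},    # object moving 4 units in +x
        {"name": "B", "kind": "shadow", "velocity": (0, 4)},    # its forward shadow
        {"name": "G", "kind": "background", "velocity": (0, 0)},
    ]
    print(group_points(points))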


If the initial classification is background, the point is combined with the other previously processed background points. If the motion of an object is other than parallel to one of the axes, there may be "shadow" areas, which are classified as type 6 regions. It is assumed that the pictures were taken close enough in time so that these areas are small. The points in these areas will have to be reclassified on the basis of their position. Fig. 7 illustrates these areas.

In this process, it is unlikely that a point that is initially misclassified due to a peculiarity of the picture will be grouped with points from another region. Erroneous grouping can occur only when the velocity values for the misclassified points are coincidentally identical to a legitimate set of values. In this case, no action is taken and a false segmentation will occur.

1110 SUBSYSTEM

The preceding program can identify all the regions of a simple scene by processing just a few of the ordered points. The 1110 subsystem, of which this program is only a part, is aware of this fact. Accordingly, it reorders the list of points as each point is processed. In addition, it has a hierarchically organized scene description component that attempts to identify the objects on the basis of the data gathered while processing the points. Whenever a scene has been satisfactorily described, the subsystem stops and prints its results.

RESULTS

The current system can theoretically handle any number of rectangular objects in a scene. To this date, it has been successfully tested with three black rectangular objects moving in front of a white background. Each object moves with a different velocity. The scene is always successfully segmented into three objects and background. In certain scenes with four or more objects, it may initially misclassify points due to unique circumstances. Fig. 4 is an example of a scene where a background point could be initially misclassified as a shadow area point.⁶ No effort is made in the current version of the program to reclassify misclassified points. As a result, under these circumstances, false segmentation will occur. In general, the more objects moving in the scene, the poorer the performance. The current version is also incapable of handling occluded objects, oblique motion parallel to an edge, rotation, or forward and backward motion. In addition, due to the simple implementation of the discontinuity matching, the discontinuities must be quite pronounced.

⁶ The conditions under which misclassifications can occur are quite varied. They are not only a function of the number of objects in motion in a scene but also of the relationships between the direction of the motion and spatial position. The nature of these conditions has not been completely explored yet.

CONCLUSION

This first attempt to use motion for segmentation was highly successful in its ability to separate moving rectangular objects from stationary background. A more general ability of this nature would be most useful in a real time situation where an entire scene could not always be processed; it would enable the analytical selection of cohesive regions of a scene for more extensive analysis.

The program is, however, severely restricted by the types of objects and motion it can handle. However, investigation is currently under way with a cross-shaped template that promises to extend potential segmentation to any arbitrarily shaped object undergoing any type of velocity movement. Extensions to occluded objects and forward and backward motion seem to be straightforward. Rotation (especially unrestricted, i.e., rotation about any point), however, seems at this point to be beyond this approach to motion perception. It is hoped that a report on the progress of the investigation of the aforementioned extensions will be forthcoming shortly.

ACKNOWLEDGMENT

The author wishes to express his gratitude to Dr. L. Uhr, who provided many helpful suggestions and criticisms during the course of this research.

REFERENCES

[1] H. B. Barlow, R. M. Hill, and W. R. Levick, "Retinal ganglion cells responding selectively to direction and speed of image motion in the rabbit," J. Physiol., vol. 173, pp. 377-407, 1964.
[2] C. R. Brice and C. L. Fennema, "Scene analysis using regions," Artificial Intelligence, vol. 1, pp. 205-226, 1970.
[3] C. L. Bristor, "The earth location of geostationary satellite imagery," Pattern Recognition, vol. 2, pp. 269-277, 1970.
[4] R. M. Brown, "On-line computer recognition of handprinted characters," IEEE Trans. Comput., vol. C-13, pp. 750-752, 1964.
[5] A. Guzman, "Decomposition of a visual scene into three dimensional bodies," in 1968 Fall Joint Computer Conf., AFIPS Conf. Proc., vol. 33, 1968, pp. 291-304.
[6] D. H. Hubel and T. N. Wiesel, "Receptive fields and functional architecture in two nonstriate visual areas of the cat," J. Neurophysiol., vol. 28, pp. 229-289, 1965.
[7] J. A. Leese, C. S. Novak, and V. R. Taylor, "The determination of cloud pattern motions from geosynchronous satellite image data," Pattern Recognition, vol. 2, 1970.
[8] J. Y. Lettvin, H. R. Maturana, W. S. McCulloch, and W. H. Pitts, "What the frog's eye tells the frog's brain," Proc. IRE, vol. 47, pp. 1940-1951, Nov. 1959.
[9] H. R. Maturana and S. Frenk, "Directional movement and horizontal edge detection in the pigeon retina," Science, vol. 142, pp. 977-979, 1963.
[10] T. Marrill, A. K. Hartley, T. G. Evans, B. H. Bloom, D. M. R. Park, T. P. Hart, and D. L. Darley, "CYCLOPS-1: A second-generation recognition system," in 1963 Fall Joint Computer Conf., AFIPS Conf. Proc., vol. 24, 1963, pp. 27-33.
[11] M. Minsky and S. Papert, Project MAC Progress Report IV. Cambridge, Mass.: M.I.T. Press, 1967.
[12] L. G. Roberts, "Machine perception of three dimensional solids," in Optical and Electro-optical Information Processing, J. T. Tippet et al., Eds. Cambridge, Mass.: M.I.T. Press, 1965.
[13] E. A. Smith and D. R. Phillips, "Automated cloud tracking using precisely aligned digital ATS pictures," IEEE Trans. Comput., vol. C-21, pp. 715-729, July 1972.
[14] W. Teitelman, "Real time recognition of hand-drawn characters," in 1964 Fall Joint Computer Conf., AFIPS Conf. Proc., vol. 29, 1964, pp. 559-575.
[15] L. Uhr, "The description of scenes over time and space," Univ. Wisconsin, Madison, Tech. Rep. 172, Feb. 1973.
[16] L. Uhr and C. Vossler, "A pattern recognition program that generates, evaluates, and adjusts its own operators," in Pattern Recognition, L. Uhr, Ed. New York: Wiley, 1966, pp. 349-364.
[17] G. L. Walls, The Vertebrate Eye and Its Adaptive Radiation. New York.