Concurrent 3-D Motion Segmentation and 3-D Interpretation of Temporal Sequences of Monocular Images


SEKKATI AND MITICHE: CONCURRENT 3-D MOTION SEGMENTATION 9

Fig. 6. (a) Gray scale of depth. (b) Ground truth image motion of marbled-block. (c) Recovered image motion from the 3-D variables estimated by our method. (d) Image motion computed with the Horn–Schunck method.

does not produce a translation in the projection plane [see (1)]. This can be clearly seen in this example.

Fig. 4 displays the recovered depth, triangulated and shaded according to a local light source. To show the improvement obtained with anisotropic smoothing of depth (11) in preserving boundaries (eyes, noses, bow tie), the recovered structure is displayed when using anisotropy [Fig. 4(a)] and without anisotropy [(12); Fig. 4(b)]. The estimated depth is shown in gray scale in Fig. 4(c).
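The boundary-preserving behavior can be illustrated with a generic Perona–Malik-style diffusion sketch. This is not the paper's regularizer (11), only an analogous edge-stopping scheme; the function name and the parameters `iters`, `k`, and `lam` are illustrative:

```python
import numpy as np

def anisotropic_smooth(Z, iters=50, k=0.1, lam=0.2):
    # Perona-Malik-style diffusion: the conductance g() shuts smoothing
    # down across large depth differences (boundaries), while near-uniform
    # regions are smoothed much as in the isotropic case (12).
    # np.roll gives periodic boundaries, a simplification.
    Z = Z.astype(float).copy()
    g = lambda d: np.exp(-(d / k) ** 2)  # edge-stopping conductance
    for _ in range(iters):
        dn = np.roll(Z, -1, axis=0) - Z  # neighbor differences
        ds = np.roll(Z, 1, axis=0) - Z
        de = np.roll(Z, -1, axis=1) - Z
        dw = np.roll(Z, 1, axis=1) - Z
        Z += lam * (g(dn) * dn + g(ds) * ds + g(de) * de + g(dw) * dw)
    return Z
```

With a sharp unit step in depth and a small k, the cross-edge conductance is essentially zero, so the discontinuity survives the smoothing.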

In this next example, we use the Marbled-block sequence (image database of KOGS/IAKS Laboratory, University of Karlsruhe, Germany). Two blocks move independently against a static background. The larger block moves to the left and slightly up. The other block moves diagonally to the left. Fig. 5(a) shows the first of the two frames used, along with the two curves of the initial segmentation. The blocks cast shadows. Also, the texture of the top of the blocks is identical to that of the background, causing two of the image occluding edges to be very weak, not visible in places. For this sequence, depth presents sharp discontinuities at occlusion boundaries. The computed segmentation is displayed in Fig. 5(b). The reconstructed depth of the blocks (triangulated and shaded according to a local light source) is shown in Fig. 5(c) and (d). Note that the edges between the blocks' faces are preserved due to the use of anisotropic diffusion on depth. A gray level representation of

depth is also given in Fig. 6(a). The ground truth image motion, the image motion reconstructed from the estimated depth (1), and the image motion computed by the Horn–Schunck algorithm are displayed in Fig. 6(b)–(d), respectively.
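Reconstructing image motion from depth and rigid 3-D motion follows the classical instantaneous rigid-motion model of Longuet-Higgins and Prazdny [19]. A sketch, under assumed sign and axis conventions that may differ from the paper's viewing system of Fig. 1 (the function name and parameters are illustrative):

```python
import numpy as np

def rigid_motion_flow(Z, T, W, f=1.0):
    # Instantaneous image motion (u, v) of a rigidly moving scene:
    #   Z : depth map, T = (T1, T2, T3) : translation,
    #   W = (w1, w2, w3) : rotation, f : focal length.
    h, w = Z.shape
    y, x = np.mgrid[0:h, 0:w].astype(float)
    x -= w / 2.0  # image coordinates relative to the principal point
    y -= h / 2.0
    T1, T2, T3 = T
    w1, w2, w3 = W
    # Translational part depends on depth; rotational part does not.
    u = (-f * T1 + x * T3) / Z + x * y * w1 / f - (f + x**2 / f) * w2 + y * w3
    v = (-f * T2 + y * T3) / Z + (f + y**2 / f) * w1 - x * y * w2 / f - x * w3
    return u, v
```

Note that only the translational terms carry depth information, which is why pure rotation produces no translation-in-depth cue in the projection plane.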

This last example uses the Robert sequence of real images taken in common, realistic conditions (Heinrich-Hertz Institute image database, Germany). This is a complex sequence of real images of a person moving his head, lips, and eyebrows against a noisy textured background. The head moves backward and slightly to the left. The movements of the eyebrows and lips do not fit the rigid motion assumption when considered with the movement of the head. The sequence is taken under common lighting. There are large areas (hair, cheeks, forehead) with weak spatiotemporal variations. The particularities of this sequence make it a good testbed to show the difficulties faced by, and the limitations of, algorithms that reconstruct 3-D structure from a temporal sequence with weak spatiotemporal variations. Fig. 7(a) shows the first frame of this sequence along with the curve of the initial segmentation. The final segmentation is shown in Fig. 7(b). The image motion reconstructed from (1) and the image motion computed by the Horn–Schunck algorithm are displayed in Fig. 7(c) and (d), respectively. The reconstructed depth of the head (triangulated and shaded according to a local light source) is shown in Fig. 7(e), and the corresponding gray

10 IEEE TRANSACTIONS ON IMAGE PROCESSING

Fig. 7. (a) First frame of the Robert sequence and initial level set. (b) Computed 3-D motion segmentation. (c) Image motion reconstructed from the 3-D variables estimated by our method. (d) Image motion computed with the Horn–Schunck method. (e) Reconstructed 3-D structure of the moving head. (f) Gray scale of depth.

scale of depth of the whole scene is shown in Fig. 7(f). The computed segmentation is overall satisfactory, although parts of the hair and ears, as well as the lips and eyebrows, are assigned to the background due to their lack of texture. Also, the reconstructed depth is shallower than in the other examples. This is due to the weak spatiotemporal variations over practically the entire face in the image. Varying the initialization has given the same segmentation and 3-D interpretation. As with the other examples, careful setting of the weights in the functional is necessary, since these weights set the relative contribution of its various terms.
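The Horn–Schunck image motion used for comparison in Figs. 6(d) and 7(d) is the classical algorithm of [9]. A minimal sketch, with illustrative parameters `alpha` and `iters`, simplified gradients, and periodic-boundary neighborhood averaging:

```python
import numpy as np

def horn_schunck(I1, I2, alpha=1.0, iters=100):
    # Horn-Schunck optical flow [9]: alternate between the neighborhood
    # average of the flow and a correction along the image gradient that
    # enforces brightness constancy Ix*u + Iy*v + It = 0.
    I1 = I1.astype(float)
    I2 = I2.astype(float)
    Ix = np.gradient(I1, axis=1)
    Iy = np.gradient(I1, axis=0)
    It = I2 - I1
    u = np.zeros_like(I1)
    v = np.zeros_like(I1)

    def avg(a):  # 4-neighbor average (periodic boundaries for brevity)
        return (np.roll(a, 1, 0) + np.roll(a, -1, 0)
                + np.roll(a, 1, 1) + np.roll(a, -1, 1)) / 4.0

    for _ in range(iters):
        ub, vb = avg(u), avg(v)
        num = Ix * ub + Iy * vb + It
        den = alpha**2 + Ix**2 + Iy**2
        u = ub - Ix * num / den
        v = vb - Iy * num / den
    return u, v
```

The quadratic smoothness term propagates flow into the weakly textured regions mentioned above, which is precisely where such estimates become unreliable.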

A. Anaglyph Viewing of the Computed 3-D Interpretation

We can view the 3-D interpretation of a monocular sequence by viewing an anaglyph of a stereoscopic image constructed from an image of the monocular sequence and the estimated depth for that image. Anaglyphs on paper are best viewed when printed on high-quality photographic paper. When viewing on a CRT screen, high resolution and display options for high-quality color image rendering offer a clearer impression of depth. In all cases, however, anaglyphs offer a good, inexpensive means of viewing 3-D interpretation results.


Fig. 8. Color anaglyphs of (a) Squares, (b) Marbled-block, (c) Teddy1, and (d) Robert.

Given an image and the corresponding depth map, we constructed a stereoscopic image using the following simple scheme. Let I be the given image. I will be one of the two images of the stereoscopic pair, and we construct the other image, I'. Let S be the viewing system representing the camera which acquired I, and S' that of the other (fictitious) camera. Both viewing systems are as in Fig. 1. S' is placed to differ from S by a translation of amount b along the X axis.

Let p = (x, y) be a point on the image positional array of I, corresponding to a point P in space with coordinates (X, Y, Z) in S. The coordinates of P in S' are X - b, Y, and Z. The coordinates (x', y') of the image of P in the image domain of I' are, according to our viewing system model (Fig. 1),

x' = x - fb/Z, (22)

y' = y. (23)

Because depth has been estimated, coordinates (x', y') are known. Image I', which will be the second of the stereoscopic pair, is then constructed as follows:

I'(x~, y') = I(x, y), (24)

where x~ is the x-coordinate of the point on the image positional array of I' with x-coordinate closest to x'. Alternatively, one can use interpolation. However, we found it unnecessary for our purpose here.
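In code, this forward-mapping scheme might be sketched as follows; the baseline b, focal length f, and the sign of the disparity shift are assumptions about the viewing geometry, and occlusions are handled simply by overwriting:

```python
import numpy as np

def second_view(I, Z, b, f=1.0):
    # Shift each pixel of I horizontally by the disparity f*b/Z and
    # write it to the nearest integer column of the new image
    # (nearest-neighbor assignment rather than interpolation).
    h, w = I.shape
    I2 = np.zeros_like(I, dtype=float)
    for y in range(h):
        for x in range(w):
            xs = x - f * b / Z[y, x]  # x' = x - f*b/Z, assumed sign
            xi = int(round(xs))       # nearest array column
            if 0 <= xi < w:
                I2[y, xi] = I[y, x]
    return I2
```

For a constant-depth scene, the mapping reduces to a uniform horizontal shift of the whole image, as expected of a fronto-parallel plane.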

The anaglyph images constructed for the four sequences are shown in Fig. 8. They are to be viewed with chromatic (red-blue) glasses (common, inexpensive commercial plastic glasses are available). Viewers presented with these anaglyphs experienced a strong sense of depth for all sequences. The 3-D interpretation of the Squares sequence places the two objects at the same depth against the background because the regularization term for depth vanishes for planar surfaces. The anaglyphs have been generated using the algorithm in [40] (courtesy of E. Dubois, its inventor).
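The projection method of [40] is considerably more elaborate; as a baseline, a minimal red-blue encoding of a grayscale stereo pair (left view to the red channel, right view to the blue, an assumed assignment) is simply:

```python
import numpy as np

def red_blue_anaglyph(left, right):
    # Stack the left view into the red channel and the right view into
    # the blue channel; red-blue glasses then route one view per eye.
    rgb = np.zeros(left.shape + (3,), dtype=float)
    rgb[..., 0] = left   # red channel: left-eye image
    rgb[..., 2] = right  # blue channel: right-eye image
    return rgb
```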

V. CONCLUSION

We presented a novel method to segment multiple independent 3-D motions and simultaneously infer a 3-D interpretation in temporal sequences of monocular images. Both the viewing system and the viewed objects were allowed to move. The problem was stated according to a variational formulation. The corresponding Euler–Lagrange descent equations led to an algorithm which, after initialization, iterated three consecutive steps, namely, computation of rigid 3-D motion parameters by least squares, depth by gradient descent, and curve evolution by level set PDEs. The algorithm and its implementation have been validated on synthetic and real image sequences. Viewers strongly perceived depth in stereoscopic images constructed from the scheme's output.

APPENDIX

The energy (7) can be written in general form as

(25)

where . The functional derivative of with respect to is

where

Therefore

which yields (9). Following the justification in [34] and [37], the descent equations with respect to , , are obtained by simultaneous minimization of the following functionals:

(26)

where is the set complement of and

. Using the generic derivations in [27], [38], the derivatives with respect to of the terms of (26) are given by

which lead to the evolution equations (17) for , .

REFERENCES

[1] T. Huang, Image Sequence Analysis. Berlin, Germany: Springer-Verlag, 1981.

[2] J. Aloimonos and C. Brown, "Direct processing of curvilinear sensor motion from a sequence of perspective images," in Proc. IEEE Workshop on Computer Vision: Representation and Analysis, Annapolis, MD, 1984, pp. 72–77.

[3] B. Horn and E. Weldon, "Direct methods for recovering motion," Int. J. Comput. Vis., vol. 2, no. 2, pp. 51–76, 1988.

[4] J. Aggarwal and N. Nandhakumar, "On the computation of motion from a sequence of images: A review," Proc. IEEE, vol. 76, no. 8, pp. 917–935, Aug. 1988.

[5] T. Huang and A. Netravali, "Motion and structure from feature correspondences: A review," Proc. IEEE, vol. 82, no. 2, pp. 252–268, Feb. 1994.

[6] O. Faugeras, Three-Dimensional Computer Vision: A Geometric Viewpoint. Cambridge, MA: MIT Press, 1993.

[7] A. Mitiche, Computational Analysis of Visual Motion. New York: Plenum, 1994.

[8] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge, U.K.: Cambridge Univ. Press, 2000.

[9] B. Horn and B. Schunck, "Determining optical flow," Artif. Intell., vol. 17, pp. 185–203, 1981.

[10] G. Aubert, R. Deriche, and P. Kornprobst, "Computing optical flow via variational techniques," SIAM J. Appl. Math., vol. 60, no. 1, pp. 156–182, 1999.

[11] A. Mitiche and P. Bouthemy, "Computation and analysis of image motion: A synopsis of current problems and methods," Int. J. Comput. Vis., vol. 19, no. 1, pp. 29–55, 1996.

[12] J. Mellor, S. Teller, and T. Lozano-Perez, "Dense depth map for epipolar images," in Proc. Image Understanding Workshop, 1997, pp. 893–900.

[13] S. Negahdaripour and B. Horn, "Direct passive navigation," IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-9, no. 1, pp. 168–176, 1987.

[14] R. Laganiere and A. Mitiche, "Direct Bayesian interpretation of visual motion," J. Robot. Autonom. Syst., no. 14, pp. 247–254, 1995.

[15] R. Chellappa and S. Srinivasan, "Structure from motion: Sparse versus dense correspondence methods," in Proc. Int. Conf. Image Processing, vol. 2, 1999, pp. 492–499.

[16] Y. Hung and H. Ho, "A Kalman filter approach to direct depth estimation incorporating surface structure," IEEE Trans. Pattern Anal. Mach. Intell., vol. 21, no. 6, pp. 570–575, Jun. 1999.

[17] S. Srinivasan, "Extracting structure from optical flow using the fast error search technique," Int. J. Comput. Vis., vol. 37, no. 3, pp. 203–230, 2000.

[18] T. Brodsky, C. Fermüller, and Y. Aloimonos, "Structure from motion: Beyond the epipolar constraint," Int. J. Comput. Vis., vol. 37, no. 3, pp. 231–258, 2000.

[19] H. Longuet-Higgins and K. Prazdny, "The interpretation of a moving retinal image," Proc. Roy. Soc. Lond. B, pp. 385–397, 1981.

[20] G. Adiv, "Determining three-dimensional motion and structure from optical flow generated by several moving objects," IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-7, no. 4, pp. 384–401, Apr. 1985.

[21] A. Mitiche and S. Hadjres, "MDL estimation of a dense map of relative depth and 3D motion from a temporal sequence of images," Pattern Anal. Applicat., no. 6, pp. 78–87, 2003.

[22] H. Sekkati and A. Mitiche, "Dense 3D interpretation of image sequences: A variational approach using anisotropic diffusion," presented at the Int. Conf. Image Analysis and Processing, Mantova, Italy, 2003.

[23] F. Morier, H. Nicolas, J. Benois, D. Barba, and H. Sanson, "Relative depth estimation of video objects for image interpolation," in Proc. Int. Conf. Image Processing, 1998, pp. 953–957.

[24] F. Martinez, J. Benois-Pineau, and D. Barba, "Extraction of the relative depth information of objects in video sequences," in Proc. Int. Conf. Image Processing, 1998, pp. 948–952.

[25] J. Sethian, Level Set Methods and Fast Marching Methods. Cambridge, U.K.: Cambridge Univ. Press, 1999.

[26] S. Jehan, M. Gastaud, M. Barlaud, and G. Aubert, "Region-based active contours using geometrical and statistical features for image segmentation," presented at the Int. Conf. Image Processing, Barcelona, Spain, 2003.

[27] G. Aubert and P. Kornprobst, Mathematical Problems in Image Processing. New York: Springer, 2002.

[28] S. Osher and N. Paragios, Geometric Level Set Methods in Imaging, Vision, and Graphics. New York: Springer, 2003.

[29] O. Faugeras and R. Keriven, "Variational principles, surface evolution, PDE's, level set methods, and the stereo problem," IEEE Trans. Image Process., vol. 7, no. 3, pp. 336–344, Mar. 1998.

[30] H. Sekkati and A. Mitiche, "Joint dense 3D interpretation and multiple motion segmentation of temporal image sequences: A variational framework with active curve evolution and level sets," presented at the Int. Conf. Image Processing, Singapore, 2004.

[31] D. Mumford and J. Shah, "Optimal approximation by piecewise smooth functions and associated variational problems," Commun. Pure Appl. Math., no. 42, pp. 577–685, 1989.

[32] R. Feghali and A. Mitiche, "Fast computation of a boundary preserving estimate of optical flow," in Proc. Brit. Machine Vision Conf., 2000, pp. 212–221.

[33] T. Chan and L. Vese, "An active contour model without edges," in Proc. Int. Conf. Scale-Space Theories in Computer Vision, Corfu, Greece, 1999, pp. 141–151.

[34] A. Mansouri and J. Konrad, "Multiple motion segmentation with level sets," IEEE Trans. Image Process., vol. 12, no. 2, pp. 201–220, Feb. 2003.

[35] A. Mansouri, A. Mitiche, and C. Vazquez, "Image segmentation by multiregion competition," presented at the Reconnaissance de Formes et Intelligence Artificielle Conf., RFIA-04, Toulouse, France, 2004.

[36] C. Vazquez, A. Mitiche, and I. B. Ayed, "Segmentation of vectorial images by a global curve evolution method," presented at the Reconnaissance de Formes et Intelligence Artificielle Conf., RFIA-04, Toulouse, France, 2004.

[37] ——, "Image segmentation as regularized clustering: A fully global curve evolution method," in Proc. Int. Conf. Image Processing, Singapore, 2004, pp. 3467–3470.

[38] S. Zhu and A. Yuille, "Region competition: Unifying snakes, region growing, and Bayes/MDL for multiband image segmentation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 18, no. 9, pp. 884–900, Sep. 1996.

[39] A. Mitiche, R. Feghali, and A. Mansouri, "Motion tracking as spatio-temporal motion boundary detection," J. Robot. Autonom. Syst., no. 43, pp. 39–50, 2003.

[40] E. Dubois, "A projection method to generate anaglyph stereo images," in Proc. ICASSP, vol. III, 2001, pp. 1661–1664.

Hicham Sekkati received the Licence Ès Sciences in physics and the Diplôme d'études approfondies (DEA) in automatic and signal processing from the University Mohammed V, Rabat, Morocco, in 1993 and 1994, respectively, and the M.S. degree in telecommunications from the National Institute of Scientific Research (INRS-EMT), Montreal, QC, Canada, in 2003. He is currently pursuing the Ph.D. degree at INRS-EMT.

His research interests are in computer vision andimage processing.

Amar Mitiche received the Licence Ès Sciences in mathematics from the University of Algiers, Algiers, Algeria, and the Ph.D. degree in computer science from the University of Texas, Austin.

He is currently a Professor in the Department of Telecommunications, Institut National de Recherche Scientifique, Montreal, QC, Canada. His research interests include computer vision, motion analysis in monocular and stereoscopic image sequences (detection, estimation, segmentation, and tracking) with a focus on methods based on level-set PDEs, and written text recognition with a focus on neural network methods.
