combining color and depth for enhanced image segmentation ......

Download Combining color and depth for enhanced image segmentation ... dcor/articles/2012/Combining-color.pdf ·…

Post on 04-Jun-2018




0 download

Embed Size (px)


  • Vis Comput (2012) 28:11811193DOI 10.1007/s00371-011-0667-7

    O R I G I NA L A RT I C L E

    Combining color and depth for enhanced image segmentationand retargeting

    Meir Johnathan Dahan Nir Chen Ariel Shamir Daniel Cohen-Or

    Published online: 29 December 2011 Springer-Verlag 2011

    Abstract As depth cameras become more popular, pixeldepth information becomes easier to obtain. This informa-tion can clearly enhance many image processing applica-tions. However, combining depth and color information isnot straightforward as these two signals can have differ-ent noise characteristics, differences in resolution, and theirboundaries do not generally agree. We present a techniquethat combines depth and color image information from realdevices in synergy. In particular, we focus on combiningthem to improve image segmentation. We use color infor-mation to fill and clean depth and use depth to enhance colorimage segmentation. We demonstrate the utility of the com-bined segmentation for extracting layers and present a novelimage retargeting algorithm for layered images.

    Keywords Image segmentation Retargeting Depth maps

    1 Introduction

    In recent years we have witnessed a rapid advance in thedevelopment of cameras that capture depth. While this ad-vance has been driven by video gaming, simultaneously ac-quiring depth with photometric imaging has an immense po-tential for a wide range of applications. In particular, having

    M.J. Dahan () D. Cohen-OrTel Aviv University, Tel Aviv, Israele-mail:

    N. Chen A. ShamirInterdisciplinary Center Herzilya, Herzilya, Israel

    N. Chene-mail:

    A. Shamire-mail:

    commodity level cameras with depth sensing will consid-erably enhance the performance of image processing anddigital photography applications. However, using data fromsuch cameras may not be straightforward. First, the depthdata spatial resolution does not always match the photo-metric resolution. Second, in many cases the depth imageis incomplete, imperfect, and contains noise. Under theseassumptions, we investigate the potential of combining thetwo modalities to enhance image segmentation, which is thebasis of many image processing applications.

    In image segmentation, boundaries between objects inthe scene must be determined. In many cases, the photo-metric (color) image edges correspond to scene edges, i.e.,to discontinuity in depth. However, there are many imageedges created by illumination (e.g., shadows) or textures thatdo not signify object boundaries. Color segmentation usu-ally finds coherent color regions in an image. Real objectsin a scene such as people or animals can contain a vari-ety of colors, but usually not a wide range of depth values.Hence, using depth information can assist the mapping ofcolor-distinct regions to the same object and distinguishingbetween false image edges and real object boundaries.

    Still, simply using the raw depth information is not suffi-cient. Our key idea is to utilize the information contained inone channel to enhance the other. Therefore, the RGB chan-nel, being more accurate and having higher resolution, can,in turn, assist in enhancing the depth channel by reconstruct-ing missing depth regions and enhancing the correctness ofedges in the depth map.

    Combining the two modalities together naively does notnecessarily amount to unambiguous data from which thesegmentation can be extracted. In this paper, we introducea technique that combines the two modalities while increas-ing the agreement between them. We present an end-to-enddepth-color segmentation that provides better results than

  • 1182 M.J. Dahan et al.

    Fig. 1 Segmentation and retargeting using depth: (from left to right) color image, depth image and our automatic segmentation using color anddepth overlayed, color image retargeted by 20%, 30%, 40%, and 50%

    a direct concatenation of the two (e.g., using mean shiftor clustering). Our technique is initialized by using joint-billateral upsampling of both depth and color and then ap-plying graph cut segmentation based both modalities. Wedemonstrate the technique on depth data which is imperfect,i.e., some regions are missing, there are outliers, and noiseis present. Furthermore, the depth spatial resolution is lowerthan the photometric one, and thus also the registration be-tween the two is imperfect.

    In addition, to demonstrate the use of the combined seg-mented image, we present a novel image retargeting algo-rithm. We show how the augmented segmentation allows aqualitatively better solution to the retargeting problem byexploiting the depth information to partially overlap the lay-ers when size is reduced (see Fig. 1). We demonstrate ourtechnique on both synthetic and real depth images.

    2 Related work

    Depth There are numerous ways for estimating depth inimages, for instance, by using stereo, or multiple views instructure from motion [21]. Zhang et al. [29] introduced amethod of recovering depth from a video sequence and showmany applications; some of the images in this paper wereacquired in this manner. Other works reconstruct depth us-ing video sequences based on the motion between frames[8, 20]. Another possibility to acquire depth is simply to usedepth cameras. Such cameras are becoming a commoditythese days, and their use is widespread. Most examples inthis paper were acquired using a depth camera. However,such depth sensors still have lower resolution than the RGBsensor, and their quality in general is inferior to photometricsensors.

    Upsampling Some previous work attempted to deal withthe lower spatial resolution of one modality versus the otherby upsampling one to match the resolution of the other. Kopfet al. [10] introduced the joint bilateral filter for a pair ofcorresponding images. They upsample a low-resolution im-age while respecting the edges of the corresponding high-resolution one. This technique was then improved by using

    iterations [16, 27]. Schuon et al. [22] combined several low-resolution depth images to produce a super-resolution im-age.

    Except for low spatial resolution, depth imaging is alsosubject to noise, and, due to surface reflection properties,some regions of the surface are completely missing. Thisproblem is often referred to as depth shadow. To addressthis issue, Becker et al. [3] use the color channels taken froma camera with a known location. They rely on the assump-tion that similar geometries possess similar colors and in-corporate a patch-based inpainting technique. In our workwe also complete missing color and depth values. We usetechniques such as [12] for layer completion and reconstructthe depth with a multistep joint bilateral depth upsamplingfilter [16].

    Segmentation Segmentation is one of the fundamentalproblems in image processing [9]. As we discussed in the in-troduction, the performance of segmentation can be greatlyimproved with the assistance of depth images. Recently,Crabb et al. [6] employed depth imaging to extract fore-ground objects in real time. Their segmentation requires auser-defined threshold depth to separate the foreground fromthe background. The color image is used only to support asmall fraction (12%) of the pixels which are not solved bythe depth threshold. Another interesting use of depth is toassist background subtraction in video. In [13] several chan-nels of the depth camera are used and combined with thecolor channel.

    Image retargeting To demonstrate the usefulness of ourmethod, we show how it can be utilized for image retar-geting. Techniques such as seam-carving [1, 19], shiftmaps[15], and non-homogeneous warping methods [11, 24, 26]all rely on the ability to automatically detect important re-gions in an image and preserve them under the resizingoperator. If the image is condensed and does not containsmooth or textured areas, these techniques can cause seri-ous artifacts. Using depth layers can assist in such cases byoverlaying parts of the image while resizing. In [7] repeti-tion is used to segment, layer, and rearrange objects in animage closer to our approach, but no depth input is used.For video, Zhang et al. [28] present a video editing system

  • Combining color and depth for enhanced image segmentation and retargeting 1183

    for re-filming visually appealing effects based on the extrac-tion of depth layers from the video. Recently Mansfield etal. [14] exploited the seam carving technique to cause pre-segmented image objects to occlude each other. However,due to the nature of seam carving, distortions are still intro-duced to the carved background image. We show how theaddition of depth information can assist specifically in suchcases, by allowing different parts of the image to overlapin the resized image based on their depth. We compare ourresults to various other techniques using the recent Retar-getMe benchmark [17].

    3 Color and depth imaging

    Depth cameras provide an image where each pixel containsa measurement of the distance from the camera to the ob-ject visible in that pixel. There are several types of depthdevices such as time-of-flight cameras and projected codedimages. Most depth cameras use IR light for depth acquisi-tion since IR is not visible to the human eye and does notinterfere with the color sensor. The device we use is basedon projected IR Light Coding image and provide a depthresolution of 640 480. It is important to note that con-temporary depth sensors are still rather noisy and there aresignificant regions in the image where depth is not accurateor even missing. Moreover, since the depth and color imagesare created and captured with different sensors, the two im-ages are not perfectly aligned and may have different reso-lutions. Specifically, photometric i