
Proceeding of the 6th International Symposium on Mechatronics and its Applications (ISMA09), Sharjah, UAE, March 24-26, 2009

ISMA09-1

FAST STEREO MATCHING USING TWO STAGE COLOR-BASED SEGMENTATION AND DYNAMIC PROGRAMMING

Mohammadjavad Abdollahifard, Karim Faez, Mohammadreza Pourfard

Amirkabir University of Technology

Department of Electrical Engineering, Tehran, Iran

[email protected]

Amirkabir University of Technology

Department of Electrical Engineering, Tehran, Iran

[email protected]

Amirkabir University of Technology

Department of Electrical Engineering, Tehran, Iran

[email protected]

ABSTRACT

A new method for fast stereo matching is presented in this paper. Our stereo algorithm relies on over-segmenting the source image. Computing match values over entire segments rather than single pixels provides robustness to noise and intensity bias. Color-based segmentation helps to split each image into regions that are likely to contain similar disparities. By employing a dynamic programming technique that applies regularization weights both along and across the scanlines, we solve the typical inter-scanline inconsistency problem. To determine the regularization weight functions adaptively, we propose a second-stage segmentation that assigns small weights between neighboring regions belonging to two different second-stage segments, so that their common boundary can be treated as a disparity jump. Combining over-segmentation and dynamic programming significantly speeds up the stereo matching process while keeping the matching results comparable to the state of the art.

1. INTRODUCTION

Stereo matching is the problem of finding correspondences between two or more input images. It is used in a host of applications such as 3D model creation, robot navigation, parts inspection, and image-based rendering, and it has therefore been studied in the computer vision field for decades. However, some inherently difficult problems remain, for example homogeneously textured regions and occlusions near object boundaries, which make disparity assignment very difficult.

Stereo algorithms consist of three fundamental elements: the representation, the objective function, and the optimization technique. The representation refers to how the images are used to decide depth or disparity: independent pixels, voxels, rectangular local windows, 1D features (lines, contours), or segments. The objective function specifies the weighting of the data fit term relative to the regularization term. Finally, optimization of the objective function can take various forms, such as winner-take-all, dynamic programming [7], graph cuts [15], and belief propagation [8].

Using the reasonable assumption that neighboring pixels with similar colors have similar or continuous depths, some researchers have used image segments to simplify the stereo problem. This has three important effects. First, it reduces the ambiguity associated with textureless regions; a side effect of the assumption is that depth discontinuities tend to occur at color boundaries. Second, dealing with much larger segments reduces the computational complexity. Finally, noise tolerance is enhanced by aggregating over pixels with similar colors. Many techniques have been proposed to break up the image into segments; recent works rely directly on color segmentation [6] or on over-segmented regions [2, 3].

In this paper, the input image is over-segmented using a color similarity criterion. Local matching is then carried out, and its results are passed to a global optimization. This approach has already been adopted in several algorithms [9, 13, 1]. The regularization term of the objective function is weighted by an adaptive coefficient. To determine this weight adaptively, we propose a second-stage segmentation in which neighboring over-segmented regions with similar colors are merged to form larger segments. The regularization weight is larger if two neighboring regions belong to the same second-stage segment. Because of our segmentation algorithm, inter-scanline inconsistency is not a major challenge here and can be handled easily, so we do not use two-pass dynamic programming. Instead, regularization terms in the horizontal, vertical, and diagonal directions are applied to maintain smoothness throughout each second-stage segment.

The speed of the resulting algorithm comes from three key features: the use of over-segmented regions instead of independent pixels, the use of single-pass dynamic programming instead of two-pass dynamic programming or other global optimization approaches, and the use of a small set of candidate disparities in the optimization process instead of testing all possible disparities. As a result, our stereo matching algorithm is fast while achieving accuracy comparable to the state of the art.

2. OVER-SEGMENTATION

As mentioned previously, segmentation helps the stereo matching algorithm by reducing the ambiguity associated with textureless regions, reducing the computational complexity, and enhancing noise tolerance [2, 3, 6]. The segment size must be a trade-off: a segment should contain enough information for matching without compromising the characterization of the true disparity distribution. If a segment is too small, it is difficult to find its correct correspondence unambiguously. On the other hand, segments that cover a complex disparity distribution or straddle more than two objects are often undesirable. Over-segmentation strikes a good balance between providing segments that contain enough information for matching and reducing the risk of a segment spanning multiple objects. It occupies the space between single-pixel matching and standard segmentation approaches.

The goal of over-segmentation is to split each image into regions that are likely to contain similar disparities. In creating these segments, we assume that areas of homogeneous color have smooth disparities. Segmentation is done in a two-step process, as shown in Figure 2. We first smooth the image using an anisotropic technique [2], and then cluster neighboring colors using a simple K-means technique. The smoothing step removes as much image noise as possible in order to create more consistent segments. Our smoothing algorithm iteratively averages (8 times) each pixel with three contiguous neighbors, as shown in Figure 1. The set of pixels used for averaging is the one with the minimum absolute color difference from the center pixel.

Figure 1. Neighboring pixel groups used for averaging.

After smoothing, the image is partitioned into a grid of equally sized segments (8×8). The shape and size of each segment are then refined using an iterative K-means algorithm. Each segment is modeled as a Gaussian in color space with mean $\mu$ and covariance matrix $\Sigma$. Similarly, the spatial extent of the segment is modeled with a Gaussian with mean $\eta$ and covariance matrix $\Delta$. During each K-means iteration, $\mu$, $\eta$, and $\Delta$ are updated. To ensure that segments consist of pixels with roughly constant color, $\Sigma$ is held fixed at some scalar multiple of the image noise.
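The two-step over-segmentation described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: because Figure 1 is not reproduced here, the candidate averaging groups are assumed to be the eight runs of three contiguous pixels on the 8-neighborhood ring; border pixels are left unsmoothed; the color covariance $\Sigma$ is fixed to `sigma_c**2` times the identity; a small `eps` is added to the diagonal of $\Delta$; and each pixel may only join segments seeded in the 3×3 block of grid cells around it. The names `smooth`, `oversegment`, `sigma_c`, and `eps` are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Color = Tuple[float, float, float]
Image = List[List[Color]]

# 8-neighborhood ring, clockwise from the top-left pixel, as (dx, dy).
RING = [(-1, -1), (0, -1), (1, -1), (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0)]


def smooth_once(img: Image) -> Image:
    """Average each interior pixel with the contiguous 3-pixel ring group
    closest to it in color (assumed reading of Figure 1)."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # border pixels are copied unchanged
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            centre = img[y][x]
            best_group, best_cost = None, math.inf
            for start in range(8):
                group = [img[y + RING[(start + t) % 8][1]][x + RING[(start + t) % 8][0]]
                         for t in range(3)]
                cost = sum(abs(a - b) for g in group for a, b in zip(centre, g))
                if cost < best_cost:
                    best_group, best_cost = group, cost
            pixels = [centre] + best_group
            out[y][x] = tuple(sum(p[c] for p in pixels) / 4.0 for c in range(3))
    return out


def smooth(img: Image, iterations: int = 8) -> Image:
    for _ in range(iterations):
        img = smooth_once(img)
    return img


def segment_params(img: Image, label: List[List[int]], n: int, eps: float = 1.0):
    """Per segment: color mean mu, spatial mean eta, inverse and log-det of Delta."""
    acc = [[0.0] * 9 for _ in range(n)]  # count, r, g, b, sx, sy, sxx, sxy, syy
    for y, row in enumerate(label):
        for x, s in enumerate(row):
            a, (r, g, b) = acc[s], img[y][x]
            for i, v in enumerate((1, r, g, b, x, y, x * x, x * y, y * y)):
                a[i] += v
    params: List[Optional[tuple]] = []
    for a in acc:
        m = a[0]
        if m == 0:
            params.append(None)  # an empty segment no longer attracts pixels
            continue
        mu = (a[1] / m, a[2] / m, a[3] / m)
        ex, ey = a[4] / m, a[5] / m
        cxx = a[6] / m - ex * ex + eps  # eps keeps Delta invertible
        cxy = a[7] / m - ex * ey
        cyy = a[8] / m - ey * ey + eps
        det = cxx * cyy - cxy * cxy
        params.append((mu, (ex, ey), (cyy / det, -cxy / det, cxx / det), math.log(det)))
    return params


def oversegment(img: Image, cell: int = 8, iters: int = 5, sigma_c: float = 10.0):
    """Grid-initialized K-means; each pixel joins the segment with the lowest
    Gaussian negative log-likelihood (up to constants)."""
    h, w = len(img), len(img[0])
    gw, gh = -(-w // cell), -(-h // cell)
    label = [[(y // cell) * gw + x // cell for x in range(w)] for y in range(h)]
    for _ in range(iters):
        params = segment_params(img, label, gw * gh)
        for y in range(h):
            for x in range(w):
                gx, gy = x // cell, y // cell
                best, best_d = label[y][x], math.inf
                for sy in range(max(gy - 1, 0), min(gy + 2, gh)):
                    for sx in range(max(gx - 1, 0), min(gx + 2, gw)):
                        p = params[sy * gw + sx]
                        if p is None:
                            continue
                        mu, (ex, ey), (ia, ib, ic), logdet = p
                        dx, dy = x - ex, y - ey
                        dc = sum((c - m) ** 2 for c, m in zip(img[y][x], mu)) / sigma_c ** 2
                        ds = ia * dx * dx + 2 * ib * dx * dy + ic * dy * dy
                        if dc + ds + logdet < best_d:
                            best, best_d = sy * gw + sx, dc + ds + logdet
                label[y][x] = best
    return label
```

Because the averaging group is chosen by color similarity, a sharp color edge survives the eight smoothing passes, which is what lets the subsequent K-means step place segment boundaries on it.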

3. LOCAL MATCHING

In local matching, candidates for the true disparity of each segment are obtained. First, the data fit term is computed for each segment. Then, the $N_c$ best matches are selected as disparity candidates. Finally, post-processing is carried out to prepare the result for the subsequent step, i.e., dynamic programming. For convenience, we assume that the input images are rectified. Correspondences between the input images are then represented by a univalued disparity function $d(x, y)$ with respect to a pixel $(x, y)$ of the reference image. The disparity function can take one of the $D$ integer values within the disparity range of the scene. To compute the data fit term of the objective function, the initial matching cost $C_0(x, y, d)$ is first evaluated for each pixel $(x, y)$ and disparity $d$ by

$$C_0(x, y, d) = \lvert I_r(x, y) - I_m(x - d, y) \rvert, \tag{1}$$

where $I_r$ and $I_m$ denote the reference and matching images.

Unlike the pixel-wise disparity function $d(x, y)$, for an over-segmented image disparities are assigned to segments and are considered constant within each segment. The disparity function can therefore be written as $d(x_s, y_s)$, the disparity of segment $S$ located in the $x_s$-th column and $y_s$-th row of segments. After computing the initial cost, the data fit term for each segment is computed by

$$C(x_s, y_s, d) = \frac{1}{N_S} \sum_{(x, y) \in S} C_0(x, y, d), \tag{2}$$

where $N_S$ is the number of pixels in segment $S$.
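A minimal sketch of Eqs. (1) and (2) follows. It assumes grayscale intensities, the left image as reference with its match at column $x - d$, and clamping of $x - d$ at the left image border so that every pixel of a segment contributes to the average. `segment_data_term` is a hypothetical name, and `labels` holds the over-segmentation label of each pixel.

```python
from typing import List


def segment_data_term(ref: List[List[float]], match: List[List[float]],
                      labels: List[List[int]], num_disp: int) -> List[List[float]]:
    """Return C[s][d]: mean absolute intensity difference of segment s at disparity d."""
    n_seg = 1 + max(max(row) for row in labels)
    total = [[0.0] * num_disp for _ in range(n_seg)]
    size = [0] * n_seg
    for y, row in enumerate(labels):
        for x, s in enumerate(row):
            size[s] += 1
            for d in range(num_disp):
                xm = max(x - d, 0)  # clamp at the left border (assumption)
                total[s][d] += abs(ref[y][x] - match[y][xm])  # Eq. (1)
    # Eq. (2): divide by N_S; label ids that no pixel uses get an infinite cost.
    return [[t / size[s] for t in total[s]] if size[s] else [float("inf")] * num_disp
            for s in range(n_seg)]
```

For example, if the matching row is the reference row shifted left by one pixel, the segment covering columns 1 to 4 gets a zero data term at $d = 1$.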

The data fit terms of each segment are then sorted, and the $N_c$ disparities with the best matching costs are selected as disparity candidates. Using these candidates reduces the computational complexity of dynamic programming by a factor of $D/N_c$, where $D$ is the number of all possible disparities. Note that too small a ratio $N_c/D$ (i.e., $N_c/D < 0.25$) can greatly reduce the matching accuracy. Finally, in the post-processing step, a method is used to handle occlusions [1, 7]. Occlusions result in matching ambiguities. To prevent these ambiguities from contributing to the objective function and the optimization process, the matching costs of all candidate disparities in occluded parts of the image are set to a constant. Let $d^{r}_{\min}$ be the winner-take-all disparity of segment $S$ of the reference image, i.e.

$$d^{r}_{\min} = \arg\min_{d} C(x_s, y_s, d).$$

Figure 2. Segmentation procedure: (a) original image, (b) smoothed image, (c) image broken into a square grid, (d) over-segmentation result, (e) result of second-stage segmentation. Different segments have different colors.


Translating $S$ by the disparity $d^{r}_{\min}$ results in a region $R$ in the matching image that may span more than one segment. For each of these segments $S_m$, a winner-take-all disparity $d^{m}_{\min}$ is computed in the same way. If $d^{m}_{\min} \neq d^{r}_{\min}$ for all $S_m$, then segment $S$ fails the visibility test. For such a segment, the matching costs of all candidate disparities are set to zero, so the segment does not contribute to the cost function or the optimization process.
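Candidate selection and the visibility test can be sketched as below. The failure condition (no segment of $R$ shares the winner-take-all disparity of $S$) is our reading of the test; `match_costs` is assumed to hold the segment data terms of the matching image computed with the roles of the two images swapped, and a segment translated entirely outside the matching image is treated as failing. All function names are hypothetical.

```python
from typing import List, Set


def wta(costs: List[float]) -> int:
    """Winner-take-all disparity: index of the smallest data term."""
    return min(range(len(costs)), key=lambda d: costs[d])


def candidate_disparities(costs: List[float], n_c: int) -> List[int]:
    """The n_c disparities with the lowest data term, best first."""
    return sorted(range(len(costs)), key=lambda d: costs[d])[:n_c]


def visibility_failures(ref_labels: List[List[int]], match_labels: List[List[int]],
                        ref_costs: List[List[float]],
                        match_costs: List[List[float]]) -> Set[int]:
    """Reference segments whose WTA disparity no segment of R confirms."""
    ref_wta = [wta(c) for c in ref_costs]
    match_wta = [wta(c) for c in match_costs]
    landed: List[Set[int]] = [set() for _ in ref_costs]  # region R per segment
    for y, row in enumerate(ref_labels):
        for x, s in enumerate(row):
            xm = x - ref_wta[s]
            if 0 <= xm < len(match_labels[y]):
                landed[s].add(match_labels[y][xm])
    # all() over an empty R is True, so segments leaving the image also fail.
    return {s for s, region in enumerate(landed)
            if all(match_wta[m] != ref_wta[s] for m in region)}


def equalize_occluded(ref_costs: List[List[float]], failed: Set[int]) -> List[List[float]]:
    """Set every cost of a failing segment to zero so it drops out of the optimization."""
    return [[0.0] * len(c) if s in failed else list(c) for s, c in enumerate(ref_costs)]
```

Zeroing the whole cost row leaves the choice for an occluded segment to the regularization terms alone, which is the intent of the post-processing step.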

4. OPTIMIZATION

4.1. Dynamic programming

In this section, dynamic programming is performed using scanline optimization. After local matching, a set of candidate disparities has been selected for each segment, and the dynamic programming process must choose one of these candidates for each segment. The optimization along the scanlines finds a path of disparities that minimizes the following energy functional for scanline $y_s$:

$$\begin{aligned} E\big(d(:, y_s)\big) ={}& \sum_{x_s} C\big(x_s, y_s, d(x_s, y_s)\big) + \sum_{x_s} \lambda_1(x_s, y_s)\, \rho\big(d(x_s, y_s) - d(x_s - 1, y_s)\big) \\ &+ \sum_{x_s} \lambda_2(x_s, y_s)\, \rho\big(d(x_s, y_s) - d(x_s - 1, y_s - 1)\big) + \sum_{x_s} \lambda_3(x_s, y_s)\, \rho\big(d(x_s, y_s) - d(x_s, y_s - 1)\big) \\ &+ \sum_{x_s} \lambda_4(x_s, y_s)\, \rho\big(d(x_s, y_s) - d(x_s + 1, y_s - 1)\big). \end{aligned} \tag{3}$$

Equation (3) is composed of five parts. The first term is the matching cost, evaluated by the procedure described in Section 3. The other four terms are the horizontal, vertical, and two diagonal regularization terms: for each segment, four neighbors are considered as shown in Figure 3, and a regularization term is added for each neighbor. In Equation (3), $\rho$ is an increasing function of the disparity difference between adjacent over-segmented regions that evaluates the smoothness of the disparity function, and $\lambda_1$, $\lambda_2$, $\lambda_3$, and $\lambda_4$ are weight functions. Note that $d(:, y_s - 1)$ was computed during the previous scanline optimization and is considered constant for the current scanline, so only $d(:, y_s)$ needs to be found in order to minimize the energy function. As the function $\rho$, the Potts model [12] has been widely used in pixel-based stereo, since it can handle disparity jumps:

$$\rho(\Delta d) = \begin{cases} 0, & \Delta d = 0 \\ 1, & \Delta d \neq 0 \end{cases} \tag{4}$$

To avoid excessive smoothing in homogeneously textured regions, the modified Potts model is used. The modified Potts model incorporates disparity gradient constraint into the original Potts model, and is written as

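$$\rho(\Delta d) = \begin{cases} \lvert \Delta d \rvert / 3, & \lvert \Delta d \rvert < 3 \\ 1, & \text{otherwise} \end{cases} \tag{5}$$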

In contrast to the original Potts model, which prefers fronto-parallel planes (especially in homogeneous regions), the modified Potts model encourages slanted planes by lowering the cost imposed on one- and two-pixel disparity differences. This helps diminish the influence of the smoothness constraint on slanted surfaces, where the disparities of neighboring pixels commonly vary within a small range.
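For comparison, the two penalties can be written as small Python functions. The modified model is assumed to take the form $\lvert \Delta d \rvert / 3$ for $\lvert \Delta d \rvert < 3$ and 1 otherwise, which matches the description of lowered costs for one- and two-pixel differences; the exact constants are an assumption and the function names are hypothetical.

```python
def potts(delta_d: int) -> float:
    """Eq. (4): every non-zero disparity difference costs the same."""
    return 0.0 if delta_d == 0 else 1.0


def modified_potts(delta_d: int) -> float:
    """Assumed form of Eq. (5): |dd|/3 below three pixels, 1 otherwise."""
    a = abs(delta_d)
    return a / 3.0 if a < 3 else 1.0


if __name__ == "__main__":
    for dd in range(5):
        print(dd, potts(dd), round(modified_potts(dd), 3))
```

Under the Potts model a one-pixel step on a slanted surface costs as much as a large depth jump; the modified version charges it only a third, so gently slanted planes are no longer flattened.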

4.2. Weight functions

To complete the objective function, we must determine the weight functions. In pixel-based stereo, the $\lambda_i$ ($i = 1, 2, 3, 4$) are chosen to be inversely proportional to the intensity gradient, which helps align disparity jumps with intensity edges [1, 7, 11, 15]. For example, for $\lambda_1$ the horizontal intensity gradient computed by a 3×3 horizontal Sobel operator can be used; by applying one or two thresholds to the gradient, $\lambda_1$ is set to one of two or three constants. Although intensity-gradient thresholding is a good way to let disparity jumps occur at intensity edges, it cannot be applied here directly and must be adapted to segment-based stereo. The authors tested several variants of this technique based on the same idea, but they failed to achieve acceptable results. We therefore propose a second-stage segmentation to evaluate the weight functions. Unlike the intensity-gradient technique, where color information is not considered, our method relies on color-based segmentation. Over-segmented regions with similar colors are assigned to a single second-stage segment, so they are encouraged to have small disparity differences. Starting from one segment, the 8-connected neighbors of the current segment are tested. Let the color mean of the initial segment be $\mathbf{c}_s$ and that of a neighboring segment be $\mathbf{c}_n$. If $\lVert \mathbf{c}_s - \mathbf{c}_n \rVert < T$, the two segments are merged and the color mean is updated. The process continues until the second-stage segment stops growing, i.e., until no neighboring segment satisfies the color-similarity condition. Here $T$ is a threshold that imposes color similarity. The weights $\lambda_i$ can now be calculated using the following equation:

Figure 3. Evaluating weight functions by second-stage segmentation.

Table 1. Running time in seconds

                         Tsukuba   Venus    Teddy    Cones
Pixels                   110592    166222   168750   168750
Disparities              15        20       60       60
Candidate disparities    10        10       20       20
Time (s)                 2.8       4.3      11.2     12.1


Table 2. Performance comparison from the Middlebury stereo vision page. Error percentages are computed over three different areas of the image: nonoccluded pixels (nonocc), the entire image (all), and regions near depth discontinuities (disc).

                Tsukuba              Venus                Teddy                Cones
Algorithm       nonocc  all   disc   nonocc  all   disc   nonocc  all   disc   nonocc  all   disc
AdaptingBP      1.11    1.37  5.79   0.1     0.21  1.44   4.22    7.06  11.8   2.48    7.92  7.32
CoopRegion      0.87    1.16  4.61   0.11    0.21  1.54   3.53    8.3   9.63   2.9     8.87  7.79
DoubleBP        0.88    1.29  4.76   0.13    0.45  1.87   3.55    8.71  9.7    2.9     9.24  7.8
OutlierConf     0.88    1.43  4.74   0.18    0.26  2.40   3.45    8.38  10.0   2.93    8.73  7.91
Our Method      1.67    1.91  7.39   0.61    0.63  5.01   6.15    10.8  14.3   3.3     11.9  11.8

Figure 4. Disparity maps produced by our method. Left column: reference images. Center column: disparity maps produced by our method. Right column: ground-truth disparity maps.


$$\lambda_i = \begin{cases} k, & \text{if the two regions belong to a single second-stage segment} \\ 0.25\,k, & \text{otherwise} \end{cases} \tag{6}$$

where $k$ is a constant. Using two different values of $T$, $\lambda_i$ can be turned into a three-valued function. Figure 3 illustrates the $\lambda_i$ for a small region of an image: a large $\lambda$ is shown with thick solid lines, while thin solid lines represent a small weight value.
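The second-stage segmentation and Eq. (6) can be sketched as breadth-first region growing over the adjacency graph of over-segmented regions. Assumptions: color distance is Euclidean in RGB, the running color mean is weighted by pixel counts, and regions are seeded in index order; `second_stage`, `weight`, `means`, `sizes`, and `neighbors` are hypothetical names.

```python
import math
from collections import deque
from typing import List, Sequence


def second_stage(means: Sequence[Sequence[float]], sizes: Sequence[int],
                 neighbors: Sequence[Sequence[int]], T: float) -> List[int]:
    """Merge 8-connected over-segmented regions whose color is within T of the
    growing segment's running mean; return a second-stage label per region."""
    label = [-1] * len(means)
    next_label = 0
    for seed in range(len(means)):
        if label[seed] != -1:
            continue
        label[seed] = next_label
        mean, count = list(means[seed]), sizes[seed]
        queue = deque([seed])
        while queue:
            s = queue.popleft()
            for t in neighbors[s]:
                if label[t] == -1 and math.dist(mean, means[t]) < T:
                    label[t] = next_label
                    # Update the running color mean, weighted by pixel counts.
                    mean = [(m * count + c * sizes[t]) / (count + sizes[t])
                            for m, c in zip(mean, means[t])]
                    count += sizes[t]
                    queue.append(t)
        next_label += 1
    return label


def weight(s: int, t: int, label: Sequence[int], k: float) -> float:
    """Eq. (6): k inside one second-stage segment, 0.25*k across a boundary."""
    return k if label[s] == label[t] else 0.25 * k
```

With $T = 8$ and $k = 0.1$, as in the experiments, two neighboring regions whose colors differ by only a few gray levels share a label and are tied with weight 0.1, while a boundary between dissimilar regions gets 0.025 and becomes a cheap place for a disparity jump.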

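After calculating the $\lambda_i$ from Equation (6), if at least one of the upper over-segmented regions belongs to the same second-stage segment as the current region, then $\lambda_2$, $\lambda_3$, and $\lambda_4$ are refined as follows:

$$\lambda_i \leftarrow \frac{\beta k \, \lambda_i}{\sum_{j=2}^{4} \lambda_j}, \qquad i = 2, 3, 4, \tag{7}$$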

where $\beta$ is a constant close to, but smaller than, one ($\beta < 1$). After the refinement the three upper weights sum to $\beta k$, which gives the horizontal neighbor ($\lambda_1$) a stronger effect relative to the other three neighbors. Hence, if an error occurs during the previous scanline optimization, it is less likely to propagate to the scanlines below.
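A sketch of the refinement in Eq. (7) follows, assuming the weights are passed as `(λ1, λ2, λ3, λ4)` and that `upper_same` flags which of the three upper neighbors share the current region's second-stage segment; both names are hypothetical.

```python
from typing import List, Sequence


def refine_upper_weights(lams: Sequence[float], upper_same: Sequence[bool],
                         k: float, beta: float) -> List[float]:
    """Eq. (7): rescale (l2, l3, l4) so they sum to beta*k when any upper
    neighbor lies in the current region's second-stage segment."""
    lams = list(lams)
    if any(upper_same):
        total = sum(lams[1:])  # always > 0, since every weight is at least 0.25*k
        lams[1:] = [beta * k * lam / total for lam in lams[1:]]
    return lams
```

With $k = 0.1$ and $\beta = 0.8$ as in the experiments, the weights `(0.1, 0.1, 0.025, 0.025)` become roughly `(0.1, 0.053, 0.013, 0.013)`: the upper weights now sum to 0.08, below the horizontal weight of a same-segment neighbor.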

Over-segmented region boundaries themselves tend to lie on object edges, but in some areas the spatial limitations of these segments prevent them from doing so. Second-stage segments, by contrast, can expand without any spatial limitation up to object boundaries. Although not all second-stage segment boundaries lie on object boundaries, the threshold $T$ can be selected so that approximately all object boundaries lie on second-stage segment boundaries. In other words, second-stage segment boundaries are candidates for disparity discontinuities, and the dynamic programming optimization decides whether these candidates coincide with disparity jumps.

After computing the weight functions, the objective function is complete and ready to be used in dynamic programming. Starting from the first row of segments, the objective function is optimized line by line. For the first row, the upper neighbors $(x_s - 1, y_s - 1)$, $(x_s, y_s - 1)$, and $(x_s + 1, y_s - 1)$ do not exist, so the corresponding terms are omitted from the objective function.
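The row-by-row optimization of Eq. (3) can be sketched as a Viterbi-style dynamic program over each segment's candidate disparities. Because the previous row is fixed, the three upper terms act as unary costs and only the horizontal term couples neighboring states, so each row costs $O(W N_c^2)$ for $W$ segments per row. The sketch assumes every row has the same number of segments (grid indexing) and uses the modified Potts form $\lvert \Delta d \rvert / 3$ capped at 1; all names are hypothetical.

```python
from typing import List, Optional, Tuple

Weights = Tuple[float, float, float, float]  # (l1 left, l2 upper-left, l3 up, l4 upper-right)


def rho(delta_d: int) -> float:
    """Modified Potts penalty (assumed form): |dd|/3 below three pixels, 1 otherwise."""
    a = abs(delta_d)
    return a / 3.0 if a < 3 else 1.0


def optimize_scanline(cands: List[List[int]], costs: List[List[float]],
                      lam: List[Weights], prev: Optional[List[int]]) -> List[int]:
    """Minimize Eq. (3) for one row; cands[x][j] is the j-th candidate of segment x,
    costs[x][j] its data term, prev the fixed disparities of the row above."""
    w = len(cands)

    def unary(x: int, j: int) -> float:
        # Data term plus upper-neighbor terms (unary because prev is fixed).
        e = costs[x][j]
        if prev is not None:
            for weight, dx in ((lam[x][1], -1), (lam[x][2], 0), (lam[x][3], 1)):
                if 0 <= x + dx < w:
                    e += weight * rho(cands[x][j] - prev[x + dx])
        return e

    acc = [unary(0, j) for j in range(len(cands[0]))]
    back: List[List[int]] = [[0] * len(cands[0])]
    for x in range(1, w):
        new_acc, ptr = [], []
        for j, d in enumerate(cands[x]):
            # Horizontal term between candidate i of segment x-1 and candidate j of x.
            trans = [acc[i] + lam[x][0] * rho(d - cands[x - 1][i]) for i in range(len(acc))]
            i_best = min(range(len(trans)), key=trans.__getitem__)
            new_acc.append(trans[i_best] + unary(x, j))
            ptr.append(i_best)
        acc = new_acc
        back.append(ptr)
    j = min(range(len(acc)), key=acc.__getitem__)  # cheapest final state
    out = [0] * w
    for x in range(w - 1, -1, -1):  # backtrack
        out[x] = cands[x][j]
        j = back[x][j]
    return out


def optimize_image(cands, costs, lam) -> List[List[int]]:
    """Process rows of segments top to bottom, feeding each result to the next row."""
    disp: List[List[int]] = []
    prev = None
    for y in range(len(cands)):
        prev = optimize_scanline(cands[y], costs[y], lam[y], prev)
        disp.append(prev)
    return disp
```

In a three-segment row where the middle segment only weakly prefers a different candidate, a horizontal weight of 0.5 makes the row settle on a single disparity, whereas with zero weight each segment keeps its own winner-take-all choice.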

5. EXPERIMENTAL RESULTS

We evaluated the proposed algorithm on four standard data sets, Tsukuba, Venus, Teddy, and Cones, provided by Scharstein and Szeliski on the web [17]. The quality metric is the percentage of pixels whose disparity deviates from the ground truth by more than one pixel.
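The error metric can be computed as below. The masks selecting the nonocc, all, and disc regions are assumed to be supplied externally; `bad_pixel_percentage` is a hypothetical helper, not part of any benchmark API.

```python
from typing import List, Optional


def bad_pixel_percentage(disp: List[List[float]], gt: List[List[float]],
                         mask: Optional[List[List[bool]]] = None,
                         threshold: float = 1.0) -> float:
    """Percentage of evaluated pixels whose disparity error exceeds `threshold`."""
    bad = total = 0
    for y, row in enumerate(disp):
        for x, d in enumerate(row):
            if mask is not None and not mask[y][x]:
                continue  # pixel outside the evaluated area (e.g. occluded for "nonocc")
            total += 1
            if abs(d - gt[y][x]) > threshold:
                bad += 1
    return 100.0 * bad / total if total else 0.0
```

An error of exactly one pixel is not counted as bad, matching the "more than one pixel" criterion.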

As discussed, we used a 3×3 anisotropic filter for image smoothing, and the image was then broken into a grid of 8×8 segments. In the experiments, the threshold $T$ for second-stage segmentation was set to 8. For the weight functions, the parameters $k$ and $\beta$ were set to 0.1 and 0.8, respectively. The number of candidate disparities $N_c$ was set to 10, 10, 20, and 20 for the four images. All other parameters were fixed for the four data sets.

Running Time. Table 1 reports the running time of our optimization algorithm on a Pentium IV 2.8 GHz PC. Comparing our results with those of other fast algorithms [7, 16] shows that our algorithm is fast while keeping the matching accuracy comparable to the state of the art.

Matching Accuracy. Figure 4 shows the final disparity maps computed using two-stage segmentation and dynamic programming. The overall evaluation and comparison are presented in Table 2. The first four rows of the table are among the most accurate methods on the Middlebury stereo page. Our method achieves accuracy comparable to these methods and, considering its computation-time benefits, is a promising method for fast stereo matching.

6. CONCLUSION

In this paper we have developed a stereo matching technique that over-segments the input image and uses dynamic programming to perform the optimization. Adaptive weights for the regularization terms of the cost function are obtained using a second-stage color-based segmentation. To evaluate the effectiveness of our algorithm, we used the Middlebury benchmark. Experimental results show that our algorithm performs comparably to state-of-the-art stereo matching algorithms. In addition, the algorithm is fast because it uses segments, candidate disparities, and dynamic programming, which is itself a fast optimization technique.

7. ACKNOWLEDGMENT

The authors would like to thank the Iran Telecommunication Research Center (ITRC) for its financial support of the project, as well as Daniel Scharstein and Richard Szeliski of Middlebury College for their stereo datasets.

8. REFERENCES

[1] A.F. Bobick and S.S. Intille, "Large occlusion stereo," in International Journal of Computer Vision, 1999, pp. 181-200.
[2] C.L. Zitnick, S.B. Kang, M. Uyttendaele, S. Winder, and R. Szeliski, "High-quality video view interpolation using a layered representation," in Proceedings of SIGGRAPH (ACM Transactions on Graphics), 2004, pp. 600-608.
[3] C.L. Zitnick and S.B. Kang, "Stereo for image-based rendering using image over-segmentation," in International Journal of Computer Vision, Springer, 2007, pp. 49-65.
[4] D. Scharstein and R. Szeliski, "A taxonomy and evaluation of dense two-frame stereo correspondence algorithms," in International Journal of Computer Vision, April-June 2002, pp. 7-42.
[5] D. Scharstein and R. Szeliski, "High-accuracy stereo depth maps using structured light," in Proc. Conf. on Computer Vision and Pattern Recognition, 2003, vol. 1, pp. 195-202.
[6] H. Tao, H.S. Sawhney, and R. Kumar, "A global matching framework for stereo computation," in International Conference on Computer Vision, 2001, vol. 1, pp. 532-539.
[7] J.C. Kim, K.M. Lee, B.T. Choi, and S.U. Lee, "A dense stereo matching using two-pass dynamic programming with generalized ground control points," in CVPR, 2005.


[8] J. Sun, N.N. Zheng, and H.Y. Shum, "Stereo matching using belief propagation," in IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003, pp. 787-800.
[9] M. Agrawal and L. Davis, "Window-based, discontinuity preserving stereo," in Proc. Conf. on Computer Vision and Pattern Recognition, 2004, vol. 1, pp. 66-73.
[10] M. Bleyer and M. Gelautz, "A layered stereo algorithm using image segmentation and global visibility constraints," in ICIP, 2004, pp. 2997-3000.
[11] P. Fua, "A parallel stereo algorithm that produces dense depth maps and preserves image features," in Machine Vision and Applications, 1993, pp. 35-49.
[12] R.B. Potts, "Some generalized order-disorder transformations," in Proc. Camb. Phil. Soc., 1952, vol. 48, pp. 106-109.
[13] S.B. Kang and R. Szeliski, "Extracting view-dependent depth maps from a collection of images," in International Journal of Computer Vision, July 2004, pp. 139-163.
[14] S. Huq, B. Abidi, and M. Abidi, "Stereo-based 3D face modeling using annealing in local energy minimization," in 14th International Conference on Image Analysis and Processing (ICIAP), IEEE, 2007, pp. 265-272.
[15] Y. Boykov, O. Veksler, and R. Zabih, "Fast approximate energy minimization via graph cuts," in IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 23, no. 11, 2001, pp. 1222-1239.
[16] Y. Wei and L. Quan, "Region-based progressive stereo matching," in Proc. Conf. on Computer Vision and Pattern Recognition, 2004, vol. 1, pp. 106-113.
[17] http://www.middlebury.edu/stereo/.