
Expert Systems with Applications xxx (2013) xxx–xxx


Automatic expert system for 3D terrain reconstruction based on stereo vision and histogram matching

0957-4174/$ - see front matter © 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.eswa.2013.09.003

⇑ Corresponding author. Tel.: +34 685850240. E-mail address: [email protected] (R. Correal).

Please cite this article in press as: Correal, R., et al. Automatic expert system for 3D terrain reconstruction based on stereo vision and histogram matching. Expert Systems with Applications (2013), http://dx.doi.org/10.1016/j.eswa.2013.09.003

R. Correal a,⇑, G. Pajares a, J.J. Ruz b

a University Complutense of Madrid, Software Engineering and Artificial Intelligence Dept., C/ Profesor José García Santesmases, s/n, 28040 Madrid, Spain
b University Complutense of Madrid, Computers Architecture and Automation Dept., C/ Profesor José García Santesmases, s/n, 28040 Madrid, Spain

Article info

Keywords: Expert system; Terrain reconstruction; Histogram matching; Stereo vision; Image processing

Abstract

This paper proposes an automatic expert system for 3D terrain reconstruction and automatic intensity correction in stereo pairs of images based on histogram matching. Different applications in robotics, particularly those based on autonomous navigation in rough and natural environments, require a high-quality reconstruction of the surface. The stereo vision system is designed with a defined geometry and installed onboard a mobile robot, together with other sensors such as an Inertial Measurement Unit (IMU), necessary for sensor fusion. It is generally assumed that the intensities of corresponding points in the two images of a stereo pair are equal. However, this assumption is often false, even when the images are acquired by a vision system composed of two identical cameras. We have also found this issue in our dataset. Because of these undesired effects the stereo matching process is significantly affected, as many correspondence algorithms are very sensitive to such deviations in the brightness pattern, resulting in an inaccurate terrain reconstruction. The proposed expert system exploits human knowledge, which is mapped into three modules based on image processing techniques. The first one corrects the intensities of the stereo pair coordinately, adjusting one image as a function of the other. The second one computes disparity, obtaining a set of correspondences. The last one computes a reconstruction of the terrain by reprojecting the computed points to 3D and applying a series of geometrical transformations. The performance of this method is verified favorably.

© 2013 Elsevier Ltd. All rights reserved.

1. Introduction

1.1. Problem statement

Machine vision is an excellent sensor, widely used for a multitude of different applications. In the domain of robotic autonomous navigation in natural, rough terrain and of planetary rovers for space exploration, one of the most important features is 3D perception and terrain reconstruction for path planning and navigation. Stereoscopic vision is a mechanism to obtain depth or range data based on images. It consists of two cameras separated by a given distance to obtain two differing views of a scene, similar to human binocular vision. By comparing both images, relative depth information is obtained in the form of disparities, which are inversely proportional to the distance to objects. A matching process computes the difference in position of a set of features or pixels from one image relative to the other. Provided that the position of the centers of projection, the focal length and the orientation of the optical axes are known, the depth can be established by triangulation of the disparities obtained from the matching process. It is then reprojected to 3D space using perspective transformations, obtaining a set of world coordinates.

Different methods and strategies for 3D environment reconstruction using stereo vision have been applied in different works (Bakambu, Allard, & Dupuis, 2006; Goldberg, Maimone, & Matthies, 2002; Lin & Zhou, 2009; Morisset, 2009; Song et al., 2012; Xing-zhe, 2010). While most existing strategies focus on the problem of computing disparities and on the matching process, there is little work devoted to the correction and validation of the input images, beyond vertical alignment and rectification (Papadimitriou & Dennis, 1996; Kang & Ho, 2012). The constant image brightness (CIB) assumption states that the intensities of corresponding points in the two images of a stereo pair are equal. This assumption is central to much work in computer vision. However, surprisingly little work has been performed to support this assumption, despite the fact that many of the algorithms are very sensitive to deviations from CIB. In Cox and Hingorani (1995), an examination of the 49 image pairs contained in the SRI JISCT stereo database (Bolles, Baker, & Hannah, 1993), a dataset that includes images provided by research groups at JPL, INRIA, SRI, CMU, and Teleos, revealed that the constant image brightness assumption is indeed often false. We have also found the same issue in our set of stereo images, affecting the matching process necessary for terrain reconstruction.


Fig. 1. Stereo pair of a natural terrain.


The problem of correspondence in real (non-simulated) stereoscopic systems stems from the fact that the images from the two cameras, although similar, show different intensity levels for the same physical entity in the 3D scene. An important reason for this lies in the different response of the camera sensors to the light signal from the scene, and also in the different mapping of the scene onto each image due to the different camera locations. This makes it necessary to devote a major research effort to correcting these deviations, typical of all natural and real stereo systems, as this problem is not yet satisfactorily solved, particularly in unstructured and uncontrolled environments. Fig. 1 shows an example of a stereo pair taken with the onboard navigation camera, a Videre STH-DCSG-9 color stereoscopic system with parallel optical axes separated 9 cm, 640 × 480 pixels resolution and 3 mm miniature lenses. It can be appreciated that both images show slightly different color tones, although they were captured simultaneously by two supposedly identical cameras (the histograms of these images are shown in later sections, where the differences can be observed more easily).

Most matching algorithms are based on the minimization of differences, using correlation between brightness (intensity) patterns in the local neighborhood of a pixel in one image with respect to the other (Baker, 1982). These differences in illumination may lead to a poor and inaccurate computation of disparities, and therefore to an incorrect terrain reconstruction.
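To illustrate why such matching is sensitive to brightness deviations, the following minimal sketch compares windows along a single scanline. The sum of absolute differences (SAD) is assumed here as one common difference measure; it is not the algorithm used later in the paper, and the scanline values and function names are hypothetical.

```python
def sad(left, right, xl, xr, half):
    """Sum of absolute differences between two 1-D windows of radius `half`."""
    return sum(abs(left[xl + k] - right[xr + k]) for k in range(-half, half + 1))


def best_disparity(left, right, x, half=1, max_disp=4):
    """Return (disparity, cost) minimizing SAD for the left pixel at column x."""
    costs = {}
    for d in range(max_disp + 1):
        if x - d - half < 0:  # window would fall outside the right scanline
            break
        costs[d] = sad(left, right, x, x - d, half)
    best = min(costs, key=costs.get)
    return best, costs[best]


# Hypothetical scanline: the same bright feature appears 2 pixels apart.
left = [10, 10, 10, 80, 200, 80, 10, 10, 10, 10]
right = left[2:] + [10, 10]
# Same right scanline as seen by a sensor with half the response.
dimmed = [v // 2 for v in right]

print(best_disparity(left, right, 4))   # identical intensities
print(best_disparity(left, dimmed, 4))  # mismatched intensities
```

With identical intensities the correct window matches with zero cost; with the dimmed scanline the best cost is far from zero and its margin over the competing candidates shrinks, which is how intensity differences can degrade matching in textured real scenes.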

We propose a new automatic method based on several sequential stages for 3D terrain reconstruction from stereoscopic data, applied to robotic navigation in natural terrain, where the initial phase, the correction of images, is based on the application of human expert knowledge. This leads to the design of the proposed expert system, which has an important advantage over other approaches: no training is required, and it can be applied directly to the stereo pair under processing, independently of other images. The design of this automatic expert system is the main contribution of this paper.

1.2. Review of methods

Several strategies have been proposed to correct intensity values (brightness) in images. Some of the most commonly used techniques, which have also been employed in stereo vision applications, are discussed next:

(1) Homomorphic filtering (Correal, Pajares, & Ruz, 2013; Gonzalez & Woods, 2008; Pajares & de la Cruz, 2007) is a procedure based on the fact that each image is formed by the concurrence of two components: reflectance (r) and illumination (i). The illumination component comes from the light conditions in the scene when the image is captured and changes as the light conditions change. The reflectance component depends on how the objects in the scene reflect light, which is determined by the intrinsic properties of the objects themselves, which (usually) do not change. In many practical applications it is useful to enhance the reflectance component while the illumination is reduced. Homomorphic filtering is a filtering process in the frequency domain that compresses the brightness due to the lighting conditions while enhancing the contrast due to the reflectance properties of the objects. This approach is based on the fact that an image $f(x, y)$ can be expressed in terms of its illumination $i(x, y)$ and reflectance $r(x, y)$ components by the relationship $f(x, y) = i(x, y)\, r(x, y)$. Illumination is associated with low frequencies of the Fourier transform and reflectance with high frequencies. The above equation cannot be used directly to operate separately on the illumination and reflectance components because the Fourier transform of the product of two functions is not separable; however, if we define $z(x, y) = \ln f(x, y) = \ln i(x, y) + \ln r(x, y)$, then $\mathcal{F}\{z(x, y)\} = \mathcal{F}\{\ln f(x, y)\} = \mathcal{F}\{\ln i(x, y)\} + \mathcal{F}\{\ln r(x, y)\}$, or $Z(u, v) = I(u, v) + R(u, v)$, where $I(u, v)$ and $R(u, v)$ are the Fourier transforms of $\ln i(x, y)$ and $\ln r(x, y)$ respectively. If we process $Z(u, v)$ with a filter function $H(u, v)$, we obtain $S(u, v) = H(u, v)\, Z(u, v) = H(u, v)\, I(u, v) + H(u, v)\, R(u, v)$, where $S(u, v)$ is the Fourier transform of the result. In the spatial domain, $s(x, y) = \mathcal{F}^{-1}\{S(u, v)\} = \mathcal{F}^{-1}\{H(u, v)\, I(u, v)\} + \mathcal{F}^{-1}\{H(u, v)\, R(u, v)\} = i'(x, y) + r'(x, y)$. Finally, as $z(x, y)$ is obtained by taking the logarithm of the original image $f(x, y)$, the reverse operation provides the desired enhanced image $g(x, y)$, namely $g(x, y) = \exp[s(x, y)] = \exp[i'(x, y)]\, \exp[r'(x, y)] = i_0(x, y)\, r_0(x, y)$, where $i_0(x, y)$ and $r_0(x, y)$ are the illumination and reflectance components of the output image. Further details of the application of this process to stereo images and results are described in Correal et al. (2013).

(2) Histogram equalization is a method used in image processing for contrast adjustment using the image's histogram (Laia, Chunga, Chena, Lina, & Wangb, 2012; Pajares & de la Cruz, 2007). This method usually increases the global contrast, especially when the usable data of the image is represented by close contrast values. Through this adjustment, the intensities can be better distributed on the histogram. This allows areas of lower local contrast to gain a higher contrast. Histogram equalization accomplishes this by effectively spreading out the most frequent intensity values. Consider a discrete grayscale image $\{x\}$ and let $n_i$ be the number of occurrences of gray level $i$. The probability of an occurrence of a pixel of level $i$ in the image is $p_x(i) = p(x = i) = n_i / n$, $0 \le i < L$, $L$ being the total number of gray levels in the image, $n$ being the total number of pixels in the image, and $p_x(i)$ being in fact the image's histogram for pixel value $i$, normalized to [0, 1]. The cumulative distribution function corresponding to $p_x$ is $cdf_x(i) = \sum_{j=0}^{i} p_x(j)$, which is also the image's accumulated normalized histogram. Then, a transformation of the form $y = T(x)$ produces a new image $\{y\}$ such that its CDF is linearized across the value range, i.e. $cdf_y(i) = iK$ for some constant $K$. The properties of the CDF allow such a transform to be performed; it is defined as $y = T(x) = cdf_x(x)$. The function $T$ maps the levels into the range [0, 1]. In order to map the values back into their original range, the following transformation needs to be applied to the result: $y' = y\,(\max\{x\} - \min\{x\}) + \min\{x\}$. This method can also be used on color images by applying it separately to the Red, Green and Blue components of the RGB color values of the image. Different histogram equalization methods have been applied in stereo vision (Cox & Hingorani, 1995; Liling, Yuhui, Quansen, & Deshen, 2012; Nalpantidis & Gasteratos, 2010; Zhang, Lafruit, Lauwereins, & Van Gool, 2010).

(3) In Kawai and Tomita (1998) the authors propose a method to calibrate intensity for images based on segment correspondence. First, the edges are detected in each image. Each section is defined as a segment by dividing the edges using some characteristic points such as turning, wiggle, inflection, transition, etc. This data is then converted into a boundary representation. The intensity information consists of the intensity value and a derivative at a point, chosen as the point in the neighborhood, along the direction of the normal in the region, where the derivative is smallest. The segment divides regions with different intensities. The correspondence of the segments between $I_1$, the reference image, and an image to be corrected, $I_2$, is obtained using a segment-based stereo method, finding similar boundaries in $I_1$ and $I_2$. Next, the intensity correspondences between the images are found, as the correspondence between the points is obtained from the segment correspondences. The intensity calibration equation between the images $(I_1, I_2)$ is derived from the distribution. If the correspondence is correct, the points are distributed on a straight line, and a straight line is fitted using the equation $I_2^{(0)} = a\, I_1 + b$, where $a$ and $b$ are coefficients and $I_i^{(n)}$ is the image after the $n$th iteration. The image $I_2^{(0)}$ is calibrated based on this equation, and a refined image $I_2^{(1)}$ is calculated. The process is repeated using the subsequently refined image, until $I_2^{(n)} \approx I_1$.

(4) In Cruz, Pajares, and Aranda (1995), Cruz, Pajares, Aranda, and Vindel (1995) and Pajares, Cruz, and Aranda (1998) the authors have applied different learning-based strategies, including Bayesian or neural networks (perceptron, self-organizing feature maps), to learn the differences between the two images of the stereoscopic pair. The main idea was to estimate appropriate parameters that allow the intensities to be corrected.
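As an illustration of the histogram equalization transform described in point (2), the following minimal sketch applies $y = T(x) = cdf_x(x)$ and the rescaling $y' = y\,(\max\{x\} - \min\{x\}) + \min\{x\}$ to a flat list of gray levels. It uses only the standard library; the function name, the flat-list image representation and the final rounding to integers are assumptions made for this example.

```python
def equalize(pixels, levels=256):
    """Histogram-equalize a flat list of integer gray levels in [0, levels)."""
    n = len(pixels)

    # n_i: number of occurrences of each gray level i
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1

    # cdf_x(i) = sum_{j <= i} p_x(j), with p_x(j) = n_j / n
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running / n)

    lo, hi = min(pixels), max(pixels)
    # y = cdf_x(x) lies in [0, 1]; y' = y (max - min) + min maps it back
    # to the original range (rounded to the nearest integer level here).
    return [round(cdf[v] * (hi - lo) + lo) for v in pixels]


print(equalize([0, 1, 2, 4], levels=8))  # [1, 2, 3, 4]
```

For a color image the same function would be called once per channel, as the text describes.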

1.3. Motivational research of the proposed strategy

The methods described in points (1)–(3) are intended to correct and adjust brightness in images. However, the main drawback of these approaches is that they were originally designed for single images; even in stereo vision scenarios, they are typically applied separately, processing each image of the pair independently of the other. Yet, as introduced before, most matching algorithms are based on correlation between brightness patterns in the local neighborhood of each pixel to find its correspondence in the other image. The methods described in (4) are based on learning strategies, where a set of sample patterns is always required. This represents an important drawback because one cannot be sure that all scenarios, and therefore all relevant information, have been incorporated during the training phase. This could be especially critical when unknown scenarios appear for the first time and the system has not been trained for them.

Based on the above, when working on stereo vision applications, we need a method to correct these deviations and problems derived from the illumination conditions by adjusting the pair of images coordinately, or one as a function of the other, to compensate for those differences typical of all stereoscopic systems. Thus, the idea is to apply an automatic image correction strategy for 3D terrain reconstruction, similar to the one a human expert would apply to a similar problematic situation. Therefore, this knowledge is mapped into our design following the logical strategy the human expert applies. In this scheme the problem of different illumination patterns receives special attention from the point of view of human reasoning. To do that, the histograms of both images of the stereo pair are obtained and compared; they are then automatically matched to adjust their intensity levels, either channel by channel in RGB images or by converting the images to grayscale and then matching their histograms, prior to the stereo matching, reprojection and terrain reconstruction processes. This reasoning or knowledge, based on several stages, is the kernel of the proposed expert system. Although the main stage is the one concerning the histogram matching, the other stages are conveniently linked to form the body of the proposed reasoning. Each stage is designed for a given purpose, and specific image processes are applied to achieve the goal of each stage.

This paper is organized as follows. In Section 2 we explain the design of the proposed automatic expert system with its stages and the associated image procedures. In Section 3 the performance of the proposed strategy is evaluated, and finally in Section 4 the most relevant conclusions are drawn.

2. Expert system design

2.1. Reasoning for knowledge extraction

As mentioned before, based on logical human reasoning, the proposed expert system is designed according to the modular architecture displayed in Fig. 2. It contains three stages, which are sequentially linked to form the expert system as a whole. Each stage contains the required automatic image and data processing modules.

Fig. 2. Expert system architecture.




(1) Image processing: performs the automatic processes necessary to adjust intensities in the stereo pair of images coordinately.

(2) Stereo matching: the process of identifying features in both images and searching for correspondences, including any subsequent process to filter out potential mismatches or errors.

(3) 3D terrain reconstruction: once the list of correspondences has been computed, the data is reprojected from the 2D image plane to 3D space in order to build a Digital Elevation Map (DEM). That data is fused with information coming from other sensors, like an Inertial Measurement Unit (IMU), to obtain the robot pose, and a series of mathematical 3D transformations are executed to transform the data from the camera reference system to the robot and world reference systems.

2.2. Automatic image processing modules

Following the three previous stages, at each stage a sequence of image processing techniques is applied automatically; they are outlined in Fig. 2, conveniently grouped and linked. In this paper, emphasis is placed mainly on the first stage, where the images are automatically corrected, one as a function of the other, as a human expert would do.

(1) Image processing: Performs the automatic processes necessary to adjust intensities in the stereo pair of images coordinately. This process is guided by the intuitive human criterion that two images of the same scene are more alike the more similar their corresponding spectral values are. Comparing the intensity histograms of the stereo images, we have found that corresponding pairs of histograms can vary significantly. As introduced before, there are several techniques for histogram equalization, typically applied to single images (Acharya & Ray, 2005; Pajares & de la Cruz, 2007), even in stereo vision applications (Cox & Hingorani, 1995; Nalpantidis & Gasteratos, 2010; Liling et al., 2012; Zhang et al., 2010). In this work, we have implemented an automatic histogram matching procedure. This technique enhances the contrast of images using cumulative distribution functions to transform the values in an intensity image, or the values in the colormap of an indexed image, so that the histogram of the output image approximately matches a reference; in this case it uses the histogram of one of the images in the stereo pair to adjust the other, approximating both illumination components.

The algorithm chooses the grayscale transformation $T$ of the reference histogram to minimize:

$$\min \left| c_1(T(k)) - c_0(k) \right| \tag{1}$$

where $c_0$ is the cumulative histogram of the image to be adjusted and $c_1$ is the cumulative sum of the reference histogram for all intensities $k$. This minimization is subject to the constraints that $T$ must be monotonic and $c_1(T(a))$ cannot overshoot $c_0(a)$ by more than half the distance between the histogram counts at $a$. The procedure uses the transformation $b = T(a)$ to map the gray levels in the image to be adjusted (or its colormap) to their new values.

In the case of our stereo pair of images, A and B, this process is performed from the histograms $h_A$ and $h_B$, respectively, from which the cumulative probability values for each gray level $k_a$ and $k_b$ of the respective images A and B are obtained as follows:

$$P(k_a) = \sum_{i=0}^{k_a} p(k_i), \quad \text{and} \quad P(k_b) = \sum_{i=0}^{k_b} p(k_i) \tag{2}$$

The matching procedure is to find, for each value $P(k_b)$ associated with the intensity level $k_b$, the closest $P(k_a)$, so that the value $k_b$ can be replaced by $k_a$ in the image. After this exchange of intensity levels a new transformed image B' is obtained.
(2) Stereo matching: Once the intensities of the input images have been adjusted, the common way to determine depth with two stereo cameras is by calculating disparity. This process is similar to human binocular vision and our intuitive perception of depth, where the farther the objects are in the scene, the less their position changes when closing our eyes alternately. A similar principle applies in stereo vision: objects lying farther away have a correspondingly smaller difference, or disparity, between the images of the stereo pair. Disparity is defined as the difference, between the left image and the right image, of the 2D coordinates of corresponding points in image space. There are two main classes of disparity computation processes: feature-based and area-based methods (Ozanian, 1995). Feature-based methods use sets of pixels with similar attributes, either pixels belonging to edges (Grimson, 1985) or the corresponding edges themselves (Ayache & Faverjon, 1987), leading to a sparse depth map used mainly for object segmentation and recognition. Area-based stereo techniques use pixel-by-pixel correlation between brightness (intensity) patterns in the local neighborhood of a pixel in one image with respect to the other (Baker, 1982); that is the reason why it is so important to coordinately adjust the intensities in both images of the pair prior to the matching process.

As the application of our work is terrain reconstruction for path planning and robotic navigation, the objective is to obtain the highest possible number of disparities. Therefore, our matching process employs an area-based method, where the number of possible matches is intrinsically high; concretely, we implement the SGBM algorithm (Hirschmüller, 2005), a dense hybrid approach that divides the image into windows or blocks of pixels and looks for matches trying to minimize a global energy function, based on the content of the window, instead of looking for similarities between individual pixels. Again, this represents the mapping of the expert knowledge into the proposed expert system. From the point of view of the computational approach, it includes two parameters that control the penalty for disparity changes between neighboring pixels, regulating the "softness" of the changes in disparity. It is currently one of the best-performing and most efficient algorithms for disparity computation. Further details on the application of this algorithm to terrain images, as well as post-processing methods to filter out mismatched correspondences, can be found in Correal et al. (2013).

(3) 3D terrain reconstruction: After disparities have been computed in the previous stage, and following the previously discussed human intuitive perception of depth, a human estimates the size and location of an object in the world with respect to his/her point of view according to the position of the object in the scene as perceived by each eye. This is carried out by association in the brain; unlike the human approach, the proposed machine vision system applies a mathematical triangulation method to find distances. The human does not compute distances but only locations. Moreover, we are only dealing with parallel optical axes, whereas humans have the ability to apply convergence. Regarding this, some stereo vision matching systems have applied vergence (Krotkov, 1989). Nevertheless, the main effort in machine vision has been put into parallel systems, which apply the concept that humans can also see in the far distance. This is the approach proposed in our expert system.
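The histogram matching of stage (1), as expressed by Eq. (2), can be sketched as follows: for each level $k_b$ of image B, the level $k_a$ of reference image A whose cumulative probability is closest to $P(k_b)$ replaces it. This is a simplified nearest-value lookup written for illustration with the standard library only; it does not enforce the additional constraints attached to Eq. (1), and the function names and flat-list image representation are assumptions.

```python
def cumulative(pixels, levels=256):
    """Cumulative normalized histogram P(k) of a flat list of gray levels."""
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    out, running = [], 0
    for count in hist:
        running += count
        out.append(running / len(pixels))
    return out


def match_histogram(image_b, image_a, levels=256):
    """Return B': image_b remapped so that its histogram approximates image_a's."""
    p_a = cumulative(image_a, levels)
    p_b = cumulative(image_b, levels)
    # For each level kb, pick the level ka whose P(ka) is closest to P(kb).
    # On ties, min() keeps the lowest ka, i.e. the first level reaching that value.
    lut = [min(range(levels), key=lambda ka: abs(p_a[ka] - p_b[kb]))
           for kb in range(levels)]
    return [lut[v] for v in image_b]


reference = [10, 10, 20, 30]   # image A (hypothetical gray levels)
to_adjust = [12, 12, 25, 40]   # image B: same structure, different levels
print(match_histogram(to_adjust, reference, levels=64))  # [10, 10, 20, 30]
```

For RGB images the same function would be applied to each channel separately, using the corresponding channel of the reference image.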

To reproject the matched 2D points to 3D, it is necessary to correctly match a point of the environment, seen in both stereo images, with pixel coordinates (x', y') in the first image and (x, y) in the second. Knowing the camera parameters, namely the focal length (f), the camera baseline (b) and the pixel dimension (c), see Fig. 3, the point's coordinates in the camera reference system can be computed as (X', Y', Z) for the first camera and (X, Y, Z) for the second. How far away the matched point is (depth Z) can be derived from the following expressions:

$$c \cdot x' = f \frac{X'}{Z}, \qquad c \cdot x = f \frac{X}{Z} \tag{3}$$

$$Z = \frac{f\,(X' - X)}{c\,(x' - x)} \tag{4}$$

Since the two camera frames differ only by the baseline along the X axis, $X' - X$ equals the baseline $b$, so the depth is inversely proportional to the disparity $x' - x$.
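As a worked example of Eq. (4), the sketch below computes depth from a pair of matched column coordinates, taking the difference X' − X to be the baseline b. The focal length (3 mm) and baseline (9 cm) are those quoted for the Videre system; the pixel dimension of 6 µm is a hypothetical value chosen only for illustration, since it is not given in the text, and the function name is likewise an assumption.

```python
def depth_from_disparity(x_first, x_second, focal_m, baseline_m, pixel_size_m):
    """Depth Z = f * b / (c * (x' - x)), all lengths in meters, x in pixels."""
    disparity = x_first - x_second
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point at finite depth")
    return focal_m * baseline_m / (pixel_size_m * disparity)


F = 0.003   # focal length: 3 mm lenses (from the text)
B = 0.09    # baseline: 9 cm (from the text)
C = 6e-6    # pixel dimension: HYPOTHETICAL 6 micrometers, not from the text

print(depth_from_disparity(330, 320, F, B, C))  # about 4.5 m for 10 px
print(depth_from_disparity(340, 320, F, B, C))  # about 2.25 m for 20 px
```

Doubling the disparity halves the depth, which is the inverse proportionality mentioned in the introduction.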

Knowing the robot's current location in the world coordinate system and the position of the camera with respect to it, as the camera is mechanically fixed to the robot, the position of the camera in world coordinates is a 3D point $C_w$. To transform any point $P_w$ into the camera's coordinate system:

$$P_c = R\,(P_w - C_w) \tag{5}$$

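The transformation from world to camera coordinates, expressed in homogeneous coordinates, is:

$$\begin{pmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{pmatrix} = \begin{pmatrix} R & -R\,C_w \\ 0 & 1 \end{pmatrix} \begin{pmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{pmatrix} \tag{6}$$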

The rotation matrix $R$ and the translation vector $C_w$ define the camera's extrinsic parameters, namely its orientation and position, respectively, in world coordinates. The matrix $R$ transforms from world to camera coordinates.
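The transformation of Eq. (5) and its inverse can be sketched with plain lists as follows. The inverse uses the fact that a rotation matrix satisfies $R^{-1} = R^{T}$, so $P_w = R^{T} P_c + C_w$; the helper names and the example pose are hypothetical.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def transpose(m):
    return [[m[j][i] for j in range(3)] for i in range(3)]


def world_to_camera(rot, cam_pos_w, p_w):
    """Eq. (5): P_c = R (P_w - C_w)."""
    return mat_vec(rot, [p - c for p, c in zip(p_w, cam_pos_w)])


def camera_to_world(rot, cam_pos_w, p_c):
    """Inverse of Eq. (5): P_w = R^T P_c + C_w (R is orthonormal)."""
    return [x + c for x, c in zip(mat_vec(transpose(rot), p_c), cam_pos_w)]


# Hypothetical pose: camera at (1, 2, 0.5), rotated 90 degrees about the Z axis.
R = [[0, 1, 0],
     [-1, 0, 0],
     [0, 0, 1]]
C_w = [1, 2, 0.5]

p_c = world_to_camera(R, C_w, [3, 2, 0.5])
print(p_c)                           # [0, -2, 0.0]
print(camera_to_world(R, C_w, p_c))  # [3, 2, 0.5]
```

Applying `camera_to_world` to every point of the cloud is the direction needed for terrain reconstruction, where points measured in the camera frame are expressed in the world frame.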

Fig. 3. Stereo system of parallel axes. (x0, y0) and (x0', y0') are the central points of the images, and (x, y) and (x', y') are the coordinates of point P in each image of the pair (Bhatti, 2012).


As a result of this process, a cloud of 3D points, expressed in the camera coordinate system, is obtained, representing the perceived environment. The position of these points must be transformed to the world coordinate system (X, Y, Z), which is independent of the camera, to represent the terrain, objects and obstacles independently of where the camera is. The "Sensor Fusion. Reference Systems Transformations" process performs the transformation between these coordinate systems, from a point (Xc, Yc, Zc) in camera coordinates to its location (Xw, Yw, Zw) in world coordinates.

In order for the system to automatically reconstruct the surface of the terrain as the robot navigates in natural, rough environments, it is necessary to know the robot pose when the images were captured; it may have had any orientation (about its three axes: roll, pitch and yaw). This data is obtained from an Inertial Measurement Unit (IMU) onboard the robot and fused with the point cloud and the extrinsic parameters of the camera (its position and orientation) for a correct transformation and terrain reconstruction. The projection from camera 3D coordinates (X, Y, Z) onto the 2D image plane (x, y) for internal digital representation, in homogeneous coordinates, is:

$$(x, y, 1) = \left( f \frac{X}{Z},\; f \frac{Y}{Z},\; 1 \right) \tag{7}$$
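The projection of Eq. (7) can be sketched as below, together with the conversion from metric image-plane coordinates to pixel units using the pixel dimension c, as in Eq. (3). The function names and the numeric values are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def project(point_c, focal):
    """Eq. (7): project camera coordinates (X, Y, Z) to the image plane."""
    X, Y, Z = point_c
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (focal * X / Z, focal * Y / Z)


def to_pixels(xy, pixel_size):
    """Convert metric image-plane coordinates to pixel units (c * x_pix = x)."""
    return (xy[0] / pixel_size, xy[1] / pixel_size)


xy = project((1.0, 3.0, 4.0), focal=2.0)
print(xy)                    # (0.5, 1.5)
print(to_pixels(xy, 0.25))   # (2.0, 6.0)
```

The resulting pixel coordinates are relative to the image central point shown in Fig. 3.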

3. Results

The images used for this study were acquired with a commercially available Videre STH-DCSG-9 color stereoscopic system with parallel optical axes, two optics separated 9 cm, 640 × 480 pixels resolution and 3 mm miniature lenses. They were captured at different locations, days, hours and illumination conditions, see Fig. 4.

These digital images were captured and stored as 24-bit color images with a resolution of 640 × 480 pixels, and saved in RGB (Red, Green and Blue) in a raw format (BMP) so that no data is lost through compression. Two sets of 14 and 21 pairs of images were captured and processed. In all of them, differences in the illumination pattern have been observed, as can be expected in any stereo system due to the inherent differences in the characteristics of the cameras and optics.

For the rest of this section, the pair of images shown in Fig. 1 (also top left in Fig. 4) will be used as an example to illustrate the performance of the proposed system. Analogous results are obtained using any other stereo pair from these sets of images. We have arrived at this result after the analysis of 90 stereo pairs of images; thus, for simplicity, the analysis is presented for the stereo pair in Fig. 1. These images have 307,200 pixels. Of those, this particular stereo pair has 156,111 potential matches; the rest belong to remote areas, like the sky, with no computable disparity. Executing the Semi-Global Block Matching algorithm (Hirschmüller, 2005) on the original images, that is, without any previous image processing, yields 143,296 correspondences. Of those, 120,738 are right while 22,558 are incorrectly matched pixels. The algorithm also misses 35,373 matches that should have been detected. Therefore, 77.34% of the total possible correspondences have been correctly detected by the stereo matching algorithm. Fig. 7(a) shows graphically the set of disparities obtained.

To analyze these results, the images of the stereo pair and their histograms are compared; they show different intensity distributions, see Figs. 5 and 6. These figures show the corresponding histograms of the left, right and corrected right images, both in RGB and grayscale, with intensity values in the range [0, 255] on the X axis and the number of pixels of each value on the Y axis. Although both images capture the same scene (with a small variation in the point of view) and were taken simultaneously with two identical cameras, their histograms show different spectral levels. Theoretically they should be identical, but in practice they are not, reinforcing the hypothesis that the constant image brightness (CIB) assumption introduced previously, by which the intensities of corresponding points in the two images of a stereo pair are assumed to be equal, is indeed often false.

Fig. 4. Example of terrain images captured with our stereoscopic system.

Fig. 5. Corresponding left, right and corrected right image RGB channels' histograms (columns: left image, right image, corrected right image).

Fig. 6. Corresponding grayscale left, right and corrected right image histograms.

Fig. 7. Disparity computed by the SGBM stereo matching algorithm performed on the (a) original images, (b) images with histograms matched channel by channel, (c) images converted to grayscale and matching their histograms.

Table 1. Results obtained by the stereo matching algorithm implemented in the expert system using as input the original images and the ones with the histograms matched.

Stereo matching algorithm    Original images    Histogram matching (RGB)    Histogram matching (Gray)
Total pixels in the image    307,200            307,200                     307,200
Potential matches            156,111            156,111                     156,111
Matches found                143,296            141,993                     153,305
Right matches                120,738            126,608                     139,955
Right matches (%)            77.34              81.1                        89.65
Wrong matches                22,558             15,385                      13,350
Missed matches               35,373             29,503                      16,156
Missed matches (%)           22.66              18.9                        10.35
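The derived rows of Table 1 follow from three counts per column: potential matches, matches found and right matches. The short sketch below reproduces that arithmetic; the function name and dictionary keys are hypothetical, and the percentages are taken relative to the number of potential matches, which is the convention that reproduces the reported values.

```python
def matching_stats(potential, found, right):
    """Derive wrong/missed counts and percentages from the basic counts."""
    wrong = found - right          # matches found that are incorrect
    missed = potential - right     # potential matches not correctly found
    return {
        "wrong": wrong,
        "missed": missed,
        "right_pct": round(100 * right / potential, 2),
        "missed_pct": round(100 * missed / potential, 2),
    }


original = matching_stats(156111, 143296, 120738)
rgb = matching_stats(156111, 141993, 126608)
gray = matching_stats(156111, 153305, 139955)

for name, stats in (("original", original), ("RGB", rgb), ("gray", gray)):
    print(name, stats)
```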

From Fig. 5, it can be seen that some channels are more alike than others; in this pair of images the blue channels are particularly dissimilar, which a human expert could also appreciate straight away by observing the images, see Fig. 1. However, dissimilarities in other channels can go unnoticed by the naked eye.

Instead of using the images directly as input to the stereo matching algorithm, the proposed expert system performs an automatic histogram matching process at the initial stage, adjusting the right image as a function of the left one, which remains unaltered. The resulting histograms can be observed in the last column of Fig. 5. When working in the RGB space, each channel is adjusted separately. After the matching process, the histograms of the right image are more similar to those of the reference left image. As a result, for this pair of images, the stereo matching process, using the same algorithm, computes 4.86% more correct matches, see Table 1. That amounts to 5,870 additional right correspondences, which translates into 5,870 more 3D points that the reprojection process can use to build a reconstruction of the terrain. Moreover, the number of errors is reduced: fewer wrong matches are computed and fewer correspondences are missed. Specifically, after matching the histograms the stereo algorithm returns 31.8% fewer wrong matches and 16.6% fewer missed matches than with the original images. In summary, when the system automatically matches the histograms of each channel separately before the stereo process, for this example the matching algorithm finds 5,870 new correspondences, which also means 5,870 fewer misses than before, and prevents 7,173 mistakes, i.e. incorrectly matched pixels. The resulting computed disparities can be seen in Fig. 7(b).
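The channel-by-channel adjustment described above can be illustrated with a minimal sketch of classical histogram matching through cumulative distribution functions (CDFs). The paper does not give its implementation, so the function names (`cdf`, `matching_lut`, `match_channel`, `match_rgb`) and the pure-Python list representation of the images are assumptions made only for illustration; a real system would more likely operate on image arrays.

```python
from bisect import bisect_left

LEVELS = 256  # 8-bit intensities in [0, 255]


def cdf(values, levels=LEVELS):
    """Normalized cumulative histogram of a flat list of integer intensities."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    cumulative, running = [], 0
    for count in hist:
        running += count
        cumulative.append(running / total)
    return cumulative


def matching_lut(source, reference, levels=LEVELS):
    """Map each source level to the first reference level whose CDF is >= the source CDF."""
    src_cdf = cdf(source, levels)
    ref_cdf = cdf(reference, levels)
    return [min(bisect_left(ref_cdf, p), levels - 1) for p in src_cdf]


def match_channel(source, reference, levels=LEVELS):
    """Adjust one channel of the source (right) image to the reference (left) one."""
    lut = matching_lut(source, reference, levels)
    return [lut[v] for v in source]


def match_rgb(source_pixels, reference_pixels):
    """Match each RGB channel separately; pixels are (r, g, b) tuples."""
    channels = [
        match_channel([p[c] for p in source_pixels],
                      [p[c] for p in reference_pixels])
        for c in range(3)
    ]
    return list(zip(*channels))


# Toy usage: a right channel that is uniformly darker than the left one
# is shifted onto the left channel's levels; the left image is untouched.
left = [10, 20, 30, 40]
right = [5, 15, 25, 35]
corrected_right = match_channel(right, left)
```

Only the right image is modified here, mirroring the text, where the left image acts as the unaltered reference.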

There is another approach to automatic histogram matching: converting the images to grayscale and then matching their histograms, see Fig. 6, instead of adjusting each RGB channel separately. Following this strategy, the stereo algorithm computes more disparities than when each channel is adjusted independently; specifically, it computes 15.92% more right matches, 19,217 points, with respect to performing the stereo process on the original images. That increase in the number of computed points has a great impact on the reprojection and terrain reconstruction processes. In addition, the number of wrong and missed matches decreases, with 9,208 fewer wrong matches, 40.8% fewer than with the original images, see Table 1. The resulting computed disparities can be seen graphically in Fig. 7(c).




Potential matches (M) is the total number of possible matches in this pair of images. Matches found (Mf) is the number of matches computed by the stereo algorithm. Right matches (Mr) counts the correctly computed correspondences; these are the ones that coincide with ground truth data. Wrong matches (Mw) is the number of correspondences computed by the algorithm that do not coincide with ground truth data and therefore should not have been detected as valid matches. Missed matches (Mm) are those that, in spite of having a valid correspondence, the algorithm is not able to compute. The percentages shown for right and missed matches are computed with respect to the total number of potential correspondences (M). The number of matches found is equal to the sum of right and wrong matches (Mf = Mr + Mw); the number of matches found minus wrong plus missed matches adds up to the total number of potential matches (M = Mf - Mw + Mm). Ground truth is established by creating a template from which all pixels with no possible disparity, such as the sky, very distant areas and portions of the scene that cannot be seen from both images (left-most vertical area), are removed. Results obtained from the stereo process are compared against this template, and the set of matches labeled as correct is supervised by a human expert to double check for possible mismatches.
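The relations between these quantities can be checked against Table 1 with a short sketch. The class `MatchStats` and the helper `relative_change` are hypothetical names introduced only for this illustration; the input figures (M, Mr and Mw for each column) are copied from Table 1, and everything else is derived from the identities stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchStats:
    potential: int  # M, potential matches
    right: int      # Mr, right matches
    wrong: int      # Mw, wrong matches

    @property
    def found(self) -> int:
        # Mf = Mr + Mw
        return self.right + self.wrong

    @property
    def missed(self) -> int:
        # From M = Mf - Mw + Mm it follows that Mm = M - Mr
        return self.potential - self.right

    @property
    def right_pct(self) -> float:
        return 100.0 * self.right / self.potential

    @property
    def missed_pct(self) -> float:
        return 100.0 * self.missed / self.potential


def relative_change(before: int, after: int) -> float:
    """Signed percentage change from `before` to `after`."""
    return 100.0 * (after - before) / before


# Values taken from Table 1
original = MatchStats(potential=156_111, right=120_738, wrong=22_558)
rgb = MatchStats(potential=156_111, right=126_608, wrong=15_385)
gray = MatchStats(potential=156_111, right=139_955, wrong=13_350)

for name, stats in (("original", original), ("RGB", rgb), ("gray", gray)):
    print(name, stats.found, stats.missed,
          round(stats.right_pct, 2), round(stats.missed_pct, 2))
```

Under these definitions, `relative_change(original.right, rgb.right)` gives the gain in right matches for the channel-by-channel case, and the same call with `gray.right` gives the gain for the grayscale case.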

Results in Table 1 show that more right correspondences are computed and fewer errors are made when images are preprocessed before the stereo process. This means that automatically matching the histograms results in a significant improvement of the stereo process, and therefore of the terrain reconstruction expert system, supporting the initial hypothesis. The results obtained in this example extend to the whole set of images, over 30 pairs.

Better results are obtained when the images are converted to grayscale before matching their histograms, prior to the stereo matching process, than when the histograms of each channel are matched separately and the images are then converted to grayscale for the stereo process. These results are influenced by the matching algorithm used in the stereo process. In this case, the Semi-Global Block Matching (SGBM) algorithm computes correspondences based on the intensity values of each candidate pixel and its surroundings in grayscale images. If it receives RGB images, the algorithm first converts them to grayscale. Converting an image from RGB to grayscale eliminates the hue and saturation information while retaining the luminance. Therefore, when histogram matching is first performed on each channel of the RGB images separately and the images are then converted to grayscale, the hue and saturation information is removed after the histogram matching is done, altering the input images and influencing the results obtained by the stereo matching algorithm. However, if the RGB images are first converted to grayscale, the hue and saturation information is removed from both images of the pair before the histogram matching process, in which pixel intensities are then matched, and the SGBM stereo matching algorithm obtains better results. Results may differ with a different stereo matching algorithm able to work directly on RGB images, using the information of the three channels to compute correspondences.
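That the two orderings are not equivalent can be seen on a deliberately artificial two-pixel example. The sketch below assumes the Rec. 601 luma weights (0.299, 0.587, 0.114) for the RGB-to-grayscale conversion, which is a common choice but is not stated in the paper, and reuses a simple CDF-based matcher; all function names are hypothetical. It only illustrates that the grayscale histograms seen by the stereo algorithm differ between the two orderings, not that one ordering is better in general.

```python
from bisect import bisect_left


def cdf(values, levels=256):
    """Normalized cumulative histogram of integer intensities."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    cumulative, running = [], 0
    for count in hist:
        running += count
        cumulative.append(running / len(values))
    return cumulative


def match_channel(source, reference, levels=256):
    """CDF-based histogram matching of one channel onto a reference."""
    src_cdf, ref_cdf = cdf(source, levels), cdf(reference, levels)
    lut = [min(bisect_left(ref_cdf, p), levels - 1) for p in src_cdf]
    return [lut[v] for v in source]


def to_gray(pixel):
    """Luminance of an (r, g, b) pixel; Rec. 601 weights are an assumption."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def gray_then_match(right, left):
    """Convert both images to grayscale first, then match the histograms."""
    return match_channel([to_gray(p) for p in right],
                         [to_gray(p) for p in left])


def match_then_gray(right, left):
    """Match each RGB channel separately, then convert to grayscale."""
    channels = [
        match_channel([p[c] for p in right], [p[c] for p in left])
        for c in range(3)
    ]
    return [to_gray(p) for p in zip(*channels)]


# Artificial two-pixel "images", chosen only to make the effect visible
left = [(100, 0, 0), (0, 100, 0)]
right = [(80, 80, 0), (0, 0, 0)]

left_gray = sorted(to_gray(p) for p in left)
gray_first = sorted(gray_then_match(right, left))
channels_first = sorted(match_then_gray(right, left))
```

In this toy case, matching after the grayscale conversion reproduces the left image's grayscale levels exactly, whereas matching the channels first and converting afterwards yields different grayscale levels, because per-channel matching constrains each channel's marginal distribution rather than the luminance.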

4. Conclusions

We propose a new automatic expert system for image correction and terrain reconstruction in stereo vision applications. It is based on three consecutive stages, whose main underlying idea is the successive application of automatic image processing tasks that map the expert knowledge.

The expert system is able to adjust the intensity of one image of the pair as a function of the other by automatically matching their histograms, both for RGB images, matching channel by channel, and for grayscale images. It also shows that the constant image brightness (CIB) assumption is often erroneous, as is the case for our set of stereo pairs.

In addition, once the intensities are corrected, the stereo matching process obtains a larger number of correspondences while also reducing errors and missed matches, which has a great impact on the subsequent reprojection and terrain reconstruction processes. The proposed expert system could therefore be extended to other applications such as 3D object segmentation and recognition.

The expert system has been designed with an open architecture, so that in the future it will be possible to replace or add modules. Of particular interest are the study of different stereo matching algorithms and approaches, and the addition of a knowledge base for improving image correction based on accumulated knowledge.
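One way to picture such an open architecture is a chain of named, replaceable stages. The following is only a sketch under assumptions: the paper does not describe its software design, so the `Pipeline` class, the stage names and the placeholder stages (which merely record a trace instead of processing images) are all hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Stage = Callable[[Any], Any]


class Pipeline:
    """Ordered chain of named stages; each stage consumes the previous output."""

    def __init__(self, stages: Iterable[Tuple[str, Stage]]) -> None:
        # dict preserves insertion order, so stages run in the given order
        self._stages: Dict[str, Stage] = dict(stages)

    def names(self) -> List[str]:
        return list(self._stages)

    def replace(self, name: str, stage: Stage) -> None:
        """Swap the implementation of an existing stage, keeping its position."""
        if name not in self._stages:
            raise KeyError(f"unknown stage: {name}")
        self._stages[name] = stage

    def append(self, name: str, stage: Stage) -> None:
        """Add a new stage at the end of the chain."""
        if name in self._stages:
            raise ValueError(f"stage already exists: {name}")
        self._stages[name] = stage

    def run(self, data: Any) -> Any:
        for stage in self._stages.values():
            data = stage(data)
        return data


def trace(label: str) -> Stage:
    """Placeholder stage that only records that it ran (no image processing)."""
    return lambda log: log + [label]


# Three stages mirroring the modules described in the paper
pipeline = Pipeline([
    ("intensity_correction", trace("histogram matching")),
    ("stereo_matching", trace("SGBM")),
    ("reconstruction", trace("reprojection")),
])

before = pipeline.run([])
pipeline.replace("stereo_matching", trace("alternative matcher"))
after = pipeline.run([])
```

Replacing a stage by name leaves the rest of the chain untouched, which is the property the text asks for when it mentions evaluating different stereo matching algorithms within the same system.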

Future work, currently under development, is the evaluation of different stereo matching algorithms as a function of the application and the input images, analyzing the behavior of the overall system when a module is updated or replaced by a different process or algorithm within the chain. In particular, implementing a stereo matching algorithm able to compute correspondences using the information contained in the three channels of an RGB image, and studying the influence of histogram matching performed on each channel separately before the stereo process, is a promising line of work.

A current disadvantage of the proposed approach is precisely that the system has no knowledge of how to perform the histogram matching process in the most effective way. The situation observed and discussed at the end of the previous section may occur, where matching each channel of the input images separately, while improving the results of the stereo process compared with feeding the algorithm the input images directly, leads to results that are not as good as those obtained when histogram matching is performed on images converted to grayscale. In this regard, and in order to ensure the most effective operation of the system, future work includes the possibility of adjusting the behavior of the system through a set of parameters, so that the algorithms can be configured as a function of the specific application and operational conditions, ensuring the maximum effectiveness and adaptation of the system.

References

Acharya, T., & Ray, A. (2005). Image processing: principles and applications. WileyInterscience.

Ayache, N., & Faverjon, B. (1987). Efficient registration of stereo images by matchinggraph descriptions of edge segments. International Journal of Computer Vision, 1,107–131.

Bakambu, J. N., Allard, P., & Dupuis, E. (2006). 3D terrain modeling for roverlocalization and navigation. In Proceeding of the 3rd Canadian conference oncomputer and robot vision.

Baker, H. H. (1982). Building and using scene representations in imageunderstanding. AGARD-LS-185. Machine Perception, pp. 3.1–3.11.

Bhatti, A. (2012). Current advancements in stereo vision. InTech.Bolles, R. C., Baker, H. H., & Hannah, M. J. (1993). The JISCT stereo evaluation. ARPA

Image Understanding, Workshop, pp. 263–274.Correal, R., Pajares, G., & Ruz, J. J. (2013). Stereo images matching process

enhancement by homomorphic filtering and disparity clustering. RevistaIberoamericana de Automática e Informática Industrial, 10(2), 178–184.

Cox, I. J., & Hingorani, S. L. (1995). Dynamic histogram warping of image pairs forconstant image brightness. International Conference on Image ProcessingProceedings, Springer (I–III, pp. B366–B369). Springer.

Cruz, J. M., Pajares, G., & Aranda, J. (1995a). A neural network model in stereovisionmatching. Neural Networks, 8(5), 805–813.

Cruz, J. M., Pajares, G., Aranda, J., & Vindel, J. L. (1995b). Stereo matching techniquebased on the perceptron criterion function. Pattern Recognition Letters, 16,933–944.

Goldberg, S., Maimone, M., & Matthies, L. (2002). Stereo vision and rover navigationsoftware for planetary exploration. In IEEE aerospace conference.

Gonzalez, R. C., & Woods, R. E. (2008). Digital image processing. Englewood Cliffs, NJ:Prentice-Hall.


http://refhub.elsevier.com/S0957-4174(13)00722-7/h0005





















Grimson, W. E. L. (1985). Computational experiments with a feature-based stereoalgorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 7,17–34.

Hirschmüller, H. (2005). Accurate and efficient stereo processing by semi-globalmatching and mutual information. In Conf. on computer vision and patternrecognition (CVPR).

Kang, Y. & Ho, Y. (2012). Efficient stereo image rectification method using horizontalbaseline. In Advances in image and video technology. Lecture notes in computerscience (Vol. 7087, pp. 301–310).

Kawai, Y. & Tomita, F. (1998). Intensity calibration for stereo images based onsegment correspondence. In IAPR Workshop on Machine Vision Applications,Makuhari, Chiba, Japan.

Krotkov, E. P. (1989). Active computer vision by cooperative focus and stereo. NewYork: Springer.

Laia, Y., Chunga, K., Chena, C., Lina, G., & Wangb, C. (2012). Novel mean-shift basedhistogram equalization using textured regions. Expert Systems with Applications,39(3), 2750–2758.

Liling, Z., Yuhui, Z., Quansen, S., & Deshen, X. (2012). Suppression for luminancedifference of stereo image-pair based on improved histogram equalization. InProceedings of the computer science and technology (Vol. 6, p. 2).

Lin, L., & Zhou, W. (2009). Interested sample point pre-selection based dense terrainreconstruction for autonomous navigation. In Third international symposium onintelligent information technology application (Vol. 3, pp. 339–343).


Morisset, B. et al. (2009). Leaving flatland: Toward real-time 3D navigation. InProceeding of the IEEE international conference on robotics and automation (pp.3384–3391).

Nalpantidis, L., & Gasteratos, A. (2010). Stereo vision for robotic applications in thepresence of non-ideal lighting conditions. Image and Vision Computing, 28,940–951.

Ozanian, T. (1995). Approaches for stereo matching – A review. ModelingIdentification Control, 16(2), 65–94.

Pajares, G., Cruz, J. M., & Aranda, J. (1998). Stereo matching based on the self-organizing feature-mapping algorithm. Pattern Recognition Letters, 19, 319–330.

Pajares, G., & de la Cruz, J. M. (2007). Visión por computador: Imágenes digitales yaplicaciones. Editorial Ra-ma, 4, 102–105. chap. 4.

Papadimitriou, D. V., & Dennis, T. J. (1996). Epipolar line estimation and rectificationfor stereo image pairs. IEEE Transactions on Image Processing, 5(4).

Song, W. et al. (2012). Intuitive terrain reconstruction using height observation-based ground segmentation and 3D object boundary estimation. Sensors Journal,12, 17186–17207.

Xing-zhe, X. et al., (2010). 3D terrain reconstruction for patrol robot using pointgrey research stereo vision cameras. In Intl. conf. artificial intelligence andcomputational intelligence (AICI) (pp. 1–2).

Zhang, K., Lafruit, G., Lauwereins, R., & Van Gool, L. (2010). Joint integral histogramsand its application in stereo matching. In Proceedings of the internationalconference on image processing (pp. 817–820). Hong Kong, China.
























